Kubernetes Resource Isolation - 11. Kubernetes Performance Tuning Playbooks

October 15, 2025 · 4 minute read

Segment 11 is where we turn everything from the previous deep dives into practical tuning playbooks for real workloads. These are the exact patterns used by:

  • AI/ML infra teams
  • FinTech low-latency clusters
  • Telco NFV teams
  • High-throughput data platforms
  • Large enterprise Kubernetes platforms (AKS/EKS/GKE)

We’ll cover CPU, Memory, GC, NUMA, CFS throttling, working set behavior, eviction safety, request/limit design, and more per workload.


SEGMENT 11 — Kubernetes Performance Tuning Playbooks

We will produce 8 workload-specific tuning playbooks:

  1. Java services (Spring Boot / Micronaut / Kafka / Pega / JVM-based apps)
  2. Go microservices (Envoy, API, controller workloads)
  3. Node.js / Python microservices
  4. High-performance Redis / Memcached / in-memory DBs
  5. AI/ML inference workloads (TensorRT, ONNX, PyTorch Serve)
  6. Dataplane workloads (Envoy, NGINX, Cilium agent, DPDK, NFV)
  7. Databases (Postgres, MySQL, Elasticsearch, Cassandra)
  8. Batch/ETL (Spark, Flink, Ray)

Let’s go through each one with recommended CPU/memory patterns, cgroup settings, kubelet implications, GC tuning, and best practices.


PLAYBOOK 1 — Java Applications

(JVM workloads are among the most common sources of Kubernetes performance problems.)

What to expect from Java:

  • Large and bursty allocations
  • High thread count
  • Off-heap usage (direct buffers, metaspace)
  • Page cache usage (class loading)
  • Predictable memory spikes during GC

CPU Tuning

  1. Do not set a CPU limit unless you are required to. A CPU limit introduces CFS throttling → longer GC pauses → latency spikes. Use:
requests.cpu = <expected>
limits.cpu   = none
  2. If the service is latency-sensitive, enable CPU pinning (kubelet sketch below):
Guaranteed QoS (requests = limits)
integer CPUs (2, 4, 8)
cpuManagerPolicy: static
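
CPU pinning is a node-level setting rather than a pod-level one. A minimal KubeletConfiguration sketch for a node pool that hosts latency-sensitive Java pods might look like this (the reserved CPU/memory values are illustrative assumptions, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Pin Guaranteed pods with integer CPU requests to dedicated cores
cpuManagerPolicy: static
# Keep CPU, memory and devices on a single NUMA node where possible
topologyManagerPolicy: single-numa-node
# CPU/memory held back for system and Kubernetes daemons (illustrative values);
# the static policy carves its reserved-CPU pool out of these reservations
systemReserved:
  cpu: "500m"
  memory: "1Gi"
kubeReserved:
  cpu: "500m"
  memory: "1Gi"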

Memory Tuning

  1. Set the container memory limit higher than the heap. Example:
heap  = 3Gi
limit = 4Gi
  2. Tune the RAM-percentage flags:
-XX:+UseContainerSupport
-XX:MaxRAMPercentage=70
-XX:InitialRAMPercentage=70
  3. Tune metaspace:
-XX:MaxMetaspaceSize=512m
  4. Tune the thread stack size:
-Xss512k

MemoryQoS

Enable MemoryQoS:

  • memory.min = request
  • memory.high ≈ limit × 0.9

This throttles the workload before it reaches its limit and helps prevent sudden OOM kills (kubelet sketch below).
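
MemoryQoS itself is turned on at the kubelet and requires cgroup v2. A minimal sketch, assuming the MemoryQoS feature gate is still behind a flag in your Kubernetes version:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
# memory.high is derived from the limit multiplied by this factor
memoryThrottlingFactor: 0.9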

Pod configuration

requests:
  cpu: "1"
  memory: "3Gi"
limits:
  memory: "4Gi"
  # cpu: "2"      # optional; only if a CPU cap is truly required

PLAYBOOK 2 — Go Microservices

What to expect

  • Very efficient CPU usage
  • Low memory footprint
  • But high concurrency may need CPU cycles
  • GC pauses rare but CPU-intensive under load

CPU Tuning

  1. Remove the CPU limit. Throttling causes significant latency spikes under high-QPS load.
requests.cpu = N
limits.cpu   = none
  2. For HFT or other low-latency services:
  • Pin a dedicated CPU
  • Static CPU Manager policy
  • single-numa-node Topology Manager policy

Memory Tuning

  • Go apps rarely exceed their heap unless misconfigured
  • Set the memory limit ≈ 2x the expected RSS

Example, if the app uses about 400Mi:

requests.memory = 400Mi
limits.memory   = 800Mi

GOMAXPROCS

Keep GOMAXPROCS aligned with the CPU the container can actually use:

GOMAXPROCS = <CPUs usable by the container>

Note that the Go runtime historically defaults GOMAXPROCS to the host core count and ignores the cgroup CPU quota (container-aware defaults only arrived in Go 1.25), so set it explicitly or use uber-go/automaxprocs.
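
One way to keep GOMAXPROCS tied to the pod spec is the downward API. A sketch, assuming an integer CPU request (names and sizes are illustrative):

containers:
  - name: go-api                            # illustrative name
    image: example.com/go-api:1.0           # illustrative image
    env:
      - name: GOMAXPROCS
        valueFrom:
          resourceFieldRef:
            resource: requests.cpu          # exposed as whole cores
            divisor: "1"
    resources:
      requests:
        cpu: "2"
        memory: "400Mi"
      limits:
        memory: "800Mi"                     # no CPU limit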


PLAYBOOK 3 — Node.js & Python Microservices

Node.js

  • Single-threaded by default
  • Sensitive to CPU throttling
  • Memory usage often unstable

Best patterns:

  • Do NOT set CPU limit
  • Set memory limit ≈ 2–3x heap
  • Scale horizontally (HPA sketch below)
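
Since one Node.js process cannot use much more than a single core, CPU-based horizontal scaling is the usual lever. A minimal HPA sketch (the Deployment name, replica bounds, and utilization target are illustrative assumptions):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: node-api                  # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: node-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60  # percentage of the CPU request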

Python

  • GIL → only one thread runs Python bytecode at a time
  • Heavy on page cache for unpickling/ML models

Best patterns:

  • Do not set CPU limit
  • Favor more CPU requests for concurrency
  • Use MemoryQoS to prevent page cache starvation

PLAYBOOK 4 — Redis / Memcached / In-memory data stores

Characteristics:

  • Extremely sensitive to CPU jitter
  • Memory footprint equals data size
  • Must avoid page cache interference
  • Single-threaded or few-threaded

CPU Tuning

DO THIS:

Use Guaranteed QoS:

requests.cpu = 2
limits.cpu   = 2

CPU pinning is critical:

cpuManagerPolicy: static
topologyManagerPolicy: restricted or single-numa-node

Memory Tuning

The memory limit must include headroom for:

  • object overhead
  • fragmentation
  • AOF buffers
  • replication buffers

Recommended:

limit = dataset_size * 1.3

Apply the overcommit setting Redis recommends on the node, so background saves (fork) do not fail (one way to do this is sketched below):

vm.overcommit_memory = 1
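
vm.overcommit_memory is not a namespaced sysctl, so it cannot be set via the pod securityContext; it has to be applied on the node. One common pattern is a small privileged DaemonSet scoped to the Redis node pool, sketched here (names, labels, and images are illustrative assumptions; a node-tuning operator or machine config works just as well):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: redis-node-sysctls          # illustrative name
spec:
  selector:
    matchLabels:
      app: redis-node-sysctls
  template:
    metadata:
      labels:
        app: redis-node-sysctls
    spec:
      nodeSelector:
        workload: redis             # illustrative label for the Redis node pool
      initContainers:
        - name: set-overcommit
          image: busybox:1.36
          command: ["sysctl", "-w", "vm.overcommit_memory=1"]
          securityContext:
            privileged: true        # required to write /proc/sys on the host
      containers:
        - name: pause               # keeps the DaemonSet pod running
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "10m"
              memory: "16Mi"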

PLAYBOOK 5 — AI/ML Inference Workloads

Characteristics:

  • Spiky memory and page cache
  • NUMA-sensitive
  • GPU memory bottlenecks
  • High CPU for preprocessing

CPU Tuning:

Use integer CPU Guaranteed pods:

requests.cpu=4
limits.cpu=4

Enable CPUManager:

cpuManagerPolicy=static

NUMA Tuning:

Enable:

topologyManagerPolicy=single-numa-node

This ensures CPU, GPU, and HugePages allocations all come from the same NUMA node → a 20–40% speedup.

Memory Tuning:

ML models produce:

  • page cache pressure
  • pinned memory
  • large temporary tensors

Set:

limit = expected_peak * 1.4

Enable MemoryQoS.


GPU Tuning:

Node should have:

  • MIG profiles (NVIDIA)
  • fixed GPU memory budgets
  • exclusive compute setting
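
Pulling the CPU, NUMA, memory, and GPU guidance together, an inference pod sketch (the image, sizes, and GPU count are illustrative assumptions):

apiVersion: v1
kind: Pod
metadata:
  name: onnx-inference                      # illustrative name
spec:
  containers:
    - name: model-server
      image: example.com/onnx-server:1.0    # illustrative image
      resources:
        requests:
          cpu: "4"                          # integer CPUs → eligible for static pinning
          memory: "16Gi"
          nvidia.com/gpu: "1"
        limits:
          cpu: "4"                          # requests == limits → Guaranteed QoS
          memory: "16Gi"                    # ≈ expected peak * 1.4
          nvidia.com/gpu: "1"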

PLAYBOOK 6 — Dataplane Agents (Envoy, Cilium agent, NGINX)

Characteristics:

  • Hot code paths
  • Extremely latency-sensitive
  • Should NEVER be throttled
  • High memory for buffers
  • NUMA-sensitive

CPU Tuning:

Absolute must:

requests.cpu = 2
limits.cpu = none

or Guaranteed integer CPU with static policy.

For serious performance:

  • Pin to CPU cores within a single NUMA node
  • Reserve those cores exclusively

Memory:

  • Set a moderately high memory limit (these agents are buffer-heavy):
requests.memory = 1Gi
limits.memory   = 2Gi
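
As a concrete sketch, a node-local dataplane container following this pattern (the name and image tag are illustrative assumptions):

containers:
  - name: envoy                             # illustrative name
    image: envoyproxy/envoy:v1.31.0         # illustrative tag
    resources:
      requests:
        cpu: "2"
        memory: "1Gi"
      limits:
        memory: "2Gi"                       # memory capped, CPU deliberately uncapped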

PLAYBOOK 7 — Databases (Postgres, MySQL, Elasticsearch, Cassandra)

Common issues:

  • page cache interactions
  • fsync stalls
  • stack overflow on huge queries
  • JVM (ES) GC

CPU:

Databases need stable CPU, but moderate throttling is usually tolerable: they are throughput-bound rather than tail-latency-critical.

Use:

requests.cpu = moderate
limits.cpu = moderate

Memory:

Always leave headroom for:

  • page cache
  • background processes

For Postgres:

shared_buffers ≈ 25% memory
effective_cache_size ≈ 60%
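
Worked through for a container with an 8Gi memory limit (an illustrative size), that guidance lands at roughly shared_buffers = 2GB and effective_cache_size = 5GB; as a ConfigMap fragment:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-tuning                     # illustrative name
data:
  postgresql.conf: |
    # ~25% of the 8Gi container memory limit
    shared_buffers = 2GB
    # ~60% of the limit; a planner hint, not an allocation
    effective_cache_size = 5GB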

For Elasticsearch:

  • heap = 50% memory
  • limit = heap * 1.3
  • MemoryQoS recommended

PLAYBOOK 8 — Batch/ETL (Spark, Flink, Ray)

Characteristics:

  • Heavy I/O
  • Heavy page cache use
  • Transient memory spikes
  • Multiple executors

CPU:

Executors need plenty of CPU but no strict latency guarantees, so CPU limits are acceptable here:

requests.cpu = 1
limits.cpu = 4

Memory:

Executors have:

  • heap
  • off-heap
  • shuffle buffers
  • page cache

Set:

limit = executor_memory * 1.5
requests.memory = executor_memory
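
For example, with a 4Gi executor (an illustrative size), the 1.5x rule gives:

resources:
  requests:
    cpu: "1"
    memory: "4Gi"       # executor_memory
  limits:
    cpu: "4"
    memory: "6Gi"       # executor_memory * 1.5, covering off-heap and shuffle buffers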

MemoryQoS strongly recommended.


GLOBAL BEST PRACTICES ACROSS ALL WORKLOADS

CPU

  • Do NOT set CPU limits unless required
  • Always set requests
  • Use CPUManager + integer CPUs for low latency workloads

Memory

  • Always set memory limits
  • MemoryQoS greatly reduces sudden OOM kills
  • Overcommit memory very cautiously

QoS

  • Avoid BestEffort
  • Use Burstable for general workloads
  • Use Guaranteed ONLY for latency-sensitive workloads

Eviction

  • Tune eviction thresholds
  • Reserve memory for kube and system daemons (kubelet sketch below)
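
A kubelet fragment sketching both knobs (thresholds and reservations are illustrative assumptions to be sized per node type):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"
  nodefs.available: "10%"
evictionSoft:
  memory.available: "1Gi"
evictionSoftGracePeriod:
  memory.available: "1m"
kubeReserved:
  cpu: "500m"
  memory: "1Gi"
systemReserved:
  cpu: "500m"
  memory: "1Gi"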

Node Selection

  • NUMA-aware workload placement
  • Local SSD for I/O heavy workloads

SEGMENT 11 SUMMARY

You now have workload-specific tuning playbooks for:

  • Java
  • Go
  • Node/Python
  • Redis
  • AI/ML
  • Dataplane agents
  • Databases
  • Batch/ETL

Each includes:

  • CPU patterns
  • Memory patterns
  • NUMA rules
  • CFS throttling guidance
  • GC tuning
  • MemoryQoS recommendations

This is the actionable knowledge used by senior Kubernetes performance engineers.

