Kubernetes Resource Isolation - 04. MEMORY ISOLATION IN KUBERNETES USING CGROUPS
Segment 4 is the most important one, because memory isolation is the mechanism that actually kills containers; CPU, by contrast, is elastic and merely throttled.
We’ll go deep into how Kubernetes + cgroups enforce memory isolation, how OOM decisions are made, and how eviction works.
SEGMENT 4 — MEMORY ISOLATION IN KUBERNETES USING CGROUPS
Structure of this segment:
- Memory basics: how cgroups control memory
- Memory limits (hard cap): memory.max / memory.limit_in_bytes
- Memory requests & QoS effects
- What actually happens when memory exceeds limit (OOM)
- Pod vs container memory enforcement
- Kubelet eviction (soft/hard thresholds)
- New Memory QoS (memory.min, memory.high)
- Noisy neighbor behavior
- Best practices for memory-heavy workloads
PART 1 — How cgroups enforce memory
Kubernetes (via the container runtime) relies on the Linux memory controller.
On cgroup v1 (older systems), files include:
- memory.limit_in_bytes
- memory.usage_in_bytes
- memory.stat
- memory.oom_control
On cgroup v2 (modern):
- memory.max (hard limit)
- memory.high (throttling / slow reclaim)
- memory.current
- memory.stat
- memory.min (guaranteed allocation)
cgroup v2 is much more powerful and allows Kubernetes MemoryQoS.
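For orientation: with containerd and the systemd cgroup driver on cgroup v2, a container's memory files typically live under a path like the one below. The exact layout depends on the cgroup driver, runtime, and QoS class; the slice and scope names here are illustrative:

/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/cri-containerd-<container-id>.scope/memory.max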
PART 2 — Memory Limits = Hard Cap
Kubernetes memory limit:
resources:
  limits:
    memory: "1Gi"
cgroup v1:
memory.limit_in_bytes = 1073741824
cgroup v2:
memory.max = 1073741824
This is an absolute, non-negotiable cap.
If memory usage hits the limit → kernel kills a process inside the cgroup → “OOMKilled”.
This happens immediately and does not depend on QoS.
Hitting the memory limit kills:
- that container, not the whole Pod
- unless the Pod-level cgroup limit is hit → then one container inside that Pod is chosen and killed
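To make the mapping concrete, here is a minimal Pod sketch (the names and image are illustrative) showing where the 1Gi above comes from:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx               # any image; illustrative
    resources:
      requests:
        memory: "512Mi"        # scheduling + QoS only (see Part 3)
      limits:
        memory: "1Gi"          # written to the container cgroup as memory.max = 1073741824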
PART 3 — Memory Requests → Scheduling + QoS Only
Unlike CPU:
Memory requests DO NOT create any memory control in cgroups (except when MemoryQoS is enabled)
Memory requests influence:
- Pod placement (scheduler)
- QoS class
- Kubelet eviction priority
- MemoryQoS (optional, creates memory.min / memory.high)
Requests DO NOT:
- limit container memory
- reserve physical memory
- guarantee the Pod will not be OOMKilled
This surprises many engineers.
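For contrast, a hedged sketch of a container that declares only a request (values illustrative): the scheduler reserves 512Mi of allocatable capacity on some node, but nothing caps what the container actually uses.

resources:
  requests:
    memory: "512Mi"
  # no limits: on cgroup v2 the container's memory.max remains "max" (uncapped)
  # and, unless MemoryQoS is enabled, no memory.min / memory.high is written either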
PART 4 — What Happens When Memory Is Exceeded
This is the key logic:
1. Container starts using memory
→ memory.current grows
2. When it approaches limit
→ kernel starts reclaiming pages inside the cgroup
3. When it hits memory.max
→ kernel picks a process in the cgroup and kills it
→ kubelet reports: OOMKilled
4. Container restarts (based on restartPolicy)
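When this happens you can see it in the Pod status; an illustrative fragment of kubectl get pod -o yaml output (the container name and restart count are made up):

status:
  containerStatuses:
  - name: app
    restartCount: 3
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137          # 128 + 9 (SIGKILL)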
Who decides what to kill?
- Linux kernel’s OOM killer
It decides based on:
- the badness score
- memory consumption
- oom_score_adj (which Kubernetes sets per QoS class: roughly -997 for Guaranteed Pods, 1000 for BestEffort, and an in-between value for Burstable)
Kubernetes does not choose which process dies.
PART 5 — Pod-Level vs Container-Level Memory Enforcement
Kubernetes creates:
Pod cgroup:
memory.max = sum(all container limits) (set only when every container declares a memory limit; otherwise the Pod cgroup is left uncapped)
Container cgroup:
memory.max = per-container limit
This gives two failure modes:
1. Container hits its own memory limit → container OOMKill
- Common case
2. Pod hits its aggregated limit → one container chosen to die
Rare, but it happens when:
- sidecars + main container exceed pod sum
- multi-container pods share memory under the Pod cgroup
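A two-container sketch (names illustrative) makes the two levels concrete:

spec:
  containers:
  - name: main
    resources:
      limits:
        memory: "1Gi"          # container cgroup: memory.max = 1Gi
  - name: sidecar
    resources:
      limits:
        memory: "512Mi"        # container cgroup: memory.max = 512Mi
  # Pod cgroup: memory.max = 1Gi + 512Mi = 1536Mi (when every container sets a limit)

Each container is normally killed by its own cap first; the Pod-level value acts as an aggregate backstop for everything running under the Pod cgroup.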
PART 6 — Kubelet Eviction (System-Level Memory Pressure)
Even if Pods haven’t exceeded their own limits, the node can become memory pressured.
Kubelet periodically checks:
Hard eviction:
--eviction-hard=memory.available<500Mi
Kubelet immediately evicts Pods to recover memory.
Soft eviction:
--eviction-soft=memory.available<1Gi
--eviction-soft-grace-period=memory.available=30s
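The same thresholds can also be set declaratively in the kubelet config file; a hedged sketch using the KubeletConfiguration API (values are examples, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"    # evict immediately below this
evictionSoft:
  memory.available: "1Gi"      # evict only after the grace period below
evictionSoftGracePeriod:
  memory.available: "30s"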
Eviction order:
- BestEffort pods are killed first
- then Burstable pods whose usage most exceeds their requests
- Guaranteed pods last
This behavior is independent of container limits.
PART 7 — MemoryQoS (New Feature)
Enabled with the MemoryQoS kubelet feature gate (requires cgroup v2):
--feature-gates=MemoryQoS=true
This introduces new cgroup settings:
memory.min = reserved memory (based on the request)
The kernel will not reclaim memory below this amount from the cgroup, even under node pressure.
memory.high = throttling boundary (derived from the limit via a throttling factor)
Above this, the kernel slows allocations and reclaims aggressively instead of OOM-killing immediately.
MemoryQoS dramatically improves:
- Java applications
- Node.js apps
- Any app with bursty allocations
Without MemoryQoS, the memory limit (memory.max) is a hard cliff.
With MemoryQoS, behavior is smoother and fairer.
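The feature can also be enabled through the kubelet config file; a hedged sketch (the memoryThrottlingFactor value is illustrative; the field was introduced together with MemoryQoS):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
memoryThrottlingFactor: 0.9    # memory.high is derived from the limit using this factor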
PART 8 — Noisy Neighbor Behavior
Memory is not compressible, so isolation is strict:
Scenario A — Pod with limit 512Mi tries to use 600Mi
→ immediate OOM
Scenario B — Pod with no limit (BestEffort)
- Can use entire node memory
- but will be the first killed during pressure
Scenario C — Two Burstable pods, one spikes
Pod A:
request=100Mi limit=1Gi
Pod B:
request=200Mi limit=1Gi
If Node memory is pressured:
- B has a higher request → slightly more protection
- A likely evicted earlier
Scenario D — Java apps
- The JVM will size its heap close to the container limit
The memory limit must therefore cover all of:
- heap
- metaspace
- off-heap buffers
- thread stacks
- long-lived caches
Otherwise → frequent OOMKills.
PART 9 — Memory Best Practices
1. Always set memory limits for production
Otherwise:
- A single pod can take down the node
- BestEffort class gets worst scheduling & eviction priority
2. Memory limit > memory request
This enables efficient bin-packing + burst ability.
3. Avoid equal request/limit unless you want Guaranteed QoS
Guaranteed = best stability, but it removes burst headroom.
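A Guaranteed-class sketch is simply request == limit (and the same must hold for CPU, in every container of the Pod):

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "1Gi"              # memory request == limit (and CPU likewise) → Guaranteed QoS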
4. For Java apps
Set:
- -Xmx = (limit - buffer)
- Use MemoryQoS if possible
- Avoid setting the limit too close to -Xmx (see the sketch below)
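One way to keep the heap comfortably below the container limit is to size the JVM relative to the cgroup limit instead of hard-coding -Xmx; a hedged sketch (the 75% figure is an illustrative starting point, not a universal rule):

spec:
  containers:
  - name: java-app             # illustrative name
    resources:
      limits:
        memory: "2Gi"
    env:
    - name: JAVA_TOOL_OPTIONS
      value: "-XX:MaxRAMPercentage=75.0"   # caps heap at ~1.5Gi, leaving headroom for metaspace, stacks, off-heap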
5. Do not rely solely on requests for memory protection
Requests don’t cap actual usage.
6. Tune kubelet eviction thresholds
Default values can be too aggressive or too lenient depending on node density.
Segment 4 Summary
You now fully understand:
1. Memory limits
- Enforced by kernel
- Hard cap → immediate OOM
2. Memory requests
- Influence scheduling & QoS
- Do NOT limit memory usage
3. Pod vs container memory
- Pod sum limit
- Container individual limit
4. Kubelet eviction
- Kills Pods to save node
- BestEffort → Burstable → Guaranteed priority
5. MemoryQoS
- Introduces memory.min & memory.high
- Smooths out memory behavior
- Great for Java & bursty workloads
6. Noisy neighbor behavior
- Memory isolation = strict
- CPU isolation = soft sharing