Kubernetes Resource Isolation - 04. MEMORY ISOLATION IN KUBERNETES USING CGROUPS

October 05, 2025  4 minute read  

Segment 4 is the most important one: memory isolation is the mechanism that actually kills containers, unlike CPU, which is compressible and merely throttled.

We’ll go deep into how Kubernetes + cgroups enforce memory isolation, how OOM decisions are made, and how eviction works.

SEGMENT 4 — MEMORY ISOLATION IN KUBERNETES USING CGROUPS

Structure of this segment:

  1. Memory basics: how cgroups control memory
  2. Memory limits (hard cap): memory.max / memory.limit_in_bytes
  3. Memory requests & QoS effects
  4. What actually happens when memory exceeds limit (OOM)
  5. Pod vs container memory enforcement
  6. Kubelet eviction (soft/hard thresholds)
  7. New Memory QoS (memory.min, memory.high)
  8. Noisy neighbor behavior
  9. Best practices for memory-heavy workloads

PART 1 — How cgroups enforce memory

Kubernetes (via the container runtime) relies on the Linux memory controller.

On cgroup v1 (older systems), files include:

  • memory.limit_in_bytes
  • memory.usage_in_bytes
  • memory.stat
  • memory.oom_control

On cgroup v2 (modern):

  • memory.max (hard limit)
  • memory.high (throttling / slow reclaim)
  • memory.current
  • memory.stat
  • memory.min (guaranteed allocation)

cgroup v2 is much more powerful and enables the Kubernetes MemoryQoS feature.
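These v2 interface files are plain text: memory.current holds a single byte count, and memory.stat holds flat "key value" lines. A minimal Python sketch of parsing such content (the sample values below are illustrative, not real kernel output):

```python
# Minimal sketch: cgroup v2 interface files are flat text.
# memory.stat contains "key value" lines; the sample below is
# illustrative, not actual kernel output.

def parse_memory_stat(text):
    """Parse cgroup v2 memory.stat content into a dict of byte counts."""
    stats = {}
    for line in text.strip().splitlines():
        key, value = line.split()
        stats[key] = int(value)
    return stats

sample = """\
anon 104857600
file 52428800
kernel_stack 262144
"""

stats = parse_memory_stat(sample)
print(stats["anon"])  # anonymous (heap/stack) memory, in bytes
```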


PART 2 — Memory Limits = Hard Cap

Kubernetes memory limit:

resources:
  limits:
    memory: "1Gi"

cgroup v1:

memory.limit_in_bytes = 1073741824

cgroup v2:

memory.max = 1073741824

This is an absolute, non-negotiable cap.
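The byte values above follow directly from the quantity string. As a rough sketch, here is how a binary-suffix quantity like "1Gi" maps to the value written into memory.max (this handles only the binary-suffix subset of the full Kubernetes quantity syntax):

```python
# Sketch: convert a Kubernetes memory quantity with a binary suffix
# (Ki/Mi/Gi/Ti) into the byte value written to memory.max (v2) or
# memory.limit_in_bytes (v1). Only a subset of the quantity syntax
# is handled here.

BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}

def quantity_to_bytes(q):
    for suffix, factor in BINARY_SUFFIXES.items():
        if q.endswith(suffix):
            return int(q[:-len(suffix)]) * factor
    return int(q)  # plain byte count

print(quantity_to_bytes("1Gi"))    # 1073741824
print(quantity_to_bytes("512Mi"))  # 536870912
```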

If memory usage hits the limit and the kernel cannot reclaim enough pages → the OOM killer terminates a process inside the cgroup → “OOMKilled”.

This happens immediately and does not depend on QoS.

Hitting the memory limit kills:

  • a process in that container (so the container, not the whole Pod, is restarted)
  • if the Pod-level cgroup limit is hit instead, the kernel kills a process in one of the containers under that Pod

PART 3 — Memory Requests → Scheduling + QoS Only

Unlike CPU:

Memory requests DO NOT create any memory control in cgroups (except when MemoryQoS is enabled)

Memory requests influence:

  • Pod placement (scheduler)
  • QoS class
  • Kubelet eviction priority
  • MemoryQoS (optional, creates memory.min / memory.high)

Requests DO NOT:

  • limit container memory
  • reserve physical memory
  • guarantee the Pod will not be OOMKilled

This surprises many engineers.


PART 4 — What Happens When Memory Is Exceeded

This is the key logic:

1. Container starts using memory

→ memory.current grows

2. When it approaches limit

→ kernel starts reclaiming pages inside the cgroup

3. When it hits memory.max

→ kernel picks a process in the cgroup and kills it → kubelet reports: OOMKilled

4. Container restarts (based on restartPolicy)

Who decides what to kill?

  • Linux kernel’s OOM killer
  • Based on:

    • badness score
    • memory consumption
    • oom_score_adj

Kubernetes does not choose which process dies.
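As a rough mental model (the real kernel logic in oom_badness() differs in detail), the victim is the process with the highest badness: memory footprint normalized to a 0-1000 scale, shifted by oom_score_adj, which the kubelet sets per QoS class (e.g. -997 for Guaranteed, 1000 for BestEffort):

```python
# Simplified model of OOM victim selection. The real kernel
# computation differs in detail; this only captures the shape:
# badness scales with memory footprint and is shifted by
# oom_score_adj, which the kubelet sets per QoS class.

def badness(rss_pages, total_pages, oom_score_adj):
    # Normalize footprint to a 0..1000 scale, apply the adjustment,
    # and clamp at zero (heavily protected processes score 0).
    score = rss_pages * 1000 // total_pages + oom_score_adj
    return max(score, 0)

TOTAL = 4 * 1024 * 1024  # node memory in pages (illustrative)

procs = {
    "guaranteed-db":  badness(2_000_000, TOTAL, -997),  # Guaranteed QoS
    "burstable-api":  badness(1_000_000, TOTAL, 500),   # Burstable QoS
    "besteffort-job": badness(200_000,   TOTAL, 1000),  # BestEffort QoS
}

victim = max(procs, key=procs.get)
print(victim)  # the BestEffort process dies first despite using the least memory
```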


PART 5 — Pod-Level vs Container-Level Memory Enforcement

Kubernetes creates:

Pod cgroup:

memory.max = sum(all container limits)

Container cgroup:

memory.max = per-container limit

This gives two failure modes:

1. Container hits its own memory limit → container OOMKill

  • Common case

2. Pod hits its aggregated limit → the kernel kills a process in one of its containers

  • Rare but happens when:

    • sidecars + main container exceed pod sum
    • multi-container pods share memory under the Pod cgroup
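A small sketch of the Pod-level cap (note: if any container in the Pod lacks a memory limit, the Pod cgroup is left uncapped instead):

```python
# Sketch: the Pod-level cgroup's memory.max is the sum of the
# containers' limits, when every container has one. Values in bytes.

containers = {
    "main":    1 * 2**30,    # 1Gi
    "sidecar": 256 * 2**20,  # 256Mi
}

pod_memory_max = sum(containers.values())
print(pod_memory_max)  # 1342177280 (1Gi + 256Mi)
```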

PART 6 — Kubelet Eviction (System-Level Memory Pressure)

Even if Pods haven’t exceeded their own limits, the node itself can come under memory pressure.

Kubelet periodically checks:

Hard eviction:

--eviction-hard=memory.available<500Mi

Kubelet immediately evicts Pods to recover memory.

Soft eviction:

--eviction-soft=memory.available<1Gi
--eviction-soft-grace-period=30s

Eviction order:

  1. BestEffort pods, and Burstable pods using more than their request, killed first
  2. among those, ranked by the biggest usage-over-request surplus
  3. Guaranteed pods (and Burstable pods within their request) last

This behavior is independent of container limits.
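The ordering above can be sketched as a sort key (simplified: the real kubelet ranking also factors in pod priority, which this model ignores):

```python
# Simplified sketch of the eviction ordering described above:
# pods are ranked by QoS class, then by how far usage exceeds
# the request. The real kubelet also considers pod priority.

QOS_RANK = {"BestEffort": 0, "Burstable": 1, "Guaranteed": 2}

pods = [
    # (name, qos, usage_mib, request_mib) -- illustrative numbers
    ("batch-job", "BestEffort", 300, 0),
    ("api",       "Burstable",  900, 200),
    ("cache",     "Burstable",  400, 300),
    ("db",        "Guaranteed", 800, 800),
]

def eviction_key(pod):
    name, qos, usage, request = pod
    surplus = usage - request
    # Lower QoS rank and larger surplus -> evicted earlier.
    return (QOS_RANK[qos], -surplus)

eviction_order = [p[0] for p in sorted(pods, key=eviction_key)]
print(eviction_order)  # ['batch-job', 'api', 'cache', 'db']
```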


PART 7 — MemoryQoS (New Feature)

Enabled with the kubelet MemoryQoS feature gate (requires cgroup v2):

--feature-gates=MemoryQoS=true

This introduces new cgroup settings:

memory.min = reserved memory (based on request)

Guarantees the cgroup retains at least this much memory: pages under this floor are protected from reclaim during node pressure.

memory.high = throttling boundary (based on limit)

Above this boundary, the kernel throttles the cgroup’s allocations and reclaims aggressively instead of OOM-killing immediately.

MemoryQoS noticeably improves behavior for:

  • Java applications
  • Node.js apps
  • Any app with bursty allocations

Without MemoryQoS, the memory limit is a hard cliff.

With MemoryQoS, behavior is smoother and fairer.
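Illustratively, the kubelet derives memory.min from the request and memory.high from the limit via a throttling factor. The exact formula and the default factor vary by Kubernetes version; the 0.9 below is an assumption for illustration only:

```python
# Illustrative sketch only: with MemoryQoS, memory.min comes from
# the request and memory.high from the limit scaled by a throttling
# factor. The exact kubelet formula and default factor depend on
# the Kubernetes version; 0.9 is an assumed value for illustration.

THROTTLING_FACTOR = 0.9  # assumed, not a guaranteed default

request = 512 * 2**20  # 512Mi
limit   = 1 * 2**30    # 1Gi

memory_min = request                        # reclaim-protected floor
memory_high = int(limit * THROTTLING_FACTOR)  # throttling boundary

print(memory_min)   # 536870912
print(memory_high)  # 966367641
```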


PART 8 — Noisy Neighbor Behavior

Memory is not compressible, so isolation is strict:

Scenario A — Pod with limit 512Mi tries to use 600Mi

→ immediate OOM

Scenario B — Pod with no limit (BestEffort)

  • Can use entire node memory
  • but will be the first killed during pressure

Scenario C — Two Burstable pods, one spikes

Pod A:

request=100Mi limit=1Gi

Pod B:

request=200Mi limit=1Gi

If Node memory is pressured:

  • B has a higher request → slightly more protection
  • A likely evicted earlier

Scenario D — Java apps

  • The JVM tends to grow its heap toward the container limit
  • Memory limit must include:

    • heap
    • metaspace
    • off-heap buffers
    • thread stacks
    • long-lived caches

Otherwise → frequent OOMKills.
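A back-of-the-envelope budget makes the point; all numbers below are made-up examples, not recommendations:

```python
# Illustrative budget for a JVM container's memory limit: the limit
# must cover far more than the heap. All numbers are made-up examples.

MIB = 2**20

footprint = {
    "heap (-Xmx)":      2048 * MIB,
    "metaspace":         256 * MIB,
    "off-heap buffers":  256 * MIB,
    "thread stacks":     200 * MIB,  # e.g. ~200 threads x 1MiB stack
    "codecache + misc":  128 * MIB,
}

total = sum(footprint.values())
limit = total + total // 10  # ~10% safety headroom (a judgment call)

print(total // MIB)  # 2888 MiB actually needed
print(limit // MIB)  # 3176 MiB limit with headroom
```

Setting the container limit to the heap size alone (2048Mi here) would under-provision by more than 800MiB and cause the frequent OOMKills described above.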


PART 9 — Memory Best Practices

1. Always set memory limits for production

Otherwise:

  • A single runaway pod can consume the node’s memory
  • Limitless pods fall into the BestEffort class, which is evicted first under pressure

2. Memory limit > memory request

This enables efficient bin-packing + burst ability.

3. Avoid equal request/limit unless you want Guaranteed QoS

Guaranteed = best stability, but it leaves no burst room.

4. For Java apps

Set:

  • -Xmx = (limit - buffer)
  • Use MemoryQoS if possible
  • Avoid setting the limit too close to -Xmx; leave room for metaspace and off-heap allocations

5. Do not rely solely on requests for memory protection

Requests don’t cap actual usage.

6. Tune kubelet eviction thresholds

Default values can be too aggressive or too lenient depending on node density.


Segment 4 Summary

You now fully understand:

1. Memory limits

  • Enforced by kernel
  • Hard cap → immediate OOM

2. Memory requests

  • Influence scheduling & QoS
  • Do NOT limit memory usage

3. Pod vs container memory

  • Pod sum limit
  • Container individual limit

4. Kubelet eviction

  • Kills Pods to save node
  • BestEffort → Burstable → Guaranteed priority

5. MemoryQoS

  • Introduces memory.min & memory.high
  • Smooths out memory behavior
  • Great for Java & bursty workloads

6. Noisy neighbor behavior

  • Memory isolation = strict
  • CPU isolation = soft sharing
