Kubernetes Resource Isolation - 04. MEMORY ISOLATION IN KUBERNETES USING CGROUPS
Segment 4 is the most important one, because memory isolation is the mechanism that actually kills containers; CPU, by contrast, is elastic and merely throttled.
We’ll go deep into how Kubernetes + cgroups enforce memory isolation, how OOM decisions are made, and how eviction works.
SEGMENT 4 — MEMORY ISOLATION IN KUBERNETES USING CGROUPS
Structure of this segment:
- Memory basics: how cgroups control memory
- Memory limits (hard cap): memory.max / memory.limit_in_bytes
- Memory requests & QoS effects
- What actually happens when memory exceeds limit (OOM)
- Pod vs container memory enforcement
- Kubelet eviction (soft/hard thresholds)
- New Memory QoS (memory.min, memory.high)
- Noisy neighbor behavior
- Best practices for memory-heavy workloads
PART 1 — How cgroups enforce memory
Kubernetes (via the container runtime) relies on the Linux memory controller.
On cgroup v1 (older systems), files include:
- memory.limit_in_bytes
- memory.usage_in_bytes
- memory.stat
- memory.oom_control
On cgroup v2 (modern):
- memory.max (hard limit)
- memory.high (throttling / slow reclaim)
- memory.current
- memory.stat
- memory.min (guaranteed allocation)
cgroup v2 is much more powerful and allows Kubernetes MemoryQoS.
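For orientation: with containerd and the systemd cgroup driver on cgroup v2, a container's memory files typically live under a path like the one below. The exact layout depends on the cgroup driver, runtime, and QoS class; the slice and scope names here are illustrative:

/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<uid>.slice/cri-containerd-<container-id>.scope/memory.max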
PART 2 — Memory Limits = Hard Cap
Kubernetes memory limit:
resources:
  limits:
    memory: "1Gi"
cgroup v1:
memory.limit_in_bytes = 1073741824
cgroup v2:
memory.max = 1073741824
This is an absolute, non-negotiable cap.
If memory usage hits the limit → kernel kills a process inside the cgroup → “OOMKilled”.
This happens immediately and does not depend on QoS.
Hitting the memory limit kills:
- that container, not the whole Pod
- unless the Pod-level cgroup limit is hit → then one container inside that Pod is chosen and killed
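To make the mapping concrete, here is a minimal Pod sketch (the names and image are illustrative) showing where the 1Gi above comes from:

apiVersion: v1
kind: Pod
metadata:
  name: memory-demo            # illustrative name
spec:
  containers:
  - name: app
    image: nginx               # any image; illustrative
    resources:
      requests:
        memory: "512Mi"        # scheduling + QoS only (see Part 3)
      limits:
        memory: "1Gi"          # written to the container cgroup as memory.max = 1073741824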
PART 3 — Memory Requests → Scheduling + QoS Only
Unlike CPU:
Memory requests DO NOT create any memory control in cgroups (except when MemoryQoS is enabled)
Memory requests influence:
- Pod placement (scheduler)
- QoS class
- Kubelet eviction priority
- MemoryQoS (optional, creates memory.min / memory.high)
Requests DO NOT:
- limit container memory
- reserve physical memory
- guarantee the Pod will not be OOMKilled
This surprises many engineers.
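For contrast, a hedged sketch of a container that declares only a request (values illustrative): the scheduler reserves 512Mi of allocatable capacity on some node, but nothing caps what the container actually uses.

resources:
  requests:
    memory: "512Mi"
  # no limits: on cgroup v2 the container's memory.max remains "max" (uncapped)
  # and, unless MemoryQoS is enabled, no memory.min / memory.high is written either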
PART 4 — What Happens When Memory Is Exceeded
This is the key logic:
1. Container starts using memory
→ memory.current grows
2. When it approaches limit
→ kernel starts reclaiming pages inside the cgroup
3. When it hits memory.max
→ kernel picks a process in the cgroup and kills it
→ kubelet reports: OOMKilled
4. Container restarts (based on restartPolicy)
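When this happens you can see it in the Pod status; an illustrative fragment of kubectl get pod -o yaml output (the container name and restart count are made up):

status:
  containerStatuses:
  - name: app
    restartCount: 3
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137          # 128 + 9 (SIGKILL)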
Who decides what to kill?
- Linux kernel’s OOM killer
It decides based on:
- the badness score
- memory consumption
- oom_score_adj (which Kubernetes sets per QoS class: roughly -997 for Guaranteed Pods, 1000 for BestEffort, and an in-between value for Burstable)
Kubernetes does not choose which process dies.
PART 5 — Pod-Level vs Container-Level Memory Enforcement
Kubernetes creates:
Pod cgroup:
memory.max = sum(all container limits) (set only when every container declares a memory limit; otherwise the Pod cgroup is left uncapped)
Container cgroup:
memory.max = per-container limit
This gives two failure modes:
1. Container hits its own memory limit → container OOMKill
- Common case
2. Pod hits its aggregated limit → one container chosen to die
Rare, but it happens when:
- sidecars + main container exceed pod sum
- multi-container pods share memory under the Pod cgroup
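A two-container sketch (names illustrative) makes the two levels concrete:

spec:
  containers:
  - name: main
    resources:
      limits:
        memory: "1Gi"          # container cgroup: memory.max = 1Gi
  - name: sidecar
    resources:
      limits:
        memory: "512Mi"        # container cgroup: memory.max = 512Mi
  # Pod cgroup: memory.max = 1Gi + 512Mi = 1536Mi (when every container sets a limit)

Each container is normally killed by its own cap first; the Pod-level value acts as an aggregate backstop for everything running under the Pod cgroup.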
PART 6 — Kubelet Eviction (System-Level Memory Pressure)
Even if Pods haven’t exceeded their own limits, the node can become memory pressured.
Kubelet periodically checks:
Hard eviction:
--eviction-hard=memory.available<500Mi
Kubelet immediately evicts Pods to recover memory.
Soft eviction:
--eviction-soft=memory.available<1Gi
--eviction-soft-grace-period=memory.available=30s
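The same thresholds can also be set declaratively in the kubelet config file; a hedged sketch using the KubeletConfiguration API (values are examples, not recommendations):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "500Mi"    # evict immediately below this
evictionSoft:
  memory.available: "1Gi"      # evict only after the grace period below
evictionSoftGracePeriod:
  memory.available: "30s"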
Eviction order:
- BestEffort pods are killed first
- then Burstable pods whose usage most exceeds their requests
- Guaranteed pods last
This behavior is independent of container limits.
PART 7 — MemoryQoS (New Feature)
Enabled with the MemoryQoS kubelet feature gate (requires cgroup v2):
--feature-gates=MemoryQoS=true
This introduces new cgroup settings:
memory.min = reserved memory (based on the request)
The kernel will not reclaim memory below this amount from the cgroup, even under node pressure.
memory.high = throttling boundary (derived from the limit via a throttling factor)
Above this, the kernel slows allocations and reclaims aggressively instead of OOM-killing immediately.
MemoryQoS dramatically improves:
- Java applications
- Node.js apps
- Any app with bursty allocations
Without MemoryQoS, the memory limit (memory.max) is a hard cliff.
With MemoryQoS, behavior is smoother and fairer.
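The feature can also be enabled through the kubelet config file; a hedged sketch (the memoryThrottlingFactor value is illustrative; the field was introduced together with MemoryQoS):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MemoryQoS: true
memoryThrottlingFactor: 0.9    # memory.high is derived from the limit using this factor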
PART 8 — Noisy Neighbor Behavior
Memory is not compressible, so isolation is strict:
Scenario A — Pod with limit 512Mi tries to use 600Mi
→ immediate OOM
Scenario B — Pod with no limit (BestEffort)
- Can use entire node memory
- but will be the first killed during pressure
Scenario C — Two Burstable pods, one spikes
Pod A:
request=100Mi limit=1Gi
Pod B:
request=200Mi limit=1Gi
If Node memory is pressured:
- B has a higher request → slightly more protection
- A likely evicted earlier
Scenario D — Java apps
- The JVM will size its heap close to the container limit
The memory limit must therefore cover all of:
- heap
- metaspace
- off-heap buffers
- thread stacks
- long-lived caches
Otherwise → frequent OOMKills.
PART 9 — Memory Best Practices
1. Always set memory limits for production
Otherwise:
- A single pod can take down the node
- BestEffort class gets worst scheduling & eviction priority
2. Memory limit > memory request
This enables efficient bin-packing + burst ability.
3. Avoid equal request/limit unless you want Guaranteed QoS
Guaranteed = best stability, but it removes burst headroom.
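A Guaranteed-class sketch is simply request == limit (and the same must hold for CPU, in every container of the Pod):

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "1Gi"              # memory request == limit (and CPU likewise) → Guaranteed QoS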
4. For Java apps
Set:
- -Xmx = (limit - buffer)
- Use MemoryQoS if possible
- Avoid setting the limit too close to -Xmx (see the sketch below)
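One way to keep the heap comfortably below the container limit is to size the JVM relative to the cgroup limit instead of hard-coding -Xmx; a hedged sketch (the 75% figure is an illustrative starting point, not a universal rule):

spec:
  containers:
  - name: java-app             # illustrative name
    resources:
      limits:
        memory: "2Gi"
    env:
    - name: JAVA_TOOL_OPTIONS
      value: "-XX:MaxRAMPercentage=75.0"   # caps heap at ~1.5Gi, leaving headroom for metaspace, stacks, off-heap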
5. Do not rely solely on requests for memory protection
Requests don’t cap actual usage.
6. Tune kubelet eviction thresholds
Default values can be too aggressive or too lenient depending on node density.
Segment 4 Summary
You now fully understand:
1. Memory limits
- Enforced by kernel
- Hard cap → immediate OOM
2. Memory requests
- Influence scheduling & QoS
- Do NOT limit memory usage
3. Pod vs container memory
- Pod sum limit
- Container individual limit
4. Kubelet eviction
- Kills Pods to save node
- BestEffort → Burstable → Guaranteed priority
5. MemoryQoS
- Introduces memory.min & memory.high
- Smooths out memory behavior
- Great for Java & bursty workloads
6. Noisy neighbor behavior
- Memory isolation = strict
- CPU isolation = soft sharing