Kubernetes Resource Isolation - 02. From PodSpec → QoS → cgroup layout → actual cgroup settings

October 04, 2025  3 minute read  

Part 1 — How Kubernetes Determines QoS Class

Kubernetes evaluates the resource requests and limits of every container in a Pod and then places the Pod as a whole into one of three QoS classes:

1. Guaranteed

A Pod is Guaranteed if every container:

  • sets both CPU and memory requests AND limits
  • has memory limit = memory request
  • has cpu limit = cpu request

Example:

resources:
  requests:
    cpu: "500m"
    memory: "512Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"

Also allowed (whole-unit quantities, as long as limit = request):

cpu: 1      # limit = request
memory: 1Gi

2. Burstable

A Pod is Burstable if:

  • at least one container sets a CPU or memory request or limit, AND
  • the Pod does not qualify as Guaranteed (for example, not every container has limit = request, or a container has requests but no limits)

Example:

resources:
  requests:
    cpu: "200m"
    memory: "512Mi"

(no limits → burstable)

3. BestEffort

A Pod is BestEffort if:

  • No container sets ANY requests or limits for CPU or memory.

Example:

resources: {}
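Whichever class applies, Kubernetes records the result in the Pod status, so you can check the decision directly (the pod name below is a placeholder):

kubectl get pod my-pod -o jsonpath='{.status.qosClass}'
# → Guaranteed | Burstable | BestEffort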

Part 2 — How QoS Determines cgroup Placement

On systemd+cgroupv2 (modern systems):

/sys/fs/cgroup/
  kubepods.slice/
    kubepods-pod<UID>.slice/       ← Guaranteed pods (no per-class slice)
    kubepods-burstable.slice/
    kubepods-besteffort.slice/

Inside the Burstable and BestEffort slices you get one cgroup per pod, then one cgroup per container; Guaranteed pod cgroups sit directly under kubepods.slice.

Full example path:

/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<UID>.slice/cri-containerd-<containerid>.scope

On cgroupfs (legacy):

/sys/fs/cgroup/cpu/kubepods/burstable/pod<UID>/<container-id>/
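A quick way to find the cgroup a running container actually landed in is to read /proc/<pid>/cgroup for one of its processes, or to list the QoS slices directly; the PID and paths below are placeholders:

# From the node: resolve a container process to its cgroup (PID is a placeholder)
cat /proc/12345/cgroup
# 0::/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<UID>.slice/cri-containerd-<id>.scope

# Or browse the per-QoS slices directly
ls /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/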

Why this matters:

  • Each QoS class has different baseline protections and different CPU/memory behaviors.
  • Guaranteed pods get best isolation.
  • BestEffort pods are always the first to be killed under pressure.

Part 3 — How Requests & Limits Map to cgroup Controller Settings

Let’s break it down by resource:


CPU Mapping

Requests → cpu.shares

For cgroup v1:

cpu.shares = request_cpu * 1024
  • Request: 100m → shares = 102 (102.4, truncated down)
  • Request: 1 CPU → shares = 1024

For cgroup v2:

cpu.weight = a value from 1–10000 (kubelet maps shares → weight internally)
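One way to see the mapping: a 500m request becomes 512 shares on v1, and the kubelet then converts shares to a weight. Below is a rough sketch of that conversion; the formula is an approximation of the kubelet's integer math and may not match every version, and the cgroup path is illustrative:

# shares → weight sketch (assumed formula: 1 + (shares - 2) * 9999 / 262142)
SHARES=512
echo $(( 1 + (SHARES - 2) * 9999 / 262142 ))   # ≈ 20

# Read the value the kubelet actually wrote
cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<UID>.slice/cpu.weight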

Limits → cpu.cfs_quota_us / cpu.cfs_period_us

Defaults:

  • cpu.cfs_period_us = 100000 (100ms)

If limit = 2 CPUs:

cpu.cfs_quota_us = 200000 (200ms)

This enforces hard max CPU.

If no CPU limit is set → no quota is written, and the container can burst across the node's idle CPU.
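On cgroup v2 the quota and period are combined into a single cpu.max file, so a 2-CPU limit shows up like this (path is illustrative):

# cgroup v2: "<quota> <period>" in one file
cat /sys/fs/cgroup/kubepods.slice/.../cpu.max
# 200000 100000   → 200ms of CPU time per 100ms period = 2 CPUs
# with no CPU limit the quota field reads "max"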


Memory Mapping

Memory Limit → memory.max (memory.limit_in_bytes on cgroup v1)

For cgroup v2:

memory.max = <limit bytes>

If memory limit is 1Gi:

memory.max = 1073741824

Hitting this causes:

  • kernel OOM inside the cgroup
  • Kubernetes sees container OOMKilled
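You can watch this from the node: memory.max holds the byte limit, and the oom_kill counter in memory.events increments each time the kernel kills a task in the cgroup (the path below is illustrative):

CG=/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<UID>.slice/cri-containerd-<id>.scope
cat $CG/memory.max                 # 1073741824 for a 1Gi limit
grep oom_kill $CG/memory.events    # counts OOM kills inside this cgroup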

Memory Request

This is NOT translated to any cgroup value on traditional setups.

However:

Memory Request affects:

  • QoS classification
  • scheduler bin-packing
  • eviction ordering
  • the kubelet MemoryQoS feature (alpha since v1.22), sketched below

    • maps request → memory.min
    • maps limit → memory.high (a throttling threshold below memory.max)
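A hedged sketch of what that would look like on disk for a container with a 512Mi request and a 1Gi limit, assuming the MemoryQoS feature gate is on (the exact memory.high math depends on the kubelet version and throttling factor; path is a placeholder):

CG=/sys/fs/cgroup/kubepods.slice/.../cri-containerd-<id>.scope
cat $CG/memory.min   # 536870912  ← the 512Mi request, protected from reclaim
cat $CG/memory.high  # below memory.max; derived from the limit and a throttling factor
cat $CG/memory.max   # 1073741824 ← the 1Gi limit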

Part 4 — cgroup settings per QoS class

Guaranteed
  • CPU: strong isolation (quota + shares)
  • Memory: hard memory limit; highest protection
  • Eviction: last to be evicted
  • Typical use: critical workloads

Burstable
  • CPU: shared; may be throttled at its limit
  • Memory: can burst up to its limit; the request gives some protection
  • Eviction: evicted after BestEffort
  • Typical use: most apps

BestEffort
  • CPU: lowest CPU share
  • Memory: no limit → can use all node memory, but killed first
  • Eviction: first to be evicted
  • Typical use: non-critical and debug jobs

Part 5 — Pod-level vs Container-level Enforcement

Kubernetes enforces limits at:

1. Container level

  • Hard memory limit → the container cannot exceed it (the kernel OOM-kills it)
  • Hard CPU limit → the container cannot exceed it (it gets throttled)

2. Pod-level

Memory:

  • Pod gets a cgroup with memory.max set, where (as long as every container has a memory limit):

    pod_memory_limit = sum(container_limits)
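For example (numbers assumed for illustration), a Pod with two containers limited to 1Gi and 512Mi gets a pod-level memory.max of 1610612736 bytes (path is illustrative):

cat /sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<UID>.slice/memory.max
# 1610612736 = 1073741824 (1Gi) + 536870912 (512Mi)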
    

CPU:

  • Quota is applied to each container, not to the whole Pod (for historical reasons)
  • But you can enable pod-level CPU cgroups (kubelet --cpu-cfs-quota plus the relevant feature gates)

When Pod-level CPU accounting is enabled, you get:

cpu.max at the Pod cgroup

Part 6 — Node Allocatable & How Pods Fit Into the Node’s Hierarchy

Before a Pod gets placed, kubelet ensures there is enough room based on:

Node Capacity
- kube-reserved
- system-reserved
- eviction-hard thresholds
= Node Allocatable

Only Node Allocatable is schedulable to Pods.

This prevents user pods from starving system components.
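On the kubelet side this is configured with the reservation flags (or the matching fields in the kubelet config file); the values below are purely illustrative, not recommendations:

kubelet \
  --kube-reserved=cpu=500m,memory=1Gi \
  --system-reserved=cpu=500m,memory=1Gi \
  --eviction-hard=memory.available<500Mi,nodefs.available<10%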

Node-level cgroups:

/sys/fs/cgroup/system.slice/     → OS daemons
/sys/fs/cgroup/kubelet.slice/    → kubelet itself
/sys/fs/cgroup/kubepods.slice/   → all pods

These top-level slices are created and managed by systemd.
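You can see this hierarchy on a node with systemd's own tooling:

# Dump the cgroup tree (slices, pod slices, container scopes)
systemd-cgls --no-pager

# Live per-cgroup CPU/memory usage
systemd-cgtop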


Part 7 — Putting It All Together (Example)

Example PodSpec:

apiVersion: v1
kind: Pod
metadata:
  name: api
spec:
  containers:
  - name: api
    resources:
      requests:
        cpu: "500m"
        memory: "512Mi"
      limits:
        cpu: "2"
        memory: "1Gi"

Result:

  • limit ≠ request → Burstable
  • Pod cgroup under:

    kubepods-burstable.slice
    

CPU:

shares = 0.5 CPU * 1024 = 512
quota  = 2 CPUs → cpu.cfs_quota_us = 200000
period = cpu.cfs_period_us = 100000

Memory:

memory.max = 1073741824 (1Gi)
No memory.min is set unless MemoryQoS is enabled.
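To verify, you could read the resulting files straight from the node (same placeholders as earlier; this assumes cgroup v2 with the systemd driver):

POD=/sys/fs/cgroup/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod<UID>.slice
cat $POD/cri-containerd-<containerid>.scope/cpu.max      # expect: 200000 100000
cat $POD/cri-containerd-<containerid>.scope/cpu.weight   # converted from the 512 shares
cat $POD/cri-containerd-<containerid>.scope/memory.max   # expect: 1073741824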


Conclusion of Segment 2

After Segment 2, you should have a clear mental model of:

  • How requests/limits classify the Pod
  • How QoS maps to cgroup hierarchies
  • How CPU/memory settings become real cgroup controller values
  • Pod vs container vs node-level cgroups
