Kubernetes Resource Isolation - 14. A catalog of **cluster design patterns**
Treat segment 14 as a catalog of cluster design patterns you can combine:
- How to slice the cluster into node pools
- How to slice workloads via namespaces, tenants, and QoS
- How to use taints/tolerations, priority classes, PDBs, and topology to control behavior
- When to make more clusters vs fewer clusters
I’ll keep each pattern fairly tight so you can remix them.
1. Node Pool Segmentation Patterns
1.1 General vs Specialized Pools
Pattern:
- `general-pool` for 80–90% of workloads
- One or more specialized pools:
  - `perf` (CPUManager, TopologyManager)
  - `gpu`
  - `batch`
  - `db` or `stateful`
Mechanics:
- Labels:
  kubectl label node node-1 node-pool=general
  kubectl label node node-2 node-pool=perf
- Taints on special pools:
  kubectl taint node node-2 perf-only=true:NoSchedule
- Workload spec:
  nodeSelector:
    node-pool: perf
  tolerations:
    - key: "perf-only"
      operator: "Exists"
      effect: "NoSchedule"
When to use: almost always. This is the baseline pattern.
1.2 Horizontal Isolation by “Noisy Class”
Separate node pools for:
- `system` (CNI, CSI, metrics, logging)
- `user-apps`
- `noisy-batch` (Spark, ETL, big cronjobs)
Idea: Keep noisy, spiky workloads from contaminating general services.
Mechanics:
- System DaemonSets:
  nodeSelector:
    node-role.kubernetes.io/system: "true"
- Batch node pool tainted:
  kubectl taint node batch-pool batch-only=true:NoSchedule
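The batch workloads then opt into the tainted pool the same way as in 1.1. A minimal sketch of the Pod template fragment, assuming the batch nodes also carry a `node-pool=noisy-batch` label (that label value is an assumption):
nodeSelector:
  node-pool: noisy-batch   # assumed label on the batch nodes
tolerations:
  - key: "batch-only"
    operator: "Exists"
    effect: "NoSchedule"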
1.3 Cost/Hardware Pools
Pools by machine type:
- `spot` or `preemptible`
- `standard`
- `high-mem`
- `ssd-local`
Use them like:
- Non-critical workers → `spot`
- Latency-critical → `standard`
- Memory-heavy → `high-mem`
- Spark/Redis → `ssd-local`
Key: every pool has labels & taints; workloads choose via `nodeSelector` / `nodeAffinity` plus tolerations.
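As a sketch of what that selection looks like for a non-critical worker on the `spot` pool (the `node-pool` label and the `spot-only` taint key are illustrative; clouds often apply their own spot/preemptible taints):
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node-pool
              operator: In
              values: ["spot"]
tolerations:
  - key: "spot-only"        # illustrative taint key
    operator: "Exists"
    effect: "NoSchedule"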
2. Namespace & Tenant Patterns
2.1 Namespace-per-team / namespace-per-product
Pattern:
- `team-a-dev`, `team-a-prod`
- `product-x-dev`, `product-x-prod`
Controls per namespace:
- ResourceQuota
- LimitRange
- NetworkPolicy
- RBAC
Example:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-a-quota
  namespace: team-a-prod
spec:
  hard:
    requests.cpu: "40"
    requests.memory: "80Gi"
    limits.cpu: "80"
    limits.memory: "160Gi"
    pods: "200"
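A companion LimitRange (mentioned above) gives containers sane defaults when teams forget to set requests/limits; this is a sketch, and the values are illustrative rather than a recommendation:
apiVersion: v1
kind: LimitRange
metadata:
  name: team-a-defaults     # illustrative name
  namespace: team-a-prod
spec:
  limits:
    - type: Container
      defaultRequest:       # applied when a container omits requests
        cpu: "100m"
        memory: "128Mi"
      default:              # applied when a container omits limits
        cpu: "500m"
        memory: "512Mi"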
When to use: Multi-team clusters, platform teams serving app teams.
2.2 Soft Multi-Tenancy vs Hard Multi-Tenancy
- Soft: Same cluster, tenants isolated via namespaces, quotas, network policies, RBAC. Most enterprises.
- Hard: Separate clusters per tenant or per BU, sometimes separate accounts/subscriptions.
Rules of thumb:
- If tenants can be semi-trusted & share infra → soft.
- If you need strong isolation, different compliance regimes, or hard security boundaries → multiple clusters.
3. Workload Admission & QoS Patterns
3.1 Enforce Requests & Limits via Policy
Use an admission policy (OPA/Gatekeeper, Kyverno, or built-in ValidatingAdmissionPolicy) to:
- Reject Pods without `resources.requests` & `resources.limits`
- Forbid BestEffort except for `debug` namespaces
- Enforce max/min resource sizes per namespace
Pattern:
- Default: require at least `requests` and `limits.memory`.
- Exception: a special `allow-bursty` namespace (see the sketch below).
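A minimal Kyverno sketch of that default-plus-exception rule, loosely based on the common require-requests-limits policy; field names vary a bit between Kyverno versions, so validate against your release:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-requests-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: validate-resources
      match:
        any:
          - resources:
              kinds: ["Pod"]
      exclude:
        any:
          - resources:
              namespaces: ["allow-bursty"]   # the exception namespace from the pattern above
      validate:
        message: "CPU/memory requests and a memory limit are required."
        pattern:
          spec:
            containers:
              - resources:
                  requests:
                    cpu: "?*"
                    memory: "?*"
                  limits:
                    memory: "?*"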
3.2 Priority Classes for SLO Layers
Define PriorityClasses like:
- `system-critical` (CNI, kube-dns)
- `platform-critical` (ingress, logging, metrics)
- `business-critical` (user-facing prod services)
- `batch` (ETL, reports)
- `best-effort` (preemptible stuff)
Example:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: business-critical
value: 900
globalDefault: false
Use in Pod spec:
priorityClassName: business-critical
Behavior:
- On resource pressure, lower-priority Pods get evicted first.
- Scheduler gives high-priority workloads first dibs on resources.
3.3 PodDisruptionBudget (PDB) + Autoscaling
Pattern:
- For every stateful or important stateless workload, define PDB:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-api-pdb          # illustrative name
spec:
  minAvailable: 2
  selector:
    matchLabels: {app: my-api}
Combine with:
- HPA for scale-out
- Cluster Autoscaler / Karpenter for node scale-out
This gives:
- Safe rollouts
- Safe node drain / spot preemption
- Enough replicas for resilience
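On the autoscaling side, a minimal HPA sketch for the same app (the Deployment name and thresholds are assumptions):
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-api
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-api            # assumed Deployment name
  minReplicas: 3            # keeps headroom above the PDB's minAvailable: 2
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70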
4. Topology & Failure-Domain Patterns
4.1 Spread Across Zones / Nodes
Use topology spread constraints or anti-affinity:
topologySpreadConstraints:
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: my-api
Or simpler:
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              app: my-api
Goal: avoid all replicas landing on the same node or in the same AZ.
4.2 Zone-aware Node Pools
Per cloud:
- Separate node pools per AZ
- Label nodes with zone
- Use `topologySpreadConstraints` to distribute workloads evenly
This prevents:
- All traffic going through a single zone
- Single-AZ outages taking the entire app down
5. Security & Network Isolation Patterns
5.1 Zero-Trust-by-default NetworkPolicy
Base policy in each namespace:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
Then explicit “allow” policies for:
- namespace-local communication
- calls to specific backends (DBs, APIs)
- calls to observability stack
Pattern: No ingress/egress allowed by default → everything opt-in.
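A sketch of one such opt-in policy, allowing namespace-local traffic plus DNS egress to kube-system (selectors and port are illustrative and depend on your DNS setup):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-namespace-local
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector: {}           # any Pod in the same namespace
  egress:
    - to:
        - podSelector: {}           # any Pod in the same namespace
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53                  # DNS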
5.2 Security Boundary Namespaces
For particularly sensitive apps, combine:
- Dedicated namespace
- Dedicated node pool (taints)
- Strict `NetworkPolicy`
- Stricter Pod Security admission (the PSP replacement), using the `restricted` profile
- Separate secrets store (external KMS, Vault, AKV, etc.)
This is a cluster-within-a-cluster pattern.
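A sketch of the Pod Security piece, enforcing the `restricted` standard on a sensitive namespace (the namespace name is illustrative):
apiVersion: v1
kind: Namespace
metadata:
  name: payments-prod                               # illustrative
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted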
6. Multi-Cluster Patterns
6.1 Env-tier Clusters
One of the most common:
- `prod` cluster(s)
- `nonprod` cluster(s) (dev/uat/stage)
Sometimes:
- `prod-us`, `prod-eu` (data residency)
Pros:
- Strong blast-radius isolation
- Simple mental model: “prod is sacred”
Cons:
- More control-plane overhead
- You need a GitOps story that understands multiple clusters (ArgoCD, Flux).
6.2 Function-based Clusters
Patterns like:
- `core-platform` cluster (ingress, observability, shared platform services)
- `app-tenant` clusters for main product lines
- `data` cluster for Kafka/Spark/Cassandra
This is helpful if:
- Data-plane loads are wildly different from API-plane loads
- Observability stack is heavy and you want to isolate it
7. Putting It Together – Example Design
Here’s a concrete cluster design pattern you can adapt:
Clusters
- `corp-nonprod`
- `corp-prod`
Node Pools in each cluster
- `system` (small, stable, for CNI/CSI/monitoring)
- `general` (default microservice nodes, D/E/m6i/n2)
- `perf` (CPUManager + TopologyManager, latency/CPU-critical)
- `batch` (cheaper, spot, larger nodes)
- `db` (memory-heavy, local SSD, tainted)
Namespaces
- `platform-system` (CNI, CSI, logging, metrics, ingress)
- `platform-observability` (Prometheus, Loki, Tempo, etc.)
- `team-a-dev`, `team-a-prod`
- `team-b-dev`, `team-b-prod`
- `shared-services` (auth, messaging, etc.)
Controls
- ResourceQuota + LimitRange per team namespace
- NetworkPolicy default-deny per namespace
- PriorityClasses:
  - `system-critical`
  - `platform-critical`
  - `business-critical`
  - `batch-low`
Scheduling hints
- Platform & observability → `system` & `general` pools
- Latency-critical apps → `perf` pool (Guaranteed, pinned CPUs)
- Spark jobs → `batch` pool (spot, large nodes, local SSD)
- Redis/DB → `db` pool (memory-heavy, local SSD)
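To make the latency-critical case concrete, here is a sketch of a Pod spec that combines the hints above: the `perf` pool, Guaranteed QoS (requests equal to limits, integer CPU for pinning), and the `business-critical` PriorityClass. The container name, image, and sizes are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: latency-critical-api        # illustrative
spec:
  priorityClassName: business-critical
  nodeSelector:
    node-pool: perf
  tolerations:
    - key: "perf-only"
      operator: "Exists"
      effect: "NoSchedule"
  containers:
    - name: api
      image: registry.example.com/api:1.0.0   # placeholder image
      resources:
        requests:
          cpu: "2"
          memory: "4Gi"
        limits:
          cpu: "2"
          memory: "4Gi"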
8. Quick design checklist
When you design or refactor a cluster, ask:
- Do I have at least two node pools? (general + something else)
- Are system components isolated or competing with apps?
- Do teams have clear namespace boundaries, quotas, and limits?
- Are BestEffort workloads controlled or confined?
- Do I have PriorityClasses & PDBs for production services?
- Are workloads spread across zones and nodes?
- Do sensitive workloads have network & node isolation?
- Do I need multiple clusters for prod vs nonprod or for legal isolation?
If the answer to most of these is “yes”, you’re in serious platform-engineering territory already.