Part 1 of a series on sizing a private AKS cluster. Part 2: ephemeral OS and a sizing table. Part 3: Cilium overlay, IP math, and capacity.

I needed to size a private AKS cluster from scratch: not just “how many nodes,” but which node pools, which Azure VM family, and which exact SKU for system components, platform charts, and application workloads.

This post is what I settled on for pool topology and VM family/SKU naming. Parts 2 and 3 cover ephemeral OS disks and networking/capacity.

The setup I was designing for

  • Private AKS with a dedicated system pool and separate user pools
  • Platform services (ingress, cert-manager, policy, secrets operators, etc.) — on the order of ten Helm charts, not ten pods
  • Application workloads on their own pool
  • Cluster autoscaler on each pool with different min/max per environment
  • Lean node subnet (I only get so many VNet IPs for nodes — networking detail in Part 3)

AKS only has two pool modes: System and User. “Platform” is not a third mode — it is a user pool with labels/taints.

Three logical tiers, two AKS modes

| Tier | AKS mode | What runs here | Why separate |
|---|---|---|---|
| System | System | CoreDNS, CNI, metrics, kube-system | Critical path; isolate from apps |
| Platform | User + taint | Traefik, cert-manager, Gatekeeper, ESO, Reloader, … | Shared cluster services; own scale bounds |
| Apps | User | Tenant / product workloads | Blast radius and sizing independent of platform |

Microsoft recommends a dedicated system node pool and running application work on user pools. I would not put Traefik or Gatekeeper on the system pool in production: you get resource contention and shared fate with CoreDNS and the CNI if a platform chart misbehaves.

For sandbox, a single pool is a tolerable shortcut. For anything that pretends to be prod, I wanted three pools.

Platform scheduling is ordinary Kubernetes:

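```yaml
# Pod spec fragment (e.g. a Deployment's spec.template.spec).
# AKS labels every node with agentpool=<pool name>.
nodeSelector:
  agentpool: platform
# Matches the taint on the platform pool (assumed: platform=true:NoSchedule).
tolerations:
  - key: platform
    operator: Equal
    value: "true"
    effect: NoSchedule
```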
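For anyone newer to taints, here is a simplified Python model of the two checks involved. It is a sketch, not kube-scheduler code. It only models nodeSelector equality and toleration matching (Equal/Exists operators, with an empty effect matching any effect), and it ignores affinity, resource requests, and every other scheduler rule. It assumes the platform pool was created with a `platform=true:NoSchedule` taint to match the toleration above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Taint:
    key: str
    value: str
    effect: str  # "NoSchedule", "PreferNoSchedule" or "NoExecute"


@dataclass(frozen=True)
class Toleration:
    key: str = ""
    operator: str = "Equal"  # "Equal" or "Exists"
    value: str = ""
    effect: str = ""  # empty effect matches every effect


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Simplified Kubernetes toleration matching."""
    if tol.effect and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        # Exists ignores the value; an empty key tolerates every taint.
        return tol.key in ("", taint.key)
    return tol.key == taint.key and tol.value == taint.value


def can_schedule(node_labels: dict, node_taints: list, node_selector: dict, tolerations: list) -> bool:
    # nodeSelector: every key/value pair must be present on the node.
    if any(node_labels.get(k) != v for k, v in node_selector.items()):
        return False
    # NoSchedule and NoExecute taints block placement unless tolerated;
    # PreferNoSchedule is only a soft preference.
    for taint in node_taints:
        if taint.effect == "PreferNoSchedule":
            continue
        if not any(tolerates(t, taint) for t in tolerations):
            return False
    return True


platform_labels = {"agentpool": "platform"}
platform_taints = [Taint("platform", "true", "NoSchedule")]
platform_tol = [Toleration(key="platform", value="true", effect="NoSchedule")]

print(can_schedule(platform_labels, platform_taints, {"agentpool": "platform"}, platform_tol))  # True
print(can_schedule(platform_labels, platform_taints, {}, []))  # False: taint not tolerated
print(can_schedule({"agentpool": "apps"}, [], {}, platform_tol))  # True: toleration alone does not pin
```

The last case is the design point. A toleration only permits a pod onto tainted nodes; it does not attract it there. That is why the platform charts need both the toleration and the nodeSelector.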

Default VM family: general-purpose D

Azure’s VM overview lists many families. For mixed Kubernetes workloads, the sensible default is general purpose (D-family):

| Family | Fit for my pools |
|---|---|
| D | Default: APIs, controllers, ingress, typical microservices |
| F | CPU-heavy batch/encoding; only if profiling proves the apps are CPU-bound |
| E | RAM-heavy JVM/caches; a second user pool later, not the first shared pool |
| B | Burstable; avoid for always-on AKS (throttling under sustained load) |
| L | Local-disk databases; not a generic app pool |

Microsoft’s “use larger node sizes to pack more pods per node” advice means bigger D SKUs (e.g. D8 instead of many D2s), not switching to F or E. DaemonSets and other per-node agents cost roughly the same on every node, so they amortize better when each node hosts more schedulable pods.
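A rough way to see the amortization argument: fix the raw vCPU a pool needs, then subtract a fixed per-node agent cost. The 0.5 vCPU per node is a made-up placeholder (measure your own DaemonSets), and the sketch ignores AKS's own kube-reserved reservations and max-pods limits.

```python
import math

# Hypothetical per-node DaemonSet footprint (CNI agent, log shipper, exporters, ...).
# Illustrative assumption only, not an AKS figure.
DAEMONSET_VCPU_PER_NODE = 0.5


def pool_shape(total_vcpu: int, node_vcpu: int, ds_vcpu: float = DAEMONSET_VCPU_PER_NODE):
    """Return (node count, vCPU left for workload pods) for a given raw capacity."""
    nodes = math.ceil(total_vcpu / node_vcpu)
    usable = nodes * (node_vcpu - ds_vcpu)
    return nodes, usable


for sku, vcpu in [("D2ds_v5", 2), ("D4ds_v5", 4), ("D8ds_v5", 8)]:
    nodes, usable = pool_shape(32, vcpu)
    print(f"{sku}: {nodes} nodes, {usable:.1f} vCPU left for pods")
```

With that placeholder, 32 vCPU built from D2s leaves 24 vCPU for pods, while the same 32 vCPU built from D8s leaves 30.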

My split:

  • System + platform: Standard_D4ds_v5 class (4 vCPU, 16 GiB)
  • Apps (prod): Standard_D8ds_v5 (8 vCPU, 32 GiB) for density
  • Apps (sandbox): D4ds_v5 is fine when cost matters

Reading the SKU name: why D4ds_v5 and not D4s_v5

Azure VM names encode features. For AKS nodes with ephemeral OS, the letters matter:

Standard_D4ds_v5 breakdown:

| Part | Meaning |
|---|---|
| D | General purpose family |
| 4 | 4 vCPUs |
| d | Local temp disk (where an ephemeral OS disk can be placed) |
| s | Premium Storage capable (remote disks) |
| v5 | Series generation |

Common traps:

| SKU | Local temp disk | Good AKS ephemeral node? |
|---|---|---|
| Standard_D4ds_v5 | ~150 GiB | Yes |
| Standard_D4s_v5 | None | No; managed OS disk only |
| Standard_D4as_v5 | None | No; AMD, diskless (Dasv5) |
| Standard_D4ads_v5 | ~150 GiB | Yes; AMD counterpart of D4ds_v5 |

Rule of thumb: if you want the ephemeral OS disk on local storage, look for a d in the feature letters (ds, ads).

System pool rules I almost got wrong

I initially considered D2 for the system pool to save money. Dedicated system pools have documented constraints (manage system node pools):

| Requirement | Detail |
|---|---|
| VM size | ≥ 4 vCPUs and ≥ 4 GB memory |
| Min nodes | At least 2; 3 recommended |
| B-series | Not supported for system pools |
| Spot | System pools cannot be Spot |

So Standard_D2ds_v5 (2 vCPU, 8 GiB) is out for a real system pool, even though it has enough RAM. Standard_D4ds_v5 is the practical floor. That is not vanity sizing; it is simply what it takes to clear the system pool bar.

A separate quotas page states softer limits (2 vCPU / 4 GB, phrased as “might not be used”). I designed to the stricter system-pool document.
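The system pool table translates directly into a pre-flight check. This is a minimal sketch, assuming you supply vCPU and memory yourself. The values below come from the Azure size tables: D2ds_v5 is 2 vCPU / 8 GiB, D4ds_v5 is 4 vCPU / 16 GiB, and B4ms is 4 vCPU / 16 GiB. `PoolSpec` and `system_pool_violations` are hypothetical names, and the check calls no Azure API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PoolSpec:
    sku: str
    vcpus: int
    memory_gib: float
    min_nodes: int
    spot: bool = False


def system_pool_violations(pool: PoolSpec) -> list[str]:
    """Check a proposed dedicated system pool against the requirements table."""
    problems = []
    if pool.vcpus < 4:
        problems.append(f"{pool.sku}: {pool.vcpus} vCPU < 4")
    # Azure lists memory in GiB; treating the 4 GB floor as 4 GiB is marginally stricter.
    if pool.memory_gib < 4:
        problems.append(f"{pool.sku}: {pool.memory_gib} GiB < 4")
    if pool.min_nodes < 2:
        problems.append(f"min nodes {pool.min_nodes} < 2")
    # Burstable sizes are all named Standard_B...
    if pool.sku.startswith("Standard_B"):
        problems.append(f"{pool.sku}: B-series not supported for system pools")
    if pool.spot:
        problems.append("system pools cannot be Spot")
    return problems


print(system_pool_violations(PoolSpec("Standard_D2ds_v5", 2, 8, 3)))
# ['Standard_D2ds_v5: 2 vCPU < 4']
print(system_pool_violations(PoolSpec("Standard_D4ds_v5", 4, 16, 3)))
# []
```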

A prod system minimum of 3 nodes pairs well with spreading the pool across three availability zones, when the region and SKU support it.
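Here is why 3 is better than 2 in this setup, as a back-of-the-envelope model. It assumes nodes are spread as evenly as possible across three zones (real placement is best-effort), and it ignores the autoscaler adding replacement capacity in the surviving zones. The function names are illustrative.

```python
def spread_across_zones(nodes: int, zones: int = 3) -> list[int]:
    """Idealized even placement: per-zone node counts."""
    base, extra = divmod(nodes, zones)
    return [base + (1 if i < extra else 0) for i in range(zones)]


def nodes_after_worst_zone_loss(nodes: int, zones: int = 3) -> int:
    """Nodes left if the most heavily populated zone goes down."""
    return nodes - max(spread_across_zones(nodes, zones))


for n in (2, 3, 5):
    print(n, spread_across_zones(n), nodes_after_worst_zone_loss(n))
```

With min 3, losing one zone still leaves 2 system nodes, which meets the documented minimum. With min 2, a zone outage can leave a single node carrying CoreDNS and the rest of kube-system.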

What I did not overthink

  • F/E families for a first shared app pool — D is enough until metrics say otherwise
  • Ddsv6 / NVMe for system/platform — Part 2 explains why v5 was enough for controllers
  • Putting platform on the system pool — tempting for small clusters, wrong trade for prod

Where this lands before Parts 2 and 3

| Pool | Mode | SKU (Intel default) | Autoscale (prod example) |
|---|---|---|---|
| System | System | Standard_D4ds_v5 | min 3 / max 5 |
| Platform | User | Standard_D4ds_v5 | min 2 / max 5 |
| Apps | User | Standard_D8ds_v5 | min 2 / max 5 |
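Summing the prod bounds in the table above gives the envelope the autoscaler can reach, and that is the number to hold up against regional vCPU quota. All three pools are Ddsv5, so they draw on the same VM-family quota. This is a minimal sketch: it ignores temporary surge nodes during upgrades, and the per-node vCPU counts are taken from the SKU names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    name: str
    sku: str
    vcpus_per_node: int
    min_nodes: int
    max_nodes: int


# Prod example from the pool table.
PROD = [
    Pool("system", "Standard_D4ds_v5", 4, 3, 5),
    Pool("platform", "Standard_D4ds_v5", 4, 2, 5),
    Pool("apps", "Standard_D8ds_v5", 8, 2, 5),
]


def footprint(pools):
    """Return (min nodes, max nodes, min vCPU, max vCPU) across all pools."""
    return (
        sum(p.min_nodes for p in pools),
        sum(p.max_nodes for p in pools),
        sum(p.min_nodes * p.vcpus_per_node for p in pools),
        sum(p.max_nodes * p.vcpus_per_node for p in pools),
    )


min_n, max_n, min_cpu, max_cpu = footprint(PROD)
print(f"nodes {min_n}-{max_n}, vCPU {min_cpu}-{max_cpu}")  # nodes 7-15, vCPU 36-80
```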

Disk sizes, ephemeral OS details, and the full environment matrix are in Part 2. Part 3 covers Cilium overlay, why a /26 subnet can still work, and quota vs capacity vs reservations.

References