Part 2 of a series on sizing a private AKS cluster: Part 1 — node pools and VM SKUs · Part 3 — Cilium overlay, IP math, and capacity

In Part 1 I split the cluster into system, platform, and apps pools and picked D-family SKUs with a d for local temp disk. This post covers the disk strategy, the full sizing table, and the Intel vs AMD vs v6 decisions that fed it.

Why ephemeral OS on AKS

AKS prefers ephemeral OS disks when the VM SKU allows: the OS volume lives on local temp storage instead of a remote managed disk. You get faster reimage/scale and lower latency for kubelet and image layers.

Relevant knobs:

| Setting | What I use |
| --- | --- |
| osDiskType | Ephemeral |
| kubeletDiskType | OS (images and kubelet data share the OS disk; simple) |
| osDiskSize / node-osdisk-size | Set explicitly per pool |

Set osDiskSize on purpose

On SKUs with large temp disks (150–300 GiB), leaving size unset can lead Azure/AKS to allocate a much larger ephemeral OS than you need — especially as defaults evolve on big temp SKUs.

I sized intentionally:

| Pool | osDiskSize | Why |
| --- | --- | --- |
| System | 64 GiB | Few images; CoreDNS/CNI only |
| Platform | 128 GiB | ~10 Helm charts, more image layers |
| Apps | 128 GiB | Pull-heavy apps; bump toward 150 only if image churn proves it |

Each cap must stay ≤ the SKU max temp (150 GiB on D4ds_v5 / D4ads_v5; 300 GiB on D8ds_v5 / D8ads_v5).
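The cap rule is easy to check mechanically before applying a change. A minimal sketch, hard-coding the pool sizes and temp-disk limits quoted in this post; the dictionary and function names are illustrative and not part of any Azure SDK.

```python
# Temp-disk sizes (GiB) quoted in this post for each SKU.
SKU_TEMP_GIB = {
    "Standard_D4ds_v5": 150,
    "Standard_D4ads_v5": 150,
    "Standard_D8ds_v5": 300,
    "Standard_D8ads_v5": 300,
}

# (pool, SKU, requested ephemeral OS disk size in GiB) from the sizing in this post.
POOLS = [
    ("system", "Standard_D4ds_v5", 64),
    ("platform", "Standard_D4ds_v5", 128),
    ("apps-prod", "Standard_D8ds_v5", 128),
    ("apps-sandbox", "Standard_D4ds_v5", 128),
]


def oversized_pools(pools, sku_temp_gib):
    """Return (pool, sku, requested, limit) for each pool whose OS disk exceeds the SKU temp disk."""
    problems = []
    for pool, sku, os_disk_gib in pools:
        limit = sku_temp_gib[sku]
        if os_disk_gib > limit:
            problems.append((pool, sku, os_disk_gib, limit))
    return problems


if __name__ == "__main__":
    for pool, sku, requested, limit in oversized_pools(POOLS, SKU_TEMP_GIB):
        print(f"{pool}: {requested} GiB exceeds {limit} GiB temp disk on {sku}")
```

With the sizes above the function returns an empty list; raising a D4 pool past 150 GiB would surface it here before AKS rejects the node pool.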

Ephemeral is for stateless nodes. Anything that must survive node loss belongs on PersistentVolumes, not the OS disk.

Ddsv5 vs Ddsv6: I stayed on v5

I compared Standard_D4ds_v5 and Standard_D4ds_v6. On paper, v6 wins local IOPS (NVMe temp vs SCSI temp on v5). For system and platform pools, I did not pay the v6 premium.

Two reasons:

  1. Cost — roughly tens of dollars per node per month adds up across three pools.
  2. Ephemeral OS on NVMe — Microsoft documents that with ephemeral OS on NVMe VMs, the OS path may not expose “full NVMe” performance the way raw local disk benchmarks suggest (NVMe temp FAQs). On a single-disk D4 class VM using the whole disk for OS + kubeletDiskType: OS, the headline 75k IOPS are often misleading for day-to-day node I/O.

v6 is still reasonable if you standardize the whole fleet on v6 or if v5 is unavailable in your region. It was not worth it just for Traefik and cert-manager.

AMD fallback: ads, not as

When Intel Ddsv5 hits regional capacity limits, Microsoft often points to AMD v5. The correct mirror SKU includes d:

| Intel | AMD fallback | Local temp |
| --- | --- | --- |
| Standard_D4ds_v5 | Standard_D4ads_v5 | 150 GiB |
| Standard_D8ds_v5 | Standard_D8ads_v5 | 300 GiB |

Standard_D4as_v5 is not equivalent. Dasv5 is diskless by design: the pricing calculator row showing N/A for local storage is the tell. AKS falls back to managed Premium OS disks, which is fine as a short-term capacity hack but breaks an ephemeral-first design.

I learned this the hard way after misreading “AMD alternative” as D4as_v5 instead of D4ads_v5.

Pre-flight:

az vm list-skus --location <region> --size Standard_D4ads_v5 --output table
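
The same pre-flight can be scripted against the JSON form of that command (`--output json`). A minimal sketch, assuming each SKU object carries a `capabilities` list of name/value pairs and a `restrictions` list, and that the `MaxResourceVolumeMB` capability reports the local temp disk size; check these field names against your own CLI output. The sample data is hand-written for illustration, not captured from Azure.

```python
import json


def temp_disk_gib(sku):
    """Local temp disk size in GiB, read from the (assumed) MaxResourceVolumeMB capability."""
    caps = {c["name"]: c["value"] for c in sku.get("capabilities", [])}
    return int(caps.get("MaxResourceVolumeMB", 0)) // 1024


def usable_for_ephemeral(sku, os_disk_gib):
    """True if the SKU has no restrictions and its temp disk can hold the requested OS disk."""
    return not sku.get("restrictions") and temp_disk_gib(sku) >= os_disk_gib


# Hand-written sample shaped like `az vm list-skus ... --output json`; values are illustrative.
SAMPLE = json.loads("""
[
  {"name": "Standard_D4ads_v5", "restrictions": [],
   "capabilities": [{"name": "MaxResourceVolumeMB", "value": "153600"}]},
  {"name": "Standard_D4as_v5", "restrictions": [],
   "capabilities": [{"name": "MaxResourceVolumeMB", "value": "0"}]}
]
""")

if __name__ == "__main__":
    for sku in SAMPLE:
        verdict = "ok" if usable_for_ephemeral(sku, 128) else "not suitable"
        print(f"{sku['name']}: {temp_disk_gib(sku)} GiB temp, {verdict} for a 128 GiB ephemeral OS disk")
```

A check like this would have caught the as/ads mix-up: a diskless SKU fails the temp-disk test regardless of regional availability.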

Reference sizing table

Template from my design exercise — adjust for your workloads and region.

SKUs and disks

| Pool | AKS mode | SKU | OS disk (ephemeral) |
| --- | --- | --- | --- |
| System | System | Standard_D4ds_v5 | 64 GiB |
| Platform | User + taint | Standard_D4ds_v5 | 128 GiB |
| Apps (prod) | User | Standard_D8ds_v5 | 128 GiB |
| Apps (sandbox) | User | Standard_D4ds_v5 | 128 GiB |

AMD: swap ds for ads at the same size tier.

Autoscale bounds

| Pool | Production | Non-prod | Sandbox |
| --- | --- | --- | --- |
| System | min 3 / max 5 | min 2 / max 5 | min 2 / max 5 |
| Platform | min 2 / max 5 | min 2 / max 5 | min 1 / max 5 |
| Apps | min 2 / max 5 | min 2 / max 5 | min 1 / max 5 |

Operational habits:

  • Set node_count = min_count when turning on the cluster autoscaler.
  • Sandbox min 1 on platform/apps saves money; you accept no HA during node drains.

Worst-case node count (prod, all pools at max)

5 + 5 + 5 = 15 nodes

With Cilium overlay (Part 3), that number drives node subnet IPs — not pod IPs.
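
The node-count arithmetic extends to the other environments and to the subnet check. A rough sketch using the autoscale bounds from this post; the /26 size comes from the subnet discussed in Part 3, the 10.0.0.0 prefix is a placeholder, and the figure of five reserved addresses per Azure subnet is the only number not taken from this post.

```python
import ipaddress

# (min, max) autoscaler bounds per pool, copied from the autoscale table in this post.
BOUNDS = {
    "production": {"system": (3, 5), "platform": (2, 5), "apps": (2, 5)},
    "non-prod": {"system": (2, 5), "platform": (2, 5), "apps": (2, 5)},
    "sandbox": {"system": (2, 5), "platform": (1, 5), "apps": (1, 5)},
}

AZURE_RESERVED_IPS = 5  # Azure reserves five addresses in every subnet


def node_range(pools):
    """Return (baseline, worst_case): the sum of pool minimums and the sum of pool maximums."""
    return (sum(lo for lo, _ in pools.values()), sum(hi for _, hi in pools.values()))


def usable_ips(cidr):
    """Addresses left in a subnet after subtracting Azure's reserved ones."""
    return ipaddress.ip_network(cidr).num_addresses - AZURE_RESERVED_IPS


if __name__ == "__main__":
    for env, pools in BOUNDS.items():
        low, high = node_range(pools)
        print(f"{env}: {low} nodes at rest, {high} at max")
    # Placeholder prefix; only the /26 size matters here.
    print("usable IPs in a /26:", usable_ips("10.0.0.0/26"))
```

Production ranges from 7 to 15 nodes, and a /26 leaves 59 usable addresses, so the worst case plus a few surge nodes fits as long as pods stay on the overlay.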

Rough vCPU quota ask

| Pool | Max nodes | vCPU each | Subtotal |
| --- | --- | --- | --- |
| System | 5 | 4 | 20 |
| Platform | 5 | 4 | 20 |
| Apps | 5 | 8 | 40 |
| Total | 15 | | 80 |

Add upgrade surge headroom: roughly one extra node per pool during a rolling upgrade, which is 4 + 4 + 8 = 16 vCPUs on top of the 80. That puts the peak at 96, so I asked for ~100 vCPUs in the Ddsv5 family when requesting quota.
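
The quota ask can be reproduced in a few lines. A small sketch using the production maximums and vCPU counts from the table in this section; the one-surge-node-per-pool figure is the rough assumption stated above, not an AKS default I am asserting.

```python
# (max nodes, vCPUs per node) per pool in production, from the quota table in this section.
POOLS = {"system": (5, 4), "platform": (5, 4), "apps": (5, 8)}

SURGE_NODES_PER_POOL = 1  # assumption: one extra node per pool during a rolling upgrade


def vcpu_quota(pools, surge_nodes=SURGE_NODES_PER_POOL):
    """Return (steady_state, with_surge) vCPU totals across all pools."""
    steady = sum(nodes * vcpus for nodes, vcpus in pools.values())
    surge = sum(surge_nodes * vcpus for _, vcpus in pools.values())
    return steady, steady + surge


if __name__ == "__main__":
    steady, with_surge = vcpu_quota(POOLS)
    print(f"steady state: {steady} vCPUs, with surge: {with_surge} vCPUs")
```

Rounding the surge total up to the next convenient figure leaves a little slack for a pool whose surge setting is higher than one node.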

Decisions I would make again

| Choice | Rationale |
| --- | --- |
| D4 system (not D2) | System pool ≥ 4 vCPU |
| D4 platform (not D8) | Controllers + ingress, not app-scale density |
| D8 apps in prod | Fewer, larger nodes for daemonset overhead |
| v5 over v6 | Cost; NVMe upside muted for ephemeral OS on controllers |
| 64 vs 128 GiB OS | Explicit caps; avoid “use all temp” surprises |

What’s next

Part 3 answers the question that kept me up at night with a /26 node subnet: how do fifteen autoscaled nodes and hundreds of pods fit? Short answer: overlay + Cilium — pods do not eat your /26. I also unpack max-pods vs autoscaler max_count, and quota vs Reserved Instances vs on-demand capacity reservation when a region runs out of Intel SKUs.

References