Part 3 of a series on sizing a private AKS cluster: Part 1 — node pools and VM SKUs · Part 2 — ephemeral OS and a sizing table

Parts 1 and 2 fixed three node pools, Ddsv5-class SKUs, and ephemeral OS sizes. I still had to answer: can a /26 subnet survive cluster autoscaler max on all pools, and what happens when Azure runs out of Intel VMs in a region?

Networking choice: Azure CNI + Cilium + overlay

I went with Azure CNI powered by Cilium in overlay mode:

az aks create \
  --network-plugin azure \
  --network-plugin-mode overlay \
  --network-dataplane cilium \
  --pod-cidr 192.168.0.0/16 \
  ...

Cilium gives you the data plane (eBPF, network policies without a separate engine) and drops kube-proxy on new clusters. The important sizing consequence is where pod IPs come from.

| IPAM style | Node IPs | Pod IPs | Small node subnet (/26) |
|---|---|---|---|
| Overlay + pod-cidr | Node subnet | Private overlay CIDR | Works for many nodes |
| Dedicated pod subnet | Node subnet | Separate VNet prefix | Node subnet can stay small |
| Node subnet (legacy) | Node subnet | Same subnet as nodes | Poor for dense clusters |

With overlay, my /26 is a node subnet, not a pod subnet.
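To confirm that an existing cluster really is in overlay mode, so that pod IPs come from the pod CIDR rather than the node subnet, a query along these lines can help. This is a sketch: the function name and the `my-rg` / `my-aks` arguments are hypothetical, and the property names are the ones I expect under `networkProfile` in `az aks show` output, so check them against your CLI version.

```shell
# Print the network settings that decide where pod IPs come from.
# Args: resource group, cluster name (both hypothetical placeholders).
show_network_profile() {
  local rg=$1 cluster=$2
  az aks show \
    --resource-group "$rg" \
    --name "$cluster" \
    --query "networkProfile.{plugin:networkPlugin, mode:networkPluginMode, dataplane:networkDataplane, podCidr:podCidr}" \
    --output table
}

# Usage: show_network_profile my-rg my-aks
```

For the cluster created above, I would expect the row to read azure / overlay / cilium with the 192.168.0.0/16 pod CIDR.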

/26 math when all pools hit autoscaler max

Azure reserves five addresses in every subnet.

| | Count |
|---|---|
| Addresses in /26 | 64 |
| Azure reserved | 5 |
| Usable | 59 |

From Part 2, production worst case:

| Pool | Max nodes |
|---|---|
| System | 5 |
| Platform | 5 |
| Apps | 5 |
| Total | 15 |

Nodes only: 59 − 15 = 44 IPs left before internal load balancers, private endpoints, or other NICs in the same subnet.

Planning formula:

sum(system_max, platform_max, apps_max) + other_subnet_ips ≤ 59
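The formula is easy to check in shell arithmetic. The sketch below uses two hypothetical helper names, `usable_ips` and `subnet_headroom`. It assumes only what is stated above: five Azure-reserved addresses per subnet and one IP per node.

```shell
# Usable IPs in an Azure subnet: 2^(32 - prefix) minus the 5 reserved addresses.
usable_ips() {
  local prefix=$1
  echo $(( (1 << (32 - prefix)) - 5 ))
}

# IPs left once every pool sits at autoscaler max and other consumers are counted.
# Args: subnet prefix, other_subnet_ips, then one max_count per pool.
# A negative result means the plan does not fit.
subnet_headroom() {
  local prefix=$1 other=$2
  shift 2
  local nodes=0 n
  for n in "$@"; do
    nodes=$(( nodes + n ))
  done
  echo $(( $(usable_ips "$prefix") - nodes - other ))
}

usable_ips 26                # prints 59
subnet_headroom 26 0 5 5 5   # prints 44
subnet_headroom 26 20 5 5 5  # prints 24
```

The last call holds back 20 IPs for non-node consumers and still leaves 24 addresses for additional nodes.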

I budget 10–20 IPs for non-node consumers, which puts a comfortable ceiling at around 45 nodes total across pools (59 − 20 = 39 at the cautious end, 59 − 10 = 49 at the other), provided nothing beyond that budget shares the subnet. My 5+5+5 max leaves plenty of room to raise max_count later.

Overlay pod CIDR is separate — plan it for max node count (a /16 pod CIDR is a common starting point). Pod count does not consume the /26.
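To size the pod CIDR against max node count, it helps to know how the overlay slices it. The sketch below rests on an assumption that comes from the AKS CNI Overlay documentation, not from this series: each node receives its own /24 out of the pod CIDR. The helper name `max_overlay_nodes` is hypothetical.

```shell
# Assumption: AKS overlay assigns each node one /24 from the pod CIDR.
# Args: pod CIDR prefix length, optional per-node slice length (default 24).
max_overlay_nodes() {
  local pod_prefix=$1 node_slice=${2:-24}
  if [ "$pod_prefix" -gt "$node_slice" ]; then
    # The pod CIDR is smaller than a single node slice.
    echo 0
  else
    echo $(( 1 << (node_slice - pod_prefix) ))
  fi
}

max_overlay_nodes 16   # prints 256
max_overlay_nodes 20   # prints 16
```

Under that assumption a /16 covers 256 nodes, far more than a /26 node subnet can hold, so the node subnet stays the binding constraint.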

What overlay does not put in the /26

  • Pod IPs (overlay)
  • Kubernetes ClusterIP addresses

max-pods is not the same as autoscaler max_count

The two are easy to conflate when reading IP planning docs written for flat Azure CNI.

| Knob | Limits |
|---|---|
| min_count / max_count | Number of nodes in the pool |
| max-pods | Pods per node (set at pool create) |

With overlay, the default max-pods is much higher than under legacy flat CNI (see "maximum pods per node" in the AKS docs), often 250. Tuning max-pods does not free /26 IPs when pods use overlay.

Dedicated system pools must support at least 30 pods per node (see the AKS use-system-pools doc). That is a capability floor, not a target of 30 system pods per node. D4ds_v5 + overlay clears it without thinking.

You might still lower max-pods on system/platform for kubelet overhead (e.g. 30–50), but that is an operational choice, not a subnet lever.
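To make the distinction concrete, here is a sketch of a pool definition with both knobs set side by side. The function name, pool name, VM size, and the value 50 are illustrative assumptions; the counts mirror the apps pool numbers used in this post.

```shell
# --min-count / --max-count bound how many node IPs the pool can take from the /26.
# --max-pods bounds pods per node and, with overlay, only draws on the pod CIDR.
# Args: resource group, cluster name (hypothetical placeholders).
add_apps_pool() {
  local rg=$1 cluster=$2
  az aks nodepool add \
    --resource-group "$rg" \
    --cluster-name "$cluster" \
    --name apps \
    --node-vm-size Standard_D4ds_v5 \
    --enable-cluster-autoscaler \
    --min-count 2 \
    --max-count 5 \
    --max-pods 50
}

# Usage: add_apps_pool my-rg my-aks
```

Raising `--max-count` is the only change here that eats into the node subnet.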

Cilium policy gotcha (Gatekeeper users)

On Azure CNI + Cilium, a NetworkPolicy ipBlock cannot match pod or node IPs, even when they fall inside a wide CIDR, so traffic to pod/node IPs may still be blocked despite a matching rule. Prefer namespaceSelector / podSelector. This matters if you run Gatekeeper or hand-written policies (Cilium FAQ).
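A selector-based rule might look like the sketch below. The namespaces, labels, and policy name are hypothetical. The `kubernetes.io/metadata.name` label is the one Kubernetes sets on every namespace automatically.

```shell
# Emit a NetworkPolicy that admits traffic by namespace and pod labels
# instead of by CIDR. All names and labels are illustrative.
render_policy() {
  cat <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-api
  namespace: apps
spec:
  podSelector:
    matchLabels:
      app: api
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: platform
          podSelector:
            matchLabels:
              app: gateway
EOF
}

# Usage: render_policy | kubectl apply -f -
```

Because `namespaceSelector` and `podSelector` sit in the same `from` entry, both must match: only gateway pods in the platform namespace are admitted.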

Three gates: quota, capacity, reservations

When node creates fail, three different things get blamed — they are not interchangeable.

| Concept | What it is | Typical error |
|---|---|---|
| Quota | Subscription permission for vCPUs in a family/region | QuotaExceeded |
| Capacity | Physical slots in region/zone | AllocationFailed, SKUNotAvailable |
| Reserved Instances | Prepaid billing discount | Deploy can still fail; no capacity guarantee |

Microsoft is explicit (Reserved VM Instances):

Reserved VM Instances provide a billing discount only and do not reserve or guarantee compute capacity.

On-demand capacity reservation is the mechanism that holds hardware for a specific VM size, region, and optional zone. You can attach it to AKS node pools at create with --crg-id (user-assigned identity required; cannot retrofit an existing pool).
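A rough sketch of that flow is below. The function and resource names are hypothetical. The flags are written from memory of the `az capacity reservation` and `az aks nodepool add` references, so treat them as assumptions and confirm each with `--help` before running. The user-assigned identity setup is not shown.

```shell
# Reserve capacity for a system-pool baseline, then create a pool bound to it.
# Args: resource group, cluster name, reservation group name, zone.
create_reserved_pool() {
  local rg=$1 cluster=$2 crg=$3 zone=$4
  # 1. The reservation group that the pool will reference.
  az capacity reservation group create -g "$rg" -n "$crg" --zones "$zone"
  # 2. The reservation itself: size and count must match what the pool needs.
  az capacity reservation create -g "$rg" -c "$crg" -n "${crg}-d4ds" \
    --sku Standard_D4ds_v5 --capacity 3 --zone "$zone"
  # 3. Look up the group's resource ID and attach it at pool create.
  local crg_id
  crg_id=$(az capacity reservation group show -g "$rg" -n "$crg" --query id --output tsv)
  az aks nodepool add \
    --resource-group "$rg" --cluster-name "$cluster" --name sysres \
    --mode System --node-vm-size Standard_D4ds_v5 \
    --node-count 3 --zones "$zone" \
    --crg-id "$crg_id"
}

# Usage: create_reserved_pool my-rg my-aks my-crg 1
```

The capacity of 3 matches a system-pool minimum rather than an autoscaler max, since every reserved slot bills whether or not a node occupies it.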

| Tool | Buys you |
|---|---|
| Quota increase | Right to deploy up to N vCPUs |
| Reserved Instance / Savings Plan | Cheaper hours |
| Capacity reservation + CRG | Baseline nodes more likely to exist |

You can combine RI + capacity reservation on the same baseline slots — discount on capacity you pay to hold.

Empty capacity reservation slots still bill. I would reserve capacity only for prod minimums (e.g. 3 system + 2 platform + 2 apps = 7 slots), not for every autoscaler max.

Regional Intel shortage: what actually helped

I hit a period of Intel general-purpose allocation failures in a region (Ddsv5 / Dsv5 families). Lessons:

Mitigation order

  1. Retry availability zones — constraints are often zonal (--zones 1 2 3).
  2. Stagger pool creates/scales — do not scale system + platform + apps simultaneously during a shortage.
  3. AMD with temp disk: Standard_D4ads_v5 / Standard_D8ads_v5 — not D4as_v5 (Part 2).
  4. Capacity reservation if CR creation succeeds in that region.
  5. Cobalt ARM (*pds_v6) only after validating Arm64 for every platform chart.
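Steps 1 and 2 amount to "retry, and do one pool at a time". A minimal sketch of that pattern follows. `retry_with_backoff` is a hypothetical helper, and the commented usage assumes placeholder resource names and the pool limits used in this post.

```shell
# Retry a command with exponential backoff.
# Args: max attempts, initial delay in seconds, then the command to run.
retry_with_backoff() {
  local max_attempts=$1 delay=$2
  shift 2
  local attempt=1
  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$(( delay * 2 ))
    attempt=$(( attempt + 1 ))
  done
}

# Staggered usage: touch one pool at a time and stop at the first hard failure.
# for pool in system platform apps; do
#   retry_with_backoff 5 60 az aks nodepool update \
#     --resource-group my-rg --cluster-name my-aks --name "$pool" \
#     --update-cluster-autoscaler --min-count 2 --max-count 5 || break
# done
```

Running the pools sequentially keeps a shortage from failing all three operations at once, and the backoff leaves time for capacity to free up between attempts.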

What did not help

  • Reserved Instances alone — billing, not allocation.
  • Dropping system to D2 — violates ≥ 4 vCPU system pool rule.
  • B-series — not supported for system pools anyway.
  • D4as_v5 as “equivalent”: it has no temp disk, so the OS disk falls back to the managed path (Part 2).

Pre-flight checks (SKU availability and restrictions in the region, then vCPU usage against quota):

az vm list-skus --location <region> --size Standard_D4ds_v5 --output table
az vm list-usage --location <region> --output table

Closing the loop

| Question | Answer |
|---|---|
| Three pools on a /26 with CA at max? | Yes, with Cilium overlay: 15 node IPs ≪ 59 usable |
| Room to raise max_count? | Roughly 24 to 34 more nodes, depending on whether 20 or 10 IPs are held back for non-node consumers |
| Stay on Intel D4ds_v5 during shortage? | RI for cost; CR for baseline; AMD ads for allocation; zones + retries first |
| max-pods vs subnet? | With overlay, tune max-pods for density, not /26 |

This series is the distillation of a sizing pass I wish had been one coherent doc when I started. Your numbers will differ — but the structure (three pools, d SKUs, explicit ephemeral size, overlay for pods, three capacity gates) transferred cleanly to production planning.

References