Automating kubeadm Init & Join on AWS: My Cloud Homelab Approach

When you're setting up a Kubernetes cluster using kubeadm, one of the first questions is:
“How do I automate the init/join logic without hardcoding IPs or manually copying tokens?”
In my AWS-based Kubernetes homelab, I wanted a fully automated, reproducible setup — including both control plane and worker nodes joining the cluster automatically as soon as they boot.
This blog explains how I accomplished that using:
EC2 instance tags and metadata
SSM Parameter Store (for secure state sharing)
Cloud-init & systemd (for boot-time logic)
🧱 Background
I built a custom AMI (Ubuntu-based) using Packer + Ansible, used by both control plane and worker nodes. At boot, every EC2 instance checks its role and automatically does one of the following:
If it's the control plane, run kubeadm init, install Cilium, and push the join command to SSM.
If it's a worker node, fetch the join command from SSM and run kubeadm join.
This results in zero manual steps, even when scaling the cluster.
🔑 The Strategy
Here's how I approached the automation:
1. Cloud-init triggers the logic on boot
In my AMI, I include this cloud-init config to run a custom systemd service:
#cloud-config
# /etc/cloud/cloud.cfg.d/99_k8s.cfg
runcmd:
  - systemctl daemon-reload
  - systemctl enable kubeadm-init.service
  - systemctl start kubeadm-init.service
This means the node’s role evaluation and bootstrapping start automatically at boot.
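For reference, kubeadm-init.service is a simple oneshot unit. A minimal sketch of what it could look like (the script path /usr/local/bin/k8s-bootstrap.sh is an assumption for illustration, not necessarily what the repo uses):

# /etc/systemd/system/kubeadm-init.service (sketch)
[Unit]
Description=Bootstrap Kubernetes node with kubeadm
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
RemainAfterExit=true
ExecStart=/usr/local/bin/k8s-bootstrap.sh

[Install]
WantedBy=multi-user.target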
2. Role detection via EC2 Metadata
Each EC2 node has a Role tag (k8s-control-plane or k8s-worker), and this script fetches it via the EC2 metadata service (IMDSv2). Note that access to tags through instance metadata must be enabled on the instance for this endpoint to exist:
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" \
  -H "X-aws-ec2-metadata-token-ttl-seconds: 21600")
ROLE=$(curl -s -H "X-aws-ec2-metadata-token: ${TOKEN}" \
  http://169.254.169.254/latest/meta-data/tags/instance/Role)
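With the role in hand, the bootstrap script can branch accordingly. A minimal sketch, using placeholder function names rather than the repo's actual code:

case "${ROLE}" in
  k8s-control-plane)
    # kubeadm init, install Cilium, publish join command to SSM
    bootstrap_control_plane
    ;;
  k8s-worker)
    # wait for the control plane, fetch join command from SSM, kubeadm join
    bootstrap_worker
    ;;
  *)
    echo "Unknown Role tag: ${ROLE}" >&2
    exit 1
    ;;
esac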
3. SSM Parameter Store for dynamic state sharing
Since the worker node needs the control plane’s IP and the join command, I used AWS SSM Parameter Store to store:
The control plane’s private IP
The control plane's private IP
The kubeadm join command, generated with kubeadm token create --print-join-command
Example upload on the control plane:
aws ssm put-parameter \
  --name "/k8s-homelab/control-plane-private-ip" \
  --value "$CONTROL_PLANE_PRIVATE_IP" \
  --type "String" --overwrite
And for the join command (as a SecureString):
aws ssm put-parameter \
  --name "/k8s-homelab/worker-node-join-command" \
  --value "$JOIN_COMMAND" \
  --type "SecureString" --overwrite
4. Workers: Wait, Fetch, and Join
To give the control plane time to initialize, workers wait 2 minutes, then:
Fetch the control plane IP and add it to /etc/hosts
Retrieve the join command from SSM
Execute kubeadm join
# Give the control plane time to finish initializing
sleep 120

CONTROL_PLANE_PRIVATE_IP=$(aws ssm get-parameter \
  --name "/k8s-homelab/control-plane-private-ip" \
  --query "Parameter.Value" --output text)

# Hostname below is a placeholder for the control plane's name
echo "${CONTROL_PLANE_PRIVATE_IP} k8s-control-plane" >> /etc/hosts

WORKER_NODE_JOIN_COMMAND=$(aws ssm get-parameter \
  --name "/k8s-homelab/worker-node-join-command" \
  --with-decryption --query "Parameter.Value" --output text)
eval "${WORKER_NODE_JOIN_COMMAND}"
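One prerequisite worth noting: both instance profiles need IAM permissions for this exchange, roughly ssm:PutParameter for the control plane and ssm:GetParameter (plus KMS decrypt rights for the SecureString) for the workers.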
🔄 What Could Be Improved?
❌ Avoid using never-expiring tokens: In my current setup, the kubeadm join token is created with --ttl 0, meaning it never expires. This is fine for bootstrapping, but in a production or long-lived setup, it's a security risk. Ideally, use a short TTL and regenerate as needed via automation.
⏳ Replace static wait with readiness checks: Right now, worker nodes wait a fixed 2 minutes before trying to join. A better approach would be to poll the SSM parameter or check API server health before proceeding; a sketch follows this list.
📡 Move to DNS-based discovery: Instead of writing the control plane's IP into /etc/hosts, I could use private DNS or AWS Cloud Map to dynamically resolve the control plane node.
📈 Explore scaling with Auto Scaling Groups (ASG): This current setup works well for static clusters, but I could extend it to support dynamic scaling by integrating with ASG and lifecycle hooks.
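To illustrate the readiness-check idea above, a worker could poll SSM until the join command appears instead of sleeping a fixed two minutes. A sketch, not the current repo code:

# Poll SSM for up to ~5 minutes instead of a blind sleep
for i in $(seq 1 30); do
  if WORKER_NODE_JOIN_COMMAND=$(aws ssm get-parameter \
      --name "/k8s-homelab/worker-node-join-command" \
      --with-decryption --query "Parameter.Value" --output text 2>/dev/null); then
    break
  fi
  echo "Join command not yet available, retrying (${i}/30)..."
  sleep 10
done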
🎯 Final Thoughts
This was a fun and educational challenge. I used this approach to strengthen my prep for the CKA certification, but it’s also laying the foundation for running production-grade workloads on a homelab cluster I fully understand and control.
📌 Curious about the full setup? Check out the GitHub repo:
👉 github.com/hoaraujerome/k8s-homelab
💡 Want to understand the design trade-offs and cost-saving decisions behind this setup?
👉 Read the blog post on design and cost decisions


