Summary

Overview

This course module provides an in-depth exploration of Kubernetes control plane architecture, high availability (HA) configuration, and operational practices using Minikube. It covers the critical components of the control plane—Kube API server and etcd—along with encryption requirements for inter-component communication, version lifecycle management, and the practical challenges of deploying and managing HA clusters. The session includes hands-on demonstrations of creating, scaling, and failing over HA control planes, inspecting etcd data stores, and understanding the limitations of Minikube’s HA implementation. Emphasis is placed on production-grade considerations such as resource isolation, certificate management, and the necessity of virtual IPs or cloud load balancers in HA topologies.

Topic (Timeline)

1. Control Plane Security and Encryption Principles [00:00:13.030 - 00:04:10.710]

  • Explains the need to encrypt etcd communication between control plane nodes due to sensitive, unencrypted data (e.g., secrets, config maps) stored in etcd.
  • Highlights the evolution from per-application TLS certificates in early Kubernetes to modern node-to-node and pod-to-pod encrypted tunnels, reducing certificate management complexity.
  • Describes the role of the Gateway API and Ingress controllers in terminating TLS at the cluster edge with a single certificate, while internal traffic is encrypted by the CNI plugin (e.g., Cilium); a configuration sketch follows this list.
  • Emphasizes that losing all Kube API servers renders the cluster non-operational: no new pods can be scheduled, existing pods keep running until a resource change is needed, and the cluster is left in a “brainless” state.
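
As a concrete illustration of the CNI-level encryption mentioned above, the following is a minimal sketch of enabling Cilium’s transparent WireGuard encryption through its Helm chart; the repository URL and value names follow Cilium’s public documentation, but the release name, namespace, and verification step are assumptions rather than commands shown in the session.

    # Enable node-to-node/pod-to-pod encryption in Cilium (illustrative values)
    helm repo add cilium https://helm.cilium.io/
    helm upgrade --install cilium cilium/cilium \
      --namespace kube-system \
      --set encryption.enabled=true \
      --set encryption.type=wireguard    # WireGuard tunnels between nodes

    # Confirm encryption is active on an agent (the in-pod binary may be named cilium-dbg in newer releases)
    kubectl -n kube-system exec ds/cilium -- cilium status | grep -i encryption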

2. High Availability Control Plane Design and Requirements [00:04:10.710 - 00:06:01.810]

  • Defines an HA control plane as essential for production: at least three control plane nodes are required to maintain etcd quorum (see the quorum arithmetic after this list).
  • Warns against running control plane nodes on VMs with shared, oversubscribed resources, where CPU can be throttled unpredictably (e.g., down to 0.01 vCPUs) and destabilize critical components.
  • Identifies the two most critical control plane components: Kube API server and etcd data store—without either, cluster state cannot be accessed or modified.
  • Notes that HA must be explicitly and correctly configured; a misconfigured 3-node setup with no true failover is functionally equivalent to a single-node cluster.
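
The odd-number rule follows from etcd’s quorum arithmetic: quorum = floor(n/2) + 1, and a cluster tolerates n minus quorum failures. The worked figures below are an illustration, not taken from the session.

    Nodes (n)   Quorum   Failures tolerated
    1           1        0
    3           2        1
    4           3        1
    5           3        2

A 4-node control plane therefore tolerates no more failures than a 3-node one, which is why even-sized control planes add cost without adding resilience.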

3. Minikube HA Setup and Node Management [00:06:03.490 - 00:13:47.300]

  • Demonstrates setting up a Minikube HA cluster with minikube start --ha --container-runtime=containerd, which spawns three containerized control plane nodes (minikube, minikube-m02, minikube-m03).
  • Shows the sequential node joining process and the time required for cluster stabilization.
  • Walks through deleting a control plane node (minikube node delete minikube-m02) and adding a replacement (minikube node add --role=control-plane), observing cluster recovery.
  • Addresses inotify watch limits on the host VM that cause node creation failures, resolving them with sysctl tuning of fs.inotify.max_user_watches and fs.inotify.max_user_instances (see the command sketch after this list).
  • Deletes a failed worker node (minikube node delete minikube-m05) and confirms cluster health.
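
A condensed command sketch of the workflow in this section; node names match the Minikube defaults used above, while the sysctl values are illustrative and the exact form of the node-add flag may vary between Minikube releases.

    # Create a three-node HA control plane on containerd
    minikube start --ha --container-runtime=containerd

    # Verify the topology
    kubectl get nodes -o wide
    minikube status

    # Remove a control plane node and add a replacement
    minikube node delete minikube-m02
    minikube node add --control-plane    # flag spelling may differ by minikube version

    # If node creation fails with inotify errors, raise the host limits (illustrative values)
    sudo sysctl fs.inotify.max_user_watches=1048576
    sudo sysctl fs.inotify.max_user_instances=8192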

4. Cilium CNI Integration in HA Clusters and Configuration Limitations [00:13:47.300 - 00:19:38.750]

  • Demonstrates deploying a 6-node HA cluster (3 control plane, 3 worker) with Cilium CNI using minikube start --ha --cni=cilium --container-runtime=containerd.
  • Reveals a critical flaw: Minikube’s --ha flag does not properly configure a Virtual IP (VIP) or integrate Cilium agents with the HA control plane endpoints.
  • Shows that deleting the primary control plane node (minikube) causes cluster failure—no automatic failover occurs, and new control plane nodes fail to join due to misconfigured CNI and lack of VIP.
  • Explains that a true HA setup requires: (1) an initial single control plane, (2) CNI installation, (3) a VIP or cloud load balancer, (4) joining additional control planes, (5) configuring the CNI to use the VIP, and (6) restarting services; a kubeadm/kube-vip sketch follows this list.
  • Concludes that Minikube’s current HA implementation is incomplete and does not reflect production-grade HA behavior.
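
For comparison with Minikube’s behaviour, the following is a hedged sketch of how the steps above are commonly realized with kubeadm and kube-vip on bare metal; the VIP address, interface name, and join credentials are placeholders, and flag details should be checked against the versions in use. The ordering differs slightly because kubeadm expects the shared control-plane endpoint at init time, so the VIP manifest is generated first.

    # 1. Generate a kube-vip static pod manifest on the first control plane node
    kube-vip manifest pod \
      --interface eth0 \
      --address 192.168.49.100 \
      --controlplane --arp --leaderElection \
      > /etc/kubernetes/manifests/kube-vip.yaml

    # 2. Initialise the first control plane against the VIP endpoint
    kubeadm init --control-plane-endpoint "192.168.49.100:6443" --upload-certs

    # 3. Install the CNI (e.g., Cilium) and point its kube-apiserver settings at the VIP

    # 4. Join the remaining control plane nodes (token and certificate key come from step 2 output)
    kubeadm join 192.168.49.100:6443 --control-plane \
      --token <token> --discovery-token-ca-cert-hash sha256:<hash> \
      --certificate-key <key>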

5. Kubelet and API Server Configuration Inspection [00:20:30.590 - 00:22:41.390]

  • Uses kubectl proxy to expose the Kube API server locally, then queries the kubelet config via curl and jq to extract:
    • Container runtime endpoint
    • Healthz port and bind address
  • Repeats the process on port 8080 to inspect the Kube API server configuration, confirming the API version (e.g., v1.34) and server address (example commands follow this list).
  • Notes the utility of jq over yq for JSON parsing due to richer filtering capabilities.
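
The probing described above can be reproduced with commands along these lines; the proxy port, node name, and jq filters are illustrative, and the configz response shape follows the KubeletConfiguration API rather than anything specific to this session.

    # Expose the API server locally (kubectl proxy defaults to port 8001)
    kubectl proxy --port=8001 &

    # Kubelet configuration for a node, filtered down to the fields of interest
    curl -s http://127.0.0.1:8001/api/v1/nodes/minikube/proxy/configz \
      | jq '.kubeletconfig | {containerRuntimeEndpoint, healthzPort, healthzBindAddress}'

    # API server version information (e.g., v1.34.x)
    curl -s http://127.0.0.1:8001/version | jq '.gitVersion'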

6. Etcd Data Store Inspection and Security Implications [00:23:31.760 - 00:32:45.100]

  • Demonstrates accessing the etcd pod (etcd-minikube in kube-system) via kubectl exec and using etcdctl to query the key-value store.
  • Uses etcdctl get --endpoints=... --cacert=... --cert=... --key=... --prefix / --keys-only to list keys without values.
  • Reveals that secrets (e.g., bootstrap tokens) and config maps are stored in plain text in etcd, highlighting the critical need for etcd encryption at rest and in transit.
  • Shows how to retrieve a specific secret value by copying the full key path (e.g., /registry/secrets/kube-system/bootstrap-token-...) and running etcdctl get without --keys-only (see the command sketch below).
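
A representative etcdctl invocation is shown below; the certificate paths are Minikube’s usual locations under /var/lib/minikube/certs and may differ in other distributions, and the key path in the final comment is a placeholder rather than the token shown in the session.

    # List all keys in etcd without printing their values
    kubectl -n kube-system exec etcd-minikube -- etcdctl \
      --endpoints=https://127.0.0.1:2379 \
      --cacert=/var/lib/minikube/certs/etcd/ca.crt \
      --cert=/var/lib/minikube/certs/etcd/server.crt \
      --key=/var/lib/minikube/certs/etcd/server.key \
      get / --prefix --keys-only

    # To read a single value, replace the last line with the full key copied from the listing,
    # dropping --prefix and --keys-only, e.g.:
    #   get /registry/secrets/kube-system/bootstrap-token-<id>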

7. HA Cluster Diagram Analysis and Production Topology [00:33:43.160 - 00:34:42.060]

  • Analyzes a diagram from the Kubernetes SIG team and identifies what it omits: a load balancer (in the cloud) or a VIP (on bare metal) in front of the control plane nodes.
  • Clarifies that cloud environments use cloud provider load balancers, while bare metal requires software-based VIPs (e.g., kube-vip).
  • Notes that the diagram lacks the cloud controller manager, which is necessary for cloud-native load balancing.

8. Cluster Lifecycle, Scaling, and Upgrade Practices [00:34:42.060 - 00:36:01.650]

  • Reviews key learnings:
    • Kubernetes minor versions are supported for roughly 12 months; clusters must be upgraded before EOL (e.g., v1.34 EOL ~2026).
    • Control plane nodes must be deployed in odd numbers (3, 5) to maintain etcd quorum.
    • Scaling down: drain → delete; scaling up: create → join → uncordon.
    • Upgrading a node: cordon → drain → upgrade → restart → uncordon (example commands follow this list).
    • Infrastructure-as-Code (IaC) enables cluster replacement: spin up new cluster → apply workloads via GitOps → switch DNS → monitor.
  • Emphasizes that HA is not achieved by --ha flag alone—it requires correct CNI, VIP, and service configuration.
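
The scaling and upgrade steps listed above translate into kubectl commands along the following lines; the node name is illustrative, and the upgrade step itself (kubeadm, package manager, or node replacement) depends on how the cluster was built.

    # Take a node out of service before upgrading it
    kubectl cordon minikube-m02                    # stop new pods from being scheduled here
    kubectl drain minikube-m02 --ignore-daemonsets --delete-emptydir-data

    # ...upgrade kubelet/containerd on the node, or replace the node entirely...

    # Return the node to service, or remove it when scaling down
    kubectl uncordon minikube-m02
    kubectl delete node minikube-m02               # only when permanently removing the node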

Appendix

Key Principles

  • etcd must be encrypted in transit and at rest; its data (secrets, config maps) is stored in plain text by default.
  • Kube API server and etcd are the two most critical control plane components; their failure renders the cluster non-functional.
  • HA requires more than multiple nodes: a VIP or cloud load balancer and proper CNI integration are mandatory for true failover.
  • Control plane nodes must not share resources with other workloads in production due to unpredictable CPU throttling.

Tools Used

  • kubectl proxy – to expose internal cluster APIs locally
  • curl + jq – to query and parse Kube API and kubelet configurations
  • etcdctl – to inspect and query the etcd key-value store
  • minikube node add/delete – to dynamically scale control plane and worker nodes
  • sysctl – to adjust Linux file watch limits for Minikube VM stability

Common Pitfalls

  • Assuming minikube start --ha creates a production-ready HA cluster (it does not).
  • Using shared-resource VMs for control plane nodes in production.
  • Not encrypting etcd traffic or data at rest.
  • Failing to configure a VIP or load balancer in front of control plane nodes.
  • Not upgrading clusters before EOL, leading to unsupported or insecure states.

Practice Suggestions

  • Always test HA failover in a non-production environment using Minikube or k3s.
  • Use etcdctl regularly to audit secrets and config maps in etcd.
  • Automate cluster upgrades using IaC and GitOps workflows (e.g., ArgoCD, Flux).
  • Monitor etcd health and disk usage in production clusters.
  • Use kubectl get nodes -o wide and minikube status to validate cluster state after any change.