10 videos 📅 2025-06-26 09:00:00 America/New_York

  • 2:14:39 (2025-06-26 09:07:32)
  • 1:12:32 (2025-06-26 09:11:34)
  • 6:42 (2025-06-26 11:08:41)
  • 35:51 (2025-06-26 11:24:37)
  • 38:41 (2025-06-26 13:21:35)
  • 20:37 (2025-06-26 15:06:35)
  • 51:46 (2025-06-27 09:06:19)
  • 58:45 (2025-06-27 09:06:25)
  • 36:01 (2025-06-27 11:26:09)
  • 1:12:38 (2025-06-27 13:45:09)

Course recordings on the DaDesktop training platform

Visit NobleProg websites for the related course

Visit outline: Kubernetes Comprehensive (Course code: kubernetescompr)

Categories: Kubernetes

Summary

Overview

This course session covers advanced Kubernetes networking with Cilium as the CNI, the challenges of deploying a high-availability cluster, and resource management via requests and limits. The instructor demonstrates deploying Cilium in HA mode on minikube, troubleshoots node and pod lifecycle issues, and explains the transition from Ingress to the Gateway API. The session then shifts to resource allocation best practices, including CPU and memory requests/limits, kubelet behavior under pressure, OOMKills, and practical methods for determining optimal resource settings. Real-world deployment pitfalls, such as image pull failures, startup probe timeouts, and misconfigured resource constraints, are explored through live debugging.

Topic (Timeline)

1. Cilium CNI Overview and HA Deployment Setup [00:00:00 - 00:05:24]

  • Introduced Cilium as a CNI that replaces kube-proxy, provides IPAM, and enables node-to-node and pod-to-pod encryption.
  • Clarified that Cilium by itself does not provide an external load balancer; one is still required in front of LoadBalancer-type Services.
  • Noted that the Gateway API is the future replacement for Ingress but was not installable on minikube in this context.
  • Began setting up a fresh minikube HA cluster with six nodes, identifying that existing cluster state had persisted and required cleanup (see the sketch after this list).
  • Discussed the steep learning curve of Kubernetes and the value of targeted review for certification prep.
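
A rough sketch of the cleanup and fresh HA start described above, assuming a recent minikube release (where the --ha, --nodes, and --cni flags are available); the exact flags used in the session are not recorded:

    minikube delete --all                        # wipe the persisted cluster state
    minikube start --ha --nodes 6 --cni cilium   # multi-control-plane cluster with Cilium as the CNI
    kubectl get nodes -o wide                    # confirm all six nodes register and report Ready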

2. minikube HA Cluster Initialization and Max Open Files Issue [00:05:24 - 00:06:26]

  • Encountered a failure during HA cluster startup because Docker needed more open file descriptors than Ubuntu 24.04’s default limits allow.
  • Identified the root cause: minikube runs Kubernetes inside Docker containers, which inherit the host’s system limits.
  • Applied three system-level commands to increase fs.file-max and ulimit values to resolve the issue (sketched after this list).
  • Verified cluster recovery by checking pod status and confirming Cilium components began initializing.
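
The exact three commands are not reproduced in these notes; the following is a sketch of the kind of host-level changes involved. Values are illustrative, and the inotify line is an assumption about a related limit that often needs raising for Docker-hosted clusters:

    sudo sysctl -w fs.file-max=1000000                 # raise the kernel-wide open-file ceiling
    sudo sysctl -w fs.inotify.max_user_instances=512   # assumed extra step; inotify limits commonly bite nested clusters
    ulimit -n 65535                                    # raise the per-shell open-file limit for the current session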

3. Cilium Component Status and Hubble Telemetry [00:06:26 - 00:07:39]

  • Ran cilium status to inspect component health: Cilium agent and Envoy were healthy; Hubble relay and ClusterMesh were disabled.
  • Explained Hubble as a telemetry and observability layer for visualizing network flows; requires explicit enablement.
  • Clarified that ClusterMesh allows multiple independent clusters to operate as a unified mesh, but was not configured.
  • Noted that the Hubble UI and telemetry require additional configuration and are not enabled by default in the deployment script (an enabling sketch follows below).
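
A sketch of inspecting and enabling those pieces with the Cilium CLI, assuming the cilium binary is installed (the exact flags used in the session are not recorded):

    cilium status --wait        # agent/operator/Envoy health, plus Hubble and ClusterMesh state
    cilium hubble enable --ui   # turn on the Hubble relay and the Hubble UI
    cilium hubble ui            # port-forward the UI locally to visualize network flows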

4. High-Availability Cilium Architecture and Limitations [00:07:39 - 00:10:01]

  • Examined Cilium components: a DaemonSet (agent + Envoy on each node), a single operator pod (not HA), and kube-vip providing the control-plane VIP.
  • Highlighted that Cilium’s operator is not HA by default, even when HA mode is requested.
  • Stated that to fully enable HA, Cilium must be configured with (see the values sketch after this list):
    • kube-vip VIP address for Cilium to use
    • kube-proxy replacement enabled
    • Hubble relay and UI enabled
    • Gateway API CRDs installed
  • Noted Gateway API requires HTTPRoute resources and can coexist with Ingress API during migration.
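
One way to express these settings is through Cilium’s Helm values. The snippet below is a sketch: the VIP address and port are placeholders for the kube-vip endpoint, and the Gateway API CRDs are assumed to be installed beforehand.

    # values-ha.yaml (sketch)
    kubeProxyReplacement: true
    k8sServiceHost: 192.168.49.100   # kube-vip VIP fronting the control plane (placeholder)
    k8sServicePort: 6443
    operator:
      replicas: 2                    # run the operator HA instead of the default single replica
    hubble:
      relay:
        enabled: true
      ui:
        enabled: true
    gatewayAPI:
      enabled: true                  # needs the Gateway API CRDs present first

Applied with something like helm upgrade --install cilium cilium/cilium -n kube-system -f values-ha.yaml --version 1.17.4.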

5. Cilium Deployment Analysis and Scripting Flaws [00:10:01 - 00:14:01]

  • Inspected the Cilium agent and operator pods: the agent is a DaemonSet, the operator a single-replica Deployment (inspection commands are sketched after this list).
  • Identified that the deployment script attempted to install Cilium before all nodes were ready, causing pod creation failures.
  • Confirmed Cilium image version: 1.17.4 pulled from quay.io (not Docker Hub, likely due to rate limits or cost).
  • Explained the Kubernetes operator pattern: a controller (the Cilium operator) manages the lifecycle of the resources it owns (here, the Cilium agents).
  • Drew parallel to other operators (e.g., MinIO, Loki) that manage stateful workloads via custom controllers.
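
The same inspection can be reproduced with standard kubectl commands; resource and label names below follow Cilium’s defaults:

    kubectl -n kube-system get daemonset cilium                    # agent + Envoy, one pod per node
    kubectl -n kube-system get deployment cilium-operator          # the single-replica operator
    kubectl -n kube-system get pods -l k8s-app=cilium -o wide      # where each agent pod landed
    kubectl -n kube-system describe deployment cilium-operator | grep -i image   # confirm the quay.io 1.17.4 image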

6. Pod Probes, ConfigMaps, and Node Affinity [00:14:01 - 00:19:41]

  • Analyzed liveness, readiness, and startup probes: a startup probe (105s window) was present on both the Envoy and Cilium agent containers, indicating robust startup handling.
  • Discussed probe semantics: readiness = ready to serve traffic, liveness = still running, startup = initial boot phase (a hedged manifest sketch follows below).
  • Found the Cilium Envoy ConfigMap, cilium-envoy-config, in the kube-system namespace; it holds the Envoy proxy configuration.
  • Identified a node affinity issue: the Cilium DaemonSet allows only one pod per node; the fifth node’s pod failed to schedule because the required ports were not yet available.
  • Noted that the script did not wait for node readiness before deploying, leading to repeated pod creation/deletion cycles.
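
A hedged sketch of the probe pattern described above; this is not the literal Cilium manifest, and the path, port, and thresholds are placeholders:

    containers:
      - name: cilium-agent
        startupProbe:
          httpGet: { path: /healthz, port: 9879, host: 127.0.0.1 }
          failureThreshold: 105      # tolerate a long boot before liveness takes over
          periodSeconds: 2
        readinessProbe:
          httpGet: { path: /healthz, port: 9879, host: 127.0.0.1 }
          periodSeconds: 30          # ready = able to serve traffic
        livenessProbe:
          httpGet: { path: /healthz, port: 9879, host: 127.0.0.1 }
          periodSeconds: 30          # alive = still running; failure triggers a restart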

7. DaemonSet Lifecycle and Self-Healing Behavior [00:19:41 - 00:22:01]

  • Reviewed DaemonSet events: multiple pod creation/deletion cycles occurred before six stable pods were running (watch commands are sketched after this list).
  • Emphasized this as expected Kubernetes self-healing behavior under unstable conditions.
  • Noted use of deprecated DaemonSet template, suggesting the deployment script is outdated despite using a recent Cilium image.
  • Confirmed Cilium runs in kube-system namespace and uses hostPath mounts for BPF, config, and socket access.
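
Ways to watch the churn-and-settle behaviour described above, using standard kubectl:

    kubectl -n kube-system describe daemonset cilium             # Events section records the create/delete cycles
    kubectl -n kube-system get events --sort-by=.lastTimestamp   # cluster-wide view of the same activity
    kubectl -n kube-system get pods -l k8s-app=cilium -w         # watch until six pods stay in Running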

8. CNI, Ingress, and Gateway API Transition [00:22:03 - 00:26:33]

  • Confirmed kube-proxy was replaced by Cilium, as no kube-proxy pods were present.
  • Reviewed the Ingress API: an Ingress resource plus a controller routes traffic; it is gradually being superseded by the Gateway API.
  • Explained the Gateway API: a Gateway (L4/L7 listener) plus HTTPRoute resources (routing rules); supports gateways shared across namespaces (a minimal manifest sketch follows below).
  • Highlighted Gateway API’s advantages: standardization, scalability, TLSRoute support (experimental).
  • Emphasized Cilium’s role as a CNI that supports both Ingress and Gateway API, and provides observability via Hubble.
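
A minimal Gateway API sketch for comparison with Ingress; the resource names, backend Service, and gatewayClassName are placeholders (Cilium registers its own GatewayClass when its Gateway API support is enabled):

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: shared-gateway
    spec:
      gatewayClassName: cilium
      listeners:
        - name: http
          protocol: HTTP
          port: 80
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: demo-route
    spec:
      parentRefs:
        - name: shared-gateway
      rules:
        - matches:
            - path: { type: PathPrefix, value: / }
          backendRefs:
            - name: demo-service
              port: 80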

9. Resource Requests and Limits Fundamentals [00:26:33 - 00:34:23]

  • Introduced resource requests (guaranteed minimum) and limits (maximum allowed).
  • Explained that total limits can exceed node capacity because not all pods use max simultaneously.
  • Described kubelet’s proactive eviction: under pressure (CPU/memory/disk), kubelet terminates pods to reclaim resources.
  • Discussed OOMKills: memory limit enforcement via kernel OOM killer; CPU limits via throttling.
  • Noted that a container may briefly exceed its request (e.g., during startup) but must not exceed its limit over time.
  • Specified the resource configuration syntax: spec.containers[].resources.requests.{cpu,memory} and limits.{cpu,memory} (illustrated below).
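
The syntax above in a minimal Pod manifest; the image and values are illustrative:

    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-demo
    spec:
      containers:
        - name: app
          image: nginx:1.27
          resources:
            requests:
              cpu: "250m"       # guaranteed minimum; what the scheduler reserves
              memory: "128Mi"
            limits:
              cpu: "500m"       # CPU beyond this is throttled
              memory: "256Mi"   # memory beyond this triggers an OOMKill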

10. Resource Sizing Strategies and Upstream Testing Pitfalls [00:34:23 - 00:37:17]

  • Advised two methods for determining resource values: 1) consult the application documentation, 2) empirical testing with monitoring (sketched after this list).
  • Warned that upstream teams (e.g., CNCF projects) often test in kind (Kubernetes-in-Docker), not production-like environments.
  • Highlighted that Helm chart upgrades with tight memory limits frequently cause OOMKills due to untested configurations.
  • Noted that CPU limits may be counterproductive for apps with short-term spikes (e.g., startup, batch jobs).
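
A sketch of the empirical approach (method 2), assuming metrics-server is available; on minikube it can be enabled as an addon:

    minikube addons enable metrics-server         # provides the data behind kubectl top
    kubectl top node                              # per-node headroom
    kubectl top pod --containers -n <namespace>   # observed CPU/memory per container under realistic load
    # Set requests near the observed steady state and limits above observed peaks, not near the average.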

11. Resource Configuration Debugging and Node Pressure [00:37:17 - 00:48:56]

  • Deployed nginx with a 12Mi memory request → it failed with “OOMKilled” and “minimum memory is 6Mi” on the instructor’s system, or “unknown” on a student’s.
  • Observed that container-creation errors may produce no logs if the pod never starts.
  • Discovered that requests cannot exceed limits: error message clearly indicated “7 must be ≤ 3” for CPU.
  • Used kubectl describe node to analyze allocatable resources: 16 CPUs, ~12.2Gi RAM (see the commands sketched after this list).
  • Calculated that 3 nginx pods with 7 CPU requests each exceeded node capacity (21 > 16), causing pending state.
  • Demonstrated that kube-scheduler respects requests, not limits, for scheduling.
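
Reproducing the checks above with standard kubectl; node and pod names are placeholders:

    kubectl describe node <node-name>   # the Allocatable section showed 16 CPU / ~12.2Gi in the session
    kubectl get pods                    # pods whose requests exceed free capacity sit in Pending
    kubectl describe pod <pod-name>     # Events typically show an "Insufficient cpu" message from the scheduler
    # Note: a request larger than its limit (e.g., cpu request "7" with limit "3") is rejected
    # at validation time, before scheduling is even attempted.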

12. Node Resource Utilization and Best Practices [00:48:56 - 00:52:25]

  • Adjusted resource requests to 6 CPU and 3Gi memory per pod → all three pods scheduled successfully.
  • Noted 98% CPU request utilization; emphasized that 90% is a practical upper bound to allow for system processes and spikes.
  • Confirmed that requests cannot consume 100% of a node’s raw capacity: the node’s allocatable figure already reserves headroom for system components, and the scheduler works against that figure.
  • Highlighted minikube’s value for learning despite limitations (e.g., network disconnects, resource constraints).

13. Review: Resource Management and Cluster Stability [00:52:26 - 00:57:50]

  • Summarized key concepts: node resources (CPU, memory, storage, pods, PIDs), requests (reservation), limits (cap).
  • Reinforced that kubelet monitors pressure and evicts pods before system failure.
  • Clarified that disk pressure often stems from unrotated logs; log shipping and rotation are critical in production.
  • Explained OOMKills can affect both node processes and pods; upgrades must be tested in production-like environments.
  • Noted that resource requests/limits are optional but strongly recommended.
  • Mentioned the pod-level resources spec (beta) as an alternative when per-container configuration is not accessible (a hedged sketch follows below).
  • Concluded: CPU limits = throttling, memory limits = OOMKill; startup probes are essential for complex apps.
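
A hedged sketch of the pod-level resources spec mentioned above; it assumes a cluster version and feature gate where PodLevelResources is available, and the field layout follows the upstream proposal:

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-level-resources-demo
    spec:
      resources:               # applies to the pod as a whole rather than per container
        requests:
          cpu: "1"
          memory: 512Mi
        limits:
          cpu: "2"
          memory: 1Gi
      containers:
        - name: app
          image: nginx:1.27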

Appendix

Key Principles

  • Cilium as CNI: Replaces kube-proxy, enables eBPF-based networking, encryption, and observability.
  • Gateway API: Modern replacement for Ingress; uses Gateway + HTTPRoute CRDs; supports multi-namespace routing.
  • Resource Requests: Guaranteed minimum; used by scheduler for node placement.
  • Resource Limits: Hard cap; enforced by kubelet via throttling (CPU) or OOMKill (memory).
  • Startup Probes: Critical for apps with long initialization; prevents premature restarts.
  • Node Pressure: Managed by the kubelet, which reclaims resources (e.g., via image garbage collection) before evicting pods.

Tools Used

  • minikube (HA mode with 6 nodes)
  • Cilium (v1.17.4)
  • Hubble (telemetry, disabled by default)
  • kube-vip (control plane VIP)
  • kubectl (for inspection and debugging)
  • Helm (implied for future Gateway API deployment)

Common Pitfalls

  • Installing Cilium before nodes are ready → repeated pod churn.
  • Not enabling Hubble relay/UI → missing observability.
  • Setting requests > limits → scheduler rejects pod.
  • Using Docker Hub for images → rate limiting → switch to quay.io.
  • Upgrading Helm charts without testing → OOMKills due to tight memory limits.
  • Ignoring log rotation → disk pressure → node instability.
  • Assuming HA mode automatically enables HA operators → Cilium operator remains single-replica unless explicitly configured.

Practice Suggestions

  • Deploy Cilium on minikube with HA mode and manually enable the Hubble relay and UI.
  • Simulate resource pressure by deploying pods with aggressive limits and observe eviction behavior.
  • Compare Ingress vs. Gateway API manifests for the same service.
  • Use kubectl describe node and kubectl get events to diagnose scheduling failures.
  • Test Helm chart upgrades on a local cluster mimicking production resource constraints.