10 videos 📅 2025-06-26 09:00:00 America/New_York

  • 2:14:39 (2025-06-26 09:07:32)
  • 1:12:32 (2025-06-26 09:11:34)
  • 6:42 (2025-06-26 11:08:41)
  • 35:51 (2025-06-26 11:24:37)
  • 38:41 (2025-06-26 13:21:35)
  • 20:37 (2025-06-26 15:06:35)
  • 51:46 (2025-06-27 09:06:19)
  • 58:45 (2025-06-27 09:06:25)
  • 36:01 (2025-06-27 11:26:09)
  • 1:12:38 (2025-06-27 13:45:09)

Course recordings on DaDesktop for Training platform

Visit NobleProg websites for the related course

Visit outline: Kubernetes Comprehensive (Course code: kubernetescompr)

Categories: Kubernetes

Summary

Overview

This course session provides a comprehensive introduction to Kubernetes architecture, components, and operational best practices, delivered by an experienced Kubernetes engineer and government contractor. The instructor covers foundational concepts including cluster design, control plane and worker node roles, key components (etcd, API server, scheduler, controller manager, kube-proxy, kubelet), Kubernetes distributions (flavors), high availability, self-healing mechanisms, and the causality dilemma in automation. The session also includes hands-on demonstrations using Minikube to verify environment setup, manage node labels, inspect container runtimes, and deploy a basic pod. The goal is to equip learners with the conceptual and practical knowledge needed to build, operate, and troubleshoot production-grade Kubernetes clusters, with emphasis on infrastructure-as-code, GitOps, and bare-metal deployment economics.

Topic (Timeline)

1. Introduction & Course Overview [00:00:00 - 00:04:56]

  • Instructor introduces background: a government contractor since 2016 who built their first Kubernetes cluster in Tampa and now designs and deploys production-grade clusters, primarily on bare metal.
  • Teaches clients on their own clusters, including troubleshooting and migrating cloud workloads onto Kubernetes (e.g., containerizing a 20-CPU Django server).
  • Outlines the 11-lesson structure: architecture fundamentals → namespaces/taints → workloads (Deployments, StatefulSets, DaemonSets) → Jobs/CronJobs → Services/DNS → Ingress → resource requests/limits → ConfigMaps/Secrets/PVs → cluster scaling/upgrades → troubleshooting → Helm chart deployment.
  • Emphasizes progressive, interactive learning: each lesson builds on the prior, with Lesson 11 integrating all prior concepts.
  • Mentions course schedule: 6 hours instruction/day, 1-hour total breaks (30-min lunch, two 15-min breaks), 30-min review at end of Day 1, 30-min Q&A on Day 2, post-course questionnaire.

2. Kubernetes Learning Environment: Minikube & Toy Clusters [00:04:56 - 00:14:08]

  • Explains use of Minikube on DaDesktop as a shared, interactive learning environment (since clients typically use their own clusters).
  • Minikube enables single-node and high-availability (3-control-plane-node) clusters using Docker containers as nodes (Docker-in-Docker).
  • Contrasts with other toy clusters: MicroK8s (single-node only), K3s (Raspberry Pi optimized), and Dr. K8s/Rancher Desktop (resource-heavy, unstable on Mac).
  • Clarifies that toy clusters are for learning CLI and concepts, not production; they are ephemeral, crash-prone, and lack full feature parity (e.g., no TLS encryption, no production-grade storage).
  • Demonstrates minikube version, kubectl get nodes, docker stats, and docker ps -a to inspect Minikube’s Docker container structure and resource limits (2 CPUs, ~2.9GB RAM).
  • Shows Minikube’s internal components: kube-apiserver, kube-scheduler, kube-controller-manager, kube-proxy, CoreDNS, storage-provisioner.
  • Demonstrates cluster reset: minikube stop, then minikube delete --all to wipe state and ensure a clean start (commands sketched after this list).
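
A minimal command sketch of the checks above, assuming a default Minikube installation with the Docker driver (exact output varies by machine):

minikube version                 # confirm the Minikube binary version
kubectl get nodes                # the single "minikube" node should report Ready
docker ps -a                     # the Minikube node itself runs as a Docker container
docker stats --no-stream         # container resource limits (e.g., 2 CPUs, ~2.9GB RAM)
minikube stop                    # shut the cluster down
minikube delete --all            # wipe all state for a clean start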

3. Kubernetes Fundamentals: Architecture, Components, and Design [00:14:08 - 00:28:20]

  • Defines Kubernetes (K8s): open-source system for automating deployment, scaling, and management of containerized applications.
  • Contrasts cloud provisioning (Terraform → AWS) with Kubernetes: in K8s, the user is the infrastructure engineer.
  • Explains shift from Terraform to Ansible (OS layer) + GitOps (workload state) for modern cluster management.
  • Introduces Argo CD as a GitOps tool: auto-syncs Helm charts from Git repos, provides UI, logs, events, and rollback via code push (see the Application manifest sketch after this list).
  • Describes cluster design process: balance workload needs and budget → engineer solution → automate with IaC → train DevOps on GitOps.
  • Highlights cost savings: client reduced $500K/year AWS bill to <$50K/year on bare metal using Kubernetes.
  • Kubernetes release cycle: 3 active minor releases; dot-zero for dev/testing; dot-two recommended for production; 4-month release cadence.
  • Introduces core components (inspected in the sketch after this list):
    • Control Plane: API Server, etcd, Scheduler, Controller Manager, Cloud Controller Manager (not used on bare metal).
    • Nodes: kubelet (agent), kube-proxy (legacy), container runtime.
    • etcd: leader-based key-value store; requires odd-numbered nodes (1, 3, 5) for HA; NVMe/SSD recommended; TLS and encryption-at-rest for security.
    • API Server: REST API endpoint and entry point for all cluster control; if every control plane node is down, the cluster can no longer be managed.
    • Scheduler: assigns pods to nodes based on resource and constraints.
    • Controller Manager: maintains desired state via control loops (replication, namespace, endpoints controllers).
    • kube-proxy: legacy network proxy; replaced by modern CNI (e.g., Cilium) in production.
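
As a quick reference for the components above, the sketch below lists them as running pods; it assumes the default Minikube/kubeadm layout, where they live in the kube-system namespace:

# Control plane and node components: kube-apiserver, etcd, kube-scheduler,
# kube-controller-manager, kube-proxy, CoreDNS, storage-provisioner
kubectl get pods -n kube-system -o wide

# Inspect a single component, e.g. the API server static pod
kubectl describe pod -n kube-system -l component=kube-apiserver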
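
To make the Argo CD bullet concrete, here is a hedged sketch of an Application resource that auto-syncs a Helm chart from Git; it assumes Argo CD is already installed in the argocd namespace, and the repository URL, chart path, and target namespace are placeholders rather than values from the session:

cat <<'EOF' | kubectl apply -f -
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: demo-app                 # hypothetical application name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/charts.git   # placeholder Git repo
    targetRevision: main
    path: charts/demo-app        # Helm chart directory inside the repo
  destination:
    server: https://kubernetes.default.svc
    namespace: demo
  syncPolicy:
    automated:
      prune: true                # remove resources that were deleted from Git
      selfHeal: true             # revert manual drift back to the Git-declared state
EOF

With automated sync enabled, a rollback is simply a Git revert followed by the next sync, matching the "rollback via code push" workflow described above.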

4. Kubernetes Flavors, Security, and Node Management [00:28:20 - 00:49:43]

  • Defines “flavors” as Kubernetes distributions: Vanilla (original), RKE2 (enterprise), OpenShift (DOD STIG), K3s (Raspberry Pi), MicroK8s, Minikube, Dr. K8s, Rancher Desktop.
  • Warns against using K3s for government contracts: federal agencies now deny payment for K3s-based products due to toy-cluster limitations.
  • Explains security features: TLS between etcd nodes, encryption-at-rest for secrets, node-to-node and pod-to-pod encryption via CNI.
  • Highlights Cilium as preferred CNI: provides transparent encryption, Gateway API support, and full feature set out-of-the-box.
  • Discusses Gateway API: replaces Ingress with centralized TLS termination at the gateway, eliminating the need for 500+ individual certificates (see the listener sketch after this list).
  • Explains node roles: control plane vs. agent (formerly worker); optional dedicated storage nodes to avoid resource contention.
  • Node limits: max 110 pods/node due to /24 CIDR block allocation (256 IPs minus reserved).
  • Node labeling: add with kubectl label node <name> key=value; remove with kubectl label node <name> key- (sketched after this list).
  • Emphasizes importance of control plane node isolation: avoid shared VMs (risk of resource starvation → cluster failure).
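
For the Gateway API point, a hedged sketch of a single Gateway listener terminating TLS for the whole cluster; it assumes a controller with Gateway API support (e.g., Cilium, whose GatewayClass is named cilium), and the gateway and certificate names are placeholders:

cat <<'EOF' | kubectl apply -f -
apiVersion: gateway.networking.k8s.io/v1
kind: Gateway
metadata:
  name: main-gateway             # hypothetical gateway name
spec:
  gatewayClassName: cilium       # assumes Cilium's Gateway API support is enabled
  listeners:
    - name: https
      protocol: HTTPS
      port: 443
      tls:
        mode: Terminate          # TLS ends at the gateway, not at each backend service
        certificateRefs:
          - name: wildcard-cert  # one shared certificate instead of one per service
EOF

Individual services then attach HTTPRoute resources to this Gateway and need no certificates of their own.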
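
A short sketch of the labeling commands, using the single Minikube node from the lab environment:

kubectl label node minikube node-type=test     # add a key=value label
kubectl get nodes --show-labels                # verify the label is present
kubectl label node minikube node-type-         # a trailing dash removes the label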

5. High Availability, Self-Healing, and the Causality Dilemma [00:49:43 - 00:57:52]

  • Defines Day One vs. Day Two clusters: Day One = ephemeral (dev/staging), rebuildable via IaC; Day Two = persistent (production), never modified directly.
  • HA control plane: leader election + VIP (via MetalLB or similar) ensures API accessibility during leader changes.
  • HA worker nodes: multiple replicas of stateless workloads + load balancer + auto-scaling.
  • Pod anti-affinity: ensures pods distribute evenly across nodes (e.g., avoid 3 pods on 1 node).
  • Self-healing mechanisms (a quick demo sketch follows this list):
    • Container restarts (based on restart policy).
    • Service endpoint removal for failed pods.
    • Persistent volume reattachment on node failure.
    • Pod replacement via Deployment/StatefulSet/DaemonSet.
  • Causality dilemma: circular dependency in automation (e.g., CNI needs VIP, but VIP requires CNI to be installed).
    • Example: CNI configured with static IP → VIP assigned → CNI must be reconfigured with VIP → requires reinstallation.
    • Commonly ignored in automation scripts → leads to non-production-ready clusters.
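
A quick way to watch pod-replacement self-healing in the lab cluster; the workload name and image below are illustrative, not from the session:

kubectl create deployment web --image=nginx --replicas=3   # stateless workload with 3 replicas
kubectl get pods -l app=web -o wide                        # note which node each pod landed on
kubectl delete pod <one-of-the-web-pods>                   # simulate a pod failure
kubectl get pods -l app=web --watch                        # the Deployment controller starts a replacement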

6. Practical Lab: Environment Verification, Runtime, Labels, and Pod Deployment [00:57:52 - 01:11:50]

  • Verifies Kubernetes version via kubectl version (not Minikube output).
  • Notes EOL implications: Kubernetes v1.33.1 is treated as a development version; production should stay on v1.32.x; the upgrade path is either an in-place upgrade or full cluster replacement via GitOps.
  • Checks container runtime: kubectl get nodes -o wide → shows containerd (modern default, replaced Docker).
  • Demonstrates Minikube bug: reports Ubuntu 22 despite host being Ubuntu 24.
  • Resets Minikube environment: minikube stop, minikube delete --all, restarts with --container-runtime=containerd.
  • Labels node: kubectl label node minikube node-type=test.
  • Removes label: kubectl label node minikube node-type-.
  • Assigns role label: kubectl label node minikube node-role.kubernetes.io/control-plane="".
  • Creates and applies a minimal Pod YAML via vim: kind: Pod, metadata, and spec.containers name/image (condensed sketch after this list).
  • Demonstrates kubectl get pods -o wide to observe pod status.
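
A condensed sketch of the lab steps above, assuming the Minikube environment from earlier; the pod name and nginx image are illustrative choices, since the session only specifies a minimal name/image pair:

minikube stop && minikube delete --all             # reset to a clean state
minikube start --container-runtime=containerd      # restart on the containerd runtime

kubectl version                                    # client and server versions
kubectl get nodes -o wide                          # shows container runtime and OS image

# assign a role label (add --overwrite if Minikube already set it)
kubectl label node minikube node-role.kubernetes.io/control-plane=""

cat <<'EOF' > pod.yaml                             # minimal Pod manifest (written in vim in the session)
apiVersion: v1
kind: Pod
metadata:
  name: test-pod                                   # hypothetical pod name
spec:
  containers:
    - name: web
      image: nginx                                 # illustrative image
EOF
kubectl apply -f pod.yaml
kubectl get pods -o wide                           # watch the pod go from Pending to Running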

Appendix

Key Principles

  • Infrastructure as Code (IaC): Essential for reproducible, scalable Kubernetes deployments. Use Ansible for OS provisioning, GitOps (Argo CD) for workload management.
  • GitOps: Declarative state management via Git repositories; enables automated rollbacks and upgrades via pull requests.
  • Production Readiness: Avoid dot-zero Kubernetes releases; use dot-two or higher. Avoid toy clusters (K3s, Minikube) in regulated environments.
  • Security: Enable TLS for etcd, encrypt secrets at rest, use CNI with built-in encryption (e.g., Cilium).
  • Node Isolation: Never run control plane nodes on shared-resource VMs; dedicate hardware or isolated VMs.

Tools Used

  • Minikube: Learning environment (single-node or 3-control-plane-node HA clusters, with Docker containers as nodes).
  • kubectl: Primary CLI for cluster interaction.
  • containerd: Default container runtime (replaced Docker).
  • Cilium: Recommended CNI for production (encryption, Gateway API, performance).
  • Argo CD: GitOps operator for Helm chart deployment and lifecycle management.
  • Ansible: Preferred tool for provisioning OS and installing Kubernetes components.
  • Terraform: Used only for underlying infrastructure (VMs), not for Kubernetes resource management.

Common Pitfalls

  • Assuming Kubernetes behaves like cloud platforms (AWS/GCP); misunderstanding state management.
  • Using K3s or Minikube in production or government contracts (violates compliance).
  • Running etcd on HDDs or under-resourced nodes → cluster instability.
  • Not isolating control plane nodes → resource contention → unrecoverable failure.
  • Ignoring the causality dilemma in automation scripts → clusters that appear functional but are misconfigured.
  • Using Terraform for Helm deployments → unreliable (roughly a 50% failure rate in the instructor’s experience).

Practice Suggestions

  • Practice YAML syntax daily: write and debug minimal manifests (Pods, Deployments, Services); a starter sketch follows this list.
  • Use kubectl get nodes -o wide and kubectl describe pod <name> to understand status and events.
  • Experiment with node labeling and taints/tolerations to control pod placement.
  • Try deploying a simple app with Argo CD from a Git repo.
  • Simulate cluster failure: stop Minikube, delete, restart — observe recovery steps.
  • Compare Minikube vs. MicroK8s vs. K3s: note feature differences and limitations.
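
For the manifest-practice suggestion above, a starter sketch pairing a minimal Deployment with a Service; the names, image, and port are placeholders to debug and extend, not content from the course:

cat <<'EOF' > practice.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello                    # hypothetical names throughout
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hello
  template:
    metadata:
      labels:
        app: hello
    spec:
      containers:
        - name: web
          image: nginx           # placeholder image
          ports:
            - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: hello
spec:
  selector:
    app: hello
  ports:
    - port: 80
      targetPort: 80
EOF
kubectl apply -f practice.yaml
kubectl get pods -o wide && kubectl describe pod -l app=hello   # inspect status and events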