Course recordings on DaDesktop for Training platform
Visit outline: Kubernetes from Basic to Advanced (Course code: kubernetes)
Categories: Docker · Kubernetes
Summary
Overview
This course session provides a comprehensive, hands-on deep dive into Kubernetes scheduling mechanisms, including node labeling, node selectors, taints, tolerations, Jobs, and CronJobs. The instructor guides learners through practical exercises in a Minikube environment to demonstrate how to control pod placement, enforce node policies, and automate one-off and recurring tasks. Emphasis is placed on understanding scheduling constraints, troubleshooting common failures (e.g., unscheduled pods due to mismatched labels or untolerated taints), and implementing best practices such as automatic job cleanup and infrastructure-as-code principles. The session concludes with a team-based exercise integrating multiple concepts into a single deployable job.
Topic (Timeline)
1. Course Logistics and Slide Management [00:00:06 - 00:01:12]
- Instructor addresses slide decks that were corrupted by LibreOffice auto-save.
- Explains pre-class review process to ensure slide integrity before each session.
- Mentions use of default templates and iterative slide building for each lesson.
2. Kubernetes Scheduling Fundamentals [00:01:12 - 00:03:02]
- Defines Kubernetes scheduling as the process by which the kube-scheduler matches pods to nodes for execution by kubelet.
- Introduces feasible nodes: nodes that meet a pod’s scheduling requirements.
- Explains that if no feasible node exists, the pod remains unscheduled until one becomes available (e.g., resources are freed).
- Defines node labels and roles as key-value pairs; clarifies that roles are a convention (e.g., `node-role.kubernetes.io/control-plane=true`), and values are often ignored or set to `true` or empty.
3. Node Selectors and Node Affinity [00:03:02 - 00:05:04]
- Describes node selectors as the simplest form of pod-to-node affinity, requiring exact key-value matches between pod spec and node labels.
- Introduces node affinity as a more granular mechanism with two types: `requiredDuringSchedulingIgnoredDuringExecution` and `preferredDuringSchedulingIgnoredDuringExecution`.
- Provides example: pod requiring `storage` role and preferring `fast` label; if `fast` is unavailable, pod still schedules on `slow` storage node.
- Clarifies that node selectors are a subset of node affinity.
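The storage example can be sketched as a manifest. This is only an illustration under stated assumptions, not the course's file: the pod name, the image, the role label key `node-role.kubernetes.io/storage`, and the `disk-speed` label are hypothetical names chosen for the sketch.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: storage-affinity-demo          # hypothetical name
spec:
  affinity:
    nodeAffinity:
      # Hard rule: only nodes that carry the storage role label are feasible.
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-role.kubernetes.io/storage   # assumed label key
                operator: Exists
      # Soft rule: prefer "fast" nodes, but fall back to any storage node.
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: disk-speed                        # assumed label key
                operator: In
                values:
                  - fast
  containers:
    - name: app
      image: busybox:1.36                              # placeholder image
      command: ["sleep", "3600"]
```

Because the preference is weighted rather than required, a cluster with only `slow` storage nodes would still schedule this pod.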
4. Taints and Tolerations [00:05:04 - 00:07:03]
- Defines taints as node-level repelling mechanisms that prevent pods from being scheduled unless they have matching tolerations.
- Explains taint format: `key=value:effect` (e.g., `pod=true:NoSchedule`).
- Describes tolerations as pod-level specifications that allow scheduling on tainted nodes.
- Uses example: control plane nodes are tainted to repel application pods, but system pods like Cilium are given tolerations to run on them.
- Notes: tolerations do not guarantee scheduling — other constraints (e.g., resource requests) still apply.
5. Jobs and CronJobs Overview [00:07:03 - 00:08:34]
- Defines Jobs as one-off tasks that terminate after successful completion; events are retained for 1 hour.
- Best practice: auto-delete completed jobs to avoid clutter.
- Use cases: database user creation, metrics injection, one-time data processing.
- Defines CronJobs for recurring tasks (e.g., backups, log shipping, report generation, SSD trim).
- Notes CronJobs use standard cron syntax and macros for scheduling.
- Mentions lab objectives: practice node labels, roles, selectors, taints, tolerations, Jobs, and CronJobs.
6. Lab Setup and Environment Preparation [00:08:34 - 00:10:01]
- Instructs learners to start with a fresh Minikube environment: stop and delete existing, then restart.
- Confirms availability of updated YAML files for Lessons 6 and 7 via Git; warns of potential conflicts if local files were modified.
- Notes: YAML files are pre-provided to reduce typing, but learners may type manually for practice.
7. Node Labeling and Removal Exercises [00:10:01 - 00:13:41]
- Demonstrates adding a node label: `kubectl label node minikube kubernetes.io/node-type=test`
- Verifies label with: `kubectl get nodes --show-labels`
- Demonstrates label removal: `kubectl label node minikube kubernetes.io/node-type-`
- Repeats process for node role label: `node-role.kubernetes.io/node-type=test`
- Emphasizes exact key-value matching and common typos (e.g., missing `node-role.kubernetes.io/` prefix).
- Instructs learners to reapply and remove labels to reinforce understanding.
8. Node Selector Troubleshooting [00:13:41 - 00:17:20]
- Applies `pod-node-selector.yaml` with node selector `node-type=test`.
- Pod remains in `Pending` state due to mismatch: node label is `node-type=test` but pod expects `node-type=true`.
- Uses `kubectl describe pod node-selector` to identify error: “0 of 1 nodes are available: 1 node didn’t match pod’s node affinity selector.”
- Corrects YAML to match node label exactly (`node-type=true`), deletes and reapplies pod.
- Pod successfully schedules and runs.
- Introduces resource requests and limits: `cpu: 250m` (0.25 vCPU), `memory: 512Mi`.
- Explains best practice: request both CPU and memory, limit only memory (to allow CPU burst), as CPU usage is typically short-lived.
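A minimal sketch of a pod that combines a node selector with the request/limit practice described here. The contents of the course's `pod-node-selector.yaml` are not shown in the recording summary, so the image, the container name, and the placement of the `250m` / `512Mi` figures under `requests` and `limits` are assumptions. The selector value must equal whatever label the node actually carries; the one below mirrors the `kubectl label` command from the labelling exercise.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: node-selector
spec:
  # Every key/value listed here must match a node label exactly.
  # Values are strings: a value such as true must be written "true".
  nodeSelector:
    kubernetes.io/node-type: "test"
  containers:
    - name: app                        # hypothetical container name
      image: busybox:1.36              # placeholder image
      command: ["sleep", "3600"]
      resources:
        requests:
          cpu: 250m                    # 0.25 vCPU reserved for scheduling
          memory: 512Mi
        limits:
          memory: 512Mi                # memory is capped; no CPU limit, so CPU can burst
```

If the pod stays `Pending`, comparing this selector with the output of `kubectl get nodes --show-labels` is usually the quickest check.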
9. Node Resource Utilization and Best Practices [00:17:20 - 00:20:31]
- Demonstrates checking resource usage via `kubectl top nodes` and `kubectl top pods`.
- Explains that low-load pods (e.g., CoreDNS) often request minimal resources (e.g., 100m CPU, 64Mi RAM).
- Highlights that resource limits prevent runaway processes from consuming all node resources and crashing other pods.
- Notes: spikes seen in monitoring (e.g., 1Gi → 4Gi RAM) typically indicate application bugs.
10. Taints: Application and Verification [00:20:31 - 00:23:17]
- Taints control plane node: `kubectl taint nodes minikube pod=true:NoSchedule`
- Verifies taint with: `kubectl describe node minikube` → checks “Taints” section.
- Deploys pod without toleration → pod remains `Pending` with “untolerated taint” error.
- Confirms taint is working as intended: node repels untolerated pods.
11. Tolerations: Enabling Pod Scheduling on Tainted Nodes [00:23:17 - 00:25:10]
- Applies `pod-tolerated.yaml` with toleration: `key=pod`, `operator=Equal`, `value=true`, `effect=NoSchedule`
- Pod still fails to schedule due to node label mismatch: node has `node-role.kubernetes.io/control-plane=` (empty) but pod expects `control-plane=true`.
- Corrects node label: `kubectl label node minikube node-role.kubernetes.io/control-plane=true --overwrite`
- Pod successfully schedules and runs.
- Emphasizes: never modify production nodes manually; use infrastructure-as-code (IaC) for changes.
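The toleration and selector from this exercise might look like the following. This is a sketch rather than the course's `pod-tolerated.yaml`: the pod name, container name, and image are assumptions, while the toleration fields and the node label come from the steps in this section.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: tolerated-demo                 # hypothetical name
spec:
  # The selector pulls the pod towards the control plane node...
  nodeSelector:
    node-role.kubernetes.io/control-plane: "true"
  # ...and the toleration lets it past the pod=true:NoSchedule taint.
  tolerations:
    - key: pod
      operator: Equal
      value: "true"                    # quoted so YAML reads it as a string
      effect: NoSchedule
  containers:
    - name: app                        # hypothetical container name
      image: busybox:1.36              # placeholder image
      command: ["sleep", "3600"]
```

The two fields solve different problems: the toleration only permits scheduling on the tainted node, while the selector is what requires it, which is why a label mismatch can still leave a tolerated pod `Pending`.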
12. Namespace Deletion and Resource Cleanup [00:25:10 - 00:26:04]
- Destroys namespace `app-a` with `kubectl delete namespace app-a` → all associated resources (pods, services, etc.) are automatically removed.
- Reinforces importance of namespace isolation and clean resource lifecycle.
13. Job Creation and Lifecycle [00:27:00 - 00:29:41]
- Applies `job.yaml` with a simple `hello` container.
- Pod runs to completion → job status shows `Completions: 1/1`.
- Pod remains in `Completed` state after job finishes.
- Demonstrates manual cleanup: `kubectl delete job hello-single-job`
- Warns: uncleaned jobs accumulate and clutter the system.
14. Automatic Job Cleanup with TTL [00:29:41 - 00:30:38]
- Applies `auto-delete-job.yaml` with `spec.ttlSecondsAfterFinished: 30`
- Job auto-deletes 30 seconds after completion.
- Notes: for auditability, integrate logging/event shipping (e.g., to Grafana) to retain records of completed jobs.
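A Job with a 30-second TTL could be written roughly as follows. The Job name, image, and command are placeholders and not taken from the course's `auto-delete-job.yaml`; only the `ttlSecondsAfterFinished` value comes from the session.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: hello-auto-delete              # hypothetical name
spec:
  # Once the Job finishes, the TTL controller removes the Job
  # and its pods after this many seconds.
  ttlSecondsAfterFinished: 30
  template:
    spec:
      restartPolicy: Never             # Jobs require Never or OnFailure
      containers:
        - name: hello
          image: busybox:1.36          # placeholder image
          command: ["sh", "-c", "echo Hello from the job"]
```

Since the pod logs disappear along with the Job, any output needed for audit has to be collected before the TTL expires.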
15. CronJob Implementation and Behavior [00:30:38 - 00:33:27]
- Applies `cronjob.yaml` scheduled to run every 60 seconds.
- Observes new pods created for each run with unique hashes.
- Confirms CronJob keeps the 3 most recent successful Jobs (and their pods) by default; older ones auto-delete.
- Demonstrates scaling: after ~4 minutes, 3 pods are active, then one completes and is removed.
- Deletes CronJob with `kubectl delete cronjob <name>`.
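A CronJob that fires every minute can be sketched like this. The name, image, and command are assumptions rather than the contents of the course's `cronjob.yaml`; the two history limits are written out explicitly at their Kubernetes default values so the retention behaviour is visible.

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: hello-cron                     # hypothetical name
spec:
  schedule: "* * * * *"                # standard cron syntax: every minute
  successfulJobsHistoryLimit: 3        # default: keep the last 3 successful Jobs
  failedJobsHistoryLimit: 1            # default: keep the last failed Job
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: hello
              image: busybox:1.36      # placeholder image
              command: ["sh", "-c", "date; echo Hello from the CronJob"]
```

Each scheduled run creates a new Job, and each Job creates its own pod, which is why the pod names differ from run to run.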
16. Team Exercise: Integrated Job with Taint, Tolerations, and TTL [00:33:27 - 00:36:09]
- Instructs learners to:
  - Start fresh Minikube cluster (`minikube stop && minikube delete && minikube start`)
  - Label control plane node: `node-role.kubernetes.io/control-plane=true`
  - Taint control plane node: `kubectl taint nodes minikube pod=true:NoSchedule`
  - Create Job with:
    - TTL: 180 seconds
    - Node selector: `node-role.kubernetes.io/control-plane=true`
    - Tolerations: matching the taint (`pod=true:NoSchedule`)
  - Deploy, troubleshoot, verify auto-deletion.
- Instructor confirms learners successfully identify and apply correct commands.
- Summarizes key takeaways: taints repel, tolerations permit, Jobs handle one-offs, CronJobs handle repeats, and cleanup is critical.
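One possible shape for the exercise's Job is sketched below. It is not the instructor's reference solution: the Job name, image, and command are hypothetical, while the TTL, node selector, and toleration follow the requirements listed for the exercise.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: control-plane-job              # hypothetical name
spec:
  ttlSecondsAfterFinished: 180         # auto-delete 3 minutes after finishing
  template:
    spec:
      restartPolicy: Never
      # Require the labelled control plane node...
      nodeSelector:
        node-role.kubernetes.io/control-plane: "true"
      # ...and tolerate the pod=true:NoSchedule taint placed on it.
      tolerations:
        - key: pod
          operator: Equal
          value: "true"
          effect: NoSchedule
      containers:
        - name: hello
          image: busybox:1.36          # placeholder image
          command: ["sh", "-c", "echo Hello from the control plane"]
```

Note that the scheduling fields sit under `spec.template.spec`, because they describe the pod the Job creates, whereas `ttlSecondsAfterFinished` belongs to the Job's own `spec`.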
17. Lesson Recap and Break Announcement [00:36:09 - 00:36:36]
- Recap: node labels/roles, selectors, taints/tolerations, Jobs, CronJobs, auto-deletion, logging for audit.
- Announces 30-minute lunch break; session resumes at 1:35 PM Eastern.
Appendix
Key Principles
- Node Labels: Used for pod affinity; must match exactly in node selectors.
- Taints: Applied to nodes to repel pods; effect types include `NoSchedule`, `PreferNoSchedule`, `NoExecute`.
- Tolerations: Applied to pods to allow scheduling on tainted nodes; must match the taint's key and effect, plus its value when the operator is `Equal`.
- Jobs: Run once to completion; auto-delete via `ttlSecondsAfterFinished`.
- CronJobs: Run periodically; maintain a configurable history of successful Jobs (default: 3, set via `successfulJobsHistoryLimit`).
- Resource Requests/Limits: Request CPU and memory; limit only memory to allow CPU bursting.
Tools Used
- Minikube (local Kubernetes cluster)
- kubectl (CLI for cluster management)
- LibreOffice (slide authoring)
- Git (code and YAML file distribution)
Common Pitfalls
- Typos in label keys (e.g., missing `node-role.kubernetes.io/` prefix).
- Mismatched label values (e.g., `test` vs `true`).
- Forgetting to apply tolerations when scheduling on tainted nodes.
- Not auto-deleting Jobs → resource clutter.
- Assuming a Job's pods survive `kubectl delete job`: by default the deletion cascades to the pods the Job created, so their logs are lost unless `--cascade=orphan` is used or the logs are shipped elsewhere.
Practice Suggestions
- Recreate all exercises in a fresh Minikube cluster.
- Modify YAML files manually (without copying) to reinforce syntax.
- Intentionally introduce errors (e.g., wrong label value) and troubleshoot using `kubectl describe`.
- Combine taints, tolerations, and node selectors in a single deployment.
- Set up a CronJob to run every 5 minutes and observe pod lifecycle over 30 minutes.