Okay, all right. Yep, we can go ahead and get started.

All right, so my name is Lane Johnson, and I've been working with Kubernetes since 2016. As a government contractor, the government encouraged us to learn how to use the new Kubernetes that Google had released. We used a lot of containers back then, and it was very difficult to manage them, and so I began learning Kubernetes. We actually had a competition in Tampa among government contractors, and it was the first time anyone had ever built a Kubernetes cluster in that area, so it was a nice tech challenge that we took on. So I design, engineer, and build production-grade clusters, and that's what I do full time right now: I just build clusters, and I use infrastructure as code for all of them. I instruct my clients on cluster concepts and operations, and typically I instruct them on the cluster that I built for them. I will deploy it in their environment, which is typically bare metal, and then I instruct them on how to operate it and how to modify it, so that they learn on the cluster they're actually going to be using. And then I also assist clients in troubleshooting production clusters. I have a decent client list who like to build their own clusters, and then they run into problems, and we'll get into some of the issues you can run into with a cluster, and they reach out to me for help in troubleshooting.

Yes. Yes, now sometimes they're coming from the cloud, and they have cloud programs that they're containerizing, so they're not necessarily built for Kubernetes. It might be an old cloud instance that uses, say, 20 cores for a Django server, right? So I might assist them on how they would containerize that, and they'll containerize it into a pod that, for example, needs 20 CPU cores. So the containers that they're running aren't necessarily built for Kubernetes, but they're wanting to run them on it.

All right, so we have 11 lessons, and we'll go through these real quick. The first one is understand the Kubernetes architecture and its components. This will be a very comprehensive lesson that will take quite a while to go through; it's the foundation for the rest of the course. Lesson two: isolate resources effectively using namespaces, taints, and tolerations. Lesson three: manage and customize workloads with Deployments, StatefulSets, and DaemonSets. Lesson four: work with Jobs and CronJobs for scheduled tasks. Lesson five: understand Services and DNS within Kubernetes. Lesson six: expose applications using Ingress. Lesson seven: define computational resources using requests and limits. Lesson eight: manage ConfigMaps, Secrets, and persistent volumes. Lesson nine: scale and upgrade Kubernetes clusters using advanced strategies. Lesson ten: analyze and troubleshoot Kubernetes issues. And then lesson 11 is where we take the first ten lessons and dive in by deploying resources effectively using Helm charts, because at the end of the day, most of what we do inside the cluster works with some sort of manifest files or templating, and today that's Helm charts. So we will be using everything from lesson 1 through 10 when we get to lesson 11.
Each lesson will be interactive and will build on the prior lesson, and lesson 11 will utilize everything. We'll have time set aside, hopefully at the end of the day today, maybe 30 minutes, to review the lessons we've gone through, and then tomorrow we'll set aside half an hour for Q&A. They also have a questionnaire that they like the student to fill out at the end of the two-day course, so we'll plan on half an hour for that tomorrow. So we're six hours of instruction per day with an hour of breaks throughout: half an hour for lunch, a 15-minute break at around 10:30, and the other 15-minute break this afternoon. These are approximate times, but I believe the afternoon break will be around 2:15 or so. Does that sound about right for you? Are you comfortable with that?

Okay. And we do have an interactive Kubernetes learning environment. Now, as I mentioned, I normally train clients on their own cluster, and in that case I would meet with them over Zoom, they would open their own terminal and the UIs that I've deployed, and I would coach them while they input all the commands and learn how the cluster works. In this case we don't have that, so we have the DA desktop, which we've installed Minikube on, and that will give us our interactive Kubernetes learning environment. Our course stop time is 4 p.m. each day.

And the Minikube environment, have you used it before? Okay. So it does allow us to demonstrate concepts, most concepts, not all concepts. Minikube enables you to work in both single-node and high-availability mode: single node being one node, and high availability where you have three control planes. It's an interesting environment, and it took a lot of engineering for the Kubernetes team to create it, because it's basically Kubernetes in Docker, but with a twist, which we'll get to.

All right, so when you open your command line terminal, I don't believe you need to use the username and password; I think they enable you to have sudo privileges without entering the password. So go to your menu, and you should be able to type in "terminal" and it will bring up the flavor of terminal they're using inside. Looks like yours is MATE Terminal. Okay, you'll want to expand that out. I don't think you have to use sudo or your username and password; I believe Neil and I tried that.

Okay, so the first thing we're going to do is verify that all the resources are installed. You can type in minikube version; let's make sure that's installed. No, most of these, with Minikube and on Ubuntu, take version with no dash in front of it. I'm used to typing a dash, but not on this DA desktop. All right, so we've got Minikube, and now we'll do kubectl. Okay, and we're going to check for Helm and check for Docker, and we're going to check to see if Cilium is installed.
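Collected in one place, those checks are a minimal sketch along these lines, assuming the tools are on the PATH of the DA desktop and that the cilium command is the Cilium CLI:

    minikube version
    kubectl version
    helm version
    docker version
    cilium version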
All right, we're going to build our first Minikube setup. So Minikube is a toy cluster, and I have a slide where I kind of explain the differences. Most clusters that you will encounter today are demonstration or toy clusters in the Kubernetes world, and they're designed to teach the CLI commands. Not all features work in every toy or demonstration cluster, and they're not recommended for production because they're day one clusters, which are designed to crash and then just be restarted. They're not designed to live past day one. So Minikube is one of those.

But Minikube has an interesting feature that they built in, and I think you'll see it here in a minute once you type that command in: the engineers designed it as Kubernetes in Docker. In high-availability mode, each node is its own Docker container, which contains Docker inside of it. So when you build a node, it's actually a Docker container that runs Docker containers, or Kubernetes, inside of it, and three nodes will show up as three Docker containers.

So go ahead and type that command in, minikube start. There are also abbreviations you can use, but not all abbreviations work. Your environment should be 12 CPUs and 12.2 gig of memory, and you can see here that it's just setting up a single node with two CPUs and 2.9 gig. Okay, now we're going to run kubectl get nodes, and we can see that we have a single control plane node. Now let's run docker stats, D-O-C-K-E-R, space, S-T-A-T-S. You can see Minikube is running as a Docker container, and it has a limit on its memory of 2.8 gig. And the CPUs: it's giving itself two CPUs, which you can't see there, but we know that from the configuration. Okay, and you can Ctrl-C out of that. Then you can do docker ps -a, and we get a little different view. So you can see they've got a little bit of engineering magic going on with the Docker network, and that is what enables Minikube to do a multi-node cluster.

If you were to try to do this on MicroK8s, say we had installed MicroK8s, which is generally how you would learn the full Kubernetes feature set with Helm, you can't install more than one MicroK8s on a VM. So on this DA desktop VM you'd only be able to have a single node. You would get pretty much the full suite of Kubernetes in a toy cluster, but only a single node. Minikube uses Docker magic behind the scenes, but it allows you to have multiple nodes. Does that make sense to you? Okay. And then there are other toy clusters, like K3s, which is designed for a Raspberry Pi, and there are a few other ones out there as well. Correct, yes, demonstration. You can run Minikube on your laptop. For example, if you had a Linux laptop, which is what some Kubernetes developers use now, you could set up Minikube on it, and that would be your development environment. It's just that it won't duplicate a production environment, because you have to do certain things a certain way on Minikube, due to the Docker magic going on, that you would not do with a production cluster, and there are certain things that will work on a production cluster that will not work on a Minikube cluster. Yeah. So it does have some limitations, but it allows us to do things with multiple nodes that we can't do any other way in a virtual environment. And so let's do a kubectl get pods -A.
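Put together, that first walkthrough looks roughly like this; the multi-node variant is a hedged example rather than what we ran here:

    minikube start                       # single node; Minikube picked 2 CPUs and about 2.9 GB here
    # hypothetical multi-node variant with explicit sizing:
    #   minikube start --nodes 3 --cpus 2 --memory 2900
    kubectl get nodes                    # one control-plane node
    docker stats                         # the Minikube node is itself a Docker container (Ctrl-C to exit)
    docker ps -a                         # a different view of the node container and its ports
    kubectl get pods -A                  # the system pods running inside that node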
And as you can see, we've got CoreDNS, we have our etcd, we have kindnet, which is Minikube's own networking setup, the API server, which is very important, our kube-controller-manager, kube-proxy, the kube-scheduler, and our storage provisioner. There's a little bit of delay in the slides, but okay, so now you can minikube stop. And as soon as that is completed, we'll do a minikube delete --all, and this one actually does require flags, hyphens. That will give us a fresh Minikube environment when we start up the next one; it wipes everything out, and we'll do this frequently between practices to eliminate issues.

Okay, so we've gone through the course outline, the course schedule, and how our Kubernetes environment works. Do you have any questions before we begin? Okay, let me reload this. That's the first time I've run into this: it says click exit, but there is no exit.

All right, so what is Kubernetes? Kubernetes is often referred to as K8s to shorten the name. K8s is pronounced "kates," and that comes from the first letter K, the next eight letters, and the final letter S. It's a system that enables the automated deployment and management of containerized applications, and it's free, open source software. K8s was designed to be a cost-effective alternative for managing containerized workloads and services versus conventional cloud providers. Typically, if you've worked with AWS before, you have a team of infrastructure engineers in the background who created AWS, and they created the UIs and the APIs so that, as a cloud engineer, you can communicate with it using Terraform or some other cloud provisioning tool. They've abstracted away a lot of the difficult parts of the process. With Kubernetes, you are the engineer when you build a cluster. That engineer who worked in the background at AWS: you are that engineer now. And I think that's where a lot of cloud engineers run into difficulties with Kubernetes, thinking it's going to be just like spinning up a cloud instance on AWS. They realize there's actually a steep learning curve, because you have to understand how the cluster works behind the scenes before you can work with it.

So in the cloud, you might provision infrastructure using Terraform. But in K8s, we have a Kubernetes engineer who designs and automates the creation of the cluster, and then we have DevOps personnel who provision new clusters and run their application workloads on the cluster using manifest files or Helm templating. So yeah, in the cloud, and I can actually go through that here: in the cloud we use state management tools such as Terraform. We write our Terraform code, infrastructure as code, and we say spin up this network, spin up this cloud server with 20 CPUs and 12 gig of RAM, pull this image or this container in, build this cloud server and deploy it, provide an ingress. And we manage that state with something like Terraform. Have you ever used Terraform before? Okay. That's the way that, if you work with AWS, almost everything is done: through Terraform or their own provisioning tool. With Kubernetes, that does not work well.
And the reason why is because when you use Terraform, you are managing the state. Terraform will provide feedback back to you and say, okay, this is the state you requested, I have now provisioned it, and this is the state that it is in, right? So you manage the state yourself. And that's because the AWS engineers are managing the infrastructure behind the scenes: they're managing the state of the entire infrastructure, and you're just managing the state of your particular cloud instance. But Kubernetes maintains its own state. It's a completely different concept. And because it maintains its own state, we don't use Terraform to provision Kubernetes clusters today in modern Kubernetes. We use Terraform for the OS layer, the Ubuntu 24 instance, and then we use Ansible to install and configure Kubernetes on the OS layer.

Have you used Ansible before? Okay. Ansible is, I would say, probably the preferred tool for working with Kubernetes outside of Helm, and Ansible can control Helm. What you'll find is that Terraform was created pre-Helm. They do have a Helm module; however, it only works about 50% of the time with Helm charts. So let that sink in: how frustrating that would be, trying to deploy a Kubernetes cluster and then deploy workloads with Helm, with Terraform. Oh, you haven't written them? Okay, they're not that difficult to write once you learn the formatting. Getting the vars correct and managing your secret vars is probably the most difficult aspect of it, and once you learn the templating for Ansible, they're pretty easy to write, actually. Ansible can do almost the entire thing for Kubernetes; the one thing it does not do well is spinning up the OS layer.

And then in Kubernetes we use declarative GitOps to provision and maintain the state of the workloads and the cluster. You can manage it manually through the command line, you can manage it by running your Helm commands, but we have declarative GitOps tools that will actually manage and maintain that state for you and allow you to pull those workloads in and change the versions. It'll use the Helm chart templating, and it makes life a lot easier for DevOps personnel managing Kubernetes clusters. Once you get past the initial templating file of your declarative GitOps, it works very smoothly.

Oh, you haven't? So, have you heard of Argo CD? Okay. When you deploy your workloads today, we typically use Helm charts for a production workload, and Argo will take that Helm chart and inflate it into the cluster for you. It provides a very nice UI for you to see and work with, and it'll provide the events as well as the logs to you. So if there's an issue, you can actually go in and see the events and logs, because it reads them in near real time; there's a few seconds of delay. And then you can roll back by just pushing code to, say, a GitLab repo: you roll it back by just changing the version, push that up to your GitLab repo, Argo automatically pulls that repo, and then it rolls back the version for you.
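As a hedged sketch, an Argo CD Application that points at a Helm chart in a Git repo might look something like this; the repo URL, paths, and names are hypothetical placeholders:

    kubectl apply -f - <<'EOF'
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: example-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://gitlab.example.com/example/charts.git
        targetRevision: 1.2.3        # change this and push to roll forward or back
        path: charts/example-app
      destination:
        server: https://kubernetes.default.svc
        namespace: example-app
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
    EOF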
If you want to change and upgrade a version, you just push the new version to the repo, Argo pulls the repo, pulls in the new container image and the new Helm chart version, and then manages upgrading that container and that pod inside the Kubernetes cluster. It allows you to manage your state a lot better. I'm used to using the command line, but I use Argo now, so I can do it either way. When you're training DevOps personnel, I recommend starting with the command line just so you understand it, but once you understand how declarative GitOps works, it makes your life a lot easier.

Okay, here we go. All right, so the Kubernetes cluster design process involves determining the workload needs along with the available budget, and then you engineer a solution. Kubernetes is designed to be engineered and automatically provisioned, and DevOps personnel are typically trained to maintain the cluster. Why Kubernetes came about: I use AWS as an example, but you could use Google Cloud or Azure; you had an engineer who provisioned everything and then turned it over to DevOps personnel, but that created an additional job, right? The idea behind Kubernetes at Google is that we have an initial provisioning with infrastructure as code, and then DevOps personnel can run the whole thing: they can provision a cluster, maintain the cluster, resize the cluster, and run all of their workloads on the cluster. So it's designed basically to save costs for small and medium-sized companies. Typically I'll also train DevOps personnel on provisioning workloads using GitOps. It can be provisioned to be resource-friendly, maintainable by DevOps personnel, and it can provide savings versus the cloud of up to 90% on bare metal.

I actually have one client who owns eight companies, and his cloud bill with AWS across the eight companies was over half a million dollars a year, so a pretty hefty cloud bill. He decided to run everything on bare metal and trained his son, who's in high school, to build it with him and run it, and his son is running it while he's in high school. His whole cost for everything, after his initial outlay for hardware and paying his son, is less than $50,000 a year; that's electricity, air conditioning, everything. So he took cloud costs for his eight companies from over half a million a year to less than $50,000. The savings can be substantial running Kubernetes on bare metal versus the cloud.

So Kubernetes 1.0 was released in July 2015, and it experienced rapid growth three years later when we brought in all of the cloud engineers. That's where it picked up the name cloud-native Kubernetes, and that expanded the ecosystem by bringing, for example, AWS cloud engineers into it with their cloud tools. That brought a lot of funding into Kubernetes and really brought it mainstream. Kubernetes maintains a minimum of three releases at a time, three minor releases, so you receive approximately one year of updates after the initial .0 release, and then they reach end of life.
The initial .0 releases are generally for development and testing, and then once they get to the .2 release you can start running it in production. Generally you'll see a release schedule that runs from .0 to around .13 on Kubernetes. So this is kind of what it looks like: you can see that today we're on 1.33.1, so that would be for development, and if I were running a production cluster for a client right now, I would be running 1.32, because we have a .2, so it's ready for production. They're notorious for having bugs, sometimes major bugs, in the .0 releases, so I would avoid those in a production setting for sure. A .1, sometimes you can get away with it, but generally by the time you get to the .2 it's mature enough that you can run it. By bugs I mean regressions, where something that works on a 1.32.5 will not work on a 1.33.0. Yeah, so it's fairly fast moving; every four months they come out with a new minor release.

All right, so our K8s components. A cluster consists of nodes, pods, etcd, the Kubernetes API server, the kube-scheduler, the kube-controller-manager, the cloud-controller-manager, which we won't use on bare metal, and kube-proxy. And then on each node we have a kubelet. So it looks something like this, and again, on bare metal we generally don't use a cloud provider API; there are some instances where you might utilize one. You can see from this diagram we have a control plane. Actually, let me see if you can tell me: basically, how many nodes are really running on this? This is kind of a trick question because of the way they drew the diagram. Correct, that's correct. So this is kind of how it might look if you're running Kubernetes in the cloud, but the control plane is actually running on its own node as well. When we run on bare metal, we create our own control plane nodes, so this is four. And you can see we're running kube-proxy and the kubelet on each node, and on the control plane we have our scheduler, our API server, our etcd, the cloud-controller-manager, and our controller manager.

All right, nodes. There are two primary types of nodes within a cluster: the control plane node and the worker node. The worker node has changed names and is now called an agent node; they got rid of the name worker, but you will still see it in a lot of clusters. You can also set up a storage node, which removes the storage tasks from the worker node and enables all storage containers to run on the dedicated storage node. And why might you think that would be a good idea? Correct: more easily managed storage and resource constraints. If you had your storage running on a worker node and that worker node experienced a high volume of container activity, it might slow down your ability to access your storage, right? Because it might become resource constrained. So we move those to their own storage node, which is where we have all of our persistent volumes, persistent volume claims, et cetera. That enables the storage node's resources to be used just for accessing, reading, and writing to that storage, snapshots, et cetera. And then you have a maximum of 5,000 nodes in the cluster.

So, nodes may be labeled using kubectl. Nodes may be labeled or have a role added, such as control plane or agent.
Often when you look at a production cluster today, your control plane nodes will say control-plane and your worker nodes will say worker, or if you're building a cluster today, it'll say agent. Node labels may be removed by repeating the same command with a hyphen at the end. So you can add labels using kubectl, and then you can delete them by removing everything from the equals sign forward and replacing it with a hyphen; we'll go into this and demonstrate it in a minute.

Each node can manage up to 110 pods, and that is the maximum. It's imposed by assigning a /24 CIDR block to each node, and although there are a maximum of 256 addresses in a /24 CIDR block, Kubernetes reserves addresses for other purposes, such as spinning up nodes or spinning up pods, and this leaves 110 addresses available for running pods. You can reduce that if you want to put a limit on your node; say you want to limit it to 100 pods, you can do that as well, but you can't run more than 110.

All right, etcd is a key-value store, which is the Kubernetes backing store for the cluster data. It is designed for high availability and needs appropriate resources to send and receive heartbeat messages. etcd is a leader-based distributed system and needs constant communication with each member. In a high-availability environment, etcd runs in odd numbers, such as one, three, or five. On a single node we would have one, right? But when we go to a high-availability cluster, we would run our control planes on either three or five nodes, and etcd generally runs inside each control plane node. If resource starvation occurs, you may need to remove resources running on the control plane nodes and move them to a separate management node. Alternatively, you can remove the etcd pods and run them on separate etcd nodes. etcd needs to perform multiple reads and writes to the key-value store and can run into a situation where writes are running behind or delayed, and this may be due to running etcd on unsuitable local storage. NVMe storage works best, and SSD storage works great in nearly all situations. Spinning hard disk drive storage may have difficulty keeping up and may leave the cluster in a degraded state if the writes are delayed for too long. For enhanced security in communication between etcd nodes, TLS encryption may be enabled, and the etcd data, which contains the secret key-value pairs, may also be encrypted at rest. So there are multiple security protocols that can be enabled, particularly for this component. They are not enabled, however, on toy clusters; you may not be able to enable them on toy clusters, with the exception of maybe MicroK8s.

All right, the next component, the Kubernetes API server. The Kubernetes API server is located in the control plane node. It validates and configures the API objects, which can include pods, services, and replication controllers, among others. The Kubernetes API server provides the REST API access to the cluster's shared state, and it supports retrieving, creating, updating, and deleting resources via the POST, PUT, PATCH, DELETE, and GET HTTP verbs. So it's a real API.
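You can poke at it directly through kubectl; a minimal sketch, with the resource paths here being just examples:

    kubectl get --raw /healthz                            # the API server's health endpoint
    kubectl get --raw /api/v1/namespaces/default/pods     # a raw GET against the REST API
    kubectl api-resources                                 # the object types the server exposes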
The Kubernetes API server interacts directly with the etcd key-value store, which is the Kubernetes backing store. In a resource-constrained environment, it is not unusual to see logs showing client timeouts or errors retrieving a resource lock from the Kubernetes API server. And if all of the Kubernetes API servers are unavailable, the cluster may not be recoverable. So a very key and important thing to remember: the Kubernetes API server is probably one of the two most important components. If you lose all three nodes and you cannot access them, then yes, you would not be able to bring your cluster back. But you always want to make sure the cluster is designed so that at least one of your control plane nodes is available, with a VIP in front of it. I think you mentioned you used MetalLB? Okay, kube-vip is a popular one as well. With a VIP in front of it, if you still have one control plane node and it is configured correctly, then you will still be able to access that one control plane node; it will just be in a degraded state.

All right, the kube-scheduler. The kube-scheduler resides in the control plane and assigns pods. It determines which nodes are available and assigns pods for scheduling according to constraints and available resources. Using a hands-on approach, we will work with some of those constraints later in this course. The kube-scheduler communicates with the Kubernetes API server.

Next is the kube-controller-manager. The kube-controller-manager maintains the state of the cluster using control loops. Control loops in a Kubernetes cluster are non-terminating and regulate the state of the system. Examples of controllers in the Kubernetes cluster are the replication controller, the namespace controller, the endpoints controller, and the service accounts controller. It communicates with the Kubernetes API server.

The cloud-controller-manager is something you generally don't see on bare metal. The cloud-controller-manager, not to be confused with the controller manager, is a Kubernetes control plane component that embeds cloud-specific control logic. It lets you decouple the interoperability logic between Kubernetes and the underlying cloud infrastructure it is running on, such as AWS, for example. The cloud-controller-manager includes a node controller, a route controller, and a service controller, and it communicates with the Kubernetes API server.

And we have kube-proxy. Most Kubernetes clusters come with kube-proxy by default, and it runs on each node. kube-proxy can do simple TCP, UDP, and SCTP stream forwarding, or round-robin TCP, UDP, and SCTP forwarding. Modern production Kubernetes environments use a container network interface for cluster networking, and modern CNIs, such as Cilium, have replaced kube-proxy internally. This improves reliability within the cluster and can also boost network performance. So kube-proxy comes standard, but in production environments we typically replace it.

And then the kubelet. The kubelet service runs on each node as a daemon, and you can view its status by reviewing the node details.
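A hedged sketch of checking those per-node components in this environment; the node name minikube is just what this cluster happens to use:

    kubectl describe node minikube       # node details, including kubelet info and conditions
    kubectl get pods -n kube-system      # scheduler, controller manager, kube-proxy, etc. run as pods
    # on a production node you could also check the kubelet daemon directly:
    #   systemctl status kubelet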
All right, Kubernetes flavors, or distributions. In 2015 there was only one flavor of Kubernetes: plain vanilla. And that's why they call them flavors, if you are wondering, instead of distributions. If you run into someone who's worked with Kubernetes for a while and they ask what flavor you are running, that's because the first flavor was vanilla. It must have been a Google thing, I'm not sure. Today, however, there are many flavors or distributions to choose from. Some of the distributions are designed for toy or demonstration clusters, with limited feature sets that consume fewer resources and enable the node to run on a Raspberry Pi or a laptop. Other flavors are designed with security and production in mind and enable robust security measures such as etcd encrypted communication, etcd data encrypted at rest, etcd snapshotting, node-to-node encryption, pod-to-pod encryption, and secrets encryption; there's even a distribution designed to meet the Department of Defense STIG requirements. Choosing a flavor of Kubernetes is an important decision; it can have an impact on the long-term viability of the cluster itself and the workloads that the cluster manages.

So the following is a non-exhaustive list of flavors. For enterprise production, you have Rancher's own RKE2, and then you have Red Hat's MicroShift or OpenShift, and Red Hat is geared towards Department of Defense STIG requirements. So if you are, for example, bidding a job for DoD or working with DoD, they may require that you meet certain STIG requirements, which is what Red Hat is designed for, and so you'll see those used in those environments. For other government contracting, and again, I'm a long-term government contractor, you'll see RKE2 specified. In fact, about a year and a half ago, government agencies began refusing to pay for products that run on K3s clusters. You'll probably see a lot of K3s clusters run by companies that sell a product that runs on those clusters; they provide the entire environment to their customer, and those customers just happen to be federal government customers. About a year and a half ago, the agencies began denying payment for products run on K3s clusters, because it's actually a toy cluster designed for a Raspberry Pi.

Yeah, so that's a cloud environment, and that would be similar to an RKE2, for example. But that's their own flavor: they took vanilla and then built their own flavor on it, and the cost is actually higher than running it on the cloud. The way I describe AWS's or Google's or Azure's Kubernetes offering is that it's something to learn on and then move off of as quickly as possible, because the cost is actually higher than the cloud itself.
My clients have told me that when they move their workloads over, it actually costs them a little bit more each month to run in Kubernetes on the cloud than on the cloud alone, and then they quickly discover bare metal and move to bare metal for the savings.

So for development and demonstration we have RKE1, which is Rancher's original Kubernetes, and that is actually Kubernetes in Docker, similar to the Minikube we're using now. K3s was developed by Rancher, and it was designed for the Raspberry Pi. Have you ever worked with K3s? Okay. It was kind of made famous back during COVID, when an engineer moved to South America, I forget which country, Peru or somewhere, and he needed to maintain a presence in the U.S. So he developed an Ansible script for spinning up a K3s cluster on a Raspberry Pi setup, and then he used Tailscale to connect to a DigitalOcean droplet for an IP address in the U.S., and he was able to service all of his clients using a U.S.-based IP address. Once he published that, it put K3s on the map. Those scripts were made available, and a lot of companies that did not know how to create a Kubernetes cluster took his scripts; another individual named TechnoTim modified them and created his own. So a lot of companies have been running K3s in production because those scripts are available for free in GitHub repos. But they're actually toy clusters. What happened is Rancher was raising capital to develop their RKE2, and somebody had a bright idea to put on the K3s page that it's production grade, even though it wasn't, and that helped them raise the capital. Then they quickly removed that and turned it over to, I believe, the Cloud Native Computing Foundation, which is now running K3s. The K3s engine, however, is also the engine for RKE2, but that's where it stops. The K3s engine is quite robust; it's just that K3s has a lot of features disabled so that it can run on a Raspberry Pi.

Then you have Docker K8s. We've already discussed MicroK8s and Minikube; Docker K8s is resource intensive. You can run it on a Mac, for example. There's a delay on this slide. Okay, so Docker K8s is resource intensive; I've actually destroyed a MacBook running Docker K8s on it. Yeah, a fairly new MacBook. What happens is it uses so many resources, and Rancher Desktop does as well, that the battery can't cool down even with the fan running, and so the battery starts expanding rapidly because of the heat. And then you have Rancher Desktop, which competes with Docker K8s, and it allows you to run Traefik, so you can develop on it, but a lot of features don't work inside Rancher Desktop, and they didn't devote the resources to really finishing it. There are some teams, and you'll run into this in the Kubernetes ecosystem, that go ahead and push something to production that's not ready, and Rancher Desktop was an interesting idea, but it was not ready for production when they pushed it. And then you have the original Canonical distribution from Ubuntu, and then of course in 2015 vanilla came out, and it's still around; it's what everything was based on.
All right, can you think of a reason you may not want to encrypt etcd communication, and why you may want to? Yes. On a single control plane you wouldn't need it, because everything is within the same node; it would all be running in the same VM or virtual private server. But as soon as you go to a three-node high-availability setup, or five nodes (you probably won't run into a five-node control plane, but a three-node control plane you will), then you need to be able to protect the data flowing back and forth, because etcd contains your secret store for your key-value pairs. So think in terms of passwords.

All right, can you think of a reason you may want node-to-node or pod-to-pod encryption? Yeah. So think of the cloud, right? We had TLS certs that connected directly to the container in the cloud, so when you ingress into your workload in the cloud, your TLS cert resides on that cloud instance. In Kubernetes, that's cumbersome. If you have 500 containers, do you really want to manage 500 certificates and update them? It's a failure point in Kubernetes. And so, and we'll talk about this, Kubernetes is replacing Ingress with the Gateway API. It's been in beta for three years, and it works really well. Gateway API is designed with a single TLS cert, which terminates at the gateway, and then traffic is passed to the containers in an encrypted tunnel. So it covers node-to-node encryption, it covers pod-to-pod encryption, and it can do node-to-pod and pod-to-node encryption as well. That enables us to terminate the TLS and still have multiple nodes spread out across a network where the traffic between them is encrypted, even if we don't control that entire network. It's available as part of a CNI, and Cilium is one of the few CNIs that offers that out of the box. So when clients ask me about CNIs, I always encourage them to use Cilium, because it's the easiest and it has everything out of the box, so you don't pay for it. It's done through transparent encryption in the CNI.

All right, so what would happen to a cluster if the Kubernetes API server were unavailable? Right. Although the cluster doesn't fail immediately, it will typically lead to loss of control over the Kubernetes cluster. It will prevent the deployment of new resources and can prevent the management of existing resources. This can manifest itself in an error such as "unable to connect to the server" or a client timeout when running kubectl get pods or kubectl apply, et cetera. It will also show up in the logs for the individual Kubernetes components that need to communicate with the API server; these are typically on the control planes. Those logs will be unavailable to view unless they have been shipped to a log collector, because the command used to view logs, kubectl logs, will return the same "unable to connect to the server" error. So you can see how, if the Kubernetes API server is unavailable, it becomes very difficult to even try to diagnose a cluster.

All right, can you think of a reason you might not want to run your control plane nodes on shared-resource virtual machines or shared-resource virtual servers?
So let's say, for example, you had the ability to spin up a Kubernetes cluster on DigitalOcean or Hetzner or OVH Cloud; you have a choice between dedicated resources or shared resources. Both are virtual machines, by the way, but the shared resource is cheaper, half the price. You'll see initial Kubernetes practitioners gravitate towards that, because when you spin up a Kubernetes cluster you have multiple nodes, right? You have to pay for each VM, so if you do shared, it's half the price. So it makes sense: I can spin up a cluster for half the price. But shared resources versus dedicated resources could enable your control plane to degrade beyond a repairable state and render your cluster inoperable, because you're competing for resources in a shared environment. It is half the price to spin up a cluster, though, so for demonstration purposes, or if you're testing and you want to test on a virtual machine in the cloud, it is a cheaper way to test your cluster. Say you want to build an Ansible script to automate the spin-up: you can test it on shared resources, spin it down, delete them, and then run it on dedicated resources after that.

If you could choose the two most valuable components in a control plane node, what would they be? Yes, yes. So, like the Kubernetes components we went through earlier, there are about six or seven different components, and you can see some of them on your screen right here: in your terminal you've got CoreDNS, etcd, kindnet, the API server, the controller manager. There you go. Without those two, you will not be able to access the cluster or read the current state of the cluster. You'd have a cluster, but you're not going to be able to do anything with it if you lose those two.

All right, day one versus day two clusters. A day one cluster is designed to operate for a single day or task; it can be spun down, modified, and rebuilt using infrastructure as code. Examples of day one clusters are development and staging environments. A day two cluster is designed to survive for longer than a day, obviously. Typically a Kubernetes engineer will not test new node or pod configurations on a day two cluster; any testing that needs to be performed should be performed on a day one cluster. After all the proper configurations have been determined and tested, the new configurations can then be applied to a day two cluster, and this is typically performed using infrastructure as code or declarative GitOps.

So, ensuring the availability of both control plane nodes and agent nodes requires proper planning; here we're dealing with high availability in a Kubernetes environment. Control plane nodes on bare metal use a leader election with a load balancer that assigns a virtual IP to the leader. The virtual IP address ensures that if the leader changes, the proper node can still be queried and receive messages from other nodes. For the agent or worker nodes, high availability means enabling multiple instances of a stateless workload across multiple nodes. This is enabled with a load balancer service and may also include the ability to auto-scale pods up or down between a minimum and maximum number.
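As a hedged sketch of that last piece, autoscaling a stateless workload between a minimum and a maximum; the deployment name web is hypothetical:

    kubectl autoscale deployment web --min=2 --max=5 --cpu-percent=80
    kubectl get hpa        # shows the HorizontalPodAutoscaler that was created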
In a high-availability setup with three nodes, a pod anti-affinity setting is used to make sure the pods are distributed equally across the available nodes. This enables a cluster to avoid ending up with three pods on a single node and the two remaining nodes containing none. Apologies for the delay there; every once in a while it clicks through, and other times it just freezes on the slide deck.

All right, so Kubernetes was designed from the beginning with self-healing capabilities. This enables it to maintain the availability of workloads: it automatically reschedules workloads when nodes become unavailable, replaces failed containers, and ensures that the desired state of the system is maintained. The self-healing capabilities include four key areas. Container-level restarts: if a container fails, Kubernetes will automatically restart the container within a pod based on the restart policy. Load balancing for services: if a pod fails, Kubernetes will automatically remove it from the service's endpoints and route traffic only to healthy pods. Persistent storage recovery: if a node running a pod with a persistent volume attached fails, Kubernetes will reattach the volume to the replacement pod. Pod replacement: if a pod in a deployment or StatefulSet fails, a replacement pod will be created by Kubernetes to maintain the desired state, and if the failed pod is part of a DaemonSet, the control plane will replace the pod to run on the same node. This self-healing capability is one of the reasons it has grown in popularity with DevOps.

All right, now we get to the causality dilemma. What is the causality dilemma in Kubernetes? Are you familiar with the causality dilemma? Okay, so it's commonly referred to as the chicken-or-egg paradox: which came first? It's a circular situation that describes very well what is often experienced in designing and engineering a Kubernetes cluster using automation. Certain processes, and this also has to do with GitOps, need a resource in order to start; however, for that resource to be in place may require a certain process to run first, hence the causality dilemma. If you automate the building of Kubernetes clusters from scratch using infrastructure as code, you will likely run into this issue. And what I see in the automations that are out there is that the engineers typically do not even attempt to solve the causality dilemma, so the cluster is not actually operating the way it's intended to run, because they couldn't figure out how to solve that piece.

One place where that matters is the CNI. If we have a high-availability control plane with three nodes, we have a VIP in front of it that we need in order to access our API server, based on who the leader is. But when we set up our CNI, on step one it's pointed at the IP address of the primary control plane node. When we assign a VIP, we then need to give that VIP to the CNI, right? So we have to install the CNI, install the VIP, get the VIP, and then reinstall the CNI. That's the causality dilemma. Did that make sense? I think we have one practice where we might delve into it just a little. We won't be building clusters, so we won't have too much on that. All right.
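To make that ordering concrete, here is a hedged sketch of the sequence with Cilium and a VIP; it assumes the Cilium Helm repo is already added, and the IP addresses and the VIP are placeholders:

    # 1. install the CNI pointed at the first control plane node's address
    helm install cilium cilium/cilium -n kube-system \
      --set k8sServiceHost=10.0.0.11 --set k8sServicePort=6443
    # 2. deploy the VIP load balancer (kube-vip, MetalLB, etc.) and note the VIP it manages
    # 3. re-point the CNI at the VIP and restart its agents
    helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
      --set k8sServiceHost=10.0.0.100
    kubectl -n kube-system rollout restart daemonset/cilium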
The kubeconfig file. It's required for access through the kubectl command line tool. The kubeconfig file name may be descriptive, or it may reside as just config under the .kube directory. This is the key to the system. To access the file, navigate to the .kube directory on the host or client and cat the config file. In a production cluster, you will need to export the location of the kubeconfig file for the kubectl command, using something like export KUBECONFIG= followed by its location and name. You can see here, this is a kubeconfig; this is an example of a production setup where you have production, you have development, and you have multiple clusters, and so you would export the KUBECONFIG so that kubectl knows exactly where the config file is. In this case, it's named kconfig.yaml in that particular location. And then to unset it, we simply run unset KUBECONFIG. Minikube takes care of this every time you start up a dev cluster. And then the Kubernetes API server listens on port 6443 or 443, depending on how it is configured; it can be either port.

All right, we're going to get into practical application one. We're going to spin up a Minikube cluster, so run the original command we did, minikube start. I have a question for you while you're waiting for that to spin up: how do you check the version? Obviously it's cheating by telling us that it downloaded Kubernetes version 1.33.1, right? You see it in there, so obviously that's cheating. We know it's version 1.33.1, but in the absence of Minikube, if we were doing this on our own cluster, how would we check the version? Using the kubectl command line tool: we check with kubectl version, and we can see that the version is printed every time with that command, version 1.33.1.

And based on the Kubernetes version shown, what is the anticipated end of life for this version? What task will need to be performed by this EOL date? Let me see here. There we go. Correct. Now, we would not be running this in production; this would be a development cluster, so in production we'd have 1.32. But yes. And so what task would need to be performed by this EOL date? Correct. Or, if we're using infrastructure as code and we have designed the clusters so that they're stateless, so our storage is in a separate cluster or using a managed storage service, then we would simply create a new cluster, manage our workloads in it using GitOps, and just point over to the new cluster.

So in today's environment, with the advance of infrastructure as code, you can do it two ways. You can manage a cluster and keep it running forever, and the artifacts will build up inside that cluster over time; that can introduce issues if you've had a cluster running for three or four years and you upgrade it in place. Or you can build a fresh cluster with infrastructure as code. There are several ways to do it. You can build cluster one and cluster two; let's say cluster one is 1.31 and cluster two is 1.33. You can run a cluster mesh between them, and then you can just start draining the nodes.
And if you have your workloads set up properly, as you drain the nodes on cluster one, which is 1.31, it will move those workloads automatically over to cluster two, which is 1.33. Some enterprises will use a cluster A, cluster B type of setup with a cluster mesh, and the other cluster is the one they would then start upgrading. Once everything has transferred over and works, you would start building a new cluster in cluster one, join it with the cluster mesh, and test it out. The other way, oh, sorry, go ahead. It could, yes, as long as nothing goes wrong. The other way is that you spin up a new cluster with all of your workloads already on it. Because you're using stateless workloads, if you've designed it for Kubernetes and you don't have a database in that cluster, you can just spin up an entirely new cluster, using Argo CD, for example, for your declarative GitOps. Everything is running, and then you just point the DNS over to the new cluster. Yeah, and that would probably be more seamless, depending on how big your cluster is. So those are two ways to do it with infrastructure as code.

And then the other way is that you can simply cordon a node, drain the node, do your upgrade in place, then uncordon the node and allow the pods to repopulate onto it after it rejoins and is uncordoned. The pods will repopulate if they have been set up correctly to do that. If they have not, they may just stay on the node they're currently on until you cordon that node and start draining it, which will force them to move over. So you can run into issues when upgrading a cluster in place if you haven't set up your pods from the very beginning, whether it's a deployment or a StatefulSet, so that they're designed to run on multiple nodes, if that's the type of setup you have. That's one of the many issues you can run into when upgrading a cluster, and that's why, with infrastructure as code, most of my clients just spin up a new cluster and move everything over to it; they don't mess with upgrading in place today. But we're going to go through that and show you how to do it, so you at least know the concept.

Okay, I think we're at 10:30. Let's take a break. You ready for a break? We're going to get into lots of practical application when we come back. We'll do a 15-minute break; let's see, I've got 10:32, so back at about 10:47. Okay, see you in a few.

All right, are you back? All right. So we left off at practice two. Okay, we're going to check the container runtime. How do we check the container runtime? So I guess, what is a container runtime, and what types of container runtimes are there? We don't really go over this in detail because it's more of a Docker concept, but you have a Docker container runtime, right, and then you have several different types of runtimes. So what we're going to do is check and see what container runtime we're running in Minikube. We'll run kubectl get nodes -o wide, and that's a lowercase o. And what container runtime do you see? Correct.
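Spelled out, that check is:

    kubectl get nodes -o wide      # the CONTAINER-RUNTIME and OS-IMAGE columns are at the far right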
And do you notice, there's a little bug in Minikube. I don't know why, but bugs always stand out to me: you're running on Ubuntu 24, and it reads the OS image as Ubuntu 22, and that's actually a Minikube bug. Interesting.

All right, so we're going to change the container runtime. How do we ensure a fresh Minikube environment? We're going to stop the Minikube environment and delete all, and then we're going to start it with a new container runtime, which is going to be containerd. Correct. Kubernetes today uses containerd, and they've gotten away from Docker. They do very little with Docker these days, and that's because they competed: Docker had Swarm, which was kind of the early container orchestration program, and that competed with Kubernetes, so Kubernetes moved away from Docker to containerd, which also, by the way, came from Docker. You'll run into that a lot in the Kubernetes ecosystem, where teams compete with each other.

Okay, now we're going to label a node with node-type equals test. So go ahead and label the node: kubectl label node, our node name is minikube, and then the label. Have you ever labeled nodes before? So the node name is actually minikube; it's just that we didn't name the Minikube cluster something else, so the actual name of this node is minikube. In a multi-node environment, the next one would be minikube-m02, I believe. Okay, you can try running it and see what comes up. In order to see the node label, we're going to run kubectl get nodes, but with --show-labels. Okay, see if you can find it in that long string of labels. I've tried to use parsing tools, and there are a few out there, but they still leave something to be desired.

Okay, so now we're going to remove the node label, and we do that by removing everything after the equals sign and replacing it with a hyphen; it's the same command we used before. Okay, it says it's unlabeled, and when you look, it'll now be gone. All right, we're going to assign a role to a node. If you look at that, you'll see that the name is minikube, the status is Ready, and the role is control-plane. Right. So now we're going to actually assign this a role: previously we just labeled it, and now we're going to label it with a node role. I would have as well, let me check here. Yeah, that's interesting, I would have expected to see test as well. Oh, no, you know what, node-type is there. So you can use true. Okay, so here's the thing: we named it node-type equals test because sometimes, when you use true, label selectors do not recognize it, and you'll uncover that when you start to use Helm charts. So you can use true, but in this case I just used test; you can put anything you want in it. So it's actually labeled with node-type, and if you look at the label for the control plane role, there's nothing after the equals sign. Now we're going to remove that node label, the role. All right, now we're going to install a pod, so we're going to relabel the node with node-type equals test. Here you go; verify; looks like it's in there, yep.
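The labeling steps we just walked through, collected in one place; this assumes the node is named minikube, and the worker role name is only an example:

    kubectl label node minikube node-type=test                     # add the label
    kubectl get nodes --show-labels                                # find it in the long label string
    kubectl label node minikube node-type-                         # trailing hyphen removes it
    kubectl label node minikube node-role.kubernetes.io/worker=    # assign a role (empty value)
    kubectl label node minikube node-role.kubernetes.io/worker-    # remove the role again
    kubectl label node minikube node-type=test                     # relabel for the pod exercise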
Okay, so now we're going to install a pod. Using vim, and we should have vim installed, create a pod-node-selector.yaml file. I don't think so; I think they disabled that so everyone has it automatically. Yes.

And the reason I do this: I've shortcut a lot of my examples so they just have the basics, because a lot of the errors that you will run into running a Kubernetes cluster, and running workloads on a Kubernetes cluster, will be based on the syntax of your YAML. By practicing on several of these short YAML files in each lesson, you'll start to understand how the syntax goes together. One of the problems with Kubernetes that can be so frustrating is that it doesn't provide descriptions of your errors. Sometimes it will tell you which line has a YAML error, but oftentimes it does not, especially with Helm charts. So learning the proper formatting will save you valuable time later on in your troubleshooting. And I pulled a lot out of these, so we should be pretty brief.

All right. Yeah, that looks right: you have apiVersion, kind, metadata, the nodeSelector, and in the spec, containers, name, and image. Yeah, looks good. Go ahead and hit Escape, and then, yep, write and quit. That'll work. And then we're going to apply it: kubectl apply -f, for file, pod-node-selector.yaml. All right, now we're going to run kubectl get pods, and what is the status?
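For reference, a minimal sketch of that file and the commands around it; the pod name and image are illustrative, and the nodeSelector matches the node-type=test label from the previous step:

    # vim was used in class; a heredoc writes the same file
    cat <<'EOF' > pod-node-selector.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: node-selector-demo
    spec:
      nodeSelector:
        node-type: test
      containers:
      - name: web
        image: nginx
    EOF
    kubectl apply -f pod-node-selector.yaml
    kubectl get pods -o wide     # the pod should be scheduled onto the labeled node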