All right, lesson nine: scale and upgrade Kubernetes clusters using advanced strategies. This will be kind of an abbreviated session, and then we'll go into testing it out. We don't have a way to really test this on a full Kubernetes cluster, so I'll explain the concepts of how it works and then demonstrate it on Minikube.

Typically, if you do this, you would use something like kubeadm. The problem with that is we don't really use kubeadm to manage clusters much if we're using infrastructure as code, especially. So there are two ways to do it. With infrastructure as code, we build the cluster how we need it, using Terraform or Ansible; if we want a new cluster with a new version, we use IaC. The other way is you build a cluster like a cloud server and then manage it with kubeadm, which maintains your state. That's an older way of doing it, but there are still practitioners who use it. Infrastructure as code? Yeah. So there are several ways to do it, and it just depends on how you're setting up your cluster to begin with. If we're using IaC to set it up, we wouldn't use kubeadm; but if you're using kubeadm to set it up, then you can use kubeadm to upgrade your nodes and to scale your nodes up or down. So we'll go through the concepts, and then I'll demonstrate it with Minikube.

Oh yeah. Remember when we looked at Cilium, when we ran cilium status and it showed that cluster mesh was disabled? With Cilium you can set up two clusters and join them with a cluster mesh. Cluster A is running your stateless workloads; cluster B is your new cluster with the latest Kubernetes version, connected with the cluster mesh. You can start draining your nodes and start deploying your stateful workloads onto the second cluster, because the mesh joins them together like it's one cluster. It's a very advanced concept, and it's improved through the years. When it first started, there were government agencies that put half a million dollars into their cluster mesh; that was when it was beta. Today you can build a cluster mesh really economically. You can see that with a lot of Kubernetes concepts: they started out very complex, only a few engineers knew how to make them work, they were the early alpha testers, if you will, and then that piece has improved considerably, especially in the past year.

Yes, by pointing the DNS. The way I recommend for my clients is to just build another cluster the way you want it, with the latest version, and test it. Once you're confident that your tests have all passed, go ahead and use declarative GitOps to bring in your stateless workloads. Once you're satisfied with the state they're in, you just point the DNS from the old ingress to the new ingress. Because they're stateless workloads, it doesn't matter, right? So that's actually the simplest way.

Kubernetes worker nodes can be scaled up or down as needed. You can have as many as you want, up to a limit. Let's see.
In a high-availability cluster you can add 4,997 workers, so there is a limit: 5,000 nodes per cluster. If you have three nodes for your control plane, you can have 4,997 worker nodes. That won't give you any room to add a node before you delete a node, though, so you might not want to run up against that maximum. I've never run into anyone who has come close; there are government agencies that run 1,000-node clusters, but I've never met anyone who did 5,000.

Kubernetes control plane nodes must be scaled in odd numbers: control planes run as one-, three-, or five-node clusters. And Minikube provides the ability to add or delete nodes, which is a neat little feature; they abstracted away some of the more difficult aspects of it.

Scaling downward involves cordoning the node. Cordoning a node prevents future pods from being scheduled on it. While it's cordoned, the pods that are already scheduled keep running, but no new work can be scheduled on it. This is similar to the state you'd be in if you lose high availability on your control plane: everything continues to run, but you can't schedule new workloads, because essentially all of your nodes behave as if they were cordoned.

The next step is to drain the node of all the pods. Draining tells the scheduler, hey, find another node to put these pods on. It removes them and simultaneously attempts to schedule them on another node if there is availability; if not, they just sit in a Pending state. You'll be able to look at the pod, but there will be no node assigned to it. If you use -o wide, you'll see the node column is empty, it'll say no nodes are available to schedule to, and it'll wait until the scheduler finds it a node.

Pod disruption budgets may need to be set to avoid interruption, and that can cause issues with some of your autoscaling setups. So if you're going to do this manually, pod disruption budgets are an extra step that needs to be tested on a test cluster before you put it anywhere near production. I don't recommend it, but there are engineers who do scale up or down in production. Draining a node forces pods to be scheduled on other nodes. The final step, after we drain the node, is to delete the node, which removes it from the Kubernetes cluster.

Scaling Kubernetes nodes upward involves creating a node: we'd create a VM, put Ubuntu 24.04 on it, and install our Kubernetes flavor. The next step is to join the node to the cluster. We give it the cert so it can connect, because it uses a certificate along with the join token, and it tells the API server, hey, I am a control plane node, or hey, I am a worker. The step after that is to uncordon the node, if it was cordoned during startup, which notifies the scheduler to allow pods to be added to it. And we're up and running.

All right, upgrading Kubernetes nodes. Upgrading the Kubernetes version requires cordoning the node. Next you drain the node so that all of the pods are off of it. We then upgrade the node to the latest version that we're happy with, probably not a .0, maybe not even a .1, but a .2 onward. We then restart the node. Alternatively, you can just restart the systemd processes, but I recommend restarting the full node because that cleans everything out; imagine if you had a server running for two years and all you ever did was bump the Kubernetes version without actually restarting the server. Generally this is an existing VM that has been running for close to a year, so you want to fully refresh the server. You restart the node, and then finally you uncordon it. You'll run into issues probably 10% of the time when you do this, where a node just doesn't want to come up or join. That's why I don't recommend doing this with production clusters.
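Here's a minimal sketch of those three flows on a kubeadm-managed cluster; the node name worker-2, the API endpoint, the token placeholders, and the target version 1.30.2 are all illustrative, not values from the course environment:

# scale down: stop new pods, evict existing ones, then remove the node
kubectl cordon worker-2
kubectl drain worker-2 --ignore-daemonsets --delete-emptydir-data
kubectl delete node worker-2

# scale up: on the new VM, join it to the cluster with the cert hash and token
sudo kubeadm join 10.0.0.10:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash>

# upgrade: cordon and drain as above, then on the node itself
sudo apt-get update && sudo apt-get install -y kubeadm='1.30.2-*'
sudo kubeadm upgrade node
sudo apt-get install -y kubelet='1.30.2-*' kubectl='1.30.2-*'
sudo reboot

# once the node reports Ready again
kubectl uncordon worker-2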
Okay, cluster scaling and upgrading using infrastructure as code. For cluster scaling or upgrading involving IaC and a stateless workload, we create a new cluster with the latest tested version. The workload state can be copied onto the new cluster using declarative GitOps. After reaching the desired state, the DNS can be transferred to the new IP. You could have multiple gateways coming in, you could add multiple ingresses; you transfer your DNS to the IP addresses of the new cluster. You then monitor the stateless workloads for any issues, and you can transfer back to the original cluster if issues arise. The alternative is a cluster mesh: then you can simply drain the nodes, and if you have it set up correctly inside that cluster mesh, the pods will automatically transfer over to the duplicate cluster. Obviously it takes a fair amount of work to set all of that up to run in an automated fashion.

Okay, let's go right into the practical, shall we? Make sure we have a fresh Minikube profile, and this time we're going to create an HA Minikube cluster with containerd. Up until now, other than yesterday morning, we've been using the Docker runtime, so now we'll do this with containerd just to mix it up. All right, let's make sure we have a happy cluster. Is it happy? Yeah, the Minikube engineers obviously had too much time on their hands and a sense of humor at some point. All right, we'll get the status of the cluster and then go back to that happy profile.

Have you noticed anything interesting in the port? Typically, a Kubernetes cluster in HA production is on one of two ports: 443 or 6443. But the engineers decided to put Minikube on 8443, which nothing else is going to use. My guess is that somewhere along the way someone had a conflict inside a single VM with 443 or with 6443, so they moved it to 8443, which eliminates most conflicts. So if you're looking for 6443 or 443, Minikube is going to be on 8443.

Let's have some fun with this cluster. Now, my slides say to delete a control plane node. You can see that m02 is number two: it knows that it's minikube-m02 because the profile is named minikube and that node is zero-two. If we were deleting the primary node, we would call it just minikube, but it recognizes the others by what comes after the hyphen.
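A minimal sketch of this practical's setup and the first deletion; the --ha flag assumes a recent Minikube release that supports multi-control-plane clusters:

minikube delete
minikube start --ha --container-runtime=containerd
minikube status

# delete the second control plane node by its generated name
minikube node delete m02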
Check the status. Well, only two nodes now, and notice we have minikube and m03. Is it a happy cluster? Take a look at the pods real quick. Notice we just have the pods that were on the two remaining nodes; the pods that were on the original node we deleted, number two, are gone. Nothing there.

Okay, let's add a control plane node. So now we have three, so we're happy again. Let's take a look at the pods. Well, with an even number it's not going to be able to do a leader election; it can cause issues with leader election, correct. Although, depending on how the Minikube engineers set this up, they may have only accounted for three nodes in an HA cluster. We could definitely test that for fun, though. You want to go ahead and add one more, and let's test it, and two more? All right, it's happy with four nodes, but it probably won't be able to do a leader election, because it's an even number. Well, it's a lottery, so it depends, but go ahead and add a fifth one. Let's see if it's happy or if anything changes there. Oh, happy with five. So there you go, now you have a five-node control plane.

Now, what is the difference between three and five? We haven't talked about that, because it's an advanced concept and you probably won't run into it anyway, but what is the difference? Well, if we have three control plane nodes in an HA setup and we lose one, we still have two left, but they can't elect a leader, because there are two: each one votes for itself on the first round, and then they might each vote for the other one on the second round, so they keep electing each other. You can't schedule anything new, but you can continue to run your existing workloads. And if you drop down to one node, you're going to start to lose pods, because it's going to be too much for that node to handle by itself; it went from high availability to no high availability.

But if we have five and we lose a node, we still have four, so we can still schedule, and it will eventually, probably, elect a leader, though there is a very remote chance that it does not work with four. And if we lose another node, we can still schedule pods because we have three. We have to lose three nodes before we can no longer schedule pods: three nodes have to come offline on your control plane if you're running a five-node, high-availability control plane. Now, it requires a lot of additional work to go from three to five, but it can be done, and it does give you extra resiliency if you think you might lose one of your bare-metal instances.
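A minimal sketch of how we grew the control plane in this exercise; minikube node add --control-plane relies on a Minikube release with HA cluster support:

# add control plane nodes back one at a time (run once per node)
minikube node add --control-plane
minikube status
kubectl get nodes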
Correct, yeah. So once you lose one, you can't schedule, because you no longer have high availability, but you can continue to run; and if you lose a second one, then you may start dropping pods. So that's the way you run things with your own orchestrator. Typically, if you're running, say, no more than 20 nodes, a three-node control plane is fine. But when you start getting above 100 nodes, that's when I start telling clients to consider running five control plane nodes. Why don't they? Because, well, I have to pay for those control planes; I have to spin up VMs that use resources. Why am I doing that? What do I gain? Well, with five you can lose two and still schedule new workloads, and you can lose three and still maintain your existing workloads.

A hundred, yeah; once you get beyond a hundred nodes. You start getting into that complexity when you have above 100 nodes, so I define clusters as above 100 nodes and below 100 nodes. The majority of clusters are below 20 nodes total, including the control plane; that's your vast majority of clusters, probably 95% are going to be below 20. Then you have this area where you start moving up towards 100, where a three-node control plane will still work fine. But when you get above 100, you're probably going way above 100, right? That's kind of the nature of Kubernetes: you could have 500, you could have a thousand, and then you need to be running a five-node control plane. It has to do with the leader election and how all of the services on the control plane work: once you lose one node and you just have two, an even number, issues arise that keep it from scheduling. Now, that doesn't mean it might not resolve itself for a couple of hours and allow you to schedule pods, and then all of a sudden go back into "we can't communicate, we can't figure out who the leader is." I wouldn't count on that. It's kind of like when you set a limit for a pod or a container and you're able to exceed that limit for a minute before the limit kicks in; same principle, I wouldn't count on it. But for sure, a five-node control plane setup will give you more resiliency than a three.

Okay, so now we need to remove two of those. Yeah, it's in there somewhere: minikube node delete, and we give it a name, delete m06, and then m05. They've abstracted away some of the difficult stuff, because normally we'd have to cordon, then drain, then delete, and they do that for us. Part of that Minikube magic. And DaemonSets are automatically on each node, so when you drain, you ignore the DaemonSets. If you're just removing a node, that's fine, but if it's a node you're going to be working on, the proper sequence is to cordon first and then drain. The cordon might not actually do anything, because the scheduler might say, hey, I don't have anything on the schedule for your node, so that was a waste of effort. But yeah: cordon first, give it a second to settle so it can pick other nodes to put pods on, and then start draining.

All right, let's see if this works here: let's add a worker node. And now let's have some fun. Let's deploy a Minikube cluster: ensure a fresh Minikube environment and deploy a cluster with the Cilium CNI. Notice we have six nodes. I'll be right back. It's Docker magic, you know, Minikube engineering magic also. Let's see where we're at here. Looks like we've settled out. Do we have a happy cluster? Hey, she's happy. Let's check the status real quick. Yeah, there we've got three control planes and three workers.
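A minimal sketch of the steps we just ran; the node names follow Minikube's naming, and the --ha and --cni flags assume a recent Minikube release:

# trim the control plane back down by node name
minikube node delete m06
minikube node delete m05

# the manual equivalent of the cordon/drain/delete that Minikube handles for you
kubectl cordon m05
kubectl drain m05 --ignore-daemonsets
kubectl delete node m05

# add a worker, or start fresh with a six-node HA cluster using the Cilium CNI
minikube node add --worker
minikube delete && minikube start --ha --nodes 6 --cni cilium --container-runtime containerd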
Now we're going to look for a control plane node with the cilium-operator on it. So we'll do get nodes -o wide... oh, whoops, not nodes, pods, my bad, and minus A. There we go. Right, where's minikube? All right, let's delete it. Oh, if you only knew how many weeks it takes to build all that out. It's amazing. Let's check the status of the cluster, let's take a look at the nodes, and is it happy? It doesn't say degraded, does it? It just says Running. Hmm, interesting.

All right, so let's have some fun here; I like breaking things. Go ahead and Ctrl-C out of that. Obviously you can run minikube profile list: you have six nodes running, but something's not right. There's all kinds of stuff going on. You can look at the nodes; let's look at the pods. What we have running is running, but we don't have all of our nodes. So what happened? Minikube with Cilium and kube-vip HA is not really HA. There's a lot more to configuring HA than just a basic install. This is similar to what can happen in a production HA cluster that's not fully configured to handle failures.

Right, so let's review. In lesson 9 we learned how to scale and upgrade Kubernetes clusters. We learned that control plane nodes must be scaled in odd numbers; not that they won't work for a few minutes at an even number, and it will work with four, while it does not work with two. Scaling downward involves cordoning, draining, and deleting, and scaling upward involves creating, joining, and then uncordoning. We learned that upgrading a node requires cordoning the node, then drain, upgrade, restart, and finally uncordon. Cluster scaling using IaC is easier with stateless workloads: it's as simple as creating a new cluster with IaC, setting the workload state with declarative GitOps, and, after reaching the desired state, transferring DNS to the new cluster's ingress and then monitoring the new cluster for any issues. We learned how to scale a Minikube HA control plane down, how to scale a Minikube HA control plane up, how to scale a worker node, how complex HA setups may be more difficult, and what happens when the primary control plane goes offline. What you just saw there was that minikube, the first node, was our primary control plane and went offline; that's part of the problem we just saw. kube-vip and Cilium are not configured correctly here for HA, so it works great if you take out m02 or m03, I think, but when you take out m01, which is the node just called minikube, and then add one back in, we're missing a few things.

All right, any questions on lesson 9? All right, let's jump into lesson 10; I'm not sure which environment we'll need. Oh yeah, for sure: analyze and troubleshoot Kubernetes issues. Rather than devote a large lesson to 10, I've incorporated much of this lesson into all of the other eight lessons, not so much nine, although we did have a little bit. In lessons one through eight I incorporated analyzing and troubleshooting so that you could see it in real time, and I think that's better than just studying the theory. So this is kind of a recap of some of it. kubectl is the primary CLI tool for obtaining cluster information.
And we know that when the kube-apiserver is unavailable, we're going to be unable to use kubectl, right? We're going to be very limited in what we have access to: if the API server is unavailable, kubectl has nothing to connect to. So in that case, when we're analyzing, remember that events provide important information for one hour and then they're gone as well; they're deleted in order to save space. Logs can provide important information after a container starts up, but we don't have logs while a container is still being created; during ContainerCreating there's nothing there other than events. The other problem is that logs are deleted if the container is deleted, so when a job deletes a container and starts a new one up, the logs are gone. Shipping events and logs can save valuable time when troubleshooting.

Node analysis is important for analyzing resource constraints, and resource constraints come in many forms. You can run out of CPU, memory, or PIDs, and that's the one that gets people. Inodes also. PIDs and inodes: two that developers don't even think about. So when your DevOps personnel are deploying their workloads and you can't figure out why the database is crashing or what's going on, those are the two that stay hidden; they just sneak up on them, PIDs and inodes. Inodes are on your database storage: the filesystem only allows so many, it's essentially a file count. You get a generous inode allocation, but if you have a lot of writes of very small files, it'll blow up your inodes after a year. So everything's running along nicely on your application, and after a year, all of a sudden your database crashes. What happened? I didn't do anything, I didn't update it, we were at 100 meg yesterday and we're at 101 meg today, and we had 200 meg available, or whatever size your data is. I don't understand it; why did it crash? If you start looking at inodes for MySQL or Postgres or whatever, you'll find that your inodes have probably been exceeded if you have that type of application. That's one of the constraints that will creep up on you.

Another constraint can be node taints, right? So we taint a node, and a workload has to tolerate the taint to be scheduled there; if it doesn't tolerate it, the taint repels it. So we need to check that.
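A quick, minimal sketch of how you might check for those hidden constraints; the node name, pod name, and mount path are illustrative:

# node conditions report MemoryPressure, DiskPressure, and PIDPressure
kubectl describe node worker-2 | grep -A 8 "Conditions:"

# inode usage on the node, or inside the pod that holds the database volume
df -i
kubectl exec -it my-postgres-0 -- df -i /var/lib/postgresql/data

# taints that workloads would need to tolerate
kubectl describe node worker-2 | grep -i taint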
And then there's node pressure: disk. If we're not properly managing our node logs, that can creep up, depending on how much you've allocated for your node storage, especially on control planes. We generally don't allocate a lot of storage to control planes; control planes are an expense, they don't generate anything because they're not running your workloads, so we try to be as resource-friendly as possible.

All right, pod analysis is important for analyzing pod constraints. Is there a node to deploy on? Are there any taints that need to be tolerated? Is the image repository available? Do we have a credential issue? Is the image repo down? Did the upstream repo change? They've been using a repo for the last five years and all of a sudden, with no notice, they're using a totally different one and we can't pull an image. Is the image able to be pulled given our credentials, where the repo is there but we don't have credentials? Are there enough resources allocated to start the pod? The node Ready state: the node must be Ready or no pods can deploy. Our pod Ready state must be Ready for workloads to function. Services need endpoints to connect to, even headless services; headless uses DNS, so instead of going through kube-proxy it goes through the DNS server to connect to the service, but it still needs endpoints. Ingress needs endpoints to connect to as well, just like when we looked through that ingress and the endpoints were empty, and even when it connected to the service, the service endpoints were empty. Labels enable the ability to query all resources by label, which can be a time saver, and namespaces enable the ability to query all resources by namespace.

So, we'd do a practice exercise here if we have time; it depends. This was designed to basically go through everything we've learned, spin it up again, and see how much you've retained. It's about 2:40. If we take our last 15-minute break, that puts us at 2:55, and if we skip this, it puts us at around 3 o'clock, and then we can spend the last 45 minutes, because we have to do the review before 4 o'clock. Did you receive the review from Neil? Okay, because he wanted to make sure that was completed before 4 o'clock, so they'd like to do that between 3:45 and 4. Okay, so I'm Eastern. Oh, interesting. So for me, he wants it filled out, and I guess you email it to me, is that how? Okay, yeah, he says have it filled out around 3:45, so 15 minutes before the end of the course. Are you Central? Huh, interesting. I'll be right back; had a little interruption.

So we can do a quick review, then we can go on break, and then lesson 11 is where we take everything we've learned and apply it with Helm charts. We're going to learn Helm charts and then take everything we've learned and put it into Helm charts. The Helm chart piece is involved, but it's fun. Helm, to me, is where Kubernetes comes together. Okay, so let's skip this.

Proper analysis and troubleshooting starts with events and logs; shipping events and logs can save valuable time. kubectl provides feedback on YAML formatting errors. That's kind of nice, but it's one of the very few pieces of feedback you will receive on YAML formatting. So if you're using Helm charts, you may not receive feedback that's helpful at all.
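A minimal sketch of ways to surface those YAML problems before anything hits the cluster; the manifest, chart, and release names are placeholders:

# kubectl reports YAML and schema errors without applying anything
kubectl apply -f deployment.yaml --dry-run=server

# for Helm charts, lint the chart and render the templates locally
helm lint ./test-app
helm template cool-app ./test-app -f values.yaml > rendered.yaml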
And if you have a thousand-line values file, shipping events and logs can save valuable time, because Helm may not provide descriptive feedback on YAML errors. Start with the pod and any controllers, whether it's a ReplicaSet or a Deployment controller. Next, the Service, if there is one. An Ingress or Gateway may have multiple components to analyze; especially when you get into Gateways, your GatewayClass may be incorrect, or your HTTPRoute may not be written correctly and may not be connecting to the Gateway. And always remember to test on a development or production-like cluster, especially when working with Helm charts. All right, let's go ahead and take our last 15-minute break and come back just before 3 o'clock. How's that? Yeah, absolutely. All right, I will see you at 3 o'clock, ready to continue. Oh, okay. All right.

All right, now we get into the fun part. This is where everything we've learned so far kind of gels, because Helm charts are the production way of deploying manifest files. You can certainly deploy the way we have been, with individual manifest files for different containers and workloads, but Helm templating provides a uniform method of doing it. What were you going to say? Let's jump into it here.

Helm templating follows a specific layout for standard charts. Within the chart folder you will find several required folders and files. The Chart.yaml file lists both the chart version and the app version; the chart version is self-descriptive, it's the version of that actual Helm chart, and the app version relates to whatever workload is running and being controlled by that chart. The values.yaml file contains the user-defined variables; note that it's a lowercase v, whereas Chart.yaml is uppercase. The templates directory contains manifest file templates; templates are designed to read the variables in the values.yaml file and rarely contain user-defined variables themselves; if they do, it's generally to trigger something. Then we get to the charts directory: the charts directory is optional and may contain upstream charts, so it may pull in dependency charts, and those upstream charts are updated within that charts directory.
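For reference, a minimal sketch of the layout being described (helm create also adds a few more template files); test-app is the chart we build in the practical below:

test-app/
  Chart.yaml        # chart version and appVersion
  values.yaml       # user-defined variables (lowercase v)
  charts/           # optional: upstream/dependency charts
  templates/        # manifest templates that read values.yaml
    _helpers.tpl
    deployment.yaml
    service.yaml
    ingress.yaml
    NOTES.txt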
All right, Helm templating. Helm templating follows a specific design pattern. It is designed to enable DevOps personnel to modify the chart, meaning the modifications are to the values.yaml file only, and the rest of the chart is maintained by upstream chart captains. That is the actual term someone chose a long time ago; I think when the word container was chosen, they decided to go full nautical with everything, so that's where "chart captain" comes from. The chart is versioned and installed by latest or by version number.

Helm issues: upstream providers often do not have a full-time Helm captain. This leads to upstream personnel maintaining the template, and errors can arise due to a lack of templating knowledge. Common errors: headless services, I see that one a lot, ingress, ports, and image version. Oh, by the way, a good practice, and I did not cover this, is to install an ironclad firewall on your Kubernetes nodes so that you can catch all of the ports that are being used that you had no idea about. What I mean by that is upstream maintainers will release a container with a Helm chart, and it might use three different ports: metrics, ingress, and maybe a communication port for high availability, or it'll have a liveness probe port. Then all of a sudden they'll release a new chart with a new version of the container and it's using two new ports. What are those ports? And when you communicate with them, sometimes the answer is "we're not using two new ports," and then, well, here you go, take a look, and they realize they've incorporated something that they didn't even realize was using two additional ports.

So yes, I install firewalls on all of my nodes: worker, control plane, and storage. You can use UFW and install it with Ubuntu, and then you will find that your container doesn't finish spinning up, the pod doesn't become ready. So why is that? You look and it says it can't reach something. So you go onto your node and you query the latest firewall blocks, and it will show you whether the blocked traffic was incoming or outgoing, and then you can see what port that container is using. You can go and try to look it up in the manifest file, but oftentimes it won't even show that port in the manifest. So there's no way to know, unless you have a firewall, that it's actually communicating over a specific port, whether it's going out through a port and coming back into another node for that same container running on a second node on a different port. And then it allows you to open only the ports that you feel are necessary. So if a container is shipping metrics, for example, and I didn't authorize metrics to be shipped to a foreign country, and some of these open source projects are actually maintained in a foreign country, you can block that port so no metrics get shipped to their upstream data collection project. So yeah, firewalls on nodes, they're very, very important.
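A minimal sketch of that firewall workflow on an Ubuntu node; the allowed ports here are only examples, not a recommended policy:

# enable UFW, allowing SSH and the API server port first
sudo ufw allow 22/tcp
sudo ufw allow 6443/tcp
sudo ufw enable

# after a pod fails to become ready, see what UFW has been blocking
sudo ufw status verbose
sudo grep 'UFW BLOCK' /var/log/syslog | tail -n 20

# then open only the ports you decide are actually needed
sudo ufw allow 4240/tcp   # e.g. a health-check port you identified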
So, ports, and then image version is another common problem area. You'd be amazed how many times the Helm chart ships with the old version: the Helm chart has a new version, but the actual container is still the old one. And again, a lot of that has to do with the team not having a Helm captain, so they're not familiar with every step for updating the Helm chart.

All right, Helm solutions to issues. If a Helm chart is not correct and it does not work, your choices are to roll back to an older version that worked, which may not be possible due to issues with the older version, or to modify the Helm templating within the chart to fix the issues. If you understand templating, you can do that; however, this leads to a requirement to maintain your own version, and that requires that you have your own chart repository. Common repositories are GitLab and GitHub; there are newer ones out there as well. However, every time the upstream releases a new version, you have to compare charts for any changes and pull them in manually. It is a very tedious process to maintain a downstream chart. All right, any questions?

Okay, so we're going to ensure a fresh Minikube profile. Yeah, just a single node for this exercise. All right, so now we're going to create a Helm chart. We have Helm installed; we checked the version yesterday. The command to create a Helm chart, and we're in the root directory, is just helm create, and we'll call it test-app. Now we'll ls and find it, cd into it, and what do we see in there? Let's take a look at values.yaml. Do you see anything familiar in here? Let's scroll up to the top and read it from the top down. Got a few lines in there, don't we? And you also have the delay, because of the data center they're running this on.

We can see we have a replicaCount of one. All right, that's interesting; file that away. And we can see we have our image, very similar to the template we worked with before. Now, there is no private registry, because we're pulling nginx from a public registry, which it's going to assume is Docker Hub, probably; if it's Quay.io, you generally need to put quay.io in front. Let's see here: nothing in imagePullSecrets, no secrets to pull from a private registry. That dockerconfigjson, if we had created one for a private registry, would go in imagePullSecrets, by name, making sure it's in the proper namespace. You can see this chart doesn't have anything set for service accounts, and if we scroll down a little more we see podAnnotations, nothing in there; podLabels, nothing in there, that's where we could label our pods; podSecurityContext, nothing there. We have a service, port 80, type ClusterIP; let's see if that service gets created. Going a little further, it looks like it does create the service. So it's going to create pods, one pod, and it's going to create a service pointing to port 80. And is it going to create an ingress? Correct: it's false.
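A minimal sketch of that step and the values we just walked through; these are the defaults that helm create generates, abbreviated:

helm create test-app
cd test-app && cat values.yaml

# values.yaml (abbreviated)
replicaCount: 1
image:
  repository: nginx
  tag: ""            # falls back to the chart's appVersion
imagePullSecrets: []
service:
  type: ClusterIP
  port: 80
ingress:
  enabled: false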
Now, this leads to an interesting question. If we are using ingress and we want it to handle the TLS cert in its own ecosystem, its own container, we can use the ingress. However, I generally do not use ingresses anymore, because I use Gateway API, so I terminate TLS automatically at the gateway. In fact, in my Helm charts, and I'm a Helm captain, I build Helm charts from scratch, I don't actually have an ingress anymore; I have HTTPRoutes, and so I had to create my own templating from scratch for HTTPRoutes for Gateway API, and we'll see that in a minute, because that doesn't really exist in the Helm ecosystem yet. So that's something to keep in mind as Kubernetes goes from Ingress to Gateway API.

You can see our resources section: we don't have any constraints, but it's there so you can look at it. Going down a little further, we've got our probes, and they're probably empty; let's see, yeah, they're just hitting the path. No port? The port is http, but they're just hitting the path, I guess; usually I see something else there beyond path, hitting a probe endpoint or something like that. We'll find out if it has probes. After that, autoscaling. Oh, interesting, that can be enabled: minReplicas is 1, maxReplicas is 100, and it targets a CPU utilization percentage of 80. We didn't demonstrate that because it's more of an advanced concept, but essentially you could set up autoscaling, add those values, and test it out. Additional volumes, if we want to attach a volume; we could attach a secret as a volume, for example, along with the volume mounts. See the nodeSelector down there, and right below that, tolerations. On a lot of Helm charts where they don't have a Helm captain, those won't do anything: you can fill them out, and it will not translate up to the template, which we'll look at in a minute. That's because they don't have a Helm captain and they don't know how to wire up what you're putting in there for the value. So there's no way to force that pod to run on a specific node, there's no way to tolerate a taint, and you're stuck running that workload on whatever node is available. And then our affinity section is for pod affinity, pod anti-affinity, node affinity, and node anti-affinity. You will find that some upstream teams don't know how to work with nodeSelector but do know how to work with affinity, so everything they do is affinity, anti-affinity, or no affinity. Usually you have to propose to them how to modify their Helm chart to enable nodeSelector and tolerations. Again, limiting factors you'll run into with upstream Helm charts.

All right, let's view the Chart.yaml file. Here we have our two versions: the chart version, 0.1.0, and every time you change anything in the chart, that should increment. The appVersion should follow the container version that you're pulling in, so if your container is nginx 1.17, then the appVersion should also read 1.17, and the values file and the templating should automatically read the appVersion for the container tag.
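A minimal sketch of that versioning convention in Chart.yaml; the nginx version is just an example:

# Chart.yaml
apiVersion: v2
name: test-app
version: 0.1.0        # bump whenever anything in the chart changes
appVersion: "1.17.0"  # tracks the container version the chart deploys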
cd into charts and take a look at it. This folder is for pulled-in upstream charts: if you have a chart that depends on other charts to run, they go into this folder. cd back out of that and cd into templates, and let's describe the contents here. We've got a NOTES.txt that tells us a little bit; sometimes it's descriptive, sometimes there's nothing important in there. Then the helpers file. What's that? We can go through them one at a time, yeah, absolutely. Okay, so NOTES.txt is what gets output for this chart when it loads. Okay, you can get out of that. Helpers; I think there's an underscore there: _helpers.tpl. This is a helper which creates variables that are then used throughout the other templates. The helper creates a variable from something, oftentimes pulling it from the values.yaml file, and makes it available to the other templates.

The deployment template doesn't look at all like what we typed out earlier, does it? This is Helm templating. You notice it's designed to read the values file; this is a very simplistic Helm chart. But if you scroll down to the bottom, yep, here we go: nodeSelector, with .Values.nodeSelector, and it runs it through toYaml. So whatever you put in that nodeSelector value, it converts directly to YAML, indents it, and puts it right there. That is what's actually missing in a lot of upstream charts, just because the team doesn't know how to put it in there and apply it. It's the same with tolerations: a chart might have nodeSelector but no tolerations, so if your node is tainted, there's no way to deploy to it.

All right, let's do the next one: this is your HorizontalPodAutoscaler. And the next one is our ingress; you can see it uses ingress API version networking.k8s.io/v1, and there's no namespace handling in it. Interesting; a very, very basic chart. Okay, the next one: it's going to create a basic service, again with no namespace, making it very simple to deploy. And the next one: we won't use service accounts, because we didn't cover that; it's an advanced topic, RBAC, service accounts, et cetera.

All right, so now we can change back to the root folder, and we're going to create a namespace called test-app, and we'll make sure that it's created. Okay, now we're going to deploy the Helm chart using the helm install command. The format is the command, as it's written there: helm install, the deployment (release) name, the chart name, and the namespace with -n. So how would you write it? I'll give you the first hint: we're going to start with two words, helm space install, and then there will be a space, and we can give it a cool deployment name; we can call it cool-app or something like that, whatever you want.

It won't actually work from inside the test-app directory; it'll give you an error. Well, let's run it, I was going to say it'll be fun to see what the error looks like; we might as well see it. So we'll do helm install, call it cool-app, then test-app, then namespace test-app. And see, it can't find the chart, because you're inside the chart directory. So let's go out to the root and do the same. No, let's change it: helm install, then the release name, then the chart. My bad, we named the chart test-app, but it can't find "test"; the first argument is the release name and the second is the chart, so the first one should be cool-app. So we're naming the release first and then we're pointing it at the chart. There we go; this should work.
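A minimal sketch of that deployment, run from the directory that contains the test-app chart folder; cool-app is just the release name we picked:

kubectl create namespace test-app

# helm install <release-name> <chart> -n <namespace>
helm install cool-app ./test-app -n test-app

# confirm what got deployed
helm list -n test-app
kubectl get all -n test-app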
All right, remember what we saw in the NOTES.txt file: it grabbed what it needed through templating and output it here. It says to get the application URL by running these commands; you may have to run them one at a time, I'm not sure, it looks like four commands there, and they don't have backslashes. So you're exporting these first, export both, then the echo, and then run the kubectl port-forward, yeah, just the last line. I think that's correct. Oh, cool, it's forwarding. You could open your browser and test it, see if it works; I have no idea if that's going to work inside here, though. So it's port forwarding; if it'll let us open a browser and connect to 127.0.0.1:8080... bingo. Port-forwarding magic. All right, we can cancel out of that.

Let's take a look at our pods, and let's look at how to get all the pods and services in a single command for that namespace. We've got our pod, cool-app-test-app; you can override that and call it whatever you want, but it's automatically taking your deployment name and the chart name and then, let's see, it's got a ReplicaSet, so it's adding a ReplicaSet hash and a pod hash. We can see our service is running on port 80. Let's take a look at that service: we have an endpoint, a single endpoint. And we have no ingress; you can test it with kubectl get ingress, it should not have deployed one, and I would do -A. All right.

Now let's list the deployments, all of the Helm deployments. There we go, that is our deployment; it tells us what chart version we're running, what app version, everything. Now, how do we get rid of it? helm uninstall, which is the command, then the name of the deployment, and then the namespace. Cool. Now let's check that it was removed, so we'll do helm list; yeah, I believe it's helm list. Yep, there we go. Everything's gone now.

So now let's redeploy it, but let's adjust the values.yaml file with vim to enable the ingress. We have to cd into the folder, then open values.yaml and scroll down to the ingress block. We'll change enabled to true, and we'll give it a host of nginx.example, because that's the DNS entry we have in our Ubuntu hosts file. All right, that looks pretty good. You can set resources, you can do whatever you want, but this will at least enable our ingress to pop up, I think. We'll see if there are any bugs. This chart actually didn't work for several years, which was frustrating for individuals trying to learn Helm. They'd say, you know, even the tutorials don't work, and I'd say, yeah, there are a bunch of errors in the tutorial Helm chart. I couldn't figure out if that was by design or if they just hadn't maintained it for several years.
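A minimal sketch of the teardown and the values change we make before redeploying; nginx.example stands in for whatever host name is in your hosts file:

helm list -n test-app
helm uninstall cool-app -n test-app

# values.yaml: enable the ingress before reinstalling
ingress:
  enabled: true
  hosts:
    - host: nginx.example
      paths:
        - path: /
          pathType: ImplementationSpecific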
All right, so now let's redeploy. Go back, yeah, there you go, and test that. Oh, look at that: it still has a mistake, because, well, okay, they could have done that better; they could have provided that information the first time instead of you having to run it yourself. Oh well, maybe it's a little more comfortable now. So let's try it. I don't think it's going to work. Did it work? It won't connect... oh, yeah, take out the s, use http instead of https. Let's see what happened. Yeah. All right.

So now let's ensure a fresh Minikube environment; we're going to install a CNI-free cluster. Neil pinged me a little while ago and said to have you fill out the form from 4 p.m. to 4:15, Eastern time; at the end, with the 15 minutes afterward. Okay. Yeah, there was your disconnect; I think you have that one figured out. Right. And, oh, it's coming up. There's no network, there's no CNI, so CoreDNS isn't going to work; nothing wants to connect to it.

Okay, so now we're going to connect our Helm instance to an upstream Helm repo. We're going to use the Cilium Helm repo. The command is helm repo add, then a name, and then the repository URL. The upstream Cilium repo is usually listed on the GitHub README file for that Helm chart repo. So we're going to add Cilium: helm repo add cilium, and then https://helm.cilium.io. Cilium has their own self-hosted repo. All right, verify the Helm repo was added. And now verify what is available in our Helm repo list for installation: what do we have available to us if we want to install something? We've got cilium/cilium and cilium/tetragon, two Helm charts. cilium/cilium is 1.17.5 for the chart, and the app is also 1.17.5, because they keep the chart version the same as their app version. Actually, it's because they want to keep things simple, and they're differing from standard Helm templating, where your app version tracks your container and your chart version tracks your chart. What they do is they don't push a chart unless they have a container change; but if they have a mistake in a chart, what happens then? In that case they, I guess, delete the chart and re-push it. Anyway, that's just the Cilium team doing their own thing.

All right, let's get all versions that are available to us in the repo now that we're connected to it; this actually goes out to the internet, since all we did before was grab the Cilium index. Scroll up and see all the different versions that are available to install. So if you're running into issues and you say, hey, all of these newer versions don't work but this old one does, we can install it.

Next, every time we add something to the repo, and every time we get ready to install a chart, we need to update the repo to pull in all of the new information. So, helm repo update, and now we've pulled it in; that's what's available. Say we hadn't run this for two weeks and they'd released a new version: we would need to helm repo update. And it says "Happy Helming." All right. Now let's show the values for cilium/cilium, and scroll all the way up. The thing is, they don't have line numbers, and you can see just how many lines are in this thing.
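A minimal sketch of that repo workflow; the repository URL is the one Cilium publishes for its charts:

helm repo add cilium https://helm.cilium.io
helm repo list
helm search repo cilium              # shows cilium/cilium and cilium/tetragon
helm search repo cilium --versions   # every chart version available
helm repo update
helm show values cilium/cilium | less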
Okay, so here's what we're going to do: we're going to cat the values out to a values.yaml file for editing. So we run helm show values again, cilium/cilium, and redirect it to values.yaml. Yeah, obviously we only need to do this once. And now let's open the values.

All right, start at the top. That's nice. Okay, you don't want to edit this whole file, because of the way they have the templating set up. Rather than editing this, what you would do is create your own values.yaml and follow the same format. Can you see my cursor moving on your screen? No? Oh, interesting, all right, let me try. Okay, can you see me now? Okay. So when you create your own values file, you want to keep the same formatting. Let's say you wanted to change commonLabels: you would create just a values.yaml file, and you might start it with just that block, commonLabels, add your label in there, and make sure you maintain your indenting. Okay, I'm going to change you back to view-only; I don't want to mess up your terminal. Give it a second here and you should be back. Did I lose you? Oh, there you go, okay, you're back now. Okay, I can see you.

Okay, so we can kind of go down through it. It's got a debug mode: if you run into issues with Cilium, turn on the debug mode and it'll help you solve them. That's a nice feature, somebody thought ahead. We don't need upgrade compatibility, though there was a time when you needed it; I don't need that today. Let's see, scroll on down, and it lets you turn on debug for specific containers. It's got RBAC controls and secrets settings; well, we would use secrets with Cilium, but we're not using them now. Let's see here, configure iptables, no. And scroll on down: cluster, see, we have cluster in there. Scroll down a lot more, toward the bottom: there's Hubble relay, create true, and Hubble UI, create true. All right, let's take this thing and install it; let's exit out of this.

Okay, this is going to be a little different, because remember we want version 1.17.5; that's what we catted out. So we're going to do helm install cilium cilium/cilium, because remember that's what it's called: we're going to call it cilium on our machine, but cilium/cilium is what it's called inside the repo. Then -f for the file, values.yaml, because that's our values.yaml; we're going to tell it we want version 1.17.5; and then the namespace is going to be kube-system, just to make sure it can communicate with the DNS proxy and to eliminate any problems on the first startup. But who knows, because it's Minikube, and it's your version of Minikube versus my version.
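A minimal sketch of that install; the version pin and namespace follow the class example, and values.yaml is the file we just catted out:

helm show values cilium/cilium > values.yaml

helm install cilium cilium/cilium \
  -f values.yaml \
  --version 1.17.5 \
  -n kube-system

# watch the pods come up
kubectl get pods -n kube-system -w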
All right, let's watch for the pods: ContainerCreating, CrashLoopBackOff. Oh, that's CoreDNS, because it doesn't have anything to connect to yet; we don't have a CNI, so it says, hey, I don't have a network address. So Envoy's up. What else is there? Nothing else. Yep, because we only have one node. So the operator... remember, is that a DaemonSet? Let's check real quick; let's get daemonsets and see if it's listed. Cilium itself is a DaemonSet, interesting. Well, let's see if the operator's up yet.

I wonder why it's pending. Yeah, let's look at it and see. Well, CoreDNS is running now, so we've got a network; we've got a container networking interface now. Oh, it's on the same port: the node didn't have free ports for the requested pod port. So it's trying to use a specific host port; 9234 would be my guess, could be 9963, I see a bunch of ports listed there. It doesn't have enough free ports, so it can't run both operator replicas on the same node. But if we had two nodes, they would install just fine, because it would automatically bump the second one to the next node. Or it should. So you could fix that.

But this gives you an idea of how to spin up Cilium; it's a very simplified version. You can feed it a VIP from the kube-vip load balancer: if you installed kube-vip on your control plane in high availability, then you take your kube-vip VIP and feed that into the Cilium values.yaml file; it has a line in there for your VIP, and you give it the VIP. And so, like that, you can create your own values file and install your own Helm chart.

Now, we're almost out of time, we have five minutes left, but if you're done with that, let's look at the deployment. Let's check our deployment again with Helm; let's see what we have installed. It's helm list, and I'd have to scroll back a bit. All right, so you can see it's deployed; it's telling us the app version, the chart version, and when it was updated. So you can change your values file and then, instead of helm install, you can do helm upgrade with a similar command, and it will apply your new values file and restart it. So you could install Cilium, then install kube-vip, then get your VIP, feed it to Cilium, and restart Cilium. Again, chicken and egg, and we discussed that earlier. There's a sequence here, and you just update Cilium to give it the VIP afterward, because Cilium is going to install before the VIP exists, since you need a CNI in place first.

So there you go. It looks like we don't have any errors, but let's check real quick: get all -n kube-system. I don't think we have any errors, but yeah, cilium-operator. Right, because we don't have the same ports available twice on the same node; we would need two nodes for that. Now, you could set the Cilium operator replicas to one in the values.yaml and then reapply that with helm upgrade, and that error would go away; you'd just have one of one for your deployment.
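A minimal sketch of that follow-up; it assumes the operator replica count lives under operator.replicas in the Cilium values, so check your chart version's values before relying on it:

# values.yaml: run a single operator on this one-node cluster
operator:
  replicas: 1

helm upgrade cilium cilium/cilium \
  -f values.yaml \
  --version 1.17.5 \
  -n kube-system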
Yes, you could add a worker node. I would add a control plane node, but the problem with another control plane is that they won't be highly available, because you don't have kube-vip. So I would do a worker node and see if it forces that replica over. But yeah, definitely, you could spend a lot of time playing around with this and getting to know Cilium and all the nuances of Minikube. And then I would do kube-vip with its chart; I wouldn't mess with the Minikube one. In this instance, I would see if you can get kube-vip to run from a chart, get it to feed you the VIP, then feed Cilium the VIP and reinstall Cilium. Then you would have a high-availability cluster. But you'd have to create three of everything for Cilium, so you'd need three Cilium operators, and then when you do your Hubble relay and all of that, you need three of everything. As you go in and start adding all of the other Cilium pieces that were in that values.yaml file, you'd want to make sure you ran them all in high availability, and then you would force them all to the control plane nodes, so that they run on the control plane, because they're part of the networking.

All right, we can remove this. Now immediately look at your pods; scroll up and... oh yeah, they're already gone. That shows you how quick Helm is: it knocked them all out immediately.

All right, so let's recap. If we'd had time, we were going to do a few more; Longhorn would have been a fun one. So, we learned how Helm templating works, how a Helm chart is structured, how to create a Helm chart, how to add an upstream Helm repo, how to view all the repos in your Helm instance, how to view all versions in a Helm repository, how to install a Helm chart, how to modify a values.yaml file, and, by catting it out, what you need to do to create your own values.yaml file.