Okay, all right. Yep, we can go ahead and get started.

All right, so my name is Lane Johnson, and I've been working with Kubernetes since 2016. As a government contractor, the government encouraged us to learn how to use the new Kubernetes that Google had released. We used a lot of containers back then, and it was very difficult to manage them, and so I began learning Kubernetes. We actually had a competition in Tampa among government contractors, and it was the first time anyone had ever built a Kubernetes cluster in that area, so it was a nice tech challenge that we took on. So I design, engineer, and build production-grade clusters, and that's what I do full time right now: I just build clusters, and I use infrastructure as code for all of them. I instruct my clients on cluster concepts and operations, and typically I instruct them on the cluster that I built for them. I will deploy it in their environment, which is typically bare metal, and then I instruct them on how to operate it and how to modify it, so that they learn on the cluster they're actually going to be using. And then I also assist clients in troubleshooting production clusters. I have a decent client list who like to build their own clusters, and then they run into problems, and we'll get into some of the issues you can run into with a cluster, and they reach out to me for help in troubleshooting.

Yes. Yes, now sometimes they're coming from the cloud, and they have cloud programs that they're containerizing, so they're not necessarily built for Kubernetes. It might be an old cloud instance that uses, say, 20 cores for a Django server, right? So I might assist them on how they would containerize that, and they'll containerize it into a pod that, for example, needs 20 CPU cores. So the containers that they're running aren't necessarily built for Kubernetes, but they're wanting to run them on it.

All right, so we have 11 lessons, and we'll go through these real quick. The first one is understand the Kubernetes architecture and its components. This will be a very comprehensive lesson that will take quite a while to go through; it's the foundation for the rest of the course. Lesson two: isolate resources effectively using namespaces, taints, and tolerations. Lesson three: manage and customize workloads with Deployments, StatefulSets, and DaemonSets. Lesson four: work with Jobs and CronJobs for scheduled tasks. Lesson five: understand Services and DNS within Kubernetes. Lesson six: expose applications using Ingress. Lesson seven: define computational resources using requests and limits. Lesson eight: manage ConfigMaps, Secrets, and persistent volumes. Lesson nine: scale and upgrade Kubernetes clusters using advanced strategies. Lesson ten: analyze and troubleshoot Kubernetes issues. And then lesson 11 is where we take the first ten lessons and dive in by deploying resources effectively using Helm charts, because at the end of the day, most of what we do inside the cluster works with some sort of manifest files or templating, and today that's Helm charts. So we will be using everything from lesson 1 through 10 when we get to lesson 11.
Each lesson will be interactive and will build on the prior lesson, and lesson 11 will utilize everything. We'll have time set aside, hopefully at the end of the day today, maybe 30 minutes, to review the lessons we've gone through, and then tomorrow we'll set aside half an hour for Q&A. They also have a questionnaire that they like the student to fill out at the end of the two-day course, so we'll plan on half an hour for that tomorrow. So we're six hours of instruction per day with an hour of breaks throughout: half an hour for lunch, a 15-minute break at around 10:30, and the other 15-minute break this afternoon. These are approximate times, but I believe the afternoon break will be around 2:15 or so. Does that sound about right for you? Are you comfortable with that?

Okay. And we do have an interactive Kubernetes learning environment. Now, as I mentioned, I normally train clients on their own cluster, and in that case I would meet with them over Zoom, they would open their own terminal and the UIs that I've deployed, and I would coach them while they input all the commands and learn how the cluster works. In this case we don't have that, so we have the DA desktop, which we've installed Minikube on, and that will give us our interactive Kubernetes learning environment. Our course stop time is 4 p.m. each day.

And the Minikube environment, have you used it before? Okay. So it does allow us to demonstrate concepts, most concepts, not all concepts. Minikube enables you to work in both single-node and high-availability mode: single node being one node, and high availability where you have three control planes. It's an interesting environment, and it took a lot of engineering for the Kubernetes team to create it, because it's basically Kubernetes in Docker, but with a twist, which we'll get to.

All right, so when you open your command line terminal, I don't believe you need to use the username and password; I think they enable you to have sudo privileges without entering the password. So go to your menu, and you should be able to type in "terminal" and it will bring up the flavor of terminal they're using inside. Looks like yours is MATE Terminal. Okay, you'll want to expand that out. I don't think you have to use sudo or your username and password; I believe Neil and I tried that.

Okay, so the first thing we're going to do is verify that all the resources are installed. You can type in minikube version; let's make sure that's installed. No, most of these, with Minikube and on Ubuntu, take version with no dash in front of it. I'm used to typing a dash, but not on this DA desktop. All right, so we've got Minikube, and now we'll do kubectl. Okay, and we're going to check for Helm and check for Docker, and we're going to check to see if Cilium is installed.
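Collected in one place, those checks are a minimal sketch along these lines, assuming the tools are on the PATH of the DA desktop and that the cilium command is the Cilium CLI:

    minikube version
    kubectl version
    helm version
    docker version
    cilium version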
All right, we're going to build our first Minikube setup. So Minikube is a toy cluster, and I have a slide where I kind of explain the differences. Most clusters that you will encounter today are demonstration or toy clusters in the Kubernetes world, and they're designed to teach the CLI commands. Not all features work in every toy or demonstration cluster, and they're not recommended for production because they're day one clusters, which are designed to crash and then just be restarted. They're not designed to live past day one. So Minikube is one of those.

But Minikube has an interesting feature that they built in, and I think you'll see it here in a minute once you type that command in: the engineers designed it as Kubernetes in Docker. In high-availability mode, each node is its own Docker container, which contains Docker inside of it. So when you build a node, it's actually a Docker container that runs Docker containers, or Kubernetes, inside of it, and three nodes will show up as three Docker containers.

So go ahead and type that command in, minikube start. There are also abbreviations you can use, but not all abbreviations work. Your environment should be 12 CPUs and 12.2 gig of memory, and you can see here that it's just setting up a single node with two CPUs and 2.9 gig. Okay, now we're going to run kubectl get nodes, and we can see that we have a single control plane node. Now let's run docker stats, D-O-C-K-E-R, space, S-T-A-T-S. You can see Minikube is running as a Docker container, and it has a limit on its memory of 2.8 gig. And the CPUs: it's giving itself two CPUs, which you can't see there, but we know that from the configuration. Okay, and you can Ctrl-C out of that. Then you can do docker ps -a, and we get a little different view. So you can see they've got a little bit of engineering magic going on with the Docker network, and that is what enables Minikube to do a multi-node cluster.

If you were to try to do this on MicroK8s, say we had installed MicroK8s, which is generally how you would learn the full Kubernetes feature set with Helm, you can't install more than one MicroK8s on a VM. So on this DA desktop VM you'd only be able to have a single node. You would get pretty much the full suite of Kubernetes in a toy cluster, but only a single node. Minikube uses Docker magic behind the scenes, but it allows you to have multiple nodes. Does that make sense to you? Okay. And then there are other toy clusters, like K3s, which is designed for a Raspberry Pi, and there are a few other ones out there as well. Correct, yes, demonstration. You can run Minikube on your laptop. For example, if you had a Linux laptop, which is what some Kubernetes developers use now, you could set up Minikube on it, and that would be your development environment. It's just that it won't duplicate a production environment, because you have to do certain things a certain way on Minikube, due to the Docker magic going on, that you would not do with a production cluster, and there are certain things that will work on a production cluster that will not work on a Minikube cluster. Yeah. So it does have some limitations, but it allows us to do things with multiple nodes that we can't do any other way in a virtual environment. And so let's do a kubectl get pods -A.
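Put together, that first walkthrough looks roughly like this; the multi-node variant is a hedged example rather than what we ran here:

    minikube start                       # single node; Minikube picked 2 CPUs and about 2.9 GB here
    # hypothetical multi-node variant with explicit sizing:
    #   minikube start --nodes 3 --cpus 2 --memory 2900
    kubectl get nodes                    # one control-plane node
    docker stats                         # the Minikube node is itself a Docker container (Ctrl-C to exit)
    docker ps -a                         # a different view of the node container and its ports
    kubectl get pods -A                  # the system pods running inside that node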
And as you can see, we've got CoreDNS, we have our etcd, we have kindnet, which is Minikube's own networking setup, the API server, which is very important, our kube-controller-manager, kube-proxy, the kube-scheduler, and our storage provisioner. There's a little bit of delay in the slides, but okay, so now you can minikube stop. And as soon as that is completed, we'll do a minikube delete --all, and this one actually does require flags, hyphens. That will give us a fresh Minikube environment when we start up the next one; it wipes everything out, and we'll do this frequently between practices to eliminate issues.

Okay, so we've gone through the course outline, the course schedule, and how our Kubernetes environment works. Do you have any questions before we begin? Okay, let me reload this. That's the first time I've run into this: it says click exit, but there is no exit.

All right, so what is Kubernetes? Kubernetes is often referred to as K8s to shorten the name. K8s is pronounced "kates," and that comes from the first letter K, the next eight letters, and the final letter S. It's a system that enables the automated deployment and management of containerized applications, and it's free, open source software. K8s was designed to be a cost-effective alternative for managing containerized workloads and services versus conventional cloud providers. Typically, if you've worked with AWS before, you have a team of infrastructure engineers in the background who created AWS, and they created the UIs and the APIs so that, as a cloud engineer, you can communicate with it using Terraform or some other cloud provisioning tool. They've abstracted away a lot of the difficult parts of the process. With Kubernetes, you are the engineer when you build a cluster. That engineer who worked in the background at AWS: you are that engineer now. And I think that's where a lot of cloud engineers run into difficulties with Kubernetes, thinking it's going to be just like spinning up a cloud instance on AWS. They realize there's actually a steep learning curve, because you have to understand how the cluster works behind the scenes before you can work with it.

So in the cloud, you might provision infrastructure using Terraform. But in K8s, we have a Kubernetes engineer who designs and automates the creation of the cluster, and then we have DevOps personnel who provision new clusters and run their application workloads on the cluster using manifest files or Helm templating. So yeah, in the cloud, and I can actually go through that here: in the cloud we use state management tools such as Terraform. We write our Terraform code, infrastructure as code, and we say spin up this network, spin up this cloud server with 20 CPUs and 12 gig of RAM, pull this image or this container in, build this cloud server and deploy it, provide an ingress. And we manage that state with something like Terraform. Have you ever used Terraform before? Okay. That's the way that, if you work with AWS, almost everything is done: through Terraform or their own provisioning tool. With Kubernetes, that does not work well.
And the reason why is because when you use Terraform, you are managing the state. Terraform will provide feedback back to you and say, okay, this is the state you requested, I have now provisioned it, and this is the state that it is in, right? So you manage the state yourself. And that's because the AWS engineers are managing the infrastructure behind the scenes: they're managing the state of the entire infrastructure, and you're just managing the state of your particular cloud instance. But Kubernetes maintains its own state. It's a completely different concept. And because it maintains its own state, we don't use Terraform to provision Kubernetes clusters today in modern Kubernetes. We use Terraform for the OS layer, the Ubuntu 24 instance, and then we use Ansible to install and configure Kubernetes on the OS layer.

Have you used Ansible before? Okay. Ansible is, I would say, probably the preferred tool for working with Kubernetes outside of Helm, and Ansible can control Helm. What you'll find is that Terraform was created pre-Helm. They do have a Helm module; however, it only works about 50% of the time with Helm charts. So let that sink in: how frustrating that would be, trying to deploy a Kubernetes cluster and then deploy workloads with Helm, with Terraform. Oh, you haven't written them? Okay, they're not that difficult to write once you learn the formatting. Getting the vars correct and managing your secret vars is probably the most difficult aspect of it, and once you learn the templating for Ansible, they're pretty easy to write, actually. Ansible can do almost the entire thing for Kubernetes; the one thing it does not do well is spinning up the OS layer.

And then in Kubernetes we use declarative GitOps to provision and maintain the state of the workloads and the cluster. You can manage it manually through the command line, you can manage it by running your Helm commands, but we have declarative GitOps tools that will actually manage and maintain that state for you and allow you to pull those workloads in and change the versions. It'll use the Helm chart templating, and it makes life a lot easier for DevOps personnel managing Kubernetes clusters. Once you get past the initial templating file of your declarative GitOps, it works very smoothly.

Oh, you haven't? So, have you heard of Argo CD? Okay. When you deploy your workloads today, we typically use Helm charts for a production workload, and Argo will take that Helm chart and inflate it into the cluster for you. It provides a very nice UI for you to see and work with, and it'll provide the events as well as the logs to you. So if there's an issue, you can actually go in and see the events and logs, because it reads them in near real time; there's a few seconds of delay. And then you can roll back by just pushing code to, say, a GitLab repo: you roll it back by just changing the version, push that up to your GitLab repo, Argo automatically pulls that repo, and then it rolls back the version for you.
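As a hedged sketch, an Argo CD Application that points at a Helm chart in a Git repo might look something like this; the repo URL, paths, and names are hypothetical placeholders:

    kubectl apply -f - <<'EOF'
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: example-app
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://gitlab.example.com/example/charts.git
        targetRevision: 1.2.3        # change this and push to roll forward or back
        path: charts/example-app
      destination:
        server: https://kubernetes.default.svc
        namespace: example-app
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
    EOF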
If you want to change and upgrade a version, you just push the new version to the repo, Argo pulls the repo, pulls in the new container image and the new Helm chart version, and then manages upgrading that container and that pod inside the Kubernetes cluster. It allows you to manage your state a lot better. I'm used to using the command line, but I use Argo now, so I can do it either way. When you're training DevOps personnel, I recommend starting with the command line just so you understand it, but once you understand how declarative GitOps works, it makes your life a lot easier.

Okay, here we go. All right, so the Kubernetes cluster design process involves determining the workload needs along with the available budget, and then you engineer a solution. Kubernetes is designed to be engineered and automatically provisioned, and DevOps personnel are typically trained to maintain the cluster. Why Kubernetes came about: I use AWS as an example, but you could use Google Cloud or Azure; you had an engineer who provisioned everything and then turned it over to DevOps personnel, but that created an additional job, right? The idea behind Kubernetes at Google is that we have an initial provisioning with infrastructure as code, and then DevOps personnel can run the whole thing: they can provision a cluster, maintain the cluster, resize the cluster, and run all of their workloads on the cluster. So it's designed basically to save costs for small and medium-sized companies. Typically I'll also train DevOps personnel on provisioning workloads using GitOps. It can be provisioned to be resource-friendly, maintainable by DevOps personnel, and it can provide savings versus the cloud of up to 90% on bare metal.

I actually have one client who owns eight companies, and his cloud bill with AWS across the eight companies was over half a million dollars a year, so a pretty hefty cloud bill. He decided to run everything on bare metal and trained his son, who's in high school, to build it with him and run it, and his son is running it while he's in high school. His whole cost for everything, after his initial outlay for hardware and paying his son, is less than $50,000 a year; that's electricity, air conditioning, everything. So he took cloud costs for his eight companies from over half a million a year to less than $50,000. The savings can be substantial running Kubernetes on bare metal versus the cloud.

So Kubernetes 1.0 was released in July 2015, and it experienced rapid growth three years later when we brought in all of the cloud engineers. That's where it picked up the name cloud-native Kubernetes, and that expanded the ecosystem by bringing, for example, AWS cloud engineers into it with their cloud tools. That brought a lot of funding into Kubernetes and really brought it mainstream. Kubernetes maintains a minimum of three releases at a time, three minor releases, so you receive approximately one year of updates after the initial .0 release, and then they reach end of life.
The initial .0 releases are generally for development and testing, and then once they get to the .2 release you can start running it in production. Generally you'll see a release schedule that runs from .0 to around .13 on Kubernetes. So this is kind of what it looks like: you can see that today we're on 1.33.1, so that would be for development, and if I were running a production cluster for a client right now, I would be running 1.32, because we have a .2, so it's ready for production. They're notorious for having bugs, sometimes major bugs, in the .0 releases, so I would avoid those in a production setting for sure. A .1, sometimes you can get away with it, but generally by the time you get to the .2 it's mature enough that you can run it. By bugs I mean regressions, where something that works on a 1.32.5 will not work on a 1.33.0. Yeah, so it's fairly fast moving; every four months they come out with a new minor release.

All right, so our K8s components. A cluster consists of nodes, pods, etcd, the Kubernetes API server, the kube-scheduler, the kube-controller-manager, the cloud-controller-manager, which we won't use on bare metal, and kube-proxy. And then on each node we have a kubelet. So it looks something like this, and again, on bare metal we generally don't use a cloud provider API; there are some instances where you might utilize one. You can see from this diagram we have a control plane. Actually, let me see if you can tell me: basically, how many nodes are really running on this? This is kind of a trick question because of the way they drew the diagram. Correct, that's correct. So this is kind of how it might look if you're running Kubernetes in the cloud, but the control plane is actually running on its own node as well. When we run on bare metal, we create our own control plane nodes, so this is four. And you can see we're running kube-proxy and the kubelet on each node, and on the control plane we have our scheduler, our API server, our etcd, the cloud-controller-manager, and our controller manager.

All right, nodes. There are two primary types of nodes within a cluster: the control plane node and the worker node. The worker node has changed names and is now called an agent node; they got rid of the name worker, but you will still see it in a lot of clusters. You can also set up a storage node, which removes the storage tasks from the worker node and enables all storage containers to run on the dedicated storage node. And why might you think that would be a good idea? Correct: more easily managed storage and resource constraints. If you had your storage running on a worker node and that worker node experienced a high volume of container activity, it might slow down your ability to access your storage, right? Because it might become resource constrained. So we move those to their own storage node, which is where we have all of our persistent volumes, persistent volume claims, et cetera. That enables the storage node's resources to be used just for accessing, reading, and writing to that storage, snapshots, et cetera. And then you have a maximum of 5,000 nodes in the cluster.

So, nodes may be labeled using kubectl. Nodes may be labeled or have a role added, such as control plane or agent.
Often when you look at a production cluster today, your control plane nodes will say control-plane and your worker nodes will say worker, or if you're building a cluster today, it'll say agent. Node labels may be removed by repeating the same command with a hyphen at the end. So you can add labels using kubectl, and then you can delete them by removing everything from the equals sign forward and replacing it with a hyphen; we'll go into this and demonstrate it in a minute.

Each node can manage up to 110 pods, and that is the maximum. It's imposed by assigning a /24 CIDR block to each node, and although there are a maximum of 256 addresses in a /24 CIDR block, Kubernetes reserves addresses for other purposes, such as spinning up nodes or spinning up pods, and this leaves 110 addresses available for running pods. You can reduce that if you want to put a limit on your node; say you want to limit it to 100 pods, you can do that as well, but you can't run more than 110.

All right, etcd is a key-value store, which is the Kubernetes backing store for the cluster data. It is designed for high availability and needs appropriate resources to send and receive heartbeat messages. etcd is a leader-based distributed system and needs constant communication with each member. In a high-availability environment, etcd runs in odd numbers, such as one, three, or five. On a single node we would have one, right? But when we go to a high-availability cluster, we would run our control planes on either three or five nodes, and etcd generally runs inside each control plane node. If resource starvation occurs, you may need to remove resources running on the control plane nodes and move them to a separate management node. Alternatively, you can remove the etcd pods and run them on separate etcd nodes. etcd needs to perform multiple reads and writes to the key-value store and can run into a situation where writes are running behind or delayed, and this may be due to running etcd on unsuitable local storage. NVMe storage works best, and SSD storage works great in nearly all situations. Spinning hard disk drive storage may have difficulty keeping up and may leave the cluster in a degraded state if the writes are delayed for too long. For enhanced security in communication between etcd nodes, TLS encryption may be enabled, and the etcd data, which contains the secret key-value pairs, may also be encrypted at rest. So there are multiple security protocols that can be enabled, particularly for this component. They are not enabled, however, on toy clusters; you may not be able to enable them on toy clusters, with the exception of maybe MicroK8s.

All right, the next component, the Kubernetes API server. The Kubernetes API server is located in the control plane node. It validates and configures the API objects, which can include pods, services, and replication controllers, among others. The Kubernetes API server provides the REST API access to the cluster's shared state, and it supports retrieving, creating, updating, and deleting resources via the POST, PUT, PATCH, DELETE, and GET HTTP verbs. So it's a real API.
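You can poke at it directly through kubectl; a minimal sketch, with the resource paths here being just examples:

    kubectl get --raw /healthz                            # the API server's health endpoint
    kubectl get --raw /api/v1/namespaces/default/pods     # a raw GET against the REST API
    kubectl api-resources                                 # the object types the server exposes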
The Kubernetes API server interacts directly with the etcd key-value store, which is the Kubernetes backing store. In a resource-constrained environment, it is not unusual to see logs showing client timeouts or errors retrieving a resource lock from the Kubernetes API server. And if all of the Kubernetes API servers are unavailable, the cluster may not be recoverable. So a very key and important thing to remember: the Kubernetes API server is probably one of the two most important components. If you lose all three nodes and you cannot access them, then yes, you would not be able to bring your cluster back. But you always want to make sure the cluster is designed so that at least one of your control plane nodes is available, with a VIP in front of it. I think you mentioned you used MetalLB? Okay, kube-vip is a popular one as well. With a VIP in front of it, if you still have one control plane node and it is configured correctly, then you will still be able to access that one control plane node; it will just be in a degraded state.

All right, the kube-scheduler. The kube-scheduler resides in the control plane and assigns pods. It determines which nodes are available and assigns pods for scheduling according to constraints and available resources. Using a hands-on approach, we will work with some of those constraints later in this course. The kube-scheduler communicates with the Kubernetes API server.

Next is the kube-controller-manager. The kube-controller-manager maintains the state of the cluster using control loops. Control loops in a Kubernetes cluster are non-terminating and regulate the state of the system. Examples of controllers in the Kubernetes cluster are the replication controller, the namespace controller, the endpoints controller, and the service accounts controller. It communicates with the Kubernetes API server.

The cloud-controller-manager is something you generally don't see on bare metal. The cloud-controller-manager, not to be confused with the controller manager, is a Kubernetes control plane component that embeds cloud-specific control logic. It lets you decouple the interoperability logic between Kubernetes and the underlying cloud infrastructure it is running on, such as AWS, for example. The cloud-controller-manager includes a node controller, a route controller, and a service controller, and it communicates with the Kubernetes API server.

And we have kube-proxy. Most Kubernetes clusters come with kube-proxy by default, and it runs on each node. kube-proxy can do simple TCP, UDP, and SCTP stream forwarding, or round-robin TCP, UDP, and SCTP forwarding. Modern production Kubernetes environments use a container network interface for cluster networking, and modern CNIs, such as Cilium, have replaced kube-proxy internally. This improves reliability within the cluster and can also boost network performance. So kube-proxy comes standard, but in production environments we typically replace it.

And then the kubelet. The kubelet service runs on each node as a daemon, and you can view its status by reviewing the node details.
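A hedged sketch of checking those per-node components in this environment; the node name minikube is just what this cluster happens to use:

    kubectl describe node minikube       # node details, including kubelet info and conditions
    kubectl get pods -n kube-system      # scheduler, controller manager, kube-proxy, etc. run as pods
    # on a production node you could also check the kubelet daemon directly:
    #   systemctl status kubelet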
All right, Kubernetes flavors, or distributions. In 2015 there was only one flavor of Kubernetes: plain vanilla. And that's why they call them flavors, if you are wondering, instead of distributions. If you run into someone who's worked with Kubernetes for a while and they ask what flavor you are running, that's because the first flavor was vanilla. It must have been a Google thing, I'm not sure. Today, however, there are many flavors or distributions to choose from. Some of the distributions are designed for toy or demonstration clusters, with limited feature sets that consume fewer resources and enable the node to run on a Raspberry Pi or a laptop. Other flavors are designed with security and production in mind and enable robust security measures such as etcd encrypted communication, etcd data encrypted at rest, etcd snapshotting, node-to-node encryption, pod-to-pod encryption, and secrets encryption; there's even a distribution designed to meet the Department of Defense STIG requirements. Choosing a flavor of Kubernetes is an important decision; it can have an impact on the long-term viability of the cluster itself and the workloads that the cluster manages.

So the following is a non-exhaustive list of flavors. For enterprise production, you have Rancher's own RKE2, and then you have Red Hat's MicroShift or OpenShift, and Red Hat is geared towards Department of Defense STIG requirements. So if you are, for example, bidding a job for DoD or working with DoD, they may require that you meet certain STIG requirements, which is what Red Hat is designed for, and so you'll see those used in those environments. For other government contracting, and again, I'm a long-term government contractor, you'll see RKE2 specified. In fact, about a year and a half ago, government agencies began refusing to pay for products that run on K3s clusters. You'll probably see a lot of K3s clusters run by companies that sell a product that runs on those clusters; they provide the entire environment to their customer, and those customers just happen to be federal government customers. About a year and a half ago, the agencies began denying payment for products run on K3s clusters, because it's actually a toy cluster designed for a Raspberry Pi.

Yeah, so that's a cloud environment, and that would be similar to an RKE2, for example. But that's their own flavor: they took vanilla and then built their own flavor on it, and the cost is actually higher than running it on the cloud. The way I describe AWS's or Google's or Azure's Kubernetes offering is that it's something to learn on and then move off of as quickly as possible, because the cost is actually higher than the cloud itself.
My clients have told me that when they move their workloads over, it actually costs them a little bit more each month to run in Kubernetes on the cloud than on the cloud alone, and then they quickly discover bare metal and move to bare metal for the savings.

So for development and demonstration we have RKE1, which is Rancher's original Kubernetes, and that is actually Kubernetes in Docker, similar to the Minikube we're using now. K3s was developed by Rancher, and it was designed for the Raspberry Pi. Have you ever worked with K3s? Okay. It was kind of made famous back during COVID, when an engineer moved to South America, I forget which country, Peru or somewhere, and he needed to maintain a presence in the U.S. So he developed an Ansible script for spinning up a K3s cluster on a Raspberry Pi setup, and then he used Tailscale to connect to a DigitalOcean droplet for an IP address in the U.S., and he was able to service all of his clients using a U.S.-based IP address. Once he published that, it put K3s on the map. Those scripts were made available, and a lot of companies that did not know how to create a Kubernetes cluster took his scripts; another individual named TechnoTim modified them and created his own. So a lot of companies have been running K3s in production because those scripts are available for free in GitHub repos. But they're actually toy clusters. What happened is Rancher was raising capital to develop their RKE2, and somebody had a bright idea to put on the K3s page that it's production grade, even though it wasn't, and that helped them raise the capital. Then they quickly removed that and turned it over to, I believe, the Cloud Native Computing Foundation, which is now running K3s. The K3s engine, however, is also the engine for RKE2, but that's where it stops. The K3s engine is quite robust; it's just that K3s has a lot of features disabled so that it can run on a Raspberry Pi.

Then you have Docker K8s. We've already discussed MicroK8s and Minikube; Docker K8s is resource intensive. You can run it on a Mac, for example. There's a delay on this slide. Okay, so Docker K8s is resource intensive; I've actually destroyed a MacBook running Docker K8s on it. Yeah, a fairly new MacBook. What happens is it uses so many resources, and Rancher Desktop does as well, that the battery can't cool down even with the fan running, and so the battery starts expanding rapidly because of the heat. And then you have Rancher Desktop, which competes with Docker K8s, and it allows you to run Traefik, so you can develop on it, but a lot of features don't work inside Rancher Desktop, and they didn't devote the resources to really finishing it. There are some teams, and you'll run into this in the Kubernetes ecosystem, that go ahead and push something to production that's not ready, and Rancher Desktop was an interesting idea, but it was not ready for production when they pushed it. And then you have the original Canonical distribution from Ubuntu, and then of course in 2015 vanilla came out, and it's still around; it's what everything was based on.
All right, can you think of a reason you may not want to encrypt etcd communication, and why you may want to? Yes. On a single control plane you wouldn't need it, because everything is within the same node; it would all be running in the same VM or virtual private server. But as soon as you go to a three-node high-availability setup, or five nodes (you probably won't run into a five-node control plane, but a three-node control plane you will), then you need to be able to protect the data flowing back and forth, because etcd contains your secret store for your key-value pairs. So think in terms of passwords.

All right, can you think of a reason you may want node-to-node or pod-to-pod encryption? Yeah. So think of the cloud, right? We had TLS certs that connected directly to the container in the cloud, so when you ingress into your workload in the cloud, your TLS cert resides on that cloud instance. In Kubernetes, that's cumbersome. If you have 500 containers, do you really want to manage 500 certificates and update them? It's a failure point in Kubernetes. And so, and we'll talk about this, Kubernetes is replacing Ingress with the Gateway API. It's been in beta for three years, and it works really well. Gateway API is designed with a single TLS cert, which terminates at the gateway, and then traffic is passed to the containers in an encrypted tunnel. So it covers node-to-node encryption, it covers pod-to-pod encryption, and it can do node-to-pod and pod-to-node encryption as well. That enables us to terminate the TLS and still have multiple nodes spread out across a network where the traffic between them is encrypted, even if we don't control that entire network. It's available as part of a CNI, and Cilium is one of the few CNIs that offers that out of the box. So when clients ask me about CNIs, I always encourage them to use Cilium, because it's the easiest and it has everything out of the box, so you don't pay for it. It's done through transparent encryption in the CNI.

All right, so what would happen to a cluster if the Kubernetes API server were unavailable? Right. Although the cluster doesn't fail immediately, it will typically lead to loss of control over the Kubernetes cluster. It will prevent the deployment of new resources and can prevent the management of existing resources. This can manifest itself in an error such as "unable to connect to the server" or a client timeout when running kubectl get pods or kubectl apply, et cetera. It will also show up in the logs for the individual Kubernetes components that need to communicate with the API server; these are typically on the control planes. Those logs will be unavailable to view unless they have been shipped to a log collector, because the command used to view logs, kubectl logs, will return the same "unable to connect to the server" error. So you can see how, if the Kubernetes API server is unavailable, it becomes very difficult to even try to diagnose a cluster.

All right, can you think of a reason you might not want to run your control plane nodes on shared-resource virtual machines or shared-resource virtual servers?
So let's say, for example, you had the ability to spin up a Kubernetes cluster on DigitalOcean or Hetzner or OVH Cloud; you have a choice between dedicated resources or shared resources. Both are virtual machines, by the way, but the shared resource is cheaper, half the price. You'll see initial Kubernetes practitioners gravitate towards that, because when you spin up a Kubernetes cluster you have multiple nodes, right? You have to pay for each VM, so if you do shared, it's half the price. So it makes sense: I can spin up a cluster for half the price. But shared resources versus dedicated resources could enable your control plane to degrade beyond a repairable state and render your cluster inoperable, because you're competing for resources in a shared environment. It is half the price to spin up a cluster, though, so for demonstration purposes, or if you're testing and you want to test on a virtual machine in the cloud, it is a cheaper way to test your cluster. Say you want to build an Ansible script to automate the spin-up: you can test it on shared resources, spin it down, delete them, and then run it on dedicated resources after that.

If you could choose the two most valuable components in a control plane node, what would they be? Yes, yes. So, like the Kubernetes components we went through earlier, there are about six or seven different components, and you can see some of them on your screen right here: in your terminal you've got CoreDNS, etcd, kindnet, the API server, the controller manager. There you go. Without those two, you will not be able to access the cluster or read the current state of the cluster. You'd have a cluster, but you're not going to be able to do anything with it if you lose those two.

All right, day one versus day two clusters. A day one cluster is designed to operate for a single day or task; it can be spun down, modified, and rebuilt using infrastructure as code. Examples of day one clusters are development and staging environments. A day two cluster is designed to survive for longer than a day, obviously. Typically a Kubernetes engineer will not test new node or pod configurations on a day two cluster; any testing that needs to be performed should be performed on a day one cluster. After all the proper configurations have been determined and tested, the new configurations can then be applied to a day two cluster, and this is typically performed using infrastructure as code or declarative GitOps.

So, ensuring the availability of both control plane nodes and agent nodes requires proper planning; here we're dealing with high availability in a Kubernetes environment. Control plane nodes on bare metal use a leader election with a load balancer that assigns a virtual IP to the leader. The virtual IP address ensures that if the leader changes, the proper node can still be queried and receive messages from other nodes. For the agent or worker nodes, high availability means enabling multiple instances of a stateless workload across multiple nodes. This is enabled with a load balancer service and may also include the ability to auto-scale pods up or down between a minimum and maximum number.
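As a hedged sketch of that last piece, autoscaling a stateless workload between a minimum and a maximum; the deployment name web is hypothetical:

    kubectl autoscale deployment web --min=2 --max=5 --cpu-percent=80
    kubectl get hpa        # shows the HorizontalPodAutoscaler that was created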
In a high-availability setup with three nodes, a pod anti-affinity setting is used to make sure the pods are distributed equally across the available nodes. This enables a cluster to avoid ending up with three pods on a single node and the two remaining nodes containing none. Apologies for the delay there; every once in a while it clicks through, and other times it just freezes on the slide deck.

All right, so Kubernetes was designed from the beginning with self-healing capabilities. This enables it to maintain the availability of workloads: it automatically reschedules workloads when nodes become unavailable, replaces failed containers, and ensures that the desired state of the system is maintained. The self-healing capabilities include four key areas. Container-level restarts: if a container fails, Kubernetes will automatically restart the container within a pod based on the restart policy. Load balancing for services: if a pod fails, Kubernetes will automatically remove it from the service's endpoints and route traffic only to healthy pods. Persistent storage recovery: if a node running a pod with a persistent volume attached fails, Kubernetes will reattach the volume to the replacement pod. Pod replacement: if a pod in a deployment or StatefulSet fails, a replacement pod will be created by Kubernetes to maintain the desired state, and if the failed pod is part of a DaemonSet, the control plane will replace the pod to run on the same node. This self-healing capability is one of the reasons it has grown in popularity with DevOps.

All right, now we get to the causality dilemma. What is the causality dilemma in Kubernetes? Are you familiar with the causality dilemma? Okay, so it's commonly referred to as the chicken-or-egg paradox: which came first? It's a circular situation that describes very well what is often experienced in designing and engineering a Kubernetes cluster using automation. Certain processes, and this also has to do with GitOps, need a resource in order to start; however, for that resource to be in place may require a certain process to run first, hence the causality dilemma. If you automate the building of Kubernetes clusters from scratch using infrastructure as code, you will likely run into this issue. And what I see in the automations that are out there is that the engineers typically do not even attempt to solve the causality dilemma, so the cluster is not actually operating the way it's intended to run, because they couldn't figure out how to solve that piece.

One place where that matters is the CNI. If we have a high-availability control plane with three nodes, we have a VIP in front of it that we need in order to access our API server, based on who the leader is. But when we set up our CNI, on step one it's pointed at the IP address of the primary control plane node. When we assign a VIP, we then need to give that VIP to the CNI, right? So we have to install the CNI, install the VIP, get the VIP, and then reinstall the CNI. That's the causality dilemma. Did that make sense? I think we have one practice where we might delve into it just a little. We won't be building clusters, so we won't have too much on that. All right.
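To make that ordering concrete, here is a hedged sketch of the sequence with Cilium and a VIP; it assumes the Cilium Helm repo is already added, and the IP addresses and the VIP are placeholders:

    # 1. install the CNI pointed at the first control plane node's address
    helm install cilium cilium/cilium -n kube-system \
      --set k8sServiceHost=10.0.0.11 --set k8sServicePort=6443
    # 2. deploy the VIP load balancer (kube-vip, MetalLB, etc.) and note the VIP it manages
    # 3. re-point the CNI at the VIP and restart its agents
    helm upgrade cilium cilium/cilium -n kube-system --reuse-values \
      --set k8sServiceHost=10.0.0.100
    kubectl -n kube-system rollout restart daemonset/cilium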
The kubeconfig file. It's required for access through the kubectl command line tool. The kubeconfig file name may be descriptive, or it may reside as just config under the .kube directory. This is the key to the system. To access the file, navigate to the .kube directory on the host or client and cat the config file. In a production cluster, you will need to export the location of the kubeconfig file for the kubectl command, using something like export KUBECONFIG= followed by its location and name. You can see here, this is a kubeconfig; this is an example of a production setup where you have production, you have development, and you have multiple clusters, and so you would export the KUBECONFIG so that kubectl knows exactly where the config file is. In this case, it's named kconfig.yaml in that particular location. And then to unset it, we simply run unset KUBECONFIG. Minikube takes care of this every time you start up a dev cluster. And then the Kubernetes API server listens on port 6443 or 443, depending on how it is configured; it can be either port.

All right, we're going to get into practical application one. We're going to spin up a Minikube cluster, so run the original command we did, minikube start. I have a question for you while you're waiting for that to spin up: how do you check the version? Obviously it's cheating by telling us that it downloaded Kubernetes version 1.33.1, right? You see it in there, so obviously that's cheating. We know it's version 1.33.1, but in the absence of Minikube, if we were doing this on our own cluster, how would we check the version? Using the kubectl command line tool: we check with kubectl version, and we can see that the version is printed every time with that command, version 1.33.1.

And based on the Kubernetes version shown, what is the anticipated end of life for this version? What task will need to be performed by this EOL date? Let me see here. There we go. Correct. Now, we would not be running this in production; this would be a development cluster, so in production we'd have 1.32. But yes. And so what task would need to be performed by this EOL date? Correct. Or, if we're using infrastructure as code and we have designed the clusters so that they're stateless, so our storage is in a separate cluster or using a managed storage service, then we would simply create a new cluster, manage our workloads in it using GitOps, and just point over to the new cluster.

So in today's environment, with the advance of infrastructure as code, you can do it two ways. You can manage a cluster and keep it running forever, and the artifacts will build up inside that cluster over time; that can introduce issues if you've had a cluster running for three or four years and you upgrade it in place. Or you can build a fresh cluster with infrastructure as code. There are several ways to do it. You can build cluster one and cluster two; let's say cluster one is 1.31 and cluster two is 1.33. You can run a cluster mesh between them, and then you can just start draining the nodes.
And if you have your workloads set up properly, as you drain the nodes on cluster one, which is 1.31, it will move those workloads automatically over to cluster two, which is 1.33. Some enterprises will use a cluster A, cluster B type of setup with a cluster mesh, and the other cluster is the one they would then start upgrading. Once everything has transferred over and works, you would start building a new cluster in cluster one, join it with the cluster mesh, and test it out. The other way, oh, sorry, go ahead. It could, yes, as long as nothing goes wrong. The other way is that you spin up a new cluster with all of your workloads already on it. Because you're using stateless workloads, if you've designed it for Kubernetes and you don't have a database in that cluster, you can just spin up an entirely new cluster, using Argo CD, for example, for your declarative GitOps. Everything is running, and then you just point the DNS over to the new cluster. Yeah, and that would probably be more seamless, depending on how big your cluster is. So those are two ways to do it with infrastructure as code.

And then the other way is that you can simply cordon a node, drain the node, do your upgrade in place, then uncordon the node and allow the pods to repopulate onto it after it rejoins and is uncordoned. The pods will repopulate if they have been set up correctly to do that. If they have not, they may just stay on the node they're currently on until you cordon that node and start draining it, which will force them to move over. So you can run into issues when upgrading a cluster in place if you haven't set up your pods from the very beginning, whether it's a deployment or a StatefulSet, so that they're designed to run on multiple nodes, if that's the type of setup you have. That's one of the many issues you can run into when upgrading a cluster, and that's why, with infrastructure as code, most of my clients just spin up a new cluster and move everything over to it; they don't mess with upgrading in place today. But we're going to go through that and show you how to do it, so you at least know the concept.

Okay, I think we're at 10:30. Let's take a break. You ready for a break? We're going to get into lots of practical application when we come back. We'll do a 15-minute break; let's see, I've got 10:32, so back at about 10:47. Okay, see you in a few.

All right, are you back? All right. So we left off at practice two. Okay, we're going to check the container runtime. How do we check the container runtime? So I guess, what is a container runtime, and what types of container runtimes are there? We don't really go over this in detail because it's more of a Docker concept, but you have a Docker container runtime, right, and then you have several different types of runtimes. So what we're going to do is check and see what container runtime we're running in Minikube. We'll run kubectl get nodes -o wide, and that's a lowercase o. And what container runtime do you see? Correct.
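Spelled out, that check is:

    kubectl get nodes -o wide      # the CONTAINER-RUNTIME and OS-IMAGE columns are at the far right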
And do you notice, there's a little bug in Minikube. I don't know why, but bugs always stand out to me: you're running on Ubuntu 24, and it reads the OS image as Ubuntu 22, and that's actually a Minikube bug. Interesting.

All right, so we're going to change the container runtime. How do we ensure a fresh Minikube environment? We're going to stop the Minikube environment and delete all, and then we're going to start it with a new container runtime, which is going to be containerd. Correct. Kubernetes today uses containerd, and they've gotten away from Docker. They do very little with Docker these days, and that's because they competed: Docker had Swarm, which was kind of the early container orchestration program, and that competed with Kubernetes, so Kubernetes moved away from Docker to containerd, which also, by the way, came from Docker. You'll run into that a lot in the Kubernetes ecosystem, where teams compete with each other.

Okay, now we're going to label a node with node-type equals test. So go ahead and label the node: kubectl label node, our node name is minikube, and then the label. Have you ever labeled nodes before? So the node name is actually minikube; it's just that we didn't name the Minikube cluster something else, so the actual name of this node is minikube. In a multi-node environment, the next one would be minikube-m02, I believe. Okay, you can try running it and see what comes up. In order to see the node label, we're going to run kubectl get nodes, but with --show-labels. Okay, see if you can find it in that long string of labels. I've tried to use parsing tools, and there are a few out there, but they still leave something to be desired.

Okay, so now we're going to remove the node label, and we do that by removing everything after the equals sign and replacing it with a hyphen; it's the same command we used before. Okay, it says it's unlabeled, and when you look, it'll now be gone. All right, we're going to assign a role to a node. If you look at that, you'll see that the name is minikube, the status is Ready, and the role is control-plane. Right. So now we're going to actually assign this a role: previously we just labeled it, and now we're going to label it with a node role. I would have as well, let me check here. Yeah, that's interesting, I would have expected to see test as well. Oh, no, you know what, node-type is there. So you can use true. Okay, so here's the thing: we named it node-type equals test because sometimes, when you use true, label selectors do not recognize it, and you'll uncover that when you start to use Helm charts. So you can use true, but in this case I just used test; you can put anything you want in it. So it's actually labeled with node-type, and if you look at the label for the control plane role, there's nothing after the equals sign. Now we're going to remove that node label, the role. All right, now we're going to install a pod, so we're going to relabel the node with node-type equals test. Here you go; verify; looks like it's in there, yep.
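The labeling steps we just walked through, collected in one place; this assumes the node is named minikube, and the worker role name is only an example:

    kubectl label node minikube node-type=test                     # add the label
    kubectl get nodes --show-labels                                # find it in the long label string
    kubectl label node minikube node-type-                         # trailing hyphen removes it
    kubectl label node minikube node-role.kubernetes.io/worker=    # assign a role (empty value)
    kubectl label node minikube node-role.kubernetes.io/worker-    # remove the role again
    kubectl label node minikube node-type=test                     # relabel for the pod exercise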
Okay, so now we're going to install a pod. Using vim, and we should have vim installed, create a pod-node-selector.yaml file. I don't think so; I think they disabled that so everyone has it automatically. Yes.

And the reason I do this: I've shortcut a lot of my examples so they just have the basics, because a lot of the errors that you will run into running a Kubernetes cluster, and running workloads on a Kubernetes cluster, will be based on the syntax of your YAML. By practicing on several of these short YAML files in each lesson, you'll start to understand how the syntax goes together. One of the problems with Kubernetes that can be so frustrating is that it doesn't provide descriptions of your errors. Sometimes it will tell you which line has a YAML error, but oftentimes it does not, especially with Helm charts. So learning the proper formatting will save you valuable time later on in your troubleshooting. And I pulled a lot out of these, so we should be pretty brief.

All right. Yeah, that looks right: you have apiVersion, kind, metadata, the nodeSelector, and in the spec, containers, name, and image. Yeah, looks good. Go ahead and hit Escape, and then, yep, write and quit. That'll work. And then we're going to apply it: kubectl apply -f, for file, pod-node-selector.yaml. All right, now we're going to run kubectl get pods, and what is the status?
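For reference, a minimal sketch of that file and the commands around it; the pod name and image are illustrative, and the nodeSelector matches the node-type=test label from the previous step:

    # vim was used in class; a heredoc writes the same file
    cat <<'EOF' > pod-node-selector.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: node-selector-demo
    spec:
      nodeSelector:
        node-type: test
      containers:
      - name: web
        image: nginx
    EOF
    kubectl apply -f pod-node-selector.yaml
    kubectl get pods -o wide     # the pod should be scheduled onto the labeled node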