All right, let's go ahead and get started. Yesterday we left off in lesson six, where we were going through Ingress, so we'll do a quick review here. Ingresses enable us to map our traffic to our backends based on rules that we define. We utilize an ingress controller as well as an Ingress manifest file. Ingress is being replaced by the Gateway API, which we're not going to use today because I don't believe you can install it fully on minikube. Then we talked about CNIs and how Cilium is a CNI that's useful on bare metal. It provides a lot of features out of the box and allows you to replace kube-proxy, and it includes IPAM support if you want to feed it your own pool of IP addresses. It does not, however, provide load balancers; you still need an external load balancer such as kube-vip.

All right, we're on slide 29 here. We created a Cilium install, deleted it, and saw what Cilium installs; Cilium is actually a good one to practice with. So now we're going to ensure a fresh minikube environment, and we're going to deploy Cilium in HA mode with six nodes. We want a fresh minikube environment first, so stop the existing cluster; I would probably do stop first. Now do an ls real quick. Good, it didn't recreate; it's the same one we left off with, so you don't have to type all of that back in again.

So how do you feel about your grasp of Deployments, StatefulSets, and ReplicaSets? Yeah, Kubernetes has such a steep learning curve starting out, so just knowing what to review can be a huge time saver. Oh, interesting. And that's to get your certification? Okay, interesting. I know some of the courses use a script to spin up K3s inside ECS, so they spin up a little K3s cluster.

All right, let's see if we can Ctrl-C out of that. Okay, so what does it say here? Let's try to scroll up to the top; there may be too many logs there. We can stop it and then start it again, and watch right when it starts to fail. A simple minikube start works, but if you do it in HA mode, we have to fix our cluster. Although I didn't expect it to actually try to join servers; usually it just fails out. Usually it gives you an idea of what happened and what went wrong, but this one actually tried to join the etcd cluster, so that's interesting. We'll see if we can fix this.

If you just start it back up, it might give us the error right at the very beginning, and then just do Ctrl-C. It looks like it did three nodes, so minikube-m03 started, then it started minikube-m04, and that's when the error happened. That one never started, so let's see what happened. Let's see if it'll tell us at m04, and whether there's an error before it tries to join. Okay, see where it failed before. Stopping node. Okay, now go ahead and Ctrl-C.
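For reference, the fresh-start sequence being attempted here would look roughly like the sketch below. The flag names (--ha, --nodes, --cni, --driver) are from memory of recent minikube releases and the driver choice is an assumption, so check minikube start --help on your own install before relying on them.

    # Tear down the old profile so we start clean
    minikube stop
    minikube delete --all

    # Start a six-node cluster in HA mode (multiple control planes) with Cilium as the CNI
    minikube start --ha --nodes 6 --cni cilium --driver docker

    # Watch the nodes come up
    kubectl get nodes -o wide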
All right, let's see what it says here. So we need to fix the max files issue, and there are three commands we need to put in there. This happens because minikube is using Docker, with Docker and Kubernetes inside that outer Docker, so it exceeds the normal Ubuntu/Debian max-files limits. Assuming the Ubuntu 24 instance you're on is the same as what I'm on, this should fix it. Let's cross our fingers that they didn't do anything else to this Ubuntu 24 instance that makes it different; everyone builds their own little distributions.

Cool, let's go ahead and check the pods. You'll notice there may be a few issues; why do you think that is? Yep, still settling out. Looks like we have everything running now.

So now we're going to check Cilium: type cilium status. So what are those warnings? Hubble is a telemetry relay, and it allows you to visualize the telemetry throughout your entire Kubernetes cluster, so it's very useful in troubleshooting. When you start to have networking issues in your cluster and you're using Cilium, you can enable the Hubble relay; it has a UI and gives you a graphical representation of what's going on inside the cluster. They don't have that configured here, so it's probably going to keep throwing warnings. What else do we see? The operator is okay, Envoy is okay, and you can see where it says Hubble Relay is disabled, so Hubble won't be able to start up. Cluster mesh is disabled as well; Cilium has the ability to add a cluster mesh and join two clusters together so they operate almost as one cluster, even though they're two completely separate clusters.

So what would we do if we were trying to actually deploy this CNI in HA mode? First, let's take a look at the pods. What do we have here? We've got a Cilium agent and a Cilium Envoy on each node, and we have a Cilium operator. Let's do a wide listing; you might have to widen your terminal so the output isn't confusing. Then add -o wide. There we go. The Cilium operator is running on the minikube node, which is our number one control plane node, and we only have one Cilium operator, so that's not highly available. Now let's look down at kube-vip, which manages our high availability. We can see we have kube-vip, a load balancer with a VIP, running on all three control plane nodes. That provides a VIP to everything that needs to operate against the control plane. However, Cilium isn't using that VIP, so we would need to provide the kube-vip VIP address to Cilium for this to work.
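For reference, the "three commands" for the max files issue are not shown on screen in the transcript, but on a kind/minikube-with-Docker setup this error is usually the inotify limits, so the fix was most likely something along these lines. This is an assumption, not a transcription of what was actually typed; the exact values are illustrative.

    # Likely shape of the fix: raise the inotify limits that nested
    # Docker/Kubernetes exhausts on a default Ubuntu/Debian install.
    sudo sysctl -w fs.inotify.max_user_instances=8192
    sudo sysctl -w fs.inotify.max_user_watches=524288
    # Persist the new values so they survive a reboot
    echo -e "fs.inotify.max_user_instances=8192\nfs.inotify.max_user_watches=524288" | sudo tee /etc/sysctl.d/99-inotify.conf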
We'd also need to enable the Hubble telemetry relay. We'd need to enable kube-proxy replacement, which turns on kube-proxy replacement in Cilium, and then we would remove kube-proxy; you can see the kube-proxy pods, one running on each node there. We'd enable Gateway API as the Ingress replacement, and you can actually run both side by side, so you can run Gateway API and Ingress at the same time.

Now, if you're using Gateway API, you'd need to create an HTTPRoute, or if you're using the Ingress API, you'd need to create an Ingress for the Hubble UI so you can see the telemetry. And if you are installing Gateway API, you would need its CRDs. All of this is probably best done with Helm rather than trying to use this built-in implementation of Cilium; this just gives you a representation of what it looks like when you go to install Cilium yourself.

In fact, let's look at one of those Cilium agents. You can pick any one of the agent pods, the ones named cilium- followed by a hash. Did that work or no? Okay, it doesn't work on mine either; it always does something weird when I do that. CNIs, once you learn how to use them, make your life a lot easier.

All right, see what we have. You can see that it's a DaemonSet, right? You can see it was installed in the kube-system namespace, which is appropriate. It looks like minikube-m06 was the last node created, and their script behind the scenes started installing Cilium before node six was up. Rather than waiting until all the pods on the node were up, it just determined that the node was accessible but not ready, and then it tried to install Cilium on it. That would be a mistake in your own scripting, but still, what they've done here is pretty amazing from an engineering standpoint; we just have a little mistake there. You would wait until all of your pods are up and then put Cilium on all of the nodes.

But what version are we running? It's right there in the events. If we look at the container image where it says "Successfully pulled image," it tells us that the script they're running pulled from quay.io, Cilium version 1.17.4. They switched to Quay from the Docker repos, probably because Docker was charging them too much, or they had timeouts or DNS issues; a few upstream projects have switched to quay.io from the Docker repos.

All right, let's get out of that and take a look at the operator. There's only one operator, so you can tell this is not a high-availability setup even though we requested high availability. The Cilium operator is what manages all of the individual agents and makes sure the agents are running and spun up properly. Kubernetes uses an operator principle, and you'll see that especially with StatefulSets: when you're running a StatefulSet, there will often be an operator in front of it as well, for example with your databases. Most mainstream databases have converted over to operators, typically written in Go, under the hood, and the operator manages spinning up your tenants.
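For reference, a Helm-based Cilium install with the pieces discussed above enabled (kube-proxy replacement, Gateway API, Hubble relay and UI, and the agents pointed at the kube-vip VIP) would look roughly like this. The chart value names are from memory of recent Cilium charts, and the VIP address is a placeholder, so verify against helm show values cilium/cilium for the version you install.

    helm repo add cilium https://helm.cilium.io

    # Gateway API CRDs must exist before Cilium reads them; apply the upstream
    # gateway-api release manifest that matches your Cilium version first.

    helm upgrade --install cilium cilium/cilium \
      --namespace kube-system \
      --set k8sServiceHost=192.168.49.100 \   # placeholder: the kube-vip virtual IP
      --set k8sServicePort=6443 \             # placeholder: your API server port
      --set kubeProxyReplacement=true \       # Cilium takes over kube-proxy's job
      --set gatewayAPI.enabled=true \
      --set hubble.relay.enabled=true \
      --set hubble.ui.enabled=true

    # Wait for the agents, operator, and Hubble components to settle
    cilium status --wait

With kube-proxy replacement enabled and working, the kube-proxy DaemonSet seen earlier could then be removed.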
So you have an operator and then you have tenants. If you recall, when we studied namespaces, we had a MinIO operator, right? And then we had MinIO tenants for Loki for the logging, and a MinIO tenant for GitLab. This is a similar concept: the operator spins up the tenants, which in this case would be the Cilium agents, and we'll see what it does here.

So it has a replica count of one, right? It's controlled by a ReplicaSet. Liveness and readiness probes; it's very important for Kubernetes to have liveness and readiness probes. Let's continue down through; everything is true. We don't see the one replica directly; this is an individual pod, and it just says this individual pod is controlled by a ReplicaSet, and we can look at that ReplicaSet in a minute when we get out of this. Did you see up there where it was controlled by a ReplicaSet? Okay. And you can see the conditions are all true, so this thing is in good shape. The versions should be the same across Cilium, with the exception of, I believe, Hubble. So when you use Cilium in your own cluster, if you decide to, you'll see the versions are pretty much the same except when you get into the Hubble components.

All right, let's take a look at the ReplicaSets across all namespaces. There we go: desired one, current one, ready one. That's all they gave it in the config. Then let's take a look at the pods again and look at one of the Envoy pods. What's the difference between the operator and the Envoy? Correct. So when the script spins it up, think about how it's deploying each one internally; what's the difference between the two? Correct. The DaemonSet runs one pod per node whether you do HA mode or not, and the operator is set up so that whether you do HA mode or not, it only installs one. They're probably using the same script for every Cilium install, so you end up with one operator.

Let's see if we can find anything else in here. In this case the Envoy pod has everything true, and it has liveness, readiness, and startup probes. Notice that? You don't see a startup probe too often; you see liveness and readiness, and sometimes you'll only see readiness, which is not good, because you want to see liveness as well. Startup provides that additional measure. Readiness means the pod is ready to accept workloads. Liveness means "we're alive"; it's basically pinging: hey, are you there? Yes, I'm here. Okay, great, I can keep sending workloads to you. Readiness is: I'm not ready yet, don't send me anything; okay, now I'm ready. Then we go to liveness, right? And startup covers the pod as it's starting up. What kind of timeouts do we have here on startup? Okay, a period of five seconds, a timeout of one second, and the failure threshold, does that say 105? Yeah. And that's because they evidently see a lot of failures starting this up before it's ready. That's not unusual, by the way; somebody actually tested it out and said that was the number that worked. So this team actually tests.
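For reference, a minimal sketch of what liveness, readiness, and startup probes look like on a container spec. The path, port, and numbers are illustrative rather than Cilium's actual values; the large failureThreshold on the startup probe mirrors the idea of the 105 seen above, giving a slow-starting container a long window before liveness checks take over.

    # Illustrative probe configuration inside spec.containers[] (not Cilium's real values)
    livenessProbe:
      httpGet:
        path: /healthz
        port: 9878
      periodSeconds: 10
      timeoutSeconds: 1
    readinessProbe:
      httpGet:
        path: /ready
        port: 9878
      periodSeconds: 5
    startupProbe:
      httpGet:
        path: /healthz
        port: 9878
      periodSeconds: 2
      timeoutSeconds: 1
      failureThreshold: 105   # tolerate a long startup before liveness kicks in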
That's a good sign, by the way, when you see that; it means they have a complete team. If the startup probe fails, it'll wait, then go into a failure state and tell you the startup probe failed, and then it'll try to restart it if that's how it's set up. It's a DaemonSet, so it may kill the pod and start up another.

So let's see if there's anything else good in here. We have a ConfigMap, look at that, an Envoy config; we'll check that out here. And we have kube API access with a token, so it was given a token to be able to access the kube API. That's the kube root CA cert, which is probably in the kube-system namespace, so it automatically has access to that. Node selectors are pretty basic; it just says you can install on any node that's Linux.

Okay, what do we have here? Node affinity: four did not satisfy. It says four nodes didn't satisfy node affinity because each of the first four already had one of these pods, and the fifth one was still spinning up. This must be five; is this five? Scroll to the top; it should tell us what node we're on. Node minikube-m05, yep. So it says the first four already had the DaemonSet pod running, so they're not available, because node affinity says we can only run one per node. The fifth node is where it wanted to install, but it wasn't available yet, and it also errored because it didn't have any free ports for the requested pod ports; the ports weren't up yet on the node. And if we look at the version, this is not 1.17, because this is Envoy. Not everyone installs Envoy, but the engineering team behind minikube included one.

All right, now we're going to look at the DaemonSet, so we'll describe the DaemonSet. Wow, it had to create those pods quite a few times, didn't it? Notice that? What we have is a label: cilium-envoy, part of Cilium. They're using a deprecated DaemonSet template, so they might be using a newer version of the image but an older script. Okay, let's see our liveness and readiness probes, and see if they have a startup probe; there's the startup probe with the 105. We have an environment variable, the K8s node name. We have mounts; what do we have here? BPF mounts, config. And we have a ConfigMap; what's the name of the ConfigMap? It's saying: I need to read this ConfigMap, cilium-envoy-config. And where do you think that ConfigMap is located, which namespace? Yeah, that would be my guess as well.

All right, let's scroll down to the bottom and look at our events and see how many times this thing churned: created three pods, deleted one, created three, deleted four, created three, deleted two, created three, deleted two, and finally got six out of all of that. But that's what it's designed to do; that's part of the self-healing and everything it needs to do when it's spinning up.
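For reference, the inspection commands used in this part of the walkthrough, roughly. Resource names like cilium-envoy and cilium-envoy-config match what a typical Cilium install creates, but verify them with kubectl -n kube-system get all on your own cluster; the pod name is a placeholder.

    kubectl -n kube-system get pods -o wide
    kubectl -n kube-system describe pod <cilium-envoy-pod-name>
    kubectl -n kube-system describe daemonset cilium-envoy
    kubectl -n kube-system get configmap cilium-envoy-config -o yaml
    kubectl -n kube-system get events --sort-by=.lastTimestamp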
So that's part of the magic that makes Kubernetes so great. Okay, let's take a look at the pods again and see if they're all still running, or if any of them have crashed now that they've been running for a while. Yeah, we have a few that restarted, but not too bad. So you can see Cilium adds a little bit of complexity, but it takes over a lot for you. Notice something that's missing in this cluster, something that's in a normal single-node cluster when we spin it up? Think back to what is typically shown in our pods when we just do a minikube start. Obviously nowhere near the number of pods we have showing right now, but what's normally in there that's missing? Remember kindnet? Kindnet is minikube's CNI, and it's gone because we're using Cilium.

Okay, so in lesson six we learned how Ingress works: how Ingress enables clients to access workload endpoints, how Ingress uses an ingress controller, and how Ingress is managed through the Kubernetes API. Remember, we used the Ingress file, and we used kubectl, which sent it to the Kubernetes API. We learned how the Ingress API is being replaced by the Gateway API, and that when you switch to Gateway API with an existing workload setup, you must convert everything of kind Ingress to Gateway API when migrating. Gateway API requires its CRDs to be installed first, so those get installed before Cilium, and then Cilium reads those CRDs when you enable the Gateway API flag. We covered how Gateway API relies on a Gateway and an HTTPRoute, how a Gateway can share many HTTPRoutes across namespaces, and how Gateway API provides greater flexibility, standardization, and scalability. There's one more experimental feature they're testing out relating to TLS routes, which very few practitioners actually need yet.

We talked about CNIs: CNI plugins are used for cluster networking and to manage network and security capabilities. For example, Cilium can enable pod-to-pod and node-to-node encryption. If these nodes were spread out across multiple bare-metal instances with a network cable running between them, it would encrypt the traffic node to node, pod to pod, node to pod, and pod to node; that's transparent encryption. That way you don't need a TLS cert when you're communicating into the pod directly to the container anymore. The Kubernetes method is that TLS is terminated at the gateway now, and then we use pod-to-pod and node-to-node encryption natively, which saves a lot of time when you're managing hundreds and hundreds of containers with TLS. We saw how a CNI can be used to deploy the Gateway API, how Cilium is a CNI, and how Cilium is a networking, observability, and security solution; the observability is through Hubble, so you have to actually enable the relay and also enable the UI. And Cilium works well on bare metal with kube-vip.

All right, let me go ahead and start lesson seven. In lesson seven we're going to learn how to define computational resources using requests and limits.
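For reference, a minimal Gateway API sketch of the Gateway plus HTTPRoute pair described in the review, wired to the Hubble UI as an example backend. The namespaces, the shared-gateway name, and the hubble-ui service name and port are assumptions; the gatewayClassName depends on which controller you run (Cilium registers one named cilium).

    apiVersion: gateway.networking.k8s.io/v1
    kind: Gateway
    metadata:
      name: shared-gateway        # hypothetical name
      namespace: infra            # hypothetical namespace
    spec:
      gatewayClassName: cilium
      listeners:
      - name: http
        protocol: HTTP
        port: 80
        allowedRoutes:
          namespaces:
            from: All             # lets HTTPRoutes in other namespaces attach
    ---
    apiVersion: gateway.networking.k8s.io/v1
    kind: HTTPRoute
    metadata:
      name: hubble-ui
      namespace: kube-system
    spec:
      parentRefs:
      - name: shared-gateway
        namespace: infra
      rules:
      - backendRefs:
        - name: hubble-ui         # assumed service name/port for the Hubble UI
          port: 80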
In Kubernetes, resource limits are crucial for enabling efficient resource utilization and preventing resource starvation. If resources are constrained within the control plane, for example, the Kubernetes API server or etcd may become unavailable. If insufficient resources are available for a particular node or pod, then those resources may become unavailable for use. Available resources are defined within the node status and may be accessed using the kubectl CLI. Node resources consist of CPU, memory, storage, and pods.

Requests enable pods to reserve a specific resource and ensure its availability when needed. Limits, different from a request, define the maximum resources available to the pods on the node. Total node limits may exceed 100% of the available capacity: if you have 12 CPUs, you can set the limits for everything running on that node to 15 CPUs. This concept is based on the realization that not all pods will hit their limits at the same time.

The kubelet monitors node resources and will proactively terminate pods to reclaim resources when a particular resource is under pressure. The kubelet can fail one or more pods in order to reclaim that resource. The kubelet will set the pod status of the evicted pod to Failed and then terminate the pod. So when you look at your pods with kubectl, you'll see it show as failed, and sometimes you can't even see it; it'll go straight to terminating and then it just disappears. The kubelet will attempt to reclaim resources before evicting pods, such as when experiencing disk pressure, and it will delete old unused container images first before evicting. So if you experience disk pressure, it'll try to delete the unused container images on that node first. If you remember back to the example where we were installing a DaemonSet on all three nodes and it took longer, that's because it had to download the image onto each node; those images stack up after a while and aren't deleted until they're garbage collected. One area in which Kubernetes cluster operators experience node pressure is the disk filling up on the node due to unmanaged log collection without a proper log shipping and rotation process. This can cause failures in a production environment several months after a Kubernetes cluster has been provisioned. That's one of the areas that can get you.

All right, OOM killing. Killing a process due to out-of-memory can happen for both node processes and pods. OOM killing a process or pod is usually due to unavailable resources or constraints. When upgrading containers with Helm charts to a new version with tightly constrained memory limits, it is not unusual to experience OOM kills for the pod that was just upgraded. This is one of the reasons that upgrades should be tested on a production-like cluster before deploying to a real production cluster. And this often happens because the upstream maintainer, the team whose resource you're using, forgot to test in a production-like environment before pushing to production.
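For reference, the standard way to read what was just described from the node status: pressure conditions, capacity, allocatable, and the currently allocated requests and limits. The node name is whatever kubectl get nodes shows on your cluster.

    kubectl get nodes
    kubectl describe node minikube       # node name from `kubectl get nodes`
    # Relevant sections in the describe output:
    #   Conditions:          MemoryPressure / DiskPressure / PIDPressure
    #   Capacity:            cpu, memory, ephemeral-storage, pods
    #   Allocatable:         what is actually schedulable after system reserves
    #   Allocated resources: summed requests and limits of the scheduled pods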
What will happen is you'll go to run it in your cluster and all of a sudden you're being OOM killed, either during startup, or it will run for a week and then start OOM killing, and that's because the upstream team forgot to test it. So a lot of upstream CNCF teams that are funded through CNCF use kind, which is different from the kind field on the YAML template; kind here is Kubernetes in Docker. CNCF provides a lot of templating to teams, so they just spin up and test in kind: it pulls everything in, says all pods are running, you've completed your test, pretty basic, and then it pushes automatically to production. So with new Helm charts this is typical.

Container resource requests and limits are optional; we don't have to provide that information in our templating or manifest files. The most common resources to specify are CPU and memory. Both CPU and memory may be requested and limited in the container configuration; that would be under spec.containers, and further down would be your resources block.

When you specify a resource request, the kube-scheduler will determine which node to place the pod on. So if you need five CPUs and you only have one node with five available, that's where it'll be placed. If you need 10 gig of RAM and only one of your nodes has 10 gig available, that's where it goes. The kubelet then handles the reservation of resource requests on the node. A container is allowed to consume more resources than requested if necessary. Container requests are set as follows: spec.containers[].resources.requests.cpu and spec.containers[].resources.requests.memory.

The kubelet enforces limits on each node. A container may consume more resources than specified short term, but it is generally not allowed to consume more than specified over time. So you might see a pod consuming more than you've allowed, but that's not allowed to go on for very long. When you specify a resource limit, the kubelet will enforce that limit: CPU limits are enforced by throttling, and memory limits are enforced with OOM kills. A container that exceeds its memory limit may not be OOM killed immediately. Limits are set as follows: spec.containers[].resources.limits.cpu, which is different from before where we saw requests, and spec.containers[].resources.limits.memory.

All right. Determining the correct setting for requests or limits can be accomplished in multiple ways. The first method is to review the documentation for the application running inside the pod. Typically, the engineering teams who maintain these products will post a minimum set of requests and limits, along with advice on whether certain limits such as CPU should be avoided. In other words, the application may experience a temporary spike in CPU, and if that happens they recommend not setting a limit because it's temporary; it may only last a few seconds, it may happen during startup, for example, and they don't want it to throttle and then fail. That's usually because the team has tested their product extensively. So you want to look for advice about avoiding CPU limits in the documentation.
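For reference, a minimal container spec showing where requests and limits live, matching the spec.containers[].resources paths just described. The pod name, image, and numbers are illustrative.

    apiVersion: v1
    kind: Pod
    metadata:
      name: resource-demo        # illustrative name
    spec:
      containers:
      - name: app
        image: nginx:1.27        # illustrative image/tag
        resources:
          requests:              # reservation the scheduler uses for placement
            cpu: "500m"          # half a CPU
            memory: "256Mi"
          limits:                # ceiling the kubelet enforces
            cpu: "1"             # throttled beyond this
            memory: "512Mi"      # OOM killed beyond this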
In the absence of good documentation on resources, the next best method is setting a practical resource value for both requests and limits for CPU and memory. Using your favorite deployment method, start up the pod with a container and monitor both the events and logs for any error messages. Adjust the settings as necessary to eliminate startup issues first, and then proper monitoring of the container logs is necessary to troubleshoot any issues that may arise after the initial startup sequence is finished. So there are two steps involved when determining requests and limits: first you get through startup, and then you monitor it over time.

Now for the practical application. Ensure you have a fresh minikube profile, and you're going to get the node resources; once that comes up, let me know. We'll be right back. Wow, what's going on with your disconnects up there? That's weird, I don't think mine does that at all. Have you ever seen that pop up when you're not in session? They only allowed the student environment to run for 15 minutes. Have you noticed whether it does that every 15 minutes? I know when we were setting up the VMs, we had to hurry because they were only up for 15 minutes at a time. Okay, so the VM is staying up, but it's disconnecting the network and reconnecting.

All right, we're going to look for the maximum capacity for CPU. Where do you find that? It's under Capacity, cpu. What are the allocated resources for CPU? Scroll down just a little bit, there we go. How much do we have allocated? 750m, and given that 1000m is one CPU, our requests come to three quarters of a CPU, and limits are zero. Which pod has the greatest CPU request? Yeah, the most important one in the whole thing, huh? And what is the disk pressure condition? Yep, no disk pressure. And if we look, what is the memory pressure? And how about the PIDs? That's the one that gets people; they don't even realize it's possible to run out. And which component maintains and reports on pressure? What handles that process? Right, the kubelet. Looks like Neil used a 16-CPU setup; how much memory do we have there? 12.2 gig, it looks like.

So now we'll create a new deployment file using the nginx app, but we're going to copy it to an nginx-app-limits YAML file. All right, so what are we doing here? 12 megabytes, correct. Okay, let's deploy the nginx app. Yep, we can see we have a memory and a CPU limit. Now let's go back to the pods and see what's going on here. Let's take a look at one of them; we can figure this out. So it's being OOM killed, but it's not telling us why. Scroll down to the bottom: successfully pulled the image, it's created it now five times, it's already present, and it's just in this back-off restarting failed container loop. See if we can look at the logs; you can get the logs for that container. All right, go ahead and delete that, just delete the deployment, and we're going to modify that file.
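For reference, a reconstruction of what the nginx-app-limits exercise file likely looks like: a Deployment with a deliberately tiny memory limit so the containers get OOM killed, as seen above. The 12Mi figure and the three replicas come from the discussion; the labels, image tag, and CPU limit are assumptions.

    # nginx-app-limits.yaml (sketch): memory limit set far too low on purpose
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: nginx-app-limits
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: nginx-app-limits
      template:
        metadata:
          labels:
            app: nginx-app-limits
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              limits:
                cpu: "1"
                memory: "12Mi"   # too small for nginx, so it gets OOMKilled / CrashLoopBackOff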
No, it's not telling us why it's being OOM killed; it's just telling us OOM kill. So we're going to see if we can prod it to tell us something. Oh, we didn't delete it yet; go ahead and delete it. Okay, so let's modify this, that's correct, to 1Mi, deploy it, and we'll get the node resources. And are the pods running? Let's check: the deployment shows zero of three ready. So let's look at the pods again; we have a container create error, so describe the pod. There's a minimum memory on this; it failed to create the pod sandbox, and down at the bottom it says the container init was OOM killed, memory limit too low. This is interesting: for some reason your minikube is treating this slightly differently than mine does. When I run this, it actually tells me what the minimum is; on yours it just says unknown, where mine says minimum 6Mi. And yours wouldn't run even with 12Mi, which is quite interesting.

Okay, so let's try to check the logs; we have no logs, right? And after an hour the events are deleted, so if this were sitting here for an hour, the early events would be gone. Okay, close that out. I'm going to go off the script here because for some reason your minikube was acting differently with that deployment. What it says, second to last above "pod sandbox changed," is where yours says unknown; mine says minimum memory is 6Mi, but we tried it with 12 and it didn't work either. So let's go ahead and delete that deployment and change it to 100Mi, so we can figure out what's going on here, because this changed in the last day or two with this deployment, and I wonder if the version is the same. That's weird.

Let's take a look at it first, though: describe it while it's running and see what it says, and look at the events down at the bottom again. Yeah, everything looks good there. Okay, so go ahead and delete that and try it; you said you wanted to try six. There we go. All right, let's take a look at the OOM kill; we know for sure it had an OOM kill, and there will probably be a good event in there. Well, it didn't give us the message on the RAM, that's interesting. When I run this on my minikube, it actually gives me the event with the minimum requirement, so that was interesting. I see up there memory 6Mi, CPU 250m, and nothing else. The log shows configuration complete, ready for startup, and it started one, two, three, four, five, six worker processes, and then it's OOM killing before it gets any further. That's interesting, because nginx automatically reads how many CPUs are available on the node; in this case there are 16 CPUs, so you should see "start worker process" sixteen times, and you only see six. So that's why the crash loop back-off, and no readout on the minimum.
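For reference, the troubleshooting loop being used here, roughly; the deployment and pod names are placeholders matching the exercise above.

    kubectl get deployments
    kubectl get pods                                 # look for OOMKilled / CrashLoopBackOff
    kubectl describe pod <nginx-app-limits-pod>      # events: pod sandbox, OOM kill reason, exit code 137
    kubectl logs <nginx-app-limits-pod> --previous   # logs from the container that was killed
    kubectl get events --sort-by=.lastTimestamp
    kubectl delete deployment nginx-app-limits       # then edit the YAML and re-apply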
All right, that's a nuance of your setup. And in that case, with a crash loop back-off, there are generally no logs if the pod stays in ContainerCreating; ContainerCreating wouldn't have a log, and when it goes into crash loop back-off you may not have a log either.

Okay, so we're going to delete that and query the node. We're going to check what is allocatable for CPU and RAM. And how much RAM, 12.2 gig? Yep. And how many pods were in our last deployment, the nginx app; how many pods did it deploy? Three, there we go.

Okay, so let's do this. We'll create a new nginx app file; just copy it over and call it nginx-app-requests. All right, so this exercise is designed for a 12-CPU cluster, and we have three replicas. So we're going to set the CPU limit first: three replicas times three CPUs is nine CPUs, and three times three gig is nine gig of memory, right? Let's set the limit to three for CPU, so 3.0, and for memory set 3000Mi, and then for requests set the CPU to 7.0. Neil has a slightly different setup for you than what I used, so we'll need to modify this here in a little bit. Make sure that's correct, and we're going to deploy it.

All right, so what we have here is limits of three and requests of seven on CPU, and it says the request of seven must be less than or equal to the limit. So you can't request more than your limit. Right, if you're limited to three, absolutely, and the same applies to the RAM.

So now we're going to modify it. A request is a reservation. It's like reserving a hotel room: hey, I need to reserve seven CPUs and four gig of memory. The scheduler and the kubelet say, okay, I've got seven CPUs available and four gig of memory. And if there isn't, it says, hey, I have no nodes to put you on, you'll have to wait until I have a node available, and it will loop and loop until a node is available. The limit is: okay, you've requested it, and now my maximum is whatever we have there. So in this case we're going to modify this so that they both say the same thing: CPU 7 and memory 4000 on both.

It did not schedule, no. And it actually gave you a very good, verbose error message; kubectl sometimes does provide very verbose messages. All right, that's the first step in troubleshooting. And what is the next step? Yep, because it hasn't started up yet, there are no logs. So now what's the next step? Describe the node. Let's take a look and see what that looks like. So what do we have running on there? And how much are they consuming of our requests? 92%? Yep. So if you look at allocated resources: 14750m, so 14.75 CPUs requested. And the limits were...
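For reference, the relevant part of the nginx-app-requests container spec after the fix discussed above: requests may not exceed limits, so both are set to the same values here (7 CPUs and 4000Mi per the walkthrough). The container name and image are assumptions.

    containers:
    - name: nginx
      image: nginx
      resources:
        requests:             # must be <= the corresponding limits
          cpu: "7"
          memory: "4000Mi"
        limits:
          cpu: "7"
          memory: "4000Mi"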
Per individual pod, correct. And so this is saying we're trying to deploy pod number three, but 92 percent of CPU is already requested, and limits are at 87 percent, which doesn't matter; the limits can go over a hundred, as you can see. And we have plenty of memory available, although we're maybe a little short. So it's telling you we're full. We're going to modify the limits, and you can go ahead and take that down. All right, delete the deployment and let's modify.

And what should we modify? Let's see, we have 16 CPUs available, and normally we have 12 when we do a minikube cluster, so we have four additional available. Normally we would change the request to 6, or 2, or 3. Let's go ahead and change it to six and see. Go ahead and apply it. We can see it's still pending, so we can go straight to the node and check the node resources. And what do we have for our allocated? It's lower. Okay, so let's change it again; yeah, let's try five. Let's check the one that's still pending, describe the pending pod and see what's going on. Okay, let's look at the node. All right, so how much memory is requested? Okay, and let's go up and deduct that from how much is available. If we scroll up a little higher, it tells us the total allocatable was 12236Mi; scroll back down, and subtracting the roughly eight gig already requested leaves us with about 4 gig, and for some reason it's still saying that's not enough. So it won't let you go to 100% on requests; that's what it's telling you.

Okay, so let's change it again. Again, your minikube cluster is two days newer than mine and slightly different. You'll run into this with Kubernetes clusters; that's the nuance of two different clusters spun up on two different VMs, or two different clouds. Okay, so we're going to change the requested memory; let's change it to 3Gi. So it won't let you go to 100% for requests. All right, boom, all three run. Let's take a look at the nodes. And you can see we've got 98% allocated on the CPU, right? So is there any room left for more deployments? Yeah. And this is what mine would look like too, but in general you shouldn't have any more than 90% of a node requested on either CPU or memory. And as you can see, it wouldn't even let us do 100% with everything running; it cut us off at about 99%.

And minikube is actually great for testing concepts out; it just has some limitations. It's great Docker engineering magic. Oh, interesting, it's restarting the network again. All right, so, questions: Can requests be greater than available resources? Can limits be greater than available? Correct, that is correct. All right, we're going to do a review, and then, actually, you know what?
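For reference, the checks used to figure out why the third replica stays Pending; the label selector and node name are placeholders for whatever your deployment and cluster use.

    kubectl get pods -l app=nginx-app-requests
    kubectl describe pod <pending-pod>           # look for a FailedScheduling event,
                                                 # e.g. "Insufficient cpu" / "Insufficient memory"
    kubectl describe node minikube | grep -A 10 "Allocated resources"
                                                 # compare requested totals against Allocatable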
Yeah, it's 11 o'clock; that went a little over and took a little longer than I thought, so we'll take a 15-minute break, come back at 11:15, and then we'll do our review and go to the next lesson. See you in 15.

Let's go ahead and review. In lesson seven we learned about node resources and limits, and how constrained control planes can affect the Kubernetes API server. Available resources are defined in the node status, and the node status is accessible by describing the node using kubectl. Node resources consist of CPU, memory, storage, pods, and also PIDs. Requests enable pods to reserve a specific resource, and requests ensure a resource is available if needed. Limits define the maximum resources available to a pod. Total node limits may exceed 100% of the maximum resources, and that is due to the fact that most pods will not hit 100% at the same time.

The kubelet monitors node resources for node pressure. The kubelet will proactively terminate pods, it can fail one or more pods to reclaim resources, and it will try to reclaim resources before evicting pods, by deleting old container images first. We learned how node disk pressure can be caused by unmanaged logs, and why a proper node log shipping and rotation process is important.

OOM killing a process can happen to node processes and pods. So if you run into a situation where your pods are working but your node is unresponsive in some respect, and you can still access it with the Kubernetes API server, you might see that a process has been killed inside the node itself, which you don't normally interact with, but you may see that in your events. OOM killing can happen due to unavailable resources or constraints. This can happen when upgrading a container with tight constraints. Upgrades should be tested on a production-like cluster first, and upstream maintainers sometimes forget to test in a production-like environment before pushing to production; that's actually more common than you would think it should be when working with Helm charts, by the way. Container resource requests and limits are optional, and the most common are CPU and memory.
I've actually rarely seen anything else used. Resources may be requested and limited in the container config, and here's a new one: we didn't go over this because it's in beta and not enabled in the cluster, but you can also request and limit at the pod level in the pod config. Think about what that might mean: a pod can contain multiple containers, right? So if we can set requests and limits for the pod, that helps when we can't add them for the container because we might not have access to those configs; if we have access to the pod config we can add it there, or we can just set the request or limit for the pod only and skip the containers. There are certain situations where that might be advantageous; somebody requested it and it's now in beta.

The kube-scheduler determines which node to place the pod on, and the kubelet handles reservation of the resources; the kubelet runs on each node, and the kube-scheduler runs as a pod and handles the scheduling. Containers may consume more resources than requested. The kubelet enforces resource limits: a container may temporarily consume more than its limit, but containers may not consume more than their limits over time, though there might be a delay. CPU limits are enforced by throttling, memory limits are enforced with OOM kills, and OOM killing a container may not happen immediately, although as we saw when we practiced it, it looked pretty immediate to me.

Determining the correct resources can be done in two ways: review the documentation for recommendations, or set a practical resource value for both requests and limits, monitor the events and logs for any error messages, and adjust as necessary first to eliminate startup issues.
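For reference, a sketch of the pod-level requests and limits mentioned above. This is a recent feature behind a feature gate and is not enabled in the course cluster; the field layout shown here (a resources block directly under spec) is from memory of the proposal, so verify against the Kubernetes documentation for the version you run before using it.

    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-level-demo            # illustrative name
    spec:
      resources:                      # applies to the pod as a whole (beta, feature-gated)
        requests:
          cpu: "1"
          memory: "512Mi"
        limits:
          cpu: "2"
          memory: "1Gi"
      containers:
      - name: app
        image: nginx                  # containers may omit their own resources blocks
      - name: sidecar
        image: busybox
        command: ["sleep", "infinity"]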