11 videos 📅 2024-05-20 09:00:00 America/Creston

Visit the Apache Nifi GROUP 2 course recordings page

WEBVTT

00:00:01.680 --> 00:00:02.120
Perfect.

00:00:05.900 --> 00:00:06.340
Perfect.

00:00:07.180 --> 00:00:14.540
Thomas, did you make it back yet or did he get called out to another call or something?

00:00:14.840 --> 00:00:14.960
Sorry.

00:00:15.040 --> 00:00:16.280
Were you looking for me?

00:00:16.700 --> 00:00:16.720
Tom?

00:00:17.200 --> 00:00:17.660
Hey, Tom.

00:00:18.080 --> 00:00:23.480
Yeah, if you can, can you go ahead and start your desktop, get logged in?

00:00:24.860 --> 00:00:27.160
I think it should be good to go.

00:00:27.680 --> 00:00:27.840
Sure.

00:00:27.840 --> 00:00:28.580
Thank you.

00:00:40.940 --> 00:00:44.880
All right, looks like everyone is coming up.

00:00:45.140 --> 00:00:48.920
Peter, just so you know, the Uploads folder is what I uploaded back.

00:00:48.980 --> 00:00:55.420
I actually need to upload a newer presentation, so the PDF you see there has a couple of errors.

00:00:56.260 --> 00:01:01.360
So later today after the class, I will give an updated PDF.

00:01:01.540 --> 00:01:06.280
I will also email it out; it's there just for reference for the most part.

00:01:06.720 --> 00:01:12.220
I still need to email it out to the previous class, and so I have all of you all's email

00:01:12.220 --> 00:01:15.540
addresses now, so I will handle that.

00:01:18.260 --> 00:01:19.820
So let's get started.

00:01:19.820 --> 00:01:26.920
All right, so it looks like everyone is logged in to the desktop, depending on the system

00:01:26.920 --> 00:01:31.200
you're using and proxies and things like that.

00:01:31.340 --> 00:01:37.380
The last training class, a couple of folks got proxied to death; they had to have

00:01:37.380 --> 00:01:40.260
some fixes applied before things were working.

00:01:41.080 --> 00:01:47.120
But once you get logged in, you're going to have your virtual desktop, and so the

00:01:47.120 --> 00:01:49.320
latency sometimes can be an issue.

00:01:49.960 --> 00:01:53.500
You may point and click at something and it takes a second to respond.

00:01:54.340 --> 00:01:59.060
You know, you just have to bear with it, but this way it gives us that common training

00:01:59.060 --> 00:02:01.080
environment to work off of.

00:02:02.300 --> 00:02:06.960
So that being said, feel free to follow along with me.

00:02:07.160 --> 00:02:13.780
There's nothing specific you would need to do right this minute except for follow along.

00:02:13.780 --> 00:02:19.920
You can go through NiFi, install it, and stuff like that if you want.

00:02:20.420 --> 00:02:26.400
That's what I'm going to do, but we will also have time to work on that as well

00:02:26.400 --> 00:02:28.120
in this presentation.

00:02:30.680 --> 00:02:31.660
All right.

00:02:33.100 --> 00:02:39.520
So just like I mentioned earlier, everything that is taught in this training class,

00:02:39.520 --> 00:02:45.540
well, except for the actual data flows and those types of things, is available online.

00:02:46.740 --> 00:02:55.580
So for some quick resources, nifi.apache.org is the website to go to.

00:02:56.800 --> 00:03:00.120
The documentation is very, very extensive.

00:03:00.980 --> 00:03:07.180
As you can imagine, being a government system, it requires lots of documentation.

00:03:07.180 --> 00:03:11.120
You can download it from here, those types of things.

00:03:11.680 --> 00:03:16.060
So if you're at home and want to play around with NiFi, have at it.

00:03:16.620 --> 00:03:27.920
You can go to nifi.apache.org, click download, and download the latest release.

00:03:29.360 --> 00:03:32.580
So NiFi 2.0 is out now.

00:03:32.580 --> 00:03:39.440
Well, it's not a full-fledged release version yet, but it's coming.

00:03:40.260 --> 00:03:44.580
You'll notice that even on the 16th of May, four days ago,

00:03:44.620 --> 00:03:48.040
they had an updated release to this pre-release.

00:03:48.500 --> 00:03:50.440
You're more than welcome to download that.

00:03:50.800 --> 00:03:55.400
For this training class, though, we're going to work off of NiFi 1.26,

00:03:55.400 --> 00:04:00.180
just because that's the version that most people will...

00:04:00.180 --> 00:04:02.640
It's a major version.

00:04:02.680 --> 00:04:08.300
It's not the next major version; it's the version that most people have installed.

00:04:08.940 --> 00:04:11.300
So I think you all are running 1.25.

00:04:12.200 --> 00:04:18.220
In some instances, you may have 1.26, but you can download that.

00:04:18.460 --> 00:04:20.280
You can download the source.

00:04:20.280 --> 00:04:27.760
If you want the source files and to compile NiFi yourself, you can have the source.

00:04:28.600 --> 00:04:31.080
It is an open-source application.

00:04:32.000 --> 00:04:35.360
So you're able to download the source files.

00:04:35.900 --> 00:04:39.400
If you're a software engineer, you're able to go in and make changes.

00:04:40.360 --> 00:04:44.060
If you're running scans for vulnerabilities, have at it.

00:04:44.440 --> 00:04:45.900
There's a lot of capabilities.

00:04:47.940 --> 00:04:53.920
The administrator guide actually kind of goes into how to build NiFi from source

00:04:53.920 --> 00:04:55.720
and those types of things.

00:04:56.440 --> 00:05:02.480
But for this class, we are going to work off of the standard NiFi 1.26 binary.

00:05:02.900 --> 00:05:04.840
This is already pre-built.

00:05:05.240 --> 00:05:06.640
It's ready to go.

00:05:06.740 --> 00:05:09.980
We just need to install it.

00:05:10.160 --> 00:05:14.440
But it's not an install like you would think with a normal Windows application.

00:05:14.440 --> 00:05:18.320
Now, I chose Windows for this class.

00:05:19.260 --> 00:05:24.680
The last class, there were a couple of folks who really liked Ubuntu Linux.

00:05:25.660 --> 00:05:30.900
But for ease of use, everyone in this class is using Windows.

00:05:31.500 --> 00:05:32.720
NiFi can run on Windows.

00:05:32.880 --> 00:05:33.840
It can run on Linux.

00:05:34.000 --> 00:05:38.440
It can run on numerous types of devices and operating systems.

00:05:39.520 --> 00:05:44.560
So again, when I downloaded NiFi for everyone for this class,

00:05:44.640 --> 00:05:45.980
I just went to this link.

00:05:46.620 --> 00:05:49.180
I said, go to the 1.26 binaries.

00:05:49.540 --> 00:05:52.880
And I clicked the HTTP.

00:05:54.340 --> 00:05:56.040
And there's a backup site as well.

00:05:56.180 --> 00:05:59.700
They're chosen based upon the CDN and stuff like that.

00:06:00.460 --> 00:06:02.400
It takes a second, and then it will start downloading.

00:06:03.020 --> 00:06:06.840
So NiFi itself is pretty big.

00:06:06.840 --> 00:06:13.500
And the reason is that NiFi has all of its processors

00:06:13.500 --> 00:06:19.840
and just everything going on here bundled up into this zip file.

00:06:24.840 --> 00:06:30.920
So as you can imagine, with everything installed coming with NiFi,

00:06:31.140 --> 00:06:33.820
it's over a gig download.

00:06:33.820 --> 00:06:36.580
And so the zip file is pretty big.

00:06:36.840 --> 00:06:40.160
When we extract it, it's going to get even larger.

00:06:41.300 --> 00:06:45.360
Some of the newer versions of NiFi should be a little bit smaller.

00:06:46.340 --> 00:06:50.200
I know that some of the processors that we're going to talk about today

00:06:50.200 --> 00:06:53.980
may not be available in the newer version of NiFi

00:06:53.980 --> 00:06:58.680
just because they are taking some of those processors out

00:06:58.680 --> 00:07:01.920
and having it as an optional download.

00:07:01.920 --> 00:07:04.580
Just for FYI.

00:07:05.080 --> 00:07:06.140
It actually is downloading.

00:07:06.320 --> 00:07:08.780
You can see that it's downloading another version of it.

00:07:08.800 --> 00:07:10.680
It's 1.2 gig.

00:07:12.120 --> 00:07:15.880
I'm going to just delete it because we already have it downloaded.

00:07:16.380 --> 00:07:18.080
I've downloaded it for all of you all

00:07:18.080 --> 00:07:20.220
just in case there was a network connection issue.

00:07:21.260 --> 00:07:24.780
And honestly, having seven or eight people download it all at once,

00:07:24.780 --> 00:07:27.100
it might slow down a little bit.

00:07:29.280 --> 00:07:34.280
So again, if you want to follow up, you can follow along with me.

00:07:34.320 --> 00:07:36.300
You can pull up the website.

00:07:37.640 --> 00:07:41.160
You can look at the documentation, those types of things.

00:07:42.140 --> 00:07:47.080
So the main parts of the documentation that we are going to go over

00:07:47.080 --> 00:07:50.660
is part of the admin guide and the user guide.

00:07:50.660 --> 00:07:55.840
But like I said, this is extremely well documented.

00:07:56.400 --> 00:07:58.400
For an open source application,

00:07:58.680 --> 00:08:02.460
it's actually one of the best documented products out there.

00:08:03.060 --> 00:08:05.620
Not only do you have documentation about NiFi

00:08:05.620 --> 00:08:10.420
and what you need to do as an administrator or user,

00:08:10.880 --> 00:08:15.440
but you also have the documentation about every single processor

00:08:15.440 --> 00:08:17.300
that they support.

00:08:17.300 --> 00:08:24.180
Now, I say they support because the NiFi product itself,

00:08:24.580 --> 00:08:28.020
you're going to see it has 300-plus processors,

00:08:28.540 --> 00:08:32.200
but there's other processors out there as well.

00:08:32.440 --> 00:08:35.100
They just may not have the documentation

00:08:36.300 --> 00:08:40.620
that you'll see with the Apache-level product.

00:08:43.000 --> 00:08:45.200
So like I said, we're going to go over some of the things

00:08:45.200 --> 00:08:47.000
in the sysadmin guide.

00:08:47.360 --> 00:08:50.320
We're also going to go through some of the NiFi user guide.

00:08:51.540 --> 00:08:55.420
Those are two links that are included in the presentation

00:08:55.420 --> 00:08:58.160
I like to send out after the class,

00:08:58.640 --> 00:09:00.660
just so you have that for reference.

00:09:01.380 --> 00:09:03.740
We've already kind of talked about what NiFi is

00:09:03.740 --> 00:09:05.500
and some of these things.

00:09:05.760 --> 00:09:11.960
Here are some additional requirements and use cases

00:09:11.960 --> 00:09:13.460
and those types of things.

00:09:13.460 --> 00:09:17.980
But yeah, so we've talked about a flow file,

00:09:18.100 --> 00:09:22.220
a processor connection, those types of capabilities.

00:09:22.900 --> 00:09:25.120
Again, we'll walk through this.

00:09:25.220 --> 00:09:27.960
I find hands-on learning works best.

00:09:28.780 --> 00:09:31.120
But if you have any questions,

00:09:32.060 --> 00:09:34.720
you can refer back to this documentation.

00:09:35.320 --> 00:09:39.440
Another great thing about NiFi is all of this documentation

00:09:39.440 --> 00:09:42.780
that you're seeing right here is included

00:09:42.780 --> 00:09:45.300
with the NiFi product itself.

00:09:45.700 --> 00:09:50.380
So if you have a question on what the GetFile

00:09:50.380 --> 00:09:52.220
processor does, for instance,

00:09:52.620 --> 00:09:54.780
you can go into your NiFi instance,

00:09:54.860 --> 00:09:56.320
even if you do not have internet,

00:09:56.780 --> 00:09:58.200
and pull the documentation.

00:09:59.420 --> 00:10:01.820
For this one, though, I like to just work off

00:10:01.820 --> 00:10:05.400
of the official documentation.

00:10:05.400 --> 00:10:08.360
But yeah, I can go right here,

00:10:08.420 --> 00:10:13.440
and as soon as the internet wants to respond,

00:10:13.880 --> 00:10:16.400
I will have the documentation on GetFile.

00:10:21.200 --> 00:10:22.180
There we go.

00:10:23.060 --> 00:10:25.400
So for instance, the GetFile processor,

00:10:26.300 --> 00:10:29.260
it creates flow files from files in the directory.

00:10:30.520 --> 00:10:32.740
It will ignore files it doesn't have

00:10:32.740 --> 00:10:34.800
at least read permission to.
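
NOTE
The GetFile behavior described here (pick up files from a directory, skip anything unreadable) can be pictured with a small Python analogue.
This is only a sketch of the idea, not NiFi's implementation; the function name readable_files is made up for illustration.
```python
import os
from pathlib import Path
def readable_files(directory):
    """Return sorted names of regular files in `directory` that this process can read."""
    names = []
    for entry in Path(directory).iterdir():
        # Skip subdirectories and anything we lack read permission for.
        if entry.is_file() and os.access(entry, os.R_OK):
            names.append(entry.name)
    return sorted(names)
```
In NiFi each picked-up file would become a flow file; here the sketch just returns the file names.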

00:10:35.860 --> 00:10:39.420
And then each processor has properties.

00:10:39.820 --> 00:10:43.580
Some are required, and some are optional.

00:10:43.980 --> 00:10:48.460
And then we also have some that we can add.

00:10:49.500 --> 00:10:51.940
There are relationships for the processor.

00:10:52.960 --> 00:10:55.940
There's other attributes, things like that

00:10:55.940 --> 00:10:57.720
that we can take a look at.

00:10:58.400 --> 00:11:02.200
But for this part of it,

00:11:02.200 --> 00:11:05.080
just remember that the documentation is there.

00:11:05.740 --> 00:11:08.740
So everyone has NiFi downloaded.

00:11:09.980 --> 00:11:12.620
And I'm going to kind of walk you through.

00:11:13.480 --> 00:11:17.160
I was debating on whether to include this

00:11:17.160 --> 00:11:19.680
as part of the class, but I felt like

00:11:19.680 --> 00:11:22.420
I think we can all accomplish this pretty easily.

00:11:22.880 --> 00:11:24.380
So actually what we're all going to do

00:11:24.380 --> 00:11:26.540
is kind of install our NiFi,

00:11:26.740 --> 00:11:28.460
walk through what some of that means.

00:11:29.080 --> 00:11:30.640
If you don't understand it,

00:11:30.640 --> 00:11:33.100
or you have any questions or some additional details

00:11:33.100 --> 00:11:36.700
you might need, again, feel free to interrupt me.

00:11:36.940 --> 00:11:39.160
This time between now and lunch,

00:11:39.480 --> 00:11:42.460
I've set aside just to get this up and running

00:11:43.060 --> 00:11:45.880
and kind of go over what some of it means.

00:11:46.620 --> 00:11:48.200
And then when we come back from lunch,

00:11:48.280 --> 00:11:50.260
we'll actually start building a data flow.

00:11:50.760 --> 00:11:53.180
My goal by the end of today,

00:11:53.560 --> 00:11:57.540
you will be able to download NiFi yourself

00:11:57.540 --> 00:12:00.340
onto your own device if you need to,

00:12:00.360 --> 00:12:03.500
install that, get it up and running,

00:12:04.020 --> 00:12:05.760
and then build your own data flow.

00:12:06.040 --> 00:12:08.540
So by the end of the day, you're going to go from

00:12:09.420 --> 00:12:11.100
potentially never touching NiFi

00:12:11.100 --> 00:12:13.580
to having your own data flow running.

00:12:13.760 --> 00:12:15.380
So let's make that a goal,

00:12:15.480 --> 00:12:17.840
and I think that's a goal we can accomplish.

00:12:18.900 --> 00:12:20.400
All right, so that being said,

00:12:20.540 --> 00:12:22.360
everybody should be in their desktop.

00:12:22.880 --> 00:12:25.340
If you can, bring up your folder,

00:12:25.340 --> 00:12:27.280
your file explorer.

00:12:29.660 --> 00:12:31.940
It might take a second like mine to load.

00:12:32.940 --> 00:12:36.280
And everything that I'm going over

00:12:36.280 --> 00:12:38.760
is either in the downloads folder

00:12:38.760 --> 00:12:41.260
or the uploads folder on your desktop.

00:12:42.240 --> 00:12:43.760
So for this case, we're going to go

00:12:43.760 --> 00:12:44.980
to the downloads folder.

00:12:48.020 --> 00:12:48.540
There we go.

00:12:48.940 --> 00:12:51.680
And you're going to see a bunch of files

00:12:51.680 --> 00:12:52.780
that I've downloaded.

00:12:53.600 --> 00:12:56.600
Again, I've downloaded NiFi twice,

00:12:57.640 --> 00:12:58.780
so I'll delete that.

00:13:00.200 --> 00:13:04.440
There is, this is an executable I use

00:13:04.440 --> 00:13:05.780
to install Notepad++

00:13:06.300 --> 00:13:08.640
because we're going to need to be able to edit files.

00:13:12.260 --> 00:13:14.920
But it's really easy to download.

00:13:14.980 --> 00:13:17.800
Once it's downloaded, you'll have a zip file.

00:13:18.360 --> 00:13:19.480
So for this instance, we have

00:13:19.480 --> 00:13:22.720
nifi-1.26.0-bin.

00:13:22.760 --> 00:13:25.420
That tells me that this is not the source code,

00:13:25.500 --> 00:13:28.380
but it's actually a binary that's ready to go.

00:13:29.260 --> 00:13:30.600
So for this exercise,

00:13:31.100 --> 00:13:35.220
if you can click on the nifi-1.26.0-bin,

00:13:36.100 --> 00:13:37.400
just a single click,

00:13:37.520 --> 00:13:40.280
and then right click to extract all.

00:13:40.980 --> 00:13:42.860
So again, this is just a zip file.

00:13:43.240 --> 00:13:44.260
We're extracting it.

00:13:44.520 --> 00:13:47.300
I'm going to leave it in the downloads folder

00:13:47.300 --> 00:13:48.200
where it's at.

00:13:48.200 --> 00:13:51.680
You can actually move locations if you want to.

00:13:52.200 --> 00:13:53.660
It's totally your call.

00:13:54.440 --> 00:13:57.360
And I'll leave "Show extracted files when complete" checked

00:13:57.360 --> 00:14:00.740
just because I want to kind of go over some of that stuff.

00:14:01.680 --> 00:14:03.760
So I'll just click Extract.
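
NOTE
The same extract step can be scripted instead of using the right-click menu.
A minimal Python sketch, assuming the download is an ordinary zip archive; extract_archive is a hypothetical helper name, not something that ships with NiFi.
```python
import zipfile
from pathlib import Path
def extract_archive(zip_path, dest_dir):
    """Extract `zip_path` into `dest_dir` and return the sorted top-level entry names."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        # The first path component of each member is a top-level folder or file.
        top_level = {name.split("/")[0] for name in archive.namelist() if name}
    return sorted(top_level)
```
For example, extract_archive("nifi-1.26.0-bin.zip", "Downloads") would be the scripted equivalent, where the archive name is assumed from the file mentioned in class.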

00:14:04.500 --> 00:14:05.760
It takes a minute.

00:14:06.360 --> 00:14:08.300
Again, this is a virtual machine.

00:14:09.640 --> 00:14:11.600
All of us are running on it.

00:14:12.160 --> 00:14:14.660
I've given the machine eight gigs of RAM,

00:14:14.660 --> 00:14:16.680
eight virtual cores,

00:14:18.220 --> 00:14:21.560
and I think it was like 300 or 400 gig of space,

00:14:21.580 --> 00:14:24.080
but we won't even use that much.

00:14:26.720 --> 00:14:28.360
While that is extracting.

00:14:29.180 --> 00:14:34.560
Oh, it looks like everyone is good to go.

00:14:36.560 --> 00:14:37.820
Leroy, did you get yours extracted?

00:14:38.540 --> 00:14:39.360
Let's see.

00:14:45.540 --> 00:14:47.660
Oh, there you go.

00:14:51.200 --> 00:14:52.140
Give it another second.

00:14:52.640 --> 00:14:57.100
Like I said, it takes a minute to extract all of that zip file.

00:14:57.220 --> 00:15:01.060
That zip file is about a 1.25 gig file,

00:15:01.800 --> 00:15:05.780
and when it extracts, it's going to be even bigger.

00:15:06.420 --> 00:15:11.020
That's probably been the biggest complaint that I know of

00:15:11.020 --> 00:15:14.100
that the community, not the NiFi community,

00:15:14.380 --> 00:15:18.140
but the user community in general complains about.

00:15:18.440 --> 00:15:21.260
It's just the massive size to download this.

00:15:23.020 --> 00:15:25.760
But if any of you have played Xbox,

00:15:26.000 --> 00:15:28.420
you know some of these games can be 50, 60 gig now.

00:15:29.420 --> 00:15:30.380
So yeah.

00:15:50.500 --> 00:15:51.200
All right.

00:15:51.200 --> 00:15:54.200
It looks like most everyone got that extracted.

00:15:54.980 --> 00:15:58.000
Give me just another second for Alderius to finish up

00:15:58.660 --> 00:16:00.820
and Peter's to finish up.

00:16:00.860 --> 00:16:01.620
All right.

00:16:01.660 --> 00:16:03.200
Let's go back to my Peter here.

00:16:03.300 --> 00:16:04.160
So it looks like it's finished.

00:16:04.340 --> 00:16:04.800
Perfect.

00:16:05.640 --> 00:16:09.700
So it should open a new folder in Windows

00:16:09.700 --> 00:16:17.400
and the only folder in that folder is nifi-1.26.0.

00:16:17.980 --> 00:16:20.600
So if you can double click and go into that,

00:16:21.320 --> 00:16:25.100
and then you should see a bin folder, a conf folder,

00:16:25.380 --> 00:16:26.860
docs, extensions, lib.

00:16:27.600 --> 00:16:32.040
These are not all of the folders that NiFi creates.

00:16:32.160 --> 00:16:35.220
This is just the initial downloaded install.

00:16:35.220 --> 00:16:39.440
When we get up and running and started,

00:16:40.100 --> 00:16:42.820
it's going to create some additional folders

00:16:42.820 --> 00:16:48.380
called our content repository, provenance repository,

00:16:49.020 --> 00:16:50.220
and a couple of other repositories.
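
NOTE
To see the difference between what ships in the zip and what appears after the first start, the folder check can be sketched in Python.
The shipped names come from the class; the repository folder names are my assumption about NiFi 1.x defaults, so verify them on your own install. describe_install is a hypothetical helper.
```python
from pathlib import Path
# Folders named in the class as present right after extraction.
SHIPPED = ["bin", "conf", "docs", "extensions", "lib"]
# Assumed default names of the repositories NiFi 1.x creates on first start.
RUNTIME = ["content_repository", "flowfile_repository", "provenance_repository", "database_repository"]
def describe_install(nifi_home):
    """Report which shipped and runtime-created folders exist under `nifi_home`."""
    home = Path(nifi_home)
    return {
        "shipped": [d for d in SHIPPED if (home / d).is_dir()],
        "runtime": [d for d in RUNTIME if (home / d).is_dir()],
    }
```
Before the first start the "runtime" list should come back empty; after NiFi has run, the repository folders should show up there.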

00:16:51.700 --> 00:16:56.220
So I was talking about earlier how it keeps track

00:16:56.220 --> 00:17:00.100
of all the changes of data, that data provenance,

00:17:00.180 --> 00:17:01.220
that lineage, that pedigree.

00:17:02.300 --> 00:17:04.300
So it keeps track of that.

00:17:04.300 --> 00:17:07.500
And where it keeps track and how it stores it,

00:17:07.640 --> 00:17:09.700
the system itself keeps track

00:17:09.700 --> 00:17:12.700
and stores all of that information locally.

00:17:13.280 --> 00:17:19.600
So keep that in mind for those that are only

00:17:19.600 --> 00:17:21.140
on the infrastructure side of the house.

00:17:21.540 --> 00:17:24.100
And for those that will install this,

00:17:24.520 --> 00:17:31.140
some of the sys admins, you all would need to know this as well.

00:17:31.140 --> 00:17:35.040
But in general, these are the files and folders

00:17:35.040 --> 00:17:37.980
that NiFi will create when you extract it.

00:17:38.940 --> 00:17:40.960
As soon as we start the application,

00:17:41.480 --> 00:17:43.900
it's going to create some additional files,

00:17:44.140 --> 00:17:46.280
and that all lives locally.

00:17:47.760 --> 00:17:52.080
Now, depending on your strategy of deploying this

00:17:52.080 --> 00:17:55.040
and scaling this and some of the other things,

00:17:55.960 --> 00:17:59.980
you may want to have some of these content repositories,

00:17:59.980 --> 00:18:03.020
some of these other repositories on different network

00:18:03.020 --> 00:18:04.400
and SAN-attached storage.

00:18:04.940 --> 00:18:12.980
I know that for the content and flow and provenance,

00:18:13.700 --> 00:18:16.760
those are usually stored on high speed drives

00:18:16.760 --> 00:18:19.940
just because there's a lot of reading and writing back and forth.

00:18:20.340 --> 00:18:23.460
And then you'll have some of the other repositories

00:18:23.460 --> 00:18:27.280
that really don't get used as much.

00:18:27.280 --> 00:18:28.560
They're still needed.

00:18:28.560 --> 00:18:32.140
So they may break this up a little bit

00:18:32.140 --> 00:18:35.720
and put some of these folders on some high speed drives,

00:18:35.720 --> 00:18:38.280
some of the other folders on some normal drives

00:18:38.280 --> 00:18:40.920
for cost savings and performance gains.

00:18:42.020 --> 00:18:45.340
But that all depends on your deployment strategy.
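
NOTE
Splitting repositories across drives comes down to a few lines in nifi.properties.
The Python sketch below generates those lines; the property names are as I understand them from a stock NiFi 1.x nifi.properties and should be checked against your own file, and the paths in the usage line are purely hypothetical.
```python
# Property names assumed from a stock NiFi 1.x nifi.properties; verify against your file.
REPOSITORY_PROPERTIES = {
    "flowfile": "nifi.flowfile.repository.directory",
    "content": "nifi.content.repository.directory.default",
    "provenance": "nifi.provenance.repository.directory.default",
    "database": "nifi.database.directory",
}
def repository_lines(locations):
    """Turn {"content": "/some/path", ...} into nifi.properties lines."""
    lines = []
    for kind, path in locations.items():
        if kind not in REPOSITORY_PROPERTIES:
            raise KeyError(f"unknown repository kind: {kind}")
        lines.append(f"{REPOSITORY_PROPERTIES[kind]}={path}")
    return lines
```
For example, repository_lines({"content": "/fast/content", "database": "/bulk/database"}) would put the busy content repository on a fast drive and the less-used database repository on a normal one.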

00:18:47.380 --> 00:18:48.900
Whenever we have time,

00:18:49.220 --> 00:18:51.340
and we're going to have plenty of time for this,

00:18:51.360 --> 00:18:54.560
but if you want to go very technical into details,

00:18:54.740 --> 00:19:00.000
I'll be happy to give you my opinion on that.

00:19:00.680 --> 00:19:02.000
I can get very technical.

00:19:02.380 --> 00:19:03.720
I still write software.

00:19:03.820 --> 00:19:06.740
I still write software for NiFi even.

00:19:07.940 --> 00:19:12.040
But I kind of like the training part as well for this.

00:19:12.660 --> 00:19:15.460
So anyways, so when we extract it,

00:19:15.480 --> 00:19:17.880
we've got the bin folder, the conf folder,

00:19:18.520 --> 00:19:20.080
docs, extensions, and lib.

00:19:20.520 --> 00:19:21.780
Docs is docs.

00:19:22.680 --> 00:19:24.000
And as I mentioned earlier,

00:19:24.000 --> 00:19:27.660
everything that you can find on the website,

00:19:28.120 --> 00:19:31.220
you're going to get in the docs folder as well.

00:19:31.460 --> 00:19:33.880
And NiFi utilizes that docs folder

00:19:33.880 --> 00:19:35.620
to provide you information.

00:19:37.380 --> 00:19:41.900
The bin folder, that's your binary, right?

00:19:41.900 --> 00:19:46.640
This is where you would execute the start of NiFi

00:19:46.640 --> 00:19:48.120
and those types of things.

00:19:48.320 --> 00:19:51.420
We'll go into more of that once we start.

00:19:51.420 --> 00:19:55.060
But the bin folder for NiFi,

00:19:55.060 --> 00:19:58.640
it contains both Windows batch files,

00:19:58.660 --> 00:20:03.300
as well as Linux shell scripts.

00:20:03.720 --> 00:20:05.320
So if you're running this on Linux,

00:20:05.700 --> 00:20:07.500
you have a way to start NiFi.

00:20:07.660 --> 00:20:09.060
If you're running this on Windows,

00:20:09.280 --> 00:20:11.200
you have a way to start NiFi.
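
NOTE
Choosing the right start script per operating system can be sketched in Python.
The script names run-nifi.bat and nifi.sh are what I recall from the 1.x bin folder; treat them as assumptions and check your own bin directory. start_command is a hypothetical helper that only builds the command, it does not run it.
```python
import platform
from pathlib import Path
def start_command(nifi_home, system=None):
    """Return the argument list that would start NiFi on the given OS."""
    system = system or platform.system()
    bin_dir = Path(nifi_home) / "bin"
    if system == "Windows":
        # Batch file shipped for Windows (name assumed from the 1.x distribution).
        return [str(bin_dir / "run-nifi.bat")]
    # Shell script for Linux and other Unix-like systems.
    return [str(bin_dir / "nifi.sh"), "start"]
```
The returned list could be handed to subprocess.run once you have confirmed the script exists.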

00:20:12.300 --> 00:20:15.420
So that's how you would start NiFi

00:20:16.060 --> 00:20:18.620
as well as some of those binaries

00:20:18.620 --> 00:20:22.700
for when you need to change a username or password

00:20:22.700 --> 00:20:24.700
or something like that,

00:20:25.060 --> 00:20:26.500
you can utilize those.

00:20:27.880 --> 00:20:31.520
The conf directory, which we are going to go into,

00:20:32.620 --> 00:20:36.260
is where all the configuration for NiFi exists.

00:20:37.960 --> 00:20:41.560
So all your properties and where does NiFi,

00:20:41.720 --> 00:20:43.880
what IP address is NiFi running on,

00:20:43.980 --> 00:20:46.300
what port number is NiFi running on,

00:20:46.300 --> 00:20:48.280
those types of things.

00:20:49.220 --> 00:20:51.380
So there is a lot of configuration.

00:20:52.360 --> 00:20:54.120
A lot of this is the security.

00:20:54.980 --> 00:20:58.040
So plugging in that security infrastructure

00:20:58.040 --> 00:21:00.340
and those types of things,

00:21:01.280 --> 00:21:03.640
you would do the configuration here.

00:21:04.560 --> 00:21:06.260
So what I am going to do though,

00:21:06.440 --> 00:21:09.900
and this is totally up to you if you want to,

00:21:09.920 --> 00:21:12.300
but if you go to nifi.properties,

00:21:13.640 --> 00:21:15.740
you should see nifi.properties.

00:21:15.740 --> 00:21:18.960
I am going to open that

00:21:18.960 --> 00:21:23.460
and go over some of the key points of the properties

00:21:24.320 --> 00:21:28.300
just so for those that are sysadmins and others,

00:21:28.500 --> 00:21:30.000
you'll have this information.

00:21:30.780 --> 00:21:35.280
I know for some of you who may not be that technical,

00:21:35.460 --> 00:21:37.040
this may be a little overwhelming.

00:21:38.080 --> 00:21:40.260
Again, this is just for information.

00:21:40.380 --> 00:21:41.740
You're more than welcome to follow along,

00:21:42.000 --> 00:21:44.440
but there are some key points

00:21:45.000 --> 00:21:48.280
that I feel like everyone needs to see

00:21:48.280 --> 00:21:49.760
as part of the properties file.

00:21:51.040 --> 00:21:54.440
So anyway, this is your core properties section.

00:21:55.200 --> 00:21:57.240
Again, a lot of this is documented.

00:21:57.800 --> 00:22:02.360
A lot of this relates back to the website even.

00:22:02.960 --> 00:22:06.520
So what is the main flow configuration file

00:22:06.520 --> 00:22:07.640
and where is that located?

00:22:08.040 --> 00:22:10.440
Of course, it's going to be your conf directory.

00:22:11.280 --> 00:22:13.180
Where is the JSON file?

00:22:13.300 --> 00:22:14.440
It's also there.

00:22:15.800 --> 00:22:19.540
You have Archive enabled, those types of things.
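
NOTE
The core properties being read aloud here are plain key=value lines, so they are easy to inspect with a script.
A simplified Python sketch follows; it ignores the finer points of the Java properties format (escapes, line continuations), and the sample values are my assumption of typical 1.x defaults rather than a copy of the class file.
```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props
# Sample lines modeled on the core properties section; values are assumed defaults.
SAMPLE = """
# Core Properties #
nifi.flow.configuration.file=./conf/flow.xml.gz
nifi.flow.configuration.json.file=./conf/flow.json.gz
nifi.flow.configuration.archive.enabled=true
nifi.authorizer.configuration.file=./conf/authorizers.xml
"""
core = parse_properties(SAMPLE)
print(core["nifi.flow.configuration.json.file"])
```
Pointing parse_properties at the text of your real conf/nifi.properties gives a dictionary you can search instead of scrolling through the file.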

00:22:20.500 --> 00:22:22.920
So that's some of your core properties.

00:22:23.880 --> 00:22:27.440
One of the other ones is your authorizers configuration file.

00:22:27.620 --> 00:22:31.020
This is where, as it was mentioned earlier,

00:22:32.060 --> 00:22:33.980
how you're trying to work on,

00:22:34.340 --> 00:22:36.260
the other organization is trying to work on

00:22:36.260 --> 00:22:39.180
getting NiFi installed, up and running,

00:22:39.180 --> 00:22:42.180
get the multi-tenancy, the multi-users.

00:22:43.080 --> 00:22:45.460
I think Brett is working on some of that.

00:22:46.200 --> 00:22:48.820
And so Brett would go in here.

00:22:48.960 --> 00:22:51.840
He would configure these properties.

00:22:51.980 --> 00:22:55.580
He would take a look at the authorizers.xml file

00:22:55.580 --> 00:23:00.080
and start building in some of his configurations

00:23:00.080 --> 00:23:04.600
he would need for security and user permissions

00:23:04.600 --> 00:23:07.500
and identity management and all that fun stuff.

00:23:07.500 --> 00:23:09.820
But that's where you would find that.

00:23:10.260 --> 00:23:15.380
But there's one key property that you can just tell

00:23:15.380 --> 00:23:16.560
that's come from the government.

00:23:17.160 --> 00:23:20.700
And that is the nifi.ui.banner.text.

00:23:21.280 --> 00:23:24.600
Now, this property lives there for,

00:23:24.780 --> 00:23:28.360
now it's for a couple of different reasons.

00:23:28.760 --> 00:23:31.960
But this banner, as you can imagine,

00:23:32.220 --> 00:23:33.880
you could put unclassified.

00:23:34.120 --> 00:23:35.860
You could put secret.

00:23:35.860 --> 00:23:37.180
You could put top secret.

00:23:37.340 --> 00:23:40.720
You could put CUI.

00:23:41.060 --> 00:23:44.660
You could do whatever classification header

00:23:44.660 --> 00:23:46.140
you would need.

00:23:46.200 --> 00:23:51.780
So what that does is it provides the government

00:23:51.780 --> 00:23:55.560
an easy way to put the classification

00:23:55.560 --> 00:23:58.900
of the system on a banner.

00:23:59.060 --> 00:24:01.540
So when you pull up this NiFi instance,

00:24:01.540 --> 00:24:06.620
you immediately see the classification of the system.

00:24:08.220 --> 00:24:11.660
Also, because that is such a government property,

00:24:12.520 --> 00:24:15.040
the way that commercial companies use it

00:24:15.040 --> 00:24:17.120
and others in the government as well

00:24:17.120 --> 00:24:20.920
is this may be our dev instance of NiFi.

00:24:21.040 --> 00:24:22.800
This may be our test instance.

00:24:22.960 --> 00:24:24.040
It may be prod.

00:24:24.620 --> 00:24:27.140
And so I know a lot of companies

00:24:27.140 --> 00:24:30.540
that use this banner as a description.

00:24:30.700 --> 00:24:33.900
So you can quickly go to the UI,

00:24:34.440 --> 00:24:35.800
and you will immediately see,

00:24:35.800 --> 00:24:37.920
I am working on the test system

00:24:37.920 --> 00:24:39.940
or I'm working on the dev system.

00:24:40.320 --> 00:24:44.740
And so for me, I am actually going to put something in

00:24:44.740 --> 00:24:49.800
and I'll say, this is a test system.
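
NOTE
The banner edit made by hand in Notepad++ can also be done with a script, which helps when dev, test, and prod each need their own label.
A minimal Python sketch that rewrites one property in the file's text; set_property is a hypothetical helper, and it does not touch commented-out lines.
```python
def set_property(text, key, value):
    """Return `text` with `key` set to `value`, appending the line if the key is absent."""
    out, found = [], False
    for line in text.splitlines():
        if line.split("=", 1)[0].strip() == key:
            out.append(f"{key}={value}")
            found = True
        else:
            out.append(line)
    if not found:
        out.append(f"{key}={value}")
    return "\n".join(out) + "\n"
```
Calling set_property(original_text, "nifi.ui.banner.text", "This is a test system") gives the new file contents; NiFi reads the file at startup, so the change shows after the next start.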

00:24:50.420 --> 00:24:52.940
You know, I can put in whatever.

00:24:55.560 --> 00:24:56.080
Okay.

00:24:57.480 --> 00:25:01.620
And, you know, again, you don't necessarily need to do this.

00:25:01.760 --> 00:25:02.940
If you're following along,

00:25:03.880 --> 00:25:06.160
feel free to put in whatever you would like.

00:25:06.740 --> 00:25:08.940
This is your own personal NiFi instance.

00:25:11.040 --> 00:25:11.640
And you go from there.

00:25:11.880 --> 00:25:12.940
Or you can just leave it blank.

00:25:13.960 --> 00:25:15.860
So some of the other properties

00:25:15.860 --> 00:25:19.080
that you would need to potentially look at

00:25:19.080 --> 00:25:20.860
if you're like a sys admin

00:25:20.860 --> 00:25:23.320
and stuff like that is, you know,

00:25:24.520 --> 00:25:25.980
where the NAR library,

00:25:26.440 --> 00:25:31.560
you know, all of the processors are,

00:25:32.100 --> 00:25:35.920
if you're familiar with, you know, software engineering,

00:25:36.260 --> 00:25:37.720
if you're not, it's okay.

00:25:38.600 --> 00:25:40.300
But, you know, in Java,

00:25:40.300 --> 00:25:44.000
we usually create a Java jar, a jar file.

00:25:44.280 --> 00:25:47.360
And we will then run java -jar,

00:25:47.820 --> 00:25:49.540
you know, the name of the jar file,

00:25:49.540 --> 00:25:53.200
give it memory, you know, configuration and stuff like that,

00:25:53.260 --> 00:25:55.400
and execute and run that application.

00:25:56.180 --> 00:25:59.540
In NiFi, they're called NARs, N-A-Rs.

00:26:00.500 --> 00:26:02.820
So, you know, it didn't take a lot of imagination

00:26:02.820 --> 00:26:04.560
to see where we stole that from.

00:26:04.960 --> 00:26:08.660
But NARs are basically Java jars

00:26:09.760 --> 00:26:12.740
built specifically for NiFi.

00:26:13.540 --> 00:26:15.220
So, you know, properties like this,

00:26:15.320 --> 00:26:18.260
you know, where is that NAR library?

00:26:18.260 --> 00:26:22.020
You know, the autoload library.

00:26:22.740 --> 00:26:25.680
One of the things that we are going to do during this class

00:26:25.680 --> 00:26:29.140
is we are going to import a new processor

00:26:29.720 --> 00:26:32.380
and have it up and running and usable

00:26:32.380 --> 00:26:35.500
without ever stopping our data flow,

00:26:35.740 --> 00:26:38.520
without ever restarting the system, right?

00:26:38.540 --> 00:26:41.280
The data is still flowing, it's still working,

00:26:41.560 --> 00:26:42.860
and I'm going to go in,

00:26:43.000 --> 00:26:45.540
I'm going to deploy a new connection type,

00:26:45.540 --> 00:26:50.120
a new processor and build a flow for that

00:26:50.120 --> 00:26:52.500
and have that flow up and running as well.

00:26:53.160 --> 00:26:55.960
So, I will, you know, I'll show you how we do that

00:26:55.960 --> 00:26:57.080
and how that works.

00:26:57.420 --> 00:26:59.160
But just so you know, you know,

00:26:59.220 --> 00:27:01.420
the lib directory is the library directory

00:27:01.420 --> 00:27:05.820
that's where all the core NiFi files reside.

00:27:06.200 --> 00:27:12.920
And you can see that, you know, in...

00:27:12.920 --> 00:27:15.920
We have extensions that we talked about.

00:27:16.200 --> 00:27:17.380
It's empty right now.

00:27:18.320 --> 00:27:19.680
And then we have the lib directory,

00:27:19.900 --> 00:27:21.680
which should be pretty full,

00:27:21.740 --> 00:27:23.200
and there's the NARs.

00:27:23.620 --> 00:27:25.900
So, you know, even in the file name,

00:27:25.920 --> 00:27:29.360
you can see NiFi Avro NAR,

00:27:29.640 --> 00:27:32.900
NiFi AWS service, Azure NARs,

00:27:32.940 --> 00:27:35.060
NiFi Dropbox extensions,

00:27:36.440 --> 00:27:41.320
NiFi GeoHash, gRPC, HL7,

00:27:41.320 --> 00:27:42.760
which is the medical format.

00:27:43.240 --> 00:27:46.440
You know, all of these are processors

00:27:47.320 --> 00:27:48.700
that come out of the box.

00:27:48.920 --> 00:27:50.840
They all live in the lib directory.

00:27:51.740 --> 00:27:55.180
And these will be immediately available

00:27:55.180 --> 00:27:57.240
as soon as we start NiFi.

00:27:58.180 --> 00:27:59.900
And then, you know, like I said,

00:28:00.100 --> 00:28:01.620
there's an extensions directory.

00:28:01.980 --> 00:28:02.460
It's empty.

00:28:03.220 --> 00:28:08.120
This is where if we had a special processor

00:28:08.120 --> 00:28:12.660
and you had a CI/CD process set up

00:28:12.660 --> 00:28:16.860
where, you know, a developer could create a processor,

00:28:17.140 --> 00:28:18.980
it checks it in, it builds it,

00:28:19.000 --> 00:28:20.280
it tests it for vulnerabilities,

00:28:20.540 --> 00:28:23.340
you know, it goes through that whole CI/CD

00:28:23.340 --> 00:28:26.520
and DevSecOps, you know,

00:28:26.720 --> 00:28:28.960
policies and things like that that you have set up.

00:28:29.140 --> 00:28:32.560
You know, ultimately, it will spit out a NAR file,

00:28:32.840 --> 00:28:34.800
and that NAR file, you know,

00:28:34.900 --> 00:28:36.120
could be automatically installed

00:28:36.340 --> 00:28:38.600
into the extensions directory,

00:28:38.940 --> 00:28:42.700
and you would have immediate access to that processor.

00:28:43.200 --> 00:28:45.540
Not only would you have immediate access,

00:28:45.820 --> 00:28:48.500
but if the permissions and the policy was there,

00:28:49.020 --> 00:28:52.200
everyone would have access to that same processor.

00:28:53.080 --> 00:28:55.080
So, you know, as part of, you know,

00:28:55.140 --> 00:28:58.340
some of the reusability points I was making earlier

00:28:58.920 --> 00:29:02.740
where you're able to reuse these components.

00:29:02.740 --> 00:29:07.060
So if I build a connector to say, you know,

00:29:07.140 --> 00:29:09.700
let's go, it's already built,

00:29:10.460 --> 00:29:12.620
but let's talk SQL Server.

00:29:13.120 --> 00:29:15.800
If I build a connector for SQL Server

00:29:16.680 --> 00:29:18.300
and test it out,

00:29:18.460 --> 00:29:21.420
it's went through the processes that, you know,

00:29:21.500 --> 00:29:23.720
you may have set up and things like that.

00:29:23.960 --> 00:29:26.720
It gets deployed to that extensions directory.

00:29:27.200 --> 00:29:29.760
Well, now everyone can use that connector.

00:29:30.020 --> 00:29:32.120
So as a different organization,

00:29:32.120 --> 00:29:35.200
I don't have to go and build a new connector.

00:29:35.580 --> 00:29:39.400
I can just reuse one that was already built,

00:29:39.680 --> 00:29:41.920
but I may be connecting to a different instance.

00:29:42.100 --> 00:29:44.280
I may be connecting to, you know,

00:29:44.480 --> 00:29:46.100
different usernames, passwords,

00:29:47.320 --> 00:29:49.160
authentication methods in SQL Server.

00:29:49.920 --> 00:29:51.880
You know, it may be the same SQL Server

00:29:51.880 --> 00:29:53.580
that's just pulling from a different database

00:29:53.580 --> 00:29:55.600
or a different table, you know,

00:29:55.660 --> 00:29:56.500
those types of things.

00:29:56.940 --> 00:29:59.040
So, you know, that extensions directory,

00:29:59.320 --> 00:30:01.360
you know, is pretty important here,

00:30:01.360 --> 00:30:05.400
and that is how we hot-load processors.

00:30:05.900 --> 00:30:09.380
So that means we do not need to stop data from flowing.

00:30:09.800 --> 00:30:12.080
We do not need to turn data flows off.

00:30:12.200 --> 00:30:14.460
We don't need to restart the application.

00:30:15.260 --> 00:30:16.220
It can run.

00:30:16.480 --> 00:30:18.860
Data can continuously flow through the system,

00:30:18.860 --> 00:30:21.500
and now I have a newer capability

00:30:21.500 --> 00:30:23.980
so I can connect to new data sources.
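
The hot-load step described above can be sketched in Python. This is a minimal illustration, not NiFi's own tooling: the function name `deploy_nar` and the `.part` temporary suffix are hypothetical, and it assumes the loader only picks up files ending in `.nar`, so a half-copied file is never seen.

```python
import os
import shutil
import tempfile
from pathlib import Path


def deploy_nar(nar_path: Path, extensions_dir: Path) -> Path:
    """Copy a NAR into the extensions directory without exposing a partial file."""
    extensions_dir.mkdir(parents=True, exist_ok=True)
    target = extensions_dir / nar_path.name
    # Copy under a temporary name in the same directory first.
    fd, tmp_name = tempfile.mkstemp(dir=extensions_dir, suffix=".part")
    os.close(fd)
    shutil.copyfile(nar_path, tmp_name)
    # Rename into place; on the same filesystem this is a single atomic step.
    os.replace(tmp_name, target)
    return target
```

A CI/CD job could call something like `deploy_nar(Path("build/my-processor.nar"), Path("extensions"))` as its last step; both paths here are placeholders.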

00:30:24.860 --> 00:30:29.620
So that's the purpose of the extensions and lib.

00:30:29.620 --> 00:30:33.140
Again, all of that is referenced, you know,

00:30:33.360 --> 00:30:35.100
in the nifi.properties file.

00:30:35.560 --> 00:30:36.540
You can change it.

00:30:36.840 --> 00:30:39.140
I've seen, you know, some folks change it

00:30:39.140 --> 00:30:40.840
to a different lib directory

00:30:40.840 --> 00:30:43.700
depending on their policies, things like that.

00:30:43.940 --> 00:30:45.620
But, you know, as a sysadmin,

00:30:46.500 --> 00:30:48.920
this is the section to do that.
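
To make that section concrete, here is a small Python sketch that reads the two directory settings out of properties-style text. The property names are the ones NiFi uses for the lib and extensions directories; the sample text and the `parse_properties` helper are illustrative only, and the values shown are assumed defaults that a sysadmin may have changed.

```python
def parse_properties(text: str) -> dict[str, str]:
    """Parse simple key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Illustrative excerpt, not a full nifi.properties file.
SAMPLE = """
# NAR locations
nifi.nar.library.directory=./lib
nifi.nar.library.autoload.directory=./extensions
"""

props = parse_properties(SAMPLE)
print(props["nifi.nar.library.directory"])           # core NARs shipped with the install
print(props["nifi.nar.library.autoload.directory"])  # directory watched for new NARs
```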

00:30:50.520 --> 00:30:52.460
And then, of course, you know, you need a state.

00:30:53.400 --> 00:30:54.580
You need to have state.

00:30:54.880 --> 00:30:57.260
If you're working in a clustered environment,

00:30:57.260 --> 00:30:59.400
you would need ZooKeeper,

00:30:59.600 --> 00:31:01.380
which is another software application

00:31:01.380 --> 00:31:03.040
that is open source.

00:31:03.840 --> 00:31:08.280
If you've been around any kind of distributed system,

00:31:09.100 --> 00:31:11.360
clustered system, you've heard of ZooKeeper.

00:31:11.520 --> 00:31:14.040
ZooKeeper is widely used, you know,

00:31:14.080 --> 00:31:17.320
across the board with government and commercial alike.

00:31:17.840 --> 00:31:19.920
So, you know, here's where you would manage

00:31:19.920 --> 00:31:22.000
some of that state management.

00:31:22.860 --> 00:31:24.420
We have a database directory,

00:31:24.460 --> 00:31:26.300
so that's our database repository.

00:31:26.300 --> 00:31:30.580
Again, we have multiple repositories here.

00:31:31.420 --> 00:31:34.700
And where you store those is configurable.

00:31:35.180 --> 00:31:38.980
So you may, depending on if it's a repository

00:31:38.980 --> 00:31:41.480
that needs a lot of reads and writes,

00:31:42.300 --> 00:31:46.500
you may store it on a different type of system.

00:31:47.360 --> 00:31:50.980
So, you know, to save costs, I know, you know,

00:31:51.080 --> 00:31:54.380
some companies really streamline and fine-tune this

00:31:54.380 --> 00:31:59.820
where some repositories will live on a very high-speed SSD

00:32:00.380 --> 00:32:03.540
or, you know, even potentially in memory,

00:32:03.540 --> 00:32:05.600
you know, mapped back to the file system.

00:32:05.960 --> 00:32:07.980
And then some of these repositories

00:32:07.980 --> 00:32:10.660
that really don't have a lot of reads and writes,

00:32:11.220 --> 00:32:13.240
you don't worry about those as much

00:32:13.240 --> 00:32:14.580
in the performance aspect.

00:32:14.700 --> 00:32:16.980
You know, they may go on to a slower drive,

00:32:16.980 --> 00:32:20.580
you know, instead of having to either choose one or the other.

00:32:21.000 --> 00:32:23.360
Because this is so highly configurable,

00:32:23.360 --> 00:32:26.480
it's there so you can do those types of things,

00:32:26.660 --> 00:32:28.660
you know, reducing your cloud costs,

00:32:28.800 --> 00:32:31.960
your server resources, you know, your on-prem resources.

00:32:32.040 --> 00:32:33.860
I know you guys do a lot of stuff on-prem,

00:32:34.300 --> 00:32:37.700
so, you know, it may help reduce some of those resources.

00:32:38.940 --> 00:32:40.680
And that's the database settings.

00:32:42.400 --> 00:32:45.060
And there's a flow file repository,

00:32:46.000 --> 00:32:48.500
the content repository we talked about.

00:32:49.220 --> 00:32:51.840
And, you know, one of the things I like to point out here

00:32:51.840 --> 00:32:58.040
is that content repository is keeping, basically, your flow file.

00:32:58.340 --> 00:33:01.260
So if you told it to ingest a CSV,

00:33:02.080 --> 00:33:06.840
that content repository is keeping a copy of that CSV,

00:33:07.640 --> 00:33:09.600
you know, for the time being.

00:33:10.440 --> 00:33:16.880
You know, that's because if NiFi were to crash and shut down

00:33:16.880 --> 00:33:19.300
and when you restarted it,

00:33:19.300 --> 00:33:24.980
that processor that was processing, you know, that flow file

00:33:24.980 --> 00:33:28.060
is going to go back to the content repository

00:33:28.060 --> 00:33:32.660
and say, give me back that file, I need to finish processing it.

00:33:34.120 --> 00:33:37.660
And before a processor that's, you know,

00:33:37.700 --> 00:33:41.720
say we've got three or four processors chained together,

00:33:42.080 --> 00:33:46.040
you know, we get file and we send it to the next step and the next step.

00:33:46.040 --> 00:33:49.220
Well, that next step, if something crashes

00:33:49.220 --> 00:33:52.540
before it gets time to go to that next step where it completes,

00:33:53.020 --> 00:33:57.060
when NiFi comes back up, it will reprocess that content,

00:33:57.340 --> 00:34:03.480
that file based upon whatever processor was working on it

00:34:03.480 --> 00:34:05.640
because of that content repository.

00:34:06.800 --> 00:34:10.380
And, you know, just so you have a little bit

00:34:10.380 --> 00:34:13.300
of the underneath-the-hood workings of this,

00:34:13.300 --> 00:34:18.840
when a flow file or a piece of data, you know, goes to a processor,

00:34:19.740 --> 00:34:23.760
it will not release that flow file and that data

00:34:23.760 --> 00:34:26.680
until the next processor has it.

00:34:26.940 --> 00:34:32.300
And so what that does is it guarantees that a copy of that data

00:34:32.300 --> 00:34:35.740
is on the next processor doing that function

00:34:35.740 --> 00:34:39.480
and that next processor got a thumbs up

00:34:39.480 --> 00:34:42.980
from the previous processor that it was complete.

00:34:43.360 --> 00:34:46.260
So that way, you know, if something crashes

00:34:46.260 --> 00:34:49.300
and things like that, you don't lose data.

00:34:49.740 --> 00:34:53.260
Now, if it is in the middle of processing data and it crashes,

00:34:53.680 --> 00:34:56.220
it's going to try to reprocess that data.

00:34:56.580 --> 00:34:59.220
So, you know, just keep that in mind where you may get

00:34:59.220 --> 00:35:01.900
some initial results from NiFi,

00:35:02.100 --> 00:35:05.560
but you need some additional, you know, processing to happen.

00:35:05.560 --> 00:35:10.040
So you may get duplication of data because, you know,

00:35:10.140 --> 00:35:12.860
it produced 25% of the output.

00:35:13.040 --> 00:35:16.260
But, you know, before it crashed, when it come back up,

00:35:16.260 --> 00:35:18.820
it's going to try to redo that.

00:35:18.980 --> 00:35:23.060
And so, you know, you may get an additional duplication of data.

00:35:23.540 --> 00:35:24.680
Now, with that being said,

00:35:25.100 --> 00:35:27.460
we do have ways of dealing with that as well.

00:35:27.560 --> 00:35:30.120
There's actually a dedupe processor.

00:35:30.680 --> 00:35:32.940
I don't know if it's in this latest version,

00:35:32.940 --> 00:35:37.160
but I do know it's there because duplicate data

00:35:37.160 --> 00:35:41.200
is a pretty big issue in my experience

00:35:41.200 --> 00:35:42.880
in all the years I've been with the government.
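
One common way to handle the duplicates that a reprocess-after-crash can produce is to remember a hash of everything already seen. The Python sketch below shows that general idea only; it is not how any NiFi processor is implemented, and `DuplicateFilter` and `deduplicate` are hypothetical names.

```python
import hashlib


class DuplicateFilter:
    """Remembers SHA-256 digests of content that has already been seen."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_duplicate(self, content: bytes) -> bool:
        digest = hashlib.sha256(content).hexdigest()
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False


def deduplicate(records: list[bytes]) -> list[bytes]:
    """Keep the first occurrence of each record, preserving order."""
    dup_filter = DuplicateFilter()
    return [r for r in records if not dup_filter.is_duplicate(r)]
```

In a real flow the set of digests would need to live somewhere that survives a restart, otherwise the filter forgets exactly the records it is meant to catch.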

00:35:44.060 --> 00:35:46.740
So, yeah, that is our content.

00:35:46.980 --> 00:35:49.240
Then you have all the provenance events.

00:35:49.940 --> 00:35:53.720
It has its own, you know, repository.

00:35:54.620 --> 00:35:58.060
When we start NiFi, that new folder is going to be created,

00:35:58.060 --> 00:36:00.620
and that's where the provenance events will go into.

00:36:00.620 --> 00:36:06.240
So you can specify how much, you know, Richard,

00:36:06.240 --> 00:36:07.540
I'm thinking about you here,

00:36:07.620 --> 00:36:11.620
where you may have an overarching data governance plan

00:36:12.560 --> 00:36:14.040
and strategy.

00:36:14.500 --> 00:36:17.980
And so, you know, you want your NiFi to retain

00:36:17.980 --> 00:36:20.180
the last 14 days.

00:36:20.660 --> 00:36:24.060
And then, you know, during that 14 days,

00:36:24.320 --> 00:36:27.680
you're offloading all that provenance information

00:36:27.680 --> 00:36:31.820
into a larger data governance.

00:36:32.340 --> 00:36:33.560
Informatica has one.

00:36:33.980 --> 00:36:36.780
You know, there's a couple of open source versions.

00:36:37.100 --> 00:36:41.360
You know, there's like Knox and Tika,

00:36:41.600 --> 00:36:44.160
or not Tika, but Ranger and Apache Ranger,

00:36:44.240 --> 00:36:46.340
Apache Knox, and a few of those tools

00:36:46.340 --> 00:36:48.120
that kind of work well with NiFi.

00:36:48.580 --> 00:36:51.960
So, you know, you may have a, you know,

00:36:52.700 --> 00:36:56.680
a corporate-wide or unit-wide governance policy.

00:36:56.940 --> 00:36:59.840
So that's where this would get configured.

00:37:00.220 --> 00:37:01.740
You can configure it to keep the,

00:37:01.740 --> 00:37:04.580
right now it's configured to keep all the provenance events

00:37:04.580 --> 00:37:09.520
for 30 days with a max storage size of 10 gigs.
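
Those two limits (set through `nifi.provenance.repository.max.storage.time` and `nifi.provenance.repository.max.storage.size`) interact: whichever is hit first decides how much history you really keep. A rough Python sketch of that arithmetic, assuming a steady event volume and 1 GB = 10^9 bytes; the daily volumes are made-up inputs, not measurements.

```python
GB = 1000 ** 3  # assumption: decimal gigabytes


def days_until_size_cap(bytes_per_day: float, max_storage_bytes: float) -> float:
    """Days of provenance data that fit before the size limit is reached."""
    return max_storage_bytes / bytes_per_day


def effective_retention_days(
    bytes_per_day: float,
    max_days: float = 30,
    max_storage_bytes: float = 10 * GB,
) -> float:
    """Retention is bounded by whichever limit is reached first."""
    return min(max_days, days_until_size_cap(bytes_per_day, max_storage_bytes))


# A busy system writing 2 GB of provenance per day fills 10 GB in 5 days.
print(effective_retention_days(2 * GB))
# A quiet system writing 250 MB per day is bounded by the 30-day limit instead.
print(effective_retention_days(250_000_000))
```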

00:37:10.620 --> 00:37:14.740
So, you know, so keep that in mind, you know,

00:37:14.740 --> 00:37:16.880
when you're building it and designing your system

00:37:16.880 --> 00:37:22.200
that if you have a ton of data coming through,

00:37:23.120 --> 00:37:25.400
you may want to, you know,

00:37:25.400 --> 00:37:27.400
those events are being offloaded,

00:37:28.060 --> 00:37:31.680
and you know, you have those data provenance events,

00:37:31.680 --> 00:37:34.720
so you don't need to keep 30 days worth of data.

00:37:35.060 --> 00:37:37.860
You need to only keep it for a week or a day.

00:37:38.040 --> 00:37:39.480
You know, I've seen this configured

00:37:40.600 --> 00:37:43.560
where it keeps it only for a couple of hours,

00:37:43.840 --> 00:37:46.160
because as those events happen,

00:37:46.640 --> 00:37:48.120
all of the data governance events

00:37:48.120 --> 00:37:50.640
is being offloaded to the, you know,

00:37:50.760 --> 00:37:53.900
the corporate-wide data governance system.

00:37:53.900 --> 00:37:56.900
And so, you know, this is highly configurable,

00:37:57.420 --> 00:37:59.680
you know, for you sysadmins out there

00:37:59.680 --> 00:38:02.180
as you start working through getting it installed

00:38:02.180 --> 00:38:04.220
and things like that, you know,

00:38:04.260 --> 00:38:06.080
pay attention to some of these properties,

00:38:06.380 --> 00:38:08.780
because, you know, this one, for instance,

00:38:09.620 --> 00:38:13.880
it'll take 30 days or 10 gigs to fill up,

00:38:13.920 --> 00:38:17.240
and so, you know, you may want to adjust those settings.

00:38:18.900 --> 00:38:22.320
Again, you know, you see a lot of times

00:38:22.320 --> 00:38:26.660
applications have settings that you can just, like,

00:38:26.660 --> 00:38:30.020
you know, go to a menu and select the setting

00:38:30.020 --> 00:38:31.780
and change it and those types of things.

00:38:32.620 --> 00:38:34.000
We have some of that in NiFi,

00:38:34.060 --> 00:38:37.000
but this is part of some of those core settings.

00:38:37.500 --> 00:38:39.800
There is no UI for this.

00:38:40.820 --> 00:38:42.160
You know, that was, you know,

00:38:42.220 --> 00:38:43.120
one of the things that we went over

00:38:43.120 --> 00:38:45.120
in the last training class was,

00:38:45.780 --> 00:38:49.100
you're gonna have to go in and edit these files.

00:38:49.100 --> 00:38:52.260
You're going to have to put in, you know,

00:38:52.360 --> 00:38:55.180
different properties based upon your organization.

00:38:56.080 --> 00:38:58.540
I wish there was an easier way to do this.

00:38:58.880 --> 00:39:01.000
I find that this way is not too bad.

00:39:01.500 --> 00:39:03.520
I find that setting up security

00:39:03.520 --> 00:39:04.880
and those types of things,

00:39:05.160 --> 00:39:07.060
now, that's the more difficult part,

00:39:07.080 --> 00:39:09.220
and the problem with that is,

00:39:09.320 --> 00:39:11.280
you know, if you do run into an issue,

00:39:11.840 --> 00:39:13.580
you're asking the community,

00:39:13.840 --> 00:39:16.100
you're asking Google, you know,

00:39:16.160 --> 00:39:20.320
or you're emailing me and saying,

00:39:20.560 --> 00:39:22.000
hey, Josh, how do I do this?

00:39:22.520 --> 00:39:24.300
You'll get my contact information

00:39:24.300 --> 00:39:27.340
at the end of the, you know,

00:39:27.520 --> 00:39:29.060
at the end of every day, I think it is.

00:39:29.440 --> 00:39:32.500
But I will be happy to answer

00:39:32.500 --> 00:39:35.040
any quick questions after this training.

00:39:35.980 --> 00:39:37.660
Do remember, like, you know,

00:39:37.660 --> 00:39:39.600
I'm delivering training

00:39:39.600 --> 00:39:42.680
and the support after the class,

00:39:42.680 --> 00:39:44.760
you know, falls upon you,

00:39:44.760 --> 00:39:48.700
but, you know, you now have a contact

00:39:49.300 --> 00:39:51.540
that is an original contributor,

00:39:52.100 --> 00:39:53.660
you know, still contributes,

00:39:54.180 --> 00:39:56.200
still uses it, still builds it,

00:39:56.440 --> 00:39:58.120
still designs

00:39:58.120 --> 00:40:00.020
and architects solutions around this.

00:40:00.620 --> 00:40:02.920
So, and I'll give you my contact information,

00:40:03.340 --> 00:40:04.800
and if you have a quick question,

00:40:04.940 --> 00:40:06.540
feel free to reach out, you know,

00:40:06.540 --> 00:40:07.560
after this training class.

00:40:08.220 --> 00:40:10.420
But anyway, so, you know,

00:40:10.560 --> 00:40:12.560
that's the properties file.

00:40:12.560 --> 00:40:15.620
Another quick property

00:40:15.620 --> 00:40:17.020
you might want to take a look at,

00:40:17.420 --> 00:40:20.280
you have the remote host.

00:40:20.680 --> 00:40:24.280
So that is when we go into site-to-site

00:40:24.280 --> 00:40:27.680
because two NiFi instances can talk to each other,

00:40:27.900 --> 00:40:29.360
send data from one to the other,

00:40:31.260 --> 00:40:32.340
those types of things.

00:40:33.120 --> 00:40:35.060
So, you know, that's a good property

00:40:35.060 --> 00:40:36.400
you may want to take a look at.

00:40:37.620 --> 00:40:39.180
We have our web properties

00:40:39.180 --> 00:40:42.220
that right now, because of security,

00:40:42.800 --> 00:40:45.420
everything is going to run on your localhost

00:40:45.420 --> 00:40:48.060
and the local backup secure port.

00:40:49.540 --> 00:40:51.840
Also, when we start NiFi,

00:40:52.380 --> 00:40:55.140
we are going to have to go into the logs file

00:40:55.140 --> 00:40:57.860
and find our username and password.
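
Finding those credentials can be scripted. The Python sketch below assumes the log contains lines of the form `Generated Username [...]` and `Generated Password [...]`; the sample text and its values are placeholders, not real output, and `find_generated_credentials` is a hypothetical helper.

```python
import re

# Matches "Generated Username [value]" and "Generated Password [value]".
CRED_PATTERN = re.compile(r"Generated (Username|Password) \[([^\]]+)\]")


def find_generated_credentials(log_text: str) -> dict[str, str]:
    """Return the generated username and password found in the log text."""
    return {kind.lower(): value for kind, value in CRED_PATTERN.findall(log_text)}


# Placeholder log excerpt for illustration only.
SAMPLE_LOG = """\
Generated Username [example-user]
Generated Password [example-password]
"""

print(find_generated_credentials(SAMPLE_LOG))
```

Against a real install you would read the application log from the logs directory and pass its text to the same function.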

00:40:58.620 --> 00:41:00.880
So, a couple of versions ago,

00:41:00.940 --> 00:41:02.700
they implemented this change

00:41:02.700 --> 00:41:06.620
where every time you download and install it,

00:41:06.620 --> 00:41:09.680
it requires a username and password to log in,

00:41:09.680 --> 00:41:11.280
even on your local machine.

00:41:11.860 --> 00:41:14.420
The reason being, and I've seen it a thousand times,

00:41:14.540 --> 00:41:16.780
as a matter of fact, if you do some good Googling,

00:41:17.160 --> 00:41:20.600
you can still find NiFi instances

00:41:20.600 --> 00:41:24.980
sitting on an EC2 instance publicly exposed

00:41:24.980 --> 00:41:27.880
and no username, no password.

00:41:28.180 --> 00:41:30.720
So you can actually go into that instance,

00:41:31.160 --> 00:41:34.240
create a data flow that, you know,

00:41:34.240 --> 00:41:37.360
picks up data from something and delivers it to yourself.

00:41:38.500 --> 00:41:40.900
And this whole data flow is residing

00:41:40.900 --> 00:41:42.460
in someone else's instance,

00:41:42.780 --> 00:41:44.780
and so you're not paying for that resource.

00:41:44.880 --> 00:41:46.240
You're not paying for the EC2

00:41:46.760 --> 00:41:48.320
and the data and stuff like that.

00:41:49.260 --> 00:41:51.540
So, you know, there was a lot of people

00:41:51.540 --> 00:41:54.560
that was just downloading, installing, and running,

00:41:54.660 --> 00:41:56.260
and they were getting, you know,

00:41:56.320 --> 00:41:59.220
just hammered by malicious activity.

00:41:59.840 --> 00:42:01.900
So NiFi said, you know what,

00:42:01.900 --> 00:42:05.920
we are going to mandate a username and password

00:42:06.720 --> 00:42:08.420
on every install.

00:42:09.020 --> 00:42:11.120
So that way, like, you know,

00:42:11.240 --> 00:42:13.880
that way nobody can just randomly come in,

00:42:14.640 --> 00:42:16.320
once you see the username and password,

00:42:16.360 --> 00:42:17.780
it's actually very difficult.

00:42:18.060 --> 00:42:19.160
You can't guess it.

00:42:19.740 --> 00:42:22.060
So, you know, it's going to be very difficult

00:42:22.060 --> 00:42:23.400
for someone, you know,

00:42:23.480 --> 00:42:25.400
to just come in and run a flow.

00:42:26.160 --> 00:42:28.200
So we'll go through more of that,

00:42:28.220 --> 00:42:29.920
but, you know, just a little background,

00:42:29.920 --> 00:42:33.260
a little history, you know, of what's going on here.

00:42:33.560 --> 00:42:35.580
Again, we're going to be, you know,

00:42:35.700 --> 00:42:37.780
our NiFi instance is going to be on localhost,

00:42:37.880 --> 00:42:41.260
which is the IP address of 127.0.0.1.

00:42:42.100 --> 00:42:45.320
You can also use localhost in your domain name,

00:42:45.360 --> 00:42:48.240
but I like going just to the IP address.

00:42:48.780 --> 00:42:50.920
The port is going to be 8443.

00:42:51.840 --> 00:42:54.160
So if you are at home and download this

00:42:54.160 --> 00:42:58.160
and you're like, oh, I can go to this 127.0.0.1 IP

00:42:58.160 --> 00:43:01.060
and it will work, no, we specify the port.

00:43:01.540 --> 00:43:04.480
And so we use 8443 as the default.

00:43:05.400 --> 00:43:07.680
443 port, as you may know,

00:43:08.160 --> 00:43:10.940
is the secure port for, you know,

00:43:11.100 --> 00:43:13.020
most websites that you see.

00:43:13.300 --> 00:43:15.320
So when you go to Google, you know,

00:43:15.380 --> 00:43:19.600
you can go to http://google.com

00:43:19.600 --> 00:43:22.200
and it will automatically redirect you

00:43:22.200 --> 00:43:24.460
to the secure version,

00:43:24.460 --> 00:43:27.940
the HTTPS of google.com.

00:43:28.340 --> 00:43:30.280
You know, same kind of principle here.

00:43:30.840 --> 00:43:33.460
You know, we have a very secure,

00:43:34.400 --> 00:43:37.340
instead of using this typical 443 port,

00:43:38.280 --> 00:43:40.560
it uses the backup SSL port,

00:43:40.600 --> 00:43:42.540
which is usually 8443.

00:43:43.240 --> 00:43:47.380
So we will need to specify the port in our browser.

00:43:47.960 --> 00:43:50.680
If we left this at 443,

00:43:50.880 --> 00:43:52.760
we wouldn't even need to specify the port.

00:43:52.760 --> 00:43:56.180
We can just say HTTPS and send it.
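
The port rule just described can be checked with Python's standard URL parser: when the URL carries no port, the scheme's default applies (443 for HTTPS), and 8443 has to be written out. The `/nifi` path and the `effective_port` helper are assumptions made for this illustration.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def effective_port(url: str) -> int:
    """Return the explicit port if present, otherwise the scheme's default."""
    parts = urlsplit(url)
    return parts.port if parts.port is not None else DEFAULT_PORTS[parts.scheme]


print(effective_port("https://127.0.0.1:8443/nifi"))  # explicit port is used
print(effective_port("https://127.0.0.1/nifi"))       # falls back to 443
```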

00:43:56.840 --> 00:43:58.640
So, you know, keep that in mind.

00:43:59.720 --> 00:44:01.760
Underneath NiFi, underneath the hood,

00:44:02.300 --> 00:44:03.220
is Jetty server.

00:44:04.280 --> 00:44:07.600
Jetty is another open source package.

00:44:08.380 --> 00:44:12.340
If you've ever heard of things like Apache Tomcat,

00:44:13.040 --> 00:44:18.380
JBoss, you know, these web server applications,

00:44:19.540 --> 00:44:21.460
you know, NiFi is a web app.

00:44:21.460 --> 00:44:25.080
And so you need a server to run that web app.

00:44:25.780 --> 00:44:29.480
So under the hood of NiFi, that server is Jetty.

00:44:29.600 --> 00:44:31.980
There's a lot of configuration you can use.

00:44:32.320 --> 00:44:35.560
I don't see a lot of people messing with that.

00:44:35.780 --> 00:44:38.940
You know, just because, you know, Jetty, it works great.

00:44:39.100 --> 00:44:41.740
It's very lightweight in the performance there.

00:44:43.180 --> 00:44:45.420
Then you have some additional security.

00:44:46.440 --> 00:44:48.940
There's Apache Knox already mentioned.

00:44:48.940 --> 00:44:55.940
Some SAML properties, additional properties for multi-tenancy,

00:44:57.780 --> 00:45:00.740
you know, identity mapping, those types of things.

00:45:01.540 --> 00:45:03.940
You know, you have clustering and where's your ZooKeeper.

00:45:05.980 --> 00:45:08.120
So, you know, as a sysadmin,

00:45:08.240 --> 00:45:10.160
you may need some of these properties,

00:45:10.600 --> 00:45:15.840
but for the sake of time and, you know, for this class,

00:45:15.840 --> 00:45:19.000
we don't really need to worry about any of those others.

00:45:19.660 --> 00:45:21.960
But the main ones that we need to worry about

00:45:21.960 --> 00:45:23.740
is just where is this running?

00:45:24.120 --> 00:45:26.320
What's the IP address and port?

00:45:26.600 --> 00:45:29.360
Kind of like showing off the banner because, you know,

00:45:29.400 --> 00:45:32.040
it has that government, you know,

00:45:32.140 --> 00:45:33.880
even though it's an open source product,

00:45:34.140 --> 00:45:35.480
that property is still there,

00:45:35.480 --> 00:45:37.300
and it's there because of the government.

00:45:37.480 --> 00:45:41.160
The government is a contributor to this as well.

00:45:41.820 --> 00:45:44.820
So, you know, they keep tabs on NiFi.

00:45:44.820 --> 00:45:47.500
It's widely used, and so, you know,

00:45:47.640 --> 00:45:50.160
government employees and contractors, right, you know,

00:45:50.160 --> 00:45:53.920
they provide information back to the Apache Foundation.

00:45:54.320 --> 00:45:56.140
Hey, you know, either we need to build this

00:45:56.140 --> 00:45:59.880
or we built a patch and we want to include it.

00:45:59.880 --> 00:46:01.880
So, you know, just keep those things in mind.

00:46:02.480 --> 00:46:05.500
So, what we're going to do is close out of that.

00:46:07.280 --> 00:46:08.900
We'll just close out.

00:46:09.260 --> 00:46:12.560
Actually, we'll bring that up one more time,

00:46:12.560 --> 00:46:14.560
make sure I saved it.

00:46:24.820 --> 00:46:26.840
So, I'll put in my banner.

00:46:29.120 --> 00:46:30.880
This is a test system.

00:46:33.560 --> 00:46:34.120
Saved.

00:46:35.600 --> 00:46:36.500
All right.

00:46:37.920 --> 00:46:41.560
So, also, if you notice,

00:46:41.560 --> 00:46:45.140
there's no repository folders and stuff like that

00:46:45.140 --> 00:46:46.220
that we talked about.

00:46:46.380 --> 00:46:48.280
There's also no logs directory

00:46:49.040 --> 00:46:52.220
because all of these things are going to be created

00:46:52.220 --> 00:46:55.900
when we first execute NiFi.

00:46:56.100 --> 00:46:59.540
So, I went over the lib directory,

00:47:00.020 --> 00:47:03.000
the extensions directory, the docs of what they have,

00:47:03.640 --> 00:47:04.480
conf directory.

00:47:05.000 --> 00:47:08.280
A lot of you will not even touch this probably, right?

00:47:08.320 --> 00:47:09.560
You're going to rely on your sysadmins

00:47:09.680 --> 00:47:12.160
and others to get this going.

00:47:12.620 --> 00:47:15.460
But I like to go over it because, you know,

00:47:16.000 --> 00:47:17.720
anyone can download the application.

00:47:18.100 --> 00:47:20.740
You know, I was teaching my nine-year-old how to do this.

00:47:21.200 --> 00:47:24.700
So, she can download it and start playing around with it

00:47:24.700 --> 00:47:26.660
and run her own configuration if needed.

00:47:27.700 --> 00:47:29.820
You know, so that information is there.

00:47:29.840 --> 00:47:31.680
Most likely, you know, you won't need it.

00:47:32.500 --> 00:47:34.960
But for you sysadmins on the training today,

00:47:35.380 --> 00:47:36.920
you know where to get things.

00:47:37.620 --> 00:47:41.780
Well, that being said, now I'm going to open the bin directory again.

00:47:43.880 --> 00:47:47.080
And so, NiFi requires a few things.

00:47:47.780 --> 00:47:50.900
So, when I say we are installing NiFi,

00:47:51.220 --> 00:47:52.700
technically it's installed.

00:47:52.980 --> 00:47:56.120
NiFi, when you did that extract zip file,

00:47:56.460 --> 00:47:57.320
you installed it.

00:47:57.320 --> 00:47:59.720
You installed it into that directory

00:47:59.720 --> 00:48:02.240
that, you know, was extracted into.

00:48:03.300 --> 00:48:06.000
You know, that's a positive

00:48:06.000 --> 00:48:10.420
because you don't need to actually do installing onto the operating system.

00:48:11.020 --> 00:48:15.960
NiFi, you know, you can run NiFi without that installer

00:48:15.960 --> 00:48:17.740
and installing it in Windows.

00:48:18.120 --> 00:48:22.300
A lot of times, Windows has some restrictions on what gets installed

00:48:22.300 --> 00:48:24.320
and those types of things.

00:48:25.160 --> 00:48:27.000
So, NiFi is very portable.

00:48:27.520 --> 00:48:30.380
You can download it. You can run it.

00:48:30.900 --> 00:48:34.400
You know, your Windows at home or something may block,

00:48:34.400 --> 00:48:37.060
like localhost and those types of connections.

00:48:37.140 --> 00:48:38.320
Usually they don't,

00:48:38.320 --> 00:48:41.920
but there could be some additional security you would need to worry about.

00:48:42.160 --> 00:48:45.140
But for this instance, we should be good to go.

00:48:45.580 --> 00:48:49.300
So, what I like to do is, you know, that is installed.

00:48:49.820 --> 00:48:51.160
It's up and running.

00:48:52.140 --> 00:48:53.580
You know, those types of things.

00:48:53.920 --> 00:48:57.840
So, what I like to do then is actually run NiFi

00:48:57.840 --> 00:49:01.720
and that way I can go in and start looking at it.

00:49:01.720 --> 00:49:04.880
So, you don't necessarily need to follow along,

00:49:04.980 --> 00:49:06.940
but you're more than welcome to.

00:49:07.380 --> 00:49:09.960
So, what I like to do is just double-click run NiFi

00:49:09.960 --> 00:49:15.080
and, you know, Windows is going to make sure that I can run it

00:49:15.080 --> 00:49:18.200
and all those things can go from there.

00:49:18.200 --> 00:49:19.860
So, I'm going to say run it.

00:49:20.200 --> 00:49:22.500
It's going to bring up a command line prompt

00:49:23.600 --> 00:49:26.620
and it's going to start, you know,

00:49:26.660 --> 00:49:29.000
it's generating a self-signed certificate,

00:49:29.000 --> 00:49:31.560
you know, those types of things.

00:49:32.520 --> 00:49:34.380
You know, so give it just a minute.

00:49:34.400 --> 00:49:36.020
It's going to come back up and running.

00:49:36.420 --> 00:49:39.280
While I'm waiting on NiFi to come back,

00:49:39.320 --> 00:49:42.800
again, a lot of this is in the sysadmin guide.

00:49:43.200 --> 00:49:46.080
You know, how to install and start NiFi.

00:49:46.740 --> 00:49:48.200
You know, so here's the Windows.

00:49:48.340 --> 00:49:49.880
Here's the Linux version.

00:49:50.560 --> 00:49:53.960
When NiFi starts up, the following files and directories are created.

00:49:54.360 --> 00:49:56.680
You know, we talked about these repositories,

00:49:56.680 --> 00:49:58.000
the logs directory.

00:49:58.060 --> 00:50:01.720
There's a work directory, but it's like basically here's the PID,

00:50:01.840 --> 00:50:03.140
which is the process ID.

00:50:03.500 --> 00:50:05.480
Not a lot of information in that one.

00:50:06.500 --> 00:50:11.640
And then the conf directory, this flow.json.gz file is created

00:50:11.640 --> 00:50:18.460
because that's where the actual flows that you've built get saved.

00:50:19.040 --> 00:50:23.540
And so, you know, it makes it where that's quasi-portable as well.

00:50:23.540 --> 00:50:28.820
But that's how it reads what initial flows it needs to load,

00:50:29.020 --> 00:50:30.180
you know, upon startup.

00:50:31.820 --> 00:50:34.920
The flow.json.gz is empty for us because, you know,

00:50:34.920 --> 00:50:36.380
this is a brand new install.

00:50:36.880 --> 00:50:39.860
But, you know, once we start building some flows

00:50:39.860 --> 00:50:42.340
and those get automatically saved,

00:50:42.840 --> 00:50:45.740
you're going to see the size of that file increase.
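A minimal sketch for watching that file grow, assuming flow.json.gz is gzip-compressed JSON with an object at the top level. The function name `describe_flow_file` is made up for illustration and is not part of NiFi.

```python
import gzip
import json
from pathlib import Path


def describe_flow_file(path):
    """Return (compressed size in bytes, sorted top-level JSON keys) for a gzipped flow file."""
    path = Path(path)
    size = path.stat().st_size  # size on disk, which grows as flows are saved
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        flow = json.load(handle)  # assumption: the top level is a JSON object
    return size, sorted(flow)
```

Pointing it at conf/flow.json.gz under the extracted NiFi directory before and after building a flow should show the size going up.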

00:50:46.960 --> 00:50:49.020
So again, all of that is here.

00:50:49.400 --> 00:50:51.180
You know, if you want to run it on Windows,

00:50:51.180 --> 00:50:56.220
just double-click and, you know, just start NiFi, right?

00:50:57.300 --> 00:51:02.900
There's also a capability to install NiFi as a service

00:51:02.900 --> 00:51:05.240
on both Windows and Linux.

00:51:05.700 --> 00:51:10.920
So when Linux starts, you know, you may have a startup where,

00:51:10.960 --> 00:51:13.120
you know, as the server starts,

00:51:13.420 --> 00:51:15.960
it automatically starts NiFi and it's up and running.

00:51:16.400 --> 00:51:19.560
You may want to with your Windows laptop

00:51:19.560 --> 00:51:21.160
or your Windows machine at home.

00:51:22.000 --> 00:51:24.000
Or, you know, if you have permission, you know,

00:51:24.020 --> 00:51:25.920
to install this at work, you know, at work,

00:51:26.360 --> 00:51:29.900
where you're able to install this as a service

00:51:29.900 --> 00:51:33.840
and then that way, every time your laptop starts,

00:51:34.240 --> 00:51:37.760
it automatically starts NiFi as well.

00:51:38.420 --> 00:51:43.380
And so, you know, it requires admin rights on the box

00:51:43.380 --> 00:51:47.800
to do the service, you know, so kind of keep that in mind.

00:51:47.800 --> 00:51:50.860
But, you know, you do have that capability.

00:51:52.660 --> 00:51:57.260
But again, you can download the source code

00:51:57.260 --> 00:51:59.800
and build a custom distribution.

00:52:00.280 --> 00:52:05.300
I know a lot of people who do this that deal with the CI/CD process

00:52:05.300 --> 00:52:07.800
because NiFi is massive.

00:52:08.820 --> 00:52:14.100
You know, we installed it and we started it.

00:52:14.440 --> 00:52:16.860
We haven't even brought the UI up.

00:52:16.860 --> 00:52:19.860
We haven't even built a flow or anything else.

00:52:20.620 --> 00:52:23.940
And the download was 1.26 gig.

00:52:24.360 --> 00:52:30.960
And we are now just extracting it, 2.46 gig.

00:52:31.460 --> 00:52:36.980
So, you know, that's a pretty substantial size application.

00:52:38.080 --> 00:52:42.740
But if you look at like MiNiFi that can go on an edge device,

00:52:43.200 --> 00:52:44.960
that's less than one meg.

00:52:44.960 --> 00:52:50.060
And so, you know, there's a lot of capabilities here,

00:52:50.260 --> 00:52:51.180
a lot of flexibility.

00:52:51.240 --> 00:52:55.660
So I know a lot of people who will build their own distribution

00:52:56.260 --> 00:53:01.100
just so they can make sure they only include processors they need

00:53:01.100 --> 00:53:04.300
and not any of these additional processors

00:53:04.300 --> 00:53:06.720
that will either never be used

00:53:07.360 --> 00:53:09.960
or are additional assets that need to be managed.

00:53:10.340 --> 00:53:12.040
You know, you got to look at the, you know,

00:53:12.160 --> 00:53:14.020
is there a vulnerability, right?

00:53:14.020 --> 00:53:15.860
Remember the log4j vulnerability?

00:53:16.500 --> 00:53:18.180
I know you guys know about the log4j

00:53:18.180 --> 00:53:20.940
because it was brought up multiple times in the last class.

00:53:22.120 --> 00:53:24.200
But, you know, NiFi, for instance,

00:53:24.740 --> 00:53:29.900
swapped to logback, which is another logging application.

00:53:30.380 --> 00:53:32.960
It's based off of log4j,

00:53:33.100 --> 00:53:36.980
and it was created by the original author of log4j.

00:53:37.140 --> 00:53:41.080
He started, you know, another logging service that's more secure.

00:53:41.080 --> 00:53:44.580
And so NiFi, just as an FYI,

00:53:44.580 --> 00:53:47.760
NiFi uses logback instead of log4j.

00:53:48.240 --> 00:53:52.360
Now that's not saying someone can't create a processor

00:53:52.360 --> 00:53:58.680
that has some log4j components inside and utilize those.

00:53:59.080 --> 00:54:01.060
So, you know, just keep that in mind.

00:54:01.660 --> 00:54:03.900
But, you know, for security reasons

00:54:03.900 --> 00:54:06.280
and just distribution reasons,

00:54:06.360 --> 00:54:09.040
you may want to build your own from source

00:54:09.040 --> 00:54:11.360
and not include some of these processors,

00:54:12.360 --> 00:54:13.360
you know, and those types of things.

00:54:13.920 --> 00:54:16.900
But for us, we're going to run with what we have.

00:54:18.920 --> 00:54:24.820
Okay, so if you're following along here,

00:54:25.440 --> 00:54:30.280
you should, you know, get a message that, you know,

00:54:30.600 --> 00:54:33.860
your final message should be something like launched Apache NiFi,

00:54:34.060 --> 00:54:36.400
but could not determine the process ID.

00:54:37.180 --> 00:54:38.520
That's totally fine.

00:54:38.520 --> 00:54:39.480
It's just a warning.

00:54:39.880 --> 00:54:42.000
It can't determine the process ID.

00:54:42.500 --> 00:54:45.100
There's some additional configuration we need to do,

00:54:45.160 --> 00:54:46.300
but it's okay.

00:54:46.700 --> 00:54:47.280
It's there.

00:54:48.460 --> 00:54:51.760
So NiFi, again, it does take a few minutes

00:54:51.760 --> 00:54:56.040
on that first time of starting to actually be up and running.

00:54:56.920 --> 00:55:00.960
So even though it tells me that it launched NiFi,

00:55:01.320 --> 00:55:03.160
you know, I could give it a couple of minutes

00:55:03.160 --> 00:55:06.280
just because it is creating the content repository.

00:55:06.280 --> 00:55:09.560
It's creating those logs and everything else.

00:55:10.140 --> 00:55:12.520
And then once it's up and running,

00:55:13.380 --> 00:55:16.640
and, you know, once you get that message back in Windows

00:55:16.640 --> 00:55:21.320
that, you know, it's running and can't find its PID,

00:55:21.420 --> 00:55:22.320
but that's okay.

00:55:23.020 --> 00:55:25.840
What I like to do now is go back and now look,

00:55:26.080 --> 00:55:29.080
and you see we have the different repositories created.

00:55:29.540 --> 00:55:32.840
You know, initially, we only had like five folders.

00:55:33.140 --> 00:55:34.380
We've doubled that.

00:55:34.380 --> 00:55:37.200
We have our provenance repository folder now,

00:55:37.220 --> 00:55:40.360
our flow file repository, the database, the content.

00:55:40.840 --> 00:55:41.740
We have logs.

00:55:41.860 --> 00:55:42.800
We have work.

00:55:42.900 --> 00:55:43.860
We have run.

00:55:44.260 --> 00:55:45.160
We have state.

00:55:45.680 --> 00:55:47.560
You know, there's a lot of additional.

00:55:48.060 --> 00:55:51.440
But for this exercise, I'm going to go into the logs directory.

00:55:54.680 --> 00:55:59.320
And there's primarily, you know, just a, you know,

00:55:59.400 --> 00:56:01.600
you've got five different logs here,

00:56:01.600 --> 00:56:06.920
but the primary log that you will be working with,

00:56:06.920 --> 00:56:08.840
you know, if you need to work with the logs,

00:56:09.440 --> 00:56:11.620
is the nifi-app.log.

00:56:12.320 --> 00:56:14.500
That's where most of the activity occurs.

00:56:15.020 --> 00:56:19.620
You know, users logging in, data flows being added,

00:56:20.060 --> 00:56:22.940
processors being added, you know,

00:56:23.080 --> 00:56:25.620
data flowing through the system, right?

00:56:25.700 --> 00:56:27.200
Any warnings or errors.

00:56:27.780 --> 00:56:30.260
Also, you'll see when we're building a flow,

00:56:30.260 --> 00:56:34.580
I really like to use the LogMessage processor.

00:56:35.060 --> 00:56:39.880
So when I do that, it will send a log message, you know,

00:56:39.980 --> 00:56:43.860
to this log about a data flow, right?

00:56:44.220 --> 00:56:48.200
And so I like this log, you know, from a sysadmin.

00:56:48.220 --> 00:56:49.880
If I put my sysadmin head on,

00:56:50.400 --> 00:56:52.420
this is my favorite log to look at.

00:56:53.200 --> 00:56:56.740
So with that being said, I am actually going to open this.

00:56:58.040 --> 00:57:00.700
We'll go into more in depth later,

00:57:03.180 --> 00:57:05.740
but of course it's going to tell me it's been changed

00:57:05.740 --> 00:57:07.460
because this is a live log.

00:57:09.040 --> 00:57:12.320
But what I like to do is,

00:57:13.920 --> 00:57:18.160
so I mentioned that NiFi, when, you know,

00:57:18.260 --> 00:57:20.700
up until recently, you could download it,

00:57:20.700 --> 00:57:22.680
you could install it,

00:57:23.040 --> 00:57:25.020
and have it up and running in a few minutes,

00:57:25.020 --> 00:57:27.720
but everybody in the world could access it

00:57:27.720 --> 00:57:29.680
if it was on a public IP or something.

00:57:30.160 --> 00:57:32.520
So what they did is they went through and said,

00:57:32.800 --> 00:57:36.700
okay, we are now going to secure every install.

00:57:37.220 --> 00:57:39.700
We're going to generate a username and password

00:57:41.100 --> 00:57:44.020
that is unique to every install.

00:57:45.180 --> 00:57:46.680
So to find that information,

00:57:46.680 --> 00:57:50.720
you actually have to go into the nifi-app.log file

00:57:50.720 --> 00:57:53.520
and look for username.

00:57:57.780 --> 00:58:00.640
And you're going to see in this log file

00:58:00.640 --> 00:58:04.000
a generated username and a generated password.

00:58:04.720 --> 00:58:08.740
That is going to be our username and password to log in.

00:58:09.120 --> 00:58:10.680
Yours is going to be different.

00:58:10.800 --> 00:58:14.960
This is a very unique UUID that is generated.

00:58:16.360 --> 00:58:19.260
And so, you know, your username and your password

00:58:19.260 --> 00:58:20.400
is going to be different.

00:58:21.020 --> 00:58:22.760
I'm going through this right now,

00:58:22.860 --> 00:58:25.400
but we, as an exercise, you know,

00:58:25.980 --> 00:58:27.880
I'm going to have you all, you know,

00:58:27.960 --> 00:58:28.960
basically do the same.

00:58:29.340 --> 00:58:30.420
What I like to do,

00:58:30.420 --> 00:58:33.740
because there's no way I can remember that much information,

00:58:34.180 --> 00:58:36.020
is I like to copy it,

00:58:36.060 --> 00:58:39.340
and I will actually put it in a new document,

00:58:39.700 --> 00:58:42.440
because that log file is going to go away.

00:58:42.600 --> 00:58:44.580
You know, as we process data,

00:58:44.780 --> 00:58:47.100
it rolls over to a new log file.

00:58:47.100 --> 00:58:51.520
You know, there's a lot of information in that log file,

00:58:51.620 --> 00:58:55.340
so I like to just pull out that username and password,

00:58:55.600 --> 00:58:58.300
that initial username and password,

00:58:58.860 --> 00:59:00.340
and have it readily available.
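A minimal sketch of pulling those two values out of the log text instead of searching by hand. It assumes the log lines read `Generated Username [...]` and `Generated Password [...]`, which is how recent 1.x releases word them as far as I know. Check your own nifi-app.log if nothing matches. The helper name is made up for illustration.

```python
import re

# Assumed wording of the log lines; adjust the pattern if your log differs.
CREDENTIAL_PATTERN = re.compile(r"Generated (Username|Password) \[([^\]]+)\]")


def find_generated_credentials(log_text):
    """Return a dict with 'username' and/or 'password' for any generated credentials found."""
    found = {}
    for kind, value in CREDENTIAL_PATTERN.findall(log_text):
        found[kind.lower()] = value  # a later match overwrites an earlier one
    return found
```

You would read logs/nifi-app.log into a string, pass it in, and save the result somewhere outside the logs directory before the log rolls over.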

00:59:00.700 --> 00:59:04.440
So what I did is I just created a new text file.

00:59:04.800 --> 00:59:08.160
I copied and pasted the username and password,

00:59:08.840 --> 00:59:12.960
and then I'm going to just save it as text,

00:59:13.080 --> 00:59:15.140
and I'll just throw it in my downloads.

00:59:15.960 --> 00:59:22.220
And I'll just name it up in my downloads.

00:59:23.840 --> 00:59:23.940
Perfect.

00:59:24.620 --> 00:59:29.420
So now, you know, I've downloaded NiFi,

00:59:29.720 --> 00:59:32.000
I've extracted NiFi,

00:59:32.600 --> 00:59:34.540
I've double-clicked on run-nifi,

00:59:34.940 --> 00:59:38.140
it went through, it created everything it needed to do

00:59:38.660 --> 00:59:40.440
to get up and running,

00:59:40.980 --> 00:59:44.580
and then, you know, it's up and running now,

00:59:44.580 --> 00:59:46.980
so it's just waiting on me to log in.

00:59:47.520 --> 00:59:50.060
So what I like to do then is I'll bring up my browser,

00:59:52.340 --> 00:59:54.340
and, you know, I like to go,

00:59:54.820 --> 00:59:59.020
if you remember, the IP address was 127.0.0.1,

00:59:59.040 --> 01:00:00.260
which is localhost,

01:00:00.720 --> 01:00:06.460
and we were on 8443, that port,

01:00:06.840 --> 01:00:09.860
so HTTPS, because it's secure,

01:00:11.000 --> 01:00:12.880
and colon 8443.

01:00:12.880 --> 01:00:17.360
Now, you'll learn that you need to do slash nifi,

01:00:18.760 --> 01:00:21.220
but to show you what happens,

01:00:21.620 --> 01:00:23.320
let's just go to this one.

01:00:27.320 --> 01:00:30.580
And like I said, the initial running of NiFi

01:00:30.580 --> 01:00:32.140
can take a few minutes,

01:00:32.500 --> 01:00:35.640
so if you are following along,

01:00:35.700 --> 01:00:36.840
and you're trying to do this,

01:00:36.860 --> 01:00:38.820
and you're getting page not found,

01:00:38.820 --> 01:00:41.520
then, you know, I don't know,

01:00:41.820 --> 01:00:44.460
but it also helps that I put in the right port,

01:00:46.520 --> 01:00:47.040
8443.

01:00:47.680 --> 01:00:51.160
But again, you can put in the correct IP address,

01:00:51.320 --> 01:00:53.440
the correct port, and it still won't load.

01:00:54.480 --> 01:00:56.580
On the last class, I noticed, you know,

01:00:56.640 --> 01:00:58.300
even three or four minutes

01:00:58.820 --> 01:01:00.640
before it was fully up and running,

01:01:00.740 --> 01:01:04.160
even though NiFi would report that it's running,

01:01:04.880 --> 01:01:07.620
it still took three or four minutes to initialize.

01:01:07.620 --> 01:01:10.160
Again, we're working in a high-latency

01:01:10.160 --> 01:01:11.860
virtual desktop environment,

01:01:12.820 --> 01:01:14.740
and so your own environment

01:01:14.740 --> 01:01:16.880
may be much better or different

01:01:16.880 --> 01:01:18.140
to allow that to run.
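A minimal sketch of waiting for the UI rather than reloading the browser by hand. The URL is the one used in this session. The function names, retry counts, and delays are assumptions for illustration. Certificate verification is switched off only because this local install uses a self-signed certificate.

```python
import ssl
import time
import urllib.error
import urllib.request


def nifi_ui_responds(url="https://127.0.0.1:8443/nifi", timeout=5):
    """Return True if anything answers at url, ignoring the self-signed certificate."""
    context = ssl.create_default_context()
    context.check_hostname = False       # local self-signed certificate only
    context.verify_mode = ssl.CERT_NONE  # do not do this against a real server
    try:
        urllib.request.urlopen(url, context=context, timeout=timeout)
        return True
    except urllib.error.HTTPError:
        return True   # the server answered, even if with an error status
    except (urllib.error.URLError, OSError):
        return False  # nothing listening yet, or still initializing


def wait_until_ready(probe, attempts=30, delay_seconds=10.0, sleep=time.sleep):
    """Call probe() until it returns True; return the attempt number, or None if it never did."""
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)
    return None
```

Calling `wait_until_ready(nifi_ui_responds)` with these defaults keeps trying for roughly five minutes, which covers the three or four minutes seen in the last class.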

01:01:18.760 --> 01:01:20.540
So anyways, I'm at 127.

01:01:21.200 --> 01:01:22.460
It's going to come back and tell me

01:01:22.460 --> 01:01:24.100
my connection is not private.

01:01:24.160 --> 01:01:27.940
It's a self-signed certificate, right?

01:01:28.060 --> 01:01:29.920
All this was set up just to add

01:01:29.920 --> 01:01:32.700
that username, password, security layer.

01:01:33.700 --> 01:01:35.900
So what I like to do is I'll go advanced,

01:01:35.900 --> 01:01:38.060
and I'll go ahead and proceed.

01:01:38.580 --> 01:01:41.900
And then I didn't specify slash nifi,

01:01:42.240 --> 01:01:43.660
but it caught it.

01:01:43.780 --> 01:01:46.120
It's automatically going to redirect me,

01:01:46.300 --> 01:01:49.420
and now I will be at the login canvas.

01:01:50.480 --> 01:01:53.540
So it's asking for a username and password.

01:01:58.700 --> 01:02:00.600
I have it right here, luckily.

01:02:02.920 --> 01:02:03.900
That's why I said, you know,

01:02:04.120 --> 01:02:07.040
copy and paste it when we get to that part,

01:02:07.240 --> 01:02:09.440
when we go through this more hands-on.

01:02:09.720 --> 01:02:11.260
Make sure you copy and paste it

01:02:11.260 --> 01:02:12.720
into something a little bit easier.

01:02:13.080 --> 01:02:14.900
That log is going to go away,

01:02:15.220 --> 01:02:17.320
so tomorrow when we log in,

01:02:17.480 --> 01:02:19.780
if you did not copy and paste it somewhere,

01:02:20.260 --> 01:02:22.240
you're going to have to find that old log,

01:02:22.320 --> 01:02:24.120
and we're going to have to get it.

01:02:26.020 --> 01:02:29.900
Enter the username, the password, log in.

01:02:36.740 --> 01:02:37.360
Perfect.

01:02:37.960 --> 01:02:41.700
We are now back at the application.

01:02:41.780 --> 01:02:44.380
So this is the NiFi application.

01:02:44.780 --> 01:02:46.400
It is web-based.

01:02:47.520 --> 01:02:51.120
You know, there's a lot of buttons

01:02:52.060 --> 01:02:53.460
and a lot of things,

01:02:53.480 --> 01:02:55.620
and we're going to go over every one of those.

01:02:55.940 --> 01:02:58.300
But again, it's a web-based application.

01:02:59.420 --> 01:03:01.060
You know, there's some server technologies

01:03:01.060 --> 01:03:02.740
under the hood that's running this,

01:03:02.740 --> 01:03:05.080
you know, Jetty and some other things.

01:03:05.820 --> 01:03:08.200
But, you know, it's all browser-based,

01:03:08.840 --> 01:03:10.660
mostly to work with the data flows.

01:03:11.020 --> 01:03:13.760
But again, there's no point-and-click,

01:03:13.760 --> 01:03:16.660
you know, properties manager,

01:03:16.820 --> 01:03:19.420
so you've got to, you know, hand-edit that.

01:03:19.840 --> 01:03:21.720
You know, a lot of applications,

01:03:21.800 --> 01:03:24.060
you know, you're going to have to edit the properties.

01:03:24.600 --> 01:03:26.120
But once you get it up and running,

01:03:26.560 --> 01:03:31.360
you shouldn't need to go back to the log directory

01:03:31.360 --> 01:03:33.540
or any of those other properties

01:03:33.540 --> 01:03:36.940
unless you, like, have a warning or an error

01:03:36.940 --> 01:03:39.060
that you need to look at in the log directory.

01:03:39.820 --> 01:03:42.160
But if you're running this as a standalone,

01:03:43.300 --> 01:03:46.020
in your spare time on your laptop,

01:03:46.340 --> 01:03:48.720
you know, even at work, you know,

01:03:49.420 --> 01:03:51.120
you probably don't need to go back

01:03:51.120 --> 01:03:52.260
and take a look at those,

01:03:52.360 --> 01:03:54.360
but make sure you keep that username and password.

01:03:55.340 --> 01:03:56.520
So we're logged in.

01:03:57.100 --> 01:03:59.800
I can actually now start building my data flows.

01:03:59.800 --> 01:04:02.800
But what I'm going to do is actually go back

01:04:03.320 --> 01:04:07.180
in my presentation,

01:04:07.500 --> 01:04:12.180
where we talked about some of the core components of NiFi.

01:04:12.680 --> 01:04:15.520
So we talked about processors, connections,

01:04:15.880 --> 01:04:17.280
flow files, flow controller,

01:04:17.920 --> 01:04:19.960
all of these things that we talked about.

01:04:20.860 --> 01:04:22.360
And let's take a look at them.

01:04:22.560 --> 01:04:23.660
Let's look at them.

01:04:24.300 --> 01:04:29.000
Let's, you know, see more about what they are in NiFi.

01:04:29.000 --> 01:04:34.260
So what I like to do is this is your canvas.

01:04:34.340 --> 01:04:36.740
This is a blank canvas.

01:04:37.240 --> 01:04:39.480
So you don't have any processors running.

01:04:39.740 --> 01:04:45.880
You don't have, you know, any of the process groups.

01:04:45.900 --> 01:04:49.080
You don't have any data flows or anything else.

01:04:49.440 --> 01:04:51.240
You know, you don't have any of that.

01:04:51.840 --> 01:04:55.680
So, you know, it's a blank canvas.

01:04:56.200 --> 01:04:57.940
So this section up here,

01:04:57.940 --> 01:05:00.380
you can see the NiFi logo.

01:05:00.920 --> 01:05:02.340
You know, oh, I want to point out,

01:05:02.580 --> 01:05:05.120
there's my banner that this is a test system.

01:05:05.680 --> 01:05:09.740
So I can put in capital letters unclassified even, right?

01:05:09.780 --> 01:05:11.880
Or I can put dev or test.

01:05:12.520 --> 01:05:15.460
And that property, when NiFi has started,

01:05:15.540 --> 01:05:17.180
it's going to read that property

01:05:17.180 --> 01:05:19.000
and put that as the banner.
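A minimal sketch of reading that banner value out of properties text. It assumes the setting is the `nifi.ui.banner.text` key in conf/nifi.properties and that the file uses plain `key=value` lines. The helper name is made up for illustration.

```python
def read_property(properties_text, key):
    """Return the value for key from Java-style key=value text, or None if absent."""
    for line in properties_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        name, separator, value = stripped.partition("=")
        if separator and name.strip() == key:
            return value.strip()
    return None
```

Given a line such as `nifi.ui.banner.text=TEST SYSTEM`, calling `read_property(text, "nifi.ui.banner.text")` returns `"TEST SYSTEM"`.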

01:05:19.920 --> 01:05:23.760
So anyway, so the, you know, this is the main canvas.

01:05:23.760 --> 01:05:28.220
The NiFi UI has multiple tools to create and manage,

01:05:28.220 --> 01:05:30.200
you know, your first data flow.

01:05:30.960 --> 01:05:33.720
So what this is is the components toolbar.

01:05:34.320 --> 01:05:37.760
So if you see, you know, you should see processor.

01:05:38.820 --> 01:05:41.860
You see input port, output port, process group.

01:05:42.400 --> 01:05:45.460
If you just hover over them, remote process group,

01:05:46.260 --> 01:05:49.620
funnels, templates, and labels.

01:05:50.240 --> 01:05:52.940
So, you know, the last group,

01:05:52.940 --> 01:05:58.640
we actually, I did not mention filter or funnel on purpose.

01:05:59.080 --> 01:06:02.080
And the last group was able to actually work it in

01:06:02.080 --> 01:06:05.420
to their, you know, their data flow

01:06:06.180 --> 01:06:08.240
as it was pretty understandable.

01:06:08.300 --> 01:06:09.720
They just referenced the document.

01:06:09.840 --> 01:06:12.800
But anyways, this is your components bar.

01:06:13.140 --> 01:06:17.360
Now, right below your components bar is the status bar.

01:06:17.720 --> 01:06:22.060
So, you know, how many bytes are going in and out of the system, right?

01:06:22.060 --> 01:06:24.560
How many processors are started?

01:06:24.680 --> 01:06:25.700
How many are stopped?

01:06:25.820 --> 01:06:27.060
How many are disabled?

01:06:27.580 --> 01:06:29.100
You know, how many have a warning?

01:06:29.620 --> 01:06:31.520
You know, all of these things.

01:06:31.860 --> 01:06:37.680
Now, the canvas itself only updates automatically every five minutes.

01:06:38.220 --> 01:06:43.580
But at any time, when I, when, you'll hear me say this a few times

01:06:43.580 --> 01:06:47.460
during the, when we're building a hands-on data flow,

01:06:47.460 --> 01:06:52.480
is to go ahead and refresh, you know, your canvas.

01:06:52.900 --> 01:06:54.680
So, when I say refresh your canvas,

01:06:54.900 --> 01:06:58.100
that doesn't mean, you know, go up here and refresh from the browser.

01:06:58.840 --> 01:07:01.580
That's actually just anywhere on this canvas

01:07:01.580 --> 01:07:04.240
without clicking on any component,

01:07:04.740 --> 01:07:06.460
you can hit right-click and hit refresh,

01:07:07.100 --> 01:07:09.620
and it will automatically refresh the stats.

01:07:10.540 --> 01:07:13.100
But anyways, so that is your status bar.

01:07:14.340 --> 01:07:17.040
This is our operate palette.

01:07:18.220 --> 01:07:20.080
You know, and we'll go more into that.

01:07:20.140 --> 01:07:26.240
But that operate palette allows me to control that whole process group.

01:07:26.780 --> 01:07:32.200
And so, if I have a process group right here, I can start, stop,

01:07:32.600 --> 01:07:35.380
I can enable, I can disable, I can, you know,

01:07:35.420 --> 01:07:39.160
I can adjust the properties and those types of things

01:07:39.160 --> 01:07:42.100
right here on my operate palette.

01:07:43.180 --> 01:07:46.000
And so, you know, when we build our data flow,

01:07:46.400 --> 01:07:49.160
we are actually going to create a data flow.

01:07:49.520 --> 01:07:52.220
And then afterwards, we're going to put that data flow

01:07:52.220 --> 01:07:53.920
into a new process group

01:07:53.920 --> 01:07:57.380
to get ready for some additional hands-on data flows.

01:07:58.400 --> 01:08:00.760
And so, we'll go through how to do that.

01:08:01.000 --> 01:08:04.600
But once everything is up and running, your data flow is going,

01:08:04.600 --> 01:08:10.740
you know, you have that capability to just click on that process group

01:08:10.740 --> 01:08:11.720
and say, stop.

01:08:11.980 --> 01:08:13.820
You know, I want to stop the whole thing.

01:08:14.560 --> 01:08:16.940
So, you know, you do have that.

01:08:18.260 --> 01:08:22.820
And some of the other parts and pieces of the NiFi Canvas

01:08:22.820 --> 01:08:24.740
is the global menu.

01:08:25.180 --> 01:08:26.520
So, that's right here.

01:08:27.280 --> 01:08:31.220
So, you know, you have a summary of your data flows,

01:08:31.260 --> 01:08:32.940
how much data is coming in.

01:08:32.940 --> 01:08:35.560
You know, a lot like what you see on the status bar,

01:08:35.700 --> 01:08:37.300
but a lot more detail,

01:08:37.700 --> 01:08:40.960
as well as counters and a bulletin board

01:08:40.960 --> 01:08:44.480
in case of, you know, any kind of messages there.

01:08:45.400 --> 01:08:46.360
You have a new section,

01:08:46.620 --> 01:08:48.560
another section called data provenance.

01:08:48.960 --> 01:08:50.860
So, you know, that way,

01:08:51.260 --> 01:08:53.480
right now we have zero data provenance.

01:08:53.520 --> 01:08:56.140
So, if I click on it, it's going to show zero events

01:08:56.140 --> 01:08:58.300
just because we have yet to do anything.

01:08:59.300 --> 01:09:02.700
But later, we will actually go to the provenance.

01:09:02.760 --> 01:09:04.680
We will dive into that,

01:09:04.700 --> 01:09:07.980
and that's where we are going to be able to replay our data,

01:09:08.200 --> 01:09:10.800
look at the lineage, those types of things.

01:09:12.100 --> 01:09:13.680
You have controller settings.

01:09:14.160 --> 01:09:16.100
We'll go into controller settings,

01:09:16.300 --> 01:09:19.120
but, you know, I mentioned what a controller is already.

01:09:19.660 --> 01:09:24.680
You know, a controller is, you know, that reusable component.

01:09:24.680 --> 01:09:28.620
So, you may have a controller

01:09:28.620 --> 01:09:34.640
that provides a connection to SQL Server,

01:09:35.040 --> 01:09:38.600
and you really don't want to share the username and password

01:09:38.600 --> 01:09:42.320
to everybody that needs to connect to the SQL Server.

01:09:42.800 --> 01:09:44.900
So, what you're able to do is actually create

01:09:44.900 --> 01:09:48.000
a new controller service for SQL Server

01:09:48.000 --> 01:09:52.080
where your sysadmin plugs in the correct information

01:09:52.080 --> 01:09:54.840
they need to connect to that database

01:09:54.840 --> 01:09:57.620
and push data to that database.

01:09:58.080 --> 01:10:02.200
But, you know, you don't want to have your username and password

01:10:02.200 --> 01:10:04.040
running around to just anyone.

01:10:04.600 --> 01:10:07.200
So, a sysadmin can create a service

01:10:07.200 --> 01:10:10.640
that is a SQL Server connection service.

01:10:11.200 --> 01:10:14.760
And so, now, when I build my data flow

01:10:14.760 --> 01:10:17.400
and my colleague builds a data flow,

01:10:17.680 --> 01:10:19.620
you know, you may have a whole team,

01:10:19.620 --> 01:10:22.580
but everybody's having to write back to SQL Server.

01:10:22.960 --> 01:10:25.960
They don't need to worry about the connection details.

01:10:26.160 --> 01:10:27.740
Where is that SQL Server at?

01:10:27.920 --> 01:10:28.840
What port is running?

01:10:29.240 --> 01:10:30.420
You know, the IP address.

01:10:30.700 --> 01:10:33.300
They don't need to worry about username and password

01:10:33.300 --> 01:10:36.660
unless you set this up for them to specify

01:10:36.660 --> 01:10:38.760
that username and password they connect with.

01:10:39.260 --> 01:10:40.840
But you don't need to worry about that.

01:10:40.940 --> 01:10:43.620
There's a few things that once you set this up,

01:10:44.420 --> 01:10:48.320
you know, everyone, if they have the security permissions,

01:10:48.320 --> 01:10:50.680
can access that service.

01:10:50.960 --> 01:10:53.800
So, what they would do is just reference that service

01:10:53.800 --> 01:10:56.780
in their data flow when they get it built.
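A minimal sketch of that idea, not NiFi's actual API. The sysadmin registers a connection service once, and flow authors look it up by name without ever handling the password. Every class name, host, and credential here is a hypothetical placeholder.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConnectionService:
    """Hypothetical stand-in for a shared database connection service."""
    name: str
    host: str
    port: int
    username: str
    password: str = field(repr=False)  # kept out of repr so it is not printed by accident

    def describe(self):
        """What a flow author sees: the service name and target, never the password."""
        return f"{self.name} -> {self.host}:{self.port}"


class ServiceRegistry:
    """Hypothetical registry: the sysadmin registers, flow authors look up by name."""

    def __init__(self):
        self._services = {}

    def register(self, service):
        self._services[service.name] = service

    def lookup(self, name):
        return self._services[name]
```

The design point is the same one made above: the connection details live in one place, and each data flow only carries a reference to the service.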

01:10:57.100 --> 01:11:01.960
We are going to build a CSV service to read CSV.

01:11:02.500 --> 01:11:04.320
We're going to build a JSON service

01:11:04.940 --> 01:11:07.320
to read and write JSON documents,

01:11:07.860 --> 01:11:12.680
as well as a controller service for our registry

01:11:12.680 --> 01:11:14.080
and a few other things.

01:11:14.080 --> 01:11:17.580
But, yeah, so that's data provenance.

01:11:17.600 --> 01:11:19.120
That's controller settings.

01:11:19.980 --> 01:11:21.460
You have parameter context.

01:11:22.340 --> 01:11:25.080
So, you know, you may put a parameter in.

01:11:26.780 --> 01:11:29.640
And depending, you know, in your data flow,

01:11:29.800 --> 01:11:31.740
you could say something like, you know,

01:11:31.740 --> 01:11:34.240
you have a dev parameter, a test parameter,

01:11:34.520 --> 01:11:35.520
a prod parameter.

01:11:36.380 --> 01:11:43.080
And, you know, you may have dev as an IP of 1.1.1.1.

01:11:43.080 --> 01:11:46.080
And test is 2.2.2.2.

01:11:46.920 --> 01:11:48.480
And prod is 3.3.3.3.

01:11:48.820 --> 01:11:51.420
So, you have that key value parameter

01:11:51.420 --> 01:11:54.660
that you can reference in your data flows.

01:11:55.080 --> 01:11:58.080
And so, this is a global parameter that can be used.

01:11:59.320 --> 01:12:01.980
And then that way, I can say, you know,

01:12:02.020 --> 01:12:05.260
connect to dev instead of having to put the IP address

01:12:05.260 --> 01:12:08.020
and things like that into the processor.

01:12:08.460 --> 01:12:10.020
You know, I need to say dev,

01:12:10.160 --> 01:12:12.380
and it will automatically know the IP address

01:12:12.380 --> 01:12:14.960
that's associated with that because of that parameter.

01:12:15.120 --> 01:12:17.400
So, that's where you would set your parameters.
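A minimal sketch of how a parameter reference resolves, using NiFi's `#{name}` reference syntax and the placeholder addresses from the example above. The resolver itself is an illustration and is not NiFi code.

```python
import re

PARAMETER_REFERENCE = re.compile(r"#\{([^}]+)\}")


def resolve_parameters(value, parameters):
    """Replace each #{name} reference in value with its entry from parameters."""
    def substitute(match):
        name = match.group(1)
        if name not in parameters:
            raise KeyError(f"undefined parameter: {name}")
        return parameters[name]

    return PARAMETER_REFERENCE.sub(substitute, value)


# Placeholder addresses from the spoken example, not real hosts.
parameters = {"dev": "1.1.1.1", "test": "2.2.2.2", "prod": "3.3.3.3"}
print(resolve_parameters("#{dev}", parameters))  # prints 1.1.1.1
```

A processor property then holds `#{dev}` instead of the address, so changing the address means editing the parameter once.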

01:12:19.480 --> 01:12:22.960
Flow configuration history gives you the history

01:12:22.960 --> 01:12:25.340
of your flow, like data flows.

01:12:25.760 --> 01:12:27.960
We'll go into that so you can see, like,

01:12:28.040 --> 01:12:31.140
when we make changes to things or add, you know,

01:12:31.340 --> 01:12:33.060
it keeps a history of that as well.

01:12:34.280 --> 01:12:38.780
Node status history is basically how many bytes

01:12:38.780 --> 01:12:40.700
are coming in and out of this node.

01:12:40.700 --> 01:12:44.280
So if you have, like, site-to-site set up

01:12:44.280 --> 01:12:47.940
and some other clustering technologies set up for this,

01:12:48.480 --> 01:12:51.320
you may want to see what your node 1 is doing.

01:12:51.480 --> 01:12:54.360
You may want to look at node 12, you know,

01:12:54.360 --> 01:12:55.420
those types of things.

01:12:56.100 --> 01:12:58.120
So, this gives us our status history.

01:12:59.040 --> 01:13:02.700
Templates, I've gone over templates a little bit already,

01:13:03.760 --> 01:13:07.760
but, you know, templates are there for, you know,

01:13:07.760 --> 01:13:11.480
reusability, sharing things with your colleagues.

01:13:12.500 --> 01:13:13.940
You know, you can build a data flow,

01:13:14.000 --> 01:13:17.340
save it as a template, export that template out,

01:13:17.880 --> 01:13:19.280
and send it to a colleague.

01:13:19.500 --> 01:13:24.220
They can import it and, you know, run that same data flow.

01:13:25.380 --> 01:13:26.160
You have help.

01:13:26.660 --> 01:13:31.000
Like I mentioned, this documentation should look

01:13:31.000 --> 01:13:33.980
very, very similar to what's online.

01:13:39.980 --> 01:13:43.020
The reason that it is, you know,

01:13:43.080 --> 01:13:46.640
because, you know, the documentation is shipped

01:13:46.640 --> 01:13:52.020
with NiFi, so a lot of times we have,

01:13:52.020 --> 01:13:56.960
you know, closed systems, you know,

01:13:56.960 --> 01:14:01.980
we have one-way transfers and things like that.

01:14:02.180 --> 01:14:04.540
We have, you know, you have systems

01:14:04.540 --> 01:14:06.860
that don't ever touch the internet,

01:14:07.140 --> 01:14:09.640
and they're on, you know, their own closed network.

01:14:10.720 --> 01:14:14.080
So, you know, you may not have access to the internet

01:14:14.080 --> 01:14:16.800
in your NiFi instance, and because of that,

01:14:17.220 --> 01:14:21.580
NiFi ships with all of the documentation you see online.

01:14:22.320 --> 01:14:25.420
So, as new releases of NiFi come out,

01:14:25.420 --> 01:14:28.500
the documentation has to be updated as well

01:14:28.500 --> 01:14:30.240
for it to be a proper release.

01:14:31.040 --> 01:14:32.540
And so, you know, when you go to help,

01:14:32.980 --> 01:14:36.440
you're going to be able to go through the documentation.

01:14:37.540 --> 01:14:41.240
You know, if you want to do a delete DynamoDB processor,

01:14:41.320 --> 01:14:43.440
right, and you need to understand,

01:14:43.440 --> 01:14:45.820
you know, the properties and things like that,

01:14:46.100 --> 01:14:47.440
you know, here it is,

01:14:47.460 --> 01:14:49.320
without ever having to go to the internet.

01:14:51.440 --> 01:14:53.000
And then, of course, you have About.

01:14:53.000 --> 01:14:54.660
About is easy.

01:14:55.260 --> 01:14:59.440
So, this is version 1.26.0 of NiFi.

01:15:00.260 --> 01:15:03.400
It was built on May 3rd.

01:15:03.840 --> 01:15:06.580
It was tagged as release candidate one.

01:15:07.700 --> 01:15:10.480
And, you know, the branch and everything else,

01:15:10.600 --> 01:15:13.000
you know, you can actually pull a lot of information.

01:15:13.760 --> 01:15:19.820
So, you know, again, if you go to GitHub, for instance,

01:15:23.860 --> 01:15:25.860
I wonder if it'll let me search.

01:15:28.020 --> 01:15:30.260
And it starts NiFi.

01:15:31.580 --> 01:15:33.320
NiFi's GitHub repo,

01:15:33.460 --> 01:15:36.560
where all the NiFi source code is located.

01:15:36.980 --> 01:15:39.620
And so, you know, this is the main branch,

01:15:39.780 --> 01:15:41.780
but, you know, you can go through

01:15:41.780 --> 01:15:44.140
and see all the different branches,

01:15:44.540 --> 01:15:46.620
release candidates, those types of things.

01:15:46.780 --> 01:15:50.440
Here is the source code to all of NiFi.

01:15:50.440 --> 01:15:53.280
So, not only can you download it from that link earlier,

01:15:53.620 --> 01:15:57.440
you can do a git clone if you are familiar with GitHub

01:15:57.440 --> 01:15:59.060
and Git and others,

01:15:59.480 --> 01:16:02.100
clone this and build it yourself as well.

01:16:02.740 --> 01:16:05.460
You know, so, you know, just keep that in mind.

01:16:05.500 --> 01:16:07.000
Again, it's very open.

01:16:07.220 --> 01:16:08.780
It's very well supported.

01:16:08.900 --> 01:16:11.440
There's a lot of documentation for it

01:16:12.020 --> 01:16:13.100
and things like that.

01:16:13.660 --> 01:16:17.140
So, that's the help section.

01:16:17.140 --> 01:16:23.440
So, that is an overview of the canvas

01:16:23.440 --> 01:16:27.920
and all of the components on the canvas.

01:16:28.740 --> 01:16:33.040
And so, before I start diving into, you know,

01:16:33.180 --> 01:16:35.680
some of the finer workings of NiFi,

01:16:35.680 --> 01:16:37.300
I want to pause there.

01:16:37.640 --> 01:16:42.360
Are there any questions I can answer up until this point?

01:16:44.720 --> 01:16:48.220
Well, hopefully, I'm teaching so well

01:16:48.220 --> 01:16:50.280
that it's very clear and understandable.

01:16:50.840 --> 01:16:52.360
I always worry about my southern accent,

01:16:53.120 --> 01:16:54.820
you know, playing a part in this.

01:16:55.520 --> 01:16:57.020
So, but again, if you have a question,

01:16:57.220 --> 01:16:58.360
feel free to interrupt me

01:16:59.340 --> 01:17:02.460
or if, like, I need to translate something

01:17:02.460 --> 01:17:04.440
or speak proper English, you know,

01:17:04.440 --> 01:17:06.080
just feel free to yell at me.

01:17:07.280 --> 01:17:08.920
I got one quick question.

01:17:08.920 --> 01:17:09.720
Yeah, go ahead, Tom.

01:17:09.720 --> 01:17:13.340
I'm understanding that when you run the command,

01:17:13.820 --> 01:17:17.880
I do see that, and I've run this on a container before,

01:17:17.960 --> 01:17:21.160
so I've seen during the execution of the,

01:17:21.640 --> 01:17:24.320
you'll see the password in there, right?

01:17:24.680 --> 01:17:26.460
But I'm sorry, I missed a part where,

01:17:26.660 --> 01:17:28.220
let's just say you don't see that here.

01:17:28.480 --> 01:17:31.100
How do you, is it written in the log

01:17:31.720 --> 01:17:33.520
or you can go and retrieve that username and password?

01:17:33.760 --> 01:17:35.080
No, that's a great question.

01:17:35.200 --> 01:17:37.500
So, when we start NiFi,

01:17:37.500 --> 01:17:40.860
it's automatically going to create this log

01:17:40.860 --> 01:17:43.220
called nifi-app.log,

01:17:43.640 --> 01:17:45.500
and that's where almost,

01:17:46.160 --> 01:17:49.840
that's where 99% of NiFi activity is written.

01:17:50.380 --> 01:17:53.440
And so, yes, on that first install,

01:17:53.820 --> 01:17:56.740
you're going to see, you know,

01:17:56.960 --> 01:17:59.860
generated username, generated password,

01:18:00.140 --> 01:18:02.260
and it's only going to be in the logs.

01:18:02.260 --> 01:18:05.840
The problem with, do I?

01:18:07.120 --> 01:18:08.840
No, I was just saying, yep, I see it there.

01:18:09.080 --> 01:18:09.360
Okay.

01:18:10.080 --> 01:18:11.940
You know, the problem with that is,

01:18:12.320 --> 01:18:15.420
we're going to start doing some hands-on exercises

01:18:15.420 --> 01:18:17.180
and some work here,

01:18:17.420 --> 01:18:19.860
and so that log is going to roll over,

01:18:20.240 --> 01:18:22.880
and so it's going to rename this old log,

01:18:23.080 --> 01:18:26.000
give it, I think, a date at the end of the log,

01:18:26.260 --> 01:18:27.900
and it's going to start a fresh one.

01:18:28.300 --> 01:18:30.980
And so, if you do not capture

01:18:30.980 --> 01:18:33.760
your username and password pretty quickly,

01:18:34.040 --> 01:18:36.720
it's going to be in another log,

01:18:37.140 --> 01:18:40.380
and it could be in, you know,

01:18:40.580 --> 01:18:42.620
a log that was generated days ago,

01:18:42.780 --> 01:18:45.240
if you didn't, you know, set everything up, right?

01:18:45.320 --> 01:18:48.720
Or, you know, you may go in and put a data flow in

01:18:48.720 --> 01:18:50.220
and run it.

01:18:50.400 --> 01:18:52.280
It's generating all these log messages,

01:18:52.300 --> 01:18:55.100
and now your username and password

01:18:55.100 --> 01:18:57.360
is sitting in a five-day old log file.

01:18:57.760 --> 01:18:59.900
So that is where you initially

01:18:59.900 --> 01:19:01.680
get your username and password,

01:19:02.160 --> 01:19:04.280
but do know that it can go away,

01:19:04.960 --> 01:19:06.920
especially if we're doing a lot of operations

01:19:06.920 --> 01:19:07.840
very quickly.

01:19:09.200 --> 01:19:09.980
Great question.

01:19:10.400 --> 01:19:12.800
And then we are going to, you know,

01:19:12.860 --> 01:19:15.960
go through installing and getting it up and running

01:19:15.960 --> 01:19:17.440
and all that fun stuff.

01:19:17.600 --> 01:19:19.400
If you didn't follow along,

01:19:19.940 --> 01:19:22.260
you know, I like to kind of go ahead

01:19:22.260 --> 01:19:23.780
and show you what we're doing,

01:19:23.860 --> 01:19:25.900
and then that way we can all go hands-on.

01:19:26.220 --> 01:19:28.580
That's where we're going to get our username and password.

01:19:28.580 --> 01:19:32.540
You can find that log in the logs directory,

01:19:33.980 --> 01:19:35.640
and it's nifi-app.

01:19:36.820 --> 01:19:37.800
Let's see if I do it right.

01:19:38.060 --> 01:19:40.000
Yeah, I haven't generated enough data

01:19:40.000 --> 01:19:44.460
for it to roll over, but, you know, tomorrow,

01:19:44.580 --> 01:19:47.600
I bet there's going to be a nifi-app.log

01:19:48.280 --> 01:19:51.280
5, 21, 2024, something like that.
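
NOTE
A minimal sketch of hunting for the generated login after the log has rolled over.
Assumptions: the startup lines read "Generated Username [...]" and "Generated Password [...]",
and rolled-over files still start with "nifi-app". The function name is hypothetical.
```python
import re
from pathlib import Path
# Assumed wording of the two startup lines; adjust if your log differs.
CREDENTIAL = re.compile(r"Generated (Username|Password) \[([^\]]+)\]")
def find_generated_credentials(logs_dir):
    """Return {"Username": ..., "Password": ...} from the newest log holding both."""
    # Newest first, so the current nifi-app.log is checked before rolled-over copies.
    logs = sorted(
        Path(logs_dir).glob("nifi-app*.log*"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    for log in logs:
        found = {}
        with log.open(errors="replace") as handle:
            for line in handle:
                match = CREDENTIAL.search(line)
                if match:
                    found[match.group(1)] = match.group(2)
        if len(found) == 2:
            return found
    # Nothing found in any log file.
    return {}
```
Calling find_generated_credentials("logs") from the install directory would search the current log and any rolled-over ones.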

01:19:54.340 --> 01:19:56.640
Okay, any other questions?

01:19:59.880 --> 01:20:00.480
All right.

01:20:00.760 --> 01:20:03.280
So what I'm going to do is kind of go through

01:20:03.800 --> 01:20:06.600
the more in-depth of the components

01:20:07.480 --> 01:20:10.040
and, you know, go through some of those things.

01:20:10.700 --> 01:20:13.320
We will then take a break and go to lunch

01:20:13.320 --> 01:20:15.280
and come back from lunch

01:20:16.000 --> 01:20:18.620
and, you know, get everyone else up and running

01:20:18.620 --> 01:20:21.500
and, you know, get your own version of NiFi going

01:20:21.500 --> 01:20:23.380
so we can start building some data flows.

01:20:24.580 --> 01:20:26.100
You know, so that being said,

01:20:26.360 --> 01:20:29.620
you know, on the components toolbar,

01:20:30.300 --> 01:20:32.720
the first thing I have is processors.

01:20:33.260 --> 01:20:34.820
So I actually just click that

01:20:34.820 --> 01:20:36.480
and hold it and drag it down,

01:20:36.920 --> 01:20:39.720
and here are all of my processors.

01:20:40.340 --> 01:20:43.100
So, you know, with this version of NiFi,

01:20:43.220 --> 01:20:47.620
this install, I have 359 processors available.

01:20:47.620 --> 01:20:51.620
So, you know, I have processors to handle Amazon,

01:20:53.060 --> 01:20:57.020
Azure, you know, AWS tags, JSON, CSV,

01:20:57.720 --> 01:21:00.480
you know, all kinds of things.

01:21:00.500 --> 01:21:03.860
So what you're seeing here is just like a word cloud,

01:21:03.860 --> 01:21:06.180
you know, from all the processors

01:21:06.180 --> 01:21:08.220
and those types of things.

01:21:09.360 --> 01:21:12.000
So then you also have, you know,

01:21:12.060 --> 01:21:14.440
a list of all your processors

01:21:14.440 --> 01:21:18.380
and the, you know, the description.

01:21:19.180 --> 01:21:22.020
Just because it was asked the last time,

01:21:22.780 --> 01:21:25.960
you will see the shield,

01:21:26.080 --> 01:21:29.300
the little red and white shield beside the processor

01:21:29.300 --> 01:21:32.000
is specifically called out

01:21:32.000 --> 01:21:36.560
because you can now create a policy and security within NiFi

01:21:36.560 --> 01:21:41.180
that will allow you to lock down certain processors.

01:21:41.180 --> 01:21:44.220
You know, so for this one,

01:21:44.740 --> 01:21:50.520
this is a reference remote resources processor.

01:21:50.660 --> 01:21:54.020
So it falls within that reference remote resources.

01:21:54.520 --> 01:21:55.840
And so because of that,

01:21:55.920 --> 01:21:58.700
you may set a policy that says, you know,

01:21:59.600 --> 01:22:03.060
my data engineers cannot, you know,

01:22:03.240 --> 01:22:07.060
see these processors in this group

01:22:07.060 --> 01:22:09.020
because, you know, they're not needed

01:22:09.020 --> 01:22:12.180
and, you know, for security reasons, you know,

01:22:12.180 --> 01:22:13.820
we're just not going to allow that.

01:22:14.220 --> 01:22:16.700
Or you may have it where, you know,

01:22:16.700 --> 01:22:19.780
I have database admins that need access to this group

01:22:19.780 --> 01:22:22.920
that contains the database connection details

01:22:22.920 --> 01:22:25.320
and those types of things to set it up.

01:22:25.920 --> 01:22:29.220
But another group doesn't have access to it,

01:22:29.240 --> 01:22:30.200
doesn't need it.

01:22:30.440 --> 01:22:32.980
They can just reference, you know,

01:22:33.180 --> 01:22:39.000
that processor from a controller service, right,

01:22:39.000 --> 01:22:41.300
you know, so that is a reason for that little shield.

01:22:42.220 --> 01:22:44.800
But anyway, so all of these are processors,

01:22:46.540 --> 01:22:47.580
359 processors.

01:22:48.160 --> 01:22:52.900
And the one I like to really start with is GetFile.

01:22:53.700 --> 01:22:55.340
So, you know, as you can imagine,

01:22:55.560 --> 01:22:56.960
there's 359 processors.

01:22:57.040 --> 01:22:58.080
It's going to be hard, you know,

01:22:58.080 --> 01:23:01.400
I can scroll through this to see GetFile.

01:23:02.160 --> 01:23:04.220
Sometimes I'll skip over it.

01:23:04.220 --> 01:23:06.260
But you can use the filter.

01:23:06.260 --> 01:23:10.700
So I can do GetF, or I can do Get.

01:23:10.720 --> 01:23:13.780
So it's going to pull GetFTP or GetFile.

01:23:14.880 --> 01:23:17.800
You know, that's a really nice way to narrow it down.

01:23:18.120 --> 01:23:22.800
I can use the, you know, little tag cloud here.

01:23:24.120 --> 01:23:26.900
And I want to see all processors with get

01:23:26.900 --> 01:23:28.100
in the description, right?

01:23:28.140 --> 01:23:32.220
And there should be my GetFile right here.

01:23:32.220 --> 01:23:36.580
So, you know, that's how you would select the processor.

01:23:37.340 --> 01:23:39.160
So what I like to do though is,

01:23:39.420 --> 01:23:41.460
I already know the name of the processor,

01:23:41.740 --> 01:23:43.320
so I'm going to say GetFile.

01:23:44.160 --> 01:23:45.420
I see it.

01:23:45.440 --> 01:23:46.320
It's highlighted.

01:23:46.740 --> 01:23:47.760
I say add.

01:23:49.180 --> 01:23:49.460
Boom.

01:23:49.820 --> 01:23:51.960
New processor on my canvas.

01:23:52.040 --> 01:23:56.080
So this processor is just the GetFile processor.

01:23:57.160 --> 01:23:59.420
You know, it's got a single function.

01:23:59.420 --> 01:24:04.320
Its function is to pick files up

01:24:04.320 --> 01:24:07.940
and retrieve those from the file system.

01:24:08.860 --> 01:24:11.480
You know, it's not trying to extract things.

01:24:11.740 --> 01:24:15.300
It's not, you know, doing any kind of ETL.

01:24:15.320 --> 01:24:17.000
It's not a model or anything else.

01:24:17.040 --> 01:24:20.520
This processor is doing one function and one function only,

01:24:20.780 --> 01:24:22.160
and it does it very well.

01:24:22.240 --> 01:24:24.040
And that's GetFile.

01:24:26.000 --> 01:24:30.200
Also, within a processor, you can see again that little shield

01:24:30.200 --> 01:24:33.640
that belongs to a group that, you know,

01:24:33.640 --> 01:24:38.000
you can imagine you may have a convert text processor, right?

01:24:38.280 --> 01:24:42.720
You know, from a security aspect, that's a very low risk,

01:24:42.880 --> 01:24:45.880
you know, just because you're converting data

01:24:45.880 --> 01:24:47.860
that you already pulled in

01:24:47.860 --> 01:24:51.680
and you're converting it to other formats and sending it out.

01:24:51.680 --> 01:24:56.080
But, you know, you're not, you know, this, you know,

01:24:56.280 --> 01:24:58.120
a convert text processor, for instance,

01:24:58.580 --> 01:25:00.480
it doesn't have the connection details.

01:25:00.560 --> 01:25:02.140
It can't get a file.

01:25:02.360 --> 01:25:03.640
It can't put a file.

01:25:04.260 --> 01:25:07.200
It can't connect to a database or anything else like that.

01:25:07.560 --> 01:25:11.680
So because this one can actually get data,

01:25:12.180 --> 01:25:14.820
you know, there is a security group for it.

01:25:15.440 --> 01:25:19.120
You may want to, you know, depending on your security policies,

01:25:19.120 --> 01:25:22.400
you may want to lock this down where, you know,

01:25:22.560 --> 01:25:26.420
folks can't do a GetFile or a PutFile.

01:25:27.360 --> 01:25:29.940
You know, they can build in the logic of the data flows

01:25:29.940 --> 01:25:31.220
and everything else,

01:25:31.280 --> 01:25:34.420
and they may get their data from another processor.

01:25:34.940 --> 01:25:40.100
Otherwise, you know, you run the risk of,

01:25:40.100 --> 01:25:41.800
you know, someone doing a GetFile.

01:25:41.940 --> 01:25:44.440
We actually had this happen on the last class

01:25:44.440 --> 01:25:45.620
with a couple of people

01:25:45.620 --> 01:25:48.980
where during the exercise we put down a GetFile.

01:25:49.600 --> 01:25:54.300
They specified the NiFi directory itself as the directory to get.

01:25:54.780 --> 01:25:58.000
They told it to not keep the source file.

01:25:59.020 --> 01:26:02.580
And so they also told it to ingest everything.

01:26:03.140 --> 01:26:05.600
And so what they did is they built a flow

01:26:05.600 --> 01:26:07.420
that did a self-destruction.

01:26:07.860 --> 01:26:10.780
And so what it did is, you know, they run that flow.

01:26:11.260 --> 01:26:15.020
That GetFile went and grabbed everything in the directory

01:26:15.020 --> 01:26:20.220
and, you know, of NiFi itself, passed that data on as flow files,

01:26:20.220 --> 01:26:23.780
and then it crashed because, you know, it just couldn't work

01:26:23.780 --> 01:26:26.520
because it consumed itself.

01:26:27.040 --> 01:26:30.960
And so, you know, there are some security thoughts

01:26:30.960 --> 01:26:33.200
that go into this, you know,

01:26:33.320 --> 01:26:35.560
as you're planning this deployment out.

01:26:36.520 --> 01:26:40.000
But anyway, so that is our GetFile processor.

01:26:40.780 --> 01:26:42.480
You know, you can take a look at it.

01:26:42.480 --> 01:26:45.060
It's going to give you some real quick information.

01:26:46.080 --> 01:26:47.320
How many bytes came in?

01:26:47.580 --> 01:26:49.020
How many bytes read and write?

01:26:49.620 --> 01:26:50.840
How many bytes went out?

01:26:51.380 --> 01:26:55.560
And how many tasks and the time it took to execute those tasks?

01:26:56.080 --> 01:26:59.180
All of this, again, is in the last five minutes.

01:26:59.600 --> 01:27:01.820
But if you hit refresh on the canvas,

01:27:01.860 --> 01:27:05.380
so I click off of that processor and hit refresh,

01:27:05.620 --> 01:27:08.800
if data was flowing through, that would be updated.

01:27:08.800 --> 01:27:15.140
And so, you know, that's how you would get a quick refresh

01:27:15.140 --> 01:27:17.460
of what's going on with that processor.

01:27:18.000 --> 01:27:21.140
Now, every processor, you should be able to click on it.

01:27:21.140 --> 01:27:24.540
It will do a little black box around it to highlight it

01:27:24.540 --> 01:27:27.980
and then right-click on it and you have options.

01:27:28.460 --> 01:27:33.200
So the option that we will use most is probably configure,

01:27:33.700 --> 01:27:35.740
so we can actually configure the processor.

01:27:35.740 --> 01:27:38.140
You know, there is a disable.

01:27:38.280 --> 01:27:39.380
If you want to disable it,

01:27:39.660 --> 01:27:43.760
you want to view the data provenance for this specific processor.

01:27:44.200 --> 01:27:50.160
You can replay the last event through the processor as well.

01:27:50.520 --> 01:27:53.780
You can view the status, the usage, its connections.

01:27:54.360 --> 01:27:55.960
You can center it in view.

01:27:56.000 --> 01:27:59.260
You can change the color of that processor.

01:27:59.880 --> 01:28:04.540
So, you know, we're going to get into, you know, some of this.

01:28:04.540 --> 01:28:11.020
But just for FYI, you know, the hands-on exercise,

01:28:11.520 --> 01:28:16.980
one of the things I look for is some of these, like, you know,

01:28:17.160 --> 01:28:21.620
coloring, you know, labels, naming conventions.

01:28:22.060 --> 01:28:26.400
You know, some of these types of things that are very non-technical,

01:28:26.760 --> 01:28:31.000
but, you know, I look for those just because of usability,

01:28:31.500 --> 01:28:33.280
ease of use, and those types of things.

01:28:33.280 --> 01:28:36.220
So anyway, so that's my GetFile.

01:28:36.560 --> 01:28:39.200
I have my configure, disable, provenance.

01:28:40.160 --> 01:28:41.300
I can group them.

01:28:41.420 --> 01:28:42.560
I can create a template.

01:28:42.700 --> 01:28:46.340
I can select multiple processors and create a template.

01:28:46.860 --> 01:28:48.980
I can copy it and paste it.

01:28:48.980 --> 01:28:50.640
I can delete it.

01:28:51.020 --> 01:28:54.760
But for this scenario, I want to say configure.

01:28:55.200 --> 01:29:00.020
So this is how I configure that specific processor.

01:29:01.020 --> 01:29:04.220
You know, it has a name, GetFile.

01:29:04.780 --> 01:29:09.840
Now, you know, I don't like that GetFile name because, you know,

01:29:09.840 --> 01:29:11.380
it doesn't tell me a whole lot.

01:29:11.820 --> 01:29:16.080
If I had a data engineer looking at my flow, you know,

01:29:16.080 --> 01:29:18.280
I want them to be able to look at my flow,

01:29:19.000 --> 01:29:23.020
quickly understand what's going on and how this maps together,

01:29:24.140 --> 01:29:27.540
and that way they can accomplish the task that they need to do.

01:29:27.540 --> 01:29:33.240
So what I like to do is I go into my name, you know,

01:29:33.240 --> 01:29:37.540
during the configuration and I'll say "Get file from system."

01:29:41.300 --> 01:29:42.900
So there we go.

01:29:43.820 --> 01:29:48.780
That is an easier, more human readable description of what

01:29:48.780 --> 01:29:51.200
this file, this processor is going to do.

01:29:52.280 --> 01:29:57.100
Also, you know, if there is a penalty or error or something

01:29:57.100 --> 01:30:00.620
else like that, it will penalize the flow file.

01:30:00.740 --> 01:30:04.840
And this is basically the duration is how long you want that penalized.

01:30:05.580 --> 01:30:08.800
So right now it's set, everything is default to 30 seconds.

01:30:09.160 --> 01:30:13.220
After 30 seconds, it's going to retry and reprocess that flow file.

01:30:14.420 --> 01:30:17.680
But, you know, 30 second penalty.

01:30:18.780 --> 01:30:20.200
The bulletin level, you know,

01:30:20.240 --> 01:30:23.440
what kind of logging do we want from this processor?

01:30:23.440 --> 01:30:30.060
You know, we may, we may, you know,

01:30:30.280 --> 01:30:32.300
the bulletin level is set to warn.

01:30:32.860 --> 01:30:36.900
But if you want to log everything, you may put it at debug.

01:30:37.740 --> 01:30:40.100
Most times you keep it at warn or error.

01:30:41.100 --> 01:30:46.560
And so what that means is if this processor has a warning or error,

01:30:46.560 --> 01:30:50.920
it is going to push that to the nifi-app.log.

01:30:52.080 --> 01:30:53.060
You know, in that area.

01:30:54.440 --> 01:30:59.220
So, you know, if you're building a flow for the first time,

01:30:59.280 --> 01:31:03.060
you may put debug and that is going to log everything.

01:31:03.620 --> 01:31:08.260
You usually do not need that much detail, but, you know,

01:31:08.420 --> 01:31:10.480
it's there in case you need to set around one,

01:31:10.640 --> 01:31:12.300
but in about 15, 20 minutes.

01:31:14.320 --> 01:31:16.300
Okay. And then you have yield duration,

01:31:17.120 --> 01:31:22.300
which is just how long the processor is going to yield before it's scheduled to run again.

01:31:23.240 --> 01:31:27.220
You know, so, so one second is pretty standard.

01:31:27.920 --> 01:31:32.800
Again, you may change these settings when you start building your own data flows,

01:31:32.980 --> 01:31:35.400
you know, in the real world.

01:31:35.400 --> 01:31:44.540
But most of the time these, these properties all stay the same,

01:31:44.880 --> 01:31:49.080
except for the name, you know, the name part of this. Scheduling.

01:31:49.760 --> 01:31:53.680
There's a couple of scheduling strategies.

01:31:54.140 --> 01:31:57.420
There's a timer driven, a cron driven.

01:31:57.880 --> 01:32:03.240
So you can set this, you know, most all processors default to a timer.

01:32:03.240 --> 01:32:08.660
So it's going to run every, you know, it's running constantly.

01:32:09.100 --> 01:32:10.800
So you can actually set a run schedule.

01:32:10.800 --> 01:32:14.020
It says, hey, I want to run this processor every one second.

01:32:14.400 --> 01:32:17.860
I want to run this processor every 10 minutes or 10 hours.

01:32:18.380 --> 01:32:27.580
You know, so what it will do is that scheduling strategy is going to,

01:32:27.580 --> 01:32:33.280
is going to dictate, you know, the running of this processor.

01:32:33.320 --> 01:32:36.380
You may have a cron where it runs,

01:32:36.540 --> 01:32:41.340
this processor runs only between 10 p.m. and 11 p.m.

01:32:41.860 --> 01:32:44.560
with a run schedule of every one minute.

01:32:45.060 --> 01:32:48.300
And so, you know, it's going to run 60 times during that hour.
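
NOTE
A minimal sketch of the "60 times during that hour" arithmetic, assuming a Quartz-style
cron expression whose first three fields are seconds, minutes, hours. The expression
"0 * 22 * * ?" and both helper names are illustrative; only "*" and plain numbers are handled.
```python
from datetime import datetime, timedelta
def field_matches(field, value):
    """True when a cron field is a wildcard or equals the given number."""
    return field == "*" or int(field) == value
def fire_times(expression, day):
    """List every second of the given day on which the expression would fire."""
    seconds, minutes, hours = expression.split()[:3]
    start = datetime(day.year, day.month, day.day)
    times = []
    for offset in range(24 * 60 * 60):
        t = start + timedelta(seconds=offset)
        if (field_matches(seconds, t.second)
                and field_matches(minutes, t.minute)
                and field_matches(hours, t.hour)):
            times.append(t)
    return times
# Second 0 of every minute, only during hour 22 (10 p.m. to 11 p.m.).
runs = fire_times("0 * 22 * * ?", datetime(2024, 5, 20))
print(len(runs), runs[0].time(), runs[-1].time())
```
The count comes out to 60, one run per minute from 22:00:00 through 22:59:00.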

01:32:49.380 --> 01:32:54.780
Then you have concurrent tasks, which is how many tasks run at once.

01:32:54.780 --> 01:33:02.480
So this processor is doing a GetFile, and it's running one task to get the file.

01:33:02.940 --> 01:33:07.780
Now, one of the things that I had the class do last time is,

01:33:08.400 --> 01:33:14.040
is actually use GetFile to pick up that 1.5,

01:33:14.200 --> 01:33:18.520
1.2 gig zip file and decompress it.

01:33:20.540 --> 01:33:27.520
And so we had a few folks where the file got duplicated or they picked up everything.

01:33:28.720 --> 01:33:33.340
And so, you know, what happened was it kind of slowed the system down.

01:33:33.700 --> 01:33:36.860
It was taking a while to pick things up and send them off.

01:33:37.380 --> 01:33:40.820
And so, you know, because it was processing large amounts of data.

01:33:41.380 --> 01:33:46.700
But if they wanted to make that quicker, you know, the run schedule is already running full speed.

01:33:46.700 --> 01:33:52.700
So but if they wanted to make that quicker, they could have put the concurrent tasks at five.

01:33:53.340 --> 01:33:57.700
That would give it five concurrent tasks to execute this.

01:34:05.240 --> 01:34:07.420
Properties. So this is the big one.

01:34:07.420 --> 01:34:11.680
This is the configuration for the processor itself.

01:34:11.680 --> 01:34:18.740
And so, you know, if it's bolded, it is a required field.

01:34:19.540 --> 01:34:27.280
So, you know, for this processor, the GetFile processor, of course, it needs to know where to go to get that file.

01:34:27.700 --> 01:34:31.040
So it has an input directory right now.

01:34:31.200 --> 01:34:34.640
It's blank. And I need to feed it a value.

01:34:34.860 --> 01:34:39.600
So what I like to do is go right here.

01:34:39.600 --> 01:34:43.660
You know, and again, we're going to go down, you know, through this.

01:34:45.920 --> 01:34:49.720
Let's see what I can do is pull.

01:34:54.680 --> 01:35:01.380
We'll reuse this sample data I have for another scenario later on.

01:35:06.800 --> 01:35:11.980
So what I like to do is put this here.

01:35:22.560 --> 01:35:23.960
All right.

01:35:23.960 --> 01:35:28.560
So I have now a folder with data sitting in it.

01:35:29.560 --> 01:35:31.100
And so let me go up here.

01:35:31.240 --> 01:35:33.700
Here is my file path. Right.

01:35:33.780 --> 01:35:37.580
It's in C colon user student downloads weather data.

01:35:37.580 --> 01:35:42.060
So I actually just take that and copy it.

01:35:44.620 --> 01:35:45.700
Paste it in.

01:35:46.080 --> 01:35:46.580
Say OK.

01:35:46.920 --> 01:35:55.060
So that is the input directory for my GetFile processor to get files from.

01:35:56.420 --> 01:36:02.560
You know, we'll go into more detail when, you know, we're building some flows.

01:36:02.560 --> 01:36:12.280
But one of the things that, you know, people mess up all the time is it says keep source file and they put false.

01:36:12.720 --> 01:36:14.640
I'm going to change that to true.

01:36:14.740 --> 01:36:29.400
I want to keep that source file exactly the way it is because, you know, I don't want to take a chance on picking that file up, sending it to another processor and doing these different operations.

01:36:29.400 --> 01:36:32.480
And then somehow I've messed something up.

01:36:32.880 --> 01:36:37.720
Well, now I don't have the source file because, you know, I told it not to keep the source file.

01:36:38.420 --> 01:36:44.400
It went through a process and it's now corrupt or, you know, I didn't do something correct.

01:36:45.240 --> 01:36:46.860
You know, one of those types of things.

01:36:47.500 --> 01:36:54.260
So, you know, from the beginning of creating a flow, a data flow, I like to keep the source file.

01:36:54.260 --> 01:37:04.880
Once I've tested this and I'm very sure that it's working and I don't have a requirement or a need to keep the source file, I'll turn that to false.

01:37:05.380 --> 01:37:13.140
But for this exercise, we're just actually going through the components of a processor and the menus and stuff like that.

01:37:13.420 --> 01:37:16.080
We're not really worried about data flow building right this minute.

01:37:16.080 --> 01:37:23.740
So anyway, so there's other, you know, properties for this GetFile processor.

01:37:24.080 --> 01:37:28.560
I can tell it to filter on the file.

01:37:28.860 --> 01:37:33.080
So I can put a filter in to say only pick up CSVs.

01:37:34.120 --> 01:37:36.340
How, you know, the polling interval.

01:37:36.560 --> 01:37:39.080
Do I want to recurse subdirectories?

01:37:40.040 --> 01:37:44.160
You know, is there a minimum or maximum file size or age?

01:37:44.160 --> 01:37:45.980
You know, those types of things.
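
NOTE
A conceptual sketch of what the properties above control, not NiFi's implementation.
The function get_files and its parameter names are hypothetical stand-ins for the
input directory, file filter, and keep source file properties.
```python
import re
from pathlib import Path
def get_files(input_directory, file_filter=r".*", keep_source_file=False):
    """Read every file whose name matches the filter; optionally remove the source."""
    pattern = re.compile(file_filter)
    picked_up = {}
    for path in sorted(Path(input_directory).iterdir()):
        if path.is_file() and pattern.fullmatch(path.name):
            picked_up[path.name] = path.read_bytes()
            if not keep_source_file:
                # With keep source file set to false, the original is gone after pickup.
                path.unlink()
    return picked_up
```
For example, get_files(some_dir, r".*\.csv", keep_source_file=True) would return only the CSVs and leave the directory untouched, which is the safe setting while a flow is still being tested.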

01:37:46.940 --> 01:37:51.380
You know, so again, I am now bringing in data.

01:37:51.380 --> 01:37:57.440
I've got a processor that's getting a file based upon what I tell it to do in the configuration.

01:37:57.960 --> 01:38:02.480
It brings that file in and is ready to send it to the next processor.

01:38:02.880 --> 01:38:05.180
And I did not have to write any code.

01:38:05.560 --> 01:38:09.840
I was pointing and clicking and filling in properties and calling it a day.

01:38:10.400 --> 01:38:12.200
So those are the properties.

01:38:13.300 --> 01:38:16.700
All processors have a relationship.

01:38:18.360 --> 01:38:25.660
So, you know, the relationship for a GetFile is usually success.

01:38:26.760 --> 01:38:30.160
If it can't get the file, like, you know, it doesn't have permission.

01:38:30.600 --> 01:38:31.760
It doesn't know.

01:38:32.060 --> 01:38:36.200
It's only going to read files that it has access to.

01:38:37.330 --> 01:38:46.570
Now, you know, so it doesn't really have a failure path just because that processor just is polling and polling and polling.

01:38:47.030 --> 01:38:50.410
Some of the other processors have different relationships.

01:38:50.970 --> 01:38:53.190
One is success. One is failure.

01:38:53.470 --> 01:39:00.510
You may have a relationship that sends the original document to another processor,

01:39:00.510 --> 01:39:07.350
and then it takes the extracted information from that document and puts it to another processor.

01:39:08.290 --> 01:39:11.830
So there's a lot of power there, a lot of capabilities that we'll go into.

01:39:12.310 --> 01:39:15.090
But that is the reason for relationships.
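
NOTE
A minimal sketch of routing by relationship, under the assumption that a processor
hands each result to a named relationship and each relationship feeds its own connection.
The relationship names "original", "extracted", "failure" and the function are hypothetical.
```python
def extract_first_line(content):
    """Return (relationship, payload) pairs for one incoming document."""
    try:
        text = content.decode("utf-8")
    except UnicodeDecodeError:
        # Unreadable input goes down the failure path untouched.
        return [("failure", content)]
    first_line = text.splitlines()[0] if text else ""
    # The untouched document and the extracted piece leave on separate relationships.
    return [("original", content), ("extracted", first_line.encode("utf-8"))]
# One queue per relationship, standing in for the connections drawn on the canvas.
connections = {"original": [], "extracted": [], "failure": []}
for document in [b"id,temp\n1,20\n", b"\xff\xfe"]:
    for relationship, payload in extract_first_line(document):
        connections[relationship].append(payload)
print({name: len(queue) for name, queue in connections.items()})
```
Each queue can then be wired to a different downstream processor, which is the point of having more than one relationship.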

01:39:15.810 --> 01:39:17.870
And then there's a comment section, right?

01:39:19.090 --> 01:39:23.830
So, you know, think of this, you know, from a software engineering aspect,

01:39:23.830 --> 01:39:30.470
when you are committing your code back to a repository like GitHub or GitLab or something,

01:39:30.810 --> 01:39:32.630
you need to make a comment on your code.

01:39:33.130 --> 01:39:34.190
Same thing here.

01:39:34.550 --> 01:39:36.250
What is this processor doing?

01:39:37.370 --> 01:39:40.870
Like, what, you know, give me the, you know, who built this?

01:39:41.190 --> 01:39:46.070
You know, you may have a policy set up that, you know, you need to put in all this information.

01:39:46.450 --> 01:39:48.190
That's where some of this would go.

01:39:48.190 --> 01:39:57.670
So I'm going to just say this is a test processor and leave it at that.

01:39:57.770 --> 01:39:59.630
Apply, done.

01:40:00.870 --> 01:40:06.910
And so I have now, you know, created my first processor, dragged and dropped it down.

01:40:07.730 --> 01:40:15.110
I have, you know, built, you know, started building my data flow.

01:40:15.110 --> 01:40:18.630
And I gave it a good name that I can understand.

01:40:18.910 --> 01:40:20.310
Get file from the system.

01:40:21.150 --> 01:40:27.170
I even put a color on it to distinguish it from maybe other GetFile processors.

01:40:27.870 --> 01:40:30.390
You know, and I went in and configured it.

01:40:30.470 --> 01:40:33.170
Now I just need to start building my data flow.

01:40:33.570 --> 01:40:37.310
And so, you know, to do that, right, you know, once you get your file,

01:40:37.390 --> 01:40:42.670
you may bring down another processor that, you know, identifies my file type.

01:40:45.490 --> 01:40:48.630
And this is where that relationship comes into play.

01:40:48.710 --> 01:40:53.270
Because now I can take this, drag my arrow to my next processor,

01:40:54.210 --> 01:40:56.530
and for success send it there.

01:40:57.070 --> 01:40:57.450
Done.

01:40:58.050 --> 01:41:05.670
And once that processor is configured properly and has the connections it needs,

01:41:06.090 --> 01:41:11.690
it goes from a yellow yield sign to a stopped, you know, red square.

01:41:12.350 --> 01:41:14.770
You know, and it's ready to run.

01:41:15.110 --> 01:41:21.410
If you hover over the little yellow yield on your processor when you get to start building them,

01:41:21.670 --> 01:41:24.750
it's going to tell you why it's not ready.

01:41:25.670 --> 01:41:29.550
In this instance, the relationship success is invalid

01:41:29.550 --> 01:41:33.430
because relationship is not connected to any other component.

01:41:33.810 --> 01:41:37.690
So that tells me, like, I'm going to have to put another processor to send this to

01:41:37.830 --> 01:41:41.850
once it identifies the MIME type, the file type, right?
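
[Editor's sketch] The hover message described above amounts to a simple check: every relationship needs somewhere to go before the processor can leave the invalid state. The function names and the "invalid"/"stopped" labels below are assumptions for illustration; NiFi's real validation covers more than this.

```python
def validation_errors(relationships, connected):
    """Return one message per relationship that has no outgoing connection."""
    return [
        f"Relationship '{rel}' is invalid because it is not connected to any component"
        for rel in relationships
        if rel not in connected
    ]


def status(relationships, connected):
    """'invalid' mirrors the yellow warning icon; 'stopped' mirrors the red square."""
    return "invalid" if validation_errors(relationships, connected) else "stopped"


# The second processor has a 'success' relationship with nothing attached yet.
before = status(["success"], connected=set())
# After dragging a connection from 'success' to a downstream component.
after = status(["success"], connected={"success"})
```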

01:41:43.810 --> 01:41:46.430
We will go into more of the flow building.

01:41:46.890 --> 01:41:53.450
This is mainly to point out the components and the user interface part of NiFi,

01:41:54.350 --> 01:41:56.550
you know, just so, you know, everyone understands.

01:41:57.530 --> 01:41:59.370
After lunch, you know, we'll do that.

01:41:59.570 --> 01:42:01.710
We're almost close to breaking for lunch.

01:42:02.230 --> 01:42:05.670
But a couple more things I wanted to go over before we did that.

01:42:05.670 --> 01:42:07.630
So that's the processor.

01:42:08.190 --> 01:42:10.670
You have input port and output port.

01:42:11.090 --> 01:42:18.270
So you may have, you know, data coming from a process, a group of processors,

01:42:18.750 --> 01:42:22.430
and you are sending all of that to an output port.

01:42:22.910 --> 01:42:28.630
And then you may have another process group with a bunch of processors living underneath it

01:42:28.630 --> 01:42:30.250
with an input port.

01:42:30.250 --> 01:42:35.370
And so, like, you are pushing data out to an output port,

01:42:35.710 --> 01:42:40.330
and then you are able to receive that data from an input port.

01:42:40.410 --> 01:42:44.150
So that way, it helps you manage, you know, those data flows.

01:42:45.870 --> 01:42:48.670
And, you know, depending on some rules or something else,

01:42:48.670 --> 01:42:51.050
you may have different input-output ports.

01:42:51.430 --> 01:42:54.150
But that's the reason for an input-output port.

01:42:55.250 --> 01:42:56.910
So you got input and you got output.
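
[Editor's sketch] One way to picture the output-port-to-input-port handoff between two process groups. The `Port` class and the file name are hypothetical; this is only a toy model of the idea, not how NiFi implements ports.

```python
from collections import deque


class Port:
    """Toy port: forwards to a connected port, or queues if nothing is attached."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()
        self.target = None  # the port this one hands data to, if any

    def connect(self, other):
        self.target = other

    def send(self, flowfile):
        if self.target is not None:
            self.target.send(flowfile)   # pass it along to the next group
        else:
            self.queue.append(flowfile)  # end of the chain: hold it here


# Output port of one process group wired to the input port of another.
out_port = Port("files out")
in_port = Port("files in")
out_port.connect(in_port)

out_port.send("report.csv")
```

After the call, `report.csv` is waiting in `in_port.queue`, ready for the processors inside the second group.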

01:42:57.390 --> 01:42:59.370
Next is a process group.

01:42:59.370 --> 01:43:00.530
And that's what we talked about.

01:43:01.150 --> 01:43:04.630
So process group is, you know, what it says.

01:43:04.690 --> 01:43:06.930
It's a group of processors.

01:43:07.910 --> 01:43:11.310
So I am saying for this process group, get files.

01:43:11.890 --> 01:43:12.730
Something easy.

01:43:14.130 --> 01:43:15.790
And so get files.

01:43:16.150 --> 01:43:19.330
I can then put all my processors in this.

01:43:19.830 --> 01:43:23.830
I can also put an output port in this process group.

01:43:24.250 --> 01:43:27.550
And I can have it go into another process group through an input port.

01:43:27.550 --> 01:43:31.530
But once you have a process group, you just double-click,

01:43:31.590 --> 01:43:32.770
and you can go in.

01:43:33.450 --> 01:43:35.930
One thing you notice is there's a breadcrumb trail.

01:43:37.210 --> 01:43:42.470
So there is, you know, the main canvas I have is NiFi Flow.

01:43:42.850 --> 01:43:47.650
But once I get into my Get Files group, I'm now, you know, just a level deeper.

01:43:48.430 --> 01:43:55.630
And the way NiFi handles security is, you know, for instance,

01:43:55.630 --> 01:44:00.150
this whole canvas and everything associated is the root level.

01:44:00.590 --> 01:44:04.990
The root level has a unique UUID assigned to it.

01:44:05.890 --> 01:44:09.710
So, you know, I don't think anyone here, I didn't hear anyone saying

01:44:09.710 --> 01:44:12.270
they're having to set up some of the security stuff.

01:44:12.770 --> 01:44:19.830
I do know Brett and Ben and some others are working on this.

01:44:19.830 --> 01:44:25.650
But, yeah, if you were to define a policy for NiFi,

01:44:26.070 --> 01:44:30.970
you would say this is the root canvas and only, you know,

01:44:31.070 --> 01:44:33.030
group A has access to that.

01:44:33.330 --> 01:44:40.610
And then, you know, group B has access to the Get Files process group

01:44:40.610 --> 01:44:43.090
where you can get all the files.

01:44:43.090 --> 01:44:50.050
And so, you know, that Get Files process group with, you know,

01:44:50.210 --> 01:44:53.650
its own set of processors, let me just add one in here.

01:44:54.470 --> 01:44:59.510
You know, you may have a policy that says that they can access that.

01:44:59.710 --> 01:45:02.230
Each one of these has a UUID.

01:45:02.630 --> 01:45:08.090
So on the main canvas, we're in the 7DC; going into Get Files, it's 389.

01:45:08.410 --> 01:45:11.070
So, you know, the UUID is going to change.

01:45:11.070 --> 01:45:15.630
But based upon your security, you can lock this down to process groups.

01:45:16.050 --> 01:45:20.090
You can also allow people, you know, everyone to have access

01:45:20.090 --> 01:45:23.170
to the main canvas, depending on what you all want to do.

01:45:24.410 --> 01:45:26.230
So that is a process group.
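
[Editor's sketch] The "each component has a UUID, and policies attach to that UUID" idea can be illustrated with a small lookup. Everything here is hypothetical: the `Component` class, the `group-a`/`group-b` names, and the rule that a component without its own policy falls back to its parent's are assumptions for illustration, not NiFi's authorizer API.

```python
import uuid


class Component:
    """Toy canvas component identified by a UUID, optionally nested in a parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.id = str(uuid.uuid4())


# Component id -> set of user groups allowed to view it.
policies = {}


def can_view(user_group, component):
    """Use the nearest policy walking up from the component to the root."""
    node = component
    while node is not None:
        if node.id in policies:
            return user_group in policies[node.id]
        node = node.parent
    return False  # no policy anywhere on the path: deny


root = Component("NiFi Flow")
get_files = Component("Get Files", parent=root)
get_file_proc = Component("GetFile", parent=get_files)

policies[root.id] = {"group-a"}       # only group A sees the root canvas
policies[get_files.id] = {"group-b"}  # group B sees the Get Files group
```

With these two policies, `group-b` can open the Get Files group and the processor inside it, but not the root canvas.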

01:45:28.150 --> 01:45:32.350
A remote process group is exactly like I said.

01:45:32.410 --> 01:45:38.830
It's going to pull in a process group from a remote NiFi.

01:45:42.150 --> 01:45:46.910
So what you may do then is you may have a process group

01:45:48.410 --> 01:45:53.330
that lives on another NiFi instance.

01:45:53.670 --> 01:45:58.950
You can pull that in and run that remote process group.

01:45:59.670 --> 01:46:00.690
You have funnels.

01:46:00.690 --> 01:46:03.390
Funnels kind of, you know, just like the name says,

01:46:03.670 --> 01:46:05.570
it just funnels some of the data together.

01:46:05.570 --> 01:46:11.150
It has some configurations and things like that that we can work off of.

01:46:12.090 --> 01:46:13.290
But, you know, that's the point.

01:46:13.870 --> 01:46:14.390
Template.

01:46:14.510 --> 01:46:20.490
So if I were to hold my shift key, I can highlight this whole thing.

01:46:20.950 --> 01:46:23.970
And then I can say actually right here from my Operate palette,

01:46:24.090 --> 01:46:25.350
I can say create template.

01:46:25.690 --> 01:46:27.570
I can say GetFile.

01:46:28.350 --> 01:46:29.550
Give it a description.

01:46:32.290 --> 01:46:33.530
And create a template.

01:46:33.530 --> 01:46:38.570
Now I can actually take my template, I can drag it down,

01:46:38.750 --> 01:46:40.290
and now I have a GetFile template.

01:46:40.330 --> 01:46:43.890
And I can say add, and it's going to lay out that GetFile template.

01:46:44.410 --> 01:46:46.490
Also, I can download that template.

01:46:46.910 --> 01:46:48.730
I can go over here to my templates.

01:46:49.870 --> 01:46:55.370
It now should show up, and I can download it or delete it.

01:46:55.410 --> 01:46:59.910
And so in this case, I am deleting it.

01:47:02.150 --> 01:47:02.790
Okay.

01:47:03.050 --> 01:47:04.010
So that's templates.

01:47:04.770 --> 01:47:11.950
The last component that we will go into before lunch here is the label.

01:47:12.710 --> 01:47:17.010
So again, when I created this processor, I configured it.

01:47:17.110 --> 01:47:18.410
I went right-click.

01:47:18.610 --> 01:47:19.750
Let me get off of this.

01:47:20.290 --> 01:47:21.230
I went right-click.

01:47:21.310 --> 01:47:22.810
I went to configure.

01:47:24.530 --> 01:47:25.090
Come on, latency.

01:47:28.390 --> 01:47:35.550
And under that, I gave it a very, you know, human-readable,

01:47:35.730 --> 01:47:37.330
understandable name.

01:47:38.290 --> 01:47:45.510
But what label does is drag this down, and I can now, you know,

01:47:45.690 --> 01:47:47.610
create a box.

01:47:48.430 --> 01:47:51.270
You know, think of this as almost like a PowerPoint kind of

01:47:51.270 --> 01:47:51.770
capability.

01:47:52.390 --> 01:47:55.070
And then I just double-click and say this.

01:47:55.070 --> 01:47:58.030
I'm going to call this a test group.

01:48:00.530 --> 01:48:05.490
And so what that label does is I'm able to open up NiFi.

01:48:06.070 --> 01:48:10.690
I can look at this, and because of the labels, you know,

01:48:10.690 --> 01:48:17.770
I may have a label that, you know, that is, let's see if I can

01:48:17.770 --> 01:48:18.010
paste that.

01:48:18.010 --> 01:48:18.690
There we go.

01:48:18.690 --> 01:48:23.170
I may have processors, all part of this process group,

01:48:23.730 --> 01:48:30.350
and I have processors.

01:48:31.850 --> 01:48:38.030
So the first label is picking up the file and identifying the

01:48:38.030 --> 01:48:38.830
type of file.

01:48:39.230 --> 01:48:42.690
I may have another label that, you know,

01:48:44.170 --> 01:48:48.810
is processors that is handling the file type.

01:48:48.870 --> 01:48:52.050
So if it's a zip, it will unzip or something else.

01:48:52.290 --> 01:48:56.650
And then I may have another, like, set of processors that I've

01:48:56.650 --> 01:48:59.030
chained together that's doing another function.

01:48:59.430 --> 01:49:03.630
So what this allows me to do is, you know, kind of drag,

01:49:03.910 --> 01:49:06.930
like, you know, maybe, you know, imagine this, IdentifyMimeType

01:49:06.930 --> 01:49:09.090
as its own little category.

01:49:09.090 --> 01:49:15.510
So I can actually put these, you know, labels on either a

01:49:15.510 --> 01:49:18.770
processor or a group of processors or even a process

01:49:18.770 --> 01:49:22.630
group, and that way I can quickly look at this and say,

01:49:22.730 --> 01:49:25.830
okay, you know, here's where they're doing the GetFile

01:49:25.830 --> 01:49:26.710
and what's happening.

01:49:26.830 --> 01:49:29.890
Here's where they're actually doing the ETL or, you know,

01:49:30.010 --> 01:49:31.030
those types of things.

01:49:31.570 --> 01:49:34.350
So that is the purpose of the label.

01:49:37.250 --> 01:49:41.590
That is a lot to ingest.

01:49:41.890 --> 01:49:47.630
But with that being said, I'm going to pause here and see

01:49:47.630 --> 01:49:50.710
what questions we have before we go to lunch.

01:49:51.330 --> 01:49:54.670
When we come back from lunch, we will make sure we have

01:49:54.670 --> 01:49:56.990
our NiFi installed and up and running.

01:49:57.090 --> 01:50:00.410
So if you did not do this while I went through it,

01:50:00.530 --> 01:50:01.850
it's perfectly okay.

01:50:02.770 --> 01:50:04.090
We're going to get it installed.

01:50:04.090 --> 01:50:07.130
We're going to get it up and running and start creating

01:50:07.130 --> 01:50:11.710
our own flow file within our own data flow.

01:50:12.150 --> 01:50:15.070
While we are creating that, we will also visit,

01:50:15.070 --> 01:50:19.130
you know, some menus and components that, you know,

01:50:19.170 --> 01:50:22.670
I haven't already gone over because now we're at a

01:50:22.670 --> 01:50:27.470
point where, you know, let's just get it going and

01:50:27.470 --> 01:50:30.650
we've got a processor, our first processor deployed,

01:50:30.650 --> 01:50:35.030
and, you know, now we can learn some additional concepts

01:50:35.030 --> 01:50:36.650
as we build out this flow.

01:50:37.230 --> 01:50:40.710
But any questions before we go to lunch?

01:50:44.310 --> 01:50:44.790
Okay.

01:50:45.970 --> 01:50:46.750
I have a question.

01:50:47.230 --> 01:50:48.210
Yes, go ahead.

01:50:49.490 --> 01:50:50.990
It's actually another question.

01:50:51.610 --> 01:50:53.750
So I can't type on the remote desktop.

01:50:54.570 --> 01:50:57.510
So is there, did I press something that's prevented

01:50:57.510 --> 01:50:58.650
me from typing?

01:51:00.410 --> 01:51:01.530
Let's see.

01:51:01.530 --> 01:51:04.970
You shouldn't have any issues typing.

01:51:05.750 --> 01:51:07.270
Let me pull yours up right quick.

01:51:08.310 --> 01:51:08.650
Okay.

01:51:08.730 --> 01:51:11.150
And then when you click on the, like,

01:51:11.270 --> 01:51:13.110
click on the browser itself or the toolbar,

01:51:13.670 --> 01:51:16.810
like click on this toolbar and start typing,

01:51:16.970 --> 01:51:18.030
can you now try to type?

01:51:19.710 --> 01:51:20.190
No.

01:51:20.670 --> 01:51:22.710
It will not let you.

01:51:23.350 --> 01:51:23.830
Huh.

01:51:24.450 --> 01:51:25.570
Do I just need to reconnect?

01:51:25.570 --> 01:51:28.430
I will probably maybe stop the machine.

01:51:28.970 --> 01:51:29.350
There we go.

01:51:29.470 --> 01:51:29.950
There's Maria.

01:51:30.130 --> 01:51:30.550
There we go.

01:51:30.790 --> 01:51:31.230
Let's see.

01:51:32.310 --> 01:51:32.930
All right.

01:51:33.870 --> 01:51:36.170
So while we're at lunch, if you can,

01:51:36.250 --> 01:51:38.730
just do a regular, can you still click to shut down?

01:51:39.990 --> 01:51:40.630
I can.

01:51:40.810 --> 01:51:41.550
I can click.

01:51:41.810 --> 01:51:44.270
My mouse works, but my keyboard doesn't.

01:51:44.610 --> 01:51:45.130
Okay.

01:51:45.430 --> 01:51:47.330
So that's really weird.

01:51:48.490 --> 01:51:48.930
Yeah.

01:51:49.110 --> 01:51:51.870
If you can, just shut it down and restart it.

01:51:51.870 --> 01:51:56.710
It may be a proxy configuration or something like that.

01:51:56.750 --> 01:51:58.870
We've run into a couple of those types of issues,

01:51:59.170 --> 01:52:01.370
but you should be able to type.

01:52:02.550 --> 01:52:03.890
But no, thank you for asking.

01:52:05.870 --> 01:52:06.470
Thank you.

01:52:07.610 --> 01:52:10.550
And then any other questions?

01:52:13.250 --> 01:52:14.070
All right.

01:52:14.390 --> 01:52:17.610
Well, if there is no other questions,

01:52:18.670 --> 01:52:21.650
let's take about a 45-minute lunch.

01:52:23.050 --> 01:52:26.330
What I will do is I'm going to continue sharing my screen,

01:52:26.350 --> 01:52:29.970
and let's just do an afternoon.

01:52:30.250 --> 01:52:32.410
Well, it is afternoon for me.

01:52:33.550 --> 01:52:37.610
And we will return back here at,

01:52:38.130 --> 01:52:41.910
it is 11:45 your time.

01:52:41.910 --> 01:52:46.910
We are going to return back here at 12:30 your time.

01:52:49.670 --> 01:52:51.570
2:30 my time.

01:52:51.650 --> 01:52:55.830
2:30 p.m.

01:52:59.310 --> 01:53:02.590
2:30 p.m. CST.

01:53:03.770 --> 01:53:04.130
Okay.

01:53:04.590 --> 01:53:06.550
So if there's no further questions

01:53:06.550 --> 01:53:08.490
or anything I can help with,

01:53:08.490 --> 01:53:12.010
go have a great lunch. When we get back,

01:53:12.150 --> 01:53:14.530
we are going to be very hands-on.

01:53:14.570 --> 01:53:16.450
We're going to start building some flows

01:53:16.450 --> 01:53:19.130
and things like that.

01:53:19.930 --> 01:53:22.710
And we're going to be doing a lot of that

01:53:22.710 --> 01:53:24.430
over the next few days.

01:53:25.590 --> 01:53:31.870
And so if there is a use case or a scenario

01:53:31.870 --> 01:53:34.150
you kind of want to play on,

01:53:34.950 --> 01:53:37.250
I'd be happy to integrate that in.

01:53:37.250 --> 01:53:40.690
I have a weather station scenario

01:53:40.690 --> 01:53:44.730
where we pull data from three different sources.

01:53:45.230 --> 01:53:47.630
Two of those have different data formats.

01:53:48.050 --> 01:53:50.150
We need to get it into the same format.

01:53:50.350 --> 01:53:54.090
We need to do some reporting on that data.

01:53:54.830 --> 01:53:57.230
So that's one of those scenarios I like working with.
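
[Editor's sketch] The weather station scenario boils down to "different formats in, one format out, then report." The sketch below assumes two made-up input shapes (a CSV with Fahrenheit readings and a JSON list with Celsius readings); the field names and values are hypothetical, and in the class this work would be done with NiFi processors rather than Python.

```python
import csv
import io
import json


def from_csv(text):
    """Hypothetical source A: CSV with 'station' and 'temp_f' columns."""
    rows = csv.DictReader(io.StringIO(text))
    return [
        {"station": r["station"],
         "temp_c": round((float(r["temp_f"]) - 32) * 5 / 9, 1)}
        for r in rows
    ]


def from_json(text):
    """Hypothetical source B: JSON list with 'id' and 'temperature_c' keys."""
    return [
        {"station": d["id"], "temp_c": float(d["temperature_c"])}
        for d in json.loads(text)
    ]


def average_temp_c(records):
    """A minimal 'report' over the normalized records."""
    return round(sum(r["temp_c"] for r in records) / len(records), 1)


# Both sources end up in the same shape: {"station": ..., "temp_c": ...}
records = (
    from_csv("station,temp_f\nA,212\n")
    + from_json('[{"id": "B", "temperature_c": 20}]')
)
report = average_temp_c(records)
```

Once every record has the same keys and units, the reporting step no longer needs to know which source a reading came from.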

01:53:58.090 --> 01:53:59.830
But, you know, with that being said,

01:53:59.870 --> 01:54:01.610
if you have anything specific

01:54:01.610 --> 01:54:04.810
that you would like for me to tailor this conversation with,

01:54:05.110 --> 01:54:06.350
I would be happy to help.

01:54:06.350 --> 01:54:08.310
And if there's no other questions,

01:54:08.330 --> 01:54:11.610
I will see everyone back in about 45 minutes.

01:57:41.110 --> 01:57:46.350
Hey, just checking in real quickly on the 3-tier app.

01:57:47.510 --> 01:57:49.230
Rhonda mentioned this morning

01:57:49.230 --> 01:57:51.370
that you thought it might take a long time

01:57:51.370 --> 01:57:55.090
because it references AWS resources or something.

01:58:00.330 --> 01:58:01.310
Oh, okay.

01:58:02.970 --> 01:58:05.630
I told her it should be pretty quick,

01:58:29.730 --> 01:58:31.130
but...

01:58:31.130 --> 01:58:32.010
Okay.

01:58:32.670 --> 01:58:33.070
Perfect.

01:58:36.370 --> 01:58:39.630
Reply to the email thread on a separate one.

01:58:41.010 --> 01:58:41.830
I thought you were reading this.

01:58:41.970 --> 01:58:42.490
I guess not.

01:58:42.830 --> 01:58:44.210
But he's saying once a day,

01:58:44.330 --> 01:58:45.330
based on a separate VM,

01:58:45.510 --> 01:58:46.730
he started setting that up.

01:58:46.950 --> 01:58:49.410
I said, yeah, I mean, I can move this thing

01:58:49.410 --> 01:58:51.270
from a whole three of them being on a separate VM

01:58:51.270 --> 01:58:51.890
if he really wants,

01:58:52.490 --> 01:58:55.270
and I can apply it to the rest of the React when I go.

01:58:55.350 --> 01:58:55.650
Yeah.

01:58:55.770 --> 01:58:58.050
That's why I'm the one that suggested this app

01:58:58.050 --> 01:59:00.130
because I was like,

01:59:00.690 --> 01:59:04.110
me, you, and a few others could easily get this up

01:59:04.110 --> 01:59:06.930
and run it quickly, I feel like.

01:59:07.250 --> 01:59:09.750
And so, yeah, it shouldn't be hard

01:59:09.750 --> 01:59:11.750
to move it to a different thing,

01:59:11.750 --> 01:59:14.750
but can you include me on the response

01:59:14.750 --> 01:59:16.010
on that other thread?

01:59:16.150 --> 01:59:17.930
Because it's kind of hard to keep track

01:59:17.930 --> 01:59:19.570
of these things if I don't see it.

01:59:20.770 --> 01:59:21.670
Yeah, yeah.

01:59:24.790 --> 01:59:25.910
I don't know.

01:59:27.050 --> 01:59:30.650
Well, he's just getting ready for his demo.

01:59:31.830 --> 01:59:34.650
The future demo's coming up.

01:59:39.890 --> 01:59:42.470
Well, no, I'm getting ready for demos

01:59:42.470 --> 01:59:45.010
for when it comes time to do the demo.

01:59:45.870 --> 01:59:46.590
Yeah, yeah.

01:59:46.590 --> 01:59:49.450
Because he just wanted three tiered app for that, right?

01:59:52.670 --> 01:59:53.170
Yeah.

01:59:53.330 --> 01:59:58.130
And that's why him and I and Rhonda met last week,

01:59:58.130 --> 01:59:59.150
and I was like, look,

01:59:59.150 --> 02:00:00.430
well, a few of us met, right?

02:00:00.490 --> 02:00:01.750
And then I was like, look,

02:00:01.870 --> 02:00:04.870
I can go to GitHub, I can find a three tiered app

02:00:04.870 --> 02:00:07.970
in a language that most of us can work with and know

02:00:07.970 --> 02:00:11.450
and should be pretty quick and easy to get up and run it.

02:00:11.730 --> 02:00:13.290
So, yeah, good, good.

02:00:13.730 --> 02:00:15.790
Because Rhonda called me this morning and said,

02:00:16.430 --> 02:00:18.570
and I think this was before you even got into the office,

02:00:18.650 --> 02:00:20.710
it was early this morning.

02:00:22.710 --> 02:00:23.870
So she was like, you know,

02:00:23.970 --> 02:00:26.670
feeling thought that there was some AWS resources

02:00:26.670 --> 02:00:30.050
and this was a lot harder.

02:00:30.230 --> 02:00:33.150
And I was like, well, when I looked at it

02:00:33.150 --> 02:00:34.790
and I looked at the source code,

02:00:34.830 --> 02:00:36.670
I was like, I don't think it will take long,

02:00:36.670 --> 02:00:39.270
but I will call Dylan later today to find out.

02:00:39.730 --> 02:00:40.930
But now that I called Dylan

02:00:40.930 --> 02:00:42.230
and he's already got it up and running,

02:00:42.350 --> 02:00:43.190
that's even better.

02:00:45.090 --> 02:00:46.670
What I said was,

02:00:47.430 --> 02:00:48.550
let's see what I can say.

02:00:48.790 --> 02:00:49.670
Oh, I can't remember.

02:00:52.850 --> 02:00:53.650
This was...

02:01:09.170 --> 02:01:11.630
Oh, yeah, yeah.

02:01:11.630 --> 02:01:13.670
And he left the external IP address

02:01:13.670 --> 02:01:16.250
for some AWS EC2 instance,

02:01:16.450 --> 02:01:17.630
and I put it in there.

02:01:17.830 --> 02:01:20.050
And I haven't had the screenshots with pictures.

02:01:20.390 --> 02:01:22.770
This means it's always hard to contact the server.

02:01:22.870 --> 02:01:24.910
One day I will call on the other instance

02:01:24.910 --> 02:01:26.470
and I'll put the link to that.

02:01:27.870 --> 02:01:31.250
Yeah, well, we both know,

02:01:31.430 --> 02:01:33.150
we're both working with the same person

02:01:33.150 --> 02:01:34.170
and I completely understand.

02:01:38.670 --> 02:01:39.230
So...

02:01:39.830 --> 02:01:40.390
Awesome.

02:01:41.990 --> 02:01:43.050
Okay, you rock.

02:01:44.790 --> 02:01:45.530
Thank you.

02:01:45.730 --> 02:01:47.090
Yeah, you're doing good otherwise.

02:01:47.930 --> 02:01:48.730
Yeah, yeah.

02:01:49.190 --> 02:01:50.790
When's the baby going to be here?

02:01:53.530 --> 02:01:55.410
October 11th is my wife.

02:01:55.450 --> 02:01:57.450
You know, her mom's birthday is Wednesday,

02:01:57.710 --> 02:01:58.470
but we'll see.

02:01:58.910 --> 02:02:00.210
And this is her first?

02:02:01.310 --> 02:02:01.910
Yeah.

02:02:02.170 --> 02:02:05.790
All right, so I am banking on the 15th.

02:02:07.390 --> 02:02:08.590
Think it's going to be late?

02:02:12.150 --> 02:02:12.750
Maybe.

02:02:13.070 --> 02:02:15.150
Usually the first ones have a tendency

02:02:15.150 --> 02:02:16.950
that can go a little later.

02:02:17.390 --> 02:02:20.390
But it definitely won't be too much earlier.

02:02:21.150 --> 02:02:21.630
Usually.

02:02:22.650 --> 02:02:24.030
Now the second one...

02:02:24.030 --> 02:02:25.050
Oh, you're going to have it early.

02:02:27.010 --> 02:02:28.810
Dude, they're going to have six kids, right?

02:02:29.170 --> 02:02:30.790
Like, and I delivered...

02:02:31.330 --> 02:02:33.790
I delivered my next to the...

02:02:34.570 --> 02:02:36.350
I delivered Natalie, my nine-year-old,

02:02:36.510 --> 02:02:37.990
until her shoulder got stuck.

02:02:38.970 --> 02:02:39.490
Oh, wow.

02:02:39.730 --> 02:02:43.430
Yeah, and I had requested to deliver.

02:02:43.570 --> 02:02:44.470
They let me.

02:02:44.490 --> 02:02:44.710
Really?

02:02:44.850 --> 02:02:46.970
Yeah, I wanted to deliver my own child

02:02:46.970 --> 02:02:48.670
and that was my plan.

02:02:48.670 --> 02:02:50.810
Until, unless something came up

02:02:50.810 --> 02:02:53.230
and then the doctor was right there.

02:02:53.270 --> 02:02:55.050
So when something didn't come up,

02:02:55.750 --> 02:02:57.390
I just removed my hands

02:02:57.390 --> 02:02:59.150
and I got out of the way

02:02:59.150 --> 02:03:00.490
and let him do their thing.

02:03:03.550 --> 02:03:06.230
So, all right, well, good luck.

02:03:06.950 --> 02:03:09.050
Like I said, I'll be up there in about a month,

02:03:09.050 --> 02:03:10.690
so I'll see you in a little over a month.

02:03:12.430 --> 02:03:12.890
Oh yeah, dude.

02:03:12.970 --> 02:03:13.810
I'm looking forward to it.

02:03:13.870 --> 02:03:14.410
How long are you here for?

02:03:14.810 --> 02:03:15.910
A few weeks, actually.

02:03:15.990 --> 02:03:17.730
But I'll be in the technical office.

02:03:17.730 --> 02:03:19.730
You know, I'll only offer a few days.

02:03:20.810 --> 02:03:21.190
Okay.

02:03:21.730 --> 02:03:22.090
All right.

02:03:26.690 --> 02:03:27.530
Hell yeah.

02:03:27.770 --> 02:03:28.070
All right.

02:03:28.130 --> 02:03:29.390
Well, if you need anything, let me know.

02:03:46.430 --> 02:03:47.270
All right.

02:03:51.650 --> 02:03:52.330
Oliver!

02:03:53.130 --> 02:03:53.650
Ollie!

02:03:57.850 --> 02:03:58.210
Oliver!

02:03:59.890 --> 02:04:00.550
Come here!

02:04:02.970 --> 02:04:04.910
Mario or Pokemon?

02:04:09.470 --> 02:04:11.350
Well, okay.

02:04:12.650 --> 02:04:13.490
These are the high top.

02:04:13.690 --> 02:04:14.850
This is not a high top.

02:04:15.850 --> 02:04:16.550
You like this one?

02:04:17.550 --> 02:04:20.170
I like the Pokemon.

02:04:20.650 --> 02:04:21.910
Okay, so not this one?

02:04:22.370 --> 02:04:23.730
How about this one?

02:04:25.930 --> 02:04:27.010
Do you want the Pokemon?

02:04:27.630 --> 02:04:28.830
Do the Pokemon.

02:04:28.990 --> 02:04:29.930
How about this one?

02:04:30.330 --> 02:04:30.630
Rocket.

02:04:31.550 --> 02:04:32.070
Rocket!

02:04:33.150 --> 02:04:34.890
And he has a Zip on the back.

02:04:35.790 --> 02:04:36.290
Zip?

02:04:36.370 --> 02:04:37.330
I want the Zip.

02:04:37.430 --> 02:04:38.150
I want it too.

02:04:38.710 --> 02:04:39.890
No, just beat one.

02:04:41.850 --> 02:04:42.810
So Rocket?

02:04:44.230 --> 02:04:44.710
Rocket.

02:04:44.710 --> 02:04:46.930
I don't want those.

02:04:47.170 --> 02:04:47.770
Can I have a little?

02:04:48.530 --> 02:04:48.910
Rocket.

02:04:50.930 --> 02:04:51.410
Or...

02:04:51.410 --> 02:04:54.810
Mario Brothers, Pokemon, or Fire Shoes?

02:04:57.630 --> 02:04:58.110
Pokemon!

02:04:59.490 --> 02:04:59.970
Huh?

02:05:00.570 --> 02:05:01.050
Pokemon!

02:05:01.690 --> 02:05:02.170
Pokemon?

02:05:02.570 --> 02:05:02.850
This one?

02:05:03.570 --> 02:05:05.470
This one has a little Zip on the back.

02:05:06.370 --> 02:05:06.970
Okay, stop.

02:05:07.030 --> 02:05:07.770
Why do you keep scratching?

02:05:07.990 --> 02:05:08.870
Let me see your hand.

02:05:10.950 --> 02:05:11.810
Why do you keep...

02:05:11.810 --> 02:05:13.510
Why do you...

02:05:13.510 --> 02:05:14.610
Why do you keep scratching?

02:05:14.610 --> 02:05:15.290
Why do you keep scratching?

02:05:18.230 --> 02:05:19.270
Where's the...

02:05:19.270 --> 02:05:20.670
We need a loop there.

02:05:21.110 --> 02:05:21.950
Okay, stop it!

02:05:29.330 --> 02:05:30.590
Did you brought me a coffee?

02:05:31.610 --> 02:05:33.510
You said you like this.

02:05:37.870 --> 02:05:38.910
I definitely need this.

02:05:38.910 --> 02:05:44.790
Well, hopefully everyone had a great lunch and we're all coming back.