14 videos 📅 2024-05-06 08:00:00 America/Creston

Visit the Apache Nifi - GROUP 1 course recordings page

                WEBVTT

00:00:00.000 --> 00:00:06.200
So for those that are on your desktop if you can can you bring up your

00:00:07.380 --> 00:00:10.540
Desktop, you should see a blank canvas

00:00:10.540 --> 00:00:17.080
You know a blank desktop like mine. You'll have Microsoft Edge, Docker desktop

00:00:18.040 --> 00:00:20.020
Uploads, you know those types of things

00:00:20.020 --> 00:00:24.680
But one of the things I want you to check is if you can open your file explorer

00:00:27.960 --> 00:00:28.520
And

00:00:28.520 --> 00:00:29.840
When you go to downloads

00:00:29.840 --> 00:00:32.240
There should be

00:00:32.240 --> 00:00:38.440
Quite a bit of downloads: MiNiFi, NiFi, NiFi Registry, NiFi Toolkit, those types of things

00:00:40.840 --> 00:00:43.760
If you do not have that information let me know

00:00:44.500 --> 00:00:51.540
That way I can go in and replicate the information to your desktop. Cody, I see you have it

00:00:52.280 --> 00:00:53.360
Aaron, nothing

00:00:55.580 --> 00:00:56.660
Sean, Brett,

00:00:58.540 --> 00:00:59.540
You look good too

00:01:00.500 --> 00:01:06.540
Amanda you should have it as well. I remember replicating your desktop. Perfect. Perfect. Perfect

00:01:06.540 --> 00:01:12.860
Pedro, are you able to see the files? Back to Pedro. Alyssa, if you look in your downloads

00:01:12.860 --> 00:01:14.980
File Explorer then go to downloads

00:01:14.980 --> 00:01:17.280
You should see

00:01:17.980 --> 00:01:19.280
a list of files

00:01:21.240 --> 00:01:23.440
No worries, I

00:01:23.440 --> 00:01:26.840
Again, I'm here on my ranch in Central Texas. So

00:01:26.840 --> 00:01:29.640
Elon Musk, Starlink

00:01:30.420 --> 00:01:33.360
You know if he decides I want data today, I get it

00:01:34.760 --> 00:01:38.560
Okay, you look good. Pedro, Alyssa, Randy

00:01:39.620 --> 00:01:42.640
Being yours is good. Pedro, are you able to

00:01:45.020 --> 00:01:51.960
Yep, I see your screen. You look good to me. Yeah, perfect. Perfect. Perfect. So the

00:01:52.880 --> 00:01:55.360
Downloads all worked and good to go there

00:01:56.120 --> 00:02:00.120
So what we'll do is go through some of the PowerPoint presentation a little bit more

00:02:00.920 --> 00:02:02.840
We'll take a break for

00:02:03.460 --> 00:02:08.400
Just take a quick break. I like to do like a 10 to 15 minute break

00:02:08.960 --> 00:02:13.000
You know depending on on how many questions we get and those types of things

00:02:13.920 --> 00:02:15.240
It's my understanding

00:02:15.240 --> 00:02:20.060
Everyone's in Arizona. So my lunch time is usually in about an hour

00:02:20.580 --> 00:02:27.500
But for Arizona time we will try to go lunch about 11 30 your time 11 11 30 your time

00:02:27.500 --> 00:02:35.520
I like to do it like 45 minutes for lunch, but if you need an hour that we can do that as well

00:02:36.080 --> 00:02:41.420
You know if there's not a lot of questions or a lot of interaction we we usually end a little early

00:02:41.980 --> 00:02:44.800
You know, there's some time built in for those interactions

00:02:45.360 --> 00:02:52.280
You know, so we'll do that so everybody's logged into their desktop you're able to see the files that we're going to work with

00:02:52.840 --> 00:02:59.680
When we get to this part, we're going to actually do an install of NiFi on the Windows operating system

00:03:00.340 --> 00:03:07.380
If you were installing this in Linux, it would actually be a little bit easier, but there is a ton of documentation

00:03:08.460 --> 00:03:14.120
As you can imagine for a government product, the documentation for NiFi is very extensive

00:03:15.140 --> 00:03:20.440
Everything I'm teaching today, everything I'm going over, is in the NiFi docs

00:03:21.500 --> 00:03:24.840
You can actually kind of follow along if you go to

00:03:26.020 --> 00:03:28.520
nifi.apache.org you will see

00:03:30.220 --> 00:03:34.640
You know, tons and tons of documentation: you know, what is NiFi?

00:03:35.440 --> 00:03:40.000
You know, what are the core concepts, the architecture, some of these things that we're going over

00:03:40.970 --> 00:03:47.930
You know even even the architecture for instance where you have the OS the host which in our case is

00:03:48.670 --> 00:03:50.750
Windows your JVM

00:03:50.750 --> 00:04:00.590
For those that are technical, it's a Jetty web server serving this UI, and then we have the flow controller,

00:04:00.590 --> 00:04:02.070
processors

00:04:02.070 --> 00:04:07.290
You know, the flow file repository that we talked about, the content repository, and the provenance repository

00:04:08.090 --> 00:04:12.970
Those are local storage. Now, when it says local storage, that doesn't necessarily mean

00:04:13.590 --> 00:04:15.570
It's being stored to a local disk

00:04:16.550 --> 00:04:23.690
I've seen local storage be a NAS or some other type of network attached storage system or

00:04:24.450 --> 00:04:26.410
You know those types of things

00:04:26.410 --> 00:04:32.810
So if you want to kind of follow along with some of the documentation, it's all there

00:04:33.510 --> 00:04:35.810
If you go to nifi.apache.org

00:04:35.810 --> 00:04:38.170
You'll see everything

00:04:39.830 --> 00:04:45.010
The documentation is very good. We're working off of the NiFi version one documentation

00:04:45.690 --> 00:04:49.770
Just because the version two just came out. We'll touch on some of that

00:04:50.670 --> 00:04:54.570
What I like to go off of is the admin guide or the user guide

00:04:55.330 --> 00:05:01.210
Those are the two guides that that I work with when we go into some of the processors we can

00:05:01.210 --> 00:05:04.030
Will actually go in and talk about some of that

00:05:04.030 --> 00:05:06.310
But if you look at the admin guide

00:05:06.870 --> 00:05:14.730
You know, for an open source product, the documentation for NiFi is amazing

00:05:15.310 --> 00:05:18.210
Usually we don't see this type of documentation

00:05:18.210 --> 00:05:24.310
But as you can imagine being a government product that was released to the open source world

00:05:25.030 --> 00:05:28.490
We had to do a lot of documentation before that was released

00:05:28.490 --> 00:05:32.350
The documentation is also built into NiFi

00:05:33.110 --> 00:05:39.410
Even down to the processors when you're developing a processor for those that have developed a processor before

00:05:39.410 --> 00:05:45.750
You did have a place where you could include a description as well as other documentation

00:05:45.750 --> 00:05:50.330
So, you know, we'll go through that but I want to make sure that

00:05:50.330 --> 00:05:57.370
That your desktops are running, and if you're in your browser, you can pull up the NiFi user guide and admin guide

00:05:58.390 --> 00:06:00.370
And follow along

00:06:00.370 --> 00:06:01.330
as well

00:06:02.610 --> 00:06:03.330
Okay

00:06:05.890 --> 00:06:08.830
You know, a flow file is an

00:06:09.730 --> 00:06:13.090
abstraction that represents a single piece of information or a data object

00:06:14.110 --> 00:06:16.410
Within a data flow, you know

00:06:16.410 --> 00:06:22.530
so take in this case and I'm using this example is because I just implemented a a

00:06:23.270 --> 00:06:25.390
huge prototype for this

00:06:25.390 --> 00:06:32.370
You know, log messages coming in: you may have a single message, and when it comes into NiFi

00:06:33.290 --> 00:06:36.390
You know, it treats that as a flow file

00:06:37.250 --> 00:06:39.130
so that flow file

00:06:40.010 --> 00:06:46.330
Is that message it can be in any format it can be in any kind of protocol or those types of things

00:06:46.330 --> 00:06:49.650
You know, so when NiFi receives that

00:06:51.090 --> 00:06:54.970
it generates that as a flow file and then you know

00:06:56.030 --> 00:07:02.030
Within that flow file it consists of two major components the metadata and the data payload

00:07:02.030 --> 00:07:07.550
the metadata is the attributes and we'll go into more of that where we're able to

00:07:07.550 --> 00:07:10.370
Take that data flow and make it into an attribute

00:07:10.370 --> 00:07:12.270
It has a lot of attributes

00:07:12.270 --> 00:07:17.610
So as soon as NiFi touches this data flow, it gets assigned an attribute of like, you know

00:07:17.610 --> 00:07:21.390
A date time group of when it was noticed what the source was

00:07:21.390 --> 00:07:26.830
You know those types of things when we're using a processor that goes and grabs data from

00:07:27.530 --> 00:07:34.610
An HTTP website, for instance, it will record what URL it

00:07:35.270 --> 00:07:40.370
Grab the data from location and those types of things all of that is metadata now

00:07:41.050 --> 00:07:43.570
metadata is separate from the

00:07:44.630 --> 00:07:46.670
actual data file
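
The two parts of a flow file described above (attributes as metadata, content as the payload) can be pictured with a small sketch. This is only an illustration in plain Python, not NiFi's own API; the class name `FlowFileSketch`, the `ingest` helper, and the attribute names are all made up for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4


@dataclass
class FlowFileSketch:
    """Toy stand-in for a flow file: a payload plus key-value metadata."""
    content: bytes                                   # the data payload, in any format
    attributes: dict = field(default_factory=dict)   # the metadata, kept separately


def ingest(payload: bytes, source_url: str) -> FlowFileSketch:
    """Wrap an incoming message and stamp it with a few basic attributes."""
    return FlowFileSketch(
        content=payload,
        attributes={
            "uuid": str(uuid4()),                                   # unique id for tracking
            "ingested.at": datetime.now(timezone.utc).isoformat(),  # when it was first seen
            "source.url": source_url,                               # hypothetical attribute name
            "size.bytes": str(len(payload)),
        },
    )


ff = ingest(b"user=alice action=login", "https://example.com/logs")
print(ff.attributes["source.url"], ff.attributes["size.bytes"])
```

The point of the sketch is that the payload is never inspected to fill in the attributes; everything in `attributes` comes from how and when the data arrived.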

00:07:46.670 --> 00:07:52.970
But you know we're able to to work with that metadata because you may want to route

00:07:52.970 --> 00:07:57.250
You know your data file based upon source for instance

00:07:57.250 --> 00:08:01.790
And then when the when the data is coming in if you see source X

00:08:01.790 --> 00:08:07.130
You may want to send it one way; if you see source Y, you may want to send it another way, and all of that

00:08:07.130 --> 00:08:08.310
Would be in the metadata
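
The source-based routing just described can be sketched as a lookup on a single attribute. This is a plain Python illustration under assumptions: the attribute name `source` and the path names are hypothetical, and in NiFi itself this kind of decision is normally configured on a processor such as RouteOnAttribute rather than written by hand.

```python
def route_by_source(attributes: dict) -> str:
    """Pick a downstream path by looking only at the metadata, not the content."""
    routes = {"X": "path_one", "Y": "path_two"}   # hypothetical sources and paths
    return routes.get(attributes.get("source"), "unmatched")


incoming = [{"source": "X"}, {"source": "Y"}, {"source": "Z"}]
for attrs in incoming:
    print(attrs["source"], "->", route_by_source(attrs))
```

Anything that does not match a known source falls through to an `unmatched` path, so no flow file is silently lost.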

00:08:09.210 --> 00:08:11.930
You know as it's receiving the data file

00:08:12.530 --> 00:08:15.810
We can also take a look at the data file for instance

00:08:18.190 --> 00:08:22.950
But you know in a lot of cases we like to use those attributes and we'll go into that

00:08:22.970 --> 00:08:29.050
When we're building it, I'm very interactive. I like to do a lot of hands-on work. And so

00:08:30.170 --> 00:08:34.090
We're going to start building some data flows, and as we go through those we'll

00:08:35.630 --> 00:08:38.090
Basically repeat what we have here on the slide

00:08:40.230 --> 00:08:47.530
So attributes are key-value pairs that store metadata about the data. That includes basic information: file name, size,

00:08:48.630 --> 00:08:49.270
timestamp

00:08:49.270 --> 00:08:52.450
and any additional metadata added by processors.

00:08:52.970 --> 00:08:56.730
You know, again, let's say you're using the Get

00:08:57.330 --> 00:09:02.170
HTTP or GetFTP, which would get files from an FTP server

00:09:02.170 --> 00:09:05.650
It will put metadata such as the server name

00:09:06.550 --> 00:09:09.670
IP address, you know some of those types of things that can capture

00:09:11.530 --> 00:09:17.190
Content: you know, the content of a flow file is the actual data carried by the flow file

00:09:17.190 --> 00:09:21.210
So, you know depending on the application it can be text. It can be binary

00:09:22.270 --> 00:09:26.790
Any other formats we have used it before to

00:09:28.370 --> 00:09:29.330
detect

00:09:30.690 --> 00:09:36.550
Heart murmurs and stuff like that and heartbeat data, so we would actually bring in

00:09:38.030 --> 00:09:38.850
audio

00:09:38.850 --> 00:09:42.150
recordings of you know of your heart

00:09:42.910 --> 00:09:45.370
filter and sort those and

00:09:45.370 --> 00:09:47.150
Use, you know

00:09:47.150 --> 00:09:48.870
some additional processors to

00:09:50.150 --> 00:09:53.650
Extrapolate that data look for heart murmurs those types of things

00:09:54.290 --> 00:09:59.290
You know, like I said, I've seen almost every type of data go through NiFi

00:10:00.110 --> 00:10:05.050
So yeah, the content is what is processed or transformed by the processors

00:10:05.650 --> 00:10:13.670
There are processors to handle attributes, but most of the processors work on the content

00:10:13.670 --> 00:10:16.170
So it actually works on

00:10:16.170 --> 00:10:18.110
that package of data

00:10:19.650 --> 00:10:25.370
Life cycle of a flow file: you know, flow files are created by source processors that ingest data into NiFi

00:10:25.370 --> 00:10:31.210
They are processed and potentially split merged or transformed as they move through the flow

00:10:31.990 --> 00:10:38.230
Flow files are finally exported out of NiFi by a destination processor or stored

00:10:38.230 --> 00:10:45.150
You know, so as you know the life of a flow file as you can imagine is being ingested into the system

00:10:45.150 --> 00:10:48.010
it's going through, you know different operations and

00:10:48.150 --> 00:10:55.430
You know at the end it's going to its destination. So the final step is to push that flow file out to its final destination

00:10:56.970 --> 00:11:02.230
Record that in the data governance, and then it drops the flow file
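
The life cycle walked through above (ingest, process, push out, record, drop) can be traced in a few lines. This is a toy sketch, not NiFi code: the function name and the upper-casing step are invented for illustration, and the event labels are borrowed loosely from the kinds of events NiFi provenance records.

```python
def run_lifecycle(raw: bytes):
    """Walk one message through create, transform, send, and drop."""
    events = []                      # stand-in for the provenance record
    delivered = []                   # stand-in for the final destination

    flow_file = {"id": 1, "content": raw}
    events.append("CREATE")          # a source processor ingests the data

    flow_file["content"] = flow_file["content"].upper()
    events.append("CONTENT_MODIFIED")  # a processor transforms the content

    delivered.append(flow_file["content"])
    events.append("SEND")            # a destination processor pushes it out

    flow_file = None
    events.append("DROP")            # the flow file is no longer tracked
    return delivered, events


delivered, events = run_lifecycle(b"hello")
print(delivered, events)
```

Every step leaves an entry in `events`, which is the part that gives you traceability after the flow file itself is gone.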

00:11:03.910 --> 00:11:10.550
Why are flow files important? You know, understanding the structure and lifecycle of flow files is crucial because they are the backbone

00:11:11.090 --> 00:11:13.310
of the data flows in NiFi

00:11:13.310 --> 00:11:15.430
So efficiently, you know

00:11:15.430 --> 00:11:17.390
One of the things I like to do

00:11:18.390 --> 00:11:21.430
Talk about some of the efficiency of a data flow

00:11:21.430 --> 00:11:27.430
so efficient management of flow files ensures that data is processed reliably and

00:11:27.430 --> 00:11:28.230
efficiently

00:11:28.230 --> 00:11:30.890
Maintaining data integrity and traceability

00:11:33.270 --> 00:11:41.330
One of the things at the end of this class I will do is I like to take back any questions that I can't

00:11:41.330 --> 00:11:44.430
Well, I'll be able to answer questions immediately

00:11:45.630 --> 00:11:47.630
Sometimes on those questions

00:11:48.150 --> 00:11:52.830
You know, so if I pause for a second when you're asking a question so I can write it down

00:11:52.830 --> 00:11:55.710
I like the you know at the end of the class

00:11:55.710 --> 00:12:00.530
I like to send out this presentation as well as the q&a portion

00:12:00.530 --> 00:12:06.470
So any questions asked I can write them down. I can get them answered and get them incorporated into this presentation

00:12:06.470 --> 00:12:09.430
You know, so the class is over on Wednesday

00:12:09.970 --> 00:12:17.470
Uh in most likely around Friday or next Monday. I will send this presentation out to the wider audience

00:12:17.470 --> 00:12:23.610
Just so you'll have it for reference and you know, some of that training material that that we can leave behind

00:12:24.310 --> 00:12:26.030
um, I

00:12:26.030 --> 00:12:29.890
Think I've kind of nailed, you know, some of the key concepts of NiFi in depth

00:12:29.890 --> 00:12:36.690
But just in case, you know, processors are the primary component within NiFi. We'll talk about that a lot

00:12:37.690 --> 00:12:40.790
There's different types of processor, you know, they

00:12:43.350 --> 00:12:45.250
Processors tailored for different tasks

00:12:45.890 --> 00:12:51.310
Just so you know and and you know as a because i'm still part of that community

00:12:51.970 --> 00:12:54.070
I know what's coming up and

00:12:54.070 --> 00:12:56.610
Um, you know some of the nuances there

00:12:57.210 --> 00:13:01.090
when you download NiFi now you still get a

00:13:01.650 --> 00:13:05.350
You know, I think it's like 300 processors out of the box

00:13:05.350 --> 00:13:08.890
Um, you know one of the biggest complaints is is you know

00:13:08.890 --> 00:13:12.450
I just don't really need all these processors or I need my own processor and

00:13:13.290 --> 00:13:15.930
The download is one and a half gig of

00:13:17.750 --> 00:13:22.470
Just for NiFi, and most of that space is actually the processors because of something

00:13:23.210 --> 00:13:28.830
So, you know, one thing to keep in mind is as NiFi continues to release updates

00:13:28.830 --> 00:13:31.610
um in the updates they are going to

00:13:32.590 --> 00:13:36.810
not put as many processors and you can go to

00:13:37.030 --> 00:13:42.870
Uh some different sources like, you know, Maven online and others to pull those down

00:13:43.550 --> 00:13:46.470
They will still be built and ready to go

00:13:47.030 --> 00:13:52.830
And there will be some that you know source code that you will need to compile and build and deploy

00:13:53.510 --> 00:13:59.090
But for today, uh, for instance, we have all the processors we will need within

00:13:59.670 --> 00:14:03.110
Our downloaded Apache NiFi, uh, and we'll go through those

00:14:04.520 --> 00:14:10.020
Custom processors we we've already talked about too. Uh, if there's

00:14:10.020 --> 00:14:11.160
In my

00:14:12.340 --> 00:14:18.840
Uh experience, right? Usually what comes out of the box will work about 95% of the time

00:14:18.840 --> 00:14:23.700
Uh, I do run into cases where we will need a custom processor

00:14:23.700 --> 00:14:30.160
Uh, you know, I can think of a couple for this past implementation. I did

00:14:30.160 --> 00:14:35.280
Where we needed some specialized connectors for some of the

00:14:36.000 --> 00:14:40.440
Tools for instance as or as well as like log systems

00:14:40.440 --> 00:14:44.080
Things like Graylog and other things out there

00:14:44.680 --> 00:14:50.460
So being able to interface with different applications, you know, that's usually when we build a new processor

00:14:50.460 --> 00:14:53.480
Uh, we will build a new processor

00:14:54.080 --> 00:14:58.300
Depending on you know, there's some models that you can run in flight

00:14:58.300 --> 00:15:06.300
Uh, you know, you can do image classification image recognition models, you know things like that as the data is coming through

00:15:07.080 --> 00:15:09.980
Uh, depending on the output of that model you may

00:15:09.980 --> 00:15:14.380
You know filter or or change direction or send it to a different data flow

00:15:14.940 --> 00:15:17.820
You know, so so there's a lot of capabilities, uh

00:15:18.800 --> 00:15:22.620
You know for custom processors, um in those types of things

00:15:24.200 --> 00:15:28.280
Connections are links that route flow files between processors. We'll go into that

00:15:28.300 --> 00:15:29.880
Talk about back pressure

00:15:29.880 --> 00:15:33.340
You know some of those things they are they not only transfer data

00:15:33.340 --> 00:15:38.740
But also control data flow management, such as prioritization, back pressure, and load balancing

00:15:38.740 --> 00:15:42.660
You know, there's a few different policies within NiFi

00:15:43.180 --> 00:15:46.140
You know, you can do a FIFO method, the first in, first out

00:15:46.780 --> 00:15:50.540
Uh, you can do, you know some very advanced, um

00:15:51.380 --> 00:15:55.800
Routing with the rules engine for instance, um, you know you can do all kinds of things

00:15:55.800 --> 00:16:00.380
We'll go into some of the back pressure what it does as well as some of the load balancing

00:16:01.260 --> 00:16:05.260
Uh, and then to finish this off, you know enhancing data flow with connections

00:16:05.260 --> 00:16:10.920
Connections can be configured with specific settings to manage how data moves through the system. You may

00:16:11.640 --> 00:16:16.360
You know, you may have a use case where you need data to arrive

00:16:16.940 --> 00:16:23.300
Uh to a processor before another packet of data arrives you can set that up. You know, you may

00:16:23.300 --> 00:16:28.660
Uh, you know, you may have a data flow that you want to take priority on its data, you know

00:16:29.560 --> 00:16:34.740
You know processing where you know, you've got other data flows that are kind of a lower level priority

00:16:34.740 --> 00:16:37.460
You know, you could set those types of things
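
To make the ideas of first in, first out ordering, prioritization, and back pressure a bit more concrete, here is a rough sketch of a connection as a bounded queue. It is a simplified model in plain Python, not how NiFi implements connections; the class `ConnectionSketch`, its method names, and the single `max_queued` threshold are assumptions made for the example.

```python
import heapq
from collections import deque


class ConnectionSketch:
    """Toy queue between two processors with a back-pressure threshold."""

    def __init__(self, max_queued: int, prioritized: bool = False):
        self.max_queued = max_queued      # queue size at which back pressure kicks in
        self.prioritized = prioritized    # False: FIFO, True: lowest priority number first
        self._fifo = deque()
        self._heap = []
        self._seq = 0                     # tie-breaker: equal priorities keep arrival order

    def __len__(self) -> int:
        return len(self._heap) if self.prioritized else len(self._fifo)

    def offer(self, item, priority: int = 0) -> bool:
        """Enqueue an item; return False when the upstream side must wait."""
        if len(self) >= self.max_queued:
            return False                  # back pressure: the queue is full
        if self.prioritized:
            heapq.heappush(self._heap, (priority, self._seq, item))
            self._seq += 1
        else:
            self._fifo.append(item)
        return True

    def poll(self):
        """Hand the next item to the downstream processor."""
        if self.prioritized:
            return heapq.heappop(self._heap)[2]
        return self._fifo.popleft()


fifo = ConnectionSketch(max_queued=2)
print([fifo.offer(m) for m in ("a", "b", "c")])   # the third offer is refused

urgent = ConnectionSketch(max_queued=10, prioritized=True)
urgent.offer("routine", priority=5)
urgent.offer("critical", priority=1)
print(urgent.poll())
```

The refused offer is the important part: instead of dropping data or growing without limit, the queue tells the upstream side to slow down until the downstream side catches up.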

00:16:37.960 --> 00:16:44.580
Um, you know, there's a lot of capabilities here, a lot of customization. That's part of NiFi

00:16:45.380 --> 00:16:47.940
Again, you know, that's the power of it

00:16:47.940 --> 00:16:52.960
But when we get down to some of the design principles and how to do things

00:16:52.960 --> 00:16:54.260
Um, you know

00:16:54.260 --> 00:16:59.740
We'll see this even in this class on some of the tasks that we will have to to build a flow

00:16:59.740 --> 00:17:01.840
And how they will be different

00:17:03.980 --> 00:17:05.100
All right

00:17:05.100 --> 00:17:10.240
We're getting close to getting done with the presentation and going on break

00:17:10.240 --> 00:17:14.080
Uh, and then when we get back from break, we'll work on getting NiFi up and running

00:17:14.080 --> 00:17:16.800
uh, but templates and version control so

00:17:17.560 --> 00:17:22.600
Uh, you know, templates in NiFi are predefined configurations of a data flow

00:17:22.600 --> 00:17:24.600
They can be saved and reused

00:17:24.600 --> 00:17:28.180
Um, you see this quite a bit, you know

00:17:28.180 --> 00:17:36.060
Most organizations have gone, you know, away from templates and gone to version control, as you can imagine

00:17:36.060 --> 00:17:41.800
Just because, you know, you can integrate this into your CI/CD process. You can

00:17:42.580 --> 00:17:44.300
um, you know templates

00:17:44.940 --> 00:17:52.140
You can't work with templates like you can with, uh, you know, a flow backed up into Git, or GitLab or GitHub

00:17:52.600 --> 00:17:56.520
You know, the NiFi Registry, which we'll also go over

00:17:56.520 --> 00:18:01.040
Uh in those types of things, but you know, you can create a template

00:18:01.740 --> 00:18:05.100
I like creating templates sometimes because you know

00:18:05.100 --> 00:18:10.440
I don't have to worry about the GitLab and the GitHub connection, those types of things

00:18:10.440 --> 00:18:13.720
I can go to the canvas. I can build my flow

00:18:13.720 --> 00:18:17.660
I can save it as a template and send it to my colleague for instance

00:18:17.660 --> 00:18:20.480
My colleague can quickly import a template

00:18:21.120 --> 00:18:26.880
That flow will be up and running on their canvas and and they can go from there. So

00:18:26.880 --> 00:18:29.540
So templates are are pretty important

00:18:29.540 --> 00:18:32.440
um, but you know here lately

00:18:32.960 --> 00:18:36.540
It's more and more about version control. So

00:18:36.540 --> 00:18:42.940
They encapsulate a set of processors, connections, and controller services for a specific task or workflow

00:18:42.940 --> 00:18:44.480
uh, you know

00:18:44.480 --> 00:18:51.880
Templates simplify the deployment of common patterns and promote best practices by allowing users to deploy tested flows quickly

00:18:51.880 --> 00:18:55.880
um, and and that's the key here is

00:18:55.880 --> 00:19:01.240
You know tested flows quickly. So if you develop a flow you can save it as a template

00:19:01.240 --> 00:19:08.000
export that as an XML file, send that to your colleague, and you should be able to

00:19:08.820 --> 00:19:15.920
Uh and quickly, you know get that flow up and running and and go from there. So, you know

00:19:15.920 --> 00:19:17.140
That's templates

00:19:17.140 --> 00:19:22.580
NiFi does integrate with NiFi Registry, which we will go over,

00:19:23.260 --> 00:19:26.100
which supports versioning of data flows.

00:19:27.060 --> 00:19:30.660
Version control is crucial for managing changes to data flows over time

00:19:31.360 --> 00:19:35.760
Allowing users to track modifications revert to previous versions

00:19:35.760 --> 00:19:39.760
Uh and ensure that the deployments across different environments are consistent
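
What version control gives you over a one-off template (tracking changes, reverting, and deploying the same version everywhere) can be sketched with a tiny in-memory store. This is not the NiFi Registry API; `FlowVersionStore`, its methods, and the dictionary used as a flow definition are all invented for illustration.

```python
import copy


class FlowVersionStore:
    """Toy versioned store: every commit keeps an immutable snapshot."""

    def __init__(self):
        self._versions = []   # list of (snapshot, comment), index 0 is version 1

    def commit(self, flow_definition: dict, comment: str) -> int:
        """Save a snapshot and return its version number."""
        self._versions.append((copy.deepcopy(flow_definition), comment))
        return len(self._versions)

    def checkout(self, version: int) -> dict:
        """Return a copy of the flow exactly as it was at that version."""
        return copy.deepcopy(self._versions[version - 1][0])

    def history(self) -> list:
        """List (version, comment) pairs so changes can be tracked over time."""
        return [(i + 1, comment) for i, (_, comment) in enumerate(self._versions)]


store = FlowVersionStore()
flow = {"processors": ["GetFile", "PutFile"]}        # hypothetical flow definition
v1 = store.commit(flow, "initial flow")
flow["processors"].insert(1, "UpdateAttribute")
v2 = store.commit(flow, "add attribute step")

print(store.history())
print(store.checkout(v1))   # revert: the first version is still intact
```

Because each environment can check out the same version number, a flow tested in one place is the same flow deployed in the next.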

00:19:40.320 --> 00:19:44.960
That's the key ones that we will be working off of. I know we have

00:19:46.960 --> 00:19:48.000
Let's see

00:19:48.000 --> 00:19:50.960
We have a few folks that i've written down that

00:19:52.040 --> 00:19:53.760
Um that would be interested in that

00:19:54.440 --> 00:19:57.140
Some sys admins and those folks

00:19:57.140 --> 00:20:01.980
Um, so the the main thing here is is we're going to look at we're going to touch on templates

00:20:01.980 --> 00:20:04.680
We're going to probably save a template but version control

00:20:05.480 --> 00:20:07.520
Will be our main

00:20:07.520 --> 00:20:10.120
Avenue of saving our flows and those types of things

00:20:12.200 --> 00:20:12.760
um

00:20:12.760 --> 00:20:17.180
And we'll go into using NiFi Registry for version control

00:20:17.180 --> 00:20:21.840
You know, NiFi Registry allows for the storing, retrieval, and managing of the versioned flows

00:20:22.520 --> 00:20:24.700
When we go to the NiFi desktop

00:20:25.480 --> 00:20:31.800
And after we get registry up and running you're going to be able to save your flow check them in check them out

00:20:31.800 --> 00:20:33.960
And those types of things

00:20:34.680 --> 00:20:42.540
And then we will talk about how you can version control those from Registry into your own GitHub or GitLab environment

00:20:42.540 --> 00:20:49.780
Um, I don't know if someone wants to let me know what environment you use so I can focus on that

00:20:49.780 --> 00:20:53.960
But, you know, we can work with a lot of different version control systems

00:20:56.580 --> 00:21:00.720
Okay, so, uh, let me see this other than chat

00:21:02.600 --> 00:21:03.160
Okay

00:21:03.960 --> 00:21:09.360
So, um, what I like to do is pause here before we go for a quick break

00:21:09.360 --> 00:21:10.880
um, but

00:21:10.880 --> 00:21:13.040
You know what challenges?

00:21:14.140 --> 00:21:18.860
Do you anticipate in implementing or migrating NiFi into your current workflow?

00:21:19.560 --> 00:21:23.780
You know, i'd like to hear from the group on some of the the challenges you may have

00:21:23.780 --> 00:21:30.040
um, and like I said that helps me in tailoring the conversation as well as

00:21:30.880 --> 00:21:32.040
um, you know

00:21:32.680 --> 00:21:35.960
What what we will be trained on so what you know

00:21:35.960 --> 00:21:41.460
What are some of your challenges in implementing or migrating to NiFi in your current process, right?

00:21:41.820 --> 00:21:44.280
And feel free someone just to start talking

00:21:45.740 --> 00:21:47.720
Um considering we're not running it at all

00:21:53.640 --> 00:21:54.580
Fear of the unknown

00:21:57.380 --> 00:21:59.880
We're deploying it with

00:22:02.300 --> 00:22:05.420
The thing we haven't gotten working is multi-tenancy

00:22:06.120 --> 00:22:09.480
So it's just it's still single user mode and it seems like any option we select

00:22:10.600 --> 00:22:13.080
Deploying it from a container. It's

00:22:13.080 --> 00:22:18.860
Single user mode. So i'm wondering if deploying it as a container single user mode is your only option

00:22:19.780 --> 00:22:20.340
um

00:22:20.340 --> 00:22:28.040
It is not but we will we will touch on that. Um, but I can I can understand that pain point as well

00:22:29.160 --> 00:22:30.940
um, okay, um

00:22:32.040 --> 00:22:38.640
One of the biggest challenges for us could be the you know, the cyber security, um aspect that you touched on at the

00:22:38.640 --> 00:22:39.620
beginning

00:22:40.160 --> 00:22:43.740
Okay, you know, I mean even though we know it's been ATO'd at

00:22:44.760 --> 00:22:49.320
multiple locations and all that, but you know, we still have to go through the whole rigmarole for

00:22:50.160 --> 00:22:52.400
our actual demand so

00:22:52.400 --> 00:22:54.440
Um, that's gonna be you know my challenge on

00:22:57.560 --> 00:22:59.480
so, um, no and all those are

00:23:00.320 --> 00:23:06.540
Good things and and I really like the fear of the unknown. Um, you know when we go through this I feel like

00:23:07.120 --> 00:23:08.800
you know, you'll get

00:23:08.800 --> 00:23:13.980
You know less of that fear, uh, just because once you see how easy it is to

00:23:14.800 --> 00:23:19.160
To operate and to start off, you know, I think it's pretty quick to get up and running

00:23:19.160 --> 00:23:25.580
Uh, then it becomes pretty deadly because of all the capabilities and the options you may have and it gets a little overwhelming

00:23:26.220 --> 00:23:30.300
But those are some things I will definitely touch on the multi-tenancy

00:23:31.060 --> 00:23:36.140
Is not necessarily in this class, but what I will do is take that back

00:23:36.820 --> 00:23:40.660
um, and i'm going to work that in for like tomorrow or wednesday

00:23:40.660 --> 00:23:46.460
uh to definitely go over some of that and what that would look like. We do have Docker Desktop on our

00:23:47.440 --> 00:23:48.020
um

00:23:48.020 --> 00:23:54.680
All of our VMs, and so, you know, we can touch on that and see how that works

00:23:54.680 --> 00:23:59.600
Um, and then definitely we can hit some security aspects all day long. Um,

00:23:59.780 --> 00:24:05.760
Okay, how can the features of NiFi, such as data provenance, you know, enhance your data governance practices?

00:24:06.320 --> 00:24:11.780
And I ask that because you know, i'm trying to get a better understanding of you know, some of the data governance

00:24:11.780 --> 00:24:16.360
You know requirements you may have uh, you know some of the thoughts, you know, there's

00:24:17.640 --> 00:24:21.700
Um, you know, there's big data governance packages that are out there

00:24:21.700 --> 00:24:25.980
Um, you know, do you have those types of requirements, you know

00:24:25.980 --> 00:24:31.880
Those types of things, because it helps me kind of tailor this to what you can expect. Um

00:24:31.880 --> 00:24:37.640
Anybody want to speak on their data governance practices and how NiFi, you know, to get

00:24:37.640 --> 00:24:43.100
You know some additional information on enough off of that off the top of my head. No, but when you were talking earlier

00:24:43.660 --> 00:24:44.140
about

00:24:44.140 --> 00:24:47.420
I don't know what some of the telecom and all them were doing

00:24:47.420 --> 00:24:52.800
One of the ideas I thought that I had in my head was like, you know getting event logs or whatever getting

00:24:53.600 --> 00:24:57.100
NiFi, and I know from like the central log server SRG

00:24:57.100 --> 00:25:01.020
They want to make sure that the data hasn't been modified and all that stuff

00:25:01.700 --> 00:25:07.360
Um, so I think like if we went down a road like that data governance could help in that aspect

00:25:08.060 --> 00:25:14.060
Um, but for like the test community, uh, that would have to be answered by Tyler or Randy

00:25:15.220 --> 00:25:17.860
Well, that's a good point that's a that's a good point

00:25:20.520 --> 00:25:24.100
So that chain of custody right, you know that

00:25:24.980 --> 00:25:31.460
You can see that data if it was if it was manipulated with their security aspect behind it

00:25:32.780 --> 00:25:34.260
um, and then honestly

00:25:35.140 --> 00:25:41.240
That's why telecoms are using this. Um, you know because of some of those capabilities

00:25:42.280 --> 00:25:45.800
Yes, it was like in the SRG they wanted message hashing

00:25:47.220 --> 00:25:53.560
You know, uh, you know, um digest all that all that, you know fun stuff and associating it to that particular message

00:25:54.280 --> 00:25:55.760
um, and it's not

00:25:55.760 --> 00:25:59.180
Easy to do with like our syslog and stuff like that. So

00:25:59.180 --> 00:26:00.960
This could us, you know

00:26:00.960 --> 00:26:06.860
Maybe this is something that we could look into at some point that could help us close those gaps
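
The message hashing idea raised here, computing a digest and associating it with that particular message, can be sketched with Python's standard `hashlib`. This is only an illustration of the concept, not a NiFi configuration or an SRG-compliant implementation; the attribute name `content.sha256` and both helper functions are hypothetical.

```python
import hashlib


def attach_digest(content: bytes, attributes: dict) -> dict:
    """Return a copy of the attributes with a SHA-256 digest of the content added."""
    stamped = dict(attributes)
    stamped["content.sha256"] = hashlib.sha256(content).hexdigest()   # hypothetical name
    return stamped


def is_unmodified(content: bytes, attributes: dict) -> bool:
    """Recompute the digest and compare it with the one recorded at ingest."""
    return hashlib.sha256(content).hexdigest() == attributes.get("content.sha256")


message = b"<34>Oct 11 22:14:15 host su: 'su root' failed"
attrs = attach_digest(message, {"source": "syslog"})

print(is_unmodified(message, attrs))                 # same bytes as at ingest
print(is_unmodified(message + b" tampered", attrs))  # content changed afterwards
```

Carrying the digest as metadata next to the message is what lets a later step check the chain of custody without trusting anything in between.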

00:26:07.480 --> 00:26:09.260
Oh, definitely

00:26:09.260 --> 00:26:12.680
Okay, anybody else with, uh, some of your data governance?

00:26:12.680 --> 00:26:20.340
Perfect. Um, are there any specific processes in your operation that could immediately benefit from the NiFi capabilities?

00:26:20.440 --> 00:26:25.780
And I know that that's kind of broad but you know where you know, I like to hear from the audience

00:26:25.780 --> 00:26:30.700
Where do you see NiFi fitting in, and how can it fit in, and how can it just help you

00:26:31.460 --> 00:26:32.960
You know do that data orchestration?

00:26:34.520 --> 00:26:36.640
I bet I think bam's taking a break

00:26:40.060 --> 00:26:44.780
Uh, we have a compliance database written in Access that we use that pulls from

00:26:45.500 --> 00:26:50.380
A bunch of different sources, typically using REST API. Um, and all that stuff's done manually

00:26:51.000 --> 00:26:53.800
Oh, wow, I feel like that that's really low-hanging fruit

00:26:53.800 --> 00:26:54.820
Wow

00:26:55.880 --> 00:26:56.500
Yeah

00:26:57.180 --> 00:27:00.140
It's been done manually by somebody on that team for

00:27:01.040 --> 00:27:02.680
quite a while now

00:27:02.680 --> 00:27:06.500
And that actually touches on some of the

00:27:06.500 --> 00:27:08.680
things I initially kicked this off with, where

00:27:08.680 --> 00:27:15.880
You know, we see scripts, just a single Python script running just to do something, right? And

00:27:15.880 --> 00:27:21.560
You know, it seems kind of small and you're putting this big project in front of it

00:27:21.560 --> 00:27:27.260
But you know really understanding, you know those data sources getting those data sources into

00:27:27.260 --> 00:27:30.180
Did you say Access, like Microsoft Access?

00:27:34.420 --> 00:27:38.180
Next you're going to tell me you use Excel. Um

00:27:41.120 --> 00:27:47.020
So, uh, you know, just, yeah, being able to read the data from, uh, Access, push that data in, and keeping those

00:27:47.020 --> 00:27:49.580
you know keeping the

00:27:49.580 --> 00:27:50.520
um

00:27:50.520 --> 00:27:56.200
A record of all of that, you know is definitely needed. I think it will help you with you know

00:27:56.200 --> 00:28:01.560
Some of your compliance issues, uh, it'll help automate that. Um, it'll you know

00:28:01.560 --> 00:28:05.800
There's a lot of rules and a lot of triggers and things like that you can build in

00:28:06.380 --> 00:28:08.020
um, you know and so

00:28:08.580 --> 00:28:10.120
Perfect. Okay
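[Editor's sketch] The manual "pull from several REST sources into one compliance database" task discussed above could be automated along these lines. This is a minimal Python sketch, not the participant's actual process: the two fetch functions are hypothetical stand-ins that return canned data instead of calling real REST APIs, and SQLite is used in place of Access purely so the example runs on its own.

```python
import sqlite3
from datetime import datetime, timezone


# Hypothetical stand-ins for REST calls; real code would issue HTTP requests here.
def fetch_training_records():
    return [{"id": "u1", "status": "complete"}, {"id": "u2", "status": "overdue"}]


def fetch_patch_records():
    return [{"id": "host-a", "status": "patched"}]


# Assumed source names; each maps to the function that retrieves its records.
SOURCES = {"training": fetch_training_records, "patching": fetch_patch_records}


def load_all(conn):
    """Pull every source and store its rows with the source name and fetch time."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS compliance "
        "(source TEXT, item_id TEXT, status TEXT, fetched_at TEXT)"
    )
    fetched_at = datetime.now(timezone.utc).isoformat()
    for name, fetch in SOURCES.items():
        rows = [(name, r["id"], r["status"], fetched_at) for r in fetch()]
        conn.executemany("INSERT INTO compliance VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load_all(conn)
for row in conn.execute(
    "SELECT source, item_id, status FROM compliance ORDER BY source, item_id"
):
    print(row)
```

Recording the source and fetch time on every row keeps the kind of record the instructor mentions as useful for compliance.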

00:28:10.120 --> 00:28:16.640
Uh, you know, what else, what other immediate benefits do you guys hope to get from NiFi?

00:28:17.020 --> 00:28:19.460
For us at the data processing branch, we're working on a more

00:28:20.940 --> 00:28:26.900
Long-term process so it's not really immediate but it's it's our only use case right now, but it's

00:28:27.980 --> 00:28:34.260
Or essentially a real-time data stream and pipeline from one of our test sites to uh do some

00:28:35.420 --> 00:28:42.300
Uh verification on data so running it through some machine learning models to identify like bad sensor data

00:28:43.320 --> 00:28:43.860
uh

00:28:44.380 --> 00:28:52.300
And do some just data verification while it's coming through the pipeline so we can sort of facilitate an automated QA on the data
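[Editor's sketch] As a rough idea of what in-pipeline checking for bad sensor data can look like, here is a small Python sketch. It swaps the machine learning models the speaker mentions for a much simpler statistical check (a modified z-score based on the median absolute deviation); the function name, the threshold, and the sample readings are assumptions for illustration only.

```python
from statistics import median


def flag_bad_readings(readings, threshold=3.5):
    """Return indices of readings whose modified z-score exceeds the threshold."""
    med = median(readings)
    mad = median(abs(r - med) for r in readings)  # median absolute deviation
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    return [
        i
        for i, r in enumerate(readings)
        if 0.6745 * abs(r - med) / mad > threshold
    ]


# Made-up temperature readings with one obviously bad value at index 4.
temps = [20.1, 20.3, 19.9, 20.2, 85.0, 20.0, 20.4]
print(flag_bad_readings(temps))  # prints [4]
```

A check like this could run on each batch as it moves through the flow, with flagged readings routed aside for review.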

00:28:52.300 --> 00:28:54.480
Oh very nice

00:28:58.220 --> 00:28:58.900
Okay

00:29:00.600 --> 00:29:01.200
Um

00:29:02.860 --> 00:29:07.460
I've heard potentially you also, and this may be related to

00:29:07.460 --> 00:29:10.220
the real-time pipeline

00:29:10.220 --> 00:29:17.060
But you know you're you're trying to get data from a talk or get data to a talk smartly filter those out

00:29:17.980 --> 00:29:22.180
Um, you know those types of things, you know, some of the questions

00:29:22.180 --> 00:29:28.520
previously were like, you know, how do you get MiNiFi to pull that data in

00:29:28.520 --> 00:29:34.680
Send it to your talk, your talk can filter, what that kind of architecture looks like. So

00:29:35.620 --> 00:29:39.240
Uh, I took note of that previously, but I think that's still valid

00:29:41.380 --> 00:29:43.140
Yes or no

00:29:43.140 --> 00:29:46.640
Yes, so right now our plan for that is to use MiNiFi

00:29:46.640 --> 00:29:51.580
For the ingestion on the instrumentation side now to put that into the

00:29:52.980 --> 00:29:58.800
Oh, beautiful. Those, um, where you plan to use MiNiFi, what is the

00:29:58.800 --> 00:30:01.820
Is it like an edge device running Linux?

00:30:03.160 --> 00:30:05.720
Uh, is it like a Windows laptop?

00:30:06.220 --> 00:30:10.800
Uh, you know, can you go if you can go into details? What you know, what does that kind of look like?

00:30:14.560 --> 00:30:20.300
But there could be future use cases on some more restricted instrumentation works, you know

00:30:20.300 --> 00:30:21.700
Possibly a microcontroller

00:30:23.440 --> 00:30:26.380
One of those things, okay

00:30:28.000 --> 00:30:30.260
All right, and then um

00:30:30.260 --> 00:30:36.880
How might you use NiFi's scalability and flexibility to improve the data handling and processing in future projects?

00:30:37.560 --> 00:30:43.100
So, you know, and I ask this question because I'm trying to, I think it was

00:30:43.100 --> 00:30:43.400
Was

00:30:45.920 --> 00:30:49.280
Sean? No, Sean's a dev. Uh, Amanda

00:30:49.280 --> 00:30:50.360
um

00:30:50.360 --> 00:30:52.760
And Aaron, you know, SAs

00:30:52.760 --> 00:30:57.200
Uh looking at you know deploying this in a multi-tenancy

00:30:57.920 --> 00:30:59.120
scalable fashion

00:30:59.740 --> 00:31:03.360
And so that's why I'm asking is, you know, how might you use

00:31:03.620 --> 00:31:06.120
How do you plan to use NiFi for this?

00:31:07.000 --> 00:31:07.440
um

00:31:07.440 --> 00:31:12.480
And that'll kind of help me tailor the conversation when we go into some of the scalability

00:31:12.480 --> 00:31:15.200
Some of the flexibility and stuff

00:31:15.200 --> 00:31:17.340
Right you want to take that one?

00:31:19.160 --> 00:31:23.400
You know, I don't I don't think we really have a use case for the scalability

00:31:23.400 --> 00:31:29.420
But I can say that we're designing it in this way to account for, there's a lot of data analysts at YPG

00:31:29.420 --> 00:31:30.780
And we're expecting

00:31:31.740 --> 00:31:32.180
uh

00:31:32.800 --> 00:31:35.940
A lot of people to when they see the platform want to use it

00:31:35.940 --> 00:31:38.160
So we're trying to design it up front to be able to

00:31:39.040 --> 00:31:39.720
Be scalable

00:31:40.280 --> 00:31:42.280
But I would say our immediate use case

00:31:43.220 --> 00:31:43.740
um

00:31:44.440 --> 00:31:46.480
Doesn't really need that. Okay

00:31:47.620 --> 00:31:52.220
Okay, Tyler, uh, do you have that scalability issue?

00:31:54.520 --> 00:31:59.320
Um, we definitely will have a need for scalability in the future. I mean, I don't know

00:32:00.160 --> 00:32:04.280
Exactly what that looks like yet, but there's um

00:32:04.280 --> 00:32:05.460
several different

00:32:06.080 --> 00:32:08.060
locations that will be

00:32:08.060 --> 00:32:11.500
Creating a lot of data throughout the day at least for this

00:32:12.420 --> 00:32:14.920
Initial project is just going to be sort of you know

00:32:14.920 --> 00:32:20.560
One site and then it's going to expand out to multiple sites for for sort of a single mission area

00:32:20.560 --> 00:32:24.120
And then might move out to to more sites. So

00:32:24.120 --> 00:32:28.540
Scalability initially isn't going to be extremely important. But as it goes on, um

00:32:29.300 --> 00:32:31.420
There's probably going to be quite a few workflows

00:32:32.600 --> 00:32:36.480
So I can imagine being significantly more important in the future

00:32:37.440 --> 00:32:37.800
Okay

00:32:39.020 --> 00:32:44.240
so one of the one of the use cases that I think we might have in the future is um

00:32:44.240 --> 00:32:46.280
We have a data lake that's going to be in the cloud

00:32:46.280 --> 00:32:49.520
But there's a lot of talks from our chief data officer about having an on-prem data lake

00:32:50.240 --> 00:32:50.740
Oh very nice

00:32:53.180 --> 00:32:55.580
Um get that data both places

00:32:56.380 --> 00:33:02.960
Um the test data and if that was the case we produce a lot of data. So I think we would definitely need to

00:33:02.960 --> 00:33:10.820
And what, um, what storage, like what database and storage solution are you looking at for your

00:33:10.820 --> 00:33:12.500
You know, your on-prem?

00:33:16.080 --> 00:33:16.680
You know

00:33:18.720 --> 00:33:19.240
Yep

00:33:20.480 --> 00:33:21.000
Okay

00:33:22.000 --> 00:33:26.060
Um, you know, the beauty of NiFi is it does have an S3 processor

00:33:26.060 --> 00:33:30.680
You know, it has an Azure Blob Storage processor, you know, and those types of things

00:33:31.540 --> 00:33:32.060
um

00:33:32.060 --> 00:33:33.320
Since you're using MinIO

00:33:34.940 --> 00:33:35.560
Correct

00:33:37.200 --> 00:33:43.980
Oh, so there's processors for both of them, so perfect. The MinIO one doesn't

00:33:45.140 --> 00:33:51.500
Uh, I don't think it comes out of the box yet, but it is available as a processor, uh, on GitHub

00:33:52.180 --> 00:33:57.740
Uh, you know, so so perfect. No, I like that. I've actually seen that quite a bit

00:33:58.540 --> 00:34:01.620
Uh, lately with MinIO. It's, you know, folks

00:34:02.060 --> 00:34:05.500
Coming out of the cloud and still you know, kind of keeping it local for

00:34:06.180 --> 00:34:13.020
You know security reasons compliance reasons and and just uh, you know overall process network activity. Yep

00:34:13.020 --> 00:34:13.900
Exactly

00:34:14.900 --> 00:34:15.460
exactly

00:34:16.120 --> 00:34:20.820
Uh, no, and we will uh, i'll make sure to touch on some of those things

00:34:21.440 --> 00:34:25.860
Um, you know, as we go through and start building flow files and

00:34:25.860 --> 00:34:29.980
You know those types of stuff. Um, we could potentially even

00:34:30.700 --> 00:34:36.080
On the third day, uh, do a flow where we pick data up and put it to MinIO

00:34:38.740 --> 00:34:39.260
Okay

00:34:39.880 --> 00:34:42.900
All right. Well that being said, um

00:34:42.900 --> 00:34:47.780
Let's take our first break. I need to get water since I'm talking a lot

00:34:47.780 --> 00:34:50.760
Um, I want to make sure I keep my voice throughout the day

00:34:50.760 --> 00:34:55.720
Uh, let's take a 15-minute, uh, you know, rest and bio break

00:34:55.720 --> 00:34:59.260
Uh, get some water. We'll meet back here at 11:50

00:34:59.260 --> 00:35:04.300
Uh, and that's 11:50 my time. I think it's 9:50 your time

00:35:06.420 --> 00:35:06.980
And

00:35:06.980 --> 00:35:13.440
Um, then we'll go through installing NiFi on Windows and start working on building our first flow

00:35:13.440 --> 00:35:16.520
So, uh, we'll see everybody back here in about 15 minutes

00:35:17.160 --> 00:35:21.520
And if you need anything, just put it in the chat. I'll be running back and forth with getting water

00:35:21.520 --> 00:35:22.780
and restroom

00:35:24.020 --> 00:35:25.520
All right, see you continue

00:35:40.300 --> 00:35:43.080
My best is

00:35:50.340 --> 00:35:53.980
And then we are going to get started on installing NiFi

00:35:55.020 --> 00:35:57.100
I don't know if you're back or not, but I really like to

00:35:58.020 --> 00:36:00.480
Wait till you hear about our processes

00:36:01.660 --> 00:36:10.100
So being I don't know if anybody's here but being uh a former soldier being within the army itself so many years now

00:36:10.100 --> 00:36:17.300
I completely understand the nuances

00:36:18.260 --> 00:36:24.660
While we wait, I'll give it a couple more minutes. Um, usually, you know, during software training

00:36:24.660 --> 00:36:27.180
You just kind of run through the software

00:36:27.300 --> 00:36:32.340
Uh, but I felt it was pretty critical for us to actually do an install within Windows

00:36:32.340 --> 00:36:35.620
Just so everyone has that experience

00:36:36.800 --> 00:36:40.540
If you're going to be working within NiFi, even in a local environment

00:36:41.140 --> 00:36:45.440
Who knows, you may want to spin up your own instance on your own laptop

00:36:46.360 --> 00:36:51.320
Get it working get your flow built, you know test some things out save it as a template

00:36:51.320 --> 00:36:56.520
Uh, and then you know export that to your dev environment your test environment

00:36:56.520 --> 00:36:58.040
You know those types of things

00:36:58.660 --> 00:37:02.120
When we are and i'll go over this, you know in detail

00:37:02.120 --> 00:37:06.820
but when we're installing NiFi there's some key things to take a look at because

00:37:06.820 --> 00:37:10.520
There are some specific directories being created

00:37:10.520 --> 00:37:17.060
Um, and there's a reasoning behind that. There are some specific directories that you will

00:37:17.060 --> 00:37:22.200
Need to understand and learn about as well. So that's one of the reasons I like to

00:37:22.200 --> 00:37:25.020
to really go in depth and

00:37:25.540 --> 00:37:29.900
Um, I'm taking a risk here because I don't have it installed, because I'm gonna walk, you know

00:37:29.900 --> 00:37:31.220
We're going to all do it together

00:37:31.720 --> 00:37:34.700
Uh, I do have Java on everyone's machine

00:37:35.360 --> 00:37:39.740
Um, so we'll go through some of the basics, but we'll give it just another minute

00:37:39.740 --> 00:37:42.460
And then we'll get started then if you're back, um

00:37:43.920 --> 00:37:46.500
Can you just let me know, like

00:37:47.220 --> 00:37:52.540
Um, how long you all think you need for lunch? Like I said, around 45 minutes is

00:37:52.740 --> 00:37:57.920
Kind of what I like to go off of, but I can do an hour as well. No problem

00:37:58.580 --> 00:38:00.880
Right. Yeah. Um, so

00:38:00.880 --> 00:38:05.540
Actually, we usually just do 30 minutes. 45 minutes is fine. Um

00:38:06.180 --> 00:38:11.940
Whatever people need. Okay. Okay, we'll do 45, and, um

00:38:13.100 --> 00:38:16.700
That will give you the capability to eat and then also

00:38:16.700 --> 00:38:21.660
You know play around with whatever we've already built and done because you're gonna have

00:38:22.300 --> 00:38:25.300
You're gonna have this desktop environment throughout the training

00:38:25.820 --> 00:38:33.080
Um, and you do have the capability to download any information that you have there. Uh, you can

00:38:33.700 --> 00:38:39.560
Um, you know, I'll upload the presentation as well so you can have it, you know, on the desktop environment

00:38:39.560 --> 00:38:44.700
So there's a lot of capabilities, but we will go ahead and get started. Let me exit all this

00:38:47.160 --> 00:38:50.340
Okay, so if everyone can

00:38:51.060 --> 00:38:56.020
Go ahead and start working off your desktop. I'm sharing my screen

00:38:56.020 --> 00:39:02.020
Um, but you know, uh, if you can let's go ahead and get logged into the desktop environment

00:39:02.020 --> 00:39:04.820
Um, let me see if I can pull everyone up

00:39:06.040 --> 00:39:09.200
Looks like everyone is good to go