Hey, good morning, Amy. How's it going?

It's going okay. I wish I was a lot further along than I am; I'm still trying to figure this out. Basically I had the same thing from yesterday, except I didn't remove some of the extra processors, so I had to introduce some folders. For example: getting the files, routing on attributes based on the file type, right? The CSV gets processed, and the JSON gets put into another folder, which is kind of an extra step. I think you guys didn't do that, but I'd already done it, so I just let it be. Same thing with the CSV: it gets put into a directory, and then you start splitting it, evaluating the path, breaking it up into different records. From here, I wanted to merge all records into one file and then start evaluating, which I never quite figured out. So I tried to do what you guys were doing over here, either MergeRecord or MergeContent, and then process from there. But I went back and did what you guys were doing: putting attributes to CSV, updating the file name, and then putting the file here. So that's what I have.

Awesome, awesome. And I see you've got some labels; you're trying to beautify the flow. You can see how it can get out of hand very quickly. But you're doing something that I like to do when I'm developing a flow. It looks like you brought down processors that you think you might need or might use, and you put them off to the side. And then, if I'm not using one, I delete it. I do the same exact thing. If you're looking at one of my flows while I'm building it out, I'm thinking it through: okay, I've got the data this way, I need it this other way, what processor can handle that? I'll bring down a couple of processors and play around with them. And when I get through testing that section, I'll go through and start deleting the unused processors.

Do remember: anytime you are writing to and reading from disk, it's a much slower operation than processing the data as it comes through. So try to limit the slower operations anywhere you can. Zipping and unzipping files is slow and uses a lot of resources. Reading and writing from disk can be slow as well, because you're reliant on that disk's speed to put and retrieve the files. So try to limit interactions outside of NiFi and keep it within NiFi. Also remember: anytime data leaves NiFi, say when you drop it off on the file system and pick it up later, it's also leaving data governance. It's not being tracked. If there were another process sitting there reading that directory of files you're waiting to pick up, something could happen to the data. So keep that in mind: if it goes outside of the NiFi ecosystem, you're losing provenance and lineage and those types of things. There are use cases where you would need to write to disk and pick it up later, but try to limit that.

I like how you're doing the RouteOnAttribute; we worked on that yesterday evening to sort and filter those files.
I like your thought process of getting it all together into one file and then making decisions from there. You were able to take your two different types of files, CSV and JSON, and bring it all into a common format, all into one document. And I can see that from there you would make some decisions or calculations. You might write it to CSV, and you already have an Excel template that will open it up and do its thing. So overall, again, I didn't expect anybody to really complete this one, but you did tell a good story about how you were going to handle it. It looks like you also learned how to move your lines around and things like that. You've got a lot of failures going to that log message over there; once you got it cleaned up, I think it would look great. So overall, a great job, Amy.

Could I have done it from the evaluate, merging the records and using that instead of going through all this? A MergeRecord, to combine it into one file and process it?

You could. Yeah, you could. Let me bring yours back up. Once you had everything as JSON, for instance, you could have sent it to a MergeRecord. And here's the beauty of MergeRecord and those types of processors: you get access again to the controller services, right? If you wanted to, you could have a schema for that data and it automatically handles the formatting, like we're doing with the CSV and JSON. So yeah, you probably could have skipped three or four processors right there and sent it to the MergeRecord. But you are going down the right path.

Okay, thank you.

You're welcome. No, thank you. Good job. Peter, let's see what you got. If you can, Peter, just walk me through your flow, your thought process, the scenario.

So, yeah, I didn't get any further than that. Same exact thing: I got it to have all the attributes, convert them back to CSV, rename them, and then export them.

Oh, you didn't do any of the other analysis the PDF was asking us to look at?

Yeah. 

And that's okay.

It starts up here at the top. Basically, from here it goes straight down for the most part. Here it grabs all the CSV files in the directory. I'm breaking it down at every step just to see where each step is going, so I have an original-files folder: it takes those original files and puts them over in that folder. It updates the schema metadata, converts them to JSON, updates the attributes, and also pushes those converted files over to a converted-files folder. It goes down to the SplitJson, which is where this other GetFile is introduced: it grabs the JSON files directly and pushes those over to the SplitJson. Then it goes down to EvaluateJsonPath, converts the attributes back to CSV, updates the name, and then exports them. Everything else is just going over to this log error message. All of that is running right now, so I'll give it a quick demo.

Very nice.

I copy these files into the data directory where it's grabbing all the data files from, paste them in there, and it all runs through really quickly, so we can see we already have all these files right here in the final data files folder.

Oh wow.
Yeah, it's almost instant.

Yeah, I'm impressed by that. It's very quick. I like that. If you had had time, what are some of the next steps you would have taken?

I would definitely figure out a better way to name the files, in a way that tells users what they are. And I also noticed they're exporting a bit strangely. These aren't really CSVs, at least not the way I'm used to them, with the header information and then columns separated by...

No, that's a JSON document.

Okay, so I guess I messed up on that conversion then.

So did you save everything as an attribute?

EvaluateJsonPath, then AttributesToCSV.

Can we look at your EvaluateJsonPath real quick? You're grabbing them all and you're setting them. So now everything is an attribute. Okay, say OK. And then you're taking all of those right into CSV. Okay, can you stop your set-JSON-file-name processor? Let's just run it real quickly to see what you're getting. I want to see if you have an attribute, and what the attributes are saying, because the reason I like doing AttributesToCSV or AttributesToJSON is that you should have additional data you can use. So let's see what you've got. You'll have to pump more files into your directory. There you go. All right, we're queued up. Go ahead and list that, and just go to the far left little icon. No, far left: the information. There you go.

Okay, so it does have the CSV data. Actually, it does have the attributes. If you go down, you have hour, humidity, wind speed, temperature, station ID. So you are getting those attributes, but you're also getting the CSV data attribute that is putting all your attributes in this... oh, hit OK. I bet you have a processor sitting before this. Hit X. There you go. You are splitting your JSON, then EvaluateJsonPath, convert CSV to JSON, set JSON. Let's look at the configuration for your AttributesToCSV.

So you see where it says Destination? You're doing flowfile-attribute. You want to do flowfile-content. We've already got all of the attributes on the flow file; now we want to write them out as the content. We've got all the attributes stored already; now you want to put them back into the flow file. I bet if you were to do that, it would start writing out your CSVs.

Now, that CSV is going to take all of those attributes, and you've got a pretty substantially sized CSV data attribute. So here's what you could do. If you notice in the AttributesToCSV, we left the Attribute List, the attributes to write, blank. You can actually specify a comma-separated list of just the attributes you want written. And because you can specify that, you can put them in the order you want: maybe you want station ID listed first and date listed second. You can do that ordering. You see the Attribute List has no value set, so it's going to take every attribute and write it as CSV. But one thing I noticed is that you have an attribute called CSV data that has the data in it. If you were to write everything as CSV, it may not be a proper CSV, just because you've got a lot of values in that one attribute.

Okay.
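[Editor's note: to make the Attribute List behavior concrete, here is a minimal plain-Python sketch of what AttributesToCSV effectively does when Destination is flowfile-content and an explicit list is set. The attribute names and values are invented for illustration, not taken from the class flow.]

```python
# Hypothetical stand-in for AttributesToCSV with Destination = flowfile-content.
attributes = {
    "station_id": "KATL",          # assumed attribute names for illustration
    "date": "2024-05-20",
    "temperature": "87",
    "humidity": "45",
    "csv_data": "87,45,KATL,...",  # the oversized attribute worth excluding
}

# An explicit "Attribute List" picks which attributes are written, and in what order.
attribute_list = ["station_id", "date", "temperature", "humidity"]

csv_line = ",".join(attributes[name] for name in attribute_list)
print(csv_line)  # KATL,2024-05-20,87,45
```

Leaving the list blank is the equivalent of dumping every key in `attributes`, including the oversized CSV data attribute, which is exactly what would break the output.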
So I would see where you're picking that up and try to either remove it or specify your attribute list here, and then write that as CSV.

Okay.

And then you wrote all your files to disk, and I think that looks good. The one thing I would take a look at as well, and this is just like everyone: once you start adding multiple different processors, it starts getting out of hand. You've got lines going everywhere, those types of things. So when you get a chance to clean it up, one thing you might think about is busting up this flow. You might put your GetFile, the actual getting of the file and the sorting of the files, in one processor group. You could then put the conversion of CSV to JSON in its own processor group, the evaluating of the JSON and building your attributes in another group, and the writing of that data to disk in your fourth group. That would break this up even more. And then you could use input and output ports, which I'll show you on mine, to connect those together. But overall, you nailed it. You got extremely close, and I think if you had had more time, you would have been able to write this to disk and do some analysis. You could have used maybe some other processors and regex to do things. But you got the principle, you got the concept, so great job. Any other questions I can answer on this flow?

Okay, yeah, thank you. Yeah, that makes sense: you can just group them all together and then use these ports to connect the different groups.

Correct. I was actually doing this to mine, so I can point it out. If you're seeing my screen, I forgot to delete this and this, but I worked on cleaning my flow up. What I did is I set the JSON file name and connected it to another processor group. Then I've got an input port coming from that, going to my PutFile and then to my log error message. Now I'm starting to break this up, and I can group each of these different colored sets of processors into its own group, utilizing input and output ports to receive or output the data. That's the nice thing about it: it helps clean it up and organize it. And you may have a standardized way of writing data, even if it's not a controller service, and so you can have input ports from multiple different processor groups, as well as different organizations, all using that same method to write data. So, just tips and tricks if you want to clean this up, some things to think about when you're building these in the future.

Okay, yeah, that sounds useful. Thank you. I don't think I really have any other questions right now. Like you said, with more time, I'd go into setting up the alerts and stuff like that. There's definitely a lot here that just needs a lot of practice, to get used to what we're able to do and how to do everything.

It is.
And this is for everyone: everyone got NiFi up and running on their local machine relatively easily. I find in my day-to-day job, whatever job hat I'm wearing, that I sometimes have large amounts of files to deal with. I'll actually start NiFi, build a quick flow to handle the processing of that data, and then shut it back down. You may run into scenarios like that, where you think: wait a minute, let me just open up my local NiFi instance and process all these zip files that were sent to me, and get more practice at NiFi and building these things. There are times when I could write a script to solve a problem, but I'm like, well, let me build a data flow; that way it provides me more insight into what's going on.

We could have scripted this out. This whole thing could have been scripted in Python, for instance. You could have picked the files up in Python, built logic to sort and filter, all of that. But you wouldn't have gotten the security, the data governance, all these things that NiFi is providing. So, a little tip and trick: I like to keep it installed locally, and I'll start it and play around when I have a use for it.

But yeah, I think if you had had time, you would have been able to send an email alert, even though there's no email server, something like that. So, great job. Just keep in mind: you've got your processors named, and you've got some of the colors and the images and labels being applied at the very end, before it's ready to go. Clean it up and make sure it's readable. That way, if a sysadmin logs in and asks, hey, what's this flow doing, they can take a quick look and say: okay, that's the flow, and that's what it's doing. It's easily understandable. But great job. Any questions overall that I can help with?

I don't think so.

Okay. Perfect. Thank you. Yep. Odarius, how are you doing?

Hey, I'm good. I think I was missing a few steps on mine. But pretty much, I used the GetFile to get the data. I had it route on the attribute, and it goes to the set schema metadata, which goes to the convert CSV. I don't know why I put it all the way over there, but it converts the CSV to JSON. It splits it. It sends it to the evaluate. From there, I guess it sends it to ExecuteScript. I think there might have been another processor that would have been easier than this one.

Oh, nice. Look at you.

It's not complete; it was just kind of an idea. I think one of the requirement scenarios wanted you to categorize weather conditions, so I was just searching based on precipitation or temperature. From the documentation, this was how you get one value out. I'm not sure if this line would work, but if it did, it would load the flow file as JSON, you search through the keys for precipitation or temperature, and you get those values.
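[Editor's note: as a hedged illustration of the lookup Odarius is describing, not his actual script: inside an ExecuteScript body, or any Python, loading the flow file content as JSON and pulling out the keys of interest might look like the sketch below. The key names and sample record are assumptions taken from the scenario.]

```python
import json

# Hypothetical flow file content: one weather record as JSON.
content = '{"station_id": "KATL", "temperature": 94, "precipitation": 0.1, "humidity": 45}'

record = json.loads(content)

# Search the keys for the fields the scenario cares about.
wanted = ("precipitation", "temperature")
values = {key: record[key] for key in wanted if key in record}
print(values)  # {'precipitation': 0.1, 'temperature': 94}
```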
And then later on, I guess I would try to compare them, categorize the conditions that way, and then find a way to append it to the file. I think there are other processors that can do that.

Yeah, there's merge; there are other ways. I really like this. I really like that we did not go over this, so you thought outside the box. Well, we went over it a little bit. I like some of NiFi's capabilities for executing these scripts and executing binaries on the file system and such. I'm really excited for NiFi 2.0, because I love Python, and you can actually develop your own custom Python processor now very quickly. So I can envision you in the future taking this, and you may have a weather-condition-category processor that takes all your data and spits out values for alerts. And others could use that same processor to do the same. I think you did an amazing job. After you got the alert, what would you do with it?

An alert about the weather condition?

Correct, yeah. Say you had time, you ran the script, and you were able to generate some sort of alert. What do you think would happen next?

I'm not sure. I guess it would be another processor, maybe.

Yeah, I mean, you have an email processor. Type "email". Yep, you can put email: send an email to recipients in a flow file. Of course, you would need an underlying email server. One of the cool things is that it does have processors to handle email. I know of a few companies running NiFi to intercept all email, scoring it for spam and doing additional antivirus checks and things like that. So yeah, you could have put it to email. There also used to be an actual texting processor, and it's still available and still updated through some of the texting services. It's not part of the official NiFi package, but there's a texting processor that will send a text. So you could have taken that alert and made a text message that said, you know, temperature is 99 degrees and it's hot today, or something like that. Or you could send an email.

So no, I like where you're going with this. I like the thinking outside the box. Work on cleaning it up; your set schema metadata is way up there, right? Try to make it so you're reading left to right or up and down. But I think if you had had time, you could have finished this up, cleaned it up, and it would be a great, functional data flow. So great job. Any questions I can answer right now?

I don't think so. This is the right processor to write it to the disk, right?

This, yeah: PutFile. You've got your GetFile and you've got your PutFile. And then you could have taken this data and pushed it to a database, right? You've got those processors. And this is a real-world scenario: you're receiving different data from different sources, bringing it together, getting it into a common format, breaking it down per record instead of, say, 24 records per document, and updating the database. So yeah, you've got the right processor for it.
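[Editor's note: picking up the instructor's NiFi 2.0 aside above, here is a minimal sketch of what that imagined weather-condition-category processor could look like with the 2.0 Python extension API. The class name, threshold, and field names are assumptions for illustration, not anything built in the course.]

```python
import json
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult

class CategorizeWeatherCondition(FlowFileTransform):
    """Hypothetical NiFi 2.0 Python processor: reads one JSON weather record
    and adds a 'condition' attribute such as 'hot' or 'normal'."""

    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1'
        description = 'Categorizes a single weather record (illustrative sketch).'

    def transform(self, context, flowfile):
        record = json.loads(flowfile.getContentsAsBytes())
        # Assumed rule: anything above 90 degrees gets flagged as 'hot'.
        condition = 'hot' if float(record.get('temperature', 0)) > 90 else 'normal'
        return FlowFileTransformResult(relationship='success',
                                       attributes={'condition': condition})
```

Dropped into NiFi 2.0's python/extensions directory, a processor like this would show up on the palette the same way the built-in ones do, which is the reusability point the instructor is making.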
You've got the right kind of flow going as well. So great job.

Good morning, Leroy. How are you doing?

I'm doing all right.

All right. Walk me through your processor, your thought process. What are you thinking?

Okay. Well, my first attempt at this was to lay out the steps as they're presented in the PDF for the scenario. I maybe spent too much time making it look nice, because I didn't get very far in actually implementing the links between them. But the general idea was: get CSV, extract text. I honestly didn't make it past this step. I was trying to extract the text using regular expressions and immediately apply it to the attributes. I got jammed up there, because as soon as I got to the next step, I was able to just use a ReplaceText to basically convert it over to a JSON-type format.

I like where you're going with this. Instead of sending it through the convert records and everything else, you're just quickly ingesting it and extracting the text. That's a perfectly acceptable way of doing this. Go ahead.

The next idea was to execute some kind of script; Darius was kind of doing that with his Python. I just couldn't get to this step. The idea after that was to merge everything into one sheet and then use a SQL query to format it and get summary stats. And after that, it would route on some kind of threshold of extreme temperatures, where it would convert the record and send it as an email. And if it didn't meet those conditions, either way, it would just make a report and write that report to a file.

Nice. That was a good idea.

I didn't get very far, so overhearing everyone else's comments and the other approaches, I started scrapping together the other way, where we used some of the controllers.

Well, let's take a look at that, right? You're getting a CSV file, you're setting the schema metadata, you're converting it to JSON using a record reader and a record writer. You've got controller services to set up and configure. There's a lot going into those first four processors, right? And then setting the file name. So I like the approach you took on the other one, where you're just bringing the data in, doing a regular expression, and extracting what you need. The beauty of the way you're now doing it is that I feel it's easier to understand: if somebody else needed to work on this, and they knew controller services and such, they could pick it right up. In the other flow, you're depending on someone understanding the regex, though I feel that with some documentation that would be sufficient. And you accomplished a lot of these steps in fewer processors in your first group versus this group.

One of the things we used to do for fun when I was at NSA: we would sit around and take some scenario like this (we used to do this with shell scripts and everything else), see who could build the data flow with the least amount of processors, and set some rules.
We'd all take the same scenario and run through it, and whoever could accomplish it in the least amount of processors won. Sometimes we like to overcomplicate things. I like your first approach, but your second approach is more sustainable, more scalable. With the extract text, if data comes through that's supposed to be in one format and it's got a special character or something, that might throw off your ExtractText. Error handling is better, I feel, in the ConvertRecord. But you've got the beautification down. That first flow, everything was well documented; you had bullets for each one. Great job. Any questions I can answer about either one of these?

I had questions in general about some of the Python stuff. Well, I guess it's not specific to Python, but about making a custom processor in general. I know there's one that executes scripts, and some of the language on there says they're deprecated. What replaces that?

So, NiFi 2.0, and I will make a note. A good friend of mine, Mark Payne, who was on my team at NSA, does YouTube videos here and there. He puts out design patterns and things like that, as well as what's coming in NiFi 2.0. 2.0 is getting rid of some of these execute-script-type capabilities for security reasons. But the beauty is, they're making it so you can build your own Python processor. If you can write a Python script, it's basically taking that script and making it a processor. You can actually download 2.0 right now; the documentation has already been updated as well, and you can start playing around with developing your own custom Python processors.

I like ExecuteProcess a little better than ExecuteScript. The reason they're deprecating these, the reason they're trying to get away from them, is that they were causing more confusion than they were helping. You've got to configure the processor just right; Odarius had pulled up how there are certain things you've got to include in your Python to accept the flow file, to put the file back out, those types of things. So they're getting rid of that. But if you look at 2.0 and you like Python, you can do amazing things with NiFi 2.0. And if you have a Python script you want to run today, you can put it on the file system and use ExecuteProcess instead of ExecuteScript. There's a lot of documentation out there on one versus the other.

And there's even ExecuteStreamCommand. If you're familiar with Apache Spark, it's an ML/AI data framework that usually runs on top of big data systems like Hadoop, but you can run Spark standalone. NiFi has the capability to connect to Spark and execute Spark jobs. So you might even set up your own local Spark, have a Python script to handle this, and Spark will run it in memory; then you can do data-science-type work on it as well. So there are a few differences between those processors.
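[Editor's note: for the ExecuteProcess / ExecuteStreamCommand route the instructor mentions, ExecuteStreamCommand pipes the flow file content to the command's stdin and reads its stdout back as the new content, so the script on disk can be as plain as the hedged sketch below. The categorization rule is an assumption carried over from the scenario.]

```python
#!/usr/bin/env python3
# Hypothetical script for ExecuteStreamCommand: the flow file content (one
# JSON weather record) arrives on stdin; the enriched record goes to stdout.
import json
import sys

record = json.load(sys.stdin)

# Assumed rule, matching the class scenario: flag hot days.
record["condition"] = "hot" if float(record.get("temperature", 0)) > 90 else "normal"

json.dump(record, sys.stdout)
```

In ExecuteStreamCommand you would point the Command Path at your Python interpreter and pass the script path as a command argument; unlike ExecuteScript, the script never has to touch NiFi's session API.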
There are reasons ExecuteScript is going away, like I mentioned, but I feel you can still accomplish what you want with an ExecuteProcess. And I also feel that with the new version of NiFi that's out, you can even build your own processor. That way it's more sustainable, and if you develop it, anybody with permission can use that processor. It's reusable code, which is really nice. But they're getting away from it; that's why you see it's deprecated. There are other ways to handle it. I hope that helps answer your questions, as well as the theory behind it.

Yeah, thank you.

Okay. Any other questions about NiFi? It looks like you've looked through all the NiFi processors; you've got most of the toolbar sitting here. So it looks like you've exercised what we learned over the last couple of days. Any additional questions?

I know some of this is coming in 2.0, but within this version, what would be the most straightforward way to implement a custom processor? Is it just limited to... I guess this is natively Java already, so you'd probably have to write a Java processor?

Yep. And you can see my screen: creating a custom processor. I don't know what IDE, integrated development environment, you use; most people use VS Code now, and Eclipse is still around. But yeah, you are basically writing and building a Java JAR, except in this case it's a NAR, a NiFi archive. To create a custom processor, and this will be in the documentation as well (I was told I may not want to go into the very, very technical aspects of this for this class, but I've got the material here just in case), you need to download the Maven archetype and get Maven running. Make sure you have your Java JDK environment set up and your IDE configured appropriately for that environment, and then you can start on your processor.

Now, with that said, I want to show everyone something. What I like to do is go to GitHub. That's the PowerPoint presentation; I'll go to GitHub and just search for NiFi. Tim Spann is a good guy; he puts out a lot of information. Here's CogStack NiFi, which is really cool: it's a whole data processing pipeline for NLP and those types of things, and they've got a few custom processors. So that's a whole stack. And if you look, let's see here... oh, InfluxData. InfluxData publishes its own NiFi processors. Here's the source code for the NiFi processor, here's how to build it. Actually, they already have pre-built ones you can download. So for 1.26 you can download it now, put it in your extensions directory, and you will have the InfluxDB processors in your processor list. All the source code is here. So if you go to GitHub, it has most of the other types of connections that people have developed. Some of the specialized military formats we don't see on here, of course.
But even for some of those specialized sensors and UAVs and all these other things, there are usually processors the government has developed as well. So if you run across something you think you might need a custom processor for, check your resources: check GitHub, check Google. And there are more resources; I unfortunately don't have a JWICS terminal in my house, but if I were on JWICS, there are some sources I could search as well. So before you build your own custom processor, just Google around; you may find some documentation. And if you do end up having to build your own: anything below NiFi 2.0, it's definitely going to have to be in Java; NiFi 2.0 and above, it's Java or Python. You can find tons and tons of source code out there to help you, so definitely take a look. There are all kinds of flows on GitHub, NiFi source code on GitHub. I like GeoMesa; that's another product I really like, and they have their own processors. A lot of these companies build their own processors and release them so you can use their services as part of your data orchestration later. But yeah, hopefully that answers your question and gives you some tips and pointers on where to get more information.

All right, Ricker, how'd it go?

Yeah, I unfortunately had to step out.

You had a meeting, and I apologize.

Oh, no worries. Something happened here and I ended up being pulled away, so I missed a couple of steps. However, I did attempt to follow along; I didn't want to interrupt the class, and I still won't. I've just been following what we're supposed to have been doing, and if it's okay with you and everyone else, I'll continue in that format. Otherwise I'm going to be a huge distraction to everyone, and I don't want that.

No, no. So let's do this. Again, I'm not looking for what's complete. Walk me through some of your thought process and what you've already got built, and let's go from there.

Yeah, actually, in my lab here, I was trying to follow the process you were working on yesterday. I did end up having some issues with the lab; latency was the issue, and I'm sure our network is part of the problem. When I was doing this, I got cut off there on the evaluate. On my end, I was trying to see what other folks were doing, as you were doing exactly what we're doing now, walking through the workflow, so if anyone had questions, I was following along. Then I ended up getting cut off and had to restart for whatever reason. So I just spun up a Docker image on my desktop and pivoted. I ended up building something that I'm currently just toying with in my environment.
So basically, what I'm trying to do, because I see potential in it: at least some of our data formats, which are CSVs, have that top header row, the metadata essentially, that lets us know what attributes are available in the data. So I'm doing the same thing here, trying to see if I can just extract that top portion to see what metadata is available. The next steps are still kind of blurry, and I know I'm going to have to do some research, but the idea is that I can at least know what the header contents are and compare that to a catalog to say: hey, I'm looking for a particular type of data, and this file happens to have it.

I like that.

And make it visible: hey, this data has this one column you're looking for. So that's what I was toying with. And I apologize, I couldn't do it in this lab, because again, yesterday I was having issues, but I also wanted to see if I could do it in the same step with an actual data set.

No, no. And again: is what I've gone over the last couple of days starting to set in? I think it is, right? I saw, I think it was yesterday, or the day before, that you were already experimenting and playing around. So it's funny; you just spun up a Docker image and went to work. I would have done basically the same thing. So I think you get it. You've got your flows laid out. It's starting to become readable. I think you grasp that there are processors you can use to do these things and then send emails and such as well. You've got everything in the processor group. Your right side is starting to look like a spider web, but I bet if you were to go through it completely, you would get it cleaned up and make it more presentable. I think overall you're grasping what we're trying to do here and why we're doing it. Actually, I thought you got that on the first day, when I saw your experiment with all your images or whatever you were working on. But no, cool. Any questions I can answer? Any questions about NiFi? Are you stuck on a specific step, even in your test playground? Anything I can help with?

Not really. Or I should say, no questions; I'm just experimenting. If I do have any, I definitely will reach out. And I definitely got some ideas from Leroy, Peter, actually even Tom. And now that I know they know, I'm going to reach out to them and bug them.

I think Tom finished first of everybody; he had the fewest questions. So I think Tom's already done. He's probably already done with the next scenario.

No, hell no.

Tom already knows I'm going to bug him. I have bugged him, and I think it's been about two years; I used to bug him just about every day for a long time. Then I switched roles and stopped bugging him. But Tom, now that I know you know...

I'm way below your pay grade, man.

No, sir. But really, I think these folks are the ones who are going to be more hands-on, and I don't want to be a distraction.

Okay. No, no.
It's one of those tools that, again, anybody can download: spin up a Docker image, install it, play with it. It may not be part of your day-to-day operations, but you may have a use case even at home. Maybe you get like me sometimes and want to get away from all the Google services and start storing everything locally; well, then you've got to process all that data. So have at it. If anything comes up, if you have any questions, let me know. But I think you're coming down the right path.

I appreciate that.

Yep. All right, Tom. The high achiever of the class.

I don't think so. I didn't finish. A couple of people got a lot farther than me. Odarius and, I think it was Leroy, got a lot farther than I did. I was very surprised to see their ExtractText and their scripting. That was really nice.

Walk me through your flow.

Well, it's been my turn to be the distraction today, or to be distracted, because our team lead is on vacation the rest of this week, and there's been this short-fused, dramatic fire thing I've been dealing with.

I've been in the government for over 20 years; I understand.

Yeah, so it figures: my team lead's not here, and this thing just happened. Anyway. Drama, man, always something.

Always.

But no, like the other folks, I was trying to pick up all the files, because I liked his approach when he was describing it: pick up all the files, sort from there, then convert the CSV to a JSON format. And then, like most people did, split the JSON. All of this was working, and I got to this point, and then I wasn't sure what to do after I had the attributes from the CSV. I felt like I wanted to merge the contents into just one single file. And then, like the scenario was asking, I would have liked to take some of those and pull some sort of summary, like you were saying, the high-temperature day or the wind speed, certain attributes, and pull that into some kind of summarized report, and then put that into some kind of email. Or once I had it in one file, maybe put it into a different folder; I wasn't sure. The end result would be to not only have a well-formatted, human-readable report stored somewhere, but also have something like that sent out as an email: here's your weather for the day, or hey, we experienced this yesterday, something like that. I hadn't really thought that far ahead, but that's what I was thinking.

Well, I like the thought of getting each one of these as an individual record. Because if you can do that, then potentially, if that was an individual JSON record and you were looking for a temperature threshold of over 90 degrees, for instance, you could send that data to an EvaluateJsonPath, look at just the temperature path, and if it meets that threshold, send it another direction where you generate a report.
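[Editor's note: a plain-Python rendering of that routing decision, as a hedged sketch rather than a processor. In the flow itself, EvaluateJsonPath would copy the temperature out of the record, and RouteOnAttribute would apply a predicate along the lines of `${temperature:toNumber():gt(90)}`.]

```python
import json

THRESHOLD_F = 90  # assumed alert threshold from the example

def route(record_json: str) -> str:
    """Return the relationship a single-record flow file would be routed to."""
    record = json.loads(record_json)
    temperature = float(record.get("temperature", "-inf"))
    return "report" if temperature > THRESHOLD_F else "normal"

print(route('{"station_id": "KATL", "temperature": 94}'))  # report
print(route('{"station_id": "KATL", "temperature": 71}'))  # normal
```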
So there were a few ways you could ultimately have handled that. You could send it to a SQL query or a database, those kinds of things. But I like the thought of getting all of this in the same format, putting it all together, and then starting to analyze it. That way you can also do some trend analysis and things like that. So I think you're close; I understand you got hung up on a few of these parts and got pulled away.

Again, I recommend for everyone: download this and install it. Well, there's nothing really to install; download and run it, and play around. Just make sure the data you're playing with is something you can delete and never get back, because NiFi is very quick, and if you accidentally hit start... In the last class, they had a couple of folks doing a GetFile, and they did a GetFile on the NiFi directory itself; it picked the application up and crashed it. Luckily we didn't have any of that in this class. So again, some of these concepts just take practice, take time, take Googling. Don't beat yourself up if you didn't finish, or even if you didn't get halfway through. What I'm looking for is, again, a basic understanding of NiFi: how it works, how some of the processors work, what you can do, and flow file versus content and attributes, those types of things. So, great job. Any questions I can answer for you?

Well, I think my biggest problem, the biggest hurdle I see, is understanding what some of these processors actually do and which ones to pick to fit your requirement. That, to me, is extremely difficult and challenging to figure out, you know what I mean? Even understanding what the processors actually mean. Like, okay, I can add a processor, and then you start looking at it and you're like: I have no idea what that means, I have no idea what to put in the properties. I guess it comes with using it more and, like you said, playing with it. It's definitely a challenge. I mean, this is really, really cool, and I really love playing with it, but I also think it's extremely challenging and difficult to grasp.

It is. And one thing I always recommend, something even I do, having written some of the underlying code for this: I stay in the documentation, because there are hundreds of committers on NiFi, and a committer may be somebody working on just a single processor. So I always have the documentation open. You've got your Expression Language guide, which is a biggie, and a RecordPath guide. We really haven't touched on some of the admin and toolkit guide material, just because that's pretty low-level technical stuff. But I love the processor list; these are the officially supported processors. I'll pull it up and click around: oh, okay, let me see, I'm doing some JSON stuff, so let me find a JSON processor.
I like JoltTransformJSON, right? You can actually do some transformations on the JSON; I'm surprised somebody didn't choose that. And then you can get more information, as well as the relationships, what's required and what's not, the explanation for each of the fields, those types of things. And there are 511 processors, plus reporting tasks and parameter providers. That's another thing that makes it overwhelming: there are so many. That's a complaint the community has. How do you balance shipping this with some of the core technologies you would need, versus shipping it with very few processors and everybody having to download and find everything else? But that's also one of the advantages. One of the cool things is that you can be as wildly creative as you want, or you can try to keep it simple. It's a wide range; your strategy, your approach, is wide open. That's what makes it cool, though.

Yeah, but difficult for somebody like me. Still, I don't know.

Yeah, no, I get it. Okay, good. Any other questions I can answer?

The only other thing I would ask: in a scenario where we're deploying this to prod, we're going to be using DoD SSL strategies, we're going to be accessing the web GUI via a friendly DNS name, et cetera. And you want to do even simple things like get files from a directory where your own user doesn't have access to the directories, and you have to use a service account and a password. How does something like that work? I'm assuming there's a process where we put in the service account and the password, but then is it encrypted? I was just thinking about that yesterday: how would something like that work? Because that's generally how we would end up doing things in our environment.

Yeah, that's a great question. We went over this in the first class: if you see the little shield icon next to some of the processors, that's because they have access to the file system, or access to other resources outside of NiFi. For instance, these processors can reference resources over HTTP, so they can actually pull in an HTTP request. We were actually talking about that and the multi-tenancy. So when it gets set up, they're going to set all of the multi-tenancy up, and they're going to have policies. So when Tom logs in, Tom may not actually have access to the GetFile processor, just because of security restrictions. And if you do have a password, let me show you what happens with passwords.

Yeah, I can't tell you how many things we do with service accounts and passwords, honestly.

Yeah, no, I get it. That's a lot. So here, let me see. Here's a credentials service and everything else that you can set up with GCP. But let me find one that specifically asks for a password. Oh, HTTP. InvokeHTTP, I think it is. Well, I'll just show it right here. So let's say you had a username.
I added one to my canvas by mistake and I was like, oh, that's kind of cool. But then I removed it, because I don't think it was what I was looking for.

Yeah. So say I had a username and a password. You see "sensitive value"; that's now set to yes. Okay. So now we'll type the password, a made-up password. It's a sensitive value, so it's encrypted on disk and in the settings. Unless you have permission to modify that processor, you do not have access to the password. Not only that: if someone does have access to modify it, they still don't get to see the password, because it's a sensitive value. Even I can't see what I set that password to be; if it's wrong, I have to type it again, set it, and done.

So yes, if you have a username and password, we can set that in many places. I like this question because it helps me show a couple of other parts we talked about earlier. There are parameters. If there's a username and password we want to use, I can create a parameter context, call it something like creds or DB, and give it a parameter, say username. Then I can add another one for the password and mark it as a sensitive value and apply. Now I have a parameter I can reference in all of my processors. So instead of Tom needing to know the password, you can just say: reference this parameter. You reference it with the parameter syntax, which is just hash, curly brackets, the parameter name: #{password}. So you never knew the password that was set, but you are referencing it in your processor, if you have that permission, right?

It's really cool, because that way, say I'm the sysadmin and I have the keys to the kingdom and I set all of these things up. I can then say: okay, Tom, when you connect to this database, just use the database-username parameter and the database-password parameter. I can set both of them as sensitive values. You would never know the username and password, but you would still be able to connect to the database.

Oh, that's what's up. Yeah. Okay. Pretty cool, huh? Very cool.

No, thank you for asking, because it gives me a teaching moment, so I appreciate it. All right, any other questions?

Yeah, I think I'm good.

Okay. So what I'm going to do now is walk through my flow and what I was thinking, because yesterday evening, noticing some of the hurdles we were having, I went through and built my flow as well. I just want to show you all what I was thinking. Then we'll probably go ahead and go to lunch, come back, install Registry, and work on another scenario that's relatively easy.

So anyway: of course, I'm getting my files from the directory. It's got CSV, it's got JSON, those types of things. I send it to a RouteOnAttribute to route the data on the file type. What I'm looking for: there's a filename attribute in the context. So if it contains .csv, send it one place; if it contains .json, send it another place.
Now, once you get that file, it does some MIME type identification, so I could route on that instead. The identified MIME type reads what type of file it actually is; it couldn't care less about the name. Everything could be named .json and actually be CSV, and it wouldn't matter, because the MIME type is going to detect CSV. So I could have used the mime.type attribute here and said: okay, if the MIME type is application/json, send it over here; if it's CSV, send it elsewhere. But for this example, I made it something easy: if the file name contains CSV, send it one place; if it contains JSON, send it somewhere else.

I built this flow modeled largely on what you all were working on, as well as the previous one. I'm setting the schema metadata; we went over that a couple of times. With the previous flow, the weather one, I converted it to JSON. I actually do not need the set-JSON-file-name step; I left it there just because that was the previous scenario. I couldn't care less about the file name, because I'm not writing it to disk right now. From there, I could have sent that JSON straight to the SplitJson, so if I were cleaning this up, I would remove that step altogether. It's a step I don't need, because I'm going to write file names later.

Then I'm splitting it into individual records. Tom, you might have mentioned, or Darius, a couple of you mentioned wanting to get it down to an individual record. That way you can inspect each individual file that comes out and make decisions. You may still merge it later, but that way you have when each record happened. Also, if you have each individual record, say you want to take that record and put it onto your enterprise service bus or something similar, you can take each record and post it. That might be a use case.

I do an EvaluateJsonPath, like most of you did. Again, whenever you are doing an EvaluateJsonPath and pulling out multiple elements of the JSON document, make sure you have the Destination set to flowfile-attribute. If you do flowfile-content, it's only going to pull out one of the elements of that document, right? So what I did is: I've got this JSON file, I'm bringing it in, and I'm extracting every value out, because I want each one as an attribute. That way, if all the CSV ends up as the same attributes and all the JSON ends up as the same attributes, I can write those attributes however I want.

So I did that; I extracted it. And then I started building my document. I took all the attributes and saved them as JSON. Keep in mind that I used pretty-print on my JSON, just so it's more presentable. I also included the core attributes. And of course, for the Destination, I want to go from the attributes down to the flow file content. So I took all those attributes, built an actual data document, and that's what goes to the next step. You could do flowfile-attribute, but I don't need to, because I've already got them as attributes. And so I run this one time. Okay: it took a list of all the attributes.
And the reason I did that is that now I've got even more information. That actual file only had temperature, wind speed, station ID, precipitation, and humidity. Now I also know: was this originally a JSON document or a CSV? I know it was JSON, because I have the route from RouteOnAttribute. Those are some of the core attributes that were automatically added as this document traversed the flow: when the file was last modified, the fragment identifier, the UUID, the path where the data originally came from, the original file name, some of the owner information, and so on. I wanted to include that, because it might be additional data I need when I'm making my data-science-type decisions. But if I didn't want it, I could have spelled out only certain attributes for AttributesToJSON; it's just a comma-separated list. I could put temperature, comma, humidity, comma, wind speed, and those are the only three attributes I would have gotten when I wrote this document. So, something to keep in mind.

And then this is where setting the file name really came into play. I was trying to work with some fancy date format I was prepending or appending, but for the sake of time, instead of trying to diagnose it, I just updated the file name to be a UUID plus .json. It was really easy: it's an UpdateAttribute, so it takes the filename attribute and outputs something like ${UUID()}.json as the new file name. The data coming in had a weird JSON file name, and now, let me run this one time, the file name should be a unique UUID.json.

So now I've got all of my JSON extracted, I've got my CSV converted to JSON, I've got everything coming in, and all the data looks exactly the same. Even the file name has a pattern. I could have named it a date or something else; if you look in the Expression Language guide in the NiFi documentation, you'll see filename has a pretty substantial category to it, so you can prepend, append, replace, those types of things.

From there, I created a new processor group, because I wanted to start showing off some other tips and tricks of NiFi. I created a new processor group that is basically the output to the file system. I go into my processor group and I create an input port. When I did the input port (let's see here: from the main group, local connection; oh, let me delete this one, because I'm already using it), when you set up an input port, it's going to ask where it comes from, and you just drag and drop: it came from that previous group. So now I have an input port, and what I'm able to do is start separating this.

This goes back to what I was saying earlier: Tom may be responsible for all data movement that gets written to a database. Tom says, well, if you want to write to a database, here are the specs you need; just send me the specs. My process group handles everything else: it will take your specs and write them to the database.
You do not have to worry about that task.
--> Breaking this up into different process groups and having input and output ports really helps with divvying up not only the workload, but also reuse. I've seen process groups that everyone uses because they do some sort of data enrichment.
--> I've had other data engineers use those process groups with no clue what was inside them.
--> They just knew they could put CSV in and the output would be this beautiful JSON that answers all their questions. Somebody else built the logic.
--> In that sense, you can think of a process group as a processor: you've built this whole data flow, and now you've got basically a processor that you can reuse over and over.
--> So there's a lot of power there; there's a lot of capability.
--> And then, when I got done writing the file out as each individual record, if I'd had time, I would have probably sent it on. I like SQL, I like Python and such, so I would probably send it to a SQL processor and do some SQL on the files coming in.
--> That way I can compute averages and things like that.
--> And then if I got an alert that something was above a threshold, I would package that up and send it as an email.
--> But that's my flow, and some of the thinking that went into mine.
--> And then I tried to work on some of the beautification.
--> So, any questions on what we've covered in the last two and a half days?
--> Okay. Well, I didn't realize you guys came in at like 6 a.m., or I would have tried to get us out of here earlier for lunch.
--> So let's go to lunch. We will come back at 1:50 my time, 11:50 y'all's time.
--> And then we will get started with Registry. We will check our flows in, and we may go over some more slides or work on another scenario, depending on time.
--> But we'll definitely touch on Registry and checking things in, because whoever gets the privilege of deploying for you all, and whoever has to administer it, you're going to see Registry.
--> I want to make sure you know it's a different component than the canvas, but it's a sub-project for now.
--> And then we're going to touch on and find some other things as well.
--> So go enjoy lunch. I will see everybody back here in 45 minutes, which is 11:46.
--> You know, I was going to say, those two things I've been hearing a lot. The keywords I've been hearing a lot are multi-tenancy and registry. I hear them talk about that all the time.
--> Yep. And those are things that you're going to use.
--> And when I say go over some slides, I actually have a Q&A section about multi-tenancy, and then I also have slides just about multi-tenancy.
--> I just don't want to kill you with too much PowerPoint, but we will go over it.
--> Appreciate it. Yeah.
--> I mean, I don't know if I want to be the SA for it. I'm hoping not. I'm hoping the person who deployed it now owns it.
--> No, you're getting stuck with it, Tom. You're the expert.
--> You never know.
--> That's how it usually goes.
--> You're the captain now, Tom. You're the captain.
--> Sure. That's how it usually goes.
--> All right. I'll see you guys here soon.
--> Hopefully everyone's getting back from lunch.
--> Okay.
--> Hey, Joshua. I have a question on mine.
--> Yeah, Peter, go ahead.
--> So, I managed to change the way that it was naming them. I got a good name format that I like now, but it's still not exporting data as a CSV.
--> It exported all of them, named by date, the hour the data was recorded, and then the station.
--> Nice.
--> So between those three attributes, that's going to create a unique ID.
--> Uh-huh.
--> Same C-Tracker, or, yeah, I guess C-Tracker separately.
--> Okay. This is what it looks like when you open one of them, though: it just has one line of data, and there are no headers.
--> Well, it's a CSV file. So it's a single... well, wait a minute, okay. So it looks like that's per record.
--> So it is just comma-separated values, but it doesn't have any header row.
--> There are no headers there, so you don't have the attribute name above the value, like temperature or date.
--> Did you format the date?
--> No, that's the date from the attribute.
--> So it does look like a CSV to me. Unless there's a comma somewhere in the data, and I don't think I see any commas in the data, I think it's a proper CSV. It just has no header row.
--> Okay, yeah, I see what you mean.
--> Yeah. It doesn't have a header, or a name for each of the... Yeah, a name for each of the categories.
--> So that's something you can add.
--> How would you add that header back real quick? Let me look at mine. Let's see here.
--> I like your date format. It's pretty cool once you start messing with some of the regex.
--> Let me go in here. I remember where I got that from, somewhere further back.
--> All right, let's bring down the... you're doing an AttributesToCSV, right?
--> Yes.
--> All right, let's do this real quick. I think it's the Include Schema property.
--> If you include the schema, the attribute names will also be written into the CSV, so that way you have a header row.
--> It's going to give you the attribute names; in your case, it'd be temperature, station ID, those types of things.
--> So if you set that to true, it should give you all the attributes.
--> It's not going to literally say "attribute"; it should say temperature, and then the value on the next line.
--> That way you always have a header with it.
--> But it does look like regular CSV to me.
--> If you can copy it and send it to yourself, you can see if Excel opens it with no problem.
--> There's a way to upload files through the drop-files area at the bottom of your screen, but I don't think this platform has an easy way to download.
--> You could use something like... not Pastebin. One of the things that we use for this class sometimes is Etherpad.
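A minimal sketch of that header fix, using the standard AttributesToCSV properties; the attribute list here is assumed from the weather scenario:

    AttributesToCSV
        Attribute List : temperature,station_id,date    (assumed names)
        Destination    : flowfile-content
        Include Schema : true    (writes the attribute names out as a header row)

With Include Schema set to true, the first line of each output becomes the attribute names and the next line the values, which is the header row Peter was missing.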
--> Here's an Etherpad that I have saved. I teach this class as well as the DoD Architecture Framework class, just because I've had to implement that so many times now.
--> But you can actually go to Etherpad. You can use this company, NobleProg; you can use their Etherpad.
--> If you go to etherpad.nobleprog.com, you can just create one and say okay, and it's going to create a new pad for you, and then you can use that same address on your local machine and copy your information over.
--> But that looks like valid CSV. It looks like you just need to include the attribute list and you'll have a header, and you're off to the races.
--> Yeah, I like it.
--> Yep, yep, okay.
--> Well, hopefully everyone had a good lunch.
--> I wish I had realized you all came in at 6 a.m., because you probably take your lunch closer to 11. I know yesterday and the day before we went a little bit past that.
--> It would have made me happy too, because I usually eat lunch at about 11:30 my time, which is 9:30 y'all's time. I didn't realize y'all start so early, and so do I, so good deal.
--> Anyway, feel free to follow along with me.
--> I could kill you via PowerPoint, but again, I feel like hands-on is best, and it really teaches you how to do these things.
--> So on this next topic, we're going to talk about NiFi Registry and what it does.
--> Then we are actually going to configure our NiFi instance to use Registry, and we're going to check our data flows in and check them out, before we roll into multi-tenancy, which is a PowerPoint, unfortunately.
--> And then if we have some time, I'd like to do another quick scenario, then a little Q&A, test-type thing, and then we can get done and get to a long weekend.
--> So let me minimize this.
--> Okay, so if you remember when we installed NiFi, we had a zip file I showed you.
--> We could go and download the zip file. Let me bring my browser up.
--> So I went to nifi.apache.org, and if I click Download, there's the NiFi 2.0 that I was talking about earlier. You can download the binaries.
--> The latest stable 1.x release is 1.26. It's also my understanding it's the version some of you may be using when they install it.
--> So feel free: you can download NiFi, and we did. We downloaded the binaries; we downloaded the NiFi standard build.
--> Now, there's NiFi standard, which is what we are using. There is NiFi Stateless, which is really cool.
--> It's not been out that long, but it gives us the capability to run these flows as a stateless service.
--> You can have a data flow, and it gets packaged up basically as a microservice, and it will run in your Kubernetes or your AWS or Azure as a service.
--> So you have a microservice whose only job is executing that data flow, and that's it.
--> There's no UI, there's no way to modify it, and things like that.
--> It's for those production-grade data flows that you want to set and forget.
--> And then if there are any changes, you can quickly build another one and deploy it as well.
--> So that's Stateless.
--> There's also the NiFi Kafka Connector kit. Kafka, you probably have heard of it.
--> It is another open-source project that is widely used. Kafka is used more than NiFi.
--> It is a very popular pub/sub type of capability: it provides a message bus where you push data to topics and you can subscribe to topics, those types of things.
--> And then there's the NiFi Toolkit. You can download that as well.
--> If you are a sysadmin, you would be interested in the NiFi Toolkit.
--> It's the one that's going to help you start setting up some of your security things, as well as changing usernames and passwords if you're locally authenticating like we are.
--> It's got a bunch of different capabilities.
--> The Toolkit is command line only. There is no GUI, no UI. They don't make it easy that way; well, they made it easier for the sysadmin.
--> So if you're a sysadmin, you may be interested in those different flavors of NiFi.
--> But if you look at the sub-projects, we have MiNiFi, which we are definitely going to chat about, just because I know the other group is using it and is going to use it even more.
--> We have MiNiFi, we have Registry, we have the Flow Design System.
--> The Flow Design System we're really not going to go into.
--> It's a sub-project of NiFi, but basically, if you were building a custom UI to interact with NiFi and other things, and you want it to have the same style and layout as NiFi, with some of those reusable components, that is the Flow Design System.
--> Not a lot of people are using that right now; there's not a whole ton of uses for it. It's more for if you're building your own custom product, I feel like, those types of things.
--> But the one that we are going to touch on first is Registry.
--> It's already downloaded on your virtual machine, so you do not need to follow these steps, but you can go to the Apache NiFi Registry sub-project page, and you've got documentation.
--> Here's the Registry documentation, here's the wiki, here are videos on getting started and running it, here's what it does, those types of things.
--> And you can download Registry just like everything else: if you click download and click the Registry 1.26 binary, it's right there. You can even download the source code.
--> But you don't need to worry about downloading it, because we are going to install it.
--> So with that being said, if you want to follow along, what I'm doing is going to my desktop and back into the Downloads folder.
--> In the Downloads folder... let me double-check. Awesome.
--> You should see a file that was made earlier this week: nifi-registry-1.26.0-bin, a zip file.
--> It's just like the NiFi zip file, except it's nifi-registry.
--> So what we're going to do is right-click and extract all.
--> I'm going to leave the destination as the Downloads folder and let it run. That'll take just a minute to extract.
--> Okay. And the way that I like to do some of these sub-projects, and I know I haven't really told you yet what NiFi Registry is or what it does, is to go right into it, because you already have data flows that we can check in.
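For anyone following along later, a minimal sketch of the install steps just shown, assuming the Windows virtual machine layout used in class (the exact paths are illustrative):

    Downloads\nifi-registry-1.26.0-bin.zip     right-click, Extract All
    Downloads\nifi-registry-1.26.0\
        bin\      run-nifi-registry.bat and the other start/stop scripts
        conf\     nifi-registry.properties
        docs\  ext\  lib\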
--> And I think the hands-on approach is way better than a PowerPoint. It should be extracted by now.
--> If you get caught up on any step, if you missed one, if I go too fast, or you glance away for a second, just interrupt me and let's get you on the same page.
--> So the layout of this folder is a lot like NiFi's, right?
--> Unlike NiFi, though, we don't have all the data governance reporting, the content repository, and those types of things.
--> But we do have a bin, we do have a conf, and we have the docs, ext, and lib directories.
--> We went over the lib directory in NiFi; it holds the processor NAR files. Same type of thing within Registry: the lib directory holds all the libraries and any kind of components of Registry.
--> You have the docs folder, the conf folder, and the bin folder.
--> The bin folder has the executable to run Registry, and of course we're going to go into that. Just like we did with NiFi, we're going to run Registry.
--> But before we do that, I want to go into my configuration directory.
--> I understand that not everyone will have to do this, and that's great, but for some of the sysadmins on the call, folks that are going to help set this up, this is where your settings are located.
--> So if you go into the conf directory and open nifi-registry.properties, you should be able to edit it with the Notepad++ that should already be installed, if you want to follow along. Or just watch my screen on this one.
--> So we have our web properties, exactly like in NiFi, and some of the security properties.
--> All of these should look very similar to the NiFi properties. It's the same format, same naming convention, those types of things.
--> Except, instead of nifi.db.url, for example, it's nifi.registry.db.url.
--> So if it shares the same property as NiFi, that property is just changed from nifi.* to nifi.registry.*.
--> Some of the key things are your security settings and that type of stuff.
--> The configuration file for NiFi Registry is much shorter, right? It's a sub-project of NiFi; there's not a lot going on there.
--> And it's not a heavy lift; it's a much smaller package. NiFi is about two gig when you extract it. I think Registry is like three or four hundred meg.
--> But the main property we'll look at is: what port is this going to be running on?
--> In our configuration, it's going to run on 18080.
--> If you know the internet, everything runs on port 80, and then it usually takes you to 443, which is secure.
--> 8080 is your backup, unsecured web port, and 18080 is the backup to that backup.
--> So it's running on the backup HTTP port of 18080.
--> And when we visit it in the browser, we're just going to go to localhost, colon 18080, slash nifi-registry.
--> We don't need HTTPS; we don't need any of that.
--> The reason being is that Registry doesn't need the same security here, because even if you install Registry and it's public-facing and things like that, it can't execute code.
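The one property called out above, as it appears in conf/nifi-registry.properties (this is the standard property name, and 18080 is the default unsecured HTTP port):

    # shared properties are renamed from nifi.* to nifi.registry.*
    nifi.registry.web.http.port=18080

With that default left alone, the UI lives at http://localhost:18080/nifi-registry.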
--> It can't execute any of your data flows or anything else.
--> The NiFi Registry sub-project is only for your version control, right?
--> That is, making sure that your flows have a version control system in place, a place to put those, a place to operate with... I know you all use Azure DevOps on-prem, so a place to operate in that type of environment.
--> And we'll go through it, but it's not secured; there's no username or password.
--> So when this gets set up in your environment, in your regular dev, test, prod, you're going to need to have security enabled; you're going to need logins and policies and those types of things.
--> But for the sake of this class, luckily, we don't have to use that.
--> So we didn't make any changes to our properties. I'm going to just leave it alone and exit out. I just wanted to point that one property out specifically.
--> So that's the main property. We're going to go into the bin directory, and we are going to say run nifi-registry, just like we did with run nifi.
--> So open it up and say run. It should start just like our NiFi; you should actually have two command-line boxes up.
--> And when Registry runs for the first time, it's got to create its folders. Now I have a logs directory and a work directory.
--> It's got to unpack some of the library files included with it. So give it just a minute.
--> But when it does come up, you should be able to visit it. Let's do a new tab, because we're going to need our NiFi instance in a minute.
--> Sorry, wrong browser; I was doing a new tab on my own desktop.
--> All right, so say run, run that nifi-registry.
--> You might get a Java warning like I have. Just say allow access, because this is a network application.
--> So Registry should run; it takes a minute to start up.
--> While that's starting up, bring up your new tab.
--> You want to go to http://127.0.0.1, still on localhost, colon 18080 for that backup port, and then /nifi-registry.
--> I don't know why they make it this way, but with NiFi, if you do not put /nifi, it will say, hey, I think you're trying to go here, and automatically redirect you.
--> But with Registry, if you go to just the IP address and the port, it will just give you an error that there's nothing there.
--> They don't have a redirect like they do with NiFi, so you have to go to /nifi-registry.
--> So I'll give everybody just a minute; let me check everyone's screen.
--> You should have Registry up and running, or it should be starting.
--> Ecta, you're looking good. Tom, you've got it working. Peter, give it just a minute; it might take a minute to come up.
--> So what this is, Registry, basically, is a way to check your flows in and out.
--> Exactly. And I'm on purpose not going into the full details of what it is, but you got it, right?
--> So NiFi needs a way to check in the data flows.
--> And the data flows, they're not like a Java application, right?
--> You may have some sort of version control already, where you're storing your Terraform information, your Chef and Puppet, your Java applications, your Python code.
--> Ansible? Ansible, you got it, right.
--> But a data flow is a little different.
--> Now, underneath the hood, a data flow is just a JSON document, right?
--> We imported a single JSON file into our new process group for the previous scenario, so we could do the first-time controller service.
--> But with code, you can just pop it open in your IDE and start working on it.
--> You can't necessarily do that with ease with just a JSON blob of text.
--> And so you need that interpretation, that interpreter. Registry gives us that interpreter.
--> Registry then also connects to your version control system.
--> So think of Registry as the avenue to source-code-style version control.
--> Registry is going to handle the comments, Registry is going to handle the versioning, as well as being able to put that into your GitHub or GitLab or whatever Git repository you're using.
--> But you got it, Tom; you understand it.
--> But let's go ahead and make sure we've got everybody up.
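To make the "a flow is just JSON" point concrete, here is a heavily abbreviated, illustrative skeleton of what a versioned flow snapshot can look like; the field names follow the Registry flow-snapshot format as best it can be reconstructed here, and all the values are made up:

    {
      "snapshotMetadata" : { "version" : 1, "comments" : "initial check-in" },
      "flowContents" : {
        "name"        : "weather-ingest",
        "processors"  : [ "..." ],
        "connections" : [ "..." ]
      }
    }

Registry's job is to version documents like this and push them into your Git repository, so you never have to hand-edit the JSON blob yourself.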