Hey, good morning, Amy. How's it going?

It's going okay. I wish I was a lot further along than I am; I'm still trying to figure this out. Basically I had the same thing from yesterday, except I didn't remove some of the extra processors, so I had to introduce some folders. For example: getting the files, routing on attributes based on the file type, right? The CSV gets processed, and the JSON gets put into another folder, which is kind of an extra step. I think you guys didn't do that, but I'd already done it, so I just let it be. Same thing with the CSV: it gets put into a directory, and then you start splitting it, evaluating the path, breaking it up into different records. From here, I wanted to merge all records into one file and then start evaluating, which I never quite figured out. So I tried to do what you guys were doing over here, either MergeRecord or MergeContent, and then process from there. But I went back and did what you guys were doing: putting attributes to CSV, updating the file name, and then putting the file here. So that's what I have.

Awesome, awesome. And I see you've got some labels; you're trying to beautify the flow. You can see how it can get out of hand very quickly. But you're doing something that I like to do when I'm developing a flow. It looks like you brought down processors that you think you might need or might use, and you put them off to the side. And then, if I'm not using one, I delete it. I do the same exact thing. If you're looking at one of my flows while I'm building it out, I'm thinking it through: okay, I've got the data this way, I need it this other way, what processor can handle that? I'll bring down a couple of processors and play around with them. And when I get through testing that section, I'll go through and start deleting the unused processors.

Do remember: anytime you are writing to and reading from disk, it's a much slower operation than processing the data as it comes through. So try to limit the slower operations anywhere you can. Zipping and unzipping files is slow and uses a lot of resources. Reading and writing from disk can be slow as well, because you're reliant on that disk's speed to put and retrieve the files. So try to limit interactions outside of NiFi and keep it within NiFi. Also remember: anytime data leaves NiFi, say when you drop it off on the file system and pick it up later, it's also leaving data governance. It's not being tracked. If there were another process sitting there reading that directory of files you're waiting to pick up, something could happen to the data. So keep that in mind: if it goes outside of the NiFi ecosystem, you're losing provenance and lineage and those types of things. There are use cases where you would need to write to disk and pick it up later, but try to limit that.

I like how you're doing the RouteOnAttribute; we worked on that yesterday evening to sort and filter those files.
I like your thought process of getting it all together into one file and then making decisions from there. You were able to take your two different types of files, CSV and JSON, and bring it all into a common format, all into one document. And I can see that from there you would make some decisions or calculations. You might write it to CSV, and you already have an Excel template that will open it up and do its thing. So overall, again, I didn't expect anybody to really complete this one, but you did tell a good story about how you were going to handle it. It looks like you also learned how to move your lines around and things like that. You've got a lot of failures going to that log message over there; once you got it cleaned up, I think it would look great. So overall, a great job, Amy.

Could I have done it from the evaluate, merging the records and using that instead of going through all this? A MergeRecord, to combine it into one file and process it?

You could. Yeah, you could. Let me bring yours back up. Once you had everything as JSON, for instance, you could have sent it to a MergeRecord. And here's the beauty of MergeRecord and those types of processors: you get access again to the controller services, right? If you wanted to, you could have a schema for that data and it automatically handles the formatting, like we're doing with the CSV and JSON. So yeah, you probably could have skipped three or four processors right there and sent it to the MergeRecord. But you are going down the right path.

Okay, thank you.

You're welcome. No, thank you. Good job. Peter, let's see what you got. If you can, Peter, just walk me through your flow, your thought process, the scenario.

So, yeah, I didn't get any further than that. Same exact thing: I got it to have all the attributes, convert them back to CSV, rename them, and then export them.

Oh, you didn't do any of the other analysis the PDF was asking us to look at?

Yeah. 

And that's okay.

It starts up here at the top. Basically, from here it goes straight down for the most part. Here it grabs all the CSV files in the directory. I'm breaking it down at every step just to see where each step is going, so I have an original-files folder: it takes those original files and puts them over in that folder. It updates the schema metadata, converts them to JSON, updates the attributes, and also pushes those converted files over to a converted-files folder. It goes down to the SplitJson, which is where this other GetFile is introduced: it grabs the JSON files directly and pushes those over to the SplitJson. Then it goes down to EvaluateJsonPath, converts the attributes back to CSV, updates the name, and then exports them. Everything else is just going over to this log error message. All of that is running right now, so I'll give it a quick demo.

Very nice.

I copy these files into the data directory where it's grabbing all the data files from, paste them in there, and it all runs through really quickly, so we can see we already have all these files right here in the final data files folder.

Oh wow.
Yeah, it's almost instant.

Yeah, I'm impressed by that. It's very quick. I like that. If you had had time, what are some of the next steps you would have taken?

I would definitely figure out a better way to name the files, in a way that tells users what they are. And I also noticed they're exporting a bit strangely. These aren't really CSVs, at least not the way I'm used to them, with the header information and then columns separated by...

No, that's a JSON document.

Okay, so I guess I messed up on that conversion then.

So did you save everything as an attribute?

EvaluateJsonPath, then AttributesToCSV.

Can we look at your EvaluateJsonPath real quick? You're grabbing them all and you're setting them. So now everything is an attribute. Okay, say OK. And then you're taking all of those right into CSV. Okay, can you stop your set-JSON-file-name processor? Let's just run it real quickly to see what you're getting. I want to see if you have an attribute, and what the attributes are saying, because the reason I like doing AttributesToCSV or AttributesToJSON is that you should have additional data you can use. So let's see what you've got. You'll have to pump more files into your directory. There you go. All right, we're queued up. Go ahead and list that, and just go to the far left little icon. No, far left: the information. There you go.

Okay, so it does have the CSV data. Actually, it does have the attributes. If you go down, you have hour, humidity, wind speed, temperature, station ID. So you are getting those attributes, but you're also getting the CSV data attribute that is putting all your attributes in this... oh, hit OK. I bet you have a processor sitting before this. Hit X. There you go. You are splitting your JSON, then EvaluateJsonPath, convert CSV to JSON, set JSON. Let's look at the configuration for your AttributesToCSV.

So you see where it says Destination? You're doing flowfile-attribute. You want to do flowfile-content. We've already got all of the attributes on the flow file; now we want to write them out as the content. We've got all the attributes stored already; now you want to put them back into the flow file. I bet if you were to do that, it would start writing out your CSVs.

Now, that CSV is going to take all of those attributes, and you've got a pretty substantially sized CSV data attribute. So here's what you could do. If you notice in the AttributesToCSV, we left the Attribute List, the attributes to write, blank. You can actually specify a comma-separated list of just the attributes you want written. And because you can specify that, you can put them in the order you want: maybe you want station ID listed first and date listed second. You can do that ordering. You see the Attribute List has no value set, so it's going to take every attribute and write it as CSV. But one thing I noticed is that you have an attribute called CSV data that has the data in it. If you were to write everything as CSV, it may not be a proper CSV, just because you've got a lot of values in that one attribute.

Okay.
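[Editor's note: to make the Attribute List behavior concrete, here is a minimal plain-Python sketch of what AttributesToCSV effectively does when Destination is flowfile-content and an explicit list is set. The attribute names and values are invented for illustration, not taken from the class flow.]

```python
# Hypothetical stand-in for AttributesToCSV with Destination = flowfile-content.
attributes = {
    "station_id": "KATL",          # assumed attribute names for illustration
    "date": "2024-05-20",
    "temperature": "87",
    "humidity": "45",
    "csv_data": "87,45,KATL,...",  # the oversized attribute worth excluding
}

# An explicit "Attribute List" picks which attributes are written, and in what order.
attribute_list = ["station_id", "date", "temperature", "humidity"]

csv_line = ",".join(attributes[name] for name in attribute_list)
print(csv_line)  # KATL,2024-05-20,87,45
```

Leaving the list blank is the equivalent of dumping every key in `attributes`, including the oversized CSV data attribute, which is exactly what would break the output.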
So I would see where you're picking that up and try to either remove it or specify your attribute list here, and then write that as CSV.

Okay.

And then you wrote all your files to disk, and I think that looks good. The one thing I would take a look at as well, and this is just like everyone: once you start adding multiple different processors, it starts getting out of hand. You've got lines going everywhere, those types of things. So when you get a chance to clean it up, one thing you might think about is busting up this flow. You might put your GetFile, the actual getting of the file and the sorting of the files, in one processor group. You could then put the conversion of CSV to JSON in its own processor group, the evaluating of the JSON and building your attributes in another group, and the writing of that data to disk in your fourth group. That would break this up even more. And then you could use input and output ports, which I'll show you on mine, to connect those together. But overall, you nailed it. You got extremely close, and I think if you had had more time, you would have been able to write this to disk and do some analysis. You could have used maybe some other processors and regex to do things. But you got the principle, you got the concept, so great job. Any other questions I can answer on this flow?

Okay, yeah, thank you. Yeah, that makes sense: you can just group them all together and then use these ports to connect the different groups.

Correct. I was actually doing this to mine, so I can point it out. If you're seeing my screen, I forgot to delete this and this, but I worked on cleaning my flow up. What I did is I set the JSON file name and connected it to another processor group. Then I've got an input port coming from that, going to my PutFile and then to my log error message. Now I'm starting to break this up, and I can group each of these different colored sets of processors into its own group, utilizing input and output ports to receive or output the data. That's the nice thing about it: it helps clean it up and organize it. And you may have a standardized way of writing data, even if it's not a controller service, and so you can have input ports from multiple different processor groups, as well as different organizations, all using that same method to write data. So, just tips and tricks if you want to clean this up, some things to think about when you're building these in the future.

Okay, yeah, that sounds useful. Thank you. I don't think I really have any other questions right now. Like you said, with more time, I'd go into setting up the alerts and stuff like that. There's definitely a lot here that just needs a lot of practice, to get used to what we're able to do and how to do everything.

It is.
And this is for everyone: everyone got NiFi up and running on their local machine relatively easily. I find in my day-to-day job, whatever job hat I'm wearing, that I sometimes have large amounts of files to deal with. I'll actually start NiFi, build a quick flow to handle the processing of that data, and then shut it back down. You may run into scenarios like that, where you think: wait a minute, let me just open up my local NiFi instance and process all these zip files that were sent to me, and get more practice at NiFi and building these things. There are times when I could write a script to solve a problem, but I'm like, well, let me build a data flow; that way it provides me more insight into what's going on.

We could have scripted this out. This whole thing could have been scripted in Python, for instance. You could have picked the files up in Python, built logic to sort and filter, all of that. But you wouldn't have gotten the security, the data governance, all these things that NiFi is providing. So, a little tip and trick: I like to keep it installed locally, and I'll start it and play around when I have a use for it.

But yeah, I think if you had had time, you would have been able to send an email alert, even though there's no email server, something like that. So, great job. Just keep in mind: you've got your processors named, and you've got some of the colors and the images and labels being applied at the very end, before it's ready to go. Clean it up and make sure it's readable. That way, if a sysadmin logs in and asks, hey, what's this flow doing, they can take a quick look and say: okay, that's the flow, and that's what it's doing. It's easily understandable. But great job. Any questions overall that I can help with?

I don't think so.

Okay. Perfect. Thank you. Yep. Odarius, how are you doing?

Hey, I'm good. I think I was missing a few steps on mine. But pretty much, I used the GetFile to get the data. I had it route on the attribute, and it goes to the set schema metadata, which goes to the convert CSV. I don't know why I put it all the way over there, but it converts the CSV to JSON. It splits it. It sends it to the evaluate. From there, I guess it sends it to ExecuteScript. I think there might have been another processor that would have been easier than this one.

Oh, nice. Look at you.

It's not complete; it was just kind of an idea. I think one of the requirement scenarios wanted you to categorize weather conditions, so I was just searching based on precipitation or temperature. From the documentation, this was how you get one value out. I'm not sure if this line would work, but if it did, it would load the flow file as JSON, you search through the keys for precipitation or temperature, and you get those values.
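[Editor's note: as a hedged illustration of the lookup Odarius is describing, not his actual script: inside an ExecuteScript body, or any Python, loading the flow file content as JSON and pulling out the keys of interest might look like the sketch below. The key names and sample record are assumptions taken from the scenario.]

```python
import json

# Hypothetical flow file content: one weather record as JSON.
content = '{"station_id": "KATL", "temperature": 94, "precipitation": 0.1, "humidity": 45}'

record = json.loads(content)

# Search the keys for the fields the scenario cares about.
wanted = ("precipitation", "temperature")
values = {key: record[key] for key in wanted if key in record}
print(values)  # {'precipitation': 0.1, 'temperature': 94}
```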
And then later on, I guess I would try to compare them, categorize the conditions that way, and then find a way to append it to the file. I think there are other processors that can do that.

Yeah, there's merge; there are other ways. I really like this. I really like that we did not go over this, so you thought outside the box. Well, we went over it a little bit. I like some of NiFi's capabilities for executing these scripts and executing binaries on the file system and such. I'm really excited for NiFi 2.0, because I love Python, and you can actually develop your own custom Python processor now very quickly. So I can envision you in the future taking this, and you may have a weather-condition-category processor that takes all your data and spits out values for alerts. And others could use that same processor to do the same. I think you did an amazing job. After you got the alert, what would you do with it?

An alert about the weather condition?

Correct, yeah. Say you had time, you ran the script, and you were able to generate some sort of alert. What do you think would happen next?

I'm not sure. I guess it would be another processor, maybe.

Yeah, I mean, you have an email processor. Type "email". Yep, you can put email: send an email to recipients in a flow file. Of course, you would need an underlying email server. One of the cool things is that it does have processors to handle email. I know of a few companies running NiFi to intercept all email, scoring it for spam and doing additional antivirus checks and things like that. So yeah, you could have put it to email. There also used to be an actual texting processor, and it's still available and still updated through some of the texting services. It's not part of the official NiFi package, but there's a texting processor that will send a text. So you could have taken that alert and made a text message that said, you know, temperature is 99 degrees and it's hot today, or something like that. Or you could send an email.

So no, I like where you're going with this. I like the thinking outside the box. Work on cleaning it up; your set schema metadata is way up there, right? Try to make it so you're reading left to right or up and down. But I think if you had had time, you could have finished this up, cleaned it up, and it would be a great, functional data flow. So great job. Any questions I can answer right now?

I don't think so. This is the right processor to write it to the disk, right?

This, yeah: PutFile. You've got your GetFile and you've got your PutFile. And then you could have taken this data and pushed it to a database, right? You've got those processors. And this is a real-world scenario: you're receiving different data from different sources, bringing it together, getting it into a common format, breaking it down per record instead of, say, 24 records per document, and updating the database. So yeah, you've got the right processor for it.
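[Editor's note: picking up the instructor's NiFi 2.0 aside above, here is a minimal sketch of what that imagined weather-condition-category processor could look like with the 2.0 Python extension API. The class name, threshold, and field names are assumptions for illustration, not anything built in the course.]

```python
import json
from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult

class CategorizeWeatherCondition(FlowFileTransform):
    """Hypothetical NiFi 2.0 Python processor: reads one JSON weather record
    and adds a 'condition' attribute such as 'hot' or 'normal'."""

    class Java:
        implements = ['org.apache.nifi.python.processor.FlowFileTransform']

    class ProcessorDetails:
        version = '0.0.1'
        description = 'Categorizes a single weather record (illustrative sketch).'

    def transform(self, context, flowfile):
        record = json.loads(flowfile.getContentsAsBytes())
        # Assumed rule: anything above 90 degrees gets flagged as 'hot'.
        condition = 'hot' if float(record.get('temperature', 0)) > 90 else 'normal'
        return FlowFileTransformResult(relationship='success',
                                       attributes={'condition': condition})
```

Dropped into NiFi 2.0's python/extensions directory, a processor like this would show up on the palette the same way the built-in ones do, which is the reusability point the instructor is making.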
You've got the right kind of flow going as well. So great job.

Good morning, Leroy. How are you doing?

I'm doing all right.

All right. Walk me through your processor, your thought process. What are you thinking?

Okay. Well, my first attempt at this was to lay out the steps as they're presented in the PDF for the scenario. I maybe spent too much time making it look nice, because I didn't get very far in actually implementing the links between them. But the general idea was: get CSV, extract text. I honestly didn't make it past this step. I was trying to extract the text using regular expressions and immediately apply it to the attributes. I got jammed up there, because as soon as I got to the next step, I was able to just use a ReplaceText to basically convert it over to a JSON-type format.

I like where you're going with this. Instead of sending it through the convert records and everything else, you're just quickly ingesting it and extracting the text. That's a perfectly acceptable way of doing this. Go ahead.

The next idea was to execute some kind of script; Darius was kind of doing that with his Python. I just couldn't get to this step. The idea after that was to merge everything into one sheet and then use a SQL query to format it and get summary stats. And after that, it would route on some kind of threshold of extreme temperatures, where it would convert the record and send it as an email. And if it didn't meet those conditions, either way, it would just make a report and write that report to a file.

Nice. That was a good idea.

I didn't get very far, so overhearing everyone else's comments and the other approaches, I started scrapping together the other way, where we used some of the controllers.

Well, let's take a look at that, right? You're getting a CSV file, you're setting the schema metadata, you're converting it to JSON using a record reader and a record writer. You've got controller services to set up and configure. There's a lot going into those first four processors, right? And then setting the file name. So I like the approach you took on the other one, where you're just bringing the data in, doing a regular expression, and extracting what you need. The beauty of the way you're now doing it is that I feel it's easier to understand: if somebody else needed to work on this, and they knew controller services and such, they could pick it right up. In the other flow, you're depending on someone understanding the regex, though I feel that with some documentation that would be sufficient. And you accomplished a lot of these steps in fewer processors in your first group versus this group.

One of the things we used to do for fun when I was at NSA: we would sit around and take some scenario like this (we used to do this with shell scripts and everything else), see who could build the data flow with the least amount of processors, and set some rules.
We'd all take the same scenario and run through it, and whoever could accomplish it in the least amount of processors won. Sometimes we like to overcomplicate things. I like your first approach, but your second approach is more sustainable, more scalable. With the extract text, if data comes through that's supposed to be in one format and it's got a special character or something, that might throw off your ExtractText. Error handling is better, I feel, in the ConvertRecord. But you've got the beautification down. That first flow, everything was well documented; you had bullets for each one. Great job. Any questions I can answer about either one of these?

I had questions in general about some of the Python stuff. Well, I guess it's not specific to Python, but about making a custom processor in general. I know there's one that executes scripts, and some of the language on there says they're deprecated. What replaces that?

So, NiFi 2.0, and I will make a note. A good friend of mine, Mark Payne, who was on my team at NSA, does YouTube videos here and there. He puts out design patterns and things like that, as well as what's coming in NiFi 2.0. 2.0 is getting rid of some of these execute-script-type capabilities for security reasons. But the beauty is, they're making it so you can build your own Python processor. If you can write a Python script, it's basically taking that script and making it a processor. You can actually download 2.0 right now; the documentation has already been updated as well, and you can start playing around with developing your own custom Python processors.

I like ExecuteProcess a little better than ExecuteScript. The reason they're deprecating these, the reason they're trying to get away from them, is that they were causing more confusion than they were helping. You've got to configure the processor just right; Odarius had pulled up how there are certain things you've got to include in your Python to accept the flow file, to put the file back out, those types of things. So they're getting rid of that. But if you look at 2.0 and you like Python, you can do amazing things with NiFi 2.0. And if you have a Python script you want to run today, you can put it on the file system and use ExecuteProcess instead of ExecuteScript. There's a lot of documentation out there on one versus the other.

And there's even ExecuteStreamCommand. If you're familiar with Apache Spark, it's an ML/AI data framework that usually runs on top of big data systems like Hadoop, but you can run Spark standalone. NiFi has the capability to connect to Spark and execute Spark jobs. So you might even set up your own local Spark, have a Python script to handle this, and Spark will run it in memory; then you can do data-science-type work on it as well. So there are a few differences between those processors.
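[Editor's note: for the ExecuteProcess / ExecuteStreamCommand route the instructor mentions, ExecuteStreamCommand pipes the flow file content to the command's stdin and reads its stdout back as the new content, so the script on disk can be as plain as the hedged sketch below. The categorization rule is an assumption carried over from the scenario.]

```python
#!/usr/bin/env python3
# Hypothetical script for ExecuteStreamCommand: the flow file content (one
# JSON weather record) arrives on stdin; the enriched record goes to stdout.
import json
import sys

record = json.load(sys.stdin)

# Assumed rule, matching the class scenario: flag hot days.
record["condition"] = "hot" if float(record.get("temperature", 0)) > 90 else "normal"

json.dump(record, sys.stdout)
```

In ExecuteStreamCommand you would point the Command Path at your Python interpreter and pass the script path as a command argument; unlike ExecuteScript, the script never has to touch NiFi's session API.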
There are reasons ExecuteScript is going away, like I mentioned, but I feel you can still accomplish what you want with an ExecuteProcess. And I also feel that with the new version of NiFi that's out, you can even build your own processor. That way it's more sustainable, and if you develop it, anybody with permission can use that processor. It's reusable code, which is really nice. But they're getting away from it; that's why you see it's deprecated. There are other ways to handle it. I hope that helps answer your questions, as well as the theory behind it.

Yeah, thank you.

Okay. Any other questions about NiFi? It looks like you've looked through all the NiFi processors; you've got most of the toolbar sitting here. So it looks like you've exercised what we learned over the last couple of days. Any additional questions?

I know some of this is coming in 2.0, but within this version, what would be the most straightforward way to implement a custom processor? Is it just limited to... I guess this is natively Java already, so you'd probably have to write a Java processor?

Yep. And you can see my screen: creating a custom processor. I don't know what IDE, integrated development environment, you use; most people use VS Code now, and Eclipse is still around. But yeah, you are basically writing and building a Java JAR, except in this case it's a NAR, a NiFi archive. To create a custom processor, and this will be in the documentation as well (I was told I may not want to go into the very, very technical aspects of this for this class, but I've got the material here just in case), you need to download the Maven archetype and get Maven running. Make sure you have your Java JDK environment set up and your IDE configured appropriately for that environment, and then you can start on your processor.

Now, with that said, I want to show everyone something. What I like to do is go to GitHub. That's the PowerPoint presentation; I'll go to GitHub and just search for NiFi. Tim Spann is a good guy; he puts out a lot of information. Here's CogStack NiFi, which is really cool: it's a whole data processing pipeline for NLP and those types of things, and they've got a few custom processors. So that's a whole stack. And if you look, let's see here... oh, InfluxData. InfluxData publishes its own NiFi processors. Here's the source code for the NiFi processor, here's how to build it. Actually, they already have pre-built ones you can download. So for 1.26 you can download it now, put it in your extensions directory, and you will have the InfluxDB processors in your processor list. All the source code is here. So if you go to GitHub, it has most of the other types of connections that people have developed. Some of the specialized military formats we don't see on here, of course.
But even for some of those specialized sensors and UAVs and all these other things, there are usually processors the government has developed as well. So if you run across something you think you might need a custom processor for, check your resources: check GitHub, check Google. And there are more resources; I unfortunately don't have a JWICS terminal in my house, but if I were on JWICS, there are some sources I could search as well. So before you build your own custom processor, just Google around; you may find some documentation. And if you do end up having to build your own: anything below NiFi 2.0, it's definitely going to have to be in Java; NiFi 2.0 and above, it's Java or Python. You can find tons and tons of source code out there to help you, so definitely take a look. There are all kinds of flows on GitHub, NiFi source code on GitHub. I like GeoMesa; that's another product I really like, and they have their own processors. A lot of these companies build their own processors and release them so you can use their services as part of your data orchestration later. But yeah, hopefully that answers your question and gives you some tips and pointers on where to get more information.

All right, Ricker, how'd it go?

Yeah, I unfortunately had to step out.

You had a meeting, and I apologize.

Oh, no worries. Something happened here and I ended up being pulled away, so I missed a couple of steps. However, I did attempt to follow along; I didn't want to interrupt the class, and I still won't. I've just been following what we're supposed to have been doing, and if it's okay with you and everyone else, I'll continue in that format. Otherwise I'm going to be a huge distraction to everyone, and I don't want that.

No, no. So let's do this. Again, I'm not looking for what's complete. Walk me through some of your thought process and what you've already got built, and let's go from there.

Yeah, actually, in my lab here, I was trying to follow the process you were working on yesterday. I did end up having some issues with the lab; latency was the issue, and I'm sure our network is part of the problem. When I was doing this, I got cut off there on the evaluate. On my end, I was trying to see what other folks were doing, as you were doing exactly what we're doing now, walking through the workflow, so if anyone had questions, I was following along. Then I ended up getting cut off and had to restart for whatever reason. So I just spun up a Docker image on my desktop and pivoted. I ended up building something that I'm currently just toying with in my environment.
So basically, what I'm trying to do, because I see potential in it: at least some of our data formats, which are CSVs, have that top header row, the metadata essentially, that lets us know what attributes are available in the data. So I'm doing the same thing here, trying to see if I can just extract that top portion to see what metadata is available. The next steps are still kind of blurry, and I know I'm going to have to do some research, but the idea is that I can at least know what the header contents are and compare that to a catalog to say: hey, I'm looking for a particular type of data, and this file happens to have it.

I like that.

And make it visible: hey, this data has this one column you're looking for. So that's what I was toying with. And I apologize, I couldn't do it in this lab, because again, yesterday I was having issues, but I also wanted to see if I could do it in the same step with an actual data set.

No, no. And again: is what I've gone over the last couple of days starting to set in? I think it is, right? I saw, I think it was yesterday, or the day before, that you were already experimenting and playing around. So it's funny; you just spun up a Docker image and went to work. I would have done basically the same thing. So I think you get it. You've got your flows laid out. It's starting to become readable. I think you grasp that there are processors you can use to do these things and then send emails and such as well. You've got everything in the processor group. Your right side is starting to look like a spider web, but I bet if you were to go through it completely, you would get it cleaned up and make it more presentable. I think overall you're grasping what we're trying to do here and why we're doing it. Actually, I thought you got that on the first day, when I saw your experiment with all your images or whatever you were working on. But no, cool. Any questions I can answer? Any questions about NiFi? Are you stuck on a specific step, even in your test playground? Anything I can help with?

Not really. Or I should say, no questions; I'm just experimenting. If I do have any, I definitely will reach out. And I definitely got some ideas from Leroy, Peter, actually even Tom. And now that I know they know, I'm going to reach out to them and bug them.

I think Tom finished first of everybody; he had the fewest questions. So I think Tom's already done. He's probably already done with the next scenario.

No, hell no.

Tom already knows I'm going to bug him. I have bugged him, and I think it's been about two years; I used to bug him just about every day for a long time. Then I switched roles and stopped bugging him. But Tom, now that I know you know...

I'm way below your pay grade, man.

No, sir. But really, I think these folks are the ones who are going to be more hands-on, and I don't want to be a distraction.

Okay. No, no.
It's one of those tools that, again, anybody can download: spin up a Docker image, install it, play with it. It may not be part of your day-to-day operations, but you may have a use case even at home. Maybe you get like me sometimes and want to get away from all the Google services and start storing everything locally; well, then you've got to process all that data. So have at it. If anything comes up, if you have any questions, let me know. But I think you're coming down the right path.

I appreciate that.

Yep. All right, Tom. The high achiever of the class.

I don't think so. I didn't finish. A couple of people got a lot farther than me. Odarius and, I think it was Leroy, got a lot farther than I did. I was very surprised to see their ExtractText and their scripting. That was really nice.

Walk me through your flow.

Well, it's been my turn to be the distraction today, or to be distracted, because our team lead is on vacation the rest of this week, and there's been this short-fused, dramatic fire thing I've been dealing with.

I've been in the government for over 20 years; I understand.

Yeah, so it figures: my team lead's not here, and this thing just happened. Anyway. Drama, man, always something.

Always.

But no, like the other folks, I was trying to pick up all the files, because I liked his approach when he was describing it: pick up all the files, sort from there, then convert the CSV to a JSON format. And then, like most people did, split the JSON. All of this was working, and I got to this point, and then I wasn't sure what to do after I had the attributes from the CSV. I felt like I wanted to merge the contents into just one single file. And then, like the scenario was asking, I would have liked to take some of those and pull some sort of summary, like you were saying, the high-temperature day or the wind speed, certain attributes, and pull that into some kind of summarized report, and then put that into some kind of email. Or once I had it in one file, maybe put it into a different folder; I wasn't sure. The end result would be to not only have a well-formatted, human-readable report stored somewhere, but also have something like that sent out as an email: here's your weather for the day, or hey, we experienced this yesterday, something like that. I hadn't really thought that far ahead, but that's what I was thinking.

Well, I like the thought of getting each one of these as an individual record. Because if you can do that, then potentially, if that was an individual JSON record and you were looking for a temperature threshold of over 90 degrees, for instance, you could send that data to an EvaluateJsonPath, look at just the temperature path, and if it meets that threshold, send it another direction where you generate a report.
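[Editor's note: a plain-Python rendering of that routing decision, as a hedged sketch rather than a processor. In the flow itself, EvaluateJsonPath would copy the temperature out of the record, and RouteOnAttribute would apply a predicate along the lines of `${temperature:toNumber():gt(90)}`.]

```python
import json

THRESHOLD_F = 90  # assumed alert threshold from the example

def route(record_json: str) -> str:
    """Return the relationship a single-record flow file would be routed to."""
    record = json.loads(record_json)
    temperature = float(record.get("temperature", "-inf"))
    return "report" if temperature > THRESHOLD_F else "normal"

print(route('{"station_id": "KATL", "temperature": 94}'))  # report
print(route('{"station_id": "KATL", "temperature": 71}'))  # normal
```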
So there were a few ways you could ultimately have handled that. You could send it to a SQL query or a database, those kinds of things. But I like the thought of getting all of this in the same format, putting it all together, and then starting to analyze it. That way you can also do some trend analysis and things like that. So I think you're close; I understand you got hung up on a few of these parts and got pulled away.

Again, I recommend for everyone: download this and install it. Well, there's nothing really to install; download and run it, and play around. Just make sure the data you're playing with is something you can delete and never get back, because NiFi is very quick, and if you accidentally hit start... In the last class, they had a couple of folks doing a GetFile, and they did a GetFile on the NiFi directory itself; it picked the application up and crashed it. Luckily we didn't have any of that in this class. So again, some of these concepts just take practice, take time, take Googling. Don't beat yourself up if you didn't finish, or even if you didn't get halfway through. What I'm looking for is, again, a basic understanding of NiFi: how it works, how some of the processors work, what you can do, and flow file versus content and attributes, those types of things. So, great job. Any questions I can answer for you?

Well, I think my biggest problem, the biggest hurdle I see, is understanding what some of these processors actually do and which ones to pick to fit your requirement. That, to me, is extremely difficult and challenging to figure out, you know what I mean? Even understanding what the processors actually mean. Like, okay, I can add a processor, and then you start looking at it and you're like: I have no idea what that means, I have no idea what to put in the properties. I guess it comes with using it more and, like you said, playing with it. It's definitely a challenge. I mean, this is really, really cool, and I really love playing with it, but I also think it's extremely challenging and difficult to grasp.

It is. And one thing I always recommend, something even I do, having written some of the underlying code for this: I stay in the documentation, because there are hundreds of committers on NiFi, and a committer may be somebody working on just a single processor. So I always have the documentation open. You've got your Expression Language guide, which is a biggie, and a RecordPath guide. We really haven't touched on some of the admin and toolkit guide material, just because that's pretty low-level technical stuff. But I love the processor list; these are the officially supported processors. I'll pull it up and click around: oh, okay, let me see, I'm doing some JSON stuff, so let me find a JSON processor.
I like JoltTransformJSON, right? You can actually do some transformations on the JSON; I'm surprised somebody didn't choose that. And then you can get more information, as well as the relationships, what's required and what's not, the explanation for each of the fields, those types of things. And there are 511 processors, plus reporting tasks and parameter providers. That's another thing that makes it overwhelming: there are so many. That's a complaint the community has. How do you balance shipping this with some of the core technologies you would need, versus shipping it with very few processors and everybody having to download and find everything else? But that's also one of the advantages. One of the cool things is that you can be as wildly creative as you want, or you can try to keep it simple. It's a wide range; your strategy, your approach, is wide open. That's what makes it cool, though.

Yeah, but difficult for somebody like me. Still, I don't know.

Yeah, no, I get it. Okay, good. Any other questions I can answer?

The only other thing I would ask: in a scenario where we're deploying this to prod, we're going to be using DoD SSL strategies, we're going to be accessing the web GUI via a friendly DNS name, et cetera. And you want to do even simple things like get files from a directory where your own user doesn't have access to the directories, and you have to use a service account and a password. How does something like that work? I'm assuming there's a process where we put in the service account and the password, but then is it encrypted? I was just thinking about that yesterday: how would something like that work? Because that's generally how we would end up doing things in our environment.

Yeah, that's a great question. We went over this in the first class: if you see the little shield icon next to some of the processors, that's because they have access to the file system, or access to other resources outside of NiFi. For instance, these processors can reference resources over HTTP, so they can actually pull in an HTTP request. We were actually talking about that and the multi-tenancy. So when it gets set up, they're going to set all of the multi-tenancy up, and they're going to have policies. So when Tom logs in, Tom may not actually have access to the GetFile processor, just because of security restrictions. And if you do have a password, let me show you what happens with passwords.

Yeah, I can't tell you how many things we do with service accounts and passwords, honestly.

Yeah, no, I get it. That's a lot. So here, let me see. Here's a credentials service and everything else that you can set up with GCP. But let me find one that specifically asks for a password. Oh, HTTP. InvokeHTTP, I think it is. Well, I'll just show it right here. So let's say you had a username.
I added one to my canvas by mistake and I was like, oh, that's kind of cool. But then I removed it, because I don't think it was what I was looking for.

Yeah. So say I had a username and a password. You see "sensitive value"; that's now set to yes. Okay. So now we'll type the password, a made-up password. It's a sensitive value, so it's encrypted on disk and in the settings. Unless you have permission to modify that processor, you do not have access to the password. Not only that: if someone does have access to modify it, they still don't get to see the password, because it's a sensitive value. Even I can't see what I set that password to be; if it's wrong, I have to type it again, set it, and done.

So yes, if you have a username and password, we can set that in many places. I like this question because it helps me show a couple of other parts we talked about earlier. There are parameters. If there's a username and password we want to use, I can create a parameter context, call it something like creds or DB, and give it a parameter, say username. Then I can add another one for the password and mark it as a sensitive value and apply. Now I have a parameter I can reference in all of my processors. So instead of Tom needing to know the password, you can just say: reference this parameter. You reference it with the parameter syntax, which is just hash, curly brackets, the parameter name: #{password}. So you never knew the password that was set, but you are referencing it in your processor, if you have that permission, right?

It's really cool, because that way, say I'm the sysadmin and I have the keys to the kingdom and I set all of these things up. I can then say: okay, Tom, when you connect to this database, just use the database-username parameter and the database-password parameter. I can set both of them as sensitive values. You would never know the username and password, but you would still be able to connect to the database.

Oh, that's what's up. Yeah. Okay. Pretty cool, huh? Very cool.

No, thank you for asking, because it gives me a teaching moment, so I appreciate it. All right, any other questions?

Yeah, I think I'm good.

Okay. So what I'm going to do now is walk through my flow and what I was thinking, because yesterday evening, noticing some of the hurdles we were having, I went through and built my flow as well. I just want to show you all what I was thinking. Then we'll probably go ahead and go to lunch, come back, install Registry, and work on another scenario that's relatively easy.

So anyway: of course, I'm getting my files from the directory. It's got CSV, it's got JSON, those types of things. I send it to a RouteOnAttribute to route the data on the file type. What I'm looking for: there's a filename attribute in the context. So if it contains .csv, send it one place; if it contains .json, send it another place.
Now, once you get that file, it does some MIME type identification, so I could route on that instead. The identified MIME type reads what type of file it actually is; it couldn't care less about the name. Everything could be named .json and actually be CSV, and it wouldn't matter, because the MIME type is going to detect CSV. So I could have used the mime.type attribute here and said: okay, if the MIME type is application/json, send it over here; if it's CSV, send it elsewhere. But for this example, I made it something easy: if the file name contains CSV, send it one place; if it contains JSON, send it somewhere else.

I built this flow modeled largely on what you all were working on, as well as the previous one. I'm setting the schema metadata; we went over that a couple of times. With the previous flow, the weather one, I converted it to JSON. I actually do not need the set-JSON-file-name step; I left it there just because that was the previous scenario. I couldn't care less about the file name, because I'm not writing it to disk right now. From there, I could have sent that JSON straight to the SplitJson, so if I were cleaning this up, I would remove that step altogether. It's a step I don't need, because I'm going to write file names later.

Then I'm splitting it into individual records. Tom, you might have mentioned, or Darius, a couple of you mentioned wanting to get it down to an individual record. That way you can inspect each individual file that comes out and make decisions. You may still merge it later, but that way you have when each record happened. Also, if you have each individual record, say you want to take that record and put it onto your enterprise service bus or something similar, you can take each record and post it. That might be a use case.

I do an EvaluateJsonPath, like most of you did. Again, whenever you are doing an EvaluateJsonPath and pulling out multiple elements of the JSON document, make sure you have the Destination set to flowfile-attribute. If you do flowfile-content, it's only going to pull out one of the elements of that document, right? So what I did is: I've got this JSON file, I'm bringing it in, and I'm extracting every value out, because I want each one as an attribute. That way, if all the CSV ends up as the same attributes and all the JSON ends up as the same attributes, I can write those attributes however I want.

So I did that; I extracted it. And then I started building my document. I took all the attributes and saved them as JSON. Keep in mind that I used pretty-print on my JSON, just so it's more presentable. I also included the core attributes. And of course, for the Destination, I want to go from the attributes down to the flow file content. So I took all those attributes, built an actual data document, and that's what goes to the next step. You could do flowfile-attribute, but I don't need to, because I've already got them as attributes. And so I run this one time. Okay: it took a list of all the attributes.
And the reason I did that is that now I've got even more information. That actual file only had temperature, wind speed, station ID, precipitation, and humidity. Now I also know: was this originally a JSON document or a CSV? I know it was JSON, because I have the route from RouteOnAttribute. Those are some of the core attributes that were automatically added as this document traversed the flow: when the file was last modified, the fragment identifier, the UUID, the path where the data originally came from, the original file name, some of the owner information, and so on. I wanted to include that, because it might be additional data I need when I'm making my data-science-type decisions. But if I didn't want it, I could have spelled out only certain attributes for AttributesToJSON; it's just a comma-separated list. I could put temperature, comma, humidity, comma, wind speed, and those are the only three attributes I would have gotten when I wrote this document. So, something to keep in mind.

And then this is where setting the file name really came into play. I was trying to work with some fancy date format I was prepending or appending, but for the sake of time, instead of trying to diagnose it, I just updated the file name to be a UUID plus .json. It was really easy: it's an UpdateAttribute, so it takes the filename attribute and outputs something like ${UUID()}.json as the new file name. The data coming in had a weird JSON file name, and now, let me run this one time, the file name should be a unique UUID.json.

So now I've got all of my JSON extracted, I've got my CSV converted to JSON, I've got everything coming in, and all the data looks exactly the same. Even the file name has a pattern. I could have named it a date or something else; if you look in the Expression Language guide in the NiFi documentation, you'll see filename has a pretty substantial category to it, so you can prepend, append, replace, those types of things.

From there, I created a new processor group, because I wanted to start showing off some other tips and tricks of NiFi. I created a new processor group that is basically the output to the file system. I go into my processor group and I create an input port. When I did the input port (let's see here: from the main group, local connection; oh, let me delete this one, because I'm already using it), when you set up an input port, it's going to ask where it comes from, and you just drag and drop: it came from that previous group. So now I have an input port, and what I'm able to do is start separating this.

This goes back to what I was saying earlier: Tom may be responsible for all data movement that gets written to a database. Tom says, well, if you want to write to a database, here are the specs you need; just send me the specs. My process group handles everything else: it will take your specs and write them to the database.
You do not have to worry about that task.
--> Breaking this up into different process groups and having input and output ports really helps with divvying up not only the workload, but also reuse. I've seen process groups that everyone uses because they do some sort of data enrichment.
--> I've had other data engineers use those process groups with no clue what was inside them.
--> They just knew they could put CSV in and the output would be this beautiful JSON that answers all their questions. Somebody else built the logic.
--> In that sense, you can think of a process group as a processor: you've built this whole data flow, and now you've got basically a processor that you can reuse over and over.
--> So there's a lot of power there; there's a lot of capability.
--> And then, when I got done writing the file out as each individual record, if I'd had time, I would have probably sent it on. I like SQL, I like Python and such, so I would probably send it to a SQL processor and do some SQL on the files coming in.
--> That way I can compute averages and things like that.
--> And then if I got an alert that something was above a threshold, I would package that up and send it as an email.
--> But that's my flow, and some of the thinking that went into mine.
--> And then I tried to work on some of the beautification.
--> So, any questions on what we've covered in the last two and a half days?
--> Okay. Well, I didn't realize you guys came in at like 6 a.m., or I would have tried to get us out of here earlier for lunch.
--> So let's go to lunch. We will come back at 1:50 my time, 11:50 y'all's time.
--> And then we will get started with Registry. We will check our flows in, and we may go over some more slides or work on another scenario, depending on time.
--> But we'll definitely touch on Registry and checking things in, because whoever gets the privilege of deploying for you all, and whoever has to administer it, you're going to see Registry.
--> I want to make sure you know it's a different component than the canvas, but it's a sub-project for now.
--> And then we're going to touch on and find some other things as well.
--> So go enjoy lunch. I will see everybody back here in 45 minutes, which is 11:46.
--> You know, I was going to say, those two things I've been hearing a lot. The keywords I've been hearing a lot are multi-tenancy and registry. I hear them talk about that all the time.
--> Yep. And those are things that you're going to use.
--> And when I say go over some slides, I actually have a Q&A section about multi-tenancy, and then I also have slides just about multi-tenancy.
--> I just don't want to kill you with too much PowerPoint, but we will go over it.
--> Appreciate it. Yeah.
--> I mean, I don't know if I want to be the SA for it. I'm hoping not. I'm hoping the person who deployed it now owns it.
--> No, you're getting stuck with it, Tom. You're the expert.
--> You never know.
--> That's how it usually goes.
--> You're the captain now, Tom. You're the captain.
--> Sure. That's how it usually goes.
--> All right. I'll see you guys here soon.
--> Hopefully everyone's getting back from lunch.
--> Okay.
--> Hey, Joshua. I have a question on mine.
--> Yeah, Peter, go ahead.
--> So, I managed to change the way that it was naming them. I got a good name format that I like now, but it's still not exporting data as a CSV.
--> It exported all of them, named by date, the hour the data was recorded, and then the station.
--> Nice.
--> So between those three attributes, that's going to create a unique ID.
--> Uh-huh.
--> Same C-Tracker, or, yeah, I guess C-Tracker separately.
--> Okay. This is what it looks like when you open one of them, though: it just has one line of data, and there are no headers.
--> Well, it's a CSV file. So it's a single... well, wait a minute, okay. So it looks like that's per record.
--> So it is just comma-separated values, but it doesn't have any header row.
--> There are no headers there, so you don't have the attribute name above the value, like temperature or date.
--> Did you format the date?
--> No, that's the date from the attribute.
--> So it does look like a CSV to me. Unless there's a comma somewhere in the data, and I don't think I see any commas in the data, I think it's a proper CSV. It just has no header row.
--> Okay, yeah, I see what you mean.
--> Yeah. It doesn't have a header, or a name for each of the... Yeah, a name for each of the categories.
--> So that's something you can add.
--> How would you add that header back real quick? Let me look at mine. Let's see here.
--> I like your date format. It's pretty cool once you start messing with some of the regex.
--> Let me go in here. I remember where I got that from, somewhere further back.
--> All right, let's bring down the... you're doing an AttributesToCSV, right?
--> Yes.
--> All right, let's do this real quick. I think it's the Include Schema property.
--> If you include the schema, the attribute names will also be written into the CSV, so that way you have a header row.
--> It's going to give you the attribute names; in your case, it'd be temperature, station ID, those types of things.
--> So if you set that to true, it should give you all the attributes.
--> It's not going to literally say "attribute"; it should say temperature, and then the value on the next line.
--> That way you always have a header with it.
--> But it does look like regular CSV to me.
--> If you can copy it and send it to yourself, you can see if Excel opens it with no problem.
--> There's a way to upload files through the drop-files area at the bottom of your screen, but I don't think this platform has an easy way to download.
--> You could use something like... not Pastebin. One of the things that we use for this class sometimes is Etherpad.
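A minimal sketch of that header fix, using the standard AttributesToCSV properties; the attribute list here is assumed from the weather scenario:

    AttributesToCSV
        Attribute List : temperature,station_id,date    (assumed names)
        Destination    : flowfile-content
        Include Schema : true    (writes the attribute names out as a header row)

With Include Schema set to true, the first line of each output becomes the attribute names and the next line the values, which is the header row Peter was missing.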
--> Here's an Etherpad that I have saved. I teach this class as well as the DoD Architecture Framework class, just because I've had to implement that so many times now.
--> But you can actually go to Etherpad. You can use this company, NobleProg; you can use their Etherpad.
--> If you go to etherpad.nobleprog.com, you can just create one and say okay, and it's going to create a new pad for you, and then you can use that same address on your local machine and copy your information over.
--> But that looks like valid CSV. It looks like you just need to include the attribute list and you'll have a header, and you're off to the races.
--> Yeah, I like it.
--> Yep, yep, okay.
--> Well, hopefully everyone had a good lunch.
--> I wish I had realized you all came in at 6 a.m., because you probably take your lunch closer to 11. I know yesterday and the day before we went a little bit past that.
--> It would have made me happy too, because I usually eat lunch at about 11:30 my time, which is 9:30 y'all's time. I didn't realize y'all start so early, and so do I, so good deal.
--> Anyway, feel free to follow along with me.
--> I could kill you via PowerPoint, but again, I feel like hands-on is best, and it really teaches you how to do these things.
--> So on this next topic, we're going to talk about NiFi Registry and what it does.
--> Then we are actually going to configure our NiFi instance to use Registry, and we're going to check our data flows in and check them out, before we roll into multi-tenancy, which is a PowerPoint, unfortunately.
--> And then if we have some time, I'd like to do another quick scenario, then a little Q&A, test-type thing, and then we can get done and get to a long weekend.
--> So let me minimize this.
--> Okay, so if you remember when we installed NiFi, we had a zip file I showed you.
--> We could go and download the zip file. Let me bring my browser up.
--> So I went to nifi.apache.org, and if I click Download, there's the NiFi 2.0 that I was talking about earlier. You can download the binaries.
--> The latest stable 1.x release is 1.26. It's also my understanding it's the version some of you may be using when they install it.
--> So feel free: you can download NiFi, and we did. We downloaded the binaries; we downloaded the NiFi standard build.
--> Now, there's NiFi standard, which is what we are using. There is NiFi Stateless, which is really cool.
--> It's not been out that long, but it gives us the capability to run these flows as a stateless service.
--> You can have a data flow, and it gets packaged up basically as a microservice, and it will run in your Kubernetes or your AWS or Azure as a service.
--> So you have a microservice whose only job is executing that data flow, and that's it.
--> There's no UI, there's no way to modify it, and things like that.
--> It's for those production-grade data flows that you want to set and forget.
--> And then if there are any changes, you can quickly build another one and deploy it as well.
--> So that's Stateless.
--> There's also the NiFi Kafka Connector kit. Kafka, you probably have heard of it.
--> It is another open-source project that is widely used. Kafka is used more than NiFi.
--> It is a very popular pub/sub type of capability: it provides a message bus where you push data to topics and you can subscribe to topics, those types of things.
--> And then there's the NiFi Toolkit. You can download that as well.
--> If you are a sysadmin, you would be interested in the NiFi Toolkit.
--> It's the one that's going to help you start setting up some of your security things, as well as changing usernames and passwords if you're locally authenticating like we are.
--> It's got a bunch of different capabilities.
--> The Toolkit is command line only. There is no GUI, no UI. They don't make it easy that way; well, they made it easier for the sysadmin.
--> So if you're a sysadmin, you may be interested in those different flavors of NiFi.
--> But if you look at the sub-projects, we have MiNiFi, which we are definitely going to chat about, just because I know the other group is using it and is going to use it even more.
--> We have MiNiFi, we have Registry, we have the Flow Design System.
--> The Flow Design System we're really not going to go into.
--> It's a sub-project of NiFi, but basically, if you were building a custom UI to interact with NiFi and other things, and you want it to have the same style and layout as NiFi, with some of those reusable components, that is the Flow Design System.
--> Not a lot of people are using that right now; there's not a whole ton of uses for it. It's more for if you're building your own custom product, I feel like, those types of things.
--> But the one that we are going to touch on first is Registry.
--> It's already downloaded on your virtual machine, so you do not need to follow these steps, but you can go to the Apache NiFi Registry sub-project page, and you've got documentation.
--> Here's the Registry documentation, here's the wiki, here are videos on getting started and running it, here's what it does, those types of things.
--> And you can download Registry just like everything else: if you click download and click the Registry 1.26 binary, it's right there. You can even download the source code.
--> But you don't need to worry about downloading it, because we are going to install it.
--> So with that being said, if you want to follow along, what I'm doing is going to my desktop and back into the Downloads folder.
--> In the Downloads folder... let me double-check. Awesome.
--> You should see a file that was made earlier this week: nifi-registry-1.26.0-bin, a zip file.
--> It's just like the NiFi zip file, except it's nifi-registry.
--> So what we're going to do is right-click and extract all.
--> I'm going to leave the destination as the Downloads folder and let it run. That'll take just a minute to extract.
--> Okay. And the way that I like to do some of these sub-projects, and I know I haven't really told you yet what NiFi Registry is or what it does, is to go right into it, because you already have data flows that we can check in.
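For anyone following along later, a minimal sketch of the install steps just shown, assuming the Windows virtual machine layout used in class (the exact paths are illustrative):

    Downloads\nifi-registry-1.26.0-bin.zip     right-click, Extract All
    Downloads\nifi-registry-1.26.0\
        bin\      run-nifi-registry.bat and the other start/stop scripts
        conf\     nifi-registry.properties
        docs\  ext\  lib\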
--> And I think the hands-on approach is way better than a PowerPoint. It should be extracted by now.
--> If you get caught up on any step, if you missed one, if I go too fast, or you glance away for a second, just interrupt me and let's get you on the same page.
--> So the layout of this folder is a lot like NiFi's, right?
--> Unlike NiFi, though, we don't have all the data governance reporting, the content repository, and those types of things.
--> But we do have a bin, we do have a conf, and we have the docs, ext, and lib directories.
--> We went over the lib directory in NiFi; it holds the processor NAR files. Same type of thing within Registry: the lib directory holds all the libraries and any kind of components of Registry.
--> You have the docs folder, the conf folder, and the bin folder.
--> The bin folder has the executable to run Registry, and of course we're going to go into that. Just like we did with NiFi, we're going to run Registry.
--> But before we do that, I want to go into my configuration directory.
--> I understand that not everyone will have to do this, and that's great, but for some of the sysadmins on the call, folks that are going to help set this up, this is where your settings are located.
--> So if you go into the conf directory and open nifi-registry.properties, you should be able to edit it with the Notepad++ that should already be installed, if you want to follow along. Or just watch my screen on this one.
--> So we have our web properties, exactly like in NiFi, and some of the security properties.
--> All of these should look very similar to the NiFi properties. It's the same format, same naming convention, those types of things.
--> Except, instead of nifi.db.url, for example, it's nifi.registry.db.url.
--> So if it shares the same property as NiFi, that property is just changed from nifi.* to nifi.registry.*.
--> Some of the key things are your security settings and that type of stuff.
--> The configuration file for NiFi Registry is much shorter, right? It's a sub-project of NiFi; there's not a lot going on there.
--> And it's not a heavy lift; it's a much smaller package. NiFi is about two gig when you extract it. I think Registry is like three or four hundred meg.
--> But the main property we'll look at is: what port is this going to be running on?
--> In our configuration, it's going to run on 18080.
--> If you know the internet, everything runs on port 80, and then it usually takes you to 443, which is secure.
--> 8080 is your backup, unsecured web port, and 18080 is the backup to that backup.
--> So it's running on the backup HTTP port of 18080.
--> And when we visit it in the browser, we're just going to go to localhost, colon 18080, slash nifi-registry.
--> We don't need HTTPS; we don't need any of that.
--> The reason being is that Registry doesn't need the same security here, because even if you install Registry and it's public-facing and things like that, it can't execute code.
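The one property called out above, as it appears in conf/nifi-registry.properties (this is the standard property name, and 18080 is the default unsecured HTTP port):

    # shared properties are renamed from nifi.* to nifi.registry.*
    nifi.registry.web.http.port=18080

With that default left alone, the UI lives at http://localhost:18080/nifi-registry.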
--> It can't execute any of your data flows or anything else.
--> The NiFi Registry sub-project is only for your version control, right?
--> That is, making sure that your flows have a version control system in place, a place to put those, a place to operate with... I know you all use Azure DevOps on-prem, so a place to operate in that type of environment.
--> And we'll go through it, but it's not secured; there's no username or password.
--> So when this gets set up in your environment, in your regular dev, test, prod, you're going to need to have security enabled; you're going to need logins and policies and those types of things.
--> But for the sake of this class, luckily, we don't have to use that.
--> So we didn't make any changes to our properties. I'm going to just leave it alone and exit out. I just wanted to point that one property out specifically.
--> So that's the main property. We're going to go into the bin directory, and we are going to say run nifi-registry, just like we did with run nifi.
--> So open it up and say run. It should start just like our NiFi; you should actually have two command-line boxes up.
--> And when Registry runs for the first time, it's got to create its folders. Now I have a logs directory and a work directory.
--> It's got to unpack some of the library files included with it. So give it just a minute.
--> But when it does come up, you should be able to visit it. Let's do a new tab, because we're going to need our NiFi instance in a minute.
--> Sorry, wrong browser; I was doing a new tab on my own desktop.
--> All right, so say run, run that nifi-registry.
--> You might get a Java warning like I have. Just say allow access, because this is a network application.
--> So Registry should run; it takes a minute to start up.
--> While that's starting up, bring up your new tab.
--> You want to go to http://127.0.0.1, still on localhost, colon 18080 for that backup port, and then /nifi-registry.
--> I don't know why they make it this way, but with NiFi, if you do not put /nifi, it will say, hey, I think you're trying to go here, and automatically redirect you.
--> But with Registry, if you go to just the IP address and the port, it will just give you an error that there's nothing there.
--> They don't have a redirect like they do with NiFi, so you have to go to /nifi-registry.
--> So I'll give everybody just a minute; let me check everyone's screen.
--> You should have Registry up and running, or it should be starting.
--> Ecta, you're looking good. Tom, you've got it working. Peter, give it just a minute; it might take a minute to come up.
--> So what this is, Registry, basically, is a way to check your flows in and out.
--> Exactly. And I'm on purpose not going into the full details of what it is, but you got it, right?
--> So NiFi needs a way to check in the data flows.
--> And the data flows, they're not like a Java application, right?
--> You may have some sort of version control already, where you're storing your Terraform information, your Chef and Puppet, your Java applications, your Python code.
--> Ansible? Ansible, you got it, right.
--> But a data flow is a little different.
--> Now, underneath the hood, a data flow is just a JSON document, right?
--> We imported a single JSON file into our new process group for the previous scenario, so we could do the first-time controller service.
--> But with code, you can just pop it open in your IDE and start working on it.
--> You can't necessarily do that with ease with just a JSON blob of text.
--> And so you need that interpretation, that interpreter. Registry gives us that interpreter.
--> Registry then also connects to your version control system.
--> So think of Registry as the avenue to source-code-style version control.
--> Registry is going to handle the comments, Registry is going to handle the versioning, as well as being able to put that into your GitHub or GitLab or whatever Git repository you're using.
--> But you got it, Tom; you understand it.
--> But let's go ahead and make sure we've got everybody up.
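To make the "a flow is just JSON" point concrete, here is a heavily abbreviated, illustrative skeleton of what a versioned flow snapshot can look like; the field names follow the Registry flow-snapshot format as best it can be reconstructed here, and all the values are made up:

    {
      "snapshotMetadata" : { "version" : 1, "comments" : "initial check-in" },
      "flowContents" : {
        "name"        : "weather-ingest",
        "processors"  : [ "..." ],
        "connections" : [ "..." ]
      }
    }

Registry's job is to version documents like this and push them into your Git repository, so you never have to hand-edit the JSON blob yourself.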