I didn't get far.

No worries. Do you want to talk me through how you were going to accomplish this?

Yeah, so I just started off with converting the CSVs to JSON. Regardless of whether it was a failure or success — since you would get a failure if it was already JSON — I had them go through another conversion that just made the JSON files pretty, and then basically merged those into one. Then the part with ExecuteScript was where I stopped. I don't really have experience doing Python, so I wasn't sure how to go about that.

Okay, no worries. Like I said, there's a couple of different ways you can handle this — you could use ExecuteScript, SQL, those kinds of things. I'd be careful about picking up and moving files through a processor when it may not be needed. To get the best performance out of the system, you want each processor to work on data it knows about, or to do that single task it's meant to do. So if you have files that you know may fail, send those a different direction and have another process handle them. But overall, great job. We have a mixed audience of different skill sets, so I didn't expect everyone to get finished — I didn't actually expect anyone to get as far as some folks did. But being able to explain how you're building out your flow, that's critical.
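The conversion step described above — convert CSV to JSON, pretty-print files that are already JSON, so everything downstream looks the same — can be sketched as a small standalone Python function. In NiFi this would be a ConvertRecord/route-on-failure pattern or an ExecuteScript body; the function below is just an illustrative analogue, and the filename-based type check is an assumption.

```python
import csv
import io
import json

def to_pretty_json(filename, text):
    """Convert CSV content to pretty-printed JSON; pass JSON through.

    A standalone sketch of the conversion described above. Deciding
    CSV vs. JSON by file extension is an illustrative assumption.
    """
    if filename.endswith(".json"):
        records = json.loads(text)  # already JSON: just re-pretty it
    else:
        records = list(csv.DictReader(io.StringIO(text)))
    return json.dumps(records, indent=2, sort_keys=True)
```

Either way, the output is one consistent, pretty-printed JSON shape, which is what makes the later merge step straightforward.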
Just keep in mind: name your processors something a little more readable when you can, and add some labels, those types of things. If you were to set up a very extensive data flow, it just helps visually to recognize where things are. We'll go into some of the other visual aspects of NiFi when we get to the security settings. But no, great job. Thanks for walking me through it. Any questions so far on any of the components on the canvas, or any of that stuff?

All right, who's next? This is really nice. Good morning, Tyler — do you want to walk me through your flow, what you were thinking, how far you got, any questions?

Yeah. I just ingest the data, then do the conversion. I was trying to work with this QueryRecord, and I'm having some issues with the date column during the ingestion right now. This QueryRecord is pulling out — right now I just have a wind speed warning. I didn't really flesh out these two paths, but these would be going to a merge. I guess this path would be going to an aggregation report, and this path out — it's pulling all of those records and then binning them by day.

Nice, nice. So another instance where we're using QueryRecord, and you're using EvaluateJsonPath. There are tons of different processors. I really like your layout — it's nice and clean, you can follow the path and the life of a packet of data. So overall, great job. Any questions I can answer? Any issues or anything else?

I didn't really have many issues. The only issue I had was with the date, but I think it's just because the schema is pulling it in as a string.
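The date problem above — the record schema typing the column as a string, so date logic downstream misbehaves — boils down to parsing the string into a real date type, which is what a logical date type in the record schema would do. A minimal sketch, where the field name and format are assumptions rather than the actual dataset's:

```python
from datetime import datetime

def coerce_dates(records, field="DATE", fmt="%Y-%m-%d"):
    """Parse a string date column into real date objects.

    Sketch of the fix discussed above: the schema was treating the
    column as a string. The field name "DATE" and the ISO format are
    illustrative assumptions.
    """
    for rec in records:
        rec[field] = datetime.strptime(rec[field], fmt).date()
    return records
```

Once the column is a true date, binning by day or comparing date ranges works without string tricks.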
Yeah — dates can be a little finicky; we saw that earlier as well, yesterday. Do the best you can, and if you keep working on it and have any other questions, let me know. If you want, we can take a look at it when we're on break and see if there's a way we can quickly fix that date.

Good morning.

Good morning. Okay, so we didn't get to talk much yesterday. I'm converting the CSV to JSON, with two different GetFiles — whether it was CSV or JSON, pick it up, but then start into the flow at a different point. My question is: is there a processor that allows you to launch an external application? Like, if I already have a Java application, a jar file, sitting on the server, can I launch that from here, pass it the input file, and have that do the work?

You can. Say you have a jar file that has all the logic built in — you can pass the flow file to it and get the results back for processing here. You can execute a process, and that could also be something like a shell script; we use this for executing Linux commands, those types of things, along with the arguments. The caveat is: when you take a piece of data outside of NiFi, any kind of processing that happens there loses that data governance part. We'll see in the lineage that you sent it to the process, and we'll see in the lineage when we get the data back, but any processing that happens outside of that, we will not be able to capture in the provenance events. Now, with that being said —
Sometimes we have these external applications that do things like this very well, and we call upon them. If you don't want to rewrite your whole application — and since it's already Java, I bet converting it to a processor would be pretty easy — there are also ways to keep the NiFi ecosystem alive in your separate process: being able to save those attributes, and there are callbacks to NiFi you can use to say, here are the attributes associated with this flow file, those types of things. But yes, you do have a processor to execute a process, and that can be `java -jar` with your parameters. Like I said, I'd love to be able to go through each individual processor. There are different ways of handling this — you can even execute a script and do some of that — but in your case you would probably just use ExecuteProcess to execute that Java. Just make sure the external application gets shipped along with NiFi: once you deploy this and cluster it, you'll need to make sure you have access to that binary across all the machines. Great question — and that's why we go through these, because nobody had asked that one yet. If you have any other questions or issues, let me know and I'll take a look. It was great hearing from you today.

All right, Ben. Good morning. Get a little further?

A little bit.
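The external-application pattern discussed above — hand a file to a binary, capture its output — is roughly what ExecuteProcess does. Outside NiFi it can be sketched with a plain subprocess call; in the scenario above `cmd_args` would be something like `["java", "-jar", "app.jar"]`, but those names are hypothetical and any command works here.

```python
import subprocess

def run_external(cmd_args, input_path):
    """Invoke an external binary on a file and capture its stdout.

    A sketch of the ExecuteProcess idea: run a command with arguments
    and treat standard output as the result. The jar-file case would
    pass cmd_args like ["java", "-jar", "app.jar"] (hypothetical).
    """
    result = subprocess.run(
        cmd_args + [input_path],
        capture_output=True,
        text=True,
        check=True,  # raise if the external process exits non-zero
    )
    return result.stdout
```

As noted in the session, anything the external binary does to the data happens outside NiFi's provenance, so only the hand-off and the returned result are visible in lineage.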
Yeah, it's looking a little better — a little less spider-webby. Okay, can you just walk me through your flow, how far you got, and what issues you had?

It's what we did before: get the source file; if it's JSON, put it in my work folder; if it's CSV, do the conversion. The failures go to a funnel, and then the upper branch of the workflow is the merge of the file, but also the backing up of the original. The other stuff, like making an attribute, all those things — my brain doesn't work in that space; it's alien to me.

No worries. But you did something that I would highly recommend when you're designing your data flow and laying these things out. One of the things I noticed is you would put your original file back, or you would take a copy of the file and save it somewhere else, those types of things. When you're building your data flow, building in some of those precautions, some of those safety mechanisms — for instance, if you're writing values to a database, you might write them to a file first, just so you can see: does this look exactly like what I want to go in? The beauty of a processor is that you can branch off hundreds of success relationships, so if it's a success you can send that same success to another file as well. Building some of those safety mechanisms into your flows really helps. Then, when you're done and you've got it tested and you're ready to start shipping it out, go in and get rid of the redundancy, get rid of some of those safety mechanisms.
That way the flow can perform as best as possible. And I think it's funny to see how that's being used — it kind of tells me who's used this before and who's experimented, things like that. So no, great job putting that in. The only other thing I would do is, again, back to the labeling and beautification — make it easier to read. But usually that's at the end, when you're ready to start shipping that data flow out. No, this looks great. I get what you're trying to get at, and I understand where you're going with your flow. If you don't have the skill set to write code, for instance, that's fine, so long as you get close.

Thank you. Let's look at Amanda — oh yeah, she did not want to do training today. Actually, that's who I was pulling up. Okay, Ali — or Alyssa.

I didn't get too much further. I was just basically looking at the processors and what they did.
I did make a little improvement, though — all my JSON now looks the same, because I added a flatten. But basically I'm just picking up the files and then routing by type, and then I didn't get much further than that.

Can you open up your — okay, no worries. You did the one thing, though, that I mentioned: you changed the destination from flowfile-content to flowfile-attribute, and that's what you needed to do. Some of these things, even being a committer, confuse me as to why we do it this way, but if you have it set to flowfile-content, it's only going to let you extract one element out of that JSON document. If you use flowfile-attribute, you can go through the whole JSON tree and start extracting every value out of there and have each one as an attribute. So I'm glad you changed that. I think if you'd had the chance to go further, you were really close, because once you start looking at EvaluateJsonPath — and you can do some of the same things with CSVs — I think your merging and the other steps later would have been a lot easier. Any questions or concerns?

Not that you can help me with right now.

No? Okay. Is there anyone else in the room? I know you guys are sharing a room — is there anyone else? Let's look at Brett. Okay, Brett — looks nice, looks real nice.

Good morning. Thank you. So I didn't get much further than I did yesterday, but I switched how I was doing it about halfway through. I was trying to use — I think it was SplitJson.
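The flowfile-attribute extraction described above — walking the whole JSON tree and pulling every value out as its own attribute — can be sketched as a flattening function. The dotted-key naming below is an illustrative choice, not NiFi's exact attribute-naming scheme:

```python
def json_to_attributes(document, prefix=""):
    """Flatten a JSON tree into a flat dict of attribute-style keys.

    Roughly what extracting every value to flow file attributes gives
    you: one scalar per dotted key. The key-naming convention here is
    an assumption for illustration.
    """
    attrs = {}
    if isinstance(document, dict):
        for key, value in document.items():
            attrs.update(json_to_attributes(value, f"{prefix}{key}."))
    elif isinstance(document, list):
        for i, value in enumerate(document):
            attrs.update(json_to_attributes(value, f"{prefix}{i}."))
    else:
        attrs[prefix.rstrip(".")] = document
    return attrs
```

With every value available as a flat attribute, downstream routing and merging no longer needs to re-parse the document.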
Mm-hmm.

I was able to break things up into different files, because I thought that was the way to do it, but then I switched to this EvaluateJsonPath. So I get the inventory file, and if it's JSON, I just send it to the EvaluateJsonPath, and then I was able to get the attributes extracted — I think I showed this yesterday — like our humidity, precipitation, the station and all that stuff. Then the plan was to just feed the CSV, converted into JSON, into that same thing. You mentioned yesterday that I might have to do a separate evaluate, and I think I do, because the second one didn't parse correctly. So I'll probably have to do a separate EvaluateJsonPath for that CSV to get it to work.

Absolutely. Or, the same way you're parsing your JSON, you could parse your CSV and have it as attributes. With both of those as attributes, you can put processors down the line that write a single JSON document, and all of it would be in the same shape. So there's a couple of different ways — I like the path you're on. I always like using record readers and record writers, just because they're reusable components; you can add some logic and schema and some intelligence behind them. But I think you would have gotten pretty close if you'd had the time. Any issues or concerns, or any questions you had about the overall scenario or flow?
Perfect, perfect. All right, Pedro, let's look at yours. How are we doing?

So my approach was: I put a filter on CSV files so I could make them into JSON. I think I got that working — yeah, right there, I got those guys going — and then I guess after that it was: okay, then just go in and do the JSONs, and then merge them, and then do whatever you have to do.

Okay, yeah. If you'd had time — in your queue, you'll notice with those 10,000 files in the queue, it's basically halted, right? The reason it's halted is that LogMessage on the error path. If you were to start that, it would clear the queue for the ExtractText, and then the process-JSON-files part could send its queue on to the ExtractText. So I'm glad it's there, just so we can point it out — it's a learning opportunity — but yeah, that would have helped clear the queue. I think you're on a good path. Just keep in mind you want to reduce the number of processors you use in a data flow. So if possible, pick all the files up, do your filtering and sorting as soon as possible, and then start sending each type off to its own process, its own flow, and merge those at the end as well. Just tips and tricks to keep in mind. But it looks great — I like the labeling — and the things you got accomplished, great job.

Okay, Shawn, good morning. All right, can you walk me through your data flow?

I also didn't get too much done since we talked about it yesterday. But I definitely learned a few lessons, so I would change how I'd done it if I were doing it today. It's just picking up the CSVs, inferring the schema, converting to JSON, and then writing it out again.
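The halted queue above is NiFi's connection backpressure: by default a connection's object threshold is 10,000 flow files, and once a queue reaches it, the upstream processor stops being scheduled until something downstream drains the queue. A minimal sketch of the same behavior using a bounded queue, with a tiny threshold so the effect is visible:

```python
from queue import Full, Queue

def try_enqueue(connection, flow_file):
    """Illustrate connection backpressure with a bounded queue.

    The bounded Queue plays the role of a NiFi connection: when it is
    full, the upstream producer is refused (in NiFi, the upstream
    processor simply stops being scheduled) until a consumer drains it.
    """
    try:
        connection.put_nowait(flow_file)
        return True   # accepted: upstream may keep running
    except Full:
        return False  # backpressure: upstream must wait
```

Starting the stopped LogMessage processor in Pedro's flow is the "consumer drains the queue" step: once it runs, the upstream connections empty out and the rest of the flow resumes.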
And then this one is just picking the already-written files back up from the same output directory — which is one part I would do differently if I were starting over from scratch. And then I was just messing with this MergeContent a little bit while you were going through with the other people this morning. It just picks the files that were written back up, merges them into a single merged JSON, and then I was going to do the SQL statistics on it.

Mm-hmm. I like how folks were extracting using an Avro schema for the CSV, those types of things, and then there were different approaches, such as using the SQL — that was really nice. You have a MergeContent and a MergeRecord; all of the standard processors have documentation, which is pretty helpful. You'll see a property bolded if it's a required field. "Merges a group of flow files together based on a user-defined strategy" — I think you would have gotten all of those JSONs merged, extracted a few things from them, and set up some alert — extract the wind speed or something — and you would have been finished. So great job. Any questions?

No — but like you mentioned earlier, it's good practice to have those safety steps in there.
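The last two steps described above — merge the JSON files into one document, then raise a wind speed alert from it — can be sketched in a few lines. The `wind_speed` field name and the threshold value are assumptions about the dataset, not facts from it:

```python
import json

WIND_SPEED_LIMIT = 40.0  # illustrative threshold, not from the dataset

def merge_and_alert(json_texts):
    """Merge several JSON record lists into one document and flag
    records exceeding a wind speed threshold.

    Mirrors the MergeContent-then-alert step discussed above; the
    "wind_speed" field name is an assumed column.
    """
    merged = []
    for text in json_texts:
        merged.extend(json.loads(text))
    alerts = [r for r in merged if r.get("wind_speed", 0) > WIND_SPEED_LIMIT]
    return json.dumps(merged, indent=2), alerts
```

In the flow itself, the alert branch would feed whatever notification step follows; here it just returns the offending records.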
It is, it is — and a lot of people say, well, I don't want to add those processors because I'll have to go back and delete them, or I may leave them in as this gets deployed, those types of things. But it's always good to have those safety steps in place. Even to this day, I'll create a flow and get ahead of myself: oh no, I forgot to turn on Keep Source File, and now I'm moving the source files back to a folder. So even I get into those situations sometimes. But nice — it looks like you got really far.

One of the things that I didn't see enough of here, that I've seen in the past — and this is across the board: you might create a process group that handles the picking up and filtering of files. You might have another process group within that parent-level process group that handles your ETL steps for the CSV, and then another process group that handles the JSON. That way, each of those functions can run independently of the others. Keep in mind — I know you're accessing a website here, it's cumbersome, it's not automated, and you're downloading the data. But if you were getting a feed of data just written to disk, for instance, with different data types and formats, you don't want an error or something else potentially blocking the whole flow. So keep in mind that you can bust this up into smaller functions. You may be processing JSON and processing CSV; the CSV could act up, but the JSON will continue to process. Just keep that in mind when you're designing your data flows.
You can bust this up into different process groups and link those together with your input and output ports, and you go from there. But great job, everyone — everyone got a lot further than I was expecting. I knew it would throw a few curveballs, because we were having to do some ETL steps, and I knew that alerting mechanism would trip folks up. Just keep in mind that you can always go back to the documentation — the processor descriptions in NiFi should include all of this as well, and everything's on the website — and there's a ton of processors for some of these things.

And speaking of documentation: I had mentioned that Azure was supporting NiFi more and more. Last night, while I was going over what I was going to show today, I ran across the new Microsoft Azure documentation. Perfect. So, as I mentioned, Microsoft is starting to really lean into NiFi. I can't confirm nor deny it, but it will become a potential service within Azure. They do have a lot of documentation on this — I borrowed this graphic for the slides I'm about to present — and there's a lot of material Microsoft is even releasing and putting out. So I'm going to include a link to this, and other links, just so you can take this back and have that documentation. One of the biggest things I try to let the class know is that I'm going to give you as much information as I can. This is a quick three-day training.
We're not on a server in a multi-tenancy environment, those types of things, so we've got to do the best we can with the tools we have. But I'll definitely get these links out to everyone. In case you didn't know, there is now some additional information specifically on Azure. All right. That being said — any other questions about NiFi, the NiFi Registry, those types of things, before we go into scalability, multi-tenancy, and those types of topics? I'll take some —