You may have to log in to NiFi again this morning; I know I've had to, so hopefully you copied your username and password somewhere easy. In case you shut everything down, you may also have to go into the bin directory and run run-nifi to start it back up. Hopefully you just left it running and it only needs a re-login.

Looks like the only two we're missing are Ekta and Richard. Yeah, I'm on it; I was having some Menlo problems on my side, but it looks like I should be up now. I see, perfect. Now we just need Ekta. Looks like almost everybody's got it. Give it another couple of minutes so everybody gets up and running; hopefully Ekta can join us again this morning.

It's your localhost, so it's 127.0.0.1:8443. Here, I'll bring my browser up and share it. You should be able to see the address, but yeah, it's https://127.0.0.1:8443/nifi. We're running all of these locally on our machines, so we all have the same address. Peter just started his; it might take a minute. Okay, I'll do that. I'll also put it in chat. And if you want, you can bookmark it once you get to the login page or the main canvas, so you'll have it for tomorrow.

Derek, I don't see yours running, so go into bin. There you go, and run run-nifi. Perfect. It'll be running in just a minute; it takes a minute for everything to initialize. As I mentioned yesterday, that lib directory is full of processors, so it loads all of those and unpacks all the content. NiFi itself can take a couple of minutes just to get started, and even once it tells you it's running, it still takes a few minutes to initialize. Then just make sure you have https in front of the 127.0.0.1.

Hey, good morning, Ekta. I see you just joined us and you're logging in. Perfect; just take a minute to get logged in. Darius and Peter, it looks like you guys are... Peter, I don't see your NiFi running, so go back into your folder, into the bin directory, and double-click run-nifi. Right there, near the bottom. Run that, give it a minute or two, and you should be able to log in. Ekta is up.

Let me try something real quick. I was going to bookmark this for you, but it won't let me hit Ctrl+D. Yours is up and running, though, no worries. Peter, it looks like yours is coming up. I remember yesterday you said the page sometimes just gets stuck for a couple of minutes. Yeah, it takes time to initialize. I'll give it another second or two and it might work. Okay, it's still refusing... illegal character... I see errors for some reason. Oh, there we go. All right, Peter, yours is coming up. Yeah, it looks good. Thank you. All right, go ahead and get logged in.
So I think yesterday everyone did an amazing job of getting your first data flow built. Today is a lot more hands-on. We're going to dive a little deeper now that we know how to pick a file up. Yesterday we did a basic operation: unzipping a zip file and putting it back on the file system. Today's goal is to pick a file up and work with the data inside it.

So today we're going to learn about controller services, and in the process of learning about controller services, we're going to take a CSV file and convert it to JSON. This goes a little deeper into NiFi, so feel free to ask questions. After we get through this, the hands-on exercise will be picking up some JSON and CSV files; I have a scenario where all of them will need to end up in the same format, and we're going to look for some patterns in the data so we can send alerts. I've got a whole write-up on the scenario to show you. Even in the last class it was a little tricky sometimes, so we'll have plenty of time to work through it. I'm going to sit here, watch, and provide answers, because hopefully you'll have questions; if not, you can breeze straight through it.

So yesterday, and I'm going to pull up mine, we made our whole flow to get files from the file system and put them back after unzipping them. We also looked at cleaning the flow up; there's still some cleaning I need to do as well. Renaming a processor, adding color to a processor, adding background labels, making things easier to navigate, those types of things. The only thing I've changed since then is that I put all of this into a new process group.

Here's how I did that. I clicked on Process Group in the toolbar, dragged it down to the canvas, and named it; First Sample Flow is what I named mine. Then, once you have that process group on your canvas, let me do it so I can show you: I like to zoom out a little on my navigation panel. So I have my process group here. There we go. Now hold the Shift key and drag, and it draws a box around your whole flow, selecting everything. You can then drag the selection and drop it right into your process group.

So if you can, bring down a new process group on your canvas and drag and drop your whole flow into it, like I just did. We do this for many, many reasons. If you can imagine 300 processors doing all these operations on this canvas, it would get very cluttered very quickly; it would be very hard to navigate and understand the data flows. It also matters for security: the way NiFi likes to handle multi-tenancy is with process groups. If you had, say, 10 organizations all using the same NiFi, you could create a process group for each organization and lock that process group down to just them. That way, from the main canvas you only see 10 process groups, not 300 data flows. So if you can, bring one down and move your flow in.

So I missed how you did that. Can you go back over that?
Yeah, exactly. So what I did is I brought down a process group and named it; name it whatever you want, My First Data Flow.

Okay. I can drop it on the canvas with my flow, but it doesn't...

No worries; in fairness, the latency sometimes will mess with you. What I like to do is zoom out a little on my navigation panel. Then I hold the Shift key, and while I'm holding it, I drag to create a box. You see the box being created? Let go, and it highlights the whole data flow: the connections, the processors, all of it. Then you drag the selection over to your process group; the process group should highlight blue, and you drop it in. Then you should have only a process group on your canvas, and when you double-click it, you go right in and see your flow. Now, again, the virtual desktop environment sometimes makes this a little difficult because of the latency; it doesn't always want to select everything, so it may take a couple of attempts.

Yeah, so I tried it a little differently. I selected the entire section, and when you right-click, it also lets you create a process group from the selection.

Right. I like to drag the group down first, but yeah, you've got the shortcut. My browser was just not responding to the drag for whatever reason, so I noticed that option and it worked. Yeah, that's another way of doing it. But also, see, you're already creating breadcrumbs, and I'm not doing that at all. No worries, I can take a look at it.

Quick question: once you click inside the flow, how do you get back out of it? Once you're in the process group.

Yeah, if you look in the bottom left corner, there's a breadcrumb trail. I touched on it briefly yesterday: I can go right back to my parent group, and each time you go into a process group, it adds a breadcrumb you can use to go back. Also, while you're inside, you can right-click and say Leave Group.

Tom, let's take a look at yours. It's all jacked up. No worries, we can get it fixed. I don't know why my stuff's all the way to the right now; I don't know what I did. Okay, well, watch my screen and I'll take over and get you squared away. No, I completely got lost, and I lost my label. I don't know what happened. All right, let's see what you've got. Remember you have the navigation panel here; you can drag the view around and re-center it. Okay, that's cool. Did you already bring your process group down? Nope; I keep deleting it because I'm not doing it right. Okay, no worries. So hold Shift; I'm holding the Shift key, I'm going to start up here, and I'm going to drag this box all the way over your flow. If you had already brought down a process group, you could just drag the selection into it. But since you've got it all selected, you can also right-click, say Group, name your group, and say Add. There you go. Now on the root canvas you have your first flow group, and you just double-click to go into it. Okay.
And if you click on the canvas, you can hold and drag things around. You can also select everything and say Align. It's not the prettiest, but you can get it aligned. Actually, that messed it up even more; the aligning doesn't always work right. Let me fix this for you. Undo... Ctrl+Z... I wish. Let me get this.

Okay, so logically we want this here. We'd want our unpacked files after the get-file-from-folder; that's the first one right here. Yeah, that's some of the nuance, and things are jumping around because of the latency. No, I'm going to clean it up real quick. The VM with its latency sometimes just doesn't want to work right.

I was going to ask how you do, like, save. I guess you'd save this as a template, but how do you get a new blank canvas to start a new thing without getting rid of what you built? I guess this is one way of doing that, right?

Yeah, and we're going to get to that. It's me, man; it's not going to be easy. No, it's fine. Again, like with some software products, this virtual desktop works great, but when you're clicking and dragging in real time, it can be a little rough.

Okay, let me zoom in a little more. So we can take this guy and hopefully move him over here. Okay. And just like the way we write, left to right (unless you're writing in Arabic, in which case you go the opposite direction), that's usually how flows get designed: left to right, or straight up and down. That way you can branch off your connections, your log messages, those types of things. Let me see if I can... there we go. That's a lot better. That's a little cleaner, I think.

And while I'm in yours: this is your main canvas. You can use the breadcrumb trail to go into the process group, and once you're in the process group, you can right-click and say Leave Group if you need to, or just click the breadcrumb trail. There are many ways to get out of a process group.

The reason we like to do this: we have this flow, and we're about to build another flow, and you can imagine your canvas would just be flows everywhere. That's also why we have input and output ports, to manage connections between groups. So for the next exercise, you would probably just bring down a new process group and create the new flow inside it; that way your main canvas stays cleaner. And when you set this up in a real-world environment with multi-tenancy and some other things, you want to be able to lock it down: this may be organization A, and this is organization B. In your policies for multi-tenancy and security, you can have a process group belong to another organization, and under that process group is basically that organization's main canvas.
And so it would be blank when they start; mine's not, because I put a process group within a process group. When they log into NiFi, that's the only process group they have access to on the canvas, and they can go into their process group and build whatever data flows they need.

Another advantage, and I see this quite a bit: organization A has a responsibility to import data, ETL it, get it into the right format, things like that, but they may also have a requirement to share with organization B. Within NiFi, organization A can have their own section, organization B their own section, and you can put an output port on the data you need to share with organization B and have it go to their process group, where organization B has an input port that receives it. That way, if you have it locked down, organization B doesn't see how you made the sausage; they're just getting sausage delivered to their process group. The sausage making, whatever logic you put in, can stay hidden within that group. I know some organizations have models and things like that they don't want folks to mess with, so they run everything within that locked-down group, and the output goes elsewhere. So just keep that in mind; there are many ways to do this.

Again, I think in the last class, Brett and a couple of others chatted about how to set up some multi-tenancy, and it's my understanding some of the folks on that call are also working to help set this up. There's no point in everybody running their own instance unless you have that need, or you're just developing and learning. When it comes to test and prod, having a multi-tenant NiFi available for everyone to use just sounds more reasonable. So that's potentially what your environment may look like in the future, if that's the way those things get set up.

Anyway, I need to move to the parent group. All right, and then I can get rid of this. So, if you go back to your main canvas, you should have one process group. When we build additional data flows today and tomorrow, bring down a new process group and work within that group. That way you don't have three, four, five data flows cluttering up the canvas to the point where you can't find certain processors or anything else.

Now, data flows can get long. Within a process group, you may have a couple of processors doing an input port, receiving data, those types of things.
And then you may have 10 or 15 process groups within that original process group doing all the data movement, logic, and ETL, and even embedded into those you can have further process groups. I've seen it six, seven, eight levels deep. That's the reason we have a breadcrumb trail, so you can go back out very quickly to the process group you need.

But if you get lost in a flow, you do have a search bar. Let me show you. This one, for instance, is named something like "connection to MIME type"; I think it was GetFile. Yeah, the connection from GetFile to identify MIME type. That's another reason to label these with good, readable processor names: you can easily search and find whatever you need. It's all up here on the toolbar. And now that we have processors on our NiFi instance, you can see the status bar: I have 10 stopped processors, I have two that need attention, we haven't put any data through yet, and we don't have any other connections or disabled services yet, so those aren't showing up. Everything's on the status bar, and you can search, those types of things. That should make it a little easier and cleaner when we start building more flows.

So for this morning, we're going to learn about controller services. I think I mentioned controller services briefly; there's only one slide where I mention them, but it doesn't matter. With controller services, I mentioned the example of a database connection, where you establish that connection once: the username and password, the IP address, the port. If you're going to MariaDB or MySQL, the port that database runs on is usually 3306; it has an IP address, it has a username and password, all of those things. That's sensitive information you may not want to share across the board with everybody using your NiFi if you're the one managing this system. So, as a sysadmin or data flow developer, you can create a controller service where you put in the database connection information. Then, when people need that database connection, they just use the one that's already set up and running. It saves time for other data engineers, because they can just reference the controller service database connection that's already there, and you never have to give out the username, the password, the IP address, or the port. It's really nice to have those shared services. And that's what controller services are: shared services that can be used by processors, reporting tasks, and other controller services.
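As a concrete sketch of that database example (not something we configure in this class): a DBCPConnectionPool controller service for a MariaDB/MySQL database would carry roughly the properties below. The property names are the real ones on that service; the host, database, and user here are placeholders.

    DBCPConnectionPool (controller service)
      Database Connection URL:     jdbc:mysql://db.example.internal:3306/inventorydb
      Database Driver Class Name:  com.mysql.cj.jdbc.Driver
      Database User:               etl_user
      Password:                    (entered once; stored encrypted as a sensitive property)

Once a sysadmin sets this up, any processor that needs the database just references the pool, and nobody downstream ever sees the credentials.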
But again, there are nuances, like the one we ran into yesterday removing a processor and putting another one in: in order to modify a controller service, all the referencing components must be stopped. There are ways to stop all referencing processors together, but if you modify a controller service, every processor referencing it will also need to be stopped. That sounds like a lot of work, but luckily, once you establish your database controller service, unless the IP address changes, or the port, or the username and password (and even credential rotation you can automate), you should be able to install that controller service and let everyone take advantage of it. Updates shouldn't be that often; in the real world we see controller services running for years without any interaction, just because the database they reference is always there.

As for scope within the data flow, a controller service can be created within any process group. You can create one from within a processor's configuration, or within a process group. You can also go to the hamburger menu, where you should see controller services as well; we don't have any installed yet, but if you have one at the main canvas level, it shows up there. Later, when we install the registry, we're actually going to install the registry service once and everyone gets to use it. So that's one way of connecting to it.

So what I'm going to do is go into my sample flow and walk you through how we do this. You're more than welcome to follow along. This is a little more advanced, because we're converting CSV to JSON, setting the name, and writing the JSON out. We're going to use controller services for that, and I'll show you how that's done.

The first processor, of course, gets a file from a directory, and that part should be familiar by now. The file we're looking for is inventory.csv; it's a CSV file. Let me pull it up and show you what it looks like. The goal is to take this CSV file, which has store, item, and quantity, and make it a JSON document, so that all the data we receive as CSV can be converted to JSON for further operations. That's the file we're going to work with. You should all have access to it; if you don't, I can put it on your desktop, but for now just walk through it with me.

So I have this inventory file. Again, we're using a GetFile processor, so I'm going to copy the path and put it in. The file filter is a little different: you can filter on the file name itself. I put in inventory.csv. I could match any CSV, any zip file, JSON documents, whatever; but as written, it's only going to pick up inventory.csv.
And then, of course, I have keep source file set to true, recurse subdirectories (there are no subdirectories here), those types of things. So I have my GetFile the way I need it. That's the first step, right? Read the CSV file and get it into a flow file so I can start operating on it.

The controller services we're using are an Avro schema registry service, a JSON writer service, and a CSV reader service, and we're going to go into each of those. To convert CSV to JSON, there are a couple of ways to do it within NiFi; this is the most optimal way. The hands-on exercise is going to be a little different. It's very similar to this, but in the exercise you can use a controller service or not; there are processors that can do the scenario that's next.

For this approach, we need to set schema metadata. That means as soon as we get this data, we bring it in and set an attribute, using the UpdateAttribute processor. When we get this data, it will show up just like yesterday: here's the file name, here's the file size, those types of things. What we're doing, though, is adding to that flow file's metadata an attribute called schema.name with the value inventory. That is going to tell our controller service which schema to use to convert the CSV to JSON. I could have another schema called conversion or whatever; as I bring data in, I could filter and sort it, and depending on the type of data, assign it whatever schema name attribute it needs to match up. But here, schema.name is the property name and inventory is the value. schema.name matters, because NiFi is going to look for that attribute, as we'll see when we start setting up our controller services: NiFi asks the attribute which schema name to apply, and we're going to apply inventory.

The next processor in line is a ConvertRecord. Again, if we were converting CSV to JSON, I could use other processors: there's ExtractText, there's ValidateRecord, UpdateRecord, SplitRecord. I could use RouteText, or a regular expression to pull the CSV apart, those types of things. There's also AttributesToCSV and AttributesToJSON, so I could extract the CSV fields into attributes and write everything back out from attributes to a JSON document. There are a few different ways. So imagine your data flow: you're importing this CSV, there are many ways to extract it, and once you have it extracted, you can use a processor to write it back as JSON. This is one of those nuances I come back to pretty constantly: there are many ways to skin a cat. The most optimal way is the method I'm using right now, using controller services.
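In this flow, that UpdateAttribute step amounts to a single user-defined property, exactly as described above:

    UpdateAttribute (processor)
      schema.name = inventory      (custom property; written onto the flow file as an attribute)

Downstream, any record reader configured with the Use 'Schema Name' Property strategy resolves ${schema.name} against the schema registry.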
I feel like it's a lot easier as well, because I don't have to do regular expressions to extract text, those types of things. So I'm using the ConvertRecord processor. We'll go ahead and configure it. You can see here that ConvertRecord converts records from one data format to another, and it uses a Record Reader and a Record Writer controller service. Like I mentioned, we're going to use the CSV reader service as the record reader and the JSON writer service as the record writer.

Any time, you can pull up the documentation and look at the allowed values for the Record Reader: there's a CSV reader, a JSON reader, an Excel reader, XML, a Windows event log reader. A lot of times this is used for cybersecurity use cases where you're pulling in syslog, event logs, those types of things; as you can see, there's a syslog reader in multiple different formats. But for us, we're going to use the CSV reader. For the writer, we have a JsonRecordSetWriter. We could read something in as CSV or JSON and convert it to CSV if we wanted to, but in this case we're bringing the CSV in and converting it to JSON.

So for the ConvertRecord, we need a Record Reader; in this case, the CSVReader, which I've already selected, because I know a CSV file is coming in. The Record Writer is going to be the JsonRecordSetWriter; I've already selected it, because that's what I need to write the output.

One thing you'll notice with any processor that works with controller services: you get a little arrow on the far right of the property that takes you to that controller service. If you were to drag and drop a ConvertRecord, let's do that, or any record processor, let me actually use a different one so I can select a new service: I've already got a JsonRecordSetWriter, but I can also create a new service, and here are the services available. So if I want to, I can set up that service right there. Any time you use a controller service, you get that arrow, so you can go in and configure the service when things aren't working right. On properties that don't take a service, you won't get the arrow.

So what I want to do, because I'm taking that CSV in, reading CSV and writing it out as JSON, is go to my CSV record reader service. Now I'm looking at my controller services; I have four listed: an AvroReader, an Avro schema registry for this flow, a CSVReader, and a JsonRecordSetWriter. Let's go to the CSVReader first and click the little gear icon to configure it. The first thing you notice is the Schema Access Strategy. If you remember, in the previous processor we set the attribute schema.name to inventory, because NiFi is going to look for schema.name. So for this, the Schema Access Strategy is to use the schema name property that we set.
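Put together, the ConvertRecord configuration here is just two service references; the service names are the ones from my instance:

    ConvertRecord (processor)
      Record Reader:  CSVReader            (controller service, configured next)
      Record Writer:  JsonRecordSetWriter  (controller service, configured after that)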
The schema registry here holds an Avro schema; within NiFi, schemas usually work in Avro format. If you're not familiar with Avro, it's a serialization format for record data. It's used quite extensively throughout the community, and the schema itself is modeled in JSON. That way we have a schema that says: okay, I'm going to extract this CSV, I'm going to extract every column, but I need to know where each of these values goes in the output. Avro is what gives us that here. I do realize this is a bit technical, so just let me know if you have any questions.

For the CSV controller service, we're going to use the schema.name property. We have a Schema Registry property, and because we're now referencing another controller service, you again see the arrow that takes you to it. The Schema Name property shows how you reference an attribute using the NiFi Expression Language. If you have questions about the Expression Language, there's a whole guide; there's a lot to it, and if you're familiar with regex or similar syntaxes, this will look familiar. If not, we'll work through it; NiFi has its own expression language, built on Java. This is how you reference the file name, for instance: I can put ${filename} in a property and it returns the filename attribute. For this use case, though, we told it to use the schema name property. The name of the schema comes from ${schema.name}, which matches that UpdateAttribute: if you noticed, the UpdateAttribute had schema.name set to inventory. So this tells the controller service which property to look for and, through it, which schema to use. That's where it gets the schema name.

Some of the other required settings: for the CSV Parser, we're using the Apache Commons CSV parser; I think there's also a Jackson CSV parser. I like Apache Commons just because I know it works. For CSV Format, if you have a custom format, you can set that; you can use Excel format, MySQL format, all kinds of formats it can read from. Here we're going to pick custom, because it's a regular CSV file: your value, comma, your value, comma, your value. So we set the Value Separator to a comma. The Record Separator is \n, which is just a newline. And we want to treat the first line as a header.

Actually, that's one of the bigger issues I've seen when people are learning this: folks will forget to treat the first line as the header, and it throws the data off, because then you've got the header row as an actual data value in the JSON, or the formatting is off, those types of things. So treat the first line as a header if you have one.
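For reference, here's the shape of that CSVReader configuration as described; these are the real CSVReader property names, with the values we just walked through:

    CSVReader (controller service)
      Schema Access Strategy:      Use 'Schema Name' Property
      Schema Registry:             AvroSchemaRegistry      (the arrow jumps to this service)
      Schema Name:                 ${schema.name}          (Expression Language; reads the attribute we set)
      CSV Parser:                  Apache Commons CSV
      CSV Format:                  Custom Format
      Value Separator:             ,
      Record Separator:            \n
      Treat First Line as Header:  true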
And then, of course, a lot of these are already filled in: the quote character, the escape character, those types of things. We won't get into all of them. The ones we're looking for, though, say that it's comma-separated, the record separator is a newline (\n), we've got our Apache Commons CSV parser, and we're telling it to use the schema.name property we already set in that previous processor. So I can apply that, and now I've got it configured.

Now I want to go to the Avro schema registry. If you notice, I had the property value for schema.name set to inventory. So when the data comes in, that controller service knows to use schema.name to find the model for the conversion, and the name of that model is inventory. I can actually have multiple different Avro schemas here; if I were bringing in CSV data from multiple different sources, I could set this up so it all converts to the same format, and split up how each source gets converted and recognized by schema. I could set the attribute to store, or price, or whatever, alongside inventory.

Let me expand this. A lot of times when you're working in NiFi, you get a small property box; most of these boxes you can drag and resize to make them easily readable. So here I have a basic schema. Again, I want to take all of this data, read it in, and put it out as JSON. I have store, item, and quantity. If you notice, the type is record, the name is inventory, and here are the fields that go into the JSON document: store is the first one, item is the second, and quantity is the third. You can see it matches my data. I'll make sure you have access to all of this when you're working on your scenario, as you may want to use it. I understand this may be the first time you've ever seen an Avro schema, but we'll work through it. My schema is very simple: three fields.

So I built my schema and put it into the Avro schema registry. Okay, apply. Now I've got the schema registry, and I can add different schemas, I can do all kinds of things. The nice thing is that I can now just reference that schema name in all of my data flows, and I only had to configure the registry once, and the schema once. It makes reuse of those controller services a lot easier.

So that was my CSV reader: it reads the CSV and uses this schema so the data can be converted to JSON. Now, my second step is writing it out as a JSON record. The write strategy for this is to use the schema, because it's already filled out: it already knows to extract that first column, put it into the JSON document, extract the second column, put it in, and so forth.
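The schema in the registry looks roughly like this. The record name (inventory) and the three field names come straight from the walkthrough; the field types are my assumption (string for store and item, int for quantity), made nullable here so the malformed row mentioned later can come through as nulls. In the AvroSchemaRegistry service, this JSON is the value of a user-added property named inventory, which is what ${schema.name} resolves to:

    {
      "type": "record",
      "name": "inventory",
      "fields": [
        { "name": "store",    "type": ["null", "string"] },
        { "name": "item",     "type": ["null", "string"] },
        { "name": "quantity", "type": ["null", "int"] }
      ]
    }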
So we're giving it the schema registry we're using, the same one the CSV reader used, and the name of the schema. Then a couple of options: Pretty Print JSON, just so it looks nicer, and some others, like whether there's any compression, or whether to suppress null values, those types of things. So this is the controller service that writes the JSON document out. We're going to apply that.

So now we've got these controller services configured. We have our schema registry right here that we've already worked off of. We also have the AvroReader: when you use the Avro schema registry, it automatically adds an AvroReader, because it needs to read that schema in Avro format to be able to write the JSON document.

To get this working, though, some of the controller services are still disabled, so the other services aren't going to work. Just like with a processor, you can hover over the yellow yield icon and it tells you why it's not valid. For this one, it's because the schema registry is invalid while that controller service is disabled. Same here. So after you've finished configuring your controller services, you need to enable them.

What I like to do is click the little lightning bolt, and it starts to enable that service. I can select Service Only, or Service and Referencing Components. Because a processor references this service, I could enable the service and have it enable that processor as well. But I want Service Only; the reason being, I want to check my data flow and make sure everything looks good before I turn everything on. So I've got the green check box; I enabled that service, it's up and running, and you can see the state is Enabled.

Same thing here: I'm going to enable just the service. It lets me know the referencing components: the CSV record reader, the JSON record writer, and you can actually see the processors too: Convert CSV to JSON, Convert CSV to JSON. It's the same processor, but you see both the services and the processors, and that way you can decide whether to enable just the controller service or everything that goes with it.

So we close that. Now our CSV record reader and writer are no longer in an invalid state, just disabled. With those first two services enabled, we can enable our other two services, and now everything is enabled. We can exit our controller services, and our Convert CSV to JSON is now stopped rather than yellow; it's got all the services configured, and so forth.

So now we're getting the file, picking up inventory. We're updating the attribute to set a property named schema.name to inventory, so that by the time the flow file gets to this processor, that attribute is set.
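The writer side mirrors the reader. A sketch of the JsonRecordSetWriter as described; the property names are the real ones on that service, and the Schema Write Strategy value is my assumption, since it wasn't read out:

    JsonRecordSetWriter (controller service)
      Schema Write Strategy:   Do Not Write Schema          (assumed)
      Schema Access Strategy:  Use 'Schema Name' Property
      Schema Registry:         AvroSchemaRegistry
      Schema Name:             ${schema.name}
      Pretty Print JSON:       true
      Suppress Null Values:    Never Suppress
      Compression Format:      none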
And so that processor is going to look, based on what we told it in the controller services, for the schema.name property, and then at its value to know which schema to use. So anyway, we have CSV to JSON working now.

This produces a new document, so we need to name it, and all we're doing is another UpdateAttribute. This one is very easy. As I mentioned, the filename attribute is right there in the NiFi Expression Language; ${filename} is how we reference it. In this scenario we're just saying: take the file name and add .json to the end. It looks at the filename attribute and says, okay, I've got the file name; all I need to do is rename the file to that name plus .json. Apply that. And then, of course, write the file back to a directory; for this, I want to go back and say inventory JSON.

Okay. So if we've done this right, we'll be able to pick a CSV file up, set the schema, convert it to JSON, set the name of the JSON document, and write it to file. Let's run this one step at a time and see how it goes.

All right, we have our CSV in the queue. We can look at the attributes, and you see the filename attribute with the file name inventory. What we don't see is the schema.name attribute, and the reason is that it hasn't gone through that processor yet. This one has a content viewer: like I mentioned earlier, with a zip file like we were working with yesterday, you won't get a viewer, but with CSV, JSON, XML, text-based data, you will. So let's exit that; that's what the data looks like. And I can look at the provenance, which we'll get to real soon.

So run that processor once. The only thing that should change: the data should be exactly the same, except I should have a new attribute called schema.name. Right, so now I have a new attribute called schema.name with the value inventory. Now it's ready to go into the actual ConvertRecord. Again, it's going to read the flow file as CSV and write the flow file as JSON. Because it's reading as CSV, we have the record services; you can go back to the controller services and see the CSVReader and the JsonRecordSetWriter that we configured. So we should be good to go. Run it once.

All right: success. List the queue. Still the same attributes; the file name is inventory.csv. You do notice that it detected a MIME type: a lot of processors will automatically try to detect MIME types, and this one is application/json now. Same kind of details with the file name and modified date, those types of things, but we do have a different file size, because we went from CSV to JSON. And now we can view the JSON document: it took all of those CSV records and wrote them out as JSON. If you remember from our Avro schema, we had store, item, and quantity.
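To make the conversion concrete, here's what the before-and-after could look like. The row values are made up for illustration; the shape (CSV with a header row in, a pretty-printed JSON record set out, following the store/item/quantity schema) is what the flow produces:

    inventory.csv (input; first line is the header)
      store,item,quantity
      Store1,widget,42
      Store2,gadget,17

    after ConvertRecord (content is now application/json)
      [ {
        "store" : "Store1",
        "item" : "widget",
        "quantity" : 42
      }, {
        "store" : "Store2",
        "item" : "gadget",
        "quantity" : 17
      } ]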
And so it just followed that pattern and started writing out the JSON. Now, the problem with this is the file name: it's probably still inventory.csv. Yep. If we were to write this out right now, it would be a JSON document, but it would be written with the file name inventory.csv. So we do that other UpdateAttribute, where we told it: take the file name, which is that ${filename} expression, and save this file as that name plus .json. Run once, look at our queue: you can already see it here, our file name is now inventory.csv.json. If you wanted to be really fancy, you could go in and strip the .csv off and put .json; to keep this as simple and straightforward as possible, I'm just adding a .json extension. And then the last step is to write this JSON file to a directory. That should be here... and there it is.

So now I've taken this inventory from CSV to JSON, even the bad data row that has nulls. You notice that the bad data row didn't have any commas; it didn't conform, so when it was written, NiFi just applied null values to the other fields. If you had a process checking for null values, you would throw that record out, or send it to another processor for ETL steps or whatever. But we've taken our CSV and made it a JSON document.

So I'm going to pause there, because that's a lot to ingest in a little over an hour. What questions do you all have?

So, are we going to go through the process of doing the controllers ourselves? Because obviously, for us to follow along, we would have to set those up. I saw how you configured them, but I'm not entirely sure how you added them.

Yeah, so in the scenario you're going to add these and start building them in, and I'll help you along the way. It's very hands-on for the next part. And for the scenario, if you want to use the record services, you can; the scenario is set up on purpose so that you can use multiple different processors to do it. What I'm looking for in the scenario is the thought process: here's what my data flow should look like. You're probably going to have technical questions and technical issues, but what I'm looking for is that you've thought the whole data flow through and have it built out, because you can build the whole data flow without turning it on. You may have missing relationships or something, but you can still build the whole flow. Then we'll walk through it, and where you need technical help, I'll help you with whichever way you're designing your flow. Because you may say: I don't want to use the record services, I want to extract text, for instance, with an ExtractText processor, so I can manually extract the text. Or you may want to use a record reader and writer like I did. There are a few different ways.
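If you did want the fancier rename instead of just appending, the NiFi Expression Language has string functions for it. A sketch of the filename property in that UpdateAttribute, using the real substringBeforeLast function:

    filename property in UpdateAttribute
      simple version used in class (yields inventory.csv.json):
        ${filename}.json
      fancier version, stripping the trailing .csv first (yields inventory.json):
        ${filename:substringBeforeLast('.csv')}.json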
And so what I'm looking for in the scenario is just thinking through how you want to accomplish what you're trying to do. If you come back to me during the scenario and say, hey, I want to use this record writer or this record reader, I need help configuring it or help with the schema, we can do that. Or: I want to extract the JSON, how do I do that? There are a couple of different ways. The last class, for instance, spent quite a bit of time on that scenario, and there were five or six different ways people were doing it. Now, I think only one person actually completed the scenario, but everyone got their processors down on the canvas, got most of them configured, applied the labels and the naming conventions, those types of things, and then I went through and helped them finish out the build. So hopefully that helps frame the scenario. But again, if you run into any struggles or anything else, I'm right here during the scenario as well, and we'll talk through it. Does that make sense? I think, Richard, did you ask that question? Yep. Okay. Perfect.

So again, I know that's a lot to ingest. We could just build basic data flows, but I really wanted us to get through some of the more advanced ways of doing this. It's a lot to learn quickly, but that's why we have a few hours now to work through a scenario with help along the way.

So I think I went over what a controller service is about. I know there are still some points to learn, but we can get through them. That controller service, again, was just a record reader and a record writer: reading in CSV and writing out JSON. If I wanted to, I could bring in more CSV files and set an attribute based on the source of the file, so that everything I get from direction A gets this schema and everything from direction B gets another schema; they're all CSVs. That's how you can handle it. That's the beauty of controller services: I've now built them and set them up, and everyone can use that service. If it were a database connection, again, everyone would be able to use that service. We're also going to install NiFi Registry, which handles our version control of data flows; it's got its own service, and once we set it up, we all get to use it. So I wanted to make sure I went over services, because they're very valuable and a very important part of NiFi.

With that said, are there any general questions about services I can answer right now? Then we can go a little into provenance, take a break, and come back and work on scenarios. Any controller service questions?

What exactly are the controller services for, like, specifically?

Yeah, so a controller service is a shared service in NiFi. So I could have a database controller service.
Say that database controller service is already installed; it's running, it's good to go. Now I'm a separate data engineer, and I come in needing a database connection because I need to put data into a SQL server. Instead of me filling in the connection details, where the database is, the username, password, the tables, all of those things, if a controller service is already set up, multiple different users can use that same service in their data flows. So I could use that database connection service and write my data to the database, but as a sysadmin, you never had to give me the username or the password. Plus, you're able to control who gets to use that connection, and see who did use it, with all the provenance information. Did that help answer it, or do you still have additional questions?

Yeah, that made it more clear.

Yeah. And that's why I led off with this this morning: I know this is probably one of the trickiest, harder-to-learn aspects of NiFi, so I wanted to give us plenty of time to go through it. Any other questions?

This is going to be kind of... not dumb, maybe just more aesthetic. How did you get those relationships out to the side like that, where the arrow goes? When I try to do it, I can't, and I'd like to know how you did that.

Sure. If you remember yesterday, I said you can click... let me take a processor and walk you through it. Let me just do a get on something. I'm asking because I like the way that looks, without having to spread everything out. So I'm going to put two processors together real quick. You see this connection? If I double-click the line, like that, I get a bend point.

That's exactly what I was trying to do. I could not get that to work.

So click right off of the box... let me do it again. On these machines it can be a little laggy, but right above that box, I just double-click the line, I get my point, and then I can drag it to adjust the connection.

Okay, okay, that's what I was trying to do. Thank you. Yeah, it makes the presentation so much cleaner.

Great question; I actually get that question a lot. There's really not much documentation on some of this beautification of flows, the clicking-and-dragging tricks. They're always introducing more of these minor features that don't really get publicized, so a lot of it is just googling around or having the experience of working with it. Once you start getting it, then, you see, I've got adjusted lines here that look different, I've got them color coded just because it's easier to read, and I have labels on every one to give you that visual explanation.
When you're building this out in your environment, you may have a policy that lays out some basic design principles you need to follow. When I led software engineering teams previously, we had to comment our code, right? We had to make sure the comments were in, those types of things. Same thing here: you may have a policy that says all these data flows need labels that make sense, and need to be kept tidy, because you can end up with a spider web of processors. And today we will have a spider web of processors; when we're working through this scenario, it's going to be processors all over the place. So just do the best you can, and then come back behind and clean up. That's usually what I like to do.

All right, I got it. But that was tricky, man.

It's easier if you're running this on your local machine, because with the latency here, you can click and it won't drag, or you drag too far, or it never drags at all. I've already had a pop-up warning me about latency once. Okay, but great question. Any other questions?

Okay. We have a few minutes before break. Before we go into the scenario, now that we've sent data through, let's look at our data provenance. If you remember, from the hamburger menu you can pull up your data provenance events. You see the component name, the actual processor; again, another reason to name your components something easy to read. With good, readable names you can sort and filter all the provenance events a lot easier. You can also search for events, those types of things.

So let's look at the write-JSON-file-to-directory event. When you click this, it computes the lineage, and you can then replay the data's path through the data flow. So: I received it from this provenance event; it was get CSV file from directory. I can look at the attributes, I can look at the content, and this is when it was received. The next step in the provenance was a download of the content from the processor: it received it, it downloaded it, here's what the content looked like after that event, and here are the attributes.

Then, modify attribute: if you remember, the next step was to set the schema name. So here, within that whole data flow, it did an UpdateAttribute; I can look at the attributes and see the schema name set to inventory, and I can look at the content. The content should still be the same, because at this point it's still a CSV file, but if anything changes, I can replay it and see it. Then the content was modified by the ConvertRecord, Convert CSV to JSON; I can see that it came in as CSV and comes out as a JSON document.
I can look at the content now. Here's the input claim and here's the output claim: input was CSV, output was JSON. So that one processor took the document from CSV to JSON, and I was able to replay it and see exactly what changed and how it changed within that single processor. Then it received that JSON document. That's a download event, because the content was downloaded from the processor; I think of a download event as the connection, receiving from processor to processor.

The next one should be "set JSON file name". In that flow we set the file name of the JSON document, and if we look at the attributes, we now have inventory.csv.json. After it had the JSON, it received it and wrote the JSON file to the directory. Here's where it put it, as well as the attributes and the content: input of 745 bytes, output of 745 bytes, same identifier. And then it dropped that flow file, so it was done. The whole event duration was 0.006 seconds. You can see the final content, replay it, those types of things.

One of the other nice things is you can download the lineage if you want. Honestly, I've never seen a lot of use for this, but a while back I was helping the Centers for Medicare and Medicaid Services. I worked in the fraud division of CMS, where we would need to turn over a chain of custody for data we had received to the FBI and others to prosecute false claims, those types of things. In doing that, we would have to download some of this lineage information. Now, when you click here to download lineage, it basically just gives you an image of what happened. I don't really find that useful. What you can do instead is take all of these events and send them off, either to corporate governance for long-term storage, or extract them out of the provenance events and notify. That's how we would usually handle it. The image is there in case you need it; I just don't think it's that useful, but you may have a use for it. You also have the capability to go straight to an event and pull everything that happened.

The beauty of this is we went through the data provenance from start to finish for that data flow. We can look at exactly when we received the data, see exactly when it was converted, and see the before and after of that. We can watch it move along from processor to processor until it's out of NiFi. And the nice thing is, that's all built in. You can go and replay these events.
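That per-event "before and after" view, and the replay button, are exposed on the API as well. A minimal sketch, assuming the standard provenance-events endpoints; the event ID is hypothetical (you would take it from a provenance query like the one above), and host and credentials are the same placeholders:

    # Sketch: fetch the input and output content claims for one provenance
    # event, then ask NiFi to replay the flow file from that point.
    import requests

    BASE = "https://127.0.0.1:8443/nifi-api"
    token = requests.post(f"{BASE}/access/token",
                          data={"username": "admin", "password": "changeme"},
                          verify=False).text
    headers = {"Authorization": f"Bearer {token}"}

    event_id = 42  # hypothetical: comes from a provenance query's results

    # Input claim (e.g. the CSV) and output claim (e.g. the JSON).
    before = requests.get(f"{BASE}/provenance-events/{event_id}/content/input",
                          headers=headers, verify=False).content
    after = requests.get(f"{BASE}/provenance-events/{event_id}/content/output",
                         headers=headers, verify=False).content
    print(f"input {len(before)} bytes, output {len(after)} bytes")

    # Replay the flow file from this event onward.
    requests.post(f"{BASE}/provenance-events/replays", headers=headers,
                  verify=False, json={"eventId": event_id})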
We use it a lot for diagnosing data flow issues. Imagine you have a valid data flow that's picking data up and putting it somewhere, but you still run into an issue with a processor. I've seen a processor handle characters incorrectly before, and it never threw an error. When you look at the data provenance events, you can go through, replay them, and say, wait a minute, that processor is malfunctioning. It's not reporting any errors, but we're seeing weird things in the data. So there are use cases for that as well: provenance events for investigations, those types of things, and just providing that chain of custody for the data.

We went over provenance a little bit yesterday, and we're going to touch on it as we go along. Your flow from yesterday should have generated provenance events too, so if you want to pull up your flow and run it, you can see its provenance events as well. That's where you access it.

Feel free to explore the menu. If you break something, the beauty is I'm here to fix it. Look at your flow configuration history. Look at the node status history: here's how much free heap space there is (again, we're on a Windows box running Java, so it's going to be all over the place), the flow file repository free space, and so on. Feel free to go through and look at all of these. I know we have some sysadmins on the call; Tom, you may be interested in some of these metrics. There's also a reporting task for Prometheus, so you can have all of these metrics going out to Prometheus and see the same information on a Grafana dashboard. You can send the provenance events off, and you can send the status and all of those metrics off as well. It's there if you need it.

If you're processing a large number of files, you will definitely want to look at this, because some processors consume a lot of resources. We were working with one of the bigger resource hogs of the whole system yesterday: the packing and unpacking of zip files. I've seen folks trying to unzip and zip 5 to 10 gig files. The processor tries to pull all of that data into memory to unzip it, 5 gigs turns into 20, and the system crashes, right? There are smarter ways to do that, so you just have to work through it. As a sysadmin, I find the status history has a lot of good information. We went through some of the others already, but feel free to click around and explore the UI.
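For the sysadmins: the heap and repository figures shown in the status history are also available in a single call from the system-diagnostics endpoint, which is handy to script before wiring up the Prometheus reporting task and Grafana. A minimal sketch, with the same placeholder host and credentials:

    # Sketch: read heap and repository usage from NiFi's system diagnostics.
    import requests

    BASE = "https://127.0.0.1:8443/nifi-api"
    token = requests.post(f"{BASE}/access/token",
                          data={"username": "admin", "password": "changeme"},
                          verify=False).text
    headers = {"Authorization": f"Bearer {token}"}

    diag = requests.get(
        f"{BASE}/system-diagnostics", headers=headers, verify=False
    ).json()["systemDiagnostics"]["aggregateSnapshot"]

    # The same numbers the status history charts: heap plus repo free space.
    print("heap used:", diag["usedHeap"], "of", diag["maxHeap"])
    print("flowfile repo free:", diag["flowFileRepositoryStorageUsage"]["freeSpace"])
    for repo in diag["contentRepositoryStorageUsage"]:
        print("content repo free:", repo["freeSpace"])

Polling this while a heavy processor (like the zip packing and unpacking mentioned above) is running is a quick way to see memory pressure before it becomes a crash.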
What we'll do is take our first break of the day, and when we come back, we'll start working on our scenario. I do expect the scenario to take a while, and you're going to think we went from an easy data flow straight off the deep end. But I'm here to help, to walk through it with you and talk through it, and I can see everyone's screen. So bear with us, and we'll get through it and the other scenarios I have planned. Then we'll probably go into some registry later today. Tomorrow we'll wrap up, have a little test, mostly an open-book Q&A, clean up our other data flows, and go from there.

Again, when you're building out the scenario, I'm mainly looking at how you want to approach it. I don't necessarily want to see a functioning data flow; I want to see the thought process: here's how I plan to accomplish the task, here are the processors I would use, here are the connections, things like that. With that said, unless there's a question, I'm going to get something to drink and we'll go on our first break. Let's take a quick 15-minute break; I'll see everyone back here in 15 minutes, and we'll start going through the scenario.

Give everybody a few minutes to get back. I was checking to make sure you all have the scenario, and it looks like it's installed. We'll get started up here in a minute. The data flow I was using for the controller services: I have a template of it that you can use for reference. I didn't upload it yet because I wanted to walk through my flow first, but for the scenario we can upload it for some assistance once everybody gets back.

Thomas, it looks like you've got one of your lines figured out. Travis got it all figured out, I think.

Yeah, I'm looking at Travis's and thinking, freaking dude, man. Mine's still not perfect. One of my lines is a little crooked, because getting those points straight is not very easy either. I don't know how Travis did it; I haven't gotten the controller aspect of it yet.

No, yours looks great. I'm jealous. The controller aspect, like I said, is why I wanted to lead off with it this morning. This will potentially take the entire day, up until after lunch at least, and it kind of throws us into the deep end. But once we get this figured out, you'll have 90% of it figured out. There are some other ways of doing things; I know there were questions about Python and things like that, and I'll probably go into that tomorrow. But once we get controller services squared away, you should be set for building your own NiFi flows.

All right, let me see, I was about to pull up a link. Hey, Maria, are you still there? No, she dropped off. Hang on, there's a scratch pad that we use.
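As an aside on that reference template: importing an exported flow template doesn't have to go through the UI's upload button. Here is a minimal sketch of doing it over the REST API; the XML file name is hypothetical, and the host and credentials are the same placeholders as in the earlier sketches:

    # Sketch: upload a flow template XML into the root process group.
    import requests

    BASE = "https://127.0.0.1:8443/nifi-api"
    token = requests.post(f"{BASE}/access/token",
                          data={"username": "admin", "password": "changeme"},
                          verify=False).text
    headers = {"Authorization": f"Bearer {token}"}

    root_id = requests.get(
        f"{BASE}/flow/process-groups/root", headers=headers, verify=False
    ).json()["processGroupFlow"]["id"]

    # Multipart upload of the template exported from another NiFi instance.
    with open("controller_services_example.xml", "rb") as f:  # hypothetical file
        requests.post(f"{BASE}/process-groups/{root_id}/templates/upload",
                      headers=headers, verify=False,
                      files={"template": f})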
Oh, here we go. Okay, once you get the hang of moving those around with the different, I don't know what to call them, waypoints or whatever: when you double-click on the line, it gets a little easier once you have the hang of it.

Okay, so in Teams I'm posting a link. Hopefully you're able to click on it. It's a Dropbox link, and it has the scenarios already uploaded to your uploads folder, but it also has the flow I just worked on. I saved that so you can use it as a reference. So if you're able to, bring up that Dropbox link and download it. It's two zip files: the NiFi scenario, and my NiFi example. And if you notice at the bottom of your screen...