Apache NiFi GROUP 2 course recordings
Oh, excuse me. Did I lose connection?

Yeah, we lost you, Odie.

My bad. I think I got the wrong URL. It keeps muting for some reason, but when I put in this one, it sends me to the original NiFi.

Oh, so let's do this. Open a new tab. If you can see my screen, I also have my NiFi up, but I have a new tab, and the URL you go to is HTTP, not HTTPS, because we're working on an unsecured port. So all you do is go to http://, the same IP address, 127.0.0.1, colon, and we're going to use a backup HTTP port: 18080. So that's 127.0.0.1:18080. Then you need a forward slash, and then you want to type nifi-registry, because if you don't, it's just going to take you to a page not found. Did that work for you?

I think so, yeah.

Yeah, you got it. All right, no worries. Leroy and... oh, I don't even see Richard. He might be called into something. Oh, there we go, Richard. Let's see what URL you're going to. Leroy, give me just a second, and we'll take a look at yours. You've got 127.0.1.8.0.9.5.0.3. You should have two command line boxes if you go down to your taskbar. There you go. Here, I'll just take a look.

Do we still need to keep the other one?

Yes, we do. Well, luckily, it looks like we only closed NiFi; Richard just didn't start it. So yeah, you need to keep your run-nifi window going. I think Richard missed this step. Leave the other one running; we're just going to run two of them.
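The address the class is using can be written down as a couple of constants; a minimal sketch, assuming the defaults from the walkthrough (127.0.0.1, and NiFi Registry's out-of-the-box HTTP port 18080):

```python
# Compose the NiFi Registry UI URL used in class.
# 127.0.0.1 and 18080 are the walkthrough's defaults; adjust for your setup.
host = "127.0.0.1"
port = 18080  # NiFi Registry's default HTTP port (nifi.registry.web.http.port)
base_url = f"http://{host}:{port}/nifi-registry"  # http, not https: unsecured port
print(base_url)
```

Note the path segment: without `/nifi-registry` at the end you land on a page-not-found, exactly as described above.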
I think Richard got on a call or something, so I'm going to just extract his and run it for him real quick.

I'm having issues trying to rerun the original NiFi.

Okay, give me just two seconds and I'll take a look at yours as well. I'm going to start Richard's so his will work and he can log in as well. So if you're looking at my screen, Peter, you should have two command line boxes open. What I did is, Richard here had his NiFi already running, so I just minimized it, went and extracted the NiFi Registry zip file, went into the bin directory, and double-clicked run-nifi-registry. It's going to open a new command line box, and when you see "Apache NiFi Registry" and the version number, it should be up and running.

Yep, I see that one.

Okay, perfect. All right, Peter, yep, it looks like you're in. And I know Richard is going to get there. All right, looks like everyone made it.

So this is Registry. Like I said, it's a sub-project of NiFi; the same folks that maintain NiFi have committers that maintain Registry as well. Like I said earlier, it's a complementary application to NiFi, like a translation layer. What it does is provide a central location for the storage and management of shared resources across one or many NiFi instances. Remember, NiFi is scalable to hundreds, even thousands, of nodes. I've seen multiple scenarios, but usually what I've seen is you have a dev system, a test, and a prod, and depending on the system, it reaches out to NiFi Registry, grabs the flows it needs that have been checked in from dev, and runs them.
But you may have one registry for all three networks, or each may have a registry of its own and use the backing of GitHub or GitLab to replicate that code to other systems and security levels, those types of things. Registry is there to keep track of all of our data flows, for storing and managing versioned flows. It's integrated within NiFi completely. We're going to set that up, because right now, when you work within NiFi, there's no hint of Registry. There are a couple of things we're going to do, and it's just going to flip all kinds of switches underneath.

So anyway, that is NiFi Registry. There's a little wrench icon at the top right; if you can click on that, that's our settings. And the settings here are very basic; again, this is a translation layer and a place to store stuff. If you go to the settings, you can click New Bucket. Name your bucket; I'm going to name mine "new bucket". Give it a description, and say Create. After that, you should see at the bottom right: success, bucket created.

Once we have this bucket, we can delete it, we can manage it. So if your bucket is created, you can click the little pencil on the right to manage that bucket. When you go into the bucket, you'll have your name and description. There are permission settings, where you can make the bucket publicly available; bundle settings, like "allow bundle overwrite", which allows released bundles in the bucket to be overwritten; and then the big one is the policies. Once this is installed and up and running on your system, you're going to have policies in place.
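As a side note, clicking New Bucket is just a call to the registry's REST API under the hood, so bucket creation can be scripted. A hedged sketch, assuming the unsecured class instance; `/nifi-registry-api/buckets` is the real endpoint path, but the `create_bucket` helper name and the example bucket are my own:

```python
import json
from urllib.request import Request, urlopen

REGISTRY_API = "http://127.0.0.1:18080/nifi-registry-api"

def bucket_payload(name: str, description: str = "") -> str:
    """JSON body for a new bucket, matching what the UI sends."""
    return json.dumps({"name": name, "description": description})

def create_bucket(name: str, description: str = "") -> dict:
    """POST a new bucket to an unsecured NiFi Registry and return it."""
    req = Request(
        f"{REGISTRY_API}/buckets",
        data=bucket_payload(name, description).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return json.load(resp)  # includes the server-assigned "identifier"

# Requires a running registry:
# print(create_bucket("new bucket", "created from the REST API")["identifier"])
```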
You're going to have usernames, passwords, identity management, certificates, CAC cards, you name it. So there will be policies set up that say, Tom can log in, but he can only see this bucket. Or Tom can log in and see all the buckets, and maybe he gets another permission that allows him to delete buckets, those types of things. Every one of these actions can be broken apart and assigned to different individuals and groups. It goes into multi-tenancy. That's the reason I mention these things: we haven't talked about multi-tenancy yet, but I like pointing out these things as we see them in the application, so when we do talk about multi-tenancy, it brings it all back together.

But anyway, you have those policies. We can't really create new policies here; we don't have an IdAM solution, any kind of identity service. We're not running Keycloak or anything like that, so we just have a plain page. We don't even have a username and password.

So you should now have a new bucket, that bucket should be created, and you have NiFi Registry up and running. You should be good to go. With that said, that is the majority of NiFi Registry. If security were enabled, we could add users. If we had a whole identity system, we could enable that and change our usernames. The About page is pretty simple. There's just not a lot to go through with Registry.

Anyway, we've got a new bucket. What I like to do, just because I don't like typing, is copy the URL that we went to. Not the whole URL, because the browser will have added administration and workflow paths; just http://127.0.0.1:18080/nifi-registry.
I like to copy it because we need to tell NiFi where our registry is. It's pretty easy to remember, so you can type it out as well. But let's go back to our NiFi system, to the main canvas. You may have three or four different processor groups; I've got a couple. One of the things you should be able to do is get back to the main canvas where everything is laid out.

So the registry connection works like a controller service, right? You don't want everyone having to configure their own registry. You have one registry set up where people can configure buckets, but the registry service itself is shared. So we've got to configure this centrally. Luckily, it's a lot easier than a CSV writer or reader or a JSON reader. Go to your hamburger menu and then Controller Settings. In your controller settings you have a General tab, a tab for managing controller services, a Reporting Tasks tab, and a Registry Clients tab. That's where we want to be; this setting is registry-only. We need to tell NiFi how to get to the registry.

If you are on the Registry Clients tab, you should see a little button to the right to register a new registry client. Click on that and give it a name; I'm going to put something easy. Oh, that's the wrong screen. Oh, it was the right screen; the latency was horrible on that one. I'm going to give it a description and the type. Right now you should only have one option in the drop-down. Some controller services will have a drop-down of 10 or 15 different selections, but luckily for the registry client, there's only one.
Now, it is set up as a drop-down because, in the future, you'll be able to run multiple versions of a registry client; you can already run multiple registries. So anyway, with the NiFi Registry client type selected, I'm going to say Add. But if you notice, we really didn't tell it anything about how to even get to our registry, which is an enhancement request I put in, because we should be able to say how to reach the registry while we're creating the client. When you create it, you should have a little yield sign on the left telling you that the URL is invalid, because the URL is required.

So what we're going to do is go to the far right and edit our registry client information. If you go to the properties, you should see URL and SSL Context Service. The only one we need is URL. I'm going to paste the URL that I copied from my NiFi Registry. I want to make sure that after the :18080 there is a forward slash and nifi-registry; no extra slash after that, and none of the other sub-paths that were listed in the browser. Just make sure you have /nifi-registry. Then I'm going to say OK, Update. It's going to think a minute because it reaches out and talks to the registry, and then it should be good to go. You should no longer have a yield sign. And with this client, unlike a normal controller service, we don't have to start it, and the configuration is a lot easier.

So once that's done, let me make sure everyone has it. I'm going to look around. That's looking good. Peter's on; Richard's on. Okay, Leroy, are you having an issue?

Yeah, it looked like it was working, and then it's not.

Okay, let's take a look. So you're on the wrong port up there where it says :8080.
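NiFi validates the URL for you when you hit Update, but it can be handy to normalize and check it yourself the way the transcript describes (it must end in /nifi-registry, with no stray slashes). A sketch using only the standard library; the helper names are my own, and the liveness check assumes an unsecured registry:

```python
from urllib.request import urlopen
from urllib.error import URLError

def normalize_registry_url(url: str) -> str:
    """Trim trailing slashes and make sure the path ends in /nifi-registry,
    which is what the registry client's URL property expects."""
    url = url.rstrip("/")
    if not url.endswith("/nifi-registry"):
        url += "/nifi-registry"
    return url

def registry_is_up(url: str, timeout: float = 5.0) -> bool:
    """True if the registry answers on its UI URL (unsecured instance assumed)."""
    try:
        with urlopen(normalize_registry_url(url), timeout=timeout) as resp:
            return resp.status == 200
    except URLError:
        return False

# registry_is_up("http://127.0.0.1:18080")  # requires a running registry
```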
You want 18080, so put a one in front of that. And this is day three of me talking so much, so it could also be that my voice is starting to crackle. But yeah, it's 18080. Perfect.

So now that you're there, let's go ahead and create a new bucket. Go to your wrench icon in the top right and say New Bucket. All right, put in a name for your bucket; whatever you want, "bucket" works. Put in a description. Okay, and say Create. Perfect. Success, the bucket has been created. So this is the name of your first bucket. If you go over to the pencil on the right, that's how you would manage this bucket. You can change the name; you can do all kinds of stuff here. If security were set up, you could add users and groups to this bucket, those types of things, to allow for that multi-tenancy of the registry. But for now, these are the settings we have. So go ahead and close that, and I think you're good.

Now go back to your NiFi. That one, there we go. Go back to the parent, your top-level canvas. There you go, perfect. Go to the hamburger menu, Controller Settings, and then over there on Registry Clients, click that tab. All the way to the right, click the plus. You got it. Name of your registry client; this can be any name you want. Description; any description. And then say Add. If you notice on the left, you've got a little yield sign. That's because of the URL, right? So that's it, you got it: Properties, URL. And then we want to paste the registry URL, so I'll dictate it to you: http, colon, slash slash, 127.0.0.1, colon, 18080, slash, nifi-registry. Okay, Update. Perfect. Go ahead and hit the refresh on the bottom left. Now hover over your yield sign. Oh, it says https. Let's see.
There we go. It's plain hypertext transfer protocol, so http, not https. Okay. All right, perfect. Go ahead and close that window, and you are now where we are. So let me bring my window back open.

Okay, so that's it. That's all it takes to get NiFi Registry installed, up and running, and NiFi configured to recognize the registry for us to use. It's a very easy process, I feel like. There are a couple of nuances along the way, but getting the registry up and running goes rather quickly. Now we've got our NiFi configured and our registry running. Let's check some code in.

If you right-click on your processor group, you should see a new category that says Version. It had Configure, Variables, Start, Stop, all the ones we've been dealing with the last couple of days, but now we have Version. If you hover over Version, you should see Start Version Control. Click on that, and it will already be populated with your test registry, whatever name you gave it. If you have access to multiple registries, it will list those. Your bucket should already be populated as well; I named mine "new bucket". What NiFi did is reach out, get the list of buckets I have access to, and pull those in for me. And now I have flow name, flow description, and version comments.

For the flow name, I'm going to put first_sample_flow. I'm a heavy Linux user; I don't really like Windows, so that's why you'll see no capitalization. I try to keep capitalization out, and I don't like using spaces, so I use underscores: first_sample_flow. Name it whatever you want. Flow description: again, this is a description of the flow.
So it's whatever description you like. Now, with all that said, many organizations have policies for software engineers that say they have to comment their code before it gets checked in; most organizations will at least ask that you please comment your code before checking it in. The comments that go into the version comments field are comments that will also be stored in your version control system.

Registry is great and all, but how do you package this with the rest of your software? How do you make it part of your Ansible and Terraform and those types of things? Well, Registry can be backed by GitLab or GitHub or Azure DevOps; whatever version control repo you're using, NiFi usually knows how to deal with it. So the version comments that go in here will be seen in NiFi Registry, but when Registry automatically checks this in to GitHub, for example, those comments will be there as well, because of course it checks in all the JSON documents it needs to run these flows.

Now, I mentioned NiFi Stateless, for instance, and I've mentioned MiNiFi and some others. Those also use Registry. Even though they are headless, with no UI, built to run on the edge or in a cluster or as a microservice, when you deploy your data flow in that fashion, they come back and read NiFi Registry to understand what flow they need to run. So keep that in mind: NiFi is a user of Registry, but all the other components are users as well. Registry is the sole core that handles all the version control within the NiFi ecosystem.
And then, in GitHub, if you have a project that this is being backed up to, you'll see buckets separated by directories and things like that. Setting up the GitHub backing is not part of this class, but there are tons of examples online of how to connect Registry to all these different version control systems. And really, all it is is a configuration block in your config file saying which service to connect to.

So anyway, these version comments are what gets sent out. I'm going to put "this is a test comment", then hit Save. So, what changed with our process group? You may have noticed a small change that's different from the other process groups: if you look at the top left, you'll see a little green check mark. That means the data flow has been checked in, there are no pending updates or changes, and version control hasn't been stopped on it. That little green check mark means you have the latest and greatest that Registry knows about.

Now go back to the registry and hit refresh on your buckets. In the registry explorer, just click at the top left; there's a little drop-down that says all buckets, or whichever one you need. We only have one, so we just say all buckets. We should now see our first sample flow in "new bucket". It shows us the bucket identifier, the flow identifier, a description of the flow (not the bucket), and a change log. So just like we have provenance in NiFi keeping track of all those changes, Registry is keeping track of its changes as well. We'll see that in just a minute.
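That "configuration block in your config file" lives in the registry's conf/providers.xml: you swap the default file-system flow persistence provider for the Git-backed one. A sketch with placeholder values; the class and property names follow the GitFlowPersistenceProvider that ships with NiFi Registry, but check your version's administration guide for the exact set:

```xml
<flowPersistenceProvider>
    <class>org.apache.nifi.registry.provider.flow.git.GitFlowPersistenceProvider</class>
    <property name="Flow Storage Directory">./flow_storage</property>
    <!-- Push each commit to this remote; leave blank to keep commits local only. -->
    <property name="Remote To Push">origin</property>
    <property name="Remote Access User">your-git-user</property>
    <property name="Remote Access Password">your-personal-access-token</property>
</flowPersistenceProvider>
```

The Flow Storage Directory must be a local Git clone of the remote repository before the registry starts.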
In the change log, you can click refresh: version 1 was committed two minutes ago by anonymous. That's because the user I'm logged in as with Registry is anonymous, so that's who it's checked in as. Once you have identity and security set up, you'll see the actual name of the person here, however that's configured.

You should also notice that we have actions. We can import a new version of this flow: if we have an updated JSON version, we can import it, so instead of checking it in through NiFi, we just import the flow. Somebody could have sent me the flow they updated because they built it on another system. Or, instead of keeping templates in your NiFi instance, you may have some flows that act as templates that you upload from other sources. That's why we have import. We also have export version: click Export, check the latest version, and click Export, and you get a JSON representation of the flow. It's basically the same kind of document I sent everyone when you created a processor group and imported that JSON flow; this is how you could move it into another system that's not connected to this registry, for instance. We can also delete the flow, and if we did that, NiFi would recognize it and let us know as well.

All right. So what I want to do is go back to NiFi, go into my checked-in flow, and make some changes. For instance, I'm going to copy this label and paste it, and bring it to the front. All right, so I've made some changes. I might even change the position of the processors. And then I'm going to add a processor to the canvas.
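The export you get back is a versioned flow snapshot in JSON. A small sketch of reading the useful fields out of one; the trimmed sample below is hand-made to mirror the general VersionedFlowSnapshot shape (real exports carry far more under flowContents), so treat your own export as the source of truth:

```python
import json

# A trimmed-down stand-in for an exported snapshot (real exports carry much more).
exported = json.loads("""
{
  "snapshotMetadata": {
    "bucketIdentifier": "b1c0de00-0000-0000-0000-000000000000",
    "flowIdentifier":   "f10wf10w-0000-0000-0000-000000000000",
    "version": 2,
    "author": "anonymous",
    "comments": "this is a test comment"
  },
  "flowContents": {"name": "first_sample_flow", "processors": []}
}
""")

meta = exported["snapshotMetadata"]
print(f"{exported['flowContents']['name']} "
      f"v{meta['version']} by {meta['author']}: {meta['comments']}")
```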
If I do that, you'll notice that on the breadcrumb trail at the bottom left there's now an asterisk. NiFi is letting you know that this flow was changed, so you need to check it in; the flow changed, so you basically need to click save.

So I'm back at my main canvas, and my first sample flow, instead of a little green check mark, now has an asterisk: tracking to first sample flow version 1, local changes have been made. That tells you which flow in the bucket, as well as the bucket name, has been changed. So you can come back later, and if somebody else changed something, you can see that; or you may have changed something, forgotten to commit it, and when you run it elsewhere, it gets confusing.

So what we like to do is check that back in. Right-click, Version, and now you have multiple options. You can commit local changes; you can show local changes; you can revert all the changes you made and keep what's in source control. You can also just stop using version control on this process group. The reason that's there is that I know of instances where people will connect to the registry, check out the flow, stop version control, work on that flow, and then start version control later on a new branch. Maybe they did like we did: we took some of the controller-service scenario and reused a lot of those components in the second scenario we worked on. That's one use case. Or maybe you're working in a high-latency environment and accidentally made some changes you don't really understand; you can revert those changes, right? Or just show the local changes.
So what I like to do is just commit local changes. Again, we've got to write some sort of version comment: this is a comment about why we've made the change. And now we've got our green check mark back. If I go back to Registry and hit refresh on my change log, I can see that version 2 was created a few seconds ago, with the comment about version 2. I can go through this change log and see what has changed and who changed it, those types of things.

Next, I'm actually going to create a new bucket. Go to the top right again and create a new bucket; instead of "new bucket", I'm going to name it "bucket two". And say Create. All right, so you should now have two buckets in your registry, and one flow with two different versions.

What I'd like to do next is go back to your NiFi canvas and select one of the other process groups. I'm going to use the CSV-to-JSON flow we were working on, and I'll say Start Version Control. But I want to make sure I'm not putting it in the bucket I originally created; I'm putting it in the second bucket. And then Save. Awesome.

So now, if you go back to the registry explorer and hit refresh, you should see two buckets, with one flow each. You can store all your flows in one bucket, but I like to keep these separate; they're totally different scenarios, those types of things. So for this use case, I keep everything in its own bucket. But if you have a lot of related flows, you may put them all in one bucket and call it a day. So if you refresh the registry, you should notice two different buckets. I'm going to pause there and check in.
How's it going? Peter, did you get it figured out? Let's look and see.

That was the second one. I forgot how to change the URL for it.

Okay. So you've got two buckets; you don't need to worry about the URL. Let's go back to your NiFi instance, and you have your Start Version Control. The URL was what NiFi needed to know which registry to go to, but because you've already set that in the registry client, you can set it and forget it, like the old infomercial. Yep. And then just name your flow, give it a description. There you go. All right, you should have a check mark. Perfect. So now you have two different buckets, each with its own flow. If you wanted to, you could check in the CSV-to-JSON and the analyze-weather-data flows again as well, and put them in their own bucket or in your original bucket; it's up to you. But you look like you're good to go. Let's see; it looks like he's got it.

Okay. Did anyone have any issues with the registry, setting it up or checking their code in? Any issues like that? Perfect. Yeah, it seemed like that was pretty straightforward and everyone got it; there were only a couple of config questions. So I think we're good to go with Registry. Again, it's not something that's massive to talk about, because it's a pretty simple sub-project of NiFi.

Now, when your sysadmins and your cyber folks, or maybe your DevSecOps folks, work on getting this in, they're probably going to use a version control system like Azure DevOps. I know you all use on-prem Azure DevOps, and it also has a code repository, I think.
So when they get that configured, when you check things in, it will actually be going to a regular version control system on the back end. And like I said, you may have a need for Stateless NiFi, where it's basically a microservice: it's just the NiFi engine, and you tell it, okay, NiFi engine, run this flow, and that's all it does. You may have that use case.

You may have MiNiFi, and actually I know you all will use MiNiFi, because the last class was heavily interested in it. MiNiFi, if you've never heard of it, comes in two flavors: a Java version and a C++ version. The Java version of MiNiFi is almost exactly like NiFi, but without the UI. What it does is execute your flows, and it accepts most of the processors that you have in NiFi. It's for those edge use cases or agent-type use cases. I've seen MiNiFi running on Windows laptops for security log events and things like that. I've worked with a company that uses MiNiFi for F-35s, doing FOD detection; when I built that product, I used MiNiFi. So MiNiFi is for the edge, for the most part. MiNiFi will look at the registry and say, okay, you want me to run this; I'm running NiFi headless, and here's the flow I'm going to run, out on the edge where it takes fewer resources. The other version, MiNiFi C++, only has 10 or 15 processors, but it can also pull the data flow you want it to run from the registry and run it on the edge.
So, as you can imagine, NiFi is not only where you build the data flows that NiFi itself will run; it's also used to build the flows that those edge cases and headless, engine-only use cases run. And because Registry is the centralized store behind all of these services and actions, it handles the versioning for all of them.

So that is Registry. There's not a whole lot to it. Anyone have any questions about Registry in itself? All right, I didn't think there would be many, so that's a good thing.

What we're going to do now is go into some PowerPoint. I'm going to try to get through a couple of slides; we'll talk about multi-tenancy and things like that. There's not a whole lot of hands-on interaction right this minute, so if you can bring up my shared screen, we'll go through these PowerPoint slides as quickly as possible. We'll take a break here soon, our last break of the class, and then come back and, if you want, work on another scenario, the survey, any administrative stuff, things like that.

All right, where are my slides? I think where I left off on Monday was actually installing NiFi. Again, I hate to bore everyone with PowerPoint, but I want to go into some of these key concepts. You may not be the person actually setting this up, but having an understanding of why these things operate the way they do will, I think, be beneficial. I don't like to just sit here and read words off a slide; sometimes I will, but I want to abbreviate and highlight some key aspects.
First and foremost, multi-tenancy is efficient resource utilization with baked-in security. Depending on how you all set this NiFi instance up, you're potentially going to work on the same NiFi instances as other organizations and other units, whereas traditionally those resources would all be within the same unit or the same organization. NiFi is set up for that multi-tenant use case. It allows different teams and different users to share the same NiFi, and depending on the collaboration, you may share your flows with other organizations that are also using that NiFi, allowing them access to your bucket, for instance, to check out your data flows. You may give read permission to your whole process group so someone else can check out your flow and see a bit of how you built it, but they can't turn it off, they can't interact with it, they can't clear any state or anything else like that. So it adds to that enhanced security and isolation requirement that's part of multi-tenancy.

It's cost-effective: because you're sharing the resources, you spend a lot less on the resources to run NiFi, so it does have some cost savings built in. You administer one NiFi instance, not a hundred for a hundred different organizations; it just keeps things a lot simpler.

The slides also go into the importance of scalability. I think everyone gets by now that NiFi was built for scalability, to hundreds and thousands of nodes. With that, we had to build the scalability aspect of NiFi into this multi-tenancy.
--> So that way, as you ingest more data flows and you've got more data coming in, --> you're able to grow, to add more NiFi instances, but you're all still sharing that same resource. --> Because you have that security set up, you have some of the underlying architecture built to do those things. --> You just keep adding resources as your data needs grow, and therefore save time, money, and energy managing all that. --> So the scalability factor is very important. --> I have an image here. --> This was actually pulled from Azure, because Azure is setting a lot of these up. --> I mentioned on the first day that if you were trying to set up the security and some of these things, it's all command line. --> It's all configuration files. --> There's no GUI. --> There's some automation now, quite a bit of automation, but you're still working on the command line. --> So Azure is working to make it better, making it part of a service, --> and that way it will tie into the rest of the services as well as your Active Directory, plugging into those groups and those types of things. --> I like to call that image out because it's not necessarily very public information that Microsoft is doing this with NiFi, --> but they're a big user of it, and they're working to make it a service. --> I think I kind of went into what multi-tenancy is, but multi-tenancy is an architecture in which a single instance of software serves multiple users or tenants. --> NiFi is built for it. --> There's data isolation: data is securely isolated. --> We can isolate data flows. --> We can isolate complete, whole process groups, those types of things. --> So it does provide that isolation, that enhanced security that's part of multi-tenancy.
--> And we talked about the scalability and flexibility. --> So, some of the things to keep in mind: we have talked about provenance and data lineage for the last few days. --> But in a multi-tenancy environment, as you can imagine, you have multiple different organizations generating provenance data. --> NiFi still provides comprehensive provenance and data lineage capabilities that allow for detailed tracking of the data as it moves through the system. --> That's always been part of it. --> So that's also included as part of that multi-tenancy. --> That fine-grained detail, that chain of custody, is still present even in a multi-tenancy environment. --> Which tells you that a single person who had access to all of that data could potentially see all the organizations and everything else. --> So keep that in mind as you offload your provenance data into your corporate data governance solution, if you have such a thing. --> The administrators of that could potentially have access to a lot of data provenance. --> But some of those systems are multi-tenant as well. --> So it could be that only certain groups have access to certain groups of data, depending on the corporate policy. --> So, multi-tenancy: process groups in NiFi have always been how you organize data flows and those types of things. --> But each process group can be configured with specific resource limits, ensuring that no single tenant can monopolize shared resources. --> This is kind of key, because we're seeing this newer Kubernetes environment. --> I know it's been out for a while. --> But in Kubernetes, we'll see somebody deploy a container that uses up all the resources because proper resource allocation was never set up.
--> Kubernetes is getting better and better with that now; some of the default values are getting better. --> But it's the same concept with NiFi. --> We can set these process groups to only use a certain amount of resources, which is a good thing. --> Controller services, same type of thing, right? --> Specific controller services for one tenant versus another tenant, and those types of things. --> NiFi's ability to manage data queues and apply back pressure prevents system overload by regulating the flow of data. --> These settings can be adjusted per connection within the data flow. --> So the data flow itself has a ton of fine-grained detail. --> Not only setting back pressure, setting that 10,000 files can be in the queue before it starts backing up... --> You can change all of these things very quickly and dynamically per data flow. --> So you're allowing for that multi-tenancy use. --> You may have one tenant that's allowed 30,000 files in the queue, --> where another tenant may only be allowed 10,000. --> So just keep that in mind. --> This kind of goes into setting up the environments. --> What I'd really like to point out here is that it's crucial to establish separate environments for development, testing, and production. --> Now, because of the fine-grained details and because of the multi-tenancy, you could set up one instance and have dev, test, and prod all in that instance, locked down where nobody in dev can see test, and nobody in test can see prod. --> But that is a lot of configuration. --> That is a lot of management of a lot of policies. --> And it's going to be hard to manage those resources.
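To make the per-connection back-pressure thresholds concrete, here is a minimal Python sketch, not NiFi source code, of the check a connection performs. The function name and the tenant scenario are illustrative; the 10,000-object and ~1 GB defaults and the 30,000 vs. 10,000 tenant limits come from the discussion above.

```python
# Illustrative sketch of NiFi-style back pressure (not actual NiFi code).
# A connection "backs up" once EITHER threshold is reached, and upstream
# processors stop being scheduled until the queue drains below it.

def backpressure_applied(queued_count, queued_bytes,
                         object_threshold=10_000,        # NiFi default object count
                         size_threshold=1_000_000_000):  # ~1 GB default data size
    """Return True if this connection should apply back pressure."""
    return queued_count >= object_threshold or queued_bytes >= size_threshold

# Two tenants with different per-connection limits, as in the example above:
tenant_a = backpressure_applied(12_000, 50_000_000, object_threshold=30_000)
tenant_b = backpressure_applied(12_000, 50_000_000, object_threshold=10_000)
print(tenant_a, tenant_b)  # -> False True (A still flowing, B backed up)
```

The same queue depth backs up one tenant's connection but not the other's, which is exactly the kind of per-tenant tuning described above.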
--> So if you can set up multiple environments, even if you have to do just a dev and a prod, fine, right? --> If you can get a test in there, that'd be even better. --> This is not necessarily tips and tricks, but as somebody that's architected and designed these systems: this is what you should do. --> Each environment can be configured as a separate process group in NiFi, with its own workflows, its own controller services, its own access controls. --> This arrangement helps maintain order and enforce security policies unique to each environment. --> A lot of this is the same, right? --> We have a lot of fine-grained control within NiFi. --> You can lock this down per process group, and each process group has its own controller service access, those types of things. --> As well as resources, right? --> We've already mentioned how we can limit your queue size or limit resources like memory and CPU. --> There's a lot of advanced configuration you can do here. --> And NiFi supports the promotion of data flows from development to testing and finally to production through Registry, right? --> That's what I was saying: Registry is there to support it. --> Usually you have one Registry and you have three different NiFis, --> one each for dev, test, and prod. --> So NiFi facilitates that promotion of dev to test, test to prod. --> So that's multi-tenancy and an explanation of multi-tenancy. --> If you ask me why we are not going into NiFi right now and setting up multi-tenancy: many reasons. --> It would require us to be on a different type of machine that's much better suited for this. --> I don't want to run multiple instances of NiFi on a Windows box. --> So we would need a cluster of machines. --> We would need to set up security.
--> We'd need some sort of Active Directory, and it would take us longer than three days to set that up versus just learning NiFi. --> So, unfortunately, I can't show you any of that live and how to set it up. --> But that is what multi-tenancy is about. --> There's a ton of information on the official nifi.apache.org website, as well as some additional diagrams and those types of things. --> Scalability. --> So, again, NiFi was built for scalability. --> At Fort Meade, we were ingesting a bunch of data. --> If you're familiar with that organization, you know we love data and we love to hang on to data. --> We had data coming from all different sources, all kinds of places. --> It was nothing for us to have four or five racks, with two racks dedicated basically to NiFi, --> and the remaining racks just storage drives. --> And we'd run like a small cluster just so we could get data off the system. --> Things like that. --> So NiFi was built to scale. --> Going from one node to two to three is the hardest. --> It's like what they say if you have a child, right? --> Going from one to two is the hardest. --> But once you go from two to three, it's all easy. --> And then you just keep adding to it. --> So NiFi is extremely scalable. --> As your demand increases, so can you increase those resources. --> It's both horizontal and vertical. --> I think we've already talked about some of this. --> You can scale out. --> You can scale up. --> You can add more CPU and memory, or just add more servers. --> Also, I think we ran into this a few times where we could work with the data flow as data was moving. --> We may change things around. --> We may pause the data.
--> But it's not pausing the picking up of files, that initial processing of files. --> And depending on the use case, we could just throw another fork in the data flow, --> take the original and do some additional processing, --> without ever having to turn anything off. Same type of thing. --> Some of that is vertical scaling: --> if you can add more resources, like CPU and memory, you can just keep taking advantage of it. --> Usually what we see, though, is just throwing in more servers, or, like in the Kubernetes environment, --> more resources and ultimately more servers running in Kubernetes. --> So usually that's what we see. --> I don't really have to go into how important it is; scalability is crucial, especially for some of the data-intensive organizations out there. --> So, if you look at the first block here, that's actually a Raspberry Pi. --> NiFi can run on a Raspberry Pi with one gig of RAM just fine. --> I mean, you're not going to be unzipping a lot of files. --> You're not going to be doing some of this data-intensive stuff. --> But if you were to put NiFi onto your Raspberry Pi, you could download it, install it, and run it just fine. --> So you can imagine, if you had a Raspberry Pi, you could actually run MiNiFi, or even better, --> MiNiFi C++, which is the C++ version of MiNiFi. --> You can easily run those as well and process data. --> So, as a single instance, you're in a pretty good place. --> Sorry, I just got a poor call quality warning from Teams. --> Anyways, as you start moving up and you're trying to process millions of events per second, --> millions of messages per second, 100 meg of throughput per second, --> you need to start scaling out your NiFi instances. --> So this is a basic provisioning chart.
--> If you are doing, I think it was, a million messages, you're going to need a few cores, --> 8 gig of RAM, and definitely some hard drive space, because it's going to start filling up those content repositories, --> those provenance repositories, those types of things. --> And depending on how much data comes in, you may have to use different storage types. --> We get into that with the large, but a medium-sized cluster, it's a dual core. --> NiFi is pretty good at using all the cores, because it is written in Java. --> If you're doing this on bare metal, I like to have NiFi run on as many cores as possible. --> But I prefer faster cores over additional cores. --> I think NiFi utilizes a fast core very well. --> So these are just some of the best practices as you work through setting these up, if you go down that route. --> If you're processing video, you're processing millions of events per second, you're needing 200 to 300 meg plus of throughput per second, --> you want full data governance, provenance, pedigree, lineage... --> You need everything, right? --> You want to start then throwing 64 gig of RAM at it. --> You want to start throwing way more than four cores, as it says here. --> You want to start putting, like, eight to 12 cores to this. --> You want it to be on a couple of different network adapters. --> And you can run your operating system on a slower drive, but you want to make sure that that provenance repository --> and that content repository are running on something extremely fast. --> That way it can keep up writing all those events. --> I'm not going to go into these too much. --> I just kind of want to go over it. --> But this is the example of the Registry setup, --> where we have a NiFi dev, --> and we have a deploy to QA/test and prod.
--> So on dev, you're doing the commits. --> You're sending those to Registry. --> Registry's got its API that you connect to. --> You're then pulling that flow, utilizing Ansible, utilizing whatever, --> to pull that flow into your test environment and execute it. --> And the same type of thing with prod: you've tested it out, --> so you can then pull your prod-ready flow and execute it there. --> Clustering allows NiFi nodes to operate together as a single unit. --> If we were in a clustered environment, it would show up on the toolbar of NiFi. --> It will let you know how many nodes you have available. --> There is a cluster coordinator in NiFi that's responsible for managing the state --> and the status of all the nodes. --> And there's the primary node, which could also be the cluster coordinator, --> for specific roles that are critical to the operation of the cluster, --> such as running controller services that may not be applicable on other nodes. --> For security reasons, if there was a CISO on the line, --> you may have certain servers or services that can --> reach out and touch certain databases and things like that. --> So you will set a controller service up for that so others can access it. --> But another server that is running NiFi --> may not have access to all of those services, --> so it performs the more localized controller services. --> Sorry, the cluster coordinator is going to coordinate and run all of those tasks --> and all of those services that can't run --> on the other nodes. --> You may have an instance, though, that is the only one that can communicate with a service, --> and that controller service has to run on that one instead. --> So just keep that in mind.
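The dev-to-test-to-prod promotion above is typically scripted against the NiFi Registry REST API. Here is a small, hedged Python sketch of building the endpoint a promotion script (Ansible, CI, etc.) would call to pull one version of a flow; the path layout reflects the Registry REST API as I understand it (verify against your Registry version), and the bucket/flow IDs are hypothetical placeholders, not values from the class.

```python
# Hedged sketch: constructing NiFi Registry REST endpoints for flow
# promotion. Real IDs come from GET <registry>/nifi-registry-api/buckets;
# the ones below are made up for illustration.

def flow_version_url(registry, bucket_id, flow_id, version="latest"):
    """URL for one snapshot of a versioned flow in NiFi Registry."""
    return (f"{registry}/nifi-registry-api/buckets/{bucket_id}"
            f"/flows/{flow_id}/versions/{version}")

# Using the unsecured Registry port from class (18080) and placeholder IDs:
url = flow_version_url("http://127.0.0.1:18080", "bucket-1", "flow-9", 3)
print(url)
```

A promotion script would GET that snapshot from the dev Registry and import it into the test or prod NiFi, which is the flow of events the slide describes.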
--> Data management, flow management: --> NiFi clusters manage data flow dynamically across nodes. --> The system uses algorithms to distribute data evenly. --> Here's a little secret a lot of people don't know: --> there is a predictive analytics framework running underneath NiFi, --> if you enable it. --> You can actually take advantage of that framework --> in many ways, but NiFi will use it to handle data flow, --> back pressure, those types of things a lot better than it does --> out of the box. --> It's got some specially built algorithms for this. --> But you don't just get to download and install NiFi and use it. --> You've got to enable it, and I think you still have to build from source --> with it enabled to take advantage of the analytics framework. --> There may be a property where you can turn it on, --> but I just bring that up because, --> as we talk about data balancing and flow management --> and those types of things, the clustering aspect helps there. --> But a component of that you can enable is the analytics --> framework that NiFi uses. --> If you ever have interest in that or anything else, --> feel free to reach out and I'll point you in the right direction. --> NiFi also has a rules engine built in. --> It's J-Rules, I think. --> Java rules, so, J-Rules. --> You have to enable that as well and build from source. --> But just FYI, for those that like rules engines, --> you can run a rules engine as part of NiFi as well. --> So, just some tidbits that really aren't publicly, --> easily available, those types of things. --> We kind of went through the configuration of a connection. --> You can actually do compression on the connection, --> so the data is compressed as it goes across. --> You can set the object threshold or the size threshold on the connections.
--> It's defaulting to 10,000 files or one gig, --> but you can change those. --> You can change the prioritizers. --> You can pick up a lot of data and say, --> well, I don't necessarily want to go first in, first out, --> because I need the older data processed first. --> And you're able to do that. --> So it's going to pick all this data up, --> and then it's going to look through and say, --> okay, give me the oldest file first, and that's the one I want to send. --> So you have those types of capabilities. --> As part of that load balancing, you have task distribution --> and node efficiency, --> where a task can get spread out across the nodes --> so they equally contribute to that data processing. --> If a node goes down, --> NiFi recognizes it and redistributes the workload across the cluster. --> If you bring on a new node, --> it will increase the resources it has --> and start putting flows onto the new node. --> And we are almost done with slides. --> So, failures are inevitable. --> It's going to happen. --> You're going to have node crashes. --> If you approach designing a system with that in mind, --> with Murphy's law in mind, --> you can build that system to try to accommodate for it. --> So NiFi is designed to handle node --> and network failures really gracefully. --> Remember, the original use case of NiFi was --> being able to process data --> and then, when it finally had a connection, --> send that data. --> I know a lot of applications that will break --> if they don't have a network connection. --> I know a lot of applications that wouldn't work right at all --> if they were using, say, a T1, --> one of our military satellites --> that's got a T1 connection --> that's really slow, or even dial-up.
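The "give me the oldest file first" behavior above can be sketched in a few lines of Python. This is an illustration of what an oldest-first queue prioritizer does, not NiFi's actual implementation; the field names and timestamps are made up.

```python
# Illustrative sketch (not NiFi source) of an oldest-first prioritizer:
# instead of FIFO arrival order, the queue hands out the flow file with
# the earliest lineage-start timestamp first.

def next_flowfile(queue):
    """Pick the oldest flow file by its lineage start timestamp."""
    return min(queue, key=lambda ff: ff["lineage_start"])

queue = [
    {"name": "b.csv", "lineage_start": 1716350000},
    {"name": "a.csv", "lineage_start": 1716340000},  # oldest entry
    {"name": "c.csv", "lineage_start": 1716360000},
]
print(next_flowfile(queue)["name"])  # -> a.csv
```

Swapping the key function is all it takes to express a different ordering, which mirrors how NiFi lets you swap prioritizers on a connection.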
--> It's funny, last year we actually did a network simulation --> where we simulated a dial-up connection, --> sending data across that --> and making sure that it actually got there. --> So it's designed to handle those network failures. --> If something does fail, that cluster coordinator --> will automatically detect the failure --> and redistribute tasks and data flows, --> redistribute the workload, basically. --> It does have automatic rebalancing, --> it does have failover mechanisms, --> as well as data replication and checkpointing --> to safeguard against data loss. --> NiFi does replicate flow files across multiple nodes. --> So if something does fail, --> it can pick that back up from the content repository --> and keep on going. --> And the big one here: I don't know how you all set up --> your monitoring and alerts and stuff like that --> for your systems. --> Prometheus is built in. --> I know most organizations use Prometheus. --> There's a Prometheus controller service. --> So you can go into the controller service, --> configure your Prometheus instance, --> and send all these metrics and all these alerts to Prometheus. --> I'm not really going to go over creating a custom processor. --> I don't think we all want to download IDEs and write code, --> especially with a little over an hour left of the class. --> But what I did is included this in case you do. --> Like I said, I'm going to clean up a few of these slides, --> but I'll package it up, --> as well as potentially some of the scenarios, --> and email everyone the slides by the end of the week. --> That way you have them, you'll have my contact information, --> those types of things. --> Now, my obligation ends when class is over today. --> However, you guys do work for my favorite military organization, --> and I also like helping out on things like this.
--> So if you do have questions, --> you'll have my contact information, --> those types of things, that's going to come out with the presentation, --> so feel free to shoot me an email. --> If it's a quick question, have at it. --> If it's, hey, Josh, can you come design our system --> and architect it and put it together? --> My answer is going to be yes, --> but I need a contract with my company to do that. --> But if you have a question that I can quickly answer, --> please don't hesitate to reach out. --> I love talking NiFi. --> If there's a favorite software application of mine, it's probably it. --> But anyway, I'll include all of this, --> and, I'm losing my voice, --> I'll send it out. --> That way, if you want to create your own processor, --> here are the instructions. --> There's a lot of source code, things like that. --> I'm cleaning this up as well to include things like the Q&A section. --> Any kind of questions I've written down from you all, --> I try to include those as well. --> I will include some helpful links and resources --> to kind of get you started. --> But my only advice, if you take one thing away, --> is download it locally and have fun. --> So that being said, I've got to get something to drink. --> Does anyone have any questions about multi-tenancy, --> Registry, load balancing, scalability, NiFi in general? --> Okay. --> All right. --> Well, if there are no questions, let's take our last break. --> I'm going to say 15 minutes, --> but I'll be back before that. --> I just want to go over a couple of other things --> and show you where the next scenario is in case you want to work on it. --> You should have a survey. --> We'll talk about that and that kind of stuff, --> and then answer any final questions and get us out of here.
--> I know we stayed over quite a bit yesterday, --> so I'm going to try to get us out of here a little earlier today. --> But quick 15-minute break; --> I should be back at my desk within 10 minutes. --> Hopefully everyone's getting back. --> Let's see, we have an hour left. --> So I'll get everybody back. --> For Registry, for instance, --> everything we went over is in the documentation. --> We went over downloading, installing, starting it. --> You can install it as a service. --> We talked a little bit about that with NiFi. --> Creating a bucket, connecting it, and all those things. --> Again, I do put the resources in the presentation I send out to everyone, --> so if you have any additional questions about Registry, --> about NiFi, about MiNiFi, those projects, --> you have a lot of documentation there. --> I do realize that, --> even though I think it's pretty good documentation, --> it could be lacking in some areas just because it is an open source project. --> It's kind of general. --> There are no specific use cases and setup and things like that. --> But I'll include these in case you all need them. --> One of the other little points I want to chat about is MiNiFi, --> only because the last class was really interested in this; --> we actually did some MiNiFi things. --> But I mentioned MiNiFi. --> It is a sub-project under NiFi. --> It's basically the engine and any processors you would need to run a data flow. --> And it's built for speed. --> It's built for edge use cases and those types of things. --> There are two different versions. --> There's a MiNiFi C++, --> so if we have anybody on the call that likes to tinker with Raspberry Pis --> and things like that, --> the MiNiFi C++ actually has --> a processor to capture images from a Raspberry Pi camera, --> read sensor data, those types of things. --> As you can imagine, the C++ version is very small.
--> It's extremely fast because C++ is, --> I would consider it, more low-level code. --> Java still has to go through the JVM, the Java Virtual Machine. --> So C++ outruns Java. --> But the C++ version only has about 15 processors, --> the last I looked, --> and they're basic processors for, like, getting a file or an HTTP fetch --> or something like that, update attribute, those types of things. --> Actually, if you click on processors for C++, --> it's gotten a lot better. --> But it's only a subset. You can tail a file. --> You can do Windows syslog events and stuff like that. --> You can execute scripts and SQL, there's a Python processor, those types of things. --> What I really like is the collect Kubernetes pod metrics and stuff like that. --> So for the Toms of the world and others, --> you may want to look at MiNiFi C++ and deploy an instance of that --> for some of the monitoring of your system. --> I know that some of the others in the organization may use it --> for downrange sensor collection. --> That's what they were looking at it for. --> That's another great use case. --> So just keep that in mind. --> There's a lot of documentation about it and some of those things. --> The one that most people use is the MiNiFi Java version, --> because it does have most of the processors --> that you have in a NiFi instance. --> It also supports any kind of custom processors; --> you can include them. --> The only processors you really can't use --> are some of the specialized processors... --> Like, it doesn't bundle well with some of the cloud services, --> those types of things, but for the most part, you have that service. --> The nice thing about MiNiFi as well is you still get that full data governance.
--> You still get data delivery guarantees, --> all of these different things that NiFi provides, --> you get with MiNiFi as well. --> So when you start thinking about some of the future use cases --> for future scenarios in MiNiFi, --> or hearing others in the organization talk about MiNiFi being used --> to collect data on the edge and sort and filter --> and apply machine learning or AI algorithms to that, --> you'll see why. --> But, yeah, I want to make sure I touched on MiNiFi. --> A couple of other things. --> I put another scenario in your uploads. --> So you should go to Uploads, and there should be a scenario two. --> I don't think we're going to get time to do it today, --> but I will send it out as well if you want to try this one. --> It's, I think, not that hard. --> There aren't a lot of controller services, things like that. --> So if you want to do this scenario, have at it; --> your machines will be around for the next couple of days. --> And if you are working on this scenario and you get hung up, --> just send me an email with where you're hung up, --> and I can still log in and take a look. --> But this is just homework, or extra work if you want it. --> So: you're working at a mid-size retail company, --> and you're going to create a data flow. --> I gave you the CSVs and stuff like that, --> but it's a little bit easier. --> You don't have to deal with different formats. --> You do have to deal with combining CSVs --> and things like that. --> But there is a simple reporting mechanism --> where you can use an ExecuteScript. --> The main thing here is to try to get you --> more familiar with the Expression Language. --> So the scenario is there. --> It's not mandatory. --> But if you just want to take it a step further, --> have at it.
--> So you should see that. --> You should see three CSV files that go along with it. --> And you're able to just do that on your own time. --> I'm going to clean all the scenarios up. --> I am going to also include, like, the export of my flows. --> So, remember the flow we did on the first one, the ConvertRecord one? --> I'm going to include that. --> I'm going to try to save my second one that uses that as well. --> So that way, if you want, you can import my flow --> and go through it and reference it, --> copy from it, whatever. --> If it provides any use to you, please use it. --> If not, delete it and it will go away. --> So I'll try to include that as well. --> Let's see here. --> What else? --> Oh, there should have been a survey sent to you all. --> Again, I get paid either way, --> but I get paid a little quicker if the surveys are done quicker. --> So I ask that, if you can, complete the survey --> in a timely fashion; it just helps me out a little bit, --> and it gets it over to the company coordinating this training --> to get those survey results in and those types of things. --> So if you can work on that. --> I don't think you want this option, but Maria asked me to let you all know --> that these machines will be available, --> and if you want to purchase some additional time, you can. --> Those types of things. --> But you can definitely use these machines for the next day or two --> and play around with your data flows. --> Maybe you set up some data flows to go download things or whatever; --> you'll have them for another day at least. --> You should get an email that it will be shutting down, --> and then they'll tell you it's deleted or something. --> So that's there in case you want to reference any of your work. --> There's Pastebin.
--> There are some scratch pads and stuff like that --> if you just want to copy and paste from your machine. --> If not, feel free to log in with, like, a Gmail or something --> and email yourself all this information. --> This machine will get deleted after today. --> I won't be in the machine unless you text me or email me that you have a question. --> So there are many ways to get the data off --> in case you wanted to capture and keep some of this. --> But I'm going to include not only the scenarios but my flows, --> also the original JSON that the first class broke apart, --> as well as the new JSON that we worked on. --> So just keep that in mind. --> And with all of that being said, any questions? --> What I like to do now, if there aren't any other questions, --> is just kind of go around the room, --> call on folks, and do a little test just to make sure --> we all have our NiFi knowledge --> and things like that squared away. --> So before we go into that, anyone have a question that I can answer --> before we go around the room here? --> Did you ever put out your email? --> I don't think I've gotten your email address, --> unless it was in some sort of email I may have missed. --> I did. --> It was actually on the first slide. --> All right. --> Like, I put it out, but you're going to get it. --> And that phone number is my personal cell phone --> that I've got in my hand right now in front of my laptop. --> And I already get, like, 20 phone calls a day from spam, --> so please don't sign me up for spam, --> but if you need anything, you can text or call me as well. --> But that's the email that I use. --> And you're going to get, like I said, --> you're going to get a copy of this as well that will have --> my cowboy picture here. --> I do run cattle. --> I have a ranch. --> I live here in Texas. --> I love Herefords.
--> On my breaks, I run outside to check on the animals --> or listen to my wife yell at them. --> But yeah, you'll get all this information as well. --> Great, thank you. --> Yep, yep. --> Any other questions? --> Okay. --> Well, that being said, then, --> I'm going to just randomly pick on people --> and kind of go around the room. --> I have a test set up where it's multiple choice, --> but I'm not going to give you the choices, --> unless you need some help, and then I'll give you the choices. --> For the first one, --> the first person I have here is Ekta. --> Ekta. --> What is Apache NiFi primarily used for? --> Processing data, moving data around. --> Basically moving data --> from different types of sources and systems, --> in a multi-tenant environment, if there is one. --> Yeah, it's data flow automation, right? --> Automating those data flows, --> the orchestration of those data flows. --> So you nailed it. --> The beauty is it's using flow-based programming. --> We can build a data flow and run that data flow, --> and now you have that automated flow going. --> So great answer. --> Peter, can you name me a core component of NiFi? --> What do you mean by component? --> So NiFi has many components. --> What would be a main component that you would use in NiFi? --> Like the processor. --> Bingo. --> Processor, right? --> The processor is a main component. --> A controller service is a main component. --> Your connection is a main component. --> So something like that. --> You said processor. --> You got it. --> It's hard to phrase the question, because it almost gives you the answer. --> Like, oh, what do you mean, component? --> Oh, processor. --> Okay. --> Oh, there it is. --> Odair, I think they call you. --> What is a flow file in NiFi? --> I would say it's the file that the processor works with.
--> It can take it in and do something with it, --> and then it'll usually output a flow file when it's done. --> Yeah. --> Yeah. --> You got it, right? --> A flow file is a data record within NiFi. --> That data record is a piece of data. --> It could be a ZIP file, a CSV file, JSON, you name it, right? --> But it's a data record within NiFi. --> You got it. --> Let's see. --> Leroy, if I wanted to fetch data from an HTTP source, --> what processor do you think I would use? --> Let me look at one real quick. --> So if I had to get data from HTTP, what processor do you think? --> There is one called GetHTTP. --> Nailed it. --> Perfect. --> You got it. --> I mean, this class would take two weeks, --> and I would lose my voice very quickly, --> if we had to go through every single processor. --> But these processors are kind of set up for self-discovery, right? --> You just demonstrated that: --> I bet there's a GetHTTP, and there was. --> Good answer. --> All right. --> Tom, what does the SplitText processor do in NiFi? --> It separates the attributes into their own files. --> Yeah, so it splits a text file into multiple smaller files. --> Not attributes, but definitely something like a 100-line CSV, right? --> You feed it in, it splits it out into individual smaller files, --> and they can be one line per file. --> But you gave me a great answer on that one. --> I knew what you meant. --> Thank you. --> Yeah, we'll go through the rest of these questions hopefully pretty quickly. --> How is provenance data used in NiFi, Richard? --> You gave me the hard one. --> I was trying to help you. --> You're the CDO, right? --> Well, can I hear the question again? --> I want to make sure I understand it. --> How is provenance data used in NiFi? --> What is the purpose of the provenance data?
--> Yeah, I'm struggling to answer that one. --> No worries. --> I'm going to phone a friend. --> Oh, well, I was going to answer it, but phone a friend. --> Yeah, no, actually, that's fine. --> You can answer it. --> Oh, you're phoning me. --> So the provenance data in NiFi is used to track the data flow --> and modifications to data. --> If you can remember, we are keeping track of everything that touches that data. --> So if you're doing an ExtractText, we're keeping track of that. --> We're keeping track of the state of that change. --> We're keeping track of the data itself and what changed. --> That way we can compute that lineage and go back and replay it. --> You can actually replay a whole data flow to see how your 100-line CSV --> got to a one-line text file. --> So the answer is: to track data flow and modifications to data. --> No worries, you'll get the next one. --> I've got about two or three questions per person. --> So anyway, which feature in NiFi can be used for scheduling when a processor runs? --> Ekta, you've got your screen up. --> Can you show me how you would schedule when a processor runs? --> I don't know if we went over that, did we? --> We did. --> So if you can, go into your process group. --> No, it's not Registry. --> Yeah, go into your process group, --> bring up a processor, and go to the properties of one of those. --> There you go. --> Configure, and go to Scheduling. --> Remember, we can change how things are scheduled here. --> We can do a timer, and if you go down to Scheduling Strategy, --> you should have two options. --> There you go. --> Well, three, actually: timer, event, or cron. --> So an event can kick off a processor. --> A timer can kick off a processor, or cron, --> meaning at a certain time every day.
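As an aside on the cron strategy just mentioned: NiFi's cron scheduling uses Quartz-style expressions with six required fields (seconds, minutes, hours, day of month, month, day of week). The sketch below is not NiFi code, just a simplified matcher to illustrate how such an expression picks out run times; it only handles `*`, `?`, and plain numbers.

```python
from datetime import datetime

def cron_matches(expr: str, when: datetime) -> bool:
    """Check a datetime against a simplified Quartz-style cron expression.

    Only supports '*' (any value), '?' (no specific value), and plain
    integers. Field order: seconds, minutes, hours, day-of-month,
    month, day-of-week.
    """
    fields = expr.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields: sec min hour dom month dow")
    actual = [when.second, when.minute, when.hour,
              when.day, when.month,
              when.isoweekday() % 7 + 1]  # Quartz day-of-week: 1 = Sunday
    for field, value in zip(fields, actual):
        if field in ("*", "?"):
            continue
        if int(field) != value:
            return False
    return True

# "0 0 13 * * ?" fires once a day, at 13:00:00.
print(cron_matches("0 0 13 * * ?", datetime(2024, 5, 22, 13, 0, 0)))  # True
print(cron_matches("0 0 13 * * ?", datetime(2024, 5, 22, 14, 0, 0)))  # False
```

A real cron parser also handles ranges, steps, and lists (`0 0 8-17 * * MON-FRI`), which NiFi's cron strategy accepts as well.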
--> So that's how you can use the scheduling strategies to run a processor. --> Just hit Cancel, --> so that it doesn't save. --> Okay. --> All right. --> Well, thank you, Ekta. --> Let's see. --> Peter. --> I've mentioned this a few times. --> What is back pressure? --> Back pressure. --> I don't remember the term, but it sounds like when the queue is getting filled up --> and it can't move on past a certain point. --> Yeah, it's a way to control system overload by throttling the incoming data, --> based upon what you were just talking about. --> So if the queue is backing up, and it's configured to back up to --> 10,000 files, for instance, right, --> and we chatted about that a couple of times, --> back pressure is going to slow things down. --> The data is still being ingested. --> The data is still running, up until that point where you're queued up to 10,000 files. --> But because of back pressure, that flow doesn't stop. --> It just slows down a little bit. --> Is that something that it does automatically, or are we able to control it? --> You are able to control it, but it does do it automatically. --> Okay. --> You're able to adjust settings and stuff like that for back pressure --> in the config file, --> and some of the scheduling and first-in, first-out settings --> all kind of go into some of that back pressure behavior. --> But yeah, it's something that happens automatically, --> and you're also able to configure it a bit. --> Okay. --> Let's see. --> Oh, there it is. --> If a processor fails to process a flow file and needs to be retried, --> how would you do that? --> So can you actually bring up, let me bring your screen up so we can all see it. --> If you can just go into one of your processors, how would you do a retry? --> He put in the chat that he had to step away for a minute, so maybe he's not back yet.
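To recap the back-pressure idea discussed above: each NiFi connection has an object-count threshold (10,000 flow files by default) and a data-size threshold, and while a downstream queue is at or over its threshold, the upstream processor is simply not scheduled to run, so the flow slows down rather than losing data. The snippet below is just a toy Python model of that throttling idea, not NiFi internals.

```python
class Connection:
    """Toy model of a NiFi connection queue with an object-count threshold."""

    def __init__(self, backpressure_object_threshold: int = 10_000):
        self.queue = []
        self.threshold = backpressure_object_threshold

    def backpressure_applied(self) -> bool:
        # Once the queue reaches the threshold, the upstream processor
        # stops being scheduled, so the flow slows down instead of failing.
        return len(self.queue) >= self.threshold

    def offer(self, flowfile) -> bool:
        """Upstream side: only enqueue while back pressure is not applied."""
        if self.backpressure_applied():
            return False  # upstream waits and retries later; data is not lost
        self.queue.append(flowfile)
        return True

# With a threshold of 3, the fourth and fifth offers are throttled.
conn = Connection(backpressure_object_threshold=3)
accepted = [conn.offer(f"flowfile-{i}") for i in range(5)]
print(accepted)  # [True, True, True, False, False]
```

In real NiFi, the thresholds are set per connection in the connection's configuration dialog, and the defaults come from `nifi.properties`.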
--> Oh, I didn't see the chat. --> Thank you very much. --> No worries. --> Tom, since you're here, how would you retry on a processor? --> If you can, pull up your canvas and show me how you would do a retry on a processor. --> It might be better to do it on a processor that has failed. --> I would do it on something that has an error state. --> So let's look at a processor that, oh, convert CSV to JSON. --> How would you do a retry on that? --> So let's just go on that CSV-to-JSON one, and you see the arrow in the middle? --> The connection in the middle. --> When you hover over it, it's the little arrow that's trying to create a connection. --> Keep going to the right. --> You don't have to click. --> Go to the middle of the processor. --> You see the arrow there, where it's trying to make a connection? --> Drag that arrow back onto the processor itself. --> Bingo. --> Right there. --> You had it, but it didn't take. --> Go back a little bit. --> There you go. --> Let go. --> And if you have a retry relationship, you check that box. --> You can actually do a success as well. --> You can send success back to yourself, but I would not recommend that, --> because it'd be a loop. --> Then hit Cancel. --> So you can drag a connection back to yourself, --> and when you do that, it will retry to process that data. --> We noticed those in a couple of processors. --> I forgot which processor. --> It's only limited to a subset of processors. --> But I think one of the ones that we worked on in the previous scenario had a retry. --> I noticed multiple people used it, so it had to retry. --> If you have the retry option, and I think it was the unzipping one, --> you can actually drag that arrow back to yourself and do a retry. --> So maybe a processor will penalize a file. --> You have some sort of penalty set up, --> and it's a timing thing, and you've got a lot of back pressure.
--> You can tell it to retry, right? --> And then you can tell it, well, I want you to retry 10 times, --> and after that, fail. --> So you do have a retry. --> See? --> So that's how you would do a retry. --> Good job. --> Oh, gotcha. --> Okay, gotcha. --> Yeah, you're right. --> It's on the ZIP file processor. --> Not all of them have it. --> But yeah, you can drag it to yourself and retry it. --> So good job. --> All right. --> Let's see. --> Who else do we have here? --> Odair stepped away, --> so I skipped Leroy. --> Leroy, I'll throw you a softball. --> NiFi supports data ingestion and data extraction. --> True or false? --> Bingo. --> I'm just reading off my pre-built test. --> All right. --> Tom, I've got you. --> Richard, I'll throw you a softball. --> Flow files in NiFi can only contain text data. --> Richard, I had to step out. --> No, I'm kidding. --> That is false. --> That is false. --> I can actually get everybody's voice on this one. --> I can put names to voices now, so it's going to be hard to hide. --> Let's see. --> Ekta, last question for you. --> In NiFi, every processor must be connected to another processor. --> True or false? --> False. --> It can terminate on its own, right? --> It can, --> if you configure it. --> Nice. --> That's a tricky one, because most people will say, no, it has to go to another processor. --> No, you can have a single processor doing something and then terminate. --> So you got it. --> Okay. --> Let's see. --> Let me ask you this. --> How often is that actually done, though? --> Oh, it is extremely rare, right? --> I would think so, yeah. --> I mean, if I wanted to, I guess I could use a GetFile. --> If I could terminate a GetFile within itself, --> I could use that to basically create a provenance event on all the files in a directory. --> And that's it, right? --> Because it's going to pick that file up, and then it's just going to terminate.
--> So because of that pickup, I'm going to see that provenance event. --> That's about the only reason I can think of. --> It rarely happens, but you can have a single-processor flow. --> Let's see here. --> Let's go with Leroy. --> What is the purpose of Registry? --> It's like the version control, and kind of also permissions on there as well, right? --> Yeah. --> It is the version control for your flows, --> but it can be backed by a real version control system. --> I say real, like GitHub, GitLab, those types of things. --> So you got it. --> Tom, because this one may come up for you: --> what is MiNiFi? --> We covered that a little bit. --> It's a mini-me of NiFi. --> It's a mini-me. --> It's a Java application you can use. --> Man, I think I came in right when you were talking about it. --> You went over that right after the break, didn't you? --> Yeah, I did. --> I came in right in the middle of that, because I had stepped away as well. --> So do you have an easier one? --> Well, what I'll do is I'll just tell you the answer. --> So MiNiFi comes in two flavors, Java and C++. --> The Java version is basically NiFi without the UI. --> Underneath, it's the same engine. --> It's exactly the same engine. --> And that's how MiNiFi was used. --> Back in the day, for edge cases like the Internet of Things, --> the Internet of Battlefield Things, those types of things, --> we would use NiFi, but we would just take away the UI --> and have it run the flow over and over. --> And so that's how MiNiFi was born. --> MiNiFi is basically a stripped-down NiFi instance. --> It's smaller, more agile, leaner, easier to run on smaller devices, --> but it doesn't have a UI. --> It's all command line. --> And you would use NiFi and Registry to develop the flow that goes to MiNiFi.
--> The Java version of MiNiFi can also use most processors that NiFi can, --> but the C++ version, which is even smaller and faster, --> ships with about 20 different processors. --> So that is MiNiFi. --> Well, I mean, I went over it pretty quickly. --> And I'll only bring it up --> because the other class was heavily interested, --> and I don't know what everyone here's relationship is with each other, --> as well as with the other class. --> So I'll just bring it up in case it comes up in a meeting, --> and you're like, oh, I know what MiNiFi is. --> And it's really cool that you all are using it. --> So, okay. --> I think I went around the room enough. --> I think everyone got all the concepts and terminology --> and those types of things. --> We covered a ton over the last three days. --> I do appreciate all the activity and the questions --> and the answers and the participation. --> You made it a little less boring than some of the classes. --> I don't like to talk that much, --> so asking a lot of questions helps a lot. --> But I do appreciate it. --> If you can get those surveys complete, --> I will work on getting these slides updated --> and pushed out to everyone. --> If you don't have it, here is my contact information. --> Feel free to write it down. --> I would post it in the Teams chat, but I can't, --> because I don't have permission to use chat. --> So that's my contact info. --> If you have any questions, feel free to reach out. --> Like I said, I'll be happy to help. --> Send me an email with your question, --> and I'll answer it as soon as I possibly can. --> But does anybody have any closing questions that I can answer? --> Not for now. --> Yes, you said, did you say you had a sample flow --> that we could import for this new scenario, too? --> For the new scenario, I do not have a flow for it, --> but I do have flows for the others.
--> However, I will quickly put a flow together, --> so you can import it. --> That's easy. --> The scenario is not that bad, so I can... --> No, it didn't look too bad. --> It looks like we could even take pieces of stuff we've already done. --> Exactly. --> I try to make it so you can reuse a lot of components. --> Okay, good question. --> And I'll send that out, a flow for that. --> The scenario also says to... --> Let me pull it up here. --> You've got to get the files from separate directories. --> So are we supposed to put those files in separate directories? --> Yeah, if you can, just try to put them in separate directories. --> The goal here: we were able to get all files from a directory. --> We were able to get all files from a directory --> and sort them by file type. --> Now, let's see if we can get files from three directories, right, --> instead of just one. --> Because it kind of exercises that thought process --> of how am I going to get all of these files --> with the least amount of processors needed. --> But yeah, split them up, please. --> That was going to be my next question. --> I suppose you're leaving this up to us to figure out, --> but I suppose you could get all three from separate directories --> with one processor? --> Well, if it's recursive, absolutely, right? --> If you were listening to the parent folder --> and it was scanning the children, --> and you had three different folders, absolutely. --> There are also, I think, some other ways to do this, --> where you could potentially do a directory listing, --> because there is a ListFile processor, --> and then you can pick and choose which directory you want to pull from. --> So there's an easy way, there's a slightly more difficult way, --> and then you can write a script or something else --> and make it even harder. --> So it's up to you. --> All right, any other questions? --> Okay.
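The three-directory scenario above boils down to this: one recursive listing of the parent folder covers every child folder, so a single GetFile or ListFile processor pointed at the parent (with its recurse-subdirectories option enabled) can pick up files from all three. As a plain-Python illustration of that idea (not NiFi itself; the directory names here are made up for the example), a recursive walk from the parent finds the files in all three subdirectories:

```python
import pathlib
import tempfile

# Build three sibling input directories under one parent, as in the scenario.
# (The names input_a/input_b/input_c are invented for this sketch.)
parent = pathlib.Path(tempfile.mkdtemp())
for name in ("input_a", "input_b", "input_c"):
    sub = parent / name
    sub.mkdir()
    (sub / f"{name}.csv").write_text("id,value\n1,hello\n")

# A single recursive listing of the parent covers all three child
# directories -- the same idea as one GetFile/ListFile processor
# watching the parent with recursion turned on.
found = sorted(p.name for p in parent.rglob("*.csv"))
print(found)  # ['input_a.csv', 'input_b.csv', 'input_c.csv']
```

The harder variants the instructor mentions (listing first, then picking which directories to fetch from) correspond to filtering that listing before fetching, rather than ingesting everything the walk returns.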
--> Well, if there are no other questions, --> I'll give you a little bit of your time back early. --> I know we went over yesterday. --> Feel free to reach out. --> You've got my email, you've got my phone number. --> I will update these slides and send them out to you. --> I have everyone's email, so I'll just send them out to everyone. --> And please do your surveys. --> And if that's it, have a wonderful holiday weekend. --> And if you need anything, let me know. --> Thanks, guys. --> Thank you. --> Thank you.