In our previous video on the basics of Nifi, we covered a brief definition of Nifi, how flows are built, and the different types of processors that can be used. We also stepped through a very basic flow that pulls lines from a file and pushes them into AWS. Next on our list is introducing a transformation step to get a better handle on the data coming through.
Installation
Instructions for downloading and starting Nifi can be found here.
An alternative installation using Docker can be found here.
I use this Docker-Compose file to start my personal Nifi instance.
Check out the video on our YouTube Channel!
Excerpt
This is a good start; however, all ETL processing needs to be prepared for malformed data. Rather than let the destination sort that out, we can add a transformation step between our extraction and our loading.
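To make the idea concrete outside of Nifi, here is a minimal Python sketch of a transformation step that sits between extraction and loading and sets malformed records aside. It assumes each extracted line is meant to be a JSON object, which is an assumption for illustration only, and the function name `split_valid_and_malformed` is hypothetical.

```python
import json


def split_valid_and_malformed(lines):
    """Separate lines that parse as JSON objects from those that do not."""
    valid, malformed = [], []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # The line is not valid JSON at all, so keep it aside for inspection.
            malformed.append(line)
            continue
        # Assumption: only JSON objects count as well-formed records.
        if isinstance(record, dict):
            valid.append(record)
        else:
            malformed.append(line)
    return valid, malformed


valid, malformed = split_valid_and_malformed(['{"id": 1}', "not json", "[1, 2]"])
```

Only the `valid` records would continue on to the loading step, while the `malformed` lines could be sent somewhere else for review instead of reaching the destination.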
Flowfiles that move through Nifi carry the data itself as the body and the attributes associated with it as a header. Attributes can be added at any point in the flow, and they persist for the rest of it. These attributes can be any data that describes where the flowfile came from, how it was created, what flowfile it may have been created from, or user-defined data used to route flowfiles to specific destinations.
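One way to picture this is as a small data structure: a body plus a dictionary of attributes. The Python sketch below is only a mental model, not Nifi's actual API; the `FlowFile` class, the helper functions, and the `destination` attribute name are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Toy model of a flowfile: the data as the body, attributes as a header."""
    content: bytes
    attributes: dict = field(default_factory=dict)


def with_attribute(flowfile, key, value):
    """Return a copy with one more attribute; earlier attributes persist."""
    return FlowFile(flowfile.content, {**flowfile.attributes, key: value})


def route(flowfile):
    """Pick a destination from a user-defined attribute (hypothetical name)."""
    return flowfile.attributes.get("destination", "unmatched")


# Start with an attribute describing where the data came from,
# then add a user-defined attribute later in the flow.
ff = FlowFile(b"line one\n", {"filename": "input.txt"})
ff = with_attribute(ff, "destination", "aws")
```

After the second step, the flowfile still carries its original `filename` attribute alongside the new `destination` attribute, and `route(ff)` uses the latter to decide where the flowfile goes.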