Basics of Apache NiFi: 2

In our previous video on the basics of NiFi, we covered a brief definition of NiFi, how flows are built, and the different types of processors that can be used. We also stepped through a very basic flow that pulls lines from a file and pushes them into AWS. Next on our list is a transformation step that gives us a better handle on the data coming through.

Installation

Instructions for downloading and starting NiFi can be found here.
An alternative installation using Docker can be found here.
I use this Docker Compose file to start my personal NiFi instance.

Check out the video on our YouTube Channel!

Excerpt

This is a good start; however, all ETL processing needs to be prepared for malformed data. Rather than letting the destination sort that out, we can add a transformation step between our extraction and our loading.

FlowFiles that move through NiFi carry the data itself as the body and the attributes associated with it as a header. Attributes can be added at any point in the flow, and they persist for the rest of it. They can hold any data that describes where the FlowFile came from, how it was created, which FlowFile it may have been created from, or user-defined values used to route FlowFiles to specific destinations.
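
To make the body-plus-attributes idea concrete, here is a minimal Python sketch. It is not the NiFi API: the FlowFile class, the tag_validity and route functions, the "record.valid" attribute name, and the three-field comma-separated layout are all hypothetical, chosen only to illustrate how a transformation step might tag malformed lines so that a later step can route them.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FlowFile:
    """Toy stand-in for a flowfile: a body plus header-style attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


def tag_validity(flowfile: FlowFile, expected_fields: int = 3) -> FlowFile:
    """Transformation step: mark a comma-separated line as valid or malformed.

    The field count and the "record.valid" attribute name are assumptions
    made for this sketch.
    """
    fields = flowfile.content.decode("utf-8").strip().split(",")
    ok = len(fields) == expected_fields and all(f.strip() for f in fields)
    # Attributes set here stay with the flowfile for the rest of the flow.
    flowfile.attributes["record.valid"] = "true" if ok else "false"
    return flowfile


def route(flowfile: FlowFile) -> str:
    """Pick a destination name based on a user-defined attribute."""
    if flowfile.attributes.get("record.valid") == "true":
        return "load"
    return "quarantine"


# One well-formed line and one with empty fields, both from the same source file.
good = tag_validity(FlowFile(b"1,alice,42\n", {"filename": "input.txt"}))
bad = tag_validity(FlowFile(b"2,,\n", {"filename": "input.txt"}))

print(route(good), good.attributes)
print(route(bad), bad.attributes)
```

The body is never modified here; only the header grows. That mirrors the point above: the original "filename" attribute is still present after the validity tag is added, and the routing decision reads the attribute rather than re-parsing the content.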

Dirty Data Dancer, Spark Specialist, and an advocate of awesome AWS Applications. My experience ranges from Data Warehousing to Data Management and Data Exploration.