Develop, deploy, execute & monitor in one tool (welcome to Apache NiFi)

Backdrop

If you haven’t heard of Apache NiFi, then you’ve been living under a rock for far too long. And yet, for an 18-year-old framework (an ASF project since 2014) operating in nearly 10,000 organizations, it is surprisingly often the case that someone has NOT heard about NiFi. The docs answer the What is NiFi? question as…

NiFi was built to automate the flow of data between systems. While the term ‘dataflow’ is used in a variety of contexts, we use it here to mean the automated and managed flow of information between systems.

Yes, a ‘dataflow’ system! Moving data from one place to another. I affectionately call it slurp & burp because I’m goofy, but there is nothing simple or primitive about NiFi. There are tons of processors for reading data from, and writing data to, a wide variety of systems and data formats. You can also do some pretty sophisticated transformations, enrichments, cleansing, and more along the way.

Here’s an example of the ‘art of the possible’ — an unstructured document ETL pipeline for GenAI applications.

<disclaimer>
    I'm a NiFi Developer Advocate at Datavolo (https://datavolo.io).
</disclaimer>

With that <disclaimer /> out of the way, I can tell you about a nice tutorial aimed at first-time NiFi developers: Build a simple NiFi flow. Ok, enough backdrop on Apache NiFi.

Many SDLC phases in one tool

The purpose of this blog post is to point out an interesting feature that makes NiFi pretty unique: many of the Software Development Life Cycle (SDLC) phases are carried out seamlessly in the same visual low-code environment, simply referred to as the NiFi UI.

Development

Most folks who see NiFi’s UI for the first time recognize the drag-and-drop, configure, and connect visual paradigm that many tools offer.

Nothing all that unique here.

Compile & build

Well, there isn’t such a thing in NiFi. Once you add something to the UI’s canvas, it is ready to go.

NOTE: You can create custom processors with Java and/or Python that do follow the more typical SDLC phases. For example, see Build a NiFi Python transform processor.
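To give you a feel for what that looks like, here is a minimal sketch of a NiFi 2.x Python processor built on the FlowFileTransform extension point. The UppercaseText class name and the trivial transformation are just illustrative; see the tutorial above for the real walkthrough.

    # Minimal NiFi 2.x Python processor sketch (illustrative only)
    from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult

    class UppercaseText(FlowFileTransform):
        class Java:
            implements = ['org.apache.nifi.python.processor.FlowFileTransform']

        class ProcessorDetails:
            version = '2.0.0'
            description = 'Uppercases the textual content of each FlowFile.'

        def __init__(self, **kwargs):
            super().__init__()

        def transform(self, context, flowfile):
            # read the incoming FlowFile's content, transform it, and
            # route the result to the 'success' relationship
            text = flowfile.getContentsAsBytes().decode('utf-8')
            return FlowFileTransformResult(relationship='success',
                                           contents=text.upper())

Drop the .py file into NiFi’s python/extensions directory and the processor shows up in the UI like any other; still no compile or build step in sight.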

Deploy

Much like there isn’t any compile/build step, there isn’t a separate deploy step either: you have already deployed your dataflow as you built it, component by component and connection by connection.

NOTE: You CAN move code across different NiFi runtimes such as dev/test/prod. For example, see Versioning NiFi flows with GitHub.
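As a taste of how that promotion across environments can be scripted, here is a hedged sketch that exports a process group’s flow definition as JSON via NiFi’s REST API. The localhost URL and the process group id are placeholders, and a secured instance would additionally require an access token.

    # Sketch: export a process group's flow definition as JSON
    # (assumes an unsecured local dev instance; the id is a placeholder)
    import requests

    NIFI_API = 'http://localhost:8080/nifi-api'
    PG_ID = 'replace-with-your-process-group-id'

    resp = requests.get(f'{NIFI_API}/process-groups/{PG_ID}/download')
    resp.raise_for_status()

    # the resulting JSON can be committed to Git and later imported
    # into another NiFi runtime (dev/test/prod)
    with open('my-flow.json', 'wb') as f:
        f.write(resp.content)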

Runtime

The execution environment, or runtime, is ALSO this SAME environment. Every single component, fine-grained or coarse-grained, can be started or stopped from the UI itself.
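Anything you can do with the start/stop buttons in the UI can also be scripted against the same REST API the UI itself uses. A minimal sketch, under the same unsecured-local-instance assumption as above, that starts and then stops every component in a process group:

    # Sketch: start, then stop, all components in a process group
    import time
    import requests

    NIFI_API = 'http://localhost:8080/nifi-api'
    PG_ID = 'replace-with-your-process-group-id'

    def schedule(state):
        # state is 'RUNNING' or 'STOPPED'
        resp = requests.put(f'{NIFI_API}/flow/process-groups/{PG_ID}',
                            json={'id': PG_ID, 'state': state})
        resp.raise_for_status()

    schedule('RUNNING')
    time.sleep(60)  # let the flow do some work
    schedule('STOPPED')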

Monitoring & reporting

Even at the most fine-grained level, that of an individual processor, statistics are available.
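Those same statistics are also exposed programmatically. A sketch that reads a single processor’s rolled-up five-minute stats (placeholder id, same unsecured local instance assumed):

    # Sketch: read a processor's rolled-up five-minute statistics
    import requests

    NIFI_API = 'http://localhost:8080/nifi-api'
    PROC_ID = 'replace-with-your-processor-id'

    status = requests.get(f'{NIFI_API}/flow/processors/{PROC_ID}/status').json()
    snapshot = status['processorStatus']['aggregateSnapshot']
    print('in:', snapshot['input'],
          'out:', snapshot['output'],
          'tasks:', snapshot['tasks'])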

Tracing & debugging

The ‘queue’ above, with one item representing a total of 37.27 MB of data, can be drilled down into for more information.

You can even peer into the data itself as it traverses the dataflow.

Bulletins give you a peek into the logs for debugging, all in the same tool you used to define the flow in the first place.
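Those bulletins are queryable outside the UI too. A sketch that tails the bulletin board, under the same assumptions as the earlier snippets:

    # Sketch: pull recent bulletins (log excerpts) from the bulletin board
    import requests

    NIFI_API = 'http://localhost:8080/nifi-api'

    board = requests.get(f'{NIFI_API}/flow/bulletin-board',
                         params={'limit': 10}).json()
    for entry in board['bulletinBoard']['bulletins']:
        b = entry.get('bulletin', {})
        print(b.get('level'), b.get('sourceName'), b.get('message'))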

Try it out for yourself

This post’s goal was to give you an overview of how NiFi is quite unique in collapsing so many SDLC phases into one tool. It didn’t give you a chance to try it all out yourself, so I encourage you to visit the hands-on tutorials I’ve published on the Datavolo DevCenter, as well as some additional resources that can help on your journey.

Happy dataflowing!

