after a recent class i was asked what skills someone needs to become a data engineer – there are plenty of these lists all over the internet, yet here i go assuming i know enough to jot down yet another; at least i put mine all in a single picture 😉
Tag Archives: java
batch as a “special case” of flink streaming (yes, now we’re mv’ing streaming back to batch)
the third part of a loosely coupled trilogy on flink batch and streaming that takes us full-circle with the collapse of the DataSet API into the DataStream API – i’m not sure Run-D.M.C. could make this less tricky
mv’ing batch flink to streaming (easy breezy)
building on a prior post, this tutorial ports a simple flink batch program to become a streaming solution – put lakeside on the turntable and let’s finish up the fantastic voyage
hello world with flink (from scratch)
come along and ride on a fantastic voyage where we will set up an apache flink environment, code up a very simple job, then execute it & verify our results – we’ll just slide, glide, slippity-slide
big data apis look a lot alike (code comparison with flink, kafka, spark, trident and pig)
exploring the similarity of the APIs from flink, kafka streams, spark (RDDs & DFs), storm’s trident and yes, even good old pig by implementing the canonical word count solution with each framework
viewing the content of ORC files (using the Java ORC tool jar)
a quick tutorial about finding and using the orc java tool jar for peering into the contents of the otherwise not human-readable orc file format