building on a prior post, this tutorial ports a simple flink batch program to a streaming solution; put lakeside on the turntable and let’s finish up the fantastic voyage
Tag Archives: open_source
hello world with flink (from scratch)
come along and ride on a fantastic voyage where we will set up an apache flink environment, code up a very simple job, then execute it & verify our results; we’ll just slide, glide, slippity-slide
big data apis look a lot alike (code comparison with flink, kafka, spark, trident and pig)
exploring the similarity of the APIs from flink, kafka streams, spark (RDDs & DFs), storm’s trident and yes, even good old pig by implementing the canonical word count solution with each framework
functional programming and big data (what a pair)
a high-level overview of how functional programming with immutable datasets is a great partner with big data processing frameworks — code examples with spark rdds using scala
building a spark sql udf with scala (using multiple arguments)
a short & sweet code-focused tutorial on declaring a scala function as a spark sql udf that can be used via the api approach or in a formal sql statement
joining spark dataframes with identical column names (not just in the join condition)
a quick walk-thru of spark sql dataframe code showing joining scenarios when both tables have columns with the same name; this covers columns used in the join condition as well as columns that are not
securing hive entities (ranger and atlas to the rescue)
video showing how to use ranger & atlas to create security policies on hive tables, columns and rows as well as implementing data masking and tag-based restrictions
hive’s merge statement (it drops a lot of acid)
hive’s merge command provides another option for acid transactions beyond insert, update and delete; this post walks you through a simple example and examines the base, delta and delete_delta files created on the underlying filesystem to support this standard sql command
hive delta file compaction (minor and major)
a quick walk-thru of how minor and major compactions occur for hive transactional tables, ensuring that all the delta files eventually roll into base ones
hive acid transactions with partitions (a behind the scenes perspective)
let’s take a deeper look at what happens under the hood of hive during “acid” activities such as insert, update and delete, including a look at the actual directories and orc files created