joining spark dataframes with identical column names (not just in the join condition)

a quick walkthrough of spark sql dataframe code showing join scenarios where both tables have columns with the same name, covering both the case where those columns are used in the join condition and the case where they are not

hive’s merge statement (it drops a lot of acid)

hive’s merge command provides another option for acid transactions beyond insert, update and delete; this post walks you through a simple example and examines the base, delta and delete_delta files created in the underlying filesystem to support this standard sql command

hive acid transactions with partitions (a behind the scenes perspective)

let’s take a deeper look at what happens under the hood of hive during these “acid” operations such as insert, update and delete, including a look at the actual directories and orc files that get created

topology supervision features of streaming frameworks (or lack thereof)

a smackdown of sorts pitting kafka streams, spark streaming, and storm against each other, judged not on the features they give developers but on the features they offer the operations side of the devops formula