exploring the similarities among the APIs of flink, kafka streams, spark (RDDs & DFs), storm’s trident and, yes, even good old pig by implementing the canonical word count solution with each framework
Tag Archives: scala
functional programming and big data (what a pair)
a high-level overview of why functional programming with immutable datasets pairs so well with big data processing frameworks, with code examples using spark rdds in scala
building a spark sql udf with scala (using multiple arguments)
a short & sweet code-focused tutorial on declaring a scala function as a spark sql udf that can be used either via the api approach or in a formal sql statement
joining spark dataframes with identical column names (not just in the join condition)
a quick walkthrough of spark sql dataframe code covering join scenarios where both tables have columns with the same name, both when those columns are used in the join condition and when they are not