spark – Page 2 – Lester Martin (l11n)

viewing astronauts thru windows (more pystarburst examples)

i’ve got a fever and the only prescription is more pystarburst examples; this third installment is all about window functions via the dataframe api and, like before, i present sql first for comparison

pystarburst analytics examples (querying aviation data part deux)

i had so much fun publishing my first pystarburst post and running it in starburst galaxy that i wanted to share some more examples – i ported my aviation dataset analytical queries to python and the dataframe api

pystarburst (the dataframe api)

the dataframe api is finally available for trino and starburst galaxy thanks to the pystarburst libraries — take a peek at some example usages in this quick validation run

hive, trino & spark features (their journeys to sql, performance & durability)

each big data sql engine is created to address something the existing ones neglected, but sooner or later they all start looking alike in their feature lists and observable behaviors

updated streaming supervision features scorecard (added flink)

added apache flink to the comparison grid of kafka streams, spark streaming, and storm focused on the features they offer the operations side of the devops formula — it measures up well

big data apis look a lot alike (code comparison with flink, kafka, spark, trident and pig)

exploring the similarity of the APIs from flink, kafka streams, spark (RDDs & DFs), storm’s trident and yes, even good old pig by implementing the canonical word count solution with each framework

functional programming and big data (what a pair)

a high-level overview of how functional programming with immutable datasets is a great partner for big data processing frameworks, with code examples using spark rdds in scala

building a spark sql udf with scala (using multiple arguments)

a short & sweet code-focused tutorial on declaring a scala function as a spark sql udf that can be leveraged via the api approach or in a formal sql statement

joining spark dataframes with identical column names (not just in the join condition)

a quick walkthru of spark sql dataframe code showing joining scenarios when both tables have columns with the same name; this includes when they are used in the join condition as well as when they are not

topology supervision features of streaming frameworks (or lack thereof)

a smackdown of sorts pitting kafka streams, spark streaming, and storm against each other, not for the features they give developers, but for the features they offer the operations side of the devops formula