title says it all 😉
Category Archives: Big Data
eliminate rollup’s null confusion (hint: grouping keyword)
rollup functions, such as cube, identify the rolled-up totals by using null for the column they represent a total for – this gets rather confusing when the column itself has null values in the individual rows – this post will show you how to definitively differentiate between these two cases
querying aviation data in the cloud (leveraging starburst galaxy)
come along on a quick tutorial of loading some airline flight data into a cloud object store and performing some data analysis of it from the starburst galaxy sql engine in the sky
hive, trino & spark features (their journeys to sql, performance & durability)
each big data sql engine is created to address a particular gap in the focus of existing ones, but sooner or later they all start to look alike in their feature lists and observable behaviors
why i joined starburst (optionality and common sense)
i’m just so excited to be working at starburst, and i want to share why and encourage others to consider joining us as we grow, grow, grow
federated queries on starburst galaxy (long and short videos)
a long (and short) video of performing a federated join across s3, redshift, and mysql using trino-based starburst galaxy
querying starburst galaxy from tableau (super easy)
short post pointing to the youtube video i created showing how to use starburst galaxy to query data from tableau desktop
wrapping up my 8 year hortonworks – cloudera adventure (best job ever)
what an amazing eight years at hortonworks/cloudera — the technology, the focus, the use cases, the domains, the FUN and most importantly, the PEOPLE, made this the best job of my entire career and make it super hard to say goodbye to this role
updated streaming supervision features scorecard (added flink)
added apache flink to the comparison grid of kafka streams, spark streaming, and storm focused on the features they offer the operations side of the devops formula — it measures up well
batch as a “special case” of flink streaming (yes, now we’re mv’ing streaming back to batch)
the third part of a loosely coupled trilogy on flink batch and streaming that takes us full circle with the collapse of the DataSet API into the DataStream API; i’m not sure Run-D.M.C. could make this less tricky