better iceberg materialized views in galaxy (no staleness check)

i’m happy to report that some code changes were made since my last post on materialized views in starburst galaxy and the (mostly useless) “staleness check” is no longer executed

starburst galaxy’s materialized views (using apache iceberg)

join me on a quick test drive of materialized views in starburst galaxy (saas offering powered by trino), which use apache iceberg for persistence and offer some pretty cool capabilities around snapshots and awareness of stale data

updated streaming supervision features scorecard (added flink)

added apache flink to the comparison grid of kafka streams, spark streaming, and storm, which focuses on the features they offer the operations side of the devops formula; flink measures up well

batch as a “special case” of flink streaming (yes, now we’re mv’ing streaming back to batch)

the third part of a loosely coupled trilogy on flink batch and streaming that takes us full-circle with the collapse of the DataSet API into the DataStream API; i’m not sure Run-D.M.C. could make this less tricky

big data apis look a lot alike (code comparison with flink, kafka, spark, trident and pig)

exploring the similarity of the APIs from flink, kafka streams, spark (RDDs & DFs), storm’s trident, and yes, even good old pig, by implementing the canonical word count solution with each framework

joining spark dataframes with identical column names (not just in the join condition)

a quick walkthru of spark sql dataframe code showing joining scenarios when both tables have columns with the same name; this includes when they are used in the join condition as well as when they are not