a prior post tackled this same quest of understanding how trino decides how many splits to use in a query with the hive table format — it ended with the question of how iceberg tackles the same problem, which is answered in this post
Tag Archives: apache
starburst galaxy’s materialized views (using apache iceberg)
join me on a quick test drive of materialized views in starburst galaxy (the saas offering powered by trino), which use apache iceberg for persistence and offer some pretty cool capabilities around snapshots and awareness of stale data
querying starburst / trino from apache superset (in 7 steps)
title says it all 😉
hive, trino & spark features (their journeys to sql, performance & durability)
different big data sql engines are created to fill a gap left by existing ones, but sooner or later they all start looking like each other in their lists of features and observable behaviors
wrapping up my 8 year hortonworks – cloudera adventure (best job ever)
what an amazing eight years at hortonworks/cloudera — the technology, the focus, the use cases, the domains, the FUN and most importantly, the PEOPLE, made this the best job of my entire career and make it super hard to say goodbye to this role
updated streaming supervision features scorecard (added flink)
added apache flink to the comparison grid of kafka streams, spark streaming, and storm focused on the features they offer the operations side of the devops formula — it measures up well
batch as a “special case” of flink streaming (yes, now we’re mv’ing streaming back to batch)
the third part of a loosely coupled trilogy on flink batch and streaming that takes us full circle with the collapse of the DataSet API into the DataStream API — i’m not sure Run-D.M.C. could make this less tricky
mv’ing batch flink to streaming (easy breezy)
building on a prior post, this tutorial ports a simple flink batch program to become a streaming solution – put lakeside on the turntable and let’s finish up the fantastic voyage
hello world with flink (from scratch)
come along and ride on a fantastic voyage where we will set up an apache flink environment, code up a very simple job, and execute it & verify our results — we’ll just slide, glide, slippity-slide
big data apis look a lot alike (code comparison with flink, kafka, spark, trident and pig)
exploring the similarity of the APIs from flink, kafka streams, spark (RDDs & DFs), storm’s trident and yes, even good old pig by implementing the canonical word count solution with each framework
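the canonical word count that the post implements in each of those frameworks can be sketched in plain python as a baseline for comparison — an illustrative sketch only, not code taken from any of the posts or frameworks:

```python
# illustrative baseline: the classic word count that the post
# re-implements in flink, kafka streams, spark, trident, and pig
from collections import Counter

def word_count(lines):
    """tokenize each line on whitespace, lowercase, and tally occurrences."""
    counts = Counter()
    for line in lines:
        for word in line.lower().split():
            counts[word] += 1
    return dict(counts)

# the big data apis express the same pipeline as tokenize -> group -> count
print(word_count(["to be or not to be"]))
```

each framework in the comparison expresses the same three steps (tokenize, group by word, count) with its own operators, which is what makes their apis look so much alike.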