Big Data – Page 5 – Lester Martin (l11n)

building a sql-based data pipeline with trino & starburst (5 slick videos)

a collection of videos presented as an overview of how you could build a sql-based data transformation pipeline utilizing trino/starburst and automating it with dbt

better iceberg materialized views in galaxy (no staleness check)

i’m happy to report that, thanks to some code changes made since my last post on materialized views in starburst galaxy, the (mostly useless) “staleness check” is no longer being executed

determining # of splits w/trino/starburst/galaxy (iceberg table format)

a prior post tackled this same quest of understanding how trino decides how many splits to use in a query with the hive table format; it ended with the question of how iceberg tackles the same problem, which this post answers

delta lake in starburst galaxy (intro & integration)

delta lake is a popular data lake table format, and the trino engine (and starburst galaxy) integrate with it easily, all while using your favorite cloud provider’s object store thanks to galaxy’s great lakes connectivity

finally checking out chatgpt (adding a new tool in my toolbelt)

putting aside my (natural?) fear of artificial intelligence, i finally got around to exploring (testing?) chatgpt, which everyone has been talking about for many months now

determining # of splits w/trino/starburst/galaxy (hive table format)

ever wondered how trino decides how many splits to use in a query when reading files from your data lake? if so, come along and ride on a fantastic voyage
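to make the idea concrete, here is a simplified python model of carving data lake files into splits, where the first few splits use a smaller size and later ones use a larger size. it is a sketch under assumptions, not trino’s actual code: the parameter names loosely echo the hive connector’s hive.max-initial-splits, hive.max-initial-split-size and hive.max-split-size properties, and the default values used here (200 splits, 32MB, 64MB) are assumptions to verify against the connector documentation for your version. real split planning also weighs other factors (for example, compressed files that cannot be split at all).

```python
# Simplified model of splitting data lake files into splits.
# ASSUMPTION: this is NOT Trino's implementation; the default sizes are
# assumed values chosen to mirror commonly cited hive connector settings.
MB = 1024 * 1024


def plan_splits(file_sizes, max_initial_splits=200,
                max_initial_split_size=32 * MB, max_split_size=64 * MB):
    """Return (file_index, start_offset, length) tuples for splittable files.

    The first `max_initial_splits` splits (counted across all files) use the
    smaller initial size; every split after that uses `max_split_size`.
    """
    splits = []
    initial_remaining = max_initial_splits
    for file_index, size in enumerate(file_sizes):
        offset = 0
        while offset < size:
            if initial_remaining > 0:
                target = max_initial_split_size
                initial_remaining -= 1
            else:
                target = max_split_size
            # The last split of a file simply takes whatever bytes remain.
            length = min(target, size - offset)
            splits.append((file_index, offset, length))
            offset += length
    return splits


if __name__ == "__main__":
    splits = plan_splits([100 * MB] * 3)
    print(len(splits))  # 12: each 100MB file -> 32MB + 32MB + 32MB + 4MB
```

under these assumed defaults, three 100MB files produce 12 splits because all of them still fall within the initial-split budget; shrinking max_initial_splits shows how later splits switch to the larger size.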

starburst galaxy’s materialized views (using apache iceberg)

join me on a quick test drive of materialized views in starburst galaxy (a saas offering powered by trino), which use apache iceberg for persistence and offer some pretty cool features around snapshots and awareness of stale data
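the snapshot-based staleness idea can be sketched conceptually: record each base table’s snapshot id when the view is refreshed, then treat the view as stale if any base table’s current snapshot id differs. this python model is hypothetical (MaterializedView, refresh and is_stale are illustrative names, not a galaxy or trino api) and is not a description of how galaxy implements it internally.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class MaterializedView:
    """Hypothetical model of a materialized view and its refresh bookkeeping."""
    name: str
    base_tables: List[str]
    # Snapshot id of each base table at the time of the last refresh.
    snapshots_at_refresh: Dict[str, int] = field(default_factory=dict)


def refresh(mv: MaterializedView, current_snapshots: Dict[str, int]) -> None:
    """Record the base-table snapshot ids the view was recomputed against."""
    mv.snapshots_at_refresh = {t: current_snapshots[t] for t in mv.base_tables}


def is_stale(mv: MaterializedView, current_snapshots: Dict[str, int]) -> bool:
    """A view is stale if never refreshed or any base table has a new snapshot."""
    if not mv.snapshots_at_refresh:
        return True
    return any(current_snapshots.get(t) != mv.snapshots_at_refresh.get(t)
               for t in mv.base_tables)


if __name__ == "__main__":
    snapshots = {"orders": 101}
    mv = MaterializedView("orders_summary", ["orders"])
    refresh(mv, snapshots)
    print(is_stale(mv, snapshots))   # False
    snapshots["orders"] = 102        # a write creates a new snapshot
    print(is_stale(mv, snapshots))   # True
```

comparing snapshot ids keeps the check cheap: no data has to be scanned to decide whether the stored results still match the base tables.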

querying starburst / trino from apache superset (in 7 steps)

title says it all 😉

eliminate rollup’s null confusion (hint: grouping keyword)

grouping operations, such as rollup and cube, identify the rolled-up totals by using null for the column they are totaling; this gets rather confusing when that column itself contains null values in individual rows. this post shows you how to definitively differentiate between these two cases
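here is a small python emulation of a single-column rollup, mirroring roughly what select region, sum(sales), grouping(region) ... group by rollup (region) would return (the region and sales names are hypothetical). both the grand-total row and the group of genuinely null regions show None for region; only the grouping flag (1 for the rolled-up total, 0 otherwise) tells them apart.

```python
from collections import defaultdict


def rollup_sum(rows):
    """Emulate GROUP BY ROLLUP (region) with SUM(sales) and GROUPING(region).

    `rows` is a list of (region, sales) pairs; region may be None.
    Returns (region, total, grouping_flag) tuples where grouping_flag is 1
    only for the rolled-up grand-total row.
    """
    totals = defaultdict(int)
    for region, sales in rows:
        totals[region] += sales  # a None region is an ordinary group here
    result = [(region, total, 0) for region, total in totals.items()]
    # The grand total also reports region as None, but with the flag set to 1.
    result.append((None, sum(totals.values()), 1))
    return result


if __name__ == "__main__":
    data = [("east", 10), ("west", 5), (None, 7)]
    for row in rollup_sum(data):
        print(row)
```

with that sample data, two rows come back with region None: (None, 7, 0) is the real null-region group and (None, 22, 1) is the grand total.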

querying aviation data in the cloud (leveraging starburst galaxy)

come along on a quick tutorial of loading some airline flight data into a cloud object store and analyzing it with the starburst galaxy sql engine in the sky