hive to iceberg migration tool (rev1)

they had a need for an iceberg migration tool, i wrote an iceberg migration tool – i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)

pystarburst via a jupyter notebook (exploring the tpc-h dataset)

ready to explore pystarburst via a jupyter notebook? this post points you to a single-click solution that spins up a jupyter environment with sample notebooks ready to run – you’re welcome!

building scalar udfs w/sql for trino (aka sql routines)

check out this quick set of simple examples showing how easily you can create sql-based user-defined functions (udfs), formally referred to as trino sql routines, to make queries more succinct and offer reusability

apache iceberg table maintenance (is_current_ancestor part deux)

as a follow-on to my earlier post about iceberg versioning (and the is_current_ancestor flag), i thought it would be useful to show working examples of the maintenance activities that are needed to manage the sprawl of data lake files that come with more and more versions

iceberg snapshot is_current_ancestor flag (what does it tell us)

i’ve noticed the is_current_ancestor column of the apache iceberg $history metadata table for a while now – it wasn’t until i got a direct question about it that i realized it was time to find out for sure what it means

dbt cloud & starburst galaxy workshop (beta testers welcome)

interested in building a data pipeline with dbt cloud and starburst galaxy? if so, then this post presents recorded videos of 7 lab exercises plus the lab guide itself so you can work through them on your own & at your own pace

ibis & trino (dataframe api part deux)

this is a port of the dataframe api code from my original pystarburst posting – this time i implemented the same scenarios with ibis, the portable python dataframe library, and had a blast doing it

viewing astronauts thru windows (more pystarburst examples)

i’ve got a fever and the only prescription is more pystarburst examples – this third installment is all about window functions via the dataframe api and, like before, i present the sql first for comparison

pystarburst analytics examples (querying aviation data part deux)

i had so much fun publishing my first pystarburst post and running it in starburst galaxy that i wanted to share some more examples – i ported my aviation dataset analytical queries to python and the dataframe api