hive to iceberg migration tool (rev1)

they had a need for an iceberg migration tool, so i wrote an iceberg migration tool; i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)

pystarburst via a jupyter notebook (exploring the tpc-h dataset)

ready to explore pystarburst via a jupyter notebook? this post points you to a single-click way to spin up a jupyter environment with sample notebooks ready to run; you’re welcome!

becoming a data engineer (yet another top 10 list)

after a recent class, i was asked what skills someone needs to become a data engineer. there are plenty of these lists all over the internet, yet here i go assuming i know enough to jot down yet another; at least i put mine all in a single picture 😉

ibis & trino (dataframe api part deux)

this is a port of the dataframe api code from my original pystarburst posting – this time i implemented the same scenarios with ibis, the portable python dataframe library, and had a blast doing it

viewing astronauts thru windows (more pystarburst examples)

i’ve got a fever and the only prescription is more pystarburst examples; this third installment is all about window functions via the dataframe api and, like before, i present the sql first for comparison

pystarburst analytics examples (querying aviation data part deux)

i had so much fun publishing my first pystarburst post and running it in starburst galaxy that i wanted to share some more examples – i ported my aviation dataset analytical queries to python and the dataframe api

pystarburst (the dataframe api)

the dataframe api is finally available for trino and starburst galaxy thanks to the pystarburst libraries — take a peek at some example usages in this quick validation run

batch as a “special case” of flink streaming (yes, now we’re mv’ing streaming back to batch)

the third part of a loosely coupled trilogy on flink batch and streaming, which takes us full circle with the collapse of the DataSet API into the DataStream API; i’m not sure Run-D.M.C. could make this less tricky