they had a need for an iceberg migration tool, i wrote an iceberg migration tool – i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)
Tag Archives: software_development
data universe 2024 workshops (feedback appreciated)
feel free to come and test drive my four trino/starburst workshops i will be delivering at data universe 2024
pystarburst via a jupyter notebook (exploring the tpc-h dataset)
ready to explore pystarburst via a jupyter notebook? this post points you to a single-click solution to spin up jupyter that has sample notebooks ready to run – you’re welcome!
becoming a data engineer (yet another top 10 list)
after a recent class i was asked what skills someone needs to become a data engineer – there are plenty of these lists all over the internet, yet here i go assuming i know enough to jot down yet another; at least i put mine all in a single picture 😉
ibis & trino (dataframe api part deux)
this is a port of the dataframe api code from my original pystarburst posting – this time i implemented the same scenarios with ibis, the portable python dataframe library, and had a blast doing it
viewing astronauts thru windows (more pystarburst examples)
i’ve got a fever and the only prescription is more pystarburst examples – this third installment is all about window functions via the dataframe api and, like before, i present sql first for comparison
pystarburst analytics examples (querying aviation data part deux)
i had so much fun publishing my first pystarburst post and running it in starburst galaxy that i wanted to share some more examples – i ported my aviation dataset analytical queries to python and the dataframe api
wait… what? (a video game named lester)
i always thought it was super cool to have some lester (kasai) skateboards and stickers, as well as a garbage pail kids sticker with my name on it, but today i found the coolest thing ever – a video game using my name
pystarburst (the dataframe api)
the dataframe api is finally available for trino and starburst galaxy thanks to the pystarburst libraries — take a peek at some example usages in this quick validation run
batch as a “special case” of flink streaming (yes, now we’re mv’ing streaming back to batch)
the third part of a loosely coupled trilogy on flink batch and streaming that takes us full-circle with the collapse of the DataSet API into the DataStream API – i’m not sure Run-D.M.C. could make this less tricky