building trino data pipelines (with sql or python)

trino is well known as a fast query engine, but it is also a robust transformation processing engine that allows data engineers to develop in sql and/or python

joining spark dataframes with identical column names (an easier way)

presenting an easier solution to the problem of colliding column names when joining spark dataframes than the one i offered in my most popular post, which just happens to be four years old; some things do age well

pystarburst in 90 seconds (try it)

still thinking about trying to get a pystarburst code stub up and running? starburst galaxy makes it pain-free, and you can even get your first dataframe created via python in under 90 seconds; why not give it a try?

hive to iceberg migration tool (rev1)

they had a need for an iceberg migration tool, i wrote an iceberg migration tool; i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)

pystarburst via a jupyter notebook (exploring the tpc-h dataset)

ready to explore pystarburst via a jupyter notebook? this post points you to a single-click solution to spin up jupyter that has sample notebooks ready to run; you’re welcome!

becoming a data engineer (yet another top 10 list)

after a recent class i was asked what skills someone needs to become a data engineer – there are plenty of these lists all over the internet, yet here i go assuming i know enough to jot down yet another; at least i put mine all in a single picture 😉

ibis & trino (dataframe api part deux)

this is a port of the dataframe api code from my original pystarburst posting – this time i implemented the same scenarios with ibis, the portable python dataframe library, and had a blast doing it

viewing astronauts thru windows (more pystarburst examples)

i’ve got a fever and the only prescription is more pystarburst examples; this third installment is all about window functions via the dataframe api and, like before, i present sql first for comparison