trino is well-known as a fast query engine, but it is also a robust transformation processing engine that allows data engineers to develop in sql and/or python
Tag Archives: pystarburst
recap of the inaugural iceberg summit (my top 5 observations)
tl;dr – iceberg is pervasive, the real fight is for the catalog, concurrent transactional writes are a bitch, append-only tables still rule, and trino is widely adopted
joining spark dataframes with identical column names (an easier way)
presenting an easier solution to the problem of colliding column names when joining spark dataframes than i previously offered in my most popular post that just happens to be four years old — some things do age well
pystarburst in 90 seconds (try it)
still thinking about trying to get a pystarburst code stub up/n/running? starburst galaxy makes it pain free and you can even get your first dataframe created via python in under 90 seconds — why not give it a try?
hive to iceberg migration tool (rev1)
they had a need for an iceberg migration tool, i wrote an iceberg migration tool – i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)
data universe 2024 workshops (feedback appreciated)
feel free to come and test drive my four trino/starburst workshops i will be delivering at data universe 2024
pystarburst via a jupyter notebook (exploring the tpc-h dataset)
ready to explore pystarburst via a jupyter notebook? this post points you to a single-click solution to spin up jupyter that has sample notebooks ready to run – you’re welcome!
ibis & trino (dataframe api part deux)
this is a port of the dataframe api code from my original pystarburst posting – this time i implemented the same scenarios with ibis, the portable python dataframe library, and had a blast doing it
viewing astronauts thru windows (more pystarburst examples)
i’ve got a fever and the only prescription is more pystarburst examples – this third installment is all about window functions via the dataframe api and, like before, i present sql first for comparison
pystarburst analytics examples (querying aviation data part deux)
i had so much fun publishing my first pystarburst post and running it in starburst galaxy that i wanted to share some more examples – i ported my aviation dataset analytical queries to python and the dataframe api