data.streams

l11n’s technical blog

pystarburst in 90 seconds (try it)

still thinking about trying to get a pystarburst code stub up/n/running? starburst galaxy makes it pain free and you can even get your first dataframe created via python in under 90 seconds — why not give it a try?

by lestermartin May 2, 2024

trino: an origin story (nailed it!)

the full trino origin story complete with architectural walkthru and comparisons with other frameworks like hive & spark all in a single video? a single video that is < 20 minutes long? yep, and the creator nailed it!

by lestermartin April 30, 2024

hive to iceberg migration tool (rev1)

they had a need for an iceberg migration tool, I wrote an iceberg migration tool — i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)

by lestermartin April 25, 2024, updated May 2, 2024

data universe 2024 workshops (feedback appreciated)

feel free to come and test drive the four trino/starburst workshops i will be delivering at data universe 2024

by lestermartin April 4, 2024

pystarburst via a jupyter notebook (exploring the tpc-h dataset)

ready to explore pystarburst via a jupyter notebook? this post points you to a single-click solution to spin up jupyter that has sample notebooks ready to run (you’re welcome!)

by lestermartin March 21, 2024

what i do at starburst (educational engineer)

i came to starburst just over two years ago for the framework’s focus on optionality; i’ve stayed for the opportunities that have been presented to me (i’m proud to be an all-star and glad to talk about open opportunities with you)

by lestermartin March 15, 2024

building scalar udf’s w/sql for trino (aka sql routines)

check out this quick set of simple examples showing how easily you can create sql-based user-defined functions (udfs), formally referred to as trino sql routines, that allow more succinct queries and offer reusability

by lestermartin March 14, 2024, updated March 15, 2024

apache iceberg table maintenance (is_current_ancestor part deux)

as a follow-on to my earlier post about iceberg versioning (and the is_current_ancestor flag), i thought it would be useful to show working examples of the maintenance activities that are needed to manage the sprawl of data lake files that come with more and more versions

by lestermartin February 21, 2024

becoming a data engineer (yet another top 10 list)

after a recent class i was asked what skills someone needs to become a data engineer – there are plenty of these lists all over the internet, yet here i go assuming i know enough to jot down yet another; at least i put mine all in a single picture 😉

by lestermartin February 20, 2024

iceberg snapshot is_current_ancestor flag (what does it tell us)

i’ve noticed the is_current_ancestor column of the apache iceberg $history metadata table for a while now – it wasn’t until I got a direct question about it that i realized it was time to find out for sure

by lestermartin February 15, 2024
