apache – Page 2 – Lester Martin (l11n)

recap of the inaugural iceberg summit (my top 5 observations)

tl;dr – iceberg is pervasive, the real fight is for the catalog, concurrent transactional writes are a bitch, append-only tables still rule, and trino is widely adopted

joining spark dataframes with identical column names (an easier way)

presenting an easier fix for colliding column names when joining spark dataframes than the one i offered in my most popular post, which just happens to be four years old (some things do age well)

hive to iceberg migration tool (rev1)

they had a need for an iceberg migration tool, i wrote an iceberg migration tool; i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)

data universe 2024 workshops (feedback appreciated)

feel free to come and test drive the four trino/starburst workshops i will be delivering at data universe 2024

apache iceberg table maintenance (is_current_ancestor part deux)

as a follow-on to my earlier post about iceberg versioning (and the is_current_ancestor flag), i thought it would be useful to show working examples of the maintenance activities that are needed to manage the sprawl of data lake files that come with more and more versions

iceberg snapshot is_current_ancestor flag (what does it tell us)

i’ve noticed the is_current_ancestor column of the apache iceberg $history metadata table for a while now, but it wasn’t until i got a direct question about it that i realized it was time to find out for sure what it means

hive acid transactions work on trino (can even update a partitioned column)

it seems that folks who haven’t used hive in production are always quick to say that hive doesn’t have classic crud operations, much less the merge statement, and that simply isn’t true – this post shows you that you can create a hive acid table and mutate its contents with trino
