hadoop – Lester Martin (l11n)

trino query plan analysis (video series)

query plan analysis is critical for getting every single ounce of performance & scalability out of your trino cluster; my 3-part video series will get you started with the basics

trino: an origin story (nailed it!)

the full trino origin story complete with architectural walkthru and comparisons with other frameworks like hive & spark all in a single video? a single video that is < 20 minutes long? yep, and the creator nailed it!

hive to iceberg migration tool (rev1)

they had a need for an iceberg migration tool, i wrote an iceberg migration tool; i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)

hive acid transactions work on trino (can even update a partitioned column)

it seems that folks who haven’t used hive in production are always quick to say that hive doesn’t have classic crud operations, much less the merge statement, and that simply isn’t true – this post shows you that you can create a hive acid table and mutate its contents with trino

hive, trino & spark features (their journeys to sql, performance & durability)

each big data sql engine is created to address something the existing ones neglect, but sooner or later they all start to look alike in their feature lists and observable behaviors

wrapping up my 8 year hortonworks – cloudera adventure (best job ever)

what an amazing eight years at hortonworks/cloudera — the technology, the focus, the use cases, the domains, the FUN and most importantly, the PEOPLE, made this the best job of my entire career and make it super hard to say goodbye to this role

updated streaming supervision features scorecard (added flink)

added apache flink to the comparison grid of kafka streams, spark streaming, and storm focused on the features they offer the operations side of the devops formula — it measures up well

securing hive entities (ranger and atlas to the rescue)

video showing how to use ranger & atlas to create security policies on hive tables, columns and rows as well as implementing data masking and tag-based restrictions

hive’s merge statement (it drops a lot of acid)

hive’s merge command provides another option for acid transactions beyond insert, update and delete; this post walks you through a simple example and looks at all the base, delta and delete_delta files created on the underlying filesystem to support this standard sql command

hive delta file compaction (minor and major)

a quick walk-thru of how minor and major compactions occur for hive transactional tables, ensuring all the delta files eventually roll into base ones