query plan analysis is critical for getting every single ounce of performance & scalability out of your trino cluster; my 3-part video series will get you started with the basics
Tag Archives: hive
logo to company match game (data engineering open-source projects)
can you match each open-source data engineering project logo with the company most closely affiliated with it?
apache spark (yet another overview)
an overview of apache spark presented from 20,000 feet, on the surface, and below the waterline
iceberg acid transactions with partitions (a behind the scenes perspective)
a port of my prior post taking a deeper look at what happens under the hood of hive with “acid” transactions — this time on iceberg tables with parquet files
well-designed partitions aid iceberg compaction (call them ice cubes)
despite what you may have heard, partitions are not dead (yes, there are multiple tools in the shed) and using a well-defined partitioning strategy with apache iceberg can help prevent concurrency issues when compacting files
reasons to avoid apache iceberg (clickbait)
wrapper post for two starburst deliverables (webinar & blog post) discussing why you should, or maybe shouldn’t, move from apache hive to apache iceberg for your data lake table format
iceberg materialized views in galaxy (no más storage_schema)
starburst galaxy, as a saas offering, just keeps slipping in nice bits of features & functionality — this one tackles hiding the underlying storage table of an iceberg materialized view
recap of the inaugural iceberg summit (my top 5 observations)
tl;dr – iceberg is pervasive, the real fight is for the catalog, concurrent transactional writes are a bitch, append-only tables still rule, and trino is widely adopted
trino: an origin story (nailed it!)
the full trino origin story complete with architectural walkthru and comparisons with other frameworks like hive & spark all in a single video? a single video that is < 20 minutes long? yep, and the creator nailed it!
hive to iceberg migration tool (rev1)
they had a need for an iceberg migration tool, i wrote an iceberg migration tool; i committed it as a github project, then i promoted a github project (i’ve got macklemore’s thrift shop in my head as i write this excerpt)