tuning – Lester Martin (l11n)

my iceberg book (early release 1)

super stoked to announce that the first early release of my upcoming o’reilly book, optimizing your apache iceberg lakehouse, has been published; pull down the pdf and let me know what you think

trino query plan analysis (video series)

query plan analysis is critical for getting every single ounce of performance & scalability out of your trino cluster; my 3-part video series will get you started with the basics

iceberg snapshots affect storage footprint (not performance)

it is easy to understand why most folks initially imagine that iceberg’s ability to maintain a long history of snapshots will cause performance problems, but that is not the case; the real gotcha is that keeping many versions can quickly consume 2-10+ times the data lake storage that the current version alone needs
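to make the 2-10+ times figure a bit more concrete, here is a back-of-the-envelope sketch in python. the model is hypothetical and deliberately crude: it assumes a 100 GB table where every new snapshot rewrites a fixed 10% of the data files (think UPDATE, MERGE, or overwrite), and that the superseded files stay on disk until the snapshots referencing them are expired. queries against the table still read only the current snapshot's files, which is why the extra storage does not show up as slower queries.

```python
def storage_footprint(table_gb: float, rewrite_fraction: float, retained_snapshots: int) -> float:
    """Rough storage (GB) held by the current version plus retained older snapshots.

    Hypothetical model: every snapshot after the first replaces `rewrite_fraction`
    of the table's data files, and the replaced files are kept on disk for as
    long as an older, still-retained snapshot references them.
    """
    superseded_gb = table_gb * rewrite_fraction * (retained_snapshots - 1)
    return table_gb + superseded_gb


if __name__ == "__main__":
    table_gb = 100
    for snapshots in (1, 11, 31, 91):
        total = storage_footprint(table_gb, 0.10, snapshots)
        # the current snapshot is still only table_gb worth of files to scan
        print(f"{snapshots:>3} snapshots -> {total:.0f} GB on disk ({total / table_gb:.1f}x)")
```

with these made-up numbers, 11 retained snapshots already double the footprint and 91 push it to 10x, while the data a query has to scan stays at 100 GB.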

well-designed partitions aid iceberg compaction (call them ice cubes)

despite what you may have heard, partitions are not dead (yes, there are multiple tools in the shed), and a well-defined partitioning strategy with apache iceberg can help prevent concurrency conflicts when compacting files
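a toy python model of why compacting partition-by-partition ("ice cubes") helps. it simplifies iceberg's optimistic concurrency down to one hypothetical rule: two concurrent commits collide when they rewrite or delete files in the same partition. real iceberg conflict checks look at the specific files involved, so treat this as an illustration only; the partition names and the late-arriving MERGE are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    name: str
    # partitions whose existing data files this commit rewrites or deletes from
    partitions: frozenset[str]


def conflicting(a: Commit, b: Commit) -> bool:
    # simplified rule: concurrent commits collide if they touch a common partition
    return bool(a.partitions & b.partitions)


partitions = [f"day={d}" for d in range(1, 8)]
merge = Commit("late-arriving MERGE", frozenset({"day=7"}))

# one big compaction over the whole table vs. one small compaction per partition
whole_table = Commit("compact everything", frozenset(partitions))
ice_cubes = [Commit(f"compact {p}", frozenset({p})) for p in partitions]

if __name__ == "__main__":
    print(whole_table.name, "conflicts:", conflicting(whole_table, merge))
    failed = [c.name for c in ice_cubes if conflicting(c, merge)]
    print(f"{len(failed)} of {len(ice_cubes)} ice cubes conflict: {failed}")
```

in this model the whole-table compaction has to be retried in full, while only the one ice cube that overlaps the MERGE (day=7) is affected and the other six commit cleanly.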

apache iceberg table maintenance (is_current_ancestor part deux)

as a follow-on to my earlier post about iceberg versioning (and the is_current_ancestor flag), i thought it would be useful to show working examples of the maintenance activities needed to manage the sprawl of data lake files that accumulates as more and more versions are kept
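in trino the flag itself comes from the table's `$history` metadata table, and snapshots are expired with `ALTER TABLE ... EXECUTE expire_snapshots(retention_threshold => '7d')`. the python sketch below only models the bookkeeping behind those: walking parent pointers from the current snapshot to decide is_current_ancestor, then picking expiration candidates. the snapshot ids, the day-based timestamps, and the `retain_last` parameter (loosely modeled on iceberg's keep-the-last-N option) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]
    committed_at: int  # hypothetical: days since the table was created


def current_ancestors(snapshots: dict[int, Snapshot], current_id: int) -> list[int]:
    """Follow parent pointers from the current snapshot; returns ids newest first."""
    chain = []
    sid: Optional[int] = current_id
    while sid is not None and sid in snapshots:
        chain.append(sid)
        sid = snapshots[sid].parent_id
    return chain


def snapshots_to_expire(snapshots: dict[int, Snapshot], current_id: int,
                        now: int, retention_days: int, retain_last: int = 1) -> list[int]:
    """Snapshots older than the retention window, minus the newest `retain_last` ancestors."""
    keep = set(current_ancestors(snapshots, current_id)[:retain_last])
    cutoff = now - retention_days
    return sorted(s.snapshot_id for s in snapshots.values()
                  if s.committed_at < cutoff and s.snapshot_id not in keep)


history = {s.snapshot_id: s for s in [
    Snapshot(1, None, 0),
    Snapshot(2, 1, 2),
    Snapshot(3, 2, 4),   # later rolled back, so no longer a current ancestor
    Snapshot(4, 2, 6),
    Snapshot(5, 4, 12),  # current snapshot
]}

if __name__ == "__main__":
    ancestors = set(current_ancestors(history, 5))
    for sid in sorted(history):
        print(sid, "is_current_ancestor =", sid in ancestors)
    print("expire:", snapshots_to_expire(history, 5, now=14, retention_days=7, retain_last=2))
```

snapshot 3 shows up with is_current_ancestor = false because the rollback left it off the current lineage; with a 7-day window on day 14 and the two newest ancestors protected, snapshots 1, 2 and 3 become candidates for expiration.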

z-order (visualized)

when asked to compare sort-by with z-order for data lake tables, i realized i finally needed a better understanding of what z-order is all about; my goal with this blog post is to present a simplified visualization of what’s going on and how it can help
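the core idea fits in a few lines of python. this is a sketch of the standard bit-interleaving (morton code) technique, applied to a hypothetical 4x4 grid of (x, y) values written out as 4 "files" of 4 rows each, with file skipping based only on per-file min/max values. it is not how any particular engine implements z-ordering.

```python
from itertools import product


def morton(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of x and y (x bits in even positions, y bits in odd)."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


def files_read(rows, rows_per_file, column, value):
    """Count files whose min/max on `column` (0 = x, 1 = y) could contain `value`."""
    count = 0
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        lo = min(r[column] for r in chunk)
        hi = max(r[column] for r in chunk)
        if lo <= value <= hi:
            count += 1
    return count


grid = list(product(range(4), range(4)))          # 16 (x, y) rows
linear = sorted(grid)                             # sort-by x, then y
zordered = sorted(grid, key=lambda r: morton(*r))

if __name__ == "__main__":
    for name, rows in [("sort-by (x, y)", linear), ("z-order", zordered)]:
        print(name, "x=1 ->", files_read(rows, 4, 0, 1), "files;",
              "y=1 ->", files_read(rows, 4, 1, 1), "files")
```

with this toy data, sort-by lets a filter on x=1 skip straight to 1 file but forces a filter on y=1 to open all 4, while z-order reads 2 files for either filter: a little worse on the leading column in exchange for much better skipping on the second one.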

Tag Archives: tuning

configuring the cache service (starburst enterprise)

determining # of splits w/trino/starburst/galaxy (iceberg table format)

determining # of splits w/trino/starburst/galaxy (hive table format)

presenting at hadoop summit (archiving evolving databases in hive)