
It seems like it was only last week that Databricks was telling us to just use Z-ordering instead of partitioning. Now I can’t stop hearing about liquid clustering from them, which supposedly “replaces table partitioning and ZORDER”. I’m not saying either of these things isn’t cool, but I am clearly saying that good old-fashioned partitioning STILL makes sense in many scenarios.
It especially makes sense for “very large tables” (even the dbx docs say that) when you have a clear access pattern that allows you to read only a small percentage of those partition folders. Just as I called out in my recent recap of the inaugural Iceberg Summit conference, append-only tables still rule, and these high-volume/high-velocity, immutable time-series tables can spread out over many years while most querying is done over a much smaller date range.
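To make that concrete, here’s a minimal sketch in Trino SQL. The table and column names (events, event_date, and so on) are hypothetical examples I’m making up for illustration, not anything from the docs or the blog post: a day-level partitioned Iceberg table, plus a query that only needs to touch a week’s worth of partitions out of years of history.
-- Hypothetical day-partitioned Iceberg table (identity partitioning on event_date)
CREATE TABLE events (
    event_id bigint,
    event_ts timestamp(6),
    event_date date,
    payload varchar
)
WITH (partitioning = ARRAY['event_date']);

-- Partition pruning: only the 7 daily partitions in this range get read,
-- not the years of history behind them.
SELECT count(*)
FROM events
WHERE event_date BETWEEN DATE '2024-03-01' AND DATE '2024-03-07';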
A well-designed partitioning strategy can be like taking a highly selective scoop of water instead of trying to drink the entire data lake. To help visualize it further, think of your table as a stack of ice cube trays, with each ice cube representing a partition. Of course, my affinity for Apache Iceberg surely influenced this metaphor. Furthermore, my Why partitioning matters: 3 Best practices to improve performance post on the Starburst blog calls out that partitioning works best on low-cardinality columns that have fairly uniform distributions. Getting my metaphor yet?
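If you want to sanity-check the “fairly uniform distributions” part, Trino’s Iceberg connector exposes a $partitions metadata table you can query (sticking with my hypothetical events table from above):
-- Row counts, file counts, and sizes per partition; wildly uneven numbers here
-- are a hint that the partitioning column was a poor choice.
SELECT "partition", record_count, file_count, total_size
FROM "events$partitions"
ORDER BY record_count DESC;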
In that blog post I further state that it is ideal to see the following (additional thoughts inside parentheses for each, plus a short SQL sketch after the list).
- A new partition is created periodically (for a day-level partitioning strategy, a new one is created each day)
- Data additions/modifications occur for a predictable period (likely the majority of the data for a given day will arrive that very day, but some window of time could be allowed for late records to arrive)
- Finally, the partition itself becomes static (even with late-arriving data, you’ll likely stop receiving it eventually, which keeps reporting consistent)
- The process repeats itself over and over (i.e. create a new partition as the new day’s data begins to arrive)
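Here’s the sketch I promised above. With Iceberg there’s no explicit “create partition” DDL in this cycle; assuming the same hypothetical events table (and a made-up staging_events source), the daily ingestion is just an INSERT and the new partition shows up on its own.
-- A new event_date value automatically materializes a new partition;
-- late-arriving rows for prior days simply land in those older partitions.
INSERT INTO events
SELECT event_id, event_ts, CAST(event_ts AS date) AS event_date, payload
FROM staging_events;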
For example, the following humpback whale visualization suggests that the data for a given day (and its associated partition) mostly lands on that day itself, but trailing data normally keeps arriving for up to 3 more days.

In this case, a partition becomes static 3 days after it is created. This could be a great time to execute a compaction job on the files in that particular partition. Compaction rolls many smaller files together into fewer larger ones, thereby allowing the query engine (you’re using Trino, right?) to be more efficient and much faster.
ALTER TABLE my_iceberg_table EXECUTE optimize
WHERE partition_key = <value>;
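To make that placeholder concrete (again with my hypothetical events table, identity-partitioned on event_date), Trino’s optimize also accepts an optional file_size_threshold argument so only the small files get rewritten:
-- Only rewrite files smaller than 128MB, and only within the now-static
-- 2024-03-04 partition, leaving every other partition untouched.
ALTER TABLE events EXECUTE optimize(file_size_threshold => '128MB')
WHERE event_date = DATE '2024-03-04';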
Since I’m (at least casually) discussing Iceberg tables, targeting a single partition is also very helpful because this compaction process creates a new version/snapshot of the table and is therefore subject to the optimistic concurrency model Iceberg employs. In our example, data is being added to the table in very short windows of time by some ingestion pipeline. That pipeline works great in isolation as it continuously creates a new snapshot, one after the other.
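If you want to watch that chain of snapshots grow, the Iceberg connector also exposes a $snapshots metadata table (hypothetical table name again):
-- Each ingestion commit and each compaction shows up as its own snapshot.
SELECT committed_at, snapshot_id, operation
FROM "events$snapshots"
ORDER BY committed_at DESC;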
Iceberg’s optimistic concurrency model lets each writer assume no other writers are operating, but when the compaction job attempts to atomically swap the table metadata file in the catalog, it may find that one or more other changes started after the compaction work began and were committed before the compaction tried to commit. In that scenario, the writer can check whether any of those already-committed changes collide with the partition it is working on. If there are no collisions, it can carry on and commit its changes.
Without targeting a specific partition when running the compaction process, it is likely that at least one collision would fail the retry validation rules and the transaction would fail. There would be no data corruption in this scenario, but the compute resources burned without making any impact on the table could have been used by other processes. And, of course, the compaction would have to be attempted again and could still fail.
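To put the two approaches side by side (still the hypothetical events table):
-- Unscoped: rewrites small files in every partition, including the "hot" one
-- the ingestion pipeline is still committing to, so a conflict (and a failed,
-- retried compaction) is much more likely.
ALTER TABLE events EXECUTE optimize;

-- Scoped: only touches a partition that has already gone static, so the
-- pipeline's commits against today's partition can't collide with it.
ALTER TABLE events EXECUTE optimize
WHERE event_date = DATE '2024-03-04';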
Let me reiterate what I said at the beginning of this post: I’m all for learning and growing into new ways to solve the problems that partitioning already handles (especially if/when they solve ADDITIONAL problems), but let’s not throw the proverbial baby out with the bathwater.
Now I’m realizing dirty bathwater and ice cubes are making my tummy hurt when I think about them together. Maybe I shouldn’t mix a proverb with a metaphor! 😉
All that said, partitions are NOT dead, and I do hope you have a few minutes to read my Starburst blog post on this subject. My messages above were focused on reinforcing the need to pick an efficient partitioning strategy, as well as showing how a well-designed one allows Iceberg compaction jobs to commit more smoothly and efficiently.