
This post is about the maintenance activities that Apache Iceberg tables need to undergo as more and more versions are created. I will be building on the details from my prior Iceberg snapshot post, so I do recommend you check it out first.
As before, I’m using Starburst Galaxy built on top of Trino and will be leveraging the ALTER TABLE EXECUTE docs, which detail the steps for the needed maintenance.
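For orientation, all of the maintenance activities we will run share the same ALTER TABLE … EXECUTE shape. Here is a minimal sketch (example_table is a placeholder and the retention values are just illustrative):
-- compaction, snapshot expiration, and orphan-file cleanup all hang
-- off ALTER TABLE ... EXECUTE (example_table is a placeholder)
ALTER TABLE example_table EXECUTE optimize;
ALTER TABLE example_table EXECUTE expire_snapshots(retention_threshold => '7d');
ALTER TABLE example_table EXECUTE remove_orphan_files(retention_threshold => '7d');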
Starting point
Querying the table from the Iceberg snapshot post, we see there are 8 records at this point.
SELECT * FROM test_rb ORDER BY f1;

Querying the view below gives us some details about the current snapshot of the test_rb table.
-- see prior post for the CREATE VIEW statement
SELECT * FROM curr_ver_dets;
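If you don’t have the prior post handy, here is a minimal sketch of what such a view could look like, assuming it simply surfaces the table’s $snapshots metadata (the real definition is in that post):
-- sketch only; see the prior post for the actual view definition
CREATE OR REPLACE VIEW curr_ver_dets AS
SELECT committed_at, snapshot_id, parent_id, operation
FROM "test_rb$snapshots";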

More easily visualized with the following graphic.

A peek at the folder in S3 where this table is rooted shows…
./metadata/ has 51 files
./data/ has 14 files
Run the following to see how many data files make up the current version.
SELECT file_path, file_format,
record_count, file_size_in_bytes
FROM "test_rb$files";
The output shows just 5 of the 14 files are used in this version (let’s not worry about all those “small files”; that’s a discussion for another time and just a byproduct of our simple tests).

Compact the data
The optimize command (https://docs.starburst.io/latest/connector/iceberg.html#optimize) is used for rewriting the content of the specified table so that it is merged into fewer but larger files. If the table is partitioned, the data compaction acts separately on each partition selected for optimization. This operation improves read performance.
ALTER TABLE test_rb EXECUTE optimize;
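If you need finer control, optimize also accepts a file_size_threshold parameter so only files below that size get rewritten, and on a partitioned table you can scope the rewrite with a WHERE clause. The values below are illustrative, and partition_key is a hypothetical column since test_rb is not partitioned:
-- only rewrite files smaller than the threshold (value is illustrative)
ALTER TABLE test_rb EXECUTE optimize(file_size_threshold => '128MB');
-- scope compaction to selected partitions (partition_key is hypothetical)
-- ALTER TABLE test_rb EXECUTE optimize WHERE partition_key = 'some_value';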
After running the query on "test_rb$files" again, it doesn’t look like much has happened; we’re only down from 5 files to 4. We have to take into account that these are TINY little files and Trino runs in parallel.

If you look closely, you’ll see there are only the 8 records shown earlier spread across the files, down from 13 previously. This is because the compaction process also performs a merge-on-write operation to get rid of those merge-on-read “delete files”.
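You can see this for yourself in the $files metadata table, where the content column distinguishes data files (0) from position and equality delete files (1 and 2). After the optimize, only content = 0 rows should remain:
-- delete files (content = 1 or 2) should be gone after compaction
SELECT content, count(*) AS file_count
FROM "test_rb$files"
GROUP BY content;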
Note to self… Seems we need a blog post explaining how the “no in-place deletes” constraint actually allows deletes with Iceberg…
It may not sound like we changed the data or the structure (which is what causes a new version to be created), but we did. Rewriting the data like this creates yet another version. Running the previous curr_ver_dets query shows this (only displaying the last two rows).

Here is the updated visualization.

And yes, because we created another version, we have even more files on the data lake. The metadata subfolder now has 60 files and data has 18.
Prune old versions
The expire_snapshots command (https://docs.starburst.io/latest/connector/iceberg.html#expire-snapshots) removes all snapshots and all related metadata and data files. Regularly expiring snapshots is recommended to delete data files that are no longer needed, and to keep the size of table metadata small. The procedure affects all snapshots that are older than the time period configured with the retention_threshold parameter.
Everything before this last version occurred more than a few days ago, so my command below takes that into account to get rid of all versions except the current one. Normally, you would likely want to keep some prior versions and only drop those you know you can get rid of.
ALTER TABLE test_rb
EXECUTE expire_snapshots(retention_threshold => '3d');

GOOD! The system is suggesting that we are being too aggressive. Well, let’s just be “aggressive” anyway! Run the session override below and then the expire_snapshots command again.
SET SESSION mycatalog.expire_snapshots_min_retention = '2d';
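Then re-run the same command from before:
ALTER TABLE test_rb
EXECUTE expire_snapshots(retention_threshold => '3d');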
We only have a single version now!
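A quick way to confirm that is to query the $snapshots metadata table directly:
-- after expiration, only the current snapshot should remain
SELECT snapshot_id, parent_id, operation
FROM "test_rb$snapshots";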


Let’s verify we cannot roll back to the version it references as its parent_id (notice the operation value of replace above).
CALL mycatalog.system.rollback_to_snapshot(
'myschema', 'test_rb',
4056689309859241417);

Remove orphaned files
There are still a lot of data files (11) on the data lake for this table, although our file compaction process lowered the count down to the 4 needed for the current version. Thankfully, there’s another maintenance activity that can clean up these files that are no longer referenced.
The remove_orphan_files command (https://docs.starburst.io/latest/connector/iceberg.html#remove-orphan-files) removes all files from a table’s data directory that are not linked from metadata files and that are older than the value of the retention_threshold parameter. Deleting orphan files from time to time is recommended to keep the size of a table’s data directory under control.
Like before, we need to be a bit “aggressive” and lower the minimum threshold value.
SET SESSION mycatalog.remove_orphan_files_min_retention = '2d';
ALTER TABLE test_rb
EXECUTE remove_orphan_files(retention_threshold => '2d');
Hooray, the data lake only has the 4 referenced files shown earlier!
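If you would rather verify from SQL than the S3 console, re-running the earlier $files query should now line up with what is physically left on the lake:
-- the 4 files referenced by the current snapshot should now be the
-- only ones left in ./data/
SELECT file_path, file_size_in_bytes
FROM "test_rb$files";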

Wrapping up
These maintenance activities rebuild many small files into fewer larger ones, get rid of those pesky/interesting “delete files”, and reduce the amount of data lake storage required to maintain so many historical versions. Coupling these maintenance activities with the inherent abilities that come from leveraging the metadata on the data lake will allow your Iceberg tables to continue to perform at internet scale!