hive to iceberg migration tool (rev1)

I’m preparing for a Hive to Iceberg migration webinar coming up on May 8, 2024 (yes, a #shamelessselfpromotion there), where I’ll talk about the in-place & shadow migration strategies that you can learn more about in this hands-on Starburst tutorial. It is cool to migrate one table at a time, but what if you have a whole schema full of tables to migrate? Heck, what if you have a WHOLE BUNCH of schemas, too?!?!

Well, that’s why programmers will always have a job! They can build a tool to help you out. I’m sure someone is going to build something better than what I did this evening, but I did put together an initial, “happy path” migration tool. I put it up on GitHub at lestermartin/trino-dataframes-exploration/IcebergMigrationTool, too, so you can check it out if you want. What a swell guy I am. 😉

You can find instructions there on how to use this Jupyter notebook, and I even recorded a quick demonstration of the features in this initial version.
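
To give a feel for the "whole schema full of tables" problem, here is a minimal sketch of the shadow strategy only. It is not the code from the notebook. It just generates one CREATE TABLE AS SELECT statement per table, and it assumes catalogs named `hive` and `iceberg` and a hypothetical helper called `shadow_migration_statements`. Being a "happy path" sketch, it ignores partitioning, table properties, and tables that already exist on the target side.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, escaping any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def shadow_migration_statements(tables_by_schema, source_catalog="hive",
                                target_catalog="iceberg"):
    """Build one CTAS statement per table for a shadow migration.

    tables_by_schema maps a schema name to a list of table names.
    The catalog names are assumptions; use whatever your cluster defines.
    """
    statements = []
    for schema in sorted(tables_by_schema):
        for table in sorted(tables_by_schema[schema]):
            source = ".".join(quote_ident(p) for p in (source_catalog, schema, table))
            target = ".".join(quote_ident(p) for p in (target_catalog, schema, table))
            # Copy the data into a new Iceberg table; the Hive table is left untouched.
            statements.append(f"CREATE TABLE {target} AS SELECT * FROM {source}")
    return statements


if __name__ == "__main__":
    # Hypothetical schemas and tables, purely for illustration.
    for sql in shadow_migration_statements({"sales": ["orders", "customers"],
                                            "web": ["clicks"]}):
        print(sql + ";")
```

Each generated statement could then be run through whatever Trino client you already use. The target schemas are assumed to exist in the Iceberg catalog before the statements run.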

This first cut only tackles the easy stuff, but I’ll work on it and share more here when it is worth mentioning again. As always, more eyes on it will only make it better, so let me know what it is missing or send me a pull request with your updates.

Published by lestermartin

Developer advocate, trainer, blogger, and data engineer focused on data lake & streaming frameworks including Trino, Hive, Spark, Flink, Kafka and NiFi.
