i actually like writing code and did not imagine i would enjoy just asking my business data analysis questions with natural language, but i’m nothing if not flexible and open to reevaluating my opinions
Tag Archives: galaxy
my iceberg book (early release 1)
super stoked to announce the first early release of my upcoming o’reilly book, optimizing your apache iceberg lakehouse, has been published — pull down the pdf and let me know what you think
understanding iceberg deletion vectors (and enjoying some humble pie)
for a given iceberg snapshot, there can be 0 or 1 deletion vector per data file & a deletion vector cannot span more than one data file
building trino data pipelines (with sql or python)
trino is well-known as a fast query engine, but it is also a robust transformation processing engine that allows data engineers to develop in sql and/or python
yarp: yet another rag post (this time using sql)
you don’t have to know python or bother your data scientists to start exploring genai concepts like rag; you just need a tool that offers these features in a familiar sql interface
trino query plan analysis (video series)
query plan analysis is critical for getting every single ounce of performance & scalability out of your trino cluster; my 3-part video series will get you started with the basics
optionality and common sense (why i returned to starburst)
i’m so excited to have returned to starburst and be focused on rebooting the devrel function, not to mention staying active in the trino and iceberg communities — long live the icehouse
iceberg acid transactions with partitions (a behind the scenes perspective)
a port of my prior post taking a deeper look at what happens under the hood of hive with “acid” transactions — this time on iceberg tables with parquet files
iceberg materialized views in galaxy (no más storage_schema)
starburst galaxy, as a saas offering, just keeps slipping in nice bits of features & functionality — this one tackles hiding the underlying storage table of an iceberg materialized view
recap of the inaugural iceberg summit (my top 5 observations)
tl;dr – iceberg is pervasive, the real fight is for the catalog, concurrent transactional writes are a bitch, append-only tables still rule, and trino is widely adopted