
what is driving the semantic layer revival? (ai can’t live without it)
decades later, semantic layers are still a good idea. will the value they provide to agentic ai finally be the reason enterprises build & maintain these valuable business context dictionaries?
my iceberg book (early release 1)
super stoked to announce that the first early release of my upcoming o’reilly book, optimizing your apache iceberg lakehouse, has been published; pull down the pdf and let me know what you think
don’t lead your chat-based llm (it wants to please)
ai tools want to please us, but their overly agreeable responses are tweaked to make us happy, not necessarily to provide the right, or best, response; don’t trust the response at face value!
understanding iceberg deletion vectors (and enjoying some humble pie)
for a given iceberg snapshot, there can be 0 or 1 deletion vector per data file & a deletion vector cannot span more than one data file
my freewrite alternative (under $200)
a smart typewriter, freewrite, is made of a simple keyboard, a tiny e-ink display, a low-capacity flash card, and a low-bandwidth wifi card – my first car didn’t cost $750
building trino data pipelines (with sql or python)
trino is well-known as a fast query engine, but it is also a robust transformation processing engine that allows data engineers to develop in sql and/or python
yarp: yet another rag post (this time using sql)
you don’t have to know python or bother your data scientists to start exploring genai concepts like rag; you just need a tool that offers these features in a familiar sql interface
trino query plan analysis (video series)
query plan analysis is critical for getting every single ounce of performance & scalability out of your trino cluster; my 3-part video series will get you started with the basics
logo to company match game (data engineering open-source projects)
can you match the open-source data engineering project logos with the companies most affiliated with each?
delta lake time-travel (just reference the version)
trino’s delta lake connector offers versioning features, including comparing versions and time-travel querying