i’m so excited to have returned to starburst and be focused on rebooting the devrel function, not to mention staying active in the trino and iceberg communities — long live the icehouse
Author Archives: lestermartin
apache spark (yet another overview)
an overview of apache spark presented from 20,000 feet, on the surface, and below the waterline
iceberg acid transactions with partitions (a behind the scenes perspective)
a port of my prior post taking a deeper look at what happens under the hood of hive with “acid” transactions — this time on iceberg tables with parquet files
the effect of ai on intelligence (behold the idiocracy)
the long-term benefits of sunscreen have been proved by scientists whereas my advice on ai has no basis more reliable than my own meandering experience; i will dispense this advice now, but trust me on the sunscreen
my shortest gig ever (but what a ride)
wait… what??? you started evangelizing datavolo as a developer advocate and then 4 months later they were acquired by one of the biggest tech companies out there? well, yes… yes i did and i’m ready to do it again
google codelabs (my go-to tutorial authoring framework)
i’ve been creating technical hands-on tutorials for years, mostly in blog posts, but for standing up a site focused on self-paced learning modules i haven’t found a better tool than google codelabs
develop, deploy, execute & monitor in one tool (welcome to apache nifi)
for those not familiar with apache nifi, come along for a short overview of how this framework rather uniquely spans so many of the phases of the typical software development lifecycle
exploring ai data pipelines (hands-on with datavolo)
after explaining what rag ai apps are all about & showing what a typical ai data engineering pipeline looks like, i wanted to offer a hands-on lab exercise actually building a simple pipeline using datavolo cloud
understanding rag ai apps (and the pipelines that feed them)
i’m learning all about rag ai apps and wanted to try to explain, at a high level, what these are all about plus do the same for the etl pipelines that are key to their success
iceberg snapshots affect storage footprint (not performance)
it is easy to understand why most folks initially imagine that iceberg’s ability to maintain a long history of snapshots will cause performance problems, but that is not the case — the real gotcha is that keeping many versions can quickly consume 2-10+ times the amount of data lake storage space