In a somewhat prioritized order, here are my current plans for future blog posts that I might write. Send me a message at the bottom of this page if you have some thoughts on what might be better.
- Using Google’s public Iceberg REST catalog and sample datasets with Trino and/or Starburst Galaxy — more info at https://opensource.googleblog.com/2026/01/explore-public-datasets-with-apache-iceberg-and-biglake.html and https://gist.github.com/rambleraptor/7fd2fd55a208da7e5c000430d54d8db4
- Post: Do a part deux of building scalar UDFs w/SQL for Trino (aka SQL routines), this time with Python; have it working on Trino itself, but waiting until I can do it on Galaxy (only SQL UDFs avail now)
- Delta to Iceberg migration process on Trino
- SB schema discovery
- SB Galaxy data quality jobs
- SB Galaxy multi-statement jobs
- Test out https://trino.io/docs/current/object-storage/file-system-local.html a bit
- Revive the data lake performance series
- Series: AI
- Post: Review of Coursera’s course I’m checking out
- Post: LangChain for dummies
- Post: AI Agents bolted on Galaxy
- Post: Snowflake integration w/Galaxy using Snowflake connector and Polaris connector
- Post: add `add_files` functionality to Iceberg connector
- Post: How `write.metadata.previous-versions-max=100` affects Iceberg snapshot expiration (i.e. does it leave lots of orphaned files?) && how does it work alongside Starburst’s `iceberg.expire_snapshots.min-retention=7d` property (and what happens when there are more than 100 snapshots that are valid: can you time travel back past the 100th snapshot, and if so, how impactful is that & how does it work under the covers?)
- Compare multiple Trino-oriented cluster-in-a-box solutions such as:
- Series: Micro Cluster on NUCs
- Post: OS and networking setup of 4+ nodes
- Post: Ambari install of Hadoop & Hive (and Ozone)
- Post: Collocating Trino (and stopping YARN)
- Post: Run MinIO on a machine with a big disk (such as doc’d here) and configure new tables on the Micro Cluster to access it
- Post: Create a good example of a bloom filter (possibly a table of web logs with BF on the IP address so that a security team could quickly look for only a specific IP)
Below is a list of open-source development efforts I’d like to work on, but have not yet allowed myself any time to focus on. Things are easier to do with help, so please let me know if there’s something here (or in your head) you might want to co-develop with me.
- Trino: modify the Iceberg connector to NOT error out (and just ignore it) if it sees a `type='iceberg'` property in the `WITH` DDL clause so that Starburst customers with SEP & Galaxy could just maintain a single DDL statement
- Trino: cluster-wide parameter to prevent results from being > n rows (with override session property)
Maybe I’m losing my mind, but I thought some parody covers of some songs (like Weird Al does) might be fun as a very side project!?!?
- OSS (based on OPP)
- Ice Berg Baby (based on Ice Ice Baby)
- Losing My Edge (based on same named song from LCD Soundsystem)