you don’t have to know python or bother your data scientists to start exploring genai concepts like rag; you just need a tool that offers these features in a familiar sql interface
Tag Archives: tutorial
trino query plan analysis (video series)
query plan analysis is critical for getting every single ounce of performance & scalability out of your trino cluster; my 3-part video series will get you started with the basics
delta lake time-travel (just reference the version)
trino’s delta lake connector offers versioning features, including version comparison and time-travel querying
iceberg acid transactions with partitions (a behind the scenes perspective)
a port of my prior post taking a deeper look at what happens under the hood of hive with “acid” transactions — this time on iceberg tables with parquet files
google codelabs (my go-to tutorial authoring framework)
i’ve been creating technical hands-on tutorials for years, mostly in blog posts, but for standing up a site focused on self-paced learning modules i haven’t found a better tool than google codelabs
exploring ai data pipelines (hands-on with datavolo)
after explaining what rag ai apps are all about & showing what a typical ai data engineering pipeline looks like, i wanted to offer a hands-on lab exercise actually building a simple pipeline using datavolo cloud
well designed partitions aid iceberg compaction (call them ice cubes)
despite what you may have heard, partitions are not dead (yes, there are multiple tools in the shed) and using a well-defined partitioning strategy with apache iceberg can help prevent concurrency issues when compacting files
iceberg materialized views in galaxy (no más storage_schema)
starburst galaxy, as a saas offering, just keeps slipping in nice bits of features & functionality — this one tackles hiding the underlying storage table of an iceberg materialized view
joining spark dataframes with identical column names (an easier way)
presenting an easier solution to colliding column names when joining spark dataframes than the one i offered in my most popular post, which just happens to be four years old (some things do age well)
pystarburst in 90 seconds (try it)
still thinking about trying to get a pystarburst code stub up/n/running? starburst galaxy makes it pain free and you can even get your first dataframe created via python in under 90 seconds — why not give it a try?