building trino data pipelines (with sql or python)

trino is well known as a fast query engine, but it is also a robust transformation processing engine that allows data engineers to develop in sql and/or python

yarp: yet another rag post (this time using sql)

you don’t have to know python or bother your data scientists to start exploring genai concepts like rag; you just need a tool that offers these features in a familiar sql interface

develop, deploy, execute & monitor in one tool (welcome to apache nifi)

for those not familiar with apache nifi, come along for a short overview of how this framework rather uniquely spans so many of the phases of the typical software development lifecycle

exploring ai data pipelines (hands-on with datavolo)

after explaining what rag ai apps are all about & showing what a typical ai data engineering pipeline looks like, i wanted to offer a hands-on lab exercise actually building a simple pipeline using datavolo cloud

understanding rag ai apps (and the pipelines that feed them)

i’m learning all about rag ai apps and wanted to try to explain, at a high level, what these are all about, plus do the same for the etl pipelines that are key to their success

joining spark dataframes with identical column names (an easier way)

presenting an easier solution to the problem of colliding column names when joining spark dataframes than the one i offered in my most popular post, which just happens to be four years old; some things do age well

pystarburst via a jupyter notebook (exploring the tpc-h dataset)

ready to explore pystarburst via a jupyter notebook? this post points you to a single-click solution to spin up jupyter that has sample notebooks ready to run; you’re welcome!

becoming a data engineer (yet another top 10 list)

after a recent class i was asked what skills someone needs to become a data engineer – there are plenty of these lists all over the internet, yet here i go assuming i know enough to jot down yet another; at least i put mine all in a single picture 😉

dbt cloud & starburst galaxy workshop (beta testers welcome)

interested in building a data pipeline with dbt cloud and starburst galaxy? if so, then this post presents recorded videos of 7 lab exercises plus the lab guide itself so you can work through them on your own & at your own pace