
NGL; I actually LIKE to write code (real code and even SQL) and did not think I was going to actually enjoy just writing out my requests in natural language, but today I had a little “stop fighting it” moment. Let me show you…
Setup
I held a hands-on, workshop-styled, webinar showing the basics of Building Data Pipelines with Ibis and Starburst which focused on Starburst Galaxy‘s sample burstbank dataset with these entities & relationships.

Details on the exercises are in my Starburst DevRel GitHub repo, specifically here, but one of the analysis questions I wanted to find the answer to was as follows.
Define a view that groups customers by their medallion status and the state/province they live in and calculate the following for the group; number of auto loans and average auto payment.
Taking a peek at the tables themselves ensured this was a softball question, but just enough to showcase implementing it with the Ibis Expression (aka dataframe) API using the Trino backend, as well as comparing it with straight SQL. Both implementations got the following results.

Check out the Jupyter notebook if you want to run any of this yourself.
Ibis
I got the results above from this bit of Python code using Ibis.
# grab needed fields from customerall_cust = con.table("customer", database=("sample", "burstbank")) \ .select("custkey", "state")# grab needed fields from customer_profilecust_p = con.table("customer_profile", database=("sample", "burstbank")) \ .select("custkey", "customer_segment")# css -> Customer State (and) Segmentcss = all_cust.join(cust_p, all_cust.custkey == cust_p.custkey)# grab needed fields from accountall_acct = con.table("account", database=("sample", "burstbank")) \ .select("custkey", "auto_loan_id", "auto_loan_status", "auto_loan_balance")# only keep those with ACTIVE auto loansactive_auto = all_acct.filter( [all_acct.auto_loan_status == "open", all_acct.auto_loan_balance > 0])# grab needed fields from loan_paymentauto_pymt = con.table("auto_loan_payment", database=("sample", "burstbank")) \ .select("auto_loan_id", "payment_amount")# put it all togetherthe_aggs = active_auto.join(css, active_auto.custkey == css.custkey) \ .join(auto_pymt, active_auto.auto_loan_id == auto_pymt.auto_loan_id) \ .group_by(css.state, css.customer_segment) \ .aggregate( nbr_auto_loans = active_auto.auto_loan_id.nunique(), avg_payment = auto_pymt.payment_amount.mean() )# few touch-ups to be DONEthe_view = the_aggs.mutate( avg_payment = the_aggs.avg_payment.round()).order_by("customer_segment", "state")print(the_view)
I can’t lie, it was fun touching this dataframe API again, but for folks that aren’t familiar with Python or the concept of a dataframe, this might look a bit intimidating at first glance.
SQL
In that same notebook I show that this could also be tackled with some straight-forward SQL (maybe I went overboard with my CTEs; haha).
WITHactive_auto_loans AS (SELECT custkey, auto_loan_id, auto_loan_balance FROM sample.burstbank.account WHERE auto_loan_status = 'open' AND auto_loan_balance > 0),cust_state_and_segment AS (SELECT c.custkey, c.state, cp.customer_segment FROM sample.burstbank.customer c JOIN sample.burstbank.customer_profile cp ON c.custkey = cp.custkey),cust_active_auto AS (SELECT * FROM active_auto_loans aal JOIN cust_state_and_segment csas ON aal.custkey = csas.custkey)SELECT caa.state, caa.customer_segment, count(distinct alp.auto_loan_id) AS nbr_auto_loans, round(avg(alp.payment_amount)) AS avg_payment FROM cust_active_auto caa JOIN sample.burstbank.auto_loan_payment alp ON caa.auto_loan_id = alp.auto_loan_id GROUP BY caa.state, caa.customer_segment ORDER BY caa.customer_segment, caa.state
As I stated before, this was a pretty simple ask and it was almost as much fun as the Python code, but it did take “more than a minute” to get it working perfectly. In fact, I’ll let you know that I actually had an error in my Ibis code and my query that are in the video recording of the webinar (can you find it?); but more on that in a bit.
Claude
Like most everyone else, I work for Initech and we are ALL being tasked with finding out how to work more efficiently using Gen AI tools. As always, new technology isn’t here to make work life easier for anyone… it is here for us to work even harder (my hot take for the day). One of the supported tools we have is everyone’s good friend; Claude.
Following Starburst’s documentation, which was clearly where Claude got its answer to this same question, I wired it up to the MCP Server available on Starburst Galaxy. I then asked the usual “tell me about” questions to learn more about the burstbank tables. Heck, I even had Claude build me that ERD at the top of this post and I’m surely going to use that little trick again.
Ultimately, I broke down and asked the million dollar question…
Find customers with active auto loans, group them by their medallion status and state they live in, provide the total number of these accounts by the grouping as well as an average auto payment amount, rounded to the nearest dollar. Order the results by medallion status and state.
And yes… it got the answer perfectly right. Of course, this is a softball question as I mentioned at the top, but I guess that even this COP (Cranky Old Programmer) can admit this might be the place to start when crafting any non-trivial query.
But remember, as I pointed out in don’t lead your chat-based llm (it wants to please), DON’T TRUST THE RESPONSE AT FACE VALUE! Verify & validate, early & often!! That said, my initial results from Claude actually were different than I received from Ibis and my SQL query implementations. Initially I thought, “see, it got it wrong!”, but in my verify/validate efforts I realized it was ME that got it wrong earlier.
I quickly realized the problem when I compared results and was able to update the Jupyter notebook to be correct. Side quest: if YOU can see what my error was in the webinar recording let me know and I’ll send you some Trino, Iceberg, and Starburst stickers! Heck, Claude even showed me the query it wrote.
SELECT cp.customer_segment AS medallion_status, c.state, COUNT(DISTINCT a.auto_loan_id) AS total_accounts, ROUND(AVG(alp.payment_amount)) AS avg_auto_paymentFROM sample.burstbank.customer cJOIN sample.burstbank.account a ON c.custkey = a.custkeyJOIN sample.burstbank.customer_profile cp ON c.custkey = cp.custkeyJOIN sample.burstbank.auto_loan_payment alp ON a.auto_loan_id = alp.auto_loan_idWHERE a.auto_loan_status = 'open'GROUP BY cp.customer_segment, c.stateORDER BY cp.customer_segment, c.state
Yes, seems I do like to write little CTEs instead of just banging out all the joins at once. Hey, it’s a matter of style! 😉
Epilogue
At a personal level, I still have a LOT of concerns about the impact of Gen AI including environmental harm and the effect of ai on intelligence (behold the idiocracy), but these tools are here and I would be lying to suggest they are not making me go faster. I guess I’d just say, find what works for you, don’t be completely closed-minded, NEVER forget that everything has consequences, and absolutely DON’T TRUST THE RESPONSE AT FACE VALUE!
Note: I did NOT use the help of any LLM-based Gen AI tool to write this blog post. I actually hope that shows! It’s still our time. Long live the writers!

























































