
I’m finally getting a bit of time to try out Trino SQL routines in Starburst Galaxy and I thought I’d share my experiences. Like is often the case, the intro docs actually help quite a bit. I thought I’d show a simple example to get this post started and then add another slightly more complex one to illustrate their biggest benefit — abstraction!
A simple SQL routine
Run the following to create a SQL routine that takes a first and last name as parameters and then returns a single string in the “last_name, first_name” format.
CREATE FUNCTION combine_name(
first_name varchar, last_name varchar)
RETURNS varchar
RETURN concat_ws(', ', last_name, first_name);
SELECT combine_name('Lester', 'Martin');

Starburst Galaxy has a sample table that can help test this out a bit more.
SELECT custkey,
combine_name(first_name, last_name) AS combined_name
FROM sample.burstbank.customer
LIMIT 5;

Easy-peasy, but for this simple example it would have been perfectly acceptable to just use the concat_ws function in-stream of the SELECT statement.
A slightly more complex one
For this one, let’s continue to use the sample Burst Bank customer table. Here are a few of the key columns we will work with.
SELECT custkey, paycheck_dd,
estimated_income, fico
FROM sample.burstbank.customer
LIMIT 5;

Our business partners have created a medallion status algorithm based on creating a score by adding the following values together.
- If the customer is using direct deposit, start with 50,000 points (0 points if not).
- Their income (as-is).
- 100 times their credit score.
Then assign a medallion status based on the following score ranges.
| bronze | up to 75,000 |
| silver | up to 150,000 |
| gold | up to 250,000 |
| platinum | everything else |
Here is the implementation.
CREATE OR REPLACE FUNCTION medallion_status(
paycheck_dd varchar, estimated_income double, fico integer)
RETURNS varchar
BEGIN
DECLARE m_score double DEFAULT 1.00;
SET m_score = IF(paycheck_dd = 'Y', 50000.00, 0.00) +
estimated_income + (fico * 100);
IF m_score < 75000.01 THEN
RETURN 'bronze';
ELSEIF m_score < 150000.01 THEN
RETURN 'silver';
ELSEIF m_score < 250000.01 THEN
RETURN 'gold';
ELSE
RETURN 'platinum';
END IF;
RETURN null;
END;
Run a quick smoke-test.
SELECT paycheck_dd, estimated_income, fico,
medallion_status(paycheck_dd, estimated_income, fico)
FROM sample.burstbank.customer LIMIT 5;

Let’s do a quick aggregation on the newly created medallion status to verify it makes sense across the full customer base.
WITH incl_medallion AS (
SELECT paycheck_dd, estimated_income, fico, state,
medallion_status(paycheck_dd, estimated_income, fico)
AS medallion_status
FROM sample.burstbank.customer
)
SELECT medallion_status,
round(avg(estimated_income),2) AS avg_income,
round(avg(fico),0) AS avg_fico
FROM incl_medallion
GROUP BY medallion_status
ORDER BY avg_income desc;

Yep, it looks good from here! Plus, we can abstract away the full user-defined function (UDF) so it doesn’t make our query so convoluted AND we can reuse it easily when needed. Cool beans!