sql window functions explained (transparently as possible)

SQL has had “windowing functions” for a long time, but not everyone has explored them before. They definitely fall into the analytical query family and TBH, the first half of my 30 year career was focused on OLTP application development and CRUD programmers usually don’t need these fancy critters.

If you ALREADY know all about window functions then this isn’t the article for you, but I do welcome your comments at the bottom of the post if there are better ways to introduce these to new newbs. If you are new to them, let’s see if I can help you understand them.

What are window functions?

In SQL, a window function or analytic function is a function which uses values from one or multiple rows to return a value for each row. (This contrasts with an aggregate function, which returns a single value for multiple rows.) Window functions have an OVER clause; any function without an OVER clause is not a window function, but rather an aggregate or single-row (scalar) function.

https://en.wikipedia.org/wiki/Window_function_(SQL)

That’s a lot at once. What if I told you they are kinda like a GROUP BY, but keeping the the row-granularity in the result set and each row could have its own aggregated value?

Hmmm… maybe that didn’t help much! Let’s see if this first example helps any.

Canonical example is a running average

SELECT
  avg(totalprice) OVER (
    PARTITION BY
      custkey
    ORDER BY
      orderdate 
        ROWS 
          BETWEEN UNBOUNDED PRECEDING
            AND 
          CURRENT ROW
  )
FROM
  orders;

Basically, the figurative example above shows

  • That all rows are still present
  • Each row a logical “window” (or collection of records) that can be used for computations
  • That window contains the current row and all other orders with the same custkey that were placed prior to the current one
  • Each row has a new column that is the average of all the totalprice values for the given window

The ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW keywords in the above query define what records belong in the window that was used to calculate the aggregated value added to the record. The left side of the AND represents the lower_bound and the right side is the upper_bound. The bounds can be any of these options.

  • UNBOUNDED PRECEDING – all rows before the current row (the default lower_bound)
  • n PRECEDING – n rows before the current row
  • CURRENT ROW – the current row
  • n FOLLOWING – n rows after the current row
  • UNBOUNDED FOLLOWING – all rows after the current row (the default upper_bound)

Supports sophisticated range definitions

When the ORDER BY column is a numeric or a date/time datatype, you can swap out the keyword ROWS with RANGE to create a more sophisticated window definition. One in which the n PRECEDING and/or n FOLLOWING options are not a fixed number (n) of rows, but a logical watermark based on the value of the ORDER BY column.

I’m from Missouri… SHOW-ME!

SELECT
  avg(totalprice) OVER (
    PARTITION BY
      custkey
    ORDER BY
      orderdate 
        RANGE 
          BETWEEN interval '1' month PRECEDING
            AND 
          CURRENT ROW
  )
FROM
  orders;

Looking at the visual above and concentrating on the cust_1 records, the first three records (sorted by orderdate) end up having the same average as in the first example. This is because for each of these rows, all the preceding ones were placed within the single month lower_bound identified.

When the 2022-12-25 order is processed, the window itself only included itself as all of the preceding records were more than a month ago.

I’m hopeful this helps get you started on your window functions journey and if there is still any confusion, please add a comment below and I’ll try to help you out. I might even be able to update the post to make things even more clear if needed.

Published by lestermartin

Software development & data engineering trainer/evangelist/consultant currently focused on data lake frameworks including Hive, Spark, Kafka, Flink, NiFi, and Trino/Starburst.

Leave a comment