How to Partition Consecutive Rows with PostgreSQL

After a recent hot streak in my golf league, I decided I’d like to add a new feature to the league’s website. This is a custom web app (mentioned previously by Ryan Abel) that would show each player their best averages over two rounds, three rounds, etc. We have a PostgreSQL database with all of the rounds played, so it was just a matter of finding the right SQL syntax to add up scores over each N-week period for a player.

If you do a search for “Postgres window functions,” the top results from the official PostgreSQL documentation are part of a tutorial,and the Window Functions section. And while those do a good job of describing window functions and how to partition data in a table using an ORDER BY, nothing describes solving the problem I was working on. That problem was summing values over a specified number of records for a particular partition (in my case, consecutive rounds of golf for a specific player).

The PostgresSQL website buried the answer I needed was buried in the less search result friendly Value Expressions – Window Function Calls documentation. It’s here that they provide the syntax for a window_definition:


[ existing_window_name ]
[ PARTITION BY expression [, ...] ]
[ ORDER BY expression [ ASC | DESC | USING operator ] [ NULLS { FIRST | LAST } ] [, ...] ]
[ frame_clause ]

And more importantly, the optional frame_clause, which can be one of:


{ RANGE | ROWS | GROUPS } frame_start [ frame_exclusion ]
{ RANGE | ROWS | GROUPS } BETWEEN frame_start AND frame_end [ frame_exclusion ]

And finally, the frame_start and frame_end, which can be one of:


UNBOUNDED PRECEDING
offset PRECEDING
CURRENT ROW
offset FOLLOWING
UNBOUNDED FOLLOWING

Using a frame_clause, I can specify partitions that cover consecutive rounds for any arbitrary number. For example, I can look at windows that cover a four-round span:


WINDOW w AS (
  PARTITION BY league_rounds.member_id
  ORDER BY league_rounds.member_id, outings.played_at ASC
  ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING
)

This partitions the data by the league member and then creates windows that include the current row and the three previous rows (and no following rows).

You can then use these windows with aggregate functions to sum up scores over the period, count the number of rounds, etc. Here’s everything put together:


WITH stats AS (
  SELECT
    league_rounds.member_id,
    SUM(league_rounds.to_par) OVER w AS scores,
    COUNT(league_rounds.to_par) OVER w AS num_rounds,
    ARRAY_AGG(DATE(outings.played_at)) OVER w AS the_dates,
    ARRAY_AGG(league_rounds.week_num) OVER w AS the_weeks,
    MAX(DATE(outings.played_at)) OVER w AS max_date
  FROM league_rounds
  INNER JOIN outings ON outings.id = league_rounds.outing_id
  WHERE league_rounds.member_id = 27
  WINDOW w AS (
    PARTITION BY league_rounds.member_id
    ORDER BY league_rounds.member_id, outings.played_at ASC
    ROWS BETWEEN 3 PRECEDING AND 0 FOLLOWING
  )
  ORDER BY member_id, outing_id
)
SELECT
  RANK() OVER (order by stats.scores ASC),
  stats.scores AS score,
  ROUND(stats.scores::decimal / stats.num_rounds, 2) AS avg,
  CONCAT(members.first_name, ' ', members.last_name) AS name,
  stats.num_rounds,
  stats.the_dates,
  stats.the_weeks
FROM stats
INNER JOIN members ON members.id = stats.member_id
WHERE stats.num_rounds = 4
ORDER BY stats.scores ASC, max_date DESC

Notes: Partition Consecutive Rows with PostgreSQL

  • To eliminate windows that didn’t have enough rounds (like the window that includes the first round a player participated in), a where clause filter on the number of rounds in the window (num_rounds) is used.
  • I used the ARRAY_AGG function to include all of the dates and week numbers that a particular window covers.
  • The RANK function ranks the the windows from best to worst.
Conversation

Join the conversation

Your email address will not be published. Required fields are marked *