Selecting Top N Per Group in PostgreSQL

For a recent project, I found myself needing to get certain data out of our database. I wanted to select the top two items, grouped by color, and sorted by when they were created. My project was being built on PostgreSQL, so I turned to my handy dandy window functions.

Window functions are similar to aggregate functions, but instead of operating on groups of records to produce a single output record, they operate on records related to the current row. We can use that to add a column to our result set that represents the rank of each item in its respective group. Here’s the completed query:


    SELECT rank_filter.* FROM (
        SELECT items.*, 
        rank() OVER (
            PARTITION BY color
            ORDER BY created_at DESC
        )
        FROM items
        WHERE items.cost < 50
    ) rank_filter WHERE RANK = 1

Let’s break down what’s going on in each piece. The most interesting piece is the inner query that utilizes the OVER clause.


    SELECT items.*, 
      rank() OVER (
          PARTITION BY color
          ORDER BY created_at DESC
      )
    FROM items
    WHERE items.other_value < 50

OVER describes how to window the results. PARTITION BY splits, or partitions, the data into groups with the same value for the specified column. In this case, it builds groups of the same color. ORDER BY within an OVER clause tells PostgreSQL how to order each group.

Once you’ve built your window, you’re ready for the window function: rank() returns the position of the record within its group or window. The above results might look like:

color	other_value	created_at	rank
red	12	2016-01-22	1
red	18	2016-01-23	2
red	19	2016-01-24	3
blue	12	2016-01-22	1
blue	18	2016-01-23	2
blue	19	2016-01-24	3

Now we’re getting close to the goal. However, we want just the top two from each group. We cannot use a simple LIMIT because LIMIT doesn’t know about our partitions. If we use a sub-select from these results, we can now filter by rank.


    SELECT rank_filter.* FROM (
        ... codevious results here ...
    ) rank_filter WHERE RANK <= 2

This gives us just what we wanted:

color	other_value	created_at	rank
red	12	2016-01-22	1
red	18	2016-01-23	2
blue	12	2016-01-22	1
blue	18	2016-01-23	2

This is such a common pattern that PostgreSQL has a built-in to shortcut the process, which is great if you’re only interested in the top ranking record from each group. It’s called DISTINCT ON.
Here’s a quick example:


    SELECT DISTINCT ON (color), items.*
    FROM items
    WHERE items.other_value < 50
    ORDER BY items.created_at DESC

So next time you need to grab the top N records from each group, reach for a rank()/PARTITION query.

Conversation

Roie Koper says:

May 4, 2017

Very helpful article! Very easy! Thanks.

Shawn Anderson says:

May 4, 2017

Thanks Roie
I’m glad you found it useful!

Joey Robert says:

May 11, 2017

It’s important to note rank() will give all equivalent rows the same value. In your example, if created_at is the same for many rows, they will all have the same rank, so your query could return more than 2 rows for the case of RANK <= 2. You can use row_number() to get around this and only return up to a hard limit.

Comments are closed.

Related Posts

How to Create Docusign Envelopes with a Variable Number of Signers

Solve Optimization Problems with MiniZinc and Google OR Tools

3 Reasons I Love CSS and You Should Too

Keep up with our latest posts.

Tell Us About Your Project