Level up your PostgreSQL Skill with Lateral Joins

PostgreSQL’s lateral joins have a lot of uses. One of my favorites is the ability to reuse calculations in a query.

Before I discovered lateral joins, I would either copy calculations throughout the query or use subqueries. Neither of these approaches is ideal because they make the query much more difficult to read—and nearly impossible to update and maintain. Additionally, the query is inefficient, because it performs the calculations more than once.

I am going to demonstrate a lateral join using a contrived example from one of my hobbies: powerlifting. The Wilks Coefficient is a value that is calculated from the weightlifter’s body weight and provides a normalized measurement of their strength. For example, Joey weighs 100kg and can lift 500kg across the three main lifts (squat, bench press, and deadlift). Chandler weighs 80kg and can lift 450kg. Even though Joey can lift more weight than Chandler, his Wilks score is 304, which is lower than Chandler’s score of 307. This indicates that Chandler is a better lifter than Joey.

In this example, I have three tables: lifters (information about each weightlifter), wilks_constants (data used in the Wilks calculation), and mcculloch_constants (another variable that is sometimes used to account for age).

The query, with the calculation copied throughout, looks like this:

SELECT
  name,
  500 / (wilks_constants.a + weight * wilks_constants.b + weight^2 * wilks_constants.c + weight^3 * wilks_constants.d + weight^4 * wilks_constants.e + weight^5 * wilks_constants.f) AS coefficient,
  total * 500 / (wilks_constants.a + weight * wilks_constants.b + weight^2 * wilks_constants.c + weight^3 * wilks_constants.d + weight^4 * wilks_constants.e + weight^5 * wilks_constants.f) as wilks,
  mcculloch_constants.coefficient * total * 500 / (wilks_constants.a + weight * wilks_constants.b + weight^2 * wilks_constants.c + weight^3 * wilks_constants.d + weight^4 * wilks_constants.e + weight^5 * wilks_constants.f) as mcculloch
FROM lifters
JOIN wilks_constants ON lifters.sex = wilks_constants.sex
LEFT OUTER JOIN mcculloch_constants ON lifters.age = mcculloch_constants.age;

This is hard to read and even harder to maintain. If the formula were to change, it would need to be updated three times.

This query could use a subquery to calculate the Wilks Coeffient, and it would look like this:

SELECT
  wilks_query.name,
  wilks_query.coefficient,
  wilks_query.wilks,
  mcculloch_constants.coefficient * wilks_query.wilks AS mcculloch
FROM
  (
    SELECT
      coefficent_query.name,
      coefficent_query.age,
      coefficent_query.coefficient AS coefficient,
      coefficent_query.total * coefficent_query.coefficient AS wilks
    FROM
      (
        SELECT
          lifters.name AS name,
          lifters.age AS age,
          lifters.total AS total,
          500 / (
            wilks_constants.a +
            lifters.weight * wilks_constants.b +
            lifters.weight ^ 2 * wilks_constants.c +
            lifters.weight ^ 3 * wilks_constants.d +
            lifters.weight ^ 4 * wilks_constants.e +
            lifters.weight ^ 5 * wilks_constants.f
          ) AS coefficient
        FROM
          lifters
          JOIN wilks_constants ON lifters.sex = wilks_constants.sex
      ) AS coefficent_query
  ) AS wilks_query
  LEFT OUTER JOIN mcculloch_constants ON wilks_query.age = mcculloch_constants.age;

This is not much of an improvement over the first query, even though the duplication has been removed. It’s still hard to read and not significantly easier to maintain.

Using lateral joins, this query becomes quite simple:

SELECT
  name,
  calculated_coefficient,
  total * calculated_coefficient AS wilks,
  mcculloch_constants.coefficient * total * calculated_coefficient AS mcculloch
FROM
  lifters
  JOIN wilks_constants ON lifters.sex = wilks_constants.sex
  LEFT OUTER JOIN mcculloch_constants ON lifters.age = mcculloch_constants.age,
  LATERAL (
    SELECT
      500 / (
        wilks_constants.a +
        weight * wilks_constants.b +
        weight ^ 2 * wilks_constants.c +
        weight ^ 3 * wilks_constants.d +
        weight ^ 4 * wilks_constants.e +
        weight ^ 5 * wilks_constants.f
      ) AS calculated_coefficient
  ) AS coefficient_calculation;

This is much cleaner, and the amount of duplicated code is greatly reduced.

Lateral joins can even reference columns from other lateral joins:

SELECT
  name,
  calculated_coefficient,
  calculated_wilks,
  calculated_mcculloch
FROM
  lifters
  JOIN wilks_constants ON lifters.sex = wilks_constants.sex
  LEFT OUTER JOIN mcculloch_constants ON lifters.age = mcculloch_constants.age,
  LATERAL (
    SELECT
      (
        500 / (
          wilks_constants.a +
          weight * wilks_constants.b +
          weight ^ 2 * wilks_constants.c +
          weight ^ 3 * wilks_constants.d +
          weight ^ 4 * wilks_constants.e +
          weight ^ 5 * wilks_constants.f
        )
      ) AS calculated_coefficient
  ) AS coefficient_calculation,
  LATERAL (
    SELECT
      total * calculated_coefficient AS calculated_wilks
  ) AS wilks_calculation,
  LATERAL (
    SELECT
      mcculloch_constants.coefficient * calculated_wilks AS calculated_mcculloch
  ) AS mcculloch_calculation;

This is simple, clean, understandable, and maintainable.

Reusing calculations or values within a query is just one of the conveniences of lateral joins; however, it is not the only usage. Check out the documentation for more detailed use cases.