Optimizing GraphQL Queries with DataLoader

In my post about GraphQL and the Apollo server tools, I primarily focused on setting up a simple GraphQL schema and set of resolvers, but did not go much deeper. The main example in that post defined a findBy method which simulated hitting a database, but for the sake of brevity, this detail was largely overlooked.

However, in a real production server, query efficiency is extremely important; if you’re not careful, you can end up making many round trips to the database over the course of a single GraphQL query.

I recommend using DataLoader, a data caching and batching utility that can drastically increase backend performance, especially in a GraphQL server. This post will explore the performance issues that are often encountered in a GraphQL server and how to use DataLoader to solve them.

The Problem

GraphQL field resolvers are just functions that are called independently of one another. There is no coupling between these resolvers, which gives your application the most flexibility possible.

When resolving the Person type from the example, I have full control over how each field is resolved on each Person object. However, if resolving a field on a type requires another query, then we will likely end up in a classic N+1 situation.

Taking a Closer Look

To see this in action, let’s take the full schema/resolvers setup from the example in the last post. To help expose the query problem, I have added a bestFriend field to the Person type. Additionally, you can see in the people array that each person has a best_friend attribute; notice that two of the people are best friends with George.

I have also added some additional logging to the findBy function to help demonstrate the issue.


const schema = `
type Person {
  name: String!
  age: Int
  gender: Gender
  height(unit: HeightUnit = METER) : Float
  bestFriend: Person
}

type Query {
  guys: [Person]
  girls: [Person]
}

enum Gender {
  MALE
  FEMALE
}

enum HeightUnit {
  METER
  INCH
}

schema {
  query: Query
}
`;

const rootResolvers = {
  Query: {
    guys(root, _, context) {
      return findBy('gender', 'MALE');
    },

    girls(root, _, context) {
      return findBy('gender', 'FEMALE');
    }
  },
  Person: {
    name: ({ name }) => name.toUpperCase(),
    height: ({ height }, { unit }) => unit === 'METER' ? height * 0.0254 : height,
    bestFriend: ({ best_friend}) => findBy('name', best_friend).then(people => people[0]),
  },
};

// This `findBy` method simulates a database query.
const findBy = (field, value) => {
  console.log(`finding person with ${field} === ${value}`);
  return Promise.resolve(people.filter(person => person[field] === value));
};

const people = [
  {
    name: 'George',
    age: 17,
    gender: 'MALE',
    height: 72,
    best_friend: 'Alexander',
  }, {
    name: 'Jill',
    age: 19,
    gender: 'FEMALE',
    height: 65,
    best_friend: 'Alexander',
  }, {
    name: 'Alexander',
    age: 32,
    gender: 'MALE',
    height: 68,
    best_friend: 'George',
  }, {
    name: 'Dave',
    age: 19,
    gender: 'MALE',
    height: 58,
    best_friend: 'George',
  }
];

When I run the guys query (for example, from GraphiQL), I get the following logs:

finding person with gender === MALE
finding person with name === Alexander
finding person with name === George
finding person with name === George

Some Observations

Notice two things here:

  1. We clearly have an N+1 problem: for each result from the initial guys query (i.e. for each of the three guys in the result set), we get an additional “best friend” query to the database.
  2. George is queried twice, since he is the best friend of two of the guys.

This is obviously not ideal. Wouldn’t it be nice if we could retrieve all the bestFriend information in a single shot, and also prevent duplication of requests? Luckily for us, there is a solution!
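For comparison, here is a rough sketch of what "a single shot, without duplicates" could look like if we did the bookkeeping by hand. The names samplePeople and findManyByName are made up for this illustration and are not part of the example above; the lookup simply stands in for one multi-value database query.

```javascript
// Sample data trimmed to the fields this sketch needs.
const samplePeople = [
  { name: 'George', gender: 'MALE', best_friend: 'Alexander' },
  { name: 'Jill', gender: 'FEMALE', best_friend: 'Alexander' },
  { name: 'Alexander', gender: 'MALE', best_friend: 'George' },
  { name: 'Dave', gender: 'MALE', best_friend: 'George' },
];

// Hypothetical multi-value lookup, standing in for a single database query.
const findManyByName = names =>
  Promise.resolve(samplePeople.filter(person => names.includes(person.name)));

const guys = samplePeople.filter(person => person.gender === 'MALE');

// Collect each best friend's name once; the Set drops the duplicate "George".
const uniqueNames = [...new Set(guys.map(guy => guy.best_friend))];

// One lookup for all best friends, then pair each guy with his result.
const bestFriendsByGuy = findManyByName(uniqueNames).then(friends => {
  const byName = new Map(friends.map(friend => [friend.name, friend]));
  return guys.map(guy => ({
    guy: guy.name,
    bestFriend: byName.get(guy.best_friend).name,
  }));
});
```

This works, but it only fits this one query shape, and independent field resolvers have no obvious place to do this kind of collecting.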

Introducing DataLoader

DataLoader is a utility that supports query batching and caching out of the box. These features are enabled by simply defining a “batch function” and instantiating a DataLoader object.

A batch function is simply a function that takes an array of “keys” and returns a Promise which resolves to an array of values. We’ll see a concrete example of this in a bit.

Batching is accomplished in DataLoader by collecting the .load() calls made within a single tick of the event loop, much like debouncing, and then executing the batch function a single time with all of the requested keys. Caching is performed within the context of the individual DataLoader instance. As soon as .load() is called with a given key, the resulting Promise is cached, and the same Promise is returned on subsequent calls with that key. Simply put, .load() is a memoized function.
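To make the mechanics a little more tangible, here is a deliberately simplified sketch of the idea. TinyLoader is a made-up class for illustration, not DataLoader's actual implementation, and it assumes a runtime that provides queueMicrotask (recent Node.js versions and modern browsers).

```javascript
// A simplified, hypothetical stand-in for DataLoader, for illustration only.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, in key order
    this.cache = new Map(); // key -> Promise, so repeated loads are memoized
    this.queue = [];        // loads waiting for the next dispatch
  }

  load(key) {
    // Memoization: a repeated key returns the very same Promise.
    if (this.cache.has(key)) return this.cache.get(key);

    const promise = new Promise((resolve, reject) => {
      // The first key of a batch schedules a single dispatch for later.
      if (this.queue.length === 0) {
        queueMicrotask(() => this.dispatch());
      }
      this.queue.push({ key, resolve, reject });
    });
    this.cache.set(key, promise);
    return promise;
  }

  dispatch() {
    const batch = this.queue;
    this.queue = [];
    // One call to the batch function for every key queued so far.
    this.batchFn(batch.map(item => item.key))
      .then(values => batch.forEach((item, i) => item.resolve(values[i])))
      .catch(error => batch.forEach(item => item.reject(error)));
  }
}

// Usage: three loads, two distinct keys, one call to the batch function.
const batches = [];
const upperCaseLoader = new TinyLoader(keys => {
  batches.push(keys);
  return Promise.resolve(keys.map(key => key.toUpperCase()));
});

const first = upperCaseLoader.load('george');
const second = upperCaseLoader.load('alexander');
const repeat = upperCaseLoader.load('george');

Promise.all([first, second, repeat]).then(values => {
  console.log(values);           // [ 'GEORGE', 'ALEXANDER', 'GEORGE' ]
  console.log(batches);          // [ [ 'george', 'alexander' ] ]
  console.log(first === repeat); // true
});
```

The real library handles many more cases (error handling per key, cache clearing, batch size limits), so treat this only as a mental model.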

DataLoader in Action

Using our earlier example, we can create a DataLoader object to batch and cache calls to findBy. To set this up correctly, we should change our findBy method so that it accepts multiple values. This will enable more efficient batching of the consecutive queries from the bestFriend resolver.


const findBy = (field, ...values) => {
  console.log(`finding people with ${field} === ${values.join(', ')}`);
  return Promise.resolve(
    people.filter(person => values.includes(person[field]))
  );
};

Next, we can define the DataLoader batch function:


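const DataLoader = require('dataloader');

// DataLoader matches results to keys by position, so return one entry per name, in the order requested.
const findByNameLoader = new DataLoader(names =>
  findBy('name', ...names).then(found =>
    names.map(name => found.find(person => person.name === name))
  )
);

The batch function will be passed an array of keys (in this case, an array of names) and must return a Promise which resolves to an array of values. That array must have the same length and order as the keys, which is why the results of findBy are mapped back onto names; findBy alone returns matches in the order they appear in people, not in the order they were requested. With that in place, we have a fully functional DataLoader instance.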
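One detail of the batch function contract is easy to miss: DataLoader pairs results with keys by position, so the resolved array needs one entry per key, in key order. The sketch below shows one way to line results up with keys. alignToKeys is a hypothetical helper name, not part of DataLoader, and using null for a missing key is just one possible choice.

```javascript
// Hypothetical helper: reorder query results so they line up with the keys.
// Keys without a matching row get null, so the output length always equals keys.length.
const alignToKeys = (keys, rows, keyOf) => {
  const byKey = new Map(rows.map(row => [keyOf(row), row]));
  return keys.map(key => (byKey.has(key) ? byKey.get(key) : null));
};

// The rows arrive in storage order, not in the order the keys were requested.
const rows = [{ name: 'George' }, { name: 'Alexander' }];
const aligned = alignToKeys(['Alexander', 'Nobody', 'George'], rows, row => row.name);

console.log(aligned);
// [ { name: 'Alexander' }, null, { name: 'George' } ]
```

In a batch function, this would be applied to whatever the multi-value query returns, for example findBy('name', ...names).then(rows => alignToKeys(names, rows, row => row.name)).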

Finally, we need to actually use our new loader! Let’s just change the bestFriend resolver on the Person type:


Person: {
  // ...
  bestFriend: ({ best_friend}) => findByNameLoader.load(best_friend)
},

DataLoader will batch all consecutive calls to the bestFriend resolver and cache the result. When we run our query again in GraphiQL, we see the following:

finding people with gender === MALE
finding people with name === George, Alexander

Notice that DataLoader batched the three “find by name” queries into a single query, and also filtered out the duplicate “George” query! With just a few extra lines of code, we get some pretty significant savings.

Obviously, this is a very simple example, but you can imagine how much value this can add to a large application.