In my post about GraphQL and the Apollo server tools, I primarily focused on setting up a simple GraphQL schema and set of resolvers, but did not go much deeper. The main example in that post defined a findBy method which simulated hitting a database, but for the sake of brevity, this detail was largely overlooked.
However, in a real production server, query efficiency is extremely important; if you’re not careful, you can end up making many round trips to the database over the course of a single GraphQL query.
I recommend using DataLoader, a data caching and batching utility that can drastically increase backend performance, especially in a GraphQL server. This post will explore the performance issues that are often encountered in a GraphQL server and how to use DataLoader to solve them.
The Problem
GraphQL field resolvers are just functions that are called independently of one another. There is no coupling between these resolvers, which gives your application the most flexibility possible.
When resolving the Person type from the example, I have full control over how each field is resolved on each Person object. However, if resolving a field on a type requires another query, then we will likely end up in a classic N+1 situation: one query to fetch a list of N objects, plus one more query for each object in that list.
Taking a Closer Look
To see this in action, let’s take the full schema/resolvers setup from the example in the last post. To help expose the query problem, I have added a bestFriend field to the Person type. Additionally, you can see in the people array that each person has a best_friend attribute; notice that a couple of the people are best friends with George.
I have also added some additional logging to the findBy function to help demonstrate the issue.
const schema = `
  type Person {
    name: String!
    age: Int
    gender: Gender
    height(unit: HeightUnit = METER): Float
    bestFriend: Person
  }
  type Query {
    guys: [Person]
    girls: [Person]
  }
  enum Gender {
    MALE
    FEMALE
  }
  enum HeightUnit {
    METER
    INCH
  }
  schema {
    query: Query
  }
`;
const rootResolvers = {
  Query: {
    guys(root, _, context) {
      return findBy('gender', 'MALE');
    },
    girls(root, _, context) {
      return findBy('gender', 'FEMALE');
    }
  },
  Person: {
    name: ({ name }) => name.toUpperCase(),
    height: ({ height }, { unit }) => unit === 'METER' ? height * 0.0254 : height,
    bestFriend: ({ best_friend }) => findBy('name', best_friend).then(people => people[0]),
  },
};
// This `findBy` method simulates a database query.
const findBy = (field, value) => {
  console.log(`finding person with ${field} === ${value}`);
  return Promise.resolve(people.filter(person => person[field] === value));
};
const people = [
  {
    name: 'George',
    age: 17,
    gender: 'MALE',
    height: 72,
    best_friend: 'Alexander',
  }, {
    name: 'Jill',
    age: 19,
    gender: 'FEMALE',
    height: 65,
    best_friend: 'Alexander',
  }, {
    name: 'Alexander',
    age: 32,
    gender: 'MALE',
    height: 68,
    best_friend: 'George',
  }, {
    name: 'Dave',
    age: 19,
    gender: 'MALE',
    height: 58,
    best_friend: 'George',
  }
];
When I run the guys query (for example, from GraphiQL), I get the following logs:
finding person with gender === MALE
finding person with name === Alexander
finding person with name === George
finding person with name === George
Some Observations
Notice two things here:
- We clearly have an N+1 problem: for each result from the initial guys query (i.e. for each of the three guys in the result set), we get an additional “best friend” query to the database.
- George is queried twice, since he is the best friend of two of the guys.
This is obviously not ideal. Wouldn’t it be nice if we could retrieve all the bestFriend information in a single shot, and also prevent duplication of requests? Luckily for us, there is a solution!
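To make the cost concrete, here is a minimal sketch that replays the guys query by hand, without GraphQL. The makeDb, naive, and batched helpers are hypothetical names that are not part of the original example, the query counter stands in for round trips to a real database, and the data is trimmed to the fields that matter here.

```javascript
// Trimmed copy of the example data: only the fields used below.
const people = [
  { name: 'George', gender: 'MALE', best_friend: 'Alexander' },
  { name: 'Jill', gender: 'FEMALE', best_friend: 'Alexander' },
  { name: 'Alexander', gender: 'MALE', best_friend: 'George' },
  { name: 'Dave', gender: 'MALE', best_friend: 'George' },
];

// Hypothetical helper: a simulated database that counts its own queries.
const makeDb = () => {
  let queryCount = 0;
  const findBy = (field, ...values) => {
    queryCount += 1;
    return Promise.resolve(people.filter(person => values.includes(person[field])));
  };
  return { findBy, getQueryCount: () => queryCount };
};

// Naive approach: one query for the list, then one query per person in it.
const naive = async () => {
  const db = makeDb();
  const guys = await db.findBy('gender', 'MALE');
  await Promise.all(guys.map(guy => db.findBy('name', guy.best_friend)));
  return db.getQueryCount();
};

// Hand-batched approach: de-duplicate the names and look them up in one query.
const batched = async () => {
  const db = makeDb();
  const guys = await db.findBy('gender', 'MALE');
  const uniqueNames = [...new Set(guys.map(guy => guy.best_friend))];
  await db.findBy('name', ...uniqueNames);
  return db.getQueryCount();
};

Promise.all([naive(), batched()]).then(([naiveCount, batchedCount]) => {
  console.log({ naiveCount, batchedCount }); // { naiveCount: 4, batchedCount: 2 }
});
```

The naive version issues 1 + 3 queries, while the hand-batched version issues 2 regardless of how many guys are in the list. Writing this collection logic by hand in every resolver does not scale well, which is the gap a batching utility fills.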
Introducing DataLoader
DataLoader is a utility that supports query batching and caching out of the box. These features are enabled by simply defining a “batch function” and instantiating a DataLoader object.
A batch function is simply a function that takes an array of “keys” and returns a Promise which resolves to an array of values, one for each key and in the same order as the keys. We’ll see a concrete example of this in a bit.
Batching is accomplished in DataLoader by collecting all of the .load() calls made within a single tick of the event loop and then executing the defined batch function a single time with every requested key. Caching is performed within the context of the individual DataLoader instance. As soon as .load() is called with a given key, the resulting Promise is cached, and that same Promise is returned on later calls with the same key. Simply put, .load() is a memoized function.
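The two mechanisms are easier to see in a stripped-down form. The following is a simplified sketch of the idea, not DataLoader’s actual implementation: TinyLoader is a hypothetical class, it schedules its batch with queueMicrotask as an assumption made for brevity, and it leaves out cache clearing, options, and per-key errors.

```javascript
// Hypothetical, simplified loader: illustrates per-tick batching and a
// per-key Promise cache. This is not DataLoader's real implementation.
class TinyLoader {
  constructor(batchFn) {
    this.batchFn = batchFn; // (keys) => Promise of values, in key order
    this.cache = new Map(); // key -> Promise for that key's value
    this.queue = [];        // pending { key, resolve, reject } entries
  }

  load(key) {
    // Memoization: a repeated key returns the very same Promise.
    if (this.cache.has(key)) return this.cache.get(key);

    const promise = new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key placed in an empty queue schedules a single flush.
      if (this.queue.length === 1) queueMicrotask(() => this.flush());
    });
    this.cache.set(key, promise);
    return promise;
  }

  flush() {
    const batch = this.queue;
    this.queue = [];
    // One call to the batch function for every key collected so far.
    this.batchFn(batch.map(entry => entry.key)).then(
      values => batch.forEach((entry, i) => entry.resolve(values[i])),
      error => batch.forEach(entry => entry.reject(error))
    );
  }
}

// Usage: three loads, two distinct keys, one batch call.
const batchCalls = [];
const loader = new TinyLoader(keys => {
  batchCalls.push(keys);
  return Promise.resolve(keys.map(key => key.toUpperCase()));
});

const first = loader.load('george');
const second = loader.load('alexander');
const third = loader.load('george'); // served from the cache

Promise.all([first, second, third]).then(values => {
  console.log(values);     // [ 'GEORGE', 'ALEXANDER', 'GEORGE' ]
  console.log(batchCalls); // [ [ 'george', 'alexander' ] ]
});
```

The repeated key never reaches the batch function because the cache lookup happens before anything is queued, which is why de-duplication comes for free once .load() is memoized.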
DataLoader in Action
Using our earlier example, we can create a DataLoader object to batch and cache calls to findBy. To set this up correctly, we should change our findBy method so that it accepts multiple values. This will enable more efficient batching of the consecutive queries from the bestFriend resolver.
const findBy = (field, ...values) => {
  console.log(`finding people with ${field} === ${values.join(', ')}`);
  return Promise.resolve(
    people.filter(person => values.includes(person[field]))
  );
};
Next, we can define the DataLoader batch function:
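const DataLoader = require('dataloader');

// DataLoader expects one value per key, in key order, so map each name to its match.
const findByNameLoader = new DataLoader(names =>
  findBy('name', ...names).then(found =>
    names.map(name => found.find(person => person.name === name) || null)));
The batch function will be passed an array of keys (in this case, an array of names) and must return a Promise which resolves to an array of values, one per key and in the same order as the keys. Our refactored findBy accepts and returns multiple values, but it returns them in the order they appear in the people array rather than in key order, so the batch function maps each name back to its match. With that in place, we have a fully functional DataLoader instance.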
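To see why the order of the resolved array matters, the sketch below compares a plain filter with an order-preserving version, without involving DataLoader at all. The helper name alignToKeys and the key 'Nobody' are hypothetical, and mapping a missing name to null is an assumption made for this example (DataLoader also allows returning an Error in that slot).

```javascript
// Trimmed copy of the example data.
const people = [
  { name: 'George', best_friend: 'Alexander' },
  { name: 'Alexander', best_friend: 'George' },
];

// Simulated multi-value query: results come back in storage order.
const findBy = (field, ...values) =>
  Promise.resolve(people.filter(person => values.includes(person[field])));

// Hypothetical helper: return exactly one entry per key, in key order.
const alignToKeys = (keys, rows, field) =>
  keys.map(key => rows.find(row => row[field] === key) || null);

// Candidate batch function that honors the key-order contract.
const batchByName = names =>
  findBy('name', ...names).then(rows => alignToKeys(names, rows, 'name'));

const keys = ['Alexander', 'Nobody', 'George'];

// The raw query result is shorter than the key list and in a different order.
findBy('name', ...keys).then(rows => {
  console.log(rows.map(row => row.name)); // [ 'George', 'Alexander' ]
});

// The aligned result has one slot per key.
batchByName(keys).then(values => {
  console.log(values.map(value => value && value.name)); // [ 'Alexander', null, 'George' ]
});
```

Without the alignment step, a loader keyed by 'Alexander' would be handed George’s record, so each person could end up paired with the wrong best friend.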
Finally, we need to actually use our new loader! Let’s just change the bestFriend resolver on the Person type:
Person: {
  // ...
  bestFriend: ({ best_friend }) => findByNameLoader.load(best_friend)
},
DataLoader will batch the calls made from the bestFriend resolver and cache the result for each name. When we run our query again in GraphiQL, we see the following:
finding people with gender === MALE
finding people with name === George, Alexander
Notice that DataLoader batched the three “find by name” queries into a single query, and also filtered out the duplicate “George” query! With just a few extra lines of code, we get some pretty significant savings.
Obviously, this is a very simple example, but you can imagine how much value this can add to a large application.