If you’re working in the land of Node and TypeScript, chances are good you’ve already discovered Prisma for accessing your database. But since this ORM is still fairly young, no clear strategy has emerged for seeding test data. This doesn’t have to be a complicated process, and my preferred tool for the job is the test data factory.
Why use test data factories?
Why bother with data factories when you can just add whatever data you want in your tests? Although it seems like it would be faster, making up data on the spot has a few drawbacks.
It reduces the clarity of what your test is testing. Say your test sets up mock data where one property is important to the test, and five properties are only there because they’re required. The next person to read this test case may have difficulty deciding which of those properties is relevant to the test at hand.
It takes more time in the long run. Sure, if you only need to write a handful of test cases, jamming in mock values as needed might save time. Beyond that, you will wish you had a way to insert a complete model. Taking the time to set up a few factories upfront pays for itself quickly.
It produces more code, and that code needs to be maintained. Eventually, your models will change. If every test makes up data on the spot, that might mean updating every one of those tests. But if your tests used factories, only two things would need to be updated: the factories and tests that actually involve the thing being changed.
It provides very narrow test coverage. If you hard-code some test data, you get test coverage for exactly those values. But if you configure data factories to produce random values, you automatically test a wide range (and combination) of inputs. It might seem like a recipe for flaky tests, but this kind of flakiness is easier to spot and deal with than, say, weird timing issues in a UI test.
Every once in a while, you may run into a problem with a flaky test and suspect it’s due to the randomization. Most fake data generators allow you to configure the seed fed to the pseudo-random number generator. By logging the seed value, you can reproduce a test run by forcing it to use the same seed.
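Here is a dependency-free sketch of that seed-logging idea. The mulberry32 generator is only a stand-in for a fake data library's internal generator, and resolveSeed and fakeEmail are hypothetical helpers invented for this example. With @faker-js/faker, the equivalent step would be passing the logged number to faker.seed().

```typescript
// Small seeded PRNG (mulberry32), a stand-in for a fake data library's generator.
function mulberry32(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: reuse a previously logged seed if one is supplied,
// otherwise pick a fresh one.
function resolveSeed(supplied?: string): number {
  const parsed = Number(supplied);
  const usable =
    supplied !== undefined && supplied.trim() !== "" && Number.isInteger(parsed);
  return usable ? parsed : Math.floor(Math.random() * 2 ** 31);
}

// Hypothetical generator that draws all of its randomness from the seeded PRNG.
function fakeEmail(random: () => number): string {
  return `user${Math.floor(random() * 1000000)}@example.com`;
}

// In a real suite the argument might come from an environment variable.
const seed = resolveSeed();
console.log(`Test data seed: ${seed}`);
const email = fakeEmail(mulberry32(seed));
console.log(email);
```

To replay a failing run, pass the logged seed back in (for example resolveSeed("12345")) and the generator will produce the same sequence of values.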
Test data factories in Prisma
Writing a function that generates a model using fake data is pretty simple. But how do you deal with overriding properties in nested models? Or generating a list of models that are mostly the same, with a few key differences? Or creating a template for a complicated but common data structure? These are the sorts of things that will derail a clean factory structure if they are handled ad-hoc as needed.
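To make those questions concrete, here is a minimal sketch of a factory that supports overrides and lists. It is plain TypeScript and not the API of any particular package: defineFactory, the User and Widget interfaces, and nextId are all hypothetical names made up for this illustration.

```typescript
interface Factory<T> {
  build(overrides?: Partial<T>): T;
  buildList(count: number, overrides?: Partial<T>): T[];
}

// Hypothetical helper: wraps a generator function and lets callers override fields.
function defineFactory<T extends object>(generate: () => T): Factory<T> {
  // Caller-supplied values win over generated ones.
  const build = (overrides: Partial<T> = {}): T => ({ ...generate(), ...overrides });
  const buildList = (count: number, overrides: Partial<T> = {}): T[] =>
    Array.from({ length: count }, () => build(overrides));
  return { build, buildList };
}

interface User {
  id: string;
  email: string;
}

interface Widget {
  id: string;
  name: string;
  creator: User;
}

// Simple counter so every generated model gets a distinct id.
let sequence = 0;
const nextId = (): string => `id-${++sequence}`;

const userFactory = defineFactory<User>(() => {
  const id = nextId();
  return { id, email: `user-${id}@example.com` };
});

const widgetFactory = defineFactory<Widget>(() => ({
  id: nextId(),
  name: `Widget ${nextId()}`,
  creator: userFactory.build(),
}));

// Override a property on a nested model by building that model explicitly.
const widget = widgetFactory.build({
  creator: userFactory.build({ email: "owner@example.com" }),
});

// A list of widgets that share one creator but otherwise differ.
const widgets = widgetFactory.buildList(3, { creator: widget.creator });
console.log(widgets.map((w) => `${w.name} by ${w.creator.email}`));
```

The trade-off in this sketch is that generate() always runs in full, so a default creator is built even when the caller overrides it. That is harmless for in-memory data, but worth keeping in mind once factories start writing to a database.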
As of this writing, Prisma does not have its own mechanism for producing test data factories. But Prisma’s create-or-connect syntax works pretty well whether you’re inserting data in production or generating fake data. If your data model is simple, a handful of functions might be sufficient. But test data factories need to be flexible or else they’ll end up being pulled in a thousand different directions to suit a myriad of test cases.
Of the third-party factory packages that are emerging, prisma-factory looks promising. It is simple and straight to the point. Its main benefit is that it generates factory types using your Prisma schema. This saves the tedium of writing and maintaining types that mirror what Prisma generated anyway. If you are looking for a more full-featured (but generic) factory, you could check out factory.ts.
Any Prisma-related factory will likely use Prisma’s nested writes to quickly build up a complicated data structure in a single expression. This works pretty well for the most part, but beware of a few tripping hazards involving relations.
Let’s say you have a Prisma schema like this, with three models, a one-to-many relation (a user and the widgets they created), and a many-to-many relation (users and widgets, joined through purchases):
model User {
  id             String     @id @default(uuid())
  email          String
  purchases      Purchase[]
  createdWidgets Widget[]
}

model Purchase {
  id            String         @id @default(uuid())
  status        PurchaseStatus
  transactionId String?
  purchaseDate  DateTime?
  user          User           @relation(fields: [userId], references: [id])
  userId        String
  widget        Widget         @relation(fields: [widgetId], references: [id])
  widgetId      String
}

enum PurchaseStatus {
  pending
  fulfilled
}

model Widget {
  id        String     @id @default(uuid())
  name      String
  creator   User       @relation(fields: [creatorId], references: [id])
  creatorId String
  // Back-relation required by Prisma for the Purchase.widget field
  purchases Purchase[]
}
And you define some factories like this (using the factory types generated by prisma-factory):
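import { faker } from "@faker-js/faker";
// Assumed import path for the generated factory creators; adjust it to your prisma-factory output.
import {
  createPurchaseFactory,
  createUserFactory,
  createWidgetFactory,
} from "prisma-factory/generated";

export const userFactory = createUserFactory({
  email: () => faker.internet.email(),
});
export const widgetFactory = createWidgetFactory({
  name: () => faker.commerce.productName(),
  creator: () => ({ create: userFactory.build() }),
});
export const purchaseFactory = createPurchaseFactory({
  // faker.helpers.arrayElement replaces the deprecated faker.random.arrayElement
  status: () => faker.helpers.arrayElement(["pending", "fulfilled"]),
  user: () => ({ create: userFactory.build() }),
  widget: () => ({ create: widgetFactory.build() }),
});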
Prisma is very strict about types.
If you look closely at what Prisma expects for the create part of a nested write expression, you’ll notice that it excludes the property corresponding to the parent type. This makes sense because the nested model will automatically be connected to the parent. But your factory function will need to produce a completely valid, standalone model (i.e. including all required properties/relations). Notice how the widgetFactory above specifies a create clause for the creator property, since creator is a required relation. If you try to use this factory within a nested create, Prisma will throw a runtime “Invalid invocation” error. For example:
await userFactory.create({
  createdWidgets: {
    createMany: {
      data: [widgetFactory.build()],
    },
  },
});
This will throw an error because the widgetFactory includes its own “create user” clause. Prisma probably could just ignore this, but in the interest of protecting you from yourself, it tends to throw errors for any unexpected properties. To get around that, you will need to unset that property somehow. This is clunky, but it works:
await userFactory.create({
  createdWidgets: {
    createMany: {
      data: [widgetFactory.build({ creator: undefined })],
    },
  },
});
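If unsetting properties by hand becomes a recurring chore, one option is a small helper that strips the parent relation before the data is nested. The sketch below is plain TypeScript operating on an ordinary object. withoutKeys is a hypothetical helper, not part of Prisma or prisma-factory, and the widgetData literal only imitates what a factory's build() might return.

```typescript
// Hypothetical helper: returns a shallow copy of `value` without the listed keys.
function withoutKeys<T extends object, K extends keyof T>(
  value: T,
  ...keys: K[]
): Omit<T, K> {
  const excluded: readonly PropertyKey[] = keys;
  const kept = Object.entries(value).filter(([key]) => !excluded.includes(key));
  return Object.fromEntries(kept) as Omit<T, K>;
}

// Stand-in for the output of a widget factory's build() call.
const widgetData = {
  name: "Gizmo",
  creator: { create: { email: "creator@example.com" } },
};

// Drop the relation that the parent of the nested write will supply.
const nestedWidgetData = withoutKeys(widgetData, "creator");
console.log(nestedWidgetData);
```

Unlike passing undefined, this removes the key entirely, and the returned type no longer includes the dropped relation.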
Nested many-to-many relations are not supported.
This is just a limitation of Prisma in general, but somehow it hits harder in tests. It would be so nice to create a whole deeply nested data structure using a single expression. But because you use Prisma’s “unchecked” input type to create nested many-to-many relations, the handy create/createMany nested write syntax is unavailable. Instead, you’ll have to insert the target models separately and then create the many-to-many relation using IDs:
// create() writes to the database, so await each result before reading its id
const user = await userFactory.create();
const widget = await widgetFactory.create();
const purchase = await purchaseFactory.create({
  user: { connect: { id: user.id } },
  widget: { connect: { id: widget.id } },
});
Factory organization
Having a factory pre-configured to generate test data is great, but inevitably you’ll need to customize it for specific cases. You could rely on callers to always override generated properties, but even this will result in a lot of repetition.
I’ve found it’s helpful to organize factories into templates. Looking at our Purchase model, you can see that there is a status field that determines which other fields are expected to be present. Instead of simply randomizing all properties on the model (which would result in many invalid combinations), we can use templates for specific combinations of properties. For example, one template for each of the status cases:
export const purchaseFactory = {
  pending: createPurchaseFactory({
    status: "pending",
    user: () => ({ create: userFactory.build() }),
    widget: () => ({ create: widgetFactory.build() }),
  }),
  fulfilled: createPurchaseFactory({
    status: "fulfilled",
    transactionId: () => faker.datatype.uuid(),
    purchaseDate: () => faker.date.recent().toISOString(),
    user: () => ({ create: userFactory.build() }),
    widget: () => ({ create: widgetFactory.build() }),
  }),
};
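The templates above repeat the user and widget defaults. One way to trim that repetition is to define the shared fields once and spread them into each template. The following is a dependency-free sketch of that idea using plain objects rather than prisma-factory. PurchaseData, fakeId, sharedDefaults, and purchaseTemplates are hypothetical names chosen for this example.

```typescript
type PurchaseStatus = "pending" | "fulfilled";

// Simplified stand-in for the Purchase model's input shape.
interface PurchaseData {
  status: PurchaseStatus;
  transactionId: string | null;
  purchaseDate: string | null;
  userId: string;
  widgetId: string;
}

// Counter-based ids keep the sketch deterministic and dependency-free.
let counter = 0;
const fakeId = (prefix: string): string => `${prefix}-${++counter}`;

// Fields that every template needs, defined once.
const sharedDefaults = (): Pick<PurchaseData, "userId" | "widgetId"> => ({
  userId: fakeId("user"),
  widgetId: fakeId("widget"),
});

const purchaseTemplates = {
  // A pending purchase has no transaction details yet.
  pending: (overrides: Partial<PurchaseData> = {}): PurchaseData => ({
    ...sharedDefaults(),
    status: "pending",
    transactionId: null,
    purchaseDate: null,
    ...overrides,
  }),
  // A fulfilled purchase always carries a transaction id and a date.
  fulfilled: (overrides: Partial<PurchaseData> = {}): PurchaseData => ({
    ...sharedDefaults(),
    status: "fulfilled",
    transactionId: fakeId("txn"),
    purchaseDate: new Date().toISOString(),
    ...overrides,
  }),
};

const pendingPurchase = purchaseTemplates.pending();
const fulfilledPurchase = purchaseTemplates.fulfilled({ userId: pendingPurchase.userId });
console.log(pendingPurchase, fulfilledPurchase);
```

Because each template only states what makes it different, adding a new required field to the model means touching sharedDefaults once rather than every template.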
Ultimately you’ll have to figure out what structure works for your team. But you’ll know you’ve found it when new tests are written using factories rather than hard-coded test data.