Here’s How We Used Snapshot Tests for Software with Complex Data

I like the basic idea of snapshot tests. Many software developers first run into them in frontend work, where you snapshot rendered output and compare against it after a code change. That feels pretty natural. You are checking whether the shape of the result changed, and whether that change was one you expected.

On a recent migration project, we found the ideas behind snapshot testing worked well for backend data, too. We were translating large source records into a different target model, and it was hard to get comfortable with broad changes by reading code and a handful of assertions alone. Snapshotting the transformed data helped a lot.
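The mechanics are simple enough to sketch in a few lines. This is not our actual code; the transformation, field names, and helper below are all hypothetical, and real projects would usually reach for a snapshot library rather than hand-rolling one. But it shows the shape of the idea: serialize the transformed output deterministically, store it, and compare on later runs.

```python
import json
from pathlib import Path

# Hypothetical transformation: a legacy source record mapped into a target model.
def transform(record: dict) -> dict:
    return {
        "id": record["legacy_id"],
        "name": record["first_name"] + " " + record["last_name"],
        "status": "active" if record["flags"] & 1 else "inactive",
    }

def check_snapshot(name: str, actual: dict, snapshot_dir: Path = Path("snapshots")) -> None:
    """Compare `actual` against a stored JSON snapshot, recording it on first run."""
    snapshot_dir.mkdir(exist_ok=True)
    path = snapshot_dir / f"{name}.json"
    # sort_keys + indent make the serialization deterministic and diff-friendly.
    rendered = json.dumps(actual, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(rendered)  # first run: record the baseline
        return
    assert path.read_text() == rendered, f"snapshot {name} changed"

check_snapshot("basic_customer", transform(
    {"legacy_id": 7, "first_name": "Ada", "last_name": "Lovelace", "flags": 1}
))
```

When the transformation changes, the failing assertion surfaces as a diff against the stored file, which is exactly the artifact a reviewer ends up reading.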

That part is pretty easy to buy into.

The harder part was keeping those snapshots useful once we had a lot of them.

I think that is the part that gets skipped sometimes. People discuss why snapshots are helpful, which is fair, but spend less time on what happens when the diffs get noisy and the tests start feeling expensive to maintain.

We did not have a snapshot problem so much as a diff problem.

Once we had a few good examples in place, the value was obvious. A broad change to the output showed up clearly. We could review it, decide whether it was correct, and move on. That was great.

The trouble was that not every diff was like that.

Some were clean and easy to understand. Others had a lot of junk in them. The output was technically accurate, but it was full of changes that did not help explain the behavior we actually cared about. When that happened often enough, the tests started to feel heavier than they should have.

That was the pattern we kept running into. The snapshots themselves were fine. The diffs were what made them annoying.

A bad diff makes a good test feel bad.

The most frustrating version of this is when a test fails and you open the diff already half-expecting that most of it will not matter.

That is not a great place to be.

Once reviewers get used to seeing a lot of churn, it becomes easier to skim, easier to approve, and easier to miss something real. The test still exists, but it is doing less work for the team.

That is why I started thinking about snapshots less as stored output and more as something written for a reviewer. If a human is supposed to look at the diff and make a call, the diff needs to be readable enough to support that.

Here’s where the noise came from.

On our project, the noise was usually not anything dramatic. It was small stuff that stacked up:

  • Values that changed but were not important to the scenario
  • Setup that varied from test to test in ways we did not mean
  • Output that included a lot more detail than the review really needed
  • Related data that shifted and dragged other parts of the snapshot along with it

None of that made the snapshot wrong. It just made it harder to review.

I think that is an important distinction. A snapshot can be faithful to the system and still be a lousy artifact for code review.

Here’s what helped.

The most useful change we made was pretty simple. We got pickier about what kind of diff we were willing to live with.

That pushed us into a few good habits.

Stable setup helped more than I expected.

If a test is about one behavior, the rest of the test data should stay out of the way.

We spent time improving our builders and defaults so the baseline output stopped moving around for accidental reasons. In a few places, that meant pinning fixed catch-all values just to keep unrelated parts of the output steady.

This was not exciting work, but it paid off fast. A lot of snapshot churn was really setup churn.
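A builder with pinned defaults is one way that habit can look. Everything here is illustrative, not our project's real fields: the point is that every value is fixed, including anything time-like, so the only movement in the snapshot comes from what the test deliberately overrides.

```python
# Hypothetical builder: fixed defaults keep unrelated output stable,
# and each test overrides only what its scenario is actually about.
def build_source_record(**overrides) -> dict:
    record = {
        "legacy_id": 1,
        "first_name": "Test",
        "last_name": "User",
        "country": "NL",
        "flags": 0,
        "created_at": "2020-01-01T00:00:00Z",  # pinned, never "now"
    }
    record.update(overrides)
    return record

# A test about activation behavior touches only the flag it cares about:
record = build_source_record(flags=1)
```

The discipline is in the defaults: the moment a builder reaches for the current time, a random ID, or per-test incidental values, that churn reappears in every snapshot downstream.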

We stopped assuming every real value belonged in the snapshot.

This was another useful adjustment.

At first it is tempting to think that if a value is real output, it should stay in the snapshot. In practice, that made some of our tests worse. Some values were worth keeping because they helped explain behavior. Some changed often and mostly made the diff longer.

Once we got more comfortable normalizing or omitting the second kind, the snapshots got easier to read.

That did not make the tests less honest. It made them easier to use.
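The normalization step itself can be small. This is a sketch under assumed field names (nothing here is from our actual schema): volatile fields keep their presence in the snapshot but lose their churning values, and fields that carry no review value are omitted outright.

```python
VOLATILE_FIELDS = {"updated_at", "etag"}        # assumed, project-specific
OMITTED_FIELDS = {"internal_trace_id"}          # assumed, project-specific

def normalize(output: dict) -> dict:
    """Scrub fields whose exact values don't help a reviewer, before snapshotting."""
    cleaned = {}
    for key, value in output.items():
        if key in OMITTED_FIELDS:
            continue                             # drop entirely
        if key in VOLATILE_FIELDS:
            cleaned[key] = "<normalized>"        # keep presence, drop churn
        elif isinstance(value, dict):
            cleaned[key] = normalize(value)      # recurse into nested output
        else:
            cleaned[key] = value
    return cleaned
```

Replacing a volatile value with a sentinel rather than deleting it is a deliberate choice: the snapshot still tells a reviewer the field exists, without failing every time its value moves.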

Ignoring fields needed some discipline.

It is also pretty easy to overcorrect here and ignore too much.

The question that helped us was: if this changed, would I want someone reviewing the diff to notice?

If the answer was yes, we kept it. If the answer was no, it was a candidate for normalization or omission. That was not a perfect rule, but it kept us from going too far in either direction.

The value showed up most in code review.

The payoff for all of this was not that the tests looked nicer. The payoff was that they became more useful during review.

Before we cleaned them up, a failing snapshot often meant digging through a bunch of churn to find the one change that mattered. After we cleaned them up, the diffs got a lot faster to read. That made reviews easier and made refactors feel less risky.

It also helped with something that is hard to measure but easy to feel. The domain got easier to work in. When the snapshots were clear, they doubled as examples of how the transformation behaved in realistic cases. That was often more useful than reading a long test full of targeted assertions and trying to piece the full output together in my head.

We still needed focused assertions.

I do not think snapshot tests replaced anything for us.

If I want to prove one narrow rule, I still want a direct assertion. That is usually the clearest way to write the test.

Snapshots helped with a different problem. They were good at showing the full result after a change. That mattered a lot in a codebase where one small update could ripple through a large output.

Using both together worked better than trying to make one style do all the jobs.
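In code, the combination can look like this. The transformation and names are illustrative, not our real implementation: one direct assertion pins the narrow rule the test exists for, and a full-output comparison plays the snapshot role alongside it.

```python
import json

def transform(record: dict) -> dict:
    # Hypothetical transformation used for illustration.
    return {
        "id": record["legacy_id"],
        "status": "active" if record.get("flags", 0) & 1 else "inactive",
    }

def test_activation_flag():
    result = transform({"legacy_id": 9, "flags": 1})
    # Focused assertion: proves the one narrow rule this test is about.
    assert result["status"] == "active"
    # Snapshot-style check: the full output, compared against a stored
    # expectation that reviewers read as a diff when behavior changes.
    expected = json.dumps({"id": 9, "status": "active"}, sort_keys=True)
    assert json.dumps(result, sort_keys=True) == expected

test_activation_flag()
```

The focused assertion keeps failing for exactly one reason; the full comparison catches the ripple effects that no single targeted assertion was written to see.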

Here’s what I would do next time.

If I were starting this over, I would spend less time talking about snapshot testing in general and more time talking about diff quality right away. Getting a snapshot test running is not usually the hard part. Keeping the output clean enough that people continue to trust it is where more of the work is.

That work was worth it for us. Once the diffs got better, the tests felt lighter, reviews got faster, and the whole thing became more useful.

So yes, snapshot tests can be great outside UI work too. But if the diffs are noisy, people will feel that long before they can explain it. That is the part I would pay attention to earlier next time.
