Reproducible Builds with NPM (And Why You Should Use Yarn Instead)

If you’ve ever worked on a large JavaScript project with many dependencies, you know how difficult it can be to have reliable, repeatable builds. I’ve seen many projects resort to committing their node_modules directory. While this approach will give you dependable builds, it is quite annoying and causes many problems of its own, the least of which is that it can easily add tens of thousands of files to an otherwise clean repo.

I spent a lot of time and effort trying to get nice reproducible builds on a recent JavaScript project, and I actually succeeded! Less than a week later, Yarn was released, solving all of these problems much more elegantly.

In this post, I’ll show how you can get reproducible builds in NPM (just to prove that it can actually be done) and demonstrate how much nicer Yarn is than the alternative.

Basically, two goals are not met by NPM out of the box:

  • Reproducibility: I need to be able to get the exact same set of dependencies on multiple machines.
  • Reliability: I need a way to make a change and push out a new build, even if NPM is down (or there’s another left-pad incident). Basically, I need to be able to do a build offline.

Pin Direct Dependencies

The first obvious step is to pin down the versions of your direct dependencies in your package.json by using exact versions (e.g. 1.0.0) rather than ranges (e.g. ^1.0.0). Unfortunately, this won’t help with transitive dependencies: doing an NPM install on another machine will likely still pull down different versions of them. So how do we pin down the versions of our transitive dependencies?
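Here is a rough sketch of what pinning direct dependencies can look like. npm's --save-exact flag records an exact version instead of a caret range, and a quick grep can flag ranges that are still in the file. The helper names pin_install and find_ranges are made up for this example, and the grep is only a heuristic that looks for values starting with ^ or ~.

```shell
# Hypothetical helper: install a package and record its exact version
# (e.g. "1.1.3") in package.json instead of a caret range ("^1.1.3").
pin_install() {
  npm install --save --save-exact "$@"
}

# Hypothetical helper: print package.json lines whose value starts with
# ^ or ~, i.e. entries that are ranges rather than exact versions.
# Heuristic only: it greps the text and does not parse the JSON.
find_ranges() {
  grep -nE '":[[:space:]]*"[~^]' "$1"
}
```

With these defined, something like pin_install left-pad@1.1.3 followed by find_ranges package.json would show which entries still need attention. An empty result only means no ^ or ~ ranges were found, not that every entry is an exact version.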

Shrinkwrap

To deal with transitive dependencies, NPM implemented a feature called shrinkwrap, which gets you a little further.

It pins down transitive dependencies (yay!), but…

  • Your shrinkwrap file does not update automatically, and you are not warned when it’s out of date. You just have to remember to update it when you add or remove a dependency.
  • Optional dependencies of transitive dependencies still break things. Suppose I do an NPM install on my Mac and it installs an optional dependency that only works on a Mac. If I then shrinkwrap (which makes the optional dependency non-optional), commit, and push to CI, which runs Linux, the build breaks because it can’t build that dependency.
  • Dependencies are still pulled from the main NPM package repository, so we’re still vulnerable to a left-pad incident.
  • If I use shrinkwrap and a dependency uses an empty version string to specify the version of one of its dependencies, NPM breaks.
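Since nothing warns you about a stale shrinkwrap file, a small local check can at least catch the most obvious case. The sketch below is not something NPM provides. The function name shrinkwrap_is_fresh is hypothetical, and it only compares file modification times, so it would suit a pre-commit hook better than a fresh CI checkout, where timestamps mean little.

```shell
# Hypothetical helper: succeed only if npm-shrinkwrap.json exists in the
# given directory (default: current directory) and package.json has not
# been modified more recently than it.
# Heuristic only: it compares modification times, not contents.
shrinkwrap_is_fresh() {
  dir="${1:-.}"
  if [ ! -f "$dir/npm-shrinkwrap.json" ]; then
    echo "npm-shrinkwrap.json is missing; run: npm shrinkwrap" >&2
    return 1
  fi
  if [ "$dir/package.json" -nt "$dir/npm-shrinkwrap.json" ]; then
    echo "package.json changed after the last shrinkwrap; run: npm shrinkwrap" >&2
    return 1
  fi
  return 0
}
```

A check like this can miss real drift (for example, if both files are touched at the same time), so treat it as a reminder rather than a guarantee.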

Shrinkpack

I was able to fix two of the above issues with a tool called shrinkpack, which is actually pretty cool. Shrinkpack:

  • Provides support for offline builds, offering a clean way of bundling the tarballs of your dependencies in with your project for offline builds. This gives many of the advantages of committing your node_modules directory, but with a much smaller footprint. You only need to commit one file per dependency.
  • Fixes the issue with optional dependencies (with a little manual effort when you run into them).
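The workflow this implies can be sketched as a single helper that runs both tools in order. The function name relock is made up for this example, and it assumes shrinkpack is already installed and on the PATH (for instance via npm install -g shrinkpack). Exact flags may differ between NPM and shrinkpack versions.

```shell
# Hypothetical helper: refresh the lock file and the bundled tarballs
# together, stopping at the first step that fails.
relock() {
  # Regenerate npm-shrinkwrap.json from the currently installed packages.
  npm shrinkwrap || return 1
  # Run shrinkpack in the project directory to bundle the dependency
  # tarballs alongside the shrinkwrap file.
  shrinkpack || return 1
}
```

After running relock, the updated npm-shrinkwrap.json and the tarballs that shrinkpack produced would be committed together.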

Hurray! Now we can have repeatable offline builds! Almost…

Things that still suck:

  • I have to remember to both shrinkwrap and shrinkpack every time I add a dependency. And if I forget, I might not discover my mistake until much later.
  • Transitive optional dependencies don’t ruin everything now, but I have to add an entry to the package.json every time I encounter one that breaks CI.
  • The empty version string problem is still there.

Preventing Accidental Deployment of Irreproducible Builds

The first one turns out to be a major problem because it means my builds are only reproducible when I remember to shrinkwrap + shrinkpack before I commit. In practice, irreproducible builds were rarer, but still a problem. So I came up with an ugly, but workable solution that verifies on CI that a build is reproducible.

When NPM install runs on the CI machine, it uses this command:

HTTPS_PROXY=https://you.probably.forgot.to.shrinkpack.your.depdendencies.before.pushing.example.com npm install

This prevents NPM from talking to the main NPM repository and forces it to do an offline build. If you forget to shrinkwrap + shrinkpack before committing, the build will break with an error message saying it can’t connect to https://you.probably.forgot.to.shrinkpack.your.depdendencies.before.pushing.example.com.
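As a sketch, the CI step could be wrapped in a small function that adds a friendlier hint on failure. The function name ci_install and the messages are my own invention for illustration. The proxy URL is the same deliberately unreachable placeholder used in the command above.

```shell
# Hypothetical CI step: run npm install with HTTPS_PROXY pointing at an
# unreachable placeholder host, so the install can only succeed if every
# dependency is already bundled in the repo.
ci_install() {
  proxy="https://you.probably.forgot.to.shrinkpack.your.depdendencies.before.pushing.example.com"
  if HTTPS_PROXY="$proxy" npm install; then
    echo "offline install succeeded"
  else
    echo "offline install failed: did you shrinkwrap + shrinkpack before pushing?" >&2
    return 1
  fi
}
```

The non-zero return value is what makes the CI job fail. The extra message just saves a teammate from having to decode the proxy hostname in NPM's own error output.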

Whew. There you have it. A messy, involved, but fairly workable solution.

The Yarn Alternative

Now compare that to Yarn. To do the same thing with Yarn (see this blog post):

  • Run:
    yarn config set yarn-offline-mirror ./yarn_packages;
    yarn install;
  • Commit yarn_packages and yarn.lock
  • Now for your CI builds you can run:
    yarn --offline;

And that’s it.

Oh, also Yarn is like 10 times faster, but the speed is not why I use it.