In this post, I’ll show how you can get reproducible builds in NPM (just to prove that it can actually be done) and demonstrate just how much nicer Yarn is than the alternative.
Basically, two goals are not met by NPM out of the box:
- Reproducibility: I need to be able to get the exact same set of dependencies on multiple machines.
- Reliability: I need a way to make a change and push out a new build, even if NPM is down (or there’s another left-pad incident). In other words, I need to be able to do a build offline.
Pin Direct Dependencies
The first obvious step is to pin your direct dependencies to exact versions in your package.json (e.g. 1.0.0 rather than a range like ^1.0.0, which allows any compatible 1.x release). Unfortunately, this won’t help with transitive dependencies. Doing an NPM install on another machine will likely pull down different versions of those. So how do we pin down the versions of our transitive dependencies?
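As a minimal sketch of exact pinning, npm's --save-exact flag records the version without a range prefix. The function name pin_dependency and the left-pad 1.3.0 example are placeholders of mine, not something from a real project.

```shell
# Install a package and record its exact version in package.json
# (no ^ or ~ prefix). The function name is hypothetical.
pin_dependency() {
  # $1 = package name, $2 = exact version to pin
  npm install --save --save-exact "${1}@${2}"
}
```

For example, pin_dependency left-pad 1.3.0 should leave "left-pad": "1.3.0" in the dependencies section.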
To deal with transitive dependencies, NPM implemented a feature called shrinkwrap, which gets you a little further.
It pins down transitive dependencies (yay!), but…
- Your shrinkwrap file does not update automatically, and you are not warned when it’s out of date. You just have to remember to update it when you add or remove a dependency.
- Optional dependencies of transitive dependencies still break things. Say I do an NPM install on my Mac and it installs an optional dependency that only works on a Mac. When I shrinkwrap, that optional dependency becomes non-optional. I then commit and push to CI, which runs Linux, and the build breaks because it can’t build the dependency.
- Dependencies are still pulled from the main NPM package repository, so we’re still vulnerable to a left-pad incident.
- If I use shrinkwrap and a dependency uses an empty version string to specify the version of one of its dependencies, NPM breaks.
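For reference, generating the shrinkwrap file is a single command run after an install. A rough sketch, where the wrapper name refresh_shrinkwrap is made up for illustration:

```shell
# Reinstall from package.json, then write npm-shrinkwrap.json
# describing the dependency tree that was just installed.
# The function name is hypothetical.
refresh_shrinkwrap() {
  npm install && npm shrinkwrap
}
```

The resulting npm-shrinkwrap.json is the file you commit, and the one you have to remember to regenerate after every dependency change.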
I was able to fix two of the above issues with a tool called shrinkpack, which is actually pretty cool. Shrinkpack:
- Provides support for offline builds, offering a clean way of bundling the tarballs of your dependencies in with your project for offline builds. This gives many of the advantages of committing your node_modules directory, but with a much smaller footprint. You only need to commit one file per dependency.
- Fixes the issue with optional dependencies (with a little manual effort when you run into them).
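A sketch of how the two tools chain together, assuming shrinkpack is installed globally (npm install -g shrinkpack) and is run with no arguments from the project root. The function name pack_dependencies is mine.

```shell
# Regenerate the shrinkwrap file, then let shrinkpack download the
# tarballs and rewrite the shrinkwrap file to point at the local copies.
# Assumes a globally installed shrinkpack; the function name is hypothetical.
pack_dependencies() {
  npm shrinkwrap && shrinkpack
}
```

As far as I know, shrinkpack keeps the tarballs in a node_shrinkwrap directory by default, so that directory gets committed alongside npm-shrinkwrap.json.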
Hurray! Now we can have repeatable offline builds! Almost…
Things that still suck:
- I have to remember to both shrinkwrap and shrinkpack every time I add a dependency. And if I forget, I might not discover my mistake until much later.
- Transitive optional dependencies don’t ruin everything now, but I have to add an entry to the package.json every time I encounter one that breaks CI.
- The empty version string problem is still there.
Preventing Accidental Deployment of Irreproducible Builds
The first one turns out to be a major problem because it means my builds are only reproducible when I remember to shrinkwrap + shrinkpack before I commit. In practice, irreproducible builds were rarer, but still a problem. So I came up with an ugly, but workable solution that verifies on CI that a build is reproducible.
When NPM install runs on the CI machine, it uses this command:
HTTPS_PROXY=https://you.probably.forgot.to.shrinkpack.your.depdendencies.before.pushing.example.com npm install
This prevents NPM from talking to the main NPM repository and forces it to do an offline build. If you forget to shrinkwrap + shrinkpack before commit, the build will break with an error message saying it can’t connect to https://you.probably.forgot.to.shrinkpack.your.depdendencies.before.pushing.example.com.
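If the raw connection error is too cryptic for your teammates, the same command can be wrapped so that CI prints a plainer hint. This is only a sketch: the function name ci_offline_install and the wording of the message are my own additions.

```shell
# Run npm install with HTTPS traffic routed to a proxy host that does
# not exist, so any attempt to reach the registry fails.
# The function name and the error message are hypothetical.
ci_offline_install() {
  if ! HTTPS_PROXY="https://you.probably.forgot.to.shrinkpack.your.depdendencies.before.pushing.example.com" npm install; then
    echo "Offline install failed: did you shrinkwrap + shrinkpack before pushing?" >&2
    return 1
  fi
}
```

The CI step then calls ci_offline_install instead of npm install, and a non-zero exit status fails the build.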
Whew. There you have it. A messy, involved, but fairly workable solution.
The Yarn Alternative
Now compare that to Yarn. To do the same thing with Yarn (see this blog post):
- Run yarn config set yarn-offline-mirror ./yarn_packages
- Commit yarn_packages and yarn.lock
- Now for your CI builds you can run:
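Assuming Yarn 1.x (classic), whose install command accepts an --offline flag, the whole sequence might look like the sketch below. The function names are made up for illustration, and the CI command is an assumption based on that flag.

```shell
# One-time setup on a developer machine: tell Yarn where to keep
# package tarballs, then install so the mirror directory gets filled
# (on an existing checkout this may need a clean node_modules first).
# Function names are hypothetical.
setup_offline_mirror() {
  yarn config set yarn-offline-mirror ./yarn_packages &&
  yarn install
}

# CI step: install using only yarn.lock and the committed tarballs,
# without contacting the registry (assumed Yarn 1.x flag).
ci_install() {
  yarn install --offline
}
```

If the mirror setting ends up in a .yarnrc file outside the repository, CI will not see it, so it is worth checking where Yarn wrote it.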
And that’s it.
Oh, also Yarn is like 10 times faster, but the speed is not why I use it.