Open Source Basics: NPM Edition

As software developers, we’ve long used third-party code in our day-to-day work, but package managers and searchable repositories have made that code much easier to find and integrate.

Inevitably, there comes a time when our unique use of a library exposes a new bug, or we find that we could almost use that sweet tool if only it did this one tiny thing differently. When that happens, we find ourselves popping open the hood and making changes to a third-party dependency.

The same modern cushy systems also make it easier to maintain these changes, collaborate, and contribute our changes upstream. This is what I’m going to talk about today.

I’ll use Node.js’s npm in this example, but the process is similar for other languages’ packaging systems like RubyGems or PyPI.

Fork It

So we’ve decided to make a change to a library. Say we’re using the npm package foo, referenced in our application’s package.json file like this:


"devDependencies": {
    "foo": "1.2.3",
    ...

The first step, of course, is to clone the repository. Make sure to check out the same revision that your application is currently using. (It’s probably a recent release, not trunk.)
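
As a rough sketch of that step, assuming the library tags its releases as v1.2.3 (many projects do, but check git tag for the real name): to keep the script runnable offline, it first builds a throwaway stand-in for the upstream repository in a temp directory. In practice we would clone the project's real URL instead. The paths, the VERSION file, and the branch name my-changes are all hypothetical.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch area; in real use this would be your repos directory.
work=$(mktemp -d)

# Stand-in for the upstream project (assumption: releases are tagged vX.Y.Z).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "1.2.3" > VERSION
git add VERSION
git commit -q -m "Release 1.2.3"
git tag v1.2.3
echo "1.3.0-dev" > VERSION
git commit -q -am "Start work on the next release"

# The actual step: clone, then branch from the release our app uses,
# not from the tip of the default branch.
git clone -q "$work/upstream" "$work/foo"
cd "$work/foo"
git checkout -q -b my-changes v1.2.3

# Shows the version from the tagged release, not the newer commit.
cat VERSION
```

Branching right away (rather than sitting on a detached tag checkout) gives our changes a named home that we can push later.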

With npm, we can reference our local copy like this:


"foo": "file:/Users/johnruble/repos/foo",

This is a memorable but somewhat blunt approach, with a couple of caveats:

  • file: sources do not know about Git. They’re just looking at what’s on disk, so don’t try to reference a specific branch or revision.
  • This path is simply a source we can install from. To pick up changes, we’ll need to re-run npm install and rebuild our app. If you find yourself doing this repeatedly, look into npm link.

Now that we can build our app using our own custom version of the third-party component, we’re ready to dive in.

Eventually, our experiment is successful. We’ve made changes, and we want to use them in development (and eventually production) builds of our app. After pushing our branch to another remote where it can live for a while, we can reference our repository in our app’s package.json, so that it can be reached by other developers, CI, and deployment:


"foo": "jrr/foo#branch-with-my-changes", //(github shorthand), or
"foo": "git://private.repo.com/jrr/foo.git#branch-with-my-changes",
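
The push itself might look something like the sketch below. The remote name fork, the file index.js, and the commit messages are made up for illustration, and a local bare repository stands in for the hosted fork so the script runs without network access. In real use, the remote URL would be wherever our fork lives.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Stand-in for the hosted fork (hypothetical; a real fork would be a URL).
git init -q --bare "$work/fork.git"

# Stand-in for our local clone, with our changes on their own branch.
git init -q "$work/foo"
cd "$work/foo"
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "original" > index.js
git add index.js
git commit -q -m "Release 1.2.3"
git checkout -q -b branch-with-my-changes
echo "patched" > index.js
git commit -q -am "Handle our edge case"

# The actual step: add the fork as a second remote and push the branch,
# setting it as the upstream for future pushes.
git remote add fork "$work/fork.git"
git push -q -u fork branch-with-my-changes

# List the branches now available on the fork.
git ls-remote --heads fork
```

Once the branch exists on the remote, the #branch-with-my-changes suffix in package.json has something to resolve against.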

Keeping a separate fork allows us to keep moving forward for now, but eventually, we’ll probably want to…

Unfork It

The big downside to keeping a fork like this long-term is that it puts friction between us and future updates from upstream. We’re going to want those bug fixes and new features, but it’s a tedious chore to switch over to the other repo, reintegrate our changes, update the reference from our application, etc.

On the flip side, there are several advantages to making our fork obsolete by contributing the work upstream:

  1. Code review: The changes we made are in somebody else’s code, unfamiliar to us. If we submit our changes upstream, we get review from the experts.
  2. That small change we made is a tiny piece of custom software, and it has a disproportionately large maintenance cost. What if somebody else could maintain it for free?
  3. That cool thing we built? We get to share it with the world!

So, we’ve decided to submit our changes upstream. How do we do it?

Get prepared

  • We’ve been working from a tagged release of the library, but changes are typically made on a develop or master branch. Merge the latest code from upstream into your branch (or better yet, rebase onto it).
  • Run the library’s tests to make sure it’s still behaving correctly.
  • Use this updated version of the library in your app, and run your app’s tests to make sure the library is still behaving the way you want.
  • Clean up your branch (squash commits, remove commented code, etc.). It may be easier to just check out a new branch from master and apply all your changes to it in one commit.
  • Write tests! This is critical since we’re working with code that 1) we depend on, and 2) is not under our control. In particular, write tests to specify and document the behavior we need, and to defend our changes against accidental regression in the future.
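
The first of those steps, catching up with upstream, could be sketched roughly as follows. Again, a throwaway repository stands in for upstream so the script runs offline, and the file names, branch name, and commit messages are hypothetical. Because the clone here came straight from upstream, the remote is called origin. If we had cloned our fork instead, we would add a separate remote for upstream and rebase onto that.

```shell
#!/bin/sh
set -eu

work=$(mktemp -d)

# Stand-in upstream with a tagged release (hypothetical; offline only).
git init -q "$work/upstream"
cd "$work/upstream"
git config user.email "dev@example.com"
git config user.name "Example Dev"
echo "library code" > lib.txt
git add lib.txt
git commit -q -m "Release 1.2.3"
git tag v1.2.3
# The default branch may be master or main, depending on git configuration.
default_branch=$(git symbolic-ref --short HEAD)

# Our clone, with one commit of our own on top of the release.
git clone -q "$work/upstream" "$work/foo"
cd "$work/foo"
git config user.email "dev@example.com"
git config user.name "Example Dev"
git checkout -q -b my-changes v1.2.3
echo "our fix" > fix.txt
git add fix.txt
git commit -q -m "Add our fix"

# Meanwhile, upstream moves on.
cd "$work/upstream"
echo "new upstream work" > new.txt
git add new.txt
git commit -q -m "Upstream work after 1.2.3"

# The actual step: fetch the latest upstream code and rebase onto it.
cd "$work/foo"
git fetch -q origin
git rebase -q "origin/$default_branch"

# Our commit now sits on top of the latest upstream commit.
git --no-pager log --oneline

# Next we would run the library's test suite (often npm test);
# the stand-in repository has none, so it is left out here.
```

Rebasing keeps our changes as a tidy series of commits on top of current upstream, which tends to make the eventual pull request easier to review than a branch with merge commits in it.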

Create the pull request

Now we’re ready to create a pull request. Make it as nice for the maintainers as you can. Spend a few minutes looking to see if the project has any guidance for contributors. Fill out their template, file an issue, etc. Write up your changes for the project’s changelog. Ask for feedback on your implementation.

Wrap it up

With luck, after some iteration, our changes will be accepted. After the pull request has been merged, we can switch our app’s package reference back to the upstream repository at a specific commit:


"foo": "git+https://github.com/user/foo.git#3f25967e",

Finally, when our changes are released with the library’s next version, we can switch back to vanilla upstream:


"devDependencies": {
     "foo": "1.2.4",

It feels good to remove the lingering risk that the fork represented for our project, and also to know that other developers are using our code!