The Tradeoff of Multiple Repositories

More often than I expect, I come across software projects that consist of multiple source control repositories. The reasons vary. Perhaps it’s thought that the web frontend and backend aren’t tightly coupled and don’t need to be in the same repository. Perhaps there’s code that’s meant to be used throughout an entire organization. Regardless, there are real costs involved in the decision to have a development team work in distinct, yet related, repositories. I believe these costs are often overlooked.

Double (or n Times) the Gruntwork

The most obvious cost involved is additional gruntwork. Let’s imagine a project with a mobile app and web service, each having its own Git repository. When it’s time to start a new feature, the feature branch will need to be created twice. When the work is finished, two pull requests will need to be made. When it’s appropriate to make a commit, it might need to be done twice. When it’s time to push, it might need to be done twice. To help manage all of this, an extra terminal might be appropriate.
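To make the duplication concrete, here is a minimal sketch that lists the Git commands needed to start and publish one feature branch in each repository of a project. The repository names (`project`, `mobile-app`, `web-service`), the branch name, and the helper `feature_commands` are hypothetical and only for illustration; the Git subcommands themselves are standard.

```python
def feature_commands(repos, branch):
    """Return the git commands needed to start and publish `branch` in each repo."""
    commands = []
    for repo in repos:
        # `git -C <path>` runs the command as if started inside that directory.
        commands.append(f"git -C {repo} switch -c {branch}")
        commands.append(f"git -C {repo} commit -am 'Work on {branch}'")
        commands.append(f"git -C {repo} push -u origin {branch}")
    return commands


# Hypothetical layouts: one combined repository versus a split project.
single = feature_commands(["project"], "new-login")
split = feature_commands(["mobile-app", "web-service"], "new-login")

print(len(single), len(split))  # prints "3 6"
```

The count grows linearly with the number of repositories, and that is before counting the extra pull requests, which live outside Git itself.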

Individually, none of these costs is very significant. Collectively, they represent a moderate inconvenience and cognitive burden. I’ve seen developers weigh this and decide it’s worth the cost, because they are trying to achieve some other ideal.

Ultimately, these inconveniences are just symptoms of a more fundamental—and easily overlooked—tradeoff.

Context: Not Version-Controlled

A repository is essentially a set of snapshots in time. For any commit, it’s easy to see not only what changes were made, but also precisely what other files existed and what they contained at that point in time. This is pretty obvious, after all. It’s one of the biggest selling points of version control.

With a project consisting of one single repository, that snapshot encapsulates everything there is to know about the source code. Once there are multiple repositories involved in a single project, this context is fragmented.

This fragmentation manifests in various ways. Let’s look at some examples:

  • When moving code between repositories, neither one has knowledge of the other. Information about where the code came from or went is lost.
  • If a branch in your frontend repo depends on the server running a corresponding branch, there’s no native or reasonable way to express that relationship. Information is lost.

The Real Tradeoff of Multiple Repositories

Breaking a project into multiple repositories involves a fundamental tradeoff. By doing so, information about the broader context of the application is pushed entirely outside of version control.

Although it’s possible to work to counteract this, for example, by establishing team practices, using Git submodules, or building custom machinery, it will require work. That’s work spent to regain what you get for free by using a single repository.
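As one illustration of what such custom machinery might look like, here is a minimal sketch, assuming the team keeps a small manifest that pins the commit each repository is expected to be at. The function `find_mismatches`, the repository names, and the commit IDs are all hypothetical. In practice the "actual" values would have to be collected from each working copy, for example with `git rev-parse HEAD`.

```python
def find_mismatches(pinned, actual):
    """Compare pinned commit IDs against the commits actually checked out.

    Returns a dict mapping each out-of-sync repo to (expected, found).
    A repo missing from `actual` is reported with found=None.
    """
    mismatches = {}
    for repo, expected in pinned.items():
        found = actual.get(repo)
        if found != expected:
            mismatches[repo] = (expected, found)
    return mismatches


# Hypothetical manifest: which commits are known to work together.
pinned = {"frontend": "a1b2c3d", "server": "9f8e7d6"}
# Hypothetical state of a developer's machine.
actual = {"frontend": "a1b2c3d", "server": "0000000"}

print(find_mismatches(pinned, actual))
# prints {'server': ('9f8e7d6', '0000000')}
```

Even this small check has to be written, maintained, and kept up to date by someone, which is exactly the work a single repository makes unnecessary.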

Therefore, the most likely place that this information will move is into the culture and individual minds of the team. This is a much more ephemeral and unreliable place than a source repository. It makes it harder to onboard new developers and coordinate things like continuous integration.

Conclusion

It’s up to your unique situation whether it’s a win or loss to split your code into multiple repositories, but the costs are both real and easily overlooked. I’d strongly suggest weighing these tradeoffs thoughtfully. And, if you find yourself on a project where these costs are bringing you down, I’ve written a blog post on how to super-collide your repositories together.

Conversation
  • I’m not sure whether the purpose of the article is to attract comments, but I have to disagree with this.

    Git by nature is a distributed version control system. Breaking problems into smaller ones is a commonly accepted way to approach things. It seems what you are suggesting is that a monolith is better than a micro-services based approach. The approach you suggest is probably a good way to go for a very small application, such as a website with one or two developers, but for anything larger, I would avoid monolithic design of any sort.

    You even explained in the blog post that Git offers the tooling to handle separation of components into multiple repositories. In addition, there are various tools outside of version control systems to accomplish this, and the best part is, you can store the information from those tools in Git.

    If you need the context specific history, you achieve this with submodules as well.

    Coordination of continuous integration is also better off when components are divided into separate repositories. It allows you to have layers of testing instead of just one set of tests.

    In the end, it boils down to the size of your project, like you said. From my personal experience over 10 years of seeing software projects of various sizes, only small projects, such as web pages for individuals and SMEs, should follow the suggestion in this post.
