Control Repository Size with SVN Sparse Checkout

Atomic Object was founded by a computer science professor, and during the company’s earlier stages, it only employed developers. That legacy is still felt today in the tooling we use to run the business. Case in point: we use a code repository tool, Subversion (SVN), to securely store, version, and control access to some business documents. Documents stored in SVN include employee information and sales contracts.

In my role as managing partner, I’m exposed to nearly all aspects of our business. So naturally, I need copies of all of these repositories. Unfortunately, that’s a problem for my 256 GB MacBook Pro hard drive these days, as the sales contract repository has ballooned to ~20 GB. Mercifully, Sivhuan Sera gave me an easy way to reduce the size of my local copy of that repository while still maintaining my ability to work with it. In this blog post, I’ll show you how.

SVN Sparse Checkout

The sales repository I am dealing with has many folders, each containing binary and text files. There are 1,421 directories at the root level, and I interact with fewer than 100 of them in a given year. My goal, therefore, has been to find a way to download only the things I need as the needs arise (i.e., lazily). Sivhuan helped me with this by pointing me to SVN’s Sparse Directories.

When you check out a repository in SVN, it will, by default, recursively check out all subdirectories until the entire structure is copied to your local disk. This is usually a good thing. However, I wanted most directories to be shallow copies and only a few to be fully copied locally. More specifically, I wanted to start by downloading all of the top-level directories without their contents.

SVN checkout has a handy --depth option that accepts the following values (from the documentation):

  • --depth empty: Include only the immediate target of the operation, not any of its file or directory children.
  • --depth files: Include the immediate target of the operation and any of its immediate file children.
  • --depth immediates: Include the immediate target of the operation and any of its immediate file or directory children. The directory children will themselves be empty.
  • --depth infinity: Include the immediate target, its file and directory children, its children’s children, and so on to full recursion.

Immediates handles the situation I am in perfectly, with the following checkout command:

svn checkout --depth immediates [url]

After running the command, I have a trunk directory containing 1,421 empty folders.
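
As a minimal sketch, the checkout step can be wrapped in a small shell function. The function name, the example URL, and the target directory are placeholders made up for illustration, not the real repository.

```shell
# Sketch: check out only the top level of a repository.
# sparse_checkout is a hypothetical helper; the URL in the example is a placeholder.

sparse_checkout() {
  local repo_url="$1"   # repository URL to check out (placeholder)
  local target="$2"     # local working copy directory, e.g. trunk
  # --depth immediates: fetch the target's files plus empty subdirectories
  svn checkout --depth immediates "$repo_url" "$target"
}

# Example (not run automatically):
#   sparse_checkout https://svn.example.com/sales/trunk trunk
#   ls trunk | wc -l    # counts the top-level entries that were created
```

Keeping the command in a function makes it easy to reuse the same depth setting for other large repositories.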

Mixing Things Up

A local repository copy containing only 1,421 empty folders has a small footprint, but it’s not very useful on its own. The beauty of SVN sparse checkout is that you can mix depths. For folders where I’m actively working, I can use the svn update command to change the depth setting to infinity:

svn update --set-depth=infinity trunk/dir1
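
The same idea extends to several folders at once, and a folder can be shrunk again when the work is done. The sketch below is illustrative only: the function names and any paths passed to them are hypothetical. It also assumes an SVN client recent enough to reduce depth with --set-depth, which older clients could only increase.

```shell
# Sketch: deepen a few folders, inspect a folder's depth, and shrink one again.
# Function names and any paths passed in are hypothetical.

expand_dirs() {
  local dir
  for dir in "$@"; do
    # Fetch the full contents of this folder and everything beneath it
    svn update --set-depth infinity "$dir"
  done
}

show_depth() {
  # svn info reports a "Depth:" line only for non-default (sparse) depths
  svn info "$1" | grep '^Depth:' || echo "Depth: infinity (default)"
}

shrink_dir() {
  # Reduce a folder back to an empty placeholder in the working copy
  svn update --set-depth empty "$1"
}

# Example (not run automatically):
#   expand_dirs trunk/dir1 trunk/dir2
#   show_depth trunk/dir3
#   shrink_dir trunk/dir1
```

Depending on the client version, cached copies of removed files may linger in the working copy's metadata until svn cleanup is run, so the disk space may not be released immediately.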

The ability to selectively download only parts of our repository has really helped my ever-worsening hard disk space situation!

Other Repository Technologies

Naturally, I did some digging to see if the same trick could be accomplished with Git and Mercurial.

With Mercurial, you can use the narrowhg extension (hgext.narrow) to include or exclude paths. Documentation is sparse, but there is a narrow clone plan document, and a link to the obsolete readme can be found on the Mercurial extensions page.

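In Git, there is no exact analog to SVN’s sparse directories. The closest built-in feature is git sparse-checkout (available since Git 2.25), which limits the working tree to chosen directories and is often paired with a partial clone (git clone --filter=blob:none) so that unneeded file contents are not downloaded. Beyond that, a few other options will cut down the size of the repo and the time to clone: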

  1. Shallow clone: Using git clone --depth [n] [url] will create a shallow clone with a history truncated to the specified number of commits, but this approach will still download the entire working copy.
  2. Use Git LFS: Git LFS was designed for people struggling with large binary files. It locally stores pointers to files that are kept in a remote location so that the working copy is much smaller. It’s possible to migrate a Git repository to Git LFS.
  3. Use submodules and only check out those you need.
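
Git's partial clone and sparse-checkout features come closest to the SVN workflow above. Here is a minimal sketch, assuming Git 2.25 or later and a server that supports partial clone. The function name, URL, and directory names are placeholders.

```shell
# Sketch: approximate an SVN sparse checkout in Git.
# Assumes Git 2.25+ and server-side partial clone support; names are hypothetical.

git_sparse_clone() {
  local repo_url="$1"
  local target="$2"
  shift 2
  # --filter=blob:none defers downloading file contents until they are needed;
  # --sparse starts with only the files at the repository root checked out
  git clone --filter=blob:none --sparse "$repo_url" "$target" || return 1
  # Materialize just the listed directories in the working tree
  git -C "$target" sparse-checkout set "$@"
}

# Example (not run automatically):
#   git_sparse_clone https://git.example.com/sales.git sales dir1 dir2
```

Unlike SVN's empty placeholder folders, directories outside the sparse set simply do not appear in the working tree, so this is a close match rather than an exact one.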

I’m certainly missing some options, so I’d love to hear if there’s some approach in Git that matches SVN’s sparse directories exactly. Let me know in the comments below.