Splitting an SVN Repository

Occasionally, a Subversion repository may grow too large and cumbersome to work with easily. If it contains a large amount of data, certain operations, such as checking out the entire repository at once, can take a long time. It may then make sense to split the repository to isolate the more frequently modified files or directories, especially if they serve different purposes.

For example, let’s say we had a repository named repo1 with the following structure:

# in repo1
branches
tags
trunk
trunk/project_a
trunk/project_b


trunk/project_a may contain fewer, more important files that are modified frequently. By comparison, trunk/project_b may contain a very large number of files that take up a lot of disk space and are rarely modified.

While it is possible to create a new repository, repo2, for trunk/project_b, copy the files over, and then remove trunk/project_b from repo1, this would be a bad idea. First, we would lose the history and individual commits for trunk/project_b. Second, deleting a path in Subversion only records the deletion in a new revision: the trunk/project_b data would remain stored in repo1’s earlier revisions, needlessly increasing its overall size.
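The second point is easy to confirm on a scratch repository: deleting a path adds a new revision but leaves the earlier revisions, and their data, in place. The following is a sketch, assuming svn and svnadmin are installed and that /dev/urandom and `head -c` are available; `demo_delete_keeps_data` is a hypothetical name and the 1 MB file size is arbitrary.

```shell
#!/bin/sh
# demo_delete_keeps_data: hypothetical demo, run against an empty scratch directory.
# Creates a throwaway repository, commits a ~1 MB file under trunk/project_b,
# deletes trunk/project_b, and prints the repository's disk usage (KB) before
# and after the deletion.
demo_delete_keeps_data() {
  scratch=$1
  svnadmin create "$scratch/repo" || return 1
  url="file://$scratch/repo"
  # Random (incompressible) data so the stored size is clearly visible.
  head -c 1048576 /dev/urandom > "$scratch/big.bin"
  # svn import creates the missing parent directories (trunk, trunk/project_b).
  svn import -q -m "add big file" "$scratch/big.bin" "$url/trunk/project_b/big.bin" || return 1
  before=$(du -sk "$scratch/repo" | cut -f1)
  svn rm -q -m "remove project_b" "$url/trunk/project_b" || return 1
  after=$(du -sk "$scratch/repo" | cut -f1)
  echo "before=$before after=$after"
  # The deleted file can still be read back from revision 1 via a peg revision.
  svn cat "$url/trunk/project_b/big.bin@1" | cmp -s - "$scratch/big.bin" &&
    echo "revision 1 still has big.bin"
}
```

Running `demo_delete_keeps_data "$(mktemp -d)"` should report an `after` size at least as large as `before`, since the deletion only adds a small new revision.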

The more appropriate solution is to use svnadmin together with the svndumpfilter utility to extract only the required data, and then use the filtered data to create two new repositories: repo1_new and repo2. repo1_new will replace the existing repo1. Creating a replacement for repo1 is necessary only because the trunk/project_b data cannot simply be removed from repo1’s existing history in place.
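Because svndumpfilter processes every revision in the dump, not only the latest one, the paths worth deciding on include anything that ever existed in repo1, such as a directory that was later deleted or renamed. A minimal sketch for listing them, assuming shell access to the server and svnlook on the PATH; `list_history_paths` is a hypothetical helper name, and it truncates paths to their first two components to match the layout used in this example.

```shell
#!/bin/sh
# list_history_paths: hypothetical helper (not part of Subversion).
# Usage: list_history_paths REPO
# Prints every path touched in any revision of REPO, truncated to its first
# two components (e.g. "trunk/project_b") and de-duplicated.
list_history_paths() {
  repo=$1
  youngest=$(svnlook youngest "$repo") || return 1
  rev=1
  while [ "$rev" -le "$youngest" ]; do
    # Each line of `svnlook changed` is a 4-character status prefix followed
    # by the path, e.g. "A   trunk/project_a/".
    svnlook changed -r "$rev" "$repo" || break
    rev=$((rev + 1))
  done | cut -c5- | sed -e 's:/$::' -e '/^$/d' | cut -d/ -f1-2 | sort -u
}
```

For example, `list_history_paths /path/to/repo1` would be expected to print branches, tags, trunk, trunk/project_a and trunk/project_b for the layout above, plus any paths that existed only in older revisions.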

  1. Determine the exact paths that we want in each repository:
    # in repo1_new
    branches
    tags
    trunk
    trunk/project_a
    
    # in repo2
    trunk
    trunk/project_b

    Note that trunk is included in both repo1_new and repo2. This matters later, when loading data into repo2: loading trunk/project_b fails if there is no existing directory named trunk to contain it.

  2. Obtain a dump file which contains the contents of the entire repository, including all commits:
    
    svnadmin dump /path/to/repo1 > /path/to/repo1_dumpfile
    
  3. Filter out the appropriate data for each of the new repositories. This is done with the svndumpfilter command and its exclude sub-command, which reads a dump stream on standard input and writes the filtered stream to standard output:
    # for repo1_new
    svndumpfilter exclude trunk/project_b < /path/to/repo1_dumpfile > /path/to/repo1_new_dumpfile
    
    # for repo2
    svndumpfilter exclude branches tags trunk/project_a < /path/to/repo1_dumpfile > /path/to/repo2_dumpfile

    Note that we included trunk/project_b in repo2 by explicitly excluding everything else. We could have used the include sub-command instead; however, the resulting repo2_dumpfile would not contain trunk itself. Loading it into the new repository would then fail, because creating trunk/project_b as a subdirectory requires trunk to exist already. This could be worked around by committing a directory named trunk to the new repository (for example, with svn mkdir) before loading the dump file. Keep in mind that exclude keeps every path that is not listed, so any other content in repo1, such as files directly under trunk, would also end up in repo2.

  4. Create the new repositories to load the dump files into:
    
    # for repo1_new
    svnadmin create /path/to/repo1_new
    
    # for repo2
    svnadmin create /path/to/repo2

  5. Load the dump files into the appropriate repositories. We keep repo1’s UUID on repo1_new (--force-uuid) because repo1_new is meant to replace repo1 as an otherwise identical repository, minus trunk/project_b; working copies record the UUID of their repository, so preserving it allows existing repo1 working copies to continue working against repo1_new. We do not keep the UUID on repo2 (--ignore-uuid), because it is a completely new repository containing trunk/project_b.
    # for repo1_new
    svnadmin load --force-uuid /path/to/repo1_new < /path/to/repo1_new_dumpfile
    
    # for repo2
    svnadmin load --ignore-uuid /path/to/repo2 < /path/to/repo2_dumpfile
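
  6. Replace repo1 with repo1_new. Note that svnadmin dump does not include a repository’s hooks/ and conf/ directories, so if repo1 has custom hook scripts or server configuration, copy them into repo1_new before the swap:
    
    cp -Rp /path/to/repo1/hooks/. /path/to/repo1_new/hooks/ && cp -Rp /path/to/repo1/conf/. /path/to/repo1_new/conf/
    rm -rf /path/to/repo1 && mv /path/to/repo1_new /path/to/repo1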

    The final step is to remove trunk/project_b from existing working copies of repo1, update them, and then (if necessary) check out the newly created repo2:

    rm -rf /path/to/repo1/working/copy/trunk/project_b
    cd /path/to/repo1/working/copy/ && svn up
    svn co http://svn.server.com/repo2 repo2
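Before users are pointed at the new layout, a few read-only checks can confirm that the split did what was intended; they can be run against repo1_new and repo2 before step 6, or against the new repo1 afterwards. A sketch, assuming local access to both repositories; `check_split` is a hypothetical helper, and the trunk/project_b path is hard-coded from this example.

```shell
#!/bin/sh
# check_split: hypothetical helper (not part of Subversion).
# Usage: check_split REPO1 REPO2
# Uses only read-only svnlook commands to check that:
#   1. no revision of REPO1 adds, modifies, or deletes anything under trunk/project_b,
#   2. trunk/project_b exists in REPO2's youngest revision,
#   3. the two repositories have different UUIDs.
check_split() {
  repo1=$1
  repo2=$2
  youngest=$(svnlook youngest "$repo1") || return 1
  rev=1
  while [ "$rev" -le "$youngest" ]; do
    # `svnlook changed` prints a 4-character status prefix, then the path.
    if svnlook changed -r "$rev" "$repo1" | cut -c5- | grep -qE '^trunk/project_b(/|$)'; then
      echo "check_split: r$rev of $repo1 still touches trunk/project_b" >&2
      return 1
    fi
    rev=$((rev + 1))
  done
  # svnlook tree fails if the path does not exist in the youngest revision.
  if ! svnlook tree "$repo2" trunk/project_b >/dev/null 2>&1; then
    echo "check_split: trunk/project_b is missing from $repo2" >&2
    return 1
  fi
  if [ "$(svnlook uuid "$repo1")" = "$(svnlook uuid "$repo2")" ]; then
    echo "check_split: $repo1 and $repo2 share a UUID" >&2
    return 1
  fi
}
```

For example, `check_split /path/to/repo1_new /path/to/repo2` returns a non-zero status and prints the reason if any of the checks fails.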

Note: Back everything up before you begin. It is also a good idea to make the affected repositories temporarily read-only, so that no commits made after the dump are lost when repo1 is replaced. Communicating the changes to users beforehand may prevent unhappiness and confusion.
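One way to make a repository temporarily read-only, offered as a sketch rather than the only option, is a pre-commit hook that rejects every commit: Subversion runs hooks/pre-commit before finishing each commit, aborts the commit if the hook exits non-zero, and passes the hook’s stderr back to the client. `install_readonly_hook` is a hypothetical helper; it assumes the hook can run under /bin/sh, and it moves any existing pre-commit hook aside to pre-commit.bak so it can be restored afterwards. For the backup itself, `svnadmin hotcopy /path/to/repo1 /path/to/repo1_backup` makes a copy that is safe to take while the repository is in use.

```shell
#!/bin/sh
# install_readonly_hook: hypothetical helper (not part of Subversion).
# Usage: install_readonly_hook REPO
# Writes REPO/hooks/pre-commit so that every commit is rejected. An existing
# pre-commit hook is first renamed to pre-commit.bak so it can be restored.
install_readonly_hook() {
  hook="$1/hooks/pre-commit"
  if [ -e "$hook" ]; then
    mv "$hook" "$hook.bak" || return 1
  fi
  cat > "$hook" <<'EOF'
#!/bin/sh
# Temporary hook: Subversion aborts the commit because this exits non-zero,
# and shows the message written to stderr to the committing client.
echo "This repository is read-only while it is being split. Please try again later." >&2
exit 1
EOF
  chmod +x "$hook"
}
```

Remember to remove this hook (and restore pre-commit.bak, if one was created) in whichever repository ends up serving users, including any copy of the hooks directory made during the migration.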