Occasionally, a Subversion repository may become too large and cumbersome to work with easily. If the repository contains a large amount of data, certain operations, such as checking out the entire repository at once, can take a great deal of time. In that case it may make sense to split the repository to isolate the more frequently modified files or directories, especially if they serve different purposes.
For example, let’s say we had a repository named repo1 with the following structure:
# in repo1
branches
tags
trunk
trunk/project_a
trunk/project_b
trunk/project_a may contain fewer, more important files that are modified frequently. By comparison, trunk/project_b may contain a very large number of files, take up a lot of disk space, and rarely be modified.
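Before settling on where to split, it can be worth confirming which directories really change often. The script below is a minimal sketch, assuming the svn and svnlook command-line tools are installed. It builds a hypothetical throwaway stand-in for repo1 (the files and commits are made up) and counts the revisions that touched each project with `svn log -q`.

```shell
#!/bin/sh
# Sketch: count how many revisions touched each project, to see which
# part of the repository is modified frequently. A throwaway repository
# stands in for repo1 here (hypothetical sample files and commits).
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
REPO="$WORK/repo1"
URL="file://$REPO"

# Build a scratch repo1 with the layout described above.
svnadmin create "$REPO"
mkdir -p "$WORK/layout/branches" "$WORK/layout/tags" \
         "$WORK/layout/trunk/project_a" "$WORK/layout/trunk/project_b"
echo "a" > "$WORK/layout/trunk/project_a/a.txt"
echo "b" > "$WORK/layout/trunk/project_b/b.txt"
svn import -q -m "Initial layout" "$WORK/layout" "$URL"

# Two further commits that touch project_a only.
svn checkout -q "$URL/trunk/project_a" "$WORK/wc_a"
for n in 1 2; do
  echo "change $n" >> "$WORK/wc_a/a.txt"
  svn commit -q -m "Edit project_a ($n)" "$WORK/wc_a"
done

# List the repository tree, then count revisions per project.
# `svn log -q` prints one "rN | author | date" line per revision.
svnlook tree --full-paths "$REPO"
A_REVS=$(svn log -q "$URL/trunk/project_a" | grep -c '^r[0-9]')
B_REVS=$(svn log -q "$URL/trunk/project_b" | grep -c '^r[0-9]')
echo "project_a: $A_REVS revisions, project_b: $B_REVS revisions"
```

On this scratch repository it reports 3 revisions for project_a and 1 for project_b. Against a real repository, drop the setup section and point REPO and URL at the real repository's path and URL.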
While it is possible to simply create a new repository, repo2, for trunk/project_b, copy the files over, and then remove the trunk/project_b files from repo1, this would be a bad idea. First, we would lose the history and individual commits for trunk/project_b. Second, the data for trunk/project_b would still be stored in repo1’s history, needlessly increasing its overall size.
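The second point is easy to demonstrate. This minimal sketch, assuming the svn and svnadmin command-line tools are installed and using a hypothetical throwaway repository in place of repo1, deletes trunk/project_b in a new revision and then shows that the old content is still retrievable from revision 1 and still present in a full dump.

```shell
#!/bin/sh
# Sketch: why deleting trunk/project_b from repo1 does not shrink it.
# A throwaway repository stands in for repo1 (hypothetical sample data).
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
REPO="$WORK/repo1"
URL="file://$REPO"

svnadmin create "$REPO"
mkdir -p "$WORK/layout/trunk/project_a" "$WORK/layout/trunk/project_b"
echo "small, busy file" > "$WORK/layout/trunk/project_a/a.txt"
echo "large, rarely changed payload" > "$WORK/layout/trunk/project_b/b.txt"
svn import -q -m "Initial layout" "$WORK/layout" "$URL"

# The naive approach: delete project_b in a new revision (r2).
svn delete -q -m "Remove project_b" "$URL/trunk/project_b"

# HEAD no longer lists project_b...
HEAD_LISTING=$(svn ls -R "$URL/trunk")
echo "$HEAD_LISTING"

# ...but r1 still holds it (note the @1 peg revision), so the data
# remains in the repository and in every full dump of it.
OLD_CONTENT=$(svn cat "$URL/trunk/project_b/b.txt@1")
IN_DUMP=$(svnadmin dump -q "$REPO" | grep -c 'large, rarely changed payload')
echo "r1 content: $OLD_CONTENT; matching lines in full dump: $IN_DUMP"
```

Because every revision is kept, this is why the approach below rebuilds the repository from a filtered dump instead of deleting files in place.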
The more appropriate solution is to use some specific svnadmin commands and utilities to gather only the required data, and then use this data to create two new repositories, repo1_new and repo2. repo1_new will replace the existing repo1; creating a new repository to replace repo1 is necessary only to remove the trunk/project_b data from the existing repository.
- Determine the exact paths that we want in each repository:
# in repo1_new
branches
tags
trunk
trunk/project_a
# in repo2
trunk
trunk/project_b
Note that trunk is included in both repo1_new and repo2. This matters later, when loading data into repo2: trying to load trunk/project_b without an existing directory named trunk causes an error.
- Obtain a dump file which contains the contents of the entire repository, including all commits:
svnadmin dump /path/to/repo1 > /path/to/repo1_dumpfile
- Filter out the appropriate data for each of the new repositories. This is done with the svndumpfilter command, using its exclude sub-command:
# for repo1_new
svndumpfilter exclude trunk/project_b < /path/to/repo1_dumpfile > /path/to/repo1_new_dumpfile
# for repo2
svndumpfilter exclude branches tags trunk/project_a < /path/to/repo1_dumpfile > /path/to/repo2_dumpfile
Note that we kept trunk/project_b in repo2 by explicitly excluding everything else. We could have used the include sub-command instead; however, that would not include trunk in repo2_dumpfile. If we then attempted to load repo2_dumpfile into the new repository, we would receive an error, because creating trunk/project_b as a subdirectory depends on trunk already existing. This could be worked around by committing a directory named trunk to the new repository before loading the dump file.
- Create the new repositories to load the dump files into:
# for repo1_new
svnadmin create /path/to/repo1_new
# for repo2
svnadmin create /path/to/repo2
- Load the dump files into the appropriate repositories. In this case, we will keep repo1’s UUID (recorded in the dump file) for repo1_new, because we intend repo1_new to be identical to repo1, minus trunk/project_b. We will not keep that UUID for repo2, because it will be a completely new repository containing trunk/project_b.
# for repo1_new
svnadmin load --force-uuid /path/to/repo1_new < /path/to/repo1_new_dumpfile
# for repo2
svnadmin load --ignore-uuid /path/to/repo2 < /path/to/repo2_dumpfile
- Replace repo1 with repo1_new. This is easily accomplished:
rm -rf /path/to/repo1 && mv /path/to/repo1_new /path/to/repo1
- The final step is to remove trunk/project_b from existing working copies of repo1, update them, and then (if necessary) check out the newly created repo2:
rm -rf /path/to/repo1/working/copy/trunk/project_b
svn up /path/to/repo1/working/copy/
svn co http://svn.server.com/repo2 repo2
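The steps above can be rehearsed end to end before touching a production server. The script below is a minimal sketch, assuming the svn, svnadmin, svnlook and svndumpfilter command-line tools are installed. It builds a hypothetical scratch copy of repo1 in a temporary directory (WORK), runs the same dump, filter, create and load commands, and inspects the results with svnadmin verify, svnlook tree and svnlook uuid. It also tries the include-plus-trunk workaround mentioned in the note on svndumpfilter, in a hypothetical repository named repo2_alt.

```shell
#!/bin/sh
# Dry run of the whole split on throwaway repositories. WORK, repo2_alt
# and the sample files are hypothetical; on a real server, use the
# /path/to/... locations from the steps instead.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT

# Scratch repo1: branches, tags, trunk/project_a, trunk/project_b.
svnadmin create "$WORK/repo1"
mkdir -p "$WORK/layout/branches" "$WORK/layout/tags" \
         "$WORK/layout/trunk/project_a" "$WORK/layout/trunk/project_b"
echo "a" > "$WORK/layout/trunk/project_a/a.txt"
echo "b" > "$WORK/layout/trunk/project_b/b.txt"
svn import -q -m "Initial layout" "$WORK/layout" "file://$WORK/repo1"

# Dump every revision of repo1.
svnadmin dump -q "$WORK/repo1" > "$WORK/repo1_dumpfile"

# Filter one dump per target repository.
svndumpfilter exclude trunk/project_b \
  < "$WORK/repo1_dumpfile" > "$WORK/repo1_new_dumpfile"
svndumpfilter exclude branches tags trunk/project_a \
  < "$WORK/repo1_dumpfile" > "$WORK/repo2_dumpfile"

# Create the empty target repositories and load them.
# repo1_new takes repo1's UUID from the dump; repo2 keeps its own.
svnadmin create "$WORK/repo1_new"
svnadmin create "$WORK/repo2"
svnadmin load -q --force-uuid "$WORK/repo1_new" < "$WORK/repo1_new_dumpfile"
svnadmin load -q --ignore-uuid "$WORK/repo2" < "$WORK/repo2_dumpfile"

# Alternative for repo2: `include` drops the trunk node itself,
# so commit an empty trunk first, then load the filtered dump.
svndumpfilter include trunk/project_b \
  < "$WORK/repo1_dumpfile" > "$WORK/repo2_include_dumpfile"
svnadmin create "$WORK/repo2_alt"
svn mkdir -q -m "Create trunk" "file://$WORK/repo2_alt/trunk"
svnadmin load -q --ignore-uuid "$WORK/repo2_alt" < "$WORK/repo2_include_dumpfile"

# Check integrity and contents of the new repositories.
svnadmin verify -q "$WORK/repo1_new"
svnadmin verify -q "$WORK/repo2"
NEW_TREE=$(svnlook tree --full-paths "$WORK/repo1_new")
REPO2_TREE=$(svnlook tree --full-paths "$WORK/repo2")
ALT_TREE=$(svnlook tree --full-paths "$WORK/repo2_alt")
printf '%s\n\n%s\n\n%s\n' "$NEW_TREE" "$REPO2_TREE" "$ALT_TREE"
```

Nothing here touches real repositories, so the script can be rerun freely. On a real split, running svnadmin verify on repo1_new and repo2 before replacing repo1 is a cheap extra safeguard.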
Note: It’s a very good idea to back everything up before you begin. It would also be wise to temporarily make the affected repositories read-only so that commits or other data are not lost. Communicating the changes to users beforehand may prevent unhappiness and confusion.
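The backup and read-only advice can be sketched as follows. This is a minimal example, assuming shell access to the server: it uses svnadmin hotcopy for the backup and a start-commit hook to refuse commits (a start-commit hook that exits non-zero stops the commit, and its stderr is shown to the client). The scratch repository, the backup location and the hook message are all hypothetical.

```shell
#!/bin/sh
# Sketch: back up a repository, then block commits while it is split.
# The scratch repository stands in for repo1; paths are hypothetical.
set -eu

WORK=$(mktemp -d)
trap 'rm -rf "$WORK"' EXIT
REPO="$WORK/repo1"
svnadmin create "$REPO"
svn mkdir -q -m "Create trunk" "file://$REPO/trunk"

# 1. Full backup; hotcopy is safe even while the repository is in use.
svnadmin hotcopy "$REPO" "$WORK/repo1_backup"

# 2. Read-only mode: a start-commit hook that exits non-zero rejects
#    every commit, and its stderr is passed back to the client.
cat > "$REPO/hooks/start-commit" <<'EOF'
#!/bin/sh
echo "repo1 is read-only while it is being split; please try again later." >&2
exit 1
EOF
chmod +x "$REPO/hooks/start-commit"

# A commit attempt should now be refused.
if svn mkdir -q -m "Should be rejected" "file://$REPO/branches" 2> "$WORK/err"; then
  COMMIT_BLOCKED=no
else
  COMMIT_BLOCKED=yes
fi
cat "$WORK/err"

# 3. After the split, remove the hook to allow commits again:
#    rm "$REPO/hooks/start-commit"
```

A hook is used here rather than changing server permissions because it works the same way for every access method (file://, svn://, http://), and undoing it is a single file removal.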