Occasionally, a Subversion repository may become too large and cumbersome to work with easily. If the repository contains a large amount of data, certain operations, such as checking out the entire repository at once, may take a great deal of time. It may make sense to split the repository to isolate the more frequently modified files or directories, especially if they serve different purposes.
For example, let’s say we had a repository named repo1 with the following structure:
# in repo1
branches
tags
trunk
trunk/project_a
trunk/project_b
trunk/project_a may contain fewer files, but ones that are more important and modified more frequently. By comparison, trunk/project_b may contain a very large number of files, take up a lot of space on disk, and be rarely modified.
While it is possible to just create a new repository, repo2, for trunk/project_b, copy the files over, and then remove the trunk/project_b files from repo1, this would be a bad idea. First of all, we would lose the history and individual commits for trunk/project_b. Secondly, the data for trunk/project_b would still be stored in the history of repo1, needlessly increasing its overall size.
The more appropriate solution is to use svnadmin and its companion utilities to gather only the required data, and then use this filtered data to create two new repositories: repo1_new and repo2. repo1_new will replace the existing repo1. Creating a new repository to replace repo1 is necessary only to remove the trunk/project_b data from the existing repository.
- Determine the exact paths that we want in each repository:
# in repo1_new
branches
tags
trunk
trunk/project_a
# in repo2
trunk
trunk/project_b
Note that trunk is included for both repo1_new and repo2. This is important later to prevent errors when loading data into repo2: trying to load trunk/project_b without an existing directory named trunk fails.
- Obtain a dump file which contains the contents of the entire repository, including all commits:
svnadmin dump /path/to/repo1 > /path/to/repo1_dumpfile
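Before filtering, it can be worth checking that the dump file is complete. The following is a minimal sketch, not part of the original procedure: the helper name count_revisions is hypothetical, and it relies on the fact that each revision record in a dump file begins with a "Revision-number:" header line.

```shell
#!/bin/sh
# Hypothetical helper: count the revision records in a Subversion dump file.
# Heuristic only: versioned file contents embedded in the dump could, in rare
# cases, contain a line that also starts with "Revision-number: ".
count_revisions() {
    # -a makes grep treat the dump as text even if it embeds binary data
    grep -a -c '^Revision-number: ' "$1"
}

# Example, using the placeholder paths from this article:
#   count_revisions /path/to/repo1_dumpfile
#   svnlook youngest /path/to/repo1
# A full dump starts at revision 0, so the count should be youngest + 1.
```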
- Filter out the appropriate data for each of the new repositories. This is accomplished with the svndumpfilter command, using the exclude sub-command:
# for repo1_new
cat /path/to/repo1_dumpfile | svndumpfilter exclude trunk/project_b > /path/to/repo1_new_dumpfile
# for repo2
cat /path/to/repo1_dumpfile | svndumpfilter exclude branches tags trunk/project_a > /path/to/repo2_dumpfile
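To double-check which paths survived the filtering, the node records in each filtered dump file can be listed. This is a rough sketch under the assumption that every node record carries a "Node-path:" header; list_dump_paths is a hypothetical helper name, not an svn command.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct paths touched in a dump file.
# Heuristic only: it matches header-looking lines anywhere in the file.
list_dump_paths() {
    grep -a '^Node-path: ' "$1" | sed 's/^Node-path: //' | sort -u
}

# Example, using the placeholder paths from this article:
#   list_dump_paths /path/to/repo2_dumpfile
# For repo2 the listing should contain trunk and entries under
# trunk/project_b, and nothing under branches, tags or trunk/project_a.
```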
Note that we included trunk/project_b in repo2 by specifically excluding everything else. We could have used the include sub-command; however, this would not include trunk in repo2_dumpfile. If we attempted to load repo2_dumpfile into the new repository, we would receive an error because the creation of trunk/project_b, as a sub-directory, depends on trunk already existing. This could be worked around by committing a directory named trunk to the new repository before attempting to load the dump file.
- Create the new repositories to load the dump files into:
# for repo1_new
svnadmin create /path/to/repo1_new
# for repo2
svnadmin create /path/to/repo2
- Load the dump files into the appropriate repository:
In this case, we will maintain the UUID from repo1 on repo1_new because we intend repo1_new to replace repo1 as an identical repository, minus trunk/project_b. We will not maintain the UUID on repo2, because it will be a completely new repository, containing trunk/project_b.
# for repo1_new
svnadmin load --force-uuid /path/to/repo1_new < /path/to/repo1_new_dumpfile
# for repo2
svnadmin load --ignore-uuid /path/to/repo2 < /path/to/repo2_dumpfile
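Once both loads have finished, the UUID handling can be verified with svnlook uuid, which prints a repository's UUID. The sketch below is only an illustration; same_uuid is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if two repositories report the same UUID.
# Returns 2 if either UUID cannot be read.
same_uuid() {
    uuid_a=$(svnlook uuid "$1") || return 2
    uuid_b=$(svnlook uuid "$2") || return 2
    [ -n "$uuid_a" ] && [ "$uuid_a" = "$uuid_b" ]
}

# Example, using the placeholder paths from this article:
#   same_uuid /path/to/repo1 /path/to/repo1_new && echo "UUID preserved"
#   same_uuid /path/to/repo1 /path/to/repo2     || echo "repo2 has its own UUID"
```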
- Replace repo1 with repo1_new. Because a repository is a directory, the removal needs the recursive flag:
rm -rf /path/to/repo1 && mv /path/to/repo1_new /path/to/repo1
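A more cautious variant of the replacement step keeps the old repository under a backup name instead of deleting it right away. This is a sketch of that idea rather than part of the original procedure; swap_repo and the _old suffix are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helper: move the old repository aside, then move the new
# repository into its place. Nothing is deleted.
swap_repo() {
    old=$1
    new=$2
    backup="${old}_old"
    # Refuse to overwrite an earlier backup
    if [ -e "$backup" ]; then
        echo "backup $backup already exists" >&2
        return 1
    fi
    mv "$old" "$backup" && mv "$new" "$old"
}

# Example, using the placeholder paths from this article:
#   swap_repo /path/to/repo1 /path/to/repo1_new
```

Keeping the old directory around also makes it easy to copy over anything a dump file does not carry, such as the contents of the hooks and conf directories.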
The final step is to remove trunk/project_b from existing working copies of repo1, update, and then (if necessary) check out the newly created repo2:
rm -rf /path/to/repo1/working/copy/trunk/project_b
cd /path/to/repo1/working/copy/ && svn up
svn co http://svn.server.com/repo2 repo2
Note: It’s a very good idea to back everything up before you begin. It is also a good idea to temporarily make the affected repositories read-only so that commits made during the split are not lost. Communicating the repository changes to users beforehand may prevent unhappiness and confusion.
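One way to make a repository temporarily read-only is a pre-commit hook that rejects every commit, since Subversion aborts a commit when that hook exits with a non-zero status. The following is a minimal sketch, assuming the repository has no pre-commit hook of its own; the function names and the message text are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: install a pre-commit hook that rejects all commits.
# Assumes $1 is a repository path containing a hooks directory.
make_read_only() {
    hook="$1/hooks/pre-commit"
    # Do not clobber a hook that is already in place
    if [ -e "$hook" ]; then
        echo "$hook already exists, not overwriting" >&2
        return 1
    fi
    printf '%s\n' \
        '#!/bin/sh' \
        'echo "This repository is read-only during maintenance." >&2' \
        'exit 1' > "$hook"
    chmod +x "$hook"
}

# Hypothetical helper: remove the blocking hook again.
make_writable() {
    rm -f "$1/hooks/pre-commit"
}

# Example, using the placeholder paths from this article:
#   make_read_only /path/to/repo1
#   ... dump, filter, load, replace ...
#   make_writable /path/to/repo1
```

This only blocks commits. It does not replace a backup, for which something like svnadmin hotcopy can be used.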