Recently, I worked to recover a client’s corrupted SVN repository. While the best protection against repository corruption is good backups, these are not always up-to-date or intact. Unless there are backups, repository corruption will almost certainly result in some data loss.
However, by working around corrupt revisions, it may be possible to restore the repository to functionality with minor data loss and, if an up-to-date and intact working copy is available, to detect what data has been lost.
Success will depend, of course, upon the level of SVN repository corruption. If most (or all) revisions are corrupt, not much can be done. When corruption occurs without backups, the task becomes recovering as much data as possible and identifying what data has been lost, rather than expecting full recovery.
The primary strategy for working around corruption is to create a new repository, omitting the specific revisions which were corrupted. This is achieved by creating an SVN dumpfile from the existing repository containing only valid revisions, then importing the dumpfile into a new repository.
Here are the steps for recovery:
1. Detect Corruption
Corrupted revisions usually come to light when attempting a full repository checkout or running operations that only affect one particular revision. A definitive diagnosis can be achieved using the `svnadmin verify` command against the repository in question.
If a corrupted revision is detected, `svnadmin` will print out an error message. Specific revisions or ranges can be checked with the `-r` option to `svnadmin verify`.
For example, let’s say we have a repository with 10 revisions. Revisions 4 and 7 are corrupted. We can check for corruption on the repository using `svnadmin verify`:
jk@gerty ~/svn $ svnadmin verify myrepository
* Verified revision 0.
* Verified revision 1.
* Verified revision 2.
* Verified revision 3.
svnadmin: E160004: Missing node-id in node-rev at r4 (offset 290)
The verification will stop at the first sign of corruption. (In this case, at revision 4.)
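Because the verification stops at the first corrupt revision, listing every damaged revision means checking the revisions one at a time. The following is a minimal sketch of that idea, not a definitive tool: the function name `find_corrupt_revisions` is hypothetical, and it assumes `svnlook youngest` can still read the head revision number of the damaged repository.

```shell
# Print the number of every revision that fails `svnadmin verify`.
# find_corrupt_revisions is a hypothetical helper name, not part of Subversion.
find_corrupt_revisions() (
  repo=$1
  # Assumption: the youngest revision number is still readable.
  youngest=$(svnlook youngest "$repo") || exit 1
  rev=0
  while [ "$rev" -le "$youngest" ]; do
    # -q suppresses the "Verified revision N." progress lines.
    if ! svnadmin verify -q -r "$rev" "$repo" 2>/dev/null; then
      echo "$rev"
    fi
    rev=$((rev + 1))
  done
)
```

Calling `find_corrupt_revisions myrepository` prints one revision number per line. Revisions that depend on corrupt data may be reported as well, so the list is best treated as a starting point.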
2. Dump Valid Revisions
An SVN dumpfile is created using the `svnadmin dump` command against a repository. Specific revisions can be selected with the `-r` option to `svnadmin dump`, and the `--incremental` option makes the first dumped revision contain only its changes relative to the previous revision, rather than the full source tree.
Continuing our example, let’s start dumping the repository:
jk@gerty ~/svn $ svnadmin dump myrepository > dumpfile
* Dumped revision 0.
* Dumped revision 1.
* Dumped revision 2.
* Dumped revision 3.
svnadmin: E160004: Missing node-id in node-rev at r4 (offset 290)
We can then move beyond that corrupt revision by using the `-r` option to select the revision beyond the corrupted one, and `--incremental` to only include changes made in the dumped revisions (instead of the whole source tree):
jk@gerty ~/svn $ svnadmin dump --incremental -r 6:HEAD myrepository >> dumpfile
* Dumped revision 6.
svnadmin: E160004: Missing node-id in node-rev at r7 (offset 289)
Note that we use `>>` to append to the existing dumpfile, and that we actually use a revision offset of 6 instead of 5 (since revision 4 is corrupted, the ‘incremental’ dump for revision 5 cannot be calculated).
Then we can dump the rest of the repository as follows:
jk@gerty ~/svn $ svnadmin dump --incremental -r 9:HEAD myrepository >> dumpfile
* Dumped revision 9.
* Dumped revision 10.
Note again that we actually use a revision offset of 9 instead of 8.
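The choice of resume points above follows a simple rule: dump up to the revision before each corrupt one, then resume two revisions after it. The sketch below only works out the revision ranges under that rule and does not call `svnadmin` itself. The function name `plan_dump_ranges` is hypothetical, and it assumes the corrupt revisions are already known and passed in ascending order.

```shell
# Usage: plan_dump_ranges HEAD CORRUPT_REV...
# Prints one START:END range per line, suitable for `svnadmin dump -r`.
# plan_dump_ranges is a hypothetical helper name, not part of Subversion.
plan_dump_ranges() (
  head=$1
  shift
  start=0
  for bad in "$@"; do
    # Emit the clean range that ends just before the corrupt revision.
    if [ "$bad" -gt "$start" ]; then
      echo "$start:$((bad - 1))"
    fi
    # Skip the corrupt revision and the one after it, whose incremental
    # dump would have to be computed against the corrupt revision.
    start=$((bad + 2))
  done
  # Emit whatever remains after the last corrupt revision.
  if [ "$start" -le "$head" ]; then
    echo "$start:$head"
  fi
)
```

For the example repository, `plan_dump_ranges 10 4 7` prints `0:3`, `6:6` and `9:10`. The first range is dumped as-is, and every later range is dumped with `--incremental` and appended with `>>`, as in the commands above.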
Caveat: If revisions 4 and 7 only added new files to the source tree, revisions 5 and 8 could be dumped without the `--incremental` option as full revisions (avoiding comparison with revisions 4 and 7, which fail because of the corruption). However, in our example, revisions 4 and 7 modify existing files, so dumping without `--incremental` would cause an error when loading into a new repository because the files already exist in the source tree:
svnadmin: E160020: File already exists: filesystem 'recoveredrepo/db', transaction '4-4', path 'file'
3. Create a New Repository
A new SVN repository is easily created with `svnadmin create`:
jk@gerty ~/svn $ svnadmin create recoveredrepo
4. Restore Valid Revisions from Dumpfile
Importing the dumpfile into the new SVN repository is also easily accomplished with `svnadmin load`:
jk@gerty ~/svn $ cat dumpfile | svnadmin load recoveredrepo
<<< Started new transaction, based on original revision 1
------- Committed revision 1 >>>
<<< Started new transaction, based on original revision 2
------- Committed revision 2 >>>
<<< Started new transaction, based on original revision 3
------- Committed revision 3 >>>
<<< Started new transaction, based on original revision 4
------- Committed revision 4 >>>
<<< Started new transaction, based on original revision 6
------- Committed new rev 5 (loaded from original rev 6) >>>
<<< Started new transaction, based on original revision 7
------- Committed new rev 6 (loaded from original rev 7) >>>
<<< Started new transaction, based on original revision 9
------- Committed new rev 7 (loaded from original rev 9) >>>
<<< Started new transaction, based on original revision 10
------- Committed new rev 8 (loaded from original rev 10) >>>
Note that because of the omitted revisions, the new repository will have revision numbers that do not necessarily correspond with the old revision numbers.
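Since commit messages, bug trackers, and emails may still refer to the old revision numbers, it can help to keep a table of old and new numbers. The sketch below derives one from the `svnadmin load` progress messages. The function name `map_loaded_revisions` is hypothetical, and the patterns assume the messages have exactly the form shown in the output above.

```shell
# Read `svnadmin load` output on stdin and print "ORIGINAL NEW" pairs.
# map_loaded_revisions is a hypothetical helper name, not part of Subversion.
map_loaded_revisions() {
  sed -n \
    -e 's/^-* Committed revision \([0-9][0-9]*\) >>>.*$/\1 \1/p' \
    -e 's/^-* Committed new rev \([0-9][0-9]*\) (loaded from original rev \([0-9][0-9]*\)) >>>.*$/\2 \1/p'
}
```

One way to use it is `svnadmin load recoveredrepo < dumpfile | map_loaded_revisions > revmap.txt`, where `revmap.txt` is just an example file name.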
5. Compare the Recovered Repository
Now that we have recovered what we can from the corrupted SVN repository, it is time to perform a checkout:
jk@gerty ~/svn $ svn co file:///Users/jk/svn/recoveredrepo recoveredwc
Checked out revision 8.
We can also compare it with an up-to-date working copy (if one is available):
jk@gerty ~/svn $ diff -r -x .svn recoveredwc/ originalwc/
diff -r -x .svn recoveredwc/somefile originalwc/somefile
1c1
< Hello World
---
> Nothing
Note that we omit comparison of the `.svn` directories with `-x .svn`.
From the comparison, we can see that some data in the new recovered working copy is different from the original working copy. If the original working copy was fully up-to-date, this may give us a hint as to what data was lost due to corrupted revisions.
While this solution would not detect data loss in the repository history, it would potentially allow us to detect data loss in the latest version of the files (which are often the most relevant).
Using the diff, we can determine which version of each file is most correct on a case-by-case basis. Any necessary updates can then be committed to the new recovered repository.
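When the working copies are large, a full diff can be hard to scan. As a first pass, it may be easier to list only which files differ and then inspect them one by one. The sketch below does that. The function name `list_changed_files` is hypothetical, and it assumes a `diff` that supports the `-q` and `-x` options (GNU and BSD `diff` both do).

```shell
# List files that differ between two working copies, ignoring .svn directories.
# list_changed_files is a hypothetical helper name.
list_changed_files() (
  recovered=$1
  original=$2
  status=0
  # -q reports only whether files differ, not the differing lines.
  diff -r -q -x .svn "$recovered" "$original" || status=$?
  # diff exits with 0 (identical) or 1 (differences found); anything higher is an error.
  [ "$status" -le 1 ]
)
```

For the example above, this would be called as `list_changed_files recoveredwc originalwc`.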
First of all, this blog helped me realize how to recover (or at least try to recover) a corrupted repository. Hopefully I will recover most of it. Thank you.
In case it is useful for somebody else, I wrote the following script, based on this blog and on googling different sources, since I am not a programmer. Please feel free to share it with other people. Also, please take a good look at the script before using it.
#!/bin/bash
# Specify the folder in the next line
dir=/var/lib/scm/repositories/svn/
clear
echo "Repository folder is $dir"
echo "In case this is not right, abort the process and modify the script"
echo "Please enter the name of the repository to check"
read -r repo
# repo_full: complete repository path including folder
repo_full=$dir$repo
revision_file=$repo.txt
echo "Repository $repo_full will be verified."
echo "Failed revisions of the repository will be saved to file $revision_file"
echo "If there is a file named $revision_file, please either move it or delete it"
echo "Press Enter to continue"
read -r a
touch "$revision_file"
START=1
END="$(svnlook youngest "$repo_full")"
echo "Youngest repository revision of $repo_full is $END"
echo "Verifying from revision $START until $END"
echo "Press Enter to continue"
read -r a
echo "Verifying repository $repo_full"
# Verify each revision on its own so that one failure does not stop the check
for (( c=$START; c<=$END; c++ ))
do
    if svnadmin verify -r "$c" "$repo_full" &>/dev/null
    then
        echo "Revision $c of repository $repo is OK"
    else
        echo "Revision $c of repository $repo is not OK. Saving to file."
        echo "$c" >> "$revision_file"
    fi
done
echo "Finished repository check!"
echo "Repository $repo_full will be dumped"
echo "In case two successive revisions are corrupt, you may see an error message"
echo "But the process will go on"
# First pass: dump from START until the first corrupt revision stops the dump
svnadmin dump -r "$START:HEAD" "$repo_full" > "$repo.dump"
# line delimited (each failed revision is stored on a line)
while read -r LINE
do
    # Resume at the revision right after the failed one
    var=$((LINE + 1))
    echo "Incrementally dumping revision $var of $repo_full repository"
    svnadmin dump --incremental -r "$var:HEAD" "$repo_full" >> "$repo.dump"
done < "$revision_file"
So hopefully this creates a file named after your repository name. From this point, you need to follow this guide or a similar process starting from point 3.
for (( c=$START; c/dev/null
it’s incomplete
Yes, I'm sorry about that. Replace everything up to "then" with this:
for (( c=$START; c/dev/null
Again, please keep in mind that I'm not a programmer, so please back up first and do your own tests, though svnadmin dump should not break anything in the original repo.
Regards
Sorry again, now I realize copy and paste is not working ok with this part, here I go again:
for (( c=$START; c/dev/null
No, it's not my fault; it seems some characters are not allowed here and there is no preview. Maybe it is the "less than" sign. I will write it down in text since I cannot figure out what to do. Replace the text in quotes with the corresponding sign (`<=`):
for (( c=$START; c "less than or equal" $END; c++ ))
do
if svnadmin verify -r "$c" "$repo_full" &>/dev/null
then
Hope it works now, else I give up.
Thanks! it really works!
just a little modification
START=1
END="$(svnlook youngest $repo_full)"
remove the ( " )
START=1
END=$(svnlook youngest $repo_full)
and
echo "Revision $c repository of $repo is not OK. Saving to file."
echo "$c" >> $revision_file
remove the ( " )
echo "Revision $c repository of $repo is not OK. Saving to file."
echo $c >> $revision_file
Glad it helped :)
Regards
Great article – thanks. My repository is back, and had 4 distinct problems I didn’t even know about.