I was working on a server this morning and accidentally deleted an important configuration file. Like many Linux users, I lamented the absence of an “undelete” command. The file wasn’t still open by any processes, wasn’t present in the backups, and would be painful to recreate.
Fortunately, not all hope was lost. When a file is deleted from a hard drive, the blocks are freed, but not actually cleared. The data remains on disk, but it cannot be directly accessed and is in danger of being overwritten. Recovery is a matter of search and rescue.
Since the file I was hoping to recover was a text file, and I knew a fair amount about it (such as approximate file size and some text that was definitely going to be included), finding it actually turned out to be a fairly simple task using grep:
grep -a -B 25 -A 100 'some string in the file' /dev/sda1 > results.txt
Grep searches through a file and prints out all the lines that match some pattern. Here, the pattern is some string that is known to be in the deleted file. The more specific this string can be, the better. The file being searched by grep (/dev/sda1) is the partition of the hard drive the deleted file used to reside in. The “-a” flag tells grep to treat the hard drive partition, which is actually a binary file, as text.
Since it would be nice to recover the entire file rather than just the lines that are already known, grep's context control is used. The flags “-B 25 -A 100” tell grep to print out 25 lines before a match and 100 lines after a match. Be conservative with estimates on these numbers to ensure the entire file is included (when in doubt, guess bigger numbers). Excess data is easy to trim out of the results, but if you end up with a truncated or incomplete file, you have to do this all over again. Finally, the “> results.txt” instructs the shell to store the output of grep in a file called results.txt.
Once the command is done, results.txt will probably contain lots of gibberish, but if you’re lucky, the contents of the deleted file will be intact and recoverable.
To help prevent this problem from happening in the first place, many people elect to alias the rm command to a script which will move files to a temporary location, like a trash bin, instead of actually deleting them.
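A minimal sketch of such a wrapper, assuming a hidden ~/.trash directory (the script name and trash location are arbitrary), might look like this:

#!/bin/sh
# trash: move files into a holding directory instead of deleting them
TRASH="$HOME/.trash"
mkdir -p "$TRASH"
for f in "$@"; do
    # append a timestamp so repeated "deletions" of the same name don't collide
    mv -- "$f" "$TRASH/$(basename "$f").$(date +%s)"
done

With something like alias rm='trash' in your shell configuration, an accidental rm becomes recoverable, though emptying the trash directory periodically is then up to you (and option flags such as -r are deliberately not handled by this sketch).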
I have used this to recover lost work. It’s a good trick.
I’d add this advice, though: make sure you write your output to a different partition than the one you’re trying to recover from, or you risk having the output reuse some of the very blocks you’re after!
Great tip! You can simplify your command with the -C flag. Use grep -C 100 to get 100 lines above and 100 lines below the matched line. Thanks!
There are some decent “data carvers” out there (testdisk and photorec seem to be the best-known open source ones), but nothing is going to beat really good and regular backups (which are tested!).
That’s brilliant advice. I usually just grab my latest backup but that’s not always up to date. Will remember it for next time.
Thanks for all the great feedback!
@ David Lowe: Excellent point. Alternatively, if your hard drive only has one partition and there are no USB disks readily available, you can mount a portion of system memory as a RAM disk, and write the output there or pipe the output to less (or some other program) to keep it in memory (this assumes you have enough free system memory to hold the output file, of course).
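For example, something along these lines (the mount point and size here are arbitrary):

mkdir -p /mnt/ramdisk
mount -t tmpfs -o size=256m tmpfs /mnt/ramdisk
grep -a -B 25 -A 100 'some string in the file' /dev/sda1 > /mnt/ramdisk/results.txt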
@Paulo Eduardo Neves: I wanted to keep the flexibility of a different number of context lines before and after hits to account for the search item being very close to the beginning or end of a file, but you’re absolutely right. If your search string is near the middle of the file, the -C flag can simplify things.
@Anon: If you already have the tools installed, they can definitely make your life a lot easier. If they need to be installed, however (as David pointed out), the more you write to disk, the higher your risk of overwriting the blocks you want to recover. I don’t think anyone will argue against the importance of good backups. Shortly after I made this mistake I realized I had been backing up a symlink instead of the actual file, and what I had just deleted was not duplicated anywhere.
@Mike: To help reduce problems of backups being out of date, I use a backup integrity checker. Part of our backup solution here involves time stamping backups as they’re made, then a script on the backup server will raise alarms if anything looks like it’s getting stale. Maybe I should make that my next post…
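In the meantime, the heart of it is nothing fancy; a cron job along these lines would catch the obvious cases (the /backups path and the one-day threshold are made up for this example):

# warn if nothing under /backups has been written in the last 24 hours
find /backups -type f -mtime -1 | grep -q . || echo "WARNING: backups in /backups look stale"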
Wow! Can’t believe I’ve never thought about using an alias for rm, as you said…
Adding something like that to my Puppet recipes and spreading it across all the servers that I manage!
Genius post!
Nice hack!
For files that are not text, photorec (http://www.cgsecurity.org/wiki/PhotoRec) may be useful. I recovered about 8000 photos for a friend of mine earlier this year using that tool.
thx for sharing :)
I had the same problem on my Mac: I accidentally deleted a header file which was not in my svn repos (thought it was old). However, this did not come to mind at the time. I wonder how to get raw access to the disk on OS X.
What I suggest is a script (under another name) used instead of rm that does not remove anything if _any_ of the file arguments are not found. When used from a Bourne-like shell, that means commands like “rm * txt” (typed by mistake for “rm *.txt”) are harmless (if you’ve no file called “txt”).
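A rough sketch of the idea (deliberately ignoring option flags like -r):

#!/bin/sh
# saferm: refuse to remove anything unless every argument exists
for f in "$@"; do
    if [ ! -e "$f" ]; then
        echo "saferm: '$f' not found, nothing removed" >&2
        exit 1
    fi
done
rm -- "$@"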
This is clever, but you can’t recover the whole file if it’s fragmented. Granted, a decent file system will try to avoid fragmenting files, but it happens.
Even a partial recovery is better than a total loss, though. I won’t repeat the advice about regular backups of course. :)
Terrific work! This is the type of information that should be shared around the web. Shame on the search engines for not positioning this post higher!
Not only are regular backups important, but verifying that the media can be recovered from, and that the backups cover everything you expect, is crucial. At a previous job, an admin accidentally deleted a samba share with an errant rm -rf. When he went to recover from backups, he discovered the backup software did not cross mountpoints (he was backing up using / as the only filesystem). At least one Ph.D. student lost 5 years of work. This was especially bad since that same admin had convinced everyone to save to the samba share because it would be “backed up and safe,” so the Ph.D. student did not bother to do her own backups. Luckily another admin was able to recover most of the data using the method you mentioned, but it took him a couple of weeks.
Good advice, thanks :) It would have helped me some months ago when I deleted important files by accident… It deserves an RT.
I also recovered files using this useful technique:
http://www.pixelbeat.org/docs/disk_grep.html
http://www.pixelbeat.org/scripts/disk_grep
What about pictures? I’ve always wanted to know how to recover pictures on my hard drive; my ex-wife came in and deleted everything I had. So many memories lost.
See also:
http://www.matusiak.eu/numerodix/blog/index.php/2007/09/10/recover-lost-stuff-from-memory/
For recovering stuff from live memory.
HAMMER fs has an undelete command (it’s called undo(1)).
Cheers.
Very interesting article!! You learn something new every day~ ;)
Just being picky…
“Be conservative with estimates on these numbers to ensure the entire file is included…”
Do you mean “Be generous”?
Before starting the recovery process, you should remount the filesystem read-only so you don’t overwrite the lost data:
mount -o remount,ro /
Having grep print out the context may not give you the whole file and rerunning grep will take ages on large disks. A better alternative would be to use
grep -ob …
to print out the byte offset and not much else, then play with dd’s skip and count to grab as much as you want.
dd skip=1000 count=1337 …
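For example (adding -a again so grep treats the device as text; the offset and sizes below are made up purely to illustrate):

grep -a -b -o 'some string in the file' /dev/sda1
# suppose it reports something like 123456789:some string in the file
# then grab a generous chunk starting a little before that offset:
dd if=/dev/sda1 of=results.bin bs=1 skip=123400000 count=131072

Using bs=1 keeps skip and count in plain bytes at the cost of speed; once you have the chunk, trimming the excess is easy.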
Another reason to use Windows (Windoze for you folks) and its Recycle Bin.
I had to do exactly the same thing once when I was in college. I had accidentally deleted a programming project the night before it was due (a disk driver for an operating systems class, ironically). It was fragmented into two parts even, so I had to run grep twice to get the second half of the file.
It was definitely a good lesson on the value of source control that, unfortunately, I had to learn a few more times before I actually applied it.
I like the alias script idea, maybe adding a switch to make it a hard delete would be a good idea.
Hmm, I think I prefer editing the directory entries directly. At least that’s what I did in the good ol’ days on my MSX whenever the hard drive got trashed for some reason, again ;). Dunno about ext3, but FAT12 is relatively easy to understand :).
It is too late for me this time, but I’m sure I will need it again.
That’s surely a wonderful workaround. And I really like the idea of aliasing the rm command on critical servers to move files to a separate directory instead of deleting them.
Thanks for sharing. :-)
I’m just beginning to use the Linux operating system, and I am glad that there are places like this to get to know a bit more about it.
@Saddumal Bhasodia
“Another reason to use Windows (Windoze for you folks) and its Recycle Bin.”
The Windows command line also skips the recycle bin and Linux also has a trash bin. Thus, out of the thousands of reasons to choose one operating system over the other, this isn’t one of them.
What this is a reason for is to avoid the command line if you don’t know what you’re doing. Also, it’s another example of why everyone should backup everything regularly.
[…] start to think about trying to `grep` through the block device, but you don’t know exactly what you’re looking for — and much of it is likely to be […]