Many of my colleagues at Atomic Object use Time Machine to back up their laptops. By default, Time Machine makes incremental backups hourly, but only when the external backup disk is attached. As a result, this default Time Machine backup system has two limitations:
- It is not automatic: The user must attach the external hard drive (assuming the laptop travels frequently).
- The gap between backups can be relatively large.
For most people, these limitations are minor. For example, the gap between backups can be mitigated by the diligent use of git or some other version control system. However, I am not a typical developer: I’m an absent-minded professor who celebrates when I go a whole week without forgetting to bring my laptop to work. Relying on my remembering to attach an external hard disk to run a backup is not a good idea.
There are several ways to automate a Time Machine backup and run it over a network; however, I found a lower-tech solution: rsync. I use rsync and cron to back up my Documents and Library directories hourly. These directories contain all of my files whose changes would be difficult or impossible to re-create by hand. This “workspace” data is less than 10 GB and fits easily on the shared file server. The amount of “workspace” data that changes every hour tends to be very small, so the rsync backup is rarely noticeable. The potentially long gaps between Time Machine backups pose less of a risk for other files (applications, music, system files, etc.) because they rarely change; therefore, I don’t back them up with rsync. In the event of a failure, the few “non-workspace” files that had changed since the previous Time Machine backup can reasonably be restored from original media.
My Script
I have cron run the following script hourly:
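#!/bin/bash
# Hourly workspace backup. ID_RSA stands for the path to the private key, and
# remote_server must be set to the backup host before this will run.
# Proceed only if a physical interface (a record starting with "en") is active.
if /sbin/ifconfig | grep -E -A 5 '^en[0-9]' | grep -q 'status: active'
then
    echo -n "Network detected. Starting backup at "
    date
    cd "$HOME" || exit 1
    # Run rsync directly; wrapping it in backticks would try to execute its output.
    /opt/local/bin/rsync -e 'ssh -i ID_RSA' -a Documents Library --filter ': .rsync-filter' \
        --exclude="/Library/Caches" --exclude '/Library/Mail/IMAP*' "${remote_server}:Backup"
    echo -n "Backup complete at: "
    date
else
    echo -n "No network detected at "
    date
fi
echo "----------------------------"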
The if statement checks to see if the machine is connected to a network. It is not sufficient for me to simply grep for status: active because my ifconfig output contains two virtual NICs that are always active. (I think Parallels uses them.) This line relies on
- the entries for my two network cards (wired and wireless) being the only two records whose names begin with en, and
- the status line for a particular network device being between 3 and 5 lines from the first line of the record. (In other words, grep -A 5 grabs enough lines to find the NIC’s status, but doesn’t generate a “false positive” by detecting a succeeding NIC’s status.)
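The dependence on line counts can be avoided by tracking which interface record each line belongs to. The following is only a sketch, assuming the macOS ifconfig layout (an unindented header line such as en0: followed by indented detail lines); the function name has_active_en is hypothetical and not part of my script.

```shell
#!/bin/bash
# Hypothetical helper: succeed if any en* interface in the ifconfig-style
# text on stdin has a "status: active" line inside its own record.
has_active_en() {
    awk '
        # An unindented line starts a new interface record.
        /^[a-zA-Z]/ { in_en = ($0 ~ /^en[0-9]+:/) }
        # Only count a status line that belongs to an en* record.
        in_en && /status: active/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

It would be used as /sbin/ifconfig | has_active_en in place of the two grep commands.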
ID_RSA is the name of a file containing a private key with no passphrase. The key allows rsync to authenticate with the file server. The key must not have a passphrase (because I don’t think there is a way to configure cron to use one). Because the key contains no passphrase, I configure the server to run only rsync when a client attempts to use this key. (To do this, add the public key for ID_RSA to the server’s authorized_keys file, then add command="rsync --server -logDtpr . Backup" to the beginning of the line containing the public key.)
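To make the shape of that authorized_keys line concrete, here is a small sketch that assembles it from a public key. The function name is hypothetical; the forced command is the one quoted above.

```shell
#!/bin/bash
# Hypothetical helper: print an authorized_keys line that restricts the given
# public key to the rsync server command quoted in the text.
restricted_key_line() {
    local pubkey="$1"    # full text of the public key, e.g. "ssh-rsa AAAA... user@host"
    printf 'command="%s" %s\n' 'rsync --server -logDtpr . Backup' "$pubkey"
}
```

Assuming the public key is stored in a file named ID_RSA.pub (an assumption; use whatever name your key pair has), running restricted_key_line "$(cat ID_RSA.pub)" >> ~/.ssh/authorized_keys on the server would append the restricted entry.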
The --filter ': .rsync-filter' option tells rsync to check each directory for a .rsync-filter file containing lists of files to exclude.
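As an illustration, a per-directory filter file holds one rule per line, with a leading "- " marking an exclusion. The sketch below writes a sample file; the function name and the two patterns are made up for the example and are not the rules I actually use.

```shell
#!/bin/bash
# Hypothetical helper: write a sample .rsync-filter into the given directory.
# Each "- pattern" rule excludes matching names in that directory and below.
write_sample_filter() {
    local dir="$1"
    cat > "$dir/.rsync-filter" <<'EOF'
- *.tmp
- build/
EOF
}
```

With this file in place, a directory containing notes.txt, scratch.tmp, and a build/ subdirectory would have only notes.txt (and the filter file itself) copied.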
The crontab entry for the backup is 5 * * * * my_rsync.sh >> /tmp/backup.log 2>&1. If you don’t redirect the output to a file, cron will e-mail it to your account on localhost.
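The entry runs the script at minute 5 of every hour. One way to install it without clobbering existing jobs is sketched below; the helper name is hypothetical, and the entry is the one quoted above.

```shell
#!/bin/bash
# The crontab entry from the text: minute 5 of every hour, output appended to the log.
ENTRY='5 * * * * my_rsync.sh >> /tmp/backup.log 2>&1'

# Hypothetical helper: read an existing crontab on stdin and print it with the
# backup entry appended, unless an identical line is already present.
with_backup_entry() {
    local current
    current=$(cat)
    if [ -n "$current" ]; then
        printf '%s\n' "$current"
    fi
    if ! printf '%s\n' "$current" | grep -qxF -- "$ENTRY"; then
        printf '%s\n' "$ENTRY"
    fi
}
```

It would typically be used as crontab -l 2>/dev/null | with_backup_entry | crontab -, which can safely be run more than once.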
At this point, I need only occasionally check backup.log for problems. (For example, when the server gets re-imaged, I need to log in once by hand to accept the new fingerprint.) This script never deletes files on the server; therefore, I delete the backup directory and run a “full” backup twice a year.
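Because the script prints “Backup complete” whether or not rsync succeeded, checking the log means looking for rsync’s own diagnostics between the start and end lines of a run. A rough sketch, assuming those diagnostics begin with the word rsync (as in “rsync error: ...”); the function name is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: read the backup log on stdin and print the start line
# of every run in which rsync wrote a diagnostic (a line beginning "rsync").
failed_runs() {
    awk '
        /^Network detected/ { start = $0; bad = 0 }             # a new run begins
        /^rsync/            { bad = 1 }                         # rsync reported a problem
        /^-+$/              { if (bad) print start; bad = 0 }   # separator ends the run
    '
}
```

Running failed_runs < /tmp/backup.log prints nothing when every run was clean.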
Alternatives
My ifconfig-based technique of checking to see if the machine is online is not portable. In addition, it does not verify that the backup server is available. (Just because my computer is connected to the internet doesn’t mean that the backup server is up and reachable.) A more robust solution may be to use ping.
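A ping-based check might look like the sketch below. I have not used this in my own script; the function name and the PING_CMD override are assumptions added for illustration, and a server that answers ping is not guaranteed to accept ssh connections.

```shell
#!/bin/bash
# Hypothetical helper: succeed if the host answers a single ping.
# PING_CMD is an assumption added here so the probe command can be swapped out.
server_reachable() {
    local host="$1"
    "${PING_CMD:-ping}" -c 1 "$host" > /dev/null 2>&1
}
```

In the backup script, if server_reachable "$remote_server" could then replace the ifconfig test, which also removes the dependence on interface names.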
I have a simple rsync trick that avoids having to set up and use SSH for unattended access.
Check out the details here: http://www.diyode.com/2011/11/backup-your-mac-to-remote-server/