Complementing Time Machine with rsync

Many of my colleagues at Atomic Object use Time Machine to back up their laptops. By default, Time Machine makes incremental backups hourly, but only when the external backup disk is attached. As a result, this default Time Machine backup system has two limitations:

  1. It is not automatic: The user must attach the external hard drive (assuming the laptop travels frequently).
  2. The gap between backups can be relatively large.

For most people, these limitations are minor. For example, the gap between backups can be mitigated by the diligent use of git or some other version control system. However, I am not a typical developer: I’m an absent-minded professor who celebrates when I go a whole week without forgetting to bring my laptop to work. Relying on my remembering to attach an external hard disk to run a backup is not a good idea.

There are several ways to automate a Time Machine backup and run it over a network; however, I found a lower-tech solution: rsync. I use rsync and cron to back up my Documents and Library directories hourly. These directories contain all of my files whose changes would be difficult or impossible to re-create by hand. This “workspace” data is less than 10GB and fits easily on the shared file server. The amount of “workspace” data that changes every hour tends to be very small, so the rsync backup is rarely noticeable. The potentially long gaps between Time Machine backups pose less of a risk for other files (applications, music, system files, etc.) because they rarely change; therefore, I don’t back them up with rsync. In the event of a failure, the few “non-workspace” files that had changed since the previous Time Machine backup can reasonably be restored from original media.

My Script

I have cron run the following script hourly:

#! /bin/bash
 
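# Look only at the en0, en1, ... records; the pattern is quoted so the shell cannot expand it.
if /sbin/ifconfig | grep -E -A 5 '^en[0-9]' | grep -q 'status: active'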
then
    echo -n "Network detected.  Starting backup at "
    date
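    cd "$HOME" || exit 1
    # remote_server is not set in this script; it must hold the backup server's host name.
    # No backticks around the command: they would run rsync's output as a second command.
    /opt/local/bin/rsync -e 'ssh -i ID_RSA' -a Documents Library --filter ': .rsync-filter' \
                    --exclude="/Library/Caches" --exclude '/Library/Mail/IMAP*' "${remote_server}:Backup"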
    echo -n "Backup complete at: "
    date
else
    echo -n "No network detected at "
    date
fi
 
echo "----------------------------"

The if statement checks whether the machine is connected to a network. It is not sufficient to simply grep for status: active because my ifconfig output contains two virtual NICs that are always active. (I think Parallels uses them.) This line relies on

  1. the two entries for my two network cards (wired and wireless) being the only two lines beginning with en?, and
  2. the status line for a particular network device being between 3 and 5 lines from the first line of the record. (In other words, grep -A 5 grabs enough lines to find the NIC’s status, but doesn’t generate a “false positive” by detecting a succeeding NIC’s status.)
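One way to depend less on line counts is to ask ifconfig about each interface by name, so that only that interface's record is searched. This is a minimal sketch, not the script I run. The helper names are hypothetical, and the interface names en0 and en1 are assumptions that you should adjust for your machine.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if the ifconfig output on stdin
# contains a "status: active" line.
iface_is_active() {
    grep -q 'status: active'
}

# Hypothetical helper: succeeds if any of the named interfaces is active.
# Passing an interface name makes ifconfig print only that record, so a
# neighbouring (virtual) interface cannot cause a false positive.
any_iface_active() {
    local iface
    for iface in "$@"; do
        if /sbin/ifconfig "$iface" 2>/dev/null | iface_is_active; then
            return 0
        fi
    done
    return 1
}

# Example use (en0 and en1 are assumed names):
#   if any_iface_active en0 en1; then echo "Network detected."; fi
```

The trade-off is that the interface names are now hard-coded, so this is no more portable than the original check.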

ID_RSA is the name of a file containing a private key with no passphrase. The key allows rsync to authenticate with the file server. The key must not have a passphrase (because I don’t think there is a way to configure cron to use one). Because the key contains no passphrase, I configure the server to run only rsync when a client attempts to use this key. (To do this, add the public key for ID_RSA to the server’s authorized_keys file, then add command="rsync --server -logDtpr . Backup" to the beginning of the line containing the public key.)
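The restricted authorized_keys entry described above can be built with a small helper. This is only a sketch: the function name is hypothetical, and the key text in the usage example is a stand-in for the real contents of the public key file.

```shell
#!/bin/bash

# Hypothetical helper: print an authorized_keys line that forces the
# rsync server command whenever the given public key is used.
# $1 is the full public key text (type, key data, optional comment).
restricted_key_line() {
    local pubkey="$1"
    printf 'command="rsync --server -logDtpr . Backup" %s\n' "$pubkey"
}

# Example use on the server (ID_RSA.pub is the public half of ID_RSA):
#   restricted_key_line "$(cat ID_RSA.pub)" >> ~/.ssh/authorized_keys
```

OpenSSH accepts further restrictions in the same position, such as no-pty and no-port-forwarding, separated from the command option by commas. Whether you need them depends on how much you trust the machine holding the key.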

The --filter ': .rsync-filter' option tells rsync to check each directory for a .rsync-filter file containing lists of files to exclude.
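For illustration, a per-directory filter file might look like the one written by the sketch below. The patterns are made-up examples, not my actual excludes, and the helper name is hypothetical. With the plain ':' form of the rule, each line in the file is an ordinary filter rule, so excludes start with "- ".

```shell
#!/bin/bash

# Hypothetical helper: write a sample .rsync-filter into directory $1.
# The patterns below are illustrative assumptions only.
write_sample_filter() {
    local dir="$1"
    cat > "$dir/.rsync-filter" <<'EOF'
# Rules apply to this directory and everything below it.
- *.o
- .DS_Store
# A leading slash anchors the pattern to the directory holding this file.
- /build/
EOF
}

# Example use:
#   write_sample_filter "$HOME/Documents/some_project"
```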

The crontab entry for the backup is 5 * * * * my_rsync.sh >> /tmp/backup.log 2>&1, which runs the script at five minutes past every hour and appends its output to /tmp/backup.log. If you don’t redirect the output to a file, cron will e-mail it to your account on localhost.

At this point, I need only occasionally check backup.log for problems. (For example, when the server gets re-imaged, I need to log in once by hand to accept the new fingerprint.) This script never deletes files on the server; therefore, I delete the backup directory and run a “full” backup twice a year.

Alternatives

My ifconfig-based technique of checking to see if the machine is online is not portable. In addition, it does not verify that the backup server is available. (Just because my computer is connected to the internet doesn’t mean that the backup server is up and reachable.) A more robust solution may be to use ping.
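A ping-based check could look something like the sketch below. I have not run this as part of my backup. The function name is hypothetical, and remote_server is assumed to hold the backup server's host name, as in the script above.

```shell
#!/bin/bash

# Hypothetical helper: succeeds if host $1 answers a single ping.
# -c 1 sends one packet. Timeout flags differ between platforms
# (for example -t on macOS and -W on Linux), so none is set here.
server_reachable() {
    local host="$1"
    ping -c 1 "$host" >/dev/null 2>&1
}

# Example use in place of the ifconfig test:
#   if server_reachable "$remote_server"; then
#       echo "Server reachable."
#   fi
```

One caveat: a server or firewall that drops ICMP packets will look unreachable to ping even when ssh works, so this check can produce false negatives.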