Rsync Speedup on Underpowered Systems

At my house, I have an old Linux desktop that I use as a backup server. It sits in a corner of the house, faithfully churning away, storing my Time Machine backups and other system images. Over the summer I had cause to migrate all of the data onto a new drive array in a new machine.

To prepare for the large data transfer, I set up both machines on a gigabit switch supporting jumbo frames and configured the NICs to use an MTU of 9,000 bytes. This network should have been good enough to saturate whichever machine had the slower hard drive array, but when I started syncing the two systems using rsync, I was only getting a paltry 20 MiB/s — faster than 10/100 Ethernet to be sure, but still nowhere near what I was expecting.
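
If you're doing something similar, the jumbo-frame setup boils down to raising the MTU on both machines and then confirming that a large packet actually makes it across unfragmented. A rough sketch (the interface and host names are placeholders for whatever yours are):

# raise the MTU on both machines (interface name will vary)
ip link set dev eth0 mtu 9000

# verify jumbo frames end to end: 8972 bytes of payload + 28 bytes of IP/ICMP headers = 9000
ping -M do -s 8972 otherhost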

Some preliminary investigation revealed that my old file server, with its puny single-core Athlon 64, had its CPU pegged at 100%. That seemed odd, as reading files off a disk and spitting them over the wire should have been a fairly simple task. It turns out that, by default, rsync transfers data over an ssh session, and the versions of ssh I was running defaulted to AES, which is not a particularly light cipher. The disk array was also a Linux software RAID 5, which wasn't helping things. The two factors combined seemed to be causing the CPU bottleneck.
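
If you want to see where the cycles are going on your own hardware, watching per-process CPU on the sending machine during a test transfer makes it pretty obvious (pidstat comes from the sysstat package, so that part is optional):

# run on the sending machine while a transfer is in progress
top
# or, if sysstat is installed, a per-process breakdown every second
pidstat -u 1

The things to look for are the rsync/ssh pair and the md RAID kernel thread, which usually shows up with a name like md0_raid5.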

Since I was stuck with the software RAID, I had two options for speeding up the transfer: use a different data tunnel, or set up a proper rsync server, which would allow unencrypted data transfer. Rsync servers are not that hard to set up, but as this was going to be a single-use setup, I didn't want to invest the time. To make rsync use a different remote shell, you just pass it the shell with the -e option. Rather than setting up something like rsh, I used SSH with a much lighter cipher, Arcfour (Alleged RC4).

rsync -e 'ssh -c arcfour' -av /path/to/source/ user@newserver:/path/to/dest/
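
For comparison, the rsync server route I skipped would have looked roughly like this; the module name, paths, and hostname below are made up for the sake of the example:

# /etc/rsyncd.conf on the old server
# run as root so the daemon can read the backups; acceptable for a one-off on a home LAN
uid = root
gid = root

[backups]
    path = /srv/backups
    read only = yes

Then it's rsync --daemon on the old box, and the new machine pulls over the plain rsync protocol (TCP port 873) with no encryption at all:

rsync -av oldserver::backups/ /mnt/newarray/

Not a huge amount of work, but the one-line cipher swap above got me the same result without touching any server-side configuration.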

The arcfour trick came from a mailing list post back in 2005.

With this simple change, I was now getting transfer speeds of ~65 MiB/s, not too far off the normal read speeds for the older drive array. Great success! This trick could also be useful for small embedded storage devices, where CPU time is at a premium, or any of the small ARM computers that are coming onto the market.
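
If you want to gauge how close you are to the disks themselves, a quick raw read test on the source array gives a rough ceiling to compare against (the md device name here is a guess; substitute your own):

hdparm -t /dev/md0

If the rsync throughput lands in the same neighborhood as that number, the network and the cipher are no longer the bottleneck.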