In Unix and its variants, devices (disks, peripherals) are treated as files. Modern Linux distributions mount /dev at boot using the devtmpfs file-system and populate the device files dynamically (based on what is present on the system) using udev. Listing the /dev directory shows that the device files appear just like any other file.
Devices are files. That much we understand, but can files be devices, and if so, how?
In this article we look at creating sparse files, assigning them to loop devices and placing them in a software RAID configuration. The final step in the network RAID configuration is moving one (or more) of the files to a remote mount or share.
Creating a Sparse File
A sparse file doesn’t physically consume disk space for its empty regions, yet the file-system still reports the file at its full (apparent) size. Creating a new sparse file is easy using dd.
$ dd of=sparse_file bs=1024 count=0 seek=100K
0+0 records in
0+0 records out
0 bytes (0 B) copied, 2.72e-05 s, 0.0 kB/s
Notice that the output from dd reports 0 records in and 0 records out: nothing was read or written. Nonetheless, a 100MB file should now exist in the current directory.
$ ls -l sparse_file
-rw-r--r-- 1 kyle kyle 104857600 2010-07-29 10:15 sparse_file
$ du -s -B1 --apparent-size sparse_file
104857600 sparse_file
Of course, it’s not really 100 megabytes…
$ du -s -B1 sparse_file
0 sparse_file
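As an aside, if your system’s GNU coreutils includes truncate, it offers a more direct way to create the same sparse file; this is just an alternative sketch of the step above, not a change to the rest of the walkthrough.
$ truncate -s 100M sparse_file
$ du -s -B1 --apparent-size sparse_file   # should again report 104857600
$ du -s -B1 sparse_file                   # and 0 bytes actually allocated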
Files can be device files
In this example, we created a sparse file (sparse_file), which we will now assign to a loop device. It isn’t necessary to use a sparse file; any regular file will do. First, take a look at the current loop assignments on the system:
$ sudo losetup -a
If there are no loop assignments, the above command will not display any output. Display the first unused loop device with the following:
$ sudo losetup -f
/dev/loop0
The loop device files are under /dev just like the disk device files.
$ ls -l /dev/loop*
brw-rw---- 1 root disk 7, 0 2010-07-29 08:01 /dev/loop0
brw-rw---- 1 root disk 7, 1 2010-07-29 08:01 /dev/loop1
brw-rw---- 1 root disk 7, 2 2010-07-29 08:01 /dev/loop2
...
Your system may have up to 255 of these loop devices.
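If you ever need more loop devices than your distribution sets up by default, one option (a sketch, assuming the loop driver is built as a module on your kernel) is to reload it with a higher max_loop value:
$ ls /dev/loop[0-9]* | wc -l        # how many loop device nodes exist now
$ sudo modprobe -r loop             # only possible if no loop devices are in use
$ sudo modprobe loop max_loop=64    # max_loop is a parameter of the loop module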
Creating a file system
Assign the sparse_file to the first available loop device. /dev/loop0 is the first available, so it becomes our device.
$ sudo losetup -f sparse_file
$ sudo losetup -a
/dev/loop0: [fc00]:4770 (/home/kyle/sparse_file)
$ sudo losetup -j sparse_file
/dev/loop0: [fc00]:4770 (/home/kyle/sparse_file)
Create a file system on the loop device. In this example I create an XFS file system, but you could use whatever you want (ext4, JFS, ReiserFS, etc.):
$ sudo mkfs.xfs /dev/loop0
meta-data=/dev/loop0 isize=256 agcount=4, agsize=6400 blks
= sectsz=512 attr=2
data = bsize=4096 blocks=25600, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0
log =internal log bsize=4096 blocks=1200, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Mount the device:
$ mkdir sparse_file_mount
$ sudo mount /dev/loop0 sparse_file_mount/
$ df sparse_file_mount/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/loop0 97600 4256 93344 5% /home/kyle/sparse_file_mount
Since the device was mounted as root, if you want to create any files in the new file system you need to either:
- Create the files as root, or
- Change the ownership of the mount point to your local user.
A quick sketch of the second option follows (the first is left as an exercise for the reader).
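A minimal sketch of the ownership change, assuming your user and group are both kyle as in the listings above:
$ sudo chown kyle:kyle sparse_file_mount/
$ touch sparse_file_mount/hello.txt      # now works without sudo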
Using mdadm for software RAID between two files
After creating two empty files (sparse or otherwise) named disk1 and disk2 and assigning both of these to loop devices, we can use mdadm to establish a RAID configuration. It is not necessary to create a file-system on the loop devices as we did before; we’ll do that on the new device created by mdadm.
$ dd of=disk1 bs=1024 count=0 seek=100K
-output removed-
$ dd of=disk2 bs=1024 count=0 seek=100K
-output removed-
$ ls -l disk*
-rw-r--r-- 1 kyle kyle 104857600 2010-07-29 11:03 disk1
-rw-r--r-- 1 kyle kyle 104857600 2010-07-29 11:21 disk2
$ sudo losetup -f disk1
$ sudo losetup -f disk2
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop{0,1}
mdadm: array /dev/md0 started.
$ sudo mkfs.xfs -f /dev/md0
-output removed-
$ mkdir raid_mount
$ sudo mount /dev/md0 raid_mount/
$ df raid_mount/
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/md0 97536 4256 93280 5% /home/kyle/raid_mount
(Note: in bash, /dev/loop{0,1} expands to "/dev/loop0 /dev/loop1".)
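To confirm that the mirror is assembled and has finished its initial resync, the kernel's md status file and mdadm's detail view are handy:
$ cat /proc/mdstat                 # shows md0, its member loop devices and resync progress
$ sudo mdadm --detail /dev/md0     # per-device state, sync status, UUID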
Networked RAID
At this point, setting up the networked RAID is a simple matter of mounting a remote file-system and keeping one (or more) of the backing files there. This can be accomplished with a variety of protocols such as NFS, SSHFS or Samba.
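Here is a minimal sketch of the NFS variant, assuming a hypothetical server named fileserver exporting /export/backups, write access to that export, and a fresh array: the local disk1 is assigned as before, while the second backing file simply lives on the remote mount.
$ sudo mkdir -p /mnt/remote
$ sudo mount -t nfs fileserver:/export/backups /mnt/remote   # hypothetical server and export
$ dd of=/mnt/remote/disk2 bs=1024 count=0 seek=100K          # 100MB sparse backing file on the share
$ sudo losetup -f disk1                                      # local file  -> /dev/loop0
$ sudo losetup -f /mnt/remote/disk2                          # remote file -> /dev/loop1
$ sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/loop0 /dev/loop1
Since reads from the remote half travel over the network, it may be worth marking that member with mdadm's --write-mostly option so reads are served from the local disk.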
Notes on Performance
You can use hdparm
to gauge disk performance:
$ sudo hdparm -Tt /dev/md0
/dev/md0:
Timing cached reads: 1772 MB in 2.00 seconds = 886.50 MB/sec
Timing buffered disk reads: 98 MB in 1.30 seconds = 75.31 MB/sec
$ sudo hdparm -Tt /dev/sda2
/dev/sda2:
Timing cached reads: 1726 MB in 2.00 seconds = 862.92 MB/sec
Timing buffered disk reads: 196 MB in 3.00 seconds = 65.26 MB/sec
If the RAID is set up over a network, the bottleneck is likely to be bandwidth. On a 100Mbps link, expect disk reads/writes of about 10 MB/s (the theoretical maximum is 100 Mbps / 8 bits per byte = 12.5 MB/s).
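For a rough end-to-end write figure over the network, a simple dd against the mounted array works too; the file name and size here are just for illustration.
$ dd if=/dev/zero of=raid_mount/speedtest bs=1M count=50 conv=fdatasync   # flush to disk before reporting a rate
$ rm raid_mount/speedtest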
Further Reading
Check out DRBD for a networked RAID.
This is a nice trick, I’ve been doing something similar on a faraway colo machine since it lost a disk. Replaced the failed /dev/sda* with loop devices on NFS-mounted files on a nearby machine. Indeed I get ~10MB/sec performance. I haven’t bothered with the “write-mostly” option as it’s just a low-volume mail/web server. I hadn’t known about using a sparse file – wish I had.
The issue today is that I’m resyncing my 50GB /home to the NFS/loop device and the machine is on its knees, ~100 load average (not CPU – all WIO), unresponsive to the point of being offline, etc. Yet the I/O to the loop file is crawling at 500KB/sec. In the past, resync has been hard on load average, but at least was short-lived at 10MB/sec. At this rate, I’m looking at ~24 hours of downtime.
Check out vblade and AoE (ATA over Ethernet) techniques for a networked block device mapping with no TCP overhead.