In my previous post, I described how to access a private network remotely by creating a VPN server using OpenVPN and a Raspberry Pi. Now that we can connect to our local network remotely, we can set up a system to serve files outside our network.
This is the second post in a series about protecting your privacy by self-hosting while attempting to maintain the conveniences of public cloud services. See the bottom of this post for a list.
Primary Goals: Data Longevity and Accessibility
The two primary goals I have for storing data are ease of access and longevity of data. I have found that the lowest common denominator for this is to use the unencrypted NTFS file system and SMB for network access.
One reason I chose unencrypted NTFS is that the hard drives are physically located in my own home, so other people don’t normally have physical access to them. There are circumstances where someone might gain access to your home, but those are not typical.
Another reason is that this file system is supported across many platforms. It is easy to recover data by plugging the drives into almost any machine, be that macOS, Linux, or Windows.
Another option would be to use a more involved RAID setup, but that would complicate recovering the data if something in the array failed beyond normal operation. With plain NTFS drives, if I were to become ill or incapacitated, family members with limited technical ability would still be able to retrieve our family photos and other important documents.
Regarding protocols, I decided to use SMB primarily for the ubiquitous support provided by the Samba project as well as native access using Microsoft’s own implementation of the protocol.
Because I am running NTFS without automatic mirroring, it’s a good idea to maintain multiple copies of the data. That is where rsync and rclone come into the story, which I’ll cover later in the post.
Hardware Selection and Setup
There are a few options for selecting a hardware and software arrangement for serving files. I decided that, for my purposes, it would be quite useful to have a powerful workstation and, at the same time, provide file and other network services for my use.
Another alternative would be to set up a dedicated file server on your home network. The primary reason I chose what I did was that I didn’t mind leaving my workstation running all the time. The hardware I selected was powerful enough to serve data and function well as a workstation.
This is the hardware I selected:
- 6-core AMD Ryzen with liquid cooling
- Compatible motherboard
- 64 GB RAM
- GeForce GTX 1060
- 1 TB M.2 PCIe SSD
- Midtower case with enough space for 4 internal 3.5″ mechanical hard drives
- (3) 8 TB internal 3.5″ hard drives
- (3) 8 TB external USB 3.0 hard drives
Your needs may vary, but for me, this hardware provides a very nice workstation experience with enough RAM and CPU cores to run VMs and other services (like transcoding videos with Emby). In addition, the internal SSD allows me to run the OS and other applications quickly with the mechanical hard drives providing massive data storage. The external drives provide a one-to-one backup of my data.
Don’t forget that an Uninterruptible Power Supply (UPS) is always a good idea. Configure your UPS to automatically and safely shut down your system when the power is out for a certain number of minutes.
For an OS, I’ve decided to run Windows 10. While Linux definitely excels as a server, I felt that Microsoft has come a long way with the stability and usability of Windows (Windows Subsystem for Linux, or WSL, is great, too). Plus, I run Microsoft Office apps quite regularly. I’ve found that it is useful to be skilled and fluent in all three major platforms (Windows, macOS, Linux).
Since I’m running Windows, I can use the built-in SMB file server support. If you would like to run Linux, Samba is great and very well supported.
We can now directly connect to our file share on macOS, Linux, and Windows. To access our files from iOS, however, we’ll need to do something different because iOS is a bit lacking in the file access area and doesn’t directly support connecting to SMB file shares. That will be covered in the next post in this series.
I am also running VMware Workstation Pro, which I use to host other network services (also to be covered in a later post in this series).
Rsync for Local and Remote Backups
I read and write directly to my internal hard drives, and I use rsync to copy that data to the external drives.
Rsync is typically installed by default on Linux or macOS. On Windows, you can run it under WSL.
I use the Windows Task Scheduler (or you can use cron in the Linux world) to set up nightly and hourly backups of my data. I configured the schedule to run two scripts I’ve put together: one for running a nightly backup and another for a more selective set of files hourly.
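If you go the cron route, the two jobs might look like the entries printed by the sketch below. The script paths and the run times (2:00 a.m. for the nightly job, the top of every hour for the hourly one) are placeholders assumed for illustration.

```shell
# Hypothetical script locations; point these at wherever your backup scripts live.
NIGHTLY_SCRIPT="$HOME/bin/backup-nightly.sh"
HOURLY_SCRIPT="$HOME/bin/backup-hourly.sh"

# Print two crontab entries.
# The five leading fields are: minute, hour, day of month, month, day of week.
cron_entries() {
  printf '0 2 * * * %s\n' "$NIGHTLY_SCRIPT"   # every night at 02:00
  printf '0 * * * * %s\n' "$HOURLY_SCRIPT"    # at the top of every hour
}

cron_entries

# To install the entries alongside any existing ones:
#   (crontab -l 2>/dev/null; cron_entries) | crontab -
```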
For nightly backups, I make an incremental backup of my entire set of files.
LOG=nightly-$(date '+%Y-%m-%d_%H:%M:%S').log
rsync -av --delete --exclude '$RECYCLE.BIN' --exclude 'System Volume Information' --exclude 'found.000' --exclude 'Recovery' "$SOURCE" "$DESTINATION" | tee ~/logs/$LOG
Of course, you’ll need to define $SOURCE. This can be any path on your computer. The $DESTINATION can be any path, including local paths and those over ssh. I plan to utilize the ssh method later this summer by co-locating a small PC and set of hard drives at a family member’s home in another city. The PC will be connected to my network via the same VPN we just set up in Part 1 of this series and will allow me to push incremental backups offsite to provide some peace of mind if something happens to my home (fire, flood, electrical surge).
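As a sketch, the two variables might be defined like this under WSL. The drive letters, user name, host name, and remote path are all made up for illustration; substitute your own.

```shell
# All paths and names below are hypothetical examples.
# Under WSL, Windows drive letters appear under /mnt (D: becomes /mnt/d).
# A trailing slash on the source tells rsync to copy the directory's contents
# rather than the directory itself.
SOURCE="/mnt/d/"

# Option 1: a local destination, such as an external USB drive.
DESTINATION="/mnt/e/"

# Option 2: a remote destination over ssh, written as user@host:path.
# DESTINATION="backup@offsite.example:/srv/backup/"

# Show the command that would run, without executing it.
echo rsync -av --delete "$SOURCE" "$DESTINATION"
```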
We use archive mode (-a) as well as verbose mode (-v). The --delete flag also removes files at the destination if we’ve deleted them at the source. If this is not what you want, simply omit --delete. These settings work well for my situation.
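Because --delete removes files, it can be worth previewing a run first. The sketch below builds two throwaway directories and uses rsync’s -n (--dry-run) flag, which the commands above do not use, to list what would be copied and deleted without changing anything. The file names are invented for the demo.

```shell
# Throwaway directories so the preview can be tried safely.
SOURCE=$(mktemp -d)
DESTINATION=$(mktemp -d)
echo "keep" > "$SOURCE/keep.txt"          # exists only at the source
echo "stale" > "$DESTINATION/stale.txt"   # exists only at the destination

if command -v rsync >/dev/null 2>&1; then
  # -n (--dry-run) reports what would be transferred or deleted,
  # but leaves both directories untouched.
  PREVIEW=$(rsync -avn --delete "$SOURCE/" "$DESTINATION/")
else
  PREVIEW="rsync is not installed; skipping the preview"
fi

echo "$PREVIEW"
```

Lines beginning with "deleting" in the output are the files that --delete would remove from the destination on a real run.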
For hourly backups, I am a bit more selective about what gets copied over. In my case, I have a set of files for a small business I run on the side that I want to back up hourly, as they change more frequently.
LOG=hourly-$(date '+%Y-%m-%d_%H:%M:%S').log
rsync -av --delete --exclude '$RECYCLE.BIN' --exclude 'System Volume Information' --exclude 'found.000' --exclude 'Recovery' "$SOURCE/Business/" "$DESTINATION/Business/" | tee ~/logs/$LOG
Rsync is very powerful, well-supported and can be used for other things like incremental snapshot backups. You may want to tweak the parameters to fit your specific needs. For more information, see Mike Rubel’s writeup about snapshots or Mark Sanborn’s article. Both are excellent resources. Also, the man page is quite helpful.
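To give a feel for the snapshot approach, here is a minimal sketch using rsync’s --link-dest option, which hard-links unchanged files against a previous snapshot so that every snapshot looks complete while only changed files take up new space. The directory layout, the take_snapshot function, and the "latest" symlink are illustrative choices of mine and are not taken from either article. It also assumes the destination file system supports hard links.

```shell
# Throwaway directories so the sketch can run anywhere; use real paths in practice.
SOURCE=$(mktemp -d)
SNAP_ROOT=$(mktemp -d)
echo "v1" > "$SOURCE/notes.txt"

# take_snapshot NAME: copy SOURCE into SNAP_ROOT/NAME, hard-linking files that
# are unchanged since the previous snapshot (tracked by the "latest" symlink).
take_snapshot() {
  name=$1
  latest="$SNAP_ROOT/latest"
  if [ -d "$latest" ]; then
    rsync -a --link-dest="$latest" "$SOURCE/" "$SNAP_ROOT/$name/" || return 1
  else
    rsync -a "$SOURCE/" "$SNAP_ROOT/$name/" || return 1
  fi
  # Repoint "latest" at the snapshot that was just completed.
  ln -sfn "$SNAP_ROOT/$name" "$latest"
}

if command -v rsync >/dev/null 2>&1; then
  take_snapshot snap1
  echo "v2" > "$SOURCE/todo.txt"   # a new file appears between snapshots
  take_snapshot snap2
  ls -l "$SNAP_ROOT/snap2"
else
  echo "rsync is not installed; skipping the demo"
fi
```

Each snapshot directory can be browsed or restored on its own, and deleting an old snapshot does not affect the files in newer ones.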
Rclone for Encrypted Backups to the Cloud
Rclone is a useful tool for saving and restoring files hosted on public cloud services including Google Drive and about two dozen others. Did I just say public cloud services? I thought the whole point of this blog series was to not rely on them! Well, yes, you are right. I am using Google Drive, but I am encrypting the files before I send them over.
Rather than duplicate the Rclone Google Drive setup documentation, you can follow the steps provided there. Once you set up the configuration, you can set up a crypt profile with Rclone by following the Rclone Crypt setup documentation. This is the most important part, as you do not want to be using the Google Drive share directly. Rclone even supports encrypting of directory and file names, so that your data is not decipherable by others.
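Once both remotes exist, a quick way to convince yourself that the encryption is really in effect is to list the same data through each remote. In this sketch the remote names gdrive and secret, and the folder name backup, are placeholders for whatever you chose while running rclone config.

```shell
# Placeholder names; substitute the ones you created with "rclone config".
DRIVE_REMOTE="gdrive"     # plain Google Drive remote
CRYPT_REMOTE="secret"     # crypt remote layered on top of it
DRIVE_FOLDER="backup"     # Google Drive folder that the crypt remote wraps

# compare_listings: print the readable view (through the crypt remote) and
# the raw view (what Google Drive actually stores).
compare_listings() {
  echo "Through $CRYPT_REMOTE: (decrypted names)"
  rclone lsf "$CRYPT_REMOTE:"
  echo "Directly on $DRIVE_REMOTE:$DRIVE_FOLDER (encrypted names)"
  rclone lsf "$DRIVE_REMOTE:$DRIVE_FOLDER"
}

# Only run when rclone is installed and the crypt remote is configured.
if command -v rclone >/dev/null 2>&1 && rclone listremotes | grep -q "^$CRYPT_REMOTE:\$"; then
  compare_listings
else
  echo "rclone or the $CRYPT_REMOTE remote is not set up; nothing to compare"
fi
```

If file name encryption is enabled, the second listing should show scrambled names where the first shows the real ones.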
Of course, given Google’s or a government agency’s computing resources, it is possible that they could crack the encryption on your files. If you are apprehensive about this, you can skip this step. For now, I am only backing up my encrypted business files to Google Drive and may consider ceasing this altogether at some point when I get my offsite backup online.
The following line is part of my hourly backup script shown above.
rclone -vv copy "$SOURCE/$DIRECTORY" "$TARGET:/$DIRECTORY" 2>&1 | tee -a ~/logs/$LOG
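Putting the rsync and rclone steps together, an hourly script might be laid out roughly as follows. This is a sketch of one possible layout rather than the exact script: the variable values are placeholders, the hourly_backup function name is invented, and it uses tee -a for the rclone step so that the second command appends to the log instead of overwriting the rsync output.

```shell
# Placeholder values; adjust for your own drives and rclone remote.
SOURCE="${SOURCE:-/mnt/d}"
DESTINATION="${DESTINATION:-/mnt/e}"
DIRECTORY="Business"
TARGET="secret"                       # assumed name of the rclone crypt remote
LOG_DIR="${LOG_DIR:-$HOME/logs}"
LOG="hourly-$(date '+%Y-%m-%d_%H:%M:%S').log"

hourly_backup() {
  mkdir -p "$LOG_DIR"
  # Step 1: mirror the directory to the local backup drive (starts a fresh log).
  rsync -av --delete "$SOURCE/$DIRECTORY/" "$DESTINATION/$DIRECTORY/" 2>&1 | tee "$LOG_DIR/$LOG"
  # Step 2: copy the same directory to the encrypted remote (appends to the log).
  rclone -vv copy "$SOURCE/$DIRECTORY" "$TARGET:/$DIRECTORY" 2>&1 | tee -a "$LOG_DIR/$LOG"
}

# Run only when both tools and the source directory are available.
if command -v rsync >/dev/null 2>&1 && command -v rclone >/dev/null 2>&1 && [ -d "$SOURCE/$DIRECTORY" ]; then
  hourly_backup
else
  echo "Skipping: rsync, rclone, or $SOURCE/$DIRECTORY is missing"
fi
```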
Again, your situation may differ from mine. You may want to back up files to the cloud more frequently, less frequently, or not at all. One other tip: At the time of this writing, Google’s G Suite for business provides unlimited storage in Google Drive for around $10 USD per month.
This is another important step on the self-hosting journey. Now that we can serve and remotely access our files while feeling secure that we have regular backups, we can begin to build more network services.
Later in this series, I’ll cover setting up and hosting alternatives to Google Photos, Gmail, and others. See below for a list of the upcoming posts.
- Setting up OpenVPN
- SMB File Server with Automated Backups using Rsync/Rclone
- Note-taking with Nextcloud & Syncthing
- Movies and Music using Emby
- Protect Yourself Online with Privacy Tools
- Ad and Tracker Blocking with Pi-Hole
- Email, Contacts, and Calendars
- Bookmarks and Browsing History using Firefox Sync and Accounts Server
- Photos and Home Movies using Custom Tool