
Super Cheap Data Backups with Amazon Glacier Storage

[Photo: Matanuska Glacier mouth]

My daughter turned 1 year old recently, and milestones like this one remind me how important my data is—especially photographs. Our family uses Google+ for photo storage and backup (which I consider very safe), but a few weeks back I started to get nervous about data loss. It can happen, even at Google.

Using physical media is an option, but what I really wanted was another cloud-based backup option that met the following criteria:

  • The source is reliable and likely to still exist in 10+ years.
  • I don’t need to access the data often (or ever, hopefully)—it’s insurance against catastrophic data loss.
  • The service is very cheap.

Amazon Glacier Storage meets all these criteria. In this post I’ll describe how to download all your Google data (as an example) and set up an Amazon S3 Bucket with lifecycle rules that will automatically transfer your S3 data directly into Glacier storage without needing to do any programming.

Basics of Amazon Glacier Storage

So what is Amazon Glacier Storage? It is an extremely low-cost ($0.01 per gigabyte per month) storage service for long-term online data backup. You can store anywhere from 1 byte to 40 terabytes using Glacier Storage. The catch is that accessing the data takes time (Amazon quotes 3-5 hours), and data retrieval is more expensive ($0.09 per GB). With the long access times and higher retrieval cost, Glacier is not a good choice for file staging or for data you need to retrieve regularly, but it’s perfect for long-term backups.
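To put those prices in perspective, here’s a quick back-of-the-envelope estimate in Python. The 20 GB archive size is just an example; the rates are the ones quoted above.

```python
# Rough cost estimate at the Glacier prices quoted above
storage_rate_per_gb = 0.01    # $ per GB per month in Glacier
retrieval_rate_per_gb = 0.09  # $ per GB to pull data back out

archive_size_gb = 20          # example backup size

monthly_storage_cost = archive_size_gb * storage_rate_per_gb
one_time_retrieval_cost = archive_size_gb * retrieval_rate_per_gb

print(f"Storage: ${monthly_storage_cost:.2f}/month")   # $0.20/month
print(f"Retrieval: ${one_time_retrieval_cost:.2f}")    # $1.80
```

In other words, keeping a 20 GB photo archive costs pennies per month, and even a full retrieval, which should only happen in a disaster scenario, is under two dollars.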

Download All Your Google Data!

Google provides a great download service called Google Takeout for documents, photos, or email. Takeout lets you download any or all of your data in zip or gzip format through their web interface.

Visit the Takeout website, select the classes of data you want to download (photos, contacts, maps, email, YouTube, etc.), choose a file type for the archive, and click “Create Archive”.
[Screenshot: Google Takeout archive selection]

Google will create the archive for you (this takes time, but in my case a 17 GB archive was ready in minutes) and you can then download the .zip files.

There are two annoying details:

  1. Google splits the archives into 2 GB chunks, which by itself isn’t too bad, but…
  2. Google forces you to re-authenticate via their web interface each time you download one of the archives. They do this for data security, but not being able to batch up all the 2 GB .zip files in a single download job is annoying and tedious (I had to manually download 9 archives of ~2 GB apiece).

Once you have your data, it can be uploaded to Amazon.

Amazon Web Services S3 Buckets & Glacier Storage Class

To use Amazon Glacier storage, you must create an Amazon Web Services account. Amazon Web Services encompasses many different solutions: computing, storage, database, analytics, etc. We’ll be focusing on two of their storage solutions, Amazon Simple Storage Service (S3), and Amazon Glacier Storage.

Our ultimate goal is to get the data that needs to be backed up into inexpensive, long-term Glacier storage. To that end, there are three approaches:

  1. Use the Amazon Glacier API to write code that will upload data to Glacier. There are AWS SDKs available for both Java and .NET.
  2. Use a 3rd party tool like Arq, which also supports encryption.
  3. Configure an Amazon S3 bucket to automatically move your data to Amazon Glacier storage with Lifecycle Rules.

I am going to skip the first two approaches and explain how the 3rd can be accomplished without paying for any 3rd party software or writing any code, with four simple steps:

1. Create an AWS account.

Self-explanatory. Head over to Amazon and sign up for an account. Don’t worry, it’s free.

2. Create an S3 bucket for your backups.

Log in to the AWS Management Console and select S3 under “Storage & Content Delivery.”

[Screenshot: AWS Management Console]

Once in the S3 section of AWS, click the big blue “Create Bucket” button and choose a name and region.

The key detail here is that we are using an Amazon S3 bucket as the container for the files we upload, then, via Lifecycle Rules on the bucket itself, having Amazon automatically move them to Glacier storage. I have found working with S3 directly to be more straightforward than working with Glacier.
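If you prefer scripting to clicking through the console, the bucket can also be created with the AWS SDK for Python (boto3). This is only a sketch; the bucket name and region below are placeholders you would swap for your own.

```python
import boto3

# Region and bucket name are placeholders -- substitute your own values
s3 = boto3.client("s3", region_name="us-west-2")

s3.create_bucket(
    Bucket="my-family-photo-backups",
    CreateBucketConfiguration={"LocationConstraint": "us-west-2"},
)
```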

3. Add and configure a Lifecycle Rule for your S3 Bucket.

In the S3 Management Console, you should now see your new bucket under “All Buckets”, and to the left of the bucket is a Properties icon (looks like a piece of paper with a magnifying glass). Click the properties icon and Bucket Properties will open up on the right side of the screen. Expand “Lifecycle” and click “Add Rule”.

[Screenshot: Adding a Lifecycle rule]

Clicking “Add Rule” opens a modal wizard for creating the rule. On the first page, select “Apply Rule to: Whole Bucket” and click “Configure Rule” in the lower right. Then, for “Action On Objects”, select “Archive Only” and enter “1 day” for “Archive to Amazon Glacier Storage Class”.

[Screenshot: Configuring the Lifecycle rule]

Review and accept your new lifecycle rule. Now any data uploaded to your S3 bucket will be moved to Glacier automatically after 1 day.
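The same lifecycle rule can be applied programmatically if you’d rather skip the wizard. A minimal boto3 sketch, reusing the hypothetical bucket name from the earlier example:

```python
import boto3

s3 = boto3.client("s3")

# Transition every object in the bucket to Glacier one day after upload
s3.put_bucket_lifecycle_configuration(
    Bucket="my-family-photo-backups",  # placeholder bucket name
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Filter": {"Prefix": ""},  # empty prefix = whole bucket
                "Status": "Enabled",
                "Transitions": [{"Days": 1, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```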

4. Upload your data to the S3 Bucket.

There are a number of 3rd party tools for uploading and managing S3 bucket data. I used one called 3Hub (which has since been discontinued). However, it’s easy to do with the AWS Console. In the S3 Bucket console, click “Actions” and “Upload” to bring up a simple web interface for queueing and uploading files to your bucket.
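If you have many Takeout archives, a short script may be less tedious than queueing them one at a time. A minimal boto3 sketch, where the file pattern and bucket name are placeholders:

```python
import glob
import boto3

s3 = boto3.client("s3")

# Upload every Takeout .zip in the current directory (placeholder names)
for path in glob.glob("takeout-*.zip"):
    s3.upload_file(
        Filename=path,
        Bucket="my-family-photo-backups",
        Key=f"google-takeout/{path}",
    )
    print(f"Uploaded {path}")
```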

Note that this approach does not encrypt your data. If that is important to you, I recommend you explore Arq.

I’ve used this method of backing up my Google data for a few months now. My plan is to just manually upload data occasionally (I’m mainly concerned with family photos). Getting my data into Glacier has given me peace of mind that even if my computer died and Google lost my data, I could still get it back from Glacier as a last resort.