Distributing Large Files Easily with Amazon S3

I’ve found Amazon’s S3 very useful for distributing large files to customers: not just as an application back end, but as a simple way of distributing and archiving data. That is hardly surprising or innovative, since S3 exists to serve files, but it wasn’t obvious to me how to set it up as a simple file distribution area.

Eventually, I boiled it down to:

$ rake s3:upload[MyFile.zip]
$ rake s3:urls

https://s3.amazonaws.com/MyBucket/MyFile.zip?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Expires=1373635405&Signature=GasdfTHxYCBwhH%2FhyktUD0RuI8o%3D

The first line uploads a file, and the second line will produce a signed link to that file that anyone can use for a limited time (until the Expires time). This was finally a workflow I could use.
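
To make the "limited time" part concrete: the `Expires` parameter in the link is a Unix timestamp, so anyone holding a link can check when it stops working. The snippet below is a minimal sketch using only Ruby's standard library; `link_expiry` is a hypothetical helper name, not part of the Rakefile, and the URL is the placeholder example from above.

```ruby
require 'uri'

# Example link copied from the output above (key and signature are placeholders).
url = "https://s3.amazonaws.com/MyBucket/MyFile.zip?AWSAccessKeyId=ABCDEFGHIJKLMNOPQRST&Expires=1373635405&Signature=GasdfTHxYCBwhH%2FhyktUD0RuI8o%3D"

# Hypothetical helper: return the link's expiry as a UTC Time.
def link_expiry(url)
  query  = URI.parse(url).query
  params = URI.decode_www_form(query).to_h   # "Expires" => "1373635405", ...
  Time.at(Integer(params.fetch("Expires"))).utc
end

expiry = link_expiry(url)
puts "Link expires at #{expiry}"
puts "Already expired? #{Time.now.utc > expiry}"
```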

Read more to see a complete example of how to do it.

Why S3?

Why not Dropbox? Or Basecamp? Or… XYZ? Honestly, it’s partly because Atomic has an S3 account already. Additionally, S3 storage is very cheap, provided you can wrangle its interface, which is far more flexible than simple distribution requires.

1. Setting up S3 & Permissions

Prerequisites

Obviously, you’ll need an Amazon Web Services account. The sample code listed below is a Rakefile, so you’ll also need Ruby and Rake. Finally, the Rakefile wraps Tim Kay’s fantastic aws script.

Find User Id

Amazon provides fine-grained access management through IAM (Identity & Access Management). From the AWS console, you can get to the IAM pages, and from there you can create a user and its access keys (the key ID and secret used for signing). I won’t cover that part here, as it’s well documented elsewhere.

With an IAM user set up, click on the user and view the Summary tab. Under User ARN, you’ll see a value that looks like `arn:aws:iam::123456789012:user/ausername`. Copy that somewhere so we can get at it in the next step.

Bucket & Policy Setup

The part that confused me most was setting up S3 with a bucket and security. First, create a new bucket. In this documentation I’ve called it `ao-demo-bucket`. Next, select the newly-created bucket, go under *Properties* -> *Permissions*, and click *Add bucket policy*. You should see a dialog with a text box. The simplest way to start is to use their policy generator, so next click that link on the editor dialog.

In the generator, select the policy type of *S3 Bucket Policy*. Then do the following:

  1. For the principal, use the “User ARN” that you copied earlier.
  2. Select S3 as the service.
  3. Select *All Actions*. The Amazon Resource Name should refer to your bucket, and it will look like `arn:aws:s3:::ao-demo-bucket`.
  4. Click *Add Statement*. You’ll see it added to a list at the bottom of the form.

Next, we’ll add another statement to this policy.

  1. Enter the *User ARN* again as the principal.
  2. Select S3 as the service again.
  3. Select *All Actions*.
  4. Enter an Amazon Resource Name that refers to the *contents* of your bucket, looking like `arn:aws:s3:::ao-demo-bucket/*`. Note the trailing “/*” in the resource.
  5. Click *Add Statement* again.

We’re almost there! You should have a policy that looks something like:

{
  "Id": "Policy1373378044308",
  "Statement": [
    {
      "Sid": "Stmt1234567890123",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::ao-demo-bucket",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/ausername"
        ]
      }
    },
    {
      "Sid": "Stmt1234567890124",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::ao-demo-bucket/*",
      "Principal": {
        "AWS": [
          "arn:aws:iam::123456789012:user/ausername"
        ]
      }
    }
  ]
}

Take that and paste it into the bucket policy editor back in the AWS console.
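
If you would rather script the policy than click through the generator, something like the following sketch could produce the same two-statement document. It assumes the bucket name and User ARN used in this post; `bucket_policy`, the `Id` value, and the `Sid` values are hypothetical names chosen for illustration, and the output still has to be pasted into the bucket policy editor by hand.

```ruby
require 'json'

# Hypothetical helper: build the two-statement policy shown above.
# One statement covers the bucket itself, the other covers its contents.
def bucket_policy(bucket, user_arn)
  resources = ["arn:aws:s3:::#{bucket}", "arn:aws:s3:::#{bucket}/*"]
  statements = resources.each_with_index.map do |resource, i|
    {
      "Sid"       => "Stmt#{i + 1}",          # each Sid is distinct
      "Action"    => "s3:*",
      "Effect"    => "Allow",
      "Resource"  => resource,
      "Principal" => { "AWS" => [user_arn] }
    }
  end
  { "Id" => "DemoPolicy", "Statement" => statements }
end

policy = bucket_policy("ao-demo-bucket", "arn:aws:iam::123456789012:user/ausername")
puts JSON.pretty_generate(policy)
```

Building the policy from the bucket name keeps the two resource ARNs in step, which is the part that is easiest to get wrong by hand.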

2. Upload Files & Distribute Links

Now, we need to get files up there. Take Tim Kay’s aws script and place it in your path somewhere.

Here’s the Rakefile I’ve been copying around to different projects. Just update the configuration section with your own authentication keys and bucket name (and the path to the `aws` script if necessary), and you’re set.

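require 'openssl'
require 'cgi'
require 'base64'
require 'shellwords' # Shellwords.escape, used by the rm/upload/publish tasks
require 'date'       # DateTime, used by the publish task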

namespace :s3 do
  ## CONFIGURATION ##############################################
  AWS_BIN = "aws"

  AWS_ACCESS_KEY_ID="MY_ACCESS_KEY_ID1234"
  AWS_SECRET_ACCESS_KEY="MY_SECRET_KEY123456789012345678901234567"

  # Links generated by the s3:urls task will expire in this many seconds.
  EXPIRATION_TIME = (3600*72) # 3 days

  BUCKET = "ao-demo-bucket"
  
  ## TASKS ######################################################

  ENDPOINT="https://s3.amazonaws.com/#{BUCKET}"

  def aws
    "env AWS_SECRET_ACCESS_KEY=#{AWS_SECRET_ACCESS_KEY} AWS_ACCESS_KEY_ID=#{AWS_ACCESS_KEY_ID} #{AWS_BIN} "
  end

  desc "List the contents of the bucket (long format)"
  task :ll do
    sh "#{aws} ls -l #{BUCKET}"
  end

  desc "List the contents of the bucket"
  task :ls do
    sh "#{aws} ls -1 #{BUCKET}"
  end

  desc "Remove a file"
  task :rm, :filename do |t, args|
    filename = args[:filename].to_s.strip
    raise ArgumentError, "filename argument is required" if filename == ""
    sh "#{aws} rm #{Shellwords.escape("#{BUCKET}/#{filename}")}"
  end

  desc "Upload a file"
  task :upload, :filename do |t, args|
    filename = args[:filename].to_s.strip
    raise ArgumentError, "filename argument is required" if filename == ""
    filename = File.expand_path(filename)
    sh "#{aws} put --progress #{Shellwords.escape("#{BUCKET}/#{File.basename(filename)}")} #{Shellwords.escape(filename)}"
  end

  desc "Upload an OVA to S3 with a date"
  task :publish, :filename do |t, args|
    filename = args[:filename].to_s.strip
    raise ArgumentError, "filename argument is required" if filename == ""
    filename = File.expand_path(filename)
    ts = DateTime.now.strftime("%Y-%m-%d-%H:%M:%S")

    # Attempt to add a timestamp to the filename before the extension. This doesn't attempt to handle corner cases
    dest = File.basename(filename.sub(%r|(\.[^.]+)$|, "-#{ts}\\1"))
    sh "#{aws} put --progress #{Shellwords.escape("#{BUCKET}/#{dest}")} #{Shellwords.escape(filename)}"
  end

  desc "Run an arbitrary aws script command"
  task :sh, :cmd do |t, args|
    sh "#{aws} #{args[:cmd]}"
  end

  desc "Generate signed, time-limited URLs for every file in the bucket"
  task :urls do
    expires = Time.now.to_i + EXPIRATION_TIME
    digest = OpenSSL::Digest.new('sha1')

    puts "[expiration]: [url]"

    lines = `#{aws} ls -1 #{BUCKET}`.lines.map(&:strip).reject(&:empty?)
    lines.each do |line|
      file = line.strip
      content = "GET\n\n\n#{expires}\n/#{BUCKET}/#{file}"
      signature = CGI.escape(Base64.encode64(OpenSSL::HMAC.digest(digest, AWS_SECRET_ACCESS_KEY, content)).chomp)
      puts "#{Time.at(expires)}: #{ENDPOINT}/#{file}?AWSAccessKeyId=#{AWS_ACCESS_KEY_ID}&Expires=#{expires}&Signature=#{signature}"
    end
  end
end

The `s3:urls` task signs each link with an HMAC-SHA1 signature computed from your secret key. This way, S3 knows the link comes from an authorized user (you), and the filename and expiration time cannot be changed without invalidating the signature. You can then distribute links easily and know that they will only be available for a limited time to people who have the link. It isn’t perfect security, but it’s very easy to use for both the person distributing the files and the person downloading them.
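
The signing step can also be pulled out of Rake and tried on its own. The sketch below follows the same query-string scheme the task uses (AWS's older Signature Version 2, which newer regions may not accept); `signed_url` is a hypothetical helper, and the key ID and secret are made-up placeholders. It uses `URI.encode_www_form_component` in place of `CGI.escape`; for Base64 characters the two produce the same escapes.

```ruby
require 'openssl'
require 'base64'
require 'uri'

# Hypothetical helper: sign a GET for /bucket/key that expires at `expires` (Unix time).
def signed_url(bucket, key, access_key_id, secret_key, expires)
  # Verb, empty Content-MD5, empty Content-Type, expiry, then the resource path.
  string_to_sign = "GET\n\n\n#{expires}\n/#{bucket}/#{key}"
  hmac = OpenSSL::HMAC.digest(OpenSSL::Digest.new('sha1'), secret_key, string_to_sign)
  signature = URI.encode_www_form_component(Base64.strict_encode64(hmac))
  "https://s3.amazonaws.com/#{bucket}/#{key}" \
    "?AWSAccessKeyId=#{access_key_id}&Expires=#{expires}&Signature=#{signature}"
end

secret  = "not-a-real-secret"   # placeholder, never a real key
expires = 1373635405

a = signed_url("ao-demo-bucket", "MyFile.zip", "EXAMPLEKEYID", secret, expires)
b = signed_url("ao-demo-bucket", "MyFile.zip", "EXAMPLEKEYID", secret, expires + 3600)

sig_a = a[/Signature=(.*)\z/, 1]
sig_b = b[/Signature=(.*)\z/, 1]

puts a
puts "Signature changes when Expires changes: #{sig_a != sig_b}"
```

Because the expiry and the path are both part of the signed string, editing either one in the URL produces a mismatch and S3 rejects the request.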

3. Tying it All Together

If everything’s configured right, you should be able to run some commands like these:

rake s3:ls                # List all the files that've been uploaded
rake s3:ll                # List all the files that've been uploaded like "ls -l"
rake s3:rm[filename]      # Remove a file from the bucket
rake s3:upload[filename]  # Upload a file
rake s3:publish[filename] # Upload a file and add a timestamp to the filename (myFile.zip -> myFile-2013-07-02-07:47:33.zip)
rake s3:urls              # Generate time-limited URLs to the contents of the bucket
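
The renaming that `s3:publish` performs can be sketched without a regular expression. This is an illustrative alternative rather than the Rakefile's own code; `timestamped_name` is a hypothetical helper that takes the time as a parameter so the result is reproducible.

```ruby
# Hypothetical helper: insert a timestamp between the base name and the extension.
def timestamped_name(path, time = Time.now)
  ext  = File.extname(path)         # ".zip" (empty string if there is no extension)
  base = File.basename(path, ext)   # "myFile"
  "#{base}-#{time.strftime('%Y-%m-%d-%H:%M:%S')}#{ext}"
end

puts timestamped_name("/tmp/myFile.zip", Time.utc(2013, 7, 2, 7, 47, 33))
# prints "myFile-2013-07-02-07:47:33.zip"
```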