Managing Amazon S3 files in Python with Boto

Amazon S3 (Simple Storage Service) lets users store and retrieve content (e.g., files) in cloud storage containers called “S3 buckets” at a relatively low cost. A variety of software applications make use of this service.

I recently found myself in a situation where I wanted to automate pulling and parsing some content that was stored in an S3 bucket. After some searching, I found Boto, an Amazon Web Services library for Python. Boto offers an API for the entire Amazon Web Services family (in addition to the S3 support I was interested in).

Installing Boto

Boto can be installed via the Python package manager pip. If you don’t already have pip installed, here are the directions. Once you have pip, boto can be installed via:

pip install boto

Listing Files in an S3 Bucket

Using boto is rather simple. The documentation is great, and there are plenty of examples available on the web. I was specifically interested in the S3 functionality. You can connect to an S3 bucket and list all of the files in it via:

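from boto.s3.connection import S3Connection

# Placeholder credentials; substitute your own.
AWS_KEY = 'MY_KEY'
AWS_SECRET = 'MY_SECRET'

# Connect to S3 and look up the bucket by name.
aws_connection = S3Connection(AWS_KEY, AWS_SECRET)
bucket = aws_connection.get_bucket('bucketname')

# Print the name of every key (object) in the bucket.
for file_key in bucket.list():
    print(file_key.name)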

bucket.list() returns a BucketListResultSet that can be iterated to obtain a list of keys contained in a bucket. A key represents some object (e.g., a file) inside of a bucket.
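
As a small sketch of how this iteration might be wrapped, the helper below collects the key names into a plain list. The function name list_file_names is hypothetical (it is not part of boto), and it accepts any object with a boto-style list(prefix=...) method, so it can be tried without a live connection. The prefix argument is passed straight through to bucket.list(), which restricts the results to keys whose names start with that string.

```python
def list_file_names(bucket, prefix=''):
    """Return the names of all keys in `bucket` as a list of strings.

    `bucket` is assumed to be a boto Bucket (for example, the result of
    S3Connection.get_bucket) or any object with a compatible list() method.
    """
    # bucket.list() yields keys lazily; the comprehension materializes
    # their names into an ordinary list.
    return [key.name for key in bucket.list(prefix=prefix)]
```

Given a bucket obtained from get_bucket, list_file_names(bucket, prefix='logs/') would return only the names that begin with logs/.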

Downloading and Deleting from a Bucket

I was interested in programmatically managing files (e.g., downloading and deleting them). Both of these tasks are simple using boto. Given a key from some bucket, you can download the object that the key represents via:

key.get_contents_to_filename(local_download_destination)

You can also delete an object given a bucket and key via:

bucket.delete_key(key)

In addition to download and delete, boto offers several other useful S3 operations such as uploading new files, creating new buckets, deleting buckets, etc. Given these primitives, you can automate virtually anything.
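
The download and delete calls above can be combined into a single "fetch, then remove" step. The following is a minimal sketch under a few assumptions: the helper name download_then_delete is hypothetical, the local file is named after the last path component of the key, and the object is deleted by name with bucket.delete_key(key.name). Keys whose names end in a slash are not handled.

```python
import os


def download_then_delete(bucket, key, local_directory):
    """Download `key` from `bucket` into `local_directory`, then delete it.

    Returns the local path the object was written to. Assumes `bucket` and
    `key` are boto Bucket and Key objects (or compatible stand-ins).
    """
    # Use only the last path component so a key such as 'logs/a.txt'
    # is written directly into local_directory as 'a.txt'.
    local_path = os.path.join(local_directory, os.path.basename(key.name))
    key.get_contents_to_filename(local_path)
    # The delete runs only if the download call returned without raising.
    bucket.delete_key(key.name)
    return local_path
```

Deleting only after the download has returned is a deliberate choice here: if the download raises, the object stays in the bucket and the operation can be retried.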

Extending Boto

To help simplify what I was working on, I wrote a thin wrapper around boto called S3.FMA that exposes the higher-level file operations that I was interested in. You can find it here. It hides the lower-level details such as S3 keys, and allows you to operate on files you have stored in an S3 bucket by bucket name and file name.

The usage model for S3.FMA looks like this:

from S3FMA import *

AWS_KEY = 'my key'
AWS_SECRET = 'my secret'
s3FileManager = S3FileManager(AWS_KEY, AWS_SECRET, use_ssl=True)

The S3FileManager class exposes the following API:

# returns a list of files stored in bucket 'bucket_name'
getFileNamesInBucket(bucket_name)
 
# download a file named 'filename' from bucket 'bucket_name' to 'local_download_directory'
downloadFileFromBucket(bucket_name, filename, local_download_directory)
 
# download all of the files in bucket 'bucket_name' to 'local_download_directory'
downloadAllFilesFromBucket(bucket_name, local_download_directory)
 
# delete all files in bucket 'bucket_name'
deleteAllFilesFromBucket(bucket_name)
 
# download files with names that satisfy 'filename_predicate' from 'bucket_name' to 'local_download_directory'
downloadFilesInBucketWithPredicate(bucket_name, filename_predicate, local_download_directory)
 
# delete files with names that satisfy 'filename_predicate' from 'bucket_name'
deleteFilesInBucketWithPredicate(bucket_name, filename_predicate)
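
To illustrate what a filename predicate could look like, here is a rough sketch of predicate-based download and delete built directly on the boto calls shown earlier. This is not the S3.FMA implementation: the function names download_matching and delete_matching and the example predicate is_csv are all hypothetical, and the predicate is assumed to be any callable that takes a key name and returns True or False.

```python
import os


def download_matching(bucket, filename_predicate, local_directory):
    """Download every key whose name satisfies filename_predicate.

    Returns the list of local paths that were written.
    """
    downloaded = []
    for key in bucket.list():
        if filename_predicate(key.name):
            # Name the local file after the last component of the key name.
            local_path = os.path.join(local_directory,
                                      os.path.basename(key.name))
            key.get_contents_to_filename(local_path)
            downloaded.append(local_path)
    return downloaded


def delete_matching(bucket, filename_predicate):
    """Delete every key whose name satisfies filename_predicate.

    Returns the list of key names that were deleted.
    """
    # Collect the names first so deletion does not interleave with listing.
    names = [key.name for key in bucket.list() if filename_predicate(key.name)]
    for name in names:
        bucket.delete_key(name)
    return names


def is_csv(name):
    """Example predicate: True for names ending in '.csv'."""
    return name.endswith('.csv')
```

With a boto bucket in hand, download_matching(bucket, is_csv, local_download_directory) would fetch only the .csv files, and delete_matching(bucket, is_csv) would remove them.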