Managing Amazon S3 files in Python with Boto

Amazon S3 (Simple Storage Service) allows users to store and retrieve content (e.g., files) from storage entities called “S3 Buckets” in the cloud with ease for a relatively small cost. A variety of software applications make use of this service.

I recently found myself in a situation where I wanted to automate pulling and parsing some content that was stored in an S3 bucket. After some searching, I found Boto, a Python interface to Amazon Web Services. Boto offers an API for the entire Amazon Web Services family (in addition to the S3 support I was interested in).
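As a rough illustration of the "pull and parse" workflow described above, here is a minimal sketch using Boto's S3 calls. It assumes Boto is installed, that AWS credentials are available through the environment or a Boto config file, and that the stored content happens to be CSV. The function names and the CSV format are assumptions made for illustration, not details from the original post.

```python
import csv
import io


def parse_csv_text(text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    return list(csv.reader(io.StringIO(text)))


def fetch_s3_text(bucket_name, key_name):
    """Download one S3 object with Boto and return its contents as text."""
    # Imported here so the parsing helper above works even without Boto installed.
    import boto

    # Credentials are assumed to come from the environment or a Boto config file.
    conn = boto.connect_s3()
    bucket = conn.get_bucket(bucket_name)
    key = bucket.get_key(key_name)  # returns None when the key does not exist
    if key is None:
        raise KeyError("no such key: {}".format(key_name))
    data = key.get_contents_as_string()
    # Depending on the Python version, Boto may hand back bytes or str.
    return data.decode("utf-8") if isinstance(data, bytes) else data


def fetch_and_parse(bucket_name, key_name):
    """Pull an object from S3 and parse it as CSV."""
    return parse_csv_text(fetch_s3_text(bucket_name, key_name))
```

Calling `fetch_and_parse("my-bucket", "exports/data.csv")` with a bucket and key of your own (both names here are placeholders) would return the parsed rows. Keeping the download and the parsing in separate functions makes the parsing step easy to test without network access.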

Installing Boto

Boto can be installed via the Python package manager pip. If you don’t already have pip installed, here are the directions.

Read more on Managing Amazon S3 files in Python with Boto…

Making AWS More Affordable with EC2 Scheduling

Amazon Web Services (AWS) provides an amazingly flexible platform for running and managing virtual machines in their Elastic Compute Cloud (EC2). With EC2, it is almost effortless to spin up clusters of dozens to hundreds of nodes. This allows for incredible flexibility in setting up various environments, for purposes such as development, testing, and production.

Of course, all of this comes with the hourly cost of running EC2 instances. Some EC2 instances, such as Windows instances, cost a substantial amount per month. While it probably won’t break the bank, it certainly factors into the decision of how many nodes and environments can be spun up and kept active.

The prevailing attitude often seems to be that EC2 instances must be kept running 24/7. This seems to ignore one of the great attractions of EC2 (and other AWS services): you only pay for the resources that you consume. Keeping an instance running 24/7 when it isn’t actually being utilized consumes unnecessary resources. Turning off instances when they won’t be utilized eliminates this waste and reduces cost. Fortunately, EC2 has a powerful set of tools that makes it very easy to configure schedules for turning instances on and off, and for re-assigning static IP addresses.
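To make the savings concrete, here is a small sketch of the arithmetic behind a schedule. The weekday 08:00 to 18:00 window is a hypothetical schedule chosen for illustration, and the helper names are not part of any AWS tool. The actual starting and stopping would be done with the EC2 tooling the post goes on to describe.

```python
from datetime import datetime

HOURS_PER_WEEK = 24 * 7  # an instance left on 24/7 is billed for all of these


def should_be_running(now, start_hour=8, stop_hour=18):
    """Return True if `now` falls inside the (assumed) weekday working window."""
    is_weekday = now.weekday() < 5  # Monday is 0, Sunday is 6
    return is_weekday and start_hour <= now.hour < stop_hour


def scheduled_hours_per_week(start_hour=8, stop_hour=18, days=5):
    """Hours per week the instance is up under the schedule."""
    return (stop_hour - start_hour) * days


def fraction_saved(start_hour=8, stop_hour=18, days=5):
    """Share of instance hours avoided compared with running 24/7."""
    return 1 - scheduled_hours_per_week(start_hour, stop_hour, days) / HOURS_PER_WEEK
```

Under this assumed schedule the instance runs 50 of the 168 hours in a week, so roughly 70% of the instance hours are avoided. A scheduled job could call `should_be_running(datetime.now())` to decide whether an instance ought to be started or stopped.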

Read more on Making AWS More Affordable with EC2 Scheduling…

Deploy to AWS S3 and CloudFront with Rake

Even in this age of web applications and dynamic websites, sometimes it is still helpful (or necessary) to host static HTML websites. Sometimes these are just one-off pages; other times they are full websites. While the traditional route involves self-hosting or shared hosting, that is no longer necessary. You can quickly and easily host a site using Amazon’s S3 and CloudFront, and you can easily deploy with Rake and the help of a few Ruby gems.

I recently wrote a set of Rake tasks to help me deploy a static site to S3 with CloudFront support. While fairly simple, the setup may prove useful to people looking for an automated way to deploy HTML files, such as those generated with Middleman or some other static site generator.

Read more on Deploy to AWS S3 and CloudFront with Rake…

Distributing Large Files Easily with Amazon S3

I’ve found it very useful to use Amazon’s S3 to distribute large files to customers, not just as an application back end but as a simple way of distributing and archiving data. This is not at all surprising or innovative, as S3 is for serving files, but it didn’t seem obvious to me how to set up a simple file distribution area.

Eventually, I boiled it down to:

$ rake s3:upload[]
$ rake s3:urls

The first line uploads a file, and the second line will produce a signed link to that file that anyone can use for a limited time (until the Expires time). This was finally a workflow I could use.
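For readers curious about what a signed, expiring link involves, here is a minimal Python sketch. It assumes the older S3 query-string authentication scheme (Signature Version 2), in which the link carries an `Expires` timestamp and an HMAC-SHA1 signature; newer AWS regions require Signature Version 4 instead. The function name and all values are hypothetical, and this is not the Rake task from the post.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def signed_s3_url(bucket, key, access_key, secret_key, expires_at):
    """Build a time-limited GET URL using S3 query-string auth (Signature V2).

    `expires_at` is a Unix timestamp (seconds) after which the link stops working.
    """
    path = quote(key)  # percent-encode the key, keeping "/" separators
    resource = "/{}/{}".format(bucket, path)
    # Verb, empty Content-MD5, empty Content-Type, expiry, then the resource.
    string_to_sign = "GET\n\n\n{}\n{}".format(expires_at, resource)
    digest = hmac.new(
        secret_key.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    signature = base64.b64encode(digest).decode("ascii")
    return "https://{}.s3.amazonaws.com/{}?AWSAccessKeyId={}&Expires={}&Signature={}".format(
        bucket,
        path,
        quote(access_key, safe=""),
        expires_at,
        quote(signature, safe=""),
    )
```

For example, `signed_s3_url("example-bucket", "reports/big file.zip", access_key, secret_key, int(time.time()) + 3600)` would produce a link valid for about an hour. Anyone holding the link can download the file until then, without needing AWS credentials of their own.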

Read more to see a complete example of how to do it.

Read more on Distributing Large Files Easily with Amazon S3…

Using Vagrant AWS with Capistrano

Vagrant 1.1 was recently released, adding support for virtualization providers other than VirtualBox. Among the providers now available is one for AWS. In switching my Vagrant workflow from VirtualBox to AWS, I ran into a problem, and in solving it I discovered a better way to integrate Vagrant with Capistrano.

1. Vagrant Setup

Vagrant 1.1 was released recently. This release adds support for provider plugins, including a new, freely available provider for AWS. Rather than using VirtualBox on your local machine as the virtualization provider, you can now provision Vagrant-managed VMs in the cloud. This makes it much easier to try out things that require more resources, such as multi-VM environments and VMs requiring lots of RAM.

Read more on Using Vagrant AWS with Capistrano…

Upload Files Directly to S3 with Plupload, Rails, and Paperclip

Plupload is an open source JavaScript upload handler that supports uploading files directly to Amazon S3. This is an alternative to uploading files to the web server, and then to S3. You will need to use the Flash or Silverlight options to upload directly to S3 because Amazon has yet to enable cross-origin uploading.

Read more on Upload Files Directly to S3 with Plupload, Rails, and Paperclip…

Website Video using Amazon S3, Amazon CloudFront, & JWPlayer

Providing video on the web has been done. It should be simple to get it up and running, right? Just a Google search away? Well, one Google search will get hundreds of results, all giving different methods, technologies and solutions. The following is a road-map of my path to enlightenment and the eventual solution I employed.

I looked at several commercial hosting solutions including Brightcove, Viddler and Bits on the Run, only to discover that they were expensive to use for a start-up commercial website with a small video library. I also looked at Vimeo, but it has a “non-commercial purposes only” clause in its terms of service. Doing what any developer with time on his hands would do, I ventured out to do it myself.

Read more on Website Video using Amazon S3, Amazon CloudFront, & JWPlayer…