We were recently faced with the problem of how to ship and support a complicated piece of server software. We needed the software to be installed on a customer’s existing infrastructure and were nervous about depending on them to have experts in house.
We decided to package it as a virtual machine “appliance” and to ship a fully configured Linux installation. This greatly simplified the process of installing at a customer’s site, but building such an appliance repeatably is not trivial.
One driving concern I had was making sure that we could pick the project up months later and build it without a hitch. I expected the application side of things to get a lot of attention and development effort, but I didn’t really expect the VM image to need to change very frequently (beyond dropping in a new application build).
In the past I’ve been frustrated by the process of getting an old project to build on modern systems. Sometimes we just can’t download the right version of some now-obscure dependency, or maybe our compiler is no longer compatible with the old version of some library. A deprecated feature in a dependency might force us to upgrade — but that can introduce a new incompatibility, and so on up the chain. This has gotten better over the years, with tools like [Bundler](http://gembundler.com) and [Maven](http://maven.apache.org) maturing, but it is not yet a completely solved problem.
To protect ourselves from such changes, we ended up with a multi-stage approach to building our software, so that if our platform changes we shouldn’t have to swap out too much of our process. The stages are:
1. Download All Dependencies
1. Build a Basebox
1. Configure the Machine
1. Validate the Installation
1. Package the Appliance as an OVA
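
As a rough illustration of why the staging helps, here is a minimal Python sketch of a stage runner. It is not the tooling we actually used; `run_pipeline` and the stage callables are hypothetical names. The idea it shows is that each stage only runs if the previous one succeeded, so any single stage can be swapped out without touching the others.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, action) pair; the action returns True on success.
# Both the type alias and the function below are hypothetical illustrations.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> Tuple[List[str], Optional[str]]:
    """Run stages in order, stopping at the first failure.

    Returns (names of completed stages, name of the failed stage or None).
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            # Later stages are skipped so they never see a broken input.
            return completed, name
        completed.append(name)
    return completed, None
```

A build script could pass in one callable per stage (download, basebox, configure, validate, package) and report the failed stage name to whoever kicked off the build.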
What we built is essentially a [Deployment Pipeline (per Martin Fowler’s very recent post)](http://martinfowler.com/bliki/DeploymentPipeline.html). Here’s what each stage looked like.
## 1. Download All Dependencies
First, we wanted to make sure we had all of our dependencies. This included:
* Downloading RPMs with dependencies (recursively)
* Cloning git repositories for things not packaged (like rbenv)
* Fetching rubygems with dependencies (recursively) and then indexing them as a rubygems source
* Downloading tarballs via HTTP
* Fetching ISOs via HTTP
I then served up this whole directory structure using a small [Sinatra](http://www.sinatrarb.com/) app running inside the [Thin](https://github.com/macournoyer/thin) web server. (We couldn’t use a shared VM directory since we didn’t want to ship a VM to a VMware customer with VirtualBox kernel modules installed.) Since we now have everything we need, we can just archive these files and use them when we do a build in the future. From this build stage on, we won’t need an internet connection to build the VM.
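
The same idea can be sketched without Sinatra or Thin. The following is a minimal Python stand-in using only the standard library, not the app we shipped; `serve_directory` is a hypothetical helper name, and the default bind address is an assumption chosen for local testing.

```python
import functools
import threading
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


def serve_directory(root, host="127.0.0.1", port=0):
    """Serve the files under `root` over HTTP from a background thread.

    port=0 asks the OS for any free port; read the real one from
    server.server_address[1]. Call server.shutdown() when the build is done.
    """
    # Bind the handler to `root` instead of the current working directory.
    handler = functools.partial(SimpleHTTPRequestHandler, directory=str(root))
    server = ThreadingHTTPServer((host, port), handler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

For a guest VM to reach the server, `host` would need to be an address the guest can route to (for example the host side of a host-only network) rather than the loopback default used here.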
## 2. Build a Basebox
We built our own basebox to install on. The biggest thing we needed to do ourselves was to build a box that didn’t have the virtualization platform’s extensions. We used [VeeWee](https://github.com/jedi4ever/veewee) to build a minimal install, and I just used a mostly-stock definition.
After VeeWee does the hard work, we export a package that can be used by [Vagrant](http://vagrantup.com).
## 3. Configure the Machine
The actual work of configuring the machine is done by [Capistrano](http://capistranorb.com/) and Chef Solo. Justin Kulesza has [written about this here](https://spin.atomicobject.com/2012/12/18/chef-solo-with-capistrano/). Chef Solo is a great tool for this because it doesn’t need a Chef server, but it still provides an automated way of configuring the machine and lets us use the substantial existing collection of Chef recipes already out there. It also means that the majority of our code is in Chef recipes, so anyone familiar with that (even if they’re not a developer) should be able to come in later and make updates.
## 4. Validate the Installation
This process has a lot of moving parts, and it certainly warrants tests, but it was not obvious at first how to do that. It was difficult because there are so many unrelated pieces: scripts that prompt users for configuration information, a J2EE container, cron jobs, backup scripts, etc.
In the end, I realized it’s a simple problem because Vagrant provides simple control over the VM. I just wrote unit tests in RSpec. They work by shelling out to Vagrant to boot up and roll back the machine, and by shelling out to SSH to manipulate it. I use net-ping to check that the system is listening on the right ports, and HTTP libraries let me check that the app is actually running.
This allowed me to easily test workflows like:
- Boot up the machine.
- Perform an initial configuration.
- POST license keys to our API.
- Back up the system.
- Roll back to a pristine system.
- Restore from backup and see that it looks like we expect (e.g. with license keys restored).
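
The building blocks behind workflows like these are small. Here is a minimal sketch in Python rather than the RSpec and net-ping we actually used; `vagrant_command`, `run_in_guest`, and `port_is_open` are hypothetical helper names. It assumes the `vagrant` CLI is on the PATH and relies only on `vagrant ssh -c`, which runs a single command in the guest.

```python
import socket
import subprocess


def vagrant_command(*args):
    """Build the argv for a Vagrant CLI call, e.g. vagrant_command("up")."""
    return ["vagrant", *args]


def run_in_guest(command, runner=subprocess.run):
    """Run a shell command inside the VM via `vagrant ssh -c`; return stdout.

    `runner` is injectable so the helper can be exercised without a real VM.
    """
    result = runner(
        vagrant_command("ssh", "-c", command),
        capture_output=True,
        text=True,
        check=True,  # raise if the command exits non-zero
    )
    return result.stdout


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Booting the machine would be `subprocess.run(vagrant_command("up"), check=True)`. Rolling back is another `vagrant_command(...)` call, but the exact subcommand depends on the Vagrant version and plugins in use, so it is left out of the sketch.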
Eventually I would definitely like to see this tested with [test-kitchen](https://github.com/opscode/test-kitchen), but per the warning currently on that site it is not really stable enough yet.
## 5. Package for Distribution
### Updating the Network Configuration
When Linux detects a network card, it creates an interface for it (eth0), and then saves that long-term. This way things won’t get shifted around (e.g. your USB ethernet adapter will always be eth2). However, this is absolutely not the behavior we want for our appliance: when the image is imported at the customer site, it’s going to get a new virtual ethernet card with a new MAC address, so we must not cache anything from our development environment.
The specifics of how to do this will differ among Linux distributions, but the broad overview is:
* Clear any cached udev devices (on my system these were in /etc/udev/rules.d/70-persistent-net.rules).
* Clear DHCP leases.
After doing this and shutting down the VM, we had a clean disk image ready to import anywhere.
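
A minimal sketch of that cleanup in Python follows. The udev rules path is the one from my system mentioned above; the DHCP lease directory and the `*.leases` pattern are assumptions based on a RHEL/CentOS-style `dhclient` layout and will differ on other distributions. `clear_network_state` is a hypothetical helper name.

```python
from pathlib import Path

# Relative to the filesystem root so the function can be pointed at a
# mounted image or a test directory instead of the live system.
UDEV_NET_RULES = "etc/udev/rules.d/70-persistent-net.rules"
# Assumption: dhclient keeps its lease files here (RHEL/CentOS style).
DHCP_LEASE_DIR = "var/lib/dhclient"


def clear_network_state(root="/"):
    """Delete cached NIC rules and DHCP leases under `root`.

    Returns the list of paths that were removed.
    """
    root = Path(root)
    removed = []

    rules = root / UDEV_NET_RULES
    if rules.exists():
        rules.unlink()
        removed.append(rules)

    lease_dir = root / DHCP_LEASE_DIR
    if lease_dir.is_dir():
        for lease in sorted(lease_dir.glob("*.leases")):
            lease.unlink()
            removed.append(lease)

    return removed
```

Run against the live system this needs root privileges, and it should be the last thing that happens before the VM is shut down, since a reboot would recreate the cached state.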
### Packaging an OVA
Mike English spent some time working out what the OVA virtual machine archive format should look like so we could write an automated process to export an image.
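
For a sense of what such an export involves, here is a minimal Python sketch; it is not the process Mike worked out. It relies on my understanding of the OVF specification: an OVA is a plain tar archive whose first member is the `.ovf` descriptor, followed by the files it references. `build_ova` and the example file names are hypothetical, and the sketch skips the optional manifest and certificate files.

```python
import tarfile
from pathlib import Path


def build_ova(ovf_path, other_paths, ova_path):
    """Write an OVA: a tar archive with the .ovf descriptor as first member.

    `other_paths` (disk images and so on) are appended in the order given.
    """
    ovf_path = Path(ovf_path)
    if ovf_path.suffix != ".ovf":
        raise ValueError("descriptor must be an .ovf file")

    # USTAR keeps the archive free of extended headers; note that this
    # format cannot hold individual files larger than 8 GiB.
    with tarfile.open(ova_path, "w", format=tarfile.USTAR_FORMAT) as archive:
        # Store members by bare file name, with no directory components.
        archive.add(ovf_path, arcname=ovf_path.name)
        for path in map(Path, other_paths):
            archive.add(path, arcname=path.name)
    return Path(ova_path)
```

A call might look like `build_ova("appliance.ovf", ["appliance-disk1.vmdk"], "appliance.ova")`, with the descriptor and disk image produced by whatever exported the VM.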
## An Appliance
It was a lot of work to develop this process, but I’m very happy with the results. I’m also quite confident that we’ve separated everything enough that we’ll be able to easily replace components. We’re not tied to VeeWee: anything that can get us a VirtualBox image can be integrated easily. We’re not even tied to Vagrant: as long as we can script VM import and export, we can swap in a different component.
This corner of the virtualization world is changing at an incredible pace. New releases of many of our tools are coming out seemingly daily. I fully expect that when we next need to update our build, much will have changed. I think we’re ready for it.
Is anyone else out there building a virtual appliance? What does your approach look like?
I’m curious, how did you handle the following:
1) Does the customer get root access to the virtual appliance? And if not, what privs are they given?
2) How did you handle updates to the application running on the appliance? (especially if there’s no net access)
2a) How did you track exceptions (especially if there’s no net access)?
In our case, it is assumed the customer is trusted IT staff. We hope that all our automated setup is safe and secure, but we do trust the user to not be malicious. So for #1 – yes they get root access.
For upgrades, our solution was “restore from backup”. This appliance isn’t running software that needs five-nines sort of reliability, so a few minutes downtime for the reboot is ok. We have a script that backs up our application data which runs nightly (or when manually invoked). So they just back up the data, throw away the old one, then restore using our tool. In this case, the tool is just doing app data and config so it’s fast, but it also was hand-rolled for exactly the set of things we care about.
We got a lot of leverage out of the fact that our audience is IT: for 2a the solution basically is to wire it into the customer’s existing monitoring. We didn’t have time to build a custom agent for all the solutions out there (ITM, OEM, Nagios, Zabbix et al.), so we just provided documentation on our expectations, log file locations, etc., and added some scripts to help them gather some system state information and logs to submit back to us if they need to contact support.
easy to read and understand.. Thumbs up
Mike it looks like you developed a really good approach to this, and I thank you for the blog post.
We built a software suite specifically to automate everything you mentioned above. It goes a few steps further, because we integrate our open source REST admin API and UI, so once your customers have the OVA they can easily manage the updates and other aspects of the appliance. Just google ‘Jidoteki’ and you’ll find a link to our site.
Mike, this is explained very well. I am starting my project in few weeks.
This article will help me for sure.
I wish there was a video tutorial for this