Test-Driven Infrastructure (TDI)

Atomic really has a passion for writing high-quality code and for testing. While our internal server infrastructure has often been maintained in a semi-automated fashion, it has traditionally lagged far behind our development practices in code quality, testing, and continuous integration.

Over the past year, however, Mike English and I have been working to revamp much of our server infrastructure using the Chef configuration management tool. Our goal is to build a Test-Driven Infrastructure (TDI), in which we first write tests that model and validate the code we then write to configure and manage our servers and applications.

Why Test Infrastructure?

Traditionally, much infrastructure was managed manually or with one-off scripts and programs. This inconsistency, and the lack of readily verifiable behavior, caused many problems. In software development, test-driven development (TDD) is widely recognized for increasing code quality, improving overall design, and allowing safe refactoring throughout a project.

Similar benefits can be realized in infrastructure projects when infrastructure is treated as code and that code is driven by tests. Configuration management tools such as Chef make it easy to describe infrastructure as code and provide facilities for testing that code. Infrastructure teams can then create and manage resources consistently, quickly, and confidently.

Principles of TDI

Paul Duvall of Stelligent has written about a five-step process for automating environments for continuous delivery: document, test, script, version, and continuous. These five steps can serve as the basic principles of Test-Driven Infrastructure.

Each principle is described in more detail below:

  • Document: Identify and record the high-level details of the system, and what needs to be accomplished.
  • Test: Write tests that describe the desired behavior or outcome.
  • Script: Write code to implement the processes that will produce the behavior or outcome.
  • Version: Use version control to track changes and make collaboration easier.
  • Continuous: Whenever changes are made, test everything. Be sure that the system as a whole still behaves properly.

The exact order of these principles is open to debate (perhaps version control should come first?), but each of them is key to implementing a test-driven infrastructure.
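
The Continuous principle boils down to a fail-fast loop: on every change, run each level of tests in order and stop at the first failure. Below is a minimal Ruby sketch of that loop, assuming the unit tests are run with rspec (as ChefSpec tests are) and the integration and system tests with Test-Kitchen's kitchen test command. The Stage struct and run_pipeline helper are hypothetical, not part of any CI tool.

```ruby
# Hypothetical fail-fast test pipeline for the "Continuous" principle.
# Stage names and commands are illustrative assumptions.
Stage = Struct.new(:name, :command)

# Runs each stage's command in order and stops at the first failure.
# Returns the name of the failing stage, or nil if every stage passed.
def run_pipeline(stages)
  stages.each do |stage|
    puts "==> #{stage.name}: #{stage.command.join(' ')}"
    # system returns true for exit status 0, false for a non-zero status,
    # and nil if the command could not be run at all
    ok = system(*stage.command)
    return stage.name unless ok
  end
  nil
end
```

A CI job could call something like run_pipeline([Stage.new('unit', %w[rspec]), Stage.new('kitchen', %w[kitchen test])]) and fail the build whenever it returns a stage name. Putting the cheap unit tests first means most mistakes are caught before a virtual machine is ever booted.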

What Does TDI Look Like?

Much of TDI closely resembles TDD. The main differences lie in the tools and environments in which the process is carried out, and they mostly affect the test and script steps; the document, version, and continuous steps are nearly identical. Below is a brief peek at what TDI code might look like.

The Tests

Tests are written at the unit, integration, and system levels. They are generally written incrementally, each one before the code it describes is written or executed. That back-and-forth workflow is difficult to convey in writing, so for simplicity I’ve included a few tests first and will then show the implementing code.

Unit Tests

TDI unit tests describe the actions that will be taken against specific system resources (files, services, users, and so on) without actually applying them to a machine, so they run quickly.

For example, with Chef, a set of ChefSpec unit tests might look like:

require 'chefspec'

describe 'chef_testing_example::default' do

  ...

  it 'installs apache2' do
    expect(chef_run).to install_package('apache2')
  end

  it 'enables mod_rewrite' do
    expect(chef_run).to run_execute('a2enmod rewrite')
  end

  it 'creates the default_example site from template' do
    expect(chef_run).to create_template('/etc/apache2/sites-available/default_example.conf').with(
      source: 'default_example.conf.erb',
      variables: {
        hostname: 'www.example.com'
      }
    )
  end

  ...
  
end
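
Tests like these only inspect what a recipe declares, which is why they are fast and safe to run anywhere. The toy below imitates that idea in plain Ruby so it runs without Chef installed: a recorder collects declared resources, and a test asks whether a given action was declared. ResourceRecorder, declared? and example_recipe are hypothetical names; ChefSpec itself performs an in-memory Chef run that records each resource's actions instead of executing them.

```ruby
# Toy illustration only: a "recipe" declares resources into a recorder, and
# tests inspect the declarations. Nothing is installed or written to disk.
# ResourceRecorder is hypothetical and much simpler than Chef or ChefSpec.
class ResourceRecorder
  Resource = Struct.new(:type, :name, :action, :properties)

  attr_reader :resources

  def initialize
    @resources = []
  end

  def package(name)
    record(:package, name, :install)
  end

  def execute(command)
    record(:execute, command, :run)
  end

  def template(path, properties = {})
    record(:template, path, :create, properties)
  end

  # True if a matching resource/action was declared with the expected properties.
  def declared?(type, name, action, **expected)
    resources.any? do |r|
      r.type == type && r.name == name && r.action == action &&
        expected.all? { |key, value| r.properties[key] == value }
    end
  end

  private

  def record(type, name, action, properties = {})
    @resources << Resource.new(type, name, action, properties)
    nil
  end
end

# Hypothetical recipe written against the toy DSL; it mirrors the shape of
# the Chef recipe in this post, with the hostname passed in explicitly.
def example_recipe(run, hostname)
  run.package 'apache2'
  run.execute 'a2enmod rewrite'
  run.template '/etc/apache2/sites-available/default_example.conf',
               source: 'default_example.conf.erb',
               variables: { hostname: hostname }
end
```

After example_recipe(run, 'www.example.com'), a check such as run.declared?(:package, 'apache2', :install) plays the role of the "installs apache2" example, while a template declared with a different hostname would fail the template check.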

Integration Tests

TDI integration tests combine the various modules and resources described in code to ensure that the whole system converges without errors.

For Chef, this is often done with a test harness such as Test-Kitchen. The Test-Kitchen configuration file (.kitchen.yml) specifies the components being integrated: the driver that creates the machine, the provisioner, the target platforms, and the run list for each test suite:

---
driver:
  name: vagrant

provisioner:
  name: chef_solo

platforms:
  - name: ubuntu-14.04

suites:
  - name: default
    run_list:
      - recipe[apt]
      - recipe[build-essential]
      - recipe[chef_testing_example::default]
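
Test-Kitchen creates one instance for every suite and platform pair in this file and converges each instance with its suite's run list. To make that structure concrete, here is a hedged, standard-library-only Ruby sketch that loads the same configuration and lists the combinations it implies. kitchen_matrix is a hypothetical helper, and the labels it prints are not Test-Kitchen's own instance names.

```ruby
require 'yaml'

# Hypothetical helper: expand a Test-Kitchen style config into the
# suite x platform combinations, each of which would be converged.
def kitchen_matrix(yaml_text)
  config = YAML.safe_load(yaml_text)
  config.fetch('suites').flat_map do |suite|
    config.fetch('platforms').map do |platform|
      {
        label: "#{suite['name']} on #{platform['name']}",
        run_list: suite.fetch('run_list', [])
      }
    end
  end
end

# The configuration shown above, embedded as a string.
KITCHEN_CONFIG = <<~CONFIG
  ---
  driver:
    name: vagrant

  provisioner:
    name: chef_solo

  platforms:
    - name: ubuntu-14.04

  suites:
    - name: default
      run_list:
        - recipe[apt]
        - recipe[build-essential]
        - recipe[chef_testing_example::default]
CONFIG

kitchen_matrix(KITCHEN_CONFIG).each do |instance|
  puts "#{instance[:label]}: #{instance[:run_list].join(', ')}"
end
```

With one suite and one platform this prints a single line, default on ubuntu-14.04: recipe[apt], recipe[build-essential], recipe[chef_testing_example::default]; adding a second platform would double the number of instances to converge.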

System Tests

System tests for TDI describe the outcome of the convergence process, verifying that the system actually behaves as intended.

For Chef, there are several tools for this, but Serverspec is often a convenient way to model the expected result:

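require 'serverspec'

# Run the checks directly on the converged instance
set :backend, :exec

describe 'Chef Testing Example Server' do

  ...

  describe port(80) do
    it { should be_listening }
  end

  # Ask the local Apache for the www.example.com virtual host, rather than
  # resolving www.example.com through public DNS
  describe command("curl -q -I -H 'Host: www.example.com' http://localhost") do
    its(:stdout) { should match '302 Found' }
    its(:stdout) { should match 'https://docs.getchef.com/kitchen.html' }
  end

  ...

end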
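
For readers without Serverspec at hand, roughly the same two checks can be expressed with Ruby's standard library. This is a hedged sketch, not how Serverspec works: be_listening inspects the host's listening sockets, whereas the hypothetical port_open? below simply tries to connect, and the hypothetical redirect_to sends a HEAD request with an explicit Host header. The host, port, and expected URL come from the example above.

```ruby
require 'net/http'
require 'socket'

# Hypothetical connect probe: true if a TCP connection to host:port succeeds.
# (Serverspec's be_listening inspects listening sockets on the host instead.)
def port_open?(host, port, timeout: 2)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  false
end

# Hypothetical redirect check: send HEAD / to address:port with an explicit
# Host header (selecting the name-based virtual host) and return
# [status_code, location]. Net::HTTP does not follow redirects, so a 302
# comes back as-is.
def redirect_to(host_header, address: '127.0.0.1', port: 80)
  Net::HTTP.start(address, port, open_timeout: 2, read_timeout: 2) do |http|
    response = http.head('/', 'Host' => host_header)
    [response.code, response['location']]
  end
end
```

On a converged test machine you would expect port_open?('127.0.0.1', 80) to be true and redirect_to('www.example.com') to return a "302" status with the docs.getchef.com Location, matching the Serverspec expectations.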

The Scripts

Once tests describe the actions to be taken against system resources, the scripts that actually carry out those actions can be written. Continuing with the tests above, an implementation as a Chef recipe might look like:

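# Install Apache from the distribution package
package 'apache2'

# Enable Apache at boot and start it now
service 'apache2' do
  action [:enable, :start]
end

# Enable mod_rewrite; the guard keeps repeated Chef runs idempotent
execute 'a2enmod rewrite' do
  not_if 'test -f /etc/apache2/mods-enabled/rewrite.load'
  notifies :restart, 'service[apache2]'
end

# Render the virtual host from the cookbook's ERB template
template '/etc/apache2/sites-available/default_example.conf' do
  source 'default_example.conf.erb'
  variables(
    hostname: node['chef_testing_example']['hostname']
  )
  notifies :reload, 'service[apache2]'
end

# Enable the site so Apache actually serves it
execute 'a2ensite default_example' do
  not_if { ::File.symlink?('/etc/apache2/sites-enabled/default_example.conf') }
  notifies :reload, 'service[apache2]'
end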
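
The not_if guard is what makes the execute resource idempotent: Chef skips the command when the guard command exits successfully, so a second Chef run changes nothing. The sketch below shows the same check-then-act pattern in plain Ruby, roughly simulating what a2enmod does by symlinking a module file from mods-available into mods-enabled. The enable_module helper and the temporary directory layout are assumptions made so the example runs anywhere.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for `a2enmod <mod>` plus its not_if guard: link
# mods-available/<mod>.load into mods-enabled unless it is already enabled.
# Returns true if it changed anything, false if already in the desired state.
def enable_module(apache_root, mod)
  enabled = File.join(apache_root, 'mods-enabled', "#{mod}.load")
  # Guard, like: not_if 'test -f /etc/apache2/mods-enabled/rewrite.load'
  return false if File.exist?(enabled)

  available = File.join(apache_root, 'mods-available', "#{mod}.load")
  raise ArgumentError, "no such module: #{mod}" unless File.exist?(available)

  FileUtils.mkdir_p(File.dirname(enabled))
  # Relative link, resolving to <apache_root>/mods-available/<mod>.load
  File.symlink(File.join('..', 'mods-available', "#{mod}.load"), enabled)
  true
end
```

Calling enable_module twice against the same directory mirrors two Chef runs: the first call creates the link and returns true, the second finds it in place and returns false.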

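The recipe belongs to the chef_testing_example cookbook, which can then be applied to a particular node through the node's run list (here via roles), along with the hostname attribute that the template reads: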

{
  "name": "app-server.example.com",
  "chef_environment": "_default",
  "run_list": [
    "role[apt]",
    "role[build-essential]",
    "role[chef_testing_example]",
    "role[mysql]"
  ],
  "chef_testing_example": {
    "hostname": "www.example.com"
  }
}
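
Since the template reads node['chef_testing_example']['hostname'], a cheap check in the same test-first spirit is to parse a node definition before uploading it and confirm that the attribute is set. The following is a standard-library-only sketch; validate_node is a hypothetical helper, and the checks it performs are an illustrative choice rather than anything Chef requires.

```ruby
require 'json'

# Hypothetical pre-flight check for a node definition like the one above.
# Returns a list of problems; an empty list means every check passed.
def validate_node(json_text)
  node = JSON.parse(json_text)
rescue JSON::ParserError => e
  ["invalid JSON: #{e.message}"]
else
  return ['top level must be a JSON object'] unless node.is_a?(Hash)

  problems = []
  problems << 'missing name' unless node['name'].is_a?(String)
  problems << 'empty run_list' if Array(node['run_list']).empty?
  # The template resource reads node['chef_testing_example']['hostname']
  hostname = node.dig('chef_testing_example', 'hostname')
  problems << 'chef_testing_example.hostname is not set' if hostname.to_s.empty?
  problems
end
```

Running this against each node file in CI catches syntax errors and a missing hostname before Chef ever tries to render the template.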

The Result

The example is very simple, but it illustrates the test and script components of the TDI process. For a task as simple as installing Apache, enabling a module, and placing a configuration file, testing may not seem particularly valuable. However, when recipes become more complex, are combined in a cookbook with dozens of others, and are then applied to entire clusters of nodes, the tests matter far more.

Testing gives the infrastructure team confidence that configuration changes are applied as intended across many different systems and have the intended effect. The whole system can then be trusted, and updated without fear of introducing breaking changes or otherwise causing instability.

I’ve written a fully working example using a TDI approach (from which the code samples above were drawn) that contains the individual components of a working system (including continuous integration). You can check it out on GitHub (github.com/kuleszaj/cheftestingexample) and Travis CI (travis-ci.org/kuleszaj/cheftestingexample).