Capistrano and Chef Solo for Configuration Management

Tddium is designed to address two problems: tests that take too long to run and test infrastructure that is too painful to manage.  In both cases, the underlying goal is to save developers time while helping them to improve the quality of the code they deliver.  Parallelism in the cloud is part of the story: it is the key ingredient for making tests run faster.  But careful configuration and management of Tddium is the key to keeping it all running.  After all, more parallelism means more computers to configure!

It has been a while, but in a past life I administered a collection of Digital Unix, FreeBSD, and Linux servers and desktops in a research environment.  Although there has been some progress in the last ten years, system administration remains more dark art than science, especially as individual machine configurations diverge over time.  Here at Solano Labs, we were determined not to become mired in the morass of configuration files, one-off shell scripts, and half-remembered history that has often characterized Unix system administration.  “No Futz”, or more recently “DevOps”, are the phrases of the day.

Two advantages Tddium enjoys over the research lab environment are that most of the machines can have virtually identical configurations and that we don’t have graduate students with the root password. In our environment, two tools that will be familiar to many readers have already proved indispensable: Capistrano and Chef.  Capistrano is a Ruby gem designed to help automate Rails deployments, although it is flexible enough to be useful in other settings, too.  Chef is an “infrastructure automation” tool from Opscode that allows system administrators to write declarative configurations for machines and then apply those configurations in an automated fashion.  A full-blown Chef installation includes a central repository that handles access control and distributes “recipes” to clients.

We’ve taken a hybrid approach to building virtual machine images: we use a small snippet of Capistrano to build an up-to-date base image from a widely used Linux distribution.  The Capistrano scripts are responsible for upgrading packages, installing a few essential packages not in the base distribution, and bootstrapping Ruby and RVM (the topic of a future post).  We considered using Capistrano for the full build and management process, but it is not as well suited to this task as Chef. Capistrano is more akin to Make, but with provisions for automatically managing multiple SSH connections.  The result is that configuration management with Capistrano requires a lot of explicit scripting.  That is a fine approach for deploying Rails apps, but something we explicitly wanted to avoid for building our infrastructure.  Chef, on the other hand, allows us to describe what a machine should look like using a declarative approach.  Of course, behind each recipe in the configuration there is some non-trivial programming, but the configuration and its implementation are neatly decoupled.

The documentation for both Capistrano and Chef could be more accessible to the neophyte, but there is a community supporting both tools and a little digging around on Google will get you a long way. Chef in particular would benefit from better error messages when something goes wrong deep inside a recipe written in its Domain Specific Language (DSL); the stack traces don’t make the site of the error readily discernible.  We’re also avoiding the central Chef server of a typical Chef deployment in favor of Chef Solo, which reads configurations and recipes from the local file system.  This approach is not as well documented but fits well with our current infrastructure deployment scheme.  In the next few paragraphs we’ll give the flavor of our Chef Solo deployment to help get you started if you aren’t already familiar with the tools.  An overview of Chef terminology can be found here.

A Chef Solo installation consists of an installed copy of the Ruby interpreter, the Chef gem, a JSON file describing the configuration you want for the current host, the Chef cookbooks required by the configuration, and a Chef configuration file that tells Chef where to find the cookbooks.  In our case, these prerequisites are installed using Capistrano, so once an image is booted, Chef is ready to go.

A very simple configuration file that tells Chef Solo where to find its cookbooks might look like this:

[sourcecode language="ruby"]
def find_cookbook_paths
  [ENV["HOME"] + "/etc/chef/cookbooks",
   ENV["HOME"] + "/etc/chef/site-cookbooks"]
end

log_level       :debug
cookbook_path   find_cookbook_paths
file_cache_path ENV["HOME"] + "/tmp/chef-cache"
role_path       ENV["HOME"] + "/tmp/chef-roles"
[/sourcecode]

A configuration, written in JSON, consists of a sequence of declarations.  These declarations might assign the machine a host name, specify which name servers to use in resolv.conf, describe a user account to create, or specify the version of a tool chain such as Java JRE or Ruby to install.  At the end of the configuration is the “run_list” which specifies a sequence of recipes to execute.  These recipes use the metadata in the declarations to configure the machine.  We show a slightly abstracted example of a real JSON configuration at the end of this post.
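To make the run_list concrete, here is a minimal sketch in plain Ruby, outside of Chef, showing how entries of the form recipe[name] name the recipes to run.  The inline configuration below is a made-up abbreviation of the sample at the end of this post, and the string slicing is ours, not Chef's own parser.

```ruby
require "json"

# A made-up, abbreviated node configuration; the real one appears at the
# end of this post.  Parsed with plain Ruby to show the run_list structure.
node = JSON.parse(<<~JSON)
  {
    "machine_name": "host.example.com",
    "resolver": { "domain": "example.com" },
    "run_list": [ "recipe[hostname]", "recipe[resolver]", "recipe[resolver::yp]" ]
  }
JSON

# Each run_list entry names a recipe; "cookbook::recipe" selects a
# non-default recipe within a cookbook.
recipes = node["run_list"].map { |entry| entry[/\Arecipe\[(.+)\]\z/, 1] }
puts recipes.inspect
```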

In Chef, a single computer is represented by a “Node”.  A node consists of a configuration, which in the case of Chef Solo is a JSON file with a set of attributes describing the desired state, and a run list consisting of a sequence of recipes to get there. Recipes live, oddly enough, in cookbooks.  Chef executes each recipe in the run list to collect “resources”, which are then run in order to carry out the actual configuration process.  These resources have access to the node attributes.  Although the JSON configuration will specify many attributes of interest, it is frequently possible to derive attributes from the state of the underlying machine or from configuration files on the machine.  For instance, under Linux the number and type of processors can be determined by reading /proc/cpuinfo. The automatic discovery of these attributes is carried out by Ohai, a handy little gem from Opscode that is useful in its own right.
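To give a flavor of what Ohai automates, here is a hand-rolled sketch (not Ohai's actual implementation) that derives a processor count the same way, by reading /proc/cpuinfo on Linux:

```ruby
# Sketch of automatic attribute discovery: count processors by reading
# /proc/cpuinfo, roughly one of the many things Ohai does on Linux.
# Returns nil on platforms without /proc/cpuinfo.
def processor_count
  return nil unless File.readable?("/proc/cpuinfo")
  File.read("/proc/cpuinfo").scan(/^processor\s*:/).size
end

puts processor_count
```

Ohai collects far more than this (platform, memory, network interfaces, and so on) and exposes it all as node attributes that recipes can consult.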

One of the advantages of Chef is that the ugly, complicated bits can be left out of the per-host configuration and instead encapsulated in a recipe.  The configuration itself is written in a declarative fashion that is easy to inspect and understand.  Thus, the meat of getting your infrastructure up and running with Chef is adapting existing recipes or writing new ones for your environment.

Recipes are organized into cookbooks; each cookbook, a collection of related recipes, is placed in a separate directory.  For instance, the recipes we use to handle DNS resolver setup live in the eponymous “resolver” directory. This directory contains two files with metadata (metadata.json and metadata.rb) that are not particularly important for Chef Solo.  There are also several sub-directories:

  • recipes — a directory containing implementations of individual recipe components.  The default recipe lives in default.rb; in our example, this would simply be the recipe “resolver”.  Related recipes, such as one that configures resolution for YP/NIS+, might live in yp.rb and be referred to as “resolver::yp”.
  • files — a two-level directory hierarchy containing files, such as shell scripts, that Chef recipes may install directly on the target host.  Files for the default recipe live in files/default.  In the case of our resolver recipe, we add a small shell script that hooks DHCP events using dhclient’s existing hook mechanism; this shell script lives under the files directory.
  • templates — organized in the same fashion as the files directory, the templates directory contains ERB templates.  In our resolver example, we provide a template for resolv.conf that can be instantiated with values drawn from the JSON configuration, providing a simple way to customize resolv.conf based on the configuration.
  • Other — more complicated recipes may also have “attributes” and “definitions” sub-directories.  A good example worth some study is the Apache 2 Chef cookbook from Opscode.
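Putting the layout above together, the resolver cookbook's skeleton can be sketched like so (built under a temporary directory here so the snippet is safe to run anywhere; the file names follow the conventions just described):

```ruby
require "fileutils"
require "tmpdir"

# Skeleton of the "resolver" cookbook described above, created under a
# temporary directory so this sketch has no side effects.
root = File.join(Dir.mktmpdir, "resolver")

FileUtils.mkdir_p(File.join(root, "recipes"))              # recipe implementations
FileUtils.mkdir_p(File.join(root, "files", "default"))     # files for the default recipe
FileUtils.mkdir_p(File.join(root, "templates", "default")) # ERB templates

FileUtils.touch(File.join(root, "metadata.rb"))
FileUtils.touch(File.join(root, "recipes", "default.rb"))  # the "resolver" recipe
FileUtils.touch(File.join(root, "recipes", "yp.rb"))       # "resolver::yp"
```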

Here’s a fragment from the default resolver recipe we’ve written.  The DSL is close enough to Ruby that it isn’t too hard to pick up if you already know Ruby.  Although it is best to avoid platform specifics, one can break down a recipe by platform where necessary.  The cookbook_file directive allows us to install files from the files directory.  Similarly, the template directive allows us to instantiate templates using the attributes available from the current Node.  There is also an execute directive that allows the recipe to “shell out” and run arbitrary commands.  As even this simple example illustrates, Chef is not a panacea: you still have to deal with a lot of ugly OS-specific details, such as where the distribution decided to put dhclient’s hooks.  But if you are judicious in your choice and implementation of recipes, Chef can help you get well on the road to declarative configuration bliss.

[sourcecode language="ruby"]
if platform?("centos", "redhat") # RedHat
  # RedHat-family hook installation omitted from this fragment
else # Ubuntu
  cookbook_file "/etc/dhcp3/dhclient-exit-hooks" do
    source "resolver-hook"
    mode "0755"
    owner "root"
    group "root"
  end
end

template "/etc/resolv.conf.dist" do
  source "resolv.conf.dist.erb"
  owner "root"
  group "root"
  mode "0444"
end
[/sourcecode]
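To show how a template like resolv.conf.dist.erb gets filled in, here is a plain-ERB sketch outside of Chef.  The template text and attribute shapes are our guesses (the real template isn't shown in this post), and inside an actual recipe `node` would be Chef's Node object rather than a hash; note that the sample configuration at the end of this post happens to leave `nameservers` empty.

```ruby
require "erb"

# Hypothetical stand-in for templates/default/resolv.conf.dist.erb.
TEMPLATE = <<~ERB
  domain <%= node["resolver"]["domain"] %>
  search <%= node["resolver"]["search"] %>
  <%- node["resolver"]["nameservers"].each do |ns| -%>
  nameserver <%= ns %>
  <%- end -%>
ERB

# Plain hash standing in for the Chef Node; an array of nameservers is
# assumed here purely for illustration.
node = {
  "resolver" => {
    "domain"      => "example.com",
    "search"      => "example.com ec2.internal",
    "nameservers" => ["10.0.0.2", "10.0.0.3"],
  },
}

resolv_conf = ERB.new(TEMPLATE, trim_mode: "-").result(binding)
puts resolv_conf
```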

The net result of implementing infrastructure management with Capistrano and Chef is that the state of the world can be codified in one place.  Snippets of code can be tested in isolation and deployed carefully.  Perhaps most importantly, machine configurations can be declarative rather than being the agglomeration of little configuration files and one-off scripts that inevitably accumulate over time.  The DevOps approach to No Futz or Low Futz computing requires a lot more up-front work, but in the long run knowing how your infrastructure is configured when your institutional memory is short or deadlines loom is priceless.

The promised sample JSON declarative configuration:

[sourcecode language="javascript"]
{
  "machine_name": "api.tddium.com",

  "users": {
    "api": {
      "id": "api",
      "uid": 1000,
      "gid": 1000,
      "password": "*",
      "shell": "/bin/bash",
      "comment": "Tddium API User"
    }
  },

  "psql_users": [ "psql-api" ],

  "resolver": {
    "domain": "tddium.com",
    "search": "tddium.com ec2.internal",
    "nameservers": { }
  },

  "ssh_keys": { },
  "rvm": { "script": "rvm-install-latest" },
  "rubies": [ … ],
  "rails_config": { "api": { } },

  "passenger": {
    "user": "api",
    "version": "3.0.3",
    "module_path": "/path/gems/passenger-3.0.3/ext/apache2/mod_passenger.so",
    "root_path": "/path/gems/passenger-3.0.3",
    "ruby_bin": "/path/ruby"
  },

  "rails_env": "production",

  "run_list": [
    "recipe[hostname]",
    "recipe[resolver]",
    "recipe[mkuser]",
    "recipe[rvm]",
    "recipe[ruby]",
    "recipe[postgresql::client]",
    "recipe[postgresql::server]",
    "recipe[psql_user]",
    "recipe[apache2]",
    "recipe[apache2::mod_deflate]",
    "recipe[apache2::mod_ssl]",
    "recipe[rails]",
    "recipe[passenger_apache2]",
    "recipe[passenger_apache2::mod_rails]",
    "recipe[tddium_api]"
  ]
}
[/sourcecode]
