Puppet

Overview

Systems administration has gotten a lot easier now that there are things like package management. When you need to set up one machine, it’s not such a big deal to compile everything by hand, install it, and configure it. But things get a lot more complex and harder to maintain as the number of systems that you need to put together grows, especially if you want to maintain a uniform policy across all of your machines. This has become more of a problem for us now that we use things like vservers to put together different atomic pieces of a larger infrastructure. Isolation is good, but the increased management load (and complexity) is a concern.

When you sit down to put together some service, you spend a lot of time researching what others have done: searching, reading sites, mailing lists, wikis, etc. You pick pieces from various configurations, make choices that are related to your environment, and eventually come up with the best way to set up your service.

Hopefully you document it, so that later you can refer to your notes to remember what you were thinking, or how you managed to do this odd arcane thing. Or maybe you need to rebuild it on new hardware, because things crashed, were seized or have outgrown the hardware. Things are way easier now that there are packages for a lot of stuff, and getting things back to how they were is much easier when you have clear documentation describing all the pieces needed and how you configured each piece. You can often reconstruct things from scratch with this documentation (along with some good backups of the data!) by following your original recipe.

But there are some problems with documentation: it grows old and outdated, you need to keep it current, and sometimes doing so is difficult. Following your own documentation can also be hard, if only because you need to spend all this time setting up things that you had already set up before. That can be a pain! Especially if you need to do it multiple times because you need to grow things. What if you had written this recipe in such a way that the machine could apply it for you, saving your wrists from typing, and it would be done exactly the same way every time? This recipe can also be used to keep your machines’ configurations from drifting away from each other, and allows you to do something once and have it done on any number of machines without losing your wrists. This is what we can do with puppet!

Puppet is a huge leap forward in the art and practice of systems administration/systems automation/systems architecture. Puppet provides a fundamentally useful abstraction layer, not only for the practical management of systems, but also for expressing the entire infrastructure as meaningful, functional and repeatable code.

The impact on the ability of a single systems administrator to create elegant, functional, repeatable configurations cannot be overstated. Most of this was not even theoretically possible with any existing tool until Luke Kanies created puppet.

With the right tool, you can automate the vast majority of modern unix systems, as long as the tool, at minimum, manages the following resources: Files, Directories, Symlinks, Packages, and Services, and can execute arbitrary commands. Many tools provide the ability to manage these fundamental resources, to say “make sure this file is owned by root” or “make sure this service is enabled”. None of them come close to letting you do it as elegantly as Puppet does.

Puppet was designed to manage all of these (and more), but it didn’t stop there. It also gives us an even more powerful abstraction layer: the ability to group these fundamental resources together in sets, according to function (regardless of the underlying implementation), so that the resources have clearly defined relationships and can then be applied to systems. This means that we can take the various steps that are required to get a functioning apache installation and roll them all together into a single thing that we give a name to (e.g. the apache class).
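As a rough sketch of what such a grouping might look like, the class below bundles a package, a configuration file and a service under one name. The package and service name apache2 and the file path are assumptions for a Debian-style system, not something this page prescribes.

```puppet
# Hypothetical "apache" class: the pieces of a working apache
# installation rolled together under a single name.
class apache {
    # the package that provides the daemon (name assumed for Debian)
    package { "apache2":
        ensure => installed,
    }

    # manage ownership and permissions of the main config file;
    # it can only exist once the package is installed
    file { "/etc/apache2/apache2.conf":
        owner   => "root",
        group   => "root",
        mode    => "0644",
        require => Package["apache2"],
    }

    # keep the daemon running, and restart it if the config changes
    service { "apache2":
        ensure    => running,
        enable    => true,
        subscribe => File["/etc/apache2/apache2.conf"],
    }
}
```

Any node that says "include apache" would then get all three resources applied together.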

So to summarize a few key things that puppet can do: we can manage different resources with similar semantics, all in one place. We can group those resources together under functional umbrellas so that they are easy to find, maintain and extend.

Puppet is not perfect and it has bugs, but the community is good. There is a good IRC channel and mailing list, and bugs are responded to and worked on. Vast improvements have happened over the last year and even more are coming.

Some words of wisdom (read in the voice of Yoda):

Here are some tips based on my experiences and frustrations. Your head is going to hurt, you are going to be frustrated, and at times you will wonder why you are doing this at all. Sometimes the work seems not worth it, especially when you are struggling in the beginning to do simple things. Sometimes simple things are not so simple to do in puppet; the key is to not let that bog you down. I’m not an expert, and I still get frustrated.

Architecture overview

Let’s go over the different pieces.

Client/Server

Puppet is basically just a single server (the puppetmaster) and individual clients that run on each of your servers. These clients periodically wake up, contact the puppetmaster and ask it how things should be configured on the system. The puppetmaster, after deciding to talk to the client, determines if it has a configuration for that client and if so ‘compiles’ that manifest. If that manifest compiles without error, it then caches that compilation and then sends it off to the client. The client then takes that manifest and steps through it, performing whatever has been detailed in the manifest.

Facter

Facter plays a key role in puppet. It’s basically a cross-platform ruby script that returns key facts for a host: things like hostname, FQDN, IP address, architecture, etc. These “facts” can be used in your puppet recipes as reliable variables. If facter is installed you can run it to get a list of available facts on the system and their values. You can also run facter followed by the name of a fact to get the actual value of that fact, for example: ‘facter architecture’
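Here is a small sketch of facts being used as variables in a manifest. The fact names ($fqdn, $architecture, $operatingsystem) are ones facter commonly reports, but check the output of facter on your own systems; the service names in the case statement are assumptions about what each distribution calls its ntp daemon.

```puppet
# Facts show up as ordinary variables in the manifest.
file { "/etc/motd":
    content => "This is $fqdn, running on $architecture\n",
}

# Pick a value depending on the operating system fact
# (the service names here are assumptions, adjust for your systems).
case $operatingsystem {
    "Debian":           { $ntp_service = "ntp" }
    "RedHat", "CentOS": { $ntp_service = "ntpd" }
    default:            { $ntp_service = "ntp" }
}

service { $ntp_service:
    ensure => running,
}
```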

Webrick

Puppetmaster listens on port 8140 by default. It is a Ruby application, and it boots up Ruby’s built-in WEBrick ‘webserver’ by default. WEBrick is a very basic HTTP server, and it is used for the puppet transport by default. If you’ve done any Rails programming you will recognize WEBrick as the development HTTP server that you use to test things out before eventually deploying to some other, more full-featured webserver (such as apache with Passenger, or mongrel, lighttpd, nginx, etc.). WEBrick does not scale and wasn’t really built to scale; it’s slow to start and slow to operate. It works fine for the puppetmaster up to a certain point, and then it starts to fall on its face. The magic number seems to be around 25-35 nodes, but that varies a lot depending on what you are doing. You don’t need to worry about this for now, but when you get to this point you will need to move to a more robust setup. You will know when you get there because you will start having failures and timeouts.

Daemon configuration

So to get started, we need to have puppetmaster and puppet installed, with the puppetmaster running and listening on 8140 and any firewall opened up. Both puppetmasterd and puppetd share the configuration file /etc/puppet/puppet.conf. It’s an ini-style configuration file, with headers for different configuration sections, so configurations specific to puppetmasterd go under [puppetmasterd] and those specific to puppetd go under [puppetd]. A default configuration is installed by the debian package, but to get a commented list of all configuration options, you can run: puppetd --genconfig. The packages also install the puppet user, which the puppetmaster runs as by default. The puppet clients have to run as root, because the majority of the work that they do is root-specific.

getting started with the puppetmaster

puppetmasterd --verbose --no-daemonize 

This is a good way to debug our ‘master’ because --no-daemonize keeps things in the foreground, and --verbose gives us more information about what is going on. You won’t typically run the puppetmaster like this, but it’s a good way to see what is going on while you mess around.

getting started with the client (puppetd vs. puppet)

In a typical configuration, you would run puppetd on the client, and it would periodically wake up (default: 30min) and query the master. Some prefer to not have this daemon running all the time, eating up resources, so they run it from cron.

There is also a one-time client that you can run on puppet manifests to test them out; this is simply called ‘puppet’. For example, if you wanted to test some manifest locally, you could just create a test.pp file with the stuff you wanted to test, and then run ‘puppet --verbose /tmp/test.pp’ to see what it does.

certificates and bootstrapping

Since we have the puppetmaster running, we need to fire up the first client. The first time the client is run, it will generate a local, self-signed certificate; it will then connect to the master server (which also acts as a certificate authority) and request that the certificate be signed. By default, puppet will try to connect to the server called ‘puppet’, unless we specify another server on the command-line. We can either add the host alias ‘puppet’ to the localhost address in /etc/hosts, or we can specify the puppetmaster server when we run things. Setting up a ‘puppet’ alias in DNS for the puppetmaster server is pretty standard. So we run ‘puppetd --test’. On the client side we see that the client doesn’t have a signed certificate from the master; on the master side we see that there is a pending certificate signing request (note: if the puppet client thinks it is running on the same server as the master, the certificate will automatically be signed). Now we need to learn a little bit about the built-in CA. The first command below lists pending requests, and the second, given the client’s hostname as an argument, signs that client’s certificate:

puppetca --list 
puppetca --sign

Once you have signed your client’s certificate, you can run the client again and you will see that the signed certificate is transmitted back. Now this particular ‘node’ can authenticate with the puppetmaster and have configurations delivered to it.

site.pp

But wait, what is this error, “missing site.pp”?! What is site.pp? Puppet requires a ‘site manifest’; by default this file is called site.pp and is located in the manifests directory on the puppetmaster. This is the central manifest which will contain all your configurations for your nodes, typically by including other things. We’ll start out small with just some basic puppet configuration in the site.pp, but later build it up to have different node definitions. With no nodes defined, the configuration that is defined in site.pp will be applied to all clients that connect.
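A minimal starting point could look something like the sketch below. The choice of /etc/sudoers is just an illustration; any file you want to keep an eye on would do.

```puppet
# /etc/puppet/manifests/site.pp
# With no node definitions in the file, this resource is applied
# to every client that connects.
file { "/etc/sudoers":
    owner => "root",
    group => "root",
    mode  => "0440",
}
```

Run ‘puppetd --test’ on a client again and it should report any ownership or permission changes it had to make.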

In the beginning, there is a master manifest called site.pp, which contains all the “nodes” that are managed by puppet and what configurations should be applied to those nodes. For example, the following is a typical entry in your site.pp. This one defines two nodes that have the hostnames “cormorant” and “albatross”, and includes a few classes and a module (we will go over classes and modules later):

node cormorant, albatross {
  include debian_etch
  include ssh_server
  include backendmailserver
  include munin::default_client
}

Each node will have its own specific configuration defined in the site.pp by including the appropriate classes, templates, modules, etc. that should be applied for that node.

I have to write code?! the puppet DSL

Puppet is all about the recipes, which are written in a pseudo-code. This code is what people call a DSL, a Domain Specific Language. A DSL is essentially a programming language that is targeted at a specific problem domain. The opposite of a DSL would be a general purpose programming language, such as C. DSLs are pretty common. Some examples: CSS, regular expressions, make, SQL… Puppet has its own DSL, and I believe the reason is a tactical decision by the author. Not all systems administrators are programmers, learning a programming language is not a trivial task, and becoming fluent enough in a language to be productive is even harder. If puppet required people to learn a fully general language (such as Ruby), I believe that the learning curve would be too steep for most people to adopt it. For people who already know Ruby, puppet is frustrating because it’s not a complete language (but learning puppet’s DSL is easy for them). For people who don’t know Ruby, the ramp-up to learn puppet’s DSL is much less steep and you can get things done faster.

stick to a standard format

When you write your puppet manifests, consistency of indentation and code-style are going to be important if you want to work with others. The best thing to do is to use the puppet provided emacs and vi modes. On Debian, make sure you have the ‘puppet’ package installed.

If you are using emacs, you simply have to open a file that ends in .pp to have the puppet mode activated.

If you are using vim on Debian, you can run ‘ln -s /usr/share/vim/addons/syntax/puppet.vim ~/.vim/plugin/’. After doing this, you should automatically get syntax highlighting. If it doesn’t work automatically, try typing “:syn on”.

Check the upstream repository for newer versions of these files, as there are always improvements making them better!

resources

The basic unit of configuration in puppet is a ‘resource’. Resources are basically the things you want to manage on your nodes: files, services, cronjobs, users… It’s highly recommended that you keep the upstream Type Reference handy when you are working on puppet.

attributes/parameters

Each resource has its own set of attributes and a name; for example, the ‘mode’ attribute in the file resource is used by the file resource type to set file permissions. The ‘name’ is the unique name for this particular thing you are defining. The file resource type definition contains all the possible attributes and parameters you can set.

metaparameters

There are also some special attributes, called metaparameters, which can be used on any resource type.
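To make the distinction concrete, here is a sketch that mixes ordinary attributes with two metaparameters, require and noop. The package and service names are assumptions for a Debian-style system.

```puppet
package { "openssh-server":
    ensure => installed,
}

file { "/etc/ssh/sshd_config":
    # 'owner' and 'mode' are attributes specific to the file type
    owner   => "root",
    mode    => "0600",
    # 'require' and 'noop' are metaparameters: they work on any type
    require => Package["openssh-server"],
    noop    => true,   # only report what would change, don't change it
}

service { "ssh":
    ensure  => running,
    # the same metaparameters again, on a different resource type
    require => Package["openssh-server"],
    noop    => true,
}
```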

a simple resource

Start by using one of the most common resource types: the ‘file’ resource:

file { "/tmp/test":
     mode => 0644, owner => micah, group => micah;
}

This is a “file” resource, it has the “name” of “/tmp/test”. The name variable here is also the path, and if you look at the file resource type definition you will find that one of the parameters you can pass to this resource is the ‘path’ parameter. By default the ‘path’ parameter is set to the ‘namevar’, but it doesn’t have to be. For example, this is the same:

file { "myfirsttest":
     path => "/tmp/test",
     mode => 0644, owner => micah, group => micah;
}

It’s just convenient to make the namevar the path, but you don’t have to.

You also should note that after the resource type (in this case “file”) has been specified, we open a set of curly braces. These braces will hold the resource title and attributes. They need to match properly.

files, templates, manifests, oh my!

templates vs. files

Templates are files that have pieces that are generated on the puppetmaster before the result is shipped off to the clients. Plain files, by contrast, are shipped to the clients unchanged.
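In a manifest the difference might look roughly like the sketch below. The fileserver mount name "files" and the template name are hypothetical, and the exact form of the puppet:// URL depends on your puppet version and fileserver configuration, so treat this as an illustration only.

```puppet
# A plain file: fetched from the puppetmaster's fileserver and
# copied to the client as-is.
file { "/etc/motd":
    owner  => "root",
    mode   => "0644",
    source => "puppet:///files/motd",        # assumed mount point 'files'
}

# A template: rendered on the puppetmaster (it can use facts and
# other variables), and the result becomes the file's content.
file { "/etc/resolv.conf":
    owner   => "root",
    mode    => "0644",
    content => template("resolv.conf.erb"),  # hypothetical template name
}
```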

Pulling it all together

One way to set this up is to have a centralized git repository where you put everything that would be in /etc/puppet on the puppetmaster. This git repository can be checked out and edited locally, and when you check it back in to the repository, the /etc/puppet directory on the puppetmaster is automatically updated, the puppetmaster detects these changes automatically, and the clients will eventually check in and pick up your changes. Who needs to log in anymore when you control the world from your laptop? Delicious power!!

This way you can mess with things offline, make commits and when you get online you can push those changes (you should test them locally to make sure they work first!). If you then want to watch those things get deployed, you can log in to the node in question, run puppetd --test, watch what happens, and adjust accordingly.

Set up your git repository on the puppetmaster

One way to do this would be to use gitosis on the puppetmaster. Create a puppet repository there, and then configure it for some good times.

email notifications to your co-conspirators

One nice thing to do is to notify the people you work with of changes. You can do that by specifying the following in your .git/config:

[hooks]
	mailinglist = email@addresses.here,each@one.should.be,separated@by.commas
	announcelist =
	envelopesender = 
	emailprefix =

Then copy into your .git/hooks/post-receive the script provided in /usr/share/doc/git-core/contrib/hooks/post-receive-email and make it executable.

auto-checkout to the puppetmaster

The git repository only really becomes awesome when the commits you do get automatically deployed out to your infrastructure. To do this, you will need to configure git to automatically checkout the repository into /etc/puppet after you have pushed a commit.

One way to do that is to create a post-update hook in .git/hooks/post-update. You will need the procmail package installed so you can set a lockfile, then put the following script in the post-update, and make it executable:

#!/bin/bash

PUPPET_DIR=/etc/puppet

echo ""
echo "Updating $PUPPET_DIR"
echo ""

unset GIT_DIR

cd "$PUPPET_DIR" || exit 1

# lockfile comes from procmail package
lockfile ~/puppet.lock

# run the external update script as the puppet user (via sudo)
sudo -u puppet /usr/local/sbin/update_puppet_repo.sh "$PUPPET_DIR" || echo "Updating $PUPPET_DIR failed. Fix it manually."

# remove lock
rm -f ~/puppet.lock

You will also need to have installed the /usr/local/sbin/update_puppet_repo.sh script:

#!/bin/bash

if [ $# -ne 1 ]; then
	echo "Usage: $(basename "$0") /full/path/to/puppet/dir" 1>&2
	exit 1
fi

PUPPET_DIR=$1

unset GIT_DIR # better safe than sorry

cd "$PUPPET_DIR" || exit $?

# update to HEAD
git pull --rebase || exit $?

git submodule update --init || exit $?

Now test things out by pushing some commits. Those commits should automatically get pulled into /etc/puppet, and the files should be owned by the user puppet.

In the course of human events, you might get frustrated with your puppet commits not working right, and want to try a few things out on the master before you commit them. You can do that by editing the files in /etc/puppet, but then you will be out of sync with your repository, and the next time you push a commit, the automatic update will fail. You will need to make sure your /etc/puppet git checkout stays clean, so use git checkout -- /etc/puppet/file/i/messed/with to get that file back to the repository state.

Ordering / DAG

Many people are confused by puppet because the language is not interpreted and executed sequentially, so if you write things in a certain order you will be surprised that things do not happen in that same order. This is because puppet is a declarative language which really just means that what you are doing is defining things like resources and defining their relationships. Puppet creates a dependency graph, does a topological sort and then executes in that order. This means that Puppet then tries to be smart in regards to the situation that you have declared, but there is no guarantee that puppet will do the right thing in terms of applying your resources without the logic being explicit… it probably will work, but there might be more than one path through the resulting graph.

This means that class inheritance does not necessarily ensure order, it means that file order doesn’t mean squat, and it means that unless you explicitly tell puppet that something should depend on something else, it isn’t going to figure that out on its own.

So to save your sanity, if you haven’t lost it already, be sure to specify in the manifest if something requires another thing, you will thank yourself later.
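A sketch of what being explicit looks like: the resources below are deliberately written “backwards”, but the require metaparameters tell puppet the order that actually matters. The package and service names are assumptions for a Debian-style system.

```puppet
# Written first, but applied last: it needs the config file.
service { "ssh":
    ensure  => running,
    require => File["/etc/ssh/sshd_config"],
}

# Applied second: it needs the package to be installed.
file { "/etc/ssh/sshd_config":
    owner   => "root",
    mode    => "0600",
    require => Package["openssh-server"],
}

# Written last, but applied first: nothing here depends on
# the order in which it appears in the manifest.
package { "openssh-server":
    ensure => installed,
}
```

Without the two require lines, puppet would be free to apply these three resources in any order.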

Namespaces

One of the most confusing things with puppet is that there are certain namespace issues that you are likely to run into, because it’s not clearly spelled out anywhere what you can and cannot use for naming nodes, classes, variables, etc. I am going to attempt to collect all of these here to make this clear.

Nodes and Classes must have unique names

If you have a node that includes a class with the same name, such as the following:

node test {
  include test
}

...

class test {
   do_some_stuff_here
}

You will find that when the node test runs puppet it will not apply this test class configuration. If you are running puppet 0.23.1, you will get an error about this.
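One way around the clash, sketched here with a hypothetical class name, is simply to give the class a name that no node shares:

```puppet
# The class gets its own distinct name (hypothetical)...
class test_settings {
    file { "/tmp/test":
        ensure => present,
    }
}

# ...so the node called "test" can include it without a name clash.
node test {
    include test_settings
}
```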

Modules

A module in puppet is an encapsulation of a specific thing you want to achieve, all packaged up together in one place. A Puppet module contains manifests, distributable files and templates all wrapped up together in its own hierarchy.

An example module might be used to configure your apache virtual hosts.

How does puppet find modules? The modulepath parameter!

There are a bunch of modules that come with puppet, and a module distribution framework is also planned so that you can download externally contributed modules that the community has made. There are also the modules that you create yourself. Each of these kinds of modules gets put in a different place in the filesystem to help differentiate between them, so there is a configuration value called modulepath that lets puppet know where it should look for all the modules. For example, you may have the following in your puppet.conf:

modulepath=/etc/puppet/modules:/usr/share/puppet:/var/lib/prm

Module names

Modules must be named with normal words, no spaces or “::” or “/” characters and the word “site” is reserved.

Using modules

To use a module, it must first be located in a directory underneath the modulepath search path, then it is just a matter of using a class from the module by referencing its name and it will then be autoloaded. Or you can use an ‘import “modulename”’ line to get it loaded.
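As a rough sketch, assuming a module called apache living under /etc/puppet/modules (one of the modulepath directories above), the pieces might fit together like this. The node name webserver1 is hypothetical.

```puppet
# Assumed layout of the module on the puppetmaster:
#   /etc/puppet/modules/apache/manifests/init.pp   (the class below)
#   /etc/puppet/modules/apache/files/              (distributable files)
#   /etc/puppet/modules/apache/templates/          (templates)

# --- modules/apache/manifests/init.pp ---
class apache {
    package { "apache2":
        ensure => installed,
    }
}

# --- manifests/site.pp ---
# Referencing the class by name is enough: puppet searches the
# modulepath and autoloads the module.
node webserver1 {
    include apache
}
```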

Example modules

This page has some modules available

Miscellaneous useful things

How to set the environment in cron

There is a cron type that you can use to specify cronjobs for particular users. Sometimes you need to set the environment to particular values; you can do that like this:

class cronstuff {
    # First set defaults for all cron entries
    Cron { environment => [
              'SHELL=/bin/sh', 'HOME=/var/log',
              'PATH=/etc:/bin:/sbin:/usr/bin:/usr/sbin:/usr/local/bin' ]
    }
    cron { foo:
        user => nobody, minute => "*/15", ensure => present,
        command => "/foo/bin/stuff.sh"
    }
    cron { bar:
        user => nobody, minute => "*/15", ensure => present,
        command => "/foo/bin/things.sh"
    }
}

I keep getting the error, “end of file reached”, WTF?

There are two very common situations where this occurs, one is with fileserving, and the second is with storedconfigs and report storing.

Puppet uses XMLRPC and WEBrick to transfer files, and this doesn’t scale very well; sometimes it results in this error when the number of clients hitting it at once reaches somewhere around 7. Using Mongrel instead of WEBrick allows multiple processes to serve the same pool of clients on the same host, whereas WEBrick relies on Ruby’s threading, which doesn’t scale beyond one process and instead just starts dropping concurrent connections. If you’re getting connection-reset or end-of-file errors, you should try switching to Mongrel, which seems to have a much higher threshold for concurrent clients (somewhere around 30). The authors are working on a change to REST, which will make this much more efficient.

The second scenario where this happens is when you are using StoredConfigs, or are storing reports after configuration runs. Puppet uses SQLite by default and this works fine until you start doing more interesting things. Once you have more than a few nodes checking in, the amount of time it takes to process, compile and serve the configurations starts increasing significantly when using the SQLite back-end. The solution is to switch to a MySQL or PostgreSQL back-end. To do this, simply create a puppet database in your preferred database software, and then make sure you have the following set in the puppetmasterd section of your puppet.conf:

[puppetmasterd]
dbadapter=mysql
dbserver=localhost
dbuser=puppet
dbpassword=yourpasswdhere

Unrecognised escape sequence ‘\/’ ?

You may find that when using a definition such as “delete_lines” to remove a pattern that has slashes in it (e.g. /proc/kmsg) you will need to escape the slashes for sed and grep (e.g. “\/proc\/kmsg”), and it works fine, but you keep getting this annoying error:

Aug  7 08:27:58 kakapo puppetmasterd[26482]: Unrecognised escape sequence '\/' in file /etc/puppet/manifests/default.pp at line 121

This is because puppet is trying to spit out the escaped lines, but it is passing unrecognized sequences to its Escaper on output. If you want this error to go away, you will need to double escape your backslashes (e.g. “\\/proc\\/kmsg”). This behavior will become stricter in future releases of puppet, which won’t even pass these sequences to the output, so double-backslashing will be required.

Why not cfengine?

cfengine is not really that great of a tool, but for a long time it was really the only thing out there. It looks good at first, but you don’t use it for long before you start to hate it. It’s effectively a closed product: although it is GPL’d, the author doesn’t accept patches. There are a lot of assumptions that are hard coded in cfengine, and if you don’t share all of these with the author, then you are in trouble. Puppet allows you to make your own assumptions and encode those, instead of forcing you to think about the world in a specific way. Thus puppet is more of a framework that allows you to express your preconceptions, rather than a tool that forces you to adopt certain assumptions. Puppet also has a more open community that encourages participation.

Puppet doesn’t claim to be the one right way to do this, but it is trying and it is interested in knowing how to do it better.

Resource relationships: you can define something, like a daemon, and then define the relationships around it, such as: this daemon should be running, it requires this configuration file, this package needs to be installed, etc. If this configuration file changes, restart the service. Or make sure this user is created before you try to do a chown on /home/user, because otherwise that won’t work, etc.
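Expressed as a manifest, those relationships might look something like the following sketch. The ntp package, file and service names are assumptions for a Debian-style system, and the user alice is purely hypothetical.

```puppet
# The daemon: package, config file and service, tied together.
package { "ntp":
    ensure => installed,
}

file { "/etc/ntp.conf":
    owner   => "root",
    group   => "root",
    mode    => "0644",
    require => Package["ntp"],          # package first, then its config
}

service { "ntp":
    ensure    => running,
    require   => Package["ntp"],
    subscribe => File["/etc/ntp.conf"], # restart when the config changes
}

# The user has to exist before anything can be owned by it.
user { "alice":
    ensure => present,
}

file { "/home/alice":
    ensure  => directory,
    owner   => "alice",
    require => User["alice"],
}
```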

Why not capistrano?

The short answer is: puppet and capistrano go together like curry and rice.

Puppet and Capistrano fill two different roles that are complementary. Capistrano is basically Rake+SSH, allowing you to define ways to execute commands on servers in parallel. The most typical use is to deploy Rails applications. Capistrano can be used to automate systems (and some use it this way), but doing this isn’t much different from using shell scripts and for loops wrapped around ssh.

Puppet is a language for expressing your infrastructure in code. Using Puppet’s language, you define all the resources needed in your infrastructure, and then apply those resources to individual nodes (or servers). Puppet makes your infrastructure more reliable, repeatable, and documented.

Puppet and Capistrano go together: Puppet builds and manages the infrastructure, while Capistrano can handle the deployment of new code. Each tool is suited to, and particularly good at, its own job.

On the horizon is a tool called iClassify which will make integrating Puppet and Capistrano more direct, as it lets you organize your systems with tags which map to Puppet classes, and Capistrano can query it to set up specific server roles. An example that was brought up on the Puppet list for how these could work together goes as follows. You might have a “database” puppet class that configures a mysql server. It further might be a “master” or “slave”, which maps to the Capistrano :db role. We query iClassify with Capistrano, asking for the database servers, and making the one that is your master the :primary one.

What about others?

Wikipedia has a really good comparison chart of different free/open configuration management software options. Feel free to explore. In my experience puppet is the most mature offering and has the most active and productive development, but I haven’t tried them all.