Tue, 05 Jul 2016

Ansible: A Primer



Ansible is a very pragmatic and powerful configuration management system that is easy to get started with.

Connections and Inventory

Ansible is typically used to connect to one or more remote hosts via ssh and bring them into a desired state. The connection method is pluggable: other methods include local, which simply invokes the commands on the local host instead, and docker, which connects through the Docker daemon to configure a running container.
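
For example, to run a module against the control machine itself, you can pick the local connection with the -c option. A minimal sketch (this relies on Ansible's implicit localhost entry, so no inventory is needed yet):

$ ansible localhost -c local -m ping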

To tell Ansible where and how to connect, you write an inventory file, called hosts by default. In the inventory file, you can define hosts and groups of hosts, and also set variables that control how to connect to them.

# file myinventory
# example inventory file
[all:vars]
# variables set here apply to all hosts
ansible_user=root

[web]
# a group of webservers
www01.example.com
www02.example.com

[app]
# a group of 5 application servers,
# all following the same naming scheme:
app[01:05].example.com

[frontend:children]
# a group that combines the two previous groups
app
web

[database]
# here we override ansible_user for just one host
db01.example.com ansible_user=postgres

(In versions prior to Ansible 2.0, you have to use ansible_ssh_user instead of ansible_user.) See the introduction to inventory files for more information.

To test the connection, you can use the ping module on the command line:

$ ansible -i myinventory web -m ping
www01.example.com | success >> {
    "changed": false,
    "ping": "pong"
}

www02.example.com | success >> {
    "changed": false,
    "ping": "pong"
}

Let's break the command line down into its components: -i myinventory tells Ansible to use the myinventory file as inventory. web tells Ansible which hosts to work on. It can be a group, as in this example, or a single host, or several such things separated by a colon. For example, www01.example.com:database would select one of the web servers and all of the database servers. Finally, -m ping tells Ansible which module to execute. ping is probably the simplest module: it simply sends back the response "pong" and reports that the remote host hasn't changed.
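
For instance, to ping exactly that selection of hosts:

$ ansible -i myinventory www01.example.com:database -m ping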

These commands run in parallel on the different hosts, so the order in which these responses are printed can vary.

If there is a problem with connecting to a host, add the option -vvv to get more output.

Ansible implicitly gives you the group all which -- you guessed it -- contains all the hosts configured in the inventory file.
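
So checking connectivity to every host in the inventory is simply:

$ ansible -i myinventory all -m ping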

Modules

Whenever you want to do something on a host through Ansible, you invoke a module to do that. Modules usually take arguments that specify what exactly should happen. On the command line, you can add those arguments with ansible -m module -a 'arguments', for example:

$ ansible -i myinventory database -m shell -a 'echo "hi there"'
db01.example.com | success | rc=0 >>
hi there

Ansible comes with a wealth of built-in modules and an ecosystem of third-party modules as well. Here I want to present just a few commonly used modules.

The shell Module

The shell module executes a shell command on the host and accepts some options such as chdir to change into another working directory first:

$ ansible -i myinventory database -m shell -a 'pwd chdir=/tmp'
db01.example.com | success | rc=0 >>
/tmp

It is pretty generic, but also an option of last resort: if there is a more specific module for the task at hand, you should prefer it. For example, you could ensure that system users exist using the shell module, but the more specialized user module is much easier to use for that, and likely does a better job than an improvised shell script.
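
As a quick sketch of that more specific approach (the user name deploy is invented for this example):

$ ansible -i myinventory web -m user -a 'name=deploy state=present'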

The copy Module

With copy you can copy files verbatim from the local to the remote machine:

$ ansible -i myinventory database -m copy -a 'src=README.md dest=/etc/motd mode=0644'
db01.example.com | success >> {
    "changed": true,
    "dest": "/etc/motd",
    "gid": 0,
    "group": "root",
    "md5sum": "d41d8cd98f00b204e9800998ecf8427e",
    "mode": "0644",
    "owner": "root",
    "size": 0,
    "src": "/root/.ansible/tmp/ansible-tmp-1467144445.16-156283272674661/source",
    "state": "file",
    "uid": 0
}

The template Module

template mostly works like copy, but it interprets the source file as a Jinja2 template before transferring it to the remote host.

This is commonly used to create configuration files and to incorporate information from variables (more on that later).

Templates cannot be used directly from the command line, but rather in playbooks, so here is an example of a simple playbook.

# file motd.j2
This machine is managed by {{team}}.


# file template-example.yml
---
- hosts: all
  vars:
    team: Slackers
  tasks:
   - template: src=motd.j2 dest=/etc/motd mode=0644

More on playbooks later, but what you can see is that this defines a variable team, sets it to the value Slackers, and the template interpolates this variable.

When you run the playbook with

$ ansible-playbook -i myinventory --limit database template-example.yml

it creates a file /etc/motd on the database server with the contents

This machine is managed by Slackers.

The file Module

The file module manages attributes of files, such as permissions and ownership, but also allows you to create directories as well as soft and hard links.

$ ansible -i myinventory database -m file -a 'path=/etc/apt/sources.list.d state=directory mode=0755'
db01.example.com | success >> {
    "changed": false,
    "gid": 0,
    "group": "root",
    "mode": "0755",
    "owner": "root",
    "path": "/etc/apt/sources.list.d",
    "size": 4096,
    "state": "directory",
    "uid": 0
}
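
The same module creates soft links via state=link; a sketch with made-up paths:

$ ansible -i myinventory database -m file -a 'src=/etc/motd dest=/tmp/motd state=link'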

The apt Module

On Debian and derived distributions, such as Ubuntu, installing and removing packages is generally done with package managers from the apt family, such as apt-get, aptitude, and in newer versions, the apt binary directly.

The apt module manages this from within Ansible:

$ ansible -i myinventory database -m apt -a 'name=screen state=installed update_cache=yes'
db01.example.com | success >> {
    "changed": false
}

Here the screen package was already installed, so the module didn't change the state of the system.

Separate modules are available for managing apt-keys with which repositories are cryptographically verified, and for managing the repositories themselves.

The yum and zypper Modules

For RPM-based Linux distributions, the yum module (core) and zypper module (not in core, so must be installed separately) are available. They manage package installation via the package managers of the same name.
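
Usage mirrors the apt module; a sketch, assuming for the moment that the target host runs an RPM-based distribution:

$ ansible -i myinventory database -m yum -a 'name=screen state=present'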

The package Module

The package module tries to use whatever package manager it detects. It is thus more generic than the apt and yum modules, but supports far fewer features. For example, in the case of apt, it does not provide any control over whether to run apt-get update before doing anything else.
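
The generic form of the earlier package installation then looks like this, regardless of the underlying package manager:

$ ansible -i myinventory database -m package -a 'name=screen state=present'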

Application-Specific Modules

The modules presented so far are fairly close to the system, but there are also modules for achieving common, application-specific tasks. Examples include dealing with databases, network-related things such as proxies, version control systems, clustering solutions such as Kubernetes, and so on.
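
For example, a hedged sketch with the postgresql_user module (user name and password are invented here, and the module requires the psycopg2 Python library on the target host):

$ ansible -i myinventory database -m postgresql_user -a 'name=myapp password=secret'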

Playbooks

Playbooks can contain multiple calls to modules in a defined order and limit their execution to individual hosts or groups of hosts.

They are written in the YAML file format, a data serialization file format that is optimized for human readability.

Here is an example playbook that installs the newest version of the go-agent Debian package, the worker for Go Continuous Delivery:

---
- hosts: go-agent
  vars:
    go_server: hack.p6c.org
  tasks:
  - apt: package=apt-transport-https state=installed
  - apt_key: url=https://download.gocd.io/GOCD-GPG-KEY.asc state=present validate_certs=no
  - apt_repository: repo='deb https://download.gocd.io /' state=present
  - apt: update_cache=yes package={{item}} state=installed
    with_items:
     - go-agent
     - git
     - build-essential
  - lineinfile: dest=/etc/default/go-agent regexp=^GO_SERVER= line=GO_SERVER={{ go_server }}
  - service: name=go-agent enabled=yes state=started

The top-level element in this file is a one-element list. The single element starts with hosts: go-agent, which limits execution to hosts in the group go-agent. This is the relevant part of the inventory file that goes with it:

[go-agent]
go-worker01.p6c.org
go-worker02.p6c.org

Then it sets the variable go_server to a string, here the hostname of the machine on which a GoCD server runs.

Finally, the meat of the playbook: the list of tasks to execute.

Each task is a call to a module, some of which have already been discussed. A quick overview:

  • First, the Debian package apt-transport-https is installed, to make sure that the system can fetch metadata and files from Debian repositories through HTTPS
  • The next two tasks use the apt_repository and apt_key modules to configure the repository from which the actual go-agent package will be installed
  • Another call to apt installs the desired package. Also, some more packages are installed with a loop construct
  • The lineinfile module searches by regex for a line in a text file, and replaces the appropriate line with pre-defined content. Here we use that to configure the GoCD server that the agent connects to.
  • Finally, the service module starts the agent if it's not yet running (state=started), and ensures that it is automatically started on reboot (enabled=yes).

Playbooks are invoked with the ansible-playbook command.
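
For example, assuming the playbook above is saved as go-agent.yml (a file name chosen just for this illustration) and the inventory in the default hosts file:

$ ansible-playbook -i hosts go-agent.yml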

There can be more than one list of tasks in a playbook, which is a common use-case when they affect different groups of hosts:

---
- hosts: go-agent:go-server
  tasks:
  - apt: package=apt-transport-https state=installed
  - apt_key: url=https://download.gocd.io/GOCD-GPG-KEY.asc state=present validate_certs=no
  - apt_repository: repo='deb https://download.gocd.io /' state=present

- hosts: go-agent
  tasks:
  - apt: update_cache=yes package={{item}} state=installed
    with_items:
     - go-agent
     - git
     - build-essential
  - ...

- hosts: go-server
  tasks:
  - apt: update_cache=yes package=go-server state=installed
  - ...

Variables

Variables are useful both for controlling flow inside a playbook, and for filling out spots in templates to generate configuration files.

There are several ways to set variables. One is directly in playbooks, via vars: ..., as seen before. Another is to specify them at the command line:

ansible-playbook --extra-vars=variable=value theplaybook.yml

Another, very flexible way is to use the group_vars feature. For each group that a host is in, Ansible looks for a file group_vars/thegroup.yml and for files matching group_vars/thegroup/*.yml. A host can be in several groups at once, which gives you quite some flexibility.

For example, you can put each host into two groups, one for the role the host is playing (like webserver, database server, DNS server etc.), and one for the environment it is in (test, staging, prod). Here is a small example that uses this layout:

# environments
[prod]
www[01:02].example.com
db01.example.com

[test]
db01.test.example.com
www01.test.example.com


# functional roles
[web]
www[01:02].example.com
www01.test.example.com

[db]
db01.example.com
db01.test.example.com

To roll out only the test hosts, you can run

ansible-playbook --limit test theplaybook.yml

and put environment-specific variables in group_vars/test.yml and group_vars/prod.yml, and web server specific variables in group_vars/web.yml etc.

You can use nested data structures in your variables, and if you do, you can configure Ansible to merge those data structures for you. You enable this by creating a file called ansible.cfg with this content:

[defaults]
hash_behaviour=merge

That way, you can have a file group_vars/all.yml that sets the default values:

# file group_vars/all.yml
myapp:
    domain: example.com
    db:
        host: db.example.com
        username: myappuser
        instance: myapp

And then override individual elements of that nested data structure, for example in group_vars/test.yml:

# file group_vars/test.yml
myapp:
    domain: test.example.com
    db:
        host: db.test.example.com

The keys that the test group vars file didn't touch, for example myapp.db.username, are inherited from the file all.yml.
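
So for a host in the test group, the effective merged data structure looks like this (reconstructed here for illustration):

myapp:
    domain: test.example.com
    db:
        host: db.test.example.com
        username: myappuser
        instance: myapp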

Roles

Roles are a way to encapsulate parts of a playbook into a reusable component.

Let's consider a real world example that leads to a simple role definition.

For deploying software, you always want to deploy the exact version you want to build, so the relevant part of the playbook is

- apt: name=thepackage={{package_version}} state=present update_cache=yes force=yes

But this requires you to supply the package_version variable whenever you run the playbook, which will not be practical when you instead configure a new machine and need to install several software packages, each with their own playbook.

Hence, we generalize the code to deal with the case that the version number is absent:

- apt: name=thepackage={{package_version}} state=present update_cache=yes force=yes
  when: package_version is defined
- apt: name=thepackage state=present update_cache=yes
  when: package_version is undefined

If you run several such playbooks on the same host, you'll notice that it likely spends most of its time running apt-get update for each playbook. This is necessary the first time, because you might have just uploaded a new package on your local Debian mirror prior to the deployment, but subsequent runs are unnecessary. So you can store the information that a host has already updated its cache in a fact, which is a per-host kind of variable in Ansible.

- apt: update_cache=yes
  when: apt_cache_updated is undefined

- set_fact:
    apt_cache_updated: true

As you can see, the code base for sensibly installing a package has grown a bit, and it's time to factor it out into a role.

Roles are collections of YAML files, with pre-defined names. The commands

$ mkdir roles
$ cd roles
$ ansible-galaxy init custom_package_installation

create an empty skeleton for a role named custom_package_installation. The tasks that previously went into all the playbooks now go into the file tasks/main.yml below the role's main directory:

# file roles/custom_package_installation/tasks/main.yml
- apt: update_cache=yes
  when: apt_cache_updated is undefined
- set_fact:
    apt_cache_updated: true

- apt: name={{package}}={{package_version}} state=present update_cache=yes force=yes
  when: package_version is defined
- apt: name={{package}} state=present update_cache=yes
  when: package_version is undefined

To use the role, you first have to tell Ansible where to find it, by adding a roles_path setting to the [defaults] section of the ansible.cfg file from before:
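
# file ansible.cfg
[defaults]
hash_behaviour=merge
roles_path = roles

Then, in a playbook, include the role like this: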

---
- hosts: web
  pre_tasks:
     - # tasks that are executed before the role(s)
  roles:
    - { role: custom_package_installation, package: python-matheval }
  tasks:
    - # tasks that are executed after the role(s)

pre_tasks and tasks are optional; a playbook consisting of only roles being included is totally fine.

Summary

Ansible offers a pragmatic approach to configuration management, and is easy to get started with.

It offers modules for low-level tasks such as transferring files and executing shell commands, but also higher-level tasks like managing packages and system users, and even application-specific tasks such as managing PostgreSQL and MySQL users.

Playbooks can contain multiple calls to modules, and also use and set variables and consume roles.

Ansible has many more features, like handlers, which allow you to restart services only once after any changes, dynamic inventories for more flexible server landscapes, vault for encrypting variables, and a rich ecosystem of existing roles for managing common applications and middleware.
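
As a taste, here is a minimal handler sketch (the nginx service and template file are invented for this illustration); the handler runs at most once at the end of the play, and only if the template task actually changed the file:

---
- hosts: web
  tasks:
    - template: src=nginx.conf.j2 dest=/etc/nginx/nginx.conf
      notify: restart nginx
  handlers:
    - name: restart nginx
      service: name=nginx state=restarted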

For learning more about Ansible, I highly recommend the excellent book Ansible: Up and Running by Lorin Hochstein.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.

