Tue, 05 Jul 2016
Ansible: A Primer
Ansible is a very pragmatic and powerful configuration management system that is easy to get started with.
Connections and Inventory
Ansible is typically used to connect to one or more remote hosts via ssh and bring them into a desired state. The connection method is pluggable: other methods include local, which simply invokes the commands on the local host instead, and docker, which connects through the Docker daemon to configure a running container.
To tell Ansible where and how to connect, you write an inventory file, called hosts by default. In the inventory file, you can define hosts and groups of hosts, and also set variables that control how to connect to them.
# file myinventory
# example inventory file
[all:vars]
# variables set here apply to all hosts
ansible_user=root
[web]
# a group of webservers
www01.example.com
www02.example.com
[app]
# a group of 5 application servers,
# all following the same naming scheme:
app[01:05].example.com
[frontend:children]
# a group that combines the two previous groups
app
web
[database]
# here we override ansible_user for just one host
db01.example.com ansible_user=postgres
(In versions prior to Ansible 2.0, you have to use ansible_ssh_user instead of ansible_user.) See the introduction to inventory files for more information.
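The numeric pattern app[01:05].example.com in the inventory above expands to five hostnames. As a rough illustration of how such a range expansion works, here is a small Python sketch (a simplification, not Ansible's actual implementation):

```python
import re

def expand_hosts(pattern):
    """Expand an Ansible-style numeric range like app[01:05].example.com."""
    m = re.match(r"(.*)\[(\d+):(\d+)\](.*)", pattern)
    if not m:
        return [pattern]  # no range, just a plain hostname
    prefix, start, end, suffix = m.groups()
    width = len(start)  # preserve leading zeros
    return [f"{prefix}{i:0{width}d}{suffix}"
            for i in range(int(start), int(end) + 1)]

print(expand_hosts("app[01:05].example.com"))
# → ['app01.example.com', 'app02.example.com', 'app03.example.com',
#    'app04.example.com', 'app05.example.com']
```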
To test the connection, you can use the ping module on the command line:
$ ansible -i myinventory web -m ping
www01.example.com | success >> {
"changed": false,
"ping": "pong"
}
www02.example.com | success >> {
"changed": false,
"ping": "pong"
}
Let's break the command line down into its components: -i myinventory tells Ansible to use the myinventory file as inventory. web tells Ansible which hosts to work on. It can be a group, as in this example, or a single host, or several such things separated by a colon. For example, www01.example.com:database would select one of the web servers and all of the database servers. Finally, -m ping tells Ansible which module to execute. ping is probably the simplest module: it simply sends the response "pong" and reports that the remote host hasn't changed.
These commands run in parallel on the different hosts, so the order in which these responses are printed can vary.
If there is a problem with connecting to a host, add the option -vvv to get more output.
Ansible implicitly gives you the group all, which -- you guessed it -- contains all the hosts configured in the inventory file.
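To picture how such colon-separated host patterns select targets, here is a small Python sketch of the selection logic (a simplification of what Ansible's real pattern resolver does):

```python
def resolve_pattern(pattern, groups):
    """Resolve a colon-separated host pattern against an inventory mapping."""
    selected = []
    for part in pattern.split(":"):
        # a part is either a group name or a literal hostname
        hosts = groups.get(part, [part])
        selected.extend(h for h in hosts if h not in selected)
    return selected

inventory = {
    "web": ["www01.example.com", "www02.example.com"],
    "database": ["db01.example.com"],
}
print(resolve_pattern("www01.example.com:database", inventory))
# → ['www01.example.com', 'db01.example.com']
```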
Modules
Whenever you want to do something on a host through Ansible, you invoke a module to do that. Modules usually take arguments that specify what exactly should happen. On the command line, you can add those arguments with ansible -m module -a 'arguments', for example:
$ ansible -i myinventory database -m shell -a 'echo "hi there"'
db01.example.com | success | rc=0 >>
hi there
Ansible comes with a wealth of built-in modules and an ecosystem of third-party modules as well. Here I want to present just a few, commonly-used modules.
The shell Module
The shell module executes a shell command on the host and accepts some options such as chdir to change into another working directory first:
$ ansible -i myinventory database -m shell -a 'pwd chdir=/tmp'
db01.example.com | success | rc=0 >>
/tmp
It is pretty generic, but also an option of last resort. If there is a more specific module for the task at hand, you should prefer it. For example, you could ensure that system users exist using the shell module, but the more specialized user module is much easier to use for that, and likely does a better job than an improvised shell script.
The copy Module
With copy you can copy files verbatim from the local to the remote machine:
$ ansible -i myinventory database -m copy -a 'src=README.md dest=/etc/motd mode=0644'
db01.example.com | success >> {
"changed": true,
"dest": "/etc/motd",
"gid": 0,
"group": "root",
"md5sum": "d41d8cd98f00b204e9800998ecf8427e",
"mode": "0644",
"owner": "root",
"size": 0,
"src": "/root/.ansible/tmp/ansible-tmp-1467144445.16-156283272674661/source",
"state": "file",
"uid": 0
}
The template Module
template mostly works like copy, but it interprets the source file as a Jinja2 template before transferring it to the remote host.
This is commonly used to create configuration files and to incorporate information from variables (more on that later).
Templates cannot be used directly from the command line, but rather in playbooks, so here is an example of a simple playbook.
# file motd.j2
This machine is managed by {{team}}.
# file template-example.yml
---
- hosts: all
  vars:
    team: Slackers
  tasks:
    - template: src=motd.j2 dest=/etc/motd mode=0644
More on playbooks later, but what you can see is that this defines a variable team, sets it to the value Slackers, and the template interpolates this variable.
When you run the playbook with
$ ansible-playbook -i myinventory --limit database template-example.yml
it creates a file /etc/motd on the database server with the contents
This machine is managed by Slackers.
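To see what the template module does with motd.j2, here is a minimal Python re-implementation of plain {{ variable }} interpolation (real Jinja2 supports much more, such as loops, conditionals, and filters):

```python
import re

def render(template, variables):
    """Minimal stand-in for Jinja2's {{ var }} interpolation (no logic, no filters)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}",
                  lambda m: str(variables[m.group(1)]),
                  template)

print(render("This machine is managed by {{team}}.", {"team": "Slackers"}))
# → This machine is managed by Slackers.
```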
The file Module
The file module manages attributes of files, such as permissions and ownership, and also allows you to create directories as well as soft and hard links.
$ ansible -i myinventory database -m file -a 'path=/etc/apt/sources.list.d state=directory mode=0755'
db01.example.com | success >> {
"changed": false,
"gid": 0,
"group": "root",
"mode": "0755",
"owner": "root",
"path": "/etc/apt/sources.list.d",
"size": 4096,
"state": "directory",
"uid": 0
}
The apt Module
On Debian and derived distributions, such as Ubuntu, installing and removing packages is generally done with package managers from the apt family, such as apt-get, aptitude, and in newer versions, the apt binary directly. The apt module manages this from within Ansible:
$ ansible -i myinventory database -m apt -a 'name=screen state=installed update_cache=yes'
db01.example.com | success >> {
"changed": false
}
Here the screen package was already installed, so the module didn't change the state of the system.
Separate modules are available for managing apt-keys with which repositories are cryptographically verified, and for managing the repositories themselves.
The yum and zypper Modules
For RPM-based Linux distributions, the yum module (core) and zypper module (not in core, so must be installed separately) are available. They manage package installation via the package managers of the same name.
The package Module
The package module tries to use whatever package manager it detects. It is thus more generic than the apt and yum modules, but supports far fewer features. For example, in the case of apt, it does not provide any control over whether to run apt-get update before doing anything else.
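One way to picture what such a generic package layer must do first is to probe for an available package manager. This hypothetical Python sketch illustrates the idea only; Ansible's real detection is based on facts gathered from the managed host:

```python
import shutil

def detect_package_manager():
    """Return the first known package manager found on PATH, or None."""
    for manager in ("apt-get", "dnf", "yum", "zypper"):
        if shutil.which(manager):
            return manager
    return None

print(detect_package_manager())
```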
Application-Specific Modules
The modules presented so far are fairly close to the system, but there are also modules for achieving common, application-specific tasks. Examples include dealing with databases, network-related things such as proxies, version control systems, clustering solutions such as Kubernetes, and so on.
Playbooks
Playbooks can contain multiple calls to modules in a defined order and limit their execution to individual hosts or groups of hosts.
They are written in the YAML file format, a data serialization file format that is optimized for human readability.
Here is an example playbook that installs the newest version of the go-agent
Debian package, the worker for Go Continuous Delivery:
---
- hosts: go-agent
  vars:
    go_server: hack.p6c.org
  tasks:
    - apt: package=apt-transport-https state=installed
    - apt_key: url=https://download.gocd.io/GOCD-GPG-KEY.asc state=present validate_certs=no
    - apt_repository: repo='deb https://download.gocd.io /' state=present
    - apt: update_cache=yes package={{item}} state=installed
      with_items:
        - go-agent
        - git
        - build-essential
    - lineinfile: dest=/etc/default/go-agent regexp=^GO_SERVER= line=GO_SERVER={{ go_server }}
    - service: name=go-agent enabled=yes state=started
The top level element in this file is a one-element list. The single element starts with hosts: go-agent, which limits execution to hosts in the group go-agent. This is the relevant part of the inventory file that goes with it:
[go-agent]
go-worker01.p6c.org
go-worker02.p6c.org
Then it sets the variable go_server to a string, here the hostname where a GoCD server runs.
Finally, the meat of the playbook: the list of tasks to execute.
Each task is a call to a module, some of which have already been discussed. A quick overview:
- First, the Debian package apt-transport-https is installed, to make sure that the system can fetch metadata and files from Debian repositories through HTTPS.
- The next two tasks use the apt_key and apt_repository modules to configure the repository from which the actual go-agent package shall be installed.
- Another call to apt installs the desired package. Also, some more packages are installed with a loop construct.
- The lineinfile module searches by regex for a line in a text file, and replaces the appropriate line with pre-defined content. Here we use that to configure the GoCD server that the agent connects to.
- Finally, the service module starts the agent if it's not yet running (state=started), and ensures that it is automatically started on reboot (enabled=yes).
Playbooks are invoked with the ansible-playbook command.
There can be more than one list of tasks in a playbook, which is a common use-case when they affect different groups of hosts:
---
- hosts: go-agent:go-server
  tasks:
    - apt: package=apt-transport-https state=installed
    - apt_key: url=https://download.gocd.io/GOCD-GPG-KEY.asc state=present validate_certs=no
    - apt_repository: repo='deb https://download.gocd.io /' state=present
- hosts: go-agent
  tasks:
    - apt: update_cache=yes package={{item}} state=installed
      with_items:
        - go-agent
        - git
        - build-essential
    - ...
- hosts: go-server
  tasks:
    - apt: update_cache=yes package=go-server state=installed
    - ...
Variables
Variables are useful both for controlling flow inside a playbook, and for filling out spots in templates to generate configuration files.
There are several ways to set variables. One is directly in playbooks, via vars: ..., as seen before. Another is to specify them at the command line:
ansible-playbook --extra-vars=variable=value theplaybook.yml
Another, very flexible way is to use the group_vars feature. For each group that a host is in, Ansible looks for a file group_vars/thegroup.yml and for files matching group_vars/thegroup/*.yml. A host can be in several groups at once, which gives you quite some flexibility.
For example, you can put each host into two groups, one for the role the host is playing (like webserver, database server, DNS server etc.), and one for the environment it is in (test, staging, prod). Here is a small example that uses this layout:
# environments
[prod]
www[01:02].example.com
db01.example.com
[test]
db01.test.example.com
www01.test.example.com
# functional roles
[web]
www[01:02].example.com
www01.test.example.com
[db]
db01.example.com
db01.test.example.com
To roll out only the test hosts, you can run
ansible-playbook --limit test theplaybook.yml
and put environment-specific variables in group_vars/test.yml and group_vars/prod.yml, and web server specific variables in group_vars/web.yml etc.
You can use nested data structures in your variables, and if you do, you can configure Ansible to merge those data structures for you. You can configure it by creating a file called ansible.cfg with this content:
[defaults]
hash_behaviour=merge
That way, you can have a file group_vars/all.yml
that sets the default
values:
# file group_vars/all.yml
myapp:
    domain: example.com
    db:
        host: db.example.com
        username: myappuser
        instance: myapp
And then override individual elements of that nested data structure, for
example in group_vars/test.yml
:
# file group_vars/test.yml
myapp:
    domain: test.example.com
    db:
        host: db.test.example.com
The keys that the test group vars file didn't touch, for example myapp.db.username, are inherited from the file all.yml.
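The merging behaviour can be sketched in a few lines of Python; this is an illustration of what hash_behaviour=merge does, not Ansible's actual code:

```python
def deep_merge(base, override):
    """Recursively merge nested dicts, the way hash_behaviour=merge combines vars."""
    result = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result

all_vars = {"myapp": {"domain": "example.com",
                      "db": {"host": "db.example.com", "username": "myappuser"}}}
test_vars = {"myapp": {"domain": "test.example.com",
                       "db": {"host": "db.test.example.com"}}}
merged = deep_merge(all_vars, test_vars)
print(merged["myapp"]["db"])
# host is overridden by test.yml; username survives from all.yml
```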
Roles
Roles are a way to encapsulate parts of a playbook into a reusable component.
Let's consider a real world example that leads to a simple role definition.
For deploying software, you always want to deploy the exact version you want to build, so the relevant part of the playbook is
- apt: name=thepackage={{package_version}} state=present update_cache=yes force=yes
But this requires you to supply the package_version variable whenever you run the playbook, which will not be practical when you instead configure a new machine and need to install several software packages, each with their own playbook.
Hence, we generalize the code to deal with the case that the version number is absent:
- apt: name=thepackage={{package_version}} state=present update_cache=yes force=yes
  when: package_version is defined
- apt: name=thepackage state=present update_cache=yes
  when: package_version is undefined
If you run several such playbooks on the same host, you'll notice that it likely spends most of its time running apt-get update for each playbook. This is necessary the first time, because you might have just uploaded a new package to your local Debian mirror prior to the deployment, but subsequent runs are unnecessary. So you can store the information that a host has already updated its cache in a fact, which is a per-host kind of variable in Ansible.
- apt: update_cache=yes
  when: apt_cache_updated is undefined
- set_fact:
    apt_cache_updated: true
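The interplay of when: conditions and set_fact can be mimicked with a toy task runner in Python. This is a deliberate simplification: real Ansible keeps facts per host and evaluates when: as a Jinja2 expression, while here a plain function stands in for the condition:

```python
def run_playbook(tasks, facts):
    """Toy executor: run each task unless its 'when' condition evaluates false."""
    executed = []
    for task in tasks:
        condition = task.get("when")
        if condition is not None and not condition(facts):
            continue  # skipped, like a task whose when: clause is false
        executed.append(task["name"])
        facts.update(task.get("set_fact", {}))  # set_fact persists for later tasks
    return executed

tasks = [
    {"name": "apt update_cache=yes",
     "when": lambda facts: "apt_cache_updated" not in facts},
    {"name": "set_fact", "set_fact": {"apt_cache_updated": True}},
]
facts = {}
print(run_playbook(tasks, facts))  # first run: the cache update executes
print(run_playbook(tasks, facts))  # second run: it is skipped
```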
As you can see, the code base for sensibly installing a package has grown a bit, and it's time to factor it out into a role.
Roles are collections of YAML files, with pre-defined names. The commands
$ mkdir roles
$ cd roles
$ ansible-galaxy init custom_package_installation
create an empty skeleton for a role named custom_package_installation.
The tasks that previously went into all the playbooks now go into the file tasks/main.yml below the role's main directory:
# file roles/custom_package_installation/tasks/main.yml
- apt: update_cache=yes
  when: apt_cache_updated is undefined
- set_fact:
    apt_cache_updated: true
- apt: name={{package}}={{package_version}} state=present update_cache=yes force=yes
  when: package_version is defined
- apt: name={{package}} state=present update_cache=yes
  when: package_version is undefined
To use the role, first add the line roles_path = roles in the [defaults] section of the file ansible.cfg, and then include it in a playbook like this:
---
- hosts: web
  pre_tasks:
    - # tasks that are executed before the role(s)
  roles:
    - { role: custom_package_installation, package: python-matheval }
  tasks:
    - # tasks that are executed after the role(s)
pre_tasks and tasks are optional; a playbook consisting of only roles being included is totally fine.
Summary
Ansible offers a pragmatic approach to configuration management, and is easy to get started with.
It offers modules for low-level tasks such as transferring files and executing shell commands, but also higher-level tasks like managing packages and system users, and even application-specific tasks such as managing PostgreSQL and MySQL users.
Playbooks can contain multiple calls to modules, and also use and set variables and consume roles.
Ansible has many more features, like handlers, which allow you to restart services only once after any changes, dynamic inventories for more flexible server landscapes, vault for encrypting variables, and a rich ecosystem of existing roles for managing common applications and middleware.
For learning more about Ansible, I highly recommend the excellent book Ansible: Up and Running by Lorin Hochstein.
I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.