
Sun, 04 Dec 2016

Perl 6 By Example: Formatting a Sudoku Puzzle



This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


As a gentle introduction to Perl 6, let's consider a small task that I recently encountered while pursuing one of my hobbies.

Sudoku is a number-placement puzzle played on a grid of 9x9 cells, subdivided into blocks of 3x3. Some of the cells are filled out with numbers from 1 to 9, some are empty. The objective of the game is to fill out the empty cells so that in each row, column and 3x3 block, each digit from 1 to 9 occurs exactly once.

An efficient storage format for a Sudoku is simply a string of 81 characters, with 0 for empty cells and the digits 1 to 9 for pre-filled cells. The task I want to solve is to bring this into a friendlier format.

The input could be:

000000075000080094000500600010000200000900057006003040001000023080000006063240000

On to our first Perl 6 program:

# file sudoku.p6
use v6;
my $sudoku = '000000075000080094000500600010000200000900057006003040001000023080000006063240000';
for 0..8 -> $line-number {
    say substr $sudoku, $line-number * 9, 9;
}

You can run it like this:

$ perl6 sudoku.p6
000000075
000080094
000500600
010000200
000900057
006003040
001000023
080000006
063240000

There's not much magic in there, but let's go through the code one line at a time.

The first line, starting with a #, is a comment that extends to the end of the line.

use v6;

This line is not strictly necessary, but good practice anyway. It declares the Perl version you are using, here v6, meaning any version of the Perl 6 language. We could be more specific and say use v6.c; to require exactly the version discussed here. If you ever accidentally run a Perl 6 program through Perl 5, you'll be glad you included this line, because it'll tell you:

$ perl sudoku.p6
Perl v6.0.0 required--this is only v5.22.1, stopped at sudoku.p6 line 1.
BEGIN failed--compilation aborted at sudoku.p6 line 1.

instead of the much more cryptic

syntax error at sudoku.p6 line 4, near "for 0"
Execution of sudoku.p6 aborted due to compilation errors.

The first interesting line is

my $sudoku = '00000007500...';

my declares a lexical variable. It is visible from the point of the declaration to the end of the current scope, which means either to the end of the current block delimited by curly braces, or to the end of the file if it's outside any block, as it is in this example.
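
To see that scoping in action, here is a tiny throwaway example (the variable name $x is made up for this sketch):

{
    my $x = 42;
    say $x;         # works: $x is visible up to the closing curly brace
}
# say $x;           # would be a compile-time error: $x is not declared in this scope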

Variables start with a sigil, here a '$'. Sigils are what gave Perl the reputation of being line noise, but there is signal in the noise. The $ looks like an S, which stands for scalar. If you know some math, you know that a scalar is just a single value, as opposed to a vector or even a matrix.

The variable doesn't start its life empty, because there's an initialization right next to it. The value it starts with is a string literal, as indicated by the quotes.

Note that there is no need to declare the type of the variable beyond the very vague "it's a scalar" implied by the sigil. If we wanted, we could add a type constraint:

my Str $sudoku = '00000007500...';

But when quickly prototyping, I tend to forego type constraints, because I often don't know yet how exactly the code will work out.

The actual logic happens in the next lines, by iterating over the line numbers 0 to 8:

for 0..8 -> $line-number {
    ...
}

The for loop has the general structure for ITERABLE BLOCK. Here the iterable is a range, and the block is a pointy block. The block starts with ->, which introduces a signature. The signature tells the compiler what arguments the block expects, here a single scalar called $line-number.

Perl 6 allows you to use a dash - or a single quote ' inside an identifier, as long as there is a letter on both sides (to disambiguate it from subtraction).
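
For example (variable names invented for this illustration):

my $line-number = 7;
my $don't-panic = True;
say $line-number - 1;    # 6 -- with spaces around it, the minus is clearly a subtraction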

Again, type constraints are optional. If you choose to include them, it would be for 0..8 -> Int $line-number { ... }.

$line-number is again a lexical variable, and visible inside the block that comes after the signature. Blocks are delimited by curly braces.

say substr $sudoku, $line-number * 9, 9;

Both say and substr are functions provided by the Perl 6 standard library. substr($string, $start, $chars) extracts a substring of (up to) $chars characters from $string, starting from index $start. Oh, and indexes are zero-based in Perl 6.

say then prints this substring, followed by a line break.
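
As a standalone illustration (with a made-up string):

my $word = 'Sudoku';
say substr $word, 0, 4;     # Sudo -- index 0 is the first character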

As you can see from the example, function invocations don't need parentheses, though you can add them if you want:

say substr($sudoku, $line-number * 9, 9);

or even

say(substr($sudoku, $line-number * 9, 9));

Making the Sudoku playable

As the output of our script stands now, you can't play the resulting Sudoku even if you printed it, because all those pesky zeros get in the way of actually entering the numbers you carefully deduce while solving the puzzle.

So, let's substitute each 0 with a blank:

# file sudoku.p6
use v6;

my $sudoku = '000000075000080094000500600010000200000900057006003040001000023080000006063240000';
$sudoku = $sudoku.trans('0' => ' ');

for 0..8 -> $line-number {
    say substr $sudoku, $line-number * 9, 9;
}

trans is a method of the Str class. Its argument is a Pair. The boring way to create a Pair would be Pair.new('0', ' '), but since it's so commonly used, there is a shortcut in the form of the fat arrow, =>. The method trans replaces each occurrence of the pair's key with the pair's value, and returns the resulting string.

Speaking of shortcuts, you can also shorten $sudoku = $sudoku.trans(...) to $sudoku.=trans(...). This is a general pattern that turns methods that return a result into mutators.
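
Here is a small, throwaway illustration of both shortcuts (the strings are arbitrary):

my $pair = '0' => ' ';          # same as Pair.new('0', ' ')
say $pair.key;                  # 0

my $text = 'a0b0c';
$text .= trans('0' => '-');     # same as $text = $text.trans('0' => '-')
say $text;                      # a-b-c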

With the new string substitution, the result is playable, but ugly:

$ perl6 sudoku.p6
       75
    8  94
   5  6  
 1    2  
   9   57
  6  3 4 
  1    23
 8      6
 6324    

A bit of ASCII art makes it bearable:

+---+---+---+
|   | 1 |   |
|   |   |79 |
| 9 |   | 4 |
+---+---+---+
|   |  4|  5|
|   |   | 2 |
|3  | 29|18 |
+---+---+---+
|  4| 87|2  |
|  7|  2|95 |
| 5 |  3|  8|
+---+---+---+

To get the vertical dividing lines, we need to sub-divide the lines into smaller chunks. And since we already have one occurrence of dividing a string into smaller strings of a fixed size, it's time to encapsulate it into a function:

sub chunks(Str $s, Int $chars) {
    gather for 0 .. $s.chars / $chars - 1 -> $idx {
        take substr($s, $idx * $chars, $chars);
    }
}

for chunks($sudoku, 9) -> $line {
    say chunks($line, 3).join('|');
}

The output is:

$ perl6 sudoku.p6
   |   | 75
   | 8 | 94
   |5  |6  
 1 |   |2  
   |9  | 57
  6|  3| 4 
  1|   | 23
 8 |   |  6
 63|24 |   

But how did it work? Well, sub (SIGNATURE) BLOCK declares a subroutine, or sub for short. Here I declare it to take two arguments, and since I tend to confuse the order of arguments to functions I call, I've added type constraints that make it very likely that Perl 6 catches the error for me.

gather and take work together to create a list. gather is the entry point, and each execution of take adds one element to the list. So

gather {
    take 1;
    take 2;
}

would return the list 1, 2. Here gather acts as a statement prefix, which means it collects all takes from within the for loop.

A subroutine returns the value from the last expression, which here is the gather for ... thing discussed above.
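
Putting those two observations together, a small, invented example might look like this:

sub doubles(Int $limit) {
    # the gather expression is the last (and only) expression,
    # so it becomes the return value of the sub
    gather for 1..$limit -> $i {
        take 2 * $i;
    }
}

say doubles(3).join(', ');      # 2, 4, 6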

Coming back to the program, the for-loop now looks like this:

for chunks($sudoku, 9) -> $line {
    say chunks($line, 3).join('|');
}

So first the program chops up the full Sudoku string into lines of nine characters, and then, for each line, again into a list of three strings of three characters each. The join method turns it back into a string, but with pipe symbols inserted between the chunks.
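
In isolation, join works like this (with a throwaway list):

say <a b c>.join('|');          # a|b|c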

There are still vertical bars missing at the start and end of the line, which can easily be hard-coded by changing the last line:

    say '|', chunks($line, 3).join('|'), '|';

Now the output is

|   |   | 75|
|   | 8 | 94|
|   |5  |6  |
| 1 |   |2  |
|   |9  | 57|
|  6|  3| 4 |
|  1|   | 23|
| 8 |   |  6|
| 63|24 |   |

Only the horizontal lines are missing, which aren't too hard to add:

my $separator = '+---+---+---+';
my $index = 0;
for chunks($sudoku, 9) -> $line {
    if $index++ %% 3 {
        say $separator;
    }
    say '|', chunks($line, 3).join('|'), '|';
}
say $separator;

Et voila:

+---+---+---+
|   |   | 75|
|   | 8 | 94|
|   |5  |6  |
+---+---+---+
| 1 |   |2  |
|   |9  | 57|
|  6|  3| 4 |
+---+---+---+
|  1|   | 23|
| 8 |   |  6|
| 63|24 |   |
+---+---+---+

There are two new aspects here: the first is the if conditional, which is structurally very similar to the for loop. The second is the divisibility operator, %%. From other programming languages you probably know % for modulo, but since $number % $divisor == 0 is such a common pattern, $number %% $divisor is Perl 6's shortcut for it.
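
A quick illustration of the difference:

say 9 % 3;      # 0
say 9 %% 3;     # True
say 10 %% 3;    # False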

Shortcuts, Constants, and more Shortcuts

Perl 6 is modeled after human languages, which have some kind of compression scheme built in, where commonly used words tend to be short, and common constructs have shortcuts.

As such, there are lots of ways to write the code more succinctly. The first is basically cheating, because the sub chunks can be replaced by a built-in method in the Str class, comb:

# file sudoku.p6
use v6;

my $sudoku = '000000075000080094000500600010000200000900057006003040001000023080000006063240000';
$sudoku = $sudoku.trans: '0' => ' ';

my $separator = '+---+---+---+';
my $index = 0;
for $sudoku.comb(9) -> $line {
    if $index++ %% 3 {
        say $separator;
    }
    say '|', $line.comb(3).join('|'), '|';
}
say $separator;

The if conditional can be applied as a statement postfix:

say $separator if $index++ %% 3;

Except for the initialization, the variable $index is used only once, so there's no need to give it a name. Yes, Perl 6 has anonymous variables:

my $separator = '+---+---+---+';
for $sudoku.comb(9) -> $line {
    say $separator if $++ %% 3;
    say '|', $line.comb(3).join('|'), '|';
}
say $separator;

Since $separator is a constant, we can declare it as one:

constant $separator = '+---+---+---+';

If you want to reduce the line noise factor, you can also forego the sigil, so constant separator = '...'.

Finally, there is another syntax for method calls with arguments: instead of $obj.method(args) you can say $obj.method: args, which brings us to the idiomatic form of the small Sudoku formatter:

# file sudoku.p6
use v6;

my $sudoku = '000000075000080094000500600010000200000900057006003040001000023080000006063240000';
$sudoku = $sudoku.trans: '0' => ' ';

constant separator = '+---+---+---+';
for $sudoku.comb(9) -> $line {
    say separator if $++ %% 3;
    say '|', $line.comb(3).join('|'), '|';
}
say separator;

IO and other Tragedies

A practical script doesn't contain its input as a hard-coded string literal, but reads it from the command line, standard input or a file.

If you want to read the Sudoku from the command line, you can declare a subroutine called MAIN, which gets all command line arguments passed in:

# file sudoku.p6
use v6;

constant separator = '+---+---+---+';

sub MAIN($sudoku) {
    my $substituted = $sudoku.trans: '0' => ' ';

    for $substituted.comb(9) -> $line {
        say separator if $++ %% 3;
        say '|', $line.comb(3).join('|'), '|';
    }
    say separator;
}

This is how it's called:

$ perl6-m sudoku-format-08.p6 000000075000080094000500600010000200000900057006003040001000023080000006063240000
+---+---+---+
|   |   | 75|
|   | 8 | 94|
|   |5  |6  |
+---+---+---+
| 1 |   |2  |
|   |9  | 57|
|  6|  3| 4 |
+---+---+---+
|  1|   | 23|
| 8 |   |  6|
| 63|24 |   |
+---+---+---+

And you even get a usage message for free if you use it wrongly, for example by omitting the argument:

$ perl6-m sudoku.p6 
Usage:
  sudoku.p6 <sudoku> 

You might have noticed that the last example uses a separate variable for the substituted Sudoku string. This is because function parameters (aka variables declared in a signature) are read-only by default. Instead of creating a new variable, I could have also written sub MAIN($sudoku is copy) { ... }.
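
Spelled out, that variant might look like the following sketch (reusing the separator constant from above):

sub MAIN($sudoku is copy) {
    # "is copy" makes the parameter writable inside the sub
    $sudoku .= trans('0' => ' ');

    for $sudoku.comb(9) -> $line {
        say separator if $++ %% 3;
        say '|', $line.comb(3).join('|'), '|';
    }
    say separator;
}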

Classic UNIX programs such as cat and wc follow the convention of reading their input from file names given on the command line, or from the standard input if no file names are given.

If you want your program to follow this convention, lines() provides a stream of lines from either of these sources:

# file sudoku.p6
use v6;

constant separator = '+---+---+---+';

for lines() -> $sudoku {
    my $substituted = $sudoku.trans: '0' => ' ';

    for $substituted.comb(9) -> $line {
        say separator if $++ %% 3;
        say '|', $line.comb(3).join('|'), '|';
    }
    say separator;
}

Get Creative!

You won't learn a programming language just from reading a blog; you have to actually use it and tinker with it. If you want to expand on the examples discussed earlier, I'd encourage you to try to produce Sudokus in different output formats.

SVG offers a good ratio of result to effort. This is the rough skeleton of an SVG file for a Sudoku:

<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="304" height="304" version="1.1"
xmlns="http://www.w3.org/2000/svg">
    <line x1="0" x2="300" y1="33.3333" y2="33.3333" style="stroke:grey" />
    <line x1="0" x2="300" y1="66.6667" y2="66.6667" style="stroke:grey" />
    <line x1="0" x2="303" y1="100" y2="100" style="stroke:black;stroke-width:2" />
    <line x1="0" x2="300" y1="133.333" y2="133.333" style="stroke:grey" />
    <!-- more horizontal lines here -->

    <line y1="0" y2="300" x1="33.3333" x2="33.3333" style="stroke:grey" />
    <!-- more vertical lines here -->


    <text x="43.7333" y="124.5"> 1 </text>
    <text x="43.7333" y="257.833"> 8 </text>
    <!-- more cells go here -->
    <rect width="304" height="304" style="fill:none;stroke-width:1;stroke:black;stroke-width:6"/>
</svg>

If you have a Firefox or Chrome browser, you can use it to open the SVG file.
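
If you want a starting point, here is a rough, untested sketch that prints one <text> element per pre-filled cell; the cell size matches the skeleton above, but the exact x and y offsets are guesses you will want to tweak:

my $sudoku = '000000075000080094000500600010000200000900057006003040001000023080000006063240000';
constant cell = 100 / 3;        # 300 pixel wide grid, 9 cells per row

for $sudoku.comb.kv -> $index, $digit {
    next if $digit eq '0';
    my $row = $index div 9;
    my $col = $index % 9;
    say sprintf('<text x="%.4f" y="%.4f"> %s </text>',
        $col * cell + 10, $row * cell + 25, $digit);
}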

If you are adventurous, you could also write a Perl 6 program that renders the Sudoku as a PostScript (PS) or Encapsulated PostScript (EPS) document. Those are also text-based formats.

Subscribe to the Perl 6 book mailing list


[/perl-6] Permanent link

Sun, 27 Nov 2016

Perl 6 By Example: Running Rakudo



This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


Before we start exploring Perl 6, you should have an environment where you can run Perl 6 code. So you need to install Rakudo Perl 6, currently the only actively developed Perl 6 compiler. Or even better, install Rakudo Star, which is a distribution that includes Rakudo itself, a few useful modules, and an installer that can help you install more modules.

Below, a few options for getting Rakudo Star installed are discussed. Choose whatever works for you.

The examples here use Rakudo Star 2016.10.

Installers

You can download installers from http://rakudo.org/downloads/star/ for Mac OS (.dmg) and Windows (.msi). After download, you can launch them, and they walk you through the installation process.

Note that Rakudo is not relocatable, which means you have to install to a fixed location that was decided by the creator of the installer. Moving the installation to a different directory will not work.

On Windows, the installer offers to add C:\rakudo\bin and C:\rakudo\share\perl6\site\bin to your PATH environment variable. You should choose that option, as it allows you to call rakudo (and programs that the module installer installs on your behalf) without specifying full paths.

Docker

On platforms that support Docker, you can pull an existing Docker container from the docker hub:

$ docker pull mj41/perl6-star

Then you can get an interactive Rakudo shell with this command:

$ docker run -it mj41/perl6-star perl6

But that won't work for executing scripts, because the container has its own, separate file system. To make scripts available inside the container, you need to tell Docker to make the current directory available to the container:

$ docker run -v $PWD:/perl6 -w /perl6 -it mj41/perl6-star perl6

The option -v $PWD:/perl6 instructs Docker to mount the current working directory ($PWD) into the container, where it'll be available as /perl6. To make relative paths work, -w /perl6 instructs Docker to set the working directory of the rakudo process to /perl6.

Since this command line starts to get unwieldy, I created an alias (this is Bash syntax; other shells might have slightly different alias mechanisms):

alias p6d='docker run -v $PWD:/perl6 -w /perl6 -it mj41/perl6-star perl6'

I put this line into my ~/.bashrc file, so new bash instances have a p6d command, short for "Perl 6 docker".

As a short test to see if it works, you can run

$ p6d -e 'say "hi"'
hi

If you go the Docker route, just use the p6d alias instead of perl6 to run scripts.

Building from Source

To build Rakudo Star from source, you need make, gcc or clang and perl 5 installed. This example installs into $HOME/opt/rakudo-star:

$ wget http://rakudo.org/downloads/star/rakudo-star-2016.10.tar.gz
$ tar xzf rakudo-star-2016.10.tar.gz
$ cd rakudo-star-2016.10/
$ perl Configure.pl --prefix=$HOME/opt/rakudo-star --gen-moar
$ make install

You should have about 2GB of RAM available for the last step; building a compiler is a resource intensive task.

You need to add paths to two directories to your PATH environment variable, one for Rakudo itself, one for programs installed by the module installer:

PATH=$PATH:$HOME/opt/rakudo-star/bin/:$HOME/opt/rakudo-star/share/perl6/site/bin

If you are a Bash user, you can put that line into your ~/.bashrc file to make it available in new Bash processes.

Testing your Rakudo Star Installation

You should now be able to run Perl 6 programs from the command line, and ask Rakudo for its version:

$ perl6 --version
This is Rakudo version 2016.10-2-gb744de3 built on MoarVM version 2016.09-39-g688796b
implementing Perl 6.c.

$ perl6 -e "say <hi>"
hi

If, against all odds, all of these approaches have failed to produce a usable Rakudo installation for you, you should describe your problem to the friendly Perl 6 community, which can usually provide some help. http://perl6.org/community/ describes ways to interact with the community.

Next week we'll take a look at the first proper Perl 6 example, so stay tuned for updates!

Subscribe to the Perl 6 book mailing list


[/perl-6] Permanent link

Sun, 20 Nov 2016

What is Perl 6?



This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


Perl 6 is a programming language. It is designed to be easily learned, read and written by humans, and it is inspired by natural language. It allows the beginner to write in "baby Perl", while giving the experienced programmer freedom of expression, from concise to poetic.

Perl 6 is gradually typed. It mostly follows the paradigm of dynamically typed languages in that it accepts programs whose type safety it can't guarantee during compilation. Unlike many dynamic languages, it accepts and enforces type constraints. Where possible, the compiler uses type annotations to make decisions at compile time that would otherwise only be possible at run time.
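
As a small, made-up illustration of the difference:

sub double($x)     { $x * 2 }   # untyped: accepts any argument
sub triple(Int $x) { $x * 3 }   # typed: the constraint is enforced

say double('21');               # 42 -- the string is coerced to a number at run time
say triple(7);                  # 21
# say triple('seven');          # rejected: expected Int, but got a Str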

Many programming paradigms have influenced Perl 6. You can write imperative, object-oriented and functional programs in Perl 6. Declarative programming is supported through the regex and grammar engine.

Most lookups in Perl 6 are lexical, and the language avoids global state. This makes parallel and concurrent execution of programs easier, as does Perl 6's focus on high-level concurrency primitives. Instead of threads and locks, you tend to think about promises and message queues when you don't want to be limited to one CPU core.

Perl 6 as a language is not opinionated about whether Perl 6 programs should be compiled or interpreted. Rakudo Perl 6, the main implementation, precompiles modules on the fly, and interprets scripts.

Perl 5, the Older Sister

Around the year 2000, Perl 5 development faced major strain from the conflicting desires to evolve and to keep backwards compatibility.

Perl 6 was the valve to release this tension. All the extension proposals that required a break in backwards compatibility were channeled into Perl 6, leaving it in a dreamlike state where everything was possible and nothing was fixed. It took several years of hard work to get it into a more solid state.

During this time, Perl 5 also evolved, and the two languages are different enough that most Perl 5 developers don't consider Perl 6 a natural upgrade path anymore, to the point that Perl 6 does not try to obsolete Perl 5 (at least not more than it tries to obsolete any other programming language :-), and the first stable release of Perl 6 in 2015 does not indicate any lapse in support for Perl 5.

Library Availability

Being a relatively young language, Perl 6 lacks the mature module ecosystem that languages such as Perl 5 and Python provide.

To bridge this gap, interfaces exist that allow you to call into libraries written in C, Python, Perl 5 and Ruby. The Perl 5 and Python interfaces are sophisticated enough that you can write a Perl 6 class that subclasses one written in either language, and the other way around.

So if you like a particular Python library, for example, you can simply load it into your Perl 6 program through the Inline::Python module.

Why Should I Use Perl 6?

If you like the quick prototyping experience from dynamically typed programming languages, but you also want enough safety features to build big, reliable applications, Perl 6 is a good fit for you. Its gradual typing allows you to write code without having a full picture of the types involved, and later introduce type constraints to guard against future misuse of your internal and external APIs.

Perl has a long history of making text processing via regular expressions (regexes) very easy, but more complicated regexes have acquired a reputation of being hard to read and maintain. Perl 6 solves this by putting regexes on the same level as code, allowing you to name them like subroutines, and even to use object oriented features such as class inheritance and role composition to manage code and regex reuse. The resulting grammars are very powerful, and easy to read. In fact, the Rakudo Perl 6 compiler parses Perl 6 source code with a Perl 6 grammar!
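
As a tiny, invented taste of what a grammar looks like (the grammar and token names here are made up):

grammar KeyValue {
    token TOP   { <key> '=' <value> }
    token key   { \w+ }
    token value { \S+ }
}

my $match = KeyValue.parse('answer=42');
say $match<key>;        # ｢answer｣
say $match<value>;      # ｢42｣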

Speaking of text, Perl 6 has amazing Unicode support. If you ask your user for a number, and they enter it with digits that don't happen to be the Arabic digits from the ASCII range, Perl 6 still has you covered. And if you deal with graphemes that cannot be expressed as a single Unicode code point, Perl 6 still presents them as single characters.
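
Two tiny examples of what that means in practice (the input characters are arbitrary; behavior as in recent Rakudo releases):

say '٤'.unival;                             # 4 -- Perl 6 knows the numeric value of non-ASCII digits
say "e\c[COMBINING ACUTE ACCENT]".chars;    # 1 -- one grapheme, even though it is two code points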

There are more technical benefits that I could list, but more importantly, the language is designed to be fun to use. An important aspect of that is good error messages. Have you ever been annoyed at Python for typically giving just SyntaxError: invalid syntax when something's wrong? This error could come from forgetting a closing parenthesis, for example. In this case, a Perl 6 compiler says

Unable to parse expression in argument list; couldn't find final ')'

which actually tells you what's wrong. But this is just the tip of the iceberg. The compiler catches common mistakes and points out possible solutions, and even suggests fixes for spelling mistakes.

Finally, Perl 6 gives you the freedom to express your problem domain and solution in different ways and with different programming paradigms. And if the options provided by the core language are not enough, Perl 6 is designed with extensibility in mind, allowing you to introduce both new semantics for object oriented code and new syntax.

Subscribe to the Perl 6 book mailing list


[/perl-6] Permanent link

Sat, 19 Nov 2016

Perl 6 By Example, Another Perl 6 Book


proposed book cover, starring a Butterfly, by Sebastian Riedl

I'm taking a shot at writing a Perl 6 book. Let's see how this goes.

My working title is Perl 6 by Example, and I want to recycle the approach taken in Using Perl 6 to introduce topics by example. Contrary to the now abandoned previous effort, I want to introduce the examples in small chunks, developing and explaining them bit by bit.

I expect the result to be rather limited in size (maybe 70 to 100 pages), and not a comprehensive guide to the Perl 6 language, but enough to get you started.

Target Audience

The reader should have previous programming experience. I assume familiarity with conditionals, variables, loops and basic data types such as various numbers, strings and lists. Some OO concepts such as classes, instances and methods will also be assumed, but I will drop some clarifying words about how I use the terminology.

Process

I will follow roughly the same process as in my previous book: First I write blog posts about the topics I intend to cover, and later distill them into chapters.

When I have some significant portion of the manuscript available, I will pre-publish it on Leanpub, and start to sell it.

If there is a lot of interest, I might investigate options to publish it in print form.

The Team

Luckily, several people have offered to help in some form or another, typically by writing or proof-reading (in no particular order):

  • [ptc]
  • DrForr
  • Coke
  • tbowder
  • AlexDaniel
  • seatek

Knowing the warm and helpful nature of the Perl 6 community, it is likely that more folks will step up to help. Any help, as well as co-authors, will be highly appreciated.

Other Perl 6 Books

Information on other Perl 6 book projects is rather sparse and vague. I try my best to keep informed about these projects.

  • Laurent Rosenfeld is working on a book which is basically a Perl 6 version of Think Python. It is targeted at programming beginners, and has been accepted by a well-respected publisher.
  • There are rumors that Larry Wall is writing a Programming Perl 6 book, and I wish that to be true. I know nothing about the progress or time horizon of that project.
  • brian d foy has started a kickstarter to fund his project to write "Learning Perl 6". Since "Learning Perl" is a book focused on beginners with no prior programming experience, I assume the same is true for the Perl 6 version of this book, so I don't see much competition between his project and mine, and wish him luck and success.
  • Ken Youens-Clark has written about metagenomics, a subset of bioinformatics, using Perl 6 example code.

Interested?

If you are interested in progress updates or milestones about my book project, please sign up for the mailing list below. It will be low volume (probably less than one email per month). I will also try my best to inform you about news of the other Perl 6 book projects.

Numerous signups will also boost my motivation to work on the book.

Subscribe to the Perl 6 book mailing list


[/perl-6] Permanent link

Tue, 08 Nov 2016

Icinga2, the Monitoring System with the API from Hell



At my employer, we have a project to switch some monitoring infrastructure from Nagios to Icinga2. As part of this project, we are also changing the way we store monitoring configuration. Instead of the previous assortment of manually maintained and generated config files, all monitoring configuration should now come from the CMDB, and changes are propagated to the monitoring through the Icinga2 REST API.

Well, that was the plan. And as you know, no plan survives contact with the enemy.

Call Me, Maybe?

We created our synchronization to Icinga2, and used it in our staging environment for a while. And soon got some reports that it wasn't working. Some hosts had monitoring configuration in our CMDB, but Icinga's web interface wouldn't show them. Normally, the web interface immediately shows changes to objects you're viewing, but in this case, not even a reload showed them.

So, we reported that as a bug to the team that operates our Icinga instances, but by the time they got to look at it, the web interface did show the missing hosts.

For a time, we tried to dismiss it as an unfortunate timing accident, but in later testing, it showed up again and again. The logs clearly showed that creating the host objects through the REST API produced a status code of 200, and a subsequent GET listed the objects. Just the web interface (which happens to be the primary interface for our users) stubbornly refused to show them, until somebody restarted Icinga.

Turns out, it's a known bug.

I know, distributed systems are hard. Maybe talk to Kyle aka Aphyr some day?

CREATE + UPDATE != CREATE

If you create an object through the Icinga API, and then update it to a different state, you get a different result than if you created it like that in the first place. Or in other words, the update operation is incomplete. Or to put it plainly, you cannot rely on it.

Which means you cannot rely on updates. You have to delete the resource and recreate it. Unfortunately, that implies you lose history, and downtimes scheduled for the host or service.

API Quirks

Designing APIs is hard. I guess that's why the Icinga2 REST API has some quirks. For example, if a PUT request fails, sometimes the response isn't JSON, but plain text. If the error response is indeed JSON, it duplicates the HTTP status code, but as a float. No real harm, but really, WAT?

The next is probably debatable, but we use Python, specifically the requests library, to talk to Icinga. And requests insists on URL-encoding a space as a + instead of %20, and Icinga insists on not decoding a + as a space. You can probably dig up RFCs to support both points of view, so I won't assign any blame. It's just annoying, OK?

In the same category of annoying, but not a show-stopper, is the fact that the API distinguishes between singular and plural. You can filter for a single host with host=thename, but if you filter by multiple hosts, it's hosts=name1&hosts=name2. I understand the desire to support cool, human-like names, but it forces the user to maintain a list of both singular and plural names of each API object they work with. And not every plural can be built by appending an s to the singular. (Oh, and in case you were wondering, you can't always use the plural either. For example when referring to an attribute of an object, you need to use the singular).

Another puzzling fact is that when the API returns a list of services, the response might look like this:

{
    "results": [
        {
            "attrs": {
                "check_command": "ping4",
                "name": "ping4"
            },
            "joins": {},
            "meta": {},
            "name": "example.localdomain!ping4",
            "type": "Service"
        }
    ]
}

Notice how the "name" and attrs["name"] attribute are different? A service is always attached to the host, so the "name" attribute seems to be the fully qualified name in the format <hostname>!<servicename>, and attrs["name"] is just service name.

So, where can I use which? What's the philosophy behind having "name" twice, but with different meaning? Sadly, the docs are quiet about it. I remain unenlightened.

State Your Case!

Behind the scenes, Icinga stores its configuration in files that are named after the object name. So when you run Icinga on a case sensitive file system, you can have both a service example.com!ssh and example.com!SSH at the same time. As a programmer, I'm used to case sensitivity, and don't have an issue with it.

What I have an issue with is when parts of the system are case sensitive, and others aren't. Like the match() function that the API docs like to use. Is there a way to make it case sensitive? I don't know. Which brings me to my next point.

Documentation (or the Lack Thereof)

I wasn't able to find actual documentation for the match() function. Possibly because there is none. Who knows?

Selection Is Hard

For our use case, we have some tags in our CMDB, and a host can have zero, one or more tags. And we want to provide the user with the ability to create a downtime for all hosts that have a given tag.

Sounds simple, eh? The API supports creating a downtime for the result of an arbitrary filter. But that presupposes that you actually can create an appropriate filter. I have my doubts. In several hours of experimenting, I haven't found a reliable way to filter by membership of array variables.

Maybe I'm just too dumb, or maybe the documentation is lacking. No, not maybe. The documentation is lacking. I made a point about the match() function earlier. Are there more functions like match()? Are there more operators than the ==, in, &&, || and ! that the examples use?

Templates

We want to have some standards for monitoring certain types of hosts. For example Debian and RHEL machines have slightly different defaults and probes.

That's where templates come in. You define a template for each case, and simply assign this template to the host.

By now, you should have realized that every "simply" comes with a "but". But it doesn't work.

That's right. Icinga has templates, but you can't create or update them through the API. When we wanted to implement templating support, API support for templates was on the roadmap for the next Icinga2 release, so we waited. And got read-only support.

Which means we had to reimplement templating outside of Icinga, with all the scaling problems that come with it. If you update a template that's emulated outside of Icinga, you need to update each instance, of which there can be many. Aside from this unfortunate scaling issue, it makes a correct implementation much harder. What do you do if the first 50 hosts updated correctly, and the 51st didn't? You can try to undo the previous changes, but that could also fail. Tough luck.

Dealing with Bug Reports

As the result of my negative experiences, I've submitted two bug reports. Both have been rejected the next morning. Let's look into it.

In No API documentation for match() I complained about the lack of discoverable documentation for the match() function. The rejection pointed to this, which is half a line:

match(pattern, text)    Returns true if the wildcard pattern matches the text, false otherwise.

What is a "wildcard pattern"? What happens if the second argument isn't a string, but an array or a dictionary? What about the case sensitivity question? Not answered.

Also, the lack of discoverability hasn't been addressed. The situation could easily be improved by linking to this section from the API docs.

So, somebody's incentive seems to be the number of closed or rejected issues, not making Icinga more usable.

To Be Or Not To Be

After experiencing the glitches described above, I feel an intense dislike whenever I have to work with the Icinga API. And after we discovered the consistency problem, my dislike spread to all of Icinga.

Not all of it was Icinga's fault. We had some stability issues with our own software and integration (for example, using HTTP keep-alive for talking to a cluster behind a load balancer turned out to be a bad idea). Having both our own instability and the weirdness and bugs from Icinga made it hard and work-intensive to isolate the bugs. It was much less straightforward than the listing of issues in this rant might suggest.

While I've worked with Icinga for a while now, from your perspective, the sample size is still one. Giving advice is hard in that situation. Maybe the takeaway is that, should you consider using Icinga, you would do well to evaluate extra carefully whether it can actually do what you need. And whether using it is worth the pain.

[/misc] Permanent link

Tue, 01 Nov 2016

Perl 6 Advent Calendar 2016 -- Call for Authors



Every year since 2009, the Perl 6 community publishes a Perl 6 advent calendar, in the form of blog posts on perl6advent.wordpress.com.

To keep up this great tradition, we need 24 blog posts, and volunteers who write them. If you want to contribute a blog post about anything related to Perl 6, please add your name (and potentially also a topic already) to the schedule, and if you don't yet have a login on the advent blog, please tell me your email address so that I can send you an invitation.

Perl 6 advent blog posts should be finished the day before they are due, and published with midnight (UTC) of the due date as the publishing date.

If you have any questions, or want to discuss blog post ideas, please join on the #perl6 IRC channel on irc.freenode.org, or drop me an email at moritz.lenz@gmail.com.

[/perl-6] Permanent link

Sun, 09 Oct 2016

Git Flow vs. Continuous Delivery



I've often read the recommendation to use git flow as a branching model for software development projects, and I've even introduced it at some projects at work.

But when we adopted Continuous Delivery, it became pretty clear that git flow and Continuous Integration and Continuous Delivery don't mix well.

So you develop features and bug fixes in branches, and merge them into the develop branch. This means the develop branch needs to be the focus of your testing efforts, it needs to be automatically deployed to a testing environment and so on.

And then, git flow recommends that for going to production, you build packages from the master branch, and release them. The problem is that by building a new package or binary from the master branch, you invalidate all the testing you did in the develop branch.

Even if the code in the master branch is identical to that in the develop branch, you might get a different binary due to non-reproducible builds, changes in the versions of build tools and compilers, or even cosmic rays or other sources of random bit flips in RAM or on disk. And if you always get an identical binary, why bother to do a second build at all?

The spirit of Continuous Delivery is very much to test the heck out of one build artifact, and if it proves to be of high quality, bring it to production.

An Alternative: Trunk-Based Development

A workflow that fits better with Continuous Delivery is to do most development in the same branch. This branch is automatically tested and deployed to a testing environment, and if all the tests are successful, either deployed to production automatically, or after manual approval.

If there are features that you don't yet want your end users to see, you can hide them behind feature flags. Of course, feature flags come with their own complexity, and should be kept to a minimum.

Keeping your software releasable all the time is a core principle of Continuous Delivery, so if you practise that, and encounter a bug that needs an immediate fix, chances are high that you can apply the fix in the main development branch, and deploy it using the same automation that you use for your usual feature development.

If that doesn't work, you can always pause your deployment pipeline for the regular development branch, create a hotfix branch, and use the regular deployment code, but based on the hotfix branch, and bring your changes to your testing and production environments.

Does That Mean I Should Abandon Git Flow?

My experience is limited to pretty small development teams, and to certain kinds of software. For example, I've never developed software for embedded devices. So I certainly can't talk about all the different modes of operations.

My main point is that git flow doesn't mix well with Continuous Delivery, and that you shouldn't recommend it blindly without understanding the context in which it is used.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.

Subscribe to the Automating Deployments mailing list


[/automating-deployments] Permanent link

Sat, 17 Sep 2016

You Write Your Own Bio



I love how children ask the hard questions. My daughter of 2.5 years tends to ask people out of the blue: "Who are you?". Most answer with their names, and possibly with their relation to my daughter.

The nagging philosopher's voice in my head quietly comments, "OK, that's your name, but who are you?" And for that matter, who am I? My name is part of my identity, but there's more to me than my name. I hope :-).

Identity is hard to pin down, and shifts in time. Being a father, a husband and a part of a family is a big part of my identity. So is my work, software engineering and architecture. My personality traits, like being an introvert, and hopefully a kind person, are important too. As are the things that I do in my spare time. Like writing a book.

Speaking of books, please join me on a tangent.

You've read a technical book, and liked it. And the back cover contained a blurb about the author: "Mr X is a successful software engineer and has worked for X, Y and Z. He has written several books on programming topics." Plus a few sentences about his origins, family and hobbies, maybe.

Who writes these blurbs?

As a kid, I thought that the publisher hired journalists who did research on the author, to come up with a short bio that is both flattering and accurate.

Maybe the really big publishers do that. But mostly, the publishers just ask the author to provide a bio themselves.

I've written several articles for technical print magazines, and that is exactly what happened. It's no secret either; it's right in the submission guidelines.

For my own book, which is self-published in digital form (and a print version being worked on by a small, independent publisher), I wrote my own bio, which was weird, because I had to talk about myself in the third person. And because I had to emphasize my strengths, which I'm typically not comfortable with.

You see where this is going, don't you?

The blurb, short bio, however you call it, is meant to shine a bit of light on the author's identity. This is to make the author more relatable, but also to serve as an endorsement. Which means that, depending on the topic of the publication it is attached to, it shines light on different parts of the identity. In the context of a book on software deployments, nobody cares that I kinda like cooking, but not enough to become really good at it.

So, should I call myself a successful software engineer, in the three-sentence autobiography? It sounds good, doesn't it? Am I comfortable with that description? I've had my share of successes in my professional career, and also some failures. If somebody else calls me successful, I take it as a compliment. If I put that moniker up myself, I cringe a bit. Should I? If others call me successful, it might just be my imposter syndrome kicking in.

But those adjectives are a small matter in comparison to other matters. Obviously, I write. Rambling stuff like what you're reading now. Articles. Blog posts. A book. Now, do I call myself a writer? Or an author? Do I want my gainful employment to become part of my identity?

There are no rules to decide that. It's a choice. It's my choice.

And likely, it's a significant choice. If I consider myself a writer, the next project I'll be taking on is more likely to be another article, or even a book. If I consider myself a programmer, it's likely to be a small tool or a web app. I could decide I am a maintainer, or even "the" maintainer, of some Open Source projects I'm involved in. I can decide that I want to be something that I'm not yet, and make it happen.

I don't know what exactly I'll decide, but I love that I have a choice.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.

Subscribe to the Automating Deployments mailing list


[/misc] Permanent link

Thu, 11 Aug 2016

Moritz on Continuous Discussions (#c9d9)



On Tuesday I was a panelist on the Continuous Discussions Episode 47 – Open Source and DevOps. It was quite some fun!

Much of the discussion applied to Open Source in software development in general, not just in DevOps.

You can watch the full session on Youtube, or on the Electric Cloud blog.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.

Subscribe to the Automating Deployments mailing list


[/automating-deployments] Permanent link

Tue, 19 Jul 2016

Continuous Delivery on your Laptop



An automated deployment system, or delivery pipeline, builds software, and moves it through the various environments, like development, testing, staging, and production.

But what about testing and developing the delivery system itself? In which environment do you develop new features for the pipeline?

Start Small

When you are starting out you can likely get away with having just one environment for the delivery pipeline: the production environment.

It might shock you that you're supposed to develop anything in the production environment, but you should also be aware that the delivery system is not crucial for running your production applications, "just" for updating them. If the pipeline is down, your services still work. And you structure the pipeline to do the same jobs both in the testing and in the production environment, so you test the deployments in a test environment first.

A Testing Environment for the Delivery Pipeline?

If those arguments don't convince you, or you're at a point where developer productivity suffers immensely from an outage of the deployment system, you can consider creating a testing environment for the pipeline itself.

But pipelines in this testing environment should not be allowed to deploy to the actual production environment, and ideally shouldn't interfere with the application testing environment either. So you have to create at least a partial copy of your usual environments, just for testing the delivery pipeline.

This is only practical if you have automated basically all of the configuration and provisioning, and have access to some kind of cloud solution to provide you with the resources you need for this endeavour.

Creating a Playground

If you do decide that you do need some playground or testing environment for your delivery pipeline, there are a few options at your disposal. But before you build one, you should be aware of how many (or few) resources such an environment consumes.

Resource Usage of a Continuous Delivery Playground

For a minimal playground that builds a system similar to the one discussed in earlier blog posts, you need

  • a machine on which you run the GoCD server
  • a machine on which you run a GoCD agent
  • a machine that acts as the testing environment
  • a machine that acts as the production environment

You can run the GoCD server and agent on the same machine if you wish, which reduces the footprint to three machines.

The machine on which the GoCD server runs should have between one and two gigabytes of memory, and one or two (virtual) CPUs. The agent machine should have about half a GB of memory, and one CPU. If you run both server and agent on the same machine, two GB of RAM and two virtual CPUs should do nicely.

The specifications of the remaining two machines mostly depend on the type of applications you deploy and run on them. For the deployment itself you just need an SSH server running, which is very modest in terms of memory and CPU usage. If you stick to the example applications discussed in this blog series, or similarly lightweight applications, half a GB of RAM and a single CPU per machine should be sufficient. You might get away with less RAM.

So in summary, the minimal specs are:

  • One VM with 2 GB RAM and 2 CPUs, for go-server and go-agent
  • Two VMs with 0.5 GB RAM and 1 CPU each, for the "testing" and the "production" environments.

In the idle state, the GoCD server periodically polls the git repos, and the GoCD agent polls the server for work.

When you are not using the playground, you can shut off those processes, or even the whole machines.

Approaches to Virtualization

These days, almost nobody buys server hardware and runs such test machines directly on them. Instead there is usually a layer of virtualization involved, which both makes new operating system instances more readily available, and allows a denser resource utilization.

Private Cloud

If you work in a company that has its own private cloud, for example an OpenStack installation, you could use that to create a few virtual machines.

Public Cloud

Public cloud compute solutions, such as Amazon's EC2, Google's Compute Engine and Microsoft's Azure cloud offerings, allow you to create VM instances on demand, and be billed at an hourly rate. On all three services, you pay less than 0.10 USD per hour for an instance that can run the GoCD server[^pricedate].

[^pricedate]: Prices as of July 2016; I expect prices to only go downwards, though resource usage of the software might increase in the future as well.

Google Compute Engine even offers heavily discounted preemptible VMs. Those VMs are only available when the provider has excess resources, and come with the option to be shut down on relatively short notice (a few minutes). While this is generally not a good idea for an always-on production system, it can be a good fit for a cheap testing environment for a delivery pipeline.

Local Virtualization Solutions

If you have a somewhat decent workstation or laptop, you likely have sufficient resources to run some kind of virtualization software directly on it.

Instead of classical virtualization solutions, you could also use a containerization solution such as Docker, which provides enough isolation for testing a Continuous Delivery pipeline. The downside is that Docker is not meant for running several services in one container, and here you need at least an SSH server and the actual services that are being deployed. You could work around this by using Ansible's Docker connector instead of SSH, but then you make the testing playground quite dissimilar from the actual use case.

So let's go with a more typical virtualization environment such as KVM or VirtualBox, and Vagrant as a layer above them to automate the networking and initial provisioning. For more on this approach, see the next section.

Continuous Delivery on your Laptop

My development setup looks like this: I have the GoCD server installed on my laptop running Ubuntu, though running it under Windows or MacOS would certainly also work.

Then I have Vagrant installed, using the VirtualBox backend. I configure it to run three VMs for me: one for the GoCD agent, and one each as a testing and production machine. Finally there's an Ansible playbook that configures the three latter machines.

While running the Ansible playbook for configuring these three virtual machines requires internet connectivity, developing and testing the Continuous Delivery process does not.

If you want to use the same test setup, consider using the files from the playground directory of the deployment-utils repository, which will likely be kept more up-to-date than this blog post.

Network and Vagrant Setup

We'll use Vagrant with a private network, which allows you to talk to each of the virtual machines from your laptop or workstation, and vice versa.

I've added these lines to my /etc/hosts file. This isn't strictly necessary, but it makes it easier to talk to the VMs:

# Vagrant
172.28.128.1 go-server.local
172.28.128.3 testing.local
172.28.128.4 production.local
172.28.128.5 go-agent.local

And a few lines to my ~/.ssh/config file:

Host 172.28.128.* *.local
    User root
    StrictHostKeyChecking no
    IdentityFile /dev/null
    LogLevel ERROR

Do not do this for production machines. This is only safe on a virtual network on a single machine, where you can be sure that no attacker is present, unless they already compromised your machine.

That said, creating and destroying VMs is common in Vagrant land, and each time you create them anew, they will have new host keys. Without such a configuration, you'd spend a lot of time updating SSH key fingerprints.

Then let's get Vagrant:

$ apt-get install -y vagrant virtualbox

To configure Vagrant, you need a Ruby script called Vagrantfile:

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure(2) do |config|
  config.vm.box = "debian/contrib-jessie64"

  {
    'testing'    => "172.28.128.3",
    'production' => "172.28.128.4",
    'go-agent'   => "172.28.128.5",
  }.each do |name, ip|
    config.vm.define name do |instance|
        instance.vm.network "private_network", ip: ip
        instance.vm.hostname = name + '.local'
    end
  end

  config.vm.synced_folder '/datadisk/git', '/datadisk/git'

  config.vm.provision "shell" do |s|
    ssh_pub_key = File.readlines("#{Dir.home}/.ssh/id_rsa.pub").first.strip
    s.inline = <<-SHELL
      mkdir -p /root/.ssh
      echo #{ssh_pub_key} >> /root/.ssh/authorized_keys
    SHELL
  end
end

This builds three Vagrant VMs based on the debian/contrib-jessie64 box, which is mostly a pristine Debian Jessie VM, but also includes a file system driver that allows Vagrant to make directories from the host system available to the guest system.

I have a local directory /datadisk/git in which I keep a mirror of my git repositories, so that both the GoCD server and agent can access the git repositories without requiring internet access, and without needing another layer of authentication. The config.vm.synced_folder call in the Vagrant file above replicates this folder into the guest machines.

Finally the code reads an SSH public key from the file ~/.ssh/id_rsa.pub and adds it to the root account on the guest machines. In the next step, an Ansible playbook will use this access to configure the VMs to make them ready for the delivery pipeline.

To spin up the VMs, type

$ vagrant up

in the folder containing the Vagrantfile. The first time you run this, it takes a bit longer because Vagrant needs to download the base image first.

Once that's finished, you can call the command vagrant status to see if everything works; it should look like this:

$ vagrant status
Current machine states:

testing                   running (virtualbox)
production                running (virtualbox)
go-agent                  running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

And (on Debian-based Linux systems) you should be able to see the newly created, private network:

$ ip route | grep vboxnet
172.28.128.0/24 dev vboxnet1  proto kernel  scope link  src 172.28.128.1

You should now be able to log in to the VMs with ssh root@go-agent.local, and the same with testing.local and production.local as host names.

Ansible Configuration for the VMs

It's time to configure the Vagrant VMs. Here's an Ansible playbook that does this:

---
 - hosts: go-agent
   vars:
     go_server: 172.28.128.1
   tasks:
   - group: name=go system=yes
   - name: Make sure the go user has an SSH key
     user: name=go system=yes group=go generate_ssh_key=yes home=/var/go
   - name: Fetch the ssh public key, so we can later distribute it.
     fetch: src=/var/go/.ssh/id_rsa.pub dest=go-rsa.pub fail_on_missing=yes flat=yes
   - apt: package=apt-transport-https state=installed
   - apt_key: url=https://download.go.cd/GOCD-GPG-KEY.asc state=present validate_certs=no
   - apt_repository: repo='deb https://download.go.cd /' state=present
   - apt: update_cache=yes package={{item}} state=installed
     with_items:
      - go-agent
      - git

   - copy:
       src: files/guid.txt
       dest: /var/lib/go-agent/config/guid.txt
       owner: go
       group: go
   - lineinfile: dest=/etc/default/go-agent regexp=^GO_SERVER= line=GO_SERVER={{ go_server }}
   - service: name=go-agent enabled=yes state=started

 - hosts: aptly
   handlers:
    - name: restart lighttpd
      service: name=lighttpd state=restarted
   tasks:
     - apt: package={{item}} state=installed
       with_items:
        - ansible
        - aptly
        - build-essential
        - curl
        - devscripts
        - dh-systemd
        - dh-virtualenv
        - gnupg2
        - libjson-perl
        - python-setuptools
        - lighttpd
        - rng-tools
     - copy: src=files/key-control-file dest=/var/go/key-control-file
     - command: killall rngd
       ignore_errors: yes
       changed_when: False
     - command: rngd -r /dev/urandom
       changed_when: False
     - command: gpg --gen-key --batch /var/go/key-control-file
       args:
         creates: /var/go/.gnupg/pubring.gpg
       become_user: go
       become: true
       changed_when: False
     - shell: gpg --export --armor > /var/go/pubring.asc
       args:
         creates: /var/go/pubring.asc
       become_user: go
       become: true
     - fetch:
         src: /var/go/pubring.asc
         dest: deb-key.asc
         fail_on_missing: yes
         flat: yes
     - name: Bootstrap the aptly repos that will be configured on the `target` machines
       copy:
        src: ../add-package
        dest: /usr/local/bin/add-package
        mode: 0755
     - name: Download an example package to fill the repo with
       get_url:
        url: http://ftp.de.debian.org/debian/pool/main/b/bash/bash_4.3-11+b1_amd64.deb
        dest: /tmp/bash_4.3-11+b1_amd64.deb
     - command: /usr/local/bin/add-package {{item}} jessie /tmp/bash_4.3-11+b1_amd64.deb
       args:
           creates: /var/go/aptly/{{ item }}-jessie.conf
       with_items:
         - testing
         - production
       become_user: go
       become: true

     - name: Configure lighttpd to serve the aptly directories
       copy: src=files/lighttpd.conf dest=/etc/lighttpd/conf-enabled/30-aptly.conf
       notify:
         - restart lighttpd
     - service: name=lighttpd state=started enabled=yes

 - hosts: target
   tasks:
     - authorized_key:
        user: root
        key: "{{ lookup('file', 'go-rsa.pub') }}"
     - apt_key: data="{{ lookup('file', 'deb-key.asc') }}" state=present

 - hosts: production
   tasks:
     - apt_repository:
         repo: "deb http://{{hostvars['agent.local']['ansible_ssh_host'] }}/debian/production/jessie jessie main"
         state: present

 - hosts: testing
   tasks:
     - apt_repository:
         repo: "deb http://{{hostvars['agent.local']['ansible_ssh_host'] }}/debian/testing/jessie jessie main"
         state: present

 - hosts: go-agent
   tasks:
     - name: 'Checking SSH connectivity to {{item}}'
       become: True
       become_user: go
       command: ssh -o StrictHostkeyChecking=No root@"{{ hostvars[item]['ansible_ssh_host'] }}" true
       changed_when: false
       with_items: groups['target']

You also need a hosts or inventory file:

[all:vars]
ansible_ssh_user=root

[go-agent]
agent.local ansible_ssh_host=172.28.128.5

[aptly]
agent.local

[target]
testing.local ansible_ssh_host=172.28.128.3
production.local ansible_ssh_host=172.28.128.4

[testing]
testing.local

[production]
production.local

... and a small ansible.cfg file:

[defaults]
host_key_checking = False
inventory = hosts
pipelining=True

This does a whole lot of stuff:

  • Install and configure the GoCD agent
    • copies a file with a fixed GUID to the configuration directory of the go-agent, so that when you tear down the machine and create it anew, the GoCD server will identify it as the same agent as before.
  • Gives the go user on the go-agent machine SSH access on the target hosts by
    • first making sure the go user has an SSH key
    • copying the public SSH key to the host machine
    • later distributing it to the target machine using the authorized_key module
  • Creates a GPG key pair for the go user
    • since GPG key creation uses lots of entropy for random numbers, and VMs typically don't have that much entropy, first install rng-tools and use that to convince the system to use lower-quality randomness. Again, this is something you shouldn't do in a production setting.
  • Copies the public key of said GPG key pair to the host machine, and then distributes it to the target machines using the apt_key module
  • Creates some aptly-based Debian repositories on the go-agent machine by
    • copying the add-package script from the same repository to the go-agent machine
    • running it with a dummy package, here bash, to actually create the repos
    • installing and configuring lighttpd to serve these packages by HTTP
    • configuring the target machines to use these repositories as a package source
  • Checks that the go user on the go-agent machine can indeed reach the other VMs via SSH

After running ansible-playbook setup.yml, your local GoCD server should have a new agent, which you have to activate in the web configuration and to which you have to assign the appropriate resources (debian-jessie and aptly, if you follow the examples from this blog series).

Now when you clone your git repos to /datadisk/git/ (be sure to git clone --mirror) and configure the pipelines on the GoCD server, you have a complete Continuous Delivery-system running on one physical machine.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


[/automating-deployments] Permanent link

Tue, 12 Jul 2016

Continuous Delivery and Security


Permanent link

What's the impact of automated deployment on the security of your applications and infrastructure?

It turns out there are both security advantages, and things to be wary of.

The Dangers of Centralization

In a deployment pipeline, the machine that controls the deployment needs to have access to the target machines where the software is deployed.

In the simplest case, there is private SSH key on the deployment machine, and the target machines grant access to the owner of that key.

This is an obvious risk, since an attacker gaining access to the deployment machine (or in the examples discussed previously, the GoCD server controlling the machine) can use this key to connect to all of the target machines.

Some possible mitigations include:

  • hardened setup of the deployment machine
  • password-protect the SSH key and supply the password through the same channel that triggers the deployment
  • have separate deployment and build hosts. Build hosts tend to need far more software installed, which implies a bigger attack surface
  • on the target machines, only allow unprivileged access through said SSH key, and use something like sudo to allow only certain privileged operations

Each of these mitigations has its own costs and weaknesses. For example, password-protecting SSH keys helps if the attacker only manages to obtain a copy of the file system, but not if the attacker gains root privileges on the machine, and thus can obtain a memory dump that includes the decrypted SSH key.

The sudo approach is very effective at limiting the spread of an attack, but it requires extensive configuration on the target machine, and you need a secure way to deploy that. So you run into a chicken-and-egg problem, and it adds quite a bit of extra effort.
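
As a rough illustration (not from the original series; the user name and the command whitelist are made up), such a restriction could itself be rolled out with Ansible -- which brings us right back to needing a trusted way to run that playbook:

# hypothetical task: allow the "deploy" user to run only apt-get as root
- lineinfile:
    dest: /etc/sudoers.d/deploy
    create: yes
    mode: '0440'
    line: 'deploy ALL=(root) NOPASSWD: /usr/bin/apt-get'
    validate: 'visudo -cf %s'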

On the flip side, if you don't have a delivery pipeline, deployments have to happen manually, so you have the same problem of needing to give humans access to the target machines. Most organizations offer some kind of secured host on which the operator's SSH keys are stored, and you face the same risk with that host as the deployment host.

Time to Market for Security Fixes

Compared to manual deployments, even a relatively slow deployment pipeline is still quite fast. When a vulnerability is identified, this quick and automated rollout process can make a big difference in reducing the time until the fix is deployed.

Equally important is the fact that a clunky manual release process seduces the operators into taking shortcuts around security fixes, skipping some steps of the quality assurance process. When that process is automated and fast, it is easier to adhere to the process than to skip it, so it will actually be carried out even in stressful situations.

Audits and Software Bill of Materials

A good deployment pipeline tracks when which version of a software was built and deployed. This allows one to answer questions such as "For how long did we have this security hole?", "How soon after the report was the vulnerability patched in production?" and maybe even "Who approved the change that introduced the vulnerability?".

If you also use configuration management based on files that are stored in a version control system, you can answer these questions even for configuration, not just for software versions.

In short, the deployment pipeline provides enough data for an audit.

Some regulations require you to record a Software Bill of Materials. This is a record of which components are contained in some software, for example a list of libraries and their versions. While this is important for assessing the impact of a license violation, it is also important for figuring out which applications are affected by a vulnerability in a particular version of a library.

For example, a 2015 report by HP Security found that 44% of the investigated breaches were made possible by vulnerabilities that had been known (and presumably patched) for at least two years. This in turn means that you can nearly halve your security risk by tracking which software version you use where, subscribing to a newsletter or feed of known vulnerabilities, and rebuilding and redeploying your software with patched versions.

A Continuous Delivery system doesn't automatically create such a Software Bill of Materials for you, but it gives you a place where you can plug in a system that does for you.

Conclusions

Continuous Delivery gives the ability to react quickly and predictably to newly discovered vulnerabilities. At the same time, the deployment pipeline itself is an attack surface, which, if not properly secured, can be quite an attractive target for an intruder.

Finally, the deployment pipeline can help you to collect data that can give insight into the usage of software with known vulnerabilities, allowing you to be thorough when patching these security holes.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


[/automating-deployments] Permanent link

Tue, 05 Jul 2016

Ansible: A Primer


Permanent link

Ansible is a very pragmatic and powerful configuration management system that is easy to get started with.

Connections and Inventory

Ansible is typically used to connect to one or more remote hosts via ssh and bring them into a desired state. The connection method is pluggable: other methods include local, which simply invokes the commands on the local host instead, and docker, which connects through the Docker daemon to configure a running container.

To tell Ansible where and how to connect, you write an inventory file, called hosts by default. In the inventory file, you can define hosts and groups of hosts, and also set variables that control how to connect to them.

# file myinventory
# example inventory file
[all:vars]
# variables set here apply to all hosts
ansible_user=root

[web]
# a group of webservers
www01.example.com
www02.example.com

[app]
# a group of 5 application servers,
# all following the same naming scheme:
app[01:05].example.com

[frontend:children]
# a group that combines the two previous groups
app
web

[database]
# here we override ansible_user for just one host
db01.example.com ansible_user=postgres

(In versions prior to Ansible 2.0, you have to use ansible_ssh_user instead of ansible_user). See the introduction to inventory files for more information.

To test the connection, you can use the ping module on the command line:

$ ansible -i myinventory web -m ping
www01.example.com | success >> {
    "changed": false,
    "ping": "pong"
}

www02.example.com | success >> {
    "changed": false,
    "ping": "pong"
}

Let's break the command line down into its components: -i myinventory tells Ansible to use the myinventory file as inventory. web tells Ansible which hosts to work on. It can be a group, as in this example, or a single host, or several such things separated by a colon. For example, www01.example.com:database would select one of the web servers and all of the database servers. Finally, -m ping tells Ansible which module to execute. ping is probably the simplest module: it simply replies with "pong" and reports that the remote host hasn't changed.

These commands run in parallel on the different hosts, so the order in which these responses are printed can vary.

If there is a problem with connecting to a host, add the option -vvv to get more output.

Ansible implicitly gives you the group all which -- you guessed it -- contains all the hosts configured in the inventory file.

Modules

Whenever you want to do something on a host through Ansible, you invoke a module to do that. Modules usually take arguments that specify what exactly should happen. On the command line, you can add those arguments with ansible -m module -a 'arguments', for example:

$ ansible -i myinventory database -m shell -a 'echo "hi there"'
db01.example.com | success | rc=0 >>
hi there

Ansible comes with a wealth of built-in modules and an ecosystem of third-party modules as well. Here I want to present just a few commonly used modules.

The shell Module

The shell module executes a shell command on the host and accepts some options such as chdir to change into another working directory first:

$ ansible -i myinventory database -m shell -a 'pwd chdir=/tmp'
db01.example.com | success | rc=0 >>
/tmp

It is pretty generic, but also an option of last resort. If there is a more specific module for the task at hand, you should prefer the more specific module. For example you could ensure that system users exist using the shell module, but the more specialized user module is much easier to use for that, and likely does a better job than an improvised shell script.
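
For comparison, here is a minimal sketch of the same idea with the user module (the host group and user name are made up for illustration):

# hypothetical playbook snippet: ensure a system user exists, idempotently
- hosts: database
  tasks:
    - user: name=backup system=yes state=present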

The copy Module

With copy you can copy files verbatim from the local to the remote machine:

$ ansible -i myinventory database -m copy -a 'src=README.md dest=/etc/motd mode=644'
db01.example.com | success >> {
    "changed": true,
    "dest": "/etc/motd",
    "gid": 0,
    "group": "root",
    "md5sum": "d41d8cd98f00b204e9800998ecf8427e",
    "mode": "0644",
    "owner": "root",
    "size": 0,
    "src": "/root/.ansible/tmp/ansible-tmp-1467144445.16-156283272674661/source",
    "state": "file",
    "uid": 0
}

The template Module

template mostly works like copy, but it interprets the source file as a Jinja2 template before transferring it to the remote host.

This is commonly used to create configuration files and to incorporate information from variables (more on that later).

Templates cannot be used directly from the command line, but rather in playbooks, so here is an example of a simple playbook.

# file motd.j2
This machine is managed by {{team}}.


# file template-example.yml
---
- hosts: all
  vars:
    team: Slackers
  tasks:
   - template: src=motd.j2 dest=/etc/motd mode=0644

More on playbooks later, but what you can see is that this defines a variable team, sets it to the value Slackers, and the template interpolates this variable.

When you run the playbook with

$ ansible-playbook -i myinventory --limit database template-example.yml

it creates a file /etc/motd on the database server with the contents

This machine is managed by Slackers.

The file Module

The file module manages attributes of files, such as permissions, and also allows you to create directories as well as soft and hard links.

$ ansible -i myinventory database -m file -a 'path=/etc/apt/sources.list.d state=directory mode=0755'
db01.example.com | success >> {
    "changed": false,
    "gid": 0,
    "group": "root",
    "mode": "0755",
    "owner": "root",
    "path": "/etc/apt/sources.list.d",
    "size": 4096,
    "state": "directory",
    "uid": 0
}

The apt Module

On Debian and derived distributions, such as Ubuntu, installing and removing packages is generally done with package managers from the apt family, such as apt-get, aptitude, and in newer versions, the apt binary directly.

The apt module manages this from within Ansible:

$ ansible -i myinventory database -m apt -a 'name=screen state=installed update_cache=yes'
db01.example.com | success >> {
    "changed": false
}

Here the screen package was already installed, so the module didn't change the state of the system.

Separate modules are available for managing apt-keys with which repositories are cryptographically verified, and for managing the repositories themselves.
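
As a rough sketch of how these look as playbook tasks (the key URL and repository line here are placeholders, not a real repository):

# hypothetical tasks for the apt_key and apt_repository modules
- apt_key: url=https://example.com/repo-key.asc state=present
- apt_repository: repo='deb https://example.com/debian jessie main' state=present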

The yum and zypper Modules

For RPM-based Linux distributions, the yum module (core) and zypper module (not in core, so must be installed separately) are available. They manage package installation via the package managers of the same name.

The package Module

The package module tries to use whatever package manager it detects. It is thus more generic than the apt and yum modules, but supports far fewer features. For example in the case of apt, it does not provide any control over whether to run apt-get update before doing anything else.
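
A task using it could look like this (a sketch that mirrors the apt example above):

# hypothetical task using the generic package module
- package: name=screen state=present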

Application-Specific Modules

The modules presented so far are fairly close to the system, but there are also modules for achieving common, application specific tasks. Examples include dealing with databases, network related things such as proxies, version control systems, clustering solutions such as Kubernetes, and so on.
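
For example, a task that manages a database user with the postgresql_user module might look roughly like this (the user, password and database name are made up):

# hypothetical example of an application-specific module
- postgresql_user: name=myappuser password=secret db=myapp state=present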

Playbooks

Playbooks can contain multiple calls to modules in a defined order and limit their execution to individual hosts or groups of hosts.

They are written in the YAML file format, a data serialization file format that is optimized for human readability.

Here is an example playbook that installs the newest version of the go-agent Debian package, the worker for Go Continuous Delivery:

---
- hosts: go-agent
  vars:
    go_server: hack.p6c.org
  tasks:
  - apt: package=apt-transport-https state=installed
  - apt_key: url=https://download.go.cd/GOCD-GPG-KEY.asc state=present validate_certs=no
  - apt_repository: repo='deb https://download.go.cd /' state=present
  - apt: update_cache=yes package={{item}} state=installed
    with_items:
     - go-agent
     - git
     - build-essential
  - lineinfile: dest=/etc/default/go-agent regexp=^GO_SERVER= line=GO_SERVER={{ go_server }}
  - service: name=go-agent enabled=yes state=started

The top level element in this file is a one-element list. The single element starts with hosts: go-agent, which limits execution to hosts in the group go-agent. This is the relevant part of the inventory file that goes with it:

[go-agent]
go-worker01.p6c.org
go-worker02.p6c.org

Then it sets the variable go_server to a string, which here is the hostname where a GoCD server runs.

Finally, the meat of the playbook: the list of tasks to execute.

Each task is a call to a module, some of which have already been discussed. A quick overview:

  • First, the Debian package apt-transport-https is installed, to make sure that the system can fetch meta data and files from Debian repositories through HTTPS
  • The next two tasks use the apt_repository and apt_key modules to configure the repository from which the actual go-agent package shall be installed
  • Another call to apt installs the desired package. Also, some more packages are installed with a loop construct
  • The lineinfile module searches by regex for a line in a text file, and replaces the appropriate line with pre-defined content. Here we use that to configure the GoCD server that the agent connects to.
  • Finally, the service module starts the agent if it's not yet running (state=started), and ensures that it is automatically started on reboot (enabled=yes).

Playbooks are invoked with the ansible-playbook command.

There can be more than one list of tasks in a playbook, which is a common use-case when they affect different groups of hosts:

---
- hosts: go-agent:go-server
  tasks:
  - apt: package=apt-transport-https state=installed
  - apt_key: url=https://download.go.cd/GOCD-GPG-KEY.asc state=present validate_certs=no
  - apt_repository: repo='deb https://download.go.cd /' state=present

- hosts: go-agent
  tasks:
  - apt: update_cache=yes package={{item}} state=installed
    with_items:
     - go-agent
     - git
     - build-essential
  - ...

- hosts: go-server
  tasks:
  - apt: update_cache=yes package=go-server state=installed
  - ...

Variables

Variables are useful both for controlling flow inside a playbook, and for filling out spots in templates to generate configuration files.

There are several ways to set variables. One is directly in playbooks, via vars: ..., as seen before. Another is to specify them at the command line:

ansible-playbook --extra-vars=variable=value theplaybook.yml

Another, very flexible way is to use the group_vars feature. For each group that a host is in, Ansible looks for a file group_vars/thegroup.yml and for files matching group_vars/thegroup/*.yml. A host can be in several groups at once, which gives you quite some flexibility.

For example, you can put each host into two groups, one for the role the host is playing (like webserver, database server, DNS server etc.), and one for the environment it is in (test, staging, prod). Here is a small example that uses this layout:

# environments
[prod]
www[01:02].example.com
db01.example.com

[test]
db01.test.example.com
www01.test.example.com


# functional roles
[web]
www[01:02].example.com
www01.test.example.com

[db]
db01.example.com
db01.test.example.com

To roll out only the test hosts, you can run

ansible-playbook --limit test theplaybook.yml

and put environment-specific variables in group_vars/test.yml and group_vars/prod.yml, and web server specific variables in group_vars/web.yml etc.

You can use nested data structures in your variables, and if you do, you can configure Ansible to merge those data structures for you. You do that by creating a file called ansible.cfg with this content:

[defaults]
hash_behaviour=merge

That way, you can have a file group_vars/all.yml that sets the default values:

# file group_vars/all.yml
myapp:
    domain: example.com
    db:
        host: db.example.com
        username: myappuser
        instance: myapp

And then override individual elements of that nested data structure, for example in group_vars/test.yml:

# file group_vars/test.yml
myapp:
    domain: test.example.com
    db:
        host: db.test.example.com

The keys that the test group vars file didn't touch, for example myapp.db.username, are inherited from the file all.yml.
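
With hash merging enabled, the effective variables for a host in the test group thus come out roughly like this:

# effective, merged values for a host in the "test" group
myapp:
    domain: test.example.com
    db:
        host: db.test.example.com
        username: myappuser
        instance: myapp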

Roles

Roles are a way to encapsulate parts of a playbook into a reusable component.

Let's consider a real world example that leads to a simple role definition.

For deploying software, you always want to deploy the exact version that you just built, so the relevant part of the playbook is

- apt: name=thepackage={{package_version}} state=present update_cache=yes force=yes

But this requires you to supply the package_version variable whenever you run the playbook, which will not be practical when you instead configure a new machine and need to install several software packages, each with their own playbook.

Hence, we generalize the code to deal with the case that the version number is absent:

- apt: name=thepackage={{package_version}} state=present update_cache=yes force=yes
  when: package_version is defined
- apt: name=thepackage state=present update_cache=yes
  when: package_version is undefined

If you run several such playbooks on the same host, you'll notice that it likely spends most of its time running apt-get update for each playbook. This is necessary the first time, because you might have just uploaded a new package to your local Debian mirror prior to the deployment, but subsequent runs are unnecessary. So you can store the information that a host has already updated its cache in a fact, which is a per-host kind of variable in Ansible.

- apt: update_cache=yes
  when: apt_cache_updated is undefined

- set_fact:
    apt_cache_updated: true

As you can see, the code base for sensibly installing a package has grown a bit, and it's time to factor it out into a role.

Roles are collections of YAML files, with pre-defined names. The commands

$ mkdir roles
$ cd roles
$ ansible-galaxy init custom_package_installation

create an empty skeleton for a role named custom_package_installation. The tasks that previously went into all the playbooks now go into the file tasks/main.yml below the role's main directory:

# file roles/custom_package_installation/tasks/main.yml
- apt: update_cache=yes
  when: apt_cache_updated is undefined
- set_fact:
    apt_cache_updated: true

- apt: name={{package}}={{package_version}} state=present update_cache=yes force=yes
  when: package_version is defined
- apt: name={{package}} state=present update_cache=yes
  when: package_version is undefined

To use the role, first add the line roles_path = roles to the [defaults] section of the file ansible.cfg, and then include it in a playbook like this:

---
- hosts: web
  pre_tasks:
     - # tasks that are executed before the role(s)
  roles:
    - { role: custom_package_installation, package: python-matheval }
  tasks:
    - # tasks that are executed after the role(s)

pre_tasks and tasks are optional; a playbook consisting of only roles being included is totally fine.
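
Such a roles-only playbook is then just a few lines (a sketch, using the role and package from above):

---
- hosts: web
  roles:
    - { role: custom_package_installation, package: python-matheval }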

Summary

Ansible offers a pragmatic approach to configuration management, and is easy to get started with.

It offers modules for low-level tasks such as transferring files and executing shell commands, but also higher-level tasks like managing packages and system users, and even application-specific tasks such as managing PostgreSQL and MySQL users.

Playbooks can contain multiple calls to modules, and also use and set variables and consume roles.

Ansible has many more features, like handlers, which allow you to restart services only once after any changes, dynamic inventories for more flexible server landscapes, vault for encrypting variables, and a rich ecosystem of existing roles for managing common applications and middleware.

For learning more about Ansible, I highly recommend the excellent book Ansible: Up and Running by Lorin Hochstein.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


[/automating-deployments] Permanent link

Tue, 28 Jun 2016

Automating Deployments and Configuration Management


Permanent link

New software versions often need new configuration as well. How do you make sure that the necessary configuration arrives on a target machine at the same time as (or before) the software release that needs it?

The obvious approach is to put the configuration in version control too and deploy it alongside the software.

Taking configuration from a source repository and applying it to running machines is what configuration management software does.

Since Ansible has been used for deployment in the examples so far -- and it's a good configuration management system as well -- it is an obvious choice to use here.

Benefits of Configuration Management

When your infrastructure scales to many machines, and you don't want your time and effort to scale linearly with them, you need to automate things. Keeping configuration consistent across different machines is a requirement, and configuration management software helps you achieve that.

Furthermore, once the configuration comes from a canonical source with version control, tracking and rolling back configuration changes becomes trivial. If there is an outage, you don't need to ask all potentially relevant colleagues whether they changed anything -- your version control system can easily tell you. And if you suspect that a recent change caused the outage, reverting it to see if the revert works is a matter of seconds or minutes.

Once configuration and deployment are automated, building new environments, for example for penetration testing, becomes a much more manageable task.

Capabilities of a Configuration Management System

Typical tasks and capabilities of configuration management software include things like connecting to the remote host, copying files to the host (and often adjusting parameters and filling out templates in the process), ensuring that operating system packages are installed or absent, creating users and groups, controlling services, and even executing arbitrary commands on the remote host.

With Ansible, the connection to the remote host is provided by the core, and the actual steps to be executed are provided by modules. For example the apt_repository module can be used to manage repository configuration (i.e. files in /etc/apt/sources.list.d/), the apt module installs, upgrades, downgrades or removes packages, and the template module typically generates configuration files from variables that the user defined, and from facts that Ansible itself gathered.

There are also higher-level Ansible modules available, for example for managing Docker images, or load balancers from the Amazon cloud.

A complete introduction to Ansible is out of scope here, but I can recommend the online documentation, as well as the excellent book Ansible: Up and Running by Lorin Hochstein.

To get a feeling for what you can do with Ansible, see the ansible-examples git repository.

Assuming that you will find your way around configuration management with Ansible through other resources, I want to talk about how you can integrate it into the deployment pipeline instead.

Integrating Configuration Management with Continuous Delivery

The previous approach of writing one deployment playbook for each application can serve as a starting point for configuration management. You can simply add more tasks to the playbook, for example for creating the configuration files that the application needs. Then each deployment automatically ensures the correct configuration.
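
For example, the deployment playbook of an application could grow a template task for its configuration file (the file names here are made up for illustration):

# hypothetical addition to an application's deployment playbook
- template: src=myapp.conf.j2 dest=/etc/myapp.conf mode=0644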

Since most modules in Ansible are idempotent, that is, repeated execution doesn't change the state of the system after the first time, adding additional tasks to the deployment playbook only becomes problematic when performance suffers. If that happens, you could start to extract some slow steps out into a separate playbook that doesn't run on each deployment.

If you provision and configure a new machine, you typically don't want to manually trigger the deploy step of each application, but rather have a single command that deploys and configures all of the relevant applications for that machine. So it makes sense to also have a playbook for deploying all relevant applications. This can be as simple as a list of include statements that pull in the individual application's playbooks.
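
Such a "global" playbook can be a minimal sketch along these lines (the file names are just examples matching this series):

# file ansible/deploy-all.yml (hypothetical)
---
- include: deploy-python-matheval.yml
- include: deploy-package-info.yml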

You can add another pipeline that applies this "global" configuration to the testing environment, and after manual approval, in the production environment as well.

Stampedes and Synchronization

In the scenario outlined above, the configuration for all related applications lives in the same git repository, and is used as a material in the build and deployment pipelines for all these applications.

A new commit in the configuration repository then triggers a rebuild of all the applications. For a small number of applications, that's usually not a problem, but if you have a dozen or a few dozen applications, this starts to suck up resources unnecessarily, and also means no build workers are available for some time to build changes triggered by actual code changes.

To avoid these build stampedes, a pragmatic approach is to use ignore filters in the git materials. Ignore filters are typically used to avoid rebuilds when only documentation changes, but they can also be used to prevent any change in a repository from triggering a rebuild.

If, in the <materials> section of your GoCD pipeline, you replace

<git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils" />

with

<git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils">
  <filter>
    <ignore pattern="**/*" />
    <ignore pattern="*" />
  </filter>
</git>

then a newly pushed commit to the deployment-utils repo won't trigger this pipeline. A new build, triggered either manually or from a new commit in the application's git repository, still picks up the newest version of the deployment-utils repository.

In the pipeline that deploys all of the configuration, you wouldn't add such a filter.

Now if you change some playbooks, the pipeline for the global configuration runs and rolls out these changes, and you promote the newest version to production. When you then deploy one of your applications to production, and the build happened before the changes to the playbook, it actually uses an older version of the playbook.

This sounds like a very unfortunate situation, but it turns out not to be so bad. The combination of playbook version and application version worked in testing, so it should work in production as well.

To avoid using an older playbook, you can trigger a rebuild of the application, which automatically uses the newest playbook version.

Finally, in practice it is a good idea to bring most changes to production pretty quickly anyway. If you don't do that, you lose track of what has changed, which leads to growing uncertainty about whether a production release is safe. If you follow this ideal of going quickly to production, the version mismatches between the configuration and application pipelines should never become big enough to worry about.

Conclusion

The deployment playbooks that you write for your applications can be extended to do full configuration management for these applications. You can create a "global" Ansible playbook that includes those deployment playbooks, and possibly other configuration, such as basic configuration of the system.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


[/automating-deployments] Permanent link

Tue, 21 Jun 2016

Automating Deployments: Smoke Testing and Rolling Upgrades


Permanent link

In the last installment I talked about unit testing that covers the logic of your application. Unit testing is a good and efficient way to ensure the quality of the business logic, however unit tests tend to test components in isolation.

You should also check that several components work together well, which can be done with integration tests or smoke tests. The distinction between these two is a bit murky at times, but typically integration tests are still done somewhat in isolation, whereas smoke tests are run against an installed copy of the software in a complete environment, with all external services available.

A smoke test thus goes through the whole software stack. For a web application, that typically entails a web server, an application server, a database, and possibly integration points with other services such as single sign-on (SSO) or external data sources.

When to Smoke?

Smoke tests cover a lot of ground at once. A single test might require a working network, correctly configured firewall, web server, application server, database, and so on to work. This is an advantage, because it means that it can detect a big class of errors, but it is also a disadvantage, because it means the diagnostic capabilities are low. When it fails, you don't know which component is to blame, and have to investigate each failure anew.

Smoke tests are also much more expensive than unit tests; they tend to take more time to write, take longer to execute, and are more fragile in the face of configuration or data changes.

So typical advice is to have a low number of smoke tests, maybe one to 20, or maybe around one percent of the unit tests you have.

As an example, if you were to develop a flight search and recommendation engine for the web, your unit tests would cover the different scenarios that the user might encounter, and check that the engine produces the best possible suggestions. In smoke tests, you would just check that you can enter the starting point, destination and date of travel, and that you get a list of flight suggestions at all. If there is a membership area on that website, you would test that you cannot access it without credentials, and that you can access it after logging in. So, three smoke tests, give or take.

White Box Smoke Testing

The examples mentioned above are basically black-box smoke testing, in that they don't care about the internals of the application, and approach the application just like a user. This is very valuable, because ultimately you care about your user's experience.

But sometimes some aspects of the application aren't easy to smoke test, yet break often enough to warrant automated smoke tests. A practical solution is to offer some kind of self diagnosis, for example a web page where the application tests its own configuration for consistency, checks that all the necessary database tables exist, and that external services are reachable.

Then a single smoke test can call the status page, and throw an error whenever either the status page is not reachable, or reports an error. This is a white box smoke test.
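
One way to automate such a check (a sketch, not from the original post; it assumes a hypothetical /status endpoint that returns HTTP 200 only when all self-checks pass) is the Ansible uri module:

# hypothetical white box smoke test using the uri module
- hosts: web
  tasks:
    - uri: url=http://{{ansible_host}}:8800/status status_code=200
      changed_when: False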

Status pages for white box smoke tests can be reused in monitoring checks, but it is still a good idea to explicitly check it as part of the deployment process.

White box smoke testing should not replace black box smoke testing, but rather complement it.

An Example Smoke Test

The matheval application from the previous blog post offers a simple HTTP endpoint, so any HTTP client will do for smoke testing.

Using the curl command line HTTP client, a possible request looks like this:

$ curl  --silent -H "Accept: application/json" --data '["+", 37, 5]' -XPOST  http://127.0.0.1:8800/
42

An easy way to check that the output matches expectations is by piping it through grep:

$ curl  --silent -H "Accept: application/json" --data '["+", 37, 5]' -XPOST  http://127.0.0.1:8800/ | grep ^42$
42

The output is the same as before, but the exit status is non-zero if the output deviates from the expectation.

Integrating Smoke Testing Into the Pipeline

One could add a smoke test stage after each deployment stage (that is, one after the test deployment, one after the production deployment).

This setup would prevent a version of your application from reaching the production environment if it failed smoke tests in the testing environment. Since the smoke test is just a shell command that indicates failure with a non-zero exit status, adding it as a command in your deployment system should be trivial.

If you have just one instance of your application running, this is the best you can do. But if you have a farm of servers, and several instances of the application running behind some kind of load balancer, it is possible to smoke test each instance separately during an upgrade, and abort the upgrade if too many instances fail the smoke test.

All big, successful tech companies protect their production systems with such partial upgrades gated by checks, or with even more elaborate versions thereof.

A simple approach to such a rolling upgrade is to write an ansible playbook for the deployment of each package, and have it run the smoke tests for each machine before moving to the next:

# file smoke-tests/python-matheval
#!/bin/bash
curl  --silent -H "Accept: application/json" --data '["+", 37, 5]' -XPOST  http://$1:8800/ | grep ^42$


# file ansible/deploy-python-matheval.yml
---
- hosts: web
  serial: 1
  max_fail_percentage: 1
  tasks:
    - apt: update_cache=yes package=python-matheval={{package_version}} state=present force=yes
    - local_action: command ../smoke-tests/python-matheval "{{ansible_host}}"
      changed_when: False

As the smoke tests grow over time, it is not practical to cram them all into the ansible playbook, and doing that would also limit reusability. Instead, here they live in a separate file in the deployment-utils repository. Another option would be to build a package from the smoke tests and install it on the machine that ansible runs on.

While it would be easy to execute the smoke tests command on the machine on which the service is installed, running it as a local action (that is, on the control host where the ansible playbook is started) also tests the network and firewall part, and thus more realistically mimics the actual usage scenario.

GoCD Configuration

To run the new deployment playbook from within the GoCD pipeline, change the testing deployment job in the template to:

        <tasks>
          <fetchartifact pipeline="" stage="build" job="build-deb" srcfile="version" />
          <exec command="/bin/bash" workingdir="deployment-utils/ansible/">
            <arg>-c</arg>
            <arg>ansible-playbook --inventory-file=testing --extra-vars=package_version=$(&lt; ../../version) #{deploy_playbook}</arg>
          </exec>
        </tasks>

And the same for production, except that it uses the production inventory file. This change to the template also changes the parameters that need to be defined in the pipeline definition. In the python-matheval example it becomes

  <params>
    <param name="distribution">jessie</param>
    <param name="package">python-matheval</param>
    <param name="deploy_playbook">deploy-python-matheval.yml</param>
  </params>

Since there are two pipelines that share the same template, the second pipeline (for the package-info package) also needs a deployment playbook. It is very similar to the one for python-matheval; it just lacks the smoke test for now.

Conclusion

Writing a small amount of smoke tests is very beneficial for the stability of your applications.

Rolling updates with integrated smoke tests for each system involved are pretty easy to do with ansible, and can be integrated into the GoCD pipeline with little effort. They mitigate the damage of deploying a bad version or a bad configuration by limiting it to one system, or a small number of systems in a bigger cluster.

With this addition, the deployment pipeline is likely to be at least as robust as most manual deployment processes, but it requires much less effort, is easier to scale to more packages, and gives more insight into the timeline of deployments and installed versions.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


[/automating-deployments] Permanent link

Tue, 14 Jun 2016

Automated Deployments: Unit Testing


Permanent link

Automated testing is absolutely essential for automated deployments. When you automate deployments, you automatically do them more often than before, which means that manual testing becomes more work and more annoying, and is usually skipped sooner or later.

So to maintain a high degree of confidence that a deployment won't break the application, automated tests are the way to go.

And yet, I've written twenty blog posts about automating deployments, and this is the first about testing. Why did I drag my feet like this?

For one, testing is hard to generalize. But more importantly, the example project used so far doesn't play well with my usual approach to testing.

Of course one can still test it, but it's not an idiomatic approach that scales to real applications.

The easy way out is to consider a second example project. This also provides a good excuse to test the GoCD configuration template, and explore another way to build Debian packages.

Meet python-matheval

python-matheval is a stupid little web service that accepts a tree of mathematical expressions encoded in JSON format, evaluates it, and returns the result in the response. And as the name implies, it's written in python. Python3, to be precise.

The actual evaluation logic is quite compact:

# file src/matheval/evaluator.py
from functools import reduce
import operator

ops = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def math_eval(tree):
    if not isinstance(tree, list):
        return tree
    op = ops[tree.pop(0)]
    return reduce(op, map(math_eval, tree))

Exposing it to the web isn't much effort either, using the Flask library:

# file src/matheval/frontend.py
#!/usr/bin/python3

from flask import Flask, request

from matheval.evaluator import math_eval

app = Flask(__name__)

@app.route('/', methods=['GET', 'POST'])
def index():
    tree = request.get_json(force=True)
    result = math_eval(tree)
    return str(result) + "\n"

if __name__ == '__main__':
    app.run(debug=True)

The rest of the code is part of the build system. As a python package, it should have a setup.py in the root directory

# file setup.py
#!/usr/bin/env python

from setuptools import setup

setup(name='matheval',
      version='1.0',
      description='Evaluation of expression trees',
      author='Moritz Lenz',
      author_email='moritz.lenz@gmail.com',
      url='https://deploybook.com/',
      package_dir={'': 'src'},
      requires=['flask', 'gunicorn'],
      packages=['matheval']
     )

Once a working setup script is in place, the tool dh-virtualenv can be used to create a Debian package containing the project itself and all of the python-level dependencies.

This creates rather large Debian packages (in this case, around 4 MB for less than a kilobyte of actual application code), but on the upside it allows several applications on the same machine to depend on different versions of the same python library. The simple usage of the resulting Debian packages makes it well worth it in many use cases.

Using dh-virtualenv is quite easy:

# file debian/rules
#!/usr/bin/make -f
export DH_VIRTUALENV_INSTALL_ROOT=/usr/share/python-custom

%:
    dh $@ --with python-virtualenv --with systemd

override_dh_virtualenv:
    dh_virtualenv --python=/usr/bin/python3

See the github repository for all the other boring details, like the systemd service files and the control file.

The integration into the GoCD pipeline is easy, using the previously developed configuration template:

<pipeline name="python-matheval" template="debian-base">
  <params>
    <param name="distribution">jessie</param>
    <param name="package">python-matheval</param>
    <param name="target">web</param>
  </params>
  <materials>
    <git url="https://github.com/moritz/python-matheval.git" dest="python-matheval" materialName="python-matheval" />
    <git url="https://github.com/moritz/deployment-utils.git" dest="deployment-utils" materialName="deployment-utils" />
  </materials>
</pipeline>

Getting Started with Testing, Finally

It is good practice and a good idea to cover business logic with unit tests.

The way that evaluation logic is split into a separate function makes it easy to test said function in isolation. A typical way is to feed some example inputs into the function, and check that the return value is as expected.

# file test/test-evaluator.py
import unittest
from matheval.evaluator import math_eval

class EvaluatorTest(unittest.TestCase):
    def _check(self, tree, expected):
        self.assertEqual(math_eval(tree), expected)

    def test_basic(self):
        self._check(5, 5)
        self._check(['+', 5], 5)
        self._check(['+', 5, 7], 12)
        self._check(['*', ['+', 5, 4], 2], 18)

if __name__ == '__main__':
    unittest.main()

One can execute the test suite (here just one test file so far) with the nosetests command from the nose python package:

$ nosetests
.
----------------------------------------------------------------------
Ran 1 test in 0.004s

OK

The python way of exposing the test suite is to implement the test command in setup.py, which can be done with the line

test_suite='nose.collector',

in the setup() call in setup.py. And of course one needs to add nose to the list passed to the requires argument.

With these measures in place, the debhelper and dh-virtualenv tooling takes care of executing the test suite as part of the Debian package build. If any of the tests fail, so does the build.

Running the test suite in this way is advantageous, because it runs the tests with exactly the same versions of all involved python libraries as end up in the Debian package, and thus make up the runtime environment of the application. It is possible to achieve this through other means, but other approaches usually take much more work.

Conclusions

You should have enough unit tests to make you confident that the core logic of your application works correctly. It is a very easy and pragmatic solution to run the unit tests as part of the package build, ensuring that only "good" versions of your software are ever packaged and installed.

In future blog posts, other forms of testing will be explored.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


[/automating-deployments] Permanent link