
Thu, 28 Jan 2016

Introducing Go Continuous Delivery



Go Continuous Delivery (GoCD for short, or simply Go) is an open source tool that controls automated build and deployment processes.

It consists of a server component and one or more agents. The server holds the pipeline configuration, polls source code repositories for changes, schedules and distributes work, collects artifacts, presents a web interface to visualize and control it all, and offers a mechanism for manual approval of steps. The agents connect to the server and carry out the actual jobs in the build pipeline.

Pipeline Organization

Every build, deployment or test job that GoCD executes must be part of a pipeline. A pipeline consists of one or more linearly arranged stages. Within a stage, jobs can run in parallel and are individually distributed to agents. Within a job, tasks are again executed linearly. The most general task is the execution of an external program; other tasks include the retrieval of artifacts, or specialized things such as running a Maven build.

Matching of Jobs to Agents

When an agent is idle, it polls the server for work. If the server has jobs to run, it uses two criteria to decide if the agent is fit for carrying out the job: environments and resources.

Each job is part of a pipeline, and a pipeline is part of an environment. On the other hand, each agent is configured to be part of one or more environments. An agent only accepts jobs from pipelines from one of its environments.

Resources are user-defined labels that describe what an agent has to offer, and inside a pipeline configuration you can specify which resources a job needs. For example, you can define that a job requires the phantomjs resource to test a web application; then only agents to which you assign this resource will execute that job. It is also a good idea to add the operating system and version as resources. In the example above, the agent might have the phantomjs, debian and debian-jessie resources, offering the author of the job some choice of granularity for specifying the required operating system.

Installing the Go Server on Debian

To install the Go server on a Debian or Debian-based operating system, first you have to make sure you can download Debian packages via HTTPS:

$ apt-get install -y apt-transport-https

Then you need to configure the package sources:

$ echo 'deb http://dl.bintray.com/gocd/gocd-deb/ /' > /etc/apt/sources.list.d/gocd.list
$ curl https://bintray.com/user/downloadSubjectPublicKey?username=gocd | apt-key add -

And finally install it:

$ apt-get update && apt-get install -y go-server
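
If you want to check that the server came up before opening a browser, a quick sanity check from the command line works too. This is just a sketch: it assumes the web interface lives under the /go path, and -k skips verification of the self-signed HTTPS certificate.

$ service go-server status
$ curl -k -I https://localhost:8154/go/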

When you now point your browser at port 8154 of the Go server for HTTPS (ignore the SSL security warnings) or port 8153 for HTTP, you should see the Go server's web interface.

To prevent unauthenticated access, create a password file (you need to have the apache2-utils package installed to have the htpasswd command available) on the command line:

$ htpasswd -c -s /etc/go-server-passwd go-admin
New password:
Re-type new password:
Adding password for user go-admin
$ chown go: /etc/go-server-passwd
$ chmod 600 /etc/go-server-passwd

In the Go web interface, click on the Admin menu and then "Server Configuration". In the "User Management" section, enter the path /etc/go-server-passwd in the field "Password File Path" and click on "Save" at the bottom of the form.

Immediately afterwards, the Go server asks you for a username and password.

You can also use LDAP or Active Directory for authentication.

Installing a Go Worker on Debian

On each server where you want to execute the automated build and deployment steps, you need to install a Go agent, which will connect to the server and poll it for work. On each of these servers, carry out the same first three steps as when installing the server, to ensure that you can install packages from the Go package repository. Then, of course, install the Go agent:

$ apt-get install -y apt-transport-https
$ echo 'deb http://dl.bintray.com/gocd/gocd-deb/ /' > /etc/apt/sources.list.d/gocd.list
$ curl https://bintray.com/user/downloadSubjectPublicKey?username=gocd | apt-key add -
$ apt-get update && apt-get install -y go-agent

Then edit the file /etc/default/go-agent. The first line should read

GO_SERVER=127.0.0.1

Change the right-hand side to the hostname or IP address of your go server, and then start the agent:

$ service go-agent start

After a few seconds, the agent has contacted the server, and when you click on the "Agents" menu in the server's web frontend, you should see the agent:

("lara" is the host name of the agent here).

A Word on Environments

Go makes it possible to run agents in specific environments. You could, for example, run a Go agent on each testing and each production machine, and use the matching of pipelines to agent environments to ensure that an installation step happens on the right machine in the right environment. If you go with this model, you can also use Go to copy the build artifacts to the machines where they are needed.

I chose not to do this, because I didn't want to have to install a go agent on each machine that I want to deploy to. Instead I use Ansible, executed on a Go worker, to control all machines in an environment. This requires managing the SSH keys that Ansible uses, and distributing packages through a Debian repository. But since Debian seems to require a repository anyway to be able to resolve dependencies, this is not much of an extra hurdle.

So don't be surprised when the example project here only uses a single environment in Go, which I call Control.

First Contact with Go's XML Configuration

There are two ways to configure your Go server: through the web interface, and through a configuration file in XML. You can also edit the XML config through the web interface.

While the web interface is a good way to explore Go's capabilities, it quickly becomes annoying to use due to too much clicking. Using an editor with good XML support gets things done much faster, and it lends itself better to compact explanation, so that's the route I'm going here.

In the Admin menu, the "Config XML" item lets you see and edit the server config. This is what a pristine XML config looks like, with one agent already registered:

<?xml version="1.0" encoding="utf-8"?>
<cruise xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="cruise-config.xsd" schemaVersion="77">
<server artifactsdir="artifacts" commandRepositoryLocation="default" serverId="b2ce4653-b333-4b74-8ee6-8670be479df9">
    <security>
    <passwordFile path="/etc/go-server-passwd" />
    </security>
</server>
<agents>
    <agent hostname="lara" ipaddress="192.168.2.43" uuid="19e70088-927f-49cc-980f-2b1002048e09" />
</agents>
</cruise>

The ServerId and the data of the agent will differ in your installation, even if you followed the same steps.

To create an environment and put the agent in, add the following section somewhere within <cruise>...</cruise>:

<environments>
    <environment name="Control">
    <agents>
        <physical uuid="19e70088-927f-49cc-980f-2b1002048e09" />
    </agents>
    </environment>
</environments>

(The agent UUID must be that of your agent, not of mine).

To give the agent some resources, you can change the <agent .../> tag in the <agents> section to read:

<agent hostname="lara" ipaddress="192.168.2.43" uuid="19e70088-927f-49cc-980f-2b1002048e09">
  <resources>
    <resource>debian-jessie</resource>
    <resource>build</resource>
    <resource>debian-repository</resource>
  </resources>
</agent>

Creating an SSH key

It is convenient for Go to have an SSH key without a password, for example to be able to clone git repositories via SSH.

To create one, run the following commands on the server:

$ su - go
$ ssh-keygen -t rsa -b 2048 -N '' -f ~/.ssh/id_rsa

And either copy the resulting .ssh directory and the files therein onto each agent into the /var/go directory (and remember to set owner and permissions as they were created originally), or create a new key pair on each agent.
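
For the copying variant, something like the following sketch works. It assumes root SSH access to the agent, a hypothetical agent host name agent01, and the go user's home directory at /var/go on both machines:

$ rsync -a /var/go/.ssh/ root@agent01:/var/go/.ssh/
$ ssh root@agent01 'chown -R go: /var/go/.ssh && chmod 700 /var/go/.ssh && chmod 600 /var/go/.ssh/id_rsa'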

Ready to Go

Now that the server and an agent have some basic configuration, they are ready for their first pipeline configuration. Which we'll get to soon :-).


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Sun, 24 Jan 2016

Architecture of a Deployment System



An automated build and deployment system is structured as a pipeline.

A new commit or branch in a version control system triggers the instantiation of the pipeline, and starts executing the first of a series of stages. When a stage succeeds, it triggers the next one. If it fails, the entire pipeline instance stops.

Then manual intervention is necessary, typically by adding a new commit that fixes code or tests, or by fixing things with the environment or the pipeline configuration. A new instance of the pipeline then has a chance to succeed.

Deviations from the strict pipeline model are possible: branches, potentially executed in parallel, allow for example running different tests in different environments, and waiting with the next step until both have completed successfully.

The typical stages are building, running the unit tests, deployment to a first test environment, running integration tests there, potentially deployment to and tests in various test environments, and finally deployment to production.

Sometimes, these stages blur a bit. For example, a typical build of Debian packages also runs the unit tests, which alleviates the need for a separate unit testing stage. Likewise if the deployment to an environment runs integration tests for each host it deploys to, there is no need for a separate integration test stage.

Typically there is a piece of software that controls the flow of the whole pipeline. It prepares the environment for a stage, runs the code associated with the stage, collects its output and artifacts (that is, files that the stage produces and that are worth keeping, like binaries or test output), determines whether the stage was successful, and then proceeds to the next.

From an architectural standpoint, it relieves the stages of having to know what stage comes next, and even how to reach the machine on which it runs. So it decouples the stages.

Anti-Pattern: Separate Builds per Environment

If you use a branch model like git flow for your source code, it is tempting to automatically deploy the develop branch to the testing environment, and then make releases, merge them into the master branch, and deploy that to the production environment.

It is tempting because it is a straight-forward extension of an existing, proven workflow.

Don't do it.

The big problem with this approach is that you don't actually test what's going to be deployed, and, on the flip side, deploy something untested to production. Even if you have a staging environment before deploying to production, you are invalidating all the testing you did in the testing environment if you don't actually ship the binary or package that you tested there.

If you build "testing" and "release" packages from different sources (like different branches), the resulting binaries will differ. Even if you use the exact same source, building twice is still a bad idea, because many builds aren't reproducible. Non-deterministic compiler behavior, differences in environments and dependencies all can lead to packages that worked fine in one build, and failed in another.

It is best to avoid such potential differences and errors by deploying to production exactly the same build that you tested in the testing environment.

Differences in behavior between the environments, where they are desirable, should be implemented by configuration that is not part of the build. (It should be self-evident that the configuration should still be under version control, and also automatically deployed. There are tools that specialize in deploying configuration, like Puppet, Chef and Ansible.)



Tue, 19 Jan 2016

Automating Deployments: 3+ Environments



Software is written to run in a production environment. This is where the goal of the business is achieved: making money for the business, or reaching and educating people, or whatever the reason for writing the software is. For websites, this is typically the Internet-facing public servers.

But the production environment is not where you want to develop software. Developing is an iterative process, and comes with its own share of mistakes and corrections. You don't want your customers to see all those mistakes as you make them, so you develop in a different environment, maybe on your PC or laptop instead of a server, with a different database (though hopefully using the same database software as in the production environment), possibly using a different authentication mechanism, and far less data than the production environment has.

You'll likely want to prevent certain interactions in the development environment that are desirable in production: Sending notifications (email, SMS, voice, you name it), charging credit cards, provisioning virtual machines, opening rack doors in your data center and so on. How that is done very much depends on the interaction. You can configure a mail transfer agent to deliver all mails to a local file or mail box. Some APIs have dedicated testing modes or installations; in the worst case, you might have to write a mock implementation that answers similarly to the original API, but doesn't carry out the action that the original API does.

Deploying software straight to production if it has only been tested on the developer's machine is a rather bad practice. Often the environments are too different, and the developer unknowingly relied on a feature of his environment that isn't the same in the production environment. Thus it is quite common to have one or more environments in between where the software is deployed and tested, and only propagated to the next deployment environment when all the tests in the previous one were successful.

After the software is modified in the development environment, it is deployed to the testing environment (with its own database), and if all tests were successful, propagated to the production environment.

One of these stages is often called testing. This is where the software is shown to the stakeholders to gather feedback, and if manual QA steps are required, they are often carried out in this environment (unless there is a separate environment for that).

A reason to have another non-production environment is to test service dependencies. If several different software components are deployed to the testing environment, and you decide to deploy one or two at a time to production, things might break in production. The component you deployed might have a dependency on a newer version of another component, and since the testing environment contained that newer version, nobody noticed. Or maybe a database upgrade in the testing environment failed, and had to be repaired manually; you don't want the same to happen in a production setting, so you decide to test in another environment first.

After the software is modified in the development environment, it is deployed to the testing environment (with its own database), and if all tests were successful, propagated to the staging environment. Only if this works is the deployment to production carried out.

Thus many companies have another staging environment that mirrors the production environment as closely as possible. A planned production deployment is first carried out in the staging environment, and on success done in production too, or rolled back on error.

There are valid reasons to have even more environments. If automated performance testing is performed, it should be done in a separate environment where no manual usage is possible, to avoid distorting the results. Other tests such as automated acceptance or penetration testing are best done in their own environments.

One can add more environments for automated acceptance, penetration and performance testing, for example; those typically come before the staging environment.

In addition, dedicated environments for testing and evaluating explorative features are possible.

It should be noted that while these environments all serve valid purposes, they also come at a cost. Machines, whether virtual or physical, on which all those environments run must be available, and they consume resources. They must be set up initially and maintained. License costs must be considered (for example for proprietary databases). Also, the time for deploying code increases as the number of environments increases. With more environments, automating deployments, and maybe even the management and configuration of the infrastructure, becomes mandatory.



Sat, 16 Jan 2016

Automating Deployments: Installing Packages



After the long build-up of building and distributing and authenticating packages, actually installing them is easy. On the target system, run

$ apt-get update
$ apt-get install package-info

(replace package-info with the package you want to install, if that deviates from the example used previously).

If the package is of high quality, it takes care of restarting services where necessary, so no additional actions are necessary afterwards.

Coordination with Ansible

If several hosts are needed to provide a service, it can be beneficial to coordinate the update, for example only updating one or two hosts at a time, or doing a small integration test on each after moving on to the next.

A nice tool for doing that is Ansible, an open source IT automation system.

Ansible's starting point is an inventory file, which lists the hosts that Ansible works with, optionally in groups, and how to access them.

It is best practice to have one inventory file for each environment (production, staging, development, load testing etc.) with the same group names, so that you can deploy to a different environment simply by using a different inventory file.

Here is an example for an inventory file with two web servers and a database server:

# production
[web]
www01.yourorg.com
www02.yourorg.com

[database]
db01.yourorg.com

[all:vars]
ansible_ssh_user=root

Maybe the staging environment needs only a single web server:

# staging
[web]
www01.staging.yourorg.com

[database]
db01.staging.yourorg.com

[all:vars]
ansible_ssh_user=root

Ansible is organized in modules for separate tasks. Managing Debian packages is done with the apt module:

$ ansible -i staging web -m apt -a 'name=package-info update_cache=yes state=latest'

The -i option specifies the path to the inventory file, here staging. The next argument is the group of hosts (or a single host, if desired), and -m apt tells Ansible to use the apt module.

What comes after the -a is a module-specific command: name specifies a Debian package, update_cache=yes forces Ansible to run apt-get update before installing, and state=latest says that we want the latest version installed.

If instead of the latest version we want a specific version, -a 'name=package-info=0.1 update_cache=yes state=present force=yes' is the way to go. Without force=yes, apt wouldn't downgrade the package to actually get the desired version.
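
To check afterwards which version actually ended up on the hosts, an ad-hoc run of the command module does the trick (a small sketch; dpkg-query prints the package name and the installed version for each host):

$ ansible -i staging web -m command -a 'dpkg-query -W package-info'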

This uses the ad-hoc mode of Ansible. More sophisticated deployments use playbooks, of which I hope to write more later. Those also allow you to do configuration tasks such as adding repository URLs and GPG keys for package authentication.



Wed, 13 Jan 2016

Automating Deployments: Distributing Debian Packages with Aptly



Once a Debian package is built, it must be distributed to the servers it is to be installed on.

Debian, like all other operating systems I know of, uses a pull model for that. That is, the package and its meta data are stored on a server, and the client contacts that server to request the meta data and the package.

The sum of meta data and packages is called a repository. In order to distribute packages to the servers that need them, we must set up and maintain such a repository.

Signatures

In Debian land, packages are also signed cryptographically, to ensure packages aren't tampered with on the server or during transmission.

So the first step is to create a key pair that is used to sign this particular repository. (If you already have a PGP key for signing packages, you can skip this step).

The following assumes that you are working with a pristine system user that does not have a gnupg keyring yet, and which will be used to maintain the debian repository. It also assumes you have the gnupg package installed.

$ gpg --gen-key

This asks a bunch of questions, like your name and email address, key type and bit width, and finally a pass phrase. I left the pass phrase empty to make it easier to automate updating the repository, but that's not a requirement.

$ gpg --gen-key
gpg (GnuPG) 1.4.18; Copyright (C) 2014 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

gpg: directory `/home/aptly/.gnupg' created
gpg: new configuration file `/home/aptly/.gnupg/gpg.conf' created
gpg: WARNING: options in `/home/aptly/.gnupg/gpg.conf' are not yet active during this run
gpg: keyring `/home/aptly/.gnupg/secring.gpg' created
gpg: keyring `/home/aptly/.gnupg/pubring.gpg' created
Please select what kind of key you want:
   (1) RSA and RSA (default)
   (2) DSA and Elgamal
   (3) DSA (sign only)
   (4) RSA (sign only)
Your selection? 1
RSA keys may be between 1024 and 4096 bits long.
What keysize do you want? (2048) 
Requested keysize is 2048 bits
Please specify how long the key should be valid.
         0 = key does not expire
      <n>  = key expires in n days
      <n>w = key expires in n weeks
      <n>m = key expires in n months
      <n>y = key expires in n years
Key is valid for? (0) 
Key does not expire at all
Is this correct? (y/N) y
You need a user ID to identify your key; the software constructs the user ID
from the Real Name, Comment and Email Address in this form:
    "Heinrich Heine (Der Dichter) <heinrichh@duesseldorf.de>"

Real name: Aptly Signing Key
Email address: automatingdeployments@gmail.com
You selected this USER-ID:
    "Moritz Lenz <automatingdeployments@gmail.com>"

Change (N)ame, (C)omment, (E)mail or (O)kay/(Q)uit? O
You need a Passphrase to protect your secret key.

You don't want a passphrase - this is probably a *bad* idea!
I will do it anyway.  You can change your passphrase at any time,
using this program with the option "--edit-key".

We need to generate a lot of random bytes. It is a good idea to perform
some other action (type on the keyboard, move the mouse, utilize the
disks) during the prime generation; this gives the random number
generator a better chance to gain enough entropy.
..........+++++
.......+++++

Not enough random bytes available.  Please do some other work to give
the OS a chance to collect more entropy! (Need 99 more bytes)
..+++++
gpg: /home/aptly/.gnupg/trustdb.gpg: trustdb created
gpg: key 071B4856 marked as ultimately trusted
public and secret key created and signed.

gpg: checking the trustdb
gpg: 3 marginal(s) needed, 1 complete(s) needed, PGP trust model
gpg: depth: 0  valid:   1  signed:   0  trust: 0-, 0q, 0n, 0m, 0f, 1u
pub   2048R/071B4856 2016-01-10
      Key fingerprint = E80A D275 BAE1 DEDE C191  196D 078E 8ED8 071B 4856
uid                  Moritz Lenz <automatingdeployments@gmail.com>
sub   2048R/FFF787F6 2016-01-10

Near the bottom the line starting with pub contains the key ID:

pub   2048R/071B4856 2016-01-10

We'll need the public key later, so it's best to export it:

$ gpg --export --armor 071B4856 > pubkey.asc

Preparing the Repository

There are several options for managing Debian repositories. My experience with debarchiver is mixed: Once set up, it works, but it does not give immediate feedback on upload; rather it communicates the success or failure by email, which isn't very well-suited for automation.

Instead I use aptly, which works fine from the command line, and additionally supports several versions of the package in one repository.

To initialize a repo, we first have to come up with a name. Here I call it internal.

$ aptly repo create -distribution=jessie -architectures=amd64,i386,all -component=main internal

Local repo [internal] successfully added.
You can run 'aptly repo add internal ...' to add packages to repository.

$ aptly publish repo -architectures=amd64,i386,all internal
Warning: publishing from empty source, architectures list should be complete, it can't be changed after publishing (use -architectures flag)
Loading packages...
Generating metadata files and linking package files...
Finalizing metadata files...
Signing file 'Release' with gpg, please enter your passphrase when prompted:
Clearsigning file 'Release' with gpg, please enter your passphrase when prompted:

Local repo internal has been successfully published.
Please setup your webserver to serve directory '/home/aptly/.aptly/public' with autoindexing.
Now you can add following line to apt sources:
  deb http://your-server/ jessie main
Don't forget to add your GPG key to apt with apt-key.

You can also use `aptly serve` to publish your repositories over HTTP quickly.

As the message says, there needs to be an HTTP server that makes these files available. For example, an Apache virtual host config for serving these files could look like this:

<VirtualHost *:80>
        ServerName apt.example.com
        ServerAdmin moritz@example.com

        DocumentRoot /home/aptly/.aptly/public/
        <Directory /home/aptly/.aptly/public/>
                Options +Indexes +FollowSymLinks

                Require all granted
        </Directory>

        # Possible values include: debug, info, notice, warn, error, crit,
        # alert, emerg.
        LogLevel notice
        CustomLog /var/log/apache2/apt/access.log combined
        ErrorLog /var/log/apache2/apt/error.log
        ServerSignature On
</VirtualHost>

After creating the logging directory (mkdir -p /var/log/apache2/apt/), enabling the virtual host (a2ensite apt.conf) and restarting Apache, the Debian repository is ready.
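
Spelled out as commands, and assuming the virtual host configuration was saved as /etc/apache2/sites-available/apt.conf, that is:

$ mkdir -p /var/log/apache2/apt/
$ a2ensite apt.conf
$ service apache2 reload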

Adding Packages to the Repository

Now that the repository is set up, you can add a package by running

$ aptly repo add internal package-info_0.1-1_all.deb
$ aptly publish update internal
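
To verify that the package really ended up in the repository, you can list its contents (a quick check, assuming your aptly version supports the -with-packages flag):

$ aptly repo show -with-packages internal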

Configuring a Host to use the Repository

Copy the PGP public key with which the repository is signed (pubkey.asc) to the host which shall use the repository, and import it:

$ apt-key add pubkey.asc

Then add the actual package source:

$ echo "deb http://apt.example.com/ jessie main" > /etc/apt/source.list.d/internal

After an apt-get update, the contents of the repository are available, and an apt-cache policy package-info shows the repository as a possible source for this package:

$ apt-cache policy package-info
package-info:
  Installed: (none)
  Candidate: 0.1-1
  Version table:
 *** 0.1-1 0
        990 http://apt.example.com/ jessie/main amd64 Packages
        100 /var/lib/dpkg/status

This concludes the whirlwind tour through debian repository management and thus package distribution. Next up will be the actual package installation.



Sat, 09 Jan 2016

Automating Deployments: Debian Packaging for an Example Project



After general notes on Debian packaging, I want to introduce an example project, and how it's packaged.

The Project

package-info is a minimalistic web project, written solely for demonstrating packaging and deployment. When called in the browser, it produces a text document containing the output of dpkg -l, which gives an overview of installed (and potentially previously installed) packages, their version, installation state and a one-line description.

It is written in Perl using the Mojolicious web framework.

The actual code resides in the file usr/lib/package-info/package-info and is delightfully short:

#!/usr/bin/perl
use Mojolicious::Lite;

plugin 'Config';

get '/' => sub {
    my $c = shift;

    $c->render(text => scalar qx/dpkg -l/, format => 'text');
};

app->start;

It loads the "Lite" version of the framework, registers a route for the URL /, which renders as plain text the output of the system command dpkg -l, and finally starts the application.

It also loads the Config-Plugin, which is used to specify the PID file for the server process.

The corresponding config file in etc/package-info.conf looks like this:

#!/usr/bin/perl
{
    hypnotoad => {
        pid_file => '/var/run/package-info/package-info.pid',
    },
}

which again is perl code, and specifies the location of the PID file when run under hypnotoad, the application server recommended for use with Mojolicious.

To test it, you can install the libmojolicious-perl package, and run MOJO_CONFIG=$PWD/etc/package-info.conf morbo usr/lib/package-info/package-info. This starts a development server on port 3000. Pointing your browser at http://127.0.0.1:3000/, you should see a list like this:

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                  Version                              Architecture Description
+++-=====================================-====================================-============-===============================================================================
ii  ack-grep                              2.14-4                               all          grep-like program specifically for large source trees
ii  acl                                   2.2.52-2                             amd64        Access control list utilities
rc  acroread-debian-files                 0.2.5                                amd64        Debian specific parts of Adobe Acrobat Reader
ii  adduser                               3.113+nmu3                           all          add and remove users and groups
ii  adwaita-icon-theme                    3.14.0-2                             all          default icon theme of GNOME

though much longer.

Initial Packaging

Installing dh-make and running dh_make --createorig -p package-info_0.1 gives us a debian directory along with several files.

I started by editing debian/control to look like this:

Source: package-info
Section: main
Priority: optional
Maintainer: Moritz Lenz 
Build-Depends: debhelper (>= 9)
Standards-Version: 3.9.5

Package: package-info
Architecture: all
Depends: ${misc:Depends}, libmojolicious-perl
Description: Web service for getting a list of installed packages

Debian packages support the notion of a source package, which a maintainer uploads to the Debian build servers, and from which one or more binary packages are built. The control file reflects this structure, with the first half being about the source package and its build dependencies, and the second half being about the binary package.

Next I deleted the file debian/source/format, which by default indicates the use of the quilt patch management system, which isn't typically used in git based workflows.

I leave debian/rules, debian/compat and debian/changelog untouched, and create a file debian/install with two lines:

etc/package-info.conf
usr/lib/package-info/package-info

In lieu of a proper build system, this tells dh_install which files to copy into the debian package.

This is enough for building a Debian package. To trigger the build, this command suffices:

debuild -b -us -uc

The -b instructs debuild to only create a binary package, and the two -u* options skip the steps where debuild cryptographically signs the generated files.

This command creates three files in the directory above the source tree: package-info_0.1-1_all.deb, package-info_0.1-1_amd64.changes and package-info_0.1-1_amd64.build. The .deb file contains the actual program code and meta data, the .changes file meta data about the package as well as the last changelog entry, and the .build file a transcript of the build process.

A Little Daemonology

Installing the .deb file from the previous step would give you working software, but you'd have to start it manually.

Instead, it is useful to provide means to automatically start the server process at system boot time. Traditionally, this has been done by shipping init scripts. Since Debian transitioned to systemd as its init system with the "Jessie" / 8 version, systemd service files are the new way to go, and luckily much shorter than a robust init script.

The service file goes into debian/package-info.service:

[Unit]
Description=Package installation information via http
Requires=network.target
After=network.target

[Service]
Type=simple
RemainAfterExit=yes
SyslogIdentifier=package-info
PIDFile=/var/run/package-info/package-info.pid
Environment=MOJO_CONFIG=/etc/package-info.conf
ExecStart=/usr/bin/hypnotoad /usr/lib/package-info/package-info -f
ExecStop=/usr/bin/hypnotoad -s /usr/lib/package-info/package-info
ExecReload=/usr/bin/hypnotoad /usr/lib/package-info/package-info

The [Unit] section contains the service description, as well as the specification of when it starts. The [Service] section describes the service type, where simple means that systemd expects the start command not to terminate as long as the process is running. With Environment, environment variables can be set for all three of the ExecStart, ExecStop and ExecReload commands.

Another debhelper, dh-systemd, takes care of installing the service file, as well as making sure the service file is read and the service started or restarted after a package installation. To enable it, dh-systemd must be added to the Build-Depends line in the file debian/control, and the catch-all build rule in debian/rules changed to:

%:
        dh $@ --with systemd

To enable hypnotoad to write the PID file, the containing directory must exist. Writing /var/run/package-info/ into a new debian/dirs file ensures this directory is created at package installation.
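
Creating that file is a one-liner, run from the root of the source tree:

$ echo /var/run/package-info/ > debian/dirs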

To test the changes, again invoke debuild -b -us -uc and install the resulting .deb file with sudo dpkg -i ../package-info_0.1-1_all.deb.

The server process should now listen on port 8080, so you can test it with curl http://127.0.0.1:8080/ | head.

A Bit More Security

As it is now, the application server and the application run as the root user, which violates the principle of least privilege. Instead they should run as a separate user, package-info, that isn't allowed to do much else.

To make the installation as smooth as possible, the package should create the user itself if it doesn't exist. The debian/postinst script is run at package installation time, and is well suited for such tasks:

#!/bin/sh

set -e
test $DEBIAN_SCRIPT_DEBUG && set -v -x

export PATH=$PATH:/sbin:/usr/sbin:/bin:/usr/bin

USER="package-info"

case "$1" in
    configure)
        if ! getent passwd $USER >/dev/null ; then
            adduser --system $USER
        fi
        chown -R $USER /var/run/package-info/
    ;;
esac

#DEBHELPER#

exit 0

There are several actions that a postinst script can execute, and configure is the right one for creating users. At this time, the files are already installed.

Note that it also changes the permissions for the directory in which the PID file is created, so that when hypnotoad is invoked as the package-info user, it can still create the PID file.

Please note the presence of the #DEBHELPER# tag, which the build system replaces with extra actions. Some of these come from dh-systemd, and take care of restarting the service after installation, and enabling it for starting after a reboot on first installation.

To set the user under which the service runs, add the line User=package-info to the [Service] section of debian/package-info.service.

Linux offers more security features that can be enabled in a declarative way in the [Service] section of the systemd service file. Here are a few that protect the rest of the system from the server process, should it be exploited:

PrivateTmp=yes
InaccessibleDirectories=/home
ReadOnlyDirectories=/bin /sbin /usr /lib /etc

Additional precautions can be taken by limiting the number of processes that can be spawned and the available memory through the LimitNPROC and MemoryLimit options.

The importance of good packaging

If you tune your packages so that they do as much configuration and environment setup themselves as possible, you benefit two-fold. It makes it easy to use the package in any context, regardless of whether it is embedded in a deployment system. But even if it is part of a deployment system, putting the package-specific bits into the package itself helps you keep the deployment system generic, and thus easy to extend to other packages.

For example configuration management systems such as Ansible, Chef and Puppet allow you to create users and to restart services when a new package version is available, but if you rely on that, you have to treat each package separately in the configuration management system.



Wed, 06 Jan 2016

Automating Deployments: Building Debian Packages



I have argued before that it is a good idea to build packages from software you want to automatically deploy. The package manager gives you dependency management as well as the option to execute code at defined points in the installation process, which is very handy for restarting services after installation, creating necessary OS-level users and so on.

Which package format to use?

There are many possible package formats, and package managers for them, out there. Many ecosystems and programming languages come with their own: Perl uses CPAN.pm or cpanminus to install Perl modules, the NodeJS community uses npm, Ruby has the gem installer, Python has pip and easy_install, and so on.

One of the disadvantages is that they only work well for one language. If you or your company use software written in multiple programming languages, and you choose the language-specific packaging formats and tools for each, you burden yourself and the operators with having to know (and be aware of) all of these technologies.

Operations teams are usually familiar with the operating system's package manager, so using that seems like an obvious choice, especially if the same operating system family is used throughout the whole organization. In specialized environments, other solutions might be preferable.

What's in a Debian package, and how do I build one?

A .deb file is an ar archive with meta data about the archive format version, meta data for the package (name, version, installation scripts) and the files that are to be installed.

While it is possible to build such a package directly, the easier and much more common route is to use the tooling provided by the devscripts package. These tools expect the existence of a debian/ directory with various files in them.

debian/control contains information such as the package name, dependencies, maintainer and description. debian/rules is a makefile that controls the build process of the debian package. debian/changelog contains a human-readable summary of changes to the package. The top-most changelog entry determines the resulting version of the package.

You can use dh_make from the dh-make package to generate a skeleton of files for the debian/ directory, which you can then edit to your liking. It will ask you for the architecture of the package. You can use a specific one like amd64, or the word any for packages that can be built on any architecture. If the resulting package is architecture independent (as is the case for many scripting languages), using all as the architecture is appropriate.
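
As a minimal sketch, with yourpackage and the version number 0.1 as placeholders, and run from the top of your source tree:

$ apt-get install dh-make
$ dh_make --createorig -p yourpackage_0.1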

Build process of a Debian package

If you use dh_make to create a skeleton, debian/rules mostly consists of a catch-all rule that calls dh $@. This is a tool that tries to do the right thing for each build step automatically, and it usually succeeds. If there is a Makefile in your top-level directory, it will call the configure, build, check and install make targets for you. If your build system installs into the DESTDIR prefix (which is set to debian/your-package-name), it should pretty much work out of the box.

If you want to copy additional files into the Debian package, list the file names, one per line, in debian/install, and they are installed automatically for you.

Shortcuts

If you have already packaged your code for distribution through language-specific tools, such as CPAN (Perl) or pip (Python), there are shortcuts to creating Debian Packages.

Perl

The tool dh-make-perl (installable via the package of the same name) can automatically create a debian directory based on the perl-specific packaging. Calling dh-make-perl . inside the root directory of your perl source tree is often enough to create a functional Debian package. It sticks to the naming convention that a Perl package Awesome::Module becomes libawesome-module-perl in Debian land.

Python

py2dsc from the python-stdeb package generates a debian/ directory from an existing python tarball.

Another approach is to use dh-virtualenv. This copies all of the python dependencies into a virtualenv, so the resulting packages only depends on the system python and possible C libraries that the python packages use; all python-level dependencies are baked in. This tends to produce bigger packages with fewer dependencies, and allows you to run several python programs on a single server, even if they depend on different versions of the same python library.

dh-virtualenv has an unfortunate choice of default installation prefix that clashes with some assumptions that Debian's python packages make. You can override that choice in debian/rules:

#!/usr/bin/make -f
export DH_VIRTUALENV_INSTALL_ROOT=/usr/share/yourcompany
%:
        dh $@ --with python-virtualenv --with systemd

It also assumes Python 2 by default. For a Python 3 based project, add these lines:

override_dh_virtualenv:
        dh_virtualenv --python=/usr/bin/python3

(As always with Makefiles, be sure to indent with hard tabulator characters, not with spaces).



Tue, 05 Jan 2016

Automating Deployments: Simplistic Deployment with Git and Bash



One motto of the Extreme Programming movement is to do the simplest thing that can possibly work, and only get more fancy when it is necessary.

In this spirit, the simplest deployment option for some projects is to change the working directory to a clone of the project's git repository, and run

git pull

If this works, it has a certain beauty of mirroring pretty much exactly what developers do in their development environment.

Reality kicks in

But it only works if all of these conditions are met:

  • There is already a checkout of the git repository, and it's configured correctly.
  • There are no local changes in the git repository.
  • There were no forced updates in the remote repository.
  • No additional build or test step is required.
  • The target machine has git installed, and both network connection to and credentials for the git repository server.
  • The presence of the .git directory poses no problem.
  • No server process needs to be restarted.
  • No additional dependencies need to be installed.

As an illustration of how to attack some of these problems, let's consider just the second point: local modifications in the git repository. They happen, for example, when people try out things or do emergency fixes. git pull does a fetch (which is fine), and a merge. Merging is an operation that can fail (for example if local uncommitted changes or local commits exist) and that requires manual intervention.

Manual changes are a rather bad thing to have in an environment where you want to deploy automatically. Their presence leaves you two options: discard them, or refuse to deploy. If you choose the latter approach, git pull --ff-only is a big improvement; this will only do the merge if it is a trivial fast-forward merge, that is, a merge where the local side didn't change at all. If that's not the case (that is, a local commit exists), the command exits with a non-zero return value, which the caller should interpret as a failure and report the error somehow. If it's called as part of a cron job, the standard approach is to send an email containing the error message.
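
A minimal sketch of what such a cron-driven deployment script could look like, assuming the deployment clone lives at the hypothetical path /srv/yourapp (cron then mails any output to the configured recipient):

#!/bin/bash
# abort the deployment instead of merging when local commits exist
set -e
cd /srv/yourapp
if ! git pull --ff-only; then
    echo "deployment aborted: local commits or a diverged history in /srv/yourapp" >&2
    exit 1
fi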

If you chose to discard the changes instead, you could do a git stash to get rid of uncommitted changes (and at the same time preserve them for a while in the depths of the .git directory for later inspection), and do a reset or checkout instead of the merge, so that the command sequence would read:

set -e
git fetch origin
git checkout --force origin/master

(This puts the local repository in a detached head state, which tends to make manual work with it unpleasant; but at this point we have reserved this copy of the git repository for deployment only; manual work should be done elsewhere.)

More Headaches

For very simple projects, using the git pull approach is fine. For more complex software, you have to tackle each of these problems, for example:

  • Clone the git repo first if no local copy exists
  • Discard local changes as discussed above (or remove the old copy, and always clone anew)
  • Have a separate checkout location (possibly on a different server), build and test there.
  • Copy the result over to the destination machine (but exclude the .git dir).
  • Provide a way to declare dependencies, and install them before doing the final copy step.
  • Provide a way to restart services after the copying

So you could build all these solutions yourself -- or realize that they already exist. Having a dedicated build server is an established pattern, and there are lots of software solutions for dealing with that. The same goes for building a distributable software package (like .deb or .rpm packages), for which distribution systems exist -- the operating system vendors use them all the time.

Once you build Debian packages, the package manager ensures that dependencies are installed for you, and the postinst scripts provide a convenient location for restarting services.

If you choose that road, you get lots of established tooling that wasn't explicitly mentioned above, but which often makes life much easier: querying the database of existing packages, listing installed versions, finding which package a file comes from, extra security through package signing and signature verification, the ability to create meta packages, linters that warn about common packaging mistakes, and so on.
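
A few of these, as they look on a Debian system (illustrative commands only; the file and package names are just examples):

$ dpkg -l 'package-*'            # list matching packages with their installed versions
$ dpkg -S /usr/bin/hypnotoad     # find out which package a file comes from
$ apt-cache policy package-info  # show candidate versions and the repositories they come from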

I'm a big fan of reusing existing solutions where it makes sense, and I feel this is a space where reusing can save huge amounts of time. Many of these tools have hundreds of corner cases already ironed out, and if you tried to tackle them yourself, you'd be stuck in a nearly endless exercise of yak shaving.

Thus I want to talk about the key steps in more detail: Building Debian packages, distributing them and installing them. And some notes on how to put them all together with existing tooling.



Mon, 04 Jan 2016

Automating Deployments: Why bother?



At my employer, we developed a new software architecture. This involved developing and deploying several new components, many of them following the same pattern: a daemon process listening on a message bus (RabbitMQ, in case you're wondering) and also talking to existing applications: a database, an Active Directory service, a NetApp cluster or a vCenter, you name it.

Shortly after the development of these components begun, it was decided that a different team than before should operate the software we developed. The new team, although dedicated and qualified, was also drowning in other work.

As we had them deploy the first few components, it became clear that each new deployment distracted them from doing what we wanted most: build the infrastructure that we and our software needed.

As programmers, automating things is much of our daily business, so why not automate some of these steps? We already had a Jenkins instance running for executing tests, so the next step was to automate the builds.

Since our systems run Debian GNU/Linux, and we build our applications as Debian packages, distributing the software meant uploading it to an internal Debian mirror. This proved to be trickier than expected, because we use debarchiver for managing the Debian repositories, which doesn't give immediate feedback on whether an upload was successful.

After that, a deployment involved only an apt-get update && apt-get install $package, which at first we left to the ops team, and later automated too - though in the production environment only after a manual trigger.

Many of the manual and automatic deployments failed, usually due to missing resources in the message bus, so we automated their generation as well.

Reduced feedback cycles

So at $work, automating deployments was first a means to save time, and a means to defend the architectural freedom to develop several smaller components instead of a few larger ones. Later it became a means to improve reliability.

But it quickly also became a tool to reduce the time it takes to get feedback on new features. We found it notoriously hard to get people to use the staging environment to try out new features, so we decided to simply roll them out to production, and wait for complaints (or praise, though we get that less often).

Being able to quickly roll out a fix when a critical bug has managed to slip into the production environment not only proved useful now and then, but also gave us a feeling of safety.



Sun, 03 Jan 2016

Automating Deployments: A New Year and a Plan



I work as a software engineer and architect, and in the last year or so I also built automated deployment pipelines for our software. While I found it hard to get started, the end result and even the process of building them were immensely satisfying, and I learned a lot.

The memories of not knowing how to do things are fresh enough in my mind that I feel qualified to teach them to others. And I've been wanting to write a tech book for ages. So yes, here it comes.

For 2016 I am planning to write an ebook on automating deployments. It's going to be a practical guide, mostly using technologies I'm already familiar with, and also pointing out alternative technologies. And there will be enough theory to justify putting in the effort of learning about and implementing automated (and possibly continuous) deployments, and to justify the overall architecture.

I will be blogging about the topics that I want to be in the book, and later distill them into book chapters.

Here is a very rough outline of topics that I want to include, subject to future change:

  • Motivations for automating deployments
  • Requirements for automated/continuous deployments
  • Teaser: Using only git and bash as the simplest thing that could possibly work
  • Discussion of the previous example, and anatomy of a more complex deployment system
  • The build stage: Building Debian packages
  • Distributing Debian packages (with aptly)
  • Deployment to a staging environment with Ansible
  • Automated integration testing
  • Propagation to a production environment
  • Stitching it all together with Go CD

If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.


Sun, 26 Apr 2015

Writing docs helps you take the user's perspective



This year, most of my contributions to Perl 6 have been to the documentation, or were directly inspired by writing the documentation.

Quite often when I write documentation, I start thinking things like this is a bit awkward to explain, wouldn't it be more consistent if ... or what happens when I use a negative number here? The implementation disallows it, but does it actually need to? or if I tell people to just pass this particular value most of the time, why not make it the default?.

Like most people who aspire to be good programmers, I'm lazy. In particular, I hate doing pointless work. And documenting inconsistencies or missing default values or arbitrary restrictions definitely feels like doing work that shouldn't be necessary. So with a sigh I overcome my laziness, and try to fix stuff in the code, the tests, and sometimes the design docs, so I can be more lazy in documenting the features. And of course, to make the overall experience more pleasant for the end user.

I've been skeptical of README-driven development in the past, dismissing it as part of the outdated (or at least not suitable for software) waterfall model, or as "no plan survives contact with the enemy". But now that I'm writing more docs, I see the value of writing docs early (of course with the provision that if things turn out to be impractical as documented, the docs may still be revised). Because it's very easy as a developer to lose the user's perspective, and writing docs makes it easier (at least for me) to look at the project from that perspective again.

Examples

With the philosophy part done, I'd like to bring some examples.

The missing default value

In Perl 6 land, we distinguish meta classes, which control behavior of a type, and representations, which control memory layout of objects.

Most Perl 6 objects have the representation P6opaque, which provides opaque, efficient storage for objects with attributes, properties, slots, or however you call per-object storage in your favorite language. Special representations exist for interfacing with C libraries, concurrency control and so on.

The class Metamodel::Primitives provides primitives for writing meta classes, with this method:

method create_type(Mu $how, $repr) { ... }

$how is our standard name for Meta stuff (from "Higher Order Workings", or simply from controlling how stuff works), and $repr is the name of the representation.

Somebody new to meta object stuff doesn't need to know much about representations (except when they want to do very low-level stuff), so the docs for create_type could have said "if you don't know what representation to use, use P6opaque". Or I could just establish P6opaque as a default:

method create_type(Mu $how, $repr = 'P6opaque') { ... }

There, less to document, and somebody new to this stuff can ignore the whole representations business for a while longer.

Arbitrary restrictions

The method rotor on List was intended to create a list of sublists with a fixed number of elements from the original list, potentially with overlap. So the old API was:

method rotor($elems = 2, $overlap = 1) { ... }

And one would use it as follows:

.say for (1..7).rotor(3, 1);
# 1 2 3
# 3 4 5
# 5 6 7

Again I had an issue with default values: it wasn't clear to me why $elems defaulted to 2 (so I removed that default), or why $overlap defaulted to 1. Wouldn't 0 be a more intuitive default?

But my main issue was that the implementation disallowed negative overlaps, and the design docs were silent on the issue. If you visualize how rotor works (take $elems elements from the list, then step back $overlap elements, then rinse and repeat), it's clear what negative overlaps mean: they are steps forward instead of backwards, and create gaps (that is, some list elements aren't included in the sublists).

And once you allow negative steps backwards, why not work with steps forward in the first place, which are more intuitive to the user, and explicitly allow negative steps to create overlaps?

So that's what we did, though the end result is even more general.
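
As a sketch of how the new interface can be used (output formatting approximate): batch size and step are expressed as a Pair, where a negative value after the => steps backwards (overlap) and a positive one skips elements (gaps).

.say for (1..7).rotor(3 => -1);  # take 3, step back 1: overlapping sublists
# (1 2 3)
# (3 4 5)
# (5 6 7)

.say for (1..8).rotor(2 => 1);   # take 2, skip 1: sublists with gaps
# (1 2)
# (4 5)
# (7 8)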

The crucial step here was asking "why disallow negative overlaps?", recognizing that the restriction was arbitrary, and then lifting it.

Wording of error messages

Error messages are important to communicate why something went wrong.

We used to have the error message Could not find an appropriate parametric role variant for $role. A test for a good error message is: ask "why?", and if the piece of code that threw the error can know the answer, the error message needs improving.

In this case: why can't the runtime environment find an appropriate variant? Because it didn't try hard enough? No. Because it's buggy? I hope not. It can't find the candidate because it's not there. So, include that answer in the error message: No appropriate parametric role variant available for $role.

(Uninformative/lazy error messages are one of my favorite topics for rants; consider the good old SIOCADDRT: No such process that route(8) sometimes emits, or python's Cannot import name X -- why not? ...)

So, write those docs. Write them at a time when you can still change semantics. Keep asking yourself what you could change so the documentation becomes shorter, sweeter, easier to understand.

[/perl-6] Permanent link

comments / trackbacks

Sat, 14 Mar 2015

Why is it hard to write a compiler for Perl 6?


Permanent link

A Russian translation is available on softdroid.net: Почему так трудно написать компилятор для Perl 6? (Why is it so hard to write a compiler for Perl 6?).

Today's deceptively simple question on #perl6: is it harder to write a compiler for Perl 6 than for any other programming language?

The answer is simple: yes, it's harder (and more work) than for many other languages. The more involved question is: why?

So, let's take a look. The first point is organizational: Perl 6 isn't yet fully explored and formally specified; it's much more stable than it used to be, but less stable than, say, targeting C89.

But even if you disregard this point, and target, say, the subset that the Rakudo Perl 6 compiler implements right now, or wait a year and target the first official Perl 6 language release, the answer remains the same.

So let's look at some technical aspects.

Static vs. Dynamic

Perl 6 has both static and dynamic corners. For example, lexical lookups are static, in the sense that they can be resolved at compile time. And that's not optional: for a compiler to properly support native types, it must resolve them at compile time. We also expect the compiler to notify us of certain errors at compile time, so there must be a fair amount of static analysis.

On the other hand, type annotations are optional pretty much anywhere, and methods are late bound. So the compiler must also support features typically found in dynamic languages.
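
A tiny sketch of the two worlds side by side:

my int $count = 42;        # native type: 'int' must be resolved at compile time
my     $thing = 42;        # untyped container: fully dynamic

say $thing + 1;            # 43
$thing = "now a string";   # fine, no static type annotation to violate
say $thing.chars;          # 12 -- the method call is late bound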

And even though method calls are late bound, composing roles into classes is a compile time operation, with mandatory compile time analysis.

Mutable grammar

The Perl 6 grammar can change during a parse, for example by newly defined operators, but also through more invasive operations such as defining slangs or macros. Speaking of slangs: Perl 6 doesn't have a single grammar, it switches back and forth between the "main" language, regexes, character classes inside regexes, quotes, and all the other dialects you might think of.
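
A minimal sketch: a freshly declared operator is usable on the very next line, because its declaration has already extended the grammar.

sub infix:<±>($a, $b) { ($a - $b, $a + $b) }   # changes the operator grammar mid-parse
say 10 ± 3;                                    # (7 13)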

Since the grammar extensions are done with, well, Perl 6 grammars, it forces the parser to be interoperable with Perl 6 regexes and grammars. At which point you might just as well use them for parsing the whole thing, and you get some level of minimally required self-hosting.

Meta-Object Programming

In a language like C++, the behavior of the object system is hard-coded into the language, and so the compiler can work under this assumption, and optimize the heck out of it.

In Perl 6, the object system is defined by other objects and classes, the meta objects. So there is another layer of indirection that must be handled.
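
A glimpse of that indirection (a sketch; the exact meta class names and output format are Rakudo-specific):

say Int.HOW.^name;   # e.g. Perl6::Metamodel::ClassHOW -- the meta object behind Int
say Int.^mro;        # ((Int) (Cool) (Any) (Mu)) -- computed by the meta object, not hard-coded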

Mixing of compilation and run time

Declarations like classes, but also BEGIN blocks and the right-hand side of constant declarations are run as soon as they are parsed. Which means the compiler must be able to run Perl 6 code while compiling Perl 6 code. And also the other way round, through EVAL.
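
A minimal sketch of that interleaving:

say "this prints second, at run time";
BEGIN say "this prints first, while the file is still being compiled";
constant answer = 6 * 9 - 12;   # the right-hand side is evaluated during compilation
say answer;                     # 42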

More importantly, it must be able to run Perl 6 code before it has finished compiling the whole compilation unit. That means it hasn't even fully constructed the lexical pads, and hasn't initialized all the variables. So it needs special "static lexpads" to which compile-time usages of variables can fall back. Also, the object system has to be able to work with types that haven't been fully declared yet.

So, lots of trickiness involved.

Serialization, Repossession

Types are objects defined through their meta objects. That means that when you precompile a module (or even just the setting, that is, the mass of built-ins), the compiler has to serialize the types and their meta objects. Including closures. Do you have any idea how hard it is to correctly serialize closures?

But, classes are mutable. So another module might load a precompiled module, and add another method to it, or otherwise mess with it. Now the compiler has to serialize the fact that, if the second module is loaded, the object from the first module is modified. We say that the serialization context from the second module repossesses the type.
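
A sketch of the kind of code that triggers repossession; Some::Module and its class Widget are made-up names, and the pragma spelling may differ between Rakudo versions:

use Some::Module;      # loads a precompiled module that defines 'class Widget'
use MONKEY-TYPING;     # opt-in required before 'augment' is allowed

augment class Widget {
    method shiny { "added after the fact" }
}
# the serialization context of this compilation unit now has to record that it
# modified a type which was serialized by Some::Module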

And there are so many ways in which this can go wrong.

General Featuritis

One of the many Perl 6 mottos is "torture the implementor on behalf of the user". So it demands not only both static and dynamic typing, but also functional features, continuations, exceptions, lazy lists, a powerful grammar engine, named arguments, variadic arguments, introspection of call frames, closures, lexical and dynamic variables, packed types (for direct interfacing with C libraries, for example), and phasers (code that is automatically run at different phases of the program).

All of these features aren't too hard to implement in isolation, but in combination they are a real killer. And you want it to be fast, right?

[/perl-6] Permanent link

comments / trackbacks

Sun, 22 Feb 2015

Profiling Perl 6 code on IRC


Permanent link

On the #perl6 IRC channel, we have a bot called camelia that executes small snippets of Perl 6 code, and prints the output that it produces. This is a pretty central part of our culture, and we use it to explain or demonstrate features or even bugs in the compiler.

Here is an example:

10:35 < Kristien> Can a class contain classes?
10:35 < Kristien> m: class A { class B { } }; say A.new.B.new
10:35 <+camelia> rakudo-moar 114659: OUTPUT«No such method 'B' for invocant of 
                 type 'A'␤  in block <unit> at /tmp/g81K8fr9eY:1␤␤»
10:35 < Kristien> :(
10:36 < raydiak> m: class A { class B { } }; say A::B.new
10:36 <+camelia> rakudo-moar 114659: OUTPUT«B.new()␤»

Yesterday and today I spent some time teaching this IRC bot to not only run the code, but optionally also run it through a profiler, to make it possible to determine where the virtual machine spends its time running the code. An example:

12:21 < moritz> prof-m: Date.today for ^100; say "done"
12:21 <+camelia> prof-m 9fc66c: OUTPUT«done␤»
12:21 <+camelia> .. Prof: http://p.p6c.org/453bbe

The Rakudo Perl 6 compiler on the MoarVM backend has a profiler that produces a fancy HTML + JavaScript page, and that is what the bot uses. The output is automatically uploaded to a webserver, producing this profile.

Under the hood, it started with a patch that makes it possible to specify the output filename for a profile run, and another one to clear up the fallout from the previous patch.

Then came the bigger part: setting up the Apache virtual host that serves the web files, including a restricted user that only allows up- and downloads via scp. Since the IRC bot can execute arbitrary code, it is very likely that an attacker can steal the private SSH keys used for authentication against the webserver. So it is essential that if those keys are stolen, the attacker can't do much more than uploading more files.

I used rssh for this. It is the login shell for the upload user, and configured to only allow scp. Since I didn't want the attacker to be able to modify the authorized_keys file, I configured rssh to use a chroot below the home directory (which sadly in turn requires a setuid-root wrapper around chroot, because ordinary users can't execute it. Well, nothing is perfect).

Some more patching and debugging later, the bot was ready.

The whole thing feels a bit bolted on; if usage warrants it, I'll see if I can make the code a bit prettier.

[/perl-6] Permanent link

comments / trackbacks

Fri, 06 Feb 2015

doc.perl6.org: some stats, future directions


Permanent link

In June 2012 I started the perl6/doc repository with the intent to collect/write API documentation for Perl 6 built-in types and routines. Not long afterwards, the website doc.perl6.org was born, generated from the aforementioned repository.

About 2.5 years later, the repository has seen more than one thousand commits from more than 40 contributors, 14 of which contributed ten patches or more. The documentation encompasses about 550 routines in 195 types, with 15 documents for other things than built-in types (for example an introduction to regexes, descriptions of how variables work).

In terms of subjective experience, I observed an increase in the number of questions on our IRC channel and elsewhere that could be answered by pointing to the appropriate pages of doc.perl6.org, or by augmenting the answer with a statement like "for more info, see ..."

While it's far from perfect, I think both the numbers and the experience are very encouraging, and I'd like to thank everybody who helped make that happen, often by contributing skills I'm not good at: front-end design, good English and gentle encouragement.

Plans for the Future

Since this is a community-driven project, I can't plan anybody else's time on it, so these are my own plans for the future of doc.perl6.org.

Infrastructural improvements

There are several unsolved problems with the web interface, with how we store our documents, and how information can be found. I plan to address them slowly but steadily.

  • The search is too centered around types and routines; searching for variables, syntactic constructs and keywords isn't easily possible. I want it to find many more things than it does right now.
  • Currently we store the docs for each type in a separate file called Type.pod. That will break when we start to document native types, which begin with lower-case letters. Having int.pod and Int.pod is completely unworkable on case-insensitive or case-preserving file systems. I want to come up with a solution for that, though I don't yet know what it will look like.
  • doc.perl6.org is served from static pages, which leads to some problems with file names conflicting with UNIX conventions. You can't name a file infix:</>.html, and files with two consecutive dots in their names are also weird. So in the long run, we'll have to switch to some kind of dynamic URL dispatching, or a name escaping scheme that is capable of handling all of Perl 6's syntax.
  • Things like the list of methods and what they coerce to in class Cool don't show up in derived types; either the tooling needs to be improved for that, or they need to be rewritten to use the usual one-heading-per-method approach.

Content

Of course my plan is to improve coverage of the built-in types and routines, and add more examples. In addition, I want to improve and expand on the language documentation (for example syntax, OO, regexes, MOP), ideally documenting every Perl 6 feature.

Once the language features are covered in sufficient breadth and depth (though I won't wait for 100% coverage), I want to add three tutorial tracks:

  • A track for beginners
  • A quick-start for programmers from other languages
  • A series of intermediate to advanced guides covering topics such as parsing, how to structure a bigger application, the responsible use of meta programming, or reactive programming.

Of course I won't be able to do that all on my own, so I hope to convince my fellow and future contributors that those are good ideas.

Time to stop rambling about the future, and off to writing some docs; this is yours truly signing off.

[/perl-6] Permanent link

comments / trackbacks

Thu, 05 Feb 2015

All Perl 6 modules in a box


Permanent link

Sometimes when we change things in the Perl 6 language or the Rakudo Perl 6 compiler that implements it, we want to know if the planned changes will cause fallout in the library modules out there, and how much.

To get a quick estimate, we can now do a git grep in the experimental perl6-all-modules repository.

This is an attempt to get all the published modules into a single git repository. It is built using git subrepo, an unofficial git extension that I've been wanting to try for some time, and that seems to have some advantages over submodules in some cases. The notable one in this case being that git grep ignores submodules, but descends into subrepos just fine.

Here is the use case that made me create this repository: Rakudo accesses low-level operations through the nqp:: pseudo namespace. For example nqp::concat_s('a', 'b') is a low-level way to concatenate two strings. User-level programs can also use nqp:: ops, though it is generally a bad idea, because it ties the program to the particular compiler used, and what's more, the nqp:: ops are not part of the public API, and thus neither documented in the same place as the rest of Perl 6, nor are there any promises for stability attached.

So we want to require module authors to add a pragma, use nqp;, in order to make their use of compiler internals explicit and deliberate. And of course, where possible, we want them to not use them at all :-)
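
What that opt-in looks like, reusing the nqp::concat_s example from above:

use nqp;                        # explicit, deliberate dependency on compiler internals

say nqp::concat_s('a', 'b');    # ab -- low-level op, not part of the public Perl 6 API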

To find out how many files in the ecosystem use nqp:: ops, a simple command, combined with the power of the standard UNIX tools, will help:

$ git grep -l 'nqp::'|wc -l
32

That's not too bad, considering we have... how many modules/distributions again?

Since they are added in author/repo structure, counting them with ls and wc isn't hard:

$ ls -1d */*/|wc -l
282

Ok, but number of files in relation to distributions isn't really useful. So let's ask: how many distributions directly use nqp:: ops?

$ git grep -l nqp:: | cut -d/ -f1,2 |sort -u|wc -l
23

23 out of 282 (or about 8%) distributions use the nqp:: syntax.

By the way, there is a tool (written in Perl 6, of course) to generate and update the repository. Not perfect yet, very much a work in progress. It's in the _tools folder, so you should probably filter out that directory in your queries (though in the examples above, it doesn't make a difference).

So, have fun with this new toy!

[/perl-6] Permanent link

comments / trackbacks