Tue, 05 Jan 2016

Automating Deployments: Simplistic Deployment with Git and Bash



One motto of the Extreme Programming movement is to do the simplest thing that could possibly work, and only get fancier when it is necessary.

In this spirit, the simplest deployment option for some projects is to change into a clone of the project's git repository, and run

git pull

If this works, it has a certain beauty to it: it mirrors pretty much exactly what developers do in their development environment.

Reality kicks in

But it only works if all of these conditions are met:

  • There is already a checkout of the git repository, and it's configured correctly.
  • There are no local changes in the git repository.
  • There were no forced updates in the remote repository.
  • No additional build or test step is required.
  • The target machine has git installed, and both network connection to and credentials for the git repository server.
  • The presence of the .git directory poses no problem.
  • No server process needs to be restarted.
  • No additional dependencies need to be installed.

As an illustration of how to attack some of these problems, let's consider just the second point: local modifications in the git repository. They happen, for example, when people try things out or apply emergency fixes. git pull does a fetch (which is fine) and a merge. Merging is an operation that can fail (for example if local uncommitted changes or local commits exist), and a failed merge requires manual intervention.
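
For reference, a plain git pull is roughly equivalent to this two-step sequence (assuming the current branch tracks origin/master):

git fetch origin          # download new commits; this step rarely causes trouble
git merge origin/master   # integrate them into the local branch; this is where things can fail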

Manual changes are a rather bad thing to have in an environment where you want to deploy automatically. Their presence leaves you with two options: discard them, or refuse to deploy. If you choose the latter approach, git pull --ff-only is a big improvement; it only does the merge if it is a trivial fast-forward merge, that is, a merge where the local side didn't change at all. If that's not the case (that is, a local commit exists), the command exits with a non-zero return value, which the caller should interpret as a failure and report somehow. If it's called as part of a cron job, the standard approach is to send an email containing the error message.
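
A minimal sketch of that refuse-to-deploy variant could look like this; the repository path is just a placeholder:

#!/bin/bash
set -e
cd /srv/deploy/myapp                # hypothetical deployment checkout
if ! git pull --ff-only; then
    echo "deployment aborted: local branch cannot be fast-forwarded to origin/master" >&2
    exit 1
fi

When this runs from cron, any output (including the error message) is typically mailed to the owner of the cron job, so a failed deployment does not go unnoticed.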

If you choose to discard the changes instead, you can do a git stash to get rid of uncommitted changes (while at the same time preserving them for a while in the depths of the .git directory for later inspection), and do a reset or checkout instead of the merge, so that the command sequence would read:

set -e                              # abort on the first command that fails
git fetch origin                    # update the remote-tracking branches, but don't merge
git checkout --force origin/master  # overwrite the working copy with origin/master, discarding local changes

(This puts the local repository into a detached HEAD state, which tends to make manual work with it unpleasant; but at this point we have reserved this copy of the git repository for deployment only, and manual work should be done elsewhere.)

More Headaches

For very simple projects, using the git pull approach is fine. For more complex software, you have to tackle each of these problems, for example:

  • Clone the git repo first if no local copy exists.
  • Discard local changes as discussed above (or remove the old copy, and always clone anew).
  • Have a separate checkout location (possibly on a different server), and build and test there.
  • Copy the result over to the destination machine (but exclude the .git dir).
  • Provide a way to declare dependencies, and install them before doing the final copy step.
  • Provide a way to restart services after the copying (a sketch combining several of these steps follows below).
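
To give an idea of how several of these steps can fit together, here is a rough sketch in bash; the repository URL, paths and service name are made up for illustration, and the rest of this post argues for replacing such a script with packaging and established tooling:

#!/bin/bash
set -e

repo=git@example.com:myproject.git      # placeholder repository URL
build_dir=/srv/build/myproject          # separate checkout used only for building and testing
target_dir=/srv/www/myproject           # directory the application is served from

# clone on the first run, update the existing copy afterwards
if [ ! -d "$build_dir/.git" ]; then
    git clone "$repo" "$build_dir"
fi
cd "$build_dir"
git fetch origin
git checkout --force origin/master

# run the project's build and test steps here, for example:
# make test

# copy the result to the destination, excluding the .git directory
rsync -a --delete --exclude=.git "$build_dir/" "$target_dir/"

# restart the service so the new code is actually used ("myproject" is a placeholder)
systemctl restart myproject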

So you could build all these solutions yourself -- or realize that they already exist. Having a dedicated build server is an established pattern, and there are lots of software solutions for dealing with that. The same goes for building a distributable software package (like a .deb or .rpm package), for which distribution systems exist -- the operating system vendors use them all the time.

Once you build Debian packages, the package manager ensures that dependencies are installed for you, and the postinst scripts provide a convenient place for restarting services.
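
As a small illustration (the service name myapp is invented, and packages built with debhelper usually get extra maintainer-script snippets injected), a postinst script can be as simple as:

#!/bin/sh
# debian/postinst sketch: runs after the package has been unpacked and configured
set -e

case "$1" in
    configure)
        # restart the service so that the newly installed code is actually running
        systemctl restart myapp.service
        ;;
esac

exit 0

The dependencies that the package manager installs for you are declared in the Depends: field of the package's debian/control file.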

If you choose that road, you get lots of established tooling that wasn't explicitly mentioned above, but which often makes life much easier: querying the database of installed packages, listing installed versions, finding which package a file comes from, extra security through package signing and signature verification, the ability to create meta packages, linters that warn about common packaging mistakes, and so on.
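
A few of these tools in action (the package name myapp is again just a placeholder):

dpkg -l 'myapp*'                 # list matching packages and their installed versions
dpkg -S /usr/bin/myapp           # find out which package a file belongs to
apt-cache policy myapp           # show installed and available versions
lintian myapp_1.0-1_amd64.deb    # warn about common packaging mistakes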

I'm a big fan of reusing existing solutions where it makes sense, and I feel this is a space where reusing can save huge amounts of time. Many of these tools have hundreds of corner cases already ironed out, and if you tried to tackle them yourself, you'd be stuck in a nearly endless exercise of yak shaving.

Thus I want to talk about the key steps in more detail: building Debian packages, distributing them, and installing them -- plus some notes on how to put it all together with existing tooling.


I'm writing a book on automating deployments. If this topic interests you, please sign up for the Automating Deployments newsletter. It will keep you informed about automating and continuous deployments. It also helps me to gauge interest in this project, and your feedback can shape the course it takes.

