Tue, 25 Jul 2017
Writing a Systemd Supervised Service with Perl
Permanent link
There are many ways in which server software can fail. There are crashes, where the server process exits with an error. Program supervisors can catch those easily, and you can monitor for the presence of a server process.
But recently I had to deal with some processes that didn't just crash; they got stuck. It happens only rarely, which makes debugging harder. It involves AnyEvent, forking, the boundaries between synchronous and asynchronous code, and runloops getting stuck. I know the problem needs a much deeper solution, which will take weeks to implement.
So, there was a need for a much faster approach for at least detecting the stuck service, and possibly even restart it. And even if the problem was fixed, some monitoring wouldn't hurt.
Heartbeats
The standard approach to checking the aliveness of a process (or a connection) is a heartbeat. A heartbeat is a periodic action that a process performs; if the process fails to perform that action, a supervisor can pick up on that cue, and do an appropriate action. The action can be restarting the process, closing a TCP connection or so.
So, for a server process, what's a good heartbeat? The most basic approach is writing to a log file, or touching a file. The supervisor can then check for the up-to-dateness.
Systemd and Heartbeats
Since I already used Systemd for managing the service, I
wanted to see if systemd supported any heartbeats. It does,
and this
superuser post gives a great overview. In the context of
systemd, a watchdog needs to call the sd_notify
C function, which seems to live in the libsystemd.so
library. This communicates through some mysterious,
unknowable mechanism (actually just a UNIX socket) with systemd. To allow that
communication channel, the systemd unit file must include
the line NotifyAccess=main,
which allows the main process of the server to communicate
with systemd, or NotifyAccess=all
, which allows
subprocesses to also use sd_notify
.
The module Systemd::Daemon
module gives you access to sd_notify
in Perl.
A minimal Perl program that can be watchdog'ed looks like this:
#!/usr/bin/env perl use 5.020; use warnings; use strict; use Time::HiRes qw(usleep); use Systemd::Daemon qw( -hard notify ); my $sleep = ($ENV{WATCHDOG_USEC} // 2_000_000) / 2; $| = 1; notify( READY => 1 ); while (1) { usleep $sleep; say "watchdog"; notify( WATCHDOG => 1 ); }
If you forget the READY
notification, a
systemctl start $service
hangs (until it runs
into a timeout), and systemctl status $service
says Active: activating (start) since ...
. The
normal state is Active: active (running)
since
.
If the service misses its heartbeat, it looks like this
in the log (journalctl -u $service
; timestamps
and hostname stripped):
systemd[1]: testdaemon.service: Watchdog timeout (limit 10s)! systemd[1]: testdaemon.service: Main process exited, code=dumped, status=6/ABRT systemd[1]: testdaemon.service: Unit entered failed state. systemd[1]: testdaemon.service: Failed with result 'core-dump'. systemd[1]: testdaemon.service: Service hold-off time over, scheduling restart. systemd[1]: Stopped Testdaemon. systemd[1]: Starting Testdaemon... systemd[1]: Started Testdaemon.
And this is the corresponding unit file:
[Unit] Description=Testdaemon After=syslog.target network.target [Service] Type=notify NotifyAccess=main Restart=always WatchdogSec=10 User=moritz Group=moritz ExecStart=/home/moritz/testdaemon.pl [Install] WantedBy=multi-user.target
Relevant here are Type=notify
, which enables the
watchdog, Restart=always
as the restart policy,
and WatchdogSec=10
for 10 second period after
which the service restarts if no sd_notify
of
type WATCHDOG
occurred.
Systemd makes the WatchdogSec
setting
available as the environment variable
WATCHDOG_USEC
, converted to microseconds (so
multiplied by one million). If the server process aims to
report heartbeats twice as often as that wait period, small
timing errors should not lead to a missed heartbeat.
In my case, the WATCHDOG notification happens in an
AnyEvent->timer
callback, so if this doesn't
happen, either the event loop got stuck, or a blocking
operation prevents the event loop from running. The latter
should not happen (blocking operations are meant to run in
forked processes), so this adequately detects the error I
want to detect.
For the little functionality that I use, Systemd::Daemon is a pretty heavy dependency (using XS and quite a few build dependencies). After looking a reimplementation of the notify() protocol in python, I wonder if talking to the socket directly would have been less work than packaging Systemd::Daemon.
Summary
Systemd offers a heartbeat supervisor for processes that
manage it. It can automatically restart processes that fail
to check in regularly via calls to sd_notify
,
or doing the equivalent action on a socket. Perl's
Systemd::Daemon module gives you access to sd_notify in a
Perl server process.
Thu, 12 Aug 2010
What is "Modern Perl"?
Permanent link
These days you often hear term Modern Perl, as something new(ish), and much improved over the old ways.
But what is it exactly? Well, there's no proper definition, but here is what that term means to me:
It's a set of tools, ideas and attitudes that help you to write better Perl programs, and allows you to have more fun while writing them.
Here are some aspects of Modern Perl
- Testing: Most modern Perl modules have extensive test suites, that make development sane and robust
- Some built-ins now come with safer forms: the three-argument form of open() allows you to open files safely with arbitrary characters in them, without any extra precautions. Lexical file handles make things safer and easier too.
use strict; use warnings;
- Proper OO: with Perl 6 and with Moose in Perl 5, we have good object systems, that require less low-level fiddling than the standard Perl 5 object system
- Following Best Practices
- (For open source projects) Liberally handing out commit privileges. The source is stored in a version control system anyway, so low-quality changes or vandalism can simply be reverted (but that doesn't happen often in practice).
- Caring about marketing: do tell people that you built something cool and useful
- Small handy modules such as List::Util and Try::Tiny
- Development tools such as Devel::Cover and Devel::NYTProf
- (update) perlbrew and local::lib to help maintain your own perl installation and locally installed modules.
All of these techniques help to write scalable Perl programs by making proper encapsulation much easier, or by avoiding common errors, identifying performance bottlenecks etc.
Update: after watching some discussions about this post in various media, I should add a few more tools that I forgot about earlier:
- Devel::REPL, Read-Evaluation-Print Loop
- Perl::Critic, a code quality and style tester
- DBIx::Class, a modern and very popular object relational mapper
- Catalyst, a web framework based on the MVC pattern
Tue, 23 Mar 2010
In search of an exponential time regex
Permanent link
Somebody asked on IRC if it was dangerous to interpolate a user-supplied string into a regular expression in Perl. Since by defaul the re 'eval' pragma is disabled, there is no direct danger of malicious code execution.
But still there is danger: since the Perl regex engine does backtracking, it is possible to craft malicious regexes that take up much time. Very much time. So much time that if you don't do anything against it, it will eat up all your CPU - a denial of service attack.
Such a thing is best illustrated with an example - and I wanted one to
demonstrate exponential matching time. Easy, you'd think; the literature has
one: match the regex /(a*)*b/
against the string
"aaaa...a"
. It fails, but first tries to do search for many
possible combinations of how many a's to much with the first capture. You'd
think. A quick test reveals that this is not the case:
$ time perl -e ' ("a" x shift) ~~ /(a*)*b/' 100000000 real 0m0.286s user 0m0.194s sys 0m0.093s
This constructs a string of 10 million a's, and then matches the regex
against it - and it takes less than the third of a second. That's quite fast.
But why? The answer is that Perl does some clever optimizations. You can learn
a bit about the applied optimizations with the re 'debug';
pragma:
$ time perl -Mre=debug -e ' ("a" x shift) ~~ /(a*)*b/' 100000000 Compiling REx "(a*)*b" synthetic stclass "ANYOF[ab]". Final program: 1: CURLYX[0] {0,32767} (11) 3: OPEN1 (5) 5: STAR (8) 6: EXACT <a> (0) 8: CLOSE1 (10) 10: WHILEM[1/1] (0) 11: NOTHING (12) 12: EXACT <b> (14) 14: END (0) floating "b" at 0..2147483647 (checking floating) stclass ANYOF[ab] minlen 1 Guessing start of match in sv for REx "(a*)*b" against "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"... Did not find floating substr "b"... Match rejected by optimizer Freeing REx: "(a*)*b" real 0m0.361s user 0m0.223s sys 0m0.101s
This means the regex engine has found a literal b
in the
regex, and is clever enough to search for that in the string. Which is quite
fast, it just has to look at each character once. It doesn't find it, so it
knows that it can short-circuit all this expensive backtracking.
So how to fool the optimizer? By not using a literal string, but something
that's not so easy to detect -- a character class:
/(a*)*[bc]/
.
This fools the optimizer, and the regex runs slower - half a second for one million characters. Still not very impressive.
What's up here? If you look at the debugging output, and try to read the
structure of the compiled regex, you'll find that it actually corresponds to
/(a*)[bc]/
. The optimizer has dropped the second star, because it
prefers the longest possible match with the star, so never needs to quantify
the whole group more than once anyway.
So again I had to fool the optimizer somehow: for example by requiring at
least two matches of the first group: (a*){2,}[bc]
. However the
experiment shows that it still didn't backtrack exponentially.
The next trick finally worked: also give it an upper limit. That defies the optimizer for some reasons not quite transparent to me:
$ time perl -e ' ("a" x shift) ~~ /(.*){1,32000}[bc]/' 25 real 0m8.856s user 0m8.841s sys 0m0.008s
That's nearly 9 seconds for a 25 character string, extending the string length to 27 already takes 35 seconds, 30 characters took 276 seconds. I did not have the patience to try it with any higher numbers - but it's clearly growing fast enough to satisfy my curiosity.
So after some fiddling I did find a malicious regex, but it was not quite a trivial task.
Tue, 16 Jun 2009
Data driven programming
Permanent link
The other days somebody asked on IRC for help with this question:
He was looking for all 9 digit numbers that didn't contain a zero digit anywhere, and the first digit should be divisible by one, the number formed from the first two digits should be divisible by 2, the number formed from the first three digits should be divisible by 3 etc.
The first such number is 1232525616, because 1 can be divided by 1, 12 can be divided by 2, 123 can be divided by 3 etc.
Since finding one isn't really a challenge, let's focus on finding them all. The easiest approach is, of course, brute force. There are 109 numbers with 9 digits, that's quite a manageable number for a modern computer. On my laptop it takes perl about 45 seconds to loop over all 109 numbers, doing nothing else.
So all you've got to do is iterate over all numbers, test if they meet the criterion stated above, and print them out if they do:
use strict; use warnings; sub test { my $z = shift; for (2..9) { return if (substr($z, 0, $_) % $_ != 0); } return 1; } for (my $i = 1 x 9; length($i) == 9; $i += 9) { next if $i =~ /0/; if (test($i)) { print $i, $/; } }
This uses a small trick: the first number that doesn't
contain any zero is 111111111
, or short 1
x 9
.
It takes 4 minutes and 20 seconds, produces the right answer, and we're happy.
But it wastes a lot of resources, and wouldn't scale for larger numbers. So for the sake of fun I tried a few different optimizations.
The first one is quite simple: the second digit must always be even, otherwise the number consisting of the first two digits could not be even. Likewise the fifths digit must be 5 or 0 to ensure that it can be divided by 5. Since 0 is forbidden anyway, it has to be 5. So let's skip the expensive check if those conditions aren't fulfilled:
for (my $i = 1 x 9; length($i) == 9; $i += 9) { next if $i =~ /0/; next if $i =~ /^.[13579]/; next unless $i =~ /^....5/; if (test($i)) { print $i, $/; } }
This speeds up the computation to roughly a minute. Since 40 seconds are minimum for the iteration alone, it's nearly as good as it gets with this approach.
But of course there's still room for improvement: when the second digit is odd, the program iterates over a hundred million of numbers without finding one. Instead of skipping them each time, it would be much more efficient to generate the number digit by digit, checking divisibility at each step of the way.
for my $a (1..9) { for my $b (2, 4, 6, 8) { for my $c (1..9) { next if ($a + $b + $c) % 3; for my $d (2, 4, 6, 8) { next if (10 * $c + $d) % 4; my $e = 5; for my $f (2, 4, 6, 8) { next if ($a + $b + $c + $d + $e + $f) % 3; for my $g (1..9) { my $so_far = "$a$b$c$d$e$f$g"; next if $so_far % 7; for my $h (2, 4, 6, 8) { next if ($so_far . $h) % 8; for my $i (1..9) { my $so_far = $so_far . "$h$i"; next if $so_far % 9; print "$so_far$h\n"; } } } } } } } }
Wow, that's ugly code, but it works and it's fast. Very fast. 27 milliseconds, or more than 2000 times faster as the previous version.
Bug it contains lots of duplicated code, and again it wouldn't scale for finding larger numbers, this time because for digit a loop needs to be written.
Whenever I find myself repeating some piece of code a few times, but in slightly different forms, I try to pack the code into a data structure instead.
For each digit position there needs to a list of digit to try, and a test that determines if the newly added digit violates its divisibility rule.
use strict; use warnings; my @config = ( [[1..9], sub { 0 }], [[2, 4, 6, 8], sub { 0 }], [[1..9], sub { $_[0] % 3}], [[2, 4, 6, 8], sub { $_[0] % 4 }], [[5], sub { 0 }], [[2, 4, 6, 8], sub { $_[0] % 6 }], [[1..9], sub { $_[0] % 7 }], [[2, 4, 6, 8], sub { $_[0] % 8 }], [[1..9], sub { $_[0] % 9 }], );
The nested loops from the previous script can be emulated by recursion, passing the previous digits along (and the configuration for all digit positions that still need to be tested).
sub f { my ($so_far, $config) = @_; $config = [ @$config ]; if (!@$config) { print "$so_far\n"; } else { my $c = shift @$config; for my $current (@{$c->[0]}) { next if $c->[1]->($so_far . $current); f($so_far . $current, $config); } } } f('', \@config);
With 46ms runtime it's still acceptably fast, and much leaner than the nested loops.
Actually a rough sketch of the configuration table can also be generated automatically, and then optimized manually:
my @config; for my $n (1..9) { push @config, [ ($n % 2 ? [1..9] : [2, 4, 6, 8]), sub { $_[0] % $n }, ] } $config[4] = [[5], sub { 0 } ];
Now the script contains as little duplication as possible, is reasonably fast, and I'm happy.
(Just one final note: there's no scalability problem in this particular task, because it can't be extended to more than nine digits: if no zero is allowed, there won't be any numbers divisible by 10.)
Tue, 09 Jun 2009
Small Perl Helpers
Permanent link
CoreList
You want to use a module, but you're only allowed to use core modules? Or you want to recommend a module to somebody, and you know it's more likely that he'll use a module if it's in core (and thus he won't hand-roll his crappy CGI parser, and open a wide door to spammers)?
Module::CoreList is
the answer, and it comes with a very handy script called
corelist
:
$ corelist Unicode::Normalize Unicode::Normalize was first released with perl 5.007003 $ corelist DBI DBI was not in CORE (or so I think) # search with regexes $ corelist /Tie/ Pod::Simple::TiedOutFH was first released with perl 5.009003 Tie::Array was first released with perl 5.005 Tie::File was first released with perl 5.007003 Tie::Handle was first released with perl 5.00405 Tie::Hash was first released with perl 5.002 Tie::Hash::NamedCapture was first released with perl 5.009005 Tie::Memoize was first released with perl 5.007003 Tie::RefHash was first released with perl 5.004 Tie::Scalar was first released with perl 5.002 Tie::StdHandle was first released with perl 5.01 Tie::SubstrHash was first released with perl 5.002 TieHash was first released with perl 5
(empty lines sanitized; corelist emits an empty line after each module which is a bit annoying)
Timestamps
Not from CPAN, but a tiny script I wrote:
#!/usr/bin/perl use strict; use warnings; use Time::Local qw(timelocal); if (@ARGV) { my $date = shift @ARGV; if ($date =~ m/^(\d{4})-(\d\d)-(\d\d)$/){ my ($year, $month, $mday) = ($1, $2, $3); my ($hr, $min, $sec) = (0, 0, 0); my $time; if ($time = shift(@ARGV) and $time =~ m/^(\d{1,2}):(\d\d)(?::(\d\d))?$/) { ($hr, $min, $sec) = ($1, $2, $3||0); } print timelocal($sec, $min, $hr, $mday, $month - 1, $year - 1900), $/; } } else { print time, $/; }
If called with no argument it will print the current time as a Unix
timestamp. If called with one or two arguments, it will interpret these as a
date in the YYYY-MM-DD
and a time in HH:MM:SS
format
and print the corresponding timestamp.
Tue, 29 Jul 2008
Magic ARGV (considered harmful)
Permanent link
A bunch of threads on the perl5-porters mailing list made me aware of a rather obscure perl 5 feature.
I was quite familiar with the basic form, which goes like this:
while ( <> ) { print; }
This either reads from all files in turn that were given on the command
line, or from STDIN
if no file name was provided.
Now <>
actually uses open internally,
specifically the 2 argument form. Which means that anything that is valid as
the second argument to open
also works in @ARGV
:
local @ARGV = ('fortune |'); print while <>;
Instead of trying to open a file called fortune |
, it actually
executes the command fortune
and uses it output for the
<>
"diamond operator".
This feature can be used for quite some tricky and shiny stuff, but it's
also dangerous. If your perl program uses <>
, it is
vulnerable to the command line argument rm -rf * |
and similar
things, also known as "arbitrary code execution".
Therefore the perl porters are discussing about disabling this feature, and making the old, magic behaviour available with a command line switch or a pragma.
For me the consequence is "don't use <>
with possibly
untrusted input".
Sat, 26 Jul 2008
Dumping UTF-8 Data
Permanent link
The other day I wrote Perl6::Str, and
a small script that I called utf8-dump
helped a lot during
debugging:
$ echo Überhacker | utf8-dump \N{LATIN CAPITAL LETTER U WITH DIAERESIS}berhacker
It replaces all non-ASCII-characters with their Unicode name, in a form
that can be used in Perl 5 double quoted strings if use charnames
qw(:full)
is loaded first.
And this is how the script looks:
#!/usr/bin/perl use strict; use warnings; use charnames (); use Encode qw(decode_utf8); while (<>){ $_ = decode_utf8($_); s{([^\0-\177])}{N_escape($1)}eg; print; } sub N_escape { my $n = charnames::viacode(ord($_[0])); return defined($n) ? "\\N{$n}" : sprintf('\x{%x}', ord($_[0])); }
(Update 2010-04-19:) Added \x{...}
escapes for characters
which viacode
doesn't like.