Sun, 11 Dec 2016

Perl 6 By Example: Datetime Conversion for the Command Line



This blog post is part of my ongoing project to write a book about Perl 6.

If you're interested, either in this book project or any other Perl 6 book news, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).


Occasionally I work with a database that stores dates and datetimes as UNIX timestamps, that is, the number of seconds since midnight UTC on 1970-01-01. Unlike the original author of the database and surrounding code, I cannot convert between UNIX timestamps and human-readable date formats in my head, so I write tools for that.

Our goal here is to write a small tool that converts back and forth between UNIX timestamps and dates/times:

$ autotime 2015-12-24
1450915200
$ autotime 2015-12-24 11:23:00
1450956180
$ autotime 1450915200
2015-12-24
$ autotime 1450956180
2015-12-24 11:23:00

Libraries To The Rescue

Date and time arithmetic is surprisingly hard to get right, and at the same time rather boring, hence I'm happy to delegate that part to libraries.

Perl 6 ships with DateTime (somewhat inspired by the Perl 5 module of the same name) and Date (mostly blatantly stolen from Perl 5's Date::Simple module) in the core library. Those two will do the actual conversion, so we can focus on the input and output, and detecting the formats to decide in which direction to convert.

For the conversion from a UNIX timestamp to a date or datetime, the DateTime.new constructor comes in handy. It has a variant that accepts a single integer as a UNIX timestamp:

$ perl6 -e "say DateTime.new(1450915200)"
2015-12-24T00:00:00Z

Looks like we're almost done with one direction, right?

#!/usr/bin/env perl6
sub MAIN(Int $timestamp) {
    say DateTime.new($timestamp)
}

Let's run it:

$ autotime 1450915200
Invalid DateTime string '1450915200'; use an ISO 8601 timestamp (yyyy-mm-ddThh:mm:ssZ or yyyy-mm-ddThh:mm:ss+01:00) instead
  in sub MAIN at autotime line 2
  in block <unit> at autotime line 2

Oh my, what happened? It seems that the DateTime constructor views the argument as a string, even though the parameter to sub MAIN is declared as an Int. How can that be? Let's add some debugging output:

#!/usr/bin/env perl6
sub MAIN(Int $timestamp) {
    say $timestamp.^name;
    say DateTime.new($timestamp)
}

Running it now with the same invocation as before, there's an extra line of output before the error:

IntStr

$thing.^name calls a method on the meta class of $thing; name asks it for its name, in other words the name of the class. IntStr is a subclass of both Int and Str, which is why the DateTime constructor legitimately considers it a Str. The mechanism that parses command line arguments before they are passed on to MAIN converts the string from the command line to IntStr instead of Str, in order to not lose information in case we do want to treat it as a string.

Cutting a long story short, we can force the argument into a "real" integer by adding a + prefix, which is the general mechanism for conversion to a numeric value:

#!/usr/bin/env perl6
sub MAIN(Int $timestamp) {
    say DateTime.new(+$timestamp)
}

A quick test shows that it now works:

$ ./autotime 1450915200
2015-12-24T00:00:00Z
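
To make the IntStr behavior concrete, here is a minimal sketch. It assumes that the built-in val routine produces the same kind of value that the command-line parser hands to MAIN for a numeric-looking argument; the variable name $arg is only for illustration.

```raku
# val() turns a string into an allomorph (assumption: this mirrors what
# MAIN receives for a numeric-looking command-line argument).
my $arg = val('1450915200');

say $arg.^name;            # IntStr
say $arg ~~ Int;           # True
say $arg ~~ Str;           # True

# The + prefix drops the string half and leaves a plain Int.
say (+$arg).^name;         # Int
say DateTime.new(+$arg);   # 2015-12-24T00:00:00Z
```

Because the value matches both Int and Str, any routine that has a Str candidate may pick that one, which is what happened in the DateTime constructor.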

The output is in the ISO 8601 timestamp format, which might not be the easiest on the eye. For a date (when hour, minute and second are zero), we really want just the date:

#!/usr/bin/env perl6
sub MAIN(Int $timestamp) {
    my $dt = DateTime.new(+$timestamp);
    if $dt.hour == 0 && $dt.minute == 0 && $dt.second == 0 {
        say $dt.Date;
    }
    else {
        say $dt;
    }
}

Better:

$ ./autotime 1450915200
2015-12-24

But the conditional is a bit clunky. Really, three comparisons to 0?

Perl 6 has a neat little feature that lets you write this more compactly:

if all($dt.hour, $dt.minute, $dt.second) == 0 {
    say $dt.Date;
}

all(...) creates a Junction, a composite value of several other values, that also stores a logical mode. When you compare a junction to another value, that comparison automatically applies to all the values in the junction. The if statement evaluates the junction in a boolean context, and in this case only returns True if all comparisons returned True as well.

There are four types of junctions: any, all, none and one. Considering that 0 is the only integer that is false in a boolean context, we could even write the statement above as:

if none($dt.hour, $dt.minute, $dt.second) {
    say $dt.Date;
}

Neat, right?
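
If junctions are new to you, the following sketch shows how the four types behave in a comparison. The arrays @midnight and @later are made-up sample data, not part of the tool.

```raku
my @midnight = 0, 0, 0;
my @later    = 0, 0, 1;

# ?(...) forces boolean context, which collapses the junction to True or False.
say ?(all(@midnight) == 0);   # True:  every value equals 0
say ?(all(@later)    == 0);   # False: one value is 1
say ?(any(@later)    == 0);   # True:  at least one value equals 0
say ?(one(@later)    == 1);   # True:  exactly one value equals 1
say ?none(@midnight);         # True:  no value is true in boolean context
```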

But you don't always need fancy language constructs to write concise programs. In this case, approaching the problem from a slightly different angle yields even shorter and clearer code. If the DateTime object round-trips a conversion to Date and back to DateTime without loss of information, it's clearly a Date:

if $dt.Date.DateTime == $dt {
    say $dt.Date;
}
else {
    say $dt;
}
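
The round-trip check can be tried in isolation. This is a small sketch; the helper name is-whole-day is hypothetical and only wraps the comparison shown above.

```raku
# Hypothetical helper: True if the DateTime falls exactly on midnight UTC.
sub is-whole-day(DateTime $dt) {
    $dt.Date.DateTime == $dt
}

say is-whole-day(DateTime.new(1450915200));   # True  (2015-12-24T00:00:00Z)
say is-whole-day(DateTime.new(1450956180));   # False (2015-12-24T11:23:00Z)
```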

DateTime Formatting

For a timestamp that doesn't resolve to a full day, the output from our script currently looks like this:

2015-12-24T00:00:01Z

where "Z" indicates the UTC or "Zulu" timezone.

Instead I'd like it to be

2015-12-24 00:00:01

The DateTime class supports custom formatters, so let's write one:

sub MAIN(Int $timestamp) {
    my $dt = DateTime.new(+$timestamp, formatter => sub ($o) {
            sprintf '%04d-%02d-%02d %02d:%02d:%02d',
                    $o.year, $o.month,  $o.day,
                    $o.hour, $o.minute, $o.second,
    });
    if $dt.Date.DateTime == $dt {
        say $dt.Date;
    }
    else {
        say $dt.Str;
    }
}

Now the output looks better:

$ ./autotime 1450915201
2015-12-24 00:00:01

The syntax formatter => ... in the context of an argument denotes a named argument, which means the name and not position in the argument list decides which parameter to bind to. This is very handy if there are a bunch of parameters.

I don't like the code anymore, because the formatter is inline in the DateTime.new(...) call, which I find unclear.

Let's make this a separate routine:

#!/usr/bin/env perl6
sub MAIN(Int $timestamp) {
    sub formatter($o) {
        sprintf '%04d-%02d-%02d %02d:%02d:%02d',
                $o.year, $o.month,  $o.day,
                $o.hour, $o.minute, $o.second,
    }
    my $dt = DateTime.new(+$timestamp, formatter => &formatter);
    if $dt.Date.DateTime == $dt {
        say $dt.Date;
    }
    else {
        say $dt.Str;
    }
}

Yes, you can put a subroutine declaration inside the body of another subroutine declaration; a subroutine is just an ordinary lexical symbol, like a variable declared with my.

In the line my $dt = DateTime.new(+$timestamp, formatter => &formatter);, the syntax &formatter refers to the subroutine as an object, without calling it.

This being Perl 6, formatter => &formatter has a shortcut: :&formatter. As a general rule, if you want to fill a named parameter whose name is the name of a variable, and whose value is the value of the variable, you can create it by writing :$variable. And as an extension, :thing is short for thing => True.
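
Here is a short sketch of these shortcuts side by side. The routine greet and its parameters are invented for illustration and have nothing to do with DateTime.

```raku
# Hypothetical routine with two named parameters, for illustration only.
sub greet(:$name, :$loud) {
    my $text = "hello, $name";
    $loud ?? $text.uc !! $text
}

my $name = 'camelia';
say greet(name => $name, loud => True);   # HELLO, CAMELIA
say greet(:$name, :loud);                 # HELLO, CAMELIA  (same call, shorter)
say greet(:$name);                        # hello, camelia

# The & sigil refers to a routine as an object, without calling it.
my &callback = &greet;
say callback(:$name);                     # hello, camelia
```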

Looking the Other Way

Now that the conversion from timestamps to dates and times works fine, let's look in the other direction. Our small tool needs to parse the input, and decide whether the input is a timestamp, or a date and optionally a time.

The boring way would be to use a conditional:

sub MAIN($input) {
    if $input ~~ / ^ \d+ $ / {
        # convert from timestamp to date/datetime
    }
    else {
        # convert from date to timestamp

    }
}

But I hate boring, so I want to look at a more exciting (and extensible) approach.

Perl 6 supports multiple dispatch. That means you can have multiple subroutines with the same name, but different signatures. And Perl 6 automatically decides which one to call. You have to explicitly enable this feature by writing multi sub instead of sub, so that Perl 6 can catch accidental redeclaration for you.
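
Stripped of everything date-related, multiple dispatch can be sketched like this. The routine name describe is made up for the example.

```raku
# Two candidates with the same name; the argument type selects one.
multi sub describe(Int $n) { "an integer: $n" }
multi sub describe(Str $s) { "a string: $s" }

say describe(42);              # an integer: 42
say describe('2015-12-24');    # a string: 2015-12-24
```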

Here is what that looks like for our tool:

#!/usr/bin/env perl6

multi sub MAIN(Int $timestamp) {
    sub formatter($o) {
        sprintf '%04d-%02d-%02d %02d:%02d:%02d',
                $o.year, $o.month,  $o.day,
                $o.hour, $o.minute, $o.second,
    }
    my $dt = DateTime.new(+$timestamp, :&formatter);
    if $dt.Date.DateTime == $dt {
        say $dt.Date;
    }
    else {
        say $dt.Str;
    }
}


multi sub MAIN(Str $date) {
    say Date.new($date).DateTime.posix
}

Let's see it in action:

$ ./autotime 2015-12-24
1450915200
$ ./autotime 1450915200
Ambiguous call to 'MAIN'; these signatures all match:
:(Int $timestamp)
:(Str $date)
  in block <unit> at ./autotime line 17

Not quite what I had envisioned. The problem is again that the integer argument is converted automatically to IntStr, and both the Int and the Str multi (or candidate) accept that as an argument.

The easiest approach to avoiding this error is narrowing down the kinds of strings that the Str candidate accepts. The classical approach would be to have a regex that roughly validates the incoming argument:

multi sub MAIN(Str $date where /^ \d+ \- \d+ \- \d+ $ /) {
    say Date.new($date).DateTime.posix
}

And indeed it works, but why duplicate the logic that Date.new already has for validating date strings? If you pass a string argument that doesn't look like a date, you get an error like this:

Invalid Date string 'foobar'; use yyyy-mm-dd instead

We can use this behavior in constraining the string parameter of the MAIN multi candidate:

multi sub MAIN(Str $date where { try Date.new($_) }) {
    say Date.new($date).DateTime.posix
}

The additional try in here is because subtype constraints behind a where are not supposed to throw an exception, just return a false value.
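
The same constraint works on any routine, not just MAIN. The following sketch uses a hypothetical helper named to-timestamp to show what happens when the constraint rejects its argument.

```raku
# Hypothetical helper; the constraint reuses Date.new as the validator.
sub to-timestamp(Str $date where { try Date.new($_) }) {
    Date.new($date).DateTime.posix
}

say to-timestamp('2015-12-24');     # 1450915200

# A string that Date.new rejects fails the constraint, so the call dies;
# try turns that failure into an undefined value.
my $input  = 'foobar';
my $result = try to-timestamp($input);
say $result.defined;                # False
```

With a multi, a failed constraint does not die right away; the dispatcher simply moves on to the next candidate.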

And now it works as intended:

$ ./autotime 2015-12-24
1450915200
$ ./autotime 1450915200
2015-12-24

Dealing With Time

The only feature left to implement is conversion of date and time to a timestamp. In other words, we want to handle calls like autotime 2015-12-24 11:23:00:

multi sub MAIN(Str $date where { try Date.new($_) }, Str $time?) {
    my $d = Date.new($date);
    if $time {
        my ( $hour, $minute, $second ) = $time.split(':');
        say DateTime.new(date => $d, :$hour, :$minute, :$second).posix;
    }
    else {
        say $d.DateTime.posix;
    }
}

The new second argument is optional by virtue of the trailing ?. If it is present, we split the time string on the colon to get hour, minute and second. My first instinct while writing this code was to use shorter variable names, my ($h, $m, $s) = $time.split(':'), but then the call to the DateTime constructor would have looked like this:

DateTime.new(date => $d, hour => $h, minute => $m, second => $s);

So the named arguments to the constructor made me choose more self-explanatory variable names.

So, this works:

$ ./autotime 2015-12-24 11:23:00
1450956180

And we can check that it round-trips:

$ ./autotime 1450956180
2015-12-24 11:23:00

Tighten Your Seat Belt

Now that the program is feature complete, we should strive to remove some clutter, and explore a few more awesome Perl 6 features.

The first feature that I want to exploit is that of an implicit variable or topic. A quick demonstration:

for 1..3 {
    .say
}

produces the output

1
2
3

There is no explicit iteration variable, so Perl 6 implicitly binds the current value of the loop to a variable called $_. The method call .say is a shortcut for $_.say. And since our formatter subroutine calls six methods on the same variable, using $_ there is a nice visual optimization:

sub formatter($_) {
    sprintf '%04d-%02d-%02d %02d:%02d:%02d',
            .year, .month,  .day,
            .hour, .minute, .second,
}

If you want to set $_ in a lexical scope without resorting to a function definition, you can use the given VALUE BLOCK construct:

given DateTime.new(+$timestamp, :&formatter) {
    if .Date.DateTime == $_ {
        say .Date;
    }
    else {
        .say;
    }
}

And Perl 6 also offers a shortcut for conditionals on the $_ variable, which can be used as a generalized switch statement:

given DateTime.new(+$timestamp, :&formatter) {
    when .Date.DateTime == $_ { say .Date }
    default { .say }
}
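
Since given/when is a general construct, here is a sketch on plain integers. The helper describe-sign is hypothetical; each when smartmatches its condition against $_, and the first matching block provides the result.

```raku
# Hypothetical helper that classifies a number, for illustration only.
sub describe-sign(Int $n) {
    given $n {
        when * < 0 { 'negative' }   # a code condition is called with $_
        when 0     { 'zero' }       # a literal is compared against $_
        default    { 'positive' }
    }
}

say describe-sign(-5);   # negative
say describe-sign(0);    # zero
say describe-sign(7);    # positive
```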

If you have a read-only variable or parameter, you can do without the $ sigil, though you have to use a backslash at declaration time:

multi sub MAIN(Int \timestamp) {
    ...
    given DateTime.new(+timestamp, :&formatter) {
    ...
    }
}
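
A complete, runnable sketch of sigilless names, using made-up identifiers:

```raku
# Sigilless parameter: declared with a backslash, used without any sigil.
sub double(Int \n) { n * 2 }
say double(21);          # 42

# Sigilless variables work the same way and cannot be reassigned.
my \seconds-per-day = 24 * 60 * 60;
say seconds-per-day;     # 86400
```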

So now the full code looks like this:

#!/usr/bin/env perl6

multi sub MAIN(Int \timestamp) {
    sub formatter($_) {
        sprintf '%04d-%02d-%02d %02d:%02d:%02d',
                .year, .month,  .day,
                .hour, .minute, .second,
    }
    given DateTime.new(+timestamp, :&formatter) {
        when .Date.DateTime == $_ { say .Date }
        default { .say }
    }
}

multi sub MAIN(Str $date where { try Date.new($_) }, Str $time?) {
    my $d = Date.new($date);
    if $time {
        my ( $hour, $minute, $second ) = $time.split(':');
        say DateTime.new(date => $d, :$hour, :$minute, :$second).posix;
    }
    else {
        say $d.DateTime.posix;
    }
}

MAIN magic

The magic that calls sub MAIN for us also provides us with an automagic usage message if we call it with arguments that don't fit any multi, for example with no arguments at all:

$ ./autotime
Usage:
  ./autotime <timestamp>
  ./autotime <date> [<time>]

We can add a short description to these usage lines by adding semantic comments before the MAIN subs:

#!/usr/bin/env perl6

#| Convert timestamp to ISO date
multi sub MAIN(Int \timestamp) {
    ...
}

#| Convert ISO date to timestamp
multi sub MAIN(Str $date where { try Date.new($_) }, Str $time?) {
    ...
}

Now the usage message becomes:

$ ./autotime
Usage:
  ./autotime <timestamp> -- Convert timestamp to ISO date
  ./autotime <date> [<time>] -- Convert ISO date to timestamp
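
These comments are not limited to MAIN. As a small sketch, assuming a routine named convert that exists only for this example, the attached text can be read back at run time through the .WHY method:

```raku
#| Convert timestamp to ISO date
sub convert(Int \timestamp) { ... }

# .WHY returns the documentation object attached to the routine.
say &convert.WHY;        # Convert timestamp to ISO date
```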

Summary

We've seen a bit of Date and DateTime arithmetic, but the exciting part is multiple dispatch, named arguments, subtype constraints with where clauses, given/when and the implicit $_ variable, and some serious magic when it comes to MAIN subs.

Subscribe to the Perl 6 book mailing list
