Sun, 25 Jul 2010




"Perl 5 to 6" Lesson 28 - Currying


  use v6;
  my &f := &substr.assuming('Hello, World');
  say f(0, 2);                # He
  say f(3, 2);                # lo
  say f(7);                   # World
  say <a b c>.map: * x 2;     # aabbcc
  say <a b c>.map: *.uc;      # ABC
  for ^10 {
      print <R G B>.[$_ % *]; # RGBRGBRGBR
  }


Currying or partial application is the process of generating a function from another function or method by providing only some of the arguments. This is useful for saving typing, and when you want to pass a callback to another function.

Suppose you want a function that lets you extract substrings from "Hello, World" easily. The classical way of doing that is writing your own function:

  sub f(*@a) {
      substr('Hello, World', |@a)
  }

Currying with assuming

Perl 6 provides a method assuming on code objects, which applies the arguments passed to it to the invocant, and returns the partially applied function.

  my &f := &substr.assuming('Hello, World');

Now f(1, 2) is the same as substr('Hello, World', 1, 2).

assuming also works on operators, because operators are just subroutines with weird names. To get a subroutine that adds 2 to whatever number gets passed to it, you could write

  my &add_two := &infix:<+>.assuming(2);

But that's tedious to write, so there's another option.

Currying with the Whatever-Star

  my &add_two := * + 2;
  say add_two(4);         # 6

The asterisk, called Whatever, is a placeholder for an argument, so the whole expression returns a closure. Multiple Whatevers are allowed in a single expression, and create a closure that expects more arguments, by replacing each term * by a formal parameter. So * * 5 + * is equivalent to -> $a, $b { $a * 5 + $b }.

  my $c = * * 5 + *;
  say $c(10, 2);                # 52

Note that the second * is an infix operator, not a term, so it is not subject to Whatever-currying.

The process of lifting an expression with Whatever stars into a closure is driven by syntax, and done at compile time. This means that

  my $star = *;
  my $code = $star + 2

does not construct a closure, but instead dies with a message like

  Can't take numeric value for object of type Whatever

Whatever currying is more versatile than .assuming, because it makes it very easy to curry an argument other than the first:

  say  ~(1, 3).map: 'hi' x *    # hi hihihi

This curries the second argument of the string repetition operator infix x, so it returns a closure that, when called with a numeric argument, produces the string hi as often as that argument specifies.

The invocant of a method call can also be Whatever star, so

  say <a b c>.map: *.uc;      # ABC

involves a closure that calls the uc method on its argument.
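
The two currying styles can be compared side by side. The following is a minimal sketch, assuming a current Rakudo; the sub power and the variable names are made up for illustration and are not part of the lesson.

```raku
use v6;

# Hypothetical helper, only here to have something to curry.
sub power($base, $exponent) { $base ** $exponent }

# .assuming fixes the first positional argument.
my &two-to-the = &power.assuming(2);
say two-to-the(10);                     # 1024

# The Whatever-star can stand in for any operand, here the first one.
my &square = * ** 2;
say square(7);                          # 49

# Curried expressions are handy as callbacks.
say (1..10).grep(* %% 2).join(' ');     # 2 4 6 8 10
```

The results are joined into a string before printing so that the output does not depend on how say formats lists.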


Perl 5 could be used for functional programming, which has been demonstrated in Mark Jason Dominus' book Higher Order Perl.

Perl 6 strives to make it even easier, and thus provides tools to make typical constructs in functional programming easily available. Currying and easy construction of closures are key to functional programming, and make it very easy to write transformations for your data, for example together with map or grep.



Thu, 22 Jul 2010

Common Perl 6 data processing idioms



"Perl 5 to 6" Lesson 27 - Common Perl 6 data processing idioms


  # create a hash from a list of keys and values:
  # solution 1: slices
  my %hash; %hash{@keys} = @values;
  # solution 2: meta operators
  my %hash = @keys Z=> @values;

  # create a hash from an array, with
  # true value for each array item:
  my %exists = @keys X=> True;

  # limit a value to a given range, here 0..10.
  my $x = -2;
  say 0 max $x min 10;

  # for debugging: dump the contents of a variable,
  # including its name, to STDERR
  note :$x.perl;

  # sort case-insensitively
  say @list.sort: *.lc;

  # mandatory attributes
  class Something {
      has $.required = die "Attribute 'required' is mandatory";
  }
  Something.new(required => 2); # no error
  Something.new()               # BOOM


Learning the specification of a language is not enough to be productive with it. Rather, you need to know how to solve specific problems. Common usage patterns, called idioms, save you from re-inventing the wheel every time you're faced with a problem.

So here are some common Perl 6 idioms, dealing with data structures.


  # create a hash from a list of keys and values:
  # solution 1: slices
  my %hash; %hash{@keys} = @values;
  # solution 2: meta operators
  my %hash = @keys Z=> @values;

The first solution is the same you'd use in Perl 5: assignment to a slice. The second solution uses the zip operator Z, which joins two lists like a zip fastener: 1, 2, 3 Z 10, 20, 30 is 1, 10, 2, 20, 3, 30. The Z=> is a meta operator, which combines zip with => (the Pair construction operator). So 1, 2, 3 Z=> 10, 20, 30 evaluates to 1 => 10, 2 => 20, 3 => 30. Assignment to a hash variable turns that into a Hash.

For existence checks, the values in a hash often don't matter, as long as they all evaluate to True in boolean context. In that case, a nice way to initialize the hash from a given array or list of keys is

  my %exists = @keys X=> True;

which uses the cross meta operator to use the single value True for every item in @keys.
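
Both hash idioms can be tried out together. This is a small sketch, assuming a current Rakudo; the sample keys and values are invented for illustration.

```raku
use v6;

# Hypothetical sample data.
my @keys   = <apple banana cherry>;
my @values = 3, 5, 7;

# Zip keys and values into pairs, then assign them to a hash.
my %count = @keys Z=> @values;
say %count<banana>;                 # 5

# Cross every key with the single value True to get a lookup table.
my %exists = @keys X=> True;
say %exists<cherry>;                # True
say %exists<durian>:exists;         # False
```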


Sometimes you want to get a number from somewhere, but clip it into a predefined range (for example so that it can act as an array index).

In Perl 5 you often end up with things like $a = $b > $upper ? $upper : $b, and another conditional for the lower limit. With the max and min infix operators, that simplifies considerably to

  my $in-range = $lower max $x min $upper;

because $lower max $x returns the larger of the two numbers, thus clipping to the lower end of the range; min $upper then clips the result to the upper end.

Since min and max are infix operators, you can also clip in place with their assignment forms:

 $x max= 0;
 $x min= 10;
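
As a quick check of the clipping idiom, here is a minimal sketch, assuming a current Rakudo. The sub name clip is hypothetical; it only wraps the expression from above.

```raku
use v6;

# Hypothetical helper name; max is applied first, then min.
sub clip($x, $lower, $upper) { $lower max $x min $upper }

say clip(-2, 0, 10);    # 0
say clip( 5, 0, 10);    # 5
say clip(42, 0, 10);    # 10

# The same clipping in place, with the assignment forms.
my $index = 42;
$index max= 0;
$index min= 10;
say $index;             # 10
```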


Perl 5 has Data::Dumper, Perl 6 objects have the .perl method. Both generate code that reproduces the original data structure as faithfully as possible.

:$var generates a Pair ("colonpair"), using the variable name as key (but with the sigil stripped). So it's the same as var => $var. note() writes to the standard error stream, appending a newline. So note :$var.perl is a quick way of obtaining the value of a variable for debugging purposes, along with its name.


Like in Perl 5, the sort built-in can take a function that compares two values, and then sorts according to that comparison. Unlike Perl 5, it's a bit smarter, and automatically does a transformation for you if the function takes only one argument.

In general, if you want to compare by a transformed value, in Perl 5 you can do:

    # WARNING: Perl 5 code ahead
    my @sorted = sort { transform($a) cmp transform($b) } @values;

    # or the so-called Schwartzian Transform:
    my @sorted = map { $_->[1] }
                 sort { $a->[0] cmp $b->[0] }
                 map { [transform($_), $_] }
                 @values;

The former solution requires repetitive typing of the transformation, and executes it for each comparison. The second solution avoids that by storing the transformed value along with the original value, but it's quite a bit of code to write.

Perl 6 automates the second solution (and does so a bit more efficiently than the naive Schwartzian transform, by avoiding an array for each value) when the transformation function has arity one, i.e. accepts only one argument:

    my @sorted = sort &transform, @values;
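
To see the arity-based behaviour in action, here is a short sketch, assuming a current Rakudo; the sample data is invented. A one-argument callable is treated as a transformation, a two-argument callable as a comparison.

```raku
use v6;

# Hypothetical sample data.
my @names = <bob Alice carol Dave>;

# One-argument callable: sort compares the transformed values.
say @names.sort(*.lc).join(' ');                    # Alice bob carol Dave

# Two-argument callable: a plain comparison function, as in Perl 5.
say @names.sort({ $^a.lc cmp $^b.lc }).join(' ');   # Alice bob carol Dave

# The transformation can also change how values compare.
say <10 9 100>.sort(+*).join(' ');                  # 9 10 100
```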

Mandatory Attributes

The typical way to enforce the presence of an attribute is to check its presence in the constructor - or in all constructors, if there are many.

That works in Perl 6 too, but it's easier and safer to require the presence at the level of each attribute:

    has $.attr = die "'attr' is mandatory";

This exploits the default value mechanism. When a value is supplied, the code for generating the default value is never executed, and the die never triggers. If any constructor fails to set it, an exception is thrown.
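
A complete, if tiny, demonstration of the idiom might look like this. It is only a sketch, assuming a current Rakudo; the class Point and its attributes are hypothetical.

```raku
use v6;

class Point {
    has $.x = die "Attribute 'x' is mandatory";
    has $.y = 0;                       # optional, with an ordinary default
}

say Point.new(x => 3).x;               # 3

# Leaving out x runs the default expression, which throws.
my $point = try Point.new(y => 1);
say $point.defined;                    # False
say $!.message;                        # Attribute 'x' is mandatory
```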




Thu, 09 Jul 2009

Exceptions and control exceptions



"Perl 5 to 6" Lesson 26 - Exceptions and control exceptions


    try {
        die "OH NOEZ";

        CATCH {
            say "there was an error: $!";
        }
    }


Exceptions are, contrary to their name, nothing exceptional. In fact they are part of the normal control flow of programs in Perl 6.

Exceptions are generated either by implicit errors (for example calling a non-existing method, or type check failures) or by explicitly calling die or other functions.

When an exception is thrown, the program searches for CATCH statements or try blocks in the caller frames, unwinding the stack all the way (that means it forcibly returns from all routines called so far). If no CATCH or try is found, the program terminates, and prints out a hopefully helpful error message. If one was found, the error message is stored in the special variable $!, and the CATCH block is executed (or in the case of a try without a CATCH block the try block returns Any).

So far exceptions might still sound exceptional, but error handling is an integral part of every non-trivial application. Even more, normal return statements also throw exceptions!

They are called control exceptions, and can be caught with CONTROL blocks, or are implicitly caught at each routine declaration.

Consider this example:

    use v6;

    sub s {
        my $block = -> { return "block"; say "still here" };
        $block();
        return "sub";
    }

    say s();    # block

Here the return "block" throws a control exception, causing it to not only exit the current block (and thus not printing still here on the screen), but also exiting the subroutine, where it is caught by the sub s... declaration. The payload, here a string, is handed back as the return value, and the say in the last line prints it to the screen.

Adding a CONTROL { ... } block to the scope in which $block is called causes it to catch the control exception.

Contrary to what other programming languages do, the CATCH/CONTROL blocks are within the scope in which the error is caught (not on the outside), giving them full access to the lexical variables, which makes it easier to generate useful error messages, and also prevents DESTROY blocks from being run before the error is handled.
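
The access to lexical variables can be illustrated with a minimal sketch, assuming a current Rakudo. The sub safe-sqrt and the variable $context are hypothetical names chosen for this example.

```raku
use v6;

sub safe-sqrt($n) {
    my $context = "safe-sqrt called with $n";   # visible inside CATCH
    die "negative input" if $n < 0;
    return $n.sqrt;

    CATCH {
        # $_ holds the exception; $context is an ordinary lexical of the sub.
        default { say "{.message} ($context)" }
    }
}

say safe-sqrt(16);      # 4
safe-sqrt(-1);          # prints: negative input (safe-sqrt called with -1)
```

Because the default block handles the exception, the second call does not terminate the program; the sub simply returns without a value.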

Unthrown exceptions

Perl 6 embraces the idea of multi threading, and in particular automated parallelization. To make sure that not all threads suffer from the termination of a single thread, a kind of "soft" exception was invented.

When a function calls fail($obj), it returns a special undefined value, which contains the payload $obj (usually an error message) and the back trace (file name and line number). Processing that special undefined value without first checking whether it is undefined causes a normal exception to be thrown.

    my @files = </etc/passwd /etc/shadow nonexisting>;
    my @handles = hyper map { open($_) }, @files; # hyper not yet implemented

In this example the hyper operator tells map to parallelize its actions as far as possible. When the opening of the nonexisting file fails, an ordinary die "No such file or directory" would also abort the execution of all other open operations. But since a failed open calls fail("No such file or directory") instead, it gives the caller the possibility to check the contents of @handles, and the caller still has access to the full error message.

If you don't like soft exceptions, you can say use fatal; at the start of the program, which causes all exceptions from fail() to be thrown immediately.
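
A soft exception can be inspected without ever being thrown. The following is a sketch, assuming a current Rakudo, where the value returned by fail is a Failure object; the sub checked-div is a hypothetical helper.

```raku
use v6;

# Hypothetical helper that signals errors softly.
sub checked-div($a, $b) {
    fail "division by zero" if $b == 0;
    return $a / $b;
}

say checked-div(6, 3);              # 2

my $result = checked-div(1, 0);     # no exception is thrown here
say $result.defined;                # False
say $result.exception.message;      # division by zero
```

Checking .defined marks the failure as handled, so the error message stays available without the program dying.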


A good programming language needs exceptions to handle error conditions. Always checking return values for success is a plague and easily forgotten.

Since traditional exceptions can be poisonous for implicit parallelism, we needed a solution that combined the best of both worlds: not killing everything at once, and still not losing any information.


Wed, 27 May 2009

The Cross Meta Operator



"Perl 5 to 6" Lesson 25 - The Cross Meta Operator


    for <a b> X 1..3 -> $a, $b {
        print "$a: $b   ";
    }
    # output: a: 1  a: 2  a: 3  b: 1  b: 2  b: 3

    .say for <a b c> X 1, 2;
    # output: a 1 a 2 b 1 b 2 c 1 c 2
    # (with newlines instead of spaces)


The cross operator X returns the Cartesian product of two or more lists, which means that it returns all possible tuples where the first item is an item of the first list, the second item is an item of the second list, and so on.

If an operator follows the X, then this operator is applied to all tuple items, and the result is returned instead. So 1, 2 X+ 3, 6 will return the values 1+3, 1+6, 2+3, 2+6 (evaluated as 4, 7, 5, 8 of course).
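
Both forms can be tried as follows. This is a sketch, assuming a current Rakudo, which returns each combination as a small list; hence the destructuring sub-signature in the loop, which differs from the synopsis above. The variable names are invented.

```raku
use v6;

# The bare cross yields every combination as a small list.
for <a b> X 1..2 -> ($letter, $number) {
    say "$letter: $number";             # a: 1, a: 2, b: 1, b: 2 (one per line)
}

# With an operator attached, each combination is reduced to one value.
say (1, 2 X+ 3, 6).join(' ');           # 4 7 5 8
say (<a b> X~ 1..3).join(' ');          # a1 a2 a3 b1 b2 b3
```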


It's quite common that one has to iterate over all possible combinations of two or more lists, and the cross operator can condense that into a single iteration, thus simplifying programs and using up one less indentation level.

The usage as a meta operator can sometimes eliminate the loops altogether.



Wed, 10 Dec 2008

The Reduction Meta Operator



"Perl 5 to 6" Lesson 24 - The Reduction Meta Operator


    say [+] 1, 2, 3;    # 6
    say [+] ();         # 0
    say [~] <a b>;      # ab
    say [**] 2, 3, 4;   # 2417851639229258349412352

    [\+] 1, 2, 3, 4     # 1, 3, 6, 10
    [\**] 2, 3, 4       # 4, 81, 2417851639229258349412352

    if [<=] @list {
        say "ascending order";
    }


The reduction meta operator [...] can enclose any associative infix operator, and turn it into a list operator. This happens as if the operator was just put between the items of the list, so [op] $i1, $i2, @rest returns the same result as if it was written as $i1 op $i2 op @rest[0] op @rest[1] ....

This is a very powerful construct that promotes the plus + operator into a sum function, ~ into a join (with empty separator) and so on. It is somewhat similar to the List.reduce function, and if you had some exposure to functional programming, you'll probably know about foldl and foldr (in Lisp or Haskell). Unlike those, [...] respects the associativity of the enclosed operator, so [/] 1, 2, 3 is interpreted as (1 / 2) / 3 (left associative), [**] 1, 2, 3 is handled correctly as 1 ** (2**3) (right associative).

As with all other operators, whitespace inside it is forbidden, so while you can write [+], you can't write [ + ]. (This also helps to disambiguate it from array literals.)

Since comparison operators can be chained, you can also write things like

    if    [==] @nums { say "all nums in @nums are the same" }
    elsif [<]  @nums { say "@nums is in strict ascending order" }
    elsif [<=] @nums { say "@nums is in ascending order"}

However you cannot reduce the assignment operator:

    my @a = 1..3;
    [=] @a, 4;          # Cannot reduce with = because list assignment operators are too fiddly

Getting partial results

There's a special form of this operator that uses a backslash like this: [\+]. It returns a list of the partial evaluation results. So [\+] 1..3 returns the list 1, 1+2, 1+2+3, which is of course 1, 3, 6.

    [\~] 'a' .. 'd'     # <a ab abc abcd>

Since right-associative operators evaluate from right to left, you also get the partial results that way:

    [\**] 1..3;         # 3, 2**3, 1**(2**3), which is 3, 8, 1

Multiple reduction operators can be combined:

    [~] [\**] 1..3;     # "381"
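
A few more reductions, collected into one runnable sketch, assuming a current Rakudo; the list @nums is invented sample data.

```raku
use v6;

my @nums = 3, 1, 4, 1, 5;

say [+] @nums;                   # 14
say [*] @nums;                   # 60
say [max] @nums;                 # 5
say ([\+] @nums).join(' ');      # 3 4 8 9 14

# Chained comparison over a whole list.
say so [<=] @nums.sort;          # True
say so [<=] @nums;               # False
```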


Programmers are lazy, and don't want to write a loop just to apply a binary operator to all elements of a list. List.reduce does something similar, but it's not as terse as the meta operator ([+] @list would be @list.reduce(&infix:<+>)). Also with reduce you have to take care of the associativity of the operator yourself, whereas the meta operator handles it for you.

If you're not convinced, play a bit with it (rakudo implements it), it's real fun.



Tue, 09 Dec 2008

Quoting and Parsing



"Perl 5 to 6" Lesson 23 - Quoting and Parsing


    my @animals = <dog cat tiger>;
    # or
    my @animals = qw/dog cat tiger/;
    # or 
    my $interface = q{eth0};
    my $ips = q :s :x /ifconfig $interface/;

    # -----------

    sub if {
        warn "if() calls a sub\n";
    }



Perl 6 has a powerful mechanism for quoting strings: you have exact control over which features you want in your string.

Perl 5 had single quotes, double quotes and qw(...) (single quotes, split on whitespace) as well as the q(..) and qq(...) forms, which are basically synonyms for single and double quotes.

Perl 6 in turn defines a quote operator named Q that can take various modifiers. The :b (backslash) modifier allows interpolation of backslash escape sequences like \n, the :s modifier allows interpolation of scalar variables, :c allows the interpolation of closures ("1 + 2 = { 1 + 2 }") and so on, :w splits on words as qw/.../ does.

You can arbitrarily combine those modifiers. For example you might wish a form of qw/../ that interpolates only scalars, but nothing else? No problem:

    my $stuff = "honey";
    my @list = Q :w :s/milk toast $stuff with\tfunny\nescapes/;
    say @list[*-1];                     # with\tfunny\nescapes
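
The effect of individual modifiers is easiest to see on one and the same string. A minimal sketch, assuming a current Rakudo:

```raku
use v6;

my $stuff = "honey";

say Q/$stuff {1 + 2}/;        # $stuff {1 + 2}   (nothing interpolated)
say Q:s/$stuff {1 + 2}/;      # honey {1 + 2}    (scalars only)
say Q:c/$stuff {1 + 2}/;      # $stuff 3         (closures only)
say qq/$stuff {1 + 2}/;       # honey 3          (both)
```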

Here's a list of what modifiers are available, mostly stolen from S02 directly. All of these also have long names, which I omitted here.

        :q          Interpolate \\, \q and \'
        :b          Other backslash escape sequences like \n, \t
        :x          Execute as shell command, return result
        :w          Split on whitespaces
        :ww         Split on whitespaces, with quote protection
    Variable interpolation
        :s          Interpolate scalars   ($stuff)
        :a          Interpolate arrays    (@stuff[])
        :h          Interpolate hashes    (%stuff{})
        :f          Interpolate functions (&stuff())
        :c          Interpolate closures  ({code})
        :qq         Interpolate with :s, :a, :h, :f, :c, :b
        :regex      parse as regex

There are some short forms which make life easier for you:

    q       Q:q
    qq      Q:qq
    m       Q:regex

You can also omit the first colon : if the quoting symbol is a short form, and write it as a single word:

    symbol      short for
    qw          q:w
    Qw          Q:w
    qx          q:x
    Qc          Q:c
    # and so on.

However there is one form that does not work, and some Perl 5 programmers will miss it: you can't write qw(...) with round parentheses in Perl 6. It is interpreted as a call to sub qw.


This is where parsing comes into play: Every construct of the form identifier(...) is parsed as sub call. Yes, every.


For example, if($x < 3) is parsed as a call to sub if. You can disambiguate with whitespace:

    if ($x < 3) { say '<3' }

Or just omit the parens altogether:

    if $x < 3 { say '<3' }

This implies that Perl 6 has no keywords. Actually there are keywords like use or if, but they are not reserved in the sense that identifiers are restricted to non-keywords.


Various combinations of the quoting modifiers are already used internally, for example q:w to parse <...>, and :regex for m/.../. It makes sense to expose these also to the users, who gain flexibility, and can very easily write macros that provide a shortcut for the exact quoting semantics they want.

And when you limit the specialty of keywords, you have far fewer problems with backwards compatibility if you want to change what you consider a "keyword".



Mon, 08 Dec 2008

The State of the implementations



"Perl 5 to 6" Lesson 22 - The State of the implementations




Note: This lesson is long outdated, and preserved for historical interest only. The best way to stay informed about various Perl 6 compilers is to follow the blogs at

Perl 6 is a language specification, and multiple compilers are being written that aim to implement Perl 6, and partially they already do.


Pugs is a Perl 6 compiler written in Haskell. It was started by Audrey Tang, and she also did most of the work. In terms of implemented features it might still be the most advanced implementation today (May 2009).

To build and test pugs, you have to install GHC 6.10.1 first, and then run

    svn co
    cd pugs
    perl Makefile.PL
    make test

That will install some Haskell dependencies locally and then build pugs. For make test you might need to install some Perl 5 modules, which you can do with cpan Task::Smoke.

Pugs hasn't been developed during the last three years, except occasional clean-ups of the build system.

Since the specification is evolving and Pugs is not updated, it is slowly drifting into obsoleteness.

Pugs can parse most common constructs, implements object orientation, basic regexes, nearly(?) all control structures, basic user defined operators and macros, many builtins, contexts (except slice context), junctions, basic multi dispatch and the reduction meta operator - based on the syntax of three years past.


Rakudo is a parrot based compiler for Perl 6. The main architect is Patrick Michaud, many features were implemented by Jonathan Worthington.

It is hosted on github, you can find build instructions on

Rakudo development is very active, it's the most active Perl 6 compiler today. It passes a bit more than 17,000 tests from the official test suite (July 2009).

It implements most control structures, most syntaxes for number literals, interpolation of scalars and closures, chained operators, BEGIN- and END blocks, pointy blocks, named, optional and slurpy arguments, sophisticated multi dispatch, large parts of the object system, regexes and grammars, Junctions, generic types, parametric roles, typed arrays and hashes, importing and exporting of subroutines and basic meta operators.

If you want to experiment with Perl 6 today, Rakudo is the recommended choice.


Mitchell Charity started elf, a bootstrapping compiler written in Perl 6, with a grammar written in Ruby. Currently it has a Perl 5 backend, others are in planning.

It lives in the pugs repository, once you've checked it out you can go to misc/elf/ and run ./elf_f $filename. You'll need ruby-1.9 and some perl modules, about which elf will complain bitterly when they are not present.

elf is developed in bursts of activity followed by weeks of low activity, or even none at all.

It parses more than 70% of the test suite, but implements mostly features that are easy to emulate with Perl 5, and passes about 700 tests from the test suite.


Flavio Glock started KindaPerl6 (short kp6), a mostly bootstrapped Perl 6 compiler. Since the bootstrapped version is much too slow to be fun to develop with, it is now waiting for a faster backend.

Kp6 implements object orientation, grammars and a few distinct features like lazy gather/take. It also implements BEGIN blocks, which was one of the design goals.

v6 is a source filter for Perl 5. It was written by Flavio Glock, and supports basic Perl 6 plus grammars. It is fairly stable and fast, and is occasionally enhanced. It lives on the CPAN and in the pugs repository in perl5/*/.


Smop stands for Simple Meta Object Programming and doesn't plan to implement all of Perl 6, it is designed as a backend (a little bit like parrot, but very different in both design and feature set). Unlike the other implementations it aims explicitly at implementing Perl 6's powerful meta object programming facilities, ie the ability to plug in different object systems.

It is implemented in C and various domain specific languages. It was designed and implemented by Daniel Ruoso, with help from Yuval Kogman (design) and Paweł Murias (implementation, DSLs). A grant from The Perl Foundation supports its development, and it currently approaches the stage where one could begin to emit code for it from another compiler.

It will then be used as a backend for either elf or kp6, and perhaps also for pugs.

Larry Wall wrote a grammar for Perl 6 in Perl 6. He also wrote a cheating script named gimme5, which translates that grammar to Perl 5. It can parse about every written and valid piece of Perl 6 that we know of, including the whole test suite (apart from a few failures now and then when Larry accidentally broke something). The grammar lives in the pugs repository, and can be run and tested with perl-5.10.0 installed in /usr/local/bin/perl and a few perl modules (like YAML::XS and Moose):

    cd src/perl6/
    make testt      # warning: takes lot of time, 80 minutes or so
    ./tryfile $your_file

It correctly parses custom operators and warns about non-existent subs, undeclared variables and multiple declarations of the same variable as well as about some Perl 5isms.


Many people ask why we need so many different implementations, and if it wouldn't be better to focus on one instead.

There are basically three answers to that.

Firstly, that's not how programming by volunteers works. People sometimes either want to start something with the tools they like, or they think that one aspect of Perl 6 is not sufficiently honoured by the design of the existing implementations. Then they start a new project.

The second possible answer is that the projects explore different areas of the vast Perl 6 language: SMOP explores meta object programming (from which Rakudo will also benefit), Rakudo and parrot care a lot about efficient language interoperability, grammars and platform independence, kp6 explored BEGIN blocks, and pugs was the first implementation to explore the syntax, and many parts of the language for the first time.

The third answer is that we don't want a single point of failure. If we had just one implementation and ran into severe problems with it for unforeseeable reasons (technical, legal, personal, ...), we would be stuck; with several implementations we have possible fallbacks.


Pugs:,,, source:


Elf: source: see pugs, misc/elf/.

KindaPerl6: source: see pugs, v6/v6-KindaPerl6.

v6: source: see pugs, perl5/.

Perl 6 grammar: source: see pugs, src/perl6/.


Sun, 07 Dec 2008

Subset Types



"Perl 5 to 6" Lesson 21 - Subset Types


    subset Squares of Real where { .sqrt.Int**2 == $_ };

    multi sub square_root(Squares $x --> Int) {
        return $x.sqrt.Int;
    }
    multi sub square_root(Real $x --> Real) {
        return $x.sqrt;
    }


Java programmers tend to think of a type as either a class or an interface (which is something like a crippled class), but that view is too limited for Perl 6. A type is more generally a constraint on what values a container can contain. The "classical" constraint is: it is an object of a class X or of a class that inherits from X. Perl 6 also has constraints like: the class or the object does role Y, or: this piece of code returns true for our object. The latter is the most general one, and is called a subset type:

    subset Even of Int where { $_ % 2 == 0 }
    # Even can now be used like every other type name

    my Even $x = 2;
    my Even $y = 3; # type mismatch error

(Try it out, Rakudo implements subset types).

You can also use anonymous subtypes in signatures:

    sub foo (Int where { ... } $x) { ... }
    # or with the variable at the front:
    sub foo ($x of Int where { ... } ) { ... }
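
As a concrete case of an anonymous constraint in a signature, here is a sketch, assuming a current Rakudo; the sub reciprocal and the variable names are hypothetical.

```raku
use v6;

# Hypothetical example: reject zero with an anonymous constraint.
sub reciprocal(Numeric $x where { $x != 0 }) { 1 / $x }

say reciprocal(4);                  # 0.25

my $zero   = 0;
my $result = try reciprocal($zero); # the constraint fails, so the call dies
say $result.defined;                # False
```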


Allowing arbitrary type constraints in the form of code allows ultimate extensibility: if you don't like the current type system, you can just roll your own based on subset types.

It also makes libraries easier to extend: instead of dying on data that can't be handled, the subs and methods can simply declare their types in a way that "bad" data is rejected by the multi dispatcher. If somebody wants to handle data that the previous implementation rejected as "bad", they can simply add a multi sub with the same name that accepts the data. For example a math library that handles real numbers could be enhanced this way to also handle complex numbers.
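
The interplay of subset types and multi dispatch can be sketched like this, assuming a current Rakudo; the sub describe is a made-up example.

```raku
use v6;

subset Even of Int where { $_ %% 2 };

# The candidate with the tighter constraint is tried first.
multi sub describe(Even $n) { "$n is even" }
multi sub describe(Int  $n) { "$n is odd"  }

say describe(4);        # 4 is even
say describe(7);        # 7 is odd
say 6 ~~ Even;          # True
```

Values that fail the Even constraint simply fall through to the more general Int candidate, which is the extension mechanism described above.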


Sat, 06 Dec 2008

A grammar for (pseudo) XML



"Perl 5 to 6" Lesson 20 - A grammar for (pseudo) XML


    grammar XML {
        token TOP   { ^ <xml> $ };
        token xml   { <text> [ <tag> <text> ]* };
        token text {  <-[<>&]>* };
        rule tag   {
            '<'(\w+) <attributes>*
            [
                | '/>'                 # a single tag
                | '>'<xml>'</' $0 '>'  # an opening and a closing tag
            ]
        };
        token attributes { \w+ '="' <-["<>]>* '"' };
    };


So far the focus of these articles has been the Perl 6 language, independently of what has been implemented so far. To show you that it's not a purely fantasy language, and to demonstrate the power of grammars, this lesson shows the development of a grammar that parses basic XML, and that runs with Rakudo.

Please follow the instructions on to obtain and build Rakudo, and try it out yourself.

Our idea of XML

For our purposes XML is quite simple: it consists of plain text and nested tags that can optionally have attributes. So here are a few tests for what we want to parse as valid "XML", and what not:

    my @tests = (
        [1, 'abc'                       ],      # 1
        [1, '<a></a>'                   ],      # 2
        [1, '..<ab>foo</ab>dd'          ],      # 3
        [1, '<a><b>c</b></a>'           ],      # 4
        [1, '<a href="foo"><b>c</b></a>'],      # 5
        [1, '<a empty="" ><b>c</b></a>' ],      # 6
        [1, '<a><b>c</b><c></c></a>'    ],      # 7
        [0, '<'                         ],      # 8
        [0, '<a>b</b>'                  ],      # 9
        [0, '<a>b</a'                   ],      # 10
        [0, '<a>b</a href="">'          ],      # 11
        [1, '<a/>'                      ],      # 12
        [1, '<a />'                     ],      # 13
    );

    my $count = 1;
    for @tests -> $t {
        my $s = $t[1];
        my $M = XML.parse($s);
        if !($M  xor $t[0]) {
            say "ok $count - '$s'";
        } else {
            say "not ok $count - '$s'";
        }
        $count++;
    }

This is a list of both "good" and "bad" XML, and a small test script that runs these tests by calling XML.parse($string). By convention the rule that matches what the grammar should match is named TOP.

(As you can see from test 1 we don't require a single root tag, but it would be trivial to add this restriction).

Developing the grammar

The essence of XML is surely the nesting of tags, so we'll focus on the second test first. Place this at the top of the test script:

    grammar XML {
        token TOP   { ^ <tag> $ }
        token tag   {
            '<' (\w+) '>'
            '</' $0   '>'
        }
    };

Now run the script:

    $ ./perl6
    not ok 1 - 'abc'
    ok 2 - '<a></a>'
    not ok 3 - '..<ab>foo</ab>dd'
    not ok 4 - '<a><b>c</b></a>'
    not ok 5 - '<a href="foo"><b>c</b></a>'
    not ok 6 - '<a empty="" ><b>c</b></a>'
    not ok 7 - '<a><b>c</b><c></c></a>'
    ok 8 - '<'
    ok 9 - '<a>b</b>'
    ok 10 - '<a>b</a'
    ok 11 - '<a>b</a href="">'
    not ok 12 - '<a/>'
    not ok 13 - '<a />'

So this simple rule parses one pair of start tag and end tag, and correctly rejects all four examples of invalid XML.

The first test should be easy to pass as well, so let's try this:

   grammar XML {
       token TOP   { ^ <xml> $ };
       token xml   { <text> | <tag> };
       token text  { <-[<>&]>*  };
       token tag   {
           '<' (\w+) '>'
           '</' $0   '>'
       }
   };

(Remember, <-[...]> is a negated character class.)

And run it:

    $ ./perl6
    ok 1 - 'abc'
    not ok 2 - '<a></a>'
    (rest unchanged)

Why in the seven hells did the second test stop working? The answer is that Rakudo doesn't do longest token matching yet (update 2013-01: it does now), but matches sequentially. <text> matches the empty string (and thus always), so <text> | <tag> never even tries to match <tag>. Reversing the order of the two alternatives would help.

But we don't just want to match either plain text or a tag anyway, but random combinations of both of them:

    token xml   { <text> [ <tag> <text> ]*  };

([...] are non-capturing groups, like (?: ... ) is in Perl 5).

And lo and behold, the first two tests both pass.

The third test, ..<ab>foo</ab>dd, has text between opening and closing tag, so we have to allow that next. But not only text is allowed between tags, but arbitrary XML, so let's just call <xml> there:

    token tag   {
        '<' (\w+) '>'
        <xml>
        '</' $0   '>'
    }

    ok 1 - 'abc'
    ok 2 - '<a></a>'
    ok 3 - '..<ab>foo</ab>dd'
    ok 4 - '<a><b>c</b></a>'
    not ok 5 - '<a href="foo"><b>c</b></a>'
    (rest unchanged)

We can now focus on attributes (the href="foo" stuff):

    token tag   {
        '<' (\w+) <attribute>* '>'
        <xml>
        '</' $0   '>'
    }
    token attribute {
        \w+ '="' <-["<>]>* \"
    }

But this doesn't make any new tests pass. The reason is the blank between the tag name and the attribute. Instead of adding \s+ or \s* in many places we'll switch from token to rule, which implies the :sigspace modifier:

    rule tag   {
        '<'(\w+) <attribute>* '>'
        <xml>
        '</'$0'>'
    }
    token attribute {
        \w+ '="' <-["<>]>* \"
    }

Now all tests pass, except the last two:

    ok 1 - 'abc'
    ok 2 - '<a></a>'
    ok 3 - '..<ab>foo</ab>dd'
    ok 4 - '<a><b>c</b></a>'
    ok 5 - '<a href="foo"><b>c</b></a>'
    ok 6 - '<a empty="" ><b>c</b></a>'
    ok 7 - '<a><b>c</b><c></c></a>'
    ok 8 - '<'
    ok 9 - '<a>b</b>'
    ok 10 - '<a>b</a'
    ok 11 - '<a>b</a href="">'
    not ok 12 - '<a/>'
    not ok 13 - '<a />'

These contain un-nested tags that are closed with a single slash /. No problem to add that to rule tag:

    rule tag   {
        '<'(\w+) <attribute>* [
            | '/>'
            | '>' <xml> '</'$0'>'
        ]
    }

All tests pass, we're happy, our first grammar works well.
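Put together, the steps above give something like the following. This is a sketch assembled from the snippets in this lesson and it assumes a reasonably current Rakudo; the two lists of strings and the loops around XML.parse are only a stand-in for the test script, not the original one.

```raku
use v6;

grammar XML {
    token TOP   { ^ <xml> $ }
    token xml   { <text> [ <tag> <text> ]* }
    token text  { <-[<>&]>* }
    rule tag    {
        '<'(\w+) <attribute>* [
            | '/>'
            | '>' <xml> '</'$0'>'
        ]
    }
    token attribute {
        \w+ '="' <-["<>]>* \"
    }
}

# stand-in test data: a few of the strings from the test list
my @good = 'abc', '<a></a>', '<a href="foo"><b>c</b></a>', '<a />';
my @bad  = '<', '<a>b</b>', '<a>b</a';

# "so" turns the Match object (or the failed parse) into a plain Bool
for @good -> $xml { say "good: '$xml' parses: ", so XML.parse($xml) }
for @bad  -> $xml { say "bad:  '$xml' parses: ", so XML.parse($xml) }
```

The good strings should report True and the bad ones False, mirroring the ok lines in the test output.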

More hacking

Playing with grammars is much more fun than reading about playing, so here's what you could implement:

  • plain text can contain entities like &amp;
  • I don't know if XML tag names are allowed to begin with a number, but the current grammar allows that. You might look it up in the XML specification, and adapt the grammar if needed.
  • plain text can contain <![CDATA[ ... ]]> blocks, in which xml-like tags are ignored and < and the like don't need to be escaped
  • Real XML allows a preamble like <?xml version="0.9" encoding="utf-8"?> and requires one root tag which contains the rest (You'd have to change some of the existing test cases)
  • You could try to implement a pretty-printer for XML by recursively walking through the match object $/. (This is non-trivial; you might have to work around a few Rakudo bugs, and maybe also introduce some new captures).

(Please don't post solutions to this as comments in this blog; let others have the same fun as you had ;-).

Have fun hacking.


It's powerful and fun


Regexes are specified in great detail in S05:

More working examples for grammars can be found at (check file lib/JSON/Tiny/


Sun, 30 Nov 2008

Regexes strike back



"Perl 5 to 6" Lesson 19 - Regexes strike back


    # normal matching:
    if 'abc' ~~ m/../ {
        say $/;                 # ab
    }

    # match with implicit :sigspace modifier
    if 'ab cd ef'  ~~ ms/ (..) ** 2 / {
        say $0[1];              # cd
    }

    # substitute with the :samespace modifier
    my $x = "abc     defg";
    $x ~~ ss/c d/x y/;
    say $x;                     # abx     yefg


Since the basics of regexes are already covered in lesson 07, here are some useful (but not very structured) additional facts about Regexes.


You don't need to write grammars to use regexes; the traditional form m/.../ still works, and has a new brother, the ms/.../ form, which implies the :sigspace modifier. Remember, that means that whitespace in the regex is replaced by a call to the <.ws> rule.

The default for the rule is to match \s+ if it is surrounded by two word characters (i.e. those matching \w), and \s* otherwise.
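Here is a small sketch of that default behaviour, assuming a current Rakudo. It uses the long spelling m:s/.../ of the same modifier.

```raku
use v6;

# the gap sits between two word characters, so whitespace is required
say so 'ab cd'  ~~ m:s/ab cd/;    # True
say so 'abcd'   ~~ m:s/ab cd/;    # False

# next to a non-word character the whitespace is optional
say so 'a+b'    ~~ m:s/a \+ b/;   # True
say so 'a  + b' ~~ m:s/a \+ b/;   # True
```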

In substitutions the :samespace modifier takes care that whitespaces matched with the ws rule are preserved. Likewise the :samecase modifier, short :ii (since it's a variant of :i) preserves case.

    my $x = 'Abcd';
    $x ~~ s:ii/^../foo/;
    say $x;                     # Foocd
    $x = 'ABC';
    $x ~~ s:ii/^../foo/;
    say $x                      # FOOC

This is very useful if you want to globally rename your module Foo, to Bar, but for example in environment variables it is written as all uppercase. With the :ii modifier the case is automatically preserved.

It copies case information character by character. But there's also a more intelligent version; when combined with the :sigspace (short :s) modifier, it tries to find a pattern in the case information of the source string. Recognized are .lc, .uc, .lc.ucfirst, .uc.lcfirst and .lc.capitalize (Str.capitalize uppercases the first character of each word). If such a pattern is found, it is also applied to the substitution string.

    my $x = 'The Quick Brown Fox';
    $x ~~ s :s :ii /brown.*/perl 6 developer/;
    # $x is now 'The Quick Perl 6 Developer'


Alternations are still formed with the single bar |, but it means something else than in Perl 5. Instead of sequentially matching the alternatives and taking the first match, it now matches all alternatives in parallel, and takes the longest one.

    'aaaa' ~~ m/ a | aaa | aa /;
    say $/                          # aaa

While this might seem like a trivial change, it has far reaching consequences, and is crucial for extensible grammars. Since Perl 6 is parsed using a Perl 6 grammar, it is responsible for the fact that in ++$a the ++ is parsed as a single token, not as two prefix:<+> tokens.

The old, sequential style is still available with ||:

    grammar Math::Expression {
        token value {
            | <number>
            | '('
              [ ')' || { fail("Parenthesis not closed") } ]
        }
    }


The { ... } executes a closure, and calling fail in that closure makes the expression fail. That branch is guaranteed to be executed only if the previous one (here the ')') fails, so it can be used to emit useful error messages while parsing.
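To see the difference between the two kinds of alternation side by side, here is a minimal sketch (assuming a current Rakudo) that runs the same three alternatives once with | and once with ||:

```raku
use v6;

# | tries all alternatives in parallel and keeps the longest match
say ~('aaaa' ~~ m/ a |  aaa |  aa /);   # aaa

# || tries them left to right and keeps the first one that matches
say ~('aaaa' ~~ m/ a || aaa || aa /);   # a
```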

There are other ways to write alternations, for example if you "interpolate" an array, it will match as an alternation of its values:

    $_ = '12 oranges';
    my @fruits = <apple orange banana kiwi>;
    if m:i:s/ (\d+) (@fruits)s? / {
        say "You've got $0 $1s, I've got { $0 + 2 } of them. You lost.";
    }

There is yet another construct that automatically matches the longest alternation: multi regexes. They can be either written as multi token name or with a proto:

    grammar Perl {
        proto token sigil { * }
        token sigil:sym<$> { <sym> }
        token sigil:sym<@> { <sym> }
        token sigil:sym<%> { <sym> }

        token variable { <sigil> <twigil>? <identifier> }
    }

This example shows multiple tokens called sigil, which are parameterized by sym. When the short name, i.e. sigil, is used, all of these tokens are matched in an alternation. You may think that this is a very inconvenient way to write an alternation, but it has a huge advantage over writing '$'|'@'|'%': it is easily extensible:

    grammar AddASigil is Perl {
        token sigil:sym<!> { <sym> }
    }
    # wow, we have a Perl 6 grammar with an additional sigil!

Likewise you can override existing alternatives:

    grammar WeirdSigil is Perl {
        token sigil:sym<$> { '°' }
    }

In this grammar the sigil for scalar variables is °, so whenever the grammar looks for a sigil it searches for a ° instead of a $, but the compiler will still know that it was the regex sigil:sym<$> that matched it.
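The grammars above are fragments, so here is a self-contained sketch of the same idea that you can actually run on a current Rakudo. The grammar names Sigils, MoreSigils and WeirdSigils and the token name are made up for this illustration.

```raku
use v6;

grammar Sigils {
    token TOP  { <sigil> <name> }
    proto token sigil {*}
    token sigil:sym<$> { <sym> }
    token sigil:sym<@> { <sym> }
    token name { \w+ }
}

# adds one more alternative without touching the base grammar
grammar MoreSigils is Sigils {
    token sigil:sym<!> { <sym> }
}

# overrides an existing alternative
grammar WeirdSigils is Sigils {
    token sigil:sym<$> { '°' }
}

say so Sigils.parse('$foo');        # True
say so Sigils.parse('!foo');        # False
say so MoreSigils.parse('!foo');    # True
say so WeirdSigils.parse('°foo');   # True
say so WeirdSigils.parse('$foo');   # False
```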

In the next lesson you'll see the development of a real, working grammar with Rakudo.


Sat, 29 Nov 2008




"Perl 5 to 6" Lesson 18 - Scoping


    for 1 .. 10 -> $a {
        # $a visible here
    }
    # $a not visible here

    while my $b = get_stuff() {
        # $b visible here
    }
    # $b still visible here

    my $c = 5;
    {
        my $c = $c;
        # $c is undef here
    }
    # $c is 5 here

    my $y;
    my $x = $y + 2 while $y = calc();
    # $x still visible


Lexical Scoping

Scoping in Perl 6 is quite similar to that of Perl 5. A Block introduces a new lexical scope. A variable name is searched in the innermost lexical scope first, if it's not found it is then searched for in the next outer scope and so on. Just like in Perl 5 a my variable is a proper lexical variable, and an our declaration introduces a lexical alias for a package variable.

But there are subtle differences: a variable is visible in exactly the rest of the block in which it is declared, so variables declared in block headers (for example in the condition of a while loop) are not limited to the block that follows.

Also Perl 6 only ever looks up unqualified names (variables and subroutines) in lexical scopes.

If you want to limit the scope, you can use formal parameters to the block:

    if calc() -> $result {
        # you can use $result here
    }
    # $result not visible here
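Both rules can be tried out with a few lines. This is only a sketch for a current Rakudo; calc is a made-up helper and the countdown is just there to have a condition that eventually becomes false.

```raku
use v6;

my $n = 3;
while my $left = $n-- {
    say "in loop: $left";
}
# declared in the loop header, so still in scope after the loop
say "after loop: $left";

sub calc() { 42 }             # hypothetical helper

if calc() -> $result {
    say "calc returned $result";
}
# $result is a block parameter; using it here would not compile
```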

Variables are visible immediately after they are declared, not at the end of the statement as in Perl 5.

    my $x = .... ;
            ^^^^^
            $x visible here in Perl 6
            but not in Perl 5

Dynamic scoping

The local adjective is now called temp, and if it's not followed by an initialization the previous value of that variable is used (not undef).
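A short sketch of temp, assuming a current Rakudo; the variable and sub names are invented for the example.

```raku
use v6;

my $depth = 0;

sub current-depth() { $depth }

sub nested() {
    temp $depth;          # no initializer: starts out with the current value
    $depth++;
    return current-depth();
}

say nested();   # 1
say $depth;     # 0, the old value is restored when nested() is left
```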

There's also a new kind of dynamically scoped variable called a hypothetical variable. If the block is left with an exception or a false value, then the previous value of the variable is restored. If not, it is kept:

    use v6;

    my $x = 0;

    sub tryit($success) {
        let $x = 42;
        die "Not like this!" unless $success;
        return True;
    }

    tryit True;
    say $x;             # 42

    $x = 0;
    try tryit False;
    say $x;             # 0

Context variables

Some variables that are global in Perl 5 ($!, $_) are context variables in Perl 6, that is they are passed between dynamic scopes.

This solves an old problem in Perl 5. In Perl 5 a DESTROY sub can be called at a block exit, and accidentally change the value of a global variable, for example one of the error variables:

   # Broken Perl 5 code here:
   sub DESTROY { eval { 1 }; }
   eval {
       my $x = bless {};
       die "Death\n";
   };
   print $@ if $@;         # No output here

In Perl 6 this problem is avoided by not implicitly using global variables.

(In Perl 5.14 there is a workaround that protects $@ from being modified, thus averting the most harm from this particular example.)


If a variable is hidden by another lexical variable of the same name, it can be accessed with the OUTER pseudo package:

    my $x = 3;
    {
        my $x = 10;
        say $x;             # 10
        say $OUTER::x;      # 3
        say OUTER::<$x>     # 3
    }

Likewise a function can access variables from its caller with the CALLER and CONTEXT pseudo packages. The difference is that CALLER only accesses the scope of the immediate caller, CONTEXT works like UNIX environment variables (and should only be used internally by the compiler for handling $_, $! and the like). To access variables from the outer dynamic scope they must be declared with is context.


It is now common knowledge that global variables are really bad, and cause lots of problems. We also have the resources to implement better scoping mechanisms. Therefore global variables are only used for inherently global data (like %*ENV or $*PID).

The block scoping rules have been greatly simplified.

Here's a quote from Perl 5's perlsyn document; we don't want similar things in Perl 6:

 NOTE: The behaviour of a "my" statement modified with a statement
 modifier conditional or loop construct (e.g. "my $x if ...") is
 undefined.  The value of the "my" variable may be "undef", any
 previously assigned value, or possibly anything else.  Don't rely on
 it.  Future versions of perl might do something different from the
 version of perl you try it out on.  Here be dragons.


S04 discusses block scoping:

S02 lists all pseudo packages and explains context scoping:


Fri, 28 Nov 2008




"Perl 5 to 6" Lesson 17 - Unicode




Perl 5's Unicode model suffers from a big weakness: it uses the same type for binary and for text data. For example if your program reads 512 bytes from a network socket, it is certainly a byte string. However when (still in Perl 5) you call uc on that string, it will be treated as text. The recommended way is to decode that string first, but when a subroutine receives a string as an argument, it can never know for sure whether it has been decoded or not, i.e. whether it is to be treated as a blob or as text.

Perl 6 on the other hand offers the type buf, which is just a collection of bytes, and Str, which is a collection of logical characters.

Logical character is still a vague term. To be more precise a Str is an object that can be viewed at different levels: Byte, Codepoint (anything that the Unicode Consortium assigned a number to is a codepoint), Grapheme (things that visually appear as a character) and CharLingua (language defined characters).

For example the string with the hex bytes 61 cc 80 consists of three bytes (obviously), but can also be viewed as consisting of two codepoints with the names LATIN SMALL LETTER A (U+0061) and COMBINING GRAVE ACCENT (U+0300), or as one grapheme that, if neither my blog software nor your browser kill it, looks like this: à.

So you can't simply ask for the length of a string, you have to ask for a specific length:


There's also a method named chars, which returns the length in the current Unicode level (which can be set by a pragma like use bytes, and which defaults to graphemes).
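As a rough illustration of the three levels, here is how the lengths can be queried in a current Rakudo. The method names used here (chars, codes, and encode followed by bytes) are those of today's implementation and not necessarily what the specification said when this lesson was written. The sketch uses an x with a grave accent instead of an a, because as far as I know Rakudo normalizes strings, and a with grave would be folded into a single precomposed codepoint.

```raku
use v6;

# LATIN SMALL LETTER X followed by COMBINING GRAVE ACCENT
my $str = "x\x[300]";

say $str.chars;                    # 1  grapheme
say $str.codes;                    # 2  codepoints
say $str.encode('UTF-8').bytes;    # 3  bytes (78 cc 80)
```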

In Perl 5 you sometimes had the problem of accidentally concatenating byte strings and text strings. If you should ever suffer from that problem in Perl 6, you can easily identify where it happens by overloading the concatenation operator:

  sub GLOBAL::infix:<~> is deep (Str $a, buf $b)|(buf $b, Str $a) {
      die "Can't concatenate text string «"
          ~ $a.encode("UTF-8")
          ~ "» with byte string «$b»\n";
  }

Encoding and Decoding

The specification of the IO system is very basic and does not yet define any encoding and decoding layers, which is why this article has no useful SYNOPSIS section. I'm sure that there will be such a mechanism, and I could imagine it will look something like this:

  my $handle = open($filename, :r, :encoding<UTF-8>);

Regexes and Unicode

Regexes can take modifiers that specify their Unicode level, so m:codes/./ will match exactly one codepoint. In the absence of such modifiers the current Unicode level will be used.

Character classes like \w (match a word character) behave according to the Unicode standard. There are modifiers that ignore case (:i) and accents (:a), and modifiers for the substitution operators that can carry case information to the substitution string (:samecase and :sameaccent, short :ii, :aa).


It is quite hard to correctly process strings with most tools and most programming languages these days. Suppose you have a web application in Perl 5, and you want to break long words automatically so that they don't mess up your layout. When you use naive substr to do that, you might accidentally rip graphemes apart.

Perl 6 will be the first mainstream programming language with built in support for grapheme level string manipulation, which basically removes most Unicode worries, and which (in conjunction with regexes) makes Perl 6 one of the most powerful languages for string processing.

The separate data types for text and byte strings make debugging and introspection quite easy.



Thu, 27 Nov 2008




"Perl 5 to 6" Lesson 16 - Enums


  enum Bool <False True>;
  my $value = $arbitrary_value but True;
  if $value {
      say "Yes, it's true";       # will be printed
  }

  enum Day ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun');
  if custom_get_date().Day == Day::Sat | Day::Sun {
      say "Weekend";
  }


Enums are versatile beasts. They are low-level classes that consist of an enumeration of constants, typically integers or strings (but can be arbitrary).

These constants can act as subtypes, methods or normal values. They can be attached to an object with the but operator, which "mixes" the enum into the value:

  my $x = $today but Day::Tue;

You can also use the type name of the Enum as a function, and supply the value as an argument:

  $x = $today but Day($weekday);

Afterwards that object has a method with the name of the enum type, here Day:

  say $x.Day;             # 1

The value of the first constant is 0, the next 1 and so on, unless you explicitly provide another value with pair notation:

  enum Hackers (:Larry<Perl>, :Guido<Python>, :Paul<Lisp>);
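The numbering is easy to check interactively. The following is a sketch for a current Rakudo; it writes the pairs with => instead of the colon form, and the .value and .key methods are what today's implementation offers for looking inside an enum constant.

```raku
use v6;

enum Day <Mon Tue Wed Thu Fri Sat Sun>;
say Tue.value;          # 1
say Day(5);             # Sat
say Sun.key;            # Sun

enum Hackers (Larry => 'Perl', Guido => 'Python', Paul => 'Lisp');
say Guido.value;        # Python
```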

You can check if a specific value was mixed in by using the versatile smart match operator, or with .does:

  if $today ~~ Day::Fri {
      say "Thank Christ it's Friday"
  }
  if $today.does(Fri) { ... }

Note that you can specify the name of the value only (like Fri) if that's unambiguous; if it's ambiguous you have to provide the full name Day::Fri.


Enums replace both the "magic" that is involved with tainted variables in Perl 5 and the return "0 but True" hack (a special case for which no warning is emitted if used as a number). Plus they give a Bool type.

Enums also provide the power and flexibility of attaching arbitrary meta data for debugging or tracing.



Tue, 21 Oct 2008




"Perl 5 to 6" Lesson 15 - Twigils


  class Foo {
      has $.bar;
      has $!baz;
  }

  my @stuff = sort { $^b[1] <=> $^a[1]}, [1, 2], [0, 3], [4, 8];
  my $block = { say "This is the named 'foo' parameter: $:foo" };

  say "This is file $?FILE on line $?LINE";

  say "A CGI script" if %*ENV<DOCUMENT_ROOT>:exists;


Some variables have a second sigil, called twigil. It basically means that the variable isn't "normal", but differs in some way, for example it could be differently scoped.

You've already seen that public and private object attributes have the . and ! twigil respectively; they are not normal variables, they are tied to self.

The ^ twigil removes a special case from Perl 5. To be able to write

  # beware: perl 5 code
  sort { $a <=> $b } @array

the variables $a and $b are special cased by the strict pragma. In Perl 6, there's a concept named self-declared positional parameter, and these parameters have the ^ twigil. It means that they are positional parameters of the current block, without being listed in a signature. The variables are filled in lexicographic (alphabetic) order:

  my $block = { say "$^c $^a $^b" };
  $block(1, 2, 3);                # 3 1 2

So now you can write

  @list = sort { $^b <=> $^a }, @list;
  # or:
  @list = sort { $^foo <=> $^bar }, @list;

Without any special cases.

And to keep the symmetry between positional and named arguments, the : twigil does the same for named parameters, so these lines are roughly equivalent:

  my $block = { say $:stuff }
  my $sub   = sub (:$stuff) { say $stuff }

Using both self-declared parameters and a signature will result in an error, as you can only have one of the two.
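Here is a small sketch (for a current Rakudo) that calls blocks with both kinds of self-declared parameters; the names $diff, $double and n are arbitrary.

```raku
use v6;

# $^a is filled with the first argument, $^b with the second
my $diff = { $^b - $^a };
say $diff(1, 5);                  # 4

# $:n is filled from the named argument n
my $double = { $:n * 2 };
say $double(n => 21);             # 42
```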

The ? twigil stands for variables and constants that are known at compile time, like $?LINE for the current line number (formerly __LINE__), and $?DATA is the file handle to the DATA section.

Contextual variables can be accessed with the * twigil, so $*IN and $*OUT can be overridden dynamically.
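The same mechanism is available for your own variables. This sketch assumes a current Rakudo, where such a variable is declared with my and the * twigil; $*greeting and the two subs are made up for the example.

```raku
use v6;

my $*greeting = 'hello';

sub greet() { $*greeting }         # looked up through the callers at run time

sub shout() {
    my $*greeting = 'HELLO';       # overrides it for everything called from here
    greet();
}

say greet();    # hello
say shout();    # HELLO
say greet();    # hello
```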

A pseudo twigil is <, which is used in a construct like $<capture>, where it is a shorthand for $/<capture>, which accesses the Match object after a regex match.


When you read Perl 5's perlvar document, you can see that it has far too many variables, most of them global, that affect your program in various ways.

The twigils try to bring some order in these special variables, and on the other hand they remove the need for special cases. In the case of object attributes they shorten self.var to $.var (or @.var or whatever).

So all in all the increased "punctuation noise" actually makes the programs much more consistent and readable.


Mon, 20 Oct 2008

The MAIN sub



"Perl 5 to 6" Lesson 14 - The MAIN sub


  # file

  sub MAIN($path, :$force, :$recursive, :$home = '~/') {
      # do stuff here
  }

  # command line
  $ ./ --force --home=/home/someoneelse file_to_process


Calling subs and running a typical Unix program from the command line is visually very similar: you can have positional, optional and named arguments.

You can benefit from it, because Perl 6 can process the command line for you, and turn it into a sub call. Your script is normally executed (at which time it can munge the command line arguments stored in @*ARGS), and then the sub MAIN is called, if it exists.

If the sub can't be called because the command line arguments don't match the formal parameters of the MAIN sub, an automatically generated usage message is printed.

Command line options map to subroutine arguments like this:

  -name                   :name
  -name=value             :name<value>

  # remember, <...> is like qw(...)
  --hackers=Larry,Damian  :hackers<Larry Damian>  

  --good_language         :good_language
  --good_lang=Perl        :good_lang<Perl>
  --bad_lang PHP          :bad_lang<PHP>

  +stuff                  :!stuff
  +stuff=healthy          :stuff<healthy> but False

The $x = $obj but False means that $x is a copy of $obj, but gives Bool::False in boolean context.
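The but operator can be tried on its own, independent of command line processing. A minimal sketch, assuming a current Rakudo:

```raku
use v6;

my $x = 'healthy' but False;
say $x;          # healthy
say so $x;       # False

my $zero = 0 but True;
say $zero + 1;   # 1
say so $zero;    # True
```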

So for simple (and some not quite simple) cases you don't need an external command line processor, but you can just use sub MAIN for that.
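A complete script along these lines could look as follows. It is only a sketch for a current Rakudo: the describe helper is made up so that the logic can be called without a command line, and $path gets a default here so that the script also runs without arguments.

```raku
use v6;

# hypothetical helper: builds the line that MAIN prints
sub describe($path, :$force, :$home) {
    "path=$path force={ so $force } home=$home";
}

# the signature is all the command line processing there is
sub MAIN($path = '.', Bool :$force = False, :$home = '~/') {
    say describe($path, :$force, :$home);
}
```

Saved under a name of your choosing and called with --force --home=/tmp somefile, this should print path=somefile force=True home=/tmp.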


The motivation behind this should be quite obvious: it makes simple things easier, similar things similar, and in many cases reduces command line processing to a single line of code: the signature of MAIN.

SEE ALSO contains the specification.
