
Sun, 19 Oct 2008

Perl 6 Tidings from September and October 2008

Specification

Perl 6 has got two new operators.

Series, again and again

The series operator ... (yes, literally three dots) lazily constructs infinite series of objects. It takes a list on the left and a closure on the right, and repeatedly calls the closure on the most recent values to compute the next one. It is best introduced by a few examples:

my @even   = 0, 2, 4 ... { $_ + 2 };
my @fib    = 1, 1, 2 ... { $^a + $^b };
my @powers = 1, 2, 4 ... { $_ * 2 };

These examples use a few items on the left for clarity, but it's only necessary to supply as many values as the closure expects.
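To make the mechanics concrete, here is a rough Python analogy using a generator. The function name series is hypothetical and this is not how any Perl 6 implementation does it. The closure's arity is read with inspect.signature, mirroring "as many values as the closure expects", and the generator only ever keeps that many trailing values around:

```python
import inspect
from itertools import islice
from typing import Callable, Iterator, Sequence


def series(initial: Sequence, step: Callable) -> Iterator:
    """Lazily yield the initial values, then keep calling `step`
    on the last `arity` values to produce each next element."""
    arity = len(inspect.signature(step).parameters)
    window = list(initial)
    if len(window) < arity:
        raise ValueError(f"need at least {arity} initial values")
    yield from window
    while True:
        nxt = step(*window[-arity:])
        yield nxt
        window.append(nxt)
        del window[:-arity]  # keep only the values the closure needs


even = series([0, 2, 4], lambda x: x + 2)
fib = series([1, 1, 2], lambda a, b: a + b)
powers = series([1, 2, 4], lambda x: x * 2)
print(list(islice(fib, 10)))  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

Because the series is infinite, the caller decides how much to take (here with islice). Nothing is computed until a value is actually requested.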

also

The second new operator is infix:<also>, which constructs all-junctions like & does, but guarantees left-to-right order of execution.

Its primary purpose is to allow a test to be followed by tests that depend on it and might otherwise throw exceptions, such as a test that inspects $/ after a match:

ok('abc' ~~ m/<identifier>/ also $<identifier>.chars == 3);
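A rough Python analogy of an ordered all-junction follows. The names also, last_match and match_identifier are hypothetical. Conditions are passed as zero-argument callables so that their evaluation order is under our control. Stopping at the first false condition is an assumption of this sketch, chosen so that the follow-up test in the example never runs against a failed match:

```python
import re
from typing import Callable


def also(*conditions: Callable[[], bool]) -> bool:
    """Ordered all-junction: evaluate conditions strictly left to right.

    Stopping at the first false condition is an assumption of this sketch.
    """
    for condition in conditions:
        if not condition():
            return False  # later conditions are never evaluated
    return True


# Stand-in for Perl 6's $/ (the most recent match object).
last_match: dict = {"m": None}


def match_identifier(text: str) -> bool:
    """Rough analogue of text ~~ m/<identifier>/; sets last_match as a side effect."""
    last_match["m"] = re.match(r"[A-Za-z_]\w*", text)
    return last_match["m"] is not None


# The second condition relies on the side effect of the first one.
print(also(lambda: match_identifier("abc"),
           lambda: len(last_match["m"].group(0)) == 3))  # True
print(also(lambda: match_identifier("123"),
           lambda: len(last_match["m"].group(0)) == 3))  # False, no AttributeError
```

With an unordered & the second condition could be evaluated first and would then access a stale or missing match.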

New regex feature for matching delimited constructs

While trying to generate a syntax tree with his Perl 6 parser STD.pm, Larry noticed a construct that was repeated all over the grammar and introduced unnecessary submatches. It looks like this:

rule argument_list {
    '(' <semilist> [ ')' || <panic: "Can't find closing ')'"> ]
}

This can now be replaced with a much shorter construct:

rule postcircumfix:sym<( )> {
    '(' ~ ')' <semilist>
}

The ~ metacharacter is a parser combinator inspired by Haskell's Parsec library. It sets up a parsing goal (in this example the closing paren), matches the subrule that follows it, and then expects the goal. If the goal can't be found, it produces a helpful error message.
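The idea can be sketched with a handful of Python parser combinators. This is only an analogy, and the names literal, pattern, between and ParseError are hypothetical (between borrows its name from the combinator of the same name in Parsec). Each parser takes the text and a position and returns a (value, new position) pair, or None on failure. The important point is that a missing goal raises a descriptive error instead of quietly failing:

```python
import re
from typing import Callable, Optional, Tuple

# A parser takes (text, position) and returns (value, new_position) or None.
Parser = Callable[[str, int], Optional[Tuple[str, int]]]


class ParseError(Exception):
    """Raised when a construct has clearly started but cannot be finished."""


def literal(s: str) -> Parser:
    def parse(text: str, pos: int):
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return parse


def pattern(regex: str) -> Parser:
    compiled = re.compile(regex)

    def parse(text: str, pos: int):
        m = compiled.match(text, pos)
        return (m.group(0), m.end()) if m else None
    return parse


def between(opener: Parser, closer: Parser, inner: Parser, goal: str) -> Parser:
    """Rough analogue of  opener ~ closer inner : match the opener,
    then the inner parser, then insist on the closer (the goal)."""
    def parse(text: str, pos: int):
        start = opener(text, pos)
        if start is None:
            return None  # not this construct; the caller may try alternatives
        body = inner(text, start[1])
        if body is None:
            return None
        value, pos = body
        end = closer(text, pos)
        if end is None:
            raise ParseError(f"can't find closing {goal!r} at position {pos}")
        return value, end[1]
    return parse


args = between(literal("("), literal(")"), pattern(r"[^()]*"), ")")
print(args("(1, 2)", 0))  # ('1, 2', 6)
```

Calling args("(1, 2", 0) raises ParseError with the message "can't find closing ')' at position 5", which is the behaviour that the grammar used to spell out by hand with [ ')' || <panic: ...> ].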

To produce even nicer error messages on parse failures, the new :dba adverb (short for "doing business as") can give rules a human-understandable name (human = not a compiler implementer, but a mere mortal Perl 6 hacker).

If the example rule above failed to match, it would report "Error while parsing postcircumfix:sym<( )>: can't find closing ')'". Adding :dba<argument list> would replace the unfriendly postcircumfix blah with... well, you get the picture.
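Here is a Python analogy of the effect of :dba. The decorator rule, its dba parameter and the toy rule body parse_parens are all hypothetical. The wrapper prefixes any parse error with the rule's human-readable name when one is given, and with the internal rule name otherwise:

```python
from typing import Callable, Optional, Tuple


class ParseError(Exception):
    pass


def parse_parens(text: str, pos: int) -> Optional[Tuple[str, int]]:
    """Toy rule body: '(' ... ')', failing loudly if the ')' is missing."""
    if not text.startswith("(", pos):
        return None
    end = text.find(")", pos + 1)
    if end == -1:
        raise ParseError("can't find closing ')'")
    return text[pos + 1:end], end + 1


def rule(name: str, dba: Optional[str] = None):
    """Hypothetical decorator: report errors under the dba name if given,
    otherwise under the rule's internal name."""
    label = dba or name

    def wrap(body: Callable):
        def wrapped(text: str, pos: int):
            try:
                return body(text, pos)
            except ParseError as err:
                raise ParseError(f"Error while parsing {label}: {err}") from None
        return wrapped
    return wrap


plain = rule("postcircumfix:sym<( )>")(parse_parens)
friendly = rule("postcircumfix:sym<( )>", dba="argument list")(parse_parens)

for parser in (plain, friendly):
    try:
        parser("(1, 2", 0)
    except ParseError as err:
        print(err)
# Error while parsing postcircumfix:sym<( )>: can't find closing ')'
# Error while parsing argument list: can't find closing ')'
```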

Tests part of the language?

There's also (sic) ongoing talk about moving the testing capabilities that now live in Test.pm into the language core. A likely scenario is a :ok adverb on comparison operators that has roughly the same semantics as the ok() sub, but enables better diagnostic messages in case of failure.

# Warning: hypothetical syntax ahead
'abc' ~~ m/a(.)b/        :ok<basic regex match works>;
'abc' ~~ String          :ok<type check>;
2.rand <= 2              :ok<Int.rand stays in limit>;

To achieve this, every comparison operator would need a new multi candidate that accepts the mandatory named argument :ok. That seems like a big change, but most (or even all) of these candidates could be generated automatically, and the candidate lists for the multis could mostly be pre-computed at compile time, so no runtime performance hit is expected.
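The "generated automatically" part can be illustrated in Python. This is only an analogy to the hypothetical :ok candidates. The names COMPARISONS, make_ok_variant and OK are invented for the sketch, and ok is a mandatory keyword-only argument in imitation of the mandatory named argument. Because each generated variant knows the operator and both operands, it can print a diagnostic instead of a bare "not ok":

```python
import operator
import random
from typing import Any, Callable, Dict

COMPARISONS: Dict[str, Callable[[Any, Any], bool]] = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def make_ok_variant(symbol: str, op: Callable[[Any, Any], bool]):
    """Generate a checking version of one comparison operator.

    `ok` is keyword-only and mandatory, mirroring the proposed :ok argument.
    """
    def check(left: Any, right: Any, *, ok: str) -> bool:
        passed = bool(op(left, right))
        print(f"{'ok' if passed else 'not ok'} - {ok}")
        if not passed:
            # The variant knows both operands and the operator, so it can
            # explain the failure instead of just reporting "not ok".
            print(f"#   expected {left!r} {symbol} {right!r} to be true")
        return passed
    return check


# All variants are generated mechanically instead of being written by hand.
OK = {symbol: make_ok_variant(symbol, op) for symbol, op in COMPARISONS.items()}

OK["<="](2 * random.random(), 2, ok="rand stays in limit")
OK["=="](1 + 1, 3, ok="arithmetic works")  # prints "not ok" plus a diagnostic
```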

Enhancing the Match object with a token list

Currently the Match object (the thing that's returned from a regex match) gives you no easy access to the sub-matches in the order in which they occurred in the string. That makes it quite hard to extract information that the author of the original regex didn't think of making available.

In particular I tried to (ab)use the match tree from STD.pm to write a syntax highlighter, and found it to be a rather daunting task. So I suggested adding such sequential information to the Match object. Larry liked the idea, because he knows how much pain the Perl 5 parser causes by forgetting information too quickly (in fact it never builds a full parse tree; it generates the optree on the fly, which is efficient but makes introspection very hard).

However, it's not yet clear whether it will be added, and if so, in what form. The open questions are (a) the performance impact, (b) the access syntax and (c) symmetry.

(b) and (c) need more explanation: the match object already has list semantics for accessing positional captures, so you can't make the sequential chunks available through an array index. The simplest solution is composition, i.e. a method such as $/.splits or $/.tokens that returns such a list.
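Here is a Python sketch of what such a method could return. The method name tokens is taken from the proposal, but the Match class, its fields and the (rule name, text) pairs are hypothetical. Sub-matches are walked in string order, and text that no subrule captured is attributed to the enclosing rule. Joining all chunks therefore reproduces the matched text exactly, which is what a syntax highlighter needs:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Match:
    """Toy match node: rule name, source string, span, and sub-matches."""
    name: str
    orig: str
    start: int
    end: int
    children: List["Match"] = field(default_factory=list)

    def tokens(self) -> List[Tuple[str, str]]:
        """Return (rule name, text) chunks in string order.

        Text not covered by any sub-match is attributed to the enclosing
        rule, so joining all chunks reproduces the matched text.
        """
        chunks: List[Tuple[str, str]] = []
        pos = self.start
        for child in sorted(self.children, key=lambda c: c.start):
            if child.start > pos:
                chunks.append((self.name, self.orig[pos:child.start]))
            chunks.extend(child.tokens())
            pos = child.end
        if pos < self.end:
            chunks.append((self.name, self.orig[pos:self.end]))
        return chunks


src = "say 1 + 2"
tree = Match("statement", src, 0, 9, [
    Match("identifier", src, 0, 3),
    Match("expr", src, 4, 9, [
        Match("number", src, 4, 5),
        Match("infix", src, 6, 7),
        Match("number", src, 8, 9),
    ]),
])
for rule_name, text in tree.tokens():
    print(f"{rule_name:>10}: {text!r}")
```

A highlighter would then only need a mapping from rule names to colours.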

But that breaks a fundamental symmetry that currently exists between match objects and captures (i.e. argument lists). Both can have a scalar component (the object passed to make/the method invocant), list components (positional captures/positional arguments) and hash components (named captures/named arguments). Introducing a way to access the components of a match object in a completely different order breaks that symmetry, and we're not yet sure what that means for the language.
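The symmetry can be modelled in a few lines of Python. This is a deliberately simplified analogy: the classes Capture and MatchResult and their field names are hypothetical. A match has the same three components as an argument list, and integer subscripts already belong to the positional part:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Union


@dataclass
class Capture:
    """Toy argument list with a scalar, a list and a hash component."""
    scalar: Any = None                                   # e.g. the method invocant
    positional: List[Any] = field(default_factory=list)  # positional arguments
    named: Dict[str, Any] = field(default_factory=dict)  # named arguments

    def __getitem__(self, key: Union[int, str]) -> Any:
        # Integer subscripts reach the list part, string subscripts the hash part.
        return self.positional[key] if isinstance(key, int) else self.named[key]


@dataclass
class MatchResult(Capture):
    """Same shape for a match: scalar = object passed to make,
    positional = $0, $1, ..., named = $<name>."""


m = MatchResult(scalar="AST node", positional=["b"], named={"identifier": "abc"})
print(m[0], m["identifier"])  # b abc
```

A list of chunks in string order has no natural slot in this model. It would need a fourth accessor that only matches have, and that is exactly the asymmetry described above.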

Implementations

Rakudo

Jerry "particle" Gay implemented the is export trait on subroutines, taking a large leap towards making modules usable in Rakudo. To facilitate testing of this new feature he also implemented the =:= infix operator, which tests whether two variables are bound to the same container (i.e. are aliases).

Allison Randal merged a branch that reworked Parrot's multi sub and multi method dispatch system. That broke some complex math in Rakudo, leaving us with "only" about 4380 passing spec tests; otherwise we might have hit the 4500 mark by now. Still, it's good work and is expected to solve some fundamental problems in the medium to long term.

Elf

Mitchell Charity worked furiously on the new Lisp backend for elf, bringing it close to (or perhaps all the way to) bootstrapping. That demonstrates the flexibility of elf, and helps get rid of quirks that can creep into a compiler when it has only one backend.

It also opens up an opportunity for hackers who want to help with a Perl 6 compiler by writing code in Perl 6. Or Lisp.

STD.pm and viv

STD.pm is the Perl 6 grammar written in Perl 6. It now parses all Perl 6 code that we know of, so it's time to find out if it actually parses it in a useful way, and to check if it loses information while parsing.

Finding that out is one of the goals of viv (read that as VI → V and think of Roman numerals). The other goal is to replace gimme5, which currently does the ugly, hacky job of translating STD.pm to Perl 5 code.

It's a script that uses reduce actions at the end of each grammar rule to build some kind of parse tree or abstract syntax tree, and it's planned to produce either Perl 6 or Perl 5 output. We'll see what the future (and $larry) brings.
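Here is a toy Python illustration of the general technique of reduce actions. It is not viv's code, and the classes Parser, TreeActions and SourceActions are hypothetical. Each grammar rule calls a method on a separate actions object once it has finished matching, and that object decides what to build. Swapping the actions object changes the output without touching the grammar, which is how a single parser could feed different backends:

```python
import re
from typing import List


class TreeActions:
    """Reduce actions that build a simple syntax tree."""
    def number(self, text: str):
        return ("number", int(text))

    def expr(self, terms: List):
        return terms[0] if len(terms) == 1 else ("add", terms)


class SourceActions:
    """Alternative reduce actions that emit source text instead of a tree."""
    def number(self, text: str) -> str:
        return text

    def expr(self, terms: List[str]) -> str:
        return " + ".join(terms)


class Parser:
    """Toy grammar:  expr := number ('+' number)*"""
    def __init__(self, text: str, actions):
        # This toy tokenizer silently ignores everything except digits and '+'.
        self.tokens = re.findall(r"\d+|\+", text)
        self.pos = 0
        self.actions = actions

    def number(self):
        if self.pos >= len(self.tokens) or not self.tokens[self.pos].isdigit():
            raise SyntaxError(f"expected a number at token {self.pos}")
        text = self.tokens[self.pos]
        self.pos += 1
        return self.actions.number(text)  # reduce action at the end of the rule

    def expr(self):
        terms = [self.number()]
        while self.pos < len(self.tokens) and self.tokens[self.pos] == "+":
            self.pos += 1
            terms.append(self.number())
        return self.actions.expr(terms)  # reduce action at the end of the rule


print(Parser("1 + 2 + 3", TreeActions()).expr())
# ('add', [('number', 1), ('number', 2), ('number', 3)])
print(Parser("1+2+3", SourceActions()).expr())
# 1 + 2 + 3
```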

Pugs and the test suite

Pugs is still hibernating, and waiting for the release of GHC 6.10.1.

If Pugs is hibernating, the test suite is in a light slumber, disturbed now and then by a few new tests.

SMOP

Daniel Ruoso and Paweł Murias are both hacking actively on SMOP. Currently on the agenda is multi dispatch, which is more complicated than it sounds at first: slurpy argument lists are lazy, so a dispatcher can't simply count all the arguments before choosing a candidate.

For me it's rather hard to judge how much progress they are making, or how close they are to running basic, real-world code.
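Here is a small Python sketch of why laziness gets in the way. It is not SMOP's algorithm, it dispatches on arity alone (real Perl 6 dispatch also considers types), and the names dispatch and SLURPY are hypothetical. To decide between fixed-arity candidates, the dispatcher forces only a bounded prefix of the argument list: one more value than the largest fixed arity. A slurpy candidate then receives the rest of the list still lazily:

```python
from itertools import chain, count, islice
from typing import Callable, Iterable, List, Optional, Tuple

SLURPY = None  # marker for a candidate that accepts any number of arguments


def dispatch(candidates: List[Tuple[Optional[int], Callable]], args: Iterable):
    """Choose a candidate by arity without forcing a lazy argument list."""
    it = iter(args)
    fixed = [arity for arity, _ in candidates if arity is not SLURPY]
    limit = max(fixed, default=0) + 1
    head = list(islice(it, limit))  # evaluate only a bounded prefix
    # If we got `limit` values, no fixed-arity candidate can match.
    for arity, func in candidates:
        if arity is not SLURPY and len(head) == arity:
            return func(*head)
    for arity, func in candidates:
        if arity is SLURPY:
            return func(chain(head, it))  # pass the rest on, still lazy
    raise TypeError("no matching candidate")


def one(a): return ("one", a)
def two(a, b): return ("two", a, b)
def many(rest): return ("many", list(islice(rest, 4)))


candidates = [(1, one), (2, two), (SLURPY, many)]
print(dispatch(candidates, [7]))      # ('one', 7)
print(dispatch(candidates, count()))  # ('many', [0, 1, 2, 3]), infinite input
```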

Update: Paweł contributed another small explanation:

SMOP now has a new compiler named mildew, which uses viv/STD as its parser. Right now it supports only a handful of operations, the most advanced of which is creating an object with a simplified metamodel.

It lives in the Pugs repository under v6/mildew/. Anyone who wants to hack on it (in Perl 5) is welcome; instructions can be found on #perl6.
