Fri, 02 Jan 2009

How to get a parse tree for a Perl 6 Program


Only Perl 6 can parse Perl 6, or so people say. This is mostly true, since the only parser that can handle all known Perl 6 code is the module STD.pm, which is written in Perl 6.

To get access to that parser, you need three things: a checkout of one directory of the pugs repository, a Linux system with perl 5.10.0 or newer installed at /usr/local/bin/perl, and some Perl 5 modules, for example Moose and YAML::Syck. When you have that, type the following commands:

$ svn co http://svn.pugscode.org/pugs/src/perl6
$ cd perl6
$ make

Now the simplest way to check the syntax of a Perl 6 file is to use the program tryfile:

$ echo 'my ($x, $y) = 1..2;' > test.t
$ ./tryfile test.t
00:02 85m

The output might be a bit confusing at first. Since no error message was printed, no syntax error was found. The 00:02 is the parsing time, and 85m is the memory usage.

Suppose we introduce a syntax error in our test file:

$ echo 'my ($x, $y) = 1..; 2' > wrong.pl
$ ./tryfile wrong.pl
############# PARSE FAILED #############
Can't understand next input--giving up at wrong.pl line 1:
------> my ($x, $y) = 1..; 2
    expecting any of:
        prefix or noun
        whitespace
00:02 85m

The green part of the output marks the input that is still syntactically correct, and the red part marks where the error begins. (The bold font weight isn't part of the output, though.) At this point either a prefix or a noun would have been expected, as in 1..+2 or 1..2 respectively, or whitespace, probably followed by another term.
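
If you want to run this check from a script rather than read the output yourself, one option is to look for the PARSE FAILED banner. The following is only a minimal sketch and not part of the pugs repository: the helper name parse_ok is made up for this example, and it relies solely on the banner text shown above, because the exit status of tryfile isn't described here.

```shell
# parse_ok: reads tryfile output on stdin and succeeds only if the
# failure banner is absent. The function name is hypothetical; the
# banner text is copied from the sample output above.
parse_ok() {
    ! grep -q 'PARSE FAILED'
}

# Hypothetical usage, run from the perl6 directory. Both output streams
# are merged because it is not known which one carries the banner:
#   ./tryfile test.t 2>&1 | parse_ok && echo "syntax OK"
```

Checking the text instead of the exit code keeps the sketch independent of details that may differ between versions of tryfile.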

But if you want to know not only whether an expression was parsed, but also how, there's another way: a syntax highlighter based on STD.pm generates HTML output that also encodes the parse tree:

$ ./STD_syntax_highlight --full-html=test.html test.t 

Open the generated file, test.html, in your favourite browser, enable JavaScript, and click on the button labeled Show Syntax Tree. Then, when you hover with the mouse over a piece of the code, a list on the right of the browser window shows the calling hierarchy of rules in STD.pm.

If you move the mouse to the text my, you'll see that the topmost rule was statementlist, which then called statement, which in turn called statement_modexpr and so on.

You might also notice that scope_declarator calls scope_declarator__S_905my, which is slightly misleading. In fact token scope_declarator is a so-called proto rule, which means that multiple rules have the same name, and if you call that rule, it tries to match all of them in parallel and picks the longest match. From these proto rules the compiler automatically constructs subrules whose names all contain the double underscore __.

So you should actually read the two rules scope_declarator and scope_declarator__S_905my as one, where the alternative that contains the my matched.
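
To make the "longest match wins" idea of proto rules more concrete, here is a toy model in shell. It is a sketch under strong assumptions and not how STD.pm works internally: the alternatives are plain literal strings instead of rules, they are tried one after another rather than in parallel, and the function name longest_match is invented for this example.

```shell
# longest_match INPUT ALT...
# Prints the longest ALT that is a literal prefix of INPUT and fails if
# none of them matches. A toy model of longest-token matching only.
longest_match() {
    input=$1
    shift
    best=
    for alt in "$@"; do
        case $input in
            "$alt"*)
                # This alternative matches at the start of the input;
                # keep it only if it is longer than the best one so far.
                if [ "${#alt}" -gt "${#best}" ]; then
                    best=$alt
                fi
                ;;
        esac
    done
    [ -n "$best" ] && printf '%s\n' "$best"
}

# Both '=' and '==' match at the start of the input; the longer one wins.
longest_match '== 3' '=' '==' '=>'    # prints ==
```

The order of the alternatives doesn't matter here, which is the point of the exercise: the choice depends on the length of the match, not on which alternative was listed first.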

There are other possible ways to get a parse tree (for example modifying tryfile to emit a YAML parse tree, or using Rakudo's --target=parse option), but most of them produce somewhat verbose output that's hard to use.
