Categories
Posts in this category
- Introduction
- Strings, Arrays, Hashes
- Types
- Basic Control Structures
- Subroutines and Signatures
- Objects and Classes
- Contexts
- Regexes (also called "rules")
- Junctions
- Comparing and Matching
- Containers and Values
- Where we are now - an update
- Changes to Perl 5 Operators
- Laziness
- Custom Operators
- The MAIN sub
- Twigils
- Enums
- Unicode
- Scoping
- Regexes strike back
- A grammar for (pseudo) XML
- Subset Types
- The State of the implementations
- Quoting and Parsing
- The Reduction Meta Operator
- The Cross Meta Operator
- Exceptions and control exceptions
- Common Perl 6 data processing idioms
- Currying
Sun, 25 Jul 2010
Currying
NAME
"Perl 5 to 6" Lesson 28 - Currying
SYNOPSIS
    use v6;

    my &f := &substr.assuming('Hello, World');
    say f(0, 2);                # He
    say f(3, 2);                # lo
    say f(7);                   # World

    say <a b c>.map: * x 2;     # aabbcc
    say <a b c>.map: *.uc;      # ABC
    for ^10 {
        print <R G B>.[$_ % *]; # RGBRGBRGBR
    }
DESCRIPTION
Currying or partial application is the process of generating a function from another function or method by providing only some of the arguments. This is useful for saving typing, and when you want to pass a callback to another function.
Suppose you want a function that lets you extract substrings from "Hello, World" easily. The classical way of doing that is writing your own function:
    sub f(*@a) { substr('Hello, World', |@a) }
Currying with assuming
Perl 6 provides a method assuming on code objects, which applies the arguments passed to it to the invocant, and returns the partially applied function.
    my &f := &substr.assuming('Hello, World');
Now f(1, 2) is the same as substr('Hello, World', 1, 2).
assuming also works on operators, because operators are just subroutines with weird names. To get a subroutine that adds 2 to whatever number gets passed to it, you could write
    my &add_two := &infix:<+>.assuming(2);
But that's tedious to write, so there's another option.
Currying with the Whatever-Star
    my &add_two := * + 2;
    say add_two(4);     # 6
The asterisk, called Whatever, is a placeholder for an argument, so the whole expression returns a closure. Multiple Whatevers are allowed in a single expression, and create a closure that expects more arguments, by replacing each term * by a formal parameter. So * * 5 + * is equivalent to -> $a, $b { $a * 5 + $b }.
    my $c = * * 5 + *;
    say $c(10, 2);      # 52
Note that the second * is an infix operator, not a term, so it is not subject to Whatever-currying.
The process of lifting an expression with Whatever stars into a closure is driven by syntax, and done at compile time. This means that
    my $star = *;
    my $code = $star + 2
does not construct a closure, but instead dies with a message like
Can't take numeric value for object of type Whatever
Whatever currying is more versatile than .assuming, because it makes it easy to curry an argument other than the first:
    say ~(1, 3).map: 'hi' x *   # hi hihihi
This curries the second argument of the string repetition operator infix x, so it returns a closure that, when called with a numeric argument, repeats the string hi as often as that argument specifies.
The invocant of a method call can also be a Whatever star, so
    say <a b c>.map: *.uc;      # ABC
involves a closure that calls the uc method on its argument.
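The two currying styles described above can be put side by side in a short sketch. The sub scale and the names double-it, triple and subtract are made up for illustration and do not come from the lesson; the sketch assumes a current Rakudo.

```raku
use v6;

# Hypothetical helper, not from the lesson: multiply a value by a factor.
sub scale($factor, $value) { $value * $factor }

# .assuming fixes the leading positional argument.
my &double-it = &scale.assuming(2);

# Whatever-currying builds closures directly from expressions.
my &triple   = * * 3;
my &subtract = * - *;

say double-it(21);                  # 42
say triple(14);                     # 42
say subtract(10, 4);                # 6
say <a b c>.map(*.uc).join(',');    # A,B,C
```

The .assuming form is handy when a named routine already exists; the Whatever form is usually shorter for one-off operator expressions.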
MOTIVATION
Perl 5 could be used for functional programming, which has been demonstrated in Mark Jason Dominus' book Higher Order Perl.
Perl 6 strives to make it even easier, and thus provides tools that make typical functional programming constructs easily available. Currying and easy construction of closures are key to functional programming, and make it very easy to write transformations for your data, for example together with map or grep.
Thu, 22 Jul 2010
Common Perl 6 data processing idioms
NAME
"Perl 5 to 6" Lesson 27 - Common Perl 6 data processing idioms
SYNOPSIS
    # create a hash from a list of keys and values:
    # solution 1: slices
    my %hash; %hash{@keys} = @values;
    # solution 2: meta operators
    my %hash = @keys Z=> @values;

    # create a hash from an array, with
    # true value for each array item:
    my %exists = @keys X=> True;

    # limit a value to a given range, here 0..10.
    my $x = -2;
    say 0 max $x min 10;

    # for debugging: dump the contents of a variable,
    # including its name, to STDERR
    note :$x.perl;

    # sort case-insensitively
    say @list.sort: *.lc;

    # mandatory attributes
    class Something {
        has $.required = die "Attribute 'required' is mandatory";
    }
    Something.new(required => 2);   # no error
    Something.new()                 # BOOM
DESCRIPTION
Learning the specification of a language is not enough to be productive with it. Rather you need to know how to solve specific problems. Common usage patterns, called idioms, save you from re-inventing the wheel every time you're faced with a problem.
So here are some common Perl 6 idioms, dealing with data structures.
Hashes
    # create a hash from a list of keys and values:
    # solution 1: slices
    my %hash; %hash{@keys} = @values;
    # solution 2: meta operators
    my %hash = @keys Z=> @values;
The first solution is the same you'd use in Perl 5: assignment to a slice. The second solution uses the zip operator Z, which joins two lists like a zip fastener: 1, 2, 3 Z 10, 20, 30 is 1, 10, 2, 20, 3, 30. The Z=> is a meta operator, which combines zip with => (the Pair construction operator). So 1, 2, 3 Z=> 10, 20, 30 evaluates to 1 => 10, 2 => 20, 3 => 30. Assignment to a hash variable turns that into a Hash.
For existence checks, the values in a hash often don't matter, as long as they all evaluate to True in boolean context. In that case, a nice way to initialize the hash from a given array or list of keys is
    my %exists = @keys X=> True;
which uses the cross meta operator to use the single value True for every item in @keys.
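As a small sketch of both hash idioms together, assuming a current Rakudo; the fruit names and counts are invented sample data:

```raku
use v6;

my @keys   = <apple banana cherry>;
my @values = 3, 5, 7;

# Zip keys with values into pairs, then assign to a hash.
my %count = @keys Z=> @values;

# Cross every key with the single value True to get a lookup table.
my %exists = @keys X=> True;

say %count<banana>;                     # 5
say %exists<cherry> ?? 'yes' !! 'no';   # yes
say %exists<durian> ?? 'yes' !! 'no';   # no
```

A missing key yields an undefined value, which is false in boolean context, so the lookup table works without an explicit existence test.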
Numbers
Sometimes you want to get a number from somewhere, but clip it into a predefined range (for example so that it can act as an array index).
In Perl 5 you often end up with things like $a = $b > $upper ? $upper : $b, and another conditional for the lower limit. With the max and min infix operators, that simplifies considerably to
    my $in-range = $lower max $x min $upper;
because $lower max $x returns the larger of the two numbers, thus clipping to the lower end of the range, and min $upper then clips to the upper end.
Since min and max are infix operators, they also have assignment forms, so you can clip a variable in place:
    $x max= 0;
    $x min= 10;
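The clipping idiom can be wrapped in a small routine. The name clip is hypothetical, and the sketch assumes that $lower is not greater than $upper:

```raku
use v6;

# clip() is a made-up helper name; it assumes $lower <= $upper.
sub clip($x, $lower, $upper) {
    $lower max $x min $upper
}

say clip(-2, 0, 10);    # 0
say clip( 5, 0, 10);    # 5
say clip(15, 0, 10);    # 10

# The assignment forms modify the variable in place.
my $index = 42;
$index max= 0;
$index min= 10;
say $index;             # 10
```

This works because max and min share one precedence level and associate to the left, so the expression reads as ($lower max $x) min $upper.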
Debugging
Perl 5 has Data::Dumper, Perl 6 objects have the .perl method. Both generate code that reproduces the original data structure as faithfully as possible.
:$var generates a Pair ("colonpair"), using the variable name as key (but with the sigil stripped). So it's the same as var => $var. note() writes to the standard error stream, appending a newline. So note :$var.perl is a quick way of obtaining the value of a variable for debugging purposes, along with its name.
Sorting
Like in Perl 5, the sort built-in can take a function that compares two values, and then sorts according to that comparison. Unlike Perl 5, it's a bit smarter, and automatically does a transformation for you if the function takes only one argument.
In general, if you want to compare by a transformed value, in Perl 5 you can do:
    # WARNING: Perl 5 code ahead
    my @sorted = sort { transform($a) cmp transform($b) } @values;

    # or the so-called Schwartzian Transform:
    my @sorted = map  { $_->[1] }
                 sort { $a->[0] cmp $b->[0] }
                 map  { [transform($_), $_] }
                 @values;
The former solution requires repetitive typing of the transformation, and executes it for each comparison. The second solution avoids that by storing the transformed value along with the original value, but it's quite a bit of code to write.
Perl 6 automates the second solution (and is a bit more efficient than the naive Schwartzian transform, because it avoids creating an array for each value) when the transformation function has arity one, i.e. accepts only one argument:
    my @sorted = sort &transform, @values;
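To see the difference between a one-argument and a two-argument sort block, here is a minimal sketch with invented sample words, assuming a current Rakudo:

```raku
use v6;

my @words = <banana apple Cherry>;

# Plain sort compares the strings as they are; uppercase sorts first.
say @words.sort.join(' ');                          # Cherry apple banana

# A one-argument block is used as a key extractor.
say @words.sort(*.lc).join(' ');                    # apple banana Cherry

# A two-argument block is used as a comparator.
say @words.sort({ $^a.lc cmp $^b.lc }).join(' ');   # apple banana Cherry
```

The last two calls give the same order; the one-argument form simply spares you from writing the transformation twice.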
Mandatory Attributes
The typical way to enforce the presence of an attribute is to check its presence in the constructor - or in all constructors, if there are many.
That works in Perl 6 too, but it's easier and safer to require the presence at the level of each attribute:
    has $.attr = die "'attr' is mandatory";
This exploits the default value mechanism. When a value is supplied, the code for generating the default value is never executed, and the die never triggers. If any constructor fails to set it, an exception is thrown.
MOTIVATION
N/A
Thu, 09 Jul 2009
Exceptions and control exceptions
NAME
"Perl 5 to 6" Lesson 26 - Exceptions and control exceptions
SYNOPSIS
    try {
        die "OH NOEZ";
        CATCH {
            say "there was an error: $!";
        }
    }
DESCRIPTION
Exceptions are, contrary to their name, nothing exceptional. In fact they are part of the normal control flow of programs in Perl 6.
Exceptions are generated either by implicit errors (for example calling a non-existing method, or type check failures) or by explicitly calling die or other functions.
When an exception is thrown, the program searches for CATCH statements or try blocks in the caller frames, unwinding the stack all the way (that means it forcibly returns from all routines called so far). If no CATCH or try is found, the program terminates, and prints out a hopefully helpful error message. If one was found, the error message is stored in the special variable $!, and the CATCH block is executed (or in the case of a try without a CATCH block the try block returns Any).
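Here is a minimal sketch of both forms, try with a CATCH block and try on its own. It is written against current Rakudo, where (as far as I know) the exception object is the topic inside CATCH and a default or when clause is needed to mark it as handled; risky is a made-up routine name.

```raku
use v6;

# risky() is a hypothetical stand-in for any code that may die.
sub risky() { die "OH NOEZ" }

my $seen = '';
try {
    risky();
    CATCH {
        # 'default' marks the exception as handled; inside CATCH the
        # exception object is the topic, so .message is its text.
        default { $seen = .message }
    }
}
say "there was an error: $seen";    # there was an error: OH NOEZ

# try without a CATCH block swallows the error and sets $!.
my $result = try risky();
say $result.defined;                # False
say $!.message;                     # OH NOEZ
```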
So far exceptions might still sound exceptional, but error handling is an integral part of each non-trivial application. But even more, normal return statements also throw exceptions!
They are called control exceptions, and can be caught with CONTROL blocks, or are implicitly caught at each routine declaration.
Consider this example:
    use v6;

    sub s {
        my $block = -> { return "block"; say "still here" };
        $block();
        return "sub";
    }
    say s();    # block
Here the return "block" throws a control exception, causing it to not only exit the current block (and thus not printing still here on the screen), but also exiting the subroutine, where it is caught by the sub s... declaration. The payload, here a string, is handed back as the return value, and the say in the last line prints it to the screen.
Adding a CONTROL { ... } block to the scope in which $block is called causes it to catch the control exception.
Contrary to what other programming languages do, the CATCH/CONTROL blocks are within the scope in which the error is caught (not on the outside), giving them full access to the lexical variables, which makes it easier to generate useful error messages, and also prevents DESTROY blocks from being run before the error is handled.
Unthrown exceptions
Perl 6 embraces the idea of multi threading, and in particular automated parallelization. To make sure that not all threads suffer from the termination of a single thread, a kind of "soft" exception was invented.
When a function calls fail($obj), it returns a special value of undef, which contains the payload $obj (usually an error message) and the back trace (file name and line number). Processing that special undefined value without checking whether it's undefined causes a normal exception to be thrown.
    my @files = </etc/passwd /etc/shadow nonexisting>;
    my @handles = hyper map { open($_) }, @files;   # hyper not yet implemented
In this example the hyper operator tells map to parallelize its actions as far as possible. When the opening of the nonexisting file fails, an ordinary die "No such file or directory" would also abort the execution of all other open operations. But since a failed open calls fail("No such file or directory") instead, it gives the caller the possibility to check the contents of @handles, and it still has access to the full error message.
If you don't like soft exceptions, you can say use fatal; at the start of the program and cause all exceptions from fail() to be thrown immediately.
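A soft exception can be sketched without any parallelism. The routine safe-div below is hypothetical, and the sketch assumes a current Rakudo, where the value returned by fail is a Failure object that carries the exception:

```raku
use v6;

# safe-div() is a made-up example routine.
sub safe-div($a, $b) {
    fail "division by zero" if $b == 0;
    $a / $b;
}

my $good = safe-div(6, 3);
my $bad  = safe-div(1, 0);      # no exception yet, just an undefined value

say $good;                      # 2
say $bad.defined;               # False
say $bad.exception.message;     # division by zero
```

Checking definedness counts as handling the failure; using $bad as a number without such a check would throw the stored exception instead.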
MOTIVATION
A good programming language needs exceptions to handle error conditions. Always checking return values for success is a plague and easily forgotten.
Since traditional exceptions can be poisonous for implicit parallelism, we needed a solution that combined the best of both worlds: not killing everything at once, and still not losing any information.
Wed, 27 May 2009
The Cross Meta Operator
NAME
"Perl 5 to 6" Lesson 25 - The Cross Meta Operator
SYNOPSIS
    for <a b> X 1..3 -> $a, $b {
        print "$a: $b ";
    }
    # output: a: 1 a: 2 a: 3 b: 1 b: 2 b: 3

    .say for <a b c> X 1, 2;
    # output: a 1 a 2 b 1 b 2 c 1 c 2
    # (with newlines instead of spaces)
DESCRIPTION
The cross operator X returns the Cartesian product of two or more lists, which means that it returns all possible tuples where the first item is an item of the first list, the second item is an item of the second list etc.
If an operator follows the X, then this operator is applied to all tuple items, and the result is returned instead. So 1, 2 X+ 3, 6 will return the values 1+3, 1+6, 2+3, 2+6 (evaluated as 4, 7, 5, 8 of course).
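A short sketch of the cross operator with and without an attached operator, assuming a current Rakudo. There, as far as I know, plain X yields each tuple as a small list, so the loop destructures it with a sub-signature instead of taking two separate parameters:

```raku
use v6;

# Every combination of one item from each list, joined by an operator.
say (1, 2 X+ 3, 6).join(', ');      # 4, 7, 5, 8
say (<a b> X~ 1..3).join(' ');      # a1 a2 a3 b1 b2 b3

# Plain X: one tuple per combination, unpacked by the sub-signature.
my @seen;
for <a b> X 1..2 -> ($letter, $number) {
    @seen.push: "$letter: $number";
}
say @seen.join(', ');               # a: 1, a: 2, b: 1, b: 2
```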
MOTIVATION
It's quite common that one has to iterate over all possible combinations of two or more lists, and the cross operator can condense that into a single iteration, thus simplifying programs and using up one less indentation level.
The usage as a meta operator can sometimes eliminate the loops altogether.
Wed, 10 Dec 2008
The Reduction Meta Operator
NAME
"Perl 5 to 6" Lesson 24 - The Reduction Meta Operator
SYNOPSIS
    say [+] 1, 2, 3;    # 6
    say [+] ();         # 0
    say [~] <a b>;      # ab
    say [**] 2, 3, 4;   # 2417851639229258349412352

    [\+] 1, 2, 3, 4     # 1, 3, 6, 10
    [\**] 2, 3, 4       # 4, 81, 2417851639229258349412352

    if [<=] @list {
        say "ascending order";
    }
DESCRIPTION
The reduction meta operator [...] can enclose any associative infix operator, and turn it into a list operator. This happens as if the operator was just put between the items of the list, so [op] $i1, $i2, @rest returns the same result as if it was written as $i1 op $i2 op @rest[0] op @rest[1] ....
This is a very powerful construct that promotes the plus + operator into a sum function, ~ into a join (with empty separator) and so on. It is somewhat similar to the List.reduce function, and if you had some exposure to functional programming, you'll probably know about foldl and foldr (in Lisp or Haskell). Unlike those, [...] respects the associativity of the enclosed operator, so [/] 1, 2, 3 is interpreted as (1 / 2) / 3 (left associative), while [**] 1, 2, 3 is handled correctly as 1 ** (2**3) (right associative).
As with all other operators, whitespace inside the operator is forbidden, so while you can write [+], you can't write [ + ]. (This also helps to disambiguate it from array literals).
Since comparison operators can be chained, you can also write things like
    if    [==] @nums { say "all nums in @nums are the same" }
    elsif [<]  @nums { say "@nums is in strict ascending order" }
    elsif [<=] @nums { say "@nums is in ascending order" }
However you cannot reduce the assignment operator:
    my @a = 1..3;
    [=] @a, 4;      # Cannot reduce with = because list assignment operators are too fiddly
Getting partial results
There's a special form of this operator that uses a backslash like this: [\+]. It returns a list of the partial evaluation results. So [\+] 1..3 returns the list 1, 1+2, 1+2+3, which is of course 1, 3, 6.
    [\~] 'a' .. 'd'     # <a ab abc abcd>
Since right-associative operators evaluate from right to left, you also get the partial results that way:
    [\**] 1..3;         # 3, 2**3, 1**(2**3), which is 3, 8, 1
Multiple reduction operators can be combined:
    [~] [\**] 1..3;     # "381"
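The following sketch collects a few reductions in one runnable snippet, assuming a current Rakudo. The helper name is-ascending is made up for illustration:

```raku
use v6;

# is-ascending() is a hypothetical helper built on a chained comparison.
sub is-ascending(@list) { [<=] @list }

say is-ascending([1, 2, 2, 3]);     # True
say is-ascending([3, 1, 2]);        # False

say [+] 1..4;                       # 10
say ([\+] 1..4).join(', ');         # 1, 3, 6, 10
say [max] 3, 9, 4;                  # 9
```

Any infix operator that combines two values works this way, which is why max can be reduced just like +.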
MOTIVATION
Programmers are lazy, and don't want to write a loop just to apply a binary operator to all elements of a list. List.reduce does something similar, but it's not as terse as the meta operator ([+] @list would be @list.reduce(&infix:<+>)). Also with reduce you have to take care of the associativity of the operator yourself, whereas the meta operator handles it for you.
If you're not convinced, play a bit with it (rakudo implements it), it's real fun.
SEE ALSO
http://design.perl6.org/S03.html#Reduction_operators, http://www.perlmonks.org/?node_id=716497
Tue, 09 Dec 2008
Quoting and Parsing
NAME
"Perl 5 to 6" Lesson 23 - Quoting and Parsing
SYNOPSIS
    my @animals = <dog cat tiger>;
    # or
    my @animals = qw/dog cat tiger/;
    # or

    my $interface = q{eth0};
    my $ips = q :s :x /ifconfig $interface/;

    # -----------

    sub if {
        warn "if() calls a sub\n";
    }
    if();
DESCRIPTION
Quoting
Perl 6 has a powerful mechanism for quoting strings: you have exact control over what features you want in your string.
Perl 5 had single quotes, double quotes and qw(...) (single quotes, split on whitespace) as well as the q(..) and qq(...) forms which are basically synonyms for single and double quotes.
Perl 6 in turn defines a quote operator named Q that can take various modifiers. The :b (backslash) modifier allows interpolation of backslash escape sequences like \n, the :s modifier allows interpolation of scalar variables, :c allows the interpolation of closures ("1 + 2 = { 1 + 2 }"), :w splits on words as qw/.../ does, and so on.
You can arbitrarily combine those modifiers. For example, you might wish for a form of qw/../ that interpolates only scalars, but nothing else. No problem:
    my $stuff = "honey";
    my @list = Q :w :s/milk toast $stuff with\tfunny\nescapes/;
    say @list[*-1];     # with\tfunny\nescapes
Here's a list of what modifiers are available, mostly stolen from S02 directly. All of these also have long names, which I omitted here.
    Features:
        :q      Interpolate \\, \q and \'
        :b      Other backslash escape sequences like \n, \t
    Operations:
        :x      Execute as shell command, return result
        :w      Split on whitespaces
        :ww     Split on whitespaces, with quote protection
    Variable interpolation
        :s      Interpolate scalars   ($stuff)
        :a      Interpolate arrays    (@stuff[])
        :h      Interpolate hashes    (%stuff{})
        :f      Interpolate functions (&stuff())
    Other
        :c      Interpolate closures  ({code})
        :qq     Interpolate with :s, :a, :h, :f, :c, :b
        :regex  parse as regex
There are some short forms which make life easier for you:
    q       Q:q
    qq      Q:qq
    m       Q:regex
You can also omit the first colon : if the quoting symbol is a short form, and write it as a single word:
    symbol      short for
    qw          q:w
    Qw          Q:w
    qx          q:x
    Qc          Q:c
    # and so on.
However there is one form that does not work, and some Perl 5 programmers will miss it: you can't write qw(...) with round parentheses in Perl 6. It is interpreted as a call to sub qw.
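To make the effect of individual modifiers visible, here is a small sketch that uses square brackets as delimiters. It assumes that a current Rakudo accepts the short modifier names exactly as listed above; the sample strings are invented.

```raku
use v6;

my $stuff = "honey";

# Bare Q interpolates nothing at all.
say Q[milk $stuff\n];       # milk $stuff\n

# :s switches on scalar interpolation only; the backslash stays literal.
say Q:s[milk $stuff\n];     # milk honey\n

# qq is the familiar double-quote behaviour.
say qq[milk $stuff];        # milk honey

# :w splits the literal on whitespace, like <...> does.
my @list = Q:w[milk toast honey];
say @list.elems;            # 3
```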
Parsing
This is where parsing comes into play: every construct of the form identifier(...) is parsed as a sub call. Yes, every.
    if($x<3)
is parsed as a call to sub if. You can disambiguate with whitespace:
    if ($x < 3) { say '<3' }
Or just omit the parens altogether:
    if $x < 3 { say '<3' }
This implies that Perl 6 has no keywords. Actually there are keywords like use or if, but they are not reserved in the sense that identifiers are restricted to non-keywords.
MOTIVATION
Various combinations of the quoting modifiers are already used internally, for example q:w to parse <...>, and :regex for m/.../. It makes sense to expose these also to the user, who gains flexibility, and can very easily write macros that provide a shortcut for the exact quoting semantics he wants.
And when you limit the specialty of keywords, you have far fewer troubles with backwards compatibility if you want to change what you consider a "keyword".
Mon, 08 Dec 2008
The State of the implementations
NAME
"Perl 5 to 6" Lesson 22 - The State of the implementations
SYNOPSIS
(none)
DESCRIPTION
Note: This lesson is long outdated, and preserved for historical interest only. The best way to stay informed about various Perl 6 compilers is to follow the blogs at http://planetsix.perl.org/.
Perl 6 is a language specification, and multiple compilers are being written that aim to implement Perl 6, and partially they already do.
Pugs
Pugs is a Perl 6 compiler written in Haskell. It was started by Audrey Tang, and she also did most of the work. In terms of implemented features it might still be the most advanced implementation today (May 2009).
To build and test pugs, you have to install GHC 6.10.1 first, and then run
    svn co http://svn.pugscode.org/pugs
    cd pugs
    perl Makefile.PL
    make
    make test
That will install some Haskell dependencies locally and then build pugs. For make test you might need to install some Perl 5 modules, which you can do with cpan Task::Smoke.
Pugs hasn't been developed during the last three years, except for occasional clean-ups of the build system.
Since the specification is evolving and Pugs is not updated, it is slowly drifting into obsolescence.
Pugs can parse most common constructs, implements object orientation, basic regexes, nearly(?) all control structures, basic user defined operators and macros, many builtins, contexts (except slice context), junctions, basic multi dispatch and the reduction meta operator - based on the syntax of three years past.
Rakudo
Rakudo is a parrot based compiler for Perl 6. The main architect is Patrick Michaud, many features were implemented by Jonathan Worthington.
It is hosted on github, you can find build instructions on http://rakudo.org/how-to-get-rakudo.
Rakudo development is very active, it's the most active Perl 6 compiler today. It passes a bit more than 17,000 tests from the official test suite (July 2009).
It implements most control structures, most syntaxes for number literals, interpolation of scalars and closures, chained operators, BEGIN- and END blocks, pointy blocks, named, optional and slurpy arguments, sophisticated multi dispatch, large parts of the object system, regexes and grammars, Junctions, generic types, parametric roles, typed arrays and hashes, importing and exporting of subroutines and basic meta operators.
If you want to experiment with Perl 6 today, Rakudo is the recommended choice.
Elf
Mitchell Charity started elf, a bootstrapping compiler written in Perl 6, with a grammar written in Ruby. Currently it has a Perl 5 backend, others are in planning.
It lives in the pugs repository; once you've checked it out you can go to misc/elf/ and run ./elf_f $filename. You'll need ruby-1.9 and some perl modules, about which elf will complain bitterly when they are not present.
elf is developed in bursts of activity followed by weeks of low activity, or even none at all.
It parses more than 70% of the test suite, but implements mostly features that are easy to emulate with Perl 5, and passes about 700 tests from the test suite.
KindaPerl6
Flavio Glock started KindaPerl6 (short kp6), a mostly bootstrapped Perl 6 compiler. Since the bootstrapped version is much too slow to be fun to develop with, it is now waiting for a faster backend.
Kp6 implements object orientation, grammars and a few distinct features like lazy gather/take. It also implements BEGIN blocks, which was one of the design goals.
v6.pm
v6 is a source filter for Perl 5. It was written by Flavio Glock, and supports basic Perl 6 plus grammars. It is fairly stable and fast, and is occasionally enhanced. It lives on the CPAN and in the pugs repository in perl5/*/.
SMOP
Smop stands for Simple Meta Object Programming and doesn't plan to implement all of Perl 6, it is designed as a backend (a little bit like parrot, but very different in both design and feature set). Unlike the other implementations it aims explicitly at implementing Perl 6's powerful meta object programming facilities, ie the ability to plug in different object systems.
It is implemented in C and various domain specific languages. It was designed and implemented by Daniel Ruoso, with help from Yuval Kogman (design) and Paweł Murias (implementation, DSLs). A grant from The Perl Foundation supports its development, and it currently approaches the stage where one could begin to emit code for it from another compiler.
It will then be used as a backend for either elf or kp6, and perhaps also for pugs.
STD.pm
Larry Wall wrote a grammar for Perl 6 in Perl 6. He also wrote a cheating script named gimme5, which translates that grammar to Perl 5. It can parse about every written and valid piece of Perl 6 that we know of, including the whole test suite (apart from a few failures now and then when Larry accidentally broke something).
STD.pm lives in the pugs repository, and can be run and tested with perl-5.10.0 installed in /usr/local/bin/perl and a few perl modules (like YAML::XS and Moose):
    cd src/perl6/
    make
    make testt      # warning: takes lot of time, 80 minutes or so
    ./tryfile $your_file
It correctly parses custom operators and warns about non-existent subs, undeclared variables and multiple declarations of the same variable as well as about some Perl 5isms.
MOTIVATION
Many people ask why we need so many different implementations, and if it wouldn't be better to focus on one instead.
There are basically three answers to that.
Firstly that's not how programming by volunteers works. People sometimes either want to start something with the tools they like, or they think that one aspect of Perl 6 is not sufficiently honoured by the design of the existing implementations. Then they start a new project.
The second possible answer is that the projects explore different areas of the vast Perl 6 language: SMOP explores meta object programming (from which Rakudo will also benefit), Rakudo and parrot care a lot about efficient language interoperability, grammars and platform independence, kp6 explored BEGIN blocks, and pugs was the first implementation to explore the syntax, and many parts of the language for the first time.
The third answer is that we don't want a single point of failure. If we had just one implementation and ran into severe problems with it for unforeseeable reasons (technical, legal, personal, ...), we would have no fallback; with several implementations we do.
SEE ALSO
Pugs: http://www.pugscode.org/, http://pugs.blogs.com/pugs/2008/07/pugshs-is-back.html, http://pugs.blogspot.com, source: http://svn.pugscode.org/pugs.
Rakudo: http://rakudo.org/, http://www.parrot.org/
Elf: http://perl.net.au/wiki/Elf source: see pugs, misc/elf/.
KindaPerl6: source: see pugs, v6/v6-KindaPerl6.
v6.pm: source: see pugs, perl5/.
STD.pm: source: see pugs, src/perl6/.
Sun, 07 Dec 2008
Subset Types
NAME
"Perl 5 to 6" Lesson 21 - Subset Types
SYNOPSIS
    subset Squares of Real where { .sqrt.Int**2 == $_ };

    multi sub square_root(Squares $x --> Int) {
        return $x.sqrt.Int;
    }
    multi sub square_root(Real $x --> Real) {
        return $x.sqrt;
    }
DESCRIPTION
Java programmers tend to think of a type as either a class or an interface (which is something like a crippled class), but that view is too limited for Perl 6. A type is more generally a constraint on what values a container can contain. The "classical" constraint is it is an object of a class X or of a class that inherits from X. Perl 6 also has constraints like the class or the object does role Y, or this piece of code returns true for our object. The latter is the most general one, and is called a subset type:
    subset Even of Int where { $_ % 2 == 0 }

    # Even can now be used like every other type name
    my Even $x = 2;
    my Even $y = 3;     # type mismatch error
(Try it out, Rakudo implements subset types).
You can also use anonymous subtypes in signatures:
    sub foo (Int where { ... } $x) { ... }
    # or with the variable at the front:
    sub foo ($x of Int where { ... } ) { ... }
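As a sketch of how a subset type interacts with multi dispatch and with assignment, assuming a current Rakudo. The routine describe is a made-up name:

```raku
use v6;

subset Even of Int where { $_ % 2 == 0 };

# describe() is hypothetical; the constrained candidate is tried first.
multi sub describe(Even $n) { "$n is even" }
multi sub describe(Int  $n) { "$n is odd"  }

say describe(4);    # 4 is even
say describe(7);    # 7 is odd

# Assigning a value that fails the constraint raises a type check error.
my Even $x = 2;
try { $x = 3 };
say $x;             # 2
say $!.defined;     # True
```

The failed assignment leaves the variable untouched, so $x still holds its last valid value.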
MOTIVATION
Allowing arbitrary type constraints in the form of code allows ultimate extensibility: if you don't like the current type system, you can just roll your own based on subset types.
It also makes libraries easier to extend: instead of dying on data that can't be handled, the subs and methods can simply declare their types in a way that "bad" data is rejected by the multi dispatcher. If somebody wants to handle data that the previous implementation rejected as "bad", he can simply add a multi sub with the same name that accepts the data. For example a math library that handles real numbers could be enhanced this way to also handle complex numbers.
Sat, 06 Dec 2008
A grammar for (pseudo) XML
NAME
"Perl 5 to 6" Lesson 20 - A grammar for (pseudo) XML
SYNOPSIS
    grammar XML {
        token TOP   { ^ <xml> $ };
        token xml   { <text> [ <tag> <text> ]* };
        token text  { <-[<>&]>* };
        rule tag    {
            '<'(\w+) <attributes>*
            [
                | '/>'                  # a single tag
                | '>'<xml>'</' $0 '>'   # an opening and a closing tag
            ]
        };
        token attributes { \w+ '="' <-["<>]>* '"' };
    };
DESCRIPTION
So far the focus of these articles has been the Perl 6 language, independently of what has been implemented so far. To show you that it's not a purely fantasy language, and to demonstrate the power of grammars, this lesson shows the development of a grammar that parses basic XML, and that runs with Rakudo.
Please follow the instructions on http://rakudo.org/how-to-get-rakudo/ to obtain and build Rakudo, and try it out yourself.
Our idea of XML
For our purposes XML is quite simple: it consists of plain text and nested tags that can optionally have attributes. So here are a few tests for what we want to parse as valid "XML", and what not:
    my @tests = (
        [1, 'abc'                       ],      # 1
        [1, '<a></a>'                   ],      # 2
        [1, '..<ab>foo</ab>dd'          ],      # 3
        [1, '<a><b>c</b></a>'           ],      # 4
        [1, '<a href="foo"><b>c</b></a>'],      # 5
        [1, '<a empty="" ><b>c</b></a>' ],      # 6
        [1, '<a><b>c</b><c></c></a>'    ],      # 7
        [0, '<'                         ],      # 8
        [0, '<a>b</b>'                  ],      # 9
        [0, '<a>b</a'                   ],      # 10
        [0, '<a>b</a href="">'          ],      # 11
        [1, '<a/>'                      ],      # 12
        [1, '<a />'                     ],      # 13
    );

    my $count = 1;
    for @tests -> $t {
        my $s = $t[1];
        my $M = XML.parse($s);
        if !($M xor $t[0]) {
            say "ok $count - '$s'";
        } else {
            say "not ok $count - '$s'";
        }
        $count++;
    }
This is a list of both "good" and "bad" XML, and a small test script that runs these tests by calling XML.parse($string). By convention the rule that matches what the grammar should match is named TOP.
(As you can see from test 1 we don't require a single root tag, but it would be trivial to add this restriction).
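Before building the XML grammar, the TOP convention and the parse method can be tried on something much smaller. KeyValue is a hypothetical toy grammar and not part of the lesson; the sketch assumes a current Rakudo.

```raku
use v6;

# KeyValue is a made-up toy grammar, much smaller than the XML one.
grammar KeyValue {
    token TOP   { ^ <key> '=' <value> $ }
    token key   { \w+ }
    token value { \d+ }
}

my $match = KeyValue.parse('answer=42');
say ~$match<key>;                       # answer
say ~$match<value>;                     # 42

# A failed parse returns an undefined value.
say KeyValue.parse('oops').defined;     # False
```

The test script above relies on exactly this: a successful parse is true in boolean context, a failed one is false.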
Developing the grammar
The essence of XML is surely the nesting of tags, so we'll focus on the second test first. Place this at the top of the test script:
    grammar XML {
        token TOP { ^ <tag> $ }
        token tag { '<' (\w+) '>' '</' $0 '>' }
    };
Now run the script:
    $ ./perl6 xml-01.pl
    not ok 1 - 'abc'
    ok 2 - '<a></a>'
    not ok 3 - '..<ab>foo</ab>dd'
    not ok 4 - '<a><b>c</b></a>'
    not ok 5 - '<a href="foo"><b>c</b></a>'
    not ok 6 - '<a empty="" ><b>c</b></a>'
    not ok 7 - '<a><b>c</b><c></c></a>'
    ok 8 - '<'
    ok 9 - '<a>b</b>'
    ok 10 - '<a>b</a'
    ok 11 - '<a>b</a href="">'
    not ok 12 - '<a/>'
    not ok 13 - '<a />'
So this simple rule parses one pair of start tag and end tag, and correctly rejects all four examples of invalid XML.
The first test should be easy to pass as well, so let's try this:
    grammar XML {
        token TOP  { ^ <xml> $ };
        token xml  { <text> | <tag> };
        token text { <-[<>&]>* };
        token tag  { '<' (\w+) '>' '</' $0 '>' }
    };
(Remember, <-[...]> is a negated character class.)
And run it:
    $ ./perl6 xml-03.pl
    ok 1 - 'abc'
    not ok 2 - '<a></a>'
    (rest unchanged)
Why in the seven hells did the second test stop working? The answer is that Rakudo doesn't do longest token matching yet (update 2013-01: it does now), but matches sequentially. <text> matches the empty string (and thus always), so <text> | <tag> never even tries to match <tag>. Reversing the order of the two alternatives would help.
But we don't just want to match either plain text or a tag anyway, but random combinations of both of them:
    token xml { <text> [ <tag> <text> ]* };
([...] are non-capturing groups, like (?: ... ) is in Perl 5).
And lo and behold, the first two tests both pass.
The third test, ..<ab>foo</ab>dd, has text between opening and closing tag, so we have to allow that next. But not only text is allowed between tags, but arbitrary XML, so let's just call <xml> there:
    token tag { '<' (\w+) '>' <xml> '</' $0 '>' }

    $ ./perl6 xml-05.pl
    ok 1 - 'abc'
    ok 2 - '<a></a>'
    ok 3 - '..<ab>foo</ab>dd'
    ok 4 - '<a><b>c</b></a>'
    not ok 5 - '<a href="foo"><b>c</b></a>'
    (rest unchanged)
We can now focus on attributes (the href="foo"
stuff):
    token tag { '<' (\w+) <attribute>* '>' <xml> '</' $0 '>' };
    token attribute { \w+ '="' <-["<>]>* \" };
But this doesn't make any new tests pass. The reason is the blank between the tag name and the attribute. Instead of adding \s+
or \s*
in many places we'll switch from token
to rule
, which implies the :sigspace
modifier:
    rule tag { '<'(\w+) <attribute>* '>' <xml> '</'$0'>' };
    token attribute { \w+ '="' <-["<>]>* \" };
Now all tests pass, except the last two:
    ok 1 - 'abc'
    ok 2 - '<a></a>'
    ok 3 - '..<ab>foo</ab>dd'
    ok 4 - '<a><b>c</b></a>'
    ok 5 - '<a href="foo"><b>c</b></a>'
    ok 6 - '<a empty="" ><b>c</b></a>'
    ok 7 - '<a><b>c</b><c></c></a>'
    ok 8 - '<'
    ok 9 - '<a>b</b>'
    ok 10 - '<a>b</a'
    ok 11 - '<a>b</a href="">'
    not ok 12 - '<a/>'
    not ok 13 - '<a />'
These contain un-nested tags that are closed with a single slash /
. No problem to add that to rule tag
:
    rule tag {
        '<'(\w+) <attribute>* [
            | '/>'
            | '>' <xml> '</'$0'>'
        ]
    };
All tests pass, we're happy, our first grammar works well.
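For reference, here are the pieces from the steps above assembled into one grammar, together with a tiny driver loop. This is only a sketch: it assumes a current Rakudo (where the script would be run with `raku` rather than `./perl6`), the sample strings are taken from the test list, and the `valid`/`invalid` labels are my own invention rather than the output of the test script.

```raku
grammar XML {
    token TOP  { ^ <xml> $ }
    token xml  { <text> [ <tag> <text> ]* }
    token text { <-[<>&]>* }
    # "rule" implies :sigspace, so blanks inside a tag are tolerated
    rule  tag  {
        '<'(\w+) <attribute>*
        [
        | '/>'
        | '>' <xml> '</'$0'>'
        ]
    }
    token attribute { \w+ '="' <-["<>]>* \" }
}

# a few of the test strings from above; the labels are hypothetical
for 'abc', '<a></a>', '<a href="foo"><b>c</b></a>', '<a>b</b>', '<a />' -> $candidate {
    say XML.parse($candidate) ?? "valid:   $candidate" !! "invalid: $candidate";
}
```

`XML.parse` returns a match object on success and a false value otherwise, which is all the loop relies on.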
More hacking
Playing with grammars is much more fun than reading about playing, so here's what you could implement:
- plain text can contain entities like &amp;
- I don't know if XML tag names are allowed to begin with a number, but the current grammar allows that. You might look it up in the XML specification, and adapt the grammar if needed.
- plain text can contain <![CDATA[ ... ]]> blocks, in which xml-like tags are ignored and < and the like don't need to be escaped
- Real XML allows a preamble like <?xml version="1.0" encoding="utf-8"?> and requires one root tag which contains the rest (You'd have to change some of the existing test cases)
- You could try to implement a pretty-printer for XML by recursively walking through the match object $/. (This is non-trivial; you might have to work around a few Rakudo bugs, and maybe also introduce some new captures).
(Please don't post solutions to this as comments in this blog; let others have the same fun as you had ;-).
Have fun hacking.
MOTIVATION
It's powerful and fun
SEE ALSO
Regexes are specified in great detail in S05: http://design.perl6.org/S05.html.
More working examples for grammars can be found at https://github.com/moritz/json/ (check file lib/JSON/Tiny/Grammar.pm).
Sun, 30 Nov 2008
Regexes strike back
NAME
"Perl 5 to 6" Lesson 19 - Regexes strike back
SYNOPSIS
    # normal matching:
    if 'abc' ~~ m/../ {
        say $/;                 # ab
    }

    # match with implicit :sigspace modifier
    if 'ab cd ef' ~~ ms/ (..) ** 2 / {
        say $0[1];              # cd
    }

    # substitute with the :samespace modifier
    my $x = "abc defg";
    $x ~~ ss/c d/x y/;
    say $x;                     # abx yefg
DESCRIPTION
Since the basics of regexes are already covered in lesson 07, here are some useful (but not very structured) additional facts about Regexes.
Matching
You don't need to write grammars to use regexes: the traditional form m/.../ still works, and has a new brother, the ms/.../ form, which implies the :sigspace modifier. Remember, that means that whitespace in the regex is replaced by the <.ws> rule.
The default for that rule is to match \s+ if it is surrounded by two word characters (i.e. characters matching \w), and \s* otherwise.
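The effect of that default is easy to check. The following sketch assumes a current Rakudo, where the modifier is spelled out as `m:s/.../`; the sample strings are made up for illustration.

```raku
# between two word characters, whitespace in the regex demands \s+
say so 'ab cd' ~~ m:s/ab cd/;       # True
say so 'abcd'  ~~ m:s/ab cd/;       # False

# next to a non-word character, \s* is enough
say so 'a+b'   ~~ m:s/a '+' b/;     # True
say so 'a + b' ~~ m:s/a '+' b/;     # True
```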
In substitutions the :samespace
modifier takes care that whitespaces matched with the ws
rule are preserved. Likewise the :samecase
modifier, short :ii
(since it's a variant of :i
) preserves case.
    my $x = 'Abcd';
    $x ~~ s:ii/^../foo/;
    say $x;                     # Foocd
    $x = 'ABC';
    $x ~~ s:ii/^../foo/;
    say $x                      # FOOC
This is very useful if you want to globally rename your module Foo
, to Bar
, but for example in environment variables it is written as all uppercase. With the :ii
modifier the case is automatically preserved.
It copies case information character by character. But there's also a more intelligent version: when combined with the :sigspace (short :s) modifier, it tries to find a pattern in the case information of the source string. Recognized are .lc, .uc, .lc.ucfirst, .uc.lcfirst and .lc.capitalize (Str.capitalize uppercases the first character of each word). If such a pattern is found, it is also applied to the substitution string.
    my $x = 'The Quick Brown Fox';
    $x ~~ s :s :ii /brown.*/perl 6 developer/;
    # $x is now 'The Quick Perl 6 Developer'
Alternations
Alternations are still formed with the single bar |
, but it means something else than in Perl 5. Instead of sequentially matching the alternatives and taking the first match, it now matches all alternatives in parallel, and takes the longest one.
    'aaaa' ~~ m/ a | aaa | aa /;
    say $/                      # aaa
While this might seem like a trivial change, it has far reaching consequences, and is crucial for extensible grammars. Since Perl 6 is parsed using a Perl 6 grammar, it is responsible for the fact that in ++$a
the ++
is parsed as a single token, not as two prefix:<+>
tokens.
The old, sequential style is still available with ||
:
    grammar Math::Expression {
        token value {
            | <number>
            | '('
              <expression>
              [ ')' || { fail("Parenthesis not closed") } ]
        }
        ...
    }
The { ... } executes a closure, and calling fail in that closure makes the expression fail. That branch is guaranteed to be executed only if the previous alternative (here the ')') fails, so it can be used to emit useful error messages while parsing.
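To see the difference between the two alternation styles side by side, here is a small sketch that runs the same three alternatives once with | and once with ||. It assumes an implementation that does longest token matching, such as a current Rakudo.

```raku
my $text = 'aaaa';

# | tries all alternatives and keeps the longest match
say ~($text ~~ / a | aaa | aa /);       # aaa

# || tries the alternatives from left to right and keeps the first match
say ~($text ~~ / a || aaa || aa /);     # a
```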
There are other ways to write alternations, for example if you "interpolate" an array, it will match as an alternation of its values:
    $_ = '12 oranges';
    my @fruits = <apple orange banana kiwi>;
    if m:i:s/ (\d+) (@fruits)s? / {
        say "You've got $0 $1s, I've got { $0 + 2 } of them. You lost.";
    }
There is yet another construct that automatically matches the longest alternation: multi regexes. They can be either written as multi token name
or with a proto
:
    grammar Perl {
        ...
        proto token sigil { * }
        token sigil:sym<$> { <sym> }
        token sigil:sym<@> { <sym> }
        token sigil:sym<%> { <sym> }
        ...
        token variable { <sigil> <twigil>? <identifier> }
    }
This example shows multiple tokens called sigil
, which are parameterized by sym
. When the short name, i.e. sigil
is used, all of these tokens are matched in an alternation. You may think that this is a very inconvenient way to write an alternation, but it has a huge advantage over writing '$'|'@'|'%'
: it is easily extensible:
    grammar AddASigil is Perl {
        token sigil:sym<!> { <sym> }
    }
    # wow, we have a Perl 6 grammar with an additional sigil!
Likewise you can override existing alternatives:
grammar WeirdSigil is Perl { token sigil:sym<$> { '°' } }
In this grammar the sigil for scalar variables is °
, so whenever the grammar looks for a sigil it searches for a °
instead of a $
, but the compiler will still know that it was the regex sigil:sym<$>
that matched it.
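Since the Perl grammar above is only hinted at, here is a self-contained miniature of the same idea that you can actually run. The grammar names Sigils and MoreSigils, the TOP rule and the identifier token are made up for this sketch, and it assumes a current Rakudo, where the body of a proto token is written {*}.

```raku
grammar Sigils {
    token TOP        { ^ <sigil> <identifier> $ }
    proto token sigil {*}
    token sigil:sym<$> { <sym> }
    token sigil:sym<@> { <sym> }
    token sigil:sym<%> { <sym> }
    token identifier { <.alpha> \w* }
}

# extending the alternation needs nothing more than one additional token
grammar MoreSigils is Sigils {
    token sigil:sym<!> { <sym> }
}

say so Sigils.parse('$foo');        # True
say so Sigils.parse('!foo');        # False
say so MoreSigils.parse('!foo');    # True
```

The parent grammar stays untouched; the derived one simply contributes one more candidate to the sigil alternation.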
In the next lesson you'll see the development of a real, working grammar with Rakudo.
Sat, 29 Nov 2008
Scoping
NAME
"Perl 5 to 6" Lesson 18 - Scoping
SYNOPSIS
    for 1 .. 10 -> $a {
        # $a visible here
    }
    # $a not visible here

    while my $b = get_stuff() {
        # $b visible here
    }
    # $b still visible here

    my $c = 5;
    {
        my $c = $c;
        # $c is undef here
    }
    # $c is 5 here

    my $y;
    my $x = $y + 2 while $y = calc();
    # $x still visible
DESCRIPTION
Lexical Scoping
Scoping in Perl 6 is quite similar to that of Perl 5. A Block introduces a new lexical scope. A variable name is looked up in the innermost lexical scope first; if it's not found there, it is searched for in the next outer scope, and so on. Just like in Perl 5 a my
variable is a proper lexical variable, and an our
declaration introduces a lexical alias for a package variable.
But there are subtle differences: variables are visible exactly in the rest of the block in which they are declared, so variables declared in block headers (for example in the condition of a while loop) are not limited to the block that follows.
Also Perl 6 only ever looks up unqualified names (variables and subroutines) in lexical scopes.
If you want to limit the scope, you can use formal parameters to the block:
if calc() -> $result { # you can use $result here } # $result not visible here
Variables are visible immediately after they are declared, not at the end of the statement as in Perl 5.
    my $x = .... ;
            ^^^^^
            $x visible here in Perl 6
            but not in Perl 5
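The rules about block headers and formal parameters can be tried out with a few lines. The helper subs next-item and calc are invented for this sketch, and the behaviour shown is that of a current Rakudo.

```raku
my @queue = 3, 2, 1;
sub next-item() { @queue ?? @queue.shift !! 0 }     # hypothetical helper
sub calc()      { 42 }                              # hypothetical helper

# $item is declared in the loop header, so it belongs to the enclosing scope
while my $item = next-item() {
    say "processing $item";
}
say "after the loop: $item";        # after the loop: 0

# a formal parameter limits the variable to the block
if calc() -> $result {
    say "inside: $result";          # inside: 42
}
# $result is not declared here; using it would be a compile time error
```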
Dynamic scoping
The local
adjective is now called temp
, and if it's not followed by an initialization the previous value of that variable is used (not undef
).
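A minimal sketch of temp, assuming a current Rakudo; the variable $level and the subs show and nested are made up for the example.

```raku
my $level = 1;

sub show()   { say "level is $level" }
sub nested() {
    temp $level = 2;    # the old value comes back when nested() is left
    show();
}

nested();               # level is 2
show();                 # level is 1
```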
There's also a new kind of dynamically scoped variable called a hypothetical variable. If the block is left with an exception or a false value, then the previous value of the variable is restored. If not, it is kept:
    use v6;
    my $x = 0;

    sub tryit($success) {
        let $x = 42;
        die "Not like this!" unless $success;
        return True;
    }

    tryit True;
    say $x;     # 42

    $x = 0;
    try tryit False;
    say $x;     # 0
Context variables
Some variables that are global in Perl 5 ($!
, $_
) are context variables in Perl 6, that is they are passed between dynamic scopes.
This solves an old problem in Perl 5. In Perl 5 a DESTROY
sub can be called at a block exit, and accidentally change the value of a global variable, for example one of the error variables:
    # Broken Perl 5 code here:
    sub DESTROY { eval { 1 }; }

    eval {
        my $x = bless {};
        die "Death\n";
    };
    print $@ if $@;     # No output here
In Perl 6 this problem is avoided by not implicitly using global variables.
(In Perl 5.14 there is a workaround that protects $@
from being modified, thus averting the most harm from this particular example.)
Pseudo-packages
If a variable is hidden by another lexical variable of the same name, it can be accessed with the OUTER
pseudo package
    my $x = 3;
    {
        my $x = 10;
        say $x;             # 10
        say $OUTER::x;      # 3
        say OUTER::<$x>     # 3
    }
Likewise a function can access variables from its caller with the CALLER
and CONTEXT
pseudo packages. The difference is that CALLER
only accesses the scope of the immediate caller, CONTEXT
works like UNIX environment variables (and should only be used internally by the compiler for handling $_
, $!
and the like). To access variables from the outer dynamic scope they must be declared with is context
.
MOTIVATION
It is now common knowledge that global variables are really bad, and cause lots of problems. We also have the resources to implement better scoping mechanisms. Therefore global variables are only used for inherently global data (like %*ENV
or $*PID
).
The block scoping rules have been greatly simplified.
Here's a quote from Perl 5's perlsyn
document; we don't want similar things in Perl 6:
NOTE: The behaviour of a "my" statement modified with a statement modifier conditional or loop construct (e.g. "my $x if ...") is undefined. The value of the "my" variable may be "undef", any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons.
SEE ALSO
S04 discusses block scoping: http://design.perl6.org/S04.html.
S02 lists all pseudo packages and explains context scoping: http://design.perl6.org/S02.html#Names.
Fri, 28 Nov 2008
Unicode
NAME
"Perl 5 to 6" Lesson 17 - Unicode
SYNOPSIS
(none)
DESCRIPTION
Perl 5's Unicode model suffers from a big weakness: it uses the same type for binary and for text data. For example if your program reads 512 bytes from a network socket, it is certainly a byte string. However when (still in Perl 5) you call uc
on that string, it will be treated as text. The recommended way is to decode that string first, but when a subroutine receives a string as an argument, it can never know for sure whether it is still encoded or not, i.e. whether it is to be treated as a blob or as text.
Perl 6 on the other hand offers the type buf
, which is just a collection of bytes, and Str
, which is a collection of logical characters.
Logical character is still a vague term. To be more precise a Str
is an object that can be viewed at different levels: Byte
, Codepoint
(anything that the Unicode Consortium assigned a number to is a codepoint), Grapheme
(things that visually appear as a character) and CharLingua
(language defined characters).
For example the string with the hex bytes 61 cc 80 consists of three bytes (obviously), but can also be viewed as consisting of two codepoints with the names LATIN SMALL LETTER A (U+0061) and COMBINING GRAVE ACCENT (U+0300), or as one grapheme that, if neither my blog software nor your browser kills it, looks like this: à.
So you can't simply ask for the length of a string, you have to ask for a specific length:
$str.bytes; $str.codes; $str.graphs;
There's also a method named chars
, which returns the length in the current Unicode level (which can be set by a pragma like use bytes
, and which defaults to graphemes).
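As a rough illustration of the different levels, the following sketch builds the three-byte example from above and looks at it as bytes and as text. It is written against a current Rakudo and therefore uses Buf, .decode, .encode, .chars and .codes instead of the draft names mentioned in this lesson; treat those method names as an assumption of the sketch, not as part of the specification discussed here. An implementation is free to normalize text while decoding, so I only state the numbers I'm sure of.

```raku
# the raw bytes 61 cc 80 from the example above
my $bytes = Buf.new(0x61, 0xcc, 0x80);
say $bytes.elems;                   # 3

# decoding turns the byte string into a text string
my $text = $bytes.decode('UTF-8');
say $text.chars;                    # 1 (one grapheme)

# these two depend on how the implementation normalizes the text
say $text.codes;
say $text.encode('UTF-8').elems;
```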
In Perl 5 you sometimes had the problem of accidentally concatenating byte strings and text strings. If you should ever suffer from that problem in Perl 6, you can easily identify where it happens by overloading the concatenation operator:
    sub GLOBAL::infix:<~> is deep (Str $a, buf $b)|(buf $b, Str $a) {
        die "Can't concatenate text string «"
            ~ $a.encode("UTF-8")
            ~ "» with byte string «$b»\n";
    }
Encoding and Decoding
The specification of the IO system is very basic and does not yet define any encoding and decoding layers, which is why this article has no useful SYNOPSIS section. I'm sure that there will be such a mechanism, and I could imagine it will look something like this:
my $handle = open($filename, :r, :encoding<UTF-8>);
Regexes and Unicode
Regexes can take modifiers that specify their Unicode level, so m:codes/./
will match exactly one codepoint. In the absence of such modifiers the current Unicode level will be used.
Character classes like \w
(match a word character) behave according to the Unicode standard. There are modifiers that ignore case (:i
) and accents (:a
), and modifiers for the substitution operators that can carry case information to the substitution string (:samecase
and :sameaccent
, short :ii
, :aa
).
MOTIVATION
It is quite hard to correctly process strings with most tools and most programming languages these days. Suppose you have a web application in Perl 5, and you want to break long words automatically so that they don't mess up your layout. When you naively use substr
to do that, you might accidentally rip graphemes apart.
Perl 6 will be the first mainstream programming language with built-in support for grapheme-level string manipulation, which basically removes most Unicode worries, and which (in conjunction with regexes) makes Perl 6 one of the most powerful languages for string processing.
The separate data types for text and byte strings make debugging and introspection quite easy.
SEE ALSO
Thu, 27 Nov 2008
Enums
NAME
"Perl 5 to 6" Lesson 16 - Enums
SYNOPSIS
    enum Bool <False True>;
    my $value = $arbitrary_value but True;
    if $value {
        say "Yes, it's true";       # will be printed
    }

    enum Day ('Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat', 'Sun');
    if custom_get_date().Day == Day::Sat | Day::Sun {
        say "Weekend";
    }
DESCRIPTION
Enums are versatile beasts. They are low-level classes that consist of an enumeration of constants, typically integers or strings (but can be arbitrary).
These constants can act as subtypes, methods or normal values. They can be attached to an object with the but
operator, which "mixes" the enum into the value:
my $x = $today but Day::Tue;
You can also use the type name of the Enum as a function, and supply the value as an argument:
$x = $today but Day($weekday);
Afterwards that object has a method with the name of the enum type, here Day
:
say $x.Day; # 1
The value of the first constant is 0, the next 1 and so on, unless you explicitly provide another value with pair notation:
enum Hackers (:Larry<Perl>, :Guido<Python>, :Paul<Lisp>);
You can check if a specific value was mixed in by using the versatile smart match operator, or with .does
:
    if $today ~~ Day::Fri {
        say "Thank Christ it's Friday"
    }
    if $today.does(Fri) { ... }
Note that you can specify the name of the value only (like Fri) if that's unambiguous; if it's ambiguous you have to provide the full name Day::Fri.
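Here is a small sketch of the numbering and of the short names, assuming a current Rakudo. It deliberately sticks to plain enum values and leaves out the mixing with but, because I'm less sure that today's implementations behave exactly as described above in that area.

```raku
enum Day <Mon Tue Wed Thu Fri Sat Sun>;

say Mon.value;          # 0
say Tue.value;          # 1
say Day(5);             # Sat

my $today = Sat;        # short name, unambiguous here
say "Weekend" if $today == Sat | Sun;       # Weekend
say $today ~~ Day;      # True
```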
MOTIVATION
Enums replace both the "magic" that is involved with tainted variables in Perl 5 and the return "0 but True"
hack (a special case for which no warning is emitted if used as a number). Plus they give a Bool
type.
Enums also provide the power and flexibility of attaching arbitrary meta data for debugging or tracing.
SEE ALSO
Tue, 21 Oct 2008
Twigils
NAME
"Perl 5 to 6" Lesson 15 - Twigils
SYNOPSIS
    class Foo {
        has $.bar;
        has $!baz;
    }

    my @stuff = sort { $^b[1] <=> $^a[1]}, [1, 2], [0, 3], [4, 8];
    my $block = { say "This is the named 'foo' parameter: $:foo" };
    $block(:foo<bar>);

    say "This is file $?FILE on line $?LINE";

    say "A CGI script" if %*ENV<DOCUMENT_ROOT>:exists;
DESCRIPTION
Some variables have a second sigil, called a twigil. It basically means that the variable isn't "normal", but differs in some way, for example it could be differently scoped.
You've already seen that public and private object attributes have the .
and !
twigil respectively; they are not normal variables, they are tied to self
.
The ^
twigil removes a special case from Perl 5. To be able to write
    # beware: perl 5 code
    sort { $a <=> $b } @array
the variables $a
and $b
are special cased by the strict
pragma. In Perl 6, there's a concept named self-declared positional parameter, and these parameters have the ^
twigil. It means that they are positional parameters of the current block, without being listed in a signature. The variables are filled in lexicographic (alphabetic) order:
    my $block = { say "$^c $^a $^b" };
    $block(1, 2, 3);                # 3 1 2
So now you can write
    @list = sort { $^b <=> $^a }, @list;
    # or:
    @list = sort { $^foo <=> $^bar }, @list;
Without any special cases.
And to keep the symmetry between positional and named arguments, the :
twigil does the same for named parameters, so these lines are roughly equivalent:
    my $block = { say $:stuff };
    my $sub   = sub (:$stuff) { say $stuff }
Using both self-declared parameters and a signature will result in an error, as you can only have one of the two.
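Both kinds of self-declared parameters in one runnable sketch, assuming a current Rakudo; the names @pairs, @sorted and &greet are invented for the example.

```raku
# positional placeholders: sort descending by the second element
my @pairs  = [1, 2], [0, 3], [4, 8];
my @sorted = sort { $^b[1] <=> $^a[1] }, @pairs;
say @sorted;                        # [[4 8] [0 3] [1 2]]

# named placeholder: the block accepts a named argument "name"
my &greet = { 'Hello, ' ~ $:name };
say greet(name => 'World');         # Hello, World
```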
The ?
twigil stands for variables and constants that are known at compile time, like $?LINE
for the current line number (formerly __LINE__
), and $?DATA
is the file handle to the DATA
section.
Contextual variables can be accessed with the *
twigil, so $*IN
and $*OUT
can be overridden dynamically.
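The same mechanism works for your own variables. In this sketch, which assumes a current Rakudo, $*target is a made-up variable and report a made-up sub; the point is that the callee sees whatever value its caller has declared.

```raku
sub report() { say "writing to { $*target }" }

my $*target = 'console';
report();                   # writing to console

{
    # overrides the variable for everything called from this block
    my $*target = 'logfile';
    report();               # writing to logfile
}

report();                   # writing to console
```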
A pseudo twigil is <
, which is used in a construct like $<capture>
, where it is a shorthand for $/<capture>
, which accesses the Match object after a regex match.
MOTIVATION
When you read Perl 5's perlvar
document, you can see that it has far too many variables, most of them global, that affect your program in various ways.
The twigils try to bring some order into these special variables, and on the other hand they remove the need for special cases. In the case of object attributes they shorten self.var
to $.var
(or @.var
or whatever).
So all in all the increased "punctuation noise" actually makes the programs much more consistent and readable.
Mon, 20 Oct 2008
The MAIN sub
NAME
"Perl 5 to 6" Lesson 14 - The MAIN sub
SYNOPSIS
    # file doit.pl

    #!/usr/bin/perl6
    sub MAIN($path, :$force, :$recursive, :$home = '~/') {
        # do stuff here
    }

    # command line
    $ ./doit.pl --force --home=/home/someoneelse file_to_process
DESCRIPTION
Calling subs and running a typical Unix program from the command line are visually very similar: you can have positional, optional and named arguments.
You can benefit from it, because Perl 6 can process the command line for you, and turn it into a sub call. Your script is normally executed (at which time it can munge the command line arguments stored in @*ARGS
), and then the sub MAIN
is called, if it exists.
If the sub can't be called because the command line arguments don't match the formal parameters of the MAIN
sub, an automatically generated usage message is printed.
Command line options map to subroutine arguments like this:
    -name                   :name
    -name=value             :name<value>
    # remember, <...> is like qw(...)
    --hackers=Larry,Damian  :hackers<Larry Damian>
    --good_language         :good_language
    --good_lang=Perl        :good_lang<Perl>
    --bad_lang PHP          :bad_lang<PHP>
    +stuff                  :!stuff
    +stuff=healthy          :stuff<healthy> but False
The $x = $obj but False
means that $x
is a copy of $obj
, but gives Bool::False
in boolean context.
So for simple (and some not quite simple) cases you don't need an external command line processor, but you can just use sub MAIN
for that.
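Here is a sketch of the whole round trip that can be run without typing any options. It assumes a current Rakudo, and it uses the fact mentioned above that the mainline code runs first and may munge @*ARGS: if the script is started without arguments, it fills in the sample command line from the SYNOPSIS (with a made-up home directory). The Bool type constraints and the output format are my additions.

```raku
# Hypothetical stand-in for real input: only used when the script is
# started without any command line arguments.
@*ARGS = <--force --home=/tmp/demo file_to_process> unless @*ARGS;

sub MAIN($path, Bool :$force, Bool :$recursive, :$home = '~/') {
    say "path:      $path";
    say "force:     { so $force }";
    say "recursive: { so $recursive }";
    say "home:      $home";
}
```

Started with your own arguments, the same script processes those instead, and arguments that don't fit the signature produce the usage message.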
MOTIVATION
The motivation behind this should be quite obvious: it makes simple things easier, similar things similar, and in many cases reduces command line processing to a single line of code: the signature of MAIN
.
SEE ALSO
http://design.perl6.org/S06.html#Declaring_a_MAIN_subroutine contains the specification.