Categories

Ads

Your advertisement could be here -- Contact me!.

Sun, 22 Feb 2015

Profiling Perl 6 code on IRC


Permanent link

On the #perl6 IRC channel, we have a bot called camelia that executes small snippets of Perl 6 code, and prints the output that it produces. This is a pretty central part of our culture, and we use it to explain or demonstrate features or even bugs in the compiler.

Here is an example:

10:35 < Kristien> Can a class contain classes?
10:35 < Kristien> m: class A { class B { } }; say A.new.B.new
10:35 <+camelia> rakudo-moar 114659: OUTPUT«No such method 'B' for invocant of 
                 type 'A'␤  in block <unit> at /tmp/g81K8fr9eY:1␤␤»
10:35 < Kristien> :(
10:36 < raydiak> m: class A { class B { } }; say A::B.new
10:36 <+camelia> rakudo-moar 114659: OUTPUT«B.new()␤»

Yesterday and today I spent some time teaching this IRC bot to not only run the code, but optionally also run it through a profiler, to make it possible to determine where the virtual machine spends its time running the code. an example:

12:21 < moritz> prof-m: Date.today for ^100; say "done"
12:21 <+camelia> prof-m 9fc66c: OUTPUT«done␤»
12:21 <+camelia> .. Prof: http://p.p6c.org/453bbe

The Rakudo Perl 6 Compiler on the MoarVM backend has a profile, which produces a fancy HTML + Javascript page, and this is what is done. It is automatically uploaded to a webserver, producing this profile.

Under the hood, it started with a patch that makes it possible to specify the output filename for a profile run, and another one to clear up the fallout from the previous patch.

Then came the bigger part: setting up the Apache virtual host that serves the web files, including a restricted user that only allows up- and downloads via scp. Since the IRC bot can execute arbitrary code, it is very likely that an attacker can steal the private SSH keys used for authentication against the webserver. So it is essential that if those keys are stolen, the attacker can't do much more than uploading more files.

I used rssh for this. It is the login shell for the upload user, and configured to only allow scp. Since I didn't want the attacker to be able to modify the authorized_keys file, I configured rssh to use a chroot below the home directory (which sadly in turn requires a setuid-root wrapper around chroot, because ordinary users can't execute it. Well, nothing is perfect).

Some more patching and debugging later, the bot was ready.

The whole thing feels a bit bolted on; if usage warrants it, I'll see if I can make the code a bit prettier.

[/perl-6] Permanent link

comments / trackbacks

Fri, 06 Feb 2015

doc.perl6.org: some stats, future directions


Permanent link

In June 2012 I started the perl6/doc repository with the intent to collect/write API documentation for Perl 6 built-in types and routines. Not long afterwards, the website doc.perl6.org was born, generated from the aforementioned repository.

About 2.5 years later, the repository has seen more than one thousand commits from more than 40 contributors, 14 of which contributed ten patches or more. The documentation encompasses about 550 routines in 195 types, with 15 documents for other things than built-in types (for example an introduction to regexes, descriptions of how variables work).

In terms of subjective experience, I observed an increase in the number of questions on our IRC channel and otherwise that could be answered by pointing to the appropriate pages of doc.perl6.org, or augmenting the answer with a statement like "for more info, see ..."

While it's far from perfect, I think both the numbers and the experience is very encouraging, and I'd like to thank everybody who helped make that happen, often by contributing skills I'm not good at: front-end design, good English and gentle encouragement.

Plans for the Future

Being a community-driven project, I can't plan anybody else's time on it, so these are my own plans for the future of doc.perl6.org.

Infrastructural improvements

There are several unsolved problems with the web interface, with how we store our documents, and how information can be found. I plan to address them slowly but steadily.

  • The search is too much centered around types and routines, searching for variables, syntactic constructs and keywords isn't easily possible. I want it to find many more things than right now.
  • Currently we store the docs for each type in a separate file called Type.pod. Which will break when we start to document native types, which being with lower case letters. Having int.pod and Int.pod is completely unworkable on case-insensitive or case-preserving file system. I want to come up with a solution for that, though I don't yet know what it will look like.
  • doc.perl6.org is served from static pages, which leads to some problems with file names conflicting with UNIX conventions. You can't name a file infix:</>.html, and files with two consecutive dots in their names are also weird. So in the long run, we'll have to switch to some kind of dynamic URL dispatching, or a name escaping scheme that is capable of handling all of Perl 6's syntax.
  • Things like the list of methods and what they coerce to in class Cool don't show up in derived types; either the tooling needs to be improved for that, or they need to be rewritten to use the usual one-heading-per-method approach.

Content

Of course my plan is to improve coverage of the built-in types and routines, and add more examples. In addition, I want to improve and expand on the language documentation (for example syntax, OO, regexes, MOP), ideally documenting every Perl 6 feature.

Once the language features are covered in sufficient breadth and depth (though I won't wait for 100% coverage), I want to add three tutorial tracks:

  • A track for beginners
  • A quick-start for programmers from other languages
  • A series of intermediate to advanced guides covering topics such as parsing, how to structure a bigger application, the responsible use of meta programming, or reactive programming.

Of course I won't be able to do that all on my own, so I hope to convince my fellow and future contributors that those are good ideas.

Time to stop rambling about the future, and off to writing some docs, this is yours truly signing off.

[/perl-6] Permanent link

comments / trackbacks

Thu, 05 Feb 2015

All Perl 6 modules in a box


Permanent link

Sometimes when we change things in the Perl 6 language or the Rakudo Perl 6 compiler that implements it, we want to know if the planned changes will cause fallout in the library modules out there, and how much.

To get a quick estimate, we can now do a git grep in the experimental perl6-all-modules repository.

This is an attempt to get all the published module into a single git repository. It is built using git subrepo, an unofficial git extension module that I've been wanting to try for some time, and that seems to have some advantages over submodules in some cases. The notable one in this case being that git grep ignores submodules, but descends into subrepos just fine.

Here is the use case that made me create this repository: Rakudo accesses low-level operations through the nqp:: pseudo namespace. For example nqp::concat_s('a', 'b') is a low-level way to concatenate two strings. User-level programs can also use nqp:: ops, though it is generally a bad idea, because it ties the program to the particular compiler used, and what's more, the nqp:: ops are not part of the public API, and thus neither documented in the same place as the rest of Perl 6, nor are there any promises for stability attached.

So we want to require module authors to use a pragma, use nqp; in order to make their use of compiler internal explicit and deliberate. And of course, where possible, we want them to not use them at all :-)

To find out how many files in the ecosystem use nqp:: ops, a simple command, combined with the power of the standard UNIX tools, will help:

$ git grep -l 'nqp::'|wc -l
32

That's not too bad, considering we have... how many modules/distributions again?

Since they are added in author/repo structure, counting them with ls and wc isn't hard:

ls -1d */*/|wc -l
282

Ok, but number of files in relation to distributions isn't really useful. So let's ask: how many distributions directly use nqp:: ops?

$ git grep -l nqp:: | cut -d/ -f1,2 |sort -u|wc -l
23

23 out of 282 (or about 8%) distributions use the nqp:: syntax.

By the way, there is a tool (written in Perl 6, of course) to generate and update the repository. Not perfect yet, very much a work in progress. It's in the _tools folder, so you should probably filter out that directory in your queries (though in the examples above, it doesn't make a difference).

So, have fun with this new toy!

[/perl-6] Permanent link

comments / trackbacks

Thu, 01 Jan 2015

CPAN Pull Request Challenge: A call to the CPAN authors


Permanent link

The 2015 CPAN Pull Request Challenge is ramping up, and so far nearly two hundred volunteers have signed up, pledging to make one pull request for a CPAN distribution for each month of the year.

So here's a call to the all the CPAN authors: please be supportive, and if you don't like for your CPAN distributions to be part of the challenge, please send an email to neil at bowers dot com, stating your PAUSE ID and the fact that you want to be excluded.

How to be supportive? The first step is to act on pull requests. If you don't have time for a review, please say so; getting some response, even if it's "it'll be some time 'till I get around to reviewing this" is much better than none.

The volunteers have varied backgrounds; some are seasoned veterans, others are beginners who will make their first contribution to Open Source. So please be patient and encouraging.

If you have specific requirements for contributions, add a file called CONTRIBUTING or CONTRIBUTING.md to your github repositories where you state those requirements.

And of course, be civil. But that goes without saying, right? :-)

(And to those CPAN authors who haven't done it yet: put your distributions on github, so that you're not left out of the fun!

Happy New Year everybody, and have a great deal of fun!

See also: Resources for the CPAN Pull Request Challenge.

[/misc] Permanent link

comments / trackbacks

Wed, 10 Dec 2014

New Perl 6 community server now live, accepting signups


Permanent link

The new Perl 6 community server is now alive and kicking.

As planned, I've set up KVM virtualization, and so far there are two guest systems. hack.p6c.org is meant for general Perl 6 development activity (which also includes irssi/weechat sessions), and is equipped with 20GB RAM to handle multiple concurrent rakudo-jvm compilations :-). It runs a pretty bare-bones Debian Jessie.

Update: there is now a website for the new server.

www.p6c.org is the web server where I plan to host perl6.org and related (sub-)domains. It's not as beefy as hack, but sufficiently large to compile and run Rakudo, in preparation for future Perl 6-based web hosting. Currently I'm running a copy of several perl6.org subdomains on it (with the domain name p6c instead of perl6 for test purposes); the plan is to switch the perl6.org DNS over once all of the websites have been copied/migrated.

If you have a Perl 6 related use for a shell account or for serving websites, please request an account by email (moritz.lenz@gmail.com) or IRC (moritz on freenode and magnet), including:

  1. Your desired username
  2. What you want to do on the machine(s) (not necessary for #perl6 regulars)
  3. Which of the machine(s) you need access to
  4. Optionally an openssh public key
  5. Whether you'd be willing to help a bit with sysadmin tasks (mostly apt-get update && apt-get dist-upgrade, restarting hung services, killing huge processes)
  6. Software you need installed (it's OK to not know this up-front)

Note that feather.perl6.nl will shut down soon (no fixed date yet, but "end of 2014" is expected), so if you rely on feather now, you should consider migrating to the new server.

The code of conduct is pretty simple:

  1. Be reasonable in your resource usage.
  2. Use technical means to limit your resource usage so that it doesn't accidentally explode (ulimit comes to mind).
  3. Limit yourself to legal and Perl 6-related use cases (no warez).
  4. Help your fellow hackers.

The standard disclaimer applies:

  • Expect no privacy. There will potentially be many root users, who could all read your files and memory.
  • There are no promises of continued service or even support. Your account can be terminated without notice.
  • Place of jurisdiction in Nürnberg, Germany. You have to comply with German law while using the server. (Note that this puts pretty high standards on privacy for any user data you collect, including from web applications). It's your duty to inform yourself about the applicable laws. Illegal activities will be reported to the authorities.

With all that said, happy hacking!.

[/perl-6] Permanent link

comments / trackbacks

Tue, 02 Dec 2014

A new Perl 6 community server - update


Permanent link

In my previous post I announced my plans for a new Perl 6 community server (successor to feather.perl6.nl), and now I'd like to share some updates.

Thanks to the generosity of the Perl 6 community, the server has been ordered and paid. I am now in the process of contacting those donors who haven't paid yet, leaving them the choice to re-purpose their pledge to ongoing costs (traffic, public IPv4 addresses, domain(s), SSL certs if necessary) and maintenance, or withdraw their pledges.

Some details of the hardware we'll get:

  • CPU: Intel® Xeon® Haswell-EP Series Processor E5-2620 v3, 2.40 GHz, 6-Core Socket 2011-3, 15MB Cache
  • RAM: 4x8GB DDR4 DDR4 PC2133 Reg. ECC 2R
  • HD: 2x 2TB SATA3-HD

The vendor has told me that all parts have arrived, and will be assembled today or tomorrow.

Currently I lean towards using KVM to create three virtual hosts: one for websites (*.perl6.org, perlcabal.syn), one for general hacking and IRC activity, and one for high-risk stuff (evalbots, try.rakudo.org, ...).

I've secured the domain p6c.org (for "perl 6 community"), and the IPv4 range 213.95.82.52 - 213.95.82.62 and the IPv6 net 2001:780:101:ff00::/64.

So the infrastructure is in place, now I'm waiting for the delivery of the hardware.

[/perl-6] Permanent link

comments / trackbacks

Wed, 05 Nov 2014

A new Perl 6 community server - call for funding


Permanent link

So far, many Perl 6 developers have used feather as a generic development server. Juerd, who has genereously provided this server for us for free for many years, has announced that it will be shut down at the end of the year.

My daytime job is at a b2b IT outsourcing and hosting company called noris network, and they have agreed to sponsor the hosting/housing of a 1U 19" server in one of their state-of-the-art data centers in Nürnberg, Germany.

What's missing is the actual hardware. Some folks in the community have already agreed to participate in funding the hardware, though I have few concrete pledges.

So here is the call to action: If you want to help the Perl 6 community with a one-time donation towards a new community server, please send me an e-mail to moritz at faui2k3 dot org, specifying the amount you're willing do pledge, and whether you want to stay private as a donor. I accept money transfer by paypal and wire transfer (SWIFT). Direct hardware donations are also welcome. (Though actual money will be deferred until the final decision what hardware to buy, and thus the total amount required).

How much do we need?

Decent, used 1U servers seem to start at about 250€, though 350€ would get us a lot more bang (mostly RAM and hard disk space). And in general, the more the merrier. (Cheaper offers exist, for example on ebay, but usually they are without hard disks, so the need for extra drives makes them more expensive in total).

With more money, even beefier hardware and/or spare parts and/or a maintainance contract and/new hardware would be an option.

What do we need it for?

The main tasks for the server are:

  • Hosting websites like perl6.org and the synopses
  • Hosting infrastructure like the panda metadata server
  • Be available for smoke runs of the compilers, star distributions and module ecosystem.
  • Be available as a general development machine for people who don't have linux available and/or not enough resources to build some Perl 6 compilers on their own machines comfortably.
  • A place for IRC sessions for community memebers
  • A backup location for community services like the IRC logs, the camelia IRC eval bot etc. Those resources are currently hosted elswewhere, though having another option for hosting would be very valuable.
  • A webspace for people who want to host Perl 6-related material.
  • It is explicitly not meant as a general hosting platform, nor as a mail server.

    Configuration

    If the hardware we get is beefy enough, I'd like to virtualize the server into two to three components. One for hosting the perl6.org and related websites that should be rather stable, and one for the rest of the system. If resources allow it, and depending on feedback I get, maybe a third virtual system for high-risk stuff like evalbot.

    As operating system I'll install Debian Jessie (the current testing), simply because I'll end up maintaing the system, and it's the system I'm most familiar with.

[/perl-6] Permanent link

comments / trackbacks

Sat, 15 Feb 2014

The Fun of Running a Public Web Service, and Session Storage


Permanent link

One of my websites, Sudokugarden, recently surged in traffic, from about 30k visitors per month to more than 100k visitors per month. Here's the tale of what that meant for the server side.

As a bit of background, I built the website in 2007, when I knew a lot less about the web and programming. It runs on a host that I share with a few friends; I don't have root access on that machine, though when the admin is available, I can generally ask him to install stuff for me.

Most parts of the websites are built as static HTML files, with Server Side Includes. Parts of those SSIs are Perl CGI scripts. The most popular part though, which allows you to solve Sudoku in the browser and keeps hiscores, is written as a collection of Perl scripts, backed by a mysql database.

When at peak times the site had more than 10k visitors a day, lots of visitors would get a nasty mysql: Cannot connect: Too many open connections error. The admin wasn't available for bumping the connection limit, so I looked for other solutions.

My first action was to check the logs for spammers and crawlers that might hammered the page, and I found and banned some; but the bulk of the traffic looked completely legitimate, and the problem persisted.

Looking at the seven year old code, I realized that most pages didn't actually need a database connection, if only I could remove the session storage from the database. And, in fact, I could. I used CGI::Session, which has pluggable backend. Switching to a file-based session backend was just a matter of changing the connection string and adding a directory for session storage. Luckily the code was clean enough that this only affected a single subroutine. Everything was fine.

For a while.

Then, about a month later, the host ran out of free disk space. Since it is used for other stuff too (like email, and web hosting for other users) it took me a while to make the connection to the file-based session storage. What happened was 3 million session files on a ext3 file system with a block size of 4 kilobyte. A session is only about 400 byte, but since a file uses up a multiple of the block size, the session storage amounted to 12 gigabyte of used-up disk space, which was all that was left on that machine.

Deleting those sessions turned out to be a problem; I could only log in as my own user, which doesn't have write access to the session files (which are owned by www-data, the Apache user). The solution was to upload a CGI script that deleted the session, but of course that wasn't possible at first, because the disk was full. In the end I had to delete several gigabyte of data from my home directory before I could upload anything again. (Processes running as root were still writing to reserved-to-root portions of the file system, which is why I had to delete so much data before I was able to write again).

Even when I was able to upload the deletion script, it took quite some time to actually delete the session files; mostly because the directory was too large, and deleting files on ext3 is slow. When the files were gone, the empty session directory still used up 200MB of disk space, because the directory index doesn't shrink on file deletion.

Clearly a better solution to session storage was needed. But first I investigated where all those sessions came from, and banned a few spamming IPs. I also changed the code to only create sessions when somebody logs in, not give every visitor a session from the start.

My next attempt was to write the sessions to an SQLite database. It uses about 400 bytes per session (plus a fixed overhead for the db file itself), so it uses only a tenth of storage space that the file-based storage used. The SQLite database has no connection limit, though the old-ish version that was installed on the server doesn't seem to have very fine-grained locking either; within a few days I could errors that the session database was locked.

So I added another layer of workaround: creating a separate session database per leading IP octet. So now there are up to 255 separate session database (plus a 256th for all IPv6 addresses; a decision that will have to be revised when IPv6 usage rises). After a few days of operation, it seems that this setup works well enough. But suspicious as I am, I'll continue monitoring both disk usage and errors from Apache.

So, what happens if this solution fails to work out? I can see basically two approaches: move the site to a server that's fully under my control, and use redis or memcached for session storage; or implement sessions with signed cookies that are stored purely on the client side.

[/misc] Permanent link

comments / trackbacks

Sat, 19 Oct 2013

A small regex optimization for NQP and Rakudo


Permanent link

Recently I read the course material of the Rakudo and NQP Internals Workshop, and had an idea for a small optimization for the regex engine. Yesterday night I implemented it, and I'd like to walk you through the process.

As a bit of background, the regex engine that Rakudo uses is actually implemented in NQP, and used by NQP too. The code I am about to discuss all lives in the NQP repository, but Rakudo profits from it too.

In addition one should note that the regex engine is mostly used for parsing grammar, a process which involves nearly no scanning. Scanning is the process where the regex engine first tries to match the regex at the start of the string, and if it fails there, moves to the second character in the string, tries again etc. until it succeeds.

But regexes that users write often involve scanning, and so my idea was to speed up regexes that scan, and where the first thing in the regex is a literal. In this case it makes sense to find possible start positions with a fast string search algorithm, for example the Boyer-Moore algorithm. The virtual machine backends for NQP already implement that as the index opcode, which can be invoked as start = index haystack, needle, startpos, where the string haystack is searched for the substring needle, starting from position startpos.

From reading the course material I knew I had to search for a regex type called scan, so that's what I did:

$ git grep --word scan
3rdparty/libtommath/bn_error.c:   /* scan the lookup table for the given message
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* scan lower digits until non-zero */
3rdparty/libtommath/bn_mp_cnt_lsb.c:   /* now scan this digit until a 1 is found
3rdparty/libtommath/bn_mp_prime_next_prime.c:                   /* scan upwards 
3rdparty/libtommath/changes.txt:       -- Started the Depends framework, wrote d
src/QRegex/P5Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/QRegex/P6Regex/Actions.nqp:                     QAST::Regex.new( :rxtype<sca
src/vm/jvm/QAST/Compiler.nqp:    method scan($node) {
src/vm/moar/QAST/QASTRegexCompilerMAST.nqp:    method scan($node) {
Binary file src/vm/moar/stage0/NQPP6QRegexMoar.moarvm matches
Binary file src/vm/moar/stage0/QASTMoar.moarvm matches
src/vm/parrot/QAST/Compiler.nqp:    method scan($node) {
src/vm/parrot/stage0/P6QRegex-s0.pir:    $P5025 = $P5024."new"("scan" :named("rx
src/vm/parrot/stage0/QAST-s0.pir:.sub "scan" :subid("cuid_135_1381944260.6802") 
src/vm/parrot/stage0/QAST-s0.pir:    push $P5004, "scan"

The binary files and .pir files are generated code included just for bootstrapping, and not interesting for us. The files in 3rdparty/libtommath are there for bigint handling, thus not interesting for us either. The rest are good matches: src/QRegex/P6Regex/Actions.nqp is responsible for compiling Perl 6 regexes to an abstract syntax tree (AST), and src/vm/parrot/QAST/Compiler.nqp compiles that AST down to PIR, the assembly language that the Parrot Virtual Machine understands.

So, looking at src/QRegex/P6Regex/Actions.nqp the place that mentions scan looked like this:

    $block<orig_qast> := $qast;
    $qast := QAST::Regex.new( :rxtype<concat>,
                 QAST::Regex.new( :rxtype<scan> ),
                 $qast,
                 ($anon
                      ?? QAST::Regex.new( :rxtype<pass> )
                      !! (nqp::substr(%*RX<name>, 0, 12) ne '!!LATENAME!!'
                            ?? QAST::Regex.new( :rxtype<pass>, :name(%*RX<name>) )
                            !! QAST::Regex.new( :rxtype<pass>,
                                   QAST::Var.new(
                                       :name(nqp::substr(%*RX<name>, 12)),
                                       :scope('lexical')
                                   ) 
                               )
                          )));

So to make the regex scan, the AST (in $qast) is wrapped in QAST::Regex.new(:rxtype<concat>,QAST::Regex.new( :rxtype<scan> ), $qast, ...), plus some stuff I don't care about.

To make the optimization work, the scan node needs to know what to scan for, if the first thing in the regex is indeed a constant string, aka literal. If it is, $qast is either directly of rxtype literal, or a concat node where the first child is a literal. As a patch, it looks like this:

--- a/src/QRegex/P6Regex/Actions.nqp
+++ b/src/QRegex/P6Regex/Actions.nqp
@@ -667,9 +667,21 @@ class QRegex::P6Regex::Actions is HLL::Actions {
     self.store_regex_nfa($code_obj, $block, QRegex::NFA.new.addnode($qast))
     self.alt_nfas($code_obj, $block, $qast);
 
+    my $scan := QAST::Regex.new( :rxtype<scan> );
+    {
+        my $q := $qast;
+        if $q.rxtype eq 'concat' && $q[0] {
+            $q := $q[0]
+        }
+        if $q.rxtype eq 'literal' {
+            nqp::push($scan, $q[0]);
+            $scan.subtype($q.subtype);
+        }
+    }
+
     $block<orig_qast> := $qast;
     $qast := QAST::Regex.new( :rxtype<concat>,
-                 QAST::Regex.new( :rxtype<scan> ),
+                 $scan,
                  $qast,

Since concat nodes have always been empty so far, the code generators don't look at their child nodes, and adding one with nqp::push($scan, $q[0]); won't break anything on backends that don't support this optimization yet (which after just this patch were all of them). Running make test confirmed that.

My original patch did not contain the line $scan.subtype($q.subtype);, and later on some unit tests started to fail, because regex matches can be case insensitive, but the index op works only case sensitive. For case insensitive matches, the $q.subtype of the literal regex node would be ignorecase, so that information needs to be carried on to the code generation backend.

Once that part was in place, and some debug nqp::say() statements confirmed that it indeed worked, it was time to look at the code generation. For the parrot backend, it looked like this:

    method scan($node) {
        my $ops := self.post_new('Ops', :result(%*REG<cur>));
        my $prefix := self.unique('rxscan');
        my $looplabel := self.post_new('Label', :name($prefix ~ '_loop'));
        my $scanlabel := self.post_new('Label', :name($prefix ~ '_scan'));
        my $donelabel := self.post_new('Label', :name($prefix ~ '_done'));
        $ops.push_pirop('repr_get_attr_int', '$I11', 'self', %*REG<curclass>, '"$!from"');
        $ops.push_pirop('ne', '$I11', -1, $donelabel);
        $ops.push_pirop('goto', $scanlabel);
        $ops.push($looplabel);
        $ops.push_pirop('inc', %*REG<pos>);
        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
        $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
        $ops.push($scanlabel);
        self.regex_mark($ops, $looplabel, %*REG<pos>, 0);
        $ops.push($donelabel);
        $ops;
    }

While a bit intimidating at first, staring at it for a while quickly made clear what kind of code it emits. First three labels are generated, to which the code can jump with goto $label: One as a jump target for the loop that increments the cursor position ($looplabel), one for doing the regex match at that position ($scanlabel), and $donelabel for jumping to when the whole thing has finished.

Inside the loop there is an increment (inc) of the register the holds the current position (%*REG<pos>), that position is compared to the end-of-string position (%*REG<eos>), and if is larger, the cursor is marked as failed.

So the idea is to advance the position by one, and then instead of doing the regex match immediately, call the index op to find the next position where the regex might succeed:

--- a/src/vm/parrot/QAST/Compiler.nqp
+++ b/src/vm/parrot/QAST/Compiler.nqp
@@ -1564,7 +1564,13 @@ class QAST::Compiler is HLL::Compiler {
         $ops.push_pirop('goto', $scanlabel);
         $ops.push($looplabel);
         $ops.push_pirop('inc', %*REG<pos>);
-        $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        if nqp::elems($node.list) && $node.subtype ne 'ignorecase' {
+            $ops.push_pirop('index', %*REG<pos>, %*REG<tgt>, self.rxescape($node[0]), %*REG<pos>);
+            $ops.push_pirop('eq', %*REG<pos>, -1, %*REG<fail>);
+        }
+        else {
+            $ops.push_pirop('gt', %*REG<pos>, %*REG<eos>, %*REG<fail>);
+        }
         $ops.push_pirop('repr_bind_attr_int', %*REG<cur>, %*REG<curclass>, '"$!from"', %*REG<pos>);
         $ops.push($scanlabel);
         self.regex_mark($ops, $looplabel, %*REG<pos>, 0);

The index op returns -1 on failure, so the condition for a cursor fail are slightly different than before.

And as mentioned earlier, the optimization can only be safely done for matches that don't ignore case. Maybe with some additional effort that could be remedied, but it's not as simple as case-folding the target string, because some case folding operations can change the string length (for example ß becomes SS while uppercasing).

After successfully testing the patch, I came up with a small, artifical benchmark designed to show a difference in performance for this particular case. And indeed, it sped it up from 647 ± 28 µs to 161 ± 18 µs, which is roughly a factor of four.

You can see the whole thing as two commits on github.

What remains to do is implementing the same optimization on the JVM and MoarVM backends, and of course other optimizations. For example the Perl 5 regex engine keeps track of minimal and maximal string lengths for each subregex, and can anchor a regex like /a?b?longliteral/ to 0..2 characters before a match of longliteral, and generally use that meta information to fail faster.

But for now I am mostly encouraged that doing a worthwhile optimization was possible in a single evening without any black magic, or too intimate knowledge of the code generation.

Update: the code generation for MoarVM now also uses the index op. The logic is the same as for the parrot backend, the only difference is that the literal needs to be loaded into a register (whose name fresh_s returns) before index_s can use it.

[/perl-6] Permanent link

comments / trackbacks

Wed, 14 Aug 2013

YAPC Europe 2013 Day 3


Permanent link

The second day of YAPC Europe climaxed in the river boat cruise, Kiev's version of the traditional conference dinner. It was a largish boat traveling on the Dnipro river, with food, drinks and lots of Perl folks. Not having fixed tables, and having to get up to fetch food and drinks led to a lot of circulation, and thus meeting many more people than at traditionally dinners. I loved it.

Day 3 started with a video message from next year's YAPC Europe organizers, advertising for the upcoming conference and talking a bit about the oppurtunities that Sofia offers. Tempting :-).

Monitoring with Perl and Unix::Statgrab was more about the metrics that are available for monitoring, and less about doing stuff with Perl. I was a bit disappointed.

The "Future Perl Versioning" Discussion was a very civilized discussion, with solid arguments. Whether anybody changed their minds remain to be seen.

Carl Mäsak gave two great talks: one on reactive programming, and one on regular expressions. I learned quite a bit in the first one, and simply enjoyed the second one.

After the lunch (tasty again), I attended Jonathan Worthington's third talk, MoarVM: a metamodel-focused runtime for NQP and Rakudo. Again this was a great talk, based on great work done by Jonathan and others during the last 12 months or so. MoarVM is a virtual machine designed for Perl 6's needs, as we understand them now (as opposed to parrot, which was designed towards Perl 6 as it was understood around 2003 or so, which is considerably different).

How to speak manager was both amusing and offered a nice perspective on interactions between managers and programmers. Some of this advice assumed a non-tech-savy manager, and thus didn't quite apply to my current work situation, but was still interesting.

I must confess I don't remember too much of the rest of the talks that evening. I blame five days of traveling, hackathon and conference taking their toll on me.

The third session of lightning talks was again an interesting mix, containing interesting technical tidbits, the usual "we are hiring" slogans, some touching and thoughtful moments, and finally a song by Piers Cawley. He had written the lyrics in the previous 18 hours (including sleep), to (afaict) a traditional irish song. Standing up in front of ~300 people and singing a song that you haven't really had time to practise takes a huge amount of courage, and I admire Piers both for his courage and his great performance. I hope it was recorded, and makes it way to the public soon.

Finally the organizers spoke some closing words, and received their well-deserved share of applause.

As you might have guess from this and the previous blog posts, I enjoyed this year's YAPC Europe very much, and found it well worth attending, and well organized. I'd like to give my heart-felt thanks to everybody who helped to make it happen, and to my employer for sending me there.

This being only my second YAPC, I can't make any far-reaching comparisons, but compared to YAPC::EU 2010 in Pisa I had an easier time making acquaintances. I cannot tell what the big difference was, but the buffet-style dinners at the pre-conference meeting and the river boat cruise certainly helped to increase the circulation and thus the number of people I talked to.

[/perl-6] Permanent link

comments / trackbacks

Tue, 13 Aug 2013

YAPC Europe 2013 Day 2


Permanent link

The second day of YAPC Europe was enjoyable and informative.

I learned about ZeroMQ, which is a bit like sockets on steriods. Interesting stuff. Sadly Design decisions on p2 didn't quite qualify as interesting.

Matt's PSGI archive is a project to rewrite Matt's infamous script archive in modern Perl. Very promising, and a bit entertaining too.

Lunch was very tasty, more so than the usual mass catering. Kudos to the organizers!

After lunch, jnthn talked about concurrency, parallelism and asynchrony in Perl 6. It was a great talk, backed by great work on the compiler and runtime. Jonathans talk are always to be recommended.

I think I didn't screw up my own talk too badly, at least the timing worked fine. I just forgot to show the last slide. No real harm done.

I also enjoyed mst's State of the Velociraptor, which was a summary of what went on in the Perl world in the last year. (Much better than the YAPC::EU 2010 talk with the same title).

The Lightning talks were as enjoyable as those from the previous day. So all fine!

Next up is the river cruise, I hope to blog about that later on.

[/perl-6] Permanent link

comments / trackbacks

Mon, 12 Aug 2013

First day at YAPC::Europe 2013 in Kiev


Permanent link

Today was the first "real" day of YAPC Europe 2013 in Kiev. In the same sense that it was the first real day, we had quite a nice "unreal" conference day yesterday, with a day-long Perl 6 hackathon, and in the evening a pre-conference meeting a Sovjet-style restaurant with tasty food and beverages.

The talks started with a few words of welcome, and then the announcement that the YAPC Europe next year will be in Sofia, Bulgaria, with the small side note that there were actually three cities competing for that honour. Congratulations to Sofia!

Larry's traditional keynote was quite emotional, and he had to fight tears a few times. Having had cancer and related surgeries in the past year, he still does his perceived duty to the Perl community, which I greatly appreciate.

Afterwards Dave Cross talked about 25 years of Perl in 25 minutes, which was a nice walk through some significant developments in the Perl world, though a bit hasty. Maybe picking fewer events and spending a bit more time on the selected few would give a smoother experience.

Another excellent talk that ran out of time was on Redis. Having experimented a wee bit with Redis in the past month, this was a real eye-opener on the wealth of features we might have used for a project at work, but in the end we didn't. Maybe we will eventually revise that decision.

Ribasushi talked about how hard benchmarking really is, and while I was (in principle) aware of that fact that it's hard to get right, there were still several significant factors that I overlooked (like the CPU's tendency to scale frequency in response to thermal and power-management considerations). I also learned that I should use Dumbbench instead of the Benchmark.pm core module. Sadly it didn't install for me (Capture::Tiny tests failing on Mac OS X).

The Perl 6 is dead, long live Perl 5 talk was much less inflammatory than the title would suggest (maybe due to Larry touching on the subject briefly during the keynote). It was mostly about how Perl 5 is used in the presenter's company, which was mildly interesting.

After tasty free lunch I attended jnthn's talk on Rakudo on the JVM, which was (as is typical for jnthn's talk) both entertaining and taught me something, even though I had followed the project quite a bit.

Thomas Klausner's Bread::Board by example made me want to refactor the OTRS internals very badly, because it is full of the anti-patterns that Bread::Board can solve in a much better way. I think that the OTRS code base is big enough to warrant the usage of Bread::Board.

I enjoyed Denis' talk on Method::Signatures, and was delighted to see that most syntax is directly copied from Perl 6 signature syntax. Talk about Perl 6 sucking creativity out of Perl 5 development.

The conference ended with a session of lighning talks, something which I always enjoy. Many lightning talks had a slightly funny tone or undertone, while still talking about interesting stuff.

Finally there was the "kick-off party", beverages and snacks sponsored by booking.com. There (and really the whole day, and yesterday too) I not only had conversations with my "old" Perl 6 friends, but also talked with many interesting people I never met before, or only met online before.

So all in all it was a nice experience, both from the social side, and from quality and contents of the talks. Venue and food are good, and the wifi too, except when it stops working for a few minutes.

I'm looking forward to two more days of conference!

(Updated: Fixed Thomas' last name)

[/perl-6] Permanent link

comments / trackbacks

Wed, 17 Apr 2013

The REPL trick


Permanent link

A recent discussion on IRC prompted me to share a small but neat trick with you.

If there are things you want to do quite often in the Rakudo REPL (the interactive "Read-Evaluate-Print Loop"), it makes sense to create a shortcut for them. And creating shortcuts for often-used stuff is what programming languages excel at, so you do it right in Perl module:

use v6;
module REPLHelper;

sub p(Mu \x) is export {
    x.^mro.map: *.^name;
}

I have placed mine in $HOME/.perl6/repl.

And then you make sure it's loaded automatically:

$ alias p6repl="perl6 -I$HOME/.perl6/repl/ -MREPLHelper"
$ p6repl
> p Int
Int Cool Any Mu
>

Now you have a neat one-letter function which tells you the parents of an object or a type, in method resolution order. And a way to add more shortcuts when you need them.

[/perl-6] Permanent link

comments / trackbacks

Sun, 31 Mar 2013

Rakudo's Abstract Syntax Tree


Permanent link

After or while a compiler parses a program, the compiler usually translates the source code into a tree format called Abstract Syntax Tree, or AST for short.

The optimizer works on this program representation, and then the code generation stage turns it into a format that the platform underneath it can understand. Actually I wanted to write about the optimizer, but noticed that understanding the AST is crucial to understanding the optimizer, so let's talk about the AST first.

The Rakudo Perl 6 Compiler uses an AST format called QAST. QAST nodes derive from the common superclass QAST::Node, which sets up the basic structure of all QAST classes. Each QAST node has a list of child nodes, possibly a hash map for unstructured annotations, an attribute (confusingly) named node for storing the lower-level parse tree (which is used to extract line numbers and context), and a bit of extra infrastructure.

The most important node classes are the following:

QAST::Stmts
A list of statements. Each child of the node is considered a separate statement.
QAST::Op
A single operation that usually maps to a primitive operation of the underlying platform, like adding two integers, or calling a routine.
QAST::IVal, QAST::NVal, QAST::SVal
Those hold integer, float ("numeric") and string constants respectively.
QAST::WVal
Holds a reference to a more complex object (for example a class) which is serialized separately.
QAST::Block
A list of statements that introduces a separate lexical scope.
QAST::Var
A variable
QAST::Want
A node that can evaluate to different child nodes, depending on the context it is compiled it.

To give you a bit of a feel of how those node types interact, I want to give a few examples of Perl 6 examples, and what AST they could produce. (It turns out that Perl 6 is quite a complex language under the hood, and usually produces a more complicated AST than the obvious one; I'll ignore that for now, in order to introduce you to the basics.)

Ops and Constants

The expression 23 + 42 could, in the simplest case, produce this AST:

QAST::Op.new(
    :op('add'),
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Here an QAST::Op encodes a primitive operation, an addition of two numbers. The :op argument specifies which operation to use. The child nodes are two constants, both of type QAST::IVal, which hold the operands of the low-level operation add.

Now the low-level add operation is not polymorphic, it always adds two floating-point values, and the result is a floating-point value again. Since the arguments are integers and not floating point values, they are automatically converted to float first. That's not the desired semantics for Perl 6; actually the operator + is implemented as a subroutine of name &infix:<+>, so the real generated code is closer to

QAST::Op.new(
    :op('call'),
    :name('&infix:<+>'),    # name of the subroutine to call
    QAST::IVal.new(:value(23)),
    QAST::IVal.new(:value(42)),
);

Variables and Blocks

Using a variable is as simple as writing QAST::Var.new(:name('name-of-the-variable')), but it must be declared first. This is done with QAST::Var.new(:name('name-of-the-variable'), :decl('var'), :scope('lexical')).

But there is a slight caveat: in Perl 6 a variable is always scoped to a block. So while you can't ordinarily mention a variable prior to its declaration, there are indirect ways to achieve that (lookup by name, and eval(), to name just two).

So in Rakudo there is a convention to create QAST::Block nodes with two QAST::Stmts children. The first holds all the declarations, and the second all the actual code. That way all the declaration always come before the rest of the code.

So my $x = 42; say $x compiles to roughly this:

QAST::Block.new(
    QAST::Stmts.new(
        QAST::Var.new(:name('$x'), :decl('var'), :scope('lexical')),
    ),
    QAST::Stmts.new(
        QAST::Op.new(
            :op('p6store'),
            QAST::Var.new(:name('$x')),
            QAST::IVal.new(:value(42)),
        ),
        QAST::Op.new(
            :op('call'),
            :name('&say'),
            QAST::Var.new(:name('$x')),
        ),
    ),
);

Polymorphism and QAST::Want

Perl 6 distinguishes between native types and reference types. Native types are closer to the machine, and their type name is always lower case in Perl 6.

Integer literals are polymorphic in that they can be either a native int or a "boxed" reference type Int.

To model this in the AST, QAST::Want nodes can contain multiple child nodes. The compile-time context decides which of those is acutally used.

So the integer literal 42 actually produces not just a simple QAST::IVal node but rather this:

QAST::Want.new(
    QAST::WVal(Int.new(42)),
    'Ii',
    QAST::Ival(42),
)

(Note that Int.new(42) is just a nice notation to indicate a boxed integer object; it doesn't quite work like this in the code that translate Perl 6 source code into ASTs).

The first child of a QAST::Want node is the one used by default, if no other alternative matches. The comes a list where the elements with odd indexes are format specifications (here Ii for integers) and the elements at even-side indexes are the AST to use in that case.

An interesting format specification is 'v' for void context, which is always chosen when the return value from the current expression isn't used at all. In Perl 6 this is used to eagerly evaluate lazy lists that are used in void context, and for several optimizations.

[/perl-6] Permanent link

comments / trackbacks

Tue, 12 Feb 2013

Pattern Matching and Unpacking


Permanent link

When talking about pattern matching in the context of Perl 6, people usually think about regex or grammars. Those are indeed very powerful tools for pattern matching, but not the only one.

Another powerful tool for pattern matching and for unpacking data structures uses signatures.

Signatures are "just" argument lists:

sub repeat(Str $s, Int $count) {
    #     ^^^^^^^^^^^^^^^^^^^^  the signature
    # $s and $count are the parameters
    return $s x $count
}

Nearly all modern programming languages have signatures, so you might say: nothing special, move along. But there are two features that make them more useful than signatures in other languages.

The first is multi dispatch, which allows you to write several routines with the name, but with different signatures. While extremely powerful and helpful, I don't want to dwell on them. Look at Chapter 6 of the "Using Perl 6" book for more details.

The second feature is sub-signatures. It allows you to write a signature for a sigle parameter.

Which sounds pretty boring at first, but for example it allows you to do declarative validation of data structures. Perl 6 has no built-in type for an array where each slot must be of a specific but different type. But you can still check for that in a sub-signature

sub f(@array [Int, Str]) {
    say @array.join: ', ';
}
f [42, 'str'];      # 42, str
f [42, 23];         # Nominal type check failed for parameter '';
                    # expected Str but got Int instead in sub-signature
                    # of parameter @array

Here we have a parameter called @array, and it is followed by a square brackets, which introduce a sub-signature for an array. When calling the function, the array is checked against the signature (Int, Str), and so if the array doesn't contain of exactly one Int and one Str in this order, a type error is thrown.

The same mechanism can be used not only for validation, but also for unpacking, which means extracting some parts of the data structure. This simply works by using variables in the inner signature:

sub head(*@ [$head, *@]) {
    $head;
}
sub tail(*@ [$, *@tail]) {
    @tail;
}
say head <a b c >;      # a
say tail <a b c >;      # b c

Here the outer parameter is anonymous (the @), though it's entirely possible to use variables for both the inner and the outer parameter.

The anonymous parameter can even be omitted, and you can write sub tail( [$, *@tail] ) directly.

Sub-signatures are not limited to arrays. For working on arbitrary objects, you surround them with parenthesis instead of brackets, and use named parameters inside:

multi key-type ($ (Numeric :$key, *%)) { "Number" }
multi key-type ($ (Str     :$key, *%)) { "String" }
for (42 => 'a', 'b' => 42) -> $pair {
    say key-type $pair;
}
# Output:
# Number
# String

This works because the => constructs a Pair, which has a key and a value attribute. The named parameter :$key in the sub-signature extracts the attribute key.

You can build quite impressive things with this feature, for example red-black tree balancing based on multi dispatch and signature unpacking. (More verbose explanation of the code.) Most use cases aren't this impressive, but still it is very useful to have occasionally. Like for this small evaluator.

[/perl-6] Permanent link

comments / trackbacks