Sun, 05 Feb 2017
Perl 6 By Example: Improved INI Parsing with Grammars
This blog post is part of my ongoing project to write a book about Perl 6.
If you're interested, either in this book project or any other Perl 6 book news, please sign up for the mailing list at the bottom of the article, or here. It will be low volume (less than an email per month, on average).
Last week we saw a collection of regexes that can parse a configuration file in the INI format that's popular in the world of Microsoft Windows applications.
Here we'll explore grammars, a feature that groups regexes into a class-like structure, and how to extract structured data from a successful match.
Grammars
A grammar is a class with some extra features that make it suitable for parsing text. Along with methods and attributes, you can put regexes into a grammar.
This is what the INI file parser looks like when formulated as a grammar:
grammar IniFile {
    token key     { \w+ }
    token value   { <!before \s> <-[\n;]>+ <!after \s> }
    token pair    { <key> \h* '=' \h* <value> \n+ }
    token header  { '[' <-[ \[ \] \n ]>+ ']' \n+ }
    token comment { ';' \N*\n+ }
    token block   { [<pair> | <comment>]* }
    token section { <header> <block> }
    token TOP     { <block> <section>* }
}
You can use it to parse some text by calling the parse method, which uses regex or token TOP as the entry point:
my $result = IniFile.parse($text);
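To try this out, here is a minimal, self-contained sketch that repeats the grammar and feeds it two inputs. The sample strings and the variable name $bad are made up for illustration. A successful parse gives back a Match object, which is true in boolean context, while a failed parse gives back a false value.

```raku
grammar IniFile {
    token key     { \w+ }
    token value   { <!before \s> <-[\n;]>+ <!after \s> }
    token pair    { <key> \h* '=' \h* <value> \n+ }
    token header  { '[' <-[ \[ \] \n ]>+ ']' \n+ }
    token comment { ';' \N*\n+ }
    token block   { [<pair> | <comment>]* }
    token section { <header> <block> }
    token TOP     { <block> <section>* }
}

# Hypothetical sample input: one top-level pair and one section.
my $text   = "key1=value1\n[section1]\nkey2 = value2\n";
my $result = IniFile.parse($text);
say $result ?? 'valid INI file' !! 'not a valid INI file';

# .parse only succeeds if TOP matches the whole string,
# so text that merely starts like an INI file is rejected.
my $bad = IniFile.parse("not an ini file");
say $bad ?? 'valid INI file' !! 'not a valid INI file';
```

Note that .parse anchors the match at both ends of the input. That is why the second call fails even though token TOP could match the empty string at the start.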
Besides the standardized entry point, a grammar offers more advantages. You can inherit from it like from a normal class, thus bringing even more reusability to regexes. You can group extra functionality together with the regexes by adding methods to the grammar. And then there are some mechanisms in grammars that can make your life as a developer easier.
One of them is dealing with whitespace. In INI files, horizontal whitespace is generally considered to be insignificant, in that key=value and key = value lead to the same configuration of the application. So far we've dealt with that explicitly by adding \h* to token pair. But there are places we haven't actually considered. For example, it's OK to have a comment that's not at the start of a line.
The mechanism that grammars offer is that you can define a rule called ws, and when you declare a token with rule instead of token (or enable this feature in regex through the :sigspace modifier), Perl 6 inserts implicit <ws> calls for you where there is whitespace in the regex definition:
grammar IniFile {
    token ws { \h* }
    rule pair { <key> '=' <value> \n+ }
    # rest as before
}
This might not be worth the effort for a single rule that needs to parse whitespace, but when there are more, this really pays off by keeping whitespace parsing in a single place.
Note that you should only parse insignificant whitespace in token ws. For example, for INI files, newlines are significant, so ws shouldn't match them.
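A small sketch can show the effect of the custom ws token. The grammar name IniPair and the sample lines are made up for this illustration, and it contains only the pieces needed to parse one pair. All three spellings of the same line should give the same key and value.

```raku
grammar IniPair {
    token ws    { \h* }                        # horizontal whitespace only
    token key   { \w+ }
    token value { <!before \s> <-[\n;]>+ <!after \s> }
    rule  pair  { <key> '=' <value> \n+ }      # implicit <.ws> between the atoms
}

my @results;
for "key=value\n", "key = value\n", "key   =value\n" -> $line {
    my $m = IniPair.parse($line, :rule<pair>);
    if $m {
        say 'key: ', $m<key>.Str, ', value: ', $m<value>.Str;
        @results.push: $m<key>.Str ~ '|' ~ $m<value>.Str;
    }
    else {
        say 'no match';
    }
}
```

Because ws only matches \h*, the newline at the end of each line still has to be matched explicitly by \n+ in rule pair.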
Extracting Data from the Match
So far the IniFile grammar only checks whether a given input matches the grammar or not. But when it does match, we really want the result of the parse in a data structure that's easy to use. For example we could translate this example INI file:
key1=value2
[section1]
key2=value2
key3 = with spaces
; comment lines start with a semicolon, and are
; ignored by the parser
[section2]
more=stuff
Into this data structure of nested hashes:
{
    _ => {
        key1 => "value2"
    },
    section1 => {
        key2 => "value2",
        key3 => "with spaces"
    },
    section2 => {
        more => "stuff"
    }
}
Key-value pairs from outside of any section show up in the _ top-level key.
The result from the IniFile.parse call is a Match object that has (nearly) all the information necessary to extract the desired data. If you turn a Match object into a string, it becomes the matched string. But there's more. You can use it like a hash to extract the matches from named submatches. For example, if the top-level match from token TOP { <block> <section>* } produces a Match object $m, then $m<block> is again a Match object, this one from the match of the call of token block. And $m<section> is a list of Match objects from the repeated calls to token section. So a Match is really a tree of matches.
We can walk this data structure to extract the nested hashes.
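As a rough illustration of what such a walk looks like, the following sketch parses a tiny, made-up input and inspects the named submatches. The input string and the variable names are assumptions for this example; the grammar is the one shown earlier.

```raku
grammar IniFile {
    token key     { \w+ }
    token value   { <!before \s> <-[\n;]>+ <!after \s> }
    token pair    { <key> \h* '=' \h* <value> \n+ }
    token header  { '[' <-[ \[ \] \n ]>+ ']' \n+ }
    token comment { ';' \N*\n+ }
    token block   { [<pair> | <comment>]* }
    token section { <header> <block> }
    token TOP     { <block> <section>* }
}

my $m = IniFile.parse("a=1\n[s1]\nb=2\nc=3\n[s2]\nd=4\n");

# TOP calls <block> once and <section> repeatedly, so the first
# is a single Match and the second is a list of Match objects.
say 'leading block: ', $m<block>.Str.trim;
say 'number of sections: ', $m<section>.elems;

# Each section Match has its own <header> and <block> submatches.
for $m<section>.list -> $section {
    my @keys = $section<block><pair>.map({ .<key>.Str });
    say $section<header>.Str.trim, ' has keys: ', @keys.join(', ');
}
```

At this point the header submatch still includes the brackets and the trailing newline.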
Token header matches a string like "[section1]\n", and we're only interested in "section1". To get to the inner part, we can modify token header by inserting a pair of round parentheses around the subregex whose match we're interested in:
token header { '[' ( <-[ \[ \] \n ]>+ ) ']' \n+ }
#                  ^^^^^^^^^^^^^^^^^^^^ a capturing group
That's a capturing group, and we can get its match by using the top-level match for header as an array, and accessing its first element. This leads us to the full INI parser:
sub parse-ini(Str $input) {
    my $m = IniFile.parse($input);
    unless $m {
        die "The input is not a valid INI file.";
    }
    # Turn the <pair> submatches of a block into a hash of strings.
    sub hash-from-block(Match $m) {
        my %result;
        for $m<block><pair> -> $pair {
            %result{ $pair<key>.Str } = $pair<value>.Str;
        }
        return %result;
    }
    my %result;
    %result<_> = hash-from-block($m);
    for $m<section> -> $section {
        %result{ $section<header>[0].Str } = hash-from-block($section);
    }
    return %result;
}
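To make the paths into the match tree that this code depends on more visible, here is a small sketch that reaches for a section name and a key directly. The input string and the variable names are made up for illustration; the grammar is the one from above with the capturing group added to token header.

```raku
grammar IniFile {
    token key     { \w+ }
    token value   { <!before \s> <-[\n;]>+ <!after \s> }
    token pair    { <key> \h* '=' \h* <value> \n+ }
    token header  { '[' ( <-[ \[ \] \n ]>+ ) ']' \n+ }
    token comment { ';' \N*\n+ }
    token block   { [<pair> | <comment>]* }
    token section { <header> <block> }
    token TOP     { <block> <section>* }
}

my $m = IniFile.parse("a=1\n[s1]\nb=2\n");

# The whole header match still includes brackets and newline ...
my $raw-header = $m<section>[0]<header>.Str;
# ... while the first positional capture holds just the name.
my $section-name = $m<section>[0]<header>[0].Str;
# Reaching a key means knowing the path TOP -> section -> block -> pair -> key.
my $first-key = $m<section>[0]<block><pair>[0]<key>.Str;

say 'section name: ', $section-name;
say 'first key in that section: ', $first-key;
```

Every subscript in those chains mirrors one level of the grammar, which is exactly the coupling between grammar structure and extraction code that the top-down approach brings with it.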
This top-down approach works, but it requires a very intimate understanding of the grammar's structure. That means that if you change the structure during maintenance, you'll have a hard time figuring out how to change the data extraction code.
So Perl 6 offers a bottom-up approach as well. It allows you to write a data extraction or action method for each regex, token or rule. The grammar engine passes in the match object as the single argument, and the action method can call the routine make to attach a result to the match object. The result is available through the .made method on the match object.
This execution of action methods happens as soon as a regex matches successfully, which means that an action method for a regex can rely on the fact that the action methods for subregex calls have already run. For example when the rule pair { <key> '=' <value> \n+ } is being executed, first token key matches successfully, and its action method runs immediately afterwards. Then token value matches, and its action method runs too. Then finally rule pair itself can match successfully, so its action method can rely on $m<key>.made and $m<value>.made being available, assuming that the match result is stored in variable $m.
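The order of execution can be made visible with a small tracing sketch. The names PairGrammar and TraceActions and the order attribute are made up for this illustration. The action methods here take the match as $m and use the method form $m.make(...) to attach the result.

```raku
grammar PairGrammar {
    token ws    { \h* }
    token key   { \w+ }
    token value { <!before \s> <-[\n;]>+ <!after \s> }
    rule  pair  { <key> '=' <value> \n+ }
}

# Hypothetical action class that records which method ran when.
class TraceActions {
    has @.order;
    method key($m)   { @!order.push('key');   $m.make($m.Str) }
    method value($m) { @!order.push('value'); $m.make($m.Str) }
    method pair($m)  {
        @!order.push('pair');
        # The submatches already carry their results at this point.
        $m.make($m<key>.made => $m<value>.made);
    }
}

my $actions = TraceActions.new;
my $m = PairGrammar.parse("k = v\n", :rule<pair>, :$actions);

say $actions.order.join(', ');
say $m.made.perl;
```

There is no action method for ws in this class. The grammar engine only calls action methods that exist, so regexes without one are simply skipped.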
Speaking of variables, a regex match implicitly stores its result in the special variable $/, and it is customary to use $/ as parameter in action methods. And there is a shortcut for accessing named submatches: instead of writing $/<key>, you can write $<key>. With this convention in mind, the action class becomes:
class IniFile::Actions {
    method key($/)     { make $/.Str }
    method value($/)   { make $/.Str }
    method header($/)  { make $/[0].Str }
    method pair($/)    { make $<key>.made => $<value>.made }
    method block($/)   { make $<pair>.map({ .made }).hash }
    method section($/) { make $<header>.made => $<block>.made }
    method TOP($/)     {
        make {
            _ => $<block>.made,
            $<section>.map: { .made },
        }
    }
}
The first two action methods are really simple. The result of a key or value match is simply the string that matched. For a header, it's just the substring inside the brackets. Fittingly, a pair returns a Pair object, composed from key and value.
Method block constructs a hash from all the lines in the block by iterating over each pair submatch, extracting the already attached Pair object.
One level above that in the match tree, section takes that hash and pairs it with the name of the section, extracted from $<header>.made. Finally the top-level action method gathers the sectionless key-value pairs under the key _ as well as all the sections, and returns them in a hash.
In each method of the action class, we only rely on the knowledge of the first level of regexes called directly from the regex that corresponds to the action method, and the data types that they .made. Thus when you refactor one regex, you only have to change the corresponding action method. Nobody needs to be aware of the global structure of the grammar.
Now we just have to tell Perl 6 to actually use the action class:
sub parse-ini(Str $input) {
    my $m = IniFile.parse($input, :actions(IniFile::Actions));
    unless $m {
        die "The input is not a valid INI file.";
    }
    return $m.made
}
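Putting the pieces together, the following sketch should turn the example INI file from above into the nested hashes shown earlier. It assumes the version of the grammar that uses the custom ws token, rule pair and the capturing group in token header. The action class and parse-ini are the ones from the text; the variables $ini and $config are made up for this example.

```raku
grammar IniFile {
    token ws      { \h* }
    token key     { \w+ }
    token value   { <!before \s> <-[\n;]>+ <!after \s> }
    rule  pair    { <key> '=' <value> \n+ }
    token header  { '[' ( <-[ \[ \] \n ]>+ ) ']' \n+ }
    token comment { ';' \N*\n+ }
    token block   { [<pair> | <comment>]* }
    token section { <header> <block> }
    token TOP     { <block> <section>* }
}

class IniFile::Actions {
    method key($/)     { make $/.Str }
    method value($/)   { make $/.Str }
    method header($/)  { make $/[0].Str }
    method pair($/)    { make $<key>.made => $<value>.made }
    method block($/)   { make $<pair>.map({ .made }).hash }
    method section($/) { make $<header>.made => $<block>.made }
    method TOP($/)     {
        make {
            _ => $<block>.made,
            $<section>.map: { .made },
        }
    }
}

sub parse-ini(Str $input) {
    my $m = IniFile.parse($input, :actions(IniFile::Actions));
    unless $m {
        die "The input is not a valid INI file.";
    }
    return $m.made
}

# The example INI file from above, built line by line.
my $ini = join "\n",
    'key1=value2',
    '[section1]',
    'key2=value2',
    'key3 = with spaces',
    '; comment lines start with a semicolon',
    '[section2]',
    'more=stuff',
    '';

my $config = parse-ini($ini);
say $config<_><key1>;
say $config<section1><key3>;
say $config.keys.sort.join(', ');
```

The calling code only ever sees plain hashes and strings. The Match objects stay an implementation detail of the grammar and its action class.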
If you want to start parsing with a different rule than TOP (which you might want to do in a test, for example), you can pass a named argument rule to method parse:
sub parse-ini(Str $input, :$rule = 'TOP') {
    my $m = IniFile.parse($input,
        :actions(IniFile::Actions),
        :$rule,
    );
    unless $m {
        die "The input is not a valid INI file.";
    }
    return $m.made
}
say parse-ini($ini).perl;
use Test;
is-deeply parse-ini("k = v\n", :rule<pair>), 'k' => 'v',
    'can parse a simple pair';
done-testing;
To better encapsulate all the parsing functionality within the grammar, we can turn parse-ini into a method:
grammar IniFile {
    # regexes/tokens unchanged as before
    method parse-ini(Str $input, :$rule = 'TOP') {
        my $m = self.parse($input,
            :actions(IniFile::Actions),
            :$rule,
        );
        unless $m {
            die "The input is not a valid INI file.";
        }
        return $m.made
    }
}
# Usage:
my $result = IniFile.parse-ini($text);
To make this work, the class IniFile::Actions either has to be declared before the grammar, or it needs to be pre-declared with class IniFile::Actions { ... } at the top of the file (with literal three dots to mark it as a forward declaration).
Summary
Match objects are really a tree of matches, with nodes for each named submatch and for each capturing group. Action methods make it easy to decouple parsing from data extraction.
Next we'll explore how to generate better error messages from a failed parse.