Posts in this category

Wed, 12 Oct 2011

The Three-Fold Function of the Smart Match Operator

Permanent link

In Perl 5, if you want to match a regex against a particular string, you write $string =~ $regex.

In the design process of Perl 6, people have realized that you cannot only match against regexes, but lots of other things can act as patterns too: types (checking type conformance), numbers, strings, junctions (composites of values), subroutine signatures and so on. So smart matching was born, and it's now written as $topic ~~ $pattern. Being a general comparison mechanism is the first function of the smart match operator.

But behold, there were problems. One of them was the perceived need for special syntactic forms on the right hand side of the smart match operator to cover some cases. Those were limited and hard to implement. There was also the fact that now we had two different ways to invoke regexes: smart matching, and direct invocation as m/.../, which matches against the topic variable $_. That wasn't really a problem as such, but it was an indicator of design smell.

And that's where the second function of the smart match operator originated: topicalization. Previously, $a ~~ $b mostly turned into a method call, $b.ACCEPTS($a). The new idea was to set the topic variable to $a in a small scope, which allowed many special cases to go away. It also nicely unified with given $topic { when $matcher { ... } }, which was already specified as being a topicalizer.

In the new model, MATCH ~~ PAT becomes something like do { $_ = MATCH; PAT.ACCEPTS($_) } -- which means that if MATCH accesses $_, it automatically does what the user wants.

Awesomeness reigned, and it worked out great.

Until the compiler writers actually started to implement a few more cases of regex matching. The first thing we noticed was that if $str ~~ $regex { ... } behaved quite unexpectedly. What happend was that $_ got set to $str, the match was conducted and returned a Match object. And then called $match.ACCEPTS($str), which failed. A quick hack around that was to modify Match.ACCEPTS to always return the invocant (ie the Match on which it was called), but of course that was only a stop gap solution.

The reason it doesn't work for other, more involved cases of regex invocations is that they don't fit into the "does $a match $b?" schema. Two examples:

# :g for "global", all matches
my @matches = $str ~~ m:g/pattern/; 

if $str ~~ s/pattern/substitution/ { ... }

People expect those to work. But global matching of a regex isn't a simple conformance check, and that is reflected in the return value: a list. So should we special-cases smart-matching against a list, just because we can't get global matching to work in smart-matching otherwise? (People have also proposed to return a kind of aggregate Match object instead of a list; that comes with the problem that Match objects aren't lazy, but lists are. You could "solve" that with a LazyMatch type; watch the pattern of workarounds unfold...)

A substitution is also not a simple matching operation. In Perl 5, a s/// returns the number of successful substitutions. In Perl 6, that wouldn't work with the current setup of the smart match operator, where it would then smart-match the string against the returned number of matches.

So to summarize, the smart match operator has three functions: comparing values to patterns, topicalization, and conducting regex matches.

These three functions are distinct enough to start to interact in weird ways, which limits the flexibility in choice of return values from regex matches and substitutions.

I don't know what the best way forward is. Maybe it is to reintroduce a dedicated operator for regex matching, which seems to be the main feature with which topicalization interacts badly. Maybe there are other good ideas out there. If so, I'd love to hear about them.

[/perl-6] Permanent link