2023-07-09
Debugging Perl 6 books while learning
Spotted confusing errors from the "parsing" book
Errata
- (2023-07-09) The following factual errors are very unfortunate when I am trying to unravel this whitespace tangle.
If the predefined <ws> construct is equivalent in code to
regex ws { <!ww> \s* }
then clearly it matches zero or more whitespace characters, unless the position is within a word. The book's two fatal errors are the phrases "at least one" and "unless it's at a word boundary".
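To check that reading, here is a minimal sketch that probes the built-in <.ws> between two literal atoms. The test strings are my own picks, not from the book, and each pattern is anchored at both ends so that nothing is skipped by scanning.

```raku
# Probe the built-in <.ws> between two literal atoms.
# The sample strings are illustrative assumptions, not taken from the book.
say so 'ab'   ~~ / ^ a <.ws> b $ /;     # False: between two word characters, <!ww> fails
say so 'a b'  ~~ / ^ a <.ws> b $ /;     # True:  one whitespace character
say so 'a!'   ~~ / ^ a <.ws> '!' $ /;   # True:  zero whitespace, not within a word
say so 'a  !' ~~ / ^ a <.ws> '!' $ /;   # True:  several whitespace characters
```

The third case is the one that contradicts "at least one": <.ws> succeeds without consuming anything as long as the position is not inside a word.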
Even more shocking errors from the "learning" book
Errata
- (2023-07-09) After spotting two bad errors regarding <ws> in the book Parsing With Perl 6 Regexes and Grammars (Moritz Lenz, Apress, 2017), I turned to this book expecting clear guidance. Instead, I found what appear to be more confusing errors and terribly imprecise language.
First, <|wb>, according to the official documentation, is wrong. It should be <?wb> or the hard-to-understand <|w>.
Second, the comment "required between word characters, optional otherwise" is just plain wrong. It's confusing language. A word character should be [a-zA-Z0-9_] or any Unicode letter. If we take "Hello world!", "between word characters" therefore would mean any point where I added a ^:
H^e^l^l^o w^o^r^l^d!
Those points aren't places where whitespace is required. Quite the opposite.
What the author means to say is "between words" or, more verbosely, "between a word character and a non-word character." If we ignore both ends, there are exactly two points where <ws> can occur:
Hello^world^!
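A small sketch to list those points mechanically: it slides a two-character window over the string and keeps every interior position where <!ww> succeeds between the two characters. The variable names are mine, and testing pairs in isolation is an assumption that works here because <ww> only looks at the characters immediately on either side.

```raku
# List interior positions of the sample string that are NOT within a word.
# $s and @gaps are illustrative names, not from either book.
my $s = 'Hello world!';
my @gaps = (1 ..^ $s.chars).grep: -> $i {
    # Two-character window around position $i; <!ww> is tested between them.
    so $s.substr($i - 1, 2) ~~ / ^ . <!ww> . $ /
};
say @gaps;   # [5 6 11]
```

Positions 5 and 6 sit on either side of the single space, so they belong to the same whitespace run; position 11 is the gap between "d" and "!". None of the positions inside "Hello" or "world" qualify.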
As for the confusing writing, I'm seriously appalled, but one instance is forgivable. I will watch out for more.
Setting out to get to the bottom of whitespace
After the horrible findings above, I set out to gain absolute clarity.
Perl 6 Notes
Whitespace (ws)
token and rule differ from regex by adding ratcheting (non-backtracking). rule also inserts <.ws> ONLY where there is a literal space, specifically:
- After terms and other named things
- But not at the beginning of the rule #todo/verify
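A minimal sketch of those differences, using the same body under all three declarators. The names d_regex, d_token and d_rule are made up for this note. Every probe is anchored with ^ and $ so that scanning cannot hide a leading space, which is one way to check the #todo item above.

```raku
# Same body, three declarators (names are hypothetical, chosen for this sketch).
my regex d_regex { \d+ \d }   # backtracks: \d+ can give a digit back
my token d_token { \d+ \d }   # ratchets: \d+ keeps every digit it took
my rule  d_rule  { \d+ \d }   # ratchets, and spaces after atoms become <.ws>

say so '123'   ~~ / ^ <d_regex> $ /;   # True:  \d+ backtracks to leave '3' for \d
say so '123'   ~~ / ^ <d_token> $ /;   # False: no backtracking, \d finds nothing left
say so '123'   ~~ / ^ <d_rule>  $ /;   # False: ratcheting applies to rules too
say so '12 3'  ~~ / ^ <d_rule>  $ /;   # True:  the space after \d+ became <.ws>
say so ' 12 3' ~~ / ^ <d_rule>  $ /;   # False: the leading space in the rule body adds no <.ws>
```

The last line is why an unanchored match can appear to accept leading whitespace: the matcher simply starts later in the string, the rule itself does not consume the space.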
An over-the-top but super clarifying example
To absolutely understand ws handling, here's some actual code (fn1) to confirm what I suspect to be true.
my token tn {<alnum>}
my token t {\d \d}
my token t_anchored {^ \d \d $}
my token t_trailing_backslash_s {\d \d\s}
my rule r {\d \d} # two spaces
my rule r_nospace {\d\d}
my rule r_nospace_from_named {<tn><tn>}
my rule r_space_from_named {<tn> <tn>}
my rule r_training_space {\d \d } # two spaces
my rule r_training_backslash_s {\d \d\s} # two spaces
Output:
Each of these 16 cases has been confirmed.
「42」 matched token t {\d \d}
「 42」 matched token t {\d \d}
「 42」 does not match token t_anchored {^ \d \d $}
「4 2」 does not match token t {\d \d}
「42」 does not match token t_trailing_backslash_s {\d \d\s}
「42」 does not match rule r {\d \d}
「42」 matched rule r_nospace {\d\d}
「4 2」 does not match rule r_nospace {\d\d}
「4 2」 does not match rule r_nospace_from_named {<tn><tn>}
「42」 does not match rule r_space_from_named {<tn> <tn>}
「4 2」 matched rule r {\d \d}
「4 2」 matched rule r {\d \d}
「4 2」 matched rule r {\d \d}
「 4 2」 matched rule r {\d \d}
「 4 2」 matched rule r_training_space {\d \d }
「 4 2」 does not match rule r_training_backslash_s {\d \d\s}
fn1: I had to use the .gist method and the & method reference in &t.gist to print token t {\d \d}. I hope I'll learn an easier way to stringify a regex object.
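For reference, a harness along these lines could produce report lines in the format shown in the output. This is a reconstruction under assumptions, not the exact script I ran: the sub names verdict and report are hypothetical, and only two of the declarations are repeated here.

```raku
# Sketch of a reporting harness; verdict and report are hypothetical names.
my token t { \d \d }
my rule  r { \d \d }

# Smartmatching a string against a Regex object runs an unanchored match.
sub verdict(Str $input, Regex $re --> Str) {
    $input ~~ $re ?? 'matched' !! 'does not match'
}

# &t refers to the regex object itself; .gist stringifies it for the report line.
sub report(Str $input, Regex $re) {
    say "「$input」 {verdict($input, $re)} {$re.gist}";
}

report '42',  &t;
report '4 2', &t;
report '42',  &r;
report '4 2', &r;
```

Passing the regex as &t keeps it from being called at the call site, so the same object can be both matched against and printed.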