(Near the end of this long post is a TL;DR section…)
In the last weeks I experimented with new ways to build parsers. You probably all know how to use basic parsers like int
, token
or spaces
, and how to combine them with oneOf
, andThen
or in parser pipelines. But I needed something else, and I found a trick which I want to share with you.
Use Case
Some time ago, on the Elm slack @SiriusStarr asked for a lookahead parser (a parser that checks whether a given parser succeeds or not, but without chomping any characters). I was interested, had time, and after many trials and errors I finally came up with this code:
import Parser exposing (Parser)
lookAhead : Parser a -> Parser ()
lookAhead parser =
Parser.oneOf
[ Parser.oneOf
[ parser
|> Parser.backtrackable
|> Parser.andThen (\_ -> Parser.commit ())
|> Parser.andThen (\_ -> Parser.problem "")
, Parser.succeed
(parser
|> Parser.backtrackable
|> Parser.map (\_ -> ())
)
]
|> Parser.backtrackable
, Parser.succeed (Parser.succeed ())
]
|> Parser.andThen identity
Here’s how it works in practice:
Parser.run (lookAhead Parser.int) "123"
--> Ok ()
Parser.run (lookAhead Parser.int) "abc"
--> Err [ { row = 1, col = 1, problem = Parser.ExpectingInt } ]
If the int
parser succeeds, so does the lookAhead
parser, and if the int
parser fails, the lookAhead
parser fails, too.
But if the parser succeeds, it doesn’t chomp any characters, as desired:
Parser.run (lookAhead Parser.int |> Parser.getChompedString) "123"
--> Ok ""
While the code works, it has its problems:
-
The implementation is hard to understand. From just looking at the code, even I as the author cannot see easily which role each part plays and how they all work together to achieve the desired effect.
-
It’s not possible to return the parsed value (changing the signature to
lookAhead : Parser a -> Parser a
). In order to move the parser back to the beginning after a successful parse, I have to use theproblem
function to turn the success into a failure, thereby loosing the successfully parsed value. -
The code as it is doesn’t work with the
Parser.Advanced
module. I would have to change the signature intolookAhead: x -> Parser c x a -> Parser c x ()
. The user would have to pass an additional value of her error typex
, which is needed internally as the parameter of theproblem
function mentioned above. The additional parameter would never be returned in the function result, though. -
It’s possible to implement a negative lookahead parser in the same way, but the resulting code wouldn’t be easily recognizable as the inverse of the above (positive) lookahead parser. It would be nice to have a common code structure like an
if ... then ... else ...
with the branches flipped for the negative lookahead parser.
As it turns out, we can solve all these problems…
Trick
Without further ado, here’s the magic trick I found:
✨💫⭐️ It's possible to run a parser inside of another parser ⭐️💫✨
Hmmm. This doesn’t sound very spectacular, doesn’t it?
Don’t we do this all the time already?
… I hear you ask, and further:
What about the following parser:
pointParser : Parser ( Int, Int ) pointParser = Parser.succeed Tuple.pair |= Parser.int |. Parser.token "," |= Parser.int
Doesn’t it run
int
andtoken
parsers inside of thepointParser
?
No, this is not what I mean. To show you what I want let’s implement a function “to run a parser inside of another parser” together, step by step. We’ll start with
trick : ...
trick =
...
Ok, let’s go. What does this mean: “run a parser inside of another parser”? How can we be “inside” a parser?
Like in the pointParser
example, it means that we are building a parser.
And what do we do with the built parser?
We are returning it as the function result.
Ok, so you want something like
trick : Parser a trick = ...
?
Yes. The “…” is the place inside the parser we are building.
Ok. This part is exactly the same as in the
pointParser
example above. Now what about “run a parser inside of another parser”? We’re building a parser, but what is the parser we need to run?
This means that we need a parser to work with. I didn’t specify any requirements for this parser, so the user of our function should be free to choose whatever parser she wants.
You mean our function needs a parameter? As in
trick : Parser a -> Parser b trick parser = ...
?
Exactly. We get a parser and we return another parser.
Nice. Now comes the last part, already: we need to “run a parser inside of another parser”. Didn’t we do this in the
pointParser
sample?
No, in the example we used the int
and token
parsers, but I want to run the given parser.
So how do we run a Parser?
Literally with the function Parser.run
. This function takes the parser as its first argument.
Ok, we could change the code to:
trick : Parser a -> Parser b trick parser = Parser.run parser ...
run
needs a source string as the second parameter. Which string should we use?
We use the currently parsed source string. The Parser.getSource
function gives us access to it.
Hmm,
getSource
gives us aParser
. How do we get theString
?
To get the source string and build another parser from it we can use the Parser.andThen
combinator. We can give it a function that takes the source string as its parameter and uses it to build another parser.
Hmm, wait a moment. You say we get the source “and then” process it?
trick : Parser a -> Parser b trick parser = Parser.getSource |> Parser.andThen (\source -> Parser.run parser source ... )
Yes, good! But I don’t want to run the parser at the beginning of the source string. I want to run it at the current parser position.
You mean the position as returned by
Parser.getPosition
?
Close, but in this case we need Parser.getOffset
instead. We need to chop off the already parsed beginning of the source string, and we can do so with String.dropLeft
. For this we need the “offset” as an index into the source string.
I see. Like in getting the source string, we need another
andThen
call to get the offset?
Yes. You can nest one inside the other.
Ok. I can get the offset, too, and then use it with
String.dropLeft
, as you said:trick : Parser a -> Parser b trick parser = Parser.getSource |> Parser.andThen (\source -> Parser.getOffset |> Parser.andThen (\offset -> Parser.run parser (String.dropLeft offset source) ... ) )
This is getting more and more complicated…
Very good! We’re almost finished. We already run the given parser at the current parser position. The result of it is a Result (List Parser.DeadEnd) a
.
What should we do with it?
This totally depends on the use case. As such it should be the choice of the user of the function. This means that we need another parameter, a callback function supplied by the user, which gets called with the result of the test run and gives us back the parser we finally need to return.
Ok, we’ll add a callback parameter. This would be
trick : Parser a -> (Result (List Parser.DeadEnd) a -> Parser b) -> Parser b trick parser callback = Parser.getSource |> Parser.andThen (\source -> Parser.getOffset |> Parser.andThen (\offset -> Parser.run parser (String.dropLeft offset source) |> callback ) )
Great. This is exactly what I meant! We’re finished
Yeah! Finally, we should give the function a meaningful name.
Absolutely. I’ll choose getResultAndThen
. “getResultAndThen” because we determine the Result
of the Parser.run
function, “getResultAndThen” like the other “getXxx” functions in the elm/parser
package which return information without affecting the current parser state, and “getResultAndThen” because the signature looks similar to a flipped version of Parser.andThen
:
flippedAndThen : Parser a -> (a -> Parser b) -> Parser b
getResultAndThen : Parser a -> (Result (List Parser.DeadEnd) a -> Parser b) -> Parser b
Here’s the final result:
getResultAndThen : Parser a -> (Result (List Parser.DeadEnd) a -> Parser b) -> Parser b
getResultAndThen parser callback =
Parser.getSource
|> Parser.andThen
(\source ->
Parser.getOffset
|> Parser.andThen
(\offset ->
String.dropLeft offset source
|> Parser.run parser
|> callback
)
)
The central aspect of this implementation is that the test run of the given parser has absolutely no effect on the current parser state.
Now let’s see how we can make use of this…
Back to the Use Case
For the lookAhead
parser, we need to check whether a given parser succeeds or not, but without chomping any characters. This is exactly what our new function can give us:
lookAhead : Parser.Parser a -> Parser.Parser ()
lookAhead parser =
getResultAndThen parser <|
\result ->
case result of
Ok _ ->
...
Err _ ->
...
This runs the given parser at the current parser position without affecting the current parser state, in particular without chomping any characters. All that remains to do is to handle the Result
passed to our callback function:
- If the result is
Ok
(the parser succeeds), we need to succeed, too. This simply isParser.succeed ()
. - If the result is
Err
(the parser fails), we need to fail, too. If we want to fail with the same error as the given parser, we can just return the parser itself! After all, we know that it will fail.
So here’s the alternative version of lookAhead
:
lookAhead : Parser.Parser a -> Parser.Parser ()
lookAhead parser =
getResultAndThen parser <|
\result ->
case result of
Ok _ ->
Parser.succeed ()
Err _ ->
-- here we know that the parser
-- fails with the expected error
Parser.backtrackable parser
(I added a Parser.backtrackable
, because in the original implementation, the lookAhead
parser was always backtrackable.)
I’m sure you agree with me that this code is much more understandable than the original version.
What about the other flaws?
Return the parsed value
Returning the parsed value in the case of a succeeding test run is easy. Just change the function signature and the Ok
case like this:
lookAhead : Parser.Parser a -> Parser.Parser a
lookAhead parser =
getResultAndThen parser <|
\result ->
case result of
Ok a ->
Parser.succeed a
Err _ ->
Parser.backtrackable parser
Support the Parser.Advanced
module
The code of getResultAndThen
and of lookAhead
can be copied 1:1 to be used together with the Parser.Advanced
module. We only need to change the function signatures to
getResultAndThen : Parser c x a -> (Result (List Parser.DeadEnd c x) a -> Parser c x b) -> Parser c x b
lookAhead : Parser.Parser c x a -> Parser.Parser c x a
Similarly structured negative lookahead parser
This is easy to implement too, both for the Parser
and the Parser.Advanced
modules:
negativeLookAhead : String -> Parser.Parser a -> Parser.Parser ()
negativeLookAhead msg parser =
getResultAndThen parser <|
\result ->
case result of
Ok _ ->
Parser.problem msg
Err _ ->
Parser.succeed ()
As you can see, it has exactly the same structure as the (positive) lookAhead
parser, with both cases flipped (more or less).
But do the new implementations deliver what they promise?
Parser.run (lookAhead Parser.int) "123"
--> Ok 123
Parser.run (lookAhead Parser.int |> Parser.getChompedString) "123"
--> Ok ""
Parser.run (lookAhead Parser.int) "abc"
--> Err [ { row = 1, col = 1, problem = Parser.ExpectingInt } ]
The positive lookAhead
parser does! If the test parser succeeds, it succeeds too, now even returning the successfully parsed value, but without chomping anything. If the test parser fails, the lookAhead
parser fails, too.
Parser.run (negativeLookAhead "unexpected int" Parser.int) "123"
--> Err [ { row = 1, col = 1, problem = Parser.Problem "unexpected int" } ]
Parser.run (negativeLookAhead "unexpected int" Parser.int) "abc"
--> Ok ()
The negativeLookAhead
parser works, too! If the test parser succeeds, it fails with the given problem message. If the test parser fails, the negativeLookAhead
parser succeeds.
Phew.
At the end of this long post, I hope I finally could show you the magic of
✨💫⭐️ It's possible to run a parser inside of another parser ⭐️💫✨
TL;DR
By running a parser inside of another parser, it’s possible to build many interesting new parsers. In this post, I showed how to literally run a parser by calling the Parser.run
function inside of another parser, which allows to examine the result of the test run without affecting the current parser state.
Using this technique, it’s easy to build lookAhead
and negativeLookAhead
parsers, but this is not all…
More Use Cases
I recently published a function like getResultAndThen
and the lookahead parsers in my elm-parser-extra
package.
But the lookahead parsers were just a byproduct.
I mainly used the getResultAndThen
function in my elm-parser-bug-workaround
package to implement workarounds for a bug in elm/parser
, where the parser gets into an inconsistent internal state after using chompUntil
, chompUntilEndOr
, lineComment
, or multiComment
.
In fact, even these bug workarounds were a byproduct. I found the getResultAndThen
trick while exploring something else. But this will be the topic of another post…
Thanks for reading this far.