How to build interesting parsers

pit · November 25, 2022, 9:01pm

(Near the end of this long post is a TL;DR section…)

In the last weeks I experimented with new ways to build parsers. You probably all know how to use basic parsers like int, token or spaces, and how to combine them with oneOf, andThen or in parser pipelines. But I needed something else, and I found a trick which I want to share with you.

Use Case

Some time ago, on the Elm slack @SiriusStarr asked for a lookahead parser (a parser that checks whether a given parser succeeds or not, but without chomping any characters). I was interested, had time, and after many trials and errors I finally came up with this code:

import Parser exposing (Parser)

lookAhead : Parser a -> Parser ()
lookAhead parser =
    Parser.oneOf
        [ Parser.oneOf
            [ parser
                |> Parser.backtrackable
                |> Parser.andThen (\_ -> Parser.commit ())
                |> Parser.andThen (\_ -> Parser.problem "")
            , Parser.succeed
                (parser
                    |> Parser.backtrackable
                    |> Parser.map (\_ -> ())
                )
            ]
            |> Parser.backtrackable
        , Parser.succeed (Parser.succeed ())
        ]
        |> Parser.andThen identity

Here’s how it works in practice:

Parser.run (lookAhead Parser.int) "123"
--> Ok ()

Parser.run (lookAhead Parser.int) "abc"
--> Err [ { row = 1, col = 1, problem = Parser.ExpectingInt } ]

If the int parser succeeds, so does the lookAhead parser, and if the int parser fails, the lookAhead parser fails, too.

But if the parser succeeds, it doesn’t chomp any characters, as desired:

Parser.run (lookAhead Parser.int |> Parser.getChompedString) "123"
--> Ok ""

While the code works, it has its problems:

The implementation is hard to understand. From just looking at the code, even I as the author cannot see easily which role each part plays and how they all work together to achieve the desired effect.
It’s not possible to return the parsed value (changing the signature to lookAhead : Parser a -> Parser a). In order to move the parser back to the beginning after a successful parse, I have to use the problem function to turn the success into a failure, thereby loosing the successfully parsed value.
The code as it is doesn’t work with the Parser.Advanced module. I would have to change the signature into lookAhead: x -> Parser c x a -> Parser c x (). The user would have to pass an additional value of her error type x, which is needed internally as the parameter of the problem function mentioned above. The additional parameter would never be returned in the function result, though.
It’s possible to implement a negative lookahead parser in the same way, but the resulting code wouldn’t be easily recognizable as the inverse of the above (positive) lookahead parser. It would be nice to have a common code structure like an if ... then ... else ... with the branches flipped for the negative lookahead parser.

As it turns out, we can solve all these problems…

Trick

Without further ado, here’s the magic trick I found:

✨💫⭐️  It's possible to run a parser inside of another parser ⭐️💫✨

Hmmm. This doesn’t sound very spectacular, doesn’t it?

Don’t we do this all the time already?

… I hear you ask, and further:

What about the following parser:
pointParser : Parser ( Int, Int )
pointParser =
    Parser.succeed Tuple.pair
        |= Parser.int
        |. Parser.token ","
        |= Parser.int
Doesn’t it run int and token parsers inside of the pointParser?

No, this is not what I mean. To show you what I want let’s implement a function “to run a parser inside of another parser” together, step by step. We’ll start with

trick : ...
trick =
    ...

Ok, let’s go. What does this mean: “run a parser inside of another parser”? How can we be “inside” a parser?

Like in the pointParser example, it means that we are building a parser.

And what do we do with the built parser?

We are returning it as the function result.

Ok, so you want something like
trick : Parser a
trick =
    ...
?

Yes. The “…” is the place inside the parser we are building.

Ok. This part is exactly the same as in the pointParser example above. Now what about “run a parser inside of another parser”? We’re building a parser, but what is the parser we need to run?

This means that we need a parser to work with. I didn’t specify any requirements for this parser, so the user of our function should be free to choose whatever parser she wants.

You mean our function needs a parameter? As in
trick : Parser a -> Parser b
trick parser =
    ...
?

Exactly. We get a parser and we return another parser.

Nice. Now comes the last part, already: we need to “run a parser inside of another parser”. Didn’t we do this in the pointParser sample?

No, in the example we used the int and token parsers, but I want to run the given parser.

So how do we run a Parser?

Literally with the function Parser.run. This function takes the parser as its first argument.

Ok, we could change the code to:
trick : Parser a -> Parser b
trick parser =
    Parser.run parser ...
run needs a source string as the second parameter. Which string should we use?

We use the currently parsed source string. The Parser.getSource function gives us access to it.

Hmm, getSource gives us a Parser. How do we get the String?

To get the source string and build another parser from it we can use the Parser.andThen combinator. We can give it a function that takes the source string as its parameter and uses it to build another parser.

Hmm, wait a moment. You say we get the source “and then” process it?

trick : Parser a -> Parser b
trick parser =
    Parser.getSource
        |> Parser.andThen
            (\source ->
                Parser.run parser source
                    ...
            )

Yes, good! But I don’t want to run the parser at the beginning of the source string. I want to run it at the current parser position.

You mean the position as returned by Parser.getPosition?

Close, but in this case we need Parser.getOffset instead. We need to chop off the already parsed beginning of the source string, and we can do so with String.dropLeft. For this we need the “offset” as an index into the source string.

I see. Like in getting the source string, we need another andThen call to get the offset?

Yes. You can nest one inside the other.

Ok. I can get the offset, too, and then use it with String.dropLeft, as you said:

trick : Parser a -> Parser b
trick parser =
    Parser.getSource
        |> Parser.andThen
            (\source ->
                Parser.getOffset
                    |> Parser.andThen
                        (\offset ->
                            Parser.run parser (String.dropLeft offset source)
                                ...
                        )
            )

This is getting more and more complicated…

Very good! We’re almost finished. We already run the given parser at the current parser position. The result of it is a Result (List Parser.DeadEnd) a.

What should we do with it?

This totally depends on the use case. As such it should be the choice of the user of the function. This means that we need another parameter, a callback function supplied by the user, which gets called with the result of the test run and gives us back the parser we finally need to return.

Ok, we’ll add a callback parameter. This would be

trick : Parser a -> (Result (List Parser.DeadEnd) a -> Parser b) -> Parser b
trick parser callback =
    Parser.getSource
        |> Parser.andThen
            (\source ->
                Parser.getOffset
                    |> Parser.andThen
                        (\offset ->
                            Parser.run parser (String.dropLeft offset source)
                                |> callback
                        )
            )

Great. This is exactly what I meant! We’re finished

Yeah! Finally, we should give the function a meaningful name.

Absolutely. I’ll choose getResultAndThen. “getResultAndThen” because we determine the Result of the Parser.run function, “getResultAndThen” like the other “getXxx” functions in the elm/parser package which return information without affecting the current parser state, and “getResultAndThen” because the signature looks similar to a flipped version of Parser.andThen:

flippedAndThen   : Parser a -> (a -> Parser b) -> Parser b

getResultAndThen : Parser a -> (Result (List Parser.DeadEnd) a -> Parser b) -> Parser b

Here’s the final result:

getResultAndThen : Parser a -> (Result (List Parser.DeadEnd) a -> Parser b) -> Parser b
getResultAndThen parser callback =
    Parser.getSource
        |> Parser.andThen
            (\source ->
                Parser.getOffset
                    |> Parser.andThen
                        (\offset ->
                            String.dropLeft offset source
                                |> Parser.run parser
                                |> callback
                        )
            )

The central aspect of this implementation is that the test run of the given parser has absolutely no effect on the current parser state.

Now let’s see how we can make use of this…

Back to the Use Case

For the lookAhead parser, we need to check whether a given parser succeeds or not, but without chomping any characters. This is exactly what our new function can give us:

lookAhead : Parser.Parser a -> Parser.Parser ()
lookAhead parser =
    getResultAndThen parser <|
        \result ->
            case result of
                Ok _ ->
                    ...

                Err _ ->
                    ...

This runs the given parser at the current parser position without affecting the current parser state, in particular without chomping any characters. All that remains to do is to handle the Result passed to our callback function:

If the result is Ok (the parser succeeds), we need to succeed, too. This simply is Parser.succeed ().
If the result is Err (the parser fails), we need to fail, too. If we want to fail with the same error as the given parser, we can just return the parser itself! After all, we know that it will fail.

So here’s the alternative version of lookAhead:

lookAhead : Parser.Parser a -> Parser.Parser ()
lookAhead parser =
    getResultAndThen parser <|
        \result ->
            case result of
                Ok _ ->
                    Parser.succeed ()

                Err _ ->
                    -- here we know that the parser
                    -- fails with the expected error
                    Parser.backtrackable parser

(I added a Parser.backtrackable, because in the original implementation, the lookAhead parser was always backtrackable.)

I’m sure you agree with me that this code is much more understandable than the original version.

What about the other flaws?

Return the parsed value

Returning the parsed value in the case of a succeeding test run is easy. Just change the function signature and the Ok case like this:

lookAhead : Parser.Parser a -> Parser.Parser a
lookAhead parser =
    getResultAndThen parser <|
        \result ->
            case result of
                Ok a ->
                    Parser.succeed a

                Err _ ->
                    Parser.backtrackable parser

Support the `Parser.Advanced` module

The code of getResultAndThen and of lookAhead can be copied 1:1 to be used together with the Parser.Advanced module. We only need to change the function signatures to

getResultAndThen : Parser c x a -> (Result (List Parser.DeadEnd c x) a -> Parser c x b) -> Parser c x b

lookAhead : Parser.Parser c x a -> Parser.Parser c x a

Similarly structured negative lookahead parser

This is easy to implement too, both for the Parser and the Parser.Advanced modules:

negativeLookAhead : String -> Parser.Parser a -> Parser.Parser ()
negativeLookAhead msg parser =
    getResultAndThen parser <|
        \result ->
            case result of
                Ok _ ->
                    Parser.problem msg

                Err _ ->
                    Parser.succeed ()

As you can see, it has exactly the same structure as the (positive) lookAhead parser, with both cases flipped (more or less).

But do the new implementations deliver what they promise?

Parser.run (lookAhead Parser.int) "123"
--> Ok 123

Parser.run (lookAhead Parser.int |> Parser.getChompedString) "123"
--> Ok ""

Parser.run (lookAhead Parser.int) "abc"
--> Err [ { row = 1, col = 1, problem = Parser.ExpectingInt } ]

The positive lookAhead parser does! If the test parser succeeds, it succeeds too, now even returning the successfully parsed value, but without chomping anything. If the test parser fails, the lookAhead parser fails, too.

Parser.run (negativeLookAhead "unexpected int" Parser.int) "123"
--> Err [ { row = 1, col = 1, problem = Parser.Problem "unexpected int" } ]

Parser.run (negativeLookAhead "unexpected int" Parser.int) "abc"
--> Ok ()

The negativeLookAhead parser works, too! If the test parser succeeds, it fails with the given problem message. If the test parser fails, the negativeLookAhead parser succeeds.

Phew.

At the end of this long post, I hope I finally could show you the magic of

✨💫⭐️  It's possible to run a parser inside of another parser ⭐️💫✨

TL;DR

By running a parser inside of another parser, it’s possible to build many interesting new parsers. In this post, I showed how to literally run a parser by calling the Parser.run function inside of another parser, which allows to examine the result of the test run without affecting the current parser state.

Using this technique, it’s easy to build lookAhead and negativeLookAhead parsers, but this is not all…

More Use Cases

I recently published a function like getResultAndThen and the lookahead parsers in my elm-parser-extra package.

But the lookahead parsers were just a byproduct.

I mainly used the getResultAndThen function in my elm-parser-bug-workaround package to implement workarounds for a bug in elm/parser, where the parser gets into an inconsistent internal state after using chompUntil, chompUntilEndOr, lineComment, or multiComment.

In fact, even these bug workarounds were a byproduct. I found the getResultAndThen trick while exploring something else. But this will be the topic of another post…

Thanks for reading this far.

eelcoh · November 26, 2022, 3:10pm

For anyone who likes to learn more about parsers after reading this great article I recommend this talk by @terezka :

dirkbj · November 29, 2022, 7:47pm

Loved reading through this. Thank you.

system · December 9, 2022, 7:48pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Demystifying an Obscure LookAhead Parser Show and Tell	1	340	September 9, 2023
Parsers with Error Recovery Learn	22	4785	October 11, 2020
Parsing puzzle, avoid look-ahead? Learn	7	1136	February 12, 2018
Elm/parser look ahead without backtrackable Learn	4	1195	February 16, 2019
Dasch/parser, an easier to use parsing library Request Feedback	7	866	December 1, 2018