Demystifying an Obscure LookAhead Parser

pit · August 30, 2023, 11:30am

Quite some time ago, back in March 2022, there was this discussion on the Elm slack:

@SiriusStarr
am I missing something or is there no lookAhead for Parser

@megapctr
what would it do?

@SiriusStarr
If p in lookAhead p succeeds (either consuming input or not) the whole parser behaves like p succeeded without consuming anything (parser state is not updated as well). If p fails, lookAhead has no effect, i.e. it will fail consuming input if p fails consuming input.

@SiriusStarr
Checks if a parser succeeds without actually consuming the input

While the discussion continued, no concrete solution emerged.

The problem interested me and I had the time to explore it. After many trials and errors I eventually came up with the following code (published in this Ellie):

import Parser as P

lookAhead : P.Parser a -> P.Parser ()
lookAhead parser =
    P.oneOf
        [ P.oneOf
            [ parser
                |> P.backtrackable
                |> P.andThen (\_ -> P.commit ())
                |> P.andThen (\_ -> P.problem "")
            , P.succeed
                (parser
                    |> P.backtrackable
                    |> P.map (\_ -> ())
                )
            ]
            |> P.backtrackable
        , P.succeed
            (P.succeed ())
        ]
        |> P.andThen identity

As you can easily see, this function solves the given task.

No, kidding aside. The code works, but it has a couple of flaws, not least the poor readability. In the meantime I found a much better way to implement a lookAhead parser. I wrote about it in this Discourse post: How to build interesting parsers.

However, apparently my old code is still being used, and from time to time, someone asks how it works.

So, if you just want to use a lookAhead parser, then I suggest to use the one I talked about in the linked Discourse post. I published it in my pithub/elm-parser-extra package.

But if you want to understand the code shown above, keep on reading.

Introduction

In the following text, I assume a certain basic knowledge of Elm parsers. You should understand the parser combinators andThen and oneOf and know what it means for a parser to be backtrackable or not. If you don’t understand what I’m writing here, feel free to ask in the comments.

I’ll explain the code line by line, roughly in the order in which Elm would execute it: from top to bottom and from inside to outside. In each section, I first show the Elm code with line numbers and then the results of running the code, line by line:

Code:

 1    {some parser}
 2    {more}

Results:

 # │ Code                               │                                   │
───┼────────────────────────────────────┼───────────────────────────────────┤
 1 │ {some parser}                      │ {some parser results}             │
 2 │ {more}                             │ {more results}                    │

There are two use cases for the lookAhead parser: if the given parser succeeds, the lookAhead parser should succeed, too, but without consuming any input, and if the given parser fails, the lookAhead parser should fail, too. I’ll name these two use cases “Success” and “Failure” and show the results of each parser for those two use cases:

Code:

 1    {some parser}
 2    {more}

Results:

 # │ Code                         │ Success            │ Failure            │
───┼──────────────────────────────┼────────────────────┼────────────────────┤
 1 │ {some parser}                │ {success result 1} │ {failure result 1} │
 2 │ {more}                       │ {success result 2} │ {failure result 2} │

Running a parser can either succeed or fail. Internally, those two states are represented in the elm/parser package by this type:

type PStep context problem value
  = Good Bool value (State context)
  | Bad Bool (Bag context problem)

I’ll show the results of running the parsers in a similar, but slightly shorter way:

if the parser succeeds: Good Bool value
if the parser fails: Bad Bool problem

For the Bools I’ll use True if the corresponding parser is backtrackable and False if it isn’t.

Side note:
This is the opposite of what elm/parser uses internally, but I find it easier to understand it this way.

I hope the notation becomes clear when you see it in action…

Let’s go

We start with the only one thing we have in the beginning: the parser given to the lookAhead function:

Code:

 1    parser

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 1 │ parser                             │ Good ? a       │ Bad ? e          │

This means that running the code in line 1 (in this case the parser given as an argument to the lookAhead function) yields the following results:

In the “Success” use case:
- Good: in the “Success” use case, the parser in line 1 succeeds (this is exactly how the “Success” use case is defined)
- ?: we don’t know whether the given parser is backtrackable or not
- a: the parser is of type P.Parser a, so I use “a” to represent the parsed value
In the “Failure” use case:
- Bad: in the “Failure” use case, the parser in line 1 fails (this is exactly how the “Failure” use case is defined)
- ?: we don’t know whether the given parser is backtrackable or not
- e: I use “e” to represent the error that the given parser produces in the “Failure” use case

Two Kinds of Bad

In the “Success” use case, the parser might have consumed part of the input. The only way to undo that is to turn success into failure. We can do this with P.andThen and P.problem:

Code:

 1    parser
 2        |> P.andThen (\_ -> P.problem "")

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 1 │ parser                             │ Good ? a       │ Bad ? e          │
 2 │ |> P.andThen (\_ -> P.problem "")  │ Bad ? ""       │ Bad ? e          │

OK, now we have a parser which doesn’t consume anything, but we get a Bad result for both use cases, and we need to be able to distinguish between them. How can we do that?

Idea 1:
Could we just use a special error text?

No. The behavior of a parser doesn’t depend on the error value.

Idea 2:
Could we make one backtrackable and the other not?

Yes, this could work, so let’s try it…

First, we need a defined state for the backtrackable flag. We don’t know whether the given parser is backtrackable or not, so let’s make it definitely backtrackable first:

Code:

 1    parser
 2        |> P.backtrackable

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 1 │ parser                             │ Good ? a       │ Bad ? e          │
 2 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │

Now we can modify the “Success” use case by making it (and only it) not backtrackable with P.andThen and P.commit, and then turn success into failure as we did above:

Code:

 1    parser
 2        |> P.backtrackable
 3        |> P.andThen (\_ -> P.commit ())
 4        |> P.andThen (\_ -> P.problem "")

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 1 │ parser                             │ Good ? a       │ Bad ? e          │
 2 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │
 3 │ |> P.andThen (\_ -> P.commit ())   │ Good False ()  │ Bad True e       │
 4 │ |> P.andThen (\_ -> P.problem "")  │ Bad False ""   │ Bad True e       │

Fine. We accomplished the first goal: to undo the input consumption in both use cases. Plus, we are still able to distinguish between them.

But is that really true?

Tell Bad From Bad

Let’s see how can we turn the two kinds of Bad into different results. For now, it doesn’t matter what these results are exactly, so I’ll leave the details open and just use

{result S} for the result in the “Success” use case
{parser S} for the parser used to generate the result S value
{result F} for the result in the “Failure” use case
{parser F} for the parser used to generate the result F value

We can change any backtrackable Bad using P.oneOf. The backtrackable Bad is the Bad of the “Failure” use case, so we use {parser F} in P.oneOf:

Code:

 1    P.oneOf
 2        [ parser
 3            |> P.backtrackable
 4            |> P.andThen (\_ -> P.commit ())
 5            |> P.andThen (\_ -> P.problem "")
 6        , {parser F}
 7        ]

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 2 │ parser                             │ Good ? a       │ Bad ? e          │
 3 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │
 4 │ |> P.andThen (\_ -> P.commit ())   │ Good False ()  │ Bad True e       │
 5 │ |> P.andThen (\_ -> P.problem "")  │ Bad False ""   │ Bad True e       │
   │                                    │                │                  │
 6 │ {parser F}                         │                │ {result F}       │
   │                                    │                │                  │
 1 │ P.oneOf [ {2-5}, {6} ]             │ Bad False ""   │ {result F}       │

I hope the notation is clear:

line 6 is only relevant in the “Failure” use case, so I left the “Success” column empty
P.oneOf in line 1 combines the parsers of lines 2-5 and of line 6. I wrote this in the “code” column as “{2-5}” and “{6}” respectively.

Now to the “Success” use case. We’ll use another P.oneOf, but before we can do this we have to make the Bad of the “Success” use case backtrackable again:

Code:

 1    P.oneOf
 2        [ P.oneOf
 3            [ parser
 4                |> P.backtrackable
 5                |> P.andThen (\_ -> P.commit ())
 6                |> P.andThen (\_ -> P.problem "")
 7            , {parser F}
 8            ]
 9            |> P.backtrackable
10        , {parser S}
11        ]

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 3 │ parser                             │ Good ? a       │ Bad ? e          │
 4 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │
 5 │ |> P.andThen (\_ -> P.commit ())   │ Good False ()  │ Bad True e       │
 6 │ |> P.andThen (\_ -> P.problem "")  │ Bad False ""   │ Bad True e       │
   │                                    │                │                  │
 7 │ {parser F}                         │                │ {result F}       │
   │                                    │                │                  │
 2 │ P.oneOf [ {3-6}, {7} ]             │ Bad False ""   │ {result F}       │
 9 │ |> P.backtrackable                 │ Bad True ""    │ {result F}       │
   │                                    │                │                  │
10 │ {parser S}                         │ {result S}     │                  │
   │                                    │                │                  │
 1 │ P.oneOf [ {2-9}, {10} ]            │ {result S}     │ {result F}       │

Great. Using two nested P.oneOf, we manage to undo the input consumption in both use cases and still can return different results for them.

What should these results be?

Final Results

Let’s start with the “Success” use case. If the given parser succeeds, the lookAhead parser should succeed, too, so we need a Good result.

I wanted the result to be backtrackable (Good True), because this seems to be appropriate for a parser that is conducting a test (lookAhead) rather than actually parsing something.

For the value of the result, we can’t use the value of the given parser, because we loose it when we turn success into failure with P.problem. Therefore we simply use the “unit” value ().

So the desired result is Good True (), which can be easily achieved with P.succeed:

Code:

 1    P.succeed ()

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 1 │ P.succeed ()                       │ Good True ()   │                  │

And now the “Failure” use case. If the given parser fails, the lookAhead parser should fail, too, so we need a Bad result.

This is simple: we can just use the given parser. In the “Failure” use case, it returns the result Bad ? e. (As a bonus for reusing the given parser, the lookAhead parser returns the same error “e” in the “Failure” use case.)

But we have to make two modifications: first, we want the result to be backtrackable as in the “Success” use case, giving us Bad True e.

Second, we have to match the type of the “Success” use case parser: P.Parser (). The given parser is of type P.Parser a, so we have to P.map it to the desired type:

Code:

 1    parser
 2        |> P.backtrackable
 3        |> P.map (\_ -> ())

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 1 │ parser                             │                │ Bad ? e          │
 2 │ |> P.backtrackable                 │                │ Bad True e       │
 3 │ |> P.map (\_ -> ())                │                │ Bad True e       │

(Line 3 changes the type of the parser.)

Good. Now that we know what we’d like to return, let’s put this into the placeholders of the previous code!

All Together Now

Here’s the code from above with the placeholders replaced by the parsers we just found:

Code:

 1    P.oneOf
 2        [ P.oneOf
 3            [ parser
 4                |> P.backtrackable
 5                |> P.andThen (\_ -> P.commit ())
 6                |> P.andThen (\_ -> P.problem "")
 7            , parser
 8                  |> P.backtrackable
 9                  |> P.map (\_ -> ())
10            ]
11            |> P.backtrackable
12        , P.succeed ()
13        ]

Lines 7-9 contain the parser for the “Failure” use case, line 12 the parser for the “Success” use case.

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 3 │ parser                             │ Good ? a       │ Bad ? e          │
 4 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │
 5 │ |> P.andThen (\_ -> P.commit ())   │ Good False ()  │ Bad True e       │
 6 │ |> P.andThen (\_ -> P.problem "")  │ Bad False ""   │ Bad True e       │
   │                                    │                │                  │
 7 │ parser                             │                │ Bad ? e          │
 8 │ |> P.backtrackable                 │                │ Bad True e       │
 9 │ |> P.map (\_ -> ())                │                │ Bad True e       │
   │                                    │                │                  │
 2 │ P.oneOf [ {3-6}, {7-10} ]          │ Bad False ""   │ Bad True e       │
11 │ |> P.backtrackable                 │ Bad True ""    │ Bad True e       │
   │                                    │                │                  │
12 │ P.succeed ()                       │ Good True ()   │                  │
   │                                    │                │                  │
 1 │ P.oneOf [ {2-11}, {12} ]           │ Good True ()   │ Good True ()     │

Hmm. Do you see the problem? The “Success” use case works, but the “Failure” use case doesn’t. It doesn’t yield the desired result.

Why is this happening?

The problem is that the P.oneOf in line 1 acts on every backtrackable Bad result. If you look at the results of line 11, you can see that we have backtrackable Bad results in both use cases, and therefore we get the result of line 12 in both use cases

It wouldn’t help if we chose a not backtrackable Bad as the result of line 9, because line 11 would make it backtrackable again.

And Then Bool Comes to the Rescue

The only thing we can do is to use a Good result in the “Failure” use case, which wouldn’t be changed by the P.oneOf in line 1. We then have to turn it into the desired Bad result in a second step.

Changing a Good result into something else can be done with P.andThen. So the idea is to add P.andThen at the end of the code and - depending on the value of the input parameter - return the desired result.

What should we pass to P.andThen? Why not simply pass “True” in the “Success” use case and “False” in the “Failure” use case? Then we can use an if expression to return the desired result.

Let’s see how this looks:

Code:

 1    P.oneOf
 2        [ P.oneOf
 3            [ parser
 4                |> P.backtrackable
 5                |> P.andThen (\_ -> P.commit ())
 6                |> P.andThen (\_ -> P.problem "")
 7            , P.succeed False
 8            ]
 9            |> P.backtrackable
10        , P.succeed True
11        ]
12        |> P.andThen
13            (\success ->
14                if success then
15                    P.succeed ()
16                else
17                    parser
18                        |> P.backtrackable
19                        |> P.map (\_ -> ())
20            )

In line 7, we simply return False. The parser for the final result in the “Failure” use case is in lines 17-19. Analogously, for the “Success” use case, we return True in line 10, and put the parser for the final result in line 15.

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 3 │ parser                             │ Good ? a       │ Bad ? e          │
 4 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │
 5 │ |> P.andThen (\_ -> P.commit ())   │ Good False ()  │ Bad True e       │
 6 │ |> P.andThen (\_ -> P.problem "")  │ Bad False ""   │ Bad True e       │
   │                                    │                │                  │
 7 │ P.succeed False                    │                │ Good True False  │
   │                                    │                │                  │
 2 │ P.oneOf [ {3-6}, {7} ]             │ Bad False ""   │ Good True False  │
 9 │ |> P.backtrackable                 │ Bad True ""    │ Good True False  │
   │                                    │                │                  │
10 │ P.succeed True                     │ Good True True │                  │
   │                                    │                │                  │
 1 │ P.oneOf [ {2-9}, {10} ]            │ Good True True │ Good True False  │
   │                                    │                │                  │
15 │ P.succeed ()                       │ Good True ()   │                  │
   │                                    │                │                  │
17 │ parser                             │                │ Bad ? e          │
18 │ |> P.backtrackable                 │                │ Bad True e       │
19 │ |> P.map (\_ -> ())                │                │ Bad True e       │
   │                                    │                │                  │
12 │ {1-11} |> P.andThen {13-19}        │ Good True ()   │ Bad True e       │

OK, this looks good: the result of the P.oneOf in lines 1-11 (the first step) is Good True True in the “Success” use case and Good True False in the “Failure” use case. We change this in the P.andThen in lines 12-20 (the second step) into the desired results.

And indeed, the code works! It passes all tests!

But… we’re not finished, yet.

Code Golfing

The code works, but it isn’t exactly the code I came up with. If I remember correctly, I never had a version that looked like the current code. So let’s see how we can transform the current code into the final version.

The current code roughly has the following structure:

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ Parser a -> Parser Bool        │ >> P.andThen │ Bool -> Parser ()      │

We can interpret the first step as a function from Parser a to Parser Bool. The second step is a function from Bool to Parser (). Both functions are combined with >> P.andThen to finally yield a function from Parser a to Parser ().

Let’s first generalize the input of the first step and the output of the second step:

Parser a → input
Parser () → output

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ input -> Parser Bool           │ >> P.andThen │ Bool -> output         │

Now the first step is a function from a general input type to Parser Bool, the second step is a function from Bool to output, so the combination of both steps is a function from input to output.

If we wouldn’t return a Parser Bool but only a Bool in the first step, we could use the forward function composition operator >> alone:

Parser Bool → Bool
>> P.andThen → >>

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ input -> Bool                  │ >>           │ Bool -> output         │

Now the first step is a function which processes the input parameter into a True or a False value. The second step takes this Bool value and returns either a value for the True case or another value for the False case.

Normally I wouldn’t split a function like that into two steps:

(\input -> ..... True ..... False .....)
    >>
        (\flag -> if flag then trueValue else falseValue)

I’d rather directly use trueValue instead of True and falseValue instead of False:

\input -> ..... trueValue ..... falseValue .....

Then I wouldn’t need a second step at all:

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ input -> output                │              │                        │

Unfortunately, as we have seen above, we can’t use the falseValue (the parser for the “Failure” use case) directly. Therefore we had to introduce a second step, and there’s no way around it. But we can still omit the Bools, even with a two-step implementation:

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ input -> output                │ >>           │ output -> output       │

In code:

(\input -> ..... trueValue ..... falseValue .....)
    >>
        identity

By using trueValue and falseValue directly, we can simply use the identity function as the second step.

Can we do this with our Parser code, too?

Yes we can. We just have to reverse the transformations we did above…

First, we re-introduce Parsers, both around the return type of the first step and in the combinator:

output → Parser output
>> → >> P.andThen

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ input -> Parser output         │ >> P.andThen │ output -> output       │

Second, we re-introduce the real input and output types:

input → Parser a
output → Parser ()

│ Step 1                         │ Combinator   │ Step 2                 │
├────────────────────────────────┼──────────────┼────────────────────────┤
│ Parser a -> Parser (Parser ()) │ >> P.andThen │ Parser () -> Parser () │

(I’ll briefly talk about the weird result type of the first step in the closing remarks…)

This gives us the final version:

Code:

 1    P.oneOf
 2        [ P.oneOf
 3            [ parser
 4                |> P.backtrackable
 5                |> P.andThen (\_ -> P.commit ())
 6                |> P.andThen (\_ -> P.problem "")
 7            , P.succeed
 8                (parser
 9                    |> P.backtrackable
10                    |> P.map (\_ -> ())
11                )
12            ]
13            |> P.backtrackable
14        , P.succeed
15            (P.succeed ())
16        ]
17        |> P.andThen identity

In lines 8-10, I replaced False with falseValue, which is the parser for the “Failure” use case. In line 15, I replaced True with trueValue, the parser for the “Success” use case. And in line 17, I replaced the second step with the identity function.

Results:

 # │ Code                               │ Success        │ Failure          │
───┼────────────────────────────────────┼────────────────┼──────────────────┤
 3 │ parser                             │ Good ? a       │ Bad ? e          │
 4 │ |> P.backtrackable                 │ Good True a    │ Bad True e       │
 5 │ |> P.andThen (\_ -> P.commit ())   │ Good False ()  │ Bad True e       │
 6 │ |> P.andThen (\_ -> P.problem "")  │ Bad False ""   │ Bad True e       │
   │                                    │                │                  │
 8 │ parser                             │                │ Bad ? e          │
 9 │ |> P.backtrackable                 │                │ Bad True e       │
10 │ |> P.map (\_ -> ())                │                │ Bad True e       │
   │                                    │                │                  │
 7 │ P.succeed {8-10}                   │                │ Good True {8-10} │
   │                                    │                │                  │
 2 │ P.oneOf [ {3-6}, {7-11} ]          │ Bad False ""   │ Good True {8-10} │
13 │ |> P.backtrackable                 │ Bad True ""    │ Good True {8-10} │
   │                                    │                │                  │
15 │ P.succeed ()                       │ Good True ()   │                  │
   │                                    │                │                  │
14 │ P.succeed {15}                     │ Good True {15} │                  │
   │                                    │                │                  │
 1 │ P.oneOf [ {2-13}, {14-15} ]        │ Good True {15} │ Good True {8-10} │
17 │ |> P.andThen identity              │ Good True ()   │ Bad True e       │

This still works, and this is the code I came up with back then

It’s no wonder no one understood that code…

Closing Remarks

You might be puzzled by the result type of the first step: Parser (Parser ()). A parser that creates another parser ???

In lines 7 and 14, we simply wrap the parsers for the “Failure” and “Success” use cases with P.succeed. Therefore, the resulting Parser (Parser ()) doesn’t do any parsing at all. It’s only use is to yield a Good result in both use cases, as described above.

The second step of the final version is

step2 =
    P.andThen identity

I didn’t show the type signature of this function on purpose. You can use the Elm REPL or maybe your IDE to show you the type.

You can use this pattern with every type that has an andThen function, for example with Maybe, Result, or Json.Decode.Decoder. Those types are often called “Monads” in functional programming, and the pattern shown is called join.

The combinator used to connect the two steps is

combinator f1 f2 =
    f1 >> P.andThen f2

It’s type signature is also interesting, and again, you can create such a function for other types with an andThen function, too. In Haskell, the function is written as >=> and called “Left-to-right composition of Kleisli arrows”…

Thank you for reading this far. If you have any questions or comments, feel free to leave them below. You can also find me as @Pit on the Elm slack.

system · September 9, 2023, 11:31am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How to build interesting parsers Show and Tell	3	990	December 9, 2022
Elm/parser look ahead without backtrackable Learn	4	1195	February 16, 2019
Parsers with Error Recovery Learn	22	4785	October 11, 2020
Parsing puzzle, avoid look-ahead? Learn	7	1136	February 12, 2018
[elm/parser]: understanding what's going on Learn	3	796	August 23, 2019

Demystifying an Obscure LookAhead Parser

Introduction

Let’s go

Two Kinds of Bad

Tell Bad From Bad

Final Results

All Together Now

And Then Bool Comes to the Rescue

Code Golfing

Closing Remarks

Related topics