Quite some time ago, back in March 2022, there was this discussion on the Elm slack:
@SiriusStarr
am I missing something or is there nolookAhead
forParser
@megapctr
what would it do?
@SiriusStarr
Ifp
inlookAhead p
succeeds (either consuming input or not) the whole parser behaves likep
succeeded without consuming anything (parser state is not updated as well). Ifp
fails,lookAhead
has no effect, i.e. it will fail consuming input ifp
fails consuming input.
@SiriusStarr
Checks if a parser succeeds without actually consuming the input
While the discussion continued, no concrete solution emerged.
The problem interested me and I had the time to explore it. After many trials and errors I eventually came up with the following code (published in this Ellie):
import Parser as P
lookAhead : P.Parser a -> P.Parser ()
lookAhead parser =
P.oneOf
[ P.oneOf
[ parser
|> P.backtrackable
|> P.andThen (\_ -> P.commit ())
|> P.andThen (\_ -> P.problem "")
, P.succeed
(parser
|> P.backtrackable
|> P.map (\_ -> ())
)
]
|> P.backtrackable
, P.succeed
(P.succeed ())
]
|> P.andThen identity
As you can easily see, this function solves the given task.
No, kidding aside. The code works, but it has a couple of flaws, not least the poor readability. In the meantime I found a much better way to implement a lookAhead parser. I wrote about it in this Discourse post: How to build interesting parsers.
However, apparently my old code is still being used, and from time to time, someone asks how it works.
So, if you just want to use a lookAhead parser, then I suggest to use the one I talked about in the linked Discourse post. I published it in my pithub/elm-parser-extra package.
But if you want to understand the code shown above, keep on reading.
Introduction
In the following text, I assume a certain basic knowledge of Elm parsers. You should understand the parser combinators andThen
and oneOf
and know what it means for a parser to be backtrackable or not. If you donβt understand what Iβm writing here, feel free to ask in the comments.
Iβll explain the code line by line, roughly in the order in which Elm would execute it: from top to bottom and from inside to outside. In each section, I first show the Elm code with line numbers and then the results of running the code, line by line:
Code:
1 {some parser}
2 {more}
Results:
# β Code β β
ββββΌβββββββββββββββββββββββββββββββββββββΌββββββββββββββββββββββββββββββββββββ€
1 β {some parser} β {some parser results} β
2 β {more} β {more results} β
There are two use cases for the lookAhead
parser: if the given parser
succeeds, the lookAhead
parser should succeed, too, but without consuming any input, and if the given parser
fails, the lookAhead
parser should fail, too. Iβll name these two use cases βSuccessβ and βFailureβ and show the results of each parser for those two use cases:
Code:
1 {some parser}
2 {more}
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββΌβββββββββββββββββββββΌβββββββββββββββββββββ€
1 β {some parser} β {success result 1} β {failure result 1} β
2 β {more} β {success result 2} β {failure result 2} β
Running a parser can either succeed or fail. Internally, those two states are represented in the elm/parser
package by this type:
type PStep context problem value
= Good Bool value (State context)
| Bad Bool (Bag context problem)
Iβll show the results of running the parsers in a similar, but slightly shorter way:
- if the parser succeeds:
Good Bool value
- if the parser fails:
Bad Bool problem
For the Bool
s Iβll use True
if the corresponding parser is backtrackable and False
if it isnβt.
Side note:
This is the opposite of whatelm/parser
uses internally, but I find it easier to understand it this way.
I hope the notation becomes clear when you see it in actionβ¦
Letβs go
We start with the only one thing we have in the beginning: the parser given to the lookAhead
function:
Code:
1 parser
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
1 β parser β Good ? a β Bad ? e β
This means that running the code in line 1 (in this case the parser given as an argument to the lookAhead
function) yields the following results:
-
In the βSuccessβ use case:
Good
: in the βSuccessβ use case, the parser in line 1 succeeds (this is exactly how the βSuccessβ use case is defined)?
: we donβt know whether the given parser is backtrackable or nota
: the parser is of typeP.Parser a
, so I use βaβ to represent the parsed value
-
In the βFailureβ use case:
Bad
: in the βFailureβ use case, the parser in line 1 fails (this is exactly how the βFailureβ use case is defined)?
: we donβt know whether the given parser is backtrackable or note
: I use βeβ to represent the error that the given parser produces in the βFailureβ use case
Two Kinds of Bad
In the βSuccessβ use case, the parser might have consumed part of the input. The only way to undo that is to turn success into failure. We can do this with P.andThen
and P.problem
:
Code:
1 parser
2 |> P.andThen (\_ -> P.problem "")
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
1 β parser β Good ? a β Bad ? e β
2 β |> P.andThen (\_ -> P.problem "") β Bad ? "" β Bad ? e β
OK, now we have a parser which doesnβt consume anything, but we get a Bad
result for both use cases, and we need to be able to distinguish between them. How can we do that?
Idea 1:
Could we just use a special error text?
No. The behavior of a parser doesnβt depend on the error value.
Idea 2:
Could we make one backtrackable and the other not?
Yes, this could work, so letβs try itβ¦
First, we need a defined state for the backtrackable flag. We donβt know whether the given parser is backtrackable or not, so letβs make it definitely backtrackable first:
Code:
1 parser
2 |> P.backtrackable
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
1 β parser β Good ? a β Bad ? e β
2 β |> P.backtrackable β Good True a β Bad True e β
Now we can modify the βSuccessβ use case by making it (and only it) not backtrackable with P.andThen
and P.commit
, and then turn success into failure as we did above:
Code:
1 parser
2 |> P.backtrackable
3 |> P.andThen (\_ -> P.commit ())
4 |> P.andThen (\_ -> P.problem "")
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
1 β parser β Good ? a β Bad ? e β
2 β |> P.backtrackable β Good True a β Bad True e β
3 β |> P.andThen (\_ -> P.commit ()) β Good False () β Bad True e β
4 β |> P.andThen (\_ -> P.problem "") β Bad False "" β Bad True e β
Fine. We accomplished the first goal: to undo the input consumption in both use cases. Plus, we are still able to distinguish between them.
But is that really true?
Tell Bad From Bad
Letβs see how can we turn the two kinds of Bad
into different results. For now, it doesnβt matter what these results are exactly, so Iβll leave the details open and just use
{result S}
for the result in the βSuccessβ use case{parser S}
for the parser used to generate theresult S
value{result F}
for the result in the βFailureβ use case{parser F}
for the parser used to generate theresult F
value
We can change any backtrackable Bad
using P.oneOf
. The backtrackable Bad
is the Bad
of the βFailureβ use case, so we use {parser F}
in P.oneOf
:
Code:
1 P.oneOf
2 [ parser
3 |> P.backtrackable
4 |> P.andThen (\_ -> P.commit ())
5 |> P.andThen (\_ -> P.problem "")
6 , {parser F}
7 ]
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
2 β parser β Good ? a β Bad ? e β
3 β |> P.backtrackable β Good True a β Bad True e β
4 β |> P.andThen (\_ -> P.commit ()) β Good False () β Bad True e β
5 β |> P.andThen (\_ -> P.problem "") β Bad False "" β Bad True e β
β β β β
6 β {parser F} β β {result F} β
β β β β
1 β P.oneOf [ {2-5}, {6} ] β Bad False "" β {result F} β
I hope the notation is clear:
-
line 6 is only relevant in the βFailureβ use case, so I left the βSuccessβ column empty
-
P.oneOf
in line 1 combines the parsers of lines 2-5 and of line 6. I wrote this in the βcodeβ column as β{2-5}β and β{6}β respectively.
Now to the βSuccessβ use case. Weβll use another P.oneOf
, but before we can do this we have to make the Bad
of the βSuccessβ use case backtrackable again:
Code:
1 P.oneOf
2 [ P.oneOf
3 [ parser
4 |> P.backtrackable
5 |> P.andThen (\_ -> P.commit ())
6 |> P.andThen (\_ -> P.problem "")
7 , {parser F}
8 ]
9 |> P.backtrackable
10 , {parser S}
11 ]
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
3 β parser β Good ? a β Bad ? e β
4 β |> P.backtrackable β Good True a β Bad True e β
5 β |> P.andThen (\_ -> P.commit ()) β Good False () β Bad True e β
6 β |> P.andThen (\_ -> P.problem "") β Bad False "" β Bad True e β
β β β β
7 β {parser F} β β {result F} β
β β β β
2 β P.oneOf [ {3-6}, {7} ] β Bad False "" β {result F} β
9 β |> P.backtrackable β Bad True "" β {result F} β
β β β β
10 β {parser S} β {result S} β β
β β β β
1 β P.oneOf [ {2-9}, {10} ] β {result S} β {result F} β
Great. Using two nested P.oneOf
, we manage to undo the input consumption in both use cases and still can return different results for them.
What should these results be?
Final Results
Letβs start with the βSuccessβ use case. If the given parser succeeds, the lookAhead parser should succeed, too, so we need a Good
result.
I wanted the result to be backtrackable (Good True
), because this seems to be appropriate for a parser that is conducting a test (lookAhead) rather than actually parsing something.
For the value of the result, we canβt use the value of the given parser, because we loose it when we turn success into failure with P.problem
. Therefore we simply use the βunitβ value ()
.
So the desired result is Good True ()
, which can be easily achieved with P.succeed
:
Code:
1 P.succeed ()
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
1 β P.succeed () β Good True () β β
And now the βFailureβ use case. If the given parser fails, the lookAhead parser should fail, too, so we need a Bad
result.
This is simple: we can just use the given parser. In the βFailureβ use case, it returns the result Bad ? e
. (As a bonus for reusing the given parser, the lookAhead parser returns the same error βe
β in the βFailureβ use case.)
But we have to make two modifications: first, we want the result to be backtrackable as in the βSuccessβ use case, giving us Bad True e
.
Second, we have to match the type of the βSuccessβ use case parser: P.Parser ()
. The given parser is of type P.Parser a
, so we have to P.map
it to the desired type:
Code:
1 parser
2 |> P.backtrackable
3 |> P.map (\_ -> ())
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
1 β parser β β Bad ? e β
2 β |> P.backtrackable β β Bad True e β
3 β |> P.map (\_ -> ()) β β Bad True e β
(Line 3 changes the type of the parser.)
Good. Now that we know what weβd like to return, letβs put this into the placeholders of the previous code!
All Together Now
Hereβs the code from above with the placeholders replaced by the parsers we just found:
Code:
1 P.oneOf
2 [ P.oneOf
3 [ parser
4 |> P.backtrackable
5 |> P.andThen (\_ -> P.commit ())
6 |> P.andThen (\_ -> P.problem "")
7 , parser
8 |> P.backtrackable
9 |> P.map (\_ -> ())
10 ]
11 |> P.backtrackable
12 , P.succeed ()
13 ]
Lines 7-9 contain the parser for the βFailureβ use case, line 12 the parser for the βSuccessβ use case.
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
3 β parser β Good ? a β Bad ? e β
4 β |> P.backtrackable β Good True a β Bad True e β
5 β |> P.andThen (\_ -> P.commit ()) β Good False () β Bad True e β
6 β |> P.andThen (\_ -> P.problem "") β Bad False "" β Bad True e β
β β β β
7 β parser β β Bad ? e β
8 β |> P.backtrackable β β Bad True e β
9 β |> P.map (\_ -> ()) β β Bad True e β
β β β β
2 β P.oneOf [ {3-6}, {7-10} ] β Bad False "" β Bad True e β
11 β |> P.backtrackable β Bad True "" β Bad True e β
β β β β
12 β P.succeed () β Good True () β β
β β β β
1 β P.oneOf [ {2-11}, {12} ] β Good True () β Good True () β
Hmm. Do you see the problem? The βSuccessβ use case works, but the βFailureβ use case doesnβt. It doesnβt yield the desired result.
Why is this happening?
The problem is that the P.oneOf
in line 1 acts on every backtrackable Bad
result. If you look at the results of line 11, you can see that we have backtrackable Bad
results in both use cases, and therefore we get the result of line 12 in both use cases
It wouldnβt help if we chose a not backtrackable Bad
as the result of line 9, because line 11 would make it backtrackable again.
And Then Bool Comes to the Rescue
The only thing we can do is to use a Good
result in the βFailureβ use case, which wouldnβt be changed by the P.oneOf
in line 1. We then have to turn it into the desired Bad
result in a second step.
Changing a Good
result into something else can be done with P.andThen
. So the idea is to add P.andThen
at the end of the code and - depending on the value of the input parameter - return the desired result.
What should we pass to P.andThen
? Why not simply pass βTrueβ in the βSuccessβ use case and βFalseβ in the βFailureβ use case? Then we can use an if
expression to return the desired result.
Letβs see how this looks:
Code:
1 P.oneOf
2 [ P.oneOf
3 [ parser
4 |> P.backtrackable
5 |> P.andThen (\_ -> P.commit ())
6 |> P.andThen (\_ -> P.problem "")
7 , P.succeed False
8 ]
9 |> P.backtrackable
10 , P.succeed True
11 ]
12 |> P.andThen
13 (\success ->
14 if success then
15 P.succeed ()
16 else
17 parser
18 |> P.backtrackable
19 |> P.map (\_ -> ())
20 )
In line 7, we simply return False
. The parser for the final result in the βFailureβ use case is in lines 17-19. Analogously, for the βSuccessβ use case, we return True
in line 10, and put the parser for the final result in line 15.
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
3 β parser β Good ? a β Bad ? e β
4 β |> P.backtrackable β Good True a β Bad True e β
5 β |> P.andThen (\_ -> P.commit ()) β Good False () β Bad True e β
6 β |> P.andThen (\_ -> P.problem "") β Bad False "" β Bad True e β
β β β β
7 β P.succeed False β β Good True False β
β β β β
2 β P.oneOf [ {3-6}, {7} ] β Bad False "" β Good True False β
9 β |> P.backtrackable β Bad True "" β Good True False β
β β β β
10 β P.succeed True β Good True True β β
β β β β
1 β P.oneOf [ {2-9}, {10} ] β Good True True β Good True False β
β β β β
15 β P.succeed () β Good True () β β
β β β β
17 β parser β β Bad ? e β
18 β |> P.backtrackable β β Bad True e β
19 β |> P.map (\_ -> ()) β β Bad True e β
β β β β
12 β {1-11} |> P.andThen {13-19} β Good True () β Bad True e β
OK, this looks good: the result of the P.oneOf
in lines 1-11 (the first step) is Good True True
in the βSuccessβ use case and Good True False
in the βFailureβ use case. We change this in the P.andThen
in lines 12-20 (the second step) into the desired results.
And indeed, the code works! It passes all tests!
Butβ¦ weβre not finished, yet.
Code Golfing
The code works, but it isnβt exactly the code I came up with. If I remember correctly, I never had a version that looked like the current code. So letβs see how we can transform the current code into the final version.
The current code roughly has the following structure:
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β Parser a -> Parser Bool β >> P.andThen β Bool -> Parser () β
We can interpret the first step as a function from Parser a
to Parser Bool
. The second step is a function from Bool
to Parser ()
. Both functions are combined with >> P.andThen
to finally yield a function from Parser a
to Parser ()
.
Letβs first generalize the input of the first step and the output of the second step:
Parser a
βinput
Parser ()
βoutput
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β input -> Parser Bool β >> P.andThen β Bool -> output β
Now the first step is a function from a general input
type to Parser Bool
, the second step is a function from Bool
to output
, so the combination of both steps is a function from input
to output
.
If we wouldnβt return a Parser Bool
but only a Bool
in the first step, we could use the forward function composition operator >>
alone:
Parser Bool
βBool
>> P.andThen
β>>
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β input -> Bool β >> β Bool -> output β
Now the first step is a function which processes the input
parameter into a True
or a False
value. The second step takes this Bool
value and returns either a value for the True
case or another value for the False
case.
Normally I wouldnβt split a function like that into two steps:
(\input -> ..... True ..... False .....)
>>
(\flag -> if flag then trueValue else falseValue)
Iβd rather directly use trueValue
instead of True
and falseValue
instead of False
:
\input -> ..... trueValue ..... falseValue .....
Then I wouldnβt need a second step at all:
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β input -> output β β β
Unfortunately, as we have seen above, we canβt use the falseValue
(the parser for the βFailureβ use case) directly. Therefore we had to introduce a second step, and thereβs no way around it. But we can still omit the Bool
s, even with a two-step implementation:
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β input -> output β >> β output -> output β
In code:
(\input -> ..... trueValue ..... falseValue .....)
>>
identity
By using trueValue
and falseValue
directly, we can simply use the identity
function as the second step.
Can we do this with our Parser
code, too?
Yes we can. We just have to reverse the transformations we did aboveβ¦
First, we re-introduce Parser
s, both around the return type of the first step and in the combinator:
output
βParser output
>>
β>> P.andThen
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β input -> Parser output β >> P.andThen β output -> output β
Second, we re-introduce the real input and output types:
input
βParser a
output
βParser ()
β Step 1 β Combinator β Step 2 β
ββββββββββββββββββββββββββββββββββΌβββββββββββββββΌβββββββββββββββββββββββββ€
β Parser a -> Parser (Parser ()) β >> P.andThen β Parser () -> Parser () β
(Iβll briefly talk about the weird result type of the first step in the closing remarksβ¦)
This gives us the final version:
Code:
1 P.oneOf
2 [ P.oneOf
3 [ parser
4 |> P.backtrackable
5 |> P.andThen (\_ -> P.commit ())
6 |> P.andThen (\_ -> P.problem "")
7 , P.succeed
8 (parser
9 |> P.backtrackable
10 |> P.map (\_ -> ())
11 )
12 ]
13 |> P.backtrackable
14 , P.succeed
15 (P.succeed ())
16 ]
17 |> P.andThen identity
In lines 8-10, I replaced False
with falseValue
, which is the parser for the βFailureβ use case. In line 15, I replaced True
with trueValue
, the parser for the βSuccessβ use case. And in line 17, I replaced the second step with the identity
function.
Results:
# β Code β Success β Failure β
ββββΌβββββββββββββββββββββββββββββββββββββΌβββββββββββββββββΌβββββββββββββββββββ€
3 β parser β Good ? a β Bad ? e β
4 β |> P.backtrackable β Good True a β Bad True e β
5 β |> P.andThen (\_ -> P.commit ()) β Good False () β Bad True e β
6 β |> P.andThen (\_ -> P.problem "") β Bad False "" β Bad True e β
β β β β
8 β parser β β Bad ? e β
9 β |> P.backtrackable β β Bad True e β
10 β |> P.map (\_ -> ()) β β Bad True e β
β β β β
7 β P.succeed {8-10} β β Good True {8-10} β
β β β β
2 β P.oneOf [ {3-6}, {7-11} ] β Bad False "" β Good True {8-10} β
13 β |> P.backtrackable β Bad True "" β Good True {8-10} β
β β β β
15 β P.succeed () β Good True () β β
β β β β
14 β P.succeed {15} β Good True {15} β β
β β β β
1 β P.oneOf [ {2-13}, {14-15} ] β Good True {15} β Good True {8-10} β
17 β |> P.andThen identity β Good True () β Bad True e β
This still works, and this is the code I came up with back then
Itβs no wonder no one understood that codeβ¦
Closing Remarks
You might be puzzled by the result type of the first step: Parser (Parser ())
. A parser that creates another parser ???
In lines 7 and 14, we simply wrap the parsers for the βFailureβ and βSuccessβ use cases with P.succeed
. Therefore, the resulting Parser (Parser ())
doesnβt do any parsing at all. Itβs only use is to yield a Good
result in both use cases, as described above.
The second step of the final version is
step2 =
P.andThen identity
I didnβt show the type signature of this function on purpose. You can use the Elm REPL or maybe your IDE to show you the type.
You can use this pattern with every type that has an andThen
function, for example with Maybe
, Result
, or Json.Decode.Decoder
. Those types are often called βMonadsβ in functional programming, and the pattern shown is called join.
The combinator used to connect the two steps is
combinator f1 f2 =
f1 >> P.andThen f2
Itβs type signature is also interesting, and again, you can create such a function for other types with an andThen
function, too. In Haskell, the function is written as >=> and called βLeft-to-right composition of Kleisli arrowsββ¦
Thank you for reading this far. If you have any questions or comments, feel free to leave them below. You can also find me as @Pit on the Elm slack.