Learn about useful chompers that can help you write better parsers.
Interesting post.
I love parser combinators, I’ve learnt a bit about them via OCaml (just dabbling).
One thing I noticed is that it’s pretty much the only time where I’d make sens to me to actually introduce a runtime error, to make the program fail loud and clear (when something goes horribly horribly wrong).
Because it seems to me that some operations can’t be expressed correctly by the type system when parsing.
For instance, in your decimal function, you use Maybe.withDefault
to satisfy the type system’s constraints. But that constraint is a little bit artificial, because you do know that only floats will come out of chompDecimal
.
decimal : Parser Float
decimal =
chompDecimal
|> P.mapChompedString (\s () -> String.toFloat s |> Maybe.withDefault 0)
|> lexeme
So in this specific case, it feels to me that satisfying the type system actually gives no value (it’s extra work for zero benefit)
This is what I’d prefer to write for clarity. But even then I’m forced to represent that same branch that’ll actually never trigger.
decimal : Parser Float
decimal =
chompDecimal
|> P.getChompedString
|> P.andThen
(\s ->
case String.toFloat s of
Just n ->
P.succeed n
Nothing ->
P.problem "impossible"
)
Any thoughts?
I understand your point and it would be nice if there was a way around it in Elm. But there isn’t so I don’t lose sleep over it.
When I was writing elm-natural
I designed the API so that the Maybe.withDefault
was hidden in cases where the user knew for certain they had valid input. See for e.g. fromSafeInt and fromSafeString.
So if you were to write a natural
parser it could look like:
natural : Parser Natural
natural =
chompNatural
|> P.mapChompedString (\s () -> Natural.fromSafeString s)
|> lexeme
BTW, I like your decimal
version better because in the off chance the conversion does fail you get a clear parser error (assuming you improve the problem message).
Of course, it’s not a “big” problem I wouldn’t loose sleep over it either
I thought I’d post because maybe someone knows a trick or two I could learn about.
You say you don’t know a way around this in Elm, but presumably there is in another language?
I’m wondering because from the type system’s perspective, all makes sens, but we are left with something not ideal in the end.
fromSafeString
looks like a useful convention, I like it.
I thought about it a bit more and I think we can also generalize the problem like this:
{-| `fromMaybe` assumes that applying `f` will never return `Nothing`.
If it does, the parsing fails.
-}
fromMaybe : (a -> Maybe b) -> a -> Parser b
fromMaybe f =
f
>> Maybe.map P.succeed
>> Maybe.withDefault (P.problem "Bug: f produced Nothing!")
decimal : Parser Float
decimal =
chompDecimal
|> P.getChompedString
|> P.andThen (fromMaybe String.toFloat)
Then I feel things are slightly better that way.
Yes, in Haskell you can do it with a partial function. For e.g. if you know for sure you’re dealing with a non-empty list at some point in your application then you can use head without having to handle a Maybe
result.
head :: [a] -> a
head (x:_) = x
Thank you for the article, Dwayne.
If you want to improve performance even more, you could use the following approach:
- Use
Parser.getOffset
andParser.getSource
and thenString.dropLeft
to get the string that hasn’t been parsed yet - Implement the checks of the chompers without using the
Parser
module, for example via loops withString.uncons
- In the case of success, advance the parser with
Parser.token
, giving it the parsed text as the token
How to handle the failure case depends upon which error you want to generate at which position. One solution would be to advance the parser to the failing position with Parser.token
like in the success case, followed by a Parser.chompIf isGood
.
If you don’t need the error to be reported at the exact first failing position, I found that you can improve performance even more for chompers with a given lower bound (chompAtLeast
, chompBetween
, and chompExactly
) by using String.all isGood
for the part up to the lower bound instead of a loop.
This is just a very brief description. I can send you a sample implementation, if you like.
Thank you again for the article and the code.
Thanks for finding these interesting ways to improve performance. I’d have to try them out myself at some point but also feel free to share your sample implementation.
This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.