Useful Chompers

Useful Chompers

Learn about useful chompers that can help you write better parsers.

13 Likes

Interesting post.

I love parser combinators, I’ve learnt a bit about them via OCaml (just dabbling).

One thing I noticed is that it’s pretty much the only time where I’d make sens to me to actually introduce a runtime error, to make the program fail loud and clear (when something goes horribly horribly wrong).

Because it seems to me that some operations can’t be expressed correctly by the type system when parsing.

For instance, in your decimal function, you use Maybe.withDefault to satisfy the type system’s constraints. But that constraint is a little bit artificial, because you do know that only floats will come out of chompDecimal.

decimal : Parser Float
decimal =
    chompDecimal
        |> P.mapChompedString (\s () -> String.toFloat s |> Maybe.withDefault 0)
        |> lexeme

So in this specific case, it feels to me that satisfying the type system actually gives no value (it’s extra work for zero benefit)

This is what I’d prefer to write for clarity. But even then I’m forced to represent that same branch that’ll actually never trigger.

decimal : Parser Float
decimal =
    chompDecimal
        |> P.getChompedString
        |> P.andThen
            (\s ->
                case String.toFloat s of
                    Just n ->
                        P.succeed n

                    Nothing ->
                        P.problem "impossible"
            )

Any thoughts?

I understand your point and it would be nice if there was a way around it in Elm. But there isn’t so I don’t lose sleep over it.

When I was writing elm-natural I designed the API so that the Maybe.withDefault was hidden in cases where the user knew for certain they had valid input. See for e.g. fromSafeInt and fromSafeString.

So if you were to write a natural parser it could look like:

natural : Parser Natural
natural =
    chompNatural
        |> P.mapChompedString (\s () -> Natural.fromSafeString s)
        |> lexeme

BTW, I like your decimal version better because in the off chance the conversion does fail you get a clear parser error (assuming you improve the problem message).

1 Like

Of course, it’s not a “big” problem I wouldn’t loose sleep over it either :wink:

I thought I’d post because maybe someone knows a trick or two I could learn about.

You say you don’t know a way around this in Elm, but presumably there is in another language?

I’m wondering because from the type system’s perspective, all makes sens, but we are left with something not ideal in the end.

fromSafeString looks like a useful convention, I like it.

I thought about it a bit more and I think we can also generalize the problem like this:

{-| `fromMaybe` assumes that applying `f` will never return `Nothing`. 
If it does, the parsing fails.
-}
fromMaybe : (a -> Maybe b) -> a -> Parser b
fromMaybe f =
    f
        >> Maybe.map P.succeed
        >> Maybe.withDefault (P.problem "Bug: f produced Nothing!")


decimal : Parser Float
decimal =
    chompDecimal
        |> P.getChompedString
        |> P.andThen (fromMaybe String.toFloat)

Then I feel things are slightly better that way.

1 Like

Yes, in Haskell you can do it with a partial function. For e.g. if you know for sure you’re dealing with a non-empty list at some point in your application then you can use head without having to handle a Maybe result.

head :: [a] -> a
head (x:_) = x
1 Like

Thank you for the article, Dwayne.

If you want to improve performance even more, you could use the following approach:

  • Use Parser.getOffset and Parser.getSource and then String.dropLeft to get the string that hasn’t been parsed yet
  • Implement the checks of the chompers without using the Parser module, for example via loops with String.uncons
  • In the case of success, advance the parser with Parser.token, giving it the parsed text as the token

How to handle the failure case depends upon which error you want to generate at which position. One solution would be to advance the parser to the failing position with Parser.token like in the success case, followed by a Parser.chompIf isGood.

If you don’t need the error to be reported at the exact first failing position, I found that you can improve performance even more for chompers with a given lower bound (chompAtLeast, chompBetween, and chompExactly) by using String.all isGood for the part up to the lower bound instead of a loop.

This is just a very brief description. I can send you a sample implementation, if you like.

Thank you again for the article and the code.

1 Like

Thanks for finding these interesting ways to improve performance. I’d have to try them out myself at some point but also feel free to share your sample implementation.

Done :slightly_smiling_face:

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.