[SOLVED] Elm/Parser - Parsing keyword in a case insensitive way

Working on building a parser for a spec where certain keywords are case-insensitive. For instance: “EXT” “ext” & “EXt” are all equivalent.

I’m struggling to find an easy way to parse them in a way that doesn’t lead to a lot of backtracking (which, to my understanding, hurts performance – performance is critical for me).

Here’s what I’ve tried:

  1. oneOf [keyword "EXT", keyword "ext", keyword "EXt", etc..] - Works, but I’m worried about the amount of backtracking, especially since it’s already a nested oneOf, and since having over 100 of these per document would be perfectly normal.

  2. Writing my own caseInsensitiveKeyword function, by copy and pasting the code from the Elm/Parser library – This doesn’t work because I have to define my own caseInsensitiveToken as well, and this required using the Parser constructor, which isn’t exposed.

I believe I could achieve this with chomping and checking, but this is not ideal, as there are about 8 or 9 variations I’d have to create.

Does anyone have thoughts on an easy way to do this that doesn’t sacrifice too much performance? I’d prefer not to switch to regex, since I’m hoping to provide Elm-like nice error messages (for people writing screenplays).

I think using chompIf is really the only option here.
Something like this should work and be quite fast:

succeed ()
    |. chompIf (\c -> Char.toLower c == 'e')
    |. chompIf (\c -> Char.toLower c == 'x')
    |. chompIf (\c -> Char.toLower c == 't')
    -- parse space to distinguish between keyword and names like `exts` 
    |. chompIf isWhitespace 

If you have many such keywords then

caseInsensitiveKeyword : String -> Parser ()
caseInsensitiveKeyword kw = 
    let folder elem accum = 
            accum 
                |. chompIf (\c -> Char.toLower c == Char.toLower elem)
    in
    List.foldl folder (succeed ()) (String.toList kw)
        |. chompIf isWhitespace

Using Parser.Advanced and inContext you can leave enough information to create nice error messages (so “I expected a keyword but got …” instead of “I expected e or E but got …”).

2 Likes

Didn’t even consider folding. Yes, this will work well. Thanks.

I believe that you still need to put them as backtrackable in case several parsers accept the same beginning, for example I think that:

test : Parser ()
test =
    oneOf
        [ caseInsensitiveKeyword "ext"
        , caseInsensitiveKeyword "exs"
        ]

run test "eXs "

will not work, however:

test : Parser ()
test =
    oneOf
        [ backtrackable (caseInsensitiveKeyword "ext")
        , caseInsensitiveKeyword "exs"
        ]

will.

As an aside, caseInsensitiveToken could be written with a single backtrackable like this (you have to pass the case insensitive token in lower case and it uses loop for tail-call-elimination):

iToken : String -> Parser ()
iToken token =
    backtrackable (loop token iTokenHelp)


iTokenHelp : String -> Parser (Step String ())
iTokenHelp chars =
    case String.uncons chars of
        Just ( char, remainingChars ) ->
            oneOf
                [ succeed (Loop remainingChars)
                    |. chompIf (\c -> Char.toLower c == char)
                , problem ("Expected case insensitive \"" ++ chars ++ "\"")
                ]

        Nothing ->
            succeed <| Done ()

It will also stop as soon as a character comparison fails, which is not the case with the (clever) foldl if I’m not wrong.

For example https://ellie-app.com/4MbL9nYFZRra1

1 Like

Both @folkertdev and @dmy were correct. Have it working, needed both solutions. Thanks to both of you. Mark as solved.

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.