[SOLVED] Elm/Parser - Parsing keyword in a case insensitive way

Dan_Abrams · February 19, 2019, 1:06pm

I’m building a parser for a spec where certain keywords are case-insensitive. For instance, “EXT”, “ext”, and “EXt” are all equivalent.

I’m struggling to find an easy way to parse them in a way that doesn’t lead to a lot of backtracking (which, to my understanding, hurts performance – performance is critical for me).

Here’s what I’ve tried:

1. oneOf [keyword "EXT", keyword "ext", keyword "EXt", etc.]: works, but I’m worried about the amount of backtracking, especially since it’s already inside a nested oneOf, and since having over 100 of these per document would be perfectly normal.
2. Writing my own caseInsensitiveKeyword function by copying the code from the elm/parser library: this doesn’t work because I would also have to define my own caseInsensitiveToken, which requires the Parser constructor, and that isn’t exposed.

I believe I could achieve this with chomping and checking, but this is not ideal, as there are about 8 or 9 variations I’d have to create.
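For reference, the "chomping and checking" idea can be written once as a helper that takes the keyword as an argument, so the separate variations are not needed. This is only a minimal sketch: the module name and keywordByChomping are made up for illustration, and it assumes keywords consist of letters only.

```elm
module ChompCheck exposing (keywordByChomping)

import Parser exposing (Parser, andThen, backtrackable, chompWhile, getChompedString, problem, succeed)


{-| Hypothetical helper: chomp a whole run of letters, then compare it
to the expected keyword while ignoring case.
-}
keywordByChomping : String -> Parser ()
keywordByChomping kw =
    chompWhile Char.isAlpha
        |> getChompedString
        |> andThen
            (\word ->
                if String.toLower word == String.toLower kw then
                    succeed ()

                else
                    problem ("Expected keyword \"" ++ kw ++ "\"")
            )
        -- the letters are already consumed when the comparison fails,
        -- so backtrackable is needed for oneOf to try another branch
        |> backtrackable
```

Because the whole word is chomped before comparing, a name like "exts" does not match the keyword "ext", and no trailing whitespace is required. The cost is that every alternative inside a oneOf chomps the word again.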

Does anyone have thoughts on an easy way to do this that doesn’t sacrifice too much performance? I’d prefer not to switch to regex, since I’m hoping to provide Elm-like nice error messages (for people writing screenplays).

folkertdev · February 19, 2019, 1:46pm

I think using chompIf is really the only option here.
Something like this should work and be quite fast:

succeed ()
    |. chompIf (\c -> Char.toLower c == 'e')
    |. chompIf (\c -> Char.toLower c == 'x')
    |. chompIf (\c -> Char.toLower c == 't')
    -- require a trailing space to distinguish the keyword from names like `exts`
    -- (isWhitespace is assumed to be your own Char -> Bool helper)
    |. chompIf isWhitespace

If you have many such keywords, you can generalize this with a fold:

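caseInsensitiveKeyword : String -> Parser ()
caseInsensitiveKeyword kw =
    let
        -- add one case-insensitive character check to the parser built so far
        folder elem accum =
            accum
                |. chompIf (\c -> Char.toLower c == Char.toLower elem)
    in
    List.foldl folder (succeed ()) (String.toList kw)
        -- isWhitespace is assumed to be your own Char -> Bool helper
        |. chompIf isWhitespace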

Using Parser.Advanced and inContext you can leave enough information to create nice error messages (so “I expected a keyword but got …” instead of “I expected e or E but got …”).
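A rough sketch of what that could look like with Parser.Advanced follows. The module name and the Context and Problem types are invented for illustration, and a plain space stands in for the whitespace check, so treat it as a starting point rather than a finished design.

```elm
module KeywordContext exposing (Context(..), Problem(..), keyword)

import Parser.Advanced as A exposing ((|.), Parser)


{-| Hypothetical context type: records which keyword was being parsed. -}
type Context
    = InKeyword String


{-| Hypothetical problem type for the low-level failures. -}
type Problem
    = ExpectingChar Char
    | ExpectingSpace


keyword : String -> Parser Context Problem ()
keyword kw =
    let
        -- one case-insensitive character check per keyword character
        step expected acc =
            acc
                |. A.chompIf
                    (\c -> Char.toLower c == Char.toLower expected)
                    (ExpectingChar expected)
    in
    -- every dead end produced inside carries InKeyword kw in its contextStack
    A.inContext (InKeyword kw)
        (List.foldl step (A.succeed ()) (String.toList kw)
            |. A.chompIf (\c -> c == ' ') ExpectingSpace
        )
```

When A.run fails, each DeadEnd has a contextStack field, so the code that renders errors can look for InKeyword and report the keyword that was expected instead of the single character that failed.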

Dan_Abrams · February 19, 2019, 1:48pm

Didn’t even consider folding. Yes, this will work well. Thanks.

dmy · February 19, 2019, 2:35pm

I believe you still need to mark them as backtrackable in case several parsers accept the same beginning. For example, I think that:

test : Parser ()
test =
    oneOf
        [ caseInsensitiveKeyword "ext"
        , caseInsensitiveKeyword "exs"
        ]

run test "eXs "

will not work, however:

test : Parser ()
test =
    oneOf
        [ backtrackable (caseInsensitiveKeyword "ext")
        , caseInsensitiveKeyword "exs"
        ]

will.
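The reason is that oneOf only tries the next alternative when the failing one has not consumed any input. Here the "ext" parser has already chomped "eX" by the time it fails on 's'. Below is a self-contained sketch of both variants. The module name and the names committed and withBacktracking are made up for illustration, and a plain space check stands in for isWhitespace.

```elm
module BacktrackDemo exposing (committed, withBacktracking)

import Parser exposing ((|.), Parser, backtrackable, chompIf, oneOf, succeed)


caseInsensitiveKeyword : String -> Parser ()
caseInsensitiveKeyword kw =
    let
        step expected acc =
            acc |. chompIf (\c -> Char.toLower c == Char.toLower expected)
    in
    List.foldl step (succeed ()) (String.toList kw)
        -- assumption: a single space ends the keyword
        |. chompIf (\c -> c == ' ')


{-| The first alternative consumes "eX" and then fails on 's',
so oneOf is committed and never tries "exs".
-}
committed : Parser ()
committed =
    oneOf
        [ caseInsensitiveKeyword "ext"
        , caseInsensitiveKeyword "exs"
        ]


{-| backtrackable resets the progress made by the first alternative,
so oneOf moves on to "exs".
-}
withBacktracking : Parser ()
withBacktracking =
    oneOf
        [ backtrackable (caseInsensitiveKeyword "ext")
        , caseInsensitiveKeyword "exs"
        ]
```

With these definitions, Parser.run committed "eXs " returns an Err, while Parser.run withBacktracking "eXs " returns Ok ().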

As an aside, caseInsensitiveToken could be written with a single backtrackable like this (you have to pass the case insensitive token in lower case and it uses loop for tail-call-elimination):

iToken : String -> Parser ()
iToken token =
    backtrackable (loop token iTokenHelp)


iTokenHelp : String -> Parser (Step String ())
iTokenHelp chars =
    case String.uncons chars of
        Just ( char, remainingChars ) ->
            oneOf
                [ succeed (Loop remainingChars)
                    |. chompIf (\c -> Char.toLower c == char)
                , problem ("Expected case insensitive \"" ++ chars ++ "\"")
                ]

        Nothing ->
            succeed <| Done ()

It will also stop as soon as a character comparison fails, which is not the case with the (clever) foldl if I’m not wrong.

For example https://ellie-app.com/4MbL9nYFZRra1

Dan_Abrams · February 20, 2019, 5:31am

Both @folkertdev and @dmy were correct. I have it working, and I needed both solutions. Thanks to both of you. Marking as solved.

