Working on building a parser for a spec where certain keywords are case-insensitive. For instance: “EXT” “ext” & “EXt” are all equivalent.
I’m struggling to find an easy way to parse them in a way that doesn’t lead to a lot of backtracking (which, to my understanding, hurts performance – performance is critical for me).
Here’s what I’ve tried:
oneOf [keyword "EXT", keyword "ext", keyword "EXt", etc..] - Works, but I’m worried about the amount of backtracking, especially since it’s already a nested oneOf, and since having over 100 of these per document would be perfectly normal.
Writing my own caseInsensitiveKeyword function, by copy and pasting the code from the Elm/Parser library – This doesn’t work because I have to define my own caseInsensitiveToken as well, and this required using the Parser constructor, which isn’t exposed.
I believe I could achieve this with chomping and checking, but this is not ideal, as there are about 8 or 9 variations I’d have to create.
Does anyone have thoughts on an easy way to do this that doesn’t sacrifice too much performance? I’d prefer not to switch to regex, since I’m hoping to provide Elm-like nice error messages (for people writing screenplays).
I think using chompIf is really the only option here.
Something like this should work and be quite fast:
succeed ()
|. chompIf (\c -> Char.toLower c == 'e')
|. chompIf (\c -> Char.toLower c == 'x')
|. chompIf (\c -> Char.toLower c == 't')
-- parse space to distinguish between keyword and names like `exts`
|. chompIf isWhitespace
Using Parser.Advanced and inContext you can leave enough information to create nice error messages (so “I expected a keyword but got …” instead of “I expected e or E but got …”).
I believe that you still need to put them as backtrackable in case several parsers accept the same beginning, for example I think that:
test : Parser ()
test =
oneOf
[ caseInsensitiveKeyword "ext"
, caseInsensitiveKeyword "exs"
]
run test "eXs "
will not work, however:
test : Parser ()
test =
oneOf
[ backtrackable (caseInsensitiveKeyword "ext")
, caseInsensitiveKeyword "exs"
]
will.
As an aside, caseInsensitiveToken could be written with a single backtrackable like this (you have to pass the case insensitive token in lower case and it uses loop for tail-call-elimination):