Here’s a little parsing problem I ran into while writing a parser for EDN (I’m planning to announce that package soon). I used elm-tools/parser, but failed to find a good solution without adding a look-ahead primitive. I’m curious whether there’s a nice way to solve this with delayedCommit, or in some other way.
Here’s a reduced version: we want to parse a nested data structure of Lisp-like lists and integers, where the list parentheses are “self-delimited” (for lack of a better term), meaning they separate tokens without needing any whitespace around them:
    type Thing = Number Int | Things (List Thing)
    (1 2 3) == ( 1 2 3 )   --> Things [ Number 1, Number 2, Number 3 ]
    ((1) 2) == ( (1)2 )    --> Things [ Things [ Number 1 ], Number 2 ]
    (()1()) == ( () 1 () ) --> Things [ Things [], Number 1, Things [] ]
With look-ahead, we can make a parser for numbers that ensures the number is delimited:
    import Parser as P exposing ((|.), (|=), Parser)

    -- added look-ahead primitive (body omitted here): run a parser, then rewind input
    lookAhead : Parser a -> Parser a

    -- a separator is either some spaces (consumed) or an upcoming parenthesis (only peeked at)
    sep : Parser ()
    sep =
        P.oneOf
            [ P.ignore P.oneOrMore (\c -> c == ' ')
            , lookAhead (P.oneOf [ P.symbol "(", P.symbol ")" ])
            ]

    number : Parser Thing
    number =
        P.succeed Number
            |= P.int
            |. sep
and put the whole thing together with a list parser:
    thing : Parser Thing
    thing =
        P.oneOf [ number, things ]

    things : Parser Thing
    things =
        P.succeed Things
            |. P.symbol "("
            |= P.repeat P.zeroOrMore thing
            |. P.symbol ")"
(This doesn’t quite work as written: it doesn’t eat all the optional whitespace, and the mutually recursive thing and things are missing the lazy they need. Here’s a complete version.)
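To make the intended behaviour of lookAhead concrete, here is a minimal sketch on a toy parser model. The Toy alias and this version of symbol are made up for illustration and are not how elm-tools/parser represents parsers internally; the only point is that a successful look-ahead returns the inner result while leaving the input untouched.

```elm
module LookAheadSketch exposing (..)

-- Toy model (an assumption for illustration, not the elm-tools/parser
-- representation): a parser takes the remaining input and either fails
-- or returns a value together with the input that is left over.
type alias Toy a =
    String -> Maybe ( a, String )


-- Hypothetical helper: match a literal prefix and consume it.
symbol : String -> Toy ()
symbol s input =
    if String.startsWith s input then
        Just ( (), String.dropLeft (String.length s) input )

    else
        Nothing


-- Run `p` and keep its result, but hand back the original input,
-- so whatever `p` matched is still there for the next parser.
lookAhead : Toy a -> Toy a
lookAhead p input =
    case p input of
        Just ( value, _ ) ->
            Just ( value, input )

        Nothing ->
            Nothing
```

In this model, symbol ")" applied to ")rest" leaves "rest" behind, whereas lookAhead (symbol ")") applied to the same input succeeds and still leaves ")rest". That second behaviour is what lets a closing parenthesis terminate the number and then also close the list.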
I think the core issue I’m running into is that I need the closing parenthesis both to terminate the number and to terminate the list. When I tried to do this without look-ahead, my number parser had to return both the number and the closing token, which made things … messy.
I hope I haven’t broken the problem down too far to illustrate the issue! I’m very curious whether you have any suggestions for how to tackle this.