Elm/parser look ahead without backtrackable


#1

Problem

While trying to learn about elm/parser I decided to try and write a parser for urls. Because a lot of parts of a url are optional, I have to be able to look ahead in the given url to decide whether to go down a path or not. For example a url can start with or without a protocol, and as far as I know the deciding factor is whether or not a // is present.

Background

While digging through the documentation I found out about the oneOf (docs) function which I thought could be used for this problem. However, this function will only try other parsers if a previous parser hasn’t chomped any characters. Considering the parser for the “protocol” and the “auth” section of a url are both just strings, it will always pick the parser for the “protocol”.

The docs for the oneOf function also give a link to a document which mentions backtracking. This sounds like exactly the thing I need, however that document also mentions that if you shuffle your parsers around, you do not need any backtrackable parsers at all. Which brings me to my question.

Question

How would one write a parser with signature Parser (Maybe Protocol) without a backtrackable parser?

Extras

  • I’ve created an Ellie for this little project which can be found here

#2

I think the important part is that you should limit the amount of backtracking that can occur. Doing it for a couple of characters is fine.

In this case that will probably mean that you have something like

oneOf 
    [ backtrackable 
        ( succeed identity
            |. chompIf (\c -> c == '/')
            |. chompIf (\c -> c == '/')
            |. (commit ())
            |. restOfIt 
    , otherOption
    ]

Note this will not run, the argument and use of commit really depends on your scenario

So, this can backtrack if the two //s are not found, but if everything after that fails it will not backtrack.

You can also flip this around to try to parse the protocol name

oneOf
    [ backtrackable protocolName |> Decode.map Just
    , Decode.succeed Nothing
    ]

In both cases, the scope of the backtracking is very limited and it will be fine.
If you have an actual grammar you’re working off then it might be possible to do better (in general) but in this specific case I’m not sure that is possible (or particularly desirable; fast enough is fast enough)


#3

I definitely agree with you that in this scenario a backtracking solution is the way to go.

However, for the purpose of learning, I was wondering what the solution would look like without backtracking. In my understanding from the mentioned document about backtracking (See here), it should be possible to create a parser that doesn’t require the use of backtracking at all.
This statement in particular seems to hint at that possibility:

If you are strategic in shuffling parsers around, you can write parsers that do not need backtrackable at all.

I might be misunderstanding this sentence, so if I am, please let me know.

Even if my initial understanding was correct, it might still be a total overkill to use non backtrackable parsers for this trivial problem. But like I mentioned earlier, this project is strictly for learning purposes.


#4

I understand your confusion concerning the statement that you don’t need backtracking. In my understanding this just outlines that one should try to avoid backtracking as often as possible. In the worst case backtracking can lead to exponential runtime performance and should be avoided. But avoiding backtracking leads to code that is not as easy to understand and write. Therefore this is always a tradeoff between runtime performance and time to write the code.

There are two useful tips for you: The first is that the symbol parser just consumes (chompes characters) if it succeeds. So for example symbol "abc" will fail on input "ab" and will not consume any text. Therefore oneOf [symbol "abc", parser2] will try parser2 on this input. To learn more about the semantics of the parsers (inparticular commit, backtrackable, oneOf) I suggest to read https://github.com/elm/parser/blob/master/semantics.md .

The second tip is to delay any decision what you parse as long as possible. For example if you are not sure whether you are parsing a protocol or not you just “save” what you parsed until you are sure what to do.
In an other perspective you can imagine this as merging common paths between the parsers.

Going back to your example. I would suggest the following solution (just see this as a prototype)

protocolOrUsernameParser = 
   succeed ()
      |. chompUntil ":"
      |. symbol ":"
      |. oneOf
          [ succeed ()
              |. symbol "//"  
              |. -- now you can be sure that this a protocol 
          , -- if you are here you can be sure it is user
          ]

Aside: I don’t think that it is true in general that every parser can be written without backtrackble. However and this important almost all languages that are used don’t need it. Therefore you should always try to avoid it.


closed #5

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.