Help with a simple recursive parser


I am trying to write a parser for basic regular expressions using elm-tools/parser version 2.0.1 but I cannot. By saying “basic” I mean that I only need alternation, concatenation, iteration and groups.

I want to parse those regular expressions into the following type

type Regexp
    = Alternation (List Regexp)
    | Concatenation (List Regexp)
    | Iteration Regexp
    | Group Regexp
    | Leaf Char

So, for example, what I want to get after parsing the expression



  [ Iteration
          [ Leaf `a`
          , Leaf `b`
  , Concatenation 
      [ Group 
          Leaf `c`
      , Iteration
          Leaf `d`

Here is what I did which only partially works.

Maybe somebody can help me.

PS: In case you are curious, this is for a web-app that visualizes regex-matching algorithms step-by-step (using finite automata). I do it for my students and will make it open-source. The rest of the app is almost ready, I just need the parser.

1 Like

Hey there!

I was able to tweak the parser such that it works for all of the scenarios you had in the original snippet. I had a lot of fun playing with it, so thanks for your question! I left some comments that describe my thinking, but please let me know if you have any further questions:


Hi Kofi,

thank you for your answer! :heart:

You say that you tweaked it but you actually rewritten it, in a better way.
What I liked is that

  • you don’t use Parser.LanguageKit
  • you avoid 1-element Alternations and Concatenations

There is a problem with the operator precedence though.
The convention is that Iteration binds stronger then Concatenation.
Consider, for example, the regex ab*.
The iteration should not bind the ab but only the character b, which is not the case in your solution.

Anyway, after reading the parser documentation and your solution, I have rewritten it and I think it works correctly now.

Here it is:

Thank you again for your answer.