Code review: elm/parser for spreadsheet cell positions


#1

Hi!

From what I wrote in 7GUIs implementation in Elm, I used elm/parser and I’m not too sure I’m using it the right way, so wanted to ask for feedback (and learn!).

Inputs

I am parsing spreadsheet cell formulas, where they can look something like this (only relevant examples)

=A1
=sum(A1,B2,C3)
=sum(A1:B2,C3:C4,sum(C5,D7))

The relevant part of the ask for feedback (although I’d love to get it from all of it), is about the ambiguity between parsing a cell position like A1 and a range like A1:B1, in the middle of the input. I guess if you’ve done much parsing you already know where I’m going.

Parsers / Code to review

Position parser

source

type alias Position =
    { column : Char
    , row : Int
    }

parser : Parser Position
parser =
    succeed
        (\column row ->
            Position (String.uncons column |> Maybe.map Tuple.first |> Maybe.withDefault 'A') row
        )
        |= (getChompedString <| succeed () |. chompIf Char.isUpper)
        |= int

Although I got the job done, I still feel weird about this little parser. Particularly about the getChompedString and the “make my 1 char string a char” with String.uncons.

Position + Range parsing together

Initially, I wrote the range parser naively, and then put them up together from simpler to more complex parsing:

type alias Range =
    { from : Position, to : Position }

range : Parser Range
range =
    succeed Range
        |= Position.parser
        |. symbol ":"
        |= Position.parser

expression : Parser Expression
expression =
    oneOf
        [ map EFloat float
        , map ECoord Position.parser
        , map ERange range
        , map EApplication application
        ]

At this point, everything worked well except for the ranges, which would fail. I tried changing the order in the oneOf:

expression : Parser Expression
expression =
    oneOf
        [ map EFloat float
        , map ERange range
        , map ECoord Position.parser
        , map EApplication application
        ]

But then the positions wouldn’t work.

I don’t think I fully grasp oneOf, since it seems that it tries a parser and if it fails (like the float one) it tries the other ones, but I guess since the range parser is composed of other parsers then it is committed to that path and doesn’t go back on the oneOf.

In the end, I ended up making the first part of range backtrackable, so that if it can’t parse a : it can go back and try other parsers (I guess that’s what backtrackable does?), and I put the range parser first because it is the longest. So that it tries it out and if it doesn’t seem like a range it goes on.

range : Parser Range
range =
    succeed Range
        |= backtrackable Position.parser
        |. symbol ":"
        |= Position.parser

expression : Parser Expression
expression =
    oneOf
        [ map EFloat float
        , map ERange range
        , map ECoord Position.parser
        , map EApplication application
        ]

Questions

  • Do you have any suggestions for the position, range, or expression parsers? Anything that could be improved or done differently?
  • Is my understanding of backtrackable correct?
  • What/When do you use commit? What does it do?

Thank you!


#2

Regarding your first question:
You are right at the moment there is no primitive parser that parses one character. One alternative is writing such parser by hand with

parsePositionColumn : Parser Char
parsePoisitionColumn = 
    Parser.oneOf
        [ Parser.map (always 'A' ) ( Parser.symbol "A" )
        , Parser.map (always 'B' ) ( Parser.symbol "B" )
        ...
        ]

Towards your second question:
Your understanding of oneOf seems correct for me. OneOf only tries the next parser if the previous parser did not commit (and of course failed). “Did not commit” can be phrased as “did not consume any text”. Idea is to minimise the amount of text parsed twice. Since this makes your parser slower.

While your parser works it is not the most efficient, because you backtrack too much. I.e. if you just want to parse a position, this position gets parsed twice. You can avoid this if you have one parser that parses the position and the range. E.g.:

Parser.succeed ...
   |= Position.parser 
   |= 
      Parser.oneOf
         [ Parser.succeed Just
             |. symbol ":"
             |= Position.parser
         , Parser.succeed Nothing
         ]

You need to fill the dots (…) with a suitable function that captures both cases: That you parsed a range (then the second argument is Just) and that you parsed just one position. Note that you don’t need backtracking at all. This is the case because symbol only commits if it succeeds. I.e. if you parsed the “:” you want to parse a position next.

Understanding the semantics of this library isn’t that easy and they differ slightly from the same named functions in the other libraries (e.g. json). Therefore I recommend to read https://github.com/elm/parser/blob/1.1.0/semantics.md .


#3

I think that using a Char for the column might not be optimal for a few reasons:

  • you will be limited to 26 columns but you have the whole Int range for rows
  • depending how you will store your cells, indexing based on a Char might not be the most practical

In most spreadsheets, after the Z column there is AA, so the row is actually an integer in base 26 using the upper case alphabet (without any zero character). So I would parse a base 26 integer instead of a Char, for example with:

column : Parser Int
column =
    getChompedString (chompWhile Char.isUpper)
        |> andThen base26Int


base26Int : String -> Parser Int
base26Int str =
    base26IntHelp str 0


base26IntHelp : String -> Int -> Parser Int
base26IntHelp str sofar =
    case String.uncons str of
        Just ( c, tail ) ->
            base26IntHelp tail (26 * sofar + (1 + Char.toCode c - Char.toCode 'A'))

        Nothing ->
            succeed sofar

Note that you could use Parser.loop instead to have the parser optimized as a loop with tail-call elimination.

You can then easily get a position:

type alias Position =
    { column : Int
    , row : Int
    }

position : Parser Position
position =
    succeed Position
        |= column
        |= int

Once you have such a Position, if it is followed by : then another position, you get a range expression, else you just get a position expression, no need to use backtrackable. So you could parse it like this:

type Expression
    = EPosition Position
    | ERange Range


type alias Range =
    { from : Position
    , to : Position
    }

expression : Parser Expression
expression =
    position
        |> andThen positionOrRange


positionOrRange : Position -> Parser Expression
positionOrRange pos =
    oneOf
        [ succeed (ERange << Range pos)
            |. symbol ":"
            |= position
        , succeed (EPosition pos)
        ]

We can then test this minimal expression parser:

main : Html msg
main =
    div []
        [ p [] [ text <| Debug.toString (run expression "CZ3") ]
        , p [] [ text <| Debug.toString (run expression "R2:D2") ]
        ]
Ok (EPosition { column = 104, row = 3 })

Ok (ERange { from = { column = 18, row = 2 }, to = { column = 4, row = 2 } })

Here is the Ellie: https://ellie-app.com/4yWyZcHJM4ma1

At last, you can add other expressions in expression by using a oneOf:

expression : Parser Expression
expression =
    oneOf
        [ position |> andThen positionOrRange
        , ...
        ]

Edit: I missed the fact that you implemented the 7GUIs problems that only require columns A to Z. Anyway, you can limit to one letter the parser by using chompIf instead of chompWhile:

column : Parser Int
column =
    getChompedString (chompIf Char.isUpper)
        |> andThen base26Int

So I will let my answer as is as it might still be useful in general.


#4

Thanks @Laurin! I’ll try out combining the position and range parser like you mention.


#5

Thanks @dmy , very useful to see your version too!

As you mention I stuck to the exercise requirements as they are. In my case I do have functions on the position module to convert from row/column to x/y int coordinates, and the other datastructures work all with position internally abstracting any burden.


#6

I’ve followed your lovely advice and changed the parser to not use backtrackable. Easy peasy! :raised_hands: See commit:

Thanks a lot!


closed #7

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.