When do you store info in Problems vs Contexts when using the advanced elm/parser library?

I’m working on an app that uses an HTML parser written with Parser.Advanced from the elm/parser library. In working on the parser, I’m finding that I almost never use contexts. My interpretation of the documentation is that contexts are the sort of thing you might use to store information like “I am currently parsing a ‘header’ tag” so that you can then provide that information in errors. But my instinct is that I would rather store that information somewhere where I have a type guarantee that it will be present.

For example, instead of pushing something like HtmlElement "header" onto the context stack when parsing a “header” element, my code currently has a Problem variant ExpectingClosingTagFor String that stores that "header" string. This approach gives me a type guarantee that I’ll have that information, and the information is a little less awkward to retrieve than it would be from the context stack. It does, however, sometimes require passing things like the "header" string around a bit to make sure they are available when the Problem is created.
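To make the approach concrete, here is a minimal sketch of what storing the tag name in the problem could look like. Only ExpectingClosingTagFor comes from the description above; the other Problem variants, the HtmlParser alias, and the tagName, closingTag, and element functions are hypothetical names made up for illustration, and the element parser only handles plain text content.

```elm
import Parser.Advanced as P exposing ((|.), (|=), Parser, Token(..))


-- Hypothetical problem type; only ExpectingClosingTagFor is from the post.
type Problem
    = ExpectingOpeningBracket
    | ExpectingTagName
    | ExpectingClosingBracket
    | ExpectingClosingTagFor String


-- No context is ever pushed in this sketch, so the context type is Never.
type alias HtmlParser a =
    Parser Never Problem a


-- A letter followed by any number of letters or digits.
tagName : HtmlParser String
tagName =
    P.getChompedString
        (P.succeed ()
            |. P.chompIf Char.isAlpha ExpectingTagName
            |. P.chompWhile Char.isAlphaNum
        )


-- The tag name has to be passed in so that the problem can carry it.
closingTag : String -> HtmlParser ()
closingTag name =
    P.token (Token ("</" ++ name ++ ">") (ExpectingClosingTagFor name))


-- Parses an element with text-only content, such as <header>Hi</header>,
-- and returns the tag name paired with the text.
element : HtmlParser ( String, String )
element =
    P.succeed identity
        |. P.symbol (Token "<" ExpectingOpeningBracket)
        |= tagName
        |. P.symbol (Token ">" ExpectingClosingBracket)
        |> P.andThen
            (\name ->
                P.succeed (Tuple.pair name)
                    |= P.getChompedString (P.chompWhile (\c -> c /= '<'))
                    |. closingTag name
            )
```

With this sketch, running `P.run element "<header>Hi</div>"` should fail with a dead end whose problem is `ExpectingClosingTagFor "header"`, so the error renderer can pattern match on the problem alone and never has to search the context stack. The cost is the one mentioned above: the name must be threaded through `andThen` to wherever the problem is created.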

I would be interested in hearing from other people who have been writing parsers with the elm/parser Parser.Advanced module. At present, I am pretty consistently storing data in problems rather than contexts. Am I missing something about when or how contexts are a more useful way to do this? Does anybody else find themselves making similar decisions?

I use contexts within the Elm compiler to provide contextual information. For example:

initialModel : Model
initialModel =
    { count =  }

Produces the following error message:

Something went wrong while parsing a record in initialModel's definition.

14|     { count =  }
                   ^
I was expecting to see an expression, like x or 42.

The error message knows the name of the definition and that you are parsing a record right now. That is all stored in context. The actual error in this case is “the oneOf in the parser for expressions ran out of possibilities” but I put that all back together in a friendlier way.
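As a rough illustration of the mechanism described above, written with Parser.Advanced rather than taken from the compiler itself, the following sketch pushes a context for the definition and another for the record, then reassembles them into a message. Every name here (Context, Problem, ElmParser, lowerName, expression, record, definition, describe) is an assumption made for the example, and the expression parser is deliberately reduced to integers.

```elm
import Parser.Advanced as P exposing ((|.), (|=), DeadEnd, Parser, Token(..))
import Set


-- Hypothetical context and problem types for this sketch.
type Context
    = Definition String
    | Record


type Problem
    = ExpectingName
    | ExpectingEquals
    | ExpectingOpenBrace
    | ExpectingCloseBrace
    | ExpectingExpression


type alias ElmParser a =
    Parser Context Problem a


lowerName : ElmParser String
lowerName =
    P.variable
        { start = Char.isLower
        , inner = Char.isAlphaNum
        , reserved = Set.empty
        , expecting = ExpectingName
        }


-- A deliberately tiny expression parser: integers only.
expression : ElmParser Int
expression =
    P.int ExpectingExpression ExpectingExpression


-- A record with exactly one field, parsed inside the Record context.
record : ElmParser ( String, Int )
record =
    P.inContext Record
        (P.succeed Tuple.pair
            |. P.symbol (Token "{" ExpectingOpenBrace)
            |. P.spaces
            |= lowerName
            |. P.spaces
            |. P.symbol (Token "=" ExpectingEquals)
            |. P.spaces
            |= expression
            |. P.spaces
            |. P.symbol (Token "}" ExpectingCloseBrace)
        )


-- The definition name is only known after parsing it, hence andThen.
definition : ElmParser ( String, ( String, Int ) )
definition =
    lowerName
        |> P.andThen
            (\name ->
                P.inContext (Definition name)
                    (P.succeed (Tuple.pair name)
                        |. P.spaces
                        |. P.symbol (Token "=" ExpectingEquals)
                        |. P.spaces
                        |= record
                    )
            )


contextToString : Context -> String
contextToString context =
    case context of
        Definition name ->
            name ++ "'s definition"

        Record ->
            "a record"


problemToString : Problem -> String
problemToString problem =
    case problem of
        ExpectingExpression ->
            "I was expecting to see an expression, like 42."

        _ ->
            "I ran into something unexpected."


-- Assumes the context stack lists the innermost context first,
-- which is my understanding of how inContext builds it.
describe : DeadEnd Context Problem -> String
describe deadEnd =
    let
        location =
            deadEnd.contextStack
                |> List.map (.context >> contextToString)
                |> String.join " in "
    in
    "Something went wrong while parsing "
        ++ location
        ++ ". "
        ++ problemToString deadEnd.problem
```

Running `definition` on the broken `initialModel` source and passing the resulting dead end to `describe` would be expected to give a message along the lines of the one quoted above. Note that `ExpectingExpression` carries no data of its own: the same problem can occur anywhere an expression is allowed, and the context stack supplies the "where".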

If you imagine the ideal parse errors for HTML though, perhaps tracking context does not help. I think that may be related to the particular design of HTML syntax though!

I think this is because the HTML grammar is pretty flat and simple, so the practical use of the context for the end user is limited in this case.

I could see a stack like

Tag "article" :: Attribute "width" :: Number :: []

which means that while parsing the width attribute of the article tag, there was a parse error in a number.

While developing the parser, this is much more useful than “invalid number at position (x,y)”, because you’d like to know how your parser got into that state to fix the error in the parser.

But for the end user, assuming the parser is correct, the error is in the input. Then “invalid number at (x,y)” is fine, because that is usually all that’s needed to fix the error.

The context is only useful for more complex errors, where the invalid input has put the parser in a completely wrong state, because then the specific error might hide the underlying problem.

Take, for instance, <div <p>=</p></div>. Depending on how your parser is structured, this could give an error like “invalid character < in attribute name”, but the actual error is quite different (the div tag is not closed). In this case “something went wrong in parsing the attributes of the div tag” might be more helpful.
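A minimal sketch of that idea, assuming a hypothetical AttributesOf context and a simplified opening-tag parser that only accepts bare attribute names (no values), might look like this. All of the names are invented for the example and are not taken from any existing HTML parser.

```elm
import Parser.Advanced as P exposing ((|.), (|=), DeadEnd, Parser, Token(..))


-- Hypothetical context: we are inside the attribute list of some tag.
type Context
    = AttributesOf String


type Problem
    = ExpectingOpeningBracket
    | ExpectingTagName
    | ExpectingAttributeName
    | ExpectingClosingBracket


type alias HtmlParser a =
    Parser Context Problem a


-- A letter followed by letters or digits; used for tag and attribute names.
name : Problem -> HtmlParser String
name expecting =
    P.getChompedString
        (P.succeed ()
            |. P.chompIf Char.isAlpha expecting
            |. P.chompWhile Char.isAlphaNum
        )


-- Zero or more bare attribute names, terminated by ">".
attributes : HtmlParser (List String)
attributes =
    P.loop [] attributesHelp


attributesHelp : List String -> HtmlParser (P.Step (List String) (List String))
attributesHelp reversed =
    P.oneOf
        [ P.succeed (P.Done (List.reverse reversed))
            |. P.symbol (Token ">" ExpectingClosingBracket)
        , P.succeed (\attr -> P.Loop (attr :: reversed))
            |= name ExpectingAttributeName
            |. P.spaces
        ]


-- Everything after the tag name is parsed inside the AttributesOf context.
openingTag : HtmlParser ( String, List String )
openingTag =
    P.succeed identity
        |. P.symbol (Token "<" ExpectingOpeningBracket)
        |= name ExpectingTagName
        |. P.spaces
        |> P.andThen
            (\tag ->
                P.inContext (AttributesOf tag)
                    (P.map (Tuple.pair tag) attributes)
            )


-- Prefer the surrounding context over the low-level problem when one exists.
explain : DeadEnd Context Problem -> String
explain deadEnd =
    case deadEnd.contextStack of
        frame :: _ ->
            case frame.context of
                AttributesOf tag ->
                    "Something went wrong in parsing the attributes of the "
                        ++ tag
                        ++ " tag."

        [] ->
            "Unexpected input at row "
                ++ String.fromInt deadEnd.row
                ++ ", column "
                ++ String.fromInt deadEnd.col
                ++ "."
```

On input beginning with `<div <p>`, both branches of the `oneOf` fail at the second `<`, so the low-level problems are `ExpectingClosingBracket` and `ExpectingAttributeName`. Because those dead ends were produced inside `inContext (AttributesOf "div")`, `explain` can report the broader situation instead of the misleading character-level complaint.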

That makes sense. Thinking about your example versus mine, I think the key conceptual difference between the two approaches is how much the context determines which errors are possible. If most types of errors can happen within most types of contexts (for example, you will see your “expecting expression” error in many different places), then keeping the problem and context data separate makes more sense. But if the error can only happen in a very specific situation (for example, a missing closing tag or attribute value), then gathering the data in a type-safe way is very helpful.
