Error-tolerant Elm parser (for editor tooling)

Coming up with a high-quality, standard way for editor tools to query source code is tricky and takes time to get right. This post is about just one piece of that puzzle.


One key piece of information that IDE tools need is a project-wide map of module info (exports, types, locations, etc.). I’m aware this is still being worked out.

But one area I don’t think anyone is working on, and one the community could greatly benefit from, is a smaller, specialized parser for handling incomplete or invalid syntax as the user types. The idea is that, unlike the compiler, which bails out on errors, this parser would recover from errors and keep filling out as much of the AST as possible.
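To make "keep filling out as much of the AST as possible" concrete, here is a minimal sketch in Python (not Elm, and not taken from any existing tool) of a parser for a toy "a + b + ..." expression language that never fails. Wherever an operand is missing or garbled, it inserts a placeholder node (the hypothetical Missing class) and records an error, so downstream tooling still receives a complete tree.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple, Union

# Toy tokens: integers, identifiers, or any other single non-space character.
TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*|\S")
NUM_RE = re.compile(r"[0-9]+")
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class Missing:
    """Hypothetical placeholder for an operand the user has not typed yet."""
    index: int  # token index where an operand was expected


@dataclass
class BinOp:
    left: "Expr"
    op: str
    right: "Expr"


Expr = Union[Num, Var, Missing, BinOp]


def parse_expr(src: str) -> Tuple[Expr, List[str]]:
    """Parse atom ('+' atom)* and always return a tree plus a list of errors."""
    tokens = TOKEN_RE.findall(src)
    errors: List[str] = []
    pos = 0

    def atom() -> Expr:
        nonlocal pos
        if pos < len(tokens):
            tok = tokens[pos]
            if NUM_RE.fullmatch(tok):
                pos += 1
                return Num(int(tok))
            if NAME_RE.fullmatch(tok):
                pos += 1
                return Var(tok)
            if tok != "+":
                # Garbage token: report it, skip it, and substitute a placeholder.
                errors.append(f"unexpected {tok!r} at token {pos}")
                pos += 1
                return Missing(pos - 1)
        # End of input, or a '+' where an operand should be: insert a
        # placeholder without consuming anything.
        errors.append(f"expected operand at token {pos}")
        return Missing(pos)

    expr = atom()
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1
        expr = BinOp(expr, "+", atom())
    if pos < len(tokens):
        errors.append(f"unexpected trailing input at token {pos}")
    return expr, errors
```

For example, parse_expr("1 + ") returns (BinOp(Num(1), "+", Missing(2)), ["expected operand at token 2"]). A compiler-style parser would stop at that point and return nothing.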

It would enable editor features like:

  • completions, usages, and type info for locally scoped definitions
  • “as you type” linting that can reason about incomplete code, making errors more relevant and less distracting
  • allowing elm-format to still work in the presence of errors
  • type-directed autocomplete

I imagine this tool operating in a fine-grained manner on single declarations as they change in an open file. It could work alongside other tools that produce coarser structured output.
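As a rough illustration of working on single declarations, the Python sketch below splits a file into top-level chunks and reports which chunks changed between two versions, so that only those chunks would need re-parsing. It assumes, as in elm-format'd code, that every top-level declaration starts in the first column and that its body is indented. The function names are hypothetical, and the sketch ignores edge cases such as block comments with unindented lines.

```python
from typing import List, Tuple


def split_declarations(source: str) -> List[Tuple[int, str]]:
    """Split Elm-like source into (1-based start line, text) top-level chunks.

    Assumption for this sketch: a new chunk begins at every line whose first
    character is not whitespace; indented and blank lines continue the chunk.
    """
    chunks: List[Tuple[int, List[str]]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if (line and not line[0].isspace()) or not chunks:
            chunks.append((lineno, [line]))
        else:
            chunks[-1][1].append(line)
    # Trim trailing blank lines; drop a chunk made only of leading blank lines.
    result = [(start, "\n".join(lines).rstrip()) for start, lines in chunks]
    return [(start, text) for start, text in result if text]


def changed_declarations(old_src: str, new_src: str) -> List[Tuple[int, str]]:
    """Chunks of new_src whose exact text did not appear in old_src.

    Only these would need to be re-parsed after an edit.
    """
    old_texts = {text for _, text in split_declarations(old_src)}
    return [(start, text) for start, text in split_declarations(new_src)
            if text not in old_texts]
```

If a user is midway through editing the body of one function, only that function's chunk comes back from changed_declarations. The error-tolerant parser then handles just that chunk, while the untouched declarations keep their previously parsed ASTs.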


I’m interested in taking on a project like this. It would require me to learn more advanced parsing techniques, but I’m up for the challenge.

Before I go further, I’d like to hear what others think of this and how it fits into the larger picture.

FWIW, for readers’ context, here’s a great intro to error-tolerant parsing (by Swift’s Joe Groff).

I am the author of an Elm language plugin for IntelliJ, and I recently worked on parse error recovery. You might want to look at that for an example, although I must admit that there was some trial-and-error on my part, and the grammar is not as clean as I would like it to be.

My parser was built using the GrammarKit parser generator. GrammarKit is commonly used for custom language support in the IntelliJ family of IDEs (e.g., WebStorm). One unique thing about IntelliJ is that the editor continually parses the text buffer into an AST. Most of the IDE features operate on this AST, and since the input is being actively edited, parse error recovery has to be very good.

GrammarKit’s equivalents of the synchronization points discussed in Joe Groff’s blog post are pin and recoverWhile. Message me on Slack if you want to talk about it further.
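The synchronization-point idea can be sketched by hand. The Python below is only an illustration and does not use GrammarKit's API. It parses a toy list literal such as [1, x, 3]. Once the opening bracket matches, the rule commits and always produces a node, which is loosely analogous to a pin. After a bad element, it skips tokens while they are not "," or "]", which is loosely analogous to recoverWhile. The ListLit type and the error messages are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple

TOKEN_RE = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*|\S")
ATOM_RE = re.compile(r"[0-9]+|[A-Za-z_][A-Za-z0-9_]*")
SYNC = {",", "]"}  # synchronization tokens for list elements


@dataclass
class ListLit:
    items: List[Optional[str]]  # None marks an element that failed to parse
    closed: bool                # False when the closing ']' never appeared


def parse_list(src: str) -> Tuple[Optional[ListLit], List[str]]:
    """Parse a toy list literal like [1, x, 3] with panic-mode recovery."""
    tokens = TOKEN_RE.findall(src)
    errors: List[str] = []
    # Until '[' matches, the rule has not committed: fail so a caller could
    # try another alternative (loosely like the part of a rule before a pin).
    if not tokens or tokens[0] != "[":
        return None, ["expected '['"]
    pos = 1  # committed from here on: always return a ListLit
    items: List[Optional[str]] = []
    if pos < len(tokens) and tokens[pos] == "]":
        return ListLit(items, True), errors
    while True:
        if pos < len(tokens) and ATOM_RE.fullmatch(tokens[pos]):
            items.append(tokens[pos])
            pos += 1
        else:
            items.append(None)
        # Recovery: skip tokens while they are not synchronization tokens
        # (loosely like recoverWhile).
        start = pos
        while pos < len(tokens) and tokens[pos] not in SYNC:
            pos += 1
        if items[-1] is None or pos > start:
            errors.append(f"bad list element near token {start}")
        if pos >= len(tokens):
            errors.append("missing ']'")
            return ListLit(items, False), errors
        if tokens[pos] == "]":
            return ListLit(items, True), errors
        pos += 1  # consume ',' and parse the next element
```

Here parse_list("[1, , 3") still yields ListLit(["1", None, "3"], False) with two recorded errors, rather than giving up at the stray comma. The choice of synchronization set is the main design lever: a larger set recovers sooner but risks resynchronizing in the wrong place.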

I saw this project, tree-sitter, mentioned while reading some progress updates on the Atom team’s new editor work.

elm-format already does this to a small degree, and I expect to add more lenient parsing features to elm-format in the future. elm-format will also soon have an AST output mode, which is meant to be useful to editor plugins and refactoring tools. How can we better align those goals for elm-format with the needs of IDE plugins?

@avh4 that’s good to hear.

For IntelliJ plugin authors, using an external process to do the parsing is not really an option. But there were several other editor/IDE plugin authors who were talking about consolidating on a language server to do this sort of work. See #elm-language-server in Slack. You may also want to ask around in #editors-and-ides, which is somewhat active.

@avh4 I’d love to collaborate on this. I’ve been experimenting with megaparsec’s recovery feature using elm-format’s parser.

An AST output mode would be great for non-Haskell tools. For Haskell-based tools, I think it would make sense to move the parser into a separate library that both elm-format and other tools could import and use in memory.

As @klazuka mentioned, #elm-language-server is a thing. One effort already underway is gyzerok/elm-language-server, which I believe aims to move elmjutsu’s code onto a Node server. My goal is to write a language server from scratch in Haskell using this improved parser, with a focus on performance, accuracy, and long-term maintainability.
