Parsers with Error Recovery

I found some time at the weekend to look through these. The parsing papers are not really much use when it comes to Elm, as they are all about LR parsers - tree sitter is presumably LR parser based. The Elm parser is a recursive descent parser and so is for LL grammars.

This is why it has backtrackable for situations where you need to parse ahead, realize you are on the wrong branch and then back-up and try a different one. For example in C, you might have a function signature or implementation:

// Signature
void f(int, const char*);

// Implementation
void f(int, const char*)
{
}

You don’t know which you are getting until you hit the ; or {. With an LL parser you either re-design the grammar to be more LL friendly (that is make the void f(int, const char*) bit a production of the grammar that forms part of a sig or implementation), or look ahead.

Googling for “error recovery in LL parser” or “error recovery in recursive descent parser” does turn up some relevant stuff.

From the tree sitter stuff though, this paper is quite fascinating on the topic of incremental tooling, https://www2.eecs.berkeley.edu/Pubs/TechRpts/1997/CSD-97-946.pdf. One thing to note is that written over 20 years ago, computers were a lot slower, I think that a simple chunking strategy might be good enough to keep the problem small enough with the amount of grunt we have at our disposal these days. Also, there seems to be an obsession with reaching the milestone having things fast enough to re-parse on every key stroke. A significant achievement, but I think its not really necessary as most editors implement a delay before offering auto suggests anyway, so fine to do it a few hundred ms after the last keystroke.