Asynchronous parsing

I’ve used elm/parser to build a substantial parser of a markdown format, and the result is really great.

However, the use case involves routinely parsing several-hundred-page documents, and while the parsing only takes a few seconds, it still locks up the UI for that entire time.

The obvious solution is to move the parser to a web worker and parse asynchronously. This definitely improves the UI lockup situation, but decoding the rather substantial JSON that comes back from the web worker still takes a noticeable amount of time, during which the interface is locked up.

I wonder if anyone has a way to pass the data from the worker to the main thread without an intermediate JSON stage, since they're both Elm, or if there's a way to make the original parser (in Elm) operate concurrently using Process.


Instead of moving the work around (to a web worker, to a process, etc.), can you defer doing the work until it's needed? You know the application best, but it doesn't strike me that people will be looking at several hundred pages at once. Could you parse a page or two, or ten, and wait to parse more until they're close to the current viewport?
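Roughly what I mean, as a sketch (ParsedPage, parsePage, and the field names here are placeholders for whatever your app actually uses):

import Array exposing (Array)
import Dict exposing (Dict)


type alias ParsedPage =
    String -- stand-in for whatever a fully parsed page really is


parsePage : String -> ParsedPage
parsePage source =
    source -- placeholder for the real elm/parser call


type alias Model =
    { rawPages : Array String -- raw source, one entry per page
    , parsedPages : Dict Int ParsedPage -- cache, filled in lazily
    }


-- Parse a page only the first time it is actually needed.
ensureParsed : Int -> Model -> Model
ensureParsed pageIndex model =
    if Dict.member pageIndex model.parsedPages then
        model

    else
        case Array.get pageIndex model.rawPages of
            Just source ->
                { model
                    | parsedPages =
                        Dict.insert pageIndex (parsePage source) model.parsedPages
                }

            Nothing ->
                model

You would then call ensureParsed for whichever pages are near the viewport whenever the scroll position changes.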

Is it possible to break up the parser by having it do some parsing, then return a continuation to do more? Then you can interleave the parsing work with a Cmd that runs the continuation, which gives the UI a chance to process queued events in between.

I used this technique here:

https://package.elm-lang.org/packages/the-sett/ai-search/latest/Search#nextN

Note that SearchResult allows a continuation where there is still more searching to do:

type SearchResult state
    = Complete
    | Goal state (() -> SearchResult state)
    | Ongoing state (() -> SearchResult state)
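
Translated to a parser rather than a search, the shape might be something like this (ParseStep, Block, and the model field are made up for the sketch, not part of the package):

import Process
import Task


type alias Block =
    String -- stand-in for whatever one parsed item is


type ParseStep
    = Done (List Block)
    | More (List Block) (() -> ParseStep)


type alias Model =
    { blocks : List Block }


type Msg
    = Continue (() -> ParseStep)


update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        Continue next ->
            case next () of
                Done blocks ->
                    ( { model | blocks = model.blocks ++ blocks }, Cmd.none )

                More blocks continue ->
                    ( { model | blocks = model.blocks ++ blocks }
                      -- Returning here lets the runtime process queued events
                      -- before the Cmd fires and the next batch is parsed.
                    , Process.sleep 0 |> Task.perform (\_ -> Continue continue)
                    )

Each More step does a bounded amount of parsing and hands the rest back as a continuation, which the next Cmd picks up.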

I experimented with using nextN with batches ranging in size from 10 to 10,000. With small batches the UI remained responsive, but the search did not run very fast. With large batches I could get better performance out of the search, but the UI was very unresponsive. I concluded that if you really need a burst of 100% CPU work that will last longer than a few milliseconds, you are going to need to look into web workers.

As far as I know, you cannot just pass anything between the main UI thread and a web worker when both are built with Elm; you do it through ports and need to encode via JSON.
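
Concretely, the two ends look something like this (the port names here are just illustrative):

port module WorkerPorts exposing (..)

import Json.Encode as Encode


-- Worker side: push the (JSON-encoded) parse result out through a port.
port sendParsed : Encode.Value -> Cmd msg


-- UI side: subscribe to the value and decode it back into Elm data.
port receiveParsed : (Encode.Value -> msg) -> Sub msg

The JavaScript glue in between forwards each value from one Elm program to the other with postMessage.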


To clarify: the parsing takes place when the serialization format is loaded. Once it's in memory, the performance is just fine.

I may be able to tokenize the document into chunks and parse a chunk at a time, but I’m not sure how easily I can tokenize in a consistently right-sized way.

The parser produces a list of several thousand items, so one thing I could do is parse on a worker and stream them back in via ports in buffers of 100 elements or so, asynchronously.
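
Roughly what I have in mind on the receiving side; Item, the port name, and the decoder are stand-ins for the real types:

port module Receiver exposing (..)

import Json.Decode as Decode


type alias Item =
    String -- stand-in for the real parsed element type


type alias Model =
    { items : List Item }


type Msg
    = GotBatch Decode.Value


port receiveBatch : (Decode.Value -> msg) -> Sub msg


subscriptions : Model -> Sub Msg
subscriptions _ =
    receiveBatch GotBatch


update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        GotBatch value ->
            case Decode.decodeValue (Decode.list Decode.string) value of
                Ok items ->
                    -- Each buffer of ~100 items arrives as its own event,
                    -- so the UI can respond to input in between batches.
                    ( { model | items = model.items ++ items }, Cmd.none )

                Err _ ->
                    ( model, Cmd.none )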

Really clever, Rupert. I may try this.

It's a form of cooperative multitasking, like we used to do in the old days…

It would be very cool if you could just fire up another Elm Process on a web worker and be able to pass any Elm type over a channel to talk to that process.

==

I did get interested in finding out whether this will ever be possible, but I don't think it will be with web workers.

You have to create a web worker from a file; see https://www.w3schools.com/html/html5_webworkers.asp. There is a workaround that allows one to be created from a function, but this really just stringifies the JavaScript code; see https://medium.com/@roman01la/run-web-worker-with-a-function-rather-than-external-file-303add905a0.

A web worker does not treat a function as first class in the FP sense. That is, to be able to pass functions around like any other value, FP captures them as continuations: the code plus the context (stack) in which the code runs. Web workers run in separate contexts not shared with the caller, so they are more like separate processes running on the same OS than separate threads within a multi-threaded program.

If you can't pass functions, you can't pass any Elm type, so it seems unlikely that we can ever do this. Does WebAssembly support threads?

Are you parsing all the text at once? Is it possible to break the text into parts, parse each part, and then combine the results? E.g. parse by paragraph.

I had a task before to parse a text and break it into lines, and parsing 1000 lines at once was way slower than parsing 10 chunks of 100 lines.
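
For example, something like this, where parseParagraph is a placeholder for your real parser applied to one paragraph and Block for whatever it produces:

type alias Block =
    String


parseParagraph : String -> List Block
parseParagraph paragraph =
    [ paragraph ] -- placeholder for the real elm/parser call


-- Split on blank lines, parse each paragraph, and concatenate the results.
parseByParagraph : String -> List Block
parseByParagraph text =
    text
        |> String.split "\n\n"
        |> List.concatMap parseParagraph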

If you can find a good way to divide things into chunks, then Process.sleep 0 should get you what you're looking for. This should allow other events that the browser has backlogged during parsing to be processed before moving on to the next chunk.

loadData
    |> Task.map splitData
    |> Task.andThen
        (List.foldl
            (\chunk previous ->
                previous
                    |> Task.andThen
                        (\alreadyParsed ->
                            -- Yield to the browser between chunks so queued
                            -- events can be handled, then parse this chunk.
                            Process.sleep 0
                                |> Task.map (\_ -> alreadyParsed ++ parse chunk)
                        )
            )
            (Task.succeed [])
        )

If Process.sleep is too slow, then requestIdleCallback and ports may help a bit.

