Binary File Parsing for Streaming Applications

I think that Elm’s story around parsing binary files is a non-starter.
I’m interested in writing applications based around streamable files, and this means that if I cannot jump around a file, I should at least be able to process it as lazily as possible, so that I can get something showing up on the user’s screen as fast as possible.
It’s important that there be asynchronous tooling to process the file while the browser is still loading it into memory, and in chunks rather than one byte at a time.

In JavaScript, after selecting a 1 GB MKV file with the file picker, I use the file’s ReadableStream to read just the first 4 bytes and show them on the screen almost instantly: https://codepen.io/illeatmyhat/pen/XWWNmVd
In Elm, the same task takes 30 seconds before anything shows up on the screen: https://gist.github.com/illeatmyhat/801b60e64a30c01cc23e8a8b40ea2a05
For reference, doing the same thing in JavaScript by loading the whole file first with blob.arrayBuffer() takes ~14 seconds.
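
For readers who don’t want to open the pen, here is a minimal sketch of the streaming approach described above. It is not the exact code from the pen: `readFirstBytes` and `toHex` are illustrative names, and it assumes an environment where `Blob.prototype.stream()` is available (current browsers, recent Node).

```javascript
// Sketch only: `readFirstBytes` and `toHex` are illustrative names,
// not code taken from the linked pen.

// Reads just enough chunks from a Blob/File stream to collect `count` bytes,
// then cancels the stream so the rest of the file is never read.
async function readFirstBytes(blob, count) {
  const reader = blob.stream().getReader();
  const out = new Uint8Array(count);
  let filled = 0;
  try {
    while (filled < count) {
      const { done, value } = await reader.read();
      if (done) break; // the file is shorter than `count` bytes
      const take = Math.min(value.length, count - filled);
      out.set(value.subarray(0, take), filled);
      filled += take;
    }
  } finally {
    await reader.cancel(); // the remaining data is not needed
  }
  return out.subarray(0, filled);
}

// Formats bytes as space-separated lowercase hex pairs.
function toHex(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join(" ");
}
```

In a page this would be called from the file input’s change handler, e.g. `toHex(await readFirstBytes(input.files[0], 4))`. The point is that only the first chunk is ever pulled from the stream, which is why the result appears immediately regardless of file size.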

I use the File.toBytes function to load the selected file as Bytes, which seems to be the culprit.

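import File exposing (File)
import File.Select as Select
import Task

update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
  case msg of
    FileRequested ->
      -- Open the file picker, restricted to Matroska files.
      ( model
      , Select.file ["video/x-matroska"] FileSelected
      )

    FileSelected file ->
      -- File.toBytes reads the entire file into memory before FileLoaded fires.
      ( model
      , Task.perform FileLoaded (File.toBytes file)
      )

    FileLoaded content ->
      -- Only the first four bytes are actually decoded.
      ( { model | file = getEBMLID content }
      , Cmd.none
      )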

getEBMLID is a pretty straightforward function that decodes the first four bytes as a big-endian unsigned 32-bit int and converts it to a string.

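import Bytes exposing (Bytes, Endianness(..))
import Bytes.Decode as Decode

-- Returns Nothing if the file is shorter than four bytes.
getEBMLID : Bytes -> Maybe String
getEBMLID file =
    Decode.decode (Decode.unsignedInt32 BE) file |> Maybe.map String.fromInt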

Hello, and welcome to Elm’s Discourse!

Your use case is interesting and definitely not covered by the existing API!

It might be possible, depending on the file format, to use ports to “stream” chunks from JavaScript to Elm (although unfortunately you cannot send Bytes through ports, so that’s difficult).
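
As a rough sketch of what that could look like on the JavaScript side: since ports cannot carry Bytes, each chunk is copied into a plain array of numbers, which Elm can receive as a `List Int`. The names here are assumptions for illustration only: `streamToPort` is a made-up helper, and `port` stands for something like `app.ports.fileChunk`, a hypothetical incoming port declared in Elm as `port fileChunk : ({ offset : Int, bytes : List Int } -> msg) -> Sub msg`.

```javascript
// Sketch only: `streamToPort` and the message shape are hypothetical.
// `port` is any object with a `send` method, e.g. app.ports.fileChunk.
async function streamToPort(blob, port) {
  const reader = blob.stream().getReader();
  let offset = 0; // position of the current chunk within the file
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    // Ports cannot carry Bytes, so copy the chunk into a plain array
    // of numbers (0-255) that Elm can decode as a List Int.
    port.send({ offset, bytes: Array.from(value) });
    offset += value.length;
  }
  // An empty chunk marks the end of the file.
  port.send({ offset, bytes: [] });
}
```

The copy into a number array is the difficult part mentioned above: it is far less compact than the original bytes, so for a very large file it would probably only make sense to forward the chunks the Elm side actually needs.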

I can understand your frustration, but you have to keep in mind that Elm prefers to do things right rather than right now.

The best way to ask for change is to talk about a concrete use case that is uncovered or uncomfortable with the current API, which is exactly what you’ve done here.

It is also possible that Elm is not (yet!) the correct tool for the job.

One idea that comes to mind is simply doing the decoding on the JS side and using a port to send the pieces of information over to the Elm side.
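
A minimal sketch of that idea, assuming the only thing Elm needs is the 32-bit ID from the start of the file. `decodeUint32BE` and `sendEbmlId` are illustrative names, and `port` stands for a hypothetical incoming port such as `app.ports.ebmlIdReceived`, declared in Elm as `port ebmlIdReceived : (Maybe Int -> msg) -> Sub msg`.

```javascript
// Sketch only: function names and the port are hypothetical.

// Decodes the first four bytes as a big-endian unsigned 32-bit integer,
// mirroring `Decode.unsignedInt32 BE` on the Elm side.
// Returns null when fewer than four bytes are available.
function decodeUint32BE(bytes) {
  if (bytes.byteLength < 4) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, 4);
  return view.getUint32(0, false); // false = big-endian
}

// Reads only the first four bytes of the file and sends the decoded
// number (or null) through the port; Elm receives it as a Maybe Int.
async function sendEbmlId(file, port) {
  const head = new Uint8Array(await file.slice(0, 4).arrayBuffer());
  port.send(decodeUint32BE(head));
}
```

Here `file.slice(0, 4)` creates a view of the first four bytes without reading the rest, so only a small number ever crosses the port and the Bytes limitation does not come into play.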

Since that first post, I’ve spent some time in other web-oriented functional languages, trying to accomplish the same thing. Overall, I’ve found that I like Elm’s approach to solving general problems best, especially around packages.
Decoding a binary stream is more of a long-term goal for me anyway, so I’m not looking for an immediate solution.
While I can’t say that I have a complete grasp of the functional style of doing things, I can be available to provide suggestions and candidate APIs. Looking through the source code of the standard libraries, the use of Elm.Kernel.Foo seems pretty straightforward, but it will certainly take several iterations to get it right.

A stream object can be said to be “more effectful” than others, so it would be interesting to see if any other functional languages, even the ones that aren’t web-oriented, also have a story around stream processing.
