Two use cases of the new elm/bytes

First of all, it’s great to finally have files and bytes in Elm!

We have two somewhat connected use cases for files and bytes that I want to share:

  1. We want to calculate a hash of a large file without loading the whole file into memory at once. We use the hash to avoid uploading a file that has already been uploaded.
  2. We want to upload a file in chunks, to have better control over the upload process. For the rationale, see https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html.
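For the second use case, the chunk boundaries have to respect S3's multipart limits. Here is a minimal sketch in JavaScript, assuming the limits as I understand them from the S3 docs (every part except the last is at least 5 MiB, and an upload has at most 10,000 parts). partRanges is a hypothetical helper name:

```javascript
// Hypothetical helper: split a file of `size` bytes into [start, end) byte
// ranges for an S3-style multipart upload.
// Assumed limits (check the S3 docs): every part except the last is at
// least 5 MiB, and one upload has at most 10,000 parts.
const MIN_PART_SIZE = 5 * 1024 * 1024;
const MAX_PARTS = 10000;

function partRanges(size, preferredPartSize) {
  // Never go below the minimum part size.
  let partSize = Math.max(preferredPartSize, MIN_PART_SIZE);
  // Grow the part size if the file would otherwise need too many parts.
  partSize = Math.max(partSize, Math.ceil(size / MAX_PARTS));

  const ranges = [];
  for (let start = 0; start < size; start += partSize) {
    // The last range is clipped to the end of the file.
    ranges.push([start, Math.min(start + partSize, size)]);
  }
  return ranges;
}
```

For example, a 12 MiB file with a preferred part size of 5 MiB gives three ranges, the last one 2 MiB long.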

I think both cases could be solved in Elm with the new elm/bytes package if there were a way to read a slice of a file as bytes.
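To illustrate why slices are enough for the hashing case: a hash can be fed one chunk at a time, so only one chunk needs to be in memory. This sketch uses Node's crypto.createHash, whose update() can be called repeatedly; the browser's crypto.subtle.digest() only hashes a complete buffer, so in a browser you would need a hash implementation that supports incremental updates. hashInChunks and readChunkSync are hypothetical names:

```javascript
const { createHash } = require("crypto");

// Hash `size` bytes chunk by chunk, so only one chunk is in memory at a time.
// `readChunkSync(start, end)` is a hypothetical reader returning the bytes in
// [start, end); in a browser it would be backed by file.slice(start, end).
function hashInChunks(size, chunkSize, readChunkSync) {
  const hash = createHash("sha256");
  for (let start = 0; start < size; start += chunkSize) {
    const end = Math.min(start + chunkSize, size);
    // Incremental update: the hash state carries over between chunks.
    hash.update(readChunkSync(start, end));
  }
  return hash.digest("hex");
}
```

Hashing in small chunks gives exactly the same digest as hashing the whole buffer at once, which is what makes an "already uploaded?" check on slices possible.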


JavaScript libraries that provide resumable uploads do this with the Blob.slice() method.

AFAICT, in this first version of the package, File.toBytes is the only way to get Bytes from a File, and it loads the entire file into memory.

Yes, we currently use Blob.slice() in our JavaScript implementation, but I'm suggesting an API improvement that would make a move to Elm possible.


Can you share the JS code you are using to process files in chunks?

Also, do you have any links on this approach that would help me understand what is going on behind the scenes (if there is anything tricky)?


I too am tempted to migrate my upload code to Elm; I upload files of hundreds of gigabytes. I personally use Evaporate.js. Chunked upload to AWS S3 is fairly complex, and Evaporate contains a lot of code for backward compatibility (with browsers and with S3 itself) and makes extra requests, so I wouldn't recommend it as inspiration. The process is described in the AWS docs: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html

Roughly, here are the main points:

  • Initialize the upload using AWS's cryptographic signing.
  • Calculate each chunk's signature (which uses the previous chunk's signature).
  • The crypto methods use MD5 and HMAC-SHA256, which are the show-stoppers for me for now.
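To make the chaining step concrete, here is a rough Node.js sketch of the per-chunk signature as I read the sigv4-streaming page linked above. The string-to-sign layout should be checked against that page, and the signing key, timestamp, credential scope and seed signature are assumed to come from the initial SigV4 step, which is not shown. The spec also ends the stream with a zero-length chunk, which this sketch treats like any other entry in chunks. signChunks is a hypothetical name:

```javascript
const { createHash, createHmac } = require("crypto");

const sha256Hex = (data) => createHash("sha256").update(data).digest("hex");

// Sketch of the chunk-signature chain (verify the layout against the AWS docs).
// signingKey, dateTime, scope and seedSignature are assumed to come from the
// usual SigV4 setup, which is not shown here.
function signChunks(signingKey, dateTime, scope, seedSignature, chunks) {
  const signatures = [];
  let previous = seedSignature;
  for (const chunk of chunks) {
    const stringToSign = [
      "AWS4-HMAC-SHA256-PAYLOAD",
      dateTime,
      scope,
      previous,        // each chunk's signature depends on the previous one
      sha256Hex(""),   // hash of the empty string
      sha256Hex(chunk),
    ].join("\n");
    previous = createHmac("sha256", signingKey).update(stringToSign).digest("hex");
    signatures.push(previous);
  }
  return signatures;
}
```

Because each signature feeds into the next, chunks have to be signed in order, which is another reason to read the file sequentially.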


Certainly!

If you want me to help make a pull request for the things I suggest below, do let me know.

Reading a file in chunks in JavaScript is done by first calling slice() on the File, which produces a Blob (Files are also Blobs, by the way). Then you can use FileReader as usual, but on the Blob instead, and you get the slice you wanted.

So here is our code that does the chunking:

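// Read bytes [start, end) of a File/Blob and pass (error, ArrayBuffer) to callback.
function readChunk(file, start, end, callback) {
  var reader = new FileReader();
  // slice() only describes the range; nothing is read yet.
  var blob = file.slice(start, end);
  // loadend fires after both successful and failed reads.
  reader.onloadend = function() {
    // On failure reader.error is set; on success it is null and result holds the bytes.
    callback(reader.error, reader.result);
  };
  reader.readAsArrayBuffer(blob);
}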

The calling code calls readChunk with an error-first callback that receives the ArrayBuffer and processes it; then, if there is still data to read (checked against the file size), it calls readChunk again.
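A sketch of such a calling loop, assuming readChunk has the (file, start, end, callback) shape shown above. readAllChunks, onChunk and done are hypothetical names, and readChunk is passed in as a parameter so the loop can be exercised without a browser:

```javascript
// Hypothetical driver for readChunk(file, start, end, callback).
// Reads [0, file.size) one chunk at a time and stops at the first error.
function readAllChunks(file, chunkSize, readChunk, onChunk, done) {
  function step(start) {
    if (start >= file.size) {
      done(null); // every byte has been handed to onChunk
      return;
    }
    const end = Math.min(start + chunkSize, file.size);
    readChunk(file, start, end, function (err, buffer) {
      if (err) {
        done(err); // e.g. the file was deleted or is no longer readable
        return;
      }
      onChunk(buffer, start, end);
      step(end); // request the next chunk only after this one is processed
    });
  }
  step(0);
}
```

Reading strictly one chunk after another keeps memory use bounded by the chunk size, at the cost of not overlapping reads.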

Looking at the elm/file implementation of File.toBytes, it seems to work the same way, although you'd probably also want to catch errors and have a Task that can fail. Common errors are missing read permissions or the file having been deleted. The error is a DOMError/DOMException with name and message fields.

If this functionality should be included, there are two main options:

  1. Add File.sliceToBytes : { from : Int, to : Int, file : File } -> Task x Bytes (a Task, like File.toBytes, since reading is asynchronous)
  2. Add File.slice : { from : Int, to : Int, file : File } -> File (a thin wrapper around the JavaScript file.slice(...))

The second option is more general (you can slice and then call File.toBytes, File.toString, Http.fileBody, etc.) and, in the Http case, also more efficient, since you don't have to copy the bytes before sending them.
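As a rough JavaScript sketch of why the second option avoids copying: the parts of an upload can be described as Blob slices without reading any bytes, and each slice can be passed directly as the body of fetch() or XMLHttpRequest.send(). uploadParts is a hypothetical helper:

```javascript
// Hypothetical helper: describe each upload part as a lazy slice of the file.
// Blob.slice() does not read the data; in practice the browser reads a slice
// only when it is consumed, e.g. when passed as the body of fetch().
function uploadParts(file, partSize) {
  const parts = [];
  for (let start = 0, n = 1; start < file.size; start += partSize, n++) {
    const end = Math.min(start + partSize, file.size);
    parts.push({ partNumber: n, body: file.slice(start, end) });
  }
  return parts;
}
```

Part numbers start at 1 here, matching how S3 numbers multipart parts.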

The drawback of the second option is that some functions on the resulting File (name, mime, lastModified) would have no meaningful value. That has to be handled either by returning empty values or a Maybe, or by copying these values from the original File when creating the slice.
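One way to do the copying, sketched on the JavaScript side under the assumption that the Elm wrapper would keep the metadata next to the slice; sliceWithMetadata is a hypothetical name. In a browser one could instead wrap the slice as new File([blob], file.name, { type: file.type, lastModified: file.lastModified }):

```javascript
// Hypothetical helper: slice a File but keep its metadata alongside the bytes.
// Blob.slice(start, end, contentType) returns a plain Blob, so name and
// lastModified would otherwise be lost; here they are copied from the File.
function sliceWithMetadata(file, start, end) {
  return {
    blob: file.slice(start, end, file.type), // third argument keeps the MIME type
    name: file.name,
    mime: file.type,
    lastModified: file.lastModified,
  };
}
```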


Is it feasible to implement unencrypted, uncompressed ZIP using elm/bytes? There is a description of the file format on Wikipedia.
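It looks feasible, since a stored (method 0) ZIP entry is just fixed-layout little-endian headers plus a CRC-32 of the data. As a sketch in JavaScript, following my reading of the ZIP format description (verify against PKWARE's APPNOTE), here is CRC-32 and the 30-byte local file header written with DataView; in elm/bytes, Bytes.Encode.unsignedInt32 and unsignedInt16 with Bytes.LE would play the role of the DataView calls. A complete archive also needs a central directory entry per file and an end-of-central-directory record, which are omitted here:

```javascript
// CRC-32 (IEEE, reflected polynomial 0xEDB88320), as used in ZIP headers.
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const b of bytes) {
    crc ^= b;
    for (let k = 0; k < 8; k++) {
      // Shift right; XOR in the polynomial when the low bit was set.
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Local file header for a "stored" (uncompressed) entry.
// All multi-byte fields are little-endian; the fixed part is 30 bytes.
function localFileHeader(nameBytes, data) {
  const header = new Uint8Array(30 + nameBytes.length);
  const view = new DataView(header.buffer);
  view.setUint32(0, 0x04034b50, true);   // signature "PK\x03\x04"
  view.setUint16(4, 10, true);           // version needed to extract (1.0)
  view.setUint16(6, 0, true);            // general purpose flags
  view.setUint16(8, 0, true);            // compression method: 0 = stored
  view.setUint16(10, 0, true);           // last mod time (left as 0 here)
  view.setUint16(12, 0, true);           // last mod date (left as 0 here)
  view.setUint32(14, crc32(data), true); // CRC-32 of the uncompressed data
  view.setUint32(18, data.length, true); // compressed size (same when stored)
  view.setUint32(22, data.length, true); // uncompressed size
  view.setUint16(26, nameBytes.length, true);
  view.setUint16(28, 0, true);           // extra field length
  header.set(nameBytes, 30);             // file name follows the fixed part
  return header;
}
```

The file's bytes would follow each local header directly, since stored entries are not compressed.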
