Two use cases of the new elm/bytes

norpan · November 15, 2018, 11:40am

First of all, it’s great to finally have files and bytes in Elm!

We have two use cases for files and bytes (that are somewhat connected) that I want to share.

We want to calculate a hash for a large file without having to load the whole file into memory at once. We calculate the hash so that we don’t have to upload a file that has already been uploaded.
We want to upload a file in chunks, to have better control over the upload process. See rationale at https://docs.aws.amazon.com/AmazonS3/latest/dev/uploadobjusingmpu.html.

Both of these cases could, I think, be solved in Elm with the new elm/bytes if there were a way to get a slice of a file as bytes.
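The first use case can be illustrated with a minimal JavaScript sketch that hashes a file slice by slice. It assumes slices are read with Blob.prototype.arrayBuffer(). It also swaps in the simple, non-cryptographic FNV-1a hash, purely because FNV-1a fits in a few lines. In the browser, crypto.subtle.digest only hashes a complete buffer in one call, so a real implementation would need an incremental SHA-256 (or similar) from a library. The names fnv1aUpdate and hashBlobInChunks are hypothetical.

```javascript
// FNV-1a (32-bit), used here only as a stand-in incremental hash.
// It is NOT cryptographic; real deduplication would use e.g. streaming SHA-256.
const FNV_OFFSET_BASIS = 0x811c9dc5;

function fnv1aUpdate(hash, bytes) {
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, mod 2^32
  }
  return hash >>> 0;
}

// Hash a Blob/File one slice at a time, so only `chunkSize` bytes of file
// data are held in memory at once (plus the 4-byte hash state).
async function hashBlobInChunks(blob, chunkSize) {
  let hash = FNV_OFFSET_BASIS;
  for (let start = 0; start < blob.size; start += chunkSize) {
    const end = Math.min(start + chunkSize, blob.size);
    const buffer = await blob.slice(start, end).arrayBuffer();
    hash = fnv1aUpdate(hash, new Uint8Array(buffer));
  }
  return hash;
}
```

FNV-1a consumes bytes strictly in order, so hashing slice by slice gives the same value as hashing the whole file in one go. The chunk size only controls how much data is in memory at a time.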

Hector · November 15, 2018, 1:54pm

The way this is done in the JavaScript libraries that provide resumable uploads is with the Blob slice() function.

AFAICT, in this first version of the package, toBytes is the only function provided for reading bytes, and it loads the entire file blob into memory.

norpan · November 15, 2018, 4:33pm

Yes, we currently use Blob.slice() in our JavaScript implementation, but I’m describing a suggested API improvement that could make a move to Elm possible.

evancz · November 15, 2018, 10:07pm

Can you share the JS code you are using to process files in chunks?

And any links you have on this approach that help me understand what is going on behind the scenes? (If there is anything tricky!)

Warry · November 16, 2018, 8:34am

I too am tempted to migrate my upload code to Elm, to upload files of hundreds of GB. I personally use Evaporate.js. Uploading to AWS S3 in chunks is fairly complex. Evaporate also contains a lot of code for backward compatibility (with browsers and with S3 itself) and makes extra requests, so I wouldn’t recommend it as an inspiration. The process is described in the docs here: https://docs.aws.amazon.com/AmazonS3/latest/API/sigv4-streaming.html

Roughly, here is the main information:

Initialize the upload using AWS cryptographic methods.
Calculate each chunk’s signature (using the previous chunk’s signature).
The crypto methods use MD5 and HMAC-SHA256, which is my show-stopper for now.
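The chaining step can be sketched as follows, using the string-to-sign layout described on the sigv4-streaming page linked above. The sketch uses Node’s crypto module for brevity; in a browser this would be SubtleCrypto or a library. It assumes the signing key, timestamp, scope and seed signature have already been derived. chunkSignature and signChunks are hypothetical names, not part of any AWS SDK.

```javascript
const crypto = require("crypto");

const sha256Hex = (data) => crypto.createHash("sha256").update(data).digest("hex");

// Signature of one chunk: HMAC-SHA256 over a string-to-sign that includes
// the previous chunk's signature. signingKey, timestamp and scope are
// assumed to be computed elsewhere.
function chunkSignature(signingKey, timestamp, scope, previousSignature, chunk) {
  const stringToSign = [
    "AWS4-HMAC-SHA256-PAYLOAD",
    timestamp,         // e.g. "20130524T000000Z"
    scope,             // e.g. "20130524/us-east-1/s3/aws4_request"
    previousSignature, // the seed (request) signature for the first chunk
    sha256Hex(""),     // hash of the empty string
    sha256Hex(chunk),  // hash of this chunk's data
  ].join("\n");
  return crypto.createHmac("sha256", signingKey).update(stringToSign).digest("hex");
}

// Sign the chunks in order: each signature feeds into the next one,
// so the signatures cannot be computed independently of each other.
function signChunks(signingKey, timestamp, scope, seedSignature, chunks) {
  const signatures = [];
  let previous = seedSignature;
  for (const chunk of chunks) {
    previous = chunkSignature(signingKey, timestamp, scope, previous, chunk);
    signatures.push(previous);
  }
  return signatures;
}
```

Each signature is an input to the next, so the chunks have to be signed sequentially, in upload order.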

Linked file on GitHub: aws/aws-sdk-js/blob/master/lib/s3/managed_upload.js

var AWS = require('../core');
var byteLength = AWS.util.string.byteLength;
var Buffer = AWS.util.Buffer;

/**
 * The managed uploader allows for easy and efficient uploading of buffers,
 * blobs, or streams, using a configurable amount of concurrency to perform
 * multipart uploads where possible. This abstraction also enables uploading
 * streams of unknown size due to the use of multipart uploads.
 *
 * To construct a managed upload object, see the {constructor} function.
 *
 * ## Tracking upload progress
 *
 * The managed upload object can also track progress by attaching an
 * 'httpUploadProgress' listener to the upload manager. This event is similar
 * to {AWS.Request~httpUploadProgress} but groups all concurrent upload progress
 * into a single event. See {AWS.S3.ManagedUpload~httpUploadProgress} for more
 * information.
 *
 * ...
 */
(Excerpt truncated; the full file is in the aws-sdk-js repository on GitHub.)

norpan · November 16, 2018, 8:50am

Certainly!

If you want me to help make a pull request for the things I suggest below, do let me know.

Reading a file in chunks in JavaScript is done by first calling slice() on the File, which produces a Blob (Files are also Blobs, by the way). Then you can use FileReader as usual, but on the Blob instead, and you get the slice you wanted.

So here is our code that does the chunking:

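// Read bytes [start, end) of a File/Blob and pass them, as an ArrayBuffer,
// to an error-first callback.
function readChunk(file, start, end, callback) {
  var reader = new FileReader();
  var blob = file.slice(start, end); // slicing does not read any data yet
  reader.onloadend = function() {
    // onloadend fires on success and on failure; reader.error is null on success
    callback(reader.error, reader.result);
  };
  reader.readAsArrayBuffer(blob);
}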

The calling code calls readChunk with an error-first callback that receives the ArrayBuffer and processes it. Then, if there is still data to read (judged by comparing against the file size), it calls readChunk again.
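The loop described above could look roughly like this. This is a sketch, not our actual calling code. readInChunks is a hypothetical name, and it takes readChunk as a parameter so the same driver works with the FileReader-based version above or with a test double.

```javascript
// Hypothetical driver: reads `file` in pieces of `chunkSize` bytes using an
// error-first readChunk(file, start, end, callback), stopping at the first
// error or at the end of the file.
function readInChunks(file, chunkSize, readChunk, onChunk, done) {
  function next(start) {
    if (start >= file.size) {
      done(null); // everything has been read
      return;
    }
    var end = Math.min(start + chunkSize, file.size);
    readChunk(file, start, end, function (err, buffer) {
      if (err) {
        done(err); // e.g. the file was deleted or is no longer readable
        return;
      }
      onChunk(buffer, start, end);
      next(end); // request the next chunk only after this one is processed
    });
  }
  next(0);
}
```

Only one chunk is in flight at a time, which keeps memory use bounded by the chunk size.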

Looking at the elm/file implementation of File.toBytes, it seems to work the same way. However, you’d probably also want to catch errors and have a Task that can fail. Common errors are missing read permissions or the file having been deleted. The error is a DOMError/DOMException with name and message fields.
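For comparison, here is a sketch of a Promise-based read that behaves like a Task that can fail. It either resolves with the bytes or rejects with a plain { name, message } record. It assumes Blob.prototype.arrayBuffer(), a newer alternative to FileReader. readSlice is a hypothetical name, and the error names in the comment are examples from the File API specification, not an exhaustive list.

```javascript
// Resolve with the bytes of [start, end), or reject with { name, message }
// taken from the DOMException (e.g. "NotFoundError", "NotReadableError").
function readSlice(blob, start, end) {
  return blob.slice(start, end).arrayBuffer().then(
    function (buffer) {
      return new Uint8Array(buffer);
    },
    function (error) {
      return Promise.reject({ name: error.name, message: error.message });
    }
  );
}
```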

If this is functionality that should be included, there are two main options:

Add a File.sliceToBytes : { from : Int, to : Int, file : File } -> Task x Bytes (a Task, like File.toBytes, since reading is asynchronous)
Add a File.slice : { from : Int, to : Int, file : File } -> File (just returning the JavaScript file.slice(...))

The second option is more general: you can slice and then call File.toBytes, File.toString, Http.fileBody, etc. In the Http case it is also more efficient, since you don’t have to copy the bytes before sending them.

The drawback of the second option is that the result of slice() is a plain Blob. On it, name and lastModified are undefined, and mime (the Blob’s type) is empty. That has to be taken into account, either by returning empty values or a Maybe, or by copying these values from the original File object when creating the slice.

jxxcarlson · November 16, 2018, 11:48am

Is it feasible to implement unencrypted, uncompressed zip using elm/bytes? There is a description of the file format on Wikipedia.
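For what it’s worth, the stored (uncompressed, unencrypted) variant of the format consists mostly of fixed-size little-endian integers plus a CRC-32, which suggests it should be feasible. Below is a minimal JavaScript sketch of a writer, kept in JavaScript to match the other snippets in this thread. zipStored and crc32 are hypothetical names. Timestamps are fixed at 1980-01-01, and file names are assumed to be ASCII, since non-ASCII names would need the UTF-8 flag bit. ZIP64, comments and extra fields are left out.

```javascript
// CRC-32 as used by ZIP (reflected polynomial 0xEDB88320).
function crc32(bytes) {
  let crc = 0xffffffff;
  for (let i = 0; i < bytes.length; i++) {
    crc ^= bytes[i];
    for (let k = 0; k < 8; k++) {
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Build a ZIP archive with every entry "stored" (compression method 0).
// files: array of { name: string, data: Uint8Array }.
function zipStored(files) {
  const enc = new TextEncoder();
  const chunks = [];  // local headers and file data, in archive order
  const central = []; // central directory records
  let offset = 0;     // byte offset of the next local header
  for (const f of files) {
    const name = enc.encode(f.name);
    const crc = crc32(f.data);
    const local = new DataView(new ArrayBuffer(30));
    local.setUint32(0, 0x04034b50, true);     // local file header signature
    local.setUint16(4, 10, true);             // version needed to extract (1.0)
    // bytes 6..11: flags, method (0 = stored), mod time: all left as 0
    local.setUint16(12, 0x21, true);          // mod date: 1980-01-01 (MS-DOS format)
    local.setUint32(14, crc, true);
    local.setUint32(18, f.data.length, true); // compressed size
    local.setUint32(22, f.data.length, true); // uncompressed size (same: stored)
    local.setUint16(26, name.length, true);   // file name length; extra length stays 0
    chunks.push(new Uint8Array(local.buffer), name, f.data);

    const cd = new DataView(new ArrayBuffer(46));
    cd.setUint32(0, 0x02014b50, true);        // central directory header signature
    cd.setUint16(4, 10, true);                // version made by
    cd.setUint16(6, 10, true);                // version needed to extract
    // bytes 8..13: flags, method, mod time: all 0
    cd.setUint16(14, 0x21, true);             // mod date
    cd.setUint32(16, crc, true);
    cd.setUint32(20, f.data.length, true);    // compressed size
    cd.setUint32(24, f.data.length, true);    // uncompressed size
    cd.setUint16(28, name.length, true);
    // bytes 30..41: extra/comment lengths, disk number, attributes: all 0
    cd.setUint32(42, offset, true);           // offset of this entry's local header
    central.push(new Uint8Array(cd.buffer), name);

    offset += 30 + name.length + f.data.length;
  }
  const cdSize = central.reduce((n, p) => n + p.length, 0);
  const eocd = new DataView(new ArrayBuffer(22));
  eocd.setUint32(0, 0x06054b50, true);        // end of central directory signature
  eocd.setUint16(8, files.length, true);      // entries on this disk
  eocd.setUint16(10, files.length, true);     // total entries
  eocd.setUint32(12, cdSize, true);           // size of the central directory
  eocd.setUint32(16, offset, true);           // central directory starts after the entries
  const all = [...chunks, ...central, new Uint8Array(eocd.buffer)];
  const out = new Uint8Array(all.reduce((n, p) => n + p.length, 0));
  let pos = 0;
  for (const p of all) {
    out.set(p, pos);
    pos += p.length;
  }
  return out;
}
```

In Elm, each little-endian DataView write would correspond roughly to a Bytes.Encode.unsignedInt32 Bytes.LE or Bytes.Encode.unsignedInt16 Bytes.LE encoder inside a Bytes.Encode.sequence. The CRC-32 loop could be written with the Bitwise module.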

system · November 26, 2018, 11:48am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
