Getting the SHA-256 hash of a file for upload purposes

Nikolis_Galerakis · October 15, 2019, 9:19am

How can one get the SHA-256 hash of a file?

What I am currently trying to do is the following:

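import Bytes
import Bytes.Decode
import File exposing (File)
import File.Select as Select
import Sha256

-- Let the user pick a JPEG (its MIME type is "image/jpeg")
Select.file [ "image/jpeg" ] ImageLoaded
-- Read the selected file's contents (a Task, run with Task.perform)
File.toBytes file
-- Decode every byte as one UTF-8 String, then hash that String
strDec = Bytes.Decode.string (Bytes.width bytes)
strHash = Bytes.Decode.decode strDec bytes |> Maybe.map Sha256.sha256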

But the result of the decode step is always Nothing, even though Bytes.width returns a sensible integer.

Does anybody have experience with this, or how would you approach it?

Malax · October 15, 2019, 9:53am

A problem I can see here is that you’re decoding arbitrary bytes into a UTF-8 String, but not every byte sequence is valid UTF-8, hence the Nothing. Can you try selecting a text file and see if that works?

Edit: I created an Ellie that demonstrates the problem. UTF-8 text files work fine, but some binary files do not. https://ellie-app.com/6W2G972xfwTa1
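A minimal Elm sketch of the problem described above, assuming elm/bytes 1.x: the same whole-buffer string decoder succeeds on ASCII bytes but yields Nothing for a lone 0xFF byte, which never occurs in valid UTF-8 (and which is also the first byte of every JPEG file). The helpers fromList and decodeAllAsString are hypothetical names made up for this example.

```elm
module Utf8Check exposing (asciiResult, invalidResult)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode
import Bytes.Encode as Encode


-- Hypothetical helper: pack raw byte values (0-255) into Bytes.
fromList : List Int -> Bytes
fromList values =
    Encode.encode (Encode.sequence (List.map Encode.unsignedInt8 values))


-- Hypothetical helper: read the whole buffer as one UTF-8 String,
-- which is what the original code does with the file contents.
decodeAllAsString : Bytes -> Maybe String
decodeAllAsString bytes =
    Decode.decode (Decode.string (Bytes.width bytes)) bytes


-- "hi" in ASCII is valid UTF-8, so this is Just "hi".
asciiResult : Maybe String
asciiResult =
    decodeAllAsString (fromList [ 0x68, 0x69 ])


-- A lone 0xFF byte is not valid UTF-8, so this is Nothing.
invalidResult : Maybe String
invalidResult =
    decodeAllAsString (fromList [ 0xFF ])
```

Even for binary data that happens to decode without error, the resulting String need not re-encode to the original bytes, so its hash is not a reliable hash of the file.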

Nikolis_Galerakis · October 15, 2019, 10:03am

Hmm, you are right, thank you.
Indeed, when I try that with a .txt file it works perfectly.
But do you have any direction on how to approach this, since I specifically want the SHA-256 of JPEG files?

malaire · October 15, 2019, 11:42am

I’d consider this to be a bug in Sha256 since it only accepts String as input and not raw Bytes.

keisisqrl · October 15, 2019, 12:04pm

You could try the elm-sha package, which I gather should accept arbitrary byte sequences via a round trip through a hex string (the intermediate representations might be quite memory-intensive for a large file, but images and most documents should be okay).
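The first half of such a round trip, turning Bytes into a lowercase hex string, can be done with elm/bytes alone. A minimal sketch (toHex, byteToHex and jpegStart are hypothetical names, not the API of elm-sha or jxxcarlson/hex): it reads the input in one Decode.loop pass, but the resulting string has two characters per input byte and the intermediate list holds one small string per byte, which is part of why this gets heavy for large files.

```elm
module HexEncode exposing (jpegStart, toHex)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode


-- Two lowercase hex digits for one byte (0-255).
byteToHex : Int -> String
byteToHex byte =
    let
        digit n =
            String.slice n (n + 1) "0123456789abcdef"
    in
    digit (byte // 16) ++ digit (modBy 16 byte)


-- Read one byte, prepend its hex form, stop once every byte is consumed.
hexStep : ( Int, List String ) -> Decoder (Step ( Int, List String ) String)
hexStep ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (String.concat (List.reverse acc)))

    else
        Decode.map (\byte -> Loop ( remaining - 1, byteToHex byte :: acc )) Decode.unsignedInt8


toHex : Bytes -> String
toHex bytes =
    Decode.decode (Decode.loop ( Bytes.width bytes, [] ) hexStep) bytes
        |> Maybe.withDefault ""


-- Example input: the first four bytes of a typical JPEG/JFIF file.
jpegStart : Bytes
jpegStart =
    Encode.encode (Encode.sequence (List.map Encode.unsignedInt8 [ 0xFF, 0xD8, 0xFF, 0xE0 ]))
```

For example, toHex jpegStart evaluates to "ffd8ffe0".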

Another option might be to send the file through a port (maybe base64-encode it?) and use the browser’s SubtleCrypto API. It’s much less clean, though.
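On the Elm side, one way to get file contents through a port is File.toUrl from elm/file, which reads the file as a base64 data URL, i.e. a plain String. A minimal sketch under these assumptions: the port name hashBase64, the Msg type and base64Payload are hypothetical, and the JavaScript side (decoding the base64 back into bytes and calling crypto.subtle.digest("SHA-256", ...)) is not shown.

```elm
port module HashViaPort exposing (Msg(..), base64Payload, requestHash, update)

import File exposing (File)
import Task


-- Hypothetical port: JS decodes the base64 text back into bytes and hashes them.
port hashBase64 : String -> Cmd msg


type Msg
    = GotDataUrl String


-- File.toUrl yields something like "data:image/jpeg;base64,/9j/..."; it cannot fail.
requestHash : File -> Cmd Msg
requestHash file =
    Task.perform GotDataUrl (File.toUrl file)


-- Keep only the base64 part after the first comma.
base64Payload : String -> String
base64Payload dataUrl =
    case String.indexes "," dataUrl of
        i :: _ ->
            String.dropLeft (i + 1) dataUrl

        [] ->
            dataUrl


update : Msg -> Cmd Msg
update msg =
    case msg of
        GotDataUrl dataUrl ->
            hashBase64 (base64Payload dataUrl)
```

Base64 inflates the data by about a third, but it sidesteps the fact that raw Bytes cannot be sent through a port directly.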

malaire · October 15, 2019, 12:09pm

Yes, the icidasset/elm-sha package actually works, unlike billstclair/elm-sha256, which gives incorrect results for some inputs: https://github.com/billstclair/elm-sha256/issues/7

keisisqrl · October 15, 2019, 12:32pm

Huh. I wonder if it relies on 32-bit integer math (skimming the code, it looks like it might). While trying to implement ChaCha20 (which I might get back to some day), I found that 32-bit integers sort of work but behave strangely because of JavaScript’s number semantics. I think the Basics module warns about this.

The roll-your-own approach of the icidasset packages is probably less efficient, but it’s guaranteed to work.

Nikolis_Galerakis · October 15, 2019, 1:07pm

Well, when I tried elm-sha, first converting to hex with jxxcarlson/hex, then converting to binary and taking the SHA-256 of that, something went badly wrong: the window went blank, the console too, and I still can’t close that window.

Probably the memory issue you described; the JPEG I’m testing with is 955.8 kB.

UPDATE:
I tried hashing the hex string with elm crypto instead, but I’m still having memory issues, I think, because Chrome shows a "not responding" alert!

Nikolis_Galerakis · October 15, 2019, 1:29pm

Furthermore, the two results differ: getting the bytes of the file, converting them to a hex string and hashing that hex string with an online tool gives a different result than hashing the file directly.
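One likely explanation, assuming the online tool treats its input as text: it hashes the ASCII characters of the hex string, so the single byte 0xFF becomes the two bytes of "ff". To hash the bytes a hex string stands for, it has to be decoded back first. A minimal pure-Elm sketch; fromHex, digitValue, pairs and hexTextWidth are hypothetical names.

```elm
module HexDecode exposing (fromHex, hexTextWidth)

import Bytes.Encode as Encode


-- The hex text "ff" is two UTF-8 bytes (0x66, 0x66), not the one byte 0xFF.
hexTextWidth : Int
hexTextWidth =
    Encode.getStringWidth "ff"


-- Value of a single lowercase hex digit, or Nothing.
digitValue : Char -> Maybe Int
digitValue c =
    if Char.isDigit c then
        Just (Char.toCode c - Char.toCode '0')

    else if c >= 'a' && c <= 'f' then
        Just (Char.toCode c - Char.toCode 'a' + 10)

    else
        Nothing


-- Consume two digits at a time, accumulating in reverse.
pairs : List Char -> List Int -> Maybe (List Int)
pairs chars acc =
    case chars of
        [] ->
            Just (List.reverse acc)

        hi :: lo :: rest ->
            case ( digitValue hi, digitValue lo ) of
                ( Just h, Just l ) ->
                    pairs rest ((h * 16 + l) :: acc)

                _ ->
                    Nothing

        _ ->
            -- a single leftover digit means the length was odd
            Nothing


-- "FFd8" -> Just [ 255, 216 ]; odd length or a bad digit -> Nothing.
fromHex : String -> Maybe (List Int)
fromHex hex =
    pairs (String.toList (String.toLower hex)) []
```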

folkertdev · October 15, 2019, 1:50pm

Using elm/bytes directly should be the way to go. Anything else will be orders of magnitude slower and suffer from memory problems (e.g. a List Bool that is 8 million items long…).

I’m working on a PR for SHA-1 that is 10 times faster than the current non-bytes implementation. As @keisisqrl mentioned, you have to work around JavaScript numbers being weird, but with proper testing that should be alright.

It looks like there is currently no package that can really do this though. I’m happy to help out if someone wants to give it a go.

keisisqrl · October 15, 2019, 4:46pm

That’s pretty slick. I wouldn’t have thought of putting the logic in a Decoder, but it makes sense considering it’s a transform from input to a SHA state. I’m still new to the FP mindset!

I think that’s the purpose of the masks in calculateDigestDeltas you (?) remarked on here. As far as I can tell, all arithmetic is effectively mod 2^32, but above 2^31 it gets weird. Ints behave as signed with (most?) arithmetic operations and wrap, but bitwise operations (at least some of them; per the JavaScript docs it varies between the bit shifts) seem to coerce the value to unsigned without changing the bit representation before using it.

folkertdev · October 15, 2019, 5:14pm

The main advantage of moving the logic into the Decoder is that only one pass is made over the input data, and there is little allocation. With lists of items, you often get pipelines like byteValues |> groupsOf n |> List.map g |> etc. that look nice and simple, but traverse and allocate effectively the whole input again for each |>.
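To illustrate the single-pass idea in a small, checkable setting, here is a sketch that computes a much simpler checksum, Adler-32 (not SHA-1 or SHA-256), entirely inside a Bytes.Decode.loop: each byte is read once, both running sums live in the loop state, and no intermediate list is built. The names are hypothetical and this is not the code from the PR.

```elm
module Adler32 exposing (adler32, wikipediaChecksum)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder, Step(..))
import Bytes.Encode as Encode


type alias State =
    { remaining : Int, a : Int, b : Int }


-- Largest prime below 2^16, as used by Adler-32.
modAdler : Int
modAdler =
    65521


-- Read one byte, update both running sums, finish when the input is exhausted.
step : State -> Decoder (Step State Int)
step state =
    if state.remaining <= 0 then
        -- b * 65536 stays below 2^32, so plain multiplication is exact here
        Decode.succeed (Done (state.b * 65536 + state.a))

    else
        Decode.unsignedInt8
            |> Decode.map
                (\byte ->
                    let
                        a =
                            modBy modAdler (state.a + byte)
                    in
                    Loop { remaining = state.remaining - 1, a = a, b = modBy modAdler (state.b + a) }
                )


adler32 : Bytes -> Maybe Int
adler32 bytes =
    Decode.decode (Decode.loop { remaining = Bytes.width bytes, a = 1, b = 0 } step) bytes


-- Example: Adler-32 of the ASCII text "Wikipedia" is 0x11E60398.
wikipediaChecksum : Maybe Int
wikipediaChecksum =
    adler32 (Encode.encode (Encode.string "Wikipedia"))
```

A real SHA decoder would instead read 4-byte words and carry eight (SHA-256) or five (SHA-1) 32-bit state values, but the shape of the loop is the same.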

I was able to get rid of those masks on this branch.

A tricky aspect of SHA-1 is that it mixes unsigned 32-bit integer addition with bitwise operators. Some bitwise operators can flip the sign (e.g. Bitwise.complement 6 == -7), and that is clearly a problem for addition (which, as you say, is signed by default). In this case it is enough to add a |> Bitwise.shiftRightZfBy 0 (built into rotateLeftBy) before the addition to force the number to be unsigned and wrap around. Because the starting numbers are relatively small (on the order of 2^31 at most), there is no risk of JavaScript number overflow when adding just three of them (integer addition is exact up to 2^53 - 1), so the intermediate Bitwise.and 0xFFFFFFFF could be removed.
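A small sketch of this behaviour, assuming Elm compiled to JavaScript: most Bitwise functions produce signed 32-bit results, so Bitwise.complement 6 is -7, while Bitwise.shiftRightZfBy 0 (JavaScript's x >>> 0) reinterprets the low 32 bits as unsigned. The module and its functions are hypothetical helpers, not taken from any package.

```elm
module Uint32 exposing (add, rotateLeftBy, toUnsigned)

import Bitwise


-- Keep the low 32 bits and read them as an unsigned number (JS: x >>> 0).
toUnsigned : Int -> Int
toUnsigned n =
    Bitwise.shiftRightZfBy 0 n


-- Addition mod 2^32. Two inputs below 2^32 sum to less than 2^33,
-- far below 2^53, so the JS addition itself is exact before wrapping.
add : Int -> Int -> Int
add x y =
    toUnsigned (x + y)


-- 32-bit left rotation, as used by SHA-1; amount in 0..31.
rotateLeftBy : Int -> Int -> Int
rotateLeftBy amount x =
    toUnsigned
        (Bitwise.or
            (Bitwise.shiftLeftBy amount x)
            (Bitwise.shiftRightZfBy (32 - amount) x)
        )
```

For example, add 0xFFFFFFFF 1 evaluates to 0, and rotateLeftBy 4 0x12345678 to 0x23456781.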

I’m working on a longer post about that PR. Working with JS numbers in this case requires a bit of experimentation, but once you understand where problems can occur, strategically placing a few bit shifts can make it work reliably. And the performance is just a lot better, which actually makes it possible to hash 1 MB+ files. So an elm/bytes-based SHA implementation should be the standard.

norpan · October 15, 2019, 5:33pm

I ended up just uploading the file to the server and having the server return the hash.

Nikolis_Galerakis · October 15, 2019, 5:42pm

Did you want to avoid ports entirely, or did you run into similar issues on the JS side as well? I’m asking because at this point I’m considering either giving the JS side a try or following your path.

norpan · October 15, 2019, 5:56pm

The problem was that it was not possible to get the bytes via ports, so you’d have to do the whole thing in javascript. And since the file most of the time was going to be uploaded to the server anyway, I found it to be the easiest way.

Nikolis_Galerakis · October 15, 2019, 5:59pm

Yeah, but in my case the end destination is S3, so I probably have to invest a bit more time in that.

Nikolis_Galerakis · October 19, 2019, 12:00pm

What did the trick for me was:

1. Define a hidden input of type "file" in Elm (hidden, so the user cannot interact with it directly).
2. Also define an onchange handler on the Elm side, so Elm knows which file was selected, if any.
3. Register a port function that, when invoked, triggers a click on the hidden input.
4. Register an onchange callback on the JavaScript side, on the same input element defined on the Elm side.
5. When the file changes, calculate the hash of the file on the JavaScript side using crypto-js/sha256.
6. Send the calculated value back to Elm through a port.
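A rough sketch of the Elm side of these steps, assuming elm/browser, elm/html, elm/json and elm/file. The port names openFilePicker and fileHashed and the element id are hypothetical; the JavaScript side (subscribing to openFilePicker, calling click() on the input, hashing the chosen file with crypto-js and sending the hex digest back through fileHashed) is not shown.

```elm
port module Main exposing (main)

import Browser
import File exposing (File)
import Html exposing (Html, button, div, input, text)
import Html.Attributes exposing (accept, id, style, type_)
import Html.Events exposing (on, onClick)
import Json.Decode as D


-- Hypothetical ports: Elm asks JS to open the picker; JS sends back the hex digest.
port openFilePicker : () -> Cmd msg


port fileHashed : (String -> msg) -> Sub msg


type alias Model =
    { fileName : Maybe String, hash : Maybe String }


type Msg
    = PickFile
    | FilesChosen (List File)
    | GotHash String


init : () -> ( Model, Cmd Msg )
init _ =
    ( { fileName = Nothing, hash = Nothing }, Cmd.none )


update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        PickFile ->
            ( model, openFilePicker () )

        FilesChosen files ->
            ( { model | fileName = Maybe.map File.name (List.head files) }, Cmd.none )

        GotHash hash ->
            ( { model | hash = Just hash }, Cmd.none )


-- Elm's own "change" listener, so Elm knows which file (if any) was picked.
filesDecoder : D.Decoder (List File)
filesDecoder =
    D.at [ "target", "files" ] (D.list File.decoder)


view : Model -> Html Msg
view model =
    div []
        [ input
            [ type_ "file"
            , accept "image/jpeg"
            , id "hidden-file-input"
            , style "display" "none"
            , on "change" (D.map FilesChosen filesDecoder)
            ]
            []
        , button [ onClick PickFile ] [ text "Choose a JPEG" ]
        , text (Maybe.withDefault "" model.fileName)
        , text (Maybe.withDefault "" model.hash)
        ]


main : Program () Model Msg
main =
    Browser.element
        { init = init
        , update = update
        , view = view
        , subscriptions = \_ -> fileHashed GotHash
        }
```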

system · October 29, 2019, 12:00pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
