How can one get the SHA-256 hash of a file?
What I am currently trying to do is the following:
import Bytes.Decode exposing (Decoder)
import File exposing (File)
import File.Select as Select
- Select.file [ "image/jpeg" ] ImageLoaded
- File.toBytes file to get the bytes of the file
- str_dec = Bytes.Decode.string (Bytes.width bytes)
- str_hash = Bytes.Decode.decode str_dec bytes
But the result of step 4 is always Nothing, although Bytes.width bytes returns a valid Int.
Does anybody have experience with this topic? How would you approach it?
A problem I can see here is that you’re decoding a bunch of arbitrary bytes into a UTF-8 String, but not all byte sequences are valid UTF-8 strings, hence the Nothing. Can you try to select a text file and see if that works?
Edit: I created an Ellie that demonstrates the problem. UTF-8 text files work fine, but some binary files do not. https://ellie-app.com/6W2G972xfwTa1
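To illustrate the point above, here is a minimal sketch of reading the file as raw byte values instead of as a string. It only uses elm/bytes; the module and function names (ByteValues, toByteValues) are made up for this example. Since every byte is a valid unsigned 8-bit integer, this decoder should not return Nothing for binary files the way Bytes.Decode.string does.

```elm
module ByteValues exposing (toByteValues)

import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder, Step(..))


{-| Decode every byte as an unsigned 8-bit integer (0 to 255).
Unlike `Decode.string`, this does not require the input to be valid UTF-8.
-}
toByteValues : Bytes -> Maybe (List Int)
toByteValues bytes =
    Decode.decode (byteList (Bytes.width bytes)) bytes


byteList : Int -> Decoder (List Int)
byteList width =
    Decode.loop ( width, [] ) step


{-| One loop iteration: read a single byte, or finish when none remain.
-}
step : ( Int, List Int ) -> Decoder (Step ( Int, List Int ) (List Int))
step ( remaining, acc ) =
    if remaining <= 0 then
        Decode.succeed (Done (List.reverse acc))

    else
        Decode.map (\byte -> Loop ( remaining - 1, byte :: acc )) Decode.unsignedInt8
```

Note that this builds one list element per byte, so it is only meant to show why the string decoder fails, not as an efficient way to feed a hash function.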
Hmm, you are right, thank you.
Indeed, when I try that with a txt file it works perfectly.
But then do you have any pointers on how to approach this, since I specifically want to get the sha256 of jpg files?
I’d consider this to be a bug in Sha256, since it only accepts String as input and not raw bytes.
You could try the elm-sha package which I gather should accept arbitrary byte sequences with a round trip through a hex string (the intermediate representations required might be pretty memory-intensive for a large file, but images or most documents should be okay).
Another option might be to send the file through a port (maybe base64 encode it?) and use the browser subtle crypto API. It’s much less clean though.
Yes, the icidasset/elm-sha package actually works, unlike billstclair/elm-sha256, which gives incorrect results for some inputs: https://github.com/billstclair/elm-sha256/issues/7
The roll-your-own approach of the icidasset package is probably less efficient, but it’s guaranteed to work.
Well, when I tried working with elm-sha, first converting to hex with jxxcarlson/hex and then going to binary and getting the sha256 of the binary, something went awfully wrong: the window goes blank, the console too, and I still can’t close that window.
It is probably the memory issue you described; my test jpg is 955.8 kB.
I tried to get the hash of the hex string with elm crypto, but I think I am still having memory issues, because I get a "not responding" alert from Chrome!
Furthermore, when I get the bytes of the file, then get a hex string out of the bytes, and use an online tool to calculate the hash of that hex string, the result differs from the one I get by hashing the file directly.
Using elm/bytes directly should be the way to go. Anything else will be orders of magnitude slower and suffer from memory problems. (e.g. an 8 million items long
It looks like there is currently no package that can really do this though. I’m happy to help out if someone wants to give it a go.
That’s pretty slick. I wouldn’t have thought of putting the logic in a Decoder, but it makes sense considering it’s a transform from input to a SHA state. I’m still new to the FP mindset!
The main advantage of moving the logic into the Decoder is that only one pass is made over the input data, and there is little allocation. Using lists of items, you often get pipelines like byteValues |> groupsOf n |> List.map g |> etc. that look nice and simple, but traverse and allocate effectively the whole input again for each step.
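As a rough sketch of the single-pass idea, the decoder below folds over the bytes while reading them, so no intermediate list is ever built. The running state here is just a sum of all bytes modulo 2^32, used as a stand-in for a real hash state; it is not SHA, and the names (Checksum, checksum, step) are invented for the example.

```elm
module Checksum exposing (checksum)

import Bitwise
import Bytes exposing (Bytes)
import Bytes.Decode as Decode exposing (Decoder, Step(..))


{-| Sum of all bytes modulo 2^32, computed in one pass over the input.
This is a placeholder for a real hash state, not a hash function.
-}
checksum : Bytes -> Maybe Int
checksum bytes =
    Decode.decode (Decode.loop ( Bytes.width bytes, 0 ) step) bytes


{-| Read one byte and update the state, or finish when no bytes remain.
-}
step : ( Int, Int ) -> Decoder (Step ( Int, Int ) Int)
step ( remaining, state ) =
    if remaining <= 0 then
        Decode.succeed (Done state)

    else
        Decode.unsignedInt8
            |> Decode.map
                (\byte ->
                    -- `shiftRightZfBy 0` keeps the state an unsigned 32-bit value
                    Loop ( remaining - 1, Bitwise.shiftRightZfBy 0 (state + byte) )
                )
```

A real implementation would presumably read larger chunks per step (for example with Decode.unsignedInt32) and carry the full hash state, but the shape of the loop stays the same.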
I was able to get rid of those masks on this branch.
A tricky aspect of sha1 is that it mixes unsigned 32-bit integer addition with bitwise operators. Some bitwise operators can flip the sign (e.g. Bitwise.complement 6 == -7) and clearly that will be a problem when doing addition (which, as you say, is signed by default). In this case it is enough to add a |> Bitwise.shiftRightZfBy 0 (built into rotateLeftBy) before addition to force the number to be unsigned and overflow. Because the starting numbers are relatively small (in the order
2^53 - 1) so the intermediate Bitwise.and 0xFFFFFFFF could be removed.
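A small sketch of that trick, using only the Bitwise module from elm/core. The helper names (toUnsigned, add32) are hypothetical and not taken from the PR; the sketch assumes the plain sum stays below 2^53 so that it is still exact before being wrapped.

```elm
module Word32 exposing (add32, toUnsigned)

import Bitwise


{-| Reinterpret an Int as an unsigned 32-bit value.
For example, `toUnsigned (Bitwise.complement 6)` turns -7 into 4294967289.
-}
toUnsigned : Int -> Int
toUnsigned n =
    Bitwise.shiftRightZfBy 0 n


{-| Unsigned 32-bit addition with wrap-around.
Both operands are made unsigned first, then the sum is wrapped modulo 2^32,
so `add32 0xFFFFFFFF 1` gives 0.
-}
add32 : Int -> Int -> Int
add32 a b =
    toUnsigned (toUnsigned a + toUnsigned b)
```

Because the zero-fill shift already truncates to 32 bits, no separate Bitwise.and 0xFFFFFFFF mask is needed after the addition in this sketch.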
I’m working on a longer post about that PR. Working with JS numbers in this case requires a bit of experimentation, but when you understand where problems can occur, strategically placing a bunch of bit shifts can make it work reliably. And the performance is just a lot better, actually making it possible to hash 1Mb+ files. So an elm-bytes based sha should be the standard.
I ended up just uploading the file to the server and having the server return the hash.
Did you want to avoid ports at all costs, or did you find similar issues on the JS side as well? I am asking because at this point I am considering either giving the JS side a try or following your path.
Yeah, but in my case the end destination is S3, so I will probably have to invest a bit more time in that.
What did the trick for me was:
- Define a hidden input in Elm with type file (so the user cannot interact with it directly)
- I have also defined an onchange handler on the Elm side that lets me know which file was selected, if any
- Register a port function that, when invoked, triggers the click behaviour of the hidden input
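The Elm side of those steps could look roughly like the sketch below. Everything named here (HiddenFileInput, FilesSelected, clickInput, hiddenInput) is hypothetical, and it assumes the JavaScript side subscribes to the clickInput port and calls click() on the element with the given id; that JavaScript part is not shown.

```elm
port module HiddenFileInput exposing (Msg(..), clickInput, hiddenInput)

import File exposing (File)
import Html exposing (Html)
import Html.Attributes as Attr
import Html.Events as Events
import Json.Decode as Decode


type Msg
    = FilesSelected (List File)


{-| Hypothetical port: sends the id of the input to JavaScript, which is
assumed to look the element up and trigger a click on it.
-}
port clickInput : String -> Cmd msg


{-| A file input that is hidden from the user. Its change event reports
the selected files back to Elm.
-}
hiddenInput : String -> Html Msg
hiddenInput inputId =
    Html.input
        [ Attr.type_ "file"
        , Attr.id inputId
        , Attr.hidden True
        , Events.on "change" (Decode.map FilesSelected filesDecoder)
        ]
        []


{-| Read the selected files from `event.target.files`.
-}
filesDecoder : Decode.Decoder (List File)
filesDecoder =
    Decode.at [ "target", "files" ] (Decode.list File.decoder)
```

A visible button would then send clickInput with the same id from update, which keeps the native file dialog reachable while the input itself stays hidden.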