Elm-tar issue with special characters

neurodynamic · November 5, 2019, 7:48pm

I opened an issue on GitHub, but I’ve also been trying to debug this myself and was curious whether anyone here has any insight. I’ve been using elm-tar in one of my projects, and I’ve found that the .tar archives it creates break on extraction if any of the files in the archive contain non-ASCII characters such as an em dash (—) or an emoji.

If a file contains some of these characters, then after extraction that file is missing a couple of characters at the very end for each special character it contains, and none of the files after it in the archive are extracted at all.

E.g., a tar archive containing three files with these contents:

1. abcdef
2. abc🙂def
3. abcdef

extracts only two files before erroring:

1. abcdef
2. abc🙂d

Note the missing ef at the end of the second file.

I’m guessing this has to do with these characters taking up more bytes than ordinary (ASCII) characters, but I can’t figure out a fix. I noticed that elm-tar sets the file size by taking the length of the file’s content string, without accounting for how many bytes each character needs. I think that might be related, but I don’t know how to adjust it appropriately.
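The symptoms can be reproduced on paper with a small model. This is a sketch in TypeScript (Elm strings are JavaScript UTF-16 strings at runtime), and it rests on assumptions that have not been checked against elm-tar’s source: the header size comes from `String.length`, the payload is written as UTF-8, and the padding up to the next 512-byte tar block is computed from that same too-small header size. The names `utf8ByteLength` and `simulateEntry` are hypothetical.

```typescript
// Hypothetical model of the suspected bug, not elm-tar's actual code.
const BLOCK_SIZE = 512; // tar archives are laid out in 512-byte blocks

// Number of bytes needed to encode `s` as UTF-8, walking its UTF-16 code units.
function utf8ByteLength(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (unit < 0x80) {
      bytes += 1; // ASCII
    } else if (unit < 0x800) {
      bytes += 2;
    } else if (
      unit >= 0xd800 && unit <= 0xdbff && i + 1 < s.length &&
      s.charCodeAt(i + 1) >= 0xdc00 && s.charCodeAt(i + 1) <= 0xdfff
    ) {
      bytes += 4; // surrogate pair = one code point outside the BMP (e.g. emoji)
      i++;
    } else {
      bytes += 3; // rest of the BMP, e.g. the em dash; lone surrogates become U+FFFD (also 3)
    }
  }
  return bytes;
}

function roundUpToBlock(n: number): number {
  return Math.ceil(n / BLOCK_SIZE) * BLOCK_SIZE;
}

interface EntryReport {
  declared: number;           // size written to the header: content.length (UTF-16 code units)
  written: number;            // payload bytes actually emitted (UTF-8)
  bytesLost: number;          // tail bytes the extractor never reads for this file
  expectedNextHeader: number; // offset where the extractor looks for the next header
  actualNextHeader: number;   // offset where the next header really starts
}

// Assumption: the full UTF-8 payload is written, but padded as if only `declared` bytes were.
function simulateEntry(content: string): EntryReport {
  const declared = content.length;
  const written = utf8ByteLength(content);
  const padding = roundUpToBlock(declared) - declared;
  return {
    declared,
    written,
    bytesLost: written - declared,
    expectedNextHeader: roundUpToBlock(declared),
    actualNextHeader: written + padding,
  };
}
```

For `abc🙂def` this gives a declared size of 8 but 10 bytes written, so the extractor keeps `abc🙂d` and drops 2 bytes (the `ef`). It then looks for the next header at offset 512, while under this model the header really starts at 514, which would explain why nothing after that file is extracted. An em dash likewise costs 2 bytes (1 UTF-16 unit versus 3 UTF-8 bytes).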

malaire · November 5, 2019, 8:03pm

That is definitely wrong. String.length returns the length in UTF-16 code units, not in bytes, but the size in a tar header has to be the number of bytes in the file.

I added a comment about this to the issue.
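One direction for a fix, sketched in TypeScript with hypothetical names (`utf8Encode`, `layoutEntry`): encode the content to UTF-8 first, then derive both the header size and the block padding from the encoded byte count, never from the string’s length. The ustar header stores the size as octal digits in a 12-byte field. On the Elm side, elm/bytes provides `Bytes.Encode.getStringWidth`, which returns a string’s width in UTF-8 bytes and looks like a natural replacement for `String.length` here; how it would slot into elm-tar’s encoder has not been verified.

```typescript
const BLOCK_SIZE = 512;

// Manual UTF-8 encoder so the sketch has no runtime dependencies.
function utf8Encode(s: string): number[] {
  const out: number[] = [];
  for (let i = 0; i < s.length; i++) {
    let cp = s.charCodeAt(i);
    // Combine a high/low surrogate pair into a single code point.
    if (cp >= 0xd800 && cp <= 0xdbff && i + 1 < s.length) {
      const lo = s.charCodeAt(i + 1);
      if (lo >= 0xdc00 && lo <= 0xdfff) {
        cp = 0x10000 + ((cp - 0xd800) << 10) + (lo - 0xdc00);
        i++;
      }
    }
    // Lone surrogates are replaced with U+FFFD, as TextEncoder does.
    if (cp >= 0xd800 && cp <= 0xdfff) cp = 0xfffd;
    if (cp < 0x80) {
      out.push(cp);
    } else if (cp < 0x800) {
      out.push(0xc0 | (cp >> 6), 0x80 | (cp & 0x3f));
    } else if (cp < 0x10000) {
      out.push(0xe0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    } else {
      out.push(0xf0 | (cp >> 18), 0x80 | ((cp >> 12) & 0x3f),
               0x80 | ((cp >> 6) & 0x3f), 0x80 | (cp & 0x3f));
    }
  }
  return out;
}

// ustar size field: 11 zero-padded octal digits plus a NUL (sizes below 8 GiB).
function sizeField(byteCount: number): string {
  return ("00000000000" + byteCount.toString(8)).slice(-11) + "\0";
}

// Zero bytes needed after the payload to reach the next 512-byte boundary.
function paddingFor(byteCount: number): number {
  return (BLOCK_SIZE - (byteCount % BLOCK_SIZE)) % BLOCK_SIZE;
}

interface TarEntryLayout {
  payload: number[]; // UTF-8 bytes of the file content
  size: string;      // header size field, from payload.length
  padding: number;   // zero bytes to append, from payload.length
}

function layoutEntry(content: string): TarEntryLayout {
  const payload = utf8Encode(content);
  // Size and padding both come from the encoded bytes, not from content.length.
  return { payload, size: sizeField(payload.length), padding: paddingFor(payload.length) };
}
```

For `abc🙂def` this produces a 10-byte payload, a size field of `00000000012` (10 in octal) and 502 bytes of padding, so the next header starts exactly at offset 512.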

system · November 15, 2019, 8:04pm

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.
