Elm-tar issue with special characters

I opened an issue on GitHub, but I’ve been trying to debug it myself and was curious whether anyone has insight into this. I’ve been using elm-tar in one of my projects, and I’ve found that the .tar archives it creates break on extraction if any of the files in the archive contain special characters, like an em dash (U+2014) or an emoji.

If a file contains some of these special characters, then when the archive is extracted, that file is missing a couple of characters at its very end for each special character it contains, and any files after it in the archive are not extracted at all.

For example, if I make a tar archive with three files containing:

1. abcdef
2. abc🙂def
3. abcdef

then extracting it produces only two files before erroring:

1. abcdef
2. abc🙂d

Note the missing ef at the end of the second file.
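A minimal TypeScript sketch that reproduces the truncated second file, assuming the header's size is taken from the string's UTF-16 length while the data itself is written as UTF-8. Elm compiles to JavaScript, so JS `.length` counts the same UTF-16 code units as Elm's String.length. The helper `extractWithWrongSize` is hypothetical, not elm-tar code, and it models only a single entry; it does not model how the later entries get lost.

```typescript
// Assumption: the header size comes from the string's UTF-16 length
// (JS .length, equivalent to Elm's String.length) while the data is UTF-8.
const encoder = new TextEncoder();
const decoder = new TextDecoder();

// Hypothetical helper: the text an extractor recovers for one entry when it
// trusts a size field computed from UTF-16 code units.
function extractWithWrongSize(contents: string): string {
  const declaredSize = contents.length;  // UTF-16 code units (undercounts)
  const data = encoder.encode(contents); // bytes actually stored
  // The extractor reads exactly declaredSize bytes of file data.
  return decoder.decode(data.subarray(0, declaredSize));
}

console.log(extractWithWrongSize("abcdef"));   // prints abcdef
console.log(extractWithWrongSize("abc🙂def")); // prints abc🙂d
```

For abc🙂def the declared size is 8, which covers abc, the four UTF-8 bytes of 🙂, and d, so the trailing ef is dropped exactly as in the example.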

I’m guessing this has to do with these characters taking up more bytes than ordinary characters, but I can’t figure out a fix. I noticed that elm-tar sets the file size from the length of the file’s string, without accounting for how many bytes each character takes. I think that might be related, but I don’t know how to adjust it correctly.
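To check the guess that these characters take more bytes, this sketch compares UTF-16 code units (what String.length counts) with UTF-8 bytes for a few sample characters. The difference is how many bytes a length-based size field would come up short per character. It assumes the archive stores text as UTF-8, and the sample set is chosen only for illustration.

```typescript
// Compare UTF-16 code units (JavaScript .length, which Elm's String.length
// mirrors because Elm strings are JS strings) with UTF-8 bytes.
const utf8 = new TextEncoder();

interface Shortfall {
  utf16Units: number; // what a length-based size field would count
  utf8Bytes: number;  // what is actually written (assumed UTF-8)
  missing: number;    // bytes the declared size undercounts
}

function shortfall(ch: string): Shortfall {
  const utf16Units = ch.length;
  const utf8Bytes = utf8.encode(ch).length;
  return { utf16Units, utf8Bytes, missing: utf8Bytes - utf16Units };
}

// Sample characters chosen for illustration.
const samples: Array<[string, string]> = [
  ["ASCII letter a", "a"],
  ["e with acute (U+00E9)", "\u00E9"],
  ["em dash (U+2014)", "\u2014"],
  ["slightly smiling face (U+1F642)", "\u{1F642}"],
];

for (const [label, ch] of samples) {
  const s = shortfall(ch);
  console.log(`${label}: ${s.utf16Units} UTF-16 unit(s), ${s.utf8Bytes} UTF-8 byte(s), ${s.missing} short`);
}
```

An em dash and 🙂 each come up 2 bytes short, which matches the couple of characters lost per special character; a character such as é would lose only 1.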

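That is definitely wrong. String.length returns the length in UTF-16 code units, not in bytes, but the size field in a tar header has to give the number of bytes of file data. A character like 🙂 is 2 UTF-16 code units but 4 bytes in UTF-8, so the declared size comes up short.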
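A sketch of sizing an entry from its byte count instead: take the UTF-8 byte length of the contents and write it into the 12-byte octal size field of a ustar header. The helper names are hypothetical and this is not elm-tar’s code; in Elm, Bytes.Encode.getStringWidth from elm/bytes returns the same UTF-8 width. The padding helper assumes the 512-byte data padding is also derived from the size. If it were computed from the UTF-16 length, later headers would be misaligned, which is one plausible explanation for the missing files but has not been confirmed against elm-tar’s source.

```typescript
// Sketch: size a tar entry by its UTF-8 byte length, not its UTF-16 length.
// Assumption: file contents are written to the archive as UTF-8.
function utf8ByteLength(contents: string): number {
  return new TextEncoder().encode(contents).length;
}

// The ustar header stores the size as ASCII octal in a 12-byte field;
// this writes 11 zero-padded octal digits followed by a NUL byte.
function tarSizeField(contents: string): Uint8Array {
  const octal = utf8ByteLength(contents).toString(8).padStart(11, "0");
  const field = new Uint8Array(12); // zero-initialised, so field[11] is NUL
  for (let i = 0; i < 11; i++) field[i] = octal.charCodeAt(i);
  return field;
}

// File data is padded with zeros up to the next 512-byte block boundary,
// so the padding must be computed from the same byte count.
function paddedDataLength(contents: string): number {
  return Math.ceil(utf8ByteLength(contents) / 512) * 512;
}

const contents = "abc🙂def";
console.log(contents.length, utf8ByteLength(contents)); // UTF-16 units vs bytes
console.log(new TextDecoder().decode(tarSizeField(contents).subarray(0, 11)));
console.log(paddedDataLength(contents));
```

For abc🙂def this gives a size of 10 bytes (octal 00000000012) rather than 8, stored in one 512-byte data block.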

I added a comment about this to the issue.

