How to do multipart file upload with non-ascii filenames?

eikek · August 9, 2021, 11:27pm

Hi all!

I’m using Elm’s File module to create an Http.multipartBody from a List File via Http.filePart. This has worked great for me so far. But now a file with Cyrillic letters in its name was uploaded, and Elm sends the following headers:

Content-Disposition: form-data; name="file[]"; filename="ÐÐ¾Ð½ÑÑÐ°Ð½ÑÐ¸Ð½Ð¾Ð¿Ð¾Ð»Ñ.pdf"
Content-Type: application/pdf

The name is Константинополя.pdf. It looks like the filename attribute contains raw UTF-8 bytes. My HTTP server has problems with this and drops the filename completely.
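A quick way to check the "UTF-8 bytes" reading is to reproduce the effect outside the browser. The Python sketch below assumes the viewer displays each byte as one Latin-1 character (an assumption; the dev tools may use a different single-byte charset), which yields a string resembling the jumbled one above:

```python
# The real filename from the post.
name = "Константинополя.pdf"

# What goes on the wire: the UTF-8 encoding of the name (two bytes per Cyrillic letter).
wire_bytes = name.encode("utf-8")

# A viewer that assumes one byte per character (Latin-1 here, as an assumption)
# shows each Cyrillic letter as two characters, e.g. "К" becomes "Ð" plus a control character.
garbled = wire_bytes.decode("latin-1")

# Reversing the wrong decoding and reading the bytes as UTF-8 recovers the name.
recovered = garbled.encode("latin-1").decode("utf-8")

print(len(name), len(wire_bytes))
print(repr(garbled))
print(recovered == name)
```

If the round trip recovers the original name, the bytes on the wire are intact and only the display (or the server's decoding) is off.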

Is this correct behavior, or would a percent-encoded version with the filename* attribute make more sense here (and even in general)?

Does anyone have a tip on how to work around this? Can I somehow tell Elm to use a different encoding for the filename?
TIA!

Edit: I just found this RFC, where it says:

NOTE: The encoding method described in [RFC5987], which would add a “filename*” parameter to the Content-Disposition header field, MUST NOT be used.

Does this mean non-ASCII bytes are allowed in HTTP headers?
Is there a reason that the variant described in RFC5987 must not be used?
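For comparison, here is roughly what the filename* form from RFC 5987 would look like for this file. This is a Python sketch only: the helper name is hypothetical, and the note quoted above says this form must not be used in multipart/form-data, so it illustrates the format rather than recommending it.

```python
from urllib.parse import quote


def rfc5987_filename_param(filename: str) -> str:
    """Build a filename* parameter value (hypothetical helper, for illustration)."""
    # quote() with safe="" percent-encodes every byte of the UTF-8 form except
    # ASCII letters, digits and "_.-~", all of which RFC 5987 allows unescaped.
    return "filename*=UTF-8''" + quote(filename, safe="", encoding="utf-8")


param = rfc5987_filename_param("Константинополя.pdf")
print(param)
```

The result is pure ASCII: the charset is named explicitly, and each Cyrillic letter becomes two percent-encoded bytes.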

lydell · August 10, 2021, 9:55am

Hi!

elm/http does nothing special here. It just uses the standard FormData to create the request body.

In other words, it doesn’t feel like Elm is trying to be fancy and do a lot of custom stuff. So I’d be surprised if Elm is the culprit.

The headers that you pasted with the jumbled filename – where does that come from?

I think the first step is looking at the request body in the browser’s devtools. Does it look good there? Then, we need to rule out that your server isn’t handling the request “wrong.” Finally, I’d check if I need to set encoding=utf-8 somewhere, like in a header or something (I don’t know how that works off the top of my head).

Trying to make the request with plain JS just to see if that helps can be valuable too – then you know if you can rule out Elm or not.

eikek · August 10, 2021, 11:11am

Hi!

Thanks for your help! I also had a look at the JS files and came to the same conclusion: Elm does nothing special here.

The headers I posted are from the request body, copied from the Firefox dev tools. The funny filename caught my attention. I always thought that HTTP headers were ASCII only! But now I have dived into some RFCs and, while I feel even more confused than before, I now think that UTF-8 bytes are allowed. I have to correct an old part of my brain here. I knew about the attribute* parameters, like name* and filename*, which let you specify a charset and transport the bytes as a percent-encoded ASCII string. But the RFC I linked above states that this “MUST NOT” be used for any attributes in a Content-Disposition header.

I also looked at other clients, for example an HTML form in Firefox and a Java client (Apache HttpClient, which is quite popular). The former sends the non-ASCII characters of the filename as numeric character references (like filename="&#1050;&#1086;…), while the Java client writes them as UTF-8 bytes (it actually provides two implementations).

So after all, I currently think that the server should allow UTF-8 bytes in HTTP headers, which the library I’m using does not allow right now. I also found GitHub issues in other projects that discuss the same thing and moved away from filename* (example). The spec/RFC side is not very clear (to me!), but I think filename* is just old knowledge, now superseded by simply using UTF-8 encoded bytes.
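On the server side, "allowing UTF-8 bytes" could look something like the Python sketch below. It is not tied to any particular HTTP library, the function name is hypothetical, and it deliberately ignores edge cases such as escaped quotes inside the value.

```python
import re
from typing import Optional


def parse_filename(raw_header: bytes) -> Optional[str]:
    """Extract the filename parameter from a raw Content-Disposition line (hypothetical helper)."""
    try:
        # Prefer UTF-8, which is what the browser sent in this thread.
        text = raw_header.decode("utf-8")
    except UnicodeDecodeError:
        # Fall back to Latin-1, which can decode any byte sequence.
        text = raw_header.decode("latin-1")
    # Match only the plain quoted filename="..." parameter, not filename*=...
    match = re.search(r'(?:^|;)\s*filename="([^"]*)"', text)
    return match.group(1) if match else None


header = 'Content-Disposition: form-data; name="file[]"; filename="Константинополя.pdf"'
print(parse_filename(header.encode("utf-8")))
```

Trying UTF-8 first and falling back to a single-byte charset keeps older clients working while accepting what current browsers send.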

I was also wondering if someone else has run into a similar issue.

