How to do a multipart file upload with non-ASCII filenames?

Hi all!

I’m using Elm’s File module to create an Http.multipartBody from a List File via Http.filePart. This has worked great for me so far. But now a file with Cyrillic letters was uploaded, and Elm sends the following headers:

Content-Disposition: form-data; name="file[]"; filename="Константинополя.pdf"
Content-Type: application/pdf

The file name is Константинополя.pdf. It looks like the filename attribute contains raw UTF-8 bytes. My HTTP server has problems with this and drops the filename completely.
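To illustrate why the same filename can look fine in one tool and jumbled in another, here is a small Python sketch (Python is only used for illustration; Elm itself does none of this). It assumes the browser sent the name as raw UTF-8, as described above, and that the receiving side decodes header bytes as ISO-8859-1 (Latin-1), a common default that the thread does not confirm for this particular server.

```python
# The file name from the question.
name = "Константинополя.pdf"

# Assumed wire format: the name encoded as UTF-8.
# Each Cyrillic letter becomes two bytes; ".pdf" stays one byte per character.
raw = name.encode("utf-8")

# A parser that assumes ISO-8859-1 (Latin-1) maps every byte to one character,
# so each Cyrillic letter shows up as two unrelated characters ("jumbled").
jumbled = raw.decode("latin-1")

# Latin-1 maps bytes to code points one-to-one, so nothing was lost:
# re-encoding as Latin-1 gives back the original bytes, which decode as UTF-8.
recovered = jumbled.encode("latin-1").decode("utf-8")

print(len(name), len(raw), len(jumbled))  # 19 34 34
print(recovered == name)                  # True
```

So a "jumbled" name is not necessarily corrupted: if the bytes were only misinterpreted, they can be reinterpreted.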

Is this correct behavior, or would a percent-encoded version with the filename* attribute make more sense here (and perhaps in general)?
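For comparison, this is roughly what the percent-encoded filename* form from RFC 5987 (since obsoleted by RFC 8187) looks like. It is a Python sketch; `ext_filename_param` is a hypothetical helper, and it assumes charset UTF-8 with an empty language tag. As the RFC note quoted later in the thread says, this parameter is not allowed in multipart/form-data; it is the form used in HTTP response headers such as `Content-Disposition: attachment`.

```python
from urllib.parse import quote

# RFC 5987 attr-char = ALPHA / DIGIT / "!" / "#" / "$" / "&" / "+" / "-" / "."
#                      / "^" / "_" / "`" / "|" / "~"
# quote() never escapes letters, digits or "_.-~", so only the remaining
# attr-char symbols have to be passed as "safe".
ATTR_CHAR_SYMBOLS = "!#$&+^`|"

def ext_filename_param(name: str) -> str:
    """Hypothetical helper: build a filename* parameter with charset UTF-8
    and an empty language tag, e.g. filename*=UTF-8''%D0%9A..."""
    return "filename*=UTF-8''" + quote(name, safe=ATTR_CHAR_SYMBOLS, encoding="utf-8")

print(ext_filename_param("Константинополя.pdf"))
```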

Does anyone have a tip on how to work around this? Can I somehow tell Elm to use a different encoding for the filename?
TIA!

Edit: I just found this RFC, which says:

NOTE: The encoding method described in [RFC5987], which would add a “filename*” parameter to the Content-Disposition header field, MUST NOT be used.

Does this mean non-ASCII bytes are allowed in HTTP headers?
Is there a reason the variant described in RFC5987 must not be used?
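The note quoted above comes from RFC 7578 (multipart/form-data), section 4.2. The same section says that file names MAY instead be percent-encoded inside the ordinary filename parameter, the way a file: URI would be. Also worth noting: the Content-Disposition lines of a multipart body are headers of a body part and travel inside the request body, not as HTTP header fields of the request. Below is a Python sketch of that percent-encoding option; the helper names are hypothetical, and it assumes everything outside the RFC 3986 unreserved set is escaped.

```python
from urllib.parse import quote, unquote

def percent_encode_filename(name: str) -> str:
    """Hypothetical helper: escape every UTF-8 byte outside the RFC 3986
    unreserved set (letters, digits, "-", ".", "_", "~")."""
    return quote(name, safe="", encoding="utf-8")

def percent_decode_filename(value: str) -> str:
    """Reverse the encoding; errors="strict" raises on invalid UTF-8
    instead of silently inserting replacement characters."""
    return unquote(value, encoding="utf-8", errors="strict")

encoded = percent_encode_filename("Константинополя.pdf")
line = f'Content-Disposition: form-data; name="file[]"; filename="{encoded}"'
print(line)            # plain ASCII, no filename* parameter
print(line.isascii())  # True
print(percent_decode_filename(encoded))
```

The catch is that a receiver cannot tell whether a name like `%41.pdf` was encoded or literally contains a percent sign, so client and server would have to agree on this convention.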

Hi!

elm/http does nothing special here. It just uses the browser’s standard FormData API to build the request body.
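As far as I know, current browsers serialize FormData following the WHATWG HTML multipart/form-data encoding algorithm: names and filenames are written as UTF-8, and only LF, CR and `"` are escaped as `%0A`, `%0D` and `%22`. The Python sketch below imitates that to show what ends up in the body; `build_multipart`, `escape_field` and the `----sketch` boundary prefix are made up for illustration, and real browsers may differ in details such as the boundary format.

```python
import secrets
from typing import Tuple

def escape_field(value: str) -> bytes:
    """Mimic the WHATWG multipart/form-data escaping for names and filenames:
    encode as UTF-8, then replace LF, CR and '"' with %0A, %0D and %22.
    All other bytes, including non-ASCII ones, are written as-is."""
    return (
        value.encode("utf-8")
        .replace(b"\n", b"%0A")
        .replace(b"\r", b"%0D")
        .replace(b'"', b"%22")
    )

def build_multipart(field: str, filename: str, content_type: str, data: bytes) -> Tuple[str, bytes]:
    """Hypothetical helper: a single-file multipart/form-data body."""
    boundary = ("----sketch" + secrets.token_hex(8)).encode("ascii")
    body = (
        b"--" + boundary + b"\r\n"
        # These two lines are the "headers" of the body part: they travel
        # inside the request body, not as HTTP header fields of the request.
        + b'Content-Disposition: form-data; name="' + escape_field(field)
        + b'"; filename="' + escape_field(filename) + b'"\r\n'
        + b"Content-Type: " + content_type.encode("ascii") + b"\r\n"
        + b"\r\n"
        + data + b"\r\n"
        + b"--" + boundary + b"--\r\n"
    )
    return "multipart/form-data; boundary=" + boundary.decode("ascii"), body

content_type, body = build_multipart("file[]", "Константинополя.pdf", "application/pdf", b"%PDF-1.4 ...")
print(content_type)
print(body.decode("utf-8"))
```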

In other words, it doesn’t feel like Elm is trying to be fancy and do a lot of custom stuff. So I’d be surprised if Elm is the culprit.

The headers that you pasted with the jumbled filename: where do they come from?

I think the first step is to look at the request body in the browser’s devtools. Does it look right there? Next, we need to rule out that your server is handling the request “wrong.” Finally, I’d check whether I need to set encoding=utf-8 somewhere, like in a header (I don’t know how that works off the top of my head).

Making the same request with plain JS can be valuable too: then you know whether you can rule out Elm.


Hi!

Thanks for your help! I also had a look at the JS files and came to the same conclusion: Elm does nothing special.

The headers I posted are from the request body, copied from the Firefox devtools. The odd filename caught my attention because I always thought HTTP headers were ASCII-only! So I dug into some RFCs, and while I’m even more confused than before, I now think UTF-8 bytes are allowed. I have to correct an old part of my brain here :) I knew about the starred parameters like name* and filename*, which let you specify a charset and transport the bytes as a percent-encoded ASCII string. But the RFC I linked above states that filename* “MUST NOT” be used in the Content-Disposition header of a form-data part.

I also looked at other clients, for example a plain HTML form in Firefox and a Java client (Apache HttpClient, which is quite popular). The former sends the filename as entities (like filename="Ко…), while the Java client writes the bytes as UTF-8 (it actually provides two implementations).
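If a server has to cope with the entity form as well, a normalization step could look like the Python sketch below. It assumes the "entities" are HTML numeric character references such as `&#1050;` (the example above is truncated, so that is a guess), and `decode_ncrs` is a hypothetical helper. It deliberately ignores named entities like `&amp;`, since a real file name could contain that text.

```python
import re

# Decimal (&#1050;) or hexadecimal (&#x41A;) numeric character references.
_NCR = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def decode_ncrs(name: str) -> str:
    """Hypothetical helper: replace numeric character references with the
    characters they stand for. Named entities like &amp; are left alone."""
    def repl(m):
        code = int(m.group(1), 16) if m.group(1) else int(m.group(2))
        # Leave out-of-range references untouched instead of raising.
        return chr(code) if code <= 0x10FFFF else m.group(0)
    return _NCR.sub(repl, name)

print(decode_ncrs("&#1050;&#1086;.pdf"))  # Ко.pdf
```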

So after all this, I currently think that the server should accept UTF-8 bytes in HTTP headers, which the library I’m using does not allow right now. I also found GitHub issues in other projects discussing the same thing, some of which moved away from filename* (example). The specs/RFCs are not very clear (to me!), but I think filename* is old knowledge, now superseded by simply sending UTF-8 encoded bytes.
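On the server side, accepting UTF-8 could look roughly like the Python sketch below. `filename_from_part_header` is a hypothetical helper, not the API of any particular server library; it works on the raw bytes of one part's Content-Disposition line and assumes quotes inside the value arrive escaped as `%22`, as browsers send them. The Latin-1 fallback is a design choice to keep some name rather than drop it.

```python
import re
from typing import Optional

# Matches filename="..." but not filename*=...; quotes inside the value are
# expected to arrive escaped as %22, so [^"]* is enough here.
_FILENAME = re.compile(rb'filename="([^"]*)"')

def filename_from_part_header(line: bytes) -> Optional[str]:
    """Hypothetical helper: pull the filename out of the raw bytes of a
    part's Content-Disposition line, accepting UTF-8 instead of rejecting it."""
    m = _FILENAME.search(line)
    if m is None:
        return None
    raw = m.group(1)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: fall back to Latin-1, which accepts any byte,
        # rather than dropping the name altogether.
        return raw.decode("latin-1")

line = 'Content-Disposition: form-data; name="file[]"; filename="Константинополя.pdf"'.encode("utf-8")
print(filename_from_part_header(line))
```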

I was also wondering if someone else has run into a similar issue.
