Parsing querystring - woes with + and %20 as spaces

I’ve written an interface for our website search. It handles filtering, sorting, etc., and refreshes the results as those change.

I’m using the URL as the source of record for the initial state of the search interface (I pass the current URL in via a flag). As the filters, sorting, etc. are updated, I modify the URL and, via a port, use pushState to update the current URL. This means that users can bookmark a search results page, share it by copying the URL, etc. It’s working pretty well. I load state by parsing a URL and I store state by encoding a URL.
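For the "store state by encoding a URL" side, here is a minimal sketch, assuming elm/url's `Url.Builder` is what builds the string sent through the port (the post only says a port calls pushState). The module name, the function name `searchUrl` and the `sort` parameter are hypothetical. `Url.Builder.string` percent-encodes values, so a space comes out as `%20`, not `+`:

```elm
module SearchUrl exposing (searchUrl)

import Url.Builder


{-| Build the path + query that would be handed to the pushState port.
`sort` is a hypothetical extra parameter used only for illustration.
-}
searchUrl : String -> String -> String
searchUrl query sort =
    Url.Builder.absolute [ "search" ]
        [ -- percent-encoded: a space becomes %20, a literal + becomes %2B
          Url.Builder.string "q" query
        , Url.Builder.string "sort" sort
        ]
```

With this, `searchUrl "foo bar" "price"` gives `/search?q=foo%20bar&sort=price`, while the initial HTML form submission produces `q=foo+bar`, so the same search can reach the parser in two different encodings.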

The problem I’m running into is with spaces in the search string. A search starts with a regular HTML form that does a GET request to the search page. If someone searches for “foo bar”, then the URL looks like http://www.example.com/search?q=foo+bar. The problem is that after I parse out the q param, the string I get back is “foo+bar”. The space remains as a plus and doesn’t get decoded.
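The plus survives because, as far as I can tell from the elm/url source, the query parser runs each value through `Url.percentDecode`, which (like JavaScript's `decodeURIComponent`) only undoes `%XX` escapes and leaves `+` alone. A small check of that function; the module and value names are hypothetical:

```elm
module PlusDemo exposing (decodedPercent20, decodedPlus)

import Url


-- A form-style plus is not an escape, so it is returned unchanged.
decodedPlus : Maybe String
decodedPlus =
    Url.percentDecode "foo+bar"


-- A %20 escape is decoded to a real space.
decodedPercent20 : Maybe String
decodedPercent20 =
    Url.percentDecode "foo%20bar"
```

`decodedPlus` is `Just "foo+bar"` and `decodedPercent20` is `Just "foo bar"`.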

Here’s how I’m decoding:

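```elm
import Url exposing (Url)
import Url.Parser exposing ((<?>), s)
import Url.Parser.Query as Query


extractSearchStringFromUrl : Url -> Maybe String
extractSearchStringFromUrl url =
    let
        -- Matches /search and reads the "q" query parameter
        parser =
            s "search" <?> Query.string "q"
    in
    case Url.Parser.parse parser url of
        Nothing ->
            -- The path did not match /search
            Nothing

        Just match ->
            -- match is already a Maybe String (Nothing if q is missing or repeated)
            match
```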

Ignore that this case is a little verbose (and unnecessary). It’s going to do some other things in the future.

My question is: how can I get a decoded querystring parameter where the pluses are replaced with actual spaces? A plus is a valid character in our search system, so I can’t simply replace pluses with spaces after the match. I did have some success changing + to %20 in the URL before parsing it, but that feels clumsy. I assume there’s a better approach that avoids this friction. Getting decoded strings from a querystring is a well-worn path. Does anyone have ideas on the best way to do this?

Thanks in advance.

You should change your design if you want your links to be shareable on Facebook.

Facebook changes + in the query to a space, so any URL that needs a literal + in the query, or needs to distinguish between + and space, can’t be shared on Facebook.

P.S. I noticed this earlier this year when I wanted to share a link to a certain graph at ourworldindata.org, and it was impossible. I reported this problem to them and they “fixed” their URLs to be bug-compatible with Facebook. Facebook of course still hasn’t fixed their bug.


I believe this is issue https://github.com/elm/url/issues/32.

That issue is invalid. The URL standard doesn’t say that + is the same as a space, and it’s clear from the standard that they are NOT equivalent.

I’m not convinced that the issue is invalid. While the URL standard does not specify that spaces should be encoded as +, the HTML standard does. It’s certainly not correct to say that it’s “rather non-standard”, as another comment here does. It’s one of only three form encodings (enctypes) that browsers support natively. I think that if Elm targets browsers as its main platform, then it should support +.

If the elm/url package is taking the hard line of "application/x-www-form-urlencoded was a mistake and it stops here", what is the backwards-compatible way of supporting + in form submissions? The usual advice is to replace + with %20 before passing the value to decodeURIComponent, but there doesn’t seem to be an easy way to do that using the elm/url package. Is it necessary to write a whole new URL parsing library?

Regarding whether this is standards-based, the plus is actually totally standard. If you create an HTML form that looks like this:

```html
<form action='/search'>
    <input type='text' name='q' />
    <input type='submit' />
</form>
```

Then hit the submit button: the browser’s default behavior is to encode the space as a plus. If you go to Google or Amazon and search for something (“foo bar”, for instance), you’ll see the plus. Where a space becomes %20 and where it becomes + is not immediately obvious, but I believe that in the path it becomes %20, and in a querystring value it becomes +. It would be great if it were consistent, but it’s not.
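To make the form behavior concrete, here is a rough approximation of how a browser encodes one text-field value for a GET form (application/x-www-form-urlencoded): percent-encode it, then write spaces as `+`. This is a sketch, not a spec-exact encoder; browsers also escape a few characters that `Url.percentEncode` leaves alone (such as `!`, `'`, `(`, `)` and `~`). The module and function names are hypothetical:

```elm
module FormEncode exposing (formEncodeValue)

import Url


{-| Approximate form encoding of a single value. Url.percentEncode turns a
literal + into %2B and a space into %20; every literal % also becomes %25,
so any remaining "%20" can only be an encoded space, which is then written
as + the way a submitted form writes it.
-}
formEncodeValue : String -> String
formEncodeValue value =
    Url.percentEncode value
        |> String.replace "%20" "+"
```

`formEncodeValue "foo bar"` gives `"foo+bar"`, and `formEncodeValue "foo+bar baz"` gives `"foo%2Bbar+baz"`, so a typed plus and a space stay distinguishable.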

@malaire It doesn’t really matter whether the standard addresses it or not; this is the default way all of the major browsers work. You get spaces encoded as + in querystrings.

The HTML standard does not say that. It only says that for application/x-www-form-urlencoded, but query strings can be used for things other than form data.

That is a lie. I just tested both Firefox and Chromium, and both encode a space as %20, not as +, when using a query string without form data.

There’s so much misinformation going on that I tested more browsers.

  • All of these browsers encode space as %20 in query string
  • All of these browsers keep + as + in query string
  • None of these browsers consider space and + to be equivalent

Chrome - Windows 10
Firefox - Windows 10
Edge - Windows 10
Firefox - Linux
Chromium - Linux

There isn’t a single major browser that encodes a space as + in a query string, because that would be wrong.

@malaire Here’s an example of where this happens. These images come from Chrome 81.0.4044.129.

I went to google.com and typed in “foo bar” in the text input for the search string.

Zooming in on the URL, you can see the plus:


Let’s check some other sites:

I’ve been a web developer for as long as the web has been in existence. A plus in a querystring has been there since the very beginning - regardless of what the standard says.

Are you saying that, for instance, if you go to https://news.ycombinator.com/, scroll to the bottom of the page, and enter “foo bar” into the search form that you are taken to the page https://hn.algolia.com/?q=foo%20bar and not the page https://hn.algolia.com/?q=foo+bar?


You clearly don’t understand this at all. When a form is submitted with the built-in <form action="...">, a space is encoded as +.

But that is just one way to use query strings and not the only way. You clearly don’t understand that query strings can also be used for other things which have nothing to do with submitting forms, and in those cases the URL standard says that a space is not equivalent to +. And there isn’t a single browser which encodes a space as a plus in those cases.

Forms can also be submitted without using <form action="...">, and in that case, too, a space is not equivalent to +.

You can safely replace plus with space before decoding, since any pluses in the search itself would be percent-encoded. For example, if your search is foo+bar baz, the query string would be q=foo%2Bbar+baz.
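A minimal sketch of that suggestion for a single form-encoded value: turn every `+` into a space first, then percent-decode, so the `%2B` (a typed plus) is only decoded afterwards. It assumes the value really came from a form submission; the module and function names are hypothetical:

```elm
module FormDecode exposing (formDecodeValue)

import Url


{-| Decode one application/x-www-form-urlencoded value. A raw + can only
mean a space here, because a literal plus was sent as %2B. Returns Nothing
if the value contains a malformed % escape.
-}
formDecodeValue : String -> Maybe String
formDecodeValue raw =
    raw
        |> String.replace "+" " "
        |> Url.percentDecode
```

`formDecodeValue "foo%2Bbar+baz"` gives `Just "foo+bar baz"`. The order matters: decoding before replacing would turn the typed plus into a space as well.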

@malaire We’re talking about the specific case where text is submitted via an HTML form. I honestly don’t care about the other ways a querystring can be used right now. I care about this way.

Anyway, whatever. You win. Clearly we are all idiots and don’t understand this. Thanks for your help and making this community a better place.


Thanks for offering some constructive feedback @Hector. I ended up, basically, doing what you recommended. The only difference is that I replaced + with %20 and then let the built-in decoding take over. It still feels hacky to me, but it works.
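A minimal sketch of that workaround, assuming the query always comes either from a form submission or from `Url.Builder` (both encode a literal plus as `%2B`). Because `Url` is a plain record, the `query` field can be rewritten before it reaches `Url.Parser.parse`, so no new parsing library is needed. `normalizeFormQuery` is a hypothetical helper name:

```elm
module SearchQuery exposing (extractSearchStringFromUrl, normalizeFormQuery)

import Url exposing (Url)
import Url.Parser exposing ((<?>), s)
import Url.Parser.Query as Query


{-| Rewrite form-style + as %20 in the query only (the path is untouched),
so elm/url's percent-decoding turns it into a real space.
-}
normalizeFormQuery : Url -> Url
normalizeFormQuery url =
    { url | query = Maybe.map (String.replace "+" "%20") url.query }


extractSearchStringFromUrl : Url -> Maybe String
extractSearchStringFromUrl url =
    let
        parser =
            s "search" <?> Query.string "q"
    in
    url
        |> normalizeFormQuery
        |> Url.Parser.parse parser
        -- parse gives Maybe (Maybe String); flatten it
        |> Maybe.andThen identity
```

Both `?q=foo+bar` and `?q=foo%20bar` now yield `Just "foo bar"`, and `?q=foo%2Bbar+baz` yields `Just "foo+bar baz"`. The trade-off is that any other parameter that legitimately contains a raw `+` would also be read as a space.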


Your issue has nothing to do with query string encoding; it’s about form data encoding. Yet you’ve been incorrectly claiming that browsers encode a space as a plus in query strings, even adding that lie to the linked issue, which could make Elm developers change URL behavior to be non-standard.

So I couldn’t stay silent when you tried to change Elm to be non-standard using lies.