Url parsing just for path, query, and fragment

#1

I have a requirement to parse and validate a redirection URL. For example, in https://google.com?redirect_to=%2Ftarget%2Fdestination%3Fsource%3Dgoogle it would be the %2Ftarget%2Fdestination%3Fsource%3Dgoogle part (which decodes to /target/destination?source=google). Because that’s a query parameter, a malicious value such as redirect_to=javascript:BOOM! can be put in there, so I need to ensure that doesn’t happen.

I’ve been thinking about this issue for some time, but I couldn’t find a satisfactory solution.

The simplest hack would be to just check whether the parameter value starts with javascript, but I wanted something more robust.

The problem is that, although I have valid routes defined in my Route type and have implemented parsers for all of them, I can’t use those parsers because Elm’s Url.Parser library only parses a full URL. My partial URL lacks a scheme and host, so it doesn’t qualify.

I tried to implement an alternative version of the Url.Parser.parse function that takes three arguments (path, query, and fragment) instead of a single Url value. But I couldn’t fork the Url library because it contains Kernel code, and I didn’t want to fork the compiler just for that.

The next option I looked into was the standalone Parser library, but its Parser type is not compatible with the rest of the functions from the Url library or with the Url.Parser parsers that I’ve defined.

There’s also an option to include origin information in my Route type:

type alias Origin = String

type Route 
    = SignIn Origin
    | Landing Origin
    | MyPage Origin MyPageRoute

type MyPageRoute
    = Account
    | Orders

Unfortunately, this approach is quite inelegant and leads to tons of boilerplate code and unnecessary pattern matching. I’d rather just check for the javascript string than take this approach.

I’m still trying to find a good solution, but I can’t think of one. If anyone has come up with a good way to handle a case like this, I’d love to hear about it!

#2

If you’ve gotten it to a plain String like "/target/destination?source=google", could you prepend a fake scheme and host to the String and then just use Url.Parser as-is?

The fake host String may need to end with a trailing slash.
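
A minimal sketch of that idea, assuming the parameter value has already been percent-decoded. The names Route, routeParser, fakeOrigin, and parseRedirect are hypothetical stand-ins for the real route type and parsers, and the leading-slash check is an extra precaution of this sketch: without it, a value like ".evil.com/x" would be glued onto the fake host rather than treated as a path.

```elm
module RedirectTarget exposing (Route(..), parseRedirect)

import Url
import Url.Parser as Parser exposing ((</>), (<?>), Parser)
import Url.Parser.Query as Query


-- Hypothetical route, standing in for the real Route type and its parsers.
type Route
    = Destination (Maybe String)


routeParser : Parser (Route -> a) a
routeParser =
    Parser.map Destination
        (Parser.s "target" </> Parser.s "destination" <?> Query.string "source")


-- Placeholder origin (an assumption of this sketch). It only exists so that
-- Url.fromString accepts the string; it is never navigated to.
fakeOrigin : String
fakeOrigin =
    "https://placeholder.invalid"


-- Accept only values that start with a single "/" and match a known route.
parseRedirect : String -> Maybe Route
parseRedirect decoded =
    if String.startsWith "/" decoded && not (String.startsWith "//" decoded) then
        Url.fromString (fakeOrigin ++ decoded)
            |> Maybe.andThen (Parser.parse routeParser)

    else
        Nothing
```

With this, parseRedirect "/target/destination?source=google" should give Just (Destination (Just "google")), while parseRedirect "javascript:BOOM!" gives Nothing because the value does not start with a slash. Since the value already begins with "/", the fake origin here needs no trailing slash of its own.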

#3

That’s still hacky, but sounds much better than checking if the string starts with javascript://!
