Is Elm handling regex `\p{L}` correctly?

I'm trying to validate a summoner name for the LoL API. They give the following regex for that:
^[0-9\\p{L} _\\.]+$

Here is my code:

    OnSummonerNameInput name ->
        let
            reg =
                Maybe.withDefault Regex.never <|
                    Regex.fromString "^[0-9\\p{L} _\\.]+$"

            passes =
                Regex.contains reg name
        in
        if passes then
            ( { model | summonerName = name }, Cmd.none )

        else
            ( model, Cmd.none )

It allows spaces, periods, underscores, and numbers, but not letters. If you look at my shared regex101 link, letters are also allowed.
Edit: It also allows { }, but shouldn’t.

How might I achieve this same thing with elm/parser?

You were using regex101 in PHP mode. If you switch it to ECMAScript mode (which is what Elm uses), you’ll see that \p{L} doesn’t work.

There doesn’t seem to be a pattern that matches letters of any language in ECMAScript regex. The elm-community/string-extra package actually suffers from this too: it uses \w for some capitalization functions, but \w only matches unaccented English letters (plus digits and underscore), so it doesn’t work as you’d expect.
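To make the \w limitation concrete, here is a small sketch in plain JavaScript (not Elm). The name `wordOnly` is made up for the illustration.

```javascript
// \w is shorthand for [A-Za-z0-9_], so accented letters fall outside it.
const wordOnly = /^\w+$/;

console.log(wordOnly.test("hello")); // true
console.log(wordOnly.test("héllo")); // false
console.log(wordOnly.test("a_1"));   // true
```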


Unicode property escapes were added in ES2018.

However, only Chrome supports them at the moment (as far as I know).

There’s an extra quirk that makes things confusing. Unicode property escapes are only available if the u regex flag is used.

/\p{L}/ might look like a Unicode property escape, but without the u flag it actually means: /p\{L\}/.
/\p{L}/u gives “SyntaxError: invalid identity escape in regular expression” in Firefox (because it doesn’t support \p yet), but works in Chrome.

(The same thing applies to RegExp("\\p{L}") vs RegExp("\\p{L}", "u")).
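As a rough illustration of the difference, here is the regex from the question built both ways in plain JavaScript. This assumes an engine that supports ES2018 property escapes; the names `withoutU` and `withU` are made up for the sketch.

```javascript
// Without the u flag, \p is just an escaped "p", so the class matches
// the literal characters p, {, L and } (plus digits, space, _ and .).
const withoutU = new RegExp("^[0-9\\p{L} _\\.]+$");

// With the u flag, \p{L} is a Unicode property escape for "any letter".
// Engines without ES2018 support throw a SyntaxError on this line.
const withU = new RegExp("^[0-9\\p{L} _\\.]+$", "u");

console.log(withoutU.test("abc"));  // false
console.log(withoutU.test("{ }"));  // true
console.log(withU.test("abc"));     // true
console.log(withU.test("Ærø 42"));  // true
console.log(withU.test("{ }"));     // false
```

This matches what the question describes: without the flag, letters are rejected while { and } slip through.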

But even in Chrome \p{L} won’t work in Elm, because as far as I know Elm’s regexes never use the u flag.
