Is elm handling regex `\p{L}` correctly?

Kochab · June 5, 2019, 1:31am

I’m trying to validate a summoner name for the LoL API. They give the following regex for that:
^[0-9\\p{L} _\\.]+$

Here is my code:

    OnSummonerNameInput name ->
        let
            reg =
                Maybe.withDefault Regex.never <|
                    Regex.fromString "^[0-9\\p{L} _\\.]+$"

            passes =
                Regex.contains reg name
        in
        if passes then
            ( { model | summonerName = name }, Cmd.none )

        else
            ( model, Cmd.none )

It allows spaces, periods, underscores, and numbers, but not letters. On regex101 (see my shared link), the same pattern also allows letters.
Edit: It also allows { }, but shouldn’t.

How might I achieve this same thing with elm/parser?

Herteby · June 5, 2019, 3:05am

You were using regex101 in PHP mode. If you switch it to ECMAScript mode (which is what Elm uses), you’ll see that \p{L} doesn’t work.

There doesn’t seem to be a pattern in ECMAScript regex that matches letters of any language. The elm-community/string-extra package suffers from this too, actually: it uses \w for some capitalization functions, but \w only matches unaccented English letters (plus digits and the underscore), so it doesn’t work as you’d expect.
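A quick way to see the \w limitation is the plain JavaScript sketch below (not Elm, and not taken from string-extra; the names are just for illustration). It assumes a regex without the u or i flags, where \w is shorthand for [A-Za-z0-9_].

```javascript
// \w is shorthand for [A-Za-z0-9_], so accented letters fall outside it.
const wordOnly = /^\w+$/;

const plain = wordOnly.test("Zoe");          // only ASCII letters
const accented = wordOnly.test("Zo\u00EB");  // "Zoë", with U+00EB (e with diaeresis)

console.log(plain);     // true
console.log(accented);  // false
```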

lydell · June 5, 2019, 9:18am

Unicode property escapes were added in ES2018: http://2ality.com/2017/07/regexp-unicode-property-escapes.html

However, only Chrome supports them at the moment (as far as I know).

There’s an extra quirk that makes things confusing. Unicode property escapes are only available if the u regex flag is used.

/\p{L}/ might look like a Unicode property escape, but without the u flag it actually means: /p\{L\}/.
/\p{L}/u gives “SyntaxError: invalid identity escape in regular expression” in Firefox (because it doesn’t support \p yet), but works in Chrome.

(The same applies to RegExp("\\p{L}") vs RegExp("\\p{L}", "u").)
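To make the difference concrete, here is a small sketch in plain JavaScript (not Elm) that builds the pattern from the question with and without the u flag. The variable names are illustrative, and the u-flag half assumes an engine with ES2018 property-escape support, such as a recent Chrome or Node.

```javascript
// Pattern source from the question, written as a JS string literal.
const source = "^[0-9\\p{L} _\\.]+$";

// Without the u flag, \p is just an escaped "p", so the character class
// contains the literal characters p, {, L and } (plus digits, space, _ and .).
const withoutU = new RegExp(source);

// With the u flag, \p{L} is a Unicode property escape meaning "any letter".
// Assumption: the engine supports ES2018 property escapes.
const withU = new RegExp(source, "u");

console.log(withoutU.test("abc"));   // false: a, b, c are not in the class
console.log(withoutU.test("p{L}"));  // true: every character is in the class
console.log(withU.test("abc"));      // true: all letters
console.log(withU.test("p{L}"));     // false: { and } are not letters
```

This lines up with the behaviour reported in the question: without the flag, braces are accepted and ordinary letters are rejected.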

But even in Chrome \p{L} won’t work in Elm, because as far as I know Elm’s regexes never use the u flag.
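One possible workaround that does not depend on the u flag is to spell out code point ranges by hand. The sketch below is plain JavaScript, and the range is an assumption chosen for illustration: it covers only Latin-script letters, so it is a rough approximation of \p{L} and not an equivalent. The pattern would still need adapting to Elm’s string escaping, which is not shown here.

```javascript
// Hypothetical approximation of \p{L} for Latin-script names only.
// \u00C0-\u024F spans the Latin-1 Supplement letters through Latin Extended-B;
// it also includes the two symbols U+00D7 and U+00F7, which are not letters.
const latinName = /^[0-9A-Za-z\u00C0-\u024F _.]+$/;

const accepted = latinName.test("\u0141ukasz_1");  // "Łukasz_1", U+0141 is in range
const rejected = latinName.test("name{}");         // braces are not in the class

console.log(accepted);  // true
console.log(rejected);  // false
```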

