Hello all! I’ve just released a new library: BrianHicks/elm-string-graphemes. It does everything String does, except it operates on graphemes instead of bytes or characters. Observe:
import String.Graphemes
String.toList "🦸🏽♂️" --> [ '🦸', '🏽', '\u{200D}', '♂', '\u{FE0F}' ]
String.Graphemes.toList "🦸🏽♂️" --> [ "🦸🏽♂️" ]
Check it out at https://package.elm-lang.org/packages/BrianHicks/elm-string-graphemes/latest/. In particular, if you haven’t worked much with the different levels of text, the README includes a primer on why this library is necessary. For example, the emoji above is one grapheme, but five characters and 17 bytes. If that doesn’t make sense yet, go read it!
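If you want to see those three levels side by side, here is a minimal sketch. It assumes you have elm/bytes installed alongside this package, and the module name `GraphemeCounts` and the `superhero` and `counts` values are made up for illustration. It only uses `String.Graphemes.toList` from the example above, plus `String.toList` from elm/core and `Bytes.Encode.getStringWidth` from elm/bytes. The emoji is written with escapes so the zero-width joiner and variation selector don't get lost in copy and paste.

```elm
module GraphemeCounts exposing (counts)

import Bytes.Encode
import String.Graphemes


{-| The emoji from the example above, spelled out code point by code point:
superhero, medium skin tone, zero-width joiner, male sign, variation selector.
-}
superhero : String
superhero =
    "\u{1F9B8}\u{1F3FD}\u{200D}\u{2642}\u{FE0F}"


{-| The same string measured at three levels of text. -}
counts : { graphemes : Int, characters : Int, utf8Bytes : Int }
counts =
    { -- one user-perceived character, per the toList example above
      graphemes = List.length (String.Graphemes.toList superhero)

    -- five Char values, one per Unicode code point
    , characters = List.length (String.toList superhero)

    -- 4 + 4 + 3 + 3 + 3 = 17 bytes when encoded as UTF-8
    , utf8Bytes = Bytes.Encode.getStringWidth superhero
    }
```

Evaluating `counts` in `elm repl` should show the grapheme count next to the character and byte counts for the same string.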
If you find any issues with the grapheme segmentation (e.g. where it breaks improperly), please open an issue! I would also love it if we could get the parser to go even faster. I already took it from 0.1% of String.toList performance to between 1% and 2%, but can we get higher? Probably!