I’ve recently published elm-unicode. It is basically a Unicode version of the Char.isXYZ functions. I’m sorry the documentation is a bit sparse but… I really didn’t know what more to say.
This solves the problem that Char.isLower 'α' and Char.isLower 'è' both return False.
It is automatically generated from the latest Unicode standard, which at the time of writing is Unicode 13.
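To make the difference concrete, here is a minimal sketch. It assumes the package exposes a Unicode module whose isLower has the same Char -> Bool shape as Char.isLower; check the package docs for the exact module and function names. The Example module and its two values are made up for illustration.

```elm
module Example exposing (asciiOnly, unicodeAware)

import Unicode


-- Char.isLower in elm/core only checks the ASCII range a-z,
-- so both of these are False.
asciiOnly : List Bool
asciiOnly =
    [ Char.isLower 'α', Char.isLower 'è' ]


-- Assumption: Unicode.isLower : Char -> Bool, mirroring Char.isLower
-- but driven by the Unicode general category of the character.
unicodeAware : List Bool
unicodeAware =
    [ Unicode.isLower 'α', Unicode.isLower 'è' ]
```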
Nice! Do you plan on adding more features or just keeping it a Unicode-aware version of Char?
I’m working on an app in which I will need to be able to get the Unicode name of a specific codepoint for instance. Some more ideas that would potentially be useful:
Unicode normalization
plane and block names
HTML entity representations
EDIT: In any case, thank you for this example of code generation from the Unicode standard, I might use that as a guide for the use case I mentioned!
I’m open to adding new features, but only if they are “Char-level”. For graphemes and the like there should be other libraries imo.
Your three ideas are interesting and, respectively: very hard and probably out of scope; feasible and in scope (open an issue on the repo); and trivial (just use Char.toCode, elm-hex and some string concatenation) and probably out of scope.
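For the third one, a rough sketch of what I mean, assuming rtfeldman/elm-hex for the Hex module. The HtmlEntity module and the toHtmlEntity name are made up for illustration, and this only produces numeric entities, not named ones like &amp;.

```elm
module HtmlEntity exposing (toHtmlEntity)

import Hex


-- Build a hexadecimal numeric character reference for a single Char:
-- "&#x", then the code point in hex, then ";".
toHtmlEntity : Char -> String
toHtmlEntity char =
    "&#x" ++ Hex.toString (Char.toCode char) ++ ";"
```

Hex.toString gives lowercase digits, so toHtmlEntity 'è' should come out as "&#xe8;".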
That’s fair!
I’m realizing as well that the normalization stuff would be at String level, since it’s about turning two characters into one, among other things.
What about getting the Unicode name from the character? I’m guessing it’s not useful to a lot of people, and it might lead to a big file size?
It would lead to a big file size, yes, although dead code elimination would strip it out for people who don’t need it. At the same time, I think it’s probably an uncommon need, and it shouldn’t be hard for people who do need it to just vendor my code.