I have been thinking about the String type in Elm, specifically how strings are encoded.
I do not think Elm wants to keep using the UCS-2 encoding, so in the medium to long term (before Elm 1.0 commits to backwards compatibility) we should look to change it.
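To make the concern concrete, here is a minimal sketch of what 16-bit code units do to a character outside the Basic Multilingual Plane. Python is used only as a neutral illustration, and `utf16_code_units` is a hypothetical helper, not anything from Elm.

```python
def utf16_code_units(text):
    """Return the 16-bit code units that UTF-16 uses to store `text`."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


emoji = "\U0001F600"  # a single code point above U+FFFF
units = utf16_code_units(emoji)

print(len(emoji))               # 1 code point
print(len(units))               # 2 code units (a surrogate pair)
print([hex(u) for u in units])  # ['0xd83d', '0xde00']
```

A string API that indexes or slices by code unit can cut that pair in half, leaving a lone surrogate that is not a valid character on its own.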
Here are some references on what other languages do:
|Language||char type||string type||notes|
|Rust||Unicode scalar value||Array of bytes that must be valid UTF-8||There are extra string types that provide compatibility with C and with the OS|
|C||You get alphanumeric characters and some symbols||Sequence of bytes that must end with a NUL byte (\0)||I think strings are normally UTF-8 encoded; C just happens to be older than Unicode|
|Haskell||Unicode code point||
|Go||Either a byte or a rune, which is a Unicode code point||An array of bytes, can be converted to an array of runes (i.e. code points)||See https://blog.golang.org/strings and https://golangbot.com/strings/|
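The "array of bytes plus a code point view" model in the Rust and Go rows can be sketched as follows. This is a Python illustration only: `is_valid_utf8` is a hypothetical helper standing in for the validity check that the Rust row describes.

```python
def is_valid_utf8(data):
    """Return True if `data` is a well-formed UTF-8 byte sequence."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "héllo"
as_bytes = text.encode("utf-8")            # storage: a sequence of bytes
as_code_points = [ord(ch) for ch in text]  # view: one integer per code point

print(len(as_bytes))                # 6, because "é" takes two bytes
print(len(as_code_points))          # 5
print(is_valid_utf8(as_bytes))      # True
print(is_valid_utf8(as_bytes[:2]))  # False, the slice cuts "é" in half
```

Indexing by byte is cheap but can land in the middle of a character, which is why these languages offer a separate code point view.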
- You have to use UCS-2 if you want to interact with the filesystem on Windows. See https://simonsapin.github.io/wtf-8/#motivation
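The Windows issue comes down to file names being sequences of 16-bit units that may contain unpaired surrogates. Here is a rough sketch of why that clashes with strict UTF-8. The file name and both helper functions are made up for illustration. Python's `surrogatepass` error handler is used as a stand-in that resembles the WTF-8 approach, not as an implementation of it.

```python
def decode_windows_name(units):
    """Decode 16-bit units from a hypothetical file name, keeping lone surrogates."""
    raw = b"".join(u.to_bytes(2, "little") for u in units)
    return raw.decode("utf-16-le", "surrogatepass")


def is_strict_utf8_encodable(text):
    """Return True if `text` can be encoded as well-formed UTF-8."""
    try:
        text.encode("utf-8")
        return True
    except UnicodeEncodeError:
        return False


# "a", an unpaired high surrogate, then "b"
name = decode_windows_name([0x0061, 0xD800, 0x0062])

print(is_strict_utf8_encodable(name))         # False
print(name.encode("utf-8", "surrogatepass"))  # b'a\xed\xa0\x80b'
```

A string type that only accepts valid UTF-8 cannot hold such a name without losing information, so it needs either a separate OS string type or a relaxed encoding.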