I need to normalize my strings, mainly to remove diacritics (“combining characters” to be precise).
Reimplementing the sophisticated Unicode normalisation algorithm in Elm seems to be overkill, and would require shipping the Unicode tables, which weigh close to 1 MB.
A natural long-term solution would be adding it to elm/core (core/src/Kernel/string.js), with a fallback for the very rare old browsers which do not implement it.
But in the meantime, for my single project, can I do something similar?
PS: I am new to Elm, just starting to explore it, with a backend TypeScript background.
It ought not to be that unwieldy to achieve something like this through ports. Could you post a code snippet, or an example on Ellie showing how it complicates your code?
With a concrete example, lots of people should be able to give you ideas on how to achieve this with minimum fuss…
I tried to make a terrible hack, overwriting String.prototype.slice, but for some reason it didn’t work. https://ellie-app.com/57JVTZqZf5ga1
EDIT: Working version: https://ellie-app.com/57KjbVPXGrRa1
The sheer audacity of this hack is both deeply disturbing and also strangely beautiful.
I agree with @Jess_Bromley that it should be possible to do this with ports in a way that’s not too bad. The trick is often finding natural identifiers that make it simpler to incorporate the port responses into your application flow.
Can you provide some more context about what the normalization is for? Is it for passing data to a server API or are you using it for local string comparisons in some context?
With some more details, this might make a nice addition to elm-port-examples
The overall context is: a user needs to select a city. The user inputs a pattern string, which is sent to a third-party REST API, which returns JSON with candidates. This remote API fortunately ignores case and diacritics for comparison (i.e. “Clément”, “clement” or “clèment” all hit “Saint-Clément”).
When generating the view, I want to highlight the part that hit in the city name (“Saint-Clément”). The problem is that I need to strip combining characters in the user input and in each city name, just to get the position of the target string and insert markup around the hit.
If I had a String.normalize function, this would just need a simple List.map on the server result with a very small one-line function.
Because the need is during view generation, I think the clean way to go is to make my own small web component. That’s what I will do.
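For anyone following along, here is a minimal TypeScript sketch of the matching step described above, assuming the JS side has String.prototype.normalize available. The names fold and splitOnHit are hypothetical, not part of any library. It folds case and diacritics, finds the pattern, and maps the hit back to positions in the original city name so the markup can be inserted there.

```typescript
// Fold case and strip combining marks: lowercase, decompose to NFD, then
// drop code points in the Combining Diacritical Marks block (U+0300 to U+036F).
function fold(s: string): string {
  return s.toLowerCase().normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Hypothetical helper: split `name` into [before, hit, after] around the first
// place where `pattern` matches, ignoring case and diacritics.
// Returns null when there is no hit (or the folded pattern is empty).
function splitOnHit(name: string, pattern: string): [string, string, string] | null {
  const needle = fold(pattern);
  if (needle.length === 0) return null;

  let folded = "";
  // starts[i] is the index in `name` of the code unit that produced folded[i].
  const starts: number[] = [];
  for (let i = 0; i < name.length; i++) {
    const f = fold(name.charAt(i));
    for (let j = 0; j < f.length; j++) starts.push(i);
    folded += f;
  }

  const at = folded.indexOf(needle);
  if (at === -1) return null;

  const endIdx = at + needle.length;
  const start = starts[at];
  // If the hit runs to the end of the folded text, it runs to the end of `name`.
  const end = endIdx < starts.length ? starts[endIdx] : name.length;
  return [name.slice(0, start), name.slice(start, end), name.slice(end)];
}
```

With this sketch, splitOnHit("Saint-Clément", "clèment") yields ["Saint-", "Clément", ""]. Keeping the index map means the hit still lines up with the original name when the name arrives in decomposed form.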
I have a solution, but I still think String.normalize() should be available in Elm. The polyfill for it (unorm) can be ported to Elm, but it is 143 KB unminified, and probably much less efficient than the implementation in V8.
Maybe it could be a requirement for future compilation and execution targets to provide this standard functionality in a global world. ICU, the underlying open-source C library, is portable and widely available.
I believe you should open an issue or submit a pull request on GitHub:
Thanks for the details, @olivr70. I’m definitely not suggesting that having a Unicode library of some sort in Elm wouldn’t be useful. Sometimes features like that take a bit to roll out in Elm though, as there are lots of long-term things to consider, and thought to put into good API design, and it has to get prioritized alongside other work, etc.
In the meantime, I ask about ports a lot, because I’d like to help folks get the nicest possible experience with the built-in interop until more features are available in core. For what you were trying to do, I think you should only need a single pair of ports (one inbound, one outbound), but you are probably right that a web component would be a nicer solution. I may still try to work up a port example in case other folks are interested.
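As a rough illustration of the single pair of ports mentioned above, the JavaScript side could look something like this TypeScript sketch. The port names normalizeRequest and normalizeResponse, and the shape of the reply, are assumptions made for the example; the Elm side would need matching port declarations. Echoing the original string back gives Elm a natural identifier to match the response to its request.

```typescript
// Shape of the two hypothetical ports as seen from JavaScript:
// an outbound port (Elm -> JS) exposes subscribe, an inbound port (JS -> Elm) exposes send.
interface NormalizePorts {
  normalizeRequest: { subscribe(callback: (input: string) => void): void };
  normalizeResponse: { send(value: { original: string; normalized: string }): void };
}

// Wire the pair together: every string Elm sends out is stripped of
// combining marks and sent back alongside the original.
function wireNormalizePorts(ports: NormalizePorts): void {
  ports.normalizeRequest.subscribe((input) => {
    ports.normalizeResponse.send({
      original: input,
      normalized: input.normalize("NFD").replace(/[\u0300-\u036f]/g, ""),
    });
  });
}
```

After initialising the app, this would be called as wireNormalizePorts(app.ports), assuming the compiled Elm program declares both ports.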
I believe there should be very little overhead in encoding & decoding data with ports. There’s no conversion to JSON strings when passing between JS and Elm. AFAIK, the decoder library is mostly just validating the structure of the incoming JS objects. I don’t know how large your arrays are, but if you saw actual performance issues with the decoding/encoding process, I’d be curious to know more.
Maybe you can work around the problem as well by using a less general solution. For example, maybe your app only deals with French cities? Then you might be able to whip up a little normalize function that deals with most common French characters.
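To make the idea concrete, here is a small sketch of such a French-only function, written in TypeScript for illustration; the same lookup-table approach would translate to an Elm Dict. The table and the name foldFrench are hypothetical, and the list of characters is an assumption about what French city names commonly contain, not a complete treatment.

```typescript
// Hypothetical lookup table covering common French accented letters and ligatures.
const FRENCH_FOLD: Record<string, string> = {
  "à": "a", "â": "a", "ä": "a",
  "ç": "c",
  "é": "e", "è": "e", "ê": "e", "ë": "e",
  "î": "i", "ï": "i",
  "ô": "o", "ö": "o",
  "ù": "u", "û": "u", "ü": "u",
  "ÿ": "y",
  "œ": "oe", "æ": "ae",
};

// Lowercase the input, then replace each known character with its plain form.
// Characters not in the table pass through unchanged.
function foldFrench(s: string): string {
  const lower = s.toLowerCase();
  let out = "";
  for (let i = 0; i < lower.length; i++) {
    const ch = lower.charAt(i);
    const mapped = FRENCH_FOLD[ch];
    out += mapped !== undefined ? mapped : ch;
  }
  return out;
}
```

Note that this only handles precomposed characters, and that the ligatures expand to two letters, so positions in the folded string no longer line up one-to-one with the original when a ligature is present.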
I had a similar experience trying to get a correctly formatted date and time in different languages:
@lydell, you are probably right, it may be simplest for the time being to hack a French-specific cleaning function. I don’t really like it though.
I had a look at elm/core for a PR to expose String.normalize(), but for the time being it requires Elm 0.18 to build. I will wait until it is upgraded to 0.19 to make one.