Access to JavaScript String.normalize()

#1

Hello everyone,

I need to normalize my strings, mainly to remove diacritics (“combining characters” to be precise).

I have tried the “elm-string-normalize” package, but it does not conform to the Unicode standard. I would just like to call the pure JavaScript string method String.prototype.normalize(), part of ES6.
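
For readers who have not used it, here is a minimal sketch of the plain JavaScript call in question. The helper name `stripDiacritics` is hypothetical, and the range U+0300 to U+036F only covers the basic Combining Diacritical Marks block, so this is an illustration rather than a complete Unicode solution.

```javascript
// Decompose each character into base letter + combining marks (NFD),
// then remove the marks in the Combining Diacritical Marks block.
function stripDiacritics(text) {
  return text.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(stripDiacritics("Saint-Clément")); // Saint-Clement
```

The function has no side effects and always returns the same output for the same input, which is the sense of "pure" used in this thread.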

I looked at using ports, but they introduce considerable overhead and complexity (breaking the code flow) for a one-line invocation, given that Elm compiles to JavaScript.

And reimplementing the very sophisticated Unicode normalization code in Elm seems to be overkill, and would require shipping the Unicode tables, which weigh close to 1 MB.

I have read several conversations and I perfectly understand that isolating side effects is a fundamental and major feature of Elm, but some Javascript functions are pure.

What is the way to go to add an Elm binding to a pure JavaScript function?

A natural long-term solution would be adding it to elm/core (core/src/Kernel/string.js) with a fallback for the very rare old browsers which do not implement it.
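
The kind of safety check meant here could be as small as a feature test. This is a sketch in plain JavaScript, not actual elm/core kernel code; `safeNormalize` is a hypothetical name, and returning the input unchanged on old browsers is only one possible fallback policy.

```javascript
// Use the native implementation when the engine provides it;
// otherwise hand the string back untouched (assumed fallback policy).
function safeNormalize(text, form) {
  if (typeof String.prototype.normalize === "function") {
    return text.normalize(form);
  }
  return text;
}

console.log(safeNormalize("e\u0301", "NFC").length); // 1 on engines with normalize()
```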

But for the time being, for my own project, can I do something similar?

Thanks

PS: I am new to Elm, just starting to explore it, with a backend TypeScript background.

#2

Some Javascript functions are pure, but Elm has no way of knowing which ones are. This is why the only safe way to interact with Javascript is through a port.

It ought not to be that unwieldy to achieve something like this through ports. Could you post a code snippet, or an example on Ellie showing how it complicates your code?

With a concrete example, lots of people should be able to give you ideas on how to achieve this with minimum fuss…

#3

I tried to make a terrible hack, overwriting String.prototype.slice, but for some reason it didn’t work. https://ellie-app.com/57JVTZqZf5ga1

EDIT: Working version: https://ellie-app.com/57KjbVPXGrRa1

#4

The sheer audacity of this hack is both deeply disturbing and also strangely beautiful.

#5

I agree with @Jess_Bromley that it should be possible to do this with ports in a way that’s not too bad. The trick is often finding natural identifiers that make it simpler to incorporate the port responses into your application flow.

Can you provide some more context about what the normalization is for? Is it for passing data to a server API or are you using it for local string comparisons in some context?

With some more details, this might make a nice addition to elm-port-examples

#6

Thanks everyone.

The overall context is: a user needs to select a city. The user types a pattern string, which is sent to a third-party REST API, which returns JSON with candidates. This remote API fortunately ignores case and diacritics for comparison (i.e. “Clément”, “clement” or “clèment” all hit “Saint-Clément”).

When generating the view, I want to highlight the part of the city name that matched (“Saint-Clément”). The problem is that I need to strip combining characters from the user input and from each city name, just to get the position of the target string and insert markup around the hit.

If I had a String.normalize function, this would just need a simple List.map on the server result with a very small one-line function.

Because the need arises during view generation, I think the clean way to go is to make my own small web component. That’s what I will do.
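
To make the highlighting step concrete, here is a sketch in plain JavaScript of locating the hit inside the original city name. `foldForSearch` and `splitOnMatch` are hypothetical names, the folding rule (NFD, drop U+0300 to U+036F, lower-case) is an assumption about how the remote API compares strings, and a match that ends in the middle of a character that folds to several letters is not handled.

```javascript
// Fold one character: decompose, drop combining marks, lower-case.
function foldForSearch(ch) {
  return ch.normalize("NFD").replace(/[\u0300-\u036f]/g, "").toLowerCase();
}

// Returns { before, match, after } slices of `name`, or null if no hit.
function splitOnMatch(name, pattern) {
  const foldedPattern = Array.from(pattern).map(foldForSearch).join("");
  if (foldedPattern === "") return null;

  let folded = "";
  const origin = []; // origin[i] = index in `name` that produced folded[i]
  let offset = 0;
  for (const ch of name) {
    const piece = foldForSearch(ch);
    for (let i = 0; i < piece.length; i++) origin.push(offset);
    folded += piece;
    offset += ch.length;
  }

  const at = folded.indexOf(foldedPattern);
  if (at === -1) return null;

  // Map the folded positions back to positions in the original string.
  const start = origin[at];
  const endAt = at + foldedPattern.length;
  const end = endAt < origin.length ? origin[endAt] : name.length;
  return {
    before: name.slice(0, start),
    match: name.slice(start, end),
    after: name.slice(end),
  };
}

console.log(splitOnMatch("Saint-Clément", "clement"));
// { before: 'Saint-', match: 'Clément', after: '' }
```

Keeping an index map rather than searching the stripped string directly means the returned slices come from the original name, accents included, so the markup can wrap the text the user actually sees.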

OK, it can be done using ports, but I have to create 4 ports, encode and decode a large JSON array twice along the way (the results), add 2 properties to my model, and add 2 JavaScript functions which I now have to manage and keep in sync with my Elm code.

I have a solution, but I still think String.normalize() should be available in Elm. The polyfill for it (unorm) can be ported to Elm, but it is 143 KB unminified, and probably much less efficient than the implementation in V8.

Maybe it could be a requirement for future compilation and execution targets to provide this standard functionality in a globalized world. ICU, the underlying open-source C library, is portable and widely available.

Olivier

#7

I believe you should open an issue or submit a pull request on github:

#8

Thanks for the details, @olivr70. I’m definitely not suggesting that having a Unicode library of some sort in Elm wouldn’t be useful. Sometimes features like that take a bit to roll out in Elm though, as there are lots of long-term things to consider, and thought to put into good API design, and it has to get prioritized alongside other work, etc.

In the meantime, I ask about ports a lot, because I’d like to help folks get the nicest possible experience with the built-in interop until more features are available in core. For what you were trying to do, I think you should only need a single pair of ports (one inbound, one outbound), but you are probably right that a web component would be a nicer solution. I may still try to work up a port example in case other folks are interested.

I believe there should be very little overhead in encoding & decoding data with ports. There’s no conversion to JSON strings when passing between JS and Elm. AFAIK, the decoder library is mostly just validating the structure of the incoming JS objects. I don’t know how large your arrays are, but if you saw actual performance issues with the decoding/encoding process, I’d be curious to know more.
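
As a rough sketch of what the JavaScript half of a single port pair might look like: `app.ports.<name>.subscribe` and `app.ports.<name>.send` are the standard port API, but the port names `requestStripped` and `receiveStripped`, the helper `wireStripPorts`, and the shape of the payload are all assumptions made for illustration. The Elm side would need two matching `port` declarations, which are not shown.

```javascript
// `app` is the object returned by Elm.Main.init(...).
// Elm sends a list of city names out; JS sends back each name
// paired with its accent-stripped form. Values cross the boundary
// as ordinary JS arrays and objects, not as JSON strings.
function wireStripPorts(app) {
  app.ports.requestStripped.subscribe(function (names) {
    const results = names.map(function (name) {
      return {
        original: name,
        stripped: name.normalize("NFD").replace(/[\u0300-\u036f]/g, ""),
      };
    });
    app.ports.receiveStripped.send(results);
  });
}
```

Sending the original name back alongside the stripped one gives the Elm update function a natural identifier for matching responses to the candidates already in the model.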

#9

Maybe you can work around the problem as well by using a less general solution. For example, maybe your app only deals with French cities? Then you might be able to whip up a little normalize function that deals with most common French characters.
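
A sketch of what such a less general function could look like, written in JavaScript only to show the shape of the idea. The name `foldFrench` and the table `FRENCH_FOLDS` are hypothetical, and the table is an assumed list of common French letters, not an exhaustive one.

```javascript
// Lookup table for common French accented letters and ligatures.
const FRENCH_FOLDS = new Map([
  ["à", "a"], ["â", "a"], ["ä", "a"],
  ["ç", "c"],
  ["é", "e"], ["è", "e"], ["ê", "e"], ["ë", "e"],
  ["î", "i"], ["ï", "i"],
  ["ô", "o"], ["ö", "o"],
  ["ù", "u"], ["û", "u"], ["ü", "u"],
  ["ÿ", "y"],
  ["œ", "oe"], ["æ", "ae"],
]);

// Lower-case the text, then replace each listed letter by its plain form.
function foldFrench(text) {
  return Array.from(text.toLowerCase())
    .map(function (ch) { return FRENCH_FOLDS.get(ch) || ch; })
    .join("");
}

console.log(foldFrench("Saint-Clément")); // saint-clement
```

Because it never calls String.prototype.normalize(), the same table-driven approach can be written directly in Elm without ports. Note that the ligature entries change the string length, which matters if positions in the folded string are later used to index the original.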

#10

Hello,

I had a similar experience trying to get a correct formatted date and time in different languages:
https://discourse.elm-lang.org/t/module-for-formatting-time-in-different-timezones-and-languages/2547/4

Since direct integration of JavaScript is no longer “allowed”/possible as of Elm 0.19, I wish there were a way to use all pure JavaScript functions for building better libraries, especially for supporting multiple locales and languages.

#11

@lydell, you are probably right: it may be simplest for the time being to hack together a French-specific cleaning function. I don’t really like it though.

I had a look at elm/core for a PR to expose String.normalize(), but for the time being it requires Elm 0.18 to build. I will wait until it is upgraded to 0.19 to make one.

Thanks everyone

#12

I added this very request to String.Extra: https://github.com/elm-community/string-extra/issues/39
