Comparing UTF-8 strings by transliterating them

Hello,

I want to build a fuzzy searcher that allows accents and other special characters in its search string.

So, for example, if I have a string like

"Dès Noël, où un zéphyr haï me vêt de glaçons würmiens, je dîne d’exquis rôtis de bœuf au kir, à l’aÿ d’âge mûr, &cætera"

I’d like it to be converted to

"Des Noel, ou un zephyr hai me vet de glacons wurmiens, je dine d'exquis rotis de boeuf au kir, a l'ay d'age mur, &caetera"

There are a lot of edge cases that I can’t think of beforehand (’ → ', ÿ → y, etc.), and some of them even need to be translated into several characters (æ → ae)!

So my question is: is there a way to avoid doing this conversion manually in Elm? Is there a standard utility that I missed, or a third-party library? Or should I use ports to JS (I’d rather not at this point)?

Thank you for your time and your answers :grinning:

You can give String.Normalize (elm-string-normalize 1.0.2) a try.

I found 2 additional alternatives:
https://package.elm-lang.org/packages/Fresheyeball/deburr/latest/
https://package.elm-lang.org/packages/elm-community/string-extra/latest/String-Extra#removeAccents
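As a rough illustration of how one of these packages could back an accent-insensitive search, here is a minimal sketch built on `String.Normalize.removeDiacritics` from kuon/elm-string-normalize. The module name `SearchMatch` and the functions `normalizeForSearch` and `matches` are hypothetical, and this is plain case- and accent-insensitive substring matching, not a ranked fuzzy matcher.

```elm
module SearchMatch exposing (matches, normalizeForSearch)

-- Assumes the kuon/elm-string-normalize package is installed.
-- SearchMatch, normalizeForSearch and matches are hypothetical names for this sketch.

import String.Normalize


{-| Strip diacritics, then lowercase, so "Noël" and "noel" compare equal.
-}
normalizeForSearch : String -> String
normalizeForSearch text =
    text
        |> String.Normalize.removeDiacritics
        |> String.toLower


{-| True when the normalized query occurs somewhere in the normalized candidate.
This is substring matching only; it does not score or rank results.
-}
matches : String -> String -> Bool
matches query candidate =
    String.contains (normalizeForSearch query) (normalizeForSearch candidate)
```

Normalizing both sides with the same function is the key design choice: the user can type "noel" or "Noël" and both should hit "Dès Noël".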

Wow! Thank you, this is exactly what I am looking for!

I think “normalize” is the keyword I should’ve searched for before asking this, but I didn’t think of it :sweat_smile:

For the record, here’s the output of each of these libraries on my example string:

TL;DR:

  • deburr and elm-string-normalize behave identically on this input
  • string-extra leaves œ unchanged and only partly handles æ, turning it into a single a
  • none of them convert ’ into ', but that’s okay I guess
  • I have no idea how these solutions compare performance-wise
> import String.Deburr as String -- Fresheyeball/deburr (provides deburr)
> import String.Normalize as String -- kuon/elm-string-normalize (provides removeDiacritics)
> import String.Extra as String -- elm-community/string-extra (provides removeAccents)
> sample = "Dès Noël, où un zéphyr haï me vêt de glaçons würmiens, je dîne d’exquis rôtis de bœuf au kir, à l’aÿ d’âge mûr, &cætera"
"Dès Noël, où un zéphyr haï me vêt de glaçons würmiens, je dîne d’exquis rôtis de bœuf au kir, à l’aÿ d’âge mûr, &cætera"
    : String
> String.deburr sample
"Des Noel, ou un zephyr hai me vet de glacons wurmiens, je dine d’exquis rotis de boeuf au kir, a l’ay d’age mur, &caetera"
    : String
> String.removeDiacritics sample
"Des Noel, ou un zephyr hai me vet de glacons wurmiens, je dine d’exquis rotis de boeuf au kir, a l’ay d’age mur, &caetera"
    : String
> String.removeAccents sample
"Des Noel, ou un zephyr hai me vet de glacons wurmiens, je dine d’exquis rotis de bœuf au kir, a l’ay d’age mur, &catera"
    : String
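Since none of the packages above turn ’ into ', and string-extra’s removeAccents leaves œ alone, one option is a small hand-maintained table applied before the library call. The sketch below uses only elm/core; the module name `SpecialCases`, the function `replaceSpecialCases` and the table contents are assumptions, meant to be extended as new edge cases show up.

```elm
module SpecialCases exposing (replaceSpecialCases)

import Dict exposing (Dict)


{-| Hypothetical hand-maintained table for characters that are not a
plain "letter + accent": ligatures and typographic quotes.
-}
specialCases : Dict Char String
specialCases =
    Dict.fromList
        [ ( 'œ', "oe" )
        , ( 'Œ', "OE" )
        , ( 'æ', "ae" )
        , ( 'Æ', "AE" )
        , ( '’', "'" )
        , ( '‘', "'" )
        ]


{-| Replace every character found in the table and keep all others unchanged.
Intended to run before a diacritics-removal function such as removeAccents.
-}
replaceSpecialCases : String -> String
replaceSpecialCases text =
    text
        |> String.toList
        |> List.map (\c -> Dict.get c specialCases |> Maybe.withDefault (String.fromChar c))
        |> String.concat
```

Composed as `replaceSpecialCases >> String.Extra.removeAccents`, the table should handle the ligatures and apostrophes before the library strips the remaining accents.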
