State of Localization (l10n) and v0.19


#1

At a previous job, we used elm-intl, which uses Native/Kernel code to wrap the browser's window.Intl object. Once configured, it allows pure functions for formatting numbers, currencies and dates, collation (sorting), etc. This seems to have been a feature that was discussed but never resolved. For a myriad of reasons it makes sense to use the built-in browser feature. However, in v0.19, Native/Kernel code is effectively removed, and ports are too cumbersome for round-tripping this sort of text transformation for presentation.

Soon I’ll be doing some work that requires multiple languages. To me, “A delightful language for reliable webapps” would mean a good, well-supported story for internationalization and localization in production. Will there be some support for this? I would love to see something like a Browser.applicationWithI18n in Elm v0.19 that could have a locale passed in … and maybe some “magic” functions accessible in view code to do a -> String.


#2

We also use a native package for Intl, so we will have to find another solution when 0.19 is out.

One way to solve this without a native package is to use a custom element.

You could then create a custom HTML tag

<number-intl number="3.14159"></number-intl>

which would use the native Intl functions.
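A minimal sketch of the Elm side of this approach, assuming a `number-intl` element has already been registered in JavaScript (for example with `customElements.define`) and that it formats its `number` attribute with `Intl.NumberFormat`; that JavaScript part is not shown. The `locale` attribute and the `numberIntlAttributes` helper are hypothetical additions for illustration.

```elm
module NumberIntl exposing (numberIntl, numberIntlAttributes)

import Html exposing (Html)
import Html.Attributes exposing (attribute)


{-| Attribute name/value pairs for the hypothetical <number-intl> element.
Kept as plain data so the mapping can be checked without rendering.
-}
numberIntlAttributes : String -> Float -> List ( String, String )
numberIntlAttributes locale amount =
    [ ( "locale", locale )
    , ( "number", String.fromFloat amount )
    ]


{-| Render the custom element. Elm only emits the tag and its attributes;
the element's JavaScript implementation does the actual Intl formatting.
-}
numberIntl : String -> Float -> Html msg
numberIntl locale amount =
    Html.node "number-intl"
        (List.map (\( key, val ) -> attribute key val)
            (numberIntlAttributes locale amount)
        )
        []
```

Since the formatting happens in the DOM, the formatted text never comes back to Elm as a String, so it cannot be used for things like sorting. That is the main trade-off compared with a synchronous wrapper.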

Having native support would of course be the best solution.


#3

Not sure if it’s any use to you (it’s certainly not a replacement for Intl), but a while ago I wrote this simple package for managing multiple languages in an app, and I now use it with great success in multiple products.

http://package.elm-lang.org/packages/ccapndave/elm-translator/latest/Translator


#4

I’m currently working on an implementation of https://projectfluent.org/ for Elm, and I’m hoping that it will become a de facto standard for Elm i18n. It will work as a compiler that translates .ftl files into .elm, and will address many of the deficiencies of existing Elm i18n solutions (such as support for plural forms, other variants, runtime switching of locale within a single app, embedded HTML…). This will hopefully help a lot with multiple languages and the translation side of things. If you would like to help me out on this, I’d appreciate it!

However, it will also require Intl.NumberFormat and Intl.DateTimeFormat wrappers to correctly implement number and date formatting, which is why I started the discussion you linked. I may be able to implement Intl.PluralRules myself as pure Elm code, by generating the code from CLDR definitions as part of the compilation process I am writing.
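To give an idea of what code generated from CLDR definitions could look like, here is a hand-written sketch of the Russian cardinal plural rule in pure Elm. The module name, the `PluralCategory` type and the `{ i, v }` operand record are assumptions; `i` (absolute integer part) and `v` (number of visible fraction digits) follow the operand names used in CLDR plural rules.

```elm
module Plural.Ru exposing (PluralCategory(..), cardinal)


type PluralCategory
    = One
    | Few
    | Many
    | Other


{-| Russian cardinal plural rule, transcribed from CLDR:

  - one: v = 0 and i % 10 = 1 and i % 100 != 11
  - few: v = 0 and i % 10 = 2..4 and i % 100 != 12..14
  - many: v = 0 and (i % 10 = 0 or i % 10 = 5..9 or i % 100 = 11..14)
  - other: everything else, e.g. any number with visible fraction digits

`i` must be the absolute integer part, as in CLDR (modBy on a negative
number would give the wrong remainder here).

-}
cardinal : { i : Int, v : Int } -> PluralCategory
cardinal { i, v } =
    let
        mod10 =
            modBy 10 i

        mod100 =
            modBy 100 i

        inRange lo hi n =
            n >= lo && n <= hi
    in
    if v /= 0 then
        Other

    else if mod10 == 1 && mod100 /= 11 then
        One

    else if inRange 2 4 mod10 && not (inRange 12 14 mod100) then
        Few

    else
        -- With v = 0, every integer not matched above falls into "many".
        Many
```

A generator would emit one such function per language. The rule itself is plain integer arithmetic, so this part would not need Intl at all.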

For myself, I would prefer locale objects to be handled explicitly rather than passed in: very often you don’t want a magic one that relies on what the browser thinks the locale is; you want the user to be able to choose.


#5

We had some conversations about this last week within the team. Luckily for us, multi-language support is not a business requirement yet, as it will be a bit tricky in our case and is not that high on the priority list. However, we’re already making some key decisions, which made us think about what it would mean to add translations later. Better safe than sorry. We’ve come to conclusions similar to those described here.

  • Ports would make it asynchronous: there would be times when we don’t have the correct strings, and handling that would be heavy. On the other hand, it would be straightforward to use native support.

  • Native code, or rather kernel code as it’s now called, might seem like a straightforward way, but it won’t have long-term support.

  • Web components might be a solution, but wrapping text in elements just for translations seems heavy as well. And of course, this would limit usage to functions that return Html.

    • In this case, it might be better to define your own data type, like Translation, which would be easier to work with than opaque HTML and could possibly be used the same way outside of view code. You can then easily write a function that turns this type into Html Never at render time (turning it into a web-component node).
  • Something probably pretty close to what @spookylukey is describing, which would mean a bit more reimplementation in Elm. This might be the best solution, but also the most ambitious one. I was thinking about spiking a proof of concept, but it seems that’s not necessary since it’s being worked on!
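A rough sketch of the custom data type idea from the list above. The type (called `Localized` here), its constructors and the `number-intl` element are all hypothetical. Numbers are rendered as a custom element so the browser's Intl API can format them, while the value itself stays plain data that can also be inspected outside of view code.

```elm
module Localized exposing (Localized(..), toFallbackString, toHtml)

import Html exposing (Html)
import Html.Attributes exposing (attribute)


{-| Hypothetical data type for translatable content: plain data rather
than opaque Html.
-}
type Localized
    = Text String
    | FormattedNumber Float
    | Concat (List Localized)


{-| Render for the view. Numbers become a (hypothetical) <number-intl>
custom element, so the browser's Intl API does the formatting.
-}
toHtml : Localized -> Html Never
toHtml localized =
    case localized of
        Text text ->
            Html.text text

        FormattedNumber n ->
            Html.node "number-intl" [ attribute "number" (String.fromFloat n) ] []

        Concat parts ->
            Html.span [] (List.map toHtml parts)


{-| Unformatted fallback for places where no element can be rendered.
Numbers use String.fromFloat, not locale-aware formatting.
-}
toFallbackString : Localized -> String
toFallbackString localized =
    case localized of
        Text text ->
            text

        FormattedNumber n ->
            String.fromFloat n

        Concat parts ->
            String.concat (List.map toFallbackString parts)
```

In a view it can be embedded with `Html.map never (toHtml content)`, since `never : Never -> a` from `Basics` lets the `Html Never` fit any message type.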

My conclusion is that the last approach would be ideal but also the longest. If @spookylukey is working on such an ambitious project, it might make sense to standardize on that implementation and put some community effort into it. In the meantime, I think the best options are either rolling out your own ad-hoc solution (possibly even a pretty naive implementation; we do this for pluralization) or the custom data type solution plus the ability to turn it into a web component.


#6

Having a way to generate .elm files from Project Fluent translation files sounds like a really good thing to have! I have actually been working on/thinking about an i18n solution for Elm. So far, I have come up with https://github.com/kirchner/elm-translation as an idea for an API. I was also working on a tool to generate .elm files (which use this package) from the ICU message format, and I looked into parsing the CLDR definitions to get, for example, number formatting and pluralization rules without having to use the Intl API.

I would be very interested to see what you have got so far! I think it’s best to work together on this whole i18n/l10n thing as it is a pretty large project. :slight_smile:


#7

I do like that the story for i18n and translations has improved since I blogged about my rudimentary approach in mid-2016 :joy:. Project Fluent is really great, including bits of grammar that people don’t think about because of their own biases.

But having access to the built-in window.Intl, specifically by wrapping it rather than re-implementing it, is my biggest concern around l10n.


#8

I’m also not convinced that re-implementing things (like number printing) that are already available in the browser is the best way to go. On the other hand, the Intl API does not completely reflect the specifications in the CLDR. For example, the pluralization rules are not complete: in English you have "1 second" but "1.0 seconds", but as far as I know, Intl.PluralRules.prototype.select() only considers the numerical value of the number, not its actual formatting.
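A sketch of how a pure Elm rule could handle this case by looking at the formatted string rather than the numeric value. It assumes "." as the decimal separator and no grouping separators; the module and function names are hypothetical. `i` and `v` are the CLDR operands for the integer part and the number of visible fraction digits, and the English cardinal rule is "one" only when i = 1 and v = 0.

```elm
module Plural.En exposing (PluralCategory(..), cardinal, operands, secondsLabel)


type PluralCategory
    = One
    | Other


{-| CLDR operands from an already formatted number: i is the absolute
integer part, v the number of visible fraction digits (trailing zeros count).
-}
operands : String -> Maybe { i : Int, v : Int }
operands formatted =
    case String.split "." formatted of
        [ intPart ] ->
            Maybe.map (\i -> { i = abs i, v = 0 }) (String.toInt intPart)

        [ intPart, fracPart ] ->
            Maybe.map (\i -> { i = abs i, v = String.length fracPart }) (String.toInt intPart)

        _ ->
            Nothing


{-| English cardinal rule from CLDR: "one" iff i = 1 and v = 0. -}
cardinal : { i : Int, v : Int } -> PluralCategory
cardinal { i, v } =
    if i == 1 && v == 0 then
        One

    else
        Other


{-| Choose the unit from the formatted number, so "1.0" gets "seconds". -}
secondsLabel : String -> String
secondsLabel formatted =
    case Maybe.map cardinal (operands formatted) of
        Just One ->
            formatted ++ " second"

        _ ->
            formatted ++ " seconds"
```

With this, `secondsLabel "1"` gives "1 second" while `secondsLabel "1.0"` gives "1.0 seconds", because the visible fraction digit makes v = 1.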

The number formatting functions I generated from the CLDR were way slower than the native functions, though. So this may also be something to consider? But then again, there is also Html.Lazy :wink:

You mean the feature where you can define certain keywords (for example your brand name) in the different grammatical cases available in the language and then reuse them inside other translations? I have not been able to figure out a good way to incorporate this into my approach yet, so any ideas on how this can be translated to Elm are highly appreciated :slight_smile:


#9

Regarding what I have so far in my implementation of the Fluent spec for Elm:

First, I have to admit I have nothing I can show anyone yet!

I have been working on the Python implementation in spare time over the past few months, since my backend is Python/Django, and most of my pages are server rendered and I want to keep it that way.

When I started using the Python bindings for Fluent there was a parser, but nothing to generate the translations. So, I implemented an interpreter, and that work is awaiting review. In addition I then also implemented a compiler, which compiles FTL code to Python, resulting in a massively faster implementation. The compiler branch is now also working well. I have then implemented some other things, including Django integration, and have started to internationalize my app using that.

I mention all of this to say that I now have a very good level of experience with Fluent. Plus, the FTL-to-Python compiler I wrote will be a massive head start in writing the FTL-to-Elm compiler and the techniques it needs; I will copy over and modify large parts of it, as I intend to write the compiler itself as a command-line Python application. There are of course some big differences in design due to the differences between Elm and Python, but I’ve been thinking it through for months, and I’m confident I’ve got a workable design and that I’ll be able to make the most of Elm’s static typing to get a very strongly typed solution. In this regard it will differ from other Fluent implementations: they tend to do runtime parsing of FTL code, and are therefore error-tolerant. We don’t need to do that; we can find most problems at compile time and stop bad translation files from ever getting past the compilation hurdle.

I don’t know whether using Python might put some people off, but every language will put someone off, and my familiarity with the libraries and the existing work I’ve done are a big head start. (It’s obviously not possible to use Elm; this tool will need to do lots of IO, etc.) The nice thing about writing a command-line tool is that if this takes off, once the design has been thrashed out it would be possible to switch to another language down the road and still maintain the same interface (i.e. the command-line interface).

There are some things I’m waiting for before I start the actual FTL-to-Elm specific code. In particular, version 0.6 of the spec has come out, and in the Python project that branch is still being worked on. AFAICS the changes actually make things easier and will simplify the FTL-to-Python compiler, and therefore the FTL-to-Elm compiler too.


#10

We happen to be having exactly this discussion on this issue I raised regarding the Fluent spec, and it looks like some implementations of Intl.PluralRules do in fact support this case:

>>> (new Intl.PluralRules("en", {minimumFractionDigits: 1}).select(1))
"other"
>>> (new Intl.PluralRules("en", {minimumFractionDigits: 0}).select(1))
"one"

This works in Firefox, not in Chrome AFAICS. I can’t see it in the MDN docs but perhaps it is in a draft spec - the Mozilla people working on Fluent seem to be very active in the Intl spec evolution.

The fact that these APIs, along with many other of the vast array of web APIs, are constantly being developed and improved is another reason we should be avoiding re-implementing these things in Elm as much as possible.


#11

That all sounds really exciting! :slight_smile: Would it be possible to share what the generated Elm code would look like? I mean just some concrete examples. I’m really curious, as I have been thinking about these things as well. And there are just so many design decisions to take into account, for example:

  • only generating static translation (functions/data?) or also having the possibility to replace them with dynamically loaded data
  • some sort of checking if pluralized translations have a value for (exactly?) all plural forms needed in the language
  • also having an export from Elm to FTL (so the generated translations are actually serializable data and not functions. Or maybe there is another way this can be achieved?)
  • typed placeholders (String, Float, Date, …)

I think it would be great to have a discussion about what the API could look like! There are some ideas in https://github.com/kirchner/elm-translation but I’m sure you have other ideas, too! :slight_smile: Also, having concrete examples would make it clearer which parts of the Intl API we would need (and in what way), I think.

Also, thank you for the info on Intl.PluralRules! :slight_smile:


#12

I’m slightly hesitant to show what the generated code looks like when I haven’t actually written the implementation yet! Also, many of the details shouldn’t be important because you won’t need to edit this code yourself.

I can show you what the current Python code that I’m generating looks like: https://github.com/django-ftl/python-fluent/blob/compiling_message_context/tests/test_compiler.py#L56

The basic idea is that each message gets turned into a function. The actual code obviously depends on how complex the messages are. In Elm, we can eliminate the errors argument in the code (we can detect all errors at compile time). The simplest case would look like this:

a-message-id = This is a message

becomes:

aMessageId : Locale -> a -> String
aMessageId locale args = "This is a message"

locale is unused in this case (it would be passed to Intl.NumberFormat etc. if they were used), and a, a type variable standing in for the args record, is also unused but still included so that these functions all have the same arity.

This would be the English function. There would be a similar one for each language, and another ‘master’ function which is the function you actually import and call from your own Elm code, passing in a locale and args. This function then dispatches to the correct one for the locale.
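A sketch of that dispatch for the simple message above, assuming a hypothetical `Locale` type with one constructor per language and a German translation added purely for illustration. The per-language functions keep the `Locale -> a -> String` shape, and the exported function just switches on the locale.

```elm
module Messages exposing (Locale(..), aMessageId)


{-| Hypothetical locale type; generated code would have one constructor
per language that has an .ftl file.
-}
type Locale
    = En
    | De



-- Per-language functions, as a compiler might emit them from each .ftl file.


aMessageIdEn : Locale -> a -> String
aMessageIdEn _ _ =
    "This is a message"


aMessageIdDe : Locale -> a -> String
aMessageIdDe _ _ =
    "Dies ist eine Nachricht"


{-| The 'master' function that application code imports: same signature,
dispatching to the per-language function for the given locale.
-}
aMessageId : Locale -> a -> String
aMessageId locale args =
    case locale of
        En ->
            aMessageIdEn locale args

        De ->
            aMessageIdDe locale args
```

Application code only ever calls `aMessageId locale {}`; adding a language means regenerating this module with one more constructor and one more branch.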

For a message that takes substitutions, we would find all the args in use for that message and replace a with a record type that refers to each parameter. We assume args are strings, unless we can see they are something else (e.g. they are passed to the NUMBER function, or perhaps we’ll be able to make use of hints from semantic comments about variables once that addition to the spec is done).

Example:

a-message-with-an-arg = Hello { $name }, welcome to our website.

becomes:

aMessageWithAnArg : Locale -> { a | name : String } -> String
aMessageWithAnArg locale args =
    String.concat
        [ "Hello "
        , args.name
        , ", welcome to our website."
        ]

The important thing here is that you cannot forget to include the name argument for this message, or get the wrong type.

More complex constructions (e.g. Fluent select expressions) will obviously get more complex than this, but it is all manageable (see the Python code for examples). The generated code will usually be a lot simpler than the Python version, because we can know the type of every argument, and can ensure it is present, so the error handling can go away.
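To make the select expression case more concrete, here is a sketch of what a simple plural select could compile to. The FTL message is a made-up example, the English plural function is a simplified stand-in for CLDR-generated code, the `Locale` argument is left out for brevity, and `String.fromInt` stands in for locale-aware `NUMBER` formatting.

```elm
module UnreadEmails exposing (unreadEmailsEn)

{- Hypothetical FTL source for this message:

   unread-emails =
       { $count ->
           [one] You have one unread email.
          *[other] You have { $count } unread emails.
       }
-}


type PluralCategory
    = One
    | Other


{-| Simplified stand-in for a CLDR-generated English rule on integers. -}
pluralEn : Int -> PluralCategory
pluralEn n =
    if n == 1 then
        One

    else
        Other


{-| The select expression becomes a case expression. Because $count is
used as a plural selector, the compiler can type it as a number (Int here).
-}
unreadEmailsEn : { a | count : Int } -> String
unreadEmailsEn args =
    case pluralEn args.count of
        One ->
            "You have one unread email."

        Other ->
            "You have " ++ String.fromInt args.count ++ " unread emails."
```

Since the case expression must be exhaustive, the Elm compiler itself would also reject generated code that forgot a branch for one of the categories.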

The harder case is embedded HTML. My ideas so far are these:

We use a naming convention, or some semantic comments to identify messages that need HTML embedding. For example, a -html suffix on the name would be enough:

my-message-html = You have <b>not registered</b>, please remember to do so.

The compiler then parses this message as an HTML fragment, and generates code that outputs Elm Html values, with a different type signature for this kind of message:

myMessageHtml : Locale -> a -> List (ExtraAttrs msg) -> List (Html.Html msg)
myMessageHtml locale args extraAttrs =
    [ Html.text "You have "
    , Html.b []
        [ Html.text "not registered" ]
    , Html.text ", please remember to do so."
    ]

Obviously cases with substitutions will be harder to handle and more complex, and require different handling for substitutions into attributes, but still possible.

The ExtraAttrs argument needs some explaining. It is a mechanism for attaching additional attributes to the generated Html nodes, which is necessary especially if you need to attach event handlers, or if you have other attributes that you want to generate dynamically. I can go into detail about how this will work if you would like, but we should probably take that discussion elsewhere. There are also many more details, e.g. the handling of numbers.
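The post leaves the details of ExtraAttrs open, so the following is only one hypothetical way it could work, not the author's design: each entry targets an element of the message (here simply by tag name) and carries the attributes to attach, and the generated code looks them up for every node it emits. The `locale` type variable stands in for a real `Locale` type.

```elm
module MyMessageHtml exposing (ExtraAttrs, attrsFor, myMessageHtml)

import Html exposing (Html)


{-| Hypothetical: attributes to attach to the element with the given tag
name inside a translated message.
-}
type alias ExtraAttrs msg =
    { tag : String
    , attrs : List (Html.Attribute msg)
    }


{-| Collect all extra attributes that target a given tag. -}
attrsFor : String -> List (ExtraAttrs msg) -> List (Html.Attribute msg)
attrsFor tag extras =
    extras
        |> List.filter (\extra -> extra.tag == tag)
        |> List.concatMap .attrs


{-| Possible generated code for
my-message-html = You have <b>not registered</b>, please remember to do so.
The locale and args arguments are unused because the message is static.
-}
myMessageHtml : locale -> a -> List (ExtraAttrs msg) -> List (Html msg)
myMessageHtml _ _ extras =
    [ Html.text "You have "
    , Html.b (attrsFor "b" extras) [ Html.text "not registered" ]
    , Html.text ", please remember to do so."
    ]
```

A caller could then attach an event handler to the bold text with something like `myMessageHtml locale {} [ { tag = "b", attrs = [ Html.Events.onClick GoToSignup ] } ]`, where `GoToSignup` is a hypothetical message. Targeting by tag name breaks down as soon as a message contains two elements with the same tag, so a real design would probably need explicit identifiers.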

The main points are:

  • translators should only have to edit the .ftl files, and preferably with some helpers like the Pontoon tool that Mozilla has developed - Elm is far too complex to be edited by translators, and we don’t want developers manually copying translations into Elm files.
  • the developers should just be able to call a simple function, passing a locale and substitution args.
  • that function should be very efficient - for the simple case of a static string it should just return a static string.
  • that function should statically require any required parameters to be present and correctly typed.
  • the other details of how that function works shouldn’t be an issue for the developer, because the generated code is never edited, always compiled from .ftl files.

Responding to some of your other points:

some sort of checking if pluralized translations have a value for (exactly?) all plural forms needed in the language

This could be done by some generic FTL checking tools, I think, since plural form handling is done within select expressions in the FTL code. It probably wouldn’t be that hard to build it into a compiler.
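The check itself is simple set arithmetic, sketched here in Elm for consistency with the rest of the thread (the compiler discussed above would be written in Python). The required categories per language would come from CLDR; the type and function names here are assumptions. `unexpected` covers the "exactly?" question and would probably be a warning rather than an error, since an extra variant is harmless (it is simply never selected).

```elm
module PluralCheck exposing (PluralCategory(..), VariantReport, checkVariants)


type PluralCategory
    = Zero
    | One
    | Two
    | Few
    | Many
    | Other


type alias VariantReport =
    { missing : List PluralCategory
    , unexpected : List PluralCategory
    }


{-| Compare the variants a select expression provides with the plural
categories the language needs (e.g. English needs [ One, Other ]).
-}
checkVariants : List PluralCategory -> List PluralCategory -> VariantReport
checkVariants required provided =
    { missing = List.filter (\c -> not (List.member c provided)) required
    , unexpected = List.filter (\c -> not (List.member c required)) provided
    }
```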

also having an export from Elm to FTL (so the generated translations are actually serializable data and not functions. Or maybe there is another way this can be achieved?)

My idea is that the ‘serializable data’ is just FTL files, from which you generate Elm code. Trying to parse Elm and extract parts is going to be extremely fragile and limiting, but the other way around will work fine.


#13

Thank you for taking the time to answer!

I like the idea of having placeholders as fixed arguments, so that the compiler checks them. Is there a reason why the Locale has to be supplied as an argument? I think the number printing function could be baked into the message, giving something like

aMessageWithANumber : { a | count : Float } -> String
aMessageWithANumber args =
  String.concat
    [ "This will take "
    , En.defaultNumberPrinter args.count
    , " seconds"
    ]

Or does it have to be possible to change the formatting of the numbers within the program? If I remember correctly, in FTL you can specify how a number should be formatted within the message, so maybe that should not be changeable in the code.

But on the other hand, I could imagine that, for example, if you render a money amount, you may want to be able to change the language and the way the amount is displayed (100 $ or $100, for example) independently of each other. So maybe a message function like aMoneyAmount : Region -> { a | amount : Float } -> String would make sense then.
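A sketch of that region-dependent formatting, where the `Region` type and its two constructors are hypothetical. It only illustrates keeping the amount style separate from the UI language; it assumes a non-negative amount, rounds to cents, and ignores thousands separators and which currency is meant, all of which `Intl.NumberFormat` would handle properly.

```elm
module Money exposing (Region(..), aMoneyAmount)


{-| Hypothetical region type, chosen independently of the UI language. -}
type Region
    = UnitedStates
    | Germany


{-| Format an amount in the region's style: "$100.00" vs "100,00 $".
Sketch only: non-negative amounts, rounded to cents, no grouping.
-}
aMoneyAmount : Region -> { a | amount : Float } -> String
aMoneyAmount region args =
    let
        cents =
            round (args.amount * 100)

        whole =
            String.fromInt (cents // 100)

        fraction =
            String.padLeft 2 '0' (String.fromInt (modBy 100 cents))
    in
    case region of
        UnitedStates ->
            "$" ++ whole ++ "." ++ fraction

        Germany ->
            whole ++ "," ++ fraction ++ " $"
```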

Incorporating HTML is a really tricky question. Is any HTML actually allowed within FTL messages, or only things like <b>, <strong>, … without arguments? I think it would also be important to make it possible to use other Html-like packages like style-elements or accessible-html.

I’m not sure how important it is to be able to write the translations within Elm and then export them to FTL. Technically it is possible without having to parse the Elm code, if the translations are modelled as serializable data instead of functions. So one would introduce a type Translation args = Translation String (args -> String), where the string is the serialization and the function is the printer of the actual message. The simplest constructor would just be s text = Translation text (\_ -> text). So one has

aMessageId : Translation args
aMessageId =
  s "This is a message"

And printing it would need some helper print : Translation args -> args -> String, which can be called in the dispatcher functions, so the actual functions, the user uses would be something like aMessage : Locale -> { ... } -> String again.

One can then define other helpers for placeholders, …, so the other message would look something like

aMessageWithAnArg : Translation { args | name : String }
aMessageWithAnArg =
  concat
    [ s "Hello "
    , string .name "name"
    , s ", welcome to our website."
    ]

Here, concat and string are defined such that they print the string correctly but also take care of serializing the translation to FTL syntax. One downside of this approach can be seen here: every accessor has to be given a string name so that serialization can be done.
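A minimal sketch of the `Translation` type described above, using the names from the post (`s`, `string`, `concat`, `print`); `serialize` is an added hypothetical name for reading back the FTL text. Each piece carries its FTL serialization alongside its printer, and `concat` combines both.

```elm
module Translation exposing (Translation, aMessageWithAnArg, concat, print, s, serialize, string)


{-| A translation carries both its FTL serialization and a printer. -}
type Translation args
    = Translation String (args -> String)


{-| Static text: serializes and prints as itself. -}
s : String -> Translation args
s text =
    Translation text (\_ -> text)


{-| A string placeholder. The name is only needed for serialization. -}
string : (args -> String) -> String -> Translation args
string accessor name =
    Translation ("{ $" ++ name ++ " }") accessor


{-| Join pieces: serializations and printed outputs are concatenated. -}
concat : List (Translation args) -> Translation args
concat parts =
    Translation
        (String.concat (List.map serialize parts))
        (\args -> String.concat (List.map (\part -> print part args) parts))


print : Translation args -> args -> String
print (Translation _ printer) args =
    printer args


serialize : Translation args -> String
serialize (Translation ftl _) =
    ftl


aMessageWithAnArg : Translation { args | name : String }
aMessageWithAnArg =
    concat
        [ s "Hello "
        , string .name "name"
        , s ", welcome to our website."
        ]
```

`print aMessageWithAnArg { name = "Ana" }` returns "Hello Ana, welcome to our website.", and `serialize aMessageWithAnArg` returns "Hello { $name }, welcome to our website.", which is the FTL value of the message.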

What I really like about this approach is that it makes it possible to gradually introduce these messages into your codebase:

  • You might start with something like:

        view email =
          Html.div []
            [ Html.text "Is "
            , Html.b [] [ Html.text email ]
            , Html.text " your email address?"
            ]

  • Then you can refactor it into:

        view email =
          Translation.asHtml
            (concat
              [ s "Is "
              , b [] <|
                  string .email "email"
              , s " your email address?"
              ]
            )
            { email = email }

  • Later, you move this into a separate module, Translation.En, so the code looks like:

        view email =
          Translation.asHtml Translation.En.emailInfo { email = email }

  • Then, if you need translations into other languages, you can run some tool on this package to generate the FTL files from it, translate them (or let other people translate them), and generate Elm code for the other languages.

It would also make it possible to use the number formatting functions without having to introduce another tool into your build chain.

What do you think about this? :slight_smile:


#14

I would be really happy about a more in-depth discussion of these things, as I have not come up with a good solution for handling HTML within messages. :slight_smile:


#15

Without getting derailed by translations: it seems many agree that leveraging the browser’s Intl feature is the right way to cover some of the i18n/l10n requirements of web applications. With that being the case, and a wrapped implementation already existing, what steps are required to get this on the radar of the Elm dev team? …Or at least to get a technical response. I raised i18n as a pain point 1.5 years ago and got a we-should-think-about-it from Evan on Reddit, and the other Discourse thread died, but I don’t want this one to succumb to the same unanswered fate.


#16

The best way to avoid an unanswered fate is probably to wait a bit. Timing is key here, I think. Currently the core Elm devs seem to be 200% focused on finishing Elm 0.19. After that, there will very likely be a time to talk about elm-explorations. I don’t think there will ever be a “process”, though. Just try to engage authentically and logically with the core Elm devs to gain trust and find a way forward.


#17

I don’t disagree, but v0.19 is what is going to kill the current solution (3rd party native code).


#18

It’s already on our radar.

So do we.

Making a wrapper for some browser API and posting it is not the right way to propose something for the platform. We looked at https://github.com/vanwagonet/elm-intl and quickly decided not to consider it further because it just wraps the Intl API instead of serving to explain what an ideal Elm API that uses Intl could be. We think this is a pretty hard question in its own right considering how much stuff Intl actually does.

It seems to me like this thread is really about how to do something in 0.19 that uses Native modules. The answer is to use ports or custom elements. This is going to happen for other people’s programs and other repos that distribute third party Native code. Intl isn’t special in that regard, nor is it uniquely poorly suited for ports and custom elements.

If you want to help the process along while we finish 0.19 the first step is always a literature review. Find ways that other languages and platforms address the specific i18n use cases that you have in mind (Intl does a lot of stuff that we consider distinct APIs), present what you learned and share your sources.


#19

This thread is getting long and fractious. Let’s do this:

  • Make new threads for specific problems with converting i18n stuff with Intl to use ports and custom elements.
  • Make a new thread for the fluent stuff that @spookylukey is working on, if that’s what @spookylukey wants to do. It’s definitely very interesting to me!
  • Make a new thread if @toastal or anyone else wants to complete a review of prior work on the subject of i18n APIs that might use Intl in an Elm implementation. Feel free to find me in Slack or send me DMs here if you have questions about what the final product might look like.

#20