I am currently thinking about what a nice API for internationalization and localization in Elm could look like. A first draft, with a readme explaining the problems and how that API could solve them, can be found at github.com/kirchner/elm-cldr.
I would like to do more work on this and continue implementing it. Therefore I would be very happy to get some feedback on the following questions:
Do you think that the drafted API is a reasonable approach to implement internationalized/localized texts in Elm?
In your opinion, what would be the best way to make the data of the CLDR accessible in Elm?
Also, I would be very interested to hear what your use cases for such an internationalization/localization API would be!
(Sorry, I’ll probably botch the distinction between localization and text translation.)
A commonly overlooked aspect of localization APIs is that the translations themselves are generally not done by the programmers (and are also commonly outsourced). You note in the README:
It should not be difficult to generate these translation modules Translations, Translations.En, … from “standard” localization files (like ICU messages). Also it should be simple to generate ordinary localization files from translation modules.
Understanding that workflow is probably the most important thing I’d be looking for when considering adopting an API like this one. I’d be interested to see those kinds of conversion tools treated as a core part of the project, and maybe an example of such a workflow shown somewhere in the README. On projects I’ve worked on that localized to multiple languages, integrating the translation files always took a lot more time than making the app use the localization API, so I personally think the tooling is much more important to get right than the Elm API itself.
One other consideration: I think ideally there should be a standard approach to localization for Elm in general. You seem to be trying to look at the big picture when designing this API, so I’d like to suggest that it could be useful to give some guidelines about how package authors can support localization (for instance, a function in a package might return an error message; what would the best practice be for letting an Elm app that uses that package localize the package’s error message?).
I like that your approach of using lists instead of string templates with placeholders provides better type safety. The Translation uber module does bother me a little because it’s just copy-paste from the language modules, but maybe that won’t be a real issue in the end; it’s just a gut feeling. I agree with avh4 that tooling for this translation stuff is key.
I remember that iosphere/elm-i18n lets you do all the i18n stuff in a build step. I like that approach very much because it keeps the output file size low: imagine an app with 10 languages where a user would almost certainly use only one, yet gets burdened with a huge bundle size for no benefit.
As a potential user I’d really like to have the static compilation option in addition to setting the language dynamically.
As a side note: I’d personally drop the Make dependency. First, it won’t work on Windows, and second, you already have a package.json of Node fame, so why not use an NPM script?
i18n workflow and tooling: I added another section to the readme, explaining what I think converting from Elm to some translation formats (and back) could look like. I thought of two scenarios:
You start with an (untranslated) application where all your texts are simply inlined Strings, and you want to internationalize it.
You get a set of translations from someone (Marketing, Product Planning, …) and you want to integrate these into your application.
Are there other workflows I’m missing here?
internationalization of Elm packages: I can think of two ways to translate packages:
You expose a type which models “the things your package functions can say”. For example this could be an Error type like this one. Then either the users have to generate Strings from these types themselves, or you provide printer functions, for example printError : Locale -> Error -> String, for some languages.
You expose a type with which the user can provide translations for the different things your package could say. This is a bit like how packages such as thebritican/elm-autocomplete let you customize parts of their view by providing "Html slots" which can be filled.
dynamic/compiled in languages: I think both ways should be supported! I have added an explanation of how I think this could be done to the readme, in the part describing the workflows. I wonder if at some point it won’t be necessary to only compile in the language which is currently selected, just to keep asset sizes small. Maybe Elm will support dynamic loading of compiled packages at runtime only when they are needed?
I think the other workflow I’d look for is having an already-translated app where you need to update the translations (maybe a few new strings were added, or there was a second pass at the translation and some of the translations have updated text). A notable specific case is when you have several languages already supported and you need to add a new string: how should you mark the string in the languages that you don’t yet have a translation for, so that you can still compile the app and make staging builds, but also have an indication that you need translations for the new strings?
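To make the two options concrete, here is a minimal sketch of what a package could expose. The module, the Error and Locale types and all function names are hypothetical and are not taken from elm-cldr or any existing package.

```elm
module Validation exposing (Error(..), Locale(..), Messages, printError, printErrorWith)

-- Hypothetical package module, for illustration only.


-- Option 1: model "the things the package can say" as a type.
type Error
    = Empty
    | TooLong Int


type Locale
    = En
    | De


-- Optional printer shipped by the package for a few locales.
printError : Locale -> Error -> String
printError locale error =
    case ( locale, error ) of
        ( En, Empty ) ->
            "This field must not be empty."

        ( En, TooLong limit ) ->
            "Please use at most " ++ String.fromInt limit ++ " characters."

        ( De, Empty ) ->
            "Dieses Feld darf nicht leer sein."

        ( De, TooLong limit ) ->
            "Bitte höchstens " ++ String.fromInt limit ++ " Zeichen verwenden."


-- Option 2: the user fills one "slot" per message.
type alias Messages =
    { empty : String
    , tooLong : Int -> String
    }


printErrorWith : Messages -> Error -> String
printErrorWith messages error =
    case error of
        Empty ->
            messages.empty

        TooLong limit ->
            messages.tooLong limit
```

With option 1 the application can always ignore printError and write its own case expression over Error; with option 2 the package never needs to know which locales exist.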
Those were my initial thoughts as well. My original suggestion was that as you (and others who use the package) learn which approach works best, I think it would be good to update the readme to recommend the best approach (rather than just describing the possible approaches).
One thing that’s missing for my use case (which is likely fairly common) is the ability to alter translations’ values without needing a rebuild of the Elm app.
EDIT: To be clear, I’m okay with needing a page refresh to pick up changes, but I don’t want to need to recompile once deployed, even if users change translations.
tl;dr: We provide elm-intl with fallback language rules and mark generated translations as fallbacks (if they are not available in the actual language). Otherwise elm-intl fails, telling the user which translations are missing. Fallback translations are excluded when running elm-intl generate-json.
If you have a complete set of translations, I would say you just add some translations/<new_locale>.json and run elm-intl generate-elm.
What about incomplete sets? So, say you don’t have German (de) translations for everything. Then elm-intl generate-elm should fail (telling you which translations are missing) unless you say what the fallback language for de will be. So, you somehow give elm-intl the information that de has en as fallback, and then it will use the translations from translations/en.json for the ones which are missing in translations/de.json, marking them as fallbacks. This could look something like this:
So we mark on the type level that these translations are just fallbacks (which does not change the way they are printed) and can exclude them when running elm-intl generate-json. This way we make sure that after generating the JSON files we don’t end up with “complete” languages which are actually just filled with fallback translations.
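The resolution rule described here can be sketched in a few lines. This is only an illustration of the logic, assuming each translation file has already been read into a Dict from translation name to text; Entry, resolve and exportable are hypothetical names, not part of elm-intl.

```elm
module Resolve exposing (Entry(..), exportable, resolve)

import Dict exposing (Dict)


-- Hypothetical marker: is a text a real translation or borrowed?
type Entry
    = Translated String
    | FromFallback String


-- Keep every key of the fallback language; prefer the locale's own text.
resolve : Dict String String -> Dict String String -> Dict String Entry
resolve fallbackTexts localeTexts =
    Dict.map
        (\key fallbackText ->
            case Dict.get key localeTexts of
                Just text ->
                    Translated text

                Nothing ->
                    FromFallback fallbackText
        )
        fallbackTexts


-- What an export back to JSON would keep: real translations only.
exportable : Dict String Entry -> Dict String String
exportable entries =
    Dict.foldl
        (\key entry kept ->
            case entry of
                Translated text ->
                    Dict.insert key text kept

                FromFallback _ ->
                    kept
        )
        Dict.empty
        entries
```

Because borrowed texts are dropped on export, a round trip through this sketch never turns an en text into an apparent de translation.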
There could also be a “stricter” version of print which breaks compilation when called with a fallback translation. So if you want to check that you have properly translated everything before releasing you just change your imports of Localized. (But I think that would require that Text has a third type variable, so we have s "foo" : Text Final args msg but s "foo" |> fallback : Text scope args msg) Maybe there are other ways to let the compiler tell us we are using fallback translations?
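One way the compiler could be made to reject fallbacks is a phantom type variable. The sketch below is a simplification under stated assumptions: Text carries only the scope variable (no args or msg), and it uses a dedicated Fallback tag instead of a free type variable, so a fallback text can never unify with Final. All names are hypothetical.

```elm
module Fallback exposing (Fallback, Final, Text, fallback, print, printStrict, s)

-- Hypothetical sketch: `scope` is a phantom type variable that records
-- whether a text is a real translation. It never appears in the data.


type Text scope
    = Text String


type Final
    = Final


type Fallback
    = Fallback


-- A properly translated piece of text.
s : String -> Text Final
s string =
    Text string


-- Mark a text as borrowed from the fallback language.
fallback : Text Final -> Text Fallback
fallback (Text string) =
    Text string


-- Lenient printer: accepts any scope, fallbacks print like normal text.
print : Text scope -> String
print (Text string) =
    string


-- Strict printer: only accepts final translations.
printStrict : Text Final -> String
printStrict (Text string) =
    string
```

Here print (fallback (s "Hallo")) type-checks, while printStrict (fallback (s "Hallo")) is a type mismatch, so switching a release build to the strict printer would surface every remaining fallback at compile time.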
tl;dr: Using some versioning system (like git) could already solve most of the update/conversion issues, but I am not sure!
What about updating translations? Say you have generated your Elm code before and have since updated your JSON files. If you run elm-intl generate-elm again, it will just overwrite your translation modules. This may seem bad, but if you checked these modules in via git, you can actually get the changes very easily, with git diff src/Translations/De.elm for example. And then you can also use git to apply only the changes you find good and drop (or adjust) the other ones. This would also work the other way: if you made changes to src/Translations/De.elm and ran elm-intl generate-json, git will tell you what has changed in translations/de.json and you can stage only part of the changes if you like.
You would only run into trouble if you made changes to the Elm files and ran elm-intl generate-elm before staging or committing these. (And of course, if you edit the JSON files without committing or staging and run elm-intl generate-json.)
Or am I missing some other problems here? And would this actually be enough to deal with the updating/conversion use case?
Another way would be for elm-intl to tell the user which parts of the translation modules it will overwrite and ask whether it should do so. (And there could be a --yes option to just overwrite everything by default, so you get the behaviour from above.)
I’m not 100% sure in which scenario it is not possible to change the elm/js-asset on the user’s computer. Maybe that is because I am just thinking “websites”, and there, when the user reloads (or the Elm application makes their browser reload), they will get a newly compiled application with updated translations. Could you perhaps elaborate on what your situation is?
What I really like about the current API is that you have the following guarantees:
Every piece of text is available.
Every placeholder is filled correctly.
Every required pluralization form is available.
If we want hot-swappable translations, we would need some form of decoding at runtime, and unfortunately I don’t see how to make this possible and still keep these three properties (without causing runtime errors). But I might be missing something!
Our (server-rendered) app allows permissioned users to edit translations while the app is running, and they expect to see the new values as soon as next page load.
I do not wish to have to provision a working Haskell build environment on my company’s prod servers.
Sounds interesting! I will think about whether it is possible to also capture this use case. Does it have to be possible for the user to also change the signatures of the translations at runtime, or can the number and types of the placeholders be fixed at compile time?
I have put some more thought into the API having in mind that translations should also be dynamically changeable without having to recompile the Elm application. The new version can be found in the dynamic branch of the repository. If you want to take a look at the documentation you can put the documentation.json (of the dynamic branch) into the docs previewer.
The tl;dr idea is that you give every translation and every argument a name, which is just a String. So when you print a translation, you can now say: look up the translation’s name in this dictionary and the values will be Strings with the new content (which should be formatted in the ICU Message Format).
For example, if your Translation was this:
greeting : Translation { args | name : String }
greeting =
    final "greeting" <|
        concat <|
            [ s "Hello, "
            , string .name "name"
            , s "!"
            ]
You could print it with dynamically generated content like so:
There are also some functions to check if the dynamic translations are in the correct format and fit the structure of the static Translations.
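To illustrate the lookup idea itself, here is a self-contained sketch. It does not use the API from the dynamic branch: Translation is a simplified record, printWith is a hypothetical name, and instead of the full ICU Message Format it only fills plain {argument} placeholders.

```elm
module Dynamic exposing (Translation, greeting, printWith)

import Dict exposing (Dict)


-- Hypothetical, simplified stand-in for a named translation: a name, a
-- compiled-in default template, and the named arguments it reads.
type alias Translation args =
    { name : String
    , defaultTemplate : String
    , arguments : List ( String, args -> String )
    }


greeting : Translation { args | name : String }
greeting =
    { name = "greeting"
    , defaultTemplate = "Hello, {name}!"
    , arguments = [ ( "name", .name ) ]
    }


-- Look up the template by the translation's name, fall back to the
-- compiled-in default, then fill every `{argument}` placeholder.
printWith : Dict String String -> Translation args -> args -> String
printWith dynamic translation args =
    let
        template =
            Dict.get translation.name dynamic
                |> Maybe.withDefault translation.defaultTemplate

        fill ( argName, accessor ) text =
            String.replace ("{" ++ argName ++ "}") (accessor args) text
    in
    List.foldl fill template translation.arguments
```

With this sketch, printWith (Dict.fromList [ ( "greeting", "Good morning, {name}!" ) ]) greeting { name = "Alice" } yields "Good morning, Alice!", and passing Dict.empty yields the static "Hello, Alice!". The types of the arguments stay fixed at compile time; only the surrounding text can change at runtime.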
Is this a good extension of the API to get dynamic translations?
In particular, what do you think about the additional boilerplate? (Having to add final "translationName" and having string .name "name" instead of string .name.)