Backward compatibility of new Elm-like languages with Elm

Background

I am getting quite far along in converting the current Elm compiler, written in Haskell, into a new “ElmPlus” compiler (currently called “Ficus”/“ficus-lang”, since the names fir-lang and fig-lang weren’t available as organization names on GitHub). It is still written in Haskell, but the intention is to eventually translate the compiler into its own language; @pit and @deciob have demonstrated that one can translate the Elm compiler even into Elm transpiling to JavaScript, although at a fairly high cost in performance. Since my compiler will use Elm syntax as a more general-purpose language outputting to at least C/WebAssembly, with linearly indexed arrays and updates-in-place-when-last-used, I expect it to be at least as fast as Haskell. There is also the possibility of outputting to a number of other languages, such as PHP, TypeScript, Python, and R (all easy targets, as they are garbage-collected and dynamically typed or, in TypeScript's case, compile to a language that is; possibly worth doing in order to enjoy the advantages of the pure functional paradigm as well as the huge ecosystem of packages available for those languages), as well as Go, Nim, Rust, etc.

There was a recent thread on translating “elm/core” into the Go language that culminated in @rupert’s suggestion that, once one had an “elm/core” written in Go, one could start to look at writing a back-end for the Elm compiler to output Go code. This isn’t quite as easy as one might think, even though Go has its own garbage collection, because Elm erases the Type information available to the Code Generator other than for what is exported from each module. However, my compiler conversion makes each module’s internal Type information available to Code Generators, so that problem can be solved.

Desire to Avoid Fragmentation of the Elm Community

One of my concerns is not to fragment the Elm user space further. A large group of the former Elm community are now frequent Roc lang contributors, another group has flocked off to Gren, and there now seems to be quite a bit of interest in Lamdera as being able to handle and connect a “full-stack” of back-end servers and front-end clients; there is also the possibility of Evan eventually offering a new Elm follow-on that may do what either or both of Lamdera and my efforts do. However, of those, only Lamdera and my project (and likely a new Elm, of course) are committed to being fully backward compatible with Elm, in the sense of being able to use any of the packages and code that don’t contain “Kernel” code, and even being able to use those that do when compiling to JavaScript. All changes and added features are compatible with existing Elm source code as to syntax, and any “.elm” source file will compile, with one exception: defining new custom “Ability’s” will require a new keyword, “ability”, which might conflict with an existing name for a binding (value or function). I therefore propose that, if custom “Ability’s” are offered, they be limited to source files with a new file extension such as “.fcs”. Thus, use of my compiler will be very similar to the use of Lamdera, other than the very few extensions and the capability of generating efficient code in languages other than JavaScript.

Some Feedback Please:

At first I was going to mostly just replace the JavaScript code generator with a C code generator, because the C code can then be passed through further compilation, such as through Emscripten, to produce JavaScript or WebAssembly, and can also be used directly to produce native code for all the major platforms: Windows, macOS, Linux, and mobile apps for Android and iOS. However, I see there may be an advantage in being able to produce JavaScript directly, as one could then make an online IDE and/or REPL without having to use a server to do the compilation and possibly run the result. I also see interest in producing code for other back-end languages such as Go and the others mentioned in the Background paragraph.

Currently, Elm embeds the native “Kernel” JavaScript into the AST “objects” files where it applies, but that becomes a bit awkward if one were to support, say, ten back-end languages, in that there would be ten different embedded “Kernel” versions in each of the modules that use them. Other languages such as Fable and Roc essentially have some “platform” code, with each “platform” supporting one type of back-end, but that means there is quite a bit of redundant code in those parts of the packages that aren’t “kernel” code. I am thinking of separating the “Kernel” code for each back-end out from being embedded in the “Artifacts.dat” file of each package that uses it, so that alongside the “Artifacts.dat” file there will be separate files for each “platform” back-end, compiled to adjacent locations for use by the different code generators. Those generators might well be just plug-ins that take the output of the more generalized AST files and combine it with the matching “Kernel” files, with optimizations, to produce the target code. Does anyone have any suggestions on this?
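
As a rough illustration of the layout described above, here is a minimal TypeScript sketch of how a code-generator plug-in might pick up the shared artifacts plus the kernel file for its own target. Everything in it is an assumption made for illustration: the type names, the `inputsFor` function, and the “Kernel.<target>.dat” file names are hypothetical and do not come from the Elm compiler or from Ficus.

```typescript
// Hypothetical sketch: none of these names exist in the Elm compiler or in Ficus.

// Back-end targets that a code-generator plug-in might serve.
type Target = "javascript" | "c" | "go";

// What the compiler would leave on disk for one package: a single
// target-independent artifacts file, plus at most one kernel file per target.
interface PackageArtifacts {
  name: string;
  artifactsFile: string;
  kernelFiles: Partial<Record<Target, string>>;
}

// List the files a plug-in for `target` would read, in package order:
// each package's shared artifacts, then its kernel file for that target, if any.
function inputsFor(target: Target, packages: PackageArtifacts[]): string[] {
  const inputs: string[] = [];
  for (const pkg of packages) {
    inputs.push(pkg.artifactsFile);
    const kernel = pkg.kernelFiles[target];
    if (kernel !== undefined) {
      inputs.push(kernel);
    }
  }
  return inputs;
}

// Example data: "elm/core" ships kernel code, a pure package does not.
const examplePackages: PackageArtifacts[] = [
  {
    name: "elm/core",
    artifactsFile: "elm/core/Artifacts.dat",
    kernelFiles: {
      javascript: "elm/core/Kernel.javascript.dat",
      c: "elm/core/Kernel.c.dat",
    },
  },
  {
    name: "author/pure-package",
    artifactsFile: "author/pure-package/Artifacts.dat",
    kernelFiles: {},
  },
];
```

With this data, `inputsFor("c", examplePackages)` lists the two “Artifacts.dat” files plus only the C kernel file for “elm/core”; a target with no kernel file yet, such as `"go"`, gets just the shared artifacts. The point of the design is that pure packages are stored once, whatever the number of back-ends.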

I know some are dying to ask, so I’ll answer the question: “Will the new language support FFI to the new back-end languages?” - Yes, there is no reason not to support this, and I can think of at least three ways to implement it: 1) create package wrappers around any foreign code one wants to include and remove the restriction on publishing packages with “Kernel” code, 2) modify the port mechanism so that it can call and be called from the back-end languages, and 3) provide a full FFI package something like what Haskell or PureScript have. Definitely some of these will be implemented, and very likely eventually all three.

The Biggest Problems with Backward Compatibility

The biggest problem with backward compatibility with current Elm code isn’t related to Elm syntax at all, but rather to the “standard library” packages: many of these packages are wrong or inconsistent!!! I’ll give examples of the main ones, as follows:

  1. Currently, the Int Type is of inconsistent bit depth: all Bitwise operations truncate their results to 32 bits, and so does integer division, whereas other operations such as addition, subtraction, and multiplication keep the extended “number/float” range. Other languages, even those producing JavaScript, are more consistent about this: PureScript consistently treats its Int Type as 32 bits, while Fable mostly retains bit depth but allows overflow into the “number/float” range at least for addition, subtraction, and multiplication. As most applications will use Int’s within a signed 32-bit range, I would like to make this the default behaviour, but that risks breaking any code that depends on Int’s having the extended range (sometimes). The only way I can see to stay consistent with current versions is to have two “Kernel” versions for each back-end language, one to be used when compiling “elm.json” projects with “.elm” source files and the other to be used with “ficus.json” projects with “.fcs” source files. Is this important and frequent enough to justify all the extra work?
  2. Currently, integer division and the modBy and remainderBy functions are an inconsistent mess as to what happens when dividing by zero. Integer division by zero produces zero, which, while not mathematically correct, is at least consistent, and other languages take this shortcut too. However, modBy 0 0 produces a panic/exception, which Elm should not allow, and remainderBy 0 0 produces NaN (Not a Number) typed as an Int, which it is not, as NaN is a Float representation. I would like these functions to return a zero Int when the divisor is zero, to be consistent with integer division. This should be safe to do, as current Elm code will already have provided zero tests for the first argument in order to work correctly.
  3. Currently, Elm’s power operator only works correctly for Float’s; for Int’s with a negative exponent it produces a fractional Float result which it calls an Int. I would like to provide special-case code that produces a zero Int for any negative Int exponent, which should be fine with any workarounds currently used to make this usable.
  4. Currently, the String functions built on String.slice with index values can be wrong. One of the biggest problems with using index values on variable-length string encodings, such as UTF-16 as used by Elm/JavaScript or UTF-8, is that programmers fail to consider that a character is not necessarily one index value long. Thus, the String.length function produces the number of 16-bit words in the String, not the number of Unicode characters, and String.slice can give wrong results when its use assumes one index count per character. The String.left, String.right, String.dropLeft, and String.dropRight functions make exactly that assumption, so they will be wrong whenever a character takes two 16-bit words. I would like to fix these functions so that the relative left and right offsets are the number of characters to be retained/dropped; this shouldn’t affect current code, in that some workaround will already have to be in place to make their use correct when there is a possibility of characters requiring two 16-bit words.
  5. Also related to string representation, C strings are UTF-8, meaning that they are variable-length character representations, and Emscripten does automatic conversion between UTF-8 and UTF-16 for normal C strings when they cross the interface between WebAssembly and JavaScript. However, the index values obtained from String.indexes/String.indices and the count from String.length would then reflect 8-bit index values rather than 16-bit ones. In order to be exactly compatible with current Elm results, one would have to use a UTF-16 string format by default, which would be easy to pass across the WebAssembly/JavaScript interface, but would require a conversion to UTF-8 when being passed to C library functions, and would be somewhat less efficient for encoding mostly-ASCII strings. I’m afraid the new String native modules will have to put up with this inefficiency in the interests of preserving strict current Elm compatibility.
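
To make items 4 and 5 concrete, here is a minimal TypeScript sketch comparing the three counts involved: UTF-16 code units (what JavaScript's `length` reports), Unicode code points, and UTF-8 bytes. The helper names, including `leftByCodePoints` as a stand-in for a corrected String.left, are hypothetical and are not functions from “elm/core”.

```typescript
// Illustrative helpers only; these are not functions from elm/core.

// True when the code unit at index i starts a surrogate pair
// (one character stored as two 16-bit units).
function startsPair(s: string, i: number): boolean {
  const unit = s.charCodeAt(i);
  if (unit < 0xd800 || unit > 0xdbff || i + 1 >= s.length) {
    return false;
  }
  const next = s.charCodeAt(i + 1);
  return next >= 0xdc00 && next <= 0xdfff;
}

// Number of Unicode code points, as opposed to s.length (UTF-16 code units).
function codePointCount(s: string): number {
  let count = 0;
  for (let i = 0; i < s.length; i++) {
    if (startsPair(s, i)) {
      i++; // skip the second half of the pair
    }
    count++;
  }
  return count;
}

// Number of bytes the same text would occupy as UTF-8.
function utf8ByteCount(s: string): number {
  let bytes = 0;
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (startsPair(s, i)) {
      bytes += 4;
      i++;
    } else if (unit < 0x80) {
      bytes += 1;
    } else if (unit < 0x800) {
      bytes += 2;
    } else {
      bytes += 3;
    }
  }
  return bytes;
}

// A "left" that keeps n characters rather than n code units.
function leftByCodePoints(n: number, s: string): string {
  let end = 0;
  for (let kept = 0; kept < n && end < s.length; kept++) {
    end += startsPair(s, end) ? 2 : 1;
  }
  return s.slice(0, end);
}

// "a", then U+1F600 (outside the Basic Multilingual Plane), then "b".
const sample = "a\uD83D\uDE00b";
```

For `sample`, `sample.length` is 4, `codePointCount(sample)` is 3, and `utf8ByteCount(sample)` is 6, so the same three-character string has three different “lengths” depending on the representation. `sample.slice(0, 2)` cuts the emoji in half and leaves a lone surrogate, while `leftByCodePoints(2, sample)` keeps the whole character.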

Any Further Cases?

I welcome input on these and any further problem edge cases I may not have considered.


Probably not fully relevant to you: if you only wanted the translating-into-different-languages part, you can get pretty far in pure elm with elm-syntax and elm-syntax-type-infer; see e.g. lue-bird/elm-syntax-to-fsharp on GitHub (transpile elm to F#).
This would be low friction for existing elm users while likely easier to code because it’s already elm and not super slow (elm-syntax at least).


Hello @lue-bird:

First, thanks for your reply to my post, and you are right that your latest work is likely not fully relevant to what I want to accomplish.

I just finished looking at your linked “elm-syntax-to-fsharp” repo as well as its dependency in your “elm-syntax-type-infer” repo, and commend you on the amount of work you have put into this - over 27,000 LOC in the main “…fsharp” source code file and over 28,000 LOC in the main “…infer” source code file!

I agree with you that F# is a worthy target for such a project for the reasons you give in your “…fsharp” README.md file, as F# is one of my three favourite languages along with Haskell and (of course) Elm. At first I assumed this would be particularly easy, as you could depend on DotNet Garbage Collection (GC) to replace the GC of JavaScript and so wouldn’t have to worry about memory management (which works out), and would be able to rely on F#'s type inference to re-infer the necessary types. However, it looks like F#'s limited type inference wasn’t adequate for your needs (or you didn’t find a way to rely on it), so you needed the whole type inference project to provide the type information that doesn’t come from just a conversion of the Elm syntax. Thus, the combined project needed to accomplish this Elm → F# conversion totals something over 60,000 LOC including documentation and support code. By contrast, a simple fork of @deciob’s Guida repo would only require the same kinds of changes that I am making to the Haskell Elm compiler, namely saving the Elm type inference results for each module as well as the filtered exposed values for each module; that is only a few lines of code, and would mean your type inference work wouldn’t be necessary. Then all of the Type and AST information necessary to build your target F# file would be available in the “stuff” folder, where it could be tapped into by an extra program written in whatever language, including a modified Elm (modified to be able to access the files in the “stuff” folder), or (probably preferably) by a separate “FSGenerate” module within the forked Guida project.
To duplicate your work in your project, one would still need to provide a modified “elm/core” module with the “Elm/Native” JavaScript files being translated into equivalent F# files just as you have done with either this package augmented or another “FSConsole” package providing whatever CLI functions such as printing, file access, etc. to be supported. It would seem to me that this would provide F# output from Elm with probably less than half of the total work you have done.

One of my goals is to be able to provide efficient WebAssembly web pages as an option from existing Elm source code. While the “README.md” in your repo mentions that F# code can be output as wasm code, AFAIK this produces large, slow executables using Blazor (unless there is something new and improved I don’t know about), and (at least in your current implementation) can’t take current Elm applications and packages and produce a usable web page that is compatible with what the Elm code currently produces, even if one were to provide the full set of “Native”-based converted packages such as “elm/html”, “elm/browser”, “elm/json”, “elm/time”, etc. If these packages were provided, one could likely generate usable web pages using JavaScript through Fable, but one wouldn’t really have gained much over the current Elm compiler. One could produce native code applications through Fable’s (experimental) output to Rust, but in my experience the generated Rust code isn’t any faster than JavaScript run on node.js, due to all the non-smart, non-elided reference counts for anything built on the heap, which would include List nodes or any other Custom Type.

As well as the above, there are the considerations of exact backward compatibility with current Elm web pages, which is the subject of this thread. For instance, your implementation makes the F# representation of Elm’s Int a 64-bit int, which will not produce the same results as the current Basics functions manipulating Int’s; there is no perfect solution for this other than to represent all Int’s as float64’s, and one could even truncate these to 32 bits in exactly the same way as the current “elm/core” Basics module does. The other compatibility problem likely doesn’t come up for F#: it looks like F#'s internal representation of its native strings is UTF-16, which would match JavaScript’s representation of strings, so index values and string length counts should be the same.
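
To show what “the same results” would have to mean for Int’s, here is a minimal TypeScript sketch of the float64-plus-truncation behaviour, written the way I understand the current JavaScript kernel to behave (plain `+`, `(a / b) | 0` for integer division, raw `%` for remainder). The function names are mine, and the two `proposed…` functions are hypothetical versions of the fixes suggested in the first post, not existing code.

```typescript
// Illustrative only: these mirror my understanding of Elm's JavaScript output;
// the "proposed" functions are hypothetical names for fixes discussed in this thread.

// Addition is plain float64 arithmetic, so results can leave the 32-bit range.
function add(a: number, b: number): number {
  return a + b;
}

// Integer division truncates through `| 0`, which also forces a signed 32-bit result.
function intDiv(a: number, b: number): number {
  return (a / b) | 0;
}

// Remainder is the raw `%` operator, so a zero divisor yields NaN.
function remainderBy(divisor: number, a: number): number {
  return a % divisor;
}

// Proposed: a zero divisor yields 0, consistent with integer division.
function proposedRemainderBy(divisor: number, a: number): number {
  return divisor === 0 ? 0 : (a % divisor) | 0;
}

// Proposed: a negative Int exponent yields 0 instead of a fractional Float.
function proposedIntPow(base: number, exponent: number): number {
  return exponent < 0 ? 0 : Math.pow(base, exponent);
}

// Wrap any float64 "Int" to signed 32 bits, the way bitwise operators do.
function toInt32(x: number): number {
  return x | 0;
}
```

Here `add(2147483647, 1)` gives 2147483648, while `toInt32(2147483648)` gives -2147483648, which is the inconsistency in bit depth. `intDiv(1, 0)` is 0, `remainderBy(0, 5)` is NaN, and `Math.pow(2, -1)` is 0.5, whereas the proposed variants return 0 in the last two cases. A 64-bit int representation would match none of the wrapped results once values pass the signed 32-bit range.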

That is why I plan to do something like the above paragraph, but forking Evan’s Haskell compiler code instead of the Guida Elm code for now, and outputting C code for ease of conversion to WebAssembly (and ECMAScript) through Emscripten. At least temporarily, I am producing C code by generating Nim language code, because I then don’t have to immediately implement “smart” reference-count eliding: Nim already has that when one uses its “ARC” memory management (Automatic Reference Counting, or Araq/Andreas Rumpf - Nim’s BDFL - Reference Counting), which already does the data flow graph analysis necessary to accomplish this, and I can work on that analysis for C later. One of the limitations of using Nim as an intermediate step is that I don’t know whether I can implement two extra optimizations: creating all references on the stack rather than the heap unless it can be proved that they escape their scope (or are contained in a type that escapes its scope), and modify-in-place-when-last-used, which, if it can be made to work reliably, can greatly increase performance by eliminating many memory allocations/de-allocations.

Since I started this thread, I have been thinking that there are many other languages that would be of an advantage to those of us that love functional programming, as follows:

  1. Being able to output Dart language code would enable building applications for everything that Flutter supports, including Windows, macOS, Linux, Android, and iOS, although even Dart compiled to native code is not likely as performant as output to C.
  2. Being able to output Swift language code would then make it easy to build macOS and iOS applications.
  3. Being able to output F# code as you have done in your linked repo would then allow building all kinds of DotNet applications.
  4. Being able to output a subset of Scala code would allow one to build applications to run on the JVM.
  5. And so on and so on…

I see that you’ve taken a stab at some of these (even Roc lang), but all except F# were before you had your type inference package, so they are much more limited. As I said above, it would seem to me that one could more easily just modify the current Elm compilers, whether written in Haskell or in Elm itself, to then already have full syntax parsing and type inference without having to call them in yourself, and in fact would have a fully checked AST on which to base the code generation.

With a bunch of different code generators, one could see that applications for just about any platform could be written in Elm (or ElmPlus, which would include Ability’s as a form of Haskell’s Type Classes). It seems to me that Elm (or ElmPlus) would be a better choice for a do-it-all functional language than F#, which doesn’t enforce pure side-effect-free functional programming and is too much infected with OOP in order to support DotNet, and even than Haskell, which has never been all that successful for these kinds of applications except for the very best of programmers and is much more reliant on its GC as well as having to translate its non-strict programming model to a strict model (as evidence, GHCJS has never been as successful as PureScript, and an attempt - Eta - to provide a Haskell-like language for the JVM never took off).

Functional programming and especially Elm syntax forever!!!
