Introduction
It has been about a year since my thread about the State of Elm and where Evan is going. At that time there seemed to be quite a bit of interest in my proposal for a fork of the Elm compiler that supports more timely fixes to Elm’s known issues, supports some often-requested features such as Haskell’s Type Classes (which I called Capabilities, or now, more in favour, Abilities as in the Roc language) plus a few other minor syntax additions, and that makes the new language more general purpose while remaining entirely back compatible with Elm. I am opening this thread to declare that I have been working on this project, and while I don’t have anything to publish yet, I have made some discoveries on how the new improvements will be implemented that I wanted to share, while welcoming any feedback and suggestions. This forum thread will hopefully reach most current Elm users who haven’t entirely moved over to the Roc language camp or elsewhere.
Addressing Previous Concerns
First, in the previous thread there were some concerns that I was setting out to break Elm and its goals as set out by Evan in his previous posts and talks; that is not the intention of this fork, and in fact it looks like none of my proposals would prevent the new compiler from doing exactly the same job when compiling current Elm projects, producing web pages that can still run on browsers back to Internet Explorer, just as now. However, new projects that compile with the new compiler are not guaranteed to be compilable with the current Elm compiler if any of the few new features are used (this is unlike the current Zooka language, which is currently both forward and backward compatible). Thus, changes as minor as allowing ignored underscore separator characters inside numeric literals, using square-bracket symbols as `List` Type designators in Type Annotations, and allowing negative integers as pattern targets in `case ... of` expressions won’t be back compatible, among other similar syntax extensions. However, as long as the file extensions are “.elm” and the project file is “elm.json” in the current format, the output will be a web page exactly as produced now by the current compiler, although one will have the choice of outputting a WASM or ECMAScript-backed web page as well as an ES3/ES5 one as now, via some simple modifications to the project JSON file and command-line options when building the project. If one wants to generate a command-line application, a mobile app, a desktop GUI program, or most other things, there will be extensions to the new project JSON file format and possibly a new file extension for files that extend Elm syntax in significant ways. Thus, if one wants JavaScript FFI, it likely won’t be available within Elm projects other than through the use of ports, just as now, but it may be made available within the new extended language formats.
Why Not the Roc Language?
Second, it was suggested that if I want a functional language that is more general purpose but still strict (non-lazy) in evaluation, I should join the Roc language team as a contributor. I have done some major explorations of the current state of that language and have even contributed some issues and explorations with their contributor team; however, it is my current opinion that it is going to take many years (if ever) for the language to reach stability. That opinion is based on several reasons, including the following:
- It is an entirely new language and breaks with ML-family languages such as Elm in not supporting currying, in its order of function parameters, in its general syntax, etc.
- To join the Roc project as a contributor, one should have knowledge of Rust, in which the compiler is written; Zig, which is used for many packages; LLVM, which is used for the back end; Type Inference, which in Roc departs quite extensively from Elm’s and even from that of Haskell and PureScript; etc. The result is that progress on fixing Roc’s issues is quite slow, with insufficient contributors across all of these different areas.
- The branches of the project don’t seem to be well managed: a “testing” branch, which fixed a huge array-manipulation performance problem on which I filed an issue, still hasn’t been merged back into the main branch after many months.
- The desire to deliver the utmost in performance has led to some severe premature optimizations, such as combining the “Morphic lambda set” optimization with the Type Inference phase, which has led to severely broken Type Inference while the advantages of the “Morphic” optimizations have not been clearly demonstrated.
- As to the currently broken Type Inference, the current Roc compiler can’t handle cyclic Type references within recursive functions that Elm can handle, given the same restrictions on cyclic references of local bindings. Whether this is due to the premature optimizations described in the above point or to more major problems in the Type system definition isn’t clear to me, but this major problem hasn’t been fixed in the many months since I filed an issue and discussed it with them. The following Elm code just works, whereas the equivalent in the Roc language crashes the compiler:
```elm
import Html exposing (text)

-- a lazily defined Co-Inductive Stream Type expressing an infinite stream...
type CIS a = CIS a (() -> CIS a)

showNCIS : Int -> CIS a -> String
showNCIS n cis =
  let go i (CIS hd tlf) str =
        if i >= n then str
        else go (i + 1) (tlf ()) (str ++ Debug.toString hd ++ " ")
  in go 0 cis ""

nats : CIS Int
nats =
  let nxt (CIS hd tlf) = CIS (hd + 1) (\ () -> nxt (tlf ()))
  in CIS 1 (\ () -> nxt nats)

main = nats |> showNCIS 10 |> text
-- produces "1 2 3 4 5 6 7 8 9 10 "
```
where the Roc language problem arises from the recursive `nxt` function using a recursive Type as an argument. Until this is fixed (which I can’t really help with other than by reporting the problem), Roc is very much a Work-In-Progress and I give it only a moderate amount of interest, even if the syntax changes from Elm were acceptable.
- The other advantage of a completely backward-compatible ElmPlus language is that it gets to take advantage of all the available version 0.19 Elm packages that don’t require “native” code, so it would already have an extensive ecosystem of packages that doesn’t need to be developed from scratch, a problem the Roc language faces.
Proposal of What Extensions To Include
Next, one needs to consider the extensions to Elm that a new language would support without making it harder to learn than Elm. Anything that is added needs to be considered very carefully as to whether it is really needed. The major extensions that probably need to be included are an extended FFI and “Type Classes”.
Extensions such as providing an often-requested FFI may break the “no runtime crashes in practice” guarantee, but should definitely be provided for uses that don’t involve web pages, since the normal “native” back end of the new language will be C, which can then be compiled to ECMAScript, WASM, and any machine-code target as required. Thus, there would definitely be ways of providing FFI for C code, although that may or may not be extended to pure non-ECMAScript Elm-compatible web pages.
The other major request is for some form of Haskell’s “Type Classes”, which can be provided as in the proposed “Abilities”; however, in order to provide the full suite of “Mappable” (equivalent to Functor), “MultiMappable” (equivalent to Applicative), “Chainable” (equivalent to Monad), and so on, one needs Higher-Kinded Types (HKTs), and without HKTs there aren’t all that many things that “Abilities” can do that aren’t already provided by Elm’s constraints, perhaps with some minor extensions. For instance, if the constraints were extended from `number`, `comparable`, `appendable`, and `compappend` to include `mappable` (which would mean that a bare `map` function works on any included type), `nmappable` (which means that `map2` and higher would work with included Types), and `chainable` (which would mean that `andThen` and the proposed `chain` would work with included Types), there would be much less boilerplate code. However, there is still the problem that this would introduce many more combinations of desired constraints, as in `appendable` combined with the added ones, and so become more complex than “real” `Ability`’s. So, other than the implementation problem that HKTs involve the Type Inference phase, it looks like full-capability `Ability`’s would be the desired feature.
It must be noted that just because the new language supports `Ability`’s doesn’t mean that the old constraints won’t continue to work for backward compatibility; the compiler would just convert the old constraints as “sugar” into the new forms of Type Annotations, so that use of the `compappend` constraint would be converted into `(Comparable a, Appendable a) => a -> ...`, to use Haskell syntax. The extensions to Elm that I am considering are in an RFC document that I modify as my direction changes.
Implementation Approach
The rest of writing a new compiler just becomes implementation details, and I have considered many of these as follows:
- The biggest amount of work that needs to be done is related to Type Inference, especially when adding HKTs to an already somewhat fragile Elm Type Inference phase: when one looks at the serious issues filed on the Elm compiler that are complex to fix, the majority are related to Type Inference. I think that the Elm Type Inference phase may need to be entirely re-written and tested against all the edge cases where it fails in Elm, as well as whatever new edge cases are introduced when adding HKTs. I have recently spent some time with the PureScript compiler and tested many or most of the cases where Elm fails Type Inference by converting those cases into PureScript code, finding that PureScript doesn’t fail for those cases even though it includes a more complex Type system with HKTs, Existential Types, RankNTypes, row polymorphism, and functional dependencies. I may need some help on this, unless the Type Inference phase can just be translated from the PureScript version, although one would have to remove some features that may not be desired, such as the last two or three.
- For efficiency when using reference-counted memory management, one needs to elide away unnecessary reference counts. This is obvious when one considers the many examples where this is not done: the Fable language (F# to JavaScript) can produce reference-counted Rust code where the performance of the non-elided reference-counted Rust arrays is worse than that of JavaScript “TypedArray”s or even regular JavaScript arrays, and an attempt to make PureScript generate C code using non-elided reference counting instead of garbage collection runs hugely slower than even the slow JavaScript PureScript produces (PureScript treats all function calls with more than one parameter as a succession of single-parameter calls generating closures, which is about five times slower than the Elm approach of passing all arguments in one call when they are all provided). Eliding away redundant reference counts looks to be not that hard, given that the Elm compiler already uses data-flow “graph” analysis to determine recursion: the same code can be slightly augmented to produce the topological information indicating when a value is “last used” and thus can be destroyed, or when the value “escapes” and may need a heap-based allocation and reference count. This type of eliding has been done successfully by many compilers for other languages, even imperative ones.
- For efficiency of code generation, the compiler should implement more function inlining, whether performed automatically by the compiler or as flagged by the programmer using “pragmas”; this is particularly necessary when extending tail-call optimization to recursion that spans more than one function, something Evan considered for a future upgrade to the Elm compiler, although I have forgotten where he posted this.
- Again, for efficiency when inlining tail-call optimizations involving chained functions as used by the `Chainable` (or `andThen`) functions, the compiler needs to be able to do “lambda lifting”, where functions are inlined to avoid building stack and the inner functions are converted so that the loop created to replace the function recursion uses the internal contents of the Types as loop parameters instead of the Types themselves, along with eliding away code that allocates memory for a wrapping Type and immediately consumes that Type while de-allocating the storage used.
- The final optimization for efficiency may be to lift array bounds checks outside of “hot loops” (if the C compilers don’t already do this) when possible, as when the bounds check tests a condition that is implied by the loop condition expression.
- When the above optimizations have been done, the remaining C code will be optimized by the “tried-and-true” imperative-style optimizations already employed by efficient C compilers such as GCC and Clang; this is the advantage of generating C code rather than writing one’s own code generation or even using LLVM: efficient code can be readily generated for almost any target including WASM, many already-available C libraries can readily be used through FFI, and code generation for C is relatively simple, as C is actually a fairly simple language and only a subset of what C provides may need to be used.
- Due to outputting C, writing GUI applications for platforms that support WebKit/WebView such as macOS, Linux, Windows, and mobile apps for iOS and Android (although these last may require some “wrapper” code written in Kotlin/Java and Swift, respectively) is a relatively simple matter of changing Elm’s VirtualDOM system to make calls to a WebView interface layer, of which I have found working examples in projects on GitHub that can be used or adapted.
- So, working backwards, I have written some C code examples of what I want the new compiler to generate and tested them, even including one of the WebView interface packages as per the above, and found that this manually generated C code has excellent performance as native code and even as WASM generated through emscripten.
Where to Start?
Finally, given that adding `Ability`’s to Elm isn’t an essential part of showing the use of the new language and can be added later as a sub-project to re-work the Type Inference phase as mentioned above, the first step in making Elm a general-purpose language is to make it output efficient reference-counted C code, perhaps quickly followed by adding `LinearArray`’s to the language, including in-place mutation when possible. Investigating this is where I have spent much of my time over the past year. At first, I was just going to add an alternate code-generation phase, triggered by a compiler command-line option, to the compiler written in Haskell; however, to do that I had to better understand the AST being generated by the previous compiler phases, and that was difficult as I didn’t know all that much about how compilers worked. So then I had the idea of translating the Haskell compiler-phase code to Elm and thereby understanding it, in spite of the resulting compiler phases being much slower than the Haskell code due to being restricted to Elm’s persistent data types. I was even going to contribute the resulting code back as an Elm package, as it might be useful in writing a learner’s Elm IDE that doesn’t require a server to compile; however, @pit has already done this, or most of it, and he was able to help me complete my version of the compiler phases. With @pit’s version made public, there didn’t seem much point in developing this as a package, and on re-examining the resulting Elm code with the new language’s features in mind, I realized that what I had translated would need to be translated again once the new language was in place, including `Ability`’s, `LinearArray`’s, and a new library supporting `STRef`’s, which would make the resulting compiler about the same speed as the current Elm compiler. So, having achieved the goal of better understanding the compiler through the translation exercise, it seemed better to go back to just adding to the forked version of the Elm compiler written in Haskell, starting at the C code-generation phase, and that is where I am now.
The Elm code generation is only about 2,400 lines of code, but a version to generate C code is likely to be somewhat larger due to having to add reference counting and a home-grown implementation of closure functions capturing bindings external to the function’s scope, so it is likely to take a couple of months or more; then, in order to test it thoroughly, the `LinearArray` type should likely be added, which may take another couple of months. So by the middle of this year, I might have a repository where interested people would be able to compile and use some Elm-like code, or use Elm code to generate C code and, through the emscripten compiler, generate WASM code. I’ll post anything ready to be tried here at that time.
Naming of the New ElmPlus Language
I have considered what the new forked language would be called: “elmplus-lang” and “elmp” would pay homage to its Elm origins, but I am currently thinking of “fir” and “fir-lang”. The name still pays homage to Elm by continuing the arboreal reference, while starting with “f”, the next letter of the alphabet after “e”, and also the first letter of “functional”, indicating that the language is an extension that sits closer to “F#” and not so close to “Haskell”. What do you think?