Where is Elm going; where is Evan going

Hard to see how to do it in a backwards-compatible way though? I think at the point when this changes, it's a hard fork from Elm into something new.

Or perhaps it will be possible to keep the pseudo type families around but deprecated somehow, so there is a period of transition to the upgraded type system?

I don't see this as an error on Evan's part at all. Elm was built on top of JavaScript and inherited some of JavaScript's warts as a result. JavaScript numbers are 64-bit floats that represent integers exactly only up to 53 bits, so that's how we got that mess in Elm too - but it also means we don't need to translate ints when going through a port. I think this was just a smart way to be economical with the implementation work, because Elm was not intended to be general purpose, but just for the web.

I think it causes friction with back-end integration occasionally, since 64-bit ints are used there. Although JSON numbers cannot interoperably carry full 64-bit ints, backend APIs do use them.
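For what it's worth, here is a small sketch of how that friction shows up in today's Elm (illustrative values; the usual fix is to have the backend send such ids as strings):

```elm
import Json.Decode as Decode

-- A 64-bit id survives neither JSON.parse nor Decode.int intact once it
-- exceeds 53 bits: the decode "succeeds", but with a rounded value
-- (here 2^63 - 1 comes back as 2^63).
roundedId : Result Decode.Error Int
roundedId =
    Decode.decodeString Decode.int "9223372036854775807"
```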

@rupert

I think that it may be possible to keep the pseudo type families around if the compiler can just translate them into the new Abilities/Capabilities structure: number → Num (in Roc terms); comparable → Cmp + Eq; appendable → Listable; etc.
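For instance, today's comparable can appear in a signature but cannot be user-defined; under abilities it might become an explicit constraint. A minimal sketch (the "where a has Cmp" spelling below is purely hypothetical):

```elm
-- Today: comparable is a built-in pseudo type family.
largest : List comparable -> Maybe comparable
largest =
    List.maximum

-- A hypothetical abilities-style spelling the compiler could translate to:
-- largest : List a -> Maybe a where a has Cmp
```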

@rupert,

It's true that integers up to 53 bits can be represented without precision loss in 64-bit floats, but that doesn't make them true integers: when one of these pseudo "ints" overflows or underflows, it reverts to float behaviour (as you have seen in your examples), and runtime checks to avoid that would be expensive in execution time. They work as integers within the 53-bit range, but only when they don't underflow or overflow and only the standard math operators are used: addition, subtraction, multiplication, and division involving only pseudo ints appear correct (actually, division needs an added truncation to be correct, but that is cheap compared to the division itself). Powers of these pseudo "ints" also work up to the point of underflow or overflow, except in particular for negative powers (which imply division), as you have seen; this too could be fixed by the compiler injecting the truncate operation, just as truncation would fix division when the operands are integers.
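A quick sketch of that float leakage, runnable in today's Elm (the exact results assume IEEE-754 doubles, as in every JavaScript engine):

```elm
overflows : Bool
overflows =
    (2 ^ 53 + 1) == (2 ^ 53)
    -- True: "Int" addition silently rounds past 53 bits

notAnInt : Int
notAnInt =
    2 ^ (-1)
    -- evaluates to 0.5 at runtime, despite the Int annotation
```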

What isn't in the current Elm code base is support for the "asm.js" conventions recognized by all modern JavaScript engines: when a number can be identified as a true 32-bit integer, the engine stores it in only four bytes instead of eight and does all kinds of optimizations based on that knowledge. The bitwise operations of ANDing, ORing, and exclusive ORing, as well as the bit shifts, always produce such integers, so apparently redundant bitwise operations give the engine a clue that integer operations are what is desired: ORing with zero (x | 0) marks a signed 32-bit result, while an unsigned shift by zero (x >>> 0) marks an unsigned one, and these extra "indicative" operations are easy for the engine to elide.

Evan's code mostly targets only pre-asm.js, pre-modern-ES features, so it doesn't take this into consideration: modern JavaScript engines automatically truncate bitwise operations, including bit shifts, to 32 bits, which Evan's implementation doesn't handle (very old browsers may behave differently).
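Today's Elm already exposes that 32-bit truncation through the Bitwise module, which compiles straight to the JavaScript operators; a small sketch:

```elm
import Bitwise

-- JavaScript bitwise operators coerce their operands to 32 bits,
-- silently discarding anything above bit 31:
truncated : Int
truncated =
    Bitwise.or 0 (2 ^ 32 + 5)
    -- 5, not 4294967301

wrapped : Int
wrapped =
    Bitwise.shiftLeftBy 1 (2 ^ 31)
    -- 0: the shift operates on the coerced 32-bit value
```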

Modern F#/Fable supports the modern asm.js and ES standards, so it doesn't run into this problem, and I think that in this case it is the model a new Elm-like language should follow, especially when or if the back-end eventually produces JavaScript.

I meant to ask, on your Abilities/Capabilities proposal, what are the implications for type inference? Is full type inference still decidable with this ad-hoc polymorphism?

I agree that fixing Int should be a priority. What Ints do you think our imaginary Elm extension should have?

Fixing Int
  • Big Integer
  • 64-bit Int
  • A range of precisions: U8, U16, U32, U64, … I8, I16, I32, I64, …
  • Just 32-bit correctly implemented as per asm.js
  • Some other choice; tell us about it in the thread…
0 voters

@Lucas_Payr,

Thanks again for this prompt, as it caused me to look into it more thoroughly. I see that UnionFind uses a data structure that is essentially a linked list, but the algorithm uses IORef to make both the contents of each list node (in fact, each of the two fields of a node's contents is individually modifiable) and the link itself modifiable in place, with the modifiable link presumably there so that nodes can be added or deleted at will without rebuilding the entire list. I don't see how one could replace this with anything of acceptable performance; therefore, one needs to look into how to implement it with the tools at hand.

I suppose one could use ports to pipe the calls to the UnionFind functions written in JavaScript or a language such as Fable, but ports are only allowed in applications, so that would preclude submitting the work as a library package, which was my original intention.

If one can't submit the work as a package anyway, one may as well inject a modified core package into the application in the same way the core packages are tested, which allows doing this. Once one has gone this far, one may as well add STArray-like contiguous arrays to allow monad-controlled array mutation, so the Haskell Data.Graph package can be directly translated. Once one has done that, only a few AST changes are required to mark where data structures escape their scope and where their scope ends, identifying the structures that need to be heap- rather than stack-allocated and reference counted, and where the reference counting needs to be injected.

As well, one needs to look into closure recognition in the AST so that closures can be emulated in languages that don't have them, such as C.

Finally, it is a relatively minor task to change the JavaScript code generation to emit C from the AST, and one has a working Elm-like language that generates automatically memory-managed C code.

Since there may be others working on the translation at a more advanced stage than I, whose work I can contribute to when it is published, I think I'll turn my energies to the "other stuff": modifying the compiler written in Haskell…

You seem to be more knowledgeable about compiler implementations than I am, as not everyone knows the details of Hindley-Milner type inference; my compiler-writing abilities have mainly been gained by reading and modifying Evan's Haskell code. Are you doing compiler work professionally, on your own, or as a contributor?

I formalized the Elm language as part of my master's thesis. So that's where the fun fact about UnionFind came from.

5 Likes

Top marks - I never knew about your work until now. This is an excellent resource for those thinking of alternative compiler implementations for Elm; it is always valuable to have a formal semantics for the language.

@rupert,

Yes, a valuable resource, and I'm sure many more exist out there on the Internet, if we could only collate a list of all of them.

Although not so much an analysis of the Elm compiler as @Lucas_Payr's is, in preparation for my work on writing a self-hosting compiler I wrote the missing Elm reference document, drawing on all kinds of sources, including the Elm compiler source code, and reverse engineering everything I could think of. If there is immediate interest, I could copy it out of the private repository where it currently resides and make it available in a separate repo.

I also wrote a preliminary RFC of the changes I considered necessary to turn Elm into an entirely or mostly backward-compatible, Elm-like language according to my ideas in the OP, though of course that is a moving target. If we can pull together a team to work on this, I can also make that document available.

This thread has helped re-motivate me to pursue this project as a worthwhile one, and has reminded me that there are many Elm lovers out there who would support it, use it, and contribute to it to the best of their abilities, although unfortunately many have drifted off over the last couple of years. To those reading this, please chip in if you would like to get involved, either with a private PM or by mentioning me on this or another thread.

2 Likes

Yes, would like to read these documents.

1 Like

Here they are: Elm+ Documents; let me know if you can't access them…

1 Like

Had a quick look, and I think it's also a great resource for the documentation of Elm "as is", and for the set of possible "to be" ideas. I don't like everything I see in the RFC, particularly towards the end as it gets into HKT and RankN types - I have always appreciated the simplicity of Elm over Haskell, but these things are certainly of value in such a list of ideas, whether or not I think it might be going too far. But overall it's really a great collection of ideas, and very helpful to see them all written out like this.

Do notation sugar? Again, I like how you can fit all of Elm's syntax onto a single slide, and adding more syntax that is just sugar would seem to go against the minimalist philosophy. The way I see it, if you give programmers two equivalent ways to do something, you just give them something pointless to argue over. Take elm-format, for example: no options, there is just one format; hallelujah!

(Maybe Elm should drop "if ... then ... else" and just use "case b of True -> ... False -> ..." following that philosophy).
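The desugaring really would be that mechanical; a throwaway sketch:

```elm
-- if c then x else y behaves exactly like:
choose : Bool -> a -> a -> a
choose c x y =
    case c of
        True ->
            x

        False ->
            y
```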

Was the omission of the WebGL GLSL syntax from the language reference deliberate?

@rupert,

it's also a great resource for the documentation of Elm "as is"

Yes, that has gone through quite a few iterations, and as it just reflects something that is very stable, it is probably mostly accurate and useful for programmers at all sorts of levels…

I don't like everything I see in the RFC, particularly towards the end as it gets into HKT and RankN types

I'm still not sure about these either, but wanted to include them in the RFC to bring them up for discussion. For RankNTypes, there is one specific case where they actually make the syntax easier: one can declare a type as "type MyType = Construct (a -> b)" with an implicit "forall" for a and b (or however many "free" type variables there might be) that lets those type variables stand for types of any rank, which is handy for a few uses, since one can then use MyType without assigning a concrete type to the type variable. It is said that RankNTypes can make type inference undecidable in a few cases without type signatures, but I have used this in Haskell (with the explicit forall) and never had the problem, perhaps because I always write type signatures at the global level, as is the recommended practice.

As to HKTs, for my purposes their only use is in creating higher-order types for Abilities/Capabilities/type classes. Richard Feldman has stated he is against them too, and he may get away with that, in that he doesn't think Roc needs the mutation monad chains such as ST: the compiler can just decide when a mutation can be performed in place because the mutated container is never referenced again. That may not be powerful enough for a more general-purpose language, which is the only instance in which I would propose HKTs be used; for instance, if the language is to be self-hosting, it would need some equivalent of IORef (or STRef) to implement UnionFind efficiently. It adds little to the syntax, especially since, as you may have noticed, I don't propose that users be able to create custom Capabilities, so their use would be limited to the Core package. Abilities are still an unimplemented proposal in Roc, so it's hard to say how they will pan out there, as well.

Even limiting Capabilities to predefined ones still adds a little to the complexity of the language, as we would still need to document how Capabilities can be added to new custom Types (beyond the automatic default Capabilities that all qualifying Types will have); in actual fact, the language could live without them, as F#/Fable does, at the cost of extra boilerplate code. However, no Capabilities, or limited Capabilities, would mean that HKTs aren't needed…

Do notation sugar?

The need for do-notation sugar comes up for chains of mutating function calls, which are onerous to write without it. It's not so bad if implemented like Roc's "backpassing", which works pretty well for Roc and could work well for the new language.
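To make the motivation concrete, here is a chained computation in today's Elm next to a hypothetical backpassing spelling (the Tasks are passed in as placeholders; only the commented syntax is speculative):

```elm
import Task exposing (Task)

-- Without sugar, each step nests another callback:
area : Task x Int -> Task x Int -> Task x Int
area getWidth getHeight =
    getWidth
        |> Task.andThen
            (\w ->
                getHeight
                    |> Task.andThen (\h -> Task.succeed (w * h))
            )

-- A Roc-style backpassing spelling (hypothetical, not valid Elm) would
-- flatten the nesting, desugaring back into the chain above:
--     w <- getWidth
--     h <- getHeight
--     Task.succeed (w * h)
```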

If we need them, remember that I propose that all of these features be optional add-ons that don't need to be learned if you don't use them, so they won't impact existing Elm code…

Roc has taken the general "provide only one way to do things" idea to the extreme, in that functions can only be declared the way anonymous functions are, etc., yet it still provides if/then/else and something like case/of (and even adds condition guards, which I am not proposing, although they can be useful). Every language designer has to make a choice; I chose these things for discussion because they are features that can be optional, don't have to be learned or used, and, even more to the point, don't require new keywords.

Was the omission of the WebGL GLSL syntax from the language reference deliberate?

This should actually be included in the Elm reference, as current Elm supports it, and we are going to have to support it to be backward compatible with Elm. My document doesn't cover it because I have had to learn it by looking at the current Elm compiler implementation, but it turns out that it's not too bad. Although fully documenting it as part of the language would defeat your "it all fits on a single slide" criterion…

Actually, there is a new standard called WebGPU which may supersede WebGL, and which will likely have to be adopted if/when the new language goes mainstream in a few years…

My view too.

But there are other possibilities: the technique in the paper I posted may work in Elm as it stands, and the compiler auto-uniqueness work above may turn out to handle it sufficiently well.

I am therefore thinking that there are two little mini-projects worth carrying out to verify this:

  1. Implement the algorithm without mutation for UnionFind from the paper I posted in Elm, and see if it works (a baseline sketch follows this list).
  2. Implement UnionFind without mutation in Lean4 (or Roc, if it has these optimisations yet?), and see if it works.
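For project 1, the naive immutable baseline (not the paper's algorithm, just my own starting point for comparison) would look something like this, with Dict lookups standing in for the O(1) IORef reads; that log factor is exactly what needs measuring:

```elm
import Dict exposing (Dict)

-- Union-find over Int ids without mutation: parents live in a Dict, and
-- find returns the representative together with a path-compressed Dict.
type alias UnionFind =
    Dict Int Int

find : Int -> UnionFind -> ( Int, UnionFind )
find x uf =
    case Dict.get x uf of
        Nothing ->
            -- unseen elements are their own representative
            ( x, Dict.insert x x uf )

        Just parent ->
            if parent == x then
                ( x, uf )

            else
                let
                    ( root, uf2 ) =
                        find parent uf
                in
                -- path compression: point x directly at the root
                ( root, Dict.insert x root uf2 )

union : Int -> Int -> UnionFind -> UnionFind
union a b uf =
    let
        ( rootA, uf2 ) =
            find a uf

        ( rootB, uf3 ) =
            find b uf2
    in
    Dict.insert rootA rootB uf3
```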
1 Like

@rupert,

I can't comment on the algorithm from the paper as I haven't read it thoroughly enough yet.

However, if the Lean4 algorithm works like Roc's (whose in-place mutation algorithm has been implemented), it works on the basis of the compiler injecting in-place mutation when it can show there is never a reference to the old state. The UnionFind algorithm, however, is based on mutating intermediate nodes of a linked-list structure, and as such there are always references to every node except when consuming the list from its head. Thus, I don't think this algorithm can be written in Roc, nor in Lean4 if it is like Roc.

So the hope is that the algorithm from the paper will do the job, which I can start checking out tomorrow. This is crucial to having a self-hosting compiler (which Roc states it will never have), and therefore worth the advance research. Failing that, I see no way to avoid implementing monadic chain mutation for this purpose, although that can be done in a package and doesn't absolutely require being covered by a Capability (and therefore HKTs): in other words, ugly use of a set of packages for a limited set of use cases, in the interest of keeping the general language simple.

You do know that doing in-place mutation as in Roc requires a very specific form of recursive code, where the container being mutated is one of the arguments to the recursive loop? To the outside it is as if a new container were allocated and the old one destroyed on every iteration, but in reality it is the same container throughout.
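To illustrate the shape such code has to take, here is the threaded-accumulator style in Elm (whether a compiler actually reuses the array in place is precisely the optimisation under discussion):

```elm
import Array exposing (Array)

-- The array is threaded through the recursion as an argument; no other
-- reference to any intermediate array survives, so a uniqueness-aware
-- compiler could turn each Array.set into an in-place write.
fill : Int -> Int -> Array Int -> Array Int
fill i value arr =
    if i >= Array.length arr then
        arr

    else
        fill (i + 1) value (Array.set i value arr)
```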

Not 100% sure but I think the Lean4 implementation can handle this:

~~https://arxiv.org/pdf/1908.05647.pdf~~

Actually better to look at this one:

Slide 23 and the example leading up to it - a list is destructively updated.

I found a little bit of description of how Roc is optimising this here: roc/crates/vendor/morphic_lib/src/api.rs at main · roc-lang/roc · GitHub

This sounds less sophisticated than what Lean4 has implemented, but the description there is also pretty brief, so it is hard to tell. Even the authors of the Lean4 paper suggest that what they have done barely touches what should be possible.

1 Like

I just recently wrote a POC in Elm and Capacitor. Worked great.

@rupert,

Yes, the first link only compares eliding reference counts to not eliding them, and shows the execution-time advantage.

The second adds some things to this, including the idea of reusing an object under specified conditions, which would allow in-place mutation, and shows that properly optimized reference counting is faster than common garbage collection (GC) implementations such as those in GHC and OCaml; however, we already know this from other language implementations such as Nim and Lobster (and likely others).

We already know that we don't want GC but rather smart reference counting, and the simplest way of getting there seems to be the one found in this slide show: use the information available from data-flow graph analysis, which is already mostly available to us in the Elm compiler (with one extra, easy-to-add topological-analysis feature). I don't see that there is all that much to be learned from the Lean4 compiler other than that it uses the technique I already proposed based on the data being acyclic; what is new is that this same technique can also tell the compiler when a value is unique and can therefore be mutated in place, which in turn feeds back and lets one implement the Graph algorithm that needs mutable arrays to be efficient.

However, this does not help the UnionFind algorithm in replacing IORef. So yes, something like this work should be implemented in the new language just as I proposed, but with the added feature that it can also show when arrays can be mutated in place because they are unique (and also when they can be passed to other processes/threads without conflicts and data races, for the same reason).

A further little improvement might be to analyse the contents of containers separately from the container itself, so that even when the outer container may or may not be unique, its contents may be unique and able to be mutated in place, as long as those contents are never referenced from outside. For instance, a memory-pool array only creates and manipulates the outer pool without referencing the contents of each "page", so once "pages" are assigned they can be freely modified in place at will and then replaced in the pool when the assignee frees them.
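A rough sketch of that pool idea (names and shape are my own invention): the pool code below touches only slot indices, never the page contents, so each checked-out page has a single reference and could, under such an analysis, be mutated in place by its assignee:

```elm
import Array exposing (Array)

type alias Pool page =
    { free : List Int
    , pages : Array page
    }

-- Hand out a free slot and its page without inspecting the page itself.
acquire : Pool page -> Maybe ( Int, page, Pool page )
acquire pool =
    case pool.free of
        [] ->
            Nothing

        slot :: rest ->
            Array.get slot pool.pages
                |> Maybe.map (\page -> ( slot, page, { pool | free = rest } ))

-- Return a (possibly mutated) page to its slot and free the slot.
release : Int -> page -> Pool page -> Pool page
release slot page pool =
    { pool
        | free = slot :: pool.free
        , pages = Array.set slot page pool.pages
    }
```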

I found out more about how Roc does this in conversations I had with them while reporting the very large performance issue with it. Their implementation is very convoluted, and they are fighting bugs with it, as there are so many cases to handle. Because their compiler was implemented from scratch in Rust rather than translated from an FP language such as Haskell/Elm, the whole mindset is based on owning/borrowing/mutating logic and not on data-flow graph analysis, which I feel is leading them down a rabbit hole whose bottom they can hardly see. Good luck to them, but the following is an excerpt of what they are finding:

Pretty fugly, eh? I'm hoping our code will be a lot cleaner using data-flow graph analysis…

These are not exactly identical. For example:

-- CAN'T DO THIS: a pattern cannot be a function call
dataTypeFromString : String -> DataType
dataTypeFromString data_type_string =
    case data_type_string of
        stringFromDataType CrewList ->
            CrewList

        stringFromDataType LocationList ->
            LocationList

        stringFromDataType ItemList ->
            ItemList

        _ ->
            UnknownList

-- "cleaner" to handle this way
dataTypeFromString : String -> DataType
dataTypeFromString data_type_string =
    if stringFromDataType CrewList == data_type_string then
        CrewList

    else if stringFromDataType LocationList == data_type_string then
        LocationList

    else if stringFromDataType ItemList == data_type_string then
        ItemList

    else
        UnknownList

Though I do understand that using if… then… else you lose the benefit of the compiler verifying all variants are handled; trade-offs.

1 Like