Is there any downside to elmlang being a garbage collected language?

Is it possible to transform a language like elmlang from being garbage collected into non garbage collected or is there something inherently making this not possible. Would there be any advantages if elmlang would be gc? How would a backend ready elmlang version without gc perform? If starting creating a new language from scratch today, do you think gc would best fit an elmlang type language best? Thanks a lot for your thoughts.

3 Likes

It is probably possible to do some optimisation work on the Elm compiler to reduce the amount of garbage created. If a value is created but is not accessible outside of the scope in which it is created, then space for this temporary value can be allocated on the stack and automatically recovered when the function exits. I have not looked into the generated JS code that the current Elm compiler produces to get an idea of how much of this kind of optimisation it does.

This kind of optimisation, and others, is simpler in an immutable functional language like Elm. In fact languages with mutability are usually compiled through SSA, which stands for single static assignment. Behind the scenes, this creates a new variable every time a value is updated, so that each variable can be thought of as only referring to a unique immutable instance of a value - effectively turning a language with mutability into one that can be analysed as though it were immutable.

If a function returns a value, or if values end up in some longer lived scope like the Model in Elm, then as they escape the scope in which they are created, memory needs to be allocated on the heap for them, and the simple push-pop mechanism of a stack no longer works to eliminate garbage collection.

So the answer is no, Elm could not be made garbage free.

  • Garbage collection pauses.
  • Occasionally running into pathological code cases where GC runs too often resulting in poor performance.
  • GC is complex - writing a new compiler or runtime for the language is a lot of work. The current Elm compiler takes advantage of what already exists for Javascript.

Almost certainly, it is likely unavoidable and the alternative is to have the coder explicitly allocate and de-allocate, which is hard to get right and often leads to bugs.

5 Likes

First of all thanks a lot for your great reply.

I am just thinking of what and elm-like language would look in the context of iOS architecture. iOS is using a simpler automatic reference counting approach (ARC) which is the reason why iOS apps need less memory to run smoothly vs Android where because of the java inherited gc the double of ram memory allocated can’t prevent apps or the system from freezing.

And more so in the context of Rust which is not gced.
Here an excerpt from an article b Steve Klabnik:

Similar to the dichotomy between static and dynamic typing, I think that Rust is proving out a new niche in the concept of garbage collection. That is, historically, we think of GC as something dynamic , that is, it involves a run-time component that does stuff at runtime. However, the idea of automatic memory management doesn’t inherently mean that it has to execute at runtime.

Could gc be transformed into a static and functional process? This is the perspective i am looking at it from. I am not a profi so sorry if what i’m saying doesn’t make sense overall. And again i appreciate your thoughtfulness. And i think the more the fp principles will be applied to software systems the more efficient they will be.

2 Likes

You can do garbage collection with reference counting - that would work with Elm.

1 Like

I did an experiment last year to implement some of Elm’s data types in C, and I made a basic GC as part of that. https://github.com/brian-carroll/elm_c_wasm

I thought a bit about the question of whether it could be done like Rust instead of using a GC.

So languages use three ways to manage dynamically allocated memory

  1. The C approach where the programmer calls free explicitly

  2. The Rust way where you have a concept of “lifetimes” built into the language so that the compiler can insert free calls automatically at the point in the program where the value’s lifetime expires.

  3. Use a garbage collector to do it at runtime. So you detect when values become unreachable rather than working it out in advance.

I think you need the “lifetime” concept to do what Rust does. And Elm doesn’t have a concept of lifetimes.

But the other thing is: would you actually want this in Elm? It doesn’t have mutations so it creates new values all the time. You have to clean up all the dead ones, it’s just a question of when. Do you want to clean up as soon as possible, or do you want to wait until some time when the program is not doing anything more important? Like, for example, after an update cycle? At that time, the user is just sitting there looking at the UI for a while, and your program is not doing anything. It’s the perfect time to collect garbage.

So in summary I don’t think you could do it without changing the language and I don’t think you’d necessarily want to either. I think this comes down to the fact that Elm and Rust are designed for different use cases.

3 Likes

I think for an end user of the language, you wouldn’t even need reference counting, as the language currently is. The garbage collection in elm is largely just the call stack.

I’m not an expert on this stuff, but my understanding is that GC only becomes an issue with heap allocated stuff. You could solve this with special types (as Rust and some experimental Haskell builds do), but Elm delegates any kind of thing that you’d need a heap for to message passing (via tasks) anyway, or hides it in the runtime (like the virtual dom).

Since any mutation is explicitly isolated, and handled by core team and elm-exploration authors anyway, presumably that stuff could be written in C or rust or objective-C by the language authors, and could be managed manually.

It would be interesting if elm ever had to implement a concept of a mutable type, but I suspect that would have to be handled as a Task, and again, the author of the runtime could manually manage the memory.

But there are smarter people on me here who may think what I just wrote is wrong/jibberish.

1 Like

@Dan_Abrams I think there are a couple of misunderstandings there.

Even in languages that use reference counting, it doesn’t appear in user code. Reference counting in this context refers to a particular type of garbage collector. So it’s implemented by the runtime, not by user code.

Actually Elm programs have lots of values allocated on the heap. Go to https://package.elm-lang.org/, open Chrome Devtools, click on the Memory tab, click on Heap Snapshot. I see 7.5MB of heap memory. You can browse around and see where in the compiled elm.js file each object comes from.

Each function has its stack size allocated at compile time. Any local variable whose byte size the compiler can’t calculate in advance must be allocated on the heap. In practice the stack tends to mainly be used for pointers and integers. Data structures tend to be on the heap.


So the original question was: GC works to manage Elm’s heap, but would a Rust-like system work instead?

I think answering this requires a good understanding of the Rust lifetime system and I’m not sure how many people in the Elm community have that knowledge.

I do know Rust needs you to manually specify lifetimes of variables in some you can leave them out in other situations where the compiler can unambiguously work it out for you. So would you need to add lifetime annotations to Elm? Or does every Elm program already satisfy the conditions where the compiler could work it out? I think that’s what @stefanr0sca’s question really boils down to. Those conditions are specified in the Rust book but I’m not sure how to translate them to Elm!

For my education, because his is an area I’m interest in, but am still learning about…

Is the allocation of data structures, etc, on the heap, a consequence of compiling to JS instead of say, C? Given Elm’s semantics and types, if compiling to a non-GCed intermediate representation, would those allocations have to occur on the heap, or be preferable to, or is it because JS has to allocate that way given its semantics?

Which types/data structures couldn’t Elm reasonably know the size of at the time of allocating the stack?

If possible, wouldn’t keeping everything on the stack simplify GC and improve concurrency semantics?

I know these may be difficult questions to answer, especially given the hypothetical nature of the question.

1 Like

Yeah it’s interesting!
The main point here is not dependent on the target language, it’s fundamental. Let’s take some example Elm code.

f n =
  let
    hellos = List.repeat n "hello"
    s = String.join " " hellos
  in
    s

At compile time you don’t know how many Cons cells to allocate for hellos, nor how many bytes to allocate for s. The value of n might depend on some user input, or some JSON received from a server, or some random variable. In general, it’s only known at run time, so it can only be allocated then. That means it must be on the heap, regardless of the language.

But what if f is only ever called with a constant for n? Well some compilers can check for situations like this, and then do an optimization where they decide to put it on the stack instead. But in the general case, it goes on the heap. There’s some limit on how good a job the compiler can do in finding such optimizations.

OK so in any language, after every call to f, you get a new hellos on the heap. As soon as f is finished executing, we’re done with hellos. That piece of heap memory has become “garbage”.

  • If you implemented f in C, you’d have to add an extra line at the beginning calling malloc and one at the end calling free. After calling free(hellos), that heap memory would be available for some other function to use.

  • If you implemented it in Rust, the compiler would automatically figure out at compile time that it could free up hellos at the end of f. It would insert the equivalent of C’s malloc and free calls into the compiled machine code for you. You don’t have to put it in your source code. In other words, Rust does a form of automatic memory management that does not need a GC. This is the innovation that makes Rust a big deal. It’s huge.

  • If you implemented it in JavaScript (or Python / Haskell /Java), the memory for hellos would stay in the heap until the next GC cycle.

So the GC languages are sort of “clogging up” the heap with this data until the next GC cycle. But on the other hand, C has to call malloc, which will spend some time searching for an available location in the heap that’s big enough for hellos. Rust has to do the same.

Fundamentally, this allocation and de-allocation work has to be done some time. There are just different approaches to scheduling it.

So now with better context, here’s a summary of my previous comments:

  1. For Elm’s use case, the GC approach to scheduling the allocation and de-allocation is actually really great. UI programs work in bursts, based on events. We don’t want to spend time searching for free memory in the middle of an Elm update, which is what C or Rust would do. That’s the worst possible time for us. We want to postpone it until the browser is idle, which is what happens today. Yay!

  2. The Rust approach is only possible if the compiler can work out where to automatically insert the calls to the allocation and de-allocation functions. The Rust language can’t do this 100% automatically. Sometimes it requires you to put hints in your code. Would Elm source code need similar hints to do this? I can’t fully answer this because I don’t know Rust well enough.

And with that, I shall put down my computer and go outside! :sweat_smile: Didn’t intend to write this much!

6 Likes

You wouldn’t be able to avoid this in C. Even in Rust, immutable datastructures are implemented using reference counting.

Then there’s the question of what garbage collection to use. In theory Elm could get by reasonably well by using reference counting as it’s pretty hard (impossible?) to create cycles (two objects with a reference to each other). There are tradeoffs though. Performance tends to be much better with a tracing, generational garbage collector. However, such a collector is more unpredictable (pauses can show up whenever) and you need to allocate more memory than your program actually needs to be effective. A tracing gc also has lower overhead (1-4 bits per object, depending on method used) compared with reference counting (usually a short int in addition to a pointer, to count references) meaning your objects are more likely to fit in cpu cache (performance).

2 Likes

If you restrict yourself to single thread, refcounting is relatively cheap…

Wrt the lifetime being tied to the stack, this is true until closures come into the picture, as they can capture (and thus point to) some arbitrary amount of stuff (think something that closes over a function). That said, most of the values can be kept on the stack and, cycles being impossible, freed quite fast

In order to avoid mallocs and frees in C similar objects (structs) can be stored in lists instead of freeing them so they can be reused (instead of a new alloc).

But the most important question is if all such optimizations would have a significant impact on the performance of Elm programs. And since Elm is compiled to JS the solution needs to be based on JS.

Anyhow, I generally prefer reference counting since it’s totally predictable, works even if objects are referenced remotely and a GC has to do more than incrementing and decrementing an integer.

I agree to @robin.heggelund : since there should be no cyclic object references in Elm no GC should be needed.

A generational GC usually does way less work than a reference counting gc.

A generational GC can use a bump allocator (as the heap is pre-allocated) which is pretty much the fastest allocation method there is.

A generational GC doesn’t have to care about de-alloction. Garbage is simply overwritten. An example of this is linked lists. If a linked list becomes garbage, a reference counter will have to run through the entire thing, de-allocating both the cons cells as well as the data it is pointed to. In a generational gc, a linked list becoming garbage is a no op.

Old objects are scanned much less frequently in a generational gc, whereas they’re treated the same by a reference counter.

A reference counter also doesn’t de-fragment memory, while a generational gc does.

The problem with generational gc is unpredictable pauses. But performance is way better, especially in a pure language like Elm where new objects/structs are created and discarded constantly.

Another issue is complexity of implementation, but that’s fortunetly something the users of Elm don’t have to care about :slight_smile:

2 Likes

As long as Elm only transpiles to JavaScript, there is little point to considering non-GC’ed memory management as Elm gets it from the JavaScript implementation(s).

However, it seems Evan is considering reference counting as an alternative, which is why version 0.19 forbids cyclic data except at the global level (where memory can only be released at program end anyway); this idea is likely so Elm can support compilation to WebAssembly which (currently) doesn’t have built-in Memory Management (MM).

Reference counted MM would be quite easy for Elm as all heap based data structures are built from fairly small units, including arrays which are made from a tree of associative elements with a group size of 32; this means that allocations could be from pools of fixed size groups and could be very efficient. Thus, it is likely that the performance could be as at least as fast as the JavaScript GC’s. It would also have the advantage that multi-threading could be directly supported when the WebAssembly spec is expanded to support that “any day now”.

The main disadvantage of reference counted MM is that cyclic data structures can’t be disposed, but Evan has already removed that problem.

There is absolutely no possibility that Evan would allow the complexities of the “kind of” affine type system using ownership lifetimes as per Rust; besides, there is no need as part of its benefit in controlling mutability isn’t required for Elm which doesn’t support mutability. Then, as mentioned by others, there are all the cases for functional programming such as our use of closures and data structures such as lists whose elements may have multiple owners that don’t work for the Rust lifetime model and reference counting (RC) must be used anyway.

The beauty of it is that there should be few if any changes required for user code.

Thus, there are few disadvantages of Elm using RC MM for compilation and considerable advantages in combination with output to WebAssembly, including higher performance and growing away from the idiosyncrasies of JavaScript.

5 Likes

There are several disadvantages with RC:

  1. Increase in asset size (you need to add increase_reference and decrease_reference calls everywhere references switches hands, unless you know it can be elided, which is more difficult in a pure functional language like Elm).
  2. Throughput (performance) is decreased (all those increase/decrease reference calls do add up), but you’re less likely to have sudden pauses/stutters.
  3. Supporting threads is likely to require atomic operations to increase/decrease reference count, which is slower than normal. Depending on how Elm would support multiple threads, this might be possible to reduce the impact of.

In addition, engineers working on the V8 engine recently said at Google IO 2019 that Wasm isn’t faster than JS, but it should be easier to stay on the “happy path” with regards to performance when using Wasm. I’ve already identified several things which would allow the Elm compiler to produce JS that runs 100-150% faster than today by changing how the JS is emitted. This has required some research. Staying at peak performance within Wasm would be more intuitive.

The benefits of Wasm is a potentially reduction to asset size (depends on how much we would have to implement at our end, if we need to implement our own GC/RC scheme, those gains disappear fast) and escaping wierd JS semantics (like all numbers being doubles).

There’s no guarantee that Wasm brings better asset size or better performance.

RC vs tracing GC is a trade-off. Which is better depends on the domain, but there’s a reason most (all?) functional languages rely on a tracing gc over reference counting, and it’s not about cycles (after all, python uses reference counting with cycle-detection, it’s a solvable problem).

4 Likes

Thanks for the “Elm group” current ideas on GC versus RC and WebAssembly. I’ve seen some gains using WebAssembly over JavaScript and see some potential further gains as browser WebAssembly compilers improve; interestingly, Google Chrome is the slowest at it, Firefox close but faster, and the soon-to-be-defunct Microsoft Edge version the fastest for what I was doing. I also see that there could be gains in the use of JavaScript and that reduction of load size through using WebAssembly only makes sense if something other than GC is used (or the WebAssembly spec finds a way to tap into the JavaScript engine’s GC). RC may be of some benefit rather than implementing GC in WebAssembly as it is only a hundred lines of code or so, a little more with Allocator.

Yes, the overhead of reference counting is a concern, especially if it needs to be atomic, but I’ve been working with the “Nim group” in considering the concept of ownership as a means of eliminating reference counting in many cases - it can’t always be used as for nodes of lists or complex linked structures (perhaps covered in your “functional use” consideration) which typically need multi owners, but works quite well the majority of the rest of the time.

In my experiments, the execution overheads of reference counting seemed to be about the same, sometimes more sometimes less, than the overheads of GC and since Elm has already rejected cyclic data, those problems with RC don’t exist.

As you say, it’s possible to do cycle detection in RC as done in Python, but isn’t it expensive for, say, a long cyclic list; this wouldn’t be a major concern for Python which isn’t very “functional” not very performant anyway.

I’m not totally “pro RC” but it is relatively simple to implement, provides determinant destruction, and although most/all functional languages currently use GC, Elm is a good candidate to try something else because it’s data structures are all based on something very simple. I am “pro WebAssembly” for your given reasons provided either it’s spec grows to include some support for memory management through built-in GC/RC or we can find an acceptable way to implement RC in WebAssembly; if the latter can be accomplished it would give Elm the jump on other Web transpiling languages.

3 Likes

I just wanted to make sure that people didn’t think porting to wasm would be all upsides, there are uncertainties and it is a ton of work.

Same with RC, i’m not against RC but I’ve also read a lot about it to know that the sheer number of allocations we do is a bad case for it, unless you can do a bunch of clever optimizations. Meaning it might not be so simple after all.

I don’t think Python’s cycle detection is that expensive. If I remember correctly it does something along the lines of auto-release pools in obj-c, but checks for mutual references at the same time. In both cases, you loose the predictability that many like about rc though.

1 Like

Great, Robin, we are on track.

Evan seemed to be all pro RC a while ago which was one of his main reasons for eliminating cyclic data except at the global level, which I surmised was in the interests of being able to use RC with WebAssembly.

Personally, it doesn’t matter what Python does as I find I don’t use it anymore, with other languages just as easy to use and faster. If that method of cycle detection makes destruction indeterminant, I wouldn’t want it. Besides, we don’t need it if we don’t allow cyclic data as currently.

I can see that getting Elm to output WebAssembly would be a lot of work, but in my humble opinion likely worth it if only to get away from the idiosyncrasies of JavaScript; as well, WebAssembly engine’s will get better including supporting native 64-bit.

Until Elm supports WebAssembly, I do my investigations using Nim, which can compile to WebAssembly through Emscripten and in which one can easily implement RC through its built-in destructor/copy/move semantics. So far, performance doesn’t look too bad as compared to Nim’s current default GC, and there are optimizations one can try as in a concept of ownership allowing reference counts to be elided and tuning one can do simply using memory pools with the Allocator.

I suppose some of these might be applicable to Elm in its use of RC with WebAssembly; I don’t think we’ll be writing our own GC in WebAssembly anytime soon.

EDIT ADD:. Forgot to add that Elm JavaScript output would be quite a bit faster if it didn’t insist on ES3 as all mainstream browsers have moved well beyond ES5 to ECMASCRIPT2015 and beyond; using this and asm.js output can make about a 100% difference in speed. I know this because Nim has a “native” JavaScript “1.5” backends which is about this ratio slower than going though the Emscripten backend set to output ECMASCRIPT/asm.js.

EDIT ADD 2: The following code is a fairly good purely functional way of calculating the primes by creating a stream of them using the Sieve of Eratosthenes, and is much better than the way that is in the current “math” library which just tries to do in Elm what an imperative language would do. It is a little slower than using a Priority Queue or a functional Hash Table as in Dict, but is excellent for testing the speed of many small allocations. It is derived from the Richard Bird sieve but has “infinite tree folding” added for efficiency, and is quite short. To sieve to a million as here it needs about 1.62 million allocations/deallocations (one for each “cull” and double that because each cull needs a closure environment allocation/deallocation). The code is as follows:

type CIS a
    = CIS a (() -> CIS a) -- an "infinite" Co-Inductive Stream...


countCISWhile : (a -> Bool) -> CIS a -> Int
countCISWhile pred cis =
    let
        go (CIS v cf) cnt =
            if pred v then
                go (cf ()) (cnt + 1)

            else
                cnt
    in
    go cis 0


primes : () -> CIS Int
primes () =
    let
        merge ((CIS ahd acf) as a) ((CIS bhd bcf) as b) =
            if ahd < bhd then
                CIS ahd <| \_ -> merge (acf ()) b

            else if bhd < ahd then
                CIS bhd <| \_ -> merge a (bcf ())

            else
                CIS ahd <| \_ -> merge (acf ()) (bcf ())

        pmlts p =
            let
                inc =
                    p + p

                nxt c =
                    CIS c <| \_ -> nxt (c + inc)
            in
            nxt (p * p)

        allmlts (CIS p ptlf) =
            CIS (pmlts p) <| \_ -> allmlts (ptlf ())

        pairs (CIS xs csstlf) =
            let
                (CIS ys rsttlf) =
                    csstlf ()
            in
            CIS (merge xs ys) <| \_ -> pairs (rsttlf ())

        cmpsts (CIS (CIS c ctlf) csstlf) =
            CIS c <| \_ -> merge (ctlf ()) <| (cmpsts <| pairs <| csstlf ())

        minus n ((CIS c ctlf) as cs) =
            if n < c then
                CIS n <| \_ -> minus (n + 2) cs

            else
                minus (n + 2) (ctlf ())

        oddprms () =
            CIS 3 <| \_ -> minus 5 <| cmpsts <| allmlts <| oddprms ()
    in
    CIS 2 <| \_ -> oddprms ()


calculation : () -> Int
calculation () =
    countCISWhile ((>=) limit) <| primes ()

The above uses about 8.75 million allocations/deallocations and thus as little as about 300 CPU clock cycles per allocation, which isn’t bad but isn’t that great either (typical for environments which aren’t optimized for the requirements of functional programming with many small allocations). It is about 25% faster using asm.js and potentially many times faster using WebAssembly with a simple custom allocator using memory pools, even with RC. An implementation you can play with is at this Ellie link.

EDIT ADD 3: Corrected the above as I forgot the the allocations are many more than a linear relationship since the infinite tree structure has a O(log(n)) execution due to the nesting of the tree structure containing elements for each of the base primes up to the “limit”; thus 8.75 billion allocations including the closure environments and about 300 clock cycles per allocation.

4 Likes

Having spent a long time on GC in various languages, here is a quick collection of notes on the points presented here. This discussion has already well covered why some form of non-stack-based allocation is needed, so I’m going to stick to notes on how that memory gets managed:

  • Apple’s Automatic Reference Counting is reference-counting with the count operations automatically inserted and compiler intelligence about deferring decrement obligations in ways that allow them to be coalesced. For example, if one returns a reference-counted value from within another object, it should have its count incremented (in case the parent goes away or mutates its reference) and that incurs an obligation to perform a countervailing decrement at some point.

  • Auto-release pools were a mechanism for handling those decrement obligations at the point where they were incurred. They traded some promptness in space recovery for ease of coding.

  • This points to the general spectrum that applies with respect to memory usage and storage management. Precise memory management would use no more storage than was actually reachable at any moment but that requires either static analysis of ownership or aggressive book keeping. If we get more flexible about memory consumption we can reduce the need for this book keeping at the expense of using more memory. Auto-release pools (and ARC) are a step along this path. GC tends to be a further step along this path.

  • Pauses are a fact of life in many systems. In a reference counted system, freeing a large data structure can introduce a pause while the structure is traced and all of the pieces are returned to the allocator. Incremental GC systems can often largely avoid pauses by interleaving tracing or storage reclamation with other work. There is often a brief atomic phase to scan structures like the stack where we don’t want to incur the book keeping costs of incremental collection, but this doesn’t have the impact of stopping the world. And then there are systems that stop the world to do a collection because it’s just easier and you avoid having to adjudicate between the collector and the mutator. You probably want to stay away from such systems if possible unless pauses are acceptable.

  • Both reference counting and mark-sweep collectors have negative impacts on memory caches and paging because they can reference memory that wouldn’t otherwise be referenced to access reference counts or mark bits.

All that said, automatic storage management results in cleaner languages and efforts to be more precise probably often in practice result in excess book keeping or copying thereby failing to achieve the precision they seek. One should just remember in any language that storage allocation (and reclamation) can have a big performance impact and in time critical code can be worth trying to avoid.

Mark

4 Likes