A faster `List.map` for Elm

I thought a lot about this kind of optimization and whether/how it could be implemented in current Elm.
tl;dr: You only need to completely rewrite the compiler!

First, we need to formalize what makes this kind of function not-quite-tail-recursive. We know it should work for Cons, but the List type is implemented as a library, not as compiler magic!

Here is a basic template of what a not-quite-tail-recursive function might look like in Elm:

notQuite x =
   f (notQuite (loopBody x))

f (...) here is some function application that would traditionally prevent tail recursion. As I tried to explain in my example in that other thread, the key point is that we must be able to prove that this function application does not dereference its argument. If it doesn’t, it wouldn’t be able to tell whether we just called it with a pointer to some uninitialized memory, allowing us to initialize that memory later, for example in a tail-recursive call!
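The "uninitialized memory" idea can be demonstrated in plain JavaScript. This is my own minimal sketch, not anything from the Elm runtime: `pair` stands in for a constructor that stores its argument without ever reading through it, so the argument can be filled in after the fact.

```javascript
// Minimal demonstration that "no dereference" lets us initialize later.
// `pair` only stores its second argument; it never reads through it:
const pair = (a, b) => ({ a: a, b: b });

const hole = {};              // "uninitialized memory": an empty placeholder
const result = pair(1, hole); // safe: pair never looked inside `hole`
hole.value = 42;              // initialize it afterwards

// result.b.value is now 42, even though `hole` was empty when pair ran.
```

Because `pair` never dereferenced `hole`, it cannot observe that the initialization happened after the call, which is exactly the property the analysis below tries to prove.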

Since Elm is such a simple language, here are the rules:

  • Primitive types (Int, Float, String as well as “newtype” wrappers around those) are always dereferenced, since they are not references in the first place.
  • An if ... then ... else ... expression dereferences a value if any of the ... dereferences the value.
  • A case expression always dereferences the value being examined, but follows the same rule as if for all of its branches.
  • Calling constructors, including the record literal syntax, never dereferences the fields (except when they are primitives). This is the magic sauce that makes tail recursion after cons work!
  • Record extension dereferences the extended record, but not the fields.
  • Function application dereferences the function, but only dereferences the applied value if the function itself dereferences it.
  • Accessing a field of a record dereferences that record.

Note that this is not full laziness - an expression either fully dereferences a value (think deepSeq), or leaves it fully lazy. It also never evaluates a thunk - the value has to already be computed before that expression executes! There are also no partially computed values, since we try to track everything at compile time. This is especially important for function application - f x may not read x at all, even though f would, because it could also just construct a closure instead! For the sake of sanity I strongly advise ignoring this.

Everything else seems pretty manageable: we just track a deref annotation for every expression, and then check whether that holds for the f in our example - if it only accesses notQuite (loopBody x) by reference, we can reverse the execution order, as long as we can provide the memory address (the reference) of the result before computing it!
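Reversing the execution order is what the generated code would actually do. Here is a sketch of what a compiler might emit for List.map, in JavaScript, assuming cons cells are plain objects { $, a, b } (the names `cons`, `nil`, and `mapTRMC` are mine; Elm's real runtime representation differs in details). The cell is allocated first with an uninitialized tail, and the "result address" handed to the next iteration is that tail slot:

```javascript
// Hypothetical cons-cell representation.
const nil = { $: "[]" };
const cons = (a, b) => ({ $: "::", a: a, b: b });

// Naive map is NOT tail recursive: cons wraps the recursive call.
//   map(f, xs) = xs === nil ? nil : cons(f(xs.a), map(f, xs.b))
//
// TRMC version: allocate each cell FIRST with an uninitialized tail,
// then fill that tail in on the next iteration. This is only sound
// because `cons` never dereferences its second argument.
function mapTRMC(f, xs) {
  if (xs.$ === "[]") return nil;
  const root = cons(f(xs.a), undefined); // tail not yet computed!
  let tip = root;
  for (xs = xs.b; xs.$ !== "[]"; xs = xs.b) {
    const next = cons(f(xs.a), undefined);
    tip.b = next; // initialize the memory we handed out one step earlier
    tip = next;
  }
  tip.b = nil; // finally initialize the last tail
  return root;
}
```

The recursion has become a constant-stack loop: each iteration computes one head eagerly and leaves one tail slot for the next iteration to fill.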

A minor detail I hinted at earlier is that we now have two different kinds of functions:

&a -> b -- only accesses a by reference
!a -> b -- accesses the value of a

I just made up that reference annotation, and it would not be part of the user-facing language, but it means we would possibly need to generate multiple versions of our not-quite-tail-recursive functions, one for each combination of parameter kinds. This is also related to a limitation in JavaScript: objects are always passed by reference, but primitive values never are.
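The JavaScript limitation is easy to demonstrate; this is my own sketch of the usual boxing workaround (the names `initLater` and `box` are hypothetical, not any runtime API):

```javascript
// Objects give us references for free; primitives do not.
function initLater(ref) { ref.v = 42; } // can only "write through" an object

let n = 0;
// initLater(n) could never update `n`: numbers are copied into the call,
// so there is no reference to initialize later.

const box = { v: n }; // boxing: wrap the primitive to obtain a reference
initLater(box);
// box.v is now 42
```

This is why primitives get their own rule above, and why avoiding boxing forces the compiler to generate a separate `!a`-style version of each function for primitive arguments.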

So now we additionally need to implement monomorphisation and inlining (or alternatively make everything run slowly because you box every integer), because we may need to generate multiple versions of foldr, depending on what kind of function f is. Also, type information in the Elm compiler does not live long enough to do any of this…

I gave up.


Edit: This may be one of the things that is easier in WebAssembly than in JavaScript - I still haven’t found a good JavaScript encoding for references to values (numbers) on the stack, or rather a general encoding for references that isn’t slow. If there were one, the first rule would be lifted, meaning all values would follow the same rules and could be references. That in turn would allow us to generate only one JS function per Elm function again.