I thought a lot about this kind of optimization and if/how it could be implemented in current Elm.
tl;dr: You only need to completely rewrite the compiler!
First, we need to formalize the not-quiteness of this kind of tail-recursion. We know it should work for
Cons, but the List type is implemented as a library, not compiler magic!
Here is a basic template of what a not-quite-tail-recursive function might look like in Elm:
notQuite x =
f (notQuite (loopBody x))
f (...) here is some function application that traditionally would prevent tail recursion. As I tried to explain in my example in that other thread, the important fact here is to be able to prove that this function application does not dereference the argument. If it doesn’t, it would’nt be able to tell if we just called it with a pointer to some uninitialized memory, allowing us to initialize that memory later, for example in a tail-recursive call!
Since Elm is such a simple language, here are the rules:
- Primitive types (
String as well as “newtype” wrappers around those) are always dereferenced, since they are not references in the first place.
if ... then ... else ... expression dereferences a value if any of the
... deferences the value.
case expression always dereferences the value being examined, but follows the same rules as
if for all of its branches
- Calling constructors, including the record literal syntax, never dereferences the fields (except when they are primitives). This is the magic sauce that makes tail recursion after cons work!
- Record extension dereferences the extended record, but not the fields.
- Function application dereferences the function, but only dereferences the applied value if the function itself dereferences it.
- Accessing a field of a record dereferences that record.
Note that this is not full lazyness - an expression either fully dereferences a value (think
deepSeq), or leaves it fully lazy. It also does not evaluate a thunk - the value has to already be computed prior to executing that expression! There are also no partially computed values, since we try to track everything at compile time. This is especially important in function application -
f x may not read
x at all, even though
f would, because it could also construct a closure intead! For the sake of sanity I strongly advice ignoring this.
Everything else seems pretty manageable, we just track a
deref annotation for every expression, and then check if that holds for the
f in our example - If it only accesses
notQuite (loopBody x) by reference, we can reverse the execution order, as long as we can provide the memory address (the reference) of the result before computing it!
I minor detail I hinted at earlier is that we now have 2 different kinds of functions:
&a -> b -- only access reference to a
!a -> b -- access value to a
So now we need to additionally implement monomorphisation and inlining (or alternatively make everything run slow because you box every integer), because you may need to generate multiple versions for
foldr, depending on the kind of function
f is. Also, type information in the Elm compiler does not live long enough to do any of this…
I gave up.