Tail-call optimization

Hi all :wave:

I wrote about how tail-call optimization works in Elm, why it’s tricky, and why you won’t need to worry about it anymore.

I hope you enjoy the article and my new package!

18 Likes

I noticed an error in one of my explanations, because I had a wrong understanding of it.

I was looking at elm-community/list-extra's isInfixOfHelp function (it’s the recursive helper of isInfixOf).

isInfixOfHelp : a -> List a -> List a -> Bool
isInfixOfHelp infixHead infixTail list =
    case list of
        [] ->
            False

        x :: xs ->
            if x == infixHead then
                isPrefixOf infixTail xs && isInfixOfHelp infixHead infixTail xs

            else
                isInfixOfHelp infixHead infixTail xs

I was thinking that the last branch checks the rules I wrote down in the article, but the first call happens inside isPrefixOf infixTail xs && isInfixOfHelp infixHead infixTail xs, which does not check the rules. Therefore, this function should not be tail-call optimized.

So I’m looking at the compiled output and I see an unexpected while loop.

var $author$project$Main$isInfixOfHelp = F3(
	function (infixHead, infixTail, list) {
		isInfixOfHelp:
		while (true) {
			if (!list.b) {
				return false;
			} else {
				var x = list.a;
				var xs = list.b;
				if (_Utils_eq(x, infixHead)) {
					return A2($author$project$Main$isPrefixOf, infixTail, xs) && A3($author$project$Main$isInfixOfHelp, infixHead, infixTail, xs);
				} else {
					var $temp$infixHead = infixHead,
						$temp$infixTail = infixTail,
						$temp$list = xs;
					infixHead = $temp$infixHead;
					infixTail = $temp$infixTail;
					list = $temp$list;
					continue isInfixOfHelp;
				}
			}
		}
	});

As it turns out (and my apologies to Evan who correctly explained it in his article), any recursive call that can be tail-call optimized will be tail-call optimized, and the others won’t. Having a single function call that doesn’t check the rules doesn’t de-optimize the whole function. Therefore, it is possible to have partially tail-call optimized functions. Which is great because it means it’s not a “on or off” feature, and you can sprinkle small optimizations here and there.

I have added an errata to the article and fixed the previous erroneous explanations.

For the ones who have tried the elm-review rule already, no worries. The rule by chance already reported the right thing, so it’s not like you changed anything that did not need changing. I will probably change the wording and the documentation when I have a little bit more time.

5 Likes

Great work, thank you so much!

1 Like

Does this mean the opt out mechanism should be moved from function “top level” somewhere closer to the recursive call itself?

Feedback unrelated to your amazing work:

Since TCO is not always applicable (I’m pretty sure there are theorems saying that not every function can get this optimization)

Actually, this is the contrary. You can always turn a recursive function into a “loop-based” function, at the cost of simulating the call stack. At the end, the only thing the processor will do will be conditionally jump into function pointer start until the the “base case” is reached (which really is a loop!).

“Simulating the call stack”, is generally made with the accumulator argument. In the case of the factorial function, this call stack can be reduced to a simple integer (which may quickly overflow however!). When working with lists, most of the time we want to have TCO, the accumulator is a simple list.

Well, every time you want to “force TCO”, you move some memory management from “function call mechanism” to some “manual memory management”. But I totally agree with the end of your sentence, this is not easy:

Since TCO is not always applicable […] or easy

Edit: a more “theoritical” approach would be to point out that a language with a while loop is “Turing complete” so can express any program written with recursive functions.

1 Like

I would like to mention that this is often worth doing. The call stack is limited in size, and moving the accumulator onto the heap will allow much deeper recursion.

It can also offer flexibility if you substitute different stack implementations, FIFO, LIFO, orderered etc. Only Last-In-First-Out is a stack and simple to implement using a List, but the others are handy to know about for more exotic algorithms.

Also occurs to me that your elm-review code identifies when Elm is able or not able to do a TCO, and some of the not-able cases could actually be transformed into ones that do TCO. If that code existed within the compiler instead, it could probably identify more cases that could be TCO automatically.

An easy one to do would be the |> pipe operator, which catches me out on TCO quite often.

1 Like

It does mean that that would be a potential alternative way to do it yes, though it would be a lot more annoying and intrusive to add disable comments everywhere. Also, I’m worried that comments could be a bit hard to position, as I described in this section of the elm-review docs

I think I’ll keep it the way it is today and see how people react to this rule :thinking:

This is good to know! @rupert Do you mean it’s worth it to prevent stack overflows, or because it also improves performance in most cases?

I think that it would be awesome to have an article explaining how to turn a non-TCO function into a TCO function using these different stack approaches, how to benchmark the performance, test that there is no stack overflow, etc.

Also, learning how to do recursion for tree-like shapes would be awesome, for algorithms like the following:

treeVisit tree =
  case tree of
    Leaf value -> [ value ]
    Node value left right -> value :: treeVisit left ++ treeVisit right

I’d love to link that from the article and/or review documentation, and to personally learn from it :slight_smile:
If someone would like to work on this, that would be a very valuable resource!

Good point! I think this would be a worthwhile area of exploration. The pipeline operator one would be valuable at the very least IMHO.

@jfmengels I was a bit bored, so I wrote this up:

module Tree exposing (..)


type Tree
    = Leaf Int
    | Node Int Tree Tree


visit : Tree -> List Int
visit tree =
    case tree of
        Leaf int ->
            [ int ]

        Node int tree1 tree2 ->
            int :: visit tree1 ++ visit tree2


visitStackSafe : Tree -> List Int
visitStackSafe tree =
    visitStackSafeHelp [ tree ] []


visitStackSafeHelp : List Tree -> List Int -> List Int
visitStackSafeHelp backlog acc =
    case backlog of
        [] ->
            acc

        t :: ts ->
            case t of
                Leaf int ->
                    visitStackSafeHelp ts (int :: acc)

                Node int tree1 tree2 ->
                    visitStackSafeHelp (tree1 :: tree2 :: ts) (int :: acc)


exampleTree : Tree
exampleTree =
    Node 5 (Node 6 (Leaf 3) (Leaf 2)) (Leaf 42)


ex1 =
    -- [5,6,3,2,42]
    visit exampleTree


ex2 =
    -- [ 42, 2, 3, 6, 5 ]
    visitStackSafe exampleTree

I wasn’t bored enough to think how to exactly have the same result regarding ordering. “This is left as an exercise to the reader” as they say in the books!

1 Like

Just to prevent stack overflows - I don’t know if it will improve performance or make it worse. I would guess that it would tend to degrade performance a little, since allocation on the heap will tend lead to garbage collection. Depends on how the GC is implemented though, as a stack-like allocation may be well optimized in some collector implementations.

Mutually recursive functions can be implemented in Elm using a trampoline and continuations. In Haskell, the compiler is smart enough to inline one function into the other and then maybe do TCO. Would love Elm to have this optimization too… :grinning:

For anyone who’s interested, Chapter 5 and 6 of the Essentials of Programming Languages covers this topic very well.

As a practical example, here’s how you can rewrite the sum function from Evan’s Folding Over Trees example in Tail-Call Elimination · Functional Programming in Elm to be tail-recursive and hence turn it into a candidate for tail-call optimization.

sum : Tree Int -> Int
sum tree =
  case tree of
    Empty ->
      0

    Node n left right ->
      n + sum left + sum right

The obvious implementation of sum is not tail-recursive.

n + sum left + sum right

is equivalent to

let l = sum left
in let r = sum right
in n + l + r

i.e. First we calculate sum left and remember n and right since we need them for later calculations. Then, given l, we calculate sum right and remember n and l since we need them for later calculations. Finally, given r, we can calculate n + l + r. Translating that description into code gives us the following function-based implementation.

sum : Tree Int -> Int
sum tree =
  sumK tree identity


sumK : Tree Int -> (Int -> Int) -> Int
sumK tree cont =
  case tree of
    Empty ->
      cont 0

    Node n left right ->
      -- 1. See how we remember n and right.
      sumK left (sumKLeft n right cont)


sumKLeft : Int -> Tree Int -> (Int -> Int) -> Int -> Int
sumKLeft n right cont l =
  -- 2. See how we remember n and l.
  sumK right (sumKRight n l cont)


sumKRight : Int -> Int -> (Int -> Int) -> Int -> Int
sumKRight n l cont r =
  cont (n + l + r)

The extra parameter to sumK is called a continuation and it represents the rest of the computation to be completed.

Note 1

I was surprised to see that the following straightforward implementation didn’t work.

sum : Tree Int -> Int
sum tree =
  sumK tree identity


sumK : Tree Int -> (Int -> Int) -> Int
sumK tree cont =
  case tree of
    Empty ->
      cont 0

    Node n left right ->
      sumK left (\l -> sumK right (\r -> cont (n + l + r)))

Can anyone tell me why?

Note 2

You can play with an example here, https://ellie-app.com/dphT9MxKtQBa1. It includes the data structure based implementation as well.

visit : Tree -> List Int
visit tree =
  visitK tree identity


visitK : Tree -> (List Int -> List Int) -> List Int
visitK tree cont =
  case tree of
    Leaf n ->
      cont [n]

    Node n left right ->
      visitK left (visitKLeft n right cont)


visitKLeft : Int -> Tree -> (List Int -> List Int) -> List Int -> List Int
visitKLeft n right cont l =
  visitK right (visitKRight n l cont)


visitKRight : Int -> List Int -> (List Int -> List Int) -> List Int -> List Int
visitKRight n l cont r =
  cont (n :: l ++ r)

Play with the full solution here, https://ellie-app.com/dpjhNbmRMqLa1.

“Continuation passing style” is an “automatic” and really elegant solution, thank you for sharing it!

That said, I’m not sure it’s really performant in elm: it creates a bunch of partially applied functions objects and then call them which has a non negligible cost compared to simple while loop, (::) an element and pattern match against the head of a list. I’d be really interested in some benchmarks!

Plus, I suspect the solutions using l ++ r to have some quadratic complexity which makes them unusable on big trees (but I didn’t make the exact reasoning about this, maybe it’s wrong).

@dwayne From my understanding, the implementations you’ve shown are not entirely tail-call optimized, and not stack-safe.

For the sum example, everytime you reach a Node, you call sumKLeft and which calls sumK again, adding 2 calls to the call stack. Meaning if you have a sufficiently deep tree, you will create a stack overflow.

So while the solution is more tail-call optimized and more stack-safe than before, I don’t feel like it is fair to really call it tail-call optimized. I don’t know whether the target language of “Essentials of Programming Languages” would optimize this, but Elm doesn’t as far as I understand it.

Actually, sumKLeft isn’t immediately called because it’s partially applied.

Consider the following tree:

tree =
  Node 1 (Node 2 Empty Empty) (Node 3 Empty Empty)

Let’s trace through what happens when sum tree is called:

sum tree

-- by definition
= sumK tree identity

-- since tree is not Empty
= sumK (Node 2 Empty Empty) (sumKLeft 1 (Node 3 Empty Empty) identity)
-- notice sumKLeft is partially applied so we don't go off and compute that function

= sumK Empty (sumKLeft 2 Empty (sumKLeft 1 (Node 3 Empty Empty) identity))
= sumKLeft 2 Empty (sumKLeft 1 (Node 3 Empty Empty) identity) 0
-- now we can compute sumKLeft

= sumK Empty (sumKRight 2 0 (sumKLeft 1 (Node 3 Empty Empty) identity))
-- notice sumKRight is partially applied as well

= sumKRight 2 0 (sumKLeft 1 (Node 3 Empty Empty) identity) 0
-- now we can compute sumKRight

= sumKLeft 1 (Node 3 Empty Empty) identity (2 + 0 + 0)
= sumKLeft 1 (Node 3 Empty Empty) identity 2
= sumK (Node 3 Empty Empty) (sumKRight 1 2 identity)
= sumK Empty (sumKLeft 3 Empty (sumKRight 1 2 identity))
= sumKLeft 3 Empty (sumKRight 1 2 identity) 0
= sumK Empty (sumKRight 3 0 (sumKRight 1 2 identity))
= sumKRight 3 0 (sumKRight 1 2 identity) 0
= sumKRight 1 2 identity (3 + 0 + 0)
= sumKRight 1 2 identity 3
= identity (1 + 2 + 3)
= identity 6
= 6

The key thing to take away from this trace is that every call is a tail-call. Hence, the sum function and its helper functions are indeed candidates for tail-call optimization.

However, I checked the code that Elm generates and it doesn’t eliminate all the function calls:

var $elm$core$Basics$identity = function (x) {
	return x;
};
var $author$project$Sum$sumKRight = F4(
	function (n, l, cont, r) {
		return cont((n + l) + r);
	});
var $author$project$Sum$sumK = F2(
	function (tree, cont) {
		sumK:
		while (true) {
			if (!tree.$) {
				return cont(0);
			} else {
				var n = tree.a;
				var left = tree.b;
				var right = tree.c;
				var $temp$tree = left,
					$temp$cont = A3($author$project$Sum$sumKLeft, n, right, cont);
				tree = $temp$tree;
				cont = $temp$cont;
				continue sumK;
			}
		}
	});
var $author$project$Sum$sumKLeft = F4(
	function (n, right, cont, l) {
		return A2(
			$author$project$Sum$sumK,
			right,
			A3($author$project$Sum$sumKRight, n, l, cont));
	});
var $author$project$Sum$sum = function (tree) {
	return A2($author$project$Sum$sumK, tree, $elm$core$Basics$identity);
};

So, in that regard, @jfmengels you are right.

The target language is Scheme and I believe most Scheme compilers will optimize it.

1 Like

I would agree that it is possible for compilers to optimize this, and Elm is not as advanced in this area as it could be.

I wondered if what is missing in Elm is the optimization where mutually recursive functions are simplified by inlining one into the other? I tried applying that by hand to your code but that produced:

visitK : Tree -> (List Int -> List Int) -> List Int
visitK tree cont =
  case tree of
    Leaf n ->
      cont [n]

    Node n left right ->
      visitK left (\l -> visitK right (visitKRight n l cont))

And I could not see how to fix that.

So instead, I went back to this idea and maintaining an explicit ‘call-stack’ of nodes of the tree that are pending visiting:

A pending list of nodes to visit later is maintained, and the functions always try to go deeper on the left first, right branches are just pushed onto the stack. This amounts to a depth first search, which yields the list of values backwards, so there is an extra List.reverse done at the end to fix that:

visit2 : Tree -> List Int
visit2 tree =
  visitPending [tree] []
    |> List.reverse

visitPending : List Tree -> List Int -> List Int
visitPending pending accum =
  case pending of 
    [] -> accum
    t :: ts ->
      let
        (nextPending, nextAccum) = visitLeft t ts accum 
      in
        visitPending nextPending nextAccum


visitLeft : Tree -> List Tree -> List Int -> (List Tree, List Int)
visitLeft tree pending accum =
  case tree of
    Leaf n ->
      (pending, n :: accum)

    Node n left right ->
       visitLeft left (right :: pending) (n :: accum)

Here is the Ellie, appears to be fully TCO’ed by Elm, and does not use list appends either.

https://ellie-app.com/dpYyDPd9VHJa1

1 Like