Performance of tail recursive function

Hi. This question may be more of a JavaScript question, but I think the answer could help improve the Elm compiler.

I want to know why fibTopLevel performs better than fibLetIn on Google Chrome and Firefox (probably on other browsers too).

fibLetIn n =
    let
        helper m acc acc2 =
            case m of
                0 ->
                    acc

                1 ->
                    acc2

                _ ->
                    helper (m - 1) acc2 (acc + acc2)
    in
    helper n 0 1


fibHelper n acc acc2 =
    case n of
        0 ->
            acc

        1 ->
            acc2

        _ ->
            fibHelper (n - 1) acc2 (acc + acc2)


fibTopLevel n =
    fibHelper n 0 1

The only difference between them is whether the helper function is bound at the top level or in a let … in expression.

The generated JavaScript code is also almost the same.

var $author$project$Main$fibLetIn = function (n) {
	var helper = F3(
		function (m, acc, acc2) {
			helper:
			while (true) {
				switch (m) {
					case 0:
						return acc;
					case 1:
						return acc2;
					default:
						var $temp$m = m - 1,
							$temp$acc = acc2,
							$temp$acc2 = acc + acc2;
						m = $temp$m;
						acc = $temp$acc;
						acc2 = $temp$acc2;
						continue helper;
				}
			}
		});
	return A3(helper, n, 0, 1);
};
var $author$project$Main$fibHelper = F3(
	function (n, acc, acc2) {
		fibHelper:
		while (true) {
			switch (n) {
				case 0:
					return acc;
				case 1:
					return acc2;
				default:
					var $temp$n = n - 1,
						$temp$acc = acc2,
						$temp$acc2 = acc + acc2;
					n = $temp$n;
					acc = $temp$acc;
					acc2 = $temp$acc2;
					continue fibHelper;
			}
		}
	});
var $author$project$Main$fibTopLevel = function (n) {
	return A3($author$project$Main$fibHelper, n, 0, 1);
};

Here is a benchmark.
https://ellie-app.com/8dWrK9R93WLa1

[Chrome benchmark results screenshot: fib-chrome]

Is this with the elm --optimize flag?

The --optimize flag makes no difference…

The generated code is extremely similar. Maybe the function-within-a-function definition or the size of the outer function is tripping up the JITs. Very hard to debug!

Interesting! Performance can be super tricky.

If you’re interested in checking it out further, you could see what code V8 spits out. Robin does that in this article: https://dev.to/skinney/improving-elm-s-compiler-output-5e1h
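If you want a quick way to poke at it yourself (assuming you have Node.js around), you can run a small harness under V8's optimization tracing flags. This is only a sketch - the file name and the hot function are made up, and the exact flag names and output format vary between V8 versions:

// trace.js - run with:  node --trace-opt --trace-deopt trace.js
// --trace-opt / --trace-deopt are V8 flags exposed through Node; they log
// when a function is optimized or deoptimized. Replace the body of `hot`
// with the generated fibLetIn / fibTopLevel code to compare the two shapes.
function hot(n) {
    var acc = 0, acc2 = 1;
    while (n > 1) {
        var next = acc + acc2;
        acc = acc2;
        acc2 = next;
        n = n - 1;
    }
    return n === 0 ? acc : acc2;
}

for (var i = 0; i < 1000000; i++) {
    hot(100);
}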


You may be measuring the performance of the garbage collector here - the amount of work the fib function does is incredibly tiny: it manages to calculate the 100th Fibonacci number nearly 6 million times per second! On a 3 GHz machine, that would mean around 500 CPU cycles per call, which works out to roughly 5 cycles per loop iteration. Considering that most instructions actually take more than one cycle to complete, this should already be about as fast as you can get in a “naive” way (Elm is not exactly known for being close to the metal; there is not much we can do).
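Spelling out that arithmetic (the 3 GHz clock and the ~6 million calls/second are the only inputs, both taken from the benchmark above; the variable names are just for illustration):

// Back-of-envelope arithmetic for the numbers above.
var cyclesPerSecond = 3e9;                                 // assumed 3 GHz machine
var callsPerSecond = 6e6;                                  // ~6 million fib(100) calls/s measured
var cyclesPerCall = cyclesPerSecond / callsPerSecond;      // = 500
var loopIterations = 100;                                  // fib of 100 runs the loop ~100 times
var cyclesPerIteration = cyclesPerCall / loopIterations;   // = 5
console.log(cyclesPerCall, cyclesPerIteration);            // 500 5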

I made a similar comparison using Benchmark.scale here, but for the sake of not running into Infinity, I chose the harmonic series instead of the Fibonacci sequence (it works with your original functions as well, I just wanted to avoid floating-point weirdness):

https://ellie-app.com/8fH3hwBNfbfa1

Two observations:

  1. It takes a while until the runs/second behave as you would expect - runtime does not seem to increase 10x if there are fewer than 1000 iterations happening.
  2. Starting at around the same point, the runs/second for both versions are roughly the same.

So when doing fewer than 1000 runs, we do not seem to benchmark our function much at all - it's pretty much just the overhead of calling things in JavaScript. The one difference is that the fibLetIn version also allocates a new function object (a "closure") on the heap on every call, which seems to take the majority of the time if n is small.
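To make the closure point concrete, here is a stripped-down sketch of the structural difference (the names are hypothetical, and engines may well optimize the allocation away - that is exactly the kind of thing worth measuring):

// Sketch of the allocation difference described above (hypothetical names).
function perCall(n) {
    // A fresh closure object is (conceptually) allocated on every call.
    var helper = function (m) { return m + 1; };
    return helper(n);
}

// Allocated once, when the module is evaluated.
var hoistedHelper = function (m) { return m + 1; };

function reused(n) {
    return hoistedHelper(n);
}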

Internally, elm-explorations/benchmark basically does the same thing @jreusch did, but it’s hard to “tune” that scaling factor at the library level because you have three basic kinds of benchmarks:

  • math stuff (fib, harmonic)
  • data structures (think tuning Array.get or Dict.get for performance.)
  • people’s app functions (update or similar)

elm-explorations/benchmark is currently tuned to give good results for data structures since that’s the primary optimization opportunity core team members see in Elm at the moment (cf. Robin’s work to optimize Array for 0.19.0.)

That means when you say…

So when doing fewer than 1000 runs, we do not seem to benchmark our function much at all - it's pretty much just the overhead of calling things in JavaScript.

… the benchmark harness should be taking care of that for you. In this case, it looks like we start to get on a better track for a linear fit around 128 iterations of harmonic—that is, doubling the work done doubles the runtime.

I think there may be an opportunity to find a better algorithm for sample size. If you want to explore this yourself, Benchmark.LowLevel exposes the functions you’d need to create a new runner.

I haven’t done this myself, though, since it’s a lot of work to make sure the math checks out. A bad benchmark can be really misleading. I have also put my work on elm-explorations/benchmark on hold because the time spent in DOM stuff far outweighs algorithmic complexity stuff for most apps.


Oh, also: when you see less than 99% goodness of fit and the benchmark completes very quickly, it's a good indication that tuning the benchmark may give more accurate results.

Thank you for the explanation!

I found out that F3 and A3, the currying helper functions generated by the Elm compiler, are the bottleneck.
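For anyone following along, the wrappers in question look roughly like this in the compiler output (reconstructed from memory rather than copied from a build, so check your own generated file for the authoritative version):

// Rough reconstruction of Elm 0.19's currying helpers.
// F3 tags a 3-argument function with its arity and wraps it in a curried
// fallback; A3 calls it, taking the fast path when the arity matches.
function F(arity, fun, wrapper) {
    wrapper.a = arity;
    wrapper.f = fun;
    return wrapper;
}

function F3(fun) {
    return F(3, fun, function (a) {
        return function (b) {
            return function (c) { return fun(a, b, c); };
        };
    });
}

function A3(fun, a, b, c) {
    return fun.a === 3 ? fun.f(a, b, c) : fun(a)(b)(c);
}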

I compared fibLet(1) and fib(1) on Chrome, Firefox and Opera.

var fibLet = function (n) {
    var helper_ = function (m, acc, acc2) {
        helper_:
        while (true) {
            switch (m) {
                case 0:
                    return acc;
                case 1:
                    return acc2;
                default:
                    var $temp$m = m - 1,
                        $temp$acc = acc2,
                        $temp$acc2 = acc + acc2;
                    m = $temp$m;
                    acc = $temp$acc;
                    acc2 = $temp$acc2;
                    continue helper_;
            }
        }
    };
    return helper_(n, 0, 1)
}

var helper = function (m, acc, acc2) {
    helper:
    while (true) {
        switch (m) {
            case 0:
                return acc;
            case 1:
                return acc2;
            default:
                var $temp$m = m - 1,
                    $temp$acc = acc2,
                    $temp$acc2 = acc + acc2;
                m = $temp$m;
                acc = $temp$acc;
                acc2 = $temp$acc2;
                continue helper;
        }
    }
}

var fib = function (n) {
    return helper(n, 0, 1)
}

fibLet performs almost as well as fib. The allocation does no harm, or perhaps the browser performs some optimization.

But fibLetWrapped is extremely slow. (It differs from fibLet only in being wrapped with F3 and A3.)

var fibLetWrapped = function (n) {
    var helper_ = F3(
        function (m, acc, acc2) {
            helper_:
            while (true) {
                switch (m) {
                    case 0:
                        return acc;
                    case 1:
                        return acc2;
                    default:
                        var $temp$m = m - 1,
                            $temp$acc = acc2,
                            $temp$acc2 = acc + acc2;
                        m = $temp$m;
                        acc = $temp$acc;
                        acc2 = $temp$acc2;
                        continue helper_;
                }
            }
        });
    return A3(helper_, n, 0, 1)
}

Here is the result (on Chrome).

You can try it here

Source code


This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.