Performance of tail recursive function

Hi. This question may be more of a JavaScript question, but I think the answer could help improve the Elm compiler.

I want to know why fibTopLevel performs better than fibLetIn on Google Chrome and Firefox (probably on other browsers too).

fibLetIn n =
    let
        helper m acc acc2 =
            case m of
                0 ->
                    acc

                1 ->
                    acc2

                _ ->
                    helper (m - 1) acc2 (acc + acc2)
    in
    helper n 0 1


fibHelper n acc acc2 =
    case n of
        0 ->
            acc

        1 ->
            acc2

        _ ->
            fibHelper (n - 1) acc2 (acc + acc2)


fibTopLevel n =
    fibHelper n 0 1

The only difference between them is whether the helper function is bound at the top level or in a let … in expression.

The generated JavaScript code is also almost the same.

var $author$project$Main$fibLetIn = function (n) {
	var helper = F3(
		function (m, acc, acc2) {
			helper:
			while (true) {
				switch (m) {
					case 0:
						return acc;
					case 1:
						return acc2;
					default:
						var $temp$m = m - 1,
							$temp$acc = acc2,
							$temp$acc2 = acc + acc2;
						m = $temp$m;
						acc = $temp$acc;
						acc2 = $temp$acc2;
						continue helper;
				}
			}
		});
	return A3(helper, n, 0, 1);
};
var $author$project$Main$fibHelper = F3(
	function (n, acc, acc2) {
		fibHelper:
		while (true) {
			switch (n) {
				case 0:
					return acc;
				case 1:
					return acc2;
				default:
					var $temp$n = n - 1,
						$temp$acc = acc2,
						$temp$acc2 = acc + acc2;
					n = $temp$n;
					acc = $temp$acc;
					acc2 = $temp$acc2;
					continue fibHelper;
			}
		}
	});
var $author$project$Main$fibTopLevel = function (n) {
	return A3($author$project$Main$fibHelper, n, 0, 1);
};

Here is a benchmark.
https://ellie-app.com/8dWrK9R93WLa1

[Chrome benchmark results screenshot: fib-chrome]

Is this with the elm --optimize flag?

The --optimize flag makes no difference…

The generated code is extremely similar. Maybe the function-within-a-function definition or the size of the outer function is tripping up the JITs. Very hard to debug!

Interesting! Performance can be super tricky.

If you’re interested in checking it out further, you could see what code V8 spits out. Robin does that in this article: https://dev.to/skinney/improving-elm-s-compiler-output-5e1h
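If you want a quick way to poke at it yourself (assuming you have Node.js around), you can run a small harness under V8's optimization tracing flags. This is only a sketch - the file name and the hot function are made up, and the exact flag names and output format vary between V8 versions:

// trace.js - run with:  node --trace-opt --trace-deopt trace.js
// --trace-opt / --trace-deopt are V8 flags exposed through Node; they log
// when a function is optimized or deoptimized. Replace the body of `hot`
// with the generated fibLetIn / fibTopLevel code to compare the two shapes.
function hot(n) {
    var acc = 0, acc2 = 1;
    while (n > 1) {
        var next = acc + acc2;
        acc = acc2;
        acc2 = next;
        n = n - 1;
    }
    return n === 0 ? acc : acc2;
}

for (var i = 0; i < 1000000; i++) {
    hot(100);
}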


You may be measuring the performance of the garbage collector here - the amount of work the fib function does is incredibly tiny: it manages to calculate the 100th Fibonacci number nearly 6 million times per second! On a 3 GHz machine, that would mean around 500 CPU cycles per call, which works out to roughly 5 cycles per loop iteration. Considering that most instructions actually take more than one cycle to complete, this should already be about as fast as you can get in a “naive” way (Elm is not exactly known for being close to the metal; there is not much we can do).
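Spelling out that arithmetic (the 3 GHz clock and the ~6 million calls/second are the only inputs, both taken from the benchmark above; the variable names are just for illustration):

// Back-of-envelope arithmetic for the numbers above.
var cyclesPerSecond = 3e9;                                 // assumed 3 GHz machine
var callsPerSecond = 6e6;                                  // ~6 million fib(100) calls/s measured
var cyclesPerCall = cyclesPerSecond / callsPerSecond;      // = 500
var loopIterations = 100;                                  // fib of 100 runs the loop ~100 times
var cyclesPerIteration = cyclesPerCall / loopIterations;   // = 5
console.log(cyclesPerCall, cyclesPerIteration);            // 500 5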

I made a similar comparison using Benchmark.scale here, but for the sake of not running into Infinity, I chose the harmonic series instead of the Fibonacci sequence (it works with your original functions as well, I just wanted to avoid floating-point weirdness):

https://ellie-app.com/8fH3hwBNfbfa1

Two observations:

  1. It takes a while until the runs/second behave as you would expect - runtime does not seem to increase 10x if there are fewer than 1000 iterations happening.
  2. Starting at around the same point, the runs/second for both versions are roughly the same.

So when doing fewer than 1000 runs, we do not seem to benchmark our function much at all - it's pretty much just the overhead of calling things in JavaScript. The one difference is that the fibLetIn version also allocates a new function object (a "closure") on the heap on every call, which seems to take the majority of the time if n is small.
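To make the closure point concrete, here is a stripped-down sketch of the structural difference (the names are hypothetical, and engines may well optimize the allocation away - that is exactly the kind of thing worth measuring):

// Sketch of the allocation difference described above (hypothetical names).
function perCall(n) {
    // A fresh closure object is (conceptually) allocated on every call.
    var helper = function (m) { return m + 1; };
    return helper(n);
}

// Allocated once, when the module is evaluated.
var hoistedHelper = function (m) { return m + 1; };

function reused(n) {
    return hoistedHelper(n);
}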

Internally, elm-explorations/benchmark basically does the same thing @jreusch did, but it’s hard to “tune” that scaling factor at the library level because you have three basic kinds of benchmarks:

  • math stuff (fib, harmonic)
  • data structures (think tuning Array.get or Dict.get for performance.)
  • people’s app functions (update or similar)

elm-explorations/benchmark is currently tuned to give good results for data structures since that’s the primary optimization opportunity core team members see in Elm at the moment (cf. Robin’s work to optimize Array for 0.19.0.)

That means when you say…

So when doing fewer than 1000 runs, we do not seem to benchmark our function much at all - it's pretty much just the overhead of calling things in JavaScript.

… the benchmark harness should be taking care of that for you. In this case, it looks like we start to get on a better track for a linear fit around 128 iterations of harmonic—that is, doubling the work done doubles the runtime.

I think there may be an opportunity to find a better algorithm for sample size. If you want to explore this yourself, Benchmark.LowLevel exposes the functions you’d need to create a new runner.

I haven’t done this myself, though, since it’s a lot of work to make sure the math checks out. A bad benchmark can be really misleading. I have also put my work on elm-explorations/benchmark on hold because the time spent in DOM stuff far outweighs algorithmic complexity stuff for most apps.


Oh, also: when you see less than 99% goodness of fit and the benchmark completes very quickly, it's a good indication that tuning the benchmark may give more accurate results.

Thank you for the explanation!

I found out that F3 and A3, the currying helper functions generated by the Elm compiler, are the bottleneck.
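For anyone following along, the wrappers in question look roughly like this in the compiler output (reconstructed from memory rather than copied from a build, so check your own generated file for the authoritative version):

// Rough reconstruction of Elm 0.19's currying helpers.
// F3 tags a 3-argument function with its arity and wraps it in a curried
// fallback; A3 calls it, taking the fast path when the arity matches.
function F(arity, fun, wrapper) {
    wrapper.a = arity;
    wrapper.f = fun;
    return wrapper;
}

function F3(fun) {
    return F(3, fun, function (a) {
        return function (b) {
            return function (c) { return fun(a, b, c); };
        };
    });
}

function A3(fun, a, b, c) {
    return fun.a === 3 ? fun.f(a, b, c) : fun(a)(b)(c);
}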

I compared fibLet(1) and fib(1) on Chrome, Firefox and Opera.

var fibLet = function (n) {
    var helper_ = function (m, acc, acc2) {
        helper_:
        while (true) {
            switch (m) {
                case 0:
                    return acc;
                case 1:
                    return acc2;
                default:
                    var $temp$m = m - 1,
                        $temp$acc = acc2,
                        $temp$acc2 = acc + acc2;
                    m = $temp$m;
                    acc = $temp$acc;
                    acc2 = $temp$acc2;
                    continue helper_;
            }
        }
    };
    return helper_(n, 0, 1)
}

var helper = function (m, acc, acc2) {
    helper:
    while (true) {
        switch (m) {
            case 0:
                return acc;
            case 1:
                return acc2;
            default:
                var $temp$m = m - 1,
                    $temp$acc = acc2,
                    $temp$acc2 = acc + acc2;
                m = $temp$m;
                acc = $temp$acc;
                acc2 = $temp$acc2;
                continue helper;
        }
    }
}

var fib = function (n) {
    return helper(n, 0, 1)
}

fibLet performs almost as well as fib. The allocation does no harm, or perhaps the browser performs some optimization.

But fibLetWrapped is extremely slow. (It differs from fibLet only in being wrapped with F3 and A3.)

var fibLetWrapped = function (n) {
    var helper_ = F3(
        function (m, acc, acc2) {
            helper_:
            while (true) {
                switch (m) {
                    case 0:
                        return acc;
                    case 1:
                        return acc2;
                    default:
                        var $temp$m = m - 1,
                            $temp$acc = acc2,
                            $temp$acc2 = acc + acc2;
                        m = $temp$m;
                        acc = $temp$acc;
                        acc2 = $temp$acc2;
                        continue helper_;
                }
            }
        });
    return A3(helper_, n, 0, 1)
}

Here is the result (on Chrome).

You can try it here

Source code


This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.