Benchmarking alternative List.map implementations

Awhile back @ilias wrote a cool post about making List.map faster.

I was looking at current the current core implementation of List.map - which uses a List.foldr under the hood - and it occurred to me that it might be faster if it used List.foldl instead, which doesn’t convert to an array first.

So I benchmarked it using elm-benchmark. The results?

Firefox 58 on macOS 10.13.2, 2013 MacBook Pro

38 AM

Chrome 63 on macOS 10.13.2, 2013 MacBook Pro

43 AM

Safari 11.0.2 on macOS 10.13.2, 2013 MacBook Pro

28 AM

So which implementation runs faster depends on the browser. :sweat_smile:

I was really surprised by this outcome. I assumed the numbers would fluctuate cross-browser, but expected a consistent winner across the three. I can’t explain why foldr wins on Safari and loses in Chrome and FF.

Does anyone know cross-browser JS engine implementations well enough to explain this result?

2 Likes

Single thousands of runs per second is quite low. Do the results hold with shorter lists?

Safari often behaves pretty unpredictable in my tests, I haven’t quite figured out why that is exactly. If I rerun that benchmark a couple of times, goodness of fit fluctuates between 70 and 99, and I get mixed results on both ends of the spectrum.

Important to note, though, is that your foldlMap is a reverseMap. Hence why we come up with such complicated schemes :smiley:

Shorter than 5 elements?

Ha, fair point! I could throw a List.reverse in there to fix it, but I’m mainly curious about the cross-browser discrepancy. :smile:

Fair enough! I think having the ((+) 5) in there plays a big role, as that compiles not to partial application but to an anonymous function, created in the function you’re benchmarking. Changing that slightly gives pretty different results: https://ellie-app.com/fMDPRWdrZa1/0

The int lists are 2,000 aren’t they?

Ah, I see what you meant - by “the results” you meant “Safari showing foldr as faster.” (The String lists are 5 elements, but Safari has the same winner as the other browsers on those.)

I’ve no idea about browser differences, but if performance is really what matters most, using a recursive definition is what seems to be the fastest with my tests (especially with firefox, can’t test Safari though)

Firefox 57 on Arch Linux, Intel Core i5 5300U
Screenshot-2018-1-1 Ellie - (1)

Chromium 63 on Arch Linux, Intel Core i5 5300U
bench-recursive-map-chromium

2 Likes

So… Small aside, but List.foldr has been changed in master to not convert to an Array anymore. I also hope Ilias implementation makes it in as it’s even faster, especially for large lists, although code-size is a concern.

When it comes to your benchmark results, I saw this all the time with my work on Array, and this is my theory for why it is so:

There are two surprises here:

  1. In all results, even though foldr converts the list to an array, then performs the foldr operation on that array (iterating the list twice, essentially) it’s only 9% or 32% slower.

  2. In Safari, foldr is actually faster.

(1) can be explained by the second pass through the list (the actual foldr pass) doesn’t have to chase pointers in the heap to retrieve it’s results. So the second pass will always be much faster then the first (converting a list to an array). Also, the second pass doesn’t need to constantly check if one is currently holding the empty list, as one already knows the size of the array.

(2) cannot entirely be explained by (1). My theory is simply that Safari (using their FTL jit, which uses LLVM I believe) is able to optimize a loop calling a function better than a loop sometimes calling a function (only when not the empty list), and so two simple passes is faster than one tiny bit more complex pass.

However, there are two things which are kinda important to mention when it comes to benchmarking.

(1) Both foldl and foldr calls the given function through a A2 wrapper. If the JS engine only ever sees one use of A2 (like, if currying is never used…), it can inline that into the loop. Making sure that A2 is called differently, like having one benchmark which returns a curried result, could prevent inlining of the A2 call, thus giving you different results.

(2) JIT’s can do loop unrolling if the size of the array is somewhat even (like, say, 5 or 2000). JIT’s cannot do this if you loop until some special condition (like the empty list). Try benchmarking with a non-even number (some prime number bigger than 10) to prevent this optimization in the foldr benchmark.

Why create benchmarks in a way that prevents certain optimizations? Cause code in the real-world doesn’t always trigger the JIT’s happy path. This is especially true in Elm, where loops tend to be handled by the same piece of code, preventing certain optimizations to ever be made (as that looping code suddenly needs to handle many different cases instead of potentially just one).

3 Likes

These are assumptions btw, I’ve not digged into the actual code the respective JIT for each browser generates.