Benchmarking alternative List.map implementations

rtfeldman · December 31, 2017, 4:25pm

Awhile back @ilias wrote a cool post about making List.map faster.

I was looking at current the current core implementation of List.map - which uses a List.foldr under the hood - and it occurred to me that it might be faster if it used List.foldl instead, which doesn’t convert to an array first.

So I benchmarked it using elm-benchmark. The results?

Firefox 58 on macOS 10.13.2, 2013 MacBook Pro

38 AM

Chrome 63 on macOS 10.13.2, 2013 MacBook Pro

43 AM

Safari 11.0.2 on macOS 10.13.2, 2013 MacBook Pro

28 AM

So which implementation runs faster depends on the browser.

I was really surprised by this outcome. I assumed the numbers would fluctuate cross-browser, but expected a consistent winner across the three. I can’t explain why foldr wins on Safari and loses in Chrome and FF.

Does anyone know cross-browser JS engine implementations well enough to explain this result?

brian · December 31, 2017, 4:43pm

Single thousands of runs per second is quite low. Do the results hold with shorter lists?

ilias · December 31, 2017, 4:43pm

Safari often behaves pretty unpredictable in my tests, I haven’t quite figured out why that is exactly. If I rerun that benchmark a couple of times, goodness of fit fluctuates between 70 and 99, and I get mixed results on both ends of the spectrum.

Important to note, though, is that your foldlMap is a reverseMap. Hence why we come up with such complicated schemes

rtfeldman · December 31, 2017, 4:45pm

Shorter than 5 elements?

Ha, fair point! I could throw a List.reverse in there to fix it, but I’m mainly curious about the cross-browser discrepancy.

ilias · December 31, 2017, 4:49pm

Fair enough! I think having the ((+) 5) in there plays a big role, as that compiles not to partial application but to an anonymous function, created in the function you’re benchmarking. Changing that slightly gives pretty different results: https://ellie-app.com/fMDPRWdrZa1/0

brian · December 31, 2017, 5:02pm

The int lists are 2,000 aren’t they?

rtfeldman · December 31, 2017, 5:57pm

Ah, I see what you meant - by “the results” you meant “Safari showing foldr as faster.” (The String lists are 5 elements, but Safari has the same winner as the other browsers on those.)

mattpiz · January 1, 2018, 11:07am

I’ve no idea about browser differences, but if performance is really what matters most, using a recursive definition is what seems to be the fastest with my tests (especially with firefox, can’t test Safari though)

Firefox 57 on Arch Linux, Intel Core i5 5300U
Screenshot-2018-1-1 Ellie - (1)

Chromium 63 on Arch Linux, Intel Core i5 5300U
bench-recursive-map-chromium

robin.heggelund · January 1, 2018, 1:46pm

So… Small aside, but List.foldr has been changed in master to not convert to an Array anymore. I also hope Ilias implementation makes it in as it’s even faster, especially for large lists, although code-size is a concern.

When it comes to your benchmark results, I saw this all the time with my work on Array, and this is my theory for why it is so:

There are two surprises here:

In all results, even though foldr converts the list to an array, then performs the foldr operation on that array (iterating the list twice, essentially) it’s only 9% or 32% slower.
In Safari, foldr is actually faster.

(1) can be explained by the second pass through the list (the actual foldr pass) doesn’t have to chase pointers in the heap to retrieve it’s results. So the second pass will always be much faster then the first (converting a list to an array). Also, the second pass doesn’t need to constantly check if one is currently holding the empty list, as one already knows the size of the array.

(2) cannot entirely be explained by (1). My theory is simply that Safari (using their FTL jit, which uses LLVM I believe) is able to optimize a loop calling a function better than a loop sometimes calling a function (only when not the empty list), and so two simple passes is faster than one tiny bit more complex pass.

However, there are two things which are kinda important to mention when it comes to benchmarking.

(1) Both foldl and foldr calls the given function through a A2 wrapper. If the JS engine only ever sees one use of A2 (like, if currying is never used…), it can inline that into the loop. Making sure that A2 is called differently, like having one benchmark which returns a curried result, could prevent inlining of the A2 call, thus giving you different results.

(2) JIT’s can do loop unrolling if the size of the array is somewhat even (like, say, 5 or 2000). JIT’s cannot do this if you loop until some special condition (like the empty list). Try benchmarking with a non-even number (some prime number bigger than 10) to prevent this optimization in the foldr benchmark.

Why create benchmarks in a way that prevents certain optimizations? Cause code in the real-world doesn’t always trigger the JIT’s happy path. This is especially true in Elm, where loops tend to be handled by the same piece of code, preventing certain optimizations to ever be made (as that looping code suddenly needs to handle many different cases instead of potentially just one).

robin.heggelund · January 1, 2018, 1:49pm

These are assumptions btw, I’ve not digged into the actual code the respective JIT for each browser generates.

Topic		Replies	Views
List.map optimization options Show and Tell	14	1811	June 20, 2021
A faster `List.map` for Elm Show and Tell	9	3404	January 24, 2021
2-3x faster List.concat implementation Request Feedback	9	1679	May 6, 2019
Elm-benchmark 2.0.0 Show and Tell	1	2143	December 23, 2017
Data structure project Learn	5	1183	March 21, 2022

Benchmarking alternative List.map implementations

Firefox 58 on macOS 10.13.2, 2013 MacBook Pro

Chrome 63 on macOS 10.13.2, 2013 MacBook Pro

Safari 11.0.2 on macOS 10.13.2, 2013 MacBook Pro

Related topics