Remove duplications from the List - reviewed

With the latest issue of @alexkorban’s Elm Weekly the first article caught my attention as I have been recently in the need for such deduplication of the List.

The article is saying that using the List |> sort |> and fold can be faster than Set |> fromList |> toList. And proves it with Benchmarking those two approaches on the reverse List with few duplicities and list with a lots of duplicities:

But then I found out that the blogpost is from 2018, and I’m unable to run the old Ellie to see for myself. So I have recreated the benchmark and run it with the current Elm version and I have ran it.

What do you think the result looks like?

Well, it is one of those funny ones as… it is inconclusive :smiley:

The result of benchmark for small number of duplicities used in the original article is still faster with sort |> fold method, but not 10x but 4x.

But the second part where you have a list with lots of duplicities is having different outcome than in the article as the Set |> fromList |> toList performs better by 33% :scream:compared to being 2x slower in the article version.

Try it out for yourself in the Ellie: https://ellie-app.com/byLyGTsH9rFa1

So the conclusion? It is interesting to know that the List |> sort |> fold is faster when you have fewer duplications, but it is hard to tell from what number of duplicities it is outperformed by the Set |> fromList |> toList approach.

Lesson from this short exercise: Trust no one! :smiley: And rather try the hypothesis out by yourself if you have a chance.

Update

@Janiczek ran the benchmark with --optimize flag and here are the results

Chrome:

Screenshot 2020-11-18 at 13.31.30

Firefox:

Screenshot 2020-11-18 at 13.31.48

5 Likes

Awesome job on peer reviewing the work and making sure it’s still accurate! Always nice to see.

2 Likes

Just a note, you should not benchmark things in Ellie since it is compiled with --debug I believe. (or at least it used to be, I haven’t check for a while)

2 Likes

Very interesting! Thanks for posting

And what about using elm-optimize-level-2?

I just recently made optimizations in app that we have at work by replacing 2 lists by AnySet. I measured about 5x difference on our production data in measuring not even just the functions using some micro benchmark but measuring whole performance of view. In other words it means 10 lines of code change and render of whole app went from 1.5s to 300ms. In our case lists were causing worse than cubic complexities of algorithms all over the place.

I even think that de-duplication is not even a good thing to measure. It’s extremely naive case. What you should care about in any more real world benchmark is how your data structure will scale with increasing complexity over time. Some people say that rendering is the most expensive operation every app is doing. I disagree. Most render transitions are changing very little things in the DOM, setting one class, changing few nodes - that’s the common case. Meanwhile view code gets executed over and over after every model update. My advice - never List.Extra.find and don’t use AssocList.

1 Like

never List.Extra.find and don’t use AssocList.

Well, I’d say it depends on what you are doing. In a code base we have at work, we use some Dict String ComplexValue dicts which won’t exceed 20 elements (ok, in the very very very worst case, it could ended up to 50, but this really would be unlikely).

I really think we should use AssocList for this in order to be able to have some Key type more significant than String. Moreover, for really small dicts (like e.g. less than 10 elements), I even think assoclist would be faster than dict because when sing Dicts we might have an overhead (not sure however).