De-throning the List

I’ve got a plan to see if Array can replace List as Elm’s default collection type.

This is the beginning of a series of blog posts where I’ll explain stuff, show some benchmarks and propose changes. It’s meant to serve as the foundation for future work. I’d love to hear feedback as this experiment moves along.

I’ll update this thread whenever a new blog post is released.

Introduction: https://dev.to/skinney/de-throning-the-list-2fjk
Part Deux: https://dev.to/skinney/de-throning-the-list-part-deux-4idm
Part π: https://dev.to/skinney/de-throning-the-list-part--44dl
Part SC4K: https://dev.to/skinney/de-throning-the-list-part-sc4k-4e3n
Part Boron: https://dev.to/skinney/de-throning-the-list-part-boron-185
Summary: https://dev.to/skinney/de-throning-the-list-summary-3f3c

19 Likes

Robin, your cunning plan will have to wait until we go over the top.

1 Like

Well written, easy to follow.:+1:
I’m normally not interested in datastructures, but I am now :grinning: Looking forward to the next one…

2 Likes

Sounds very interesting … I’ll look forward to future instalments!

Depending on how deep you want to take your analysis, and whether you think it’s the right time to think about these things, there is a language-level issue underlying this question that you might want to think a bit about.

As a language, Elm tends to force package authors to be more specific about types – especially container types like List or Array – than they would ideally need to be. As an example, consider something like the view function in the elm-sortable-table package, which looks like this:

view : Config data msg -> State -> List data -> Html msg

This quite deliberately works with your own data type, whatever it is, rather than forcing you to use some secondary type (possibly introducing inconsistencies etc.). However, it does insist on getting a List (as, for that matter, does the html package itself).

Now, perhaps it would be better if packages insisted on getting an Array instead of a List – that’s quite possible, and I’ll look forward to your analysis. However, the deeper question (if you’re interested, and think it’s the right time to think about it) is whether it would be desirable for Elm to provide a mechanism that allows package authors to avoid insisting on working in one collection type or another.

The intuition behind an interest in this question is something like this. As far as a particular application’s data model is concerned, there will always be reasons to prefer one collection type vs. another. That is, no collection type is optimal for all cases. Of course, it’s always possible to convert from the app’s preferred collection type to whatever a particular package is demanding. However, there are a couple of ways in which this is less than ideal. First, it may be inefficient, especially in cases like view where it will be done repeatedly. (Though, of course, it may turn out not to matter much in practice). Second, it may discourage developers from selecting the collection type that would actually be ideal for their data, given that working in the “default” type (whatever it is) is easier.

I’m deliberately being vague about what changes would be needed to allow package authors to be less prescriptive about container types, since this is the kind of thing that Elm would want to analyze deeply and innovate. If one were to think about how other languages currently deal with this sort of thing, the minimal thing needed would be support for higher-kinded polymorphism. For instance, if this were a legal function signature in Elm, then elm-sortable-table and html wouldn’t have to insist on a List (or, could at least have variations that don’t insist on a List).

view : Config collection data msg -> State -> collection data -> Html msg

Now, let me say again that you may have all sorts of good reasons not to be interested in this question, or to think that it’s not a good time to think about it right now. However, if there is going to be a conversation about what Elm’s default collection type should be, I thought it could possibly be a good moment to wonder whether it might be desirable to permit package authors to be less prescriptive about container types.

2 Likes

This question is already being thought about quite deeply by Evan, he just haven’t found the right solution for Elm yet. In the meantime, I think what I’ll reveal in the upcoming blog posts will help alleviate the problem.

2 Likes

Part Deux is now released. I’ve updated the main post with the link.

4 Likes

I found both blog posts very interesting. However, I think I’m suffering from some kind of blind spot here, maybe someone can help.

In my day job I’ve worked with Java for almost 20 years now. So it’s very ingrained in my mind that the “array versus linked list” question is just an implementation detail, not something that a core language designer would ever really need to consider, or think about “switching between” in the ways discussed.

In Java, LinkedList and ArrayList are “just” the two most common implementations of the List interface. There are several others. You can write your own if you want. :slight_smile:

Looking at the Elm API docs for List and Array, it’s clear they don’t currently provide the same set of functions (~ “interface”), but I also didn’t see anything that was really unique to either one. (In other words, I think I could implement all the “missing” functions w/o resorting to Array.fromList and Array.toList.)

So, why is this so much more of a core distinction in a language like Elm than in a language like Java?

2 Likes

I only spent 4 years working professionally with Java, but I can relate to this feeling.

Having spent more time with other languages that don’t follow that philosophy (e.g. JS, Elm) my conclusion is that “data structures should be an implementation detail” wasn’t actually a worthwhile goal after all.

Like inheritance and dependency injection, it’s one of those things I was told was very important in the Java world, but now that I don’t have it anymore, I can’t point to how my life got worse. (I can point to how my life got simpler though!)

2 Likes

Not quite following … maybe I’m bringing more baggage along here, but another mantra I’ve absorbed is “the evils of premature optimization”. Choosing up-front – and hard-coding into my API – whether I want a data structure with constant-time prepend or constant-time random access feels like an unnecessary complexity to me.

I want to be as general as possible: all I need is a data structure with order & homogeneity. Only if the code I’m writing turns out later to be a performance hot-spot will I come back and consider details like what underlying implementation that data structure ought to have.

All that being said, I think in practice, my “avoid premature optimization” strategy so far has ended up being:

Yeah, I know this Elm List thing is a linked list under the hood, and that may perform horribly in some cases. But it seems like everyone in Elm land has already agreed to use it for everything, none of the code examples I see are using Array. So I’ll just go with that and revisit later if it turns out to be a performance problem in practice.

I hadn’t really articulated it even to myself that way until I re-acquainted myself with the Elm Array type after reading these blog posts though … :slight_smile:

2 Likes

I’m favouring the idea that Elm Lists and Arrays are different things with commonality - that common bit being like in C# having IList and IEnumerable.
Keep making improvements under and above the hood to improve their efficiency and alignment and have a master type that can be used as the default to handle the commonality and interchangeability of both.
Sort of like ints and floats are both numbers, right?

2 Likes

I’m not a Haskeller, but I believe that’s a typeclass, right? Elm has some, like Number, just hardcoded into the system.

1 Like

Yes, and @evancz has repeatedly stated that he doesn’t want to simply implement typeclasses in Elm, because he’s unconvinced that it’s the best solution for Elm’s use case and for Elm users (the actual implementation would be “easy” but code is always the easy part)

The main reason people are using List everywhere is because it’s Elms default. Every library makes use of it, it had literal syntax and is the most flexible.

I agree this is not really a choice one should care about unless there is a performance problem, and that’s really what my proposal is about. Array is a better general purpose datastructure than List, and so it should be the default.

Besides, in languages like Java, C#, JS and more, you’re most likely to reach for a data type like Array instead of List.

3 Likes

I’ve been doing exactly this. And indeed, even when I had a situation where I thought Array might be better, I used List because I didn’t know enough about the underlying reasons list is used.

I deeply appreciate your work on this, Robin. Almost as much as I appreciate that it started with a Blackadder reference.

6 Likes

From an API perspective, I appreciate having to explicitly opt-in to index-based random access. It puts positive pressure to solve collection problems via a functional approach like map, filter, and foldl rather than porting index-based imperative solutions.

8 Likes

A more direct comparison of performance between lists and arrays might be to compare modifying the first element of a list (best case for a list) with modifying the last element of an array (best case for an array) — and vice-versa. After all, if the first element really was the routine focus, then one could just create an array type that ran the indices backwards. Without reading your actual benchmark code, it seems like you could shuffle your numbers around to get this. Lists in in all cases except for removing the last element of a list v removing the first element of an array if I’m permuting the numbers correctly. The wins aren’t nearly as big and obviously lists suffer when it comes to random indexing. (I’m surprised that adding an element to the end of a list is so much faster than removing the last element of a list in your benchmark results but I note that removal always seems to benchmark as slower in your results.)

Now, as for why lists win out over arrays in functional code, I agree that there is probably some historical precedent but I think it also probably stems from the fact that lists are significantly more friendly to recursion and pattern matching. This carries through to performance: List.head returns a Maybe (thereby incurring a storage allocation) but matching against x :: xs does not. So, to really compete in these cases, arrays would need support for pattern matching and allocation free slicing. Or there needs to be a sufficient range of cases where the places where arrays win that the list wins can be pushed into a more peripheral case of a linked list type that doesn’t receive special linguistic support.

Mark

1 Like

Index based solutions doesn’t give you loops though, so i still think you’d reach for foldl-based solutions. At least, that’s my experience with Clojure, which was my first FP language and which uses Vector/Array as default.

1 Like

I think there are significant enough cases that makes Array a better default. But we can have that discussion after all the posta are out and I’ve made all my arguments :slight_smile: thankd for your input, you’ve made good points.

I think this is a case where side by side code comparisons will be key.

Take some logic that uses a List pattern match and put it side by side with an Array version of the same logic. How do they read? What are the performance trade-offs? Stuff like that.

Code examples will be the most revealing!

4 Likes

BTW you don’t need language support for that, see this gist:

5 Likes