De-throning the List

My idea was have both arrays and lists around at the same time, in an “array , list”.

You access the appropriate structure based on operation : )

Terribly wasteful, perhaps only optimal in niche situations…

In Java, the order of preference of data types is not really Array > List, it’s more like Stream > Iterable > Collection > List > Array. It’s quite a pain to work with arrays in Java, or to use them in API signatures.

2 Likes

Here are a couple other statistics it seems would be useful to this discussion:

  • What is the statistical distribution of collection (list/array) lengths in the target class of programs?

  • How often are these collections constructed in a literal?

  • How often are these collections constructed through transformation of another such collection?

  • Alternatively, how often are these structures accumulated through custom logic not based on folds, maps, etc.

I didn’t include access patterns in the above list of questions because those seem likely to be influenced by the strengths and weaknesses of the existing use of lists but it would be interesting to find an unbiased way to look at this question as well.

I raised these questions because it dawned on me that benchmarks for 1000 element collections are interesting — speaking as someone who works with even larger collections right now — but could be misleading if program performance is dominated by working with lots of small collections. The performance on 100 ten element collections could be more relevant than the performance on a single thousand element collection. Of course, it’s awkward if the correct answer for the small case and the large case are different because it means that any default becomes problematic at some point.

But keep at it. Pushing hard on this question is likely to be revealing whatever the outcome.

Mark

3 Likes

I wanted to point out that having a swap-able List/Array(/Other) data structure is already possible in Elm, by using a slightly more OO style of encoding. Compared with the union type approach of @gampleman above, it is also more flexible as arbitrary new implementations can be added after the fact, without changing the type.

People have mentioned that in Java there are various ‘interfaces’ that are implemented by the collections library in java.util. It is also possible to define ‘interfaces’ with separate ‘implementations’ in Elm.

Here is an example. I needed FIFO, LIFO and ordered buffer types for my search algorithms to produce depth first, breadth first and informed search strategies. The Buffer interface can be seen here:

With the 3 implementations on lines 287 (fifo), 304 (lifo) and 314 (ordered).

It would be possible in a similar way to define interfaces for Iterable and Collection in Elm, with List and Array backed implementations. Iterable and Collection would then be standard interfaces that packages could use in APIs to say what they want without saying how it should be implemented.

2 Likes

Please stay on topic. If people want to talk about interfaces, they can do so in another thread.

1 Like

Ok, but I don’t think it is off topic. I’m pointing out a different way that List/Array could be combined, which seems very relevant? But as you request, we’ll take any further discussion about interfaces to a new thread. :smile:

1 Like

The topic isn’t about combining , but replacing the list entirely.

No thank you - there are many data types to handle many different situations - why replace a linked list with an array in a language when it already has both and the history of computing science has provided both of us to them?

I would like to keep having choices.

1 Like

List can still be in the std lib, but it shouldn’t be the default.

1 Like

New part of the series out now. Look at the top post for link.

2 Likes

Just a silly question: in your examples you use these type annotations:
find : (a -> Bool) -> List a

instead of
find : (a -> Bool) -> List a -> Maybe a

Why is that?

1 Like

Oversight on my part. Have fixed it now. Thanks for pointing it out :slight_smile:

New part published now, see top post for link.

2 Likes

I’ve been accessing this through the email interface, and so I didn’t realize you had been editing the top post. I was beginning to get the impression that you were making a point by having us click through a linked list of articles each time. :slight_smile:

1 Like

I’ll keep that in mind for my next update :slight_smile:

In the latest post it points out that Set in Elm is ordered. Worth noting, in general Set implementations do not need to be ordered - an example being a Set based on hash tables.

If you were to build one of these ‘collections’ with Set semantics, how would you do that? The same collection implementation also supports List semantics, so they must have different constructors right? Would be interesting to see the constructor APIs.

Its an interesting piece of work you have done here, and on the whole I would be in favour of it. It will perform well enough, and is a real simplification.

1 Like

While Sets doesn’t necessarily have to be ordered to be a set, Elm’s Set is. I’m working on a HashSet for Elm right now, so there could be multiple Set implementations in the future.

The point is that Set’s are collections, and so there are multiple functions available in the List and Array API that also makes sense to implement for Set.

Excellent series. I liked the approach of speaking in broader concepts and then applying them to the current state of Elm.

I think performance concerns (especially wrt defaults) are central to Elm because Elm runs in the web browser and thus slow clients (like low-end mobile devices) cannot be avoided.

And since defaults have such a big impact on what people use, we have situations like people preferring List even though their data access patterns don’t match the List’s niche. It’s easy for us to say “well, I don’t want to have to think about datastructures,” but that comes at the expense of every client having to pay the penalty for us, like reversing a large array of endless-scroll gallery images on every render on the UI thread.

Kotlin’s solution here was to provide built-in functions like listOf(1, 2, 3), arrayOf(1, 2, 3), setOf(1, 2, 3) instead of a collection literal. Though I think there is now a generic [1, 2, 3] literal that you must explicitly annotate so it knows which collection impl to use, else it won’t compile.

Just about every time I implement a custom datastructure in Elm, I end up using an Array for its performance properties. For example, any time I need a grid of game pieces or gallery images, I often need to index into it. When I have a sequence in Elm (and UIs in general), I often am appending to the end but want to iterate in insertion order, which is precisely the List’s worst case.

If Elm were to adopt the Array as the default collection, I wonder how much of its APIs would also change. For example, ctrl-f “List” on http://package.elm-lang.org/packages/elm-lang/html/2.0.0/Html. One downside of using Array currently is that you have to convert it to a List in the view on each render though I never took the time to see what kind of perf impact it can have on a real world app.

I look forward to see what comes of this discussion.

1 Like

The last pieces of the blog series are out now:

Part Boron: https://dev.to/skinney/de-throning-the-list-part-boron-185
Summary: https://dev.to/skinney/de-throning-the-list-summary-3f3c

6 Likes

I am a little confused by this post:

Since in that one you also bring Set into the picture, but in the earlier posts you only talked about List and Array. Are you also going to try and build a data structure which can replace Set too?

That is why, above, I was asking what the constructors would look like. That is, how do you build this thing with Array vs Set semantics?

Dethroned.list [ 1, 2, 2, 3]
Dethroned.set [ 1, 2, 2, 3]

But perhaps this is not your intention and the above post has given me an incorrect understanding.