Typed Arrays for Elm

Here’s an important question: “Should Elm ever expose bindings to Typed Arrays?”

Elm aspires to target more platforms than JavaScript in the future. Typed Arrays are a language feature unique to JavaScript, and they cover a lot more use cases than just Canvas. Is it best for Elm’s long-term ecosystem to have mathematical packages built on top of a JS API, forever coupled to JS and inaccessible to people who want to do that sort of math on, say, Elm servers that compile to LLVM or BEAM?

I think the default answer is that Elm should never expose bindings to Typed Arrays. Rather, it makes more sense to have Typed Arrays be an implementation detail of libraries which require them (e.g. Canvas, Blobs, perhaps certain low-level math libraries like the use case in OP). Possibly someday the Elm compiler could situationally emit JS that uses typed arrays under the hood as a performance optimization. Who knows?

I could be wrong about this, of course…but I would not make the opposite assumption lightly!

I think this may be the intended direction for this typed arrays library.

In order to try out typed arrays without modifying the Elm compiler, Matt has had to bolt them on in such a way that when you use them, you definitely know you are using them.

A cleaner way would be if the compiler automatically recognised uses of Array Int or Array Float and segued into the typed array code for those cases, but continued to treat other types of Array in the normal way. I believe such things are called compiler intrinsics. Doing this would require modifying the compiler, which is a lot more work. In the JS version of Elm they would map to typed arrays; in some future LLVM version they would map to some other kind of fast/compact array.

So the way I see this working is as a testing ground for trying out typed arrays in Elm, to understand whether they can be made to behave nicely and perform well when fitted to the same API structure as Array.

They are useful for interacting with other Web APIs (to add Blob, Elm perhaps needs to add a Byte type?), although not essential. We can always convert an array to a typed array to interact with some API that needs the typed array.

The idea is to optimise Elm Array -> Typed Array -> Some Web API, to just Elm Array -> Some Web API. What Web APIs are there that could benefit from this optimisation?

Thanks Brian, I’ll continue to experiment with elm-tensor, see where it gets me.

I wrongly assumed that since they are needed for many web APIs, they would end up available in the language. But as you say, they could very well stay as an internal implementation, only available to core. I’m not sure this devalues the work I’m sharing here though.

Regarding why I chose typed arrays for my experiment with elm-tensor: mostly because I “need” (from a performance point of view) constant-time slicing and reading of arrays to implement most linear algebra base functions. The API proposed here is extremely similar to that of Array for the reason you mention; should Elm ever have such a data structure, it would be easy to adapt.
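To make the slicing point concrete, here is a rough sketch (the slice function below is a made-up stand-in, not part of any existing package; it stands for a constant-time view into the underlying contiguous data, which is what typed arrays allow and what tree-based arrays do not):

-- made-up stand-in: assumed to return a constant-time view on contiguous data
slice : Int -> Int -> JsTypedArray a b -> JsTypedArray a b

-- reading row i of an n x n matrix stored contiguously in row-major order:
-- a constant-time view here, versus an O(n) copy on a tree-based Array
row : Int -> Int -> JsTypedArray Float64 Float -> JsTypedArray Float64 Float
row n i matrix =
    slice (i * n) ((i + 1) * n) matrix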

In Skinney/elm-array-exploration, Array.Hamt is based on a thin layer of native code (JsArray.js and JsArray.elm). I think of this experiment as analogous to the thin layer of native code in Robin’s package. Perhaps a type called NumericArray (analogous to Array.Hamt) could be introduced, providing the constant-time slicing and reading guarantees independently of the actual platform implementation.

Point taken. Again my goal here is not to hurt the elm ecosystem, simply to share my thoughts and experiments on tools for a transformation (growing the web platform) that will happen way before elm targets another platform.

As you and Richard mentioned, Elm doing performance optimization under the hood is possible and would be awesome. Yet Elm Arrays and typed arrays behave fundamentally differently, not having the same complexity for their operations. So doing something like automatically using typed arrays for numeric values would not be beneficial.

If you refer to this blog post from 2012, “Typed Arrays are a relatively recent addition to browsers, born out of the need to have an efficient way to handle binary data in WebGL.” So I think they are probably beneficial everywhere they were introduced.

PS: please also keep in mind that I don’t have any Elm programmers around me (though I help organize a meetup in my area), so sharing my experiments online is the only way I can get feedback on them. If I had communicated earlier about my experiment with elm-tensor, saying something like “hey, I’d love to have constant-time slicing and reading of arrays to do this”, I wouldn’t have done anything at all, since this would not happen anytime soon. Timing and communication are hard things to figure out when you practice Elm on your own.

Totally! I don’t want to discourage you - just want to share my top-of-mind thought. :smiley:

But if you built NumericArray as you suggested above, as a HAMT on top of typed arrays, then they would be almost the same.

I think though, that web APIs that consume typed arrays expect to be given a single contiguous array? Or is this not the case? I think the HAMT structure works with arrays of 16 (or is it 32) elements, so that a write to the array only requires copying those 16 elements plus some of the structure above it.

To pass such an array to a Web API expecting a single contiguous typed array would require appending all those 16 element chunks together. At which point, you may as well just be working with non-typed arrays, since the conversion could happen at that point.

@rupert In that case, yes, it would not be very useful. It would be simpler to just work with Array.

What I actually meant by "analogous to Array.Hamt" was that it would be the pure Elm module exposed, internally based on _ (put here JsTypedArray or another implementation with the same complexity guarantees), just like Array.Hamt is the pure Elm module exposed, internally based on JsArray.

So what advantages would NumericArray have?

While I wouldn’t want to tie Elm to JavaScript’s typed array API, the concept of a single chunk of memory storing values in a linear, contiguous way exists everywhere (usually just as ‘an array’!) and I think ultimately would be worth having in Elm as an alternative to Array with different performance characteristics. I personally like the name Buffer, but I’m not picky =) I think this means that something like this JsTypedArray package could then be one of two things:

  • An internal implementation detail for the Buffer implementation on the JavaScript backend (use as is but write a backend-agnostic layer on top of it)
  • A useful reference on/example of how to use the JavaScript typed array API from Elm code, for whenever a Buffer type gets implemented

Buffer would be a good name for it.

There would then need to be a set of transformation functions Array -> Buffer and Buffer -> Array to convert between them.

One possible disadvantage of having essentially two versions of array is having to write the same functions many times to deal with the different combinations.

For example, suppose I dynamically built up a list of vectors to be rendered, but then transform that into a 3d perspective using a matrix that is the same each time. It is going to be more efficient for the list of vectors to be an Array, but the transformation matrix to be a Buffer (or a slice of a Buffer). So now I need a linear algebra library that supports all the combinations for matrix multiplication: Array x Buffer, Buffer x Buffer and so on.

Is there some way they could be allowed to be the same thing by introducing another pseudo-type class like we have for comparable and number?

My other suggestion was that they really both be the same type, Array, and the compiler automatically figures out what underlying representation is best. But that sounds very hard to do.

@rupert Yes, interop with other Elm data structures is important.

If they are the same type, one way the compiler could decide which implementation to use could be to analyze the operations performed and choose accordingly. There could be no good choice, though.

As Ian mentioned, a buffer-like data structure would be a welcome addition to the language. And I think programmers can choose the adequate data structure if they are aware of the trade-offs.

In your case, I think there would be no issue. If you are dynamically building a list of vectors in an asynchronous manner and don’t know how long it will take, adding one interop at the end is not a big deal. If, however, you are building it incrementally but in a deterministic way, there should be a creation function enabling you to do it without needing an intermediate interop. Anyway, this is very hypothetical, and the best way to sort it out would be to have a concrete problem, like Brian said.

Ok, that sounds right. So if you were to write an optimized linear algebra library it would work over vectors and matrices implemented on top of Buffers. No need for there to be a mixed mode.

I think such a library would need to be implemented in native code. For example, when multiplying 2 matrices, the result would be written to the output matrix using Buffer.set many times if it were done in Elm, and Buffer.set is going to copy the whole Buffer. A native implementation would read from the 2 input matrices, build up the answer in a mutable array, and then return that as a Buffer back to the Elm program.
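To illustrate, here is a rough sketch assuming a hypothetical Buffer module that mirrors the Array API (get, set, repeat); the point is that every Buffer.set below would copy the whole output buffer:

-- rough sketch only: Buffer.get, Buffer.set and Buffer.repeat are assumed
-- to mirror the Array API; each Buffer.set copies the whole n*n buffer,
-- which is what makes a pure Elm implementation costly.
matMul : Int -> Buffer Float -> Buffer Float -> Buffer Float
matMul n a b =
    let
        get buf row col =
            Buffer.get (row * n + col) buf |> Maybe.withDefault 0

        cell row col =
            List.foldl (\k acc -> acc + get a row k * get b k col)
                0
                (List.range 0 (n - 1))

        setCell row col out =
            Buffer.set (row * n + col) (cell row col) out
    in
    List.foldl
        (\row out -> List.foldl (setCell row) out (List.range 0 (n - 1)))
        (Buffer.repeat (n * n) 0)
        (List.range 0 (n - 1))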

What would the API for creating Buffers look like? I imagine something like this:

intBufferFromArray : Array Int -> Buffer Int
floatBufferFromArray : Array Float -> Buffer Float

and the rest of the API is just like Array.

You could have (emptyIntBuffer : Int -> Buffer Int), but using .set to fill it in is going to be inefficient compared with using Array.set, then converting, so I think just the Array constructors are needed.

Yep, if you have some time, have a look at what I’m trying to do with mpizenberg/elm-tensor. It’s still a bit rough, no readme and all, but you can get a pretty good idea of what I’m doing from this roadmap issue and from this Tensor data structure, much inspired by numpy arrays:

type alias Tensor =
    { data : DataArray -- flat, contiguous storage for all elements
    , dimension : Int -- number of dimensions (rank)
    , length : Int -- total number of elements
    , shape : List Int -- size along each dimension
    , view : TensorView -- how logical indices map onto the underlying data
    }


type TensorView
    = RawView -- data read contiguously, in order
    | TransposedView -- data interpreted as the transpose
    | StridesView (List Int) -- custom strides: relevant data may be non-contiguous


type alias DataArray =
    JsTypedArray Float64 Float

I’m trying as much as possible to avoid any native code in elm-tensor, and that’s why it influenced a bit the functions I made available in JsTypedArray, like the one called unsafeIndexedFromList, which basically enabled me to avoid the use of setters when dealing with tensors that have a StridesView, i.e. non-contiguous relevant data in the underlying buffer. I have a few experiments locally, not committed or pushed, since I’m a bit short on time these days. We can discuss more about that on Slack or in elm-tensor issues if you want.

It depends a lot on whether we make the distinction between a raw buffer that has no type, just a chunk of memory like ArrayBuffer, and “typed” buffers like Buffer Int etc.
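Purely as a sketch of that distinction (all names below are made up for illustration):

-- a raw, untyped chunk of memory (the ArrayBuffer role) ...
type Bytes
    = Bytes

-- ... and typed views interpreting those bytes
-- (the Int8Array / Float64Array role)
asIntBuffer : Bytes -> Buffer Int
asFloatBuffer : Bytes -> Buffer Float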

I hope to have a more detailed response to the points in this thread later… But for starters:

I have a very concrete use case for something like this: I have a moderately sized Elm application for managing audio synthesizers. The app must handle large buffers of binary data in two cases:

  1. Messages sent via MIDI which can be 10k bytes or longer. These must be built up or parsed byte by byte.
  2. Sampled audio buffers, which can easily be over 1M byte. These must be chunked up or joined together, written to/from files, and occasionally computed over.

Today, I bridge via ports between the JS Web APIs that deal in Uint8Array and Elm’s Array of Int. For the first use case, this has been mostly workable - manipulation of Array Int isn’t so bad… And clearly this will work in 0.19.
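A simplified sketch of the general shape of that bridge (port and message names invented for illustration, not the actual app code):

port module Midi exposing (..)

import Array exposing (Array)

-- JS pushes the raw bytes of each incoming MIDI message through this port
port midiBytes : (List Int -> msg) -> Sub msg

type Msg
    = GotMidiBytes (Array Int)

subscriptions : model -> Sub Msg
subscriptions _ =
    midiBytes (Array.fromList >> GotMidiBytes)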

But the second use case gets trickier. I’m dubious of manipulating 1M+ byte sample buffers as Array Int - though perhaps I’ll be surprised when I finish this part.

I’m totally good with immutability of the arrays… but size and speed are a bit of a concern.

Note: The app, with the first use case, has been out for a few months, and is in use by over 500 users. It is a manager for Elektron synthesizers. The sample transfer feature (second use case) is in development for release this month.

A function like Buffer.initialize : Int -> (Int -> a) -> Buffer a would be a good building block for these sorts of operations. Array.initialize is already a thing. :slight_smile:
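For example (with a hypothetical Buffer module whose initialize mirrors Array.initialize), one second of a 440 Hz sine wave at 44.1 kHz could be built like this:

-- hypothetical Buffer.initialize, analogous to Array.initialize
sine440 : Buffer Float
sine440 =
    Buffer.initialize 44100 (\i -> sin (2 * pi * 440 * toFloat i / 44100))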

While I’d love to have people try this, now that we know that native code will not be usable in 0.19, I recommend that you don’t spend time doing so. But one thing that will still be very valuable is feedback on your concrete use case. In your second case, for example, you mention slicing and joining. Great - I’d suggest that once you’re far enough along in your project, you create a post explaining the API you’ve come up with and the pain points you had implementing it with the currently available array API and ports. This will be very valuable input for core Elm members. And possibly, some solutions might appear in pure Elm for part of these issues.

I did that too for one of my image manipulation use cases. Eventually I had issues with the Elm Array type that forced me to switch to Robin’s array exploration (HAMT array). At that point, JS interop through ports became a burden because the HAMT array was not a directly supported type. The good news is that Robin’s work is now merged into Elm 0.19, so it will get better in 0.19.

Indeed :wink:

Great work Matthieu! And very detailed explanation. Like you said, Typed Arrays are everywhere in modern browsers. At Nomalab we use Blobs (File API) to compute the md5 of large files (hundreds of GB), before slicing, encrypting and sending them to AWS S3. Right now this is done by a quite buggy JS library, but at some point we will have to do it ourselves in order to improve resuming after errors. It’d be awesome if we could do it in 100% Elm.

Nice benchmarking work, I need to do this myself.

I have gone and done the opposite and removed typed arrays from linear-algebra https://github.com/robert-wallis/elm-linear-algebra/tree/PureElm?files=1 don’t shoot me, I am just experimenting.

What I want to know is, practically, when dealing with arrays of length 2, 3, 4 and 16, whether typed arrays help or hinder performance.

Also, unrelated to linear algebra, just yesterday I discovered that `new Float32Array([1,2,3]).map(() => "hi")` returns `[NaN, NaN, NaN]`, so the Elm compiler cannot use the native map to implement Elm’s map.

And by practically, I mean linear algebra operations like Mat4 multiplication in Elm. I’m guessing streaming a large float array to the GPU is faster, which is also practical.

Thanks @Robert, haha don’t worry I don’t “byte” :door: . My approach is essentially generic over typed arrays. Yours makes me think of that of @zinggi by the way (Zinggi/elm-webgl-math), except he used tuples where you used records. I’d encourage you to do benchmarks to evaluate all this. elm-benchmark is really nice to work with.

Regarding that, it is not actually an issue in elm. In the API I have:

map : (b -> b) -> JsTypedArray a b -> JsTypedArray a b

So you can only provide an Int -> Int or a Float -> Float as the mapped function. Elm prevents this kind of thing from happening at compile time. But there is still a reason not to use map in the implementation, and that’s browser compatibility.
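For example, with the signature above (assuming map is exposed from the JsTypedArray module), this compiles, while a function returning a String would be rejected at compile time:

-- JsTypedArray.map with a Float -> Float function; a String-returning
-- function would be a compile error rather than silent NaNs
doubled : JsTypedArray Float64 Float -> JsTypedArray Float64 Float
doubled =
    JsTypedArray.map (\x -> x * 2)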
