Typed Arrays for Elm

Hi everyone!

Since Francisco announced his ambition to create NumElm, a package for machine learning in elm, I’ve been very excited about seeing more of this math stuff in elm. One of the key building blocks would be an efficient fixed-size multi-dimensional array (also called a tensor). I thought I could leverage typed arrays to do so. Here is my shot at providing the JavaScript typed arrays API in elm: mpizenberg/elm-js-typed-array. As stated in the readme, my reasons are two-fold:

  1. Grow the coverage of Web APIs in elm.
    Typed arrays are used for ArrayBuffers, Blobs, Files, WebGL, network exchange,
    canvas data, etc. So having them in elm is important in my opinion.
  2. They are the only fixed-size, typed structures in JS. Because of this,
    I’m convinced they can serve as solid ground for an efficient fixed-size
    mathematical (linear algebra) library.
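To illustrate point 2, here is how those fixed-size, typed semantics behave in plain JavaScript (standard typed array behavior, runnable in any JS engine):

```javascript
// A Uint8Array has a fixed length and coerces every write to its element type.
const bytes = new Uint8Array(4); // fixed size: 4 elements, initialized to 0
bytes[0] = 42;
bytes[1] = 300; // out of range for Uint8: stored modulo 256 -> 44
bytes[2] = 3.7; // non-integer: truncated toward zero -> 3
// There is no push/pop; the length can never change.

const stored = Array.from(bytes);
```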

In the rest of this post, I detail:

  • design goals
  • use cases
  • performance
  • current API design
  • main API issues

I would love to have feedback on any of these, either here or in issues on the repo.

Design Goals

  • cover the full TypedArray API (except a few bits that don’t make sense in elm)
  • be elm compliant:
    • appear as an immutable data structure: modifications create copies
    • type-safe polymorphism: add type parameters
  • be interoperable with elm data structures:
    • from elm data structures: fromList, fromArray
    • to elm data structures: toList, toArray
  • be interoperable with JavaScript: 0-cost encoders / decoders
  • be as minimalist as possible. It aims to follow the successful approach
    of Skinney/elm-array-exploration, which splits its array implementation in two parts:
    first, a minimal wrapper of JavaScript arrays in native code (Native/JsArray.js
    and Array/JsArray.elm),
    and second, a pure elm implementation on top of it (Array/Hamt.elm).
  • be flexible yet efficient:
    • most functions are benchmarked and optimized
    • most functions are also provided in an “indexed” form
    • some functions have an “unsafe” form to avoid the cost of a wrapped return type (e.g. Maybe)

Use Cases

In this blog post introducing typed arrays, the author lists APIs making use of typed arrays, providing some examples.
I tried to implement one example of each use case (except MediaSource) with this elm-js-typed-array package to improve the design. See the corresponding issue on GitHub for more details.

  • [x] WebGL: examples/WebGL/{Main.elm, index.html, webgl.js}. Typed arrays are used to pass the vertices attributes at initialization and projection matrices at each animation frame.
  • [x] Canvas 2D: examples/CanvasImageData/{Main.elm, index.html}. Canvas 2D ImageData buffer is generated in elm and sent through port to JS that uses ctx.putImageData to draw it on the canvas.
  • [x] XMLHttpRequest: examples/XMLHttpRequest/{Main.elm, index.html, answer.bin}. The request is sent through a port to JS since there is no support for ArrayBuffer in the Expect type of the elm http package. Once the array buffer is received in JS land, it is sent directly back to elm through a port.
  • [x] File: examples/File/{Main.elm, index.html, answer.bin}. A file input is used to load a file, which is sent through a port to call readAsArrayBuffer() in JS; the resulting array buffer is sent back to elm through a port.
  • [ ] MediaSource
  • [x] WebSocket: examples/WebSocket/{Main.elm, index.html}. The data exchanged over the WebSocket goes through a port to JS since the elm websocket API cannot send or receive ArrayBuffers. The array buffer is extracted in JS from the received Blob and sent directly back to elm as an ArrayBuffer.
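On the JS side, all of these examples follow the same pattern: get hold of an ArrayBuffer and pass it through a port untouched. A minimal sketch (the `app` object and the port name `arrayBufferReceived` are hypothetical, not taken from the examples; only the ArrayBuffer handling is standard Web API):

```javascript
// The ArrayBuffer crosses the port untouched; the elm side then decodes it
// with the package's fromValue / decode functions.
function sendBufferToElm(app, buffer) {
  app.ports.arrayBufferReceived.send(buffer);
}

// Simulate a received binary payload to show what crosses the port:
const payload = new ArrayBuffer(4);
new Uint8Array(payload).set([1, 2, 3, 255]);

const received = [];
const fakeApp = { ports: { arrayBufferReceived: { send: (b) => received.push(b) } } };
sendBufferToElm(fakeApp, payload);
```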


Benchmarks Structure

The goal of the benchmarks is to make sure that typed arrays are fast for all potential use cases, ranging from small array manipulation to big image-like (~10^6 pixels) matrices. Therefore, I’m comparing each key function at different scales (set in Constants.sizeScales) with three data structures: List, Array, and JsTypedArray (four, actually, since both Uint8 and Float64 are tested).

Every benchmark file therefore looks like the following:

module Append exposing (..)

-- imports

main : BenchmarkProgram
main =
    program <|
        describe "Append"
            [ lists
            , hamtArrays
            , uint8Arrays
            , float64Arrays
            ]

lists : Benchmark
lists =
    Constants.sizeScales
        |> List.map (\size -> ( size, List.repeat size 0 ))
        |> List.map (\( size, list ) -> ( toString size, \_ -> List.append list list ))
        |> scale "List"

hamtArrays : Benchmark
-- scale benchmark

uint8Arrays : Benchmark
-- scale benchmark

float64Arrays : Benchmark
-- scale benchmark


Detailed results with plots are available in this document. As can be seen there, the benchmarked functions can be grouped into five categories:

  1. Typed arrays behave roughly like Array.
  2. Typed arrays are generally faster, probably because of some implementation detail.
  3. Like (1) or (2), but with the Uint8 implementation faster than Float64 since it benefits from a lower memory footprint.
  4. Typed arrays are orders of magnitude faster due to a constant-time implementation.
  5. Typed arrays are orders of magnitude slower, due to a full copy of the array.

In category (1) (similar performance) we have:

  • all: return true if all elements satisfy the predicate.
  • any: return true if at least one element satisfies the predicate.
  • initialize: initialize an array with a function of the index.
  • filter: recreate an array keeping only the values satisfying the predicate.
  • foldl: reduce array from the left.
  • length: get the length of data structure.
  • map: map over array to create a new one.

In category (2) (generally faster) we have:

  • equal: the native implementation, restricted to numeric values, helps a lot compared to elm’s == implementation.
  • foldr: reduce the array from the right. The native implementation is faster.
  • get: slightly faster since it runs in constant time instead of log time. Even faster in the unsafe form that does not check bounds.
  • slice first half: I’m not sure exactly how slicing is implemented in Array. It is faster here though.

In category (3) (faster with smaller byte size) we have:

  • append: append two arrays. Faster with Uint8 since it takes a lot less memory.
  • zeros: create an array of zeros.

In category (4) (order of magnitude faster) we have:

  • slice second half: with typed arrays, slicing is in constant time.

In category (5) (order of magnitude slower) we have:

  • set (replaceWithConstant): modify one or several values of an array. Orders of magnitude slower due to the full copy of the array.

In summary, performance is globally better, plus one trade-off: slicing is in constant time, but modifying data is in linear time due to a full copy.
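Both sides of that trade-off come straight from the underlying JS API, as the following sketch demonstrates: subarray is a constant-time view sharing the buffer, while an immutable set must copy the whole array first (`immutableSet` is my illustration, not the package’s actual implementation):

```javascript
const big = new Float64Array(1000);

// Constant time: subarray is a view on the same buffer, no copy.
const secondHalf = big.subarray(500);
const sharesBuffer = secondHalf.buffer === big.buffer;

// Linear time: to "set" one value immutably, the whole array must be
// copied first, which is what replaceWithConstant has to do under the hood.
function immutableSet(arr, i, value) {
  const copy = new Float64Array(arr); // full O(n) copy
  copy[i] = value;
  return copy;
}
const updated = immutableSet(big, 0, 1.5);
```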

Current API Design

Currently, the API is split into 6 modules (+ 2 mutable experiments):

  • JsArrayBuffer: manipulation of array buffers
  • JsDataView: operations on data views
  • JsTypedArray: polymorphic operations on typed arrays
  • JsUint8Array: creation of Uint8 typed arrays
  • JsFloat32Array: creation of Float32 typed arrays
  • JsFloat64Array: creation of Float64 typed arrays

The other integer array creation modules are still missing since I’m waiting for more feedback first. The full API documentation as of commit 1accc47 is available here: doc-1accc47.txt. More details are available in this issue on GitHub. Below is a brief summary of the API.

Typed Arrays

In the typed array creation modules (JsUint8Array, JsFloat32Array, JsFloat64Array), we have:

  • Creation from scratch: zeros, repeat, initialize
  • Creation from existing data: fromBuffer, fromArray, fromList, fromTypedArray, unsafeIndexedFromList
  • Decoding from JS value: fromValue, decode
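Under the hood, creation from existing data maps onto the standard typed array constructors; for instance, the JS equivalent of a fromBuffer-style creation takes a buffer, a byte offset, and an element count:

```javascript
// One 8-byte buffer, two different typed views on (parts of) it.
const buffer = new ArrayBuffer(8);
const allBytes = new Uint8Array(buffer);       // 8 Uint8 elements
const lastFour = new Uint8Array(buffer, 4, 4); // start at byte 4, 4 elements

allBytes[4] = 7;            // visible through both views,
const shared = lastFour[0]; // since they share the same buffer
```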

In the module JsTypedArray of polymorphic functions we have:

  • Types: JsTypedArray a b, Uint8, Float32, Float64
  • Interoperability: toList, toArray, encode
  • Basic requests: length, getAt, unsafeGetAt, buffer, bufferOffset
  • Predicates: all, any, findIndex, filter, and their indexed... equivalents
  • Comparison: equal
  • Extraction and appending: extract, append
  • Array transformations: replaceWithConstant, map, map2, reverse, sort, reverseSort, indexedMap, indexedMap2
  • Array reductions: join, foldl, foldr, foldl2, foldr2, foldlr, and their indexed... versions.

For the reasoning behind the two type parameters a and b, refer to the discussion of the main API issues in the last part of this post.

DataView and ArrayBuffer

JavaScript DataView provides a low-level interface for reading and writing multiple number types in an ArrayBuffer, with control over endianness. The current API in JsDataView does not let you choose endianness; it won’t be an issue to add this control once we figure out the best-fitting API.
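For reference, this is what the endianness control looks like in the raw JS DataView API: the trailing boolean on each getter/setter selects little-endian, and omitting it means big-endian:

```javascript
const buffer = new ArrayBuffer(2);
const view = new DataView(buffer);

view.setUint16(0, 0x0102, false); // big-endian (also the default)
const bigEndianBytes = [view.getUint8(0), view.getUint8(1)];

view.setUint16(0, 0x0102, true); // little-endian
const littleEndianBytes = [view.getUint8(0), view.getUint8(1)];
```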

The JsDataView module provides functions that match the JS implementation almost exactly, except for setters. More on this in the next section.

The JsArrayBuffer API is very short, like its JS counterpart. You almost never manipulate buffers directly, but rather through data views or typed arrays.

API Issues

Two Type Parameters

There are many ways to choose an elm type for JavaScript TypedArray. In discussions with Ian and Francisco, a few forms were mentioned in the comments of the issue about API design:

  1. Fully generic: JsTypedArray. No type parameter at all. The same type is used for all different typed arrays.
  2. Fully specialized: JsUint8Array, JsInt8Array, etc. Each module has its own set of operations.
  3. One elm type parameter: JsTypedArray b where b would be Int or Float.
  4. Two type parameters: JsTypedArray a b where a is a type describing the typed array (Uint8, Float32, etc.) and b is either Int or Float.

Fully generic:

In the fully generic case, the main advantage is that the type is way simpler and functions only need to be implemented once. There would still be different functions to initialize a JS Uint8Array or Float32Array etc.

The main drawback is that functions taking two typed arrays (like map2) could be called with arrays of different underlying types, which would force a lot of manual error handling. This is a critical drawback for performance in functions manipulating multiple typed arrays, like mathematical operations.
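The hazard is easy to reproduce in plain JavaScript, where nothing stops you from combining arrays of different element types and the result silently coerces values (`map2` here is my naive sketch of what a fully generic API would permit):

```javascript
// A naive map2 with no type checking, like a fully generic elm API would allow.
function map2(f, a, b) {
  // The result takes the type of the first argument, whatever the second is.
  const out = new a.constructor(Math.min(a.length, b.length));
  for (let i = 0; i < out.length; i++) out[i] = f(a[i], b[i]);
  return out;
}

const small = Uint8Array.from([200, 200]);
const floats = Float64Array.from([0.5, 100.5]);
// 200 + 0.5 = 200.5 -> truncated to 200; 200 + 100.5 = 300.5 -> 300 -> wraps to 44
const mixed = Array.from(map2((x, y) => x + y, small, floats));
```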

Fully specialized:

In the fully specialized case, all operations are type safe. The main advantage is that it is super explicit what you’re dealing with.

The main drawbacks are that it implies a lot of boilerplate and prevents users from writing a function once that works on multiple typed arrays. Boilerplate is really annoying, and it increases the chances that the documentation gets outdated or that we forget to change something specific to one typed array. Preventing API users from easily writing a function once that works with multiple typed arrays is also not a very good user experience.

One type parameter:

Using one type parameter that would be Int or Float, depending on the compatible elm type, still does not prevent the type safety issues mentioned in the fully generic case. This is probably a half measure that does not solve our main issues.

Two type parameters:

Using two type parameters, like JsTypedArray Uint8 Int, has one big advantage: it solves all the issues of the previous options. The first type parameter a (here Uint8) provides the specific typed array type, and thus lets us prevent type safety issues related to the underlying typed arrays. The second type parameter b (here Int) provides type safety for operations involving both elm types and typed arrays (like fold). The polymorphic aspect enables us to write all polymorphic functions only once. Only functions specific to each type (like array creation) have to be written once per typed array.

The main disadvantage of this approach is programmers’ possible unfamiliarity with types holding two type parameters. Since this drawback is hypothetical while the approach solves concrete issues of the other options, I chose it for the time being.


DataView Setters

In JavaScript, DataView provides a low-level interface for reading and writing multiple number types in an ArrayBuffer, with control over endianness. It has a constructor and three other properties, buffer, byteLength, and byteOffset, which have their equivalent elm functions in the JsDataView module. DataView methods come in two kinds, getters and setters, 8 of each, one per number type.
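Concretely, the getters and setters each take a byte offset into the buffer; reading a small (hypothetical) binary header with the raw JS API looks like this:

```javascript
// 6-byte buffer: a Uint32 "length" followed by two Uint8 flags.
const buffer = new ArrayBuffer(6);
const writer = new DataView(buffer);
writer.setUint32(0, 1024);
writer.setUint8(4, 1);
writer.setUint8(5, 255);

// Getters mirror the JsDataView elm functions: one read per (offset, type).
const reader = new DataView(buffer);
const header = {
  length: reader.getUint32(0),
  flagA: reader.getUint8(4),
  flagB: reader.getUint8(5),
};
```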

Getters have corresponding elm functions. Setters, however, cannot be implemented as such if we want to keep elm’s immutability guarantees. As a minimalist wrapper, I couldn’t figure out how to deal with those setters. So for the time being I created an experimental module, MutableJsDataView. It defines a type MutableJsDataView, which is an alias of JsDataView, and setter functions using this MutableJsDataView type in their signatures. This of course provides no compile-time guarantee that those functions will not be used on data views that are not supposed to be mutated. However, putting them in a separate module, with a different type in the signatures, makes the caller strongly aware of the mutable nature of those operations.

There might be a solution though (or multiple solutions), as suggested by Laurin in the same API design issue: embrace the encoder/decoder API. One would describe how to read/write all data on a buffer, and the actual reading or writing “transaction” would happen once. This has many advantages over the getter/setter approach regarding immutability. It also has one inconvenient consequence: the API would no longer be a minimalist wrapper around the JS equivalents.

Another slightly inconvenient aspect is that whether we use getters/setters or encoders/decoders, we have to specify the positions (the “offsets”) where to read/write data. Yet buffers are sequential data structures, so it is likely that most use cases only need to read/write data sequentially (this, then that, etc.). As mentioned by Ian in the same GitHub issue, the parser approach is well suited for reading such sequential data. I’m not sure what the equivalent of a parser for writing data would be, though.


Mutability

One of the design strengths of elm is the immutability guarantee. However, in some rare cases it can be a performance chasm. One such case is the modification of 2D canvas data at frame rate. With immutability, we are creating a new array of millions of values at roughly 30 Hz, which implies a lot of garbage collection.

Such an example is available in the folder examples/CanvasImageData/. Use elm-make Main.elm --output Main.js to compile it and spawn any static HTTP server to load index.html in your browser. Garbage collection events are noticeable as small freezes in the color animation.

As an experiment, I created a module MutableJsTypedArray containing two functions, unsafeSetAt and unsafeSet. Those functions are not used anywhere else in the API, only in this experiment. You can find the example in examples/CanvasImageDataMutable/. When running this code, you won’t notice any garbage collection freeze, and it globally consumes less memory and CPU.
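The difference between the two examples boils down to where the pixel buffer comes from on each frame. A plain JavaScript sketch of the pattern (`fillFrame` and the buffer size are illustrative, not from the examples):

```javascript
const WIDTH = 256, HEIGHT = 256;

// Immutable style: a fresh ~256 KB array every frame -> GC pressure at 30 Hz.
function renderFrameImmutable(t) {
  const pixels = new Uint8ClampedArray(WIDTH * HEIGHT * 4);
  fillFrame(pixels, t);
  return pixels;
}

// Mutable style: one long-lived buffer, written in place, nothing to collect.
const reused = new Uint8ClampedArray(WIDTH * HEIGHT * 4);
function renderFrameMutable(t) {
  fillFrame(reused, t);
  return reused;
}

function fillFrame(pixels, t) {
  for (let i = 0; i < pixels.length; i += 4) {
    pixels[i] = t % 256;  // animated red channel
    pixels[i + 3] = 255;  // opaque alpha
  }
}
```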

I’m aware that this MutableJsTypedArray module and the MutableJsDataView module from the previous part break elm’s immutability guarantee, so I’m not very satisfied with them. I recently heard of linear type systems, which might provide a solution to a class of problems currently solved by mutability, like the previous canvas 2D example. I wonder if there would be a way to do something similar in elm: describing the transformation and letting the elm runtime (or something else) perform the immutable transformation “in place”.


This is a whole lot to read through, and after doing so I still don’t have the feeling I understand what’s up. Would you mind answering a couple of questions?

  1. What are your next steps? Like “this is the thing I will try next”… is it ready to evaluate?
  2. If it is, what do you want us to help you with? It would be good to write that down as succinctly as possible. There are too many words here for me to say “yeah this is good” or “maybe rethink this bit.”
  3. Are you asking for Evan to approve this native code at this point? If that’s the case, then a couple sentence summary of the whole thing will make it much easier to get to a “yes”, “no”, or “not yet.”

Aside from these things, I note that you’ve set out to create a memory efficient data structure, but you’re benchmarking CPU time. I recognize that elm-benchmark makes it really easy to do time-based benchmarks, but it’s the wrong measurement here. Have you measured the memory efficiency of your implementation versus your comparisons? What have you found?

(ps: In your spreadsheet, the chart is over the data. Could you move it? I can’t move or hide since the link you posted has only viewing permissions.)

Thanks Brian for your questions and sorry if this felt overwhelming. I’ll try to answer briefly :slight_smile:

So basically, I’m trying (when I have some time) to build elm-tensor on top of this as an experiment for a linear algebra package in elm. I’m very far from something usable in elm-tensor though.

elm-js-typed-array, however, is more mature, and I wanted to share it with the community since it could help with unsupported web APIs.

In short, I think the message is: I certainly can’t cover all typed array use cases in my free time, so if you are interested and doing an elm project involving one of those use cases, would you mind testing this package and giving feedback?

Btw, what should I do to make this testable? tag a version? something else?

This clearly isn’t mature enough to involve Evan. I’d rather have more feedback from people interested in using this first for actual projects.

The point about memory and garbage collection was mainly about animating canvas 2D data at the refresh frame rate. And it came from me watching small freezes in the animation and the yo-yoing memory usage in htop. For all the rest, the focus was on CPU performance. Thanks a lot for elm-benchmark btw!

Yes sorry, that’s corrected.


You’ll have to take a different path since your package uses native code. In order for other people to test it, they’ll need to copy all your code into their package and set flags in elm-package.json. It will be messy, and I can’t recommend it in good faith since doing so spreads more native code around the ecosystem.

But I don’t think it’s the right way to proceed. If you want my advice: come up with a minimal API that solves a strong initial use case which can’t be solved without native code. The fact that you don’t feel this is mature enough to seek approval for native publication suggests to me that the need is not quite there yet. Once you have that pair of concrete problem/solution, get feedback here or Slack or whatever, then suggest it on the dev list. It’s better to get that feedback sooner than later.

I sure hope this doesn’t come across as discouragement from continuing on! It’s just that I haven’t seen a successful native project that hasn’t worked like this in one form or another: figuring out what doesn’t work, communicating about that, and suggesting a way around it. Things the language can’t do tend to get prioritized higher than things that might be a slightly nicer iteration of what we already have.

But more than that, thank you for taking your time to do this work! You’ve already got together a great deal of research that will help you move forward! In particular, you might already have a good use case in elm-tensor, but I wouldn’t jump to native code until you can demonstrate a real need for this particular solution. A poor user experience with animation might be a similarly blocking case.


Well, my goal certainly isn’t to spread native code but, in some sense, to reduce it :). Once typed arrays and array buffers are in the language, it will reduce the amount of native code out there, including in libraries such as elm-community/webgl. There are things not doable in elm yet, namely File, Blob, and everything else involving ArrayBuffer over sockets, HTTP requests, or multimedia. I’ve read on Evan’s roadmap that “Expanding web platform support is a high priority, just behind single-page apps”, so I wouldn’t be surprised if work on all this starts right after 0.19 is out. I’m simply (optimistically) hoping that this initiative can help when it’s time to introduce array buffers and co. into the language.


Hmm, that was probably the least important part of what I said. :thinking:

I agree with your motivation! This would be really nice to have, and enable some pretty cool things. But saying “this is necessary!” needs at least one extremely solid use-case to back it up. That was the point of my post. Find that, and you’re in really good shape (and, bonus, you’re on the right track IMO!)

Here’s an important question: “Should Elm ever expose bindings to Typed Arrays?”

Elm aspires to target more platforms than JavaScript in the future. Typed Arrays are a language feature unique to JavaScript, and they cover a lot more use cases than just Canvas. Is it best for Elm’s long-term ecosystem to have mathematical packages built on top of a JS API, forever coupled to JS and inaccessible to people who want to do that sort of math on, say, Elm servers that compile to LLVM or BEAM?

I think the default answer is that Elm should never expose bindings to Typed Arrays. Rather, it makes more sense to have Typed Arrays be an implementation detail of libraries which require them (e.g. Canvas, Blobs, perhaps certain low-level math libraries like the use case in OP). Possibly someday the Elm compiler could situationally emit JS that uses typed arrays under the hood as a performance optimization. Who knows?

I could be wrong about this, of course…but I would not make the opposite assumption lightly!


I think this may be the intended direction for this typed arrays library.

In order to try out typed arrays without modifying the Elm compiler, Matt has had to bolt them on in such a way that to use them, you definitely know you are using them.

A cleaner way would be for the compiler to automatically recognize uses of Array Int or Array Float and segue into the typed array code for those cases, while continuing to treat other types of Array in the normal way. I believe such things are called compiler intrinsics. Doing this would require modifying the compiler, which is a lot more work. In the JS version of Elm, they would map to typed arrays; in some future LLVM version, they would map to some other kind of fast/compact array.

So the way I see this working is as a testing ground for trying out typed arrays in Elm, understanding whether they can be made to behave nicely and perform well when fitted to the same API structure as Array.

They are useful for interacting with other Web APIs (to add Blob, Elm perhaps needs to add a Byte type?), although not essential. We can always convert an array to a typed array to interact with some API that needs the typed array.

The idea is to optimise Elm Array -> Typed Array -> Some Web API, to just Elm Array -> Some Web API. What Web APIs are there that could benefit from this optimisation?

Thanks Brian, I’ll continue to experiment with elm-tensor, see where it gets me.

I wrongly assumed that since they are needed for many web APIs, they would end up available in the language. But as you say, they could very well stay an internal implementation detail, only available to core. I’m not sure this devalues the work I’m sharing here, though.

Regarding why I chose typed arrays for my experiment with elm-tensor: mostly because I “need” (from a performance point of view) constant-time slicing and reading of arrays to implement most linear algebra base functions. The API proposed here is extremely similar to that of Array for the reason you mention: should elm gain such a data structure, it would be easy to adapt.

In Skinney/elm-array-exploration, Array.Hamt is based on a thin layer of native code (JsArray.js and JsArray.elm). I think of this experiment as analogous to the thin layer of native code in Robin’s package. Perhaps a type called NumericArray (analogous to Array.Hamt), providing the constant-time slicing and reading guarantees, could be introduced independently of the actual platform implementation.

Point taken. Again my goal here is not to hurt the elm ecosystem, simply to share my thoughts and experiments on tools for a transformation (growing the web platform) that will happen way before elm targets another platform.

As you and Richard mentioned, Elm doing performance optimization under the hood is possible and would be awesome. Yet elm Arrays and typed arrays behave fundamentally differently, not having the same complexity on operations. So something like automatically using typed arrays for numeric values would not be beneficial.

If you refer to this blog post from 2012, “Typed Arrays are a relatively recent addition to browsers, born out of the need to have an efficient way to handle binary data in WebGL.” So I think they are probably beneficial everywhere they were introduced.

PS: please keep in mind that I don’t have any elm programmers around me (though I’m helping organize a meetup in my area), so sharing my experiments online is the only way I can get feedback on them. If I had communicated earlier about my experiment with elm-tensor, saying something like “hey, I’d love to have constant-time slicing and reading of arrays to do this”, I wouldn’t have done anything at all since this would not happen anytime soon. Timing and communication are hard things to figure out when you practice elm on your own.

Totally! I don’t want to discourage you - just want to share my top-of-mind thought. :smiley:


But if you built NumericArray as you suggested above, as a HAMT on top of typed arrays, then they would be almost the same.

I think, though, that web APIs consuming typed arrays expect to be given a single contiguous array? Or is this not the case? I think the HAMT structure works with arrays of 16 (or is it 32?) elements, so that a write to the array only requires copying those 16 elements plus some of the structure above them.

To pass such an array to a Web API expecting a single contiguous typed array would require appending all those 16-element chunks together. At that point, you may as well just be working with non-typed arrays, since the conversion could happen then.
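That flattening step is itself a linear-time copy in JS, done with set at increasing offsets (tiny chunks here just for illustration; real HAMT leaves would be 16 or 32 elements):

```javascript
// Hypothetical HAMT leaves: small fixed-size chunks of the logical array.
const chunks = [
  Uint8Array.from([1, 2]),
  Uint8Array.from([3]),
  Uint8Array.from([4, 5, 6]),
];

// One O(n) pass builds the single contiguous array a Web API expects.
function flatten(chunks) {
  const total = chunks.reduce((n, c) => n + c.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const c of chunks) {
    out.set(c, offset);
    offset += c.length;
  }
  return out;
}

const contiguous = Array.from(flatten(chunks));
```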

@rupert Seen that way, yes, it is not very useful. It would be simpler to just work with Array.

What I actually meant by "analogue to Array.Hamt" was that it would be the pure elm module exposed, internally based on _ (put here JsTypedArray or another implementation with the same complexity guarantees), just like Array.Hamt is the pure elm module exposed, internally based on JsArray.

So what advantages would NumericArray have?

While I wouldn’t want to tie Elm to JavaScript’s typed array API, the concept of a single chunk of memory storing values in a linear, contiguous way exists everywhere (usually just as ‘an array’!), and I think it would ultimately be worth having in Elm as an alternative to Array with different performance characteristics. I personally like the name Buffer, but I’m not picky =) I think this means that something like this JsTypedArray package could then be one of two things:

  • An internal implementation detail for the Buffer implementation on the JavaScript backend (use as is but write a backend-agnostic layer on top of it)
  • A useful reference on/example of how to use the JavaScript typed array API from Elm code, for whenever a Buffer type gets implemented

Buffer would be a good name for it.

There would then need to be a set of transformation functions Array -> Buffer and Buffer -> Array to convert between them.

One disadvantage of having essentially 2 versions of array would possibly be having to write the same functions many times to deal with the different combinations.

For example, suppose I dynamically built up a list of vectors to be rendered, but then transform that into a 3d perspective using a matrix that is the same each time. It is going to be more efficient for the list of vectors to be an Array, but the transformation matrix to be a Buffer (or a slice of a Buffer). So now I need a linear algebra library that supports all the combinations for matrix multiplication: Array x Buffer, Buffer x Buffer, and so on.

Is there some way they could be allowed to be the same thing by introducing another pseudo-type class like we have for comparable and number?

My other suggestion was that they really both be the same type, Array, and the compiler automatically figures out what underlying representation is best. But that sounds very hard to do.

@rupert Yes, interop with other elm data structures is important.

If they are the same type, one way the compiler could decide which implementation to use would be to analyze the operations performed and choose accordingly. There might be no good choice, though.

As Ian mentioned, a buffer-like data structure would be a welcome addition to the language. And I think programmers can choose the adequate data structure if they are aware of the trade-offs.

In your case, I think there would be no issue. If you are dynamically building a list of vectors asynchronously, without knowing how long it will take, adding one interop conversion at the end is not a big deal. If, however, you are building it incrementally but deterministically, there should be a creation function enabling you to do it without an intermediate conversion. Anyway, this is very hypothetical, and the best way to sort it out would be to have a concrete problem, like Brian said.

Ok, that sounds right. So if you were to write an optimized linear algebra library it would work over vectors and matrices implemented on top of Buffers. No need for there to be a mixed mode.

I think such a library would need to be implemented in native code. For example, when multiplying 2 matrices, the result would be written to the output matrix using Buffer.set many times if done in Elm, and Buffer.set is going to copy the whole Buffer. A native implementation would read from the 2 input matrices, build up the answer in a mutable array, then return it as a Buffer to the Elm program.
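A sketch of what such a native implementation would do, accumulating into one mutable Float64Array and handing it back in a single step (row-major square matrices; `matMul` is my illustrative name, not a real API):

```javascript
// Multiply two n x n row-major matrices stored in Float64Arrays.
// The output array is mutated freely here, then returned once to Elm land,
// so no intermediate immutable copies are ever made.
function matMul(n, a, b) {
  const out = new Float64Array(n * n);
  for (let i = 0; i < n; i++) {
    for (let k = 0; k < n; k++) {
      const aik = a[i * n + k];
      for (let j = 0; j < n; j++) {
        out[i * n + j] += aik * b[k * n + j];
      }
    }
  }
  return out;
}

// [[1, 2], [3, 4]] x [[5, 6], [7, 8]] = [[19, 22], [43, 50]]
const product = Array.from(matMul(2, Float64Array.of(1, 2, 3, 4), Float64Array.of(5, 6, 7, 8)));
```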

What would the API for creating Buffers look like? I imagine something like this:

intBufferFromArray : Array Int -> Buffer Int
floatBufferFromArray : Array Float -> Buffer Float

and the rest of the API is just like Array.

You could have (emptyIntBuffer : Int -> Buffer Int), but using .set to fill it in is going to be inefficient compared with using Array.set and then converting, so I think just the Array constructors are needed.

Yep, if you have some time, have a look at what I’m trying to do with mpizenberg/elm-tensor. It’s still a bit rough, no readme and all, but you can get a pretty good idea of what I’m doing from this roadmap issue and this Tensor data structure, much inspired by numpy arrays:

type alias Tensor =
    { data : DataArray
    , dimension : Int
    , length : Int
    , shape : List Int
    , view : TensorView
    }

type TensorView
    = RawView
    | TransposedView
    | StridesView (List Int)

type alias DataArray =
    JsTypedArray Float64 Float

I’m trying as much as possible to avoid any native code in elm-tensor, which is why it influenced a bit the functions I made available in JsTypedArray, like the one called unsafeIndexedFromList. It basically enabled me to avoid using a setter for tensors that have a StridesView, i.e. non-contiguous relevant data in the underlying buffer. I have a few local experiments not committed or pushed since I’m a bit short on time these days. We can discuss more on slack or in elm-tensor issues if you want.
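For readers unfamiliar with numpy-style views, the idea behind StridesView is that transposing or slicing only changes how an (i, j) index maps into the flat buffer, with no copy. A minimal JS sketch (helper names are mine, not from elm-tensor):

```javascript
// A 2 x 3 row-major matrix in one flat buffer.
const data = Float64Array.of(1, 2, 3, 4, 5, 6);

// Element (i, j) under given strides: index = i * strideRow + j * strideCol.
const at = (strides, i, j) => data[i * strides[0] + j * strides[1]];

const rowMajor = [3, 1];   // plain view of the 2 x 3 matrix
const transposed = [1, 3]; // 3 x 2 transpose: same buffer, swapped strides

const original = at(rowMajor, 0, 1);  // row 0, col 1 of the matrix
const flipped = at(transposed, 1, 0); // row 1, col 0 of its transpose: same cell
```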

It depends a lot on whether we make the distinction between a raw buffer that has no type, just a chunk of memory like ArrayBuffer, and “typed” buffers like Buffer Int etc.

I hope to have some more detailed response to the points in this thread later… But for starters:

I have a very concrete use case for something like this: I have a moderate elm application for managing audio synthesizers. The app must handle large buffers of binary data in two cases:

  1. Messages sent via MIDI which can be 10k bytes or longer. These must be built up or parsed byte by byte.
  2. Sampled audio buffers, which can easily be over 1M byte. These must be chunked up or joined together, written to/from files, and occasionally computed over.

Today, I bridge via ports between the JS Web APIs that deal in Uint8Arrays and elm’s Array Int. For the first use case, this has been mostly workable; manipulation of Array Int isn’t so bad... And clearly this will work in 0.19.

But the second use case gets trickier. I’m dubious of manipulating 1M+ byte sample buffers as Array Int - though perhaps I’ll be surprised when I finish this part.

I’m totally good with immutability of the arrays… but size and speed are a bit of a concern.

Note: The app, with the first use case, has been out for a few months, and is in use by over 500 users. It is a manager for Elektron synthesizers. The sample transfer features (second use case) is in development for release this month.