TL;DR: Performance is important for WebGL. Repeated allocations harm performance. Recycling variables can help.
I’ve worked as a WebGL graphics engine engineer for almost 5 years, and performance was a constant issue. Between puke-free VR, better mobile experiences, and just fancier graphics, there was always a reason to chase better performance. This is why I’ve never done much with WebGL in Elm, despite liking both Elm and graphics: a language that relies on constant allocation and garbage collection seemed like a bad start for a “serious” WebGL application. Missing render-to-texture (RTT) would also have been a problem, but that’s a different matter.
The issue resurfaced for me a few months ago at a presentation by unsoundscapes about his physics engine, and in particular the fact that it was faster with record-based linear algebra than with the elm-explorations linear algebra package.
The reason for this was obvious to me: allocating objects and arrays is faster than allocating typed arrays in JavaScript. I used to choose between regular and typed arrays on a case-by-case basis, depending on their longevity. Here is a test to show the difference:
var nIterations = 5000000;

// 1. Plain object allocation
var t0 = performance.now();
for (let i = 0; i < nIterations; ++i) {
  var v3 = { x: 1, y: 2, z: 3 };
}
var t1 = performance.now();
console.log("Object creation took " + (t1 - t0) + " milliseconds.");

// 2. Typed array allocation from an array literal (as the Elm kernel does it)
var t2 = performance.now();
for (let i = 0; i < nIterations; ++i) {
  var v3 = new Float32Array([1, 2, 3]);
}
var t3 = performance.now();
console.log("Typed array creation the Elm way took " + (t3 - t2) + " milliseconds.");

// 3. Typed array allocation followed by direct element writes
var t4 = performance.now();
for (let i = 0; i < nIterations; ++i) {
  var v3 = new Float32Array(3);
  v3[0] = 1;
  v3[1] = 2;
  v3[2] = 3;
}
var t5 = performance.now();
console.log("Typed array direct creation took " + (t5 - t4) + " milliseconds.");

// 4. Regular array allocation
var t6 = performance.now();
for (let i = 0; i < nIterations; ++i) {
  var v3 = [1, 2, 3];
}
var t7 = performance.now();
console.log("Regular array creation took " + (t7 - t6) + " milliseconds.");
Resulting in 1120ms, 4070ms, 5232ms and 1578ms. In other words, allocating plain objects is a lot faster. Typed arrays, on the other hand, are faster for computation, especially when SIMD instructions are available.
In Elm, every computation comes with an allocation, and generally speaking allocation is the expensive part of a program, since generic memory managers are terribly complex.
Again, I did a little experiment to illustrate how much of a difference this makes. Setting the x value of 5 million vectors in an Elm program takes 3637ms. I did a bit of cheating to see how far this can be reduced by removing the allocation:
// Original kernel implementation: allocates a new vector on every call.
/*var _MJS_v3setX = F2(function(x, a) {
    return new Float64Array([x, a[1], a[2]]);
});*/

// "Cheating" version: mutates the existing vector in place, no allocation.
var _MJS_v3setX = F2(function(x, a) {
    a[0] = x;
    return a;
});
We get 827ms, the cost of just the computation we’re interested in (plus the surrounding loops and so on). Of course, setting the x value is one of the least computationally intensive linear algebra operations there is, but I don’t think I need to prove that allocation is relatively expensive; this is a well-understood property. In fact, prominent JavaScript linear algebra libraries use out arguments to avoid allocations. C libraries, on the other hand, tend to pass vectors by value, avoiding the heap altogether.
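To illustrate the out-argument pattern, here is a minimal sketch in the style of libraries like gl-matrix (the function name is made up for the example, not an actual library API):

function v3add(out, a, b) {
  // The caller supplies the destination vector, so the hot loop allocates nothing.
  out[0] = a[0] + b[0];
  out[1] = a[1] + b[1];
  out[2] = a[2] + b[2];
  return out;
}

// Usage: allocate the result once, reuse it every frame.
var result = new Float32Array(3);
var a = new Float32Array([1, 2, 3]);
var b = new Float32Array([4, 5, 6]);
v3add(result, a, b);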
In short: I would not consider Elm for any “serious” WebGL work because of how much its allocation-heavy const-type implementation hurts performance. I love Elm, but this is just not currently its strong point.
What to do?
I see a few potential solutions, there may be more.
1. Do nothing
WebGL does not have to be an important part of Elm. In that case, I think it would be fair to put a warning in the WebGL library documentation discussing this performance issue, to avoid disappointing people down the line.
2. Replace native linear algebra with records
Not actually a solution, as allocation is still expensive, but it would alleviate the problem a bit.
3. Cheat with allocation
There’s a bunch of optimizations you could do to allocations. When I worked on WebGL professionally, I kept pools of pre-allocated objects and pulled objects from them (and pushed them back) instead of allocating. Elm could do this using reference counting. It might be difficult to figure out which types to use these pre-allocated pools for; doing it only for small product types (such as vectors) might be a good heuristic. Profiling and annotations could expand the use cases further, but that doesn’t sound very Elm-like.
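For illustration, here is a minimal sketch of such a pool in plain JavaScript (the names and the Float32Array representation are assumptions for the example, not how Elm’s kernel works today):

var _v3pool = [];

function v3acquire(x, y, z) {
  // Reuse a released vector if one is available, otherwise allocate a new one.
  var v = _v3pool.length > 0 ? _v3pool.pop() : new Float32Array(3);
  v[0] = x;
  v[1] = y;
  v[2] = z;
  return v;
}

function v3release(v) {
  // The caller promises not to touch v after releasing it.
  _v3pool.push(v);
}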
4. Introduce linear types
Linear types are ‘consumed’ when passed to a function. They are old and well understood, but they don’t fit into many languages. Rust is built around them, Haskell recently got them, Idris has them as uniqueness types, and even C++ has something similar in the form of unique_ptr.
I believe Elm is sufficiently clean and simple that it could leverage linear (or rather, uniqueness) types without any real issue.
Essentially, if a value is consumed, you can safely recycle its allocated memory inside the function, removing the need for allocation and garbage collection.
The downside is the introduction of a new concept to the language, possibly requiring new syntax; the upside is that it would work with anything. As a bonus, this could be used for RTT, safer random seeds, and similar inherently ‘consuming’ operations.
5. Pretend some const types are linear
If a const type is used in only one place (let’s call this unique), it can safely be recycled, same as a linear type, as nothing else has access to that memory location.
Const types can become unique by making a deep copy whenever a new reference is required. Again, this could work well with small product types, although I’d expect programmer annotations or profiling to be more powerful. This solution can be easily combined with #3, removing the need for reference counting.
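As a rough sketch of what generated kernel code might look like under this scheme (entirely hypothetical; none of these functions exist in Elm today):

// When the compiler can prove `a` is unique (used in only one place),
// the vector can be recycled in place instead of allocating a new one.
var _MJS_v3setX_unique = F2(function(x, a) {
  a[0] = x;
  return a;
});

// Where uniqueness cannot be proven, a deep copy restores it.
function _MJS_v3copy(a) {
  return new Float64Array(a);
}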
Finally
To clarify: I’m not saying Elm should do X or that it is doing Y wrong. I simply enjoy theorizing about FP and performance. I would love to hear some thoughts on this.
I have a few more ideas about cool (IMO) stuff you could do with/to Elm code; if this kind of post is undesirable, please let me know and I will keep it to myself in the future.