I wrote a blog post about how Elm’s first-class functions could work in WebAssembly, if/when the compiler targets it in the future. Let me know your thoughts on these ideas! Are there other techniques I should look at? Do you know any good resources on this stuff?
I have some ideas for further posts, listed at the bottom of this one. Let me know if any of them are of interest to you!
Very interesting read, I’d be very interested in whether or not a WASM target for Elm would really need a vanilla garbage collector? Maybe the immutability lends itself to a simple reference counting mechanism? Looking forward to your next posts, representing extensible records should be interesting
Thanks!
There’s a proposal to provide some access to the existing browser GC, and they’re looking at different language types, including typed functional ones. Will be interesting to see what they come up with!
Elm has some roots in SML, which is a statically typed and eagerly evaluated functional language with full type inference. A lot of theoretical work on optimising such a compiler was put to use in the implementation of MLTon.
And if you want a set of resources to read up on, I guarantee you many a good nights sleep with this bedtime reading list (plenty papers on GC in there):
On the subject of code generation, why write your own? Ok, it is interesting, but if you just want to get the job done I would look at the WASM back-end for LLVM. You could also compile Elm to x86 assembler if a front end to LLVM bytecode were written. LLVM also provides a framework for implementing garbage collectors. I would say that LLVM shrinks the job of writing a compiler back-end by 95+% compared with writing your own from scratch.
I guess I just wanted to understand how WebAssembly works and know exactly how I’m mapping to it. Also I’m just not familiar with LLVM so didn’t really think of it or realize it was an option! Maybe I’ll create a branch to write some stuff in LLVM and compare.
To be honest I haven’t found it all that hard writing Wasm though. Wasm is definitely higher-level than “real” assembly languages for hardware. I feel like the name makes it sound scarier than it really is.
For example the text format actually has nested S-expressions, which is a complete game changer because it makes everything composable. I made a Haskell AST and DSL for it and it’s pretty decent to work with. Sure, it’s lower level than the JavaScript AST Evan uses, but the difference isn’t as huge as you would think. It’s not quite Lisp, but not quite assembly either!
But I’ll check it out for sure. Sounds like it has a lot of nice things in it. Maybe it’ll make things way easier.
Well I agree it is good to carry out these experiments to understand things better.
LLVM provides a lot, but you still have to decide how you will encode things on the heap, how you will construct continuations, how and when you will collect garbage and so on, and LLVM doesn’t decide any of that for you, so in no way invalidates what you are doing. Once you have a design for how the whole virtual machine (the correct term is actually ‘abstract machine’) is layed out in a flat memory space, coding it up in LLVM is on a par effort wise with coding it up in Wasm, and the design would translate accross easily enough.
There is some work in progress to compile Haskell to Wasm at https://github.com/tweag/asterius . Seems very limited for now, but there are already some bindings to bynaryen. Might be less scary than pure LLVM.
The section “Why not just use LLVM bitcode as a binary format?” bit in the WebAssembly FAQ at https://webassembly.org/docs/faq/ explains some of the differences. I think the appeal of LLVM to me is the way it supports multiple back-ends, and that it gives you a lot of optimisations out of the box. So you can get pretty well optimised code out of it, and support for lots of architectures at a low cost. Webasm is a slightly unusual back-end for LLVM in that it is also a VM with its own JIT compilation step and optimisations, so that does slightly undermine the value of LLVM. However, I’d prefer to put my effort into a compiler front end that would give me both Webasm and optimized x86 code as outputs.
At the moment I am compiling Elm on the server side by running it as Javascript through the Java Nashorn engine. I could run it on NodeJS but as my server processes tend to be Java I’d have to call out to NodeJS in a different process, so even though the Nashorn engine is not the fastest, it works out fairly efficiently through the JVM. If I could compile Elm to x86 through LLVM, then I could call into well optimised Elm code running in the same process as my Java code. Also javascript is not so strongly typed and eliminating it as an intermediate language would have real benefits to Elm.
I understand your angle, you seem to have given it quite a lot of thoughts. Do you believe that a simple syntax like Elm’s can compile to bare metal without the hassle of fine tuning at the compiler level ? To me the biggest barrier to languages like Rust Haskell or Ocaml were compiler flags, options and/or annotations. Good compiler guidance could change that eventually, but I have yet to encounter such experience.
I don’t mean that the a compiler would necessary be free of options when I say that “it gives you a lot of optimisations out of the box”. What I mean is, there are lots of optimisations already written for LLVM, you just pick which ones you want to run rather than figuring out from scratch how to optimize a code generator.