Eco - Compile Elm to native code

I am delighted to be able to share my new Elm compiler project with you today.

eco - Elm Compiler Optimized - is a new optimizing compiler infrastructure for Elm. Here is the introduction: eco - introduction

eco is fully open sourced today on GitHub: eco-lang/eco-compiler (eco compiler for elm)

Some things about eco that may interest you:

  • Self-compiling “bootstrapped” compiler.
  • Front-end in Elm, back-end in C++. No JavaScript and no Haskell.
  • Generates x86 native code via MLIR and LLVM (arm64 for Mac soon).
  • Generational Garbage Collector
  • Aims to be 100% Elm compatible (core, json, bytes, http, regex, url, parser, time, already implemented)
  • Can run existing Platform.Worker + NodeJS programs, with Elm.init and ports in JavaScript linking to a native binary for the Elm program.
  • Int compiles to native 64-bit integers
  • There are > 150,000 elm-test tests against the front-end compiler pipeline.
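To illustrate the Int point in the list above: JavaScript numbers are IEEE 754 doubles, which represent integers exactly only up to 2^53, whereas a signed 64-bit integer covers the range up to 2^63 - 1. The Python sketch below only models the two number representations. It is not eco code, it says nothing about how eco handles overflow, and the helper name fits_in_int64 is made up for the example.

```python
# Illustration only: models JS-style doubles vs. signed 64-bit integers.
# Nothing here is eco code; fits_in_int64 is a hypothetical helper.

INT64_MIN = -(2 ** 63)
INT64_MAX = 2 ** 63 - 1


def fits_in_int64(n: int) -> bool:
    """Return True if n is representable as a signed 64-bit integer."""
    return INT64_MIN <= n <= INT64_MAX


big = 2 ** 53 + 1

# A double cannot hold 2^53 + 1 exactly, so the round trip changes the value.
as_double = float(big)
double_is_exact = int(as_double) == big

# The same value is well inside the signed 64-bit range.
int64_is_exact = fits_in_int64(big)

if __name__ == "__main__":
    print("exact as double:", double_is_exact)
    print("fits in int64:", int64_is_exact)
```

Programs that rely on integers above 2^53 (large counters, 64-bit hashes, nanosecond timestamps) are the ones most likely to notice the difference.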

What are the long-term goals here?

  • A multi-threaded Elm runtime with parallel Tasks and TEA Actors
  • Support for servers with high core counts and large RAM - eco supports 8TB address space.
  • Throughput and latency to match C++/Rust/Zig
  • Ref counting and optimistic mutation optimizations to replace GC.
  • Keep the gradual migration path from existing 0.19 Elm programs on NodeJS
  • Potential pathway to compiling web apps to WASM.

Ok, those are some grand claims and there is a long way still to go to achieve them, but I wanted to share my vision for what I believe will be possible.


The version being made public today is 0.1.0 alpha.

Please note that the alpha label is a suggestion that you should interpret this work as experimental today. There will be bugs, possibly even severe ones. Please find some bugs and tell me about them in the GitHub issues. This is the whole point of making an alpha release and starting the journey of the code from my computer to meet the real world.

Some limitations of eco 0.1.0 alpha:

  • Single threaded, including the whole compiler pipeline, so the compiler is not fast.
  • Runtime performance is moderate - roughly the same as running under NodeJS.
  • Minimal IO API (to support the compiler).
  • GC pause times are significant: the collector is single threaded, and your program thread does the collection work.
  • “hello world” binary is 8MB in size - mostly debug symbols. Needs debug stripping and not linking unused kernel code - should be around 100KB after that.
  • 0.1.0-alpha release only supports Linux x86. Windows and Mac builds will follow very soon.

It is risky to make an early release such as this if people are expecting higher performance because it compiles to native. Please be understanding of the early version number and experimental nature of this project at this time.

I am confident that performance can be greatly improved. All the techniques are known; it is just a question of putting in the work to implement them. Right now there are so many levers to pull to make it faster that the harder part is choosing which ones to pull first.

eco has its compiler front end implemented in Elm. This part started its life as the Guida elm-in-elm compiler. eco adds a new pipeline to that codebase for an optimizing compiler pass that generates native machine code. The back-end code generation part of eco is implemented in C++ and built on top of LLVM and MLIR. There is a well defined bytecode at the boundary between front-end and back-end, called the “eco dialect”.

The compiler is already capable of self compiling. This is a 160K LOC codebase, some of which makes heavy use of continuation passing style. There are over 150,000 tests applied to the code, and over 1000 end-to-end Elm test programs that try to cover many corner cases of the language itself. So you should find that the compiler is well tested and capable of handling large and complex programs.


I began working on eco in November last year. Since this is the second Elm-to-native compiler you are hearing about in just a few short weeks, it is worth mentioning that eco is an entirely separate and independent project. eco builds a new compiler pipeline that branches off at the type checking phase to enable more high level optimisations that operate at the language level and make use of type information - such as monomorphization.


Full disclosure - most of the code is implemented by AI, Claude Code specifically. It is not “vibe coded”; there is a strong software engineering and design element driving that code generation. Without AI this project could have taken at least 2 or 3 years to implement. The structure of the code is good and not a mess. My current feeling is that I should use less AI during a tidy-up phase of the code, in order to ensure that the kinds of mistakes that AI can make get reviewed out. But I want to be transparent about this in case you are against AI - you might not like what you find here!


I am starting again.

As I said on the other thread, for the last 6 months I worked 40/50/60 hours a week to make this. It is my original work in its entirety but was coded with the assistance of Claude. Please respect the amount of work I have put into this instead of just attacking what I have done. Only a very unkind person would do that.

Let's maybe have some questions about the code, the ideas, future plans. Or perhaps you have tried it out and it is not working, or you need some help?


I posted this before because I really enjoyed this talk. I worked for John O’Hara about 20 years ago - he was the one who started AMQP (the protocol that RabbitMQ implements).

This talk made me have a lot of thoughts about the kinds of software systems we could build on modern CPUs.

When I think about the threaded actor model, CPU affinity and cache aligned architectures, I also think about how we could do this with Elm. How could we take advantage of this hardware without compromise, yet build in an aesthetically brilliant language (Elm) and use domain modelling to build software that “makes illegal states impossible” at the same time? That is the goal that I am aiming for, although it is still a long way off.

So now you know why I posted about Elm + threads: If Elm had threads..?

And there is an experimental API and simulation of it on GitHub: rupertlssmith/elm-with-threads (What would Elm with threads be like?)

The eco runtime is currently single threaded only. Your program thread even runs the garbage collection! I am already thinking about how to introduce threads into the implementation. The API side seems fairly clear, but the implementation side has open questions, such as when heaps should be shared between threads and when they should not, given the goal of the highest possible event throughput between threads.

Perhaps it is better to tackle reference counting first, and eliminate much of the GC work.

Exciting to see this announced! I was wondering if it supports reproducible builds (i.e. is the binary produced for a particular CPU architecture deterministic)?

Yes, it should be.

There is a step in the build pipeline where the compiler builds itself and then compares the output to the original to check they are binary equal. For a long time only the MLIR bytecode was passing that test because the binaries contain symbol tables that were not ordered deterministically. That should be fixed now, and the actual executables are deterministic.
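As a rough illustration of the kind of comparison described above, here is a minimal Python sketch that checks whether two build outputs are byte-identical and also prints a digest that can be logged. It is not taken from the eco build scripts; the function names and the file names in the demo are hypothetical.

```python
# Sketch of a reproducibility check between two build outputs.
# Not the eco build script; all names and paths here are hypothetical.
import filecmp
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def builds_are_identical(first: Path, second: Path) -> bool:
    """True when the two files have exactly the same bytes."""
    # shallow=False forces a content comparison instead of trusting
    # file size and modification time.
    return filecmp.cmp(first, second, shallow=False)


if __name__ == "__main__":
    # Demo with stand-in files; a real check would point at two builds.
    with tempfile.TemporaryDirectory() as tmp:
        build_a = Path(tmp) / "build-a.bin"
        build_b = Path(tmp) / "build-b.bin"
        build_a.write_bytes(b"same bytes")
        build_b.write_bytes(b"same bytes")
        print("identical:", builds_are_identical(build_a, build_b))
        print("digest:", sha256_of(build_a))
```

Anyone wanting to verify reproducibility independently could build the same source twice on the same architecture and compare the results in this way.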


Somebody at Elm Camp asked me about the compiler bootstrap process, which I did not do a great job of explaining. It is quite complicated, so here it is:

Stage 1 - The stock npm Elm compiler compiles the Eco source to JS using XHR-based IO, without --optimize (the XHR Eco.Crash uses Debug.todo, which --optimize does not allow).

Gate A - Run the E2E suite through Stage 1’s output to validate the frontend + MLIR-codegen + runtime + JIT stack.

Stage 2 - JS self-compiles with kernel IO enabled (Eco.Kernel.*, enabling --optimize with Eco.Crash), producing eco-boot.js.

Stage 3 - eco-boot.js compiles itself to eco-boot-2.js.

Stage 4 - eco-boot-2.js compiles itself to eco-boot-3.js, then diffs eco-boot-2.js vs eco-boot-3.js; they must be byte-identical (JS fixed point reached).

Gate B - Run the AOT E2E suite: compile each test via eco-boot-2.js, lower to native ELF via eco-boot-native, run, and check stdout against -- CHECK: patterns.

Stage 5 - The fixed-point-verified eco-boot-2.js compiles itself to MLIR, producing eco-compiler.mlir.

Stage 6 - eco-boot-native lowers eco-compiler.mlir (Eco dialect → LLVM dialect → LLVM IR → object) and links with runtime + C++ kernel static libs into a native x86-64 ELF, eco-compiler.

Stage 7 - The native eco-compiler self-compiles to MLIR, which eco-boot-native lowers into the bootstrapped native executable eco-compiler-boot.

Stage 8 - eco-compiler-boot self-compiles again and is lowered, yielding eco-compiler-boot-2, which is compared byte-for-byte against eco-compiler-boot (native fixed point).

Stage 9 - Fuse the front-end (eco-compiler) and the lowering back-end into a single user-facing eco binary.

Stage 9b - eco self-compiles to eco-2; a successful self-compile is the success criterion.

JIT = Just In Time LLVM runner.
XHR = XMLHttpRequest hack, a way to write new Tasks for Elm by hijacking elm/http.
AOT = Ahead Of Time, that is, compilation to native binary.
CHECK patterns = Comments in Elm E2E tests that state what the expected test output should be.
E2E = End to End test suite
LLVM = Low Level Virtual Machine
MLIR = Multi-Level Intermediate Representation, lowered to LLVM.

These stages show how we go from the Elm JS kernel only, through the Eco JS kernel, and then to the Elm+Eco C++ kernel implementations.

Once the compiler self-builds, we could just go eco → eco → eco from that point onwards. But it remains important to keep the entire pipeline running for some time. For example, maybe there is some bug in the fixpoint compiler that cannot be removed, and we need to go right back to the original Elm compiler to fix it. Unlikely, but sensible to keep the road open anyway.
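To make the CHECK patterns used in Gate B a little more concrete, here is a small Python sketch of how such a check could work. It is an illustration and not the eco test runner: it assumes each pattern is a line comment of the form -- CHECK: text, and that every pattern must appear in stdout as a plain substring, in the order written (similar in spirit to LLVM's FileCheck). The real matching rules may differ, and the function names and the example program are hypothetical.

```python
# Sketch of checking program output against "-- CHECK:" comments.
# Assumption: patterns are plain substrings that must appear in order.
# This is not the eco test runner; all names are hypothetical.
import re
from typing import List

CHECK_LINE = re.compile(r"^\s*--\s*CHECK:\s*(.*?)\s*$")


def extract_checks(elm_source: str) -> List[str]:
    """Collect the expected-output text from every CHECK comment."""
    checks = []
    for line in elm_source.splitlines():
        match = CHECK_LINE.match(line)
        if match:
            checks.append(match.group(1))
    return checks


def stdout_satisfies(checks: List[str], stdout: str) -> bool:
    """True if every expected string occurs in stdout, in order."""
    position = 0
    for expected in checks:
        found = stdout.find(expected, position)
        if found == -1:
            return False
        # Continue searching after this match to enforce ordering.
        position = found + len(expected)
    return True


# A made-up test program header, used only for the demo below.
EXAMPLE_SOURCE = """\
module Main exposing (main)

-- CHECK: hello
-- CHECK: 42
"""

if __name__ == "__main__":
    expected = extract_checks(EXAMPLE_SOURCE)
    print(stdout_satisfies(expected, "hello\n42\n"))
```

Keeping the expectations inside the test source means each E2E program is self-describing, so the same file can be run through the JIT gate and the AOT gate without a separate expected-output file.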
