Writing a compiler in elm?

I think the answer to this is ‘no’, but would appreciate confirmation:

Is it advisable (or even possible) to write a compiler in elm?

I’m assuming not given elm’s focus on in-browser ui. So there’s no native file system access, and I’m not even sure what would be involved in running locally (on nodejs ?).

Thanks.

2 Likes

It is absolutely possible. In fact, it is actually in the works! Check this out: https://github.com/elm-in-elm/compiler

1 Like

Also check out schelme - an interpreter not a compiler but such things have a fair amount in common.

Wow - excellent. Thanks both, that’s really encouraging. Off to read some code…!

Remember you have ports. You can work with filesystem etc. APIs using Node and talking to it through ports.

It’s exciting to see more and more compilers and interpreters cropping up in Elm! But be warned - Elm not having some advanced features sometimes makes it necessary for you to write boilerplate that you wouldn’t have to in some other languages. Tradeoffs!

I think it’s worth thinking a bit about this one…

You certainly can write a compiler in Elm, but the question to ask is should you write it in Elm. That really depends on the goals of your project.

Only technical note that stands out to me is that the type inference algorithm absolutely needs mutation to run in roughly linear time for large files. Specifically the union-find data structure as outlined here. Without it, you can get really bad asymptotics that are obvious when you start getting to 200+ line files. I believe that is the only thing that I use mutation for when compiling individual files in the Elm compiler! I think it’d be cool to write a compiler in Elm, but I also wouldn’t want to worry about that aspect. And I know the language isn’t changing to accommodate such a particular scenario at such high cost to the rest of the ecosystem!

I think Haskell is a good fit for this kind of problem. It has good portability to different OSes, it can be made to go quite fast, and it is pretty good for concurrently compiling files. That said, a huge amount of my time has been wasted by laziness in my experience. E.g. compilation is not necessarily going to happen on the light-weight thread you want unless you are very careful and vigilant. So you may make a cool concurrent system to compile files, but it all “gets evaluated lazily” when you actually write the assets to disk, thereby becoming sequential. Very very frustrating! Thinking of all the tradeoffs though, I still think it is a good option for my needs!

If I was starting again knowing everything I know now, I might consider Rust as well. But I’m not sure it would have been ideal to learn how to write compilers with a language that lets you get so deep into perf. Maybe that would have been okay though! Not sure!

11 Likes

Is it possible to write the type inference algorithm without mutation and have a reasonably smart compiler convert it to the efficient mutable algorithm when compiling?

1 Like

I remember reading something in similar vein about Algorithm W (pure but not practical) vs Algorithm J (fast but with side effects). Is this the same thing, or are those two different problems?

If I remember correctly, our professor told us that ghc type inference works using a greedy algorithm that performs worse than union-find asymptotically, but better in practise.

I took a look at ghc source code and it seems there is a pure unification algorithm in its Unify.hs module using “Algorithm U”, and an impure unification algorithm in TcUnify.hs.

Maybe a pure unifier in elm-in-elm could perform sufficiently well?

@MartinS, that doesn’t sound plausible to me. It needs a certain data structure, so you end up writing code a certain way and small details are important.

@Janiczek, not sure! I believe the approach I use is called HM(X). It lets you have variants like HM(=) for languages like Elm and HM(≤) for languages like Scala with subtyping.

@Philipp_Krueger, it is true about using a greedy algorithm, but that refers to the order of solving constraints. GHC can reach a type class constraint that it cannot solve right now. Maybe you have (Num a) => a but I do not know what a is yet. So it kicks this constraint to the end of the queue. Solving more equality constraints may reveal that a is an Int, so when they get to that type class constraint again later on, there is enough information to solve it. It’s probably true that this has a perf cost, but I suspect it’s pretty small. Especially compared to using union-find or not! (E.g. Scala cannot use union-find in all cases because it doesn’t work well with subtyping. That’s a big deal for perf there AFAIK!)

I dug around the file you mention, but it all seems to be happening in TcM which is a type alias for an IOEnv which gives you access to IORef values which are needed for union find.

3 Likes

Hey! I recommend checking out Elchemy . It’s an Elm-like to Elixir transpiler written entirely in Elm. It uses a heavily modified version of bogdanp/elm-ast for parsing and also is slowly preparing its own type-checker based on @joonazan’s fork elchemy/elm-type-inference which already works for the simplest examples.

Elchemy uses a small sed substitution (literally a one-liner) to allow itself running on Node.js, but since it’s self-hosted we can also use it on Erlang VM after self-transpilation.

Since there is no interaction with the system it required a bunch of workarounds, but it also kept the compiler as safe as possible (it’s basically a wrapped, huge String -> String function)
Elchemy currently runs close to 10k LoC and I must say the maintainability of it is really a bliss compared to other weaker-typed projects I’m maintaining.

You can check out how it works on Elchemy-live web page, but I also recommend joining our Slack, we’re always happy to cooperate on any cool tooling-related projects

6 Likes

I’m not familiar with the algorithm, but you could semi-automatically heap recycling. The Clean programming language can apparently do something similar (though slightly different, as it converts shared types to unique types).

On the subject of making a compiler in Elm, I’m currently working through the book Implementing functional languages: a tutorial, which I can certainly recommend. It has code samples in the Miranda language, which looks nearly identical to Elm, and doesn’t use any advanced features you might find in e.g. Haskell. So yes, you should be able to write a compiler in Elm, so long as you don’t care how fast it is.

P.S.: keep in mind that the book treats non-strict languages specifically (this is often called “lazy”, but that is technically incorrect)

1 Like

Perhaps this would be the best book to use as a guide: Modern Compiler Implementation in ML? IIRC it uses Miranda as an example, or at least some small strict FP language.

==

Took a look, its actually a toy FP of the author’s own invention that is used as an example - but still that would be very useful material in the context of implementing a real-world FP language.

Dear all, belated thanks for all the feedback. Particularly grateful for the links on type inference - that’s also on my “todo” list.

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.