Most compile times are pretty quick these days, but I have a theory that explains the outlier compile times I have heard about so far.
EDIT: MY THEORY IS NOT CORRECT AT ALL IN THIS CASE!
It turned out later in this thread that this project has a `case` with 19k+ branches. It seems like that makes exhaustiveness checking slow.
Anyway, here’s what I wrote before we found that all out!
If the theory is correct, I would expect @NicolasGuilloux’s project to have two or three of these characteristics:
- `.elmi` files bigger than 5 MB in `elm-stuff/`
- heavy use of records or extensible records in the functions and types available through the various `module ... exposing (..)` declarations in the project
- very long user name, project name, module names, and/or function names
Is that the case?
Your scenario could be good evidence that this theory is valid, and help me know if some of the ideas I have will actually help. Very curious to hear!
A `type alias` is just a name for another type, but it is stored in interface files as (the name + the aliased type) so that all the information needed for type inference is available. This means that saying `Record -> Record` in Elm could map onto a much bigger type in the interface files. And when those files are read into memory, they take up a lot of space because there is a copy of the underlying aliased type for each usage. So compilation ends up being slow, mostly due to file IO and frequent GC since the program keeps overrunning the heap size.
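To make that concrete, here is a small sketch (the module and type names are made up for illustration, not taken from any real project):

```elm
module Example exposing (Model, update)

-- A hypothetical record alias; imagine it with many more fields.
type alias Model =
    { name : String
    , age : Int
    , email : String
    }

-- The source just says `Model -> Model`, but per the theory above,
-- the interface file carries a copy of the expanded record type
-- for each usage, roughly:
--
--   { name : String, age : Int, email : String }
--   -> { name : String, age : Int, email : String }
--
update : Model -> Model
update model =
    { model | age = model.age + 1 }
```

With dozens of functions over a large alias like this, each copy of the expanded type adds up, which is how a `.elmi` file could grow to many megabytes.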
If people are mostly using opaque types and strong boundaries, they tend to use `type`, which does not have this issue. I suspect @lydell has a project like that. But there are some cases (especially when integrating with some OO kind of system) where it makes sense to use a large `type alias` more often, and that seems to be the predictor for these outlier compile times.
## Temporary Work Arounds
First, running `elm` with additional heap space can make GC less frequent, which can help a lot in some cases. Elm normally asks for 128 MB of heap space, but the following call would ask for 1 GB instead:
```
elm make src/Main.elm +RTS -H1024m
```
This can help a lot, especially if you are on a computer that has a bunch of RAM and you can make the heap bigger than needed. This is an easy change that can help anyone running into this.
Elm 0.19.1 was built with GHC 8.6.3, so you can tweak this more with the flags listed here.
Second, it may be possible to identify particular modules that trigger these issues and change them around. The steps are:
- Look through `elm-stuff` for `.elmi` files that are quite large (like 20 MB)
- Identify any `type alias` that is large and commonly used in the corresponding Elm file
- Temporarily swap it to a `type`
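As a sketch of that last step (again with made-up names), one minimal way to do the swap is to wrap the existing alias in a custom `type`:

```elm
module Example exposing (Model(..), getAge)

-- Hypothetical record alias, standing in for a large alias
-- found via a big .elmi file.
type alias ModelRecord =
    { name : String
    , age : Int
    , email : String
    }

-- Wrapping it in a custom `type` means functions like `getAge`
-- can be stored in the interface file in terms of the name
-- `Model`, rather than a copy of the whole record type.
type Model
    = Model ModelRecord

getAge : Model -> Int
getAge (Model record) =
    record.age
```

Call sites then need to wrap and unwrap the record, which is some churn, but it keeps the change reversible if it turns out not to help.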
If you try this and it makes a difference, I would be very curious to hear! This would be good evidence in favor of the `type alias` theory.
I would only recommend trying this second path in certain cases though. If it makes your code better, then 100% go for it! If it does not help with compile times in practice, let me know and revert it. If it does help with compile times, let me know as well! I can only really recommend on a case-by-case basis whether it’s worth it in that last case.
I suspect this approach may not be viable for a library that wants to have large records as a central feature of its API design though, so this probably works better for applications and companies.
## Ideas for a Compiler Fix
I explored a revamp of `.elmi` files that would ensure that each `type alias` is only stored once per `.elmi` file. This would make those outlier files much smaller (helping with file IO) and would mean that every usage of a type alias would point to the same underlying type in the heap (helping with GC). I also explored storing type names as `UInt32` values using some tricky scheme. That would cut down the size of `.elmi` files even more.
I suspect these two changes would help with the performance issue, but they are quite tricky to implement. The `.elmi` idea requires some surprising topological sorting and an alternative to Haskell's `Data.Binary` library to permit ideal sharing in the heap. The `UInt32` idea is particularly disruptive because it requires changes in type inference and error messages. So while it appears to be quite valuable for performance, it ends up requiring changes across the whole compiler.
Anyway, this is an offshoot of the exploratory compiler work I’ve been doing. I know enough to know these ideas seem possible, and that there are a couple avenues to explore for `.elmo` files as well. So I think the best I can do on timeline is what I wrote here. While I am comfortable saying that I really want to get these ideas into the next release of Elm, I would not want anyone to (1) imagine that it is just around the corner or (2) think I have hard data that these changes will definitely resolve their outlier compile times. It’s still a theory. Furthermore, large infrastructure projects like this take significant time to do well (and require coordination with tools like `elm-test` that peek at interface files), and I am trying to balance these ideas with more ambitious explorations.
I hope the information in this post is useful to @NicolasGuilloux or anyone else with outlier compile times. I also hope that sharing my ideas on how to improve the compiler does not create animosity towards me or the project. I am working as fast as I can, but sometimes a thing that is easy to describe can take a long time to implement.
Finally, I hope that people will not take this detail into account in API design unless they are experiencing very extreme compile times. I think it would be a shame to have API design based on behavior of the compiler that may be resolvable through compiler infrastructure changes, but I appreciate that some balance must be struck on a case-by-case basis in the meantime.