Most compile times are pretty quick these days, but I have a theory that explains the outliers in compile times I have heard about so far.
EDIT: MY THEORY IS NOT CORRECT AT ALL IN THIS CASE!
It turned out later in this thread that this project has a `case` expression with 19k+ branches. It seems like that makes exhaustiveness checking slow.
Anyway, here’s what I wrote before we found that all out!
If the theory is correct, I would expect @NicolasGuilloux’s project to have two or three of these characteristics:
- `.elmi` files bigger than 5 MB in `elm-stuff/`
- heavy use of records or extensible records in the functions and types available through the various `module ... exposing (..)` in the project
- very long user name, project name, module names, and/or function names
Is that the case?
Your scenario could be good evidence that this theory is valid, and help me know if some of the ideas I have will actually help. Very curious to hear!
The Theory
Since a `type alias` is just a name for another type, it is stored in interface files as (the name + the aliased type) so that all the information needed for type inference is available. So saying `Record -> Record` in Elm could map onto a much bigger type in the interface files. And when those files are read into memory, they take up a lot of space because there is a copy of the underlying aliased type for each usage. So compilation ends up being slow mostly due to file IO and frequent GC since the program keeps overrunning the heap size.
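To make that concrete, here is a tiny made-up example (the module, the `Record` fields, and `normalize` are all invented for illustration): the signature looks short in the source, but under this theory the interface file would carry the fully expanded record type for each mention of `Record`.

```elm
module Example exposing (Record, normalize)

type alias Record =
    { id : Int
    , name : String
    , email : String
    , address : String
    , preferences : List String
    }


-- Short to write, but in the interface file each `Record` below would be
-- stored as the whole expanded record type above.
normalize : Record -> Record
normalize record =
    { record | name = String.trim record.name }
```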
If people are mostly using opaque types and strong boundaries, they tend to use `type`, which does not have this issue. I suspect @lydell has a project like that. But there are some cases (especially when integrating with some OO kind of system) where it makes sense to use a large `type alias` more often, and that seems to be the predictor for having these outlier compile times.
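For contrast, here is a sketch of the opaque style (module name and fields invented): other modules only ever see the name `User` in signatures, so there is no big aliased type to copy around in their interfaces.

```elm
module User exposing (User, fromName, name)


-- The record stays hidden behind the constructor, so interfaces of modules
-- that use `User` only need the name, not the record it wraps.
type User
    = User { name : String, email : String }


fromName : String -> User
fromName userName =
    User { name = userName, email = "" }


name : User -> String
name (User record) =
    record.name
```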
Temporary Workarounds
First, calling `elm` with additional heap space can make GC less frequent, which can help a lot in some cases. Elm normally asks for 128 MB of heap space, but the following call would ask for 1 GB instead:
elm make src/Main.elm +RTS -H1024m
This helps most if you are on a computer with plenty of RAM and can make the heap comfortably bigger than needed. It is an easy change that anyone running into this can try.
Elm 0.19.1 was built with GHC 8.6.3, so you can tweak this more with the flags listed here.
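For example, if the binary accepts the standard GHC RTS flags (which the `+RTS` call above suggests, but I am assuming rather than guaranteeing), a larger allocation area can make minor collections less frequent as well:
elm make src/Main.elm +RTS -H1024m -A64m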
Second, it may be possible to identify particular modules that trigger these issues and change them around. The steps are:
- Look through `elm-stuff/` for `.elmi` files that are quite large (like 20 MB); one way to do this is shown just after this list
- Identify any `type alias` that is large and commonly used in the corresponding Elm file
- Temporarily swap it to a `type`, wrapping the record in a constructor like the `User` sketch above
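For that first step, if you are on a Unix-like system, something along these lines can surface the biggest interface files (the 5 MB threshold is just a starting point):
find elm-stuff -name "*.elmi" -size +5M -exec ls -lh {} +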
If you try this swap and it makes a difference, I would be very curious to hear! This would be good evidence in favor of the `type alias` theory.
I would only recommend trying this second path in certain cases though. If it makes your code better, then 100% go for it! If it does not help with compile times in practice, let me know and revert it. If it does help with compile times, let me know as well! Whether it is worth keeping in that last case is something I can only really judge on a case-by-case basis.
I suspect this approach may not be viable for a library that wants to have large records as a central feature of its API design though, so this probably works better for applications and companies.
Ideas for a Compiler Fix
I explored a revamp of `elmi` files that would ensure that each `type alias` is only stored once per `elmi` file. This would make those outlier files much smaller (helping with file IO) and would mean that every usage of a type alias would point to the same underlying type in the heap (helping with GC). I also explored storing type names as `UInt32` using some tricky scheme. That would cut down the size of `elmi` files even more.
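To give a feel for the first idea, here is a rough sketch in Elm-like types. The names (`Interface`, `TypeRef`, `AliasRef`) are invented and the real `elmi` format is not this, but it shows the shape: alias bodies live in one table and everything else points into it by index, so repeated aliases cost a small index in the file and can share one structure in the heap.

```elm
module InterfaceSketch exposing (Interface, TypeRef(..))

import Array exposing (Array)
import Dict exposing (Dict)


-- Each alias body is stored exactly once; the file shrinks because repeated
-- aliases become small indices, and the heap shrinks because every usage can
-- point at the same structure.
type alias Interface =
    { aliasTable : Array TypeRef
    , valueTypes : Dict String TypeRef
    }


type TypeRef
    = AliasRef Int -- index into aliasTable
    | Named String (List TypeRef) -- a concrete type applied to arguments
    | Record (List ( String, TypeRef )) -- field name / field type pairs
```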
I suspect these two changes would help with the performance issue, but they are quite tricky to implement. The `elmi` idea requires some surprising topological sorting and an alternative to Haskell's `Data.Binary` library to permit ideal sharing in the heap. The `UInt32` idea is particularly disruptive because it requires changes in type inference and error messages. So while it appears to be quite valuable for perf, it ends up requiring changes across the whole compiler.
Anyway, this is an offshoot of the exploratory compiler work I’ve been doing. I know enough to know these ideas seem possible, and that there are a couple avenues to explore for `elmo` files as well. So I think the best I can do on timeline is what I wrote here. While I am comfortable saying that I really want to get these ideas into the next release of Elm, I would not want anyone to (1) imagine that it is just around the corner or (2) think I have hard data that these changes will definitely resolve their outlier compile times. It’s still a theory. Furthermore, large infrastructure projects like this take significant time to do well (and require coordination with tools like `elm-test` that peek at interface files), and I am trying to balance these ideas with more ambitious explorations.
Hopes
I hope the information in this post is useful to @NicolasGuilloux or anyone else with outlier compile times. I also hope that sharing my ideas on how to improve the compiler does not create animosity towards me or the project. I am working as fast as I can, but sometimes a thing that is easy to describe can take a long time to implement.
Finally, I hope that people will not take this detail into account in API design unless they are experiencing very extreme compile times. I think it would be a shame to have API design based on behavior of the compiler that may be resolvable through compiler infrastructure changes, but I appreciate that some balance must be struck on a case-by-case basis in the meantime.