We are getting sporadic OOMs (and possibly other errors) when compiling either a ~900-module or a ~200-module Elm app with the GHC runtime options +RTS -N2 -A16m -n4m on GitHub Actions. If anybody has experience with this, I’m curious to learn more.
Compiling (202)
Compiling (203)
Compiling (204)
Compiling (205)
Compiling (206)
Compiling (207)
/home/runner/work/_temp/6e45e6c3-88e5-4f26-b187-249abcdf9f25.sh: line 2: 3327 Killed elm make ../../elm/.../Main.elm --output=/dev/null +RTS -N2 -A16m -n4m
Compiling (208)
Error: Process completed with exit code 137.
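For context, exit code 137 is the usual fingerprint of the Linux OOM killer: 128 plus signal number 9 (SIGKILL). A quick shell demonstration:

```shell
# Exit code 137 = 128 + 9: the process received SIGKILL (signal 9),
# which is what the Linux OOM killer sends when memory runs out.
sh -c 'kill -9 $$'   # simulate a process being SIGKILLed
echo "exit code: $?" # prints "exit code: 137"
```

So the runner itself is killing elm make, rather than elm make failing on its own.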
Here’s a measurement with the intermittent failing settings (during a successful run):
Success! Compiled 873 modules.
31,525,127,728 bytes allocated in the heap
40,637,112,784 bytes copied during GC
2,654,368,576 bytes maximum residency (34 sample(s))
7,736,512 bytes maximum slop
6101 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 960 colls, 0 par 32.227s 32.268s 0.0336s 0.0755s
Gen 1 34 colls, 0 par 18.052s 20.845s 0.6131s 4.1169s
TASKS: 75 (1 bound, 69 peak workers (74 total), using -N2)
SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)
INIT time 0.001s ( 0.001s elapsed)
MUT time 11.642s ( 10.305s elapsed)
GC time 50.279s ( 53.114s elapsed)
EXIT time 0.001s ( 0.001s elapsed)
Total time 61.923s ( 63.421s elapsed)
Alloc rate 2,707,929,742 bytes per MUT second
Productivity 18.8% of total user, 16.3% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
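As a sanity check, the “Productivity 18.8%” figure is just mutator time over total time, taken from the MUT and Total lines above — i.e. the compiler spent over 80% of its wall clock in GC, which is the red flag here:

```shell
# Productivity = MUT time / total time, using the values from the dump:
# only 11.642s of the 61.923s total was spent doing real work.
awk 'BEGIN { printf "%.1f%%\n", 11.642 / 61.923 * 100 }'  # prints 18.8%
```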
Also, as a precaution we are now enforcing +RTS -s -H1G -M6G in GHA.
-H “suggests” a suitable heap size, while -M sets the hard maximum.
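In a GitHub Actions workflow, that enforcement might look like the following step (a hypothetical fragment; the entry-point path is a placeholder):

```yaml
# Hypothetical CI step: -s dumps GC stats, -H1G suggests a starting heap
# size, and -M6G hard-caps the heap so we get a clean heap-overflow error
# from the GHC runtime instead of a SIGKILL from the runner's OOM killer.
- name: Compile Elm app
  run: elm make src/Main.elm --output=/dev/null +RTS -s -H1G -M6G
```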
On point 2: it was hard to catch. By “interweaved extensible records” I mean records like this:
type alias PageModule urlParams model msg =
{ init : Shared -> urlParams -> ( HasShared model, Cmd msg )
, update : msg -> HasShared model -> ( HasShared model, Cmd msg )
...
}
As you can see, it packages a “page” module’s API into a single record for better code organization. (The pattern is also found in elm-spa and similar projects, but we built ours ourselves.)
However, as described in the linked issue, this pattern can drive up heap usage during compilation.
We decided to remove these packaging records and modified our code generation and auto-wiring implementation, so that each page module directly exports init, update, and the other functions, which are then imported and wired up from the root app module.
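The resulting wiring might look roughly like this (a hypothetical sketch; module, type, and constructor names are made up, and the real generated code surely differs):

```elm
module Main exposing (..)

-- Page modules expose init/update directly; the root module imports
-- and dispatches them, so no extensible-record "package" is built.

import Page.Home as Home
import Page.Search as Search


type Msg
    = HomeMsg Home.Msg
    | SearchMsg Search.Msg


type Page
    = HomePage Home.Model
    | SearchPage Search.Model


type alias Model =
    { page : Page }


update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case ( msg, model.page ) of
        ( HomeMsg subMsg, HomePage subModel ) ->
            Home.update subMsg subModel
                |> Tuple.mapBoth (\m -> { model | page = HomePage m }) (Cmd.map HomeMsg)

        ( SearchMsg subMsg, SearchPage subModel ) ->
            Search.update subMsg subModel
                |> Tuple.mapBoth (\m -> { model | page = SearchPage m }) (Cmd.map SearchMsg)

        _ ->
            ( model, Cmd.none )
```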
At least for now, heap usage is stable and compilation works well enough on average CI/dev environments.
I cannot say whether your situation is the same, @kanishka, but the process we used to tackle the issue may help.
Issues like this are the only times I think rescript-tea and the historical work on benchmarking compiler performance in OCaml would be nice, but I definitely prefer Elm overall.
When I manage to reproduce the OOM case, how can I get the -s flag to output something? It would be nice to have incremental dumps of those statistics instead of only a cumulative dump on successful exit.
If you want to look deeper, you have to build the Elm compiler yourself with profiling enabled.
In short: get the newest cabal and GHC 8.6.5 via ghcup, clone the elm-compiler repo, and build it with cabal v2-configure --enable-profiling; cabal v2-build. (You will probably need cabal v2-install --only-dependencies before the first cabal v2-build.)
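Laid out as commands, the build might go something like this (a sketch, not verified against every cabal version; adjust flags to taste):

```shell
# Sketch: build a profiling-enabled Elm compiler with GHC 8.6.5.
ghcup install ghc 8.6.5
ghcup install cabal
git clone https://github.com/elm/compiler.git elm-compiler
cd elm-compiler
cabal v2-configure --enable-profiling
cabal v2-build    # may need `cabal v2-install --only-dependencies` first
```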
Capping memory reduced the failure rate, but we are still getting occasional out-of-memory errors. The errors are slightly nicer now, though, with the GC stats dumped before crashing.
My coworker is exploring GC algorithm options for GHC. I am also hoping we can shrink our failure case from 900 modules down to a few, so that we can identify the root cause.
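For anyone exploring the same direction: two RTS knobs from the GHC 8.6-era runtime that may be worth trying are compacting collection and disabling the parallel GC (flag names per the GHC RTS docs; their effect on elm specifically is untested, and the path below is a placeholder):

```shell
# -c  : use a compacting collector for the oldest generation
#       (lower residency, at some CPU cost)
# -qg : disable the parallel GC entirely
elm make src/Main.elm --output=/dev/null +RTS -s -M6G -c -qg
```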