OOM with elm make using `+RTS -N2 -A16m -n4m` on a GitHub Actions runner

We are getting sporadic OOM (and maybe other errors) when compiling either a 900-module or a 200-module Elm app with the GHC runtime options +RTS -N2 -A16m -n4m on GitHub Actions. If anybody has experience with this, I’m curious to learn more.

Compiling (202)
Compiling (203)
Compiling (204)
Compiling (205)
Compiling (206)
Compiling (207)
/home/runner/work/_temp/6e45e6c3-88e5-4f26-b187-249abcdf9f25.sh: line 2:  3327 Killed                  elm make ../../elm/.../Main.elm --output=/dev/null +RTS -N2 -A16m -n4m
Compiling (208)
Error: Process completed with exit code 137.

Here’s a measurement with the intermittently failing settings (during a successful run):

Success! Compiled 873 modules.
  31,525,127,728 bytes allocated in the heap
  40,637,112,784 bytes copied during GC
   2,654,368,576 bytes maximum residency (34 sample(s))
       7,736,512 bytes maximum slop
            6101 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       960 colls,     0 par   32.227s  32.268s     0.0336s    0.0755s
  Gen  1        34 colls,     0 par   18.052s  20.845s     0.6131s    4.1169s

  TASKS: 75 (1 bound, 69 peak workers (74 total), using -N2)

  SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

  INIT    time    0.001s  (  0.001s elapsed)
  MUT     time   11.642s  ( 10.305s elapsed)
  GC      time   50.279s  ( 53.114s elapsed)
  EXIT    time    0.001s  (  0.001s elapsed)
  Total   time   61.923s  ( 63.421s elapsed)

  Alloc rate    2,707,929,742 bytes per MUT second

  Productivity  18.8% of total user, 16.3% of total elapsed

gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0

I’ve encountered a similar situation on GitHub-hosted runners, which only have 7 GB of RAM unless you opt in to larger runners.
https://docs.github.com/en/actions/using-github-hosted-runners/about-github-hosted-runners#supported-runners-and-hardware-resources
In our case, the app was somewhat large (around 400 modules), including a lot of auto-generated code. We profiled the compilation using +RTS -s.

After investigation,

  1. it was basically due to excessive GC
  2. we found it could be mitigated by reducing interwoven extensible records (the situation described in the issue Compilation time is O(2^n) when composing ext. records · Issue #1897 · elm/compiler · GitHub)
  3. also, as a precaution, we now enforce +RTS -s -H1G -M6G in GHA
    • -H “suggests” a suitable heap size, while -M sets the hard maximum

On point 2, it was hard to catch. By “interwoven extensible records” I mean records like this:

type alias PageModule urlParams model msg =
    { init : Shared -> urlParams -> ( HasShared model, Cmd msg )
    , update : msg -> HasShared model -> ( HasShared model, Cmd msg )
    ...
    }

As you can see, it packages a “page” module’s API into a single record for better code organization. (A similar pattern is found in elm-spa and the like, but we made ours ourselves.)
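
HasShared isn’t shown above; think of it as an extensible record alias, roughly like the following (a reconstruction for illustration, not the exact definition). This is what makes the aliases “interwoven”: every field of PageModule expands into yet another extensible record over model.

type alias HasShared model =
    { model | shared : Shared }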

However, as described in the linked issue, this pattern can drive up heap usage during compilation.
We decided to remove these packaging records and modified our code generation and auto-wiring implementation, so that each page module directly exposes init, update, and the other functions, and the root app module imports and wires them up.
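
Roughly, the new shape looks like this — a simplified, hypothetical sketch (module names, fields, and the Shared import are made up for illustration, not our actual generated code):

-- Page/Login.elm: the page module exposes its functions directly,
-- with no PageModule record packaging them.
module Page.Login exposing (Model, Msg, init, update)

import Shared exposing (Shared)

type alias Model =
    { count : Int }

type Msg
    = Increment

init : Shared -> ( Model, Cmd Msg )
init _ =
    ( { count = 0 }, Cmd.none )

update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        Increment ->
            ( { model | count = model.count + 1 }, Cmd.none )

-- Main.elm (abridged): the root app module imports the page and
-- wires its init/update directly (in our case, via code generation).
import Page.Login

type alias Model =
    { login : Page.Login.Model }

type Msg
    = LoginMsg Page.Login.Msg

update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        LoginMsg subMsg ->
            let
                ( login, cmd ) =
                    Page.Login.update subMsg model.login
            in
            ( { model | login = login }, Cmd.map LoginMsg cmd )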

At least as of now, heap usage is stable and compilation works well enough on average CI/dev environments.
I can’t say your situation is the same, @kanishka, but our process for tackling the issue may help :crossed_fingers:


Issues like this are the only times I think rescript-tea and the historical work on benchmarking compiler performance in OCaml would be nice, but I definitely prefer Elm overall.

When I do reproduce the OOM case, how can I get the -s flag to output something? That is, it would be nice to have incremental dumps of those statistics instead of only a cumulative dump on successful exit.

Capping the memory seems to be working.


If you want to look deeper, you have to build the Elm compiler yourself with profiling enabled.
In short: get the newest cabal and GHC 8.6.5 via ghcup, clone the elm/compiler repo, and build it with cabal v2-configure --enable-profiling; cabal v2-build. (You will probably need cabal v2-install --only-dependencies before the first cabal v2-build.)

These pages may help: “How to profile?” · Issue #5930 · haskell/cabal · GitHub, and “Tutorial: Profiling Cabal projects” – Functional programming debugs you.


Capping memory reduced the failure rate, but we are still getting occasional out-of-memory errors. The errors are slightly nicer now, though, with the GC stats dumped before crashing.

My coworker is exploring GC algorithm options for GHC. I am also hoping we can shrink our failure case from 900 modules down to a few modules, so that we can identify the root cause.
