I spent some more time profiling builds today and found that, with the current build of elm-make, about 40% of the time is spent in garbage collection (this varies by repo; for these numbers I used elm-spa-example).
The first run uses the 0.18 elm-make from https://dl.bintray.com/elmlang/elm-platform/0.18.0/linux-x64.tar.gz, with the flags `+RTS -s -RTS` added so that the GHC runtime prints GC statistics when the build finishes.
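For reference, the measurement amounts to something like the sketch below. The entry file `src/Main.elm` is an assumption (check elm-spa-example for its actual entry point), and the elm-make binary has to accept runtime-system flags (i.e. be linked with GHC's `-rtsopts`) for the `+RTS ... -RTS` part to be honored:

```sh
# Clone the test repo and time a full elm-make build.
# +RTS -s -RTS asks the GHC runtime to print allocation and GC
# statistics (the output quoted below) when the process exits.
git clone https://github.com/rtfeldman/elm-spa-example.git
cd elm-spa-example
time elm-make src/Main.elm --output index.html +RTS -s -RTS
```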
    Starting run for repo: https://github.com/rtfeldman/elm-spa-example.git
    Original elm-make 0.18
    Success! Compiled 93 modules.
    Successfully generated index.html

       9,922,025,592 bytes allocated in the heap
       2,175,916,768 bytes copied during GC
          10,988,024 bytes maximum residency (180 sample(s))
             939,136 bytes maximum slop
                  29 MB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0     19003 colls, 19003 par   13.640s   7.232s     0.0004s    0.0105s
      Gen  1       180 colls,   179 par    3.215s   1.618s     0.0090s    0.0314s

      Parallel GC work balance: 23.65% (serial 0%, perfect 100%)

      TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.001s  (  0.001s elapsed)
      MUT     time   14.685s  ( 12.258s elapsed)
      **GC      time   16.855s  (  8.850s elapsed)**
      EXIT    time    0.009s  (  0.011s elapsed)
      Total   time   31.555s  ( 21.120s elapsed)

      Alloc rate    675,641,834 bytes per MUT second

      Productivity  46.6% of total user, 69.6% of total elapsed

    gc_alloc_block_sync: 811892
    whitehole_spin: 0
    gen[0].sync: 95
    gen[1].sync: 52976

    real    0m21.130s
    user    0m21.070s
    sys     0m10.480s
The second run gives the garbage collector a larger allocation area, so the GC runs less often, and divides that area into chunks so that multiple cores can share it effectively (`-A128m -n8m` divides the 128m allocation area into 8m chunks). The `-n8m` option only makes sense if you are running on multiple cores, and for me this runs faster than the `sysconfcpus -n 1` trick (or passing `-N1` through rtsopts); see the sketch below.
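Concretely, the second run is the same build with two extra RTS flags (again a sketch, with the same assumed entry point as above):

```sh
# -A128m: use a 128 MB allocation area (nursery) instead of the
#         default 512 KB, so minor collections happen far less often.
# -n8m:   split that area into 8 MB chunks, which lets the parallel
#         GC threads claim chunks independently; only worthwhile on
#         multiple cores.
time elm-make src/Main.elm --output index.html +RTS -s -A128m -n8m -RTS
```

The effect shows up directly in the stats: Gen 0 collections drop from 19003 to 46, and GC time falls from 16.855s to 1.352s.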
    elm-make with options: -A128m -n8m
    Success! Compiled 93 modules.
    Successfully generated index.html

       9,969,011,928 bytes allocated in the heap
         109,864,712 bytes copied during GC
           6,429,864 bytes maximum residency (16 sample(s))
             121,480 bytes maximum slop
                 278 MB total memory in use (0 MB lost due to fragmentation)

                                         Tot time (elapsed)  Avg pause  Max pause
      Gen  0        46 colls,    46 par    1.075s   0.537s     0.0117s    0.0368s
      Gen  1        16 colls,    15 par    0.277s   0.139s     0.0087s    0.0184s

      Parallel GC work balance: 24.00% (serial 0%, perfect 100%)

      TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)

      SPARKS: 0 (0 converted, 0 overflowed, 0 dud, 0 GC'd, 0 fizzled)

      INIT    time    0.004s  (  0.004s elapsed)
      MUT     time    8.288s  (  8.271s elapsed)
      **GC      time    1.352s  (  0.676s elapsed)**
      EXIT    time    0.002s  (  0.002s elapsed)
      Total   time    9.653s  (  8.954s elapsed)

      Alloc rate    1,202,873,985 bytes per MUT second

      Productivity  86.0% of total user, 92.7% of total elapsed

    gc_alloc_block_sync: 97391
    whitehole_spin: 0
    gen[0].sync: 0
    gen[1].sync: 3096

    real    0m9.097s
    user    0m8.930s
    sys     0m0.850s