Elm-format 0.8.7 — it’s fast!

This announcement is a bit delayed: much of this was included in elm-format 0.8.6, but the Windows npm installer had problems that were discovered after release, so I waited to announce until those were fixed. So this is the combined update for 0.8.6 and 0.8.7.

As always, you can install with elm-tooling, nixpkgs, npm, or download from the release page.

Thanks to many contributors, elm-format is now much faster. Testers have consistently reported an 11x speedup (wall time) on MacOS ARM, and a 4x speedup on other platforms. This is thanks to the following:

  • Native binaries are now provided for MacOS ARM64 (previous versions required Rosetta) and for Linux ARM64 (aarch64)
  • Files are now processed in parallel
  • The low-level parser API (previously parsec) has been replaced by Evan’s new parsing primitives that he introduced in Elm 0.19’s compiler. The work to integrate this into elm-format was done in Emma’s 2021 Google Summer of Code project (work submission, implementation notes), and it’s now finally available in a released version! If you’re interested in Haskell, here are a few details about why it’s fast (or at least my own understanding of it):
    • It avoids a lot of memory allocation (and garbage collection) by loading the input file into a memory buffer and having parsed tokens refer to ranges in that buffer instead of making copies of all retained strings.
    • It speeds up backtracking by modeling parser state as pointers into the memory buffer, instead of managing a queue of input tokens.
    • It avoids UTF-8 decoding and encoding by working with ByteStrings and ByteArrays directly.
    • It avoids unnecessary memory allocation and copying when writing output derived from the input file by using ByteString.Builder when possible.
    • There are a lot of other optimizations like the use of continuation-passing style, strictness flags, and unboxed types, which Evan did a lot of benchmarking on to find the most performant design decisions.
  • Improvements to ghc (the Haskell compiler) and the text-2.0 library
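
For anyone curious what the parser ideas above look like in code, here is a minimal Haskell sketch. It is not the code from elm-format or the Elm compiler: every name in it (Span, Parser, takeWhile1, orElse, tokenize, render) is hypothetical, and it uses a simple Maybe-based parser instead of the continuation-passing style and unboxed types the real parser relies on. It only illustrates three points: tokens as ranges into the input buffer, backtracking by reusing an offset, and output built with ByteString.Builder.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Builder as BB
import qualified Data.ByteString.Char8 as BC
import qualified Data.ByteString.Lazy as BL
import Data.List (intersperse)
import Data.Word (Word8)

-- | A token is a range into the input buffer, not a copied string.
-- (Hypothetical type, for illustration only.)
data Span = Span { spanStart :: !Int, spanLen :: !Int }
  deriving (Eq, Show)

-- | Parser state is just an offset into the buffer, so backtracking
-- means starting again from an offset we already have.
newtype Parser a = Parser
  { runParser :: B.ByteString -> Int -> Maybe (a, Int) }

-- | Consume a non-empty run of bytes satisfying the predicate.
-- B.drop and B.takeWhile return views that share the input buffer.
takeWhile1 :: (Word8 -> Bool) -> Parser Span
takeWhile1 p = Parser $ \buf off ->
  let n = B.length (B.takeWhile p (B.drop off buf))
  in if n == 0 then Nothing else Just (Span off n, off + n)

-- | Try the first parser; if it fails, run the second from the same offset.
orElse :: Parser a -> Parser a -> Parser a
orElse (Parser f) (Parser g) = Parser $ \buf off ->
  case f buf off of
    Nothing -> g buf off
    ok -> ok

-- Byte-level predicates: no UTF-8 decoding is needed for ASCII tokens.
isDigit, isLower :: Word8 -> Bool
isDigit w = w >= 48 && w <= 57
isLower w = w >= 97 && w <= 122

-- | A token is either a run of digits or a run of lowercase letters.
token :: Parser Span
token = takeWhile1 isDigit `orElse` takeWhile1 isLower

-- | Advance the offset past any ASCII spaces.
skipSpaces :: B.ByteString -> Int -> Int
skipSpaces buf off = off + B.length (B.takeWhile (== 32) (B.drop off buf))

-- | Split the buffer into token spans, stopping at the first non-token byte.
tokenize :: B.ByteString -> [Span]
tokenize buf = go 0
  where
    go off =
      case runParser token buf (skipSpaces buf off) of
        Nothing -> []
        Just (sp, off') -> sp : go off'

-- | View the bytes of a span without copying them out of the buffer.
slice :: B.ByteString -> Span -> B.ByteString
slice buf (Span s n) = B.take n (B.drop s buf)

-- | Write the tokens separated by single spaces, using a Builder.
render :: B.ByteString -> [Span] -> BL.ByteString
render buf spans =
  BB.toLazyByteString
    (mconcat (intersperse (BB.char7 ' ') (map (BB.byteString . slice buf) spans)))
```

For example, tokenize (BC.pack "foo  42 bar") yields three spans that all point into the original buffer, and render writes them back out as "foo 42 bar".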

Other changes (0.8.6 and 0.8.7 combined)

New features:

  • case ... then is now auto-corrected to case ... of
  • => is now auto-corrected to -> when unambiguous

Bug fixes:

  • module exposing listings containing comments no longer add extra leading spaces
  • Redundant import aliases (when the alias is the same as the module name) are now removed
  • Top-level declarations named “infix” no longer make files unprocessable

Other changes:

  • The npm installer now has zero dependencies

Thanks to …

  • Lamdera for providing CI runners to build the MacOS ARM64 release binaries
  • @emmabastas for refactoring the parser internals to integrate Elm 0.19’s compiler’s parser, and the initial draft of test coverage scripts
  • @lydell for processing files in parallel, and the new dependency-free npm installer script
  • @mdevlamynck for the exposing listings bug fix
  • @tfausak for linux-aarch64 build script updates
  • @supermario for mac-arm64 build script updates
  • @kutyel for lenient parsing additions
  • @jfmengels for continued thoughtful issue discussion across the Elm devtools community (only partially related to elm-format, but thank you!)
  • @8n8 for code cleanup help
  • Elm community members for testing the new binaries and npm packages

Also thanks to anyone who’s contributed to cross-compilation support in ghc or nix in the past two years — that made the ARM64 binaries possible.

What’s next

My open-source work has mostly been on pause for the past three years (since COVID started), but I’m now finding time for it again. I’m hoping to take what I’ve learned from elm-format in the past 7½ years and help make it easier for folks to write tools for Elm or for other new languages. If you have any such projects and are interested in the following, please get in touch!

Here’s specifically what’s next:

  • extract some publishable Haskell libraries from elm-format:
    • a slightly-opinionated, language-agnostic library for writing the “formatting” part of a code formatter
    • a library for parsing, formatting, and transforming Elm code
    • a convenience library for writing command line tools that transform and/or validate files
  • do a new “experimental” version of elm-format (roadmap), the first one since 2017. “Experimental” means it will contain some formatting behavior changes that may or may not get promoted to the next stable release after we see how they work in real-world use.
  • explore using ghc-9.6’s javascript backend to compile elm-format as a javascript library (interested contributors, please get in touch)

55 Likes

Most of the items mentioned under “low-level parser API” improvements sound like they would be easier to control and tune with Rust. I have no experience with or deep knowledge of Rust. I assume using the parsing primitives module is more desirable than using a lower level language because it allows elm-format to share implementation with the Elm compiler.

1 Like

Thanks everyone for this great effort. This community is awesome. :slight_smile:

5 Likes

Most of the items mentioned under “low-level parser API” improvements sound like they would be easier to control and tune with Rust.

I assume using the parsing primitives module is more desirable than using a lower level language because it allows elm-format to share implementation with the elm compiler.

Yes, exactly. I suppose “easier to control and tune” might be true if writing a new parser from scratch, but if you include the work of rewriting Elm’s parser in Rust, I don’t think the total effort would be easier. Also, in elm-format’s case, the goal has been to fork Elm’s compiler’s parser, both to reduce the work as much as possible and to reduce the risk of bugs (with respect to parsing things the same way as Elm) as much as possible, so rewriting everything in Rust would also be detrimental to that goal.

But beyond the context of elm-format, I’m not sure I’d be convinced without some analysis of working examples implemented each way. First of all, what parsing APIs exist for Rust, and how do they compare among themselves? Then, does one of those options implement the same type of approach that Elm 0.19’s parser uses? If so, it would be interesting to compare those specifically. Finally, a specific difference that I’d be interested in seeing the impact of is that implementing in Rust would lose the benefits of Haskell’s built-in laziness. In my experience with Haskell, despite the risk of space leaks, the benefit of laziness is that you often get near-optimal evaluation speed automatically without having to do anything special when writing your code, and that when you do run into issues, you can often incrementally improve the performance without having to refactor your entire system. In contrast, my experience so far with Rust has been that you have to explicitly design for performance and then implement your code in a specific way that aligns with that design.
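
As a tiny illustration of the laziness point (a toy sketch with a hypothetical function name, nothing taken from elm-format): the code below is written as "square every number, then search", yet only as many squares are computed as the search needs, even though the input list is infinite.

```haskell
import Data.List (find)

-- | First perfect square strictly greater than the limit.
-- The list [1 ..] is infinite; laziness means `map` only produces
-- elements until `find` has its answer.
firstSquareOver :: Int -> Maybe Int
firstSquareOver limit = find (> limit) (map (^ 2) [1 ..])
```

Here firstSquareOver 50 evaluates to Just 64 after computing only the first eight squares.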

But with all that said, I’d love to stay up-to-date on how the best-in-class Rust parsing APIs compare to the best-in-class Haskell parsing APIs. Personally, I probably won’t pursue doing such comparisons myself, but I’d be happy to consider integrating the learnings from any such comparisons that other folks document.

9 Likes

Thanks @avh4 for your awesome continued work on elm-format! :heart:

4 Likes

Awesome and thanks for this release! :raised_hands: I’m @kutyel on gh but @kvothe here haha, maybe I should change my handle! :thinking:

Thanks for such a thoughtful, thorough response!

One minor note

best-in-class Haskell parsing APIs

I probably wouldn’t have posted my note at all if Elm were using some off-the-shelf Haskell parsing libraries. From what I understand, Elm is using a completely homegrown parsing library. I may be mistaken there. Once I realized that the library was homegrown, I was curious about homegrown Rust vs. homegrown Haskell library performance tuning. Regardless, I would definitely have chosen the path that you have chosen given the implementation language of the Elm compiler.

I’m glad we have people working on this kind of stuff. I certainly don’t have the chops for it. Thanks.
