Hello,
Well that was some rabbit hole to go down… I wrote a pretty printer for Elm source code using a Wadler/Leijen style pretty printing library. This is part of the package I posted about previously, which is a simplified DSL for building Elm code in Elm:
https://package.elm-lang.org/packages/the-sett/elm-syntax-dsl/latest/
The idea behind the pretty printer is that its output should be invariant under elm-format. This means that if you generate some code, but later decide to start modifying that code by hand, when you save and format the file using elm-format, you won’t end up with a huge whitespace diff. Going from generated code to hand-edited code should flow more smoothly as a result.
Another way to do this would simply be to run elm-format on any generated code, but I wanted to make it possible to not need to do this. For example, maybe you are doing some code gen interactively in the browser.
How is a pretty printer different from a formatter?
elm-format does not decide by itself where to break lines, except in the case of some structures that are always broken. It lets the author decide that by the input given to it. If you break a line in the middle of a structure, say a list, it will break the whole list. The rule is: if something is partially broken, break it all. Some structures like if-then, case-of and let-in are always broken. It will happily let you define lines that are 1000s of characters long though - so long as they do not contain any expressions that are always broken.
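The "partially broken, break it all" rule can be sketched in a few lines of Elm. This is only a toy model of the rule as described above, not elm-format's actual implementation: the `formatList` function and the idea of pairing each item with the source row it was written on are assumptions made for illustration.

```elm
module BreakRule exposing (formatList)

-- Hypothetical model, not elm-format's actual implementation:
-- each list item is paired with the source row the author wrote it on.


formatList : List ( Int, String ) -> String
formatList items =
    let
        rows =
            List.map Tuple.first items

        texts =
            List.map Tuple.second items

        -- True when the author kept every item on the same row.
        allOnOneRow =
            case rows of
                [] ->
                    True

                first :: rest ->
                    List.all (\row -> row == first) rest
    in
    if List.isEmpty items then
        "[]"

    else if allOnOneRow then
        -- Nothing was broken in the input, so keep it on one line,
        -- however long that line turns out to be.
        "[ " ++ String.join ", " texts ++ " ]"

    else
        -- Partially broken input: break it all, one item per line.
        "[ " ++ String.join "\n, " texts ++ "\n]"
```

Note that the page width never appears here: the only thing that influences the layout is where the author put the line breaks.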
The pretty printer takes as context the width of the page. Based on that it tries to fit code into single lines where possible. If the code does not fit, it decides where to break it to make it fit. It does the same thing as elm-format, but in addition it works out how to fit it into a given page width - with the minimal amount of overflow should that not be possible.
Example. Suppose we have a list that is longer than the page width, say 120 characters. If this list is written all on one line, then given to elm-format or the pretty printer, the results will be:
elm-format - keeps it all on one line, since none of it is broken.
[ "a", "long", "list", ... ]
pretty printer - decides to break it since it does not fit inside 120 characters:
[ "a"
, "long"
, "list"
, ...
]
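The decision the pretty printer makes in this example can be sketched as below. This is a deliberately simplified illustration, not the code in elm-syntax-dsl: `layoutList` is a hypothetical helper that only looks at the total length of the flat layout, whereas a real Wadler/Leijen printer also tracks the current column and handles nested documents.

```elm
module ListLayout exposing (layoutList)

-- Hypothetical helper for illustration only. The items are assumed to be
-- already rendered to strings, and the list is assumed to start at column 0.


layoutList : Int -> List String -> String
layoutList pageWidth items =
    let
        -- The layout with everything on one line.
        flat =
            "[ " ++ String.join ", " items ++ " ]"
    in
    if List.isEmpty items then
        "[]"

    else if String.length flat <= pageWidth then
        -- It fits within the page width, so keep it on one line.
        flat

    else
        -- It does not fit, so break the whole list, one item per line.
        "[ " ++ String.join "\n, " items ++ "\n]"
```

Called with a page width of 120, a short list stays on one line, while a list whose flat layout is longer than 120 characters comes out in the broken form shown above.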
A pretty printer can also work with a ribbon width. Ribbon width is the width of the text on a line, not including any initial indent. By combining page width with a ribbon width, more consistent layouts can be generated when expressions get more deeply nested; the same expression on the left of a page will be broken the same way if it is more deeply nested towards the right of the page. This is a feature I intend to add to the pretty printing library itself in the near future.
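To make the ribbon idea concrete, here is a minimal sketch of a "does this fit on the line?" check that takes both widths into account. The `Config` record and `fitsOnLine` function are hypothetical names chosen for this illustration; they are not the API of the pretty printing library, which does not have this feature yet.

```elm
module Ribbon exposing (Config, fitsOnLine)

-- Hypothetical names, for illustration only.


type alias Config =
    { pageWidth : Int
    , ribbonWidth : Int
    }


{-| A line fits when the indent plus the text stays inside the page width,
and the text on its own stays inside the ribbon width.
-}
fitsOnLine : Config -> Int -> String -> Bool
fitsOnLine config indent text =
    let
        textWidth =
            String.length text
    in
    (indent + textWidth <= config.pageWidth)
        && (textWidth <= config.ribbonWidth)
```

With a page width of 120 and a ribbon width of 80, a 90 character expression fails this check at indent 0 and at indent 40 alike, so it would be broken in both places. With the page width alone it would fit at indent 0 but not at indent 40, which is the inconsistency the ribbon width removes.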
Wild code…
I mined my .elm folder for example Elm files to try it on. These files were parsed, then printed with the pretty printer and stored in a folder called /pre, then elm-format was run on that and the results stored in a folder called /post. Then pre and post were diffed to check that the pretty printer is indeed invariant under elm-format.
I found plenty of hard-to-get-right material in there. For example, a case expression like:
case
    (\lambda -> case
            some
                huge
                expression
        of
            InnerCase1 -> ...
            LotsOfThem -> ...
    )
of
    Case 1 -> ...
You have to wonder - at what point is it sensible to break something out using a let-in block or by defining a top-level function? Anyway, it certainly gave me some good material to stress test this on.