Elm pretty printer invariant under elm-format (for code gen)

Hello,

Well, that was some rabbit hole to go down… I wrote a pretty printer for Elm source code using a Wadler/Leijen style pretty printer. This is part of the package I posted about previously, which is a simplified DSL for building Elm code in Elm:

https://package.elm-lang.org/packages/the-sett/elm-syntax-dsl/latest/

The idea behind the pretty printer is that its output should be invariant under elm-format. This means that if you generate some code, but later decide to start modifying that code by hand, you won’t end up with a huge whitespace diff when you save and format the file using elm-format. Going from generated code to hand-edited code should flow more smoothly as a result.

Another way to do this would simply be to run elm-format on any generated code, but I wanted to make it possible to not need to do this. For example, maybe you are doing some code gen interactively in the browser.

How is a pretty printer different from a formatter?

elm-format does not decide by itself where to break lines, except in the case of some structures that are always broken. It leaves that decision to the author, based on the line breaks already present in the input. If you break a line in the middle of a structure, say a list, it will break the whole list. The rule is: if something is partially broken, break it all. Some structures like if-then, case-of and let-in are always broken. It will happily let you write lines that are thousands of characters long though, so long as they do not contain any expressions that are always broken.
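To illustrate the "partially broken, break it all" rule, here is a small sketch. The names `before` and `after` are made up for this example; the second definition shows the layout I would expect elm-format to produce for the first.

```elm
module BreakRule exposing (after, before)

-- As typed by the author: only one line break, after the second element.


before =
    [ 1, 2,
      3 ]



-- As laid out after formatting: the partially broken list is broken completely.


after =
    [ 1
    , 2
    , 3
    ]
```

Had the list been typed on a single line, it would have stayed on a single line, however long.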

The pretty printer takes as context the width of the page. Based on that it tries to fit code into single lines where possible. If the code does not fit, it decides where to break it to make it fit. It does the same thing as elm-format, but in addition it works out how to fit it into a given page width - with the minimal amount of overflow should that not be possible.

Example. Suppose we have a list that is longer than the page width, say 120 characters. If this list is written all on one line, then given to elm-format or the pretty printer, the results will be:

elm-format - keeps it all on one line, since none of it is broken.

[ "a", "long", "list", ... ]

pretty printer - decides to break it since it does not fit inside 120 characters:

[ "a"
, "long"
, "list"
, ...
]
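The width-based decision can be sketched in a few lines of plain Elm. This is only an illustration under simplifying assumptions, not how the library is implemented: `layoutList` is a hypothetical helper, the items are assumed to be already rendered to strings, and nesting and indentation are ignored.

```elm
module ListLayout exposing (layoutList)

-- Hypothetical helper, not part of elm-syntax-dsl.
-- Lays a list out on one line if that fits within the page width,
-- otherwise breaks it completely, one item per line.


layoutList : Int -> List String -> String
layoutList pageWidth items =
    let
        -- The single-line candidate layout.
        flat =
            "[ " ++ String.join ", " items ++ " ]"
    in
    if List.isEmpty items then
        "[]"

    else if String.length flat <= pageWidth then
        flat

    else
        -- Does not fit: break everything, leading commas as elm-format does.
        "[ " ++ String.join "\n, " items ++ "\n]"
```

With the three items `"a"`, `"long"` and `"list"` the flat layout is 23 characters long, so `layoutList 120` keeps it on one line while `layoutList 20` produces the fully broken form. A real Wadler/Leijen printer makes the same kind of choice per group, taking the current indentation into account.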

A pretty printer can also work with a ribbon width. Ribbon width is the width of the text on a line, not including any initial indent. By combining page width with a ribbon width, more consistent layouts can be generated when expressions get more deeply nested; an expression on the left of a page will be broken the same way if it is more deeply nested towards the right of the page. This is a feature I intend to add to the pretty printing library itself in the near future.
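As a rough sketch of how a ribbon width could enter the "does it fit" check (the function `fits` and its record argument are hypothetical, not the library's API): a line is accepted only if it stays inside the page and its non-indent text stays inside the ribbon.

```elm
module Ribbon exposing (fits)

-- Hypothetical check, for illustration only.
-- pageWidth   : the last column a line may reach
-- ribbonWidth : the maximum number of characters after the indent
-- indent      : the number of leading spaces on the line


fits : { pageWidth : Int, ribbonWidth : Int } -> Int -> String -> Bool
fits { pageWidth, ribbonWidth } indent text =
    let
        len =
            String.length text
    in
    indent + len <= pageWidth && len <= ribbonWidth
```

With a page width of 120 and a ribbon width of 80, a 90 character expression is rejected at indent 0 (too long for the ribbon) and also at indent 40 (too long for the page), so it gets broken in both positions. With the page width alone it would fit on the left but not on the right.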

Wild code…

I mined my .elm folder for example Elm files to try it on. These files were parsed, then printed with the pretty printer and stored in a folder called /pre, then elm-format was run on that and results stored in a folder called /post. Then pre and post were diffed to check that the pretty printer is indeed invariant under elm-format.

I found plenty of hard-to-get-right material in there. For example, a case expression like:

case 
    (\lambda -> case
                    some
                        huge 
                        expression
                of
                    InnerCase1 -> ...
                         
                    LotsOfThem -> ...
    ) 
of 
        Case 1 -> ...

You have to wonder at what point it becomes sensible to break something out into a let-in block or a top-level function. :man_shrugging: Anyway, it certainly gave me some good material to stress test this on.

Cool idea and such a fascinating concept!

Are you messing with function composition? Something like this would be perfect to illustrate the power of function composition! I know it took me a while to grok things like >>, |>, <|, <<. As I was figuring out how to sort out my parentheses madness as a newbie Elm dev, I would have been super excited to find a tool like this!

Are you already using elm-format?

A lot of Elm devs have this set up in their editor to run every time they save a file. It already has rules to remove redundant parentheses and to insert them where they make the intention clearer. I had to replicate these rules in my code, but I don’t think my pretty printer is what you are looking for. Try elm-format.

Got some useful feedback from Slack.

Published an update with pretty printers for more snippets of Elm syntax, not just whole files.

Improvements to the docs - an example to show how to actually use the pretty printer.

:+1:

Also note, it's not a command line tool; for now it is aimed at code generation from within Elm programs. It could be used to make one called elm-pretty, but there are quite a few issues that would need to be addressed first. In particular, end-of-line comments are not handled so well by elm-syntax. You can see why: there are so many places in the syntax tree where they could be placed that it will be a PITA to cover them all. Unless it makes sense to put them in the Nodes.
