Elm-syntax - The rough edges

rupert · December 14, 2025, 11:22am

I have been trying again to get the pretty printer output from elm-syntax-dsl to match the output of elm-format, and have fixed quite a few bugs towards this goal recently.

There are a couple of problems with elm-syntax, that cannot be solved in the pretty printer alone. Specifically, triple quoted multi-line strings “”", and single line comments --. Not that elm-syntax-dsl consumes elm-syntax as a dependency.

For Strings, In Elm.Syntax.Expression, line 125:

type Expression 
     = ... 
     | Literal String      -- Just one constructor for ALL strings 
     | ...

In Elm/Parser/Expression.elm, lines 448-454:

 literalExpression : Parser (WithComments (Node Expression)) 
 literalExpression = 
     Tokens.singleOrTripleQuotedStringLiteralMapWithRange 
         (\\range string -> 
             { comments = Rope.empty 
             , syntax = Node range (Literal string)  -- Same constructor for both! 
             } 
         )

For single line comments, there is a representation:

type alias File =
    { moduleDefinition : Node Module
    , imports : List (Node Import)
    , declarations : List (Node Declaration)
    , comments : List (Node Comment)
    }

The difficulty here is that these comments are gathered at the file level, with their Node giving their position in the code. If I re-format the code, this position may change, so how do I know where to put the comments back again?

So lets add comments directly into Expressions? Except there a lot of places that you can put a comment in Elm code, and that would mean putting a lot of Maybe Comments in the Expression AST, > 99.9% of which would not be used.

The String one seems solvable by adding a Normal|Triple label to the Literal String representation.

The comments one… perhaps take a look at how elm-syntax does it and borrow from there.

Just curious if anyone has any thoughts on this?

lydell · December 14, 2025, 1:19pm

Comments. It’s always the comments. They take the joy out of modifying files at the AST level.

rupert · December 14, 2025, 3:40pm

Yeah, the doc comments are better - but even dealing with them is pretty complex, since they contain markdown, and then there is the interaction between the @docs tags and how the exposing list gets organized.

But I will investigate how elm-format deals with it all.

rupert · December 14, 2025, 4:58pm

From what I can figure out elm-format seems to use the fact that a – comment can only appear once per line, and at the end of a line, or on a line by itself.

So if I have some code like this:

let
    x = a -- Comment here is end line attached to Node of a
          -- Comment here is after line break but attaced to Node of a
         + b -- Comment here is end line attached to Node of whole assignment
    -- Comment here can have its own Comment entry in the AST
in

So it looks like it would be enought to let each Node have 1 or more comments, attached either at the end of the line, or after a line break. Then its a question of figuring out the correct Node to attach each comment to.

Comments on their own line, not really attached to anything can have an AST entry.

rupert · December 14, 2025, 5:00pm

let
    -- Comment here on line before, attached to assignment.
    x = a -- Comment here is end line attached to Node of a
          -- Comment here is after line break but attaced to Node of a
         + b -- Comment here is end line attached to Node of whole assignment
    -- Comment here can have its own Comment entry in the AST
in

Also attached to a Node, but on line before.

lue-bird · December 14, 2025, 5:33pm

you can fairly easily go through all comments and collect those in a range: elm-syntax-format/src/ElmSyntaxPrintDefunctionalized.elm at main · lue-bird/elm-syntax-format · GitHub , then there are various ways elm-format prints them (some collapse into a single line if possible, some are all printed with linebreaks in between etc depending on where in the source code you are): elm-syntax-format/src/ElmSyntaxPrintDefunctionalized.elm at main · lue-bird/elm-syntax-format · GitHub

As you said, and also in my opinion, storing the parsed comments at each possible place makes it both very annoying to collect and is also a waste of space, so I think the current solution is good enough.

The only real issue currently is that many tokens do not have ranges in the syntax tree so when printing, we cannot know if the comments are before or after it: GitHub - lue-bird/elm-syntax-format: pretty print elm-syntax like elm-format

As for single vs triple quoted: yes, this will need to be fixed in elm-syntax v8, currently I check which one was used by seeing if the Node range is multi-line or the single-line range length is greater than the string unicode length + 2 (which sometimes fails when doing a bunch of escapes like escaping single quotes in a string but it’s generally a decent guess): elm-syntax-format/src/ElmSyntaxPrintDefunctionalized.elm at main · lue-bird/elm-syntax-format · GitHub

rupert · December 14, 2025, 6:27pm

My pretty printer does not just print the code as elm-format does. Instead it lets you print the code to a particular page-width (and now with an optional ribbon-width too). It re-flows the code to try and fit it to a particular page with. If there are broken lines which are short, it may reassemble them into single lines. If there are lines which are too long or lead to too deeply indented code further on, it may break those lines to get everything to fit.

So I could find all comments in a range. But the ranges are changing during this code reformat. So I will not know what range to find all comments for, if you see what I mean?

===

The idea is to get this reformated output as close as I can to being idempotent under elm-format - that is, running elm-format on it will not make any changes. In practice this will be impossible to do when fitting, because elm-format sometimes puts things on a line regardless of how long it is. But hopefully I can get very close, or find ways to violate line length restrictions where absolutely needed to match elm-format.

Most likely I will just make a tool that runs a re-format and then after runs elm-format on it anyway to ensure an exact match.

Viir · December 14, 2025, 8:26pm

Really curious timing, just ran into these issues this week, and am currently working on an expanded version of the syntax model.

Adding to what @lue-bird already mentioned, here is the link to the PR adding the triple quote distinction in breaking-changes-v8: https://github.com/stil4m/elm-syntax/pull/189

The difficulty here is that these comments are gathered at the file level, with their Node giving their position in the code. If I re-format the code, this position may change, so how do I know where to put the comments back again?

When I am processing a node for formatting, I still have its original location/range. I use that to determine which of the comments (if any) belong here. In other words, I compare original node location with original comment location to associate them.

Some libraries for other languages do it this way. For example, in Roslyn, each node has both LeadingTrivia and TrailingTrivia properties. (Trivia not only contain comments, since it supports different kinds of whitespaces and line-breaks, all to be preserved exactly)
Fun exercises to figure out whether to put such a comment in the trailing trivia of the last, or the leading trivia of the next expression node…

But, so far the model keeping the comments in the list at the root seems to work well. I did not run into a case which made me want to change that aspect of the syntax model.

Absolutely, I am experimenting with a modified version to add locations of these tokens.

Provide position of tokens

opened 09:17PM - 09 Aug 21 UTC

jfmengels

enhancement proposal

I would very much like to be able the position (technically, the `Range`) of mos…t tokens of the AST (turning it into a CST I guess?). One use-case for that is to provide higher quality fixes or squiggly line targets in `elm-review` for instance. For now, I have used hacks to get around this missing information. I am thinking of the tokens like these: - Position of `if`, `then` and `else` keywords for an if expression - `case` and `of` keyword for case expressions, as well as `->` for every pattern - `module`, `exposing` and parens for what is exposed (as well as `effect` and `port`) - `|` in a record update expression - `in` in `let` expressions - `:` and `=` in function declarations. Also the `->` in between types of a type annotation - ... The thing with (almost?) all of the above is that whitespace can be manipulated, and because of that, it becomes quite hard to know for sure where the keywords are. We also don't want to bloat the AST too much, so maybe whenever the keyword is known to always be at the very beginning or very end of the expression, we don't provide it (sorry `if` and `case`). For that reason, we should also not provide the tokens for strings' `"`, or for `(` & `)` for tuples and parens. Though in the case for strings, we should at least have some way of knowing whether single or triple quotes are used (https://github.com/stil4m/elm-syntax/issues/57). I don't yet know how best to represent delimiters, like `,` in lists and records, or `|` in custom types, and probably more.

Viir · December 14, 2025, 11:03pm

To work around the problem of figuring out placement between these location-less tokens on one side and comments on the other, we used an approach that places the tokens where the gaps between comments are.

The following is an example from our test suite that we succesfully round-trip using the original stil4m/elm-syntax (version 7) model:

module Test exposing (..)


decl a b =
    let
        alfa =
            42
    in
    if a == 71 then
        [ 13 ]

    else if
        -- A comment before condition
        (a == 73)
            || (a == 77)
        -- Comment between conditions
    then
        [ 17 ]

    else if
        a == 79
        -- Another simple comment
    then
        -- And another simple comment
        [ 21 ]

    else
        -- Yet another simple comment
        [ 23 ]

While this might be sufficient to achieve proper round-tripping aligned with avh4/elm-format, the implementation is somewhat hacky here: Since our architecture has a formatter separate from the renderer, the renderer has duplicate implementations of this placement logic.

jfmengels · December 15, 2025, 9:03pm

I am in agreement with you all, there’s a lot of pain points with elm-syntax, and the v8 branch/project is work in that direction. I described the future of elm-syntax v8 during its last major development. There’s an branch/PR with all the ongoing changes: Ongoing branch for v8 by jfmengels · Pull Request #188 · stil4m/elm-syntax · GitHub

Unfortunately, I have not had the time/energy to work on it recently. But if people want to discuss, propose or even make PRs, I’ll do my best to help out so we can get this to a close. In my OSS experience, there’s nothing like collaboration to re-ignite the fire. And it would feel so good to get over this milestone!

The simple/triple quotes is one of the issues fixed on the v8 branch.

As for comments, the discussion has been brought up many times, and I don’t think I have been convinced by anything yet (but it’s been a while). I’m absolutely open to changing my mind, but right now I’m going with not changing anything and leaving the comments be a separate list next to comments.

rupert · December 16, 2025, 9:15am

Thanks - I had not realized there was a v8 branch till lue mentioned it, I will check that out.

I can try re-inserting the comments from the list, using original position ranges from the nodes being re-formated. Perhaps it will not be as hard as I imagine, but knowing how hard it was to write the pretty printing code, I do imagine it will be extremely fiddly. Its worth exploring this though as keeping the comments in a list is definitely a much simpler model than working them into the AST directly.

system · December 26, 2025, 9:16am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Elm pretty printer invariant under elm-format (for code gen) Show and Tell	4	940	September 29, 2019
How does elm-format work for the Elm community? Request Feedback	19	2647	March 23, 2019
Hi Elmers, Let's talk about Elm's code formatting Learn	19	3514	October 17, 2018
Rewriting elm-syntax and future plans Show and Tell	6	908	August 23, 2024
ESLint equivalents in Elm Show and Tell	4	694	August 26, 2022

Elm-syntax - The rough edges

Related topics