Elm Compiler in Elm - Update

pit · November 7, 2024, 8:23am

I finally managed to (hopefully) solve the stack overflow problems when running my port of the Elm compiler on other browsers than Safari.

With the new version I can compile the big module of the elm-default-tailwind-modules package in my Chrome browser, and I can compile the compiler with itself, too There still might be problems with other big projects, though, so please tell me if you happen to encounter a stack overflow or other errors.

If you’re interested in the details of the changes I applied, feel free to ask.

pit · November 9, 2024, 7:18am

In the Elm Slack I was asked what I had changed. I’m answering here because the posts stay longer.

Here is a short list:

In the original compiler, types like Parser, Decoder, Result, Tracker are implemented using a technique which is called (I think) “Church Encoding”. I wanted to add something like a stack-safe loop function to some of these types, but didn’t know how to do this in the Church Encoding, so I changed the defiinition of these types. For Decoder and Tracker this was all I had to do to solve the stack overflow problems there.
For Parser, Result and my IO monad I added the said “loop” function and used them where needed. I also used them to add some optimized fold and traversal functions to Result and IO.
In some places, the Elm compiler builds a graph (of nodes and edges) and then uses a library to dermine the so-called “strongly connected components” of such a graph. I had ported the code from the Haskell library to Elm, but as it turned out, that hasn’t been stack-safe. I had to re-implement the graph algorithm from scratch.
Finally, there were some functions where I had to manually try to make them tail-recursive, sometimes using a so-called continuation passing style: the functions constrainDecls and solve from the type checker, and the function checkDecls from the nitpicker. I was also lucky that Jeroen Engels showed me in his pull request how to make my implementation of the MVector type tail recursive. Thank you again @jfmengels.

This is just a very high-level description of my changes. If you want to know more, I’ll try to find some time to describe some points in more detail.

I was also asked whether these changes had any impact on performance, but after some very rudimentary testing I can’t see any substantial difference.

marcw · November 18, 2024, 8:25pm

Thank you for making this even better and finding ways to not exceed stack limits on Firefox and Chromium.

Do you maybe have a basic introduction to this maybe-named “church encoding” technique?
Is it similar to defunctionalization, where instead of stacking function executions one returns records that are then handled in later runs, like e.g. in elm/parser loop?

pit · November 27, 2024, 7:12pm

Hey Marc, I am currently handicapped because of a cold, but I’ll try to answer as soon as possible…

pit · December 6, 2024, 1:02pm

Sorry Marc, I need more time to recover. Just posting this to keep the thread open…

pit · December 15, 2024, 8:56am

Hi Marc, now I’m back again

Instead of giving you just a few links I decided to explain this with a brief, little example. I hope that I’m not writing too much again, but here’s the first part:

Part 1: Either as an Algebraic Data Type

Let’s define the Haskell type Either as an Algebraic Data Type, in Elm called “Custom Type”:

module Either exposing (Either(..))

type Either a b
    = Left a
    | Right b

This type looks pretty much like Elm’s Result type, but in Result x a it is clear that x is an error type and a is the result type. In Either a b the types a and b are equivalent, one isn’t “better” than the other.

Remark: In Haskell, Either is often used like Elm’s Result type with Left as the error case and Right as the result case, but in the following example we treat both as equal.

To create values of type Either we use the constructor functions Left and Right:

import Either exposing (Either(..))

eitherList : List (Either Int Char)
eitherList =
    [ Left 1
    , Right 'a'
    , Right 'z'
    , Left 9
    , Right 'p'
    ]

To process values of type Either we use case expressions:

eitherToString : Either Int Char -> String
eitherToString anEither =
    case anEither of
        Left i ->
            "Int: " ++ String.fromInt i

        Right c ->
            "Char: " ++ String.fromChar c

Finally, we can put all this togther in a little program:

import Html exposing (Html)

main : Html msg
main =
    Html.text (String.join ", " (List.map eitherToString eitherList))

and if we run this we get

Int: 1, Char: a, Char: z, Int: 9, Char: p

Here’s an Ellie with the code: (https://ellie-app.com/t5jVrfnPnMfa1)

The second part will follow soon…

Edit: changed “Abstract” to “Algebraic Data Type”.

pit · December 15, 2024, 9:34am

Part 2: Either as an Opaque Type

Now suppose we want to make the type Either in module Either opaque.

If we do so, we can’t use the constructors Left and Right in other modules anymore, and we also can’t use them to pattern match in case expressions. This means we need some helper functions.

For creating values of type Either we can define and expose the following functions:

left : a -> Either a b
left =
    Left

right : b -> Either a b
right =
    Right

Now the code of our previous example looks like this:

import Either exposing (Either, left, right)

eitherList : List (Either Int Char)
eitherList =
    [ left 1
    , right 'a'
    , right 'z'
    , left 9
    , right 'p'
    ]

Easy.

But what about processing values of type Either in case expressions like our previous example?

eitherToString : Either Int Char -> String
eitherToString anEither =
    case anEither of
        Left i ->
            "Int: " ++ String.fromInt i

        Right c ->
            "Char: " ++ String.fromChar c

Such a case expression takes an Either value and turns it into a new result type. If I name the result type z, we need a function with a signature like:

someFun : Either a b -> z

But that’s not enough. The caller of this function also has to tell us how to turn an a value into a z value in the Left case, and how to turn a b into a z in the Right case. So we need to pass two more functions to someFun:

someFun : (a -> z) -> (b -> z) -> Either a b -> z

In Haskell, this function is called “either”. Here’s the implementation in Elm:

either : (a -> z) -> (b -> z) -> Either a b -> z
either leftFun rightFun anEither =
    case anEither of
        Left a ->
            leftFun a

        Right b ->
            rightFun b

Using this function, we can rewrite our eitherToString function like this:

eitherToString : Either Int Char -> String
eitherToString anEither =
    either
        (\i -> "Int: " ++ String.fromInt i)
        (\c -> "Char: " ++ String.fromChar c)
        anEither

The rest of the code doesn’t have to change. Here’s an Ellie with the new code: (https://ellie-app.com/t5kDfyrsxRPa1)

The next part will follow soon…

pit · December 16, 2024, 7:59am

Part 3: Either in Church (?) Encoding

Ok, now that the type Either is opaque, we can change its implementation.

Notice: I always thought that the new implementation I’ll show you in this part was called “Church encoding”, but I’m not so sure anymore. I learned the name a while back when I experimented with so-called “Free Monads”. You can see a Haskell implementation of Free Monads as an Algebraic Data Type in this module: Control.Monad.Free. Back then I noticed there’s another module called Control.Monad.Free.Church which implements Free Monads as a function in the so-called “Church encoding”. This is where I got the name from.

But as I said, I think that I’ve been using the wrong term. When Marc asked his question above, I looked at the Wikipedia article for “Church encoding”. At the bottom there’s a link to the so called “Scott encoding”, and I now think that this is a better name.

So thank you Marc for helping me recognize this mistake.

The new implementation is related to the either function from the last part:

either : (a -> z) -> (b -> z) -> Either a b -> z

This could be read as follows: if we have an Either a b value, we can turn two functions (a -> z) and (b -> z) into a z value.

The so-called “Scott encoding” (!) follows exactly this interpretation. It defines the type Either as:

type Either a b
    = Either ((a -> z) -> (b -> z) -> z)

Unfortunately, this isn’t valid Elm code.

In Haskell we could define an Either type like this:

newtype Either a b =
    Either (forall z. (a -> z) -> (b -> z) -> z)

The forall keyword allows us to hide the type parameter z, but in Elm we have to make it explicit, so we have to change the type Either to:

type Either z a b
    = Either ((a -> z) -> (b -> z) -> z)

With this new definition, the helper functions can be implemented as follows:

left : a -> Either z a b
left a =
    Either (\leftFun _ -> leftFun a)

right : b -> Either z a b
right b =
    Either (\_ rightFun -> rightFun b)

either : (a -> z) -> (b -> z) -> Either z a b -> z
either leftFun rightFun (Either eitherFun) =
    eitherFun leftFun rightFun

As a client of module Either, we have to add the new type parameter z to the Either type:

eitherList : List (Either z Int Char)
eitherList =
    [ left 1
    , right 'a'
    , right 'z'
    , left 9
    , right 'p'
    ]

Normally we can leave the type parameter z unspecified as in the code above.

Only when we call the either function, the type parameter must match the result type:

either : (a -> z) -> (b -> z) -> Either z a b -> z

All four z types must be the same.

This turns the eitherToString function from the last part into:

eitherToString : Either String Int Char -> String
eitherToString anEither =
    either
        (\i -> "Int: " ++ String.fromInt i)
        (\c -> "Char: " ++ String.fromChar c)
        anEither

Notice that the type parameter z is String in this case, because we turn the Either value into a String.

The rest of the code doesn’t have to change. Here’s an Ellie with the new code: (https://ellie-app.com/t5KPkjJG9Lxa1)

Stay tuned for more…

pit · December 21, 2024, 2:33pm

Part 4: Scott Encodings in the Elm Compiler

Scott encodings are used in many places in the Elm compiler. Here are all I know of, shown in the Elm version of the compiler:

type Decoder z x a
    = Decoder
        (AST
         -> (a -> z)          -- ok
         -> (Problem x -> z)  -- err
         -> z
        )

type Extractor z a
    = Extractor
        (Set.Set Opt.Global
         -> Set.Set Opt.Global
         -> (Set.Set Opt.Global -> Set.Set Opt.Global -> a -> z)  -- ok
         -> z
        )

type Parser z x a
    = Parser
        (State
         -> (a -> State -> z)                       -- consumed ok
         -> (a -> State -> z)                       -- empty ok
         -> (Row -> Col -> (Row -> Col -> x) -> z)  -- consumed err
         -> (Row -> Col -> (Row -> Col -> x) -> z)  -- empty err
         -> z
        )

type Result z info warnings error a
    = Result
        (info
         -> warnings
         -> (info -> warnings -> OneOrMore.OneOrMore error -> z)  -- bad
         -> (info -> warnings -> a -> z)                          -- good
         -> z
        )

type Solver z a
    = Solver
        (State
         -> (State -> a -> (State -> z) -> z)  -- ok
         -> (State -> z)                       -- backtrack
         -> (Exit.Solver -> z)                 -- err
         -> z
        )

type Task z x a
    = Task
        ((a -> z)     -- ok
         -> (x -> z)  -- err
         -> z
        )

type Tracker z a
    = Tracker
        (Int
         -> Set.Set Opt.Global
         -> Map.Map Name.Name Int
         -> (Int -> Set.Set Opt.Global -> Map.Map Name.Name Int -> a -> z)  -- ok
         -> z
        )

type Unify z a
    = Unify
        (List Type.Variable
         -> (List Type.Variable -> a -> z)   -- ok
         -> (List Type.Variable -> () -> z)  -- err
         -> z
        )

This code worked fine as long as I tested it in my Safari Browser. But when I tried to run it in Chrome, I got stack overflow errors in several places, most of which had to do with these Scott encodings.

In the last part I’ll show you how I fixed those problems.

pit · December 23, 2024, 8:59am

Part 5: From Scott Encodings to ADTs

Let’s look at one example of the Scott encoded types from the last part, the Parser type:

type Parser z x a
    = Parser
        (State
         ->
            {- Scott encoding -}
            (a -> State -> z)                       -- cok
         -> (a -> State -> z)                       -- eok
         -> (Row -> Col -> (Row -> Col -> x) -> z)  -- cerr
         -> (Row -> Col -> (Row -> Col -> x) -> z)  -- eerr
         -> z
        )

The stack overflow errors had to do with “chains” where you use functions like andThen (in Haskell called bind). With andThen, you first execute one parser, and then, depending on the result, another parser. If you do this many times in a row, for example when parsing a module with thousands of declarations, you get a long chain of functions. In browsers other than Safari, this caused a stack overflow.

With my still limited functional programming know-how, the only way I could solve the problem was to change the implementation of some Scott encoded types to ADTs (Custom Types), just the opposite of what I did in part 3.

So how can we do this for the Parser type? If you compare Parser with the Either type from part 3, you see that the function in the Parser constructor has an additional parameter of type State before the “normal” Scott encoding (the function parameters of the form ... -> z). In the code above I marked the place of this split with the comment “{- Scott encoding -}”.

With an ADT, we can only replace the part after this comment, so the new implementation of the Parser type still is a function with a State parameter which looks like this:

type Parser x a =
    Parser (State -> PStep x a)

And the ADT replacement for the part with the Scott encoding looks like this (I took the name PStep from the “elm/parser” package):

type PStep x a
    = Cok a State
    | Eok a State
    | Cerr Row Col (Row -> Col -> x)
    | Eerr Row Col (Row -> Col -> x)

After this change, I could implement a stack-safe loop function like the one in the “elm/parser” package, which finally helped to solve the stack overflow errors.

To end this series of posts, here’s a list of the other types I changed from Scott encodings to ADTs:

type Decoder x a =
    Decoder (AST -> DStep x a)

type DStep x a
    = Dok a
    | Derr (Problem x)

type Result info warnings error a =
    Result (info -> warnings -> RStep info warnings error a)

type RStep info warnings error a
    = Rbad info warnings (OneOrMore.OneOrMore error)
    | Rgood info warnings a

type Tracker a =
    Tracker (Int -> Set.Set Opt.GlobalComparable -> Map.Map Name.Name Int -> TStep a)

type TStep a =
    Cok Int (Set.Set Opt.GlobalComparable) (Map.Map Name.Name Int) a

You can see that the conversion is very mechanical. (I simply adopted the naming convention from the parser example for the other new ADTs.)

Thank you for your patience

system · January 2, 2025, 8:59am

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Zokka: Bug fixes for the Elm compiler and core packages Show and Tell	7	1667	February 2, 2024
Tailwind and Elm = Love? Request Feedback	22	2372	December 7, 2020
Current State of General Purpose ElmPlus Request Feedback	24	1568	March 7, 2025
Project Proposal: Compiler for libraries Request Feedback	6	1726	April 28, 2018
Backward compatibility of new Elm-like languages to Elm Request Feedback	3	286	June 24, 2025