Idea: Extensible Union types and benefits they bring for real world Elm code

As for languages that already implement this, apart from OCaml, the closest next approximation is Purescript with Now I think there are aspects of that library that don’t work for Elm (and the fact that it’s a library rather than built-in makes its ergonomics pretty terrible compared to the standard that Elm puts forth), but it’s a good example of how it’s really just all row types under the hood (i.e. just piggy-backing off the same fundamental mechanism behind extensible records).

RE the error messages, as I’m translating some current Elm SPA codebases to this pseudo-Elm syntax I’m also starting to build a catalog of potential error messages to try to get a feel for how unwieldy the error messages can get in day-to-day coding. It’ll take a while though before I finish those up I imagine.


I don’t think it’s exactly the same as in Daan’s paper.

For example, every record literal has concrete type. So { x = "hi" } can never have a type like { a | x : String } which is really important for getting good error messages in case expressions.

In contrast, I believe @X "hi" would need to have the open type [a | @X String ] if you wanted to try to make it like record types. I believe this is the only way way it could be fed into a case expressions expecting a closed [ @X String or @Y Int ] type with rules similar to Daan’s. I think that’s an important difference when it comes to error message quality!

And if the literal type is open, wouldn’t the whole case result type be open too? Or is there a special typing rule for case expressions that does not appear in Daan’s paper?

I think these kinds of differences would come out in typing rules specific to your idea.

Note: I included the OCaml quote for readers who may be new to this topic. I personally cannot fully evaluate how a proposal like this differs other things without seeing the specific typing rules.


Yep exactly! This thread may be getting too long for its own good; I think there’s some confusion on exactly what the proposal is at hand. This is long and verbose so I’m not sure if you’ll have the time to read it, but I’ve posted the full (informal) outline of the proposal

I think that’s an important difference when it comes to error message quality!

I don’t actually think the hit to error messages is too high (although I’ll know more about that once I finish writing up that hypothetical error catalog). There’s already an analogous situation in Elm today with field names and records (record literals have concrete types, but field literals have polymorphic types whereas case matches have concrete types and variant literals have polymorphic types). That’s a perfect segue into

And if the literal type is open, wouldn’t the whole case result type be open too?

No, but that’s because a literal used in a case is specialized to a concrete type. case matches can only occur with concrete types (which is why catch-all cases are disallowed).

For example from the scoped labels paper, tab works fine with showEvent even though tab is an open type and showType is entirely closed types.


I’ll have to step out of the discussion for now. I’d need to see typing rules specific to this idea to be able to engage further.


Fair enough. I’ve got to run too, but sometime over the next few days I’ll type out how the whole thing desugars to Leijen’s paper.


Just jamming in a quick thanks for that great explanation of the pitfalls! Really interesting, thanks for taking the time to lay it out!

1 Like

May I provide some interesting reading material about this topic?

From there you can follow the deep rabbit hole of references. I’ve been looking into making a language centered around the idea of extensible data types because it honestly deserves more attention that it’s getting.


Well life made this take a week or two or more than I had hoped to get back to this…

@lboklin good eye! Indeed Koka is the brainchild of Daan Leijen, the guy behind the record paper that I’m desugaring everything behind and whose work forms the basis of Elm’s record types!

@evancz (and anyone else interested): I finally got around to desugaring everything to Leijen’s paper. It’s typed as a PDF because writing proof trees is hell without LaTeX.

The PDF ended up being a bit longer than I had hoped it would be, but the main takeaway is the one page that desugars all my examples in my original post to Leijen’s formalisms. The rest of the PDF is just demonstrating that my assigned type signatures are valid.


After only glancing a little over your blog post last time, I’ve now committed to thoroughly reading it this time.

First of all, some nitpicking:

(please skip to “Now, an Idea”, if you want to read interesting stuff)


One of the code examples lists this code:

type alias T1 = @A () or @B () or @C ()

decodeFromString1 : String -> Maybe (a or T1)
decodeFromString1 str = case str of
    "a" -> Just <| @A ()
    "b" -> Just <| @B ()
    "c" -> Just <| @C ()
    _ -> Nothing

There’s no syntax for something like { a | Person } in Elm right now, so I don’t quite understand why there should be (a or T1).

Shouldn’t you add the type parameter to T1 directly? Sorry if this came up in the thread already. I think I’ve read through all of this, but that was two weeks ago already.


In general, the elm community seems to not recommend using extensible record types.
One of the reasons is that an alternative to extensible records is simply nesting records. That’s what Evan recommended years ago (maybe his views have changed).
But your blog says the recommendation is to ‘Keep your models flat’.
Trying to combine the two recommendations, I’d argue that ‘Keep your models flat’ doesn’t mean ‘Keep your models flat and abstract with extensible records instead of nesting records’, but something like ‘Keep your models flat for now and don’t abstract prematurely’.


You make two big points to advocate for extensible union types.

At a high level extensible unions generally bring two things to the table:

  1. The ability to make a type’s size “just right” when a union type is the output of a function.
  2. The ability to use flat hierarchies of union types rather than being forced into nested hierarchies.

I really like how concise they are, btw.

But (unless I’m missing something) these points seems to be more about the structuralness than about the extensiblity of union types. I think it would make sense to ‘pitch’ your Idea to be more about structural union types than about extensible union types. Obviously, you’d need extensible union types, as you need to infer something useful, for e.g. f = @A (), but the big benefits of your proposal could be used without ever writing any extensible union type signatures, just like today many people use the huge benefits of structural record types without ever writing extensible record type signatures, even though the ‘extensible’ feature is absolutely necessary for structural record types to exist in conjunction with type inference.

Though that’s just a mad level of nitpicking about how to pitch this. :smiley:

Now, an Idea

We can (almost) try out (a little less ergonomic version of) extensible union types today!

One huge problem this proposal is set out to solve is to break up huge message types that naturally grow, when you grow an application.

Let’s say you have such a message type Msg. Now you’ve got a view function:

view : Model -> Html Msg
view model =
        (viewHeader model)
        (viewBody model)
        (viewFooter model)

viewHeader : Model -> Html Msg
viewHeader = ...

viewFooter : Model -> Html Msg
viewFooter = ...

Now, viewHeader and viewFooter have this problem that you mention in your article: They’re over-constrained.
Just like

-- An exaggerated example that people don't do, but illustrative
addOne : Int -> Maybe Int
addOne x = Just (x + 1)

which doesn’t need to return Maybe, viewHeader doesn’t produce any constructor of Msg, but maybe just 6-10 of them, like GoToPage, when you click on a settings icon, or ToggleDarkTheme and some more.

But as you noted in your article, records and unions are dual to each other. So when you have extensible record types, you almost have extensible union types:

viewHeader :
    { a
        -- these fields mirror the signatures of your union constructors
        | goToPage : Url -> msg
        , toggleDarkTheme : msg
        ... -- other messages used by viewHeader
            -- but no messages used by e.g. viewFooter
    -> Html msg
viewHeader =

This way, you make the msg type of your viewHeader function ‘just right’.

What you’d do to use this, would be to create one huge record of all of your constructors for Msg, let’s call that msgConstructors and pass that to all view functions:

view : Model -> Html Msg
view model =
        (viewHeader msgConstructors model)
        (viewBody msgConstructors model)
        (viewFooter msgConstructors model)

msgConstructors =
    { goToPage = GoToPage
    , toggleDarkTheme = ToggleDarkTheme
    , emailFormSubmit = EmailFormSubmit -- used in viewFooter
    ... -- a field for all other message constructor

However, there are still differences with this approach and actual structural union types:

  1. You have to define this msgConstructors record!
  2. When adding a message constructor, you need to also add it to msgConstructors. Also, msgConstructors and Msg can get out-of-sync and there can be errors in translation.
  3. Abstracting out these ‘extensible union types’ doesn’t take only 1, but 2 type parameters:
    type alias HeaderMsgs a msg =
        { a
            | goToPage : Url -> msg -- what else, if not 'msg'?
            , toggleDarkTheme : msg
    With actual extensible union types, it’s only 1 type parameter.

But I like that it’s almost possible to try out how extensible union types could be used to structure an elm application today. In such a way that it’s also possible to run it and verify that it gives you the benefits you hope for.

I also remember that something like msgConstructors gets auto-generated in the functional programming language ‘Dhall’ for their extensible union types. They essentiall support this:

type alias HeaderMsgs =
    @GoToPage or @ToggleDarkTheme

msgConstructors = HeaderMsgs -- a value-level identifier generated for the type-level type alias definition.

For more info on that feature see the dhall language cheatsheet.

Earlier I suggested this is not exactly like extensible records for two reasons:

  1. That the type of value @X "hi" would be the open type [ a | @X String ]
  2. There needs to be a special rule for case expressions.

Both of these differences appear in your write up, which makes sense. The fact that the general mechanism is the same does not account for practical differences like this when it comes to implementation and error messages. That’s the point I was trying to make above.


The fact that @X "hi" has an open type has particularly big consequences with error messages. Take this function:

toOutput input =
  case input of
    @A a -> @Quick a
    @B b -> @Brown b
    @C c -> @Fox c
    @D d -> @Jump d
    @E e -> @Lazy e
    @F f -> @Dog f
    @G g -> @How g
    @H h -> @Much h
    @I i -> @Wood i
    @J j -> @Could j
    @K k -> @Wood k
    @L l -> @Chuck l
    @M m -> @Chuck m
    @N n -> @If n
    @O o -> @Wood o
    @P p -> @Coud p
    @Q q -> @Chuck q
    @R r -> @Wood r
    @S s -> @Could s
    @T t -> @Chuck t
    @U u -> @Wood u
    @V v -> @Could v
    @W w -> @Check w
    @X x -> @Wood x
    @Y y -> @Brown y
    @Z z -> @Fox z

Where is the typo? Is there one typo? Zero? Multiple?

And are there any type errors here? How would you even begin to figure that out?

With a closed ADT, the compiler can underline the exact constructor that has a typo or type error, with or without a type annotation.

It is easy to say “this is no problem if people just add the type annotation” but this is not fully true:

  1. You can only give an error for the whole case branch, not for the specific constructor. So with a branch with let x = ... in x, instead of underlining Coud directly, it is going to say "there is something wrong with the type of this let or this let body. If the branch is 10 lines or 20 lines, this is significantly less specific. So even in the best case, the error message is much less specific.

  2. Say you want intend for @Whatever x to have type [ a | @Whatever Int ] but in a large case branch someone ends up using the constructor with a String value. With the current design, you get the error message directly under Whatever x but with the proposed design you can only say “this branch does not match the type annotation” again getting 10 or 20 line chunks on large case branches.

  3. My experience very strongly suggests that if a program can be written, it will be written. People will be looking at 400 and 600 line case expressions hunting for a typo or type error with associated data. At that point, it is fair to say “the error messages are not very good” and there is no real way to get the quality back besides not using the feature.

In the past, we took out the ability to change the type of record fields in the record update syntax specifically because of problems (1) and (2) where you couldn’t get good specificity, particularly with unannotated cases which are not uncommon in practice. Even after restricting the design of records, it’s still hard to underline the specific field name that has a typo.

I hope this establishes the error message quality issues clearly.


Say we have this BEFORE and AFTER code, where we are getting the best case error messages for both open and closed union types:


type Output
    = Quick String
    | Brown String
    | Fox String
    | Jump String
    | Lazy String
    | Dog String
    | How String
    | Much String
    | Wood String
    | Could String
    | Chuck String
    | If String

toOutput : Input -> Output
toOutput input =


type alias Output
    = @Quick String
    or @Brown String
    or @Fox String
    or @Jump String
    or @Lazy String
    or @Dog String
    or @How String
    or @Much String
    or @Wood String
    or @Could String
    or @Chuck String
    or @If String

toOutput : Input -> Output
toOutput input =

To my eye, the AFTER looks harder to understand and comes with a bunch of downsides that will come up in practice a lot. Conservatively lets say 50% of users don’t think about all the ways the error messages are impacted by the BEFORE and AFTER and many are not writing type annotations, especially beginners. The result is that error messages are worse in practice, and there is nothing that really can be done about it.

The best path to deliver them good quality is to strongly recommend against using this feature at all. Furthermore, beginners would be seeing person A recommending this feature highly and person B recommending against it strongly. What should they do? Should they try both? Is this question important to making the website or game they set out to make? Do teams need to argue about it in their style guide on features to use or not? (Any team I’ve been on that uses C++ or Haskell has had a style guide banning specific features specifically because there are so many trade-offs with extensions, macros, features, etc. The reason these style guides are so common is that there is a real tension between “make working code that others can read and modify easily” and having lots of ways to express the same thing.)

So based on my understanding of the design, it seems like some aspects of error message quality can be addressed (e.g. maybe with pattern part of case branches) but that there are still significant tradeoffs in error message quality in other areas.


I could be wrong about things here, but it feels like this kind of feature is a bit risky for Elm. I try to prioritize ease-of-learning and error message quality very highly. So while I am open to the idea that someone could figure out how to make open union types strong on those points, I would not be comfortable running that experiment in Elm with the information I have at this point.

It seems like a lot of cool things could be done if the core design of the language was “always use open union types” and there just wasn’t closed unions except when you say [ X Int | Y String ]. That would also mean that @Foo could be written as Foo without clashing with another language feature. It’d look a lot cleaner, and it may end up leading towards a different style of typed functional programming that many people could be into. (Lots of people value flexibility and having many ways to do things! E.g. people who prefer Ruby over Python! So even if the error messages are never as good, many people have priorities where that is a worthwhile tradeoff.)

So it feels to me like something worth exploring independently to get a feeling for the full implications in a setting where the culture, best practices, libraries, etc. can all evolve in a coherent way, with flexibility prioritized a bit higher than other things.


I just want to drop my two cents in here:

I am reading this discussion with excitement and joy, even when I don’t really understand the specifics of the details. However, what this discussion boils down to me is this:

  1. It is basically a ‘Keeping Elms integrity and spirit vs. Effort to make beneficial changes’ strong energy investment. To be honest, this one point, in my opinion, is immature to the Language and Community. A lot of theory in this case (which of course is also important) vs. way less practical applications and use case scenarios.

  2. It boils down that some people make a great effort not knowing that their worked out solution works against the core strengths of Elm and others make a great effort for keeping the spirit and core features. So what could we do about that to approach things like this better and invest energy more efficiently ? Even more generally speaking, the culture of developing Elm ? This is what I see in the discussion.

Open unions are still quite useful without the extensible part, the main feature in the examples provided by @yosteden and even the Ocaml guide are the composition part, where different constructors can be composed different ways, like this example How to represent many subsets of an enum?.

Could it possible to have exactly the dual of records for Elm, where features like adding fields/variants are just removed to achieve the same error message experience?

I agree with the trade off part, but can we just force type annotation on functions with open unions (otherwise compilation error with a nice suggestion) to make the experience a bit smoother?

@eimfach I don’t know about @evancz’s feelings here, but I think this current discussion is perfectly cromulent! I view Evan and I as playing out the roles we’re each supposed to have in this discussion. Evan is supposed to be the conservative, wary gatekeeper, one who has seen countless other breathless proposals of “the next best thing since sliced bread” that must be added to Elm.

I play the role of the breathless proponent! As I said at the outset of all this, although I would very much love to see this feature added to Elm, this thread is primarily about sparking discussion around it rather than forcing a decision on its admission.

@Philipp_Krueger after writing out my post I link later in this reply I’m a bit out of gas to give you a full reply that goes into detail about your idea, but I’ll just say that your proposal has legs! You’ve stumbled on a pattern that sometimes goes by the name of the Translator Pattern and is found in a lot of Elm libraries (notably a lot of No Red Ink’s libraries use this pattern). It turns out there’s actually another step forward you can take which is to use a record of functions to entirely represent an extensible union (instead of having an additional output type that the record maps to, e.g. your Html msg). This technique goes by various names (Church encoding, Scott encoding, Boehm-Berarducci encoding, etc.), but the basic idea is the same as yours. Unfortunately it can’t be full implemented in Elm due to the lack of higher-rank polymorphism (it’s a good thing Elm doesn’t have this) and so comes with sharp edges in Elm as it stands.

And as for @evancz’s comments, I’ve typed up my reply as a separate blog post because it’s too long to fit in a single Discourse reply. The post is here:

Basically, while his comments are true for the way that Elm currently implements error reporting, I don’t think this is intrinsically true of extensible unions. In fact, I think it’s possible to have specific error messages localized to a single offending line even in the case of completely unannotated code, let alone code with annotations!

My post goes into that. The actual algorithm is quite straightforward, but it requires quite a bit of set up to motivate and explain why it works.

Also, my current iteration of the algorithm is somewhat slow and clunky, if straightforward, and I think it can be improved significantly, but I’m putting it out there to show the quality of error messages we can potentially have and to show that the tension between extensible error messages is not an intrinsic one, but an extrinsic one specific to the implementation of an error reporting system.

One big thing to note is that what I propose in this post is potentially independently useful for Elm, even outside of extensible unions. I’ll couch a lot of this reply in the language of extensible unions since that’s what we’re talking about, but everything I say here pertains only to row types, and so pertains to Elm’s records as well. N.B. row types are the types that underlie both record types and extensible union types, the set of types and type tags (i.e. field names in records and variant tag names for extensible unions) within the brackets. As shown earlier here, ultimately the typing judgments for both records and extensible unions are the exact same.

Here’s the overall structure of my post, for easy jumping to the parts that interest readers.


Great writeup. There’s a line I don’t quite understand. Did the sentence cut off?

Assume we have a type error between two row types A and B for a given expression E , at least one of which is polymorphic, take the row type that has two

Another one that’s confusing me:

The dropping of elements in step 3 occurs because we can have irrelevant symbols that are “swallowed” by functions and don’t contribute to the final

What do you mean by ‘C’?

First, unification results in x3 having a type @BB Int which f does not support. Then we collect all our symbols for C , which are the following:

Do you mean value? So, if I understand correctly, you try to list all symbols that appear in value?

I like this idea. I think it only works though when being used to construct members of an ‘extensible’ union type, rather than to pattern match on them as inputs. In the expression below, the msg is always the return type of some function passed in, not the input type. Is the right way to describe this to say that this msg type is always in the contravariant position? Too long since I studied the theory…

{ a
| goToPage : Url -> msg
, toggleDarkTheme : msg

But I think you are on the right track as far as I am concerned. You can sort of do extensibility or sub-sets of union types in Elm in a number of ways. It may add some complexity, but I also think its an attractive property of Elm that it forces you to trade complexity for flexibility - do you really need the flexibility? if so, you must pay a price for it.

Also worth noting @gampleman’s proposed encoding in Elm too:


Regarding use cases in current Elm code, I am reminded of the Scale module in elm-visualization. I think that module effectively uses records of functions to simulate some of the functionality proposed here. The problem there is of bounded polymorphism - some operations need to work on some types, but not on all and there is some advantage gained from doing this in a polymorphic way.

I suspect this proposal could make it much nicer: Currently you have function signatures like this:

nice : Int -> 
    Scale { a | nice : domain -> Int -> domain, domain : domain } -> 
    Scale { a | nice : domain -> Int -> domain, domain : domain }

Whereas it could be coded as:

nice : Int -> @LinearScale or @QuantizeScale -> @LinearScale or @QuantizeScale

which I think would be much nicer for the users of the package (as they either have to do a lot of squinting at the types to figure out what is a valid argument or check the docs).

This signature appears to say that you could provide a @LinearScale and the function might return an @QuantizeSclale. This seems under-constrained. I haven’t worked my way far enough through the proposal yet to know whether something like Int -> (@LinearScale -> @LinearScale) or (@QuantizeScale -> @QuantizeScale) is expressible, but that seems more like what you would want.

While this is true, that also is the case if the signature was

nice : Int -> Scale -> Scale

so this is still an improvement. The point is that it prevents you from passing say a @BandScale.

@Philipp_Krueger yep sorry for those typos! I was typing quickly to try to crank out the post before I needed to do something else. I’ve corrected them and hopefully the post makes a bit more sense.

@rupert A small nit, but if the type appears in the output of a function it’s in a covariant position. Regardless, variance almost never shows up as an issue you need to think about in Elm (at least I don’t think I’ve ever run into it), because there’s only one kind of subtyping: specializing a type variable. In fact @Philipp_Krueger’s proposal exactly encodes a case match statement (you think of it as the record argument being a template case match statement that you then fill in with an actual record to do the actual case match).

Indeed Elm is very close to allowing that proposal to completely encode union types (extensible or not) with product types (extensible or not) and higher-order functions. This is effectively what OO languages do with the visitor pattern to encode union types when they don’t have them.

The actual sharp edge with @Philipp_Krueger’s proposal I mentioned has to do with the “very close” part. In particular, Elm lacks the ability to “delay” specialization of a polymorphic signature. For example, the following code has no problem with identity and specializes it separately to an Int and String argument in its two invocations.

x : Int
x = identity 5 + (String.length (identity "hello"))

However, as soon as you try to extract identity as an argument to a function, Elm refuses to compile the expression. There is no way in Elm’s type system to type the resulting function.

-- Does not compile
f : (a -> a) -> Int
-- Not the correct type signature, but the closest we can get
f identityAsArg = identityAsArg 5 + (String.length (identityAsArg "hello"))

This is because identityAsArg must be specialized once for the entire body of the definition. Its specialization cannot be “postponed” to each call site. With explicit polymorphic foralls we can express the signature we want

-- Elm can't write this signature, but if it could then this code would
-- compile
f : (forall a. a -> a) -> Int
-- Previously we had forall a. ((a -> a) -> Int)
-- Notice the position of the parens in the type signature
f identityAsArg = identityAsArg 5 + (String.length (identityAsArg "hello"))

This same problem is behind what @Philipp_Krueger’s astutely notices as an issue with the fact that there are two type parameters instead of one.

-- If we had this feature these two encodings would be equivalent
-- Notice output does not appear as a type argument for ExtUnion0
type alias ExtUnion0 a = forall output.
    { a
    | case0 : Int -> output
    , case1 : String -> output
    } -> output

type alias ExtUnion1 a = a or @Case0 Int or @Case String

Their equivalence can be seen by looking at a case match

handleInt : Int -> MyType
handleInt = ...

handleString : String -> MyType
handleString = ...

value1 : MyType
value1 = case extUnion1 of
    @Case0 int -> handleInt int
    @Case1 string -> handleString string

value0 : MyType
value0 = extUnion0
    { case0 = handleInt
    , case1 = handleString

Unfortunately, the additional type argument is more than a simple syntactic annoyance. With that additional type argument this encoding is equivalent 80% of the time, but has a huge gap in functionality for the remaining 20%.

type alias BadExtUnion a output =
    { a
    | case0 : Int -> output
    , case1 : String -> output
    } -> output

listOfValues0 : List (BadExtUnion {} output)
listOfValues0 = ...

matchToInts = { case0 = (\_ -> 0), case1 = (\_ -> 1) }

matchToStrings = { case0 = (\_ -> ""), case1 = (\_ -> "a") }

-- I would like to evaluate the first and second elements of the list to
-- have different output types, but the output argument means I have 
-- to treat the entire list uniformly
cantDoThisWontCompile = case listOfValues of
    x :: y :: _ -> 
        -- This won't work, I can only use matchToInts or matchToStrings,
        -- not both, since the concrete type of `output` can only be
        -- determined once
        (x matchToInts) + (y matchToStrings) 

whereas with normal extensible unions you could do separately typed case matches with no problem.

I think I know where this sentiment is coming from but I think it comes back to bite users. As the above demonstrates, there’s brick walls that a user can unexpectedly hit with partial encodings of extensible union types and more importantly the error messages that Elm starts generating become a bit confusing for a user not used to those patterns (this is a real problem with the Translator pattern today). On the other hand I think true extensible union types can have very informative error messages and don’t come with the brick walls.

It’s sort of the same reason why normal union types are so valuable even though in theory you could encode union types with product types and higher order functions. We could implement the equivalent of the visitor pattern everywhere instead of using union types, but it’s pretty painful to do so for no real gain (there are good reasons to be using union types and it’s probably more harmful than helpful to gate their access behind knowledge of a specific pattern).

Also just to be entirely clear:

I don’t think Elm should get explicit foralls. That way lies a lot of knock-on complexity (in its full generality we lose global type inference, unification becomes potentially trickier, etc.).

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.