Idea: Extensible Union types and benefits they bring for real world Elm code

(Egads! I’m a crazy Internet dude who’s written far far too much on this, please let me know if this should be written as a blog post and then just relinked here)

EDIT: I have posted most of this as a blog post here and only left the top-level summary so that it doesn’t cause users to have to keeping scrolling forever to get to the end. Please let me know if posting the original content here instead would make things easier to talk about.

Hello! This is an exploratory post of what extensible unions (a.k.a. polymorphic variants) might look like in Elm. Note these currently do not exist in Elm! This mainly a post to plant a seed in the minds of the collective Elm community and is not a call to action that they need to be implemented in Elm (although I would really really really love if they were).

One-line headline: I think extensible unions not only fill in a missing abstract corner of Elm’s type system (rather than being a entirely novel extension to its type system), but also solve many real-world problems affecting Elm codebases today, all without compromising and indeed augmenting the Elm compiler’s excellent way of guiding users through refactors.

This post is jumping off a separate post here with some folks who were interested in a deeper explanation of extensible unions/polymorphic variants. Since they expressed interest, I’ll tag @rupert and @dta here. I’ll also tag @ymtszw because their comment about “narrowing down” types is exactly what extensible unions are all about (despite their moniker of “extensible”, in the same that “extensible” records let you narrow down records).

This post ended up being really long (I swear I’m only slightly crazy, not totally so), so I apologize for its length in advance, but I do hope that people find it useful. It is in fact too long for a single post, so I’ll split it in two. (EDIT: I’ve moved it all to a separate blog post so I’ve deleted the second post and just linked it here).

Because of its length I’ll just lead off with a high-level summary.

Polymorphic variants or extensible union types are to normal Elm union types (e.g. type MyUnionType = UnionVariant0 | UnionVariant1 | UnionVariant2) what records and extensible records are to normal Elm product types (e.g. type MyProductType = ProductType TypeOfElement0 TypeOfElement1 TypeOfElement2).

As such they allow for many of the same benefits that records and extensible records bring to input types of functions and modularizing Elm models, to output types of functions and modularizing Elm messages and update functions.

The post is split up into the following sections:

For more information, see this post: https://shuangrimu.com/posts/elm-extensible-unions.html#what-are-extensible-union-types

26 Likes

That is an impressive blog post and well thought-out. Thank you!

I don’t have much more to say but to agree that extensible union types in this (or a similar) fashion would be an interesting addition to Elm.

2 Likes

Because other users have commented on this being withdrawn, this post was not actually withdrawn, I just deleted what used to be part 2 of the post. The original post was too long to fit in a single post.

I moved the entirety of this post to a blog post because of its length and deleted the second part (this overall thread will remain though!).

1 Like

This puts into words the discomfort with “Msg” types that I’ve run up against time and time again as I take on increasingly complex Elm apps. I think what really resonates with me is how you address the problem we have trying to follow the instructions in the Structure section of the Elm guide. I love those guidelines, but have found them impossible to follow with our current tools, and I think you’ve hit it on the head with extensible union types filling that gap.

4 Likes

This is the most well-thought through language proposal I’ve read on this forum and the one that got my mind whirling about how it might disrupt my code. I would be curious to see code for a small (or large!) spa showing the advantages of this approach when handling message hierarchies.

Just as extensible records are something that beginner Elm programmers don’t need to immediately learn, extensible union types can similarly be introduced when the problems they solve become apparent.

In the article, its suggested that the case statement syntax doesn’t need to change (with “case#” being used for clarity in the examples). I’d be curious to know if extensible union types as a whole could be introduced without new syntax in, say, Elm 0.20 if we accept that all 0.19.x code would be broken (but perhaps easily updated with an automated tool).

1 Like

At a bare minimum, thanks for the writeup. I find it very easy to understand. This might be the first new feature I could see being worth adding to Elm, particularly because of the purposeful limits you mentioned to keep it inline with extensible records.

Wow. This is the best Elm feature suggestion I have ever seen. How will this affect the elm architecture?

Hi,

As others have commented its a solid idea and I appreciate the in-depth write up you have given it.

I notice how ‘clean’ Elm syntax is, and that is something I find very appealing about the language. Since custom operators were removed the operator table for the language is very small, and almost entirely obvious. A beginner coming from another FP language might take 5 minutes to review what |>, <| and >>, << do, and also look over the difference between / an //.

None of the existing keywords or user defined variables or types have any symbols in them - so I find that the ‘@’ and ‘#’ jar a little and look like the more accademic FP languages. Elm uses alpha characters only in its built in keywords, and symbols only in other fragments of syntactical glue that you use in your code; such as -> or [] and so on. So I though I would grab some of your example code and see if we can make it look more… elmish.

What do you think of this?

-- Hypothetical extensible union type, again note that this is a type alias
type alias User1 = 
    [ RegularUser Int 
    , AdminUser String
    ]

-- Hypothetical extensible union type actually using the extensible portion,
-- note the type variable
type alias User2 a =
    [ a |
    , RegularUser Int
    , AdminUser String
    ]

toInt : [ RegularUser Int, AdminUser String ] -> Int
toInt user = typecase user of
    RegularUser x -> x

    AdminUser str -> length str
8 Likes

Just to join in on the joyful bike shedding, you could probably with out much ambiguity simply use the existing syntax:


type alias User1
    = RegularUser Int
    | AdminUser String

type alias User2 a 
    = a
    | RegularUser Int
    | AdminUser String

toInt : ( RegularUser Int | AdminUser String ) -> Int
toInt user =
    case user of
        RegularUser x ->
            x

        AdminUser str ->
            String.length str
15 Likes

That’s a really well thought and well presented language extension! I’m also not sure about the syntax as others pointed out but the general idea would be very much welcomed in my opinion.

I agree with the other posters that this looks like a cool feature to have in Elm. One thing I don’t understand though is why is the type signature looks like this (borrowing @gampleman’s syntax)
toInt : ( RegularUser Int | AdminUser String ) -> Int
and not like this
toInt : ( a | RegularUser Int | AdminUser String ) -> Int

If there’s no type variable, won’t there be problems in cases such as this?

foo : ( A Int | B String ) -> ( A Int | B String ) -> Bool
foo first second =
    -- Without a type variable, we could be doing equality on values that aren't of the same type.
    first == second

foo2 : ( a | A Int | B String ) -> ( a | A Int | B String ) -> Bool
foo2 first second =
    -- Now we require that first and second are of the same type.
    first == second

Without a type variable, we could be doing equality on values that aren’t of the same type.

Without a type variable, they would be the same type. Remember this is exactly like records:


fooRecord: { a : Int, b : String } -> { a : Int, b : String } -> Bool
fooRecord first second = 
      --  this is fine, since these are literally the same type
      first == second

fooRecord2 : { a | a : Int, b : String } -> { a | a : Int, b : String } -> Bool
fooRecord2 first second =
    -- This is also fine, as the type variable means that both are the same type
    first == second

In the same manner, both of these work:

foo : ( A Int | B String ) -> ( A Int | B String ) -> Bool
foo first second =
    -- This means a type with exactly 2 cases: `A` and `B`
    first == second

foo2 : ( a | A Int | B String ) -> ( a | A Int | B String ) -> Bool
foo2 first second =
    -- This means two values of the same type that have at least 2 cases: `A` and `B`.
    first == second
4 Likes

As others have said, a really well thought out and welcome language proposal.

I understand that the @/# syntax was a temporary thing just to make it crystal clear when you were showing extensible union types and when you were showing traditional non-extensible union types.

A couple of thoughts:

Case expressions

A problem with case expressions is that you frequently wish to define a value that is used in some of the cases, but not all. The standard way around this is to create a function in a ‘let’ above the ‘case’ expression, or maybe just the entire value if you don’t care about it being needlessly evaluated. However, in large case expressions this is really sub-optimal. I’m not concerned here with performance, I’m concerned with code organisation, I want to keep definitions close to where they are used if possible, and limit their scope if that’s also possible.
It seems to me that extensible union types could offer some optimism here. Because it would be possible to type part of a case expression. I’m not exactly sure how this would work, and it would likely need some additional changes to the language to take advantage, roughly it might look like this:

case e of
    @One ->
       something here that does not depend on x
    e2 ->
        let
            x =
                some expression
        in
        case e2 of
            @Two ->
               x + 1
            @Three ->
               x * 1

This works a bit better than the current way because the nested case over e2 does not need to have a needless case to take care of @One which can never happen. The type of the second case expression is (@Two | @Three) -> Int and that would ‘merge’ with the type of the first branch of the outer case expression which would be @One -> Int to give the type of the outer case expression to be @One | @Two | @Three -> Int

Recursive types

Consider recursive types carefully. I had thought this would be a really nice way to implement non-empty lists, something like:

    type List a = @Empty | @Cons a (List a)
    type NonEmptyList a = @Cons a (List a)

Then you can implement head without using Maybe as:

head : NonEmptyList a -> a
head l =
    case l of
        @Cons a _ -> a

A potential use of this is the abstract syntax of some DSL which you can simplify, so imagine you have the syntax into which you parse something like:

type Expr 
  = @Number Int 
  | @Name String 
  | @Add Expr Expr 
  | @Mul Expr Expr 
  | @Sub Expr Expr

Then a simpler syntax into which you munge the parsed syntax. This is sometimes used in compilers as a ‘desugaring’ phase.
Here the simplified syntax is the same but doesn’t have the @Sub expression:

type SimpExpr a = 
      = a 
      | @Number Int 
      | @Name String 
      | @Add (SimpleExpr a) (SimpleExpr a) 
      | @Mul (SimpleExpr a) (SimpleExpr a)

Then you can define a simplify function as:

simplify : Expr -> SimpleExpr ()
simplify expr =
    case expr of
        @Number i ->
            @Number i
        @Add left right ->
            @Add (simplify left) (simplify right)
        @Mul left right ->
            @Mul (simplify left) (simplify right)
        @Sub left right ->
            -- This is the important case that actually gets rid of the @Sub constructor
            @Add (simplify left) (@Mul (@Number -1) (simplify right))

So the nice thing about this is that suppose you write a function that extracts all names, it works over both the Expr type and the SimpleExpr type.

Now as I said, you have to consider this carefully, I’m not sure any of this actually works. Note that to even get it to half work I had to essentially copy the definition you cannot define lists as something like:

type NonEmptyList a elem = a | @Cons elem (a | NonEmptyList elem)
type List elem = @Empty | NonEmptyList (@Empty) elem

Though I’m struggling to remember why not, other than it’s certainly less intuitive than the traditional definition.

Bike shedding on syntax

I liked @gampleman’s “joyful bike shedding” you can safely ignore this section :wink:

I think it would be most elegant if union types were treated the same as record types, such that you do not define a new ‘type’ but just a ‘type alias’. In which case there would be no need for just ‘type’ and you could therefore remove the need for the ‘alias’ keyword. This is basically what @gampleman was suggesting. This would have some issues:

You would need a way to distinguish between simply defining an alias, such as the current type alias Id = String (or type alias IntList = List Int)and defining the type with a single constructor type Id = String. Personally think that the way around this is to have unions with a surrounding parentheses, say ‘()’ or ‘[]’ analogous to ‘{}’ for records. Obviously this would be a backwards breaking change. But it feels to me as if it would be the most elegant change. All type ‘definitions’ would merely be aliases for types that could be inferred.

I realise this is a huge breaking change.

Also I realise this is sort of mostly separate to the question of whether extensible union types are desirable.

2 Likes

Having this syntax would not work:

foo2 : ( a | A Int | B String ) -> ( a | A Int | B String ) -> Bool

What if you have this type? There would be no way to distinguish between the Thing type and the Thing variant:

type Thing
    = Thing
    | OtherThing

I think this syntax would be best because it’s clean and it’s consistent with extensible record syntax:

foo2 : [ a | A Int, B String ] -> [ a | A Int, B String ] -> Bool

I’m extremely excited by the reception this idea has gotten from everyone! Quite a few responses to unpack here.

Given this interest, I’ll expand out the post some more over the coming weeks (timing is a bit tight at the moment so I don’t know if I can do it all now). There’s some stuff I wanted to get to but haven’t yet.

More things to come in the proposal

I was definitely starting to handwave exactly how to make different tradeoffs than OCaml has made towards the end of the post (I was rushing to finish and wasn’t sure if people would read the whole post to begin with). Although disallowing adding and removing variants is part of the limitation we’re imposing on extensible unions (by following extensible records here), that’s only a (in retrospect minor) portion of what makes extensible unions more “foot gun-y” in OCaml than I think is appropriate for Elm. There’s some more cleverness in how OCaml does things that I think is worth pointing out if only to show the full limitations of extensible unions as I’ve presented them. This also helps flesh out some more the purposeful limits @wolfadex is referring to (and more holistically look at the tradeoffs).

I also hope to go through other languages (Typescript and Purescript in particular) that have versions of extensible unions (or at least structural unions, i.e. the structural part without the extensible part) and what tradeoffs this proposal is making relative to those versions. I chose OCaml initially because it is the closest in spirit to this proposal, but I think it’s worth outlining what this proposal does differently from Typescript and Purescript and what we gain or lose based on that.

There’s also another point that I glossed over in the current proposal, which relates to typeInOutput.

typeInOutput : a or @SomeTag Int -> a or @SomeTag Int
typeInOutput x = x

Is there a way of writing a non-trivial function that keeps the same type signature, but does something other than just be the identity function? There is, but it might be non-obvious. The case# notation I’ve presented so far can’t handle the a case and the obvious extensions to make it possible (either by ignoring it with _ -> () or making it a passthrough with a -> a) make extensible unions overly powerful in comparison to extensible records. Again thinking about what Elm already has for extensible records is helpful here (there is record update notation, which is quite limited compared to record creation notation, maybe there’s an analog here for union update notation vs a full case statement). I have thoughts there that I haven’t written out yet.

Miscellaneous replies

@creminology yes I’d love to go through https://github.com/rtfeldman/elm-spa-example and change everything over and show the comparison. I’m not sure if I’ll get the time to do that this week, but it’s definitely on my radar. I’d also love if someone else tried to do the same thing with what they understand of this proposal. It’d be great to compare notes (and they might finish before I do)!

@allanderek RE case expressions: yes! The specifics of whether your plan will work as is will depend on the implementation of the type checking, but even if that specific syntax doesn’t work, you can annotate e2 with the remaining variants left (perhaps a type alias instead of literals for conciseness) and then go from there. RE recursive types, so as things stand right now, because type aliases cannot be recursive, you need a normal (nominal) union type somewhere to break the recursion (as in the “Less obvious, but nicer” example in https://github.com/elm/compiler/blob/9d97114702bf6846cab622a2203f60c2d4ebedf2/hints/recursive-alias.md). Your NonEmptyList and Expr examples are still really interesting to talk about, so I’ll address them in the last part of this reply where I talk about potentially even bigger changes (since this would involve changes to the semantics of type alias).

@DullBananas RE changes to the Elm architecture, if you’re talking about the Elm architecture (i.e. model, update, view) there is no change and in fact a lot of this proposal is built around keeping that front and center. If you’re talking about smaller scale modularization of that architecture, extensible unions hopefully obviate the need to use OutMsg or Transformer patterns and makes NoMap more type safe.

@MartinS, yep @gampleman is exactly right. Although his example does reveal I made a small mistake when I said “extensible union types never appear in the input of a function signature unless that exact type shows up in the output.” Elm’s magical == operator shows up as an exception again! (I swear so much of Elm needs to suffixed with an asterisk saying “except if you use == in weird ways”). If you ignore == though then that statement still stands.

Syntax

@rupert and @gampleman 's syntax choices look way nicer than the clunky, rough syntax I’ve presented in the article and I’m really happy to see them. And luckily everyone recognizes that the syntax is not the most important part of this. For the rest of this syntax section, I’ve experimented a bit with some alternative syntaxes as well (replacing @ with ` to make it less visually noisy, at the cost of making it a bit harder to write in Markdown as well as replacing or with + to have a single character), but I don’t really care about those concrete choices.

There are, however, two points about syntax that I think go beyond bike-shedding.

First, in a type alias declaration, I would recommend against a syntax that uses brackets, square or otherwise (at use sites in function signatures brackets are fine). I initially thought of going with bracket-ful syntax because it does further highlight the analogy with extensible records, but it bakes in a choice as to whether the type alias can be used in an extensible context. Compare

type alias T
    = `Tag0 Int
    + `Tag1 Int
    + `Tag2 Int

-- I can decide after the fact to make this extensible
f : () -> a + T
f _ = `Tag1 0

-- I can also use this as part of building up another name
g : Bool -> T + `Tag3 Int
g x = case x of
    True -> `Tag3 0
    False -> `Tag1 0

-- I can also build a new type alias from it
type alias T0 = T + `Tag3 Int

On the other hand with bracket syntax

type alias T =
    [ Tag0 Int
    , Tag1 Int
    , Tag2 Int
    ]

-- Seems syntactically weird since this suggests we have nested brackets
f : [ a | T ]
f = Tag1 0

-- Same with new tags
type alias T0 = [ T, Tag3 Int ]

This was part of what made the Msg example in my post so easy to construct from other parts and goes a long way towards realizing the dream of “just pull out a chunk of the message type and give it a name.”

This is also an issue that affects extensible record syntax (I’ve definitely written type alias R = { f0 : Int, f1 : Int } and wished I could reuse that in type alias R0 a = { a | f0 : Int, f1 : Int, f3 : Int }), but there’s a compelling case there to use syntax familiar to programmers who’ve programmed in class-based languages.

Second, if normal union types remain (so that extensible unions are strictly an additive change), I think some amount of redundancy in syntactic differentiation is important for generating friendly error messages and for making sure new users don’t get tripped up by accidentally “opting into” extensible unions inadvertently. If they don’t remain… well see the last part of this post.

If we leave aside function signatures, it is true that simply changing a type declaration to a type alias declaration is enough to completely disambiguate an extensible union from a normal union type, but it can result in weird errors and a confusing experience for newcomers (if they accidentally type an “alias” now all sorts of potentially weird things can happen). Some redundancy can safeguard against this.

As @allanderek noted, I went a bit overboard on differentiation redundancy for the sake of clarity in a proposal. I don’t think you need quite as much redundancy as I’ve outlined, but I do think you need some.

I chose 4 points of syntactic differentiation.

  1. Differentiating a normal variant tag vs an extensible variant tag (the use of @).
  2. Differentiating a normal union type declaration vs naming of an extensible variant type (type vs type alias).
  3. A different syntax for a case statement (case#).
  4. Differentiating the separator among normal variant tags vs extensible variant tags (| vs or)

There are other things you can choose to differentiate on (e.g. you can choose to change the -> in a case statement, perhaps to something like =>), but those four I think are the most likely candidates for changes.

This means at a declaration site, there are three pieces (therefore two redundant pieces) of syntax that disambiguate between a normal union type and a structural union type.

-- Three points of differentiation
-- type alias
-- @
-- or
type alias T = @Tag0 Int or @Tag1 Int
type T = Tag0 Int | Tag1 Int

At use sites in values there were two pieces of disambiguating syntax (both redundant because we can look up the tag and realize it’s a structural union).

-- Two points of differentiation
-- case#
-- @
-- Also rupert, I definitely agree case# looks ugly
-- typecase is a great choice! One small wrinkle is that other languages
-- already have the concept of "typecase" that refers to runtime reflection
-- on types, which this is not, but it is similar, so maybe it's worth
-- co-opting it? Or maybe the confusion is not worth it?
f x = case# x of
    @Tag0 a -> ...
    @Tag1 a -> ...

f x = case x of
    Tag0 a -> ...
    Tag1 a -> ...

To provide helpful error messages (that are localized to exactly where things go wrong) and to prevent users from inadvertently using extensible unions when they meant to use normal union types, I would recommend at least one point of redundancy in both the declaration site and use site. That means at least two points of differentiation at the declaration site and at least one point of differentiation at the use site. You might want even more, but then that starts getting more into aesthetics.

The second point about keeping normal union types then brings me to the next block of stuff…

Do we want to completely replace normal union types with extensible unions?

As @allanderek points out this is an even bigger change. I waffle back and forth between whether I’m comfortable advocating for such a big change. (EDIT: Today I happen to be quite reluctant)

Nonetheless, just to get people’s minds moving, assuming we keep one way of keeping types “opaque” (that is where type-checking is entirely name-based, not structural, allowing us to hide an implementation of a type), we can do everything we currently do in Elm with entirely extensible/structural types for both union and product types.

-- We might decide to not export UserIdCtor
-- Borrowing syntax from upcoming Scala 3.0 here
-- https://dotty.epfl.ch/docs/reference/other-new-features/opaques.html
-- This lets us keep nominal types when we want them
-- This is also known as "newtype" in Haskell
-- The compiler doesn't care about the right-hand side of = when type
-- checking, after confirming that the initial declaration is well-formed, 
-- just like in Elm right now with normal types
opaque type UserId = UserIdCtor Int

type ExtensibleUnionByDefault
    = Tag0 Int
    | Tag1 String
    -- We might decide to force all variant tags to only have one argument
    -- so e.g. Product Int String is impossible but instead must be a record
    | ProductTag { field0 : Int, field1 : String }

-- No more normal product types or tuples, everything product-based is
-- a record
update : Msg -> Model -> { model : Model, cmd : Cmd Msg }
update = ...

There is precedent for this. All types in Typescript are structural (again Typescript lacks the extensible part and uses subtyping instead), although Typescript feels the pain of not having an opaque type, which makes stuff like UserId harder than it needs to be.

By doing so, it also starts making sense to talk about recursive structural types (again a feature which Typescript just recently got), since we no longer have type alias or alternatively everything that is not an opaque type is a type alias. This is not the only way to talk about recursive extensible unions (you could e.g. extend the semantics of type alias), but it’s a natural entry point.

So to get back to @allanderek’s examples of NonEmptyList and Expr, in the case of the former, extensible unions don’t change NonEmptyLists API too much. You still will have a separate set of functions for List and NonEmptyList.

Some functions must be different because they intrinsically must change the type. For example, filter for a NonEmptyList can result in an empty List and so must have a different output type from its input type, so you need separate filter functions for NonEmptyList and List.

Other functions must be different because of limitations in this proposal on extensible unions (that another implementation of extensible unions could get rid of). For example you need separate map functions and can’t just use an extensible union on @Cons because just as you can’t update the type of an extensible record, only a concrete record, you can’t update the type of an extensible union, only a concrete structural union (i.e. an extensible union without the type variable).

The main lift you get from using an extensible union for NonEmptyList is that you save a bit of code when implementing those two parallel APIs. Currently in Elm, you can reuse the code in List to help implement functions for NonEmptyList whereas with extensible unions, you could reverse the directionality and use NonEmptyList to help implement List. This saves a small amount of code, but not much. So you get a little bit of benefit from using this proposal’s version of extensible unions for NonEmptyList but not a lot.

For Expr, extensible unions in general let you write “nanopass”-style compilers, where you can gradually build a compiler with tons of different Exprs that all gradually add or remove bits of syntax with each “pass” as opposed to a few (or none) very different intermediate languages. This is a really fun and maintainable way of writing compilers since you have so much introspective capabilities as to what has changed from one pass to the next.

The current proposal has some limitations that makes writing “nanopass” compilers harder (again one big one is the inability to add or remove variants, another one is the ability to only “add” two "type alias"es together, rather than the ability to “subtract” one from the other), but it’s still potentially doable, if a bit clunky with this current proposal, in a way that would be an enormous pain to maintain in Elm currently.

I personally think that those limitations should stay in place to keep the same tradeoffs in place that Elm has already made with extensible records.

4 Likes

Just to note that the ‘type’ of a case expression is already overloaded, you can case on booleans, integers, strings, lists, tuples etc. In addition, such things can be nested, so if you decided to differentiate the case expression syntax for extensible and traditional custom types, what do you do when you have one nested within the other, such as a Maybe (NonEmptyList Int) or a NonEmptyList (Maybe Int). The obvious thing to do would be to have the case expression syntax agree with the outer most type, but if that’s the case I don’t see what you’re gaining by having multiple case expression syntaxes.

1 Like

This is an exciting proposal! Thanks for taking the time to write out your thoughts @yosteden, and thanks to all the people joining in on the brainstorming. I look forward to seeing how this pans out over the coming days/weeks/months.

1 Like

I’m late to the party, and I haven’t read everything super carefully, but I think there is one step missing going from union types to extensible (anonymous) union types.

(Non-extensible) records are a way of anonymously declaring (i.e., without giving it a top-level name) a product type. Extensible records enable you to “pattern match” on a record to implicitely “extract” the fields you’re interested in.

We do not yet have a syntax to anonymously declare a union/sum type, right? I think that’s where the syntax proposals come from. After that, the next step is extensible sum types that allow you to “extract” just the variants from a bigger sum type.

In the blog, the issue with the output error types boils down to the anonymous sum type that we lack, I think. If we had a way to declare the variants in the function type declaration without creating a top-level name for each combination, I think that would solve the issue with over-broad sum types presented. Then we could also think about activating exhaustivenesss checking on the function output.

After that it might be convenient to allow picking out just the variants a function is interested in via an extensibility mechanism as presented here. I’m not sure, because I rarely use extensible records anyway, but I’ve not worked on bigger projects in Elm. So I can see that there is appeal in reducing conversion boilerplate.

I hope it helps to explicitely split this problem into the anonymous sum types and extensible (anonymous) sum types subproblems. :wink:

2 Likes

@allanderek: Well the default thing I was thinking of was just disallowing nested case matching for structural union types (this is effectively what e.g. Kotlin does when it comes to its equivalent syntax for non-structural union types), because part of the point of structural unions is to encourage flatter hierarchies.

That being said, I’ve been playing around with the syntax some more and it feels oh so nice to have a single unified case statement (mainly because of the nesting). But yeah, it’s good to explicitly point out the conflict between nested matching and a separate case statement.

@Y0hy0h I would probably use the term “structural” as opposed to just anonymous, because there’s an element of order independence as well, meaning that the compiler really cares about the structure of the type rather than just its name. For example, tuples can be thought of as anonymous product types, but they aren’t structural in the same way as records are because the order of the tuple fields matters.

However, in Elm, structural types and extensibility are very closely linked because of type inference and “global typeability” (i.e. every Elm expression has a type). Without extensibility, you are required to have top-level type signatures for everything that uses a structural type, i.e. you lose inference. To use records as an example:

f record = record.someField

Without extensible records, the type signature of that function is ambiguous. This is even more accentuated in Elm because .someField is a valid function in and of itself that can no longer be given a valid type signature without extensible records.

-- Note that we use a type signature here instead of relying on inference
-- I put .field0 in parentheses to emphasize that it is an Elm expression
-- What is the type of (.field0)? Can we assign a single type to it?
f : { field0 : Int } -> Int
f record = (.field0) record

Note this is a big change in Elm code! Previously every Elm expression could be given a valid type that holds globally, i.e. you could refactor code by simply lifting it out from where it’s being used, but this is no longer true.

(If you want a language-agnostic, internet-searchable term, the specific thing that Elm currently has is “principal” types, i.e. every Elm expression has a single “most general type” which we lose with structural types without extensibility.)

I’ve been using records here to keep it familiar, but all this holds for structural union types.

-- How should Elm infer the type of value?
-- More problematically, what is the type of @SomeTag?
-- Does it change every time it's being used?
-- Note that this is infectious, this means any code using a tag must
-- have a type signature and cannot be inferred unless we annotate its type
value = @SomeTag 1

Again, without extensibility, note that this means any code anywhere that uses a structural type (record or union) requires a type annotation, i.e. you lose global type inference.

Now you can recover “global typeability,” i.e. every expression can be given a type that holds when you pull it out into an independent, top-level expression, by taking a page out of object-oriented languages and bringing in subtyping.

-- Subtyping for records
-- Ignore unused fields
-- Note that we go from "narrow" to "wide"
record : { field0 : Int }
record = { field0 = 1, field1 = "unused" }

-- Subtyping for structural unions
-- Note that we go from "wide" to "narrow"
-- In isolation this looks identical to extensible unions
x : @Tag0 Int or @Tag1 String
x = @Tag0 1

-- But the following doesn't work with extensible unions,
-- but does with subtyping
y0 : @Tag0 Int
y0 = @Tag0 1

y1 : @Tag0 Int or @Tag1 String
y1 = y0 -- This would be a type error with extensible unions

-- And when you start trying to combine subtyping in larger
-- expressions things get wonky
record0 : { field0 : @Tag0 Int }
record0 = { field0 = @Tag0 0 }

-- This seems like a reasonable alternative type
record1 : { field0 : @Tag0 Int or @Tag1 String }
record1 = record0

-- But these two functions don't work
f0 : { field0 : @Tag0 Int } -> Int
f0 record = case record.field0 of
    @Tag0 x -> x

-- Whoops type error!
-- What happens if I pass a String?
f1 : { field0 : @Tag0 Int or @Tag1 String } -> Int
f1 = f0

-- Potentially confusing:
-- This works and is accepted by the type checker
f0 record0
-- This is a type error
f0 record1

-- The fact that record1's type is compatible with record0 but f1's
-- type is not compatible with f0's is known as type variance and
-- is not a concept Elm needs to worry about in its current form,
-- but would need to worry about with subtyping

But now things are starting to become really different from the way that Elm currently works and global type inference remains tricky (it’s no accident languages that rely heavily on subtyping tend to only have local type inference).

Now of course you can do all this just for unions and not records (I was mainly using records as an example to make the trade-offs clearer), but now you have two different typing regimes for what are two complementary parts of the language.

So in short, given the way Elm currently works, you really do need extensibility as soon as you have structuralness. It’s not really just something to reduce conversion boilerplate. To get rid of extensibility means making a lot of other changes to the way Elm works.

Also, I’m not sure what you meant by “After that it might be convenient to allow picking out just the variants a function is interested in via an extensibility mechanism as presented here.” but keep in mind that extensible unions usually show up in the output of a function rather than the input to a function, so the function is generally not “picking” variants so to speak. Perhaps you meant it in the sense of what variants the function is allowed to output?

EDIT: Ah I see, your intuition is focused on how structural unions allow you to pick arbitrary subsets of a greater type when pattern matching, i.e. focusing on the input of a function. This is indeed true, but note that that is already true with ordinary unions in Elm (case the variants you about and then use _ -> for the rest). It may be useful to add to your intuition the additional factor that this proposal is allowing you to arbitrarily combine outputs of functions together (and this is why extensibility matters). Again, in the context of functions, extensible unions are usually dual to or “opposite” to records (extensible records focus on function inputs, extensible unions focus on function outputs).

Is that maybe closer to what you’re getting at? (I’m rushing here so apologies for some lack of clarity)

3 Likes