I’m extremely excited by the reception this idea has gotten from everyone! Quite a few responses to unpack here.
Given this interest, I’ll expand the post over the coming weeks (time is a bit tight at the moment, so I don’t know if I can do it all now). There are some things I wanted to get to but haven’t yet.
More things to come in the proposal
Towards the end of the post I was definitely starting to handwave about exactly how this proposal makes different tradeoffs from OCaml (I was rushing to finish and wasn’t sure people would read the whole post to begin with). Disallowing adding and removing variants is part of the limitations we’re imposing on extensible unions (by following extensible records here), but that’s only an (in retrospect minor) portion of what makes extensible unions more “foot gun-y” in OCaml than I think is appropriate for Elm. There’s some more cleverness in how OCaml does things that I think is worth pointing out, if only to show the full limitations of extensible unions as I’ve presented them. This also helps flesh out the purposeful limits @wolfadex is referring to (and lets us look at the tradeoffs more holistically).
I also hope to go through other languages (TypeScript and PureScript in particular) that have versions of extensible unions (or at least structural unions, i.e. the structural part without the extensible part) and what tradeoffs this proposal makes relative to those versions. I chose OCaml initially because it is the closest in spirit to this proposal, but I think it’s worth outlining what this proposal does differently from TypeScript and PureScript and what we gain or lose as a result.
There’s also another point that I glossed over in the current proposal, which relates to `typeInOutput`.
```elm
typeInOutput : a or @SomeTag Int -> a or @SomeTag Int
typeInOutput x = x
```
Is there a way of writing a non-trivial function that keeps the same type signature but does something other than act as the identity function? There is, but it might be non-obvious. The `case#` notation I’ve presented so far can’t handle the `a` case, and the obvious extensions that would make it possible (either ignoring it with `_ -> ()` or passing it through with `a -> a`) make extensible unions overly powerful in comparison to extensible records. Again, thinking about what Elm already has for extensible records is helpful here: record update notation is quite limited compared to record creation notation, so maybe there’s an analogous union update notation that is similarly limited compared to a full case statement. I have thoughts there that I haven’t written out yet.
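For contrast, here is roughly what that passthrough looks like in OCaml, whose polymorphic variants are the closest existing analog to this proposal. This is a sketch, not proposed Elm: `bump_some_tag` is a made-up name, and the `| other -> other` branch is exactly the kind of `a -> a` passthrough that I think makes extensible unions too powerful relative to extensible records.

```ocaml
(* Sketch: an OCaml stand-in for
   typeInOutput : a or @SomeTag Int -> a or @SomeTag Int
   The [as 'a] annotation ties the output type to the input type. *)
let bump_some_tag (x : [> `SomeTag of int ] as 'a) : 'a =
  match x with
  | `SomeTag n -> `SomeTag (n + 1)
  (* Passthrough for every other tag: the a -> a case. *)
  | other -> other
```

OCaml infers `([> `SomeTag of int ] as 'a) -> 'a`, so any extra tags in the input (say, `Other "hi"`) come back out untouched.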
Miscellaneous replies
@creminology yes, I’d love to go through https://github.com/rtfeldman/elm-spa-example, convert everything over, and show the comparison. I’m not sure I’ll get the time to do that this week, but it’s definitely on my radar. I’d also love it if someone else tried to do the same thing with what they understand of this proposal. It’d be great to compare notes (and they might finish before I do)!
@allanderek RE case expressions: yes! Whether your plan works as is will depend on how type checking is implemented, but even if that specific syntax doesn’t work, you can annotate `e2` with the remaining variants (perhaps using a `type alias` instead of literals for conciseness) and go from there. RE recursive types: as things stand, because type aliases cannot be recursive, you need a normal (nominal) union type somewhere to break the recursion (as in the “Less obvious, but nicer” example in https://github.com/elm/compiler/blob/9d97114702bf6846cab622a2203f60c2d4ebedf2/hints/recursive-alias.md). Your `NonEmptyList` and `Expr` examples are still really interesting to talk about, so I’ll address them in the last part of this reply, where I talk about potentially even bigger changes (since this would involve changing the semantics of `type alias`).
@DullBananas RE changes to the Elm architecture: if you mean the Elm architecture itself (i.e. model, update, view), there is no change, and in fact a lot of this proposal is built around keeping it front and center. If you mean smaller-scale modularization within that architecture, extensible unions hopefully obviate the need for the OutMsg or Transformer patterns and make NoMap more type safe.
@MartinS, yep, @gampleman is exactly right. His example does reveal that I made a small mistake when I said “extensible union types never appear in the input of a function signature unless that exact type shows up in the output.” Elm’s magical `==` operator shows up as an exception again! (I swear so much of Elm needs to be suffixed with an asterisk saying “except if you use `==` in weird ways.”) If you ignore `==`, though, that statement still stands.
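A quick OCaml illustration of the `==` exception (OCaml spells structural equality `=`; `is_tag1_zero` is a hypothetical name): comparing against a tag literal is enough to put an open variant in a function’s input without it appearing anywhere in the output.

```ocaml
(* Sketch: the input type is open ([> `Tag1 of int ]) but the output is
   just bool, because structural equality accepts values of any type. *)
let is_tag1_zero (x : [> `Tag1 of int ]) : bool = x = `Tag1 0
```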
Syntax
@rupert’s and @gampleman’s syntax choices look way nicer than the clunky, rough syntax I presented in the article, and I’m really happy to see them. Luckily, everyone recognizes that syntax is not the most important part of this. In some of the examples below I use alternative syntax I’ve been experimenting with (replacing `@` with `` ` `` to make it less visually noisy, at the cost of making it a bit harder to write in Markdown, and replacing `or` with `+` to get a single character), but I don’t really care about those concrete choices.
There are, however, two points about syntax that I think go beyond bike-shedding.
First, in a `type alias` declaration, I would recommend against a syntax that uses brackets, square or otherwise (at use sites in function signatures, brackets are fine). I initially thought of going with a bracket-ful syntax because it further highlights the analogy with extensible records, but it bakes in a choice as to whether the `type alias` can be used in an extensible context. Compare:
```elm
type alias T
    = `Tag0 Int
    + `Tag1 Int
    + `Tag2 Int

-- I can decide after the fact to make this extensible
f : () -> a + T
f _ = `Tag1 0

-- I can also use this as part of building up another name
g : Bool -> T + `Tag3 Int
g x =
    case x of
        True -> `Tag3 0
        False -> `Tag1 0

-- I can also build a new type alias from it
type alias T0 = T + `Tag3 Int
```
On the other hand, with bracket syntax:
```elm
type alias T =
    [ Tag0 Int
    , Tag1 Int
    , Tag2 Int
    ]

-- Seems syntactically weird since this suggests we have nested brackets
f : [ a | T ]
f = Tag1 0

-- Same with new tags
type alias T0 = [ T, Tag3 Int ]
```
This was part of what made the `Msg` example in my post so easy to construct from other parts, and it goes a long way towards realizing the dream of “just pull out a chunk of the message type and give it a name.”
This is also an issue that affects extensible record syntax (I’ve definitely written `type alias R = { f0 : Int, f1 : Int }` and wished I could reuse it in `type alias R0 a = { a | f0 : Int, f1 : Int, f3 : Int }`), but there’s a compelling case there for using syntax familiar to programmers coming from class-based languages.
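Setting syntax aside, the after-the-fact reuse in the first `T` example (opening `T` at a use site, extending it, and building `T0` from it) already exists in OCaml’s polymorphic variants, the reference point from the original post. A rough sketch, where the lowercase OCaml names are just my translations of the Elm ones (OCaml writes the open version as `[> t ]`):

```ocaml
(* Sketch: OCaml stand-ins for T, f, g and T0 from the Elm example. *)

(* A closed, named set of tags, like type alias T. *)
type t = [ `Tag0 of int | `Tag1 of int | `Tag2 of int ]

(* Opened after the fact at a use site, like f : () -> a + T. *)
let f () : [> t ] = `Tag1 0

(* Extended with another tag, like g : Bool -> T + `Tag3 Int. *)
let g (x : bool) : [ t | `Tag3 of int ] = if x then `Tag3 0 else `Tag1 0

(* A new name built from the old one, like type alias T0 = T + `Tag3 Int. *)
type t0 = [ t | `Tag3 of int ]
```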
Second, if normal union types remain (so that extensible unions are strictly an additive change), I think some redundancy in syntactic differentiation is important for generating friendly error messages and for making sure new users don’t get tripped up by inadvertently “opting into” extensible unions. If they don’t remain… well, see the last part of this post.
Leaving function signatures aside, it is true that simply changing a `type` declaration to a `type alias` declaration is enough to completely disambiguate an extensible union from a normal union type, but relying on that alone can result in weird errors and a confusing experience for newcomers (if they accidentally type `alias`, all sorts of potentially weird things can happen). Some redundancy can safeguard against this.
As @allanderek noted, I went a bit overboard on differentiation redundancy for the sake of clarity in a proposal. I don’t think you need quite as much redundancy as I’ve outlined, but I do think you need some.
I chose 4 points of syntactic differentiation:
- Differentiating a normal variant tag vs an extensible variant tag (the use of `@`).
- Differentiating a normal union type declaration vs the naming of an extensible variant type (`type` vs `type alias`).
- A different syntax for a case statement (`case#`).
- Differentiating the separator among normal variant tags vs extensible variant tags (`|` vs `or`).
There are other things you could choose to differentiate on (e.g. changing the `->` in a `case` statement, perhaps to something like `=>`), but I think those four are the most likely candidates.
This means at a declaration site, there are three pieces (therefore two redundant pieces) of syntax that disambiguate between a normal union type and a structural union type.
```elm
-- Three points of differentiation
-- type alias
-- @
-- or
type alias T = @Tag0 Int or @Tag1 Int

type T = Tag0 Int | Tag1 Int
```
At use sites in values, there are two pieces of disambiguating syntax (both redundant, because we can look up the tag and realize it’s a structural union):
```elm
-- Two points of differentiation
-- case#
-- @
-- Also rupert, I definitely agree case# looks ugly
-- typecase is a great choice! One small wrinkle is that other languages
-- already have the concept of "typecase" that refers to runtime reflection
-- on types, which this is not, but it is similar, so maybe it's worth
-- co-opting it? Or maybe the confusion is not worth it?
f x =
    case# x of
        @Tag0 a -> ...
        @Tag1 a -> ...

f x =
    case x of
        Tag0 a -> ...
        Tag1 a -> ...
```
To provide helpful error messages (localized to exactly where things go wrong) and to prevent users from inadvertently using extensible unions when they meant to use normal union types, I would recommend at least one point of redundancy at both the declaration site and the use site. Since a declaration needs one point just to be unambiguous, while a use site can already be disambiguated by looking up the tag, that means at least two points of differentiation at the declaration site and at least one at the use site. You might want even more, but then it starts getting more into aesthetics.
The question of whether normal union types remain at all brings me to the next topic…
Do we want to completely replace normal union types with extensible unions?
As @allanderek points out, this is an even bigger change. I waffle back and forth on whether I’m comfortable advocating for such a big change. (EDIT: Today I happen to be quite reluctant.)
Nonetheless, just to get people’s minds moving: assuming we keep one way of making types “opaque” (that is, where type checking is entirely name-based rather than structural, allowing us to hide a type’s implementation), we can do everything we currently do in Elm with entirely extensible/structural types for both union and product types.
```elm
-- We might decide to not export UserIdCtor
-- Borrowing syntax from upcoming Scala 3.0 here
-- https://dotty.epfl.ch/docs/reference/other-new-features/opaques.html
-- This lets us keep nominal types when we want them
-- This is also known as "newtype" in Haskell
-- The compiler doesn't care about the right-hand side of = when type
-- checking, after confirming that the initial declaration is well-formed,
-- just like in Elm right now with normal types
opaque type UserId = UserIdCtor Int

type ExtensibleUnionByDefault
    = Tag0 Int
    | Tag1 String
    -- We might decide to force all variant tags to only have one argument
    -- so e.g. Product Int String is impossible but instead must be a record
    | ProductTag { field0 : Int, field1 : String }

-- No more normal product types or tuples, everything product-based is
-- a record
update : Msg -> Model -> { model : Model, cmd : Cmd Msg }
update = ...
```
There is precedent for this. All types in TypeScript are structural (again, TypeScript lacks the extensible part and uses subtyping instead), although TypeScript feels the pain of not having an `opaque type`, which makes something like `UserId` harder than it needs to be.
Doing so also makes it sensible to talk about recursive structural types (again, a feature TypeScript only recently got), since we no longer have `type alias` (or, alternatively, everything that is not an `opaque type` is a `type alias`). This is not the only way to get recursive extensible unions (you could, e.g., extend the semantics of `type alias`), but it’s a natural entry point.
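As a rough sketch of how one nominal escape hatch and recursive structural types can coexist, here is an OCaml version: an abstract type hidden behind a module signature plays the role of `opaque type UserId`, and a recursive polymorphic-variant abbreviation plays the role of a recursive structural type. `UserId.make`, `UserId.to_int`, `tree`, and `size` are hypothetical names for illustration.

```ocaml
(* Sketch: the signature hides that UserId.t is an int, so outside this
   module a UserId.t can only be built and read through make and to_int. *)
module UserId : sig
  type t
  val make : int -> t
  val to_int : t -> int
end = struct
  type t = int
  let make n = n
  let to_int n = n
end

(* A recursive structural type written as a plain type abbreviation;
   OCaml allows the recursion because it goes through a polymorphic variant. *)
type tree = [ `Leaf | `Node of tree * UserId.t * tree ]

let rec size : tree -> int = function
  | `Leaf -> 0
  | `Node (l, _, r) -> size l + 1 + size r
```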
So, to get back to @allanderek’s examples of `NonEmptyList` and `Expr`: in the case of the former, extensible unions don’t change `NonEmptyList`’s API much. You will still have a separate set of functions for `List` and `NonEmptyList`.
Some functions must be different because they intrinsically change the type. For example, `filter` on a `NonEmptyList` can result in an empty `List`, so its output type must differ from its input type, and you need separate `filter` functions for `NonEmptyList` and `List`.
Other functions must be different because of the limitations this proposal places on extensible unions (limitations that another implementation of extensible unions could remove). For example, you need separate `map` functions and can’t just use an extensible union on `@Cons`, because just as you can’t update the type of an extensible record (only of a concrete record), you can’t update the type of an extensible union, only of a concrete structural union (i.e. an extensible union without the type variable).
The main lift you get from using an extensible union for `NonEmptyList` is that you save a bit of code when implementing those two parallel APIs. Currently in Elm, you can reuse the code in `List` to help implement functions for `NonEmptyList`, whereas with extensible unions you could reverse the direction and use `NonEmptyList` to help implement `List`. This saves a small amount of code, so you get a little benefit from using this proposal’s version of extensible unions for `NonEmptyList`, but not a lot.
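Here is a sketch of that reversed direction, again with OCaml as a stand-in and hypothetical names throughout: the non-empty functions are total, and the list functions delegate to them using OCaml’s `#non_empty` pattern, which narrows a list value down to its non-empty tags.

```ocaml
(* Sketch: OCaml stand-ins for List and NonEmptyList. *)
type 'a lst = [ `Nil | `Cons of 'a * 'a lst ]
type 'a non_empty = [ `Cons of 'a * 'a lst ]

(* Total on non-empty lists: no option needed. *)
let head (xs : 'a non_empty) : 'a =
  match xs with `Cons (x, _) -> x

let rec last (xs : 'a non_empty) : 'a =
  match xs with
  | `Cons (x, `Nil) -> x
  | `Cons (_, (#non_empty as rest)) -> last rest

(* The list versions reuse the non-empty versions: the #non_empty
   pattern narrows a list value to its non-empty tags. *)
let head_opt (xs : 'a lst) : 'a option =
  match xs with
  | `Nil -> None
  | #non_empty as ne -> Some (head ne)

let last_opt (xs : 'a lst) : 'a option =
  match xs with
  | `Nil -> None
  | #non_empty as ne -> Some (last ne)
```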
For `Expr`, extensible unions in general let you write “nanopass”-style compilers, where you gradually build a compiler out of many different `Expr` types, each adding or removing a bit of syntax with each “pass”, as opposed to a few (or no) very different intermediate languages. This is a really fun and maintainable way of writing compilers, since the types give you so much insight into what has changed from one pass to the next.
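Here is a tiny single-pass sketch of the nanopass idea in OCaml (the `surface`/`core` languages and the `desugar_neg` pass are invented for illustration). Note that it removes a variant between passes, which is precisely what this proposal does not allow.

```ocaml
(* Sketch: the surface language includes a Neg sugar form... *)
type surface = [ `Int of int | `Sub of surface * surface | `Neg of surface ]

(* ...and the core language is the same minus Neg. *)
type core = [ `Int of int | `Sub of core * core ]

(* One pass: rewrite Neg e into Sub (Int 0, e). The output type
   records that Neg is gone. *)
let rec desugar_neg : surface -> core = function
  | `Int n -> `Int n
  | `Sub (a, b) -> `Sub (desugar_neg a, desugar_neg b)
  | `Neg e -> `Sub (`Int 0, desugar_neg e)

(* Later passes only handle core; passing a Neg here is a type error. *)
let rec eval : core -> int = function
  | `Int n -> n
  | `Sub (a, b) -> eval a - eval b
```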
The current proposal has some limitations that make writing “nanopass” compilers harder (again, one big one is the inability to add or remove variants; another is that you can only “add” two `type alias`es together, not “subtract” one from the other), but it’s still potentially doable, if a bit clunky, with this proposal, in a way that would be an enormous pain to maintain in Elm today.
I personally think those limitations should stay in place, to keep the same tradeoffs Elm has already made with extensible records.