Idea: Extensible Union types and benefits they bring for real world Elm code

Earlier I suggested this is not exactly like extensible records for two reasons:

  1. The type of the value @X "hi" would be the open type [ a | @X String ].
  2. There needs to be a special rule for case expressions.

Both of these differences appear in your write-up, which makes sense. The fact that the general mechanism is the same does not account for practical differences like these when it comes to implementation and error messages. That’s the point I was trying to make above.

Example

The fact that @X "hi" has an open type has particularly big consequences with error messages. Take this function:

toOutput input =
  case input of
    @A a -> @Quick a
    @B b -> @Brown b
    @C c -> @Fox c
    @D d -> @Jump d
    @E e -> @Lazy e
    @F f -> @Dog f
    @G g -> @How g
    @H h -> @Much h
    @I i -> @Wood i
    @J j -> @Could j
    @K k -> @Wood k
    @L l -> @Chuck l
    @M m -> @Chuck m
    @N n -> @If n
    @O o -> @Wood o
    @P p -> @Coud p
    @Q q -> @Chuck q
    @R r -> @Wood r
    @S s -> @Could s
    @T t -> @Chuck t
    @U u -> @Wood u
    @V v -> @Could v
    @W w -> @Check w
    @X x -> @Wood x
    @Y y -> @Brown y
    @Z z -> @Fox z

Where is the typo? Is there one typo? Zero? Multiple?

And are there any type errors here? How would you even begin to figure that out?

With a closed ADT, the compiler can underline the exact constructor that has a typo or type error, with or without a type annotation.
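
To make that concrete, here is a minimal sketch of the closed version. The Input and Output types are hypothetical, shortened stand-ins for the alphabet example above, and the module name is made up for illustration.

```elm
module Example exposing (Input(..), Output(..), toOutput)

-- Hypothetical, shortened versions of the types discussed in this thread.


type Input
    = A String
    | P String


type Output
    = Quick String
    | Could String


toOutput : Input -> Output
toOutput input =
    case input of
        A a ->
            Quick a

        P p ->
            -- Misspelling this as `Coud p` is reported as a naming error
            -- on `Coud` itself, because no variant with that name is
            -- declared. That holds with or without the annotation above.
            Could p
```

Because every constructor must be declared up front, a misspelled name cannot silently become a new member of the output type.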

It is easy to say “this is no problem if people just add the type annotation” but this is not fully true:

  1. You can only give an error for the whole case branch, not for the specific constructor. So if a branch contains let x = ... in x, instead of underlining Coud directly, the compiler is going to say “there is something wrong with the type of this let or this let body.” If the branch is 10 or 20 lines long, that is significantly less specific, even in the best case.

  2. Say you intend for @Whatever x to have type [ a | @Whatever Int ], but in a large case branch someone ends up using the constructor with a String value. With the current design, you get the error message directly under Whatever x, but with the proposed design you can only say “this branch does not match the type annotation”, which again flags 10 or 20 line chunks on large case branches.

  3. My experience very strongly suggests that if a program can be written, it will be written. People will be looking at 400 and 600 line case expressions hunting for a typo or type error with associated data. At that point, it is fair to say “the error messages are not very good” and there is no real way to get the quality back besides not using the feature.

In the past, we took out the ability to change the type of record fields in the record update syntax specifically because of problems (1) and (2) where you couldn’t get good specificity, particularly with unannotated cases which are not uncommon in practice. Even after restricting the design of records, it’s still hard to underline the specific field name that has a typo.
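
As a small illustration of that record restriction, here is a sketch using a hypothetical User alias (the names are invented for this example):

```elm
module RecordUpdate exposing (User, rename)

-- Hypothetical record type, used only for illustration.


type alias User =
    { name : String
    , age : Int
    }


rename : String -> User -> User
rename newName user =
    -- The update has to keep `name` as a String. Writing
    -- `{ user | name = 42 }` here is rejected as a type mismatch,
    -- because record update cannot change the type of a field.
    { user | name = newName }
```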

I hope this establishes the error message quality issues clearly.

Tradeoffs

Say we have this BEFORE and AFTER code, where we are getting the best case error messages for both open and closed union types:

-- BEFORE

type Output
    = Quick String
    | Brown String
    | Fox String
    | Jump String
    | Lazy String
    | Dog String
    | How String
    | Much String
    | Wood String
    | Could String
    | Chuck String
    | If String

toOutput : Input -> Output
toOutput input =
  ...


-- AFTER

type alias Output
    = @Quick String
    or @Brown String
    or @Fox String
    or @Jump String
    or @Lazy String
    or @Dog String
    or @How String
    or @Much String
    or @Wood String
    or @Could String
    or @Chuck String
    or @If String

toOutput : Input -> Output
toOutput input =
  ...

To my eye, the AFTER looks harder to understand and comes with a bunch of downsides that will come up a lot in practice. Conservatively, let’s say 50% of users don’t think about all the ways the choice between BEFORE and AFTER affects error messages, and many are not writing type annotations, especially beginners. The result is that error messages are worse in practice, and there is nothing that really can be done about it.

The best path to giving those users good error messages is to strongly recommend against using this feature at all. Furthermore, beginners would be seeing person A recommending this feature highly and person B recommending against it strongly. What should they do? Should they try both? Is this question important to making the website or game they set out to make? Do teams need to argue about it in their style guide on features to use or not? (Any team I’ve been on that uses C++ or Haskell has had a style guide banning particular features precisely because there are so many trade-offs with extensions, macros, features, etc. The reason these style guides are so common is that there is a real tension between “make working code that others can read and modify easily” and having lots of ways to express the same thing.)

So based on my understanding of the design, it seems like some aspects of error message quality can be addressed (e.g. maybe in the pattern part of case branches) but that there are still significant tradeoffs in error message quality in other areas.

Thoughts

I could be wrong about things here, but it feels like this kind of feature is a bit risky for Elm. I try to prioritize ease-of-learning and error message quality very highly. So while I am open to the idea that someone could figure out how to make open union types strong on those points, I would not be comfortable running that experiment in Elm with the information I have at this point.

It seems like a lot of cool things could be done if the core design of the language were “always use open union types” and there just weren’t closed unions except when you say [ X Int | Y String ]. That would also mean that @Foo could be written as Foo without clashing with another language feature. It’d look a lot cleaner, and it may end up leading towards a different style of typed functional programming that many people could be into. (Lots of people value flexibility and having many ways to do things! E.g. people who prefer Ruby over Python! So even if the error messages are never as good, many people have priorities where that is a worthwhile tradeoff.)

So it feels to me like something worth exploring independently to get a feeling for the full implications in a setting where the culture, best practices, libraries, etc. can all evolve in a coherent way, with flexibility prioritized a bit higher than other things.
