Idea: Extensible Union types and benefits they bring for real world Elm code

I like this idea. I think it only works though when being used to construct members of an ‘extensible’ union type, rather than to pattern match on them as inputs. In the expression below, the msg is always the return type of some function passed in, not the input type. Is the right way to describe this to say that this msg type is always in the contravariant position? Too long since I studied the theory…

{ a
| goToPage : Url -> msg
, toggleDarkTheme : msg
}

But I think you are on the right track as far as I am concerned. You can sort of do extensibility or sub-sets of union types in Elm in a number of ways. It may add some complexity, but I also think its an attractive property of Elm that it forces you to trade complexity for flexibility - do you really need the flexibility? if so, you must pay a price for it.

Also worth noting @gampleman’s proposed encoding in Elm too:

https://discourse.elm-lang.org/t/how-to-represent-many-subsets-of-an-enum/6088/18

2 Likes

Regarding use cases in current Elm code, I am reminded of the Scale module in elm-visualization. I think that module effectively uses records of functions to simulate some of the functionality proposed here. The problem there is of bounded polymorphism - some operations need to work on some types, but not on all and there is some advantage gained from doing this in a polymorphic way.

I suspect this proposal could make it much nicer: Currently you have function signatures like this:

nice : Int -> 
    Scale { a | nice : domain -> Int -> domain, domain : domain } -> 
    Scale { a | nice : domain -> Int -> domain, domain : domain }

Whereas it could be coded as:

nice : Int -> @LinearScale or @QuantizeScale -> @LinearScale or @QuantizeScale

which I think would be much nicer for the users of the package (as they either have to do a lot of squinting at the types to figure out what is a valid argument or check the docs).

This signature appears to say that you could provide a @LinearScale and the function might return an @QuantizeSclale. This seems under-constrained. I haven’t worked my way far enough through the proposal yet to know whether something like Int -> (@LinearScale -> @LinearScale) or (@QuantizeScale -> @QuantizeScale) is expressible, but that seems more like what you would want.

While this is true, that also is the case if the signature was

nice : Int -> Scale -> Scale

so this is still an improvement. The point is that it prevents you from passing say a @BandScale.

@Philipp_Krueger yep sorry for those typos! I was typing quickly to try to crank out the post before I needed to do something else. I’ve corrected them and hopefully the post makes a bit more sense.

@rupert A small nit, but if the type appears in the output of a function it’s in a covariant position. Regardless, variance almost never shows up as an issue you need to think about in Elm (at least I don’t think I’ve ever run into it), because there’s only one kind of subtyping: specializing a type variable. In fact @Philipp_Krueger’s proposal exactly encodes a case match statement (you think of it as the record argument being a template case match statement that you then fill in with an actual record to do the actual case match).

Indeed Elm is very close to allowing that proposal to completely encode union types (extensible or not) with product types (extensible or not) and higher-order functions. This is effectively what OO languages do with the visitor pattern to encode union types when they don’t have them.

The actual sharp edge with @Philipp_Krueger’s proposal I mentioned has to do with the “very close” part. In particular, Elm lacks the ability to “delay” specialization of a polymorphic signature. For example, the following code has no problem with identity and specializes it separately to an Int and String argument in its two invocations.

x : Int
x = identity 5 + (String.length (identity "hello"))

However, as soon as you try to extract identity as an argument to a function, Elm refuses to compile the expression. There is no way in Elm’s type system to type the resulting function.

-- Does not compile
f : (a -> a) -> Int
-- Not the correct type signature, but the closest we can get
f identityAsArg = identityAsArg 5 + (String.length (identityAsArg "hello"))

This is because identityAsArg must be specialized once for the entire body of the definition. Its specialization cannot be “postponed” to each call site. With explicit polymorphic foralls we can express the signature we want

-- Elm can't write this signature, but if it could then this code would
-- compile
f : (forall a. a -> a) -> Int
-- Previously we had forall a. ((a -> a) -> Int)
-- Notice the position of the parens in the type signature
f identityAsArg = identityAsArg 5 + (String.length (identityAsArg "hello"))

This same problem is behind what @Philipp_Krueger’s astutely notices as an issue with the fact that there are two type parameters instead of one.

-- If we had this feature these two encodings would be equivalent
-- Notice output does not appear as a type argument for ExtUnion0
type alias ExtUnion0 a = forall output.
    { a
    | case0 : Int -> output
    , case1 : String -> output
    } -> output

type alias ExtUnion1 a = a or @Case0 Int or @Case String

Their equivalence can be seen by looking at a case match

handleInt : Int -> MyType
handleInt = ...

handleString : String -> MyType
handleString = ...

value1 : MyType
value1 = case extUnion1 of
    @Case0 int -> handleInt int
    @Case1 string -> handleString string

value0 : MyType
value0 = extUnion0
    { case0 = handleInt
    , case1 = handleString
    }

Unfortunately, the additional type argument is more than a simple syntactic annoyance. With that additional type argument this encoding is equivalent 80% of the time, but has a huge gap in functionality for the remaining 20%.

type alias BadExtUnion a output =
    { a
    | case0 : Int -> output
    , case1 : String -> output
    } -> output

listOfValues0 : List (BadExtUnion {} output)
listOfValues0 = ...

matchToInts = { case0 = (\_ -> 0), case1 = (\_ -> 1) }

matchToStrings = { case0 = (\_ -> ""), case1 = (\_ -> "a") }

-- I would like to evaluate the first and second elements of the list to
-- have different output types, but the output argument means I have 
-- to treat the entire list uniformly
cantDoThisWontCompile = case listOfValues of
    x :: y :: _ -> 
        -- This won't work, I can only use matchToInts or matchToStrings,
        -- not both, since the concrete type of `output` can only be
        -- determined once
        (x matchToInts) + (y matchToStrings) 

whereas with normal extensible unions you could do separately typed case matches with no problem.

I think I know where this sentiment is coming from but I think it comes back to bite users. As the above demonstrates, there’s brick walls that a user can unexpectedly hit with partial encodings of extensible union types and more importantly the error messages that Elm starts generating become a bit confusing for a user not used to those patterns (this is a real problem with the Translator pattern today). On the other hand I think true extensible union types can have very informative error messages and don’t come with the brick walls.

It’s sort of the same reason why normal union types are so valuable even though in theory you could encode union types with product types and higher order functions. We could implement the equivalent of the visitor pattern everywhere instead of using union types, but it’s pretty painful to do so for no real gain (there are good reasons to be using union types and it’s probably more harmful than helpful to gate their access behind knowledge of a specific pattern).

Also just to be entirely clear:

I don’t think Elm should get explicit foralls. That way lies a lot of knock-on complexity (in its full generality we lose global type inference, unification becomes potentially trickier, etc.).

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.