Why are union types not treated as namespaces?

In Elm, types are not treated as namespaces, i.e. you cannot reference a type’s members with the TypeName.Member syntax that is extremely common in other languages. A more complete example:

type MyType =
    Variant1
    | Variant2

doSomething : MyType -> Int
doSomething thing =
    case thing of
        MyType.Variant1 -> 1  -- Compiler complains that it can't find #MyType.Variant1#
        MyType.Variant2 -> 2  -- ditto

Basically, this became a problem for me when I was mapping between two data structures and I wound up with two modules that had types with members of the same name. Okay, how do I disambiguate these names? The obvious syntax from most other languages didn’t work. This had me stuck against a wall for about an hour, searching in vain for answers to this problem. Even more frustrating, no one seems to have encountered it before. Or they have, but figured it out as I later did, and it never occurred to anyone that this can be a real bummer if you’re new to Elm. Example code:

In Module1:

type WhichEntity =
    Entity1
    | Entity2

In Module2:

import Module1 exposing (WhichEntity(..))

type SelectedEntity =
    Entity1
    | Entity2

fromRequestMsg : WhichEntity -> SelectedEntity
-- This implementation doesn't work
-- Compiler complains that we're matching WhichEntity against variants of SelectedEntity
fromRequestMsg whichEntity =
    case whichEntity of
        Entity1 -> Entity1
        Entity2 -> Entity2

-- This implementation doesn't work either
-- Compiler complains that it can't find the names #WhichEntity.Entity1# and #WhichEntity.Entity2#
fromRequestMsg whichEntity =
    case whichEntity of
        WhichEntity.Entity1 -> Entity1
        WhichEntity.Entity2 -> Entity2

Now I do understand how to disambiguate these and it’s by the module name:

fromRequestMsg whichEntity =
    case whichEntity of
        Module1.Entity1 -> Entity1
        Module1.Entity2 -> Entity2

But this strikes me as so utterly non-obvious as to be almost intentionally contrived. When I am thinking about my union types, I think of the variants as members of the type not members of the module and so my first inclination on how to reference / instantiate / disambiguate them is by dereferencing the type, not the module! This is especially the case if you are coming from places like Java, C++, Python, C#, Rust, etc as I of course am and I suspect many are, where enum members often can’t exist in isolation, and must be dereferenced from the overarching type name.


So my question is why doesn’t this syntax exist? Why are types not considered namespaces?

I have to imagine there’s a reason and I’d like to understand why so I don’t go developing bad opinions that are poorly informed :slightly_smiling_face:

If there is no reason and this has been an oversight for years and years, can I formally suggest it?

7 Likes

I think it would be more obvious if this was the syntax for union types:

type Maybe a

Just : a -> Maybe a
Nothing : Maybe a
2 Likes

An ADT constructor is a kind of function scoped to a module: constructors are not local to the type’s scope, but to the module the type is defined in.

If you want to disambiguate this way you could achieve it with:

import Module1 as WhichEntity exposing (WhichEntity(..))

At least that gives you the semantics you want.

I think this is a common problem in Haskell-like languages; qualified imports are the usual way to solve it.
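A minimal sketch of the aliased import described above, written as two files in one block. The module and type names come from the original question; aliasing the module to the type’s name, and exposing only the type rather than its variants, are assumptions made for illustration.

```elm
-- File: Module1.elm
module Module1 exposing (WhichEntity(..))


type WhichEntity
    = Entity1
    | Entity2


-- File: Module2.elm
module Module2 exposing (SelectedEntity(..), fromRequestMsg)

-- Alias the module to the type's name and expose only the type itself,
-- so the imported variants must be written as WhichEntity.Entity1.
import Module1 as WhichEntity exposing (WhichEntity)


type SelectedEntity
    = Entity1
    | Entity2


fromRequestMsg : WhichEntity -> SelectedEntity
fromRequestMsg whichEntity =
    case whichEntity of
        -- Qualified names refer to Module1's variants (via the alias).
        WhichEntity.Entity1 ->
            Entity1

        -- Unqualified names refer to the locally defined SelectedEntity.
        WhichEntity.Entity2 ->
            Entity2
```

Note that the qualifier here is still a module alias, not the type: it only reads like TypeName.Member because the alias and the type happen to share a name.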

In F# one can choose between “Elm mode” and @kmurph1271’s suggestion:

// Like Elm
// Can use just `Variant1` (but also `MyType.Variant1` if you want, unlike Elm)
type MyType =
    | Variant1
    | Variant2

// “Namespaced”
// Have to use `MyType.Variant1`
[<RequireQualifiedAccess>]
type MyType =
    | Variant1
    | Variant2

So it’s a reasonable thing to ask. I don’t know how the semantics for Elm were chosen. Never thought about it before.

Would probably be a good thing to mention in the Elm Guide, perhaps on this page:

https://guide.elm-lang.org/types/custom_types.html

3 Likes

Elm is inspired heavily by Haskell and I would assume this decision comes directly from there.

I can only speculate to the real reason things are that way in Haskell, but here are a few observations:

This:

data Person = Person { firstName :: String, lastName :: String }

just defines three functions:

Person :: String -> String -> Person (Elm inherits this bit.)

lastName :: Person -> String

firstName :: Person -> String (Elm dropped this bit in favor of the .firstName :: { a | firstName : b } -> b syntax, which is possible because Elm’s type system supports extensible records.)

In Haskell the Person type is completely opaque. There isn’t any way to get any data out of it or inspect it in any way. Unrelated as far as the type system is concerned, you also have two functions: firstName and lastName which both take a Person and return a String.

The compiler de-sugars that syntax into something like this:

data Person = Person String String

firstName :: Person -> String
firstName (Person value _) = value

lastName :: Person -> String
lastName (Person _ value) = value

Note how, viewed this way, records are just sum/enum types with a single constructor. In fact, the Person :: String -> String -> Person constructor, which is a kind of special form in Elm, is just a normal data constructor in Haskell. The bits that are actually special are the firstName and lastName functions that you get from the { ... } syntax.
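For comparison, here is a small sketch of the Elm side of this: a record type alias gives you a constructor function, and field access goes through the .field accessor functions rather than generated top-level names. The module name and sample values are made up for illustration.

```elm
module PersonExample exposing (Person, ada, firstNames)


-- A record type alias. Elm also generates a constructor function
-- for it: Person : String -> String -> Person
type alias Person =
    { firstName : String
    , lastName : String
    }


-- The generated constructor takes the fields in declaration order.
ada : Person
ada =
    Person "Ada" "Lovelace"


-- .firstName is an ordinary function that works on any record with a
-- firstName field, so it can be passed to List.map directly.
firstNames : List String
firstNames =
    List.map .firstName [ ada, Person "Grace" "Hopper" ]
```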

If I remember correctly early versions of Haskell compiled to something like this:

Person :: String -> String -> Person
Person firstName lastName accessor =
	accessor firstName lastName

firstName :: Person -> String
firstName person =
	person (\f l -> f)

lastName :: Person -> String
lastName person =
	person (\f l -> l)

If I remember correctly again, this compilation method actually stuck around for quite a while because in combination with Haskell’s laziness it ended up being a fairly efficient way to implement records.

In a similar way to how records are just special syntax for function definitions, enum/sum types are just special syntax for function definitions too:

data Timer = Off | Counting Int

This creates two functions: Off :: Timer and Counting :: Int -> Timer. We call them “data constructors” (or, loosely, “type constructors”), but they’re just functions that happen to have a capital first letter.
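The same view carries over to Elm. As a rough sketch (the module name is an assumption), a variant that takes an argument can be passed around like any other function:

```elm
module TimerExample exposing (Timer(..), timers)


type Timer
    = Off
    | Counting Int


-- Counting has type Int -> Timer, so it can be handed to List.map
-- like any other function. Off is simply a value of type Timer.
timers : List Timer
timers =
    Off :: List.map Counting [ 1, 2, 3 ]
```

Nothing in this module refers to Timer as a scope; Off and Counting are top-level names of the module, exactly like timers.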

When viewed this way you might actually ask your question in the other direction: Why should syntax that just creates function definitions cause modules or namespaces to suddenly appear?

That’s one way of looking at it anyway.

5 Likes

Thanks for the explanation :+1:. That actually makes a lot of sense!

… From a theoretical perspective that is. I do feel the need to address this:

The issue I have with this is basically that programming languages are ultimately user interfaces themselves (just like the things we build with them) that turn human-readable and comprehensible text into machine instructions. That’s the whole reason we have programming languages; machine instructions are basically incomprehensible for anything besides simple math. A programming language isn’t doing its job if it’s not intuitive.

And even with the explanation, I don’t find the idea of types not being namespaces very intuitive. I define a type because values are all logically related to each other under some umbrella. So if I go looking for them, I expected to find them under the umbrella. Not on the other side of the pavilion that the umbrella is in because the compiler doesn’t actually consider the umbrella to exist.

If I did find that line of logic intuitive, then I would also have to accept that C++ (or in fact any object oriented language for that matter) shouldn’t have function dereferencing from instances of objects. Because the compilers just translate classes into flat structures and a series of functions whose first arguments are pointers to the structures:

class ClassName {
private:
    int member;

public:
    ClassName(int member_) : member(member_) {}

    void add(int num) {
        this->member += num;
    }
};

Compiles into something like:

struct ClassName {
    int member;
};

struct ClassName ClassName_constructor(int member_) {
    return (struct ClassName){ .member = member_ };
}

void add(struct ClassName* self, int num) {
    self->member += num;
}

So you would have to write:

struct ClassName instance = ClassName_constructor(1);
add(&instance, 1);

Instead of:

ClassName instance(1);
instance.add(1);

But which one would we rather read if we’re dealing with object-oriented programming where data and behavior are encapsulated?

I agree. (For the most part, but my caveat isn’t important here.)

This is a perspective that is very useful to discuss. Especially if the discussion is about how to make a language more intuitive to programmers coming from other disciplines. I think it’s important to realize that it’s mostly contextual and based on previous experience though. Nobody who isn’t a programmer would have any intuition in either direction on this subject. Someone coming from Haskell or a similar ML family language is likely to have the opposite intuition.

Some people do have exactly that perspective. I’ve met several people who prefer something like this:

add(&instance, 1);

over this:

instance.add(1);

(As an aside, I’m also not sure that the line of logic necessarily implies you would have to accept the dereferencing example. I feel that I could think of a few reasons someone might accept the namespace argument but reject the dereferencing argument while maintaining logical consistency. But that’s probably missing the underlying point.)

Depends on the person, their previous programming experience, and their priorities.

I think the question shouldn’t be “which of these is more intuitive” but “which of these is more intuitive for people who have previously used C++ & Java, and do we want to make it more intuitive for them at the cost of making it less intuitive for people with the opposite intuition?”

That’s not to say that one way or the other is more correct or better. Just that people have their own intuitions and that these are mostly (though not entirely) based on their previous experience rather than being inherent.


[edit]: removed a point about platonic ideal forms that was more confusing than helpful.

1 Like

Fair enough :grinning_face_with_smiling_eyes:

To that end, my formal education is actually computer engineering. So as a result my degree courses were often in assembly and C. Professionally, I’ve mostly worked with Java and Python and learned quite honestly more than I really wish to know about either; I’m talking deep stuff like garbage collection and monitor implementations… uggh. And I dealt in plain JavaScript, HTML, and CSS enough to know it’s the web equivalent of assembly lang (in my opinion anyways).

Apart from that I’m a hardcore Rust programmer. This is probably why I’ve for the most part found Elm positively delightful. Rust is very expression-heavy, i.e. everything evaluates to a value, so Elm just feels like idiomatic Rust minus the loops.

But I can imagine that if I didn’t have the Rust background, Elm would have been a lot harder to grasp. And I suspect a typical web programmer is coming more often from the imperative-procedural world than the declarative-functional one. Now imagine total paradigm shift on top of alien syntax and semantics, and without Rust I may have never thought Elm worthwhile.

3 Likes
