Using Phantom Type Ids

Hi all,

Background

I often use an Id type that helps distinguish Ids from other Strings and therefore prevents mixing up Ids with Strings. For example, this is possible:

updateUsersName : String -> String -> Cmd msg
updateUsersName id newName =

-- and then mixing up the parameters here
    updateUsersName "cool-new-name-42" user.id

But this, using an Id type, isn’t possible…

updateUsersName : Id -> String -> Cmd msg
updateUsersName id newName =

-- and then mixing up the parameters here
    updateUsersName "cool-new-name-42" user.id

… you would get a compiler error.

New Problem

Recently I worked on a project that used the Facebook API. The project involved updating a button on a Facebook page owned by the user’s Facebook account. I was managing the button’s Id, the page’s Id, and the user’s Id. My functions had type signatures like

updateUsersPagesButton : Id -> Id -> Id -> E.Value -> Cmd Msg
updateUsersPagesButton userId pageId buttonId buttonPayload =

You can see that this problem of mixing up values re-emerges. There’s nothing stopping me from mixing up userIds with pageIds.

That’s okay, Phantom Types are a thing

Instead I used an Id type like…

type Id x = Id String

so my functions were…

updateUsersPagesButton : Id User -> Id Page -> Id Button -> E.Value -> Cmd Msg
updateUsersPagesButton userId pageId buttonId buttonPayload =

and now it’s impossible to mix them up again.
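To make the idea concrete, here is a minimal self-contained sketch of a phantom-typed Id. The module name, the placeholder types User, Page and Button, and the helpers fromString, unwrap and describe are assumptions made for illustration; they are not taken from the project described above.

```elm
module PhantomId exposing (..)

-- The type variable `x` never appears on the right-hand side, so it
-- exists only for the type checker (a "phantom" type).


type Id x
    = Id String



-- Placeholder tags, assumed only for this sketch.


type User
    = User


type Page
    = Page


type Button
    = Button


fromString : String -> Id x
fromString =
    Id


unwrap : Id x -> String
unwrap (Id raw) =
    raw


-- Hypothetical stand-in for updateUsersPagesButton: each argument
-- only accepts the matching kind of Id.
describe : Id User -> Id Page -> Id Button -> String
describe userId pageId buttonId =
    String.join "/" [ unwrap userId, unwrap pageId, unwrap buttonId ]


aUser : Id User
aUser =
    fromString "user-1"


aPage : Id Page
aPage =
    fromString "page-1"


aButton : Id Button
aButton =
    fromString "button-1"


ok : String
ok =
    describe aUser aPage aButton



-- Swapping two arguments is rejected by the compiler, because
-- `Id Page` is not `Id User`:
--
--     bad = describe aPage aUser aButton
```

Note that an exposed fromString lets any caller pick the tag freely, so the safety depends on annotating values where they enter the program (for example in decoders).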

This worked well enough, but then I realized that you can’t put phantom type Ids inside the thing they are an Id of. The following is impossible due to a circular definition:

type alias User = { id : Id User }

To work around this I did

type UserId = UserId

type alias User = { id : Id UserId }

Is there anything better than that workaround?

I maintain and use a package for Ids and for managing lots of data that have Ids. It would be really powerful to have phantom types in that package. You could have functions with type signatures like save : Id a -> a -> ..., which tie the Id exactly to the type of the thing it’s meant to identify: an Id a always refers to an a. But the trade-off is that you can’t ever have the Id internal to the data it refers to, because that would always be circular.

Instead you would have to manage data with Ids like this

    (Id User, User)

I store ids like this anyway, because it avoids other problems I’ve encountered as well. To me it seems like a good practice. But I’m wondering if it’s a bad idea to fully commit all data with an Id to never containing its own Id internally. This tuple approach is certainly a lot less obvious.

Do any problems jump out to any of you by handling Ids in this tuple style and never inside the data itself? Is this even a worthy consideration?

Thanks


What are your thoughts comparing this to the approach with one type with a single constructor per Id?

type UserId = UserId String
type PageId = PageId String
type ButtonId = ButtonId String

This is one of the reasons to use record-datatypes in Elm.
If your function has the signature updateUsersPagesButton : {user_id: Id, page_id: Id, button_id: Id} -> E.Value -> Cmd Msg then mixing those keys up is a lot harder.
(In fact: It’s about equally hard as when you alias UserId, PageId and ButtonId to a single datatype (like String) because that would allow you to use one in the place of the other without warnings as well; only the visual difference of the constructor names will warn you. The compiler will not!)


I know that there are some Haskell SQL libraries that use the same (Id User, User) tuple-approach, for similar reasons (like: The Id can only be set after inserting the data structure for the first time, but working with a Maybe Id everywhere is horrible, etc).

I’m not sure what to think of that solution, other than: You point out a great problem, which we need to find an answer for :)


To start with OP’s question, I think it’s okay to have the tuples. You could always define a type alias:

type alias Identified a = (Id a, a)
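As a rough sketch of how that alias might be used: because the Id lives outside the record, User can stay a plain type alias and still be the tag in Id User. The User fields and the helpers getId and rename are hypothetical names introduced for this example only.

```elm
module Identified exposing (..)


type Id a
    = Id String


type alias Identified a =
    ( Id a, a )


-- The record does not contain its own Id, so nothing is circular here.
type alias User =
    { name : String }


getId : Identified a -> Id a
getId =
    Tuple.first


-- Change the data while keeping the Id paired with it.
rename : String -> Identified User -> Identified User
rename newName ( id, user ) =
    ( id, { user | name = newName } )
```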

Like you say, you can make a generic table using the phantom type, backed by a Dict, and this is an advantage over a UserId type as suggested by Andy. (Richard glosses over IDs in his recent talk, Immutable Relational Data, but it’s still helpful background.)

empty : Table a
index : Table a -> List a
get : Id a -> Table a -> Maybe a
create : a -> Table a -> (Table a, Id a)
update : Id a -> a -> Table a -> Table a
delete : Id a -> Table a -> Table a

This interface has the Table generate the ID. But where do IDs come from?

  1. From an incrementing integer. The (opaque) Table a type would store an incrementing number which would become the new ID. The Id a type would also be opaque and be defined in the Table module.
  2. From UUIDs. You need access to a source of randomness, and (depending on the type of UUID) the time and MAC address. These are not the most pleasant things to obtain in Elm.
  3. From the data. You could imagine initializing the table with an (a -> String) function to extract the ID from an inserted record. This allows you to look up records by the ID the server uses, e.g. for a RESTful API. But you can no longer guarantee that two records will have distinct IDs, and you have to handle the case of creating a record with an ID that already exists.
  4. From the request. When consuming a RESTful API, you know the ID before you have the rest of the record. You also want to track a history of network traffic: we tried to get this record and it failed; we have a request in flight for this record so we won’t ask again. You often want to show this information in the UI. The RemoteData library is designed for exactly this purpose, so a Table’s values might need to be RemoteData. You would need a way to handle IDs in a safe way from the URL, to your network request Cmds, to your Table, to your view. I think you’d need to expose fromString : String -> Id a or similar, which may defeat a lot of the benefits we are looking for.
  5. As a composite of #1 and #4. One thing I’ve noticed about RemoteData is that it’s really oriented around requests, not data modeling. You may have multiple ways to obtain the same value (index, get, or created in the client). So maybe you’d need a Dict String (RemoteData (Id a)) to handle requests in flight, and to translate from server IDs to those hidden incrementing integers in a phantom type. This would be most beneficial if you could use the phantom type in the view code, and do the translation only when you talk to the server.
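Option 1 in the list above (an incrementing integer inside an opaque Table) could be sketched roughly as follows. This is only one possible reading of the interface listed earlier, not a published package; in particular, having update ignore Ids that are not present is an assumption.

```elm
module Table exposing (Id, Table, create, delete, empty, get, index, update)

import Dict exposing (Dict)


-- Opaque: the constructors are not exposed, so only this module can
-- mint an Id.
type Id a
    = Id Int


type Table a
    = Table { dict : Dict Int a, nextId : Int }


empty : Table a
empty =
    Table { dict = Dict.empty, nextId = 0 }


index : Table a -> List a
index (Table t) =
    Dict.values t.dict


get : Id a -> Table a -> Maybe a
get (Id key) (Table t) =
    Dict.get key t.dict


-- Store the value under the next free integer and hand that back as the Id.
create : a -> Table a -> ( Table a, Id a )
create value (Table t) =
    ( Table { dict = Dict.insert t.nextId value t.dict, nextId = t.nextId + 1 }
    , Id t.nextId
    )


-- Assumption: updating an Id that is not in the table is a no-op.
update : Id a -> a -> Table a -> Table a
update (Id key) value (Table t) =
    if Dict.member key t.dict then
        Table { t | dict = Dict.insert key value t.dict }

    else
        Table t


delete : Id a -> Table a -> Table a
delete (Id key) (Table t) =
    Table { t | dict = Dict.remove key t.dict }
```

One limitation of this sketch: an Id a created by one Table a still type-checks against any other Table a, so the phantom type separates users from pages but not two user tables from each other.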

I think it would be really valuable for people who are already having “mixed up IDs” problems to try out these solutions to see which ones work best in practice, and report back. (This isn’t “I was told there would be someone” because if no one is actually having the problem, then no one needs to bother.)


@andys8, @Qqwy, those both look like really good solutions to me. I do that record thing @Qqwy mentioned a lot, and while I haven’t made explicit Id types for each data type like @andys8 mentioned, it must work for differentiating ids.

I think this topic, like a lot of other topics I’ve noticed in our community, is actually a blurry nexus of similar and connected problems. I think if you have a project where only one kind of thing needs an id, then just an Id type fully solves the problem. If you have a small project with a few data types with Ids, then either of your suggestions fully solves the problem.

This concept of data with ids grows in complexity and bleeds into related problems, such as the entities stuff @mgold is hitting on (in hindsight better than my OP did; thanks all for letting me rubber duck). I’m trying to explore whether those solutions become less scalable or not fully optimized for the projects they serve. For example, here are some questions:

  1. If you have 5 remote data types with ids that come in a large volume, do you really want to be duplicating the code for the ids? Do you want to be duplicating the helper functions that work with data for ids?
  2. Wouldn’t it be great if you could safely handle Ids while scaling and re-using Id logic? You shouldn’t have to trade type-safe Ids against good large remote data techniques. How can we get both?
  3. I definitely could be wrong, but it kind of seems like there are some patterns associated with handling data with ids. Wouldn’t it be great to really nail down what they are, then try to optimize an api just for handling them?

Thanks for the video link to Richard’s talk. That approach is exactly how we ended up handling this problem where I work in our Elm projects.

We could have an api optimized for large volumes of entities. Furthermore, entities necessarily have ids that more than likely come from a remote data source; we can optimize an api for those ids too. Broadly speaking, I’m wondering how to do that, and how I can iterate on my Chadtech/id package (if it needs to be iterated on at all). Narrowly, I’m wondering if an api with phantom types can do that.

I really like some of your points @mgold; I’ll really have to think about all that. On the incrementing integer point, I wonder whether it would be a problem if the front end used different ids than the back end. The front end would probably still retain the ids associated with the back end? Would it be confusing or difficult if data had a front-end Int id and a back-end UUID?

You bring up some very interesting points, @Chadtech.

What came to mind just now, is that Ids are basically a ‘poor-man’s pointers’. We can change an Id to any other value, as long as we do it to all instances of the id, and the transformation is bijective (one-to-one). Maybe it’s possible to create a nice wrapper for them using this property.

Maybe, the answer to the original question is that one should just define the to-be-identified datatypes as true record datatypes, rather than as aliases.

The Recursive Alias page says that matching + re-wrapping is ‘kind of annoying’, but to be honest it seems like the most clean solution to this whole charade, since it allows us to simply use Id User, Id Product etc. without having to manually define wrappers for each and every one of them.

(And if it becomes too annoying, maybe one can define a (comment_body_record -> comment_body_record) -> Comment -> Comment function, to essentially ‘map’ over the record that is inside the data structure?)
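A small sketch of that suggestion, assuming a User rather than a Comment: wrapping the record in a single-constructor custom type breaks the alias cycle, so the record can hold its own Id User. The names UserBody, mapUser and rename are made up for this example.

```elm
module User exposing (..)


type Id a
    = Id String


-- A custom type may be recursive, unlike a type alias, so the record
-- inside it is allowed to mention `User`.
type User
    = User UserBody


type alias UserBody =
    { id : Id User
    , name : String
    }


-- Unwrap, apply a change to the inner record, and re-wrap in one place.
mapUser : (UserBody -> UserBody) -> User -> User
mapUser f (User body) =
    User (f body)


rename : String -> User -> User
rename newName =
    mapUser (\body -> { body | name = newName })
```

With mapUser in place, the match-and-re-wrap step is written once instead of in every function that edits a user.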

EDIT: Ellie example

Well said! When I wrote my first post, I remember trying to articulate that IDs can be used for client-side data storage from an API, and for passing into functions like updateUsersPagesButton. As mentioned, APIs are asynchronous and can fail, leading to RemoteData. Can we get a nice solution for all of these entangled problems?

That’s what I was getting at in point #5. Most of the app would deal with the opaque Id User, which Table User would generate from an incrementing integer. (type Table a = T { dict : Dict Int a, nextId : Int } or similar. Because custom types aren’t comparable, the key is the unwrapped Int, not Id a.) Only when you need to talk to the API would you convert between the ServerId (could be an int or a UUID) and Id a. You might need a Bijection (Id a) ServerId value that would let you convert in either direction. I don’t know if this overhead would outweigh the benefits gained from using Id a everywhere else.
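For illustration only, such a two-way mapping could be built from a pair of Dicts. This sketch specialises the idea to an Int-backed Id a and a String ServerId, both of which are assumptions, and it takes a single type parameter instead of the two in Bijection (Id a) ServerId. In practice the Id type would be shared with the Table module, not redefined.

```elm
module ServerIds exposing (..)

import Dict exposing (Dict)


type Id a
    = Id Int


-- Assumption: the server identifies records by a String (e.g. a UUID).
type alias ServerId =
    String


type Bijection a
    = Bijection
        { toServer : Dict Int ServerId
        , fromServer : Dict ServerId Int
        }


empty : Bijection a
empty =
    Bijection { toServer = Dict.empty, fromServer = Dict.empty }


-- Pair a local Id with a server id. Any older pairing that involves
-- either side is dropped first, so the mapping stays one-to-one.
insert : Id a -> ServerId -> Bijection a -> Bijection a
insert (Id local) server (Bijection b) =
    let
        toServer =
            case Dict.get server b.fromServer of
                Just oldLocal ->
                    Dict.remove oldLocal b.toServer

                Nothing ->
                    b.toServer

        fromServer =
            case Dict.get local b.toServer of
                Just oldServer ->
                    Dict.remove oldServer b.fromServer

                Nothing ->
                    b.fromServer
    in
    Bijection
        { toServer = Dict.insert local server toServer
        , fromServer = Dict.insert server local fromServer
        }


toServerId : Id a -> Bijection a -> Maybe ServerId
toServerId (Id local) (Bijection b) =
    Dict.get local b.toServer


fromServerId : ServerId -> Bijection a -> Maybe (Id a)
fromServerId server (Bijection b) =
    Maybe.map Id (Dict.get server b.fromServer)
```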

A good place to start may be an application that already has a lot of data and lets you browse it. (“Already has” could mean random generation, or reading in a JSON file, or something.) See if the phantom types work and are helpful before worrying about RemoteData and server IDs. (Just a thought - I’m not trying to hand out work.)
