Moving from "Similar" to "Same"

I’ll outline three paths in order of worst to best (in my opinion!). I feel pretty strongly that these are good recommendations, but they are based on my personal experience with each one, and I am sure other people have different views or different scenarios than mine.

Extensible records :warning:

I wasn’t even thinking of this. I personally find it leads to a mess in the long run. Keeping different types structurally in sync is not easy in this world because the dependency goes like this:

Thing1   Thing2
     \   /
   sharedFunc

So you change Thing1, update sharedFunc, and then realize that you cannot change Thing2 in the same way. I think this is not so fun to see in practice.
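
To make the dependency concrete, here is a minimal sketch. The field names (name, id, x, y) and the body of sharedFunc are made up for illustration; only the shape of the dependency comes from the description above.

```elm
module ExtensibleSketch exposing (..)

-- Hypothetical records that happen to share two fields today.


type alias Thing1 =
    { name : String
    , x : Float
    , y : Float
    }


type alias Thing2 =
    { id : Int
    , x : Float
    , y : Float
    }


-- Accepts any record that has at least x and y as Floats,
-- so both Thing1 and Thing2 can be passed in.
sharedFunc : { a | x : Float, y : Float } -> Float
sharedFunc thing =
    sqrt (thing.x ^ 2 + thing.y ^ 2)
```

Nothing in either type alias says that Thing1 and Thing2 must keep agreeing on x and y. The only place that assumption lives is the annotation on sharedFunc, so a change to one record surfaces as a type error wherever the function is called with it.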

The literature on extensible records had big dreams for them being useful for this sort of thing. They had not been implemented in OCaml or SML or Haskell, so I was excited to see how they would work in Elm. I just have not seen them pan out in practice though. You have to think real hard and things still feel misaligned and messy in the end. So overall, I think extensible records are useful for making type inference on .x expressions easy, but a red herring for pretty much everything else.

Types with Holes :warning:

At some point I got into the idea of just adding more type variables to my types in the compiler. So instead of type AST = ... I had type AST loc var tipe doc = ..., allowing me to fill in different parts differently depending on context.

Thing1 == GenericThing a b c d e == Thing2

This is the path that I think looks neat, but I have found it is not worth it in the end. In my case, each phase of the AST was similar, and at the time, the Haskell community online was really excited about generic traversals of generic data structures. The trouble is that all of my traversals were specific. They relied on certain details interacting with other details. So in the end, I had done a bunch of work to have less code, but I ended up with code that was more complex and more fragile when things changed. Touching code about parsing meant messing up totally unrelated traversals two phases later. I eventually just made four separate AST types and the code got simpler and faster.
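
A small sketch of what a type with holes can look like. This is not the compiler's actual AST: Expr, Region, the two aliases, and mapVar are hypothetical names, and only two holes are shown to keep it short.

```elm
module HolesSketch exposing (..)

-- A tiny expression type with two "holes": loc and var.


type Expr loc var
    = Var loc var
    | Call loc (Expr loc var) (List (Expr loc var))


type alias Region =
    { line : Int
    , column : Int
    }


-- One phase fills the var hole with a raw name...
type alias ParsedExpr =
    Expr Region String


-- ...and a later phase fills it with a qualified name.
type alias CanonicalExpr =
    Expr Region { home : String, name : String }


-- A generic traversal that rewrites every variable and keeps locations.
mapVar : (a -> b) -> Expr loc a -> Expr loc b
mapVar func expr =
    case expr of
        Var loc var ->
            Var loc (func var)

        Call loc callee args ->
            Call loc (mapVar func callee) (List.map (mapVar func) args)
```

Every phase now shares the Expr constructors, so adding or changing a constructor for the sake of one phase forces every traversal in every other phase to be revisited as well.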

So I think this pattern is attractive in that it seems to promise less code, but I found that it led to code that was much harder to understand and modify. Instead of being messy like the extensible records approach, I found the result here was complex.

Nesting Types :white_check_mark:

I try to find a subset of information that makes sense as a type of its own. So let’s say Thing1 and Thing2 had shared fields about their location. I would see if a Location type made sense. Are there helper functions specifically relevant to it? Is it exactly the same in both cases? Maybe it corresponds to some idea that is true about your overall system? If it seems like a solid concept, I would consider making a module around it.

Now you have a structure like this:

     Location
      /   \
Thing1   Thing2

So everyone gets the benefit of the Location type. They can share decoders and helper functions. If you change something about Thing1 it does not ruin any code that works on Thing2. They are just separate.
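
Here is a rough sketch of the nested version, assuming the JSON happens to use the field names shown (latitude, longitude, name, id, location are all made up). Everything sits in one module to keep the sketch self-contained; in a real project Location would probably live in a module of its own.

```elm
module NestingSketch exposing (..)

import Json.Decode as Decode exposing (Decoder)


-- The shared concept, with its own decoder.


type alias Location =
    { latitude : Float
    , longitude : Float
    }


locationDecoder : Decoder Location
locationDecoder =
    Decode.map2 Location
        (Decode.field "latitude" Decode.float)
        (Decode.field "longitude" Decode.float)


-- Thing1 and Thing2 each hold a Location but are otherwise unrelated.


type alias Thing1 =
    { name : String
    , location : Location
    }


type alias Thing2 =
    { id : Int
    , location : Location
    }


thing1Decoder : Decoder Thing1
thing1Decoder =
    Decode.map2 Thing1
        (Decode.field "name" Decode.string)
        (Decode.field "location" locationDecoder)


thing2Decoder : Decoder Thing2
thing2Decoder =
    Decode.map2 Thing2
        (Decode.field "id" Decode.int)
        (Decode.field "location" locationDecoder)
```

Adding a field to Thing1 only touches Thing1 and thing1Decoder. Thing2 and anything that works on a plain Location are unaffected.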

The risk here is that a Location that looks the same in both cases today will only be similar in the future. If that happens, many people just start making Location more complex rather than pushing it back into Thing1 and Thing2. So the risk is that if you draw these lines wrong, you end up with a bunch of optional fields that are actually not optional. They are contextual! People think “oh, I’ll just make a little edit. This type surely exists for a good reason.” And when folks realize that they have all these optional fields that are actually contextual, they may try to get to Location a b c or { a | location : Location } to “fix” things. Now someone spent a lot of time getting that to work, but no one else on the team can understand or modify it easily anymore and you are gonna have all the problems discussed in the sections above.
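
To illustrate the failure mode and the way back out, here is a hypothetical sketch. The floor and dockNumber fields are invented: imagine only Thing1 ever has a floor and only Thing2 ever has a dock number.

```elm
module DriftSketch exposing (..)

-- What Location tends to turn into after a few "little edits".
-- Both Maybe fields look optional but are really contextual.


type alias DriftedLocation =
    { latitude : Float
    , longitude : Float
    , floor : Maybe Int -- only ever present for Thing1
    , dockNumber : Maybe Int -- only ever present for Thing2
    }


-- Backing out: push the fields back into each type,
-- accepting that the two are now similar rather than the same.


type alias Thing1 =
    { name : String
    , latitude : Float
    , longitude : Float
    , floor : Int
    }


type alias Thing2 =
    { id : Int
    , latitude : Float
    , longitude : Float
    , dockNumber : Int
    }
```

The second version repeats latitude and longitude, but neither type carries a field that can never be filled in for it.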

So I give this a :white_check_mark: because it seems to work out the most often of any technique I know of. The risk is mistaking "similar" for "same", and when that happens you just have to know how to back out of the situation without leaving messy debris. This is why I try to emphasize the same/similar distinction a lot!

In the end, these are just my opinions based on the kinds of code I have written. I prioritize “easy to understand and modify” in my code, and I have found that sometimes that means having code that is similar :man_shrugging: