Dealing with state duplication in the model


#1

Hey everyone. I’ve been wrestling with how best to approach state management and would like some feedback from the community on best practices. I’ll use an example to illustrate the question.

Suppose I have an application with “posts” that each have a “user” as the author, and I want to list out the posts with user data. A GraphQL query for the data may look like this:

{
  posts {
    id
    body
    user {
      id
      name
    }
  }
}

And return data like this:

{
  "data": {
    "posts": [{
      "id": "1",
      "body": "Hello John",
      "user": {
        "id": "1",
        "name": "Bob"
      }
    },
    {
      "id": "2",
      "body": "Hi Bob",
      "user": {
        "id": "2",
        "name": "John"
      }
    },
    {
      "id": "3",
      "body": "What's the latest?",
      "user": {
        "id": "1",
        "name": "Bob"
      }
    }]
  }
}

My initial approach would be to model the state like this:

type alias User =
    { id : String
    , name : String
    }

type alias Post =
    { id : String
    , body : String
    , user : User
    }

type alias Model =
    { posts : List Post
    }

The problem is that user "1" (Bob) appears twice and his data is now duplicated in two places. Anytime user attributes change, I would need to crawl the model for all the places where duplicates may be stored and update them.
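To make the cost concrete, here is a minimal sketch of the kind of update the duplicated model needs. The `renameUser` helper is a hypothetical name, not something from a library; it just walks every post and rewrites each embedded copy of the user.

```elm
type alias User =
    { id : String
    , name : String
    }


type alias Post =
    { id : String
    , body : String
    , user : User
    }


type alias Model =
    { posts : List Post
    }


-- Hypothetical helper: rewrite every embedded copy of one user's name.
-- Any other place that stores a User would need the same treatment.
renameUser : String -> String -> Model -> Model
renameUser userId newName model =
    let
        renameIn post =
            if post.user.id == userId then
                let
                    user =
                        post.user
                in
                { post | user = { user | name = newName } }

            else
                post
    in
    { model | posts = List.map renameIn model.posts }
```

Every new field that embeds a `User` would have to be added to this function by hand, which is the maintenance burden described above.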

This is along the same lines as what Brian describes here:

It seems like the way to get around this would be to create an “identity map” using a Dict and just store user IDs on posts:

type alias User =
    { id : String
    , name : String
    }

type alias Post =
    { id : String
    , body : String
    , userId : String
    }

type alias Model =
    { posts : List Post
    , users : Dict String User
    }

This seems suboptimal because I am now forced to deal with an "impossible state."

According to my API contract, I can be confident that my query will always return user data for each post (or else it should fail at the JSON decoding layer). However, fetching the user by ID from the Dict will return a Maybe User. When I’m iterating over posts to display them in the view, I have to handle the Nothing case that will never actually get called.

I’m not sure what is worse:

  • Forcing myself to handle Maybe everywhere I use a Dict identity map, or
  • Accepting the duplication (thereby making the impossible state impossible) and maintaining a user update function that knows all the places to propagate changes to user state

Or, am I missing some other way to deal with this?

Thanks for your time!

Related:


#2

The GraphQL query you are talking about is a typical NoSQL data set, where there is duplication (as there is no “relation” between sub-trees) or rather the data has been “joined”.

That is, of course, different from getting a dataset that is “unjoined”, for example just a userId, e.g.

{
  posts {
    id
    body
    userId
  }
  users {
    user {
      id
      name
    }
  }
}

{
  "data": {
    "posts": [{
      "id": "1",
      "body": "Hello John",
      "userId": "1"
    },
    {
      "id": "2",
      "body": "Hi Bob",
      "userId": "2"
    },
    {
      "id": "3",
      "body": "What's the latest?",
      "userId": "1"
    }],
    "users": [{
      "user": {
        "id": "1",
        "name": "Bob"
      }
    },
    {
      "user": {
        "id": "2",
        "name": "John"
      }
    }]
  }
}

as you describe.

It’s not like there is a right answer. It just depends on what you are trying to do with the data.

If you are just going to display it, maybe the API is fine as it is, and you have duplicated data.

If you are looking to update the data, then that is an entirely different problem, as I’d assume you have to push the changes back to a server.

In that case, yes, you have to deal with the Nothing | Just User issue you mention when displaying a post, as you’d clearly want to only update an author’s info in one place, and the assumption would be that if there is no userId in the users list/dict, you would be unable to update it.

I wouldn’t consider this an “impossible state”. It’s possible that a post is written, but the author forgot to sign it. Or deleted their name without deleting the post. Etc. Or maybe there are two authors. Etc. Or there is an author that hasn’t written any posts.

And the same applies in reverse. An author may not have any posts.

All of that CRUD logic would have to be enforced at the database level or on the server at the time the post was created or updated.

In other words, there are two possibilities.

  1. Authors are directly embedded in blog posts, in which case there is no real relation and each article has an author. If there happens to be the same author on multiple posts, that is just coincidence, and therefore updating an author on one post doesn’t imply anything else should change.
  2. Authors are independent of posts, in which case the relations must be enforced somehow (e.g. at the database level with foreign keys, etc.), and therefore updates to authors’ names must be pushed back to a server so that the next time a post is queried, the author’s name is updated.

To sum it up, I don’t think this is suboptimal. It is just the nature of dealing with relational data that may or may not exist. I think when you actually go to implement this, it will be quite straightforward!

Hope this helps!


#3

Thanks @madasebrof for the thorough reply.

A few clarifications:

It is an impossible state according to my application domain. On the server-side, this is a required foreign key relationship between the users table and the posts table. Ideally, I would be able to represent that a Post MUST have a User, but it’s not possible to enforce that with the Elm compiler using a Dict identity map. This is why it feels suboptimal – having to deal with the Maybe case when it is not actually necessary according to domain.

(For those who aren’t familiar, the “impossible state” concept is a reference to this great @rtfeldman talk: https://www.youtube.com/watch?v=IcgmSRJHu_8)

The goal is to keep the user interface up-to-date with server state at all times (via websockets that trigger model updates). So if “Bob” decides to change his name to “Robert” and I have some of his posts displayed on my screen, everywhere “Bob” is represented in the view should automatically update to “Robert”.
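For comparison, this is roughly what the “Bob” to “Robert” update looks like with the identity-map model from my first post. It is only a sketch; `renameUser` is a hypothetical helper name, and how the websocket message reaches it is left out.

```elm
import Dict exposing (Dict)


type alias User =
    { id : String
    , name : String
    }


type alias Post =
    { id : String
    , body : String
    , userId : String
    }


type alias Model =
    { posts : List Post
    , users : Dict String User
    }


-- Hypothetical helper: the name lives in exactly one place, so a single
-- Dict.update is enough. An unknown userId leaves the model unchanged.
renameUser : String -> String -> Model -> Model
renameUser userId newName model =
    { model
        | users =
            Dict.update userId
                (Maybe.map (\user -> { user | name = newName }))
                model.users
    }
```

The update side is simple here; the Maybe only becomes visible again when the view looks a user up by ID.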


#4

I don’t think there’s a way around handling this particular “impossible state” without de-normalizing your data. But then you don’t get the nice “only update in one place” behaviour. Seems like a fundamental trade-off.

BUT you don’t have to use Maybe. You can use any container that has some ‘empty’ element. List is often a good alternative to try when Maybe gets awkward. For example if you’re updating “Bob” to “Robert” for user 123, you find the List of matching authors in your Model, and iterate over it. The impossible state (no matching authors) is represented by an empty list rather than Nothing. The nice thing is that you can easily handle multiple author updates in one go. This technique works best when you’re already handling Lists anyway.

So you could think about modelling the data as List (Id, User), or even List User if the User record has the ID inside it. Dict gives you a faster lookup, but for small collections the difference is usually tiny, and List might give you nicer code.
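As a rough sketch of the List version (assuming the User record carries its own ID; `renameUser` is just an illustrative name), the update maps over all users and touches only the matching ones:

```elm
type alias User =
    { id : String
    , name : String
    }


-- Hypothetical helper: rename every user whose id matches.
-- If nothing matches, the list comes back unchanged, so there is
-- no Nothing branch to write.
renameUser : String -> String -> List User -> List User
renameUser userId newName users =
    List.map
        (\user ->
            if user.id == userId then
                { user | name = newName }

            else
                user
        )
        users
```

The “no matching author” case is simply a map that changes nothing.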


#5

Thanks @Brian_Carroll.

It is true that you don’t have to deal with Maybe when performing the update, but you do have to deal with it when fetching the user by ID from either a Dict or a List. This is the impossible state scenario I’m concerned about.


#6

Yes but what I’m questioning is whether you really need to look at it as retrieving a single item by ID or whether you can think of it as filtering a list of items by ID, where the resulting filtered list contains either one or zero items.

It can often be a useful refactor. For example when you create a Html view you are often dealing with lists anyway. And you can easily extend to filtering out multiple items.
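Here is a small sketch of what I mean on the view side, assuming the users are kept in a List User and posts carry a userId. The names `authorsOf` and `viewPost` are only illustrative.

```elm
import Html exposing (Html, div, span, text)


type alias User =
    { id : String
    , name : String
    }


type alias Post =
    { id : String
    , body : String
    , userId : String
    }


-- Zero or one matching users in practice, but expressed as a List.
authorsOf : List User -> Post -> List User
authorsOf users post =
    List.filter (\user -> user.id == post.userId) users


-- The "missing author" case renders as no author elements at all,
-- so there is no explicit Nothing branch in the view.
viewPost : List User -> Post -> Html msg
viewPost users post =
    div []
        (text post.body
            :: List.map (\user -> span [] [ text user.name ]) (authorsOf users post)
        )
```

The impossible state has not gone away; it is just absorbed by the empty list instead of a case expression.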


#7

If you have this kind of UI, what I would call an “editor” - where there is almost direct manipulation of the backend model in the frontend - I would consider a more normalized model like @madasebrof suggested. So you can replace one node of the graph from the backend on any given change, and it stays consistent. So you’d do the joins in the frontend.

(I say this, but not having tried it myself, so take it with a grain of salt. I am sure this kind of thing is quite a bit more complex than a typical SPA backend strategy and there are many other pieces to consider.)


#8

@Brian_Carroll Was going to suggest that, too!

I think what we are not conveying properly is that there is nothing intrinsically impossible about a blog post not having an author.

There is no way for Elm (or any UI) to “know” that your database is enforcing a particular foreign key relationship or a particular method of referential integrity under the hood. Elm just sees data.

Elm is designed to work no matter what–e.g. whether or not your DB is working properly, whether or not you set it up correctly, etc. You could have set a one-to-one, one-to-many, many-to-many, zero-or-more. Elm/GraphQL have no knowledge of that. Only you do because you wrote both server and the client.

Thus this situation.

There is no way for Elm to know that you have enforced this relationship. It’s only an impossible state because you say it is–because you have a priori knowledge of the server–not because it’s inherently impossible for a blog post to have no author.

Think about it this way. If Elm wasn’t designed like this, someone could change the way the server worked and your Elm code would break.

So I guess what I’m saying is that this behavior is a good thing!

To Brian’s point, I often use a pattern where a function takes a list and always returns a String result, so your view is easier to comprehend, e.g. something like:

https://runelm.io/c/k77

Anyway, I hope this clicks for you!


#9

It is possible for Elm to know that this relationship is enforced:

type alias User =
    { id : String
    , name : String
    }

type alias Post =
    { id : String
    , body : String
    , user : User
    }

This can also be enforced at the schema level with GraphQL (the query would error out if the User field is defined as non-nullable but null was given) and the database layer.

But, this implicitly leads to duplication.

A concrete example: here’s what the view function might look like when the user is embedded in the post:

viewPost : Post -> Html Msg
viewPost post =
    div []
        [ text post.body
        , text post.user.name
        ]

And here’s what it might look like with a users Dict:

viewPost : Dict String User -> Post -> Html Msg
viewPost users post =
    let
        maybeUser =
            Dict.get post.userId users
    in
    case maybeUser of
        Just user ->
            div []
                [ text post.body
                , text user.name
                ]

        Nothing ->
            -- This will never actually happen according to the
            -- rest of my application domain logic
            text ""

So I think it all boils down to a dichotomy:

  • Use the duplication approach and have my Elm types align with my domain rules (and make impossible states impossible)
  • Use an identity map to eliminate duplication and be forced to deal with the impossible states

#10

I haven’t tried this in practice or thought about it in depth, so it may well be a terrible idea, but you could try this:

postAuthor : Dict String User -> Post -> User
postAuthor users post =
    Dict.get post.user.id users
        |> Maybe.withDefault post.user

or maybe even

getUser : Dict String User -> { a | user : User } -> User
getUser users item =
    Dict.get item.user.id users
        |> Maybe.withDefault item.user

Your data is still de-normalized, but the user on the post is only used as a non-updating fallback to the official record in the Dict.


#11

LOL.

Last reply, promise! (Mulling over a complex thing I have to do next… this is a pleasant distraction!)

And because I think this is an important distinction.

Given this:

type alias User =
    { id : String
    , name : String
    }

type alias Post =
    { id : String
    , body : String
    , user : User
    }

You are correct. There is no possibility of having a Post without a User. Elm will not compile, or a JSON Parser will bomb when it gets bad data. (Yay!) Likewise, there is nothing that will guarantee a single, unique User record across multiple posts. Thus, your dilemma.

However, given this:

type alias User =
    { id : String
    , name : String
    }

type alias Post =
    { id : String
    , body : String
    , userId : String
    }

Dict String User 

There IS the possibility of having a Post without a matching User. That’s what I mean by it not being an impossible state.

You are saying that it will never actually happen due to your application domain logic. But my point is that it could, by definition, if you define the types this way.

It sounds like what you’d like to have is something like:

type table User =
    { id : PrimaryKey String
    , name : String
    }

type table Post =
    { id : PrimaryKey String
    , body : String
    , userId : ForeignKey User.id
    }

…to avoid the possibility of duplicated data. Thus, if you updated a User, the next time you queried the Post it would reflect the update, because a Post refers to a foreign key rather than to a specific User record.

There would be a lot of logic that you’d have to add to Elm to enforce this kind of thing. Basically, Elm’s records would then really be an in-memory relational database. (Which, honestly, could be kinda cool! Elm 0.20!)

But given that Elm is a functional language, you do indeed have to work around mapping relational structures to non-relational Records, Lists and Dicts.

:peace_symbol:


#12

If you think of your model as being like a database, then you can enforce the constraint that a blog post have a user in the user dictionary in the same way that a database would enforce the foreign key constraint: reject addition of blog posts where there is no corresponding user and delete blog posts when deleting users. There are lots of little things like that that databases do for one that dictionaries and other simple structures won’t do. (I would love an FP in memory database that would do things like support various indices on a table, for example.)
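A minimal sketch of that idea, assuming the Dict-based model from earlier in the thread. `addPost` and `deleteUser` are hypothetical names, and the constraint only really holds if the Model is changed exclusively through functions like these (for example by hiding it behind an opaque type).

```elm
import Dict exposing (Dict)


type alias User =
    { id : String
    , name : String
    }


type alias Post =
    { id : String
    , body : String
    , userId : String
    }


type alias Model =
    { posts : List Post
    , users : Dict String User
    }


-- Reject posts whose author is not in the user dictionary,
-- the way a database would reject a dangling foreign key.
addPost : Post -> Model -> Result String Model
addPost post model =
    if Dict.member post.userId model.users then
        Ok { model | posts = post :: model.posts }

    else
        Err ("No user with id " ++ post.userId)


-- Cascade the delete: removing a user also removes their posts.
deleteUser : String -> Model -> Model
deleteUser userId model =
    { model
        | users = Dict.remove userId model.users
        , posts = List.filter (\post -> post.userId /= userId) model.posts
    }
```

The compiler still cannot see the invariant, so Dict.get keeps returning a Maybe; the point is only that the model itself never holds a post without a user.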

Another approach to dealing with this sort of constraint is to think about getting the blog posts out as a query operation and let that query do the foreign key enforcement. So, for example, the result of the query could be blog posts with user records. But this would not be the reference data. It would be the query result on the model and the query would not return blog posts without users.
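Sketched in code, and again assuming the Dict-based model, the query could look like this. `PostWithUser` and `postsWithUsers` are illustrative names only.

```elm
import Dict exposing (Dict)
import Html exposing (Html, div, text)


type alias User =
    { id : String
    , name : String
    }


type alias Post =
    { id : String
    , body : String
    , userId : String
    }


type alias Model =
    { posts : List Post
    , users : Dict String User
    }


-- The query result, not the reference data.
type alias PostWithUser =
    { post : Post
    , user : User
    }


-- Join posts to users; posts without a matching user are dropped.
postsWithUsers : Model -> List PostWithUser
postsWithUsers model =
    List.filterMap
        (\post ->
            Dict.get post.userId model.users
                |> Maybe.map (PostWithUser post)
        )
        model.posts


-- The view works on the query result, so it never sees a Maybe.
viewPost : PostWithUser -> Html msg
viewPost { post, user } =
    div []
        [ text post.body
        , text user.name
        ]
```

The Maybe is handled once, inside the query, and the view can be written as if the user were embedded in the post.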

Mark


#13

That’s an interesting idea @matt.cheely

