Do you put database IDs in your records?

I’m using Elm on the frontend and ASP.NET in F# on the backend. They work pretty well together, and it’s been quite nice to create the same data structures (i.e. records) in both languages that both support unions and records.

Our application lets you create and edit some data we call markups, and we have the same data structure:

type alias Markup = 
  { description : String
  , notes : String
  -- etc
  , id : Maybe Int
  }

and in F#

type Markup = 
  { Description: string
    Notes: string
    // etc
    id: int option }

My question is whether we should store the database ID of the markup inside the record, as we are today, or whether we should store it separately?

The current reason for the optionality is that we have a create API endpoint and an update API endpoint, and both use the same JSON payload, so when we create a markup, we don’t yet have an ID as it hasn’t been stored in the database, but when we update one we pass the ID of the markup to update, and the backend has to awkwardly unpack the id from the int option (in F#, you can “just do this” in a way that will fail, similar to Haskell’s fromJust, but it is nonetheless awkward.

Is there a best practice here?

Edit: added some clarifying language

Maybe the question needs more context to get a proper answer (from me at least). My takes are:

If it is an ID, it should not be optional.

If the ID is needed by the application logic, then yes, it must be part of the record.
Warning in that case, prefer ID that can not be guessed like UUID, rather than sequential integers.

String is often a good datatype for database ids, even if they are Ints. They could potentially be large 64-bit ints, which are not strictly allowed in Json. Making them Strings is a good way to indicate that this id is an opaque value which has no meaning.

3 Likes

Guessability is not a concern for us, in fact I’m okay surfacing it to users. Here’s what we’d like

https://www.our-domain.com/markups/new
https://www.our-domain.com/markups/1
https://www.our-domain.com/markups/2
etc.

The ID is just a database-generated one (serial in Postgres). So, when you create it, you get an ID back. And when you update it, you pass in an ID. Typical CRUD stuff.

The main question is whether you store that ID inside the record for the markup, or whether you store it alongside it. Maybe and option make it easy to handle the case where your ID isn’t there, but it’s awkward when you really “should” have it, for example, for an update.

@rupert Good note on data types! I’ll be sure to include that.

Why is the id optional? Is it that in /markups/new you have

myMarkup : Markup
myMarkup = 
  { description = "some text"
  , notes = "other text"
  , id = Nothing
  }

and then in /markups/55 you have

myMarkup : Markup
myMarkup = 
  { description = "some text"
  , notes = "other text"
  , id = Just 55
  }
1 Like

Correct, it’s an entity that the user has drafted locally but has not yet saved.

Ah! Then I would have

type alias Markup = 
  { description : String
  , notes : String
  -- etc
  , id : Int
  }


type alias MarkupDraft = 
  { description : String
  , notes : String
  -- etc
  }

because these are 2 distinct things. They look and are quite similar but they’re not the same.

6 Likes

Interesting - I hadn’t thought of a separate type entirely. I get the trap of “similar, but not the same”, but personally, it feels a little cumbersome to have an entirely separate type for something based on whether it’s been saved or not. Maybe this is a generalizable pattern? What do you think of this?

type alias DbEntity a
  { data : a
  , id : Int
  -- maybe more metadata
  }

type alias Markup = 
  { description : String
  , notes : String
  -- etc
  }

type alias Model =
  { markups : List DbEntity Markup
  , draft : Markup
  }

viewMarkup : Markup -> Html Msg

view model = 
  div [] (List.map (\entity -> viewMarkup entity.data) model.markups)

This sort of feels like Remote Data but for database stuff.

The comparison to Remote Data is interesting! I haven’t tried that approach so I can’t say for certain, but it seems reasonable. I have gone the route of 2 different but similar data structures and that does tend to work.

I think generally with Elm I find I care a lot less about writing more code because it’s all so easy to refactor that the quantity doesn’t really get in the way. But I would be curious to hear if you like this Remote Data inspired approach. I could definitely see it working.

I would also be curious if it works with something like

type Id a = Id Int
type alias DbEntity a
  { data : a
  , id : Id a
  -- maybe more metadata
  }

type alias Markup = 
  { description : String
  , notes : String
  }

such that you don’t accidentally send a Id Markup when you meant to send an Id User or vice versa.

2 Likes

I tend to avoid having ids in entity records because I often have collections of entities and I don’t want the entity record to have a redundant (and possibly inconsistent) copy of the dictionary key.

Another option option in this vein, though, is something like

type alias Markup = 
  { description : String
  , notes : String
  }


type alias Identified id entity = 
  { entity
  | id : id
  }
  
dbMarkup : Identified Int Markup
dbMarkup =
  { id = 3
  , description = "foo"
  , notes = "bar"
  }
2 Likes

Just wanted to make a clarifying comment. Even though IDs are appearing in the URL, and you are correct, in this case, these IDs can get out in the wild, the issue with guessable sequences is it creates an attack vector for brute force attacks to cycle through IDs that are outside of a user’s data domain.

So the general recommendation is, unless you need the IDs to be sequential, it is best to use cryptographically secure IDs. :slight_smile:

Non-sequential IDs are also recommended for performance reasons in case your application ever starts to scale past the limits of a single machine.

1 Like

In my most recent app I store the id on the record itself. New records have to be identified somehow, since the user can create multiple new records before saving any of them. (My ‘new’ routes look like this: “/contact/new/:id”)

I use negative ids for the new records as an easy way to communicate to the server that the record is new rather than an update. (It would also be possible to put a “new: bool” field on the request sent to the server, or use an enum type for the id itself, but I haven’t yet encountered any issues just using negative ids.)

All new/updated records are sent in a single (combined) request to the server when the user clicks save.

The server keeps track of the newly created ids as new records are inserted, and automatically associates related records that have negative foreign keys with the correct new record.

for contact in request.contacts {
	let new_record = insert_record(contact);
	id_map.insert(contact.id, new_record.id);
}

for person in request.persons {
	if person.contact_id < 0 {
		person.contact_id = id_map.get(person.contact_id);
	}

	let new_record = insert_record(person);
	id_map.insert(person.id, new_record.id);
}

It also sends the id_map to the client, which the client uses to patch its local data with the new records from the server.

In my app I haven’t found there to be any difference between new and existing records in terms of their data structure. All fields are used in both cases, and neither case has fields that the other does not. Probably depends on the app or UI, but it works well for me.

I’ve actually used a remote data like type in a previous application. It looked like this:

type RemoteRecord id data
	= New data
	| Creating data
	| Fetching id
	| FetchFailed id Error
	| Unchanged id data
	| Changed id data

I think overall the strategy I described above resulted in simpler code, however. Several different parts of my application need to reference records by ids (data cache, routes, views when linking to other records, …) and it’s better to just always have an id for them to use rather than handling the possibility of there not being one everywhere.

Edit:
Oh, I forgot to mention something that might be relevant. My current app is structured to always have all the data already available for any page within one click of the user’s current view, so that the user is almost never waiting for a network request to complete.

The data is managed by a centralized caching layer, which pretty much eliminated the need for a remote data type like I described above.

Storing an id separate from the record in a remote data type might make more sense if you’re loading each record when the page is loaded. :man_shrugging:

2 Likes

This makes sense , though it’s a bit confusing long term I’d think. If I want to do some operations on a Contact I now need to inspect its ID to know if it’s an existing one or a new one. This changes how you read the code. Now when I’m fixing a bug and I see any function with Contact -> Contact I can’t know if it’s operating on a new or existing Contact without first inspecting the implementation. Whereas separating them allows you to see at a glance which you’re operating on. Anything that operates on Contacts now has to additionally inspect the ID to know what can/can’t be done rather than looking strictly at the type and/or shape.

For your specific app this might be fine, definitely not suggesting you should change it! For more generalized “how should I structure my code” questions though I wouldn’t recommend it as the first approach.

1 Like

Thanks for the reply! I’m curious about this part: you said your new IDs are negative, so do you use -1, -2, -3, etc. and allow the user to go back and forth before saving them all?

I also started into looking into something like this. We also support deleting items, and my head started spinning when I thought about deleting items that were in the middle of being saved.

Negative is just for internal tracking. The route parser converts the negative numbers to positive for display. Something like this:

Url.s "contact" </> Url.s "new" </> Url.map (\x -> Id -x) Url.int

Same on the toString implementation.

Users can go back and forth before saving them all. Or they can save just one and discard the others. Or they can edit one record while a new record hasn’t been saved yet, etc., etc. One of the key design goals for the app was, as much as possible, let the user do whatever they want. No artificial limitations like “you can’t do that because you’re actually working on something unrelated on another page.” Partly in response to the question: “Why can’t I? I could do that when the app was just Excel.” For similar reasons we also defer all data validation. Rather than saying they can’t save the record now, we permit them to save (with a warning) and keep a list of invalid records that they (or another user) can return to and correct at a later time.

For my app at least, a ‘delete’ is just an update that sets the ‘archived’ flag to true, since we can’t legally delete data for regulatory reasons. I actually think I might do the same thing on other apps without the same reasons in the future though, because it ended up working out quite well. It’s great for the users too since there’s less fear of “deleting” something, since it’s always still there. This makes it more comfortable for them to delete old or incorrect data without “what if I need it?” interfering.

Also, we side-step the save→delete (and similar) sequencing issues because all communication with the server is over a websocket, which strictly orders all operations. The server must finish processing a save request before it even reads the next request from the socket.

1 Like

I’ve been watching this thread for some time not commenting @wolfadex already gave an answer there is no point trying to improve upon. In my view any elaborate logic around this is just about introducing problem just to offer solution to it.

To me the clearly superiour approach is to have two types. Say NewUser and User. User has id, new user doesn’t. But that’s only the first obvious naive difference, there are more. User has created at and updated at attributes. User has any other attributes that system assigns automatically too. NewUser might have different types at certain places (which did not pass the validation that is delayed etc). Simply there is no world where these are the same type. They are different type and act of saving is esentially an act of turning one type into another.

You can choose to ignore this and try to define one type that that holds both types of data. But that’s a odd choice for statically typed functional langauge. It’s a choice to not use language features you’re given (static types) and replace them with your own invariants (negative ids or so).

6 Likes

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.