Key management between server and application

I have a question that is not specific to Elm, but is general to client / server systems and to which I hope there might be an elegant solution in Elm, or at least some agreed best practice amongst full stack developers.

Back in the day (the '90s) when last I coded, C clients connected to relational database servers. In addition to tripping into SEGV faults merely, it sometimes seems, by breathing hard, we faced the problem of how best to handle record keys across the server/client boundary. I had rather hoped that thirty years' experience might have landed on a well-accepted solution, but Googling around and searching Stack Overflow (and elsewhere) it seems not.

The issue was/is whether to use natural or surrogate keys in the data architecture. Using surrogate keys (whether in the form of row ids, sequences or GUIDs) definitely simplifies the relational database schema, but at the cost of making applications more complex. Should the data server create keys and dole them out to applications on demand? Or should all applications connecting to a data server be obliged to use a common algorithm to create keys that are guaranteed not to clash?

If the server is responsible for key creation, when should the application request them? When the user starts to create a new record? Or when he or she tries to save it? If the latter, how should master-detail be handled? You need the key for the master record in order to create the detail.

The strategy we ended up using was for the database to employ a two-part key. The server gave the application a session id when it connected. This was simply a 32-bit integer generated using a database sequence and was the “top” part of the database key. Externally it was represented as a hex string. The “bottom” part of the key was a second, 32-bit integer allocated by the application, again represented externally as a hex string. The application could combine the top and bottom parts to create a new key. In our particular application users would often import large CSV files that created many new records, only to delete a goodly portion of them; avoiding costly round trips to obtain keys for records that might never be saved to the database seemed a good thing.

Key allocation/management was, however, a complete pain in the neck. It required an inordinate amount of bookkeeping to “swizzle” keys to pointers (and back), and great care to ensure that master records were always queried before related detail (many-to-many relations were a particular nightmare, requiring “placeholders” to be allocated). The whole edifice was brittle and hellish to debug.

So, my plea. How’s this issue addressed in modern web apps? And does the immutability of Elm data and its strict type system open a better solution? Has anyone written a package that deals with it?

I’ve struggled with similar issues. I’d hesitate to say there is “a solution” but there are some options. Mostly what you just described though. I’ll lead it off by saying that I don’t typically run into this issue with regular CRUD applications. Users typically create records before connecting them to related records just as a side-effect of good UI design. The few edge cases can usually be handled individually.

Some Options

  1. Don’t generate it. Lots of times you really don’t need an id. The relationships you’re describing can be encoded in the structure of your client-side data. (type alias Person = { classes : List ClassAttendance }) Then just create the root person first and wait for it to succeed before creating the other records. In my experience the best way to deal with that in Elm is to not have the id be part of the record at all, ever. Instead have a custom type like this: type RemoteRecord a = WithId Int a | WithoutId a. That can also ensure you don’t call functions like update with a record which doesn’t yet exist. There can be ACID issues though, where you really want all the records to be created transactionally or fail as a group, which usually requires writing custom endpoints to create the records as a group. See below.

  2. UUIDs/GUIDs: Basically the same strategy you described to allow the client to create its own ids. No need for the complexity of a session used to generate two-part keys though. Just generate a v4 UUID on the client and be done with it. I have an application that uses surrogate UUIDs for all ids. It works really well! If there is a “one true solution” this is probably it. On the other hand, there are issues. If you put ids in your URLs (like /user/fb967702-8a5e-455f-95f0-6da62b4f7c02) you end up with that crazy URL. I usually have yet another column for a shorter slug which the server generates randomly and is used only for URLs. This usually works since the client doesn’t need to know the future URL, just the id to link up related data. Again, if the ids are not tied up correctly on the client, you would want the operations to fail as a group. See below.
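Generating a v4 UUID client-side is a one-liner in most languages. A quick Python illustration of the idea (the slug helper is a hypothetical sketch of the "extra column for URLs" approach, not a standard API):

```python
import secrets
import uuid

# A v4 UUID: 122 random bits, so collision odds are negligible in practice.
new_id = uuid.uuid4()  # e.g. fb967702-8a5e-455f-95f0-6da62b4f7c02

# Hypothetical short slug for friendlier URLs, generated server-side and
# stored in its own column alongside the UUID primary key.
def make_slug(length: int = 8) -> str:
    alphabet = "abcdefghijklmnopqrstuvwxyz0123456789"
    return "".join(secrets.choice(alphabet) for _ in range(length))
```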

Failing/succeeding transactionally

This is really the root of most issues. How can you ensure that the group of data is created correctly, especially when some records have foreign key constraints to other records which might not have been created yet? The “correct” answer to this was described by C.J. Date in one of his database books - the Third Manifesto I think. Databases should provide a single atomic “update” operation which supports multiple statements but succeeds/fails as a group, not enforcing key constraints on intermediate states during the update. This is probably the solution after 30 years of research you were hoping for. Unfortunately I’m not aware of any databases which provide such a primitive.
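Deferred constraint checking gets close to that primitive: checks are postponed until COMMIT, so intermediate states inside a transaction may violate foreign keys as long as the final state does not. A small sketch using SQLite's `PRAGMA defer_foreign_keys` via Python's sqlite3 (PostgreSQL's DEFERRABLE constraints are similar in spirit); table names here are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE master (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE detail (id INTEGER PRIMARY KEY,"
    " master_id INTEGER REFERENCES master(id))"
)

conn.execute("BEGIN")
conn.execute("PRAGMA defer_foreign_keys = ON")  # checks postponed to COMMIT
conn.execute("INSERT INTO detail VALUES (1, 42)")  # parent 42 doesn't exist yet
conn.execute("INSERT INTO master VALUES (42)")     # now it does
conn.execute("COMMIT")  # the constraint holds by commit time, so this succeeds
```

Without the PRAGMA, the out-of-order detail insert would fail immediately; with it, the whole group succeeds or fails together at commit.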

  1. You may be able to disable foreign key constraints prior to inserting the data. Some databases support that (MySQL), others do not (PostgreSQL). This would only work if the transaction can be rolled back after a failure, and could cause data integrity issues if your foreign keys are connected incorrectly.

  2. Have a single endpoint for all operations which accepts a list of operations to perform. Treat them all as a single transaction and roll back any db changes if there are issues along the way. Then you can either require them to be sorted by the client (it’s probably pretty clear what order the operations should be done in when the client constructs the request) or have the server sort them. (Probably by defining a “less than” ordering on the operations themselves and treating operations which need to be performed before other operations as less than. Then use the sorting algorithm provided by your language.)
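A minimal Python sketch of that server-side ordering idea (the operation kinds and batch shape are hypothetical): give each kind a rank that encodes "must happen before", then let the standard sort do the work.

```python
from enum import IntEnum

class OpKind(IntEnum):
    # Lower rank = must be performed earlier; this is the "less than" ordering.
    INSERT_MASTER = 0
    INSERT_DETAIL = 1
    UPDATE = 2
    DELETE = 3

# A batch as it might arrive from the client, in arbitrary order.
batch = [
    {"kind": OpKind.DELETE, "table": "order_line", "id": 9},
    {"kind": OpKind.INSERT_DETAIL, "table": "order_line", "data": {"qty": 3}},
    {"kind": OpKind.INSERT_MASTER, "table": "order", "data": {"ref": "A-17"}},
]

# sorted() is stable, so operations of the same kind keep their client order.
ordered = sorted(batch, key=lambda op: op["kind"])
```

This covers the common master-before-detail case; genuinely arbitrary inter-record dependencies would need a real topological sort (e.g. `graphlib.TopologicalSorter` in the Python standard library).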


I’m not sure what you mean by “swizzle” keys to pointers. Are you talking about memory pointers?
Most languages used on the server don’t have pointers anymore, so I guess it’s not an issue?

I would not go down the hole of the client requesting a key from the server before allowing the user to create a new record. What will you put in the fields of the created record? Default data? What if the user cancels their operation, will the record be deleted? How? Too many issues this way. If you really need the id before the user starts entering data, definitely use a UUID generated on the client. I can’t think of a good reason not to, and it’s much simpler that way.

Does the application need to request a key at all? The usual pattern is to just store “in-progress” data without a key in the client state (like form contents) and then POST this to the server, which in turn returns an ID for a newly created record (or validation errors) to the client.

@jackwilsdon: I think I see where you’re coming from, but it looks to me that it just pushes the problem further down the line.

The server is receiving new, as yet unidentified data in the POST. It then saves it in some database and returns its key (id) to the client. Which process creates that key? The server transforming the POST data into a SQL insert statement, or the database? If the latter, how does the server actually get the key from the database, particularly if the insert is part of a larger transaction? There’s been a lot of water under the bridge since I last did this type of thing and I’d be the first to admit my SQL is rusty, but I don’t think it’s possible to get more than a success/fail code in response to a transaction.

That said, @jayshua suggests generating v4 UUIDs (I need to look that up!) on the client. Perhaps the client side code could (as you suggest) simply send new data, to which the server adds identifiers in the form of UUIDs before inserting it into the database.

If you’re using anything other than UUIDs the database itself would almost certainly have to generate the id to avoid conflicts. I routinely get the generated id back from the database inside a transaction using a RETURNING clause. Pretty sure MySQL doesn’t support that, but PostgreSQL does.

If you use sequential ids on a database that doesn’t support RETURNING you can also SELECT max(id) inside the transaction just after the record is inserted, though that relies on a sufficiently strict isolation level to be safe under concurrent inserts; connection-scoped functions like MySQL’s LAST_INSERT_ID() avoid the race entirely.
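The same round trip can be seen with Python's sqlite3: inside an open transaction, the freshly generated key is available to the very next statement. RETURNING (PostgreSQL, and SQLite 3.35+) folds this into the INSERT itself; `cursor.lastrowid` below is the portable sqlite3 equivalent. Table names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manual transactions
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY,"
    " user_id INTEGER REFERENCES users(id), total REAL)"
)

conn.execute("BEGIN")
cur = conn.execute("INSERT INTO users (name) VALUES (?)", ("Ada",))
user_id = cur.lastrowid  # the generated key, visible before COMMIT
# ...so it can be used immediately for detail rows in the same transaction:
conn.execute("INSERT INTO orders (user_id, total) VALUES (?, ?)", (user_id, 9.99))
conn.execute("COMMIT")
```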

@Jayshua: Thanks for this extensive answer, I shall ponder it.

Regarding “swizzle”, it was a phrase we used to describe translating surrogate keys/foreign keys to memory addresses. This being C, the address was typically saved as a void *. The great thing about void * was that it was just the address of some chunk of memory returned from malloc(); the worst thing about void * was that you had no idea what it pointed to.

@Jayshua: I must learn more SQL! I’d not come across RETURNING, but it looks useful.

It’s not the only way to do it, but in my experience the server typically generates the key, and also any further keys needed for your master-detail use case where something more complex than a simple record is being created. You could google “Hibernate key generation” for different keygen strategies that can be used in Java/Hibernate if you want some examples for that particular server database framework.

It’s usual to POST without a key, which is why the POST URL usually does not contain the key, but PUT, GET, DELETE (and PATCH) do.

POST /user – Creates a new user
GET /user/5 – Gets user with key 5
PUT /user/5 – Updates user with key 5
and so on…


@rupert: thanks for the suggestion re key generation strategies, I will read up.

What’s niggling me here is that I may be entirely misunderstanding how best to design a REST-like interface/ API. I was used to minimising the number of round trips to the server: networks were a big, if not the greatest, bottleneck. We’d try to do as much as possible client side before bundling all the changes up as a single transaction, which was fired off to the server when the user hit “save”. All that would come back was Ok or a failure (code).

Your examples, however, create/update only one record at a time. This makes some sense if POST returns a new key. If I POSTed two hundred records in a single go and got two hundred keys back, how would I then map those keys to the correct records? It does seem, however, like you could end up with orders of magnitude more round trips. Is that common practice?

You can just use the order of items in the request:

Request: POST /users

  [
    { "name": "John Doe" },
    { "name": "Johnny Appleseed" }
  ]

Response:

  [
    { "result": "OK" },
    { "result": "Failed to create - name already taken" }
  ]

This does bring in the question of the correct status code to return - I guess it depends on whether you count the whole create batch as having failed or not (e.g. if it was run in a transaction and you rolled back on a failure).
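Pairing the per-item results back up with the submitted records is then purely positional. A small Python sketch of what the client might do with the batch response above:

```python
# What the client sent, and the per-item results that came back, in order.
new_users = [{"name": "John Doe"}, {"name": "Johnny Appleseed"}]
results = [
    {"result": "OK"},
    {"result": "Failed to create - name already taken"},
]

# zip pairs element i of the request with element i of the response,
# so each record can be matched to its outcome without any key at all.
failed = [
    (user, outcome["result"])
    for user, outcome in zip(new_users, results)
    if outcome["result"] != "OK"
]
```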

Again, lots of ways to do it. I think a bulk create might not even return anything except an HTTP 204 to say it all went well, and a list of errors in the case it did not go well.

So insert 200 records, get an HTTP 204 all ok, then use some other part of the API to look up one or more of the records. It really depends on what the data is and how your API is structured; no real hard and fast rules.

Suppose I had a user, and I inserted 200 bank transactions against that user. The next thing that might happen is that the user asks for last week’s transactions, so a search call is made to fetch that date range - not necessarily the 200 I just inserted.

I would recommend you have a look at James Long’s great talk about this at dotJS:

Usually, delegating ID management to the server is a good enough idea. You could also blindly generate a UUID on the client side or server side without worrying too much about conflicts in UUID4 generation.

However if you need keys to be sortable with regards to object creation precedence, you can also have a deeper look at Hybrid Logical Clock and Vector Clock algorithms.
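For a flavour of the vector clock approach, here is a minimal Python sketch (clocks as dicts from replica id to counter; this is the textbook algorithm, not any particular library):

```python
def tick(clock, replica):
    """Increment this replica's counter before a local event."""
    c = dict(clock)
    c[replica] = c.get(replica, 0) + 1
    return c

def merge(a, b):
    """Pointwise maximum: the clock after receiving a remote update."""
    return {k: max(a.get(k, 0), b.get(k, 0)) for k in a.keys() | b.keys()}

def happened_before(a, b):
    """True if every counter in a is <= the corresponding one in b, and they differ."""
    return a != b and all(a.get(k, 0) <= b.get(k, 0) for k in a.keys() | b.keys())

# Two replicas making edits without seeing each other:
base = {}
on_phone = tick(base, "phone")    # {"phone": 1}
on_laptop = tick(base, "laptop")  # {"laptop": 1}
```

Here `happened_before(on_phone, on_laptop)` and the reverse are both False, which is how concurrent (conflicting) edits are detected; hybrid logical clocks additionally fold in wall-clock time so that ids sort roughly by creation order.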


Wow, great talk! Super interesting!


Very interesting indeed - would be a fun project to implement some of these algorithms in Elm. There was a talk on one at Elm Europe a couple of years ago:

Think the code is here:


Connecting this back to my original question: using eventually consistent local storage across multiple devices looks interesting, but seems to come with a different set of problems. Certainly it answers my question of who makes and manages keys: it’s the client, because there’s no longer a server. But it looks as if Actual have bumped into the issue that n-way syncing between devices is itself fraught.

My nascent app is corporate “teamware”, not “personal”, so I think I’ll stick with a central data server at least to start. That said, generating UUIDs client side may be the way to go. More to think about…

Assigning the IDs is a server-side concern, don’t let it slip into the client. It will work just fine in the client but I hate to see things leak across boundaries in a design. Feel free to ignore me…

@rupert: I shall take your advice and rethink; I guess the plan was a case of better the devil you were acquainted with years ago than the devils you’ve yet to meet!

I’ll reiterate the point that was made earlier: This is rarely an issue in standard CRUD applications. Just not having an id until the record is created then assigning the id when the record is inserted into the database and returning it to the client usually works fine. I happen to have recently worked on an unusually pathological application which is why I had so much to say about it.

That’s definitely what I’d prefer to do. The client simply needs to know that such values are unique and can be compared, not how to make them. The alternative was to assert that all applications POSTing data to the server must use v4 UUIDs, but of course the server can never be absolutely certain that’s the case. This way the server can just ignore and replace any key an errant application sends it in a POST; if that breaks the app, well, it’s better than breaking the database!

Yes. And even if the IDs are integers, I always make them Strings on the Elm side for that reason. (Or maybe wrap the strings in custom types, so they cannot accidentally be set on the wrong kind of thing.)
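The same trick translates to other typed languages. A hedged Python analog using frozen dataclasses (the names are illustrative): two id types that a type checker, and even runtime equality, will refuse to conflate.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UserId:
    value: str

@dataclass(frozen=True)
class ArticleId:
    value: str

def load_user(uid: UserId) -> str:
    # A type checker rejects load_user(ArticleId("5")) outright,
    # even though both ids wrap the same underlying string.
    return f"GET /user/{uid.value}"
```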
