Strategies for representing persisted and un-persisted remote data

I’ve written something like a dozen small Elm apps that are in production in internal tools, and I’m still struggling to find a pattern for managing API data that I’m happy with. The types always spiral out of control when I try to model every use case perfectly.

I am not a fan of the various “RemoteData” packages I have seen, since they are mostly about loading data. Most of my remote data needs revolve around persisting form data in the following contexts:

  1. Bound to a form as the form changes.
  2. New “blank” records, and un-persisted data that may or may not be valid.
  3. Data that we are in the process of attempting to save.
  4. Valid data that we just got from the server.

Here are some strategies I have tried in chronological order:

1. One naive record to rule them all.

Define a record type matching the API exactly, for example:

type alias Record =
    { id : Int
    , name : String
    , optional_value : Maybe String
    }

If I need a blank one, I just stuff anything in it:

blank : Record
blank =
    { id = -1
    , name = ""
    , optional_value = Nothing
    }

I also use some simple helper functions like: isPersisted { id } = id > 0 to check if the data has been persisted or not.

This felt un-Elmy, and I obviously had some problems with treating un-persisted records as persisted occasionally, or POST-ing a “-1” to the API accidentally. The “isPersisted” checks were all over the place and ugly.

2. Use a union type to indicate “Blank” values.

At first I used Maybe, but then I decided to get more explicit:

type PrimaryKey
    = PrimaryKey Int
    | Unsaved

type alias Record = { id : PrimaryKey, ... }

isPersisted : { a | id : PrimaryKey } -> Bool
isPersisted {id} =
    case id of
        PrimaryKey _ ->
            True
        Unsaved ->
            False 

I’m using this a lot now, and if anything it’s honestly messier. I have to unwrap the ID every time I want to use it, and I can never assume that the record has an ID, even though it will only ever be Unsaved once in its lifetime!

Calling toString on my record ID has actually been a large source of bugs as well, in instances where I replaced Int with PrimaryKey. The compiler didn’t catch it. Lesson learned: toString is a bit unsafe. Use your own String conversion helpers that expect a specific type.
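A minimal sketch of such a type-specific helper, assuming Elm 0.19 (where `String.fromInt` replaces the generic `toString`). The name `toPathSegment` is hypothetical; the point is only that the function accepts a `PrimaryKey` and nothing else, so changing the id type elsewhere surfaces as a compile error.

```elm
module PrimaryKey exposing (PrimaryKey(..), toPathSegment)


type PrimaryKey
    = PrimaryKey Int
    | Unsaved


-- Assumption: ids are rendered into URL paths; the helper name is hypothetical.
-- Unsaved records have no string form, so the caller must handle Nothing.
toPathSegment : PrimaryKey -> Maybe String
toPathSegment key =
    case key of
        PrimaryKey id ->
            Just (String.fromInt id)

        Unsaved ->
            Nothing
```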

3. Separate types for “new” and “persisted” records and a union type to combine them.

type alias NewRecord =
    { name : String }

type alias PersistedRecord =
    { id : Int, name : String }

type RemoteRecord e a b
    = NotAttempted a
    | Saving a
    | Persisted b
    | Failure e

I thought I’d really hit on something here, but it is a royal pain to deal with this type. I have to write code to handle every case everywhere it appears, because no helper I have come up with seems useful. The “Saving” state is particularly annoying, and only occasionally meaningful. The fact that the encapsulated record might be different is an incredible pain. It actually makes me miss just putting a “-1” in the ID and calling it a day.

That said, I never reference an ID when I don’t have one now!

But it really makes me ask, just how many of my Elm bugs had to do with handling persisted and un-persisted data with one record type in the first place? I think my answer has to be: Not many, it just felt ugly and object-oriented.

What do you guys think? I think I am still missing something.

Really interesting to hear about your journey with this!

I read a great article a while ago that I can’t find now, frustratingly! It had some great recommendations on how to structure code with Maybe, Result and friends. Might have been Haskell examples.

The idea was to write your logic as if you always have the data and it’s always successful. Then use Maybe.map, Maybe.andThen, etc to deal with the Nothing cases. It’s a really nice approach and cleans things up a lot. You get rid of all the case expressions.

This would apply most directly to your strategy #2, with the database ID as a Maybe Int. I think this is something you tend to lose when you do a custom union type rather than a standard one. You could implement your own map and andThen, but experimentation is a lot easier if you can try out functions from the Maybe package, or even Maybe.Extra.
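As a rough illustration of that style, here is a sketch assuming Elm 0.19 and a record whose id is a `Maybe Int`. The `editUrl` function and the URL shape are made up for the example; the logic is written as if the id is always present, and `Maybe.map` carries the `Nothing` case through.

```elm
module RecordUrl exposing (Record, editUrl)


type alias Record =
    { id : Maybe Int
    , name : String
    }


-- Assumption: the "/records/<id>/edit" URL shape is purely illustrative.
-- No case expression is needed; an unsaved record simply yields Nothing.
editUrl : Record -> Maybe String
editUrl record =
    record.id
        |> Maybe.map String.fromInt
        |> Maybe.map (\id -> "/records/" ++ id ++ "/edit")
```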

You might be able to create a custom map over your union type in strategy #3, I’m not sure how well it would work.
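One possible shape for such a helper, sketched under the assumption that both record types share a `name` field. Because the two payload types differ, a single `map` does not fit well; this hypothetical `unwrap` takes one function per payload type plus a fallback for `Failure`.

```elm
module RemoteRecord exposing (RemoteRecord(..), name, unwrap)


type RemoteRecord e a b
    = NotAttempted a
    | Saving a
    | Persisted b
    | Failure e


-- Hypothetical helper: collapse all four cases into one result type.
unwrap : r -> (a -> r) -> (b -> r) -> RemoteRecord e a b -> r
unwrap onFailure fromNew fromPersisted remote =
    case remote of
        NotAttempted new ->
            fromNew new

        Saving new ->
            fromNew new

        Persisted saved ->
            fromPersisted saved

        Failure _ ->
            onFailure


-- Works for any pair of record types that both carry a name field.
name : RemoteRecord e { a | name : String } { b | name : String } -> Maybe String
name remote =
    unwrap Nothing (\new -> Just new.name) (\saved -> Just saved.name) remote
```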

Can you write out an example spec? Like “I have a form which asks the users for these pieces of information, I want to persist these parts of it to the server under these circumstances, I want to gracefully handle errors in these ways,” etc.

Haha not well! I tried it. The two record types are a real problem, and what I found is that the way I wanted a map to work changed in the context of where I was using the record. It’s very difficult to explain, but a big problem I’m having is that context matters. The RemoteData packages that are available publicly do not match my context well, and my RemoteRecord really isn’t perfect either.

You mean like as an exercise in design discipline before my next attempt? Or would you like to see one for my latest train wreck? :slight_smile:

I think the model I want to move toward is using a “Form” type as an intermediary that deals only with strings, and converting to an API record type when it’s time to try and persist it.

Not code, just a description of what the code will need to accomplish! Maybe someone here can come up with a nice approach.

It’s easier to think about things to try in the context of a specific problem statement. :smiley:

Okay, here we go:

App’s purpose: Create printable document consisting of some header information about a job (contract), and some lines representing items in that contract. All user input will be persisted as soon as possible, so they don’t lose work. Ideally, it should not be possible to enter invalid data. We are using a REST JSON API for each record type.

  1. Given that we have a contract, we want to start a new document about that contract and show some details about the contract.

  2. We want to wait until the user makes a minimal change to save the document, to avoid database clutter.
    A minimal change will be “adding a line to the document” by clicking “Add Line”.

  3. When the user clicks “Add Line”, we have to persist the document, and then use its ID to persist the line so that it can be modified. (I would like to avoid persisting the line immediately, but all the line edits involve adding other records that reference the line that we won’t go into, so we have to do that first.)

  4. While the document is saving, the document needs to be “read-only”, to avoid complications.

  5. We want to show the new line immediately, but disallow editing until it has been persisted, to make the page feel snappier.

  6. Subsequent “Add Line” clicks will skip step 3 (save the document), but still need to wait until they are persisted for editing.

The main pain point I had is the app will always end up in a happy state eventually where we have all persisted records and the user can do anything they want, but I kept having to check if I was in that state everywhere I dealt with the line and document types.
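One way the states in this spec could be written down, as a sketch only. All type and function names here are hypothetical, and ids are assumed to be plain `Int`s. The idea is to keep the "can I edit this?" question in two small functions rather than in every caller.

```elm
module DocState exposing (DocumentStatus(..), Line, LineStatus(..), canEditDocument, canEditLine, mustSaveDocumentFirst)


type DocumentStatus
    = DocumentUnsaved
    | DocumentSaving
    | DocumentSaved Int


type LineStatus
    = LinePending
    | LineSaved Int


type alias Line =
    { status : LineStatus
    , text : String
    }


-- Step 4: the document is read-only while it is being saved.
canEditDocument : DocumentStatus -> Bool
canEditDocument status =
    status /= DocumentSaving


-- Step 5: a line is shown immediately but editable only once persisted.
canEditLine : DocumentStatus -> Line -> Bool
canEditLine docStatus line =
    case ( docStatus, line.status ) of
        ( DocumentSaved _, LineSaved _ ) ->
            True

        _ ->
            False


-- Steps 3 and 6: only the first "Add Line" has to persist the document.
mustSaveDocumentFirst : DocumentStatus -> Bool
mustSaveDocumentFirst status =
    status == DocumentUnsaved
```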

I’ll describe approaches I’ve used that I’ve been happy with.

For records that are persisted in a database and have an arbitrary key, instead of storing that key in the model of the record, I would keep a Dict of those records indexed by the key. So instead of storing something like

    records : List { id : Int, name : String }

I would instead store

    records : Dict Int { name : String }

where the Int is the arbitrary key.

This gives a consistent model for valid records, whether they’re persisted or not.
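A small sketch of working with records stored this way, using the core `Dict` module. The `rename` helper is hypothetical; it shows that the record itself never needs to know its own key.

```elm
module Records exposing (Record, Records, rename)

import Dict exposing (Dict)


type alias Record =
    { name : String }


-- The Int key is the database id; it lives outside the record.
type alias Records =
    Dict Int Record


-- Hypothetical helper: rename the record stored under a key, if any.
rename : Int -> String -> Records -> Records
rename id newName records =
    Dict.update id (Maybe.map (\record -> { record | name = newName })) records
```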

Let’s say the type for a valid record is Record; I would have a separate type for unvalidated forms, RecordForm. You can convert between the two as needed:

    recordToForm : Record -> RecordForm
    
    formToRecord : RecordForm -> Result RecordFormErrors Record

Like you said, the advantage is that the RecordForm only has to deal with types that relate to user inputs. It also makes it more natural to define an emptyForm value without resorting to definitions like id = -1.
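To make the two conversions concrete, here is a sketch assuming Elm 0.19 and a made-up record with a `name` and a numeric `quantity`. The field names, the error messages, and the choice of `List String` for `RecordFormErrors` are all assumptions for illustration.

```elm
module RecordForm exposing (Record, RecordForm, RecordFormErrors, emptyForm, formToRecord, recordToForm)


type alias Record =
    { name : String, quantity : Int }


-- The form only holds what the user typed, so every field is a String.
type alias RecordForm =
    { name : String, quantity : String }


-- Assumption: errors are plain messages; a real app may want a richer type.
type alias RecordFormErrors =
    List String


emptyForm : RecordForm
emptyForm =
    { name = "", quantity = "" }


recordToForm : Record -> RecordForm
recordToForm record =
    { name = record.name, quantity = String.fromInt record.quantity }


formToRecord : RecordForm -> Result RecordFormErrors Record
formToRecord form =
    case ( String.trim form.name, String.toInt (String.trim form.quantity) ) of
        ( "", Nothing ) ->
            Err [ "name is required", "quantity must be a whole number" ]

        ( "", Just _ ) ->
            Err [ "name is required" ]

        ( _, Nothing ) ->
            Err [ "quantity must be a whole number" ]

        ( name, Just quantity ) ->
            Ok { name = name, quantity = quantity }
```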

This approach to validation has been mentioned before and there is at least one library that promotes the approach.

At the top-level model, I tend to store server data separately from any other state. This makes it simple to reload server data, as it can safely supersede existing server data without needing to resolve conflicts with user-entered data or any other state.

    type alias Model =
        { serverData :
            { records : Dict RecordId (Remote Record)
            , ..
            }
        , clientData :
            { editedRecords : Dict RecordId RecordForm
            , newRecord : Maybe RecordForm
            , ..
            }
        }

It sounds like you’re on the right track! I hope this helps.

Thank you, that looks excellent! You just gave me a lot of brain waves. I was definitely heading toward separate form types and separate server data repositories slowly, but you just got me there a lot quicker, and I have already had some success with a repository type like that.

I do have a few questions about specific implementation details, but I will need some time to write it up.

I’m not sure I understand your exact pain point - there are a number of different issues wrapped up together here. But I’m glad for the discussion because I think they are issues most of us face one way or the other.

If your document data is not too big, I think I would try to model every edit within a document at the Document level. In other words, “Add Line” triggers a backend call that returns data representing the entire changed document, which you use to replace the whole document in your Elm model. An edit within an existing line likewise triggers a backend call that returns the entire document. For one thing, you are then only dealing with the Saving state of the Document, not of individual Lines.
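A stripped-down sketch of that idea. The names are hypothetical, the HTTP command is left out, and the error type is a plain `String` rather than `Http.Error` so the example has no package dependencies. The only tracked state is whether the document as a whole is saving.

```elm
module DocumentUpdate exposing (Document, Model, Msg(..), Status(..), update)


type alias Document =
    { id : Int, lines : List String }


type Status
    = Idle
    | Saving


type alias Model =
    { document : Document, status : Status }


type Msg
    = AddLineClicked
    | DocumentReturned (Result String Document)


-- Assumption: a real update would also return the Cmd that calls the backend.
update : Msg -> Model -> Model
update msg model =
    case msg of
        AddLineClicked ->
            { model | status = Saving }

        DocumentReturned (Ok document) ->
            -- The server's copy replaces the local one wholesale.
            { model | document = document, status = Idle }

        DocumentReturned (Err _) ->
            { model | status = Idle }
```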

This may be completely obvious to you and more or less what you’re doing, so I don’t know if that helps at all.

The main pain point I had is the app will always end up in a happy state eventually where we have all persisted records and the user can do anything they want, but I kept having to check if I was in that state everywhere I dealt with the line and document types.

I wouldn’t think you would need to check this state (i.e. case on it) more than once in the view – where else did you find you needed to do it?

Yeah, that is my whole problem: lots of issues! It’s difficult to describe the minimal case. Situation: My model has to transition between un-persisted and persisted states. Issue: How do I represent un-persisted vs persisted data so that both are easy to work with?

If it is un-persisted, it may be missing all kinds of strings and keys that it will have later, but I still need to let the user change it. Once it is persisted, it will always have those values, so it is annoying for them to be Maybe.

When updating the data from the server, via much the same model you described, I need to case over the data every time the user makes a change if I wrap it in a union type.

I’m having a lot of success with the following pattern (using Firebase).

  1. Subscribe to data I’m interested in, as I become interested. The incoming data is the only source of truth.
  2. I cannot change my model locally. I can only craft and send an update over a port (which gets applied to Firebase).
  3. If/when the data gets updated successfully (by me or someone else), I’ll be notified because I’m subscribed to changes, and my model will get updated appropriately.

I have an Id type and sentinel value of empty, which is used to determine if something exists in the database, or has not been persisted.

PROs: simple, has always worked.
CONs: if it doesn’t work, the experience would be: I tried an update and nothing happened. No error, etc.
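The three steps above might look roughly like this in Elm 0.19 with `elm/json`. The port names `sendUpdate` and `dataChanged`, the single `name` field, and the JSON shape are assumptions; the JavaScript side that talks to Firebase is not shown.

```elm
port module Sync exposing (Model, Msg(..), subscriptions, update)

import Json.Decode as Decode
import Json.Encode as Encode


-- Hypothetical ports: one to request a change, one to receive the truth.
port sendUpdate : Encode.Value -> Cmd msg


port dataChanged : (Decode.Value -> msg) -> Sub msg


type alias Model =
    { name : String }


type Msg
    = NameEdited String
    | DataChanged Decode.Value


update : Msg -> Model -> ( Model, Cmd Msg )
update msg model =
    case msg of
        NameEdited newName ->
            -- Step 2: never change the model locally, only send an update.
            ( model, sendUpdate (Encode.object [ ( "name", Encode.string newName ) ]) )

        DataChanged value ->
            -- Step 3: incoming data is the only thing that changes the model.
            case Decode.decodeValue (Decode.field "name" Decode.string) value of
                Ok name ->
                    ( { model | name = name }, Cmd.none )

                Err _ ->
                    ( model, Cmd.none )


-- Step 1: subscribe to the data of interest.
subscriptions : Model -> Sub Msg
subscriptions _ =
    dataChanged DataChanged
```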

I think I am in the same position as you, @wmakley, and I have been thinking about the same questions. At work and in my own projects at home I have been trying to manage persisted and unpersisted data, and I think I have gone through both of your ideas: different record types for persisted and unpersisted data, and a Maybe Id kind of thing.

Eventually a coworker of mine recommended something like what @justinmimbs suggested, and I would say it works pretty well. It’s my favorite option right now. At work we’ve been calling the Dict Id Record thing Entities, and in my personal projects I’ve been calling it a Db. I think the only drawback is that if you are passing ids and a Db around, there’s the logical possibility that your model says the selected record has such and such id, but there’s no record with that id in your Db, and accounting for that possibility is a real drag.
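A short sketch of where that drag shows up, with hypothetical names and plain `Int` ids. Looking up the selected id always yields a `Maybe`, because nothing in the types guarantees the id is still in the Db.

```elm
module Db exposing (Db, Model, selectedRecord)

import Dict exposing (Dict)


type alias Db a =
    Dict Int a


type alias Model =
    { records : Db { name : String }
    , selected : Maybe Int
    }


-- Nothing means either "no selection" or "selection points at a missing record".
selectedRecord : Model -> Maybe { name : String }
selectedRecord model =
    model.selected
        |> Maybe.andThen (\id -> Dict.get id model.records)
```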

I have a library where I am trying to formalize good practices on this. It’s at http://package.elm-lang.org/packages/Chadtech/id/3.2.0 . Right now it’s not documented very well and doesn’t reflect best practices as I currently know them. PRs welcome! But whether it be in my existing package or somewhere else, I would love to collaborate on a community-wide solution to this general problem.

All user input will be persisted as soon as possible, so they don’t lose work. Ideally, it should not be possible to enter invalid data.

If it is un-persisted, it may be missing all kinds of strings and keys that it will have later, but I still need to let the user change it.

Between these statements I only have a fuzzy sense of what the intended UX is. On the one hand you are describing a situation where the aim is to save as soon as possible; on the other the aim is to save only when the input is valid. Yet it sounds like there is no explicit user ‘submit’ action, is that right? Or are you saying the “Add Line” is a submit action after the user enters the contents of the new (not yet persisted) Line?

And is it possible to be a little more concrete, for instance give an example or two of cases where there is invalid data and you don’t want to save it yet?

In general if you are dealing with wanting to validate input, and edit validated models, I agree you are best served by separate data structures and something like recordToForm and formToRecord (aka validate) functions as @justinmimbs outlined.

And how/when do you want to communicate to the user that their input is invalid, or is this not needed?

Hey me too, but thanks for pointing it out. As usual, I was up against a deadline without a clear idea of how to proceed. This doesn’t work as well with Elm as it does for me with Rails CRUD apps.

Let me break it down to a VERY small almost incidental part of this app.

  1. The user needs to select an item from a dropdown.
  2. They need the ability to add an option to the dropdown, if the desired option is not available.
  3. The form to create a new option has two fields, and both are required. The “submit” button should not be enabled until both fields are filled in with valid data. (The user should never see an error, just a clear indication of what they need to enter, primarily because parsing and displaying errors from the API would be even more code I don’t have time for.)
  4. When the user submits their new option, it will be selected automatically.
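The "submit stays disabled until both fields are valid" rule in step 3 could be sketched like this. The field names `label` and `code` are invented, and "valid" is assumed to mean "not blank"; the real rules would go in `validate`.

```elm
module NewOption exposing (NewOption, OptionForm, submitDisabled, validate)


-- What the user is typing.
type alias OptionForm =
    { label : String, code : String }


-- What gets sent to the API: no id yet, but known to be complete.
type alias NewOption =
    { label : String, code : String }


-- Assumption: a field is valid when it is not blank after trimming.
validate : OptionForm -> Maybe NewOption
validate form =
    let
        label =
            String.trim form.label

        code =
            String.trim form.code
    in
    if String.isEmpty label || String.isEmpty code then
        Nothing

    else
        Just { label = label, code = code }


-- The view can pass this straight to the button's disabled attribute.
submitDisabled : OptionForm -> Bool
submitDisabled form =
    validate form == Nothing
```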

Behind the scenes, each item can only be selected after it has been persisted, and has a database ID. So when the user submits the form, we need to send a “new” record to the API without an ID, and get a “persisted” back with an ID. Then we can add that record to the select box.

I coded my API functions to all work on the same type of record, so when I need to create a new one I give it a record with Nothing for an ID, and get back a record with Just Int. This little hitch is constantly annoying when I can usually guarantee that the ID will be one or the other given the context. I always persist everything as soon as possible so the users don’t lose work.

That’s the key design decision right there, I think. Once you decide to use the same record type everywhere, you fundamentally have to have case or map all over the place. No way round it.

Your other option could be to rewrite the function signatures to take the ID as a separate argument from the rest of the stuff.

type alias DatabaseId = Int
type alias OtherStuff = {field1: String, field2: String}
type ThingModel = ThingModel (Maybe DatabaseId) OtherStuff
viewUnsavedThing : OtherStuff -> Html Msg
viewSavedThing : DatabaseId -> OtherStuff -> Html Msg

You may find that you only need one or two case expressions, in the top-level functions of your view and update.
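For instance, the single top-level case might look like the sketch below. To keep it free of package dependencies it returns a `String` where the real functions would return `Html Msg`; `describe`, `describeSaved` and `describeUnsaved` are hypothetical stand-ins for the view functions.

```elm
module Thing exposing (DatabaseId, OtherStuff, ThingModel(..), describe)


type alias DatabaseId =
    Int


type alias OtherStuff =
    { field1 : String, field2 : String }


type ThingModel
    = ThingModel (Maybe DatabaseId) OtherStuff


-- The only place that inspects the Maybe; everything below has plain types.
describe : ThingModel -> String
describe (ThingModel maybeId stuff) =
    case maybeId of
        Just id ->
            describeSaved id stuff

        Nothing ->
            describeUnsaved stuff


describeSaved : DatabaseId -> OtherStuff -> String
describeSaved id stuff =
    "#" ++ String.fromInt id ++ " " ++ stuff.field1


describeUnsaved : OtherStuff -> String
describeUnsaved stuff =
    "(unsaved) " ++ stuff.field1
```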

So when the user submits the form, we need to send a “new” record to the API without an ID, and get a “persisted” back with an ID.

To add to what @Brian_Carroll said, the stuff you send to the backend does not have to match what you get back from the backend, assuming that you control how the backend works. It’s a kind of dogma drilled into us that we are passing back and forth models (objects that we C/R/U/D), but it doesn’t have to be that way. In a lot of situations we are just modifying an aspect of a resource, for instance. Or there might be different ways of creating a resource, with more or less data attached.

I think of the bits of data we encode in messages to the backend less as models and more as Msgs – Msgs that are used by the backend to init and update persisted models. Just like Msgs in an Elm app, they might contain whole models; or they might contain just bits of data used to update models in various application-specific ways.

So for instance your dropdown example. I could be wrong, but it sounds like you are implementing something like a combobox, but with two fields instead of one for the “manual entry” option? Setting aside the validation, and assuming the two fields are String, it seems like you could have something like this (simplified):

type ComboBoxValue
    = Selected ID
    | Entered String String

type Msg
    = Submit ComboBoxValue
    | Submitted (Result Http.Error Model)

update : Msg -> Model -> (Model, Cmd Msg)
update msg model =
    case msg of
        Submit value ->
            case value of
                Selected id ->
                    ( model, Task.attempt Submitted (submitSelected id model) )
                Entered field1 field2 ->
                    ( model, Task.attempt Submitted (submitEntered field1 field2 model) )
        -- ...

For what it’s worth, here is what I’m doing at the moment. This is a work in progress so I do not yet have enough hindsight but it seems to work well.

My objects are records like:

type alias Object =
    { id : ObjectId
    ...
    }

and their ids look like:

type alias Id =
    Int


type ObjectId
    = ObjectId Id
    | LocalObjectId Id

with an “unboxer” like:

unObjectId : ObjectId -> Id
unObjectId objectId =
    case objectId of
        ObjectId id ->
            id

        LocalObjectId id ->
            id

The objects are stored in an opaque Dict Id a that I initialize with the “unboxing” function so I can actually have several ones using different id types.

Also:

  • objects from the database have an ObjectId Id with a positive integer as the id (the one from the db)
  • objects generated locally have a LocalObjectId Id with a negative integer as the id that is unique locally (using Maybe.map ((+) -1) (minimum localIds) |> Maybe.withDefault -1 to generate one)

When objects are sent to the backend to be persisted, the backend generates a new unique positive db index and replaces the negative local id with it, then sends back the object with the local id as a reference in the answer so the client can update the object.
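A sketch of the two id operations described here, under the assumption that ids are plain `Int`s. `nextLocalId` picks a fresh negative id below any existing local one, and the hypothetical `confirm` swaps a local id for the database id the backend sent back.

```elm
module LocalId exposing (ObjectId(..), confirm, nextLocalId)


type ObjectId
    = ObjectId Int
    | LocalObjectId Int


-- Next unused negative id; the first local object gets -1.
-- The `min lowest 0` guard keeps the result negative even if a
-- non-negative id is passed in by mistake.
nextLocalId : List Int -> Int
nextLocalId localIds =
    List.minimum localIds
        |> Maybe.map (\lowest -> min lowest 0 - 1)
        |> Maybe.withDefault (-1)


-- Replace a local id with the database id once the backend confirms it.
-- Any other id is returned unchanged.
confirm : Int -> Int -> ObjectId -> ObjectId
confirm localId dbId objectId =
    if objectId == LocalObjectId localId then
        ObjectId dbId

    else
        objectId
```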

This way:

  • All objects, persisted or not, are stored in the same data structure, without using Maybe
  • It is easy to reliably filter unpersisted ones to send them to the backend (using the record typed id field)
  • Most functions using the objects do not care if they are local or not and I can use Keyed as every “unboxed” id is unique locally
  • The typed ids are used to identify the objects in messages, so I cannot inadvertently mix them up between different types of objects

The drawback is that I must be careful to avoid race conditions in messages that could use a local id after it has been changed to a database one, but this does not seem to be a problem specific to this solution. Also I store the id both as the dict key (unboxed) and in the object record (typed), so there is some slight duplication waste. I will see how it goes…

@dmy I have considered a scheme like that! I have seen the negative ID thing done using SQL databases in a non-elm project, with satellite clients that update the master. It can definitely work.

You’re right to point out the race condition, something that I feel more Elm writings should touch on. I usually generate dumb “UID” values for my collections of GUI elements and wrap them up in a Tuple or record, and ignore the database IDs entirely. I hate dealing with this implementation detail, and am trying to find ways to abstract it out. I would love an efficient ordered Dict type.

@Brian_Carroll perhaps it was Problem Solving with Maybe?
