Experimental JSON decoding API


#1

I put together an experimental JSON decoding API that addresses some longstanding issues with Decode.Pipeline. The README in the repo explains the motivation and the design:

I’m looking for feedback on this, specifically:

  • Would this meet your needs when it comes to the two use cases of (1) decoding JSON objects with many fields, and (2) decoding JSON objects where some fields may be missing, or is there still something missing?
  • If you’re currently using elm-json-decode-pipeline (or something similar), is this a design you’d be interested to try instead?

To keep the thread focused on the API design, let’s assume elm-format introduces support for this style of DSL. The idea is fairly moot if elm-format doesn’t support it, but it’s a chicken-and-egg situation; there’d be no reason for elm-format to add support for it unless there’s interest.

Thanks to @supermario for helping to identify this approach!


#2

I made a lib for this some time ago which seems to do the same:

I haven’t published it yet since I was going to put it up here first to get some feedback. I guess you beat me to it :slight_smile:

In my company we switched to this style some months ago, and we like it better. It produces code that is easier to understand, and it also avoids bugs when reordering or adding fields in a record.


#3

It seems tedious having to repeat the field names (or name the intermediate values something else). I would love an API where you could specify the decoders right at the record instantiation, like this:

decoder : Decoder User
decoder = somethingHere {id = require "id" int, name = require "name" string}

This might not be possible in Elm, but if it were, it would solve the ordering issue without introducing even more code than in the current pipeline case.


#4

For someone relatively inexperienced with Elm like me, the nested anonymous function syntax of the proposal seems fairly confusing. Then again, it may be down to the back pipe usage, which always gives me headaches!

Do you think adding another example based on the final example (with last/first name) that showed how it could be further simplified with those anonymous functions being replaced with named functions would help? Or would that actually confuse matters?


#5

@ianmjones Thank you for the feedback! I’ve added an explanation of how things fit together:

If you have time to read it, any further feedback would be appreciated! (Totally fine if the feedback is “it’s still confusing, even after reading that.”)


#6
  1. If I swap the order of email and name in my type alias, everything will continue to work correctly. This version is not at all coupled to the order in the type alias because it does not use the type alias record constructor function.

The proposed experimental API doesn’t really fix this issue. Using a lambda to construct a literal record fixes the issue regardless of whether you’re using core decoders, pipeline decoders, or this experimental approach. In all three cases, users are likely to use the auto-generated constructors for large records.
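To illustrate the point with core decoders only, here is a minimal sketch. The User record and its fields are hypothetical, chosen to match the examples in this thread; only Json.Decode from elm/json is used.

```elm
import Json.Decode as Decode exposing (Decoder)


-- Hypothetical record; the field order here can change freely.
type alias User =
    { id : Int
    , email : String
    , name : String
    }


-- The lambda builds a record literal, so the decoder does not depend on
-- the order of fields in the type alias. The lambda's own argument order
-- must still match the order of the field decoders below.
userDecoder : Decoder User
userDecoder =
    Decode.map3
        (\id email name -> { id = id, email = email, name = name })
        (Decode.field "id" Decode.int)
        (Decode.field "email" Decode.string)
        (Decode.field "name" Decode.string)
```

Swapping email and name in the type alias leaves this decoder correct, whereas Decode.map3 User would silently swap the two String values.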


#7

I like it a lot for decoding large, sometimes complex JSON. Are there any performance implications?


#8

That new TEACHING doc was very helpful, it nicely explains each of the parts of the pattern as it progresses to build the final syntax. Thank you!


#9

Although this is mostly off-topic, I’ll share my opinion since it’s partly related: I think the best way to avoid this mistake is for users not to have to write decoders by hand in the first place. That said, I hope tooling for generating encoders and decoders will improve.

Really off-topic, but for those interested: there is Elm support in https://github.com/OpenAPITools/openapi-generator :slight_smile:


#10

The idiom you are using here is what the “do notation” of other languages desugars to. Or, to put it another way, what you’re proposing is an idiom which is “do notation without the syntax sugar”.

For those who aren’t familiar with “do notation”, something like this …

userDecoder : Decoder User
userDecoder =
    require "username" string <| \username ->
    require "id" int <| \id ->
    require "name" string <| \name ->
    succeed { id = id, username = username, name = name }

… would be equivalent to something like this (in languages that have “do notation”) …

userDecoder : Decoder User
userDecoder =
    do
        username <- field "username" string
        id <- field "id" int
        name <- field "name" string
        succeed { id = id, username = username, name = name }

The proposed idiom is also similar to the way we used to write andThen before its arguments were flipped a few Elm versions ago. That is, in Elm 0.17 you might have seen code that looked like this:

userDecoder : Decoder User
userDecoder =
    field "username" string `andThen` (\username ->
    field "id" int `andThen` (\id ->
    field "name" string `andThen` (\name ->
    succeed { id = id, username = username, name = name } )))

So, what’s my point?

I suppose what I’m skeptical about is this:

To keep the thread focused on the API design, let’s assume elm-format introduces support for this style of DSL. The idea is fairly moot if elm-format doesn’t support it, but it’s a chicken-and-egg situation; there’d be no reason for elm-format to add support for it unless there’s interest.

What you’re proposing isn’t really a unique style of DSL for JSON decoders. It’s a general-purpose idiom, for which there exists syntax sugar in some languages, and which is familiar from Elm’s history. So, whether to encourage this idiom, and what kind of syntax or formatting support to give it, are important questions for Elm generally, and questions that have been given some thought in the history of Elm.

So, I don’t think it’s quite right to say that there would be no reason for elm-format to support this idiom aside from interest in this particular API design. It’s an idiom that has general significance. For elm-format to make it look nice would encourage its use. If that were to happen, it would tend to get used more often, in all sorts of contexts. So, whether this is an idiom that ought to be generally encouraged is a significant question.

For my own part, I think the idiom you’ve sketched out is an essential idiom, and it would be good for Elm to have nice syntax or formatting to support it. But I think it’s a bigger question than you’ve made it out to be.


#11

This is a cool formulation. That said, formatting is a pretty big issue to the extent that elm-format has been pretty strongly embraced.

I think you also just re-invented Haskell do notation. Or rather putting a pleasing syntax on this would be essentially what Haskell do notation does.

Mark


#12

Yeah, maybe a better way to say it would be “this approach makes it easier to decode into record literals, which do not have this problem.”

I guess you could kick off a Decode.Pipeline by passing succeed an anonymous function whose argument order matches the order of the |> steps, but at that point I think I’d prefer this approach for its other benefits (introducing a let and such.)
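As a rough sketch of that Decode.Pipeline variant (the User record is hypothetical; required comes from Json.Decode.Pipeline in NoRedInk/elm-json-decode-pipeline):

```elm
import Json.Decode as Decode exposing (Decoder)
import Json.Decode.Pipeline exposing (required)


-- Hypothetical record used only for this example.
type alias User =
    { id : Int
    , email : String
    , name : String
    }


-- succeed receives a lambda instead of the User constructor. The lambda's
-- argument order must match the order of the |> steps, but the result no
-- longer depends on the field order in the type alias.
userDecoder : Decoder User
userDecoder =
    Decode.succeed (\id email name -> { id = id, email = email, name = name })
        |> required "id" Decode.int
        |> required "email" Decode.string
        |> required "name" Decode.string
```

The names are still written several times here, and each binding sits far from the step that produces it, which is part of why the continuation style can read more directly.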


#13

That’s a totally fair point!

I think that’s definitely worth discussing in the context of an elm-format feature request, but I don’t want to get ahead of myself. I wouldn’t propose an elm-format feature request unless it would permit something valuable, so I want to keep this thread focused on the questions in the OP.


#14

I recognise the suggested benefits, but I still have a fairly strong preference for the existing API:

  • It is similar to the elm/parser API, and it’s nice to have this kind of uniformity as JSON decoding is a special case of parsing. It’s also a good stepping stone towards learning elm/parser.

  • In the experimental API, the name of each field appears 4 times (or 3 times if using the record constructor) instead of once with the existing API. Particularly in the case of JSON objects with many fields, this will be very noisy.
    For example, when parsing Postgres query plans, I have field names like “Actual Total Time” or “Rows Removed by Filter”, with correspondingly long field names in the records.

  • As Joël pointed out, fields can still be swapped if people use a constructor with succeed (and it’s likely that they will in order to avoid typing field names yet again).

  • This API is based on continuation passing, which is known to result in “callback pyramids of doom” in JavaScript. On one hand, the negative consequences are limited because each callback is very short. On the other hand, if this approach becomes popular and spreads to other packages and applications (especially if aided by elm-format), I suspect that it will lead to similar problems as in JavaScript (overly complex functions, over-reliance on inversion of control, refactoring difficulties, line length issues).


#15

Thanks for the pushback, @alexkorban!

That’s true, but the alternative teaches the relationship between map and andThen, which I think is a more valuable concept for a beginner to learn.

I think the more applicable comparison would be to Promise chaining in JS (via .then()). The callback pyramid is caused by the stylistic choice of indenting further with each chained call, which isn’t typically done with Promise chaining and of course isn’t done here either!

I think the tradeoffs of using this pattern elsewhere would be more likely to look like the tradeoffs of do notation in Haskell or PureScript (as @rgrempel and @MarkHamburg noted) than of callbacks in JS.


#16

I haven’t benchmarked it, but the implementations are barely any different, so I don’t expect a noticeable difference either way!
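For context, a function like require can plausibly be written as a thin wrapper around field and andThen. The definition below is an assumption made for illustration, not the package's confirmed source:

```elm
import Json.Decode as Decode exposing (Decoder)


-- Assumed definition: decode one field, then hand the value to the
-- continuation, which returns the decoder for the rest of the object.
require : String -> Decoder a -> (a -> Decoder b) -> Decoder b
require fieldName valueDecoder continuation =
    Decode.field fieldName valueDecoder
        |> Decode.andThen continuation


-- Usage in the style shown earlier in the thread.
idAndName : Decoder ( Int, String )
idAndName =
    require "id" Decode.int <| \id ->
    require "name" Decode.string <| \name ->
    Decode.succeed ( id, name )
```

Under that assumption, each field costs one andThen step.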


#17

I have been trying to solve the same problem in webbhuset/elm-json-decode.
I came up with a slightly different approach in my lib and I want to share my reasoning behind it.

In your lib you have the following four methods exposed:

  • require
  • requireAt
  • default
  • defaultAt

while I have these:

  • required
  • requiredAt
  • optional
  • optionalAt

The difference is that optional gives you a Maybe value. If the JSON field is missing or its value cannot be decoded, Nothing is returned. Otherwise, Just value is returned.

I found that the optional approach covers more use cases than default, and I don’t think it is harder to use or less readable.
I actually think optional improves readability slightly.
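To make the relationship concrete, here is a minimal sketch of how optional could be written, and how default falls out of it. Both definitions are assumptions for illustration and may differ from what either library actually does:

```elm
import Json.Decode as Decode exposing (Decoder)


-- Assumed definition: Decode.maybe turns any failure (missing field or
-- undecodable value) into Nothing, then the continuation takes over.
optional : String -> Decoder a -> (Maybe a -> Decoder b) -> Decoder b
optional fieldName valueDecoder continuation =
    Decode.maybe (Decode.field fieldName valueDecoder)
        |> Decode.andThen continuation


-- default expressed in terms of optional: apply the fallback before
-- handing the value to the continuation.
default : String -> Decoder a -> a -> (a -> Decoder b) -> Decoder b
default fieldName valueDecoder fallback continuation =
    optional fieldName valueDecoder <| \maybeValue ->
    continuation (Maybe.withDefault fallback maybeValue)
```

Going the other way is not possible: once default has substituted the fallback, the continuation can no longer tell whether the field was present.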

Readability

I will exemplify what I mean with optional improving readability:

Using default

decoder : Decoder User
decoder =
    require "id" int <| \id ->
    require "email" string <| \email ->
    default "name" string "Guest" <| \name ->
    succeed { id = id, email = email, name = name }

Using optional

decoder : Decoder User
decoder =
    required "id" int <| \id ->
    required "email" string <| \email ->
    optional "name" string <| \maybeName ->
    succeed
        { id = id
        , email = email
        , name = Maybe.withDefault "Guest" maybeName
        }

What I like is that you can read the decoder like this:

  • The top of the decoder just collects things from the JSON object.
  • The second part builds an Elm type from the collected data and processes data if needed.

This does not make a big difference in this example, but imagine that you have a lot of decoders in your application and each decoder has more than 10 fields.

Other use cases

Sometimes you don’t want to just default the value. When working with custom types for example:

type User
    = Guest
        { name : String
        }
    | RegisteredUser
        { id : Int
        , name : String
        }

decoder : Decoder User
decoder =
    required "name" string <| \name ->
    optional "id" int <| \maybeId ->
        case maybeId of
            Nothing ->
                Guest
                    { name = name
                    }
                    |> succeed

            Just id ->
                RegisteredUser
                    { id = id
                    , name = name
                    }
                    |> succeed

Here is another example that would be awkward to implement with default:

decoder : Decoder User
decoder =
    required "id" int <| \id ->
    optional "firstname" string <| \maybeFirst ->
    optional "lastname" string <| \maybeLast ->
    succeed
        { id = id
        , name = 
            case (maybeFirst, maybeLast) of
                (Just first, Just last) ->
                    first ++ " " ++ last

                (Just first, _) ->
                    first

                _ ->
                    "Unknown"
        }

I know these examples are “made up”, but I hope that I managed to explain my reasoning.

My Conclusion

  • optional can do the same as default without losing readability or making things more complicated to use.
  • optional also covers some use cases that would be awkward to implement using default

#18

@albertdahlin thanks for writing that up! Your reasoning makes sense to me.

I defaulted to the same design as Decode.Pipeline, but I agree that the ease of doing a Maybe.withDefault in this style makes optional producing a Maybe better overall.


#19

I don’t really like it. It can be useful in some cases, but it introduces more complexity (more mental work), and it is harder to read and write than the existing pipeline. As @albertdahlin mentioned, having to write and read the same variable name 3 or 4 times, plus cryptic symbols like “<|” and “->”, makes it more error-prone, and all of this just to avoid positional arguments.
(Complex decoding can be done with lazy/map.)
This is of course all about tradeoffs, but for me, Elm is already too verbose, and I would try to keep it cleaner.


#20

@SergKam Interesting you’d say that. I am of the complete opposite opinion. Maybe which approach is more complex depends on your use case, and on what you are used to. I don’t think verbosity is necessarily a bad thing.

For us (in my company) it is common to have more than 10 fields in an object / record. It is also common that all fields are of type String. We also add fields frequently, and we are 10+ developers in the same codebase. The type aliases are usually not in the same file as the JSON decoder (we have more than one decoder for each record, depending on which API we are using).

We ended up with this decode API because no one liked to deal with record field orders and accidentally breaking things.

I think it would be valuable to get more input from beginners. Which approach is easier to use and understand when you are writing a decoder for an object for the first time? That is what I want to optimize for at least.

Using map:

import Json.Decode as Decode exposing (Decoder)

person : Decoder Person
person =
    Decode.map5 Person
        (Decode.field "id" Decode.int)
        (Decode.field "name" Decode.string)
        (Decode.maybe <| Decode.field "weight" Decode.int)
        (Decode.field "likes" Decode.int
            |> Decode.maybe
            |> Decode.map (Maybe.withDefault 0)
        )
        (Decode.succeed "Hardcoded Value")

Using continuations:

import Json.Decode as Decode exposing (Decoder)
import Json.Decode.Field as Field

person : Decoder Person
person =
    Field.require "name" Decode.string <| \name ->
    Field.require "id" Decode.int <| \id ->
    Field.attempt "weight" Decode.int <| \maybeWeight ->
    Field.attempt "likes" Decode.int <| \maybeLikes ->

    Decode.succeed
        { name = name
        , id = id
        , maybeWeight = maybeWeight
        , likes = Maybe.withDefault 0 maybeLikes
        , hardcoded = "Hardcoded Value"
        }