How to write a generic JSON decoder?

Some data formats are extensible: in addition to a fixed set of fields, they allow additional fields, not known in advance, to be included in the JSON.

The case I am looking at just now is the OpenAPI spec, which allows extensions in the form of fields starting with x-. So if you had some record type definition, you could specify, for code generation purposes, what type something should be mapped to in the target language with an extra field like this: x-elm-type: String. There can be any number of these additional x- fields for any purpose, as a general mechanism for making the spec extensible. Their values can also be JSON objects or arrays.

So my question is, how do you write a JSON decoder that can handle any valid JSON, decoding it into a Dict?

My first attempt was to use the Json.Decode.dict function, and try and decode things as Strings. The example given in its docs is:

decodeString (dict int) "{ \"alice\": 42, \"bob\": 99 }"
  == Ok (Dict.fromList [("alice", 42), ("bob", 99)])

but if I change that to:

decodeString (dict string) "{ \"alice\": 42, \"bob\": 99 }"
  == Ok (Dict.fromList [("alice", 42), ("bob", 99)])

it fails, because Json.Decode.string won’t allow 42 as a String.

Perhaps I can try all the possibilities and combine them with Json.Decode.oneOf?

With decoders, I find that it’s impossible to give good advice without knowing both the format coming in and what you want to do with the result in Elm. Ultimately a decoder is about mapping a concept in JSON to a concept in your Elm program’s data structures, and it depends on both of those two ends equally.

For instance, if all you’re going to do with these extended fields is remember them and reencode them as JSON later, you can use Json.Decode.value to get the fields as unparsed JSON blobs. You can do that with

decodeDict : Json.Decode.Decoder (Dict String Json.Decode.Value)
decodeDict = Json.Decode.dict Json.Decode.value 

If you want to use the values in Elm, though, this doesn’t help very much. You could try to decode them into some completely generic JSON elm data structure, but that’s unlikely to be of much help (I wrote an article about that a while back).

So I guess my question is: Suppose you could do anything you wanted with your decoder. What would its type be and how would you use it in Elm?

Thanks - I need to specify the problem a bit more clearly.

Suppose this is the JSON I was trying to decode:

{
   "name": "rupert",
   "age": 42,
   "x-superpower": "teleportation"
}

I want to decode the { name : String, age : Int } part which will always be the same and known in advance. I also want to decode the x-superpower part, which is optional and can be any number of additional x- fields of any type.

Those x- fields are perhaps not so useful, since they do not map to static types in my Elm program, it is true. In the first instance all I may end up doing with them is displaying them in some UI - so for that reason at least I would like to capture them, what their names are and what their values are.

In that case I think I’d try using the decodeDict decoder I wrote above and filtering the result so that the only keys are the x-* fields. Out of the box Elm doesn’t have good support for formatting JSON values as strings, but it looks like you can use a pretty-printing package for that.

Thanks, that gives me an idea.

Use a generic Dict decoder that lifts out the x- fields, and also a specific Decoder for the fixed format fields that are expected and ignores any x- fields. I can then combine those together to get what I am after using Decode.map2 - gives me a direction to get started with anyway.
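A minimal sketch of that plan, assuming the extension values are kept as raw Json.Decode.Value. The names Core, Person, coreDecoder, extensionsDecoder and personDecoder are made up for illustration and do not come from any library:

```elm
import Dict exposing (Dict)
import Json.Decode as Decode exposing (Decoder)


-- Hypothetical record for the fixed, known-in-advance fields.
type alias Core =
    { name : String
    , age : Int
    }


-- Hypothetical record pairing the fixed fields with the x- extensions.
type alias Person =
    { core : Core
    , extensions : Dict String Decode.Value
    }


-- Decodes only the fixed fields; any other field is ignored.
coreDecoder : Decoder Core
coreDecoder =
    Decode.map2 Core
        (Decode.field "name" Decode.string)
        (Decode.field "age" Decode.int)


-- Decodes every field as raw JSON, then keeps only keys starting with "x-".
extensionsDecoder : Decoder (Dict String Decode.Value)
extensionsDecoder =
    Decode.dict Decode.value
        |> Decode.map (Dict.filter (\key _ -> String.startsWith "x-" key))


-- Both decoders run against the same JSON object and are combined with map2.
personDecoder : Decoder Person
personDecoder =
    Decode.map2 Person coreDecoder extensionsDecoder
```

Running Decode.decodeString personDecoder on the example object should give a Person whose extensions Dict has the single key "x-superpower".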

What you can do is lift all the fields to strings.

module Main exposing (main)

import Dict
import Html exposing (text)
import Json.Decode as Json


fromBool val =
    if val then
        "True"

    else
        "False"


-- Try each primitive decoder in turn and convert the result to a String.
asString =
    Json.oneOf
        [ Json.string
        , Json.map String.fromInt Json.int
        , Json.map String.fromFloat Json.float
        , Json.map fromBool Json.bool
        , Json.null ""
        ]


myDecoder =
    Json.map3 (\name age powers -> { name = name, age = age, powers = powers })
        (Json.field "name" Json.string)
        (Json.field "age" Json.int)
        powersDecoder


-- Decode every field as a String, then drop the fields that are already known.
powersDecoder =
    Json.map (Dict.filter (\k _ -> not (List.member k [ "name", "age" ]))) (Json.dict asString)


inputJson =
    """
{
   "name": "rupert",
   "age": 42,
   "x-superpower": "teleportation",
   "x-king": true,
   "x-op": 9001
}
"""


main =
    text <| Debug.toString <| Json.decodeString myDecoder inputJson

Of course, this would not work for collection fields (arrays, objects) but you can extend asString to cover those cases if you need to.
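One way to cover arrays and objects, sketched here as an assumption rather than as the poster's own solution, is to add a final fallback that takes the raw Json.Decode.value and renders it with Json.Encode.encode. The asString and fromBool names mirror the post above:

```elm
import Json.Decode as Json
import Json.Encode


fromBool : Bool -> String
fromBool val =
    if val then
        "True"

    else
        "False"


asString : Json.Decoder String
asString =
    Json.oneOf
        [ Json.string
        , Json.map String.fromInt Json.int
        , Json.map String.fromFloat Json.float
        , Json.map fromBool Json.bool
        , Json.null ""

        -- Fallback: anything not matched above (arrays, objects) is
        -- re-encoded as compact JSON text.
        , Json.map (Json.Encode.encode 0) Json.value
        ]
```

Because oneOf tries the decoders in order, the fallback only fires for values none of the primitive decoders accept.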

This looks good to me! I’d maybe use this filter function instead: (\key _ -> String.startsWith "x-" key)

This would work too if all the other fields have a predictable prefix. The solution I proposed has the advantage that it would capture all fields other than a known set (name, age).

Here is what I did so far to decode to the data structure from Jacob’s article:

import Dict exposing (Dict)
import Json.Decode as Decode exposing (Decoder)

type Json
    = JString String
    | JBool Bool
    | JInt Int
    | JFloat Float
    | JNull
    | JObj (Dict String Json)
    | JArr (List Json)

string : Decoder Json
string = Decode.map JString Decode.string

bool : Decoder Json
bool = Decode.map JBool Decode.bool

int : Decoder Json
int = Decode.map JInt Decode.int

float : Decoder Json
float = Decode.map JFloat Decode.float

null : Decoder Json
null =
    Decode.null JNull

array : Decoder Json
array =
    Decode.list generic
        |> Decode.map JArr

dict : Decoder Json
dict =
    Decode.dict generic
        |> Decode.map JObj

generic : Decoder Json
generic =
    Decode.oneOf [ bool, int, float, null, string, Decode.lazy (\_ -> array), Decode.lazy (\_ -> dict) ]

It’s recursive, so I had to make use of Decode.lazy. Seems to work nicely - just need to add the Dict filter and so on.

For OpenAPI, the extra fields always start with x-, so I can use that filter pattern.

I was intrigued to take a look at the pretty printer, as it also must do generic JSON decoding. It bypasses having a generic model to describe the JSON, and goes straight to building the pretty printable document. However, the structure is similar to what I came up with:

decodeDoc : Int -> Decoder Doc
decodeDoc indent =
    Decode.oneOf
        [ Decode.map stringToDoc Decode.string
        , Decode.map numberToDoc Decode.float
        , Decode.map boolToDoc Decode.bool
        , Decode.map (listToDoc indent) (Decode.lazy (\_ -> Decode.list (decodeDoc indent)))
        , Decode.map (objectToDoc indent) (Decode.lazy (\_ -> Decode.keyValuePairs (decodeDoc indent)))
        ]

Interestingly, that article was a result of me doing this same exercise, and writing just about the same code as you did, while trying to figure out if I could make a better decoder API. Doing that work and then trying to use the resulting Json values really drove home for me the idea that a decoder isn’t simply bringing JSON into Elm, but bringing JSON into your Elm program. A converter of JSON into a generic Json data structure is pretty simple to write, but doesn’t buy you very much – you still need to turn Json into data structures your program understands, and that transformation is basically just as difficult as the original problem.

That’s why I now try to steer people away from generic decoding as a first step, and instead steer them towards deciding what they’d want the native Elm representation of the JSON to be if they could choose anything, and then writing a decoder directly from JSON to that.
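To make that concrete with the x-elm-type example from the first post, here is a rough sketch that decodes one known extension straight into a typed field instead of going through a generic structure. FieldSpec and fieldSpecDecoder are hypothetical names, and the JSON shape (an object with a "name" and an optional "x-elm-type") is an assumption for illustration, not the real OpenAPI schema layout:

```elm
import Json.Decode as Decode exposing (Decoder)


-- Hypothetical target type: the extension becomes an ordinary optional field.
type alias FieldSpec =
    { name : String
    , elmType : Maybe String
    }


fieldSpecDecoder : Decoder FieldSpec
fieldSpecDecoder =
    Decode.map2 FieldSpec
        (Decode.field "name" Decode.string)
        -- Decode.maybe yields Nothing when "x-elm-type" is missing
        -- or is not a string, so the extension stays optional.
        (Decode.maybe (Decode.field "x-elm-type" Decode.string))
```

The rest of the program then works with a Maybe String and never has to inspect raw JSON.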

Quick reminder to y’all that Json.Encode.encode : Int -> Json.Encode.Value -> String exists :smiley:
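As a small sketch of how that could be used for the display case discussed earlier, assuming the x- fields were captured as a Dict String Json.Decode.Value (showExtensions is a made-up helper name):

```elm
import Dict exposing (Dict)
import Json.Decode as Decode
import Json.Encode as Encode


-- Turns each captured extension into a "key: value" line, where the
-- value is rendered as compact JSON text (indentation 0).
showExtensions : Dict String Decode.Value -> List String
showExtensions extensions =
    extensions
        |> Dict.toList
        |> List.map (\( key, value ) -> key ++ ": " ++ Encode.encode 0 value)
```

Json.Decode.Value and Json.Encode.Value are the same type, so decoded values can be passed to encode directly. Note that string values keep their JSON quotes in the output.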

I agree, and it’s a good observation.

There are a couple of situations where a generic decoder can be useful:

  • If you wanted to validate arbitrary JSON against an arbitrary JSON Schema - which of course has its own fixed schema, that you would decode and write a program around.
  • If you wanted to write a UI to help a user understand and work with arbitrary JSON. Say to visualise it or search it.
  • Programs that take JSON and try to infer its schema, or automatically map it to a data model. For example,
