Safe and explicit records constructors exploration

Why?

I was thinking about the weaknesses of record constructors and the APIs that use them, for example decoders, parsers and generators:

  1. Record constructors depend on field order. Changing the order of fields in the type alias may silently break code using its constructor if some fields have the same type.
  2. Fields are not explicit when using the constructor. This forces the reader to look at the type alias to know which is which, and makes APIs using record constructors look a little magical.
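
As a minimal sketch of the first point in today's Elm (the module and alias names are made up for illustration): two aliases describe the same record type but list the fields in a different order, so their generated constructors take their arguments in a different order, and the compiler accepts both calls.

```elm
module FieldOrder exposing (original, reordered, sameRecord)

-- Hypothetical names, for illustration only.
-- Generated constructor: PointXY : Int -> Int -> PointXY (x first).


type alias PointXY =
    { x : Int, y : Int }


-- Same record type with the fields listed in the other order.
-- Generated constructor: PointYX : Int -> Int -> PointYX (y first).


type alias PointYX =
    { y : Int, x : Int }


original : PointXY
original =
    PointXY 1 2


reordered : PointYX
reordered =
    PointYX 1 2


-- Both values have type { x : Int, y : Int }, so the comparison type-checks,
-- but original is { x = 1, y = 2 } while reordered is { x = 2, y = 1 }.
sameRecord : Bool
sameRecord =
    original == reordered
```

Here sameRecord evaluates to False: reordering the fields in the alias changed the meaning of an unchanged call without any compiler error.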

Those flaws are well known, and previous experiments to improve this have often been based on using some kind of do-notation, for example rtfeldman/elm-json-experiment, discussed in this post.

But the root cause of these issues is actually the signature of record type alias constructors, which does not seem to have been discussed much since the discussion that first introduced them (please tell me if you find something else).

So let’s try to improve those to see where this takes us.

How?

Safer records constructors

Currently, for the following record:

    type alias Point = { x : Int, y : Int }

We have the following constructor defined automatically:

    Point : Int -> Int -> Point

What if instead the constructor was:

    Point : { x : Int } -> { y : Int } -> Point

You can think of it as merging single-field records together to get the whole one.

This would prevent using expressions like Point 0 0, but this is not a bad thing: argument order is not safe for fields of the same type, fields are not explicit, and I suspect that the performance of { x = 0, y = 0 } is better.

Improved constructor signature and error messages

First, this new constructor signature makes it possible to understand the record being built without looking at the record type alias. For example in elm repl:

    > Point
    <function> : { x : Int } -> { y : Int } -> Point

Also partially applied constructors are more explicit and safer:

    > Point { x = 0 }
    <function> : { y : Int } -> Point

Today, swapping the arguments by mistake can lead to a logic error without any compiler error if the fields have the same type.

With this new constructor signature, there would be a compilation error, with a very useful message:

-- TYPE MISMATCH -------------------------------------------------- src/Main.elm

The 1st argument to `Point` is not what I expect:

24|     Point { y = 0 } { x = 0 }
              ^^^^^^^^^
This argument is a record of type:

    { y : number }

But `Point` needs the 1st argument to be:

    { x : Int }

Hint: Seems like a record field typo. Maybe y should be x?

Safer and explicit mapN and pipeline APIs:

This constructor would be used with Json.Decode like:

    import Json.Decode exposing (..)

    map2 Point
        (map (\x -> { x = x }) (field "x" int))
        (map (\y -> { y = y }) (field "y" int))

This could be more readable and there is a remaining repetition, but already:

  • Fields order cannot be wrong
  • Fields are explicit
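
The snippet above relies on the proposed constructor. As a sketch of how it could be emulated in today's Elm, a hand-written function with the proposed signature can stand in for it. The name point is hypothetical, and this assumes elm/json is installed:

```elm
module SafePoint exposing (Point, decoder, point)

import Json.Decode as Decode exposing (Decoder)


type alias Point =
    { x : Int, y : Int }


-- Hand-written stand-in for the proposed constructor signature.
point : { x : Int } -> { y : Int } -> Point
point { x } { y } =
    { x = x, y = y }


-- Swapping the two decoders below would be a type mismatch,
-- because Decoder { y : Int } is not Decoder { x : Int }.
decoder : Decoder Point
decoder =
    Decode.map2 point
        (Decode.map (\x -> { x = x }) (Decode.field "x" Decode.int))
        (Decode.map (\y -> { y = y }) (Decode.field "y" Decode.int))
```

The cost today is one such function per record, which is the boilerplate that the proposal would generate automatically.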

Record singleton constructor for improved readability and no repetition

As record singletons would be used a lot, it would make sense to have syntactic sugar to create them. Mirroring the current .field one, we could have:

{ field } which would have the signature a -> { field : a }

It seems logical to me: { field = 2 } creates a { field : Int }, so { field } needs a missing argument to create a record. I believe it would not add too much complication for the compiler parser either, and would fit naturally in the Elm language syntax.

We would get some readable and robust APIs:

    map2 Point
        (map { x } (field "x" int))
        (map { y } (field "y" int))

Every part of the expression is now meaningful. This is more apparent, for example, when decoding an array:

    map2 Point
        (map { x } (index 0 int))
        (map { y } (index 1 int))

It’s a little more verbose than the current API, but I think its explicitness would make it easier for beginners to understand and would look less magic.

Maybe one drawback is that newcomers could confuse the syntax with records pattern matching, but from my experience, they rarely know or use the pattern matching syntax anyway.

Using the current json-decode-pipeline API:

    succeed Point
        |> required "x" (map { x } int)
        |> required "y" (map { y } int)

The only verbose thing is the map and its associated parentheses, but updated package APIs could add functions to improve this if it is really an issue, for example with something like:

    map2 Point
        (mapField "x" { x } int)
        (mapField "y" { y } int)

or maybe

    map2To Point
        ({ x }, field "x" int)
        ({ y }, field "y" int)

or in pipeline:

    succeed Point
        |> mapRequired "x" { x } int
        |> mapRequired "y" { y } int

These examples could most likely be improved, but this gives an idea.

Resolvers from json-decode-pipeline would also be more robust, preventing fields order errors:

    type alias User =
        { id : Int
        , email : String
        }

    userDecoder : Decoder User
    userDecoder =
        let
            toDecoder : { id : Int } -> { email : String } -> Int -> Decoder User
            toDecoder id email version =
                if version > 2 then
                    Decode.succeed (User id email)
                else
                    fail "This JSON is from a deprecated source. Please upgrade!"
        in
        Decode.succeed toDecoder
            |> required "id" (map { id } int)
            |> required "email" (map { email } string)
            |> required "version" int
            |> resolve

A singleton record can even be used for version, just in case other parameters are added later:

    userDecoder : Decoder User
    userDecoder =
        let
            toDecoder : { id : Int } -> { email : String } -> { version : Int } -> Decoder User
            toDecoder id email { version } =
                if version > 2 then
                    Decode.succeed (User id email)
                else
                    fail "This JSON is from a deprecated source. Please upgrade!"
        in
        Decode.succeed toDecoder
            |> required "id" (map { id } int)
            |> required "email" (map { email } string)
            |> required "version" (map { version } int)
            |> resolve

The single-field record pattern

Record singletons are quite safe parameter types, sitting between scalar and opaque types, and unlike the latter they do not require a declaration. More generally, they can be used as a pattern for named arguments when ambiguity needs to be removed.
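
A small sketch of that named-arguments pattern in today's Elm. The slice wrapper is hypothetical and only illustrates the idea on top of String.slice from elm/core:

```elm
module NamedArguments exposing (firstWord, slice)

-- Hypothetical wrapper around String.slice, for illustration only.
-- The two Int arguments can no longer be swapped silently.


slice : { from : Int } -> { to : Int } -> String -> String
slice { from } { to } text =
    String.slice from to text


firstWord : String
firstWord =
    slice { from = 0 } { to = 5 } "hello world"
```

Writing slice { to = 5 } { from = 0 } "hello world" would be rejected by the compiler, and partial application such as slice { from = 0 } still works.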

Anonymous records constructors

It makes sense that the single-field constructor syntax would also work with several fields, compensating for the removal of the current constructors that take scalar values.

{ x, y } would have the signature a -> b -> { x : a, y : b }

So defining the old version constructor would be as easy as:

    type alias Point = { x : Int, y : Int }

    unsafePoint : Int -> Int -> Point
    unsafePoint = { x, y }

As with the current update syntax, nested fields would not be supported.

This would be useful for example to decode anonymous records that don’t have a type alias:

    map2 { x, y }
        (field "x" int)
        (field "y" int)

Compiler optimization

Record singletons would be used quite a lot, so it could make sense to optimize them even more.

I wonder if optimizing { x = value } to value when --optimize is used would be possible, but maybe the automatic conversions done by ports would prevent this (if this was the only issue, forbidding single-field records in ports could be reasonable).

Feedback

I may have completely neglected some valid use cases of current records constructors signature, but the exercise was interesting nonetheless.

  • What do you think? Do you see any drawback?
  • Has this been discussed before?

Edit: the anonymous record constructor syntax was discussed in the past in Easily Constructing Records · Issue #73 · elm/compiler · GitHub, with the following comment:

{x,y} is exactly the same as the pattern matching syntax and it is not clear that it is a function.

Another proposal was:

There’s also {x=,y=} to consider - similar to (,,,) in that only the actual values are left out. Also reminiscent of the way binary operators are curried in Haskell ( (5+) or (+x) ).


Seems really nice! I wonder if instead of the implicit constructor that takes a bunch of single-field records, it would be enough to just not generate an implicit record constructor and instead support the { x, y } anonymous record construction syntax that you propose. Then if you had

type alias Person =
    { first : String
    , last : String
    }

you could write

personDecoder =
    Decode.map2 { first, last }
        (Decode.index 0 Decode.string)
        (Decode.index 1 Decode.string)

to decode a Person from a 2-element array, or

personDecoder =
    Decode.map2 { last, first }
        (Decode.field "lastName" Decode.string)
        (Decode.field "firstName" Decode.string)

to decode from an object (note that I swapped the order of fields but everything would still work out). You don’t get the extra little bit of type safety of having each record field name on the same line as that field’s decoder, but I think it’s still pretty easy to check that the order of fields match up between the argument given to map2 and the individual field decoders (much easier than having to check against the Person type alias itself which may be hundreds of lines away).


@ianmackenzie This would be an improvement for small records already, although I thought that the proposed syntax would be mostly handy for records without a type alias. A common example would be a first step decoder before an andThen, even more since the tuples were limited to 3 elements at most:

type alias User =
    { name : String
    , age : Int
    }


userDecoder =
    Decode.map4 { firstname, middlename, lastname, age }
        (Decode.field "firstname" Decode.string)
        (Decode.field "middlename" Decode.string)
        (Decode.field "lastname" Decode.string)
        (Decode.field "age" Decode.int)
        |> Decode.andThen userDecoderHelp


userDecoderHelp :
    { firstname : String, middlename : String, lastname : String, age : Int }
    -> Decoder User
userDecoderHelp user =
    if user.age >= 0 then
        Decode.succeed
            { name = String.join " " [ user.firstname, user.middlename, user.lastname ]
            , age = user.age
            }

    else
        Decode.fail "invalid age"

Actually, I think it could have been interesting if the { x, y } : a -> b -> { x : a, y : b } syntax was introduced in 0.19.0 when the tuples were limited to 3 and their construction operator removed, independently from my single field record type alias constructors exploration. This might have made an easier upgrade path, facilitating the transition to records which are indeed better above 3 elements, even if they do not replace the tuples values pattern matching use case.


I’m no language designer, but as a beginner, here’s my point of view:

  • I’ve been bitten by the lack of safety of decoders before
  • The { x } syntax seems very natural coming from JS. Actually I’ve been hoping that this syntax would make it for all records, as I find it much more readable and concise (I use “same named” variable for records/JS objects in the majority of constructions/declarations).

So from my perspective, those changes would be great! :clap: :clap:

On this topic in general, I think these constructors are a good feature. I understand that having consecutive fields of the same type can lead to mistakes, but you have the same issue with a normal custom type like:

type Boo =
    Boo String String

This is also a common issue in routing and URL parsing. Maybe we should consider that this is not a record issue, but a type constructor issue?

@Juan_Gabriel_Fraire You can use the single field record pattern elsewhere, including in type constructors. You can actually use it today already.

This can be useful to remove ambiguity, without requiring a type alias to be useful, nor making partial application harder, nor requiring an opaque type:

type Boo = Boo { foo : String } { bar : String }

Compiler error:

boo = Boo { bar = "bar" } { foo = "foo" }

Partial application:

f = Boo { foo = "foo" }

If they were optimized to just the value, they could be used at no cost.

With the proposed anonymous record constructor syntax, using a single record instead of individual single-field records would be easier though, because this would not require a type alias (nor a lambda) to build them (fully or partially).

If the old syntax were dropped in favor of the proposed one, a lot of code would break. This is the only drawback that I can see. But perhaps both syntaxes could be allowed? The safety and clarity offered by the proposed syntax is really great.


@jxxcarlson Adding only the new anonymous records constructors syntax as proposed by @ianmackenzie but without removing current constructors would be backward compatible. This might be an option to consider.

I guess one issue would be that just using the anonymous record constructor syntax wouldn’t scale that well to very large numbers of record fields; you’d end up with something like

Decode.map8 { id, first, last, age, email, address, ssn, phone }
    (Decode.field "id" Decode.int)
    (Decode.field "firstName" Decode.string)
    (Decode.field "lastName" Decode.string)
    (Decode.field "age" Decode.int)
    (Decode.field "email" decodeEmail)
    (Decode.field "address" Decode.string)
    (Decode.field "ssn" Decode.string)
    (Decode.field "phoneNumber" decodePhoneNumber) 

which starts to creep back into “hard to see errors in ordering” territory (although still much better than having to check against a far-removed type alias). Having your original proposal with just the type name passed to Decode.map8 and a single-field constructor on each line would certainly scale better.

It’s fun to think about potential extensions to the anonymous record construction syntax; could you mix it with normal record field assignment, basically “currying” normal record construction syntax? For example, could you use

Decode.map2 { x, y, z = 0 }
    (Decode.field "x" Decode.float)
    (Decode.field "y" Decode.float)

to decode some 2D point data into 3D points with a zero Z coordinate?


What do you think of this syntax for decoders?

Json.Decode.recordSequence
    { x = field "x" int
    , y = field "y" int
    }

In this concrete case, recordSequence's type would be { x : Decoder Int, y : Decoder Int } -> Decoder { x : Int, y : Int }.

The order of decoding would be given by the order of fields in the syntax.

A general type for this cannot be represented in Elm today, but if it had a powerful extensible record system it would be possible to give recordSequence a type that is abstract over the actual record.
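
For one fixed record shape, though, such a function can be written by hand in today's Elm. A minimal sketch, assuming elm/json is installed (sequencePoint is a hypothetical name, and it only works for this exact record):

```elm
module SequencePoint exposing (pointDecoder, sequencePoint)

import Json.Decode as Decode exposing (Decoder)


-- Turns a record of decoders into a decoder of a record,
-- for the { x, y } shape only.
sequencePoint : { x : Decoder Int, y : Decoder Int } -> Decoder { x : Int, y : Int }
sequencePoint decoders =
    Decode.map2 (\x y -> { x = x, y = y }) decoders.x decoders.y


pointDecoder : Decoder { x : Int, y : Int }
pointDecoder =
    sequencePoint
        { x = Decode.field "x" Decode.int
        , y = Decode.field "y" Decode.int
        }
```

Note that in this hand-written version the decoding order is fixed inside sequencePoint rather than given by the order of fields at the call site, and a new helper is needed for every record shape.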

It is possible to define such a function in TypeScript or PureScript.

The name sequence comes from a Haskell function that does something quite similar for any Applicative and lists (in your mind, replace the f type param with Decoder and the t parameter with List in this signature).

EDIT: for clarity: List (Decoder a) -> Decoder (List a)


This kind of thing is what I tend to do (with the current syntax), it is more verbose but also more explicit and less error prone.

personDecoder : Decoder Person
personDecoder =
    Decode.map2
        (\first last ->
            { first = first
            , last = last
            }
        )
        (Decode.index 0 Decode.string)
        (Decode.index 1 Decode.string)

I personally wouldn’t cry if the function constructor for record types was removed. A record literal seems like it is always more readable and doesn’t have any implicit order problems.


Yeah, I’ve used that pattern a few times myself when I wanted to be extra careful. I also think it’s a useful pattern to show beginners since it more clearly highlights what’s actually going on/how Decode.map2 actually works.


Personally, i only use miniBill/elm-codec for writing Json encoders/decoders. Most often i need both anyway.

person : Codec Person
person =
    Codec.object Person
    |> Codec.field "first" .first Codec.string
    |> Codec.field "last" .last Codec.string
    |> Codec.buildObject

This gives me type safety, but if I only need either just the decoder or just the encoder then this is really not very optimised.

@Philipp_Krueger that would require a big change in Elm’s typesystem which is not going to happen any time soon if ever

@Lucas_Payr actually, elm-codec has the exact same problem, if you swap the two field calls you will get the wrong result. Don’t get fooled by the strings for the property names, those are just to make the resulting json nicer
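
To make this concrete, here is a sketch of the codec above with the two field calls swapped, assuming miniBill/elm-codec is installed. It compiles because both fields are Strings:

```elm
module SwappedCodec exposing (Person, person)

import Codec exposing (Codec)


type alias Person =
    { first : String, last : String }


-- The two Codec.field calls are swapped compared to the alias.
-- This still type-checks because both fields are Strings.
person : Codec Person
person =
    Codec.object Person
        |> Codec.field "last" .last Codec.string
        |> Codec.field "first" .first Codec.string
        |> Codec.buildObject
```

The getters .last and .first only drive encoding. When decoding, the values are still fed to the Person constructor in call order, so the "last" property ends up in the first field.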


I thought that .first and .last would save me here. But yes, you are right!

Also, record constructors are used by many more APIs than just JSON. They can be used for example by any API that includes mapN and/or andMap functions, including generators (ex: elm/random), parsers (ex: elm/parser), and other decoders (ex: elm/bytes).

The examples use JSON decoding because it is likely the most well known use case at the moment.
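
For instance, a sketch of the same issue with a generator, assuming elm/random is installed (module and value names are made up for illustration):

```elm
module RandomPoint exposing (Point, explicit, positional)

import Random exposing (Generator)


type alias Point =
    { x : Int, y : Int }


-- Positional constructor: swapping the two generators, or the fields
-- in the alias, would still compile and silently exchange the ranges.
positional : Generator Point
positional =
    Random.map2 Point
        (Random.int 0 800)
        (Random.int 0 600)


-- Explicit record literal: does not depend on the order of fields
-- in the type alias.
explicit : Generator Point
explicit =
    Random.map2 (\x y -> { x = x, y = y })
        (Random.int 0 800)
        (Random.int 0 600)
```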


Not necessarily.
If you want to be able to define recordSequence within Elm, then, yes, that would be a big change in the type system.
If you only want to be able to use recordSequence, then you can build it into the compiler. The type checker still needs a change, as it will have to special-case recordSequence, but if it’s considered useful enough, then that’d be a non-breaking change.
Such a language-design approach is taken by the configuration language Dhall, look for the toMap and merge keywords in the Dhall Cheatsheet. Even though these two keywords in Dhall look and work like functions, they don’t have a type, so can’t be used un-applied.

Edit: Let me clarify. I wouldn’t build in the exact recordSequence for Decode. Rather, I’d suggest building in a recordSequence keyword that expects another argument with any map2 function, so it would work with Codec and Decode and built in code would be independent of any actual implementations like Codec or Decode.

The { x } syntax seems very natural coming from JS.

I didn’t notice until you said this, but actually the { x } syntax in JS means something different – it’s syntactic sugar for { x = x }, whereas here it means \x -> { x = x }. I’d worry that this would confuse newcomers, although the compiler might be able to tell that the user was using it “as though” it were JS and give a helpful error message.

I think this confusion is made even worse by the { x, y } syntax. In JS, { x, y } and { y, x } mean the same thing – here, they don’t.

I think the underlying idea, of record constructors having “tagged” arguments, is clever, but I think it needs a really good syntax for “tagging” arguments, and I’m not sure this is it.


I agree that the proposed syntax additions here look like they might create some confusion. They don’t map neatly onto features in other languages I’m aware of, while also being “similar but different” to JS in a way that could be harmful.

It’s certainly true that this is something that comes up a lot and if we take an honest step back we probably all agree that there’s an underlying weakness in the Elm ecosystem here. The use cases for which Elm is primarily aimed aren’t brilliantly served by the current approach to JSON codecs. My current solution to the problem is to use the Elm IntelliJ plugin, which generates them for me. Adding language-level features to address this shortcoming is also an option and this could take the approach you outline above (new syntax or features to mitigate the current problems), or even new concepts that address the issue (in my day job I primarily use Scala, where the typical approach is to have the compiler generate the boilerplate of codecs).

Should we let tooling (e.g. editors) solve this problem, or is there a sense that error-prone and verbose codecs are one example of a weakness in Elm that needs addressing at the language level?

I confess that for the most part I am satisfied with Elm being a little more verbose and explicit than I’m used to in other languages, and the codec problem only feels different because:

  • JSON codecs are so ubiquitous
  • models evolve during development so the fragility of the current approach concretely causes hard-to-spot bugs

In the spirit of sharing experience from other areas I’d once again direct you to IntelliJ’s excellent codec generation feature.

The experience in Scala is also fairly pleasant (circe is a popular JSON library, other libraries take a similar approach):

import io.circe.generic.semiauto.{deriveEncoder, deriveDecoder}

case class Foo(n: Int, s: String)

val fooEncoder = deriveEncoder[Foo]
val fooDecoder = deriveDecoder[Foo]

These codecs are generated directly from the type definitions, so they are always correct. The underlying codecs are sensitive to order but because the compiler made them we know for sure that the order is correct.

Now you can use these elsewhere in the codebase (Scala often prefers methods over functions, that wouldn’t be the best approach in Elm):

import io.circe.syntax._
import io.circe.parser.decode

// returns io.circe.Json (call .noSpaces on it to get a String)
Foo(1, "hi").asJson(fooEncoder)
// returns Either[io.circe.Error, Foo], similar to Result in Elm
decode[Foo]("""{"n": 1, "s": "example"}""")(fooDecoder)

My sense is that introducing this power into the language would be a big change for Elm, but if it’s possible to do this in tooling then this may be easier to adopt. What if there was a code generation step in Elm’s ecosystem? Should this be a part of the compiler? A separate command? Something else? What other similar problems would be solved with this power?

the root cause of these issues is actually the signature of records type aliases constructors

I guess this is the bit I disagree with! Given Elm as it currently is, changing record constructors could mitigate the problem of JSON codecs, but if we look deeper there are other approaches available and it is worth thinking about what we are trying to solve, and what trade-offs we’re prepared to make in doing so.
If the existing IntelliJ (or the myriad online generators) approach of Java-style code generation in the editor is enough by itself then great!
Solving this problem in the language would mean making Elm more powerful. I think it’s a challenge to find a way to do this that doesn’t do one or more of:

  • increase the complexity of the syntax
  • add a new concept that needs to be internalised
  • create a new powerful feature that can be (ab)used elsewhere in the language

Again, this is about building records safely in APIs.
The JSON decoding API is only one of many APIs doing it with record constructors, just probably the most ubiquitous currently.
