Resurrecting elm-lang/lazy

The readme for elm-lang/lazy states that it is deprecated because it is often confused with delayed computation, but I’ve a project for which I think it’s really required.

I’m writing an Apache Avro library for Elm, which I’ve mentioned here before. The API is super neat, and the implementation is much smaller and easier to reason about than the ones I’ve seen in other languages.

But it gets tricky at the point where schemas in Avro can be mutually recursive by their names. Without mutation, building these from user-defined data can be very challenging to make efficient. Laziness can provide just enough implicit mutation to make it possible.

Now what I want to write is a function which builds a map of names to their decoders, but these decoders lazily depend on each other.

Something like this:

buildNamedDecoders : List ( String, ResolvedSchema ) -> Dict String (Bytes.Decoder Value)
buildNamedDecoders namedPairs =
    let
        environment =
            let
                single ( name, schema ) =
                    let
                        recDec =
                            lazy (\_ -> makeBytesDecoder environment schema)

                        decoder =
                            Decode.lazy
                                (\_ -> force recDec)
                    in
                    ( name, decoder )
            in
            List.map single namedPairs
                |> Dict.fromList
    in
    environment
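
(A note on Decode.lazy above: elm/bytes doesn’t ship one, so I’m assuming a small helper along these lines, which hides the construction of the inner decoder behind andThen:)

import Bytes.Decode as Decode exposing (Decoder)


-- Hypothetical helper: the continuation given to andThen only runs while
-- bytes are actually being decoded, so the inner decoder is built lazily.
lazy : (() -> Decoder a) -> Decoder a
lazy thunk =
    Decode.succeed ()
        |> Decode.andThen thunk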

Calls to makeBytesDecoder use the environment when they hit a named type.
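
To sketch the shape of that (NamedType and the Decode.fail fallback are my stand-ins here, not the real constructors):

makeBytesDecoder : Dict String (Bytes.Decoder Value) -> ResolvedSchema -> Bytes.Decoder Value
makeBytesDecoder environment schema =
    case schema of
        -- A named reference: look up the already-built decoder rather
        -- than rebuilding it from its schema.
        NamedType name ->
            case Dict.get name environment of
                Just decoder ->
                    decoder

                Nothing ->
                    Decode.fail

        _ ->
            Debug.todo "all the structural schema cases"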

But there are two issues which are stopping me.

  1. Elm’s recursive value checks within let bindings are stricter than at the top level of a module (Let binding cyclic value detection is stronger than top level · Issue #2322 · elm/compiler · GitHub)
  2. The elm-lang/lazy library is deprecated.

Now the really nice bit about this implementation is that although makeBytesDecoder is actually quite costly in some cases, the resulting decoders absolutely are not: they’re almost completely linear.

Implementing without these changes requires building the environment map and decoder every time a named type is reached… every time a record is read, which makes it so much slower.

So what is the way to help shepherd changes like this? I’m happy to put in the work on the compiler and to get the elm-lang/lazy package going again.

Will it work if you make environment a function?

Ok, that is going to loop forever. So instead of using () as the function argument, a Dict of the pairs already calculated is needed. This could be achieved by using Dict.foldl instead of map and building up the pairs in the accumulator, then only invoking makeBytesDecoder when a decoder has not already been created?
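
A rough sketch of that idea, reading the fold as running over the input pairs and threading the decoders built so far through done (untested, and still relying on the Decode.lazy helper from the first post):

buildNamedDecoders : List ( String, ResolvedSchema ) -> Dict String (Bytes.Decoder Value)
buildNamedDecoders namedPairs =
    let
        -- Grow the environment from the pairs already calculated, only
        -- invoking makeBytesDecoder for names not seen before.
        environment done =
            List.foldl
                (\( name, schema ) acc ->
                    if Dict.member name acc then
                        acc

                    else
                        Dict.insert name
                            (Decode.lazy (\_ -> makeBytesDecoder (environment acc) schema))
                            acc
                )
                done
                namedPairs
    in
    environment Dict.empty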

Looks tricky. Your call to resurrect elm-lang/lazy will fall on deaf ears, I’m afraid. You will need to find some other way to solve it.

I have a version which works, and it does indeed use environment as a function. It’s this (more or less):

buildNamedDecoders namedPairs =
    let
        environment _ =
            let
                single ( name, schema ) =
                    let
                        decoder =
                            Decode.lazy
                                (\_ -> makeBytesDecoder (environment ()) schema)
                    in
                    ( name, decoder )
            in
            List.map single namedPairs
                |> Dict.fromList
    in
    environment ()

The issue is that both the map and the decoders are built every time a record hits a named type.
If you’re thinking of decoding lots of records, it adds up.

What about making decoders that produce Environment -> Value instead? This moves the need for an environment to a later time.

https://ellie-app.com/qqWtzyXn7TFa1

type Environment
    = Environment (Dict String (Decoder (Environment -> Value)))


makeBytesDecoder : ResolvedSchema -> Decoder (Environment -> Value)
makeBytesDecoder schema =
    Debug.todo "makeBytesDecoder"


buildNamedDecoders : List ( String, ResolvedSchema ) -> Environment
buildNamedDecoders namedPairs =
    namedPairs
        |> List.map (Tuple.mapSecond makeBytesDecoder)
        |> Dict.fromList
        |> Environment
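
At the edges it would be used something like this (decodeValue is a hypothetical wrapper, just to show that the environment is only applied after the bytes have been decoded):

import Bytes exposing (Bytes)
import Bytes.Decode


decodeValue : Environment -> ResolvedSchema -> Bytes -> Maybe Value
decodeValue env schema bytes =
    -- Decoding yields an (Environment -> Value) function; named
    -- references are resolved afterwards by applying the environment.
    Bytes.Decode.decode (makeBytesDecoder schema) bytes
        |> Maybe.map (\toValue -> toValue env)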

Janiczek was faster :+1:

I was about to suggest a slightly different approach, but using functions as well:

type Env
    = Env (Dict String (Env -> Bytes.Decoder Value))


buildNamedDecoderFunctions : List ( String, ResolvedSchema ) -> Dict String (Env -> Bytes.Decoder Value)
buildNamedDecoderFunctions namedPairs =
    namedPairs
        |> List.map (Tuple.mapSecond makeDecoderFunction)
        |> Dict.fromList


makeDecoderFunction : ResolvedSchema -> (Env -> Bytes.Decoder Value)
makeDecoderFunction schema =
    Debug.todo "makeDecoderFunction"

And an example here: https://ellie-app.com/qr3XVk9rM22a1
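
The named-type case would then look the function up in the Env and run it at decode time; deferring through andThen keeps recursive references from being forced while the decoder is being built (a sketch, with the NamedType handling and the Decode.fail fallback assumed):

namedTypeDecoder : String -> Env -> Bytes.Decoder Value
namedTypeDecoder name ((Env byName) as env) =
    Decode.succeed ()
        |> Decode.andThen
            (\() ->
                -- This only runs while bytes are being read, so building
                -- the referenced decoder here cannot loop at construction.
                case Dict.get name byName of
                    Just toDecoder ->
                        toDecoder env

                    Nothing ->
                        Decode.fail
            )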


This feels a bit like an XY problem to me.
What are you trying to achieve? When implementing the Protobuf Plugin for Elm, I had my fair share of issues with recursion and Elm’s module system, but I never had to solve these with a map of schemas.
Maybe I’m wrong here, but at first glance Avro seems very similar to Protobuf. So if you are going a similar route of a base package of encoders/decoders and a codegen tool on top, I might be able to help.
Or are you not doing any code generation at all and decode the schemas at runtime?

Thanks folks.

Looks like pit’s solution works and is a bit faster than what I had originally.

My version had about 20% overhead compared to manually inlining up to N layers, while with those changes it’s only about 2% overhead.

Yeah, it might be a bit of an XY problem. My goal was to not call the make-decoder function more than once (which is still actually happening), as that was potentially a way of making things more efficient. I did build a fork of the compiler, and a version of elm-lang/lazy with the changes, but due to the way Elm elaborates let bindings into JavaScript it didn’t seem to help much (it uses a JS function anyway, which was still being called on every pass).

Regarding the differences between Avro and Protobuf:

The biggest differences come from Avro being designed to be used without code generation. Schemas are required for parsing as fields are not tagged. So the binary data can only be interpreted when the writer’s schema is looked up at run time (usually from a Schema registry).

Avro also has a richer type system with fewer gotchas, which gives a better experience building codecs manually for Elm types (as opposed to generating said types and their codecs).

In the core library the user writes a Codec, which is an invariant functor containing a parser, a writer, and a schema.

type alias Degree =
    { name : String }

type alias Student =
    { name : String, age : Int, sex : Maybe String, degrees : List Degree }

type alias Staff =
    { name : String, school : Maybe String }

degreeCodec : Codec Degree
degreeCodec =
    success Degree
        |> requiring "name" string .name
        |> record { baseName = "degree", nameSpace = [] }

studentCodec : Codec Student
studentCodec =
    success Student
        |> requiring "name" string .name
        |> requiring "age" int .age
        |> optional "sex" string .sex
        |> withFallback "degrees" (array degreeCodec) [] .degrees
        |> record { baseName = "student", nameSpace = [] }

staffCodec : Codec Staff
staffCodec =
    success Staff
        |> requiring "name" string .name
        |> optional "school" string .school
        |> record { baseName = "staff", nameSpace = [] }

person : Codec (Result Student Staff)
person =
    union studentCodec staffCodec

This is all good, but NamedTypes and recursive types (via naming) are also pretty common.

In those situations, the schema contains a type which is just a name referring to another type. So given a handful of writer schemas (which cannot be known until run time) you need to construct a (potentially recursive) binary decoder.
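
For a concrete picture, this is the classic recursive schema from the Avro specification, a linked list whose next field refers back to the record by name:

{
  "type": "record",
  "name": "LongList",
  "fields": [
    { "name": "value", "type": "long" },
    { "name": "next", "type": ["null", "LongList"] }
  ]
}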


Ah that makes sense, thanks for the explanation!
