Announcing Avro for Elm

I’m very pleased to announce my Apache Avro library for Elm.

I originally wrote this library because I was using the Haskell Avro library and wanted to explore different design directions: I was hitting limitations due to its type class heavy approach and the lack of consistency between schemas, encoders, and decoders.

It has worked out so well that I’ve ported this library back to Haskell for my own projects.

My longer term goal is to use this in a product that permits Elm to be used as the language for Kafka streaming applications, but that’s another story.

This library covers the core binary protocol of Avro and its types, as well as schema resolution and canonicalisation, leaving container formats, RPC, and things like the Kafka Schema Registry to downstream libraries.

Along the way I’ve written more than 100 tests, found a number of bugs in other implementations, and even found legitimate bugs in the specification itself.

The Elm Avro API is designed to be tidy but powerful, eschewing code generation in favour of a simple builder pattern (technically an applicative profunctor) for constructing records and deriving their encoders and decoders.

Here’s an example of building a Codec, encoder, and decoder for an Elm record type:

import Avro
import Avro.Codec exposing (..)
import Avro.Schema exposing (Schema, SchemaMismatch)
import Bytes.Decode exposing (Decoder)
import Bytes.Encode exposing (Encoder)

{-| Declaration of the data type we want to encode and decode.
-}
type alias Person =
    { name : String, age : Maybe Int }

{-| Build the Codec for it.

This type includes a Schema, encoder, and decoder.
-}
personCodec : Codec Person
personCodec =
    success Person
        |> requiring "name" string .name
        |> optional "age" int .age
        |> record { baseName = "person", nameSpace = ["demo"] }

{-| A byte encoder for a person.
-}
encodePerson : Person -> Encoder
encodePerson =
    Avro.makeEncoder personCodec


{-| Build a decoder for data written using a schema.
-}
decodePerson : Schema -> Result SchemaMismatch (Decoder Person)
decodePerson writerSchema =
    Avro.makeDecoder personCodec writerSchema
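
To show how these pieces plug into elm/bytes, here’s a small sketch of round-tripping a value. The `roundTrip` helper is purely illustrative (it assumes the data was written with the schema passed in), and only uses the standard `Bytes.Encode.encode` and `Bytes.Decode.decode` functions:

{-| Illustrative sketch: encode a person, then decode it back using
the schema the data was written with.
-}
roundTrip : Schema -> Person -> Maybe Person
roundTrip writerSchema person =
    case decodePerson writerSchema of
        Ok decoder ->
            Bytes.Decode.decode decoder
                (Bytes.Encode.encode (encodePerson person))

        Err _ ->
            Nothing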

Hey @HuwCampbell – this looks interesting!

What are the advantages/disadvantages of the Avro format over other serialisation formats, and what makes it the “first choice for streaming data pipelines” ?


Avro has the most expressive type system of the well-known binary file formats, and while it’s not quite full sums and products (and fixed points over functors), it gets pretty darn close with its unions and recursive named structs.

Most things you would want to serialise from Elm can be done easily in Avro in a type-safe way. It’s also pretty good for schema evolution: new constructors can be added to unions, for instance, and simple types can be made optional (there’s a small sketch of this below).
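
As a sketch, using the builder API from the announcement above: suppose an older writer recorded `age` as a plain, required int, while the newer `Person` codec declares it as optional. Schema resolution lets the new codec read the old data. The `.schema` field access below is an assumption about the Codec record’s shape, just for illustration:

{-| An older writer whose "age" field was a required int.
-}
type alias PersonV0 =
    { name : String, age : Int }

personV0Codec : Codec PersonV0
personV0Codec =
    success PersonV0
        |> requiring "name" string .name
        |> requiring "age" int .age
        |> record { baseName = "person", nameSpace = ["demo"] }

-- The newer Person codec (with its optional "age") can still read
-- data written with the old schema; the int resolves into the
-- nullable union. Accessing the schema via `.schema` is assumed here.
readOldPerson : Result SchemaMismatch (Decoder Person)
readOldPerson =
    Avro.makeDecoder personCodec personV0Codec.schema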

It’s also very compact, because it doesn’t tag record fields, and enums and union tags are just simple integers; unlike protobuf where all fields must be permanently tagged up front (and unions just don’t exist). It can do this under the understanding that programs reading the data have access to the schema the data was written with.
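
As a rough illustration, reusing the `Person` example from the announcement, you can check the encoded size directly with elm/bytes (this needs `import Bytes`):

-- Sketch: inspect how many bytes a value encodes to. A record like
-- { name = "Ada", age = Just 38 } comes out to just a few bytes:
-- a length-prefixed string plus a couple of varints, with no field
-- names or tags in the payload.
encodedWidth : Person -> Int
encodedWidth person =
    Bytes.width (Bytes.Encode.encode (encodePerson person))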

My statement that it’s the first choice amongst streaming pipelines is based on it being the de-facto standard across the Kafka ecosystem, which then flows on to Flink and friends.

Mostly these applications are written for the JVM and use generated code to go from an Avro schema to Java classes. It’s a bit clunky and not 100% type-safe, but it works well enough for most businesses.

