Protocol Buffers using bytes: `elm-protocol-buffers`

Some time ago I started working on implementing Protocol Buffers in Elm (see also here). Protocol Buffers can be used to ensure type-safe and future-proof communication with your back-end by defining .proto files containing an explicit interchange format. For example, here’s a service that expects a search request and gives a response:

service SearchService {
  rpc Search (SearchRequest) returns (SearchResponse);
}

message SearchRequest {
  string query = 1;
  int32 page_number = 2;
  int32 result_per_page = 3;
}

message SearchResponse {
  repeated Result results = 1;
}

message Result {
  string url = 1;
  string title = 2;
  repeated string snippets = 3;
}

Here you can find more information on Protocol Buffers. Basically, all data is encoded as a compact sequence of Bytes, leading to smaller payloads and faster (de)serialization. The package I built can be used to create encoders and decoders that handle the conversions for you.
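
For the SearchRequest message above, a hand-written encoder could look roughly like this (a sketch against the pre-release API, so names may still change):

import Protobuf.Encode as Encode

type alias SearchRequest =
    { query : String
    , pageNumber : Int
    , resultPerPage : Int
    }

-- Pair every field with its field number from the .proto definition;
-- Encode.toBytes then turns the encoder into a Bytes value.
toSearchRequestEncoder : SearchRequest -> Encode.Encoder
toSearchRequestEncoder req =
    Encode.message
        [ ( 1, Encode.string req.query )
        , ( 2, Encode.int32 req.pageNumber )
        , ( 3, Encode.int32 req.resultPerPage )
        ]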

In contrast to JSON (which is decoded as an object), Bytes come in sequentially: each key-value pair (field) is sent one after another. My initial approach was to decode the Bytes sequence into small chunks in a Dict and then decode each field, mimicking an object. However, this

  • requires the decoder to touch ‘every’ chunk twice;
  • requires the decoder to re-encode some Bytes;
  • suffers from a bug in elm/bytes.

So my current implementation handles things quite differently. When decoding a message, a default record is defined. Then, for each field that is encountered, that record is updated with the new value. This allows for

  • a linear flow of decoding (stream);
  • only touching the Bytes sequence once (which is faster);
  • an implementation that is easier to read and maintain.

However, it does slightly degrade the API, as it now requires setter functions that tell the decoder how to update a record’s field when one is encountered (see the sketch below). My next goal is to enable generating decoders & encoders for this package directly from .proto files. That would also make the need for these setter functions less painful.
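
Concretely, a decoder for the SearchRequest above could look like this, with one (here inlined) setter per field (again a sketch against the pre-release API):

import Protobuf.Decode as Decode

-- Decoding starts from a record holding the default values; whenever
-- a field is encountered in the byte stream, its setter updates the
-- record, instead of collecting chunks in a Dict first.
searchRequestDecoder : Decode.Decoder SearchRequest
searchRequestDecoder =
    Decode.message (SearchRequest "" 0 0)
        [ Decode.optional 1 Decode.string (\v r -> { r | query = v })
        , Decode.optional 2 Decode.int32 (\v r -> { r | pageNumber = v })
        , Decode.optional 3 Decode.int32 (\v r -> { r | resultPerPage = v })
        ]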

Although for now you need to write your own encoders & decoders, it would be great if you could have a look at my package for some early feedback. For now, the documentation can be previewed via elm-doc-preview.

Nice work. Do you have some example code that interacts with a gRPC endpoint?

Do you have benchmarks you can share? JSON.parse is notoriously hard to beat, so we’re very interested in this kind of data.

Thanks. Unfortunately, I do not know of any publicly available ProtoBuf/gRPC server. You can test the package using a Node server like this one by running something like this gist: Example usage of elm-protocol-buffers · GitHub.

That statement is mainly based on ProtoBuf in general, not on benchmarks of this package. Currently, I am focusing on completing my protoc plugin to get the code generation part done. Once that is complete, I’ll definitely invest some time in running benchmarks and sharing the outcome here. I am also curious to see how well it performs in Elm.

I think this is brilliant. So far at my job we’re generating encoders/decoders/request handlers, but it’s for a gRPC-JSON gateway.

We won’t be able to try this immediately, but I’m going to follow the development closely.

I also started working on a parser for .proto files some time ago, but unfortunately, I don’t think I can make time to continue working on it anymore, so I wanted to share it here. Maybe someone has some use for it!

Here’s the link: https://gitlab.com/arkandos/elm-protobuf-compiler/blob/master/src/

As far as I remember, it should parse v3 files correctly. You can open the Main.elm file in reactor to get a simple input / compiled output (using Debug.toString) view to experiment with it. However, there are still a lot of things to do:

  • There is no validation. So you could reserve a field and also use it.
  • Imports are not handled at all (they are parsed, though). This probably requires some more work to schedule the files to the parser and to resolve all the symbols in the validation step correctly.
  • It doesn’t use the Parser.Advanced module, so error messages could be greatly improved. It also does not store any position info on tokens that could be used to make better error messages.
  • Of course, there is no code generation either, because I would have had to write your package first (which looks great, btw!).

I’m also pretty sure that code generation could get rid of a lot of those getters/setters, but it might require inlining a lot of the functionality of those functions.

Thanks for sharing your project, @jreusch. Initially that was the direction I wanted to take as well. Then I found out protoc actually provides a nice plugin interface described by a .proto file. I’ve built a thin Node wrapper around an Elm project using my own elm-protocol-buffers to receive a request from protoc over a port and send back a response containing the output files. So in the end this plugin should be able to generate the files I have now written manually :slight_smile:
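
For context, the Elm side of such a wrapper might be wired up roughly like this (purely illustrative; the port names and the Base64 transport are assumptions, not the actual plugin code):

port module Plugin exposing (main)

-- Illustrative sketch: the node wrapper reads the CodeGeneratorRequest
-- from stdin, hands it to Elm over a port (here as a Base64 string),
-- and writes the CodeGeneratorResponse it receives back to stdout.

port request : (String -> msg) -> Sub msg

port response : String -> Cmd msg

main : Program () () String
main =
    Platform.worker
        { init = \_ -> ( (), Cmd.none )
        , update = \req _ -> ( (), response (generate req) )
        , subscriptions = \_ -> request identity
        }

-- Decode the request with elm-protocol-buffers, generate the output
-- files and encode them into a response.
generate : String -> String
generate encodedRequest =
    Debug.todo "decode request, generate files, encode response"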

This looks exciting. I generated JSON decoders from proto files for a project a while back. It would be nice to get rid of some of the overhead of JSON/HTTP.

I’m very excited to see this! The API feels quite natural.

I decided to take a stab at writing a compiler plugin (https://github.com/adeschamps/protoc-gen-elm). I started by sending stdin/stdout across ports and hand-writing a minimal subset of the types, decoders, and encoders from plugin.proto and descriptor.proto. I now have a protoc plugin that can read a CodeGeneratorRequest and generate a CodeGeneratorResponse by using https://package.elm-lang.org/packages/stil4m/elm-syntax/latest/ to construct and write an Elm syntax tree. It works for simple messages; there are still plenty of protobuf features that it doesn’t handle. I’m curious how much/little work would have to be done before it can generate the encoders and decoders that I was writing by hand.

Is your code generator available to contribute to? I haven’t put too much time into this yet, and if you already have something going then it would be nice to collaborate on that.

I started the same some time ago, but using the canonical JSON representation. The code is not public at the moment. Maybe it’s time to go back to it. I’m happy to chat about that.

I always get a strange feeling when I start thinking about being able to round trip, generating Google/Protobuf/Descriptor.elm :slight_smile:

Currently, I have a basic generator running (exactly the way you described). I can successfully decode a CodeGeneratorRequest, for which I had to write all decoders by hand initially. I am now working on generating modules from the request. Here I got delayed, as I found out the API was missing a Decode.lazy for recursive structs (like DescriptorProto). Once it is mature enough I will make it public, as it will support using elm-protocol-buffers.
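
To illustrate the problem: a recursive message like DescriptorProto needs a custom type wrapper (a type alias cannot refer to itself) and a lazy combinator of roughly the shape lazy : (() -> Decoder a) -> Decoder a. A sketch, assuming that combinator and the pre-release API:

import Protobuf.Decode as Decode

type DescriptorProto
    = DescriptorProto
        { name : String
        , nestedType : List DescriptorProto
        }

-- Field numbers follow descriptor.proto: name = 1, nested_type = 3.
-- The lambda inside lazy breaks the recursive cycle at definition time.
descriptorProtoDecoder : Decode.Decoder DescriptorProto
descriptorProtoDecoder =
    Decode.message (DescriptorProto { name = "", nestedType = [] })
        [ Decode.optional 1
            Decode.string
            (\v (DescriptorProto m) -> DescriptorProto { m | name = v })
        , Decode.repeated 3
            (Decode.lazy (\_ -> descriptorProtoDecoder))
            (\(DescriptorProto m) -> m.nestedType)
            (\v (DescriptorProto m) -> DescriptorProto { m | nestedType = v })
        ]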

I am not sure what you are referring to here: generating code or encoding/decoding ProtoBuf JSON? Feel free to ping me on Slack @eriktim.

Ah, yes, I noticed there would be a challenge with recursive types. I decided to just leave out the recursive fields at first.

Generating code, by first implementing DescriptorProto in Elm, manually. I started this by reading a JSON representation, but that does not influence the types.

I’m realizing that recursive types are quite challenging. I initially felt it would be nice to put nested types in their own modules. So, for example, this protobuf definition:

package google.protobuf;
message FieldDescriptorProto {
    enum Type { /* ... */ }
    optional Type type = 5;
}

would generate the following modules:

module Google.Protobuf exposing (FieldDescriptorProto)
module Google.Protobuf.FieldDescriptorProto exposing (Type)

However, aside from the issue of recursive types (which @klaftertief pointed out can be handled by wrapping messages in their own custom type) this approach would also lead to cyclic dependencies between modules, which the compiler will reject.

So I think, although I may be wrong, that each .proto file has to translate to one and only one .elm file, otherwise you wind up with dependency issues. Unfortunately, I think that also means that nested types would need to have names like type FieldDescriptorProto_Type, which seems unfortunate, so I hope there’s a more clever solution.
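
To illustrate, the flattened single-module translation might look like this (hypothetical output):

module Google.Protobuf exposing (FieldDescriptorProto, FieldDescriptorProto_Type(..))

-- The nested enum is lifted to the top level, prefixed with its
-- parent's name to keep it unique within the module.

type FieldDescriptorProto_Type
    = TypeDouble
    | TypeFloat
      -- ... remaining variants elided

type alias FieldDescriptorProto =
    { type_ : Maybe FieldDescriptorProto_Type
    }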

I did a specialised, closed-source Elm protoc plugin a while back (not implemented in Elm), and there was one Elm module per .proto file, leading to really long type names. Imports were aliased to protobuf package names. I’m not really sure what protoc checks with regard to packages, but maybe this can be used a little bit more than only aliasing.

Currently, I am using the following approach:

  • Multiple .proto files may lead to one .elm file, as I group them by their package. I feel this makes sense, as the package is what a .proto file describes. Recursive types shouldn’t be an issue here. For example, I generate:

module Google.ProtoBuf.Compiler exposing (CodeGeneratorRequest, ...)

import Google.ProtoBuf

type alias CodeGeneratorRequest =
    { fileToGenerate : List String
    , parameter : String
    , protoFile : List Google.ProtoBuf.FileDescriptorProto
    , compilerVersion : Version
    }

  • Nested type names are prefixed with their parent’s type name. This may indeed lead to long names. However, there is no way to ensure uniqueness without doing so. If you do not want this to happen, you should not use a nested type;
  • I probably won’t be able to cover all name-collision cases initially, as I am first working towards a usable release.
