Status Update - 3 Nov 2021

My exploratory compiler work has been going very well! It has been great to work like I used to back in 2011 and 2012, and I wanted to share some of the technical results to give a better idea of what I have been up to.

The ideas I will cover here are:

  1. dense binary format for Elm types to minimize disk/network usage
  2. static evaluation to avoid intermediate allocations
  3. primitive types like Int16 and UInt64

If someone were looking into making it really fast and easy to communicate with a database from Elm, these could be very valuable building blocks. If such a project did not ultimately pan out, these ideas would still be quite valuable just in the browser, ideally without massively disappointing people who might have obtained unrealistic expectations about the importance/usefulness/plausibility of the overall project.

I plan to open source the final results of the broader exploration under the normal BSD-3-Clause that I use, but I am working more like when I was doing my thesis in 2011 and 2012: Put in a lot of serious work in a safe/comfortable work environment with healthy feedback loops, and publish when you are fully ready to present and defend your technical results. The broader exploration still has many difficult open questions that I have not answered for myself (let alone everyone else!) so I expect the remaining work to take a while longer.

I realize that this working style clashes heavily with the Silicon Valley style of using hype and reckless urgency to achieve GROWTH and DOMINANCE, but I needed to make some changes in my life to keep sane. I appreciate that these changes have implications for people with bosses or consulting clients who want the “corporate open source” style of emphasizing marketing and customer service, so I very much appreciate your patience with my present working style and hope you might find these early results interesting.

Binary Format

Having a binary format for Elm types could be useful for storing Elm types on disk and for sending data between frontend and backend. These are both areas of interest for the broader exploration, so getting a basic design here was a necessary first step.

A common runtime representation uses “boxed” data, so a record would be a sequence of pointers, each pointing to the actual data held in the field. This representation has worthwhile benefits at runtime, but it does not make sense for on-disk storage or network transmission. I am also interested in monomorphizing compilers like MLton that use unboxed data, so I saw this little project as a way of exploring these larger concepts. I think going through the basic design will clarify this somewhat.

In the compilation environment I am exploring, variable length data always starts with a uint32_t stating the length of the entire chunk of data. So with a type like Maybe String I use a memory layout like this:

                   ___LENGTH__   TAG   ___DATA___...
                  /           \ /   \ /
Nothing      =>  | 00 00 00 05 |  00 |
Just "ok"    =>  | 00 00 00 07 |  01 | 6F 6B
Just "mode"  =>  | 00 00 00 09 |  01 | 6D 6F 64 65

So everything is “unboxed data” here. This means that the value Just "ok" is in a format that can be stored on disk or sent over the network directly; no special serialization is needed.

Again, I need to have the length of the whole piece of data up front, so one neat thing is that I can reuse that information to tell me the length of the string. In this case it is always LENGTH - 5.

This gets a bit trickier when you have multiple variable length values though. Take the following record type:

type alias Fruit =
  { name : String
  , quality : UInt8
  , price : UInt8
  , desc : String
  }

apple  = { name = "apple" , quality = 7, price = 2, desc = "red"              }
banana = { name = "banana", quality = 1, price = 1, desc = "serious bruising" }
mango  = { name = "fig"   , quality = 9, price = 4, desc = "ok"               }

I want each field to be addressable in constant time, so I put all the fixed size fields up front. Accessing the variable length data in constant time means you must know the relative offset, so I store that next. Finally, the variable length data is stored.

              __LENGTH___   _FIXED_   __OFFSET1__   ___VARIABLE___...
             /           \ /       \ /           \ /
apple   =>  | 00 00 00 12 | 07 | 02 | 00 00 00 0F | applered
banana  =>  | 00 00 00 1F | 01 | 01 | 00 00 00 10 | bananaseriousbruising
mango   =>  | 00 00 00 0F | 09 | 04 | 00 00 00 0D | figok

Fixed size data like fruit.quality and fruit.price always have a fixed location, so they can be accessed with expressions like fruit[4] and fruit[5]. The offsets also have a fixed location, so we can always find the offset and length of our variable length data in constant time as well.

           | ptr             | len              |
           |-----------------|------------------|
fruit.name | fruit + 10      | OFFSET1 - 10     |
fruit.desc | fruit + OFFSET1 | LENGTH - OFFSET1 |

This scheme means that for N variable length slots, you need N-1 offsets.

There are lots of tricky ways to improve upon this. One is “flattening” types, so if you have Just (3,4) it would be ideal to flatten it into | 01 | 03 | 04 |. Another is using uint8_t offsets when the value meets certain length requirements. Etc.

One tricky thing with these ideas is that changing the types of the fields changes the memory layout quite significantly. This has implications for functions like Maybe.map which are typically compiled to always expect boxed data. “Just follow the indirection!” With unboxed values, it becomes important to have a monomorphizing compiler so that Maybe.map f m behaves correctly for all the possible memory layouts that might be flowing through.

I implemented the “flattening” idea, but held off on the uint8_t idea for now. It may turn out that my enthusiasm for saving bits will be outweighed by alignment considerations or something, so I want to be in a situation where I can do realistic profiling to assess the tradeoffs there.

After doing some implementation work, I got to a point where I felt like “I have enough intuition/experience to feel confident that something nice can be done here” so I started exploring some of the implications of these kinds of layouts.

Aside: For people interested in network formats, like past me, this format might remind you of FlatBuffers or Cap’n Proto. I think using fixed-size numbers has some cool benefits for my specific scenario, so I ended up exploring that first. Ultimately, I want this sort of thing to be a compiler concern, so I think profiling will be the way forward there.

Static Evaluation

One way to monomorphize a program is to start with known input types and just evaluate it. As you evaluate, you will be computing concrete types at every step.

Take a program like this:

maybeIncr : Maybe UInt8 -> Maybe UInt8
maybeIncr maybe =
  map increment <| map increment maybe

increment : UInt8 -> UInt8
increment n =
  n + 1

map : (a -> b) -> Maybe a -> Maybe b
map func maybe =
  case maybe of
    Nothing -> Nothing
    Just a  -> Just (func a)

I worked on an implementation that evaluates this program at compile-time, leaving holes for values that will be provided at run-time. For maybeIncr it would go something like this:

    map increment <| map increment maybe

    -- eval 1st map
    map increment <|
      case maybe of
        Nothing -> Nothing
        Just x  -> Just (x + 1)

    -- eval 2nd map
    case
      case maybe of
        Nothing -> Nothing
        Just x  -> Just (x + 1)
    of
      Nothing -> Nothing
      Just y  -> Just (y + 1)

    -- apply case-in-case rule to invert the order
    case maybe of
      Nothing ->
        case Nothing of
          Nothing -> Nothing
          Just y  -> Just (y + 1)

      Just x ->
        case Just (x + 1) of
          Nothing -> Nothing
          Just y  -> Just (y + 1)

    -- evaluate cases as far as possible
    case maybe of
      Nothing ->
        Nothing

      Just x ->
        Just ((x + 1) + 1)

Knowing the type of maybe means I know the type of everything else in this program as well. So the very rough pseudocode for the statically evaluated maybeIncr is something like this:

uint8_t* maybeIncr(uint8_t* maybe)
{
  if (maybe[0] == 0)
  {
    return {0};
  }
  else
  {
    return {1, maybe[1] + 2};
  }
}

The allocation of the return value works rather differently than this, but that is the rough idea! The neat things with static evaluation are:

  1. No intermediate allocations. You only allocate at the very last moment. (Two allocations become one.)
  2. The number of branches on the way to a return is less than or equal to before. (Two branches became one!)

The cost is that the case-in-case rule can increase code size quite a bit in some cases! Or to put it another way, how does this work on recursive programs? Basically, you have to stop static evaluation at some point.

Languages like Datalog and SQL restrict the use of loops and recursion, favoring constructs that can be given more efficient implementations. Languages like Lucid Synchrone use this kind of restriction in very interesting ways, providing nice upper bounds on time and memory usage for certain operations. Totally unrelatedly, I expect the function call depth to naturally be pretty shallow in the domain I am exploring, so I felt comfortable starting with the naive approach of just evaluating as much as possible.

One neat thing about this static evaluation work is that it echoes a design from my thesis! I needed to figure out the Signal graph statically to guarantee certain properties. My implementation never actually did this work at compile-time, but my thesis showed that such an implementation was possible. The motivation now is pretty similar, but (1) with different types and properties and (2) I actually MUST do the work at compile-time this time :sweat_smile:

And if my naive static evaluation ends up producing code that is too large (maybe instruction locality is harmed?) I only MUST do it a small and finite amount, so once the overall exploration wraps up, it may be worth looking into balancing the depth of such optimizations.

Primitive Types

It is very common to use specific size integers in databases. A field for human height in centimeters might reasonably be expected to always contain a value less than 65536, so you could use a UInt16 to save some space. (Unfortunately, there are six people who were over 255cm, so it cannot safely be a UInt8.)

Totally unrelatedly, I have been looking into types like Int16, UInt64, and arbitrary size integers. Everything was pretty straightforward for fixed-size types like UInt8 and Int32, but I wanted to make sure I would be able to handle arbitrary size numbers as well. I ended up exploring a BIGNUM implementation that revealed some interesting scenarios.

When compiling x + 1, Elm just generates a JS + and lets things happen from there. That is not workable in C where the BIGNUM library needs to malloc memory for the result of BN_add or BN_mul. I need to know the specific type being added or multiplied, so static evaluation helped a lot here!

A fun part of all this was learning about Sethi-Ullman Register Allocation. That helped me minimize the number of BIGNUM allocations, and I made some tweaks to reduce allocation a bit more with usages of BN_add_word and BN_sub_word. So an expression like x ^ 8 + 12345 * x + 1 would become something like this:

    // allocate just enough
    BN_CTX* ctx = BN_CTX_new();
    BN_CTX_start(ctx);
    BIGNUM* reg1 = BN_CTX_get(ctx);
    BIGNUM* reg2 = BN_CTX_get(ctx);
    BIGNUM* reg3 = BN_CTX_get(ctx);

    // 12345 * x
    BN_set_word(reg1, 12345);
    BN_bin2bn(x, x_length, reg2);
    BN_mul(reg1,reg1,reg2,ctx);

    // _ + 1
    BN_add_word(reg1,1);

    // x ^ 8
    BN_set_word(reg3, 8);
    BN_exp(reg2,reg2,reg3,ctx);

    // _ + _
    BN_add(reg1,reg1,reg2);

    // convert to disk/network format
    BN_bn2bin(reg1, ptr);

    // free everything
    BN_CTX_end(ctx);
    BN_CTX_free(ctx);

I can definitely improve my implementation in a few ways, especially when it comes to casting expressions and error handling, but again, the point was to make it good enough that I felt comfortable moving on to other questions.

Aside: My simple approach to static evaluation requires knowing the input types statically. One tricky additional case is an integer literal like 42. Is that an Int32 or UInt8 or something else? This is basically the one case where I need to carry type information through from type inference, so one pretty big benefit of exploring primitive types has been that I figured out how to do that. This learning can help optimize a few frontend cases, like certain (==) and (<) uses, so I expect that to come in handy no matter what happens with the overall exploration.

Personal

If you made it through all that, I have some more personal updates as well!

As I allude to in the roadmap, nearly ten years of working in constant interaction with “silicon valley mindset” had taken a toll on me in many ways. Within the dominant value system, there are specific rewards and punishments for specific behaviors, and despite my personal views on this value system, I had internalized certain patterns of thought and behavior by interacting with it online.

Physically moving to a region with different cultural norms gave me the distance to understand this more fully. I started recognizing the patterns that I had absorbed, and I began to figure out alternatives that would be personally sustainable.

One aspect of all this is wanting to fund my work on Elm in a way better way. My goal there is to have a more direct feedback loop. It always used to be:

       ME
     /    \
    /      \
Company   Compiler
   ^        |
   |        V
  ???    Elm users

The ??? was always something that I had relatively little control over. Will ??? punish me for prioritizing X over Y? Should I just work to the absolute limit and hope ??? is satisfied? This uncertainty was not so good psychologically. Without getting into specifics, my experience suggests that it is very difficult to please ???, particularly when VC people get involved.

I think larger organizations often have more leeway to push back on VCs or investors than one person working alone, but I think the ??? starts to get more and more sketchy in these cases. I outlined some ways in which “corporate open source” can be used to solidify market dominance, like lowering Traffic Acquisition Costs.

Many open source projects fix the ??? by having a more direct relationship with the users:

      Team
    /     \
   /       \
Users <- Open-source software

People give away software, and that demonstrates some expertise that some fraction of users might want more of. The paid offering is totally optional for anyone using the project, but some users might be into it for whatever reason. Maybe that is consulting. Maybe it is a product like hosting, monitoring, analytics, etc.

You can see variations of this in Elixir, Julia, and elm-ts-interop, or in cases I know less about like Clojure, Scala, nginx, Red Hat, etc. I talked with José Valim of Elixir and Jeff Bezanson of Julia about some of these options here and on Elm Radio here. I do not think “corporate open source” works for every project, so I find these past instances of “sustainable open source” and “independent open source” encouraging. People get to use neat open-source languages, and the authors can also have a more typical life and work environment.

My wife (Tereza) and I can support our present full-time work indefinitely with the aid of periodic consulting contracts, but based on our personal skills and dispositions, our goal is to help people get their programs working more easily through the hosting/monitoring/analytics type pathways. Maybe a little “deploy” button in the online editor so people can put up hobby projects really easily in a way that directly supports work on Elm. Whether the people who like such a thing are newcomers, students, scientists, hobbyists, or small companies, I think any of them would create the great kind of self-reinforcing loop described in this talk that makes the overall usership more interesting and diverse. And no VC genius can come in with the genius plans they learned from studying United Fruit at Harvard Business School.

Anyway, I hope you found this post somewhat interesting. I really appreciate the people who have been supportive of my present working style and priorities. I have found it really invigorating to work more like when I was creating Elm in the first place. Low expectations, comfortable work environment, deep focus on technical foundations, psychologically healthy feedback loops, etc. So I hope that people will not get too excited about anything I have shared here. I said it best in the roadmap, but I want to emphasize it again. There is a lot more work left to do on these explorations. Maybe it will be cool. Maybe not. If you like what you see in 0.19.1 now, that’s pretty much what Elm is going to be for a while!

Finally, shout out to people like Mario, Dillon, Jeroen, and Luca who have found ways to take on some of the community tasks with online meetups, podcasts, and blog posts! And the folks posting cool Elm projects on Twitter! As I come to accept certain personal limitations with online interaction, it is encouraging to see others creating and sharing wholesome programs and events! :blush:

127 Likes

I’m glad to hear you’ve found a way that works for you :blush:

9 Likes

Thank you for all this interesting news! Pretty excited to see it land in the compiler!

It seems like you are interested in “backend stuff” (e.g. wanting to communicate with SQL databases via a dense binary format). So is the implicit meaning of your post that Elm is moving toward something more “fullstack”?

Also I noticed you moved the target language from JS to C (with the BIGNUM part). Do you plan on using some Elm → C → WASM pipeline?

5 Likes

Thank you for this update! It’s really satisfying to know what “some exploratory work with compiler techniques and targets” is, even knowing that it might not pan out.

I really appreciate your effort and dedication to making Elm a solid and reliable foundation, and look forward to hearing more as your work continues.

Thank you,
David

7 Likes

Hey, very interesting to hear some updates. Excited to see what it will eventually turn into. Also I am glad you seem to have found a way to make working on Elm more pleasurable and enjoyable for yourself again and I hope it stays that way given all the different economic and social forces you outlined.

2 Likes

thanks for the update, evan! :slight_smile:

looking forward to the new possibilities these serialization changes will open up. I can imagine frameworks like lamdera and wasm efforts would greatly benefit from it, right?

3 Likes

Thank you for sharing about this exploration, very interesting.

As someone “looking into making it really fast and easy to communicate with a database from Elm,” when I hear “communicate with a database,” in my mind it always entails an existing server product with its wire protocol already defined (say Redis, or Postgres, or MySQL). Part 1 of the update talks about a dense binary format for Elm types. Assuming the case of an existing-database-product-driven application, wouldn’t the conversion between the backend’s on-the-wire bits and this specific dense layout negate a solid chunk of the performance gained from the described optimizations?

Can a compiler be made to adapt memory layout of its types based on a compile-time definition, instead of having a single specific binary layout?

For example: say you want to communicate with a database backend and have a way to feed the wire protocol directly to the browser (let’s use Postgres for an example, though any would do). Your database instance has a schema (type definitions) and a stable wire protocol: strings consist of utf-8 chars prefixed with an Int32, a record is tagged with a UInt8 value 0x44 followed by UInt32 length, followed by UInt16 for number of fields; a -1 in the field length position indicates a Nothing value of a Maybe type (nullable value), etc. With appropriate api design we can also order fixed-length and variable length fields in records as well.

If there was a way to feed the schema and a definition of this binary layout in some form during the compilation step to create a program that natively lays out its types in memory the same way as my backend is sending them, my client-side network buffer would contain a list of records laid out as if I created it in the local program, so there would be no need for ports or to consider the bits to be foreign in any way.

How much of the speedup would that be over generating a serializer/deserializer library for the single micro-optimized type layout? My intuition is it’d be a solid win for this specific class of applications, as they are dominated by data exchanges between the front end and the back end.

To my knowledge, no general purpose programming language does that. There’s the “The Bits Between the Lambdas” paper exploring something along those lines for Haskell, and there are a few protocol-parsing languages, but I believe those create serializing/deserializing code and do not necessarily introduce specific type layouts, though I did not explore that in depth yet.

Is such a design even possible?

Apologies in advance if the discussion is uncalled for or the subject too far off the mark.

–
[1] For some background: I’m presently feeding postgres wire protocol to the browser, and generating binary parsers in elm from a database schema and the api definition. Even with the slowness of Elm binary parsers compiled to javascript, this results in at least an order of magnitude speedup for the whole application compared to “classic backend” design. Removing the “middle-tier” backend and feeding the wire protocol directly to Elm in the browser brings latency from >150ms down to single-digit-ms per request. The 140+ ms delta turns out to be the backend code unproductively shuffling bits around on the server through the database driver and endless conversions to/from programming language’s types. I often think about how to speed things up further. Obvious next steps are parsing with web assembly instead of Elm (and dealing with interfacing between Elm and such parsers), but having a compiler that produces programs with zero binary layout conversions would probably be the most efficient way.

13 Likes

What a great update! Thanks so much @evancz. So glad you have found your happy working stride. Looking forward to seeing where all your research and proofs land in the end.

2 Likes

I was researching Protobuf vs FlatBuffers for usage in Elm when I luckily landed on this post. As others have said, very good to see explorations happening for Elm again!

Would it not be possible to integrate FlatBuffers (or a variant) directly with Elm? Or is the concept of it being a “compiler concern” promising enough that it warrants a custom approach? I understand the performance would of course be better if the serialization format is in line with Elm’s memory layout, but perhaps you have already considered such an integration.

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.