Benchmarking overhead of bytes bounds checks


#1

This relates to elm/bytes issue 9: Decoder suppresses stack overflow.

Currently, any error thrown in a bytes decoder is caught and makes the decoder return Nothing.
The reason is that a read (for say a 32-bit integer) can try to read bytes that are out of bounds.
But, exceptions caused by infinite recursion or Debug.todo are also caught and turned into Nothing.

When a byte out of bounds is accessed, a RangeError is thrown. The problem is that infinite recursion (i.e. a stack overflow) will also throw a RangeError, which makes it hard to distinguish these two cases.

But ideally, a Decoder would only result in Nothing when it does out-of-bounds access.
If that were the case you are immediately alerted of infinite recursion or a Debug.todo in a decoder, and when the decoder results in Nothing you know what to look for (too many bytes are read).

I see two options:

  • more fine-grained try/catch such that the only caught exceptions must be out-of-bounds access
  • bounds checks on the buffer before access

In both cases the rest of the code is not in a try/catch block so if exceptions occur they will bubble up.

Benchmark

Evan points out in the issue that the bounds check might be expensive. So I made a benchmark (https://jsperf.com/bytes-bounds-check/6):

In Chrome (72.0.3626)

  • the bounds check a third slower (when the offset is valid)
  • a try/catch has no effect on performance when no exception occurs (why is it faster than the baseline though?)

In Firefox (64.0.0)

  • bounds check is a quarter slower (when the offset is valid)
  • a try/catch still has no effect on performance when no exception occurs

Conclusions

  • An up-front bounds check is significantly slower for valid offsets
  • A try/catch block has no impact for valid offsets

Open questions from my side:

  • are the test cases correct, maybe they are optimized in a way that the elm/bytes code could not be?
  • I tested this with recent(ish) chrome and firefox on linux. Maybe older/other browsers give different numbers?

Edits
Thanks @joakin for pointing out that the Chrome JIT was optimizing in an implausible way. The benchmark was updated to pick a random offset on every iteration. This makes the numbers roughly consistent between browsers.


#2

Benchmarking can be tricky with the JIT in place. Most times where there is such a stark difference in the results, specially to the point of some being absurdly performant like in the chrome case, I’m pretty sure the benchmark is faulty and the JIT has found out that the code is a NoOp and is basically optimized it all away removing it, which is why it seems so fast.

I’ve run it on an iPhone (safari) and the results seem more like Firefox and less like Chrome.

I think you may need to rethink the benchmark.


#3

Thanks

I updated the benchmark to:

  • write the result to a global variable (I think that should not be optimized by the JIT)
  • pick the offset randomly on every iteration

This gives results in chrome that are roughly consistent with the FF and mobile Safari numbers we got.
Unless there are other obvious errors in these benchmarks I’ll try to locally modify elm/bytes tomorrow and run elm-benchmark to see what the numbers are like in practice.


#4

If the upfront bounds check is too expensive, it could instead be put in the try catch. This may get the best of both worlds.

I am thinking something like:


try {
    window.y = view.getInt32(offest);
} catch (e) {
    if (offset + 4 <= buffer.byteLength) {
        // Access is valid and error comes from another source.
        throw e;
    }
}