Plots and outlier removal in elm-benchmark 2.0.3

I ended up with what we have now by going through a lot of the same things the two of you are saying. 🙂 TL;DR: sounds like you two have interesting ideas, let’s all collaborate to make things nicer.

Scale benchmarks are marked as experimental for some of the reasons y’all are saying. They’re a tool I wanted to get into people’s hands to see what you’re doing… it’d really help me if you could open issues on the repo with your use cases.

More inline…

I’d really rather not. If you visualize your data every time, you tend to develop a heuristic for how things should look. Once you get a feel for that, you can spot weirdness pretty easily, including the kind that a purely statistical approach wouldn’t catch.

Anscombe’s quartet illustrates this in an interesting way: it contains four data sets which have nearly identical summary statistics (mean, variance, correlation) but are visually distinct. (A more recent, and IMO funnier, take: the Datasaurus Dozen.)
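To make that concrete, here’s a quick sketch in Python (nothing to do with elm-benchmark’s internals; the names `QUARTET`, `pearson`, and `summarize` are just made up for illustration). It uses the published quartet values and only the standard library, and computes the mean and variance of y plus the x/y correlation for each set:

```python
from statistics import mean, variance

# Anscombe's quartet (1973). Sets I-III share the same x values.
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient, computed by hand to avoid version-specific helpers."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def summarize(xs, ys):
    """Return (mean of y, sample variance of y, correlation of x and y)."""
    return mean(ys), variance(ys), pearson(xs, ys)


for name, (xs, ys) in QUARTET.items():
    m, v, r = summarize(xs, ys)
    print(f"set {name:>3}: mean(y)={m:.2f}  var(y)={v:.2f}  corr={r:.2f}")
```

The printed rows come out nearly the same for all four sets, yet a scatter plot of each looks nothing like the others. That gap is exactly what the plots are there to cover.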

See if you can get used to it. I’d hate for you to miss out on the benefit of your brain’s excellent heuristic creation. If you’re having a really bad time, I’d welcome a PR to make the plots slightly smaller/denser or adjust their layout to be less intrusive. (Caveat: horizontal space is off-limits because of some plans I’m not ready to talk about just yet.) Shortening the plot area might be a good way to do this, but it makes series look odd.

I’m curious about your use case for this. Why would this be more important than verifying the samples are acceptable? I’m having some trouble imagining what you want, as well. Can you link to an example of this visualization elsewhere?

I considered plots like this, but the primary motivation is to show the data collected so you can decide whether or not to trust it. Bar plots with runs/second would be a nice alternative, but I’m not sure how to fit them in visually without spreading out vertically even more.

This would be pretty cool, but depending on the visualization it would require a pretty big refactoring to keep track of the input size. If you want more visualization options, it makes sense to start elsewhere.