Hello folks!
A week ago I posted about elm-minithesis.

I believe it’s roughly ready for publishing / for larger discussions about inclusion into elm-test, but I’d like to first:

- gather feedback on the API
- benchmark against elm-test
- compare behaviour against elm-test
For that, I need your help! Let’s talk about each point:
1. API feedback
Here are the preview docs (thanks @jfmengels!)
Could you please read them through and tell me your suggestions for wording and even API design? E.g. there are differences from elm-test’s Fuzz module, like:
```elm
-- elm-test:
intRange : Int -> Int -> Fuzzer Int
int : Fuzzer Int

-- elm-minithesis:
int : Int -> Int -> Fuzzer Int
anyNumericInt : Fuzzer Int
```
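To make the difference concrete, here’s a sketch of one property written in both styles. I’m guessing at the elm-minithesis entry point (`Minithesis.test`) and at `MF.list` based on the benchmark template in section 2; the preview docs are authoritative, so adjust the names accordingly:

```elm
module Example exposing (..)

import Expect
import Fuzz as F
import Minithesis
import Minithesis.Fuzz as MF
import Test exposing (Test, fuzz)


-- elm-test style: the test function returns an Expectation.
reverseKeepsLength : Test
reverseKeepsLength =
    fuzz (F.list F.int) "reverse keeps length" <|
        \xs ->
            List.length (List.reverse xs)
                |> Expect.equal (List.length xs)


-- elm-minithesis style (sketch): the test function returns a Bool,
-- as in the benchmark template in section 2. `Minithesis.test` and
-- `MF.list` are assumptions on my part.
reverseKeepsLengthMini =
    Minithesis.test "reverse keeps length"
        (MF.list MF.anyNumericInt)
        (\xs -> List.length (List.reverse xs) == List.length xs)
```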
Could you also try using the library, rewriting some of your existing fuzz tests in this style, and telling me what was surprising / unexpected / confusing? (Also perhaps contributing them to the benchmarks, see #2.) You’ll need to vendor the elm-minithesis source code / add it to your `source-directories`, since it’s not published yet.
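For the vendoring route, the relevant bit of your application’s `elm.json` might look like this (the clone location is just an example; point it at wherever you put the elm-minithesis checkout):

```json
"source-directories": [
    "src",
    "../elm-minithesis/src"
]
```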
2. Benchmark against elm-test
I’ve created an example benchmark and a template for writing your own:
```elm
main =
    ourBenchmark
        { name = "int 0 10000"
        , minithesisFuzzer = MF.int 0 10000
        , elmTestFuzzer = F.intRange 0 10000
        , minithesisFn = \i -> i < 5000
        , elmTestFn = \i -> Expect.lessThan 5000 i
        }
```
I’d like to benchmark various fuzzers and how they behave with various test functions. Perhaps we find out that `frequency` is much slower in elm-minithesis and needs to be optimized, or something similar.
Also, (real-world) combinations of fuzzers would be helpful.
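For instance, a `frequency` benchmark entry to be wired into `main` like the template above might look like this. I’m assuming `MF.frequency` and `MF.constant` mirror their elm-test counterparts; adjust to the real signatures:

```elm
frequencyBenchmark =
    ourBenchmark
        { name = "frequency int/constant"
        , minithesisFuzzer =
            -- assumed to mirror elm-test's Fuzz.frequency
            MF.frequency
                [ ( 3, MF.int 0 100 )
                , ( 1, MF.constant 0 )
                ]
        , elmTestFuzzer =
            F.frequency
                [ ( 3, F.intRange 0 100 )
                , ( 1, F.constant 0 )
                ]
        , minithesisFn = \i -> i <= 100
        , elmTestFn = \i -> Expect.atMost 100 i
        }
```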
3. Compare behaviour against elm-test
There are differences:
- elm-minithesis stops doing extra work after it finds a failing example and shrinks it fully; I suspect elm-test finishes all 100 runs even if it has already found and shrunk a counterexample. If true, this probably makes the benchmarks a little bit apples-to-oranges. The elm-minithesis behaviour makes sense to me, though.
- There might be differences in the distributions of floats, lists, etc.
- I hope some of your testing could uncover bugs / issues, like:
  - “I’d expect it to find a counterexample for XYZ but it never did!”
  - “With these `frequency` weights my elm-test test never triggered a stack overflow, but elm-minithesis does!”
  - Unexpected shrink “targets”: e.g. I know of a bug where `int -5000 5000` will shrink towards -5000 instead of 0. Are there more?
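If you want to poke at that last one, a minimal always-failing property should surface the shrink target directly (same caveat as before: `Minithesis.test` is my guess at the entry point, and how you run it depends on the runner):

```elm
-- Every generated value is a counterexample here, so whatever the
-- shrinker finally reports is its "target". I'd expect 0 for a range
-- straddling zero, but the bug mentioned above reports -5000.
shrinkTargetProbe =
    Minithesis.test "shrink target of int -5000 5000"
        (MF.int (-5000) 5000)
        (\_ -> False)
```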
I’ll be very glad for any feedback you give me on this.
I’m in touch with @drathier about possible integration of this into elm-test, and the benchmarks etc. are important steps before we can decide on that in any more detail.
Thanks and stay safe, folks!