Hello folks!
A week ago I posted about elm-minithesis.
I believe it’s roughly ready for publishing / for larger discussions about inclusion in elm-test, but I’d like to first:
- gather feedback on the API
- benchmark against elm-test
- compare behaviour against elm-test
For that, I need your help!
Let’s talk about each point:
1. API feedback
Here are the preview docs (thanks @jfmengels!)
Could you please read them through and tell me your suggestions for wording and even API design? E.g. there are differences from the elm-test Fuzz module, like:
-- elm-test:
intRange : Int -> Int -> Fuzzer Int
int : Fuzzer Int
-- elm-minithesis:
int : Int -> Int -> Fuzzer Int
anyNumericInt : Fuzzer Int
Could you also try using the library: rewrite some of your existing fuzz tests in this style and tell me what was surprising / unexpected / confusing? (Also, perhaps contribute them to the benchmarks; see #2.) You’ll need to vendor the elm-minithesis source code / add it to source-directories, since it’s not published yet. There’s a sketch of such a rewrite below.
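To give you an idea, here’s a rough sketch of what such a rewrite might look like. Treat Minithesis.test and MF.list as my assumptions from skimming the preview docs; the real point is that the elm-minithesis test function returns a Bool instead of an Expectation:

import Expect
import Fuzz as F
import Minithesis
import Minithesis.Fuzz as MF
import Test

-- elm-test: the fuzz test function returns an Expectation
reverseTwiceElmTest : Test.Test
reverseTwiceElmTest =
    Test.fuzz (F.list F.int) "reversing twice is identity" <|
        \xs -> Expect.equal xs (List.reverse (List.reverse xs))

-- elm-minithesis (assumed API): the test function returns a Bool
reverseTwiceMinithesis =
    Minithesis.test "reversing twice is identity"
        (MF.list MF.anyNumericInt)
        (\xs -> List.reverse (List.reverse xs) == xs)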
2. Benchmark against elm-test
I’ve created an example benchmark and a template for writing your own:
main =
    ourBenchmark
        { name = "int 0 10000"
        , minithesisFuzzer = MF.int 0 10000
        , elmTestFuzzer = F.intRange 0 10000
        , minithesisFn = \i -> i < 5000
        , elmTestFn = \i -> Expect.lessThan 5000 i
        }
I’d like to benchmark various fuzzers and how they behave with various test functions. Perhaps we find out that frequency is much slower in elm-minithesis and needs to be optimized, or something similar.
Also, (real-world) combinations of fuzzers would be helpful.
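For example, a list-of-ints benchmark could reuse the same template. This is a sketch only: listBenchmark is a hypothetical name, and MF.list is my assumption about the elm-minithesis combinator name:

listBenchmark =
    ourBenchmark
        { name = "list (int 0 10000)"
        , minithesisFuzzer = MF.list (MF.int 0 10000)
        , elmTestFuzzer = F.list (F.intRange 0 10000)
        , minithesisFn = \xs -> List.length xs < 50
        , elmTestFn = \xs -> Expect.lessThan 50 (List.length xs)
        }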
3. Compare behaviour against elm-test
There are differences:
- elm-minithesis stops doing extra work after it finds a failing example and shrinks it fully; I suspect elm-test finishes all 100 runs even if it has already found and shrunk a counterexample. If true, this probably makes the benchmarks a little apples-to-oranges. The elm-minithesis behaviour makes sense to me, though.
- There might be differences in the distributions of floats, lists, etc.
- I hope some of your testing can uncover bugs / issues, like:
- “I’d expect it to find a counterexample for XYZ but it never did!”
- “With these frequency weights my elm-test test never triggered a stack overflow, but elm-minithesis does!”
- Unexpected shrink “targets”: e.g. I know of a bug where int -5000 5000 will shrink towards -5000 instead of 0. Are there more? (There’s a probing sketch right after this list.)
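One cheap way to probe shrink targets is a test that always fails: the fully shrunk counterexample it reports is exactly the value the fuzzer shrinks towards. A sketch, assuming Minithesis.run takes a seed and a test (check the preview docs for the actual entry point):

shrinkTargetProbe =
    Minithesis.run 0
        (Minithesis.test "shrink target of int -5000 5000"
            (MF.int -5000 5000)
            (\_ -> False)
        )

-- correct shrinking would report 0 as the minimal counterexample;
-- the known bug reports -5000 instead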
I’ll be very glad for any feedback you give me on this.
I’m in touch with @drathier about possibly integrating this into elm-test, and the benchmarks etc. are an important step before we can decide that in any more detail.
Thanks and stay safe, folks!
