Elm-minithesis: shrinking without compromises

Hello! I’ve ported @'drmaciver’s Minithesis to Elm: elm-minithesis. (Phew, I’ve tried porting Hypothesis a few times over the years and never managed to do it, it’s very stateful.)

It’s a property based testing library similar to elm-explorations/test's Fuzz module, with the difference that it uses a shrinking method that has less disadvantages.

In short, it shrinks the underlying PRNG’s decisions instead of the generated values themselves, and that sidesteps the conventional shrinkers’ trouble with shrunk andThen-ed values no longer satisfying preconditions, since it instead generates the values from scratch each time.

So, elm-minithesis has Fuzz API much closer to the Random.Generator APIs; doesn’t expose any shrinking details to you the user, and lets you use andThen in your fuzzers.

I’m planning to add some adapters so that it can be used comfortably in elm-explorations/test, define some more fuzzers and then publish it.

See the test suite for usage examples.

Do you have any suggestions / questions / concerns? I’d like to get the 1.0.0 release as good as possible :slight_smile:

25 Likes

Extremely cool. I’m looking forward to reading the source and getting some better understanding of the differences.

Are you planning on (eventually) contributing this to elm-exploration/test? I feel like having a standard testing package is pretty handy and this seems like an easier approach (for the user).

1 Like

Looks cool! Is this approach also resistant against causing stack overflows?

I still want to clean up and refactor the code after I get a bit of distance from it after the initial port :joy: At least some nicer separation into modules. But hopefully it should be easy to digest. Reading from the first commit forward helps too.

I would love to. Doing this as a separate library has had some differences in API as a result, since I can’t use JS kernel functions. So, for example, tests can’t be Test but must be Test a where a is the type of the value generated by the fuzzer. elm-explorations/test does some black magic with Elm.Kernel.Debug.toString so that it doesn’t have to :upside_down_face:

Also, so far I believe this approach to be superior but I (or somebody) will have to compare/benchmark them side by side a bit closer. Right now it’s all just big words without any numbers :slight_smile:

I’ll create an issue in elm-explorations/test to start discussion about adopting this approach.

I believe andThen in principle can’t be totally safe against stack overflows; for example:

import Minithesis as M
import Minithesis.Fuzz as F

badFuzzer =
  F.unit
    |> F.andThen (\() -> badFuzzer)

M.run 0 <| M.test "" badFuzzer (\_ -> True)
--> RangeError: Maximum call stack size exceeded

But I believe I could include some trampolining / looping support like what elm/bytes or elm/parser have. (Maybe I’m conflating two unrelated ideas, didn’t look that much into it yet.)

1 Like

Gotcha. I understand that recursively defined fuzzers can overflow if the user is careless. I was thinking more of elm-test’s tendency to cause stack overflows when met with even slightly complicated fuzzers (i.e. Fuzz.list (Fuzz.map2 Tuple.pair Fuzz.float Fuzz.float))

Looks really cool! The only obvious feedback I have is that it would be great to flesh out the Fuzz module with support for more complex things like Floats and Chars and Strings.

I’d love to have a Float fuzzer with better shrinking behavior than elm-test - I think it currently just tries to make Float values as small as possible, which means I end up getting test failures in elm-geometry for things like “a point with coordinates (0.0000001, 0.00000003, -0.000000002)” which is pretty hard to read/reason about. It would be great to try to set things up so the shrinking tended to be more towards more “round” numbers with fewer decimal digits or something like that.

1 Like

I was reading the paper on this and coincidentally came across this right after you mentioned how 0.000001 is less useful than 1.0

3 Likes

Ah, I remember hitting that too myself.
The answer right now is “I don’t know”; I’ll have to test elm-minithesis more thoroughly and find its limits.

One thing I can say is, I deliberately followed case...of pattern instead of stuff |> Result.andThen |> Result.withDefault etc., so hopefully the library itself doesn’t make the situation much worse than the user-defined fuzzers already will. But, again, testing and measuring needed.

Oh yeah. Right now it’s basically just Ints, Bools, Lists :slight_smile: I will add more fuzzers soon!

Hmm, interesting. Floats are tricky :slight_smile: I don’t have any clear answers right now, I’ll probably ping you when I start playing with Float fuzzers and we can try and come up with something that would be useful to your failing tests?

(And yeah, that paper @MartinS linked is great!)

1 Like

Issue created!

So, for example, tests can’t be Test but must be Test a where a is the type of the value generated by the fuzzer. elm-explorations/test does some black magic with Elm.Kernel.Debug.toString so that it doesn’t have to :upside_down_face:

I made a rough proposal a while ago that might be relevant here: Enable test runners to produce structured diff · Issue #147 · elm-explorations/test · GitHub. It would be be good to remove or atleast encapsulate that black magic a bit.

1 Like

Forgive me for asking a question before I have read about minithesis fully:

How does this shrinking method ensure that generating smaller random integers gives simpler test cases? Is it just by careful design of the Fuzz.xxxx functions?

Yes, in general the shrinking is sensitive to how the Fuzzers are implemented. For example, two ways to write a List fuzzer would be:

1. Throw a coin to decide whether to generate another item

(Used in elm-minithesis right now)

list : Fuzzer a -> Fuzzer (List a)
list itemFuzzer =
  let
    go : List a -> Fuzzer (List a)
    go acc =
      Fuzz.weightedBool 0.9
        |> Fuzz.andThen (\bool ->
          if bool then
            itemFuzzer
              |> Fuzz.andThen (\item -> go (item :: acc))

          else
            Fuzz.constant (List.reverse acc)
        )
  in
  go []

2. Decide a list length beforehand

(didn’t try, and it has a different distribution and other problems, but is possible)

list : Fuzzer a -> Fuzzer (List a)
list itemFuzzer =
  let
    go : Int -> List a -> Fuzzer (List a)
    go leftToDo acc =
      if leftToDo <= 0 then
        Fuzz.constant (List.reverse acc)

      else
        itemFuzzer
          |> Fuzz.andThen (\item -> go (leftToDo - 1) (item :: acc))
  in
  Fuzz.int 0 20
    |> Fuzz.andThen (\length -> go length [])

The two approaches will have different representations for the same values:

Generated list value PRNG for 1) Throw a coin PRNG for 2) Decide length beforehand
[] [0] [0]
[42] [1, 42, 0] [1, 42]
[42, 99] [1, 42, 1, 99, 0] [2, 42, 99]
[42, 99, 5] [1, 42, 1, 99, 1, 5, 0] [3, 42, 99, 5]

And again, given shrinkers are generic and aren’t customized per fuzzer, some fuzzer strategies might shrink better than others.

As an anecdote, I’m currently reading how generating Floats works in Hypothesis. It’s intense. And cares more about how it shrinks than what the distribution is!

2 Likes

Basic Float fuzzers are now live! By @ianmackenzie’s testing they shrink to nice readable values, so… I consider that a success :slight_smile:

7 Likes

The main current blocker is:

  • porting the frequency fuzzer which in Hypothesis uses a fancy algorithm
  • fleshing out the floatWith fuzzer using that :arrow_up:

After that, I believe we can start testing it, benchmarking and comparing it against elm-test - I’d love your help with that. I’ll write this up in more detail when the codebase is ready :slight_smile:

I wonder if the fancy algorithm is worth it, given that arrays are not exactly constant time in Elm.

Seems like some benchmarking in Python seems to indicate that it only becomes worthwhile with extremely large numbers of samples:

https://bugs.python.org/msg197540

1 Like

Is it possible to try elm-minithesis for package tests before it gets published? I’d be interested in trying to fuzz sets of dependencies to see if some of my Debug.todo branches get triggered. How did you try it @ianmackenzie?

I feel like currently the easiest solution would be to clone elm-minithesis somewhere and add that path to your source-directories. I’m preparing a “call for testing and feedback” post where I’ll say how people can do this, but hopefully the above will unblock you?

From my conversations with Ian, I believe he has just written his test inside cloned elm-minithesis repo.

The issue is there is no source-directories for a package. But there are only 3 items in your src/ directory so I’ll try just symlink those

1 Like

I can confirm that simlinks work. I’ve just written my first elm-minithesis test and got the that nicely shrunk failure.

✗ uncorrelatedDependencies

Given 0

    FailsWith [(("",Version { major = 0, minor = 0, patch = 0 }),("",Range []))]
    ╷
    │ Expect.equal
    ╵
    Passes

It took 1 full minute though on a 10th gen i7 CPU to fuzz these values. The fuzzer is built like this:

type alias Dependencies =
    List ( ( String, Version ), ( String, Range ) )


{-| Names of packages and their dependencies are not correlated,
so most probably, this will lead to an unsolvable set of dependencies.
-}
uncorrelatedDependencies : MF.Fuzzer Dependencies
uncorrelatedDependencies =
    MF.list (MF.map2 Tuple.pair packageVersion packageRange)


{-| Random package name and range
-}
packageRange : MF.Fuzzer ( String, Range )
packageRange =
    MF.map2 Tuple.pair name range


{-| Random package name and version
-}
packageVersion : MF.Fuzzer ( String, Version )
packageVersion =
    MF.map2 Tuple.pair name version


{-| String with 3 times less chance to be empty
-}
name : MF.Fuzzer String
name =
    MF.map3
        (\s1 s2 s3 ->
            if String.isEmpty s1 then
                if String.isEmpty s2 then
                    s3

                else
                    s2

            else
                s1
        )
        MF.string
        MF.string
        MF.string


range : MF.Fuzzer Range
range =
    MF.map2
        (\v1 v2 -> Range.between (Version.min v1 v2) (Version.max v1 v2))
        version
        version


version : MF.Fuzzer Version
version =
    MF.map3 Version.new_ positive positive positive


positive : MF.Fuzzer Int
positive =
    MF.int 0 100
1 Like

Great, thanks for sharing @mattpiz!

One thing that is probably not obvious enough / will need changing is that Minithesis.run by default tries to generate the value 1000 times or until it gets 100 generated values. Combine this with elm-test default --fuzz 100 and you potentially get up to 100k generation attempts for a single test. So that’s where your 1 minute run might come from. We might have to tweak the defaults a little if publishing functions for use with elm-test.

Looking at your fuzzers, they look fine to me - I’d perhaps use a different implementation for name, but it shouldn’t matter much (if shrinking is fine for that fuzzer).


I still have a few TODOs regarding what values do the fuzzers shrink towards: it seems negative floats are preferred, and when using anyNumericInt it leans towards the negative values also. This is all solvable, it’s just things I didn’t port from the Hypothesis repo yet :upside_down_face: