Expect.equal argument order convention

I opened an issue on elm-test because I kept confusing the arguments of the diff outputs:

 "Am I the actual"
    ╷
    │ Expect.equal
    ╵
"Am I the expected"

But actually, these are supposed to mimic the piping convention commonly used.

Still, I personally feel that it would be clearer if the first argument actually was supposed to be the expected value and the second one the actual. Off the top of my head, I can’t think of a case, where I would need the first argument to be the actual and the second one the expected.

This expected/actual convention would make it possible to output diffs like

"I am actual"
    ╷
    │ (^ actual)
    │ Expect.equal
    │ (v expected)
    ╵
"I am expected"

What do you think about these two conventions?

1 Like

Always having the same problem

1 Like

@drathier offered two interesting points of discussion in the GitHub issue.

If we use actual vs expected, I’d prefer to switch the api over to take a single record argument with two fields, like Expect.equal {expected=1, actual=a} .

This is something I also thought about, but in my opinion, this should only remain a convention. Having the named fields only brings more clutter, and only slightly helps with mistakes. Accidentally switching expected and actual value does not break the test (in case of Expect.equal), it would only lead to wrong output formatting.
So the question to me is whether accidentally mislabeling the outputs is better than not explicitely labeling them?

But then you have other problems, like remembering what to call each field, see Expect.lessThan {smaller=a, larger=14} . Expect.lessThan {expected=a, actual=14} makes no sense to me.

As I said above, I wouldn’t use a record to enforce this convention. The package already implicitly follows this convention, so it currently implicitly is Expect.lessThan expected actual, which is the way that feels right to me. So even for the other comparison functions, the piping convention is coherent with the expected/actual convention.

@drathier’s idea of including the pipe in the output as to hint at this convention already is an improvement. And I’m beginning to ask myself what the disadvantages would be of going a step further and officially documenting the expected/actual convention, because it feels that there might not be that much more clarity to be had without any hickups.

Basically, to me the point is to what extend does it make sense to formalize this convention? What are the disadvantages of treating the arguments as expected/actual?

This was exactly the sequence of events (including trying a record-based API with labeled fields) that led to the current design. :smile:

Personally, I have always struggled with test frameworks that put “acutal” and “expected” in the output. I’m always like “okay, now which one was which again?” and I do a bunch of investigation to figure out what output I’m looking at.

One downside is that it’s potentially lying to you. There is absolutely nothing preventing someone from reversing the order; the test will still pass or fail just the same.

What happens when someone new to the code base misunderstands the convention and reverses them? (I would personally do this all the time; I still have to consciously think which one is “expected” and which one is “actual” on a case-by-case basis, every time, with test frameworks that use that terminology.)

How much time is someone going to waste staring at test output going “how could the actual result possibly be that?!” until realizing the real problem was that whoever wrote the test didn’t follow the convention? I really don’t think “expected” and “actual” are the way elm-test should go.

I also tried out and rejected the following, because “it might be a lie” sometimes:

"cats"

|> Expect.equal

"hats"

There’s no guarantee there’s actually a |> in the test. Sometimes people will write Expect.equal foo bar and that’s it!

Having said that, I could see a persuasive argument for this being fine because it’s necessarily an equivalent expression to whatever the test author did write. Even if I wrote Expect.equal "hats" "cats" and it showed me "cats" |> Expect.equal "hats", although it’s not literally what I wrote, it is equivalent to what I wrote, at the expression level.

Personally I’d be open to that change, although I’d want to run it by a lot more people first!

So my original problem was that I confused the two. Now that I know about the piping convention, I think that won’t happen to me as easily anymore.

But maybe it’s possible to document the piping convention as the recommended way of doing tests. Maybe even a note about the formatting. By teaching people about the convention (and saying it’s fine to deviate, it’s just a convention), I think it will then make sense to format the output as

"cats"

|> Expect.equal

"hats"

(or similar)

The advantage of this format is that, even if I forgot which was which, I could remember the convention, since I see the pipe there, and then figure out which one the input was.

I totally agree that this is something we should check with people, to pick the one that actually makes sense to the most. What is the best way of testing this?

1 Like

I’ve spent some time squinting at my monitor over this one. I like the idea of making the actual and expected value explicit in both the input and output values, like @drathier suggested, but I thought that custom types might be a bit nicer than records for that, so I spent a little bit of time experimenting with that approach. Here’s where I ended up:

suite : Test
suite =
    describe "Custom types as test inputs"
        [ test "are reasonable as arguments" <|
            \_ ->
                let
                    palindrome =
                        "hannah"
                in
                equals
                    (Expected palindrome)
                    (Actually (String.reverse palindrome))
        , test "read nicely in a pipeline" <|
            \_ ->
                "ABCDEFG"
                    |> String.reverse
                    |> Actually
                    |> equals (Expected "GFEDCBA")
        , test "work for notEquals as args" <|
            \_ ->
                notEquals
                    (Unexpected "bar")
                    (Actually "foo")
        , test "work for notEquals as a pipeline" <|
            \_ ->
                "foo"
                    |> Actually
                    |> notEquals (Unexpected "bar")
        , test "Can handle limits as well" <|
            \_ ->
                10
                    |> Actually
                    |> lessThan (Limit 20)
        ]

The type errors aren’t too bad if you naively leave out the variant constructors, IMO:

-- TYPE MISMATCH ---- /home/matt/projects/test-api-exploration/tests/Example.elm

The 2nd argument to `lessThan` is not what I expect:

80|                 lessThan 20 10
                                ^^
This argument is a number of type:

    number

But `lessThan` needs the 2nd argument to be:

    Actual comparable

Hint: I always figure out the argument types from left to right. If an argument
is acceptable, I assume it is “correct” and move on. So the problem may actually
be in one of the previous arguments!

Hint: Only Int and Float values work as numbers.

-- TYPE MISMATCH ---- /home/matt/projects/test-api-exploration/tests/Example.elm

The 1st argument to `lessThan` is not what I expect:

80|                 lessThan 20 10
                             ^^
This argument is a number of type:

    number

But `lessThan` needs the 1st argument to be:

    Limit comparable

Hint: Only Int and Float values work as numbers.

I’m curious what other folks think. It’s a bit more verbose, for sure. I’m not sure I’ve chosen the best possible names, but it seems like it would be tough to get confused and put the wrong variant on the expected or actual value.

The usual way is to try a bunch of things side by side.

For example, take a bunch of failed tests with different comparisons (e.g. some with Expect.equal, others with Expect.lessThan, etc.) and different outputs (e.g. some with numbers, some with short strings, some with really long strings, some with short records, some with really long records - an important consideration is how things look when the values being compared are so long, they wrap multiple lines in the terminal) and see how they look!

I think this exact format is the most compelling alternative to the status quo I’ve seen:

"cats"

|> Expect.equal

"hats"

I think an indented version is also worth trying:

"cats"

    |> Expect.equal

"hats"

After all, elm-format will indent |> Expect.equal, so this makes the top portion of it look more familiar at least. I’m not sure how this would look when the two values are really big records, though.

Along those lines, it’s possible indenting the bottom one could make sense too…

"cats"

    |> Expect.equal

        "hats"

…since that’s closest to how elm-format would format it. However, I think it’s valuable to have the two values left-aligned so you can more easily spot differences between them at a glance. (We do have diffing now, so that’s less important than it used to be, but I think it still has value.)

1 Like

I don’t think I was clear enough about this before. I don’t like being discouraging, but I also don’t want anyone to waste valuable time! So just to be super clear about it:

I think the odds that elm-test ever includes “expected” and “actual” in its built-in test output are way below 1%. I don’t think it is worth further exploration.

The reasons they aren’t in there now are not because we haven’t found the right API, but because the English words “expected” and “actual” are open to interpretation on the part of the test author. This is fundamentally problematic given that the person reading the test is often a different person than the author, meaning the possibility for misinterpreting what the test output is high.

No API can change that. At best, it can offer labels to help the reader when they are jumping between the test output and the source code to figure out which side corresponds to which. At that point it becomes strictly worse than the current |> path, since the person reading might sometimes misinterpret the “expected” versus “actual” and think “I understand what this means; I don’t need to jump to the source to double check” when in fact they did - leading to an inaccurate mental model of what’s going on, and making the wrong changes to their code.

Given all that, I don’t see how any possible “expected” and “actual” API can be an improvement over the path we’re discussing here. At best, it can be equally helpful in showing which part of the code corresponds to which part of the test output, and at worst, it can instill a false sense of confidence, leading to potentially a great deal of wasted time.

I think the most promising path to explore is a world where elm-test has tighter integration with elm make, and showing annotated source code in the test failure becomes possible - like what power-assert does. That is a harder project which is blocked on tighter integration with the compiler, but it seems like the long-term fastest path for the person reading the failure to understand what went wrong!

I think the odds that elm-test ever includes “expected” and “actual” in its built-in test output are way below 1%. I don’t think it is worth further exploration.

Ah, I didn’t get that from the previous conversation. Thanks for the clarification and the explanation of the reasoning behind it!

Being able to show annotated source code does seem like a better long-term approach, and changing the API certainly wouldn’t make sense in that context.

1 Like

@rtfeldman Order of actual and expected params is a common mistake and a confusing issue.

But I’m genuinely surprised, that people confuse the meaning or interpret the words (actual / expected) the wrong way. Since I’m not a native English speaker this could be missing background on my side. Could you explain where confusion could come from in a message like “The test expected the result to be 2, but the actual value was 3 instead”. Not triggering API discussion, just curious :slightly_smiling_face:

1 Like

Maybe “open to interpretation” was the wrong way to say it. The problem is that there’s no way to have confidence that “expected” and “actual” in test output will accurately reflect the test author’s intent.

For example, let’s say I write this test:

fuzz2 int int "multiplication of integers should commute" <|
    \num1 num2 ->
        Expect.equal
            { expected = num1 * num2
            , actual =  num2 * num1
            }

I get this test output:

Given: ( 1, 2 )

Expected: 4
Actual: 2

This is telling me that 1 * 2 was expected to be 4, but it was actually 2. That’s not true! Nobody expected that 1 * 2 should equal 4. The test author only expected that both sides would be equal, not which one was necessarily going to be “right” or “wrong” if the test failed.

That’s what I meant by “open to interpretation” - it’s claiming to follow English semantics that aren’t (and can’t be) enforced, and which can produce misleading output like this.

1 Like

Interesting! I’m not getting the example, because the code snippet wouldn’t generate 4 and 2, but 2 and 2. But I think I got the point. Property based tests could be different compared to “classic” unit tests with static values for expected. I’d think about if my expectation in a fuzz test is instead, that the property num1 * num2 == num2 * num1 is True. And my test error could be: "Your property num1 * num2 == num2 * num1 won’t hold for input (1,2). Although this is perhaps over specific for this example.

To clarify, the code was demonstrating “what if there was a bug in the implementation of (*), which the test revealed” (in this case, the bug would be “(*) squares its second argument before performing the multiplication”)

If Expect.equal takes an { expected : Expectation, actual : Expectation } record, then it can’t accept something different for fuzz than it does in the case of test - that’s just the argument it accepts everywhere!

An alternative would be to have Expect.equal for test and some other function (maybe Expect.same) that’s tailored to fuzz, and then tell everyone to use the one for the one and the other for the other, but honestly that sounds too clunky to consider seriously.

Anyway, I’m gonna duck out of this discussion. Like I said, I don’t think expected/actual is worth further exploration. :slightly_smiling_face:

1 Like

This topic was automatically closed 10 days after the last reply. New replies are no longer allowed.