Elmer: An Elm Testing Tool

When I started with Elm, I found it difficult to practice test-driven development for two reasons: (1) the core functions (view, update) return opaque types like Html msg or Cmd msg that cannot be inspected and (2) I found myself writing unit tests that had to know too much about implementation details, which made refactoring more difficult.

So, I’ve been working on a package called Elmer that addresses these problems.

Elmer supplements elm-test and allows you to make expectations about the view and the model, simulate html events, stub commands and subscriptions (as well as ports), and spy on functions. Instead of testing functions in isolation, Elmer allows you to test that the pieces of your app – the model, the view, and the update function – interact in the right way, under the conditions that you specify.

Here are some things you can do with Elmer: write a test that describes what happens to the view when a user clicks a button or types into a field; expect that an http request is made under certain conditions; simulate a subscription and describe the expected results. You can test a full html application or a module within it, a headless worker, or a single command.
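For a taste of the API, here is a sketch of a test that simulates a button click (the App module, the element ids, and the expected text are hypothetical; the Elmer functions are the ones the package exposes):

```elm
import Elmer
import Elmer.Html as Markup
import Elmer.Html.Event as Event
import Elmer.Html.Matchers exposing (element, hasText)
import Test exposing (Test, test)
import App -- hypothetical module exposing defaultModel, view, update

clickTest : Test
clickTest =
  test "clicking the button shows a greeting" <|
    \() ->
      -- "#greet-button" and "#greeting" are hypothetical ids in App.view
      Elmer.given App.defaultModel App.view App.update
        |> Markup.target "#greet-button"
        |> Event.click
        |> Markup.target "#greeting"
        |> Markup.expect (element <| hasText "Hello!")
```

The test describes behavior (click here, see that text there) without mentioning the Msg type or the shape of the model.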

Check it out here:

Elmer is a work in progress. Any feedback would be much appreciated.



Hey! Did you look at elm-html-test which already provides a lot of these features?

Namely, it provides:

  • A way to test HTML (link)
  • A way to simulate events on nodes (link)

Likewise, for testing commands, there’s elm-testable.

If you did, I’m curious what made you create this project regardless – is there some feature that your package provides you couldn’t find in elm-html-test or elm-testable? I’d love to hear those things if so!


I have looked at elm-html-test and elm-testable. For me, the choice to write my own library was less about features and more about how I think test-driven development should be practiced.

I think tests should describe the behavior of the software while knowing as little as possible about implementation details. Such tests allow you to refactor your code with confidence. Make as many changes as you like, experiment with design patterns, clean up the code so it’s easy to add more features later, then just run the tests to check whether the software still has all the behavior you expect.

It seems to me that elm-html-test provides a good way to unit test the view function (or some other function that produces an Html msg). But such tests will need to know lots about implementation details: the shape of the model, the Msg type that my application sends around, etc. Furthermore, just testing the view function in isolation doesn’t give me much confidence that my app will actually work, so I’d also need to write unit tests for the update function as well. I tried this strategy when I first started with elm, but felt like my tests just reproduced the implementation in another form.
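To illustrate the kind of knowledge such a unit test needs, here is a sketch in the elm-html-test style Query/Selector API (the App module and its model shape are hypothetical, which is exactly the implementation knowledge this kind of test has to bake in):

```elm
import Test exposing (Test, test)
import Test.Html.Query as Query
import Test.Html.Selector exposing (id, text)
import App -- hypothetical module

countViewTest : Test
countViewTest =
  test "the view displays the current count" <|
    \() ->
      -- the test constructs a model record directly,
      -- so it depends on the exact shape of App's model
      App.view { count = 3 }
        |> Query.fromHtml
        |> Query.find [ id "count" ]
        |> Query.has [ text "3" ]
```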

If my tests know about implementation details, then I can’t change my code without changing my tests. But in that case, it seems to me, my tests aren’t providing much value; I’m not able to run them as I make changes, so they can’t give me confidence that my software still has all the right behavior.

With elm-testable, it’s possible to write tests at a higher level – that is, given a model, view function, and update function I can make expectations about the resulting system (elmer takes a similar approach). Such tests can do a better job avoiding knowledge of implementation details. However, elm-testable tightly couples code and tests in the other direction, since my code now has to use the elm-testable modules, that is the ‘testable’ version of the Http package and so on. That feels risky to me. I now have a layer of code in my app that I don’t really understand. In addition, it feels like I’m limited to writing tests only about those commands and subscriptions that are covered by elm-testable. (I could be wrong about this)

Elmer is not a perfect solution. It relies on a small bit of native code. But it allows me to write tests that give me freedom to refactor my code while remaining confident that my software has all the behavior I expect.


> I tried this strategy when I first started with elm, but felt like my tests just reproduced the implementation in another form.

I’m not sure how Elmer solves this differently compared to elm-html-test and elm-testable. Can you give a code example of this? It’d help me understand a ton. :smiley: My current understanding is that Elmer provides exactly the same things as elm-testable right now. Is there some other difference in the API that I missed?

> Elmer is not a perfect solution. It relies on a small bit of native code. But it allows me to write tests that give me freedom to refactor my code while remaining confident that my software has all the behavior I expect.

You could’ve built this on top of elm-html-test, without requiring additional kernel code. Which is what elm-testable did :D. We added support for event-testing in elm-html-test so that elm-testable could make exactly these types of queries. I’d love to enable your project too, so if there’s something that you need to be provided by elm-html-test and elm-test, let’s talk!

> However, elm-testable tightly couples code and tests in the other direction, since my code now has to use the elm-testable modules, that is the ‘testable’ version of the Http package and so on.

This is true right now! However, this has been raised as an issue previously, and the plan is for avh4’s next version of elm-testable not to require this at all. That work is currently on hold until the next release of Elm, since there will be related changes to make in elm-test as well. Are there any things you found in Elmer that you prefer to elm-testable? I’m super curious to hear if so.

Contributing to the community

I thought I might write up a little bit about how I feel we’ve managed, as a community, to increase the number of high quality Elm libraries and avoid the “framework overload” that is usually experienced in the frontend world. This is related to Evan’s talk Code is the easy part, which is worth a watch.

I believe a healthy process in the community looks like this:

  1. Come up with some novel idea or new approach
  2. Maybe implement a proof of concept if it enables discussion
  3. Discuss in the community, especially if a tool already exists (Slack is the best place for this IMO)
  4. If it makes sense to be added to another library, great! Now there is one less library to choose from.
  5. If not, then it’s not and that’s okay too, and a new package is born.

What happens sometimes is that people skip 3 and 4 and go straight to 5. Which is totally fine, though it does create fragmentation which didn’t need to exist, along with duplicating a bunch of work. I’d love to encourage the community to discuss their problems with each other, before investing a bunch of effort working on them. In some cases the conversation can just be “hey! I want to do this thing. Does anything else already do this thing?”.

In this case

As it stands, “Elmer” is less likely to be adopted because (1) the approach taken to implement it (a bunch of unverified kernel code) makes it hard to install or recommend, (2) elm-html-test is part of elm-test and will be merged into elm-test in the next release, and (3) elm-testable is published and easily usable, with a lot of work gone into the API.

An example of where you would’ve benefited from using elm-html-test is in handling edge cases: I can see that “Elmer” does not account for

  1. Lazy views
  2. Mapped lazy views
  3. Markdown
  4. elm-graphics
  5. webgl
  6. Keyed views
  7. Lazy keyed view

However, in elm-html-test we do account for each of those edge cases. The thing is, once a library has been used a lot, you find more edge cases. elm-html-test has been used a lot, so we’ve found many edge cases and addressed each of them.

Likewise, elm-html-test is based on a pure Elm representation of the virtual-dom. This allows us to:

  1. create beautiful error messages
  2. provide a bunch of support in the elm-test runners easily
  3. use decoders for all the parsing of nodes.

I would argue that 3) is the most fundamental difference between Elmer and elm-html-test: in elm-html-test, we know that the virtual-dom is correctly formed, and if it’s not, we get a nice error telling us what was missing. In Elmer, you will just get a runtime error.

If we had discussed, then we could’ve helped you to leverage elm-html-test to achieve this without duplicating the work. I believe we all benefit from talking and sharing ideas! :smiley:

That’s not to say there’s something wrong with making a new library: in some cases it’s needed, if the approach is radically different! But in this case, I’m not sure it is radically different. Most of the differences are actually implementation details. I guess what I’m saying is: it would’ve been awesome if we had discussed this in #testing on Slack, or even in a GitHub issue. Don’t get me wrong: this is cool, and I can always appreciate work that’s gone into improving problems you see! I just think that the community can make things better by working together.


The unique things that Elmer has are that it lets you completely mock the outcome of effects (unlike elm-testable, which tries to help you simulate the outcome of effects), it provides more general spies which lets you do isolated testing in a way that’s not otherwise possible in Elm, and it tries a different API for mocking HTTP requests.

I’m not sure why there’s such a resistant response here. The current combination of elm-test, elm-html-test, and elm-testable has a lot of problems at the moment and doesn’t yet account for all the needs of TDD. I’m happy to see attempts at improving testing in Elm, and I hope it will be useful in understanding and solving the problems with the currently recommended tools.


> The unique things that Elmer has are that it lets you completely mock the outcome of effects (unlike elm-testable, which tries to help you simulate the outcome of effects), it provides more general spies which lets you do isolated testing in a way that’s not otherwise possible in Elm, and it tries a different API for mocking HTTP requests.

Right, cool! I’d love to see a comparison of the two approaches side by side, in order to better judge the different attempts and see what works and what doesn’t. An illustrated example comparison goes a long way in figuring out what’s useful, especially when there’s a lot of docs to read.

> I’m not sure why there’s such a resistant response here. The current combination of elm-test, elm-html-test, and elm-testable has a lot of problems at the moment and doesn’t yet account for all the needs of TDD. I’m happy to see attempts at improving testing in Elm, and I hope it will be useful in understanding and solving the problems with the currently recommended tools.

There is no resistant response – like I stated, I could not tell from the examples what unique offering Elmer provides. The README is rather huge, and I’ve read it a couple of times and come away unsure of the differences. The only resistance is to the approach – I believe much more could’ve been done through a conversation! :smiley: So we are in agreement, you and I. We can always make things better, and putting heads together is more productive than re-inventing a wheel that doesn’t need re-inventing, in the case of the underlying code.

Edit: updated my previous post to hopefully better reflect this!

I decided to take another look, focusing on the spy section. It’s interesting!

I have some questions though:

parseTest : Test
parseTest =
  describe "when the string is submitted"
    [ test "it passes it to the parsing module" <|
        \() ->
          let
            spy =
              Elmer.Spy.create "parser-spy" (\_ -> MyParserModule.parse)
          in
            Elmer.given App.defaultModel App.view App.update
              |> Elmer.Spy.use [ spy ]
              |> Elmer.Html.target "input[type='text']"
              |> Elmer.Html.Event.input "A string to be parsed"
              |> Elmer.Spy.expect "parser-spy" (wasCalled 1)
    ]

In this example, it seems like parser-spy couples your test code to an implementation detail of your view. What’s the benefit here of making sure the function is called, rather than checking the end result? I’m familiar with spies in other languages, but there they generally focus on providing a way of inspecting something that happens through mutation. In this case, wouldn’t just testing the view/model be enough? Or some other unit tests?

The most interesting part, to me, is actually the Http section! Elmer lets you provide some stubbed responses by doing:

Elmer.Spy.use [ Elmer.Http.serve [ stubbedResponse ] ]

It’s a great idea and I really dig it! I think this API could be implemented via the elm-testable route, so that’s great too!

I’m not too sure about faking the view functions:

thingsViewSpy : Spy
thingsViewSpy =
  Elmer.Spy.create "things-view" (\() -> ThingsModule.view)
    |> Elmer.Spy.andCallFake (\_ ->
        Html.div [ Html.Attributes.id "thingsView" ] []
      )
What is the benefit of faking the view function in this way? Or rather, what’s the use case for this? If you haven’t yet implemented things-view, isn’t it simpler to just implement things-view than to fake it?

Either way, cool stuff! Some of it should be implementable without any kernel code. I have some ideas about a safer, long-term way of implementing spies without the need for kernel code, which you should be able to do even now. Would love to discuss in #testing on Slack!

Three things:

(1) I think Aaron did a nice job highlighting some of the differences between elmer and elm-testable – Thanks! Here’s a link to a medium article I wrote that gives another example of test-driving an elm application:

(2) It’s a great suggestion to use elm-html-test for the low-level html stuff in elmer; I will definitely look at that. Reducing the amount of native code in elmer is, of course, something I’m very interested in doing.

(3) For spying on functions: Yes, if you start spying on functions during tests, you are starting to expose some implementation details, so you’ll want to do this with care.

It might be helpful here to distinguish two types of details: the ‘inner workings’ of a module and its ‘exposed interface’. Tests will probably need to know about the exposed interfaces of software components, but should try not to know about the inner workings of those components.

For example, when you’re writing ‘end-to-end’ style tests with elmer (edit: not a good use of ‘end-to-end’ here; I mean unit tests that describe the public behavior of a model, view function, and update function in interaction with each other), these tests will need to know about the interfaces of the app’s high-level dependencies – for example, a function that produces a command to make an http request. Usually, you can just spy on and stub functions that produce the relevant commands and subscriptions to simulate the side effects necessary for describing the behavior of your app under specific conditions. So your tests know some details about these interfaces but not about the inner workings of the app.

As an app becomes large, you might also find a need to spy on functions that represent exposed interfaces between different parts of the app. I’m interested in agile architecture design patterns like hexagonal architecture and clean architecture, and I’m a big fan of the SOLID principles. Those kinds of strategies recommend dividing your application into smaller, decoupled components. Once you do that, you might want to divide your tests along similar lines, describing the behavior of each component independently. Elmer’s capacity to spy on and stub functions can be useful when testing such components in isolation.

For example, you might want the UI portion of your app (with its own model, update, view) to be decoupled from the mechanisms by which the application gathers data. This is a good thing, since these two portions of your app will probably change for different reasons, and separating them makes each easier to understand and change. But then your tests should probably respect this decoupling. You don’t want the tests of your UI to presuppose any particular mechanism for gathering data, since if you change that mechanism you’ll then need to update those tests. So, in the tests of your UI, you might spy on and stub the functions that represent the exposed interface by which data is gathered, and vice versa. This lets you ensure that each component respects the interface between them, and lets you more easily simulate conditions that help you describe specific behaviors of the component under test.

So, yes, to spy on a function during a test you’ll need to know about some implementation choices. If you follow the SOLID principles or other agile architecture patterns, you’ll want to give your code a high-level structure composed of modules that are decoupled and easy to change. In that case, the tests may need to know about the interfaces exposed by those components, but they should know as little as possible about the inner workings of each component. And, I should stress, I think these kinds of patterns are something you need to adopt only when an application starts to become big. For simple elm apps, just spying to stub commands and subscriptions is enough, and I think elmer lets you do that without needing to know too much about the inner workings of the app.

If there are ways to accomplish the kind of spying that elmer enables without the use of native code, I’d be happy to hear it.

It never occurred to me to push for spies in elm-test because to be totally honest, I eventually came to regret the tests I wrote using spies in other languages. :sweat_smile:

Can you share some Elm code you’ve written that uses spies to good effect? I’d like to discuss in the context of real-world code!


You can check out the medium article I referred to above for a straightforward example. One test spies on the WebSocket.send function so the test can make expectations about its arguments. This kind of test ensures that our code is using the WebSocket.send function in the way we expect. Another test uses a spy for the WebSocket.listen function to provide a fake implementation that allows us to simulate receiving data over the websocket during the test.

Here’s another example. Suppose we write a magic eight ball app. You ask it a question and it returns some answer, like “The future is hazy”, “Definitely”, etc. We might structure the app so that the user interface is decoupled from the module that actually determines the answer. We could even apply the dependency inversion principle and provide the update function with a reference to the relevant function from the module that fetches the answer. That way, our UI module doesn’t have to know how an answer is fetched, it only needs to worry about managing the UI – gathering the user input, passing it to the function, and displaying the result.

When we write the test to drive out this code, we’ll use a spy to represent the module that fetches an answer. We could just use a normal function, but a spy allows us to easily assert that the user interface passes the right arguments to this function. Let’s say that function looks like (Result () EightBallAnswer -> msg) -> String -> Cmd msg, where the string is the question text. Here’s the test:

module Web.EightBallViewTests exposing (..)

import Test exposing (..)
import Expect
import Elmer
import Elmer.Html as Markup
import Elmer.Html.Matchers exposing (element, hasText)
import Elmer.Html.Event as Event
import Elmer.Spy as Spy exposing (Spy)
import Elmer.Spy.Matchers exposing (wasCalledWith, anyArg, stringArg)
import Elmer.Platform.Command as Command
import Web.UI as App
import Types exposing (EightBallAnswer(..))

requestAnswerCommandSpy : Result () EightBallAnswer -> Spy
requestAnswerCommandSpy answer =
  Spy.createWith "request-answer-command" (\tagger _ ->
    tagger answer
      |> Command.fake
  )

answerTests : Test
answerTests =
  describe "when a question is given" <|
    let
      state =
        Elmer.given App.defaultModel App.view (App.update <| Spy.callable "request-answer-command")
          |> Spy.use [ requestAnswerCommandSpy <| Ok Hazy ]
          |> Markup.target "#question"
          |> Event.input "Will I eat pizza soon?"
          |> Markup.target "#submit-question"
          |> Event.click
    in
      [ test "it sends the question to the question service" <|
          \() ->
            state
              |> Spy.expect "request-answer-command" (
                wasCalledWith [ anyArg, stringArg "Will I eat pizza soon?" ]
              )
      , test "it presents the answer" <|
          \() ->
            state
              |> Markup.target "#answer"
              |> Markup.expect (element <| hasText "The future looks hazy.")
      ]
Here, Spy.createWith provides the function to spy on. Spy.callable provides a reference to that function.

So, in general, I think spies help you describe the behavior of the code under test with respect to its collaborators; we can assert that the code under test calls those collaborators with the right arguments at the right time. We can also provide fake implementations to functions we spy on, and this lets us stub data to simulate conditions that help us describe the behavior of the module we’re testing.

Not every app needs to use spies in this way (and the example I’ve given is certainly contrived). But once an app gets large, it can be a good idea to decouple parts of the software system, and in that case spies can be handy as a means to make sure the parts interface with each other in the right way.

Thanks for the examples! :heart:

As I see it, tests are useful for two reasons:

  1. Helping guide implementation (test-first development)
  2. Identifying regressions early

As Spies rely on implementation details, it’s not possible for them to help guide implementation, so we can rule out benefit 1 categorically. What about Spies as a way to catch regressions?

The answerTests suite has two tests. One checks that "Will I eat pizza soon?" is sent to the question service. The other checks that the correct answer was presented to the user.

From a regression-catching perspective, the Spy test is redundant. If the wrong arguments get sent to the service, the service won’t (can’t) respond with the correct answer, so both tests will fail. This means the Spy test neither helped guide our implementation nor helped us catch regressions we wouldn’t have already caught otherwise! It seems to me that the Spy test does not benefit us, but does have a maintenance cost, so our test suite would be improved if we deleted it. :smile:

The medium article tests that "Hello!" was sent to "ws://testserver.com". I can see the utility of testing that, but using a Spy for that seems unnecessarily brittle.

The regression we want to catch is if clicking the button no longer sends Hello! to a websocket connected to "ws://testserver.com". Testing this with a Spy creates two undesirable scenarios:

  1. The implementation of WebSocket.send changes such that it no longer does what we expect. In other words, it no longer sends "Hello!" to the expected websocket, even when we continue passing it the same arguments. We wanted the test to fail in this scenario, but it continued to pass.
  2. WebSocket.send gets renamed, we update our implementation, but the test still fails even though our implementation works - meaning we have to spend time updating the test for no benefit. We wanted the test to pass, but instead it failed.

It seems to me that it would be strictly better to write a test which actually verifies that the data gets sent to the Websocket. It would catch all the relevant regressions the Spy-based approach would catch, but it would not have either of these problems!

Granted, this would require creating a Websocket-specific testing API to do this (the way elm-html-test does for testing Html), but I think that is a better goal to pursue than Spies. I think these two examples generalize - specifically to the claim that for any Spy test, there exists a better test that doesn’t use Spies - and that we should follow the example of elm-html-test rather than creating a system for Spying.

That way we can catch the same regressions without the downsides! :smile:


Growing Object-Oriented Software, Guided by Tests is a good book on the mock/spy approach.

I was a reviewer on the original mock objects paper, and totally missed the point. It took me about eight years and a fair amount of experimentation before I understood and became fond of mocks.

I find that thinking of mocks/stubs/spies in terms of implementation details is misleading. I’ve come to think of them as having the same relationship to product code as lemmas do to proofs. Here’s an explanation from my clojure testing tool’s documentation.

I have a hunch that, under this interpretation, mocking fits better with FP than with OO. Hard to know, though.


Let me explain a bit more how the test I gave in the post above describes behavior and drives out implementation. First, you write that test. It will fail to compile. You do the simplest thing to make it compile. Then you run the test and see that it fails in the way you expect. Finally, you do the simplest, reasonable thing to make that test pass. If you follow this process, you’ve done a few cool things:

  1. You’ve described the behavior of part of your software system. In this case, we’ve described how the UI takes input from the user and passes it to a collaborator (another part of the software system) that fetches the eight ball answer.

  2. In writing the test, we’ve made decisions about the interface we expect from that collaborator. This is great since when we go to write the tests for that collaborator, we won’t be tempted to build things that won’t be used.

  3. To make these tests pass, we have to do some work. That work is the implementation driven out by the test. The test doesn’t know what this implementation is – and that’s a good thing since it gives us freedom to refactor later, if we need to.

  4. Because we used a test double (more on that below) to stand in for the real collaborator, we’ve effectively decoupled the part of our software we’re describing with these tests (the UI) from the part that fetches the eight ball answer. That’s really good, since in implementing the UI I don’t have to worry about how the data is fetched. As a result, my UI module has a chance to be cleaner and easier to understand.

The tests in my post above are, I take it, a fairly straightforward example of the style of TDD known as the London School or ‘mockist’ TDD. (I’m not advocating for this school over any other style of TDD, just providing extra context). Here’s a good blog post that discusses the ‘mockist’ style in relation to other approaches:

@marick also provides a good explanation of this style of testing in his link – programming by wishful thinking is a nice way to describe it.

As I suggested in my earlier post, it’s not necessary to use an elmer Spy in this test. We just need a test double to stand in for the collaborator during the test. A spy is a type of test double that records the calls made to it. This is a great blog post on the different types of test doubles:


Using a spy simply makes it easier to write these tests. An earlier version of elmer did not have spies at all, and you could still write the same tests – in your code you’d just need to inject any functions that produce commands or subscriptions, so they could be replaced during a test with functions that returned fake commands or subscriptions. I added spies because they facilitate certain testing strategies without necessarily requiring the code to inject all these dependencies. Dependency inversion seems to me to be one of those patterns that really pays dividends only in larger apps. So it’s good, I think, that elmer doesn’t force you into that pattern unless you really want to adopt it.
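To sketch the injection approach just mentioned (hypothetical names throughout – this is only the shape of the idea): the update function receives its command-producing collaborator as an argument, so production code passes the real function and a test passes one that returns a fake command.

```elm
type alias Model =
  { answer : Maybe String }

type Msg
  = SubmitQuestion String
  | ReceiveAnswer (Result () String)

-- the collaborator that produces the command is injected,
-- so a test can substitute a function returning a fake command
update : ((Result () String -> Msg) -> String -> Cmd Msg) -> Msg -> Model -> ( Model, Cmd Msg )
update requestAnswer msg model =
  case msg of
    SubmitQuestion question ->
      ( model, requestAnswer ReceiveAnswer question )

    ReceiveAnswer result ->
      ( { model | answer = Result.toMaybe result }, Cmd.none )
```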

@rtfeldman suggests two other things. First, I believe he’s suggesting that end-to-end/integration tests give you more confidence that the software works than unit tests. That makes sense to me. However, I wouldn’t want my test suite to contain only integration tests – in my experience these tend to be slower, sometimes flakey, and more cumbersome to set up for testing edge cases. I’d advocate combining a few high-level, happy path integration tests with many more unit tests that describe the behavior of each part of the software system in isolation. And for those unit tests, you will need test doubles to stand in for collaborators.

I take it that the other point @rtfeldman gets at has to do with the extent to which there should be library-specific testing apis. To me, there are two challenges in writing tests for elm apps: managing side effects during the test and inspecting (when necessary) values represented by opaque types. To my mind, it should be the test framework that helps to manage side effects during the test, since this problem is (for the most part) the same across libraries – that’s what elm-testable does (I take it) and that’s what elmer does (not with spies, of course, but with the ability to send and process fake commands and fake subscriptions during a test). Some library-specific testing tools may still be necessary to manage side effects, though – elm-lang/html in particular complicates things since apps that use it subscribe to dom events without using Platform.Sub.

I think library-specific testing tools are necessary when you need to inspect opaque types used by those libraries. For example, in elm-lang/http, it’s not possible to inspect one of the core values – the request record. So, in writing elmer I had to add some native code that looks into that opaque type and makes the values hidden by it inspectable during the test. Now, it would be great if library authors took testability into account in designing their apis, adding functions that made it possible to inspect important values otherwise hidden by opaque types. Since, for various reasons, that might not always happen, I’m looking into ways to make elmer more extensible so that it’s easy to write library-specific plugins that help with making expectations about values hidden behind opaque types.

Thanks again for the thoughtful reply @brian-watkins!

The approach Abelson and Sussman called “programming by wishful thinking” is not about testing.

Suppose I’m writing a function and I think “I really wish I could call a depth : Tree -> Int function right about here” - I can follow “programming by wishful thinking” like so:

depth : Tree -> Int
depth tree =
    Debug.crash "TODO implement"

My function can call this, and I can continue on my way. (In Elm 0.19, Debug.crash won’t be allowed outside --debug mode, so I can’t forget to implement this function for real, even if I never write a single test.)

Like using a Spy, having a Debug.crash in the depth function will cause the test to fail when my original function calls it. However, now when I finish everything up, my original test won’t be brittle to implementation detail changes such as my refactoring to no longer rely on the depth function.

This makes Debug.crash a better choice for “programming by wishful thinking” than Spies would be. That means introducing Spies increases API size but does not introduce a better way to do things.

> in my experience these tend to be slower, sometimes flakey, and more cumbersome to set up for testing edge cases

That’s been my experience with tests where race conditions are possible, e.g. Selenium tests in browsers.

However, elm-html-test tests are exclusively Elm code inspecting exclusively Elm data structures. They require no more work to set up than any other elm-test test, they run just as fast, and they are 100% deterministic; I’m not sure how they could even theoretically flake. :sweat_smile:

My point is that we can do the same thing for testing the opaque value Cmd as we’ve done for testing the opaque value Html. (This is one of the benefits of representing effects as data!) Granted, we haven’t done that work yet, but I claim that once we have, those Cmd tests will be just as flake-proof, easy to set up, and fast to execute as any typical elm-test test.

In a world where we have elm-html-test-esque ways to test commands, I think the claim “for any test written using a Spy, a better test could be written that doesn’t use a Spy” becomes true. Debug.crash is better for guiding implementation where the function you want to call doesn’t exist yet, and Cmd tests are better for detecting regressions.

As API designers, I think it’s just as important for us to decide what goes in the toolbox as it is for us to decide what doesn’t. I think the toolbox that will lead to the best test suites is one that contains a full menu of Cmd tests, and no Spies. :slightly_smiling_face:

(1) When I mentioned slow, flakey, cumbersome-to-set-up tests, I was specifically referring to end-to-end/integration tests. @rtfeldman’s comment, “It seems to me that it would be strictly better to write a test which actually verifies that the data gets sent to the Websocket,” made me think of that – I’d call a test where the app connects to a real websocket an integration test, and such tests, while still useful in certain cases, can in my experience be slow, flakey, and cumbersome to set up. Obviously, elm-html-test allows you to write different kinds of tests that do not have this problem. And so does elmer.

(2) @rtfeldman suggests that a good approach to dealing with side effects in tests is to follow the pattern of elm-html-test and find ways to inspect the opaque value Cmd. I tried something like this in the early days of working on elmer and ran into a few challenges. First, unlike the opaque value Html, libraries can, I believe, choose to store values in a Cmd however they like. So, if you compare the Cmd generated by WebSocket.send with the one generated by Http.send, you’ll see that they structure their internal data differently. Each library that produces a Cmd will thus probably need its own specific testing tools, but maybe that’s ok. The second, deeper, problem is that some libraries might store relevant data in a Cmd in a way that is inaccessible. For example, Http.send from elm-lang/http stores a function that captures the values it needs (the details about the request). That could be the right decision as far as that library is concerned, but I’m pretty sure it means it’s not possible to inspect the details of the request (details I might like to test) based solely on the Cmd.

For these reasons, elmer takes a different approach. Elmer manages effects for you during a test, passing messages from commands or subscriptions to the update function as necessary. Test writers don’t test the update or view functions in isolation; instead, they describe the behavior associated with a model, view function, and update function working together. To manage effects during the test, elmer just knows how to process fake commands (and subscriptions). For example, test writers can pass a tagged value to Elmer.Platform.Command.fake : msg -> Cmd msg to generate a fake command. When elmer sees this kind of command, it knows to get the tagged value and pass it to the update function. When you test a part of your app that deals with a Cmd in some way, you need to do one of two things: structure the code so that during the test you can pass a fake command instead of the real one (i.e., follow the dependency inversion principle with respect to cmd-generating functions), or use a Spy to replace the cmd-generating function during the test with a function that returns a fake command. (Again, spies are not necessary; they just make writing certain kinds of tests easier.)
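As a rough sketch of the first option, a test might inject a stand-in for a cmd-generating function such as Http.send. Everything here besides Elmer.Platform.Command.fake is hypothetical: the name fakeSend, the response body, and the assumption that we’re faking Http.send from elm-lang/http.

```elm
-- Hypothetical stand-in for Http.send : (Result Error a -> msg) -> Request a -> Cmd msg.
-- Instead of performing a request, it wraps a tagged result in a fake command;
-- when elmer processes this command, it passes the tagged value to update.
fakeSend : (Result Http.Error String -> msg) -> Http.Request String -> Cmd msg
fakeSend tagger _ =
    Elmer.Platform.Command.fake (tagger (Ok "fake response body"))
```

The code under test receives fakeSend in place of Http.send, so the test never needs to know which Msg value the tagger produces.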

There are certainly other approaches one could take, but I’ve found this one to be quite nice. I don’t have to worry about how some particular library has decided to structure data in a Cmd. My tests describe the high-level behavior of my code – ie that it passes the correct arguments to the cmd-generating function (if that’s necessary to test) or that it does the correct thing when the command is processed. My tests are all pure functions, so they’re super fast and flake-free. I am completely faking out Cmd and Sub, though, so, of course, as I mentioned earlier, it’s a good idea to supplement elmer tests with a few high-level, happy path, end-to-end/integration tests to provide confidence that the system works as a whole.

Yep, I agree. Fortunately, there are a fixed, small number of Cmd values in the language!

Oh, for sure - it’s impossible to inspect a function; any Task can contain a function if Task.andThen was used to create it; and many Cmd values come from tasks. So there are lots of ways this can happen.

Hm. If I understand this right, here’s the sequence of events:

  1. Test author comes up with a Msg value
  2. Test author passes that Msg value to fake to generate a fake Cmd
  3. The test library takes the Msg value provided by the author, and passes it to update
  4. Test author runs assertions on the result of update

What value is step 3 adding? I can call update myself! :grinning_face_with_smiling_eyes:

That way my workflow is strictly shorter:

  1. Test author comes up with a Msg value
  2. Test author passes that Msg value to update
  3. Test author runs assertions on the result of update

What regressions would I expect to be caught by adding the layer of indirection of having the library call update instead of calling it myself?
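The shorter workflow is just an ordinary elm-test test. For example (Increment, the { count } model, and update are hypothetical stand-ins for whatever app is under test):

```elm
-- Assumes an app with: type Msg = Increment, a { count : Int } model,
-- and update : Msg -> Model -> ( Model, Cmd Msg ).
test "Increment bumps the count" <|
    \_ ->
        update Increment { count = 0 }
            |> Tuple.first
            |> .count
            |> Expect.equal 1
```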

A third option: offer an API for running expectations on that command and simulating its output.

For example, here’s a rough sketch of how such an API might look for HTTP:

    |> Expect.Http.fromCmd
    -- Verify things about the HTTP request in this command
    |> Expect.Http.header "X-Requested-With" "XMLHttpRequest"
    |> Expect.Http.url "/foo"
    -- Convert to an Expectation when finished verifying
    |> Expect.Http.toExpectation

You could read this as “Expect that this command contains an HTTP request which has this particular header and this particular URL.” The Expect.Http module would support checking all the various ways you can construct a HTTP request.

That module could also include APIs for running expectations through Task.andThen chains by specifying what the server sent back as a response, e.g.:

    |> Expect.Http.fromCmd
    |> Expect.Http.header "X-Requested-With" "XMLHttpRequest"
    |> Expect.Http.url "/foo"
    -- Expect this HTTP request to chain into a second HTTP request
    |> Expect.Http.toTask expectedResponse
    |> Expect.Task.andThen
    |> Expect.Http.fromTaskExpectation
    -- Run checks on the second HTTP request
    |> Expect.Http.header "X-Requested-With" "XMLHttpRequest"
    |> Expect.Http.url "/bar/baz"
    |> Expect.Http.toExpectation

This style of API would permit:

  • Testing Cmd values which wrap HTTP requests
  • Testing HTTP requests chained together with Task.andThen
  • Testing how HTTP responses get converted into commands (not shown in example)
  • Doing all of these without needing a browser environment or actually running any effects

These are the things I really want in a Cmd testing solution. I want to verify that when a certain Msg gets passed to update, the result includes a HTTP request being sent to a particular URL. If that doesn’t happen, regardless of implementation details, I know I have a regression!

You can imagine modules like Expect.Websocket and Expect.Dom for the other entries in the fixed set of Cmd values in Elm.

This approach for testing commands really appeals to me. It can catch any regression a Spy can, but without the false positives or false negatives! :smiley:


I agree with this very much. And this is the same advice I always give when someone brings up testing. That being said, I think it’s more about flaws of the TDD approach as a whole than about this specific package. If the point of this tool is to improve the TDD experience in Elm, I see it as a beneficial contribution to the Elm ecosystem.

From the Fowler blog post:

With the classic approach, however, any tests of client objects can also fail, which leads to failures where the buggy object is used as a collaborator in another object’s test. As a result a failure in a highly used object causes a ripple of failing tests all across the system.

In OOP, you have highly interconnected systems that can be brought down like this. In FP, everything acts only through its return value. This pretty much torpedoes the “state/behavior validation” dichotomy: there’s no state to validate unless it’s returned, and the return value is the only thing you can validate. Behavior validation is less than useless: whether or not a function calls another one is an implementation detail. Such a call can’t set state somewhere else, so the only thing it can do is ease computation of the return value.

The other thing that strikes me about Fowler’s quote is how abstract and speculative it is. He goes on to describe how mockists and classicists can’t agree on whether this is a problem – a sure sign, to me, that hard evidence is lacking. Elm has a philosophy of gathering evidence and being data-driven, rather than worrying about everything that can go wrong up front. I would summarize it as, “it’s not a problem until it becomes a problem”. I’d say a testing concern “becomes a problem” when either you can’t be confident in the correctness of your app, or you can’t guide your implementation with tests. (This connects to Richard’s two reasons for testing in the 11th post in the thread.)

All of that is to say that purity and immutability dramatically change the approach and philosophy of testing. We don’t have the “gorilla holding the banana in the jungle” problem, so we don’t need mock jungles and gorilla spies (guerrillas?). But we do need to pass arguments to functions, and elm-test takes the most salient idea from prior art in FP testing – fuzz tests – and makes them super easy to use. And once we’ve called the function under test (not a whole system, just a function!), we need to validate the return value. Even a command is just a value that can theoretically be validated, but for various reasons those values are opaque and cannot be inspected.

That’s where Richard’s Expect.Http idea and similar ones come in. I think that, in broad strokes, it’s a good idea since it makes unit testing a function that returns Http.Request look like any other unit test. The specific API Richard proposes requires a fairly large number of functions in Expect.Http, all of which require kernel/native code, right? It also means a single expectation and test includes an unlimited number of validations. I wonder if a better API would be to convert the request to a record, and then use existing expectations on its fields?

    |> Expect.Http.fromCmd
    |> .header
    |> Dict.get "X-Requested-With"
    |> Expect.equal (Just "XMLHttpRequest")

I mean, maybe not, but this would take a very direct approach to turning an opaque value into an inspectable one.

describe "my HTTP Request" <|
    let
        req =
            Expect.Http.fromCmd cmd
    in
    [ test "has the correct X-Requested-With header" <|
        \_ ->
            req.header
                |> Dict.get "X-Requested-With"
                |> Expect.equal (Just "XMLHttpRequest")

    -- , other tests here
    ]

I don’t think I did a good job in my earlier post explaining why one should use Elmer.Platform.Command.fake, and @rtfeldman’s question about the value of step 3 brings this out.

As I’ve said before, to practice good TDD, I want to write tests that know as little as possible about implementation details. Why? Because as I add new features or realize better ways to structure my code, I’ll have to change the details of my implementation. When I change those details I want to run my tests; if those tests pass then I know my software still has all the behavior I’ve described with those tests. In this way, my tests provide me with confidence to change my code. But if my tests know a lot about implementation details, chances are that when I change how my software is implemented, I won’t be able to run those tests – I’ll have to spend some time changing them so they’re in sync with the new details of the implementation. But then my tests won’t be able to give me that confidence anymore, that fast feedback as to whether my software still has all its expected behavior. And if that’s true, then the value of those tests has been significantly diminished.

When it comes to writing tests for an Elm application, I want my tests to know as little as possible about the shape of the model, the messages passed around by the application, the particular functions that are called (besides view and update), and other implementation details. If I can write tests like that, then I’ll have a lot of freedom to refactor my code, and I’ll do so with confidence provided by running those tests.

This gets me back to @rtfeldman’s question: Of course I could write a test that called the update function myself – but that’s precisely what I want to avoid. To call update from a test, the test would need to know details about my model and about the messages passed around my app. If my tests know those things, then they will have to change should I refactor my code in a way that modifies these details; and if that’s so, then my tests won’t be able to give me confidence to change my code.

Take a look back at the examples I’ve provided in this thread. You’ll see that whenever Elmer.Platform.Command.fake or Elmer.Platform.Subscription.fake is used, there is no reference to any particular Msg value or tagging function. What Elmer test writers do is inject a command- (or subscription-) generating function into the test that then calls Elmer.Platform.Command.fake or Elmer.Platform.Subscription.fake with a reference to the tagger used by the code. For example, consider WebSocket.listen. During the test, you would replace this function (by injecting it or by using a Spy) with something like this:

fakeWebSocketListen : String -> (String -> msg) -> Sub msg
fakeWebSocketListen host tagger =
  Elmer.Platform.Subscription.fake "fake-websocket-listen" tagger

My test does know that I am using WebSocket.listen – but it doesn’t have to know about the details associated with how my app uses it. The test doesn’t care what the tagger is, it just passes the reference to Subscription.fake so that Elmer can process the subscription data just as the Elm runtime normally would. In this way, I can describe the high-level behavior of my app while my tests know as little as possible about the low-level details.
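For example, following the injection route, the code under test can take the listen function as a parameter. (The wiring below is hypothetical: the function name, the MessageReceived message, and the URL are stand-ins; only fakeWebSocketListen comes from above.)

```elm
-- Hypothetical wiring: subscriptions is parameterized over the listen
-- function, so production code passes WebSocket.listen while the test
-- passes fakeWebSocketListen instead.
subscriptionsWith : (String -> (String -> Msg) -> Sub Msg) -> Model -> Sub Msg
subscriptionsWith listen model =
    listen "ws://echo.example.com" MessageReceived
```

Neither the test nor fakeWebSocketListen needs to know that the tagger happens to be MessageReceived; that detail stays in the code under test.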

If you’re not a fan of TDD then Elmer might not seem worthwhile. This project has been a fun way for me to learn what the kind of TDD practice I follow might look like in Elm.

Thanks for clarifying! I think we agree that it’s desirable to be able to test commands and subscriptions without having to couple the tests to update, Model, or Msg.

I want to write tests that know as little as possible about implementation details.

This paragraph describes precisely what Spies do - testing internal implementation details such as which other functions are being called - and then argues forcefully that doing this is a mistake.

I don’t know how to read this as anything other than a full-throated condemnation of Spies, so I’m confused why they still seem like a good idea to include in the API. Don’t these supporting arguments point to the opposite conclusion?
