I’ve been toying with setting up a code coverage tool for Elm. As part of this, I’ve been racking my brain over how to present the collected data in a goal-oriented way.
Let’s frame this by defining exactly what code-coverage ultimately does:
Code coverage gives a visual indication of which parts of the source code were and which parts were not evaluated during a test-run.
To be clear - “code was evaluated” does not mean “code is correct”. Writing tests which make no useful assertions but simply increase test coverage is trivial. Conversely, there is code you do not want to be evaluated, ever.
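To illustrate that point, here is a minimal elm-test sketch (the `discount` function and the test are made up for this example) that bumps coverage without verifying anything:

```elm
module DiscountTest exposing (suite)

import Expect
import Test exposing (Test, test)


-- A production-style function we would like to see "covered".
discount : Int -> Int
discount price =
    if price > 100 then
        price - 10

    else
        price


suite : Test
suite =
    test "evaluates discount but asserts nothing useful" <|
        \_ ->
            let
                _ =
                    discount 150
            in
            -- Coverage goes up because `discount` was evaluated,
            -- yet no behaviour has actually been checked.
            Expect.pass
```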
Based on this, I posit that code coverage is not a useful metric for measuring the quality or extensiveness of unit tests, and as such I don’t want to condense it down to a single percentage.
This brings me to the questions I’m actively looking for feedback on:
Is there a point to tracking expression-coverage?
Since Elm is eagerly evaluated, the only way for an expression not to be evaluated is for it to be one of the following:
- part of an unused function
- part of an unused lambda
- part of an unused case branch
- part of an unused if/else branch
Hence, tracking evaluation of function bodies, case branches, if/else branches, and top-level and let declarations seems “sufficient”. Looking only at those, one gains the same information about which parts of the code are used or unused. This also makes it harder to condense coverage down to a single number, which seems like a good idea.
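As a minimal sketch of why that seems sufficient (the `Shape` module and its names are invented for illustration): once a branch is entered, eager evaluation means every expression inside it runs, so branch-level tracking already tells the whole story.

```elm
module Shape exposing (Shape(..), area)


type Shape
    = Circle Float
    | Square Float


area : Shape -> Float
area shape =
    case shape of
        Circle radius ->
            -- If this branch is entered at all, eager evaluation means
            -- `pi * radius * radius` necessarily runs too, so tracking
            -- the branch is as informative as tracking each expression.
            pi * radius * radius

        Square side ->
            -- If no test ever constructs a Square, branch-level tracking
            -- already reports this branch as unevaluated.
            side * side
```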
There are edge-cases to consider, however.
- implicit lambdas in composed point-free expressions. Tracking these, however, seems like measuring implementation details of the code generation (see the sketch after this list).
- boolean expressions in a short-circuiting context: `True || foo == bar` would never evaluate `foo == bar`.
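Both edge cases in one hedged sketch (the module and function names are invented):

```elm
module EdgeCases exposing (isValid, normalize)


-- Edge case 1: point-free composition. The compiled JS may introduce
-- an implicit lambda here, but that lambda is an artifact of code
-- generation rather than something visible in the source.
normalize : String -> String
normalize =
    String.trim >> String.toLower


-- Edge case 2: short-circuiting. When `alwaysAllow` is True,
-- `String.length name > 0` is never evaluated, so an expression-level
-- tool would mark it uncovered even though the declaration ran.
isValid : Bool -> String -> Bool
isValid alwaysAllow name =
    alwaysAllow || String.length name > 0
```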
Are there other edge-cases I’m not thinking of that should be informing my decisions here? Regardless of the edge-cases, are there uses of expression-level coverage that I’m not seeing here?
How to present code-coverage?
Most tools come in two parts: a generic overview with some progress bars, raw numbers and funny names, and then the source code with some markup to visualize which things were or weren’t evaluated.
Is there an alternative approach possible that does a better job of informing how to write better tests?
This is a topic people tend to be very opinionated about, so I kindly ask that you provide feedback specifically on the two questions I’m asking, and not diverge into discussing the merits of code-coverage-as-a-metric. Thanks in advance!