The ideal code coverage report

I’ve been toying with setting up a code coverage tool for Elm. As part of that, I’ve been racking my brain over how to present the collected data in a goal-oriented way.

Let’s frame this by defining exactly what code-coverage ultimately does:

Code coverage gives a visual indication of which parts of the source code were evaluated during a test run and which parts were not.

To be clear - “code was evaluated” does not mean “code is correct”. Writing tests which make no useful assertions but simply increase test coverage is trivial. Conversely, there is code you do not want to be evaluated, ever.
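To make that last point concrete, here is a hypothetical sketch of the kind of branch you never want a test to reach - the names are made up, and Debug.todo wouldn’t survive a production build anyway:

```elm
-- Hypothetical sketch: a branch that exists only because the compiler
-- cannot see that modBy 3 never produces anything but 0, 1 or 2.
-- A test evaluating it would signal a bug, not better coverage.
toColumn : Int -> String
toColumn index =
    case modBy 3 index of
        0 ->
            "left"

        1 ->
            "middle"

        2 ->
            "right"

        _ ->
            Debug.todo "modBy 3 only produces 0, 1 or 2"
```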

Based on this, I posit that code coverage is not a useful metric for measuring the quality or extensiveness of unit tests; and as such I don’t want to condense it down to a single percentage.

This brings me to the questions I’m actively looking for feedback on:

Is there a point to tracking expression-coverage?

Since Elm is eagerly evaluated, the only way for an expression not to be evaluated is for it to be one of the following (a short sketch follows the list):

  • part of an unused function
  • part of an unused lambda
  • part of an unused case..of branch
  • part of an unused if..else branch
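A small hypothetical sketch to make this concrete - the names are made up, but every potentially unevaluated expression sits inside one of these constructs:

```elm
-- Hypothetical sketch: every expression that can go unevaluated lives
-- inside one of the constructs listed above.
describe : Maybe Int -> String
describe maybeCount =
    case maybeCount of
        Nothing ->
            "no count"

        Just count ->
            if count > 0 then
                "positive"

            else
                -- unevaluated unless a test passes a non-positive Just value
                "zero or negative"


-- if no test ever calls this, the whole body (lambda included) goes unevaluated
shout : List String -> List String
shout words =
    List.map (\word -> String.toUpper word ++ "!") words
```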

Hence, tracking evaluation of function bodies, case branches, if..else branches, and top-level and let declarations seems “sufficient”. Looking only at those, one gains the same information about which parts of the code are used or unused. This also makes it harder to condense coverage down to a single number, which seems like a good idea.

There are edge-cases to consider, however.

  • implicit lambdas in composed point-free expressions - though tracking these seems like measuring implementation details of the code generation.
  • boolean expressions in a short-circuiting context: True || foo == bar never evaluates foo == bar (see the sketch below).
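As a tiny sketch of that second edge-case (alwaysAccept is a made-up function):

```elm
-- Hypothetical sketch: `||` short-circuits in Elm, so foo == bar is never
-- evaluated here, no matter which tests run.
alwaysAccept : Int -> Int -> Bool
alwaysAccept foo bar =
    True || foo == bar
```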

Are there other edge-cases I’m not thinking of that should inform my decisions here? Regardless of the edge-cases, are there uses of expression-level coverage that I’m not seeing here?

How to present code-coverage?

Most tools come in two parts - a generic overview with some progress bars, raw numbers and funny names, and then the source code with some markup to visualize which things were or weren’t evaluated.

Is there an alternative approach possible that does a better job of informing how to write better tests?

I haven’t been able to come up with any, so for now this is what it looks like when run on the rather impressive elm-syntax codebase.


This is a topic people tend to be very opinionated about, so I kindly ask that you provide feedback specifically on the two questions I’m asking, and not diverge into discussing the merits of code-coverage-as-a-metric. Thanks in advance :heart:


Have you considered a heat map? Seeing that one branch has many, many hits while another has only one can be an indicator that the logic in the second branch is not completely tested.
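Something like this, say - the hit counts are made up, but the lopsided numbers would immediately point at the branch worth testing more:

```elm
-- Made-up hit counts, sketching how a heat map could annotate branches:
parseSign : Int -> String
parseSign n =
    if n >= 0 then
        -- 412 hits: exercised heavily
        "positive or zero"

    else
        -- 1 hit: barely exercised, possibly under-tested
        "negative"
```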

I’m interested in how you’re generating this report. Coverage and tracing are close siblings and I plan to add tracing to elm-benchmark at some point. I’d love to be able to use a similar approach if you’ve found something that works well.

Ooh, I like that. I was indeed wondering how to visualize “this was evaluated once” vs “this was evaluated many times”. I used color intensity at some point, but ran into some issues with nested things not being very clear and accessibility being a concern (hence the dashed underlines for things that weren’t evaluated).

This is very interesting, thanks!

Let’s discuss that on slack :slight_smile:

Have you ever seen the coverage report that Clover generates (for Java)? Its heat map is shown as a tag cloud of class names: names are bigger if they have more complexity, and redder if they have more untested LOC in them. Complexity is formally the cyclomatic complexity of the code, informally a measure of how many branches it has. This map makes it easy to identify code that has many untested branches in it, and to focus on those to improve the overall quality of the testing.
