Measuring Quality: Code Coverage

Delete All the Asserts?

Feb 13, 2023

For my whole life as a developer I have known that code coverage is somewhat misleading. Still, I always found it helpful up to a point, and never really thought about what exactly is wrong with it.

Image: a wall of Roman shields covering some Java code, an **inaccurate** parable for code coverage.

For my recent talk “How (Not) to Measure Quality” I took the time to analyze what exactly is wrong with code coverage.

Just to be on the same page: by “code coverage” I mean pretty much the basic definition from Wikipedia:

… a percentage measure of the degree to which the source code of a program is executed when a particular test suite is run.

So, following the reverse Goal, Question, Metric approach, I’d like to figure out what question this metric answers that helps us to reach our goal. Our goal here should be to determine the quality of our software.

I think a lot of people implicitly expect code coverage to answer the question “How well is the code covered by tests?”. But what is good testing? A big part of testing is knowing what to expect of a product and asserting that it meets these expectations. Code coverage, however, only measures whether a line of code was executed, not whether the result of that execution was correct.

To make this more tangible: given the following partial implementation of the Bowling Game Kata and its test:

The test yields a code coverage of about 80%. Now, there’s a part of the test that can be removed without reducing that number at all: the assertion in lines 15–19. But if we remove these lines, we can change whatever we like in the implementation and the test will still pass! We can also change the numbers in the test without any consequence!
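The effect can be reproduced with a much smaller, hypothetical stand-in (this is not the kata listing referenced above; `BowlingGame`, `AssertFreeTest` and both method names are assumptions made for illustration). The sketch uses plain Java instead of a test framework to stay self-contained. Both "tests" execute exactly the same lines, so a coverage tool would report the same number for each, but only the second one can ever fail:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the kata (no spare or strike bonuses).
class BowlingGame {
    private final List<Integer> rolls = new ArrayList<>();

    void roll(int pins) {
        rolls.add(pins);
    }

    // Sums all pins rolled so far.
    int score() {
        int score = 0;
        for (int pins : rolls) {
            score += pins;
        }
        return score;
    }
}

class AssertFreeTest {
    // Executes every line of BowlingGame, but checks nothing:
    // it can only fail if the code throws an exception.
    static void scoreWithoutAssertion() {
        BowlingGame game = new BowlingGame();
        game.roll(3);
        game.roll(4);
        game.score(); // result is discarded
    }

    // Same execution path and same coverage, but the expectation is verified.
    static void scoreWithAssertion() {
        BowlingGame game = new BowlingGame();
        game.roll(3);
        game.roll(4);
        int actual = game.score();
        if (actual != 7) {
            throw new AssertionError("expected 7 but was " + actual);
        }
    }
}
```

If `score()` were changed to subtract instead of add, `scoreWithoutAssertion()` would still pass, while `scoreWithAssertion()` would fail.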

I think we can agree that code with such a test is not well tested. Hence code coverage does not answer the question “How well is the code covered by tests?”. It merely answers the question “How much code is executed by tests?”. I think that “How much code is not executed by tests?” is more to the point when determining the quality of the product: as shown above, execution doesn’t mean well tested, but non-execution surely means not tested at all.

So this is the bit of value code coverage truly gives us: a negative metric and hints of where tests are still missing. A coverage of 80% can mean a lot of meaningless, badly structured tests that make working with the code super hard. It can also mean a great test suite that documents the intent of previous developers and prevents unintended changes. A coverage of 30% surely means the code is poorly tested.

Alternative: Mutation Testing

There’s an alternative to code coverage that, in my opinion, truly answers the question “How well is the code covered by tests?”. It is provided by Mutation Testing: the ratio of killed mutations to total mutations.

Mutation testing manipulates the code of the product. For instance, it would change

int score = firstRoll + secondRoll;

into

int score = firstRoll - secondRoll;

Then it executes all the tests that execute that mutated line. If all tests pass despite the mutation, the mutation ‘survives’. If at least one test fails, the mutation is ‘killed’.
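A hand-rolled sketch of this kill-or-survive decision might look as follows. It is not how a real mutation testing tool is implemented (such tools typically mutate compiled code and run the actual test suite); `MutationSketch`, `WEAK_TEST`, `STRONG_TEST` and `isKilled` are hypothetical names, and the two "tests" are plain predicates over the scoring expression:

```java
import java.util.List;
import java.util.function.IntBinaryOperator;
import java.util.function.Predicate;

class MutationSketch {
    // The original expression: firstRoll + secondRoll
    static final IntBinaryOperator ORIGINAL = (firstRoll, secondRoll) -> firstRoll + secondRoll;
    // The mutant: '+' replaced by '-'
    static final IntBinaryOperator MUTANT = (firstRoll, secondRoll) -> firstRoll - secondRoll;

    // Weak test: with a second roll of 0, '+' and '-' give the same result.
    static final Predicate<IntBinaryOperator> WEAK_TEST = score -> score.applyAsInt(5, 0) == 5;
    // Stronger test: a non-zero second roll tells '+' and '-' apart.
    static final Predicate<IntBinaryOperator> STRONG_TEST = score -> score.applyAsInt(3, 4) == 7;

    // A mutant is killed as soon as at least one test fails against it;
    // if every test passes, it survives.
    static boolean isKilled(IntBinaryOperator mutant, List<Predicate<IntBinaryOperator>> tests) {
        return tests.stream().anyMatch(test -> !test.test(mutant));
    }
}
```

Both tests pass against `ORIGINAL`. With only `WEAK_TEST` in the suite the mutant survives, which points at a missing test case; adding `STRONG_TEST` kills it.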

A mutation is a clear mistake. It changes the code in a way that cannot be good. If our test suite doesn’t detect a mistake like that, our code is clearly not well tested.

In my experience, Mutation Testing is a great tool to guide testing. By simply ‘killing mutations’, I ended up with a very decent test suite with meaningful cases. It is also quite effective at detecting code branches that are not needed at all: if a condition can be mutated without changing the result, it can probably be removed.

The only problem with Mutation Testing is execution time. To know which lines of code are executed by which test, it requires an initial code coverage analysis. On top of that, it applies a number of possible mutations, and for each of these it executes the tests covering the mutated line of code again. The execution time largely depends on the number of possible mutations, but it is certainly several times longer than a plain code coverage analysis.
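To get a feeling for how the cost adds up, here is a rough back-of-the-envelope sketch. The class and method names are hypothetical, and the result is an upper bound under the assumption that every covering test is re-run for every mutation; a tool may stop at the first failing test and so need fewer runs:

```java
import java.util.Map;

class MutationCostSketch {
    // Both maps are keyed by line number.
    // mutationsPerLine: how many mutations can be applied to that line.
    // coveringTestsPerLine: how many tests execute that line
    // (known from the initial coverage analysis).
    static int testRunsNeeded(Map<Integer, Integer> mutationsPerLine,
                              Map<Integer, Integer> coveringTestsPerLine) {
        int runs = 0;
        for (Map.Entry<Integer, Integer> entry : mutationsPerLine.entrySet()) {
            // Lines without any covering test cause no re-runs at all.
            int coveringTests = coveringTestsPerLine.getOrDefault(entry.getKey(), 0);
            runs += entry.getValue() * coveringTests;
        }
        return runs;
    }
}
```

With made-up numbers, a line with 3 possible mutations covered by 4 tests and a line with 2 possible mutations covered by 1 test already require 3 × 4 + 2 × 1 = 14 test executions, compared to 5 for a single plain run.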

Running it nightly is one way of dealing with this. I also run it locally for the part of the code I’m currently working on, to fill the gaps my (far from perfect) test-driven development (TDD) left.

Written by Michael Kutz