Code Quality Metrics Are Guides, Not Goals

It’s tempting to try to measure code quality. And, to be fair, there is often a correlation between various metrics we’ve devised and the actual quality of code.

However, it’s a mistake to use the numbers to declare your code’s quality absolutely. Instead, use code quality metrics as a guide to shine a light on opportunities for improvement.

Let’s take code coverage as an example. When running your automated tests, your code is instrumented so we can tell which lines or branches are executed. The idea: the closer you get to 100% coverage, the more thoroughly your tests are exercising your code.
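If you haven’t watched that instrumentation in action, here’s a rough sketch of what a coverage tool is doing, using coverage.py’s Python API (the absolute function is just a stand-in for whatever code your tests exercise):

import coverage

def absolute(n):
    if n < 0:
        return -n      # this branch never runs in the "test" below
    return n

cov = coverage.Coverage(branch=True)   # record branches as well as lines
cov.start()
assert absolute(3) == 3                # only the non-negative path executes
cov.stop()
cov.save()
cov.report()                           # the negative branch shows up as a miss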

That number can be quite useful. But if a team gates shipping by requiring developers to stay above a coverage threshold, it can quite easily encourage bad practices.

Let’s examine a contrived example of poor code quality.

Say we want to test a new function:


def greet(name: str, is_night: bool = False) -> str:
    if is_night:
        return f"Good evening, {name}."
    else:
        return f"Good morning, {name}."

Let’s also say our standard is 90% test coverage. And our developers, who are supposed to be using test-driven development… aren’t. So they’re writing the tests after the fact, and they’re not very good:


def test_greet():
    result = greet("Mattie")
    assert isinstance(result, str)

Our code coverage is now 75%, since the is_night branch never runs, and that won’t allow us to ship the code. Our developer adds a test:


def test_greet_night():
    result = greet("Mattie", is_night=True)
    assert isinstance(result, str)

100% coverage! Ship it.

Terrible tests are real.

The problem is, these tests aren’t actually checking anything useful. Yes, greet should return a string. But we could easily change the strings to anything else, stop using name entirely, and the tests would still pass.

“Who writes a test that doesn’t check the return value?” you ask. Unfortunately, many developers, under pressure and just looking to ship, will write these kinds of tests when the functions under test get more complex.

Here are a couple of patterns I’ve spotted in codebases and code reviews (both are sketched below):

  1. The function returns a very complex object that would take several assertions to check completely, so the developer opts to merely make sure the return value is not null.
  2. The function under test has a lot of dependencies, some of which are really hard to mock, so the developer implements just enough mocking to keep the test from throwing an exception, and no return values are actually checked.

Neither problem will show up in a coverage report. If nobody examines these tests, they could easily mask a fatal bug in the function under test.
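Both patterns might look something like this (the functions under test here are hypothetical):

from unittest.mock import patch

def test_build_invoice():
    # Pattern 1: the invoice is a rich object, so we settle for "not null."
    invoice = build_invoice(order_id=42)
    assert invoice is not None

def test_sync_inventory():
    # Pattern 2: mock just enough that nothing throws, and assert nothing at all.
    with patch("warehouse.client.fetch_stock", return_value=[]):
        sync_inventory()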

In this case, coverage helped us as a guide, pointing us to a problem. But making the coverage number happy wasn’t the whole solution. Careful reading of the tests to make sure they’re actually guarding against breakage and allowing for refactoring is the real goal.
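For our greet function, that kind of test would pin the return values down:

def test_greet_morning():
    assert greet("Mattie") == "Good morning, Mattie."

def test_greet_evening():
    assert greet("Mattie", is_night=True) == "Good evening, Mattie."

Now a change to the wording, or a version that ignores name, fails the suite instead of slipping through.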

Expensive tests are also a real problem.

Sometimes, though, coverage will be low for a good reason: the cost of testing something may outweigh the return on that investment.

When we look at the smoke coming from the coverage tool to check for a fire, we may find code that, for example, requires setting up a large number of complex external dependencies to run a reasonable test.

It’s possible this code is simply too expensive to test. That is a valid outcome, but not one to reach lightly; developer leads should be reviewing these cases closely.

But low coverage might also be a sign that our code is too tightly coupled to those dependencies, and that it needs refactoring to break it up into smaller, much less complex pieces to test.

There still might be code at the end of that refactoring that is too expensive to test. But if you can reduce the complexity of that function by extracting parts that aren’t tied to a glut of dependencies, you can reduce that footprint and much more safely ignore the remainder.
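Here’s a sketch of what that kind of extraction can look like; every name in it is hypothetical:

# Before: the discount rules are buried in a function that needs a database,
# a payment gateway, and a message queue before a test can even run.
def charge_customer(order, db, gateway, queue):
    customer = db.find_customer(order.customer_id)
    total = order.subtotal
    if customer.loyalty_years >= 5:
        total *= 0.9
    if order.subtotal > 1000:
        total -= 50
    gateway.charge(customer.card_token, total)
    queue.publish("order.charged", order.id)

# After: the same rules extracted into a pure function that needs nothing
# but its arguments, leaving charge_customer as a thin, mostly logic-free shell.
def discounted_total(subtotal, loyalty_years):
    total = subtotal * 0.9 if loyalty_years >= 5 else subtotal
    if subtotal > 1000:
        total -= 50
    return total

Tests for discounted_total need no mocks at all, and the thin shell that remains is much safer to leave uncovered.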

It’s not just coverage.

Code coverage is the easiest to explain in a blog post, but other code metrics have similar problems.

Complexity is one. Even in a complex system, it’s easy to write functions that are far more complex than they need to be.

Complexity metrics count the number of branches in a function and highlight the functions where that count is high. The idea: high complexity breaks brains.
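As a rough sketch of how that counting works (a hypothetical function; most tools start at one and add one for each decision point):

def shipping_label(order):        # base complexity: 1
    if order.weight > 20:         # +1
        size = "XL"
    elif order.weight > 5:        # +1
        size = "L"
    else:                         # an else adds no new decision
        size = "S"
    if order.international:       # +1
        carrier = "global"
    else:
        carrier = "standard"
    return f"{size}/{carrier}"    # cyclomatic complexity: 4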

Simplifying a function that has grown to a high-enough complexity can be challenging. The developer tasked with the simplification likely already has a broken brain. Many will try to extract as many functions as they can to bring down the complexity count inside the main function.

But extracting functions is no guarantee that readability is at all improved. A complex function may need analysis, extensive refactoring, or even rethinking.
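For example (a hypothetical case), a long branch chain can sometimes be rethought as data rather than merely chopped into helper functions:

# Before: one branch per status code, and the complexity grows with every new code.
def status_message(code):
    if code == 200:
        return "OK"
    elif code == 404:
        return "Not found"
    elif code == 429:
        return "Slow down"
    elif code == 500:
        return "Server error"
    else:
        return "Unknown status"

# After: the branches become data, and the function is a single lookup.
MESSAGES = {200: "OK", 404: "Not found", 429: "Slow down", 500: "Server error"}

def status_message(code):
    return MESSAGES.get(code, "Unknown status")

Both versions behave the same way; the second is the one a reader can take in at a glance.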

In this case, a metric pointed us to a problem. But making the metric happy is not the end of the solution. Making the function comprehensible is.

What will you do with code quality metrics?

I’m not opposed to coverage, complexity, or other code quality measurements. I even think it’s interesting to plot them over time to see how we’ve been doing with our codebase as a whole.

A really good analog is the body mass index (BMI). Taken in aggregate over a population, BMI can be a useful measurement. Applied to a single person by a medical professional, BMI can completely misjudge their health and result in them receiving poor care.

I don’t think teams should set up arbitrary numbers as goals to reach. Rather, when the tools show you something, take it seriously, solve any real problems, and move on.

Your job as a developer is to ship code that is effective, well-tested, and comprehensible — not to hit a number that someone decided measures your output.


As I was wrapping up this post, I was looking at several tools designed to improve our posts’ searchability and readability. I had to laugh a little because my writing was being measured by automated tools.

I may not be able to turn every indicator green while keeping this article useful and understandable, but I can see what the tools have to say and find opportunities to make this post better.

They’re good guides.