Developers: Be Careful How You Use Metrics

Whether we like it or not, we all use metrics in our lives. Sometimes it’s as subtle as guessing how much time you need to get ready to leave for work in the morning based on the previous times you’ve gone to work. Other times it’s as intentional as tracking your salary to measure your career growth.

Applying metrics to job performance, to measure how well things are going, is a natural extension of that. In software development, however, this is surprisingly complicated. Whether you’re a manager tracking the performance of a whole team or an individual contributor comparing your work against a coworker’s, it’s easy to stumble into pitfalls.

Be careful what you’re measuring.

The first thing to consider is what you’re really measuring. For example, stories completed over some period is a commonly used metric. What does this actually measure, though? Are all stories the same size? You might assume that even if they’re not, the differences average out if you just take a large enough sample. Unfortunately, the count alone can’t tell you whether that’s actually true.
What if an individual (or team) specifically takes small stories because they like quick turnarounds? Meanwhile, someone else on that same team takes big complicated ones since they like diving into deep, complex problems. Suddenly, you’re measuring people’s preferences instead of performance.
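
To make that concrete, here is a minimal Python sketch with made-up numbers. The `Story` type, the `hours` field, and the two developers are all hypothetical, and real effort is rarely this easy to observe. The only point is that equal work can produce very different story counts.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Story:
    owner: str
    hours: float  # hypothetical "true effort"; rarely known in practice


# Invented data: Ana prefers small stories, Ben prefers large ones.
stories = [Story("ana", 4)] * 10 + [Story("ben", 20)] * 2


def stories_completed(items):
    """Count completed stories per owner (the metric under discussion)."""
    return Counter(s.owner for s in items)


def effort_spent(items):
    """Sum the hypothetical effort per owner."""
    totals = defaultdict(float)
    for s in items:
        totals[s.owner] += s.hours
    return dict(totals)


counts = stories_completed(stories)
effort = effort_spent(stories)
print(counts)
print(effort)
```

In this invented data set both people put in the same 40 hours, yet one closes five times as many stories as the other.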

You might then say, “Well, let’s abstract this to points completed, as that measures the complexity of the work.” That’s true, but now you have to keep in mind that each team measures points differently. Points are calibrated per team, so they don’t compare across teams. Even if you compare within the same team, not all points are equal, as they only get so granular. Additionally, they don’t measure time but rather complexity. This means that one person can still finish a lot of 1-point stories while another is stuck on a single 5-point story all sprint, because higher complexity means higher unpredictability. On top of that, estimates for things like spikes and bugs are considerably less reliable (if you even point them), and someone could easily be doing more or fewer of those for some reason.

Be careful why you’re measuring.

Even if you’re extremely specific and find a metric where you’re sure you’re measuring something relevant, you still have to be careful. It’s worth stopping for a minute to ask WHY you’re observing something abnormal. For instance, let’s say you want to compare the time it takes different people to complete 5-point stories. Not all 5-point stories are the same, but if you take a large enough sample, the differences should even out for a mature team. This means you are probably measuring relevant differences, because you made your metric specific enough.

However, what does it mean if you’re seeing a large discrepancy? It could mean that one person is more efficient than the other. It could also mean that one person cuts more corners. Or maybe one person likes to take a lot of time to make sure they do it the right way, and other developers later build on that work to finish their own stories faster. If one person consistently does this, they might be the most valuable person on your team… but they might look awfully slow by that one metric.
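
The 5-point comparison above could be sketched roughly like this. The cycle times are invented, and `MIN_SAMPLES` is an arbitrary threshold I picked for illustration, not a statistical rule.

```python
from statistics import median

# Invented cycle times, in days, for 5-point stories only.
cycle_days = {
    "dev_a": [3, 4, 3, 5, 4, 3],
    "dev_b": [6, 7, 5, 8, 6, 7],
}

MIN_SAMPLES = 5  # arbitrary threshold, chosen for illustration


def summarize(samples):
    """Return (sample count, median days), or None if too few samples."""
    if len(samples) < MIN_SAMPLES:
        return None
    return len(samples), median(samples)


summary = {dev: summarize(days) for dev, days in cycle_days.items()}
for dev, result in summary.items():
    print(dev, result)
```

The gap between the two medians tells you that something differs. It does not tell you whether `dev_b` is slower, more thorough, or laying groundwork that speeds up everyone else.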

Be careful how you use results.

Developers are pretty smart, and you pay them to optimize. If you use metrics as a means to determine pay raises or bonuses, you might be surprised at what you get. I’ve seen teams split stories as small as possible to inflate their count of stories completed, just to make their team look good to management. Even if you use a good metric, using it for something that directly impacts the careers of individuals makes it less reliable at best and sabotages your own team at worst.

Let’s take the previous example of the developer who takes twice as long to do things in order to set their team up for success. If they find out they are performing poorly on an important metric that determines their raise, they might suddenly stop doing that. Suddenly your software is actively worse than before, all because you were using metrics to gauge performance blindly.

Use metrics wisely.

I’m not saying you should never use metrics. They can be very useful for giving you discrete items to measure for insights into what’s going on. You just have to be careful. Make sure you’re measuring something relevant. Consider the different explanations that could produce the numbers you’re getting. Don’t use those results in a vacuum, in a way that encourages bad habits. Consider using multiple metrics from different angles so you get a more complete picture. Sit down and have conversations about things with your employees or coworkers.

Sometimes the trend of your metrics is more important than the actual results. Sudden changes can be an indicator that something is wrong, and it’s not necessarily anybody’s fault.
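
One rough way to watch the trend rather than the raw number is to compare each new value against a short rolling baseline. The sketch below is only one of many possible approaches. The `sudden_changes` helper, its `window` and `threshold` defaults, and the velocity history are all made up for illustration.

```python
from statistics import mean, stdev


def sudden_changes(values, window=4, threshold=2.0):
    """Return indexes whose value deviates from the preceding window's mean
    by more than `threshold` standard deviations of that window."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        spread = stdev(baseline)
        if spread == 0:
            # Flat baseline: any change at all counts as sudden.
            if values[i] != baseline[0]:
                flagged.append(i)
            continue
        if abs(values[i] - mean(baseline)) > threshold * spread:
            flagged.append(i)
    return flagged


# Invented points-per-sprint history for one team.
velocity = [21, 23, 22, 24, 22, 23, 12, 22]
flagged = sudden_changes(velocity)
print(flagged)
```

A flagged sprint is a prompt for a conversation, not a verdict. The drop could just as easily be a holiday week or an outage as anything a person did.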

It’s worth taking the time to consider these issues when employing your metrics to make sure they’re used responsibly, for everybody’s sake.