Making Better Custom Software Estimates

A friend asked me on the spur of the moment to give a short talk on estimating for his team-lead meeting. They were facing the somewhat daunting task of estimating and planning phase three of a large enterprise software adoption project.

Trying to provide help to another team made me reflect on this question: Why is estimation so hard, and so disliked by technical staff?

Why is estimation hard and universally unloved? For one thing, it’s a discouraging problem: ambiguous, fuzzy, imprecise, full of unknown factors, risky, and consequential. You never have all the information you want, and you worry about your second-order ignorance (what you don’t know you don’t know).

On the other hand, it’s not like we have a choice about tackling this thorny problem. The business needs to synchronize other decisions with the work we do: plan sales and marketing, budget, run return analyses, schedule resources, and so on. And we certainly have selfish reasons to participate and do this well: missing a schedule commitment or overrunning a budget should injure our professional pride. It can also result in pressure to work beyond our sustainable pace and make personal sacrifices for our jobs.

When it comes to effective estimation, we should remember Voltaire’s caution about perfection being the enemy of the good.

Happily, there are some simple techniques we’ve learned that significantly increase our effectiveness and accuracy at this important work. I decided to organize my impromptu talk around the shortcomings of a typical, unenlightened estimation effort.

Here’s what I’ve seen as pretty typical when a team makes a genuine effort to estimate but isn’t familiar with some of the techniques I’m going to describe. Naive estimates are usually:

  1. made on very big things,
  2. constructed by the manager or team-lead only,
  3. not data-driven,
  4. based on little or no concrete investigation,
  5. expressed in absolute values of time,
  6. expressed to a false level of significance,
  7. created from single point estimates,
  8. independent of the calendar,
  9. independent of risks or assumptions.

Decomposition

Decomposition, breaking big things into multiple small things, is a critical element in improving your estimates.

Decomposition refers to breaking something large (a project, a feature, an interface) into multiple, smaller things (tasks, stories, widgets, phases, etc). Rather than estimate the big thing, we estimate all the small things and add them up.
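
As a minimal sketch of the bookkeeping, here is decomposition replacing one big guess with several smaller ones that get summed; the tasks and point values are made up for illustration:

    # Illustrative sketch: estimate the small pieces and sum them,
    # rather than guessing at the whole. Tasks and points are made up.
    feature_breakdown = {
        "design review with customer": 2,
        "database schema changes": 4,
        "service endpoint": 4,
        "UI form and validation": 8,
        "automated tests": 4,
    }

    feature_estimate = sum(feature_breakdown.values())
    print(f"Feature estimate: {feature_estimate} points")  # 22 points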

The smaller the chunks, the more accurate the estimate. I don’t know who’s proved this, or who observed it first, but it’s certainly true in our experience. Partly it’s the conversation you have, as a team, in decomposing a problem for estimation. That work, the beginning of a shared mental model for the project, reveals misunderstandings, hidden features, requirements, dependencies and constraints, and creates a set of smaller estimation problems.

It’s not lots of fun, so we “share the pain” and do it in at least a pair, if not the whole team. The positive side effect of this approach is that you build responsibility and buy-in from the team for the estimate. On the other hand, doing it with the whole team certainly increases the cost of the estimate. The tradeoff between improved accuracy and decreased cost differs by context.

Divide and conquer.

Team Estimates

The team needs to own the estimate, so the team needs to make the estimate.

If you’re estimating with the team, then you need to work to consensus on each chunk you’re estimating. The conversation that comes out of differing opinions on the complexity of a task helps reveal the nature of that task, the diversity and importance of assumptions, and the range of optimism within the team. The results improve from simply talking about all this stuff.

You need to watch out for an overly strong voice or an overly influential person negating the benefit of that diversity of perspectives. Junior or less-experienced members of the team may not feel there is value in disputing a senior member’s estimate. Planning poker can help in these cases.
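
As a rough sketch, here is one way to use the spread of individual estimates to trigger that conversation; the 2x threshold and the estimates themselves are illustrative, not a rule we prescribe:

    # Illustrative planning-poker helper: everyone reveals an estimate at once;
    # a wide spread means we stop, talk through assumptions, and estimate again.
    def needs_discussion(estimates, max_spread=2.0):
        """estimates: dict mapping team member to a point estimate for one task."""
        values = list(estimates.values())
        return max(values) / min(values) > max_spread

    round_one = {"senior dev": 4, "junior dev": 16, "tester": 8}
    if needs_discussion(round_one):
        print("Wide spread: talk through assumptions and estimate again.")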

Prior Data

Mining data from prior projects, especially if time was formally tracked, can be an invaluable input to your estimation efforts.

Using data from prior projects is an easy way to ground your estimates in some reality. If you don’t track your time, and hence don’t have an accurate database from prior projects, shame on you. But don’t let that stop you. If you have to gather data from unreliable sources (people’s memories, for instance), then keep it coarse (weeks, not hours; months, not weeks).

If you made estimates on prior projects, and you have data about how long the projects actually took, use that comparison to reflect on how the estimate varied from the reality. Feed what you learn into your current estimation effort.
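
As a simple sketch of that comparison, assuming you have estimated and actual durations recorded for a few past projects (the numbers are illustrative):

    # Illustrative: compare estimated vs. actual duration on past projects
    # and use the average overrun to sanity-check a new estimate.
    past_projects = [
        {"name": "project A", "estimated_weeks": 10, "actual_weeks": 13},
        {"name": "project B", "estimated_weeks": 6, "actual_weeks": 7},
        {"name": "project C", "estimated_weeks": 20, "actual_weeks": 28},
    ]

    ratios = [p["actual_weeks"] / p["estimated_weeks"] for p in past_projects]
    average_overrun = sum(ratios) / len(ratios)

    new_estimate_weeks = 12
    print(f"We historically run {average_overrun:.2f}x our estimates, so a "
          f"{new_estimate_weeks}-week estimate suggests planning for about "
          f"{new_estimate_weeks * average_overrun:.0f} weeks.")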

The task decomposition from prior projects is also a good place to check your current work against. If your projects are similar in nature, you may eventually develop a checklist of items to make sure they’re on your estimation radar.

Concrete Experiments

Experiment trumps speculation when it comes to improving estimation accuracy.

A little investigation goes a long way. Concrete experiments trump idle meeting room speculation every time. If it’s important to be accurate, make or take the time to investigate the tasks with large estimates (usually a sign of insufficient understanding, fear or risk). An investigation might consist of simple research, writing a small amount of throw-away code, further interviewing a vendor, seeking clarity on requirements, investigating assumptions, quizzing partners, etc. These investigations take time, and hence have a cost. As with whole team estimation, there’s a tradeoff between accuracy and cost.

Relative/Arbitrary vs Absolute/Real

Relative estimation in arbitrary units beats absolute estimates in actual time units.

Relative complexity is easier to judge than absolute values. If you doubt this, do a quick experiment: look at two people sitting near you. Which one weighs more? Pretty easy, right? How many pounds does each weigh? Harder, right?

Estimating in relative complexity means judging how big or complex tasks are with respect to other tasks. The units of complexity to use in this kind of estimation are irrelevant. We prefer “points”. I’ve also used NUTs (nebulous units of time). You start by having the team agree on a reference point. For instance, the team might decide that task 1 is a 10 NUT problem. Now, when estimating task 2, it can be compared to task 1. 50% bigger or more complex? 15 NUTs. Half as complex? 5 NUTs.

Why not use actual values of time, even if you are doing relative complexity estimation? After all, the business needs an estimate in days, weeks or months, not NUTs or points. Part of the problem of using actual time units is the confusing accounting that will arise. If you estimate a task to take 20 hours, and you actually finish it in 10 hours, then what do you tell your manager? That you got 20 hours of work done in 10 hours? And what about in the other direction? When you underestimate a task you find yourself reporting that you did 20 hours of work in 40 hours. Really? Why were you slacking? And do you mean real hours or estimated hours?

Estimating in arbitrary units lets you report on the natural wins and losses in a less confusing manner. Great week? “We finished 30 points!” Run into some difficulties? “We only finished 17 points.” You put forth the same effort in each case, so there’s nothing odd to explain.

Tracking your project velocity lets you turn your estimation units into actual time (and hence a calendar and cost for the business). It’s simple: you measure and track the amount of work the team finishes in each iteration. The team’s velocity is the rate at which they can complete work. In our experience, teams take 2-3 iterations (weeks, in our case) to find their stable velocity. Teams that have worked together before, or that have recently done similar projects, may stabilize even faster.

If you’ve estimated the project in points, and you’re tracking velocity, then you simply divide the total number of points of work remaining by the velocity to know how many iterations you expect are required to finish the project. We usually report this to the customer with a burndown chart.
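
A minimal sketch of that division, with an illustrative backlog and velocity:

    import math

    # Illustrative: convert points remaining and measured velocity into a
    # number of iterations (and hence a calendar duration for the business).
    points_remaining = 170      # estimated work left in the backlog
    velocity = 22               # points the team finishes per iteration
    iteration_length_weeks = 1  # we use one-week iterations

    iterations_needed = math.ceil(points_remaining / velocity)
    print(f"{iterations_needed} iterations "
          f"(about {iterations_needed * iteration_length_weeks} weeks) remaining")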

The final advantage I’ll point out about using relative complexity points and tracking velocity is the way they naturally account for the difference between an ideal 8-hour workday and the hacked-up, interrupted, full-of-meetings, occasionally-longer-lunch day. Team velocity represents the team’s capacity with all the messy details of real work situations already accounted for.

False Significance

Estimation is frustrating: fuzzy, difficult, inexact. You can simplify the process, reduce the effort, and maybe even improve overall accuracy by trying to be less accurate at the detail level.

Estimation is an inherently squishy activity. You can’t prove your estimate is correct (at least until you’ve done the work). You have no option but to make a choice in the face of imperfect or incomplete knowledge. This can make it hard for analytical or technical people to estimate.

Trying to make estimates with too many significant digits makes things worse. Is this task 6.0, 6.5, or 7.0 hours of work? Call it 6 or 7 and be done. The added precision is false anyway, considering the nature of the problem, so there’s no use in spending time discussing it.

We believe in taking this a step further and estimating in discrete buckets related by powers of 2: our project tasks are either 1, 2, 4, 8, 16, or 32 points. The biggest bucket is usually a sign that we need to work harder at decomposing the task so we can estimate it better. Using this discrete set of estimates avoids spending pointless time and effort trying to distinguish between a 6- and a 7-point story (see the sketch after this list). It also simplifies and improves the use of a reference task, since the common cases are:

  • relatively trivial (1 point),
  • 1/2 the reference (2 points),
  • same as the reference (4 points),
  • twice the reference (8 points),
  • four times the reference (16 points).
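
Here is the sketch mentioned above: a small helper that snaps a raw size judgment onto the nearest bucket (the raw values are illustrative):

    # Illustrative: snap a raw "how big is it, really?" judgment onto our
    # discrete power-of-two buckets instead of arguing over 6 versus 7.
    BUCKETS = [1, 2, 4, 8, 16, 32]

    def to_bucket(raw_points):
        return min(BUCKETS, key=lambda b: abs(b - raw_points))

    print(to_bucket(7))   # -> 8
    print(to_bucket(13))  # -> 16
    print(to_bucket(35))  # -> 32; anything this big probably needs more decomposition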

A false level of significance (aren’t you glad you learned about significant digits in middle school?) can also hurt when you perform operations like summing and averaging on your estimates. Telling a customer that you achieved 16.745 points per iteration is silly, and invites them to expect an unattainable level of accuracy in your project management metrics.

Range Estimates

Single point estimates don’t accurately represent the natural variation in a task. A range estimate of low and high is a good first step to improve accuracy. With a little definition and some simple math, range estimates can be used to much more accurately and responsibly estimate a project.

Single point estimates do a poor job of representing the variability in the actual time required for a task. The detailed explanation of why this is so involves understanding the shape of the probability distribution of a typical software task’s completion time. Steve McConnell’s book Software Estimation: Demystifying the Black Art has a good explanation of this.

A range estimate (low and high, say) gives a lot more information about the nature of the task being estimated. The difference between low and high indicates the uncertainty the team has in the estimate, or the natural variability of the task itself. A big spread indicates more uncertainty; a small spread indicates more confidence and less variability.

Our experience has taught us that simply asking developers to make two estimates (“low” and “high”) doesn’t really produce much more information than a single point estimate. Developers tend to be optimistic when it comes to individual estimates. So without a little structure, the “low” estimate usually means the absolute smallest amount of time the task could take (an event with a very low probability of coming true), and the “high” estimate means what the task is most likely to take to get done. This leaves unaccounted for a lot of things that could push the actual completion time out (the long right tail of the estimate’s probability distribution), and it leaves your overall project likely to be under-estimated.

If we’re making coarse, large-grain estimates, then we’ll use a range analysis technique described in Chapter 17 of Mike Cohn’s book Agile Estimating and Planning. This approach to project buffering is closely related to Goldratt’s critical chain project management techniques.

In this approach, our low or “aggressive but possible” (ABP) estimate is the most likely amount of time the task will take. The high or “highly probable” (HP) estimate is a conservative estimate that takes possible problems into account. By way of example, let’s say I live 15 minutes from work. On a good day, I know my ABP estimate for getting to work is 15 minutes. But what if there’s bad traffic, construction, or an accident? I know the route and the alternate routes well enough that I feel confident, under almost any conditions, that I could make it to work in 30 minutes. Making the highly probable estimate requires either a lot of confidence and knowledge, or a really high estimate. On a task with high variability or unknown elements, the spread between “low” (ABP) and “high” (HP) can be quite large.

So how should we use the range estimate? Summing the HP estimates for all tasks will give a very large estimate for the project. After all, it’s very unlikely that you’ll hit the high estimate on every single task. It might feel like you’ve been on such projects before, but that’s probably because you weren’t making true HP estimates for the high end. If you’ve made your HP estimates accurately, then a 10-task project has only about a one-in-a-billion chance of exceeding the high estimate on every single task (if each HP estimate has, say, a one-in-eight chance of being exceeded, the chance of blowing all ten is (1/8)^10, roughly one in a billion). Using the sum of the high estimates would be terrible sandbagging.

Using the sum of the ABP estimates is also a problem. Doing so doesn’t account for any of the natural variation in the tasks, or the asymmetry of the completion time distribution function. The approach we use is to add a project buffer to the sum of the ABP estimates. You can think about the project buffer as receiving a contribution from each task in the project. You don’t know in advance which tasks will draw upon the project buffer, but you want to make sure that each task has contributed to it in proportion to the likelihood of need. The spread between the ABP and HP estimates indicates the potential for a task to go over and make a withdrawal from the project buffer.

The calculation we favor for the project buffer is the square-root-of-the-sum-of-the-squares approach Cohn describes: square the spread between HP and ABP for each task, sum the squares, and take the square root.

Project buffer = √( (HP₁ − ABP₁)² + (HP₂ − ABP₂)² + … + (HPₙ − ABPₙ)² )

The overall project estimate then is:

Project estimate = (ABP₁ + ABP₂ + … + ABPₙ) + Project buffer

The project estimate is simply the sum of all the most likely task estimates (the ABP estimates), plus a project buffer. Since the buffer is sized based on the spread between the low and high estimates, it protects the project from variability in a responsible manner. With this approach you’re neither sandbagging, nor irresponsibly underestimating.
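
A small sketch of the whole calculation, with illustrative ABP/HP pairs in weeks:

    import math

    # Illustrative: project estimate = sum of ABP estimates plus a buffer sized
    # by the square root of the sum of the squared (HP - ABP) spreads.
    tasks = [
        # (ABP, HP) in weeks for each task; the values are made up
        (1.0, 2.0),
        (2.0, 5.0),
        (0.5, 1.0),
        (3.0, 4.0),
    ]

    abp_total = sum(abp for abp, hp in tasks)
    buffer = math.sqrt(sum((hp - abp) ** 2 for abp, hp in tasks))
    project_estimate = abp_total + buffer

    print(f"Sum of ABP estimates: {abp_total} weeks")         # 6.5 weeks
    print(f"Project buffer: {buffer:.1f} weeks")              # ~3.4 weeks
    print(f"Project estimate: {project_estimate:.1f} weeks")  # ~9.9 weeks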

Date vs. Duration

Probably the simplest way of improving your project estimates is to explicitly de-couple duration from calendar. It’s easy to become so involved with estimating the duration of tasks that you forget to account for the actual work calendar.

On a small time scale, estimating in relative complexity and measuring velocity helps with this de-coupling. If your company culture is such that you can expect to spend 10 hours per week in meetings, then your velocity will automatically reflect that – you don’t need to remember to use 30 work hours per week when making calendar projections from duration.

Even with points and velocity, you still need to keep your eye on the large time scale picture. If your estimate for a project indicates you need 30 one-week iterations to complete it, don’t forget to lay those 30 iterations against a calendar of vacations, holidays, conferences, plant shutdowns, or whatever else takes people away from work in order to predict a completion date.
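
A rough sketch of that last step, assuming one-week iterations and a hand-maintained list of weeks the team is away (the dates are illustrative):

    from datetime import date, timedelta

    # Illustrative: lay 30 one-week iterations onto the calendar, skipping
    # weeks when the team is away (holidays, conferences, plant shutdowns).
    start = date(2024, 1, 8)                            # Monday of iteration 1
    weeks_off = {date(2024, 7, 1), date(2024, 12, 23)}  # Mondays of off weeks
    iterations_needed = 30

    week = start
    completed = 0
    while completed < iterations_needed:
        if week not in weeks_off:
            completed += 1
            last_iteration_week = week
        week += timedelta(weeks=1)

    print(f"Final iteration falls in the week of {last_iteration_week}")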

Assumptions & Risk

Estimates require making assumptions. Assumptions violated are like risks realized. Both can be accounted for with some estimate buffering.

Estimates are, by necessity, based on assumptions the team makes. Keeping track of these assumptions formally allows the team to review them and do a simple sensitivity analysis of the estimate with respect to the assumptions. At the very least, the assumptions should be documented as part of the estimate.

Like assumptions, it can be very valuable to identify and document project risks while estimating. Tim Lister, author with Tom DeMarco of “Waltzing with Bears: Managing Risk on Software Projects” describes a simple technique for responsibly accounting for these risks in a project schedule.

For each identified risk, record the probability of the risk occurring and the impact to the estimate if it does. Then calculate each risk’s contribution to an overall risk buffer by multiplying its probability by its impact. For example, with three identified risks whose probability-times-impact contributions work out to 0.6, 0.1, and 2 weeks:

Project risk buffer = 0.6 + 0.1 + 2 = 2.7 weeks
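
As a sketch, here is the same arithmetic in code; the specific risks are invented for illustration, but their contributions match the numbers above:

    # Illustrative: each identified risk contributes probability * impact
    # to the project risk buffer. The risk descriptions are made up.
    risks = [
        {"risk": "vendor API delivered late", "probability": 0.3, "impact_weeks": 2},
        {"risk": "key developer pulled away", "probability": 0.1, "impact_weeks": 1},
        {"risk": "requirements rework", "probability": 0.5, "impact_weeks": 4},
    ]

    risk_buffer = sum(r["probability"] * r["impact_weeks"] for r in risks)
    print(f"Project risk buffer: {risk_buffer:.1f} weeks")  # 0.6 + 0.1 + 2.0 = 2.7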

We don’t usually take this approach in our projects. We tend to turn risks into assumptions and document them along with the project estimate, helping the customer to understand them and making it clear that the schedule may be impacted if assumptions are violated.

If you use a project risk buffer, be sure you’re not double-counting risks that are already reflected in the spread of some task’s range estimate. Reserving the risk buffer for aspects of the team, project, or business environment, and the range estimates for the natural variation of the tasks themselves, achieves this separation.
