How to React and Respond to Production Bug Reports

You’re just sitting down with your morning coffee, getting ready to start work on that new feature, when an email comes in: Your client is reporting a bug in production! What do you do?

Watch the Clock

My first piece of advice is to make sure you respond reasonably quickly. I have a tendency to plan around units of progress, rather than units of time. I might tell myself, “I’ll just figure out X so I can talk about it in my response.” X inevitably branches off into X’ and X”, and then I look up and it’s been an hour or two.

Enter the timebox. Set a timer for ten or fifteen minutes, after which your responsiveness trumps the depth of your analysis. Respond when the timer goes off, even if it’s just something like, “Thanks for the report. We’re looking into it.”

Write the Message

Fast-forwarding a bit, once you’ve gotten to the bottom of the bug, it’s time for a proper response. As a software developer, I find it easy to focus solely on thorough, precise technical detail, which is usually both 1) more than the client needs, and 2) not very helpful to them. To save time in the writing, and to make sure you cover what matters, here’s a basic template:

1. Executive Summary

Concisely convey what matters to your client right now: Who and/or what was affected, what’s the current status, and when will it be fixed?

We’ve determined that this is a bug related to (feature X), preventing users who (condition) from being able to (perform action). We’re testing a candidate fix now, and anticipate being ready to deploy it to production around (time estimate).

2. More Detail

Provide additional information at the appropriate level of technical detail for your audience. How was the bug introduced, and how did you fix it?

During the course of (previous effort), the (existing behavior) was changed to (new behavior). As a result, it was necessary to (tangential change). This was completed throughout most of the system, but we unfortunately missed (this one other thing). The fix was to (small change).

3. Think Long-term

How will you ensure this bug doesn’t recur, and what can you do to reduce the likelihood of similar bugs?

Going forward, we will validate this behavior with specific regression tests, and (take a broader action) to prevent other similar problems from occurring.
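
If you find yourself writing these messages often, the three parts above could be kept as a small fill-in-the-blanks helper so that no section gets skipped under pressure. The following is only a rough sketch: the class name, field names, and sample values are all hypothetical, chosen to mirror the placeholders in the template, and the "More Detail" part is reduced to a free-form cause plus the fix.

```python
from dataclasses import dataclass


@dataclass
class BugReportResponse:
    """Hypothetical helper; fields mirror the template's placeholders."""

    feature: str         # (feature X)
    condition: str       # (condition)
    action: str          # (perform action)
    eta: str             # (time estimate)
    cause: str           # free-form sentences on how the bug was introduced
    fix: str             # (small change)
    broader_action: str  # (take a broader action)

    def render(self) -> str:
        # Part 1: executive summary
        summary = (
            f"We've determined that this is a bug related to {self.feature}, "
            f"preventing users who {self.condition} from being able to "
            f"{self.action}. We're testing a candidate fix now, and anticipate "
            f"being ready to deploy it to production around {self.eta}."
        )
        # Part 2: more detail
        detail = f"{self.cause} The fix was to {self.fix}."
        # Part 3: think long-term
        long_term = (
            "Going forward, we will validate this behavior with specific "
            f"regression tests, and {self.broader_action} to prevent other "
            "similar problems from occurring."
        )
        # Separate the three parts with blank lines
        return "\n\n".join([summary, detail, long_term])


# Example values are made up for illustration.
message = BugReportResponse(
    feature="checkout",
    condition="have a saved address",
    action="place an order",
    eta="3 PM today",
    cause="During a recent change, address validation was made stricter.",
    fix="update the saved-address lookup",
    broader_action="add checks for the other callers of that validation",
).render()
print(message)
```

Because every field is required, leaving out the time estimate or the broader action fails immediately when the object is constructed, rather than producing a message with a gap in it.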

Silver Lining

I’d like to draw particular attention to the “broader action” in part 3. One upside to encountering a critical bug is an opportunity for improvement far beyond resolving a single defect. For example, the experience may highlight a need for a new class of automated tests…or raise the priority of that major refactor you’ve been putting off. Maybe your project’s infrastructure needs some attention. Then there’s the human side of the equation: Could a change to a project or development practice have prevented the problem?
