How to Solve Software Bugs with the Scientific Method

I don’t do creepy, crawly bugs, but software bugs are a different story. I relish the opportunity to hunt and kill particularly nasty software bugs. Here, I’ll share how I approach solving a bug. A lot of my process will sound like common sense, but you would be surprised how many developers I have seen running in circles while trying to fix bugs. Consciously following a process can help maintain your sanity during those seemingly hopeless searches.

These steps will look very familiar if you think back to middle school science classes. Do you remember having “The Scientific Method” reinforced over and over again? It was essential knowledge for countless quizzes and lab reports.

  1. Pose a question or problem
  2. Form a hypothesis
  3. Plan an experiment
  4. Record data and observations
  5. Return to step 2 as necessary
  6. Report/conclusions

Pose a question or problem.

The first step is always to formulate an informed understanding of what needs to be fixed. Forming a concise question or problem statement will help you pin down your exact goal from the start. It may be as simple as, “Why is feature X doing Y when a user does Z?” Often, this will be done for you in the bug report or backlog ticket.

Another important part of this step is to gather and document all of the readily available information. Do you have reliable reproduction steps? Which users are affected? How long has this been going on? What are the relevant parts of the application? Having a great QA tester on your side is invaluable for answering these questions.

Form a hypothesis.

Once you have as much information as possible, use it to create a hypothesis or prediction for what may be causing the bug. Maybe you have a really good guess. Or maybe it’s vague, and that’s okay. As you iterate on this process, you will home in on more specific hypotheses.

For example, you might be able to be as specific as this hypothesis: “I think users are being logged out unexpectedly because their access tokens are expiring sooner than expected.” Or maybe you need to start more generally with something like: “I think a failed network request could cause a user to be logged out unexpectedly.”

Be sure to write down your hypothesis. It will be helpful to refer back to it later and ensure that you don’t waste time retrying the same tests over and over again.

Plan an experiment.

There are many types of experiments that can test your hypothesis. Writing automated tests, using a debugger, watching network dev tools, or even just adding log("here") at key points are all perfectly valid options. But as you plan how to test, keep your hypothesis in mind. A clear goal for your experiment will make your bug hunt feel less like a wild goose chase.

The goal of this step is not to fix the bug. You just want to find out whether your hypothesis holds. You really shouldn’t be making any major code changes, because you want these tests to show you the implementation as it is. If your experiment leads you directly to a solution, that’s great, but for particularly stubborn software bugs, you should feel comfortable with quite a few failed experiments before finding a fix.
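As a minimal sketch of such an experiment, take the token-expiry hypothesis from earlier. Everything named here is hypothetical and invented for illustration: `issue_token` is a stand-in for whatever code your application really uses, and `EXPECTED_TTL_SECONDS` assumes the team expects tokens to last one hour. The experiment only observes the existing behavior and compares it to the expectation; it changes nothing.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("bug-hunt")

# Assumption: the team expects tokens to live for one hour.
EXPECTED_TTL_SECONDS = 3600


def issue_token(now: float) -> dict:
    """Hypothetical stand-in for the code under investigation.

    The experiment leaves it untouched and only observes its output.
    """
    return {"issued_at": now, "expires_at": now + 60 * 6}  # made-up behavior


def measure_token_lifetime(now: float = 0.0) -> float:
    """Experiment: observe how long a freshly issued token actually lives."""
    token = issue_token(now)
    lifetime = token["expires_at"] - token["issued_at"]
    log.info("token lifetime: %s s (expected %s s)", lifetime, EXPECTED_TTL_SECONDS)
    return lifetime


def hypothesis_supported() -> bool:
    """Hypothesis: tokens expire sooner than expected."""
    return measure_token_lifetime() < EXPECTED_TTL_SECONDS


if __name__ == "__main__":
    print("hypothesis supported:", hypothesis_supported())
```

Note that the result is an answer about the hypothesis, not a fix. Whether the short lifetime is the cause of the logouts is a question for the next experiment.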

Record data and observations.

After your experiment has provided insights into your hypothesis, go back to where you wrote the hypothesis down and record what you learned next to it. It may even help to note what you did to reach your conclusion. Don’t worry if your experiment disproves the hypothesis. That will be the case most of the time, and it just means you’re one step closer to finding the real solution.

Keeping track of these things is a great benefit when handing off work or setting it aside for later. If a higher priority feature comes up and you can’t get back to this bug for a few days, you’ll appreciate not starting over from square one. Or if a coworker has to take over the work, good notes will be a great starting point for them.
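Notes like these can live anywhere, from a text file to the ticket itself. If you prefer something structured, here is one possible sketch. The `BugNotes` and `Entry` names and the sample entry are made up for illustration; they are not part of any tool mentioned in this post.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entry:
    """One pass through hypothesis -> experiment -> observation."""
    hypothesis: str
    experiment: str
    observation: str
    supported: bool


@dataclass
class BugNotes:
    problem: str
    entries: List[Entry] = field(default_factory=list)

    def record(self, hypothesis: str, experiment: str,
               observation: str, supported: bool) -> None:
        self.entries.append(Entry(hypothesis, experiment, observation, supported))

    def already_tested(self, hypothesis: str) -> bool:
        """Check before re-running an experiment you have already done."""
        wanted = hypothesis.strip().lower()
        return any(e.hypothesis.strip().lower() == wanted for e in self.entries)

    def summary(self) -> str:
        """Plain-text handoff notes for a coworker or for future you."""
        lines = [f"Problem: {self.problem}"]
        for number, e in enumerate(self.entries, start=1):
            verdict = "supported" if e.supported else "ruled out"
            lines.append(f"{number}. [{verdict}] {e.hypothesis}")
            lines.append(f"   experiment: {e.experiment}")
            lines.append(f"   observed: {e.observation}")
        return "\n".join(lines)


# Made-up example entry.
notes = BugNotes("Why are users logged out unexpectedly?")
notes.record(
    hypothesis="A failed network request logs the user out",
    experiment="Blocked one request in the network dev tools",
    observation="Request failed, user stayed logged in",
    supported=False,
)
print(notes.summary())
```

The `already_tested` check is the part that saves time: it is a cheap guard against retrying the same test days later.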

Repeat.

You now have a better understanding of what is going on. Use this information to form a new hypothesis and repeat the process. Take pride in failed experiments; each failure is one more corner of the app that you’ve ruled out.

The hardest part of this process is when you feel you can’t come up with a new hypothesis. If that happens, try zooming out. Treat the whole application as a black box: observe the actions going in and the buggy behavior coming out. Then slowly shrink the black box, one layer of the application at a time.
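One way to picture shrinking the black box is to check a single expectation at each layer boundary and see where it first breaks. The sketch below assumes the application can be modeled as a chain of functions, which is a simplification. The layer names (`parse_request`, `load_session`, `refresh_token`) and the bug in `refresh_token` are hypothetical.

```python
from typing import Any, Callable, List, Optional, Tuple

Layer = Tuple[str, Callable[[Any], Any]]


def first_bad_layer(layers: List[Layer], value: Any,
                    looks_ok: Callable[[Any], bool]) -> Optional[str]:
    """Feed value through the layers in order.

    Returns the name of the first layer whose output fails looks_ok,
    or None if every layer's output passes.
    """
    for name, layer in layers:
        value = layer(value)
        if not looks_ok(value):
            return name
    return None


# Hypothetical layers of an application, each taking and returning a state dict.
def parse_request(raw: str) -> dict:
    return {"user": raw.strip()}


def load_session(state: dict) -> dict:
    return {**state, "logged_in": True}


def refresh_token(state: dict) -> dict:
    return {**state, "logged_in": False}  # made-up bug for the example


LAYERS: List[Layer] = [
    ("parse_request", parse_request),
    ("load_session", load_session),
    ("refresh_token", refresh_token),
]


def still_logged_in(state: dict) -> bool:
    # Expectation: nothing in the chain should log the user out.
    return state.get("logged_in", True)


print(first_bad_layer(LAYERS, "  alice ", still_logged_in))  # prints "refresh_token"
```

The layer it names is not necessarily where the bug lives, but it gives you a smaller box to form your next hypothesis about.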

Report.

Once you prove a hypothesis true, you should have enough information to narrow down the broken code. Once you’ve fixed the bug, the most important thing you can do is report back to your team with what was wrong, how it got fixed, and any other interesting findings you made along the way.

Conversation
  • Casimir says:

    Thank you for the post. I had a hard time convincing my colleagues to use the scientific method, because it appears tedious to apply it step by step instead of doing some quick trial and error. I first learned about the method while reading “Why Programs Fail” by Andreas Zeller. He recommends setting yourself a time limit for fiddling (10 or 15 minutes) and switching to the scientific method when the problem proves to be hard.

    There is now “The Debugging Book” by the same author, which you can read online for free:
    https://www.debuggingbook.org/html/Intro_Debugging.html#The-Scientific-Method

    Some parts cover recent research which is probably not yet applicable in “real world” settings.
