How Do We Know Our Software Works?

Yes, really. How do we know that the software we build actually works? How can we know that it works? What observations and actions contribute to a holistic, fact-based, confident understanding that the software I just helped build does what it was intended to do?

I want you to feel a little anxiety about that question. Forget for a moment that you’re working with smart people who participate in practices that help bolster your confidence. Instead, dwell on the frightening reality that people are flawed and make mistakes on a regular basis. Human communication is always incomplete, good intentions don’t guarantee good results, and it can be genuinely hard to build a broadly-shared mental model about how a problem should be solved. How do we have any confidence that our software works?!

Feeling any anxiety yet? Excellent. Now let’s harness that anxiety and use it for good. Let’s take a look at the concrete observations and actions that give us the ability to confidently say “Yes. I know it works because…”

Lazy Answers

Some answers to the question don’t require any effort. They’re lazy answers. They’re not the kind of answers that help anyone else have confidence because they don’t prove anything. Despite that, these factors do contribute to an individual’s sense that the software works, and I would be missing part of the big picture if I didn’t mention them:

  • It runs.
  • Other people are using it.
  • I remember how it works.
  • It worked the last time I ran it.

The more of the system I was personally responsible for building, the more these things mean to me. For trivial software toys, maybe this is enough. But most useful software is not trivial and not built by one individual.

Does It Work Today?

As we work through the development process, we want to know that completed features work before we deploy them. We can use humans and machines to help evaluate whether or not a feature works.

Human Evaluation

The exploratory tester went through the app (and found a few things).

We have an exploratory tester at Atomic, Phil Kirkham, who does an amazing job of approaching our software with fresh eyes. Having Phil work through software we’ve built gives me and the team confidence that we haven’t made glaring omissions or missed small things that cause errors or usability problems. I feel most confident after his run-through if he finds a few small things for the team to fix because I know our software isn’t perfect.

We’ve reviewed the behavior with someone who understands how it should work.

Giving customers and users access to review completed features is a core part of Atomic’s development (and design) practices. In the best case, we sit down and review features directly with someone who deeply understands how each feature should work. That could be a product owner, designer, domain expert, or future user of the new features. By doing that, we know that someone else has seen and understands how the software works. We’re spreading the confidence around.

The team is aware of the software’s limitations.

I’ve said many times that no software is perfect. I’m not referring only to bugs or defects. Software teams make trade-offs to balance the constraints of scope, budget, and time, and those trade-offs leave limitations sitting right next to the desired functionality. Being aware of these limits, plus tracking defects and bugs closely, contributes to confidence that we know what works and what doesn’t.

I didn’t build it alone.

Pair programming, creating and reviewing pull requests, and discussing problem solving approaches with the team all contribute to having more than one brain involved in making a piece of software work. As a team, we support and balance each other, another pair of eyes catching what was missed and another mind sharing a different perspective. I feel more confident about my work when others have seen it and been involved.

The code makes the intended behavior clear.

If I can dive into the code for a software system and understand how it operates, I can quickly identify mismatches or alignment between the code and the desired behavior. It’s even better when the code, through the naming of classes, methods, functions, and variables, makes the intended behavior clear. Naming and clarity are important.
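
As a small, hypothetical illustration (the invoice domain and every name here are made up for this sketch), compare a version of the same check whose names hide the behavior with one whose names state it outright:

```typescript
// Opaque version: the reader has to reverse-engineer the intent from the math.
function check(d: { a: number; b: number }, n: number): boolean {
  return d.a - d.b > 0 && n > 30;
}

// Clear version: the names carry the intended behavior.
interface Invoice {
  amountBilledCents: number;
  amountPaidCents: number;
}

const GRACE_PERIOD_DAYS = 30;

function isInvoiceOverdue(invoice: Invoice, daysSinceIssued: number): boolean {
  const hasOutstandingBalance =
    invoice.amountBilledCents - invoice.amountPaidCents > 0;
  const isPastGracePeriod = daysSinceIssued > GRACE_PERIOD_DAYS;
  return hasOutstandingBalance && isPastGracePeriod;
}
```

Both functions compute the same thing, but only the second lets a reviewer compare the code against the desired behavior without a guided tour.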

Machine Evaluation

It compiles (if that means much to the tooling).

Strong, static type systems in compiled languages like C# or Haskell can provide a lot more confidence that a system can only have valid states. On the other hand, dynamic languages like Ruby open the door to any number of problems due to mismatched method parameters or other invalid states. Simply using tools with strong type systems isn’t magic, though, so use the tools to the best of your ability and be realistic about the guarantees you’re getting from your compiler.
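
To make that concrete, here’s a minimal sketch in TypeScript (the order domain is hypothetical) of leaning on the type system so that invalid states can’t be expressed, and the compiler catches the mismatch instead of a runtime error:

```typescript
// A discriminated union: each status carries only the data that can exist
// in that state, so a "shipped" order without a tracking number won't compile.
type Order =
  | { status: "draft"; items: string[] }
  | { status: "submitted"; items: string[]; submittedAt: Date }
  | { status: "shipped"; items: string[]; submittedAt: Date; trackingNumber: string };

function describeOrder(order: Order): string {
  switch (order.status) {
    case "draft":
      return `Draft with ${order.items.length} item(s)`;
    case "submitted":
      return `Submitted at ${order.submittedAt.toISOString()}`;
    case "shipped":
      return `Shipped, tracking number ${order.trackingNumber}`;
    // No default case: under strict compiler settings, adding a new status
    // to the union makes this function fail to compile until it's handled.
  }
}
```

The compiler can only guarantee what the types actually encode, which is exactly why “it compiles” is worth more on some projects than others.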

The automated tests pass.

We love a green test suite because it gives us confidence that we didn’t break the system. And it can be a great metric to share with our customers to help build confidence that the system works. But only if…

The automated tests fail for good reasons.

Test suites are only as good as the failures they catch. I always want to see a test fail for a good reason before I consider it complete. A valuable failure is one that results from the implementation being broken in a believable way: for example, a dependency returned an unexpected value, or a numeric formula within the function under test is incorrect. Less useful failures include exceptions due to missing dependencies, changing test inputs to unlikely values, or simply raising an exception in the function under test.
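
Here’s a hypothetical sketch, using Node’s built-in assert module, of how I’d vet a test before trusting its green run: break the implementation in a believable way and confirm the failure looks like a real defect rather than an unrelated exception.

```typescript
import assert from "node:assert/strict";

// Hypothetical function under test: totals are in integer cents so the
// arithmetic stays exact.
function totalWithTaxCents(subtotalCents: number, taxRatePercent: number): number {
  return subtotalCents + Math.round((subtotalCents * taxRatePercent) / 100);
}

// The green run we report to the team.
assert.equal(totalWithTaxCents(10_000, 6), 10_600); // $100.00 at 6% tax

// To validate the test itself, temporarily break the formula (for example,
// drop the `subtotalCents +` so only the tax is returned) and confirm this
// assertion fails with a plausible wrong total (a believable failure)
// rather than a missing-dependency exception or an implausible input.
assert.equal(totalWithTaxCents(25_000, 6), 26_500);

console.log("totalWithTaxCents: all assertions passed");
```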

Is It Still Working, a Year Later?

Knowing that a piece of software works while it’s being developed is important, but it’s not the end of the story. We build software because people expect to get value out of it for longer periods of time. How do we know that the software we wrote last year is still working correctly today?

People are using it.

Seeing that people have been using the software gives some degree of confidence that it has been working over a historical period of time. It’s even better to receive a bug report or two because that demonstrates an open channel for feedback. What it can’t give is confidence that, right at this very moment, the software is working.

Logging and monitoring are in place.

Logging, monitoring, and notification of problems are critical for any production application. They give us confidence that the system is working correctly right now, and that the team will know if that changes. Tools like New Relic, Papertrail, Monit, and PagerDuty can help fill these roles.
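
What that looks like in code varies by stack. Here’s a minimal, hypothetical sketch for a Node service (the endpoint path and log fields are illustrative, not tied to any of the tools above): structured log lines that an aggregator can ingest and alert on, plus a health endpoint an uptime monitor can poll.

```typescript
import http from "node:http";

// One JSON object per line is easy for most log aggregators to parse,
// index, and alert on.
function logEvent(
  level: "info" | "error",
  message: string,
  fields: Record<string, unknown> = {}
): void {
  console.log(
    JSON.stringify({ timestamp: new Date().toISOString(), level, message, ...fields })
  );
}

const server = http.createServer((req, res) => {
  if (req.url === "/health") {
    // A monitoring service polls this; anything other than a fast 200
    // should notify whoever is on call.
    res.writeHead(200, { "Content-Type": "application/json" });
    res.end(JSON.stringify({ status: "ok", uptimeSeconds: process.uptime() }));
    return;
  }
  logEvent("info", "request received", { path: req.url });
  res.writeHead(404);
  res.end();
});

server.listen(3000, () => logEvent("info", "server started", { port: 3000 }));
```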

Tests run on CI periodically.

If you have a great test suite, don’t stop running it! We run test suites for inactive projects weekly and on interesting boundaries (daylight saving time changes, the new year, etc.) to help catch problems that crop up later in the life of a software system.
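
As a tiny, hypothetical illustration of why the calendar matters: a test that deliberately reads the real clock only exercises a new quarter, a new year, or a time change when it actually runs during one, which is exactly what a periodic CI schedule buys you.

```typescript
import assert from "node:assert/strict";

// Hypothetical helper under test.
function quarterLabel(date: Date): string {
  const quarter = Math.floor(date.getMonth() / 3) + 1; // getMonth() is 0-11
  return `Q${quarter} ${date.getFullYear()}`;
}

// Deliberately uses the real clock instead of a frozen test date, so a
// scheduled CI run in January (or across a DST change, for time-zone-heavy
// code) can surface date-handling bugs the original merge-day run never hit.
const label = quarterLabel(new Date());
assert.match(label, /^Q[1-4] \d{4}$/);
console.log(`quarterLabel(now) = ${label}`);
```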

Plan for Confidence

We have stronger confidence that software works when we put forth the effort to build it into our tools and practices. Confidence won’t simply emerge from the code as a happy by-product. Make a plan that identifies how you’ll build confidence in your software, so that you, your team, and your users can share in the confidence that your software works.