Errors are under-appreciated. I discovered that on a greenfield project when it occurred to me that I had essentially no tools in my developer utility belt for architecting them.
Sure, I write code for handling errors every day, and every programming language has built-in tools for handling errors. But the majority of error architecture that I’ve seen is not exactly graceful. Instead, it seems like programmers (myself included) opt to handle errors totally ad-hoc, as if they’re not an integral part of the larger piece of software.
I never really learned how to think about errors systematically or strategically. I’ve been reflecting on this fact and trying to figure out why architecting errors in a satisfying way is such a pain in the neck, and how we can prevent that pain. After a lot of reflection and experimentation, I’ve discovered a few really useful concepts.
In this three-part guide to dealing with errors as a software developer, I’ll cover:
- How to categorize errors and why these categories are important (this post)
- How to represent errors in your system as data and/or code
- The numerous ways to handle your errors
I hope that you’ll also recognize errors as an important and central part of your software and that you’ll gain a fresh perspective on how to deal with them systematically.
What Is an Error, Really?
Errors are a natural and pervasive part of software. Why? People are involved every step of the way, and human error is completely unavoidable. I’m not just talking about programmers, either. People make mistakes while designing software and using software, and they also make mistakes like not paying the bill for a third-party service.
In this series, I’m talking about “errors” in a broad sense, not something narrow like a 500-level HTTP code or an `Exception` object. I’m talking about anything that could prevent your software from accomplishing what it’s intended to do. That means how you handle errors is just as important as your main business logic.
Two Ways to Categorize Errors
I’ve found two dimensions that are useful for categorizing errors:
- Exceptional Errors vs. Failures
- Internal vs. External Errors
This categorization is important because recognizing the broad range of errors will help you think about error cases as you’re programming around them, instead of after a support request has already been filed. In addition, categories can provide a heuristic for deciding how to handle an error. I’ll give examples later in the series.
Dimension 1: Severity – Exceptional Errors vs. Failures
Different programming languages use the terms “error” and “exception” to mean a variety of things. I prefer to think of all errors as either “exceptional errors” or “failures.” There are no specific criteria for these; it all depends on the context around the errors inside your program.
I’m not using “exception” in any language-specific sense (e.g., an `Exception` class), although they tend to overlap. I’m also not just referring to data that gets thrown or raised, although there’s also a large overlap in that category.
To my mind, an exceptional error is something that you don’t really expect to happen but that you safeguard against just in case. Here are a few potential exceptional errors:
- The application runs out of memory.
- An `id` has no corresponding database object.
- A supposedly-JSON string is not in JSON format.
One common trait of these errors (and almost all errors) is that there is a graceful way to recover. What makes them exceptional is not that they might crash the application, but rather that they probably aren’t a normal part of the application logic. That’s why they’re only potential exceptions; if you know they’ll occur frequently, they aren’t exceptional for your system.
I’ll talk about how to deal with exceptions in my next two posts, but the short story is that you should handle exceptions by terminating gracefully and then making sure they don’t happen in the future.
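To make the “supposedly-JSON string” case concrete, here is a minimal sketch of guarding against it in Python. The function name `load_order` and the logger name are hypothetical, and I’m assuming that abandoning the one operation (rather than the whole process) counts as terminating gracefully in your system.

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger("orders")  # hypothetical logger name


def load_order(raw: str) -> Optional[Any]:
    """Parse a payload that is supposed to be JSON.

    Returns the parsed value, or None if the payload was not JSON after all.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError as exc:
        # Exceptional: we never expect this, so record enough detail to
        # track down the cause, then stop this operation without crashing.
        logger.error("Payload was not valid JSON: %s", exc)
        return None
```

The log entry is the part that serves the second half of the advice: it gives you something to investigate so the same error doesn’t keep happening.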
A failure means that an operation can’t continue for some reason. Failures are very common, and there’s almost always a graceful way to recover from them, too. Failures include:
- A user enters their password incorrectly.
- The app can’t download an image because the CDN is down.
- A user can’t access a feature because a mobile app doesn’t have the right device permissions.
- The app has no network connection.
None of these errors should ever cause your application to crash. If they do, they would be considered exceptional, and you should treat them as such.
While exceptions are often handled in programs as specific types (such as an `Exception` subclass), failures can be represented in many other ways: strings, error codes, conditional statements, etc. I’ll discuss how to represent and handle failures in my next two posts.
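As a rough illustration of a failure represented as an ordinary value rather than a raised exception, here is one possible shape in Python. Everything in it (`LoginFailure`, `LoginResult`, `log_in`, and the placeholder user id) is made up for this sketch; it is only one of the many representations mentioned above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LoginFailure(Enum):
    """Failures we fully expect to see during normal use."""
    WRONG_PASSWORD = "wrong_password"
    NO_NETWORK = "no_network"


@dataclass
class LoginResult:
    user_id: Optional[int] = None
    failure: Optional[LoginFailure] = None

    @property
    def ok(self) -> bool:
        return self.failure is None


def log_in(password: str, expected: str, online: bool = True) -> LoginResult:
    # Failures are returned as data, so the caller decides how to recover.
    if not online:
        return LoginResult(failure=LoginFailure.NO_NETWORK)
    if password != expected:
        return LoginResult(failure=LoginFailure.WRONG_PASSWORD)
    return LoginResult(user_id=1)  # placeholder id for the sketch
```

A caller can branch on `result.failure` to show a “try again” message or an offline banner, and nothing here can crash the application.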
Dimension 2: Sources – Internal vs. External Errors
Internal errors are caused by mistakes in a program’s design or implementation.
Design mistakes manifest as “logic” errors, meaning the program executed as expected, but that expectation was not “correct” from the business logic perspective. It’s impossible to handle logic-sourced errors at runtime. They can only be prevented during the design of a system.
Implementation mistakes mean that the program does not execute as expected. Instead, it produces either an incorrect result or a runtime error. Here are some examples:
- Misconstructing a regular expression
- Using the wrong operator (for example, using `++` as a prefix rather than a postfix, or vice versa)
- Writing a test that doesn’t fail when it’s supposed to
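The first and last items often show up together. Here is a small Python sketch of a misconstructed regular expression, assuming the goal is to recognize decimal numbers such as "3.14"; the names `BUGGY`, `FIXED`, and `is_decimal` are hypothetical.

```python
import re

# Intended to match decimal numbers such as "3.14".
BUGGY = re.compile(r"\d+.\d+")   # bug: the unescaped "." matches any character
FIXED = re.compile(r"\d+\.\d+")  # "." is escaped, so only a literal dot matches


def is_decimal(text: str, pattern: "re.Pattern[str]" = FIXED) -> bool:
    """Return True if the whole string matches the given pattern."""
    return pattern.fullmatch(text) is not None
```

A test that only checks `is_decimal("3.14")` passes with either pattern, so it doesn’t fail when it’s supposed to. Adding a negative case such as "3x14" is what exposes the buggy pattern.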
External errors are caused by clients and dependencies. Clients can mean people, bots, or other systems that depend on your software. Dependencies are, of course, resources that your system depends on, such as libraries or APIs. These tend to be failures rather than exceptions because you cannot prevent them. For this reason, you should handle these errors right when they occur and redirect the system into a “happy” state as soon as possible.
In fact, external errors are really the only type of error that it makes sense to handle with any complexity at all. That’s because internal errors are, by definition, under your control; you should “handle” them by preventing them via tests, validations, etc.
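Here is a minimal sketch of handling a dependency failure right where it occurs, using the “CDN is down” example from earlier. The names `load_avatar` and `PLACEHOLDER` are hypothetical, and I’m assuming the fetch function reports network trouble with `OSError` or one of its subclasses (as `ConnectionError` and `TimeoutError` are in Python).

```python
from typing import Callable

PLACEHOLDER = b"placeholder-image-bytes"  # hypothetical bundled fallback


def load_avatar(url: str, fetch: Callable[[str], bytes]) -> bytes:
    """Return the image at `url`, or a placeholder if the dependency fails."""
    try:
        return fetch(url)
    except OSError:
        # The CDN and the network are outside our control, so handle the
        # failure here and put the app straight back into a usable state.
        return PLACEHOLDER
```

Passing `fetch` in as a parameter is a design choice for the sketch: it lets a test simulate the outage without touching a real network.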
Why Does the Difference Matter?
Internal errors are easier to prevent than to handle.
- Unit tests should focus on weeding out internal errors because you’re in total control of inputs and outputs.
- Internal errors should face as little error handling as possible because it’s far better to just prevent them with exploratory testing.
- You typically won’t write much code for handling internal errors because if you can detect them, you might as well prevent them altogether by fixing your code.
External errors are easier to handle than to prevent.
- External errors can’t be prevented because you simply are not in control of them. You may have some hand in designing an external API or in giving your users easy-to-follow instructions, but it’s only inside of your own program that you have real control.
- Integration tests are well suited for testing external errors against your application.
- External errors should face plenty of error handling because your only alternative is to let your application break!
You can think of every error as either an exception or a failure, depending on how severe or rare it is in the context of your program. You can also classify an error as internal or external, depending on whether it’s a simple programming mistake, a failure to implement domain logic, or an unexpected interaction with an outside system.
Being aware of the numerous types of errors will help you identify them more quickly as you program, and knowing what type of error you’re dealing with can help you decide how to best handle it.
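If it helps to see the two dimensions side by side, here is one way they could be written down as data. This is only a sketch: `Severity`, `Source`, `ErrorKind`, and `suggested_approach` are names I made up, and the heuristic is just the internal-versus-external rule of thumb from above.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    EXCEPTIONAL = "exceptional"
    FAILURE = "failure"


class Source(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


@dataclass(frozen=True)
class ErrorKind:
    description: str
    severity: Severity
    source: Source


def suggested_approach(kind: ErrorKind) -> str:
    """Rule of thumb: prevent internal errors, handle external ones."""
    if kind.source is Source.INTERNAL:
        return "prevent with tests and fix the code"
    return "handle where it occurs and recover"


cdn_down = ErrorKind("CDN is down", Severity.FAILURE, Source.EXTERNAL)
bad_regex = ErrorKind("Misconstructed regex", Severity.EXCEPTIONAL, Source.INTERNAL)
```

How a given error gets classified still depends on the context of your program, so treat the two example values as illustrations rather than fixed answers.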