Yesterday, I talked about the many types of errors in software and how you can categorize them strategically. Now let’s talk about how you can capture and represent errors in a useful way.
I’ll be using the term “error data” to mean information about an error, as opposed to the actual “error event” that happened. I’ll also use the term “consumer” to mean any part of your system that specifically looks for error data. This could be a catch block, a function that calls another function that produces error data, etc.
What Makes Good Error Data?
Before we can decide the best way to create useful error data, let’s consider what makes error data “useful.”
Accuracy and Precision
You should always capture error data as accurately as possible. Inaccurate error data leads to confusion and possibly even more errors. Typically, the only way to create inaccurate error data is by making a mistake while coding.
Precision, on the other hand, has a sweet spot. Over- or under-specific error data will be difficult to use. For example, think about these three ways you could represent a single error:
- “These credentials are invalid.”
- “The password is incorrect.”
- “The given password is two characters off from the expected password.”
The first one would likely be the easiest to represent as error data; the third would be the hardest. If your web app needs to tell the user what they did wrong without being too revealing, the first version is not precise enough, and the third version is too precise. Your app could still derive a less precise message from over-precise error data, but it would take more work to parse out exactly which information is relevant.
Fit for Its Intended Use
Usefulness also depends on the reason you’re gathering error data. The most important use is figuring out how you can recover gracefully from the error.
- For a failure, that means you need enough information to distinguish the error state from the “good” state, as well as from other error states. The consumer might also need information about why the failure happened (e.g., invalid fields).
- For an exceptional error, you just need enough information for a consumer to recognize and recover from the error. The way this pattern matching happens will depend on your system. Object-oriented languages typically use class inheritance, whereas a language with a structural type system may use discriminated unions.
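To make the pattern matching concrete, here is a minimal Python sketch of the class-inheritance approach. The exception names (`PaymentError`, `CardDeclined`, `GatewayTimeout`) and the recovery strings are hypothetical, chosen only for illustration.

```python
from typing import Callable


class PaymentError(Exception):
    """Hypothetical base class for exceptional errors raised while charging a card."""


class CardDeclined(PaymentError):
    """The card issuer refused the charge."""


class GatewayTimeout(PaymentError):
    """The payment gateway did not answer in time."""


def run_with_recovery(operation: Callable[[], str]) -> str:
    """Run an operation and pick a recovery step based on the error's class."""
    try:
        return operation()
    except GatewayTimeout:
        # Specific subclass first: a timeout is worth retrying.
        return "retry later"
    except PaymentError:
        # Any other payment error falls back to the base-class handler.
        return "ask for another payment method"
```

The consumer only needs the error's class to choose a recovery path. A language with discriminated unions would switch on a tag field instead.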
Another use for collecting error data is forensics. This applies mostly to exceptional errors. If you log the right data for an exception, you may be able to use that information to fix a bug or turn an exceptional error into a simple failure case. In general, you should log as much information about exceptions as reasonably possible. However, if an exception happens all the time or you don’t plan on fixing it, then too much data might pollute your logs.
Shaping Your Error Data
What information should be captured from an error?
You always need information for identifying which error happened; this means picking the appropriate error class, error code, and name. Beyond that, you have many options: which operation failed, which inputs were invalid, which inputs were given in the first place, the IDs of relevant database objects, the timestamp, user-readable error messages, etc.
So what should be included? You need enough information for consumers to do their jobs right. This can be surprisingly difficult to narrow down because what counts as a failure or exception for one part of your application might be a completely normal state of things for another part of your application. I’ll discuss this more in my next post.
For exceptions, you should log as much data as you can as soon as the error is detected, so that the data returned to the consumer can stay minimal. Log the timestamp, inputs, etc. right away, and only pass back the information that the consumer needs.
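One way to read "log early, return little" is sketched below in Python. The logger name, the `OrderLookupFailed` exception, and the `load_order` function are all assumptions made for the example, not part of any particular framework.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("orders")  # hypothetical logger name


class OrderLookupFailed(Exception):
    """Hypothetical exception that carries only what a consumer needs."""

    def __init__(self, order_id: str) -> None:
        super().__init__(f"could not load order {order_id}")
        self.order_id = order_id


def load_order(order_id: str, orders: dict) -> dict:
    """Return the stored order, or log the details and raise a minimal error."""
    try:
        return orders[order_id]
    except KeyError as exc:
        # Record the forensic details at the point of detection.
        logger.error(
            "order lookup failed: order_id=%r known_orders=%d detected_at=%s",
            order_id,
            len(orders),
            datetime.now(timezone.utc).isoformat(),
        )
        # Pass back only the identifier the consumer needs to recover.
        raise OrderLookupFailed(order_id) from exc
```

The log line holds the inputs and timestamp, while the raised error exposes a single `order_id` attribute for the consumer to act on.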
Error data is a lot like any other data in your system: it ranges from simple to very complex. On the simple side, you have Boolean values — either something is an error or it is not. This is how you might represent a failure returned by a function that validates its input.
Next, you have integers, strings, and symbols. These carry about the same amount of information, something like, “This specific error just happened.” Integer error codes are no simpler than strings because their numeric value carries no meaning of its own; e.g., HTTP 403 is not “bigger” than HTTP 400. When using strings to represent errors, you should treat them the same as symbols. (Symbols are values that are equal to themselves and nothing else, and they can’t be compared any other way.) Note that your error data can include non-symbolic strings, but only for carrying extra information (such as messages to the user), not for uniquely identifying the error.
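In Python, an `Enum` is a reasonable stand-in for symbols. The sketch below keeps the identifying code separate from the free-form message; `SignupError`, `SignupFailure`, and `next_step` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto


class SignupError(Enum):
    """Hypothetical symbolic identifiers; members compare equal only to themselves."""

    EMAIL_TAKEN = auto()
    WEAK_PASSWORD = auto()


@dataclass(frozen=True)
class SignupFailure:
    code: SignupError  # identifies the error; safe to branch on
    message: str       # extra text for the user; never used for identification


def next_step(failure: SignupFailure) -> str:
    """Choose what to do from the symbolic code, ignoring the message text."""
    if failure.code is SignupError.EMAIL_TAKEN:
        return "offer password reset"
    return "show password rules"
```

Plain `Enum` members do not support ordering comparisons such as `<`, which fits the idea that an error identifier has no scalar value.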
Next, you have compound data types such as structs and classes. Most object-oriented languages define a top-level … PurchaseOutcome class with a was_successful field, or a status field that can be …
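As a rough illustration of compound error data, a `PurchaseOutcome` type might look like the Python sketch below. The `PurchaseStatus` values and the `detail` field are hypothetical, and deriving `was_successful` from `status` is just one possible design.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class PurchaseStatus(Enum):
    """Hypothetical status values, invented for this sketch."""

    SUCCEEDED = "succeeded"
    CARD_DECLINED = "card_declined"
    OUT_OF_STOCK = "out_of_stock"


@dataclass(frozen=True)
class PurchaseOutcome:
    status: PurchaseStatus
    detail: Optional[str] = None  # optional extra context for the consumer

    @property
    def was_successful(self) -> bool:
        # Derived from status so the two can never disagree.
        return self.status is PurchaseStatus.SUCCEEDED
```

A consumer that only cares about success can read `was_successful`, while one that needs to recover differently per failure can branch on `status`.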
Complexity Is a Balancing Act
Every error could hypothetically be represented by an integer code, where the code’s meaning is stored in some lookup table. But that makes development very difficult because you have to keep track of what Error #35 and #73 and #2683 are inside the code.
On the other extreme, you could represent every single error as a class and return an EmailRegexDidNotMatch object when validating email addresses. But compared to returning false, that’s a lot of unnecessary overhead and maintenance.
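For comparison, the Boolean version of that email check can be sketched in a few lines of Python. The pattern here is a deliberately simple assumption for the example and is not a complete email validator.

```python
import re

# Deliberately simple, hypothetical pattern: something@something.something
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def is_valid_email(address: str) -> bool:
    """Return True when the whole string matches the pattern, False otherwise."""
    return EMAIL_PATTERN.fullmatch(address) is not None
```

The caller learns only that validation failed, which is often all it needs in order to show a "please enter a valid email" message.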
Your error data should be just complex enough to accomplish the goals we discussed earlier:
- Recovering from the error
- Recording the right amount of information for forensics and analytics
So many different things affect how your application’s errors will be captured and represented. The groundwork on categorizing errors that I laid out in the first part of this series can help provide some guidance about what to include in error data to keep the application easy to develop and debug.
Tomorrow, I’ll discuss what your application should do with error data. Heads up — it’s complicated!