Article summary
I support and maintain a variety of applications in production. Some of these applications consist of what might be considered “legacy” codebases. When troubleshooting issues with these applications, detailed and accurate external documentation is not always available. I often find myself acting as a code archaeologist, reliant on only the contents of the source code repo to get to the bottom of a thorny problem.
In these situations, I’ve found that source code repositories contain at least two important forms of documentation:
- The code: can be self-documenting, insofar as it clearly expresses intent and data flow
- The revision control history: can tell detailed stories of how a piece of code came to be
In my opinion, these have the potential to be the most important documentation your app can have.
I’d like to share some observations on how cultivating good habits can make these two forms of documentation more valuable.
Why Code as Documentation
When I say that code can act as documentation, I explicitly do not mean comments. Comments have their place, but it’s easy for a comment to get separated from, or fall out of sync with, the code it describes. It’s much better to write expressive code, when possible.
So long as you have the code, you have… the code. It’s the stickiest form of documentation available. External documentation can get stale, and it’s not always made available to all the right people when teams transition; things can get lost in the shuffle. It’s likely, though, that you’ll have a copy of the source when providing technical support or working on the application, and even more likely when the application is a web application in an interpreted language where the source is what gets deployed.
How Code as Documentation
There are a few ways that I’ve recently seen code be expressive and self-documenting. There are certainly more than I’ll cover here, but these are the ones whose benefit I’ve seen firsthand.
1. Micro improvements: Naming behavior
The first is at a micro-level: within a file, it can be helpful to give a descriptive name to a series of steps. Creating a separate, named method for five to ten lines of code that seem obvious to you can go a long way toward making their purpose immediately clear to a later reader.
I was reminded of this recently when pairing with Matt Fletcher. We were test-driving the development of a Chef cookbook to set up servers for his project and he pointed out a few places where such descriptive helper methods added clarity. As a bonus, we were able to “DRY” up some code, too, but the primary goal was to make the code easier to understand.
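To make the idea concrete, here is a minimal sketch in Python. It is not the cookbook code from that pairing session; the `Server` type, the package names, and the helper names are all hypothetical, chosen only to show how naming a few lines of logic lets the calling code read like a checklist.

```python
from dataclasses import dataclass


@dataclass
class Server:
    # Hypothetical record describing a machine we want to audit.
    name: str
    packages: list[str]
    open_ports: list[int]


def needs_web_stack(server: Server) -> bool:
    """True when the server is missing any package the web app depends on."""
    required = {"nginx", "ruby"}  # assumed package names, for illustration only
    return not required.issubset(server.packages)


def exposes_only_web_ports(server: Server) -> bool:
    """True when nothing but HTTP and HTTPS is open."""
    return set(server.open_ports) <= {80, 443}


def audit(server: Server) -> list[str]:
    """Each condition has a name, so this function reads as a list of checks."""
    findings = []
    if needs_web_stack(server):
        findings.append(f"{server.name}: install web stack")
    if not exposes_only_web_ports(server):
        findings.append(f"{server.name}: close non-web ports")
    return findings
```

The set arithmetic could have been written inline inside `audit`, and it would have been shorter. The named helpers cost a few extra lines, but a later reader learns what each check is for without having to decode it.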
2. Macro improvements: Abstractions
The second is at a macro-level: when designing a large project, it’s important to be mindful of the abstractions you use, and of how the architecture of your codebase supports clear thinking about the core business logic. Drew Colthorp recently gave a presentation at SoftwareGR on exactly this topic.
Once a project is underway, further enhancements to your abstractions are sometimes worth refactoring for, but it’s important to prioritize that work according to the value it provides.
3. Clarifying Techniques for Control and Data Flow
Finding ways to clearly express control flow and data flow within your code goes a long way toward making it tractable for a later reader. This is especially true for asynchronous code: it has a lot of benefits, but clear expression of control flow isn’t always one of them.
Leaning on well-known idioms and opinionated frameworks can help in some contexts, so long as the reader has a chance to learn those idioms as well. Straying from those norms can incur significant penalties when it comes to the readability and maintainability of your code. Another way to clarify control flow is to use explicitly defined finite state machines.
When it comes to data flow, it is also helpful to lean on local idioms. Another approach, similar to using finite state machines, is to build explicit pipelines for data flow. A series of functional transformations can be easier to follow when written as a pipeline than when the same behavior is spread across interactions between several components and buried inside complex objects.
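As a rough illustration of both ideas, here is a small Python sketch. The states, events, and transformation steps are hypothetical and not taken from any project mentioned here. The first half puts every legal transition of a state machine into one table; the second half lists the stages of a data pipeline in the order they run.

```python
from enum import Enum
from functools import reduce


class State(Enum):
    DRAFT = "draft"
    REVIEW = "review"
    PUBLISHED = "published"


# Explicit state machine: every legal (state, event) pair lives in one table.
TRANSITIONS = {
    (State.DRAFT, "submit"): State.REVIEW,
    (State.REVIEW, "reject"): State.DRAFT,
    (State.REVIEW, "approve"): State.PUBLISHED,
}


def next_state(state: State, event: str) -> State:
    """Look up the transition; anything missing from the table is an error."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"cannot {event!r} from {state.name}") from None


# Explicit pipeline: each step takes a list of lines and returns a new list.
def strip_blank(lines):
    return [line for line in lines if line.strip()]


def normalize(lines):
    return [line.strip().lower() for line in lines]


def dedupe(lines):
    return list(dict.fromkeys(lines))  # dicts preserve first-seen order


PIPELINE = [strip_blank, normalize, dedupe]


def run_pipeline(lines, steps=PIPELINE):
    """Feed the output of each step into the next, in list order."""
    return reduce(lambda data, step: step(data), steps, lines)
```

In both halves, the structure a reader cares about (which transitions are allowed, which transformations run and in what order) sits in a single data structure near the top of the file instead of being implied by scattered conditionals or method calls.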
Next week we’ll talk about making the most of another form of “sticky” documentation: revision control history.
This is an interesting approach to avoiding “stale” documentation. Once you’ve reached the “legacy” stage, however, what do you do if the documentation (i.e. the code) isn’t explanatory enough? Is there any concern about refactoring the code to make it more self-documenting at this point? I’d be afraid to break something… so perhaps comprehensive unit test coverage is a prerequisite.
Great observation, Matt!
Yes, I think that tests are incredibly helpful for maintaining old code. Undertaking a major refactoring of a legacy codebase without the assurance of a passing test suite is very risky. At Atomic, in circumstances where we’ve inherited such untested code, our developers will often try to create a test suite before moving ahead with new development.
I’ve also found that well-written tests can be expository in their own right, and could even be included as a third form of “sticky” documentation.
Thanks for your comment!