Sticky Documentation, Part 2: Source Control History as Documentation

Last week, I introduced a concept I’m calling “sticky documentation” and reviewed a few ways that we can make the most of the “stickiest” documentation we have: the code. Today, I’d like to talk about another form of “sticky” documentation: source control history.

If you have access to the code for an application, and that code has been kept under some form of source control, it’s quite likely you’ll have access to the source control history as well. In other words, the source control history is likely to stick around.

How can we make the most of source control history as a form of documentation for our projects? What will be most valuable to future code archeologists digging in our repositories?

Properties of Source Control History

First, let’s distinguish some properties of source control (a.k.a. “version control” or “revision control”) history from other forms of documentation like the code itself. The code can tell us what’s happening and can help us understand the overall structure of a software system; source control history can tell us how things came to be that way. Often, when troubleshooting a problem, the story of how things came to be is immensely valuable.

Like the code, source control is near at hand during development—working with source control is often a necessary part of testing and deploying new code. When it’s an integrated part of the development process, it can’t be neglected completely, but it can certainly be underutilized.

Making the Most of Source Control History

Here are some ways to make source control history a more valuable asset for your project. (I’ll be using Git as an example, but most of the same practices apply to other tools as well.)

1. Make clean commits

Try to scope commits to well-defined units of work: a specific feature, a specific bug fix, a specific code cleanup task. It’s often tempting to commit a bug fix and a feature, or a whole afternoon’s work on three separate features and fixes all at once, but being disciplined about maintaining “clean” commits can go a long way toward making your source control history tractable in the future.

When using Git, I sometimes use git commit -p to help split up changes when I slip up and have a few lines that belong in a separate commit. I can leave those lines unstaged while committing other changes to the same file. It’s tedious if you have a large number of changes that need to be split, so it’s still important to maintain discipline, but git commit -p is really handy for small fix-ups.

2. Write good commit messages

Take the time to write good commit messages. If you’re making an effort to make clean commits, you should at least be able to explain what the purpose of your commit is. Also consider the formatting of your commit messages. If you’ve made a complex change that might be hard to remember the reason for later, add a few paragraphs of explanation to the commit message. Just remember to keep the subject line concise.

This is another area that requires discipline. Sometimes, in the heat of troubleshooting a thorny problem, it’s tempting to start committing with messages like “Trying something else” or “Update” or even to use less-than-polite language. Don’t.
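If discipline alone isn’t enough, a small check can help. The sketch below is one possible way to vet a commit message file; the 50-character subject limit, the blank second line, and the list of placeholder subjects are my own assumptions rather than rules Git enforces, and check_commit_message is a hypothetical name.

```shell
# Hypothetical helper: vet a commit message file against a few assumed rules.
# Prints "ok" and returns 0 on success; prints a reason and returns 1 otherwise.
check_commit_message() {
    msg_file=$1
    subject=$(sed -n '1p' "$msg_file")
    second=$(sed -n '2p' "$msg_file")

    if [ -z "$subject" ]; then
        echo "empty subject line"
        return 1
    fi
    # Assumed limit: keep the subject line to 50 characters or fewer.
    if [ "${#subject}" -gt 50 ]; then
        echo "subject longer than 50 characters"
        return 1
    fi
    # Assumed convention: a blank line separates subject and body.
    if [ -n "$second" ]; then
        echo "second line should be blank"
        return 1
    fi
    # Assumed list of subjects that say nothing about the change.
    case "$subject" in
        "Update"|"Trying something else"|"WIP")
            echo "subject is a placeholder"
            return 1
            ;;
    esac
    echo "ok"
}
```

Git passes the path of the message file as the first argument to a commit-msg hook, so a hook script could call check_commit_message "$1" and exit non-zero to reject the commit.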

With Git, I often use git commit --amend to improve my commit messages before pushing. If you have a lot of (unpushed) troubleshooting commits, you might also consider squashing them down to the one meaningful change, and giving that a good commit message before pushing.
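Here is a rough sketch of that cleanup in a throwaway repository. It squashes two troubleshooting commits with git reset --soft and then rewords the result with git commit --amend; an interactive rebase (git rebase -i) is another common way to do the same thing. The function name, file name, and messages are hypothetical.

```shell
# Hypothetical demo: squash unpushed troubleshooting commits, then reword.
# Runs in a subshell so the caller's working directory is untouched.
demo_amend_and_squash() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Demo User"
    git config user.email "demo@example.com"

    echo "v1" > app.txt
    git add app.txt
    git commit -q -m "Add app"

    # Two commits made in the heat of troubleshooting.
    echo "v2" > app.txt
    git commit -q -a -m "Trying something"
    echo "v3" > app.txt
    git commit -q -a -m "Trying something else"

    # Move the branch back two commits but keep their changes staged.
    git reset -q --soft HEAD~2
    git commit -q -m "Fix cache key collision in app"

    # Improve the message on the most recent commit before pushing.
    git commit -q --amend -m "Fix cache key collision in app lookup"

    # Show the resulting subjects, newest first.
    git log --format=%s
)
```

Both techniques rewrite commits, which is why they are best kept to work that has not been pushed yet.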

3. Take advantage of branches

This varies somewhat by tool, but for Git at least, it’s very easy to create new branches. Take advantage of this fact and use branches as another way to structure your work. Being able to follow development on various features through various stages of QA to release can help to clarify where things stand, and can paint a much clearer picture of how the code currently in production came to be.

With Git, you can create explicit merge commits with git merge --no-ff <branchname>. This clearly sets apart all of the commits that were made on that branch as a group. It also gives you another opportunity to leave a meaningful commit message. Consider using a formal branching model like “Git flow”, or adapt it to a form that fits your workflow best. Whatever your branching model, be consistent and make it part of your team’s workflow. This too takes discipline, but it’s discipline that will pay off when you look back at your source control history and can see at a glance when major features and fixes landed on master.
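As an illustration, the sketch below builds a throwaway repository, does some work on a branch, and merges it back with an explicit merge commit. The branch name feature/search, the file names, and the messages are all hypothetical; the default branch is looked up rather than assumed, since it may be master or main depending on your Git configuration.

```shell
# Hypothetical demo: merge a feature branch with an explicit merge commit.
# Runs in a subshell so the caller's working directory is untouched.
demo_no_ff_merge() (
    repo=$(mktemp -d) || exit 1
    cd "$repo" || exit 1
    git init -q .
    git config user.name "Demo User"
    git config user.email "demo@example.com"

    echo "base" > README
    git add README
    git commit -q -m "Initial commit"

    # Remember whatever the default branch is called in this installation.
    main_branch=$(git symbolic-ref --short HEAD)

    git checkout -q -b feature/search
    echo "search" > search.txt
    git add search.txt
    git commit -q -m "Add search index"

    git checkout -q "$main_branch"
    # Without --no-ff this merge would fast-forward and leave no merge commit.
    git merge -q --no-ff -m "Merge feature/search: add search" feature/search

    # List only the merge commits.
    git log --merges --format=%s
)
```

With merges recorded this way, git log --first-parent on the main branch gives a condensed view in which each merged branch shows up as a single entry.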

4. Include Identifiers

When writing commit messages, you can include other unique identifiers in them like issue numbers from an issue tracker or keywords that can tag a particular type of change. GitHub automatically creates hyperlinks for issue numbers and usernames, but you don’t need to have that for unique identifiers to be useful in commit messages.

Sometimes when troubleshooting an issue, I come to the code via a commit that included the ID of an issue I was looking at. Sometimes, it’s the other way around. Either way, cross-references to and from another source of information can be really helpful, so long as it’s not an excuse to omit necessary information from your commit messages. It’s quite possible that the code repository may someday be accessible to someone who does not have access to the original issue tracker.
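Identifiers are easy to pull back out of the history later. The filter below is a small sketch that assumes issue IDs are written as a hash sign followed by digits (for example #482); that format and the name issue_ids are assumptions, so adjust the pattern to whatever your tracker uses.

```shell
# Hypothetical filter: read commit message text on stdin and print the
# unique issue IDs it mentions, assuming IDs look like "#123".
issue_ids() {
    grep -oE '#[0-9]+' | sort -u
}
```

For example, git log --format=%B | issue_ids lists every issue referenced in the current branch’s history, and going the other way, git log --grep='#482' finds the commits that mention one particular issue.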

5. Tag Releases

Tags can be easy to forget when you’re deploying code behind the scenes and not making a publicly downloadable release artifact. Consider making automatic tag creation part of your deploy process.

A tag with the date that your code was deployed to a particular environment can be awfully useful when trying to figure out what might have caused an issue for an end user two weeks ago. Tags can also provide helpful specificity when talking about “the release with feature X that we deployed last month.”
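One way to automate this is to have the deploy script build the tag name itself. The helper below is only a sketch: the deploy-<environment>-<date> naming scheme and the function name deploy_tag_name are my own assumptions, not a Git convention.

```shell
# Hypothetical helper: build a tag name from an environment and a date.
# The date defaults to today (UTC) when no second argument is given.
deploy_tag_name() {
    env_name=$1
    deploy_date=${2:-$(date -u +%Y-%m-%d)}
    printf 'deploy-%s-%s\n' "$env_name" "$deploy_date"
}
```

A deploy script could then run git tag -a "$(deploy_tag_name production)" -m "Deployed to production" and push the tag along with the code. If you deploy more than once a day, you would need to add a time or a counter to keep the names unique.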

Summary

Programs must be written for people to read, and only incidentally for machines to execute. – Hal Abelson and Gerald Sussman, “Structure and Interpretation of Computer Programs”

Code is for humans. Generating documentation should be part of writing the code. Forms of documentation that “stick” with the code are often overlooked, but extremely valuable. Source control history can be used to tell the story of how a software system came to be. The most valuable source control history requires a degree of discipline to generate, but is not unattainable. The effort required to produce valuable source control history is less than what would be required to generate the same level of detailed code documentation in another medium.

Code and source control history are just two forms of “sticky” documentation. I think that a good test suite could qualify as a third form of “sticky” documentation. Are there others? How does your team generate and maintain “sticky” documentation?