Code Reading

I was recently asked if I had any advice for reading code. It’s an important skill for developers, because practice reading code leads to faster ramp-up on projects. Studying good codebases is also one of the best ways to pick up a sense for project architecture design.

1. Find Your Point of Entry

First, there’s the question of where to start. If I’m trying to get a sense for how a system fits together as a whole, its main point of entry is an obvious starting place. Often, though, I’m more interested in a particular feature, such as understanding why awk’s implementation of regular expressions can run laps around the one in Ruby, or a cool trick like Chicken Scheme’s “Cheney on the MTA”-based garbage collector.

If there are tests for the feature in isolation, that’s a great place to start, but it’s not always that easy. Codebases can differ wildly in organization (particularly when the language imposes few conventions), but if I know what the codebase calls the feature internally, grepping for the name or related terms will usually point me in the right direction.

2. Engage the Code

It’s easy to miss subtle, crucial details when skimming through a bunch of code, so I try to actively engage with it. Making minor changes as I read it, such as correcting style inconsistencies, keeps me focused. Names and comments can be misleading (though seldom intentionally), so it’s usually necessary to execute the code to be sure how it works. (Commit logs and `git blame` may also be helpful, if they’re available.)

In languages that have REPLs (interactive shells), such as Scheme or Haskell, it’s usually straightforward to run parts of the program with sample input. These languages were designed with interactive development in mind, though. It’s more challenging with languages that only support batch compilation. Debuggers and printf make poor substitutes for a REPL, but adding test cases for the functionality I’m exploring can also help show how it behaves under specific circumstances.

3. Divide & Conquer

I find it helpful to break unclear bits out into small standalone programs and study them in isolation, especially in languages without REPLs. (As a bonus, it’s good refactoring practice.) I read a lot of C, so I’ve found ways to streamline this process. If a file will build with a makefile that says:

# compile foo.c to an executable called "foo"
# with default compiler/linker options
foo: foo.c
	${CC} ${CFLAGS} ${LDFLAGS} -o foo foo.c

then it doesn’t need a makefile at all — default rules are sufficient, and just `make filename` will build it. The combination of that and a skeleton file with #includes and a main() means I can make small executables quite quickly. If several files are involved, then

thing: file1.c file2.c file3.c

will generally do the right thing.

4. Take Notes

I also keep a text file with notes on a project as I study it (using org-mode in Emacs). Any time I have questions or I’m stumped by a design decision, I make a note of it to follow up on later, along with the current file path and line. (I have a bit of Emacs Lisp that automates this.) If the project doesn’t have a glossary in its documentation, I usually build one as I go. Gathering these definitions for the program’s problem domain usually helps clarify its overall design. Other discoveries go in the notes, as well – I’ve learned a couple of useful algorithms this way, such as kona’s linear-time sort.

What are your tips for reading code?