Code Reading

I was recently asked if I had any advice for reading code. It’s an important skill for developers, because practice reading code leads to faster ramp-up on projects. Studying good codebases is also one of the best ways to pick up a sense for project architecture design.

1. Find Your Point of Entry

First, there’s the question of where to start. If I’m trying to get a sense for how a system fits together as a whole, its main point of entry is an obvious starting place. Often, though, I’m more interested in a particular feature: understanding why awk’s implementation of regular expressions can run laps around the one in Ruby, or a cool trick like Chicken Scheme’s “Cheney on the MTA”-based garbage collector.

If there are tests for the feature in isolation, that’s a great place to start, but it’s not always that easy. Codebases can differ wildly in organization (particularly when the language imposes few conventions), but if I know what the codebase calls the feature internally, grepping for the name or related terms will usually point me in the right direction.

2. Engage the Code

It’s easy to miss subtle, crucial details when skimming through a bunch of code, so I try to actively engage with it. Making minor changes as I read it, such as correcting style inconsistencies, keeps me focused. Names and comments can be misleading (though seldom intentionally), so it’s usually necessary to execute the code to be sure how it works. (Commit logs and `git blame` may also be helpful, if they’re available.)

In languages that have REPLs (interactive shells), such as Scheme or Haskell, it’s usually straightforward to run parts of the program with sample input. These languages were designed with interactive development in mind, though. It’s more challenging with languages that only support batch compilation. While debuggers and printf make poor substitutes for REPLs, adding test cases for the functionality I’m exploring can also help me see how it behaves under specific circumstances.

3. Divide & Conquer

I find it helpful to break unclear bits out into small standalone programs and study them in isolation, especially in languages without REPLs. (As a bonus, it’s good refactoring practice.) I read a lot of C, so I’ve found ways to streamline this process. If a file will build with a makefile that says:

# compile foo.c to an executable called "foo"
# with default compiler/linker options
foo: foo.c
	${CC} ${CFLAGS} ${LDFLAGS} -o foo foo.c

then it doesn’t need a makefile at all — default rules are sufficient, and just `make filename` will build it. The combination of that and a skeleton file with #includes and a main() means I can make small executables quite quickly. If several files are involved, then

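thing: thing.c file2.c file3.c

will generally do the right thing: make’s built-in rule for building thing from thing.c compiles and links every source listed. (One of the sources has to share the target’s name for that rule to apply.)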

4. Take Notes

I also keep a text file with notes on a project as I study it (using org-mode in Emacs). Any time I have questions or I’m stumped by a design decision, I make a note of it to follow up on later, along with the current file path and line. (I have a bit of Emacs Lisp that automates this.) If the project doesn’t have a glossary in its documentation, I usually build one as I go. Gathering these definitions for the program’s problem domain usually helps clarify its overall design. Other discoveries go in the notes as well: I’ve learned a couple of useful algorithms this way, such as kona’s linear-time sort.

What are your tips for reading code?
 

Conversation
  • I often use grep (or awk) the way you describe in 1, but even better (when I think to do it) is to build a tags file using ctags. That lets me jump around more easily within $EDITOR. The result is less linear, but I can make connections between parts of the code more easily.

    • Scott Vokes says:

      I too use tag search (in Emacs) quite a bit, as well as glean, a little search engine I wrote.

      I wanted to avoid being too editor/IDE-specific, though I made an exception for org-mode because it’s that good.

  • Mark Needham says:

    Hey,

    Really liked the post and it was good advice for me as I’ve been debugging some Haskell programs – a language I’m not entirely familiar with and one which seems to encourage one- and two-letter variable names!

    I used your debugging tip by using a function from the ‘Debug.Trace’ module which allowed me to print out the state of the code as I executed it.

    One other thing I found useful was to move functions around the file so that ones which called each other were adjacent. I find it much easier to understand the code if it’s all on one screen rather than having to scroll all over the place.

    Cheers, Mark

  • Craig Stuntz says:

    I learn the most about code I’m reading when I reimplement it in a different language or idiom. It sounds a bit like cheating (“That’s writing; we were talking about reading!”) but it really forces you to read and understand the code completely. You just can’t cheat and skim when you do this.
