“I hate reading other people’s code” is a common refrain among software developers of all experience levels. However, it’s a necessary skill, especially for developers rolling onto existing codebases, and if you approach it with the right perspective and the right tools, it can be an enjoyable and enlightening experience.
We hate reading other people’s code because we didn’t write it ourselves. That’s not to say that we all secretly believe we’re the best coders on the planet and no one else can write code like we can. It’s because an intense thought process goes into creating code, and a passive reader doesn’t get the benefit of experiencing that firsthand.
The code as you see it on the screen might have involved multiple people. It might have involved debates and collaboration. It might have taken weeks to nail down a version that conformed to some undocumented constraints that existed only in the heads of the original writers, but you won’t know any of that.
As a reader, all you will see is the finished product, and, unless you do a little digging, the only context you will have is the other words on the screen.
1. Learn to Dig
When you’re diving into a mature codebase for the first time, you might not feel like a developer. You might feel more like an archaeologist, a private investigator, or a Biblical scholar. That’s just fine, because you’ve got a bunch of shovels at your disposal.
If you’re fortunate enough to be working on a codebase that’s been in version control since the beginning, celebrate. You have access to a wealth of metadata that will make your job of understanding not just the code, but the context, a lot easier. I’ll assume you’re using Git, but if you’re using SVN, the same ideas apply.
git blame
You can use git blame on a file to get the author name, date, and commit hash of the last change to every single line. Get familiar with the authors. If you’re lucky, there might only be a few of them, and they might all still be working with you, so you can use them as a resource. If you’re unlucky, there might be dozens of authors that you’ve never heard of before.
Regardless, try to get a feel for who the major contributors are. If you ever come across a bizarre function that you can’t figure out, use git blame to find the author and track them down for questioning.
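One way to get a feel for the major contributors is to count lines per author. The following is a minimal sketch rather than anything the workflow above requires: blame_authors is a hypothetical helper name, and it assumes the standard sed, sort, and uniq tools are available. It reads the output of git blame --line-porcelain, which repeats the author header for every line of the file.

```shell
# Count how many lines of a file were last touched by each author.
# Reads `git blame --line-porcelain` output on stdin.
# (blame_authors is a hypothetical helper name, not a git command.)
blame_authors() {
  # Keep only the "author <name>" header lines, then tally them,
  # most frequent author first.
  sed -n 's/^author //p' | sort | uniq -c | sort -rn
}

# Usage (index.js stands in for any tracked file):
#   git blame --line-porcelain index.js | blame_authors
```

The authors at the top of the list are probably the best people to ask about that file, assuming they are still around.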
git log
Use git log to look at the commit history of the overall repo. This command prints the commit messages, so don’t forget about grep if you want to do something like search for commits where the commit message references someFunction: git log | grep someFunction -C 3 (-C 3 will show your matches with three lines of context).
git log can also show you the history of a single file if you pass it a path, and the -p flag adds the diff for each commit: git log -p index.js. Pay attention to the people who have been modifying things lately so you know where to direct questions if they come up.
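Piping git log through grep works, but git log also has built-in search options that keep each match attached to its commit. The sketch below wraps three of them in small functions; the function names are hypothetical, while --grep, -S, --follow, and -p are standard git log options.

```shell
# Commits whose message mentions a term.
commits_mentioning() {
  git log --oneline --grep="$1"
}

# Commits whose diff adds or removes an occurrence of a term,
# even if the message never mentions it.
commits_touching() {
  git log --oneline -S"$1"
}

# Full history of one file with patches, following it across renames.
file_history() {
  git log --follow -p -- "$1"
}

# Usage, from inside a repository:
#   commits_mentioning someFunction
#   commits_touching someFunction
#   file_history index.js
```

The -S form is often the more useful one when commit messages are terse, because it searches the changes themselves rather than what the author chose to say about them.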
2. Go Back in Time
You can check out any commit you want and run it as if it were the most recent commit in the project. You might want to check out the last known good commit before some difficult-to-track-down bug started rearing its head, or you might just be bored and in the mood for some historical perspective on where your project was years before you came onto it.
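The simplest way to do this is git checkout followed by a commit hash. As an alternative, here is a rough sketch that uses git worktree instead, so the old commit lands in a temporary directory and your current checkout is left alone. The at_commit helper is hypothetical, and it assumes a git version recent enough to have git worktree remove.

```shell
# Run a command inside a throwaway checkout of an older commit.
# Usage: at_commit <commit-ish> <command> [args...]
# (at_commit is a hypothetical helper name, not a git command.)
at_commit() {
  at_rev=$1
  shift
  at_dir=$(mktemp -d) || return 1
  # Send git's own messages to stderr so that only the command's
  # output appears on stdout.
  git worktree add --detach "$at_dir" "$at_rev" >&2 || return 1
  ( cd "$at_dir" && "$@" )
  at_status=$?
  # Clean up the temporary checkout and report the command's status.
  git worktree remove --force "$at_dir" >&2
  return "$at_status"
}

# Usage (the hash and the command are placeholders):
#   at_commit 1a2b3c4 npm test
```

If you are hunting for the commit that introduced a bug rather than visiting one known commit, git bisect automates the search between a known good commit and a known bad one.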
If your project is hosted on GitHub or something similar, you can get tons of perspective by reading through issues, pull requests, and code reviews. Pay attention to the issues that have generated the largest amount of discussion. These might be pain points that you’re going to run into eventually, and you will know ahead of time how to deal with them.
3. Read the Specs
Specs are the new comments. Read unit specs to figure out what functions and modules are supposed to do and what kinds of edge cases they are designed to handle. Read integration specs to figure out how users are going to interact with your application and what kinds of workflows your application supports.
4. Think of Comments as Hints
If you come across a confusing function and then read an associated comment that makes you even more confused, consider the possibility that the comment is out of date and hasn’t been maintained. Programmers’ eyes have a way of skipping right past green lines of text, and it’s possible that the comment explains an iteration of the function that hasn’t existed in months (or years), and no one else has noticed.
5. Find Main
This might seem obvious, but make sure you know where the code starts executing and how it sets itself up. Look at the files being included here, the classes being instantiated, and the configuration options being set.
You’re likely to see them all over the place in the rest of the code base. Some of the modules here are likely to be very general-purpose and decoupled from the rest of the codebase. They represent smaller and more digestible bits of functionality that you should familiarize yourself with before trying to tackle the larger application.
Run a git blame on this file and see which parts of it have changed recently. A chunk of code that has changed recently might clue you in to some of the challenges that have been facing the dev team in recent weeks. Maybe they’ve introduced a new library, maybe they’ve been constantly struggling to configure a library that isn’t working out too well, or maybe there’s just boilerplate code that needs to be updated on a regular basis.
Try to find references to these modules in some of the other source files to get a feel for how and when they are used. This can give you a sense of how they fit into the overall application.
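Finding those references is mostly a matter of searching the source tree. Here is a small sketch, assuming a grep that supports --exclude-dir (GNU and BSD grep both do); find_refs is a hypothetical helper name, and the excluded directories are only examples of places you probably do not want to search.

```shell
# List every line that mentions a name, with file name and line number.
# Usage: find_refs <name> [directory]
# (find_refs is a hypothetical helper name.)
find_refs() {
  # -r recurse, -n show line numbers, -F treat the name as a literal string.
  grep -rnF --exclude-dir=.git --exclude-dir=node_modules -- "$1" "${2:-.}"
}

# Usage (logger is a placeholder module name):
#   find_refs logger src
```

Inside a Git repository, git grep -n does a similar job and only looks at tracked files.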
6. Notice Style
You’re learning about this application because you’re eventually going to be writing code for it, so pay attention to the style. Of course, this includes superficial things like naming conventions, spacing conventions, and brace placement, but it also includes code conventions.
What is the general level of abstraction? If it’s highly abstract code with many layers, you should expect to write the same kind of code.
If you dig around in the history enough, you can probably find the exact point in time where one of the developers decided to abstract out a piece of code. What did it look like before, and what did it look like afterwards? Try to follow the same convention when you’re writing your code.
On a more micro level, what kind of code are the other team members using to get things done? If the developers favor for loops over maps, then you should probably favor for loops over maps as well.
If you come up against a convention you don’t like, talk to your team about potentially changing the convention in the future, but don’t mix and match a bunch of different styles in the same file. The more a file looks like it was written by one person, the better. Being consistent is more important than being cute.
7. Expect to Find Garbage
You might find functions that are never used, or you might find entire files that are never used. You might find commented-out code that hasn’t been touched in years (git blame). Don’t slow down and spend too much time thinking about it, and don’t be afraid to get rid of this stuff.
If the code was there for a reason, someone will flag it in a code review. You’ll be reducing the mental overhead for the next reader that comes along.
8. Don’t Get Lost
Keep these things in mind, and don’t feel bad when you find yourself out in the weeds. Don’t expect it to be a linear process, and don’t expect to understand everything 100%. Pay attention to the important details and know how to dig around to find answers to your questions, and you will find yourself understanding the codebase very quickly.
I disagree with a statement in #7. Never “expect” things to be caught in code reviews. The reviewers might be learning this code too. Do the research yourself about why code exists and if it can be removed. On our projects, parts of the main product are also used by tools, so just because the code isn’t used in your solution, doesn’t mean it isn’t used elsewhere.
Also, I love how the author assumes that if you’re not using git, you’re using SVN, because we all know those are the only two version control systems that exist.
I disagree with disagreeing with #7. :)
If you see commented-out code, delete it. If whoever commented it out later wants it back, they can recover it from the history in source control. Commented-out code almost never gets uncommented. I’d say virtually never. Get rid of it. I would not pass code in review that had commented-out code.
Unused functions, or other abandoned code can, and should, also be deleted if, and only if, there exists a good set of unit tests with good code coverage.
Delete abandoned (commented-out) code!!! Yes, it gets in the way of understanding. At the very most, if you absolutely must, just leave a one-line comment that says it is in version control. I have often committed code and then removed it in the next commit, just so it would be in version control in case I need it in the future (because I did write it, it took time, and someone paid me to write it), but I never really expect to use it in the future.
This is a perfect use of the VCS. That code will be saved, but not be in the way or distract readers from the working code.