6 Steps to Understanding a Large, Legacy Codebase

When diving into an unfamiliar codebase, it’s tempting to pick one aspect and start exploring it in detail. But that’s like trying to navigate a forest by dissecting a tree.

I’ve had to find my way around a few legacy codebases over the last few years. And I’ve learned to start with the 10,000-foot view, then work my way down. It makes the whole process a lot more manageable.

1. Start by Using the App

I used to clone a repository and dive straight into the code. I try to resist that urge now until I’ve had the opportunity to experience the application as a user.

Firing up the app and walking through the main workflows is a quick way to get an overview of the functionality that the application provides. I find that a new codebase is easier to understand if I can map what I’m seeing in the code to the end user’s experience in the app itself.

2. Study the Config Files

Config files are rarely the most exciting thing to read, but I make a point of going through them carefully. Developers are unlikely to modify config files unless they need to, so anything custom in one usually tells you something valuable about the application.

Config files tell you which tools the app uses and which dependencies it has. That can alert you to features you should look for in the application, or hint at how certain kinds of problems are solved. You’ll want to understand how the app is built, tested, and deployed. You’ll also want to know which environments are supported (e.g., development, test, release) and how they differ from each other.

Pay close attention to any scripts or tasks provided in the config files. If they’re in there, it’s likely you’ll need to use them. It’s best to at least get a sense of what’s there.
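As a rough illustration of that first pass, here’s a small Python sketch that pulls the script names and dependency names out of a package.json-style manifest. The manifest contents are made up, and I’m assuming a Node-style project; other ecosystems keep the same information in different files, so treat this as a starting point rather than a general tool.

```python
import json


def summarize_manifest(manifest_text):
    """Return the script and dependency names declared in a package.json-style manifest."""
    manifest = json.loads(manifest_text)
    return {
        "scripts": sorted(manifest.get("scripts", {})),
        "dependencies": sorted(manifest.get("dependencies", {})),
        "devDependencies": sorted(manifest.get("devDependencies", {})),
    }


# Hypothetical manifest contents, made up for illustration.
EXAMPLE_MANIFEST = """
{
  "name": "example-app",
  "scripts": {"start": "node server.js", "test": "jest", "deploy": "./bin/deploy.sh"},
  "dependencies": {"express": "^4.18.0", "pg": "^8.11.0"},
  "devDependencies": {"jest": "^29.0.0"}
}
"""

if __name__ == "__main__":
    for section, names in summarize_manifest(EXAMPLE_MANIFEST).items():
        print(f"{section}: {', '.join(names) or '(none)'}")
```

Even a listing this small raises useful questions: what does the deploy script do, and why does the app need a database driver?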

3. Read the System Tests

Does your app have system tests? If so, they’re a valuable source of information about the features provided and how to work with the application’s APIs. Provided the tests actually run and pass, they’re often a much better source of truth than READMEs or code comments, which can drift out of date without anyone noticing.
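One quick way to get a feature inventory out of a test suite is to list the test names. Here’s a minimal sketch that assumes the tests are Python functions named with a test_ prefix (a common convention, but an assumption on my part); the sample test module is invented for the example.

```python
import ast


def list_test_names(source):
    """Return the names of test functions in a Python source string, in source order."""
    tree = ast.parse(source)
    tests = [
        node
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name.startswith("test_")
    ]
    # ast.walk is breadth-first, so sort by line number to restore source order.
    return [node.name for node in sorted(tests, key=lambda n: n.lineno)]


# Hypothetical test module, made up for illustration.
EXAMPLE_TESTS = '''
def make_user():
    return {"email": "a@example.com"}

def test_user_can_sign_up():
    assert make_user()["email"]

class TestCheckout:
    def test_cart_total_includes_tax(self):
        assert True
'''

if __name__ == "__main__":
    for name in list_test_names(EXAMPLE_TESTS):
        # Turn "test_user_can_sign_up" into "user can sign up".
        print(name[len("test_"):].replace("_", " "))
```

Well-named tests read almost like a feature list, which makes them a handy companion to walking through the app as a user.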

4. Learn the Folder Structure

It can be tempting to start with the first file and work your way file-by-file through the codebase, but that’s generally an exercise in frustration for me. I really want to focus on the big picture first, so I review the codebase’s folder structure. What types of files are grouped together? What types of things would you expect to find in each folder? Where would you go look for a UI component or for a route definition?
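If the tree is too big to eyeball, a short script can answer the “what types of files are grouped together?” question for you. The sketch below counts file extensions under each top-level folder. The list of folders to skip is my own guess at common noise, so adjust it for your project.

```python
from collections import Counter
from pathlib import Path

# Folders that usually hold generated or vendored files (an assumption; adjust as needed).
SKIP_DIRS = {".git", "node_modules", "__pycache__"}


def summarize_tree(root):
    """Map each top-level folder under `root` to a Counter of file extensions beneath it."""
    root_path = Path(root)
    summary = {}
    for path in root_path.rglob("*"):
        rel = path.relative_to(root_path)
        # Ignore directories themselves and anything inside a skipped folder.
        if not path.is_file() or SKIP_DIRS.intersection(rel.parts):
            continue
        # Files sitting directly in the root are grouped under ".".
        top = rel.parts[0] if len(rel.parts) > 1 else "."
        extension = path.suffix or "(none)"
        summary.setdefault(top, Counter())[extension] += 1
    return summary


if __name__ == "__main__":
    for folder, counts in sorted(summarize_tree(".").items()):
        print(folder, dict(counts.most_common(3)))
```

A folder that is almost entirely one file type is usually easy to name; the mixed ones are the folders worth opening first.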

5. Focus on Key Boundaries

How does the application start up? Is there a shutdown process? How does it store and retrieve data? Does it communicate with the network? How do users interact with the application? These “edges” of the application are really good to understand, as a lot of workflows will start and end in these places. They’re also key points for debugging.

6. Follow Key Workflows

My last step is to identify a key workflow in the application, then do my best to trace it through the code. For example, if my first step as a user is to sign up for an account, I’ll walk through that process in the code. I want to understand how navigation through the code works, where security and authorization are being handled, what the exception handling looks like, etc. If you have the option of a debugger, this can be a great time to use it to walk through the execution.

Once I feel fairly confident with one workflow, I’ll move on to another and repeat until I can reliably predict what an implementation is going to look like before I read it. The bulk of my time understanding a new codebase is probably spent in this process.
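When a debugger isn’t handy, a lightweight call trace can give a similar picture of a workflow. Here’s a minimal Python sketch using the standard library’s sys.settrace to record which functions run, in order. The sign-up functions are hypothetical stand-ins for a real workflow, and this only sees Python-level calls, not built-ins or C extensions.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run `func` and return (result, names of Python functions called, in call order)."""
    calls = []

    def tracer(frame, event, arg):
        if event == "call":
            calls.append(frame.f_code.co_name)
        return None  # skip per-line tracing inside each function

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active before
    return result, calls


# Hypothetical sign-up workflow, made up for illustration.
def validate_email(email):
    return "@" in email


def hash_password(password):
    return "hashed:" + password  # placeholder, not real hashing


def save_user(email, password_hash):
    return {"email": email, "password_hash": password_hash}


def sign_up(email, password):
    if not validate_email(email):
        raise ValueError("invalid email")
    return save_user(email, hash_password(password))


if __name__ == "__main__":
    user, calls = trace_calls(sign_up, "a@example.com", "secret")
    print(" -> ".join(calls))
    # prints: sign_up -> validate_email -> hash_password -> save_user
```

I find it useful to write down my guess at the call order first, then compare it to the trace. That comparison is a cheap way to measure the “hit rate” of my predictions.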

Bonus Tips

I encourage skimming the code and taking a lot of passes early on. Don’t get bogged down if something isn’t making sense right away. Some bits won’t make a whole lot of sense on their own, and you’ll need to understand other parts of the codebase in order to connect the dots. Initially, I read fairly quickly, make some hypotheses about what’s going on, and then come back for an in-depth look after I’ve seen more of the code.

Lastly, resist the urge to refactor early. I’m really big on consistency and cleanliness, and I pretty much always find things I want to fix right away. I’ve learned to hold off on this; I know I’ll do a better job of refactoring once I’ve taken a bit more time to get to know the codebase.


Best of luck on your legacy codebase adventure! It gets a little bit easier each time. Please let me know if any of this was helpful. If you’re a seasoned pro, let me know what I missed!
