Four Techniques for Making Sense of Legacy Code

At some point, most developers will need to confront legacy code. How hard that is depends largely on the size of the codebase and on whether anyone who worked on it is still around. If the codebase is relatively small and developers who have actively worked on it are available, it might not be that painful to figure out what is going on. In the worst case, however, you could be thrown into a big gnarly beast of a codebase alone, with no help in sight.

If you’re lucky, you’ll be doing this as part of an application rewrite. In that case, getting a holistic view of the system is especially important. You don’t get many opportunities to completely restructure an application, so when the chance presents itself, it can be the perfect time to fix any points of pain or risk.

A Learning Experience

I’m working on a team tasked with a rewrite. Thankfully, the legacy application’s codebase is relatively small. We also have direct access to several developers with knowledge of the system, as well as the person who originally wrote it.

However, while the system is small, it is still somewhat complex. To make the most of our time, we have to understand what the system is doing. And, as consultants billing by the hour, it is important that we do that as efficiently as possible.

How to Get a Holistic View of Legacy Code

These are some of the methods we used to make sense of legacy code.

1. Map out the system

If you have access to people who are familiar with the system, start by going to a whiteboard and mapping out the system. Skip digging into much detailed code for now, and focus on a high-level overview of what’s going on. What are the inputs and outputs? How does information change as it moves through the system? Are there different states the system can move through? If so, what triggers transitions?
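Once the whiteboard session is over, it can help to capture the state map as data so it stays checkable. The following is a minimal sketch, not anything from our project: the order-processing states and triggers are invented for illustration, and `reachable_states` and `unreachable_states` are hypothetical helper names.

```python
from collections import deque

# Hypothetical states and triggers for an imaginary order-processing system.
# Each state maps trigger names to the state that trigger leads to.
TRANSITIONS = {
    "received": {"validate": "validated", "reject": "rejected"},
    "validated": {"charge": "paid", "cancel": "cancelled"},
    "paid": {"ship": "shipped"},
    "shipped": {},
    "rejected": {},
    "cancelled": {},
    "archived": {},  # mentioned in the code, but nothing transitions into it
}


def reachable_states(start, transitions):
    """Return every state reachable from `start` by following triggers."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for target in transitions.get(state, {}).values():
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def unreachable_states(start, transitions):
    """List states that appear in the map but can never be entered."""
    return sorted(set(transitions) - reachable_states(start, transitions))


print(unreachable_states("received", TRANSITIONS))
```

A state that turns up as unreachable is a good question to bring back to the people who know the system: it is either dead code or a transition that the map is missing.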

If you don’t have anyone available to review this with you, try to walk through the code as best you can, without worrying about getting into too many details right away. Start by finding the application’s entry point, and work on tracking a path through the code. Try to identify the key modules of the program.
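If the legacy code happens to be Python, part of that walk from the entry point can be roughed out with the standard `ast` module. This is only a sketch under that assumption: `SAMPLE` is a made-up program, `calls_by_function` and `outline` are hypothetical helpers, and only plain calls such as `load()` are followed (method calls like `obj.load()` are ignored).

```python
import ast

# A made-up program standing in for the legacy source file.
SAMPLE = '''
def main():
    data = load()
    report(transform(data))

def load():
    return [1, 2, 3]

def transform(data):
    return [x * 2 for x in data]

def report(rows):
    print(rows)
'''


def calls_by_function(source):
    """Map each top-level function name to the sorted names it calls."""
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            called = set()
            for child in ast.walk(node):
                if isinstance(child, ast.Call) and isinstance(child.func, ast.Name):
                    called.add(child.func.id)
            graph[node.name] = sorted(called)
    return graph


def outline(entry, graph, depth=0, seen=None):
    """Return an indented outline of the calls reachable from `entry`."""
    seen = set() if seen is None else seen
    lines = ["  " * depth + entry]
    if entry in seen:
        return lines  # already expanded once; avoids infinite recursion
    seen.add(entry)
    for callee in graph.get(entry, []):
        if callee in graph:  # skip builtins and anything defined elsewhere
            lines.extend(outline(callee, graph, depth + 1, seen))
    return lines


graph = calls_by_function(SAMPLE)
print("\n".join(outline("main", graph)))
```

The outline is a starting point for the map, not a replacement for reading the code: it lists callees alphabetically rather than in execution order, and it knows nothing about dynamic dispatch.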

With this high-level overview of the system, you will be in a better position to start sifting through specific pieces of code. And, while doing a rewrite, you can start thinking about restructuring opportunities.

2. Check out the test suites

If the legacy code has a test suite available, definitely take a look. The assertions throughout the tests should give you a better idea of how the previous developers expected the system to behave.

While looking through tests, pay attention to the level of the test. Is it a simple unit test focusing on one very small, specific component? Or is it an integration test that is testing multiple components together? The higher-level tests can give you additional insights into any parts of the application that interact with or depend on each other.
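To make the difference between test levels concrete, here is a small sketch using Python's `unittest`. The functions `parse_line` and `total_by_customer` are hypothetical stand-ins for two legacy components, not code from our project.

```python
import unittest


# Hypothetical stand-ins for two legacy components.
def parse_line(line):
    """Split a 'customer,amount' line into a (name, amount) pair."""
    name, amount = line.split(",")
    return name.strip(), float(amount)


def total_by_customer(lines):
    """Sum the amounts per customer across many raw lines."""
    totals = {}
    for line in lines:
        name, amount = parse_line(line)
        totals[name] = totals.get(name, 0.0) + amount
    return totals


class ParseLineUnitTest(unittest.TestCase):
    """Unit level: one small component, checked in isolation."""

    def test_strips_whitespace_and_converts_amount(self):
        self.assertEqual(parse_line(" ada , 2.5"), ("ada", 2.5))


class TotalsIntegrationTest(unittest.TestCase):
    """Higher level: parsing and aggregation exercised together."""

    def test_totals_combine_parsing_and_aggregation(self):
        lines = ["ada,2.5", "bob,1", "ada,0.5"]
        self.assertEqual(total_by_customer(lines), {"ada": 3.0, "bob": 1.0})
```

The unit test tells you what `parse_line` accepts. The higher-level test is the one that reveals `total_by_customer` depends on it, which is exactly the kind of relationship worth adding to your map of the system.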

If you are doing a rewrite, be sure you don’t skimp on a test suite in your new version! The “future you” will thank you.

3. Print it out (and highlight)

Don’t overlook the power of paper copies. You don’t need to print out every single file in the project. When it comes time to review a particular function or method, you should first attempt to make sense of it on your monitor.

However, if the logic is particularly confusing, or the code is too long to easily read through, this is a sign that it’s time to print it out, get some highlighters and a pen, and go to town.

When marking up printouts, I like to use multiple colors of highlighters. I use those highlighters to focus on:

  • Identifying repetitive code
  • Breaking up complex logical conditions
  • Tracking the use of specific variables
  • Picking out calls to other functions

I’ve found that making sense of those components is the most beneficial in extracting the “intent” behind the code. For code sequences that are repeated, are they truly doing the same thing? Or is the context different in each case? For complicated logical conditions, are there any unreachable or impossible conditions? The goal here is to figure out what is actually happening and why.
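When a condition only depends on a handful of boolean flags, the question about unreachable branches can also be answered mechanically by trying every combination. The sketch below assumes the logic can be isolated into a pure function; `legacy_branch` and its three flags are invented for illustration.

```python
from itertools import product


def legacy_branch(is_admin, is_active, is_locked):
    """Hypothetical condition chain copied from a printout."""
    if is_admin and not is_locked:
        return "full access"
    elif is_active and not is_locked:
        return "limited access"
    elif is_admin and is_active and not is_locked:
        return "audit mode"  # shadowed by the first branch
    else:
        return "denied"


def branch_table(func, n_flags):
    """Group every combination of boolean flags by the result it produces."""
    table = {}
    for flags in product([False, True], repeat=n_flags):
        table.setdefault(func(*flags), []).append(flags)
    return table


# The outcomes the code appears to support, read off the printout by hand.
expected = {"full access", "limited access", "audit mode", "denied"}
table = branch_table(legacy_branch, 3)
print(sorted(expected - set(table)))
```

Any outcome that shows up in the code but never in the table is a branch no input can reach. Whether that is dead code or a bug in the ordering of the conditions is a question about intent, which the highlighters are still better at answering.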

During our rewrite project, this technique has been invaluable in helping us feel confident in our new implementations. Not only are we making sure we’ve understood the existing logic, but we also have a chance to refactor this logic and improve its readability.

4. More whiteboarding!

I cannot stress enough the value of a whiteboard or other similar mechanism to write out your thoughts. Marking up printed code is helpful, but sometimes, it’s nice to just completely get away from code.

After you’ve highlighted the key pieces of the code, it can be helpful to jot down those key pieces in terms more readable to people (not computers). Again, during rewrites, this is a huge boon to help you easily plan out your new implementation.

Conclusion

For those times you have to dig through legacy code, some of these techniques can aid you in making sense of it all. It can be a difficult task, but it doesn’t need to be a miserable one. If you have any other tips on things that have helped you get into legacy code, please feel free to share them below!