Five Steps to Finding Performance Optimization Bottlenecks

Article summary

1. Trust Only Hard Data
2. Start at the Bottom, and Work Up
3. Test Extreme Values
4. Extract the Problem
5. Keep Your Control Group Handy
To Sum it Up

performance-optimization

I recently had the opportunity to do some performance optimization work, and I enjoyed it maybe a little more than a reasonable person should. It reminded me of one of my favorite projects from years ago — tuning a Java 2D graphics library that had to beat a competitor’s implementation on industry benchmarks.

Perhaps what I like most about this type of work is that it’s often a bit of a “whodunnit.” Which part of the algorithm is eating the milliseconds? And why? I’d like to briefly cover five guidelines I use when doing this type of work.

It’ll be helpful to have an example available, so let’s consider an application that renders your favorite photos as quickly as possible, at random locations on a canvas, clearing the canvas before drawing each image.

1. Trust Only Hard Data

I’m an intuitive person, and I find my intuition to be an invaluable tool when debugging. However, I’ve learned to be fairly skeptical of it when optimizing for performance. I’m not suggesting that we shouldn’t look at application code, consider where the problem areas are likely to be, and make those areas our primary suspects. But performance bottlenecks are often surprises.

Perhaps after a careful inspection of our example animation loop, the image rendering call seems likely to be the problem. We’re rendering large images and we’re scaling them down, that must be where the time is going, right? Well, maybe. But maybe not. Maybe clearing the canvas is expensive. Maybe we’re doing a lot of math to figure out the optimal coordinates for that next image. Or maybe our algorithm for choosing the next photo is eating up the clock. We’re not reading that next photo from disk each time right?

It’s very important to prove that the time is going where we think before narrowing focus. Depending on the platform, we may have profiling tools to measure where the time is spent. If not, there’s always the system clock to fall back to (of course, benchmark the clock; it may be a slow call itself, or it may have poor resolution). Regardless, good data is key to optimizing performance, so let’s gather as much as we can throughout the process.

Also, don’t forget to record your results along the way—you’re likely to try a number of experiments during the course of your investigation. You won’t want to have to repeat a test you’ve done earlier to compare performance against your current approach.

2. Start at the Bottom, and Work Up

Ok, so we’re not grabbing that next image from the internet, we’re not reading it from disk, and our logic for picking the rendering location isn’t the problem. The benchmarks show that it is the image rendering call eating up the clock after all. Can we draw a filled rectangle instead and get the performance we want?

I often find that these sorts of trivial tests are invaluable to establish a baseline. We may find that a much simpler operation than what we are trying to accomplish is still unacceptably slow. Maybe the bottleneck is unrelated to the “interesting” work we’re trying to do and there’s a more basic limitation that we’re hitting.

Assuming that this is not the case and our simple operation works well, we can start to add complexity—maybe our image has alpha channels that are slowing the rendering. Maybe the scaling should be done at load time rather than when we’re rendering the images. But perhaps the problem is elsewhere. It’s important to make sure that the simplest possible case gives the performance we need and work our way up from there.

3. Test Extreme Values

So the simplest case—drawing a 50×50 filled rectangle—was super fast. Is drawing a solid-color 50×50 bitmap fast? Assuming that it is, let’s try a 75×75 image. Oh no! Super slow! We’ve found our problem—large images are slow!

Before going too far with a conclusion like this, I find it’s always useful to test extreme values and verify that they validate the current hypothesis. Is a 10×10 image significantly faster to render? What if we try a 1000×1000 image—does the performance degrade as we expect? What? 1000×1000 images render fairly quickly? Now we have some interesting data to narrow in on.

4. Extract the Problem

Ok, so it’s not purely image size. Maybe it has something else to do with image dimensions. Maybe our image drawing call is slow at rendering odd-pixel-width images?

At this point, we have testable hypotheses, and it’s important to eliminate variables to prove that we’re on the right track. We should be able to write simple test cases outside of our application to prove that we’ve isolated the factors that lead to our performance degradation. If our test cases confirm the hypothesis, great. If not, we’ve missed something important—also very useful information. Removing variables is key to proving bottlenecks, and if we’ve isolated the problem, we should be able to demonstrate it independently from our application.

A standalone test case is a great way to communicate the problem to another team if we need to file a defect, and it shows that we’ve fully understood the problem. One tip when writing these test cases: try to remove every bit of noise possible. In this example, it’s a better test case if we can demonstrate the problem with a plain, single-color image than it is if we load a random image.

5. Keep Your Control Group Handy

Ok, we’ve found it! Our graphics library is terrible at rendering odd-pixel-width images! If we stick to even-pixel-width images, things are much better. First of all, congratulations! But second, let’s try our original test case with odd-pixel-width images again, verify that we see the performance problems, then use only even-pixel-width images and prove that our problem is resolved.

Performance work can be exhausting, and it can be exhilarating to get that great run where the numbers look wonderful. Unfortunately, we may have only solved one of our problems. Assuming you’ve extracted and proved a performance bottleneck, you may find that plugging the fix back in to your original test case reveals a new bottleneck. And we’re back to step one.

To Sum it Up

Performance bottlenecks are rarely where you expect them to be. Good measurements are key to help you focus on the real problem areas in your code. When you discover these areas, experiment with the simplest possible scenarios to validate that your assumptions are correct. Extract standalone benchmarks to reproduce the problem outside of your application, and then validate that plugging the solution back into your original problem gives you the results you’re expecting.

Five Steps to Finding Performance Bottlenecks

Article summary

1. Trust Only Hard Data

2. Start at the Bottom, and Work Up

3. Test Extreme Values

4. Extract the Problem

5. Keep Your Control Group Handy

To Sum it Up

Tell Us About Your Project

Article summary

1. Trust Only Hard Data

2. Start at the Bottom, and Work Up

3. Test Extreme Values

4. Extract the Problem

5. Keep Your Control Group Handy

To Sum it Up

Related Posts

Software Developers, Have a Feature? Resist This Urge

Prevent Frame Buffering with OpenCV

The Only Question a “Pair” LLM Programmer Can Answer

Keep up with our latest posts.

Tell Us About Your Project