Article summary
At Atomic Object we love getting interesting projects that really stretch our abilities. Even so, a real-life NP-complete problem is very rare. Before this project, I hadn’t heard of the knapsack problem. Fortunately, I don’t work alone. I was able to use the Atomic Object brain trust to solve the problem.
The Problem
My project has very complex billing logic. A client may need many services. The client pays for a service tier, which covers a certain amount of work. We have data on the amount of work (in minutes averaged per day) each service takes, and how many daily minutes each service requires. We also have pricing information for tiers and services. A few services can either be included in the service tier or billed separately. This means the client could have a choice of paying for the premium service tier or paying for a lower tier and paying separately for one or more services. My job is to find the cheapest plan for this client.
The core problem here is finding the best way to pack services into a service tier. With that knowledge I can price out each possible tier and choose the best option. I can simplify the problem slightly by realizing that most services must always be included in the tier. If I subtract the combined minutes of these services from the minutes allotted to the tier, the remainder is the space I can fill with the more flexible services.
Now I have a number of minutes to fill, and a set of services. Each service has a number of minutes and a price. This is a classic knapsack problem. The idea is you have a container with a limited size, and a number of items of different size and value. The goal is to maximize the value in that container. The remaining minutes of the service tier give me the size of my knapsack. The price and minutes of the services are their value and size, respectively.
Wikipedia explains the difficulty well. “The problem is NP-complete to solve exactly, thus it is expected that no algorithm can be both correct and fast (polynomial-time) on all cases.” The project calls for a web application that can collect the needed services and give a price on the fly. Any completely correct solution will take exponentially longer to calculate for each flexible service available. If that’s not good enough, I’ll have to settle for an approximation, which would be right most of the time, but not always. With clients’ money on the line, that’s not very appealing. As it turns out, a client can currently only be eligible for no more than five flexible services. This may grow over time, but I can reasonably assume it won’t exceed fifteen.
The Solution
I didn’t know the answer, but I knew where to look. I started asking around the office and within five minutes Karlin Fox identified it as the knapsack problem. A quick search later, and I was reading up on the nuances and various approaches to solving it. Karlin wasn’t even on my project, but the open atmosphere at Atomic Object makes it easy to get answers any time we need them.
I began with a brute force algorithm to get a baseline for performance characteristics. The green line on the graph shows that performance. It wasn’t good enough, but it was just a start. Drew Colthorp proposed a “branch and bound” algorithm. I tried a few other algorithms and settled on Drew’s approach. Then I really dug in and focused on the details, experimenting with different library calls until I found the fastest combination.
It could certainly be faster. This was all written in Ruby, which is quite slow, but also tremendously powerful and productive. I could rewrite it in C or do other similar tweaks, but no matter what I do, the graph will look the same, because it’s an exponential curve. Looking again at the graph, the Branch and Bound algorithm is ten times as fast with twelve flexible services, but that only allows me to support five more services. I might be able to make it fast enough for a few more flexible services, but right now I have 5. If that grows enough for this algorithm to be a problem, it will probably keep growing over time, and I’ll need an approximation rather than a correct solution. For now, it’s time to call it good enough and move on to the next task.
Did you try building up a table to solve this using dynamic programming?
Josh, I didn’t try a dynamic programming table, but that’s a great suggestion. That probably would have been my next step if I was still falling short of my performance goals. Instead, I called it good enough and moved on to the next feature.
This looks like the type of problem that Drools Planner was designed for. Take a look at
. Specifically about the 7 minute mark.
Thanks, Wayne. Drools Planner looks interesting.
Hi Ryan,
I am currently testing my new Algorithm that handles the Knapsack problem (with 2 dimensions i.e. Value and Weight).
I tried to compare my performance results with your graph but that’s hard to do b/c it depends on the performance of the machine…
It’s hard to tell if I have a better solution with only 17 pairs of data (a set of 17 value-weight).
Could you please send me an email so I can test my Algorithm on some real data and we can see if that helps out.
I.e.: I am currently testing my Algorithm with 30-50 pairs of data.
Regards,
Martin