Optimization Techniques: Divide and Conquer

Posted on October 12, 2010

In my last post, I presented a brute-force solution to a rather innocuous problem and challenged my readers to think of a way to optimize it. The problem is trivial, but the exercise is important: naive algorithms get the job done, but optimization is key to getting them done in a reasonable amount of time. My naive solution managed to get the answer, but if I added just one more number to the input list of my function, the time it took to reach that answer would balloon dramatically (still no word on whether the growth is truly exponential or something tamer… more investigation on my part will be required). This of course isn’t ideal in our day-to-day work as programmers.

In this series of posts, I’m going to investigate optimization techniques and how I like to approach these sorts of problems.

My first step in optimizing an algorithm is simply to have an algorithm to optimize. When approaching a problem I like to break it down into small cases and solve those first. Once I understand the problem and its solution, I end up with a function that should get the correct answer in most realistic cases. However, the work is far from done: this initial solution is usually a brute-force solution, or else terribly inefficient in some other way.

The next step is to analyze the solution you’ve come up with. Once you get correct answers from your function, increase the number (frequency) and size (space) of your inputs. If you still get correct answers and it’s pretty fast no matter how large the input, you’ve probably gotten lucky and landed on an optimal solution from the get-go. However, unless you’re a seasoned expert you’re probably going to need to analyze your solution and figure out a strategy for improving it.
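When I’m at this stage I usually throw together a tiny, throwaway harness along these lines (everything in it is made up for illustration: `solve` stands in for whatever function you’re testing, and the random lists stand in for whatever shape your real input has):

import random
import time

def solve(nums):
    # Stand-in for the function under test; swap in your own solution.
    return sorted(nums)

# Feed the solution progressively larger random inputs and watch how the
# runtime grows. Checking the answers against known-good cases is up to you.
for size in (10, 100, 1000):
    nums = random.sample(range(1, 10 * size), size)
    start = time.time()
    solve(nums)
    print("%d inputs: %.3f seconds" % (size, time.time() - start))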

Let’s put all of this theory into practice and see how we can analyze my naive solution!

The first and most primitive test we can apply to analyze our solution is `time`. (Note: this is typically a *nix tool; Windows users, you may have to find some sort of alternative that I’m not aware of. Sorry.) It’s not the most precise or in-depth tool you can use, but it will give you a general sense of how well you’re doing. If your solution can give you an answer in less than a minute, you’re doing okay. The key to this tool is to run it several times and take the results in aggregate. However, this isn’t a post on statistics, so you’ll have to figure that part out on your own.
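If you’ve never used it, the invocation is just a prefix on whatever command you would normally run, e.g. `time python largest_sub.py` (using the script name you’ll see in the profiler output below), and the `real` figure it prints is the wall-clock time you want to compare between runs.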

On a deeper level, you’ll run across profiling tools. In Python there’s a built-in module called “cProfile” (or plain “profile” if the former isn’t available). It will do its best to accurately report the time spent inside each function your code calls, the cumulative time including everything those functions call in turn, and the total number of calls to each one. It also works together with the `pstats` statistics module for better reporting. Go ahead, read up on it, and run it on my brute-force function.
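One simple way to run it (a minimal sketch: it assumes the brute-force function and its input list are already available as `largest_sub` and `nums` in the script doing the profiling):

import cProfile
import pstats

# Profile one call to the brute-force function and dump the raw stats to a
# file. (largest_sub and nums are assumed to be defined already.)
cProfile.run('largest_sub(nums)', 'brute_force.prof')

# Load the stats with pstats and sort the report by call count, which is how
# the tables in this post are ordered.
stats = pstats.Stats('brute_force.prof')
stats.sort_stats('calls').print_stats()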

You’ll get a table that looks like:

         4194340 function calls in 9.890 CPU seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4194281    3.085    0.000    3.085    0.000 {sum}
       22    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
       22    0.000    0.000    0.000    0.000 largest_sub.py:45()
        1    0.000    0.000    0.000    0.000 {map}
        1    0.000    0.000    0.000    0.000 {method 'read' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {open}
        1    0.000    0.000    0.000    0.000 {len}
        1    0.001    0.001    9.890    9.890 {execfile}
        1    6.803    6.803    9.888    9.888 largest_sub.py:4(largest_sub)
        1    0.000    0.000    0.000    0.000 {method '__enter__' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:66(Profile)
        1    0.000    0.000    9.890    9.890 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {sorted}
        1    0.001    0.001    9.889    9.889 largest_sub.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {range}
        1    0.000    0.000    0.000    0.000 cProfile.py:5(<module>)

There’s a lot of information here, so I went ahead and ordered it by the number of calls to each function. The big sore spot is right at the top: my function is spending almost all of its time calculating an enormous number of sums. If we look back at the code we can easily see why. The function is essentially running through every combination of numbers, from pairs all the way up to the entire set, and calling sum on each one!
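The shape of that brute force is roughly the following (a reconstruction for illustration only, not the actual code from the last post; in particular, the membership test against the input list is my assumption):

import itertools

def largest_sub_brute(nums):
    # Hypothetical reconstruction: try every combination of two or more
    # input numbers and count the ones whose sum also appears in the list.
    combinations = 0
    for i in range(2, len(nums) + 1):
        for combination in itertools.combinations(nums, i):
            if sum(combination) in nums:
                combinations += 1
    return combinations

With 22 numbers in the input (the 22 calls to `strip` above hint at 22 lines in the input file), that works out to 2^22 - 22 - 1 = 4,194,281 subsets to sum, which matches the call count reported for `sum` in the table exactly.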

One of the simplest and most common optimization strategies you might encounter is divide and conquer! If we think about our function and all the sums it’s doing, it’s not hard to realize that a lot of those sums can’t possibly matter. For example, the sum of the two largest integers in our list will never equal any number in our list, so why bother summing it? The core of this strategy is to divide the problem space up so that we have fewer calculations to do.

So when I go back to the drawing board, I’m going to think of the simplest possible way to divide up my problem space so that I can reduce the number of calls to sum my function has to make. It’s okay if the result still isn’t the most efficient algorithm possible. Getting to the most efficient algorithm takes time and careful observation!

So my second attempt:

import itertools

def largest_sub2(nums):
    combinations = 0
    sorted_nums = sorted(nums)
    for num in nums:
        # Only build combinations out of numbers smaller than the one we're
        # currently checking; nothing bigger can be part of a sum that equals it.
        rest = filter(lambda x: x < num, nums)
        up_to = len(rest)
        up_to = up_to + 1 if up_to > 2 else up_to
        for i in range(2, up_to):
            for combination in itertools.combinations(rest, i):
                if sum(combination) == num:
                    combinations += 1
    return combinations

Here we have a slightly better function: same answer, but slightly faster. This time I iterate over the list of input numbers and only generate combinations drawn from the numbers that are smaller than the one we’re currently looking at. This way, when we’re checking the second-largest number in the input list, we never waste time on combinations that include the largest number.

To see if our change actually improved anything, run the `time` command. You should see a modest improvement! Then run the profiler. The table should look something like this:

         4194150 function calls in 6.782 CPU seconds

   Ordered by: call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  4194049    3.035    0.000    3.035    0.000 {sum}
       22    0.000    0.000    0.000    0.000 largest_sub.py:46()
       22    0.000    0.000    0.000    0.000 {len}
       22    0.000    0.000    0.000    0.000 {method 'strip' of 'str' objects}
       22    0.000    0.000    0.000    0.000 {range}
        1    3.746    3.746    6.780    6.780 largest_sub.py:29(largest_sub2)
        1    0.000    0.000    0.000    0.000 {map}
        1    0.000    0.000    0.000    0.000 {method 'read' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {open}
        1    0.001    0.001    6.782    6.782 {execfile}
        1    0.000    0.000    0.000    0.000 {method '__enter__' of 'file' objects}
        1    0.000    0.000    0.000    0.000 {method 'split' of 'str' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:66(Profile)
        1    0.000    0.000    6.782    6.782 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {sorted}
        1    0.001    0.001    6.781    6.781 largest_sub.py:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 cProfile.py:5(<module>)

Looks like we only shaved off a few corner cases: there’s a difference of just 232 calls to sum. Yet the runtime dropped by roughly three seconds, from 9.89 down to 6.78. Clearly we could think of a better strategy! See if you can come up with one. I’ll post a better strategy next time and we’ll compare results.