Can You Make This Toy Python Program Faster?

Posted on October 10, 2010

As my job search is slowly coming to a close, my life for the past couple months has been filled with… trivial programming problems. It seems that the hiring practices of HR departments has become a draconian screening process where by applicants are asked to sit down and write out solutions to deviously “simple” problems on a piece of paper or a white board. While I’m not a fan of the process, there are few options for getting around the gate keepers. So I’ve learned to just get on with solving these problems.

Actually I must say that I do like programming challenges. I just don’t tend to solve them well off the top of my head on a piece of paper while the clock is ticking down. I tend to solve them better with a compiler/interpreter/virtual machine, a debugger, and a decent editor.

I happened upon one that isn’t terribly interesting, but for which the brute-force solution is so sufficiently obvious that I was immediately struck by the desire to optimize it (if possible) upon finishing it. The problem description is as follows:

For the final task, you must find all subsets of an array
where the largest number is the sum of the remaining numbers.
For example, for an input of:

(1, 2, 3, 4, 6)

the subsets would be

1 + 2 = 3
1 + 3 = 4
2 + 4 = 6
1 + 2 + 3 = 6

So the answer would be 4

(Some of you may recognize this problem, I found it from Grepplin).

Anyway, my solution was:

import itertools


def largest_sub(nums):
    """
    For the final task, you must find all subsets of an array
    where the largest number is the sum of the remaining numbers.
    For example, for an input of:

    (1, 2, 3, 4, 6)

    the subsets would be

    1 + 2 = 3
    1 + 3 = 4
    2 + 4 = 6
    1 + 2 + 3 = 6

    So the answer would be 4
    """
    combinations = 0
    sorted_nums = sorted(nums)
    for i in range(2, len(sorted_nums) + 1):
        for combination in itertools.combinations(sorted_nums, i):
            if sum(combination) in sorted_nums:
                combinations += 1
    return combinations

if __name__ == '__main__':
    with open('numbers.csv') as f:
        nums_str = f.read()
        nums = map(lambda x: int(x.strip()), nums_str.split(','))
        print largest_sub(nums)

On my laptop it takes about 7 seconds to run on the following input set:

3, 4, 9, 14, 15, 19, 28, 37, 47, 50, 54, 56, 59, 61, 70, 73, 78, 81, 92, 95, 97, 99

I’m basically summing every possible combination of sub-set from two to the length of the input set and checking whether it is in the input set. For small inputs this method might work well enough to get the job done, but every number we add to the input set we increase the number of sums and checks we do exponentially (note that I only just cracked into Concrete Mathematics a couple of months ago, please correct me if I’m wrong about this).

Now, I haven’t taken a stab at optimizing this code yet, but I wanted to get the conversation going if anyone out there wants to share their ideas.

Can you make it faster?

[**Note:** My laptop is a little old, being a Core 2 Duo T5500 @ 1.66Ghz, 1G DDR2, running Ubuntu 10.04]