Thursday, November 9, 2023

Coding a fast O(n) solution for the TwoSum problem

One of the more common problems asked during tech interviews is 'Two Sum'.

You are given an array (or list) of integer values, along with a target integer, and must return the two indices of the two integer values that add up to the given target value. 

Here are a couple of examples:


Example #1:
---------------------
Input array:             [2, 5, 7, 15, 22]
Target value:           29
Expected output:    [2, 4]


Example #2:
---------------------
Input array:             [1, 4, 9, 6]
Target value:           5
Expected output:    [0, 1]


The easiest and fastest solution is a brute-force approach, where we have a nested loop scanning the array, and checking every possible pair to see if it adds up to our target sum. Such a solution would be coded like this (Java version):

public int[] twoSum_Bruteforce(int[] nums, int target) {
int numsLen = nums.length;
for(int indexA = 0; indexA < numsLen; indexA++) {
for(int indexB = indexA + 1; indexB < numsLen; indexB++) {
int numA = nums[indexA];
int numB = nums[indexB];
if(numA + numB == target) {
return new int[]{indexA, indexB};
}
}
}
return new int[]{-1, -1};
}


As a first-time approach to this problem, it's not bad - but its far from ideal. If you were to supply this solution during an actual coding interview, you'd get some credit since there's no denying it actually works, but the interviewer will most likely coax you into writing a more optimized solution. 


Here's why: although our brute-force method works perfectly fine for smaller array inputs, it's extremely inefficient for larger arrays.


Below is a graph that illustrates the speed issues for larger arrays. Size of the array is represented on the X-Axis, while runtime (in seconds) is on the Y-Axis. Each runtime was measured with the worst-case scenario - meaning that the correct pair is always located at the end of the array (the last 2 integers). Runtime is measured in seconds, so the results are going to vary a bit from one device to the next (and also from one langauge to another) - but for all intents and purposes - assuming a typical standard Desktop PC running Java... it's going to be about the same. The goal here was to use a inutitive metric to drive the point home.


Chart showing TwoSum Runtimes when using Bruteforce approach

For relatively small arrays with only a couple to a few thousand integers, the runtime is respectable and pretty much "fast" enough. However, once we get into the several thousands, and north of 10,000 elements, the runtime starts to increase exponentially. By the time we get to 100,000 elements, the worst-case runtime is close to 2 seconds! This is expected since the brute-force approach runs with a O(n^2) time complexity. In a real-world environment this type of runtime speed would simply be downright unacceptable - especially for the simple purpose of what this algorithm is trying to achieve to begin with. 

The main bottleneck here is that nested loop, which, if we study the problem a little harder, we find that we can actually get rid of it altogether.


Let's look at one of our examples from the top again (Example #1), and see what we can do to find the correct output without needing a nested loop.

The array input is: [2, 5, 7, 15, 22], our target sum is 29, and the expected output is [2, 4] (Remember: we want to return the 2 indices of the solution pair, so 2 is the index of 7, and 4 is the index of 22)

This time, we'll use one loop, and we'll do the following for each integer in the list until we've found the solution:

-> Current Integer: 2, Current Index: 0
     Subtract 2 from our target sum (29) = 29 - 2 = 27.
     Have we seen 27 before? No.
     Store someplace that we have seen the integer 2 at index 0.
     Continue


-> Current Integer: 5, Current Index: 1
     Subtract 5 from our target sum (29) = 29 - 5 = 24
     Have we seen 24 before? No.
     Store someplace that we have seen the integer 5 at index 1.
     Continue


-> Current Integer: 7, Current Index: 2
     Subtract 7 from our target sum (29) = 29 - 7 = 22
     Have we seen 22 before? No.
     Store someplace that we have seen the integer 7 at index 2
     Continue


-> Current Integer: 15, Current Index: 3
     Subtract 15 from our target sum (29) = 29 - 15 = 14
     Have we seen 14 before? No.
     Store someplace that we have seen the integer 15 at index 3
     Continue


-> Current Integer: 22, Current Index: 4
     Subtract 22 from our target sum (29) = 29 - 22 = 7
     Have we seen 7 before? Yes - we saw 7 at index 2
     Stop loop and return 2 along with the current index of 4 -> Return [2, 4]


To summarize the above steps: Each time we encountered an integer we hadn't seen yet, we stored that integer along with its respective array index somewhere so we could retrieve it later. We do this every iteration of the loop. If we come across an integer X and see that we've already encountered the value of (target sum - x), then we're done!

Then we just have to figure out what data structure to store the (integer, index) pairs in, and a good choice for that is a Map, since we can easily map Integer -> Index of integer.

Below are two versions of the same implementation of the above approach - one coded in Java, and the other in Go:


Java implementation of fast solution using a map:

public int[] twoSum_Fast(int[] nums, int target) {
Map<Integer, Integer> numsSeen = new HashMap();
for(int index = 0; index < nums.length; index++) {
int otherNum = target - nums[index];
if(numsSeen.containsKey(otherNum)) {
return new int[]{numsSeen.get(otherNum), index};
}
numsSeen.put(nums[index], index);
}
return new int[]{-1, -1};
}

Link to gist: https://gist.github.com/edev90/1d4c1c01cf4d1aec5040078eec6fbf33


Go implementation of fast solution using a map:

func twoSum(nums []int, target int) []int {
numsSeen := make(map[int]int)
for index := 0; index < len(nums); index++ {
otherNum := target - nums[index]
if indexOfOtherNum, isValidKey := numsSeen[otherNum]; isValidKey {
return []int{indexOfOtherNum, index}
}
numsSeen[nums[index]] = index
}
return []int{-1, -1}
}

Link to gist: https://gist.github.com/edev90/4228c5fd3e447e27ff3be6d60f42591c


As we can see, the HashMap solution is faster on average by multiple times more than the brute-force solution. The graph below shows the runtimes and performance for the Two Sum solution using one loop and a HashMap:


Chart showing TwoSum Runtimes when using a Map

Large arrays - even up to 100,000 elements in size are well below sub-tenth of a second in runtime. This is drastically faster than the brute-force approach. In fact, on my machine I had to bump the array size up to 75 million elements to get a runtime with the fast method that matched the 2 second runtime of the brute-force approach for 100,000 elements - which in this case makes the HashMap solution about 500 times faster than the brute-force one.

In the end, we were able to write a solution to this problem with worst case O(n) time complexity. There is a trade-off of course, as we increased the space complexity to track integers we had already seen along with their corresponding index for later retrieval. However, most versions of the Two Sum question are asked with the assumption that time is the most important factor to be optimized. That being said, if you encounter this question in an interview it's always best to ask the interviewer(s) what aspect of the problem you should be optimizing on, and adjust your approach accordingly.


If you'd like to see more of these algorithm/data structure posts, or would like me to cover a specific problem, please drop a comment below letting me know what you'd like to see covered in a future post.