Thursday, November 9, 2023

Coding a fast O(n) solution for the TwoSum problem

One of the more common problems asked during tech interviews is 'Two Sum'.

You are given an array (or list) of integer values, along with a target integer, and must return the two indices of the two integer values that add up to the given target value. 

Here are a couple of examples:


Example #1:
---------------------
Input array:             [2, 5, 7, 15, 22]
Target value:           29
Expected output:    [2, 4]


Example #2:
---------------------
Input array:             [1, 4, 9, 6]
Target value:           5
Expected output:    [0, 1]


The easiest and fastest solution is a brute-force approach, where we have a nested loop scanning the array, and checking every possible pair to see if it adds up to our target sum. Such a solution would be coded like this (Java version):

public int[] twoSum_Bruteforce(int[] nums, int target) {
int numsLen = nums.length;
for(int indexA = 0; indexA < numsLen; indexA++) {
for(int indexB = indexA + 1; indexB < numsLen; indexB++) {
int numA = nums[indexA];
int numB = nums[indexB];
if(numA + numB == target) {
return new int[]{indexA, indexB};
}
}
}
return new int[]{-1, -1};
}


As a first-time approach to this problem, it's not bad - but its far from ideal. If you were to supply this solution during an actual coding interview, you'd get some credit since there's no denying it actually works, but the interviewer will most likely coax you into writing a more optimized solution. 


Here's why: although our brute-force method works perfectly fine for smaller array inputs, it's extremely inefficient for larger arrays.


Below is a graph that illustrates the speed issues for larger arrays. Size of the array is represented on the X-Axis, while runtime (in seconds) is on the Y-Axis. Each runtime was measured with the worst-case scenario - meaning that the correct pair is always located at the end of the array (the last 2 integers). Runtime is measured in seconds, so the results are going to vary a bit from one device to the next (and also from one langauge to another) - but for all intents and purposes - assuming a typical standard Desktop PC running Java... it's going to be about the same. The goal here was to use a inutitive metric to drive the point home.


Chart showing TwoSum Runtimes when using Bruteforce approach

For relatively small arrays with only a couple to a few thousand integers, the runtime is respectable and pretty much "fast" enough. However, once we get into the several thousands, and north of 10,000 elements, the runtime starts to increase exponentially. By the time we get to 100,000 elements, the worst-case runtime is close to 2 seconds! This is expected since the brute-force approach runs with a O(n^2) time complexity. In a real-world environment this type of runtime speed would simply be downright unacceptable - especially for the simple purpose of what this algorithm is trying to achieve to begin with. 

The main bottleneck here is that nested loop, which, if we study the problem a little harder, we find that we can actually get rid of it altogether.


Let's look at one of our examples from the top again (Example #1), and see what we can do to find the correct output without needing a nested loop.

The array input is: [2, 5, 7, 15, 22], our target sum is 29, and the expected output is [2, 4] (Remember: we want to return the 2 indices of the solution pair, so 2 is the index of 7, and 4 is the index of 22)

This time, we'll use one loop, and we'll do the following for each integer in the list until we've found the solution:

-> Current Integer: 2, Current Index: 0
     Subtract 2 from our target sum (29) = 29 - 2 = 27.
     Have we seen 27 before? No.
     Store someplace that we have seen the integer 2 at index 0.
     Continue


-> Current Integer: 5, Current Index: 1
     Subtract 5 from our target sum (29) = 29 - 5 = 24
     Have we seen 24 before? No.
     Store someplace that we have seen the integer 5 at index 1.
     Continue


-> Current Integer: 7, Current Index: 2
     Subtract 7 from our target sum (29) = 29 - 7 = 22
     Have we seen 22 before? No.
     Store someplace that we have seen the integer 7 at index 2
     Continue


-> Current Integer: 15, Current Index: 3
     Subtract 15 from our target sum (29) = 29 - 15 = 14
     Have we seen 14 before? No.
     Store someplace that we have seen the integer 15 at index 3
     Continue


-> Current Integer: 22, Current Index: 4
     Subtract 22 from our target sum (29) = 29 - 22 = 7
     Have we seen 7 before? Yes - we saw 7 at index 2
     Stop loop and return 2 along with the current index of 4 -> Return [2, 4]


To summarize the above steps: Each time we encountered an integer we hadn't seen yet, we stored that integer along with its respective array index somewhere so we could retrieve it later. We do this every iteration of the loop. If we come across an integer X and see that we've already encountered the value of (target sum - x), then we're done!

Then we just have to figure out what data structure to store the (integer, index) pairs in, and a good choice for that is a Map, since we can easily map Integer -> Index of integer.

Below are two versions of the same implementation of the above approach - one coded in Java, and the other in Go:


Java implementation of fast solution using a map:

public int[] twoSum_Fast(int[] nums, int target) {
Map<Integer, Integer> numsSeen = new HashMap();
for(int index = 0; index < nums.length; index++) {
int otherNum = target - nums[index];
if(numsSeen.containsKey(otherNum)) {
return new int[]{numsSeen.get(otherNum), index};
}
numsSeen.put(nums[index], index);
}
return new int[]{-1, -1};
}

Link to gist: https://gist.github.com/edev90/1d4c1c01cf4d1aec5040078eec6fbf33


Go implementation of fast solution using a map:

func twoSum(nums []int, target int) []int {
numsSeen := make(map[int]int)
for index := 0; index < len(nums); index++ {
otherNum := target - nums[index]
if indexOfOtherNum, isValidKey := numsSeen[otherNum]; isValidKey {
return []int{indexOfOtherNum, index}
}
numsSeen[nums[index]] = index
}
return []int{-1, -1}
}

Link to gist: https://gist.github.com/edev90/4228c5fd3e447e27ff3be6d60f42591c


As we can see, the HashMap solution is faster on average by multiple times more than the brute-force solution. The graph below shows the runtimes and performance for the Two Sum solution using one loop and a HashMap:


Chart showing TwoSum Runtimes when using a Map

Large arrays - even up to 100,000 elements in size are well below sub-tenth of a second in runtime. This is drastically faster than the brute-force approach. In fact, on my machine I had to bump the array size up to 75 million elements to get a runtime with the fast method that matched the 2 second runtime of the brute-force approach for 100,000 elements - which in this case makes the HashMap solution about 500 times faster than the brute-force one.

In the end, we were able to write a solution to this problem with worst case O(n) time complexity. There is a trade-off of course, as we increased the space complexity to track integers we had already seen along with their corresponding index for later retrieval. However, most versions of the Two Sum question are asked with the assumption that time is the most important factor to be optimized. That being said, if you encounter this question in an interview it's always best to ask the interviewer(s) what aspect of the problem you should be optimizing on, and adjust your approach accordingly.


If you'd like to see more of these algorithm/data structure posts, or would like me to cover a specific problem, please drop a comment below letting me know what you'd like to see covered in a future post.


Saturday, January 14, 2023

Returning multiple values in Java - My preferred approach

Java does not natively support the ability to return multiple values from a method.

It would be nice if we could do something like this: could


public int, int methodThatReturnsMultipleValues() {

    return 1, 2;

}


But we can’t. You’ll likely find this particularly bothersome if you’ve migrated over to Java from another language that supports returning multiple values out of the box (like Golang, for instance).


Thankfully, there are workarounds we can use to emulate this behavior. One of the more common workarounds is to simply return an array that contains the return values, like below:


public int[] methodThatReturnsMultipleValues() {

    return new int[]{1, 2};

}


In order to retrieve both integer values we returned, we read both elements of the array:


int[] returnValues = methodThatReturnsMultipleValues();

int firstValue = returnValues[0];

int secondValue = returnValues[1];


For smaller use cases this works fine, but it gets messier with scale — and that’s why I don’t like to use it. 


It’s easy to introduce errors that won’t be exposed until the application is running, such as issues pertaining to:


Array Indexing


Let’s say you forget to make your code zero-based and attempt to pull from returnValues[1] and returnValues[2] instead. At some point an ArrayIndexOutOfBoundsException is going to get thrown when this code executes.


Additionally, by only using indexes to identify which array element corresponds to which return value, you could inadvertently assign returnValue[1] to firstInteger, and returnValue[0] to secondInteger.


Casting


In a lot of situations where it’s handy to return multiple values, the values are of different types. Since Object is a parent class of all other classes in Java (the topmost class), we can have an Object[] array that returns anything we want:


public Object[] methodReturnsMultipleValuesDifferentTypes() {

    String a = "This is a string";

    int[] b = {1, 2};

    return new Object[] {a, b};

}


We can put any object of any type into that array. Here, I’m putting a string and an integer array into the Object[] array.


But we then need to employ casting when we want to read those values back and assign to their appropriate type(s):


Object[] returnValues = methodReturnsMultipleValuesDifferentTypes();

String firstReturnValue = (String)returnValues[0];

int[] secondReturnValue = (int[])returnValues[1];

And therein lies the possibility for casting mistakes to be made. If you mistakenly cast a value as the wrong type, a ClassCastException will occur during runtime.


My Preferred Approach: 


Because of all the considerations I mention above, I almost never use the array approach.


Instead, I use a generic class to store whatever values I need to return from my method.


I create a generic class named “ResultPair" (or something that clearly conveys what the class is used for), and then place it somewhere accessible to every other class in the project. 


I have a type parameter for every return value, and a constructor that accepts each return value that we want the ResultPair to store:


public class ResultPair <T1, T2> {


    public T1 firstValue;

    public T2 secondValue;


    public ResultPair(T1 firstValue, T2 secondValue) {

        this.firstValue = firstValue;

        this.secondValue = secondValue;

    }


}


Instead of using a convoluted array, now we just pass the values to our ResultPair constructor, and return that from a method:


public ResultPair<String, int[]> methodThatReturnsResultPair() {

    String a = "This is a string";

    int[] b = {1, 2};

    return new ResultPair(a, b);

}



Now accessing the return values is much cleaner:


ResultPair<String, int[]> returnValues = methodThatReturnsResultPair();

String firstReturnValue = returnValues.firstReturnValue;

int[] secondReturnValue = returnValues.secondReturnValue;


There’s no need now to read from an Array, so ArrayIndexOutOfBounds exceptions are no longer a concern. We also don’t need to cast anymore, and if we even attempt to assign one of the return values to the wrong type (assigning the String return value to a Double, for example), we’ll get a heads up with IDE warnings we can resolve before running the application:




The other really nice thing about this approach is that you have a clear indication of how many, and what type of values you’re getting back from a method. In this case when I see ResultPair<String, int[]> in the method signature, I know that I’m getting a String and an Integer back from the method I called — whereas when dealing with an Object[] array I might have to trace through the method to discover what types of values I’ll be given.


When I did some quick benchmark tests, I found no significant speed impairments with this approach — and in fact often times the generic class approach was faster than the array method. 


Everything considered, unless there’s some niche case that calls for using an array, I highly encourage using the generic class approach to return multiple values in Java.