Tuesday, March 11, 2025

Make every project yield value

You can't guarantee that every product will be a major success - but you can make sure that the development process behind each one creates value for your organization.

I've been consulting a lot over the past couple of years. One of the things I do is help leaders within smaller organizations make informed decisions about their tech stacks and tooling, so that their development processes and workflows stay in line with their business needs.

One trend I've noticed an uptick in is companies shifting their tech stack from Java over to Golang (oftentimes, I think, more in response to evolving trends than actual necessity... but I digress). These companies often have a lot of legacy Java code that still needs to be maintained by Java devs.

One of the groups I'm working with recently expressed interest in a new software project. Going forward, they want future features and microservices to be built in Go. The group currently has 7 Java devs and 2 dedicated Go devs.

The initial consensus was that the two dedicated Go devs should handle the experimental project while the Java devs covered maintenance on the Java side of the codebase.

After analyzing their dev allocation, it was clear to me that there were ample Java resources on the maintenance side - an area where, frankly, the day-to-day work centered around bug fixes and the occasional "change the color of some label or button" type of 2-point sprint work. Feature requests had pretty much stalled, and there were no signs pointing to a reversal in that trend.

I proposed an idea: pull a couple of the Java devs off the maintenance work and reassign them to the Go side of things. The Java devs would gain exposure to the Go tech stack, and who knows - they might even end up liking and excelling in that area and finding a fit on that team.

Going even further: the two Java devs being mentored by the Go programmers are senior developers, while the Go developers are entry to junior level. This dynamic might not seem intuitive at first if you're thinking hierarchically - but I wasn't. I was thinking in terms of what each side had to offer the other. The senior developers are senior for a reason: they already have significant experience leading other engineers and working closely with stakeholders to keep design decisions aligned with business needs - so why not give the junior Go devs a chance to gain experience in this area as well?

The process I've outlined here isn't exactly groundbreaking - I've seen it proposed many times, not only by me but by other devs I've worked with. What makes it unusual is that, in practice, it's rarely implemented. When there's a new project, leaders often hyper-focus on which developers can get certain parts of the project done the fastest. There should also be an emphasis on structuring the project around career development and a climate of talent diversification - building a more flexible team that can support multiple sides of the organization if need be. From my perspective, there's no better time to let others break into different sides of the product than during a new, speculative endeavor.

The project is still in the earliest phases of production and only time will tell how much money it's going to bring in. In a lot of ways, though, it's already a success: the organization now has Go devs with stronger leadership abilities, and an added layer of insurance in Java devs who now have enough insight into the Go codebase to assist with a hotfix or other high-priority work as it arises.

Some products skyrocket, others barely break even, some will fail without ever yielding a dime, and still others provide just enough cash flow to cover the opportunity cost of the resources poured into them. Either way, it's entirely possible to derive value from a project regardless of the revenue it generates.

This isn't to say, of course, that a company should leap into a new product idea it doesn't expect to make money. But once the green light has been given to an idea with a good chance of turning a profit and things have progressed to design and implementation, the decisions shouldn't stop at which engineers can complete which tasks the fastest. Engineering talent and resources should be arranged in a way that maximizes not only development efficiency, but also learning and talent dispersion. That leads to a more robust, flexible, and agile engineering environment - one equipped to respond more swiftly to evolving expectations.

Thursday, February 27, 2025

Finding the square root of a number using binary search

The other day I was going through a hard drive full of homework assignments I did years ago in college.

I found one of the assignments interesting enough that I thought I would include the solution here. The goal of the assignment was to write a function that calculates the square root of a number without using any built-in math library calls.

The instructions stated that 1000 test cases would be performed, and they ALL had to pass in less than a second. In other words, the sum of the runtimes of all the test cases had to total less than a second - which is pretty much there to force the coder to get creative and not write a brute-force solution (a brute-force solution, while easy to code, would perform horrendously).

For reference, a brute force solution would look something like this:

public static double bruteForceSQRT(double num) {
    final double precision = 0.00000001D;
    double result = num;
    if (num < 1.0D) {
        if (num == 0) {
            return 0;
        } else if (num < 0) {
            // Handle these cases like Math.sqrt() does
            return Double.NaN;
        }
        // For 0 < num < 1 the square root is larger than num but less than 1,
        // so 1.0 is a safe upper bound to start decrementing from
        result = 1.0D;
    } else if (num >= 4.0D) {
        // Halving only gives a valid upper bound once num >= 4 (num/2 >= sqrt(num));
        // for 1 <= num < 4 we simply start from num itself
        result /= 2;
    }
    while (result * result > num) {
        result -= precision;
    }
    return result;
}

In the brute-force approach we start from a value known to be at least as large as the square root (roughly half the number for larger inputs) and decrement it by a tiny precision value until we converge on an approximation that is extremely close to the true square root. Unfortunately, the greater the accuracy you want, the drastically slower the runtime gets. It took more than 2 seconds for the brute-force code to calculate the square root of 25 - which of course is ridiculous.

The approach I went with for the homework assignment utilized binary search instead. Binary search provides superior performance and accuracy, and looks something like this:

public static double bin_sqrt(double num) {
    double left = 0.0D, right = num;
    if (num < 1.0D) {
        if (num == 0) {
            return 0;
        } else if (num < 0) {
            // Handle these cases like Math.sqrt() does
            return Double.NaN;
        }
        // If we got here then 0 < num < 1,
        // so we actually set LEFT to num, and RIGHT to 1.
        // The resulting square root value will be GREATER than num
        // (e.g. the square root of 0.25 is 0.50)
        left = num;
        right = 1.0D;
    }
    double middleNumber = (right + left) / 2;
    double lastMiddleNumber = num;
    double mSquared = middleNumber * middleNumber;
    while (lastMiddleNumber != middleNumber) {
        if (mSquared > num) {
            right = middleNumber;
        } else if (mSquared < num) {
            left = middleNumber;
        }
        lastMiddleNumber = middleNumber;
        middleNumber = (right + left) / 2;
        mSquared = middleNumber * middleNumber;
    }
    return middleNumber;
}


The binary search method is clearly faster than the brute-force approach - but you might be wondering: how does it stack up against the built-in Math.sqrt() function?

Below is a table I put together showing the runtimes of the binary search version (bin_sqrt) vs. the standard Math.sqrt():


As you might expect, Math.sqrt() is still significantly faster than the binary search method in almost all cases - sometimes almost 5 times as fast (on modern systems, Math.sqrt() is likely to execute a hardware square-root instruction directly, which is going to beat any homegrown software solution any day).
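If you'd like to try the comparison yourself, here's a rough timing sketch - this is just an approximation of a harness (not the exact one behind the table above), and it assumes bin_sqrt() lives in the same class:

public static void main(String[] args) {
    // Note: single nanoTime() samples are noisy (JIT warm-up, etc.) -
    // run this a few times if you want steadier numbers.
    double[] inputs = {0.25, 2.0, 25.0, 1024.0, 123456.789};
    for (double n : inputs) {
        long t0 = System.nanoTime();
        double mine = bin_sqrt(n);
        long t1 = System.nanoTime();
        double builtIn = Math.sqrt(n);
        long t2 = System.nanoTime();
        System.out.printf("n=%12.3f  bin_sqrt=%.9f (%d ns)  Math.sqrt=%.9f (%d ns)%n",
                n, mine, t1 - t0, builtIn, t2 - t1);
    }
}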

Nevertheless, I could see the binary search method being useful in some niche cases: if you can't (or don't want to) include a standard math library due to size constraints for a game jam, or you're solving an algorithm question, or, as in my original case, a homework problem. In those cases it's a perfectly fine solution that offers a good balance between speed and accuracy.

Lastly (and arguably most importantly), this example stands as another testament to the power and efficiency of binary search and the many use cases it can be applied to.

A Java version of the binary search square root implementation can also be found on my github here: https://gist.github.com/edev90/bda51627b722a0d825453b3d797b7283

Thursday, March 28, 2024

Adding two binary strings together

I was recently mentoring a junior dev who was sharing some stories about the interview process for the role he's currently in. He recalled being thrown some "curve balls" when it came time for the coding challenge portion of the interview. He was faced with three questions: the first was inverting a binary tree, and the second was a binary search problem - pretty common, typical questions. He aced those two... but had a little trouble with the third: adding two binary strings together.

This problem is generally regarded as "easy", but some easy problems require more planning and a deeper understanding of the underlying mechanics than others - and this is one of them. In retrospect, he realized that jumping straight into coding before thinking through the edge cases and logic is what made it feel more like a medium or hard question.

The goal of the Add Binary Strings problem is to write a function that takes two binary strings a and b and returns a third binary string containing the sum of a and b.

The problem has multiple variations depending on who's asking, but one common constraint is that the binary strings can be hundreds of digits long - which is pretty much there to force you to get creative and not just convert the strings to integers, add them together, and convert the sum back to a string of bits.
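To see why the naive route breaks down, here's a quick illustration (my own example, Java 11+, not part of the original problem): a binary string of more than 63 digits simply can't be parsed into a long at all.

// "1" followed by 70 zeros represents 2^70, which is far beyond Long.MAX_VALUE (2^63 - 1),
// so the "just parse it and add" shortcut fails before we even get to the addition step.
String a = "1" + "0".repeat(70);
long parsed = Long.parseLong(a, 2); // throws NumberFormatException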

A sub-10 line solution

Most of the common programming languages supported by coding interview platforms bundle their own variation of BigInteger - functionality that lets us perform bitwise and mathematical operations on numbers far beyond the bounds of the typical 32- or 64-bit data types that standard variables are capped at.

If we go down the BigInteger route, we can literally code a solution in just a few lines (code snippets in Java, Go, and JavaScript, respectively):

// Java
public String addBinary(String a, String b) {
    return new BigInteger(a, 2).add(new BigInteger(b, 2)).toString(2);
}


// Go
func addBinary(a string, b string) string {
    aAsBigInt := new(big.Int)
    aAsBigInt.SetString(a, 2)
    bAsBigInt := new(big.Int)
    bAsBigInt.SetString(b, 2)
    return aAsBigInt.Add(aAsBigInt, bAsBigInt).Text(2)
}


//JavaScript
var addBinary = function(a, b) {
    return (BigInt("0b" + a) + BigInt("0b" + b)).toString(2)
};

This works, but we're kind of cheating because we didn't actually implement any of the logic ourselves - so let's see how we would roll our own algorithm to achieve the same result.

Implementing the algorithm ourselves

Take the example below - we are adding two numbers: 15 (1111 in binary) and 12 (1100 in binary):


At any given iteration, we have two things we need to do: figure out the current output bit, and the current carry bit that we pass over to the next iteration of the loop. The carry bit will be set to 1 if a column's sum is greater than or equal to 2 (meaning we have an overflow).

This is actually rather straightforward. In each column (starting from the far right) we add all three bits together (carryBit + bitA + bitB). The sum for each column is shown in an orange and blue 2-bit pair. The right bit (blue) is the output bit for that column. The left bit (orange) is the carry bit that overflows to the next column.
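Since carryBit, bitA, and bitB are each either 0 or 1, the column sum can only be 0, 1, 2, or 3, which leaves just four cases:

sum = 0  ->  output bit 0, carry 0
sum = 1  ->  output bit 1, carry 0
sum = 2  ->  output bit 0, carry 1
sum = 3  ->  output bit 1, carry 1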

Translating over to code

Remember that the addBinary function accepts two strings - and the two strings may not be the same length. We'll create two variables, aLen and bLen, to hold the lengths of each string, and set a variable named maxLen to whichever of the two is larger.

func addBinary(a string, b string) string {

    aLen, bLen := len(a), len(b)
    maxLen := aLen
    if bLen > aLen {
        maxLen = bLen
    }

We'll create a couple more variables: carryBit (which starts at 0, since we don't have any overflow yet), and sumStr, which will hold the resulting binary sum string.

carryBit := 0
sumStr := ""

Next we set up the loop - initializing a counter i at 1 and incrementing it, continuing while i is at most maxLen OR while carryBit is not zero:

for i := 1; i <= maxLen || carryBit > 0; i++ {

The offsets for which characters we want to look at each iteration are calculated like so:

bitA, bitB := 0, 0
aOffset := aLen - i
bOffset := bLen - i

You'll probably notice that since we're looping until at least the length of the longest string (OR while we still have a remaining carry bit), at some point aOffset or bOffset (whichever belongs to the shorter string), or both, will become negative - meaning we've scanned as far to the left as we can and are now "outside" the bounds of one or both strings. The following lines of code handle these cases:

if aOffset >= 0 && a[aOffset:aOffset+1] == "1" {
    bitA = 1
}
if bOffset >= 0 && b[bOffset:bOffset+1] == "1" {
    bitB = 1
}

If aOffset is greater than or equal to 0 and the character at that position in string a is '1', then we set bitA to 1 - otherwise, if either of those conditions is false, we keep bitA set to 0.

Same logic for bOffset: if bOffset is greater than or equal to 0 and the character at that position in string b is '1', then we set bitB to 1 - otherwise we keep bitB set to 0.

Now we sum carryBit, bitA, and bitB together, convert the lowest (rightmost) bit of the sum to a character, and concatenate it to the beginning of sumStr.

sum := carryBit + bitA + bitB
if sum&1 == 0 {
    sumStr = "0" + sumStr
} else {
    sumStr = "1" + sumStr
}

Finally, we discard the lowest bit of sum and keep the carry bit to pass over to the next iteration of the loop, which can be done by simply shifting sum right by 1:

carryBit = sum >> 1


The full code for the addBinary function (Go implementation):

func addBinary(a string, b string) string {
    aLen, bLen := len(a), len(b)
    maxLen := aLen
    if bLen > aLen {
        maxLen = bLen
    }
    carryBit := 0
    sumStr := ""
    for i := 1; i <= maxLen || carryBit > 0; i++ {
        bitA, bitB := 0, 0
        aOffset := aLen - i
        bOffset := bLen - i
        if aOffset >= 0 && a[aOffset:aOffset+1] == "1" {
            bitA = 1
        }
        if bOffset >= 0 && b[bOffset:bOffset+1] == "1" {
            bitB = 1
        }
        sum := carryBit + bitA + bitB
        if sum&1 == 0 {
            sumStr = "0" + sumStr
        } else {
            sumStr = "1" + sumStr
        }
        carryBit = sum >> 1
    }
    return sumStr
}

Java version can be found here: https://gist.github.com/edev90/9bc7010845daea8c7f13f260f274f1a3

JavaScript version can be found here: https://gist.github.com/edev90/93fb814b4de2d15129530f4e1e2cf412


As we can see, the answer shouldn't be complex. If you're working on this and arrive at a solution with multiple loops, arrays, and nested conditionals all over the place - and/or tons of bugs from mishandled edge cases - it's likely a sign that you need to reevaluate your logic and see how it can be condensed.




Thursday, November 9, 2023

Coding a fast O(n) solution for the TwoSum problem

One of the more common problems asked during tech interviews is 'Two Sum'.

You are given an array (or list) of integers, along with a target integer, and must return the indices of the two values that add up to the given target.

Here are a couple of examples:


Example #1:
---------------------
Input array:        [2, 5, 7, 15, 22]
Target value:       29
Expected output:    [2, 4]


Example #2:
---------------------
Input array:        [1, 4, 9, 6]
Target value:       5
Expected output:    [0, 1]


The easiest and quickest solution to write is a brute-force approach, where a nested loop scans the array and checks every possible pair to see if it adds up to our target sum. Such a solution would be coded like this (Java version):

public int[] twoSum_Bruteforce(int[] nums, int target) {
    int numsLen = nums.length;
    for (int indexA = 0; indexA < numsLen; indexA++) {
        for (int indexB = indexA + 1; indexB < numsLen; indexB++) {
            int numA = nums[indexA];
            int numB = nums[indexB];
            if (numA + numB == target) {
                return new int[]{indexA, indexB};
            }
        }
    }
    return new int[]{-1, -1};
}


As a first approach to this problem it's not bad - but it's far from ideal. If you supplied this solution during an actual coding interview, you'd get some credit since there's no denying it works, but the interviewer would most likely coax you into writing a more optimized version.


Here's why: although our brute-force method works perfectly fine for smaller array inputs, it's extremely inefficient for larger arrays.


Below is a graph that illustrates the speed issues for larger arrays. The size of the array is on the X-axis, and runtime (in seconds) is on the Y-axis. Each runtime was measured with the worst-case scenario - the correct pair is always located at the end of the array (the last 2 integers). Because runtime is measured in wall-clock seconds, the results will vary a bit from one device to the next (and from one language to another) - but for all intents and purposes, assuming a typical desktop PC running Java, it's going to be about the same. The goal here was to use an intuitive metric to drive the point home.


Chart showing TwoSum Runtimes when using Bruteforce approach

For relatively small arrays with only a couple to a few thousand integers, the runtime is respectable and pretty much "fast enough". However, once we get north of 10,000 elements, the runtime starts to balloon. By the time we reach 100,000 elements, the worst-case runtime is close to 2 seconds! This is expected, since the brute-force approach runs with O(n^2) time complexity. In a real-world environment this kind of runtime would simply be unacceptable - especially given how simple the task this algorithm performs really is.

The main bottleneck here is the nested loop - and if we study the problem a little harder, we find that we can actually get rid of it altogether.


Let's look at one of our examples from the top again (Example #1), and see what we can do to find the correct output without needing a nested loop.

The array input is: [2, 5, 7, 15, 22], our target sum is 29, and the expected output is [2, 4] (Remember: we want to return the 2 indices of the solution pair, so 2 is the index of 7, and 4 is the index of 22)

This time, we'll use one loop, and we'll do the following for each integer in the list until we've found the solution:

-> Current Integer: 2, Current Index: 0
     Subtract 2 from our target sum (29) = 29 - 2 = 27.
     Have we seen 27 before? No.
     Store someplace that we have seen the integer 2 at index 0.
     Continue


-> Current Integer: 5, Current Index: 1
     Subtract 5 from our target sum (29) = 29 - 5 = 24
     Have we seen 24 before? No.
     Store someplace that we have seen the integer 5 at index 1.
     Continue


-> Current Integer: 7, Current Index: 2
     Subtract 7 from our target sum (29) = 29 - 7 = 22
     Have we seen 22 before? No.
     Store someplace that we have seen the integer 7 at index 2
     Continue


-> Current Integer: 15, Current Index: 3
     Subtract 15 from our target sum (29) = 29 - 15 = 14
     Have we seen 14 before? No.
     Store someplace that we have seen the integer 15 at index 3
     Continue


-> Current Integer: 22, Current Index: 4
     Subtract 22 from our target sum (29) = 29 - 22 = 7
     Have we seen 7 before? Yes - we saw 7 at index 2
     Stop loop and return 2 along with the current index of 4 -> Return [2, 4]


To summarize the steps above: each time we encounter an integer we haven't seen yet, we store that integer along with its array index somewhere we can retrieve it later. We do this on every iteration of the loop. If we come across an integer x and see that we've already encountered the value (target sum - x), then we're done!

Then we just have to figure out what data structure to store the (integer, index) pairs in - and a good choice is a Map, since we can easily map each integer to its index.

Below are two versions of the same implementation of the above approach - one coded in Java, and the other in Go:


Java implementation of fast solution using a map:

public int[] twoSum_Fast(int[] nums, int target) {
    Map<Integer, Integer> numsSeen = new HashMap<>();
    for (int index = 0; index < nums.length; index++) {
        int otherNum = target - nums[index];
        if (numsSeen.containsKey(otherNum)) {
            return new int[]{numsSeen.get(otherNum), index};
        }
        numsSeen.put(nums[index], index);
    }
    return new int[]{-1, -1};
}

Link to gist: https://gist.github.com/edev90/1d4c1c01cf4d1aec5040078eec6fbf33


Go implementation of fast solution using a map:

func twoSum(nums []int, target int) []int {
    numsSeen := make(map[int]int)
    for index := 0; index < len(nums); index++ {
        otherNum := target - nums[index]
        if indexOfOtherNum, isValidKey := numsSeen[otherNum]; isValidKey {
            return []int{indexOfOtherNum, index}
        }
        numsSeen[nums[index]] = index
    }
    return []int{-1, -1}
}

Link to gist: https://gist.github.com/edev90/4228c5fd3e447e27ff3be6d60f42591c


As we can see, the HashMap solution is many times faster on average than the brute-force solution. The graph below shows the runtimes for the Two Sum solution using a single loop and a HashMap:


Chart showing TwoSum Runtimes when using a Map

Large arrays - even up to 100,000 elements - run in well under a tenth of a second. This is drastically faster than the brute-force approach. In fact, on my machine I had to bump the array size up to 75 million elements to get a runtime with the fast method that matched the 2-second runtime of the brute-force approach on 100,000 elements - which in this case makes the HashMap solution about 500 times faster than the brute-force one.
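If you'd like to reproduce numbers like these, here's a rough sketch of a worst-case harness - the array contents are my own construction (the only requirement is that the single valid pair sits at the very end), and it assumes the twoSum_Bruteforce() and twoSum_Fast() methods from this post are available as static methods in the same class:

// Build an array where the only pair summing to target is the last two elements,
// so the brute-force version has to examine every other pair first.
// (Assumes target is neither 0 nor 1, so the zero-filled prefix never matches.)
static int[] worstCaseArray(int size, int target) {
    int[] nums = new int[size]; // all zeros by default
    nums[size - 2] = 1;
    nums[size - 1] = target - 1; // 1 + (target - 1) == target
    return nums;
}

public static void main(String[] args) {
    int target = 1_000_000;
    int[] nums = worstCaseArray(100_000, target);

    long t0 = System.nanoTime();
    int[] slow = twoSum_Bruteforce(nums, target);
    long t1 = System.nanoTime();
    int[] fast = twoSum_Fast(nums, target);
    long t2 = System.nanoTime();

    System.out.printf("brute force: %s in %.3f s%n",
            java.util.Arrays.toString(slow), (t1 - t0) / 1e9);
    System.out.printf("hash map:    %s in %.3f s%n",
            java.util.Arrays.toString(fast), (t2 - t1) / 1e9);
}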

In the end, we were able to write a solution with worst-case O(n) time complexity. There is a trade-off, of course: we increased the space complexity by tracking the integers we had already seen, along with their corresponding indices, for later retrieval. However, most versions of the Two Sum question are asked with the assumption that time is the most important factor to optimize. That said, if you encounter this question in an interview, it's always best to ask the interviewer(s) which aspect of the problem you should be optimizing for, and adjust your approach accordingly.


If you'd like to see more of these algorithm/data structure posts, or would like me to cover a specific problem, please drop a comment below letting me know what you'd like to see covered in a future post.


Saturday, January 14, 2023

Returning multiple values in Java - My preferred approach

Java does not natively support the ability to return multiple values from a method.

It would be nice if we could do something like this:


public int, int methodThatReturnsMultipleValues() {
    return 1, 2;
}


But we can’t. You’ll likely find this particularly bothersome if you’ve migrated over to Java from another language that supports returning multiple values out of the box (like Golang, for instance).


Thankfully, there are workarounds we can use to emulate this behavior. One of the more common workarounds is to simply return an array that contains the return values, like below:


public int[] methodThatReturnsMultipleValues() {
    return new int[]{1, 2};
}


In order to retrieve both integer values we returned, we read both elements of the array:


int[] returnValues = methodThatReturnsMultipleValues();
int firstValue = returnValues[0];
int secondValue = returnValues[1];


For smaller use cases this works fine, but it gets messier with scale — and that’s why I don’t like to use it. 


It’s easy to introduce errors that won’t be exposed until the application is running, such as issues pertaining to:


Array Indexing


Let’s say you forget to make your code zero-based and attempt to pull from returnValues[1] and returnValues[2] instead. At some point an ArrayIndexOutOfBoundsException is going to get thrown when this code executes.
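For example, using the method from above (the array it returns only has indexes 0 and 1):

int[] returnValues = methodThatReturnsMultipleValues();
int firstValue = returnValues[1];  // compiles fine, but silently grabs the *second* value
int secondValue = returnValues[2]; // throws ArrayIndexOutOfBoundsException at runtime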


Additionally, by using only indexes to identify which array element corresponds to which return value, you could inadvertently assign returnValues[1] to firstValue, and returnValues[0] to secondValue.


Casting


In a lot of situations where it’s handy to return multiple values, the values are of different types. Since Object is the parent class of all other classes in Java (the topmost class), we can use an Object[] array to return anything we want:


public Object[] methodReturnsMultipleValuesDifferentTypes() {
    String a = "This is a string";
    int[] b = {1, 2};
    return new Object[] {a, b};
}


We can put any object of any type into that array. Here, I’m putting a string and an integer array into the Object[] array.


But we then need to employ casting when we want to read those values back and assign them to their appropriate type(s):


Object[] returnValues = methodReturnsMultipleValuesDifferentTypes();
String firstReturnValue = (String)returnValues[0];
int[] secondReturnValue = (int[])returnValues[1];

And therein lies the possibility for casting mistakes. If you mistakenly cast a value to the wrong type, a ClassCastException will be thrown at runtime.
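For instance, guessing the order wrong compiles without complaint but blows up as soon as it runs:

Object[] returnValues = methodReturnsMultipleValuesDifferentTypes();
// Element 0 is actually the String - this cast compiles, but throws
// ClassCastException the moment it executes:
int[] secondReturnValue = (int[]) returnValues[0];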


My Preferred Approach: 


Because of all the considerations I mention above, I almost never use the array approach.


Instead, I use a generic class to store whatever values I need to return from my method.


I create a generic class named "ResultPair" (or something that clearly conveys what the class is used for), and place it somewhere accessible to every other class in the project.


I have a type parameter for every return value, and a constructor that accepts each return value that we want the ResultPair to store:


public class ResultPair<T1, T2> {

    public T1 firstValue;
    public T2 secondValue;

    public ResultPair(T1 firstValue, T2 secondValue) {
        this.firstValue = firstValue;
        this.secondValue = secondValue;
    }

}


Instead of using a convoluted array, now we just pass the values to our ResultPair constructor, and return that from a method:


public ResultPair<String, int[]> methodThatReturnsResultPair() {
    String a = "This is a string";
    int[] b = {1, 2};
    return new ResultPair<>(a, b);
}


Now accessing the return values is much cleaner:


ResultPair<String, int[]> returnValues = methodThatReturnsResultPair();
String firstReturnValue = returnValues.firstValue;
int[] secondReturnValue = returnValues.secondValue;


There’s no need now to read from an array, so ArrayIndexOutOfBoundsException is no longer a concern. We also don’t need to cast anymore, and if we attempt to assign one of the return values to the wrong type (assigning the String return value to a Double, for example), we get a heads up from the compiler and IDE that we can resolve before ever running the application.
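For example, a mismatch like the one below (the Double variable is just a hypothetical) gets flagged immediately:

ResultPair<String, int[]> returnValues = methodThatReturnsResultPair();
// Compile-time error: incompatible types - String cannot be converted to Double
Double wrong = returnValues.firstValue;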




The other really nice thing about this approach is that you have a clear indication of how many values, and of what types, you’re getting back from a method. When I see ResultPair<String, int[]> in the method signature, I know I’m getting a String and an int array back from the method I called — whereas with an Object[] array I might have to trace through the method to discover what types of values I’ll be given.


When I ran some quick benchmark tests, I found no significant speed penalty with this approach — and in fact, oftentimes the generic class approach was faster than the array method.


Everything considered, unless there’s some niche case that calls for using an array, I highly encourage using the generic class approach to return multiple values in Java.




Wednesday, December 14, 2022

Unit tests are not the magic solution to code integrity


I’ve seen dev teams constantly tout the importance of having as many unit tests in a codebase as possible. 

This became such an important metric for one such group, that leadership overseeing the team started a competition between the developers to track the “leaders” who had the most unit tests. 


Engineering VPs began frequently sending out a chart listing the top unit test producers for each sprint. Each chart would be broken down like this:


Name        # of unit tests committed during sprint
--------    ---------------------------------------
Person A    12
Person B    5
Person C    2
Person D    2


They would stress in their communications that everyone should keep up the good work and keep adding more “good” unit tests. “We need more and more good unit tests,” they would reiterate in nearly every email or meeting about improving the quality of the codebase.


Here’s the problem with this type of practice: it doesn’t promote the production of more good, quality unit tests — it promotes the production of more frivolous unit tests.


Hyper-focusing on just the volume of unit tests goes against one of the main reasons they exist: to increase the testability, maintainability, extensibility, and clarity of code. If you’re throwing unit tests out there just for the sake of increasing the unit test count, you’re actually hindering your ability to achieve those things — not helping it. Ironically, you end up adding one of the biggest things you aim to get rid of: wasteful, glitchy code overhead that serves no purpose. Additionally, keep in mind that unit tests need to be executed in order to perform the validation they’re written for. The unit tests a developer coded just to reach an inconsequential goal and gain a top spot on a leaderboard are going to drag down the runtime of build tests (among other things), as well as slow down their own local testing (which can delay the rollout of bug fixes and new features).


With any mature codebase, there’s often a great deal of bloat or oversight inherent in the code, resulting from past development decisions that prioritized speed over quality and thoroughness. This “tech debt” makes it harder for developers to test, debug, and extend the surrounding code. Unfortunately, addressing tech debt isn’t as “flashy” an endeavor as adding new features to the product. At the same time, however, stakeholders understand that this bloat makes the system harder to maintain and support, and impairs the organization’s ability to ship new functionality and bug fixes to customers.


This conflict — knowing that tech debt hinders the ability to deliver quality results to clients quickly, while not wanting to prioritize and allocate the time necessary to clean it up — is what results in leadership pushing these sorts of “more unit tests” campaigns. They view the redundant, slow, disorganized spaghetti code baked into the software as a sort of sunk cost. They find it hard to justify spending any time rewriting it, and instead believe everyone should move on and pour their attention and energy into writing unit tests for future code commits. This view is compounded by the fact that implementing new features is what keeps customers engaged and interested in a product — customers don’t care about your tech debt or past development oversights; they just want new features built and current features made faster.


As such, it’s often tricky to get product managers and leadership on board with dedicating an entire sprint (or more) to tackling tech debt when there is a backlog of customer-requested features waiting to be implemented. That’s why the push for tech debt removal needs to be sold skillfully. Whenever possible, point out that removing the convoluted code in Module X would cut the development time for Feature Y in half, or make a report run 200% faster for a client. Hold routine meetings with PMs, management, and support staff to review recent escalations and bugs. Show how an issue could have been mitigated faster if getting to the root cause hadn’t involved hours of untangling spaghetti code. It’s easier to get buy-in when everyone realizes that messy code resulted in a customer not getting a fix in time and escalating a complaint up to the president of the company.


The Bottom Line


Unit tests should absolutely be included in code wherever possible to strengthen the quality of the codebase. However, they should be added strategically, not thrown around just for the sake of adding more of them. Trying to gamify such metrics is pointless in general, especially when there are so many other actions that need to be taken to foster a stable system. Unit testing is not the sole magic solution to improving code integrity.


Moreover, adding new unit tests alongside new code does not retroactively fix existing tech debt or relieve the overhead that debt imposes on the maintainability of a system. If tech debt isn’t proactively dealt with, someone will eventually have to clean it up just to make the code testable — so past tech debt is never avoidable for very long.


Refactoring efforts and initiatives to optimize existing modules and remove existing debt should be prioritized. This strips overhead from the code and facilitates writing higher quality unit tests in the future. A strong foundation of clean, organized, well-structured code will lead the way to better unit tests down the line.