Monday, August 16, 2021

260 Colors Generated with Cellular Automata



The block of code below produces the image shown above... can you figure out how?
(Continue reading on for the answer)




A couple weeks ago I wrote a post about using Rule 30, a type of cellular automaton, to build a PRNG (pseudo random number generator).

Since then, I've been playing around with the code used in that tutorial to see how much smaller I could make it. Turns out I was able to golf it a lot -- down to just 264 characters of javascript in fact (264 chars of JS + 70 of HTML).

This was the code before:

And this is the code now:

For the most part it was just about compressing the code, removing unnecessary html tags, and stripping as many semicolons as I could without breaking the code. I also changed the way I'm building the colors: you'll notice that instead of breaking the integer out into an RGB string, I'm now formatting them as hex codes so I could get back some of the chars I had spent on string concatenation and bitshifting.

I also ditched the arrow function after taking a closer look and realizing it wasn't even necessary. The whole reason I had this to begin with is that I need to know when I'm looking at cells 'outside' the bounds of a row in the Rule30 grid (the 2 leading and trailing blank cells that are always present on each row). 

So I golfed this line: b.push(f(i) ^ (f(i+1) || f(i+2)));

Down to: b.push(a[i]^(a[i+1]|a[i+2]));

Why are we able to do this? You might wonder, won't there now be cases when an index will be out of bounds and error out? Well actually no, because in JS attempting to fetch an element from an out of bounds index will just return undefined, which in this case is actually useful. When we use undefined as part of boolean logic, it will become false, (aka 0 when interpreted in math) anyway, which is what we want!

A couple other minor differences included notching up the main loop from 3120 to 6240 iterations so there are 260 colors instead of 130. And instead of a circle I let each color fan out off screen because I think it looks a lot cooler than the colorwheel.

A practical takeaway from this is getting to visualize Rule 30 from a different perspective. It's clear from looking at all the colors generated from such condensed code that Rule 30 is a truly interesting cellular automaton and there are likely a myriad of other use cases it can be applied to that have yet to be discovered.

If you happen to golf the code down even further, or want to share some other piece of code using Rule 30 in some unique way, go ahead and share in a comment below!

Tuesday, July 20, 2021

Generating Pseudo Random Numbers with Cellular Automata

Below is an image of a color wheel on an HTML5 canvas:




This is code that produces that image: 

Only 31 lines of JavaScript code, and yet the wheel is comprised of 130 unique colors. The same 130 unique colors each time too. Notice there's no loading data from external calls made to fetch data from other scripts or pages, no Math.random(), and no compressed data used. So how was this accomplished?


The answer to this puzzle is Rule 30, a cellular automaton introduced by Stephen Wolfram. As we'll see in a moment, the bits in the center column of Rule 30 are chaotic and complex and follow no easily discernible pattern. In this case, I collect those bits and concatenate them into integers that I use as RGB values in the wheel.


The rule set for Rule 30 is: Bitwise concatenate each group of 3 cells in a row. If the bitwise concatenation of the group is 1, 2, 3, or 4, then the resulting cell underneath will be 1 (represented by dark shaded cells).  Otherwise, 0, 5, 6, or 7 will be 0 (represented by light color cells).


Even simpler: for every 3 bit group, the resulting bottom cell will be the value of the left cell, XOR’d by the the result of the center cell OR’d by the right cell (Bottom Cell = Left Cell XOR (Center Cell OR Right Cell).


The GIF below illustrates how the first 4 rows of the Rule 30 automata are built when the initial input seed is 1:




Similar to other cellular automata, the rows are built repeatedly one after the other. The interesting thing about Rule 30 is that, unlike other automata, the rule set here produces a center column that exhibits complex and seemingly random behavior for many seed inputs (the input of 1 is an example of a seed that produces a complex behavior).


There are pretty much 2 ways you can exploit the center column to implement a pseudo random number generator. 


1.) At runtime populate Rule 30 data for n rows (i.e: 100). Then take some other runtime value that isn’t exactly random, but changes constantly and reliably — say the system current timestamp. Then mod this by the number of rows, and then start collecting the bits starting at that row in the center column for as many bits as you like (wrapping around if needed).


For instance, given a current timestamp of 1626103527, and 100 rows of generated Rule 30 data, and we want to build a 32 bit integer. We’d start at row 27 (1629103527 mod 100 = 27). Select bits from row 27 to 58 which yields the 32 bit number. 


2.) Take the timestamp and convert it to a string of bits (1626103527 in base 2 = 1100001000110100010010110100111) and then use that as the start seed. Only 32 rows would have to be generated to get a random number. Beware that a downside to this is there’s no guarantee the current seed will produce a random output. For instance, no combination of XOR and ORs will produce 1s given all nonzero inputs, so a seed of 0 will produce a ‘0’ in every row of the center. 


How you set up the data structures/arrays to accomplish either of the 2 methods is up to you.


In short: the first method requires generating more rows up front, but then we don’t need to generate any further rows after that. In the second scenario we plug in varying seed inputs each time and only need to generate the # of rows for the # of bits that we want our number to consist of, but it’s harder to guarantee that our dataset will be complex.


One more example with a visual:


We want to generate a random 8 bit value using the first method. We build 8 rows of the automata using an initial seed of 1. 



Then we bitwise concatenate those 8 bits together, which makes 11011100 in binary, or 220 in decimal.





Found this interesting and up for a challenge? Check out the contest that Stephen Wolfram has going offering $30,000 in total prizes to solve questions regarding the chaotic behaviors surrounding Rule 30 and why precisely it occurs.


Questions, comments, feedback, or have your own code snippets that uses Rule 30 or other cellular automata in unique or fun ways? Please share below in the comments!

Thursday, May 6, 2021

Genetic algorithms vs. Genetic Programming

In the Introduction to Genetic Algorithms video tutorial, we saw how GAs are powerful evolutionary algorithms that are used to synthesize solutions and optimize parameters for problems that are too complex or convoluted for humans to solve on their own. 


They work well in cases where semantics isn’t a huge concern. So in the case of the video, each possible step the mouse was able to take was encoded as an integer (gene). Strings of these genes (chromosomes) were built, and we kept evolving them through mutation and crossover until we found a chromosome that got the mouse from the start of the maze over to the cheese at the end. 


We weren’t concerned about what order the integers were in a chromosome, or if any of the states were “invalid”. Sure, there were some chromosomes that had a lot of noise or useless actions, but nothing that inherently didn’t make sense. This is what I mean by “semantics”.

So what happens if we wanted to use GA to evolve a mathematical expression? An area where semantics does matter.


The goal here will be to find a Mathematical expression that will yield the following outputs from inputs:


When input = 2, Output = 9

When Input = 0, Output = 5

When Input = 1, Output = 6

When Input = .5, Output = 5.125


The exact expression we’re looking for the computer to find is y = x^2 + 5.


We could encode each mathematical operation as an integer:


0: x

1: ^

2: *

3: +

4: -

5 through 9: (can be a number/digit in the range 0 to 5)


A few possible states we could generate:


701846 (would be 2x^3 - 1)

601846 (would be x^3 - 1)

031111 (would be x+^^^^)

222223 (would be *****+)

434343 (would be -+-+-+)


The first two states aren’t exactly the winning expression that we want, but they’re valid. The last 3 though are completely bogus. 


Even if we started with a proper string, a slight alteration during the mutation process can have a drastic result on the offspring. A generated expression such as 701846 undergoing a mutation could result in 101846, which now results in the expression ^x^3 - 1 — which doesn’t make much sense, does it? 


In order to deal with these invalid states, we'd need to include mechanisms into the parser that would repair or the damaged expressions prior to evaluating their fitness. It's certainly doable, but not very practical, and it’s definitely not scalable as we advance to building more complicated expressions. In theory, we’d need to manually code in more and more rules on how the computer should repair/handle invalid states. 



Enter Genetic Programming



Genetic programming has qualities that naturally safeguards it from spitting out faulty states like above.


Unlike GA, genetic programs are typically represented as a structure of “nodes” connected to each other somehow in memory. One of the more common structures used to represent them are trees.


The math function (4 + (X/25) - (9 * sqrt(X)) represented by a tree structure:


A math function represented by a tree structure



Since we’re dealing with a tree data structure now, we won’t be able to just mutate the genes by flipping a bit, or swapping the value of an integer - more effort will need to go in to the mutation and crossover operations (if you're looking for specifics on the code part, I'll be covering this part in detail in future articles).


But the nice thing here is that the restrictions are built directly into the tree, so we don’t need to deal with invalid states.


The root node, and also the internal nodes (nodes with child nodes) can only be operators. The terminal nodes (those without child nodes) can only contain constants or variables. We don’t connect an operator to a constant or variable node. Connecting nodes in this manner prevents us from generating a state like +X/-9+6.


It’s possible we could run into divide by zero errors, but these can be easily dealt with during the fitness evaluation stage of the GP. 


The bottom line


Practically speaking, there’s a fine line between what constitutes as a “genetic algorithm” or “genetic program”. At the end of the day — it really comes down to the classification and context of the problem you’re solving, the particular mechanics you’re using, and the format of the states that you’re generating.


In general, the primary differences come down to this:


Genetic Algorithms are traditionally built using strings of bits, integers, or characters that represent the genes that make up a chromosome. While this can make it pretty easy to manage and and evolve a GA, it can make evaluating them more cumbersome when they are applied to applications where they have a propensity to generate invalid states. Additionally, the fact that GA chromosomes are typically restricted by a fixed length results in decreased flexibility in the ability to scale as a problem becomes more complex.


With Genetic Programs, we adopt the same underlying evolutionary concepts and foundation from GA, but encode the individuals with a tree (or similar) structure rather than a string or list of genes. A consequence of this is that the overhead required to grow and maintain these structures is significantly increased. On the positive side, we have more control to enforce limits directly into the programs we build which greatly decreases the occurrence of invalid states. Due to the variable size of this type of representation, GP individuals are inherently more flexible than GA chromosomes, and can grow and scale as needed according to a problem's complexity — while efficiently preserving limits and rules throughout the tree.

Thursday, February 18, 2021

Checking if a string is a permutation of a Palindrome


During this post we’ll be writing an algorithm that determines whether a string is a permutation of a Palindrome. We’re not talking about finding out whether a string is a Palindrome, but whether if a string is a permutation of a Palindrome. 


In other words, given a string of scrambled characters, we want to see if it would be possible to rearrange those characters into a word that spells the forward as it does backward (i.e: “oonn” can be re-arranged to form the palindrome “noon”). 


You might be tempted to write code that finds every possible permutation of characters to see if one forms a Palindrome. This would work for small strings but would quickly become time consuming as the amount of possible permutations rises dramatically. And in fact, it’s also not needed. This is likely one such reason why this question (and variations of it) emerge frequently in interview questions and other coding brainteasers — to test the ability for you to take a step back — assess a problem and carefully break it down into its most critical components to produce an efficient and optimized solution.


“Wow”, “Racecar”, “Radar”, “Level”, “Noon” are all examples of palindromes. They each have the following qualities in common: all have an even count of characters, or at most 1 character with an odd # of occurrences. This means that instead of going crazy generating every permutation of a string and checking whether each is a palindrome, we just need to check if the string we start with abides by these rules. 

 

Let’s apply these rules to the string “aacecrr” and see if we can accurately conclude whether the letters in “aacecrr” can be rearranged into a palindrome. We’ll keep count of the # of times we’ve seen each character, along with a count that keeps track of the odd # of characters — incrementing by 1 when any current character count is odd, and decrementing it by 1 when any current character count is even. Below is an illustrated breakdown of what we’ll do as we iterate through each letter in the string (the current character and the counts being decremented/incremented are highlighted in blue):


A A C E C R R


A: 1  (1 is odd so numOdd should be incremented by 1)

numOdd: 1


A A C E C R R


A: 2  (2 is even so numOdd should be decremented by 1)

numOdd: 0


A A C E C R R


A: 2

C: 1 (odd so increment numOdd by 1)

numOdd: 1


A A C E C R R


A: 2

C: 1

E: 1 (odd so increment numOdd by 1)

numOdd: 2


A A C E C R R


A: 2

C: 2 (even so decrement numOdd by 1)

E: 1

numOdd: 1


A A C E C R R


A: 2

C: 2

E: 1

R: 1 (odd so increment numOdd by 1)

numOdd: 2


A A C E C R R


A: 2

C: 2

E: 1

R: 2 (even so decrement numOdd by 1)

numOdd: 1


By the end, numOdd is only 1, as it should be because we only have one unique character, “E”, that has an odd number of occurrences. And since we end with numOdd <= 1, isPermOfPalindrome will return true. You’ll notice with this logic we must scan every character of the string, as even though a character might have an even count at the beginning, it might eventually have an odd count at the end, and vice versa. We just won’t know until the end. 


Now we’re ready to translate this algorithm over to code. I’ve included my java implementation below:


I’m using a HashMap to keep track of the character counts, and just one integer, numOdd, to track the # of unique characters that have an odd frequency. 


Drop any comments or feedback below, and feel free to share your own implementation coded in another language if you have one!