Wednesday, July 2, 2025

One of the most important techniques: Sliding Window

A couple of weeks ago I was doing some binary analysis for a client. It's not as mainstream as some of the other work I've been doing in recent months; reverse engineering isn't exactly in demand from your typical client in the tech space. But it's something I enjoy doing, so I was more than happy to take the project on. One part of the analysis boiled down to a simple question: given a window of X consecutive bytes, which stretch of the file contains the most unique byte values?

There are probably tools out there that could handle this already - perhaps a hex editor with a feature like this built in, or some *nix command-line tool that would make it a cinch. I couldn't think of one off the top of my head, and I figured I could code something up myself faster than I could track such a tool down.

A knee-jerk solution would be to load the binary file into memory and use a nested loop to iterate through the bytes, with a HashSet to keep track of the unique bytes we've seen. The size of the HashSet at any given time tells us the number of unique bytes for that block of X bytes.
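Here's a rough sketch of what that might look like (the method and variable names are just illustrative):

import java.util.HashSet;
import java.util.Set;

// Brute force: for every starting offset, re-scan the next windowSize bytes
// and count the distinct values with a HashSet. Runs in O(n * windowSize) time.
public static int maxUniqueBytesBruteForce(byte[] data, int windowSize) {
    int maxUnique = 0;
    for (int start = 0; start + windowSize <= data.length; start++) {
        Set<Byte> seen = new HashSet<>();
        for (int i = start; i < start + windowSize; i++) {
            seen.add(data[i]);
        }
        maxUnique = Math.max(maxUnique, seen.size());
    }
    return maxUnique;
}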

Let's say we're looking at subsequences of 10 bytes. The bytes scanned per iteration would look something like this:
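iteration 1: scans bytes 0-9
iteration 2: scans bytes 1-10   (bytes 1-9 scanned again)
iteration 3: scans bytes 2-11   (bytes 2-10 scanned again)
iteration 4: scans bytes 3-12   (bytes 3-11 scanned again)
...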

Notice how in each iteration we're looking at redundant data? This wastes time - and it's also unnecessary.

A nested-loop solution for this type of problem is typically a good sign that we need to step back and think carefully about what we're doing. 


Instead, we'll use a sliding window to preserve the information gathered from each previous iteration. Then we only need to analyze the new bytes we haven't seen yet and adjust our previous metrics accordingly, rather than starting from scratch.

We'll ditch the HashSet and swap it out for a data structure we can use to look up values (counts) for each key (each byte we encounter). A HashMap would work perfectly fine for this, since it lets us work with key/value pairs with ease. However, because we're dealing with individual bytes, we can actually use an even simpler data structure with a lighter footprint: an array.

We need to initialize an int array with a length of 256:
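In Java that's a one-liner (I'll call it byteCounts here):

int[] byteCounts = new int[256]; // one counter slot for each possible byte value (0-255)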

All elements are zeroed out by default in Java - so that's all we have to do there. 

Next, we'll declare 2 variables, "left" and "right", which define the left and right edges of our window.

We'll set up a loop that iterates until right is _equal to_ or _greater than_ the length of our data buffer. Inside the loop, at each iteration, we employ the following logic (skipping the "drop" step while the window is still growing to its full size of X bytes):

-> Look up the count for the byte at index _left_ of the buffer in our lookup table and decrement it by 1 (because we are "dropping" that byte from the left edge of the current window). If the count was 1 and is now 0, the byte no longer exists in the current subsequence of X bytes - so decrease numUniqueBytes by 1. If the count is still 1 or greater, we do nothing.

-> Look up the count for the byte at index _right_ of the buffer in our lookup table and increment it by 1. If the count was 0 and is now 1, it's a byte we haven't seen yet in this subsequence of X bytes - so increase numUniqueBytes by 1. If the count was already 1 or greater, we do nothing.

-> If numUniqueBytes is greater than maxNumUniqueBytes, we've hit a new maximum of unique bytes - so set maxNumUniqueBytes to numUniqueBytes. If numUniqueBytes is less than or equal to maxNumUniqueBytes, we leave maxNumUniqueBytes alone.

Below is a Java method sketching the general logic behind the algorithm - one reasonable way to write it (variable names aside, your version may differ):
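public static int maxUniqueBytes(byte[] data, int windowSize) {
    int[] byteCounts = new int[256]; // count of each byte value inside the current window
    int numUniqueBytes = 0;
    int maxNumUniqueBytes = 0;
    int left = 0, right = 0;

    while (right < data.length) {
        // Add the byte at the right edge of the window
        // (mask with 0xFF because Java bytes are signed)
        int incoming = data[right] & 0xFF;
        if (byteCounts[incoming] == 0) {
            numUniqueBytes++; // first occurrence of this byte in the window
        }
        byteCounts[incoming]++;
        right++;

        // Once the window has grown past windowSize bytes, drop the left edge
        if (right - left > windowSize) {
            int outgoing = data[left] & 0xFF;
            byteCounts[outgoing]--;
            if (byteCounts[outgoing] == 0) {
                numUniqueBytes--; // that byte no longer appears in the window
            }
            left++;
        }

        // Only record a maximum once the window is at full size
        if (right - left == windowSize && numUniqueBytes > maxNumUniqueBytes) {
            maxNumUniqueBytes = numUniqueBytes;
        }
    }
    return maxNumUniqueBytes;
}

Each byte is now examined at most twice - once when it enters the window and once when it leaves - so the scan runs in O(n) time no matter how large the window is.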

The gist contains code that builds on the above logic and extends it into a fully functional command-line tool. I also added logic to handle edge cases, as well as support for analyzing bytes in chunks to accommodate large files that can't be stuffed into memory all at once.


Tuesday, March 11, 2025

Make every project yield value

You can't guarantee that every product will be a major success - but you can guarantee that the development process behind each one creates value for your organization.

I've been consulting a lot over the past couple of years. One of the things I do is help leaders within smaller organizations make informed decisions about their tech stacks and the tooling that should be in place to keep their development processes and workflows in line with their business needs.

One trend that I've noticed an uptick in is companies shifting their tech stack from Java and migrating over to Go (oftentimes, I think, more in response to evolving trends than actual necessity... but I digress). Often, these companies have a lot of legacy code written in Java that still needs to be managed by Java devs.

One of the groups I'm working with recently expressed interest in a new software project. Going forward, they want future features and microservices to be built in Go. There are currently 7 Java devs and 2 dedicated Go devs in this group.

The initial group consensus was that both dedicated Go devs should handle the experimental project while the Java devs covered maintenance of the Java side of the code base.

After analyzing their dev allocation, it was clear to me that there were ample Java resources allocated to maintenance - an area where, frankly, the day-to-day work centered around bug fixes and the occasional "change the color of some label or button" type of 2-point sprint work. Feature requests had pretty much stalled, and there weren't any signs pointing to a reversal in that trend.

I proposed an idea: pull a couple of the Java devs off the maintenance work and reassign them to the Go side of things. The Java devs would gain exposure to the Go tech stack, and who knows - they might even end up liking that area, excelling in it, and finding a fit on that team.

Going even further: the 2 Java devs being mentored by the Go programmers are senior developers, while the Go developers are entry to junior level. This dynamic might not seem intuitive at first if you're thinking about things hierarchically - but I wasn't. I was thinking in terms of what each side had to offer the other. The senior developers are senior for a reason: they already have significant experience leading other engineers and working closely with stakeholders to keep design decisions aligned with business needs - so why not give the junior Go devs a chance to gain some experience in this area as well?

The process I've outlined here isn't exactly groundbreaking - it's been proposed many times in the past, not only by me, but by other devs I've worked with. What's unusual about it is that, in reality, it's rarely implemented. When there's a new project, leaders often hyper-focus on which developers can get certain parts of the project done the fastest. There should instead be an emphasis on structuring the project around career development and a climate of talent diversification - building a more flexible team that can support multiple sides of an organization if need be. From my perspective, there's no better time to let others break into different sides of the product than during a new speculative endeavor.

The project is still in the earliest phases of production, and only time will tell how much money it's going to bring in. In a lot of ways, though, it's already a success: the organization now has Go devs with heightened leadership abilities, and an added layer of insurance in Java devs who now have enough insight into the Go codebase to assist with a hotfix or other high-priority work as it arises.

Some products skyrocket, others barely break even, and inevitably, some will fail without ever yielding a dime; still others provide just enough cash flow to cover the opportunity cost of the resources poured into them. Either way, it's entirely possible to derive value from a project regardless of the monetary return it yields.

This isn't to say, of course, that a company should just leap into a new product idea even if they think it won't make them money. However, once the green light has been given to a new idea with a high chance of turning a profit, and things have progressed to the design and implementation phase, the decisions shouldn't stop at which engineers can complete which tasks the fastest. Engineering talent and resources should be arranged in a way that maximizes not only development efficiency, but also learning and talent dispersion. This leads to a more robust, flexible, and agile engineering environment, equipped to respond more swiftly to evolving expectations.

Thursday, February 27, 2025

Finding the square root of a number using binary search

The other day I was going through a hard drive full of homework assignments I did years ago in college.

I found one of the assignments interesting enough that I thought I would include the solution here. The goal of the assignment was to write a function that calculates the square root of a number without using any built-in math library calls.

The instructions stated that 1000 test cases would be performed, and they ALL had to pass in less than a second. In other words: the sum of the runtimes of all the test cases had to total less than a second - which is pretty much there to force the coder to get creative and not write a brute force solution (a brute force solution, while easy to code, would perform horrendously).

For reference, a brute force solution would look something like this:

public static double bruteForceSQRT(double num) {
    final double precision = 0.00000001D;
    if (num == 0) {
        return 0;
    } else if (num < 0) {
        // Handle these cases like Math.sqrt() does
        return Double.NaN;
    }
    // The starting guess must be at or above the true square root,
    // since the loop below only ever decrements.
    double result;
    if (num < 1.0D) {
        // For 0 < num < 1 the square root is bigger than num but below 1
        result = 1.0D;
    } else if (num >= 4.0D) {
        // Halving the input is only safe once num >= 4 (then num / 2 >= sqrt(num))
        result = num / 2;
    } else {
        result = num;
    }
    while (result * result > num) {
        result -= precision;
    }
    return result;
}

In the brute force approach we start from a guess at or above the square root (halving the input when it's safe to do so) and decrement it by a tiny precision value until we reach an approximation extremely close to the square root of the number. Unfortunately, the greater the accuracy you desire, the drastically slower the runtime is going to be. It took more than 2 seconds for the brute force code to calculate the square root of 25 - which of course is ridiculous.

The approach I went with for the homework assignment utilized binary search instead. Binary search provides superior performance and accuracy, and looks something like this:

public static double bin_sqrt(double num) {
    double left = 0.0D, right = num;
    if (num < 1.0D) {
        if (num == 0) {
            return 0;
        } else if (num < 0) {
            // Handle these cases like Math.sqrt() does
            return Double.NaN;
        }
        // If we got here then 0 < num < 1,
        // so we actually want to set LEFT to num, and RIGHT to 1.
        // The resulting square root value will be GREATER than num
        // (i.e.: the square root of 0.25 is 0.50)
        left = num;
        right = 1.0D;
    }
    double middleNumber = (right + left) / 2;
    double lastMiddleNumber = num;
    double mSquared = middleNumber * middleNumber;
    // Keep halving the search range until the midpoint stops changing
    while (lastMiddleNumber != middleNumber) {
        if (mSquared > num) {
            right = middleNumber;
        } else if (mSquared < num) {
            left = middleNumber;
        }
        lastMiddleNumber = middleNumber;
        middleNumber = (right + left) / 2;
        mSquared = middleNumber * middleNumber;
    }
    return middleNumber;
}


The binary search method is clearly faster than the brute force approach - but you might be wondering: how does it stack up against the built-in Math.sqrt() function?

Below is a table I put together which shows the runtimes for the binary search version (bin_sqrt) vs. the standard Math.sqrt().


As you might expect, Math.sqrt() is still significantly faster than the binary search method in almost all cases - sometimes almost 5 times as fast. (On modern systems, Math.sqrt() will likely be compiled down to a dedicated hardware square root instruction - which is going to beat any homegrown software solution any day.)
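If you want to reproduce the comparison on your own machine, a quick-and-dirty harness along these lines will do (single-shot System.nanoTime() measurements with no JIT warmup, so treat the numbers as rough):

// Crude timing harness: times one call to each implementation per input.
// For serious numbers you'd want a proper benchmark tool like JMH.
public static void main(String[] args) {
    double[] inputs = {0.25, 2, 25, 1234.5678, 98765432.1};
    for (double n : inputs) {
        long start = System.nanoTime();
        double binResult = bin_sqrt(n);
        long binNanos = System.nanoTime() - start;

        start = System.nanoTime();
        double libResult = Math.sqrt(n);
        long libNanos = System.nanoTime() - start;

        System.out.printf("sqrt(%f): bin_sqrt=%f (%d ns), Math.sqrt=%f (%d ns)%n",
                n, binResult, binNanos, libResult, libNanos);
    }
}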

Nevertheless - I could see the binary search method being useful in some niche cases: if you can't or don't want to include a standard math library due to size constraints (say, for a game jam), if you're solving an algorithm question, or, as in my original case, solving a homework problem. In those cases it's a perfectly fine solution that offers a good balance between speed and accuracy.

Lastly (and arguably most importantly), this example stands as another testament to the power and efficiency of binary search and the many use cases it can be applied to.

A Java version of the binary search square root implementation can also be found on my github here: https://gist.github.com/edev90/bda51627b722a0d825453b3d797b7283