Some years ago I found myself puzzling over how hardware multipliers work. Honestly, I’m not sure why! Perhaps for the same reason some people do cryptic crosswords?
I’ve known about half- and full-adder circuits for a long time, and I reckon I could draw up the necessary arrangement of logic gates by working backwards from the truth table, but hardware multipliers were a mysterious black box.
I could have just looked up the Wikipedia page, but I thought it would be an interesting thing to try and puzzle out from first principles while walking the dog. Some people do crosswords…
The naive way of doing multiplication (x × y) is just to add x
to itself y times. That is: 3 × 5 = 15 is
3 + 3 + 3 + 3 + 3 = 15 or literally 5 times 3.
In C code this would look something like:
uint32_t mul(uint16_t x, uint16_t y) {
uint32_t r = 0;
while (y) {
r += x;
y--;
}
return r;
}And, actually, this works fine. The performance is linear - O(y). Larger
values of y just mean more iterations through the loop.
But surely there’s a better way!
After puzzling over this for quite a while, I twigged that each 1 bit in
the multiplier y represents an addition of the multiplicand x to the
result, left shifted to the position of that 1 bit. 🤯
When we do ‘long’ multiplication by hand, we multiply x by the least
significant digit of y, then multiply x again by the next least
significant digit of y, recording the result one column over, and so on.
Finally, we add all the intermediate results to get the answer.
38 ; x
x 52 ; y
----
76 ; = 2 x 38
+ 1900 ; = 5 x 38 x 10
----
1976In binary, you can use the same process, but every multiplication step multiplies by only zero or one! I’m going to use smaller numbers in my example just to show the principle.
0011 ; x = 3
x 0101 ; y = 5
-----------
0011 ; = 3
00000
001100 ; = 12 or 0011 <<= 2
+ 0000000
---------
0001111 ; = 15!Eureka!
This translates naturally(!) into:
uint32_t multiply(uint16_t x, uint16_t y) {
uint32_t r = 0;
uint32_t xx = x; // Promote x to prevent overflow
while (xx && y) {
if (y & 1) r+=xx;
y >>= 1;
xx <<= 1;
}
return r;
}A couple of details to note - especially if you don’t read C as a second language:
The
xparameter has to be promoted to a larger type so the value can be left-shifted (xx <<= 1) without overflowing.The
if (y & 1) r+=xxline tests the least significant bit ofy, and adds the current (shifted) value ofxxto the resultr.y >>= 1shiftsyto the right.As far as I know, this is only correct for positive integers.
I must admit I was pretty pleased with this! It’s notably faster than
the naive solution. The time to perform the multiplication varies by the
size of y - O(log y), not its value - O(y).
For example, let’s say we double the size of the multiplier y from
8 to 16 bits (like it’s the 1980s!). Using the
naive addition method, a multiplication will take 256 times longer to
complete. Using the shift and add method, it’s only doubled!
Though I was pleased to have worked this out, I knew it couldn’t be original work, but it wasn’t until I gave the code to Claude to review that I found out it’s called ‘Egyptian’ or ‘Russian Peasant’ multiplication and long predates the common use of binary!
Actually, that’s probably the coolest thing about this - Russian peasants and Egyptian scribes were using binary multiplication to do arithmetic centuries before the difference engine was a twinkle in Mr Babbage’s eye!