Jarvis Cochrane - An Experiment With Binary Multiplication

Some years ago I found myself puzzling over how hardware multipliers work. Honestly, I’m not sure why! Perhaps for the same reason some people do cryptic crosswords?

I’ve known about half- and full-adder circuits for a long time, and I reckon I could draw up the necessary arrangement of logic gates by working backwards from the truth table, but hardware multipliers were a mysterious black box.

I could have just looked up the Wikipedia page, but I thought it would be an interesting thing to try and puzzle out from first principles while walking the dog. Some people do crosswords…

The naive way of doing multiplication (x × y) is just to add x to itself y times. That is: 3 × 5 = 15 is 3 + 3 + 3 + 3 + 3 = 15 or literally 5 times 3.

In C code this would look something like:

uint32_t mul(uint16_t x, uint16_t y) {
    uint32_t r = 0;
    while (y) {
        r += x;
        y--;
    }
    return r;
}

And, actually, this works fine. The performance is linear - O(y). Larger values of y just mean more iterations through the loop.

But surely there’s a better way!

After puzzling over this for quite a while, I twigged that each 1 bit in the multiplier y represents an addition of the multiplicand x to the result, left shifted to the position of that 1 bit. 🤯

When we do ‘long’ multiplication by hand, we multiply x by the least significant digit of y, then multiply x again by the next least significant digit of y, recording the result one column over, and so on. Finally, we add all the intermediate results to get the answer.

    38 ; x
  x 52 ; y
  ----
    76 ; = 2 x 38
+ 1900 ; = 5 x 38 x 10
  ----
  1976

In binary, you can use the same process, but every multiplication step multiplies by only zero or one! I’m going to use smaller numbers in my example just to show the principle.

     0011 ; x = 3
   x 0101 ; y = 5
-----------
     0011 ; = 3
    00000
   001100 ; = 12 or 0011 <<= 2
+ 0000000
---------
  0001111 ; = 15!

Eureka!

This translates naturally(!) into:

uint32_t multiply(uint16_t x, uint16_t y) {
    uint32_t r = 0;
    uint32_t xx = x;    // Promote x to prevent overflow

    while (xx && y) {
        if (y & 1) r+=xx;
        y >>= 1;
        xx <<= 1;
    }
    return r;
}

A couple of details to note - especially if you don’t read C as a second language:

The x parameter has to be promoted to a larger type so the value can be left-shifted (xx <<= 1) without overflowing.
The if (y & 1) r+=xx line tests the least significant bit of y, and adds the current (shifted) value of xx to the result r.
y >>= 1 shifts y to the right.
As far as I know, this is only correct for positive integers.

I must admit I was pretty pleased with this! It’s notably faster than the naive solution. The time to perform the multiplication varies by the size of y - O(log y), not its value - O(y).

For example, let’s say we double the size of the multiplier y from 8 to 16 bits (like it’s the 1980s!). Using the naive addition method, a multiplication will take 256 times longer to complete. Using the shift and add method, it’s only doubled!

Though I was pleased to have worked this out, I knew it couldn’t be original work, but it wasn’t until I gave the code to Claude to review that I found out it’s called ‘Egyptian’ or ‘Russian Peasant’ multiplication and long predates the common use of binary!

Actually, that’s probably the coolest thing about this - Russian peasants and Egyptian scribes were using binary multiplication to do arithmetic centuries before the difference engine was a twinkle in Mr Babbage’s eye!