Sequential Multipliers
Lecture 9
Required Reading
Chapter 9, Basic Multiplication Scheme
Chapter 10, High-Radix Multipliers
Chapter 12.3, Bit-Serial Multipliers
Chapter 12.4, Modular Multipliers
Behrooz Parhami, Computer Arithmetic: Algorithms and Hardware Design
Notation
a  Multiplicand  $a_{k-1} a_{k-2} \ldots a_1 a_0$
x  Multiplier  $x_{k-1} x_{k-2} \ldots x_1 x_0$
p  Product ($a \times x$)  $p_{2k-1} p_{2k-2} \ldots p_2 p_1 p_0$
If multiplicand and multiplier are of different sizes, usually multiplier has the smaller size
Multiplication of two 4-bit unsigned binary numbers in dot notation
Partial Product 0
Partial Product 1
Partial Product 2
Partial Product 3
Number of partial products = number of bits in multiplier x
Bit-width of each partial product = bit-width of multiplicand a
Basic Multiplication Equations
$x = \sum_{i=0}^{k-1} x_i 2^i$
$p = a x$
$p = a x = a \sum_{i=0}^{k-1} x_i 2^i = x_0 a 2^0 + x_1 a 2^1 + x_2 a 2^2 + \ldots + x_{k-1} a 2^{k-1}$
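The sum above can be checked numerically. The sketch below is only an illustration: the function name and the operand values are assumptions, not part of the lecture.

```python
def multiply_by_partial_products(a, x, k):
    """Sum the k partial products x_i * a * 2**i of an unsigned k-bit multiplier x."""
    p = 0
    for i in range(k):
        x_i = (x >> i) & 1           # bit i of the multiplier
        p += x_i * a * (1 << i)      # partial product i, weighted by 2**i
    return p


# Illustrative 4-bit operands (assumed values)
print(multiply_by_partial_products(0b1010, 0b1011, 4))
```

Each loop iteration corresponds to one row of the dot diagram: row i is either 0 or a copy of a shifted left by i positions.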
Shift/Add Algorithm: Right-shift version
Shift/Add Algorithms: Right-shift algorithm
$p = a x = x_0 a 2^0 + x_1 a 2^1 + x_2 a 2^2 + \ldots + x_{k-1} a 2^{k-1}$
$= (\ldots((0 + x_0 a 2^k)/2 + x_1 a 2^k)/2 + \ldots + x_{k-1} a 2^k)/2$  (division by 2 applied k times)
$p^{(0)} = 0$
$p^{(j+1)} = (p^{(j)} + x_j a 2^k) / 2$,  $j = 0, \ldots, k-1$
$p = p^{(k)}$
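A minimal software model of the right-shift recurrence, assuming unsigned k-bit operands. The function name is hypothetical, and the model ignores register widths: it only mirrors the arithmetic of p(j+1) = (p(j) + x_j a 2^k) / 2.

```python
def right_shift_multiply(a, x, k):
    """Unsigned multiplication via the right-shift shift/add recurrence."""
    p = 0                                    # p(0) = 0
    for j in range(k):
        x_j = (x >> j) & 1                   # multiplier bits are consumed LSB first
        p = (p + x_j * a * (1 << k)) >> 1    # add x_j * a * 2^k, then shift right by one
    return p                                 # p(k) = a * x


print(right_shift_multiply(0b1010, 0b1011, 4))
```

The right shift never discards a 1: before the shift in step j the sum is a multiple of 2^(k-j), which is at least 2.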
Sequential shift-and-add multiplier for right-shift algorithm
Right-shift multiplication algorithm: Example
Area optimization for the sequential shift-and-add multiplier with the right-shift algorithm
Shift/Add Algorithms: Right-shift algorithm, multiply-add
$p^{(0)} = y 2^k$
$p^{(j+1)} = (p^{(j)} + x_j a 2^k) / 2$,  $j = 0, \ldots, k-1$
$p = p^{(k)}$
$p = (\ldots((y 2^k + x_0 a 2^k)/2 + x_1 a 2^k)/2 + \ldots + x_{k-1} a 2^k)/2$  (division by 2 applied k times)
$= y + x_0 a 2^0 + x_1 a 2^1 + x_2 a 2^2 + \ldots + x_{k-1} a 2^{k-1} = y + a x$
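The multiply-add variant changes only the initial value. A small sketch, assuming unsigned operands and a hypothetical function name:

```python
def right_shift_multiply_add(a, x, y, k):
    """Compute y + a*x with the right-shift recurrence, starting from p(0) = y * 2^k."""
    p = y << k                               # p(0) = y * 2^k
    for j in range(k):
        x_j = (x >> j) & 1
        p = (p + x_j * a * (1 << k)) >> 1    # same step as plain multiplication
    return p                                 # the k right shifts turn y * 2^k back into y


print(right_shift_multiply_add(0b1010, 0b1011, 5, 4))
```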
Signed Multiplication
• Previous sequential multipliers are for unsigned multiplication
• For signed multiplication:
– assume sign-extended operation for $p^{(j)} + x_j a$
– if the 2's complement multiplier is POSITIVE, the right-shift sequential algorithms (shift-add) will work directly
– if the 2's complement multiplier is NEGATIVE, then we must use a "negative weight" for $x_{k-1}$ and subtract $x_{k-1} a$ in the last cycle
• Slight increase in area due to control and one-bit sign extension on inputs of adder
– Unsigned: k-bit number + k-bit number → (k+1)-bit number
– Signed: (k+1)-bit sign-extended number + (k+1)-bit sign-extended number → (k+1)-bit number
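The rule above (add in every cycle except the last, where x_{k-1} a is subtracted) can be sketched as follows. The helper `to_signed` and both function names are assumptions made for this illustration. Python integers are unbounded, so sign extension is implicit.

```python
def to_signed(bits, k):
    """Interpret the low k bits of `bits` as a 2's-complement number."""
    bits &= (1 << k) - 1
    return bits - (1 << k) if bits & (1 << (k - 1)) else bits


def signed_right_shift_multiply(a_bits, x_bits, k):
    """2's-complement multiplication with right shifts; the sign bit of x has negative weight."""
    a = to_signed(a_bits, k)
    p = 0
    for j in range(k):
        x_j = (x_bits >> j) & 1
        term = x_j * a * (1 << k)
        if j == k - 1:
            term = -term         # last cycle: subtract x_{k-1} * a instead of adding it
        p = (p + term) >> 1      # >> on Python ints is an arithmetic (sign-preserving) shift
    return p


# Illustrative 4-bit operands (assumed values): 1010 is -6 and 1011 is -5
print(signed_right_shift_multiply(0b1010, 0b1011, 4))
```

For a positive multiplier the sign bit is 0, so the last cycle adds nothing and the loop reduces to the unsigned algorithm.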
Sequential multiplication of 2's-complement numbers with right shifts (positive multiplier)
Sequential multiplication of 2's-complement numbers with right shifts (negative multiplier)
Shift/Add Algorithm: Left-shift version
Shift/Add Algorithms: Left-shift algorithm
$p = a x = x_0 a 2^0 + x_1 a 2^1 + x_2 a 2^2 + \ldots + x_{k-1} a 2^{k-1}$
$= (\ldots((0 \cdot 2 + x_{k-1} a) 2 + x_{k-2} a) 2 + \ldots + x_1 a) 2 + x_0 a$  (multiplication by 2 applied k times)
$p^{(0)} = 0$
$p^{(j+1)} = 2 p^{(j)} + x_{k-1-j} a$,  $j = 0, \ldots, k-1$
$p = p^{(k)}$
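The left-shift recurrence is Horner's rule applied to the bits of x, most significant first. A minimal sketch for unsigned operands (the function name is an assumption):

```python
def left_shift_multiply(a, x, k):
    """Unsigned multiplication via the left-shift shift/add recurrence."""
    p = 0                                    # p(0) = 0
    for j in range(k):
        x_bit = (x >> (k - 1 - j)) & 1       # multiplier bits are consumed MSB first
        p = 2 * p + x_bit * a                # shift left by one, then add x_{k-1-j} * a
    return p                                 # p(k) = a * x


print(left_shift_multiply(0b1010, 0b1011, 4))
```

Unlike the right-shift version, the addition here can affect every bit of p, which is one reason a hardware implementation would need a wider adder.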
Sequential shift-and-add multiplier for left-shift algorithm
Left shifts are not as efficient for two's complement because the multiplicand must be sign-extended by k bits
Left-shift multiplication algorithm: Example
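Shift/Add Algorithms: Left-shift algorithm, multiply-add
$p^{(0)} = y 2^{-k}$
$p^{(j+1)} = 2 p^{(j)} + x_{k-(j+1)} a$,  $j = 0, \ldots, k-1$
$p = p^{(k)}$
$p = (\ldots((y 2^{-k} \cdot 2 + x_{k-1} a) 2 + x_{k-2} a) 2 + \ldots + x_1 a) 2 + x_0 a$  (multiplication by 2 applied k times)
$= y + x_{k-1} a 2^{k-1} + x_{k-2} a 2^{k-2} + \ldots + x_1 a 2^1 + x_0 a = y + a x$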
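To see why the initial value is scaled by 2^-k, the recurrence can be run with exact rational arithmetic. This is only a numerical check of the identity, not a model of the hardware. The function name is hypothetical.

```python
from fractions import Fraction


def left_shift_multiply_add(a, x, y, k):
    """Compute y + a*x with the left-shift recurrence, starting from p(0) = y * 2^-k."""
    p = Fraction(y, 1 << k)                  # p(0) = y * 2^-k, kept exact
    for j in range(k):
        x_bit = (x >> (k - 1 - j)) & 1
        p = 2 * p + x_bit * a                # k doublings restore y to full weight
    return int(p)                            # p(k) is an integer: y + a*x


print(left_shift_multiply_add(0b1010, 0b1011, 5, 4))
```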
Shift/Add Algorithm: Right-shift version with Carry-Save Adder
Sequential shift-and-add multiplier with a carry-save adder
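A behavioural sketch of the idea, assuming unsigned operands: the cumulative partial product is kept as a sum word and a carry word, a 3-to-2 carry-save step absorbs each new multiple without carry propagation, and one ordinary addition at the end produces the product. The datapath on the slide is not reproduced in this transcript, so the structure below is an assumption. Both function names are hypothetical.

```python
def carry_save_add(u, v, w):
    """3-to-2 compression: returns (sum word, carry word) with sum + carry == u + v + w."""
    s = u ^ v ^ w                                # bitwise sum without carries
    c = ((u & v) | (u & w) | (v & w)) << 1       # carries, moved to the next bit position
    return s, c


def carry_save_multiply(a, x, k):
    """Right-shift multiplication with the partial product held in carry-save form."""
    s, c = 0, 0
    for j in range(k):
        x_j = (x >> j) & 1
        s, c = carry_save_add(s, c, (x_j * a) << k)
        s >>= 1                                  # shift both words right by one
        c >>= 1
    return s + c                                 # single carry-propagate addition at the end


print(carry_save_multiply(0b1010, 0b1011, 4))
```

In this model both words are even before each shift, so shifting them separately loses nothing.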
High-Radix Sequential Multipliers
High-Radix Notation
a  Multiplicand  $(a_{n-1} a_{n-2} \ldots a_1 a_0)_r$
x  Multiplier  $(x_{n-1} x_{n-2} \ldots x_1 x_0)_r$
p  Product ($a \times x$)  $(p_{2n-1} p_{2n-2} \ldots p_2 p_1 p_0)_r$
Radix-4, or two-bit-at-a-time, multiplication in dot notation
Basic Multiplication Equations
$x = \sum_{i=0}^{n-1} x_i r^i$
$p = a x$
$p = a x = a \sum_{i=0}^{n-1} x_i r^i = x_0 a r^0 + x_1 a r^1 + x_2 a r^2 + \ldots + x_{n-1} a r^{n-1}$
High-Radix Shift/Add Algorithms: Right-shift high-radix algorithm
$p = a x = x_0 a r^0 + x_1 a r^1 + x_2 a r^2 + \ldots + x_{n-1} a r^{n-1}$
$= (\ldots((0 + x_0 a r^n)/r + x_1 a r^n)/r + \ldots + x_{n-1} a r^n)/r$  (division by r applied n times)
$p^{(0)} = 0$
$p^{(j+1)} = (p^{(j)} + x_j a r^n) / r$,  $j = 0, \ldots, n-1$
$p = p^{(n)}$
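High-Radix Shift/Add Algorithms: Left-shift high-radix algorithm
$p = a x = x_0 a r^0 + x_1 a r^1 + x_2 a r^2 + \ldots + x_{n-1} a r^{n-1}$
$= (\ldots((0 \cdot r + x_{n-1} a) r + x_{n-2} a) r + \ldots + x_1 a) r + x_0 a$  (multiplication by r applied n times)
$p^{(0)} = 0$
$p^{(j+1)} = r p^{(j)} + x_{n-1-j} a$,  $j = 0, \ldots, n-1$
$p = p^{(n)}$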
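Both high-radix recurrences can be sketched in a few lines. The functions below assume an unsigned multiplier with n radix-r digits. Their names and the sample operands are illustrative assumptions.

```python
def high_radix_right_shift(a, x, r, n):
    """p(j+1) = (p(j) + x_j * a * r^n) / r, digits of x consumed least significant first."""
    p = 0
    for j in range(n):
        x_j = (x // r**j) % r                # radix-r digit j of the multiplier
        p = (p + x_j * a * r**n) // r        # the division is exact at every step
    return p


def high_radix_left_shift(a, x, r, n):
    """p(j+1) = r * p(j) + x_{n-1-j} * a, digits of x consumed most significant first."""
    p = 0
    for j in range(n):
        x_d = (x // r**(n - 1 - j)) % r
        p = r * p + x_d * a
    return p


# Radix-4 with two digits: the 4-bit multiplier 1110 has digits (3 2) in radix 4
print(high_radix_right_shift(0b0110, 0b1110, 4, 2))
print(high_radix_left_shift(0b0110, 0b1110, 4, 2))
```

With r = 4 each step has to form one of the multiples 0, a, 2a, 3a. Only 3a is not a plain shift of a, so it is the one that needs special handling.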
The multiple generation part of a radix-4 multiplier with precomputation of 3a
Example of radix-4 multiplication using the 3a multiple
The multiple generation part of a radix-4 multiplier based on replacing 3a with 4a (carry into next higher radix-4 multiplier digit) and -a
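The 3a = 4a - a replacement amounts to recoding the radix-4 digits of x from {0, 1, 2, 3} to {-1, 0, 1, 2} with a carry into the next digit. A minimal sketch of that recoding, with a hypothetical function name and an unsigned multiplier of n radix-4 digits assumed:

```python
def recode_radix4_without_3a(x, n):
    """Recode n radix-4 digits of x into {-1, 0, 1, 2}, least significant digit first.

    A digit value of 3 becomes -1 with a carry of 1 into the next digit,
    and a digit of 3 plus an incoming carry (value 4) becomes 0 with a carry.
    The final carry is appended as an extra most significant digit.
    """
    digits = []
    carry = 0
    for i in range(n):
        d = ((x >> (2 * i)) & 3) + carry     # digit plus incoming carry: 0..4
        if d >= 3:
            digits.append(d - 4)             # 3 -> -1, 4 -> 0
            carry = 1
        else:
            digits.append(d)
            carry = 0
    digits.append(carry)
    return digits


print(recode_radix4_without_3a(0b1110, 2))
```

After recoding, every required multiple is 0, a, 2a, or -a, so no 3a needs to be precomputed.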
Higher Radix Multiplication
• In radix-8, one must precompute 3a, 5a, 7a
– Overhead becomes prohibitive and does not help
• However, when we discuss CSA this may be useful
Radix-2 Booth Recoding
Radix-2 Booth Recoding
$y_i = -x_i + x_{i-1}$
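A short sketch of the recoding rule, assuming x_{-1} = 0 and a k-bit 2's-complement multiplier. The function name is an assumption.

```python
def booth_radix2(x_bits, k):
    """Radix-2 Booth digits y_i = -x_i + x_{i-1}, least significant digit first."""
    digits = []
    prev = 0                                 # x_{-1} is taken to be 0
    for i in range(k):
        x_i = (x_bits >> i) & 1
        digits.append(prev - x_i)            # each digit is -1, 0, or 1
        prev = x_i
    return digits


# 0110 (+6) recodes to 1 0 -1 0 (MSB first), i.e. 8 - 2
print(booth_radix2(0b0110, 4)[::-1])
```

The weighted sum of the digits equals the 2's-complement value of x, so the same recoding covers negative multipliers with no separate correction step.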
Sequential multiplication of 2's-complement numbers with right shifts using Booth's recoding
Notation
Y  Multiplicand  $y_{m-1} y_{m-2} \ldots y_1 y_0$
X  Multiplier  $x_{m-1} x_{m-2} \ldots x_1 x_0$
P  Product ($Y \times X$)  $p_{2m-1} p_{2m-2} \ldots p_2 p_1 p_0$
If multiplicand and multiplier are of different sizes, usually multiplier has the smaller size
Radix-2 Booth Multiplier: Basic Step
Radix-2 Booth Multiplier: Basic Step in Xilinx FPGAs
Radix-2 Booth Multiplier in Xilinx FPGAs
Radix-4 Booth Recoding
(1) -1 0 1 0 0 -1 1 0 -1 1 -1 1 0 0 -1 0
$z_{i/2} = -2 x_{i+1} + x_i + x_{i-1}$
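The radix-4 digit is the pair of radix-2 Booth digits 2 y_{i+1} + y_i, which expands to the formula above. The sketch below recodes directly from the bits and also pairs up the radix-2 digit string shown on the slide. The parenthesized leading (1) is left out, on the assumption that it is the extra digit for an unsigned reading. The 16-bit operand `x_example` is also an assumption: it was chosen because its radix-2 recoding reproduces the slide's digit string. Function names are hypothetical.

```python
def booth_radix4(x_bits, k):
    """Radix-4 Booth digits z_{i/2} = -2*x_{i+1} + x_i + x_{i-1}, least significant first (k even)."""
    digits = []
    for i in range(0, k, 2):
        x_im1 = (x_bits >> (i - 1)) & 1 if i > 0 else 0    # x_{-1} = 0
        x_i = (x_bits >> i) & 1
        x_ip1 = (x_bits >> (i + 1)) & 1
        digits.append(-2 * x_ip1 + x_i + x_im1)            # each digit is in -2..2
    return digits


def pair_radix2_digits(y_msb_first):
    """Combine radix-2 Booth digits pairwise into radix-4 digits (MSB first)."""
    return [2 * hi + lo for hi, lo in zip(y_msb_first[0::2], y_msb_first[1::2])]


# Radix-2 digit string from the slide, without the parenthesized leading (1)
slide_digits = [-1, 0, 1, 0, 0, -1, 1, 0, -1, 1, -1, 1, 0, 0, -1, 0]
x_example = 0b1001110110101110               # assumed operand, see lead-in

print(pair_radix2_digits(slide_digits))
print(booth_radix4(x_example, 16)[::-1])
```

Both calls yield the same eight digits, all in the range -2 to 2, so the only multiples of the multiplicand needed are 0, ±a, and ±2a.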
Example radix-4 multiplication with modified Booth's recoding of the 2's-complement multiplier
The multiple generation part of a radix-4 multiplier based on Booth's recoding
Radix-4 Booth Multiplier: Basic Step
Radix-4 Booth Multiplier: Left Shifter & Control
High-Radix Multipliers with Carry-Save Adder
Radix-4 multiplication with a carry-save adder used to combine the cumulative partial product, $x_i a$, and $2 x_{i+1} a$ into two numbers
Radix-4 multiplier with a carry-save adder and Booth’s recoding
Booth recoding and multiple selection logic for high-radix multiplication
Radix-4 multiplier with two carry-save adders
Radix-16 multiplier with carry-save adders
Bit-Serial Multipliers
Bit-Serial Multipliers: Advantages
• small area
• reduced pin count
• reduced wire length
• high clock rate
Systolic Array
• Systolic array: synchronous arrays of processing elements that are interconnected by only short, local wires thus allowing very high clock rates
Semisystolic Bit-Serial Multiplier (1)
Semisystolic Bit-Serial Multiplier (2)
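Partial-product terms, one row per clock cycle (a 0 in place of a multiplier bit means the serial x input has been exhausted):
$a_3 x_0$  $a_2 x_0$  $a_1 x_0$  $a_0 x_0$
$a_3 x_1$  $a_2 x_1$  $a_1 x_1$  $a_0 x_1$
$a_3 x_2$  $a_2 x_2$  $a_1 x_2$  $a_0 x_2$
$a_3 x_3$  $a_2 x_3$  $a_1 x_3$  $a_0 x_3$
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 \cdot 0$
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 \cdot 0$
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 \cdot 0$
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 \cdot 0$
Product bits (serial output): $p_0, p_1, p_2, p_3, p_4, p_5, p_6, p_7$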
Retiming
[Figure: retiming example with a delay d; signal times labeled k, k+d, k+n, k+n+d]
Retimed Semisystolic Bit-Serial Multiplier (1)
Retimed Semisystolic Bit-Serial Multiplier (2)
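Partial-product terms, one row per clock cycle (a 0 in place of a multiplier bit means no multiplier bit is present at that cell):
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 x_0$
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 x_0$  $a_0 x_1$
$a_3 \cdot 0$  $a_2 x_0$  $a_1 x_1$  $a_0 x_2$
$a_3 x_0$  $a_2 x_1$  $a_1 x_2$  $a_0 x_3$
$a_3 x_1$  $a_2 x_2$  $a_1 x_3$  $a_0 \cdot 0$
$a_3 x_2$  $a_2 x_3$  $a_1 \cdot 0$  $a_0 \cdot 0$
$a_3 x_3$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 \cdot 0$
$a_3 \cdot 0$  $a_2 \cdot 0$  $a_1 \cdot 0$  $a_0 \cdot 0$
Product bits (serial output): $p_0, p_1, p_2, p_3, p_4, p_5, p_6, p_7$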
Systolic Bit-Serial Multiplier
Modular Multipliers
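Modular Multiplication: Special Cases
[Figure: operands a and x; product p split into halves $p_H$ and $p_L$ of k bits each]
$a x = p = p_H 2^k + p_L$
$a x \bmod 2^k = p_L$
$a x \bmod (2^k - 1) = p_L + p_H + \text{carry}$
$a x \bmod (2^k + 1) = p_L - p_H + \text{borrow}$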
Modular Multiplication: Special Case (1)
$a x \bmod (2^k - 1) = (p_H 2^k + p_L) \bmod (2^k - 1)$
$= (p_H (2^k \bmod (2^k - 1)) + p_L) \bmod (2^k - 1)$
$= (p_H + p_L) \bmod (2^k - 1)$
$= p_H + p_L$  if $p_H + p_L < 2^k - 1$
$= p_H + p_L - (2^k - 1)$  if $p_H + p_L \ge 2^k - 1$
$= p_L + p_H + \text{carry}$
carry = carry from addition $p_L + p_H$
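The end-around-carry rule can be sketched as below, assuming unsigned k-bit operands. The function name is hypothetical. One detail the slide glosses over is handled explicitly: when p_H + p_L equals 2^k - 1 exactly there is no carry out, and the all-ones result has to be read as 0.

```python
def mulmod_2k_minus_1(a, x, k):
    """a*x mod (2^k - 1) from the two k-bit halves of the product, for 0 <= a, x < 2^k."""
    mask = (1 << k) - 1
    p = a * x
    p_h, p_l = p >> k, p & mask              # p = p_h * 2^k + p_l
    s = p_l + p_h
    carry = s >> k                           # carry out of the k-bit addition
    result = (s & mask) + carry              # end-around carry: add the carry back in
    return 0 if result == mask else result   # all ones is a second encoding of 0


print(mulmod_2k_minus_1(13, 11, 4))          # compare with (13 * 11) % 15
```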
Modular Multiplication: Special Case (2)
$a x \bmod (2^k + 1) = (p_H 2^k + p_L) \bmod (2^k + 1)$
$= (p_H ((2^k + 1) - 1) + p_L) \bmod (2^k + 1)$
$= (p_L - p_H) \bmod (2^k + 1)$
$= p_L - p_H$  if $p_L - p_H \ge 0$
$= p_L - p_H + (2^k + 1)$  if $p_L - p_H < 0$
$= p_L - p_H + \text{borrow}$
borrow = borrow from subtraction $p_L - p_H$
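A sketch of the borrow rule, assuming unsigned k-bit operands and a hypothetical function name. In k-bit arithmetic a subtraction that borrows already yields p_L - p_H + 2^k, so adding the borrow bit supplies the remaining +1 of the 2^k + 1 correction.

```python
def mulmod_2k_plus_1(a, x, k):
    """a*x mod (2^k + 1) from the two k-bit halves of the product, for 0 <= a, x < 2^k."""
    mask = (1 << k) - 1
    p = a * x
    p_h, p_l = p >> k, p & mask              # p = p_h * 2^k + p_l
    diff = p_l - p_h
    borrow = 1 if diff < 0 else 0            # borrow out of the k-bit subtraction
    wrapped = diff & mask                    # k-bit result: diff + 2^k when a borrow occurs
    return wrapped + borrow                  # can equal 2^k, which needs k+1 bits


print(mulmod_2k_plus_1(13, 11, 4))           # compare with (13 * 11) % 17
```

The result ranges over 0 to 2^k, so a hardware version would need one more output bit than the modulo 2^k - 1 case.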
Modulo ($2^b - 1$) Carry-Save Adder
4 x 4 Modulo 15 Multiplier
4 x 4 Modulo 13 Multiplier