low-power, high-speed multiplier architectures shawn nicholl elec-5705y march 7, 2005



Low-power, High-speed Multiplier Architectures

Shawn Nicholl, ELEC-5705y

March 7, 2005


2005/03/07 Low-Power, High-Speed Multiplier Architectures 2

Agenda/Overview

Design Abstraction
Numbering Systems
Addition and Subtraction
Adder Architectures
Multiplication
Traditional Multiplier Architectures
Advanced Multiplier Architectures


Levels of Abstraction in Digital ICs

Higher levels of abstraction have a greater effect on overall system performance.

Levels (from most to least abstract): Systems, Modules, Logic Gates, Circuits, Devices

Low-power, high-speed techniques can be used at many levels of abstraction.

[Figure: the levels above with an arrow labelled "Increasing Abstraction"; "Multiplier Architectures" is marked on the diagram]


Numbering Systems – A Quick Review

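Some common numbering systems:

Decimal: $D = \sum_{i=0}^{n-1} d_i \cdot 10^i$, range: 0 to $10^n - 1$

Unsigned binary: $B = \sum_{i=0}^{n-1} b_i \cdot 2^i$, range: 0 to $2^n - 1$

Two's-complement: range: $-2^{n-1}$ to $+(2^{n-1} - 1)$

Sign | Decimal | Sign | Unsigned Binary | Sign | Two's Complement
+ | 10 | + | 0000 1010 | N/A | 0000 1010
- | 45 | - | 0010 1101 | N/A | 1101 0011

(In two's complement the sign is carried by the most significant bit, so there is no separate sign.)

E.g. $45_d = 0 + 0 + 2^5 + 0 + 2^3 + 2^2 + 0 + 2^0$ = 0010 1101b

To get -45d in two's complement, invert every bit of 45d (1101 0010b) and add 1: 1101 0011b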
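The encoding rules above can be checked with a short Python sketch (Python is our choice here; the slides contain no code). The helper names `to_twos`, `from_twos` and `negate` are hypothetical, and bit strings are written without the nibble spaces used on the slide.

```python
def to_twos(value: int, n: int) -> str:
    """Encode value as an n-bit two's-complement bit string."""
    if not -(1 << (n - 1)) <= value <= (1 << (n - 1)) - 1:
        raise ValueError("value out of range for n bits")
    return format(value & ((1 << n) - 1), f"0{n}b")

def from_twos(bits: str) -> int:
    """Decode an n-bit two's-complement bit string."""
    n = len(bits)
    raw = int(bits, 2)
    # The MSB has weight -2^(n-1) instead of +2^(n-1).
    return raw - (1 << n) if bits[0] == "1" else raw

def negate(bits: str) -> str:
    """Invert every bit, then add 1 (the result wraps modulo 2^n)."""
    n = len(bits)
    inverted = "".join("1" if b == "0" else "0" for b in bits)
    return format((int(inverted, 2) + 1) % (1 << n), f"0{n}b")

print(to_twos(45, 8))         # 00101101
print(negate("00101101"))     # 11010011
print(from_twos("11010011"))  # -45
```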


Adding and Subtracting

The two's-complement algorithm is consistent: addition and subtraction behave the same, and negative numbers are treated the same as positive numbers.

Example: Add -45d to 10d

Signed decimal method:
Step1) Initialize: 10d + (-45d)
Step2) Compare so that the augend holds the number with the larger magnitude: -45d + 10d
Step3) Treat as a subtraction: 45d - 10d
Step4) Do the subtraction (borrows may be required): 45d - 10d = 35d
Step5) Negate the result (knowing that the augend was negative): -35d

Two's Complement Method:
Step1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
Step2) Add (no special rules): 0000 1010b + 1101 0011b = 1101 1101b

Converting 2's comp back to decimal: 1101 1101b = -35d
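A minimal Python sketch of the two's complement method, assuming an 8-bit adder whose carry out is simply discarded (the names `add8` and `decode8` are illustrative, not from the slides):

```python
MASK8 = 0xFF  # keep results to 8 bits, like an 8-bit adder that drops COUT

def add8(a: int, b: int) -> int:
    """Add two 8-bit two's-complement patterns; the carry out is discarded."""
    return (a + b) & MASK8

def decode8(pattern: int) -> int:
    """Interpret an 8-bit pattern as a signed two's-complement value."""
    return pattern - 0x100 if pattern & 0x80 else pattern

ten = 0b0000_1010
minus_45 = 0b1101_0011
total = add8(ten, minus_45)
print(format(total, "08b"), decode8(total))  # 11011101 -35
```

Nothing in `add8` depends on the sign of either operand, which is the point of the slide: the same adder serves positive and negative numbers.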


Adding and Subtracting (Example 2)

Example 2: Subtract -45d from 10d

Signed Decimal Method:
Step1) Initialize: 10d - (-45d)
Step2) The subtrahend is negative, so negate it and do an addition: 10d + 45d = 55d

Two's Complement Method:
Step1) Initialize: 10d = 0000 1010b, -45d = 1101 0011b
Step2) Invert the subtrahend and set CIN = 1: 0000 1010b + 0010 1100b + 1b = 0011 0111b

Converting 2's comp back to decimal: 0011 0111b = 55d

Subtraction logic can be shared with addition logic!
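The shared add/subtract datapath can be sketched in Python as below. It assumes an 8-bit adder with a CIN input; `add_sub8` is a hypothetical name, and the `subtract` flag stands in for the control signal that inverts B and sets CIN = 1.

```python
MASK8 = 0xFF

def add_sub8(a: int, b: int, subtract: bool) -> int:
    """One 8-bit adder shared by add and subtract.

    For subtraction the B operand is inverted and CIN is set to 1,
    which computes a + (~b + 1) = a - b modulo 2^8.
    """
    b_in = (~b & MASK8) if subtract else b
    cin = 1 if subtract else 0
    return (a + b_in + cin) & MASK8

result = add_sub8(0b0000_1010, 0b1101_0011, subtract=True)
print(format(result, "08b"), result)  # 00110111 55
```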


Adder Building Blocks

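Half Adder
$S_n = A_n \oplus B_n$
$CO_n = A_n \cdot B_n$

[Figure: half adder with inputs An, Bn and outputs Sn, COn]

Full Adder
$S_n = A_n \oplus B_n \oplus CIN_n$
$COUT_n = A_n \cdot B_n + A_n \cdot CIN_n + B_n \cdot CIN_n$

[Figure: full adder with inputs An, Bn, CINn and outputs Sn, COUTn]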
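The half-adder and full-adder equations can be checked exhaustively. This Python sketch assumes single-bit integer inputs (0 or 1); the function names are illustrative.

```python
from itertools import product

def half_adder(a: int, b: int) -> tuple[int, int]:
    """Return (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b

def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """Return (sum, carry_out); carry_out is the majority of the inputs."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

# Exhaustive check against ordinary integer addition.
for a, b in product((0, 1), repeat=2):
    s, co = half_adder(a, b)
    assert 2 * co + s == a + b
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert 2 * cout + s == a + b + cin
print("truth tables OK")
```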


Adder Architectures (CRA)

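Carry Ripple Adder (CRA)
Gate count $\propto N$, area $\propto N$, delay $\propto N$, power $\propto N$
Layout friendly (low fan-in/fan-out; regular structure)

[Figure: a chain of full adders (FA) for bits 0 to N; each FA adds An and Bn to the carry from the stage below and produces Sn; CIN0 enters stage 0 and COUTN leaves stage N]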
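A behavioural Python sketch of the carry ripple adder, assuming unsigned n-bit operands; `ripple_carry_add` is a hypothetical name. The loop mirrors the chain in the figure: each stage has to wait for the carry from the stage below.

```python
def full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a: int, b: int, n: int, cin: int = 0) -> tuple[int, int]:
    """Add two n-bit unsigned numbers with a chain of n full adders.

    The carry out of stage i feeds stage i+1, so the worst-case delay
    grows linearly with n.
    Returns (n-bit sum, carry out of the last stage).
    """
    carry = cin
    total = 0
    for i in range(n):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry

print(ripple_carry_add(0b1111, 0b0001, 4))  # (0, 1)
```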


Adder Architectures (CLA)

Carry Lookahead Adder (CLA)

Generate: $G_n = A_n \cdot B_n$
Propagate: $P_n = A_n + B_n$

Recursive relationship (stage n-1 either generates a carry or propagates the carry it receives):
$CIN_n = G_{n-1} + P_{n-1} \cdot CIN_{n-1}$

Expanding the recursion:
$CIN_n = G_{n-1} + P_{n-1} G_{n-2} + \dots + P_{n-1} P_{n-2} \cdots P_1 G_0 + P_{n-1} P_{n-2} \cdots P_0 \, CIN_0$

CLA: delay $\propto \log_2 N$ (if built right)
Gate count and power are greater than for the CRA
Not layout friendly (high fan-in; difficult to route)

[Figure: lookahead logic forming CINN from the generate and propagate terms of every stage, above the adder stages taking A0 B0 ... AN BN and producing S0 ... SN. Source: Patterson and Hennessy, Figure A.14]
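The expanded carry equation can be evaluated directly, as in this Python sketch (function names are hypothetical). Every CIN_k is computed from the G and P terms and CIN_0 alone, mirroring the wide AND-OR logic that gives the CLA its speed and its high fan-in; a software loop of course does not show the parallel timing.

```python
def cla_carries(a: int, b: int, n: int, cin0: int = 0) -> list[int]:
    """Compute CIN_0..CIN_n from generate/propagate terms.

    Each carry uses the flattened sum-of-products form
    CIN_k = G_{k-1} + P_{k-1}G_{k-2} + ... + P_{k-1}...P_0 CIN_0,
    so no carry depends on a previously computed carry.
    """
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(n)]
    p = [((a >> i) & 1) | ((b >> i) & 1) for i in range(n)]
    carries = [cin0]
    for k in range(1, n + 1):
        c = 0
        # Term j: stage j generates, stages j+1..k-1 all propagate.
        for j in range(k - 1, -1, -1):
            term = g[j]
            for m in range(j + 1, k):
                term &= p[m]
            c |= term
        # Final term: CIN_0 propagates through stages 0..k-1.
        term = cin0
        for m in range(k):
            term &= p[m]
        carries.append(c | term)
    return carries

def cla_add(a: int, b: int, n: int, cin0: int = 0) -> tuple[int, int]:
    """Sum bit i is A_i XOR B_i XOR CIN_i; returns (sum, CIN_n)."""
    c = cla_carries(a, b, n, cin0)
    total = 0
    for i in range(n):
        total |= (((a >> i) & 1) ^ ((b >> i) & 1) ^ c[i]) << i
    return total, c[n]

print(cla_add(0b0110, 0b0111, 4))  # (13, 0)
```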


Adder Architectures (CSA)

Carry Save Adder
The adders work independently (no carry ripples from one bit position to the next), so they are very fast
A pipelined architecture results in flops and control logic, which increase area and latency

[Figure: a row of full adders (FA), where FA i takes CINi, Ai and Bi and produces Si and COUTi, followed by three further rows of four FAs each]
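At word level, a row of independent full adders turns three operands into a sum word and a carry word. A minimal Python sketch, assuming non-negative integers (`carry_save_add` is an illustrative name):

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Reduce three operands to a sum word and a carry word (3:2 compression).

    Bit i of each word comes from its own full adder, so no carry
    propagates between bit positions inside this step.
    """
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1  # carries weigh 2^(i+1)
    return sum_word, carry_word

s, c = carry_save_add(23, 45, 67)
print(s + c)  # 135, produced by the final carry-propagate add
```

Only the final addition of the two words needs carry propagation.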


Unsigned Multiplication

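Shift-and-Add Algorithm

Example: Multiply 118d (multiplicand) by 99d (multiplier)

Decimal method:
Step1) Initialize: 118d x 99d
Step2) Find partial products: 118d x 9 = 1062d (units digit of 99d) and 118d x 9 = 1062d (tens digit)
Step3) Sum up the shifted partial products: 1062d + 10620d = 11682d

Two's Complement Method (both operands are positive, so this is also plain unsigned binary):
Step1) Initialize: 118d = 0111 0110b, 99d = 0110 0011b
Step2) Find partial products: for multiplier bits 0 to 7 (1, 1, 0, 0, 0, 1, 1, 0), the partial product is 0111 0110b where the bit is 1 and 0000 0000b where it is 0
Step3) Sum up the partial products, each shifted left by its bit position: 0010 1101 1010 0010b

Convert 2's-Comp back to decimal: 0010 1101 1010 0010b = 11682d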
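The shift-and-add steps above, as a short Python sketch for unsigned operands (`shift_and_add` is a hypothetical name):

```python
def shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    """Unsigned n-bit x n-bit multiply by summing shifted partial products."""
    product = 0
    for i in range(n):
        bit = (multiplier >> i) & 1
        partial = multiplicand if bit else 0  # AND of multiplicand with one bit
        product += partial << i               # weight of bit i is 2^i
    return product

p = shift_and_add(0b0111_0110, 0b0110_0011, 8)
print(format(p, "016b"), p)  # 0010110110100010 11682
```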


Shift-and-Add Multiplier

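[Figure: multiplicand register B and multiplier register A (N bits each), an N-bit adder that adds the partial product An x B and outputs an N-bit sum S plus COUT, and a 2N-bit product register P; control signals Load A, Load B, Add and Shift]

Takes N cycles to complete:
$T_{Lat} = (T_{N\text{-bit ADD}} + T_{shift}) \times N$

Requires minimal logic (most logic is in the adder)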
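A cycle-by-cycle Python model of the sequential datapath. The register arrangement is an assumption (a common textbook layout in which the multiplier starts in the low half of P and shifts out as the product shifts in); the slide's figure may differ in detail. It shows the N-cycle latency.

```python
def shift_add_registers(a: int, b: int, n: int) -> tuple[int, int]:
    """Cycle-by-cycle model of a sequential shift-and-add multiplier.

    Assumed layout: the 2N-bit product register P starts with the
    multiplier A in its low half.  Each cycle, if P's LSB is 1 the N-bit
    adder adds multiplicand B to P's high half; then the (N+1)-bit adder
    result and the low half shift right by one.
    Returns (product, number of cycles used).
    """
    mask_n = (1 << n) - 1
    p = a & mask_n
    cycles = 0
    for _ in range(n):
        high = p >> n
        if p & 1:
            high += b                 # may produce COUT, held as bit N of 'high'
        low = p & mask_n
        p = ((high << n) | low) >> 1  # shift right; COUT enters bit 2N-1
        cycles += 1
    return p, cycles

print(shift_add_registers(99, 118, 8))  # (11682, 8)
```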


Basic Signed Multiplication

[Figure: inputs A and B (N bits each) pass through "Convert to Unsigned" blocks into the shift-and-add multiplier; "Determine Sign of Result" and "Convert to Signed" blocks produce the 2N-bit product P]

Basic Idea
1. Convert to unsigned
2. Use the shift-and-add multiplier
3. Convert to signed

Extra hardware!
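A Python sketch of the convert, multiply, convert-back flow, assuming Python integers stand in for the sign-conversion hardware (`signed_multiply` is a hypothetical name):

```python
def shift_and_add(multiplicand: int, multiplier: int, n: int) -> int:
    product = 0
    for i in range(n):
        if (multiplier >> i) & 1:
            product += multiplicand << i
    return product

def signed_multiply(a: int, b: int, n: int) -> int:
    """Signed multiply wrapped around an unsigned shift-and-add core.

    The conversions before and after the core are the extra hardware
    this approach needs.
    """
    negative = (a < 0) != (b < 0)                 # determine sign of result
    magnitude = shift_and_add(abs(a), abs(b), n)  # unsigned core
    return -magnitude if negative else magnitude  # convert back to signed

print(signed_multiply(-118, -99, 8))  # 11682
```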


Signed Multiplication

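Booth Recoding
Reduces the number of partial products by re-coding the multiplier operand
Works for signed numbers

Example: Multiply -118d by -99d

Recall, 99d = 0110 0011b, so -99d = 1001 1100b + 1b = 1001 1101b

Radix-2 Booth recoding: -99d = $\bar{1}\,0\,1\,0\;0\,\bar{1}\,1\,\bar{1}$ (digits from bit 7 down to bit 0; $\bar{1}$ means -1)

Check: $-2^7 + 2^5 - 2^2 + 2^1 - 2^0 = -128 + 32 - 4 + 2 - 1 = -99$

$A_n$ (low-order bit) | $A_{n-1}$ (last bit shifted out) | Partial Product
0 | 0 | 0
0 | 1 | +B
1 | 0 | -B
1 | 1 | 0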


Radix-2 Booth Multiplication

Example: Multiply -118d by -99d

Radix-2 Booth:
Step1) Initialize: B = -118d = 1000 1010b, -B = 118d = 0111 0110b, A = -99d = 1001 1101b, recoded as $\bar{1}\,0\,1\,0\;0\,\bar{1}\,1\,\bar{1}$
Step2) Find partial products: from bit 0 to bit 7 they are -B, B, -B, 0, 0, B, 0, -B
Step3) Sum up the partial products, each shifted left by its bit position and sign-extended to 16 bits: 0010 1101 1010 0010b

Convert 2's-Comp back to decimal: 0010 1101 1010 0010b = 11682d
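A Python sketch of radix-2 Booth recoding and the partial-product sum, assuming an n-bit two's-complement multiplier and a 2n-bit result register (function names are hypothetical). Python integers behave as if infinitely sign-extended, so masking the running sum to 2n bits has the same effect as sign-extending each partial product.

```python
def booth_radix2_digits(a: int, n: int) -> list[int]:
    """Recode an n-bit two's-complement multiplier into digits in {-1, 0, +1}.

    Digit i looks at bit i (low-order bit) and bit i-1 (last bit shifted
    out, 0 for i = 0): 00 -> 0, 01 -> +1, 10 -> -1, 11 -> 0.
    Digits are returned from bit 0 upward.
    """
    bits = a & ((1 << n) - 1)
    digits, prev = [], 0
    for i in range(n):
        cur = (bits >> i) & 1
        digits.append(prev - cur)  # (cur, prev): 01 -> +1, 10 -> -1
        prev = cur
    return digits

def booth_radix2_multiply(b: int, a: int, n: int) -> int:
    """Sum the shifted partial products d_i * B * 2^i in a 2n-bit register."""
    mask = (1 << (2 * n)) - 1
    acc = 0
    for i, d in enumerate(booth_radix2_digits(a, n)):
        # Masking to 2n bits stands in for sign extension.
        acc = (acc + ((d * b) << i)) & mask
    return acc - (1 << (2 * n)) if acc >> (2 * n - 1) else acc

print(booth_radix2_digits(-99, 8))          # [-1, 1, -1, 0, 0, 1, 0, -1]
print(booth_radix2_multiply(-118, -99, 8))  # 11682
```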


Array Multiplier

Array Multiplier
Combinatorial, so it is very fast: delay $\propto N$
Can be pipelined
Very regular structure

[Figure: the eight Booth partial products of the -118d x -99d example (-B, B, -B, 0, 0, B, 0, -B) are summed by rows of carry-save adders (CSA), each built from full adders (FA), and a final carry-propagate adder (CPA) produces the product]
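A Python sketch of the linear array: one carry-save row per partial product, then a single carry-propagate add. It reuses the Booth partial products from the -118d x -99d example; the 16-bit width and the function names are assumptions.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def array_sum(partials: list[int], width: int) -> int:
    """Sum partial products with a linear chain of CSA rows plus one CPA.

    Each CSA row adds one more partial product into the running
    (sum, carry) pair, so the number of rows grows with the number of
    partial products (delay proportional to N).  Values are kept as
    width-bit two's-complement words.
    """
    mask = (1 << width) - 1
    s, c = partials[0] & mask, 0
    for pp in partials[1:]:
        s, c = carry_save_add(s, c, pp & mask)  # one CSA row
        s, c = s & mask, c & mask
    total = (s + c) & mask                      # final carry-propagate adder
    return total - (1 << width) if total >> (width - 1) else total

# Booth partial products of -118 x -99 from the example: digit * B * 2^i.
b = -118
digits = [-1, 1, -1, 0, 0, 1, 0, -1]
partials = [(d * b) << i for i, d in enumerate(digits)]
print(array_sum(partials, 16))  # 11682
```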


Array Multiplier Structure

Source: J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, 1999


Radix-4 Booth Multiplication

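Similar to radix-2, but looks at two low-order bits at a time (instead of 1), together with the last bit shifted out

$A_{2n+1}$ $A_{2n}$ (low-order bits) | $A_{2n-1}$ (last bit shifted out) | Partial Product
0 0 | 0 | 0
0 0 | 1 | +B
0 1 | 0 | +B
0 1 | 1 | +2B
1 0 | 0 | -2B
1 0 | 1 | -B
1 1 | 0 | -B
1 1 | 1 | 0

Recall, 99d = 0110 0011b, so -99d = 1001 1100b + 1b = 1001 1101b

Radix-4 Booth recoding: -99d = $\bar{2}\,2\,\bar{1}\,1$ (digits from the highest bit pair down to the lowest; $\bar{2}$ means -2, $\bar{1}$ means -1)

Check: $-2 \cdot 4^3 + 2 \cdot 4^2 - 1 \cdot 4^1 + 1 \cdot 4^0 = -128 + 32 - 4 + 1 = -99$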


Radix-4 Booth Multiplication

Example: Multiply -118d by -99d

Radix-4 Booth:
Step1) Initialize: B = -118d = 1000 1010b, -B = 118d = 0111 0110b, 2B = -236d = 1 0001 0100b, -2B = 236d = 0 1110 1100b, A = -99d = 1001 1101b, recoded as $\bar{2}\,2\,\bar{1}\,1$
Step2) Find partial products: from the lowest bit pair to the highest they are B, -B, 2B, -2B
Step3) Sum up the partial products, each shifted left by 0, 2, 4 and 6 bits respectively and sign-extended to 16 bits: 0010 1101 1010 0010b

Convert 2's-Comp back to decimal: 0010 1101 1010 0010b = 11682d

Reduces the number of partial products by half (4 instead of 8)!
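The radix-4 table can be coded directly. This Python sketch assumes an even operand width n and a 2n-bit result register; the names are hypothetical.

```python
# Table from the slide: (A_{2j+1}, A_{2j}, A_{2j-1}) -> multiple of B.
RADIX4_TABLE = {
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_radix4_digits(a: int, n: int) -> list[int]:
    """Recode an n-bit (n even) two's-complement multiplier into n/2 digits
    in {-2, -1, 0, +1, +2}, lowest bit pair first."""
    bits = a & ((1 << n) - 1)
    digits, prev = [], 0  # prev = last bit shifted out (0 initially)
    for j in range(n // 2):
        lo = (bits >> (2 * j)) & 1
        hi = (bits >> (2 * j + 1)) & 1
        digits.append(RADIX4_TABLE[(hi, lo, prev)])
        prev = hi
    return digits

def booth_radix4_multiply(b: int, a: int, n: int) -> int:
    """Sum the n/2 partial products d_j * B * 4^j in a 2n-bit register."""
    width = 2 * n
    mask = (1 << width) - 1
    acc = 0
    for j, d in enumerate(booth_radix4_digits(a, n)):
        acc = (acc + ((d * b) << (2 * j))) & mask
    return acc - (1 << width) if acc >> (width - 1) else acc

print(booth_radix4_digits(-99, 8))          # [1, -1, 2, -2]
print(booth_radix4_multiply(-118, -99, 8))  # 11682
```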


Tree Multiplier

Wallace Tree
Reduces the number of full-adder levels that the partial products pass through
Uses 3:2 compressors (aka full adders)
Delay $\propto \log_{3/2} N$
Irregular structure is difficult to lay out

Source: J. Kuo, et al., Low-Voltage CMOS VLSI Circuits, 1999

[Figure: original structure vs. tree structure for summing the partial-product bits (B7A0 ... B0A0 through B7A8 ... B0A8)]
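A word-level Python sketch of Wallace-style reduction, assuming non-negative operands. It passes whole operands three at a time through 3:2 compressors instead of modelling individual bit columns, which is enough to show the number of levels growing roughly as log base 3/2 of the operand count. Function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def wallace_sum(operands: list[int]) -> tuple[int, int]:
    """Add non-negative operands with a Wallace-style tree of 3:2 compressors.

    At each level, operands are taken three at a time and replaced by a
    (sum, carry) pair; leftovers pass straight through.  The operand count
    shrinks by about 3/2 per level until two remain for the final
    carry-propagate add.  Returns (total, number of CSA levels).
    """
    ops, levels = list(operands), 0
    while len(ops) > 2:
        nxt = []
        for k in range(0, len(ops) - len(ops) % 3, 3):
            nxt.extend(carry_save_add(*ops[k:k + 3]))
        nxt.extend(ops[len(ops) - len(ops) % 3:])  # 0-2 leftover operands
        ops, levels = nxt, levels + 1
    return sum(ops), levels

# Eight shifted partial products of 118 x 99 (unsigned example).
partials = [(118 << i) if (99 >> i) & 1 else 0 for i in range(8)]
print(wallace_sum(partials))  # (11682, 4)
```

For comparison, a linear array that adds one partial product per CSA row needs six rows to reduce eight operands to two.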


Twin Pipe Serial-Parallel Multiplier

Features
Even data bits on the rising clock edge
Odd data bits on the falling clock edge
One operand fed in parallel
One operand fed serially
Low area
High latency
Low power

Source: S. Shah, et al., “Comparison of 32-bit Multipliers for Various Performance Measures”, 2000.


Cluster Multiplication

Divide the circuit into clusters of nibble-wide (4-bit) multiplications
If all bits in a nibble are zeroes, use clock gating to gate the multiplication for that nibble

[Figure: array of nibble clusters from A(N-1)xB0 ... A0xB0 down to A(N-1)xB(N-1) ... A0xB(N-1), fed by 4-bit operand slices A0 ... A(N-1) and B0 ... B(N-1)]

Source: A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, 2001.

Features
Low power (claims 13% savings)
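A behavioural Python sketch of nibble clustering, assuming unsigned operands split into 4-bit nibbles and assuming a cluster is gated whenever either of its input nibbles is zero (the skip stands in for clock gating; it is not Fayed and Bayoumi's circuit). The function names are hypothetical.

```python
def nibbles(x: int, count: int) -> list[int]:
    """Split an unsigned value into 'count' 4-bit nibbles, lowest first."""
    return [(x >> (4 * i)) & 0xF for i in range(count)]

def cluster_multiply(a: int, b: int, count: int) -> tuple[int, int]:
    """Unsigned multiply built from nibble x nibble clusters.

    A cluster whose A or B nibble is all zeroes contributes nothing, so
    it is skipped here; this stands in for clock-gating that cluster.
    Returns (product, number of gated clusters).
    """
    product, gated = 0, 0
    for i, an in enumerate(nibbles(a, count)):
        for j, bn in enumerate(nibbles(b, count)):
            if an == 0 or bn == 0:
                gated += 1  # cluster idle: no switching activity
                continue
            product += (an * bn) << (4 * (i + j))
    return product, gated

print(cluster_multiply(0x0076, 0x0063, 4))  # (11682, 12)
```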


Multiplexer-Based Array Multiplier

Characteristics
Fast (because it is array-based)
Unlike Booth, does not require encoding logic
Processes 1 bit of the multiplier and 1 bit of the multiplicand at a time, so it is symmetric
Has a zigzag shape, so it is not layout-friendly

Source: K. Pekmestzi, “Multiplexer-Based Array Multipliers”, 1999.
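The symmetric, one-bit-of-each-operand-per-step processing can be illustrated with the identity $A_{i+1}B_{i+1} = A_iB_i + 2^i(a_iB_i + b_iA_i) + 2^{2i}a_ib_i$, where $A_i$ and $B_i$ hold the low $i$ bits of each operand and $a_i$, $b_i$ are the next bits. At each step a 4:1 multiplexer driven by $(a_i, b_i)$ can select 0, $A_i$, $B_i$ or $A_i + B_i$. We assume this is the kind of selection the multiplexers perform; the Python sketch below is behavioural and hypothetical, not Pekmestzi's cell design.

```python
def mux_multiply(a: int, b: int, n: int) -> int:
    """Unsigned multiply that consumes one bit of each operand per step."""
    prod = 0
    a_low = b_low = 0  # A_i and B_i: the operand bits seen so far
    for i in range(n):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        # 4:1 multiplexer selecting by (a_i, b_i); symmetric in A and B.
        selected = {(0, 0): 0, (0, 1): a_low, (1, 0): b_low,
                    (1, 1): a_low + b_low}[(ai, bi)]
        prod += (selected << i) + ((ai & bi) << (2 * i))
        a_low |= ai << i
        b_low |= bi << i
    return prod

print(mux_multiply(118, 99, 8))  # 11682
```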


Area-Efficient Multiplexer-Based Multiplier

Characteristics
Each row is widened to N+1 cells (instead of N)
Depth is cut in half (increases “squareness”)

Source: Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, 2001.


Low Latency Booth-Encoding-based Pipeline Multiplier

Features
Delay $\propto N/4$
Needs an (N+N/2)-bit addition at the end
Uses CLAs instead of CSAs, because the longest stage (i.e. the adder at the end) determines the fastest operating frequency

Source: X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, 2001.


Two’s Complement Gray-Encoded Array Multiplier

Characteristics
Uses Gray code to reduce the switching activity of the multiplier
Claims that traditional Booth uses 45% more power
Greater area than traditional Booth

Source: E. Costa, et al., “A New Architecture for 2’s Complement Gray Encoded Array Multiplier”, 2002.
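A small Python sketch of why Gray code can lower switching activity: when a value steps through consecutive numbers, binary-reflected Gray code flips exactly one bit per step. The counter input is an assumption chosen for illustration; this is not the architecture of Costa et al.

```python
def to_gray(x: int) -> int:
    """Binary-reflected Gray code: adjacent values differ in one bit."""
    return x ^ (x >> 1)

def toggles(sequence: list[int]) -> int:
    """Total bit flips when a bus steps through 'sequence' in order."""
    return sum(bin(p ^ q).count("1") for p, q in zip(sequence, sequence[1:]))

counting = list(range(16))  # a 4-bit counter stepping 0, 1, ..., 15
print(toggles(counting))                         # 26 in binary
print(toggles([to_gray(v) for v in counting]))   # 15 in Gray code
```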


Project Plan

Start | End | Task
- | 03/05 | Research multiplier circuits
03/06 | 03/12 | Code multipliers in Verilog HDL
03/13 | 03/19 | Synthesize all multiplier circuits
03/20 | 03/26 | Analyze results (delay/power/area)
03/27 | 04/02 | Prepare report
04/03 | 04/09 | Prepare for final exam
04/10 | 04/16 | Complete report and submit


References

S. Shah, A.J. Al-Khalili, D. Al-Khalili, “Comparison of 32-bit Multipliers for Various Performance Measures”, Proc. 2000 Int’l Conf. Microelectronics, pp. 75-80, 2000.

D. Patterson, J. Hennessy, Computer Architecture – A Quantitative Approach, 2nd ed., San Francisco, CA: Morgan Kaufmann Publishers, Inc., 1996.

X. Wu, H. Chen, S. Wei, “Design of a Low Latency High Speed Pipelining Multiplier”, Proc. 2001 Int’l Conf. on ASIC, pp. 551-554, 2001.

J. Wakerly, Digital Design – Principles and Practices, 2nd ed., Englewood Cliffs, NJ: Prentice Hall, 1994.

J. Kuo and J. Lou, Low-Voltage CMOS VLSI Circuits, New York, NY: John Wiley & Sons, Inc., 1999.

K. Pekmestzi, “Multiplexer-Based Array Multipliers”, IEEE Trans. on Computers, vol. 48, pp. 15-23, 1999.

A. Fayed, M. Bayoumi, “A Novel Architecture for Low-Power Design of Parallel Multipliers”, Proc. 2001 IEEE Computer Society Workshop on VLSI, pp. 149-154, 2001.

Y. Wang, Y. Jiang, E. Sha, “On Area-Efficient Low Power Array Multipliers”, Proc. 2001 IEEE Int’l Conf. on Electronics, Circuits and Systems, vol. 3, pp. 1429-1432, 2001.