
Sp09 CMPEN 411 L19 S.1

CMPEN 411 VLSI Digital Circuits

Spring 2009

Lecture 19: Adder Design

[Adapted from Rabaey’s Digital Integrated Circuits, Second Edition, ©2003 J. Rabaey, A. Chandrakasan, B. Nikolic]


Major Components of a Computer

Processor

Control

Datapath

Memory

Devices

Input

Output

Modern processor architecture styles (CSE 431)
- Pipelined, single issue (e.g., ARM)
- Pipelined, hardware-controlled multiple issue: superscalar
- Pipelined, software-controlled multiple issue: VLIW
- Pipelined, multiple issue from multiple process threads: multithreaded


Basic Building Blocks

Datapath
- Execution units: adder, multiplier, divider, shifter, etc.
- Register file and pipeline registers
- Multiplexers, decoders

Control
- Finite state machines (PLA, ROM, random logic)

Interconnect
- Switches, arbiters, buses

Memory
- Caches, TLBs, DRAM, buffers


MIPS 5-Stage Pipelined (Single Issue) Datapath

[Figure: MIPS 5-stage pipelined datapath. Stages: Fetch, Decode, Execute, Memory, WriteBack, separated by pipeline stage isolation registers (IF/Dec, Dec/Exec, Exec/Mem, Mem/WB). Blocks: PC, I$ (Read Address), Add (+4), Register File (Read Addr 1, Read Addr 2, Write Addr, Write Data, Read Data 1, Read Data 2), Sign Extend (16 to 32), Shift left 2, Add, ALU, D$ (Address, Write Data, Read Data), and 2-to-1 multiplexers. Control signals: clk, Icache precharge, Dcache precharge, RegWrite.]


Datapath Bit-Sliced Organization

Tile identical bit-slice elements

[Figure: bit-sliced datapath for bits 0 to 3. Data flows along each slice (from I$, to/from D$) through pipeline registers, register file, multiplexers, adder, and shifter; control flows across the slices.]


The Binary Adder

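$S = A \oplus B \oplus C_i = A\bar{B}\bar{C_i} + \bar{A}B\bar{C_i} + \bar{A}\bar{B}C_i + ABC_i$

$C_o = AB + BC_i + AC_i$

[Figure: full adder block with inputs A, B, Cin and outputs Sum, Cout.]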


The 1-bit Binary Adder

1-bit Full Adder (FA)

S = A ⊕ B ⊕ Cin

Cout = A&B | A&Cin | B&Cin (majority function)

A VERY common operation, often in the critical path

A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate

G = A & B    P = A ⊕ B    K = !A & !B

S = P ⊕ Cin

Cout = G | P&Cin
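As a quick sanity check of the full adder equations and the generate/propagate/kill classification, here is a minimal Python sketch. The function names (`full_adder`, `full_adder_gp`, `carry_status`) are illustrative and not from the slides.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (sum, cout) using S = A^B^Cin and the majority function."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def full_adder_gp(a, b, cin):
    """Same adder written with G and P: S = P^Cin, Cout = G | P&Cin."""
    g, p = a & b, a ^ b
    return p ^ cin, g | (p & cin)

def carry_status(a, b):
    """Classify a bit position as generate, propagate, or kill."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"

# Both formulations agree with integer addition on all 8 input rows.
for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    assert (s, cout) == full_adder_gp(a, b, cin)
    assert 2 * cout + s == a + b + cin
```

The carry status depends only on A and B, which is why G and P can be formed before the incoming carry arrives.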


Complementary Static CMOS Full Adder

[Figure: transistor-level schematic of the complementary static CMOS full adder, 28 transistors, with inputs A, B, Ci, internal node X, and outputs Co and S.]

A direct implementation in CMOS needs 28 transistors (p. 565)

Co = AB + BCi + ACi,  S = ABCi + !Co(A + B + Ci)
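The sum expression above reuses the carry output instead of computing a three-input XOR directly. A short Python sketch (function names are illustrative) can confirm that it matches A ⊕ B ⊕ Ci for every input combination:

```python
from itertools import product

def carry(a, b, ci):
    """Co = AB + BCi + ACi."""
    return (a & b) | (b & ci) | (a & ci)

def sum_from_carry(a, b, ci):
    """S = ABCi + !Co(A + B + Ci), reusing the carry output."""
    not_co = 1 - carry(a, b, ci)
    return (a & b & ci) | (not_co & (a | b | ci))

# Collect any input rows where the carry-based sum differs from the XOR form.
mismatches = [
    (a, b, ci)
    for a, b, ci in product((0, 1), repeat=3)
    if sum_from_carry(a, b, ci) != a ^ b ^ ci
]
```

An empty `mismatches` list means the two forms are logically equivalent.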


The 1-bit Binary Adder


How can we use it to build a 64-bit adder?

How can we modify it easily to build an adder/subtractor?

How can we make it better (faster, lower power, smaller)?


A 64-bit Adder/Subtractor

[Figure: 64 1-bit FAs chained from C0 = Cin through C1, C2, C3, ..., C63 to C64 = Cout, with inputs A0, B0 to A63, B63, outputs S0 to S63, and an add/subt control signal.]

Ripple Carry Adder (RCA) built out of 64 FAs

Subtraction: complement all subtrahend bits (XOR gates) and set the low-order carry-in

RCA
- advantage: simple logic, so small (low cost)
- disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
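The adder/subtractor described on this slide can be sketched bit by bit in Python. This is a behavioral model only, assuming a single `subtract` flag that both drives the XOR gates on the B inputs and sets the low-order carry-in; the function and parameter names are illustrative.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)

def ripple_add_sub(a, b, subtract=False, n=64):
    """Add (or subtract) two n-bit unsigned values with a chain of n FAs.

    Returns (result, carry_out). For subtraction, every B bit is XORed
    with the control signal and the low-order carry-in is set to 1,
    which forms A + (~B) + 1 in two's complement.
    """
    ctrl = 1 if subtract else 0
    carry = ctrl                      # C0 = Cin
    result = 0
    for i in range(n):                # the carry ripples from bit 0 upward
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctrl    # XOR gate on the subtrahend bit
        s, carry = full_adder(ai, bi, carry)
        result |= s << i
    return result, carry
```

For example, `ripple_add_sub(5, 3, subtract=True)` yields a result of 2; a carry-out of 0 on subtraction indicates a borrow (the result wrapped around).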

Ripple Carry Adder (RCA)

[Figure: 4-bit ripple carry adder, four FAs with inputs A0, B0 to A3, B3, outputs S0 to S3, C0 = Cin and Cout = C4.]

T = O(N) worst-case delay

$T_{adder} \approx (N-1)\,T_{carry} + T_{sum}$

Real Goal: Make the fastest possible carry path

Inversion Property

[Figure: two FA blocks, one with true inputs and outputs and one with all inputs and outputs inverted.]

!S(A, B, Cin) = S(!A, !B, !Cin)

!Cout(A, B, Cin) = Cout(!A, !B, !Cin)

Inverting all inputs to a FA results in inverted values for all outputs
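The inversion property is easy to confirm exhaustively. The following Python sketch (names are illustrative) checks both identities over all eight input combinations:

```python
from itertools import product

def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)

def inversion_property_holds():
    """True if inverting all FA inputs inverts both outputs, for every input."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        s_inv, cout_inv = full_adder(1 - a, 1 - b, 1 - cin)
        if s_inv != 1 - s or cout_inv != 1 - cout:
            return False
    return True
```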

Exploiting the Inversion Property

[Figure: 4-bit ripple carry adder built from four FA’ cells, inputs A0, B0 to A3, B3, outputs S0 to S3, C0 = Cin and Cout = C4; regular cells and inverted cells are marked.]

Now need two “flavors” of FAs: regular cell and inverted cell

Minimizes the critical path (the carry chain) by eliminating inverters between the FAs

Mirror Adder Features

- The NMOS and PMOS chains are completely symmetrical with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
- When laying out the cell, the most critical issue is the minimization of the capacitances at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
- The transistors connected to Cin are placed closest to the output.
- Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be minimal size.

Manchester Carry Chain (MCC)

Switches controlled by Gi and Pi

Total delay is the sum of
- the time to form the switch control signals Gi and Pi
- the signal propagation delay through N switches in the worst case

[Figure: one Manchester carry chain stage with control signals Gi, Pi, and clk, carry-in !Ci and carry-out !Ci+1.]

4-bit Sliced MCC Adder

[Figure: 4-bit sliced MCC adder. Each slice forms G and P from Ai, Bi (A0, B0 to A3, B3); the clocked carry chain runs from !C0 through !C1, !C2, !C3 to !C4; outputs S0 to S3.]

8-bit MCC Adder

[Figure: 8-bit MCC adder built from two 4-bit MCC slices, with carry signals !C0 and !C7 labeled.]

It’s really hard to beat the speed of a well-designed MCC for word lengths of 8 bits or less!

Carry Skip Adder (a.k.a. Carry Bypass Adder)

If (P0 & P1 & P2 & P3 = 1) then C4 = C0; otherwise the block itself kills or generates the carry internally

[Figure: 4-bit block of FAs (A0, B0 to A3, B3, S0 to S3) with carry-in C0; the block carry-out C4 is selected between C0 and the rippled carry under control of BP.]

BP = P0&P1&P2&P3 “Block Propagate”
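A behavioral Python sketch of one skip block follows. It is only a logical model (it says nothing about timing), and the function name `skip_block` and its integer interface are assumptions made for illustration.

```python
def skip_block(a, b, c0, width=4):
    """Model one carry-skip block. Returns (sum_bits, c_out, bp).

    a, b are width-bit unsigned integers, c0 is the block carry-in.
    """
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # propagate bits
    g = [((a >> i) & (b >> i)) & 1 for i in range(width)]  # generate bits
    bp = int(all(p))                  # block propagate: P0 & P1 & ... & P(width-1)

    carry, total = c0, 0
    for i in range(width):            # ordinary ripple inside the block
        total |= (p[i] ^ carry) << i
        carry = g[i] | (p[i] & carry)

    # Bypass multiplexer: when every bit propagates, C_out is simply C0.
    c_out = c0 if bp else carry
    return total, c_out, bp
```

The bypass never changes the logical value of the carry-out (when BP = 1 the rippled carry also equals C0); its benefit is that C0 reaches the block output without waiting for the ripple.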

Carry-Skip Chain Implementation

[Figure: carry chain with inputs Cin, G0 to G3, and P0 to P3 producing !Cout; a bypass path controlled by BP selects between the block carry-in and the block carry-out to form the carry-out.]

16 bit, 4-bit Block Carry Skip Adder

Worst-case delay, carry from bit 0 to bit 15: carry generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups (B is the group size in bits), ripples in the last group from bit 12 to bit 15

[Figure: four 4-bit blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, Carry Propagation, and Sum stages; carry-in Ci,0.]

$T_{add} = t_{setup} + B\,t_{carry} + ((N/B) - 1)\,t_{skip} + (B-1)\,t_{carry} + t_{sum}$
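To get a feel for the carry skip delay expression, it can be evaluated next to the ripple carry estimate (N-1) Tcarry + Tsum from the RCA slide. The sketch below assumes unit delays for every component, which is an illustrative simplification and not a figure from the lecture.

```python
def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """Ripple carry adder estimate: (N-1)*t_carry + t_sum."""
    return (n - 1) * t_carry + t_sum

def carry_skip_delay(n, b, t_setup=1.0, t_carry=1.0, t_skip=1.0, t_sum=1.0):
    """Carry skip adder delay for N bits and block size B, per the slide formula."""
    return (t_setup + b * t_carry + (n / b - 1) * t_skip
            + (b - 1) * t_carry + t_sum)

# Sweep the block size for a 64-bit adder (unit delays are an assumption).
skip_64 = {b: carry_skip_delay(64, b) for b in (2, 4, 8, 16)}
```

With unit delays, a 16-bit adder with 4-bit blocks evaluates to 12 time units against 16 for the ripple carry estimate. Real values depend on the actual gate delays.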

RCA, Carry Skip Adder Comparison

[Figure: delay (axis 0 to 70) versus word length (8, 16, 32, 48, 64 bits) for RCA and CSkA; the CSkA points are labeled with block sizes B=2, B=3, B=4, B=5, B=6.]

Carry Skip Adder Extensions

Variable block sizes
- A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be bigger without increasing the overall delay

[Figure: carry skip adder with variable block sizes, from Cin to Cout.]

Carry Select Adder

Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (can be done for all blocks in parallel) and then select the correct one

[Figure: 4-bit carry select block. Setup forms P’s and G’s from A’s and B’s; a “0” carry propagation chain and a “1” carry propagation chain feed a multiplexer selected by Cin, which produces the C’s and Cout; sum generation produces the S’s.]
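The precompute-then-select idea can be modeled in a few lines of Python. This is a behavioral sketch under stated assumptions: each block is modeled with integer addition rather than gates, the word length is a multiple of the block size, and the names `ripple_block` and `carry_select_add` are illustrative.

```python
def ripple_block(a, b, cin, width):
    """Model one block's carry chain: returns (sum_bits, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, cin=0, n=16, block=4):
    """Carry select adder model: returns (result, carry_out).

    Assumes n is a multiple of block.
    """
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        a_blk = (a >> lo) & mask
        b_blk = (b >> lo) & mask
        # Both candidates depend only on A and B, so in hardware every
        # block computes them in parallel before any carry arrives.
        s0, c0 = ripple_block(a_blk, b_blk, 0, block)   # "0" carry path
        s1, c1 = ripple_block(a_blk, b_blk, 1, block)   # "1" carry path
        # Multiplexer: the real incoming carry picks one candidate.
        s, carry = (s1, c1) if carry else (s0, c0)
        result |= s << lo
    return result, carry
```

Only the multiplexer selection is sequential from block to block, which is where the N/B mux delays in the critical path come from.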

Carry Select Adder: Critical Path

[Figure: 16-bit linear carry select adder with four 4-bit blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, “0” carry, “1” carry, mux, and Sum gen stages. Critical path annotations: setup +1, carry propagation +4, four multiplexers +1 each, sum generation +1.]

$T_{add} = t_{setup} + B\,t_{carry} + (N/B)\,t_{mux} + t_{sum}$

Square Root Carry Select Adder

[Figure: 20-bit square root carry select adder with blocks of increasing size (bits 0 to 1, bits 2 to 4, bits 5 to 8, bits 9 to 13, bits 14 to 19), each with Setup, “0” carry, “1” carry, mux, and Sum gen stages. Critical path annotations: setup +1, first-block carry +2, five multiplexers +1 each, sum generation +1; the carry chains of the later blocks are annotated +3, +4, +5, +6.]

$T_{add} = t_{setup} + 2\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$
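The linear and square root carry select delay expressions can be compared numerically. The sketch below assumes unit delays for setup, carry, mux, and sum stages; those values are illustrative, not measurements from the lecture.

```python
import math

def linear_select_delay(n, b, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Linear carry select: t_setup + B*t_carry + (N/B)*t_mux + t_sum."""
    return t_setup + b * t_carry + (n / b) * t_mux + t_sum

def sqrt_select_delay(n, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Square root carry select: t_setup + 2*t_carry + sqrt(2N)*t_mux + t_sum."""
    return t_setup + 2 * t_carry + math.sqrt(2 * n) * t_mux + t_sum

# Compare the two organizations at a few word lengths (4-bit linear blocks).
comparison = {n: (linear_select_delay(n, 4), sqrt_select_delay(n))
              for n in (16, 32, 64)}
```

With unit delays, the 16-bit linear adder with 4-bit blocks evaluates to 10, matching the +1, +4, four times +1, +1 annotations on the critical path figure. The square root form grows much more slowly as N increases.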

Look-Ahead: Topology

$C_{o,k} = f(A_k, B_k, C_{o,k-1}) = G_k + P_k C_{o,k-1}$

Expanding Lookahead equations:

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$

All the way:

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$

LookAhead - Basic Idea

[Figure: look-ahead adder. Inputs A0, B0 through AN-1, BN-1; propagate signals P0 through PN-1; carries Ci,0, Ci,1 through Ci,N-1; sums S0 through SN-1.]

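Look-Ahead: Topology

$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$

[Figure: transistor-level look-ahead carry gate between VDD and ground computing Co,3 from Ci,0 and the per-bit G and P signals (P0 to P3).]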

Logarithmic Look-Ahead Adder

[Figure: computing F from inputs A0 to A7 with a linear chain, $t_p \sim N$, versus a binary tree, $t_p \sim \log_2(N)$.]

Carry Lookahead Trees

$C_{o,0} = G_0 + P_0 C_{i,0}$

$C_{o,1} = G_1 + P_1 G_0 + P_1 P_0 C_{i,0}$

$C_{o,2} = G_2 + P_2 G_1 + P_2 P_1 G_0 + P_2 P_1 P_0 C_{i,0}$

$\quad\;\; = (G_2 + P_2 G_1) + (P_2 P_1)(G_0 + P_0 C_{i,0}) = G_{2:1} + P_{2:1} C_{o,0}$

Can continue building the tree hierarchically.
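The regrouping of the bit-2 carry into a group generate and group propagate term can be checked exhaustively. In this Python sketch the function names are illustrative; `g` and `p` are lists holding G0 to G2 and P0 to P2.

```python
from itertools import product

def co2_flat(g, p, ci0):
    """Fully expanded carry out of bit 2."""
    return (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
            | (p[2] & p[1] & p[0] & ci0))

def co2_grouped(g, p, ci0):
    """Same carry via group signals: G2:1 + P2:1 * Co,0."""
    g21 = g[2] | (p[2] & g[1])      # group generate for bits 2..1
    p21 = p[2] & p[1]               # group propagate for bits 2..1
    co0 = g[0] | (p[0] & ci0)       # carry out of bit 0
    return g21 | (p21 & co0)

# Compare both forms over every combination of G, P, and carry-in values.
forms_agree = all(
    co2_flat(bits[0:3], bits[3:6], bits[6]) == co2_grouped(bits[0:3], bits[3:6], bits[6])
    for bits in product((0, 1), repeat=7)
)
```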

Carry Operator

Define carry operator € on (G,P) signal pairs

(G,P) = (G’’,P’’) € (G’,P’)

where G = G’’ | P’’&G’ and P = P’’&P’

€ is associative, i.e.,

[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]

[Figure: € cell combining (G’’,P’’) and (G’,P’) into (G,P), a tree of € cells, and a gate-level implementation with inputs G’, G’’, P’’ and output !G.]
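The carry operator and its associativity can be checked directly. In this Python sketch, `carry_op` is an illustrative name, and its first argument is the more significant (G’’,P’’) pair.

```python
from itertools import product

def carry_op(hi, lo):
    """Carry operator on (G, P) pairs: G = G'' | P''&G', P = P''&P'."""
    g_hi, p_hi = hi
    g_lo, p_lo = lo
    return g_hi | (p_hi & g_lo), p_hi & p_lo

pairs = list(product((0, 1), repeat=2))   # all four (G, P) combinations

# Associative: grouping does not matter.
is_associative = all(
    carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))
    for x, y, z in product(pairs, repeat=3)
)

# Not commutative: operand order (bit significance) does matter.
is_commutative = all(carry_op(x, y) == carry_op(y, x)
                     for x, y in product(pairs, repeat=2))
```

Associativity is what allows the carries to be computed with a tree of cells instead of a linear chain; the operator is not commutative, so the pairs must stay in bit order.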

PPA (Parallel Prefix Adder) General Structure

Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel

(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

Since € is associative, we can group them in any order

Measures to consider
- number of € cells
- tree cell depth (time)
- tree cell area
- cell fan-in and fan-out
- max wiring length
- wiring congestion
- delay path variation (glitching)

Structure
- Pi, Gi logic (1 unit delay)
- Ci parallel prefix logic tree (1 unit delay per level)
- Si logic (1 unit delay)

Brent-Kung PPA

[Figure: 16-bit Brent-Kung parallel prefix computation. Inputs (G0,P0) to (G15,P15) pass through a tree of € cells to produce carries C1 to C16. Annotations: the first part of the tree has depth T = log2N and the second part T = log2N - 2; the tree area is A = 2log2N - 2 by A = N/2.]

A Faster Yet PPA

Brent-Kung (BK) adder has the time bound of

$T_{BK} = 1 + (2\log_2 N - 2) + 1$

There are even faster PPA approaches that are used in most modern-day machines for operands of 32 bits or greater

Kogge-Stone (KS)
- faster PP tree (logN for KS versus 2logN - 2 for BK)
- fan-out of carry cell € limited to two
- takes more € cells and has more wiring

Kogge-Stone PPF Adder

[Figure: 16-bit Kogge-Stone parallel prefix computation. Inputs (G0,P0) to (G15,P15) and Cin pass through a tree of € cells to produce carries C1 to C16. Annotations: tree depth T = log2N; tree area A = log2N by A = N.]

$T_{add} = t_{setup} + \log_2 N \cdot t_{\text{€}} + t_{sum}$
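A behavioral Python sketch of a Kogge-Stone style prefix adder follows. It applies the carry operator at distances 1, 2, 4, ... so that every carry is available after log2N levels. Folding Cin into the bit-0 generate term is an assumption made here for simplicity (the slide's figure may handle Cin differently), and `kogge_stone_add` is an illustrative name.

```python
def kogge_stone_add(a, b, cin=0, n=16):
    """Add two n-bit values with a Kogge-Stone prefix tree.

    Returns (sum_bits, carry_out, levels), where levels is the number
    of prefix levels used.
    """
    # Setup stage: per-bit generate and propagate.
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)       # assumption: fold Cin into bit 0

    # Prefix tree: at each level, combine position i with position i - dist.
    dist, levels = 1, 0
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        levels += 1

    # G[i] is now the carry out of bit i, i.e. the carry into bit i + 1.
    carries = [cin] + G
    # Sum stage: S_i = P_i xor (carry into bit i).
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n], levels
```

For n = 16 the loop runs four levels, matching the log2N tree depth in the delay expression above.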

PPA Comparisons

Measure          | BK PPA               | N=64 | KS PPA         | N=64
# of € cells     | 2N - 2 - logN        | 120  | NlogN - N + 1  | 321
tree depth       | 2logN - 2            | 10   | logN           | 6
tree area (WxH)  | (N/2) * (2logN - 2)  | 320  | N * logN       | 384
cell fan-in      | 2                    | 2    | 2              | 2
cell fan-out     | logN                 | 6    | 2              | 2
max wire length  | N/4                  | 16   | N/2            | 32
wiring density   | sparse               |      | dense          |
glitching        | high                 |      | low            |

More Adder Comparisons

[Figure: delay (axis 0 to 70) versus word length (8, 16, 32, 48, 64 bits) for RCA, CSkA, and KS PPA.]

State of the Art

Next Lecture and Reminders

Next lecture
- Multiplier Design
- Reading assignment: Rabaey, et al, 11.4
