l21: adder design - uahmilenka/cpe527-03f/lectures/l21_3p.pdf · 2003. 11. 17. · a 64-bit...

•VLSI Design I; A. Milenkovic •1

CPE/EE 427, CPE 527 VLSI Design I

L21: Adder Design

Department of Electrical and Computer Engineering University of Alabama in Huntsville

Aleksandar Milenkovic (www.ece.uah.edu/~milenka)
www.ece.uah.edu/~milenka/cpe527-03F

[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al. and Mary Jane Irwin (www.cse.psu.edu/~mji)]

11/17/2003 VLSI Design I; A. Milenkovic 2

Course Administration

• Instructor: Aleksandar [email protected]/~milenka
– EB 217-L
– Office Hrs: MW 17:30-18:30

• TA: Fathima Tareen
– tareenf@eng.uah.edu

• Project pr.: For schedule see http://www.ece.uah.edu/~milenka/cpe527-03F
– Follow conventions for ppt file names
– Timing, content, ...

• HW#3: Due 12/01/03

• Project: Reports due 12/12/03
– Design submission due 10/12/03 (arrange with instructor & lab instructor)


Review: Basic Building Blocks

• Datapath
– Execution units: adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders

• Control
– Finite state machines (PLA, ROM, random logic)

• Interconnect
– Switches, arbiters, buses

• Memory
– Caches (SRAMs), TLBs, DRAMs, buffers



The 1-bit Binary Adder

[Figure: 1-bit Full Adder (FA) block with inputs A, B, Cin and outputs S, Cout]

A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate

S = A ⊕ B ⊕ Cin = P ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function) = G | P&Cin

where G = A&B, P = A ⊕ B, K = !A & !B

• How can we use it to build a 64-bit adder?

• How can we modify it easily to build an adder/subtractor?

• How can we make it better (faster, lower power, smaller)?
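The full adder equations and the carry status column can be checked with a short behavioral sketch. The function names `full_adder` and `carry_status` are illustrative choices, not part of the lecture.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (s, cout) for one bit position using the G/P formulation."""
    g = a & b            # generate: a carry leaves this bit regardless of cin
    p = a ^ b            # propagate: the carry out equals cin
    s = p ^ cin
    cout = g | (p & cin)
    return s, cout

def carry_status(a, b):
    """Classify a bit position as generate, propagate, or kill."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"

for a, b, cin in product((0, 1), repeat=3):
    s, cout = full_adder(a, b, cin)
    # Cross-check against the majority form and against integer addition.
    assert cout == (a & b) | (a & cin) | (b & cin)
    assert 2 * cout + s == a + b + cin
    print(a, b, cin, cout, s, carry_status(a, b))
```

The loop prints the eight truth table rows in the order A, B, Cin, Cout, S, carry status.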


FA Gate Level Implementations

[Figure: two gate-level full adder schematics with inputs A, B, Cin, outputs S, Cout, and intermediate signals t0, t1, t2]

• The way you learned to design in EE201 and CPE422


Carry-Look-Ahead Adder – CLA (1)

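• Idea: speed up carry computation
– Ci+1 = Gi + Pi*Ci

• Propagate: Pi = Ai + Bi
– if Pi = 1, then carry from (i-1)th stage is propagated

• Generate: Gi = Ai*Bi
– if Gi = 1 there is carry out

$$
\begin{aligned}
P_i &= A_i + B_i, \qquad G_i = A_i \cdot B_i \\
C_{i+1} &= G_i + P_i \cdot C_i \\
        &= G_i + P_i \cdot (G_{i-1} + P_{i-1} \cdot C_{i-1}) \\
        &= G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot C_{i-1} \\
        &= G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot (G_{i-2} + P_{i-2} \cdot C_{i-2}) \\
        &= G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot G_{i-2} + P_i \cdot P_{i-1} \cdot P_{i-2} \cdot C_{i-2} \\
S_i &= A_i \oplus B_i \oplus C_i
\end{aligned}
$$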




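$$
\begin{aligned}
C_1 &= G_0 + P_0 \cdot C_0 \\
C_2 &= G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0 \\
C_3 &= G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0 \\
C_4 &= G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0
\end{aligned}
$$

[Figure: 4-bit carry-lookahead adder built from a PG Generator (inputs A0 to A3 and B0 to B3, outputs P0 to P3 and G0 to G3), a Carry Generate Block (inputs C0 and the P and G signals, outputs C1 to C3), and a Sum Generator (outputs S0 to S3), with gate-level detail of the C1 and C3 logic]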
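As a hedged illustration of 4-bit carry lookahead, the sketch below evaluates every carry directly from the generate and propagate signals and C0, with no carry depending on another carry, and checks the result against integer addition. The function name `cla4` and the integer-based interface are assumptions made for this example.

```python
def cla4(a, b, c0):
    """4-bit carry-lookahead addition; a and b are 4-bit ints, c0 is 0 or 1."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    G = [x & y for x, y in zip(A, B)]   # generate
    P = [x | y for x, y in zip(A, B)]   # propagate (OR form)

    # Each carry is a two-level function of G, P, and c0 only.
    c1 = G[0] | P[0] & c0
    c2 = G[1] | P[1] & G[0] | P[1] & P[0] & c0
    c3 = G[2] | P[2] & G[1] | P[2] & P[1] & G[0] | P[2] & P[1] & P[0] & c0
    c4 = (G[3] | P[3] & G[2] | P[3] & P[2] & G[1]
          | P[3] & P[2] & P[1] & G[0] | P[3] & P[2] & P[1] & P[0] & c0)

    carries = [c0, c1, c2, c3]
    s = sum((A[i] ^ B[i] ^ carries[i]) << i for i in range(4))
    return s, c4

# Exhaustive check against integer addition.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            total = a + b + c0
            assert cla4(a, b, c0) == (total & 0xF, total >> 4)
```

The product terms grow by one factor per bit, which is why a single lookahead level is usually limited to about four bits.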



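$$
C_4 = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0 = G_{3-0} + P_{3-0} \cdot C_0
$$

$$
\begin{aligned}
C_4 &= G_{3-0} + P_{3-0} \cdot C_0 \\
C_8 &= G_{7-4} + P_{7-4} \cdot C_4 = G_{7-4} + P_{7-4} \cdot (G_{3-0} + P_{3-0} \cdot C_0) \\
    &= G_{7-4} + P_{7-4} \cdot G_{3-0} + P_{7-4} \cdot P_{3-0} \cdot C_0 \\
C_{12} &= G_{11-8} + P_{11-8} \cdot C_8 \\
    &= G_{11-8} + P_{11-8} \cdot (G_{7-4} + P_{7-4} \cdot G_{3-0} + P_{7-4} \cdot P_{3-0} \cdot C_0) \\
    &= G_{11-8} + P_{11-8} \cdot G_{7-4} + P_{11-8} \cdot P_{7-4} \cdot G_{3-0} + P_{11-8} \cdot P_{7-4} \cdot P_{3-0} \cdot C_0 \\
C_{16} &= G_{15-12} + P_{15-12} \cdot C_{12} \\
    &= G_{15-12} + P_{15-12} \cdot (G_{11-8} + P_{11-8} \cdot G_{7-4} + P_{11-8} \cdot P_{7-4} \cdot G_{3-0} + P_{11-8} \cdot P_{7-4} \cdot P_{3-0} \cdot C_0) \\
    &= G_{15-12} + P_{15-12} \cdot G_{11-8} + P_{15-12} \cdot P_{11-8} \cdot G_{7-4} + P_{15-12} \cdot P_{11-8} \cdot P_{7-4} \cdot G_{3-0} \\
    &\quad + P_{15-12} \cdot P_{11-8} \cdot P_{7-4} \cdot P_{3-0} \cdot C_0
\end{aligned}
$$

[Figure: 16-bit CLA: four 4-bit adder blocks (A3:0/B3:0/S3:0 through A15-12/B15-12/S15-12) supply the group signals P3-0, G3-0 through P15-12, G15-12 to a carry generator (CG), which takes C0 and returns C4, C8, C12 together with G15-0 and P15-0]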
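A minimal sketch of the two-level idea, assuming group signals defined as G3-0 = G3 + P3·G2 + P3·P2·G1 + P3·P2·P1·G0 and P3-0 = P3·P2·P1·P0 (and likewise for the other groups). The names `group_pg` and `cla16` are illustrative, and the per-slice sum uses integer addition as a stand-in for the 4-bit adder blocks.

```python
def group_pg(a, b):
    """Return the group (G, P) pair of one 4-bit slice."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(4)]
    G = g[3] | p[3] & g[2] | p[3] & p[2] & g[1] | p[3] & p[2] & p[1] & g[0]
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def cla16(a, b, c0):
    """16-bit addition with block carries C4, C8, C12, C16 from group signals."""
    slices = [((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF) for k in range(4)]
    carries = [c0]
    # Written as a recurrence for brevity; hardware would evaluate the
    # expanded sum-of-products form of each block carry in parallel.
    for x, y in slices:
        G, P = group_pg(x, y)
        carries.append(G | P & carries[-1])
    total = 0
    for k, (x, y) in enumerate(slices):
        # Stand-in for a 4-bit adder block fed by its lookahead carry.
        total |= ((x + y + carries[k]) & 0xF) << (4 * k)
    return total, carries[4]

for a, b, c0 in [(0xFFFF, 0x0001, 0), (0x1234, 0x0FCD, 1), (0x8000, 0x8000, 0)]:
    expected = a + b + c0
    assert cla16(a, b, c0) == (expected & 0xFFFF, expected >> 16)
```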



Review: XOR FA

[Figure: XOR-based full adder schematic with inputs A, B, Cin and outputs S, Cout]

16 transistors


Review: CPL FA

[Figure: complementary pass-transistor logic (CPL) full adder with dual-rail inputs A, !A, B, !B, Cin, !Cin and dual-rail outputs S, !S, Cout, !Cout]

20+8 transistors, dual rail – beware of threshold drops


Delay Balanced FA

[Figure: delay-balanced full adder in three parts: signal set-up (forms P and !P from A, B, !B), carry generation (forms !Cout from A, P, !P, Cin), and sum generation (forms S from P, !P, Cin)]

Identical delays for carry and sum

20+2 transistors



Review: Mirror Adder

[Figure: mirror adder transistor schematic with inputs A, B, Cin and outputs !Cout, !S; the carry stage branches are labeled kill, generate, 0-propagate, and 1-propagate, and each transistor is annotated with its width]

24+4 transistors

Cout = A&B | B&Cin | A&Cin
SUM = A&B&Cin | !Cout&(A | B | Cin)

Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2 inverter transistor gates (to form Cin for the next most significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
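The mirror adder forms the sum from the complemented carry, SUM = A&B&Cin | !Cout&(A | B | Cin), which lets the sum stage reuse the carry stage output. The sketch below is a behavioral check of that identity only; the function name `mirror_adder` is an illustrative assumption and nothing here models the transistor network.

```python
from itertools import product

def mirror_adder(a, b, cin):
    """Behavioral model: the sum is derived from the complemented carry."""
    cout = (a & b) | (b & cin) | (a & cin)
    not_cout = 1 - cout
    s = (a & b & cin) | (not_cout & (a | b | cin))
    return s, cout

for a, b, cin in product((0, 1), repeat=3):
    s, cout = mirror_adder(a, b, cin)
    # The pair (cout, s) must encode the arithmetic sum of the three inputs.
    assert 2 * cout + s == a + b + cin
```

The complement matters: with the uncomplemented carry in the sum term, the check fails for every input with exactly one or exactly two inputs high.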


Mirror Adder Features

• The NMOS and PMOS chains are completely symmetrical with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.

• When laying out the cell, the most critical issue is the minimization of the capacitances at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.

• The transistors connected to Cin are placed closest to the output.

• Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.


A 64-bit Adder/Subtractor

• Ripple Carry Adder (RCA) built out of 64 FAs

• Subtraction – complement all subtrahend bits (XOR gates) and set the low-order carry-in

• RCA
– advantage: simple logic, so small (low cost)
– disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)

[Figure: 64-bit ripple carry adder/subtractor: a chain of 1-bit FAs with inputs A0/B0 through A63/B63, sums S0 through S63, carries C0 = Cin through C64 = Cout, and an add/subt control signal]
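The adder/subtractor can be sketched bit by bit: the add/subt control drives one XOR gate per B input and also the low-order carry-in, so subtraction becomes A + !B + 1 (two's complement). The function name `add_sub` and its parameters are assumptions for this example.

```python
def add_sub(a, b, subtract, n=64):
    """Ripple-carry add (subtract=0) or subtract (subtract=1) of n-bit values.

    Returns (result, carry_out).
    """
    carry = subtract                      # low-order carry-in is 1 for subtraction
    result = 0
    for i in range(n):
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ subtract    # XOR gate conditionally complements B
        result |= (ai ^ bi ^ carry) << i
        carry = (ai & bi) | (ai & carry) | (bi & carry)
    return result, carry

mask = (1 << 64) - 1
assert add_sub(5, 3, 0) == (8, 0)
assert add_sub(5, 3, 1)[0] == 2
assert add_sub(3, 5, 1)[0] == (3 - 5) & mask   # negative result wraps modulo 2**64
```

The loop visits the bits in order because each carry depends on the previous one, which is the O(N) delay noted for the RCA.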



Ripple Carry Adder (RCA)

[Figure: 4-bit ripple carry adder: four FAs with inputs A0 B0 through A3 B3, sums S0 through S3, C0 = Cin, and Cout = C4]

T = O(N) worst case delay

Tadder ≈ TFA(A,B→Cout) + (N-2)TFA(Cin→Cout) + TFA(Cin→S)

Real Goal: Make the fastest possible carry path


Inversion Property

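[Figure: a full adder with inputs A, B, Cin and outputs S, Cout is equivalent (≡) to the same full adder with all inputs and all outputs inverted]

!S(A, B, Cin) = S(!A, !B, !Cin)

!Cout(A, B, Cin) = Cout(!A, !B, !Cin)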

• Inverting all inputs to a FA results in inverted values for all outputs
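The inversion property can be confirmed by enumerating all eight input combinations. This is a minimal check; the helper name `fa` is an illustrative assumption.

```python
from itertools import product

def fa(a, b, cin):
    """Plain full adder: returns (s, cout)."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

for a, b, cin in product((0, 1), repeat=3):
    s, cout = fa(a, b, cin)
    s_inv, cout_inv = fa(1 - a, 1 - b, 1 - cin)
    # Inverting every input inverts both outputs.
    assert s_inv == 1 - s
    assert cout_inv == 1 - cout
```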


Exploiting the Inversion Property

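[Figure: 4-bit ripple carry chain (A0 B0 through A3 B3, S0 through S3, C0 = Cin, Cout = C4) built from FA’ cells, with a legend distinguishing the regular cell from the inverted cell]

Now need two “flavors” of FAs: a regular cell and an inverted cell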

• Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).




Fast Carry Chain Design

• The key to fast addition is a low latency carry network

• What matters is whether in a given position a carry is
– generated: Gi = Ai & Bi = AiBi
– propagated: Pi = Ai ⊕ Bi (sometimes use Ai | Bi)
– annihilated (killed): Ki = !Ai & !Bi

• Giving a carry recurrence of Ci+1 = Gi | PiCi

C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0C0


Manchester Carry Chain

• Switches controlled by Gi and Pi

• Total delay of
– time to form the switch control signals Gi and Pi
– setup time for the switches
– signal propagation delay through N switches in the worst case

[Figure: one Manchester carry chain stage with control signals Gi, Pi, and clk, carry input !Ci, and carry output !Ci+1]



4-bit Sliced MCC Adder

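[Figure: 4-bit sliced MCC adder: for each bit, an & gate and a ⊕ gate form G and P from Ai and Bi; the clocked chain carries !C0 through !C1, !C2, !C3 to !C4; a second row of ⊕ gates forms S0 through S3]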


Domino Manchester Carry Chain Circuit

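[Figure: 4-bit domino Manchester carry chain clocked by clk, with inputs Ci,0, G0 to G3, and P0 to P3, output Ci,4, and numbered annotations on the transistors]

The chain nodes carry, from the least significant stage upward:

!(G0 | P0 Ci,0)

!(G1 | P1G0 | P1P0 Ci,0)

!(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)

!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)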


Binary Adder Landscape

synchronous word parallel adders
– ripple carry adders (RCA): T = O(N), A = O(N)
– carry prop min adders
   • signed-digit adders and residue adders: T = O(1), A = O(N)
   • fast carry prop adders
      – Manchester carry chain: T = O(N), A = O(N)
      – carry select and carry skip: T = O(√N), A = O(N)
      – parallel prefix and conditional sum: T = O(log N), A = O(N log N)



Carry-Skip (Carry-Bypass) Adder

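[Figure: 4-bit ripple adder block (A0 B0 through A3 B3, S0 through S3, carry-in Ci,0) followed by a bypass multiplexer, controlled by BP, that drives the block carry-out Co,3]

BP = P0 P1 P2 P3 (“Block Propagate”)

If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0; otherwise the block itself kills or generates the carry internally.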
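A behavioral sketch of one carry-skip block, assuming the XOR form of propagate. It shows that the bypass multiplexer never changes the carry value: when BP = 1 the rippled carry already equals the carry-in, so the bypass only shortens the path. The name `skip_block` is illustrative.

```python
def skip_block(a, b, cin, width=4):
    """One carry-skip block; returns (sum, carry_out, block_propagate)."""
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    bp = int(all(p))                 # BP = P0 & P1 & P2 & P3

    carry, s = cin, 0
    for i in range(width):           # internal ripple path
        s |= (p[i] ^ carry) << i
        carry = g[i] | (p[i] & carry)

    cout = cin if bp else carry      # bypass multiplexer
    assert cout == carry             # same value, shorter path when bp == 1
    return s, cout, bp

for a in range(16):
    for b in range(16):
        for cin in (0, 1):
            s, cout, _ = skip_block(a, b, cin)
            assert (cout << 4) | s == a + b + cin
```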


Carry-Skip Chain Implementation

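[Figure: carry-skip chain implementation: a carry chain with inputs Cin, G0 to G3, and P0 to P3 and output !Cout, plus a bypass path controlled by BP from the block carry-in to the block carry-out]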


4-bit Block Carry-Skip Adder

Worst-case delay → carry from bit 0 to bit 15: the carry is generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups, and ripples in the last group from bit 12 to bit 15 (B is the group size in bits).

[Figure: 16-bit carry-skip adder made of four 4-bit blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, Carry Propagation, and Sum stages; carry-in Ci,0]

Tadd = tsetup + B tcarry + ((N/B) - 1) tskip + B tcarry + tsum



Optimal Block Size and Time

• Assuming one stage of ripple (tcarry) has the same delay as one skip logic stage (tskip) and both are 1

TCSkA = 1 (tsetup) + B (ripple in block 0) + (N/B - 1) (skips) + B (ripple in last block) + 1 (tsum)
      = 2B + N/B + 1

• So the optimal block size, B, follows from dTCSkA/dB = 2 - N/B² = 0 ⇒ Bopt = √(N/2)

• And the optimal time is Optimal TCSkA = 2√(2N) + 1
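The unit-delay model above is easy to tabulate. The sketch below evaluates TCSkA = 2B + N/B + 1, the closed-form optimum, and the best integer block size found by brute force. The function names are illustrative, and the model inherits the slide's assumption that tsetup, tcarry, tskip, and tsum all equal 1.

```python
import math

def t_cska(n, b):
    """Carry-skip delay in unit delays for n bits and block size b."""
    return 2 * b + n / b + 1

def optimal_block(n):
    """Continuous optimum: (B_opt, T_opt) = (sqrt(n/2), 2*sqrt(2n) + 1)."""
    return math.sqrt(n / 2), 2 * math.sqrt(2 * n) + 1

def best_integer_block(n):
    """Integer block size that minimizes t_cska."""
    return min(range(1, n + 1), key=lambda b: t_cska(n, b))

for n in (8, 16, 32, 48, 64):
    b_opt, t_opt = optimal_block(n)
    b_int = best_integer_block(n)
    print(f"N={n:2d}  B_opt={b_opt:.2f}  T_opt={t_opt:.2f}  "
          f"best integer B={b_int}  T={t_cska(n, b_int):.2f}")
```

For N = 32 the continuous optimum is an integer, B = 4, and both formulas give a delay of 17 units.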


Carry-Skip Adder Extensions

• Variable block sizes
– A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so can have bigger blocks for the inner carries without increasing the overall delay

[Figure: carry-skip chain from Cin to Cout with blocks of varying size]

• Multiple levels of skip logic
– The second-level skip signal is the AND of the first level skip signals (BP’s)

[Figure: carry-skip chain from Cin to Cout with skip level 1 and skip level 2]


Carry-Skip Adder Comparisons

[Plot: delay of RCA, CSkA, and VSkA adders at 8, 16, 32, 48, and 64 bits (vertical axis 0 to 70); block sizes B=2, B=3, B=4, B=5, and B=6 are marked along the curves]



Carry Select Adder

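• Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (can be done for all blocks in parallel) and then select the correct one

[Figure: 4-bit carry select block: a 4-b Setup stage turns the A’s and B’s into P’s and G’s, which feed a “0” carry propagation chain and a “1” carry propagation chain; a multiplexer controlled by Cin selects the C’s and Cout, and a Sum generation stage produces the S’s]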
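A behavioral sketch of carry select, assuming uniform 4-bit blocks: each block computes its result for both possible carry-ins, and the incoming carry only drives the selection. The names `ripple` and `carry_select_add` are illustrative, and `ripple` uses integer addition as a stand-in for a block's carry chain.

```python
def ripple(x, y, cin, width):
    """Stand-in for a width-bit ripple block: returns (sum, carry_out)."""
    total = x + y + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, cin, n=16, block=4):
    """n-bit carry select addition; returns (sum, carry_out)."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        x, y = (a >> lo) & mask, (b >> lo) & mask
        # Both speculative results are independent of the incoming carry,
        # so hardware can compute them for all blocks in parallel.
        s0, c0 = ripple(x, y, 0, block)
        s1, c1 = ripple(x, y, 1, block)
        s, carry = (s1, c1) if carry else (s0, c0)   # multiplexer
        result |= s << lo
    return result, carry

for a, b, cin in [(0xFFFF, 1, 0), (0x0F0F, 0x00F1, 1), (0xABCD, 0x1234, 0)]:
    expected = a + b + cin
    assert carry_select_add(a, b, cin) == (expected & 0xFFFF, expected >> 16)
```

Only the multiplexer selection is sequential across blocks, which is where the per-block tmux term in the delay comes from.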


Carry Select Adder: Critical Path

[Figure: 16-bit linear carry select adder made of four 4-bit blocks (bits 0 to 3, bits 4 to 7, bits 8 to 11, bits 12 to 15), each with Setup, “0” carry, “1” carry, mux, and Sum gen stages. The critical path is annotated with delays 1 (setup), +4 (carry propagation in the first block), +1+1+1+1 (four multiplexers), and +1 (sum generation).]

Tadd = tsetup + B tcarry + N/B tmux + tsum



Square Root Carry Select Adder

[Figure: 20-bit square root carry select adder whose blocks grow toward the most significant end (the last block covers bits 14 to 19), each with Setup, “0” carry, “1” carry, mux, and Sum gen stages. The critical path is annotated with delays 1 (setup), +2 (carry propagation in the first block), +1+1+1+1+1 (five multiplexers), and +1 (sum generation); the carry chains of the later blocks are annotated +3, +4, +5, +6.]

Tadd = tsetup + 2 tcarry + √N tmux + tsum
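A small timing model makes the benefit of growing block sizes concrete. It assumes unit delays for setup, each carry stage, each multiplexer, and sum generation, and it assumes block sizes of 2, 3, 4, 5, 6 for the 20-bit case, a reading of the figure annotations rather than something stated in the text. The function name `select_adder_delay` is illustrative.

```python
import math

def select_adder_delay(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Critical-path delay of a carry select adder with the given block sizes.

    A block's speculative carries are ready after setup plus its own ripple;
    its multiplexer switches once both those carries and the previous
    multiplexer output are available.
    """
    mux_out = 0
    for size in block_sizes:
        carries_ready = t_setup + size * t_carry
        mux_out = max(carries_ready, mux_out) + t_mux
    return mux_out + t_sum

linear_16 = select_adder_delay([4, 4, 4, 4])
linear_20 = select_adder_delay([4] * 5)
sqrt_20 = select_adder_delay([2, 3, 4, 5, 6])
print(linear_16, linear_20, sqrt_20)

# With blocks that grow by one bit, the stage count is on the order of sqrt(2N).
print(len([2, 3, 4, 5, 6]), math.sqrt(2 * 20))
```

With growing blocks, each block's speculative carries become ready just as the previous multiplexer output arrives, so no stage waits on the other; the stage count grows roughly as √(2N), the same O(√N) order as the formula above.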


Parallel Prefix Adders (PPAs)

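• Define carry operator € on (G,P) signal pairs

(G,P) = (G’’,P’’) € (G’,P’)

where
G = G’’ ∨ P’’G’
P = P’’P’

– € is associative, i.e.,
[(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]

[Figure: a € cell combining (G’’,P’’) and (G’,P’) into (G,P), a tree of € cells illustrating associativity, and a gate-level circuit computing !G from G’, G’’, and P’’]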
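The carry operator is small enough to check exhaustively. The sketch below enumerates all (G,P) pairs to confirm associativity and to collect counterexamples to commutativity. The name `carry_op` and the argument order (more significant pair first) are assumptions for this example.

```python
from itertools import product

def carry_op(left, right):
    """(G'', P'') combined with (G', P'); left is the more significant pair."""
    g2, p2 = left
    g1, p1 = right
    return g2 | (p2 & g1), p2 & p1

pairs = list(product((0, 1), repeat=2))

# Associativity holds for every triple of (G, P) pairs.
for x, y, z in product(pairs, repeat=3):
    assert carry_op(carry_op(x, y), z) == carry_op(x, carry_op(y, z))

# Commutativity does not hold in general.
non_commutative = [(x, y) for x, y in product(pairs, repeat=2)
                   if carry_op(x, y) != carry_op(y, x)]
print(len(non_commutative), "ordered pairs violate commutativity")
```

Associativity is what allows the pairs to be regrouped into a tree; the lack of commutativity means the bit order must be preserved inside that tree.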



PPA General Structure

• Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel

(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)

• Since € is associative, we can group them in any order
– but note that it is not commutative

• Measures to consider
– number of € cells
– tree cell depth (time)
– tree cell area
– cell fan-in and fan-out
– max wiring length
– wiring congestion
– delay path variation (glitching)

[Figure: PPA structure in three stages: Pi, Gi logic (1 unit delay); Ci parallel prefix logic tree (1 unit delay per level); Si logic (1 unit delay)]


Brent-Kung PPA

[Figure: 16-bit Brent-Kung parallel prefix computation: a tree of € cells taking (G0,P0) through (G15,P15) and Cin and producing the carries C1 through C16. The two phases of the tree are annotated T = log2N and T = log2N - 2; the array dimensions are annotated A = 2log2N and A = N/2.]



Kogge-Stone PPF Adder

[Figure: 16-bit Kogge-Stone parallel prefix computation: levels of € cells taking (G0,P0) through (G15,P15) and Cin. The tree is annotated T = log2N, with array dimensions A = log2N and A = N.]

Tadd = tsetup + log2N t€ + tsum
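A behavioral sketch of a Kogge-Stone style prefix computation, assuming the XOR form of propagate and folding Cin into the generate signal of bit 0. At each level every position combines its (G,P) pair with the pair `dist` positions below it, and `dist` doubles per level, giving log2N levels. The function name `kogge_stone_add` is illustrative, and this models the logic only, not wiring or fan-out.

```python
def kogge_stone_add(a, b, cin=0, n=16):
    """Parallel prefix addition; returns (sum, carry_out, prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]

    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)       # fold the carry-in into bit 0

    levels, dist = 0, 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            # Carry operator: (G[i], P[i]) combined with (G[i-dist], P[i-dist])
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
        levels += 1

    carries = [cin] + G              # carries[i] is the carry into bit i
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n], levels

for a, b, cin in [(0xFFFF, 1, 0), (1234, 4321, 1), (0x8001, 0x7FFF, 0)]:
    expected = a + b + cin
    s, cout, levels = kogge_stone_add(a, b, cin)
    assert (s, cout) == (expected & 0xFFFF, expected >> 16)
    assert levels == 4               # log2(16) prefix levels
```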


More Adder Comparisons

[Plot: delay comparison of RCA, CSkA, VSkA, and KS PPA adders (vertical axis 0 to 70)]


Adder Speed Comparisons

[Plot: adder speed at 16, 32, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCSlA, and B&K (vertical axis 10 to 70)]



Adder Average Power Comparisons

[Plot: adder average power at 16, 32, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCSlA, and B&K (vertical axis 0 to 35)]


PDP of Adder Comparisons

[Plot: power-delay product (PDP) for RCA, MCCA, CCSkA, VCSkA, CCSlA, and BKA (vertical axis 0 to 100)]

From Nagendra, 1996
