l21: adder design - uahmilenka/cpe527-03f/lectures/l21_3p.pdf · 2003. 11. 17. · a 64-bit...
CPE/EE 427, CPE 527 VLSI Design I
L21: Adder Design
Department of Electrical and Computer Engineering University of Alabama in Huntsville
Aleksandar Milenkovic (www.ece.uah.edu/~milenka)
www.ece.uah.edu/~milenka/cpe527-03F
[Adapted from Rabaey’s Digital Integrated Circuits, ©2002, J. Rabaey et al. and Mary Jane Irwin (www.cse.psu.edu/~mji)]
11/17/2003 VLSI Design I; A. Milenkovic 2
Course Administration
• Instructor: Aleksandar [email protected]/~milenka, EB 217-L, Office Hrs: MW 17:30-18:30
• TA: Fathima Tareen, tareenf@eng.uah.edu
• Project pr.: For schedule see http://www.ece.uah.edu/~milenka/cpe527-03F
– Follow conventions for ppt file names
– Timing, content, ...
• HW#3: Due 12/01/03
• Project: Reports due 12/12/03
– Design submission due 10/12/03 (arrange with instructor & lab instructor)
Review: Basic Building Blocks
• Datapath
– Execution units
  • Adder, multiplier, divider, shifter, etc.
– Register file and pipeline registers
– Multiplexers, decoders
• Control
– Finite state machines (PLA, ROM, random logic)
• Interconnect
– Switches, arbiters, buses
• Memory
– Caches (SRAMs), TLBs, DRAMs, buffers
The 1-bit Binary Adder
1-bit Full Adder (FA)
[Figure: FA block with inputs A, B, Cin and outputs S, Cout]
S = A ⊕ B ⊕ Cin
Cout = A&B | A&Cin | B&Cin (majority function)
• How can we use it to build a 64-bit adder?
• How can we modify it easily to build an adder/subtractor?
• How can we make it better (faster, lower power, smaller)?

A  B  Cin | Cout  S | carry status
0  0  0   | 0     0 | kill
0  0  1   | 0     1 | kill
0  1  0   | 0     1 | propagate
0  1  1   | 1     0 | propagate
1  0  0   | 0     1 | propagate
1  0  1   | 1     0 | propagate
1  1  0   | 1     0 | generate
1  1  1   | 1     1 | generate

G = A&B
P = A ⊕ B
K = !A & !B
S = P ⊕ Cin
Cout = G | P&Cin
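The full adder equations and the carry status column can be checked with a short Python sketch. The function names (`full_adder`, `carry_status`, `truth_table`) are illustrative and not part of the lecture.

```python
from itertools import product

def full_adder(a, b, cin):
    """Return (s, cout) for one-bit inputs using the G/P form."""
    g = a & b              # generate
    p = a ^ b              # propagate
    s = p ^ cin            # S = P xor Cin
    cout = g | (p & cin)   # Cout = G | P&Cin
    return s, cout

def carry_status(a, b):
    """Classify the bit position as generate, propagate, or kill."""
    if a & b:
        return "generate"
    if a ^ b:
        return "propagate"
    return "kill"

def truth_table():
    """Rows of (A, B, Cin, Cout, S, status) in counting order."""
    rows = []
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = full_adder(a, b, cin)
        rows.append((a, b, cin, cout, s, carry_status(a, b)))
    return rows

if __name__ == "__main__":
    for row in truth_table():
        print(*row)
```

Each row satisfies 2*Cout + S = A + B + Cin, which is a convenient sanity check on any FA implementation.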
FA Gate Level Implementations
[Figure: two gate-level full adder implementations, each with inputs A, B, Cin and outputs S, Cout; gate delay levels are labeled t0, t1, t2]
• The way you learned to design in EE201 and CPE422
Carry-Look-Ahead Adder – CLA (1)
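• Idea: speed up carry computation
– C_{i+1} = G_i + P_i*C_i
• Propagate: P_i = A_i + B_i
– if P_i = 1, then the carry from the (i-1)th stage is propagated
• Generate: G_i = A_i*B_i
– if G_i = 1, there is a carry out

\begin{align*}
P_i &= A_i + B_i, \qquad G_i = A_i \cdot B_i \\
C_{i+1} &= G_i + P_i \cdot C_i \\
        &= G_i + P_i \cdot (G_{i-1} + P_{i-1} \cdot C_{i-1}) \\
        &= G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot C_{i-1} \\
        &= G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot (G_{i-2} + P_{i-2} \cdot C_{i-2}) \\
        &= G_i + P_i \cdot G_{i-1} + P_i \cdot P_{i-1} \cdot G_{i-2} + P_i \cdot P_{i-1} \cdot P_{i-2} \cdot C_{i-2} \\
S_i &= A_i \oplus B_i \oplus C_i
\end{align*}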
Carry-Look-Ahead Adder – CLA (2)
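\begin{align*}
C_1 &= G_0 + P_0 \cdot C_0 \\
C_2 &= G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0 \\
C_3 &= G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0 \\
C_4 &= G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0
\end{align*}
[Figure: 4-bit CLA built from a PG generator (inputs A3..A0, B3..B0; outputs P3..P0, G3..G0), a carry generate block (forms C1, C2, C3 from the P, G signals and C0 with two-level AND-OR logic), and a sum generator (outputs S3..S0)]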
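As a minimal sketch of the expanded carry equations, the following Python function evaluates C1 to C4 directly from the P and G signals, with no carry rippling between bit positions. The name `cla4` and the integer-based interface are assumptions made for illustration.

```python
def cla4(a, b, c0=0):
    """4-bit carry-look-ahead addition; returns (sum, carry_out)."""
    A = [(a >> i) & 1 for i in range(4)]
    B = [(b >> i) & 1 for i in range(4)]
    P = [x | y for x, y in zip(A, B)]   # propagate (OR form used on the CLA slides)
    G = [x & y for x, y in zip(A, B)]   # generate

    # Every carry is a two-level AND-OR function of P, G, and C0.
    c1 = G[0] | (P[0] & c0)
    c2 = G[1] | (P[1] & G[0]) | (P[1] & P[0] & c0)
    c3 = G[2] | (P[2] & G[1]) | (P[2] & P[1] & G[0]) | (P[2] & P[1] & P[0] & c0)
    c4 = (G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1])
          | (P[3] & P[2] & P[1] & G[0]) | (P[3] & P[2] & P[1] & P[0] & c0))

    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (A[i] ^ B[i] ^ carries[i]) << i   # S_i = A_i xor B_i xor C_i
    return total, c4

if __name__ == "__main__":
    print(cla4(0b1011, 0b0110))
```

The product terms grow with the bit index, which is why flat look-ahead is usually limited to small blocks such as 4 bits.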
Carry-Look-Ahead Adder – CLA (4)
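\begin{align*}
C_1 &= G_0 + P_0 \cdot C_0 \\
C_2 &= G_1 + P_1 \cdot G_0 + P_1 \cdot P_0 \cdot C_0 \\
C_3 &= G_2 + P_2 \cdot G_1 + P_2 \cdot P_1 \cdot G_0 + P_2 \cdot P_1 \cdot P_0 \cdot C_0 \\
C_4 &= G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0 + P_3 \cdot P_2 \cdot P_1 \cdot P_0 \cdot C_0 \\
    &= G_{3:0} + P_{3:0} \cdot C_0
\end{align*}
with $G_{3:0} = G_3 + P_3 \cdot G_2 + P_3 \cdot P_2 \cdot G_1 + P_3 \cdot P_2 \cdot P_1 \cdot G_0$ and $P_{3:0} = P_3 \cdot P_2 \cdot P_1 \cdot P_0$.
\begin{align*}
C_8 &= G_{7:4} + P_{7:4} \cdot C_4 = G_{7:4} + P_{7:4} \cdot (G_{3:0} + P_{3:0} \cdot C_0) \\
    &= G_{7:4} + P_{7:4} \cdot G_{3:0} + P_{7:4} \cdot P_{3:0} \cdot C_0 \\
C_{12} &= G_{11:8} + P_{11:8} \cdot C_8 \\
    &= G_{11:8} + P_{11:8} \cdot (G_{7:4} + P_{7:4} \cdot G_{3:0} + P_{7:4} \cdot P_{3:0} \cdot C_0) \\
    &= G_{11:8} + P_{11:8} \cdot G_{7:4} + P_{11:8} \cdot P_{7:4} \cdot G_{3:0} + P_{11:8} \cdot P_{7:4} \cdot P_{3:0} \cdot C_0 \\
C_{16} &= G_{15:12} + P_{15:12} \cdot C_{12} \\
    &= G_{15:12} + P_{15:12} \cdot (G_{11:8} + P_{11:8} \cdot G_{7:4} + P_{11:8} \cdot P_{7:4} \cdot G_{3:0} + P_{11:8} \cdot P_{7:4} \cdot P_{3:0} \cdot C_0) \\
    &= G_{15:12} + P_{15:12} \cdot G_{11:8} + P_{15:12} \cdot P_{11:8} \cdot G_{7:4} + P_{15:12} \cdot P_{11:8} \cdot P_{7:4} \cdot G_{3:0} \\
    &\quad + P_{15:12} \cdot P_{11:8} \cdot P_{7:4} \cdot P_{3:0} \cdot C_0
\end{align*}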
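The block-level carries can be sketched in Python as follows. This is an illustrative model only: `block_pg` and `block_carries` are hypothetical names, and the code applies the recurrence C(4k+4) = G + P·C(4k) block by block, which is logically equal to the expanded two-level form that the look-ahead hardware evaluates.

```python
def block_pg(a4, b4):
    """Group generate and propagate (G, P) of one 4-bit block."""
    P = [((a4 >> i) | (b4 >> i)) & 1 for i in range(4)]
    G = [((a4 >> i) & (b4 >> i)) & 1 for i in range(4)]
    g = G[3] | (P[3] & G[2]) | (P[3] & P[2] & G[1]) | (P[3] & P[2] & P[1] & G[0])
    p = P[3] & P[2] & P[1] & P[0]
    return g, p

def block_carries(a, b, c0=0):
    """Return [C0, C4, C8, C12, C16] for 16-bit operands a and b."""
    carries = [c0]
    for k in range(4):
        g, p = block_pg((a >> (4 * k)) & 0xF, (b >> (4 * k)) & 0xF)
        carries.append(g | (p & carries[-1]))   # C_{4k+4} = G + P * C_{4k}
    return carries

if __name__ == "__main__":
    print(block_carries(0x0FFF, 0x0001))
```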
Carry-Look-Ahead Adder – CLA (5)
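[Figure: 16-bit CLA made of four 4-bit adder blocks (labeled A4) for bits 3:0, 7:4, 11:8, and 15:12, each with inputs A, B and output S. Each block sends its group P and G signals (P3-0/G3-0, P7-4/G7-4, P11-8/G11-8, P15-12/G15-12) to a carry generator (CG), which takes C0, returns C4, C8, and C12 to the blocks, and outputs G15-0 and P15-0]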
Review: XOR FA
[Figure: XOR-based full adder schematic with inputs A, B, Cin and outputs S, Cout]
16 transistors
Review: CPL FA
[Figure: complementary pass-transistor logic (CPL) full adder with dual-rail inputs A/!A, B/!B, Cin/!Cin and dual-rail outputs S/!S, Cout/!Cout]
20+8 transistors, dual rail – beware of threshold drops
Delay Balanced FA
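Identical delays for carry and sum
[Figure: delay-balanced full adder in three sections: signal set-up (forms P and !P from A, B, !B), carry generation (selects between A and Cin under P/!P to produce !Cout), and sum generation (combines Cin with P/!P to produce S)]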
20+2 transistors
Review: Mirror Adder
[Figure: mirror adder schematic with inputs A, B, Cin and outputs !Cout, !S; the branches of the carry stage are labeled kill, generate, 0-propagate, and 1-propagate, and transistor widths are annotated on the devices]
24+4 transistors
Cout = A&B | B&Cin | A&Cin
SUM = A&B&Cin | !Cout&(A | B | Cin)
Sizing: Each input in the carry circuit has a logical effort of 2, so the optimal fan-out for each is also 2. Since !Cout drives 2 internal and 2 inverter transistor gates (to form Cin for the next most significant bit adder), the carry circuit should be oversized. PMOS/NMOS ratio of 2.
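The mirror adder forms the sum from the inverted carry, so that SUM = A&B&Cin | !Cout&(A | B | Cin). A small Python check (the function name `mirror_adder` is illustrative) confirms that this expression matches A ⊕ B ⊕ Cin for all inputs.

```python
from itertools import product

def mirror_adder(a, b, cin):
    """Return (s, cout) using the mirror adder equations."""
    cout = (a & b) | (b & cin) | (a & cin)
    ncout = cout ^ 1                           # !Cout, available inside the cell
    s = (a & b & cin) | (ncout & (a | b | cin))
    return s, cout

if __name__ == "__main__":
    for a, b, cin in product((0, 1), repeat=3):
        print(a, b, cin, mirror_adder(a, b, cin))
```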
Mirror Adder Features
• The NMOS and PMOS chains are completely symmetrical with a maximum of two series transistors in the carry circuitry, guaranteeing identical rise and fall transitions if the NMOS and PMOS devices are properly sized.
• When laying out the cell, the most critical issue is the minimization of the capacitances at node !Cout (four diffusion capacitances, two internal gate capacitances, and two inverter gate capacitances). Shared diffusions can reduce the stack node capacitances.
• The transistors connected to Cin are placed closest to the output.
• Only the transistors in the carry stage have to be optimized for optimal speed. All transistors in the sum stage can be minimal size.
A 64-bit Adder/Subtractor
• Ripple Carry Adder (RCA) built out of 64 FAs
• Subtraction – complement all subtrahend bits (XOR gates) and set the low order carry-in
• RCA
– advantage: simple logic, so small (low cost)
– disadvantage: slow (O(N) for N bits) and lots of glitching (so lots of energy consumption)
[Figure: 64 1-bit FAs chained from C0 = Cin to C64 = Cout, with inputs A0..A63 and B0..B63, outputs S0..S63, and an add/subt control that feeds an XOR gate on each B input and the low order carry-in]
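A bit-level Python sketch of the adder/subtractor described above: each B bit passes through an XOR gate controlled by add/subt, and the same control sets the low order carry-in. The function name and the `subtract` and `n` parameters are assumptions for illustration.

```python
def ripple_add_sub(a, b, subtract=False, n=64):
    """N-bit ripple-carry add or subtract; returns (result, carry_out)."""
    ctl = 1 if subtract else 0
    carry = ctl                        # low order carry-in is 1 for subtraction
    result = 0
    for i in range(n):                 # the carry ripples through all N full adders
        ai = (a >> i) & 1
        bi = ((b >> i) & 1) ^ ctl      # XOR gate complements the subtrahend bit
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        result |= s << i
    return result, carry

if __name__ == "__main__":
    print(ripple_add_sub(5, 3))                  # addition
    print(ripple_add_sub(5, 3, subtract=True))   # two's complement subtraction
```

The loop body runs once per bit in sequence, which mirrors the O(N) worst-case delay of the hardware.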
Ripple Carry Adder (RCA)
[Figure: 4-bit ripple carry adder: four FAs with inputs A0..A3, B0..B3 and outputs S0..S3, chained from C0 = Cin to Cout = C4]
T = O(N) worst case delay
Tadder ≈ TFA(A,B→Cout) + (N-2)TFA(Cin→Cout) + TFA(Cin→S)
Real Goal: Make the fastest possible carry path
Inversion Property
[Figure: an FA with all inputs and outputs inverted is equivalent (≡) to a plain FA]
!S(A, B, Cin) = S(!A, !B, !Cin)
!Cout(A, B, Cin) = Cout(!A, !B, !Cin)
• Inverting all inputs to a FA results in inverted values for all outputs
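The inversion property is easy to confirm exhaustively. The sketch below uses illustrative names (`fa`, `inversion_holds`) and checks both identities over all eight input combinations.

```python
from itertools import product

def fa(a, b, cin):
    """Plain full adder: returns (s, cout)."""
    return a ^ b ^ cin, (a & b) | (a & cin) | (b & cin)

def inversion_holds():
    """True if inverting all inputs inverts both outputs for every input."""
    for a, b, cin in product((0, 1), repeat=3):
        s, cout = fa(a, b, cin)
        s_inv, cout_inv = fa(a ^ 1, b ^ 1, cin ^ 1)
        if (s ^ 1, cout ^ 1) != (s_inv, cout_inv):
            return False
    return True

if __name__ == "__main__":
    print(inversion_holds())
```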
Exploiting the Inversion Property
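[Figure: 4-bit ripple carry adder built from FA’ cells without carry inverters, with inputs A0..A3, B0..B3, outputs S0..S3, and carries from C0 = Cin to Cout = C4; regular cells and inverted cells alternate along the chain]
Now need two “flavors” of FAs: regular cell and inverted cell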
• Minimizes the critical path (the carry chain) by eliminating inverters between the FAs (will need to increase the transistor sizing on the carry chain portion of the mirror adder).
Fast Carry Chain Design
• The key to fast addition is a low latency carry network
• What matters is whether in a given position a carry is
– generated Gi = Ai & Bi = AiBi
– propagated Pi = Ai ⊕ Bi (sometimes use Ai | Bi)
– annihilated (killed) Ki = !Ai & !Bi
• Giving a carry recurrence of Ci+1 = Gi | PiCi
C1 = G0 | P0C0
C2 = G1 | P1G0 | P1P0C0
C3 = G2 | P2G1 | P2P1G0 | P2P1P0C0
C4 = G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0C0
Manchester Carry Chain
• Switches controlled by Gi and Pi
• Total delay of
– time to form the switch control signals Gi and Pi
– setup time for the switches
– signal propagation delay through N switches in the worst case
[Figure: one Manchester carry chain stage between nodes !Ci and !Ci+1, with control signals Gi, Pi, and clk]
4-bit Sliced MCC Adder
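[Figure: 4-bit sliced Manchester carry chain adder: each slice forms G (AND) and P (XOR) from Ai, Bi; the clocked chain carries !C0 through !C1, !C2, !C3 to !C4; an XOR per slice produces S0..S3]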
Domino Manchester Carry Chain Circuit
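[Figure: 4-bit domino Manchester carry chain from Ci,0 to Ci,4, clocked by clk, with switches controlled by P0..P3 and G0..G3; chain nodes are numbered and transistor sizes are annotated. The chain nodes carry the following values:]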
!(G0 | P0 Ci,0)
!(G1 | P1G0 | P1P0 Ci,0)
!(G2 | P2G1 | P2P1G0 | P2P1P0 Ci,0)
!(G3 | P3G2 | P3P2G1 | P3P2P1G0 | P3P2P1P0 Ci,0)
Binary Adder Landscape
synchronous word parallel adders
  ripple carry adders (RCA)
  carry prop min adders
    signed-digit adders
    fast carry prop adders
      Manchester carry chain
      carry select
      parallel prefix
      conditional sum
      carry skip
    residue adders
Complexity labels on the figure (T = delay, A = area): T = O(N), A = O(N); T = O(1), A = O(N); T = O(log N), A = O(N log N); T = O(√N), A = O(N); T = O(N), A = O(N)
Carry-Skip (Carry-Bypass) Adder
If (P0 & P1 & P2 & P3 = 1) then Co,3 = Ci,0 otherwise the block itself kills or generates the carry internally
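[Figure: 4-bit ripple block of four FAs with inputs A0..A3, B0..B3 and outputs S0..S3; a bypass multiplexer controlled by BP selects between the block carry-in Ci,0 and the rippled carry to form Co,3]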
BP = P0 P1 P2 P3 “Block Propagate”
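The bypass rule can be modeled functionally in Python. This sketch assumes equal block sizes and uses hypothetical names (`carry_skip_add`, `skips`); it counts how many blocks take the bypass path but does not model timing.

```python
def carry_skip_add(a, b, n=16, block=4, cin=0):
    """Carry-skip addition; returns (sum, carry_out, number_of_skipped_blocks)."""
    carry, result, skips = cin, 0, 0
    for lo in range(0, n, block):
        block_in = carry
        bp = 1                                   # block propagate = AND of all Pi
        for i in range(lo, min(lo + block, n)):
            ai, bi = (a >> i) & 1, (b >> i) & 1
            p = ai ^ bi
            bp &= p
            result |= (p ^ carry) << i           # Si = Pi xor Ci
            carry = (ai & bi) | (p & carry)      # ripple inside the block
        if bp:
            carry = block_in                     # bypass mux forwards the block carry-in
            skips += 1
    return result, carry, skips

if __name__ == "__main__":
    print(carry_skip_add(0xFFFF, 0x0000, cin=1))
```

When BP = 1 the rippled carry equals the block carry-in anyway, so the bypass changes only the delay, not the result.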
Carry-Skip Chain Implementation
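[Figure: carry-skip chain: a Manchester-style chain with inputs Cin, G0..G3, P0..P3 produces the block carry-out !Cout; a bypass path controlled by BP connects the block carry-in to the carry-out]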
4-bit Block Carry-Skip Adder
Worst-case delay → carry from bit 0 to bit 15: the carry is generated in bit 0, ripples through bits 1, 2, and 3, skips the middle two groups, and ripples in the last group from bit 12 to bit 15 (B is the group size in bits)
[Figure: four 4-bit stages (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15), each with Setup, Carry Propagation, and Sum sections; Ci,0 enters the first stage]
Tadd = tsetup + B tcarry + ((N/B) - 1) tskip + B tcarry + tsum
Optimal Block Size and Time
• Assuming one stage of ripple (tcarry) has the same delay as one skip logic stage (tskip) and both are 1
\[ T_{CSkA} = \underbrace{1}_{t_{setup}} + \underbrace{B}_{\text{ripple in block 0}} + \underbrace{(N/B - 1)}_{\text{skips}} + \underbrace{B}_{\text{ripple in last block}} + \underbrace{1}_{t_{sum}} = 2B + N/B + 1 \]
• So the optimal block size, B, is
\[ dT_{CSkA}/dB = 0 \;\Rightarrow\; B_{opt} = \sqrt{N/2} \]
• And the optimal time is
\[ T_{CSkA,opt} = 2\sqrt{2N} + 1 \]
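The optimum follows from dT/dB = 2 - N/B² = 0. A short numeric sketch, with illustrative function names and the unit-delay model of this slide, compares the closed form against the delay evaluated at integer block sizes.

```python
import math

def t_cska(n, b):
    """Carry-skip delay in unit delays: T = 2B + N/B + 1."""
    return 2 * b + n / b + 1

def optimal_block(n):
    """Closed-form optimum: (B_opt, T_opt) = (sqrt(N/2), 2*sqrt(2N) + 1)."""
    return math.sqrt(n / 2), 2 * math.sqrt(2 * n) + 1

if __name__ == "__main__":
    n = 32
    print(optimal_block(n))
    # Best integer block size found by direct search over 1..N.
    print(min(range(1, n + 1), key=lambda b: t_cska(n, b)))
```

For N = 32 the closed form gives B = 4 and T = 17, and the direct search over integer block sizes agrees.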
Carry-Skip Adder Extensions
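• Variable block sizes
– A carry that is generated in, or absorbed by, one of the inner blocks travels a shorter distance through the skip blocks, so the inner blocks can be bigger without increasing the overall delay
[Figure: carry-skip adder from Cin to Cout with small outer blocks and larger inner blocks]
• Multiple levels of skip logic
– the level 2 skip signal is the AND of the first level skip signals (BP’s)
[Figure: carry-skip adder from Cin to Cout with skip level 1 over each block and skip level 2 spanning several blocks]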
Carry-Skip Adder Comparisons
[Figure: delay comparison at 8, 16, 32, 48, and 64 bits for RCA, CSkA, and VSkA; vertical axis from 0 to 70; CSkA points are annotated with block sizes B=2 through B=6]
Carry Select Adder
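• Precompute the carry out of each block for both carry_in = 0 and carry_in = 1 (can be done for all blocks in parallel) and then select the correct one
[Figure: 4-bit carry select block: a 4-b Setup stage takes A’s and B’s and produces P’s and G’s; a “0” carry propagation chain and a “1” carry propagation chain run in parallel; a multiplexer controlled by Cin selects the C’s and Cout; Sum generation produces the S’s]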
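A functional Python sketch of carry selection follows. Both candidate results of every block are computed without knowing the incoming carry, and the actual carry then only drives a selection. The names are illustrative, `n` is assumed to be a multiple of `block`, and the helper `ripple` uses integer addition as a stand-in for a block-level ripple adder.

```python
def ripple(a, b, cin, width):
    """Stand-in for a width-bit ripple adder; returns (sum, carry_out)."""
    total = a + b + cin
    return total & ((1 << width) - 1), total >> width

def carry_select_add(a, b, n=16, block=4, cin=0):
    """Carry-select addition; returns (sum, carry_out)."""
    mask = (1 << block) - 1
    result, carry = 0, cin
    for lo in range(0, n, block):
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        s0, c0 = ripple(a_blk, b_blk, 0, block)      # "0" carry chain
        s1, c1 = ripple(a_blk, b_blk, 1, block)      # "1" carry chain
        s, carry = (s1, c1) if carry else (s0, c0)   # multiplexer
        result |= s << lo
    return result, carry

if __name__ == "__main__":
    print(carry_select_add(0x1234, 0x0FFF))
```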
Carry Select Adder: Critical Path
[Figure: 16-bit carry select adder as four 4-bit stages (bits 0 to 3, 4 to 7, 8 to 11, 12 to 15). Each stage has Setup (A’s, B’s in; P’s, G’s out), “0” carry and “1” carry chains, a mux selected by the incoming carry, and Sum gen (C’s in; S’s out). Cin drives the first mux and Cout leaves the last. Critical path delays annotated on the figure: setup 1, carry chain +4, muxes +1+1+1+1, sum +1]
Tadd = tsetup + B tcarry + (N/B) tmux + tsum
Square Root Carry Select Adder
[Figure: 20-bit square root carry select adder with stages of increasing width (bits 0 to 1, 2 to 4, 5 to 8, 9 to 13, 14 to 19). Each stage has Setup (A’s, B’s in; P’s, G’s out), “0” carry and “1” carry chains, a mux, and Sum gen (C’s in; S’s out). Delays annotated on the figure: setup 1, first stage carry +2, later stage carry chains +3, +4, +5, +6, muxes +1+1+1+1+1, sum +1]
Tadd = tsetup + 2 tcarry + √N tmux + tsum
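The benefit of growing block sizes can be seen with a small unit-delay model. This is a sketch under the assumption that every setup, carry, mux, and sum stage costs one unit, as in the figure annotations; `select_delay` is a hypothetical helper. Each mux can switch only after both its own carry chains and the incoming select signal are ready.

```python
def select_delay(block_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Estimated carry-select delay for the given block sizes (LSB block first)."""
    select = 0                                   # time when the incoming carry is known
    for size in block_sizes:
        chains_ready = t_setup + size * t_carry  # both carry chains finish in parallel
        select = max(select, chains_ready) + t_mux
    return select + t_sum

if __name__ == "__main__":
    print(select_delay([4, 4, 4, 4]))        # 16-bit linear carry select
    print(select_delay([4, 4, 4, 4, 4]))     # 20-bit linear carry select
    print(select_delay([2, 3, 4, 5, 6]))     # 20-bit square root carry select
```

With blocks of 2, 3, 4, 5, and 6 bits each carry chain finishes just as its select signal arrives, so no stage waits and the 20-bit adder takes 9 units compared with 11 for five equal 4-bit blocks.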
Parallel Prefix Adders (PPAs)
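• Define carry operator € on (G,P) signal pairs
(G’’,P’’) € (G’,P’) = (G,P), where
G = G’’ ∨ P’’G’
P = P’’P’
– € is associative, i.e., [(g’’’,p’’’) € (g’’,p’’)] € (g’,p’) = (g’’’,p’’’) € [(g’’,p’’) € (g’,p’)]
[Figure: € cell with inputs (G’’,P’’) and (G’,P’) and output (G,P); a small tree of € cells illustrating regrouping; gate-level cell with inputs G’, G’’, P’’ and output !G]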
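Associativity of the carry operator can be verified exhaustively over all (G,P) pairs. In this sketch the function `dot` stands for the € operator, with the left argument taken as the more significant pair; the names are illustrative.

```python
from itertools import product

def dot(left, right):
    """Carry operator: (G'',P'') op (G',P') -> (G'' | P''&G', P''&P')."""
    g2, p2 = left
    g1, p1 = right
    return g2 | (p2 & g1), p2 & p1

PAIRS = list(product((0, 1), repeat=2))

# Regrouping never changes the result.
associative = all(
    dot(dot(x, y), z) == dot(x, dot(y, z))
    for x, y, z in product(PAIRS, repeat=3)
)

# Swapping the operands can change the result.
commutative = all(dot(x, y) == dot(y, x) for x, y in product(PAIRS, repeat=2))

if __name__ == "__main__":
    print(associative, commutative)
```

For example, `dot((1, 0), (0, 0))` gives (1, 0) while `dot((0, 0), (1, 0))` gives (0, 0), so the operator is associative but not commutative.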
PPA General Structure
• Given P and G terms for each bit position, computing all the carries is equal to finding all the prefixes in parallel
(G0,P0) € (G1,P1) € (G2,P2) € … € (GN-2,PN-2) € (GN-1,PN-1)
• Since € is associative, we can group them in any order
– but note that it is not commutative
• Measures to consider
– number of € cells
– tree cell depth (time)
– tree cell area
– cell fan-in and fan-out
– max wiring length
– wiring congestion
– delay path variation (glitching)
[Figure: PPA structure in three stages: Pi, Gi logic (1 unit delay); Ci parallel prefix logic tree (1 unit delay per level); Si logic (1 unit delay)]
Brent-Kung PPA
[Figure: 16-bit Brent-Kung parallel prefix computation: inputs (G0,P0) through (G15,P15) and Cin, a tree of € cells, and outputs C1 through C16. Annotations on the figure: T = log2N, T = log2N - 2, A = 2log2N, A = N/2]
Kogge-Stone PPF Adder
[Figure: 16-bit Kogge-Stone parallel prefix computation: inputs (G0,P0) through (G15,P15) and Cin, rows of € cells, and outputs C1 through C16. Annotations on the figure: T = log2N, A = log2N, A = N]
Tadd = tsetup + log2N t€ + tsum
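A Kogge-Stone style prefix computation can be sketched in Python as below. It is a functional model, not a description of the circuit in the figure: the name `kogge_stone_add` is illustrative, Cin is folded into the bit 0 generate term as a simplifying assumption, and each pass of the `while` loop corresponds to one level of € cells.

```python
def kogge_stone_add(a, b, n=16, cin=0):
    """Parallel prefix addition; returns (sum, carry_out, prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]       # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]     # propagate (XOR form)
    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)          # assumption: fold Cin into position 0
    levels, d = 0, 1
    while d < n:                        # log2(N) levels for N a power of two
        new_g, new_p = G[:], P[:]
        for i in range(d, n):           # every cell in a level works in parallel
            new_g[i] = G[i] | (P[i] & G[i - d])
            new_p[i] = P[i] & P[i - d]
        G, P = new_g, new_p
        d *= 2
        levels += 1
    carries = [cin] + G                 # carries[i] is the carry into bit i
    total = 0
    for i in range(n):
        total |= (p[i] ^ carries[i]) << i
    return total, carries[n], levels

if __name__ == "__main__":
    print(kogge_stone_add(0xABCD, 0x1234))
```

For N = 16 the loop runs four times, matching the log2N term in the delay expression.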
More Adder Comparisons
[Figure: delay comparison at 8, 16, 32, 48, and 64 bits for RCA, CSkA, VSkA, and KS PPA; vertical axis from 0 to 70]
Adder Speed Comparisons
[Figure: adder speed comparison at 16, 32, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCSlA, and B&K; vertical axis from 10 to 70]
Adder Average Power Comparisons
[Figure: average power comparison at 16, 32, and 64 bits for RCA, MCC, CCSkA, VCSkA, CCSlA, and B&K; vertical axis from 0 to 35]
PDP of Adder Comparisons
[Figure: power-delay product (PDP) comparison at 8, 16, 32, 48, and 64 bits for RCA, MCCA, CCSkA, VCSkA, CCSlA, and BKA; vertical axis from 0 to 100]
From Nagendra, 1996