ee241 - spring 2011bwrcs.eecs.berkeley.edu/classes/icdesign/ee241_s11/...carry tree considerations...
TRANSCRIPT
-
1
EE241 - Spring 2011
Advanced Digital Integrated Circuits
Lecture 23: Wrap-up
Announcements
Homework #4 due today
Quiz #4 today
Final exam on Wednesday!
80 minutes, in class
Project reports due next Wednesday, May 4, noon
6 pages, double column
Project presentations next Wednesday, May 4, at 2pm in BWRC
20 min + 5 min Q&A
-
2
Outline
Last lecture
Other dynamic logic styles
Adders: Conditional sum and carry-lookahead
This lecture
Finish adders
Perspective
Reading: Selected publications
Adders
-
3
Carry Tree Considerations
Number of signals merging at each stage (radix)
Uniform vs. non-uniform
Number of logic levels
Full vs. sparse trees
Tree Adders: Kogge-Stone
[Figure: 16-bit radix-2 Kogge-Stone tree; inputs (A0, B0) to (A15, B15), outputs S0 to S15]
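A radix-2 Kogge-Stone tree can be sketched in software as a parallel-prefix computation over (generate, propagate) pairs. The following is a minimal illustration, not the circuit from the slide: the function name `kogge_stone_add`, the absence of a carry-in, and the use of OR for the propagate signal are assumptions made for this sketch.

```python
def kogge_stone_add(a: int, b: int, width: int = 16) -> int:
    """Add two unsigned integers modulo 2**width with a radix-2 Kogge-Stone prefix tree."""
    # Bitwise generate and propagate signals, LSB first.
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]

    G, P = g[:], p[:]
    dist = 1
    while dist < width:
        # One tree level: position i merges with position i - dist.
        new_G, new_P = G[:], P[:]
        for i in range(dist, width):
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2

    # G[i] now holds G_{i:0}, the carry out of bit i.
    result = 0
    for i in range(width):
        carry_in = G[i - 1] if i > 0 else 0
        result |= (((a >> i) ^ (b >> i) ^ carry_in) & 1) << i
    return result
```

For 16 bits the loop runs four levels (distances 1, 2, 4, 8), and every bit position gets its own carry, which is what makes the tree "full".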
-
4
Tree Adders: Other Trees
Ladner-Fischer
[Figure: 16-bit Ladner-Fischer tree; inputs (A0, B0) to (A15, B15), outputs S0 to S15]
Tree Adders: Radix 4
[Figure: 16-bit radix-4 Kogge-Stone tree; inputs (a0, b0) to (a15, b15), outputs S0 to S15]
-
5
Ling Adder
CLA:
$g_i = a_i b_i$
$p_i = a_i + b_i$
$G_{i:0} = g_i + p_i G_{i-1:0}$
$S_i = a_i \oplus b_i \oplus G_{i-1:0}$
Ling’s equations:
$g_i = a_i b_i$
$t_i = a_i + b_i$
$H_{i:0} = g_i + t_{i-1} H_{i-1:0}$
$S_i = (t_i \oplus H_{i:0}) + g_i t_{i-1} H_{i-1:0}$
Ling, IBM J. Res. Dev, 5/81
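The two sets of recurrences can be compared bit by bit in software. The sketch below is only a check under stated assumptions: there is no carry-in (so the terms with index -1 are taken as 0), the CLA propagate is taken as the OR of the inputs, and the function names are hypothetical.

```python
def cla_sum_bits(a: int, b: int, width: int) -> list:
    """Sum bits (LSB first) from the conventional recurrence G_i = g_i + p_i G_{i-1}."""
    G_prev = 0  # no carry-in assumed
    bits = []
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai | bi
        bits.append(ai ^ bi ^ G_prev)      # S_i = a_i xor b_i xor G_{i-1:0}
        G_prev = g | (p & G_prev)          # G_{i:0}
    return bits


def ling_sum_bits(a: int, b: int, width: int) -> list:
    """Sum bits (LSB first) from Ling's recurrence H_i = g_i + t_{i-1} H_{i-1}."""
    H_prev, t_prev = 0, 0  # no carry-in assumed
    bits = []
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t = ai & bi, ai | bi
        H = g | (t_prev & H_prev)                        # H_{i:0}
        bits.append((t ^ H) | (g & t_prev & H_prev))     # Ling's sum equation
        H_prev, t_prev = H, t
    return bits


def ling_matches_cla(width: int) -> bool:
    """Exhaustively compare both formulations for all operand pairs of the given width."""
    return all(
        cla_sum_bits(a, b, width) == ling_sum_bits(a, b, width)
        for a in range(1 << width)
        for b in range(1 << width)
    )
```

Calling `ling_matches_cla(6)` walks all 4096 operand pairs; the pseudo-carry H drops one factor from the carry recurrence at the cost of the extra term in the sum.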
Ling Adder
Conventional radix-4:
$G_{3:0} = g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0$
Ling’s radix-4:
$H_{3:0} = g_3 + t_2 g_2 + t_2 t_1 g_1 + t_2 t_1 t_0 g_0 = g_3 + g_2 + t_2 g_1 + t_2 t_1 g_0$
Reduces the stack height (or width)
Reduces input loading
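The radix-4 simplification relies on the identity g_i t_i = g_i, which can be confirmed by brute force over one 4-bit group. This is a sketch only; it assumes the propagate signal is the OR of the inputs (so p_i and t_i coincide), and the helper names are made up for the example.

```python
from itertools import product


def radix4_group(a_bits, b_bits):
    """Return (G, H_expanded, H_simplified, t3) for one 4-bit group, index 0 = LSB."""
    g = [x & y for x, y in zip(a_bits, b_bits)]
    t = [x | y for x, y in zip(a_bits, b_bits)]
    # Conventional group generate: tallest product term has four factors.
    G = g[3] | (t[3] & g[2]) | (t[3] & t[2] & g[1]) | (t[3] & t[2] & t[1] & g[0])
    # Ling pseudo-carry, expanded form.
    H_expanded = g[3] | (t[2] & g[2]) | (t[2] & t[1] & g[1]) | (t[2] & t[1] & t[0] & g[0])
    # Simplified form: tallest product term has three factors.
    H_simplified = g[3] | g[2] | (t[2] & g[1]) | (t[2] & t[1] & g[0])
    return G, H_expanded, H_simplified, t[3]


def check_radix4_identities() -> bool:
    """Check H_expanded == H_simplified and G == t3 & H for all 256 input combinations."""
    for bits in product((0, 1), repeat=8):
        G, H_exp, H_simp, t3 = radix4_group(bits[:4], bits[4:])
        if H_exp != H_simp or G != (t3 & H_simp):
            return False
    return True
```

The simplified H has a three-factor worst-case term instead of four, which corresponds to the shorter transistor stack noted on the slide; the true carry is recovered as t_3 H_{3:0}.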
-
6
Ling vs. CLA
[Figure: clocked (CK) gate schematics for the conventional G3 and Ling’s H3, driven by the a and b input bits]
Ling vs. CLA: Sum Pre-Computation
Conventional CLA:
$S_i^0 = a_i \oplus b_i$
$S_i^1 = \overline{a_i \oplus b_i}$
Ling’s:
$S_i^0 = a_i \oplus b_i$
$S_i^1 = a_i \oplus b_i \oplus (a_{i-1} + b_{i-1})$
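Sum pre-computation means both candidate sum bits are prepared in advance and the late-arriving carry only drives a 2:1 select. The sketch below models that selection for both schemes; it assumes no carry-in, takes the select signal to be G_{i-1:0} for the conventional CLA and H_{i-1:0} for Ling, and uses a hypothetical function name.

```python
def select_sum_add(a: int, b: int, width: int, ling: bool) -> int:
    """Add modulo 2**width using precomputed sum pairs and a carry-driven select."""
    result = 0
    sel = 0      # G_{i-1:0} (conventional) or H_{i-1:0} (Ling); 0 for bit 0
    t_prev = 0   # a_{i-1} + b_{i-1}; 0 for bit 0
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, t = ai & bi, ai | bi
        x = ai ^ bi
        s0 = x                               # candidate for select = 0
        s1 = (x ^ t_prev) if ling else x ^ 1  # candidate for select = 1
        result |= (s1 if sel else s0) << i
        # Next select signal: Ling pseudo-carry or conventional carry.
        sel = (g | (t_prev & sel)) if ling else (g | (t & sel))
        t_prev = t
    return result
```

In the Ling case the select-1 candidate needs the extra OR of the previous bit pair, which is the added sum complexity the comparison refers to.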
-
7
Ling vs. CLA (64 bit)
[Figure: energy [pJ] vs. delay [FO4] for Radix-2 Ling, Radix-4 Ling, Radix-2 CLA and Radix-4 CLA; delay differences of 0.5 FO4 are annotated]
Ling vs. CLA
Tradeoff between the first carry and the sum circuit complexity
Later carry stages are unchanged from conventional CLA
Reducing the input loading and smaller stack speed up the carry
Sum gets more complex
With tight power constraints Ling is slower than CLA
-
8
Sparse Trees
Not all the carries are calculated
Only every 2nd or 4th
Reduced input capacitance
More complex sum
Sparseness of 2 doubles the sum select loading
Effectively shifting the fanout towards the back
Sparse Trees
[Figure: 16-bit radix-2 sparse Kogge-Stone tree with sparseness of 2 (Han-Carlson); inputs (a0, b0) to (a15, b15), outputs S0 to S15]
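A sparse-2 organization can be sketched by running the prefix tree over 2-bit groups only and letting each computed carry select between precomputed sums for the bit it does not reach directly. This is an illustrative model, not the exact wiring of the figure; the function name, the missing carry-in, and the OR-based propagate are assumptions.

```python
def sparse2_add(a: int, b: int, width: int = 16) -> int:
    """Add modulo 2**width, computing a carry only for every 2nd bit position."""
    assert width % 2 == 0
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]
    p = [((a >> i) | (b >> i)) & 1 for i in range(width)]
    x = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]

    # Generate/propagate of each 2-bit group (bits 2k and 2k+1).
    n = width // 2
    GG = [g[2 * k + 1] | (p[2 * k + 1] & g[2 * k]) for k in range(n)]
    GP = [p[2 * k + 1] & p[2 * k] for k in range(n)]

    # Kogge-Stone prefix over the n groups: half as many columns as a full tree.
    dist = 1
    while dist < n:
        new_G, new_P = GG[:], GP[:]
        for k in range(dist, n):
            new_G[k] = GG[k] | (GP[k] & GG[k - dist])
            new_P[k] = GP[k] & GP[k - dist]
        GG, GP = new_G, new_P
        dist *= 2

    result = 0
    for k in range(n):
        lo, hi = 2 * k, 2 * k + 1
        c = GG[k - 1] if k > 0 else 0      # carry into even bit 2k
        s_lo = x[lo] ^ c
        # Odd bit: both candidates are precomputed, the carry only selects.
        s_hi0 = x[hi] ^ g[lo]              # carry into hi when c = 0
        s_hi1 = x[hi] ^ (g[lo] | p[lo])    # carry into hi when c = 1
        s_hi = s_hi1 if c else s_hi0
        result |= (s_lo << lo) | (s_hi << hi)
    return result
```

Each carry now drives the select for two sum bits rather than one, which mirrors the doubled sum-select loading of a sparseness-2 tree.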
-
9
Sparse Trees
Precomputed sums for sparse Ling adder
Even bit sums unchanged
Odd bit sums complex
$S_i^0 = a_i \oplus b_i \oplus a_{i-1} b_{i-1}$
$S_i^1 = a_i \oplus b_i \oplus (a_{i-1} b_{i-1} + (a_{i-1} + b_{i-1})(a_{i-2} + b_{i-2}))$
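The precomputed sums of a sparse-2 Ling adder can be checked against ordinary addition. The sketch below assumes that only the odd-indexed pseudo-carries are available, that even bits are selected by H_{i-1:0} and odd bits by H_{i-2:0}, and that there is no carry-in; the function names are hypothetical and H is computed serially here purely for checking.

```python
def bit(v: int, i: int) -> int:
    """Bit i of v, or 0 for negative indices (no carry-in assumed)."""
    return (v >> i) & 1 if i >= 0 else 0


def ling_H(a: int, b: int, width: int) -> list:
    """Pseudo-carries H_{i:0} from Ling's recurrence H_i = g_i + t_{i-1} H_{i-1}."""
    H, H_prev, t_prev = [], 0, 0
    for i in range(width):
        ai, bi = bit(a, i), bit(b, i)
        H_i = (ai & bi) | (t_prev & H_prev)
        H.append(H_i)
        H_prev, t_prev = H_i, ai | bi
    return H


def sparse2_ling_add(a: int, b: int, width: int = 8) -> int:
    """Add modulo 2**width using only odd-indexed H values as select signals."""
    H = ling_H(a, b, width)
    result = 0
    for i in range(width):
        x = bit(a, i) ^ bit(b, i)
        g1 = bit(a, i - 1) & bit(b, i - 1)
        t1 = bit(a, i - 1) | bit(b, i - 1)
        t2 = bit(a, i - 2) | bit(b, i - 2)
        if i % 2 == 0:
            sel = H[i - 1] if i >= 1 else 0   # odd index i-1
            s0, s1 = x, x ^ t1                # even bits: unchanged Ling sums
        else:
            sel = H[i - 2] if i >= 2 else 0   # odd index i-2
            s0 = x ^ g1                       # carry into bit i when H_{i-2:0} = 0
            s1 = x ^ (g1 | (t1 & t2))         # carry into bit i when H_{i-2:0} = 1
        result |= (s1 if sel else s0) << i
    return result
```

The odd-bit candidates absorb one more level of carry logic from the two preceding bit pairs, which is why those sums become more complex.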
Sparse Trees
Ladner-Fischer
-
10
Sparse Trees
[Figure: energy [pJ] vs. delay [FO4] in three panels (full trees, sparse-2 trees, sparse-4 trees). Curves 1 to 6 correspond to lateral fanout 1-1-1-1-1-1, 2-2-2-2-2-1, 4-4-4-4-2-1, 8-8-8-4-2-1, 16-16-8-4-2-1 and 32-16-8-4-2-1.]
Zlatanovici, JSSC’09
Other Sparse Trees
Mathew, VLSI’02
-
11
Intel’s 65nm 32b ALU
Wijeratne, ISSCC’06
Radix-2 carry tree generates every fourth carry
73% fewer carry-merge gates
80% reduction in wiring complexity vs. dense parallel-prefix adders
Grouping Gates
[Figure: energy [pJ] vs. delay [FO4] for Radix-2 full, Radix-4 full, Radix-2 Ladner-Fischer full, Radix-4 sparse-2 and Radix-4 Ladner-Fischer sparse-4 trees; g = grouped sizing, f = flat sizing]
-
12
Sparse Trees
[Figure: energy [pJ] vs. delay [FO4] in two panels (1-1-1-1-1-1 trees and 32-16-8-4-2-1 trees). Curves 1 to 3 correspond to sparseness: full, sparse-2 and sparse-4.]
What is the fastest 64-b adder?
32-Bit Adders
Patil, ARITH’07
-
13
Hybrid Adders
DEC Alpha 21064
Dobberpuhl, JSSC 11/92
DEC Adder
Combination:
8-bit tapered pre-discharged Manchester carry chains, with Cin = 0 and Cin = 1
32-bit LSB carry-lookahead
32-bit MSB conditional sum adder
Carry-select on most significant bits
Latch-based timing
-
14
Another DEC Adder
Propagate-kill cell
Group propagate-kill
Kowaleski, ISSCC’96
This Class
Tried to put design choices in perspective of technology
The design constraints have changed and will be changing
Cost, energy (power, leakage, …), performance
Stressed variability, power-performance tradeoffs
Did not cover multipliers, power regulation/distribution (class projects), I/O
-
15
This Field
Moore’s law will end sometime during your (my?) career
28nm in 2011 scales to 0.1nm by 2050 with 2-yr cycles (or to 1nm with 4-yr cycles)
Physics will stop CMOS somewhere around 5nm
We will see a different CMOS device beforehand
Economics will likely stop it earlier
And the nodes will be stretched out
Don’t worry: There are plenty of problems that we don’t know how to solve today, and they will be around for a while!
Even filling 10B/100B/1 trillion transistor chips with SRAM is not trivial!
Technology Strategy / Roadmap
[Figure: timeline from 2000 to 2030 showing research (R) and development (D) phases for Plan A: Extending Si CMOS; Plan B: Subsystem Integration; Plan C: Post Si CMOS Options; Plan Q: Quantum Computing]
T.C. Chen, Where Si-CMOS is going: Trendy Hype vs. Real Technology, ISSCC’06