Chapter 4
Optimizing Power @ Design Time – Circuit-Level Techniques
Jan M. Rabaey
Optimizing Power @ Design Time
Circuits
Dejan Markovic, Borivoje Nikolic
Slide 4.1
Chapter Outline
Optimization framework for energy–delay trade-off
Dynamic-power optimization
– Multiple supply voltages
– Transistor sizing
– Technology mapping
Static-power optimization
– Multiple thresholds
– Transistor stacking
Slide 4.2
Energy/Power Optimization Strategy
For a given function and activity, an optimal operation point can be derived in the energy–performance space
Time of optimization depends upon activity profile
Different optimizations apply to active and static power
[Table: time of optimization by activity profile, for both active and static power: Fixed Activity → Design time; Variable Activity → Run time; No Activity – Standby → Sleep]
Slide 4.3
Energy–Delay Optimization and Trade-off
Maximize throughput for given energy, or
Minimize energy for given throughput
Other important metrics: Area, Reliability, Reusability
[Figure: energy/op versus delay, showing an unoptimized design point and the trade-off space spanning Dmin to Dmax and Emin to Emax]
Slide 4.4
The Design Abstraction Stack
A very rich set of design parameters to consider!
It helps to consider options in relation to their abstraction layer
System/Application: choice of algorithm
Software: amount of concurrency
(Micro-)Architecture: parallel versus pipelined, general purpose versus application-specific
Logic/RT: logic family, standard cell versus custom
Circuit (this chapter): sizing, supply, thresholds
Device: bulk versus SOI
Slide 4.5
Optimization Can/Must Span Multiple Levels
Design optimization combines top-down and bottom-up: “meet-in-the-middle”
[Figure: optimization spanning the Architecture, Micro-Architecture, and Circuit (Logic & FFs) levels]
Slide 4.6
Energy–Delay Optimization
[Figure: energy/op versus delay for topology A and topology B, and the globally optimal energy–delay curve for a given function]
Slide 4.7
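Given characterized (delay, energy/op) points for each topology, the globally optimal curve for the function consists of the points that no other point beats in both delay and energy. A minimal Python sketch of that filter; the sample points are hypothetical, not taken from the slide.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (delay, energy/op), arbitrary units


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated (delay, energy) points, sorted by delay.

    A point is dropped if some other point has delay <= and energy <=
    (with at least one strictly smaller).
    """
    front: List[Point] = []
    best_energy = float("inf")
    # Sorting by (delay, energy) puts the cheapest point first at each delay.
    for delay, energy in sorted(points):
        # Keep a point only if it beats every faster-or-equal point seen so far.
        if energy < best_energy:
            front.append((delay, energy))
            best_energy = energy
    return front


# Hypothetical characterization data for two topologies.
topology_a = [(1.0, 10.0), (1.5, 6.0), (2.0, 4.5), (3.0, 4.0)]
topology_b = [(1.2, 12.0), (1.8, 5.0), (2.5, 3.0), (3.5, 2.8)]

global_curve = pareto_front(topology_a + topology_b)
print(global_curve)
```

Points from both topologies survive here, the case where neither topology alone is optimal over the whole delay range.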
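Energy–Delay Sensitivities
Some Optimization Observations
$$S_A = \left.\frac{\partial E/\partial A}{\partial D/\partial A}\right|_{A=A_0}$$
[Figure: energy versus delay; the curves f(A, B0) and f(A0, B) pass through the design point (A0, B0) at delay D0, with slopes SA and SB]
[Ref: V. Stojanovic, ESSCIRC’02]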
Slide 4.8
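Finding the Optimal Energy–Delay Curve
Pareto-optimal: no metric can be improved without degrading at least one other metric.
$$\Delta E = S_A \cdot (-\Delta D) + S_B \cdot \Delta D$$
On the optimal curve, all sensitivities must be equal
If the sensitivities differ, moving ΔD of delay between the two parameters lowers energy at constant delay, so the design is not yet on the optimal curve.
[Figure: energy versus delay around the design point (A0, B0) at D0, showing curves f(A0, B), f(A, B0), and f(A1, B) and a delay step ΔD]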
Slide 4.9
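The ΔE expression can be checked numerically. In this hedged Python sketch, sensitivities are ∂E/∂D values (negative: more delay, less energy); parameter A buys back ΔD of delay while parameter B gives it up, so total delay is unchanged. The sensitivity values are made up for illustration.

```python
def delta_energy(s_a: float, s_b: float, delta_d: float) -> float:
    """Net energy change when knob A removes delta_d of delay and knob B adds it back.

    s_a, s_b: energy-delay sensitivities dE/dD (normally negative).
    delta_d:  delay shifted between the knobs (> 0); total delay is unchanged.
    Implements dE = S_A * (-dD) + S_B * dD.
    """
    return s_a * (-delta_d) + s_b * delta_d


# Hypothetical sensitivities: B is steeper than A, so the trade saves energy.
print(delta_energy(s_a=-1.0, s_b=-3.0, delta_d=0.1))
# Equal sensitivities: no net saving is possible.
print(delta_energy(s_a=-2.0, s_b=-2.0, delta_d=0.1))
```

A negative result means energy can still be saved at the same delay; with SA = SB the net change is zero.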
Reducing Active Energy @ Design Time
$$P_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD} \cdot f \qquad E_{active} \sim \alpha \cdot C_L \cdot V_{swing} \cdot V_{DD}$$
Reducing voltages
– Lowering the supply voltage (VDD) at the expense of clock speed
– Lowering the logic swing (Vswing)
Reducing transistor sizes (CL)
– Slows down logic
Reducing activity (α)
– Reducing switching activity through transformations
– Reducing glitching by balancing logic
Slide 4.10
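The proportionalities above can be evaluated directly. A minimal Python sketch with illustrative values (activity 0.1, 10 fF load); it assumes full-swing logic when scaling VDD, so Vswing scales together with VDD.

```python
def active_energy(alpha: float, c_load: float, v_swing: float, v_dd: float) -> float:
    """E_active ~ alpha * C_L * V_swing * V_DD (joules per operation)."""
    return alpha * c_load * v_swing * v_dd


def active_power(alpha: float, c_load: float, v_swing: float, v_dd: float, f: float) -> float:
    """P_active ~ alpha * C_L * V_swing * V_DD * f (watts)."""
    return active_energy(alpha, c_load, v_swing, v_dd) * f


# Full-swing logic: a 30% lower VDD also lowers the swing, so energy drops to 0.7^2 = 0.49x.
e_nom = active_energy(0.1, 10e-15, 1.0, 1.0)
e_low = active_energy(0.1, 10e-15, 0.7, 0.7)
print(e_low / e_nom)
```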
Observation
Downsizing and/or lowering the supply on the critical path lowers the operating frequency
Downsizing non-critical paths reduces energy for free, but
– Narrows down the path-delay distribution
– Increases impact of variations, impacts robustness
[Figure: histograms of the number of paths versus path delay tp(path), relative to the target delay, before and after downsizing non-critical paths]
Slide 4.11
Circuit Optimization Framework
minimize $E(V_{DD}, V_{TH}, W)$ subject to $D(V_{DD}, V_{TH}, W) \le D_{con}$
Constraints
– $V_{DD,min} < V_{DD} < V_{DD,max}$
– $V_{TH,min} < V_{TH} < V_{TH,max}$
– $W_{min} < W$
Reference case
– Dmin sizing @ VDD max, VTH ref
[Figure: energy/op versus delay for topology A and topology B]
[Ref: V. Stojanovic, ESSCIRC’02]
Slide 4.12
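The framework is an ordinary constrained minimization. As an illustration only, the Python sketch below does a brute-force search over two of the knobs, VDD and the size W of a single gate driving a fixed load, while holding VTH fixed. It borrows the alpha-power delay form and 90 nm fit values from the delay-model slide; the load C_LOAD, the energy expression and the grids are hypothetical. A real flow would use a proper optimizer rather than a grid.

```python
import itertools
from typing import Optional, Sequence, Tuple

# 90 nm alpha-power fit values quoted on the delay-model slide; V_TH is held fixed.
V_ON, ALPHA_D, GAMMA = 0.37, 1.53, 1.35
V_REF, TAU_REF = 1.2, 6e-12   # tau_nom = 6 ps at V_DD = 1.2 V
C_LOAD = 20.0                 # hypothetical fixed load, in units of a minimum gate's input cap


def _shape(v_dd: float) -> float:
    """Voltage dependence of tau_nom: V_DD / (V_DD - V_on)^alpha_d."""
    return v_dd / (v_dd - V_ON) ** ALPHA_D


def tau_nom(v_dd: float) -> float:
    """tau_nom scaled so that it equals 6 ps at 1.2 V."""
    return TAU_REF * _shape(v_dd) / _shape(V_REF)


def delay(v_dd: float, w: float) -> float:
    """One gate of size w driving C_LOAD: tau_nom * (1 + C_LOAD / (gamma * w))."""
    return tau_nom(v_dd) * (1 + C_LOAD / (GAMMA * w))


def energy(v_dd: float, w: float) -> float:
    """Hypothetical switched energy: input cap w, self-load gamma*w and C_LOAD at v_dd."""
    return (w + GAMMA * w + C_LOAD) * v_dd ** 2


def minimize_energy(d_con: float, vdd_grid: Sequence[float],
                    w_grid: Sequence[float]) -> Optional[Tuple[float, float, float]]:
    """Brute force: min energy(v, w) subject to delay(v, w) <= d_con; None if infeasible."""
    best = None
    for v, w in itertools.product(vdd_grid, w_grid):
        if delay(v, w) <= d_con:
            e = energy(v, w)
            if best is None or e < best[0]:
                best = (e, v, w)
    return best


vdd_grid = [round(0.6 + 0.05 * k, 2) for k in range(13)]  # VDD_min = 0.6 V ... VDD_max = 1.2 V
w_grid = [1, 2, 4, 8, 16, 32]                               # W >= W_min = 1
d_min = delay(1.2, 32)  # reference case: fastest point (max VDD, largest W)
print(minimize_energy(1.1 * d_min, vdd_grid, w_grid))
```

With a 10% delay relaxation over the reference point, the search settles on a slightly lower VDD (1.15 V) at the largest size, which is cheaper than the reference.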
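Optimization Framework: Generic Network
Gate in stage i loaded by fan-out (stage i + 1)
[Figure: gate i, supplied by VDD,i, with input capacitance Ci and self-loading γCi, drives wire capacitance Cw and the input capacitance Ci+1 of gate i + 1, supplied by VDD,i+1]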
Slide 4.13
Alpha-Power Based Delay Model
$$t_p = \frac{K_d V_{DD}}{(V_{DD} - V_{on})^{\alpha_d}} \cdot \frac{C_{i+1} + C_w + \gamma C_i}{\gamma C_i} = \tau_{nom}\left(1 + \frac{C'_{i+1}}{\gamma C_i}\right), \qquad C'_{i+1} = C_{i+1} + C_w$$
Fit parameters: $V_{on}$, $\alpha_d$, $K_d$, $\gamma$
VDD ref = 1.2 V, technology 90 nm
[Figure: one panel shows delay tp (ps) versus fan-out Ci+1/Ci from 0 to 10, simulation versus model, with fit τnom = 6 ps and γ = 1.35; the other shows normalized FO4 delay versus VDD/VDD ref from 0.5 to 1, simulation versus model, with fit Von = 0.37 V and αd = 1.53]
Slide 4.14
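A Python sketch of the delay model, using the 90 nm fit values quoted on the slide. K_d is not given; here it is back-computed so that τnom = 6 ps at VDD ref = 1.2 V, which is an assumption.

```python
V_ON = 0.37        # V, fitted
ALPHA_D = 1.53     # fitted exponent
GAMMA = 1.35       # fitted self-loading ratio
TAU_NOM_REF = 6.0  # ps, fitted tau_nom at V_DD ref
V_DD_REF = 1.2     # V

# K_d is not quoted; it is chosen so that tau_nom(V_DD_REF) = 6 ps (assumption).
K_D = TAU_NOM_REF * (V_DD_REF - V_ON) ** ALPHA_D / V_DD_REF


def tau_nom(v_dd: float) -> float:
    """tau_nom = K_d * V_DD / (V_DD - V_on)^alpha_d, in ps."""
    return K_D * v_dd / (v_dd - V_ON) ** ALPHA_D


def stage_delay(fanout: float, v_dd: float = V_DD_REF) -> float:
    """t_p = tau_nom * (1 + C'_{i+1} / (gamma * C_i)), with fanout = C'_{i+1} / C_i."""
    return tau_nom(v_dd) * (1 + fanout / GAMMA)


fo4_ref = stage_delay(4)                  # about 23.8 ps at 1.2 V
fo4_norm = stage_delay(4, 0.6) / fo4_ref  # FO4 delay at VDD/VDD ref = 0.5, normalized
print(f"FO4 at 1.2 V: {fo4_ref:.1f} ps; normalized FO4 at 0.6 V: {fo4_norm:.2f}")
```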
Combined with Logical-Effort Formulation
For Complex Gates
$$t_p = \tau_{nom}\left(p_i + \frac{f_i g_i}{\gamma}\right)$$
Parasitic delay $p_i$ – depends upon gate topology
Electrical effort $f_i \approx S_{i+1}/S_i$
Logical effort $g_i$ – depends upon gate topology
Effective fan-out $h_i = f_i g_i$
[Ref: I. Sutherland, Morgan Kaufmann’99]
Slide 4.15
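Summing t_p over the stages of a path gives the path delay. A Python sketch reusing τnom = 6 ps and γ = 1.35 at VDD ref from the previous slide; the NAND2 values g = 4/3 and p = 2 are standard logical-effort textbook figures assumed here (inverter parasitic normalized to 1), and the path itself is hypothetical.

```python
from dataclasses import dataclass
from typing import List

TAU_NOM = 6.0  # ps, 90 nm fit at V_DD ref
GAMMA = 1.35   # self-loading ratio


@dataclass
class Stage:
    g: float  # logical effort
    p: float  # parasitic delay (inverter = 1)
    f: float  # electrical effort, ~ S_{i+1} / S_i

    @property
    def h(self) -> float:
        """Effective fan-out h_i = f_i * g_i."""
        return self.f * self.g


def stage_delay(s: Stage) -> float:
    """t_p = tau_nom * (p_i + f_i * g_i / gamma), in ps."""
    return TAU_NOM * (s.p + s.f * s.g / GAMMA)


def path_delay(stages: List[Stage]) -> float:
    return sum(stage_delay(s) for s in stages)


# Hypothetical path: NAND2 (g = 4/3, p = 2) with electrical effort 3, then an FO4 inverter.
path = [Stage(g=4 / 3, p=2.0, f=3.0), Stage(g=1.0, p=1.0, f=4.0)]
print(f"{path_delay(path):.1f} ps")
```

For an inverter (p = g = 1) the expression reduces to the alpha-power model of the previous slide, τnom(1 + f/γ).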
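Dynamic Energy
$$E_{dyn} = (C_w + C_{i+1} + \gamma C_i)\,V_{DD,i}^2 = C_i\,(\gamma + f_i)\,V_{DD,i}^2, \qquad C_i = K_e S_i, \quad f_i = \frac{C_w + C_{i+1}}{C_i} = \frac{C'_{i+1}}{C_i}$$
$$e_i = K_e S_i\left(V_{DD,i-1}^2 + \gamma\,V_{DD,i}^2\right)$$
$e_i$ = energy consumed by logic gate i
[Figure: gate i (Ci, γCi, VDD,i) driving Cw and gate i + 1 (Ci+1, VDD,i+1)]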
Slide 4.16
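The per-gate energy e_i can be summed over a chain with per-stage supplies. A Python sketch; K_E = 1 fF per unit size, the sizes, and the 0.84 V value for the last stage are hypothetical, and level conversion is ignored. The final load, which has no e_i of its own, is charged from the last stage's supply.

```python
from typing import Sequence

GAMMA = 1.35  # self-loading ratio (90 nm fit)


def gate_energy(k_e: float, s_i: float, v_prev: float, v_i: float,
                gamma: float = GAMMA) -> float:
    """e_i = K_e * S_i * (V_DD,i-1^2 + gamma * V_DD,i^2)."""
    return k_e * s_i * (v_prev ** 2 + gamma * v_i ** 2)


def chain_energy(k_e: float, sizes: Sequence[float], supplies: Sequence[float],
                 v_in: float, c_load: float, gamma: float = GAMMA) -> float:
    """Total switching energy of a gate chain with per-stage supplies.

    Each gate's input cap is charged from the previous stage's supply (v_in for
    the first gate); the final load c_load is charged from the last supply.
    """
    total, v_prev = 0.0, v_in
    for s, v in zip(sizes, supplies):
        total += gate_energy(k_e, s, v_prev, v, gamma)
        v_prev = v
    return total + c_load * v_prev ** 2


K_E = 1e-15  # hypothetical: 1 fF of input cap per unit size
sizes, c_load = [1, 4, 16], 64e-15
e_nom = chain_energy(K_E, sizes, [1.2, 1.2, 1.2], 1.2, c_load)
e_low = chain_energy(K_E, sizes, [1.2, 1.2, 0.84], 1.2, c_load)  # last stage at 0.7 x VDD
print(f"{e_nom * 1e15:.1f} fJ -> {e_low * 1e15:.1f} fJ")
```

Lowering only the last stage's supply removes about 38.5% of the energy in this example, because that supply charges both the last stage's self-load and the large final load.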
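Optimizing Return on Investment (ROI)
Depends on Sensitivity (∂E/∂D)
Gate Sizing
$$\frac{\partial E/\partial S_i}{\partial D/\partial S_i} = -\frac{e_i}{\tau_{nom}\,(h_i - h_{i-1})}$$
– ∞ for equal h (Dmin)
Supply Voltage
$$\frac{\partial E/\partial V_{DD}}{\partial D/\partial V_{DD}} = -2\,\frac{E}{D}\cdot\frac{1 - V_{on}/V_{DD}}{\alpha_d - 1 + V_{on}/V_{DD}}$$
– max at VDD(max) (Dmin)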
Slide 4.17
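The closed-form supply sensitivity can be checked against a numerical derivative of the same models, D ∝ VDD/(VDD − Von)^αd and E ∝ VDD², with the 90 nm fit values Von = 0.37 V and αd = 1.53. A Python sketch; the constants K_d and the capacitance are dropped, so the sensitivity values are in normalized units.

```python
V_ON, ALPHA_D = 0.37, 1.53  # 90 nm fit values


def delay(v_dd: float) -> float:
    """Normalized delay D ~ V_DD / (V_DD - V_on)^alpha_d (constant K_d dropped)."""
    return v_dd / (v_dd - V_ON) ** ALPHA_D


def energy(v_dd: float) -> float:
    """Normalized switching energy E ~ V_DD^2 (capacitance dropped)."""
    return v_dd ** 2


def vdd_sensitivity(v_dd: float) -> float:
    """Closed form: -2 (E/D) (1 - V_on/V_DD) / (alpha_d - 1 + V_on/V_DD)."""
    x = V_ON / v_dd
    return -2 * energy(v_dd) / delay(v_dd) * (1 - x) / (ALPHA_D - 1 + x)


def vdd_sensitivity_numeric(v_dd: float, h: float = 1e-6) -> float:
    """Central-difference estimate of (dE/dV_DD) / (dD/dV_DD)."""
    de = energy(v_dd + h) - energy(v_dd - h)
    dd = delay(v_dd + h) - delay(v_dd - h)
    return de / dd


for v in (0.6, 0.9, 1.2):
    print(v, vdd_sensitivity(v), vdd_sensitivity_numeric(v))
```

The magnitude grows with VDD, consistent with the note that it peaks at VDD(max).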
Example: Inverter Chain
Properties of inverter chain
– Single path topology
– Energy increases geometrically from input to output
Goal
– Find optimal sizing S = [S1, S2, …, SN], supply voltage, and buffering strategy to achieve the best energy–delay trade-off
[Figure: inverter chain with stage sizes S1 = 1, S2, S3, …, SN driving load CL]
Slide 4.18
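A Python sketch that scores a candidate sizing of the chain with the delay and energy models of the preceding slides at a single supply, using τnom = 6 ps and γ = 1.35. The load CL = 64 and the downsized sizing are hypothetical.

```python
from typing import List, Tuple

TAU_NOM = 6.0  # ps at V_DD ref (90 nm fit)
GAMMA = 1.35   # self-loading ratio (90 nm fit)


def chain_delay_energy(sizes: List[float], c_load: float) -> Tuple[float, float]:
    """Delay (ps) and energy (units of K_e * V_DD^2) of an inverter chain at one supply.

    sizes: S_1..S_N (S_1 = 1); c_load: final load in the same units.
    Delay:  sum_i tau_nom * (1 + S_{i+1} / (gamma * S_i)), with S_{N+1} = c_load.
    Energy: sum_i S_i * (1 + gamma) + c_load (input, self-load and load each switch once).
    """
    loads = list(sizes[1:]) + [c_load]
    delay = sum(TAU_NOM * (1 + nxt / (GAMMA * s)) for s, nxt in zip(sizes, loads))
    energy = sum(s * (1 + GAMMA) for s in sizes) + c_load
    return delay, energy


uniform = chain_delay_energy([1, 4, 16], c_load=64)  # equal taper of 4: minimum delay for 3 stages
smaller = chain_delay_energy([1, 3, 9], c_load=64)   # hypothetical downsized chain
print(uniform, smaller)
```

The per-stage energy terms (2.35, 9.4, 37.6, plus 64 for the load) show the geometric growth toward the output; the downsized chain trades about 7% extra delay for about 17% less energy.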
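Inverter Chain: Gate Sizing
Variable taper achieves minimum energy
[Equation garbled in the source: the optimal size Si is expressed in terms of the neighboring sizes Si−1 and Si+1, a parameter μ that depends on Ke, VDD, τnom and the sensitivity S, and a term Fi]
[Figure: effective fan-out h versus stage (1–7) for Dinc = 0%, 1%, 10%, 30%, 50%, nominal versus optimized]
[Ref: Ma, JSSC’94]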
Slide 4.19
Inverter Chain: VDD Optimization
Variable taper achieved by voltage scaling
VDD scaling reduces the energy of the final load first
[Figure: VDD/VDDnom versus stage (1–7) for Dinc = 0%, 1%, 10%, 30%, 50%, nominal versus optimized]
Slide 4.20
Inverter Chain: Optimization Results
Parameter with the largest sensitivity has the largest potential for energy reduction
Two discrete supplies mimic per-stage VDD
[Figure: energy reduction (%) and normalized sensitivity versus Dinc (0–50%), for curves labeled cVDD, S, gVDD, and 2VDD]
Slide 4.21
Example: Kogge–Stone Tree Adder
Tree adder
– Long wires
– Reconvergent paths
– Multiple active outputs
[Figure: 16-bit tree adder with inputs (A0, B0) … (A15, B15) and Cin, outputs S0 … S15]
[Ref: P. Kogge, Trans. Comp’73]
Slide 4.22
Tree Adder: Sizing vs. Dual-VDD Optimization
Reference design: all paths are critical
Internal energy ⇒ S more effective than VDD
– S: E(–54%), Dual VDD: E(–27%) at Dinc = 10%
[Figure: energy per bit slice (0–63) and stage (1–9) for the reference design (D = Dmin), sizing (E –54% at Dinc = 10%), and dual VDD (E –27% at Dinc = 10%)]
Slide 4.23
Tree Adder: Multi-dimensional Search
Can get pretty close to optimum with only two variables
Achieving the minimum delay (maximum speed) is very expensive
[Figure: Energy/Eref versus Delay/Dmin (0.4–2) for the reference design and for joint optimization over (S, VDD), (VDD, VTH), (S, VTH), and (S, VDD, VTH)]
Slide 4.24
Multiple Supply Voltages
Block-level supply assignment
– Higher-throughput/lower-latency functions are implemented in higher VDD
– Slower functions are implemented with lower VDD
– This leads to so-called voltage islands with separate supply grids
– Level conversion performed at block boundaries
Multiple supplies inside a block
– Non-critical paths moved to lower supply voltage
– Level conversion within the block
– Physical design challenging
Slide 4.25
Using Three VDDs
V1 = 1.5 V, VTH = 0.3 V
[Figure: power reduction ratio versus V2 (V) and V3 (V), shown as a 3-D surface and as a contour plot marked with “+”]
[Ref: T. Kuroda, ICCAD’02]
Slide 4.26
Optimum Number of VDDs
The more VDDs, the lower the power, but the effect saturates
Power reduction effect decreases with scaling of VDD
Optimum V2/V1 is around 0.7
[Figure: VDD ratio and power ratio versus V1 (0.5–1.5 V) for {V1, V2}, {V1, V2, V3}, and {V1, V2, V3, V4}; curves V2/V1, V3/V1, V4/V1 and P2/P1, P3/P1, P4/P1]
© IEEE 2001
[Ref: M. Hamada, CICC’01]
Slide 4.27
Lessons: Multiple Supply Voltages
Two supply voltages per block are optimal
Optimal ratio between the supply voltages is 0.7
Level conversion is performed on the voltage boundary, using a level-converting flip-flop (LCFF)
An option is to use an asynchronous level converter
– More sensitive to coupling and supply noise
Slide 4.28
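A first-order Python sketch of what a second supply buys: switching power scales with V², so moving part of the switched capacitance to V2 gives the ratio below. The fractions are hypothetical and level-converter overhead is ignored. The sketch also treats the movable fraction as fixed, whereas in a real design fewer paths have enough slack as V2 drops; that tension is what the cited studies balance when they arrive at V2/V1 ≈ 0.7.

```python
def dual_vdd_power_ratio(frac_low: float, v_ratio: float) -> float:
    """P2/P1 for a block where a fraction of the switched capacitance moves to V2.

    frac_low: share of activity-weighted switched capacitance placed on V2 (0..1).
    v_ratio:  V2 / V1.  Level-converter overhead is ignored.
    """
    if not 0.0 <= frac_low <= 1.0:
        raise ValueError("frac_low must be in [0, 1]")
    return (1.0 - frac_low) + frac_low * v_ratio ** 2


# With V2/V1 = 0.7 each moved gate saves about 51% (0.7^2 = 0.49);
# the block-level saving scales with how much logic has enough slack to move.
for frac in (0.25, 0.5, 0.75):
    print(frac, round(dual_vdd_power_ratio(frac, 0.7), 4))
```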
Distributing Multiple Supply Voltages
[Figure: conventional layout, with separate VDDH and VDDL circuits (i1 → o1, i2 → o2) on rails VDDH, VDDL, and VSS, versus a shared n-well layout of the same VDDH and VDDL circuits]
Slide 4.29
Conventional
[Figure: VDDH and VDDL circuits separated by n-well isolation, on rails VDDH, VDDL, and VSS; (a) dedicated row, with alternating VDDH and VDDL rows; (b) dedicated region, with separate VDDH and VDDL regions]
Slide 4.30
Shared n-Well
[Figure: VDDH and VDDL circuits in a shared n-well, on rails VDDH, VDDL, and VSS; (a) floor plan image]
[Shimazaki et al., ISSCC’03]
Slide 4.31
Example: Multiple Supplies in a Block
Lower VDD portion is shared
“Clustered voltage scaling”
[Figure: conventional design versus CVS structure, with logic between flip-flops (FF) and the critical path marked; the CVS structure uses level-shifting FFs]
[Ref: M. Takahashi, ISSCC’98]
© IEEE 1998
Slide 4.32
Level-Converting Flip-Flops (LCFFs)
Pulsed Half-Latch versus Master–Slave LCFFs
– Smaller # of MOSFETs/clock loading
– Faster level conversion using half-latch structure
– Shorter D–Q path from pulsed circuit
[Figure: schematics of the master–slave LCFF and the pulsed half-latch LCFF (clock signals clk, ck, ckb, ckd; nodes d, db, q, mo, mf, so, sf; transistors MN1, MN2), each with its level-conversion stage marked]
© IEEE 2003
[Ref: F. Ishihara, ISLPED’03]
Slide 4.33
Dynamic Realization of Pulsed LCFF
Pulsed precharge LCFF (PPR)
– Fast level conversion by precharge mechanism
– Suppressed charge/discharge toggle by conditional capture
– Short D–Q path
[Figure: pulsed precharge latch schematic (clk, ckd1, d, db, x, xb, q, qb, IV1, MN1, MN2, MP1, VDDH), with the level-conversion stage marked]
[Ref: F. Ishihara, ISLPED’03] © IEEE 2003
Slide 4.34
Case Study: ALU for 64-bit Microprocessor
[Figure: ALU block diagram partitioned into VDDH and VDDL circuits: gp gen., carry gen., partial sum, sum sel., 2:1, 5:1, and 9:1 MUXes, logical unit, and clock gen.; inputs ain, bin, ain0; signals carry, s0/s1, sum, and clk; sum returns over the long loop-back bus sumb (0.5 pF) through INV1 and INV2]
[Ref: Y. Shimazaki, ISSCC’03] © IEEE 2003
Slide 4.35
Low-Swing Bus and Level Converter
INV2 is placed near 9:1 MUX to increase noise immunity
Level conversion is done by a domino 9:1 MUX
[Figure: sum driven over sumb through INV1 and INV2 into the domino level converter (9:1 MUX) with keeper, pc, and ain0/sel (VDDH); supplies VDDH and VDDL]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Slide 4.36
Measured Results: Energy and Delay
[Figure: energy (pJ) versus TCYCLE (ns) at room temperature for the single-supply and shared-well (VDDH = 1.8 V) designs, with a marker at 1.16 GHz; VDDL = 1.4 V: Energy –25.3%, Delay +2.8%; VDDL = 1.2 V: Energy –33.3%, Delay +8.3%]
[Ref: Y. Shimazaki, ISSCC’03]
© IEEE 2003
Slide 4.37
Practical Transistor Sizing
Continuous sizing of transistors only an option in custom design
In ASIC design flows, options set by available library
Discrete sizing options made possible in standard-cell design methodology by providing multiple options for the same cell
– Leads to larger libraries (> 800 cells)
– Easily integrated into technology mapping
Slide 4.38
Technology Mapping
Example: four-input AND
(a) Implemented using four-input NAND + INV
(b) Implemented using two-input NAND + two-input NOR
Gate type | Area (cell unit) | Input cap. (fF) | Library 1: High-Speed, average delay (ps) | Library 2: Low-Power, average delay (ps)
INV | 3 | 1.8 | 7.0 + 3.8CL | 12.0 + 6.0CL
NAND2 | 4 | 2.0 | 10.3 + 5.3CL | 16.3 + 8.8CL
NAND4 | 5 | 2.0 | 13.6 + 5.8CL | 22.7 + 10.2CL
NOR2 | 3 | 2.2 | 10.7 + 5.4CL | 16.7 + 8.9CL
(delay formula: CL in fF; numbers calibrated for 90 nm)
Slide 4.40
Technology Mapping – Example
Four-input AND | (a) NAND4 + INV | (b) NAND2 + NOR2
Area | 8 | 11
HS: Delay (ps) | 31.0 + 3.8CL | 32.7 + 5.4CL
LP: Delay (ps) | 53.1 + 6.0CL | 52.4 + 8.9CL
Sw Energy (fF) | 0.1 + 0.06CL | 0.83 + 0.06CL
Area
– Four-input more compact than two-input (two gates vs three gates)
Timing
– Both implementations are two-stage realizations
– Second-stage INV (a) is better driver than NOR2 (b)
– For more complex blocks, simpler gates will show better performance
Energy
– Internal switching increases energy in the two-input case
– Low-power library has worse delay, but lower leakage (see later)
Slide 4.41
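The two-stage delay entries of the example table can be derived from the library table. This Python sketch reproduces them (to rounding) under our reading that each first-stage gate is loaded by one input of the second-stage gate; delays keep the library convention, with CL in fF at the output.

```python
from typing import Dict, Tuple

# gate: (area in cell units, input cap in fF, HS (d0 ps, k ps/fF), LP (d0 ps, k ps/fF))
LIBRARY: Dict[str, Tuple[int, float, Tuple[float, float], Tuple[float, float]]] = {
    "INV":   (3, 1.8, (7.0, 3.8),  (12.0, 6.0)),
    "NAND2": (4, 2.0, (10.3, 5.3), (16.3, 8.8)),
    "NAND4": (5, 2.0, (13.6, 5.8), (22.7, 10.2)),
    "NOR2":  (3, 2.2, (10.7, 5.4), (16.7, 8.9)),
}


def delay_coeffs(gate: str, lib: str) -> Tuple[float, float]:
    """(d0, k) such that delay = d0 + k * C_L, with C_L in fF."""
    _, _, hs, lp = LIBRARY[gate]
    return hs if lib == "HS" else lp


def two_stage(first: str, n_first: int, second: str, lib: str) -> Tuple[int, float, float]:
    """n_first parallel `first` gates, each driving one input of a single `second` gate.

    Returns (area, d0, k) of the composite, so delay = d0 + k * C_L at the output.
    """
    area = n_first * LIBRARY[first][0] + LIBRARY[second][0]
    d0_1, k_1 = delay_coeffs(first, lib)
    c_in_second = LIBRARY[second][1]  # first stage sees one input of the second gate
    d0_2, k_2 = delay_coeffs(second, lib)
    return area, d0_1 + k_1 * c_in_second + d0_2, k_2


for lib in ("HS", "LP"):
    a = two_stage("NAND4", 1, "INV", lib)   # (a) NAND4 + INV
    b = two_stage("NAND2", 2, "NOR2", lib)  # (b) two NAND2 + NOR2
    print(lib, f"(a) area {a[0]}: {a[1]:.1f} + {a[2]}CL;",
          f"(b) area {b[0]}: {b[1]:.1f} + {b[2]}CL")
```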
Gate-Level Trade-offs for Power
Technology mapping
– Gate selection
– Sizing
– Pin assignment
Logical optimizations
– Factoring
– Restructuring
– Buffer insertion/deletion
– Don’t-care optimization
Slide 4.42
Logic Restructuring
Logic restructuring to minimize spurious transitions
Buffer insertion for path balancing
[Figure: example gate networks annotated with the logic values at their nodes]
Slide 4.43
Algebraic Transformations: Factoring
Idea: Modify network to reduce capacitance
Caveat: This may increase activity!
$p_a = 0.1$, $p_b = 0.5$, $p_c = 0.5$
[Figure: unfactored and factored realizations of f from inputs a, b, c, annotated with node probabilities p1 = 0.051, p2 = 0.051, p3 = 0.076, p4 = 0.375, p5 = 0.076]
Slide 4.44
Lessons from Circuit Optimization
Joint optimization over multiple design parameters possible using sensitivity-based optimization framework
– Equal marginal costs ⇔ Energy-efficient design
Peak performance is VERY power inefficient
– About 70% energy reduction for 20% delay penalty
– Additional variables for higher energy-efficiency
Two supply voltages in general sufficient; three or more supply voltages only offer small advantage
Choice between sizing and supply voltage parameters depends upon circuit topology
But … leakage not considered so far
Slide 4.45
Considering Leakage at Design Time
Considering leakage as well as dynamic power is essential in sub-100 nm technologies
Leakage is not necessarily a bad thing
– Increased leakage leads to improved performance, allowing for lower supply voltages
– Again a trade-off issue …
Slide 4.46
Leakage – Not Necessarily a Bad Thing
Optimal designs have high leakage ($E_{lk}/E_{sw} \approx 0.5$)
Must adapt to process and activity variations
$$\left(\frac{E_{lk}}{E_{sw}}\right)_{opt} = \frac{2}{\ln\left(\frac{L_d}{\alpha_{avg}}\right) - K}$$
Topology | Inv | Add | Dec
(Elk/Esw)opt | 0.8 | 0.5 | 0.2
[Figure: normalized energy Enorm versus Estatic/Edynamic (10^−2 to 10^1) for Version 1 and Version 2, with operating points annotated VTHref − 180 mV / 0.81 VDDmax and VTHref − 140 mV / 0.52 VDDmax]
[Ref: D. Markovic, JSSC’04]
© IEEE 2004
Slide 4.47
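A Python sketch of the optimal-leakage expression. We read L_d as logic depth and α_avg as average switching activity; K is a fitted constant whose value is not given, so K_FIT = 1.0 is purely hypothetical and the numbers are only indicative. What the sketch does show is the trend the formula implies: lower activity, or deeper logic, pushes the optimum toward a smaller leakage share.

```python
import math


def optimal_leak_ratio(logic_depth: float, alpha_avg: float, k: float) -> float:
    """(E_lk / E_sw)_opt = 2 / (ln(L_d / alpha_avg) - K); requires ln(L_d/alpha_avg) > K."""
    denom = math.log(logic_depth / alpha_avg) - k
    if denom <= 0:
        raise ValueError("model only valid when ln(L_d / alpha_avg) > K")
    return 2.0 / denom


K_FIT = 1.0  # hypothetical value of the fitted constant K
for alpha in (0.5, 0.1, 0.01):
    print(alpha, round(optimal_leak_ratio(10, alpha, K_FIT), 2))
```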
Refining the Optimization Model
Switching energy
$$E_{dyn} = \alpha_{0\to 1}\,K_e S\,(\gamma + f)\,V_{DD}^2$$
Leakage energy
$$E_{stat} = S\,I_0(\Psi)\,e^{\frac{-V_{TH} + \lambda_d V_{DD}}{kT/q}}\,V_{DD}\,T_{cycle}$$
with: $I_0(\Psi)$: normalized leakage current with inputs in state Ψ
Slide 4.48
Reducing Leakage @ Design Time
Using longer transistors
– Limited benefit
– Increase in active current
Using higher thresholds
– Channel doping
– Stacked devices
– Body biasing
Reducing the voltage!!
Slide 4.49
Longer Channels
10% longer gates reduce leakage by 50%
– Increases switching power by 18% with W/L = constant
Doubling L reduces leakage by 5x
– Impacts performance
– Attractive when not required to increase W (e.g., memory)
[Figure (90 nm CMOS): normalized leakage power and normalized switching energy versus transistor length (100–200 nm)]
Slide 4.50
Using Multiple Thresholds
There is no need for level conversion
Dual thresholds can be added to standard design flows
– High-VTH and Low-VTH libraries are a standard in sub-0.18 μm processes
– For example: can synthesize using only high-VTH and then simply swap in low-VTH cells to improve timing
– Second VTH insertion can be combined with resizing
Only two thresholds are needed per block
– Using more than two yields small improvements
Slide 4.51
Three VTHs
VDD = 1.5 V, VTH.1 = 0.3 V
Impact of third threshold very limited
[Figure: leakage reduction ratio versus VTH.2 (V) and VTH.3 (V), shown as a 3-D surface and as a contour plot marked with “+”]
[Ref: T. Kuroda, ICCAD’02]
Slide 4.52
Using Multiple Thresholds
Cell-by-cell VTH assignment (not at block level)
Achieves all-low-VTH performance with substantial reduction in leakage
[Figure: logic between flip-flops (FF), with each cell marked low VTH or high VTH]
[Ref: S. Date, SLPE’94]
Slide 4.53
Dual-VTH Domino
Low-threshold transistors used only in critical paths
[Figure: domino stages clocked by Clkn and Clkn+1, with inputs Dn …, output Dn+1, precharge device P1, and inverters Inv1, Inv2, Inv3; shaded transistors are low-threshold]
Slide 4.54
Multiple Thresholds and Design Methodology
Easily introduced in standard-cell design methodology by extending cell libraries with cells with different thresholds
– Selection of cells during technology mapping
– No impact on dynamic power
– No interface issues (as was the case with multiple VDDs)
Impact: Can reduce leakage power substantially
Slide 4.55
Dual-VTH for High-Performance Design
 | High-VTH Only | Low-VTH Only | Dual-VTH
Total Slack | –53 ps | 0 ps | 0 ps
Dynamic Power | 3.2 mW | 3.3 mW | 3.2 mW
Static Power | 914 nW | 3873 nW | 1519 nW
All designs synthesized automatically using Synopsys Flows
[Courtesy: Synopsys, Toshiba, 2004]
Slide 4.56
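The table lends itself to a quick comparison. A Python sketch that keeps only the designs meeting timing (total slack ≥ 0) and computes the static-power saving of the dual-VTH design relative to the all-low-VTH one; the numbers are the table's.

```python
# Numbers from the dual-VTH table (Synopsys/Toshiba, 2004).
DESIGNS = {
    "High-VTH only": {"slack_ps": -53, "dynamic_mW": 3.2, "static_nW": 914},
    "Low-VTH only":  {"slack_ps": 0,   "dynamic_mW": 3.3, "static_nW": 3873},
    "Dual-VTH":      {"slack_ps": 0,   "dynamic_mW": 3.2, "static_nW": 1519},
}


def static_savings(reference: str, candidate: str) -> float:
    """Fractional static-power reduction of `candidate` relative to `reference`."""
    return 1.0 - DESIGNS[candidate]["static_nW"] / DESIGNS[reference]["static_nW"]


# Only designs with non-negative total slack meet timing.
feasible = [name for name, d in DESIGNS.items() if d["slack_ps"] >= 0]
print(feasible)
print(f"Dual-VTH vs Low-VTH only: {static_savings('Low-VTH only', 'Dual-VTH'):.1%} less static power")
```

Dual-VTH meets timing like the all-low-VTH design while cutting static power by about 61%, with no dynamic-power penalty.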
Example: High- vs. Low-Threshold Libraries
[Figure: leakage power (nW) for selected combinational tests, 130 nm CMOS, with high-VTH and low-VTH libraries]
[Courtesy: Synopsys 2004]
Slide 4.57
Complex Gates Increase Ion/Ioff Ratio
Ion and Ioff of single NMOS versus stack of 10 NMOS transistors
Transistors in stack are sized up to give similar drive
[Figure (90 nm technology): Ioff (nA) and Ion (μA) versus VDD (0–1 V), no stack versus stack]
Slide 4.58
Complex Gates Increase Ion/Ioff Ratio
Stacking transistors suppresses submicron effects
– Reduced velocity saturation
– Reduced DIBL effect
– Allows for operation at lower thresholds
[Figure (90 nm technology): Ion/Ioff ratio (up to 3.5 × 10^5) versus VDD (0–1 V), stack versus no stack, annotated “Factor 10!”]
Slide 4.59
Complex Gates Increase Ion/Ioff Ratio
Example: four-input NAND
Fan-in(2) versus Fan-in(4)
With transistors sized for similar performance: Leakage of Fan-in(2) = Leakage of Fan-in(4) × 3 (averaged over all possible input patterns)
[Figure: leakage current (nA) versus input pattern (1–16) for Fan-in(2) and Fan-in(4)]
Slide 4.60
Example: 32-bit Kogge–Stone Adder
Reducing the threshold by 150 mV increases leakage of a single NMOS transistor by a factor of 60
[Figure: distribution (% of input vectors) of standby leakage current (μA) for the two designs compared, annotated “factor 18”]
[Ref: S. Narendra, ISLPED’01]
© Springer 2001
Slide 4.61
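The single-transistor figure implies a subthreshold swing. A Python sketch, assuming the standard exponential subthreshold model in which leakage rises tenfold for every "swing" millivolts of threshold reduction (an assumption, not stated on the slide).

```python
import math


def leakage_ratio(delta_vth_mv: float, swing_mv_per_decade: float) -> float:
    """Leakage multiplier for a threshold reduction, assuming 10x per `swing` mV."""
    return 10.0 ** (delta_vth_mv / swing_mv_per_decade)


def implied_swing(delta_vth_mv: float, ratio: float) -> float:
    """Subthreshold swing (mV/decade) implied by an observed leakage ratio."""
    return delta_vth_mv / math.log10(ratio)


s = implied_swing(150.0, 60.0)  # 150 mV lower V_TH -> 60x leakage (single NMOS)
print(f"implied swing: {s:.1f} mV/decade")
print(f"check: {leakage_ratio(150.0, s):.1f}x")
```

That works out to roughly 84 mV/decade for the quoted factor of 60; the adder's smaller factor of 18 is consistent with the stack effect discussed on the preceding slides.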
Summary
Circuit optimization can lead to substantial energy reduction at limited performance loss
Energy–delay plots are the perfect mechanisms for analyzing energy–delay trade-offs
Well-defined optimization problem over W, VDD and VTH parameters
Increasingly better support by today’s CAD flows
Observe: leakage is not necessarily bad – if appropriately managed
Slide 4.62
References
Books:
A. Bellaouar and M.I. Elmasry, Low-Power Digital VLSI Design: Circuits and Systems, Kluwer Academic Publishers, 1st ed., 1995.
D. Chinnery and K. Keutzer, Closing the Gap Between ASIC and Custom, Springer, 2002.
D. Chinnery and K. Keutzer, Closing the Power Gap Between ASIC and Custom, Springer, 2007.
J. Rabaey, A. Chandrakasan and B. Nikolic, Digital Integrated Circuits: A Design Perspective, 2nd ed., Prentice Hall, 2003.
I. Sutherland, B. Sproull and D. Harris, Logical Effort: Designing Fast CMOS Circuits, Morgan Kaufmann, 1st ed., 1999.
Articles:
R.W. Brodersen, M.A. Horowitz, D. Markovic, B. Nikolic and V. Stojanovic, “Methods for True Power Minimization,” Int. Conf. on Computer-Aided Design (ICCAD), pp. 35–42, Nov. 2002.
S. Date, N. Shibata, S. Mutoh and J. Yamada, “1-V 30-MHz Memory-Macrocell-Circuit Technology with a 0.5 μm Multi-Threshold CMOS,” Proceedings of the 1994 Symposium on Low Power Electronics, San Diego, CA, pp. 90–91, Oct. 1994.
M. Hamada, Y. Ootaguro and T. Kuroda, “Utilizing Surplus Timing for Power Reduction,” IEEE Custom Integrated Circuits Conf. (CICC), pp. 89–92, Sept. 2001.
F. Ishihara, F. Sheikh and B. Nikolic, “Level Conversion for Dual-Supply Systems,” Int. Conf. Low Power Electronics and Design (ISLPED), pp. 164–167, Aug. 2003.
P.M. Kogge and H.S. Stone, “A Parallel Algorithm for the Efficient Solution of a General Class of Recurrence Equations,” IEEE Trans. Comput., C-22(8), pp. 786–793, Aug. 1973.
T. Kuroda, “Optimization and Control of VDD and VTH for Low-Power, High-Speed CMOS Design,” Proceedings ICCAD 2002, San Jose, Nov. 2002.
Slide 4.63
References
Articles (cont.):
H.C. Lin and L.W. Linholm, “An optimized output stage for MOS integrated circuits,” IEEE Journal of Solid-State Circuits, SC-10(2), pp. 106–109, Apr. 1975.
S. Ma and P. Franzon, “Energy control and accurate delay estimation in the design of CMOS buffers,” IEEE Journal of Solid-State Circuits, 29(9), pp. 1150–1153, Sep. 1994.
D. Markovic, V. Stojanovic, B. Nikolic, M.A. Horowitz and R.W. Brodersen, “Methods for true energy-performance optimization,” IEEE Journal of Solid-State Circuits, 39(8), pp. 1282–1293, Aug. 2004.
MathWorks, http://www.mathworks.com
S. Narendra, S. Borkar, V. De, D. Antoniadis and A. Chandrakasan, “Scaling of stack effect and its applications for leakage reduction,” Int. Conf. Low Power Electronics and Design (ISLPED), pp. 195–200, Aug. 2001.
T. Sakurai and R. Newton, “Alpha-power law MOSFET model and its applications to CMOS inverter delay and other formulas,” IEEE Journal of Solid-State Circuits, 25(2), pp. 584–594, Apr. 1990.
Y. Shimazaki, R. Zlatanovici and B. Nikolic, “A shared-well dual-supply-voltage 64-bit ALU,” IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 104–105, Feb. 2003.
V. Stojanovic, D. Markovic, B. Nikolic, M.A. Horowitz and R.W. Brodersen, “Energy-delay tradeoffs in combinational logic using gate sizing and supply voltage optimization,” European Solid-State Circuits Conf. (ESSCIRC), pp. 211–214, Sep. 2002.
M. Takahashi et al., “A 60mW MPEG video codec using clustered voltage scaling with variable supply-voltage scheme,” IEEE Int. Solid-State Circuits Conf. (ISSCC), pp. 36–37, Feb. 1998.
Slide 4.64