7/30/2019 VLSI Digital Circuits
1/22
CSE477 L26 System Power.1 Irwin&Vijay, PSU, 2002
CSE477
VLSI Digital Circuits
Fall 2002
Lecture 26: Low Power Techniques in Microarchitectures and Memories
Mary Jane Irwin (www.cse.psu.edu/~mji)
www.cse.psu.edu/~cg477
[Adapted from Rabaey's Digital Integrated Circuits, 2002, J. Rabaey et al.]
-
Review: Energy & Power Equations
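$$E = C_L V_{DD}^2 P_{0\to1} + t_{sc} V_{DD} I_{peak} P_{0\to1} + V_{DD} I_{leakage}$$
$$P = C_L V_{DD}^2 f_{0\to1} + t_{sc} V_{DD} I_{peak} f_{0\to1} + V_{DD} I_{leakage}$$
$$f_{0\to1} = P_{0\to1} \cdot f_{clock}$$
First term: dynamic power (~90% today and decreasing relatively)
Second term: short-circuit power (~8% today and decreasing absolutely)
Third term: leakage power (~2% today and increasing)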
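As a rough numeric illustration of the power equation, the sketch below evaluates its three terms separately. The helper name `power_components` is hypothetical, and every parameter value is a made-up assumption, picked only so that the split lands near the rough percentages quoted on this slide.

```python
def power_components(c_load, vdd, p01, f_clock, t_sc, i_peak, i_leak):
    """Return (dynamic, short_circuit, leakage) power in watts.

    Implements P = C_L*VDD^2*f01 + t_sc*VDD*I_peak*f01 + VDD*I_leakage,
    with f01 = P01 * f_clock.
    """
    f01 = p01 * f_clock
    dynamic = c_load * vdd ** 2 * f01
    short_circuit = t_sc * vdd * i_peak * f01
    leakage = vdd * i_leak
    return dynamic, short_circuit, leakage

# Illustrative (assumed) chip-level values, not taken from the lecture.
dyn, sc, leak = power_components(
    c_load=1e-9,      # total switched capacitance, farads
    vdd=2.5,          # supply voltage, volts
    p01=0.1,          # probability of a 0 -> 1 transition per cycle
    f_clock=500e6,    # clock frequency, hertz
    t_sc=100e-12,     # short-circuit conduction time, seconds
    i_peak=2.0,       # aggregate peak short-circuit current, amperes
    i_leak=3e-3,      # total leakage current, amperes
)
total = dyn + sc + leak
for name, value in (("dynamic", dyn), ("short-circuit", sc), ("leakage", leak)):
    print(f"{name:>13}: {value * 1e3:7.2f} mW ({100 * value / total:4.1f}%)")
```

Lowering `vdd` in this sketch shrinks the dynamic term quadratically but the other two only linearly, which is why supply scaling dominates the techniques that follow.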
-
Power and Energy Design Space
           Constant Throughput/Latency                          Variable Throughput/Latency
Energy     Design Time                  Non-active Modules      Run Time
Active     Logic Design, Reduced Vdd,   Clock Gating            DFS, DVS (Dynamic Freq,
           Sizing, Multi-Vdd                                    Voltage Scaling)
Leakage    + Multi-VT                   Sleep Transistors,      + Variable VT
                                        Multi-Vdd, Variable VT
-
Bus Multiplexing
Share long data buses with time multiplexing (S1 uses even
cycles, S2 odd)
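[Figure: sources S1 and S2 driving destinations D1 and D2, first over two dedicated buses and then over one shared, time-multiplexed bus]
Buses are a significant source of power dissipation due to high switching activities and large capacitive loading
  - 15% of total power in Alpha 21064
  - 30% of total power in Intel 80386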
But what if data samples are correlated (e.g., sign bits)?
-
Correlated Data Streams
[Figure: bit switching probabilities (0 to 1) versus bit position (MSB 14 down to LSB 0) for the muxed and dedicated buses]
For a shared (multiplexed) bus, the advantages of data correlation are lost (the bus carries samples from two uncorrelated data streams)
  - Bus sharing should not be used for positively correlated data streams
  - Bus sharing may prove advantageous in a negatively correlated data stream (where successive samples switch sign bits): more random switching
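To see why multiplexing hurts positively correlated streams, the following sketch counts bit transitions on two dedicated buses versus one shared bus. The 16-bit width, the two synthetic streams, and the helper names `encode` and `toggles` are all assumptions made for illustration.

```python
WIDTH = 16  # assumed bus width in bits

def encode(value, width=WIDTH):
    """Two's-complement bit pattern of a signed integer."""
    return value & ((1 << width) - 1)

def toggles(words, width=WIDTH):
    """Count bit transitions between consecutive words on a bus."""
    patterns = [encode(w, width) for w in words]
    return sum(bin(a ^ b).count("1") for a, b in zip(patterns, patterns[1:]))

# Two synthetic, positively correlated streams: each changes slowly
# and keeps its own sign, so its upper (sign-extension) bits stay put.
s1 = [100 + (i % 4) for i in range(64)]    # small positive samples
s2 = [-100 - (i % 4) for i in range(64)]   # small negative samples

dedicated = toggles(s1) + toggles(s2)                # one bus per stream
shared = [w for pair in zip(s1, s2) for w in pair]   # S1 on even cycles, S2 on odd
muxed = toggles(shared)

print("dedicated buses:", dedicated, "transitions")
print("multiplexed bus:", muxed, "transitions")
```

On the shared bus every cycle alternates between a positive and a negative sample, so the sign-extension bits that were quiet on the dedicated buses now switch on each transfer.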
-
Glitch Reduction by Pipelining
Glitches depend on the logic depth of the circuit: gates deeper in the logic network are more prone to glitching
  - arrival times of the gate inputs are more spread due to delay imbalances
  - usually affected more by primary input switching
Reduce logic depth by adding pipeline registers
  - additional energy used by the clock and pipeline registers
[Figure: five-stage pipeline (Fetch, Decode, Execute, Memory, WriteBack) with PC, Instruction, I$, D$, MAR, and MDR; clk drives the pipeline stage isolation registers]
-
Clock Gating
Gate off clock to idle functional units
  - e.g., floating point units
  - need logic to generate disable signal
    - increases complexity of control logic
    - consumes power
    - timing critical to avoid clock glitches at OR gate output
  - additional gate delay on clock signal
    - gating OR gate can replace a buffer in the clock distribution tree
Most popular method for power reduction of clock signals and functional units
[Figure: clock and disable signals feed an OR gate whose output clocks the register (Reg) in front of the functional unit]
Clock Gating in a Pipelined Datapath
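For idle units (e.g., floating point units in Exec stage, WB stage for instructions with no write back operation)
[Figure: five-stage pipelined datapath (Fetch, Decode, Execute, Memory, WriteBack) with PC, Instruction, I$, D$, MAR, and MDR; clk is gated in Execute when there is no FP operation and in WriteBack when there is no write back]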
-
Review: Dynamic Power as a Function of VDD
Decreasing the VDD decreases dynamic energy consumption (quadratically)
But, increases gate delay (decreases performance)
[Figure: normalized propagation delay tp (1 to 5.5) versus VDD (0.8 V to 2.4 V)]
Determine the critical path(s) at design time and use high VDD for the transistors on those paths for speed. Use a lower VDD on the other logic to reduce dynamic energy consumption.
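The energy/delay trade-off can be sketched numerically. The quadratic energy dependence comes from the slide; the delay model is an assumption, namely the commonly used alpha-power law t_p ~ VDD / (VDD - VT)^alpha with illustrative values VT = 0.4 V and alpha = 1.5, so the delay numbers are indicative only and will not reproduce the plotted curve exactly.

```python
def relative_dynamic_energy(vdd, vdd_ref):
    """Dynamic switching energy relative to the reference supply (E ~ VDD^2)."""
    return (vdd / vdd_ref) ** 2

def relative_delay(vdd, vdd_ref, vt=0.4, alpha=1.5):
    """Gate delay relative to the reference supply, using the assumed
    alpha-power-law model t_p ~ VDD / (VDD - VT)^alpha (requires VDD > VT)."""
    def tp(v):
        return v / (v - vt) ** alpha
    return tp(vdd) / tp(vdd_ref)

VDD_REF = 2.4  # assumed nominal supply in volts
for vdd in (2.4, 1.8, 1.2):
    energy = relative_dynamic_energy(vdd, VDD_REF)
    delay = relative_delay(vdd, VDD_REF)
    print(f"VDD = {vdd:.1f} V: energy x{energy:.2f}, delay x{delay:.2f}")
```

Halving the supply cuts dynamic energy to a quarter while the gates slow down, which is why the lower VDD is reserved for logic off the critical path.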
-
Dynamic Frequency and Voltage Scaling
Intel's SpeedStep
  - Hardware that steps down the clock frequency (dynamic frequency scaling, DFS) when the user unplugs from AC power
    - PLL from 650MHz to 500MHz
  - CPU stalls during SpeedStep adjustment
Transmeta LongRun
  - Hardware that applies both DFS and DVS (dynamic supply voltage scaling)
    - 32 levels of VDD from 1.1V to 1.6V
    - PLL from 200MHz to 700MHz in increments of 33MHz
  - Triggered when CPU load change is detected by software
    - heavier load: ramp up VDD, when stable speed up clock
    - lighter load: slow down clock, when PLL locks onto new rate, ramp down VDD
  - CPU stalls only during PLL relock (< 20 microsec)
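The ordering of the voltage and frequency changes matters: the supply must already be high enough whenever the clock is fast. The sketch below encodes only that ordering as described for LongRun; the function name and the action labels are hypothetical and do not correspond to any real Transmeta interface.

```python
def dvfs_sequence(cur_mhz, new_mhz, new_vdd):
    """Return the ordered actions for a combined frequency/voltage change.

    Action names are illustrative placeholders, not a real hardware API.
    """
    if new_mhz > cur_mhz:
        # Heavier load: raise the supply first, then speed up the clock.
        return [("ramp_vdd", new_vdd), ("wait_vdd_stable", None),
                ("set_clock_mhz", new_mhz)]
    if new_mhz < cur_mhz:
        # Lighter load: slow the clock first, then lower the supply.
        return [("set_clock_mhz", new_mhz), ("wait_pll_lock", None),
                ("ramp_vdd", new_vdd)]
    return []  # no frequency change requested

for step in dvfs_sequence(cur_mhz=300, new_mhz=700, new_vdd=1.6):
    print("speed up:  ", step)
for step in dvfs_sequence(cur_mhz=700, new_mhz=300, new_vdd=1.2):
    print("slow down: ", step)
```

In both directions the circuit never runs at the higher frequency with the lower voltage, so timing is met throughout the transition.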
-
Dynamic Thermal Management (DTM)
Trigger Mechanism: When do we enable DTM techniques?
Initiation Mechanism: How do we enable technique?
Response Mechanism: What technique do we enable?
-
DTM Trigger Mechanisms
Mechanism: How to deduce temperature?
Direct approach: on-chip temperature sensors
  - Based on differential voltage change across 2 diodes of different sizes
  - May require >1 sensor
  - Hysteresis and delay are problems
Policy: When to begin responding?
  - Trigger level set too high means higher packaging costs
  - Trigger level set too low means frequent triggering and loss in performance
Choose trigger level to exploit difference between average and worst case power
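The policy trade-off can be made concrete with a small sketch that counts how often a given trigger level fires on a temperature trace. The trace values and the helper name `trigger_stats` are invented for illustration; a real policy would also have to account for the sensor hysteresis and delay noted above.

```python
def trigger_stats(trace, trigger_level):
    """Return (samples above the trigger level, number of separate trigger events)."""
    above = [t > trigger_level for t in trace]
    # A new event starts wherever the trace crosses from below to above the level.
    events = sum(1 for prev, cur in zip([False] + above, above) if cur and not prev)
    return sum(above), events

# Made-up die temperature samples in degrees C.
trace = [70, 74, 79, 83, 78, 76, 81, 86, 84, 77, 73, 80, 82, 75]

for level in (85, 78):
    hot, events = trigger_stats(trace, level)
    print(f"trigger at {level} C: {hot} samples throttled, {events} trigger events")
```

The lower level throttles far more of the trace, while the higher level rarely intervenes but requires packaging that tolerates the hotter peaks.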
-
DTM Initiation and Response Mechanisms
Operating system or microarchitectural control?
  - Hardware support can reduce performance penalty by 20-30%
Initiation of policy incurs some delay
  - When using DVS and/or DFS, much of the performance penalty can be attributed to enabling/disabling overhead
  - Increasing policy delay reduces overhead; smarter initiation techniques would help as well
Thermal window (100Kcycles+)
  - Larger thermal windows smooth short thermal spikes
7/30/2019 VLSI Digital Circuits
16/22
CSE477 L26 System Power.16 Irwin&Vijay, PSU, 2002
DTM Activation and Deactivation Cycle
[Figure: timeline of one DTM cycle: trigger reached, initiation delay, turn response on, response delay, policy delay with periodic temperature checks, turn response off, shutoff delay]
Initiation Delay: OS interrupt/handler
Response Delay: invocation time (e.g., adjust clock)
Policy Delay: number of cycles engaged
Shutoff Delay: disabling time (e.g., re-adjust clock)
-
Speculated Power of a 15mm µP
[Figure: four stacked-bar panels of leakage and active power (0 to 70 Watts) versus temperature (30 to 110 C), one panel per process generation. The percentage label on each bar (apparently leakage as a share of total power) is tabulated here.]

Temp (C)                  30    40    50    60    70    80    90   100   110
0.25µ, 15mm die, 2V       0%    0%    0%    0%    1%    1%    1%    2%    3%
0.18µ, 15mm die, 1.4V     0%    0%    1%    1%    2%    3%    5%    7%    9%
0.13µ, 15mm die, 1V       1%    2%    3%    5%    8%   11%   15%   20%   26%
0.1µ, 15mm die, 0.7V      6%    9%   14%   19%   26%   33%   41%   49%   56%
-
Review: Leakage as a Function of Design Time VT
Reducing the VT increases the sub-threshold leakage current (exponentially)
But, reducing VT decreases gate delay (increases performance)
[Figure: drain current ID (A) versus VGS (0 to 1 V) for VT = 0.4V and VT = 0.1V]
Determine the critical path(s) at design time and use low VT devices on the transistors on those paths for speed. Use a high VT on the other logic for leakage control.
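To get a feel for the exponential dependence, the sketch below compares the sub-threshold leakage of the two thresholds shown in the figure. It assumes leakage drops by one decade for every 100 mV of added threshold voltage; that subthreshold slope is an illustrative assumption, not a value given in the lecture, and `leakage_ratio` is a hypothetical helper.

```python
def leakage_ratio(vt_low, vt_high, slope_v_per_decade=0.1):
    """Ratio of sub-threshold leakage of a low-VT device to a high-VT device,
    assuming leakage falls one decade per `slope_v_per_decade` volts of VT."""
    return 10 ** ((vt_high - vt_low) / slope_v_per_decade)

ratio = leakage_ratio(vt_low=0.1, vt_high=0.4)
print(f"A VT = 0.1 V device leaks about {ratio:.0f}x more than a VT = 0.4 V device")
```

Under that assumption a 0.3 V reduction in VT costs roughly three orders of magnitude in leakage, which is why low-VT devices are confined to the critical paths.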
-
Review: Variable VT (ABB) at Run Time
$$V_T = V_{T0} + \gamma \left( \sqrt{\left| -2\phi_F + V_{SB} \right|} - \sqrt{\left| -2\phi_F \right|} \right)$$
where $V_{T0}$ is the threshold voltage at $V_{SB} = 0$, $V_{SB}$ is the source-bulk (substrate) voltage, and $\gamma$ is the body-effect coefficient
[Figure: VT (0.4 V to 0.9 V) versus VSB (-2.5 V to 0 V)]
For an n-channel device, the substrate is normally tied to ground
A negative bias causes VT to increase from 0.45V to 0.85V
Adjusting the substrate bias at run time is called adaptive body-biasing (ABB)
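The body-effect equation can be checked numerically with a short sketch. The parameter values (V_T0 = 0.45 V, gamma = 0.4 V^0.5, phi_F = -0.3 V) are assumptions chosen so that the result lands near the 0.45 V to 0.85 V range quoted above. The sketch also assumes that V_SB is entered as the magnitude of the reverse bias, so a substrate held 2.5 V below the source is passed as 2.5.

```python
import math

def threshold_voltage(v_sb, vt0=0.45, gamma=0.4, phi_f=-0.3):
    """V_T = V_T0 + gamma * (sqrt(|-2*phi_F + V_SB|) - sqrt(|-2*phi_F|)).

    v_sb is the source-to-bulk reverse bias in volts (assumed sign convention);
    vt0, gamma, and phi_f are illustrative device parameters.
    """
    return vt0 + gamma * (math.sqrt(abs(-2 * phi_f + v_sb))
                          - math.sqrt(abs(-2 * phi_f)))

for bias in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    print(f"reverse bias {bias:.1f} V -> VT = {threshold_voltage(bias):.3f} V")
```

Because VT grows only with the square root of the bias, most of the threshold shift is obtained in the first volt, and further reverse bias gives diminishing returns.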
-
Next Lecture and Reminders
Next lecture
  - System level interconnect
    - Reading assignment: Rabaey, et al, xx
Reminders
  - Project final reports due December 5th
  - Final grading negotiations/correction (except for the final exam) must be concluded by December 10th
  - Final exam scheduled
    - Monday, December 16th from 10:10 to noon in 118 and 121 Thomas