practical power gating and dynamic voltage/frequency scaling
TRANSCRIPT
1 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
PRACTICAL POWER GATING AND DYNAMIC
VOLTAGE/FREQUENCY SCALING
Stephen Kosonocky
AMD Fellow
2 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
OUTLINE
Motivation for DVFS and Power Gating
Dynamic Voltage/Frequency Scaling
Power Gating
Conclusion
3 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MOTIVATION
4 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
SERVER COST TRENDS
Server compute capacity/$ is increasing steadily
– Driving up total power/$ spent on servers
– Also driving up electrical power spending
Green initiative and focus on energy independence creates additional pressure to lower energy costs
Increased energy efficiency directly reduces costs
• Source Data: Uptime Institute, 2009
5 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MOBILE AND DESKTOP
High degree of integration of special functions
Diverse workloads
– Data serial, data parallel
– Streaming
– Compute intensive in bursts
– Long periods of inactivity
Small form factor
Long battery life
Instant access of the device
Autonomous power management
Now: Parallel/Data-Dense
16:9 @ 7 megapixels
HD video flipcams, phones, webcams (1GB)
3D Internet apps and HD video online, social networking w/HD files
3D Blu-ray HD
Multi-touch, facial/gesture/voice recognition + mouse & keyboard
All day computing (8+ Hours)
Immersive and interactive performance
Wo
rklo
ad
s
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
6 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER/PERFORMANCE TRADE-OFF
Watt
s/C
ore
0
5
10
15
20
25
30
35
0.725 0.775 0.825 0.875 0.925 0.95 1.025 1.075 1.125 1.175 1.225
Voltage
0
2
4
6
8
10
12
Power
Fmax/Voltage
Perf/Watt
Performance/Watt
maximized at the
lowest voltage
operating point
Processor Power-Frequency/Voltage
Fre
qu
en
cy (
GH
z)
Performance/Watt maximized at VMIN
7 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
HIGH PERFORMANCE CORE POWER
Depending on process node, leakage ~ 40% total power
8 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
Two major knobs have emerged for controlling power
1. Dynamic Voltage and Frequency Scaling
– Optimize performance for the application while it’s running
2. Power Gating
– Gate power during idle periods
Each present unique challenges for
implementation and optimization
9 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DYNAMIC VOLTAGE/FREQUENCY SCALING
10 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CPU
GPUDDR PHY
NB
AnalogIO
PHY
LLANO ACCELERATED PROCESSING UNIT (APU)
~ Power Breakdown
• Integration Provides Improvement
• Eliminate power and latency of extra chip crossing
• 3X bandwidth between GPU and Memory
• Same sized GPU is substantially more effective
• Power efficient, advanced technology for both CPU and GPU
• Thermal management optimization between CPU and GPU
Key features for Mobile Mark 07
– Lower Idle, Active power
– DVFS / Clock-gating / Power Gating
– Robust and Flexible Power Management Control
11 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU POWER MANAGEMENT
- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009
12 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DYNAMIC VOLTAGE SCALING
Dynamic frequency and voltage scaling (DVFS)
Match the frequency of operation with the workload
Lower voltage to sustain required frequency
Examples:
P-states for active control of voltage/frequency
C-states for degrees of idle control
Voltage
P-states: Core executing at some frequency
C1/C2: Core halted and frequency ramped down
C3/C4: Core halted, clocks ramped to even lower frequency, maintain caches, maintain core
C5: Core halted, clocks ramped to even lower frequency, flush caches, maintain core
C6: Core off, flush caches, core state saved to DRAM
VP-states
VAltvid
Vdeeper Altvid
0V
13 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DESIGN FOR DVFS
Optimization of timing across entire voltage range necessary for efficiency DVFS
Required two separate timing corners with equally aggressive cycle time goals
– 0.8V & 1.3V
Both gate-dominated timing paths and paths with high RC content have been tuned to provide better scaling than the prior design
Enables the core to thoroughly exploit boost capability
14 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU DVFS
- Sebastien Nussbaum, IEEE Vail Computer Elements, June 22th, 2009
15 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU THERMAL PROFILE
85.0
90.0
95.0
100.0
105.0
110.0
CPU-centric workload
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
16 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU THERMAL PROFILE
85.0
90.0
95.0
100.0
105.0
110.0
GPU-centric data parallel workload
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
17 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
APU THERMAL PROFILE
85.0
90.0
95.0
100.0
105.0
110.0
Balanced workload
- Samuel Naffziger, VLSI Symposium 2011, Kyoto
18 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
WHAT HAPPENS WHEN WE ADD MORE CORES?
VRM
Misc
VDDNB
VDD
DIMMs
SVI
CPU 0 CPU 1
CPU 2 CPU 3
GraphicsNorthbridge
VLDTVDDIO, VTT
VDDA
PLL PLL
PLLPLL
PLL
PLLs
PLLs
Already at 6,2 high current supplies
19 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
PACKAGE LIMITS
Packages add constraints to increasing number of discrete supplies
Typically only use 4 thick layers for high current power distribution
3-4 thin build-up layers on each side
– Used for secondary supply and signal routing to pins
Difficult to add many more supply rails
Typical low cost package structure
Chip
20 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
TYPICAL PACKAGE LAYER ROUTING LAYERS
VDD VDDIO
VD
DA
VD
DR
VD
DN
B
M1 – PWR (top metal layer) M2 – routing layer 1Package capacitors
Space limited for capacitors
Routing congestion
21 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CORE VOLTAGE POWER DELIVERY IN PACKAGE
Package Routing Constraints forces compromises for VDD plane
– More difficult with increasing VDD planes
More resources dedicated to VSS
– Shows less droop
VDD Max Package Droop VSS Max Package Droop
1
7
13
19
25
31
S1 S
4 S7
S1
0
S1
3
S1
6
S1
9
S2
2
S2
5
0.032
0.033
0.034
0.035
0.036
0.037
0.038
0.039
0.04
0.041
1 4 7
10
13
16
19
22
25
28
31
34
S1
S8
S15
S22
S29
-0.029
-0.0285
-0.028
-0.0275
-0.027
-0.0265
-0.026
-0.0255
-0.0252.4x
higher
22 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
WHAT DOES THE ACTUAL VDD LOOK LIKE?
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.1
1.15
1.2
1.25
1.3
1.35
Time
25ns trace of DGEMM benchmarkV
DD
No
rmalized
Cu
rren
t
X86 core Supply simulation w/ package board model
Increased local decoupling cap could help
~50mV
23 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
PACKAGE CAPS FOR SUPPLY DECOUPLING
Marginally effective
Most effective
Hard to find places for additional components
24 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LARGE ON-DIE DECOUPLING CAPACITOR FOR REDUCTION
OF SUPPLY NOISE
- D. Wendel, et.al. “The Implementation of POWER7: A Highly Paralleland Scalable Multi-Core High-End Server Processor”, ISSCC 2010
100fF/um2 ~ 25x thick-ox
Shift resonant frequency from 80MHz -> 7MHz
Trench Capacitor
Adds significant process cost
25 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
METHODS FOR INTEGRATED VOLTAGE REGULATION
Switched capacitor regulator
Buck converter
Linear regulator
Switched voltages
26 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED SWITCHED CAPACITOR REGULATOR
Efficiency dependent on load current
– Most efficient at 2:1 voltage conversion ratio
– Limits number of switches in the path
– ~90% for 2:1, ~70% for other values
Requires large capacitors
– Can use trench capacitor for better area efficiency
Large voltage and current ripple
– Can mitigate with large decap and/or interleaved converters
Leland Chang, Robert K. Montoye, Brian L. Ji, Alan J. Weger, Kevin G. Stawiasz, and Robert H. Dennard, “A Fully-Integrated Switched-Capacitor 2:1 Voltage Converter with Regulation Capability and 90% Efficiency at 2.3A/mm2”, VLSI Symposium , 2010
2:1 Conversion
27 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED BUCK CONVERTER
Integrated inductor options
~ 0.33nH possible
High frequency operation can reduce inductance requirements
L proportional 1/Fs
Can reasonably approach 1GHz when fully integrated
Parasitic series resistance is key problem
I2R loss large
Multi-phase or parallel regulator architecture can help
– (I/n) 2 R (n=number phases)
– Current ripple is averaged by output cap
Too many phases will reduce efficiency for low current loads
Interleaved Outputs
W. Kim, et.al, June, 2007
28 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED BUCK
CONVERTER KEY
LIMITATION:
DCR VS. INDUCTANCE
CMOS Inductor vs. Small Package size components
Peak Efficiency considering inductor resistive loss alone
• Embedded inductors needs lower resistance or higher inductance/turn
• Multi-phase or parallel can approach reasonable efficiency only with very high number of paths
• Reduces current in each inductor
• Fully embedded approach really needs a specialized integrated inductor technology
Discrete inductors consume valuable package resources
>70%
29 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
INTEGRATED LINEAR REGULATION
Error between divided output and reference voltages is and fed to the pass transistor to correct the error
NMOS type higher performance
PMOS type (Low Drop Out)
Efficiency good only for small (Vin – Vout)
Eff = (Power Out)/(Power In)
= ~ Vout/Vin
LDO design can be more efficient (PMOS output)
– Stability & speed of feedback become an issue
Vin
Vout
Vref
ErrAmp
Iload
30 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LINEAR REGULATOR (PMOS)
PMOS based linear regulators offer low dropout voltage while operating within (0-Vin) range
Required gain-bandwidth product challenging high for processor loads
BW ~ 1GHz needed
Hurts efficiency, power wasted in Op-amp
+
-Vref
Vin
Vout
Iload Reff Cdecap
31 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LINEAR REGULATOR (NMOS)
Source-follower topology
Will require charge pump to provide gate overdrive.
Accurate voltage setting.
Supply rejection – dropout voltage tradeoff
+
-
Vin
Vout
Iload Reff Cdecap
Charge Pump
Vref
Intrinsically good frequency response
Charge-pump (or low current supply) allows Low drop out
Higher Efficiency
Amplifier is high gain, low bandwidth.
Bandwidth and efficiency depends on desired slew rate
0Vth devices offer additional efficiency
High BW response makes it
a good candidate for integration
w/o large Capacitors
Requires charge pump or high
voltage supply >Vin for good efficiency
32 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DVFS SYSTEM IMPLEMENTED WITH DUAL VOLTAGES
Fully integrated167 processor system
Each processor tile contains a core that operates at:
– A fully-independent clock frequency
Any frequency below maximum
Halts, restarts, and changes arbitrarily
– Dynamically-changeablesupply voltage
VddHigh or VddLow
Disconnected for leakage reduction
Each power gate comprises 48 individually-controllableparallel transistors
Dean N. Truong, Wayne H. Cheng, Tinoosh Mohsenin, Zhiyi Yu, Anthony T. Jacobson, Gouri Landge, Michael J. Meeuwsen, Christine Watnik, Anh T. Tran, Zhibin Xiao, Eric W. Work, Jeremy W. Webb, Paul V. Mejia, Bevan M. Baas, “167-Processor Computational Platform in 65 nm CMOS”, JSSC, Vol. 44, No. 4, April 2009
33 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING
34 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
WHAT'S THE PROBLEM?
JUST TURN IT OFF WHEN YOUR NOT USING IT!
Many choices and options
Need to analyze your particular application for the optimal solution
Considerations:– Design complexity
– Power savings
– Frequency/performance impact
– Wake-up time
– Area overhead
– Verification complexity
– Return on investment (ROI)
35 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING MODES (ACTIVE)
During active mode
Natural low gate activity factors inside of logic block allow sharing of power gate device for many logic functions with gated logic block
– Reduces resistance requirements of power gate device
– Works to improve Ion/Ioff ratio requirements
Design objective
– Minimize resistance of cut-off device
Penalties:
– Increased logic delay
– Requires higher external supply voltage for same frequency
Consumes more power during active mode
Header Switch
Logic Block
IMAXRON
MAXONIRPG
IRPGddLOGIC
IRV
VVV
36 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING MODES (SLEEP)
During sleep mode
Steady state
– Want to maximize OFF resistance of cut-off device
– Reduce stand-by power with lower leakage
(10x -> 1000x possible)
Transition to cutoff
– Creates supply di/dt for small ISLEEP
vs. large IL
Header Switch
Logic Block
ISLEEPROFF
CUTOFF
LSLEEPCUTOFF
SLEEPddSLEEP
t
II
dt
dI
IVP
37 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATING MODES (WAKE)
During wake-up mode
– Need to control in-rush
current to charge logic devices to VDD
Logic capacitance is discharged to nVdd during sleep mode
Header Switch
IWAKERWAKE
WAKE
ddWAKE R
VnI
1
CLogic
1st order analysis
38 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LEAKAGE SAVINGS WITH POWER GATING
Early Llano Measured Results
Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An x86-64 Core in 32 nm SOI CMOS”, JSSC, January, 2011
>20x leakage savings
Very effective method for controlling power during idle periods
39 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
DETERMINATION OF FREQUENCY IMPACT DURING
ACTIVE MODE
Determine tolerable voltage drop for active mode operation
– De-rate timing models by fixed maximum value to account for voltage loss in power switch
– Increase external voltage to compensate for voltage drop
Use critical path simulations or representative RO data for voltage/frequency trade-off
Design worst-case power gate and grid resistance to meet specification target
Analyze drop at highest operating voltage and highest power consumption
Keep voltage drop due to power gating small to minimize uncertainty at power island boundary crossings
2nd order effects:
– Circuit area growth due to wire blockages caused by power switch
– Decreased available quiet decoupling capacitance due to increased resistance to implicit and explicit charge reservoirs on the die
40 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
FREQUENCY IMPACT ESTIMATED WITH RING OSCILLATOR
0.0%
0.5%
1.0%
1.5%
2.0%
2.5%
3.0%
0.6 0.7 0.8 0.9 1 1.1 1.2 1.3 1.4
VDD (V)
Fre
qu
en
cy D
elt
a
0.5% Delta VDD
1.0% Delta VDD
1.5% Delta VDD
Target IR drop:VIRPG~11mV @1.1V
1% VDD loss ~ 1% frequency loss
32 nm FO1 RO Data (RVT, min W)
41 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
POWER GATE CELL I-V CHARACTERISTICS
0.0E+00
1.0E-04
2.0E-04
3.0E-04
4.0E-04
5.0E-04
6.0E-04
7.0E-04
0.00 0.01 0.02 0.03 0.04 0.05 0.06
VDS (V)
IDS
(A
)
74.00
74.50
75.00
75.50
76.00
76.50
RD
S (
oh
ms
)
FET operates in the linear region
Can be approximated as a resistor
Target region of operation
IDS
RDS
42 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EMBEDDED ROW-BASED POWER SWITCH PLACEMENT
PS PS PS PS PS PS PS PS PS PSPS
PS PS PS PS PS PS PS PS PS PSPS
VirVDD
VDD
VSS
VirVDD
VDD
VSS
Current Flow
Pre-populate logic area with power gate cells at uniform locations prior to place and route
43 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MAXIMUM POWER GATE SPACING(ROW-BASED PLACEMENT)
Assume uniform current density
Can estimate power gate spacing with respect to IR and EM limits
VIRPG = 2 * rows * IROW * RPG
VIR_WC = VIRPG + ∑ VIRW
VIRW(0) VIRW(1) VIRW(2) VIRW(3) VIRW(4)
VIRPG
Imax
44 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
IR Drop Along Virtual VSSPerpendicular to Standard Cell Rows
0.0
10.0
20.0
30.0
40.0
50.0
60.0
70.0
0 20 40 60 80 100 120 140
Standard Cell Rows Between Footers
IR D
rop (
mV
)
1X Metal Virtual Supply IR drop
1.3X Metal Virtual Supply IR drop
IR DROP EXAMPLE (UNIFORM POWER DENSITY)
Max current density = 1.0A/mm2
75.57ohm footer30rows
45 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EM Margin Along Virtual VSSPerpendicular to Standard Cell Rows
0%
5%
10%
15%
20%
25%
0 20 40 60 80 100 120 140
Standard Cell Rows Between Footers
EM
Lim
it
1X Metal EM Limit
1.3X Metal EM Limit
EM EXAMPLE (UNIFORM POWER DENSITY)
Max current density=1.0A/mm2
Thick metal will provide more EM margin30rows
46 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
VERTICAL CROSS SECTION
PS
C4 Bump
M4
M5
M6
M7
M8
M9
M10
M11
M3
M2
Full VDDGrid
VDDViaTower
Logic
VirtualVDDGrid
VDD VVDD
thic
ker m
eta
l
EM
EM hazards can occur in high power density regions
47 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EMBEDDED POWER GATE EXAMPLESTYLE USED IN LLANO GPU
Power gate:
Checkerboard
pattern
Always-on cells:
FeedThru cells,
Power control
logic, Clamp
cells for macro
input pins and tile output ports.
Input I/O buffers
AMD GPU Functional Unit Power Gate Tile
SRAM
Power gate cells must squeeze around arraysProviding power to always-on cells can be problematicIP I/O protection typically needed to isolate individual regionsSRAMs need their own custom power gating
48 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EMBEDDED POWER GATE EXAMPLE
AMD GPU Functional Unit Power Gate Tile
Simulated with Apache Redhawk in vectorless dynamic analysis mode
Power gating increases distribution of IR drop on gridCan cause problems with critical circuits and paths
49 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
EXTERNAL RING STYLE GATING
Interrupt global supply grid to power-gated IP block
Requires careful chip floorplanning to accommodate supply interruption
Increased sharing of power gate devices can ease power gate sizing
– Large IP power gating can greatly ease worst-case current analysis
Can be difficult to create always power-on islands with power-gated region
VGND
GNDGlobalGrid
Macro/Core
Footer SwitchLocation
VirtualGrid
M1 metal
M2 metal
50 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LLANO CPU CORE RING GATING
WITH PACKAGE LAYER ASSIST
Ravi Jotwani, Sriram Sundaram, Stephen Kosonocky, Alex Schaefer, Victor F. Andrade, Amy Novak, Samuel Naffziger, “An x86-64 Core in 32 nm SOI CMOS”, JSSC, January, 2011
Virtual voltage spreads uniformlyBumps near hot spots can exceed max limits
High RVSS can create noise issues
51 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
STRAIGHT EDGE & ZIG/ZAG EDGE PACKAGE PLANE
BOUNDARIES
Peak currents flowing into bumps at periphery can be alleviated using zig/zag approach
52 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
MAXIMIZING POWER GATING OPPORTUNITIES
To maximize power savings, we would like to go into a power gating mode as soon as possible to reduce leakage power
Practical constraints limit opportunity
– Energy break-even point
– In-rush current induced noise
– Time to restore state of power gated region
A
c
t
i
v
e
Idle Idle
time
Power
A
c
t
i
v
e
IdleApplicationMode
A
c
t
i
v
e
53 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
ENERGY BREAK-EVEN POINT
Power Gate Example:
T0: Logic block starts inactivity
T1: Control circuit decides to power gate
T2: Power gate signal propagation complete
T3: Energy break-even point
T4: Full discharge of virtual supply
T5: Control circuit decides to re-power logic
T6: Power gate signal propagation complete
T7: Gated logic is fully charged and ready for activity
Z. Hu et al., "Microarchitectural Techniques for Power Gatingof Execution Units," ISLPED'04, August 9–11, 2004, Newport Beach, California, USA.
54 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen KosonockyS. Kosonocky 54
IN-RUSH CURRENT CONTROL
During wake-up of the power gated region it’s critical to control the in-rush current
The effective resistance of parallel combination of footers is very small
– Can create excessive currents
Transition to cutoff mode can also create similar di/dt events
Hazards
– Supply bounce from large di/dt
– EM violations
– Short circuit currents from imbalanced nodes in power gated region during wake-up
55 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky 55
UNDERSTANDING SUPPLY BOUNCE
Suhwan Kim, Chang Jun Choi, Deog-Kyoon Jeong, Stephen Kosonocky, Sung Bae Park, “Reducing Ground-Bounce Noise and Stabilizing the Data-Retention Voltage of Power-Gating Structures”, IEEE Transactions on Electron Devices, Vol. 55, NO. 1, January 2008
In-rush current can causesupply bounce on neighboring IP
56 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen KosonockyS. Kosonocky 56
IN-RUSH CURRENT SIDE-EFFECTS
Disturb neighboring circuits
– Retention flops, adjacent IP logic, SRAM
Supply bounce, ground bounce
Over stress gate-oxides due to voltage excursions
Un-even distribution of wake signals during power-up
– Creates excessive short circuit currents when some gates are charged and others are not
Large di/dt can cause EM failures
57 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
PROGRAMMABLE IN-RUSH CURRENT CONTROL
When exiting power gating, power gates are gradually enabled in groups with progressively larger effective device widths
– Controlled by FSM
– Fuse programmable strengths and duration for post-silicon tuning
– Analytical R-C model w/o inductance demonstrating control
group1
group2
group3
group4
group5
58 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
LLANO CPU CORE IN-RUSH CURRENT ON CC6 EXIT
0
0.2
0.4
0.6
0.8
1
1.2
1.4
0.00
0.50
1.00
1.50
2.00
2.50
3.00
3.50
0 20000 40000 60000 80000 100000
VS
SC
OR
E (
V)
Curr
ent
(A)
Time (ps)
Apache Redhawk In-rush Current Simulation (Conservative Programming)
IVSSCORE0
VSSCORE0
Fuse programmable in-rush control built-inSample setting shows a 3A peak during CC6 exit
59 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
HW DATA SHOWING IN-RUSH EFFECT ON OTHER CORES
3 cores in/out CC61 core max freq
1 cores in/out CC63 core max freq
3 cores in/out CC61 core max freq
1 cores in/out CC63 core max freq
Large in-rush setting Small in-rush setting
Transient droop limited by CC6 exit Transient droop limited by # coresrunning max frequency
60 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
STATE SAVE AND RESTORE
Regulate virtual supply during sleep mode to low voltage
– Moderate leakage reduction possible
– Globally applies to all retention elements in gated region
– Example:
K. Kumagai et al., "A Novel Powering-down Scheme for Low Vt CMOS Circuits," Symposium on VLSI Circuits, 1998 (Virtual Rail CMOS).
L. Clark, S. Demmons, "Standby Power Management for a 0.18μm Microprocessor," ISLPED 2002 (Intel XScale processor).
Retention flops with always-on latch cell
– Many variants possible
– Some with explicit control or automatic restore
– Good practice to avoid adding power domain crossing in functional path of flop
– Can increase timing uncertainty with virtual supply bounce
– Reduces requirements of always-on supply distribution
Write-out critical state to memory using serial or parallel paths
– Can utilize scan mechanism to avoid a dedicate bus
– Low area overhead
– Can be a large time overhead depending on total size of state restore memory
61 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CONCLUSION
62 | Practical Power Gating and Dynamic Voltage/Frequency Scaling Issues | August 17, 2011 | Stephen Kosonocky
CONCLUSION
DVFS provides a nice capability for optimization of CPU/GPU/APU performance per application
– As we add more cores and IP to the SOC, it’s will be difficult to continue to provide unique voltage rails with external regulators
– Integrated regulation has it’s own challenges, more details will be covered by the other presenters
Power gating can mitigate the need for additional voltage rails
– Maximum frequency impact is minimal
– Integration by embedding in IP or surrounding in a ring each have their own issues
– Opportunity is limited by energy breakeven, in-rush noise, and state restore overhead
– In-rush current control is necessary to prevent frequency impacts