modeling and analysis of power supply noise tolerance with fine … · 2016-11-07 · robust low...
RobustLowPowerVLSI
Modeling and Analysis of Power Supply Noise Tolerance with Fine-grained GALS Adaptive Clocks
Divya Akella Kamakshi*, Matthew Fojtik†, Brucek Khailany†, Sudhir Kudva†, Yaping Zhou†, Benton H. Calhoun*
*University of Virginia †NVIDIA Corporation

Outline
§ Problem: Power supply noise
§ Existing solution
§ Traditional adaptive clocking scheme
§ Drawbacks
§ Proposed solution: Fine-grained GALS adaptive clocking
§ Quantification of benefits
§ Experimental setup
§ Simulation results
§ Summary

Problem: Power Supply Noise
[Figure: chip, package, and board.]
Figure courtesy: http://www.theregister.co.uk/2012/05/18/inside_nvidia_kepler2_gk110_gpu_tesla/ and http://www.ansys.com/Products/Electronics/Option-SIwave-PSI-Solver

Problem: Power Supply Noise
[Figure: lumped PDN model from the regulator (Vreg) through board, package, and die to the load (VDD, GND). Labeled elements: Rreg, Lreg, Rbulk, Lbulk, Cbulk, Rmb, Lmb, Rhf, Lhf, Chf, Rpkg, Lpkg, Rpkg_p, Lpkg_p, Cpkg, Rbump, Lbump, Rdie, Ldie, Cdie.]
Parasitics → power supply noise !!!
[Figure: PDN impedance (Ω), 0 to 2.5 mΩ, vs. frequency (Hz), 100 kHz to 1 GHz. Labels: first droop, resonant frequency = 30 MHz; second droop.]

Problem: Power Supply Noise
§ Resistive component → IR drop
§ Reactive component → L·dI/dt droop: first, second, third droop . . .
Image courtesy: http://www.soccentral.com/results.asp?EntryID=19453
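
As a rough worked form of the two components above, assume the whole delivery path is lumped into one effective series resistance R and one effective series inductance L. This is a simplification, and the symbols R, L, and I(t) are introduced here for illustration rather than taken from the slides. The voltage seen by the load is then approximately:

```latex
V_{DD}(t) \;\approx\; V_{reg}
  \;-\; \underbrace{I(t)\,R}_{\text{IR drop}}
  \;-\; \underbrace{L\,\frac{dI(t)}{dt}}_{L\,dI/dt\ \text{droop}}
```

A steady load current produces only the resistive term; a fast change in load current adds the inductive term.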

Problem: Power Supply Noise
Supply noise → timing errors → performance degradation.
[Figure: adaptive clock generator, clock tree (insertion delay Δt), and load, with voltage, clock, and data-path waveforms, nominal vs. droop. During a droop the data path slows down while the clock frequency remains the same.]
Operate at a frequency that can handle worst-case noise !!!

Existing Solution: Traditional Adaptive Clocking
§ Adaptive clocking [1][2]
Frequency tracks voltage !!!
[Figure: voltage rail with Fmax-adaptive clock and Fmax-fixed clock levels; adaptive clock and fixed clock waveforms.]

Existing Solution: Traditional Adaptive Clocking
§ Fixed clocking vs. adaptive clocking
[Figure: voltage rail with Fmax-adaptive clock and Fmax-fixed clock levels; adaptive clock and fixed clock waveforms.]
§ Fixed clock: tolerates worst-case noise
§ Adaptive clock: frequency tracks voltage noise

Metric: Uncompensated Voltage Noise (UVN)
Uncompensated voltage noise: UVN = Vmean – Vreq
§ Vmean: available voltage, averaged over a clock cycle
§ Vreq: required voltage (for operation of circuits at the required frequency)
When Vmean > Vreq, no problem !!!
When Vmean < Vreq → additional margin is needed for failure-free operation (UVN)
Lower UVN → lower margin
Lower UVN is better !!!
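
The metric above can be sketched numerically. This is a minimal Python sketch, not the authors' implementation: the function names, the sample waveform, and the 0.95 V requirement are hypothetical. The slides write UVN = Vmean – Vreq but report positive values (for example 111 mV) when the rail falls short, so the sketch assumes the reported figure is the magnitude of the shortfall, max(0, Vreq – Vmean).

```python
from statistics import fmean


def cycle_mean_voltage(samples):
    """Vmean: supply voltage averaged over one clock cycle."""
    return fmean(samples)


def uvn_mv(vmean, vreq):
    """Uncompensated voltage noise in mV.

    Assumption: reported as the shortfall magnitude, so it is zero
    whenever the available voltage meets or exceeds the requirement.
    """
    return max(0.0, (vreq - vmean) * 1e3)


# Hypothetical samples of the rail during one clock cycle (volts).
droop_cycle = [0.93, 0.92, 0.91, 0.92]
vmean = cycle_mean_voltage(droop_cycle)

print(f"Vmean = {vmean:.3f} V")
print(f"UVN with droop    = {uvn_mv(vmean, vreq=0.95):.1f} mV")
print(f"UVN without droop = {uvn_mv(1.00, vreq=0.95):.1f} mV")
```

Clamping at zero reflects the "no problem" case on the slide: surplus voltage does not earn negative margin.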

Existing Solution: Traditional Adaptive Clocking
§ Fixed clocking vs. adaptive clocking
[Figure: voltage rail with Fmax-adaptive clock and Fmax-fixed clock levels; adaptive clock and fixed clock waveforms.]
§ Fixed clock
  Vreq: voltage corresponding to the fixed frequency
  Vmean: voltage available when the rail droops
  UVN = Vmean – Vreq
§ Adaptive clock
  Vreq: voltage corresponding to the adaptive frequency
  Vmean: voltage available when the rail droops
  UVN = 0 (expected)
Adaptive clocking is a great solution !!!

Drawbacks: Traditional Adaptive Clocking
[Figure: 25 mm × 25 mm chip divided into clock domains.]
§ Large chips with a few clock domains
§ But each clock domain is still many mm²

Drawbacks: Traditional Adaptive Clocking
Drawback #1: Effect of Clock-tree Insertion Delay
[Figure: adaptive clock generator driving the load through a clock tree; the insertion delay Δt is the delay from root to leaf. Voltage and clock waveforms.]

Drawbacks: Traditional Adaptive Clocking
Drawback #1: Effect of Clock-tree Insertion Delay
[Figure: adaptive clock generator driving the load through a clock tree with insertion delay Δt; voltage waveform with the clock at the root and at the leaf.]
§ Clock – root: adaptive clock responds to supply noise
§ Clock – leaf: Δt is the time for the stretched pulses to reach the load (~1–2 ns)
Higher UVN: additional margin for failure-free operation !!!
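
The effect of insertion delay can be illustrated with a toy model. Assume, purely for illustration, that the supply noise is a sinusoid of amplitude A at frequency f, that the generator tracks the voltage perfectly, and that a clock edge generated at time t − Δt reaches the leaf at time t. The leaf then runs at a frequency that requires v(t − Δt) while only v(t) is available, and the worst-case shortfall is 2·A·sin(π·f·Δt). The 100 mV amplitude and all function names are hypothetical; cycle averaging and spatial effects are ignored, so this is not expected to reproduce the simulated results in the slides.

```python
import math


def worst_case_lag_shortfall_mv(amplitude_mv, noise_freq_hz,
                                insertion_delay_s, steps=10000):
    """Largest value of v(t - dt) - v(t) over one noise period, in mV."""
    period = 1.0 / noise_freq_hz
    omega = 2 * math.pi * noise_freq_hz
    worst = 0.0
    for k in range(steps):
        t = k * period / steps
        v_now = amplitude_mv * math.sin(omega * t)
        v_then = amplitude_mv * math.sin(omega * (t - insertion_delay_s))
        worst = max(worst, v_then - v_now)
    return worst


def closed_form_mv(amplitude_mv, noise_freq_hz, insertion_delay_s):
    """Analytic worst case for a sinusoid: 2*A*|sin(pi*f*dt)|."""
    return 2 * amplitude_mv * abs(
        math.sin(math.pi * noise_freq_hz * insertion_delay_s))


# Delays and the 30 MHz resonance come from the slides; 100 mV is assumed.
for delay_ns in (0.3, 0.6, 0.9, 1.2, 1.5):
    numeric = worst_case_lag_shortfall_mv(100.0, 30e6, delay_ns * 1e-9)
    analytic = closed_form_mv(100.0, 30e6, delay_ns * 1e-9)
    print(f"{delay_ns:.1f} ns: numeric {numeric:.2f} mV, "
          f"closed form {analytic:.2f} mV")
```

In this model the shortfall grows almost linearly with Δt as long as Δt is much shorter than the noise period, which is the qualitative point of Drawback #1.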

Drawbacks: Traditional Adaptive Clocking
Drawback #2: Effect of Spatial Workload Variations
Current variations across chip → variations in voltage fluctuation across chip
[Figure: adaptive clock generator driving the load through a clock tree with insertion delay Δt. A voltage spike in the clock generator region coincides with a voltage droop in the load region; voltage waveform with the clock at the root and at the leaf.]
Higher UVN, additional margin for failure-free operation !!!

Drawbacks: Traditional Adaptive Clocking
[Figure: 25 mm × 25 mm chip.]
§ Higher clock domain area
  § Effect of clock-tree insertion delay is higher
  § Spatial difference in voltage fluctuations is higher

Proposed Solution: Fine-grained GALS Adaptive Clock
GALS: globally asynchronous, locally synchronous
§ Traditional adaptive clock: clock domain of many mm²
§ Fine-grained GALS adaptive clock: clock domain as small as a mm²

Proposed Solution: Fine-grained GALS Adaptive Clock
A) Asynchronous boundary crossing:
§ B. Keller et al.
§ Pausible bisynchronous FIFO design
§ Easily integrated into standard tool flows
§ Average latency 1.34 cycles
B) Myriad local clocks
§ Ring oscillators: mW range power

Proposed Solution: Fine-grained GALS Adaptive Clock
§ Lower clock domain area
  § Lower insertion delay (a few 100 ps)
  § Lower variation in voltage fluctuation
[Figure: adaptive clock generator, clock tree, and load, with voltage and clock waveforms. The clock-tree insertion delay is on the order of a few 100 ps only; both the clock generator and load regions experience similar switching activities.]
Fine-grained GALS adaptive clocks → lower UVN (lower margin) !!!

Experimental Setup
[Figure: simulation setup. Vreg supplies the power distribution network, which feeds the adaptive clock generator, the clock tree (insertion delay Δt), and the load; voltage and clock waveforms.]

Experimental Setup
[Figure: same setup, with the power distribution network covering PCB, package, and on-chip. The PDN shown is the lumped regulator, board, package, and die model between Vreg and the load (VDD, GND).]

Experimental Setup
[Figure: Vreg, power distribution network (VDD), adaptive clock generator, clock tree, and load.]
Traditional Adaptive Clocking
§ Long clock tree → up to 2 ns
§ Set PDN area to many mm²
Fine-grained GALS Adaptive Clocking
§ Short clock tree → as low as 300 ps
§ Set PDN area to just a mm²

Details: Experimental Setup
[Figure: block diagram of Vreg, power distribution network (VDD), adaptive clock generator, clock tree, and load.]

Power Distribution Network
Simple lumped PDN model
§ Cannot model spatial voltage variations
§ Need distributed PDN – VoltSpot [3]
[Figure: lumped PDN schematic from Vreg through PCB, package, and bump to the die and load, with a return path to Gnd. Element values as labeled: 21p, 0.094m, 0.166m, 240µ, 2.83p, 0.045m, 5µ, 0.0025m, 0.0045p, 0.295u.]

Power Distribution Network
Distributed PDN model using VoltSpot
• Total chip area
• PDN divided into an array (47 x 47)
[Figure: distributed PDN. Vreg and Gnd connect through the PCB + package resistance and distributed package inductance to the distributed on-chip grid + C4 pads, with distributed load + decaps.]
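
To get a feel for the grid resolution, the arithmetic below combines the 47 x 47 array with the 23.5 mm chip side shown on the spatial-variation results slides. That the grid spans exactly that side length is an assumption, and the variable names are illustrative.

```python
GRID_N = 47            # array dimension from the slide
CHIP_SIDE_MM = 23.5    # chip side from the results slides (assumed to match)

cells = GRID_N ** 2
pitch_mm = CHIP_SIDE_MM / GRID_N
cell_area_mm2 = pitch_mm ** 2
cells_per_mm2 = 1.0 / cell_area_mm2

print(f"grid cells          : {cells}")
print(f"cell pitch          : {pitch_mm} mm")
print(f"cells per 1 mm^2    : {cells_per_mm2:.0f}")
```

Under that assumption a 1 mm² fine-grained clock domain covers only a handful of grid cells, while a many-mm² traditional domain spans a large block of them.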

Adaptive Clock Generator
§ Verilog-A model
§ Voltage averaged over a cycle: Vmean
§ Voltage vs. frequency (VF) curve
How is the VF curve generated?
§ Critical path: longest circuit path on an SoC
§ Emulate critical path using a 45 nm PDK
§ Simulate for max frequency vs. voltage → VF curve
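
One way to picture how a VF curve is used: the generator maps Vmean to a clock frequency, and computing UVN needs the inverse mapping from frequency back to Vreq. The sketch below does both with piecewise-linear interpolation over a small table. The table values are hypothetical placeholders, not data from the slides, and the function names are illustrative; the authors' model is written in Verilog-A.

```python
from bisect import bisect_right

# Hypothetical (voltage in V, max frequency in MHz) points. A real curve
# would come from simulating the emulated critical path.
VF_POINTS = [(0.80, 600.0), (0.90, 730.0), (1.00, 850.0), (1.10, 960.0)]


def _interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped to the table ends."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x)
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fmax_mhz(vmean):
    """Frequency the adaptive clock would produce for a given Vmean."""
    volts, freqs = zip(*VF_POINTS)
    return _interp(vmean, volts, freqs)


def vreq_volts(freq_mhz):
    """Voltage required to run at freq_mhz (inverse of the VF curve)."""
    volts, freqs = zip(*VF_POINTS)
    return _interp(freq_mhz, freqs, volts)


f = fmax_mhz(0.95)
print(f"Fmax at 0.95 V: {f:.1f} MHz")
print(f"Vreq at {f:.1f} MHz: {vreq_volts(f):.3f} V")
```

The inverse lookup only works because the table is monotonic: a higher voltage always allows a higher frequency.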

Clock-tree
§ Global and local clock distribution
§ Insertion delay vs. voltage
§ Curve-fit polynomial used in the Verilog-A model

Workload
§ Current switching activity → voltage rail fluctuation
§ Resonating current profiles have the worst effect on supply noise
§ Frequency of interest: 10 – 40 MHz (resonance at 30 MHz)
§ Current slew rate: 10 A to 90 A over 10 clock cycles
§ System frequency: 850 MHz, supply voltage = 1 V
[Figure: PDN impedance (Ω), 0 to 2.5 mΩ, vs. frequency (Hz), 100 kHz to 1 GHz. Labels: first droop, resonant frequency = 30 MHz; second droop.]
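
The workload numbers above imply a current slew rate and a ratio between the clock and the PDN resonance. The short calculation below works them out from the slide values; only the variable names are introduced here.

```python
F_CLK_HZ = 850e6                 # system frequency
I_LOW_A, I_HIGH_A = 10.0, 90.0   # current step
RAMP_CYCLES = 10                 # step duration in clock cycles
F_RES_HZ = 30e6                  # PDN resonant frequency

t_cycle_ns = 1e9 / F_CLK_HZ
ramp_ns = RAMP_CYCLES * t_cycle_ns
di_dt_a_per_ns = (I_HIGH_A - I_LOW_A) / ramp_ns
cycles_per_resonance = F_CLK_HZ / F_RES_HZ

print(f"clock period           : {t_cycle_ns:.3f} ns")
print(f"ramp duration          : {ramp_ns:.2f} ns")
print(f"current slew rate      : {di_dt_a_per_ns:.1f} A/ns")
print(f"clock cycles/resonance : {cycles_per_resonance:.1f}")
```

The resonance period spans tens of clock cycles, so an insertion delay of 1 to 2 ns is a small but non-negligible fraction of one noise period.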

Simulation Results
A. Effect of Clock-tree Insertion Delay
§ Uniform current distribution throughout PDN area
§ Sweep workload frequency: 10 – 40 MHz, insertion delay: 0.3 – 1.5 ns
[Figure: uncompensated voltage noise (mV), 0 to 50, vs. workload frequency (MHz), 0 to 40, with one curve per insertion delay: 1.5 ns, 1.2 ns, 0.9 ns, 0.6 ns, 0.3 ns.]

Simulation Results
B. Effect of Spatial Workload Variations
[Figure: 23.5 mm × 23.5 mm chip split into upper and lower halves, showing the clock and measurement locations for traditional adaptive clocking and for fine-grained GALS adaptive clocking.]

Simulation Results
B. Effect of Spatial Workload Variations:
• Lower half: 80% of power (top half: 20%)
• Workload frequency: 30 MHz

Simulation Results
§ Traditional adaptive clocking: uncompensated voltage noise = 111 mV
§ Fine-grained GALS adaptive clocking: uncompensated voltage noise = 33 mV

Summary
§ Model and analyze power supply noise tolerance
  § Traditional adaptive clocking
  § Fine-grained GALS adaptive clocking
§ Effects of clock-tree insertion delay, spatial workload variations
  § UVN savings of ~78 mV
  § Equivalent to power saving of ~15% for same performance (@ 1 V)
§ Overheads
  § Myriad local clocks
  § Good candidates are digitally-controlled / ring oscillators
  § Only a few mW of power (<1%)
§ Future work
  § Overall savings dependent on the GALS partition size
  § Account for domain crossing, clocking overhead in model
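
The ~78 mV and ~15% figures can be related with a back-of-the-envelope check. The slides do not say how the power saving was derived; the sketch below assumes that the whole UVN reduction is removed from the 1 V supply and that power at a fixed frequency scales with the square of the supply voltage. The variable names are illustrative.

```python
UVN_TRADITIONAL_MV = 111.0   # traditional adaptive clocking
UVN_GALS_MV = 33.0           # fine-grained GALS adaptive clocking
V_NOMINAL = 1.0              # supply voltage in volts

margin_saved_v = (UVN_TRADITIONAL_MV - UVN_GALS_MV) / 1e3
v_reduced = V_NOMINAL - margin_saved_v

# Assumption: power is proportional to V^2 at the same frequency.
power_ratio = (v_reduced / V_NOMINAL) ** 2
saving_pct = (1.0 - power_ratio) * 100.0

print(f"margin saved : {margin_saved_v * 1e3:.0f} mV")
print(f"power saving : {saving_pct:.1f} %")
```

Under these assumptions the result lands close to the ~15% quoted on the slide.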

References
1. A. Grenat, S. Pant, R. Rachala, and S. Naffziger, "5.6 Adaptive clocking system for improved power efficiency in a 28nm x86-64 microprocessor," in Solid-State Circuits Conference Digest of Technical Papers (ISSCC), 2014 IEEE International, pp. 106-107, 9-13 Feb. 2014.
2. N. Kurd, P. Mosalikanti, M. Neidengard, J. Douglas, and R. Kumar, "Next Generation Intel® Core™ Micro-Architecture (Nehalem) Clocking," in Solid-State Circuits, IEEE Journal of, vol. 44, no. 4, pp. 1121-1129, April 2009.
3. R. Zhang, K. Wang, B. H. Meyer, M. R. Stan, and K. Skadron, "Architecture implications of pads as a scarce resource," in Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on, pp. 373-384, 14-18 June 2014.
Thank You!