EE 587 SoC Design & Test
Power & Low Power Design
Physical Design Methodologies
Metric 1 : Power
• Ref. 5.9 of HJS
• If we improve a design's power but slow down the circuit, the result might not be acceptable
• Comparing the power of two designs might be misleading
• The lower-power design might just be slower
Metric 2 : Energy / Operation
• Rather than looking at power, look at the total energy needed to complete some operation. This fixes the obvious problem with the power metric, since changing the operating frequency does not change the answer
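A minimal sketch of why energy per operation is the fairer metric, assuming the usual dynamic-power relation P = C·Vdd²·f and one operation per clock cycle. The capacitance, voltage, and frequencies are made-up illustration values, not taken from the slides.

```python
def dynamic_power(c_farads, vdd, freq_hz):
    """Average switching power, assuming P = C * Vdd^2 * f."""
    return c_farads * vdd ** 2 * freq_hz


def energy_per_operation(c_farads, vdd, freq_hz):
    """Energy per operation, assuming one operation per clock cycle."""
    return dynamic_power(c_farads, vdd, freq_hz) / freq_hz


# Hypothetical design: 10 pF switched per cycle at 1.2 V.
C, VDD = 10e-12, 1.2
fast = (dynamic_power(C, VDD, 1e9), energy_per_operation(C, VDD, 1e9))
slow = (dynamic_power(C, VDD, 5e8), energy_per_operation(C, VDD, 5e8))

print("fast: power = %.3e W, energy/op = %.3e J" % fast)
print("slow: power = %.3e W, energy/op = %.3e J" % slow)
```

Halving the clock halves the reported power, but the energy per operation is unchanged, which is the point of Metric 2.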
Metric 3 : EDP (Energy-Delay Product)
Energy vs. Delay
Technology Optimization
• Energy per transition is proportional to $V_{dd}^2$
• When the supply voltage approaches the threshold voltage, delay increases significantly
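The two bullets above can be illustrated numerically. This is a sketch only: the delay expression is an assumed simple long-channel model, t = k·C·Vdd/(Vdd − Vt)², which is not given in the slides, and the load capacitance, threshold voltage, and constant `k` are hypothetical, so delay and EDP come out in arbitrary units.

```python
def energy_per_transition(c_load, vdd):
    """Switching energy, proportional to Vdd^2 (E = C * Vdd^2)."""
    return c_load * vdd ** 2


def gate_delay(c_load, vdd, vt, k=1.0):
    """Assumed delay model: t = k * C * Vdd / (Vdd - Vt)^2 (arbitrary units)."""
    if vdd <= vt:
        raise ValueError("model only valid for Vdd > Vt")
    return k * c_load * vdd / (vdd - vt) ** 2


C_LOAD, VT = 1e-15, 0.4  # hypothetical values
for vdd in (1.2, 0.9, 0.6, 0.5):
    e = energy_per_transition(C_LOAD, vdd)
    t = gate_delay(C_LOAD, vdd, VT)
    # Energy-delay product: penalizes designs that save energy only by being slow.
    print(f"Vdd={vdd:.1f} V  E={e:.2e}  delay={t:.2e}  EDP={e * t:.2e}")
```

Under these assumptions the energy falls quadratically with Vdd, while the delay grows sharply once Vdd comes within a few hundred millivolts of Vt.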
Technology Optimization
• Modification of the threshold voltage
• Reducing the threshold voltage allows the supply voltage to be reduced, but the benefit is offset by an increase in leakage current
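A rough sketch of the leakage side of this trade-off. It assumes subthreshold leakage grows by one decade for every fixed amount of threshold reduction; the 90 mV/decade swing and the threshold values are assumptions for illustration, not values from the slides.

```python
def relative_leakage(vt, vt_ref, swing_mv_per_decade=90.0):
    """Leakage relative to a device with threshold vt_ref.

    Assumes subthreshold current changes by 10x for every
    `swing_mv_per_decade` millivolts of threshold shift.
    """
    delta_mv = (vt_ref - vt) * 1000.0
    return 10.0 ** (delta_mv / swing_mv_per_decade)


for vt in (0.45, 0.36, 0.27):
    print(f"Vt={vt:.2f} V -> leakage x{relative_leakage(vt, 0.45):.1f}")
```

In this model each 90 mV of threshold reduction costs roughly a tenfold increase in leakage, which is why the threshold cannot be lowered freely.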
Transistor Sizing
• Optimum transistor sizing
• The first stage is driving the gate capacitance of the second and the parasitic capacitance
• The input gate capacitance of both stages is given by $N C_{ref}$, where $C_{ref}$ represents the gate capacitance of a MOS device with the smallest allowable (W/L)
Transistor Sizing
• When there is no parasitic capacitance contribution (i.e., α = 0), the energy increases linearly with N, and using devices with the smallest (W/L) ratios gives the lowest power.
• At high values of α, when parasitic capacitances begin to dominate over the gate capacitances, the power first decreases with increasing device size and then starts to increase, resulting in an optimal value for N.
• The initial decrease in supply voltage made possible by the reduction in delay more than compensates for the increase in capacitance due to increasing N.
• After some point the increase in capacitance dominates the achievable reduction in voltage, since the incremental speed increase from transistor sizing is very small.
• Minimum-sized devices should be used when the total load capacitance is not dominated by the interconnect.
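The behaviour described above can be reproduced with a small numerical sketch. Everything here is an assumed model rather than the slides' own formula: α is taken as Cp/Cref, the stage delay is modelled as t ∝ ((α + N)/N)·V/(V − Vt)², and for each N the supply is lowered until the delay matches the N = 1 delay at a reference supply. The reference supply (3.0 V) and threshold (0.7 V) are hypothetical.

```python
def required_vdd(n, alpha, vref=3.0, vt=0.7):
    """Supply that keeps the stage delay equal to the N = 1 delay at vref.

    Assumed delay model: t ~ (alpha + N) / N * V / (V - Vt)^2,
    with alpha = Cp / Cref (parasitic-to-gate capacitance ratio).
    """
    def f(v):
        return v / (v - vt) ** 2  # strictly decreasing for v > vt

    target = (alpha + 1.0) * f(vref) * n / (alpha + n)
    lo, hi = vt + 1e-9, vref
    for _ in range(100):  # bisection on the monotonic function f
        mid = 0.5 * (lo + hi)
        if f(mid) > target:
            lo = mid  # too slow at this supply: raise the voltage
        else:
            hi = mid
    return hi


def normalized_energy(n, alpha, vref=3.0, vt=0.7):
    """Switched energy (alpha + N) * Vdd^2, in units of Cref * V^2."""
    return (alpha + n) * required_vdd(n, alpha, vref, vt) ** 2


def best_size(alpha, sizes=range(1, 51)):
    """Size factor N with the lowest energy at constant delay."""
    return min(sizes, key=lambda n: normalized_energy(n, alpha))


print("alpha = 0  -> best N =", best_size(0.0))
print("alpha = 20 -> best N =", best_size(20.0))
```

With α = 0 the minimum-size device (N = 1) wins; with a large α the minimum moves to an intermediate N, matching the qualitative picture in the bullets.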
Power Dissipation in Interconnects
• In the deep-submicron era, interconnect wires (and the associated driver and receiver circuits) are responsible for an ever increasing fraction of the energy consumption of an integrated circuit.
• Most of this increase is due to global wires, such as busses and clock and timing signals.
• More than 90% of the power dissipation of traditional FPGA components (over a wide range of applications) is due to the interconnect
• For gate array and cell library based designs it has been found that the power consumption of wires and clock signals can be up to 40% and 50% of the total on-chip power consumption respectively.
Energy Metric
$E_{dyn} = (C_w + C_L) V_{DD} V_{swing}$
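As a quick check of this metric, the sketch below evaluates E_dyn = (C_w + C_L)·V_DD·V_swing for a full-swing and a reduced-swing wire. The capacitance and voltage values are hypothetical.

```python
def dynamic_energy(c_wire, c_load, vdd, v_swing):
    """E_dyn = (C_w + C_L) * V_DD * V_swing for one charging transition."""
    return (c_wire + c_load) * vdd * v_swing


# Hypothetical wire: 1 pF of wire capacitance, 0.2 pF of load, 2.0 V supply.
full = dynamic_energy(1e-12, 0.2e-12, 2.0, 2.0)  # full swing
low = dynamic_energy(1e-12, 0.2e-12, 2.0, 0.5)   # reduced swing, same supply

print(f"full swing: {full:.2e} J, low swing: {low:.2e} J, ratio: {low / full:.2f}")
```

With the supply unchanged, the energy scales linearly with the swing, so a quarter of the swing costs a quarter of the energy.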
Low-swing Circuits
Conventional Level Converter
• Extra power rail
• Special low-Vt device needed
Dynamically-Enabled Drivers
• The basic idea is to control the charging/discharging time of the drivers so that a desirable swing on the interconnect is obtained.
• Wire is floating when the driver is disabled
[Figure: dynamically-enabled driver circuit. Labels: in, out, EN, EN2, PRE, REF, SA, VDD, CL]
Low Swing Bus
• Power dissipated in an n-bit bus
$P = n f C_w V_{DD}^2$
• Increasing the number of switching bits n causes a proportional increase in power dissipation
Low Swing Bus
[Figure: low-swing bus with dummy ground. Labels: bus wires 1 to n, nCw, Cw, Dummy Ground]
• The voltage swing can be reduced by using an additional bus wire, called the dummy ground
• This dummy ground is initially discharged to the real ground level and then immediately isolated from the ground.
• The charge of bus wiring capacitance is discharged to the dummy ground instead of the real ground.
• When n bits of the bus signals switch from "1" to "0", the voltage swing is reduced to
$V_{swing} = \frac{V_{dd}}{n + 1}$
Low Swing Bus
• The bus power dissipation required to switch n bits of the bus is given as
$P = n f C_w V_{swing} V_{dd} = \frac{n}{n + 1} f C_w V_{dd}^2$
• The voltage swing is further reduced as the number of switching bits increases
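A short sketch comparing the conventional bus power, P = n·f·C_w·V_dd², with the dummy-ground scheme, where V_swing = V_dd/(n + 1). The frequency, wire capacitance, and supply voltage are hypothetical values chosen for illustration.

```python
def conventional_bus_power(n, freq, c_wire, vdd):
    """P = n * f * C_w * V_dd^2 for n switching bits."""
    return n * freq * c_wire * vdd ** 2


def dummy_ground_swing(n, vdd):
    """Charge sharing between n wires and one dummy ground: V_dd / (n + 1)."""
    return vdd / (n + 1)


def dummy_ground_bus_power(n, freq, c_wire, vdd):
    """P = n * f * C_w * V_swing * V_dd = n / (n + 1) * f * C_w * V_dd^2."""
    return n * freq * c_wire * dummy_ground_swing(n, vdd) * vdd


F, CW, VDD = 100e6, 1e-12, 3.3  # hypothetical: 100 MHz, 1 pF per wire, 3.3 V
for n in (1, 8, 32):
    conv = conventional_bus_power(n, F, CW, VDD)
    low = dummy_ground_bus_power(n, F, CW, VDD)
    print(f"n={n:2d}  swing={dummy_ground_swing(n, VDD):.3f} V  "
          f"conventional={conv:.2e} W  dummy ground={low:.2e} W")
```

The conventional power grows in proportion to n, while the dummy-ground power stays below f·C_w·V_dd² for any n because the swing shrinks as more bits switch.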
SSDLC
• Symmetric Source-Follower Driver with Level Converter
• The driver limits the interconnect swing from Vtn to Vdd-Vtn
• Assume that node in2 goes from low to high, i.e., from Vtn to Vdd-Vtn.
• Initially, node A sits at Vtn and node B sits at Ground.
• During the transition period, with both N3 and P3 conducting, A and B rise to Vdd-Vtn
• Consequently, N2 is turned on, and out goes low. The feedback transistor P1 pulls A further up to Vdd to cut off P2 completely. in2 and B stay at Vdd-Vtn.
[Figure: SSDLC circuit. Labels: transistors P1, P2, P3, N1, N2, N3; nodes in, in2, A, B, out; load CL; supply VDD]
Level Converter with Low-Vt Device
$\frac{E_{new}}{E_{full}} = \left( \frac{REF}{V_{dd}} \right)^2$
Gated Clocks
[Figure: gated-clock comparator. Labels: Logic Block; MSB REG; REG for bits 0 to N-2 (two instances); REG; CLK; MSB comparator (A>B); comparator (A>B) for bits 0-(N-1), conditionally switched; gated clock]
Low Power Through Circuit Design
• "Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic" by Zimmermann and Fichtner
• Power savings through proper choice of logic styles
– Switching capacitance
– Transition activity
– Short-circuit currents
• Power dissipation of various logic styles needs to be analyzed
Circuit Design Styles
• Nonclocked Logic
– CMOS, Pseudo-NMOS, Differential Cascode Voltage Switch (DCVS), Pass-Transistor
• Clocked Logic
– Domino, Differential Current Switch Logic (DCSL)
Complementary CMOS - Advantages
• Simple monotonic gates can be realized very efficiently with only a few transistors, one signal inversion level, and few circuit nodes
– Area, power, and delay are reduced
• Robustness against voltage scaling and transistor sizing
• Input signals are connected to gate inputs only
Complementary CMOS - Disadvantages
• Large PMOS transistors
– Area, power, and delay increase
• Series transistors in the output stage
– Weak output driving capability
– Delay increases
Pseudo-NMOS Logic
• Reduced logic complexity, hence lower capacitance and higher speed
• Ratioed logic, better suited for large fan-in designs
• Static-current power dissipation is high
Performance of Pseudo-nMOS
Size (W/L)p   Logic 0 voltage   Logic 0 static power   Delay (0 → 1)
4             0.693 V           564 μW                 14 ps
2             0.273 V           298 μW                 56 ps
1             0.133 V           160 μW                 123 ps
0.5           0.064 V           80 μW                  268 ps
0.25          0.031 V           41 μW                  569 ps
Source: J. M. Rabaey, A. Chandrakasan and B. Nikolić, Digital Integrated Circuits, Upper Saddle River, New Jersey: Pearson Education, 2003.
Negative Aspects of Pseudo-nMOS
• Output 0 state is ratioed logic.
• Faster gates mean higher static power.
• Low static power means slow gates.
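The speed versus static-power trade-off can be checked directly against the pseudo-nMOS performance figures quoted from Rabaey et al. in the table above. The helper name `faster_costs_more_power` is introduced here only for illustration.

```python
# (W/L)p, logic-0 voltage [V], logic-0 static power [W], 0 -> 1 delay [s]
ROWS = [
    (4.0, 0.693, 564e-6, 14e-12),
    (2.0, 0.273, 298e-6, 56e-12),
    (1.0, 0.133, 160e-6, 123e-12),
    (0.5, 0.064, 80e-6, 268e-12),
    (0.25, 0.031, 41e-6, 569e-12),
]


def faster_costs_more_power(rows):
    """True if every step towards a slower gate also lowers the static power."""
    by_delay = sorted(rows, key=lambda r: r[3])
    return all(a[2] > b[2] for a, b in zip(by_delay, by_delay[1:]))


for size, v_low, p_static, delay in ROWS:
    print(f"(W/L)p={size:<5} V_OL={v_low:.3f} V  "
          f"P_static={p_static * 1e6:.0f} uW  delay={delay * 1e12:.0f} ps")

print("faster gates always burn more static power:", faster_costs_more_power(ROWS))
```

The same rows also show the ratioed nature of the output: the wider the pMOS load, the higher the logic-0 voltage.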
DCVS Logic
• No static power dissipation
• Speed advantage of ratioed logic
• Larger area and switched capacitances
Pass-Transistor Logic Styles
• One pass-transistor network is sufficient to perform the logic operation
– Smaller number of transistors, smaller input loads
• Threshold voltage drop
– Swing restoration circuit required
• Multiplexer structure
– Dual-rail logic required
Complementary Pass-Transistor Logic (CPL)
• Small input loads
– Power and delay are reduced
• Efficient XOR and MUX implementation
• Good output drive
• Cross-coupled pull-up
• Large short-circuit current
• Substantial number of nodes
• Inefficient realization of simple gates
Double Pass-Transistor Logic (DPL)
• Both PMOS and NMOS logic networks are used in parallel
• Full swing on the output signals
• Number of transistors and number of nodes are quite high
• Substantial capacitive load
Swing Restored Pass-Transistor Logic (SRPL)
• Derived from CPL
• Output inverters are cross-coupled to a latch structure
• Swing restoration and output buffering at the same time
• Transistor sizing is difficult; poor output driving capability
• Slow switching
• Large short-circuit current
Single-Rail Pass-Transistor Logic (LEAP)
• Single NMOS networks are required
– Area, power, and delay decrease
• Swing restoration only works for $V_{dd} > V_{tn} + |V_{tp}|$
• Robustness at low voltages is not guaranteed
Comparisons between CMOS and Pass-Transistor
• Pass-transistor logic is claimed to be the low-power logic style
– All the comparisons were based on the full-adder implementation
– Not representative
• Full adders have limited importance even in arithmetic circuits
Comparisons between CMOS and PL
• CPL gives higher performance than CMOS in the case of the full-adder implementation
• For multiplexers and other monotonic gates, CMOS outperforms the other styles
• For XOR, CPL is faster, but its power-delay product is higher
• CPL provides the best performance among all pass-transistor design styles
Domino Logic
• Nonratioed logic: sizing of the pMOS transistor is not important for output levels
• Higher speed
• Only implements noninverting logic gates
• Best suited for large fan-in gates
• Switching activity is high
• Lower noise immunity
• Large clock load
Logic Activity
• Probability of 0 → 1 transition:
– Static CMOS, p0 p1 = p0(1 – p0)
– Dynamic CMOS, p0
• Example: 2-input NOR gate
– Inputs: p1 = 0.5 each; output: p1 = 0.25, p0 = 0.75
– Static CMOS, $P_{dyn} = 0.1875 \, C_L V_{DD}^2 f_{CK}$
– Dynamic CMOS, $P_{dyn} = 0.75 \, C_L V_{DD}^2 f_{CK}$
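The NOR example can be reproduced in a few lines. The sketch assumes statistically independent inputs, which the slide's numbers imply but do not state; the function names are illustrative.

```python
def nor2_output_p0(p1_a, p1_b):
    """P(output = 0) of a 2-input NOR, assuming independent inputs."""
    p_out1 = (1 - p1_a) * (1 - p1_b)  # output is 1 only if both inputs are 0
    return 1 - p_out1


def static_activity(p0):
    """0 -> 1 transition probability for static CMOS: p0 * p1 = p0 * (1 - p0)."""
    return p0 * (1 - p0)


def dynamic_activity(p0):
    """0 -> 1 transition probability for dynamic CMOS: p0."""
    return p0


p0 = nor2_output_p0(0.5, 0.5)
print("output p0:", p0)
print("static CMOS activity factor:", static_activity(p0))
print("dynamic CMOS activity factor:", dynamic_activity(p0))
```

Multiplying each activity factor by C_L·V_DD²·f_CK gives the two P_dyn expressions on the slide; for this gate the dynamic implementation switches four times as often as the static one.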
Selecting a Logic Style
• Static CMOS: most reliable and predictable, reasonable in power and speed, voltage scaling and device sizing are well understood.
• Pass-transistor logic: beneficial for multiplexer and XOR dominated circuits like adders, etc.
• For large fan-in gates, static CMOS is inefficient; a choice can be made between pseudo-nMOS, dynamic CMOS, and domino CMOS.