Volume 1, Issue 2, 2007
Novel Energy-Efficient Leakage Current Minimization Techniques for CMOS VLSI Circuits
Preetham Lakshmikanthan, Graduate Fellow, EECS Department, L.C. Smith College of Engineering and Computer Science, Syracuse University, Syracuse, New York, USA. E-mail: [email protected]
Abstract
Leakage power loss is a major concern in deep-submicron technologies. High-performance processors and servers consume enormous amounts of operating power. For portable devices that have burst-mode
type integrated circuits, it is acceptable to have leakage during the active mode. However, during the idle state it is extremely wasteful to have leakage, as power is unnecessarily consumed with no useful work being done. Efficient leakage control mechanisms are crucial for saving power.
In this research, we propose novel leakage current minimization techniques for CMOS VLSI circuits. A combination of high-threshold and standard-threshold sleep transistors embedded within the CMOS
topology was used in voltage balancing of the Pull-Up Network (PUN) as well as the Pull-Down Network (PDN), thereby shutting them off and minimizing leakage loss. An ultra-low power standard cell library which uses this technique that achieves cancellation of leakage effects in both the PUN and PDN for
CMOS circuits has been characterized for area, delay and power. A signal probability based self-controller was designed for leakage power reduction. It is the core of this work and sequences the operation of these sleep-embedded cells in any VLSI circuit. Since signal probabilities are used to determine the mode of
operation of these cells, there is no need for any extra external circuitry for this purpose. The ultra-low power standard library consists of 8 combinational and 2 sequential cells.
Experimental results show significant leakage savings (an average of 20.7X) in CMOS circuits employing this sleep-circuitry when compared to standard CMOS circuits. A methodology to integrate the ultra low-power library and the self-controller into the low-power synthesis framework is also presented as part of
this research. Comparison of our technique with other well-established leakage reduction techniques shows significant leakage savings of the former over the latter, with comparable area and delay performance degradation. Large leakage savings were observed even at higher temperatures. An analysis
of these sleep-embedded circuits showed a negligible 0.42% increase in dynamic power dissipation. Our technique was also applied to the Differential Cascode Voltage Switch Logic (DCVSL) class of circuits. An order of magnitude of leakage savings was observed, thereby demonstrating its effectiveness.
Copyright © 2007 Preetham Lakshmikanthan
All Rights Reserved
Table of Contents
List of Figures
List of Tables
Chapter 1: Introduction
1.1 Sources of Power Dissipation
1.2 Motivation for Leakage Control Mechanisms
1.3 Review of Prior Techniques for Leakage Reduction
1.4 Dissertation Organization
Chapter 2: Low-Power Synthesis Framework
2.1 Framework Overview
2.2 Low-Power Techniques at Different Abstraction Layers
2.3 Contributions of This Dissertation
Chapter 3: VSDCAD Ultra Low-Power Standard Cell Library
3.1 VSDCAD Sleep-Circuitry Embedded CMOS Cells
3.2 Leakage Power Calculation with a CMOS OR2 Circuit Example
3.3 Leakage Savings Compared to the Power-Gating Methodology
3.4 Characterized Low-Power Combinational Cells
3.5 Characterized Low-Power Sequential Cells
3.6 Low-Power Differential Cascode Voltage Switch Logic (DCVSL) Cells
3.7 Active Mode Leakage Loss Increase
3.8 Increase in Dynamic Power Dissipation
Chapter 4: Comparison of VSDCAD with Other Leakage Reduction Techniques
4.1 Experimental Setup
4.2 MCNC’91 VSDCAD Implementation Leakage Values
4.3 Area and Delay Comparison
4.4 Leakage Savings Comparison
Chapter 5: Additional Experimental Research
5.1 Leakage Savings after Replacement of Only Non-Critical Cells
5.2 Effects of Varying Temperature on Leakage Power
Chapter 6: Signal Probability Based Self-Controller
6.1 VCLEARIT Control Circuitry Embedded CMOS Gates
6.2 Gate Control Signal Calculation in Circuits
6.3 Leakage Loss Comparison
6.4 Circuit Delay Comparison
Chapter 7: Dynamic Power Study
7.1 Standard Circuit Dynamic Power
7.2 LECTOR Circuit Dynamic Power
7.3 VCLEARIT Circuit Dynamic Power
Chapter 8: Conclusions
8.1 Summary of Contributions
8.2 Scope for Future Work
Appendix A: Sleep-Embedded Master-Slave POSX DFF Schematic
Appendix B: Automated Gate Control Signal Calculation - Parser Output
Bibliography
List of Figures
1.1 Projected Subthreshold Leakage Power [22]
1.2 LECTOR CMOS Gate [33]
2.1 Overview of Synthesis Framework for Low-Power Design
3.1 Block Diagram - Generic VSDCAD CMOS Circuit
3.2 Sleep-Embedded Cascaded OR2 Gate Schematic
3.3 Output Waveforms for Sleep-Embedded OR2 Gate
3.4 Block Diagram - Generic Power-Gated CMOS Circuit
3.5 Output Waveforms Showing Functioning of the VSDCAD POSX Master-Slave D Flip-Flop in Standard and Sleep Modes of Operation
3.6 Block Diagram - Generic Sleep-Embedded DCVSL Circuit
4.1 Leakage Power Dissipation Comparison
5.1 Temperature Effects On Leakage Power - Standard Mode
5.2 Temperature Effects On Leakage Power - Standby (Sleep) Mode
5.3 Leakage Power Savings at Higher Temperatures
6.1 VCLEARIT CMOS Gate (AND, NAND) with Control Value 0
6.2 VCLEARIT CMOS Gate (OR, NOR) with Control Value 1
6.3 Example illustrating Gate Control Signal Calculation
A.1 Block Diagram - VSDCAD Master-Slave Positive Edged D Flip-Flop
A.2 Schematic - VSDCAD Master-Slave Positive Edged D Flip-Flop
B.1 C17 Circuit Topology
List of Tables
1.1 Summary of Leakage-Power Reduction Techniques and Methodologies
2.1 Low-Power Techniques at Various Design Abstraction Levels [60]
3.1 Standard OR2 Gate: Average Leakage Power Loss = 152.34 pW
3.2 Leakage Comparison - VSDCAD Circuit vs. Power-Gated Circuit
3.3 Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
3.4 Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)
3.5 Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
3.6 Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)
3.7 DCVSL Cell Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
3.8 Active Mode Leakage Loss - Standard CMOS Circuit vs. Sleep-Embedded CMOS Circuit
3.9 Dynamic Power Dissipation - Standard Circuits vs. VSDCAD Sleep-Embedded Circuits
4.1 PMOS Supply and Threshold Voltage Values for Various BPTM Models
4.2 NMOS Supply and Threshold Voltage Values for Various BPTM Models
4.3 VSDCAD Leakage Values for MCNC’91 Benchmarks for Various Deep-Submicron Technologies
4.4 Leakage Power and Delay Comparison for Two-Input Nand Gate
4.5 Experimental Results for MCNC’91 Benchmarks (70 nm BPTM Process, Supply Voltage = 1V)
5.1 Perf. Chars. of an 8-bit Standard Ripple Carry Adder
5.2 Perf. Chars. of an 8-bit Sleep-Embedded Ripple Carry Adder
6.1 Leakage Power Comparison @ Temperature = 27°C (TSMC’s 180 nm Implementation)
6.2 Leakage Power Comparison @ Temperature = 27°C (BPTM’s 100 nm Implementation)
6.3 Circuit Delay Comparison (TSMC’s 180 nm Implementation)
7.1 Standard Circuit Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)
7.2 LECTOR Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)
7.3 VCLEARIT Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)
Acknowledgments
First and foremost, I would like to thank God Almighty for giving me the strength and
patience to realize my dream of getting a Ph.D degree.
Next, I would like to thank Dr. Adrian Nunez, my dissertation advisor and mentor at
Syracuse University. I truly admire his perseverance, depth of knowledge and strong
dedication to students and research that has made him one of the most successful pro-
fessors ever. His mastery at any topic is amazing, but yet he is such a humble and
down-to-earth person. I’m glad that I was given the opportunity to work with him.
He brings out the best in his students and I’d like to thank him for all the support,
encouragement and guidance given to me during my graduate years. Any student
should consider himself or herself extremely fortunate to find a gem of an advisor like
Dr. Nunez. Thanks again for everything, Adrian - my friend, philosopher and guide.
Next, I would like to acknowledge Prof. N. Venkateswaran for motivating me and guid-
ing me through my undergrad years. He has always had the confidence that I could
get my Ph.D some day. I have not believed as much in myself as Waran sir did in me.
He lives and breathes for his students. What I am today is all because of him. Thank
you so much, Sir. I will never forget all that you have done for me.
Next, my heartfelt thanks go to my Ph.D committee members: Dr. Shobha Bhatia, Dr.
Ehat Ercanli, Dr. Can Isik, Dr. Srinivas Katkoori, Dr. Nazanin Mansouri and Dr. Fred
Schlereth. All of you have been like co-advisors to me and have helped me a lot with my
dissertation right from its inception. I’m extremely proud to have such wonderful and
knowledgeable people like yourselves serving on my dissertation committee. I’m really
grateful for all your thoughtful insights and suggestions in helping me get exceptional
research done towards my doctoral degree. Thank you, Professors.
How can I forget my dissertation “preview” committee, Mr. Henry Jankiewicz of the
Graduate Editing Center? Henry did a great job of refining the crappy initial drafts
of my dissertation into the awesome manuscript that it is today. Thanks so much for
your time, effort and patience, Henry.
Next, I’d like to thank all my VSDCAD lab mates: Chandy, Subbu, Sharlet, Amit, Yo-
gesh, Lu, Siddharth, Sameer, Shweta, Deepak, Neema, Dipti, Mayank, Chetan, Ab-
hishek, Karan, Vikram, Pradyuman and Payal for all the good times at the lab. Now
onto the non-VSDCAD folks: S.K, Sashi, Shantanu, Ganji, Ashok, Satish, Smita, Ghan-
shyam, Ravikumar, Marudhu, Anand Natarajan, Murali, Rosanne, Karen, Maureen,
Roni, Sally, Parija, Krishnan, Ganesh, Karthick Jayaraman, Rohan - Netto and Fer-
nandes, Tanu, Murali, Vai, Premal, Ajay Brar, Anand Chandrashekar, Anirudha Kr-
ishna, Gulru, Aravind, Srinath, Bharath, Rahul, Harish, Shyam, Navaneeth, Jimmy,
Tarun, Roma, Salil, Santosh Singh, Young, Priyank, Deniz, Shivani, Srilatha, Venkat
Dharmarajan, Vijay Appadurai, Vishal Chugh, Vishal Kapashi and Yan Zhang to
name a few. Thanks so much to all of you for the fun, frolic and great memories here
at S.U.
Finally, and above everyone else, I would like to thank My Family for standing by me
through all the joys and sorrows that life had to offer. My heartfelt thanks and life-long
gratitude go to my Dearest Mother, Mrs. Mala Kanthan and my Loving Father, Major
K.L. Kanthan for all the love and affection that they have showered upon both their
children. You both are the Best and Most Loving Parents that anyone can hope to have
in this entire universe. If not for your constant support, encouragement and sacrifices
I would never have made it to this stage in life. I love you so much and am proud to be
your son. I still have so much to learn from you and pray that I am re-born as your son
in future lives also. I thank my Dear Brother, Gautham for being such a great sibling
and putting up with my wonky behavior throughout his life (I’m not done as yet, my
dear fellow!). Being the studious type, he continues his quest for knowledge and is
currently pursuing his M.B.A. His undying dedication to studies has always been my
inspiration to try and study something at least. I’d like to thank My Dearest Rathi
Aunty, Dida, Patti and Both Thatha’s who are watching over us and helping my en-
tire family from Up There. I cannot thank Kona Aunty, Sharma Uncle, Nita, Shoba,
Krishna and Vinay enough for suggesting that I pursue a Ph.D at S.U and also for pro-
viding me a home away from home. I am indebted to all of you for life. Last, but not the
least I’d like to thank my Dearest Wife, Raji for bringing a new meaning and purpose
to my otherwise dull life. She was instrumental in me finishing up my dissertation
writeup in a record three weeks' time and was my pillar of strength during those last few
weeks leading up to my final defense. Thanks so much for your unending support and
love, babe!
I’d like to thank all those people who have helped me out in some way or the other, and
whose names I’ve inadvertently left out here. Thank you so much, everyone.
To My Loving Parents, Brother, Raji
and
My Dearest Rathi Aunty
Chapter 1
Introduction
With rapid progress in semiconductor technology, feature sizes have shrunk through
the use of deep-submicron processes, thereby enabling extremely complex function-
ality to be integrated on a single chip. Battery-powered electronic systems form the
backbone of the growing market of mobile hand-held devices used all over the world
today. In order to maximize battery life, the tremendous computational capacity of
portable devices such as notebook computers, personal communication devices (cell
phones, pocket PCs, PDAs), hearing aids and implantable pacemakers has to be real-
ized with very low power requirements. With miniaturization and the growing trend
towards wireless communication, power dissipation has become a very critical design
metric. The longer the battery lasts, the better.
Even with the scaling down of the supply voltage, power dissipation has not dimin-
ished. The magnitude of power per unit area has kept growing, and the accompanying
problem of heat removal and power dissipation has kept getting worse. Innovative
cooling and packaging strategies [61] are of little help for the rapidly increasing power
consumption of present day chips. Also, the cost associated with packaging and cool-
ing such devices is becoming prohibitive. In addition to cost, the issue of reliability
is a major concern. Every 10°C increase in operating temperature roughly doubles
a component’s failure rate [66]. Minimizing power consumption is currently an ex-
tremely challenging area of research, especially with on-chip devices doubling every
two years [6].
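
As a rough illustration of the reliability rule of thumb quoted above, the following Python sketch computes the relative failure rate for a given rise in operating temperature. The function name and the strict exponential form are assumptions made here for illustration; the text only states that the failure rate roughly doubles for every 10°C increase.

```python
def failure_rate_multiplier(delta_t_celsius: float, doubling_step: float = 10.0) -> float:
    """Relative failure rate after a temperature rise of `delta_t_celsius`.

    Assumes (as a simplification) that the failure rate doubles exactly
    once per `doubling_step` degrees Celsius.
    """
    return 2.0 ** (delta_t_celsius / doubling_step)


if __name__ == "__main__":
    for rise in (0, 10, 20, 30):
        print(f"+{rise} C -> {failure_rate_multiplier(rise):.0f}x failure rate")
```

Under this simplified model, a chip running 30°C hotter would see about eight times the failure rate, which is why power-induced heating is treated as a reliability problem and not only a cost problem.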
Design styles [62] play a key role in determining the power dissipation, performance
and supply/threshold scalability of a circuit. Dynamic circuits achieve high levels of
performance (speed) and utilize less area. However, they require two operation phases:
pre-charging and evaluation. They cannot be scaled easily due to their low noise im-
munity, and require keeper circuits to restore logic levels. On the other hand, fully
Complementary Metal Oxide Semiconductor (CMOS) styles are usually robust, dissipate
low power, have fully restored logic levels, and are easily scalable. In general,
they require more area (2X transistors for an X-input gate, compared to X+2 in the case of
dynamic circuits).
1.1 Sources of Power Dissipation
The power consumed by CMOS circuits can be classified into two categories:
• Dynamic Power Dissipation: For a fraction of an instant during the operation
of a circuit, both the PMOS and NMOS devices are “on” simultaneously. The du-
ration of the interval depends on the input and output transition (rise and fall)
times. During this time, a path exists between Vdd and Gnd and a short-circuit
current flows. However, this is not the dominant factor in dynamic power dissi-
pation. The major component of dynamic power dissipation arises from transient
switching behavior of the nodes. Signals in CMOS devices transition back and
forth between the two logic levels, resulting in the charging and discharging of
parasitic capacitances in the circuit. Dynamic power dissipation is proportional
to the square of the supply voltage. Every time a capacitive node (CL) switches
from Vdd to Gnd (and back), energy of CL·Vdd² is consumed. In deep-submicron
processes, supply voltages and threshold voltages for MOS transistors are greatly
reduced. This, to an extent, reduces the dynamic power dissipation.
• Static Power Dissipation: This is the power dissipation due to leakage currents
which flow through a transistor when no transitions occur and the transistor
is in a steady state. Leakage power depends on gate length and oxide
thickness. It varies exponentially with threshold voltage and other parame-
ters. Reduction of supply voltages and threshold voltages for MOS transistors,
which helps to reduce dynamic power dissipation, becomes disadvantageous in
this case. The subthreshold leakage current increases exponentially, thereby in-
creasing static power dissipation. The main components of leakage current [29]
in a MOS transistor are:
– Reverse-biased junction leakage current: Junction leakage occurs from the
source or drain to the substrate through the reverse-biased diodes when the
transistor is off.
– Gate induced drain leakage: This is caused due to the high field effect in the
drain junction of MOS transistors. It is made worse by high drain to body
voltage and high drain to gate voltage.
– Gate direct tunneling leakage: Gate leakage flows from the gate through the
oxide insulation layer to the substrate. Direct tunneling current is signifi-
cant for low oxide thickness. The gate leakage of a PMOS device is typically
one order of magnitude smaller than that of an NMOS device with identical
Tox and Vdd.
– Subthreshold (weak inversion) leakage: This is the drain to source current
of a transistor operating in the weak inversion region, when gate to source
voltage (VGS) is below the transistor threshold voltage (Vth). Equation 1.1
approximates the subthreshold leakage current [64] of a MOSFET.
\[
I_{sub} = A \, e^{\theta} \left[ 1 - e^{\left( \frac{-q V_{DS}}{kT} \right)} \right] \tag{1.1}
\]
where
\[
A = \mu_0 C_{ox} \frac{W}{L} \left( \frac{kT}{q} \right)^{2} e^{1.8}
\]
and
\[
\theta = \left[ \frac{q}{n' kT} \left( V_{GS} - V_{th0} - \gamma' V_{s} + \eta V_{DS} \right) \right]
\]
µ0 is the carrier mobility; Cox is the gate oxide capacitance per unit area; W
and L denote the transistor width and length; kT/q is the thermal voltage at
temperature T; n′ is the subthreshold swing coefficient of the transistor; VGS
is the gate to source voltage of the transistor; Vth0 is the zero-bias threshold
voltage; γ′Vs is the body effect, where γ′ is the linearized body effect coefficient;
and η is the Drain Induced Barrier Lowering (DIBL) coefficient. VDS is
the drain to source voltage of the transistor.
\[
P_{leak} = \sum_{i} I_{sub_i} \, V_{DS_i} \tag{1.2}
\]
Equation 1.2 gives the total leakage power for all the transistors.
Minimization of subthreshold leakage is the primary goal in this research work.
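
To make Equations 1.1 and 1.2 concrete, the following Python sketch evaluates the subthreshold current of a single off transistor and sums the leakage power over a list of devices. All device parameters in `EXAMPLE_DEVICE` are illustrative placeholders chosen here, not values taken from any process technology discussed in this work, and the function names are likewise assumptions.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def subthreshold_current(vgs, vds, vs, *, mu0, cox, w, l, vth0,
                         n_prime, gamma_prime, eta, temp_k=300.15):
    """Equation 1.1: subthreshold leakage current of one MOSFET, in amperes."""
    v_thermal = K_BOLTZMANN * temp_k / Q_ELECTRON          # kT/q
    a = mu0 * cox * (w / l) * v_thermal ** 2 * math.exp(1.8)
    theta = (vgs - vth0 - gamma_prime * vs + eta * vds) / (n_prime * v_thermal)
    return a * math.exp(theta) * (1.0 - math.exp(-vds / v_thermal))


def leakage_power(devices):
    """Equation 1.2: sum of I_sub,i * V_DS,i over (current, vds) pairs."""
    return sum(i_sub * vds for i_sub, vds in devices)


# Hypothetical device parameters (SI units), for illustration only.
EXAMPLE_DEVICE = dict(mu0=0.04, cox=8.6e-3, w=0.5e-6, l=0.18e-6,
                      vth0=0.45, n_prime=1.5, gamma_prime=0.2, eta=0.05)

if __name__ == "__main__":
    vdd = 1.8
    i_off = subthreshold_current(0.0, vdd, 0.0, **EXAMPLE_DEVICE)
    print(f"I_sub of one off device: {i_off:.3e} A")
    print(f"P_leak of two such devices: {leakage_power([(i_off, vdd)] * 2):.3e} W")
```

Because the threshold voltage sits inside the exponent θ, raising `vth0` by even a few hundred millivolts lowers the computed current by orders of magnitude, which is the property that high-threshold sleep transistors exploit.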
1.2 Motivation for Leakage Control Mechanisms
Cell phones and pocket PCs have burst-mode type integrated circuits, which for the
majority of the time are in an idle state. For such circuits, it is acceptable to have leak-
age during the active mode. However, during the idle state it is extremely wasteful
to have leakage, as power is unnecessarily consumed with no useful work being done.
Even with the present advances in power management techniques [60, 64, 75], leakage loss
is a major concern in deep-submicron technologies, as it drains the battery, even when
a circuit is completely idle.
Power dissipation of high-performance processors and servers is predicted to increase
linearly over the next decade [26]. The 2006 International Technology Roadmap for
Semiconductors [6] projects power dissipation to reach 198 Watts in the year 2008
and reach 300 Watts by the year 2018. Multi-core integrated processors [1, 5] deliver
significantly greater compute power through concurrency, offer greater system den-
sity and run at lower clock speeds, thereby reducing thermal dissipation and power
consumption to an extent. Leakage power will contribute towards the majority of the
total power consumption for such servers fabricated with deep-submicron technologies.
Figure 1.1: Projected Subthreshold Leakage Power [22]
(Plot axes: subthreshold leakage in Watts on a log scale from 0.1 to 100, versus technology: 0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm.)
Figure 1.1 shows subthreshold leakage power trends [22] in accordance with Moore’s
law. Clearly, with deep-submicron processes, chips will leak excessive amounts of
power. By the year 2020, leakage is expected to increase 32 times per device [6]. This
is a major challenge in scaling down designs, and it motivates the need for efficient
leakage control mechanisms to minimize power overheads in circuits designed with
deep-submicron technologies.
An ultra-low power standard cell library was implemented as part of this research. A
novel voltage balancing strategy using sleep transistors to reduce leakage power was
used in implementing the CMOS standard cells. Our technique significantly reduces
leakage power, with savings of 20.7X (on average) for various standard cells designed
with a 180 nm process technology. Designers and automated synthesis tools can select
components from this library to build energy-efficient circuits. A signal probability
based self-controller technique was also developed to integrate the low-power standard
cell library into the low-power synthesis framework.
1.3 Review of Prior Techniques for Leakage Reduction
Considerable research has been devoted to minimizing leakage power. Listed below
are some publications related to our work, each with its own unique features:
Duarte et al. present a survey of leakage minimization techniques in [27]. They list the
benefits and limitations of various techniques and optimizations applied at run-time.
Pedram et al. give a tutorial of various representative power minimization techniques
at the register-transfer level (RTL) in [14].
Ye et al. [74] show that the “stacking” of two off devices significantly reduces sub-
threshold leakage, compared to a single off device. These stacks are series-connected
devices between supply and ground (e.g., PMOS stack in NOR or NMOS stack in NAND
gates). Their technique enables leakage reduction during standby mode by input vec-
tor activation. It involves extensive circuit simulations to install a vector at the input
of the circuit, so as to maximize the number of PMOS or NMOS stacks with more than
one off device.
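
The stacking effect described above can be sketched numerically with the subthreshold model of Equation 1.1. The Python example below is a minimal sketch, not the method of [74]: it normalizes the prefactor A to 1, uses hypothetical NMOS parameters, and finds the intermediate node voltage of a two-transistor off stack by bisection, at the point where the currents through the upper and lower devices match.

```python
import math

V_THERMAL = 0.02585  # kT/q near 27 C, in volts


def isub(vgs, vds, vs, vth0=0.4, n=1.5, gamma=0.2, eta=0.08):
    """Equation 1.1 with A normalized to 1 (relative currents only).

    The default parameter values are hypothetical, for illustration.
    """
    theta = (vgs - vth0 - gamma * vs + eta * vds) / (n * V_THERMAL)
    return math.exp(theta) * (1.0 - math.exp(-vds / V_THERMAL))


def stack_of_two_leakage(vdd, tol=1e-12):
    """Leakage through two series NMOS devices with both gates at 0 V.

    Bisects on the intermediate node voltage vx until the current through
    the upper device equals the current through the lower device.
    Returns (stack_current, vx).
    """
    lo, hi = 0.0, vdd
    while hi - lo > tol:
        vx = 0.5 * (lo + hi)
        i_top = isub(vgs=-vx, vds=vdd - vx, vs=vx)   # source sits at vx
        i_bottom = isub(vgs=0.0, vds=vx, vs=0.0)     # source at ground
        if i_top > i_bottom:
            lo = vx   # more current flows in than out: node voltage rises
        else:
            hi = vx
    vx = 0.5 * (lo + hi)
    return isub(0.0, vx, 0.0), vx


if __name__ == "__main__":
    vdd = 1.8
    i_single = isub(0.0, vdd, 0.0)
    i_stack, vx = stack_of_two_leakage(vdd)
    print(f"intermediate node settles at {vx * 1000:.1f} mV")
    print(f"single off device leaks {i_single / i_stack:.1f}x more than the stack")
```

The small positive voltage at the intermediate node gives the upper device a negative gate to source voltage, a body effect and a reduced drain to source voltage all at once, which is why the stack leaks less than a single off device in this model.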
Chen et al. [25] performed an analysis of subthreshold leakage through a stack of n-
transistors. A genetic algorithm based technique was used to determine the bounds for
leakage power in various CMOS circuits. As part of their analysis, they determined a
set of test vectors which places corresponding circuits in the low-power standby mode.
Yang et al. [73] present an accurate macro-model for the stacking effect on leakage
power for sub-100 nm circuits.
Narendra et al. [58] present a full-chip subthreshold leakage current prediction model.
In [57], they use a stack-forcing method to reduce subthreshold leakage. This is
achieved by forcing a non-stack transistor of width ‘W’ to a series-stack of two transistors,
each of width ‘W/2’. This effective method does not affect the input load and
the switching power. However, there is a delay penalty to be incurred as a result of
this stack-forcing. Hence, this technique can be used only on devices in paths that are
non-critical.
Hanchate et al. [33] propose a technique called LECTOR for designing CMOS gates,
which cuts down leakage current by adapting the technique of effective stacking of
transistors. Experimental results obtained using leakage reduction techniques de-
scribed later in this research are compared with the LECTOR results. Figure 1.2 illus-
trates the topology of a LECTOR CMOS gate. Two Leakage Control Transistors (LCTs),
LCT1 and LCT2, are introduced between nodes N1 and N2. The gate terminal for each
LCT is controlled by the source of the other. Hence, these LCTs act as self-controlled
stacked transistors. No external control circuitry is required using the LECTOR
implementation. The introduction of LCTs increases the resistance of the path from Vdd to
Gnd, thereby reducing leakage.
Figure 1.2: LECTOR CMOS Gate [33]
(Schematic labels: inputs in 1 through in n, PUN, N1, LCT1, LCT2, N2, PDN, Vdd, Gnd, out.)
Mutoh et al. [56] were the pioneers of Multi-Threshold voltage CMOS (MTCMOS) cir-
cuits. Here, low-threshold (low-VT ) transistors that are fast and leaky are used to
implement speed-critical logic. High-threshold (high-VT ) devices that are slower, but
have low subthreshold leakage, are used as sleep transistors. Multi-threshold voltage
circuits have degraded noise immunity when compared to standard low-threshold volt-
age circuits. The sleep transistor has to be sized properly to decrease its voltage drop
when it is on. A sleep control scheme was introduced for efficient power management.
Since data retention was required in standby mode, this work was extended, and an
extra high-VT memory circuit was introduced in [65].
Wei et al. present a mixed-Vth CMOS circuit design methodology in [70].
Kao et al. [41] used MTCMOS for power-gating. A method to size sleep transistors,
based on a mutual exclusion discharge pattern principle, is described. The introduc-
tion of extra devices in series with the power supplies leads to a performance penalty.
Automated sizing of sleep transistors can be done using the technique illustrated in
[47] by Lakshmikanthan et al.
Agarwal et al. present a technique in [17] for power-gating with multiple sleep modes.
Bhunia et al. present a novel circuit technique in [21] to minimize power dissipation in
combinational circuits. This is achieved by inserting extra supply-gating transistors in
the supply to ground paths of the circuit. They assume that the sleep/wake-up signals
to control these gating transistors are generated from an external power management
unit. In the active mode, the gating transistor is on and the circuit behaves as usual.
In the standby mode, the gating transistor is turned off, thereby cutting off power to
the circuit. In [13], Abdollahi et al. consider another important objective, which is
limiting the number of sleep transistors.
Yuan et al. [76] apply Input Vector Control (IVC) techniques for leakage power re-
duction. IVC utilizes the transistor stack effect in CMOS gates by applying a Min-
imum Leakage Vector (MLV) to the primary inputs of combinational circuits during
the standby mode. The MLV problem is NP-Complete. Typically, an exhaustive circuit
simulation is performed for all input patterns, to find the pattern with the minimum
leakage current. However, this approach is not practical for large circuits. In their
work, Yuan et al. replace internal gates in their worst leakage states by other library
gates, while maintaining the correct functionality of the circuit during the active mode.
They present a divide-and-conquer approach that integrates gate replacement and an
optimal MLV searching algorithm for tree circuits.
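
The exhaustive form of the MLV search mentioned above can be sketched in a few lines of Python. This is only an illustration of the brute-force baseline, not the divide-and-conquer algorithm of [76]. The three-gate NAND2 netlist and the per-state leakage table are hypothetical values invented for this example.

```python
from itertools import product

# Hypothetical leakage (pW) of a NAND2 gate for each input state (a, b).
NAND2_LEAKAGE_PW = {(0, 0): 10.0, (0, 1): 25.0, (1, 0): 20.0, (1, 1): 60.0}

# Hypothetical netlist in topological order: (output, input_a, input_b).
NETLIST = [("n1", "a", "b"), ("n2", "b", "c"), ("out", "n1", "n2")]
PRIMARY_INPUTS = ("a", "b", "c")


def circuit_leakage(vector):
    """Total leakage (pW) of the netlist for one primary input vector."""
    values = dict(zip(PRIMARY_INPUTS, vector))
    total = 0.0
    for out, in_a, in_b in NETLIST:
        state = (values[in_a], values[in_b])
        total += NAND2_LEAKAGE_PW[state]
        values[out] = 1 - (state[0] & state[1])  # NAND logic value
    return total


def minimum_leakage_vector():
    """Try all 2^n input vectors and return the one with the least leakage."""
    return min(product((0, 1), repeat=len(PRIMARY_INPUTS)), key=circuit_leakage)


if __name__ == "__main__":
    mlv = minimum_leakage_vector()
    print("MLV:", mlv, "leakage (pW):", circuit_leakage(mlv))
```

The loop over `product((0, 1), repeat=n)` visits 2^n vectors, which is the reason the exhaustive approach stops being practical for circuits with many primary inputs.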
Johnson et al. [40] show that a particular ordering of the inputs could potentially make
use of the well-known stack effect technique to reduce leakage overheads. Since prac-
tical circuits do not consist of only a single transistor stack, a procedure to evaluate
the leakage of a CMOS circuit, given a set of logic signal inputs, is explained. They it-
eratively choose the input with the largest leakage observability and assign it a value
that results in the smallest leakage. The input combination constructed by this greedy
heuristic was taken as the MLV.
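
A greedy assignment of this kind can be sketched as follows. The sketch is an assumption-laden simplification and not the procedure of [40]: here the "leakage observability" of an input is approximated as the absolute difference in mean circuit leakage between setting that input to 0 and to 1, with the mean taken by brute force over the remaining free inputs. The netlist and leakage table are hypothetical.

```python
from itertools import product

# Hypothetical leakage (pW) of a NAND2 gate for each input state (a, b).
NAND2_LEAKAGE_PW = {(0, 0): 10.0, (0, 1): 25.0, (1, 0): 20.0, (1, 1): 60.0}
# Hypothetical netlist in topological order: (output, input_a, input_b).
NETLIST = [("n1", "a", "b"), ("n2", "b", "c"), ("out", "n1", "n2")]
INPUTS = ("a", "b", "c")


def circuit_leakage(assignment):
    """Total leakage (pW) for a complete {input_name: bit} assignment."""
    values = dict(assignment)
    total = 0.0
    for out, in_a, in_b in NETLIST:
        state = (values[in_a], values[in_b])
        total += NAND2_LEAKAGE_PW[state]
        values[out] = 1 - (state[0] & state[1])
    return total


def mean_leakage(partial):
    """Average leakage over every completion of a partial assignment."""
    free = [name for name in INPUTS if name not in partial]
    totals = [circuit_leakage({**partial, **dict(zip(free, bits))})
              for bits in product((0, 1), repeat=len(free))]
    return sum(totals) / len(totals)


def greedy_vector():
    """Repeatedly fix the most influential free input to its better value."""
    partial = {}
    while len(partial) < len(INPUTS):
        best = None  # (observability, input_name, chosen_value)
        for name in INPUTS:
            if name in partial:
                continue
            m0 = mean_leakage({**partial, name: 0})
            m1 = mean_leakage({**partial, name: 1})
            candidate = (abs(m0 - m1), name, 0 if m0 <= m1 else 1)
            if best is None or candidate[0] > best[0]:
                best = candidate
        partial[best[1]] = best[2]
    return tuple(partial[name] for name in INPUTS)


if __name__ == "__main__":
    print("greedy vector:", greedy_vector())
```

A greedy choice of this kind is not guaranteed to find the true minimum in general, and a practical flow would estimate the means from signal probabilities instead of enumerating completions as done here for clarity.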
Abdollahi et al. [12] propose a technique to directly control the value of internal nodes
to reduce leakage. They add PMOS and NMOS transistors to some of the gates in the
circuit to increase the controllability of the internal signals of the circuit and decrease
the leakage current of the gates using the “stack effect”. Boolean satisfiability (SAT)
is then used to formulate the problem, which is subsequently solved using efficient
off-the-shelf SAT-solvers [18]. More precisely, given a combinational circuit descrip-
tion, they first construct a boolean network which computes the total leakage of that
circuit. From this Leakage Computing Network (LCN), they write a set of boolean
clauses that capture the leakage current of the original circuit. A SAT-solver is then
used to find the MLV. The time complexity of the SAT solver, however, is exponential in
the worst case.
Kursun et al. evaluate the subthreshold leakage current characteristics of domino logic
circuits in [45]. They show that a discharged dynamic node is preferred for reducing
leakage current in a dual-VT circuit, whereas a charged dynamic node is better suited
for lower leakage in a low-VT circuit. In a dual-VT domino circuit with a high-VT
keeper, the keeper and output inverter have to be sized to provide noise immunity
similar to that of a low-VT domino logic circuit. The work in [44] employs these
techniques, coupled with sleep transistor switches, to place idle domino circuits in a
low leakage state. A high-VT NMOS sleep transistor is connected in parallel with the
dynamic node of domino logic circuits. In the standby mode of operation, the pull-up
transistor of the domino circuit is off, while the NMOS sleep transistor is turned on.
The dynamic node of the domino gate is discharged through the sleep transistor,
thereby significantly reducing the subthreshold leakage current.
To achieve low-power benefits without compromising performance, static and dynamic
scaling of supply voltages can be applied. Static supply-scaling is a multiple supply ap-
proach in which critical and non-critical paths are clustered and powered by higher and
lower supply voltages, respectively. Since the speed requirements of the non-critical
clusters are lower than the critical ones, the supply voltage of non-critical clusters can
be lowered without degrading performance. Hillman, in his work in [36], designs an
SoC, using clusters of components characterized at various voltage levels. Whenever
a node from a low-voltage cluster needs to drive a node of a high-voltage cluster (or
vice-versa), a level-conversion is needed at the interface. The secondary voltages may
be generated off-chip [69] or on-chip [63]. Design issues and implementation strategies
for building on-chip dc-dc voltage level-shifting circuits are presented by
Lakshmikanthan et al. in [48]. Dynamic supply scaling is much harder to implement, but
saves the cost of using two supply voltages by adapting the single supply voltage to
performance demand.
Iman et al. present POSE, a Power Optimization and Synthesis Environment [39],
for designing low-power digital circuits at the logic level. POSE provides a unified
framework for specifying and maintaining power relevant circuit information. Power
optimization techniques were developed with area-power trade-offs. Low-power opti-
mization algorithms provided in POSE are classified into three categories : Algebraic
Restructuring Techniques, Node Simplification, and Technology Mapping. Experimen-
tal results show an average reduction of power consumption by 29% at the expense
of area increase by 30% on average. The delay of the circuits increased by 4%. This
clearly shows a trade-off between area and power, while the circuit delay is not much
affected.
Abdollahi et al. [15] present a precomputation-based guarding methodology for reduc-
ing both dynamic and static power consumption in CMOS VLSI circuits. Precompu-
tation logic duplicates part of the logic by precomputing the circuit output values, one
clock cycle before they are required. It is a method in which some inputs of a circuit are
frozen, while some smaller circuit computes the output values. Unlike precomputation,
guarded logic does not require synthesis of additional logic to implement the shutdown
mechanism. It exploits the existing signals in the original circuit, and no changes
to the original combinational circuitry are needed. Guarded evaluation involves de-
termining which parts of a circuit are computing useful results and which parts are
computing results that are not used. The unnecessary portions can be shut off. If the
guarding signal itself switches frequently, the power dissipation of the switching sleep
transistors may outweigh the power saving due to guarding. Hence, in their work,
Abdollahi et al. propose a method to generate a new guard signal, based on the most
recent values of the original guarding signal.
Most of the techniques listed above are not complete for RTL synthesis of low-power
circuits. They require an external controller that sequences the working of the entire
circuit. The controller should be able to identify and differentiate between portions of a
circuit that are active (switching) and parts of the circuit that are inactive. “Sleep” sig-
nals should be generated automatically to synchronize the operation of the datapath
(design), thereby switching devices back and forth between active and standby modes
of operation. This sleep/enable controller generation is assumed to be present in most
of the prior research work.
Elkarablieh et al. [28] present a synthesis technique for reducing leakage power, based
on signal controllability chains. Local re-synthesis of a large fan-in gate into smaller
sleep-embedded gates that achieve the same functionality is suggested. The sleep sig-
nals controlling the corresponding smaller gates could be judiciously picked from the
pattern combination at the input of the original (large fan-in) gate. Signal controllabil-
ity measures shown in [30] predict how controllable the output of a circuit is. This idea
is used in their work to determine which signal should be used to place some portion
of the circuit in sleep mode. They define controllability as the length of the chain of
gates driven by a signal whose output is controlled by the value of the signal. The idea
is basically to assign sleep signals using lines with the longest controllable chains. A
mathematical model for the estimated power saving is presented.
Calhoun et al. identify sneak leakage paths and present a set of design rules in [23].
They partition the Configurable Logic Blocks (CLBs) of the target Field Programmable
Gate Array (FPGA) architecture into four sleep regions: a Look-Up Table (LUT) region,
an adder region, a flip-flop region and a control circuitry region. The configuration
bits tell each CLB how to organize its internal parts at run time. These configuration
bits also act as control signals for the sleep regions. Minimal control logic is required
for deciding when to assert the sleep signal for each local sleep region. The FPGA
architecture inherently avoids many interfacing problems for sleep regions by using
transmission gate multiplexors.
Various overheads, like the routing of the sleep control signals, increased area, and
an excess delay penalty due to repeatedly turning the circuit on and off, are observed
for external controller-based leakage reduction circuits. In this research work, all the
components needed to build low-leakage power circuits are completely integrated,
including the low-power component library and the self-controlling leakage reduction
technique.
Table 1.1 gives the classification, advantages and disadvantages of the leakage-power
reduction techniques and methodologies described previously in this section. Each
technique has its own unique features, and no technique in particular can be claimed
to be better than the others. As can be seen from the advantages and disadvantages
of each methodology, achieving low power consumption involves trading off
various performance parameters, like area, delay and throughput.
1.4 Dissertation Organization
This dissertation is organized as follows: In Chapter 2, a low-power synthesis frame-
work is presented. Enumerated in this chapter are the significant contributions of this
dissertation and how they fit into a typical synthesis environment. Chapter 3 is a de-
tailed description of the design and methodology used in the development of the ultra
low-power standard cell RTL component library at the VLSI Systems Design and CAD
(VSDCAD) Laboratory in Syracuse University. The characterization of combinational
CMOS cells is presented first, followed by the characterization of sequential circuits.
Experimental results for various classes of circuits are tabulated illustrating signifi-
cant leakage savings in all cases. Chapter 4 is a comparison of the VSDCAD technique
with other well-established leakage reduction techniques. Crucial design constraints,
such as area, delay and leakage savings, are the important factors considered for this
comparison. Chapter 5 presents additional experimental results pertinent to leakage
reduction, and includes a study of leakage effects at higher temperatures. Chapter 6
explains the most significant contribution of this dissertation–the signal probability
based VCLEARIT self-controller circuitry. The procedure to calculate the gate control
signal is explained with an example. Experimental results are presented, demonstrat-
ing the effectiveness of the self-controlled circuits. Chapter 7 discusses the effects of
adding extra leakage reduction circuitry on the dynamic power of circuits. Finally,
Chapter 8 details the conclusions drawn from this dissertation work and presents the
scope for future research.
Table 1.1: Summary of Leakage-Power Reduction Techniques and Methodologies

Methodology: “Stack” effect due to leakage control transistor insertion
  Related research publications: [25, 33, 57, 58, 74]
  Advantages: No controller to generate sleep/enable signals; no level converters for
    voltage scaling
  Disadvantages: Increased area; degraded circuit delay

Methodology: “Stack” effect due to Input Vector Control (IVC) and Minimum Leakage
  Vector method (MLV)
  Related research publications: [12, 34, 40, 76]
  Advantages: No process technology modification; no negative impact with technology
    scaling; no change in logic circuitry
  Disadvantages: Exhaustive simulation; needs modeling as boolean satisfiability
    problem, then use SAT or ILP solvers

Methodology: Sleep transistor insertion and/or MTCMOS circuits; gating the supply
  voltage
  Related research publications: [41, 44, 45, 56, 65] (sleep transistor insertion
    and/or MTCMOS circuits); [13, 21] (gating the supply voltage)
  Advantages: No exhaustive simulation; no level converters for voltage scaling; widely
    accepted and used technique among the research community
  Disadvantages: Implement sleep controller; diligently size sleep transistor; increased
    area; degraded circuit delay; state-retention problem; mixed process technology for
    circuit fabrication

Methodology: Static/Dynamic voltage scaling
  Related research publications: [24, 36, 48, 63, 69]
  Advantages: Use of mixed voltage supplies for different portions of the circuit,
    resulting in lower overall power consumption
  Disadvantages: Increased area; need for circuit partitioning; intricate tuning of
    level converters; not cost effective; degraded circuit delay
Chapter 2
Low-Power Synthesis Framework
Synthesis is the process of transforming the design from one level of abstraction to
another. CAD research has progressed from low levels of abstraction to higher levels of
abstraction through circuit, logic, register-transfer and behavioral level synthesis. The
input to a typical synthesis environment is a high-level design specification, which is
then transformed into various levels of abstraction, using High Level Synthesis (HLS),
Logic Synthesis and finally Layout Synthesis processes. Synthesis environments try
to satisfy various user-defined design constraints, like area, timing, speed, through-
put and power, to name a few. With present-day design complexities, it is extremely
difficult or impossible to satisfy all the constraints simultaneously. Hence, judicious
trade-offs among various constraints are made, and the design that best matches the
user requirements is returned.
In the previous chapter, an overview of the sources of power dissipation was given,
and the need for leakage control mechanisms and the various issues involved were
explained. In this chapter, a synthesis framework for low-power design is presented,
and the important contributions of this dissertation are listed. The primary goal of
this work is to optimize and produce low-power designs, with an emphasis on leakage
power savings, while at the same time honoring other user constraints like area and
timing.
Figure 2.1: Overview of Synthesis Framework for Low-Power Design (flow: Behavioral
Design Specification and Constraints, Behavioral Simulation, High-Level Synthesis, RTL
VHDL/Verilog, Critical Path Tracing, Slack Calculation, Replacing Non-Critical Cells
Having Enough Slack With Sleep-Embedded Cells, Controller Design For Leakage Power
Reduction, RTL Simulation, Layout Synthesis, Layout Simulation, GDSII File, Chip
Fabrication)
2.1 Framework Overview
The overview of a synthesis framework for low-power design is illustrated in
Figure 2.1. This framework is a top-down approach and consists of various synthesis
phases (HLS, Register Transfer Level (RTL), and Layout). The user specifies behavioral
designs in subsets of either Verilog [10, 68] or VHDL [11, 19]. Subsets of Verilog or
VHDL are used, since not all constructs of these languages are supported by synthesis.
The various architectural and design constraints, such as area, speed and power, are
also specified by the user. Behavioral simulation is performed to check the functional
correctness of the design.
Next, high-level synthesis is performed on the functionally correct behavioral design.
Commercial tools like Cadence [4] or Synopsys [8] are used for HLS. An RTL compo-
nent library (e.g., TSMC’s 180 nm standard cell library [9]) is used by the HLS system
to generate the RTL Verilog or VHDL code. Critical path analysis is then performed
on the generated RTL code, using Cadence tools or Synopsys Primetime, and the slack
times are calculated for all the standard cells used in the circuit.
Depending on available slack time, the standard cell gates on the non-critical paths of
the circuit are replaced with special sleep-embedded low-power cells that perform the
same functionality. These sleep-embedded cells are selected from an ultra low-power
RTL standard cell component library, which was developed as part of this research
at the VLSI Systems Design and CAD (VSDCAD) Laboratory at Syracuse University.
A combination of high-threshold and standard-threshold sleep transistors embedded
within the CMOS topology was used in voltage balancing of the pull-up network as
well as the pull-down network of CMOS circuits, thereby shutting them off and mini-
mizing leakage loss.
The gates on the critical path are unchanged components from the original standard
cell library (TSMC). Whenever non-critical cells are replaced due to availability of slack
time, a regressive search is carried out on all paths of the circuit to ensure that the
critical path has not changed. If the critical path has changed, then the replacement
procedure is backed out and this replacement process is tried on other non-critical cells
in the design. This ensures that the timing of the circuit is not affected, as the original
critical path has not changed. However, the overall circuit leakage power decreases
due to the introduction of low-power cells in non-critical paths. The resultant Verilog
or VHDL RTL datapath (design) code is a mixture of components from the original
standard cell library (TSMC) as well as from the VSDCAD low-power cell library. RTL
simulations are performed to check if the mixed library cell datapath (design) func-
tions according to specifications.
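The replace-and-back-out loop described above can be sketched as follows. This is a minimal illustration, not the tool flow used in this work: the five-gate netlist is hypothetical, each cell's delay is taken as the larger of its TPHL and TPLH values from Table 3.3, and membership on the critical path stands in for the per-cell slack that a timing tool would report.

```python
# Worst case of (TPHL, TPLH) in ps from Table 3.3: (standard cell, sleep-embedded cell).
CELL_DELAY_PS = {
    "XOR2": (96.20, 136.33),
    "OR2": (90.12, 123.40),
    "AND2": (63.25, 102.26),
}

# Hypothetical netlist in topological order: gate -> (cell type, fan-in gates).
NETLIST = {
    "g1": ("XOR2", []),
    "g2": ("XOR2", ["g1"]),
    "g3": ("OR2", []),
    "g4": ("AND2", ["g3"]),
    "g5": ("OR2", ["g2", "g4"]),
}


def critical_path(sleep_cells):
    """Return (path, delay) of the longest path for a given set of replaced gates."""
    arrival = {}
    for gate, (cell, fanins) in NETLIST.items():
        standard, sleep_embedded = CELL_DELAY_PS[cell]
        delay = sleep_embedded if gate in sleep_cells else standard
        arrival[gate] = delay + max((arrival[f] for f in fanins), default=0.0)
    gate = max(arrival, key=arrival.get)  # latest-arriving output
    total = arrival[gate]
    path = [gate]
    while NETLIST[gate][1]:  # walk back through the slowest fan-in
        gate = max(NETLIST[gate][1], key=arrival.get)
        path.append(gate)
    return path[::-1], total


def replace_non_critical():
    """Try each non-critical gate; keep the swap only if the critical path is unchanged."""
    base_path, base_delay = critical_path(set())
    replaced = set()
    for gate in NETLIST:
        if gate in base_path:
            continue  # critical-path gates stay as original standard cells
        trial = replaced | {gate}
        path, delay = critical_path(trial)
        if path == base_path and delay <= base_delay + 1e-9:
            replaced = trial  # timing preserved: keep the sleep-embedded cell
        # otherwise the replacement is backed out (trial is discarded)
    return sorted(replaced), base_path, base_delay


replaced, base_path, base_delay = replace_non_critical()
print("critical path:", base_path, f"{base_delay:.2f} ps")
print("replaced with sleep-embedded cells:", replaced)
```

In this toy netlist the first candidate is accepted, while the second would move the critical path onto the g3-g4 branch and is therefore backed out.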
An RTL controller is then created. This may be a Finite State Machine (FSM), a
micro-controller, or a self-control circuit for leakage power reduction that
automatically generates the sleep signals needed to switch the circuit from operating
to standby (sleep) mode and vice-versa. Generation of this controller is a complex
process in which various trade-offs are involved. An FSM or a micro-controller would be
relatively easier to design, but would consume extra area, have very slow switching
times between the sleep and wakeup modes of operation, and would create routing
congestion for the various control signals in the circuit layout. As an alternative, a
self-controlling circuit would be more complex to design, but would alleviate the
disadvantages of the prior technique. In this work, we have designed a signal
probability based self-controller for leakage reduction. RTL simulations are then
performed to verify whether or not the controller and datapath work in synchronization
with each other and also to check for timing issues.
Finally, layout synthesis is performed on the RTL code and parasitic extraction done
using Cadence or Synopsys tools. The extracted layout is then simulated before the
GDSII file is sent out to the foundry, so that the chip can be fabricated.
2.2 Low-Power Techniques at Different Abstraction Layers
Low-power design techniques can be applied at various levels of design hierarchy [60]
- the system level, the algorithm (behavior) level, the architecture (structure) level, the
circuit/logic level and the fabrication (technology) level.
Table 2.1 provides a summary of the various techniques that can be applied at each
abstraction level. It shows the sheer complexity of the low-power design problem at all
levels of abstraction. This dissertation presents techniques for leakage reduction and
low-power design at the circuit/logic (RTL) level of abstraction.
Table 2.1: Low-Power Techniques at Various Design Abstraction Levels [60]
Abstraction Level         Technique Name(s)
System                    Partitioning, Power-Down, Power-States
Algorithm                 Complexity, Concurrency, Regularity, Locality
Architecture              Parallelism, Pipelining, Redundancy, Data Encoding
Circuit/Logic             Logic Styles, Logic Manipulation, Transistor Sizing, Energy Recovery
Fabrication Technology    Threshold Reduction, Multi-Threshold Devices
2.3 Contributions of This Dissertation
In the previous section, the overview of a standard synthesis environment for generat-
ing low-power designs was presented. This section summarizes the important features
of the work done for this dissertation. The shaded boxes in Figure 2.1 show where the
major contributions of this dissertation fit into the core segment of the RTL synthe-
sis flow for low-power design. These contributions are enumerated and categorized as
follows:
• Design and Development of the VSDCAD Sleep-Embedded Topology for
Leakage Reduction in CMOS Circuits [49]: A novel technique that achieves
cancellation of leakage effects in both the Pull-Up Network (PUN) as well as the
Pull-Down Network (PDN) of CMOS cells was devised. It involved voltage balanc-
ing in the PUN and PDN paths using a combination of high-VT and standard-VT
sleep transistors. Section 3.1 of Chapter 3 describes in depth the topology as well
as the working of these VSDCAD sleep-embedded CMOS cells.
• Characterization of the VSDCAD Ultra Low-Power Standard Cell Library
[51, 54]: As part of this research, an ultra low-power standard cell library was
developed on the basis of the VSDCAD topology. The VSDCAD ultra low-power
standard cell library contains 8 combinational and 2 sequential standard cells,
which have been characterized for area, delay and power. Sections 3.4 and 3.5
of Chapter 3 describe in detail the process of characterizing these combinational
and sequential cells.
• Signal Probability Based VCLEARIT Self-Controller Design for Leakage
Power Reduction [52]: The self-controller is the vital segment of this disser-
tation work. It sequences the working of the VSDCAD sleep-embedded cells in
complex circuits. Signal probabilities are used to determine the mode of opera-
tion (functional or standby) of such cells. The VSDCAD sleep-embedded topology
was modified in this work for better controllability and also to reduce routing
congestion. Chapter 6 describes the VCLEARIT self-controlling leakage reduction
technique. Experiments conducted show significant savings in leakage power for
the VCLEARIT technique, when compared to other well-established techniques,
with comparable area and delay penalties.
• Seamless Integration of the Self-Controlled Sleep-Embedded Cells into
the Low-Power Synthesis Flow: Figure 2.1 of Section 2.1 in this chapter
shows how the various contributions of this dissertation fit into the RTL synthe-
sis segment of the low-power synthesis flow. An example illustrating the leakage
savings obtained after replacing only non-critical cells in a circuit with corre-
sponding VSDCAD cells is presented in Section 5.1 of Chapter 5.
In this chapter, an overview of the synthesis framework for low-power design has been
presented. A step-by-step procedure detailing the way behavioral designs are taken
through a series of synthesis processes, right down to layout, is described. The
primary contributions of this dissertation and the way they can be integrated into the
whole low-power synthesis flow are also explained. In the following chapter, the design,
development and characterization of the VSDCAD ultra low-power standard cell library
are explained in detail.
Chapter 3
VSDCAD Ultra Low-Power
Standard Cell Library
An ultra low-power standard cell RTL component library was developed as part of
this research at the VLSI Systems Design and CAD (VSDCAD) Laboratory at Syracuse
University. This VSDCAD low-power library will be used in the synthesis framework
for generating low-power designs. The following sections of this chapter explain the
methodology used in the design and development of the low-power standard cells.
Then the characterization of combinational CMOS cells is discussed, followed by the
characterization of sequential circuits. The application of the sleep-embedded
technique to Differential Cascode Voltage Switch Logic (DCVSL) circuits is then explained.
Finally, issues relating to active mode leakage power loss, as well as dynamic power
dissipation, are enumerated.
3.1 VSDCAD Sleep-Circuitry Embedded CMOS Cells
The sleep transistor concept used for dynamic circuits in [44] was adapted and
modified to work for leakage reduction in static CMOS complementary circuits. A
combination of high-VT and standard-VT sleep transistors is used in our implementation
[49], to provide a well balanced trade-off between high speed and leakage loss. Our
technique facilitates the creation of an ultra-low power standard cell library, using
sleep-circuitry embedded components.
Figure 3.1: Block Diagram - Generic VSDCAD CMOS Circuit
Figure 3.1 illustrates the topology of a generic CMOS complementary circuit with sleep
transistors embedded in it. We refer to such circuits as VSDCAD sleep-embedded CMOS
circuits for the remainder of this work. There are ‘n’ inputs, in_1, ..., in_n, feeding
the Pull-Up Network (PUN) as well as the Pull-Down Network (PDN). The transistors in
both the PUN and PDN are standard-VT devices. The sleep-circuitry consists of three
transistors - two PMOS devices {P0 and P1} and one NMOS device {N0}. Transistors
P0 and N0 are standard-VT devices, while P1 is a high-VT device. P0 is connected in
parallel with the PUN, one end connecting to the source (Vdd) and the other end to a
common point X1. N0 is connected in parallel with the PDN, one end connecting to
Gnd and the other end to a common point X2. The high-VT transistor, P1, connects
between the two common points X1 and X2 and behaves like a transmission gate. Two
input signals, “sleep” and its complement “sleepbar”, feed transistors {P1, N0} and P0,
respectively. The output of the CMOS circuit, “out”, is drawn from the common point
X2.
The working of the VSDCAD sleep-embedded CMOS circuit is as follows. In the normal
operating mode, “sleep” is off and “sleepbar” is on. This causes transistors {P0, N0}
to turn off and transistor P1 to turn on. The circuit now behaves exactly as a normal
CMOS complementary circuit should. The sleep (standby) operating mode is a little
more involved. In this mode, “sleep” is on and “sleepbar” is off. Hence, transistors {P0,
N0} turn on and transistor P1 turns off. Since P0 is on, common point X1 is also at
voltage Vdd. The PUN is now between two points at equal voltage potential (Vdd) and
hence no leakage current should flow through it. Similarly, N0 is on and common point
X2 is grounded. The PDN is now between two points at equal voltage potential (Gnd)
and hence no leakage current should flow through it. Since “out” is connected to X2,
during the sleep mode the output value will always be ‘0’. The leakage loss occurring
during the sleep mode will only be through the high-VT transistor P1, which is turned
off, but connected between points X1 and X2, which are at different voltage potentials.
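The two operating modes described above can be summarized with a small switch-level model. This is a sketch for illustration only: transistors are treated as ideal switches (a PMOS conducts when its gate is 0, an NMOS when its gate is 1), and the function and key names are hypothetical rather than taken from this work.

```python
def sleep_circuit_state(sleep, logic_out):
    """Ideal-switch view of the VSDCAD sleep circuitry of Figure 3.1.

    sleep     -- 1 for standby (sleep) mode, 0 for normal operating mode
    logic_out -- value the PUN/PDN would drive in normal operation
    """
    sleepbar = 1 - sleep
    p0_on = sleepbar == 0  # PMOS P0 (gate: sleepbar) in parallel with the PUN
    p1_on = sleep == 0     # high-VT PMOS P1 (gate: sleep) between X1 and X2
    n0_on = sleep == 1     # NMOS N0 (gate: sleep) in parallel with the PDN
    return {
        "P0_on": p0_on,
        "P1_on": p1_on,
        "N0_on": n0_on,
        "X1_pinned_to_Vdd": p0_on,  # PUN then sits between two Vdd points
        "X2_pinned_to_Gnd": n0_on,  # PDN then sits between two Gnd points
        "out": 0 if n0_on else logic_out,  # "out" is taken from X2
    }


for mode, sleep in (("normal", 0), ("sleep", 1)):
    print(mode, sleep_circuit_state(sleep, logic_out=1))
```

In sleep mode the model shows both networks bracketed by equal potentials and the output forced to ‘0’, leaving the turned-off P1 as the only device with a voltage difference across it.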
For any given process technology, the standard-VT transistors P0 and N0 are unit-sized
devices (the smallest width-to-length {W/L} ratio as defined by the technology).
However, the high-VT transistor P1 needs to be sized appropriately for the VSDCAD
sleep-embedded CMOS cell to have a propagation delay comparable to that of the standard
CMOS cell. There is a nominal increase in both area and propagation delay of the
VSDCAD sleep-embedded circuit, when compared to the standard CMOS circuit. This
overhead of VSDCAD sleep-embedded cells is traded off against enormous power savings,
when compared to the standard CMOS cells.
As an alternative option, an NMOS transistor driven by the input “sleepbar” could be
used in place of the transistor P1. In this circuit, the output of the CMOS circuit “out”
will have to be drawn out from the common point X1, rather than from X2.
3.2 Leakage Power Calculation with a CMOS OR2 Circuit Example
The standard 2-input OR gate (OR2) considered here is a cascade structure consisting
of a 2-input NOR gate followed by an inverter. TSMC’s 180 nm technology [9] with
a supply voltage (Vdd) of 1.8V was used to implement the standard OR2 gate. The
transistor sizes were fixed, similar to those found in the library provided by Oklahoma
State University [7, 31] : PMOS (Width: 3600 nm, Length: 180 nm) and NMOS (Width:
900 nm, Length: 180 nm). SPECTRE¹ was used in this work to simulate circuits and
also to measure leakage power.
Table 3.1: Standard OR2 Gate : Average Leakage Power Loss = 152.34 pW
Input Combinations    Leakage Power
a    b                Loss (pW)
0    0                202.45
0    1                185.08
1    0                144.04
1    1                77.76
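As a quick arithmetic check on Table 3.1, the average and the extreme input vectors can be recomputed from the four listed values. The snippet below is only a sanity-check sketch and the variable names are hypothetical. Note that the plain mean of the four rounded entries is 152.3325 pW, which agrees with the reported 152.34 pW to within 0.01 pW, presumably because the table entries are themselves rounded.

```python
# Leakage power loss (pW) of the standard OR2 gate per input vector "ab", from Table 3.1.
OR2_LEAKAGE_PW = {"00": 202.45, "01": 185.08, "10": 144.04, "11": 77.76}

average = sum(OR2_LEAKAGE_PW.values()) / len(OR2_LEAKAGE_PW)
worst_vector = max(OR2_LEAKAGE_PW, key=OR2_LEAKAGE_PW.get)
best_vector = min(OR2_LEAKAGE_PW, key=OR2_LEAKAGE_PW.get)

print(f"average leakage : {average:.4f} pW")
print(f"worst case      : {worst_vector} ({OR2_LEAKAGE_PW[worst_vector]} pW)")
print(f"minimum leakage : {best_vector} ({OR2_LEAKAGE_PW[best_vector]} pW)")
```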
All possible input combinations for the OR2 gate were applied individually, and their
corresponding leakage power was measured using SPECTRE [4] at a temperature of
27°C. Table 3.1 lists the leakage power loss for all the input combinations. A leakage
power loss value of 202.45 pW was observed for the “00” input combination. This was
the worst case. The “11” input combination yielded the least leakage power loss value
of 77.76 pW. The average leakage power loss value for all 4 input combination values
of the standard OR2 gate was calculated to be 152.34 pW.
¹ TM - Cadence Design Systems, Inc.
Figure 3.2: Sleep-Embedded Cascaded OR2 Gate Schematic
Figure 3.2 illustrates the topology of the VSDCAD sleep-embedded OR2 gate built by
cascading an embedded NOR2 gate followed by an embedded inverter. TSMC’s 180 nm
technology was used in the implementation. Sleep transistors TP2, TN2, TP5 and TN5
are standard-VT devices and are unit-sized: PMOS (Width: 600 nm, Length: 180 nm),
NMOS (Width: 600 nm, Length: 180 nm). The other sleep transistors, TP7 and TP8, are
high-VT devices and were sized using a procedure explained in the next section. In the
sleep mode of operation, the output is ‘0’ irrespective of the input combination given.
Hence, with “sleep” set to ‘1’, “sleepbar” set to ‘0’ and an arbitrary input combination,
the OR2 circuit shown in Figure 3.2 was simulated at a temperature of 27°C. The leakage
power loss measured using SPECTRE [4] was 7.76 pW. Figure 3.3 shows the leakage power
and waveforms obtained. This leakage value is approximately 20 times lower than the
leakage value of the standard OR2 gate.
Figure 3.3: Output Waveforms for Sleep-Embedded OR2 Gate
3.3 Leakage Savings Compared to the Power-Gating Methodology
Power-gating [13, 21] is a popular technique for reducing leakage power. A power-gated
design uses switches that are high-VT transistors, with sleep signals to effectively
“switch off” the connection to power or ground, thereby turning off leakage power
when the design is in standby mode. Figure 3.4 shows one of the several topologies
(header configuration) of a generic power-gated CMOS circuit. It is a standard CMOS
circuit with ‘n’ inputs, in_1, ..., in_n, feeding the PUN and PDN. A high-VT transistor,
P1, connects between the power source (Vdd) and the PUN, acting as a switch. The “sleep”
signal controls transistor P1, turning it on and off as necessary. In the standby mode
of operation, “sleep” is on, thus cutting off power from the CMOS circuit.
Figure 3.4: Block Diagram - Generic Power-Gated CMOS Circuit
Since power-gating is the most commonly used methodology for reducing leakage power,
it is compared in this section to the VSDCAD sleep-embedded methodology in the
standby mode of operation. Nine experimental CMOS circuits - NOR2, NAND2, OR2,
AND2, XOR2, XNOR2, MUX2x1, FULL ADDER and DFFPOSX were used in this com-
parison. TSMC’s 180 nm technology, with a supply voltage (Vdd) of 1.8V was used to
implement these circuits. All transistors (including the high-VT ones) were unit sized :
PMOS (Width: 600 nm, Length: 180 nm), NMOS (Width: 600 nm, Length: 180 nm).
Table 3.2: Leakage Comparison - VSDCAD Circuit vs. Power-Gated Circuit
CMOS Circuit    Power-Gated Circuit    VSDCAD Sleep-Embedded    Improvement
Name            Leakage (pW)           Circuit Leakage (pW)     (C2/C3)
NOR2            2.225                  0.890                    2.5X
NAND2           4.176                  0.890                    4.7X
OR2             6.666                  1.780                    3.7X
AND2            7.879                  1.780                    4.4X
XOR2            12.103                 2.670                    4.5X
XNOR2           14.306                 2.670                    5.4X
MUX2x1          17.608                 3.560                    4.9X
FULL ADDER      74.077                 14.240                   5.2X
DFFPOSX         29.093                 9.586                    3.0X
Average                                                         4.3X
All 9 circuits were first implemented as power-gated circuits (as shown in Figure 3.4).
SPECTRE [4] was used to simulate them in the standby mode (“sleep” is on) at a
temperature of 27°C, and their leakage power was measured. Column 2 of Table 3.2 gives
the average leakage power loss for each of the power-gated cells.
Next, all 9 circuits were implemented as VSDCAD sleep-embedded circuits (as shown
in Figure 3.1 of Section 3.1). SPECTRE [4] was used to simulate them in the standby
mode of operation (“sleep” is on and “sleepbar” is off) at a temperature of 27°C, and
their leakage power was measured. Column 3 of Table 3.2 lists the average leakage
power loss for each of the VSDCAD sleep-embedded cells.
The leakage loss of the VSDCAD sleep-embedded cells (from Column 3 {C3}) when
compared to that of power-gated cells (from Column 2 {C2}), is expressed as a ratio
in Column 4 of Table 3.2. The leakage improvement from using the VSDCAD sleep-
embedded methodology across all the experimental circuits is 4.3X (on average). This
is a significant improvement over the commonly used power-gating methodology and
demonstrates the effectiveness of the VSDCAD sleep-embedded technique.
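The improvement column of Table 3.2 can be reproduced directly from Columns 2 and 3. The following is a small sanity-check sketch with hypothetical variable names. It averages the unrounded per-circuit ratios, which is an assumption about how the reported average was formed.

```python
# (power-gated leakage, VSDCAD sleep-embedded leakage) in pW, from Table 3.2.
LEAKAGE_PW = {
    "NOR2": (2.225, 0.890),
    "NAND2": (4.176, 0.890),
    "OR2": (6.666, 1.780),
    "AND2": (7.879, 1.780),
    "XOR2": (12.103, 2.670),
    "XNOR2": (14.306, 2.670),
    "MUX2x1": (17.608, 3.560),
    "FULL ADDER": (74.077, 14.240),
    "DFFPOSX": (29.093, 9.586),
}

# Improvement = Column 2 / Column 3 for each circuit.
improvement = {name: gated / vsdcad for name, (gated, vsdcad) in LEAKAGE_PW.items()}
average_improvement = sum(improvement.values()) / len(improvement)

for name, ratio in improvement.items():
    print(f"{name:<11} {ratio:.1f}X")
print(f"{'Average':<11} {average_improvement:.1f}X")
```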
3.4 Characterized Low-Power Combinational Cells
Standard combinational CMOS library cells, such as NOR2, NAND2, OR2, AND2, XOR2,
XNOR2 and MUX2x1, were implemented [54] using TSMC’s 180 nm technology [9].
Transistor sizes in all these circuits were fixed, similar to those found in the library
provided by Oklahoma State University [7, 31]. The W/L ratios of transistors in the
standard cells were on the order of 10X to 20X. The total area of each standard cell is
listed in Column 2 of Table 3.3. A supply voltage (Vdd) of 1.8V was used and transient
analysis was performed on all 7 cells listed above, using SPECTRE [4]. The output load
for each of the 7 cells was a fully sized NAND2 gate.
The propagation delay of each cell was calculated and the high-to-low transition (TPHL)
tabulated in Column 3 of Table 3.3. The low-to-high transition (TPLH) was tabulated
in Column 4 of Table 3.3. Next, the circuits were simulated at a temperature of 27°C
and their leakage power measured. All possible input combinations were applied and
leakage power loss measured in every case. Column 5 of Table 3.3 lists the average
leakage power loss for each standard CMOS cell.

Table 3.3: Combinational Cell Library Performance Measurements @ Temperature = 27°C
for Circuits Implemented using TSMC’s 180 nm Technology
(Columns 2-5: Standard Circuit Operation; Columns 6-10: Sleep-Embedded Circuit Operation)
CMOS Circuit     Area     TPHL     TPLH     Leakage      High-VT Transistor(s)   Area     TPHL     TPLH     Leakage
Name             (µm²)    (ps)     (ps)     Power (pW)   W/L Ratio               (µm²)    (ps)     (ps)     Power (pW)
NOR2             1.620    37.12    57.01    84.39        12X                     2.224    54.19    69.76    4.79
NAND2            1.296    28.62    39.79    81.20        15X                     1.998    38.79    78.13    6.19
OR2              2.106    66.83    90.12    152.34       10X                     3.186    123.40   121.59   7.76
AND2             1.782    63.25    58.08    146.36       10X                     2.862    64.27    102.26   7.76
XOR2             5.832    90.24    96.20    415.56       12X                     7.646    136.33   118.04   14.37
XNOR2            5.832    91.00    109.87   415.67       12X                     7.646    114.87   110.88   14.37
MUX2x1           5.346    87.51    101.53   362.79       12X                     7.765    137.44   189.31   19.16
FULL ADDER (S)   21.222   153.17   179.57   1617.56      10X to 12X              37.896   221.97   267.16   67.52
FULL ADDER (C)            239.21   269.04                                                 441.27   429.62
Next, the VSDCAD sleep-circuitry was introduced (as shown in Figure 3.1 of Sec-
tion 3.1) for all 7 standard CMOS cells. For each cell, transient analysis was performed
in the normal mode of operation (with “sleep” off and “sleepbar” on). The output load
for each of the 7 cells was the same fully sized NAND2 gate used previously. The prop-
agation delays were calculated. These were compared to the standard circuit values
listed in Column 3 and Column 4 of Table 3.3. The high-VT sleep transistor(s) were
sized such that the propagation delay of the VSDCAD cell was comparable to that of
the standard cell. Column 6 of Table 3.3 lists this W/L ratio value of the high-VT sleep
transistor(s). The total area of each VSDCAD cell is listed in Column 7 of Table 3.3,
while the propagation delay values TPHL and TPLH are tabulated in Columns 8 and
9 respectively. Finally, the VSDCAD sleep-embedded cell was simulated in the sleep
(standby) mode of operation (with “sleep” on and “sleepbar” off) and the leakage loss
measured. Column 10 of Table 3.3 lists the leakage power loss for all the VSDCAD
sleep-embedded standard cells.
A full adder circuit was built using the low-power VSDCAD sleep-embedded AND2, OR2
and XOR2 library components characterized previously (AND2, OR2 and XOR2 high-VT
sleep transistor(s) with W/L ratios fixed at 10X, 10X and 12X respectively). No addi-
tional tuning for this circuit was necessary. The performance results of the VSDCAD
full adder are presented in Row 8, Columns 6-10 of Table 3.3.
Certain trends and observations from this research on combinational cells are pre-
sented below. The W/L ratio of the high-VT sleep transistor(s) for all the VSDCAD
sleep-embedded cells, as seen in Column 6 of Table 3.3, is in the order of 10X∼15X.
This value is comparable to the transistor sizes of standard CMOS cells. Hence, the
layout of these VSDCAD cells is uniform and more regular, when compared to sleep-
embedded circuits with either unit-sized or extremely large high-VT sleep transistor(s).
Experiments showed that utilizing unit-sized high-VT sleep transistor(s) resulted in
leakage power loss savings of up to 53X (on average). However, a major limitation
was that the propagation delay of such sleep-embedded standard cells was 5X∼6X
more than that of the standard CMOS cells. Hence, a reasonable trade-off among the
area, propagation delay and leakage power savings was made in designing the sleep-embedded
combinational standard library cells.
Table 3.4: Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)
Performance Comparison Ratios [Columns (C2...C5, C7...C10) below are from Table 3.3]
CMOS        Area          Propagation Delay Penalty   Leakage
Circuit     Penalty       TPHL          TPLH          Savings
Name        (C7/C2)       (C8/C3)       (C9/C4)       (C5/C10)
NOR2        1.37X         1.46X         1.22X         18X
NAND2       1.54X         1.36X         1.96X         13X
OR2         1.51X         1.85X         1.35X         20X
AND2        1.61X         1.02X         1.76X         19X
XOR2        1.31X         1.51X         1.23X         29X
XNOR2       1.31X         1.26X         1.01X         29X
MUX2x1      1.45X         1.57X         1.86X         19X
FULL ADDER  1.79X         (S) 1.45X     (S) 1.49X     24X
                          (C) 1.84X     (C) 1.60X
Average     1.49X         1.48X         1.50X         21X
Columns 2-5 of Table 3.4 give the performance comparison ratios between the standard
CMOS combinational cells and the VSDCAD sleep-embedded combinational cells. Col-
umn 2 of Table 3.4 shows the area penalty (increase) of the sleep-embedded cells (from
Column 7 {C7} of Table 3.3), when compared to that of standard cells (from Column 2
{C2} of Table 3.3). The average area increase for all the circuits is 1.49X, seen in Row 9
of Table 3.4. Kuo [43] states that the most power-efficient high-VT cell has a 2.5X delay
impact, when compared to the standard cell. The propagation delay increase (penalty)
for the VSDCAD sleep-embedded cells, when compared to standard cells is 1.48X (on
average for TPHL) and 1.5X (on average for TPLH). Column 5 of Table 3.4 lists the
leakage power loss savings of the VSDCAD sleep-embedded CMOS combinational cells
when compared to standard cells. Average leakage savings of 21X are obtained. The
nominal delay overhead is offset by these significant power savings given the massive
leakage power values predicted in circuits designed using deep-submicron processes
(as shown in Figure 1.1 of Chapter 1). Designers and synthesis tools could add these
sleep-embedded combinational cells in non-critical paths, thereby avoiding effects on
the overall circuit delay, while significantly saving on leakage power loss.
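The ratios in Table 3.4 can be cross-checked against the Table 3.3 measurements. The following Python sketch is not part of the original work: the dictionary layout and the function name `comparison_ratios` are illustrative assumptions, and only the seven single-output cells are included (the full adder, with separate sum and carry delays, is left out for brevity).

```python
# Measurements transcribed from Table 3.3 (areas as listed, delays in ps, leakage in pW).
# Tuple layout (an assumption of this sketch):
# (std_area, std_tphl, std_tplh, std_leak, se_area, se_tphl, se_tplh, se_leak)
TABLE_3_3 = {
    "NOR2":   (1.620, 37.12, 57.01, 84.39, 2.224, 54.19, 69.76, 4.79),
    "NAND2":  (1.296, 28.62, 39.79, 81.20, 1.998, 38.79, 78.13, 6.19),
    "OR2":    (2.106, 66.83, 90.12, 152.34, 3.186, 123.40, 121.59, 7.76),
    "AND2":   (1.782, 63.25, 58.08, 146.36, 2.862, 64.27, 102.26, 7.76),
    "XOR2":   (5.832, 90.24, 96.20, 415.56, 7.646, 136.33, 118.04, 14.37),
    "XNOR2":  (5.832, 91.00, 109.87, 415.67, 7.646, 114.87, 110.88, 14.37),
    "MUX2x1": (5.346, 87.51, 101.53, 362.79, 7.765, 137.44, 189.31, 19.16),
}


def comparison_ratios(row):
    """Return the four Table 3.4 ratios for one row of Table 3.3."""
    std_area, std_tphl, std_tplh, std_leak, se_area, se_tphl, se_tplh, se_leak = row
    return {
        "area_penalty": se_area / std_area,      # C7 / C2
        "tphl_penalty": se_tphl / std_tphl,      # C8 / C3
        "tplh_penalty": se_tplh / std_tplh,      # C9 / C4
        "leakage_savings": std_leak / se_leak,   # C5 / C10
    }


RATIOS = {name: comparison_ratios(row) for name, row in TABLE_3_3.items()}

for name, r in RATIOS.items():
    print(f"{name:7s} area {r['area_penalty']:.2f}X  TPHL {r['tphl_penalty']:.2f}X  "
          f"TPLH {r['tplh_penalty']:.2f}X  leakage {r['leakage_savings']:.0f}X")
```

Rounded to the precision used in Table 3.4, the printed values should agree with the tabulated ratios for these cells.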
3.5 Characterized Low-Power Sequential Cells
The design of low-power sequential cells is much more involved than that of low-power
combinational cells. This is because low-power sequential circuits are required to re-
tain data even during the power-down (sleep) mode [55]. To this effect, data and clock
retention circuits are employed in flip-flops, to store values during the sleep phase of
operation. In this work, an extension of the Clocked CMOS (C2MOS) Master-Slave Reg-
ister [67] is implemented. During the first half of the clock cycle, the master stage is in
the evaluation mode and samples the input, while the slave stage is in the hold mode.
In the next half of the clock cycle, the master stage is in the hold mode, while the slave
stage evaluates and outputs the value sampled.
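The two-phase behaviour described above can be illustrated with a logic-level sketch in Python. This is only a behavioural model under simplifying assumptions: it ignores the transistor-level C2MOS structure, all timing, and which clock level corresponds to the "first half" of the cycle. The class and method names are hypothetical.

```python
class MasterSlaveRegister:
    """Logic-level model of a two-stage master-slave register (illustrative only)."""

    def __init__(self):
        self.master = 0  # value held by the master stage
        self.slave = 0   # value held by the slave stage (the register output)

    def half_cycle(self, first_half, d):
        """Advance one half clock cycle and return the register output."""
        if first_half:
            # Master evaluates and samples D; slave holds its old value.
            self.master = d
        else:
            # Master holds; slave evaluates and outputs the sampled value.
            self.slave = self.master
        return self.slave


reg = MasterSlaveRegister()
outputs = [
    reg.half_cycle(True, 1),   # master samples 1, output unchanged
    reg.half_cycle(False, 0),  # slave outputs the sampled 1; the new D is ignored
    reg.half_cycle(True, 0),   # master samples 0, output unchanged
    reg.half_cycle(False, 1),  # slave outputs the sampled 0
]
print(outputs)
```

The point of the sketch is that a change on D during the second half-cycle does not reach the output until the master has sampled it in a following first half-cycle.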
The positive edge-triggered master-slave D flip-flop (DFFPOSX) and the negative edge-
triggered master-slave D flip-flop (DFFNEGX) were chosen as the class of standard se-
quential library cells for experimentation. TSMC’s 180 nm technology [9], with a supply
voltage (Vdd) of 1.8V, was used in the implementation of these D flip-flops. Transistor
sizes in these circuits were fixed, similar to those found in the library provided by
Oklahoma State University [7, 31]. The W/L ratios of transistors in these standard cells
were in the order of 5X∼20X. The total area of each flip-flop is listed in Column 2 of
Table 3.5.
Table 3.5: Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
CMOS      Standard Circuit Operation                   Sleep-Embedded Circuit Operation
Circuit   Area    TSU     THLD    TC−Q     Leakage     High-VT W/L  Area    TSU     THLD    TC−Q     Leakage
Name      (pm2)   (ps)    (ps)    (ps)     Power (pW)  Ratio        (pm2)   (ps)    (ps)    (ps)     Power (pW)
DFFPOSX   5.994   73.0    89.9    206.39   389.71      8X           9.849   116.3   118.3   405.14   21.21
DFFNEGX   5.994   73.4    90.2    207.86   389.77      8X           9.849   117.0   118.6   405.69   21.21
Transient analysis was performed on the flip-flops listed above, using SPECTRE [4].
The output load in each case was a fully sized NAND2 gate. Various timing parameters
of the flip-flops were measured. The setup time TSU, which is the time that the data input
(D) must be valid before the clock transition, is tabulated in Column 3 of Table 3.5.
Column 4 of Table 3.5 lists the hold time THLD, which is the time that the data input
(D) must remain valid after the clock edge. The propagation delay of the flip-flop with
respect to the clock edge TC−Q is shown in Column 5 of Table 3.5. Next, the circuits
were simulated at a temperature of 27°C and their leakage power measured. All possible
input combinations were applied and leakage power loss measured in every case.
Column 6 of Table 3.5 lists the average leakage power loss for each flip-flop.
Next, the VSDCAD sleep-embedded DFFPOSX and DFFNEGX cells were implemented
[51]. The various Pull-Up Network (PUN) and Pull-Down Network (PDN) paths (CMOS-
like paths) were first identified in the standard Master-Slave D Flip-Flop. There are
7 such paths (including one for the clock inverter). These 7 CMOS-like paths were re-
placed by their equivalent VSDCAD CMOS-like circuits (as shown in Figure 3.1 of Sec-
tion 3.1). Then, a special state-saving circuit was added. It was designed for retaining
data in the master-slave VSDCAD D flip-flop during the power-down (sleep) mode. This
state-saving circuit is a dynamic transmission gate latch, implemented completely us-
ing high-VT transistors, in order to minimize leakage loss. The latch stores the value
of the master-slave flip-flop the instant the circuit goes into the standby mode (“sleep”
is on and “sleepbar” is off). This is achieved by closing the transmission gate and re-
taining whatever value was seen during that instant at the input of the transmission
gate.
Figure A.1 in Appendix A is the block diagram of the VSDCAD master-slave D flip-flop.
The main sections of the VSDCAD D flip-flop (Master, Slave and State-Saving Circuit)
are highlighted in enclosed boxes, as seen in Figure A.1. A level-restoring transistor,
TP13, is part of the state-saving circuitry, in order to strengthen the signal at the out-
put of the transmission gate (formed by connecting transistors TP9 and TN4). All the
high-VT transistors of the state-saving circuitry are unit-sized: PMOS (Width: 600 nm,
Length: 180 nm), NMOS (Width: 600 nm, Length: 180 nm). The sleep transistors in
the VSDCAD CMOS-like paths are unit-sized, with the exception of the high-VT sleep
transistors. The remaining transistors (as in the standard D flip-flop circuit) are sized
similar to those found in the Oklahoma State University library [7, 31]. Figure A.2 in
Appendix A is the actual schematic capture of the VSDCAD master-slave POSX DFF.
For both DFFPOSX and DFFNEGX, transient analysis was performed in the normal
mode of operation. The output load for each was the fully sized NAND2 gate used pre-
viously. The various timing parameters of the flip-flops were calculated. These were
compared to the standard circuit values listed in Columns 3-5 of Table 3.5. The high-
VT sleep transistors were sized, such that the propagation delay TC−Q of the VSDCAD
sleep-embedded flip-flop was comparable to that of the standard flip-flop. Column 7
of Table 3.5 lists this W/L ratio value of the high-VT sleep transistors. The total area
of each sleep-circuitry embedded flip-flop is listed in Column 8 of Table 3.5, while the
timing parameter values TSU, THLD and TC−Q are tabulated in Columns 9, 10 and 11
respectively. Finally, the sleep-embedded flip-flop was simulated in the sleep (standby)
mode of operation (“sleep” is on and “sleepbar” is off) and the leakage loss measured.
Column 12 of Table 3.5 lists the leakage power loss for the VSDCAD sleep-embedded
flip-flops.
Certain trends and observations from this research on sequential cells are presented
below. The W/L ratio value of the high-VT sleep transistors for the VSDCAD sleep-
embedded D flip-flops, as seen in Column 7 of Table 3.5, is 8X. This value is comparable
to the transistor sizes of the standard D flip-flops. Hence, the layout of these VSDCAD
flip-flops is uniform and more regular, when compared to sleep-embedded flip-flops
with either unit-sized or extremely large high-VT sleep transistors. Findings showed
that utilizing unit-sized high-VT sleep transistors in VSDCAD sleep-embedded D flip-flops
resulted in leakage power savings of up to 22.39X (on average). However, a major
limitation was that the propagation delay TC−Q of such sleep-embedded flip-flops was
7X∼9X more than that of standard flip-flops. Hence, a reasonable trade-off among the
area, propagation delay and leakage power savings was made in designing the VSD-
CAD sleep-embedded D flip-flop standard cells.
Table 3.6: Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)
Performance Comparison Ratios [Columns (C2...C6, C8...C12) below are from Table 3.5]
CMOS      Area      Timing Parameters Penalty       Leakage
Circuit   Penalty   TSU       THLD       TC−Q       Savings
Name      (C8/C2)   (C9/C3)   (C10/C4)   (C11/C5)   (C6/C12)
DFFPOSX   1.64X     1.59X     1.32X      1.96X      18X
DFFNEGX   1.64X     1.59X     1.31X      1.95X      18X
Average   1.64X     1.59X     1.315X     1.955X     18X
Columns 2-6 of Table 3.6 give the performance comparison ratios between the standard
flip-flop cells and the VSDCAD sleep-circuitry embedded flip-flops. Column 2 of Table 3.6
is the area penalty (increase in area) of the VSDCAD D flip-flops (from Column 8
{C8} of Table 3.5), when compared to that of standard D flip-flops (from Column 2 {C2}
of Table 3.5). The average area increase for the VSDCAD sleep-embedded flip-flops is
1.64X, seen in Row 3 of Table 3.6. Kuo [43] states that the most power-efficient high-
VT cell has a 2.5X delay impact, when compared to the standard cell. The propagation
delay increase (penalty) for the sleep-circuitry embedded flip-flops when compared to
standard flip-flops is 1.955X (on average for TC−Q). The setup (TSU) and hold (THLD)
times have also increased by factors of 1.59X and 1.315X, respectively. Column 6 of Ta-
ble 3.6 lists the leakage power loss savings of the VSDCAD sleep-embedded flip-flops,
when compared to standard flip-flops. Average leakage savings of 18X are obtained.
Again, as in the case of VSDCAD sleep-embedded combinational circuits, the nominal
delay overhead in sleep-embedded flip-flops is offset by their significant power savings.
Designers and synthesis tools could add these VSDCAD sleep-embedded D flip-flops in
non-critical paths of the circuit.
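To illustrate why placement on non-critical paths matters, the sketch below applies the textbook setup-time constraint (minimum clock period = TC−Q + combinational delay + TSU, with clock skew ignored) to the DFFPOSX values of Table 3.5. This Python example is not from the original work; the 1000 ps logic delay is a hypothetical value, and the class and function names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlipFlopTiming:
    t_su_ps: float   # setup time
    t_hld_ps: float  # hold time
    t_cq_ps: float   # clock-to-Q propagation delay


# DFFPOSX values transcribed from Table 3.5.
STANDARD_DFFPOSX = FlipFlopTiming(t_su_ps=73.0, t_hld_ps=89.9, t_cq_ps=206.39)
VSDCAD_DFFPOSX = FlipFlopTiming(t_su_ps=116.3, t_hld_ps=118.3, t_cq_ps=405.14)


def min_clock_period_ps(ff, logic_delay_ps):
    """Textbook setup constraint: T >= T_C-Q + t_logic + T_SU (clock skew ignored)."""
    return ff.t_cq_ps + logic_delay_ps + ff.t_su_ps


LOGIC_DELAY_PS = 1000.0  # hypothetical combinational delay between two flip-flops
std_period = min_clock_period_ps(STANDARD_DFFPOSX, LOGIC_DELAY_PS)
vsd_period = min_clock_period_ps(VSDCAD_DFFPOSX, LOGIC_DELAY_PS)
print(f"standard: {std_period:.2f} ps, sleep-embedded: {vsd_period:.2f} ps, "
      f"increase: {vsd_period - std_period:.2f} ps")
```

Under these assumptions, a register-to-register path built from the sleep-embedded flip-flop needs roughly 242 ps more per cycle, which is tolerable only where the path has that much timing slack.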
Figure 3.5 shows the transient waveforms for the characterized VSDCAD positive edge-
triggered D flip-flop, in both the standard as well as standby (sleep) modes of operation.
From the waveforms, it can be observed that the circuit retains proper values in the
power-down (“sleep” signal high) mode. At simulation times of 45 ns as well as 95 ns,
with the sleep signal high, the VSDCAD sleep-embedded D flip-flop keeps the previously
held values (‘1’ and ‘0’ respectively) rather than storing new input (D) values at the
rising clock edge. This shows that the VSDCAD sleep-embedded master-slave D flip-flop
performs in accordance with the design specifications.
Figure 3.5: Output Waveforms Showing Functioning of the VSDCAD POSX Master-Slave D Flip-Flop in Standard and Sleep Modes of Operation
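The retention behaviour described for the waveforms can be mimicked with a small logic-level model in Python. This is a hedged sketch, not the author's circuit: it abstracts away the transmission-gate latch and all timing, and it assumes that the saved value is restored when the cell leaves standby. The class and method names are hypothetical.

```python
class RetentionDFF:
    """Logic-level model of a positive edge-triggered flip-flop with a sleep input."""

    def __init__(self):
        self.q = 0          # value stored in normal operation
        self.saved = 0      # value held by the state-saving latch
        self.sleep = False  # True while the cell is in standby

    def set_sleep(self, sleep):
        if sleep and not self.sleep:
            # Entering standby: the state-saving latch captures the current value.
            self.saved = self.q
        elif self.sleep and not sleep:
            # Leaving standby: assumed here that the saved value is restored.
            self.q = self.saved
        self.sleep = sleep

    def rising_clock_edge(self, d):
        if not self.sleep:
            self.q = d  # normal mode: store the new input
        # In sleep mode, D is ignored and the retained value is reported.
        return self.saved if self.sleep else self.q


ff = RetentionDFF()
trace = []
trace.append(ff.rising_clock_edge(1))  # normal mode: stores 1
ff.set_sleep(True)
trace.append(ff.rising_clock_edge(0))  # sleep: keeps 1, ignores D = 0
ff.set_sleep(False)
trace.append(ff.rising_clock_edge(0))  # normal mode: stores 0
ff.set_sleep(True)
trace.append(ff.rising_clock_edge(1))  # sleep: keeps 0, ignores D = 1
print(trace)
```

As in the waveform description, the two clock edges that arrive while sleep is high leave the previously held ‘1’ and ‘0’ in place.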
3.6 Low-Power Differential Cascode Voltage Switch Logic (DCVSL) Cells
DCVSL circuits [35] are a combination of two important concepts, differential logic and
positive feedback. These circuits require each input in complementary format, and
they produce complementary outputs themselves. The pull-down networks PDN1 and
PDN2 use NMOS devices and are mutually exclusive, i.e., when PDN1 is on, PDN2 is off,
and vice-versa. The function and its inverse are simultaneously implemented by the
circuit.
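A logic-level sketch of a DCVSL AND2/NAND2 gate shows the mutual exclusion of the two pull-down networks and the complementary outputs. The Python model below is an illustration only: it does not model transistors, and it deliberately does not assume which of the two networks corresponds to PDN1 or PDN2 in the figures. The function and variable names are hypothetical.

```python
from itertools import product


def dcvsl_and_nand(a, b):
    """Return (out, outbar) for a logic-level DCVSL AND2/NAND2 model."""
    a_bar, b_bar = 1 - a, 1 - b        # inputs are supplied in complementary form
    pdn_f = bool(a and b)              # network implementing f = a AND b
    pdn_f_bar = bool(a_bar or b_bar)   # network implementing the inverse of f
    # Mutual exclusion: exactly one pull-down network conducts.
    assert pdn_f != pdn_f_bar
    # The conducting network pulls its node low; positive feedback through the
    # cross-coupled pull-ups drives the opposite node high.
    outbar = 0 if pdn_f else 1
    out = 0 if pdn_f_bar else 1
    return out, outbar


TRUTH_TABLE = {(a, b): dcvsl_and_nand(a, b) for a, b in product((0, 1), repeat=2)}
for inputs, outputs in TRUTH_TABLE.items():
    print(inputs, outputs)
```

For every input combination the two outputs are complements of each other, so the function and its inverse are available simultaneously.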
No DCVSL circuits were characterized as part of the VSDCAD ultra low-power standard
cell library. However, the sleep-transistor technique described in Section 3.1 was ex-
tended [50] and experiments carried out on 3 DCVSL circuits - AND2/NAND2, OR2/NOR2
and XOR2/XNOR2 to check for leakage savings. Figure 3.6 illustrates the topology of
a generic DCVSL circuit with sleep transistors embedded in it. PUT1 and PUT2 are
Pull-Up Transistors.
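[Block diagram not reproduced. Labels in the figure: complementary inputs in 1 ... in n; pull-down networks PDN1 and PDN2; pull-up transistors PUT1 and PUT2; devices P1, P2, N1 and N2; high-VT (H vt) sleep devices; labels X1 to X4; control signals sleep and sleepbar; outputs out and outbar; supply rails Vdd and Gnd.]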
Figure 3.6: Block Diagram - Generic Sleep-Embedded DCVSL Circuit
TSMC’s 180 nm technology with a supply voltage (Vdd) of 1.8V, was used to implement
the 3 DCVSL circuits. All transistors were unit-sized: PMOS (Width: 600 nm, Length:
180 nm), NMOS (Width: 600 nm, Length: 180 nm). SPECTRE was used to simulate
the circuits at a temperature of 27°C and their leakage power measured. Column 2 of
Table 3.7 gives the average leakage power loss for each of the standard DCVSL cells.
Table 3.7: DCVSL Cell Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
DCVSL         Standard Circuit    Sleep-Embedded Circuit   Leakage
Circuit       Operation Leakage   Operation Leakage        Savings
Name          Power (pW)          Power (pW)               (C2/C3)
AND2/NAND2    183.61              3.56                     52X
OR2/NOR2      184.36              3.56                     52X
XOR2/XNOR2    282.1               3.56                     79X
Next, the sleep-circuitry was introduced in the DCVSL circuits, keeping all the tran-
sistors unit-sized (including the high-VT ones). The leakage power of these circuits
was then measured. Column 3 of Table 3.7 gives the leakage power loss for each of
the sleep-embedded DCVSL cells. Column 4 of Table 3.7 lists the leakage power sav-
ings of the sleep-circuitry embedded DCVSL cells when compared to the standard cells.
Significant power savings are seen for this class of circuits also.
3.7 Active Mode Leakage Loss Increase
Up until this point in the experiments, a comparison was always provided between the
active mode leakage power loss of standard cells and the standby (sleep) mode leakage
power loss of sleep-embedded cells. This provided accurate leakage savings results
when the sleep-embedded circuit was shut off or put in the standby mode. However,
no comparison was made between the leakage loss during the active mode of standard
cells and the active mode leakage loss of the sleep-embedded cells. In this section, the
active mode leakage loss results of the standard cells, as well as the sleep-embedded
cells, are provided. This allows a fair comparison between a standard cell library and
the VSDCAD ultra low-power standard cell library.
Table 3.8: Active Mode Leakage Loss - Standard CMOS Circuit vs. Sleep-Embedded CMOS Circuit
CMOS           Standard Circuit   Sleep-Embedded Circuit   Leakage Increase
Circuit Name   Leakage (pW)       Leakage (pW)             Ratio (C3/C2)
NOR2           84.39              100.90                   1.20X
NAND2          81.20              120.63                   1.49X
OR2            152.34             214.87                   1.41X
AND2           146.36             196.04                   1.34X
XOR2           415.56             458.18                   1.10X
XNOR2          415.67             458.11                   1.10X
MUX2x1         362.79             472.85                   1.30X
FULL ADDER     1617.56            1973.00                  1.22X
DFFPOSX        389.71             618.19                   1.59X
DFFNEGX        389.77             618.20                   1.59X
Average                                                    1.33X
Table 3.8 gives the leakage performance comparisons for the standard cells and the
sleep-embedded cells in the active mode. Results from both combinational and se-
quential cells are displayed in Table 3.8. Circuit simulations were performed using
SPECTRE [4], and transistor sizes were fixed, similar to those found in the library pro-
vided by Oklahoma State University. Column 2 of Table 3.8 provides the active mode
leakage loss of standard cells. Column 3 of Table 3.8 lists the active mode leakage loss
of the sleep-embedded cells. The leakage increase of the sleep-embedded cells (from
Column 3 {C3}), when compared to that of standard cells (from Column 2 {C2}), is ex-
pressed as a ratio in Column 4 of Table 3.8. The average leakage increase in the active
mode across all the sleep-embedded cells is 33%. This small increase is due to the fact
that the standard-VT sleep transistors, added in parallel to the PUN and PDN (P0 and
N0 from Figure 3.1) leak during the active mode. The high-VT sleep transistor (P1 from
Figure 3.1), has negligible leakage when compared to the other two standard-VT sleep
transistors in the active mode.
Standard cells are always on, with no means of being switched off when they are not
needed. Hence, sleep-embedded VSDCAD cells are preferred for low-power designs,
in spite of a small increase in leakage power loss in the active mode. For the entire
length of the circuit operation, the overall leakage loss (a combination of many active
and standby modes) by using the sleep-embedded cells is significantly less than the
leakage loss occurring using standard cells that are always on.
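The trade-off between the higher active-mode leakage and the much lower standby leakage can be made concrete with a simple time-weighted average. The Python sketch below is an assumption-laden illustration rather than a result from this work: it ignores the energy spent switching between modes and any dynamic power, and the function names are hypothetical. It uses the NOR2 values from Table 3.8 (active mode) and Column 10 of Table 3.3 (standby mode).

```python
def average_leakage_pw(active_pw, standby_pw, standby_fraction):
    """Time-weighted leakage for a cell that idles for standby_fraction of the time."""
    return active_pw * (1.0 - standby_fraction) + standby_pw * standby_fraction


def break_even_standby_fraction(std_pw, se_active_pw, se_standby_pw):
    """Standby fraction at which the sleep-embedded cell matches the standard cell."""
    return (se_active_pw - std_pw) / (se_active_pw - se_standby_pw)


# NOR2 values: Table 3.8 (active mode) and Column 10 of Table 3.3 (standby mode).
STD_NOR2_PW = 84.39
SE_NOR2_ACTIVE_PW = 100.90
SE_NOR2_STANDBY_PW = 4.79

f_star = break_even_standby_fraction(STD_NOR2_PW, SE_NOR2_ACTIVE_PW, SE_NOR2_STANDBY_PW)
half_idle = average_leakage_pw(SE_NOR2_ACTIVE_PW, SE_NOR2_STANDBY_PW, 0.5)
print(f"break-even standby fraction: {f_star:.3f}")
print(f"average leakage at 50% standby: {half_idle:.2f} pW vs {STD_NOR2_PW} pW")
```

Under this simplified model, the sleep-embedded NOR2 cell leaks less overall than the always-on standard cell once it spends more than about 17% of its time in standby.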
3.8 Increase in Dynamic Power Dissipation
The main emphasis till now has been on the standby (sleep) mode leakage power loss
of the VSDCAD sleep-embedded cells. The dynamic power loss of these circuits has not
yet been explored. As explained in Section 1.1, dynamic power dissipation depends
mainly on transient switching activity and frequency of operation, as well as on the
square of the supply voltage. In this section, the effect of the additional VSDCAD sleep-
circuitry components on dynamic power dissipation of standard cells is studied. The
combinational and sequential VSDCAD standard library cells characterized previously
in this work were used, and their dynamic power measured.
Table 3.9: Dynamic Power Dissipation - Standard Circuits vs. VSDCAD Sleep-Embedded Circuits
CMOS           Standard Circuit     Sleep-Embedded Circuit   Dynamic Power Penalty
Circuit Name   Dynamic Power (µW)   Dynamic Power (µW)       (Increase) (C3/C2)
NOR2           2.89943              3.95455                  1.364X
NAND2          3.86127              5.33194                  1.381X
OR2            4.61272              5.87739                  1.274X
AND2           6.08750              7.87104                  1.293X
XOR2           15.90818             17.60026                 1.106X
XNOR2          19.61208             21.29784                 1.086X
MUX2x1         13.77033             15.71666                 1.141X
FULL ADDER     69.18433             80.48972                 1.163X
DFFPOSX        34.26382             39.90766                 1.165X
DFFNEGX        34.77193             40.20384                 1.156X
Average                                                      1.213X
Table 3.9 gives the dynamic power dissipation comparison between standard cells and
VSDCAD sleep-embedded cells. Transistor sizes were fixed, similar to those found in
the library provided by Oklahoma State University [7, 31], and circuit simulations
were performed using SPECTRE [4]. Column 2 of Table 3.9 gives the dynamic power
loss of standard cells. Column 3 of Table 3.9 lists the dynamic power dissipation of
the VSDCAD sleep-embedded cells. The dynamic power penalty (increase) of the sleep-
embedded cells (from Column 3 {C3}), when compared to that of standard cells (from
Column 2 {C2}), is expressed as a ratio in Column 4 of Table 3.9. The average dynamic
power loss increase across all the VSDCAD sleep-embedded cells is 21.3%. This power
increase is due to the additional transistors introduced and the consequent capacitive
increase in the sleep-embedded circuits.
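The dependence of dynamic power on switching activity, capacitance, supply voltage and frequency is commonly written as P = α·C·Vdd²·f. The Python sketch below uses that textbook expression with a hypothetical operating point (the activity, capacitance and frequency values are assumptions, not measurements from this work) to show how a 21.3% increase in switched capacitance would map onto the average penalty of Table 3.9. Attributing the entire increase to capacitance is a simplification.

```python
def dynamic_power_w(activity, capacitance_f, vdd_v, frequency_hz):
    """Textbook switching-power estimate: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * frequency_hz


# Hypothetical operating point, chosen only for illustration.
ACTIVITY = 0.2
CAPACITANCE_F = 10e-15
VDD_V = 1.8
FREQUENCY_HZ = 100e6

base = dynamic_power_w(ACTIVITY, CAPACITANCE_F, VDD_V, FREQUENCY_HZ)
# Scale the switched capacitance by the average penalty from Table 3.9.
with_sleep = dynamic_power_w(ACTIVITY, 1.213 * CAPACITANCE_F, VDD_V, FREQUENCY_HZ)
# Halving the supply voltage shows the quadratic dependence on Vdd.
half_vdd = dynamic_power_w(ACTIVITY, CAPACITANCE_F, VDD_V / 2, FREQUENCY_HZ)

print(f"capacitance penalty ratio: {with_sleep / base:.3f}")
print(f"half-Vdd ratio: {half_vdd / base:.2f}")
```

Because the expression is linear in capacitance and quadratic in supply voltage, the first ratio equals the capacitance scaling factor and the second is one quarter.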
The literature detailing various methods to reduce dynamic power has been analyzed
and can be summarized as follows:
• Clock and Signal Gating: This is the simplest and most straightforward
method to reduce transient switching activity of the highly active nodes in a cir-
cuit. Control-signal gating techniques, like those presented by Kapadia et al. in
[42], target reduction in switching power.
• Operand Isolation Techniques: The input-sharing problem is typically the
cause of unnecessary switching activity in modules where there should be none.
For example, consider a simple Arithmetic and Logic Unit (ALU) designed for 4
operations (add, subtract, multiply and shift), all sharing 2 input signals - “in1”
and “in2”. During a cycle to perform only the “subtract” operation, the adder,
multiplier and shifter units are simultaneously active along with the subtractor,
thereby wasting power. Operand isolation techniques, like using multiplexers or
using multiple registers to drive different modules, solve the input-sharing prob-
lem. However, this increases the area and the delay, and adds other overheads.
• Circuits Comprised of Independent Voltage Islands: Lackey et al. present
a comprehensive background on methods used to design voltage islands in [46].
They present various voltage island scenarios, a system architecture and chip
implementation methodology, which are used to reduce active and static power
consumption in SoC designs. The design implications of voltage islands are also
evaluated.
Hillman in [36] focuses on minimizing the operating voltage to reduce dynamic
power. The library of components created was characterized for different volt-
ages. Next, the whole SoC design was built with various components from this
library, using voltage level-shifting circuits and voltage isolation cells. The on-
chip dc-dc voltage level-shifting circuits already designed in [48] could be used in
experimentation with this methodology.
Carballo et al. propose a semi-custom voltage island approach in [24] to build
high-speed serial links. Their approach is a mixture of selective custom design
and the transparent use of multiple supplies to reduce power. The digital cir-
cuitry on the chip runs at a low supply voltage, while the analog circuitry runs
at a higher voltage level. An on-chip regulator converts low to high voltage, and
vice-versa. MTCMOS transistors are used in the custom design process.
Hung et al. [38] present a voltage island partitioning and floor-planning algo-
rithm for architecting SoC designs. Their work explores the thermal impact of
voltage islands. A hybrid optimization approach consisting of a genetic algorithm-
based (GA-based) voltage island partitioning algorithm and a simulated annealing-
based (SA-based) floor-planning algorithm, is presented.
• Transistor Re-ordering Techniques: Hossain et al. [37] use a probability-
based transistor re-ordering technique to reduce dynamic power dissipation in
CMOS circuits.
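The operand isolation item in the list above can be illustrated with a toy activity count for the four-operation ALU example. The Python sketch below is a hypothetical model, not taken from the cited literature: it uses "number of times a unit sees new operands" as a crude proxy for switching activity, and the function name and workload are assumptions.

```python
UNITS = ("add", "subtract", "multiply", "shift")


def input_changes_per_unit(schedule, isolate):
    """Count how often each unit sees new operands.

    schedule is a list of (operation, (in1, in2)) pairs, one per cycle.
    """
    last_seen = {unit: None for unit in UNITS}
    changes = {unit: 0 for unit in UNITS}
    for operation, operands in schedule:
        for unit in UNITS:
            if isolate and unit != operation:
                continue  # an isolated unit keeps its previous input values
            if last_seen[unit] != operands:
                changes[unit] += 1
                last_seen[unit] = operands
    return changes


# Hypothetical workload: four consecutive subtract cycles with new operands.
SCHEDULE = [("subtract", (1, 2)), ("subtract", (3, 4)),
            ("subtract", (5, 6)), ("subtract", (7, 8))]
shared = input_changes_per_unit(SCHEDULE, isolate=False)
isolated = input_changes_per_unit(SCHEDULE, isolate=True)
print("shared inputs:  ", shared)
print("isolated inputs:", isolated)
```

With shared inputs, all four units see every operand change; with isolation, only the subtractor does, which is the activity reduction that operand isolation buys at the cost of extra multiplexers or registers.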
In this chapter, the design, development and characterization of the VSDCAD Ultra
Low-Power RTL Standard Cell Library have been explained. The core of the low-power
library development - a novel technique that achieves cancellation of leakage effects
in both the Pull-Up Network (PUN), as well as the Pull-Down Network (PDN) of CMOS
cells, is presented. It involves voltage balancing in the PUN and PDN paths, using a
combination of high-VT and standard-VT sleep transistors. Experimental results show
significant leakage power savings (an average of 20.7X for a 180 nm process technology
at a temperature of 27°C) in both combinational and sequential CMOS library cells
employing this sleep-circuitry, when compared to standard CMOS cells. In the next
chapter, a thorough comparison of the VSDCAD technique with other well-established
leakage reduction techniques is made. Important design constraints, such as area
utilization, circuit delay and the leakage savings of a few leakage reduction techniques,
will be evaluated and compared.
Chapter 4
Comparison of VSDCAD with Other Leakage Reduction Techniques
In order to properly evaluate the VSDCAD technique presented in the previous chap-
ter, it has to be compared with other leakage reduction techniques for important de-
sign constraints like area utilization, circuit delay and leakage savings. In the fol-
lowing sections of this chapter, the VSDCAD technique is compared [53] against the
well-established LECTOR and Power-Gating techniques, using a variety of MCNC’91
benchmarks [72]. The Berkeley Predictive Technology Models (BPTM) [3] were used to
implement and simulate the circuits in this work. Since BPTM contains models only
for standard-VT PMOS and NMOS transistors, models for high-VT PMOS and NMOS
transistors were developed as part of this research for this comparative study. Results
show that a definitive trade-off exists among the various design constraints - area
utilization, propagation delay and leakage power savings, for all leakage reduction
techniques.
4.1 Experimental Setup
Experiments were conducted on a variety of combinational multi-level MCNC’91 benchmarks
[72]. Circuits were implemented using various deep-submicron process technologies.
The HSPICE¹ simulator, in conjunction with the BPTM [3] deep-submicron
technology, was used to simulate circuits and to estimate leakage power dissipation.
All circuits (unless specified otherwise) were simulated at a temperature of 25°C.
The Berkeley Predictive Technology Models (BPTM) contained process parameters and
values only for standard-VT PMOS and NMOS transistors. No models are available for
high-VT transistors. Experiments using some proprietary technology models obtained
directly from foundries showed an interesting trend in the threshold voltage value of
high-VT transistors. For a variety of deep-submicron technologies, we observed that
the threshold voltage (VT ) value of a high-VT PMOS or a high-VT NMOS transistor was
25%-35% more than that of a standard-VT transistor. Hence, models for high-VT PMOS
and NMOS transistors were incorporated into BPTM [3] with threshold voltage val-
ues 25% more than that of standard-VT transistors. DC simulations were run using
HSPICE [8] to ensure that the threshold values of these high-VT transistors were only
25% more than those of standard-VT transistors.
Tables 4.1 and 4.2 list the supply and threshold voltage values for various BPTM mod-
els for PMOS and NMOS transistors respectively. The first columns in Tables 4.1 and 4.2
list the technology feature size. The supply voltage used for each feature size is listed
in Column 2 of both Tables 4.1 and 4.2. The zero-bias threshold voltage (Vth0) of a
PMOS standard-VT transistor is tabulated in Column 3 of Table 4.1, and that of an
NMOS standard-VT transistor is tabulated in Column 3 of Table 4.2. Column 4 of
Table 4.1 gives the threshold voltage (Vth) of a standard PMOS transistor, while Column 4
of Table 4.2 gives the threshold voltage (Vth) of a standard NMOS transistor. The
threshold voltage of a high-VT PMOS transistor is listed in Column 5 of Table 4.1 and
the threshold voltage of a high-VT NMOS transistor is listed in Column 5 of Table 4.2.
¹ HSPICE: TM - Synopsys, Inc.
Table 4.1: PMOS Supply and Threshold Voltage Values for Various BPTM Models
BPTM           Supply          PMOS Standard Tran.           PMOS Standard Tran.   PMOS High-Vt Tran.
Feature Size   Voltage (Vdd)   Zero-Bias Threshold (Vth0)    Threshold (Vth)       Threshold (Vth)
180 nm         1.8V            -0.4200V                      -0.2822V              -0.3528V
130 nm         1.5V            -0.3499V                      -0.4108V              -0.5137V
100 nm         1.0V            -0.3030V                      -0.2891V              -0.3614V
70 nm          0.85V           -0.2200V                      -0.3338V              -0.4173V
Table 4.2: NMOS Supply and Threshold Voltage Values for Various BPTM Models
BPTM           Supply          NMOS Standard Tran.           NMOS Standard Tran.   NMOS High-Vt Tran.
Feature Size   Voltage (Vdd)   Zero-Bias Threshold (Vth0)    Threshold (Vth)       Threshold (Vth)
180 nm         1.8V            0.3999V                       0.4432V               0.5540V
130 nm         1.5V            0.3320V                       0.3110V               0.3887V
100 nm         1.0V            0.2607V                       0.2773V               0.3466V
70 nm          0.85V           0.2000V                       0.3133V               0.3916V
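The 25% relationship between the standard and high-VT threshold voltages can be checked directly against the values in Tables 4.1 and 4.2. The short Python sketch below is an added cross-check, not part of the original experiments; the dictionary layout and function name are illustrative assumptions.

```python
# (standard Vth, high-VT Vth) in volts, transcribed from Tables 4.1 and 4.2.
PMOS_VTH = {"180 nm": (-0.2822, -0.3528), "130 nm": (-0.4108, -0.5137),
            "100 nm": (-0.2891, -0.3614), "70 nm": (-0.3338, -0.4173)}
NMOS_VTH = {"180 nm": (0.4432, 0.5540), "130 nm": (0.3110, 0.3887),
            "100 nm": (0.2773, 0.3466), "70 nm": (0.3133, 0.3916)}


def high_vt_ratios(table):
    """Ratio of high-VT threshold to standard threshold for each feature size."""
    return {node: high / std for node, (std, high) in table.items()}


PMOS_RATIOS = high_vt_ratios(PMOS_VTH)
NMOS_RATIOS = high_vt_ratios(NMOS_VTH)
for node in PMOS_VTH:
    print(f"{node}: PMOS {PMOS_RATIOS[node]:.4f}, NMOS {NMOS_RATIOS[node]:.4f}")
```

Every ratio comes out within rounding of 1.25, consistent with the stated choice of high-VT thresholds 25% above the standard-VT values.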
4.2 MCNC’91 VSDCAD Implementation Leakage Values
Forty-six experimental MCNC’91 benchmark circuits were implemented with individ-
ual VSDCAD CMOS gates. They were sized appropriately for 4 different deep-submicron
technologies - 180 nm, 130 nm, 100 nm and 70 nm. The supply voltages for the respec-
tive technologies are given in Column 2 of Table 4.1. Simulations were carried out,
using HSPICE [8] in the standby mode of operation, and their leakage loss measured.
Table 4.3: VSDCAD Leakage Values for MCNC’91 Benchmarks for Various Deep-Submicron Technologies
MCNC’91   VSDCAD Leakage Value for BPTM (nW)    MCNC’91   VSDCAD Leakage Value for BPTM (nW)
Circuit   180 nm   130 nm   100 nm   70 nm      Circuit   180 nm   130 nm   100 nm   70 nm
I1        3.117    1.236    0.336    0.169      apex7     11.93    4.728    1.287    0.645
I2        7.387    2.928    0.797    0.400      b9        8.471    3.358    0.914    0.458
I3        6.099    2.418    0.658    0.330      c8        11.11    4.406    1.199    0.601
I4        8.132    3.224    0.877    0.440      cht       15.52    6.152    1.674    0.839
I5        19.31    7.656    2.083    1.045      comp      10.23    4.056    1.104    0.553
I6        23.04    9.133    2.485    1.246      cordic    6.912    2.740    0.746    0.374
I7        31.92    12.65    3.443    1.726      count     9.691    3.841    1.045    0.524
I8        124.1    49.19    13.38    6.712      dalu      115.0    45.59    12.41    6.220
I9        35.37    14.02    3.816    1.913      frg1      7.116    2.821    0.768    0.385
I10       153.2    60.71    16.52    8.284      frg2      68.04    26.97    7.339    3.680
C432      10.84    4.298    1.170    0.586      k2        81.39    32.26    8.779    4.402
C499      13.69    5.426    1.477    0.740      pair      97.18    38.52    10.48    5.256
C880      25.96    10.29    2.800    1.404      parity    4.608    1.827    0.497    0.249
C1355     37.00    14.67    3.991    2.001      rot       46.83    18.56    5.051    2.533
C1908     59.64    23.64    6.433    3.226      sct       6.167    2.445    0.665    0.334
C2670     80.85    32.05    8.721    4.373      t481      140.4    55.66    15.15    7.595
C3540     113.1    44.83    12.20    6.118      term1     24.26    9.617    2.617    1.312
C5315     156.3    61.97    16.86    8.456      ttt2      13.55    5.373    1.462    0.733
C6288     163.7    64.90    17.66    8.856      vda       39.64    15.71    4.276    2.144
C7552     238.0    94.34    25.67    12.873     x1        19.31    7.656    2.083    1.045
alu2      22.70    8.999    2.449    1.228      x2        2.846    1.128    0.307    0.154
alu4      46.15    18.29    4.978    2.496      x3        48.45    19.21    5.227    2.621
apex6     30.63    12.14    3.304    1.657      x4        25.00    9.912    2.697    1.353
Since exhaustive testing for many of the benchmarks was impossible, a representative
sample of 1500 randomly generated input vector combinations was applied to each of
the circuits, and leakage loss was measured in every case. The average of these 1500
values is listed as the leakage dissipation value for a circuit in Table 4.3.
Columns 1 and 6 of Table 4.3 list the MCNC’91 benchmark names. Columns 2 and 7
of Table 4.3 give the leakage values of the various benchmarks implemented using the
180 nm BPTM. Similarly, Columns 3 and 8 give leakage values of the benchmarks for
the 130 nm BPTM; Columns 4 and 9 give leakage values of the benchmarks for the 100
nm BPTM; and Columns 5 and 10 give leakage values of the benchmarks for the 70 nm
BPTM. The circuit C7552, containing approximately 3500 gates, is the largest design
among all the benchmarks chosen, while circuit x2, containing approximately 50 gates,
is the smallest. From Table 4.3, we observe that leakage falls by more than an order of
magnitude (about 18X in the tabulated values) for all benchmarks as the technology
shrinks from 180 nm down to 70 nm.
4.3 Area and Delay Comparison
In addition to leakage power reduction, the VSDCAD leakage reduction technique needs
to be evaluated for essential performance parameters like area and delay. Towards this
end, we compare it with the well-established LECTOR technique [33]. In order to com-
pare propagation delays using both techniques, a two-input NAND gate was used as
an example, and implementation was done using the BPTM’s 100 nm and 70 nm tech-
nologies. Table 4.4 lists the delay and leakage power comparison for the LECTOR and
VSDCAD techniques. For a fair comparison, the supply voltage was set to 1V for both
the 100 nm and 70 nm process technologies (as was done in [33]). The LECTOR NAND
gate used was the 1-LCT case reported in [33], where the widths of the LCT1 and LCT2
transistors were the same as those of the PMOS and NMOS transistors of the PUN and
PDN (seen in Figure 1.2). Hence in the VSDCAD NAND implementation, the width of
the high-VT transistor P1 (seen in Figure 3.1 of Chapter 3) was set to be the same size
as that of LECTOR LCT1 transistor (seen in Figure 1.2 of Chapter 1). All other PUN
and PDN transistors were sized the same as those in the LECTOR case. The extra sleep
transistors, P0 and N0 (seen in Figure 3.1 of Chapter 3) were unit-sized.
The values reported in Rows 2, 3, 5 and 6 of Table 4.4 are those given in [33]. Row 2
and Row 5 list the leakage and delay values for a conventional NAND gate, using 100
Table 4.4: Leakage Power and Delay Comparison for Two-Input Nand Gate
100 nm Process Technology (BPTM), Supply Voltage = 1V
NAND Gate Leakage Power Dissipation in Watts for Input Vector Delay Delay Average Leakage
Type (0, 0) (0, 1) (1, 0) (1, 1) (ps) Penalty Savings Ratio
Conventional 1.228e-10 9.117e-10 5.356e-10 2.241e-09 13.53 - -
LECTOR 1.180e-10 5.542e-10 4.477e-10 1.539e-09 18.79 38.88% 1.433X
VSDCAD 8.114e-12 8.114e-12 8.114e-12 8.114e-12 16.73 23.65% 117.1X
70 nm Process Technology (BPTM), Supply Voltage = 1V
Conventional 6.450e-10 5.600e-09 3.817e-09 1.091e-08 15.16 - -
LECTOR 6.065e-10 3.808e-09 3.622e-09 5.564e-09 21.40 41.16% 1.5421X
VSDCAD 5.142e-12 5.142e-12 5.142e-12 5.142e-12 18.29 20.65% 1019.6X
nm BPTM and 70 nm BPTM, respectively. Row 3 and Row 6 list the leakage and delay
values for the LECTOR NAND gate, using 100 nm BPTM and 70 nm BPTM, respectively.
Row 4 and Row 7 of Table 4.4 give the leakage (in standby mode) and delay values for
the VSDCAD NAND gate, using 100 nm BPTM and 70 nm BPTM, respectively. All high-
lighted values in Table 4.4 represent the best values achieved. Analysis of the results
shows that the VSDCAD technique has the least leakage power dissipation, while the
conventional NAND gate has the least propagation delay value. Column 7 of Table 4.4
gives the delay penalties of both the LECTOR and VSDCAD techniques, when compared
to the conventional case. We see that the VSDCAD technique has a smaller delay penalty
than LECTOR, for both the 100 nm and 70 nm cases. Another interesting fact observed
from Column 7 is that, with shrinking technologies (70 nm compared with 100 nm), the
LECTOR delay penalty increases, while the delay penalty of VSDCAD decreases. Col-
umn 8 is the average leakage savings ratio of the LECTOR and VSDCAD techniques,
when compared with the conventional NAND gate. It was observed that the VSDCAD
technique has two orders of magnitude savings for the 100 nm case, and three orders
of magnitude savings for the 70 nm case, when compared to the LECTOR technique.
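The delay penalty and average leakage savings ratio in Table 4.4 follow from simple arithmetic on the tabulated values. The Python sketch below recomputes them for the 100 nm rows, assuming that the savings ratio is taken between leakage values averaged over the four input vectors (an assumption that reproduces the LECTOR entry). The function and variable names are illustrative and are not part of the original work.

```python
# Values transcribed from the 100 nm rows of Table 4.4 (BPTM, Vdd = 1 V).
# Leakage is in watts for input vectors (0,0), (0,1), (1,0), (1,1); delay is in ps.
TABLE_4_4_100NM = {
    "Conventional": {"leakage": [1.228e-10, 9.117e-10, 5.356e-10, 2.241e-09], "delay": 13.53},
    "LECTOR": {"leakage": [1.180e-10, 5.542e-10, 4.477e-10, 1.539e-09], "delay": 18.79},
    "VSDCAD": {"leakage": [8.114e-12] * 4, "delay": 16.73},
}


def average_leakage(entry):
    """Mean leakage over the four input vectors (assumed averaging rule)."""
    return sum(entry["leakage"]) / len(entry["leakage"])


def delay_penalty_percent(table, technique):
    """Extra delay relative to the conventional gate, in percent."""
    base = table["Conventional"]["delay"]
    return 100.0 * (table[technique]["delay"] - base) / base


def leakage_savings_ratio(table, technique):
    """Conventional average leakage divided by the technique's average leakage."""
    return average_leakage(table["Conventional"]) / average_leakage(table[technique])


for name in ("LECTOR", "VSDCAD"):
    penalty = delay_penalty_percent(TABLE_4_4_100NM, name)
    ratio = leakage_savings_ratio(TABLE_4_4_100NM, name)
    print(f"{name}: delay penalty {penalty:.2f}%, savings ratio {ratio:.3f}X")
```

Under this averaging assumption the recomputed delay penalties match Column 7, and the savings ratios land close to the Column 8 entries.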
Next, in order to compare the area utilization using both VSDCAD and LECTOR tech-
Table 4.5: Experimental Results for MCNC’91 Benchmarks (70 nm BPTM Pro-
cess, Supply Voltage = 1V)
MCNC’91 Leakage Power Dissipation Normalized Area Ratio Leakage Savings
Circuit U-MCNC LECTOR VSDCAD with respect to U-MCNC Ratio
Name µW (10−6W) µW (10−6W) nW (10−9W) LECTOR VSDCAD (C2/C4) (C3/C4)
I1 1.159 0.156 0.362 1.21 1.13 3202X 431X
I2 2.305 0.735 0.858 1.14 1.06 2686X 857X
I3 1.383 0.419 0.708 1.18 1.10 1953X 592X
I4 2.356 0.632 0.944 1.12 1.05 2496X 670X
I5 4.625 0.475 2.243 1.19 1.11 2062X 212X
I6 6.906 1.912 2.676 1.13 1.05 2581X 714X
I7 8.933 3.126 3.706 1.12 1.05 2410X 844X
I8 30.05 5.038 14.41 1.08 1.01 2085X 350X
I9 21.90 2.897 4.108 1.12 1.05 5331X 705X
I10 40.47 5.842 17.78 1.15 1.07 2276X 329X
C432 1.395 0.672 1.259 1.17 1.09 1108X 534X
C499 3.469 1.444 1.590 1.15 1.07 2182X 908X
C880 6.141 1.154 3.014 1.18 1.10 2038X 372X
C1355 8.089 1.672 4.297 1.11 1.04 1883X 389X
C1908 19.61 1.926 6.925 1.13 1.05 2832X 278X
C2670 52.17 2.845 9.388 1.19 1.11 5557X 303X
C3540 64.79 3.852 13.13 1.14 1.06 4935X 293X
C5315 82.58 4.826 18.15 1.12 1.05 4550X 266X
C6288 163.7 9.725 19.01 1.10 1.03 8611X 512X
C7552 323.2 10.24 27.64 1.08 1.01 11693X 370X
Average 1.1405X 1.0645X 3624X 496X
niques, 20 MCNC’91 benchmark circuits were used, and implementation was done using
the BPTM’s 70 nm technology. These circuits include the largest circuit (C7552), as
well as one of the smallest circuits (I1) in the benchmark suite. Table 4.5 lists the
normalized area and leakage power comparison for the LECTOR and VSDCAD techniques.
For a fair comparison, the supply voltage was set to 1V (as done in [33]). That is the
reason that the VSDCAD leakage values for the 20 benchmarks given in Column 4 of
Table 4.5 do not match the leakage values of the same 20 benchmarks from Column 5
of Table 4.3 (whose supply voltage was 0.85V).
The values reported in Columns 2, 3 and 5 of Table 4.5 are those given in [33].
Column 2 lists the leakage power of the unmodified MCNC’91 (U-MCNC) circuits, while
Column 3 lists the leakage power of the LECTOR MCNC circuits. Column 4 of Table 4.5
gives the leakage (in standby mode) of the VSDCAD MCNC circuits. An important point
to note is that the leakage of U-MCNC and LECTOR circuits is in the order of micro-
Watts (µW), while the leakage of VSDCAD circuits is in the order of nano-Watts (nW).
Columns 5 and 6 give the normalized area ratio (penalty) of the LECTOR and VSDCAD
techniques, compared to U-MCNC. On average, a 6.45% area increase was observed in
the case of VSDCAD, as opposed to 14% area increase in the case of LECTOR. Column 7
of Table 4.5 lists the leakage savings when comparing the U-MCNC leakage value (from
Column 2 {C2}) to the VSDCAD leakage value (from Column 4 {C4}). Three orders of
magnitude leakage savings are seen (average of Column 7). Similarly, Column 8 of
Table 4.5 lists the leakage savings seen when comparing the LECTOR circuit leakage
value (from Column 3 {C3}) to the VSDCAD circuit leakage value (from Column 4 {C4}).
Two orders of magnitude leakage savings are observed (average of Column 8). From
these results, it can be clearly ascertained that the use of VSDCAD cells offers enor-
mous leakage savings, at the cost of a nominal increase in area.
Only an order of magnitude leakage savings was observed in Tables 3.4 and 3.6 of
Chapter 3, while two to three orders of magnitude savings were observed in Table 4.5.
This is due to the fact that BPTM models were used here instead of TSMC models
and also 70 nm was the technology used in this implementation as compared to the
180 nm implementation in Chapter 3. Much larger leakage losses are expected as
the technology shrinks. Consequently, for deep-submicron technologies huge leakage
savings (2 to 3 orders of magnitude) are seen using the VSDCAD technique over the
standard circuit implementation.
4.4 Leakage Savings Comparison
Since power-gating is a commonly used method for reducing leakage power, we com-
pared it with the VSDCAD technique for the 70 nm BPTM in the standby mode of oper-
ation. Five representative MCNC’91 benchmark circuits were used in this comparison.
In our power-gated CMOS circuit implementation, a high-VT PMOS transistor connects
between the power supply (Vdd) and the PUN, acting as a switch. For a fair comparison,
the high-VT PMOS transistors were sized the same for both the power-gated circuit and
the VSDCAD circuit implementations. Figure 4.1 shows the leakage power dissipation
comparison between both the techniques for the 5 circuits. From the experimental
results, we noted a 5.7X improvement (on average) in leakage savings using the VSDCAD
technique, when compared to the use of the traditional power-gating technique.
[Bar chart: leakage power dissipation (nW, axis 0 to 50) for the MCNC’91 benchmark circuits I7, C1908, C5315, dalu and x3; series: Power-Gated Circuit and VSDCAD Circuit; BPTM 70 nm, Vdd = 0.85V]
Figure 4.1: Leakage Power Dissipation Comparison
Comparison of the VSDCAD technique with the well-established LECTOR technique
shows significant leakage savings (orders of magnitude) of the former over the latter.
The area and delay penalty of VSDCAD circuits were also less than those seen in the
case of LECTOR. Designers and synthesis tools could add the VSDCAD cells in non-
critical paths, thereby avoiding effects on the overall circuit delay, while significantly
saving on leakage power loss.
In the following chapter, additional experimental research results are reported. An
example illustrating the leakage savings obtained after replacing only non-critical cells
in the circuit is presented. The effects of higher temperatures on leakage power are
also studied.
Chapter 5
Additional Experimental Research
The VSDCAD ultra low-power RTL standard cell library can be seamlessly integrated
into a typical low-power synthesis framework. In Chapter 4, we saw that there is a
delay penalty associated with the use of VSDCAD cells. Hence, designers and synthesis
tools could add the VSDCAD cells in non-critical paths, and avoid effects on the
overall circuit delay, while significantly saving on leakage power loss. The subthreshold
leakage current is exponentially dependent on temperature, and its effects on leakage
power need to be analyzed and studied in depth. In this chapter, an example
illustrating the leakage savings obtained after replacing only non-critical cells in the circuit is
presented, followed by a study of higher temperature effects on leakage power.
5.1 Leakage Savings after Replacement of Only Non-Critical Cells
Figure 2.1 of Chapter 2 shows that a critical path analysis is performed on the RTL
circuit, using Cadence [4] or Synopsys [8] tools, as part of the low-power synthesis
framework. As an example, an 8-bit ripple-carry adder was designed, using TSMC’s
Table 5.1: Perf. Chars. of an 8-bit Standard Ripple Carry Adder
Output Signal Propagation Delay (ns) Slack Time (ns)
COUT 1.8917 0
S7 1.4470 0.4447
Leakage Power Loss = 12.483 nW
180 nm technology [9]. The components used to build the 8-bit adder (XOR2, AND2 and
OR2) were standard cells whose transistor sizes were fixed, similar to those found in
the library provided by Oklahoma State University [7, 31].
A critical path analysis was performed on the 8-bit ripple-carry adder using Synopsys
tools, and the timings and slack for all the standard cells in the various paths were
obtained from the logfiles. It was observed that the path to the carryout output signal (COUT) was the
most critical, as it had no slack time. This adder was implemented only for demonstra-
tion purposes and may not be the optimal design implementation. Our adder design
was used in this analysis in order to show the effectiveness of our approach. Row 1,
Column 2 of Table 5.1 shows the propagation delay of the COUT signal. All the other
paths for the sum output signals (S0, . . . S7) had more slack time, when compared to
the critical COUT path. The slack time of the second most critical path, S7, is listed in
Row 2, Column 3 of Table 5.1. The slack times of all the other paths (S0, . . . S6) were
greater than 0.4447 ns. The leakage power loss was then measured for this 8-bit
ripple-carry adder composed of standard cells. The result was 12.483 nW, as can be seen from
Row 3 of Table 5.1.
Since enough available slack time was present for the sum paths (S0, . . . S7), all the
non-critical standard cells (XOR2 gates) in the various sum paths were replaced with
equivalent low-power VSDCAD library XOR2 gates. The standard cells that make up
Table 5.2: Perf. Chars. of an 8-bit Sleep-Embedded Ripple Carry Adder
Operating Mode     Output Signal    Propagation Delay (ns)    Slack Time (ns)
Active             COUT             1.8912                    0
Active             S7               1.5033                    0.3879
Active             Leakage Power Loss = 12.990 nW
Standby (Sleep)    Leakage Power Loss = 6.213 nW
the critical COUT path were retained. Then, Synopsys tools were used to perform criti-
cal path analysis on this new 8-bit ripple-carry adder, made up of a mixture of standard
cells and VSDCAD sleep-embedded cells.
Column 4 of Table 5.2 shows the slack times obtained from the Synopsys logfiles. An
analysis of the new slack times shows that the slack times of the sum paths (S0, . . . S7)
have decreased because of the use of sleep-embedded cells, which have higher prop-
agation delay. However, the critical path timing (COUT ) of this new sleep-embedded
circuit is unchanged from that of the standard circuit. Next, the leakage power loss
in both the active and standby modes of operation was measured for this 8-bit ripple-
carry adder, which used a combination of standard cells and VSDCAD low-power cells.
Row 3 of Table 5.2 lists the leakage power loss in the active mode of operation as 12.990
nW. This is an increase of about 4% over the value listed in Row 3 of Table 5.1. How-
ever, during the standby mode of operation, the leakage loss, as seen from Row 4 of
Table 5.2, is 6.213 nW. Hence, during the sleep mode, leakage savings of more than 2X
were observed over the active mode operation of the circuit made up of standard cells.
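The slack and leakage figures quoted from Tables 5.1 and 5.2 can be cross-checked with a few lines of arithmetic. The Python sketch below assumes that slack is measured against the delay of the critical path, which is consistent with the tabulated values; the helper name `slack_times` is hypothetical.

```python
def slack_times(path_delays_ns):
    """Slack of each path relative to the slowest (critical) path, in ns."""
    critical = max(path_delays_ns.values())
    return {signal: round(critical - delay, 4) for signal, delay in path_delays_ns.items()}


# Propagation delays transcribed from Tables 5.1 and 5.2.
standard_adder = {"COUT": 1.8917, "S7": 1.4470}
mixed_adder = {"COUT": 1.8912, "S7": 1.5033}

print(slack_times(standard_adder))
print(slack_times(mixed_adder))

# Leakage values (nW) transcribed from Tables 5.1 and 5.2.
standard_leakage = 12.483
mixed_active_leakage = 12.990
mixed_standby_leakage = 6.213

active_overhead_percent = 100.0 * (mixed_active_leakage - standard_leakage) / standard_leakage
standby_savings_ratio = standard_leakage / mixed_standby_leakage
print(f"active-mode overhead: {active_overhead_percent:.1f}%")
print(f"standby savings ratio: {standby_savings_ratio:.2f}X")
```

The S7 slack shrinks when the XOR2 cells are replaced, while the critical COUT path keeps zero slack, which is the behaviour described in the text.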
The back-tracking algorithm for mixed-VT static CMOS circuits presented by Wei et al.
[70] could be used to evaluate the usage of high-VT or low-VT transistors in a circuit.
5.2 Effects of Varying Temperature on Leakage Power
Prior research [20, 34, 60, 64] shows a much larger leakage power loss occurring at
higher temperatures. This is due to the fact that subthreshold leakage current is ex-
ponentially dependent on temperature, as seen from Equation 1.1 in Chapter 1.
All the experimental results shown so far have been from simulations carried out on
various circuits at room temperature. Experiments to study the higher temperature
effects on leakage current were conducted on three CMOS circuits - NOR2x1, NAND2x1
and XOR2x1 gates. They were implemented using the BPTM’s 180 nm technology, and
the sizes of the transistors were fixed, similar to those found in the library provided by
Oklahoma State University [7, 31]. Circuit simulations were performed using HSPICE
[8] at six different temperature values: 25°C, 45°C, 65°C, 85°C, 105°C and 125°C.
[Plot: leakage power loss (nW, axis 0 to 1200) versus temperature (Deg C) for conventional CMOS gates in standard mode (BPTM 180 nm, Vdd = 1.8V); curves: Xor2x1, Nand2x1 and Nor2x1]
Figure 5.1: Temperature Effects On Leakage Power - Standard Mode
Conventional implementations of all three circuits were first simulated using the stan-
dard mode of operation. Figure 5.1 shows a graph of the leakage power loss for the
NOR2x1, NAND2x1 and XOR2x1 gates at various temperature values. The XOR2x1 gate,
being a larger circuit than the other two, exhibits the largest leakage power loss at all
temperatures.
[Plot: leakage power loss (nW, axis 0 to 35) versus temperature (Deg C) for VSDCAD CMOS gates in standby mode (BPTM 180 nm, Vdd = 1.8V); curves: Xor2x1 and Nand2x1/Nor2x1]
Figure 5.2: Temperature Effects On Leakage Power - Standby (Sleep) Mode
Next, the VSDCAD sleep-circuitry was introduced, and these circuits were again simu-
lated in the standby (sleep) mode of operation. Figure 5.2 shows a graph of the leakage
power loss at various temperatures for all three circuits in the standby mode of oper-
ation. As can be seen, VSDCAD NAND2x1 and NOR2x1 gates exhibit the same leakage
loss at all temperatures.
[Plot: leakage savings ratio (axis 0 to 40X) versus temperature (25°C to 125°C) for the NAND2x1, XOR2x1 and NOR2x1 gates; BPTM 180 nm, Vdd = 1.8V]
Figure 5.3: Leakage Power Savings at Higher Temperatures
Figure 5.3 shows the leakage savings ratio at higher temperatures between the
VSDCAD technique and the conventional implementation. It can be observed from
Figure 5.3 that the leakage savings ratios vary from 8X (for the XOR2x1 gate at 45°C) to
35X (for the XOR2x1 gate at 125°C). Even at higher temperatures, significant leakage
power savings are seen using the VSDCAD technique, when compared to the standard
mode of operation, thereby supporting the effectiveness of our approach.
In this chapter, results from additional experiments show large leakage savings using
the VSDCAD technique, even after replacing only non-critical cells in a circuit. Find-
ings also show significant leakage savings in VSDCAD circuits at much higher temper-
atures.
A controller is needed to integrate the VSDCAD ultra low-power RTL standard cell li-
brary into the low-power synthesis framework. The area, delay and power overheads
due to this controller also need to be studied. RTL simulations need to be performed
to verify whether or not the controller and datapath (design) work in synchronization
with each other. In the next chapter, a signal probability based self-controller tech-
nique is presented. Experiments illustrating the leakage savings and circuit delay
overheads of controller integrated circuits, when compared to other leakage reduction
techniques, are also reported.
Chapter 6
Signal Probability Based Self-Controller
The controller is the most important contribution of this dissertation. It is the vital link
that will integrate with the VSDCAD ultra low-power RTL standard cell library, to mini-
mize leakage power in any low-power synthesis framework. The design and implemen-
tation of this leakage reduction controller is explained in this chapter. The VSDCAD
generic low leakage CMOS cell implementation, shown in Section 3.1 of Chapter 3, has
a disadvantage in that the circuit has two extra control signals, “sleep” and its comple-
ment “sleepbar”, feeding in from some external source or controller. To overcome this,
the topology in Figure 3.1 of Chapter 3 was modified for purposes of self-controllability
and reducing routing congestion. Such modified gates are referred to as VCLEARIT
(Vlsi Cmos LEAkage ReductIon Technique) self-controlled gates [52] for the remainder
of this work. Smaller leakage savings were observed using VCLEARIT gates, when
compared to values seen in Chapters 3 and 4. This is because a perfect voltage balance
was not achieved, unlike in the previous work. However, the trade-off was that
the VCLEARIT gate had one fewer control signal than the VSDCAD gate, and hence was
much easier to route as well as control.
6.1 VCLEARIT Control Circuitry Embedded CMOS Gates
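[Schematic: inputs in1 ... inn feed the PUN and PDN between Vdd and Gnd; control transistors P0, P1 and N0 (high-VT, marked H vt) are driven by ctrl; common points X1 and X2; output out]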
Figure 6.1: VCLEARIT CMOS Gate (AND, NAND) with Control Value 0
Figure 6.1 is the topology of a VCLEARIT CMOS gate with a control value ‘0’ (e.g., AND,
NAND) illustrating the control transistor circuitry embedded in it. There are ‘n’ inputs,
in1, . . . inn, feeding the Pull-Up Network (PUN) and the Pull-Down Network (PDN). The
transistors in both the PUN and PDN are standard-VT devices. The control circuitry
consists of three transistors - two PMOS devices {P0 and P1} and one NMOS device
{N0}. Transistors P0 and P1 are standard-VT devices, while N0 is a high-VT device.
P0 is connected in parallel with the PUN, one end connecting to the source (Vdd) and
the other end to a common point X1. P1 is connected in parallel with the PDN, one end
connecting to the Gnd and the other end to a common point X2. The high-VT transistor,
N0, connects between the two common points X1 and X2 and behaves like a transmis-
sion gate. The output of the CMOS circuit, “out”, is drawn from the common point X1.
An input signal, “ctrl” feeds the 3 transistors P0, P1 and N0. The output of any gate is
known when one of its inputs has the control value, and hence, this gate can be placed
in the standby mode of operation. The idea here is to connect whichever input is the
controlling value, to the “ctrl” signal, thereby placing the gate in the standby mode of
operation.
The operation of the VCLEARIT CMOS gate is as follows. In the normal operating mode,
“ctrl” is on. This causes transistors {P0, P1} to turn off and transistor N0 to turn on.
The circuit now behaves exactly as a normal CMOS complementary circuit should. The
standby operating mode is a little more involved. In this mode, one of the ‘n’ inputs
has the controlling value ‘0’ of the {AND, NAND} gate output, so that the gate can be
switched off. Signal “ctrl” is off, so that transistors {P0, P1} turn on, while transistor
N0 turns off. Since P0 is on, common point X1 is also at voltage Vdd. The PUN is now
between two points of almost equal voltage potential (Vdd), and hence, no leakage cur-
rent should flow through it. Similarly, P1 is on and common point X2 is grounded. The
PDN is now between two points of almost equal voltage potential (Gnd), and hence,
no leakage current should flow through it. The leakage loss occurring during the sleep
mode will only be through the high-VT transistor N0, which is turned off, but connected
between points X1 and X2, which are at different voltage potentials.
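The two operating modes described above can be summarised with a logic-level model. The Python sketch below covers only the NAND case of Figure 6.1 and tracks switch states, not leakage currents; it assumes that "ctrl" is tied to one of the gate inputs, and the function name `vclearit_nand` and its return format are hypothetical.

```python
from itertools import product


def vclearit_nand(inputs, ctrl_index=0):
    """Logic-level sketch of a control-value-0 VCLEARIT NAND gate.

    ctrl is tied to inputs[ctrl_index]. The PMOS devices P0 and P1 conduct when
    ctrl is 0, and the high-VT NMOS device N0 conducts when ctrl is 1.
    """
    ctrl = inputs[ctrl_index]
    p0_on = p1_on = (ctrl == 0)
    n0_on = (ctrl == 1)
    if n0_on:
        # Active mode: the PUN and PDN evaluate the NAND function as usual.
        mode = "active"
        out = 0 if all(inputs) else 1
    else:
        # Standby mode: P0 ties X1 (and hence out) to Vdd, P1 ties X2 to Gnd.
        mode = "standby"
        out = 1
    return {"mode": mode, "P0": p0_on, "P1": p1_on, "N0": n0_on, "out": out}


for vector in product((0, 1), repeat=2):
    state = vclearit_nand(vector)
    print(vector, state["mode"], state["out"])
```

In this model the output in standby equals the value the NAND function would produce anyway, since the input tied to "ctrl" holds the controlling value ‘0’.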
Figure 6.2 is the topology of a VCLEARIT CMOS gate with a control value ‘1’ (e.g., OR,
NOR).
6.2 Gate Control Signal Calculation in Circuits
The signal probability of a line is defined as the probability of it being set to either a ‘0’
or a ‘1’ value by some driving value. The signal probabilities of intermediate points in a
boolean CMOS circuit can be calculated from the signal probabilities of its primary in-
puts, by using well known signal probability propagation techniques [30]. These signal
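[Schematic: inputs in1 ... inn feed the PUN and PDN between Vdd and Gnd; control transistors P0, N0 and N1 (including a high-VT device, marked H vt) are driven by ctrl; common points X1 and X2; output out]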
Figure 6.2: VCLEARIT CMOS Gate (OR, NOR) with Control Value 1
controllability and propagation properties are exploited in controlling VCLEARIT gates.
Consider the circuit schematic shown in Figure 6.3. It contains four VCLEARIT gates
{G1, . . . G4}, with three inputs {A, B and C} and one output {OUT}. The gate output
probabilities are calculated as shown in [16]. The control value for a NAND gate is ‘0’
and its output value is ‘1’. Also, the control value for an AND gate is ‘0’ and its output
value is ‘0’. Equation 6.1 gives the gate output probability value for both these cases.
Similarly, the control value for a NOR gate is ‘1’ and its output value is ‘0’. Also, the
control value for an OR gate is ‘1’ and its output value is ‘1’. Equation 6.2 gives these
gate output probability values.
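\[ \mathrm{GateOutputProb}[\mathrm{NAND1}, \mathrm{AND0}] = 1 - \prod_{i=0}^{n} P_i \tag{6.1} \]
\[ \mathrm{GateOutputProb}[\mathrm{NOR0}, \mathrm{OR1}] = \sum_{i=0}^{n} P_i - \prod_{i=0}^{n} P_i \tag{6.2} \]
where $P_i$ is the signal probability of the $i$th input for the corresponding gate.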
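[Schematic: circuit with inputs A, B and C (signal probability 1/2 each), VCLEARIT gates G1 to G4 with their ctrl connections, and output OUT; annotated signal probabilities 1/4, 1/4, 3/4 and 3/16]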
Figure 6.3: Example illustrating Gate Control Signal Calculation
All the inputs {A, B and C} to the circuit shown in Figure 6.3 are assumed to have an
equal probability (1/2) of having either a ‘0’ or a ‘1’ value. Using Equations 6.1 and 6.2,
the output probabilities of all four gates, G1, . . . G4, are calculated. The input
probability for every gate input is calculated such that this input value is the
controlling value for that gate. For example, NOR gate G2 has 2 inputs, ‘B’ and ‘C’, both with
probabilities (1/2) that their value is ‘1’, the controlling value of G2. Hence the output
probability of G2 is (3/4) and the output value is ‘0’. However, G2 drives one input of gate
G3, as well as one input of gate G4. The control value of G3 is ‘0’ and so the output
value of gate G2 can drive this input. Hence the probability of this input of G3 is the
same as that of the output of gate G2, which is (3/4). But, the control value of G4 is ‘1’,
and hence, the opposite of the output value of gate G2 needs to drive this input of G4.
So the probability of this input of G4 is (1 - 3/4 = 1/4).
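The G2 calculation above can be reproduced with exact fractions. The Python sketch below implements Equations 6.1 and 6.2 as they are written in the text; the function names are illustrative. Note that the sum-minus-product form of Equation 6.2 equals the probability of "at least one input at the controlling value" only for two independent inputs, which is the case used here.

```python
from fractions import Fraction
from math import prod


def gate_output_prob_nand_and(input_probs):
    """Equation 6.1 as written: 1 minus the product of the input probabilities."""
    return 1 - prod(input_probs)


def gate_output_prob_nor_or(input_probs):
    """Equation 6.2 as written: sum minus product of the input probabilities."""
    return sum(input_probs) - prod(input_probs)


half = Fraction(1, 2)

# NOR gate G2 with inputs B and C, each at its controlling value '1' with probability 1/2.
g2_output = gate_output_prob_nor_or([half, half])

# G3 has control value '0', so the G2 output value drives it directly.
g3_input_from_g2 = g2_output

# G4 has control value '1', so the complement of the G2 output value is needed.
g4_input_from_g2 = 1 - g2_output

print(g2_output, g3_input_from_g2, g4_input_from_g2)
```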
Once all signal probability calculations are completed, the input with the maximum
probability control value for every gate is chosen as the one to be connected to the
“ctrl” signal of that gate. In the case of a tie, the first occurrence is chosen. These
are represented in Figure 6.3 with circles around them and the corresponding “ctrl”
connections are seen. Gates in this circuit will automatically (on the fly) go into the
standby mode of operation based on their input signal values. This signal probability
based self-controlling technique can be applied to all classes of VLSI circuits.
6.3 Leakage Loss Comparison
TSMC’s 180 nm technology [9], with a supply voltage of 1.8V, as well as BPTM’s 100
nm technology [3], with a supply voltage of 1V, were used in the implementation of
this work. SPECTRE [4] was used in this work to simulate circuits and also to measure
leakage power. All simulations were carried out at a temperature of 27°C. Experiments
were conducted on 13 CMOS circuits, the smallest being the MCNC’91 benchmark C17,
and the largest being the MCNC’91 benchmark C6288. To see the leakage effects across
different technologies, experimental results were taken for 8 benchmarks implemented
using TSMC’s 180 nm technology, and for 6 benchmarks implemented using BPTM’s 100
nm technology.
First using TSMC’s 180 nm technology, 8 standard CMOS circuits were implemented.
Simulations were carried out and leakage power measured using SPECTRE [4]. For
every circuit, all possible input combinations were applied, and leakage power loss
was measured in every case. Column 2 (C2) from Table 6.1 lists the average leakage
power loss for all standard circuit implementations. Next, the same circuits were im-
plemented with TSMC’s 180 nm LECTOR gates and all LECTOR implementations were
simulated and their leakage loss measured. Column 3 (C3) from Table 6.1 gives the
average leakage power loss for each of the 8 LECTOR circuits. Finally, all 8 circuits
were implemented using TSMC’s 180 nm VCLEARIT gates.
An automated program was written using ANTLR [2, 59], which can parse a VHDL [11]
Table 6.1: Leakage Power Comparison @ Temperature = 27°C (TSMC’s 180 nm
Implementation)
CMOS Standard LECTOR VCLEARIT Leakage Leakage
Circuit Implementation Implementation Implementation Savings Savings
Name Leakage Power (pW) Leakage Power (pW) Leakage Power (pW) (C2/C4) (C3/C4)
C17 365.52 293.16 225.76 1.62X 1.30X
Rbench1 631.79 519.95 424.02 1.49X 1.23X
Rbench2 908.34 756.21 550.52 1.65X 1.37X
Levelized 1729.13 1408.32 1006.38 1.72X 1.40X
Skewed 618.87 529.80 409.97 1.51X 1.29X
Balanced 3460.33 2829.74 2034.49 1.70X 1.39X
Full-Adder 1104.33 878.82 751.15 1.47X 1.17X
4-Bit Adder 4155.22 3099.64 2457.81 1.69X 1.26X
Average 1.61X 1.30X
circuit netlist, build its connectivity graph, calculate the various internal signal prob-
abilities, and compute the input with the highest signal probability (for all the gates)
to connect to the gate’s “ctrl” signal. The input parameters to this program are: (1)
the VHDL circuit netlist, and (2) the switching probability values for all input signals
that turn off the corresponding gates they are connected to. The output of this pro-
gram is the corresponding VCLEARIT self-controlling leakage reduction circuit, with
all the “ctrl” signals connected appropriately. Appendix B illustrates the working of
this program for automated gate control signal calculation. The C17 MCNC’91 bench-
mark is used as an example, and control signals are generated, using this program for
the following two cases :
• When all the inputs have equal probability (0.5) of being either a ‘0’ or a ‘1’ value.
• When all the inputs have different probabilities of being either a ‘0’ or a ‘1’ value.
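The flow described above (build the connectivity, propagate signal probabilities, pick the "ctrl" input of each gate) can be sketched as follows. This is not the ANTLR-based program of this work: it takes a hypothetical Python dictionary instead of a VHDL netlist, and it propagates probabilities with the standard independent-input formulas rather than with Equations 6.1 and 6.2. The toy netlist and all names are assumptions made for illustration.

```python
from fractions import Fraction
from math import prod

# Controlling input value for each supported gate type.
CONTROL_VALUE = {"AND": 0, "NAND": 0, "OR": 1, "NOR": 1}


def prob_output_is_one(kind, p_one):
    """P(output = 1) assuming statistically independent inputs."""
    if kind not in CONTROL_VALUE:
        raise ValueError(f"unsupported gate type: {kind}")
    if kind == "AND":
        return prod(p_one)
    if kind == "NAND":
        return 1 - prod(p_one)
    p_all_zero = prod(1 - p for p in p_one)
    return 1 - p_all_zero if kind == "OR" else p_all_zero


def assign_ctrl(netlist, input_p_one):
    """Return ({gate: ctrl input}, {net: P(net = 1)}).

    netlist maps each gate output net to (gate type, list of input nets) and is
    assumed to be listed in topological order.
    """
    p_one = dict(input_p_one)
    ctrl = {}
    for out, (kind, ins) in netlist.items():
        probs = [p_one[name] for name in ins]
        p_one[out] = prob_output_is_one(kind, probs)
        # Probability that each input sits at this gate's controlling value.
        ctrl_probs = [p if CONTROL_VALUE[kind] == 1 else 1 - p for p in probs]
        # list.index returns the first maximum, so ties go to the first input.
        ctrl[out] = ins[ctrl_probs.index(max(ctrl_probs))]
    return ctrl, p_one


half = Fraction(1, 2)
toy_netlist = {
    "g1": ("NAND", ["a", "b"]),
    "g2": ("NOR", ["b", "c"]),
    "g3": ("AND", ["g1", "g2"]),
}
ctrl, p_one = assign_ctrl(toy_netlist, {"a": half, "b": half, "c": half})
print(ctrl)
print(p_one["g3"])
```

The independence assumption ignores reconvergent fan-out (net `b` feeds both `g1` and `g2` here), so the propagated values are estimates rather than exact probabilities.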
This program generated the VCLEARIT self-controlling leakage reduction circuit for
the 8 benchmarks listed in Table 6.1. These circuits were then simulated and the leak-
age loss measured. Column 4 (C4) from Table 6.1 gives the average leakage power loss
for each of the 8 VCLEARIT circuits.
Column 5 of Table 6.1 lists the leakage savings seen when comparing the standard
implementation leakage values (from Column 2 {C2}) to those of the VCLEARIT imple-
mentation leakage (from Column 4 {C4}). On average, a 61% improvement in leakage
savings is seen. Similarly, Column 6 of Table 6.1 lists the leakage savings seen when
comparing the self-controlled LECTOR circuit leakage value (from Column 3 {C3}) to
that of the self-controlled VCLEARIT circuit leakage value (from Column 4 {C4}). A
30% improvement in leakage savings (on average) is seen using the VCLEARIT tech-
nique, when compared to the LECTOR implementation for TSMC’s 180 nm technology.
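The savings columns of Table 6.1 are plain ratios of the tabulated leakage values, so they can be recomputed directly. The Python sketch below does this and also reports the corresponding fractional drop in absolute leakage, since a 1.61X ratio and a 61% reduction are different quantities; the names used are illustrative.

```python
# Leakage power in pW transcribed from Table 6.1: (standard, LECTOR, VCLEARIT).
TABLE_6_1_PW = {
    "C17": (365.52, 293.16, 225.76),
    "Rbench1": (631.79, 519.95, 424.02),
    "Rbench2": (908.34, 756.21, 550.52),
    "Levelized": (1729.13, 1408.32, 1006.38),
    "Skewed": (618.87, 529.80, 409.97),
    "Balanced": (3460.33, 2829.74, 2034.49),
    "Full-Adder": (1104.33, 878.82, 751.15),
    "4-Bit Adder": (4155.22, 3099.64, 2457.81),
}


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# Column 5 (C2/C4) and Column 6 (C3/C4), averaged over the 8 circuits.
avg_vs_standard = mean(std / vcl for std, _, vcl in TABLE_6_1_PW.values())
avg_vs_lector = mean(lec / vcl for _, lec, vcl in TABLE_6_1_PW.values())

# Average fractional drop in absolute leakage relative to the standard circuits.
avg_reduction = mean(1 - vcl / std for std, _, vcl in TABLE_6_1_PW.values())

print(f"average C2/C4 ratio: {avg_vs_standard:.2f}X")
print(f"average C3/C4 ratio: {avg_vs_lector:.2f}X")
print(f"average leakage reduction vs. standard: {100 * avg_reduction:.1f}%")
```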
Then, using BPTM’s 100 nm technology, 6 MCNC’91 benchmark standard circuits were
implemented. Simulations were carried out and leakage power measured using SPEC-
TRE [4]. For every circuit, 1500 input combinations were applied and leakage power
loss measured in every case. Column 2 (C2) from Table 6.2 lists the average leakage
power loss for all standard circuit implementations. Next, the same circuits were im-
plemented with BPTM’s 100 nm LECTOR gates, and all LECTOR implementations were
simulated and their leakage loss measured. Column 3 (C3) from Table 6.2 gives the
average leakage power loss for each of the 6 LECTOR circuits. Finally, all 6 circuits
were implemented using BPTM’s 100 nm VCLEARIT gates.
The automated program generated the VCLEARIT self-controlling leakage reduction
circuit for the 6 benchmarks listed in Table 6.2. These circuits were then simulated
and the leakage loss measured. Column 4 (C4) from Table 6.2 gives the average leak-
Table 6.2: Leakage Power Comparison @ Temperature = 27°C (BPTM’s 100 nm
Implementation)
CMOS Standard LECTOR VCLEARIT Leakage Leakage
Circuit Implementation Implementation Implementation Savings Savings
Name Leakage Power Leakage Power Leakage Power (C2/C4) (C3/C4)
C17 6.23 nW 4.36 nW 3.49 nW 1.79X 1.25X
C432 436.60 nW 363.90 nW 256.20 nW 1.70X 1.42X
C499 693.80 nW 512.60 nW 425.80 nW 1.63X 1.20X
C880 223.20 nW 157.38 nW 101.88 nW 2.19X 1.55X
C1355 497.60 nW 357.22 nW 244.30 nW 2.04X 1.46X
C6288 7.193 µW 5.370 µW 4.165 µW 1.73X 1.29X
Average 1.85X 1.36X
age power loss for each of the 6 VCLEARIT circuits.
Column 5 of Table 6.2 lists the leakage savings seen when comparing the standard
implementation leakage values (from Column 2 {C2}) to those of the VCLEARIT imple-
mentation leakage (from Column 4 {C4}). On average, an 85% improvement in leakage
savings is seen. Similarly, Column 6 of Table 6.2 lists the leakage savings seen when
comparing the self-controlled LECTOR circuit leakage value (from Column 3 {C3}) to
that of the self-controlled VCLEARIT circuit leakage value (from Column 4 {C4}). A
36% improvement in leakage savings (on average) is seen using the VCLEARIT tech-
nique, when compared to the LECTOR implementation for BPTM’s 100 nm technology.
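The savings columns of Table 6.2 are plain ratios of the measured leakage values, so they can be rechecked with a few lines of Python. The sketch below is only an illustration: the dictionary transcribes the table by hand, and the names `LEAKAGE_NW` and `savings` are assumptions, not part of the original tool flow. Because the table entries are themselves rounded, a recomputed per-circuit ratio can differ from the printed one in the last digit.

```python
# (standard, LECTOR, VCLEARIT) average leakage power from Table 6.2, in nW.
LEAKAGE_NW = {
    "C17":   (6.23, 4.36, 3.49),
    "C432":  (436.60, 363.90, 256.20),
    "C499":  (693.80, 512.60, 425.80),
    "C880":  (223.20, 157.38, 101.88),
    "C1355": (497.60, 357.22, 244.30),
    "C6288": (7193.0, 5370.0, 4165.0),  # listed in µW in the table, converted to nW
}


def savings(rows):
    """Return {circuit: (C2/C4, C3/C4)} leakage savings ratios."""
    return {name: (std / vcl, lec / vcl) for name, (std, lec, vcl) in rows.items()}


ratios = savings(LEAKAGE_NW)
avg_vs_standard = sum(r[0] for r in ratios.values()) / len(ratios)
avg_vs_lector = sum(r[1] for r in ratios.values()) / len(ratios)

for name, (vs_std, vs_lec) in ratios.items():
    print(f"{name:6s} {vs_std:.2f}X {vs_lec:.2f}X")
print(f"Average {avg_vs_standard:.2f}X {avg_vs_lector:.2f}X")
```

A ratio of 1.85X means the standard implementation leaks 1.85 times as much as the VCLEARIT implementation, which is the 85% figure quoted in the text.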
6.4 Circuit Delay Comparison
TSMC’s 180 nm technology [9], with a supply voltage of 1.8V, was used in the
implementation of this work. Transient analysis was performed on the 3 benchmark circuits
listed in Column 1 of Table 6.3 using SPECTRE [4]. The output load in each case was
an appropriately sized NAND2 gate.

Table 6.3: Circuit Delay Comparison (TSMC’s 180 nm Implementation)

CMOS        Standard     LECTOR       Power-Gated   VCLEARIT     Delay     Delay     Delay
Circuit     Circuit      Circuit      Circuit       Circuit      Penalty   Penalty   Penalty
Name        Delay (ps)   Delay (ps)   Delay (ps)    Delay (ps)   (C3/C2)   (C4/C2)   (C5/C2)
Skewed      136.83       198.36       158.99        158.63       1.45X     1.16X     1.16X
Levelized   197.70       318.28       237.21        235.26       1.61X     1.20X     1.19X
Balanced    319.51       492.05       357.84        351.46       1.54X     1.12X     1.10X
Average                                                          1.53X     1.16X     1.15X

For a fair comparison, the power-gated cell was also designed like the VCLEARIT cell.
The gating high-VT transistor was an NMOS transistor for an AND/NAND gate, with
a control value of ‘0’. Likewise, the gating high-VT transistor was a PMOS transistor
for an OR/NOR gate, with a control value of ‘1’. All gates in each of the 3 benchmarks
were sized as follows. The width of the high-VT transistor in both the VCLEARIT
circuit (N0 from Figure 6.1; P0 from Figure 6.2) and the power-gated circuit (P1 from
Figure 3.4 of Section 3.3 of Chapter 3) was ‘W’. The 2 LCT transistors (LCT1 and LCT2
from Figure 1.2 of Chapter 1) were each sized ‘W/2’. All other transistors in the gate
were unit-sized.
Columns 2 to 5 of Table 6.3 list the delay of the 3 circuits for the standard, LECTOR,
power-gated and VCLEARIT implementations, respectively. Column 6 of Table 6.3 shows
the delay penalty (increase) of the LECTOR circuit (Column 3 {C3}) when compared to
the standard implementation (Column 2 {C2}). An average delay penalty of 53% is seen.
Column 7 of Table 6.3 shows the delay penalty of the power-gated circuit (Column 4
{C4}) when compared to the standard implementation. An average delay penalty of 16%
is seen. Finally, Column 8 of Table 6.3 shows the delay penalty of the VCLEARIT circuit
(Column 5 {C5}) when compared to the standard implementation. An average delay
penalty of 15% is seen. The power-gated and VCLEARIT circuits show nearly identical
delay penalties, because each adds one extra high-VT transistor in series with the
circuit. The LECTOR implementation shows worse delays because two LCT transistors
are in series with the circuit.
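The delay penalties can be recomputed from the delays in Table 6.3 as a cross-check. The following is a minimal sketch with hand-transcribed values; `DELAY_PS`, `TECHNIQUES` and `average_penalty` are illustrative names introduced here, not part of the original experiments.

```python
# Delays in ps from Table 6.3: (standard, LECTOR, power-gated, VCLEARIT).
DELAY_PS = {
    "Skewed":    (136.83, 198.36, 158.99, 158.63),
    "Levelized": (197.70, 318.28, 237.21, 235.26),
    "Balanced":  (319.51, 492.05, 357.84, 351.46),
}
TECHNIQUES = ("LECTOR", "Power-Gated", "VCLEARIT")


def average_penalty(rows):
    """Average delay ratio against the standard circuit, per technique."""
    averages = {}
    for idx, technique in enumerate(TECHNIQUES, start=1):
        ratios = [delays[idx] / delays[0] for delays in rows.values()]
        averages[technique] = sum(ratios) / len(ratios)
    return averages


penalties = average_penalty(DELAY_PS)
for technique, ratio in penalties.items():
    # A ratio of 1.15X corresponds to a 15% increase in delay.
    print(f"{technique:12s} {ratio:.2f}X ({(ratio - 1) * 100:.0f}% increase)")
```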
Certain observations can be pointed out from this research. First, there is no need
for an external controller to sequence the operation of any circuit in order to reduce
leakage power. Internal signals can be tapped to implement self-control of the circuit
as done here. Second, routing of various controller signals to different portions of the
circuit, as in the case of MTCMOS circuits with an external controller, leads to complex
routing congestion problems. In this work, the layout is simple, and the only extra
routing involves internally connecting one of the input signals to the “ctrl” signal for
every VCLEARIT gate.
A novel self-controlling leakage reduction technique for CMOS circuits (VCLEARIT) is
presented in this chapter. Signal probabilities determine the mode of operation (func-
tional or standby) of the gates making up complex circuits. Experiments conducted on
a variety of combinational benchmarks for 180 nm and 100 nm technologies show
significant savings in leakage power for the VCLEARIT technique, when compared to the
standard circuit implementation as well as to the LECTOR technique. The delay penalty
of the VCLEARIT technique was comparable to that of the power-gated circuit and
far lower than that of the LECTOR leakage reduction technique.
The next chapter discusses experiments carried out to study the impact of different
circuit-level topologies on the dynamic power dissipation of CMOS circuits. For a given
circuit topology, excellent leakage savings are of little value if they are offset or
outweighed by excessive dynamic power.
Chapter 7
Dynamic Power Study
Dynamic power is the active or switching component of the total power dissipation
of any circuit, as explained in Section 1.1 of Chapter 1. Equation 7.1 expresses the
dynamic power ($P_D$) value [75] of any CMOS circuit:
$$P_D = \alpha f C_L V_{dd}^{2} \tag{7.1}$$
where $\alpha$ is the switching activity, $f$ is the operating frequency, $C_L$ is the
load capacitance and $V_{dd}$ is the supply voltage.
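Equation 7.1 can be evaluated directly. The sketch below assumes SI units; the switching activity, frequency and load capacitance are hypothetical values chosen only for illustration, while the 1.8V supply is the one used in this work. It also shows the quadratic dependence on the supply voltage.

```python
def dynamic_power(alpha, freq_hz, c_load_f, vdd_v):
    """Dynamic power in watts from Equation 7.1: alpha * f * C_L * Vdd^2."""
    return alpha * freq_hz * c_load_f * vdd_v ** 2


# Hypothetical operating point: 10% switching activity, 100 MHz, 10 fF load.
p_nominal = dynamic_power(alpha=0.1, freq_hz=100e6, c_load_f=10e-15, vdd_v=1.8)
# Halving the supply voltage cuts dynamic power to one quarter.
p_half_vdd = dynamic_power(alpha=0.1, freq_hz=100e6, c_load_f=10e-15, vdd_v=0.9)

print(f"P_D at 1.8 V: {p_nominal * 1e9:.1f} nW")
print(f"P_D at 0.9 V: {p_half_vdd * 1e9:.1f} nW")
```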
Efficient leakage current minimization techniques should not drastically affect dy-
namic power values of a circuit. It is a case of bad design when excellent leakage
savings are either offset or over-run by excessive dynamic power values. Hence, an ef-
fective design or technique would be one which maximizes the net reduction in power
(static + dynamic) dissipation.
Experiments were carried out to study the impact of circuit level choices (LECTOR and
VCLEARIT techniques) on the dynamic power dissipation of CMOS circuits. TSMC’s 180
nm technology [9] was used to implement the various circuits here. A supply voltage
of 1.8V was used, and all simulations were carried out at a room temperature of 27°C.
SPECTRE [4] was used to simulate circuits and also to measure dynamic power.
Experiments were conducted on 5 CMOS circuits, the smallest being the MCNC’91
benchmark C17, and the largest being the Levelized circuit. For a fair comparison, the
weighted average of the dynamic power (per transistor) for all circuits across the
different techniques was used.

Table 7.1: Standard Circuit Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)

CMOS Circuit       Total #        Weighted Standard Implementation
Name               Transistors    Dynamic Power (µW)
C17                24             166.5432
Rbench1            36             393.7464
Rbench2            54             854.5338
Full-Adder         62             1846.4654
Levelized          86             4440.7562
∑                  262            7702.045
Weighted Average                  29.3971

7.1 Standard Circuit Dynamic Power
First, all 5 standard CMOS circuits were implemented. Simulations with sequences of
test vectors were carried out and the dynamic power was measured using SPECTRE
[4]. Column 1 of Table 7.1 lists the names of the circuits used. The total number of
transistors for each circuit is given in Column 2 of Table 7.1. The weighted dynamic
power loss for all standard circuit implementations is shown in Column 3 of Table 7.1.
The ∑ row (Row 6) of Table 7.1 gives the sum total of the number of transistors in all
designs (262) and the summation of the weighted dynamic power values of all designs.
Row 7 of Table 7.1 shows the weighted average dynamic power (per transistor) for the
standard circuit implementation to be 29.3971 µW.

Table 7.2: LECTOR Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)

CMOS Circuit       Total #        Weighted LECTOR Implementation
Name               Transistors    Dynamic Power (µW)
C17                36             306.3708
Rbench1            56             671.6472
Rbench2            84             1535.9820
Full-Adder         98             3233.6864
Levelized          134            7750.2518
∑                  408            13497.9380
Weighted Average                  33.0832

7.2 LECTOR Circuit Dynamic Power
Next, the LECTOR versions of all 5 CMOS circuits were implemented. Simulations with
sequences of test vectors were carried out and the dynamic power was measured using
SPECTRE. Column 1 of Table 7.2 lists the names of the circuits used. The total number
of transistors for each circuit is given in Column 2 of Table 7.2. The weighted dynamic
power loss for all LECTOR circuit implementations is shown in Column 3 of Table 7.2.
The ∑ row (Row 6) of Table 7.2 gives the sum total of the number of transistors in all
designs (408) and the summation of the weighted dynamic power values of all designs.
Row 7 of Table 7.2 shows the weighted average dynamic power (per transistor) for the
LECTOR circuit implementation to be 33.0832 µW.

Table 7.3: VCLEARIT Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)

CMOS Circuit       Total #        Weighted VCLEARIT Implementation
Name               Transistors    Dynamic Power (µW)
C17                42             299.7834
Rbench1            60             661.0200
Rbench2            88             1406.7064
Full-Adder         101            3023.3744
Levelized          143            7420.4130
∑                  434            12811.2970
Weighted Average                  29.5191

7.3 VCLEARIT Circuit Dynamic Power
Finally, the VCLEARIT versions of all 5 CMOS circuits were implemented. Simulations
with sequences of test vectors were carried out and the dynamic power was measured
using SPECTRE. Column 1 of Table 7.3 lists the names of the circuits used. The total
number of transistors for each circuit is given in Column 2 of Table 7.3. The weighted
dynamic power loss for all VCLEARIT circuit implementations is shown in Column 3
of Table 7.3. The ∑ row (Row 6) of Table 7.3 gives the sum total of the number of
transistors in all designs (434) and the summation of the weighted dynamic power
values of all designs. Row 7 of Table 7.3 shows the weighted average dynamic power
(per transistor) for the VCLEARIT circuit implementation to be 29.5191 µW.
A small 0.42% increase in dynamic power was seen when comparing the VCLEARIT
implementation dynamic power (Row 7 of Table 7.3) to that of the standard
implementation (Row 7 of Table 7.1). This is in contrast to a 12.54% increase in dynamic
power, when comparing the LECTOR implementation dynamic power (Row 7 of Table 7.2)
to that of the standard implementation (Row 7 of Table 7.1). These results show that
the VCLEARIT technique, in addition to providing significant leakage savings, has a
negligible effect on the dynamic power of the whole circuit. Hence, the
VCLEARIT technique could be used to reduce the net power (static + dynamic)
dissipation for deep-submicron technologies.
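The per-transistor weighted averages and the two percentage increases quoted above follow from the rows of Tables 7.1 to 7.3. The sketch below recomputes them; it assumes, as the tables suggest, that the weighted average is the sum of the weighted dynamic power values divided by the total transistor count. The names `TABLES` and `per_transistor_average` are illustrative.

```python
# (transistor count, weighted dynamic power in µW) per circuit, from Tables 7.1-7.3.
TABLES = {
    "Standard": [(24, 166.5432), (36, 393.7464), (54, 854.5338),
                 (62, 1846.4654), (86, 4440.7562)],
    "LECTOR":   [(36, 306.3708), (56, 671.6472), (84, 1535.9820),
                 (98, 3233.6864), (134, 7750.2518)],
    "VCLEARIT": [(42, 299.7834), (60, 661.0200), (88, 1406.7064),
                 (101, 3023.3744), (143, 7420.4130)],
}


def per_transistor_average(rows):
    """Sum of weighted dynamic power divided by the total number of transistors."""
    total_transistors = sum(count for count, _ in rows)
    total_power = sum(power for _, power in rows)
    return total_power / total_transistors


averages = {name: per_transistor_average(rows) for name, rows in TABLES.items()}
baseline = averages["Standard"]
increase_pct = {name: (avg / baseline - 1) * 100 for name, avg in averages.items()}

for name in TABLES:
    print(f"{name:9s} {averages[name]:.4f} µW per transistor, "
          f"{increase_pct[name]:+.2f}% vs standard")
```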
In the next chapter, the conclusions drawn from this research are presented, the sum-
mary of contributions of this dissertation clearly detailed, and the scope for future
work laid out.
Chapter 8
Conclusions
With the advent of deep-submicron technologies, leakage loss is a major concern for
scaling down portable devices that have burst-mode type integrated circuits. Leak-
age drains the battery, even when a circuit is completely idle. Power is unnecessarily
consumed, with no useful work being done. In this dissertation, we have developed a
novel self-controlled leakage reduction technique for CMOS circuits and have embed-
ded it into the low-power synthesis framework. In this chapter, we present a summary
of contributions of this work and give suggestions and pointers for future work.
8.1 Summary of Contributions
• Design and Development of the VSDCAD Sleep-Embedded Topology for
Leakage Reduction in CMOS Circuits [49]: A novel technique that achieves
cancellation of leakage effects in both the Pull-Up Network (PUN) as well as the
Pull-Down Network (PDN) of CMOS cells was devised. A combination of high-VT
and standard-VT sleep transistors embedded within the CMOS topology was used
in voltage balancing of the PUN and PDN paths, thereby shutting them off and
minimizing leakage loss.
• Characterization of the VSDCAD Ultra Low-Power Standard Cell Library
[51, 54]: As part of this research, an ultra low-power standard cell library was
developed on the basis of the VSDCAD topology. The VSDCAD ultra low-power
standard cell library contains 8 combinational and 2 sequential standard cells,
which have been characterized for area, delay and power.
• Signal Probability Based VCLEARIT Self-Controller Design for Leakage
Power Reduction [52]: The VSDCAD sleep-embedded topology was modified in
this work for better controllability and also to reduce routing congestion. The
self-controller is the vital segment of this dissertation work that sequences the
working of VSDCAD sleep-embedded cells in VLSI circuits. Signal probabilities
are used to determine the mode of operation (functional or standby) of such cells,
thereby avoiding the need for external circuitry.
• Seamless Integration of the Self-Controlled Sleep-Embedded Cells into
the Low-Power Synthesis Flow: A methodology to integrate the VSDCAD ultra
low-power library and the VCLEARIT self-controller into the RTL segment of the
low-power synthesis framework is presented in this work.
8.2 Scope for Future Work
• Study the Effects of Gate Leakage: The primary goal of this dissertation
work was to invent techniques to minimize subthreshold leakage. However, the
gate leakage problem poses a significant design challenge for sub-100 nm CMOS
technologies [32, 71]. The effects of the gate leakage component of power dissipa-
tion could be studied and augmented to provide a complete leakage minimization
package.
• Methodology for Comprehensive Leakage Reduction: This methodology
currently being developed at the VSDCAD laboratory in Syracuse University can
make use of the ultra low-power standard cell library developed as part of our
research. For active mode power reduction, multiple power domains are sup-
ported using clusters of high-VT and low-VT cells. For standby leakage reduction,
the cell assignment algorithm makes use of the MTCMOS technique. Hence, the
signal probability based VCLEARIT self-controller cells could be used for high-VT
allocation by this algorithm.
• Investigation of Leakage Reduction Techniques at Higher Levels of Ab-
straction: The investigation of leakage reduction techniques at the behavioral-
level or even system-level of abstraction would be an interesting topic for further
in-depth research.
Appendix A
Sleep-Embedded Master-Slave
POSX DFF Schematic
The following page shows the topology of the Positive Edge-triggered Sleep-Embedded
Master-Slave D Flip-Flop. The important cross-sections of the flip-flop are shown in
boxes - Master, Slave, Clock Inverter and State Saving circuitry.
Figure A.1: Block Diagram - VSDCAD Master-Slave Positive Edged D Flip-Flop
(Blocks: Master, Slave, Clock Inverter and State Saving Circuit; signals: D, clk, clkbar,
sleep, sleepbar, Q and Qbar.)
Figure A.2: Schematic - VSDCAD Master-Slave Positive Edged D Flip-Flop
89
Appendix B
Automated Gate Control Signal
Calculation - Parser Output
The C17 MCNC’91 benchmark [72] is used to illustrate the automated output from
the parser developed as part of this research. The circuit topology is as shown in
Figure B.1. It comprises 6 NAND gates, with 5 inputs (INP(0), . . . INP(4)) and 2
outputs (OUTPI(0), OUTPI(1)).
Figure B.1: C17 Circuit Topology (gates NAND0 to NAND5; inputs INP(0) to INP(4);
internal signals INTERP(0) to INTERP(3); outputs OUTPI(0) and OUTPI(1))
The VHDL implementation of the C17 benchmark is as follows :
--------------------------------------------------------------------
-- Source File Name : c17.vhd                                      -
-- Modified by      : Preetham Lakshmikanthan                      -
-- VLSI Systems Design and CAD (VSDCAD) Laboratory                 -
-- EECS Department, Syracuse University, Syracuse, NY-13244, U.S.A -
--------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use work.gates_pkg.all;

ENTITY c17_i89 IS
  PORT (
    INP  : in  std_ulogic_vector(0 to 4);
    OUTP : out std_ulogic_vector(0 to 1));
END c17_i89;

ARCHITECTURE structural OF c17_i89 IS
  signal INTERP : std_ulogic_vector(0 to 3) := (others => '0');
  signal OUTPI  : std_ulogic_vector(OUTP'range) := (others => '0');
BEGIN
  NAND0 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INP(0),
      inp(1) => INP(2),
      out1   => INTERP(0));
  NAND1 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INP(2),
      inp(1) => INP(3),
      out1   => INTERP(1));
  NAND2 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INP(1),
      inp(1) => INTERP(1),
      out1   => INTERP(2));
  NAND3 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INTERP(1),
      inp(1) => INP(4),
      out1   => INTERP(3));
  NAND4 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INTERP(0),
      inp(1) => INTERP(3),
      out1   => OUTPI(0));
  NAND5 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INTERP(2),
      inp(1) => INTERP(3),
      out1   => OUTPI(1));
  BUFFER_OUT : OUTP <= OUTPI;
END structural;
-------------------------------------------
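The port maps in the listing above fully determine which gate drives which. As a rough illustration of how a successor gate list can be derived from such a netlist, the Python sketch below transcribes the six port maps by hand and finds, for each gate, the gates that read its output signal. The names `NETLIST` and `successors` are assumptions made for this example and are not taken from the parser itself.

```python
# Hand transcription of the c17.vhd port maps: gate -> (input signals, output signal).
NETLIST = {
    "NAND0": (("INP(0)", "INP(2)"), "INTERP(0)"),
    "NAND1": (("INP(2)", "INP(3)"), "INTERP(1)"),
    "NAND2": (("INP(1)", "INTERP(1)"), "INTERP(2)"),
    "NAND3": (("INTERP(1)", "INP(4)"), "INTERP(3)"),
    "NAND4": (("INTERP(0)", "INTERP(3)"), "OUTPI(0)"),
    "NAND5": (("INTERP(2)", "INTERP(3)"), "OUTPI(1)"),
}


def successors(netlist):
    """Map each gate to the gates that take its output signal as an input."""
    result = {}
    for gate, (_, out_signal) in netlist.items():
        result[gate] = [other for other, (inputs, _) in netlist.items()
                        if out_signal in inputs]
    return result


succ = successors(NETLIST)
for gate, followers in succ.items():
    if followers:
        print(f"Gate {gate} connects to {' '.join(followers)}")
    else:
        print(f"Gate {gate} has no successors")
```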
The automated parser output for the C17 circuit when all the inputs have equal prob-
ability (0.5) of being either a ‘0’ or a ‘1’ value, is as follows :
------------------------------------------------------------------
preetham@nyx~> execute
Please enter [Path Name]VHDL file to parse : c17.vhd
Please enter number of inputs -> 5
Enter Input[0] Name : INP(0)
Enter Input[0] Probability Value : 0.5
Enter Input[1] Name : INP(1)
Enter Input[1] Probability Value : 0.5
Enter Input[2] Name : INP(2)
Enter Input[2] Probability Value : 0.5
Enter Input[3] Name : INP(3)
Enter Input[3] Probability Value : 0.5
Enter Input[4] Name : INP(4)
Enter Input[4] Probability Value : 0.5
Number of Components in this Design : 6
Successor Gate List :
Gate NAND0 connects to NAND4
Gate NAND1 connects to NAND2 NAND3
Gate NAND2 connects to NAND5
Gate NAND3 connects to NAND4 NAND5
Gate NAND4 has no successors
Gate NAND5 has no successors
Input Signal Connectivity to :
NAND0 inputs are ...
1) INP(0) has probability 0.5 to have control value 0
2) INP(2) has probability 0.5 to have control value 0
.. and output probability of gate NAND0 to be 1 is : 0.75
NAND1 inputs are ...
1) INP(2) has probability 0.5 to have control value 0
2) INP(3) has probability 0.5 to have control value 0
.. and output probability of gate NAND1 to be 1 is : 0.75
NAND2 inputs are ...
1) INP(1) has probability 0.5 to have control value 0
2) Gate NAND1 has probability 0.75
.... o/p sig. is : INTERP(1) whose prob. is 0.25
.. and output probability of gate NAND2 to be 1 is : 0.875
NAND3 inputs are ...
1) INP(4) has probability 0.5 to have control value 0
2) Gate NAND1 has probability 0.75
.... o/p sig. is : INTERP(1) whose prob. is 0.25
.. and output probability of gate NAND3 to be 1 is : 0.875
NAND4 inputs are ...
1) Gate NAND0 has probability 0.75
.... o/p sig. is : INTERP(0) whose prob. is 0.25
2) Gate NAND3 has probability 0.875
.... o/p sig. is : INTERP(3) whose prob. is 0.125
.. and output probability of gate NAND4 to be 1 is : 0.96875
NAND5 inputs are ...
1) Gate NAND2 has probability 0.875
.... o/p sig. is : INTERP(2) whose prob. is 0.125
2) Gate NAND3 has probability 0.875
.... o/p sig. is : INTERP(3) whose prob. is 0.125
.. and output probability of gate NAND5 to be 1 is : 0.984375
Control Signal Info :
INP(0) with max prob 0.5 is ctrl sig for Gate NAND0
INP(2) with max prob 0.5 is ctrl sig for Gate NAND1
INP(1) with max prob 0.5 is ctrl sig for Gate NAND2
INP(4) with max prob 0.5 is ctrl sig for Gate NAND3
INTERP(0) with max prob 0.25 is ctrl sig for Gate NAND4
INTERP(2) with max prob 0.125 is ctrl sig for Gate NAND5
------------------------------------------------------------------
The automated parser output for the C17 circuit when the various inputs have different
probabilities of being either a ‘0’ or a ‘1’ value, is as follows :
------------------------------------------------------------------
preetham@nyx~> execute
Please enter [Path Name]VHDL file to parse : c17.vhd
Please enter number of inputs -> 5
Enter Input[0] Name : INP(0)
Enter Input[0] Probability Value : 0.5
Enter Input[1] Name : INP(4)
Enter Input[1] Probability Value : 1.0
Enter Input[2] Name : INP(1)
Enter Input[2] Probability Value : 0.2
Enter Input[3] Name : INP(3)
Enter Input[3] Probability Value : 1.0
Enter Input[4] Name : INP(2)
Enter Input[4] Probability Value : 0.9
Number of Components in this Design : 6
Successor Gate List :
Gate NAND0 connects to NAND4
Gate NAND1 connects to NAND2 NAND3
Gate NAND2 connects to NAND5
Gate NAND3 connects to NAND4 NAND5
Gate NAND4 has no successors
Gate NAND5 has no successors
Input Signal Connectivity to :
NAND0 inputs are ...
1) INP(0) has probability 0.5 to have control value 0
2) INP(2) has probability 0.9 to have control value 0
.. and output probability of gate NAND0 to be 1 is : 0.55
NAND1 inputs are ...
1) INP(2) has probability 0.9 to have control value 0
2) INP(3) has probability 1 to have control value 0
.. and output probability of gate NAND1 to be 1 is : 0.1
NAND2 inputs are ...
1) INP(1) has probability 0.2 to have control value 0
2) Gate NAND1 has probability 0.1
.... o/p sig. is : INTERP(1) whose prob. is 0.9
.. and output probability of gate NAND2 to be 1 is : 0.82
NAND3 inputs are ...
1) INP(4) has probability 1 to have control value 0
2) Gate NAND1 has probability 0.1
.... o/p sig. is : INTERP(1) whose prob. is 0.9
.. and output probability of gate NAND3 to be 1 is : 0.1
NAND4 inputs are ...
1) Gate NAND0 has probability 0.55
.... o/p sig. is : INTERP(0) whose prob. is 0.45
2) Gate NAND3 has probability 0.1
.... o/p sig. is : INTERP(3) whose prob. is 0.9
.. and output probability of gate NAND4 to be 1 is : 0.595
NAND5 inputs are ...
1) Gate NAND2 has probability 0.82
.... o/p sig. is : INTERP(2) whose prob. is 0.18
2) Gate NAND3 has probability 0.1
.... o/p sig. is : INTERP(3) whose prob. is 0.9
.. and output probability of gate NAND5 to be 1 is : 0.838
Control Signal Info :
INP(2) with max prob 0.9 is ctrl sig for Gate NAND0
INP(3) with max prob 1 is ctrl sig for Gate NAND1
INTERP(1) with max prob 0.9 is ctrl sig for Gate NAND2
INP(4) with max prob 1 is ctrl sig for Gate NAND3
INTERP(3) with max prob 0.9 is ctrl sig for Gate NAND4
INTERP(3) with max prob 0.9 is ctrl sig for Gate NAND5
------------------------------------------------------------------
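The numbers in the two listings above are consistent with a simple rule, which the Python sketch below reproduces. The rule is inferred from the listed values and is not taken from the parser source: each gate's reported output probability equals one minus the product of the probabilities listed for its inputs, an internal signal is listed with one minus the output probability of the gate driving it, and the control signal is the input with the largest listed probability (the first listed input on a tie). The names `NETLIST`, `analyse`, `EQUAL` and `SKEWED` are illustrative.

```python
from math import prod

# Hand transcription of c17.vhd: gate -> (input signals, output signal), in file order.
NETLIST = {
    "NAND0": (("INP(0)", "INP(2)"), "INTERP(0)"),
    "NAND1": (("INP(2)", "INP(3)"), "INTERP(1)"),
    "NAND2": (("INP(1)", "INTERP(1)"), "INTERP(2)"),
    "NAND3": (("INTERP(1)", "INP(4)"), "INTERP(3)"),
    "NAND4": (("INTERP(0)", "INTERP(3)"), "OUTPI(0)"),
    "NAND5": (("INTERP(2)", "INTERP(3)"), "OUTPI(1)"),
}


def analyse(netlist, input_prob):
    """Return (output probability per gate, control signal per gate).

    Assumes the gates appear in topological order, as they do in c17.vhd.
    """
    listed = dict(input_prob)  # probability listed for each signal
    out_prob, ctrl = {}, {}
    for gate, (inputs, out_signal) in netlist.items():
        out_prob[gate] = 1.0 - prod(listed[s] for s in inputs)
        listed[out_signal] = 1.0 - out_prob[gate]  # value shown for the internal signal
        ctrl[gate] = max(inputs, key=listed.get)   # max() keeps the first of tied inputs
    return out_prob, ctrl


# Input probabilities entered in the first and second parser runs.
EQUAL = {f"INP({i})": 0.5 for i in range(5)}
SKEWED = {"INP(0)": 0.5, "INP(1)": 0.2, "INP(2)": 0.9, "INP(3)": 1.0, "INP(4)": 1.0}

equal_out, equal_ctrl = analyse(NETLIST, EQUAL)
skewed_out, skewed_ctrl = analyse(NETLIST, SKEWED)

for label, out_prob, ctrl in (("equal", equal_out, equal_ctrl),
                              ("skewed", skewed_out, skewed_ctrl)):
    for gate in NETLIST:
        print(f"{label}: {gate} output probability {out_prob[gate]:.6g}, "
              f"ctrl sig {ctrl[gate]}")
```

Selecting the control signal with `max` over the inputs in listed order matches the tie cases in the first run, where NAND0, NAND1 and NAND5 each have two inputs with equal listed probability.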
Bibliography
[1] Advanced Micro Devices, Inc.
http://www.amd.com.
[2] ANTLR v3.
http://www.antlr.org.
[3] Berkeley Predictive Technology Model (BPTM).
http://www-device.eecs.berkeley.edu/~ptm.
[4] Cadence Design Systems, Inc.
http://www.cadence.com.
[5] Intel Corporation.
http://www.intel.com.
[6] International Technology Roadmap for Semiconductors (ITRS-06).
http://www.itrs.net/Links/2006Update/FinalToPost/02 Design 2006Update.pdf.
[7] Oklahoma State University - Standard Cell Library.
http://avatar.ecen.okstate.edu/projects/scells.
[8] Synopsys, Inc.
http://www.synopsys.com.
[9] TSMC Processes Available Through MOSIS.
http://www.mosis.org/products/fab/vendors/tsmc.
[10] IEEE Standard 1076-1993 Hardware Description Language Based on the Verilog.
IEEE Press.
[11] IEEE Standard 1076-1993 Standard VHDL Language Reference Manual. IEEE
Press.
[12] ABDOLLAHI, A., FALLAH, F., AND PEDRAM, M. Leakage Current Reduction in
CMOS VLSI Circuits by Input Vector Control. IEEE Transactions on Very Large
Scale Integration (VLSI) Systems 12, 2 (February 2004), 140–154.
[13] ABDOLLAHI, A., FALLAH, F., AND PEDRAM, M. A Robust Power Gating Struc-
ture and Power Mode Transition Strategy for MTCMOS Design. Under Review for
IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2006), 1–24.
[14] ABDOLLAHI, A., AND PEDRAM, M. “Power Minimization Techniques at the RT-
level and Below” In System-On-Chip: Next Generation Electronics. IEE Press,
2006, pp. 387–410.
[15] ABDOLLAHI, A., PEDRAM, M., FALLAH, F., AND GHOSH, I. Precomputation-
based Guarding for Dynamic and Leakage Power Reduction. In Proceedings of
the 21st International Conference on Computer Design (October 2003), pp. 90–97.
[16] ABRAMOVICI, M., BREUER, M. A., AND FRIEDMAN, A. D. Digital Systems Test-
ing and Testable Design. Wiley-IEEE Press, New York, USA, 1994, p. 672.
[17] AGARWAL, K., DEOGUN, H., SYLVESTER, D., AND NOWKA, K. Power Gating
with Multiple Sleep Modes. In Proceedings of the 7th International Symposium
on Quality Electronic Design (March 2006), pp. 633–637.
[18] ALOUL, F. A., HASSOUN, S., SAKALLAH, K. A., AND BLAAUW, D. Robust SAT-
Based Search Algorithm for Leakage Power Reduction. In Proceedings of the 12th
International Workshop on Power And Timing Modeling, Optimization and Sim-
ulation (September 2002), pp. 167–177.
[19] ASHENDEN, P. J. The Designer’s Guide to VHDL, second ed. Morgan Kaufmann
Publishers, San Francisco, USA, 2001, p. 759.
[20] BENINI, L., MICHELI, G. D., AND MACII, E. Designing Low-Power Circuits:
Practical Recipes. IEEE Circuits and Systems Magazine 1, 1 (January 2001), 6–
25.
[21] BHUNIA, S., MAHMOODI, H., GHOSH, D., MUKHOPADHYAY, S., AND ROY, K.
Low Power Scan Design Using First Level Supply Gating. IEEE Transactions on
Very Large Scale Integration (VLSI) Systems 13, 3 (March 2005), 384–395.
[22] BORKAR, S. Gigascale Integration - Challenges and Opportunities (Home > Intel
Software Network > Strategies & Technologies).
http://www.intel.com/cd/ids/developer/asmo-na/eng/strategy/182440.htm?page=2.
[23] CALHOUN, B. A., HONORE, F. A., AND CHANDRAKASAN, A. Design Methodology
for Fine-Grained Leakage Control in MTCMOS. In IEEE International Sympo-
sium on Low Power Electronics and Design (August 2003), pp. 104–109.
[24] CARBALLO, J. A., BURNS, J. L., YOO, S. M., VO, I., AND NORMAN, V. R. A Semi-
Custom Voltage-Island Technique and its Application to High-Speed Serial Links.
In Proceedings of the 2003 International Symposium on Low Power Electronics
and Design (August 2003), pp. 60–65.
[25] CHEN, Z., JOHNSON, M., WEI, L., AND ROY, K. Estimation of Standby Leakage
Power in CMOS Circuits Considering Accurate Modeling of Transistor Stacks. In
IEEE International Symposium on Low Power Electronics and Design (August
1998), pp. 239–244.
[26] COPELAND, D. 64-bit Server Cooling Requirements. In Proceedings of the 2005
IEEE 21st Annual Semiconductor Thermal Measurement and Management Sym-
posium (March 2005), pp. 94–98.
[27] DUARTE, D., TSAI, Y. F., VIJAYKRISHNAN, N., AND IRWIN, M. J. Evaluating
Run-Time Techniques for Leakage Power Reduction. In Proceedings of the 7th
Asia and South Pacific Design Automation Conference/15th International Confer-
ence on VLSI Design (January 2002), pp. 31–38.
[28] ELKARABLIEH, B., AND NUNEZ, A. A Synthesis Technique for Reducing Leakage
Based on Signal Controllability. In Proceedings of the 3rd IEEE International
Conference on Electrical and Electronics Engineering (September 2006), pp. 339–
342.
[29] FALLAH, F., AND PEDRAM, M. Standby and Active Leakage Current Control and
Minimization in CMOS VLSI Circuits. IEICE Transactions on Electronics, Special
Section on Low-Power LSI and Low-Power IP E88-C, 4 (April 2005), 509–519.
[30] GOLDSTEIN, L. Controllability/Observability Analysis of Digital Circuits. IEEE
Transactions on Circuits and Systems 26, 9 (September 1979), 685–693.
[31] GRAD, J., AND STINE, J. E. A Standard Cell Library for Student Projects. In
Proceedings of the 2003 IEEE International Conference on Microelectronic Sys-
tems Education (June 2003), pp. 98–99.
[32] GUINDI, R. S., AND NAJM, F. N. Design Techniques for Gate-Leakage Reduction
in CMOS Circuits. In Proceedings of the 4th International Symposium on Quality
Electronic Design (March 2003), pp. 61–65.
[33] HANCHATE, N., AND RANGANATHAN, N. LECTOR: A Technique for Leakage
Reduction in CMOS Circuits. IEEE Transactions on Very Large Scale Integration
(VLSI) Systems 12, 2 (February 2004), 196–205.
[34] HE, L., LIAO, W., AND STAN, M. R. System Level Leakage Reduction Consider-
ing the Interdependence of Temperature and Leakage. In Proceedings of the 41st
Design Automation Conference (June 2004), pp. 12–17.
[35] HELLER, L., GRIFFIN, W., DAVIS, J., AND THOMAS, N. Cascode Voltage Switch
Logic: A Differential CMOS Logic Family. In Proceedings of the IEEE Interna-
tional Solid-State Circuits Conference (February 1984), pp. 16–17.
[36] HILLMAN, D. Using Mobilize Power Management IP for Dynamic & Static Power
Reduction in SoC at 130 nm. In Proceedings of the Design, Automation and Test
in Europe (March 2005), vol. 3, pp. 240–246.
[37] HOSSAIN, R., ZHENG, M., AND ALBICKI, A. Reducing Power Dissipation in
CMOS Circuits by Signal Probability based Transistor Reordering. IEEE Trans-
actions on Computer-Aided Design of Integrated Circuits and Systems 15, 3
(March 1996), 361–368.
[38] HUNG, W. L., LINK, G. M., XIE, Y., VIJAYKRISHNAN, N., DHANWADA, N., AND
CONNER, J. Temperature-Aware Voltage Islands Architecting in System-on-Chip
Design. In Proceedings of the IEEE International Conference on Computer Design
(October 2005), pp. 689–694.
[39] IMAN, S., AND PEDRAM, M. POSE: Power Optimization and Synthesis Envi-
ronment. In Proceedings of the 33rd Design Automation Conference (June 1996),
pp. 21–26.
[40] JOHNSON, M. C., SOMASEKHAR, D., AND ROY, K. Models and Algorithms for
Bounds on Leakage in CMOS Circuits. IEEE Transactions on Computer-Aided
Design of Integrated Circuits and Systems 18, 6 (June 1999), 714–725.
[41] KAO, J. T., AND CHANDRAKASAN, A. P. Dual-Threshold Voltage Techniques for
Low-power Digital Circuits. IEEE Journal of Solid-State Circuits 35, 7 (July
2000), 1009–1018.
[42] KAPADIA, H., BENINI, L., AND MICHELI, G. D. Reducing Switching Activity on
Datapath Buses with Control-Signal Gating. IEEE Journal of Solid-State Circuits
34, 3 (March 1999), 405–414.
[43] KUO, G. Low-Power Design Goes Mainstream. EE Times
http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=163101933,
1371 (May 2005), 56.
[44] KURSUN, V., AND FRIEDMAN, E. G. Energy Efficient Dual Threshold Voltage
Dynamic Circuits Employing Sleep Switches To Minimize Subthreshold Leakage.
In Proceedings of the IEEE International Symposium on Circuits and Systems
(May 2004), vol. 2, pp. 417–420.
[45] KURSUN, V., AND FRIEDMAN, E. G. Node Voltage Dependent Subthreshold Leak-
age Current Characteristics Of Dynamic Circuits. In Proceedings of the 5th Inter-
national Symposium on Quality Electronic Design (March 2004), pp. 104–109.
[46] LACKEY, D. E., ZUCHOWSKI, P. S., BEDNAR, T. R., STOUT, D. W., GOULD, S. W.,
AND COHN, J. M. Managing Power and Performance for System-on-Chip Designs
using Voltage Islands. In Proceedings of the IEEE/ACM International Conference
on Computer-Aided Design (November 2002), pp. 195–202.
[47] LAKSHMIKANTHAN, P., MULCHANDANI, S., AND NUNEZ, A. Sizing Analog Cir-
cuits using an Improved Optimization-Based Tool. In Proceedings of the 2nd
IASTED International Conference on Circuits, Signals and Systems (November
2004), pp. 130–135.
[48] LAKSHMIKANTHAN, P., AND NUNEZ, A. Design Issues and Implementation
Strategies for Building On-Chip Voltage Level-Shifting Circuits. In Proceedings
of the 4th IASTED International Conference on Circuits, Signals and Systems
(November 2006), pp. 144–149.
[49] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Novel Methodology To Reduce Leak-
age Power In CMOS Complementary Circuits. In Proceedings of the 16th Inter-
national Workshop on Power And Timing Modeling, Optimization and Simulation
(September 2006), pp. 614–623.
[50] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Novel Methodology To Reduce Leakage
Power In Differential Cascode Voltage Switch Logic Circuits. In Proceedings of
the 3rd IEEE International Conference on Electrical and Electronics Engineering
(September 2006), pp. 335–338.
[51] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Novel Methodology To Reduce Leakage
Power In Master-Slave D Flip-Flops. In Proceedings of the 9th Military
and Aerospace Programmable Logic Devices International Conference (September
2006), pp. 1–7.
[52] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Signal Probability Based
Self-Controlling Leakage Reduction Technique for CMOS Circuits. In Proceedings of
the 4th IEEE International Conference on Electrical and Electronics Engineering
(September 2007), pp. 357–360.
[53] LAKSHMIKANTHAN, P., AND NUNEZ, A. VCLEARIT: A VLSI CMOS Circuit
Leakage Reduction Technique for Nanoscale Technologies. In Proceedings of the
Advanced Low Power Systems Workshop at the 21st ACM International Conference
on Supercomputing (June 2007), pp. 15–22.
[54] LAKSHMIKANTHAN, P., SAHNI, K., AND NUNEZ, A. Design of Ultra-Low
Power Combinational Standard Library Cells Using A Novel Leakage Reduction
Methodology. In Proceedings of the 19th IEEE International System-On-Chip Conference
(September 2006), pp. 93–94.
[55] MAHMOODI-MEIMAND, H., AND ROY, K. Data-Retention Flip-Flops for
Power-Down Applications. In IEEE International Symposium on Circuits and Systems
(May 2004), vol. 2, pp. 677–680.
[56] MUTOH, S., DOUSEKI, T., MATSUYA, Y., AOKI, T., SHIGEMATSU, S., AND
YAMADA, J. 1-V Power Supply High-Speed Digital Circuit Technology with
Multithreshold-Voltage CMOS. IEEE Journal of Solid-State Circuits 30, 8 (August
1995), 847–854.
[57] NARENDRA, S., BORKAR, S., DE, V., ANTONIADIS, D., AND CHANDRAKASAN, A.
Scaling of Stack Effect and its Application for Leakage Reduction. In Proceedings
of the International Symposium on Low Power Electronics and Design (August
2001), pp. 195–200.
[58] NARENDRA, S., DE, V., BORKAR, S., ANTONIADIS, D. A., AND CHANDRAKASAN,
A. P. Full-Chip Subthreshold Leakage Power Prediction and Reduction Techniques
for Sub-0.18-µm CMOS. IEEE Journal of Solid-State Circuits 39, 2 (February
2004), 501–510.
[59] PARR, T. The Definitive ANTLR Reference : Building Domain-Specific Languages.
The Pragmatic Programmers, Texas, USA, 2007, p. 384.
[60] PEDRAM, M., AND RABAEY, J. M. Power Aware Design Methodologies. Kluwer
Academic Publishers, Massachusetts, USA, 2002, p. 544.
[61] PIVIN, D. Pick the Right Package for Your Next ASIC Design. EDN 39, 3 (February
1994), 91–108.
[62] RABAEY, J. M., CHANDRAKASAN, A. P., AND NIKOLIC, B. Digital Integrated
Circuits - A Design Perspective, second ed. Prentice Hall Publishers, New Jersey,
USA, 2002, p. 761.
[63] RAJAPANDIAN, S., SHEPARD, K. L., HAZUCHA, P., AND KARNIK, T.
High-Tension Power Delivery: Operating 0.18µm CMOS Digital Logic at 5.4V. In IEEE
International Solid-State Circuits Conference (February 2005), pp. 298–299.
[64] ROY, K., AND PRASAD, S. Low-Power CMOS VLSI Circuit Design.
Wiley-Interscience, New York, USA, 2000, p. 376.
[65] SHIGEMATSU, S., MUTOH, S., MATSUYA, Y., TANABE, Y., AND YAMADA, J. A
1-V High-Speed MTCMOS Circuit Scheme for Power-Down Application Circuits.
IEEE Journal of Solid-State Circuits 32, 6 (June 1997), 861–869.
[66] SMALL, C. Shrinking Devices Put the Squeeze on System Packaging. EDN 39, 4
(February 1994), 41–46.
[67] SUZUKI, Y., ODAGAWA, K., AND ABE, T. Clocked CMOS Calculator Circuitry.
IEEE Journal of Solid-State Circuits SC-8, 6 (December 1973), 462–469.
[68] THOMAS, D. E., AND MOORBY, P. R. The Verilog Hardware Description Language,
fifth ed. Kluwer Academic Publishers, Massachusetts, USA, 2002, p. 408.
[69] WEI, G. Y., AND HOROWITZ, M. A Fully Digital, Energy-Efficient Adaptive
Power-Supply Regulator. IEEE Journal of Solid-State Circuits 34, 4 (April 1999),
520–528.
[70] WEI, L., CHEN, Z., ROY, K., YE, Y., AND DE, V. Mixed-Vth (MVT) CMOS Circuit
Design Methodology for Low Power Applications. In Proceedings of the 36th
Design Automation Conference (June 1999), pp. 430–435.
[71] YANG, G., WANG, Z., AND KANG, S. Gate Leakage Tolerant Circuits in Deep
Sub-100 nm CMOS Technologies. Smart Materials and Structures 15, 1 (February
2006), S21–S28.
[72] YANG, S. Logic Synthesis and Optimization Benchmarks User Guide Version 3.0.
Microelectronics Center of North Carolina Technical Report (January 1991).
[73] YANG, S., WOLF, W., VIJAYKRISHNAN, N., XIE, Y., AND WANG, W. Accurate
Stacking Effect Macro-Modeling of Leakage Power in Sub-100nm Circuits. In
Proceedings of the 18th International Conference on VLSI Design (January 2005),
pp. 165–170.
[74] YE, Y., BORKAR, S., AND DE, V. A New Technique for Standby Leakage Reduction
in High-Performance Circuits. In IEEE Symposium on VLSI Circuits Digest
of Technical Papers (June 1998), pp. 40–41.
[75] YEO, K. S., AND ROY, K. Low-Voltage, Low-Power VLSI Subsystems.
McGraw-Hill, New York, USA, 2005, p. 293.
[76] YUAN, L., AND QU, G. A Combined Gate Replacement and Input Vector Control
Approach for Leakage Current Reduction. IEEE Transactions on Very Large Scale
Integration (VLSI) Systems 14, 2 (February 2006), 196–205.