
Volume 1, Issue 2, 2007

Novel Energy-Efficient Leakage Current Minimization Techniques for CMOS VLSI Circuits

Preetham Lakshmikanthan, Graduate Fellow, EECS Department, L.C. Smith College of Engineering and Computer Science, Syracuse University, Syracuse, New York, USA. E-mail: [email protected]

Abstract

Leakage power loss is a major concern in deep-submicron technologies. High-performance processors and servers consume enormous amounts of operating power. For portable devices that have burst-mode type integrated circuits, it is acceptable to have leakage during the active mode. However, during the idle state it is extremely wasteful to have leakage, as power is unnecessarily consumed with no useful work being done. Efficient leakage control mechanisms are crucial for saving power.

In this research, we propose novel leakage current minimization techniques for CMOS VLSI circuits. A combination of high-threshold and standard-threshold sleep transistors embedded within the CMOS topology is used for voltage balancing of the Pull-Up Network (PUN) as well as the Pull-Down Network (PDN), thereby shutting them off and minimizing leakage loss. An ultra-low power standard cell library that uses this technique to cancel leakage effects in both the PUN and PDN of CMOS circuits has been characterized for area, delay and power. A signal probability based self-controller was designed for leakage power reduction. It is the core of this work, and it sequences the working of these sleep-embedded cells in any VLSI circuit. Since signal probabilities are used to determine the mode of operation of these cells, no extra external circuitry is needed for this purpose. The ultra-low power standard library consists of 8 combinational and 2 sequential cells.

Experimental results show significant leakage savings (an average of 20.7X) in CMOS circuits employing this sleep circuitry when compared to standard CMOS circuits. A methodology to integrate the ultra-low power library and the self-controller into the low-power synthesis framework is also presented as part of this research. Comparison with other well-established leakage reduction techniques shows that our technique achieves significantly larger leakage savings, with comparable area and delay degradation. Large leakage savings were observed even at higher temperatures. An analysis of these sleep-embedded circuits showed a negligible 0.42% increase in dynamic power dissipation. Our technique was also applied to the Differential Cascode Voltage Switch Logic (DCVSL) class of circuits. An order of magnitude of leakage savings was observed, thereby demonstrating its effectiveness.

Table of Contents

List of Figures
List of Tables
Chapter 1: Introduction
1.1 Sources of Power Dissipation
1.2 Motivation for Leakage Control Mechanisms
1.3 Review of Prior Techniques for Leakage Reduction
1.4 Dissertation Organization
Chapter 2: Low-Power Synthesis Framework
2.1 Framework Overview
2.2 Low-Power Techniques at Different Abstraction Layers
2.3 Contributions of This Dissertation
Chapter 3: VSDCAD Ultra Low-Power Standard Cell Library
3.1 VSDCAD Sleep-Circuitry Embedded CMOS Cells
3.2 Leakage Power Calculation with a CMOS OR2 Circuit Example
3.3 Leakage Savings Compared to the Power-Gating Methodology
3.4 Characterized Low-Power Combinational Cells
3.5 Characterized Low-Power Sequential Cells
3.6 Low-Power Differential Cascode Voltage Switch Logic (DCVSL) Cells
3.7 Active Mode Leakage Loss Increase
3.8 Increase in Dynamic Power Dissipation
Chapter 4: Comparison of VSDCAD with Other Leakage Reduction Techniques
4.1 Experimental Setup
4.2 MCNC’91 VSDCAD Implementation Leakage Values
4.3 Area and Delay Comparison
4.4 Leakage Savings Comparison
Chapter 5: Additional Experimental Research
5.1 Leakage Savings after Replacement of Only Non-Critical Cells
5.2 Effects of Varying Temperature on Leakage Power
Chapter 6: Signal Probability Based Self-Controller
6.1 VCLEARIT Control Circuitry Embedded CMOS Gates
6.2 Gate Control Signal Calculation in Circuits
6.3 Leakage Loss Comparison
6.4 Circuit Delay Comparison
Chapter 7: Dynamic Power Study
7.1 Standard Circuit Dynamic Power
7.2 LECTOR Circuit Dynamic Power
7.3 VCLEARIT Circuit Dynamic Power
Chapter 8: Conclusions
8.1 Summary of Contributions
8.2 Scope for Future Work
Appendix A: Sleep-Embedded Master-Slave POSX DFF Schematic
Appendix B: Automated Gate Control Signal Calculation - Parser Output
Bibliography

List of Figures

1.1 Projected Subthreshold Leakage Power [22]
1.2 LECTOR CMOS Gate [33]
2.1 Overview of Synthesis Framework for Low-Power Design
3.1 Block Diagram - Generic VSDCAD CMOS Circuit
3.2 Sleep-Embedded Cascaded OR2 Gate Schematic
3.3 Output Waveforms for Sleep-Embedded OR2 Gate
3.4 Block Diagram - Generic Power-Gated CMOS Circuit
3.5 Output Waveforms Showing Functioning of the VSDCAD POSX Master-Slave D Flip-Flop in Standard and Sleep Modes of Operation
3.6 Block Diagram - Generic Sleep-Embedded DCVSL Circuit
4.1 Leakage Power Dissipation Comparison
5.1 Temperature Effects On Leakage Power - Standard Mode
5.2 Temperature Effects On Leakage Power - Standby (Sleep) Mode
5.3 Leakage Power Savings at Higher Temperatures
6.1 VCLEARIT CMOS Gate (AND, NAND) with Control Value 0
6.2 VCLEARIT CMOS Gate (OR, NOR) with Control Value 1
6.3 Example illustrating Gate Control Signal Calculation
A.1 Block Diagram - VSDCAD Master-Slave Positive Edged D Flip-Flop
A.2 Schematic - VSDCAD Master-Slave Positive Edged D Flip-Flop
B.1 C17 Circuit Topology

List of Tables

1.1 Summary of Leakage-Power Reduction Techniques and Methodologies
2.1 Low-Power Techniques at Various Design Abstraction Levels [60]
3.1 Standard OR2 Gate: Average Leakage Power Loss = 152.34 pW
3.2 Leakage Comparison - VSDCAD Circuit vs. Power-Gated Circuit
3.3 Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
3.4 Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)
3.5 Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
3.6 Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)
3.7 DCVSL Cell Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology
3.8 Active Mode Leakage Loss - Standard CMOS Circuit vs. Sleep-Embedded CMOS Circuit
3.9 Dynamic Power Dissipation - Standard Circuits vs. VSDCAD Sleep-Embedded Circuits
4.1 PMOS Supply and Threshold Voltage Values for Various BPTM Models
4.2 NMOS Supply and Threshold Voltage Values for Various BPTM Models
4.3 VSDCAD Leakage Values for MCNC’91 Benchmarks for Various Deep-Submicron Technologies
4.4 Leakage Power and Delay Comparison for Two-Input Nand Gate
4.5 Experimental Results for MCNC’91 Benchmarks (70 nm BPTM Process, Supply Voltage = 1V)
5.1 Perf. Chars. of an 8-bit Standard Ripple Carry Adder
5.2 Perf. Chars. of an 8-bit Sleep-Embedded Ripple Carry Adder
6.1 Leakage Power Comparison @ Temperature = 27°C (TSMC’s 180 nm Implementation)
6.2 Leakage Power Comparison @ Temperature = 27°C (BPTM’s 100 nm Implementation)
6.3 Circuit Delay Comparison (TSMC’s 180 nm Implementation)
7.1 Standard Circuit Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)
7.2 LECTOR Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)
7.3 VCLEARIT Dynamic Power Measurement @ Temperature = 27°C (TSMC’s 180 nm Implementation)

Acknowledgments

First and foremost, I would like to thank God Almighty for giving me the strength and patience to realize my dream of getting a Ph.D. degree.

Next, I would like to thank Dr. Adrian Nunez, my dissertation advisor and mentor at Syracuse University. I truly admire his perseverance, depth of knowledge and strong dedication to students and research, which have made him one of the most successful professors ever. His mastery of any topic is amazing, and yet he is such a humble and down-to-earth person. I’m glad that I was given the opportunity to work with him. He brings out the best in his students and I’d like to thank him for all the support, encouragement and guidance given to me during my graduate years. Any student should consider himself or herself extremely fortunate to find a gem of an advisor like Dr. Nunez. Thanks again for everything, Adrian - my friend, philosopher and guide.

Next, I would like to acknowledge Prof. N. Venkateswaran for motivating me and guiding me through my undergrad years. He has always had the confidence that I could get my Ph.D. some day. I have not believed as much in myself as Waran sir did in me. He lives and breathes for his students. What I am today is all because of him. Thank you so much, Sir. I will never forget all that you have done for me.

Next, my heartfelt thanks go to my Ph.D. committee members: Dr. Shobha Bhatia, Dr. Ehat Ercanli, Dr. Can Isik, Dr. Srinivas Katkoori, Dr. Nazanin Mansouri and Dr. Fred Schlereth. All of you have been like co-advisors to me and have helped me a lot with my dissertation right from its inception. I’m extremely proud to have such wonderful and knowledgeable people like yourselves serving on my dissertation committee. I’m really grateful for all your thoughtful insights and suggestions in helping me get exceptional research done towards my doctoral degree. Thank you, Professors.

How can I forget my dissertation “preview” committee, Mr. Henry Jankiewicz of the Graduate Editing Center? Henry did a great job of refining the crappy initial drafts of my dissertation into the awesome manuscript that it is today. Thanks so much for your time, effort and patience, Henry.

Next, I’d like to thank all my VSDCAD lab mates: Chandy, Subbu, Sharlet, Amit, Yogesh, Lu, Siddharth, Sameer, Shweta, Deepak, Neema, Dipti, Mayank, Chetan, Abhishek, Karan, Vikram, Pradyuman and Payal for all the good times at the lab. Now onto the non-VSDCAD folks: S.K, Sashi, Shantanu, Ganji, Ashok, Satish, Smita, Ghanshyam, Ravikumar, Marudhu, Anand Natarajan, Murali, Rosanne, Karen, Maureen, Roni, Sally, Parija, Krishnan, Ganesh, Karthick Jayaraman, Rohan - Netto and Fernandes, Tanu, Murali, Vai, Premal, Ajay Brar, Anand Chandrashekar, Anirudha Krishna, Gulru, Aravind, Srinath, Bharath, Rahul, Harish, Shyam, Navaneeth, Jimmy, Tarun, Roma, Salil, Santosh Singh, Young, Priyank, Deniz, Shivani, Srilatha, Venkat Dharmarajan, Vijay Appadurai, Vishal Chugh, Vishal Kapashi and Yan Zhang to name a few. Thanks so much to all of you for the fun, frolic and great memories here at S.U.

Finally, and above everyone else, I would like to thank My Family for standing by me through all the joys and sorrows that life had to offer. My heartfelt thanks and life-long gratitude go to my Dearest Mother, Mrs. Mala Kanthan and my Loving Father, Major K.L. Kanthan for all the love and affection that they have showered upon both their children. You both are the Best and Most Loving Parents that anyone can hope to have in this entire universe. If not for your constant support, encouragement and sacrifices I would never have made it to this stage in life. I love you so much and am proud to be your son. I still have so much to learn from you and pray that I am re-born as your son in future lives also. I thank my Dear Brother, Gautham for being such a great sibling and putting up with my wonky behavior throughout his life (I’m not done as yet, my dear fellow!). Being the studious type, he continues his quest for knowledge and is currently pursuing his M.B.A. His undying dedication to studies has always been my inspiration to try and study something at least. I’d like to thank My Dearest Rathi Aunty, Dida, Patti and Both Thatha’s who are watching over us and helping my entire family from Up There. I cannot thank Kona Aunty, Sharma Uncle, Nita, Shoba, Krishna and Vinay enough for suggesting that I pursue a Ph.D. at S.U. and also for providing me a home away from home. I am indebted to all of you for life. Last but not least, I’d like to thank my Dearest Wife, Raji for bringing a new meaning and purpose to my otherwise dull life. She was instrumental in my finishing up my dissertation writeup in a record 3 weeks’ time and was my pillar of strength during those last few weeks leading up to my final defense. Thanks so much for your unending support and love, babe!

I’d like to thank all those people who have helped me out in some way or the other, and whose names I’ve inadvertently missed out here. Thank you so much, everyone.


To My Loving Parents, Brother, Raji

and

My Dearest Rathi Aunty

Chapter 1

Introduction

With rapid progress in semiconductor technology, feature sizes have shrunk through the use of deep-submicron processes, thereby enabling extremely complex functionality to be integrated on a single chip. Battery-powered electronic systems form the backbone of the growing market of mobile hand-held devices used all over the world today. In order to maximize battery life, the tremendous computational capacity of portable devices such as notebook computers, personal communication devices (cell phones, pocket PCs, PDAs), hearing aids and implantable pacemakers has to be realized with very low power requirements. With miniaturization and the growing trend towards wireless communication, power dissipation has become a very critical design metric. The longer the battery lasts, the better.

Even with the scaling down of the supply voltage, power dissipation has not diminished. The magnitude of power per unit area has kept growing, and the accompanying problem of heat removal and power dissipation has kept getting worse. Innovative cooling and packaging strategies [61] are of little help for the rapidly increasing power consumption of present day chips. Also, the cost associated with packaging and cooling such devices is becoming prohibitive. In addition to cost, the issue of reliability is a major concern. Every 10°C increase in operating temperature roughly doubles a component’s failure rate [66]. Minimizing power consumption is currently an extremely challenging area of research, especially with the number of on-chip devices doubling every two years [6].
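The reliability rule of thumb quoted from [66] can be turned into a quick estimate. The sketch below is a minimal illustration, assuming the doubling holds uniformly for every 10°C step; the function name `failure_rate_multiplier` and its parameters are hypothetical and do not come from this dissertation.

```python
def failure_rate_multiplier(delta_t_celsius, doubling_step_celsius=10.0):
    """Relative failure rate after a rise in operating temperature.

    Assumption (rule of thumb only): the failure rate doubles for every
    `doubling_step_celsius` degrees of temperature increase.
    """
    return 2.0 ** (delta_t_celsius / doubling_step_celsius)


if __name__ == "__main__":
    # Print the relative failure rate for a few temperature rises.
    for rise in (0, 10, 20, 30):
        factor = failure_rate_multiplier(rise)
        print(f"+{rise} C -> failure rate x{factor:.1f}")
```

Under this assumption a 30°C rise corresponds to three doublings, which is why even modest reductions in dissipated power can matter for reliability.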

Design styles [62] play a key role in determining the power dissipation, performance and supply/threshold scalability of a circuit. Dynamic circuits achieve high levels of performance (speed) and utilize less area. However, they require two operation phases: pre-charging and evaluation. They cannot be scaled easily due to their low noise immunity, and require keeper circuits to restore logic levels. On the other hand, fully Complementary Metal Oxide Semiconductor (CMOS) styles are usually robust, dissipate low power, have fully restored logic levels, and are easily scalable. In general, they require more area (2X transistors for an X-input gate, compared to X+2 in the case of dynamic circuits).

1.1 Sources of Power Dissipation

The power consumed by CMOS circuits can be classified into two categories:

• Dynamic Power Dissipation: For a brief interval during the operation of a circuit, both the PMOS and NMOS devices are “on” simultaneously. The duration of the interval depends on the input and output transition (rise and fall) times. During this time, a path exists between Vdd and Gnd and a short-circuit current flows. However, this is not the dominant factor in dynamic power dissipation. The major component of dynamic power dissipation arises from the transient switching behavior of the nodes. Signals in CMOS devices transition back and forth between the two logic levels, resulting in the charging and discharging of parasitic capacitances in the circuit. Dynamic power dissipation is proportional to the square of the supply voltage. Every time a capacitive node ($C_L$) switches from Vdd to Gnd (and back), energy of $C_L V_{dd}^2$ is consumed. In deep-submicron processes, supply voltages and threshold voltages for MOS transistors are greatly reduced. This, to an extent, reduces the dynamic power dissipation.

• Static Power Dissipation: This is the power dissipation due to leakage currents which flow through a transistor when no transitions occur and the transistor is in a steady state. Leakage power depends on gate length and oxide thickness. It varies exponentially with threshold voltage and other parameters. Reduction of supply voltages and threshold voltages for MOS transistors, which helps to reduce dynamic power dissipation, becomes disadvantageous in this case. As the threshold voltage is lowered, the subthreshold leakage current increases exponentially, thereby increasing static power dissipation. The main components of leakage current [29] in a MOS transistor are:

– Reverse-biased junction leakage current: Junction leakage occurs from the source or drain to the substrate through the reverse-biased diodes when the transistor is off.

– Gate induced drain leakage: This is caused by the high field effect in the drain junction of MOS transistors. It is made worse by high drain to body voltage and high drain to gate voltage.

– Gate direct tunneling leakage: Gate leakage flows from the gate through the oxide insulation layer to the substrate. Direct tunneling current is significant for low oxide thickness. The gate leakage of a PMOS device is typically one order of magnitude smaller than that of an NMOS device with identical Tox and Vdd.

– Subthreshold (weak inversion) leakage: This is the drain to source current of a transistor operating in the weak inversion region, when the gate to source voltage ($V_{GS}$) is below the transistor threshold voltage ($V_{th}$). Equation 1.1 approximates the subthreshold leakage current [64] of a MOSFET.

$$I_{sub} = A \, e^{\theta} \left[ 1 - e^{\left( -\frac{q V_{DS}}{kT} \right)} \right] \tag{1.1}$$

where

$$A = \mu_0 C_{ox} \frac{W}{L} \left( \frac{kT}{q} \right)^2 e^{1.8}$$

and

$$\theta = \left[ \frac{q}{n' kT} \left( V_{GS} - V_{th0} - \gamma' V_s + \eta V_{DS} \right) \right]$$

$\mu_0$ is the carrier mobility; $C_{ox}$ is the gate oxide capacitance per unit area; $W$ and $L$ denote the transistor width and length; $kT/q$ is the thermal voltage at temperature $T$; $n'$ is the subthreshold swing coefficient of the transistor; $V_{GS}$ is the gate to source voltage of the transistor; $V_{th0}$ is the zero-bias threshold voltage; $\gamma' V_s$ is the body effect, where $\gamma'$ is the linearized body effect coefficient; and $\eta$ is the Drain Induced Barrier Lowering (DIBL) coefficient. $V_{DS}$ is the drain to source voltage of the transistor.

$$P_{leak} = \sum_{i} I_{sub_i} \, V_{DS_i} \tag{1.2}$$

Equation 1.2 gives the total leakage power for all the transistors.

Minimization of subthreshold leakage is the primary goal in this research work.
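The two categories of power dissipation described in this section can be put into numbers with a short script. The sketch below is a minimal illustration: it evaluates the switching energy $C_L V_{dd}^2$, the subthreshold current of Equation 1.1 and the leakage sum of Equation 1.2. All function names, default parameter values and the device numbers in the usage section are hypothetical and chosen only for illustration; they are not taken from this dissertation or from any process kit.

```python
import math

K_BOLTZMANN = 1.380649e-23    # Boltzmann constant, J/K
Q_ELECTRON = 1.602176634e-19  # elementary charge, C


def thermal_voltage(temp_kelvin):
    """Thermal voltage kT/q in volts."""
    return K_BOLTZMANN * temp_kelvin / Q_ELECTRON


def subthreshold_current(mu0, cox, width, length, vgs, vds, vth0,
                         n_prime=1.5, gamma_prime=0.0, vs=0.0, eta=0.0,
                         temp_kelvin=300.0):
    """Equation 1.1: I_sub = A * exp(theta) * (1 - exp(-q*V_DS/kT))."""
    vt = thermal_voltage(temp_kelvin)
    a = mu0 * cox * (width / length) * vt ** 2 * math.exp(1.8)
    theta = (vgs - vth0 - gamma_prime * vs + eta * vds) / (n_prime * vt)
    return a * math.exp(theta) * (1.0 - math.exp(-vds / vt))


def leakage_power(devices):
    """Equation 1.2: sum of I_sub * V_DS over (I_sub, V_DS) pairs."""
    return sum(i_sub * vds for i_sub, vds in devices)


def switching_energy(c_load, vdd):
    """Energy C_L * Vdd^2 consumed per full charge/discharge cycle."""
    return c_load * vdd ** 2


if __name__ == "__main__":
    # Hypothetical device values, for illustration only (SI units).
    common = dict(mu0=0.04, cox=8e-3, width=1e-6, length=0.18e-6,
                  vgs=0.0, vds=1.8)
    for vth0 in (0.5, 0.4, 0.3):
        i_sub = subthreshold_current(vth0=vth0, **common)
        print(f"Vth0 = {vth0:.1f} V -> I_sub = {i_sub:.3e} A")
    print(f"Switching energy: {switching_energy(10e-15, 1.8):.3e} J")
```

Lowering `vth0` in the loop shows the exponential growth of the subthreshold current that the static power discussion refers to, while `switching_energy` shows the quadratic dependence of dynamic energy on the supply voltage.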

1.2 Motivation for Leakage Control Mechanisms

Cell phones and pocket PCs have burst-mode type integrated circuits, which for the majority of the time are in an idle state. For such circuits, it is acceptable to have leakage during the active mode. However, during the idle state it is extremely wasteful to have leakage, as power is unnecessarily consumed with no useful work being done. Given the present advances in power management techniques [60, 64, 75], leakage loss is a major concern in deep-submicron technologies, as it drains the battery, even when a circuit is completely idle.

Power dissipation of high-performance processors and servers is predicted to increase linearly over the next decade [26]. The 2006 International Technology Roadmap for Semiconductors [6] projects power dissipation to reach 198 Watts in the year 2008 and 300 Watts by the year 2018. Multi-core integrated processors [1, 5] deliver significantly greater compute power through concurrency, offer greater system density and run at lower clock speeds, thereby reducing thermal dissipation and power consumption to an extent. Leakage power will contribute the majority of the total power consumption for such servers fabricated with deep-submicron technologies.

[Figure 1.1: Projected Subthreshold Leakage Power [22]. Vertical axis: subthreshold leakage (Watts), log scale from 0.1 to 100. Horizontal axis: technology (0.25u, 0.18u, 0.13u, 90nm, 65nm, 45nm).]

Figure 1.1 shows subthreshold leakage power trends [22] in accordance with Moore’s law. Clearly, with deep-submicron processes, chips will leak excessive amounts of power. By the year 2020, leakage is expected to increase 32 times per device [6]. This is a major challenge in scaling down designs, and it motivates the need for efficient leakage control mechanisms to minimize power overheads in circuits designed with deep-submicron technologies.

An ultra-low power standard cell library was implemented as part of this research. A novel voltage balancing strategy using sleep transistors to reduce leakage power was used in implementing the CMOS standard cells. Our technique significantly reduces leakage power, with savings of 20.7X (on average) for various standard cells designed with a 180 nm process technology. Designers and automated synthesis tools can select components from this library to build energy-efficient circuits. A signal probability based self-controller technique was also developed to integrate the low-power standard cell library into the low-power synthesis framework.

1.3 Review of Prior Techniques for Leakage Reduction

A lot of interesting research work has been done in the attempt to minimize leakage power. Listed below are some publications related to our work, each having its own unique features:

Duarte et al. present a survey of leakage minimization techniques in [27]. They list the benefits and limitations of various techniques and optimizations applied at run-time.

Pedram et al. give a tutorial on various representative power minimization techniques at the register transfer level (RTL) in [14].

Ye et al. [74] show that the “stacking” of two off devices significantly reduces subthreshold leakage, compared to a single off device. These stacks are series-connected devices between supply and ground (e.g., PMOS stack in NOR or NMOS stack in NAND gates). Their technique enables leakage reduction during standby mode by input vector activation. It involves extensive circuit simulations to find a vector that, when installed at the inputs of the circuit, maximizes the number of PMOS or NMOS stacks with more than one off device.
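The quantity being maximized here, the number of stacks with more than one off device, is easy to compute once the input values at each gate are known. The sketch below is a minimal illustration, assuming ideal logic levels and only NAND and NOR gates; the function names and the tuple-based gate representation are hypothetical and are not taken from [74].

```python
def off_devices_in_stack(gate_type, inputs):
    """Count the off transistors in the series stack of a NAND or NOR gate.

    NAND: the NMOS devices are in series; an NMOS is off when its input is 0.
    NOR:  the PMOS devices are in series; a PMOS is off when its input is 1.
    """
    if gate_type == "NAND":
        return sum(1 for bit in inputs if bit == 0)
    if gate_type == "NOR":
        return sum(1 for bit in inputs if bit == 1)
    raise ValueError(f"unsupported gate type: {gate_type}")


def stacks_with_multiple_off(gates):
    """Number of gates whose series stack has more than one off device."""
    return sum(1 for gate_type, inputs in gates
               if off_devices_in_stack(gate_type, inputs) > 1)


if __name__ == "__main__":
    # Hypothetical gate states produced by some standby input vector.
    gate_states = [("NAND", (0, 0)), ("NAND", (0, 1)), ("NOR", (1, 1))]
    print(stacks_with_multiple_off(gate_states))
```

A standby vector search would evaluate this count (or a simulated leakage value) for candidate input vectors and keep the best one.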

Chen et al. [25] performed an analysis of subthreshold leakage through a stack of n transistors. A genetic algorithm based technique was used to determine the bounds for leakage power in various CMOS circuits. As part of their analysis, they determined a set of test vectors which places corresponding circuits in the low-power standby mode.

Yang et al. [73] present an accurate macro-model for the stacking effect on leakage

power for sub-100 nm circuits.

Narendra et al. [58] present a full-chip subthreshold leakage current prediction model. In [57], they use a stack-forcing method to reduce subthreshold leakage. This is achieved by forcing a non-stack transistor of width ‘W’ to a series-stack of two transistors, each of width ‘W/2’. This effective method does not affect the input load and the switching power. However, there is a delay penalty to be incurred as a result of this stack-forcing. Hence, this technique can be used only on devices in paths that are non-critical.

Hanchate et al. [33] propose a technique called LECTOR for designing CMOS gates, which cuts down leakage current by adapting the technique of effective stacking of transistors. Experimental results obtained using leakage reduction techniques described later in this research are compared with the LECTOR results. Figure 1.2 illustrates the topology of a LECTOR CMOS gate. Two Leakage Control Transistors (LCTs), LCT1 and LCT2, are introduced between nodes N1 and N2. The gate terminal for each LCT is controlled by the source of the other. Hence, these LCTs act as self-controlled stacked transistors. No external control circuitry is required using the LECTOR implementation. The introduction of LCTs increases the resistance of the path from Vdd to Gnd, thereby reducing leakage.

[Figure 1.2: LECTOR CMOS Gate [33]. Schematic labels: in 1 ... in n, PUN, PDN, LCT1, LCT2, N1, N2, Vdd, Gnd, out.]

Mutoh et al. [56] were the pioneers of Multi-Threshold voltage CMOS (MTCMOS) circuits. Here, low-threshold (low-VT) transistors that are fast and leaky are used to implement speed-critical logic. High-threshold (high-VT) devices that are slower, but have low subthreshold leakage, are used as sleep transistors. Multi-threshold voltage circuits have degraded noise immunity when compared to standard low-threshold voltage circuits. The sleep transistor has to be sized properly to decrease its voltage drop when it is on. A sleep control scheme was introduced for efficient power management. Since data retention was required in standby mode, this work was extended, and an extra high-VT memory circuit was introduced in [65].


Wei et al. present a mixed-Vth CMOS circuit design methodology in [70].

Kao et al. [41] used MTCMOS for power-gating. A method to size sleep transistors, based on a mutual exclusion discharge pattern principle, is described. The introduction of extra devices in series with the power supplies leads to a performance penalty. Automated sizing of sleep transistors can be done using the technique illustrated in [47] by Lakshmikanthan et al.

Agarwal et al. present a technique in [17] for power-gating with multiple sleep modes.

Bhunia et al. present a novel circuit technique in [21] to minimize power dissipation in

combinational circuits. This is achieved by inserting extra supply-gating transistors in

the supply to ground paths of the circuit. They assume that the sleep/wake-up signals

to control these gating transistors are generated from an external power management

unit. In the active mode, the gating transistor is on and the circuit behaves as usual.

In the standby mode, the gating transistor is turned off, thereby cutting off power to

the circuit. In [13], Abdollahi et al. consider another important objective, which is

limiting the number of sleep transistors.

Yuan et al. [76] apply Input Vector Control (IVC) techniques for leakage power reduction. IVC utilizes the transistor stack effect in CMOS gates by applying a Minimum Leakage Vector (MLV) to the primary inputs of combinational circuits during the standby mode. The MLV problem is NP-Complete. Typically, an exhaustive circuit simulation is performed for all input patterns, to find the pattern with the minimum leakage current. However, this approach is not practical for large circuits. In their work, Yuan et al. replace internal gates in their worst leakage states by other library gates, while maintaining the correct functionality of the circuit during the active mode. They present a divide-and-conquer approach that integrates gate replacement and an optimal MLV searching algorithm for tree circuits.
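The exhaustive baseline mentioned in this paragraph (trying every input pattern) can be sketched for a toy circuit. This is a minimal sketch under stated assumptions, not the method of [76]: the three-gate NAND2 netlist, the per-state leakage table and all names below are hypothetical, and the leakage values are arbitrary units chosen only so that the state with both series NMOS devices off leaks least.

```python
from itertools import product

# Hypothetical NAND2 leakage per input state (arbitrary units).
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 6.0}

# Hypothetical netlist of NAND2 gates in topological order:
# (output net, first input net, second input net).
NETLIST = [("n1", "a", "b"), ("n2", "b", "c"), ("out", "n1", "n2")]
PRIMARY_INPUTS = ["a", "b", "c"]


def circuit_leakage(vector):
    """Total leakage of the netlist for one primary-input vector."""
    values = dict(zip(PRIMARY_INPUTS, vector))
    total = 0.0
    for out, x, y in NETLIST:
        state = (values[x], values[y])
        total += NAND2_LEAKAGE[state]
        values[out] = 0 if state == (1, 1) else 1  # NAND logic function
    return total


def exhaustive_mlv():
    """Try all 2^n input vectors and return one with minimum leakage."""
    candidates = product((0, 1), repeat=len(PRIMARY_INPUTS))
    return min(candidates, key=circuit_leakage)


if __name__ == "__main__":
    mlv = exhaustive_mlv()
    print(mlv, circuit_leakage(mlv))
```

The loop over `product((0, 1), repeat=n)` visits 2^n vectors, which is why this approach stops being practical for circuits with many primary inputs.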

Johnson et al. [40] show that a particular ordering of the inputs could potentially make use of the well-known stack effect technique to reduce leakage overheads. Since practical circuits do not consist of only a single transistor stack, a procedure to evaluate the leakage of a CMOS circuit, given a set of logic signal inputs, is explained. They iteratively choose the input with the largest leakage observability and assign it a value that results in the smallest leakage. The input combination constructed by this greedy heuristic was taken as the MLV.
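The shape of such a greedy heuristic can be sketched as follows. This is a minimal sketch under stated assumptions, not the procedure of [40]: "leakage observability" is taken here to mean the absolute difference between the average circuit leakage with an input fixed at 1 and with it fixed at 0, and the averages are computed by brute force over the remaining free inputs, which is only workable for toy circuits. The leakage table, the toy circuit and all function names are hypothetical.

```python
from itertools import product

# Hypothetical NAND2 leakage per input state (arbitrary units).
NAND2_LEAKAGE = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 3.0, (1, 1): 6.0}


def toy_leakage(vector):
    """Leakage of a toy circuit: NAND(a,b), NAND(b,c), NAND of both outputs."""
    a, b, c = vector
    n1 = 1 - (a & b)
    n2 = 1 - (b & c)
    return (NAND2_LEAKAGE[(a, b)] + NAND2_LEAKAGE[(b, c)]
            + NAND2_LEAKAGE[(n1, n2)])


def average_leakage(leakage, n_inputs, assigned):
    """Average leakage over all completions of a partial assignment."""
    free = [i for i in range(n_inputs) if i not in assigned]
    totals = []
    for bits in product((0, 1), repeat=len(free)):
        vector = [0] * n_inputs
        for index, value in assigned.items():
            vector[index] = value
        for index, value in zip(free, bits):
            vector[index] = value
        totals.append(leakage(tuple(vector)))
    return sum(totals) / len(totals)


def greedy_mlv(leakage, n_inputs):
    """Fix one input per step: largest observability, lower-leakage value."""
    assigned = {}
    while len(assigned) < n_inputs:
        best = None  # (observability, input index, chosen value)
        for i in range(n_inputs):
            if i in assigned:
                continue
            avg0 = average_leakage(leakage, n_inputs, {**assigned, i: 0})
            avg1 = average_leakage(leakage, n_inputs, {**assigned, i: 1})
            observability = abs(avg1 - avg0)
            if best is None or observability > best[0]:
                best = (observability, i, 0 if avg0 <= avg1 else 1)
        assigned[best[1]] = best[2]
    return tuple(assigned[i] for i in range(n_inputs))


if __name__ == "__main__":
    vector = greedy_mlv(toy_leakage, 3)
    print(vector, toy_leakage(vector))
```

A greedy pass of this kind fixes one input per iteration, so it is not guaranteed to find the true minimum; a practical implementation would also replace the brute-force averaging with a cheaper estimate.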

Abdollahi et al. [12] propose a technique to directly control the value of internal nodes to reduce leakage. They add PMOS and NMOS transistors to some of the gates in the circuit to increase the controllability of the internal signals of the circuit and decrease the leakage current of the gates using the “stack effect”. Boolean satisfiability (SAT) is then used to formulate the problem, which is subsequently solved using efficient off-the-shelf SAT-solvers [18]. More precisely, given a combinational circuit description, they first construct a boolean network which computes the total leakage of that circuit. From this Leakage Computing Network (LCN), they write a set of boolean clauses that capture the leakage current of the original circuit. A SAT-solver is then used to find the MLV. The time complexity of the SAT solver, however, is exponential in the worst case.

Kursun et al. evaluate the subthreshold leakage current characteristics of domino logic

circuits in [45]. They show that a discharged dynamic node is preferred for reducing

leakage current in a dual-VT circuit. Alternatively, a charged dynamic node is better

suited for lower leakage in a low-VT circuit. The keeper and output inverter have to be

sized in a dual-VT domino circuit with a high-VT keeper, in order to provide noise

immunity similar to that of a low-VT domino logic circuit. The work in [44] employs these techniques,

coupled with sleep transistor switches, for placing idle domino circuits in a low leakage

state. A high-VT NMOS sleep transistor is connected in parallel with the dynamic node

of domino logic circuits. In the standby mode of operation, the pull-up transistor of the

domino circuit is off, while the NMOS sleep transistor is turned on. The dynamic node

of the domino gate is discharged through the sleep transistor, thereby significantly re-

ducing the subthreshold leakage current.

To achieve low-power benefits without compromising performance, static and dynamic

scaling of supply voltages can be applied. Static supply-scaling is a multiple supply ap-

proach in which critical and non-critical paths are clustered and powered by higher and

lower supply voltages, respectively. Since the speed requirements of the non-critical

clusters are lower than the critical ones, the supply voltage of non-critical clusters can

be lowered without degrading performance. Hillman, in his work in [36], designs an

SoC, using clusters of components characterized at various voltage levels. Whenever

a node from a low-voltage cluster needs to drive a node of a high-voltage cluster (or

vice-versa), a level-conversion is needed at the interface. The secondary voltages may

be generated off-chip [69] or on-chip [63]. Design issues and implementation strategies

for building on-chip dc-dc voltage level-shifting circuits are presented by Lakshmikanthan

et al. in [48]. Dynamic supply scaling is much harder to implement, but avoids the

cost of a second supply voltage by adapting the single supply voltage to performance

demand.

Iman et al. present POSE, a Power Optimization and Synthesis Environment [39],

for designing low-power digital circuits at the logic level. POSE provides a unified

framework for specifying and maintaining power relevant circuit information. Power

optimization techniques were developed with area-power trade-offs. Low-power optimization algorithms provided in POSE are classified into three categories: Algebraic Restructuring Techniques, Node Simplification, and Technology Mapping. Experimental results show an average power reduction of 29% at the expense of an average area increase of 30%, while the delay of the circuits increased by 4%. This shows a clear trade-off between area and power, with circuit delay largely unaffected.

Abdollahi et al. [15] present a precomputation-based guarding methodology for reduc-

ing both dynamic and static power consumption in CMOS VLSI circuits. Precompu-

tation logic duplicates part of the logic by precomputing the circuit output values, one

clock cycle before they are required. It is a method in which some inputs of a circuit are

frozen, while some smaller circuit computes the output values. Unlike precomputation,

guarded logic does not require synthesis of additional logic to implement the shutdown

mechanism. It exploits the existing signals in the original circuit, and no changes

to the original combinational circuitry are needed. Guarded evaluation involves de-

termining which parts of a circuit are computing useful results and which parts are

computing results that are not used. The unnecessary portions can be shut off. If the

guarding signal itself switches frequently, the power dissipation of the switching sleep

transistors may outweigh the power saving due to guarding. Hence, in their work,

Abdollahi et al. propose a method to generate a new guard signal, based on the most

recent values of the original guarding signal.

Most of the techniques listed above are not complete for RTL synthesis of low-power

circuits. They require an external controller that sequences the working of the entire

circuit. The controller should be able to identify and differentiate between portions of a

circuit that are active (switching) and parts of the circuit that are inactive. “Sleep” sig-

nals should be generated automatically to synchronize the operation of the datapath

(design), thereby switching devices back and forth between active and standby modes

of operation. This sleep/enable controller generation is assumed to be present in most

of the prior research work.

Elkarablieh et al. [28] present a synthesis technique for reducing leakage power, based

on signal controllability chains. Local re-synthesis of a large fan-in gate into smaller

sleep-embedded gates that achieve the same functionality is suggested. The sleep sig-

nals controlling the corresponding smaller gates could be judiciously picked from the

pattern combination at the input of the original (large fan-in) gate. Signal controllabil-

ity measures shown in [30] predict how controllable the output of a circuit is. This idea

is used in their work to determine which signal should be used to place some portion

of the circuit in sleep mode. They define controllability as the length of the chain of

gates driven by a signal whose output is controlled by the value of the signal. The idea

is basically to assign sleep signals using lines with the longest controllable chains. A

mathematical model for the estimated power saving is presented.

Calhoun et al. identify sneak leakage paths and present a set of design rules in [23].

They partition the Configurable Logic Blocks (CLBs) of the target Field Programmable

Gate Array (FPGA) architecture into four sleep regions: a Look-Up Table (LUT) region,

an adder region, a flip-flop region and a control circuitry region. The configuration

bits tell each CLB how to organize its internal parts at run time. These configuration

bits also act as control signals for the sleep regions. Minimal control logic is required

for deciding when to assert the sleep signal for each local sleep region. The FPGA

architecture inherently avoids many interfacing problems for sleep regions by using

transmission gate multiplexors.

Various overheads, like the routing of the sleep control signals, increased area, and

an excess delay penalty due to repeatedly turning the circuit on and off, are observed

for external controller-based leakage reduction circuits. In this research work, all the

components needed to build low-leakage power circuits are completely integrated, in-

cluding the low-power component library and the self-controlling leakage reduction

technique.

Table 1.1 gives the classification, advantages and disadvantages of the leakage-power

reduction techniques and methodologies described previously in this section. Each

technique has its own unique features, and no single technique can be claimed

to be better than the others. As can be seen from the advantages and disadvantages

of each methodology, achieving low power consumption involves trading off

various performance parameters, like area, delay and throughput.

1.4 Dissertation Organization

This dissertation is organized as follows: In Chapter 2, a low-power synthesis frame-

work is presented. Enumerated in this chapter are the significant contributions of this

dissertation and how they fit into a typical synthesis environment. Chapter 3 is a de-

tailed description of the design and methodology used in the development of the ultra

low-power standard cell RTL component library at the VLSI Systems Design and CAD

(VSDCAD) Laboratory at Syracuse University. The characterization of combinational

CMOS cells is presented first, followed by the characterization of sequential circuits.

Experimental results for various classes of circuits are tabulated illustrating signifi-

cant leakage savings in all cases. Chapter 4 is a comparison of the VSDCAD technique

with other well-established leakage reduction techniques. Crucial design constraints,

such as area, delay and leakage savings, are the important factors considered for this

comparison. Chapter 5 presents additional experimental results pertinent to leakage

reduction, and includes a study of leakage effects at higher temperatures. Chapter 6

explains the most significant contribution of this dissertation: the signal probability

based VCLEARIT self-controller circuitry. The procedure to calculate the gate control

signal is explained with an example. Experimental results are presented, demonstrat-

ing the effectiveness of the self-controlled circuits. Chapter 7 discusses the effects of

adding extra leakage reduction circuitry on the dynamic power of circuits. Finally,

Chapter 8 details the conclusions drawn from this dissertation work and presents the

scope for future research.

Table 1.1: Summary of Leakage-Power Reduction Techniques and Methodologies

| Methodology Description | Related Research Publications | Advantages | Disadvantages |
|---|---|---|---|
| “Stack” effect due to leakage control transistor insertion | [25, 33, 57, 58, 74] | No controller to generate sleep/enable signals; no level converters for voltage scaling | Increased area; degraded circuit delay |
| “Stack” effect due to Input Vector Control (IVC) and Minimum Leakage Vector method (MLV) | [12, 34, 40, 76] | No process technology modification; no negative impact with technology scaling; no change in logic circuitry | Exhaustive simulation; needs modeling as boolean satisfiability problem, then use SAT or ILP solvers |
| Sleep transistor insertion and/or MTCMOS circuits; gating the supply voltage | [41, 44, 45, 56, 65] (sleep transistor insertion and/or MTCMOS); [13, 21] (gating the supply voltage) | No exhaustive simulation; no level converters for voltage scaling; widely accepted and used technique among the research community | Implement sleep controller; diligently size sleep transistor; increased area; degraded circuit delay; state-retention problem; mixed process technology for circuit fabrication |
| Static/Dynamic voltage scaling | [24, 36, 48, 63, 69] | Use of mixed voltage supplies for different portions of the circuit, resulting in lower overall power consumption | Increased area; need for circuit partitioning; intricate tuning of level converters; not cost effective; degraded circuit delay |

Chapter 2

Low-Power Synthesis Framework

Synthesis is the process of transforming the design from one level of abstraction to

another. CAD research has progressed from low levels of abstraction to higher levels of

abstraction through circuit, logic, register-transfer and behavioral level synthesis. The

input to a typical synthesis environment is a high-level design specification, which is

then transformed into various levels of abstraction, using High Level Synthesis (HLS),

Logic Synthesis and finally Layout Synthesis processes. Synthesis environments try

to satisfy various user-defined design constraints, like area, timing, speed, through-

put and power, to name a few. With present-day design complexities, it is extremely

difficult or impossible to satisfy all the constraints simultaneously. Hence, judicious

trade-offs among various constraints are made, and the design that best matches the

user requirements is returned.

In the previous chapter, an overview of the sources of power dissipation was given,

and the need for leakage control mechanisms and the various issues involved were

explained. In this chapter, a synthesis framework for low-power design is presented,

and the important contributions of this dissertation are listed. The primary goal of

this work is to optimize and produce low-power designs, with an emphasis on leakage power savings, while at the same time honoring other user constraints like area and timing.

[Figure 2.1 flowchart blocks: Behavioral Design Specification (VHDL/Verilog); Architectural/Design Constraints (Area, Speed & Power); Behavioral Simulation (Cadence/Synopsys) with Test Vectors; High-Level Synthesis (Cadence/Synopsys) with RTL Component Library (TSMC Standard Cells); RTL VHDL/Verilog; Critical Path Tracing; Slack Calculation; Replacing Non-Critical Cells Having Enough Slack With Sleep-Embedded Cells, with RTL Component Library (TSMC Standard Cells and VSDCAD Sleep-Embedded Low-Power Cells); RTL Simulation (Cadence/Synopsys) with Test Vectors; Controller Design For Leakage Power Reduction; Controller + Design RTL VHDL/Verilog; Layout Synthesis (Cadence/Synopsys); Layout Simulation (Cadence/Synopsys) with Test Vectors; GDSII File; Chip Fabrication]

Figure 2.1: Overview of Synthesis Framework for Low-Power Design

2.1 Framework Overview

The overview of a synthesis framework for low-power design is illustrated in Fig-

ure 2.1. This framework is a top-down approach and consists of various synthesis

phases (HLS, Register Transfer Level (RTL), and Layout). The user specifies behavioral

designs in subsets of either Verilog [10, 68] or VHDL [11, 19]. Subsets of Verilog or

VHDL are used, since not all constructs of these languages are supported by synthesis.

The various architectural and design constraints, such as area, speed and power, are

also specified by the user. Behavioral simulation is performed to check for the func-

tional correctness of the design.

Next, high-level synthesis is performed on the functionally correct behavioral design.

Commercial tools like Cadence [4] or Synopsys [8] are used for HLS. An RTL compo-

nent library (e.g., TSMC’s 180 nm standard cell library [9]) is used by the HLS system

to generate the RTL Verilog or VHDL code. Critical path analysis is then performed

on the generated RTL code, using Cadence tools or Synopsys PrimeTime, and the slack

times are calculated for all the standard cells used in the circuit.

Depending on available slack time, the standard cell gates on the non-critical paths of

the circuit are replaced with special sleep-embedded low-power cells that perform the

same functionality. These sleep-embedded cells are selected from an ultra low-power

RTL standard cell component library, which was developed as part of this research

at the VLSI Systems Design and CAD (VSDCAD) Laboratory at Syracuse University.

A combination of high-threshold and standard-threshold sleep transistors embedded

within the CMOS topology was used in voltage balancing of the pull-up network as

well as the pull-down network of CMOS circuits, thereby shutting them off and mini-

mizing leakage loss.

The gates on the critical path are unchanged components from the original standard

cell library (TSMC). Whenever non-critical cells are replaced due to availability of slack

time, a regressive search is carried out on all paths of the circuit to ensure that the

critical path has not changed. If the critical path has changed, then the replacement

procedure is backed out and this replacement process is tried on other non-critical cells

in the design. This ensures that the timing of the circuit is not affected, as the original

critical path has not changed. However, the overall circuit leakage power decreases

due to the introduction of low-power cells in non-critical paths. The resultant Verilog

or VHDL RTL datapath (design) code is a mixture of components from the original

standard cell library (TSMC) as well as from the VSDCAD low-power cell library. RTL

simulations are performed to check if the mixed library cell datapath (design) func-

tions according to specifications.
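The replacement procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the tool flow used in this work: the function names, the three-gate netlist and all delay values are hypothetical, and the check compares the longest-path delay before and after each tentative swap instead of invoking a commercial timing tool.

```python
from collections import defaultdict


def longest_path_delay(gates, edges, delay):
    """Return the longest (critical) path delay of a combinational DAG.

    gates: gate names listed in topological order (hypothetical netlist).
    edges: list of (driver, load) pairs.
    delay: dict mapping gate name -> propagation delay in ps.
    """
    fanin = defaultdict(list)
    for driver, load in edges:
        fanin[load].append(driver)
    arrival = {}
    for gate in gates:
        latest_input = max((arrival[d] for d in fanin[gate]), default=0.0)
        arrival[gate] = latest_input + delay[gate]
    return max(arrival.values())


def replace_non_critical(gates, edges, std_delay, sleep_delay):
    """Try a sleep-embedded cell for each gate; keep it only if timing holds."""
    delay = dict(std_delay)
    target = longest_path_delay(gates, edges, delay)
    replaced = []
    for gate in gates:
        original = delay[gate]
        delay[gate] = sleep_delay[gate]      # tentative replacement
        if longest_path_delay(gates, edges, delay) > target:
            delay[gate] = original           # back out: timing would degrade
        else:
            replaced.append(gate)
    return replaced, target


# Hypothetical example: g1 and g2 both drive g3; g1 -> g3 is the critical path.
gates = ["g1", "g2", "g3"]
edges = [("g1", "g3"), ("g2", "g3")]
std_delay = {"g1": 100.0, "g2": 40.0, "g3": 60.0}      # made-up delays (ps)
sleep_delay = {"g1": 130.0, "g2": 70.0, "g3": 90.0}    # made-up delays (ps)

replaced, target = replace_non_critical(gates, edges, std_delay, sleep_delay)
print(replaced, target)
```

In this toy netlist only g2 has enough slack to absorb the slower sleep-embedded cell; swapping g1 or g3 would lengthen the critical path, so those swaps are backed out.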

An RTL controller is then created to generate sleep signals automatically and to switch the circuit between operating and standby (sleep) modes. This controller may be a Finite State Machine (FSM), a micro-controller, or a self-control circuit for leakage power reduction. Generating it is a complex process in which various trade-offs are involved. An FSM or a micro-controller would be relatively easy to design, but would consume extra area, have very slow switching times between the sleep and wakeup modes of operation, and create routing congestion for the various control signals in the circuit layout. A self-controlling circuit would be more complex to design, but would alleviate these disadvantages. In this work, we have designed a signal probability based self-controller for leakage reduction. RTL simulations are then performed to verify that the controller and datapath work in synchronization with each other and to check for timing issues.

Finally, layout synthesis is performed on the RTL code and parasitic extraction done

using Cadence or Synopsys tools. The extracted layout is then simulated before the

GDSII file is sent out to the foundry, so that the chip can be fabricated.

2.2 Low-Power Techniques at Different Abstraction Layers

Low-power design techniques can be applied at various levels of design hierarchy [60]:

the system level, the algorithm (behavior) level, the architecture (structure) level, the

circuit/logic level and the fabrication (technology) level.

Table 2.1 provides a summary of the various techniques that can be applied at each

abstraction level. It shows the sheer complexity of the low-power design problem at all

levels of abstraction. This dissertation presents techniques for leakage reduction and

low-power design at the circuit/logic (RTL) level of abstraction.

Table 2.1: Low-Power Techniques at Various Design Abstraction Levels [60]

| Abstraction Level | Technique Name(s) |
|---|---|
| System | Partitioning, Power-Down, Power-States |
| Algorithm | Complexity, Concurrency, Regularity, Locality |
| Architecture | Parallelism, Pipelining, Redundancy, Data Encoding |
| Circuit/Logic | Logic Styles, Logic Manipulation, Transistor Sizing, Energy Recovery |
| Fabrication Technology | Threshold Reduction, Multi-Threshold Devices |

2.3 Contributions of This Dissertation

In the previous section, the overview of a standard synthesis environment for generat-

ing low-power designs was presented. This section summarizes the important features

of the work done for this dissertation. The shaded boxes in Figure 2.1 show where the

major contributions of this dissertation fit into the core segment of the RTL synthe-

sis flow for low-power design. These contributions are enumerated and categorized as

follows:

• Design and Development of the VSDCAD Sleep-Embedded Topology for

Leakage Reduction in CMOS Circuits [49]: A novel technique that achieves

cancellation of leakage effects in both the Pull-Up Network (PUN) as well as the

Pull-Down Network (PDN) of CMOS cells was devised. It involved voltage balanc-

ing in the PUN and PDN paths using a combination of high-VT and standard-VT

sleep transistors. Section 3.1 of Chapter 3 describes in depth the topology as well

as the working of these VSDCAD sleep-embedded CMOS cells.

• Characterization of the VSDCAD Ultra Low-Power Standard Cell Library

[51, 54]: As part of this research, an ultra low-power standard cell library was

developed on the basis of the VSDCAD topology. The VSDCAD ultra low-power

standard cell library contains 8 combinational and 2 sequential standard cells,

which have been characterized for area, delay and power. Sections 3.4 and 3.5

of Chapter 3 describe in detail the process of characterizing these combinational

and sequential cells.

• Signal Probability Based VCLEARIT Self-Controller Design for Leakage

Power Reduction [52]: The self-controller is the vital segment of this disser-

tation work. It sequences the working of the VSDCAD sleep-embedded cells in

complex circuits. Signal probabilities are used to determine the mode of opera-

tion (functional or standby) of such cells. The VSDCAD sleep-embedded topology

was modified in this work for better controllability and also to reduce routing

congestion. Chapter 6 describes the VCLEARIT self-controlling leakage reduction

technique. Experiments conducted show significant savings in leakage power for

the VCLEARIT technique, when compared to other well-established techniques,

with comparable area and delay penalties.

• Seamless Integration of the Self-Controlled Sleep-Embedded Cells into

the Low-Power Synthesis Flow: Figure 2.1 of Section 2.1 in this chapter

shows how the various contributions of this dissertation fit into the RTL synthe-

sis segment of the low-power synthesis flow. An example illustrating the leakage

savings obtained after replacing only non-critical cells in a circuit with corre-

sponding VSDCAD cells is presented in Section 5.1 of Chapter 5.

In this chapter, an overview of the synthesis framework for low-power design has been

presented. A step-by-step procedure detailing the way behavioral designs are taken

through a series of synthesis processes, right down to layout, is described. The primary

contributions of this dissertation and the way they can be integrated into the whole

low-power synthesis flow are also explained. In the following chapter, the design, de-

velopment and characterization of the VSDCAD ultra low-power standard cell library

are explained in detail.

Chapter 3

VSDCAD Ultra Low-Power

Standard Cell Library

An ultra low-power standard cell RTL component library was developed as part of

this research at the VLSI Systems Design and CAD (VSDCAD) Laboratory at Syracuse

University. This VSDCAD low-power library will be used in the synthesis framework

for generating low-power designs. The following sections of this chapter explain the

methodology used in the design and development of the low-power standard cells.

Then the characterization of combinational CMOS cells is discussed, followed by the

characterization of sequential circuits. The application of the sleep-embedded technique

to Differential Cascode Voltage Switch Logic (DCVSL) circuits is then explained.

Finally, issues relating to active mode leakage power loss, as well as dynamic power

dissipation, are enumerated.

3.1 VSDCAD Sleep-Circuitry Embedded CMOS Cells

The sleep transistor concept used for dynamic circuits in [44] was adapted and modified to work for leakage reduction in static CMOS complementary circuits. A combination of high-VT and standard-VT sleep transistors is used in our implementation [49], to provide a well-balanced trade-off between high speed and leakage loss. Our technique facilitates the creation of an ultra-low power standard cell library, using sleep-circuitry embedded components.

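[Figure 3.1 block diagram labels: inputs in_1 ... in_n; Vdd; Gnd; PUN; PDN; sleep transistors P0, P1 (high-VT) and N0; common points X1 and X2; output out; control signals sleep and sleepbar]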

Figure 3.1: Block Diagram - Generic VSDCAD CMOS Circuit

Figure 3.1 illustrates the topology of a generic CMOS complementary circuit with sleep

transistors embedded in it. We refer to such circuits as VSDCAD sleep-embedded CMOS

circuits for the remainder of this work. There are ‘n’ inputs, in_1, ..., in_n, feeding the

Pull-Up Network (PUN) as well as the Pull-Down Network (PDN). The transistors in

both the PUN and PDN are standard-VT devices. The sleep-circuitry consists of three

transistors - two PMOS devices {P0 and P1} and one NMOS device {N0}. Transistors

P0 and N0 are standard-VT devices, while P1 is a high-VT device. P0 is connected in

parallel with the PUN, one end connecting to the source (Vdd) and the other end to a

common point X1. N0 is connected in parallel with the PDN, one end connecting to the

Gnd and the other end to a common point X2. The high-VT transistor, P1, connects

between the two common points X1 and X2 and behaves like a transmission gate. Two

input signals, “sleep” and its complement “sleepbar” feed transistors {P1, N0} and P0,

respectively. The output of the CMOS circuit, “out”, is drawn from the common point

X2.

The working of the VSDCAD sleep-embedded CMOS circuit is as follows. In the normal

operating mode, “sleep” is off and “sleepbar” is on. This causes transistors {P0, N0}

to turn off and transistor P1 to turn on. The circuit now behaves exactly as a normal

CMOS complementary circuit should. The sleep (standby) operating mode is a little

more involved. In this mode, “sleep” is on and “sleepbar” is off. Hence, transistors {P0,

N0} turn on and transistor P1 turns off. Since P0 is on, common point X1 is also at

voltage Vdd. The PUN is now between two points at equal voltage potential (Vdd) and

hence no leakage current should flow through it. Similarly, N0 is on and common point

X2 is grounded. The PDN is now between two points at equal voltage potential (Gnd)

and hence no leakage current should flow through it. Since “out” is connected to X2,

during the sleep mode the output value will always be ‘0’. The leakage loss occurring

during the sleep mode will only be through the high-VT transistor P1, which is turned

off, but connected between points X1 and X2, which are at different voltage potentials.
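The two operating modes can be summarized with a small switch-level model. This is a minimal sketch, assuming ideal switches (a PMOS device conducts when its gate is low, an NMOS device when its gate is high); the function name and the returned dictionary layout are illustrative and not part of the VSDCAD library.

```python
def sleep_circuitry_state(sleep: int) -> dict:
    """Ideal-switch view of the sleep transistors P0, P1 and N0.

    Assumptions (from the description of Figure 3.1): P1 and N0 are driven
    by "sleep", P0 is driven by "sleepbar", and "out" is taken from X2.
    """
    if sleep not in (0, 1):
        raise ValueError("sleep must be 0 or 1")
    sleepbar = 1 - sleep

    p0_on = sleepbar == 0   # PMOS P0: conducts when sleepbar is low
    p1_on = sleep == 0      # high-VT PMOS P1: conducts when sleep is low
    n0_on = sleep == 1      # NMOS N0: conducts when sleep is high

    return {
        "mode": "standby" if sleep else "active",
        "P0": p0_on,
        "P1": p1_on,
        "N0": n0_on,
        # In standby both ends of the PUN sit at Vdd and both ends of the
        # PDN sit at Gnd, so neither network has a voltage across it.
        "X1": "Vdd" if p0_on else "driven by PUN/PDN",
        "X2": "Gnd" if n0_on else "driven by PUN/PDN",
        "out": 0 if n0_on else None,   # None: output follows the logic function
    }


for s in (0, 1):
    print(sleep_circuitry_state(s))
```

In standby the only device with a voltage across it is the turned-off high-VT transistor P1, which matches the single remaining leakage path identified above.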

For any given process technology, the standard-VT transistors P0 and N0 are unit-sized

devices (the smallest width-to-length {W/L} ratio as defined by the technology). How-

ever, the high-VT transistor P1 needs to be sized appropriately for the VSDCAD sleep-

embedded CMOS cell to have a propagation delay comparable to that of the standard

CMOS cell. There is a nominal increase in both area and propagation delay of the

VSDCAD sleep-embedded circuit, when compared to the standard CMOS circuit. This

overhead of VSDCAD sleep-embedded cells is traded off against substantial leakage power savings, when compared to the standard CMOS cells.

As an alternative option, an NMOS transistor driven by the input “sleepbar” could be

used in place of the transistor P1. In this circuit, the output of the CMOS circuit “out”

will have to be drawn out from the common point X1, rather than from X2.

3.2 Leakage Power Calculation with a CMOS OR2 Circuit Example

The standard 2-input OR gate (OR2) considered here is a cascade structure consisting

of a 2-input NOR gate followed by an inverter. TSMC’s 180 nm technology [9] with

a supply voltage (Vdd) of 1.8V was used to implement the standard OR2 gate. The

transistor sizes were fixed, similar to those found in the library provided by Oklahoma

State University [7, 31]: PMOS (Width: 3600 nm, Length: 180 nm) and NMOS (Width:

900 nm, Length: 180 nm). SPECTRE¹ was used in this work to simulate circuits and

also to measure leakage power.

Table 3.1: Standard OR2 Gate: Average Leakage Power Loss = 152.34 pW

| Input a | Input b | Leakage Power Loss (pW) |
|---|---|---|
| 0 | 0 | 202.45 |
| 0 | 1 | 185.08 |
| 1 | 0 | 144.04 |
| 1 | 1 | 77.76 |

All possible input combinations for the OR2 gate were applied individually, and their corresponding leakage power was measured using SPECTRE [4] at a temperature of 27°C. Table 3.1 lists the leakage power loss for all the input combinations. A leakage power loss value of 202.45 pW was observed for the “00” input combination. This was the worst case. The “11” input combination yielded the least leakage power loss value of 77.76 pW. The average leakage power loss value for all 4 input combination values of the standard OR2 gate was calculated to be 152.34 pW.

¹ TM - Cadence Design Systems, Inc.

Figure 3.2: Sleep-Embedded Cascaded OR2 Gate Schematic
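The figures quoted for Table 3.1 can be checked with a short script that performs the same exhaustive comparison over all four input vectors. The variable names are illustrative; the leakage values are those printed in Table 3.1. The printed entries average to about 152.33 pW, which agrees with the reported 152.34 pW to within rounding of the tabulated values.

```python
# Leakage power (pW) of the standard OR2 gate for each input vector "ab",
# copied from Table 3.1.
leakage_pw = {"00": 202.45, "01": 185.08, "10": 144.04, "11": 77.76}

average_pw = sum(leakage_pw.values()) / len(leakage_pw)
worst_vector = max(leakage_pw, key=leakage_pw.get)
min_leakage_vector = min(leakage_pw, key=leakage_pw.get)

print(f"average leakage: {average_pw:.2f} pW")
print(f"worst-case vector: {worst_vector}")
print(f"minimum leakage vector: {min_leakage_vector}")
```

With only two inputs the exhaustive search is trivial; the number of vectors doubles with every added input, which is why this approach does not scale to large circuits.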

Figure 3.2 illustrates the topology of the VSDCAD sleep-embedded OR2 gate built by cascading an embedded NOR2 gate followed by an embedded inverter. TSMC’s 180 nm technology was used in the implementation. Sleep transistors TP2, TN2, TP5 and TN5 are standard-VT devices and are unit-sized: PMOS (Width: 600 nm, Length: 180 nm), NMOS (Width: 600 nm, Length: 180 nm). The other sleep transistors, TP7 and TP8, are high-VT devices and were sized using a procedure explained in the next section. In the sleep mode of operation, the output is ‘0’ irrespective of any input combination given. Hence with “sleep” being ‘1’, “sleepbar” set to ‘0’ and some input combination, the OR2 circuit shown in Figure 3.2 was simulated at a temperature of 27°C. The leakage power loss measured using SPECTRE [4] was 7.76 pW. Figure 3.3 shows the leakage power and waveforms obtained. This leakage value is approximately 20 times less than the leakage value of the standard OR2 gate.

Figure 3.3: Output Waveforms for Sleep-Embedded OR2 Gate

3.3 Leakage Savings Compared to the Power-Gating Methodology

Power-gating [13, 21] is a popular technique for reducing leakage power. A power-gated

design uses switches that are high-VT transistors, with sleep signals to effectively

“switch off” the connection to power or ground, thereby turning off leakage power

when the design is in standby mode. Figure 3.4 shows one of the several topologies

(header configuration) of a generic power-gated CMOS circuit. It is a standard CMOS

circuit with ‘n’ inputs, in_1, ..., in_n, feeding the PUN and PDN. A high-VT transistor, P1,

connects between the power source (Vdd) and the PUN, acting as a switch. The “sleep”

signal controls transistor P1, turning it on and off as necessary. In the standby mode

of operation, “sleep” is on, thus cutting off power from the CMOS circuit.

[Figure 3.4 block diagram labels: inputs in_1 ... in_n; Vdd; Gnd; PUN; PDN; high-VT header transistor P1; control signal sleep; output out]

Figure 3.4: Block Diagram - Generic Power-Gated CMOS Circuit

Since power-gating is the most commonly used methodology for reducing leakage power,

it is compared in this section to the VSDCAD sleep-embedded methodology in the

standby mode of operation. Nine experimental CMOS circuits (NOR2, NAND2, OR2,

AND2, XOR2, XNOR2, MUX2x1, FULL ADDER and DFFPOSX) were used in this comparison.

TSMC’s 180 nm technology, with a supply voltage (Vdd) of 1.8V, was used to

implement these circuits. All transistors (including the high-VT ones) were unit-sized:

PMOS (Width: 600 nm, Length: 180 nm), NMOS (Width: 600 nm, Length: 180 nm).

Table 3.2: Leakage Comparison - VSDCAD Circuit vs. Power-Gated Circuit

| CMOS Circuit Name | Power-Gated Circuit Leakage (pW) | VSDCAD Sleep-Embedded Circuit Leakage (pW) | Improvement (C2/C3) |
|---|---|---|---|
| NOR2 | 2.225 | 0.890 | 2.5X |
| NAND2 | 4.176 | 0.890 | 4.7X |
| OR2 | 6.666 | 1.780 | 3.7X |
| AND2 | 7.879 | 1.780 | 4.4X |
| XOR2 | 12.103 | 2.670 | 4.5X |
| XNOR2 | 14.306 | 2.670 | 5.4X |
| MUX2x1 | 17.608 | 3.560 | 4.9X |
| FULL ADDER | 74.077 | 14.240 | 5.2X |
| DFFPOSX | 29.093 | 9.586 | 3.0X |
| Average | | | 4.3X |

All 9 circuits were first implemented as power-gated circuits (as shown in Figure 3.4).

SPECTRE [4] was used to simulate them in the standby mode (“sleep” is on) at a temperature

of 27°C, and their leakage power was measured. Column 2 of Table 3.2 gives

the average leakage power loss for each of the power-gated cells.

Next, all 9 circuits were implemented as VSDCAD sleep-embedded circuits (as shown

in Figure 3.1 of Section 3.1). SPECTRE [4] was used to simulate them in the standby

mode of operation (“sleep” is on and “sleepbar” is off) at a temperature of 27°C, and

their leakage power was measured. Column 3 of Table 3.2 lists the average leakage

power loss for each of the VSDCAD sleep-embedded cells.

The leakage loss of the VSDCAD sleep-embedded cells (from Column 3 {C3}) when

compared to that of power-gated cells (from Column 2 {C2}), is expressed as a ratio

in Column 4 of Table 3.2. The leakage improvement from using the VSDCAD sleep-

embedded methodology across all the experimental circuits is 4.3X (on average). This

is a significant improvement over the commonly used power-gating methodology and

demonstrates the effectiveness of the VSDCAD sleep-embedded technique.
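The improvement column of Table 3.2 is the ratio C2/C3 for each circuit. The sketch below recomputes it from the tabulated leakage values; the dictionary and variable names are illustrative only.

```python
# (power-gated leakage, VSDCAD sleep-embedded leakage) in pW, from Table 3.2.
table_3_2 = {
    "NOR2": (2.225, 0.890),
    "NAND2": (4.176, 0.890),
    "OR2": (6.666, 1.780),
    "AND2": (7.879, 1.780),
    "XOR2": (12.103, 2.670),
    "XNOR2": (14.306, 2.670),
    "MUX2x1": (17.608, 3.560),
    "FULL ADDER": (74.077, 14.240),
    "DFFPOSX": (29.093, 9.586),
}

# Improvement = Column 2 / Column 3 for each circuit.
improvement = {
    name: power_gated / sleep_embedded
    for name, (power_gated, sleep_embedded) in table_3_2.items()
}
average_improvement = sum(improvement.values()) / len(improvement)

for name, ratio in improvement.items():
    print(f"{name:<11} {ratio:.1f}X")
print(f"{'Average':<11} {average_improvement:.1f}X")
```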

3.4 Characterized Low-Power Combinational Cells

Standard combinational CMOS library cells, such as NOR2, NAND2, OR2, AND2, XOR2,

XNOR2 and MUX2x1, were implemented [54] using TSMC’s 180 nm technology [9].

Transistor sizes in all these circuits were fixed, similar to those found in the library

provided by Oklahoma State University [7, 31]. The W/L ratios of transistors in the

standard cells were on the order of 10X∼20X. The total area of each standard cell is

listed in Column 2 of Table 3.3. A supply voltage (Vdd) of 1.8V was used and transient

analysis performed on all 7 cells listed above, using SPECTRE [4]. The output load for

each of the 7 cells was a fully sized NAND2 gate.

The propagation delay of each cell was calculated, and the high-to-low transition (TPHL) was tabulated in Column 3 of Table 3.3. The low-to-high transition (TPLH) was tabulated in Column 4 of Table 3.3. Next, the circuits were simulated at a temperature of 27°C and their leakage power measured. All possible input combinations were applied and leakage power loss measured in every case. Column 5 of Table 3.3 lists the average leakage power loss for each standard CMOS cell.

Table 3.3: Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology

| CMOS Circuit Name | Standard Area (pm2) | Standard TPHL (ps) | Standard TPLH (ps) | Standard Leakage Power (pW) | High-VT Transistor(s) W/L Ratio | Sleep-Embedded Area (pm2) | Sleep-Embedded TPHL (ps) | Sleep-Embedded TPLH (ps) | Sleep-Embedded Leakage Power (pW) |
|---|---|---|---|---|---|---|---|---|---|
| NOR2 | 1.620 | 37.12 | 57.01 | 84.39 | 12X | 2.224 | 54.19 | 69.76 | 4.79 |
| NAND2 | 1.296 | 28.62 | 39.79 | 81.20 | 15X | 1.998 | 38.79 | 78.13 | 6.19 |
| OR2 | 2.106 | 66.83 | 90.12 | 152.34 | 10X | 3.186 | 123.40 | 121.59 | 7.76 |
| AND2 | 1.782 | 63.25 | 58.08 | 146.36 | 10X | 2.862 | 64.27 | 102.26 | 7.76 |
| XOR2 | 5.832 | 90.24 | 96.20 | 415.56 | 12X | 7.646 | 136.33 | 118.04 | 14.37 |
| XNOR2 | 5.832 | 91.00 | 109.87 | 415.67 | 12X | 7.646 | 114.87 | 110.88 | 14.37 |
| MUX2x1 | 5.346 | 87.51 | 101.53 | 362.79 | 12X | 7.765 | 137.44 | 189.31 | 19.16 |
| FULL ADDER | 21.222 | (S) 153.17, (C) 239.21 | (S) 179.57, (C) 269.04 | 1617.56 | 10X∼12X | 37.896 | (S) 221.97, (C) 441.27 | (S) 267.16, (C) 429.62 | 67.52 |

Next, the VSDCAD sleep-circuitry was introduced (as shown in Figure 3.1 of Sec-

tion 3.1) for all 7 standard CMOS cells. For each cell, transient analysis was performed

in the normal mode of operation (with “sleep” off and “sleepbar” on). The output load

for each of the 7 cells was the same fully sized NAND2 gate used previously. The prop-

agation delays were calculated. These were compared to the standard circuit values

listed in Column 3 and Column 4 of Table 3.3. The high-VT sleep transistor(s) were

sized such that the propagation delay of the VSDCAD cell was comparable to that of

the standard cell. Column 6 of Table 3.3 lists this W/L ratio value of the high-VT sleep

transistor(s). The total area of each VSDCAD cell is listed in Column 7 of Table 3.3,

while the propagation delay values TPHL and TPLH are tabulated in Columns 8 and

9 respectively. Finally, the VSDCAD sleep-embedded cell was simulated in the sleep

(standby) mode of operation (with “sleep” on and “sleepbar” off) and the leakage loss

measured. Column 10 of Table 3.3 lists the leakage power loss for all the VSDCAD

sleep-embedded standard cells.

A full adder circuit was built using the low-power VSDCAD sleep-embedded AND2, OR2

and XOR2 library components characterized previously (AND2, OR2 and XOR2 high-VT

sleep transistor(s) with W/L ratios fixed at 10X, 10X and 12X respectively). No addi-

tional tuning for this circuit was necessary. The performance results of the VSDCAD

full adder are presented in Row 8, Columns 6-10 of Table 3.3.

Certain trends and observations from this research on combinational cells are pre-

sented below. The W/L ratio of the high-VT sleep transistor(s) for all the VSDCAD

sleep-embedded cells, as seen in Column 6 of Table 3.3, is on the order of 10X∼15X.

This value is comparable to the transistor sizes of standard CMOS cells. Hence, the

layout of these VSDCAD cells is uniform and more regular, when compared to sleep-

embedded circuits with either unit-sized or extremely large high-VT sleep transistor(s).

Experiments showed that utilizing unit-sized high-VT sleep transistor(s) resulted in leakage power loss savings of up to 53X (on average). However, a major limitation was that the propagation delay of such sleep-embedded standard cells was 5X∼6X more than that of the standard CMOS cells. Hence, a reasonable trade-off among the area, propagation delay and leakage power savings was made in designing the sleep-embedded combinational standard library cells.

Table 3.4: Combinational Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)

Performance comparison ratios [Columns (C2...C5, C7...C10) below are from Table 3.3]

| CMOS Circuit Name | Area Penalty (C7/C2) | Propagation Delay Penalty TPHL (C8/C3) | Propagation Delay Penalty TPLH (C9/C4) | Leakage Savings (C5/C10) |
|---|---|---|---|---|
| NOR2 | 1.37X | 1.46X | 1.22X | 18X |
| NAND2 | 1.54X | 1.36X | 1.96X | 13X |
| OR2 | 1.51X | 1.85X | 1.35X | 20X |
| AND2 | 1.61X | 1.02X | 1.76X | 19X |
| XOR2 | 1.31X | 1.51X | 1.23X | 29X |
| XNOR2 | 1.31X | 1.26X | 1.01X | 29X |
| MUX2x1 | 1.45X | 1.57X | 1.86X | 19X |
| FULL ADDER | 1.79X | (S) 1.45X, (C) 1.84X | (S) 1.49X, (C) 1.60X | 24X |
| Average | 1.49X | 1.48X | 1.50X | 21X |

Columns 2-5 of Table 3.4 give the performance comparison ratios between the standard

CMOS combinational cells and the VSDCAD sleep-embedded combinational cells. Col-

umn 2 of Table 3.4 shows the area penalty (increase) of the sleep-embedded cells (from

Column 7 {C7} of Table 3.3), when compared to that of standard cells (from Column 2

{C2} of Table 3.3). The average area increase for all the circuits is 1.49X, seen in Row 9

of Table 3.4. Kuo [43] states that the most power-efficient high-VT cell has a 2.5X delay

impact, when compared to the standard cell. The propagation delay increase (penalty)

for the VSDCAD sleep-embedded cells, when compared to standard cells is 1.48X (on

average for TPHL) and 1.5X (on average for TPLH). Column 5 of Table 3.4 lists the

leakage power loss savings of the VSDCAD sleep-embedded CMOS combinational cells

when compared to standard cells. Average leakage savings of 21X are obtained. The

nominal delay overhead is offset by these significant power savings given the massive

leakage power values predicted in circuits designed using deep-submicron processes

(as shown in Figure 1.1 of Chapter 1). Designers and synthesis tools could add these

sleep-embedded combinational cells in non-critical paths, thereby avoiding effects on

the overall circuit delay, while significantly saving on leakage power loss.
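The suggestion of placing sleep-embedded cells only on non-critical paths can be illustrated with a simple slack check. The sketch below is a hypothetical rule, not the synthesis methodology of this work: it assumes that a cell may be swapped when the worst-case added delay (the larger of the TPHL and TPLH increases from Table 3.3) fits within the available timing slack. The names `DELAYS_PS`, `added_delay_ps` and `can_swap` are invented for this example.

```python
# Propagation delays (ps) copied from Table 3.3: (TPHL, TPLH) for the
# standard cell and for the VSDCAD sleep-embedded cell.
DELAYS_PS = {
    "NOR2": {"standard": (37.12, 57.01), "sleep": (54.19, 69.76)},
    "NAND2": {"standard": (28.62, 39.79), "sleep": (38.79, 78.13)},
    "AND2": {"standard": (63.25, 58.08), "sleep": (64.27, 102.26)},
}


def added_delay_ps(cell):
    """Worst-case delay increase when a standard cell is replaced."""
    std_hl, std_lh = DELAYS_PS[cell]["standard"]
    slp_hl, slp_lh = DELAYS_PS[cell]["sleep"]
    return max(slp_hl - std_hl, slp_lh - std_lh)


def can_swap(cell, slack_ps):
    """Hypothetical rule: swap only if the added delay fits in the slack."""
    return added_delay_ps(cell) <= slack_ps


for cell in DELAYS_PS:
    print(cell, round(added_delay_ps(cell), 2), can_swap(cell, slack_ps=20.0))
```

With an assumed slack of 20 ps, only the NOR2 cell in this small sample would qualify; a real flow would obtain slack from static timing analysis and account for load-dependent delay.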

3.5 Characterized Low-Power Sequential Cells

The design of low-power sequential cells is much more involved than that of low-power

combinational cells. This is because low-power sequential circuits are required to re-

tain data even during the power-down (sleep) mode [55]. To this effect, data and clock

retention circuits are employed in flip-flops, to store values during the sleep phase of

operation. In this work, an extension of the Clocked CMOS (C2MOS) Master-Slave Reg-

ister [67] is implemented. During the first half of the clock cycle, the master stage is in

the evaluation mode and samples the input, while the slave stage is in the hold mode.

In the next half of the clock cycle, the master stage is in the hold mode, while the slave

stage evaluates and outputs the value sampled.

The positive edge-triggered master-slave D flip-flop (DFFPOSX) and the negative edge-

triggered master-slave D flip-flop (DFFNEGX) were chosen as the class of standard se-

quential library cells for experimentation. TSMC’s 180 nm technology [9], with a supply

voltage (Vdd) of 1.8V, was used in the implementation of these D flip-flops. Transistor

sizes in these circuits were fixed, similar to those found in the library provided by Oklahoma State University [7, 31]. The W/L ratios of transistors in these standard cells were on the order of 5X∼20X. The total area of each flip-flop is listed in Column 2 of Table 3.5.

Table 3.5: Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology

| CMOS Circuit Name | Standard Area (pm2) | Standard TSU (ps) | Standard THLD (ps) | Standard TC−Q (ps) | Standard Leakage Power (pW) | High-VT Tran(s) W/L Ratio | Sleep-Embedded Area (pm2) | Sleep-Embedded TSU (ps) | Sleep-Embedded THLD (ps) | Sleep-Embedded TC−Q (ps) | Sleep-Embedded Leakage Power (pW) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| DFFPOSX | 5.994 | 73.0 | 89.9 | 206.39 | 389.71 | 8X | 9.849 | 116.3 | 118.3 | 405.14 | 21.21 |
| DFFNEGX | 5.994 | 73.4 | 90.2 | 207.86 | 389.77 | 8X | 9.849 | 117.0 | 118.6 | 405.69 | 21.21 |

Transient analysis was performed on the flip-flops listed above, using SPECTRE [4].

The output load in each case was a fully sized NAND2 gate. Various timing parameters

of the flip-flops were measured. The setup time TSU, which is the time that the data input

(D) must be valid before the clock transition, is tabulated in Column 3 of Table 3.5.

Column 4 of Table 3.5 lists the hold time THLD, which is the time that the data input

(D) must remain valid after the clock edge. The propagation delay of the flip-flop with

respect to the clock edge TC−Q is shown in Column 5 of Table 3.5. Next, the circuits

were simulated at a temperature of 27°C and their leakage power measured. All possible

input combinations were applied and leakage power loss measured in every case.

Column 6 of Table 3.5 lists the average leakage power loss for each flip-flop.

Next, the VSDCAD sleep-embedded DFFPOSX and DFFNEGX cells were implemented

[51]. The various Pull-Up Network (PUN) and Pull-Down Network (PDN) paths (CMOS-

like paths) were first identified in the standard Master-Slave D Flip-Flop. There are

7 such paths (including one for the clock inverter). These 7 CMOS-like paths were re-

placed by their equivalent VSDCAD CMOS-like circuits (as shown in Figure 3.1 of Sec-

tion 3.1). Then, a special state-saving circuit was added. It was designed for retaining

data in the master-slave VSDCAD D flip-flop during the power-down (sleep) mode. This

state-saving circuit is a dynamic transmission gate latch, implemented completely us-

ing high-VT transistors, in order to minimize leakage loss. The latch stores the value

of the master-slave flip-flop the instant the circuit goes into the standby mode (“sleep”

is on and “sleepbar” is off). This is achieved by closing the transmission gate and re-

taining whatever value was seen during that instant at the input of the transmission

gate.

Figure A.1 in Appendix A is the block diagram of the VSDCAD master-slave D flip-flop.

The main sections of the VSDCAD D flip-flop (Master, Slave and State-Saving Circuit)

are highlighted in enclosed boxes, as seen in Figure A.1. A level-restoring transistor,

TP13, is part of the state-saving circuitry, in order to strengthen the signal at the out-

put of the transmission gate (formed by connecting transistors TP9 and TN4). All the

high-VT transistors of the state-saving circuitry are unit-sized: PMOS (Width: 600 nm,

Length: 180 nm), NMOS (Width: 600 nm, Length: 180 nm). The sleep transistors in

the VSDCAD CMOS-like paths are unit-sized, with the exception of the high-VT sleep

transistors. The remaining transistors (as in the standard D flip-flop circuit) are sized

similar to those found in the Oklahoma State University library [7, 31]. Figure A.2 in

Appendix A is the actual schematic capture of the VSDCAD master-slave POSX DFF.

For both DFFPOSX and DFFNEGX, transient analysis was performed in the normal

mode of operation. The output load for each was the fully sized NAND2 gate used pre-

viously. The various timing parameters of the flip-flops were calculated. These were

compared to the standard circuit values listed in Columns 3-5 of Table 3.5. The high-

VT sleep transistors were sized, such that the propagation delay TC−Q of the VSDCAD

sleep-embedded flip-flop was comparable to that of the standard flip-flop. Column 7

of Table 3.5 lists this W/L ratio value of the high-VT sleep transistors. The total area

of each sleep-circuitry embedded flip-flop is listed in Column 8 of Table 3.5, while the

timing parameter values TSU , THLD and TC−Q are tabulated in Columns 9, 10 and 11

respectively. Finally, the sleep-embedded flip-flop was simulated in the sleep (standby)

mode of operation (“sleep” is on and “sleepbar” is off) and the leakage loss measured.

Column 12 of Table 3.5 lists the leakage power loss for the VSDCAD sleep-embedded

flip-flops.

Certain trends and observations from this research on sequential cells are presented

below. The W/L ratio value of the high-VT sleep transistors for the VSDCAD sleep-

embedded D flip-flops, as seen in Column 7 of Table 3.5, is 8X. This value is comparable

to the transistor sizes of the standard D flip-flops. Hence, the layout of these VSDCAD

flip-flops is uniform and more regular, when compared to sleep-embedded flip-flops

with either unit-sized or extremely large high-VT sleep transistors. Findings showed

that utilizing unit-sized high-VT sleep transistors in VSDCAD sleep-embedded D flip-flops

resulted in leakage power savings of up to 22.39X (on average). However, a major

limitation was that the propagation delay TC−Q of such sleep-embedded flip-flops was

7X∼9X more than that of standard flip-flops. Hence, a reasonable trade-off among the

area, propagation delay and leakage power savings was made in designing the VSD-

CAD sleep-embedded D flip-flop standard cells.

Columns 2-6 of Table 3.6 give the performance comparison ratios between the standard flip-flop cells and the VSDCAD sleep-circuitry embedded flip-flops. Column 2 of Table 3.6 is the area penalty (increase in area) of the VSDCAD D flip-flops (from Column 8 {C8} of Table 3.5), when compared to that of standard D flip-flops (from Column 2 {C2} of Table 3.5). The average area increase for the VSDCAD sleep-embedded flip-flops is 1.64X, seen in Row 3 of Table 3.6. Kuo [43] states that the most power-efficient high-VT cell has a 2.5X delay impact, when compared to the standard cell. The propagation delay increase (penalty) for the sleep-circuitry embedded flip-flops when compared to standard flip-flops is 1.955X (on average for TC−Q). The setup (TSU) and hold (THLD) times have also increased by factors of 1.59X and 1.315X, respectively. Column 6 of Table 3.6 lists the leakage power loss savings of the VSDCAD sleep-embedded flip-flops, when compared to standard flip-flops. Average leakage savings of 18X are obtained. Again, as in the case of VSDCAD sleep-embedded combinational circuits, the nominal delay overhead in sleep-embedded flip-flops is offset by their significant power savings. Designers and synthesis tools could add these VSDCAD sleep-embedded D flip-flops in non-critical paths of the circuit.

Table 3.6: Sequential Cell Library Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology (cont’d)

Performance comparison ratios [Columns (C2...C6, C8...C12) below are from Table 3.5]

| CMOS Circuit Name | Area Penalty (C8/C2) | TSU Penalty (C9/C3) | THLD Penalty (C10/C4) | TC−Q Penalty (C11/C5) | Leakage Savings (C6/C12) |
|---|---|---|---|---|---|
| DFFPOSX | 1.64X | 1.59X | 1.32X | 1.96X | 18X |
| DFFNEGX | 1.64X | 1.59X | 1.31X | 1.95X | 18X |
| Average | 1.64X | 1.59X | 1.315X | 1.955X | 18X |

Figure 3.5 shows the transient waveforms for the characterized VSDCAD positive edge-triggered D flip-flop, in both the standard and standby (sleep) modes of operation. From the waveforms, it can be observed that the circuit retains proper values in the power-down (“sleep” signal high) mode. At simulation times of 45 ns and 95 ns, with the sleep signal high, the VSDCAD sleep-embedded D flip-flop keeps the previously held values (‘1’ and ‘0’ respectively) rather than storing new input (D) values at the rising clock edge. This shows that the VSDCAD sleep-embedded master-slave D flip-flop performs in accordance with the design specifications.

Figure 3.5: Output Waveforms Showing Functioning of the VSDCAD POSX Master-Slave D Flip-Flop in Standard and Sleep Modes of Operation
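The retention behaviour described above can be captured in a purely behavioural Python model. This is only a sketch under simplifying assumptions: it does not model the master and slave latches, the high-VT transmission-gate latch, or any timing parameter, and the class name `SleepRetentionDFF` and its `tick` method are hypothetical.

```python
class SleepRetentionDFF:
    """Behavioural model of a positive edge-triggered D flip-flop that
    ignores new data while the sleep signal is asserted."""

    def __init__(self, q=0):
        self.q = q            # stored output value
        self._prev_clk = 0    # previous clock level, used for edge detection

    def tick(self, clk, d, sleep):
        """Apply one sample of (clk, d, sleep) and return the output Q."""
        rising_edge = self._prev_clk == 0 and clk == 1
        self._prev_clk = clk
        if rising_edge and not sleep:
            self.q = d        # normal mode: capture D on the rising edge
        # sleep mode: the previously held value is retained
        return self.q


ff = SleepRetentionDFF()
trace = [
    ff.tick(clk=1, d=1, sleep=0),  # rising edge, active mode: Q becomes 1
    ff.tick(clk=0, d=0, sleep=0),  # clock low: Q holds
    ff.tick(clk=1, d=0, sleep=1),  # rising edge in sleep mode: Q retained
    ff.tick(clk=0, d=0, sleep=0),  # clock low: Q holds
    ff.tick(clk=1, d=0, sleep=0),  # rising edge, active mode: Q becomes 0
]
print(trace)
```

The third sample corresponds to the situation highlighted in the waveforms: a rising clock edge arrives while sleep is high, and the stored value is kept instead of the new D input.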

3.6 Low-Power Differential Cascode Voltage Switch Logic (DCVSL) Cells

DCVSL circuits [35] are a combination of two important concepts, differential logic and

positive feedback. These circuits require each input in complementary format, and

they produce complementary outputs themselves. The pull-down networks PDN1 and

PDN2 use NMOS devices and are mutually exclusive, i.e., when PDN1 is on, PDN2 is off,

and vice-versa. The function and its inverse are simultaneously implemented by the

circuit.
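The mutual exclusivity of the two pull-down networks can be illustrated with a small Boolean model of a DCVSL AND2/NAND2 gate. The sketch assumes the common textbook arrangement (a series NMOS network driven by the true inputs and a parallel network driven by their complements); it ignores the sleep transistors and all electrical behaviour, and the function name `dcvsl_and2` is hypothetical.

```python
from itertools import product


def dcvsl_and2(a, b):
    """Return (and_out, nand_out) for a behavioural DCVSL AND2/NAND2 gate."""
    a, b = bool(a), bool(b)
    pdn1_on = a and b                 # series NMOS pair driven by a and b
    pdn2_on = (not a) or (not b)      # parallel NMOS pair driven by complements
    # Exactly one pull-down network conducts for any input combination.
    assert pdn1_on != pdn2_on
    # The conducting network pulls its node low; the cross-coupled pull-up
    # transistors then drive the opposite node high.
    nand_out = 0 if pdn1_on else 1
    and_out = 0 if pdn2_on else 1
    return and_out, nand_out


for a, b in product((0, 1), repeat=2):
    print(a, b, dcvsl_and2(a, b))
```

For every input pair the two outputs are complementary, which is the property the text describes as implementing the function and its inverse simultaneously.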

No DCVSL circuits were characterized as part of the VSDCAD ultra low-power standard

cell library. However, the sleep-transistor technique described in Section 3.1 was ex-

tended [50] and experiments carried out on 3 DCVSL circuits - AND2/NAND2, OR2/NOR2

and XOR2/XNOR2 to check for leakage savings. Figure 3.6 illustrates the topology of

a generic DCVSL circuit with sleep transistors embedded in it. PUT1 and PUT2 are

Pull-Up Transistors.

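[Figure 3.6 block diagram. Labels recoverable from the figure: complementary inputs in 1 to in n, pull-down networks PDN1 and PDN2, pull-up transistors PUT1 and PUT2, transistors N1, N2, P1, P2, high-VT (Hvt) devices HV1P and PHV2, nodes X1 to X4, control signals sleep and sleepbar, outputs out and outbar, and the Vdd and Gnd rails.]

Figure 3.6: Block Diagram - Generic Sleep-Embedded DCVSL Circuit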

TSMC’s 180 nm technology with a supply voltage (Vdd) of 1.8V, was used to implement

the 3 DCVSL circuits. All transistors were unit-sized: PMOS (Width: 600 nm, Length:

180 nm), NMOS (Width: 600 nm, Length: 180 nm). SPECTRE was used to simulate

the circuits at a temperature of 27°C and their leakage power measured. Column 2 of

Table 3.7 gives the average leakage power loss for each of the standard DCVSL cells.

Table 3.7: DCVSL Cell Performance Measurements @ Temperature = 27°C for Circuits Implemented using TSMC’s 180 nm Technology

| DCVSL Circuit Name | Standard Circuit Operation Leakage Power (pW) | Sleep-Embedded Circuit Operation Leakage Power (pW) | Leakage Savings (C2/C3) |
|---|---|---|---|
| AND2/NAND2 | 183.61 | 3.56 | 52X |
| OR2/NOR2 | 184.36 | 3.56 | 52X |
| XOR2/XNOR2 | 282.1 | 3.56 | 79X |

Next, the sleep-circuitry was introduced in the DCVSL circuits, keeping all the tran-

sistors unit-sized (including the high-VT ones). The leakage power of these circuits

was then measured. Column 3 of Table 3.7 gives the leakage power loss for each of

the sleep-embedded DCVSL cells. Column 4 of Table 3.7 lists the leakage power sav-

ings of the sleep-circuitry embedded DCVSL cells when compared to the standard cells.

Significant power savings are seen for this class of circuits also.

3.7 Active Mode Leakage Loss Increase

Up until this point in the experiments, a comparison was always provided between the

active mode leakage power loss of standard cells and the standby (sleep) mode leakage

power loss of sleep-embedded cells. This provided accurate leakage savings results

when the sleep-embedded circuit was shut off or put in the standby mode. However,

no comparison was made between the leakage loss during the active mode of standard

cells and the active mode leakage loss of the sleep-embedded cells. In this section, the

active mode leakage loss results of the standard cells, as well as the sleep-embedded

cells, are provided. This allows a fair comparison between a standard cell library and

the VSDCAD ultra low-power standard cell library.

Table 3.8: Active Mode Leakage Loss - Standard CMOS Circuit vs. Sleep-Embedded CMOS Circuit

| CMOS Circuit Name | Standard Circuit Leakage (pW) | Sleep-Embedded Circuit Leakage (pW) | Leakage Increase Ratio (C3/C2) |
|---|---|---|---|
| NOR2 | 84.39 | 100.90 | 1.20X |
| NAND2 | 81.20 | 120.63 | 1.49X |
| OR2 | 152.34 | 214.87 | 1.41X |
| AND2 | 146.36 | 196.04 | 1.34X |
| XOR2 | 415.56 | 458.18 | 1.10X |
| XNOR2 | 415.67 | 458.11 | 1.10X |
| MUX2x1 | 362.79 | 472.85 | 1.30X |
| FULL ADDER | 1617.56 | 1973.00 | 1.22X |
| DFFPOSX | 389.71 | 618.19 | 1.59X |
| DFFNEGX | 389.77 | 618.20 | 1.59X |
| Average | | | 1.33X |

Table 3.8 gives the leakage performance comparisons for the standard cells and the

sleep-embedded cells in the active mode. Results from both combinational and se-

quential cells are displayed in Table 3.8. Circuit simulations were performed using

SPECTRE [4], and transistor sizes were fixed, similar to those found in the library pro-

vided by Oklahoma State University. Column 2 of Table 3.8 provides the active mode

leakage loss of standard cells. Column 3 of Table 3.8 lists the active mode leakage loss

of the sleep-embedded cells. The leakage increase of the sleep-embedded cells (from

Column 3 {C3}), when compared to that of standard cells (from Column 2 {C2}), is ex-

pressed as a ratio in Column 4 of Table 3.8. The average leakage increase in the active

mode across all the sleep-embedded cells is 33%. This small increase is due to the fact

that the standard-VT sleep transistors, added in parallel to the PUN and PDN (P0 and

N0 from Figure 3.1) leak during the active mode. The high-VT sleep transistor (P1 from

Figure 3.1), has negligible leakage when compared to the other two standard-VT sleep

transistors in the active mode.

Standard cells are always on, with no means of being switched off when they are not

needed. Hence, sleep-embedded VSDCAD cells are preferred for low-power designs,

in spite of a small increase in leakage power loss in the active mode. For the entire

length of the circuit operation, the overall leakage loss (a combination of many active

and standby modes) by using the sleep-embedded cells is significantly less than the

leakage loss occurring using standard cells that are always on.
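The claim that the overall leakage is lower across a mix of active and standby periods can be checked with a simple time-weighted estimate. The sketch below uses the NOR2 values from Tables 3.3 and 3.8 and assumes, as a simplification, that leakage is a linear time-weighted average of the two modes and that mode transitions cost no extra energy. The function names are illustrative only.

```python
def average_leakage_pw(active_fraction, active_pw, standby_pw):
    """Time-weighted leakage of a sleep-embedded cell that is active for
    `active_fraction` of the time and in standby otherwise."""
    return active_fraction * active_pw + (1.0 - active_fraction) * standby_pw


def break_even_active_fraction(standard_pw, sleep_active_pw, sleep_standby_pw):
    """Active fraction at which the sleep-embedded cell leaks as much as
    an always-on standard cell."""
    return (standard_pw - sleep_standby_pw) / (sleep_active_pw - sleep_standby_pw)


# NOR2 values (pW): standard cell and active-mode sleep cell from Table 3.8,
# standby-mode sleep cell from Column 10 of Table 3.3.
NOR2_STANDARD = 84.39
NOR2_SLEEP_ACTIVE = 100.90
NOR2_SLEEP_STANDBY = 4.79

half_active = average_leakage_pw(0.5, NOR2_SLEEP_ACTIVE, NOR2_SLEEP_STANDBY)
break_even = break_even_active_fraction(
    NOR2_STANDARD, NOR2_SLEEP_ACTIVE, NOR2_SLEEP_STANDBY
)
print(f"NOR2 average leakage at 50% activity: {half_active:.2f} pW")
print(f"NOR2 break-even active fraction: {break_even:.3f}")
```

Under these assumptions, the sleep-embedded NOR2 leaks less than the standard cell whenever it spends more than roughly 17% of the time in standby.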

3.8 Increase in Dynamic Power Dissipation

The main emphasis until now has been on the standby (sleep) mode leakage power loss

of the VSDCAD sleep-embedded cells. The dynamic power loss of these circuits has not

yet been explored. As explained in Section 1.1, dynamic power dissipation depends

mainly on transient switching activity and frequency of operation, as well as on the

square of the supply voltage. In this section, the effect of the additional VSDCAD sleep-

circuitry components on dynamic power dissipation of standard cells is studied. The

combinational and sequential VSDCAD standard library cells characterized previously

in this work were used, and their dynamic power measured.

Table 3.9: Dynamic Power Dissipation - Standard Circuits vs. VSDCAD Sleep-Embedded Circuits

| CMOS Circuit Name | Standard Circuit Dynamic Power (µW) | Sleep-Embedded Circuit Dynamic Power (µW) | Dynamic Power Penalty (Increase) (C3/C2) |
|---|---|---|---|
| NOR2 | 2.89943 | 3.95455 | 1.364X |
| NAND2 | 3.86127 | 5.33194 | 1.381X |
| OR2 | 4.61272 | 5.87739 | 1.274X |
| AND2 | 6.08750 | 7.87104 | 1.293X |
| XOR2 | 15.90818 | 17.60026 | 1.106X |
| XNOR2 | 19.61208 | 21.29784 | 1.086X |
| MUX2x1 | 13.77033 | 15.71666 | 1.141X |
| FULL ADDER | 69.18433 | 80.48972 | 1.163X |
| DFFPOSX | 34.26382 | 39.90766 | 1.165X |
| DFFNEGX | 34.77193 | 40.20384 | 1.156X |
| Average | | | 1.213X |

Table 3.9 gives the dynamic power dissipation comparison between standard cells and

VSDCAD sleep-embedded cells. Transistor sizes were fixed, similar to those found in

the library provided by Oklahoma State University [7, 31], and circuit simulations

were performed using SPECTRE [4]. Column 2 of Table 3.9 gives the dynamic power

loss of standard cells. Column 3 of Table 3.9 lists the dynamic power dissipation of

the VSDCAD sleep-embedded cells. The dynamic power penalty (increase) of the sleep-

embedded cells (from Column 3 {C3}), when compared to that of standard cells (from

Column 2 {C2}), is expressed as a ratio in Column 4 of Table 3.9. The average dynamic

power loss increase across all the VSDCAD sleep-embedded cells is 21.3%. This power

increase is due to the additional transistors introduced and the consequent capacitive

increase in the sleep-embedded circuits.
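The link between added capacitance and dynamic power can be made explicit with the standard first-order switching power expression. The form below is the usual textbook approximation and is assumed here rather than quoted from Section 1.1; the symbols \alpha (switching activity), C_L (switched capacitance), V_{dd} (supply voltage) and f (operating frequency) are this sketch's notation.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_{L} \, V_{dd}^{2} \, f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn}}^{\mathrm{sleep}}}{P_{\mathrm{dyn}}^{\mathrm{std}}}
\approx \frac{C_{L}^{\mathrm{sleep}}}{C_{L}^{\mathrm{std}}}
\quad \text{(for equal } \alpha,\ V_{dd},\ f \text{)}
```

Read this way, the 1.213X average penalty in Table 3.9 would correspond to roughly 21% more effective switched capacitance in the sleep-embedded cells. This is an interpretation of the reported ratios, not a capacitance measurement from this work.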

The literature detailing various methods to reduce dynamic power has been analyzed

and can be summarized as follows :

• Clock and Signal Gating: This is the simplest and most straightforward

method to reduce transient switching activity of the highly active nodes in a cir-

cuit. Control-signal gating techniques, like those presented by Kapadia et al. in

[42], target reduction in switching power.

• Operand Isolation Techniques: The input-sharing problem is typically the

cause of unnecessary switching activity in modules where there should be none.

For example, consider a simple Arithmetic and Logic Unit (ALU) designed for 4

operations (add, subtract, multiply and shift), all sharing 2 input signals - “in1”

and “in2”. During a cycle to perform only the “subtract” operation, the adder,

multiplier and shifter units are simultaneously active along with the subtractor,

thereby wasting power. Operand isolation techniques, like using multiplexers or

using multiple registers to drive different modules, solve the input-sharing prob-

lem. However, this increases the area and the delay, and adds other overheads.

• Circuits Comprised of Independent Voltage Islands: Lackey et al. present

a comprehensive background on methods used to design voltage islands in [46].

They present various voltage island scenarios, a system architecture and chip

implementation methodology, which are used to reduce active and static power

consumption in SoC designs. The design implications of voltage islands are also

evaluated.

Hillman in [36] focuses on minimizing the operating voltage to reduce dynamic

power. The library of components created was characterized for different volt-

ages. Next, the whole SoC design was built with various components from this

library, using voltage level-shifting circuits and voltage isolation cells. The on-

chip dc-dc voltage level-shifting circuits already designed in [48] could be used in

experimentation with this methodology.

Carballo et al. propose a semi-custom voltage island approach in [24] to build

high-speed serial links. Their approach is a mixture of selective custom design

and the transparent use of multiple supplies to reduce power. The digital cir-

cuitry on the chip runs at a low supply voltage, while the analog circuitry runs

at a higher voltage level. An on-chip regulator converts low to high voltage, and

vice-versa. MTCMOS transistors are used in the custom design process.

Hung et al. [38] present a voltage island partitioning and floor-planning algo-

rithm for architecting SoC designs. Their work explores the thermal impact of

voltage islands. A hybrid optimization approach consisting of a genetic algorithm-

based (GA-based) voltage island partitioning algorithm and a simulated annealing-

based (SA-based) floor-planning algorithm, is presented.

• Transistor Re-ordering Techniques: Hossain et al. [37] use a probability-

based transistor re-ordering technique to reduce dynamic power dissipation in

CMOS circuits.

In this chapter, the design, development and characterization of the VSDCAD Ultra

Low-Power RTL Standard Cell Library have been explained. The core of the low-power

library development - a novel technique that achieves cancellation of leakage effects

in both the Pull-Up Network (PUN), as well as the Pull-Down Network (PDN) of CMOS

cells, is presented. It involves voltage balancing in the PUN and PDN paths, using a

combination of high-VT and standard-VT sleep transistors. Experimental results show

significant leakage power savings (an average of 20.7X for a 180 nm process technology

at a temperature of 27°C) in both combinational and sequential CMOS library cells

employing this sleep-circuitry, when compared to standard CMOS cells. In the next

chapter, a thorough comparison of the VSDCAD technique with other well-established

leakage reduction techniques is made. Important design constraints, such as area

utilization, circuit delay and the leakage savings of a few leakage reduction techniques,

will be evaluated and compared.

Chapter 4

Comparison of VSDCAD with Other Leakage Reduction Techniques

In order to properly evaluate the VSDCAD technique presented in the previous chap-

ter, it has to be compared with other leakage reduction techniques for important de-

sign constraints like area utilization, circuit delay and leakage savings. In the fol-

lowing sections of this chapter, the VSDCAD technique is compared [53] against the

well-established LECTOR and Power-Gating techniques, using a variety of MCNC’91

benchmarks [72]. The Berkeley Predictive Technology Models (BPTM) [3] were used to

implement and simulate the circuits in this work. Since BPTM contains models only

for standard-VT PMOS and NMOS transistors, models for high-VT PMOS and NMOS

transistors were developed as part of this research for this comparative study. Results

show that a definitive trade-off exists among the various design constraints - area

utilization, propagation delay and leakage power savings, for all leakage reduction

techniques.

4.1 Experimental Setup

Experiments were conducted on a variety of combinational multi-level MCNC’91 benchmarks [72]. Circuits were implemented using various deep-submicron process technologies. The HSPICE¹ simulator, in conjunction with the BPTM [3] deep-submicron technology, was used to simulate circuits and to estimate leakage power dissipation. All circuits (unless specified otherwise) were simulated at a temperature of 25°C.

The Berkeley Predictive Technology Models (BPTM) contained process parameters and values only for standard-VT PMOS and NMOS transistors; no models were available for high-VT transistors. Experiments using some proprietary technology models obtained directly from foundries showed an interesting trend in the threshold voltage value of high-VT transistors. For a variety of deep-submicron technologies, we observed that the threshold voltage (VT) value of a high-VT PMOS or a high-VT NMOS transistor was 25%-35% more than that of a standard-VT transistor. Hence, models for high-VT PMOS and NMOS transistors were incorporated into BPTM [3] with threshold voltage values 25% more than those of standard-VT transistors. DC simulations were run using HSPICE [8] to ensure that the threshold values of these high-VT transistors were only 25% more than those of standard-VT transistors.
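The 25% relationship can be checked numerically against the standard and high-VT threshold values reported in Tables 4.1 and 4.2. The Python sketch below copies those values and computes the ratio for each technology node; the dictionary layout and the 0.001 tolerance are choices made for this example only.

```python
# (standard Vth, high-VT Vth) in volts, copied from Column 4 and Column 5
# of Table 4.1 (PMOS) and Table 4.2 (NMOS).
THRESHOLDS_V = {
    ("PMOS", "180 nm"): (-0.2822, -0.3528),
    ("PMOS", "130 nm"): (-0.4108, -0.5137),
    ("PMOS", "100 nm"): (-0.2891, -0.3614),
    ("PMOS", "70 nm"): (-0.3338, -0.4173),
    ("NMOS", "180 nm"): (0.4432, 0.5540),
    ("NMOS", "130 nm"): (0.3110, 0.3887),
    ("NMOS", "100 nm"): (0.2773, 0.3466),
    ("NMOS", "70 nm"): (0.3133, 0.3916),
}

# Ratio of high-VT to standard-VT threshold magnitude for each model.
vt_ratios = {key: high / std for key, (std, high) in THRESHOLDS_V.items()}

for (device, node), ratio in vt_ratios.items():
    print(f"{device} {node:7s} high-VT / standard-VT = {ratio:.4f}")

# Example tolerance: every ratio should be within 0.001 of 1.25.
all_close = all(abs(ratio - 1.25) < 0.001 for ratio in vt_ratios.values())
print("All ratios within 0.001 of 1.25:", all_close)
```

Because PMOS thresholds are negative, the ratio of the two signed values is positive and compares magnitudes directly.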

Tables 4.1 and 4.2 list the supply and threshold voltage values for various BPTM models for PMOS and NMOS transistors respectively. The first columns in Tables 4.1 and 4.2 list the technology feature size. The supply voltage used for each feature size is listed in Column 2 of both Tables 4.1 and 4.2. The zero-bias threshold voltage (Vth0) of a PMOS standard-VT transistor is tabulated in Column 3 of Table 4.1, and that of an NMOS standard-VT transistor is tabulated in Column 3 of Table 4.2. Column 4 of Table 4.1 gives the threshold voltage (Vth) of a standard PMOS transistor, while Column 4 of Table 4.2 gives the threshold voltage (Vth) of a standard NMOS transistor. The threshold voltage of a high-VT PMOS transistor is listed in Column 5 of Table 4.1 and the threshold voltage of a high-VT NMOS transistor is listed in Column 5 of Table 4.2.

Table 4.1: PMOS Supply and Threshold Voltage Values for Various BPTM Models

| BPTM Feature Size | Supply Voltage (Vdd) | PMOS Standard Tran. Zero-Bias Threshold (Vth0) | PMOS Standard Tran. Threshold (Vth) | PMOS High-Vt Tran. Threshold (Vth) |
|---|---|---|---|---|
| 180 nm | 1.8V | -0.4200V | -0.2822V | -0.3528V |
| 130 nm | 1.5V | -0.3499V | -0.4108V | -0.5137V |
| 100 nm | 1.0V | -0.3030V | -0.2891V | -0.3614V |
| 70 nm | 0.85V | -0.2200V | -0.3338V | -0.4173V |

Table 4.2: NMOS Supply and Threshold Voltage Values for Various BPTM Models

| BPTM Feature Size | Supply Voltage (Vdd) | NMOS Standard Tran. Zero-Bias Threshold (Vth0) | NMOS Standard Tran. Threshold (Vth) | NMOS High-Vt Tran. Threshold (Vth) |
|---|---|---|---|---|
| 180 nm | 1.8V | 0.3999V | 0.4432V | 0.5540V |
| 130 nm | 1.5V | 0.3320V | 0.3110V | 0.3887V |
| 100 nm | 1.0V | 0.2607V | 0.2773V | 0.3466V |
| 70 nm | 0.85V | 0.2000V | 0.3133V | 0.3916V |

¹ TM - Synopsys, Inc.

4.2 MCNC’91 VSDCAD Implementation Leakage Values

Forty-six experimental MCNC’91 benchmark circuits were implemented with individ-

ual VSDCAD CMOS gates. They were sized appropriately for 4 different deep-submicron

technologies - 180 nm, 130 nm, 100 nm and 70 nm. The supply voltages for the respec-

tive technologies are given in Column 2 of Table 4.1. Simulations were carried out,

using HSPICE [8] in the standby mode of operation, and their leakage loss measured.

Table 4.3: VSDCAD Leakage Values for MCNC’91 Benchmarks for Various Deep-Submicron Technologies (all leakage values in nW, BPTM)

Circuit 180 nm 130 nm 100 nm 70 nm Circuit 180 nm 130 nm 100 nm 70 nm

I1 3.117 1.236 0.336 0.169 apex7 11.93 4.728 1.287 0.645

I2 7.387 2.928 0.797 0.400 b9 8.471 3.358 0.914 0.458

I3 6.099 2.418 0.658 0.330 c8 11.11 4.406 1.199 0.601

I4 8.132 3.224 0.877 0.440 cht 15.52 6.152 1.674 0.839

I5 19.31 7.656 2.083 1.045 comp 10.23 4.056 1.104 0.553

I6 23.04 9.133 2.485 1.246 cordic 6.912 2.740 0.746 0.374

I7 31.92 12.65 3.443 1.726 count 9.691 3.841 1.045 0.524

I8 124.1 49.19 13.38 6.712 dalu 115.0 45.59 12.41 6.220

I9 35.37 14.02 3.816 1.913 frg1 7.116 2.821 0.768 0.385

I10 153.2 60.71 16.52 8.284 frg2 68.04 26.97 7.339 3.680

C432 10.84 4.298 1.170 0.586 k2 81.39 32.26 8.779 4.402

C499 13.69 5.426 1.477 0.740 pair 97.18 38.52 10.48 5.256

C880 25.96 10.29 2.800 1.404 parity 4.608 1.827 0.497 0.249

C1355 37.00 14.67 3.991 2.001 rot 46.83 18.56 5.051 2.533

C1908 59.64 23.64 6.433 3.226 sct 6.167 2.445 0.665 0.334

C2670 80.85 32.05 8.721 4.373 t481 140.4 55.66 15.15 7.595

C3540 113.1 44.83 12.20 6.118 term1 24.26 9.617 2.617 1.312

C5315 156.3 61.97 16.86 8.456 ttt2 13.55 5.373 1.462 0.733

C6288 163.7 64.90 17.66 8.856 vda 39.64 15.71 4.276 2.144

C7552 238.0 94.34 25.67 12.873 x1 19.31 7.656 2.083 1.045

alu2 22.70 8.999 2.449 1.228 x2 2.846 1.128 0.307 0.154

alu4 46.15 18.29 4.978 2.496 x3 48.45 19.21 5.227 2.621

apex6 30.63 12.14 3.304 1.657 x4 25.00 9.912 2.697 1.353

Since exhaustive testing for many of the benchmarks was impossible, a representative

sample of 1500 randomly generated input vector combinations was applied to each of

the circuits, and leakage loss was measured in every case. The average of these 1500

values is listed as the leakage dissipation value for a circuit in Table 4.3.
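The averaging step can be illustrated on a circuit small enough to enumerate. The sketch below is a toy stand-in, not the HSPICE flow used here: it takes the per-vector leakage of the conventional two-input NAND gate (100 nm values from Table 4.4) as a lookup table, and compares the exhaustive average with the average over 1500 random vectors. The function names and the fixed seed are assumptions made for illustration.

```python
import random

# Per-vector leakage (W) of the conventional two-input NAND gate at 100 nm,
# copied from Table 4.4; it stands in for a simulator call in this sketch.
NAND2_LEAKAGE_W = {
    (0, 0): 1.228e-10, (0, 1): 9.117e-10,
    (1, 0): 5.356e-10, (1, 1): 2.241e-09,
}

def exhaustive_average(leakage_by_vector):
    """Average leakage over every possible input vector."""
    return sum(leakage_by_vector.values()) / len(leakage_by_vector)

def sampled_average(leakage_by_vector, num_inputs, num_vectors=1500, seed=0):
    """Average leakage over uniformly random input vectors."""
    rng = random.Random(seed)  # fixed seed is an illustrative choice
    total = 0.0
    for _ in range(num_vectors):
        vector = tuple(rng.randint(0, 1) for _ in range(num_inputs))
        total += leakage_by_vector[vector]
    return total / num_vectors

exact = exhaustive_average(NAND2_LEAKAGE_W)
estimate = sampled_average(NAND2_LEAKAGE_W, num_inputs=2)
print(f"exhaustive: {exact:.3e} W, sampled: {estimate:.3e} W")
```

For a benchmark with dozens of inputs the exhaustive average is out of reach, which is why only the sampled estimate is reported in Table 4.3.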

Columns 1 and 6 of Table 4.3 list the MCNC’91 benchmark names. Columns 2 and 7

of Table 4.3 give the leakage values of the various benchmarks implemented using the

180 nm BPTM. Similarly, Columns 3 and 8 give leakage values of the benchmarks for

the 130 nm BPTM; Columns 4 and 9 give leakage values of the benchmarks for the 100

nm BPTM; and Columns 5 and 10 give leakage values of the benchmarks for the 70 nm

BPTM. The circuit C7552, containing approximately 3500 gates, is the largest design

among all the benchmarks chosen, while circuit x2, containing approximately 50 gates,

is the smallest. From Table 4.3, we observe that the standby leakage of every benchmark drops by more than an order of magnitude (roughly 18X) as the technology, together with its supply voltage, shrinks from 180 nm down to 70 nm.
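The trend can be checked directly against Table 4.3. The short sketch below copies the 180 nm and 70 nm entries for the smallest circuit (x2), the largest circuit (C7552) and I1, and computes the reduction factor; the variable names are illustrative. Note that the two columns were simulated at different supply voltages (1.8V and 0.85V, per Table 4.1).

```python
# (180 nm, 70 nm) VSDCAD standby leakage in nW, copied from Table 4.3.
LEAKAGE_NW = {
    "x2": (2.846, 0.154),
    "I1": (3.117, 0.169),
    "C7552": (238.0, 12.873),
}

# Factor by which leakage drops when moving from 180 nm to 70 nm.
reduction = {name: v180 / v70 for name, (v180, v70) in LEAKAGE_NW.items()}

for name, factor in reduction.items():
    print(f"{name}: {factor:.1f}X lower leakage at 70 nm")
```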

4.3 Area and Delay Comparison

In addition to leakage power reduction, the VSDCAD leakage reduction technique needs

to be evaluated for essential performance parameters like area and delay. Towards this

end, we compare it with the well-established LECTOR technique [33]. In order to com-

pare propagation delays using both techniques, a two-input NAND gate was used as

an example, and implementation was done using the BPTM’s 100 nm and 70 nm tech-

nologies. Table 4.4 lists the delay and leakage power comparison for the LECTOR and

VSDCAD techniques. For a fair comparison, the supply voltage was set to 1V for both

the 100 nm and 70 nm process technologies (as was done in [33]). The LECTOR NAND

gate used was the 1-LCT case reported in [33], where the widths of the LCT1 and LCT2

transistors were the same as those of the PMOS and NMOS transistors of the PUN and

PDN (seen in Figure 1.2). Hence in the VSDCAD NAND implementation, the width of

the high-VT transistor P1 (seen in Figure 3.1 of Chapter 3) was set to be the same size

as that of LECTOR LCT1 transistor (seen in Figure 1.2 of Chapter 1). All other PUN

and PDN transistors were sized the same as those in the LECTOR case. The extra sleep

transistors, P0 and N0 (seen in Figure 3.1 of Chapter 3) were unit-sized.

Table 4.4: Leakage Power and Delay Comparison for Two-Input NAND Gate

100 nm Process Technology (BPTM), Supply Voltage = 1V

NAND Gate Type | Leakage Power (W), Input (0, 0) | Input (0, 1) | Input (1, 0) | Input (1, 1) | Delay (ps) | Delay Penalty | Average Leakage Savings Ratio
Conventional | 1.228e-10 | 9.117e-10 | 5.356e-10 | 2.241e-09 | 13.53 | - | -
LECTOR | 1.180e-10 | 5.542e-10 | 4.477e-10 | 1.539e-09 | 18.79 | 38.88% | 1.433X
VSDCAD | 8.114e-12 | 8.114e-12 | 8.114e-12 | 8.114e-12 | 16.73 | 23.65% | 117.1X

70 nm Process Technology (BPTM), Supply Voltage = 1V

Conventional | 6.450e-10 | 5.600e-09 | 3.817e-09 | 1.091e-08 | 15.16 | - | -
LECTOR | 6.065e-10 | 3.808e-09 | 3.622e-09 | 5.564e-09 | 21.40 | 41.16% | 1.5421X
VSDCAD | 5.142e-12 | 5.142e-12 | 5.142e-12 | 5.142e-12 | 18.29 | 20.65% | 1019.6X

The values reported in Rows 2, 3, 5 and 6 of Table 4.4 are those given in [33]. Row 2 and Row 5 list the leakage and delay values for a conventional NAND gate, using 100 nm BPTM and 70 nm BPTM, respectively. Row 3 and Row 6 list the leakage and delay

values for the LECTOR NAND gate, using 100 nm BPTM and 70 nm BPTM, respectively.

Row 4 and Row 7 of Table 4.4 give the leakage (in standby mode) and delay values for

the VSDCAD NAND gate, using 100 nm BPTM and 70 nm BPTM, respectively. All high-

lighted values in Table 4.4 represent the best values achieved. Analysis of the results

shows that the VSDCAD technique has the least leakage power dissipation, while the

conventional NAND gate has the least propagation delay value. Column 7 of Table 4.4

gives the delay penalties of both the LECTOR and VSDCAD techniques, when compared

to the conventional case. We see that the VSDCAD technique has a lesser delay penalty

than LECTOR, for both the 100 nm and 70 nm cases. Another interesting fact observed

from Column 7 is that, with shrinking technologies (70 nm compared with 100 nm), the

LECTOR delay penalty increases, while the delay penalty of VSDCAD decreases. Col-

umn 8 is the average leakage savings ratio of the LECTOR and VSDCAD techniques,

when compared with the conventional NAND gate. The VSDCAD technique saves two orders of magnitude (117.1X) for the 100 nm case and three orders of magnitude (1019.6X) for the 70 nm case, whereas the LECTOR technique saves only about 1.4X and 1.5X, respectively.
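Columns 7 and 8 of Table 4.4 can be re-derived from the other columns. The sketch below assumes that the delay penalty is the relative delay increase over the conventional gate and that the savings ratio divides the conventional leakage, averaged over the four input vectors, by the corresponding average for each technique. Both definitions are inferred from the tabulated numbers, and the function names are illustrative.

```python
# Per-vector leakage (W) and delay (ps), copied from Table 4.4.
TABLE_4_4 = {
    100: {
        "Conventional": ([1.228e-10, 9.117e-10, 5.356e-10, 2.241e-09], 13.53),
        "LECTOR": ([1.180e-10, 5.542e-10, 4.477e-10, 1.539e-09], 18.79),
        "VSDCAD": ([8.114e-12] * 4, 16.73),
    },
    70: {
        "Conventional": ([6.450e-10, 5.600e-09, 3.817e-09, 1.091e-08], 15.16),
        "LECTOR": ([6.065e-10, 3.808e-09, 3.622e-09, 5.564e-09], 21.40),
        "VSDCAD": ([5.142e-12] * 4, 18.29),
    },
}

def mean(values):
    return sum(values) / len(values)

def delay_penalty_percent(node_nm, gate):
    """Relative delay increase over the conventional gate, in percent."""
    base_delay = TABLE_4_4[node_nm]["Conventional"][1]
    return 100.0 * (TABLE_4_4[node_nm][gate][1] / base_delay - 1.0)

def average_savings_ratio(node_nm, gate):
    """Conventional average leakage divided by the technique's average leakage."""
    base_leakage = mean(TABLE_4_4[node_nm]["Conventional"][0])
    return base_leakage / mean(TABLE_4_4[node_nm][gate][0])

for node_nm in (100, 70):
    for gate in ("LECTOR", "VSDCAD"):
        print(node_nm, gate,
              f"{delay_penalty_percent(node_nm, gate):.2f}%",
              f"{average_savings_ratio(node_nm, gate):.1f}X")
```

With these definitions the delay penalties reproduce the table to two decimals. The recomputed 100 nm VSDCAD ratio comes out near 117.4X rather than the tabulated 117.1X, a small difference that is presumably due to rounding of the printed values.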

Table 4.5: Experimental Results for MCNC’91 Benchmarks (70 nm BPTM Process, Supply Voltage = 1V)

MCNC’91 Circuit Name | U-MCNC Leakage (µW) | LECTOR Leakage (µW) | VSDCAD Leakage (nW) | LECTOR Normalized Area Ratio (w.r.t. U-MCNC) | VSDCAD Normalized Area Ratio (w.r.t. U-MCNC) | Leakage Savings Ratio (C2/C4) | Leakage Savings Ratio (C3/C4)
I1 | 1.159 | 0.156 | 0.362 | 1.21 | 1.13 | 3202X | 431X
I2 | 2.305 | 0.735 | 0.858 | 1.14 | 1.06 | 2686X | 857X
I3 | 1.383 | 0.419 | 0.708 | 1.18 | 1.10 | 1953X | 592X
I4 | 2.356 | 0.632 | 0.944 | 1.12 | 1.05 | 2496X | 670X
I5 | 4.625 | 0.475 | 2.243 | 1.19 | 1.11 | 2062X | 212X
I6 | 6.906 | 1.912 | 2.676 | 1.13 | 1.05 | 2581X | 714X
I7 | 8.933 | 3.126 | 3.706 | 1.12 | 1.05 | 2410X | 844X
I8 | 30.05 | 5.038 | 14.41 | 1.08 | 1.01 | 2085X | 350X
I9 | 21.90 | 2.897 | 4.108 | 1.12 | 1.05 | 5331X | 705X
I10 | 40.47 | 5.842 | 17.78 | 1.15 | 1.07 | 2276X | 329X
C432 | 1.395 | 0.672 | 1.259 | 1.17 | 1.09 | 1108X | 534X
C499 | 3.469 | 1.444 | 1.590 | 1.15 | 1.07 | 2182X | 908X
C880 | 6.141 | 1.154 | 3.014 | 1.18 | 1.10 | 2038X | 372X
C1355 | 8.089 | 1.672 | 4.297 | 1.11 | 1.04 | 1883X | 389X
C1908 | 19.61 | 1.926 | 6.925 | 1.13 | 1.05 | 2832X | 278X
C2670 | 52.17 | 2.845 | 9.388 | 1.19 | 1.11 | 5557X | 303X
C3540 | 64.79 | 3.852 | 13.13 | 1.14 | 1.06 | 4935X | 293X
C5315 | 82.58 | 4.826 | 18.15 | 1.12 | 1.05 | 4550X | 266X
C6288 | 163.7 | 9.725 | 19.01 | 1.10 | 1.03 | 8611X | 512X
C7552 | 323.2 | 10.24 | 27.64 | 1.08 | 1.01 | 11693X | 370X
Average | - | - | - | 1.1405X | 1.0645X | 3624X | 496X

Next, in order to compare the area utilization using both VSDCAD and LECTOR techniques, 20 MCNC’91 benchmark circuits were used, and implementation was done using the BPTM’s 70 nm technology. These circuits include the largest circuit (C7552), as well as one of the smallest circuits (I1) in the benchmark suite. Table 4.5 lists the normalized area and leakage power comparison for the LECTOR and VSDCAD techniques.

For a fair comparison, the supply voltage was set to 1V (as done in [33]). That is the

reason that the VSDCAD leakage values for the 20 benchmarks given in Column 4 of

Table 4.5 do not match the leakage values of the same 20 benchmarks from Column 5

of Table 4.3 (whose supply voltage was 0.85V).

The values reported in Columns 2, 3 and 5 of Table 4.5 are those given in [33]. Column 2 lists the leakage power of the unmodified MCNC’91 (U-MCNC) circuits, while Column 3 lists the leakage power of the LECTOR MCNC circuits. Column 4 of Table 4.5

gives the leakage (in standby mode) of the VSDCAD MCNC circuits. An important point

to note is that the leakage of U-MCNC and LECTOR circuits is in the order of micro-

Watts (µW), while the leakage of VSDCAD circuits is in the order of nano-Watts (nW).

Columns 5 and 6 give the normalized area ratio (penalty) of the LECTOR and VSDCAD

techniques, compared to U-MCNC. On average, a 6.45% area increase was observed in

the case of VSDCAD, as opposed to 14% area increase in the case of LECTOR. Column 7

of Table 4.5 lists the leakage savings when comparing the U-MCNC leakage value (from

Column 2 {C2}) to the VSDCAD leakage value (from Column 4 {C4}). Three orders of

magnitude leakage savings are seen (average of Column 7). Similarly, Column 8 of

Table 4.5 lists the leakage savings seen when comparing the LECTOR circuit leakage

value (from Column 3 {C3}) to the VSDCAD circuit leakage value (from Column 4 {C4}).

Two orders of magnitude leakage savings are observed (average of Column 8). From

these results, it can be clearly ascertained that the use of VSDCAD cells offers enor-

mous leakage savings, at the cost of a nominal increase in area.
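Because Columns 2 and 3 of Table 4.5 are in µW while Column 4 is in nW, the savings ratios in Columns 7 and 8 involve a unit conversion. The sketch below reproduces them for three rows copied from the table; the function and constant names are illustrative.

```python
# (U-MCNC leakage in µW, LECTOR leakage in µW, VSDCAD leakage in nW),
# copied from Columns 2 to 4 of Table 4.5.
TABLE_4_5 = {
    "I1": (1.159, 0.156, 0.362),
    "C432": (1.395, 0.672, 1.259),
    "C7552": (323.2, 10.24, 27.64),
}
NW_PER_UW = 1000.0

def savings_ratios(name):
    """Return (C2/C4, C3/C4) with all leakage values converted to nW."""
    u_mcnc_uw, lector_uw, vsdcad_nw = TABLE_4_5[name]
    return (u_mcnc_uw * NW_PER_UW / vsdcad_nw,
            lector_uw * NW_PER_UW / vsdcad_nw)

for name in TABLE_4_5:
    c2_c4, c3_c4 = savings_ratios(name)
    print(f"{name}: C2/C4 = {c2_c4:.0f}X, C3/C4 = {c3_c4:.0f}X")

# Average area penalty of VSDCAD from the normalized ratio 1.0645.
vsdcad_area_increase_percent = (1.0645 - 1.0) * 100.0
```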

Only an order of magnitude leakage savings was observed in Tables 3.4 and 3.6 of

Chapter 3, while two to three orders of magnitude savings were observed in Table 4.5.

This is due to the fact that BPTM models were used here instead of TSMC models

and also 70 nm was the technology used in this implementation as compared to the

180 nm implementation in Chapter 3. Much larger leakage losses are expected as

the technology shrinks. Consequently, for deep-submicron technologies huge leakage

savings (2 to 3 orders of magnitude) are seen using the VSDCAD technique over the

standard circuit implementation.

4.4 Leakage Savings Comparison

Since power-gating is a commonly used method for reducing leakage power, we com-

pared it with the VSDCAD technique for the 70 nm BPTM in the standby mode of oper-

ation. Five representative MCNC’91 benchmark circuits were used in this comparison.

In our power-gated CMOS circuit implementation, a high-VT PMOS transistor connects

between the power supply (Vdd) and the PUN, acting as a switch. For a fair comparison,

the high-VT PMOS transistors were sized the same for both the power-gated circuit and

the VSDCAD circuit implementations. Figure 4.1 shows the leakage power dissipation

comparison between both the techniques for the 5 circuits. From the experimental re-

sults, we noted a 5.7X improvement (on average) in leakage savings using the VSDCAD

technique, when compared to the use of traditional power-gating technique.

Figure 4.1: Leakage Power Dissipation Comparison (bar chart; x-axis: MCNC’91 Benchmark Circuit (I7, C1908, C5315, dalu, x3); y-axis: Leakage Power Dissipation (nW), 0 to 50; series: Power-Gated Circuit and VSDCAD Circuit; BPTM 70 nm, Vdd = 0.85V)

Comparison of the VSDCAD technique with the well-established LECTOR technique

shows significant leakage savings (orders of magnitude) of the former over the latter.

The area and delay penalty of VSDCAD circuits were also less than those seen in the

case of LECTOR. Designers and synthesis tools could add the VSDCAD cells in non-

critical paths, thereby avoiding effects on the overall circuit delay, while significantly

saving on leakage power loss.

In the following chapter, additional experimental research results are reported. An

example illustrating the leakage savings obtained after replacing only non-critical cells

in the circuit is presented. The effects of higher temperatures on leakage power are

also studied.

Chapter 5

Additional Experimental Research

The VSDCAD ultra low-power RTL standard cell library can be seamlessly integrated

into a typical low-power synthesis framework. In Chapter 4, we saw that there is a

delay penalty associated with the use of VSDCAD cells. Hence, designers and synthesis

tools could add the VSDCAD cells in non-critical paths, and avoid effects on the over-

all circuit delay, while significantly saving on leakage power loss. The subthreshold

leakage current is exponentially dependent on temperature, and its effects on leakage power need to be analyzed and studied in depth. In this chapter, an example illustrating the leakage savings obtained after replacing only non-critical cells in the circuit is

presented, followed by a study of higher temperature effects on leakage power.

5.1 Leakage Savings after Replacement of Only Non-Critical Cells

Figure 2.1 of Chapter 2 shows that a critical path analysis is performed on the RTL circuit, using Cadence [4] or Synopsys [8] tools, as part of the low-power synthesis framework. As an example, an 8-bit ripple-carry adder was designed, using TSMC’s 180 nm technology [9]. The components used to build the 8-bit adder (XOR2, AND2 and OR2) were standard cells whose transistor sizes were fixed, similar to those found in the library provided by Oklahoma State University [7, 31].

Table 5.1: Performance Characteristics of an 8-bit Standard Ripple Carry Adder

Output Signal | Propagation Delay (ns) | Slack Time (ns)
COUT | 1.8917 | 0
S7 | 1.4470 | 0.4447
Leakage Power Loss = 12.483 nW

A critical path analysis was performed on the 8-bit ripple-carry adder using Synopsys tools, and the timings/slack for all the standard cells in the various paths were obtained from

the logfiles. It was observed that the path to the carryout output signal (COUT ) was the

most critical, as it had no slack time. This adder was implemented only for demonstra-

tion purposes and may not be the optimal design implementation. Our adder design

was used in this analysis in order to show the effectiveness of our approach. Row 1,

Column 2 of Table 5.1 shows the propagation delay of the COUT signal. All the other

paths for the sum output signals (S0, . . . S7) had more slack time, when compared to

the critical COUT path. The slack time of the second most critical path, S7, is listed in

Row 2, Column 3 of Table 5.1. The slack times of all the other paths (S0, . . . S6) were

greater than 0.4447 ns. The leakage power loss was then measured for this 8-bit ripple-carry adder composed of standard cells. The result was 12.483 nW, as can be seen from Row 3 of Table 5.1.

Since enough available slack time was present for the sum paths (S0, . . . S7), all the

non-critical standard cells (XOR2 gates) in the various sum paths were replaced with

equivalent low-power VSDCAD library XOR2 gates. The standard cells that make up the critical COUT path were retained. Then, Synopsys tools were used to perform critical path analysis on this new 8-bit ripple-carry adder, made up of a mixture of standard cells and VSDCAD sleep-embedded cells.

Table 5.2: Performance Characteristics of an 8-bit Sleep-Embedded Ripple Carry Adder

Operating Mode | Output Signal | Propagation Delay (ns) | Slack Time (ns)
Active | COUT | 1.8912 | 0
Active | S7 | 1.5033 | 0.3879
Active | Leakage Power Loss = 12.990 nW
Standby (Sleep) | Leakage Power Loss = 6.213 nW

Column 4 of Table 5.2 shows the slack times obtained from the Synopsys logfiles. An

analysis of the new slack times shows that the slack times of the sum paths (S0, . . . S7)

have decreased because of the use of sleep-embedded cells, which have higher propagation delay. However, the critical path timing (COUT) of this new sleep-embedded circuit is essentially unchanged from that of the standard circuit (1.8912 ns versus 1.8917 ns). Next, the leakage power loss

in both the active and standby modes of operation was measured for this 8-bit ripple-

carry adder, which used a combination of standard cells and VSDCAD low-power cells.

Row 3 of Table 5.2 lists the leakage power loss in the active mode of operation as 12.990

nW. This is an increase of about 4% over the value listed in Row 3 of Table 5.1. How-

ever, during the standby mode of operation, the leakage loss, as seen from Row 4 of

Table 5.2, is 6.213 nW. Hence, during the sleep mode, leakage savings of more than 2X

were observed over the active mode operation of the circuit made up of standard cells.

The back-tracking algorithm for mixed-VT static CMOS circuits presented by Wei et al.

[70] could be used to evaluate the usage of high-VT or low-VT transistors in a circuit.
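The bookkeeping behind this example can be sketched in a few lines. The code below is an illustration only: it uses the delays and leakage values from Tables 5.1 and 5.2, treats slack as the critical-path delay minus the path delay (which is consistent with the tabulated numbers), and flags paths with positive slack as candidates for VSDCAD cells. It does not reproduce the Synopsys timing analysis, and the function names are hypothetical.

```python
def slack_ns(critical_delay_ns, path_delay_ns):
    """Slack of a path relative to the critical path, rounded to 0.1 ps."""
    return round(critical_delay_ns - path_delay_ns, 4)

def non_critical_paths(delays_ns):
    """Names of output paths with positive slack (replacement candidates)."""
    critical = max(delays_ns.values())
    return [name for name, delay in delays_ns.items()
            if slack_ns(critical, delay) > 0]

standard_adder = {"COUT": 1.8917, "S7": 1.4470}   # Table 5.1
mixed_adder = {"COUT": 1.8912, "S7": 1.5033}      # Table 5.2

# Leakage values (nW) from Tables 5.1 and 5.2.
standard_leakage, active_leakage, standby_leakage = 12.483, 12.990, 6.213
active_overhead_percent = 100.0 * (active_leakage / standard_leakage - 1.0)
standby_savings_ratio = standard_leakage / standby_leakage

print(non_critical_paths(standard_adder))
print(f"active overhead: {active_overhead_percent:.1f}%")
print(f"standby savings: {standby_savings_ratio:.2f}X")
```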

5.2 Effects of Varying Temperature on Leakage Power

Prior research [20, 34, 60, 64] shows a much larger leakage power loss occurring at

higher temperatures. This is due to the fact that subthreshold leakage current is ex-

ponentially dependent on temperature, as seen from Equation 1.1 in Chapter 1.
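To give a feel for this exponential dependence, the sketch below evaluates the off-state exponential term of a commonly used first-order subthreshold model, exp(-Vth / (n vT)) with thermal voltage vT = kT/q. This textbook form may differ from Equation 1.1, the subthreshold slope factor n = 1.5 is an assumed value, and the threshold is held fixed at the 180 nm NMOS value from Table 4.2 even though it falls with temperature in practice. The result is therefore only an order-of-magnitude illustration, not a prediction of the measurements reported in this section.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over electron charge, in V/K

def thermal_voltage(temp_c):
    """Thermal voltage vT = kT/q in volts for a temperature in deg C."""
    return K_OVER_Q * (temp_c + 273.15)

def off_state_factor(vth, temp_c, n=1.5):
    """exp(-Vth / (n * vT)): the Vgs = 0 exponential term of the model.

    n = 1.5 is an assumed subthreshold slope factor.
    """
    return math.exp(-vth / (n * thermal_voltage(temp_c)))

VTH_NMOS_180 = 0.4432  # standard NMOS threshold at 180 nm, Table 4.2

# Growth of the exponential term between 25 deg C and 125 deg C.
growth = (off_state_factor(VTH_NMOS_180, 125)
          / off_state_factor(VTH_NMOS_180, 25))
print(f"vT(25 C) = {thermal_voltage(25) * 1000:.2f} mV, growth = {growth:.1f}X")
```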

All the experimental results shown so far have been from simulations carried out on

various circuits at room temperature. Experiments to study the higher temperature

effects on leakage current were conducted on three CMOS circuits - NOR2x1, NAND2x1

and XOR2x1 gates. They were implemented using the BPTM’s 180 nm technology, and

the sizes of the transistors were fixed, similar to those found in the library provided by

Oklahoma State University [7, 31]. Circuit simulations were performed using HSPICE

[8] at six different temperature values: 25°C, 45°C, 65°C, 85°C, 105°C and 125°C.

Figure 5.1: Temperature Effects On Leakage Power - Standard Mode (line plot; Conventional CMOS Gates, BPTM 180 nm, Vdd = 1.8V; x-axis: Temperature (Deg C), 40 to 120; y-axis: Leakage Power Loss (nW), 0 to 1200; curves: Xor2x1, Nand2x1, Nor2x1)

Conventional implementations of all three circuits were first simulated using the stan-

dard mode of operation. Figure 5.1 shows a graph of the leakage power loss for the

NOR2x1, NAND2x1 and XOR2x1 gates at various temperature values. The XOR2x1 gate,

being a larger circuit than the other two, exhibits the largest leakage power loss at all

temperatures.

Figure 5.2: Temperature Effects On Leakage Power - Standby (Sleep) Mode (line plot; VSDCAD CMOS Gates, BPTM 180 nm, Vdd = 1.8V; x-axis: Temperature (Deg C), 40 to 120; y-axis: Leakage Power Loss (nW), 0 to 35; curves: Xor2x1, Nand2x1/Nor2x1)

Next, the VSDCAD sleep-circuitry was introduced, and these circuits were again simu-

lated in the standby (sleep) mode of operation. Figure 5.2 shows a graph of the leakage

power loss at various temperatures for all three circuits in the standby mode of oper-

ation. As can be seen, VSDCAD NAND2x1 and NOR2x1 gates exhibit the same leakage

loss at all temperatures.

Figure 5.3: Leakage Power Savings at Higher Temperatures (chart; BPTM 180 nm, Vdd = 1.8V; x-axis: Temperature (Deg C), 25 to 125; y-axis: Leakage Savings Ratio, 0 to 40X; series: NAND2x1 Gate, XOR2x1 Gate, NOR2x1 Gate)

Figure 5.3 shows the leakage savings ratio at higher temperatures between the VSDCAD technique and the conventional implementation. It can be observed from Figure 5.3 that the leakage savings ratios vary from 8X (for the XOR2x1 gate at 45°C) to 35X (for the XOR2x1 gate at 125°C). Even at higher temperatures, significant leakage

power savings are seen using the VSDCAD technique, when compared to the standard

mode of operation, thereby supporting the effectiveness of our approach.

In this chapter, results from additional experiments show large leakage savings using

the VSDCAD technique, even after replacing only non-critical cells in a circuit. Find-

ings also show significant leakage savings in VSDCAD circuits at much higher temper-

atures.

A controller is needed to integrate the VSDCAD ultra low-power RTL standard cell li-

brary into the low-power synthesis framework. The area, delay and power overheads

due to this controller also need to be studied. RTL simulations need to be performed

to verify whether or not the controller and datapath (design) work in synchronization

with each other. In the next chapter, a signal probability based self-controller tech-

nique is presented. Experiments illustrating the leakage savings and circuit delay

overheads of controller integrated circuits, when compared to other leakage reduction

techniques, are also reported.

Chapter 6

Signal Probability Based Self-Controller

The controller is the most important contribution of this dissertation. It is the vital link

that will integrate with the VSDCAD ultra low-power RTL standard cell library, to mini-

mize leakage power in any low-power synthesis framework. The design and implemen-

tation of this leakage reduction controller is explained in this chapter. The VSDCAD

generic low leakage CMOS cell implementation, shown in Section 3.1 of Chapter 3, has

a disadvantage in that the circuit has two extra control signals, “sleep” and its comple-

ment “sleepbar”, feeding in from some external source or controller. To overcome this,

the topology in Figure 3.1 of Chapter 3 was modified for purposes of self-controllability

and reducing routing congestion. Such modified gates are referred to as VCLEARIT

(Vlsi Cmos LEAkage ReductIon Technique) self-controlled gates [52] for the remainder of this work. Smaller leakage savings were observed using VCLEARIT gates, when

compared to values seen in Chapters 3 and 4. This is because a perfect voltage balance

was not achieved, unlike in the previous work. However, the trade-off factor was that

the VCLEARIT gate had one less control signal than the VSDCAD gate, and hence was

much easier to route as well as control.

6.1 VCLEARIT Control Circuitry Embedded CMOS Gates

Figure 6.1: VCLEARIT CMOS Gate (AND, NAND) with Control Value 0 (schematic; labels: inputs in1 ... inn, Vdd, Gnd, PUN, PDN, control transistors P0, P1 and high-VT N0, common points X1 and X2, ctrl, out)

Figure 6.1 is the topology of a VCLEARIT CMOS gate with a control value ‘0’ (e.g., AND,

NAND) illustrating the control transistor circuitry embedded in it. There are ‘n’ inputs,

in1, . . . inn, feeding the Pull-Up Network (PUN) and the Pull-Down Network (PDN). The

transistors in both the PUN and PDN are standard-VT devices. The control circuitry

consists of three transistors - two PMOS devices {P0 and P1} and one NMOS device

{N0}. Transistors P0 and P1 are standard-VT devices, while N0 is a high-VT device.

P0 is connected in parallel with the PUN, one end connecting to the source (Vdd) and

the other end to a common point X1. P1 is connected in parallel with the PDN, one end

connecting to the Gnd and the other end to a common point X2. The high-VT transistor,

N0, connects between the two common points X1 and X2 and behaves like a transmis-

sion gate. The output of the CMOS circuit, “out”, is drawn from the common point X1.

An input signal, “ctrl” feeds the 3 transistors P0, P1 and N0. The output of any gate is

known when one of its inputs has the control value, and hence, this gate can be placed

in the standby mode of operation. The idea here is to connect whichever input is the

controlling value, to the “ctrl” signal, thereby placing the gate in the standby mode of

operation.

The operation of the VCLEARIT CMOS gate is as follows. In the normal operating mode,

“ctrl” is on. This causes transistors {P0, P1} to turn off and transistor N0 to turn on.

The circuit now behaves exactly as a normal CMOS complementary circuit should. The

standby operating mode is a little more involved. In this mode, one of the ‘n’ inputs

has the controlling value ‘0’ of the {AND, NAND} gate output, so that the gate can be

switched off. Signal “ctrl” is off, so that transistors {P0, P1} turn on, while transistor

N0 turns off. Since P0 is on, common point X1 is also at voltage Vdd. The PUN is now

between two points of almost equal voltage potential (Vdd), and hence, no leakage cur-

rent should flow through it. Similarly, P1 is on and common point X2 is grounded. The

PDN is now between two points of almost equal voltage potential (Gnd), and hence,

no leakage current should flow through it. The leakage loss occurring during the sleep

mode will only be through the high-VT transistor N0, which is turned off, but connected

between points X1 and X2, which are at different voltage potentials.
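The two operating modes described above can be summarized with a small switch-level model. The sketch below covers only the control transistors of the control-value-0 gate of Figure 6.1, and treats each transistor as an ideal switch (PMOS on when "ctrl" is low, NMOS on when "ctrl" is high). The function name and the dictionary layout are illustrative and do not correspond to any simulator API.

```python
def vclearit_state(ctrl):
    """Switch-level state of the control circuitry for a given ctrl value.

    Models the control-value-0 gate of Figure 6.1 with ideal switches.
    """
    if ctrl not in (0, 1):
        raise ValueError("ctrl must be 0 or 1")
    standby = (ctrl == 0)
    return {
        "mode": "standby" if standby else "active",
        "P0": "on" if standby else "off",   # PMOS conducts when ctrl is low
        "P1": "on" if standby else "off",   # PMOS conducts when ctrl is low
        "N0": "off" if standby else "on",   # high-VT NMOS conducts when ctrl is high
        # In standby, P0 pulls X1 to Vdd and P1 pulls X2 to Gnd, so the PUN
        # and PDN each sit between two nodes of almost equal potential.
        "X1": "Vdd" if standby else None,
        "X2": "Gnd" if standby else None,
    }

for ctrl in (1, 0):
    print(ctrl, vclearit_state(ctrl))
```

In the standby state, the only device with a full potential difference across it is the turned-off high-VT transistor N0, which matches the leakage path identified in the text.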

Figure 6.2 is the topology of a VCLEARIT CMOS gate with a control value ‘1’ (e.g., OR,

NOR).

6.2 Gate Control Signal Calculation in Circuits

The signal probability of a line is defined as the probability of it being set to either a ‘0’

or a ‘1’ value by some driving value. The signal probabilities of intermediate points in a

Boolean CMOS circuit can be calculated from the signal probabilities of its primary inputs, by using well-known signal probability propagation techniques [30]. These signal controllability and propagation properties are exploited in controlling VCLEARIT gates.

Figure 6.2: VCLEARIT CMOS Gate (OR, NOR) with Control Value 1 (schematic; labels: inputs in1 ... inn, Vdd, Gnd, PUN, PDN, control transistors P0 (high-VT), N0 and N1, common points X1 and X2, ctrl, out)

Consider the circuit schematic shown in Figure 6.3. It contains four VCLEARIT gates

{G1, . . . G4}, with three inputs {A, B and C} and one output {OUT}. The gate output

probabilities are calculated as shown in [16]. The control value for a NAND gate is ‘0’

and its output value is ‘1’. Also, the control value for an AND gate is ‘0’ and its output

value is ‘0’. Equation 6.1 gives the gate output probability value for both these cases.

Similarly, the control value for a NOR gate is ‘1’ and its output value is ‘0’. Also, the

control value for an OR gate is ‘1’ and its output value is ‘1’. Equation 6.2 gives these

gate output probability values.

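$$\mathrm{GateOutputProb}[\mathrm{NAND}_1, \mathrm{AND}_0] = 1 - \prod_{i=0}^{n} P_i \qquad (6.1)$$

$$\mathrm{GateOutputProb}[\mathrm{NOR}_0, \mathrm{OR}_1] = \sum_{i=0}^{n} P_i - \prod_{i=0}^{n} P_i \qquad (6.2)$$

where $P_i$ is the signal probability of the $i$th input for the corresponding gate.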

Figure 6.3: Example illustrating Gate Control Signal Calculation (schematic; inputs A, B and C, each with probability 1/2; gates G1 to G4 with their ctrl connections; annotated probabilities 1/4, 1/4, 3/4 and 3/16; output OUT)

All the inputs {A, B and C} to the circuit shown in Figure 6.3 are assumed to have an equal probability (1/2) of having either a ‘0’ or a ‘1’ value. Using Equations 6.1 and 6.2, the output probabilities of all the four gates, G1, . . . G4, are calculated. The input probability for every gate input is calculated such that this input value is the controlling value for that gate. For example, NOR gate G2 has 2 inputs ‘B’ and ‘C’, both with probabilities (1/2) that their value is ‘1’, the controlling value of G2. Hence the output probability of G2 is (3/4) and the output value is ‘0’. However, G2 drives one input of gate G3, as well as one input of gate G4. The control value of G3 is ‘0’ and so the output value of gate G2 can drive this input. Hence the probability of this input of G3 is the same as that of the output of gate G2 which is (3/4). But, the control value of G4 is ‘1’, and hence, the opposite of the output value of gate G2 needs to drive this input of G4. So the probability of this input of G4 is (1 - 3/4 = 1/4).
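The G2 calculation can be reproduced with exact fractions. The sketch below evaluates Equations 6.1 and 6.2 literally as printed; the function names are illustrative. For two independent inputs, Equation 6.2 coincides with the usual inclusion-exclusion expression P(A or B) = P(A) + P(B) - P(A)P(B), which is the case exercised here.

```python
from fractions import Fraction
from math import prod

def eq_6_1(input_probs):
    """Equation 6.1 as printed: 1 - product of P_i (NAND/AND case)."""
    return 1 - prod(input_probs)

def eq_6_2(input_probs):
    """Equation 6.2 as printed: sum of P_i - product of P_i (NOR/OR case)."""
    return sum(input_probs) - prod(input_probs)

half = Fraction(1, 2)

# NOR gate G2: inputs B and C each carry the controlling value '1'
# with probability 1/2.
g2_output = eq_6_2([half, half])

# G3 has control value '0', which matches G2's output value, so the
# probability carries over unchanged.
g3_input_from_g2 = g2_output

# G4 has control value '1', the opposite of G2's output value, so the
# complement is used.
g4_input_from_g2 = 1 - g2_output

print(g2_output, g3_input_from_g2, g4_input_from_g2)
```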

Once all signal probability calculations are completed, the input with the maximum

probability control value for every gate is chosen as the one to be connected to the

“ctrl” signal of that gate. In the case of a tie, the first occurrence is chosen. These

are represented in Figure 6.3 with circles around them and the corresponding “ctrl”

connections are seen. Gates in this circuit will automatically (on the fly) go into the

standby mode of operation based on their input signal values. This signal probability

based self-controlling technique can be applied to all classes of VLSI circuits.

6.3 Leakage Loss Comparison

TSMC’s 180 nm technology [9], with a supply voltage of 1.8V, as well as BPTM’s 100

nm technology [3], with a supply voltage of 1V, were used in the implementation of

this work. SPECTRE [4] was used in this work to simulate circuits and also to measure

leakage power. All simulations were carried out at a temperature of 27°C. Experiments

were conducted on 13 CMOS circuits, the smallest being the MCNC’91 benchmark C17,

and the largest being the MCNC’91 benchmark C6288. To see the leakage effects across

different technologies, experimental results were taken for 8 benchmarks implemented

using TSMC’s 180 nm technology, and for 6 benchmarks implemented using BPTM’s 100

nm technology.

First using TSMC’s 180 nm technology, 8 standard CMOS circuits were implemented.

Simulations were carried out and leakage power measured using SPECTRE [4]. For

every circuit, all possible input combinations were applied, and leakage power loss

was measured in every case. Column 2 (C2) from Table 6.1 lists the average leakage

power loss for all standard circuit implementations. Next, the same circuits were im-

plemented with TSMC’s 180 nm LECTOR gates and all LECTOR implementations were

simulated and their leakage loss measured. Column 3 (C3) from Table 6.1 gives the

average leakage power loss for each of the 8 LECTOR circuits. Finally, all 8 circuits

were implemented using TSMC’s 180 nm VCLEARIT gates.

Table 6.1: Leakage Power Comparison @ Temperature = 27°C (TSMC’s 180 nm Implementation)

CMOS Circuit Name | Standard Implementation Leakage Power (pW) | LECTOR Implementation Leakage Power (pW) | VCLEARIT Implementation Leakage Power (pW) | Leakage Savings (C2/C4) | Leakage Savings (C3/C4)
C17 | 365.52 | 293.16 | 225.76 | 1.62X | 1.30X
Rbench1 | 631.79 | 519.95 | 424.02 | 1.49X | 1.23X
Rbench2 | 908.34 | 756.21 | 550.52 | 1.65X | 1.37X
Levelized | 1729.13 | 1408.32 | 1006.38 | 1.72X | 1.40X
Skewed | 618.87 | 529.80 | 409.97 | 1.51X | 1.29X
Balanced | 3460.33 | 2829.74 | 2034.49 | 1.70X | 1.39X
Full-Adder | 1104.33 | 878.82 | 751.15 | 1.47X | 1.17X
4-Bit Adder | 4155.22 | 3099.64 | 2457.81 | 1.69X | 1.26X
Average | - | - | - | 1.61X | 1.30X

An automated program was written using ANTLR [2, 59], which can parse a VHDL [11] circuit netlist, build its connectivity graph, calculate the various internal signal probabilities, and compute the input with the highest signal probability (for all the gates)

to connect to the gate’s “ctrl” signal. The input parameters to this program are: (1)

the VHDL circuit netlist, and (2) the switching probability values for all input signals

that turn off the corresponding gates they are connected to. The output of this pro-

gram is the corresponding VCLEARIT self-controlling leakage reduction circuit, with

all the “ctrl” signals connected appropriately. Appendix B illustrates the working of

this program for automated gate control signal calculation. The C17 MCNC’91 bench-

mark is used as an example, and control signals are generated, using this program for

the following two cases :

• When all the inputs have equal probability (0.5) of being either a ‘0’ or a ‘1’ value.

• When all the inputs have different probabilities of being either a ‘0’ or a ‘1’ value.
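The overall flow (walk the netlist, propagate signal probabilities, pick the "ctrl" input of each gate) can be sketched compactly for the equal-probability case. The code below is not the ANTLR/VHDL program described here. It assumes the widely published six-NAND c17 netlist with its usual net numbering, treats all gate inputs as independent (ignoring correlation from reconvergent fanout), and breaks ties by first occurrence, so its output is not guaranteed to match Appendix B.

```python
from math import prod

# Assumed netlist: the widely published six-NAND c17 circuit, written as
# (output net, input nets) and listed in topological order.
C17 = [
    ("10", ("1", "3")), ("11", ("3", "6")),
    ("16", ("2", "11")), ("19", ("11", "7")),
    ("22", ("10", "16")), ("23", ("16", "19")),
]

def choose_ctrl_inputs(netlist, prob_one):
    """Return ({gate output: input chosen for ctrl}, {net: P(net = 1)}).

    Works for an all-NAND netlist given P(net = 1) for each primary input.
    """
    p1 = dict(prob_one)
    ctrl = {}
    for out, inputs in netlist:
        # The NAND control value is '0', so rank inputs by P(input = 0).
        p_ctrl = [1 - p1[name] for name in inputs]
        # max() returns the first maximal item, i.e. first occurrence on ties.
        ctrl[out] = max(zip(inputs, p_ctrl), key=lambda pair: pair[1])[0]
        # Independence assumption: output is '1' unless all inputs are '1'.
        p1[out] = 1 - prod(p1[name] for name in inputs)
    return ctrl, p1

primary_inputs = {name: 0.5 for name in ("1", "2", "3", "6", "7")}
ctrl, p1 = choose_ctrl_inputs(C17, primary_inputs)
for gate, chosen in ctrl.items():
    print(f"gate {gate}: ctrl <- net {chosen}")
```

Passing different values in `primary_inputs` exercises the second case, where the inputs have unequal probabilities.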

This program generated the VCLEARIT self-controlling leakage reduction circuit for

the 8 benchmarks listed in Table 6.1. These circuits were then simulated and the leak-

age loss measured. Column 4 (C4) from Table 6.1 gives the average leakage power loss

for each of the 8 VCLEARIT circuits.

Column 5 of Table 6.1 lists the leakage savings obtained by comparing the standard implementation leakage (Column 2 {C2}) with the VCLEARIT implementation leakage (Column 4 {C4}). On average, the standard circuits leak 1.61X as much as their VCLEARIT counterparts, a 61% improvement. Similarly, Column 6 of Table 6.1 lists the savings obtained by comparing the self-controlled LECTOR circuit leakage (Column 3 {C3}) with the self-controlled VCLEARIT circuit leakage (Column 4 {C4}). On average, the VCLEARIT technique improves on the LECTOR implementation by 1.30X (30%) for TSMC’s 180 nm technology.

Then, using BPTM’s 100 nm technology, 6 MCNC’91 benchmark standard circuits were

implemented. Simulations were carried out and leakage power measured using SPEC-

TRE [4]. For every circuit, 1500 input combinations were applied and leakage power

loss measured in every case. Column 2 (C2) from Table 6.2 lists the average leakage

power loss for all standard circuit implementations. Next, the same circuits were im-

plemented with BPTM’s 100 nm LECTOR gates, and all LECTOR implementations were

simulated and their leakage loss measured. Column 3 (C3) from Table 6.2 gives the

average leakage power loss for each of the 6 LECTOR circuits. Finally, all 6 circuits

were implemented using BPTM’s 100 nm VCLEARIT gates.

The automated program generated the VCLEARIT self-controlling leakage reduction circuit for the 6 benchmarks listed in Table 6.2. These circuits were then simulated and the leakage loss measured. Column 4 (C4) from Table 6.2 gives the average leakage power loss for each of the 6 VCLEARIT circuits.

Table 6.2: Leakage Power Comparison @ Temperature = 27 °C (BPTM’s 100 nm Implementation)

CMOS Circuit   Standard             LECTOR               VCLEARIT             Leakage Savings   Leakage Savings
Name (C1)      Leakage Power (C2)   Leakage Power (C3)   Leakage Power (C4)   (C2/C4)           (C3/C4)
C17            6.23 nW              4.36 nW              3.49 nW              1.79X             1.25X
C432           436.60 nW            363.90 nW            256.20 nW            1.70X             1.42X
C499           693.80 nW            512.60 nW            425.80 nW            1.63X             1.20X
C880           223.20 nW            157.38 nW            101.88 nW            2.19X             1.55X
C1355          497.60 nW            357.22 nW            244.30 nW            2.04X             1.46X
C6288          7.193 µW             5.370 µW             4.165 µW             1.73X             1.29X
Average                                                                       1.85X             1.36X

Column 5 of Table 6.2 lists the leakage savings obtained by comparing the standard implementation leakage (Column 2 {C2}) with the VCLEARIT implementation leakage (Column 4 {C4}). On average, the standard circuits leak 1.85X as much as their VCLEARIT counterparts, an 85% improvement. Similarly, Column 6 of Table 6.2 lists the savings obtained by comparing the self-controlled LECTOR circuit leakage (Column 3 {C3}) with the self-controlled VCLEARIT circuit leakage (Column 4 {C4}). On average, the VCLEARIT technique improves on the LECTOR implementation by 1.36X (36%) for BPTM’s 100 nm technology.
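The savings factors in Columns 5 and 6 of Table 6.2 are plain ratios of the measured leakage values, and they can be re-derived with a few lines of Python. The sketch below is only an arithmetic check; the names `TABLE_6_2`, `savings`, and `mean` are hypothetical, and the averages are taken over unrounded ratios, which is an assumption about how the table was computed.

```python
# Average leakage power per circuit from Table 6.2 as
# (standard C2, LECTOR C3, VCLEARIT C4). Each row uses a single unit
# (nW, or µW for C6288), so the ratios are unit-free.
TABLE_6_2 = {
    "C17":   (6.23, 4.36, 3.49),
    "C432":  (436.60, 363.90, 256.20),
    "C499":  (693.80, 512.60, 425.80),
    "C880":  (223.20, 157.38, 101.88),
    "C1355": (497.60, 357.22, 244.30),
    "C6288": (7.193, 5.370, 4.165),
}


def savings(table):
    """Return {circuit: (C2/C4, C3/C4)} leakage-savings factors."""
    return {name: (c2 / c4, c3 / c4) for name, (c2, c3, c4) in table.items()}


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    factors = savings(TABLE_6_2)
    for name, (vs_std, vs_lector) in factors.items():
        print(f"{name:6s} {vs_std:.2f}X {vs_lector:.2f}X")
    avg_std = mean(f[0] for f in factors.values())
    avg_lector = mean(f[1] for f in factors.values())
    # A factor of k corresponds to a (k - 1) * 100 percent improvement.
    print(f"Average {avg_std:.2f}X ({(avg_std - 1) * 100:.0f}%) "
          f"{avg_lector:.2f}X ({(avg_lector - 1) * 100:.0f}%)")
```

A factor of 1.85X and an "85% improvement" are therefore the same statement: the standard circuit leaks 1.85 times as much as the VCLEARIT circuit.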

6.4 Circuit Delay Comparison

TSMC’s 180 nm technology [9], with a supply voltage of 1.8 V, was used in the implementation of this work. Transient analysis was performed on the 3 benchmark circuits listed in Column 1 of Table 6.3 using SPECTRE [4]. The output load in each case was an appropriately sized NAND2 gate.

Table 6.3: Circuit Delay Comparison (TSMC’s 180 nm Implementation)

CMOS Circuit   Standard     LECTOR       Power-Gated   VCLEARIT     Delay Penalty   Delay Penalty   Delay Penalty
Name (C1)      Delay (ps)   Delay (ps)   Delay (ps)    Delay (ps)   (C3/C2)         (C4/C2)         (C5/C2)
               (C2)         (C3)         (C4)          (C5)
Skewed         136.83       198.36       158.99        158.63       1.45X           1.16X           1.16X
Levelized      197.70       318.28       237.21        235.26       1.61X           1.20X           1.19X
Balanced       319.51       492.05       357.84        351.46       1.54X           1.12X           1.10X
Average                                                             1.53X           1.16X           1.15X

For a fair comparison, the power-gated cell was also designed like the VCLEARIT cell.

The gating high-VT transistor was an NMOS transistor for an AND/NAND gate, with

a control value of ‘0’. Likewise, the gating high-VT transistor was a PMOS transistor

for an OR/NOR gate, with a control value of ‘1’. All gates in each of the 3 benchmarks

were sized as follows. The width of the high-VT transistor in both the VCLEARIT cir-

cuit (N0 from Figure 6.1; P0 from Figure 6.2), and the power-gated circuit (P1 from

Figure 3.4 of Section 3.3 of Chapter 3), was ‘W’. The 2 LCT transistors (LCT1 and LCT2 from Figure 1.2 of Chapter 1) were each sized ‘W/2’. All other transistors in the gate were unit-sized.

Column 2 of Table 6.3 lists the delay of the 3 circuits for the standard implementation. Column 3 gives the delays for the LECTOR implementation, Column 4 the power-gated circuit delays, and Column 5 the delays of the VCLEARIT implementation. Column 6 shows the delay penalty (increase) of the LECTOR circuit (Column 3 {C3}) relative to the standard implementation (Column 2 {C2}); the average delay penalty is 53%. Column 7 shows the delay penalty of the power-gated circuit (Column 4 {C4}) relative to the standard implementation (Column 2 {C2}); the average delay penalty is 16%. Finally, Column 8 shows the delay penalty of the VCLEARIT circuit (Column 5 {C5}) relative to the standard implementation (Column 2 {C2}); the average delay penalty is 15%. The power-gated and VCLEARIT circuits show nearly identical delay penalties, because each adds one high-VT transistor in series with the circuit. The LECTOR implementation shows worse delays because of the two LCT transistors in series with the circuit.

Certain observations can be pointed out from this research. First, there is no need

for an external controller to sequence the operation of any circuit in order to reduce

leakage power. Internal signals can be tapped to implement self-control of the circuit

as done here. Second, routing of various controller signals to different portions of the

circuit, as in the case of MTCMOS circuits with an external controller, leads to complex

routing congestion problems. In this work, the layout is simple, and the only extra

routing involves internally connecting one of the input signals to the “ctrl” signal for

every VCLEARIT gate.

A novel self-controlling leakage reduction technique for CMOS circuits (VCLEARIT) is

presented in this chapter. Signal probabilities determine the mode of operation (func-

tional or standby) of the gates making up complex circuits. Experiments conducted on

a variety of combinational benchmarks for 180 nm and 100 nm technologies show sig-

nificant savings in leakage power for the VCLEARIT technique, when compared to stan-


dard circuit implementation, as well as to the LECTOR technique. The delay penalty

of the VCLEARIT technique was comparable to that of the power-gated circuit and was

far superior to that of the LECTOR leakage reduction technique.

The next chapter discusses experiments carried out to study the impact of different

circuit level topologies on the dynamic power dissipation of CMOS circuits. For a given

circuit topology, it would be a pity if excellent leakage savings are either offset or over-

run by excessive dynamic power values.


Chapter 7

Dynamic Power Study

Dynamic power is the active or switching component of the total power dissipation

of any circuit, as explained in Section 1.1 of Chapter 1. Equation 7.1 expresses the

dynamic power (PD) value [75] of any CMOS circuit:

$$P_D = \alpha f C_L V_{dd}^2 \qquad (7.1)$$

where $\alpha$ is the switching activity, $f$ is the operating frequency, $C_L$ is the load capacitance, and $V_{dd}$ is the supply voltage.
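As a worked illustration of Equation 7.1, the short Python sketch below evaluates the dynamic power for one operating point and shows the quadratic dependence on the supply voltage. Only the 1.8 V supply comes from this work; the switching activity, frequency, and load capacitance are assumed example values, and the function name `dynamic_power` is hypothetical.

```python
def dynamic_power(alpha, freq_hz, c_load_f, vdd_v):
    """Dynamic power in watts: P_D = alpha * f * C_L * Vdd^2 (Equation 7.1)."""
    return alpha * freq_hz * c_load_f * vdd_v ** 2


if __name__ == "__main__":
    # Assumed example values (not measurements from this work),
    # apart from the 1.8 V supply voltage.
    p_full = dynamic_power(alpha=0.1, freq_hz=100e6, c_load_f=10e-15, vdd_v=1.8)
    p_half = dynamic_power(alpha=0.1, freq_hz=100e6, c_load_f=10e-15, vdd_v=0.9)
    print(f"P_D at 1.8 V: {p_full * 1e6:.3f} uW")
    print(f"P_D at 0.9 V: {p_half * 1e6:.3f} uW")
    print(f"Ratio: {p_full / p_half:.1f}")
```

Halving the supply voltage cuts the dynamic power to one quarter, whereas changes in switching activity, frequency, or load capacitance act only linearly.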

Efficient leakage current minimization techniques should not drastically affect dy-

namic power values of a circuit. It is a case of bad design when excellent leakage

savings are either offset or over-run by excessive dynamic power values. Hence, an ef-

fective design or technique would be one which maximizes the net reduction in power

(static + dynamic) dissipation.

Experiments were carried out to study the impact of circuit level choices (LECTOR and VCLEARIT techniques) on the dynamic power dissipation of CMOS circuits. TSMC’s 180 nm technology [9] was used to implement the various circuits here. A supply voltage of 1.8 V was used, and all simulations were carried out at a room temperature of 27 °C. SPECTRE [4] was used to simulate the circuits and to measure dynamic power. Experiments were conducted on 5 CMOS circuits, the smallest being the MCNC’91 benchmark C17, and the largest being the Levelized circuit. For a fair comparison, the weighted average of the dynamic power (per transistor) for all circuits across the different techniques was used.

Table 7.1: Standard Circuit Dynamic Power Measurement @ Temperature = 27 °C (TSMC’s 180 nm Implementation)

CMOS Circuit Name   Total # Transistors   Weighted Standard Implementation Dynamic Power (µW)
C17                 24                    166.5432
Rbench1             36                    393.7464
Rbench2             54                    854.5338
Full-Adder          62                    1846.4654
Levelized           86                    4440.7562
∑                   262                   7702.045
Weighted Average                          29.3971

7.1 Standard Circuit Dynamic Power

First, all 5 standard CMOS circuits were implemented. Simulations with sequences of test vectors were carried out and the dynamic power was measured using SPECTRE [4]. Column 1 of Table 7.1 lists the names of the circuits used. The total number of transistors for each circuit is given in Column 2 of Table 7.1. The weighted dynamic power for all standard circuit implementations is shown in Column 3 of Table 7.1. Row 6, Column 2 of Table 7.1 gives the total number of transistors in all designs, 262. The sum of the weighted dynamic power values of all designs is given in Row 6, Column 3 of Table 7.1. Row 7 of Table 7.1 shows the weighted average dynamic power (per transistor) for the standard circuit implementation to be 29.3971 µW.

Table 7.2: LECTOR Dynamic Power Measurement @ Temperature = 27 °C (TSMC’s 180 nm Implementation)

CMOS Circuit Name   Total # Transistors   Weighted LECTOR Implementation Dynamic Power (µW)
C17                 36                    306.3708
Rbench1             56                    671.6472
Rbench2             84                    1535.9820
Full-Adder          98                    3233.6864
Levelized           134                   7750.2518
∑                   408                   13497.9380
Weighted Average                          33.0832

7.2 LECTOR Circuit Dynamic Power

Next, the LECTOR versions of all 5 CMOS circuits were implemented. Simulations with sequences of test vectors were carried out and the dynamic power was measured using SPECTRE. Column 1 of Table 7.2 lists the names of the circuits used. The total number of transistors for each circuit is given in Column 2 of Table 7.2. The weighted dynamic power for all LECTOR circuit implementations is shown in Column 3 of Table 7.2. Row 6, Column 2 of Table 7.2 gives the total number of transistors in all designs, 408. The sum of the weighted dynamic power values of all designs is given in Row 6, Column 3 of Table 7.2. Row 7 of Table 7.2 shows the weighted average dynamic power (per transistor) for the LECTOR circuit implementation to be 33.0832 µW.


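Table 7.3: VCLEARIT Dynamic Power Measurement @ Temperature = 27 °C (TSMC’s 180 nm Implementation)

CMOS Circuit Name   Total # Transistors   Weighted VCLEARIT Implementation Dynamic Power (µW)
C17                 42                    299.7834
Rbench1             60                    661.0200
Rbench2             88                    1406.7064
Full-Adder          101                   3023.3744
Levelized           143                   7420.4130
∑                   434                   12811.2970
Weighted Average                          29.5191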
7.3 VCLEARIT Circuit Dynamic Power

Finally, the VCLEARIT versions of all 5 CMOS circuits were implemented. Simulations with sequences of test vectors were carried out and the dynamic power was measured using SPECTRE. Column 1 of Table 7.3 lists the names of the circuits used. The total number of transistors for each circuit is given in Column 2 of Table 7.3. The weighted dynamic power for all VCLEARIT circuit implementations is shown in Column 3 of Table 7.3. Row 6, Column 2 of Table 7.3 gives the total number of transistors in all designs, 434. The sum of the weighted dynamic power values of all designs is given in Row 6, Column 3 of Table 7.3. Row 7 of Table 7.3 shows the weighted average dynamic power (per transistor) for the VCLEARIT circuit implementation to be 29.5191 µW.

A small 0.42% increase in dynamic power was seen when comparing the VCLEARIT

implementation dynamic power (Row 7 of Table 7.3) to that of the standard implemen-


tation (Row 7 of Table 7.1). This is in contrast to a 12.54% increase in dynamic power,

when comparing the LECTOR implementation dynamic power (Row 7 of Table 7.2) to

that of the standard implementation (Row 7 of Table 7.1). These results show that

the VCLEARIT technique, in addition to providing significant leakage savings, does

not have any adverse effect on the dynamic power of the whole circuit. Hence, the

VCLEARIT technique could be used to reduce the net power (static + dynamic) dissipa-

tion for deep-submicron technologies.
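The per-transistor weighted averages and the percentage increases quoted above follow directly from the ∑ rows of Tables 7.1, 7.2, and 7.3. The Python sketch below repeats that arithmetic; the names `TOTALS`, `per_transistor`, and `percent_increase` are hypothetical and serve only as an illustration.

```python
# (total transistors, summed weighted dynamic power in µW), i.e. the
# ∑ rows of Tables 7.1, 7.2, and 7.3.
TOTALS = {
    "standard": (262, 7702.045),
    "LECTOR":   (408, 13497.938),
    "VCLEARIT": (434, 12811.297),
}


def per_transistor(totals):
    """Weighted average dynamic power per transistor, in µW."""
    return {name: power / count for name, (count, power) in totals.items()}


def percent_increase(value, baseline):
    """Increase of value over baseline, in percent."""
    return (value / baseline - 1.0) * 100.0


if __name__ == "__main__":
    avg = per_transistor(TOTALS)
    for name, value in avg.items():
        change = percent_increase(value, avg["standard"])
        print(f"{name:9s} {value:8.4f} uW/transistor  {change:+.2f}%")
```

Normalizing by transistor count is what makes the comparison fair: the LECTOR and VCLEARIT versions contain more transistors than the standard circuits, so their raw totals are not directly comparable.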

In the next chapter, the conclusions drawn from this research are presented, the sum-

mary of contributions of this dissertation clearly detailed, and the scope for future

work laid out.


Chapter 8

Conclusions

With the advent of deep-submicron technologies, leakage loss is a major concern for

scaling down portable devices that have burst-mode type integrated circuits. Leak-

age drains the battery, even when a circuit is completely idle. Power is unnecessarily

consumed, with no useful work being done. In this dissertation, we have developed a

novel self-controlled leakage reduction technique for CMOS circuits and have embed-

ded it into the low-power synthesis framework. In this chapter, we present a summary

of contributions of this work and give suggestions and pointers for future work.

8.1 Summary of Contributions

• Design and Development of the VSDCAD Sleep-Embedded Topology for

Leakage Reduction in CMOS Circuits [49]: A novel technique that achieves

cancellation of leakage effects in both the Pull-Up Network (PUN) as well as the

Pull-Down Network (PDN) of CMOS cells was devised. A combination of high-VT

and standard-VT sleep transistors embedded within the CMOS topology was used

in voltage balancing of the PUN and PDN paths, thereby shutting them off and

minimizing leakage loss.


• Characterization of the VSDCAD Ultra Low-Power Standard Cell Library

[51, 54]: As part of this research, an ultra low-power standard cell library was

developed on the basis of the VSDCAD topology. The VSDCAD ultra low-power

standard cell library contains 8 combinational and 2 sequential standard cells,

which have been characterized for area, delay and power.

• Signal Probability Based VCLEARIT Self-Controller Design for Leakage

Power Reduction [52]: The VSDCAD sleep-embedded topology was modified in

this work for better controllability and also to reduce routing congestion. The

self-controller is the vital segment of this dissertation work that sequences the

working of VSDCAD sleep-embedded cells in VLSI circuits. Signal probabilities

are used to determine the mode of operation (functional or standby) of such cells,

thereby avoiding the need for external circuitry.

• Seamless Integration of the Self-Controlled Sleep-Embedded Cells into

the Low-Power Synthesis Flow: A methodology to integrate the VSDCAD ultra

low-power library and the VCLEARIT self-controller into the RTL segment of the

low-power synthesis framework is presented in this work.

8.2 Scope for Future Work

• Study the Effects of Gate Leakage: The primary goal of this dissertation

work was to invent techniques to minimize subthreshold leakage. However, the

gate leakage problem poses a significant design challenge for sub-100 nm CMOS

technologies [32, 71]. The effects of the gate leakage component of power dissipa-

tion could be studied and augmented to provide a complete leakage minimization

package.


• Methodology for Comprehensive Leakage Reduction: This methodology

currently being developed at the VSDCAD laboratory in Syracuse University can

make use of the ultra low-power standard cell library developed as part of our

research. For active mode power reduction, multiple power domains are sup-

ported using clusters of high-VT and low-VT cells. For standby leakage reduction,

the cell assignment algorithm makes use of the MTCMOS technique. Hence, the

signal probability based VCLEARIT self-controller cells could be used for high-VT

allocation by this algorithm.

• Investigation of Leakage Reduction Techniques at Higher Levels of Ab-

straction: The investigation of leakage reduction techniques at the behavioral-

level or even system-level of abstraction would be an interesting topic for further

in-depth research.


Appendix A

Sleep-Embedded Master-Slave

POSX DFF Schematic

The following page shows the topology of the Positive Edge-triggered Sleep-Embedded

Master-Slave D Flip-Flop. The important cross-sections of the flip-flop are shown in

boxes - Master, Slave, Clock Inverter and State Saving circuitry.


Figure A.1: Block Diagram - VSDCAD Master-Slave Positive Edged D Flip-Flop

[Block diagram: MASTER and SLAVE stages, CLOCK INVERTER, and STATE SAVING CIRCUIT, built from VSDCAD CMOS-like blocks (PATH1 to PATH6) and transistors marked HV; signals D, clk, clkbar, sleep, sleepbar, Q, and Qbar; supplies vdd! and gnd!.]

Figure A.2: Schematic - VSDCAD Master-Slave Positive Edged D Flip-Flop


Appendix B

Automated Gate Control Signal

Calculation - Parser Output

The C17 MCNC’91 benchmark [72] is used to illustrate the automated output from the parser developed as part of this research. The circuit topology is shown in Figure B.1. It comprises 6 NAND gates, with 5 inputs (INP(0), ..., INP(4)) and 2 outputs (OUTPI(0), OUTPI(1)).

Figure B.1: C17 Circuit Topology

[Circuit diagram: six NAND gates, NAND0 to NAND5; primary inputs INP(0) to INP(4); internal signals INTERP(0) to INTERP(3); outputs OUTPI(0) and OUTPI(1).]

The VHDL implementation of the C17 benchmark is as follows :

--------------------------------------------------------------------
-- Source File Name : c17.vhd
-- Modified by      : Preetham Lakshmikanthan
-- VLSI Systems Design and CAD (VSDCAD) Laboratory
-- EECS Department, Syracuse University, Syracuse, NY-13244, U.S.A
--------------------------------------------------------------------
library IEEE;
use IEEE.std_logic_1164.all;
use work.gates_pkg.all;

ENTITY c17_i89 IS
  PORT (
    INP  : in  std_ulogic_vector(0 to 4);
    OUTP : out std_ulogic_vector(0 to 1));
END c17_i89;

ARCHITECTURE structural OF c17_i89 IS
  signal INTERP : std_ulogic_vector(0 to 3) := (others => '0');
  signal OUTPI  : std_ulogic_vector(OUTP'range) := (others => '0');
BEGIN
  NAND0 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INP(0),
      inp(1) => INP(2),
      out1   => INTERP(0));

  NAND1 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INP(2),
      inp(1) => INP(3),
      out1   => INTERP(1));

  NAND2 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INP(1),
      inp(1) => INTERP(1),
      out1   => INTERP(2));

  NAND3 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INTERP(1),
      inp(1) => INP(4),
      out1   => INTERP(3));

  NAND4 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INTERP(0),
      inp(1) => INTERP(2),
      out1   => OUTPI(0));

  NAND5 : NANDG_N generic map (2, 1 ns, 1 ns)
    port map (
      inp(0) => INTERP(2),
      inp(1) => INTERP(3),
      out1   => OUTPI(1));

  BUFFER_OUT : OUTP <= OUTPI;
END structural;
-------------------------------------------


The automated parser output for the C17 circuit when all the inputs have equal prob-

ability (0.5) of being either a ‘0’ or a ‘1’ value, is as follows :

--------------------------------------------------- ---------------

preetham@nyx~> execute

Please enter [Path Name]VHDL file to parse : c17.vhd

Please enter number of inputs -> 5

Enter Input[0] Name : INP(0)

Enter Input[0] Probability Value : 0.5









Number of Components in this Design : 6

Successor Gate List :

Gate NAND0 connects to NAND4

Gate NAND1 connects to NAND2 NAND3



Gate NAND4 has no successors



Input Signal Connectivity to :

NAND0 inputs are ...

1) INP(0) has probability 0.5 to have control value 0


.. and output probability of gate NAND0 to be 1 is : 0.75







2) Gate NAND1 has probability 0.75

.... o/p sig. is : INTERP(1) whose prob. is 0.25




















Control Signal Info :

INP(0) with max prob 0.5 is ctrl sig for Gate NAND0




INTERP(0) with max prob 0.25 is ctrl sig for Gate NAND4


--------------------------------------------------- ---------------

The automated parser output for the C17 circuit when the various inputs have different

probabilities of being either a ‘0’ or a ‘1’ value, is as follows :

--------------------------------------------------- ---------------

preetham@nyx~> execute

Please enter [Path Name]VHDL file to parse : c17.vhd

Please enter number of inputs -> 5












Number of Components in this Design : 6

Successor Gate List :







Input Signal Connectivity to :







2) INP(3) has probability 1 to have control value 0









1) INP(4) has probability 1 to have control value 0

















Control Signal Info :


INP(3) with max prob 1 is ctrl sig for Gate NAND1


INP(4) with max prob 1 is ctrl sig for Gate NAND3



--------------------------------------------------- ---


Bibliography

[1] Advanced Micro Devices, Inc.

http://www.amd.com.

[2] ANTLR v3.

http://www.antlr.org.

[3] Berkeley Predictive Technology Model (BPTM).

http://www-device.eecs.berkeley.edu/~ptm.

[4] Cadence Design Systems, Inc.

http://www.cadence.com.

[5] Intel Corporation.

http://www.intel.com.

[6] International Technology Roadmap for Semiconductors (ITRS-06).

http://www.itrs.net/Links/2006Update/FinalToPost/02 Design 2006Update.pdf.

[7] Oklahoma State University - Standard Cell Library.

http://avatar.ecen.okstate.edu/projects/scells.

[8] Synopsys, Inc.

http://www.synopsys.com.

[9] TSMC Processes Available Through MOSIS.

http://www.mosis.org/products/fab/vendors/tsmc.

[10] IEEE Standard 1364-1995 Hardware Description Language Based on the Verilog Hardware Description Language. IEEE Press.


[11] IEEE Standard 1076-1993 Standard VHDL Language Reference Manual. IEEE

Press.

[12] ABDOLLAHI, A., FALLAH, F., AND PEDRAM, M. Leakage Current Reduction in

CMOS VLSI Circuits by Input Vector Control. IEEE Transactions on Very Large

Scale Integration (VLSI) Systems 12, 2 (February 2004), 140–154.

[13] ABDOLLAHI, A., FALLAH, F., AND PEDRAM, M. A Robust Power Gating Struc-

ture and Power Mode Transition Strategy for MTCMOS Design. Under Review for

IEEE Transactions on Very Large Scale Integration (VLSI) Systems (2006), 1–24.

[14] ABDOLLAHI, A., AND PEDRAM, M. “Power Minimization Techniques at the RT-

level and Below” In System-On-Chip: Next Generation Electronics. IEE Press,

2006, pp. 387–410.

[15] ABDOLLAHI, A., PEDRAM, M., FALLAH, F., AND GHOSH, I. Precomputation-

based Guarding for Dynamic and Leakage Power Reduction. In Proceedings of

the 21st International Conference on Computer Design (October 2003), pp. 90–97.

[16] ABRAMOVICI, M., BREUER, M. A., AND FRIEDMAN, A. D. Digital Systems Test-

ing and Testable Design. Wiley-IEEE Press, New York, USA, 1994, p. 672.

[17] AGARWAL, K., DEOGUN, H., SYLVESTER, D., AND NOWKA, K. Power Gating

with Multiple Sleep Modes. In Proceedings of the 7th International Symposium

on Quality Electronic Design (March 2006), pp. 633–637.

[18] ALOUL, F. A., HASSOUN, S., SAKALLAH, K. A., AND BLAAUW, D. Robust SAT-

Based Search Algorithm for Leakage Power Reduction. In Proceedings of the 12th

International Workshop on Power And Timing Modeling, Optimization and Sim-

ulation (September 2002), pp. 167–177.

[19] ASHENDEN, P. J. The Designer’s Guide to VHDL, second ed. Morgan Kaufmann

Publishers, San Francisco, USA, 2001, p. 759.

[20] BENINI, L., MICHELI, G. D., AND MACII, E. Designing Low-Power Circuits:

Practical Recipes. IEEE Circuits and Systems Magazine 1, 1 (January 2001), 6–

25.

[21] BHUNIA, S., MAHMOODI, H., GHOSH, D., MUKHOPADHYAY, S., AND ROY, K.

Low Power Scan Design Using First Level Supply Gating. IEEE Transactions on

Very Large Scale Integration (VLSI) Systems 13, 3 (March 2005), 384–395.


[22] BORKAR, S. Gigascale Integration - Challenges and Opportunities (Home > Intel

Software Network > Strategies & Technologies).

http://www.intel.com/cd/ids/developer/asmo-na/eng/strategy/182440.htm?page=2.

[23] CALHOUN, B. A., HONORE, F. A., AND CHANDRAKASAN, A. Design Methodology

for Fine-Grained Leakage Control in MTCMOS. In IEEE International Sympo-

sium on Low Power Electronics and Design (August 2003), pp. 104–109.

[24] CARBALLO, J. A., BURNS, J. L., YOO, S. M., VO, I., AND NORMAN, V. R. A Semi-

Custom Voltage-Island Technique and its Application to High-Speed Serial Links.

In Proceedings of the 2003 International Symposium on Low Power Electronics

and Design (August 2003), pp. 60–65.

[25] CHEN, Z., JOHNSON, M., WEI, L., AND ROY, K. Estimation of Standby Leakage

Power in CMOS Circuits Considering Accurate Modeling of Transistor Stacks. In

IEEE International Symposium on Low Power Electronics and Design (August

1998), pp. 239–244.

[26] COPELAND, D. 64-bit Server Cooling Requirements. In Proceedings of the 2005

IEEE 21st Annual Semiconductor Thermal Measurement and Management Sym-

posium (March 2005), pp. 94–98.

[27] DUARTE, D., TSAI, Y. F., VIJAYKRISHNAN, N., AND IRWIN, M. J. Evaluating

Run-Time Techniques for Leakage Power Reduction. In Proceedings of the 7th

Asia and South Pacific Design Automation Conference/15th International Confer-

ence on VLSI Design (January 2002), pp. 31–38.

[28] ELKARABLIEH, B., AND NUNEZ, A. A Synthesis Technique for Reducing Leakage

Based on Signal Controllability. In Proceedings of the 3rd IEEE International

Conference on Electrical and Electronics Engineering (September 2006), pp. 339–

342.

[29] FALLAH, F., AND PEDRAM, M. Standby and Active Leakage Current Control and

Minimization in CMOS VLSI Circuits. IEICE Transactions on Electronics, Special

Section on Low-Power LSI and Low-Power IP E88-C, 4 (April 2005), 509–519.

[30] GOLDSTEIN, L. Controllability/Observability Analysis of Digital Circuits. IEEE

Transactions on Circuits and Systems 26, 9 (September 1979), 685–693.


[31] GRAD, J., AND STINE, J. E. A Standard Cell Library for Student Projects. In

Proceedings of the 2003 IEEE International Conference on Microelectronic Sys-

tems Education (June 2003), pp. 98–99.

[32] GUINDI, R. S., AND NAJM, F. N. Design Techniques for Gate-Leakage Reduction

in CMOS Circuits. In Proceedings of the 4th International Symposium on Quality

Electronic Design (March 2003), pp. 61–65.

[33] HANCHATE, N., AND RANGANATHAN, N. LECTOR: A Technique for Leakage

Reduction in CMOS Circuits. IEEE Transactions on Very Large Scale Integration

(VLSI) Systems 12, 2 (February 2004), 196–205.

[34] HE, L., LIAO, W., AND STAN, M. R. System Level Leakage Reduction Consider-

ing the Interdependence of Temperature and Leakage. In Proceedings of the 41st

Design Automation Conference (June 2004), pp. 12–17.

[35] HELLER, L., GRIFFIN, W., DAVIS, J., AND THOMAS, N. Cascode Voltage Switch

Logic: A Differential CMOS Logic Family. In Proceedings of the IEEE Interna-

tional Solid-State Circuits Conference (February 1984), pp. 16–17.

[36] HILLMAN, D. Using Mobilize Power Management IP for Dynamic & Static Power

Reduction in SoC at 130 nm. In Proceedings of the Design, Automation and Test

in Europe (March 2005), vol. 3, pp. 240–246.

[37] HOSSAIN, R., ZHENG, M., AND ALBICKI, A. Reducing Power Dissipation in

CMOS Circuits by Signal Probability based Transistor Reordering. IEEE Trans-

actions on Computer-Aided Design of Integrated Circuits and Systems 15, 3

(March 1996), 361–368.

[38] HUNG, W. L., LINK, G. M., XIE, Y., VIJAYKRISHNAN, N., DHANWADA, N., AND

CONNER, J. Temperature-Aware Voltage Islands Architecting in System-on-Chip

Design. In Proceedings of the IEEE International Conference on Computer Design

(October 2005), pp. 689–694.

[39] IMAN, S., AND PEDRAM, M. POSE: Power Optimization and Synthesis Envi-

ronment. In Proceedings of the 33rd Design Automation Conference (June 1996),

pp. 21–26.

[40] JOHNSON, M. C., SOMASEKHAR, D., AND ROY, K. Models and Algorithms for

Bounds on Leakage in CMOS Circuits. IEEE Transactions on Computer-Aided

Design of Integrated Circuits and Systems 18, 6 (June 1999), 714–725.


[41] KAO, J. T., AND CHANDRAKASAN, A. P. Dual-Threshold Voltage Techniques for

Low-power Digital Circuits. IEEE Journal of Solid-State Circuits 35, 7 (July

2000), 1009–1018.

[42] KAPADIA, H., BENINI, L., AND MICHELI, G. D. Reducing Switching Activity on

Datapath Buses with Control-Signal Gating. IEEE Journal of Solid-State Circuits

34, 3 (March 1999), 405–414.

[43] KUO, G. Low-Power Design Goes Mainstream. EE Times 1371 (May 2005), 56.
http://www.eetimes.com/news/latest/showArticle.jhtml?articleID=163101933.

[44] KURSUN, V., AND FRIEDMAN, E. G. Energy Efficient Dual Threshold Voltage

Dynamic Circuits Employing Sleep Switches To Minimize Subthreshold Leakage.

In Proceedings of the IEEE International Symposium on Circuits and Systems

(May 2004), vol. 2, pp. 417–420.

[45] KURSUN, V., AND FRIEDMAN, E. G. Node Voltage Dependent Subthreshold Leak-

age Current Characteristics Of Dynamic Circuits. In Proceedings of the 5th Inter-

national Symposium on Quality Electronic Design (March 2004), pp. 104–109.

[46] LACKEY, D. E., ZUCHOWSKI, P. S., BEDNAR, T. R., STOUT, D. W., GOULD, S. W.,

AND COHN, J. M. Managing Power and Performance for System-on-Chip Designs

using Voltage Islands. In Proceedings of the IEEE/ACM International Conference

on Computer-Aided Design (November 2002), pp. 195–202.

[47] LAKSHMIKANTHAN, P., MULCHANDANI, S., AND NUNEZ, A. Sizing Analog Cir-

cuits using an Improved Optimization-Based Tool. In Proceedings of the 2nd

IASTED International Conference on Circuits, Signals and Systems (November

2004), pp. 130–135.

[48] LAKSHMIKANTHAN, P., AND NUNEZ, A. Design Issues and Implementation

Strategies for Building On-Chip Voltage Level-Shifting Circuits. In Proceedings

of the 4th IASTED International Conference on Circuits, Signals and Systems

(November 2006), pp. 144–149.

[49] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Novel Methodology To Reduce Leak-

age Power In CMOS Complementary Circuits. In Proceedings of the 16th Inter-

national Workshop on Power And Timing Modeling, Optimization and Simulation

(September 2006), pp. 614–623.


[50] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Novel Methodology To Reduce Leakage

Power In Differential Cascode Voltage Switch Logic Circuits. In Proceedings of

the 3rd IEEE International Conference on Electrical and Electronics Engineering

(September 2006), pp. 335–338.

[51] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Novel Methodology To Reduce Leak-

age Power In Master-Slave D Flip-Flops. In Proceedings of the 9th Military

and Aerospace Programmable Logic Devices International Conference (September

2006), pp. 1–7.

[52] LAKSHMIKANTHAN, P., AND NUNEZ, A. A Signal Probability Based Self-

Controlling Leakage Reduction Technique for CMOS Circuits. In Proceedings of

the 4th IEEE International Conference on Electrical and Electronics Engineering

(September 2007), pp. 357–360.

[53] LAKSHMIKANTHAN, P., AND NUNEZ, A. VCLEARIT: A VLSI CMOS Circuit

Leakage Reduction Technique for Nanoscale Technologies. In Proceedings of the

Advanced Low Power Systems Workshop at the 21st ACM International Conference

on Supercomputing (June 2007), pp. 15–22.

[54] LAKSHMIKANTHAN, P., SAHNI, K., AND NUNEZ, A. Design of Ultra-Low

Power Combinational Standard Library Cells Using A Novel Leakage Reduction

Methodology. In Proceedings of the 19th IEEE International System-On-Chip Con-

ference (September 2006), pp. 93–94.

[55] MAHMOODI-MEIMAND, H., AND ROY, K. Data-Retention Flip-Flops for Power-

Down Applications. In IEEE International Symposium on Circuits and Systems

(May 2004), vol. 2, pp. 677–680.

[56] MUTOH, S., DOUSEKI, T., MATSUYA, Y., AOKI, T., SHIGEMATSU, S., AND

YAMADA, J. 1-V Power Supply High-Speed Digital Circuit Technology with

Multithreshold-Voltage CMOS. IEEE Journal of Solid-State Circuits 30, 8 (Au-

gust 1995), 847–854.

[57] NARENDRA, S., BORKAR, S., DE, V., ANTONIADIS, D., AND CHANDRAKASAN, A.

Scaling of Stack Effect and its Application for Leakage Reduction. In Proceedings

of the International Symposium on Low Power Electronics and Design (August

2001), pp. 195–200.


[58] NARENDRA, S., DE, V., BORKAR, S., ANTONIADIS, D. A., AND CHANDRAKASAN, A. P. Full-Chip Subthreshold Leakage Power Prediction and Reduction Techniques for Sub-0.18-µm CMOS. IEEE Journal of Solid-State Circuits 39, 2 (February 2004), 501–510.

[59] PARR, T. The Definitive ANTLR Reference: Building Domain-Specific Languages. The Pragmatic Programmers, Texas, USA, 2007, p. 384.

[60] PEDRAM, M., AND RABAEY, J. M. Power Aware Design Methodologies. Kluwer Academic Publishers, Massachusetts, USA, 2002, p. 544.

[61] PIVIN, D. Pick the Right Package for Your Next ASIC Design. EDN 39, 3 (February 1994), 91–108.

[62] RABAEY, J. M., CHANDRAKASAN, A. P., AND NIKOLIC, B. Digital Integrated Circuits - A Design Perspective, second ed. Prentice Hall Publishers, New Jersey, USA, 2002, p. 761.

[63] RAJAPANDIAN, S., SHEPARD, K. L., HAZUCHA, P., AND KARNIK, T. High-Tension Power Delivery: Operating 0.18µm CMOS Digital Logic at 5.4V. In IEEE International Solid-State Circuits Conference (February 2005), pp. 298–299.

[64] ROY, K., AND PRASAD, S. Low-Power CMOS VLSI Circuit Design. Wiley-Interscience, New York, USA, 2000, p. 376.

[65] SHIGEMATSU, S., MUTOH, S., MATSUYA, Y., TANABE, Y., AND YAMADA, J. A 1-V High-Speed MTCMOS Circuit Scheme for Power-Down Application Circuits. IEEE Journal of Solid-State Circuits 32, 6 (June 1997), 861–869.

[66] SMALL, C. Shrinking Devices Put the Squeeze on System Packaging. EDN 39, 4 (February 1994), 41–46.

[67] SUZUKI, Y., ODAGAWA, K., AND ABE, T. Clocked CMOS Calculator Circuitry. IEEE Journal of Solid-State Circuits SC-8, 6 (December 1973), 462–469.

[68] THOMAS, D. E., AND MOORBY, P. R. The Verilog Hardware Description Language, fifth ed. Kluwer Academic Publishers, Massachusetts, USA, 2002, p. 408.

[69] WEI, G. Y., AND HOROWITZ, M. A Fully Digital, Energy-Efficient Adaptive Power-Supply Regulator. IEEE Journal of Solid-State Circuits 34, 4 (April 1999), 520–528.

[70] WEI, L., CHEN, Z., ROY, K., YE, Y., AND DE, V. Mixed-Vth (MVT) CMOS Circuit Design Methodology for Low Power Applications. In Proceedings of the 36th Design Automation Conference (June 1999), pp. 430–435.

[71] YANG, G., WANG, Z., AND KANG, S. Gate Leakage Tolerant Circuits in Deep Sub-100 nm CMOS Technologies. Smart Materials and Structures 15, 1 (February 2006), S21–S28.

[72] YANG, S. Logic Synthesis and Optimization Benchmarks User Guide Version 3.0. Microelectronics Center of North Carolina Technical Report (January 1991).

[73] YANG, S., WOLF, W., VIJAYKRISHNAN, N., XIE, Y., AND WANG, W. Accurate Stacking Effect Macro-Modeling of Leakage Power in Sub-100nm Circuits. In Proceedings of the 18th International Conference on VLSI Design (January 2005), pp. 165–170.

[74] YE, Y., BORKAR, S., AND DE, V. A New Technique for Standby Leakage Reduction in High-Performance Circuits. In IEEE Symposium on VLSI Circuits Digest of Technical Papers (June 1998), pp. 40–41.

[75] YEO, K. S., AND ROY, K. Low-Voltage, Low-Power VLSI Subsystems. McGraw-Hill, New York, USA, 2005, p. 293.

[76] YUAN, L., AND QU, G. A Combined Gate Replacement and Input Vector Control Approach for Leakage Current Reduction. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 14, 2 (February 2006), 196–205.
