a gate level simulator for power consumption analysis

CARNEGIE MELLON Department of Electrical and Computer Engineering

A Gate Level Simulator for Power Consumption Analysis

David J. Pursley

1996

Advisor: Prof. Thomas

A gate level simulator for power consumption analysis

David J. Pursley ([email protected])

Department of Electrical and Computer Engineering

Carnegie Mellon University

Pittsburgh, PA 15213

Power consumption of digital circuits has become a critical design parameter. As such, it is neces-

sary that the system designer is able to estimate power consumption and correlate the results back

to high level specifications. A gate level tool that estimates power consumption and correlates the

results with functional modules and control states has been designed. This tool has produced estimations of the power consumption of twelve different implementations of the discrete cosine transform (DCT). These results are being used to judge the relative impact of high-level transformations, such as pipelining and varying the amount of resource sharing and parallelism, on power dissipation for the DCT algorithm.

A gate level simulator for power consumption analysis May 1, 1996 1

Acknowledgments

I would like to first thank my advisor, Don Thomas, for his patience, guidance and exam-

ple over the past two years.

I would also like to thank my research partners and officemates, Pinar Ceyhan and Sari Coumeri, for both their help and their willingness to always lend an ear.

Finally, I thank my professors at Bucknell University who first interested me in the field of

computer engineering and then aided me in my decision to continue towards my Master’s and

(someday) my Ph.D. Those helpful professors include Daniel Hyde, Jerud Mead, Xiannong Meng

(who is now at The University of Texas-Pan American), James Lu and Maurice Aburdene.

1.0 Introduction

Power consumption of digital circuits has become a critical design parameter. For example, porta-

ble applications require low power circuits to extend battery life, and all circuits have to deal with

the problem of electromigration. Thus, it is important that the system designer is able to estimate

power consumption and correlate the results with high level specifications.

We have designed a gate level tool that estimates power consumption and correlates the results to

the original register-transfer level (RTL) specifications. A unique aspect of this tool is that power

consumption is both estimated for individual modules and reported by control state. It can also be

back-annotated with actual capacitance values from layout to produce more accurate estimations.

This tool is also being used to help pinpoint areas where power-saving optimizations are most

needed and to verify the accuracy of existing statistical power estimation techniques. We have esti-

mated the power consumption of 12 different implementations of the discrete cosine transform,

and we are currently laying out the designs in order to obtain capacitance values for back-annotation. In the future, we hope to use this tool to aid in the design of systems using QuadRail technology, a low-power CMOS-based technology currently being designed at CMU [Kri96].

1.1 Our approach

Our goal is to provide a power estimation tool that will be maximally useful to the system designer

in considering various high level transformations, such as pipelining and varying the amount of

resource sharing and parallelism, and their effect on power consumption. Therefore, this tool must

be easy to integrate into existing high level design tool flows, and its results must aid the designer

in clearly identifying power consumption trade-offs. Our tool is easy to integrate into existing tool

flows as it accepts gate level Verilog code as input. Optionally, the tool also takes a list of capacitance estimates for each of the nets in the design. These estimates can be extracted from the layout of the design, or high level estimation techniques, such as those in [Don79][Feu82][Lan94], can be used.

The tool aids the designer in clearly identifying power trade-offs by correlating the results to the

original RTL specifications. A unique aspect of this tool is that not only does it correlate power

consumption to functional blocks, but it also reports power dissipation by control state. Correlat-

ing power consumption with control states allows the designer to further and better analyze system

designs, suggesting not only modules but also times and events that contribute to the power con-

sumption of the system. It gives the designer the ability to evaluate spatial and temporal trade-offs.

Temporal power trade-offs must be addressed for two reasons. First, by showing the designer the

control states that use the most power, the designer can concentrate high level low power optimi-

zations on the control states that need it most. Second, estimating power by both control state and

functional block may alert the designer to times and areas where peak power consumption is high.

The importance of minimizing peak power dissipation has been discussed in [San95]. One large

problem addressed by minimizing peak power consumption is electromigration.

In Section 2.0 we will discuss our power consumption model, and in Section 3.0 we will discuss

the implementation of our tool. Section 4.0 describes the results of applying this tool to 12 differ-

ent implementations of a DCT. Finally, Section 5.0 offers conclusions and outlines directions for

future work.

2.0 Power estimation

The purpose of this tool is to accurately estimate power consumption at a high level of abstraction.

More specifically, this tool estimates dynamic power dissipation based on simulation results.

Dynamic power dissipation of a CMOS circuit can be calculated with the following equation

[Wes85]:

$$P = \frac{1}{2} C V_{dd}^{2} f_s$$

where $C$ is the load capacitance, $V_{dd}$ is the supply voltage and $f_s$ is the switching frequency of the circuit. Our tool calculates $f_s$, and takes as inputs the values of $V_{dd}$ and $C$, which can either be extracted from layout or estimated by other tools [Don79][Feu82][Lan94]. Static and short circuit power estimation is not

taken into consideration here, although it is assumed that if the target cell library is known, these

could be calculated and added to the results produced by this tool.
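
As a worked illustration, the sketch below evaluates the usual CMOS dynamic power expression, P = 1/2 · C · Vdd² · f_s, for a single net. The function name, the reading of f_s as transitions per second of simulated time, and the numeric values are assumptions made for this example; they are not taken from the tool.

```python
def dynamic_power(capacitance_f, vdd_v, transitions, sim_time_s):
    """Dynamic power in watts for one net: P = 1/2 * C * Vdd^2 * f_s."""
    # Assumption: f_s is taken as transitions per second of simulated time.
    switching_freq_hz = transitions / sim_time_s
    return 0.5 * capacitance_f * vdd_v ** 2 * switching_freq_hz


# Hypothetical net: 50 fF load, 3.3 V supply, 2000 transitions in 1 ms.
power_w = dynamic_power(50e-15, 3.3, 2000, 1e-3)
print(f"{power_w:.3e} W")
```

Summing this quantity over the nets driven by a module would give a per-module estimate of the kind the tool reports.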

2.1 Related work

Most of the previous work done in power consumption estimation differs from our approach in the

level of abstraction at which the estimations are made or the method of obtaining the estimations.

Also, none of this work correlates the power estimates to both functional modules and control

states.

At the circuit level, both SPICE [Nag75] and PowerMill [Epi96] can be used to measure power consumption by digital systems. Although PowerMill can run over 1000 times faster than SPICE, it is still impractical to simulate at the circuit level for large designs or if many input vectors are to be simulated. At the next level of abstraction, the switch level, simulators such as IRSIM [Sal89]

are able to simulate circuits over 500 times faster than SPICE, with a root mean square error of

less than 15% [Lan94]. Still, faster simulations could be done at the gate level.

Several faster gate level tools have been developed, but none are generally applicable to a wide

variety of applications. Several require that the input vectors can be characterized probabilistically a priori [Naj91][Gho92][Cho94][Mar94]. Other work is designed for and applicable only to signal-processing algorithms [Pow90]. Devadas et al. have designed a generally applicable gate level algorithm, but it only predicts worst-case power dissipation [Dev90].

Landman and Rabaey have developed architecture level techniques, [Lan93] [Lan94] [Lan95], but

these also require a priori characterization of the input vectors. Although these tools provide accu-

rate results for the types of applications they are geared toward, a more generally applicable tool is

needed.

By working at the gate level of abstraction, our tool is able to simulate larger designs than the switch level simulators, and by making estimations based on simulation, our tool can be used to estimate the power of systems regardless of whether their inputs can be readily characterized.

We chose to use simulation-based power estimation over probabilistic power estimation tech-

niques for several reasons. First, not all systems have inputs that are easily or accurately character-

ized by probabilistic methods. Second, this tool fits into existing tool flows easily. Even if the

system could be accurately characterized for probabilistic estimation, designing the statistical

models may involve a significant amount of additional work for the designer. Simulation-based

estimation requires very little extra work for the designer. Finally, we hope to use this tool to verify

the results of probabilistic estimation methods for various types of algorithms.

3.0 Implementation

Our goal is to provide an easily integratable tool that accurately estimates power consumption at a

high level of abstraction. Our power estimates are calculated by simulating the hierarchical gate

level Verilog description and then post-processing the value change dump (VCD) file produced by the simulation. Thus, the designer does not need to alter the existing tool flow. This tool can be

added "on the side" for additional help in evaluating power trade-offs.

3.1 Tool flow

One of the goals in creating this tool is that it must be easily integratable into existing tool flows.

Since its input is hierarchical gate level Verilog, this tool can easily be inserted in existing tool

flows. Figure 1 illustrates where the power estimator can be used in the tool flow currently used in

the Center for Electronic Design Automation at Carnegie Mellon University. As shown in the diagram, this tool is used to estimate power after logic and datapath synthesis have been performed. Power estimation can be performed again after the circuit has been laid out to produce a more accurate estimation using capacitance values extracted from the layout information. Note also that

the addition of the power estimation tool does not alter the original tool flow at all. It merely adds

another tool that can be used when high level power estimates are desired.

3.2 Estimating power

The power estimation tool is actually a series of programs, as shown in Figure 2. The gate level Verilog code produced by logic and datapath synthesis is simulated with a standard Verilog simulator and a VCD file is created. The VCD file is then passed to headstripper and statestripper, which extract the information from the VCD file. Then listdrivers is invoked and, through use of the Verilog programming language interface (PLI), a list of the drivers of the nets in the design is produced. If the design has been laid out, parsespf is used to extract capacitances from the standard parasitics file (SPF). The results of all the programs are finally passed to power_parser, which produces power estimates by module and control state.

[Figure 1 diagram: Behavioral Level Verilog -> Behavioral Synthesis (SAW) -> Register Transfer Level Verilog -> Logic and Datapath Synthesis (Synopsys Design Compiler / CASCADE's Epoch) -> Gate Level Verilog -> Place and Route -> Standard Parasitics File (SPF)]

FIGURE 1. Tool flow at CEDA

Note that the implementation presented in Figure 2 assumes that all of the functionality of the

power estimation tool is being used. If, for example, layout has not yet been performed and no

standard parasitics file (SPF) is available, the parsespf program would never be invoked and no stripped SPF file would be passed to power_parser. If power estimates by control state were not desired, then statestripper would not be invoked and no state information file would be passed to power_parser. Similarly, if the power estimates are not to be correlated with functional modules, the listdrivers program would not be used.

Below we discuss the functionality and implementation of each of the programs involved in the power estimation tool. A user’s manual for the tool is located in the Appendix.

3.2.1 Simulation

The first step of the power estimation tool is simply a straightforward Verilog simulation that cre-

ates a value change dump (VCD) file. In general, a Verilog simulation would be done at this phase

of the design process even if no power estimates were to be made in order to verify the gate level

design. Thus, no overhead is added to the design process by running the simulation. The only

modification of the Verilog description that needs to be made is the addition of the value change dump Verilog commands, $dumpfile and $dumpvars, if these are not already included in the code.

By creating a VCD file, the designer can perform many different analyses on the same gate level

design while only running the actual Verilog simulation once. Because of their size, we usually

compress the VCD files and then pass them to the other programs through a pipe from zcat, a

UNIX program that outputs the contents of a compressed file. As a result, the VCD file cannot be

rewound during reading. This is largely why the estimation environment has been broken into sev-

eral smaller programs.

3.2.2 headstripper

The purpose of headstripper is to create a copy of the portion of the VCD file that defines the tokens of all nets, registers and variables. This information is needed for the listdrivers and power_parser programs. This information was originally copied to a separate file so that the VCD file would never have to be scanned more than once, since it is usually passed through a pipe from zcat, as mentioned above. Although the header would never have to be parsed more than once in the current incarnation of the tools, headstripper is still used.

[Figure 2 diagram: the gate level Verilog is simulated to produce a value change dump (VCD) file; headstripper produces the VCD header, statestripper the VCD state information, and listdrivers the driver list; the standard parasitics file (SPF) yields a stripped SPF; the final output is power estimates by module and control state, using capacitance values extracted from layout]

FIGURE 2. Implementation of the power estimation tool

headstripper runs fairly quickly and uses less than 100 KB of memory. headstripper was implemented with approximately 20 lines of C code.
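
The header-copying step can be sketched in a few lines. The following is a minimal Python illustration, not the original C program; it assumes the declaration section ends with an `$enddefinitions` keyword at the start of a line, as in standard VCD files, and the sample net names are hypothetical.

```python
import io


def strip_header(lines):
    """Yield VCD lines up to and including the $enddefinitions line."""
    for line in lines:
        yield line
        # Assumption: the keyword appears at the start of its own line.
        if line.lstrip().startswith("$enddefinitions"):
            break


# Hypothetical VCD excerpt: declarations followed by value changes.
SAMPLE_VCD = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$var reg 4 % CSTATE [3:0] $end
$upscope $end
$enddefinitions $end
#0
0!
b0000 %
"""

header = list(strip_header(io.StringIO(SAMPLE_VCD)))
print("".join(header), end="")
```

Because the generator stops reading as soon as the header ends, it works on a stream that cannot be rewound, such as a pipe from zcat.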

3.2.3 statestripper

statestripper is much like headstripper except that it parses the entire VCD file and copies only those lines that have to do with the control state variable. This is necessary so that the VCD file needs to be parsed only once during the execution of power_parser.

statestripper requires that the designer know the name of the net whose value is the control state. One limitation of this tool is that the control state must have one net name (such as CSTATE[3:0]) and cannot be a concatenation of several nets (such as {a[1], foo, bar, cout[3]}). Currently, the designer must manually alter the gate level Verilog code, if necessary. Since we are using the Synopsys Design Compiler for our designs, we have always found it easy to make this alteration since the names of the control state registers are nearly identical to the control state register names at the register transfer level.

statestripper executes fairly quickly, although it does take longer than headstripper as statestripper must parse the entire VCD file. It uses 100 KB of memory during execution, and is a simple program with approximately 70 lines of C code.
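
A rough sketch of this filtering is given below in Python. It is an illustration under stated assumptions rather than the original C code: the net is matched by its leaf name only (the real program takes a hierarchical name), the VCD is assumed well formed, and the sample data is hypothetical.

```python
def strip_state(lines, net_name):
    """Return (time, value) pairs for the vector net called net_name."""
    token = None      # identifier code the simulator assigned to the net
    now = 0           # most recent simulation time stamp
    changes = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "$var" and fields[4] == net_name:
            token = fields[3]
        elif fields[0].startswith("#"):
            now = int(fields[0][1:])
        elif (token is not None and fields[0][0] in "bB"
              and len(fields) == 2 and fields[1] == token):
            # Vector value change, e.g. "b0001 %"; keep the bits as text.
            changes.append((now, fields[0][1:]))
    return changes


# Hypothetical VCD excerpt with a 4-bit control state register.
SAMPLE = """$var wire 1 ! clk $end
$var reg 4 % CSTATE [3:0] $end
$enddefinitions $end
#0
0!
b0000 %
#10
1!
#20
0!
b0001 %
""".splitlines()

state_changes = strip_state(SAMPLE, "CSTATE")
print(state_changes)
```

Keeping only these pairs is what lets a later pass look up the active control state without rescanning the full VCD file.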

3.2.4 parsespf

The purpose of parsespf is to extract the capacitances for the nets in the design. The output is a list of net names with their associated capacitances.

This is simply a parser written in C++ that steps through the SPF file. It executes quickly and uses less than 200 KB of memory. parsespf is implemented with approximately 80 lines of C code.

3.2.5 listdrivers

In hierarchical Verilog descriptions several named nets in the hierarchy often refer to the same

physical net. Hereafter, I shall term such Verilog nets "analogous" nets. To accurately correlate

power estimations with modules, the Verilog name of the driver of the physical net must be deter-

mined. Then all power consumed on the net is attributed to the module containing the driver.

listdrivers outputs a list of the Verilog nets connected to the drivers.

The driving net is determined both by looking at the header of the VCD file and through use of the Verilog programming language interface (PLI). The VCD header file is used to rapidly determine analogous nets, since analogous nets are assigned to the same token in the VCD file. Once the analogous nets are found, the PLI is used to determine which of the nets is connected to the driver. Note that although the PLI is used, the simulation is not run a second time; listdrivers determines the drivers at the end of compilation and then exits.
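
The first half of this step, finding analogous nets from the VCD header, can be sketched as follows. This is a hypothetical Python illustration that assumes a well-formed header with `$scope`, `$upscope` and `$var` declarations; the PLI query that picks the driving net is not shown, and the module and net names are invented.

```python
from collections import defaultdict


def analogous_nets(header_lines):
    """Map each shared VCD identifier code to its hierarchical net names."""
    scope = []                      # stack of enclosing module names
    by_token = defaultdict(list)
    for line in header_lines:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "$scope":
            scope.append(fields[2])
        elif fields[0] == "$upscope":
            scope.pop()
        elif fields[0] == "$var":
            by_token[fields[3]].append(".".join(scope + [fields[4]]))
    # Only tokens shared by more than one name denote analogous nets.
    return {tok: names for tok, names in by_token.items() if len(names) > 1}


# Hypothetical header: top.sum and top.add0.s share the token "!".
HEADER = """$scope module top $end
$var wire 1 ! sum $end
$var wire 1 % clk $end
$scope module add0 $end
$var wire 1 ! s $end
$upscope $end
$upscope $end
""".splitlines()

groups = analogous_nets(HEADER)
print(groups)
```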

listdrivers still executes fairly quickly, but the memory requirement is much larger, as much as 60 MB for a 21,000 net design. However, approximately 23 MB of this is the overhead involved in running the Verilog simulator. The amount of memory used by the PLI code is O(n), where n is the number of distinct nets.

listdrivers was implemented with approximately 760 lines of C code linked to the Verilog simulator through the PLI.

3.2.6 power_parser

Finally, the header file, control state value change file and driver list are parsed along with the

VCD file and power estimations are produced. A general discussion of the algorithm used and its

complexity follows.

First, the header file is read and a sparse table data structure is created for the nets so that the search time for any net is O(1) when the net’s token (as specified in the VCD file) is known. Creating and initializing this structure is O(n), where n is the number of distinct nets.

The driver list is then parsed and the driving nets are tagged. The driver names are first placed in a binary tree. Building such a tree has time complexity O(nlogn). Each net must search the tree for a driver, so the time complexity of all of the searches is n · O(logn) = O(nlogn). Therefore, the total time complexity for parsing and tagging the drivers is O(nlogn).
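
The same O(nlogn) behaviour can be illustrated with a short Python sketch. Note the substitution: a sorted list with binary search stands in for the binary tree described in the text, and the function and net names are hypothetical.

```python
from bisect import bisect_left


def tag_drivers(net_names, driver_names):
    """Return {net name: True if the net appears in the driver list}."""
    ordered = sorted(driver_names)          # O(n log n) to build
    tagged = {}
    for name in net_names:                  # n searches, O(log n) each
        i = bisect_left(ordered, name)
        tagged[name] = i < len(ordered) and ordered[i] == name
    return tagged


# Hypothetical nets; top.a and top.c are listed as drivers.
tagged = tag_drivers(["top.a", "top.b", "top.c"], ["top.c", "top.a"])
print(tagged)
```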

The stripped SPF file is then read in and stored in a binary tree and the nets are assigned their cor-

responding capacitances with an algorithm similar to that used above. By a similar argument, the

total time complexity for assigning the capacitance values is O(nlogn).

The VCD file is then parsed and transitions are counted for each net and also categorized by time. Adding a transition for a net is O(1), as discussed above. Adding transitions for a certain Verilog time step is also O(1), so the total time for the parsing of the VCD file and counting transitions is O(v), where v is the number of value changes.
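
A minimal sketch of this counting pass is shown below. It is an assumption-laden illustration, not the tool's C++ code: Python dictionaries stand in for the sparse table, only scalar value changes are counted, and initial values at time 0 are counted as transitions for simplicity.

```python
from collections import Counter


def count_transitions(body_lines):
    """Count scalar value changes per net token and per time step."""
    per_net = Counter()
    per_time = Counter()
    now = 0
    for line in body_lines:
        line = line.strip()
        if not line or line.startswith("$"):
            continue                      # skip keywords such as $dumpvars
        if line.startswith("#"):
            now = int(line[1:])           # new simulation time step
        elif line[0] in "01xXzZ":
            token = line[1:]              # scalar change, e.g. "1!"
            per_net[token] += 1           # constant-time update per change
            per_time[now] += 1
    return per_net, per_time


# Hypothetical value-change section with two scalar nets.
BODY = ["#0", "0!", "0%", "#5", "1!", "#10", "0!", "1%"]
per_net, per_time = count_transitions(BODY)
print(dict(per_net), dict(per_time))
```

Each value change is handled with a constant number of dictionary updates, so the pass as a whole is linear in the number of value changes.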

Next, power is calculated for each net. This involves one floating-point multiply for each net, so

the time complexity of this is simply O(n). Next, the state value changes are parsed and transi-

tions are characterized by time. This involves stepping through an array with one entry for each

simulated time step, so the time complexity for this step is O(t), where t is the number of time

steps in the simulation.

Next, statistics are gathered by module. This involves stepping through the table of nets and doing a strcmp() function call for each net’s driver. Thus, O(n) strcmp() calls will be made.

Finally, statistics are gathered by control state. This involves O(t) comparisons and additions.

Note that O(t) operations must be performed for the entire system as well as for each module for

which statistics are being gathered. Since the number of modules for which statistics are being col-

lected can be assumed to be a small constant, the total number of operations that must be done to

collect state statistics is O(t).
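
The attribution of transitions to control states can be sketched as a single walk over the time steps. The Python below is a hypothetical illustration: the data layout (a dictionary of per-time counts and a sorted list of state changes) is assumed, and transitions that occur at the instant of a state change are credited to the new state.

```python
def transitions_by_state(per_time, state_changes):
    """Sum transition counts for each control state.

    per_time: {time: transition count}
    state_changes: list of (time, state) pairs sorted by time
    """
    totals = {}
    idx = -1                                  # index of the active state
    for t in sorted(per_time):
        # Advance to the last state change at or before time t.
        while idx + 1 < len(state_changes) and state_changes[idx + 1][0] <= t:
            idx += 1
        if idx >= 0:
            state = state_changes[idx][1]
            totals[state] = totals.get(state, 0) + per_time[t]
    return totals


# Hypothetical counts and state changes.
counts = {0: 2, 5: 1, 10: 2, 20: 4}
states = [(0, "0000"), (10, "0001"), (20, "0011")]
totals = transitions_by_state(counts, states)
print(totals)
```

Running the same walk once per module of interest, with that module's per-time counts, gives the per-module, per-state breakdown.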

Thus, the total worst-case time complexity of the VCD parser would be O(nlogn) for large designs with a shorter simulation time or O(v) for longer simulations. The memory usage for shorter and medium size simulations is dominated by O(n) because each net is represented by a class instantiation. The memory usage for a very long simulation would be O(t), as one double and one int are malloc’ed for each Verilog time step.

power_parser is the portion of the power estimation tool that consumes the most CPU time, as will be shown in Section 4.0. It also is the largest piece of code, implemented in over 2200 lines of C++.

3.3 Estimating capacitance

Accurate capacitance estimations are essential for accurate power estimations. In the set of tools described above, capacitance values are extracted from the design once a layout has been completed. Although this does give very accurate estimates of capacitance, it is not the only way such estimates could be produced. Since power_parser simply reads in a file of net names and their associated capacitances, the tool is highly flexible, allowing designers either to use high level estimation techniques such as those presented in [Don79][Feu82][Lan94] or to back-annotate capacitances by extracting them from the layout once it has been done. If high level estimation techniques were used, the only change to the tools above would be the modification or omission of parsespf.

1. v will always be greater than or equal to t, since a time step is executed in a simulation if and only if one or more value changes occur during that time step. Therefore, O(v) dominates O(t).

4.0 Case study: DCT

We have run twelve different versions of the one-dimensional discrete cosine transform. These

designs were created by Coumeri and the behavioral level differences in the designs can be seen in

Table 1 [Cou96]. Note that "# of partitions" refers to the number of pipeline stages (each with its

own control logic) in the design. DCT1 through DCT8 are not pipelined; they have only one parti-

tion. "# of mult" is the number of multipliers in the design. "memory prefetch" is "yes" if values

for iteration i+1 of the loop are being fetched while iteration i is being executed. This column does not apply to the pipelined designs (DCT9 through DCT12) since they are already executing multiple iterations of the loop at the same time. "grey code state encoding" and "grey code memory access" are "yes" if the states and memory addresses, respectively, are accessed in grey code order. Finally, "# of nets" is the number of physical nets in each design.

example  # of partitions  # of mult  memory prefetch  grey code state encoding  grey code memory access  # of nets
DCT1     1                3          no               no                        no                       15810
DCT2     1                3          no               yes                       no                       15879
DCT3     1                2          no               no                        no                       12196
DCT4     1                2          no               yes                       no                       12226
DCT5     1                3          yes              no                        no                       16848
DCT6     1                3          yes              yes                       yes                      16833
DCT7     1                2          yes              no                        no                       13306
DCT8     1                2          yes              yes                       yes                      13280
DCT9     6                3          n/a              no                        no                       21327
DCT10    6                3          n/a              yes                       yes                      21346
DCT11    2                3          n/a              no                        no                       16725
DCT12    2                3          n/a              yes                       no                       16745

TABLE 1. DCT Descriptions

Table 2 shows the CPU time and memory usage for headstripper, statestripper, listdrivers and power_parser being executed on three of the DCT designs. These three designs were chosen because DCT3 is the smallest example (i.e. fewest number of nets), DCT9 is one of the largest examples, and DCT1 is somewhere in between. The CPU time is reported in minutes and seconds. Note that these results are dependent on which statistics are being collected.

       headstripper        statestripper       listdrivers          power_parser
       CPU time  Memory    CPU time  Memory    CPU time  Memory     CPU time  Memory
DCT1   0:21      92 KB     6:56      100 KB    1:32      52196 KB   24:44     40940 KB
DCT3   0:17      92 KB     7:42      100 KB    1:19      50368 KB   20:05     39408 KB
DCT9   0:26      92 KB     5:33      100 KB    2:20      60000 KB   32:59     45212 KB

TABLE 2. Execution times and memory usage for the power estimation for three of the DCT designs

As stated in Section 3.2.6, collecting statistics by module and state both involve some overhead in

computation time. For each of the examples, the same statistics were being collected: total energy,

energy consumed by each multiplier, energy consumed by all of the adders and subtracters, energy

consumed by registers, energy consumed by random glue logic (everything except the above mod-

ules) and total energy by control state. The statistics were gathered on an IBM RS/6000 worksta-

tion with 384 MB of memory.

For our DCT examples, power_parser never took more than 35 minutes of CPU time on the

RS/6000, and most of the time was spent during the actual parsing of the VCD file. Also, the

power estimation environment never required more than 60 MB of memory.

The disk space overhead for the output files is presented in Table 3. For the 12 versions of the DCT we tested, ranging from 12,000 to 21,000 nets, headstripper and statestripper involved an overhead of 1.0 to 1.7 megabytes of hard disk storage for the header information and 56 to 140 kilobytes for the state information. Note that the input for the DCTs was 25 8x8 blocks of image data, and that the disk space overhead for state information scales linearly with the number of blocks in the simulation. The header information is unaffected by the length of the simulation. The hard disk space overhead involved in storing the list of drivers is 297 to 490 kilobytes for the DCT examples. This is independent of the length of the simulations.

        headstripper  statestripper  listdrivers  power_parser
DCT1    1216 KB       116 KB         395 KB       < 1 KB
DCT2    1223 KB       116 KB         396 KB       < 1 KB
DCT3    1000 KB       137 KB         290 KB       < 1 KB
DCT4    1004 KB       137 KB         291 KB       < 1 KB
DCT5    1335 KB       81 KB          412 KB       < 1 KB
DCT6    1335 KB       81 KB          412 KB       < 1 KB
DCT7    1129 KB       101 KB         309 KB       < 1 KB
DCT8    1126 KB       101 KB         308 KB       < 1 KB
DCT9    1723 KB       55 KB          479 KB       < 1 KB
DCT10   1726 KB       60 KB          479 KB       < 1 KB
DCT11   1318 KB       73 KB          411 KB       < 1 KB
DCT12   1320 KB       73 KB          411 KB       < 1 KB

TABLE 3. Disk space usage for power estimation of the 12 DCT designs

4.1 Results

Figure 3 gives a comparison of the results for the 12 designs. Since not all 12 designs were laid out, the results shown assume that all nets have the same capacitance, i.e. only transition counts are reported. One interesting thing we immediately noticed from these results is that using grey code for the state encoding produced a 10% or better reduction in transition count for all of the designs. Also, notice that pipelining the design reduces power consumption. However, for these designs it is arguable whether pipelining into six stages, as in DCT9 and DCT10, instead of two stages, as in DCT11 and DCT12, is worthwhile, as the power savings is about 10% while the complexity (as measured by number of nets) has increased over 30%. Still, if power savings is more important than minimizing complexity of the circuit, the six stage pipeline designs would be desired.

Figure 4 shows the transition count by control state for the DCT1 and DCT2 designs. Again, the value of using grey code for state encoding can be seen here. By comparing the states in DCT1 and DCT2 that perform the same functions, we see that the approximate 10% reduction is apparent in all similar states. (Remember that because the states of DCT2 are grey coded, DCT1 state 2 is the same as DCT2 state 3, DCT1 state 3 is the same as DCT2 state 2, DCT1 state 4 is the same as DCT2 state 6, etc.)

[Figure 3 plot: "Transitions by Module for Twelve DCT Designs"; x-axis: Module; y-axis: transitions, 0 to 8,000,000; one series per design, DCT1 through DCT12]

FIGURE 3. Comparison of 12 DCT designs

[Figure 4 plot: "Transitions by Control State for Two DCT Designs"; x-axis: Control State, 0 to 7; y-axis: transitions, 0 to 1,400,000]

FIGURE 4. Transition count by control state for two DCT designs

Finally, the DCT1 design was laid out and capacitance values were extracted to give us the results

in Figure 5. Notice that although the transition counts appear to be a good predictor of energy for

the multipliers, transition counting alone underestimates the amount of power consumed by ran-

dom logic and overestimates the power consumed by the adders and subtracters. This makes sense

as one can imagine that the random logic would often be driving fairly long nets with a large

capacitance, while the nets inside the adders and subtracters would be very short (i.e. low capacitance) most of the time. The nets inside the array multipliers would not be as short as those inside

the adders nor as long as those in the random logic. Correspondingly, the transition counts for the

multipliers are a better predictor of energy consumed than for either the random logic or the adders

and subtracters.

The transition counts in Figure 3 and Figure 5 differ because different target cell libraries were

used. Also, because the library used for Figure 5 had more accurate timing models, more glitching

occurred within the circuit. As a result, only 16 8x8 blocks were able to be simulated because of

the larger size of the VCD file. (The 16 block example produced a VCD file four times larger than

the 25 block examples that assume equal delay for all cells.) Thus, the results in Figure 3 and

Figure 5 should not be compared against each other.

Our gate level power estimation tool has produced estimations for 12 different DCT implementa-

tions. These estimations have allowed us to determine the relative impact of several high level

transformations, such as grey coding state assignments, resource sharing and pipelining, on

dynamic power dissipation for these designs.

[Figure 5 plot: "Transitions and Energy by Module for DCT1"; x-axis: Module; left y-axis: transitions, 0 to 10,000,000; right y-axis: energy, 0 to 2.50E-05; series: Transitions and Energy]

FIGURE 5. Power estimates including capacitance information for one DCT design (DCT1)

5.0 Conclusions and Future Work

We have designed a gate level tool that estimates power consumption and correlates the results

with the original register-transfer level specifications. It reports power estimates as a function of

control state as well as functional module, and it can accept capacitance estimates from layout or

other tools.

Currently, we are using this tool to judge the impact of high level transformations, such as pipelin-

ing and varying the amount of resource sharing and parallelism, on dynamic power dissipation for

the 12 designs of the DCT algorithm. In the near future we will also estimate power consumption

for the Aurora RISC microprocessor core, reporting results by functional units and by instruction.

As our low power design efforts continue, this tool will be used to help pinpoint areas where

power-saving optimizations are most needed and to verify the accuracy of existing statistical

power estimation techniques.

Eventually, we hope to categorize high level transformations by how likely each type is to

decrease the power consumption, or the power-delay product, of both standard CMOS and

QuadRail systems.

Appendix A - User’s manual

In the following sections, I shall outline how to use each subprogram of the power estimation tool.

Please refer to Figure 2 on page 9 to see how these tools are tied together.

A.1 headstripper

headstripper reads the VCD file from stdin and outputs the header file to stdout. To

invoke it from the command line, you would enter:

% headstripper < vcdfile > vcd.head

or, if the VCD file is compressed:

% zcat vcdfile.gz | headstripper > vcd.head
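As a rough illustration of what extracting a VCD header involves, the sketch below copies lines up to and including the `$enddefinitions ... $end` section, which closes the declaration part of a VCD file. It is not the source of headstripper; the function name `strip_header` and the sample data are assumptions made for this example.

```python
import io


def strip_header(vcd_lines):
    """Yield VCD lines up to and including the $enddefinitions section."""
    in_enddefs = False
    for line in vcd_lines:
        yield line
        if "$enddefinitions" in line:
            in_enddefs = True
        # "$end" is a substring of "$enddefinitions", so mask the keyword
        # before looking for the terminating "$end" token.
        if in_enddefs and "$end" in line.replace("$enddefinitions", ""):
            return


# Small made-up VCD fragment used only to exercise the sketch.
sample = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$upscope $end
$enddefinitions $end
#0
0!
#5
1!
"""
header = "".join(strip_header(io.StringIO(sample)))
```

To mimic the stdin-to-stdout behavior described above, the same generator could be driven with `sys.stdout.writelines(strip_header(sys.stdin))`.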

A.2 statestripper

statestripper also reads the VCD file from stdin and outputs to stdout. It also takes as

an argument the hierarchical Verilog name of the net with the value of the control state. As men-

tioned in Section 3.2.3 on page 10, the net must be a single multi-bit net and not a concatenation of

nets. To invoke the program from the command line, you would enter:

% statestripper top.foo.bar.CSTATE < vcdfile > vcd.state

or, if the VCD file is compressed:

% zcat vcdfile.gz | statestripper top.foo.bar.CSTATE > vcd.state
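The requirement that the control state be a single multi-bit net can be illustrated with a minimal sketch: a VCD file assigns one identifier code to each declared net, so following one net means looking up one code and then collecting its vector value changes. The code below is not the source of statestripper, and the format of the real vcd.state file is not reproduced here; the function names, the (time, value) output and the sample data are assumptions.

```python
def find_id_code(vcd_lines, hier_name):
    """Return the VCD identifier code declared for hier_name, or None."""
    scope = []
    for line in vcd_lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "$scope" and len(tokens) >= 3:
            scope.append(tokens[2])
        elif tokens[0] == "$upscope" and scope:
            scope.pop()
        elif tokens[0] == "$var" and len(tokens) >= 5:
            # $var <type> <width> <id_code> <reference> [range] $end
            reference = tokens[4].split("[")[0]
            if ".".join(scope + [reference]) == hier_name:
                return tokens[3]
        elif tokens[0] == "$enddefinitions":
            break
    return None


def state_changes(vcd_lines, id_code):
    """Yield (time, binary_value) for each change of the vector id_code."""
    time = 0
    for line in vcd_lines:
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0].startswith("#"):
            time = int(tokens[0][1:])
        elif tokens[0][0] in "bB" and len(tokens) == 2 and tokens[1] == id_code:
            yield time, tokens[0][1:]


# Small made-up VCD fragment used only to exercise the sketch.
sample = """$scope module top $end
$scope module dct $end
$var reg 2 % CSTATE [1:0] $end
$var wire 1 ! clk $end
$upscope $end
$upscope $end
$enddefinitions $end
#0
b00 %
0!
#10
1!
b01 %
#20
b10 %
""".splitlines()

code = find_id_code(sample, "top.dct.CSTATE")
changes = list(state_changes(sample, code))
```

A concatenation of several nets would have no single identifier code to follow, which is consistent with the restriction stated above.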

A.3 parsespf

parsespf also reads from stdin and outputs to stdout. It takes as an argument the hierarchical

prefix to be added to the net names present in the SPF file. For example, in our DCT examples,

the DCT was laid out, but in the Verilog source that passes input vectors to the DCT, the DCT

module was named top.dct_1. To invoke parsespf in this case we used:

% parsespf top.dct_1. < spffile > strippedspf

Note that the trailing period on top.dct_1. is necessary. If multiple modules were laid out,

then parsespf would be executed for each, and all of the stripped SPF files could then be

concatenated together as follows:

% cat first_strippedspf second_strippedspf > final_strippedspf

A.4 listdrivers

listdrivers is a PLI routine and should be invoked in the same way as the original simulation.

One additional argument must be added: +dump_ followed by the name of the VCD header

file. For one of our DCT simulations, we invoked listdrivers as follows:

% listdrivers +dump_dct.head sim.v -v dctlf.map2.v \

-v ../ms0803vcells_mosis -v ../ms080_3vprims \

-v ../mod.v +turbo+3

The output of listdrivers is a list of the driver nets, written to the file DRV.list.

A.5 power_parser

power_parser also accepts input from stdin and outputs to stdout. There are a number of arguments

that can be passed to power_parser, outlined below.

• -fdrv driverfile : name of the driver file created by listdrivers

• -fhead vcd.head : name of the VCD header file created by headstripper

• -fstate vcd.state : name of the VCD state file created by statestripper

• -fsspf strippedspf : name of the stripped SPF file created by parsespf

• -state top.foo.bar.CSTATE : name of the control state net. The net name should be the same as the one used in statestripper.

• -net top.foo.bar.\* : names of nets to determine energy consumption for. In this case, statistics would be gathered for the module top.foo.bar. Note that both leading and trailing *'s are legal, but internal *'s are not. Note also that the * must be escaped with the \ so that the shell does not try to expand it. Any number of -net arguments is legal.
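The wildcard rule for -net (a leading and/or trailing * is legal, an internal * is not) can be sketched as a small matching function. This is an illustration under assumptions, not the code of power_parser: the function name `net_matches` is hypothetical, and how the real tool treats a net that matches several -net arguments is not specified here. The patterns are shown as the program would receive them, after the shell has removed the escaping backslash.

```python
def net_matches(pattern, net_name):
    """Return True if net_name matches a -net style pattern.

    One leading and/or one trailing '*' acts as a wildcard; a '*'
    anywhere else is rejected, mirroring the rule described in the text.
    """
    leading = pattern.startswith("*")
    trailing = len(pattern) > 1 and pattern.endswith("*")
    start = 1 if leading else 0
    end = len(pattern) - 1 if trailing else len(pattern)
    core = pattern[start:end]
    if "*" in core:
        raise ValueError("internal '*' is not allowed: " + pattern)
    if leading and trailing:
        return core in net_name          # *text* : substring match
    if leading:
        return net_name.endswith(core)   # *text  : suffix match
    if trailing:
        return net_name.startswith(core) # text*  : prefix match
    return net_name == pattern           # no wildcard: exact match


# Hypothetical net names used only to exercise the sketch.
in_module = net_matches("top.foo.bar.*", "top.foo.bar.n42")
is_register = net_matches("*_reg*", "top.foo.acc_reg.q")
other_module = net_matches("top.foo.bar.*", "top.baz.n42")
```

A pattern such as `top.*.bar` raises `ValueError` in this sketch, which corresponds to the statement that internal *'s are not legal.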

As an example of the use of power_parser, the following command was used to get the

statistics displayed in Figure 5, "Power estimates including capacitance information for one DCT

design (DCT 1)," on page 20:

% zcat ../tmp/dctll.gz | ../parser -fdrv dctll.drv \

-fhead dctll.head -fstate dctll.state -state top.dct_1.CSTATE \

-net top.dct_1.multi* -net top.dct_1.mult_1\* \

-net top.dct_1.mult_2\* -net top.dct_1.mult_3\* \

-net top.dct_1.U\* -net top.dct_1.r\* -net \*_reg\* \

-net top.\* -fsspf dctll.spf > dctll.results

References

[Cho94] T. Chou, K. Roy and S. Prasad, "Estimation of circuit activity considering signal correlations and simultaneous switching," Proceedings of ICCAD 94, pp. 300-303, Nov. 1994.

[Cor90] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to Algorithms, New York: McGraw-Hill Book Company, pp. 244-259, 1990.

[Cou96] S. L. Coumeri, private communication, April 1996.

[Epi96] Epic Design Technologies, Inc., http://www.epic.com/powermill.html, 1996.

[Dev90] S. Devadas, K. Keutzer and J. White, "Estimation of power dissipation in CMOS combinational circuits," Proceedings of Custom IC Conference 90, pp. 19.7.1-19.7.6.

[Don79] W. Donath, "Placement and average interconnection lengths of computer logic," IEEE Transactions on Circuits and Systems, pp. 272-277, April 1979.

[Feu82] M. Feuer, "Connectivity of random logic," IEEE Transactions on Computers, pp. 29-33, Jan. 1982.

[Gho92] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switching activity in combinational and sequential circuits," Proceedings of DAC 92, pp. 253-259, 1992.

[Kri96] R. K. Krishnamurthy, I. Lys and L. R. Carley, "Static power driven voltage scaling and delay driven buffer sizing in mixed swing quadrail for sub-1V I/O swings," submitted to IEEE/ACM International Symposium on Low Power Electronics and Design 96, August 1996.

[Lan93] P. E. Landman and J. M. Rabaey, "Power estimation for high level synthesis," Proceedings of EuroDAC 93, pp. 361-366, Feb. 1993.

[Lan94] P. E. Landman, "Low-power architectural design methodologies," Electronics Research Laboratory, College of Engineering, University of California, Berkeley (UCB/ERL M94/62), 1994.

[Lan95] P. E. Landman and J. M. Rabaey, "Architectural power analysis: the dual bit type method," IEEE Transactions on VLSI Systems, pp. 173-187, June 1995.

[Mar94] R. Marculescu, D. Marculescu and M. Pedram, "Switching activity analysis considering spatiotemporal correlations," Proceedings of ICCAD 94, pp. 294-299, Nov. 1994.

[Nag75] L. W. Nagel, "SPICE2: a computer program to simulate semiconductor circuits," Technical report, University of California, Berkeley (ERL-M520), 1975.

[Naj91] F. Najm, "Transition density, a stochastic measure of activity in digital circuits," Proceedings of DAC 91, pp. 644-649, June 1991.

[Pow90] S. R. Powell and P. M. Chau, "Estimating power dissipation of VLSI signal processing chips: the PFA technique," VLSI Signal Processing IV, pp. 250-259, 1990.

[Sal89] A. Salz and M. Horowitz, "IRSIM: an incremental MOS switch-level simulator," Proceedings of DAC 89, pp. 173-178, 1989.

[San95] R. San Martin and J. P. Knight, "Power-profiler: optimizing ASICs power consumption at the behavioral level," Proceedings of DAC 95, pp. 42-47, June 1995.

[Wes85] N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Reading, MA: Addison-Wesley Publishing Company, pp. 147-149, 1985.
