CARNEGIE MELLON
Department of Electrical and Computer Engineering
A Gate Level Simulator for Power Consumption Analysis
David J. Pursley
1996
Advisor: Prof. Thomas
A gate level simulator for power consumption analysis
David J. Pursley
Department of Electrical and Computer Engineering
Carnegie Mellon University
Pittsburgh, PA 15213
Power consumption of digital circuits has become a critical design parameter. As such, it is necessary
that the system designer is able to estimate power consumption and correlate the results back
to high level specifications. A gate level tool that estimates power consumption and correlates the
results with functional modules and control states has been designed. This tool has produced estimations
of the power consumption of twelve different implementations of the discrete cosine transform
(DCT). These results are being used to judge the relative impact of high-level
transformations, such as pipelining and varying the amount of resource sharing and parallelism,
on power dissipation for the DCT algorithm.
A gate level simulator for power consumption analysis May 1, 1996 1
Acknowledgments
I would like to first thank my advisor, Don Thomas, for his patience, guidance and example
over the past two years.
I would also like to thank my research partners and officemates, Pinar Ceyhan and Sari
Coumeri, for both their help and their willingness to always lend an ear.
Finally, I thank my professors at Bucknell University who first interested me in the field of
computer engineering and then aided me in my decision to continue towards my Master’s and
(someday) my Ph.D. Those helpful professors include Daniel Hyde, Jerud Mead, Xiannong Meng
(who is now at The University of Texas-Pan American), James Lu and Maurice Aburdene.
1.0 Introduction
Power consumption of digital circuits has become a critical design parameter. For example, porta-
ble applications require low power circuits to extend battery life, and all circuits have to deal with
the problem of electromigration. Thus, it is important that the system designer is able to estimate
power consumption and correlate the results with high level specifications.
We have designed a gate level tool that estimates power consumption and correlates the results to
the original register-transfer level (RTL) specifications. A unique aspect of this tool is that power
consumption is both estimated for individual modules and reported by control state. It can also be
back-annotated with actual capacitance values from layout to produce more accurate estimations.
This tool is also being used to help pinpoint areas where power-saving optimizations are most
needed and to verify the accuracy of existing statistical power estimation techniques. We have estimated
the power consumption of 12 different implementations of the discrete cosine transform,
and we are currently laying out the designs in order to obtain capacitance values for back-annotation.
In the future, we hope to use this tool to aid in the design of systems using QuadRail technology,
a low-power CMOS-based technology currently being designed at CMU [Kri96].
1.1 Our approach
Our goal is to provide a power estimation tool that will be maximally useful to the system designer
in considering various high level transformations, such as pipelining and varying the amount of
resource sharing and parallelism, and their effect on power consumption. Therefore, this tool must
be easy to integrate into existing high level design tool flows, and its results must aid the designer
in clearly identifying power consumption trade-offs. Our tool is easy to integrate into existing tool
flows as it accepts gate level Verilog code as input. Optionally, the tool also takes a list of capacitance
estimates for each of the nets in the design. These estimates can be extracted from the layout
of the design, or high level estimation techniques, such as those in [Don79][Feu82][Lan94], can be
used.
The tool aids the designer in clearly identifying power trade-offs by correlating the results to the
original RTL specifications. A unique aspect of this tool is that not only does it correlate power
consumption to functional blocks, but it also reports power dissipation by control state. Correlat-
ing power consumption with control states allows the designer to further and better analyze system
designs, suggesting not only modules but also times and events that contribute to the power con-
sumption of the system. It gives the designer the ability to evaluate spatial and temporal trade-offs.
Temporal power trade-offs must be addressed for two reasons. First, by showing the designer the
control states that use the most power, the designer can concentrate high level low power optimi-
zations on the control states that need it most. Second, estimating power by both control state and
functional block may alert the designer to times and areas where peak power consumption is high.
The importance of minimizing peak power dissipation has been discussed in [San95]. One large
problem addressed by minimizing peak power consumption is electromigration.
In Section 2.0 we will discuss our power consumption model, and in Section 3.0 we will discuss
the implementation of our tool. Section 4.0 describes the results of applying this tool to 12 differ-
ent implementations of a DCT. Finally, Section 5.0 offers conclusions and outlines directions for
future work.
2.0 Power estimation
The purpose of this tool is to accurately estimate power consumption at a high level of abstraction.
More specifically, this tool estimates dynamic power dissipation based on simulation results.
Dynamic power dissipation of a CMOS circuit can be calculated with the following equation
[Wes85]:
$$P = \frac{1}{2} C V_{dd}^{2} f_s$$
where C is the load capacitance, Vdd is the supply voltage and fs is the switching frequency of the
circuit. Our tool calculates fs, and takes as inputs the values of Vdd and C, which can either be
extracted from layout or estimated by other tools [Don79][Feu82][Lan94]. Static and short circuit power estimation is not
taken into consideration here, although it is assumed that if the target cell library is known, these
could be calculated and added to the results produced by this tool.
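As a worked illustration of the equation above, the following Python sketch computes the dynamic energy of a few nets from their transition counts and converts the total to average power. It is only a sketch under stated assumptions: the function names, the example capacitances, the supply voltage and the simulated time are all hypothetical, and the tool itself is written in C and C++, not Python.

```python
def net_energy(capacitance_farads, vdd_volts, transitions):
    """Energy in joules, assuming each transition dissipates (1/2) * C * Vdd^2."""
    return 0.5 * capacitance_farads * vdd_volts ** 2 * transitions


def average_power(nets, vdd_volts, sim_time_seconds):
    """Average dynamic power in watts over the simulated interval.

    nets is a list of (capacitance_farads, transition_count) pairs.
    Dividing the transition count by the simulated time gives the
    switching frequency fs of the equation, so this is (1/2) C Vdd^2 fs
    summed over all nets.
    """
    total_energy = sum(net_energy(c, vdd_volts, n) for c, n in nets)
    return total_energy / sim_time_seconds


# Hypothetical example: a 50 fF net with 1000 transitions and a
# 200 fF net with 250 transitions, at 3.3 V over 10 microseconds.
example_nets = [(50e-15, 1000), (200e-15, 250)]
example_power = average_power(example_nets, vdd_volts=3.3, sim_time_seconds=1e-5)
```

Note that the two example nets contribute the same energy: the second switches four times less often but has four times the capacitance, which is why transition counts alone can mislead when capacitances differ.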
2.1 Related work
Most of the previous work done in power consumption estimation differs from our approach in the
level of abstraction at which the estimations are made or the method of obtaining the estimations.
Also, none of this work correlates the power estimates to both functional modules and control
states.
At the circuit level, both SPICE [Nag75] and PowerMill [Epi96] can be used to measure power
consumption by digital systems. Although PowerMill can run over 1000 times faster than SPICE,
it is still impractical to simulate at the circuit level for large designs or if many input vectors are to
be simulated. At the next level of abstraction, the switch level, simulators such as IRSIM [Sal89]
are able to simulate circuits over 500 times faster than SPICE, with a root mean square error of
less than 15% [Lan94]. Still, faster simulations could be done at the gate level.
Several faster gate level tools have been developed, but none are generally applicable to a wide
variety of applications. Several require that the input vectors can be characterized probabilistically
a priori [Naj91][Gho92][Cho94][Mar94]. Other work is designed for and applicable only
to signal-processing algorithms [Pow90]. Devadas et al. have designed a generally applicable
gate level algorithm, but it only predicts worst-case power dissipation [Dev90].
Landman and Rabaey have developed architecture level techniques, [Lan93] [Lan94] [Lan95], but
these also require a priori characterization of the input vectors. Although these tools provide accu-
rate results for the types of applications they are geared toward, a more generally applicable tool is
needed.
By working at the gate level of abstraction our tool is able to simulate larger designs than the
switch level simulators, and by making estimations based on simulation, our tool can be used to
estimate power for designs regardless of whether their inputs can be readily characterized.
We chose to use simulation-based power estimation over probabilistic power estimation tech-
niques for several reasons. First, not all systems have inputs that are easily or accurately character-
ized by probabilistic methods. Second, this tool fits into existing tool flows easily. Even if the
system could be accurately characterized for probabilistic estimation, designing the statistical
models may involve a significant amount of additional work for the designer. Simulation-based
estimation requires very little extra work for the designer. Finally, we hope to use this tool to verify
the results of probabilistic estimation methods for various types of algorithms.
3.0 Implementation
Our goal is to provide an easily integrated tool that accurately estimates power consumption at a
high level of abstraction. Our power estimates are calculated by simulating the hierarchical gate
level Verilog description and then post-processing the value change dump (VCD) file produced by
the simulation. Thus, the designer does not need to alter the existing tool flow. This tool can be
added "on the side" for additional help in evaluating power trade-offs.
3.1 Tool flow
One of the goals in creating this tool is that it must be easily integratable into existing tool flows.
Since its input is hierarchical gate level Verilog, this tool can easily be inserted in existing tool
flows. Figure 1 illustrates where the power estimator can be used in the tool flow currently used in
the Center for Electronic Design Automation at Carnegie Mellon University. As shown in the dia-
gram, this tool is used to estimate power after logic and datapath synthesis has been performed.
Power estimation can be performed again after the circuit has been laid-out to produce a more
accurate estimation using capacitance values extracted from the layout information. Note also that
the addition of the power estimation tool does not alter the original tool flow at all. It merely adds
another tool that can be used when high level power estimates are desired.
3.2 Estimating power
The power estimation tool is actually a series of programs, as shown in Figure 2. The gate level
Verilog code produced by logic and datapath synthesis is simulated with a standard Verilog simulator
and a VCD file is created. The VCD file is then passed to headstripper and statestripper,
which extract the information from the VCD file. Then listdrivers is invoked
and, through use of the Verilog programming language interface (PLI), a list of the drivers of the
nets in the design is produced. If the design has been laid out, parsespf is used to extract capacitances
from the standard parasitics file (SPF). The results of all the programs are finally passed to
power_parser, which produces power estimates by module and control state.
[Flow diagram: Behavioral Level Verilog -> Behavioral Synthesis (SAW) -> Register Transfer Level
Verilog -> Logic and Datapath Synthesis (Synopsys Design Compiler, CASCADE's Epoch) -> Gate Level
Verilog -> Place and Route -> Standard Parasitics File (SPF)]
FIGURE 1. Tool flow at CEDA
Note that the implementation presented in Figure 2 assumes that all of the functionality of the
power estimation tool is being used. If, for example, layout has not yet been performed and no
standard parasitics file (SPF) is available, the parsespf program would never be invoked and no
stripped SPF file would be passed to power_parser. If power estimates by control state were
not desired, then statestripper would not be invoked and no state information file would be
passed to power_parser. Similarly, if the power estimates are not to be correlated with functional
modules, the listdrivers program would not be used.
Below we will discuss the functionality and implementation of each of the programs involved in
the power estimation tool. A user’s manual for the tool is located in the Appendix.
3.2.1 Simulation
The first step of the power estimation tool is simply a straightforward Verilog simulation that creates
a value change dump (VCD) file. In general, a Verilog simulation would be done at this phase
of the design process even if no power estimates were to be made, in order to verify the gate level
design. Thus, no overhead is added to the design process by running the simulation. The only
modification of the Verilog description that needs to be made is the addition of the value change
dump Verilog commands, $dumpfile and $dumpvars, if these are not already included in the
code.
By creating a VCD file, the designer can perform many different analyses on the same gate level
design while only running the actual Verilog simulation once. Because of their size, we usually
compress the VCD files and then pass them to the other programs through a pipe from zcat, a
UNIX program that outputs the contents of a compressed file. As a result, the VCD file cannot be
rewound during reading. This is largely why the estimation environment has been broken into sev-
eral smaller programs.
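The single-pass constraint described above can be illustrated with a short Python sketch. It is not the tool's code: open_vcd, single_pass and the two counting callbacks are hypothetical names, and Python's gzip module stands in for the zcat pipe. The point is only that a forward-only stream is read once, so every analysis that needs the data must see each line during that one pass.

```python
import gzip
import io


def open_vcd(path):
    """Open a plain or gzip-compressed VCD file as a forward-only text stream."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def single_pass(stream, analyses):
    """Feed every line to every analysis callback, reading the stream exactly once."""
    for line in stream:
        for analyse in analyses:
            analyse(line)


# Two independent (hypothetical) analyses sharing one pass over the data.
counts = {"lines": 0, "timestamps": 0}


def count_lines(line):
    counts["lines"] += 1


def count_timestamps(line):
    # In a VCD file, simulation times appear on lines beginning with '#'.
    if line.startswith("#"):
        counts["timestamps"] += 1


single_pass(io.StringIO("#0\n0!\n#5\n1!\n"), [count_lines, count_timestamps])
```

An alternative to sharing one pass is to have small filter programs write out the few lines a later stage needs, which is the approach the following sections describe.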
3.2.2 headstripper
The purpose of headstripper is to create a copy of the portion of the VCD file that defines the
tokens of all nets, registers and variables. This information is needed for the listdrivers and
power_parser programs. This information was originally copied to a separate file so that the
VCD file would never have to be scanned more than once, since it is usually passed through a pipe
from zcat, as mentioned above. Although the header would never have to be parsed more than
once in the current incarnation of the tools, headstripper is still used.
[Block diagram: the gate level Verilog is run through Verilog simulation to produce a value change
dump (VCD) file. headstripper produces the VCD header, statestripper the VCD state information and
listdrivers the driver list; the standard parasitics file (SPF) is reduced to a stripped SPF. The
final output is power estimates by module and control state, using capacitance values extracted from layout.]
FIGURE 2. Implementation of the power estimation tool
headstripper runs fairly quickly and uses less than 100 KB of memory. headstripper
was implemented with approximately 20 lines of C code.
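The behaviour described for headstripper can be sketched in a few lines of Python. This is an illustration, not the original C program: the function name strip_header is hypothetical, and the sketch assumes the VCD declaration section is terminated by a line containing the standard $enddefinitions keyword.

```python
import io


def strip_header(vcd_lines):
    """Yield the VCD declaration section, stopping after $enddefinitions.

    Assumes the $enddefinitions keyword appears on a line of its own
    (together with its closing $end), as simulators commonly write it.
    """
    for line in vcd_lines:
        yield line
        if "$enddefinitions" in line:
            break


# Hypothetical miniature VCD file: three header lines, then value changes.
example_vcd = io.StringIO(
    "$timescale 1ns $end\n"
    "$var wire 1 ! clk $end\n"
    "$enddefinitions $end\n"
    "#0\n"
    "0!\n"
)
example_header = list(strip_header(example_vcd))
```

Because the generator stops reading at the end of the header, the remainder of the stream is never consumed, which is why such a filter runs quickly even on a very large VCD file.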
3.2.3 statestripper
statestripper is much like headstripper except that it parses the entire VCD file and
copies only those lines that have to do with the control state variable. This is necessary so that the
VCD file needs to be parsed only once during the execution of power_parser.
statestripper requires that the designer know the name of the net whose value is the control
state. One limitation of this tool is that the control state must have one net name (such as
CSTATE[3:0]) and cannot be a concatenation of several nets (such as
{a[1], foo, bar, cout[3]}). Currently, the designer must manually alter the gate level Verilog
code, if necessary. Since we are using the Synopsys Design Compiler for our designs, we have
always found it easy to make this alteration since the names of the control state registers are nearly
identical to the control state register names at the register transfer level.
statestripper executes fairly quickly, although it does take longer than headstripper as
statestripper must parse the entire VCD file. It uses 100 KB of memory during execution,
and is a simple program with approximately 70 lines of C code.
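A minimal Python sketch of the filtering that statestripper is described as performing follows. It is an assumption-laden illustration, not the original C code: find_token and strip_state are hypothetical names, each VCD declaration or value change is assumed to sit on its own line, and the choice to emit a timestamp only when the control state changes at that time is ours.

```python
def find_token(header_lines, hier_name):
    """Return the VCD identifier code of the variable with this hierarchical name."""
    scope = []
    for line in header_lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "$scope":        # $scope <type> <name> $end
            scope.append(words[2])
        elif words[0] == "$upscope":
            scope.pop()
        elif words[0] == "$var":        # $var <type> <width> <token> <name> ... $end
            if ".".join(scope + [words[4]]) == hier_name:
                return words[3]
    return None


def strip_state(vcd_lines, hier_name):
    """Return only the timestamps and value changes of one control state net."""
    lines = iter(vcd_lines)
    header = []
    for line in lines:                  # consume the declaration section
        header.append(line)
        if "$enddefinitions" in line:
            break
    token = find_token(header, hier_name)
    if token is None:
        raise ValueError("net not found: " + hier_name)
    out, pending_time = [], None
    for line in lines:                  # single pass over the value changes
        text = line.strip()
        if not text:
            continue
        if text.startswith("#"):
            pending_time = text
            continue
        if text[0] in "bBrR":           # vector or real: "<value> <token>"
            parts = text.split()
            matched = len(parts) >= 2 and parts[1] == token
        elif text[0] in "01xXzZ":       # scalar: "<value><token>"
            matched = text[1:] == token
        else:
            matched = False
        if matched:
            if pending_time is not None:
                out.append(pending_time)
                pending_time = None
            out.append(text)
    return out
```

For a net declared as CSTATE inside module top, the call would be strip_state(lines, "top.CSTATE"), mirroring the hierarchical name argument described in the user's manual.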
3.2.4 parsespf
The purpose of parsespf is to extract the capacitances for the nets in the design. The output is a
list of net names with their associated capacitances.
This is simply a parser written in C++ that steps through the SPF file. It executes quickly and uses
less than 200 KB of memory. parsespf is implemented with approximately 80 lines of C code.
3.2.5 listdrivers
In hierarchical Verilog descriptions several named nets in the hierarchy often refer to the same
physical net. Hereafter, I shall term such Verilog nets "analogous" nets. To accurately correlate
power estimations with modules, the Verilog name of the driver of the physical net must be determined.
Then all power consumed on the net is attributed to the module containing the driver.
listdrivers outputs a list of the Verilog nets connected to the drivers.
The driving net is determined both by looking at the header of the VCD file and through use of the
Verilog programming language interface (PLI). The VCD header file is used to rapidly determine
analogous nets, since analogous nets are assigned to the same token in the VCD file. Once the
analogous nets are found, the PLI is used to determine which of the nets is connected to the driver.
Note that although the PLI is used, the simulation is not run a second time; listdrivers determines
the drivers at the end of compilation and then exits.
listdrivers still executes fairly quickly, but the memory requirement is much larger, as much
as 60 MB for a 21,000 net design. However, approximately 23 MB of this is the overhead involved
in running the Verilog simulator. The amount of memory used by the PLI code is O(n), where n
is the number of distinct nets.
listdrivers was implemented with approximately 760 lines of C code linked to the Verilog
simulator through the PLI.
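The first half of this procedure, finding analogous nets from the VCD header, can be sketched in Python as follows. The function name analogous_nets is hypothetical and one declaration per line is assumed. The second half, asking the simulator through the PLI which of the analogous nets is attached to the driver, is not shown because it depends on the simulator.

```python
from collections import defaultdict


def analogous_nets(header_lines):
    """Group hierarchical net names that share a VCD token.

    Nets that map to the same token refer to the same physical net,
    so each returned group is one set of "analogous" nets.
    """
    groups = defaultdict(list)
    scope = []
    for line in header_lines:
        words = line.split()
        if not words:
            continue
        if words[0] == "$scope":        # $scope <type> <name> $end
            scope.append(words[2])
        elif words[0] == "$upscope":
            scope.pop()
        elif words[0] == "$var":        # $var <type> <width> <token> <name> ... $end
            groups[words[3]].append(".".join(scope + [words[4]]))
    # Only tokens shared by more than one name need a driver lookup.
    return {tok: names for tok, names in groups.items() if len(names) > 1}


# Hypothetical header: top.clk and top.dct.clock are the same physical net.
example_header = [
    "$scope module top $end",
    "$var wire 1 ! clk $end",
    "$scope module dct $end",
    "$var wire 1 ! clock $end",
    '$var wire 1 " done $end',
    "$upscope $end",
    "$upscope $end",
    "$enddefinitions $end",
]
example_groups = analogous_nets(example_header)
```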
3.2.6 power_parser
Finally, the header file, control state value change file and driver list are parsed along with the
VCD file and power estimations are produced. A general discussion of the algorithm used and its
complexity follows.
First, the header file is read and a sparse table data structure is created for the nets so that the
search time for any net is O(1) when the net’s token (as specified in the VCD file) is known.
Creating and initializing this structure is O(n), where n is the number of distinct nets.
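One way such a constant-time lookup could be arranged is sketched below in Python. This is our assumption, not a description of the original data structure: VCD identifier codes are strings of printable ASCII characters ('!' through '~', 94 symbols), so a token can be read as a number and used directly as an index into a table. The names token_index and NetTable are hypothetical.

```python
def token_index(token):
    """Map a VCD identifier code to a unique non-negative integer.

    Treats the token as a bijective base-94 numeral over the printable
    ASCII characters '!' (33) to '~' (126), so "!" -> 0, "~" -> 93, "!!" -> 94.
    """
    value = 0
    for ch in token:
        value = value * 94 + (ord(ch) - 33 + 1)
    return value - 1


class NetTable:
    """Table of per-net records indexed directly by token, with O(1) lookup."""

    def __init__(self):
        self.slots = []

    def add(self, token, name):
        i = token_index(token)
        if i >= len(self.slots):
            # Grow the table; unused slots stay empty (hence "sparse").
            self.slots.extend([None] * (i + 1 - len(self.slots)))
        if self.slots[i] is None:
            self.slots[i] = {"name": name, "transitions": 0}
        return self.slots[i]

    def get(self, token):
        i = token_index(token)
        return self.slots[i] if i < len(self.slots) else None


table = NetTable()
table.add("!", "top.clk")
table.add('"', "top.dct.done")
table.get("!")["transitions"] += 1
```

Since simulators typically hand out short tokens in sequence, the table stays close to n entries for n distinct nets, which is consistent with the O(n) creation cost stated above.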
The driver list is then parsed and the driving nets are tagged. The driver names are first placed in a
binary tree. Building such a tree has time complexity O(nlogn). Each net must search the tree for
a driver, so the time complexity of all of the searches is n · O(logn) = O(nlogn). Therefore,
the total time complexity for parsing and tagging the drivers is O(nlogn).
The stripped SPF file is then read in and stored in a binary tree and the nets are assigned their cor-
responding capacitances with an algorithm similar to that used above. By a similar argument, the
total time complexity for assigning the capacitance values is O(nlogn).
The VCD file is then parsed and transitions are counted for each net and also categorized by time.
Adding a transition for a net is O(1), as discussed above. Adding transitions for a certain Verilog
time step is also O(1), so the total time for the parsing of the VCD file and counting transitions is
O(v), where v is the number of value changes.
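The counting step can be sketched in Python as below. The sketch makes several assumptions that are ours rather than the original implementation's: one VCD item per line, initial values inside a $dumpvars block are not counted as transitions, and a change on a multi-bit vector counts as a single transition. The name count_transitions is hypothetical, and dictionaries stand in for the constant-time table lookups.

```python
from collections import Counter


def count_transitions(body_lines):
    """Count value changes per net token and per time step in one pass."""
    per_net = Counter()
    per_time = Counter()
    now = None
    in_dump = False
    for line in body_lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith("$"):
            # Track the initial-value block so it is not counted.
            if text.startswith("$dumpvars"):
                in_dump = True
            elif text.startswith("$end"):
                in_dump = False
            continue
        if text.startswith("#"):
            now = int(text[1:])
            continue
        if in_dump:
            continue
        if text[0] in "bBrR":           # vector or real: "<value> <token>"
            token = text.split()[1]
        else:                           # scalar: "<value><token>"
            token = text[1:]
        per_net[token] += 1             # O(1) per value change
        per_time[now] += 1              # O(1) per value change
    return per_net, per_time


example_body = [
    "#0", "$dumpvars", "0!", 'b0000 "', "$end",
    "#5", "1!",
    "#10", "0!", 'b0001 "',
]
example_per_net, example_per_time = count_transitions(example_body)
```

Each value change costs a constant amount of work, so the whole pass is linear in the number of value changes, matching the O(v) bound above.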
Next, power is calculated for each net. This involves one floating-point multiply for each net, so
the time complexity of this is simply O(n). Next, the state value changes are parsed and transi-
tions are characterized by time. This involves stepping through an array with one entry for each
simulated time step, so the time complexity for this step is O(t), where t is the number of time
steps in the simulation.
Next, statistics are gathered by module. This involves stepping through the table of nets and doing
a strcmp() function call for each net’s driver. Thus, O(n) strcmp() calls will be made.
Finally, statistics are gathered by control state. This involves O(t) comparisons and additions.
Note that O(t) operations must be performed for the entire system as well as for each module for
which statistics are being gathered. Since the number of modules for which statistics are being col-
lected can be assumed to be a small constant, the total number of operations that must be done to
collect state statistics is O(t).
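The attribution of energy to control states can be sketched as a single merge over the time steps, shown here in Python. The function name energy_by_state and the data layout are hypothetical, and the sketch assumes that activity occurring at the same time as a state change is charged to the new state; the original tool may resolve that boundary differently.

```python
def energy_by_state(energy_per_time, state_changes):
    """Sum per-time-step energy (or transition counts) by active control state.

    energy_per_time: dict mapping simulation time -> energy at that time step.
    state_changes:   list of (time, state) pairs sorted by time.
    """
    totals = {}
    changes = iter(state_changes)
    upcoming = next(changes, None)
    current = None
    for t in sorted(energy_per_time):
        # Advance to the state that is active at time t.
        while upcoming is not None and upcoming[0] <= t:
            current = upcoming[1]
            upcoming = next(changes, None)
        totals[current] = totals.get(current, 0.0) + energy_per_time[t]
    return totals


# Hypothetical data: state 0 from time 0, state 1 from time 10.
example_totals = energy_by_state(
    {0: 1.0, 5: 2.0, 10: 4.0, 15: 8.0},
    [(0, 0), (10, 1)],
)
```

Running the same merge once for the whole system and once per module of interest gives the per-module, per-state breakdown, at a cost linear in the number of time steps for each run.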
Thus, the total worst-case time complexity of the VCD parser would be O(nlogn) for large
designs with a shorter simulation time or O(v) for longer simulations. The memory usage for
shorter and medium size simulations is dominated by O(n) because each net is represented by a
class instantiation. The memory usage for a very long simulation would be O(t), as one double
and one int are malloc’ed for each Verilog time step.
power_parser is the portion of the power estimation tool that consumes the most CPU time, as
will be shown in Section 4.0. It also is the largest piece of code, implemented in over 2200 lines of
C++.
3.3 Estimating capacitance
Accurate capacitance estimations are essential for accurate power estimations. In the set of tools
described above, capacitance values are extracted from the design once a layout has been completed.
Although this does give very accurate estimates of capacitance, it is not the only way such
estimates could be produced. Since power_parser simply reads in a file of net names and their
associated capacitances, the tool is highly flexible, allowing designers either to use high level estimation
techniques such as those presented in [Don79][Feu82][Lan94] or to back-annotate capacitances
by extracting them from the layout once it has been done. If high level estimation
techniques were used, the only change to the tools above would be the modification or omission of
parsespf.
1. (Footnote to Section 3.2.6.) v will always be greater than or equal to t, since a time step is executed in a simulation if and only if one or more value changes occur during that time step. Therefore, O(v) dominates O(t).
4.0 Case study: DCT
We have run twelve different versions of the one-dimensional discrete cosine transform. These
designs were created by Coumeri and the behavioral level differences in the designs can be seen in
Table 1 [Cou96]. Note that "# of partitions" refers to the number of pipeline stages (each with its
own control logic) in the design. DCT1 through DCT8 are not pipelined; they have only one parti-
tion. "# of mult" is the number of multipliers in the design. "memory prefetch" is "yes" if values
for iteration i+1 of the loop are being fetched while iteration i is being executed. This column does
not apply to the pipelined designs (DCT9 through DCT12) since they are already executing multiple
iterations of the loop at the same time. "grey code state encoding" and "grey code memory
access" are "yes" if the states and memory addresses, respectively, are accessed in grey code
order. Finally, "# of nets" is the number of physical nets in each design.
         # of        # of  memory    grey code       grey code      # of
example  partitions  mult  prefetch  state encoding  memory access  nets
DCT1     1           3     no        no              no             15810
DCT2     1           3     no        yes             no             15879
DCT3     1           2     no        no              no             12196
DCT4     1           2     no        yes             no             12226
DCT5     1           3     yes       no              no             16848
DCT6     1           3     yes       yes             yes            16833
DCT7     1           2     yes       no              no             13306
DCT8     1           2     yes       yes             yes            13280
DCT9     6           3     n/a       no              no             21327
DCT10    6           3     n/a       yes             yes            21346
DCT11    2           3     n/a       no              no             16725
DCT12    2           3     n/a       yes             no             16745
TABLE 1. DCT Descriptions
Table 2 shows the CPU time and memory usage for headstripper, statestripper,
listdrivers and power_parser being executed on three of the DCT designs. These three
designs were chosen because DCT3 is the smallest example (i.e. fewest number of nets), DCT9 is
one of the largest examples, and DCT1 is somewhere in between. The CPU time is reported in
minutes and seconds. Note that these results are dependent on which statistics are being collected.
       headstripper       statestripper      listdrivers         power_parser
       CPU time  Memory   CPU time  Memory   CPU time  Memory    CPU time  Memory
DCT1   0:21      92 KB    6:56      100 KB   1:32      52196 KB  24:44     40940 KB
DCT3   0:17      92 KB    7:42      100 KB   1:19      50368 KB  20:05     39408 KB
DCT9   0:26      92 KB    5:33      100 KB   2:20      60000 KB  32:59     45212 KB
TABLE 2. Execution times and memory usage for the power estimation for three of the DCT designs
As stated in Section 3.2.6, collecting statistics by module and state both involve some overhead in
computation time. For each of the examples, the same statistics were being collected: total energy,
energy consumed by each multiplier, energy consumed by all of the adders and subtracters, energy
consumed by registers, energy consumed by random glue logic (everything except the above mod-
ules) and total energy by control state. The statistics were gathered on an IBM RS/6000 worksta-
tion with 384 MB of memory.
For our DCT examples, power_parser never took more than 35 minutes of CPU time on the
RS/6000, and most of the time was spent during the actual parsing of the VCD file. Also, the
power estimation environment never required more than 60 MB of memory.
The disk space overhead for the output files is presented in Table 3. For the 12 versions of the DCT
we tested, ranging from 12,000 to 21,000 nets, headstripper and statestripper
involved an overhead of 1.0 to 1.7 megabytes of hard disk storage for the header information and
56 to 140 kilobytes for the state information. Note that the input for the DCT’s was 25 8x8 blocks
of image data, and that the disk space overhead for state information scales linearly with the number
of blocks in the simulation. The header information is unaffected by the length of the simulation.
The hard disk space overhead involved in storing the list of drivers is 297 to 490 kilobytes for
the DCT examples. This is independent of the length of the simulations.
        headstripper  statestripper  listdrivers  power_parser
DCT1    1216 KB       116 KB         395 KB       < 1 KB
DCT2    1223 KB       116 KB         396 KB       < 1 KB
DCT3    1000 KB       137 KB         290 KB       < 1 KB
DCT4    1004 KB       137 KB         291 KB       < 1 KB
DCT5    1335 KB       81 KB          412 KB       < 1 KB
DCT6    1335 KB       81 KB          412 KB       < 1 KB
DCT7    1129 KB       101 KB         309 KB       < 1 KB
DCT8    1126 KB       101 KB         308 KB       < 1 KB
DCT9    1723 KB       55 KB          479 KB       < 1 KB
DCT10   1726 KB       60 KB          479 KB       < 1 KB
DCT11   1318 KB       73 KB          411 KB       < 1 KB
DCT12   1320 KB       73 KB          411 KB       < 1 KB
TABLE 3. Disk space usage for power estimation of the 12 DCT designs
4.1 Results
Figure 3 gives a comparison of the results for the 12 designs. Since not all 12 designs were laid out,
the results shown assume that all nets have the same capacitance, i.e. only transition counts
are reported. One interesting thing we immediately noticed from these results is that using grey
code for the state encoding produced a 10% or better reduction in transition count for all of the
designs. Also, notice that pipelining the design reduces power consumption. However, for these
designs it is arguable whether pipelining into six stages, as in DCT9 and DCT10, instead of
two stages, as in DCT11 and DCT12, is worthwhile, as the power savings is about 10% while the
complexity (as measured by number of nets) has increased over 30%. Still, if power savings is more
important than minimizing complexity of the circuit, the six stage pipeline designs would be
desired.
Figure 4 shows the transition count by control state for the DCT1 and DCT2 designs. Again, the
value of using grey code for state encoding can be seen here. By comparing the states in DCT1 and
DCT2 that perform the same functions, we see that the approximate 10% reduction is apparent in
all similar states. (Remember that because the states of DCT2 are grey coded, DCT1 state 2 is the
same as DCT2 state 3, DCT1 state 3 is the same as DCT2 state 2, DCT1 state 4 is the same as DCT2
state 6, etc.)
[Chart "Transitions by Module for Twelve DCT Designs": transition counts (axis up to 8,000,000) by module, for designs DCT1 through DCT12.]
FIGURE 3. Comparison of 12 DCT designs
[Chart "Transitions by Control State for Two DCT Designs": transition counts (axis up to 1,400,000) for control states 0 through 7.]
FIGURE 4. Transition count by control state for two DCT designs
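The state correspondence quoted above can be checked with a short Python sketch. It assumes the standard reflected binary Gray code, which agrees with the three correspondences given in the text (2 with 3, 3 with 2, 4 with 6); the function name to_gray is hypothetical and the encoding actually produced by synthesis is not confirmed here.

```python
def to_gray(i):
    """Return the reflected binary Gray code of the non-negative integer i."""
    return i ^ (i >> 1)


# DCT1 state number (binary counting order) -> matching DCT2 state number.
state_map = {i: to_gray(i) for i in range(8)}

# Consecutive Gray-coded states differ in exactly one bit, which is
# why the state register toggles less often than with binary counting.
bit_flips = [bin(to_gray(i) ^ to_gray(i + 1)).count("1") for i in range(7)]
```

Under this assumption the eight states map as 0, 1, 3, 2, 6, 7, 5, 4, so for example the function performed in DCT1 state 5 would be found in DCT2 state 7.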
Finally, the DCT1 design was laid out and capacitance values were extracted to give us the results
in Figure 5. Notice that although the transition counts appear to be a good predictor of energy for
the multipliers, transition counting alone underestimates the amount of power consumed by random
logic and overestimates the power consumed by the adders and subtracters. This makes sense
as one can imagine that the random logic would often be driving fairly long nets with a large
capacitance, while the nets inside the adders and subtracters would be very short (i.e. low capacitance)
most of the time. The nets inside the array multipliers would not be as short as those inside
the adders nor as long as those in the random logic. Correspondingly, the transition counts for the
multipliers are a better predictor of energy consumed than for either the random logic or the adders
and subtracters.
The transition counts in Figure 3 and Figure 5 differ because different target cell libraries were
used. Also, because the library used for Figure 5 had more accurate timing models, more glitching
occurred within the circuit. As a result, only 16 8x8 blocks were able to be simulated because of
the larger size of the VCD file. (The 16 block example produced a VCD file four times larger than
the 25 block examples that assume equal delay for all cells.) Thus, the results in Figure 3 and
Figure 5 should not be compared against each other.
Our gate level power estimation tool has produced estimations for 12 different DCT implementa-
tions. These estimations have allowed us to determine the relative impact of several high level
transformations, such as grey coding state assignments, resource sharing and pipelining, on
dynamic power dissipation for these designs.
[Chart "Transitions and Energy by Module for DCT1": transitions (left axis, 0 to 10,000,000) and energy (right axis, 0 to 2.50E-05) by module.]
FIGURE 5. Power estimates including capacitance information for one DCT design (DCT1)
5.0 Conclusions and Future Work
We have designed a gate level tool that estimates power consumption and correlates the results
with the original register-transfer level specifications. It reports power estimates as a function of
control state as well as functional module, and it can accept capacitance estimates from layout or
other tools.
Currently, we are using this tool to judge the impact of high level transformations, such as pipelin-
ing and varying the amount of resource sharing and parallelism, on dynamic power dissipation for
the 12 designs of the DCT algorithm. In the near future we will also estimate power consumption
for the Aurora RISC microprocessor core, reporting results by functional units and by instruction.
As our low power design efforts continue, this tool will be used to help pinpoint areas where
power-saving optimizations are most needed and to verify the accuracy of existing statistical
power estimation techniques.
Eventually, we hope to categorize high level transformations according to how likely they are
to decrease the power consumption, or the power-delay product, of both standard CMOS and
QuadRail systems.
Appendix A - User’s manual
In the following sections, I shall outline how to use each subprogram of the power estimation tool.
Please refer to Figure 2 on page 9 to see how these tools fit together.
A.1 headstripper
headstripper reads the VCD file from stdin and outputs the header file to stdout. To
invoke it from the command line, you would enter:
% headstripper < vcdfile > vcd.head
or, if the VCD file is compressed:
% zcat vcdfile.gz | headstripper > vcd.head
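The following Python fragment is a minimal sketch of the kind of filtering headstripper performs. It is not the tool's actual source: it assumes that "the header" means every line of the VCD file up to and including the standard $enddefinitions $end marker, and the function name strip_header and the sample data are hypothetical.

```python
def strip_header(lines):
    """Yield VCD lines up to and including the $enddefinitions section.

    Assumption: the header is everything before the first value change,
    i.e. all lines through the '$enddefinitions ... $end' marker.
    """
    in_enddefs = False
    for line in lines:
        yield line
        if "$enddefinitions" in line:
            in_enddefs = True
        # The closing $end may share the line with $enddefinitions or follow it.
        if in_enddefs and "$end" in line.replace("$enddefinitions", ""):
            break


# Hypothetical sample input, small enough to check by hand.
SAMPLE_VCD = """$timescale 1ns $end
$scope module top $end
$var wire 1 ! clk $end
$upscope $end
$enddefinitions $end
#0
0!
#5
1!
"""

header = list(strip_header(SAMPLE_VCD.splitlines(keepends=True)))
print("".join(header), end="")
```

Because the generator stops reading as soon as the marker is seen, a filter written this way would not need to scan the (much larger) value change portion of the file.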
A.2 statestripper
statestripper also reads the VCD file from stdin and outputs to stdout. It also takes as
an argument the hierarchical Verilog name of the net with the value of the control state. As men-
tioned in Section 3.2.3 on page 10, the net must be a single multi-bit net and not a concatenation of
nets. To invoke the program from the command line, you would enter:
% statestripper top.foo.bar.CSTATE < vcdfile > vcd.state
or if the VCD file is compressed:
% zcat vcdfile.gz | statestripper top.foo.bar.CSTATE > vcd.state
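As an illustration of what extracting the control state from a VCD file involves, the sketch below follows the scope declarations to resolve a hierarchical net name to its VCD identifier code and then collects that net's vector value changes. It is a simplified sketch, not the statestripper source: the function name state_changes, the (time, value) output form and the sample data are assumptions, and only vector changes are examined, consistent with the requirement that the control state be a single multi-bit net.

```python
def state_changes(lines, net_name):
    """Yield (time, binary_value) for each change of the multi-bit net `net_name`.

    Assumptions: one VCD statement per line, and the net is declared as
    '$var <type> <width> <code> <name> ... $end' inside nested $scope blocks.
    """
    scope, code, now = [], None, 0
    for line in lines:
        tok = line.split()
        if not tok:
            continue
        if tok[0] == "$scope":
            scope.append(tok[2])            # $scope module <name> $end
        elif tok[0] == "$upscope":
            scope.pop()
        elif tok[0] == "$var" and ".".join(scope + [tok[4]]) == net_name:
            code = tok[3]                   # identifier code used in value changes
        elif tok[0].startswith("#"):
            now = int(tok[0][1:])           # simulation time stamp
        elif tok[0][0] in "bB" and len(tok) == 2 and tok[1] == code:
            yield now, tok[0][1:]           # vector change: b<value> <code>


# Hypothetical sample input.
SAMPLE_VCD = """$scope module top $end
$scope module foo $end
$var reg 2 ! CSTATE [1:0] $end
$var wire 1 % clk $end
$upscope $end
$upscope $end
$enddefinitions $end
#0
b00 !
0%
#10
1%
b01 !
#20
b10 !
"""

changes = list(state_changes(SAMPLE_VCD.splitlines(), "top.foo.CSTATE"))
for time, value in changes:
    print(time, value)
```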
A.3 parsespf
parsespf also reads from stdin and outputs to stdout. It takes as an argument the hierarchical
prefix to be added to the net names present in the SPF file. For example, in our DCT examples,
the DCT was laid out, but in the Verilog source that passes input vectors to the DCT, the DCT
module was named top.dct_1. To invoke parsespf in this case we used:
% parsespf top.dct_1. < spffile > strippedspf
Note that the trailing period on top.dct_1. is necessary. If multiple modules were laid out,
then parsespf would be executed for each, and all of the stripped SPF files could then be
concatenated together as follows:
% cat first_strippedspf second_strippedspf > final_strippedspf
A.4 listdrivers
listdrivers is a PLI routine and should be invoked in the same way as the original simulation.
One additional argument must be added: +dump_ followed by the name of the VCD header
file. For one of our DCT simulations, we invoked listdrivers as follows:
% listdrivers +dump_dct.head sim.v -v dctlf.map2.v \
-v ../ms0803vcells_mosis -v ../ms080_3vprims \
-v ../mod.v +turbo+3
The output of listdrivers is a list of the driver nets, written to the file DRV.list.
A.5 power_parser
power_parser also accepts input from stdin and outputs to stdout. There are a number of arguments
that can be passed to power_parser, outlined below.
• -fdrv driverfile : name of the driver file created by listdrivers
• -fhead vcd.head : name of the VCD header file created by headstripper
• -fstate vcd.state : name of the VCD state file created by statestripper
• -fsspf strippedspf : name of the stripped SPF file created by parsespf
• -state top.foo.bar.CSTATE : name of the control state net. The net name should be
the same as the one used in statestripper.
• -net top.foo.bar.\* : names of nets to determine energy consumption for. In this case,
statistics would be gathered for the module top.foo.bar. Note that both leading and trailing
*'s are legal, but internal *'s are not. Note also that the * must be escaped with the \ so that
the shell does not try to expand it. Any number of -net arguments is legal.
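The wildcard rule for -net (leading and trailing *'s are legal, internal *'s are not) can be illustrated with a short Python sketch. This is not power_parser's implementation: the function name compile_net_pattern is hypothetical, and the sketch assumes that a * simply stands for any run of characters, so that a pattern reduces to a prefix, suffix, substring or exact match on the hierarchical net name.

```python
def compile_net_pattern(pattern):
    """Turn a -net style pattern into a predicate over hierarchical net names.

    Assumption: '*' matches any run of characters and is only allowed as the
    first and/or last character of the pattern.
    """
    core = pattern
    lead = trail = False
    if core.startswith("*"):
        lead, core = True, core[1:]
    if core.endswith("*"):
        trail, core = True, core[:-1]
    if "*" in core:
        raise ValueError("internal '*' is not allowed: " + pattern)

    if lead and trail:
        return lambda name: core in name          # *text*
    if lead:
        return lambda name: name.endswith(core)   # *text
    if trail:
        return lambda name: name.startswith(core) # text*
    return lambda name: name == core              # no wildcard


in_module = compile_net_pattern("top.foo.bar.*")
is_register = compile_net_pattern("*_reg*")
print(in_module("top.foo.bar.n12"), is_register("top.foo.bar.acc_reg[3]"))
```

No shell escaping is needed inside the Python strings; the backslash in the command line examples only protects the * from the shell.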
As an example of the use of power_parser, the following command was used to get the
statistics displayed in Figure 5, "Power estimates including capacitance information for one DCT
design (DCT 1)," on page 20:
% zcat ../tmp/dctll.gz | ../parser -fdrv dctll.drv \
-fhead dctll.head -fstate dctll.state -state top.dct_1.CSTATE \
-net top.dct_1.multi* -net top.dct_1.mult_1\* \
-net top.dct_1.mult_2\* -net top.dct_1.mult_3\* \
-net top.dct_1.U\* -net top.dct_1.r\* -net \*_reg\* \
-net top.\* -fsspf dctll.spf > dctll.results
References
[Cho94] T. Chou, K. Roy and S. Prasad, "Estimation of circuit activity considering signalcorrelations and simultaneous switching," Proceedings of ICCAD 94, pp. 300-303, Nov. 1994.
[Cor90] T. H. Cormen, C. E. Leiserson and R. L. Rivest, Introduction to algorithms, NewYork: McGraw-Hill Book Company, pp. 244-259, 1990.
[Cou96] S. L. Coumeri, private communication, April 1996.
[Epi96] Epic Design Technologies, Inc., http://www.epic.com/powermill.html, 1996.
[Dev90] S. Devadas, K. Keutzer and J. White, "Estimation of power dissipation in CMOScombinational circuits," Proceedings of Custom IC Conference 90, pp. 19.7.1-19.7.6.
[Don79] W. Donath, "Placement and average interconnection lengths of computer logic,"IEEE Transactions on Circuits and Systems, pp. 272-277, April 1979.
[Feu82] M. Feuer, "Connectivity of random logic," IEEE Transactions on Computers, pp.29-33, Jan. 1982.
[Gho92] A. Ghosh, S. Devadas, K. Keutzer, and J. White, "Estimation of average switch-ing activity in combinational and sequential circuits," Proceedings of DAC 92, pp.253-259, 1992.
[Kri961 R. K. Krishnamurthy, I. Lys and L. R. Carley, "Static power driven voltage scalingand delay driven buffer sizing in mixed swing quadrail for sub-IV I/O swings,"submitted to IEEE/A CM International Symposium on Low Power Electronics andDesign 96, August 1996.
[Lan931 R E. Landman and J. M. Rabaey, "Power estimation for high level synthesis,"Proceedings of EuroDAC 93, pp.361-366, Feb. 1993.
[Lan941 P. E. Landman, "Low-power architectural design methodologies," ElectronicsResearch Laboratory, College of Engineering, University of California, Berkeley(UCB/ERL M94/62), 1994.
lLan95] E E. Landman and J. M. Rabaey, "Architectural power analysis: the dual type bitmethod," IEEE Transactions on VLSISystems, pp. 173-187, June 1995.
[Mar94] R. Marculescu, D. Marculescu and M. Pedram, "Switching activity analysis con-sidering spatiotemporal correlations," Proceedings oflCCAD 94, pp. 294-299,Nov. 1994.
[Nag75] L. W. Nagel, "SPICE2: a computer program to simulate semiconductor circuits,"Technical report, University of California, Berkeley (ERL-M520), 1975.
[Naj91] E Najm, "Transition density, a stochastic measure of activity in digital circuits,"Proceedings of DAC 91, pp. 644-649, June 1991.
A gate level simulator for power consumption analysis May 1, 1996 25
[Pow90]
[Sa189]
[San95]
[Wes85]
S. R. Powell, E M. Chau, "Estimating power dissipation of VLSI signal process-ing chips: the PFA technique," VLSI Signal Processing IE, pp. 250-259, 1990.
A. Salz and M. Horowitz, "IRSIM: an incremental MOS switch-level simulator,"Proceedings of DAC 89, pp. 173-178, 1989.
R. San Martin and J. E Knight, "Power-profiler: optimizing ASICs power con-sumption at the behavioral level," Proceedings of DAC 95, pp. 42-47, June 1995.
N. H. E. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Reading,MA: Addison-Wesley Publishing Company, pp. 147-149, 1985.
A gate level simulator for power consumption analysis May 1, 1996 26