chapter 4 high-level power estimation - politecnico...

Chapter 4

High-Level Power Estimation

This chapter presents a conceptual framework for accurate and efficient estimation of the power dissipation of the hardware part of embedded systems. For our analysis, the systems have been described in VHDL at the behavioral and Register Transfer levels. The goal is to provide the designer with the capability of analyzing and comparing different solutions in the architectural design space before the synthesis task. The analytical power model is hierarchical and considers the different components of the target system architecture, mainly the data-path, the memory, the control logic and the embedded core processor. The most relevant aspect of the proposed approach is its generality: it considers a general SOC architecture, suitable for many industrial applications, as well as the single components that typically compose the hardware side of an embedded system. The model parameters taken into account in our analysis concern only the I/O behavior of the different hardware modules and do not refer to their internal structure. Experimental results have been obtained by applying the proposed power model to some industrial case studies and benchmark circuits.

4.1. Introduction

Power estimation methodologies should be provided at several abstraction levels of the HW/SW co-design flow. Circuit- and logic-level power estimation techniques are no longer sufficient, due to the high complexity and integration levels of digital systems. Accurate low-level estimation techniques are limited by the need to cope with circuit complexity within an acceptable design time. Moreover, low-level estimation techniques can be applied only during the last design phases, when a circuit- or logic-level description is already available, and a re-design process at these levels can be very expensive and time-consuming.

Hence, high-level power estimation is the key to determining the power budget of embedded systems early, since it is unfeasible to synthesize every design solution down to the gate, circuit and layout levels in a reasonable time. The goal is to meet the design turnaround time while exploring the architectural design space widely, and to re-target the architectural design choices early. An accurate and efficient high-level analysis helps meet the power requirements while avoiding a costly re-design process. In general, relative accuracy in high-level power estimation is considered much more important than absolute accuracy, the main goal being the comparison among different design alternatives [102].

This chapter is devoted to the definition of a power assessment framework for the HW part of digital embedded systems ([53], [54], [55], [57]). This work focuses on the HW components described at the highest levels of abstraction (behavioral or architectural) expressed in VHDL.

To date, most high-level descriptions are specified in a hardware description language, such as Verilog or VHDL, along with other graphical formalisms suitable for describing functional behavior at the system level, such as timing diagrams, State Transition Graphs for Finite State Machines, Statecharts, etc. [76], [134]. In particular, VHDL has become the de facto standard in the European design community for hardware description and for most commercial design-entry, synthesis and simulation tools.

The main advantage of VHDL is the possibility of specifying the system behavior with a mixed description at different abstraction levels: behavioral, Register-Transfer and structural. VHDL therefore provides high flexibility during both the design description and the simulation phases. Furthermore, VHDL supports a hierarchical design approach, in which the elements composing the hierarchy, properly connected, together perform the global functionality. The hierarchical approach also makes it possible to use a mixed description composed of behavioral, Register-Transfer and structural parts at the different hierarchical levels.

Other advantages of VHDL are the ease of specifying both the data-path and the control path of the system and its support for a modular design approach, which allows the designer to re-use existing components. In fact, VHDL supports the definition of functions and procedures, which decompose a complex description into smaller and simpler functional units. These functional units can be organized as independent files that can be compiled and verified separately, thus supporting the definition of a library of re-usable cells and macro-cells. Finally, VHDL is independent of the target technology and, through configurations, allows a given entity to be mapped onto different architectures.

The aim of this chapter is to provide a conceptual analysis framework for accurate and efficient estimation of the power dissipation of the HW-bound part of embedded systems described at the behavioral and RT levels. The availability of a power analysis tool at these levels of abstraction is of paramount importance to obtain early estimation results while maintaining an acceptable accuracy. Indeed, behavioral and RT-level VHDL descriptions are the design entry point for the majority of embedded system and IC designs.

Due to its heterogeneous nature, the HW partition is usually the hardest part to estimate at a high level with acceptable precision. Nevertheless, the most relevant aspect of the proposed approach is its generality: it considers a general SOC architecture, suitable for many industrial applications, as well as the single sub-parts that typically constitute the HW side of an embedded system. The model parameters considered in our analysis concern only the I/O behavior of the different HW modules and do not refer to their internal structure.

The most important added value is the introduction of a third dimension, power, into the speed-versus-area design space in which architectural design exploration is usually carried out. The main goal is to give the designer the capability of analyzing and comparing different solutions in the architectural design space before the synthesis task. The relative power figures are meant to guide the designer in exploring the relative impact of the different design alternatives on the quality of the final design, rather than to provide absolute power data.

In the proposed approach, the analysis is based on a probabilistic estimation of the switching activity. The proposed model accounts for both the switching activity and the physical capacitance of all the parts composing the embedded system architecture.


Experimental results derived from the application of the proposed model to a set of benchmark and industrial circuits have shown the effectiveness of the approach as a relative power indicator.

The chapter is organized as follows. Section 4.2 presents the most significant research work on high-level power estimation. Section 4.3 introduces the proposed power estimation model, while Sections 4.4 to 4.7 describe the details related to the different components of the target system architecture. Implementation aspects of the proposed model and experimental results obtained on some benchmark circuits and industrial case studies are reported in Section 4.8. Finally, Section 4.9 contains some concluding remarks along with some considerations on future developments of the research.

4.2. Previous Work on High-Level Power Estimation

General surveys of power estimation techniques at different abstraction levels can be found in [49], [35], [137], [165], [102], [118]. Power estimators operating at the different levels of the design flow are the fundamental elements of a power-oriented design methodology, since they provide feedback on the quality of a particular design solution. This approach implies, at each level, a loop for exploring various design alternatives. Several power estimation models have been proposed in the literature at the gate, circuit and layout levels. At these levels, the state of the art can be considered mature, and most EDA vendors provide effective power analysis tools.

However, applying such tools to designs that include millions of transistors is not practical. Moreover, it is increasingly important to estimate power figures during the early design stages, to avoid costly re-design processes. For high-level power estimation, relative accuracy is much more important than absolute accuracy, since the goal of the analysis is to determine whether one alternative is better than another; the aim is a wide exploration of the architectural design space. Despite the increasing interest in design exploration and estimation at the highest levels of abstraction, until recently only a few papers addressed the power estimation problem at a high level. State-of-the-art surveys of high-level power estimation techniques have been presented in [118], [102].


According to the survey by Landman in [102], high-level power estimation techniques related to the HW part of a system can be classified as behavioral-level and architectural-level techniques. At higher abstraction levels, power exploration tools at the SW level and system level can be used to identify power metrics associated with both the SW- and HW-bound parts, to guide system-level partitioning, as shown in Chapter 3. Moving up the abstraction levels, the estimation process becomes much more difficult, since knowledge of some design characteristics, as well as of the typical activity of the hardware resources, is very limited. In general, power models at the highest levels do not provide a high degree of absolute accuracy, their goal being limited to capturing general trends.

4.2.1. Behavioral-Level Power Estimation

Typical approaches at the algorithmic or behavioral level assume some architectural styles or templates, in order to obtain power estimates based on the exploration of a limited set of design solutions. Essentially, the behavioral approaches differ in the strategy adopted for activity prediction: the behavioral methods can be classified as static and dynamic activity prediction techniques [102]. The former techniques estimate the access frequency of the different HW resources by statically analyzing the behavioral description of the functions to be implemented [125], [33]. The latter techniques are based on dynamic profiling to determine the activation frequencies of the various resources and the memory accesses [96], [158].

Mehra and Rabaey [125] have developed a power estimation strategy based on static profiling of the Control Data Flow Graph (CDFG) representing the design behavior. The analysis has been carried out in the context of the HYPER-LP high-level synthesis system, targeting DSP-oriented applications. The power dissipated by some HW resources, such as data-path modules, has been estimated analytically from the CDFG. Conversely, for other modules, such as interconnects and controllers, for which the power information available at the behavioral level is not sufficient, statistical models were built to estimate power from a stochastic study of several ASICs. Basically, the power associated with a generic hardware resource has been estimated as:

$$P = \frac{1}{2} N_a C_a V_{dd}^{2} f_s \qquad (1)$$


where Na is the number of resource accesses over the computational period, Ca the average capacitance switched per access, Vdd the supply voltage and fs the sampling frequency. The capacitance estimates have been obtained by empirically characterizing fixed-activity models of the different HW resources. The numbers of resource accesses have been calculated analytically from the algorithm for the execution units, the registers and the memories, while they have been determined statistically from benchmarks for the interconnections and the control logic. The estimation models have then been included in an exploration tool that, given the CDFG description of an algorithm and a library of hardware modules, explores the space of available solutions for different values of clock period and supply voltage. The results have been compared with those of an architectural-level power estimator, called Stochastical Power Analysis (SPA) [97], on 23 different chips, showing an average error of approximately 20%.
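To make Eq. (1) concrete, the following Python sketch evaluates it for a few resources and sums the results at several supply voltages, in the spirit of the (clock period, supply voltage) exploration described above. The resource names and the (Na, Ca) values are hypothetical, chosen only to illustrate the arithmetic; they are not data from [125].

```python
def resource_power(n_accesses: float, c_avg: float, vdd: float, f_sample: float) -> float:
    """Eq. (1): P = 1/2 * Na * Ca * Vdd^2 * fs, in watts.

    n_accesses -- Na, accesses to the resource per computational period
    c_avg      -- Ca, average capacitance switched per access [F]
    vdd        -- supply voltage [V]
    f_sample   -- fs, sampling frequency [Hz]
    """
    return 0.5 * n_accesses * c_avg * vdd ** 2 * f_sample


# Hypothetical CDFG resources: name -> (Na, Ca). Illustrative values only.
RESOURCES = {
    "multiplier": (4, 2.0e-12),
    "adder": (6, 0.4e-12),
    "register": (12, 0.15e-12),
}


def total_power(vdd: float, f_sample: float) -> float:
    """Sum Eq. (1) over all resources for one (Vdd, fs) design point."""
    return sum(resource_power(na, ca, vdd, f_sample) for na, ca in RESOURCES.values())


if __name__ == "__main__":
    # Compare a few supply voltages at the same sampling frequency.
    for vdd in (3.3, 2.5, 1.8):
        print(f"Vdd = {vdd} V: P = {total_power(vdd, 1.0e6) * 1e6:.2f} uW")
```

Because every term scales with Vdd², halving the supply voltage divides the estimate by four, independently of the capacitance values chosen.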

Dynamic activity prediction at the behavioral level relies on dynamic profiling to determine the activation frequencies of the various resources. During the simulation of a user-supplied set of input patterns, the activities related to the frequency of the various types of operations and memory accesses are gathered. These access frequencies are then plugged into a model similar to those used in the static approach. Examples of dynamic approaches are the Profile-Driven Synthesis System (PDSS) [96], which receives as input a behavioral subset of VHDL, and the Power-Profiler approach described in [158].

The main advantage of dynamic over static approaches is higher accuracy, since data dependencies are taken into account, whereas the main disadvantages are the lower speed and the need for a set of typical, user-supplied input patterns.

4.2.2. Architectural-Level Power Estimation

According to the taxonomy contained in [102], there are two classes of techniques operating at the architectural or RT level: the analytical and the empirical techniques.

The analytical methods aim at relating the power consumption to the physical capacitances and the switching activities of the design nets. Analytical techniques are in turn divided into complexity-based models and activity-based models. The former consider the complexity of each part of the design, in terms of equivalent gates, as a measure of the physical capacitance, while the latter exploit the concept of entropy, derived from information theory, as a measure of the average transition activity in a circuit.


More specifically, in complexity-based models ([131], [114]), the number n of equivalent gates contained in each design function is first specified in a macro-module library; the power estimates are then obtained by multiplying n by the average power consumed by each equivalent gate. The main advantage of complexity-based techniques is that they require as inputs only a few design parameters, such as memory size or equivalent gate count. Nevertheless, they do not model circuit activity accurately, since an overall fixed activity factor is typically assumed, whereas activity factors actually vary with block functionality and with the data being processed.
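A minimal sketch of the complexity-based calculation, assuming a hypothetical macro-module library that stores only an equivalent-gate count per function and a single per-gate power figure. The fixed `activity` argument mirrors the overall fixed activity factor these models typically assume, which is precisely why the estimate cannot react to the data being processed.

```python
# Hypothetical macro-module library: equivalent-gate count n of each design function.
GATE_COUNT = {"adder16": 120, "mult16": 1800, "reg16": 96}


def complexity_based_power(instances, p_gate_uw_per_mhz, f_mhz, activity=0.5):
    """Complexity-based estimate: P = n_total * P_gate, in uW.

    instances          -- {function name: number of instances}
    p_gate_uw_per_mhz  -- power of one equivalent gate toggling every cycle [uW/MHz]
    f_mhz              -- clock frequency [MHz]
    activity           -- single fixed activity factor assumed for the whole design
    """
    n_total = sum(GATE_COUNT[name] * count for name, count in instances.items())
    p_gate = p_gate_uw_per_mhz * f_mhz * activity  # average power of one equivalent gate
    return n_total * p_gate


if __name__ == "__main__":
    design = {"adder16": 2, "mult16": 1, "reg16": 4}
    print(f"{complexity_based_power(design, 0.05, 50.0):.1f} uW")
```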

In activity-based models, the average power is estimated as the product of the area, considered as a measure of the average node capacitance, and the entropy, considered as a measure of the average activity in a circuit [138], [141], [119], [116], [113]. In information theory, the entropy represents the quantity of information carried by a random variable or process. Here, the basic idea is to associate the power of a block with the amount of computational work it performs, entropy being the metric for measuring computational work. In these approaches, the logic functionality of a module is known, but there is no notion of the implementation structure of the modules.

In [141], the average power dissipated by a module is considered proportional to the product of the average capacitance, C, and the average switching activity, S. The average circuit area, A, is then used as a measure of C, and the average entropy, $\bar{H}$, as a measure of the activity S:

$$P_{ave} \propto C \times S \propto A \times \bar{H} \qquad (2)$$

The correlation between the entropy of a signal x and its switching activity ESW(x) is approximately given by H(x) ≈ 2 ESW(x). Thus S can be estimated from $\bar{H}$ (S ≈ $\bar{H}$/2), where $\bar{H}$ denotes the average value of H(x) over all the nodes of the module. Najm proposed in [138] a simple formula to derive $\bar{H}$ from the input/output behavior of a module. The area A of the block is also expressed in terms of the number of primary inputs and the entropy associated with the primary outputs.
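The sketch below illustrates the two quantities involved, the entropy of a binary signal and its switching activity, together with the relative indicator A × H̄ of Eq. (2). The expression ESW = 2p(1 - p) assumes temporally independent signal values; this assumption is ours, made only for illustration, and the function names are hypothetical.

```python
import math


def bit_entropy(p1: float) -> float:
    """Entropy h(p1) [bits] of a binary signal that is 1 with probability p1."""
    if p1 <= 0.0 or p1 >= 1.0:
        return 0.0
    return -p1 * math.log2(p1) - (1.0 - p1) * math.log2(1.0 - p1)


def switching_activity(p1: float) -> float:
    """E_sw = 2 p1 (1 - p1): transition probability of a temporally
    independent signal (an assumption made only for this illustration)."""
    return 2.0 * p1 * (1.0 - p1)


def entropy_power_indicator(area: float, node_probs) -> float:
    """Relative indicator A x H_bar of Eq. (2); H_bar = mean entropy over the nodes."""
    h_bar = sum(bit_entropy(p) for p in node_probs) / len(node_probs)
    return area * h_bar


if __name__ == "__main__":
    for p in (0.5, 0.3, 0.1):
        print(p, bit_entropy(p) / switching_activity(p))  # ratio H / E_sw
```

Under that assumption the ratio H/ESW is exactly 2 at p1 = 0.5 and grows as p1 moves away from 0.5 (about 2.6 at p1 = 0.1), which is why H ≈ 2 ESW is only an approximation.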

The main drawback of the entropy-based approaches is their limited accuracy, since no timing information enters the above expressions (i.e., no glitching power is accounted for). Furthermore, the capacitance is implicitly assumed to be distributed uniformly over all the circuit nodes. As for complexity-based models, the main advantage of the entropy-based methods is that they require a very limited amount of information as input.

The empirical methods are based on power measurements of existing implementations, from which models are derived by a macro-modeling approach. The empirical methods can be subdivided into fixed-activity models and activity-sensitive models. The former disregard the influence of data activity on power, while the latter consider the effects of the statistics of data and instruction activity on power.

An example of the fixed-activity macro-modeling strategy is the Power Factor Approximation (PFA) method presented in [151], while examples of activity-sensitive models are the ESP tool described in [160] and the SPA tool proposed in [99], [100], [101]. The SPA approach, based on activity profiling and RT-level VHDL simulation, uses two different types of activity models: one for the data-path and one for the control path. The data-path model is referred to as the Dual Bit Type (DBT) model, whereas the control model is called the Activity-Based Control (ABC) model. The basic assumption of the DBT model is that fixed-point two's-complement data streams are characterized by two distinct activity regions: the random activity of the least significant bits (LSBs) and the correlated activity of the most significant bits (MSBs). The data bits (LSBs) exhibit activity similar to uniformly distributed white noise, while the activity of the sign bits (MSBs) depends on the sign transition probability, which is related to the temporal correlation of the data stream. Different coefficients, derived empirically, are therefore used to characterize the capacitance switched in the data and sign regions. The ABC model for the power consumption of the control paths has been presented in [101] for three implementation styles: the ROM-based controller, the PLA-based controller and the random logic controller.
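The two activity regions assumed by the DBT model can be observed directly on a synthetic stream. The following sketch illustrates the assumption only; it is not an implementation of the DBT model or of SPA. It generates a temporally correlated two's-complement stream (a first-order autoregressive process, chosen here as an assumption) and measures the transition probability of each bit; the function names are hypothetical.

```python
import math
import random


def correlated_stream(n, rho, sigma, width, seed=0):
    """AR(1) Gaussian samples, rounded and clipped to the width-bit two's-complement
    range. rho close to 1 models strong temporal correlation."""
    rng = random.Random(seed)
    lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    x, out = 0.0, []
    for _ in range(n):
        x = rho * x + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, sigma)
        out.append(max(lo, min(hi, int(round(x)))))
    return out


def bit_activities(samples, width):
    """Transition probability of each bit (index 0 = LSB) of a two's-complement stream."""
    mask = (1 << width) - 1
    toggles = [0] * width
    for prev, cur in zip(samples, samples[1:]):
        diff = (prev ^ cur) & mask  # Python ints XOR as infinite two's complement
        for b in range(width):
            toggles[b] += (diff >> b) & 1
    return [t / (len(samples) - 1) for t in toggles]


if __name__ == "__main__":
    act = bit_activities(correlated_stream(20000, 0.99, 2000.0, 16), 16)
    for b in reversed(range(16)):
        print(f"bit {b:2d}: {act[b]:.3f}")
```

The LSBs toggle with a probability close to 0.5, like white noise, while the sign bit (and the sign-extension bits above the signal's typical magnitude) toggles far less often, roughly at the sign-change rate of the stream.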

Other macro-modeling approaches have recently been proposed to estimate circuit power on a cycle-by-cycle basis; a cycle-accurate power macro-modeling approach addressing this need has been introduced in [191] and [83].

In general, the main advantage of empirical models is their strong correlation with real implementations.


4.2.3. Pattern-Dependency

A common characteristic of the power estimation problem at the different abstraction levels is that the average power is strongly related to the switching activity of the circuit nodes. This fact has been expressed in [137] by stating that power estimation is a pattern-dependent process. More specifically, the input pattern-dependency of power estimation approaches can be classified as strong or weak.

The typical power estimation methods based on extensive circuit simulation have been described in [137] as strongly pattern-dependent. The main advantages of these simulation techniques are their accuracy and wide applicability. However, to obtain a complete and accurate power estimate, the designer must provide a comprehensive set of input patterns to be simulated, which makes this approach very time-consuming and computationally expensive. Therefore, due to the increasing complexity of designs, the simulation approach is almost impossible to apply to most of them.

To avoid the need for a large number of input patterns, the weakly pattern-dependent approaches [137] require input probabilities. In this case, the estimation results depend on the probabilities supplied by the designer, which reflect the typical behavior of the input signals. Both probabilistic and statistical techniques operating at low levels of abstraction have been presented in Section 2.5. Probabilistic techniques suitable for combinational circuits require user-supplied input probabilities to solve the pattern-dependency problem, while statistical techniques simulate the circuit repeatedly with randomly generated input patterns and use statistical mean estimation techniques to stop the simulation according to a criterion on the closeness to the average power.
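As an illustration of the statistical approach, the sketch below uses a normal-approximation stopping rule: random input vectors are simulated until the confidence interval of the mean power is narrow enough. The "circuit" (`toy_cycle_power`) is a hypothetical stand-in in which each toggling input contributes its capacitance in arbitrary units, and all parameter names and defaults are assumptions.

```python
import math
import random


def toy_cycle_power(prev: int, cur: int, caps) -> float:
    """Hypothetical circuit model: power of one cycle = sum of the capacitances
    (arbitrary units) of the input nets that toggle between two vectors."""
    toggled = prev ^ cur
    return sum(c for b, c in enumerate(caps) if (toggled >> b) & 1)


def monte_carlo_power(caps, rel_error=0.01, z=1.96, min_samples=30,
                      max_samples=1_000_000, seed=1):
    """Simulate random input vectors until the z-level confidence half-width of
    the sample mean drops below rel_error * mean (normal approximation).
    Returns (mean power estimate, number of simulated cycles)."""
    rng = random.Random(seed)
    width = len(caps)
    prev = rng.getrandbits(width)
    n, total, total_sq = 0, 0.0, 0.0
    while n < max_samples:
        cur = rng.getrandbits(width)
        p = toy_cycle_power(prev, cur, caps)
        prev = cur
        n += 1
        total += p
        total_sq += p * p
        if n >= min_samples:
            mean = total / n
            var = max(0.0, (total_sq - n * mean * mean) / (n - 1))
            half_width = z * math.sqrt(var / n)
            if mean > 0 and half_width <= rel_error * mean:
                break  # stopping criterion met
    return total / n, n


if __name__ == "__main__":
    print(monte_carlo_power([1.0] * 8))
```

With eight equal capacitances and uniformly random inputs, each input toggles with probability 0.5, so the estimate converges toward 4 units per cycle; the number of simulated cycles is decided by the stopping rule rather than fixed in advance.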

Other approaches for reducing power simulation time are based on compacting the long stream of typical input vectors by means of probabilistic automata [121], [122], [123]. The basic idea is to define a Stochastic State Machine (SSM) that captures the relevant statistical properties of the given input stream, and then to excite this machine with a reduced number of random inputs so that the output sequence of the machine is statistically equivalent to the initial one. The significant statistical properties are the signal and transition probabilities and the first-order spatio-temporal correlations among bits and across consecutive time frames. The main goal of these compaction techniques is to reduce the length of the input stream used for power characterization by one to four orders of magnitude while preserving an acceptable level of accuracy (an error of approximately 5%).
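The SSM-based compaction of [121]-[123] is considerably richer than what can be shown here. As a deliberately simplified stand-in, the sketch below measures only the per-bit signal probability and transition probability of a long stream and regenerates a shorter stream with one independent two-state Markov chain per bit. Unlike an SSM, it ignores spatial correlation among bits; the function names and the counter trace are hypothetical.

```python
import random


def bit_statistics(stream, width):
    """Per-bit signal probability p1 and transition probability t of a vector stream."""
    n = len(stream)
    stats = []
    for b in range(width):
        bits = [(v >> b) & 1 for v in stream]
        p1 = sum(bits) / n
        t = sum(x != y for x, y in zip(bits, bits[1:])) / (n - 1)
        stats.append((p1, t))
    return stats


def markov_generate(stats, length, seed=0):
    """Short stream from one independent two-state Markov chain per bit.

    Stationarity gives p0 * P(0->1) = p1 * P(1->0) = t / 2, which fixes both
    flip probabilities from the measured (p1, t) of each bit.
    """
    rng = random.Random(seed)
    state = [1 if rng.random() < p1 else 0 for p1, _ in stats]
    out = []
    for _ in range(length):
        out.append(sum(bit << b for b, bit in enumerate(state)))
        for b, (p1, t) in enumerate(stats):
            p_here = p1 if state[b] else 1.0 - p1  # probability of the current state
            flip = t / (2.0 * p_here) if p_here > 0 else 0.0
            if rng.random() < min(1.0, flip):
                state[b] ^= 1
    return out


if __name__ == "__main__":
    long_stream = [k % 16 for k in range(4096)]  # hypothetical 4-bit counter trace
    target = bit_statistics(long_stream, 4)
    short_stream = markov_generate(target, 400)
    print(target)
    print(bit_statistics(short_stream, 4))
```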

In general, the methods proposed for high-level power estimation have not yet achieved the maturity necessary for use within current industrial CAD environments. Our work is an attempt to fill this gap, aiming to provide a high-level power model, based on VHDL descriptions, that covers the heterogeneous modules composing the basic architecture of embedded systems.

4.3. The High-Level Power Estimation Model for the HW Part

The power model for the HW-bound part is based on the VHDL description of the system at the behavioral and RT levels. The analysis is based on the probabilistic estimation of the switching activity of the internal nodes. The proposed approach relies on the following general assumptions:

• the supply and ground voltage levels in the ASIC are fixed, although the impact of supply voltage reduction on power is worth noting;
• the design style is based on synchronous sequential circuits;
• data transfers occur at the register-to-register level;
• the Zero Delay Model (ZDM) has been adopted, thus ignoring the contribution of glitches and hazards to power.

The inputs for the estimation are as follows:
• the ASIC specification, consisting of a hierarchical VHDL description implementing the target system architecture introduced in Chapter 3;
• the allocation library, composed of the available components implementing the macro-modules (such as adders, multipliers, etc.) and the basic modules (such as registers, multiplexers, logic gates, I/O pads, etc.). Every component model includes the description of the logic behavior, the input capacitance, the area and the power characteristics;
• the technological parameters, such as frequency, power supply and derating factors (accounting for variations in process, voltage and temperature), etc.;
• the switching behavior of the ASIC primary I/Os.


The power model is an analytical model that attempts to relate the average power dissipation of the VHDL descriptions to the physical capacitance and the switching activity of the nets. The estimation approach is hierarchical: at the highest hierarchical level, ad-hoc analytical power models are proposed for each part of the target system architecture; these models are in turn based on a macro-module library at the lowest hierarchical levels. Furthermore, to avoid the need for a huge number of input patterns, our approach is weakly pattern-dependent: it requires user-supplied input probabilities, which reflect the typical input behavior and are derived from the system-level specification.

In the proposed single-ASIC architecture, the total average power dissipated, PAVE, is given by:

$$P_{AVE} = P_{IO} + P_{CORE} \qquad (3)$$

where PIO and PCORE are the average power dissipated by the I/O nets and the core internal nets, respectively. The value of PAVE can be multiplied by the derating factor, δ, to take into account the effects of variations in the fabrication process and the operating conditions (voltage and temperature) on the power values contained in the target library.

The power model of the core logic is based on the models of the different components of the target system architecture; therefore, the PCORE term can in turn be expressed as:

$$P_{CORE} = P_{DP} + P_{MEM} + P_{CNTR} + P_{PROC} \qquad (4)$$

where the single terms represent the average power dissipated by the data-path, the memory, the control logic and the embedded core processor, respectively. The power models related to the single terms in the above equations are detailed in the following sections, except for the PPROC term, which is considered part of the power dissipated by the SW-bound part, as detailed in Chapter 3.
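Eqs. (3) and (4) compose by simple addition, so a tool only needs to keep all terms in a consistent unit. The small sketch below, with hypothetical function and parameter names, also applies the derating factor δ mentioned above; the processor term defaults to zero because this chapter accounts for it with the SW-bound part.

```python
def average_power(p_io, p_dp, p_mem, p_cntr, p_proc=0.0, derating=1.0):
    """Eqs. (3)-(4), optionally scaled by the derating factor delta.

    All terms must be in the same unit (e.g. mW). p_proc defaults to 0 because
    the embedded core processor is accounted for with the SW-bound part.
    """
    p_core = p_dp + p_mem + p_cntr + p_proc  # Eq. (4)
    p_ave = p_io + p_core                    # Eq. (3)
    return derating * p_ave


if __name__ == "__main__":
    print(average_power(p_io=10.0, p_dp=20.0, p_mem=30.0, p_cntr=40.0, derating=1.1))
```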

4.4. PIO Estimation

Although the analysis is performed before synthesis, we assume that the ASIC interface, in terms of primary I/O pad characteristics and related switching activity, is known from the system-level specifications. The set S of input, output and bi-directional nets of the ASIC can be partitioned into N sets, S = {s1, s2, ..., sk, ..., sN}, where the k-th set sk is composed of I/O pads of the same type tk. Considering, for example, a set of output pads, the average power of the set sk can be estimated as:

$$P_{s_k} = \sum_{i=1}^{n_k} P_i(C_i)\, TR_i \qquad (5)$$

where:

• nk is the number of output pads in the set sk;
• Pi(Ci) is the average power consumption per MHz of the i-th output pad in sk. The value of Pi is computed as a function of the output load Ci at a given reference frequency f0. This value is tabulated in the selected library (as in [14]) as Pf0 / f0, expressed in [µW/MHz], as a function of Ci, expressed in [pF]; Ci is the output load of the i-th output pad, derived from the system-level specifications. Note that the previous equation is valid only within a given range of frequencies around f0;
• TRi is the toggle rate of the i-th output pad, expressed in [MHz] and derived from the system-level specifications.

Similarly, the average power of the input pads can be computed from the estimated internal standard loads and the input ramp time.
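A sketch of Eq. (5) for one set of output pads, assuming a hypothetical library table that gives P/f0 in µW/MHz at a few output loads and using linear interpolation between the tabulated points. The table values are invented for illustration; with toggle rates in MHz the result is in µW.

```python
from bisect import bisect_left

# Hypothetical library table for one output-pad type t_k: load [pF] -> P/f0 [uW/MHz].
PAD_TABLE = [(5.0, 40.0), (15.0, 95.0), (30.0, 180.0), (50.0, 290.0)]


def pad_power_per_mhz(load_pf, table=PAD_TABLE):
    """P_i(C_i) [uW/MHz] by linear interpolation; loads outside the table are rejected."""
    loads = [c for c, _ in table]
    if not loads[0] <= load_pf <= loads[-1]:
        raise ValueError("load outside the characterized range")
    j = bisect_left(loads, load_pf)
    if loads[j] == load_pf:
        return table[j][1]
    (c0, p0), (c1, p1) = table[j - 1], table[j]
    return p0 + (p1 - p0) * (load_pf - c0) / (c1 - c0)


def pad_set_power(pads):
    """Eq. (5): P_sk = sum_i P_i(C_i) * TR_i, with pads given as (C_i [pF], TR_i [MHz])."""
    return sum(pad_power_per_mhz(c) * tr for c, tr in pads)


if __name__ == "__main__":
    # Two pads of the same type: 10 pF toggling at 2 MHz, 30 pF toggling at 1 MHz.
    print(pad_set_power([(10.0, 2.0), (30.0, 1.0)]), "uW")
```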

4.5. PDP Estimation

The average power dissipated by the data-path can be expressed as:

$$P_{DP} = P_{REG} + P_{MUX} + P_{FU} \qquad (6)$$

where the single terms represent the average power dissipated by the registers, the multiplexers and the functional units.

4.5.1. PREG Estimation

The preliminary step is the estimation of the number of required registers and, consequently, of the toggle rate TRi of each of them. Depending on the abstraction level, such data are either directly available from the RT-level description or obtained by applying live variable analysis to the behavioral-level specifications.

When an RT-level description is available, the number of required registers can be derived directly from the analysis of the VHDL source code, while the corresponding switching activity can be deduced by propagating the switching activity from the primary inputs through the circuit architecture specified at the RT level. Based on a pre-characterized module library, we can then obtain an estimate of the capacitive loads driven by each register.

The analysis of a behavioral-level description is more complex, since the scheduling and allocation passes have not yet been performed. Therefore, live variable analysis [136] has been applied to the behavioral-level VHDL code to estimate the number of required registers and the maximum switching activity of each register.

The algorithm examines the life of a variable over a set of VHDL code statements and is similar to the one proposed in [136] for computing the lifetime of a variable in terms of its definition and use over a selected set of VHDL statements. New passes have been added to the algorithm of [136] to derive information on the switching activity of the registers. The proposed algorithm can be summarized as follows:

1. Compute the lifetimes of all the variables in the given VHDL code, composed of S statements. A variable vj is said to live over a set of sequential code statements {i, i + 1, i + 2, ..., i + n} when it is written in statement i and last accessed in statement (i + n). When a variable is written in a statement (i + k) of the set but last used in the same statement (i + k) of the next iteration, it is assumed to live over the entire set;
2. Represent the lifetime of each variable as a vertical line from statement i through statement (i + n) in the column j reserved for the corresponding variable vj;
3. Determine the maximum number N of overlapping lifetimes, computing the maximum number of vertical lines intersecting any horizontal cut-line;
4. Estimate the minimum number N of sets of registers necessary to implement the code by using register sharing. Register sharing has to be applied whenever a group of variables with the same bit-width bi can be mapped to the same register. The total number of registers is given by $\sum_{i=1}^{N} b_i$;
5. Select a possible mapping of variables into registers by using register sharing;
6. Compute the number wi of writes to the variables mapped to the same set of registers;
7. Estimate the switching activity αi of each set of registers by dividing wi by the number of statements S: αi = wi / S; hence, TRi = αi fCLK.
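The seven steps can be sketched in Python for a straight-line loop body, applied to the statements of the differential equation example of Figure 1 (reading statement 5 as y1 := u*dx, since i is not among the variables of the figure). Several choices are assumptions rather than the thesis algorithm: lifetimes are half-open ranges, so a register read in statement k can be rewritten in statement k; register sharing uses the greedy left-edge heuristic with all variables of the same width; and, as in Table 1, an input never written inside the loop (dx, a) counts as one write. The left-edge mapping also needs 9 registers, but it groups u6 with u1 and u4 instead of with y1, so it is a different possible mapping from the one in Table 1. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Stmt:
    defines: Optional[str]   # variable written by the statement (None if none)
    uses: Tuple[str, ...]    # variables read by the statement


def lifetimes(body: List[Stmt]) -> Dict[str, Tuple[int, int]]:
    """Steps 1-2: lifetime of each variable as a half-open range [start, end)
    of 1-based statement numbers."""
    s = len(body)
    first_def: Dict[str, int] = {}
    first_use: Dict[str, int] = {}
    last_use: Dict[str, int] = {}
    for k, st in enumerate(body, start=1):
        for v in st.uses:  # reads happen before the write of the same statement
            first_use.setdefault(v, k)
            last_use[v] = k
        if st.defines is not None:
            first_def.setdefault(st.defines, k)
    life = {}
    for v in set(first_def) | set(first_use):
        d = first_def.get(v)
        if d is None or (v in first_use and first_use[v] <= d):
            life[v] = (1, s + 1)  # loop-carried value or input: lives over the whole loop
        else:
            life[v] = (d, last_use.get(v, d + 1))
    return life


def max_overlap(life: Dict[str, Tuple[int, int]], s: int) -> int:
    """Step 3: maximum number of lifetimes alive at any statement (the cut-line)."""
    return max(sum(1 for a, b in life.values() if a <= k < b) for k in range(1, s + 1))


def share_registers(life: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Steps 4-5: greedy left-edge register sharing."""
    regs: list = []  # each entry: [end of last lifetime, [variables]]
    for v, (start, end) in sorted(life.items(), key=lambda kv: (kv[1][0], kv[0])):
        for reg in regs:
            if reg[0] <= start:  # register free again: reuse it
                reg[0] = end
                reg[1].append(v)
                break
        else:
            regs.append([end, [v]])  # no free register: allocate a new one
    return [names for _, names in regs]


def register_activity(body: List[Stmt], groups: List[List[str]], f_clk: float):
    """Steps 6-7: rows of (variables, w_i, alpha_i = w_i / S, TR_i = alpha_i * f_clk)."""
    s = len(body)
    writes: Dict[str, int] = {}
    for st in body:
        if st.defines is not None:
            writes[st.defines] = writes.get(st.defines, 0) + 1
    # Assumption mirroring Table 1: a variable never written in the loop counts as one write.
    return [(names, w, w / s, w / s * f_clk)
            for names in groups
            for w in [sum(writes.get(v, 1) for v in names)]]


FIG1 = [
    Stmt(None, ("x", "a")),    # 1  while (x < a) loop
    Stmt("u1", ("u", "dx")),   # 2
    Stmt("u2", ("x",)),        # 3
    Stmt("u3", ("y",)),        # 4
    Stmt("y1", ("u", "dx")),   # 5  (read as y1 := u*dx)
    Stmt("x", ("x", "dx")),    # 6
    Stmt("u4", ("u1", "u2")),  # 7
    Stmt("u5", ("dx", "u3")),  # 8
    Stmt("y", ("y", "y1")),    # 9
    Stmt("u6", ("u", "u4")),   # 10
    Stmt("u", ("u6", "u5")),   # 11
    Stmt(None, ()),            # 12 end loop
]

if __name__ == "__main__":
    life = lifetimes(FIG1)
    groups = share_registers(life)
    print("N =", max_overlap(life, len(FIG1)), "registers =", len(groups))
    for names, w, alpha, tr in register_activity(FIG1, groups, 50e6):
        print(names, w, f"{alpha:.3f}", f"{tr / 1e6:.2f} MHz")
```

Changing the sorting key, or the choice among free registers in `share_registers`, is one way to explore alternative register-sharing criteria and their effect on the write counts.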


Figure 1 shows an application of this algorithm to the differential equation example reported in [136]. The bold dotted line at statement 7 represents the horizontal cut-line with the maximum number (N = 9) of vertical lines reaching or crossing it. Thus, using register sharing, the VHDL statements can be implemented with a minimum of 9 registers. A possible mapping of variables into registers is shown in Table 1.

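(Lifetime chart not reproduced; its columns correspond to the variables u1, u2, u3, u4, u5, u6, u, x, y, y1, dx and a.) Loop body, with the statement numbers used in the text:

1. while (x < a) loop
2. u1 := u*dx;
3. u2 := 5*x;
4. u3 := 3*y;
5. y1 := u*dx;
6. x := x+dx;
7. u4 := u1*u2;
8. u5 := dx*u3;
9. y := y+y1;
10. u6 := u-u4;
11. u := u6-u5;
12. end loop;

Figure 1: Live variable analysis for power estimation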

Register ri | Variables mapped to ri | Write number wi | Switching activity αi | Bit-width bi
r1 | u1, u4 | 2 | 1/6 | 1
r2 | u2, u5 | 2 | 1/6 | 1
r3 | u3 | 1 | 1/12 | 1
r4 | u6, y1 | 2 | 1/6 | 1
r5 | u | 1 | 1/12 | 1
r6 | x | 1 | 1/12 | 1
r7 | y | 1 | 1/12 | 1
r8 | dx | 1 | 1/12 | 1
r9 | a | 1 | 1/12 | 1

Table 1. Results of live variable analysis applied to the differential equation example.

Based on this approach, the architectural design space can be explored quickly. For example, we can evaluate the influence of register sharing on power by adopting different criteria in choosing which variables, and how many of them, to map onto the same register. This method enables us to estimate the number of allocated resources and, consequently, the related power consumption and area occupancy.

Regarding PREG, it is worth noting that latches and flip-flops consume power not only during output transitions, but also at every clock edge, due to the internal clock buffers depicted in Figure 2, even when the data stored in the register does not change. Thus, our analytical model of registers takes into account both the switching and the non-switching power, the latter being due to the internal clock buffers. The non-switching power dissipated by the internal clock buffers accounts for approximately 30% of the average power of the registers, as reported for example in [105] for a cell-based 3.3 V CMOS technology. Note that, as depicted in Figure 2, the internal clock buffers are independent of the output load; thus the non-switching power of latches and flip-flops is load-independent, but it depends on the clock input ramp time.

Figure 2: The latch and D flip-flop models for power estimation (each cell shown with data input D, clock input CP and output Q)

Let the set of registers, S, be composed of N sets, S = {s1, s2, ..., sk, ..., sN}, where the k-th set sk is composed of registers of the same type tk. Globally, the average power of the registers can be estimated as:

$$ P_{REG} = \sum_{k=1}^{N} \left( P_{s_k} + P_{NS_k} \right) \quad (7) $$

where:

• Psk is the average switching power of the set sk;

• PNSk is the average non-switching power dissipated by the internal clock buffers of the registers in the set sk, that is, the average power dissipated by the internal clock buffers when there are no output transitions.


Note that the measured average power Psk, tabulated in the target library, also includes the power dissipated by the internal clock buffers during the clock edges that correspond to output transitions. Hence, the estimate of Psk should account for a toggle rate TRsk, while the estimate of PNSk should consider a toggle rate of (fCLK - TRsk).

The estimated values of Psk and PNSk for the k-th set sk are, respectively:

$$ P_{s_k} = \sum_{i=1}^{n_k} P_i(C_i)\, TR_i \quad (8) $$

$$ P_{NS_k} = P_{0k} \sum_{i=1}^{n_k} \left( f_{CLK} - TR_i \right) \quad (9) $$

where:

• nk is the estimated number of registers in the set sk;

• Pi (Ci) is the average power consumption per MHz of the i-th register in sk. The value of Pi has been computed by running SPICE simulations at a given reference frequency f0, for different output standard loads (representing both load cells and interconnections) and clock input ramp times. Thus Pi is given as a function of the output load Ci and of the input ramp time, and it is tabulated in our allocation library in [µW/MHz] as a function of Ci, expressed in equivalent standard loads, and of the input ramp time, expressed in [ns];

• P0k is the non-switching power consumption per MHz of a single register of type tk. The value of P0k, expressed in [µW/MHz], has been computed by running SPICE simulations at a given reference frequency f0, as a function of the clock input ramp time;

• Ci is the estimated output load of the i-th register in the set sk, expressed in equivalent standard loads;

• TRi is the estimated toggle rate of the i-th register in the set sk, obtained by using the live variable analysis.
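Equations (7)-(9) can be sketched as follows. This is a minimal illustration, not the vhdl2pow implementation: the per-MHz library values Pi(Ci) and P0k are hypothetical placeholders rather than HCMOS6 data, and each register is described directly by its tabulated Pi(Ci) and its toggle rate TRi.

```python
# Sketch of Eqs. (7)-(9). All library figures below are hypothetical placeholders.

def set_power(regs, p0, f_clk):
    """Return (P_sk, P_NSk) in uW for one register set.

    regs:  list of (P_i(C_i) in uW/MHz, TR_i in MHz) for the n_k registers of the set
    p0:    non-switching power per MHz (uW/MHz) of one register of this type
    f_clk: clock frequency in MHz
    """
    p_switch = sum(p_i * tr_i for p_i, tr_i in regs)            # Eq. (8)
    p_nonswitch = p0 * sum(f_clk - tr_i for _, tr_i in regs)    # Eq. (9)
    return p_switch, p_nonswitch

def register_power(sets, f_clk):
    """Eq. (7): sum of switching and non-switching power over all sets."""
    return sum(sum(set_power(regs, p0, f_clk)) for regs, p0 in sets)

# One hypothetical set of two registers of the same type, f_CLK = 100 MHz.
example_set = ([(0.5, 20.0), (0.5, 10.0)], 0.1)
print(register_power([example_set], 100.0))  # uW
```

Keeping (8) and (9) separate makes it easy to see that a register with a low toggle rate still contributes through the clock-buffer term.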

4.5.2. PMUX Estimation

To estimate the size and number of multiplexers from the VHDL code, it is necessary to determine the number of paths in the data-path. The approach also relies on a power model of the 2-input non-inverting multiplexer, based on both the static signal probability of the selection net and the switching activities of the input nets. As for the registers, however, a different approach has been used depending on the abstraction level of the original VHDL description.

Concerning RT-level VHDL descriptions, the architecture is already defined, and thus it is easy to derive the number and size of the multiplexers in the circuit, as well as the related capacitive loads. To estimate the switching activity at the multiplexer outputs, we propagate the signal probabilities from the primary inputs through the given architecture.

Concerning behavioral VHDL descriptions, we need to consider a possible architecture onto which the behavioral description can be mapped. The analysis of the design paths and the related notation are similar to those in [136]; however, the proposed approach also considers the paths from primary inputs to internal registers and from internal registers to primary outputs. A path from the source component S to the target component T is represented as T < S. Note that all memory accesses require the use of intermediate registers.

Given the target architecture represented in Figure 1, the possible paths can be classified in

the following categories:

1. primary input to register (R < I);

2. register to primary output (O < R);

3. register to register (R < R);

4. register to functional unit (U < R);

5. functional unit to register (R < U);

6. register to memory (M < R);

7. memory to register (R < M).

The algorithm used to determine the possible paths in the data-path can easily be derived from the algorithm described in [136], extended to also consider the paths of categories 1 and 2.

Basically, the number and size of the selectors used as inputs to each register are obtained by counting the paths that have that register as destination. The number and sizes of the multiplexers used as inputs to each functional unit are evaluated in the same way. The analysis relies on some assumptions regarding resource allocation. For example, if we decide to allocate a single operator to execute all the operations of the same type, we need to compute all the possible paths that have that operator as destination. The situation is different if several operators are allocated to execute the same operations. In this case, we assume an average usage of the operators: we divide the total number of paths with the same operator type as destination among all the allocated operators of that type, and we distribute this average number of paths over all the inputs of each operator. In this way, we can estimate the size s(ti) of the multiplexer for each input of the allocated operators of type ti as:

$$ s(t_i) = \frac{w(t_i)}{2\, n(t_i)} \quad (10) $$

where w(ti) is the number of write accesses to the operator ti (i.e. the number of paths with ti as destination) and n(ti) is the number of operators of type ti that we assume to allocate.
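Equation (10) is simple arithmetic; the sketch below also rounds the result up to an integer number of multiplexer inputs, which is our assumption (the text only gives the average). The path count and the number of adders in the example are hypothetical.

```python
import math

def mux_size(w, n):
    """Eq. (10): average number of paths per input of one of n operators of a type."""
    return w / (2 * n)

def mux_inputs(w, n):
    """Assumption: round up, since a multiplexer has an integer number of inputs.
    A result of 1 means the input is driven directly and needs no multiplexer."""
    return math.ceil(mux_size(w, n))

# Hypothetical example: 10 paths reach the adders and 2 adders are allocated.
print(mux_size(10, 2), mux_inputs(10, 2))  # 2.5 3
```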

Figure 3: The 2-input non-inverting multiplexer model for power estimation (data inputs A and B, selection input S, output Z)

Once the size and number of multiplexers have been computed, we derive the switching activity of the output node of each multiplexer, given the model of the two-input non-inverting multiplexer depicted in Figure 3. A simplified model for the maximum switching activity of the output Z of a 2-input non-inverting multiplexer is:

$$ \alpha_Z = \alpha_A \left( 1 - p^1_S \right) + \alpha_B\, p^1_S \quad (11) $$

where:

• αA is the switching activity of input A;

• αB is the switching activity of input B;

• pS1 is the static signal probability of the selection net S, i.e. the probability that S = 1.

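By recursively applying the previous equation, we can obtain the switching activity of larger multiplexers, such as the 4-input non-inverting multiplexer:

$$ \alpha_Z = \left[ \alpha_A \left( 1 - p^1_{S_1} \right) + \alpha_B\, p^1_{S_1} \right] \left( 1 - p^1_{S_2} \right) + \left[ \alpha_C \left( 1 - p^1_{S_1} \right) + \alpha_D\, p^1_{S_1} \right] p^1_{S_2} \quad (12) $$

where A, B, C, D are the data inputs, and S1 and S2 are the selection inputs: S1 selects within the pairs (A, B) and (C, D), and S2 selects between the two pairs.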
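The recursive application of Eq. (11) can be sketched as a multiplexer tree in which selector S1 drives the first level, S2 the second, and so on. At every level the convention of Eq. (11) is kept: the left input of each pair is selected when the selector is 0. The activities and selector probabilities in the example are hypothetical.

```python
def mux2_activity(alpha_a, alpha_b, p_s):
    """Eq. (11): A is selected when S = 0, B when S = 1 (p_s = P(S = 1))."""
    return alpha_a * (1 - p_s) + alpha_b * p_s

def mux_tree_activity(alphas, sel_probs):
    """Apply Eq. (11) level by level to a 2**k-input multiplexer tree.

    alphas:    switching activities of the data inputs, in order A, B, C, D, ...
    sel_probs: static probabilities P(S = 1) of S1, S2, ..., Sk;
               S1 drives the first level, which pairs (A, B), (C, D), ...
    """
    if len(alphas) != 2 ** len(sel_probs):
        raise ValueError("need len(alphas) == 2 ** len(sel_probs)")
    level = list(alphas)
    for p_s in sel_probs:
        level = [mux2_activity(level[i], level[i + 1], p_s)
                 for i in range(0, len(level), 2)]
    return level[0]

# 4-input case with hypothetical input activities and selector probabilities.
print(mux_tree_activity([0.1, 0.2, 0.3, 0.4], [0.25, 0.5]))
```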

Globally, the average power dissipated by the multiplexers can be estimated as:

$$ P_{MUX} = \sum_{i=1}^{N} P_i \quad (13) $$

where N is the estimated number of multiplexers and Pi is the average power of each multiplexer.

The value of Pi for the i-th multiplexer is given by Pi = Pti (Ci) TRi, where Pti is the average power consumption per MHz of a 2-input non-inverting multiplexer and TRi is the toggle rate of the output of the i-th multiplexer.

4.5.3. PFU Estimation

For the estimation of the average power of the functional units, we use complexity-based analytical models [102], where the complexity of each functional unit is described in terms of equivalent gates. To estimate the number of equivalent gates necessary to implement a given function of the data-path, we use a library of macro-modules such as adders, multipliers, etc. The library should include the estimated number of logic gates for each macro-module, depending on the number of operands and on the bit-width of each operand. Once the number of equivalent gates for each macro-function has been evaluated, the estimated power dissipated by the functional units can be expressed as:

$$ P_{FU} = \sum_{i=1}^{N} P_i \quad (14) $$

where N is the number of macro-modules, and Pi is the power of the i-th macro-module, given by:

$$ P_i = n_i\, P_{TECH}\, TR_i \quad (15) $$

where PTECH is a technological parameter expressed in [µW/(gate·MHz)]; ni is the estimated number of logic gates in the i-th macro-module; TRi is the toggle rate of the output net of the i-th macro-module.
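Equations (14)-(15) reduce to a weighted sum over macro-modules. In the sketch below, the gate counts, toggle rates and the value of PTECH are hypothetical placeholders, not entries of the macro-module library described above.

```python
P_TECH = 0.02  # hypothetical technology parameter [uW/(gate*MHz)]

# (name, estimated equivalent gates n_i, output toggle rate TR_i [MHz]); hypothetical
macro_modules = [
    ("add16", 150, 30.0),
    ("mul16", 2200, 25.0),
]

def fu_power(modules, p_tech):
    """Eqs. (14)-(15): P_FU = sum_i n_i * P_TECH * TR_i, in uW."""
    return sum(n_gates * p_tech * tr for _, n_gates, tr in modules)

print(f"P_FU = {fu_power(macro_modules, P_TECH):.1f} uW")
```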

4.6. PMEM Estimation

A power dissipation model for a memory cell at a low level of abstraction has been proposed in [175]:

$$ P_{memcell} = \frac{2^k}{2} \left( c_{int}\, l_{column} + 2^{n-k} C_{tr} \right) V_{dd}\, V_{swing}\, f_{clk} \quad (16) $$

where 2^k is the number of cells in a row, cint is the wire capacitance per unit length, lcolumn is the memory column length, 2^(n-k) is the number of cells in a column, Ctr is the minimum-size drain capacitance, and Vswing is the bit line voltage swing.
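Equation (16) can be evaluated directly. The sketch assumes SI units throughout and reads the leading factor as 2^k/2; all numeric values are hypothetical and chosen only to exercise the formula.

```python
def memcell_power(k, n, c_int, l_column, c_tr, vdd, v_swing, f_clk):
    """Eq. (16) in SI units (F/m, m, F, V, V, Hz) -> W.

    2**k cells per row, 2**(n - k) cells per column.
    """
    return (2 ** k / 2) * (c_int * l_column + 2 ** (n - k) * c_tr) * vdd * v_swing * f_clk

# Hypothetical values, chosen only to exercise the formula.
p = memcell_power(k=4, n=8, c_int=2e-10, l_column=5e-3, c_tr=1e-15,
                  vdd=3.3, v_swing=0.5, f_clk=100e6)
print(f"{p * 1e3:.3f} mW")
```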

Considering a fully CMOS single-port static RAM at a high level of abstraction, we assume that the target library provides the power consumption of a single memory cell, Pcell, and of a single memory output buffer.

The average power dissipated during a read access to a single row of an array composed of n rows and m columns is proportional to the inverse of the read access time ta and to the sum of the average power dissipated by the row decoder, by the m memory cells composing the i-th row, and by the output buffers.

In particular, the power dissipated by the row decoder can be estimated with a complexity-based model, where the number of equivalent gates is proportional to the product n × log2 n and the load capacitance is the word line capacitance.

4.7. PCNTR Estimation

This section describes the contribution to the power consumption due to the control part of the target system architecture, described as a set of FSMs represented by STGs. The proposed model for the power dissipation of an FSM is probabilistic: we approximate the average switching activities of the FSM nodes by using the switching probabilities (or transition probabilities) derived by modeling the FSM as a Markov chain (as described in Chapter 2). Given a typical implementation of an FSM, composed of a combinational circuit and a set of state registers, as depicted in Figure 4, we consider the different contributions to the global average power:

$$ P_{CNTR} = P_{IN} + P_{STATE\_REG} + P_{COMB} + P_{OUT} \quad (17) $$

where:

• PIN is the average power dissipated by the primary inputs PI;

• PSTATE_REG is the average power dissipated by the state registers;

• PCOMB is the average power dissipated by the combinational logic;

• POUT is the average power dissipated by the primary outputs.


Figure 4: The Finite State Machine model for power estimation (blocks: Combinational Logic and Registers; signals: Primary Inputs, Primary Outputs, Present State Bit Lines, Next State Bit Lines, CLK).

The power estimation models for each term of the above equation are described in the following. As basic assumptions, we suppose that the FSM description is available in the form of a State Transition Graph (STG), where each state is represented symbolically and nothing is known about the structure of the combinational logic implementing the next state and output functions. The input static signal probabilities and the input switching activity factors are assumed to be given by the system-level specifications, derived either by simulating the FSM at a high abstraction level or from direct knowledge of the typical input behavior.

Furthermore, we assume a zero-delay model for the logic gates and synchronous primary inputs. Under these assumptions, we can ignore the effects of glitches and hazards on the state bit lines; therefore, the switching activities of the present and next state bit lines are equal.

Given the FSM description and the input probabilities, the first step of our estimation consists

of the computation of the total state transition probabilities for each edge in the graph, by

modeling the FSM as a Markov chain and following the same method shown in Chapter 2.

The second step consists of finding a state assignment that minimizes the power dissipation,

by applying one of the state assignment methods described in Chapter 7.
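The first step (total state transition probabilities from the Markov chain model) can be sketched with exact rational arithmetic. The sketch solves π = πT together with Σπ = 1 by Gauss-Jordan elimination and then sets Pij = πi Tij. The conditional probabilities T are not listed explicitly in this chapter; here they are obtained as Pij/Pi from the values reported for the example FSM of Figure 5, so the sketch should recover those values.

```python
from fractions import Fraction as F

def steady_state(T):
    """Solve pi = pi T with sum(pi) = 1 by exact Gauss-Jordan elimination."""
    n = len(T)
    # Row i encodes sum_j pi_j * (T[j][i] - delta_ij) = 0 (balance of state i).
    A = [[F(T[j][i]) - (1 if i == j else 0) for j in range(n)] for i in range(n)]
    b = [F(0)] * n
    A[-1] = [F(1)] * n   # replace one redundant balance equation by sum(pi) = 1
    b[-1] = F(1)
    for col in range(n):
        piv = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[piv] = A[piv], A[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                factor = A[r][col] / A[col][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
                b[r] -= factor * b[col]
    return [b[i] / A[i][i] for i in range(n)]

def total_transition_probabilities(T):
    """P_ij = pi_i * T[i][j]: probability that a clock cycle traverses edge i -> j."""
    pi = steady_state(T)
    return pi, [[pi[i] * T[i][j] for j in range(len(T))] for i in range(len(T))]

# Conditional transition probabilities of the Figure 5 example (P_ij / P_i).
T = [[F(1, 4), F(3, 4), 0, 0],
     [F(1, 4), 0, F(3, 4), 0],
     [0, 0, F(1, 4), F(3, 4)],
     [0, F(1, 2), F(1, 2), 0]]
pi, P = total_transition_probabilities(T)
print(pi)
```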

4.7.1. Switching Activity Estimation for the State Bit Lines

The switching activity of the state bit lines depends on both the state encoding and the total state transition probabilities between each pair of states in the STG [185].


Let us generalize the concept of state transition probability to transitions occurring between two disjoint sub-sets of states, Si and Sj, contained in the set of states S = {s1, s2, ..., sns}, as defined in [185]:

$$ TP(S_i \leftrightarrow S_j) = \sum_{s_i \in S_i} \sum_{s_j \in S_j} \left( P_{ij} + P_{ji} \right) \quad (18) $$

Let bi be the i-th bit (1 ≤ i ≤ nvar) of the state code (called state bit), and nvar the number of state bits (⌈log2 ns⌉ ≤ nvar ≤ ns). We consider the two sub-sets of states in which the i-th state bit assumes the value one and zero, respectively. The switching activity αbi of the state bit line bi is given by [185]:

$$ \alpha_{b_i} = TP\left( States(b_i = 1) \leftrightarrow States(b_i = 0) \right) \quad (19) $$

4.7.2. Switching Activity Estimation for the Primary Outputs

Considering a Moore-type FSM, the switching activity of the primary outputs can be defined similarly to the switching activity of the state bit lines, depending on both the given output encoding and the total state transition probabilities. In fact, in a Moore-type FSM, the total state transition probability Pij between the two states si and sj equals the total transition probability between the corresponding outputs oi and oj, where the output row vector oi (i = 1, 2, ..., ns) is composed of the nO primary outputs: $o_i = (y^i_1, \ldots, y^i_l, \ldots, y^i_{n_O})$.

Let us define the transition probability of the transitions occurring between two disjoint sub-sets of outputs, Oi and Oj, contained in the set of outputs O = {o1, o2, ..., ons}, as:

$$ TP(O_i \leftrightarrow O_j) = \sum_{o_i \in O_i} \sum_{o_j \in O_j} \left( P_{ij} + P_{ji} \right) \quad (20) $$

Let ym be the m-th output bit (1 ≤ m ≤ nO), where nO is the number of primary outputs. We consider the two sets of outputs in which the m-th output bit assumes the value one and zero, respectively. The switching activity αym of the primary output ym is given by:

$$ \alpha_{y_m} = TP\left( Outputs(y_m = 1) \leftrightarrow Outputs(y_m = 0) \right) \quad (21) $$

As an example, we consider the same Moore-type FSM considered in Chapter 2, where a state encoding has been fixed. Figure 5 shows the total transition probabilities associated with each edge, as well as the steady state probabilities, the state codes and the output codes associated with each node.

  Nodes (state code/output code; steady state probability): st1 00/00, P1 = 2/29; st2 01/11, P2 = 6/29; st3 11/01, P3 = 12/29; st4 10/10, P4 = 9/29.
  Edges (total transition probability): P11 = 1/58, P12 = 3/58, P21 = 3/58, P23 = 9/58, P33 = 3/29, P34 = 9/29, P42 = 9/58, P43 = 9/58.

Figure 5: Steady state probabilities and total transition probabilities for the example FSM with encoded states.

The switching activities of the state bit lines are given by:

αb1 = TP (S1 = {s3, s4} ↔ S2 = {s1, s2}) = 9/29

αb2 = TP (S3 = {s2, s3} ↔ S4 = {s1, s4}) = 21/29

The switching activities associated to the outputs are:

αy1 = TP (O1 = {o2, o4} ↔ O2 = {o1, o3}) = 21/29

αy2 = TP (O3 = {o2, o3} ↔ O4 = {o1, o4}) = 21/29
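Equations (18)-(21) can be checked on this example with a short Python sketch. It assumes the state and output codes st1 = 00/00, st2 = 01/11, st3 = 11/01 and st4 = 10/10; this association of codes with states is inferred from the figure labels and from the sets listed above, not stated explicitly in the text.

```python
from fractions import Fraction as F

# Total transition probabilities of Figure 5; states st1..st4 -> indices 0..3.
P = {(0, 0): F(1, 58), (0, 1): F(3, 58), (1, 0): F(3, 58), (1, 2): F(9, 58),
     (2, 2): F(3, 29), (2, 3): F(9, 29), (3, 1): F(9, 58), (3, 2): F(9, 58)}

# Assumed (inferred) state code and output code of each state.
state_codes = {0: "00", 1: "01", 2: "11", 3: "10"}
output_codes = {0: "00", 1: "11", 2: "01", 3: "10"}

def tp(P, group_i, group_j):
    """Eqs. (18)/(20): total probability of moving between two disjoint groups."""
    return sum((P.get((i, j), 0) + P.get((j, i), 0)
                for i in group_i for j in group_j), F(0))

def bit_activity(P, codes, bit):
    """Eqs. (19)/(21): TP between the states where the bit is 1 and where it is 0."""
    ones = {s for s, code in codes.items() if code[bit] == "1"}
    zeros = set(codes) - ones
    return tp(P, ones, zeros)

for name, codes in (("b", state_codes), ("y", output_codes)):
    for bit in (0, 1):
        print(f"alpha_{name}{bit + 1} = {bit_activity(P, codes, bit)}")
# prints 9/29, 21/29, 21/29 and 21/29, matching the values above
```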

4.7.3. PIN Estimation

As mentioned before, let us assume that the input static signal probabilities and the input switching activity factors are given by the system-level specifications.

The average power dissipated by the k-th primary input belonging to the set PI = {x1, x2, ..., xk, ..., xnI} depends on its switching activity factor αxk and on the input load capacitance Cxk, the latter being proportional to the number of literals, nlitxk, that the k-th primary input drives in the combinational part, and to the estimated capacitance Clit due to each literal [185]. Therefore, the average power PIN can be estimated as:

$$ P_{IN} = \sum_{x_k \in PI} P_{x_k}(C_{x_k})\, TR_{x_k} \quad (22) $$

where Cxk = nlitxk Clit, TRxk = αxk fCLK, and Pxk (Cxk) is the average power consumption per MHz of the buffer cell driving the k-th input.
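Equation (22) can be sketched as below. The library lookup Pxk(Cxk) is replaced by a hypothetical linear fit a + b·C, and the literal counts, switching activities and Clit are hypothetical placeholders.

```python
F_CLK = 100.0  # MHz
C_LIT = 1.0    # hypothetical capacitance per literal, in equivalent standard loads

def buffer_power_per_mhz(c_load, a=0.3, b=0.05):
    """Hypothetical linear stand-in for the tabulated P_xk(C) [uW/MHz]."""
    return a + b * c_load

def input_power(inputs, f_clk=F_CLK, c_lit=C_LIT):
    """Eq. (22). inputs: list of (alpha_xk, nlit_xk) pairs; returns P_IN in uW."""
    total = 0.0
    for alpha, nlit in inputs:
        c_load = nlit * c_lit   # C_xk = nlit_xk * C_lit
        tr = alpha * f_clk      # TR_xk = alpha_xk * f_CLK
        total += buffer_power_per_mhz(c_load) * tr
    return total

# Two hypothetical primary inputs.
print(input_power([(0.5, 4), (0.25, 6)]))
```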

4.7.4. PSTATE_REG Estimation

The average power dissipated by the state registers, PSTATE_REG, can be derived by using the switching activity αbi of the i-th state bit line bi, where 1 ≤ i ≤ nvar, and the corresponding toggle rate TRbi = αbi fCLK. The term PSTATE_REG accounts for both the switching and the non-switching power of the state registers:

$$ P_{STATE\_REG} = \sum_{i=1}^{n_{var}} \left( P_i + P_{NS_i} \right) \quad (23) $$

where nvar is the number of state registers, and Pi and PNSi are the average switching and non-switching power dissipated by each state register. As before, the switching power Pi also includes the power dissipated by the internal clock buffers during the clock edges corresponding to output transitions. Hence the terms Pi should account for a toggle rate TRbi, while the terms PNSi should consider a toggle rate of (fCLK - TRbi).

The estimated values of Pi and PNSi are, respectively:

$$ P_i = P_{t_i}(C_i)\, TR_{b_i} \quad (24) $$

$$ P_{NS_i} = P_{0i} \left( f_{CLK} - TR_{b_i} \right) \quad (25) $$

where:

• Pti is the average power consumption per MHz of the i-th register of type ti, as a function of the load capacitance Ci and of the input ramp time;

• P0i is the non-switching power consumption per MHz of a single register of type ti;

• Ci = nlitbi Clit is proportional to the number of literals, nlitbi, that the i-th state bit line drives in the combinational part, and to the estimated capacitance Clit due to each literal, expressed in equivalent standard loads.
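A sketch of Eqs. (23)-(25) for the example FSM reuses the switching activities αb1 = 9/29 and αb2 = 21/29 computed in Section 4.7.2. The register power model, P0 and the literal counts are hypothetical placeholders, not library data.

```python
from fractions import Fraction

F_CLK = 100.0                                  # MHz (Section 4.8)
alpha_b = [Fraction(9, 29), Fraction(21, 29)]  # alpha_b1, alpha_b2 from the example
nlit_b = [3, 5]                                # hypothetical literal counts
C_LIT = 1.0                                    # hypothetical load per literal [std loads]
P0 = 0.12                                      # hypothetical non-switching power [uW/MHz]

def reg_power_per_mhz(c_load):
    """Hypothetical stand-in for the tabulated P_t(C) of the state register cell."""
    return 0.4 + 0.05 * c_load

def state_register_power():
    total = 0.0
    for alpha, nlit in zip(alpha_b, nlit_b):
        tr = float(alpha) * F_CLK                     # TR_bi = alpha_bi * f_CLK
        p_sw = reg_power_per_mhz(nlit * C_LIT) * tr   # Eq. (24), C_i = nlit_bi * C_lit
        p_ns = P0 * (F_CLK - tr)                      # Eq. (25)
        total += p_sw + p_ns                          # Eq. (23)
    return total                                      # uW

print(f"P_STATE_REG = {state_register_power():.2f} uW")
```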

4.7.5. PCOMB Estimation

The average power dissipated by the combinational logic, PCOMB, has been estimated by considering a two-level logic implementation before the minimization step. The i-th state bit line bi (where 1 ≤ i ≤ nvar) can be expressed in canonical form as the sum of Nbi minterms (Nbi ≤ 2^nlit, where nlit is the number of literals and 2^nlit is the maximum number of minterms). Similarly, the m-th output bit ym (1 ≤ m ≤ nO) can be expressed in canonical form as the sum of Nym minterms (Nym ≤ 2^nlit).

Let us assume that each minterm is represented by a single AND gate; hence the maximum number of AND gates in the AND-plane is 2^nlit, while in general nAND ≤ 2^nlit. Given the probabilistic model of the switching activity of the generic nlit-input AND gate, the static probability associated with the output node is the product of the static probabilities of the inputs:

$$ p^1_{AND} = \prod_{i=1}^{n_{lit}} p^1_i \quad (26) $$

Assuming that the inputs are spatially and temporally independent, the switching activity of the output node of the nlit-input AND gate is given by:

$$ \alpha_{AND} = 2 \left( 1 - \prod_{i=1}^{n_{lit}} p^1_i \right) \prod_{i=1}^{n_{lit}} p^1_i \quad (27) $$

To compute the switching activity of the output node of the generic AND gate, we need the static probabilities of the primary inputs and of the state bit lines. Given the vector of the steady state probabilities P = {P1, ..., Pk, ..., Pns}T (obtained by solving the Chapman-Kolmogorov equations, as shown in Chapter 2), the probability that the state bit line bi equals 1 is the sum of the steady state probabilities of the set of states SI in which this bit equals 1:

$$ p^1_{b_i} = \sum_{k \in S_I} P_k \quad (28) $$
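Equations (26)-(28) can be sketched as follows. The steady state probabilities are those of the example FSM, with the state codes inferred from Figure 5 (st1 = 00, st2 = 01, st3 = 11, st4 = 10). The state bits b1 and b2 are correlated, so applying Eq. (27) to a minterm that contains both relies on the independence assumption stated above; the primary input probability is hypothetical.

```python
from fractions import Fraction
from math import prod

def and_static_probability(p_inputs):
    """Eq. (26): P(output = 1) is the product of the input probabilities."""
    return prod(p_inputs)

def and_switching_activity(p_inputs):
    """Eq. (27): 2 * (1 - p) * p, assuming independent inputs."""
    p = and_static_probability(p_inputs)
    return 2 * (1 - p) * p

def state_bit_probability(steady, codes, bit):
    """Eq. (28): sum of the steady state probabilities of states whose bit is 1."""
    return sum((steady[s] for s, code in codes.items() if code[bit] == "1"),
               Fraction(0))

# Example FSM: steady state probabilities of Figure 5, codes inferred from its labels.
steady = {"st1": Fraction(2, 29), "st2": Fraction(6, 29),
          "st3": Fraction(12, 29), "st4": Fraction(9, 29)}
codes = {"st1": "00", "st2": "01", "st3": "11", "st4": "10"}

p_b1 = state_bit_probability(steady, codes, 0)  # st3 + st4 = 21/29
p_b2 = state_bit_probability(steady, codes, 1)  # st2 + st3 = 18/29

# Minterm b1 * b2 * x with a hypothetical primary input probability P(x = 1) = 1/2.
print(and_switching_activity([p_b1, p_b2, Fraction(1, 2)]))
```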

In this case, we can derive an upper bound for the estimated power of the AND-plane:

$$ P_{COMB} = \sum_{i=1}^{n_{AND}} P_i(C_i)\, TR_i \quad (29) $$

where:

• Pi (Ci) is the average power consumption per MHz of the i-th nlit-input AND gate;

• Ci is the capacitance driven by the i-th nlit-input AND gate;

• TRi = αi fCLK is the toggle rate of the i-th nlit-input AND gate (derived by using the

switching activity model of the nlit-input AND gate).


4.7.6. POUT Estimation

POUT is the average power dissipated by the OR-plane, which is composed of nvar Nbi-input OR gates corresponding to the state bit lines, driving the input capacitance of the state registers, and of nO Nym-input OR gates corresponding to the primary outputs, driving the output load capacitances.

Therefore, the upper bound for the power of the OR-plane is composed of two terms. The first term is proportional to the switching activity factors αbi of the state bit lines bi, while the second term is proportional to the switching activity factors αyi of the primary outputs:

$$ P_{OUT} = \sum_{i=1}^{n_{var}} P_i(C_{IN\_REG})\, TR_{b_i} + \sum_{i=1}^{n_O} P_i(C_{y_i})\, TR_{y_i} \quad (30) $$

where:

• Pi (CIN_REG) is the average power consumption per MHz of the i-th Nbi-input OR gate

driving the i-th state bit line;

• CIN_REG is the input capacitance of each state register;

• TRbi = αbi fCLK is the toggle rate of the i-th state bit line bi;

• Pi (Cyi) is the average power consumption per MHz of the i-th Nyi-input OR gate driving

the i-th primary output;

• Cyi is the output load capacitance of the i-th primary output;

• TRyi = αyi fCLK is the toggle rate of the i-th primary output.

4.8. Implementation and Experimental Results

The high-level power estimation model presented in the previous sections has been implemented in the program vhdl2pow, written in C, which receives as input the VHDL description of the different sub-parts of the target system architecture. The vhdl2pow program is composed of three main modules, devoted to managing different types of VHDL descriptions:

• the DP-RT module, for data-path VHDL descriptions at the RT-level;

• the DP-BEH module, for data-path VHDL descriptions at the behavioral-level;

• the CNTR module, for VHDL descriptions of the control part.


The first step of all the modules is the lexical and syntactical analysis of the VHDL code, implemented in a parsing function. The Lex and Yacc programs have been used to perform the static analysis of the VHDL source code, based on the lexical and syntactical rules. The result of the parsing step is an intermediate structure containing all the information necessary for the subsequent step of semantic analysis.

The semantic analysis is executed by visiting the syntactic tree that represents the intermediate structure, in order to recognize the constructs. An example of VHDL source code and the corresponding syntactic tree is depicted in Figure 6. The semantic analysis is performed in different ways depending on the type of circuit description, since each variant refers to a particular description style.

At the RT-level, the architecture is well defined, since the scheduling and allocation steps have already been performed. The analysis basically aims at identifying the architectural model, composed of logic modules (such as adders, multipliers, etc.) and their interconnections. The data structure modeling the circuit architecture is used to propagate the switching activity and to calculate the load capacitances of the nodes. The final step is to estimate all the parameters of the power model associated with the architectural description.

Concerning the behavioral-level descriptions, two different estimation approaches can be

adopted. The first approach derives an estimate of the switching activity and capacitive loads

directly from the information obtained during the live variable analysis, while the second

approach derives an architectural model corresponding to the behavioral-level model. In the

first approach, the semantic analysis is strictly connected to the results of the live variable

analysis, to obtain an estimate of the resources necessary to implement the circuit

functionality. In the second approach, the performed operations are quite similar to those

performed during high-level synthesis. In practice it is necessary to make some assumptions

about the scheduling of the operations and a possible resource allocation. Then the same

analysis techniques adopted for the RT-level descriptions can be applied to the corresponding

architectural description.

Given the complexity and the large number of constructs provided by the VHDL language, our analysis accepts as input only a subset of the VHDL constructs, summarized in Table 2. This imposes some limitations on the VHDL modeling style; however, the subset has been derived by taking into consideration some industrial case studies developed at the R&D Labs of SGS-Thomson and a set of standard benchmark circuits.

(a)
comb : process (current_state, IN00, IN10, IN01, IN11) begin
  case CURRENT_STATE is
    when ST1 =>
      Y_OUT <= "00";
      if( IN00 or IN01 or IN10 ) then
        NX_ST <= ST2;
        Y_OUT <= "11";
      else
        NX_ST <= ST1;
        Y_OUT <= "00";
      end if;
    when ST2 =>
      Y_OUT <= "11";
      if( IN00 or (IN01 and IN10) ) then
        NX_ST <= ST3;
        Y_OUT <= "01";
      else
        NX_ST <= ST1;
        Y_OUT <= "00";
      end if;
  end case;
end process;

(b) [Syntactic tree: a process node with sensitivity list (current_state, IN00, IN01, IN11, IN10); a case node on current_state; for ST1, the assignment Y_OUT <= "00" followed by an if node on (IN00 or IN01 or IN10), whose then and else branches hold the assignments NX_ST <= ST2, Y_OUT <= "11" and NX_ST <= ST1, Y_OUT <= "00".]

Figure 6: An example of (a) VHDL source code and (b) the related syntactic tree


Constructs                 Considered                                            Not considered
Declarations               constant, function, variable                          file, procedure, generic
Types                      integer, bit, boolean, std_logic, std_logic_vector    real, access, character, record, physical
Logic operators            and, or, nand, nor, xor, not                          -
Relational operators       =, /=, <, >, <=, >=                                   -
Arithmetic operators       +, -, *, /                                            **, rem, abs, mod
Sequential instructions    case, if, while, wait                                 exit, next, for, null
Attributes                 high, low, left, right, range, length                 -

Table 2: VHDL constructs considered in vhdl2pow

For the control logic, the vhdl2pow program provides a set of routines to extract a state transition table from the behavioral VHDL description and to convert it to the BLIF format (Berkeley Logic Interchange Format), a description formalism for sequential circuits adopted in the OctTools package. An example of VHDL description of a Moore-type and of a Mealy-type FSM accepted by vhdl2pow is reported in Figure 3. The description is based on the usage of a single process to describe both the state and the output behavior.

A set of experiments has been conducted by applying the power estimation model implemented in vhdl2pow to several industrial case studies and standard benchmarks. All the measures have been derived by using the HCMOS6 technology (0.35 µm, 3.3 V), supplied by SGS-Thomson Microelectronics, at a target operating frequency of 100 MHz.

To verify the accuracy of the proposed model, the obtained results have been compared with the estimates obtained after synthesizing the descriptions with the Synopsys tools targeting the HCMOS6 technology. For the synthesis of RT-level descriptions we used the Design Compiler tool, while for behavioral-level descriptions we used the Behavioral Compiler tool.

The estimation results obtained by the proposed methodology at the pre-synthesis level have been compared with the results derived by using the Synopsys Design Power tool, based on the synthesized gate-level netlist and on a probabilistic approach to propagate the switching activity of internal nodes. The Synopsys estimates are expected to be more accurate than the vhdl2pow estimates, since the Design Power estimates are based on the synthesized gate-level netlist and are thus derived from a lower-level description. Notice that both methods are based on a zero-delay model.

(a)
comb:process (current_state, IN00, IN01, IN10, IN11) begin
  case CURRENT_STATE is
    when ST1 =>
      Y_OUT_temp <= "00";
      if ( IN00 or IN01 or IN10 ) then NEXT_STATE <= ST2; Y_OUT_temp <= "11";
      else NEXT_STATE <= ST1; Y_OUT_temp <= "00";
      end if;
    when ST2 =>
      Y_OUT_temp <= "11";
      if ( IN00 or IN01 or IN10 ) then NEXT_STATE <= ST3; Y_OUT_temp <= "01";
      else NEXT_STATE <= ST1; Y_OUT_temp <= "00";
      end if;
    when ST3 =>
      Y_OUT_temp <= "01";
      if ( IN11 or IN01 or IN10 ) then NEXT_STATE <= ST4; Y_OUT_temp <= "10";
      else NEXT_STATE <= ST3; Y_OUT_temp <= "01";
      end if;
    when ST4 =>
      Y_OUT_temp <= "10";
      if ( IN00 or IN01 ) then NEXT_STATE <= ST3; Y_OUT_temp <= "01";
      else NEXT_STATE <= ST2; Y_OUT_temp <= "11";
      end if;
  end case;
end process;

(b)
combo: PROCESS(state, key_response, rc_response, lock_err, .....)
BEGIN
  CASE state IS
    WHEN IDLE =>
      d_alarm_temp <= "00"; unlock_temp <= '0'; start_timer_temp <= '0'; reset_alarm_temp <= '0';
      IF (rc_response = '1') THEN
        nxt_state <= CAR_LOCKING; gas_temp <= '0'; lock_car_temp <= '1';
      ELSIF (key_response = "00") THEN
        nxt_state <= GAS_LOCKED; gas_temp <= '1'; lock_car_temp <= '0';
      ELSE
        nxt_state <= state; gas_temp <= '0'; lock_car_temp <= '0';
      END IF;
    WHEN CAR_LOCKING =>
      gas_temp <= '0'; unlock_temp <= '0'; start_timer_temp <= '0'; reset_alarm_temp <= '0';
      IF (lock_err = '1') THEN
        nxt_state <= IDLE; d_alarm_temp <= "01"; lock_car_temp <= '0';
      ELSIF (lock_ok = '1') THEN
        nxt_state <= CAR_LOCKED; d_alarm_temp <= "10"; lock_car_temp <= '0';
      ELSE
        nxt_state <= state; d_alarm_temp <= "00"; lock_car_temp <= '1';
      END IF;
    ...
  END CASE;
END PROCESS combo;

Figure 3: Example of VHDL description of a Moore-type FSM (a) and Mealy-type FSM (b).

The selected set of data-path dominated systems is composed of:

• mem_int, a VHDL RT-level description of a circuit to interface the main memory and

the system bus;

• gcd, a VHDL behavioral description to compute the greatest common divisor of two 8-bit numbers;

• diffeq, a VHDL behavioral description of a circuit for the numerical resolution of

second order linear differential equations with constant coefficients; diffeq-t has

been optimized for speed, while diffeq-a for area;


• ellipf, a VHDL behavioral description of an elliptic filter of the fifth order;

• dhrc, a VHDL behavioral description of an algorithm that solves the partial differential equations describing heat propagation.

All the above mentioned case studies described at the behavioral-level have been derived

from the High Level Synthesis Design Repository, University of California at Irvine (1995).

The selected set of control dominated systems is composed of:

• pace, a VHDL description of an embedded controller of a pacemaker, for which the power dissipation is a very significant design constraint;

• cerbero, a VHDL description for an embedded controller for an anti-theft system for

automotive applications;

• a set of 43 FSMs derived from the MCNC-91 benchmark suite. First we applied the area-

oriented state assignment program NOVA [189] to the selected benchmarks, then the

encoded FSMs have been synthesized by the Synopsys Design Compiler tool targeting the

HCMOS6 technology.

Tables 4 to 7 summarize the results. First, let us discuss the results related to the case studies reported in Table 4. For the sequential power, the proposed model provides an average absolute percentage error of 13.23% with respect to the Design Power estimates. For the combinational power, the average percentage error is 19.37%. Overall, the average percentage error on the total power figures is 13.95%. Let us now discuss the results related to the MCNC FSM benchmark set reported in Tables 5 to 7. Regarding the sequential power, the proposed model shows an average percentage error of 13.03% (ranging from 0.01% to 33.96%) with respect to the Design Power estimates. Regarding the combinational and total power, the average percentage errors are 9.09% and 8.43%, respectively. Overall, the accuracy of our results relative to the Design Power results is considered satisfactory at this level of abstraction.


CIRCUIT   Estimator       Total Sequential  Total Combinational  Total Power
                          Power [mW]        Power [mW]           [mW]
mem_int   Design Power    4.21              2.137                6.3478
          vhdl2pow RT     3.97              1.847                5.817
          Perc. Error     -5.7%             -13.57%              -8.36%
gcd       Design Power    0.922             0.8953               1.8173
          vhdl2pow RT     0.832             0.826                1.658
          Perc. Error     9.76%             -7.74%               -8.77%
gcd       Design Power    0.922             0.8953               1.8173
          vhdl2pow BEH    0.549             0.771                1.32
          Perc. Error     40.46%            -13.88%              -27.36%
diffeq-t  Design Power    13.72             29.68                43.4
          vhdl2pow RT     11.961            27.22                39.181
          Perc. Error     12.82%            -8.29%               -9.72%
diffeq-t  Design Power    13.72             29.68                43.4
          vhdl2pow BEH    13.205            34.21                47.415
          Perc. Error     3.75%             15.26%               9.25%
diffeq-a  Design Power    10.14             20.12                30.26
          vhdl2pow RT     8.894             18.474               27.368
          Perc. Error     12.29%            -8.2%                -9.56%
diffeq-a  Design Power    10.14             20.12                30.26
          vhdl2pow BEH    8.937             14.5                 23.437
          Perc. Error     11.86%            -27.93%              -22.55%
ellipf    Design Power    16.25             1.8773               18.1273
          vhdl2pow BEH    14.49             0.55                 15.038
          Perc. Error     10.83%            -70.74%              -17.04%
dhrc      Design Power    4.78              4.42                 9.2
          vhdl2pow BEH    5.766             5.454                11.22
          Perc. Error     20.63%            23.3%                21.96%
pace      Design Power    0.735             1.01                 1.745
          vhdl2pow CNTR   0.8309            0.9141               1.9
          Perc. Error     13.05%            -9.9%                8.88%
cerbero   Design Power    0.092             0.139                0.231
          vhdl2pow CNTR   0.088             0.120                0.208
          Perc. Error     4.35%             -14.29%              -9.96%
Ave. |Perc. Error|        13.23%            19.37%               13.95%

Table 4: Power estimation comparison results for some case studies
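The "Ave. |Perc. Error|" row of Table 4 can be recomputed from the raw power figures. The sketch below assumes the percentage error is defined as (vhdl2pow − Design Power) / Design Power, the convention that matches rows such as mem_int and dhrc; because the average is taken over absolute values, the sign convention does not affect the result. The function and variable names are illustrative, not part of vhdl2pow.

```python
def perc_error(estimate: float, reference: float) -> float:
    """Signed percentage error of an estimate relative to a reference value."""
    return 100.0 * (estimate - reference) / reference


# Sequential power [mW] from Table 4: (case study, Design Power, vhdl2pow).
# The BEH rows reuse the same Design Power reference as the RT rows.
SEQUENTIAL_MW = [
    ("mem_int RT", 4.21, 3.97),
    ("gcd RT", 0.922, 0.832),
    ("gcd BEH", 0.922, 0.549),
    ("diffeq-t RT", 13.72, 11.961),
    ("diffeq-t BEH", 13.72, 13.205),
    ("diffeq-a RT", 10.14, 8.894),
    ("diffeq-a BEH", 10.14, 8.937),
    ("ellipf BEH", 16.25, 14.49),
    ("dhrc BEH", 4.78, 5.766),
    ("pace CNTR", 0.735, 0.8309),
    ("cerbero CNTR", 0.092, 0.088),
]


def mean_abs_perc_error(rows) -> float:
    """Average of |percentage error| over all rows, as in the Ave. |Perc. Error| row."""
    errors = [abs(perc_error(est, ref)) for _, ref, est in rows]
    return sum(errors) / len(errors)


if __name__ == "__main__":
    print(f"{mean_abs_perc_error(SEQUENTIAL_MW):.2f}%")  # 13.23%
```

Running it prints 13.23%, which matches the sequential-power average reported in Table 4.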


CIRCUIT   PI  PO  ns  Estimator     Total Sequential  Total Combinational  Total Power
                                    Power [µW]        Power [µW]           [µW]
bbara     4   2   10  Design Power  188.00            430.19               618.19
                      vhdl2pow      214.98            411.03               626.01
                      Perc. Error   14.35%            -4.45%               1.26%
bbsse     7   7   16  Design Power  266.00            850.10               1116.10
                      vhdl2pow      269.26            903.02               1172.28
                      Perc. Error   1.23%             6.23%                5.03%
bbtas     2   2   6   Design Power  180.00            203.29               383.29
                      vhdl2pow      194.34            221.40               415.74
                      Perc. Error   7.97%             8.91%                8.47%
bbtasmod  2   2   9   Design Power  218.00            272.09               490.09
                      vhdl2pow      238.24            241.59               479.83
                      Perc. Error   9.28%             -11.21%              -2.09%
beecount  3   4   7   Design Power  168               329.95               497.95
                      vhdl2pow      175.87            331.81               507.68
                      Perc. Error   4.68%             0.56%                1.95%
cse       7   7   16  Design Power  212               728.52               940.52
                      vhdl2pow      271.88            544.11               815.99
                      Perc. Error   28.25%            -25.31%              -13.24%
dk14      3   5   7   Design Power  239.00            935.90               1174.90
                      vhdl2pow      217.33            995.32               1212.65
                      Perc. Error   -9.07%            6.35%                3.21%
dk15      3   5   4   Design Power  146.00            802.15               948.15
                      vhdl2pow      149.82            783.91               933.73
                      Perc. Error   2.62%             -2.27%               -1.52%
dk16      2   3   27  Design Power  443               2197.4               2640.4
                      vhdl2pow      376.63            1767.62              2144.25
                      Perc. Error   -14.98%           -19.56%              -18.79%
dk17      2   3   8   Design Power  429.00            659.90               1088.90
                      vhdl2pow      483.89            567.63               1051.52
                      Perc. Error   12.79%            -13.98%              -3.43%
…                     Design Power  211.00            376.80               587.80
                      vhdl2pow      …                 …                    …
                      Perc. Error   …                 …                    …
donfile   2   1   24  Design Power  …                 …                    …
                      vhdl2pow      326.00            1052.87              1378.87
                      Perc. Error   -9.44%            -0.97%               -3.11%
ex1       9   19  20  Design Power  351               1365.2               1716.2
                      vhdl2pow      316.05            1933.49              2249.54
                      Perc. Error   -9.96%            41.63%               31.08%
ex2       2   2   19  Design Power  445               1562.9               2007.9
                      vhdl2pow      339.3             1533.78              1873.08
                      Perc. Error   -23.75%           -1.86%               -6.71%

Table 5: Power estimation comparison results for some MCNC FSM benchmarks (a). PI: number of primary inputs; PO: number of primary outputs; ns: number of states.



CIRCUIT   PI  PO  ns  Estimator     Total Sequential  Total Combinational  Total Power
                                    Power [µW]        Power [µW]           [µW]
ex3       2   2   10  Design Power  317               777.3                1094.3
                      vhdl2pow      263.1             568.45               831.55
                      Perc. Error   -17.00%           -26.87%              -24.01%
ex4       6   9   14  Design Power  408               1211.8               1619.8
                      vhdl2pow      274.96            1205.72              1480.68
                      Perc. Error   -32.61%           -0.50%               -8.59%
ex5       2   2   9   Design Power  260.00            662.45               922.45
                      vhdl2pow      272.82            638.52               911.34
                      Perc. Error   4.93%             -3.61%               -1.20%
ex6       5   8   8   Design Power  214.00            1139.80              1353.80
                      vhdl2pow      208.87            1277.16              1486.03
                      Perc. Error   -2.40%            12.05%               9.77%
ex7       2   2   10  Design Power  253.00            681.57               934.57
                      vhdl2pow      266.72            706.38               973.10
                      Perc. Error   5.42%             3.64%                4.12%
keyb      7   2   19  Design Power  284               1158                 1442
                      vhdl2pow      299.13            946.69               1245.82
                      Perc. Error   5.33%             -18.25%              -13.60%
kirkman   12  6   16  Design Power  281               1259.8               1540.8
                      vhdl2pow      220.73            1222.75              1443.48
                      Perc. Error   -21.45%           -2.94%               -6.32%
lion      2   1   4   Design Power  99.00             138.34               237.34
                      vhdl2pow      119.25            157.42               276.67
                      Perc. Error   20.45%            13.79%               16.57%
lion9     2   1   9   Design Power  231.00            388.59               619.59
                      vhdl2pow      229.41            316.68               546.09
                      Perc. Error   -0.69%            -18.51%              -11.86%
mark1     5   16  15  Design Power  358               1014.5               1372.5
                      vhdl2pow      259.57            1001.77              1261.34
                      Perc. Error   -27.49%           -1.25%               -8.10%
mc        3   5   4   Design Power  122.00            313.55               435.55
                      vhdl2pow      130.20            315.65               445.85
                      Perc. Error   6.72%             0.67%                2.36%
modulo12  1   1   12  Design Power  239.00            302.48               541.48
                      vhdl2pow      279.16            305.55               584.71
                      Perc. Error   16.80%            1.01%                7.98%
opus      5   6   10  Design Power  306               803                  1109
                      vhdl2pow      267.04            748.75               1015.79
                      Perc. Error   -12.73%           -6.76%               -8.40%
planet    7   19  48  Design Power  706.00            5014.60              5720.60
                      vhdl2pow      515.69            5145.60              5661.29
                      Perc. Error   -26.96%           2.61%                -1.04%
pma       8   8   24  Design Power  467               1976.8               2443.8
                      vhdl2pow      308.4             2047.79              2356.19
                      Perc. Error   -33.96%           3.59%                -3.58%

Table 6: Power estimation comparison results for some MCNC FSM benchmarks (b).



CIRCUIT   PI  PO  ns  Estimator     Total Sequential  Total Combinational  Total Power
                                    Power [µW]        Power [µW]           [µW]
s1        8   6   20  Design Power  454               3430.4               3884.4
                      vhdl2pow      343.72            3027.44              3371.16
                      Perc. Error   -24.29%           -11.75%              -13.21%
s1a       8   6   20  Design Power  387               2605.4               2992.4
                      vhdl2pow      343.72            2591.54              2935.26
                      Perc. Error   -11.18%           -0.53%               -1.91%
s27       4   1   6   Design Power  161               244.06               405.06
                      vhdl2pow      203.09            293.16               496.25
                      Perc. Error   26.14%            20.12%               22.51%
s8        4   1   5   Design Power  194.00            427.74               621.74
                      vhdl2pow      191.37            374.71               566.08
                      Perc. Error   -1.36%            -12.40%              -8.95%
sand      11  9   32  Design Power  579               5292                 5871
                      vhdl2pow      531.64            4112                 4643.64
                      Perc. Error   -8.18%            -22.30%              -20.91%
shiftreg  1   1   8   Design Power  187.00            150.74               337.74
                      vhdl2pow      187.19            155.26               342.45
                      Perc. Error   0.10%             3.00%                1.39%
sse       7   7   16  Design Power  306               973.3                1279.3
                      vhdl2pow      266.77            926.37               1193.14
                      Perc. Error   -12.82%           -4.82%               -6.73%
styr      9   10  30  Design Power  423               3074                 3497
                      vhdl2pow      307.26            2968.76              3276.02
                      Perc. Error   -27.36%           -3.42%               -6.32%
tav       4   4   4   Design Power  150               281.02               431.02
                      vhdl2pow      175               325.71               500.71
                      Perc. Error   16.67%            15.90%               16.17%
tbk       6   3   32  Design Power  377               5122.9               5499.9
                      vhdl2pow      387.16            4981.34              5368.5
                      Perc. Error   2.69%             -2.76%               -2.39%
tma       7   6   20  Design Power  390               1998.8               2388.8
                      vhdl2pow      289.38            1663.66              1953.04
                      Perc. Error   -25.80%           -16.77%              -18.24%
train11   2   1   11  Design Power  222.00            337.33               559.33
                      vhdl2pow      223.87            332.28               556.15
                      Perc. Error   0.84%             -1.50%               -0.57%
train4    2   1   4   Design Power  104.00            159.05               263.05
                      vhdl2pow      121.37            173.71               295.08
                      Perc. Error   16.70%            9.22%                12.18%
Ave. |Perc. Error|                  13.03%            9.09%                8.43%

Table 7: Power estimation comparison results for some MCNC FSM benchmarks (c).


4.9. Summary

This chapter addressed the problem of high-level power estimation for the HW-bound part of embedded systems. High-level power estimation is a key step in the design flow of an electronic system, since it allows power trends to be determined during the earliest design stages. The goal is to explore the architectural design space from the power perspective. In this chapter, an estimation model has been proposed for both the data-path and the control-path of the overall system. The input to the proposed model is the high-level system specification described in VHDL at the behavioral and RT levels of abstraction. The high-level power model is composed of a set of sub-models, each describing the power behavior of one of the heterogeneous parts that make up the HW side of the target system architecture. The most relevant feature of the proposed approach is its generality, since it considers a general SOC architecture as well as the individual components that typically constitute the HW side of an embedded system. The model has been implemented in the vhdl2pow program and applied to evaluate the power of several industrial case studies and benchmark circuits. The results obtained by vhdl2pow have then been compared to those obtained by a commercial power estimation tool, Synopsys Design Power. At this level of abstraction, the accuracy of the results is satisfactory for the proposed model to serve as a relative power indicator.
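A relative power indicator is, roughly, one meant to rank design alternatives in the same order as a reference tool, even when its absolute values differ. One way to quantify this, offered here as an illustration rather than as an analysis from this chapter, is Kendall's rank correlation (tau-a) between the vhdl2pow and Design Power totals. The sketch uses the Table 4 total-power figures, keeping one row per circuit so that the reference values have no ties; the helpers `kendall_tau` and `discordant_pairs` are hypothetical names.

```python
from itertools import combinations

# Total power [mW] from Table 4 as (Design Power, vhdl2pow) pairs.
# Only one row per circuit (RT or controller level) is kept, so the
# Design Power references are all distinct.
TOTALS_MW = {
    "mem_int RT": (6.3478, 5.817),
    "gcd RT": (1.8173, 1.658),
    "diffeq-t RT": (43.4, 39.181),
    "diffeq-a RT": (30.26, 27.368),
    "pace CNTR": (1.745, 1.9),
    "cerbero CNTR": (0.231, 0.208),
}


def _sign(x: float) -> int:
    return (x > 0) - (x < 0)


def discordant_pairs(totals):
    """Circuit pairs that vhdl2pow ranks in the opposite order to Design Power."""
    out = []
    for (name_a, (ref_a, est_a)), (name_b, (ref_b, est_b)) in combinations(totals.items(), 2):
        if _sign(ref_a - ref_b) * _sign(est_a - est_b) < 0:
            out.append((name_a, name_b))
    return out


def kendall_tau(totals) -> float:
    """Kendall's tau-a: (concordant - discordant) / number of pairs; ties count as neither."""
    concordant = discordant = 0
    for (ref_a, est_a), (ref_b, est_b) in combinations(totals.values(), 2):
        s = _sign(ref_a - ref_b) * _sign(est_a - est_b)
        concordant += s > 0
        discordant += s < 0
    n = len(totals)
    return (concordant - discordant) / (n * (n - 1) / 2)


if __name__ == "__main__":
    print(f"tau = {kendall_tau(TOTALS_MW):.3f}")
    print("discordant:", discordant_pairs(TOTALS_MW))
```

On these six circuits the only pair ranked in the opposite order is gcd (RT) versus pace, whose Design Power totals differ by about 4%, giving tau = 13/15 ≈ 0.87.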
