
Incremental Power Grid Verification

Abhishek

A thesis submitted in conformity with the requirements for the degree of Master of Applied Sciences

Graduate Department of Electrical & Computer Engineering

University of Toronto

Abstract

Abhishek

Master of Applied Sciences

Graduate Department of Electrical & Computer Engineering

University of Toronto

Verification of the on-die power grid is a key step in the design of complex high-

performance integrated circuits. For the very large grids in modern designs, incremental

verification is highly desirable, because it allows one to skip the verification of a certain

section of the grid (internal nodes) and instead, verify only the rest of the grid (external

nodes). The focus of this work is to develop efficient techniques for incremental veri-

fication in the context of vectorless constraints-based grid verification, under dynamic

conditions. The traditional difficulty is that the dynamic case requires iterative analysis

of both the internal and the external sections. A solution in the transient case is provided

through two key contributions: 1) a bound on the internal nodes’ voltages is developed

that eliminates the need for iterative analysis, and 2) a multi-port Norton approach is

used to construct a reduced macromodel for the internal section.

Acknowledgements

I would like to gratefully acknowledge the enthusiastic supervision of my supervisor Prof.

Farid N. Najm for his continuous guidance and inspiration. Without his efforts, insightful

suggestions and constant support, the development of this research work would not have

been possible. The weekly meetings with him always gave me much needed analysis of

the progress made in my research. I particularly remember a meeting in which he took

time in the end to explain to me the reasons as to why my progress was getting hampered.

I consider myself lucky to have got this opportunity to work under his supervision. Many

thanks professor for my overall development as a professional, a researcher and a person.

I would also like to thank Professors Andreas Veneris, Costas Sarris, and Jason An-

derson, from the ECE Department of the University of Toronto for reviewing this work.

I would also like to acknowledge the financial support provided by the University of

Toronto, Natural Sciences and Engineering Research Council (NSERC) of Canada, and

Advanced Micro Devices (AMD) Inc.

I am also thankful to Nahi H. Abdul Ghani and Ankit Goyal for their guidance

and support during the formative year of my degree program. They were always there to

answer my questions on almost any topic related to my research. I would also like to take

this opportunity to thank my colleagues for providing a great and pleasant environment.

I am especially grateful to Sari Onaissi, Niyati Shah, Sandeep Chatterjee, Mohammad

Fawaz, Jason Luu, Jongsok Choi, Andrew Canis, Li Liu, Braiden Brousseau and Bao Le.

I wish them best in their endeavors.

I am also lucky to have the support of my best friend Sampada Bagai. Knowing that

she will always be by my side no matter what happens, gave me the patience and the

strength to successfully complete my Masters.

My biggest gratitude goes to my parents, Dr. Avadhesh Kumar and Mrs. Chhaya

Saxena for always encouraging me and wanting only the best for me. I can never forget

those pep talks with mom and dad that always helped me whenever I was struggling

in my research. They never doubted my ability and invested in my future, at times

compromising their present. I would also like to thank my younger brother Avijit for

always considering me as a friend and giving me much needed breaks, through his writings

and musings.

Lastly, I offer my regards to those who supported me in any respect during the

completion of this work.

Contents

1 Introduction

1.1 Motivation

1.2 Objective

1.3 Thesis Organization

2 Background

2.1 Introduction

2.2 Power Grid Model

2.2.1 RC Model

2.2.2 RLC Model

2.3 Power Grid Verification

2.3.1 Current Constraints

2.3.2 RC Grid Verification

2.3.3 RLC Grid Verification

2.4 Norton’s Theorem

2.5 Model Order Reduction

2.5.1 Concept of Moments

2.5.2 Explicit Moment Matching

2.5.3 Implicit Moment Matching

2.5.4 Truncated Balance Realization

2.5.5 Local Node Elimination

2.5.6 Divide and Conquer

3 Incremental Power Grid Verification

3.1 Introduction

3.2 Efficient Bounds Computation

3.3 Power Grid Reduction

3.3.1 Moving Internal Current Sources

3.3.2 Sub-grid Reduction

3.4 Verification after Reduction

3.5 Implementation

3.6 Experimental Results

4 Dimension Reduction of the Feasible Space of Currents

4.1 Introduction

4.2 Incremental Verification

4.2.1 Defining F

4.2.2 Experimental Results

4.3 Chip Power Model

4.3.1 Adapting to RLC

4.3.2 Experimental Results

5 Conclusions & Future Work

Bibliography

List of Tables

3.1 Legend

3.2 Speed and accuracy after using efficient bounds computation

3.3 Speed and accuracy after applying macromodeling

3.4 Runtime breakdown

4.1 Speed and accuracy after reducing the dimensions of the feasible space of currents compared to incremental verification approach presented in chapter 3

4.2 Runtime breakdown

4.3 Speed and accuracy after constructing the CPM using Approach I

4.4 Comparison between the three different approaches to macromodeling

List of Figures

2.1 An RC model of Power Grid

2.2 Power Grid Model showing Internal, External and Port nodes

2.3 RLC model of Power Grid [1]

2.4 Combined Package and On-Chip Power Grid

2.5 Sub-grid and Rest of the Grid

2.6 Simplified Illustration

2.7 Configuration after adding voltage sources

2.8 Rest of the Grid disconnected

2.9 External Sources set to zero

2.10 Internal Sources moved to ports

2.11 Final Configuration

2.12 T-Model

2.13 Floating Capacitance Model

2.14 Parallel RC model

3.1 Relative Error Plot (efficient bounds)

3.2 Relative Error Plot (macromodeling)

3.3 Runtime vs Size of Sub-grid

3.4 Speed-up vs Size of Sub-grid

3.5 Runtime and Accuracy vs κ for G1

3.6 % Reduction and Runtime vs τN for G1

4.1 A bar graph showing the contributions of major procedures in runtime for both original (chapter 3) and the modified approaches

4.2 Relative Error Plot for H6 (modified approach)

4.3 Relative Error Plot for A4 (vub(∞))

4.4 Relative Error Plot for A4 (vlb(∞))

Chapter 1

Introduction

1.1 Motivation

A power grid is an electrical network that is composed of multiple layers of metal and that

provides supply voltage connections from the power supply pins on the package to the

devices on the chip. There are typically two types of voltage variations: IR drop arising

due to resistive elements in the grid, and Ldi/dt noise which is a result of the inductive

elements in the grid and is proportional to the rate of change of the current [2, 3]. Such

variations in the supply voltage can lead to soft errors, increased circuit delays, and loss of

yield. Therefore, voltage integrity analysis is important for chip design. With technology

scaling, a trend of decreasing supply voltages, narrower wires, increasing power densities,

and tighter noise margins has been observed [4]. A consequence of these trends is that

the reliability of integrated circuits has become increasingly susceptible to the voltage

level fluctuations.

Most grid verification techniques use some form of circuit simulation to simulate

the grid. Simulation-based approaches require complete knowledge of current waveforms

drawn by the underlying logic circuitry, which are used to simulate the grid and deter-

mine the grid node voltage drops. However, verifying the grid in this way is prohibitively


expensive because the number of current traces required to cover all the possible cir-

cuit behaviors is extremely large. Another disadvantage is that a simulation-based flow

does not allow for early grid verification (when changes to the grid can most easily be

incorporated) because no current traces may be available at that time.

To overcome these issues, a vectorless verification approach based on partial current

specification in the form of current constraints was proposed in [5], and further developed

in subsequent work over the last decade. Grid verification is reduced to a problem of

finding the worst-case voltage drop over all possible currents that satisfy certain current

constraints. In [6], the authors used an RC model of the power grid and gave an upper

bound on the worst-case voltage drop using an iterative approach. A closed form ex-

pression for the upper bound was later proposed in [7], which involved solving a linear

program (LP) for every grid node. An efficient way to reduce the size of the LPs based

on the sparse approximate inverse technique was proposed in [8].

This previous work is useful for verifying the entire power grid but becomes overkill

when verification of only a part of the grid is required, a scenario that is referred to as

incremental verification. Incremental verification has become desirable because modern

grids can be so large that full verification becomes expensive, and a divide-and-conquer

approach becomes a necessity. Alternatively, incremental verification is desirable when

design changes are made to a local region of a previously-verified grid, and the local

impact of these changes needs to be verified. There are also other cases, such as

IP reuse, where a portion of the grid may not need to be verified.

With an increasing number of transistors and high operating frequencies, the Ldi/dt

noise is becoming increasingly significant [2, 9, 10]. Thus, the effect arising from the

inductance of metal lines and the inductance of interconnections between the grid and the

package should also be included while performing voltage integrity analysis. Traditionally,

IC designers focus on the voltage drop for the on-chip power delivery network (PDN)

while the package designers are generally concerned with reducing the off-chip network

impedance. The problem with such a design practice is that the package design does

not take into account the resonance which can be caused by the resulting RLC network

(consisting of the package inductance and the RC on-chip interconnects). Therefore, on-

chip interconnects must be considered while designing the chip-package power delivery

network [11]. However, this is computationally prohibitive in real designs because of

the size of the on-chip PDN. A Chip Power Model [11] is a reduced-order model that

captures the electrical behavior of the on-chip power distribution network coupled with

the parasitics of the chip-package network. The runtime is thus drastically reduced

because the entire on-chip network is now converted into a reduced-order model.

1.2 Objective

The goal of this research is to develop efficient techniques to allow incremental veri-

fication in the vectorless constraints-based power grid verification context. In [12], a

technique is given for incremental verification but only for the case of a resistive grid,

under the influence of DC currents. In this work, we propose techniques to efficiently

perform incremental power grid verification in the transient case. The difficulty with

grid verification in the transient case is that it requires iterative analysis of the complete

power grid (including the sections that need not be verified). The proposed incremental

verification technique should be such that the verification is only performed for the part

of the grid that needs to be verified, even under dynamic conditions.

Power grid macromodeling is an important step in incremental grid verification. The

main idea is to abstract the behavior of the parts of the grid that need not be verified onto the interface that

connects these parts to the portion of the grid that the user is interested in verifying. Two

main steps are involved in macromodeling: 1) moving the current sources to the interface,

and 2) reducing the resulting passive RC parts of the grid that need not be verified. In our

case, the current source values are not known; instead, we have constraints on the current

sources. We need to develop a technique to perform macromodeling in a constraints-based

verification model, such that significant runtime savings can be achieved with negligible

loss of accuracy.

In [11], the authors use the Norton equivalent theorem to abstract the behavior of the on-chip

PDN to the ports connecting the chip to the package. As an extension to incremental

verification, we develop an approach to perform vectorless verification of the off-chip

interconnects by constructing a Chip Power Model for the constraints-based power grid

verification framework.

1.3 Thesis Organization

The thesis is organized as follows: Chapter 2 gives the background information on the

power grid model and the vectorless approach to power grid verification. An overview of

some of the popular Model Order Reduction (MOR) techniques is also provided in this

chapter. In chapter 3, the proposed approach for incremental verification is presented.

Chapter 4 provides optimizations to the proposed approach and also extends the con-

cept to the verification of package nodes by constructing a Chip Power Model. Finally,

chapter 5 concludes and provides directions for further research.

Chapter 2

Background

2.1 Introduction

In this chapter, we review the background material for this work. In section 2.2, we

focus on the power grid model. In the next section, we provide a review of the vectorless

constraints-based verification approach, which is then followed by a review of the proof of

the multi-port Norton’s theorem [13]. In the last section, we review some popular Model

Order Reduction techniques.

2.2 Power Grid Model

2.2.1 RC Model

Consider an RC model of the grid as shown in Fig. 2.1. In such a model, each branch is

represented by a resistor and there exists a capacitor from every node to ground. Some

nodes have ideal current sources (to ground) to represent the currents drawn by the

underlying circuitry, and some have ideal voltage sources to represent the connections

to the external power supply. Let the power grid consist of n + p nodes, where nodes

1, 2, . . . , n have no voltage sources attached, and the remaining nodes are nodes where

the p voltage sources are attached. Let is,k(t) be the value of the current source connected


Figure 2.1: An RC model of Power Grid

to node k. We assume that ∀k = 1, . . . , n, is,k(t) is well-defined, so that nodes with no

current source attached have is,k(t) = 0. Let is(t) be the vector of all current sources

is,k(t) and u(t) be the vector of nodal voltages. Applying Modified Nodal Analysis (MNA)

to the grid leads to:

\[ G u(t) + C \dot{u}(t) = -i_s(t) + G_0 V_{dd} \qquad (2.1) \]

where G is an n × n conductance matrix as in the traditional MNA formulation; G0 is

another n×n matrix consisting of conductance elements connected to the Vdd sources [5];

C is an n × n diagonal matrix of node capacitances; and Vdd is a constant vector each

entry of which is equal to the supply voltage value. The matrix G is known to be a

diagonally-dominant, symmetric, positive-definite M-matrix, so that G−1 ≥ 0.

Let v(t) = Vdd − u(t) be the vector of voltage drops. The RC model for the power grid

can then be written as [6]:

\[ G v(t) + C \dot{v}(t) = i_s(t) \qquad (2.2) \]

Note that this equation can be obtained directly by writing the MNA system for a

modified network in which all voltage sources are shorted (set to 0) and all current

sources are reversed. The work in chapter 3 is based on the above RC model. In the

incremental verification framework presented in chapter 3, the user identifies a part of

Figure 2.2: Power Grid Model showing Internal, External and Port nodes

the grid that does not need to be verified. This part of the grid is referred to as the sub-grid,

as shown in Fig. 2.2. Verification is required only for grid nodes that are outside the

sub-grid, referred to as external nodes, while the nodes inside the sub-grid are either

internal nodes or port nodes. Nodes inside the sub-grid that are connected to the

external nodes are called port nodes, while all remaining sub-grid nodes are referred to

as internal nodes. Let next, nprt, and nint be the number of external nodes, port nodes,

and internal nodes respectively, such that next + nprt + nint = n. Because external nodes

connect only to the port nodes, the grid equation can be written as:

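\[
\begin{bmatrix} G_{11} & G_{12} & 0 \\ G_{12}^T & G_{22} & G_{23} \\ 0 & G_{23}^T & G_{33} \end{bmatrix}
\begin{bmatrix} v_{ext}(t) \\ v_{prt}(t) \\ v_{int}(t) \end{bmatrix}
+
\begin{bmatrix} C_{ext} & 0 & 0 \\ 0 & C_{prt} & 0 \\ 0 & 0 & C_{int} \end{bmatrix}
\begin{bmatrix} \dot{v}_{ext}(t) \\ \dot{v}_{prt}(t) \\ \dot{v}_{int}(t) \end{bmatrix}
=
\begin{bmatrix} i_{s,ext}(t) \\ i_{s,prt}(t) \\ i_{s,int}(t) \end{bmatrix}
\qquad (2.3)
\]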

where vext and is,ext are sub-vectors corresponding to voltage drops and current sources

at external nodes, vprt and is,prt correspond to voltage drops and current sources at port

nodes, and vint and is,int correspond to voltage drops and current sources at internal


Figure 2.3: RLC model of Power Grid [1]

nodes. The matrices G and C are partitioned into sub-matrices of appropriate dimensions. Using a finite difference approximation as in [6], the system (2.2) can be written as:

\[ A v(t) = \frac{C}{\Delta t}\, v(t - \Delta t) + i_s(t) \qquad (2.4) \]

where A = (G + C/∆t) is also a symmetric, positive-definite M-matrix, so that A−1 ≥ 0.
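The time stepping implied by (2.4) can be sketched in Python as follows. This is only an illustrative sketch on a hypothetical 2-node grid: the helper names (`solve`, `backward_euler_step`), the element values, and the small dense Gaussian-elimination solver are assumptions made for this example and are not part of the original work; a practical implementation would presumably rely on a sparse solver.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def backward_euler_step(G, c_diag, dt, v_prev, i_s):
    """One step of A v(t) = (C/dt) v(t - dt) + i_s(t), with A = G + C/dt.

    c_diag holds the diagonal of C (one capacitance per node).
    """
    n = len(G)
    A = [[G[r][c] + (c_diag[r] / dt if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    rhs = [c_diag[r] / dt * v_prev[r] + i_s[r] for r in range(n)]
    return solve(A, rhs)


# Hypothetical 2-node grid (illustrative values only).
G = [[3.0, -1.0], [-1.0, 2.0]]
c_diag = [1.0, 1.0]
v = [0.0, 0.0]                      # grid at rest: zero voltage drop
for _ in range(500):                # constant current drawn at the first node
    v = backward_euler_step(G, c_diag, 0.1, v, [1.0, 0.0])
print(v)                            # approaches the DC solution of G v = i_s
```

For a constant current the iterates settle at the DC voltage drops, and because A−1 ≥ 0 a non-negative current vector never produces a negative voltage drop in this recursion.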

2.2.2 RLC Model

The RLC model of the power grid is shown in Fig. 2.3. There are two different types

of branches in this model [1]: a branch that is represented by a resistor is referred to as

an r-branch, while a branch that is represented by a resistor in series with an inductor

is referred to as an rl-branch. The inductance represents the inductive components of

either the grid interconnects or the pad structure that connects the grid to external

voltage sources (chip-package interconnections). As in [1, 8], we define two types of

nodes: nodes which are internal to the rl-branch are referred to as branch nodes while

all other nodes are referred to as actual nodes. Some of the actual nodes connect to

the supply through a branch node while some have ideal current sources (to ground)

representing the currents drawn by the underlying logic circuitry. There are capacitors

from every actual node to ground.

Let the power grid consist of nb + nl + p nodes where nodes 1, 2, . . . , nb are branch

nodes, nodes (nb + 1), (nb + 2), . . . , (nb + nl) are actual nodes with no voltage sources

attached and the remaining nodes are the actual nodes where p voltage sources are

attached. Let is,k(t) be the current source connected to node k. We assume that is,k(t) is

defined for all nodes k = 1, 2, . . . , n, where n = nb +nl, so that all nodes with no current

sources attached have is,k(t) = 0. Let is(t) be the vector of all is,k(t) sources and u(t)

be the vector of voltage signals uk(t) at every node. Also, let i(t) be the vector of the

inductive branch currents il(t). The time-domain equation for the grid is given by:

\[ G u(t) + C \dot{u}(t) + M i(t) = -i_s(t) + G_0 V_{dd} \qquad (2.5) \]

where G is an n×n conductance matrix that is known to be a diagonally-dominant, sym-

metric, positive definite, M-matrix; G0 is an n× n matrix of the conductance elements

connected to the Vdd sources [5]; C is an n×n capacitance matrix with the entries corre-

sponding to the branch nodes being equal to zero; and M is an n× nb incidence matrix

whose elements are either ±1 or 0, as in [1]. The term ±1 appears in the location mij of

the matrix M when the node i is connected to the jth inductor, else a 0 occurs. If the

current-direction assignment is away from the node the sign is positive, else it is negative.

Let v(t) = Vdd − u(t), then the RLC model of the power grid can be written as [1]:

\[ G v(t) + C \dot{v}(t) - M i(t) = i_s(t) \qquad (2.6) \]

The inductive branch currents can also be expressed in terms of voltage drops v(t) as:

\[ M^T v(t) + L\, \frac{d i(t)}{dt} = 0 \qquad (2.7) \]

where L is an nb × nb diagonal matrix of inductance values. Equations (2.6) and (2.7)

represent the behavior of the RLC model of the power grid. In the construction of the

Chip Power Model discussed in chapter 4, we consider the model as shown in Fig. 2.4.
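The sign convention for the incidence matrix M described above can be sketched in Python as follows. The function name `incidence_matrix`, the 0-based node numbering, and the use of `None` for an inductor terminal that is not one of the n grid nodes (for example, a terminal at a supply node) are assumptions made for this illustration only.

```python
def incidence_matrix(n, inductors):
    """Build the n x nb matrix M for nb inductors.

    inductors[j] = (src, dst) means that the current of inductor j is
    assigned to flow from node src to node dst. A terminal that is not
    one of the n grid nodes is given as None.
    """
    M = [[0] * len(inductors) for _ in range(n)]
    for j, (src, dst) in enumerate(inductors):
        if src is not None:
            M[src][j] = 1    # current directed away from this node
        if dst is not None:
            M[dst][j] = -1   # current directed into this node
    return M


# Hypothetical example: inductor 0 runs from node 0 to node 1,
# inductor 1 runs from node 2 to a terminal outside the grid nodes.
for row in incidence_matrix(3, [(0, 1), (2, None)]):
    print(row)
```

Each column of M has at most two non-zero entries, one for each terminal of the corresponding inductor that is a grid node.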

Figure 2.4: Combined Package and On-Chip Power Grid (RLC package interconnections; RC on-chip interconnections)

We consider an RC model for the on-chip interconnects while the package to supply

interconnections are modeled using the RLC model described above. While designing the

combined package and on-chip power grids, the on-chip inductance can be safely ignored

because the effect of the on-chip inductance is small as compared to the resistance of the

lines [2, 14]. The current sources model the chip activity and therefore, are present only

on the on-chip interconnects. Let next = nb + npkg be the number of external nodes that

include nb branch nodes and npkg actual nodes, nprt and nint be the number of port nodes

and internal nodes (on-chip) respectively. The system can thus be partitioned as:

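\[
\begin{bmatrix} G_{11} & G_{12} & 0 \\ G_{12}^T & G_{22} & G_{23} \\ 0 & G_{23}^T & G_{33} \end{bmatrix}
\begin{bmatrix} v_{ext}(t) \\ v_{prt}(t) \\ v_{int}(t) \end{bmatrix}
+
\begin{bmatrix} C_{ext} & 0 & 0 \\ 0 & C_{prt} & 0 \\ 0 & 0 & C_{int} \end{bmatrix}
\begin{bmatrix} \dot{v}_{ext}(t) \\ \dot{v}_{prt}(t) \\ \dot{v}_{int}(t) \end{bmatrix}
-
\begin{bmatrix} M_{ext} \\ 0 \\ 0 \end{bmatrix} i(t)
=
\begin{bmatrix} 0 \\ 0 \\ i_{s,int}(t) \end{bmatrix}
\qquad (2.8)
\]

\[
\begin{bmatrix} M_{ext}^T & 0 & 0 \end{bmatrix}
\begin{bmatrix} v_{ext}(t) \\ v_{prt}(t) \\ v_{int}(t) \end{bmatrix}
+ L\, \frac{d i(t)}{dt} = 0
\qquad (2.9)
\]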

where vext is the sub-vector corresponding to voltage drops at the external nodes, vprt

corresponds to voltage drops at port nodes, and vint and is,int correspond to voltage drops

and current sources at internal nodes. The matrices G and C are partitioned into sub-

matrices of appropriate dimensions. Since only the inductances present in the package

interconnections are considered, the matrix M is composed of a matrix Mext of size

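next × nb and all other entries are equal to zero. As in [1], a discrete time version of the system can be given as:

\[ \left(G + \frac{C}{\Delta t}\right) v(t) - M i(t) = i_s(t) + \frac{C}{\Delta t}\, v(t - \Delta t) \qquad (2.10) \]

\[ M^T v(t) + \frac{L}{\Delta t}\, i(t) = \frac{L}{\Delta t}\, i(t - \Delta t) \qquad (2.11) \]

where (G + C/∆t) is also a symmetric, positive-definite M-matrix.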

2.3 Power Grid Verification

In this section, we review the vectorless verification approach based on partial current

specifications in the form of current constraints. The grid verification problem is reduced

to a problem of finding the worst-case voltage fluctuation over all possible currents that

satisfy the current constraints. The next sub-section gives an overview of the current

constraints.

2.3.1 Current Constraints

The verification framework used in this work allows for early verification of power grids,

when the details of the underlying circuitry may not be known. Current constraints [5]

provide a way to capture the uncertainty about circuit behavior, as well as the uncertainty

about the circuit itself early in the design flow. One can specify these constraints from

the knowledge of the design specs (area and power budget), and also from engineering

judgement (power needs in previous technology, the effect of scaling on those needs etc.).

As in previous work, we use two types of constraints: local constraints and global

constraints. Local constraints are upper bounds on the individual current sources, where

one specifies that the current is,k(t) never exceeds a certain fixed level iL,k. We assume

that every current source tied to the grid has an upper bound associated with it, so that

if a node does not have a current source attached, the upper bound for that current is 0.

We can express these constraints as:

0 ≤ is(t) ≤ iL, ∀t ≥ 0 (2.12)

If only local constraints are provided, the problem is much simplified but the results

become overly pessimistic, because it is never the case that all the chip components are

simultaneously drawing their maximum currents. Global constraints are upper bounds

on the sums of currents for groups of current sources. They represent the peak total

power dissipation of a group of circuit blocks. Assuming that we have a total of m global

constraints, then we can express them in matrix form as:

0 ≤ Sis(t) ≤ iG, ∀t ≥ 0 (2.13)

where S is an m × n matrix of 0s and 1s indicating which current

sources are present in each global constraint. Together, the local and the global con-

straints define a feasible space of currents, denoted by F , such that is(t) lies inside the

feasible space (is(t) ∈ F) if and only if it satisfies (2.12) and (2.13), ∀t ≥ 0. Combin-

ing (2.12) and (2.13), we can write:

0 ≤ Uis(t) ≤ u (2.14)

where U is an (m + n) × n matrix consisting of In (the n × n identity matrix) and S,

and u is the upper bound vector. Therefore, the feasible space F can be defined as:

F = {is(t) : 0 ≤ Uis(t) ≤ u, ∀t ≥ 0} (2.15)

Given a power grid, we are interested in finding the worst-case voltage variations at

all the nodes, under all the possible (transient) current waveforms is(t) that satisfy the

current constraints.
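As a small illustration of (2.12) through (2.15), the following Python sketch checks whether a given current vector lies in the feasible space F at one time instant. The function name `in_feasible_space` and all numerical values are hypothetical and chosen only for this example.

```python
def in_feasible_space(i_s, i_local, S, i_global):
    """Return True if 0 <= U i_s <= u, where U stacks the identity on top of S
    and u stacks the local bounds on top of the global bounds."""
    n = len(i_s)
    identity = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    U = identity + [list(row) for row in S]
    u = list(i_local) + list(i_global)
    Ui = [sum(row[c] * i_s[c] for c in range(n)) for row in U]
    return all(0.0 <= value <= bound for value, bound in zip(Ui, u))


# Three current sources, two global constraints (hypothetical values).
i_local = [1.0, 1.0, 2.0]            # per-source upper bounds
S = [[1, 1, 0],                      # sources 1 and 2 share a budget
     [0, 1, 1]]                      # sources 2 and 3 share a budget
i_global = [1.5, 2.5]

print(in_feasible_space([1.0, 0.5, 2.0], i_local, S, i_global))  # all bounds met
print(in_feasible_space([1.0, 1.0, 0.0], i_local, S, i_global))  # first group draws 2.0 > 1.5
```

The second vector satisfies every local constraint but violates a global one, which is the situation that makes local constraints alone overly pessimistic.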

2.3.2 RC Grid Verification

For an RC grid, the authors in [7] provide an upper bound vub on the worst-case voltage

drop vector, so that v(t) ≤ vub, ∀t, and this bound is given by:

\[ v_{ub} = \left(I + G^{-1} \frac{C}{\Delta t}\right) V_a \qquad (2.16) \]

where Va is the worst-case voltage drop vector at t = ∆t in the special case when

is(t) = 0, ∀t ≤ 0, and I is the identity matrix. Since v(0) = 0 in this special case, it

follows from (2.4) that:

\[ v(\Delta t) = A^{-1}\left[\frac{C}{\Delta t}\, v(0) + i_s(\Delta t)\right] = A^{-1} i_s(\Delta t) \qquad (2.17) \]

and Va can be expressed as:

\[ V_a = \underset{\forall i_s(\Delta t) \in \mathcal{F}}{\mathrm{emax}}\; A^{-1} i_s(\Delta t) \qquad (2.18) \]

where “emax” is an operator that denotes element-wise maximization of its vector ar-

gument, under the given constraints. In other words, Va is the result of the for-

loop: for (k = 1, . . . , n) {maximize the kth element of the vector A−1is(∆t), over all

is(∆t) ∈ F}. Maximizing each element becomes a linear program (LP). Note that, because

the definition of the local and global constraints does not depend on time, Va

is independent of t and we can drop the ∆t argument, and write:

\[ V_a = \underset{\forall i_s \in \mathcal{F}}{\mathrm{emax}}\; A^{-1} i_s \qquad (2.19) \]

where, for the purpose of this optimization, is can be viewed as simply a “dummy variable”,

an n × 1 real vector with units of current. Thus, the problem of finding the worst-case

voltage drop is reduced to performing element-wise maximization of A−1is, over all

is ∈ F , to find Va, followed by a standard linear system solve, to find vub.
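The two-stage procedure described above (element-wise maximization followed by one linear solve) can be sketched in Python as follows. Several assumptions are made for the sake of a short, dependency-free example: the bound is taken to have the form vub = (I + G−1C/∆t)Va, so that G vub = A Va; the feasible space is restricted to local bounds plus a single global bound on the sum of all currents, in which case each LP reduces to a greedy allocation; and the names `solve`, `max_linear`, and `rc_upper_bound` as well as the numerical values are hypothetical. A general constraint matrix S would require a proper LP solver instead of `max_linear`.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def max_linear(c, i_local, i_global):
    """Maximize c . x subject to 0 <= x <= i_local and sum(x) <= i_global.

    Special case only: with one sum constraint, giving the budget to the
    largest coefficients first is optimal.
    """
    budget, best = i_global, 0.0
    for k in sorted(range(len(c)), key=lambda k: c[k], reverse=True):
        if c[k] <= 0.0 or budget <= 0.0:
            break
        x = min(i_local[k], budget)
        best += c[k] * x
        budget -= x
    return best


def rc_upper_bound(G, c_diag, dt, i_local, i_global):
    n = len(G)
    A = [[G[r][c] + (c_diag[r] / dt if r == c else 0.0) for c in range(n)]
         for r in range(n)]
    Va = []
    for k in range(n):
        e_k = [1.0 if r == k else 0.0 for r in range(n)]
        row_k = solve(A, e_k)        # A is symmetric: column k of A^{-1} equals row k
        Va.append(max_linear(row_k, i_local, i_global))
    rhs = [sum(A[r][c] * Va[c] for c in range(n)) for r in range(n)]
    v_ub = solve(G, rhs)             # G v_ub = A Va
    return Va, v_ub


Va, v_ub = rc_upper_bound([[3.0, -1.0], [-1.0, 2.0]], [1.0, 1.0], 0.1,
                          [1.0, 1.0], 1.5)
print(Va, v_ub)
```

In this sketch every entry of `v_ub` is at least as large as the corresponding entry of `Va`, since G−1C/∆t is non-negative.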

2.3.3 RLC Grid Verification

In the RLC verification framework, we are interested in the maximum and minimum

worst-case voltage drops, over all possible currents in F . An efficient method to compute

the upper bound on the maximum worst-case voltage drop and lower bound on the

minimum worst-case voltage drop was proposed in [1]. Let:

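\[
A = \begin{bmatrix} G + \frac{C}{\Delta t} & -M \\ M^T & \frac{L}{\Delta t} \end{bmatrix}, \quad
B = \begin{bmatrix} \frac{C}{\Delta t} & 0 \\ 0 & \frac{L}{\Delta t} \end{bmatrix}, \quad
x(t) = \begin{bmatrix} v(t) \\ i(t) \end{bmatrix}, \quad
b(t) = \begin{bmatrix} i_s(t) \\ 0 \end{bmatrix}
\qquad (2.20)
\]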

Combining (2.10) and (2.11) using (2.20), we get a simplified discrete-time expression for

the RLC model:

x(t) = A−1Bx(t−∆t) +A−1b(t) (2.21)

Consider the special case where the grid had no stimulus for all t ≤ 0, so that x(0) = 0.

At time t = ∆t, we get:

x(∆t) = A−1b(∆t) (2.22)

Similarly, at 2∆t:

x(2∆t) = A−1BA−1b(∆t) +A−1b(2∆t) (2.23)

Thus, at any future time p∆t, we have:

\[ x(p\Delta t) = \sum_{k=0}^{p-1} (A^{-1}B)^k A^{-1}\, b((p-k)\Delta t) \qquad (2.24) \]
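The equivalence between the step-by-step recursion (2.21) and the unrolled sum (2.24) can be checked numerically with a short Python sketch. Here T stands for A−1B and P for A−1; the function names and the matrix entries are hypothetical and serve only to exercise the two formulas.

```python
def matvec(m, x):
    """Dense matrix-vector product."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]


def step(T, P, x_prev, b):
    """One step of x(t) = T x(t - dt) + P b(t), with T = A^{-1} B and P = A^{-1}."""
    return [a + c for a, c in zip(matvec(T, x_prev), matvec(P, b))]


def unrolled(T, P, b_seq):
    """Sum over k = 0..p-1 of T^k P b((p - k) dt), where b_seq[j - 1] = b(j dt)."""
    p = len(b_seq)
    total = [0.0] * len(T)
    for k in range(p):
        term = matvec(P, b_seq[p - k - 1])
        for _ in range(k):
            term = matvec(T, term)
        total = [a + c for a, c in zip(total, term)]
    return total


# Hypothetical 2-state system and three time steps of stimulus.
T = [[0.5, 0.1], [-0.2, 0.3]]
P = [[1.0, 0.0], [0.0, 2.0]]
b_seq = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

x = [0.0, 0.0]                      # no stimulus for t <= 0
for b in b_seq:
    x = step(T, P, x, b)
print(x, unrolled(T, P, b_seq))     # the two results agree up to rounding
```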

The general solution to the exact voltage drop maximization and minimization problem

is provided in [1] as:

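\[ x_{opt}(\tau) = \lim_{p \to \infty}\; \underset{b(t) \in \mathcal{F}}{\mathrm{eopt}} \left[ \sum_{k=0}^{p-1} (A^{-1}B)^k A^{-1}\, b((p-k)\Delta t) \right] \qquad (2.25) \]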

where the “eopt” operator denotes the element-wise maximization and minimization of

its vector argument, under the given constraints. Since the constraints are DC and do

not depend on time, decoupling the components of (2.25) leads to:

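\[ x_{opt}(\tau) = \lim_{p \to \infty}\; \sum_{k=0}^{p-1}\; \underset{b(t) \in \mathcal{F}}{\mathrm{eopt}} \left[ (A^{-1}B)^k A^{-1}\, b((p-k)\Delta t) \right] \qquad (2.26) \]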

Equation (2.26) is of theoretical interest only as it involves a large number of time steps.

In [1], the authors proposed bounds on the maximum and minimum worst-case voltage

drops by first transforming the RLC grid into a reduced circuit by eliminating the in-

ductive branch currents, without any approximation. The reduced circuit equation can

be written as:

\[ \left(G + \frac{C}{\Delta t} + \Delta t\, M L^{-1} M^T\right) v(t) = \left(\frac{C}{\Delta t} + M \bar{G}\right) v(t - \Delta t) + i_s(t) \qquad (2.27) \]

where Ḡ is an nb × n matrix whose kth row is that of either G or −G depending on the current assignment through the inductive rl-branch. The same equation can be written in the compact form as:

\[ v(t) = D^{-1} E\, v(t - \Delta t) + D^{-1} i_s(t) \qquad (2.28) \]

where D = (G + C/∆t + ∆t M L−1 MT) is a sparse, symmetric, positive-definite, banded M-matrix, and E = (C/∆t + M Ḡ). To compute the bounds at infinity, upper and lower bounds on voltage drops for r time steps ahead in time are computed and are given by:

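\begin{align}
w_r &= \sum_{q=0}^{r-1} \operatorname*{eopt}_{\forall i_s\in\mathcal{F}} \left[ (D^{-1}E)^q D^{-1} i_s \right] \tag{2.29} \\
&= \sum_{q=0}^{r-2} \operatorname*{eopt}_{\forall i_s\in\mathcal{F}} \left[ (D^{-1}E)^q D^{-1} i_s \right] + \operatorname*{eopt}_{\forall i_s\in\mathcal{F}} \left[ (D^{-1}E)^{r-1} D^{-1} i_s \right] \tag{2.30} \\
&= w_{r-1} + \operatorname*{eopt}_{\forall i_s\in\mathcal{F}} \left[ (D^{-1}E)^{r-1} D^{-1} i_s \right] \tag{2.31}
\end{align}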
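The term-by-term accumulation in (2.29)-(2.31) can be sketched as follows. This is only an illustration under a simplifying assumption: the feasible set is taken to be a box, 0 ≤ i_s ≤ i_L (local constraints only), for which the element-wise optimum of a linear form has a closed expression. With global constraints an LP solver would be needed instead. The matrices `D` and `E` are arbitrary small examples, not grid data.

```python
import itertools
import numpy as np

def eopt_box(M, i_L):
    """Element-wise (max, min) of M @ i over the box 0 <= i <= i_L."""
    return np.maximum(M, 0.0) @ i_L, np.minimum(M, 0.0) @ i_L

def w_bounds(D, E, i_L, r):
    """Accumulate w_r one term at a time, as in the recursion (2.31)."""
    n = D.shape[0]
    D_inv = np.linalg.inv(D)
    power = np.eye(n)                      # (D^{-1} E)^q, starting at q = 0
    w_ub, w_lb = np.zeros(n), np.zeros(n)
    for _ in range(r):
        ub, lb = eopt_box(power @ D_inv, i_L)
        w_ub, w_lb = w_ub + ub, w_lb + lb
        power = power @ (D_inv @ E)
    return w_ub, w_lb

# Illustrative values only (assumptions).
D = np.array([[4.0, -1.0], [-1.0, 3.0]])
E = np.array([[1.0, 0.5], [-0.5, 1.0]])
i_L = np.array([0.2, 0.1])

w1 = w_bounds(D, E, i_L, 1)
w2 = w_bounds(D, E, i_L, 2)

# Brute-force check of the q = 0 term: a linear form over a box peaks at a corner.
corners = [np.array(c) * i_L for c in itertools.product([0.0, 1.0], repeat=2)]
brute_ub = np.max([np.linalg.solve(D, c) for c in corners], axis=0)
```

Each additional term can only widen the interval between the accumulated upper and lower values, which is why r is chosen as small as the norm condition allows.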

The choice of r is made in such a way that $\|N\|_\infty < 1$ and $\|N\|_1 < 1$, where $N = (D^{-1}E)^r$.

The upper and lower bounds at infinity are now given by:

\[ \begin{bmatrix} v_{ub}(\infty) \\ v_{lb}(\infty) \end{bmatrix} = (I - R)^{-1} w_r \tag{2.32} \]

where R is a 2n× 2n matrix defined as:

\[ R = \begin{bmatrix} N - S & S \\ S & N - S \end{bmatrix} \tag{2.33} \]

Figure 2.5: Sub-grid and Rest of the Grid

where $S = \tfrac{1}{2}(N - Q)$, with Q being the matrix of the element-wise absolute values of the entries in N.

2.4 Norton’s Theorem

Norton’s theorem is a fundamental theorem in circuit theory that converts any linear

two-terminal network into a simple parallel circuit consisting of an equivalent current

source, and an equivalent internal impedance. The equivalent current source value is the

current that would flow through a short circuit between the two terminals [15].

In this section, we review a proof for the multi-port Norton’s theorem [13] which

can be used to convert a linear n-port network into an equivalent circuit with all the

current sources internal to the network replaced by Norton current sources at the port

nodes. Consider a sub-grid section which is linear and may contain sources, connected

to the rest of the grid as shown in Fig. 2.5. For our purpose, the sources are independent

current or voltage sources that are always connected between a grid node and ground.

But, in general, any configuration of sources is allowed, including controlled sources. It

is assumed that no mutual inductance exists between the sub-grid and the rest of the

Figure 2.6: Simplified Illustration

grid. We can simplify the illustration as shown in Fig. 2.6, where the port nodes are

numbered from 1 to p. With reference to Fig. 2.6, let the voltage waveforms at the port

nodes be v1(t), v2(t), . . ., vp(t), and let the port current waveforms be i1(t), i2(t), . . .,

ip(t), as shown. If we connect a new set of independent voltage sources carrying the

same port voltage waveforms v1(t), v2(t), . . ., vp(t), as shown in Fig. 2.7, then clearly

these sources will carry no current and the circuit behavior everywhere is not altered.

Furthermore, if we now disconnect the rest of the grid, as in Fig. 2.8, the port currents

i1(t), i2(t), . . ., ip(t) and the sub-grid internal signals remain unaffected, because the

sub-grid is subjected to the same port voltage waveforms as in the original circuit.

Now, considering Fig. 2.8, we are interested in finding the port currents with the aid

of the superposition theorem. The port currents can be found as the sum of two versions

of these currents resulting from the following two modified circuits: 1) the circuit of

Fig. 2.8, but with the external (voltage) sources v1(t), v2(t), . . ., vp(t) all set to zero, as

shown in Fig. 2.9 and 2) the circuit of Fig. 2.8, with all the internal independent sources

of the sub-grid set to zero (voltage sources replaced by short circuits and current sources

replaced by open circuits). With reference to Fig. 2.9, we denote the (short-circuit) port

current waveforms as $i_1^{(s)}(t), i_2^{(s)}(t), \ldots, i_p^{(s)}(t)$ and note that these currents depend only

Figure 2.7: Configuration after adding voltage sources

on the sub-grid internals and are independent of the rest of the grid. Specifically, they

do not depend on the original port voltages v1(t), v2(t), . . ., vp(t).

It is clear, therefore, that the original port currents i1(t), i2(t), . . ., ip(t) can be found

from the circuit of Fig. 2.10, where the short-circuit currents $i_1^{(s)}(t), i_2^{(s)}(t), \ldots, i_p^{(s)}(t)$ have

been introduced as additional independent sources. In this circuit, the sub-grid passive

elements remain in place but its internal sources have all been removed – effectively, they

have been “moved” to the ports. The grid internal voltages and currents may have been

altered as a result of this construction, but its port voltages remain the same and its

original port currents are still available as i1(t), i2(t), . . ., ip(t) in Fig. 2.10.

Finally, we can “put back” the rest of the grid and remove the voltage sources v1(t),

v2(t), . . ., vp(t), reversing the earlier construction, leading to the circuit shown in Fig. 2.11.

As a result of this sequence of steps, all independent sources internal to the original sub-

grid have been moved to its ports, and these new port current sources depend only on

the sub-grid, so that the sub-grid port signals remain unaltered. In summary, the values

of current sources at port nodes that replace the internal sources can be found by the

following process:

1. The sub-grid is disconnected from the rest of the grid.

Figure 2.8: Rest of the Grid disconnected

Figure 2.9: External Sources set to zero

Figure 2.10: Internal Sources moved to ports

2. Each port node is connected to ground via a short circuit.

3. The currents flowing through these short circuit connections (due to the applied

internal current sources) are evaluated.
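The three steps above can be tried on a tiny purely resistive (DC) example. The sketch below is an illustration only: the five-node network, its conductance values, and the source currents are all assumptions chosen for readability. It moves the internal sources to the ports and confirms that the external and port voltage drops are unchanged, while the internal ones are not.

```python
import numpy as np

def stamp(G, a, b, g):
    """Add a conductance g between nodes a and b (b = None means ground)."""
    G[a, a] += g
    if b is not None:
        G[b, b] += g
        G[a, b] -= g
        G[b, a] -= g

# Node 0: external; nodes 1-2: ports; nodes 3-4: internal to the sub-grid.
G = np.zeros((5, 5))
stamp(G, 0, None, 2.0)
stamp(G, 0, 1, 1.0)
stamp(G, 0, 2, 1.0)
stamp(G, 1, 3, 1.0)
stamp(G, 2, 4, 1.0)
stamp(G, 3, 4, 0.5)
prt, internal = [1, 2], [3, 4]

i_s = np.array([0.1, 0.0, 0.0, 0.3, 0.2])      # illustrative source currents
v = np.linalg.solve(G, i_s)                     # original grid

# Steps 1-3: short the ports of the isolated sub-grid and read the port currents.
G23 = G[np.ix_(prt, internal)]
G33 = G[np.ix_(internal, internal)]
i_norton = -G23 @ np.linalg.solve(G33, i_s[internal])

# Modified grid: internal sources removed, Norton sources added at the ports.
i_mod = i_s.copy()
i_mod[internal] = 0.0
i_mod[prt] += i_norton
v_mod = np.linalg.solve(G, i_mod)
```

Because the internal nodes in this example have no direct path to ground, the Norton currents add up to the total internal current.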

2.5 Model Order Reduction

Compact modeling of the power grid is important because it allows for efficient analysis of

large scale designs. Model order reduction (MOR) was developed in the area of systems

and control theory, which studies the properties of dynamical systems with the goal of

reducing their complexity, while preserving their input-output behavior as much as pos-

sible [16]. MOR aims to capture the essential features of a structure, thereby simplifying

the model in order to perform analysis or simulation efficiently.

The fundamental methods in MOR were published in the 1980s and 1990s, and a large body of work has since built on them. We provide an overview of

some of the popular MOR techniques. The techniques for MOR can be broadly classified

into the following:

1. Explicit Moment Matching

Figure 2.11: Final Configuration

2. Implicit Moment Matching

3. Truncated Balanced Realization

4. Local Node Elimination

The first two categories are broadly based on the concept of moments of a linear

network. In truncated balanced realization methods, the “weak” uncontrollable and unobservable state variables are truncated to achieve reduction [17]. These methods produce nearly optimal models but are more computationally expensive than moment-based

methods. Local node elimination methods like TICER [18] allow reduction to be ap-

plied in a local manner by eliminating circuit nodes that have negligible effect on the

input-output behavior of the system.

2.5.1 Concept of Moments

The transfer function of a linear network H(s) is given by:

\[ H(s) = \frac{Y(s)}{X(s)} \tag{2.34} \]

where Y(s) and X(s) are the output and input functions, respectively. If the input is

an impulse function δ(t), then H(s) is also the Laplace transform of the impulse response of the system, which can be expanded around s = 0 as a Taylor series:

\[ H(s) = \sum_{k=0}^{\infty} M_k s^k \tag{2.35} \]

where

\[ M_k = \frac{1}{k!} \left. \frac{d^k H(s)}{ds^k} \right|_{s=0} \tag{2.36} \]

is called the kth order moment. We can represent the MNA formulation in state space as:

\[ \begin{aligned} Gx(t) + C\dot{x}(t) &= Bu(t) \\ y(t) &= L^T x(t) \end{aligned} \tag{2.37} \]

where G and C are the n × n conductive and storage element matrices, B is an n × p input position matrix, L is an n × q output position matrix, x represents the circuit variables, y is the output vector, and u is the excitation vector. Applying the Laplace transform with zero initial conditions and a unit-impulse excitation, and then expanding x(s) around s = 0, gives:

\[ (G + sC)(x_0 + x_1 s + x_2 s^2 + \cdots) = B \tag{2.38} \]

From (2.38), the moments for state variable x are given by a recursive formula as:

\[ x_0 = G^{-1}B; \quad x_1 = -G^{-1}C x_0 \;\Rightarrow\; x_i = -G^{-1}C x_{i-1} \tag{2.39} \]

The output moment $M_i = L^T x_i$ is therefore given by:

\[ M_i = L^T (-G^{-1}C)^i G^{-1}B \tag{2.40} \]
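The recursion (2.39) translates directly into code. The sketch below computes the first few moments of a small two-node RC circuit and checks that the truncated series (2.35) reproduces the exact transfer function at a small value of s. The matrices are illustrative assumptions, not a real grid.

```python
import numpy as np

def moments(G, C, B, L, count):
    """Return [M_0, ..., M_{count-1}] using the recursion (2.39)."""
    x = np.linalg.solve(G, B)              # x_0 = G^{-1} B
    result = []
    for _ in range(count):
        result.append(L.T @ x)             # M_i = L^T x_i
        x = -np.linalg.solve(G, C @ x)     # x_{i+1} = -G^{-1} C x_i
    return result

# Illustrative two-node RC circuit (assumed values).
G = np.array([[2.0, -1.0], [-1.0, 1.0]])
C = np.diag([1.0, 1.0])
B = np.array([[1.0], [0.0]])               # excitation at node 0
L = np.array([[0.0], [1.0]])               # output observed at node 1

M = moments(G, C, B, L, 6)
s = 0.01
H_exact = L.T @ np.linalg.solve(G + s * C, B)
H_series = sum(M_k * s**k for k, M_k in enumerate(M))
```

Only one factorization of G is needed in practice; each further moment costs a single extra solve.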

2.5.2 Explicit Moment Matching

Explicit moment matching methods are based on the direct or explicit matching of the

moments in order to reduce the order of the system. Let us assume that the transfer

function of an original model is given by:

\[ H(s) = M_0 + M_1 s + M_2 s^2 + \cdots \tag{2.41} \]

We try to build a reduced model that matches the first q moments and whose transfer

function will be given by:

\[ H_q(s) = M_0 + M_1 s + M_2 s^2 + \cdots + M_{q-1} s^{q-1} \tag{2.42} \]

Asymptotic Waveform Evaluation

Asymptotic Waveform Evaluation (AWE) [19] is an efficient frequency-domain analysis approach that combines the moment computation (described in the previous sub-section) with Padé approximation to match the moments. The main idea of Padé approximation is to approximate the transfer function H(s) by a rational function Hq(s) of limited order q. For a single-input single-output (SISO) system, Hq(s) is given as:

\[ H_q(s) = \frac{a_0 + a_1 s + a_2 s^2 + \cdots + a_{q-1} s^{q-1}}{1 + b_1 s + b_2 s^2 + \cdots + b_q s^q} \tag{2.43} \]

Next, we match the first 2q moments of H(s) and Hq(s), so that the series expansion of Hq(s) around s = 0 satisfies:

\[ H_q(s) = M_0 + M_1 s + M_2 s^2 + \cdots + M_{2q-1} s^{2q-1} + O(s^{2q}) \tag{2.44} \]

Equations (2.43) and (2.44) can be used to solve for the coefficients of the rational function by equating the coefficients of like powers of s. The poles, zeros, and residues of the reduced-order model can then be found and used for time-domain or frequency-domain analysis. AWE tends to generate unstable poles (with positive real part) when higher-order moments are used, and the accuracy also degrades for higher-order poles. Techniques like Multinode Moment Matching [20]

can be used to improve the stability while estimating the higher order poles. In such

a method, the poles are estimated from different nodes or different stimuli rather than

using a single node and a single stimulus as in AWE.
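For the simplest case, q = 1, the matching conditions can be solved by hand, which makes a convenient sanity check. In the sketch below (an illustration, not the AWE implementation of [19]), the order-one model a0 / (1 + b1 s) is matched to M0 and M1 of a single RC node with conductance `g` and capacitance `c`, both assumed values. The recovered pole should be −g/c.

```python
def pade_order_one(M0, M1):
    """Match M0 and M1 with H_1(s) = a0 / (1 + b1 * s).

    Expanding H_1(s) = a0 * (1 - b1 * s + ...) gives M0 = a0 and M1 = -a0 * b1.
    """
    a0 = M0
    b1 = -M1 / M0
    pole = -1.0 / b1
    return a0, b1, pole

# Single RC node: H(s) = 1 / (g + s * c), so M0 = 1/g and M1 = -c / g**2.
g, c = 2.0, 0.5
M0 = 1.0 / g
M1 = -c / g**2
a0, b1, pole = pade_order_one(M0, M1)
```

Here the reduced model is exact because the original system has a single pole; for larger circuits a higher q is needed and the linear system for the b coefficients becomes increasingly ill-conditioned.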

RC MOR

In RC MOR [21], a large RC interconnect network is partitioned into small subnetworks,

that are then approximated with lower order equivalent RC circuits. The partitioning is

done using the S-parameter matrix. After performing the partitioning, the admittance

matrix looking into the ports for a partition is approximated using the first two order

moments as:

\[ Y(s) \approx M_0 + M_1 s \tag{2.45} \]

The admittance to ground of the ith port is given as the sum of the ith row (or column)

of Y(s) while the admittance connecting the ith port and the jth port is the negative of

the (i, j)th element (yij) in Y(s). The circuit models shown in Fig. 2.12 and 2.13 are used

to synthesize the circuit between the pairs of ports by matching the moments yij. The

port-to-ground elements are synthesized using a parallel RC model as shown in Fig. 2.14.
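The bookkeeping described above (row sums for port-to-ground admittances, negated off-diagonal entries for port-to-port admittances) can be sketched in a few lines. The 3 × 3 moment matrices below are made-up values, and the helper name `port_admittances` is hypothetical. Each admittance is returned as a pair (g, c) standing for g + c·s; the sign of the first-order part then decides which synthesis model applies.

```python
def port_admittances(M0, M1):
    """Split Y(s) ~ M0 + M1*s into port-to-ground and port-to-port parts."""
    n = len(M0)
    # Admittance from port i to ground: sum of row i of Y(s).
    to_ground = [(sum(M0[i]), sum(M1[i])) for i in range(n)]
    # Admittance between ports i and j: negative of the (i, j) entry of Y(s).
    between = {(i, j): (-M0[i][j], -M1[i][j])
               for i in range(n) for j in range(i + 1, n)}
    return to_ground, between

# Illustrative moment matrices (assumed values).
M0 = [[3.0, -1.0, -1.0],
      [-1.0, 2.5, -0.5],
      [-1.0, -0.5, 2.0]]
M1 = [[0.4, 0.1, 0.0],
      [0.1, 0.3, 0.05],
      [0.0, 0.05, 0.2]]

to_ground, between = port_admittances(M0, M1)
```

In this example the first-order off-diagonal entries are non-negative, so a direct capacitor between ports 0 and 1 would have a negative value.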

The T-model is used to synthesize all the off-diagonal elements of Y(s) with mij1 ≥ 0,

where mijk is the (i, j)th element of the kth order moment matrix. The elements of the

T-model are given by:

Rij1 =−√

mii1 +

Rij2 =−√

mii1 +

) (2.46)

mii1 +

If the circuit contains floating capacitors, mij1 can become negative and in that case, we

use the floating capacitance model whose elements are given by:

\[ R_{ij} = \frac{-1}{m_{ij0}}, \qquad C_{ij} = -m_{ij1} \tag{2.47} \]

Figure 2.12: T-Model

Figure 2.13: Floating Capacitance Model

As for the port-to-ground connections, the authors [21] provide:

mii0 +

nprt∑

j=1i 6=j

Cii = mii1 −

nprt∑

j=1i 6=j

CijR2ij2

(Rij1 +Rij2)2(2.48)

RC MOR is used as a first step to reduce the sub-grid in the incremental verification

approaches presented in chapter 3 and chapter 4. The modeling time required for this

method is linear with the number of ports of the original interconnect network. RC

MOR preserves the block and sparse structure in the resultant reduced network. It has

improved stability over AWE because only lower order moments are required. Another

important advantage of this method is that it is completely realizable, as the reduced

model is also an RC circuit. AWE-based methods can provide better accuracy by using a higher approximation order, but this increases the modeling and simulation

time. Partitioning allows the use of lower order models with accuracy comparable to

reducing a large RC interconnect using a higher approximation order.

In [22], the authors use hMETIS [23] to perform partitioning. The approach was

extended to include RLC circuits in PartMOR [24]. In PartMOR, moments up to the

third order are used to macromodel the RLC circuit elements. SparseRC [25] employs a

fill-in-reducing based ordering scheme to improve sparsity during model reduction. The

basic model reduction engine used in SparseRC is similar to PartMOR. The drawback of

SparseRC lies in the fact that it can only be used for RC circuits. In the next sub-section,

we cover implicit moment matching-based methods.

Figure 2.14: Parallel RC model

2.5.3 Implicit Moment Matching

The main idea of implicit moment matching techniques, also referred to as projection-

based MOR, is to project the moment space onto an orthonormal subspace called the

Krylov subspace. As always, we are interested in reducing the number of state variables

in x while approximately maintaining the input-output behavior of the system. This can

be achieved by finding a transformation matrix V such that $x = V\hat{x}$, where $\hat{x}$ is the state

variable vector for the reduced system. The MNA formulation given in (2.37) can also

be written as:

\[ \begin{aligned} x(t) &= A\dot{x}(t) + Ru(t) \\ y(t) &= L^T x(t) \end{aligned} \tag{2.49} \]

where A = −G−1C and R = G−1B. The reduced system can be written as:

\[ \begin{aligned} \hat{x}(t) &= \hat{A}\dot{\hat{x}}(t) + \hat{R}u(t) \\ y(t) &= \hat{L}^T \hat{x}(t) \end{aligned} \tag{2.50} \]

where $\hat{A} = V^T A V$ is $r \times r$, $\hat{R} = V^T R$ is $r \times p$ and $\hat{L} = V^T L$. The transfer function

H(s) of the original system can be written as:

\[ H(s) = L^T (I - sA)^{-1} R \tag{2.51} \]

The moment space of the original system for the first r moments is given by:

\[ \begin{aligned} \{M_0, M_1, M_2, \ldots, M_{r-1}\} &= \{L^T R,\; L^T A R,\; L^T A^2 R,\; \ldots,\; L^T A^{r-1} R\} \\ &= L^T \{R,\; AR,\; A^2 R,\; \ldots,\; A^{r-1} R\} \end{aligned} \tag{2.52} \]

Therefore, the Krylov subspace of order r for the system described above is:

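\[ K_r(R, A) \equiv \operatorname{span}(R,\; AR,\; A^2 R,\; \ldots,\; A^{r-1} R) \tag{2.53} \]

If the columns $(v_0, v_1, \ldots, v_{r-1})$ of the projection matrix V are chosen such that:

\[ \operatorname{span}(v_0, v_1, \ldots, v_{r-1}) = \operatorname{span}(R,\; AR,\; A^2 R,\; \ldots,\; A^{r-1} R) \tag{2.54} \]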

then the reduced-order model will match the first r moments of the original system.
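The moment-matching property can be checked numerically. The sketch below is illustrative only: it builds the Krylov basis explicitly and orthonormalizes it with a QR factorization, rather than using the block-Arnoldi procedure that practical tools prefer for numerical robustness. The four-node RC ladder and all element values are assumptions.

```python
import numpy as np

# Illustrative 4-node RC ladder, single input and single output (assumed values).
G = np.array([[2.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, -1.0, 0.0],
              [0.0, -1.0, 2.0, -1.0],
              [0.0, 0.0, -1.0, 1.0]])
C = np.diag([1.0, 2.0, 1.0, 2.0])
B = np.array([[1.0], [0.0], [0.0], [0.0]])
L = np.array([[0.0], [0.0], [0.0], [1.0]])

A = -np.linalg.solve(G, C)                 # A = -G^{-1} C
R = np.linalg.solve(G, B)                  # R = G^{-1} B

r = 2
cols = [R]
for _ in range(r - 1):
    cols.append(A @ cols[-1])              # R, AR, ..., A^{r-1} R
V, _ = np.linalg.qr(np.hstack(cols))       # orthonormal basis of K_r(R, A)

A_hat = V.T @ A @ V
R_hat = V.T @ R
L_hat = V.T @ L

def moment(A_mat, R_mat, L_mat, k):
    """k-th moment L^T A^k R as a scalar (SISO case)."""
    return (L_mat.T @ np.linalg.matrix_power(A_mat, k) @ R_mat).item()

full = [moment(A, R, L, k) for k in range(r)]
reduced = [moment(A_hat, R_hat, L_hat, k) for k in range(r)]
```

The reduced system has only r = 2 states, yet its first two moments agree with those of the four-state original.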

Passive Reduced-Order Interconnect Macromodeling Algorithm

In PRIMA [26], the projection matrix is constructed using the block-Arnoldi algorithm [17].

It can guarantee the passivity of the reduced system if the original system is in MNA

form with L = B where B is n× nprt.

PRIMA does not preserve certain important circuit properties like reciprocity. The

resulting reduced matrices are also larger than those obtained from direct pole matching.

The size of the matrices (states) depends upon the number of moments to be matched

and also on the number of ports. For every block moment order increase, PRIMA will

generate nprt new poles. To overcome this issue, TERMMERG [27] uses a reduced-rank

approximation technique to group the terminals with similar timing or delay behavior

into one terminal prior to performing reduction using PRIMA. In [18], it has been argued

that PRIMA takes in an electrical circuit and outputs a system of poles and zeros which

needs to be recast into an electrical circuit. Therefore, one must have a knowledge of the

nexus between the matrix formulations and electric circuits to use this approach.

Structure-Preserving Reduced Order Interconnect Modeling

SPRIM [28] is a projection-based method for RLC circuits that can preserve the structure

of the original model. It uses a 2× 2 block structure of the MNA matrix:

\[ \begin{bmatrix} G' & A^T \\ -A & 0 \end{bmatrix} \tag{2.55} \]

where A is a matrix that indicates the branch current flow at the inductor. Similarly, a

structured projection matrix V is obtained by partitioning V as:

→ V =

(2.56)

PRIMA can now be used to reduce the system. This reduced matrix will have twice

the number of poles but there would be an improvement in accuracy over PRIMA. The

system can be further reduced by using Schur’s decomposition [29] of the branch-current

variables. SPRIM does not completely model the sub-block structure and locality of an

interconnect model. Interconnect models or P/G grids are highly local and sparse. In

BSMOR [30], the MNA matrices are divided into many blocks, followed by the reduction

of every block using PRIMA. BSMOR results in better accuracy and also preserves the

block structure of the system.

2.5.4 Truncated Balanced Realization

In truncated balanced realization methods, the system is mapped onto a basis where

the states that are difficult to reach are truncated. These techniques are based on the

computation of controllability and observability Gramians [31].

Standard TBR

For the system defined in (2.49), the controllability Gramian X and the observability Gramian Y are the unique symmetric, positive definite solutions to the Lyapunov equations [17]:

\[ \begin{aligned} AX + XA^T + RR^T &= 0 \\ A^T Y + YA + LL^T &= 0 \end{aligned} \tag{2.57} \]

A similarity transformation is then applied to diagonalize the product XY:

\[ T^{-1}XYT = \Sigma = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2) \tag{2.58} \]

where the singular values of the system ($\sigma_k$) are arranged in descending order. The

matrices are then partitioned as:

(2.59)

where $\Sigma_1 = \operatorname{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_r^2)$ is made from the r largest eigenvalues of XY, and $W_1$ and $V_1$ are the corresponding eigenvectors. Thus, the reduced model can now be obtained by using $W_1^T$ and $V_1$ instead of V as in the case of congruence transformations.

Therefore, the reduced system equation (2.50) will consist of the following matrices:

\[ \hat{A} = W_1^T A V_1, \qquad \hat{R} = V_1^T R, \qquad \hat{L} = V_1^T L \tag{2.60} \]

Although TBR ensures excellent accuracy, it is computationally very expensive because

of the cubic complexity of solving the Lyapunov equations. Poor Man’s TBR [32] is a

truncation-based method that uses approximate Gramians to perform reduction. This

results in reducing the time complexity of the reduction method. Adaptive sampling-

based methods like WBMOR [33] can be used to provide near accurate reduced models

over wide frequency bands. These methods are based on the TBR method and compute

the approximate Gramians using a Monte-Carlo sampling approach.
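To make the cost of the Gramian computation concrete, the sketch below solves the two Lyapunov equations of (2.57) for a tiny system and extracts the singular values of (2.58). Two assumptions are made for illustration: the system is taken in the standard stable form ẋ = Ax + Ru, y = Lᵀx (rather than the descriptor-like form of (2.49)), and the equations are solved by a dense Kronecker-product formulation, which is only viable for very small n and is not what a production TBR code would use.

```python
import numpy as np

def lyapunov(A, Q):
    """Solve A X + X A^T + Q = 0 by vectorization (small dense systems only)."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)      # acts on the row-major vec of X
    return np.linalg.solve(K, -Q.reshape(-1)).reshape(n, n)

# Illustrative stable system (assumed values).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
R = np.array([[1.0], [1.0]])
L = np.array([[1.0], [0.0]])

X = lyapunov(A, R @ R.T)                   # controllability Gramian
Y = lyapunov(A.T, L @ L.T)                 # observability Gramian

eigs = np.sort(np.linalg.eigvals(X @ Y).real)[::-1]
hsv = np.sqrt(eigs)                        # singular values, largest first
```

States associated with the smallest entries of `hsv` contribute least to the input-output behavior and are the ones truncated.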

2.5.5 Local Node Elimination

The main idea of techniques involving local node elimination is to reduce the number

of nodes in the circuit and approximate the newly added elements in the circuit matrix

with reduced rational forms [17]. The main advantage of these methods is the ability

to perform the reduction in a local manner and the fact that no overall solution of the

whole circuit is required, which makes these methods applicable to large interconnection

networks.

Time-Constant Equilibrium Reduction

TICER [18] is an elimination-based approach for reducing the interconnect model. It

converts a given RC network into a smaller RC network by eliminating nodes that have

fewer neighbors and a small value of time constant. These nodes are called quick nodes.

The time constant of a node is given by:

\[ \tau_N = \frac{C_N}{G_N} \tag{2.61} \]

where $C_N$ is the sum of all capacitive elements incident on node N and $G_N$ is the sum of all conductive elements incident on it. If $|s\tau_N| \ll 1$, where s is the frequency at which the system is to be operated, the node is called a quick node and can be eliminated from the

circuit. In terms of circuit elements, the reduction can be described as the following

two-step process:

1. Remove all the resistors and capacitors connecting other nodes to node N .

2. Insert new resistors and capacitors between the former neighbors using the following

two rules:

• If nodes i and j were connected to node N by conductances giN and gjN , insert

a conductance giNgjN/GN between i and j.

• If node i had a capacitor ciN to N and node j a conductance gjN to N, then

insert a capacitor ciNgjN/GN between i and j.
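The two-step process above can be sketched for a single quick node as follows. This is a minimal illustration of the two insertion rules, not the full TICER algorithm of [18]: node selection, frequency checks, and capacitor-capacitor terms are omitted. The dictionary-based circuit representation and the helper names are assumptions made for this example.

```python
def key(a, b):
    """Canonical (sorted) key for an element between nodes a and b."""
    return tuple(sorted((a, b)))

def incident(node, elems):
    """Map neighbor -> value for every element touching `node`."""
    out = {}
    for (a, b), val in elems.items():
        if node == a:
            out[b] = out.get(b, 0.0) + val
        elif node == b:
            out[a] = out.get(a, 0.0) + val
    return out

def time_constant(node, cond, cap):
    """tau_N = C_N / G_N as in (2.61)."""
    return sum(incident(node, cap).values()) / sum(incident(node, cond).values())

def eliminate(node, cond, cap):
    """Remove `node` and insert the replacement elements between its neighbors."""
    g_n, c_n = incident(node, cond), incident(node, cap)
    G_N = sum(g_n.values())
    new_cond = {p: g for p, g in cond.items() if node not in p}
    new_cap = {p: c for p, c in cap.items() if node not in p}
    nbrs = sorted(g_n)
    for idx, i in enumerate(nbrs):                 # conductance-conductance rule
        for j in nbrs[idx + 1:]:
            k = key(i, j)
            new_cond[k] = new_cond.get(k, 0.0) + g_n[i] * g_n[j] / G_N
    for i, c in c_n.items():                       # capacitor-conductance rule
        for j, g in g_n.items():
            if i != j:
                k = key(i, j)
                new_cap[k] = new_cap.get(k, 0.0) + c * g / G_N
    return new_cond, new_cap

# Node "N" sits between "a" and "b" and has a capacitor to ground (assumed values).
cond = {key("a", "N"): 1.0, key("b", "N"): 1.0}
cap = {key("N", "gnd"): 2.0}
tau = time_constant("N", cond, cap)
new_cond, new_cap = eliminate("N", cond, cap)
```

After elimination, the two unit conductances collapse into a single 0.5 conductance between a and b, and the grounded capacitor is split equally between the two former neighbors.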

In the incremental verification approach, TICER is used to reduce the sub-grid that

has already been reduced once using RC MOR. It does not exactly replicate the first

order moments of the admittance matrix but it preserves the Elmore delays through RC

trees in most cases. It cannot be applied to RLC circuits. As a node is eliminated, it

leads to addition of new elements to the neighbors. This addition reduces the sparsity

of the matrices. Therefore, care must be taken while selecting the quick nodes. In [34],

the authors extend local node elimination to include RLC circuits by matching the DC

characteristics and the first two moments at all the nodes. The method is applicable

to a simplified RLC model in which the RLC network connecting a pair of nodes has

branches, in which one or more of the elements (R, L, or C) is zero. In this scheme,

two time constants are associated with each node and the reduction is done based on the

dominant time constant. The method introduces a large error if the time constants are

comparable to each other.

2.5.6 Divide and Conquer

We have covered some of the most popular techniques to compactly model the power

distribution network. There is yet another class of reduction techniques that applies a

divide-and-conquer strategy to efficiently reduce the given system. In [35], the power grid

is partitioned into global and local grids. The behavior of a local grid is abstracted at the

interface (port nodes) with the global grid and, then the global grid is simulated using the

macromodels of the local grids. The work presented in chapter 3 adapts this hierarchical

approach to the constraints-based verification framework. Block-based partitioning for

parallel power grid analysis is presented in [36]. In this work, the power grid is partitioned based on the functional blocks and on the observation that most of the block current is drawn from

the C4 bump nearest to the block. The partitioner can then be combined with existing

MOR techniques to produce reduced-order models for each partition. The simulation re-

sults from individual partitions are then combined to perform full-chip analysis. In [37],

a hierarchical matrix representation of the power grid was proposed. The hierarchical

matrices are derived from the partitioning of the power grid. This type of representation

allows for reduced storage space and improved simulation times. Approaches based on

this strategy have fast simulation times because the entire problem is broken down into

small sub-problems that can be solved efficiently. In the next chapter, we focus on an incremental verification approach for the vectorless constraints-based verification framework.

Chapter 3

Incremental Power Grid Verification

3.1 Introduction

In this chapter, we present an incremental power grid verification approach that extends

previous work [12] to the case of transient currents. In the incremental verification

framework defined in section 2.2.1, we are interested in finding the worst-case voltage

drops at those nodes in the RC model that have been identified as external by the user.

A solution for the case when all the currents are DC was given in [12]. Strictly speaking,

the extension in the general dynamic case requires an iterative relaxation-based analysis

of both the internal and external grid sections, which can be expensive. This difficulty

was overcome in [13] for the purpose of circuit simulation (not vectorless verification),

through the use of the multi-port Norton theorem.

In this work, we provide the first solution in the dynamic case for the purpose of

incremental vectorless verification, based on two contributions: 1) upper bounds on the

voltage drops at internal nodes are efficiently computed and used in lieu of the worst-case

drops, and 2) a macromodel is constructed for the sub-grid based on the movement of

the internal current sources by adapting the multi-port Norton theorem proposed in [13]

to our verification framework, followed by the reduction of the passive RC circuit by


combining the moment matching-based approach as described in [21] and [22] with the

node elimination-based approach of [18]. A version of this work has appeared in [38].

In the next section, we present an efficient way to compute the bounds on the worst-

case voltage drops, benefiting from the fact that voltage drops at internal nodes are not

required, and the following sections describe our power grid macromodeling approach

that then allows for incremental verification.

3.2 Efficient Bounds Computation

To compute vub using (2.16), we need to have an estimate of the worst-case voltage drop

entries of Va at both internal and external nodes. From (2.3), we have:

\[ G_{23}^T v_{prt}(t) + G_{33} v_{int}(t) + C_{int}\dot{v}_{int}(t) = i_{s,int}(t) \tag{3.1} \]

which, after time-discretization, leads to:

\[ v_{int}(\Delta t) = A_{int}^{-1} i_{s,int}(\Delta t) - A_{int}^{-1} G_{23}^T v_{prt}(\Delta t) \tag{3.2} \]

where $A_{int} = \left( G_{33} + \frac{C_{int}}{\Delta t} \right)$ is a symmetric, positive-definite M-matrix, so that $A_{int}^{-1} \ge 0$. Because $-G_{23}^T$ and $A_{int}^{-1}$ are non-negative matrices, then in the special case used earlier to define $V_a$, we can write:

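\[ \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v_{int}(\Delta t)) \le A_{int}^{-1} \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (i_{s,int}(\Delta t)) + T^T \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v_{prt}(\Delta t)) \tag{3.3} \]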

where the transformation matrix T is given by:

\[ T = -G_{23} A_{int}^{-1} \tag{3.4} \]

Equation (3.3) gives an upper-bound on the worst-case voltage drops at t = ∆t for all

internal nodes, so that we can write:

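\[ V_a = \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v(\Delta t)) \le \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix} \operatorname*{emax}_{\forall i_s\in\mathcal{F}} \begin{bmatrix} v_{ext}(\Delta t) \\ v_{prt}(\Delta t) \\ i_{s,int}(\Delta t) \end{bmatrix} \]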

Table 3.1: Legend

Symbol            Meaning
$v(t)$            voltage drop in original grid
$v'(t)$           voltage drop at internal nodes when port nodes are shorted
$\tilde{v}(t)$    voltage drop after moving the internal current sources
$\hat{v}(t)$      voltage drop after macromodeling

where $I_{ext}$ and $I_{prt}$ are identity matrices of sizes $n_{ext}$ and $n_{prt}$, respectively. From this, and because $G^{-1} \ge 0$, we have from (2.16) that:

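\[ v_{ub} \le \left( I + G^{-1}\frac{C}{\Delta t} \right) \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix} \operatorname*{emax}_{\forall i_s\in\mathcal{F}} \begin{bmatrix} v_{ext}(\Delta t) \\ v_{prt}(\Delta t) \\ i_{s,int}(\Delta t) \end{bmatrix} \]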

Because $\operatorname*{emax}_{\forall i_s\in\mathcal{F}} (i_{s,int}(\Delta t)) = i_{L,int}$ (the vector of local constraint values for current sources internal to the sub-grid), this gives a faster way to compute an upper bound on $v_{ub}$ which involves solving LPs for external and port nodes only, followed by a standard linear solve:

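\[ v_{ub} \le \left( I + G^{-1}\frac{C}{\Delta t} \right) \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix} \begin{bmatrix} \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v_{ext}(\Delta t)) \\ \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v_{prt}(\Delta t)) \\ i_{L,int} \end{bmatrix} \]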
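The inequality (3.3) can be checked on a small example. The sketch below is a hedged illustration, not the thesis implementation: the five-node grid, the element values, and the constraint set are assumptions. The feasible set is taken as local bounds `i_L` plus one global budget `i_G` on the total current, so each exact worst case reduces to a greedy selection instead of a general LP.

```python
import numpy as np

def emax_budget(d, i_L, i_G):
    """Max of d @ i over 0 <= i <= i_L, sum(i) <= i_G, for d >= 0 (greedy)."""
    remaining, total = i_G, 0.0
    for j in np.argsort(-d):               # spend the budget on the largest weights
        take = min(i_L[j], remaining)
        total += d[j] * take
        remaining -= take
    return total

# Node 0: external; nodes 1-2: ports; nodes 3-4: internal (assumed values).
G = np.array([[4.0, -1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, 0.0, -1.0, 0.0],
              [-1.0, 0.0, 2.0, 0.0, -1.0],
              [0.0, -1.0, 0.0, 1.5, -0.5],
              [0.0, 0.0, -1.0, -0.5, 1.5]])
C = np.diag([0.5, 0.2, 0.2, 0.4, 0.4])
dt = 0.1
A = G + C / dt
prt, internal = [1, 2], [3, 4]
i_L = np.array([0.1, 0.05, 0.05, 0.3, 0.2])
i_G = 0.4

# Exact worst-case drops at t = dt for every node (one optimization per node).
A_inv = np.linalg.inv(A)
exact = np.array([emax_budget(A_inv[k], i_L, i_G) for k in range(5)])

# Bound (3.3) for the internal nodes, using only the port-node results.
A_int_inv = np.linalg.inv(A[np.ix_(internal, internal)])
T = -G[np.ix_(prt, internal)] @ A_int_inv
bound_int = A_int_inv @ i_L[internal] + T.T @ exact[prt]
```

The bound is conservative because the internal currents and the port voltages are maximized independently, even though no single feasible current vector may attain both maxima.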

3.3 Power Grid Reduction

Because the internals of the sub-grid do not need to be verified, significant performance

improvement can be obtained by reducing or eliminating much of the sub-grid network.

Two steps are involved in this: 1) moving the internal current sources to the port nodes,

which benefits from the multi-port Norton theorem reviewed in section 2.4, and 2) re-

ducing the remaining parasitic RC network inside the sub-grid using MOR.

3.3.1 Moving Internal Current Sources

In HiPRIME [13], multi-port Norton equivalent circuits were used to move the current

sources internal to a block to the ports. This previous work benefited from the multi-port

Norton theorem for simulation purposes. In our work, we adapt this theorem for use in

verification, where the current sources are not known, but are instead subject to current

constraints.

Norton Equivalent Current Sources

The grid equation of the sub-grid when the port nodes are shorted to ground as in Fig. 2.9

is given by:

\[ G_{33} v'_{int}(t) + C_{int}\dot{v}'_{int}(t) = i_{s,int}(t) \tag{3.7} \]

where v′int is the resulting nint × 1 voltage drop vector at internal nodes. Let us call an

internal node that connects to a port node k, a neighbor of k. The current through a

port node to ground i′k(t) is given by:

\[ i'_k(t) = \sum_{\text{neighbors } j \text{ of } k} g_{kj}\, v'_{int,j}(t) \tag{3.8} \]

where gkj is the conductance through which port node k is connected to internal node j

and v′intj(t) is the voltage drop for internal node j. In (2.3), G23 is the nprt×nint matrix

consisting of all the conductance links from port nodes to internal nodes. Therefore,

using (3.8), the Norton short-circuit current vector i′(t) can be written as:

\[ i'(t) = -G_{23} v'_{int}(t) \tag{3.9} \]

Modified Grid

The grid resulting after the internal current sources of the sub-grid have been removed

and replaced by the new port current sources as in Fig. 2.11, will be referred to as the

modified grid. In this modified grid, the voltage drops at the nodes will be denoted by

$\tilde{v}(t)$, and the system equation becomes:

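\[ G \begin{bmatrix} \tilde{v}_{ext}(t) \\ \tilde{v}_{prt}(t) \\ \tilde{v}_{int}(t) \end{bmatrix} + C \begin{bmatrix} \dot{\tilde{v}}_{ext}(t) \\ \dot{\tilde{v}}_{prt}(t) \\ \dot{\tilde{v}}_{int}(t) \end{bmatrix} = \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & 0 & 0 \end{bmatrix} \begin{bmatrix} i_{s,ext}(t) \\ i_{s,prt}(t) \\ i_{s,int}(t) \end{bmatrix} + \begin{bmatrix} 0 \\ I_{prt} \\ 0 \end{bmatrix} i'(t) \]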

which can also be written as:

\[ G\tilde{v}(t) + C\dot{\tilde{v}}(t) = J i_s(t) + K i'(t) \tag{3.10} \]

where J is an n × n matrix consisting of Iext and Iprt, and K is an n × nprt matrix

consisting of Iprt. Time-discretizing (3.10) gives:

\[ G\tilde{v}(t) + C\left( \frac{\tilde{v}(t) - \tilde{v}(t-\Delta t)}{\Delta t} \right) = J i_s(t) + K i'(t) \tag{3.11} \]

We now return to the special case situation used to define Va earlier. In that case, the

voltage in the modified grid at time t = ∆t is given by:

\[ \tilde{v}(\Delta t) = A^{-1} \left( J i_s(\Delta t) + K i'(\Delta t) \right) \tag{3.12} \]

Likewise, time-discretizing (3.7) and evaluating at t = ∆t, we get:

\[ v'_{int}(\Delta t) = A_{int}^{-1} i_{s,int}(\Delta t) \tag{3.13} \]

From (3.9) and (3.13),

\[ \begin{aligned} i'(\Delta t) &= -G_{23} v'_{int}(\Delta t) \\ &= -G_{23} A_{int}^{-1} i_{s,int}(\Delta t) \equiv T i_{s,int}(\Delta t) \end{aligned} \tag{3.14} \]

which can also be written as:

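\[ i'(\Delta t) = \begin{bmatrix} 0 & 0 & T \end{bmatrix} \begin{bmatrix} i_{s,ext}(\Delta t) \\ i_{s,prt}(\Delta t) \\ i_{s,int}(\Delta t) \end{bmatrix} = P i_s(\Delta t) \tag{3.15} \]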

where P is a nprt × n matrix that contains T. Using (3.15) in (3.12), we get:

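\[ \tilde{v}(\Delta t) = A^{-1}(J + KP)\, i_s(\Delta t) = A^{-1}\tilde{J}\, i_s(\Delta t) \tag{3.16} \]

where:

\[ \tilde{J} = \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & T \\ 0 & 0 & 0 \end{bmatrix} \tag{3.17} \]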

The above results will be used in the following to efficiently verify the external nodes

in the modified grid. From Norton’s theorem, the modified grid will exhibit the same

voltage response at external and port nodes as the original grid. Therefore, ∀is ∈ F :

\[ \begin{aligned} \tilde{v}_{ext}(\Delta t) &= v_{ext}(\Delta t) \\ \tilde{v}_{prt}(\Delta t) &= v_{prt}(\Delta t) \end{aligned} \]

so that:

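\[ \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (\tilde{v}_{ext}(\Delta t)) = \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v_{ext}(\Delta t)) \tag{3.18} \]

\[ \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (\tilde{v}_{prt}(\Delta t)) = \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (v_{prt}(\Delta t)) \tag{3.19} \]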
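The equalities above can be confirmed numerically for one time step. The sketch below assembles J, K, P and their combination as in (3.15)-(3.17) for a small grid and compares the first-step response of the original and modified grids. The grid, the element values, and the source currents are illustrative assumptions, and `J_mod` is a name chosen here for the combined matrix J + KP.

```python
import numpy as np

# Node 0: external; nodes 1-2: ports; nodes 3-4: internal (assumed values).
G = np.array([[4.0, -1.0, -1.0, 0.0, 0.0],
              [-1.0, 2.0, 0.0, -1.0, 0.0],
              [-1.0, 0.0, 2.0, 0.0, -1.0],
              [0.0, -1.0, 0.0, 1.5, -0.5],
              [0.0, 0.0, -1.0, -0.5, 1.5]])
C = np.diag([0.5, 0.2, 0.2, 0.4, 0.4])
dt = 0.1
A = G + C / dt
n_ext, n_prt, n_int = 1, 2, 2
n = n_ext + n_prt + n_int

G23 = G[n_ext:n_ext + n_prt, n_ext + n_prt:]
A_int = A[n_ext + n_prt:, n_ext + n_prt:]
T = -G23 @ np.linalg.inv(A_int)                       # (3.4)

J = np.zeros((n, n))
J[:n_ext + n_prt, :n_ext + n_prt] = np.eye(n_ext + n_prt)
K = np.zeros((n, n_prt))
K[n_ext:n_ext + n_prt, :] = np.eye(n_prt)
P = np.hstack([np.zeros((n_prt, n_ext + n_prt)), T])  # (3.15)
J_mod = J + K @ P                                     # (3.17)

i_s = np.array([0.1, 0.05, 0.0, 0.3, 0.2])
v = np.linalg.solve(A, i_s)                           # original grid at t = dt
v_mod = np.linalg.solve(A, J_mod @ i_s)               # modified grid at t = dt
```

Since the equality holds for every current vector, it also holds for the element-wise maxima over the feasible set.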

Therefore, any verification that we will do below on the external nodes in the modified

grid will also verify the same nodes in the original grid:

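\[ v_{ub} \le \left( I + G^{-1}\frac{C}{\Delta t} \right) \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix} \begin{bmatrix} \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (\tilde{v}_{ext}(\Delta t)) \\ \operatorname*{emax}_{\forall i_s\in\mathcal{F}} (\tilde{v}_{prt}(\Delta t)) \\ i_{L,int} \end{bmatrix} \tag{3.20} \]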

Equation (3.16) is significant in that it is a replacement of (2.17) for the case of the modified grid, and so it provides a way to find $\operatorname{emax}(\tilde{v}_{ext}(\Delta t))$ and $\operatorname{emax}(\tilde{v}_{prt}(\Delta t))$ using the same user-provided local and global constraints that were given for the original grid.

3.3.2 Sub-grid Reduction

After moving the internal current sources to the port nodes, we are left with a sub-

grid consisting only of parasitic RC elements. Therefore, we can use a passive MOR

technique to reduce the internals of the sub-grid. The reduction approach that we have

found applicable and beneficial for this work combines the two standard techniques of

moment matching and node elimination.

Moment Matching

We use a nodal formulation-based method [22] to compute the moments of the system

transfer function. The passive sub-grid is first isolated from the rest of the grid by

removing (i.e., make into an open circuit) the connections from all port nodes to external

nodes. Therefore, the isolated sub-grid can be represented in the s-domain by:

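\[ \left( \begin{bmatrix} \tilde{G}_{22} & G_{23} \\ G_{23}^T & G_{33} \end{bmatrix} + s \begin{bmatrix} C_{prt} & 0 \\ 0 & C_{int} \end{bmatrix} \right) \begin{bmatrix} v_{prt}(s) \\ v_{int}(s) \end{bmatrix} = \begin{bmatrix} i_{s,prt}(s) + i'(s) \\ 0 \end{bmatrix} \tag{3.21} \]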

where $\tilde{G}_{22}$ is an $n_{prt} \times n_{prt}$ conductance matrix that is derived from $G_{22}$ by removing the connections from port nodes to external nodes represented by $G_{12}^T$. The admittance looking into the ports of the sub-grid, a matrix Y(s), can be approximated as [39]:

\[ Y(s) \approx M_0 + M_1 s \tag{3.22} \]

where $M_0 = \tilde{G}_{22} - G_{23}V$ and $M_1 = C_{prt} + V^T C_{int} V$ are the $n_{prt} \times n_{prt}$ zero and first-order moment matrices, with $V = G_{33}^{-1} G_{23}^T$. Because of the quadratic form of $M_1$, it is clear that it is a non-negative matrix.
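The two moment matrices can be computed with one factorization of G33. The sketch below does this for a two-port, two-internal-node sub-grid and compares M0 + M1 s against the exact port admittance at a small s. The block values are assumptions for illustration, and `G22_iso` stands for the port block after the links to external nodes have been removed.

```python
import numpy as np

# Isolated sub-grid blocks: two ports, two internal nodes (assumed values).
G22_iso = np.diag([1.0, 1.0])
G23 = np.array([[-1.0, 0.0], [0.0, -1.0]])
G33 = np.array([[1.5, -0.5], [-0.5, 1.5]])
C_prt = np.diag([0.2, 0.2])
C_int = np.diag([0.4, 0.4])

V = np.linalg.solve(G33, G23.T)            # V = G33^{-1} G23^T
M0 = G22_iso - G23 @ V                     # zero-order moment
M1 = C_prt + V.T @ C_int @ V               # first-order moment

def Y_exact(s):
    """Exact port admittance obtained by eliminating the internal nodes."""
    return (G22_iso + s * C_prt) - G23 @ np.linalg.solve(G33 + s * C_int, G23.T)

s = 1e-4
err = np.abs(Y_exact(s) - (M0 + M1 * s)).max()
```

In this example the rows of M0 sum to zero because the isolated sub-grid has no resistive path to ground, so all of its port-to-ground admittance is capacitive.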

For circuits with a non-negative M1 matrix, a 2π-model between pairs of ports was

constructed in [21] (RC MOR, reviewed earlier in sub-section 2.5.2) by matching the zero

and first-order moments. We will use this approach to reduce the circuit between pairs

of ports.

Node Elimination

Note that the T-model given by RC MOR generates an extra node (a new internal node)

for each pair of ports. If we have nprt port nodes, generating a T-model for every pair of

ports can be expensive because it will result in nprt(nprt − 1)/2 new nodes. To overcome

this, we can eliminate many of the new internal nodes by using the node elimination-

based reduction approach proposed in [18]. For every new internal node, the nodal time

constant [18] is given by:

$$
\tau = \frac{C_{ij} R_{ij1} R_{ij2}}{R_{ij1} + R_{ij2}} \tag{3.23}
$$

If τ < τN where τN is a user-specified nodal time constant value, the internal node can

be eliminated by adding capacitors Ci and Cj to port nodes i and j and resistors Rij

between i and j. The capacitors and resistors are given by:

$$
R_{ij} = R_{ij1} + R_{ij2}, \qquad
C_i = \frac{C_{ij} R_{ij2}}{R_{ij1} + R_{ij2}}, \qquad
C_j = \frac{C_{ij} R_{ij1}}{R_{ij1} + R_{ij2}}
\tag{3.24}
$$
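The elimination test in (3.23) and the replacement elements in (3.24) can be sketched as a small Python function. The function name and the numeric values are hypothetical and only serve to illustrate the formulas.

```python
def eliminate_t_node(r1, r2, c, tau_n):
    """Apply (3.23)-(3.24) to one T-model with resistors r1, r2 and middle capacitor c.

    Returns (r_ij, c_i, c_j) when the middle node may be eliminated,
    or None when its nodal time constant is not below tau_n.
    """
    tau = c * r1 * r2 / (r1 + r2)          # nodal time constant, (3.23)
    if tau >= tau_n:
        return None                        # keep the internal node
    r_ij = r1 + r2                         # series resistance between the ports
    c_i = c * r2 / (r1 + r2)               # share of c moved to port i
    c_j = c * r1 / (r1 + r2)               # share of c moved to port j
    return r_ij, c_i, c_j


# Hypothetical values: 1 ohm and 3 ohm branches, 4 pF at the middle node, tau_N = 5 ps.
result = eliminate_t_node(1.0, 3.0, 4e-12, 5e-12)
kept = eliminate_t_node(10.0, 10.0, 4e-12, 5e-12)
```

Note that Ci + Cj = Cij, so the total capacitance of the T-model is preserved by the elimination.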

Sparsification

Once the system matrices of the reduced model are formed, it turns out that they often

contain a large number of negligible (near zero) entries. As a final step in the reduction,

therefore, we apply a sparsification step, in which entries whose absolute value is below a small threshold κ are simply set to 0. We have found the resulting error to be insignificant, and the step to be well worth the effort.
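A minimal sketch of this thresholding step, assuming a dense list-of-lists matrix for readability (a practical implementation would operate on a sparse format); the function name and the sample entries are hypothetical.

```python
def sparsify(matrix, kappa):
    """Set every entry whose absolute value is below kappa to zero."""
    return [[0.0 if abs(x) < kappa else x for x in row] for row in matrix]


def count_nonzeros(matrix):
    return sum(1 for row in matrix for x in row if x != 0.0)


# Hypothetical reduced conductance matrix with two negligible couplings.
g_reduced = [[2.0, -0.001], [0.004, 3.0]]
g_sparse = sparsify(g_reduced, 5e-3)
```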

Final Reduced Model

After applying the reduction techniques described above, we end up with new conductance

(Gsub) and capacitance (Csub) matrices for the isolated sub-grid, given by:

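$$
G_{sub} = \begin{bmatrix} G_{22} & G_{23} \\ G_{23}^T & G_{33} \end{bmatrix};
\qquad
C_{sub} = \begin{bmatrix} C_{prt} & 0 \\ 0 & C_{int} \end{bmatrix}
\tag{3.25}
$$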

Algorithm 1 INCR VERIFY

Input: Partitioned power grid matrices in (2.3), τN , κ, δ1 and δ2

Output: Upper bounds on worst-case voltage drops for external nodes

1: Construct subgrid matrices in (3.21)

2: (T, Gsub, Csub) = MACRO(subgrid matrices, τN , κ, δ2)

3: Construct J, G, C and A

4: for (j = 1, . . . , next + nprt) do

5: Compute jth row of A−1 using SPAI [8] with δ = δ1

6: Multiply that row by the columns of J, get row vector d

7: Maximize: d · is, subject to: is ∈ F

8: end for

9: Compute vub using (3.29)

where G22 and Cprt are the modified port-to-port conductance matrix and capacitance matrix

respectively, G33 and Cint are nint × nint conductance and capacitance matrices for the nint

remaining (new) internal nodes, and G23 and GT23 are matrices consisting of connections between

the port nodes and the remaining internal nodes.

3.4 Verification after Reduction

After macromodeling, the reduced full grid matrices can be constructed by stitching together

the reduced sub-grid and external grid matrices using the connections from external nodes to

port nodes:

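$$
G = \begin{bmatrix} G_{11} & G_{12} & 0 \\ G_{12}^T & G'_{22} & G_{23} \\ 0 & G_{23}^T & G_{33} \end{bmatrix},
\qquad
C = \begin{bmatrix} C_{ext} & 0 & 0 \\ 0 & C_{prt} & 0 \\ 0 & 0 & C_{int} \end{bmatrix}
\tag{3.26}
$$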

where G′22 is the updated port-to-port conductance matrix in which the connections from

external nodes to port nodes have been added. Since we have used a realizable macromodeling

approach, G is also an n×n symmetric, positive-definite, M-matrix, where n = next+nprt+nint.

Let v(t) be the voltage drop vector in the reduced grid, which is partitioned in the usual way

into vext(t), vprt(t), and vint(t). It is to be expected that:

vext(t) = vext(t) ≈ vext(t)

vprt(t) = vprt(t) ≈ vprt(t) (3.27)

Applying (3.16) to the reduced grid gives:

v(∆t) = A−1Jis(∆t) (3.28)

where A = (G + C/∆t) is an M-matrix and J is an n × n matrix with the first next + nprt rows

equal to the first next + nprt rows of J defined in (3.17), and the remaining nint rows have

all entries equal to 0. The worst-case voltage drop at external and port nodes can be found

(approximately) by element-wise maximization of vext and vprt. To the extent that vext ≈ vext

and vprt ≈ vprt, we can approximately restate (3.20) as:

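$$
v_{ub} \le \bar{v}_{ub} =
\left(I + G^{-1}\frac{C}{\Delta t}\right)
\begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix}
\begin{bmatrix}
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(v_{ext}(\Delta t)\right) \\
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(v_{prt}(\Delta t)\right) \\
i_{L,int}
\end{bmatrix}
\tag{3.29}
$$

where $\bar{v}_{ub}$ is the n-vector of upper bounds on worst-case voltage drops with the first next entries corresponding to upper bounds for external nodes.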

3.5 Implementation

The overall flow of the proposed incremental verification approach is given in Algorithm 1. We

start with a user-specified power grid and sub-grid, along with parameter values for τN , κ, and

error tolerance values (δ1, δ2) for the sparse approximate inverse (SPAI [8]) engine, which is

inherently parallelizable and can compute a single column/row of the approximate inverse. The

grid matrices are appropriately partitioned and reduction of the sub-grid is performed. As a

result, the size of the original power grid is reduced and the internal current sources are moved

to the port nodes. The inverse of the matrix A is then computed, row by row, using SPAI,

and every row is multiplied by J to account for the effect of the movement of internal current

sources. Next, we maximize the voltage drop at external and port nodes, and compute the

upper bounds on worst-case voltage drops by using (3.29).
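To make the per-row maximization in Algorithm 1 concrete, the sketch below solves "maximize d · is subject to is ∈ F" for a deliberately simplified feasible space. It assumes that F consists only of local bounds 0 ≤ is ≤ iL plus one global constraint on the sum of all currents; under that assumption the LP reduces to a fractional knapsack problem that a greedy pass solves exactly. The implementation described here uses a general LP solver instead, and the function name and numbers are hypothetical.

```python
def max_drop(d, i_local, i_global):
    """Maximize sum(d[k] * i[k]) subject to 0 <= i[k] <= i_local[k] and sum(i) <= i_global.

    Greedy is optimal here only because every source has unit weight in the
    single global constraint (an assumption of this sketch).
    """
    budget = i_global
    total = 0.0
    # Spend the global current budget on the sources with the largest coefficients first.
    for k in sorted(range(len(d)), key=lambda idx: d[idx], reverse=True):
        if d[k] <= 0 or budget <= 0:
            break                          # remaining sources cannot increase the objective
        take = min(i_local[k], budget)
        total += d[k] * take
        budget -= take
    return total


# Hypothetical row d (volts per ampere) and constraints (amperes).
d = [0.5, 0.2, 0.1]
i_local = [1.0, 1.0, 1.0]
tight = max_drop(d, i_local, 1.5)      # global constraint is active
loose = max_drop(d, i_local, 10.0)     # only the local constraints matter
```

When the global constraint is loose, the result coincides with d · iL, which is the value obtained from local constraints alone.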

The macromodeling algorithm is presented in Algorithm 2. To avoid the cost of constructing

the full matrix $A_{int}^{-1}$, we generate one row of the transformation matrix $T = -G_{23} A_{int}^{-1}$ at a

time, using SPAI. For a port node k, we first identify the neighbors, and the connections (gkj) to

the neighbors. Then, we compute the corresponding row of the approximate inverse, multiply

the row vector by gkj , and add the result to the kth row of T. Calculation of moments is also

efficiently done by factorizing G33, followed by standard system solves.
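The row-by-row construction of T described above can be sketched as follows. The dictionary `inv_rows` stands in for the rows of the approximate inverse that SPAI would return; here they are filled in by hand for a hypothetical 2 × 2 matrix Aint = [[2, -1], [-1, 2]], whose exact inverse is (1/3)[[2, 1], [1, 2]]. All names and values are assumptions for illustration.

```python
def build_t_row(neighbors, inv_rows, n_int):
    """Accumulate one row of T = -G23 * Aint^{-1} for a single port node.

    neighbors: {internal node index j: conductance g_kj to the port (positive)}
    inv_rows:  {internal node index j: j-th row of Aint^{-1}}
    """
    row = [0.0] * n_int
    for j, g_kj in neighbors.items():
        # The G23 entry is -g_kj, so -G23 contributes +g_kj times row j of the inverse.
        for col, value in enumerate(inv_rows[j]):
            row[col] += g_kj * value
    return row


# Hypothetical data: the port touches internal node 0 through 1.5 S.
inv_rows = {0: [2.0 / 3.0, 1.0 / 3.0], 1: [1.0 / 3.0, 2.0 / 3.0]}
t_row = build_t_row({0: 1.5}, inv_rows, 2)
```

Only the rows of the inverse that correspond to neighbors of the port are needed, which is why a row-wise approximate inverse is sufficient.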

3.6 Experimental Results

A C++ implementation has been written to test the proposed approach. We use SPAI to compute

the approximate inverses, and solve the linear programs using MOSEK [40]. The test grids

were generated from user specifications, including grid dimensions, metal layers, pitch and

width per layer, and supply voltage sites and current sources distribution. The supply voltages

and current sources were randomly placed on the grid. Around 15-20% of the nodes had

current sources attached to them while Vdd sources were present at about 7% of the nodes.

The technology specifications were consistent with 1.1 V 65nm CMOS technology. A global

constraint is specified for the sub-grid and other global constraints were specified to cover the

entire chip. The sub-grid nodes are also identified by the user. Computations were done using

a 2.6 GHz Linux machine with 24 GB of RAM. A SPAI error tolerance value of δ1 = 0.1mV

is used to compute the approximate inverse for the original and modified power grids. A lower

value of tolerance δ2 = 0.01mV is used to construct T.

Table 3.2 shows the speed and accuracy of the proposed bounds computation approach

(section 3.2). Since we are interested in analyzing only the external nodes, we report maximum

error and average percentage error values for the upper bounds on the worst-case voltage drops

at external nodes only. The runtime and accuracy are compared with the original approach

based on finding the worst-case voltage drop at every node, and then computing the upper

bound using (2.16). The results show that we are able to achieve significant runtime savings with negligible error values.

Table 3.2: Speed and accuracy after using efficient bounds computation

| Power grid | n | Sub-grid nint | Sub-grid nprt | Max error (mV) | Avg. % error | CPU time (original) | CPU time (fast vub) | Speed-up |
|---|---|---|---|---|---|---|---|---|
| G1 | 8,413 | 3,891 | 118 | 0.08 | 0.075 | 22.92 min | 10.58 min | 2.16x |
| G2 | 18,678 | 10,788 | 176 | 0.08 | 0.102 | 68.19 min | 24.97 min | 2.73x |
| G3 | 32,554 | 15,714 | 208 | 0.07 | 0.038 | 3.0 h | 1.38 h | 2.17x |
| G4 | 50,444 | 29,458 | 290 | 0.07 | 0.055 | 5.51 h | 1.96 h | 2.81x |
| G5 | 72,692 | 42,764 | 348 | 0.07 | 0.047 | 7.71 h | 3.11 h | 2.47x |
| G6 | 98,162 | 68,972 | 402 | 0.08 | 0.067 | 12.57 h | 3.51 h | 3.58x |
| G7 | 128,241 | 95,294 | 413 | 0.08 | 0.064 | 18.71 h | 4.19 h | 4.46x |
| G8 | 162,087 | 124,824 | 518 | 0.08 | 0.078 | 25.30 h | 5.27 h | 4.8x |

The power grid macromodeling approach was tested with user-specified values for the nodal time constant (τN = 5 ps) and the conductance threshold (κ = 5 × 10⁻³).

Table 3.3 gives the speed and accuracy obtained after applying power grid macromodeling.

The runtime for the reduced grid includes the time taken to perform reduction of the sub-grid,

movement of internal current sources, and finding the upper bounds on worst-case voltage drop.

The accuracy is compared with the original approach while speed-up is measured with respect

to the efficient bounds computation approach. We also report the total speed-up with respect

to the original approach. The results show that we incurred an average error of about 1% for

large grids while extracting a total speed-up in the range of 3-8x.

The relative error plots for both the approaches are shown in Fig. 3.1 and Fig. 3.2. These

plots give the variation of the relative error (v−v)/v versus the worst-case voltage drop values at

the external nodes of the power grid. It can be observed from the plots that we have been able

to bound the error to under 1mV for the bounds computation and under 3mV for verification

performed after macromodeling the sub-grid.

Table 3.3: Speed and accuracy after applying macromodeling

| Power grid | Max error (mV) | Avg. % error | CPU time (fast vub) | CPU time (reduced) | Speed-up | Total speed-up (reduced vs original) |
|---|---|---|---|---|---|---|
| G1 | 2.4 | 1.96 | 10.58 min | 4.73 min | 2.23x | 4.84x |
| G2 | 2.01 | 1.89 | 24.97 min | 12.26 min | 2.03x | 5.56x |
| G3 | 1.35 | 1.05 | 1.38 h | 1.06 h | 1.3x | 2.83x |
| G4 | 1.92 | 1.07 | 1.96 h | 1.44 h | 1.36x | 3.82x |
| G5 | 2.01 | 0.88 | 3.11 h | 2.36 h | 1.31x | 3.26x |
| G6 | 2.79 | 1.14 | 3.51 h | 2.28 h | 1.53x | 5.51x |
| G7 | 3.13 | 1.11 | 4.19 h | 2.64 h | 1.58x | 7.08x |
| G8 | 2.6 | 1.14 | 5.27 h | 3.14 h | 1.67x | 8.05x |

Figure 3.1: Relative Error Plot (efficient bounds). Horizontal axis: voltage drop (V).

Figure 3.2: Relative Error Plot (macromodeling). Horizontal axis: voltage drop (V).

Figure 3.3: Runtime vs Size of Sub-grid. Horizontal axis: Size of Subgrid (% of Full Grid). Curves: Using Efficient Bounds, After Macromodeling, Theoretical.

Figure 3.4: Speed-up vs Size of Sub-grid. Horizontal axis: Size of Subgrid (% of Full Grid). Curves: Using Efficient Bounds, After Macromodeling, Theoretical.

It is to be expected that, if fewer nodes are to be verified, then corresponding time savings would be achieved. However, in our case, the speed-ups are much higher than would be obtained

based solely on this observation. The graph in Fig. 3.3 shows the variation of runtime for

power grid verification after applying macromodeling and verification using the efficient bounds

computation approach with respect to the size of the sub-grid. The runtimes are scaled to the

time required to perform full grid verification. The theoretical runtime is the runtime which

is to be expected considering that we are only verifying a part (certain percentage of total

number of nodes) of the grid. Fig. 3.4 gives the same variation for speed-up. It can be observed

from these plots that, as the size of the sub-grid increases, gains corresponding to power grid

reduction become larger. The power grid macromodeling approach has an overhead associated

with the movement of current sources that requires explicit computation of the transformation

matrix and moment matching that requires a system solve.

Table 3.4: Runtime breakdown

| Power grid | Total time | MACRO: T | MACRO: Gsub and Csub | INCR VERIFY: Multiply J | INCR VERIFY: LP | INCR VERIFY: SPAI | Bound on vint |
|---|---|---|---|---|---|---|---|
| G1 | 4.73 min | 6.16 s | 1.66 s | 0.3 s | 53.22 s | 3.711 min | 0.1 s |
| G2 | 12.26 min | 12.17 s | 9.92 s | 0.19 s | 1.558 min | 10.32 min | 0.22 s |
| G3 | 1.06 h | 14.23 s | 13.58 s | 0.63 s | 3.53 min | 59.84 min | 0.44 s |
| G4 | 1.44 h | 22.21 s | 45.01 s | 4.96 s | 4.44 min | 1.35 h | 0.74 s |
| G5 | 2.36 h | 29.68 s | 97.75 s | 12.49 s | 6.3 min | 2.17 h | 1.05 s |
| G6 | 2.28 h | 50.21 s | 2.49 min | 2.81 s | 6.26 min | 2.12 h | 1.68 s |
| G7 | 2.64 h | 1.05 min | 3.18 min | 42.1 s | 7.12 min | 2.43 h | 3.1 s |
| G8 | 3.14 h | 1.47 min | 9.11 min | 25.64 s | 7.85 min | 2.82 h | 3.79 s |

Table 3.4 gives the runtime breakdown for power grid verification after applying macromodeling. It can be observed that the overheads associated with macromodeling can be significant for small grids (G1 and G2). As the sizes of the grids increase, the majority of the time required is spent in performing verification (INCR VERIFY), thereby making the overheads less significant. From a comparison between these runtime results and the results obtained from

full grid verification, it can be concluded that power grid macromodeling results in reducing

the size of individual problems, which in turn results in significant reductions in the time taken

to solve LPs and to compute approximate inverses using SPAI.

We introduced a sparsification step in our reduction approach to reduce the number of

fill-ins (so as to maintain the sparsity of the matrices) and hence, make optimization faster.

A higher value of conductance threshold (κ) would mean fewer fill-ins, better runtime, but

increased error as shown in Fig. 3.5. The plot shows runtime and error variation for grid G1

with κ, with τN kept constant at 5 × 10⁻¹² s.

Another important step in the reduction of the power grid was node elimination that was

applied to the reduced sub-grid in order to remove some of the remaining (newly formed) internal

nodes. The nodal time constant (τN ) determines the extent of reduction. We performed an

analysis of the extent of reduction and runtime variation with τN for G1, with κ kept constant

at 0.005. It can be seen from Fig. 3.6 that as the value of τN is increased, the number of nodes

which become eligible for removal increases, resulting in more reduction and improved runtime.

Figure 3.5: Runtime and Accuracy vs κ for G1. Plotted quantities: Runtime (s), Error (mV).

Figure 3.6: % Reduction and Runtime vs τN for G1. Plotted quantities: Runtime, % Reduction.

Algorithm 2 MACRO(subgrid matrices, τN , κ, δ2)

Output: T in (3.3), Gsub and Csub in (3.25)

1: Construct Aint

2: for (every port node k) do

3: Find neighbors of k and gkj in (3.8)

4: for (every neighbor j of k) do

5: Compute the jth row of A−1int using SPAI [8] with δ = δ2

6: Multiply the row entries by gkj

7: Add the row entries to the kth row of T

8: end for

9: end for

10: Compute M0 and M1

11: for (every pair of ports i, j) do

12: Compute Rij1, Rij2 and Cij using (2.46)

13: end for

14: for (every port node k) do

15: Compute Rkk and Ckk using (2.48)

16: end for

17: for (every new internal node created) do

18: Compute τ using (3.23)

19: if τ < τN then

20: Compute Rij , Ci and Cj using (3.24)

21: Eliminate the new internal node

22: end if

23: end for

24: Drop insignificant connections with conductance less than κ

25: Construct Gsub and Csub

Chapter 4

Dimension Reduction of the Feasible

Space of Currents

4.1 Introduction

In the incremental verification framework described in the previous chapter, we did not physi-

cally replace the internal current sources by current sources at port nodes. Instead, we captured

the effect of the movement of internal current sources by using a transformation matrix (T).

Therefore, the feasible space and the size of the LPs in terms of the number of variables remained the same. A typical sub-grid has a smaller number of port nodes than internal current sources. If we are able to replace the feasible space $\mathcal{F}$ by another feasible space $\hat{\mathcal{F}}$ defined in terms of the modified current source configuration (external sources and Norton equivalent current sources at port nodes), we can reduce the size of the LPs, thereby achieving additional speed-up. In this chapter, we focus on algorithms to compute this new feasible space $\hat{\mathcal{F}}$ and on the application of this concept to the construction of the Chip Power Model (also referred to as CPM in this chapter). The next section describes a modified incremental verification approach that benefits from the use of a dimension-reduced feasible space $\hat{\mathcal{F}}$.


4.2 Incremental Verification

In incremental verification, we are interested in verifying external nodes by macromodeling the

parts of the grid (sub-grid) that need not be verified. To perform macromodeling of the sub-

grid, we need to compute the Norton equivalent current sources at port nodes that will replace

the current sources internal to the sub-grid. For a general sub-grid, a global constraint may

include current sources internal and external to the sub-grid, which makes it difficult to simulate

the sub-grid in isolation. In [12], the authors decoupled the global constraints by replacing the

constraints on some external sources (based on locality) with an average value derived from the

switching activity of the external circuit. In this work, we use the transformation matrix T to

define a feasible space in terms of the new current source configuration in which the external

current sources remain intact while Norton equivalent current sources are added to port nodes.

Recall that the voltage drop at time t = ∆t after performing macromodeling was given by (3.28)

v(∆t) = A−1Jis(∆t)

Let us first define the following two notations:

v|k ≜ a sub-vector consisting of the first k entries of v

A|k ≜ a sub-matrix consisting of the first k × k elements of A

Using the fact that J is an n × n matrix with the first next + nprt rows equal to J defined

in (3.17) and the remaining nint rows have all entries equal to 0, the voltage drop for external

and port nodes after macromodeling will be given by:

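$$
\begin{bmatrix} v_{ext}(\Delta t) \\ v_{prt}(\Delta t) \end{bmatrix}
= v(\Delta t)|_\eta
= A^{-1}|_\eta \, (J i_s(\Delta t))|_\eta
= A^{-1}|_\eta \, i'_s
\tag{4.1}
$$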

where η = next + nprt and i′s is the η-vector of current sources consisting of original current

sources at port and external nodes and Norton equivalent current sources attached to the port

nodes and is given by:

i′s = (Jis)|η (4.2)

Using (4.1) in (3.29), we get:

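$$
\bar{v}_{ub} =
\left(I + G^{-1}\frac{C}{\Delta t}\right)
\begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix}
\begin{bmatrix}
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right) \\
i_{L,int}
\end{bmatrix}
\tag{4.3}
$$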

From (4.2), we can define a new feasible space in terms of the modified current source vector

i′s at time t = ∆t, that is given by:

$$
\hat{\mathcal{F}} = \left\{ i'_s : \exists\, i_s \in \mathcal{F} \text{ for which } i'_s = (J i_s)|_\eta \right\}
\tag{4.4}
$$

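Claim 1. $\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right) \equiv \operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)$, where $\hat{\mathcal{F}}$ and $i'_s$ are defined in (4.4) and (4.2), respectively.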

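Proof. Let us assume that:

$$
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right)
= A^{-1}|_\eta
\begin{bmatrix} (J i_s^{(1)})|_\eta & (J i_s^{(2)})|_\eta & \cdots & (J i_s^{(\eta)})|_\eta \end{bmatrix}
\tag{4.5}
$$

where $i_s^{(1)}, i_s^{(2)}, \ldots, i_s^{(\eta)}$ are values of current sources such that the corresponding rows of $A^{-1}|_\eta (J i_s)|_\eta$ are maximized. Using (4.2) in (4.5), we have:

$$
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right)
= A^{-1}|_\eta
\begin{bmatrix} i'^{(1)}_s & i'^{(2)}_s & \cdots & i'^{(\eta)}_s \end{bmatrix}
\tag{4.6}
$$

where

$$
i'^{(1)}_s = (J i_s^{(1)})|_\eta, \quad
i'^{(2)}_s = (J i_s^{(2)})|_\eta, \quad \ldots, \quad
i'^{(\eta)}_s = (J i_s^{(\eta)})|_\eta
\tag{4.7}
$$

Using (4.4), (4.7) and the fact that $i_s^{(k)} \in \mathcal{F}$, ∀k = 1, . . . , η, we have:

$$
i'^{(k)}_s \in \hat{\mathcal{F}}, \quad \forall k = 1, \ldots, \eta
\tag{4.8}
$$

so that

$$
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right)
\le \operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
\tag{4.9}
$$

Similarly, assuming that:

$$
\operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
= A^{-1}|_\eta
\begin{bmatrix} i'^{(1)}_s & i'^{(2)}_s & \cdots & i'^{(\eta)}_s \end{bmatrix}
\tag{4.10}
$$

where $i'^{(1)}_s, i'^{(2)}_s, \ldots, i'^{(\eta)}_s$ are values of current sources such that the corresponding rows of $A^{-1}|_\eta i'_s$ are maximized. Since $i'^{(k)}_s \in \hat{\mathcal{F}}$, ∀k = 1, . . . , η, we can find $i_s^{(k)} \in \mathcal{F}$ such that:

$$
i'^{(1)}_s = (J i_s^{(1)})|_\eta, \quad
i'^{(2)}_s = (J i_s^{(2)})|_\eta, \quad \ldots, \quad
i'^{(\eta)}_s = (J i_s^{(\eta)})|_\eta
\tag{4.11}
$$

Using (4.11) in (4.10), we get:

$$
\operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
= A^{-1}|_\eta
\begin{bmatrix} (J i_s^{(1)})|_\eta & (J i_s^{(2)})|_\eta & \cdots & (J i_s^{(\eta)})|_\eta \end{bmatrix}
\tag{4.12}
$$

Since $i_s^{(k)} \in \mathcal{F}$, ∀k = 1, . . . , η,

$$
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right)
\ge \operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
\tag{4.13}
$$

Therefore, from (4.9) and (4.13):

$$
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(A^{-1}|_\eta (J i_s)|_\eta\right)
\equiv \operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
\tag{4.14}
$$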

Using (4.14), it can be concluded that (4.3) is equivalent to:

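$$
\bar{v}_{ub} =
\left(I + G^{-1}\frac{C}{\Delta t}\right)
\begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix}
\begin{bmatrix}
\operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right) \\
i_{L,int}
\end{bmatrix}
\tag{4.15}
$$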

To define the problem, we make the following claim:

Claim 2. $\hat{\mathcal{F}}$ is a convex polytope.

Proof. From (2.15), it can be noted that $\mathcal{F}$ is bounded by an intersection of finitely many half-spaces (given by U is ≤ u and U is ≥ 0) and is thus a convex polytope. Let i′1 and i′2 be two points in $\hat{\mathcal{F}}$, which can be written as:

$$
i'_1 = (J i_1)|_\eta, \qquad i'_2 = (J i_2)|_\eta \tag{4.16}
$$

where $i_1, i_2 \in \mathcal{F}$. Let us take a point i′3 that lies on the line segment joining i′1 and i′2. Hence,

$$
i'_3 = \alpha i'_1 + (1 - \alpha) i'_2 \tag{4.17}
$$

where 0 ≤ α ≤ 1. From (4.16) and (4.17), we have:

$$
i'_3 = (J \alpha i_1)|_\eta + (J (1 - \alpha) i_2)|_\eta
= (J (\alpha i_1 + (1 - \alpha) i_2))|_\eta
= (J i_3)|_\eta
\tag{4.18}
$$

where $i_3 = \alpha i_1 + (1 - \alpha) i_2$. Since $\mathcal{F}$ is convex and i3 lies on the line segment joining i1 and i2, i3 is also in $\mathcal{F}$. Therefore, from (4.4), $i'_3 \in \hat{\mathcal{F}}$, which implies that $\hat{\mathcal{F}}$ is a convex set. It has been proved in [41] that any linear mapping of a polytope is also a polytope. Since $\hat{\mathcal{F}}$ is a linear mapping of $\mathcal{F}$ and $\mathcal{F}$ is a convex polytope, $\hat{\mathcal{F}}$ is also a convex polytope.

Therefore, the convex polytope $\hat{\mathcal{F}}$ can be expressed as an intersection of finitely many half-spaces:

$$
\hat{\mathcal{F}} = \{ i'_s : l' \le U' i'_s \le u' \} \tag{4.19}
$$

In the sub-section that follows, we find the hyperplanes that define $\hat{\mathcal{F}}$.

4.2.1 Defining $\hat{\mathcal{F}}$

The linear mapping (J) that converts the original current source configuration to the modified current source configuration is non-invertible, implying that there is no one-to-one and onto relationship between i′s and is. In the absence of such a relationship, the convex polytope $\hat{\mathcal{F}}$ can be computed as follows [41]:

1. Find the set of extreme points of $\mathcal{F}$, denoted by E.

2. Compute E′ such that E′ = (JE)|η, where E′ is a superset of the extreme points of $\hat{\mathcal{F}}$.

3. Compute the convex hull that contains the points in the set E′.

The time complexity of the above algorithm is $O(N^{\lfloor \eta/2 \rfloor})$, where N is the number of points in the set E′ and η is the dimension of the new feasible space $\hat{\mathcal{F}}$. Such an exponential time complexity makes the exact solution infeasible in our case. An approximate algorithm for the construction of multidimensional convex hulls was proposed in [42]. The algorithm starts with a set of reference direction vectors, represented in matrix form as the rows of $\tilde{U}$, such that they are regularly distributed on the unit hypersphere. For each direction vector ni, the inner product 〈ni, p〉 is maximized and minimized over all p ∈ E′, which defines the feasible space (or convex hull) as:

$$
\tilde{\mathcal{F}} = \{ i'_s : l \le \tilde{U} i'_s \le u \} \tag{4.20}
$$

where l and u are the results of minimization and maximization of the inner products:

$$
(l, u) = \operatorname*{eopt}_{\forall i'_s \in E'} \left(\tilde{U} i'_s\right) \tag{4.21}
$$

In this work, we use a modified version of this algorithm in which the set of direction vectors is chosen based on the original constraint matrix U defined in (2.14) and the transformation matrix T.
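The direction-vector approximation of [42] summarized above can be illustrated with a short sketch. It takes a finite point set (standing in for E′) and a list of direction vectors, records the minimum and maximum inner products as in (4.21), and tests membership in the resulting polytope. The function names, the triangle used as data, and the choice of unnormalized directions are assumptions made for this example.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def bounding_polytope(points, directions):
    """Return (l, u): min and max inner product of each direction with the points."""
    lower, upper = [], []
    for n in directions:
        products = [dot(n, p) for p in points]
        lower.append(min(products))
        upper.append(max(products))
    return lower, upper


def contains(point, directions, lower, upper, tol=1e-12):
    """Check l <= U p <= u row by row."""
    return all(lo - tol <= dot(n, point) <= hi + tol
               for n, lo, hi in zip(directions, lower, upper))


# Hypothetical point set: corners of a triangle in two dimensions.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]

# Axis directions only: the approximation is the unit square.
axes = [(1.0, 0.0), (0.0, 1.0)]
l_axes, u_axes = bounding_polytope(points, axes)

# Adding the direction (1, 1) recovers the slanted face of the triangle.
richer = axes + [(1.0, 1.0)]
l_rich, u_rich = bounding_polytope(points, richer)

outside = (0.9, 0.9)   # outside the triangle, inside the unit square
```

The point (0.9, 0.9) is accepted with axis directions alone and rejected once the third direction is added, which shows both that the approximate polytope contains the exact hull and that its tightness depends on the directions chosen.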

Direction Vectors

In the approximate algorithm to compute the convex hull for a set of points [42], the direction

vectors are important as they are the normal vectors to the hyperplanes that will describe the

convex hull. These hyperplanes circumscribe the exact convex hull meaning that the polytope

resulting from the approximate algorithm contains the original or exact convex hull. This will be

proved later in the text. To better approximate the convex hull, we need to find the direction

vectors such that the original convex hull is tightly contained by the approximate polytope.

Since the direction vectors are spread out uniformly around the unit hypersphere [42], the

quality of approximation is directly dependent on the number of these direction vectors. This

implies that we need a large number of direction vectors to get a good approximation for

the original space, which can be expensive because an “eopt” operation is required for every

direction vector.

We observe that the feasible space in our case is irregular meaning that there are certain

regions that require more detailing (more direction vectors) than others. The matrix U is such

that every row of the matrix corresponds to a direction vector that describes F . In our work, the

matrix (J) transforms the original current source configuration to the modified current source

configuration in which the Norton equivalent current sources are added at the port nodes.

Geometrically, it means that a point inside the n-dimensional space ($\mathcal{F}$) is mapped to a point inside the η-dimensional space ($\hat{\mathcal{F}}$). Similarly, a direction vector (in $\mathcal{F}$) can be projected onto the η-dimensional space by using the transformation (J). Using the above observations, we came up with a heuristic to find the direction vectors.

The first n rows of U are composed of an identity matrix In that represents the local constraints on the current sources. Using the same construction, the first η rows of the reference direction vector matrix $\tilde{U}$ are composed of an identity matrix Iη that represents the local constraints on the current sources present in the modified configuration. In the original constraint matrix, the global constraints are represented by the last m rows of U. We use J to project the hyperplanes representing the global constraints into the new η-dimensional space. Therefore, the last m rows of $\tilde{U}$, denoted by Um, can be given by:

$$
U_m = \begin{bmatrix} J U^T_{(n+1)} \\ J U^T_{(n+2)} \\ \vdots \\ J U^T_{(m+n)} \end{bmatrix}
\tag{4.22}
$$

where $U^T_{(k)}$ is the kth column of $U^T$. Using the above construction, we get an (m + η) × η matrix $\tilde{U}$. The convex polytope $\tilde{\mathcal{F}}$ obtained by using the direction vectors given by $\tilde{U}$ will be able to better approximate $\hat{\mathcal{F}}$ because the direction vectors are obtained using the same transformation and the same construction that was used to obtain $\hat{\mathcal{F}}$ from $\mathcal{F}$, and are therefore better able to identify the regions that need more detailing.

New Feasible Space

After finding the direction vectors, the authors in [42] compute l and u using (4.21). Using the same technique, the feasible space $\tilde{\mathcal{F}}$ can now be defined as:

$$
\tilde{\mathcal{F}} = \{ i'_s : l \le \tilde{U} i'_s \le u \} \tag{4.23}
$$

with

$$
(l, u) = \operatorname*{eopt}_{\forall i'_s \in E'} \left(\tilde{U} i'_s\right) \tag{4.24}
$$

Using (4.2) and the fact that E′ = (JE)|η, (4.24) can also be written as:

$$
(l, u) = \operatorname*{eopt}_{\forall i_s \in E} \left(\tilde{U} (J i_s)|_\eta\right) \tag{4.25}
$$

Since E is the set of extreme points of $\mathcal{F}$ and the solution to an LP is found at an extreme point of the feasible space [41], we have:

$$
(l, u) = \operatorname*{eopt}_{\forall i_s \in \mathcal{F}} \left(\tilde{U} (J i_s)|_\eta\right) \tag{4.26}
$$

The matrix J is composed of an identity matrix and the transformation matrix (T). Since T is a non-negative matrix (proved in section 3.2), J is also a non-negative matrix. The constraint matrix U is composed of 0s and 1s and is thus non-negative. Therefore, the new constraint matrix $\tilde{U}$ in (4.22) is also a non-negative matrix. From (2.15) and the arguments above, the result of the minimization operation in (4.26) is a zero vector, which implies that l = 0. Therefore, (4.26) reduces to:

$$
u = \operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(\tilde{U} (J i_s)|_\eta\right) \tag{4.27}
$$

This implies that we need η + m optimizations to compute u. The new local constraints can be separated out from u as:

$$
i'_L = \operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(I_\eta (J i_s)|_\eta\right) \tag{4.28}
$$

Algorithm 3 compute new space(U, J)

Input: U in (2.15) and J

Output: New feasible space F ′ in (4.33)

1: Initialize $\tilde{U}$ to a zero matrix of size (η + m) × η

2: for (j = 1, . . . , next + nprt) do

3: $\tilde{U}$[j][j] = 1

4: end for

5: Compute Um using (4.22)

6: Construct $\tilde{U}$

7: Compute i′′L using (4.30)

8: for (j = 1, . . . ,m) do

9: Multiply the jth row of Um by the first η rows of J to get a row vector e

10: Maximize: e · is, subject to: is ∈ F

11: end for

12: Construct u′ using (4.32)

while the new global constraints will be given by:

$$
i'_G = \operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(U_m (J i_s)|_\eta\right) \tag{4.29}
$$

Further reduction in the number of LP solves can be made by using the original local constraints

to compute i′L instead of maximizing over the feasible space F . The new local constraints can

then be computed by:

i′′L = JiL (4.30)

The results of maximization using the original local constraints instead of the original feasi-

ble space are pessimistic because it can never be the case that all the chip components are

simultaneously drawing maximum currents, implying that:

i′′L ≥ i′L (4.31)
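A small worked example with hypothetical numbers illustrates (4.31). Assume a single port whose own source has a local bound of 0.5 A, two internal sources $i_1$ and $i_2$ with local bounds of 1 A each, and a row of T equal to [0.6 0.4]. Assuming further that the port row of J is the identity entry followed by that row of T, the Norton-equivalent port current is $i' = i_p + 0.6\, i_1 + 0.4\, i_2$. Finally, assume one global constraint $i_p + i_1 + i_2 \le 1.5$ A. Then:

$$
i''_L = 0.5 + 0.6\,(1) + 0.4\,(1) = 1.5\ \text{A}
$$

$$
i'_L = \max_{i_s \in \mathcal{F}} \left(i_p + 0.6\, i_1 + 0.4\, i_2\right) = 0.5 + 0.6\,(1) + 0.4\,(0) = 1.1\ \text{A}
$$

The maximum in the second line is attained at $i_p = 0.5$, $i_1 = 1$, $i_2 = 0$, which exhausts the global budget on the sources with the largest coefficients. The bound obtained from local constraints alone is therefore looser, $i''_L = 1.5 \ge 1.1 = i'_L$, but it requires no LP solve.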

Algorithm 4 NEW INCR VERIFY

Input: Partitioned power grid matrices in (2.3), τN , κ, δ1 and δ2

Output: Upper bounds on worst-case voltage drops for external nodes

1: Construct subgrid matrices in (3.21)

2: (T, Gsub, Csub) = MACRO(subgrid matrices, τN , κ, δ2)

3: Construct J, G, C and A

4: F ′ = compute new space(U, J)

5: for (j = 1, . . . , η) do

6: Compute the jth row of A−1 using SPAI [8] with δ = δ1

7: Use the first η elements of the row to get row vector d

8: Maximize: d · i′s, subject to: i′s ∈ F ′

9: end for

10: Compute vub using (4.34)

Combining (4.29) and (4.30), we get:

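$$
u' = \begin{bmatrix} i''_L \\ i'_G \end{bmatrix} \tag{4.32}
$$

so that the new feasible space is given by:

$$
\mathcal{F}' = \{ i'_s : 0 \le \tilde{U} i'_s \le u' \} \tag{4.33}
$$

From (4.31), u′ ≥ u, implying that $\tilde{\mathcal{F}} \subseteq \mathcal{F}'$. The approach to compute F ′ is presented in Algorithm 3. Equation (4.15) can now be restated as:

$$
\bar{v}'_{ub} =
\left(I + G^{-1}\frac{C}{\Delta t}\right)
\begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & 0 \\ 0 & T^T & A_{int}^{-1} \end{bmatrix}
\begin{bmatrix}
\operatorname*{emax}_{\forall i'_s \in \mathcal{F}'} \left(A^{-1}|_\eta i'_s\right) \\
i_{L,int}
\end{bmatrix}
\tag{4.34}
$$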

The incremental verification approach using the new feasible space (F ′) is given in Algorithm 4.

To comment on the quality of results obtained by using the approach described above, we make

the following claim:

Table 4.1: Speed and accuracy after reducing the dimensions of the feasible space of currents compared to incremental verification approach presented in chapter 3

| Power grid | n | Sub-grid nint | Sub-grid nprt | Max error (mV) | Avg. % error | CPU time (original) | CPU time (modified) | Speed-up |
|---|---|---|---|---|---|---|---|---|
| H1 | 23,378 | 11,484 | 118 | 0.11 | 0.22 | 0.39 h | 0.28 h | 1.39x |
| H2 | 36,277 | 18,616 | 148 | 0.06 | 0.12 | 0.5 h | 0.43 h | 1.16x |
| H3 | 52,359 | 31,098 | 176 | 0.01 | 0.022 | 0.61 h | 0.52 h | 1.17x |
| H4 | 72,692 | 47,450 | 348 | 0.05 | 0.087 | 0.69 h | 0.63 h | 1.1x |
| H5 | 98,162 | 68,972 | 402 | 0.02 | 0.026 | 0.77 h | 0.68 h | 1.14x |
| H6 | 143,561 | 85,600 | 440 | 0.03 | 0.028 | 0.97 h | 0.81 h | 1.19x |
| H7 | 162,087 | 124,824 | 518 | 0.05 | 0.063 | 1.25 h | 1.1 h | 1.13x |

Claim 3. The feasible space of voltages generated by the modified current constraints de-

scribed by F ′ contains the original feasible space of voltages generated by the original current

constraints.

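Proof. Since $\tilde{\mathcal{F}} \subseteq \mathcal{F}'$, we have:

$$
\operatorname*{emax}_{\forall i'_s \in \mathcal{F}'} \left(A^{-1}|_\eta i'_s\right)
\ge \operatorname*{emax}_{\forall i'_s \in \tilde{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
\tag{4.35}
$$

Recall that E′ is a superset of the extreme points of $\hat{\mathcal{F}}$ and the solution to an LP is found at an extreme point, which implies that (4.24) can be written as:

$$
(l, u) = \operatorname*{eopt}_{\forall i'_s \in \hat{\mathcal{F}}} \left(\tilde{U} i'_s\right) \tag{4.36}
$$

From (4.23) and (4.36),

$$
\begin{aligned}
\forall i'_s \in \hat{\mathcal{F}} &\Rightarrow \{ l \le \tilde{U} i'_s \le u \} \\
&\Rightarrow i'_s \in \tilde{\mathcal{F}} \\
&\Rightarrow \hat{\mathcal{F}} \subseteq \tilde{\mathcal{F}} \\
&\Rightarrow \operatorname*{emax}_{\forall i'_s \in \tilde{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
\ge \operatorname*{emax}_{\forall i'_s \in \hat{\mathcal{F}}} \left(A^{-1}|_\eta i'_s\right)
\end{aligned}
\tag{4.37}
$$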

From (4.35), (4.37) and that G−1 ≥ 0, we can conclude that the vector of upper bounds given

by (4.34) is element-wise greater than or equal to the vector given by (4.15). Also, from the

equivalence relation between (4.3) and (4.15), it can be deduced that the feasible space of

voltages generated by the modified current constraints described by F ′ contains the original

feasible space of voltages generated by the original current constraints.

4.2.2 Experimental Results

To test this approach, we implemented Algorithm 3 and Algorithm 4 in C++. The grids generated

were consistent with 1.1 V 65nm CMOS technology and the sub-grid nodes were identified by

the user. Computations were done using a 2.6 GHz Linux machine with 24 GB of RAM. A

SPAI tolerance value of δ1 = 5 mV was used to compute the approximate inverse of A. A lower tolerance value of δ2 = 0.1 mV, a conductance threshold value of κ = 5 × 10⁻³, and a time constant value of τN = 5 ps were used to macromodel the sub-grid using Algorithm 2.

A comparison between the speed and accuracy obtained after reducing the dimensions of

the feasible space of currents (referred to as modified approach) and incremental verification

approach [38] (referred to as original approach) is presented in Table 4.1. We report the

maximum absolute error (in mV) and the average percentage error incurred while performing

verification of external nodes using the modified approach as compared to verification using

the original approach. The results show that we have been able to achieve additional speed-up

(1.18x on average) while incurring negligible loss of accuracy. The speed-up obtained from the

modified approach is due to the fact that the size of LPs has decreased because the internal

current sources have been physically replaced by Norton equivalent current sources at port

nodes.

On average, the size of LPs in terms of the number of variables was reduced by 78.8 %

as compared to the original approach. Although the size of LPs got significantly reduced, we

were not able to get speed-ups to the same scale because the contribution of LP solves in the

CPU time for the original approach is just 18.3 %. Since the modified approach targets this

18.3 % chunk of runtime, the speed-up is not comparable to the amount of reduction in the

size of the LPs. The contribution further gets reduced to only 9 % in the modified approach as shown in Fig. 4.1.

Table 4.2: Runtime breakdown

| Power grid | Total time | MACRO: T | MACRO: Gsub and Csub | F ′ | INCR VERIFY: LP | INCR VERIFY: SPAI | Bound on vint |
|---|---|---|---|---|---|---|---|
| H1 | 0.28 h | 12.17 s | 0.36 min | 0.65 s | 1.83 min | 16.48 min | 0.26 s |
| H2 | 0.43 h | 14.23 s | 0.71 min | 0.63 s | 2.57 min | 22.7 min | 0.45 s |
| H3 | 0.52 h | 27.9 s | 1.17 min | 0.85 s | 3.24 min | 27.3 min | 0.7 s |
| H4 | 0.63 h | 36.7 s | 1.43 min | 0.83 s | 3.76 min | 31.2 min | 1.11 s |
| H5 | 0.68 h | 43.9 s | 2.19 min | 0.93 s | 3.93 min | 34.3 min | 1.61 s |
| H6 | 0.81 h | 1.05 min | 3.18 min | 0.89 s | 4.35 min | 39.4 min | 2.36 s |
| H7 | 1.1 h | 1.44 min | 9.11 min | 1.23 s | 5.02 min | 49.3 min | 3.06 s |

Figure 4.1: A bar graph showing the contributions of major procedures in runtime for both original (chapter 3) and the modified approaches

Figure 4.2: Relative Error Plot for H6 (modified approach)

A similar analysis when done for the traditional full-chip verification flow

showed that the LPs constitute only about 12 % of the runtime. With the modified approach,

we were able to obtain a total speed-up (full-chip vs modified incremental verification) of 10.2x

on average. The runtime breakdown for the modified approach is presented in Table 4.2. It

can be noted from the data presented in Table 4.2 that majority of the CPU time is consumed

in the computation of approximate inverse using SPAI. The relative error plot for H6 grid in

Fig 4.2 shows that the we have been able to bound the error to under 0.5mV for the modified

approach.
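The gap between the LP size reduction and the overall speed-up can be illustrated with Amdahl's law. The following is only a rough sketch: it treats the 18.3 % LP share as a single fraction of the runtime that is accelerated uniformly, and the function name `overall_speedup` and the 5x factor are illustrative assumptions rather than measured quantities.

```python
def overall_speedup(fraction, factor):
    """Amdahl's law: overall speed-up when `fraction` of the runtime is made `factor` times faster."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

lp_share = 0.183  # LP share of CPU time in the original approach (from the text)

# Even if the LP stage took no time at all, the other 81.7 % of the runtime remains.
ceiling = 1.0 / (1.0 - lp_share)

# Hypothetical case: the LP stage alone becomes 5x faster.
example = overall_speedup(lp_share, 5.0)

print(f"ceiling: {ceiling:.3f}x, LPs 5x faster: {example:.3f}x")
```

Under this simplification, no acceleration of the LP stage alone can push the overall speed-up beyond roughly 1.22x, which is in line with the 1.18x average reported above.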

4.3 Chip Power Model

In this section, we propose a framework for power integrity analysis of the off-chip interconnections in a vectorless verification context, by adapting the problem of constructing the CPM to the incremental verification framework. The model used for this work is described in section 2.2.2. Verification of the package (external) nodes can be performed efficiently by macromodeling the on-chip network (sub-grid). The sub-grid in this case is special because all the current sources in the model are located inside the sub-grid (on its internal nodes), so there are no global constraints relating current sources inside the sub-grid to current sources outside it. We adapt our algorithm to allow for verification of external nodes in the RLC verification context.

4.3.1 Adapting to RLC

In the RLC case, upper and lower bounds on voltage drops for r time steps ahead of time are

required. For r = 1 or t = ∆t, the upper and lower bounds on voltage drops can be computed

from (2.29) as:

$$w_1 = \operatorname*{eopt}_{\forall i_s \in \mathcal{F}} \left(D^{-1} i_s\right) \tag{4.38}$$

Using (2.8), the grid equation of the RC sub-grid when the port nodes are shorted to ground

is given by:

$$G_{33}\, v'_{int}(t) + C_{int}\, \dot{v}'_{int}(t) = i_{s,int}(t) \tag{4.39}$$

The Norton equivalent current source vector at port nodes will be given by:

$$i'(\Delta t) = T_{\Delta t}\, i_{s,int}(\Delta t) \tag{4.40}$$

where $T_{\Delta t} = -G_{23} A_{int}^{-1}$ is the transformation matrix that maps the internal current sources to the Norton equivalent current sources at $t = \Delta t$, as in (3.4). We can construct a new feasible space ($\mathcal{F}'_{\Delta t}$) at $t = \Delta t$ using Algorithm 3, where the transformation that maps the original current source configuration to the modified configuration is given by:

$$i'_s(\Delta t) = \begin{bmatrix} I_{ext} & 0 & 0 \\ 0 & I_{prt} & T_{\Delta t} \end{bmatrix} i_s(\Delta t) = \left(J_{\Delta t}\, i_s\right)\big|_{\eta} \tag{4.41}$$
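As a small illustration of the block structure in (4.41), the sketch below assembles the transformation matrix and applies it to a toy current vector. The helper names (`build_J`, `matvec`), the node counts, the ordering of the current vector as [external | port | internal], and the numerical entries of `T` are all assumptions made for the example; the restriction to the index set η is not modeled.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def build_J(n_ext, n_prt, T):
    """Assemble J = [[I_ext, 0, 0], [0, I_prt, T]]; T has n_prt rows and n_int columns."""
    n_int = len(T[0])
    I_ext, I_prt = identity(n_ext), identity(n_prt)
    rows = []
    for i in range(n_ext):
        rows.append(I_ext[i] + [0.0] * (n_prt + n_int))
    for i in range(n_prt):
        rows.append([0.0] * n_ext + I_prt[i] + list(T[i]))
    return rows

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

# Toy sizes: 1 external, 2 port and 3 internal nodes (made-up values).
T = [[0.5, 0.25, 0.0],
     [0.0, 0.25, 0.5]]
J = build_J(1, 2, T)

# Original sources ordered as [ext | prt | int]; only ext and int carry current here.
i_s = [1.0, 0.0, 0.0, 2.0, 4.0, 2.0]
i_s_reduced = matvec(J, i_s)
print(i_s_reduced)  # internal currents now appear as equivalent sources at the ports
```

The reduced vector has one entry per external or port node, which is why the LPs solved over the new feasible space have fewer variables.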

Thus, the upper and lower bounds on voltage drops at r = 1 for external and port nodes are

given by:

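$$\begin{bmatrix} w_{1,ext} \\ w_{1,prt} \end{bmatrix} = \operatorname*{eopt}_{\forall i'_s \in \mathcal{F}'} \begin{bmatrix} v_{ext}(\Delta t) \\ v_{prt}(\Delta t) \end{bmatrix} = \operatorname*{eopt}_{\forall i'_s \in \mathcal{F}'} \left(D^{-1}\big|_{\eta}\, i'_s\right) \tag{4.42}$$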

where D is an n× n reduced system matrix obtained after performing reduction of the on-chip

RC interconnection network. To find an estimate of the upper and lower bounds on voltage

drops for internal nodes, we use the approach similar to the efficient bounds computation

approach described in section 3.2. From (3.2), we have:

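$$\begin{aligned}
v_{int}(\Delta t) &= A_{int}^{-1}\, i_{s,int}(\Delta t) - A_{int}^{-1} G_{23}^{T}\, v_{prt}(\Delta t) \\
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(v_{int}(\Delta t)\right) &\le A_{int}^{-1} \operatorname*{emax}_{\forall i_{s,int} \in \mathcal{F}} \left(i_{s,int}(\Delta t)\right) + T_{\Delta t}^{T} \operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(v_{prt}(\Delta t)\right) \\
\operatorname*{emin}_{\forall i_s \in \mathcal{F}} \left(v_{int}(\Delta t)\right) &\ge A_{int}^{-1} \operatorname*{emin}_{\forall i_{s,int} \in \mathcal{F}} \left(i_{s,int}(\Delta t)\right) + T_{\Delta t}^{T} \operatorname*{emin}_{\forall i_s \in \mathcal{F}} \left(v_{prt}(\Delta t)\right)
\end{aligned} \tag{4.43}$$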

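To the extent that $\operatorname*{emax}_{\forall i_s \in \mathcal{F}} (v_{prt}(\Delta t)) \approx \operatorname*{emax}_{\forall i'_s \in \mathcal{F}'} (v_{prt}(\Delta t))$, (4.43) can also be approximately restated as:

$$\begin{aligned}
\operatorname*{emax}_{\forall i_s \in \mathcal{F}} \left(v_{int}(\Delta t)\right) &\le A_{int}^{-1} \operatorname*{emax}_{\forall i_{s,int} \in \mathcal{F}} \left(i_{s,int}(\Delta t)\right) + T_{\Delta t}^{T} \operatorname*{emax}_{\forall i'_s \in \mathcal{F}'} \left(v_{prt}(\Delta t)\right) \\
\operatorname*{emin}_{\forall i_s \in \mathcal{F}} \left(v_{int}(\Delta t)\right) &\ge A_{int}^{-1} \operatorname*{emin}_{\forall i_{s,int} \in \mathcal{F}} \left(i_{s,int}(\Delta t)\right) + T_{\Delta t}^{T} \operatorname*{emin}_{\forall i'_s \in \mathcal{F}'} \left(v_{prt}(\Delta t)\right) \\
w_{1,int,max} &\approx A_{int}^{-1}\, i_{L,int} + T_{\Delta t}^{T}\, w_{1,prt,max} \\
w_{1,int,min} &\approx T_{\Delta t}^{T}\, w_{1,prt,min}
\end{aligned} \tag{4.44}$$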
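The step from the inequalities to the closed-form estimates can be spelled out as follows. This is a sketch under two assumptions that are not restated in this section: that the internal sources obey local bounds $0 \le i_{s,int} \le i_{L,int}$ as part of the constraint model, and that $A_{int}^{-1}$ is entrywise non-negative.

```latex
0 \;\le\; \operatorname*{emin}_{\forall i_{s,int} \in \mathcal{F}} \bigl(i_{s,int}(\Delta t)\bigr)
\quad\text{and}\quad
\operatorname*{emax}_{\forall i_{s,int} \in \mathcal{F}} \bigl(i_{s,int}(\Delta t)\bigr) \;\le\; i_{L,int}
\;\;\Longrightarrow\;\;
0 \;\le\; A_{int}^{-1} \operatorname*{emin}_{\forall i_{s,int} \in \mathcal{F}} \bigl(i_{s,int}(\Delta t)\bigr),
\qquad
A_{int}^{-1} \operatorname*{emax}_{\forall i_{s,int} \in \mathcal{F}} \bigl(i_{s,int}(\Delta t)\bigr) \;\le\; A_{int}^{-1}\, i_{L,int}
```

Under these assumptions, replacing the element-wise maximum by $i_{L,int}$ and the element-wise minimum by zero yields the two estimates at the end of (4.44), so no optimization over the internal sources is needed.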

Thus, the lower bounds and upper bounds on voltage drops at t = ∆t can be computed by

using (4.42) and (4.44). For any r or t = r∆t, the bounds were expressed in (2.31) as:

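$$\begin{aligned}
w_r &= w_{r-1} + \operatorname*{eopt}_{\forall i_s \in \mathcal{F}} \left[(D^{-1}E)^{r-1} D^{-1} i_s\right] \\
&= \operatorname*{eopt}_{\forall i_s \in \mathcal{F}} \left(v((r-1)\Delta t)\right) + \operatorname*{eopt}_{\forall i_s \in \mathcal{F}} \left[(D^{-1}E)^{r-1} D^{-1} i_s\right]
\end{aligned} \tag{4.45}$$

where

$$v(r\Delta t) = \sum_{k=0}^{r-1} (D^{-1}E)^{k} D^{-1} i_s \tag{4.46}$$

$$\phantom{v(r\Delta t)} = v((r-1)\Delta t) + (D^{-1}E)^{r-1} D^{-1} i_s \tag{4.47}$$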

This implies that to find the bounds for any r, we need to iteratively perform an optimization

operation on voltage drops (given by (4.46)) at t = ∆t, . . . , r∆t such that the optimization

function is given by:

$$(D^{-1}E)^{r-1} D^{-1} i_s \quad \text{at } t = r\Delta t \tag{4.48}$$
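The equivalence between the summed form (4.46) and the incremental form (4.47) can be checked numerically. The sketch below uses small made-up matrices: `M` stands in for $D^{-1}E$ and `Dinv` for $D^{-1}$, and all helper names are hypothetical. It fixes one current vector and performs no optimization over the feasible space.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, x):
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def mat_pow(M, k):
    n = len(M)
    P = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        P = matmul(P, M)
    return P

def v_by_sum(M, Dinv, i_s, r):
    """Summed form (4.46): add up M^k * Dinv * i_s for k = 0 .. r-1."""
    total = [0.0] * len(i_s)
    for k in range(r):
        term = matvec(matmul(mat_pow(M, k), Dinv), i_s)
        total = [t + x for t, x in zip(total, term)]
    return total

def v_incremental(M, Dinv, i_s, r):
    """Incremental form (4.47): v(r dt) = v((r-1) dt) + M^(r-1) * Dinv * i_s."""
    v = [0.0] * len(i_s)
    term = matvec(Dinv, i_s)          # the k = 0 term
    for _ in range(r):
        v = [a + b for a, b in zip(v, term)]
        term = matvec(M, term)        # next power of M applied to Dinv * i_s
    return v

M = [[0.5, 0.1], [0.2, 0.4]]          # illustrative stand-in for D^{-1} E
Dinv = [[0.2, 0.05], [0.05, 0.2]]     # illustrative stand-in for D^{-1}
i_s = [1.0, 2.0]

for r in range(1, 6):
    print(r, v_by_sum(M, Dinv, i_s, r), v_incremental(M, Dinv, i_s, r))
```

The incremental form reuses the previous term, so each additional time step costs one matrix-vector product instead of recomputing a matrix power.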

Similarly, for the RC system when the port nodes are shorted to ground (described by (4.39)),

the voltage at time t = r∆t is given in [8] as:

$$v'_{int}(r\Delta t) = v'_{int}((r-1)\Delta t) + \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right)^{r-1} \left(A_{int}^{-1}\, i_{s,int}\right) \tag{4.49}$$

From the arguments above and using (3.9), the Norton equivalent current vector at port nodes

at t = r∆t is given by:

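$$i'(r\Delta t) = -G_{23}\, v'_{int}(r\Delta t) \;\Rightarrow\; T_{r\Delta t} = -G_{23} \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right)^{r-1} A_{int}^{-1} \tag{4.50}$$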

The new feasible space ($\mathcal{F}'_{r\Delta t}$) at $t = r\Delta t$ can now be defined using Algorithm 5, with the transformation matrix $J_{r\Delta t}$ being composed of $T_{r\Delta t}$. Using an approach similar to the one used to determine $w_1$ after macromodeling, the lower and upper bounds on voltage drops at time $t = r\Delta t$ after macromodeling can be expressed as:

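$$\begin{bmatrix} w_{r,ext} \\ w_{r,prt} \end{bmatrix} = \begin{bmatrix} w_{r-1,ext} \\ w_{r-1,prt} \end{bmatrix} + \operatorname*{eopt}_{\forall i'_s \in \mathcal{F}'} \left[\left((D^{-1}E)^{r-1} D^{-1}\right)\big|_{\eta}\, i'_s\right] \tag{4.51}$$

$$\begin{aligned}
w_{r,int,max} &= \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right)^{r-1} A_{int}^{-1}\, i_{L,int} + T_{r\Delta t}^{T}\, w_{r,prt,max} \\
w_{r,int,min} &= T_{r\Delta t}^{T}\, w_{r,prt,min}
\end{aligned} \tag{4.52}$$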

The choice of r is made in such a way that ‖N‖∞ < 1 and ‖N‖1 < 1, where N = (D−1E)r.

Recall that in (2.32), the bounds at infinity were given by:

$$\begin{bmatrix} v_{ub}(\infty) \\ v_{lb}(\infty) \end{bmatrix} = (I - R)^{-1} w_r$$

where $R$ is a $2n \times 2n$ matrix defined as:

$$R = \begin{bmatrix} N - S & S \\ S & N - S \end{bmatrix}$$

where $S = \frac{1}{2}(N - Q)$, with $Q$ being the matrix of the element-wise absolute values of the entries in $N$. We also have an $n \times n$ reduced-order matrix $\hat{N}$ such that:

$$\hat{N}\big|_{\eta} \approx N\big|_{\eta} \tag{4.53}$$

Using the above approximation, we create an $n \times n$ matrix $N'$ such that:

$$N' = \hat{N} J_{r\Delta t} \tag{4.54}$$

and the bounds at infinity can now be computed as:

$$\begin{bmatrix} v_{ub}(\infty) \\ v_{lb}(\infty) \end{bmatrix} = (I - \hat{R})^{-1} w_r \tag{4.55}$$

where $v_{ub}(\infty)$ and $v_{lb}(\infty)$ are, respectively, the upper and lower bounds on the worst-case voltage drops in the RLC case, with the first $n_{ext}$ entries corresponding to the bounds on worst-case voltage drops on external nodes, and $\hat{R}$ is constructed in the same way as $R$ using $N'$ instead of $N$.
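The block matrix used for the bounds at infinity can be assembled directly from its definition: $S = \frac{1}{2}(N - Q)$ collects the negative entries of $N$, and $N - S$ collects the non-negative ones. The sketch below builds the $2n \times 2n$ matrix for a made-up $2 \times 2$ example; the function name `build_R` and the numerical values are illustrative assumptions.

```python
def build_R(N):
    """Assemble R = [[N - S, S], [S, N - S]] with S = (N - |N|) / 2."""
    n = len(N)
    Q = [[abs(x) for x in row] for row in N]                                # element-wise |N|
    S = [[0.5 * (N[i][j] - Q[i][j]) for j in range(n)] for i in range(n)]   # negative part of N
    P = [[N[i][j] - S[i][j] for j in range(n)] for i in range(n)]           # N - S, non-negative part
    top = [P[i] + S[i] for i in range(n)]
    bottom = [S[i] + P[i] for i in range(n)]
    return top + bottom

N = [[0.5, -0.25],
     [0.125, 0.25]]
R = build_R(N)
for row in R:
    print(row)
```

Each row of the result pairs the non-negative entries of $N$ with one set of bounds and the negative entries with the other, which is how the upper and lower bounds become coupled in $(I - R)^{-1} w_r$.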

It can be noted from (4.50) that computing $T_{r\Delta t}$ for any $r$ requires $A_{int}^{-1}$. Although $T_{r\Delta t}$ can be computed efficiently by constructing the full matrix inverse using SPAI, we found it worthwhile to avoid computing the inverse explicitly by using an iterative approach. From (4.50), we have:

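$$T_{(r-1)\Delta t} = -G_{23} \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right)^{r-2} A_{int}^{-1} \;\Rightarrow\; T_{(r-1)\Delta t} A_{int} = -G_{23} \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right)^{r-2} \tag{4.56}$$

Also, (4.50) can be written as:

$$T_{r\Delta t} = -G_{23} \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right)^{r-2} \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right) A_{int}^{-1} \tag{4.57}$$

Using (4.56) in (4.57),

$$T_{r\Delta t} = T_{(r-1)\Delta t} A_{int} \left(A_{int}^{-1} \frac{C_{int}}{\Delta t}\right) A_{int}^{-1} = T_{(r-1)\Delta t} \frac{C_{int}}{\Delta t} A_{int}^{-1} \;\Rightarrow\; T_{r\Delta t} A_{int} = T_{(r-1)\Delta t} \frac{C_{int}}{\Delta t} \tag{4.58}$$

Taking the transpose on both sides of (4.58) and using the fact that $A_{int}$ is a symmetric matrix, we have:

$$A_{int}\, T_{r\Delta t}^{T} = \frac{C_{int}}{\Delta t}\, T_{(r-1)\Delta t}^{T} \tag{4.59}$$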

Table 4.3: Speed and accuracy after constructing the CPM using Approach I

| Power Grid | n | On-chip nint | On-chip nprt | Max Error vub (mV) | Max Error vlb (mV) | Avg. % Error vub | Avg. % Error vlb | CPU time, Full Grid | CPU time, CPM | Speed-up |
|---|---|---|---|---|---|---|---|---|---|---|
| A1 | 13,905 | 12,577 | 664 | 4 | 2.6 | 8.8 | 1.2 | 47.21 min | 2.6 min | 18.16x |
| A2 | 24,548 | 22,208 | 1,170 | 5.5 | 4 | 10.2 | 5.1 | 2.41 h | 10.8 min | 13.36x |
| A3 | 34,183 | 30,925 | 1,629 | 10.9 | 3.2 | 12.8 | 4.3 | 3.36 h | 18.4 min | 10.95x |
| A4 | 52,968 | 47,920 | 2,524 | 11.19 | 3.2 | 12.9 | 2.7 | 6.27 h | 31.8 min | 11.83x |

Table 4.4: Comparison between the three different approaches to macromodeling

| Power Grid | CPU time, Approach I | CPU time, Approach II | CPU time, Approach III | Speed-up, I vs II | Speed-up, I vs III | Max Error I vs III, vub (mV) | Max Error I vs III, vlb (mV) |
|---|---|---|---|---|---|---|---|
| A1 | 2.6 min | 9.3 min | 4.13 min | 3.57x | 1.58x | 2.6 | 0.6 |
| A2 | 10.8 min | 41.2 min | 12.2 min | 3.81x | 1.12x | 0.53 | 1.3 |
| A3 | 18.4 min | 56.4 min | 20.3 min | 3.06x | 1.1x | 1.2 | 0.49 |
| A4 | 31.8 min | 51.3 min | 35.1 min | 1.63x | 1.14x | 1.7 | 0.47 |

A method to compute $T_{\Delta t}$ without constructing the full matrix $A_{int}^{-1}$ is presented in Algorithm 2. In this method, we first identify, for every port node $k$, its neighbors and the connections ($g_{kj}$) to those neighbors. Then, we compute the corresponding row of the approximate inverse, multiply the row vector by $g_{kj}$, and add the result to the $k$th row of $T_{\Delta t}$. To compute $T_{2\Delta t}$, we solve (4.59) for $T_{2\Delta t}^{T}$ using $T_{\Delta t}$ and then transpose the result to obtain $T_{2\Delta t}$. The same steps can be repeated iteratively to find $T_{r\Delta t}$. The complete algorithm to perform verification of the off-chip interconnects after constructing the CPM is presented in Algorithm 5.
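One step of the iteration based on (4.59) can be sketched as a linear solve followed by a transpose. The example below is illustrative only: a dense Gauss-Jordan solver stands in for the sparse machinery used in the thesis, $C_{int}$ is assumed to be diagonal, $A_{int}$ is taken to be $G_{33} + C_{int}/\Delta t$ for the toy data, and every name and number is hypothetical.

```python
def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting (dense, toy sizes only)."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [x / piv for x in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def next_T(T_prev, A_int, c_over_dt):
    """Solve A_int * X = (C_int / dt) * T_prev^T for X = T_next^T, then transpose back."""
    Tt_prev = transpose(T_prev)                       # n_int x n_prt
    rhs = [[c_over_dt[i] * x for x in Tt_prev[i]]     # diagonal C_int / dt scales row i
           for i in range(len(Tt_prev))]
    return transpose(solve(A_int, rhs))

# Toy sub-grid: 2 internal nodes, 1 port node (made-up values).
A_int = [[4.0, -1.0], [-1.0, 4.0]]    # assumed G33 + C_int/dt, symmetric
c_over_dt = [1.0, 1.0]                # diagonal of C_int / dt
G23 = [[-1.0, 0.0]]                   # port-to-internal coupling

# T_dt = -G23 * A_int^{-1}, obtained by solving A_int * T_dt^T = -G23^T.
T1 = transpose(solve(A_int, [[-g] for g in G23[0]]))
T2 = next_T(T1, A_int, c_over_dt)
print(T1, T2)
```

Only linear solves with $A_{int}$ are needed, so the full inverse never has to be formed or stored.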

4.3.2 Experimental Results

A C++ implementation was written to test the proposed approach. The generated grids were consistent with a 1.1 V, 65 nm CMOS technology, and the chip-to-package interconnections were modeled as tank circuits [3]. Inductance values were specified based on the technology specifications. A SPAI error tolerance of $\delta_1 = 5$ mV was used to compute the approximate inverse ($D^{-1}$). To macromodel the on-chip RC grid, a SPAI error tolerance of $\delta_2 = 0.1$ mV, a conductance threshold $\kappa = 5 \times 10^{-3}$, and a nodal time constant $\tau_N = 5$ ps were used. Three different approaches to macromodeling the on-chip interconnects were implemented: 1) in Approach I, the CPM is constructed using the dimension-reduced feasible space and $T_{r\Delta t}$ is computed using (4.59); 2) in Approach II, the CPM is constructed using reduced dimensions while $T_{r\Delta t}$ is evaluated by constructing the full matrix $A_{int}^{-1}$; and 3) in Approach III, the original incremental verification approach presented in chapter 3 is adapted to the RLC case to construct the CPM, and $T_{r\Delta t}$ is computed using the iterative approach (4.59).

Figure 4.3: Relative Error Plot for A4 ($v_{ub}(\infty)$); horizontal axis: voltage drop (V)

Table 4.3 gives the speed and accuracy achieved by performing verification of the off-chip interconnects after constructing the CPM for the on-chip grid using Approach I. The use of the CPM results in a significant speed-up, because verification is performed only on the nodes that need to be verified and the optimizations are performed on a reduced problem (as a result of macromodeling). The error incurred is primarily due to the approximation involved in reducing the sub-grid. This error can be reduced by adjusting the values of $\kappa$ and $\tau_N$, at some cost in verification speed. The relative error plots for the upper bound ($v_{ub}(\infty)$) and the lower bound ($v_{lb}(\infty)$) for the A4 grid are presented in Fig. 4.3 and Fig. 4.4, respectively.

Figure 4.4: Relative Error Plot for A4 ($v_{lb}(\infty)$); horizontal axis: voltage drop (V)

Table 4.4 provides representative data for comparing the three approaches to constructing the CPM. Approach I leads to a significant speed-up (3.01x on average) over Approach II, in which the full inverse matrix is constructed. We were also able to achieve a speed-up over Approach III without incurring significant error; this speed-up results from the reduction in the size of the LPs in terms of the number of variables. The data confirms that using an iterative approach to compute $T_{r\Delta t}$ saves runtime and reduces memory usage, because the full inverse matrix need not be stored.

Algorithm 5 CPM_VERIFY

Input: Partitioned power grid matrices in (2.8), $\tau_N$, $\kappa$, $\delta_1$ and $\delta_2$

Output: Upper bound and lower bound estimates at infinity for external nodes

2: $(T_{\Delta t}, G_{sub}, C_{sub})$ = MACRO(subgrid matrices, $\tau_N$, $\kappa$, $\delta_2$)

3: Construct $E$ and $D$

4: Compute $D^{-1}$ using SPAI with $\delta = \delta_1$

5: $r = 1$

6: Construct $J_{\Delta t}$ using (4.41)

7: $\mathcal{F}'_{\Delta t}$ = compute_new_space($U$, $J_{\Delta t}$)

8: Compute $w_1$ using (4.42) & (4.44)

9: $N = D^{-1}E$

10: while ($\min(\|N\|_\infty, \|N\|_1) \ge 1$) do

11:     $r = r + 1$

12:     Compute $T_{r\Delta t}^{T}$ using (4.59)

13:     $T_{r\Delta t} = (T_{r\Delta t}^{T})^{T}$

14:     $\mathcal{F}'_{r\Delta t}$ = compute_new_space($U$, $J_{r\Delta t}$)

15:     Compute $w_r$ using (4.51) & (4.52)

16:     $N = N \times (D^{-1}E)$

17: end while

18: Compute upper and lower bounds using (4.55)
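The stopping rule of the while loop in Algorithm 5 (steps 9 to 17) can be sketched in isolation. The code below tracks only the matrix power and its norms; the LP solves and feasible-space updates inside the loop are omitted. The function name `choose_r`, the `max_r` guard and the example matrix standing in for $D^{-1}E$ are assumptions made for illustration.

```python
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def norm_inf(M):
    """Maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in M)

def norm_1(M):
    """Maximum absolute column sum."""
    return max(sum(abs(x) for x in col) for col in zip(*M))

def choose_r(M, max_r=1000):
    """Return the first r (and N = M^r) for which min(||N||_inf, ||N||_1) < 1."""
    r, N = 1, M
    while min(norm_inf(N), norm_1(N)) >= 1.0:
        if r >= max_r:
            raise ValueError("norm condition not reached; powers of M may not decay")
        r += 1
        N = matmul(N, M)
    return r, N

# Illustrative stand-in for D^{-1} E: both norms are 1.1 for the first power.
M = [[0.5, 0.6],
     [0.0, 0.5]]
r, N = choose_r(M)
print(r, N)
```

The guard on `max_r` is an addition for safety in this sketch: the loop terminates on its own only if the powers of the matrix decay.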

Chapter 5

Conclusions & Future Work

With technology scaling, voltage integrity analysis of power grids is becoming increasingly important. The main problem with simulation-based verification approaches is that the number of possible input vectors is so large that grid simulation becomes impractical. Another drawback is that simulation does not allow the designer to verify the grid before the complete circuit has been designed, which makes it difficult to modify the power grid. As a result, an early vectorless verification approach based on the notion of current constraints is adopted in this work. The current constraints capture the uncertainty about the circuit behavior. Grid verification is thus reduced to the problem of finding the worst-case voltage drop over all possible currents that satisfy the constraints, which in traditional vectorless verification requires solving an LP for every node.

Another trend that has emerged from technology scaling is that power grids are becoming so large that traditional flat (full-chip) verification is overkill. What is needed are efficient divide-and-conquer approaches that scale well with the increasing size of power grids. One such approach is incremental verification, which allows for efficient verification of only those sections of the grid that the user is interested in verifying. It also creates an avenue for analyzing the local impact of changes made to the design of the power grid. In this work, we describe an early incremental verification approach for RC grids under a constraints-based power grid verification framework, in which only the nodes that are external to a sub-grid region are to be verified. Our approach gives a fast and accurate way to compute the upper bounds on worst-case voltage drops at external nodes, based on two contributions: 1) an upper-bound method that eliminates the need to perform multiple iterations, and 2) a macromodeling method that drastically reduces the internals of the sub-grid. As a result, 3-8x speed-ups are obtained, with a negligible error of 1-2 %. With the proposed approach, it becomes practical to perform early incremental design verification of the on-die power grid under dynamic conditions.

We have also extended this approach to efficiently perform vectorless constraints-based verification of off-chip RLC interconnects by constructing the Chip Power Model (CPM) for the on-chip interconnects. An optimized version of the incremental verification approach, which constructs a new dimension-reduced feasible space of currents, is used while constructing the CPM. The results show significant speed-ups without considerable loss of accuracy. The loss of accuracy can be reduced further by adjusting the parameters that guide the order reduction of the passive RC sub-grid, at the cost of some reduction in the speed-up.

Modern power grids are made up of many metal layers, such that the higher metal layers provide connections to the supply while the lower metal layers deliver the supply to the underlying logic circuitry. As future work, the incremental approach can be used to macromodel the higher metal layers to a point where it becomes computationally efficient to verify the lower metal layers without incurring much loss of accuracy. Similarly, incremental verification can be used to guide the placement of the current sources in the power grid model that contribute to the worst-case voltage drop. Usually, the current sources are distributed uniformly over the entire block. The problem with such an approach is that it does not guarantee worst-case switching activity for the grid. Therefore, macromodeling can be used to abstract the current sources to the block level, such that worst-case voltage behavior is ensured.

Bibliography

[1] N. H. Abdul Ghani and F. N. Najm. Fast vectorless power grid verification under an RLC model. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(5):691–703, May 2011.

[2] R. Panda, D. Blaauw, R. Chaudhry, V. Zolotov, B. Young, and R. Ramaraju. Model

and analysis for combined package and on-chip power grid simulation. In International

Symposium on Low Power Electronics and Design, pages 179–184, 2000.

[3] Q. K. Zhu. Power Distribution Network Design for VLSI. John Wiley and Sons, New

Jersey, 2004.

[4] Semiconductor Industry Association. International Technology Roadmap for Semiconduc-

tors. 2011.

[5] D. Kouroussis and F. N. Najm. A static pattern-independent technique for power grid

voltage integrity verification. In ACM/IEEE Design & Automation Conference, pages

99–104, Anaheim, CA, Jun. 2-6 2003.

[6] M. Nizam, F. N. Najm, and A. Devgan. Power grid voltage integrity verification. In

International Symposium on Low Power Electronics and Design, pages 239–244, San Diego,

CA, Aug. 8-10 2005.

[7] I. A. Ferzli, F. N. Najm, and L. Kruze. A geometric approach for early power grid veri-

fication using current constraints. In ACM/IEEE International Conference on Computer

Aided Design, pages 40–47, San Jose, CA, Nov. 5-8 2007.


[8] N. H. Abdul Ghani and F. N. Najm. Fast vectorless power grid verification using an approximate inverse technique. In ACM/IEEE Design & Automation Conference, San Francisco, CA, Jul. 26-31 2009.

[9] Woo Hyung Lee, S. Pant, and D. Blaauw. Analysis and reduction of on-chip inductance

effects in power supply grids. In International Symposium on Quality of Electronic Design,

pages 131–136, 2004.

[10] N. Srivastava, Qi Xiaoning, and K. Banerjee. Impact of on-chip inductance on power distri-

bution network design for nanometer scale integrated circuits. In International Symposium

on Quality of Electronic Design, pages 346–351, 2005.

[11] E. Kulali, E. Wasserman, and J. Zheng. Chip power model - a new methodology for

system power integrity analysis and design. In IEEE Electrical Performance of Electronic

Packaging, pages 259–262, Atlanta, GA, Oct. 29-31 2007.

[12] D. Kouroussis, I. A. Ferzli, and F. N. Najm. Incremental partitioning-based vectorless

power grid verification. In ACM/IEEE International Conference on Computer Aided De-

sign, pages 358–364, San Jose, CA, November 6-10 2005.

[13] Y.-M. Lee, Y. Cao, T.-H. Chen, J.M. Wang, and C.C.-P. Chen. HiPRIME: hierarchical and

passivity preserved interconnect macromodeling engine for RLKC power delivery. IEEE

Transactions on Computer-Aided Design of Integrated Circuits and Systems, 24(6):797–

806, June 2005.

[14] A. Muramatsu, M. Hashimoto, and H. Onodera. Effects of on-chip inductance on power

distribution grid. In International Symposium on Physical Design, pages 63–69, 2005.

[15] Samuel L. Oppenheimer, Jean Paul Borchers, and F. Roger Hess. Direct and Alternating

Currents. McGraw-Hill, New York, 1973.

[16] Wilhelmus H. A. Schilders, Henk A. van der Vorst, and Joost Rommes. Model Order Reduction: Theory, Research Aspects and Applications. Springer, Berlin, 2008.


[17] Sheldon X.-D Tan and Lei He. Advanced Model Order Reduction Techniques in VLSI

Design. Cambridge University Press, Cambridge, UK, 2007.

[18] B. Sheehan. Realizable reduction of RC networks. IEEE Transactions on Computer-Aided

Design of Integrated Circuits and Systems, 26(8):1393–1407, August 2007.

[19] L. T. Pillage and R. A. Rohrer. Asymptotic waveform evaluation for timing analysis. IEEE

Transactions on Computer-Aided Design of Integrated Circuits and Systems, 9(4):352–366,

April 1990.

[20] Y. Ismail. Efficient model order reduction via multi-node moment matching. In ACM/IEEE

International Conference on Computer-Aided Design, pages 767–774, Nov 2002.

[21] Haifang Liao and W. Wei-Ming Dai. Partitioning and reduction of RC interconnect

networks based on scattering parameter macromodels. In Digest of Technical Papers,

ACM/IEEE International Conference on Computer-Aided Design, pages 704–709, Nov

[22] P. Miettinen, M. Honkala, J. Roos, C. Neff, and A. Basermann. Study and development of

an efficient RC-in-RC-out MOR method. In IEEE International Conference on Electronics,

Circuits and Systems, pages 1277–1280, Aug 2008.

[23] G. Karypis, R. Aggarwal, V. Kumar, and S. Shekhar. Multilevel hypergraph partitioning:

Applications in VLSI domain. In ACM/IEEE Design & Automation Conference, pages

526–529, 1997.

[24] P. Miettinen, M. Honkala, J. Roos, and M. Valtonen. PartMOR: Partitioning-based realizable model-order reduction method for RLC circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(3):374–387, March 2011.

[25] R. Ionutiu, J. Rommes, and Wil H. A. Schilders. SparseRC: Sparsity preserving model reduction for RC circuits with many terminals. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 30(12):1828–1841, 2011.


[26] A. Odabasioglu, M. Celik, and L. T. Pileggi. PRIMA: Passive reduced-order interconnect macromodeling algorithm. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(8):645–654, August 1998.

[27] P. Liu, Sheldon X.-D Tan, B. McGaughy, Lifeng Wu, and Lei He. TermMerg: An efficient

terminal reduction method for interconnect circuits. IEEE Transactions on Computer-

Aided Design of Integrated Circuits and Systems, pages 1382–1392, 2007.

[28] R. W. Freund. SPRIM: Structure-preserving reduced-order interconnect modeling. In

ACM/IEEE International Conference on Computer-Aided Design, pages 80–87, 2004.

[29] Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press,

Cambridge, 1985.

[30] Hao Yu, Lei He, and Sheldon X.-D Tan. Block structure preserving model order reduc-

tion. In IEEE International Behavioral Modeling and Simulation Workshop, pages 1–6,

September 2005.

[31] T. Kailath. Linear Systems. Prentice-Hall, Englewood Cliffs, New Jersey, 1989.

[32] J. R. Phillips and L. M. Silveira. Poor man’s TBR: a simple model reduction scheme. In

European Design and Test Conference (DATE), pages 938–943, 2004.

[33] R. Inotiu, J. Rommes, and Wil H. A. Schilders. Compact modeling of interconnect circuits

over wide frequency band by adaptive complex-valued sampling method. ACM Transac-

tions on Design Automation of Electronic Systems, 17(1), 2012.

[34] C. S. Amin, M. H. Chowdhury, and Y. I. Ismail. Realizable RLCK circuit crunching. In

ACM/IEEE Design & Automation Conference, pages 226–231, June 2003.

[35] M. Zhao, R. Panda, S. Sapatnekar, and D. Blaauw. Hierarchical analysis of power dis-

tribution networks. IEEE Transactions on Computer-Aided Design of Integrated Circuits

and Systems, 21(2):159–168, 2002.


[36] Chun-Jen Wei, H. Chen, and Sao-Jie Chen. Design and implementation of block-based

partitioning for parallel flip-chip power-grid analysis. IEEE Transactions on Computer-

Aided Design of Integrated Circuits and Systems, 31(3):370–379, March 2012.

[37] J. M. S. Silva, J. R. Phillips, and L. M. Silveira. Efficient simulation of power grids. IEEE

Transactions on Computer-Aided Design of Integrated Circuits and Systems, 29(10):1523–

1532, October 2010.

[38] Abhishek and F. N. Najm. Incremental power grid verification. In ACM/IEEE Design & Automation Conference, pages 151–156, San Francisco, CA, June 3-7 2012.

[39] K.J. Kerns and A.T. Yang. Stable and efficient reduction of large, multiport RC networks

by pole analysis via congruence transformations. IEEE Transactions on Computer-Aided

Design of Integrated Circuits and Systems, 16(7):734–744, 1997.

[40] The MOSEK optimization software (www.mosek.com).

[41] Arne Brøndsted. An Introduction to Convex Polytopes (Graduate Texts in Mathematics).

Springer-Verlag, New York, Heidelberg, Berlin, 1983.

[42] Zong-Ben Xu, Jiang-She Zhang, and Yiu-Wing Leung. An approximate algorithm for computing multidimensional convex hulls. Applied Mathematics and Computation, 94(2–3):193–226, 1998.
