
Circuit Simulation via Matrix Exponential Method

Speaker: Shih-Hung Weng

Adviser: Chung-Kuan Cheng

Date: 05/31/2013

Foundation of Design Flow

[Figure: design-flow stages (logic synthesis, placement, routing, timing analysis, …) built on an abstraction layer of lookup tables obtained by characterization with circuit simulation]

Emerging Demands

• Full system verification and analysis
– scalability and performance

[Figure: voltage versus time on an on-chip power grid, showing low-frequency behavior]

Publications (1/3)

• Circuit Simulation with Matrix Exponential Method:

1. S.-H. Weng, H. Zhuang and C.K. Cheng, “Adaptive Time Stepping for Power Grid Simulation using Matrix Exponential Method”, submitted to IEEE ICCAD 2013

2. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation using Matrix Exponential Method for Stiffness Handling and Parallel Processing”, IEEE ICCAD, Nov. 2012

3. Q. Chen, W. Schoenmaker, S.-H. Weng, C.K. Cheng, G.-H. Chen, L.-J. Jiang and N. Wong, “A Fast Time-Domain EM-TCAD Coupled Simulation Framework via Matrix Exponential,” IEEE ICCAD, Nov. 2012 (Best Paper Award Candidate)

4. Y. Li, Q. Cheng, S.-H. Weng, C.K. Cheng and N. Wong, “Globally Stable, Highly Parallelizable Fast Transient Circuit Simulation via Faber Series”, IEEE NewCAS, May 2012

5. S.-H. Weng, Q. Chen and C.K. Cheng, “Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method with Adaptive Control”, IEEE Trans. on CAD, Jul. 2012

6. Q. Chen, S.-H. Weng and C.K. Cheng, “A Practical Regularization Technique for Modified Nodal Analysis in Large-Scale Time-Domain Circuit Simulation”, IEEE Trans. on CAD, Jun. 2012

7. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation by Matrix Exponential Method,” IEEE ASIC Conference, Oct. 2011

8. S.-H. Weng, P. Du and C.K. Cheng, “A Fast and Stable Explicit Integration Method by Matrix Exponential Operator for Large Scale Circuit Simulation”, IEEE ISCAS, May 2011

Publications (2/3)

• Clock Gating Synthesis:

9. S.-H. Weng, Y.-M. Kuo and S.-C. Chang, “Timing Optimization in Sequential Circuit by Exploiting Clock-Gating Logic,” ACM Trans. on DAES, Apr. 2012

10. Y.-M. Kuo, S.-H. Weng, and S.-C. Chang, “A Novel Sequential Circuit Optimization with Clock Gating Logic,” IEEE ICCAD, Nov. 2008

• High-speed Interconnect:

11. G. Sun, S.-H. Weng, C.K. Cheng, B. Lin and L. Zeng, “An On-Chip Global Broadcast Network Design with Equalized Transmission Lines in the 1024-Core Era”, IEEE SLIP, Jun. 2012

12. S.-H. Weng, Y. Zhang, J. F. Buckwalter and C.K. Cheng, “Energy Efficiency Optimization through Co-Design of the Transmitter and Receiver in High-Speed On-Chip Interconnects”, accepted by IEEE Trans. on VLSI

• Placement and Routing:

13. C.K. Cheng, P. Du, A.B. Kahng and S.-H. Weng, “Low-Power Gated Bus Synthesis for 3D IC via Rectilinear Shortest-path Steiner Graph,” IEEE ISPD, Mar. 2012

14. P. Du, W. Zhao, S.-H. Weng, C.K. Cheng and R.L. Graham, “Character Design and Stamp Algorithms for Character Projection Electron-Beam Lithography,” IEEE ASPDAC, Feb. 2012

Publications (3/3)

• Power Grid Analysis:

15. X. Hu, P. Du, S.-H. Weng and C.K. Cheng, “Worst-Case Noise Prediction With Non-zero Current Transition Times for Power Grid Planning,” accepted by IEEE Trans. on VLSI.

16. C.-C. Chou, H.-H. Chuang, T.-L. Wu, S.-H. Weng, and C.K. Cheng, “Eye Prediction of Digital Driver with Power Distribution Network Noise,” IEEE EPEPS, Nov. 2012 (Best Student Paper Award)

17. P. Du, S.-H. Weng, X. Hu and C.K. Cheng, “Power Grid Sizing via Convex Programming,” IEEE ASIC Conference, Oct. 2011

18. P. Du, X. Hu, S.-H. Weng, A. Shayan, X. Chen, A. E. Engin and C.K. Cheng, “Worst-Case Noise Prediction with Non-zero Current Transition Times for Early Power Distribution System Verification,” IEEE ISQED, Mar. 2010

19. S.-H. Weng, Y.-M. Kuo, S.-C. Chang, and M. Marek-Sadowska, “Timing Analysis Considering IR Drop Waveforms in Power Gating Designs,” IEEE ICCD, Oct. 2008

Outline

• Numerical Integration in Circuit Simulation

• Matrix Exponential Method
– Krylov Subspace Approximation
– Rational Krylov Subspace Approximation
– Parallelism

• Experimental Results

• Conclusions

Circuit Formulation

• Formulated as a system of DAEs [Ho et al. ‘75]

$$\frac{d}{dt}q(x(t)) + C_L\,\dot{x}(t) = -G_L\,x(t) - i(x(t)) + u(t)$$

– $x$: branch currents and nodal voltages
– $C_L$: capacitance and inductance
– $G_L$: resistance and incidence
– $\frac{d}{dt}q(x)$: derivative of charges in nonlinear devices
– $i(x)$: currents of nonlinear devices
– $u$: input sources
– the nonlinear devices are linearized by compact models (BSIM, PSP, etc.)

• Solve x(t) with an implicit or explicit numerical method; after linearization the system reads

$$C\,\dot{x}(t) = -G\,x(t) + u(t)$$

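Numerical Integration (1/2)

• Forward Euler (1st-order explicit): each step is a sparse matrix-vector product

$$x_{n+1} = (C/h)^{-1}\left[(C/h - G)\,x_n + u_n\right]$$

• Backward Euler (1st-order implicit): each step solves a linear system

$$x_{n+1} = (C/h + G)^{-1}\left(\frac{C}{h}\,x_n + u_{n+1}\right)$$

• Stability issue for stiff circuits: forward Euler gives unstable results, while backward Euler raises performance and scalability issues

[Figure: forward Euler (unstable result) and backward Euler solutions of $C\dot{x}(t) = -Gx(t) + u(t)$]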
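A minimal sketch of the stability issue, assuming NumPy and a hypothetical two-node RC circuit whose component values are chosen only to make it stiff. The step h suits the slow node but exceeds the forward Euler limit 2/λ_max set by the fast node, so forward Euler diverges while backward Euler stays bounded.

```python
import numpy as np

# Hypothetical stiff two-node RC circuit, C x' = -G x + u.
C = np.diag([1e-12, 1e-9])            # one tiny and one large capacitor (F)
G = np.array([[2.0, -1.0],
              [-1.0, 1.0]]) * 1e-3    # conductance matrix (S)
u = np.array([1e-3, 0.0])             # constant current source (A)

# Eigenvalues of C^{-1} G are roughly 2e9 and 5e5 1/s, so forward Euler is
# stable only for h < 2 / 2e9 = 1e-9 s; this step violates that bound.
h = 1e-8
n_steps = 50

def forward_euler(x):
    # x_{n+1} = x_n + h C^{-1} (-G x_n + u_n); C is diagonal, so C^{-1} is a cheap scaling
    return x + h * (-G @ x + u) / np.diag(C)

def backward_euler(x):
    # x_{n+1} = (C/h + G)^{-1} (C/h x_n + u_{n+1}); one linear solve per step
    return np.linalg.solve(C / h + G, C @ x / h + u)

x_fe = np.zeros(2)
x_be = np.zeros(2)
for _ in range(n_steps):
    x_fe = forward_euler(x_fe)
    x_be = backward_euler(x_be)

x_dc = np.linalg.solve(G, u)          # steady-state (DC) solution
print("forward Euler :", x_fe)
print("backward Euler:", x_be)
print("DC solution   :", x_dc)
```

The fast mode is amplified by about |1 − hλ_max| ≈ 19 per forward Euler step, whereas backward Euler approaches the DC solution monotonically.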

Numerical Integration (2/2)

Performance by circuit type (linear; nonlinear) and stiffness (high, mild, low):

– Forward Euler: slow, fast, slow, fast
– Backward Euler: medium, slow
– Trapezoidal: better than backward Euler
– and beyond?: fast

| Methods | Computation | Scalability | Error | Stability | Step size |
|---|---|---|---|---|---|
| Forward Euler | x=Av | high | O(h²) | low | tiny |
| Backward Euler | Ax=b | low | O(h²) | A-stable | medium |
| Trapezoidal | Ax=b | low | O(h³) | A-stable | > Backward Euler |
| and beyond? | simple | high | O(hⁿ) | high | large |

• Performance = # steps × computation per step (circuit dependent)
– stiffness forces forward Euler to take more steps
– implicit methods solve lots of Ax=b; with a fixed step size, C/h+G stays the same, so one factorization serves every step
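The fixed-step case can be sketched as follows, assuming SciPy's sparse LU (`scipy.sparse.linalg.splu`) and a hypothetical RC ladder: backward Euler factorizes C/h+G once and then spends each step on forward/backward substitution only. The helper `rc_ladder` and its component values are illustrative, not from the slides.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

def rc_ladder(n, r=1.0, c=1e-3):
    """Hypothetical RC ladder: resistor r from node 0 to ground and between neighbours,
    capacitor c from every node to ground. Returns sparse (C, G)."""
    g = 1.0 / r
    main = np.full(n, 2.0 * g)
    main[-1] = g                      # last node has only one neighbour
    off = np.full(n - 1, -g)
    G = sp.diags([off, main, off], [-1, 0, 1], format="csc")
    C = sp.identity(n, format="csc") * c
    return C, G

def backward_euler_fixed_step(C, G, u_of_t, x0, h, n_steps):
    """Backward Euler with a fixed h: C/h + G never changes, so factorize it once."""
    Ch = (C / h).tocsc()
    lu = splu((Ch + G).tocsc())       # one LU factorization for the whole run
    x = x0.copy()
    for k in range(1, n_steps + 1):
        # each step: only forward/backward substitution with the stored factors
        x = lu.solve(Ch @ x + u_of_t(k * h))
    return x

n, h, n_steps = 50, 1e-4, 200
C, G = rc_ladder(n)
u = np.zeros(n)
u[-1] = 1e-3                          # 1 mA injected at the far end (illustrative)
x_lu = backward_euler_fixed_step(C, G, lambda t: u, np.zeros(n), h, n_steps)
```

Changing h would invalidate `lu`, which is why an implicit method with adaptive steps pays for refactorization.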


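Matrix Exponential Method (1/2)

• Analytical solution of $C\dot{x}(t) = -Gx(t) + u(t)$
– Let $A = -C^{-1}G$, $b = C^{-1}u$ (C can be regularized [TCAD ‘12])

$$x(t+h) = e^{Ah}x(t) + \int_0^h e^{A(h-\tau)}\,b(t+\tau)\,d\tau$$

• Let the input be piecewise linear:

$$b(t+\tau) = b(t) + \tau\,\frac{b(t+h)-b(t)}{h}, \quad 0 \le \tau \le h$$

$$x(t+h) = e^{Ah}x(t) + A^{-1}\left(e^{Ah}-I\right)b(t) + A^{-2}\left(e^{Ah}-I-Ah\right)\frac{b(t+h)-b(t)}{h}$$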
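A minimal dense sketch of this piecewise-linear step, assuming SciPy's `expm` and a small nonsingular A (the three-term formula needs A^{-1}); the circuit matrices are hypothetical. It checks the closed form against a tightly toleranced `solve_ivp` reference.

```python
import numpy as np
from scipy.linalg import expm, solve
from scipy.integrate import solve_ivp

def mexp_step(A, x, b0, b1, h):
    """Exact step of x' = A x + b(t) when b is linear from b0 = b(t) to b1 = b(t+h).
    Dense sketch; requires A to be nonsingular."""
    I = np.eye(A.shape[0])
    E = expm(A * h)                                        # e^{Ah}
    slope = (b1 - b0) / h
    return (E @ x
            + solve(A, (E - I) @ b0)                       # A^{-1}(e^{Ah} - I) b(t)
            + solve(A, solve(A, (E - I - A * h) @ slope)))  # A^{-2}(e^{Ah} - I - Ah) slope

# Hypothetical 3-node RC circuit: A = -C^{-1} G, b = C^{-1} u
C = np.diag([1.0, 2.0, 0.5])
G = np.array([[3.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 1.5]])
A = -solve(C, G)
u0, u1 = np.array([1.0, 0.0, 0.5]), np.array([0.0, 2.0, 0.5])
b0, b1 = solve(C, u0), solve(C, u1)
x0, h = np.array([0.1, -0.2, 0.3]), 0.7

x_mexp = mexp_step(A, x0, b0, b1, h)
ref = solve_ivp(lambda t, x: A @ x + b0 + t * (b1 - b0) / h,
                (0.0, h), x0, rtol=1e-11, atol=1e-13)
x_ref = ref.y[:, -1]
```

Because A^{-1} commutes with e^{Ah}, each term can be applied with linear solves instead of forming A^{-1} explicitly.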

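Matrix Exponential Method (2/2)

• One-exponential formulation [Al-Mohy & Higham ‘11]
– reduce three matrix exponentials to one

$$x(t+h) = \begin{bmatrix} I_n & 0 \end{bmatrix} e^{\tilde{A}h} \begin{bmatrix} x(t) \\ e_2 \end{bmatrix}$$

where

$$\tilde{A} = \begin{bmatrix} A & W \\ 0 & J \end{bmatrix}, \quad W = \begin{bmatrix} \dfrac{b(t+h)-b(t)}{h} & b(t) \end{bmatrix}, \quad J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$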
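The one-exponential form can be checked numerically. A minimal dense sketch, assuming SciPy's `expm`: it builds the (n+2)×(n+2) matrix Ã from A, W, and J as defined on the slide and never forms A^{-1}, so it also works for a singular A. The example values are hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def mexp_step_augmented(A, x, b0, b1, h):
    """x(t+h) = [I_n 0] e^{Ã h} [x(t); e_2], with Ã = [[A, W], [0, J]]."""
    n = A.shape[0]
    W = np.column_stack([(b1 - b0) / h, b0])          # n x 2: [slope, b(t)]
    J = np.array([[0.0, 1.0],
                  [0.0, 0.0]])
    A_tilde = np.block([[A, W],
                        [np.zeros((2, n)), J]])
    y0 = np.concatenate([x, [0.0, 1.0]])              # [x(t); e_2]
    return (expm(A_tilde * h) @ y0)[:n]               # [I_n 0] keeps the first n entries

# Singular A (e.g. floating nodes): the three-term formula would need A^{-1}, this does not.
A = np.zeros((2, 2))
x0 = np.array([1.0, 2.0])
b0, b1 = np.array([1.0, 0.0]), np.array([3.0, 1.0])
h = 0.5
x1 = mexp_step_augmented(A, x0, b0, b1, h)
# With A = 0 the exact answer is x0 + h*b0 + (h**2 / 2) * (b1 - b0) / h
```

The extra two states reproduce the input: with z(0) = e_2 and z' = Jz, the last state stays 1 and the other grows like τ, so W z(τ) = b(t) + τ·slope.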

Advantages

• Accuracy: analytical solution
– approximating $e^{Ah}$ by $(I+Ah)$ gives forward Euler
– approximating $e^{Ah}$ by $(I-Ah)^{-1}$ gives backward Euler

• Stability: A-stable for passive circuits

[Figure: comparison against the reference solution]

How to compute $e^{A}v$?

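Computation on Matrix Exponential

• 19 dubious ways [van Loan ‘03]

| Categories | Based on |
|---|---|
| Series Method | $I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$ |
| Rational Approximation | $e^{A} \approx D(A)^{-1}N(A)$ |
| Decomposition | $A = SBS^{-1} \Rightarrow e^{A} = S\,e^{B}S^{-1}$ |
| Splitting | $e^{B+C} = e^{B}e^{C}$ if $BC = CB$ |
| Quadrature Rule | $e^{A} = \frac{1}{2\pi i}\oint e^{z}(zI-A)^{-1}\,dz$, contour enclosing spec(A) |
| Krylov Subspace | $\mathrm{span}\{v, Av, \ldots, A^{m}v\}$, with a regular basis and a rational basis |

– computing $e^{A}$ suits small matrices; computing $e^{A}v$ suits large ones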
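As an illustration of the first category, a sketch of the series method combined with scaling and squaring, e^A = (e^{A/2^s})^{2^s}, assuming NumPy and comparing against SciPy's `expm` (a Padé-based rational approximation). The truncation order and scaling rule are simple illustrative choices, not a tuned algorithm.

```python
import numpy as np
from scipy.linalg import expm

def expm_taylor(A, terms=20):
    """Series method with scaling and squaring (illustrative, not production quality)."""
    norm = np.linalg.norm(A, 1)
    # choose s so that ||A / 2^s||_1 <= 1/2, where the truncated series converges quickly
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    As = A / 2.0**s
    n = A.shape[0]
    E = np.eye(n)
    term = np.eye(n)
    for k in range(1, terms + 1):
        term = term @ As / k          # (A / 2^s)^k / k!
        E = E + term
    for _ in range(s):                # undo the scaling: square s times
        E = E @ E
    return E

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
print(np.max(np.abs(expm_taylor(A) - expm(A))))
```

Both routines form e^A as a dense n×n matrix, which is why this family is reserved for small matrices such as H_m.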


Krylov Subspace Approximation (1/2)

• Krylov subspace $K(A, v) = \{v, Av, A^2v, \ldots, A^{m-1}v\}$
– orthogonalized by the Arnoldi process: $H_m = V_m^{T} A V_m$
– approximate $e^{Ah}v$ by $e^{hH_m}$: $e^{Ah}v \approx \|v\|_2\, V_m e^{hH_m} e_1$
– a posteriori error estimation [Saad92]:

$$\mathrm{Err}_{krylov}(m,h) = \|v\|_2\, h_{m+1,m} \left| e_m^{T} e^{hH_m} e_1 \right|$$

• Efficiency: only sparse matrix-vector multiplications; m is about 10~100

• Adaptivity: fast error estimation; scaling invariant
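A compact sketch of the Krylov approximation and the a posteriori estimate above, assuming NumPy/SciPy, a matrix-free `matvec` callable, and modified Gram-Schmidt inside the Arnoldi loop. The test matrix (a 1-D RC line with C = I) is hypothetical.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi(matvec, v, m):
    """Arnoldi process: orthonormal V spanning K(A, v) and Hessenberg H with A V_k = V_{k+1} H."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])                     # the only operation touching A: one SpMV
        w_norm0 = np.linalg.norm(w)
        for i in range(j + 1):                  # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] <= 1e-12 * w_norm0:      # "happy breakdown": invariant subspace found
            return V[:, :j + 2], H[:j + 2, :j + 1], beta
        V[:, j + 1] = w / H[j + 1, j]
    return V, H, beta

def krylov_expv(matvec, v, h, m):
    """e^{hA} v ~ ||v|| V_k e^{h H_k} e_1, with the estimate ||v|| h_{k+1,k} |e_k^T e^{h H_k} e_1|."""
    V, H, beta = arnoldi(matvec, v, m)
    k = H.shape[1]
    E = expm(h * H[:k, :k])                     # exponential of a small k x k matrix
    approx = beta * V[:, :k] @ E[:, 0]
    err_est = beta * H[k, k - 1] * abs(E[k - 1, 0])
    return approx, err_est

n = 100
L = np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
A = -L                                          # hypothetical A = -C^{-1} G with C = I
v = np.random.default_rng(0).standard_normal(n)
approx, err_est = krylov_expv(lambda x: A @ x, v, h=1.0, m=30)
```

Only `matvec` sees the large matrix; everything else works on the (m+1)×m Hessenberg matrix, which is what makes the estimate cheap to re-evaluate.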

Krylov Subspace Approximation (2/2)

• Stiffness affects the step size and the dimension
– the Arnoldi process captures extreme and clustered eigenvalues
– error bound [Saad92]:

$$\mathrm{Err}_{krylov} \le 2\|v\|_2\,\frac{\rho^{m} e^{\rho}}{m!}, \quad \text{where } \rho = \|hA\|_2$$

[Figure: spectrum of hA in the complex plane (Real{λh}, Imag{λh}) spanning −λ_max to −λ_min for a highly stiff circuit; an Arnoldi process with a small m captures only some regions and may miss the part critical for $e^{Ah}$]

– shrink h or increase m to capture the critical eigenvalues; remedied by a restarted scheme and the scaling effect [ICCAD ‘12]
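The bound makes the step-size/dimension trade-off concrete. A small sketch, assuming only Python's standard `math` module: it evaluates Saad's bound in log space (so large ρ = ‖hA‖₂ does not overflow) and returns the smallest m that meets a tolerance. `min_dimension` is a hypothetical helper name.

```python
import math

def log10_saad_bound(rho, m, beta=1.0):
    """log10 of 2*beta*rho^m*e^rho/m!, computed in log space to avoid overflow."""
    return (math.log(2.0 * beta) + m * math.log(rho) + rho - math.lgamma(m + 1)) / math.log(10.0)

def min_dimension(rho, tol, beta=1.0, m_max=1000):
    """Smallest Krylov dimension m for which the bound guarantees an error <= tol."""
    target = math.log10(tol)
    for m in range(1, m_max + 1):
        if log10_saad_bound(rho, m, beta) <= target:
            return m
    return None

# Stiffness shows up as a large rho = ||hA||_2, which forces a large m unless h is reduced.
for rho in (1.0, 10.0, 100.0):
    print(rho, min_dimension(rho, 1e-8))
```

The bound is worst-case; in practice the a posteriori estimate usually allows a smaller m.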


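Rational Krylov Subspace Approximation (1/2)

• Rational basis $(I-\gamma A)^{-1}$
– $K((I-\gamma A)^{-1}, v) = \{v, (I-\gamma A)^{-1}v, \ldots, (I-\gamma A)^{-m}v\}$

Arnoldi process:

    for j = 1, 2, ..., m
        solve (I - γA) w = v_j        (regular Krylov: w = A v_j)
        for i = 1, 2, ..., j
            H_{i,j} = w^T v_i
            w = w - H_{i,j} v_i
        end
        H_{j+1,j} = ||w||_2
        v_{j+1} = w / H_{j+1,j}
    end

– $(I-\gamma A)w = v_j$ is solved as $(C+\gamma G)w = Cv_j$, which avoids regularizing C; for a linear circuit, one LU factorization of $C+\gamma G$ serves every basis vector
– the process yields $\{V_m, \tilde{H}_m\}$ with $\tilde{H}_m = V_m^{T}(I-\gamma A)^{-1}V_m$, and the subspace for A follows from $H_m = V_m^{T}AV_m \approx \frac{1}{\gamma}\left(I - \tilde{H}_m^{-1}\right)$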

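• Approximation of $e^{Ah}v$:

$$e^{Ah}v \approx \|v\|_2\, V_m e^{hH_m} e_1, \quad H_m = \frac{1}{\gamma}\left(I - \tilde{H}_m^{-1}\right)$$

• A posteriori error estimation [van den Eshof 06], used for adaptivity:

$$\mathrm{Err}_{rational}(m,h) = \frac{\|v\|_2}{\gamma}\,\tilde{h}_{m+1,m}\,\left\|(I-\gamma A)v_{m+1}\right\|_2 \left| e_m^{T}\tilde{H}_m^{-1} e^{hH_m} e_1 \right|$$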
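A sketch of the rational (shift-and-invert) Krylov approximation, assuming SciPy's `splu`, sparse C and G, and the identity (I − γA) = C^{-1}(C + γG) for A = −C^{-1}G. One LU of C + γG serves every basis vector, and H_m is recovered from H̃_m as (I − H̃_m^{-1})/γ. The RC test circuit and the choice γ = h/10 are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu
from scipy.linalg import expm

def rational_krylov_expv(C, G, v, h, gamma, m):
    """Approximate e^{hA} v, A = -C^{-1} G, in K((I - gamma*A)^{-1}, v)."""
    lu = splu(sp.csc_matrix(C + gamma * G))     # one LU factorization, reused below
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    Ht = np.zeros((m + 1, m))                   # H~ in the slides
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = lu.solve(C @ V[:, j])               # (C + gamma G) w = C v_j  <=>  (I - gamma A) w = v_j
        w_norm0 = np.linalg.norm(w)
        for i in range(j + 1):                  # modified Gram-Schmidt
            Ht[i, j] = w @ V[:, i]
            w = w - Ht[i, j] * V[:, i]
        Ht[j + 1, j] = np.linalg.norm(w)
        if Ht[j + 1, j] <= 1e-12 * w_norm0:     # invariant subspace: stop early
            m = j + 1
            break
        V[:, j + 1] = w / Ht[j + 1, j]
    Hm = (np.eye(m) - np.linalg.inv(Ht[:m, :m])) / gamma   # project back to A
    return beta * V[:, :m] @ expm(h * Hm)[:, 0]

# Hypothetical stiff RC line: conductances spread over four decades, 1 nF on every node
n = 80
rng = np.random.default_rng(0)
g = 10.0 ** rng.uniform(-2, 2, n)
Gd = np.zeros((n, n))
Gd[0, 0] = g[0]                                 # node 0 tied to ground through g[0]
for k in range(1, n):                           # conductance g[k] between nodes k-1 and k
    Gd[k - 1, k - 1] += g[k]
    Gd[k, k] += g[k]
    Gd[k - 1, k] -= g[k]
    Gd[k, k - 1] -= g[k]
C = sp.identity(n, format="csc") * 1e-9
G = sp.csc_matrix(Gd)
v = rng.standard_normal(n)
h = 1e-9
approx = rational_krylov_expv(C, G, v, h, gamma=h / 10, m=30)
exact = expm(-h * Gd / 1e-9) @ v                # dense reference, feasible only for small n
```

Here ‖hA‖₂ is in the hundreds, far beyond what a 30-dimensional polynomial Krylov space handles, yet the shifted-and-inverted basis still gives a usable approximation.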

• Spectral transformation
– similar to preconditioning
– relaxes the stiffness constraint
– enables a large step size with a smaller dimension

[Figure: the spectrum of A (−λ_max to −λ_min, plotted as Real{λh} vs. Imag{λh}) is transformed by $(I-\gamma A)^{-1}$ into the unit circle, leaving a small gap between λ′_min and λ′_max; the Arnoldi process captures the part critical for $e^{A}$; projecting back to A by $\frac{1}{\gamma}(I-\tilde{H}_m^{-1})$ and applying a large h still works with a small m; the transformation is determined by γ]
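The transformation acts as a scalar map on each eigenvalue, which a few lines make concrete. A sketch assuming NumPy; the spectrum and γ are hypothetical. For Re λ < 0 and γ > 0, |1 − γλ| > 1, so every image lies strictly inside the unit circle: the stiffest modes collapse toward 0, while the slow modes, which dominate e^{Ah} for large h, stay near 1.

```python
import numpy as np

def to_rational(lam, gamma):
    """Eigenvalue of (I - gamma*A)^{-1} corresponding to an eigenvalue lam of A."""
    return 1.0 / (1.0 - gamma * lam)

def back_to_A(mu, gamma):
    """Inverse map, the scalar version of H_m = (I - H~_m^{-1}) / gamma."""
    return (1.0 - 1.0 / mu) / gamma

# Hypothetical stiff spectrum: decay rates over twelve decades plus an oscillatory pair.
lam = np.concatenate([-np.logspace(0, 12, 7), [-1e3 + 1e6j, -1e3 - 1e6j]])
gamma = 1e-9
mu = to_rational(lam, gamma)
print("|mu|:", np.abs(mu))
```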

[Figure: Error $= \left\| e^{Ah}v - \|v\|_2 V_m e^{hH_m} e_1 \right\|_2$ with γ fixed, sweeping m and h (small step size)]

[Figure: the same error with h fixed, sweeping m and γ; γ = 10⁻¹² gives a large error]

Wrap Up

Performance by circuit type (linear; nonlinear) and stiffness (high, mild, low):

– Forward Euler: slow, fast, slow, fast
– Backward Euler: medium, slow
– Trapezoidal: better than backward Euler
– Krylov Approx: slow, fast, slow, medium
– Rational Krylov: fast, slow

| Methods | Computation | Scalability | Error | Stability | Step size |
|---|---|---|---|---|---|
| Forward Euler | x=Av | high | O(h²) | low | tiny |
| Backward Euler | Ax=b | low | O(h²) | A-stable | medium |
| Trapezoidal | Ax=b | low | O(h³) | A-stable | > Backward Euler |
| Krylov Approx | x=Av | high | O(hⁿ) | high | medium |
| Rational Krylov | Ax=b | low | O(hⁿ) | high | large |


Parallelism in Krylov Subspace

• Arnoldi process
– sparse matrix-vector multiplication [Bell & Garland ‘09]

• Exponential of a small matrix [Higham ‘05]
– dense matrix-by-matrix operations

[Figure: sparse matrix-vector product $y = Ax$ computed row by row, rows $a_{1,1} \ldots a_{1,n}$ through $a_{n,1} \ldots a_{n,n}$ assigned to thread 1, thread 2, …, thread n−1, thread n]
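The row-wise partition in the figure can be sketched with a hand-written CSR SpMV, assuming SciPy's CSR storage (`indptr`, `indices`, `data`) and Python's `concurrent.futures`. It illustrates the data decomposition only: under CPython's GIL the per-row loop gains no speed from threads, whereas the slides run SpMV on a GPU.

```python
import numpy as np
import scipy.sparse as sp
from concurrent.futures import ThreadPoolExecutor

def spmv_rows(indptr, indices, data, x, y, start, stop):
    """y[i] = sum_k A[i, k] x[k] for rows start..stop-1 of a CSR matrix."""
    for i in range(start, stop):
        lo, hi = indptr[i], indptr[i + 1]
        y[i] = data[lo:hi] @ x[indices[lo:hi]]   # rows are independent: no locking needed

def parallel_spmv(A, x, n_workers=4):
    """Split the rows of A into n_workers contiguous blocks, one block per worker."""
    A = sp.csr_matrix(A)
    n_rows = A.shape[0]
    y = np.zeros(n_rows, dtype=np.result_type(A.dtype, x.dtype))
    bounds = np.linspace(0, n_rows, n_workers + 1).astype(int)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        jobs = [pool.submit(spmv_rows, A.indptr, A.indices, A.data, x, y,
                            bounds[k], bounds[k + 1])
                for k in range(n_workers)]
        for job in jobs:
            job.result()                          # propagate any exception from a worker
    return y

rng = np.random.default_rng(0)
dense = rng.standard_normal((300, 300)) * (rng.random((300, 300)) < 0.02)
A = sp.csr_matrix(dense)
x = rng.standard_normal(300)
y = parallel_spmv(A, x)
```

Each worker writes a disjoint slice of `y`, so no synchronization beyond joining the workers is required.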

Input Grouping

• Constant slope within a step

[Figure: input 1 and input 2 have different breakpoints (t1 … t15 after merging); keeping the input slope constant within each step h forces tiny steps]

Input Grouping

• Constant slope within a step

[Figure: the inputs are split into group 1 and group 2, each with its own breakpoints t1 … t8, simulated by thread 1 and thread 2]
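For a linear circuit, superposition is what lets each group be simulated on its own. A sketch assuming NumPy/SciPy, dense matrices, and the one-exponential step from the matrix exponential slides; the breakpoints, waveforms, and helper names (`pwl`, `simulate_pwl`) are hypothetical. The merged run must stop at every breakpoint of every input, each group steps only at its own breakpoints, and the two group responses add up to the merged one.

```python
import numpy as np
from scipy.linalg import expm

def pwl(times, values):
    """Piecewise-linear vector input b(t); values[k] is b at times[k]."""
    values = np.asarray(values, dtype=float)
    return lambda t: np.array([np.interp(t, times, values[:, j]) for j in range(values.shape[1])])

def simulate_pwl(A, x0, b, breakpoints):
    """Exact stepping between consecutive breakpoints (constant input slope within a step)."""
    n = A.shape[0]
    J = np.array([[0.0, 1.0], [0.0, 0.0]])
    x = np.array(x0, dtype=float)
    for t0, t1 in zip(breakpoints[:-1], breakpoints[1:]):
        h = t1 - t0
        W = np.column_stack([(b(t1) - b(t0)) / h, b(t0)])
        A_tilde = np.block([[A, W], [np.zeros((2, n)), J]])
        x = (expm(h * A_tilde) @ np.concatenate([x, [0.0, 1.0]]))[:n]
    return x

A = np.array([[-2.0, 1.0], [1.0, -1.0]])
T1 = np.array([0.0, 0.3, 0.7, 1.0, 2.0])             # breakpoints of input group 1
T2 = np.array([0.0, 0.25, 0.5, 1.5, 2.0])            # breakpoints of input group 2
b1 = pwl(T1, [[0, 0], [1, 0], [1, 0], [0, 0], [0, 0]])
b2 = pwl(T2, [[0, 0], [0, 2], [0, 0], [0, 1], [0, 0]])
x0 = np.array([0.5, -0.5])

merged = np.union1d(T1, T2)                          # 8 breakpoints -> 7 steps
x_all = simulate_pwl(A, x0, lambda t: b1(t) + b2(t), merged)
x_g1 = simulate_pwl(A, x0, b1, T1)                   # 4 steps (e.g. thread 1)
x_g2 = simulate_pwl(A, np.zeros(2), b2, T2)          # 4 steps (e.g. thread 2)
print(np.allclose(x_all, x_g1 + x_g2))
```

The initial condition is assigned to one group only; with many inputs the merged breakpoint count grows much faster than any single group's.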


Settings of Experiments

• Environment
– Implemented in MATLAB
– Intel i7 2.67GHz with 4GB memory

• Benchmarks
– Nonlinear and large-scale circuits
– Power distribution networks
– IBM power grid testcases [Nassif 08]

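| Design | Category | # R | # C | # Trans. | Size | Stiffness |
|---|---|---|---|---|---|---|
| D1 | 16-bit adder | 723 | 34 | 448 | 579 | 1.1×10³ |
| D2 | ALU | 13.6K | 4.3K | 6502 | 10K | 5.4×10⁶ |
| D3 | IO | 1.26M | 34.6K | 1461 | 630K | 1.6×10⁶ |
| D4 | Power grid | 10.4M | 8.6M | 0 | 12M | 2.6×10⁵ |

Stiffness = Re(λ_max) / Re(λ_min), where λ are the generalized eigenvalues of (G, C).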
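For small circuits the stiffness column can be computed directly from this definition. A sketch assuming SciPy's generalized eigenvalue solver `scipy.linalg.eigvals(G, C)`, which solves Gx = λCx; it is a dense O(n³) computation, so it only illustrates matrices far smaller than the benchmarks, and the example circuit is hypothetical.

```python
import numpy as np
from scipy.linalg import eigvals

def stiffness(G, C):
    """Re(lambda_max) / Re(lambda_min) over the finite generalized eigenvalues of (G, C)."""
    lam = eigvals(G, C)                       # solves G x = lambda C x
    re = lam.real[np.isfinite(lam)]           # a singular C can yield infinite eigenvalues
    re = re[re > 0]
    return re.max() / re.min()

# Hypothetical two-node RC circuit with a 1 pF and a 1 nF capacitor
G = np.array([[2.0, -1.0], [-1.0, 1.0]]) * 1e-3
C = np.diag([1e-12, 1e-9])
print(f"stiffness = {stiffness(G, C):.3g}")
```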




| Design | Area (mm²) | # R | # C | # L | Size | Stiffness |
|---|---|---|---|---|---|---|
| P1 | 0.35² | 23K | 15K | 15K | 45.7K | 8.7×10⁹ |
| P2 | 1.40² | 348K | 228K | 228K | 688K | 8.3×10⁹ |
| P3 | 2.80² | 1.46M | 0.97M | 0.97M | 2.90M | 1.0×10¹⁰ |
| P4 | 5.00² | 3.75M | 2.47M | 2.47M | 7.40M | 1.0×10¹⁰ |

RC tanks for PCB and package




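| Design | # R | # C | # L | # I | # V | Size | Stiffness |
|---|---|---|---|---|---|---|---|
| ibmpg2t | 245K | 36K | 330 | 36K | 330 | 164K | 3.5×10¹² |
| ibmpg3t | 1.60M | 201K | 955 | 201K | 955 | 1M | 3.4×10¹¹ |
| ibmpg4t | 1.83M | 265K | 962 | 266K | 962 | 1.2M | 2.5×10¹¹ |
| ibmpg5t | 1.55M | 473K | 277 | 473K | 539K | 2.1M | 4.7×10¹¹ |
| ibmpg6t | 2.41M | 761K | 281 | 761K | 836K | 3.2M | 3.8×10¹¹ |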

Nonlinear and Large-scale Circuits

• Matrix exponential method (MEXP)
– Krylov subspace approximation
– Restarted scheme and parallel SpMV on GPU

• Trapezoidal method (TRAP)
– same adaptive scheme as MEXP

| Design | Size | Sim. time | m | TRAP | MEXP-Krylov (parallel SpMV) | Speedup |
|---|---|---|---|---|---|---|
| D1 | 579 | 100ps | 20 | 671.4s | 408.7s | 1.64X |
| D2 | 10K | 100ps | 30 | 3,085.91s | 982.14s | 3.14X |
| D3 | 630K | 100ps | 30 | 8,053.45s | 535.92s | 15.05X |
| D4 | 12M | 1ns | 20 | fails | 629.56s | n/a |

Power Distribution Networks

• Simulate a long time span (1μs) for the step response

• One LU factorization
– its cost is amortized over many forward/backward substitutions

• MEXP with a rational basis adaptively scales h/γ

• TRAP uses a predetermined step size

| Design | TRAP (h = 10ps) LU (s) | TRAP Total | MEXP – Rational (γ = 10⁻¹⁰) LU (s) | MEXP Total | Speedup |
|---|---|---|---|---|---|
| P1 | 0.67 | 44.85m | 0.68 | 2.86m | 15.73X |
| P2 | 15.60 | 15.43h | 15.48 | 54.57m | 16.96X |
| P3 | 91.60 | 76.92h | 93.28 | 4.30h | 17.91X |
| P4 | 293.81 | 203.64h | 298.83 | 11.26h | 18.08X |

Total times are in minutes (m) or hours (h). MEXP benefits from adaptive and large step sizes.


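IBM Testcases

• Widely adopted benchmarks

• Many input current sources

• Same settings for MEXP with rational basis and for TRAP

| Design | TRAP LU (s) | TRAP Total (s) | MEXP LU (s) | MEXP Total (s) | Speedup |
|---|---|---|---|---|---|
| ibmpg2t | 1.31 | 48.19 | 1.29 | 41.81 | 1.15X |
| ibmpg3t | 18.05 | 493.97 | 18.41 | 413.90 | 1.19X |
| ibmpg4t | 30.32 | 675.78 | 31.01 | 229.13 | 2.95X |
| ibmpg5t | 16.16 | 657.13 | 16.48 | 649.97 | 1.01X |
| ibmpg6t | 23.99 | 965.53 | 34.60 | 915.62 | 1.05X |

The speedup is limited by the ill alignment of the input sources' transition points.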


• Applying simple grouping
– each group of inputs has the same pivot points
– about 6X speedup on average

IBM Testcases

| Design | TRAP LU (s) | TRAP Total (s) | # Groups | MEXP LU (s) | MEXP Total (s) | Speedup |
|---|---|---|---|---|---|---|
| ibmpg2t | 1.31 | 48.19 | 25 | 1.29 | 7.93 | 6.08X |
| ibmpg3t | 18.05 | 493.97 | 25 | 18.41 | 86.24 | 5.73X |
| ibmpg4t | 30.32 | 675.78 | 4 | 31.01 | 124.16 | 5.44X |
| ibmpg5t | 16.16 | 657.13 | 25 | 16.48 | 111.97 | 5.87X |
| ibmpg6t | 23.99 | 965.53 | 25 | 34.60 | 166.34 | 5.80X |

Conclusions

• Emerging challenges in circuit simulation
– scalability and performance

• Matrix exponential method
– accuracy, adaptivity and stability
– regular and rational Krylov subspace approximation

• Effectiveness of the matrix exponential method
– simulates a large-scale circuit with 12M nodes
– nonlinear circuits: 6.61X speedup on average
– impulse response for PDNs: 15X speedup
– IBM testcases: 6X speedup using input grouping

Future Work

• Variant bases in the Krylov subspace
– inverted and extended bases

• Model order reduction (MOR) and the matrix exponential method
– both exploit the Krylov subspace
– apply well-developed MOR techniques to MEXP

• Hybrid simulation via matrix exponential
– handle thermal and mechanical phenomena with FEM


Thank you!

Where are we?

• Trade-off between stability and performance

[Figure: methods placed by computational effort (low to high) against stability (low to high): Forward Euler, Backward Euler, Trapezoidal Method (SPICE), ACES [Devgan & Rohrer, ‘97], SILCA [Li & Shi, ‘03], LIM [J. E. Schutt-Aine, ‘01], Waveform Relaxation [E. Lelarasmee et al., ‘82], Domain Decomposition [K. Sun et al., ‘07], Telescopic [Dong & Li, ‘10], and the Matrix Exponential Method [Weng et al., ‘11]]

• Exponential time differencing (ETD) in the numerical community: [Saad ‘92], [Ban et al. ‘11], [Aluffi-Pentini et al. ‘03], [Hochbruck et al. ‘97]

• Tailored for circuit simulation: adaptive step control, scaling effect, nonlinear devices, parallelization

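Adaptive Step Control

• Typical circuit behavior

[Figure: a typical circuit waveform, with regions that allow a larger h and regions that need a smaller h]

• Error budget: a step of size h may consume at most its share of the total budget over the simulation span T

$$\mathrm{err}(h) \le \frac{h}{T}\,\mathrm{Err}_{total}$$

$$x(t+h) = \begin{bmatrix} I_n & 0 \end{bmatrix} e^{\tilde{A}h} \begin{bmatrix} x(t) \\ e_2 \end{bmatrix}$$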

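Adaptive Step Size Strategy

• Adjustment of step size
– Krylov subspace approximation: only $H_m$ needs scaling ($\alpha A \rightarrow \alpha H_m$), then the small matrix exponential $e^{H_m}$ is recalculated
– backward Euler: $C/h+G$ changes, so the linear system must be solved again

$$x_{n+1} = (C/h + G)^{-1}\left(\frac{C}{h}\,x_n + Bu(t_{n+1})\right)$$

• Strategy
– maximize the step size within a given error budget $\mathrm{Err}_{total}$
– errors come from the Krylov subspace method and from linearization:

$$\mathrm{Err}_{total} = \mathrm{Err}^{L}_{total} + \mathrm{Err}^{NL}_{total}, \quad \mathrm{Err}_{krylov} \le \frac{h}{T}\,\mathrm{Err}^{L}_{total}, \quad \mathrm{Err}_{nonlinear} \le \frac{h}{T}\,\mathrm{Err}^{NL}_{total}$$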
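A sketch of the Krylov side of this strategy, assuming NumPy/SciPy and a dense A for brevity: the Arnoldi basis is built once, and each trial step size only rescales the small matrix H_m, so testing a new h costs one small exponential instead of a new factorization. Halving from an initial guess is an illustrative search rule, the a posteriori estimate is the Krylov one from the earlier slide, and `largest_step` is a hypothetical helper.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi(A, v, m):
    """Arnoldi with modified Gram-Schmidt; returns V (n x (m+1)), H ((m+1) x m), ||v||_2."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]     # assumes no breakdown, for brevity
    return V, H, beta

def largest_step(A, v, m, T, err_total, h0, max_halvings=40):
    """Largest h in {h0, h0/2, h0/4, ...} whose Krylov error estimate fits (h/T) * err_total."""
    V, H, beta = arnoldi(A, v, m)          # built once for all trial step sizes
    Hm = H[:m, :m]
    h = h0
    for _ in range(max_halvings):
        E = expm(h * Hm)                   # alpha*A -> alpha*H_m: only a small m x m exponential
        err = beta * H[m, m - 1] * abs(E[m - 1, 0])
        if err <= h / T * err_total:
            return h, beta * V[:, :m] @ E[:, 0]
        h /= 2.0
    raise RuntimeError("no step size met the error budget")

n = 200
L = np.diag(np.full(n, 2.0)) - np.diag(np.ones(n - 1), 1) - np.diag(np.ones(n - 1), -1)
A = -100.0 * L                             # hypothetical stiff A = -C^{-1} G
v = np.random.default_rng(0).standard_normal(n)
h, x_next = largest_step(A, v, m=20, T=1.0, err_total=1e-6, h0=1.0)
```

A backward Euler search over the same candidate values of h would need a new factorization of C/h+G for every trial.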

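Nonlinear Formulation

• Decouple the nonlinear and linear components, with $F(x) = -C^{-1}\,i(x(t))$:

$$x(t+h) = e^{Ah}x(t) + \int_0^h e^{A(h-\tau)}\left(F(x(t+\tau)) + b(t+\tau)\right)d\tau$$

• Approximating the nonlinear part of the integral by the trapezoidal rule:

$$x(t+h) = e^{Ah}x(t) + \frac{h}{2}\left(e^{Ah}F(x(t)) + F(x(t+h))\right) + A^{-1}\left(e^{Ah}-I\right)b(t) + A^{-2}\left(e^{Ah}-I-Ah\right)\frac{b(t+h)-b(t)}{h}$$

– the terms in x(t), including $e^{Ah}F(x(t))$ (approximated by the Krylov method), stay constant during Newton's iteration
– the Jacobian matrix is calculated only for $F(x(t+h))$
– J(F) in MEXP has fewer nonzeros: MEXP needs $\frac{h}{2}C^{-1}G_{NL}$, whereas BE needs $C/h + G_L + G_{NL}$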
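A minimal dense sketch of one nonlinear MEXP step, assuming a hypothetical two-node circuit C x' = −G_L x − i(x) + u with a cubic nonlinear conductance i(x) = k·x³ and a constant input (so the slope term vanishes). The terms fixed during Newton's iteration are formed once per step, then x(t+h) − (h/2)F(x(t+h)) = const is solved with the Jacobian I + (h/2)C^{-1}G_NL. Dense `expm` stands in for the Krylov approximation, and the result is compared with `solve_ivp`.

```python
import numpy as np
from scipy.linalg import expm, solve
from scipy.integrate import solve_ivp

# Hypothetical circuit: C x' = -G_L x - i(x) + u, with i(x) = k x^3 per node
C = np.eye(2)
G_L = np.array([[2.0, -1.0], [-1.0, 2.0]])
k = 0.5
u = np.array([1.0, 0.5])                    # constant input: b(t+h) = b(t)

A = -solve(C, G_L)                          # A = -C^{-1} G_L
b = solve(C, u)                             # b = C^{-1} u

def F(x):
    return -solve(C, k * x**3)              # F(x) = -C^{-1} i(x)

def G_NL(x):
    return np.diag(3.0 * k * x**2)          # di/dx

def mexp_nonlinear_step(x, h, tol=1e-12, max_iter=20):
    n = x.shape[0]
    E = expm(A * h)
    # Constant during Newton's iteration: e^{Ah} x(t), (h/2) e^{Ah} F(x(t)), and the input term
    const = E @ x + 0.5 * h * (E @ F(x)) + solve(A, (E - np.eye(n)) @ b)
    y = x.copy()                            # initial guess: previous solution
    for _ in range(max_iter):
        residual = y - 0.5 * h * F(y) - const
        jac = np.eye(n) + 0.5 * h * solve(C, G_NL(y))   # I - (h/2) dF/dx
        dy = solve(jac, residual)
        y = y - dy
        if np.linalg.norm(dy) < tol:
            break
    return y

h, T = 0.01, 1.0
x = np.zeros(2)
for _ in range(int(round(T / h))):
    x = mexp_nonlinear_step(x, h)

ref = solve_ivp(lambda t, z: A @ z + F(z) + b, (0.0, T), np.zeros(2),
                rtol=1e-10, atol=1e-12)
print("max deviation from solve_ivp:", np.max(np.abs(x - ref.y[:, -1])))
```

Only G_NL enters the Newton Jacobian; the linear part is handled exactly by e^{Ah}, which is the sparsity advantage listed on the slide.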

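Only Inverted

• Rational basis $A^{-1}$ (inverted only, no shift)
– $K(A^{-1}, v) = \{v, A^{-1}v, \ldots, A^{-m}v\}$
– requires a larger m and a smaller h

[Figure: spectrum (Real{λh} vs. Imag{λh}) after shift-and-invert compared with inversion only, with −1/λ_min and the smaller spectrum marked]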

Different γ

– needs a large m

Spectral Transformation (small RC mesh, 100 by 100; different h for the Krylov subspace, different γ for the rational Krylov subspace):
– h = 10p
– h = 10f
– γ = 10f
– γ = 1p
– γ = 100p

Sweep for Large Range

Difference Between Inverted and Rational

Fixed γ, sweep time step h: γ = 1p, 1n, 1u, 1m, 1, 1k, 1M
