Circuit Simulation via Matrix Exponential Method
Speaker: Shih-Hung Weng
Adviser: Chung-Kuan Cheng
Date: 05/31/2013
Foundation of Design Flow
[Figure: design-flow stages (Logic Synthesis, Placement, Timing Analysis, Routing, …) above an abstraction layer (lookup table, characterization), with circuit simulation as the foundation.]
Emerging Demands
• Full system verification and analysis
  – scalability and performance
[Figure: on-chip power grid; voltage vs. time waveform, labeled low frequency.]
Publications (1/3)
• Circuit Simulation with Matrix Exponential Method:
1. S.-H. Weng, H. Zhuang and C.K. Cheng, “Adaptive Time Stepping for Power Grid Simulation using Matrix Exponential Method”, submitted to IEEE ICCAD 2013
2. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation using Matrix Exponential Method for Stiffness Handling and Parallel Processing”, IEEE ICCAD, Nov. 2012
3. Q. Chen, W. Schoenmaker, S.-H. Weng, C.K. Cheng, G.-H. Chen, L.-J. Jiang and N. Wong, “A Fast Time-Domain EM-TCAD Coupled Simulation Framework via Matrix Exponential,” IEEE ICCAD, Nov. 2012 (Best Paper Award Candidate)
4. Y. Li, Q. Cheng, S.-H. Weng, C.K. Cheng and N. Wong, “Globally Stable, Highly Parallelizable Fast Transient Circuit Simulation via Faber Series”, IEEE NewCAS May. 2012
5. S.-H. Weng, Q. Chen and C.K. Cheng, “Time-Domain Analysis of Large-Scale Circuits by Matrix Exponential Method with Adaptive Control”, IEEE Trans. on CAD, Jul. 2012
6. Q. Chen, S.-H. Weng and C.K. Cheng, “A Practical Regularization Technique for Modified Nodal Analysis in Large-Scale Time-Domain Circuit Simulation”, IEEE Trans. on CAD, Jun. 2012
7. S.-H. Weng, Q. Chen and C.K. Cheng, “Circuit Simulation by Matrix Exponential Method,” IEEE ASIC Conference, Oct. 2011
8. S.-H. Weng, P. Du and C.K. Cheng, “A Fast and Stable Explicit Integration Method by Matrix Exponential Operator for Large Scale Circuit Simulation”, IEEE ISCAS, May. 2011
Publications (2/3)
• Clock Gating Synthesis:
9. S.-H Weng, Y.-M. Kuo and S.-C. Chang, “Timing Optimization in Sequential Circuit by Exploiting Clock-Gating Logic,” ACM Trans. on DAES, April 2012.
10. Y.-M. Kuo, S.-H. Weng, and S.-C. Chang, “A Novel Sequential Circuit Optimization with Clock Gating Logic,” IEEE ICCAD, Nov. 2008
• High-speed Interconnect:
11. G. Sun, S.-H. Weng, C.K. Cheng, B. Lin and L. Zeng, “An On-Chip Global Broadcast Network Design with Equalized Transmission Lines in the 1024-Core Era”, IEEE SLIP, Jun. 2012
12. S.-H. Weng, Y. Zhang, J. F. Buckwalter and C.K. Cheng, “Energy Efficiency Optimization through Co-Design of the Transmitter and Receiver in High-Speed On-Chip Interconnects”, accepted by IEEE Trans. on VLSI
• Placement and Routing:
13. C.K. Cheng, P. Du, A.B. Kahng and S.-H. Weng, “Low-Power Gated Bus Synthesis for 3D IC via Rectilinear Shortest-path Steiner Graph,” IEEE ISPD, Mar. 2012
14. P. Du, W. Zhao, S.-H. Weng, C.K. Cheng and R.L. Graham, “Character Design and Stamp Algorithms for Character Projection Electron-Beam Lithography,” IEEE ASPDAC, Feb. 2012
Publications (3/3)
• Power Grid Analysis:
15. X. Hu, P. Du, S.-H. Weng and C.K. Cheng, “Worst-Case Noise Prediction With Non-zero Current Transition Times for Power Grid Planning,” accepted by IEEE Trans. on VLSI.
16. C.-C. Chou, H.-H. Chuang, T.-L. Wu, S.-H. Weng, and C.K. Cheng, “Eye Prediction of Digital Driver with Power Distribution Network Noise,” IEEE EPEPS, Nov. 2012 (Best Student Paper Award)
17. P. Du, S.-H. Weng, X. Hu and C.K. Cheng, “Power Grid Sizing via Convex Programming,” IEEE ASIC Conference, Oct. 2011
18. P. Du, X. Hu, S.H. Weng, A. Shayan, X. Chen, A. E. Engin and C.K. Cheng, “Worst-Case Noise Prediction with Non-zero Current Transition Times for Early Power Distribution System Verification,” IEEE ISQED, Mar. 2010
19. S.-H. Weng, Y.-M. Kuo, S.-C. Chang, and M. Marek-Sadowska, “Timing Analysis Considering IR Drop Waveforms in Power Gating Designs,” IEEE ICCD, Oct. 2008
Outline
• Numerical Integration in Circuit Simulation
• Matrix Exponential Method
  – Krylov Subspace Approximation
  – Rational Krylov Subspace Approximation
  – Parallelism
• Experimental Results
• Conclusions
Circuit Formulation
• Formulated as a system of DAEs [Ho et al. ‘75]

$$\dot{q}(x(t)) + C_L \dot{x}(t) + G_L x(t) + i(x(t)) = u(t)$$

  – $C_L$: capacitance & inductance
  – $G_L$: resistance & incidence
  – $x(t)$: branch currents & nodal voltages
  – $\dot{q}(x(t))$: derivative of charges in nonlinear devices
  – $i(x(t))$: currents of nonlinear devices
  – $u(t)$: input sources
  – nonlinear devices are linearized by a compact model (BSIM, PSP, etc.)
• Solve for x(t) with an implicit or explicit numerical method; after linearization:

$$C \dot{x}(t) = -G x(t) + u(t)$$
Numerical Integration (1/2)

$$C \dot{x}(t) = -G x(t) + u(t)$$

• Forward Euler (1st order explicit): sparse matrix-vector product

$$x_{n+1} = (I - h C^{-1} G)\, x_n + h C^{-1} u_n$$

  – stability issue for stiff circuits (unstable result)
• Backward Euler (1st order implicit): solving a linear system

$$x_{n+1} = (C/h + G)^{-1} (C x_n / h + u_{n+1})$$

  – performance & scalability issues
[Figure: forward Euler and backward Euler waveforms; the forward Euler result is unstable.]
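The stability contrast between the two Euler updates can be sketched as follows. This is a minimal illustration, not the speaker's code: the two-node circuit (`C`, `G`, `u`) and the step size `h` are hypothetical values chosen so that h·λ_max = 10, which is outside the forward Euler stability region.

```python
import numpy as np

def forward_euler_step(C, G, x, u, h):
    # x_{n+1} = x_n + h * C^{-1} (u_n - G x_n), i.e. (I - h C^{-1} G) x_n + h C^{-1} u_n
    return x + h * np.linalg.solve(C, u - G @ x)

def backward_euler_step(C, G, x, u_next, h):
    # (C/h + G) x_{n+1} = C x_n / h + u_{n+1}
    return np.linalg.solve(C / h + G, C @ x / h + u_next)

# Hypothetical stiff example: time constants differ by a factor of 1e4.
C = np.eye(2)
G = np.diag([1.0, 1.0e4])
u = np.array([1.0, 1.0])
h = 1.0e-3

x_fe = np.zeros(2)
x_be = np.zeros(2)
for _ in range(50):
    x_fe = forward_euler_step(C, G, x_fe, u, h)
    x_be = backward_euler_step(C, G, x_be, u, h)

print("forward Euler :", x_fe)   # fast node diverges
print("backward Euler:", x_be)   # fast node settles near u/G = 1e-4
```

With these values the fast node obeys x_{n+1} = -9 x_n + 1e-3 under forward Euler, so its magnitude grows at every step, while backward Euler contracts toward the steady state.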
Numerical Integration (2/2)
[Table: speed of each method on linear and nonlinear circuits at high / mild / low stiffness. Surviving entries in row order: Forward Euler: slow, fast, slow, fast; Backward Euler: medium, slow; Trapezoidal: > Backward Euler; and beyond?: fast.]

Methods | Computation | Scalability | Error | Stability | Step size
Forward Euler | x=Av | high | O(h^2) | low | tiny
Backward Euler | Ax=b | low | O(h^2) | A-stable | medium
Trapezoidal | Ax=b | low | O(h^3) | A-stable | > Backward Euler
and beyond? | simple | high | O(h^n) | high | large

• Stiffness: more # steps
• lots of Ax=b solves; one Ax=b with fixed step size in C/h+G
• Performance = # steps × computation per step (circuit dependent)
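Matrix Exponential Method (1/2)
• Analytical solution of $C \dot{x}(t) = -G x(t) + u(t)$
  – Let $A = -C^{-1}G$, $b = C^{-1}u$ (C can be regularized [TCAD ‘12])

$$x(t+h) = e^{Ah} x(t) + \int_0^h e^{A(h-\tau)}\, b(t+\tau)\, d\tau$$

• Let input be piecewise linear: $b(t+\tau) = b(t) + \tau\, \frac{b(t+h) - b(t)}{h}$ for $0 \le \tau \le h$

$$x(t+h) = e^{Ah} x(t) + (e^{Ah} - I) A^{-1} b(t) + (e^{Ah} - (Ah + I)) A^{-2}\, \frac{b(t+h) - b(t)}{h}$$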
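The piecewise-linear closed form can be evaluated directly for small dense systems. The sketch below is an illustration only: the function name `mexp_pwl_step` is hypothetical, it assumes A is nonsingular, and it forms the dense exponential with `scipy.linalg.expm`, which is not how a large circuit would be handled.

```python
import numpy as np
from scipy.linalg import expm

def mexp_pwl_step(A, x, b0, b1, h):
    """Advance x' = A x + b(t) by h, with b linear from b0 = b(t) to b1 = b(t+h).

    Assumes A is nonsingular (hypothetical helper for small dense examples).
    """
    n = A.shape[0]
    E = expm(A * h)
    I = np.eye(n)
    slope = (b1 - b0) / h
    # (e^{Ah} - I) A^{-1} b(t); A^{-1} commutes with e^{Ah}
    term_const = np.linalg.solve(A, (E - I) @ b0)
    # (e^{Ah} - (Ah + I)) A^{-2} (b(t+h) - b(t)) / h
    term_slope = np.linalg.solve(A, np.linalg.solve(A, (E - I - A * h) @ slope))
    return E @ x + term_const + term_slope

# Scalar check: x' = -2x + 2*tau, x(0) = 1 has the solution tau - 1/2 + 1.5*exp(-2*tau)
x_next = mexp_pwl_step(np.array([[-2.0]]), np.array([1.0]),
                       np.array([0.0]), np.array([1.0]), 0.5)
print(x_next, 1.5 * np.exp(-1.0))
```

Because the formula is exact for piecewise-linear inputs, the step size is limited by where the input slope changes rather than by a truncation error in the integration rule.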
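Matrix Exponential Method (2/2)
• One-exponential formulation [Al-Mohy & Higham ‘11]
  – reduces three matrix exponentials to one

$$x(t+h) = e^{Ah} x(t) + (e^{Ah} - I) A^{-1} b(t) + (e^{Ah} - (Ah + I)) A^{-2}\, \frac{b(t+h) - b(t)}{h}$$

$$x(t+h) = \begin{bmatrix} I_n & 0 \end{bmatrix} \exp\!\left( h \begin{bmatrix} A & W \\ 0 & J \end{bmatrix} \right) \begin{bmatrix} x(t) \\ e_2 \end{bmatrix}$$

where

$$W = \begin{bmatrix} \frac{b(t+h) - b(t)}{h} & b(t) \end{bmatrix}, \quad J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}, \quad e_2 = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$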
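A small sketch of the one-exponential form, assuming dense matrices and `scipy.linalg.expm`; the function name `mexp_one_exponential_step` is hypothetical. Unlike the three-exponential formula, it needs no inverse of A, so it also works when A is singular.

```python
import numpy as np
from scipy.linalg import expm

def mexp_one_exponential_step(A, x, b0, b1, h):
    """x(t+h) = [I_n 0] exp(h [[A, W], [0, J]]) [x(t); e_2] for a linear-in-time input."""
    n = A.shape[0]
    W = np.column_stack([(b1 - b0) / h, b0])        # [slope, b(t)]
    J = np.array([[0.0, 1.0], [0.0, 0.0]])
    A_aug = np.block([[A, W], [np.zeros((2, n)), J]])
    z = np.concatenate([x, [0.0, 1.0]])             # [x(t); e_2]
    return (expm(A_aug * h) @ z)[:n]

# Same scalar case as x' = -2x + 2*tau, x(0) = 1
x_next = mexp_one_exponential_step(np.array([[-2.0]]), np.array([1.0]),
                                   np.array([0.0]), np.array([1.0]), 0.5)
# Singular A: x' = tau, x(0) = 0, so x(1) = 1/2
x_ramp = mexp_one_exponential_step(np.array([[0.0]]), np.array([0.0]),
                                   np.array([0.0]), np.array([1.0]), 1.0)
print(x_next, x_ramp)
```

For large sparse systems only the action of the exponential on a vector is needed, which is what the Krylov approaches in the following slides approximate.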
Advantages
• Accuracy: analytical solution
  – approximating $e^{Ah}$ as $(I + Ah)$ gives forward Euler
  – approximating $e^{Ah}$ as $(I - Ah)^{-1}$ gives backward Euler
• Stability: A-stable for passive circuits
[Figure: waveforms compared against a reference solution.]
How to compute $e^{A}v$?
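Computation of Matrix Exponential
• 19 dubious ways [Moler & Van Loan ‘03]

Categories | Based on
Series Method | $I + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots$
Rational Approximation | $e^{A} \approx D(A)^{-1} N(A)$
Decomposition | $A = S B S^{-1}$
Splitting | $e^{B+C} = e^{B} e^{C}$ when $BC = CB$
Quadrature Rule | $\frac{1}{2\pi i} \oint e^{z} (zI - A)^{-1}\, dz$
Krylov Subspace | $\mathrm{span}\{v, Av, \dots, A^{m}v\}$

• Annotations: $e^{A}$ for small matrices, $e^{A}v$ for large matrices; spec(A); regular basis and rational basis for the Krylov subspace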
Krylov Subspace Approximation (1/2)
• Krylov subspace $K_m(A, v) = \mathrm{span}\{v, Av, A^2v, \dots, A^{m-1}v\}$
  – orthogonalized by Arnoldi process (sparse matrix-vector multiplication; m is about 10~100): $H_m = V_m^{T} A V_m$
  – approximate $e^{Ah}v$ by $\|v\|_2 V_m e^{H_m h} e_1$
  – posteriori error estimation [Saad92] (fast error estimation):

$$Err_{krylov}(m, h) = \|v\|_2 \left| H_{m+1,m}\, e_m^{T} e^{H_m h} e_1 \right|$$

• Scaling invariant: efficiency and adaptivity
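The Arnoldi projection and the posteriori estimate can be sketched in a few lines. This is an illustrative dense-NumPy version, not the speaker's Matlab implementation; the names `arnoldi` and `krylov_expm_v` and the breakdown tolerance are assumptions.

```python
import numpy as np
from scipy.linalg import expm

def arnoldi(matvec, v, m, tol=1e-14):
    """Build an orthonormal basis V of K_m(A, v) and the Hessenberg matrix H."""
    n = v.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = matvec(V[:, j])                 # only needs a (sparse) matrix-vector product
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:               # breakdown: the subspace is invariant
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m + 1], H[:m + 1, :m], beta

def krylov_expm_v(matvec, v, h, m):
    """Approximate e^{Ah} v by ||v|| V_m e^{H_m h} e_1 and return the error estimate."""
    V, H, beta = arnoldi(matvec, v, m)
    m = H.shape[1]
    col = expm(h * H[:m, :m])[:, 0]         # e^{H_m h} e_1
    approx = beta * (V[:, :m] @ col)
    err_est = beta * abs(H[m, m - 1] * col[m - 1])
    return approx, err_est
```

Changing h reuses the same `V` and `H`; only the small exponential `expm(h * H_m)` is recomputed, which is the scaling property the slide refers to.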
Krylov Subspace Approximation (2/2)
• Stiffness affects step size and dimension
  – Arnoldi process captures extreme and clustered eigenvalues
  – Error bound [Saad92]:

$$Err_{krylov} \le 2 \|v\|_2\, \frac{\rho^{m} e^{\rho}}{m!}, \quad \text{where } \rho = \|hA\|_2$$

  – shrink h or increase m for capturing critical eigenvalues
  – remedied by restarted scheme and scaling effect [ICCAD ‘12]
[Figure: two spectrum plots in the complex plane (axes Real, Imag) for a highly stiff circuit, spanning −λ_max to −λ_min. Labels: captured regions; Arnoldi process with a small m; critical part for e^{Ah}.]
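Rational Krylov Subspace Approximation (1/2)
• Rational basis $(I - \gamma A)^{-1}$
  – $K_m((I - \gamma A)^{-1}, v) = \mathrm{span}\{v, (I - \gamma A)^{-1}v, \dots, (I - \gamma A)^{-m}v\}$
• Arnoldi process, producing $V_m$ and $\tilde{H}_m$:
    for j = 1, 2, ..., m
        solve (I − γA) w = v_j
        for i = 1, 2, ..., j
            H̃_{i,j} = w^T v_i
            w = w − H̃_{i,j} v_i
        end
        H̃_{j+1,j} = ‖w‖_2
        v_{j+1} = w / H̃_{j+1,j}
    end
  – the solve is (C + γG) w = C v_j: avoids regularization of C; one LU for a linear circuit
  – the regular Krylov basis uses w = A v_j instead
• $\tilde{H}_m = V_m^{T} (I - \gamma A)^{-1} V_m$
• Subspace for A: $H_m = \frac{1}{\gamma} (I - \tilde{H}_m^{-1})$, so that $A V_m \approx V_m H_m$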
• Approximation of $e^{Ah}v$

$$e^{Ah} v \approx \|v\|_2 V_m e^{H_m h} e_1, \quad H_m = \frac{1}{\gamma} (I - \tilde{H}_m^{-1})$$

• Posteriori error estimation [van den Eshof 06] (adaptivity)

$$Err_{rational}(m, h) = \|v\|_2\, \frac{\tilde{H}_{m+1,m}}{\gamma} \left\| (I - \gamma A)\, v_{m+1}\, e_m^{T} \tilde{H}_m^{-1} e^{H_m h} e_1 \right\|_2$$
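The shift-and-invert Arnoldi loop and the projection back to A can be sketched as below. This is a dense illustration under stated assumptions: `rational_krylov_expm_v` is a hypothetical name, the shift is written as `gamma`, and a dense LU from `scipy.linalg.lu_factor` stands in for the sparse factorization a real power grid would need.

```python
import numpy as np
from scipy.linalg import expm, lu_factor, lu_solve

def rational_krylov_expm_v(C, G, v, h, gamma, m, tol=1e-12):
    """Approximate e^{Ah} v, A = -C^{-1} G, in the basis generated by (I - gamma*A)^{-1}."""
    n = v.shape[0]
    lu = lu_factor(C + gamma * G)           # one LU, reused for every solve
    V = np.zeros((n, m + 1))
    Ht = np.zeros((m + 1, m))               # H-tilde
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):
        w = lu_solve(lu, C @ V[:, j])       # (C + gamma*G) w = C v_j
        for i in range(j + 1):              # modified Gram-Schmidt
            Ht[i, j] = w @ V[:, i]
            w = w - Ht[i, j] * V[:, i]
        Ht[j + 1, j] = np.linalg.norm(w)
        if Ht[j + 1, j] < tol:              # breakdown: the subspace is invariant
            m = j + 1
            break
        V[:, j + 1] = w / Ht[j + 1, j]
    # Project back to A: H_m = (I - Ht_m^{-1}) / gamma
    Hm = (np.eye(m) - np.linalg.inv(Ht[:m, :m])) / gamma
    return beta * (V[:, :m] @ expm(h * Hm)[:, 0])
```

Note that C is never inverted: every iteration is a forward/backward substitution with the same factors of C + γG, which matches the "one LU for linear circuit" remark.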
Rational Krylov Subspace Approximation (2/2)
• Spectral transformation
  – similar to preconditioning
  – relax stiffness constraint
  – enable large step size with less dimension
[Figure: spectrum plots in the complex plane (axes Real, Imag). Labels: transforming spectrum by (I − γA)^{-1}; original eigenvalues from −λ_max to −λ_min; transformed eigenvalues with a small gap, within a unit circle; critical part for e^{A} captured by Arnoldi process; projecting back to A by (I − H̃^{-1})/γ; applying large h to (I − H̃^{-1})/γ; small m is acceptable.]
[Figure: Error = ‖e^{Ah}v − ‖v‖_2 V_m e^{H_m h} e_1‖ with γ fixed, sweeping m and h; annotated region: small step size.]
[Figure: Error = ‖e^{Ah}v − ‖v‖_2 V_m e^{H_m h} e_1‖ with h fixed, sweeping m and γ; annotated region: large error at γ = 10^{-12}.]
Wrap Up
[Table: speed of each method on linear and nonlinear circuits at high / mild / low stiffness. Surviving entries in row order: Forward Euler: slow, fast, slow, fast; Backward Euler: medium, slow; Trapezoidal: > Backward Euler; Krylov Approx: slow, fast, slow, medium; Rational Krylov: fast, slow.]

Methods | Computation | Scalability | Error | Stability | Step size
Forward Euler | x=Av | high | O(h^2) | low | tiny
Backward Euler | Ax=b | low | O(h^2) | A-stable | medium
Trapezoidal | Ax=b | low | O(h^3) | A-stable | > Backward Euler
Krylov Approx | x=Av | high | O(h^n) | high | medium
Rational Krylov | Ax=b | low | O(h^n) | high | large
Parallelism in Krylov Subspace
• Arnoldi process
  – sparse matrix-vector multiplication [Bell & Garland ‘09]
• Exponential of a small matrix [Higham ‘05]
  – dense matrix by matrix operation
[Figure: product of an n × n matrix (entries a_{1,1} … a_{n,n}) with a vector (x_1 … x_n); each row is assigned to its own thread (thread 1, thread 2, …, thread n−1, thread n).]
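The row-per-thread decomposition can be illustrated with a plain CSR matrix-vector product. This is a sequential Python sketch with a hypothetical function name `csr_spmv`; it only shows that each output row is computed independently, which is what allows one thread per row on a GPU.

```python
import numpy as np

def csr_spmv(data, indices, indptr, x):
    """y = A x for A stored in CSR form (data, column indices, row pointers)."""
    data = np.asarray(data, dtype=float)
    indices = np.asarray(indices)
    x = np.asarray(x, dtype=float)
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):               # rows are independent: one thread per row
        start, end = indptr[row], indptr[row + 1]
        y[row] = np.dot(data[start:end], x[indices[start:end]])
    return y

# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
y = csr_spmv([1, 2, 3, 4, 5], [0, 2, 1, 0, 2], [0, 2, 3, 5], [1, 1, 1])
print(y)
```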
Input Grouping
• Constant slope within a step

$$x(t+h) = e^{Ah} x(t) + (e^{Ah} - I) A^{-1} b(t) + (e^{Ah} - (Ah + I)) A^{-2}\, \frac{b(t+h) - b(t)}{h}$$

[Figure: two piecewise-linear inputs (input 1, input 2) over time with breakpoints t1 … t15; tiny steps due to maintaining constant slope.]
Input Grouping
• Constant slope within a step
[Figure: inputs split into group 1 and group 2, each on its own time axis with breakpoints t1 … t8; group 1 runs on thread 1 and group 2 on thread 2.]
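One simple way to form such groups is to bucket the sources by their breakpoint times, so that every source in a group changes slope at the same instants. The sketch below is an assumption about how grouping could be coded, not the speaker's implementation; the source names and waveforms are made up. For a linear circuit the group responses can then be simulated independently and summed by superposition.

```python
from collections import defaultdict

def group_inputs_by_breakpoints(sources):
    """sources maps a source name to its piecewise-linear points [(time, value), ...]."""
    groups = defaultdict(list)
    for name, points in sources.items():
        key = tuple(t for t, _ in points)   # sources sharing all breakpoints share a group
        groups[key].append(name)
    return dict(groups)

def union_breakpoints(sources):
    """Breakpoints a single ungrouped simulation would have to step through."""
    return sorted({t for points in sources.values() for t, _ in points})

# Hypothetical current sources
sources = {
    "i1": [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)],
    "i2": [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)],
    "i3": [(0.0, 0.0), (1.5, 1.0), (3.0, 0.0)],
}
groups = group_inputs_by_breakpoints(sources)
print(groups)
print(union_breakpoints(sources))
```

Each group sees only its own breakpoints, so its steps are longer than those forced by the union of all breakpoints.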
Settings of Experiments
• Environment
  – Implemented in Matlab
  – Intel i7 2.67GHz with 4GB memory
• Benchmarks
  – Nonlinear and large-scale circuits
  – Power distribution networks
  – IBM power grid testcases [Nassif 08]

Nonlinear and large-scale circuits:
Design | Category | # R | # C | # Trans. | Size | Stiffness
D1 | 16bit adder | 723 | 34 | 448 | 579 | 1.1×10^3
D2 | ALU | 13.6K | 4.3K | 6502 | 10K | 5.4×10^6
D3 | IO | 1.26M | 34.6K | 1461 | 630K | 1.6×10^6
D4 | Power grid | 10.4M | 8.6M | 0 | 12M | 2.6×10^5

Stiffness = Re(λ_max) / Re(λ_min), where λ are the generalized eigenvalues of (G, C)
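For a small circuit the stiffness figure can be computed directly from the generalized eigenvalues. The sketch assumes a symmetric G and a symmetric positive definite C (so the eigenvalues are real) and uses the dense solver `scipy.linalg.eigh`; the function name `stiffness` and the matrices are hypothetical, and benchmark-sized circuits would need a sparse eigensolver instead.

```python
import numpy as np
from scipy.linalg import eigh

def stiffness(G, C):
    """Ratio of largest to smallest generalized eigenvalue of G v = lambda C v."""
    lam = eigh(G, C, eigvals_only=True)     # returned in ascending order
    return lam[-1] / lam[0]

# Hypothetical two-node example: eigenvalues 0.5 and 1e4
G = np.diag([1.0, 1.0e4])
C = np.diag([2.0, 1.0])
print(stiffness(G, C))
```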
Power distribution networks (RC tanks for PCB and package):
Design | Area (mm^2) | # R | # C | # L | Size | Stiffness
P1 | 0.35^2 | 23K | 15K | 15K | 45.7K | 8.7×10^9
P2 | 1.40^2 | 348K | 228K | 228K | 688K | 8.3×10^9
P3 | 2.80^2 | 1.46M | 0.97M | 0.97M | 2.90M | 1.0×10^10
P4 | 5.00^2 | 3.75M | 2.47M | 2.47M | 7.40M | 1.0×10^10
IBM power grid testcases:
Design | # R | # C | # L | # I | # V | Size | Stiffness
ibmpg2t | 245K | 36K | 330 | 36K | 330 | 164K | 3.5×10^12
ibmpg3t | 1.60M | 201K | 955 | 201K | 955 | 1M | 3.4×10^11
ibmpg4t | 1.83M | 265K | 962 | 266K | 962 | 1.2M | 2.5×10^11
ibmpg5t | 1.55M | 473K | 277 | 473K | 539K | 2.1M | 4.7×10^11
ibmpg6t | 2.41M | 761K | 281 | 761K | 836K | 3.2M | 3.8×10^11
Nonlinear and Large-scale Circuits
• Matrix exponential method (MEXP)
  – Krylov subspace approximation
  – Restarted scheme and parallel SpMV on GPU
• Trapezoidal method (TRAP)
  – same adaptive scheme as MEXP

Design | Size | time | m | TRAP | MEXP-Krylov | speedup
D1 | 579 | 100ps | 20 | 671.4s | 408.7s | 1.64X
D2 | 10K | 100ps | 30 | 3,085.91s | 982.14s | 3.14X
D3 | 630K | 100ps | 30 | 8,053.45s | 535.92s | 15.05X
D4 | 12M | 1ns | 20 | fails | 629.56s | n/a

• Parallel SpMV
Power Distribution Networks
• Simulate long time span (1μs) for step response
• One LU factorization
  – cost amortized over forward/backward substitutions
• MEXP with rational basis adaptively scales h/γ (adaptive & large step size)
• TRAP uses predetermined step size

Design | TRAP (h = 10ps): LU (s) | TRAP: Total | MEXP – Rational (γ = 10^{-10}): LU (s) | MEXP: Total | Speedup
P1 | 0.67 | 44.85m | 0.68 | 2.86m | 15.73X
P2 | 15.60 | 15.43h | 15.48 | 54.57m | 16.96X
P3 | 91.60 | 76.92h | 93.28 | 4.30h | 17.91X
P4 | 293.81 | 203.64h | 298.83 | 11.26h | 18.08X
Power Distribution Networks
38
IBM Testcases
• Widely adopted benchmarks
• Many input current sources
• Same MEXP with rational basis and TRAP

Design | TRAP (h = 10ps): LU (s) | TRAP: Total (s) | MEXP – Rational (γ = 10^{-10}): LU (s) | MEXP: Total (s) | Speedup
ibmpg2t | 1.31 | 48.19 | 1.29 | 41.81 | 1.15X
ibmpg3t | 18.05 | 493.97 | 18.41 | 413.90 | 1.19X
ibmpg4t | 30.32 | 675.78 | 31.01 | 229.13 | 2.95X
ibmpg5t | 16.16 | 657.13 | 16.48 | 649.97 | 1.01X
ibmpg6t | 23.99 | 965.53 | 34.60 | 915.62 | 1.05X

• Annotation: ill alignment
IBM Testcases
• Applying simple grouping
  – each group of inputs has the same pivot points
  – 6X speedup on average

Design | TRAP (h = 10ps): LU (s) | TRAP: Total (s) | # Group | MEXP – Rational (γ = 10^{-10}): LU (s) | MEXP: Total (s) | Speedup
ibmpg2t | 1.31 | 48.19 | 25 | 1.29 | 7.93 | 6.77X
ibmpg3t | 18.05 | 493.97 | 25 | 18.41 | 86.24 | 6.08X
ibmpg4t | 30.32 | 675.78 | 4 | 31.01 | 124.16 | 5.73X
ibmpg5t | 16.16 | 657.13 | 25 | 16.48 | 111.97 | 5.44X
ibmpg6t | 23.99 | 965.53 | 25 | 34.60 | 166.34 | 5.80X
Conclusions
• Emerging challenges in circuit simulation
  – scalability and performance
• Matrix exponential method
  – accuracy, adaptivity and stability
  – regular and rational Krylov subspace approximation
• Effectiveness of matrix exponential method
  – Simulate a large-scale circuit with 12M nodes
  – Nonlinear circuits: 6.61X speedup on average
  – Impulse response for PDNs: 15X speedup
  – IBM testcases: 6X speedup using input grouping
Future Works
• Variant basis in Krylov subspace
  – inverted, extended basis
• Model Order Reduction and matrix exponential method
  – both exploiting Krylov subspace
  – utilizing well-developed MOR to MEXP
• Hybrid simulation via matrix exponential
  – handle thermal, mechanical phenomena with FEM
Thank you!
Where are we?
• Trade off between stability and performance
[Figure: methods placed on axes of computational effort (low to high) and stability (low to high): Forward Euler; Backward Euler; Trapezoidal Method (SPICE); ACES [Devgan & Rohrer, ‘97]; SILCA [Li & Shi, ‘03]; LIM [J. E. Schutt-Aine, ‘01]; Telescopic [Dong & Li, ‘10]; Waveform Relaxation [E. Lelarasmee et al., ‘82]; Domain Decomposition [K. Sun et al., ‘07]; Matrix Exponential Method [Weng et al. ’11].]
• ETD in numerical community:
  – [Saad ‘92]
  – [Ban et al. ‘11]
  – [Aluffi-Pentini et al. ‘03]
  – [Hochbruck et al. ‘97]
• Tailor for circuit simulation:
  – Adaptive step control
  – Scaling effect
  – Nonlinear device
  – Parallelization
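Adaptive Step Control
• Typical circuit behavior
[Figure: typical circuit waveform with regions labeled larger h and smaller h.]
• Error budget:

$$err(h) \le \frac{h}{T_{total}}\, Err_{total}$$

$$x(t+h) = \begin{bmatrix} I_n & 0 \end{bmatrix} \exp\!\left( h \begin{bmatrix} A & W \\ 0 & J \end{bmatrix} \right) \begin{bmatrix} x(t) \\ e_2 \end{bmatrix}$$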
Adaptive Step Size Strategy
• Adjustment of step size
  – Krylov subspace approximation
    • only requires scaling $H_m$: αA → α$H_m$
    • re-calculate $e^{\alpha H_m}$
  – backward Euler
    • (C/h+G) changes, so the linear system must be solved again:

$$x_{n+1} = (C/h + G)^{-1} (C x_n / h + B u_{n+1})$$

• Strategy:
  – maximize step size within a given error budget $Err_{total}$
  – errors come from the Krylov subspace method and from linearization

$$Err_{krylov} \le \frac{h}{T_{total}}\, Err^{L}_{total}, \qquad Err_{nonlinear} \le \frac{h}{T_{total}}\, Err^{NL}_{total}$$
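The budget rule can be turned into a simple step-size search that reuses an existing Krylov projection. The sketch below is one possible reading of the strategy, not the speaker's algorithm: the function name, the halving factor, and the use of the posteriori estimate ‖v‖·|H_{m+1,m} e_m^T e^{hH_m} e_1| as the per-step error are assumptions. Only the small matrix `Hm` is re-exponentiated for each trial h.

```python
import numpy as np
from scipy.linalg import expm

def largest_step_within_budget(Hm, h_next, beta, h_max, t_total, err_total,
                               shrink=0.5, h_min=1e-18):
    """Shrink h from h_max until the Krylov error estimate fits (h / t_total) * err_total.

    Hm: m x m Hessenberg matrix, h_next: H[m+1, m], beta: ||v||_2 (from one Arnoldi run).
    """
    h = h_max
    while h > h_min:
        col = expm(h * Hm)[:, 0]                # e^{h H_m} e_1, no new Arnoldi process
        err = beta * abs(h_next * col[-1])
        if err <= (h / t_total) * err_total:
            return h, err
        h *= shrink
    raise RuntimeError("no step size satisfies the error budget")

# Hypothetical 3 x 3 projection
Hm = np.array([[-2.0, 1.0, 0.0], [1.0, -2.0, 1.0], [0.0, 1.0, -2.0]])
h, err = largest_step_within_budget(Hm, 1.0, 1.0, h_max=1.0, t_total=1.0, err_total=1e-2)
print(h, err)
```

A backward Euler solver in the same loop would have to refactor C/h + G for every trial h, which is the cost contrast the slide points out.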
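Nonlinear Formulation
• Decouple nonlinear and linear components

$$x(t+h) = e^{Ah} x(t) + \int_0^h e^{A(h-\tau)} \left[ F(x(t+\tau)) + b(t+\tau) \right] d\tau$$

  – $F(x(t))$ is the nonlinear term built from $C^{-1} i(x(t))$
• Approximate the integral of $e^{A(h-\tau)} F$:

$$x(t+h) - \frac{h}{2} F(x(t+h)) = e^{Ah} \left( x(t) + \frac{h}{2} F(x(t)) \right) + (e^{Ah} - I) A^{-1} b(t) + (e^{Ah} - (Ah + I)) A^{-2}\, \frac{b(t+h) - b(t)}{h}$$

  – left-hand side: calculate Jacobian matrix
  – right-hand side: constant during Newton’s iteration
• J(F) in MEXP has fewer non-zeros
  – MEXP: $\frac{h}{2} C^{-1} G_{NL}$
  – BE: $C/h + G_L + G_{NL}$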
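Only Inverted
• Rational basis $A^{-1}$
  – $K_m(A^{-1}, v) = \mathrm{span}\{v, A^{-1}v, \dots, A^{-m}v\}$
  – requires more m and smaller h
[Figure: spectrum in the complex plane (axes Real, Imag) after shifted-and-inverted vs. only inverted transformation; only inverted gives a smaller spectrum, reaching −1/λ_min.]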
Different γ
[Figures: results for different γ; annotation: needs large m.]
Spectral Transformation (plots for h = 10p, h = 10f, γ = 10f, γ = 1p, γ = 100p)
• Small RC mesh, 100 by 100
• Different h for Krylov subspace
• Different γ for rational Krylov subspace
Sweep γ for Large Range
[Figures: two plots sweeping γ over a large range.]
Difference Between Inverted and Rational
[Figures: sweep time step h with fixed γ = 1p, 1n, 1u, 1m, 1, 1k, 1M.]