
Page 1: What Does It Take to Accelerate SPICE on the GPU? | GTC …on-demand.gputechconf.com/gtc/2013/presentations/S3364-SPICE... · What Does It Take to Accelerate SPICE on the GPU? M

What Does It Take to Accelerate SPICE on the GPU?

M. Naumov, F. Lannutti, S. Chetlur, L.S. Chien and P. Vandermersch


What is SPICE?

Simulation Program with Integrated Circuit Emphasis

— First version was developed by Laurence Nagel in 1973

— http://en.wikipedia.org/wiki/SPICE

Many variants exist, including (but not limited to)

— Academic:

ngspice, spice3 (UC Berkeley), XSPICE (Georgia Tech)

— Industrial:

HSPICE (Synopsys), PSpice (Cadence), Eldo (Mentor), EEsof (Agilent)


What does SPICE do?

Circuit (diagram): voltage source Vs (branch current Ixs) at node 1; R1 between nodes 1 and 2; R2 from node 2 to ground; R3 between nodes 2 and 3; R4 from node 3 to ground.

Netlist (text file):

        nodes i j
    R1  1 2  1k
    R2  2 0  1k
    R3  2 3  0.4k
    R4  3 0  0.1k
    V1  1 0  PWL (0 0 1n 0 1.1n 5 2n 5)

Physics (Kirchhoff + Ohm + ...):

Resistor Stamp (sparse matrix and RHS)

    row \ col |   Vi     Vj    Ixs  | RHS
    i         |   1/R   -1/R        |
    j         |  -1/R    1/R        |
    xs        |                     |

Voltage Source Stamp (sparse matrix and RHS)

    row \ col |   Vi     Vj    Ixs  | RHS
    i         |                 1   |
    j         |                -1   |
    xs        |    1     -1         | Vs

Linear system (sparse):

    node 1 |  1/R1   -1/R1                             -1 |   | V1  |     |    |
    node 2 | -1/R1   (1/R1+1/R2+1/R3)   -1/R3             |   | V2  |  =  |    |
    node 3 |         -1/R3              (1/R3+1/R4)       |   | V3  |     |    |
    source |  1                                           |   | Ixs |     | Vs |
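The stamping procedure can be sketched in a few lines of Python. This is a minimal illustration, not NGSPICE code: the helper names (`stamp_resistor`, `stamp_vsource`, `solve`) are made up for this sketch, the matrix is stored densely, and Vs is fixed at 5 V (the plateau of the PWL source). The voltage source stamp is applied exactly as tabulated, so with that orientation Ixs comes out negative when the source delivers current into node 1.

```python
def stamp_resistor(A, i, j, r):
    """Add the resistor stamp for a resistor r between nodes i and j (0 = ground)."""
    g = 1.0 / r
    for a, b in ((i, j), (j, i)):
        if a:                          # the ground row/column is dropped
            A[a - 1][a - 1] += g
            if b:
                A[a - 1][b - 1] -= g

def stamp_vsource(A, rhs, i, j, xs, vs):
    """Add the voltage source stamp; xs is the row/column of the branch current."""
    if i:
        A[i - 1][xs] += 1.0
        A[xs][i - 1] += 1.0
    if j:
        A[j - 1][xs] -= 1.0
        A[xs][j - 1] -= 1.0
    rhs[xs] += vs

def solve(A, b):
    """Dense Gaussian elimination with partial pivoting (fine for a 4x4 toy)."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

n_nodes = 3                            # nodes 1..3; unknowns are V1, V2, V3, Ixs
size = n_nodes + 1
A = [[0.0] * size for _ in range(size)]
rhs = [0.0] * size

# Netlist from the slide: (node i, node j, resistance in ohms)
for i, j, r in [(1, 2, 1e3), (2, 0, 1e3), (2, 3, 400.0), (3, 0, 100.0)]:
    stamp_resistor(A, i, j, r)
stamp_vsource(A, rhs, 1, 0, xs=n_nodes, vs=5.0)   # assumed constant Vs = 5 V

v1, v2, v3, ixs = solve(A, rhs)
```

A production simulator would store `A` in a sparse format and reuse its sparsity pattern across solves; the dense lists here only keep the sketch short.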

SPICE Details

Input

— Parse netlist and set up internal data structures

DC Analysis (Newton-Raphson)

— Device model evaluation

— Linear system solution

Transient Analysis (for each time step: Newton-Raphson)

— Device model evaluation

— Linear system solution

— Truncation error + Time step correction

Device Model Evaluation

— Takes between 30%-60% of the simulation

Linear System Solution

— Takes between 30%-60% of the simulation
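How device model evaluation and the linear solve alternate inside Newton-Raphson can be illustrated with a toy DC analysis. The circuit below (a source `vs` driving a diode through a resistor `r`) and all parameter values are assumptions chosen for this sketch, not taken from the talk; with a single unknown node the "linear system" is 1x1.

```python
import math

def diode_eval(v, i_sat=1e-14, v_t=0.02585):
    """Device model evaluation: diode current and conductance dI/dV at voltage v."""
    e = math.exp(v / v_t)
    return i_sat * (e - 1.0), i_sat * e / v_t

def dc_operating_point(vs=5.0, r=1e3, v0=0.7, tol=1e-10, max_iter=100):
    """Newton-Raphson on the KCL equation of the node between resistor and diode."""
    v = v0
    for it in range(max_iter):
        i_d, g_d = diode_eval(v)      # 1) device model evaluation
        f = (v - vs) / r + i_d        #    KCL residual at the node
        jac = 1.0 / r + g_d           #    Jacobian (a 1x1 matrix here)
        dv = -f / jac                 # 2) linear system solution
        v += dv
        if abs(dv) < tol:
            return v, it + 1
    raise RuntimeError("Newton-Raphson did not converge")

v_diode, iterations = dc_operating_point()
```

In transient analysis the same loop would run once per time step, followed by the truncation error check. Real simulators also limit the voltage step per iteration, which this sketch omits.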


Basic models

— Resistor, Capacitor, Inductor, Voltage and Current Source

Transistor models

— MOSFET transistor (BSIM4v7, PSP, etc.)

— Bipolar transistor (Ebers–Moll, Gummel-Poon, etc.)

Other models

— Diodes, etc.

Device Model Evaluation

focus of this presentation

Device Model Evaluation

Key Idea (Transistor - BSIM4v7)

— Many branches are related to fixed parameters

Temperature

Operation Regime

— Reorganize the code (slightly)

Minimize thread divergence

Maximize memory coalescing

[Figure: BSIM4v7 instances T1, T2, T3, T4, ..., T10K, ..., Tn, ..., T100K, each evaluated by the same nested if() ... else ... code]
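One possible reading of the reorganization is sketched below in plain Python rather than CUDA. Everything in it is hypothetical: the toy model, the `mode` flag (standing in for a branch that depends only on a parameter fixed for the whole simulation) and the 32-wide grouping used to mimic a warp. The sketch sorts instances by the fixed flag once, so that neighbouring instances take the same branch and read neighbouring parameter entries.

```python
# Hypothetical toy device model: `mode` is a made-up per-instance flag that is
# fixed for the whole simulation, so the branch outcome never changes.
def eval_branchy(mode, k, v):
    if mode == 0:
        return k * v              # stand-in for one model variant
    return k * v * v              # stand-in for another model variant

def mixed_warps(modes, warp=32):
    """Count groups of `warp` consecutive instances that take different branches."""
    return sum(len(set(modes[s:s + warp])) > 1 for s in range(0, len(modes), warp))

def eval_grouped(modes, ks, vs):
    """Sort instances by the fixed flag once, then evaluate two branch-free runs."""
    order = sorted(range(len(modes)), key=lambda i: modes[i])
    # Struct-of-arrays gather: neighbours in `order` read neighbouring entries.
    m_sorted = [modes[i] for i in order]
    k_sorted = [ks[i] for i in order]
    v_sorted = [vs[i] for i in order]
    split = m_sorted.count(0)     # boundary between the two runs
    out_sorted = [k * v for k, v in zip(k_sorted[:split], v_sorted[:split])]
    out_sorted += [k * v * v for k, v in zip(k_sorted[split:], v_sorted[split:])]
    out = [0.0] * len(modes)
    for pos, i in enumerate(order):   # scatter results back to netlist order
        out[i] = out_sorted[pos]
    return out, m_sorted

n = 1024
modes = [i % 2 for i in range(n)]     # worst case: branches alternate
ks = [1.0 + 0.001 * i for i in range(n)]
vs = [0.5 + 0.0001 * i for i in range(n)]

reference = [eval_branchy(m, k, v) for m, k, v in zip(modes, ks, vs)]
grouped, m_sorted = eval_grouped(modes, ks, vs)
```

Because the flag never changes, the sort is paid once at setup time while the grouped evaluation is repeated at every Newton-Raphson iteration.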

Basic Device Model Evaluation

[Figure: speedup (axis 0 to 40) versus number of instances of models* (8192, 16384, 32768, 65536, 131072) for the Resistor, Capacitor and Inductor netlists]

1) Resistor Netlist: all resistors;

2) Capacitor Netlist: half capacitors and half resistors;

3) Inductor Netlist: half resistors, quarter capacitors and quarter inductors;

*NGSPICE

*NVIDIA C2070, ECC on

*Intel X5690 (Nehalem, 6 cores) @ 3.47GHz

Performance may vary based on OS version and motherboard configuration

Transistor (BSIM4v7) Device Model Evaluation

[Figure: time in ms (axis 0 to 60) on the ISCAS85 Benchmark Suite, CPU (1 core) versus GPU; annotated speedup 6.67x]

*NGSPICE

*NVIDIA C2070, ECC on

*Intel X5690 (Nehalem, 6 cores) @ 3.47GHz

Performance may vary based on OS version and motherboard configuration


Solution of Linear Systems

Solve a set of (sparse) linear systems

Ai xi = fi for i=1,...,k

where the coefficient matrices Ai have the same sparsity pattern

Matrix properties

— Nonsymmetric

— Ill-conditioned

Different methods

— Direct Methods (LU factorization + triangular solve) (focus of this presentation)

— Iterative Methods (GMRES, BiCGStab, etc.)

Sparse Direct Methods

Original linear system

A x = f

Reordering (to minimize fill-in)

(A Q) (Q^T x) = f where Q^T Q = Q Q^T = I

Pivoting

(P^T A Q) (Q^T x) = P^T f where P^T P = P P^T = I

LU factorization

(P^T A Q) = L U

Forward and backward (triangular) solve

L (U y) = b where y = Q^T x and b = P^T f
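The five steps can be traced on a small dense example. The sketch below is illustrative only: it uses dense lists instead of sparse storage, and the column ordering (fewest nonzeros first) is a crude stand-in chosen for this example, not a real fill-reducing ordering. The function names are made up for the sketch.

```python
def lu_partial_pivot(B):
    """Return (p, L, U) such that row r of L*U equals row p[r] of B."""
    n = len(B)
    U = [row[:] for row in B]
    L = [[0.0] * n for _ in range(n)]
    p = list(range(n))
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(U[r][c]))
        U[c], U[piv] = U[piv], U[c]
        L[c], L[piv] = L[piv], L[c]
        p[c], p[piv] = p[piv], p[c]
        L[c][c] = 1.0
        for r in range(c + 1, n):
            L[r][c] = U[r][c] / U[c][c]
            for k in range(c, n):
                U[r][k] -= L[r][c] * U[c][k]
    return p, L, U

def solve_direct(A, f):
    n = len(A)
    # Reordering: columns sorted by nonzero count (assumed stand-in ordering).
    q = sorted(range(n), key=lambda c: sum(A[r][c] != 0 for r in range(n)))
    B = [[A[r][q[c]] for c in range(n)] for r in range(n)]   # A Q
    p, L, U = lu_partial_pivot(B)                            # P^T A Q = L U
    b = [f[p[r]] for r in range(n)]                          # b = P^T f
    z = [0.0] * n
    for r in range(n):                                       # forward: L z = b
        z[r] = b[r] - sum(L[r][k] * z[k] for k in range(r))
    y = [0.0] * n
    for r in range(n - 1, -1, -1):                           # backward: U y = z
        y[r] = (z[r] - sum(U[r][k] * y[k] for k in range(r + 1, n))) / U[r][r]
    x = [0.0] * n
    for c in range(n):                                       # undo y = Q^T x
        x[q[c]] = y[c]
    return x

A = [[4.0, 1.0, 0.0, 1.0],
     [0.0, 3.0, 0.0, 2.0],
     [1.0, 0.0, 5.0, 0.0],
     [2.0, 0.0, 0.0, 6.0]]
f = [10.0, 14.0, 16.0, 26.0]          # equals A times [1, 2, 3, 4]
x = solve_direct(A, f)
```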

Sparse Direct Methods

Recall

— Solving a set of linear systems

— Coefficient matrices have the same sparsity pattern

Assume

— reordering (to minimize fill-in) is the same

— pivoting is also constant

LU factorization (i=1)

(P^T A Q) = L U

LU re-factorization (i=2,...,k) (focus of this presentation)

— Sparsity (the required memory) of L and U is known ahead of time
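Under these assumptions, re-factorization amounts to repeating the elimination with the row order found for the first matrix and skipping the pivot search. A minimal dense sketch, with hypothetical helper names and without the column reordering:

```python
def factor_first(A):
    """i = 1: LU with partial pivoting. Returns the row order p and packed L\\U."""
    n = len(A)
    M = [row[:] for row in A]
    p = list(range(n))
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        p[c], p[piv] = p[piv], p[c]
        for r in range(c + 1, n):
            M[r][c] /= M[c][c]                  # multiplier, stored in place of L
            for k in range(c + 1, n):
                M[r][k] -= M[r][c] * M[c][k]
    return p, M

def refactor(A, p):
    """i = 2..k: reuse row order p; no pivot search, same storage layout."""
    n = len(A)
    M = [A[r][:] for r in p]
    for c in range(n):
        for r in range(c + 1, n):
            M[r][c] /= M[c][c]
            for k in range(c + 1, n):
                M[r][k] -= M[r][c] * M[c][k]
    return M

def reconstruct(M):
    """Multiply the packed factors back together (unit-diagonal L times U)."""
    n = len(M)
    return [[sum((M[r][j] if j < r else 1.0 if j == r else 0.0) *
                 (M[j][c] if j <= c else 0.0) for j in range(n))
             for c in range(n)] for r in range(n)]

A1 = [[1.0, 4.0, 0.0], [3.0, 1.0, 2.0], [0.0, 2.0, 5.0]]
A2 = [[2.0, 5.0, 0.0], [4.0, 1.0, 1.0], [0.0, 3.0, 6.0]]   # same pattern as A1

p, M1 = factor_first(A1)
M2 = refactor(A2, p)
```

Reusing the pivot order is only safe while the pivots stay acceptable; a simulator can fall back to a full factorization when they do not.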


GLU: LU re-factorization on the GPU

Key Idea

— LU-factorization

A = LU

is equivalent to

— Incomplete-LU factorization (with zero fill-in) of

M = L(zeroed) + U(zeroed) + A

where L(zeroed) and U(zeroed) keep the sparsity pattern of L and U but hold zero values

— Many parallel techniques are applicable

Solving a set of systems

Ai xi = fi (i=1,...,k)

A1 = L1U1 (i=1)

Mi = L1(zeroed) + U1(zeroed) + Ai (i=2,...,k)
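The equivalence can be checked numerically on a small example. The sketch below is one reading of the slide, not GLU itself: pivoting and reordering are left out, the matrix is a made-up 4x4 example, and the names `symbolic_fill`, `ilu0` and `dense_lu` are hypothetical. It computes the pattern of L+U including fill-in, stores A on that pattern with explicit zeros at the fill positions, and runs an incomplete LU that only touches stored entries.

```python
def symbolic_fill(pattern, n):
    """Nonzero pattern of L+U (fill-in included) for LU without pivoting."""
    S = set(pattern)
    for c in range(n):
        rows = [r for r in range(c + 1, n) if (r, c) in S]
        cols = [k for k in range(c + 1, n) if (c, k) in S]
        S.update((r, k) for r in rows for k in cols)
    return S

def ilu0(M, S, n):
    """Incomplete LU with zero fill-in: only entries stored in pattern S change."""
    M = dict(M)
    for c in range(n):
        for r in range(c + 1, n):
            if (r, c) not in S:
                continue
            M[r, c] /= M[c, c]
            for k in range(c + 1, n):
                if (c, k) in S and (r, k) in S:
                    M[r, k] -= M[r, c] * M[c, k]
    return M

def dense_lu(A):
    """Reference: full in-place LU without pivoting on a dense copy."""
    n = len(A)
    D = [row[:] for row in A]
    for c in range(n):
        for r in range(c + 1, n):
            D[r][c] /= D[c][c]
            for k in range(c + 1, n):
                D[r][k] -= D[r][c] * D[c][k]
    return D

A = [[4.0, 1.0, 0.0, 1.0],
     [1.0, 4.0, 0.0, 0.0],
     [0.0, 0.0, 4.0, 1.0],
     [1.0, 0.0, 1.0, 4.0]]
n = len(A)
pattern = {(r, k) for r in range(n) for k in range(n) if A[r][k] != 0.0}
S = symbolic_fill(pattern, n)                 # pattern of L + U
M = {(r, k): A[r][k] for (r, k) in S}         # L(zeroed) + U(zeroed) + A
F = ilu0(M, S, n)
D = dense_lu(A)
```

Because the pattern `S` is closed under fill-in, the incomplete factorization never has to drop an update, which is why its result matches the full LU on every stored entry.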

GLU: LU re-factorization on the GPU

GLU

— Developed in CUDA programming language for GPUs

— Sparsity pattern of L and U known ahead of time

— Memory requirements known ahead of time

vs. KLU, which is

— Designed specifically for circuit simulation

— Gilbert-Peierls (single threaded)

vs. PARDISO, which is

— Supernodal method (multi-threaded)

Review of sparse direct solvers can be found at

http://www.cise.ufl.edu/research/sparse/codes/

Test matrices can be found at

http://www.cise.ufl.edu/research/sparse/matrices/

GLU Speedup (C2070)

[Figure: speedup (axis 0 to 4) per test matrix (rajat17, rajat23, trans4, G2_circuit, transient, ASIC_680ks, ASIC_680k, G3_circuit, Freescale1, circuit5M) for GLU vs. KLU (1t) and GLU vs. PARDISO (6t); bar labels: 14.3, 7.5, 7.0|25.2]

*NVIDIA C2070, ECC on

*Intel X5680 (Nehalem, 6 cores) @ 3.33GHz, MKL 10.3.6

Performance may vary based on OS version and motherboard configuration

GLU Speedup (K20x)

[Figure: speedup (axis 0 to 4) per test matrix (rajat17, rajat23, trans4, G2_circuit, transient, ASIC_680ks, ASIC_680k, G3_circuit, Freescale1, circuit5M) for GLU vs. KLU (1t) and GLU vs. PARDISO (8t); bar labels: 16.1, 8.6, 7.0|5.4]

Average Speedup vs. KLU: 2x

Average Speedup vs. PARDISO: 2.5x

* NVIDIA K20, ECC on

* Intel E5-2687w (Sandy Bridge, 8 cores) @ 3.1GHz, MKL 10.3.6

Performance may vary based on OS version and motherboard configuration

Conclusion

The two most time-consuming parts of a SPICE simulation

— Device model evaluation

— Solution of linear systems

Device model evaluation

— Speedup* of up to 6x

Solution of linear systems

— Speedup* (average) of 2x

GPU (overall) acceleration

— SPICE (overall expected) speedup of 2-3x

— No slowdown: easy to test an iteration (and revert if needed)

*: speedup is dependent on input parameters
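The overall 2-3x estimate is consistent with a simple Amdahl-style calculation. The runtime splits below are assumptions picked from within the 30%-60% ranges quoted earlier, and `overall_speedup` is a helper written for this sketch; the talk does not state which split it used.

```python
def overall_speedup(parts):
    """parts: (fraction_of_runtime, speedup) pairs whose fractions sum to 1."""
    parts = list(parts)
    assert abs(sum(frac for frac, _ in parts) - 1.0) < 1e-12
    return 1.0 / sum(frac / s for frac, s in parts)

# Assumed split: half device evaluation (6x), half linear solve (2x).
even_split = overall_speedup([(0.5, 6.0), (0.5, 2.0)])                 # about 3.0

# Assumed split with 10% of the run left unaccelerated.
with_rest = overall_speedup([(0.45, 6.0), (0.45, 2.0), (0.10, 1.0)])   # about 2.5
```

The linear solve dominates the result: even an unlimited speedup of device evaluation in the even split would cap the overall gain at 4x.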


Questions?

Thank you