GPU-Enabled Simulation and Visualization of Nanoelectronic Devices

Mathieu Luisier, Sascha Brück, Mauro Calderara, and Jean M. Favre*
Integrated Systems Laboratory, ETH Zürich, Switzerland
*CSCS, Lugano, Switzerland

NVIDIA Booth Presentation, SC15, Thursday 19 November 2015


Overview

•  Nanoelectronic Device Simulations: From Moore’s Law to TCAD
•  Ab-initio Simulation Approach: Models and scalability
•  Visualization of internal quantities: Nanoscale current trajectories
•  Outlook and Conclusion: Advanced Device Simulations


Motivation: Future of Moore’s Scaling Law

The transistor evolution is governed by Moore’s Scaling Law: every 18-24 months, dimensions scale by 30% (each linear dimension shrinks to about 0.7x), so the area is divided by 2 (0.7² ≈ 0.49).

Technology nodes: 65nm (2005), 45nm (2007), 32nm (2009), 22nm (2011), XXnm (202X) ???

Breakthrough: 2D->3D
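The scaling rule can be checked with a few lines of arithmetic. The sketch below reads "30% dimension scaling" as a factor of 0.7 per generation (an interpretation, not something the slide states explicitly) and compares the ideal sequence with the node names listed on the slide. The function name `scale_nodes` is made up for this illustration.

```python
# Illustrative arithmetic only: apply a 0.7x linear shrink per generation.
SHRINK = 0.7  # assumption: "30% dimension scaling" means a factor of 0.7


def scale_nodes(start_nm, generations, shrink=SHRINK):
    """Return the ideal feature sizes after repeated linear shrinks."""
    return [start_nm * shrink**g for g in range(generations + 1)]


area_factor = SHRINK**2            # area scales with the square of the length
ideal = scale_nodes(65.0, 3)       # three shrinks starting from 65 nm
listed = [65, 45, 32, 22]          # node names as listed on the slide

for ideal_nm, listed_nm in zip(ideal, listed):
    print(f"ideal {ideal_nm:5.1f} nm   slide {listed_nm} nm")
print(f"area factor per generation: {area_factor:.2f}")
```

The ideal values stay within 1 nm of the listed node names, which is why one shrink per generation roughly halves the area.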

Next Generation Devices

•  New Structures
•  New Materials
•  New Concepts

Advanced Design Tool(s) Required

TCAD Tool Requirements

Three different perspectives: EE+PHYS+HPC

Physical Models
•  3D Quantum Transport Solver
•  Accurate Representation of the Material Properties
•  Atomistic Description of Devices
•  Multi-Physics Modeling

Device Engineering
•  Industrial-Strength Nano-electronic Device Simulator
•  Multi-Geometry Capabilities
•  Explore, Understand, Predict, Optimize Novel Designs

Efficient Parallel Computing
•  Accelerate Simulation Time
•  Investigate New Phenomena at the Nanometer Scale
•  Move Hero Experiments to a Day-to-Day Basis

[Figure: OMEN; panels: GAA NW, Electron Density, Id-Vgs]


Computational Approach

What is needed: solution of Schrödinger equation

$$H(k) \cdot \Psi(k) = E(k) \cdot \Psi(k)$$

Computational Goal: wave function Ψ(k) and energy E(k)

Device Simulation: open system (NEGF)

Key Component of NEGF equations

$$\left(E - H(k) - \Sigma^{R}(E,k)\right) \cdot G^{R}(E,k) = I$$

$$G^{<}(E,k) = G^{R}(E,k) \cdot \Sigma^{<}(E,k) \cdot G^{A}(E,k)$$

H(r,k): Hamiltonian matrix in selected basis (EMA, TB, DFT)
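To make the first NEGF equation concrete, here is a minimal sketch for a toy system that is not taken from the talk: a 1D nearest-neighbour tight-binding chain with orthogonal basis, on-site energy 0 and hopping -t, attached to two semi-infinite leads of the same kind. The closed-form lead self-energy used below holds only inside the band, |E| < 2t. All function names are hypothetical.

```python
import numpy as np


def lead_self_energy(E, t=1.0):
    """Retarded self-energy of a semi-infinite 1D chain (on-site 0, hopping -t).

    Closed form, valid only inside the band |E| < 2t.
    """
    return 0.5 * (E - 1j * np.sqrt(4.0 * t**2 - E**2))


def retarded_green(E, H, sigma_left, sigma_right):
    """Solve (E - H - Sigma^R) G^R = I for a device with two contacts."""
    n = H.shape[0]
    sigma = np.zeros((n, n), dtype=complex)
    sigma[0, 0] = sigma_left        # left contact couples to the first site
    sigma[-1, -1] = sigma_right     # right contact couples to the last site
    A = E * np.eye(n) - H - sigma
    return np.linalg.solve(A, np.eye(n, dtype=complex))


def transmission(E, H, t=1.0):
    """T(E) = Gamma_L * Gamma_R * |G^R[0, n-1]|^2 for single-site contacts."""
    s = lead_self_energy(E, t)
    G = retarded_green(E, H, s, s)
    gamma = -2.0 * s.imag           # Gamma = i (Sigma^R - Sigma^A)
    return gamma * gamma * abs(G[0, -1]) ** 2


n_sites, t = 6, 1.0
H_chain = -t * (np.eye(n_sites, k=1) + np.eye(n_sites, k=-1))
T_clean = transmission(0.3, H_chain, t)

H_barrier = H_chain.copy()
H_barrier[2, 2] = 1.0               # a single on-site barrier scatters electrons
T_barrier = transmission(0.3, H_barrier, t)
print(T_clean, T_barrier)
```

For the uniform chain the device is indistinguishable from the leads, so the transmission should equal 1 up to rounding; the barrier reduces it below 1.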

Ab-initio Quantum Transport Simulations

Coupling of OMEN with a density-functional theory (DFT) tool: CP2K

1.  Hamiltonian H and overlap S matrices created in CP2K and transferred to OMEN (Step 1)
2.  Ab-initio transport simulations in OMEN with DFT instead of tight-binding basis (Step 2)

Collaboration with the group of Prof. J. VandeVondele (ETH Zurich)

NEGF Solution of Schrödinger Equation

Nanowire Transistor, ~10000 atoms

Equation to solve for each energy E:

$$\left(E \cdot S_{CP2K} - H_{CP2K} - \Sigma^{RB}\right) \cdot G^{R}(E) = I$$

Open Boundary Conditions
§  Generalized Eigenvalue Problem
§  Difficult to parallelize

Matrix Inversion
§  Recursive Green’s Function algorithm
§  Inherently sequential

1 core: ~1.8 hrs (on Cray XC30)

[Figure: run time [s] of NEGF on a 0 to 6000 scale; legend: rGF, OBC; bar labels: 4820 and 1290]
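The "inherently sequential" remark can be seen in a textbook form of the recursive Green's function algorithm for a block-tridiagonal matrix A (standing in for E·S − H − Σ^RB). The sketch below is a generic version, not OMEN's implementation; the helper names and the random test matrix are assumptions for illustration. Each step of both sweeps needs the result of the previous one.

```python
import numpy as np


def rgf_diagonal_blocks(diag, upper, lower):
    """Diagonal blocks of A^{-1} for a block-tridiagonal matrix A.

    diag[i] = A[i, i], upper[i] = A[i, i+1], lower[i] = A[i+1, i].
    """
    n = len(diag)
    # Forward sweep: left-connected Green's functions g[i].
    g = [np.linalg.inv(diag[0])]
    for i in range(1, n):
        schur = diag[i] - lower[i - 1] @ g[i - 1] @ upper[i - 1]
        g.append(np.linalg.inv(schur))
    # Backward sweep: fully connected diagonal blocks G[i].
    G = [None] * n
    G[-1] = g[-1]
    for i in range(n - 2, -1, -1):
        G[i] = g[i] + g[i] @ upper[i] @ G[i + 1] @ lower[i] @ g[i]
    return G


def assemble(diag, upper, lower):
    """Build the dense matrix from its blocks (only used for checking)."""
    n, b = len(diag), diag[0].shape[0]
    A = np.zeros((n * b, n * b), dtype=diag[0].dtype)
    for i in range(n):
        A[i * b:(i + 1) * b, i * b:(i + 1) * b] = diag[i]
        if i + 1 < n:
            A[i * b:(i + 1) * b, (i + 1) * b:(i + 2) * b] = upper[i]
            A[(i + 1) * b:(i + 2) * b, i * b:(i + 1) * b] = lower[i]
    return A


rng = np.random.default_rng(0)
n_blocks, b = 5, 3   # hypothetical sizes, far smaller than a real device
diag = [4.0 * np.eye(b) + 0.3 * rng.standard_normal((b, b)) for _ in range(n_blocks)]
upper = [0.3 * rng.standard_normal((b, b)) for _ in range(n_blocks - 1)]
lower = [0.3 * rng.standard_normal((b, b)) for _ in range(n_blocks - 1)]

G_blocks = rgf_diagonal_blocks(diag, upper, lower)
G_full = np.linalg.inv(assemble(diag, upper, lower))
err = max(
    np.abs(G_blocks[i] - G_full[i * b:(i + 1) * b, i * b:(i + 1) * b]).max()
    for i in range(n_blocks)
)
print(err)
```

The dense inverse is computed here only to verify the recursion; the point of the recursion is that it never forms the full inverse.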


Accelerating DFT+NEGF QT Simulations (1)

Step 1: Moving from NEGF to QTBM

NEGF:

$$\left(E \cdot S_{CP2K} - H_{CP2K} - \Sigma^{RB}\right) \cdot G^{R}(E) = I$$

QTBM:

$$\left(E \cdot S_{CP2K} - H_{CP2K} - \Sigma^{RB}\right) \cdot \Psi(E) = \mathrm{Inj}(E)$$

[Figure: run time [s] of NEGF vs. QTBM on a 0 to 6000 scale, split into Boundary and Schrödinger Eq.]

MUMPS on 16 cores: 48x faster than RGF on 1 core
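The difference between the two formulations can be illustrated on a toy matrix. In the sketch below a dense NumPy solve stands in for the sparse direct solver (MUMPS in the talk), and the matrix, the mock boundary self-energy and the injection vectors are made-up stand-ins for E·S − H − Σ^RB and Inj(E). The wave-function route solves for a handful of right-hand sides instead of forming the full inverse.

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_inj = 200, 4        # matrix size and number of injected states (hypothetical)

# Stand-in for A = E*S - H - Sigma^RB: complex, tridiagonal, diagonally dominant.
A = np.diag(np.full(n, 3.0 + 0.1j)) - np.eye(n, k=1) - np.eye(n, k=-1)
A[0, 0] += 0.5j          # mock boundary self-energy contribution, first site
A[-1, -1] += 0.5j        # mock boundary self-energy contribution, last site

# Stand-in for Inj(E): nonzero only where the contacts attach.
inj = np.zeros((n, n_inj), dtype=complex)
inj[0, :2] = rng.standard_normal(2)
inj[-1, 2:] = rng.standard_normal(2)

# NEGF route: form the full n x n Green's function, then apply it.
G = np.linalg.inv(A)
psi_negf = G @ inj

# QTBM route: solve the linear system directly for the n_inj right-hand sides.
psi_qtbm = np.linalg.solve(A, inj)

print(np.max(np.abs(psi_negf - psi_qtbm)))
```

Both routes give the same wave functions; the second one never needs the n x n inverse, which is what makes a general sparse solver applicable.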

Step 2: FEAST for OBCs

Solve generalized EV problem through contour integration

[Figure: run time [s] of SI vs. FEAST vs. FEAST+ on a 0 to 1400 scale, split into Boundary and Schrödinger Eq.]

FEAST on 32 cores: 43x faster than shift-and-invert on 1 core
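The idea of solving a generalized eigenvalue problem through contour integration can be sketched on a small symmetric-definite pencil. This is a simplification: the OBC problem in the talk is a different, harder eigenproblem, and the code below is not the FEAST library. A quadrature of the resolvent around a circle acts as a filter that keeps eigenvectors inside the circle, followed by a Rayleigh-Ritz step. The function name, the toy matrices and all parameters are assumptions. The linear solves at the quadrature nodes are independent of each other, which is where the parallelism comes from.

```python
import numpy as np


def feast_sketch(H, S, center, radius, m0, n_quad=16, n_iter=6, tol=1e-8, seed=0):
    """Eigenvalues of H x = lambda S x inside (center - radius, center + radius).

    H symmetric, S symmetric positive definite; the subspace size m0 must be
    at least the number of eigenvalues inside the interval.
    """
    n = H.shape[0]
    Y = np.random.default_rng(seed).standard_normal((n, m0))
    # Quadrature nodes on the circle, offset so that none lies on the real axis.
    theta = 2.0 * np.pi * (np.arange(n_quad) + 0.5) / n_quad
    for _ in range(n_iter):
        # Q ~ (1 / 2 pi i) * contour integral of (z S - H)^{-1} S Y dz
        Q = np.zeros((n, m0), dtype=complex)
        for th in theta:
            z = center + radius * np.exp(1j * th)
            Q += radius * np.exp(1j * th) * np.linalg.solve(z * S - H, S @ Y)
        Q = (Q / n_quad).real          # imaginary parts cancel by symmetry
        Y, _ = np.linalg.qr(Q)         # orthonormal basis of the filtered subspace
    # Rayleigh-Ritz: reduce to an m0 x m0 problem, solved via Cholesky of Sq.
    Hq, Sq = Y.T @ H @ Y, Y.T @ S @ Y
    L = np.linalg.cholesky(Sq)
    M = np.linalg.solve(L, np.linalg.solve(L, Hq).T)   # L^{-1} Hq L^{-T}
    vals, V = np.linalg.eigh(M)
    X = Y @ np.linalg.solve(L.T, V)    # Ritz vectors in the original basis
    resid = np.linalg.norm(H @ X - (S @ X) * vals, axis=0)
    keep = (np.abs(vals - center) < radius) & (resid < tol)
    return vals[keep]


n = 20
T = np.eye(n, k=1) + np.eye(n, k=-1)
H = -T                         # toy tight-binding Hamiltonian
S = np.eye(n) + 0.1 * T        # toy overlap matrix (positive definite)
found = feast_sketch(H, S, center=0.0, radius=0.6, m0=6)
print(found)
```

The residual check discards Ritz values that fall inside the interval without belonging to a converged eigenpair, which can happen when the subspace is larger than the number of eigenvalues enclosed by the contour.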

Accelerating DFT+NEGF QT Simulations (2)

Step 3: SplitSolve Algorithm

Trick: interleave computation of $\Sigma^{RB}(E)$ on CPUs and GPU solution of

$$\left(E \cdot S_{CP2K} - H_{CP2K} - \Sigma^{RB}\right) \cdot \Psi(E) = \mathrm{Inj}(E)$$

[Figure: run time [s] of FEAST & MUMPS vs. FEAST & SplitSolve on a 0 to 140 scale, split into Schrödinger Eq. and Boundary]

Solution time: Time to compute OBCs on CPUs
Speedup: >5x

Strong scaling on Titan @ ORNL: 15 PFlop/s on 18564 Hybrid Nodes

Gordon Bell Prize Finalist 2015
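The slide does not spell out how the interleaving is realized. One way to overlap the two tasks, shown here purely as an assumption and not as a description of SplitSolve itself, uses the fact that the boundary self-energy only touches the first and last rows of the matrix. The boundary-independent solve with E·S − H can then run while Σ^RB(E) is still being computed, and a small low-rank (Woodbury-type) correction combines the two afterwards. Threads stand in for the CPU and GPU; `boundary_self_energy` is a mock, and all names and sizes are hypothetical.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def boundary_self_energy(E, size):
    """Mock stand-in for the CPU-side OBC computation of Sigma^RB(E)."""
    return -0.5j * (1.0 + 0.1 * E) * np.eye(size)


def interior_solve(A0, B):
    """Boundary-independent work (the part a GPU could do): Z = A0^{-1} B."""
    return np.linalg.solve(A0, B)


def split_solve_sketch(E, H, S, inj_b, n_b):
    """Solve (E S - H - B C B^T) psi = B inj_b with C = Sigma^RB(E)."""
    n = H.shape[0]
    idx = np.r_[0:n_b, n - n_b:n]              # boundary rows: first and last n_b
    B = np.eye(n, dtype=complex)[:, idx]       # selector of the boundary rows
    A0 = (E * S - H).astype(complex)           # must be nonsingular at this E
    with ThreadPoolExecutor(max_workers=2) as pool:
        sigma_job = pool.submit(boundary_self_energy, E, 2 * n_b)   # "CPU" task
        z_job = pool.submit(interior_solve, A0, B)                  # "GPU" task
        C, Z = sigma_job.result(), z_job.result()
    # Small correction system of size 2*n_b once both pieces are available.
    ZB = Z[idx, :]                             # B^T Z
    u = np.linalg.solve(np.eye(2 * n_b) - ZB @ C, ZB @ inj_b)
    return Z @ (inj_b + C @ u)


n, n_b, E = 30, 2, 0.2    # E chosen away from the eigenvalues of the closed toy system
T = np.eye(n, k=1) + np.eye(n, k=-1)
H, S = -T, np.eye(n) + 0.1 * T
inj_b = np.random.default_rng(2).standard_normal((2 * n_b, 3)).astype(complex)

psi = split_solve_sketch(E, H, S, inj_b, n_b)

# Reference: assemble the full matrix and right-hand side, then solve directly.
idx = np.r_[0:n_b, n - n_b:n]
sigma_full = np.zeros((n, n), dtype=complex)
sigma_full[np.ix_(idx, idx)] = boundary_self_energy(E, 2 * n_b)
rhs = np.zeros((n, 3), dtype=complex)
rhs[idx, :] = inj_b
psi_ref = np.linalg.solve(E * S - H - sigma_full, rhs)
print(np.max(np.abs(psi - psi_ref)))
```

In this arrangement the large solve no longer waits for the boundary conditions, which matches the statement on the slide that the solution time reduces to the time needed to compute the OBCs.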


Current Flow through Nanodevices

•  GAA Nanowire Transistor (labels: Source, Gate, Drain). K. Storm et al., NL 12, 1-6 (2012)
•  Memristor (label: Atomic-scale filament). A. Emboras et al., arXiv:1508.07748

Visualization Techniques

Software: VTK from Kitware

New OpenGL2 access for better visualization

2015 marks great milestones by the Kitware team, enabling much improved Molecular Support and pushing a great deal of geometry creation and rendering onto the GPU:

§  Improved Molecular Rendering
§  Improved Glyphing
§  Improved Point Sprites


Conclusion

•  Next-Generation Nano-Devices
   TCAD to accelerate process innovation

•  Proposed Simulation Approach
   Dedicated to a large variety of nanoscale devices
   GPU as computation and visualization enabler
   Excellent scalability on petascale machines

•  Future Work and Challenges
   Establish multi-scale coupling DFT->TB->DD
   Work more closely with experimental groups


Acknowledgments

We are deeply grateful to the following persons/organizations:

•  Joost VandeVondele (ETH Zürich)

•  Jack Wells and his team (ORNL)

•  Peter Messmer (NVIDIA)

•  Ichitaro Yamazaki (MAGMA)