
VASP on GPUs, NVIDIA: images.nvidia.com/events/sc15/pdfs/SC5107-vasp-gpus.pdf

Intro Correctness Usage Road-map

VASP on GPUs: When and how

Max Hutchinson, University of Chicago

November 17, 2015

Max Hutchinson (UofC) VASP on GPUs November 17, 2015 1 / 18


Big thanks to

Carnegie Mellon group: Michael Widom

ENS/IFPEN group: Paul Fleurat-Lessard, Thomas Guignon, Ani Anciaux-Sedrakian, Philippe Sautet

RWTH Aachen group: Stefan Maintz, Bernhard Eck, Richard Dronskowski


University of Vienna group: Georg Kresse, Martijn Marsman, Doris Vogtenhuber

NVIDIA: Christoph Angerer, Jeroen Bedorf, Arash Ashari, Mark Berger, Sarah Tariq, Dusan Stosic, Paul Springer, Jerry Chen, Anthony Scudiero, Darko Stosic, Przemek Tredak, Cliff Woolley


What is VASP

VASP is a complex package for performing ab initio quantum-mechanical molecular dynamics (MD) simulations using pseudopotentials or the projector-augmented wave method and a plane wave basis set [1].

[1] VASP the GUIDE


VASP Users and Usage

12-20% of CPU cycles at HPC centers

Academia: physics and physical chemistry, materials science, chemical engineering

Industry: big semiconductor, materials (metals, ceramics, polymers), oil and gas, chemicals

Usage at Ohio SC’s Oakley [2]

[2] 12/14 – 2/15, via pbsacct


A brief history

Multiple prototypes (2009-2012)

Diagonalization for traditional DFT [3, 4] (IFPEN, ENS, Aachen)

Exact-exchange for hybrid functionals [5] (CMU, UChicago)

Cooperation and tuning (2012-2014)

Merge prototypes with VASP 5.3.1

Performance tune with NVIDIA engineers

[3] M. Hacene et al., DOI:10.1002/jcc.23096
[4] S. Maintz et al., DOI:10.1016/j.cpc.2011.03.010
[5] M. Hutchinson and M. Widom, DOI:10.1016/j.cpc.2012.02.017


A brief history

Acceptance and distribution (2015)

GPU support accepted by Vienna

Integrated development environments

Established correctness

To be included in standard VASP releases


Establishing correctness

We’ve taken a three-pronged approach to validation:

1. Internal testing against ∼50 cases collected from collaborators, focusing on actively ported algorithms and models

2. Acceptance testing by Vienna against ∼100 cases, covering a wider variety of VASP usage patterns

3. Beta testing by 37 early-access groups, covering a wider variety of hardware and environments
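One plausible way to automate this kind of regression testing is to run each case on both the CPU and GPU builds and check that the results agree within a tolerance. The sketch below assumes each run has already been reduced to a dictionary of named scalar results; the quantity names, values, and the 1e-4 default tolerance are hypothetical, not the tolerances used by the developers or by Vienna.

```python
import math
from typing import Dict


def compare_runs(
    reference: Dict[str, float],
    candidate: Dict[str, float],
    tol: float = 1e-4,
) -> Dict[str, float]:
    """Return {name: |candidate - reference|} for every quantity outside tol.

    A quantity missing from the candidate run is reported with an infinite
    deviation so it cannot silently pass. The default tolerance is a
    hypothetical placeholder.
    """
    failures: Dict[str, float] = {}
    for name, ref_value in reference.items():
        if name not in candidate:
            failures[name] = math.inf
            continue
        deviation = abs(candidate[name] - ref_value)
        if deviation > tol:
            failures[name] = deviation
    return failures


# Hypothetical CPU (reference) and GPU (candidate) results for one test case
cpu_run = {"total_energy_eV": -432.10512, "fermi_energy_eV": 5.61203}
gpu_run = {"total_energy_eV": -432.10509, "fermi_energy_eV": 5.61203}
print(compare_runs(cpu_run, gpu_run))  # empty dict: both runs agree within tol
```

Reporting missing quantities as failures, rather than skipping them, keeps a GPU run that silently omits output from counting as a pass.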


Beta testing

Three types of issues:

Use of unsupported features

Merging with site-customized files (esp. main.F)

Bugs in edge cases

Generally positive feedback:

“The short version is ‘it works.’”

“So far I found no problems, the code is fast and stable.”

“Absolute time to solution is faster with GPUs.”


Release schedule

GPU support in official release

Add CUDA paths and libraries to makefile.include

make gpu gpu_ncl

Executables are bin/gpu and bin/gpu_ncl

We expect the release by the end of the year.


Feature support

Fully supported

Davidson

R-space projection

RMM-DIIS

Non-collinear

Exact-exchange

KPAR

Passively supported

[sc]GW[0]

Damped and All (ALGO)

Unsupported

G-space projection

NCORE > 1

EFIELD_PEAD
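The feature table can be turned into a quick pre-flight check on an INCAR file. This is a hypothetical helper, not part of VASP: it uses a deliberately minimal INCAR parser (TAG = value pairs, `!` or `#` comments, `;` separating statements) and encodes only the entries listed on this slide as of November 2015. It relies on two VASP defaults: NCORE defaults to 1, and LREAL defaults to .FALSE., which selects G-space projection, so an INCAR that never sets LREAL is flagged.

```python
import re
from typing import Dict, List


def parse_incar(text: str) -> Dict[str, str]:
    """Minimal INCAR parser: returns {TAG: raw value string}, tags upper-cased."""
    tags: Dict[str, str] = {}
    for line in text.splitlines():
        line = re.split(r"[!#]", line, maxsplit=1)[0]  # drop comments
        for statement in line.split(";"):  # ';' separates statements
            if "=" in statement:
                key, value = statement.split("=", 1)
                tags[key.strip().upper()] = value.strip()
    return tags


def gpu_feature_warnings(tags: Dict[str, str]) -> List[str]:
    """Flag settings the slide lists as unsupported or only passively supported."""
    warnings: List[str] = []
    if int(tags.get("NCORE", "1")) > 1:  # NCORE defaults to 1
        warnings.append("unsupported: NCORE > 1")
    if "EFIELD_PEAD" in tags:
        warnings.append("unsupported: EFIELD_PEAD")
    # LREAL defaults to .FALSE. (G-space projection); strip dots so .FALSE./.F./F match
    if tags.get("LREAL", ".FALSE.").strip(".").upper() in ("F", "FALSE"):
        warnings.append("unsupported: G-space projection (LREAL = .FALSE.)")
    if tags.get("ALGO", "").upper() in ("DAMPED", "ALL"):
        warnings.append("passively supported only: ALGO = " + tags["ALGO"])
    return warnings


incar = """
ALGO = Damped     ! passively supported
NCORE = 4 ; ENCUT = 400
EFIELD_PEAD = 0 0 0.01
"""
for warning in gpu_feature_warnings(parse_incar(incar)):
    print(warning)
```

An INCAR with `LREAL = Auto` and `ALGO = Fast` (blocked Davidson plus RMM-DIIS, both fully supported) produces no warnings.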


Traditional DFT

You should: run with MPS (Multi-Process Service), and experiment with multiple CPU ranks per GPU.

Works best with: large numbers of bands and large numbers of plane waves.

You can expect a 2-4x speedup for large systems on nodes with balanced CPU and GPU resources, and more on GPU-heavy workstations.
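MPS matters here because, without it, kernels from different processes sharing a GPU are time-sliced rather than run concurrently; on each node the MPS control daemon is typically started with `nvidia-cuda-mps-control -d`. When experimenting with several CPU ranks per GPU, it helps to know which ranks end up sharing a device. This sketch assumes a simple round-robin assignment of node-local ranks to GPUs (local rank modulo GPUs per node); that is a common convention, not necessarily the mapping the VASP GPU port uses. A node with 2xK80 boards exposes four GPUs, since each K80 carries two GPU dies.

```python
from typing import Dict, List


def gpu_for_rank(local_rank: int, gpus_per_node: int) -> int:
    """Assumed round-robin mapping of a node-local MPI rank to a GPU index."""
    return local_rank % gpus_per_node


def rank_layout(gpus_per_node: int, ranks_per_gpu: int) -> Dict[int, List[int]]:
    """List the node-local ranks that would share each GPU (through MPS)."""
    layout: Dict[int, List[int]] = {gpu: [] for gpu in range(gpus_per_node)}
    for local_rank in range(gpus_per_node * ranks_per_gpu):
        layout[gpu_for_rank(local_rank, gpus_per_node)].append(local_rank)
    return layout


# 2xK80 node = 4 GPUs; try 2 ranks per GPU
print(rank_layout(gpus_per_node=4, ranks_per_gpu=2))
# {0: [0, 4], 1: [1, 5], 2: [2, 6], 3: [3, 7]}
```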


Example: Si super-cell

512 Si atoms

1282 bands

864000 PWs

Algo = Normal

[Figure: 2xK80 vs 2xHaswell-EP, plotted against number of nodes (1, 2, 4, 8)]


Hybrid functionals (exact-exchange)

You should: use 1 or 2 CPU ranks per GPU, and set NSIM = NBANDS / (2*NCPU), where NCPU is the number of CPU ranks.

Works best with: large numbers of plane waves and a small number of ionic types.

You can expect a 1.5-6x speedup, highly dependent on system size, and more on GPU-heavy workstations.
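The NSIM rule of thumb is plain integer arithmetic. A small sketch, assuming NCPU means the total number of CPU (MPI) ranks and that the result is rounded down and clamped to at least 1 (both our assumptions, since NSIM must be a positive integer). For the β-rhombohedral boron example (216 bands), a 2xK80 node (four GPUs) run with 1 or 2 ranks per GPU gives NCPU = 4 or 8:

```python
def suggest_nsim(nbands: int, ncpu: int) -> int:
    """NSIM = NBANDS / (2 * NCPU), rounded down and clamped to >= 1 (assumed)."""
    if nbands <= 0 or ncpu <= 0:
        raise ValueError("nbands and ncpu must be positive")
    return max(1, nbands // (2 * ncpu))


# 216 bands; 4 GPUs with 1 or 2 CPU ranks per GPU
for ncpu in (4, 8):
    print(f"NCPU = {ncpu}: NSIM = {suggest_nsim(216, ncpu)}")
# NCPU = 4: NSIM = 27
# NCPU = 8: NSIM = 13
```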


Example: β-rhombohedral boron

105 boron atoms

216 bands

110592 PWs

Algo = Normal

[Figure: 2xK80 vs 2xHaswell-EP, plotted against number of nodes (1, 2, 4, 8)]


Road-map: Features

1. Gamma-point for very large unit cells

2. G-space projection for small to medium unit cells

3. Van der Waals density functional (vdW-DF)

4. Random phase approximation (RPA)

5. Active support for [sc]GW[0]

6. NCORE > 1 for highly parallel runs


Road-map: Performance

Better performance for moderate sizes: add blocking to all core kernels; add batching to all library calls

Better performance for large sizes: update MAGMA support; merge with the threaded code base to reduce ranks per GPU

Better performance for hybrid functionals: parallelize outer loops; pad projection sizes


Summary

GPU VASP will give you the right answer: extensive testing in beta and for Vienna’s acceptance

GPU VASP will give 2-4x performance on moderate to large systems: the bigger the better

We are continuing to add feature support and improve performance: Gamma-point is next on the list

When you get GPU support in your next VASP release, try it.


Performance examples


More performance
