
© NVIDIA Corporation 2013

CUDA TRAINING DAY – BIRMINGHAM UNIVERSITY Jeremy Purches & Dr Timothy Lanfear 31st July 2013


AGENDA

10:30 Introduction to GPU Computing

Introduction to NVIDIA & GPU Products – Jeremy Purches

Introduction to GPU programming – Tim Lanfear

Programming languages (CUDA, OpenACC)

Programming environment and tools

12:00 Lunch

13:00 GPU programming with CUDA

Hands-on training – Tim Lanfear

16:00 Close

Course feedback/survey: http://www.nvidia.co.uk/object/gpu-computing-survey-uk.html

NVIDIA GPU TECHNOLOGY

Jeremy Purches, HPC Business Development Manager

NVIDIA: Core Technologies and Brands

GPU: GeForce®, Quadro®, Tesla®
Mobile: Tegra®
Cloud: GRID™

NVIDIA GPU

The GPU is one of the most complex processors ever created, with more than 7 billion transistors.

NVIDIA has shipped over 1 billion GPUs.

GeForce

Quadro

Tesla


Tesla Parallel Computing with GPUs


GPU Roadmap

(Chart: DP GFLOPS per Watt, log scale from 0.5 to 32, against year, 2008 to 2014. Architectures and labelled features: Tesla (CUDA), Fermi (FP64), Kepler (Dynamic Parallelism), Maxwell (Unified Virtual Memory), Volta (Stacked DRAM).)

Tesla Kepler Family

World’s Fastest and Most Efficient HPC Accelerators

GPU    Single Precision Peak (SGEMM)   Double Precision Peak (DGEMM)   Memory Size   Memory Bandwidth (ECC off)   System Solution
K20X   3.95 TF (2.90 TF)               1.32 TF (1.22 TF)               6 GB          250 GB/s                     Server only
K20    3.52 TF (2.61 TF)               1.17 TF (1.10 TF)               5 GB          208 GB/s                     Server + Workstation
K10    4.58 TF                         0.19 TF                         8 GB          320 GB/s                     Server only

Target applications:
K20X, K20: Weather & Climate, Physics, BioChemistry, CAE, Material Science
K10: Image, Signal, Video, Seismic
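As a rough worked example of how to read the Kepler figures above, the short Python sketch below computes what fraction of peak throughput the GEMM numbers represent for the K20X and K20. It assumes the parenthesised values are measured SGEMM and DGEMM rates, which is how the column headings read; the dictionary layout and the function name `gemm_efficiency` are illustrative only.

```python
# Peak and GEMM throughput in TFLOPS, copied from the Tesla Kepler Family figures.
# Assumption: the parenthesised values are measured SGEMM / DGEMM rates.
kepler = {
    "K20X": {"sp_peak": 3.95, "sgemm": 2.90, "dp_peak": 1.32, "dgemm": 1.22},
    "K20": {"sp_peak": 3.52, "sgemm": 2.61, "dp_peak": 1.17, "dgemm": 1.10},
}


def gemm_efficiency(measured_tf, peak_tf):
    """Return the fraction of peak throughput reached by the GEMM benchmark."""
    return measured_tf / peak_tf


for name, tf in kepler.items():
    sp = gemm_efficiency(tf["sgemm"], tf["sp_peak"])
    dp = gemm_efficiency(tf["dgemm"], tf["dp_peak"])
    print(f"{name}: SGEMM at {sp:.0%} of peak, DGEMM at {dp:.0%} of peak")
```

The K10 is left out because the slide gives only a single figure per precision for it.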


Tesla Kepler Product Family

K20X: Double precision performance leader for the most demanding HPC applications
K20: Excellent DP for widest range of applications
K10: Highest memory bandwidth and single precision for seismic, signal, image, video, molecular dynamics

(Chart: bubble size is single precision SGEMM in Teraflops; reference bubble: K10 3 TF SGEMM. Also plotted: M2090, M2075.)

Performance on Leading Scientific Applications

K20X Relative Performance vs. dual-socket Sandy Bridge

Application   Speed Up (1x Tesla K20X + 1x CPU, with 2x CPU = 1x)
NAMD          2.73x
WL-LSMS       3.20x
AMBER         7.17x
SPECFEM3D     8.85x
Chroma        10.20x

2x CPU = 2x Sandy Bridge E5-2687, 3.10 GHz
1x Tesla K20X + 1x CPU = 1x Tesla K20 GPU; 1x Sandy Bridge E5-2687, 3.10 GHz

Developer Momentum Continues to Grow

                      2008     2013
CUDA-Capable GPUs     100M     430M
CUDA Downloads        150K     1.6M
Supercomputers        1        50
University Courses    60       640
Academic Papers       4,000    37,000

Performance of Accelerators

(Chart: Total Performance (PFLOPS), 0 to 40, by year, 2006 to 2012. Series: NVIDIA Kepler, NVIDIA Fermi, Intel Xeon Phi, IBM Cell, Other.)

19% of FLOPS from GPU systems

TITAN: World’s Fastest Open Science Supercomputer

18,688 Tesla K20X GPUs

27 Petaflops Peak, 17.59 Petaflops on Linpack

90% of Performance from GPUs
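The TITAN figures can be cross-checked against the K20X double precision peak of 1.32 TF quoted on the Tesla Kepler Family slide. The Python sketch below is a back-of-envelope check only: it assumes the 27 Petaflops system peak is a double precision figure, and the variable names are illustrative.

```python
# Figures quoted on the TITAN slide and the Tesla Kepler Family slide.
NUM_GPUS = 18_688        # Tesla K20X GPUs in TITAN
K20X_DP_PEAK_TF = 1.32   # double precision peak per K20X, in TFLOPS
SYSTEM_PEAK_PF = 27.0    # system peak in PFLOPS (assumed to be double precision)
LINPACK_PF = 17.59       # Linpack result in PFLOPS

# Aggregate GPU peak, converted from TFLOPS to PFLOPS.
gpu_peak_pf = NUM_GPUS * K20X_DP_PEAK_TF / 1000.0

# Share of the system peak contributed by the GPUs.
gpu_share = gpu_peak_pf / SYSTEM_PEAK_PF

# Fraction of the system peak achieved on Linpack.
linpack_efficiency = LINPACK_PF / SYSTEM_PEAK_PF

print(f"Aggregate GPU peak: {gpu_peak_pf:.2f} PFLOPS")
print(f"GPU share of system peak: {gpu_share:.1%}")
print(f"Linpack efficiency: {linpack_efficiency:.1%}")
```

Under these assumptions the GPU share comes out slightly above nine tenths of the system peak, which is consistent with the slide's rounded "90% of Performance from GPUs".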


CSCS - Europe’s Fastest GPU Supercomputer

Switzerland’s Piz Daint, to be Powered by Tesla K20X

Astrophysics · Climate & Weather · Genomics · Geophysics · Material Science


World’s Most Energy Efficient Supercomputer

CINECA Eurora

“Liquid-Cooled” Eurotech Aurora Tigon
128 Tesla K20 Accelerators
3150 MFLOPS/Watt
$100k Energy Savings / Yr
300 Tons of CO2 Saved / Yr
Greener than Xeon Phi, Xeon CPU

(Chart: MFLOPS/Watt, 0 to 3000, for CINECA Eurora (Tesla K20), NICS Beacon (Greenest Xeon Phi System), and C-DAC (Greenest CPU System).)

e-Infrastructure South Consortium

EMERALD

• 84 HP SL390 G7 servers
• 372 NVIDIA M2090 GPUs
• Voltaire QDR IB Network
• Gnodal 10G Ethernet
• 135TB Panasas Storage

The UK's most powerful GPU-based supercomputer, "Emerald", has been unveiled at the Science and Technology Facilities Council's (STFC) Rutherford Appleton Laboratory (RAL).

Using the newly available technology, researchers will soon tackle areas ranging from healthcare (Tamiflu and swine flu); astrophysics (real-time pulsar detection application for the forthcoming Square Kilometre Array Project); bioinformatics (analysis and statistical modelling of whole-genome sequencing data); climate change modelling; complex engineering systems; simulating 3G and 4G communications networks; and developing new tools for processing and managing medical images.

Super Computer Performance Development

iPhone 4s (1.02 Gflop/s)

Laptop (70 Gflop/s)

GPU (1.3 Tflop/s)


Accelerator Computing Now Mainstream

Our end customer survey shows that 78.4% of HPC sites are planning to include accelerators/coprocessors in their next technical computing server purchase, up from 29% just 2 years ago.

IDC HPC Market Survey, April 2013

The Era of Accelerated Computing is Here

(Timeline, 1980 to 2020: Era of Vector Computing, Era of Distributed Computing, Era of Accelerated Computing.)

GPU Accelerated Applications

GPUs Central To Computing

OIL & GAS
MANUFACTURING
MEDIA & ENTMNT.
EDU/RESEARCH
LIFE SCIENCES
GOVERNMENT
DATA ANALYTICS
FINANCE

Air Force Research Laboratory
Chinese Academy Of Sciences

Top Scientific Apps

Computational Chemistry: AMBER, CHARMM, GROMACS, LAMMPS, NAMD, DL_POLY
Material Science: QMCPACK, Quantum Espresso, GAMESS-US, Gaussian, NWChem, VASP
Climate & Weather: COSMO, GEOS-5, CAM-SE, NIM, WRF
Physics: Chroma, Denovo, GTC, GTS, ENZO, MILC
CAE: ANSYS Mechanical, MSC Nastran, SIMULIA Abaqus, ANSYS Fluent, OpenFOAM, LS-DYNA

Explosive Growth of GPU Accelerated Apps

(Chart: # of Apps, 0 to 200, for 2010, 2011, and 2012. Series: Accelerated, In Development. Annotations: 40% Increase, 61% Increase.)

Top Applications Now with Built-in GPU Support

Application Market Share by Segment

Molecular Dynamics: AMBER, NAMD, GROMACS, CHARMM, LAMMPS, DL_POLY; Non-GPU Apps
Digital Content Creation: Adobe CS, Apple Final Cut, Sony Vegas Pro, Avid Media Composer, Autodesk 3dsMax; Other GPU Apps; Non-GPU Apps
Quantum Chemistry: Gaussian, GAMESS, NWChem, CP2K, Quantum Espresso; Non-GPU Apps
Computer-Aided Engineering: ANSYS, Simulia Abaqus, MSC Nastran, Altair Radioss; Non-GPU Apps

ANSYS Fluent 14.5 Multi-GPU Demonstration

Multi-GPU Acceleration of a 16-Core ANSYS Fluent Simulation of External Aero

Xeon E5-2667 CPUs + Tesla K20X GPUs
16-Core Server Node: 8-Cores + 8-Cores, with GPUs G1, G2, G3, G4
2.9X Solver Speedup (CPU + GPU Configuration vs. CPU Configuration)

207 GPU-Accelerated Applications

www.nvidia.com/appscatalog
