Post on 06-May-2020

Page 1: © NVIDIA Corporation 2009 - lorenabarba.comlorenabarba.com/gpuatbu/Program_files/Luebke_GPUatBU.pdf · © NVIDIA Corporation 2009 Early 3D Graphics Perspective study of a chalice

© NVIDIA Corporation 2009

Early 3D Graphics

Perspective study of a chalice

Paolo Uccello, circa 1450

Early Graphics Hardware

Perspective study of a chalice

Paolo Uccello, circa 1450

Artist using a perspective machine

Albrecht Dürer, 1525

Early Electronic Graphics Hardware

SKETCHPAD: A Man-Machine Graphical Communication System

Ivan Sutherland, 1963

The Graphics Pipeline

The Geometry Engine: A VLSI Geometry System for Graphics

Jim Clark, 1982

The Graphics Pipeline

Vertex Transform & Lighting

Triangle Setup & Rasterization

Texturing & Pixel Shading

Depth Test & Blending

Framebuffer

The Graphics Pipeline

Key abstraction of real-time graphics

Hardware used to look like this

One chip/board per stage

Fixed data flow through pipeline

Vertex

Rasterize

Pixel

Test & Blend

Framebuffer

Kurt Akeley. RealityEngine Graphics, SIGGRAPH 93.

SGI RealityEngine (1993)

Vertex

Rasterize

Pixel

Test & Blend

Framebuffer

Montrym et al. InfiniteReality: A real-time graphics system, SIGGRAPH 97

SGI InfiniteReality (1997)

Vertex

Rasterize

Pixel

Test & Blend

Framebuffer

The Graphics Pipeline

Remains a useful abstraction

Hardware used to look like this

Vertex

Rasterize

Pixel

Test & Blend

Framebuffer

The Graphics Pipeline

Hardware used to look like this

Vertex, pixel processing became programmable

New stages added

GPU architecture increasingly centers around shader execution

Vertex

Tessellation

Geometry

Rasterize

Pixel

Test & Blend

Framebuffer

    // Each thread performs one pair-wise addition
    __global__ void vecAdd(float* A, float* B, float* C)
    {
        int i = threadIdx.x + blockDim.x * blockIdx.x;
        C[i] = A[i] + B[i];
    }

Modern GPUs: Unified Design

[Figure: Discrete Design vs. Unified Design. Labels: Shader A, Shader B, Shader C, Shader D, Shader Core, ibuffer (x4), obuffer (x4)]

Vertex shaders, pixel shaders, etc. become threads running different programs on a flexible core

GeForce 8: Modern GPU Architecture

[Block diagram: Host; Input Assembler; Setup & Rasterize; Vertex Thread Issue, Geom Thread Issue, Pixel Thread Issue; Thread Processor; eight clusters of SP cores, each with L1 and TF units; six L2 / Framebuffer partitions]

Modern GPU Architecture: GT200

[Block diagram: Host; Input Assembler; Setup & Rasterize; Vertex Thread Issue, Geom Thread Issue, Pixel Thread Issue; Thread Scheduler; ten clusters of three SP blocks, each cluster with L1 and TF units; eight L2 / Framebuffer partitions]

Next-Gen GPU Architecture: Fermi

NVIDIA next-gen "Fermi" architecture

[Die diagram: six DRAM I/F blocks, HOST I/F, Giga Thread, L2]

GPUs Today

Lessons from Graphics Pipeline

Throughput is paramount

Create, run, & retire lots of threads very rapidly

Use multithreading to hide latency

[Timeline, 1995 to 2010]

RIVA 128: 3M transistors

GeForce 256: 23M transistors

GeForce 3: 60M transistors

GeForce FX: 125M transistors

GeForce 8800: 681M transistors

"Fermi": 3B transistors

Perspective

1 TeraFLOP in 1993

The "New" Moore's Law

Computers no longer get faster, just wider

You must re-think your algorithms to be parallel!

Data-parallel computing is the most scalable solution

Why GPU Computing?

Throughput architecture → massive parallelism

Massive parallelism → computational horsepower

Fact:

nobody cares about theoretical peak

Challenge:

harness GPU power for real application performance

[Chart: GFLOPS, GPU vs. CPU]

GPU: NVIDIA Tesla C1060, 240 cores, 936 GFLOPS

CPU: Intel Core i7 965, 4 cores, 102 GFLOPS

CUDA Successes

146X: Medical Imaging, U of Utah

36X: Molecular Dynamics, U of Illinois

18X: Video Transcoding, Elemental Tech

50X: Matlab Computing, AccelerEyes

100X: Astrophysics, RIKEN

149X: Financial simulation, Oxford

47X: Linear Algebra, Universidad Jaime

20X: 3D Ultrasound, Techniscan

130X: Quantum Chemistry, U of Illinois

30X: Gene Sequencing, U of Maryland

Accelerating Insight

CPU Only → Heterogeneous with Tesla GPU

4.6 Days → 27 Minutes

2.7 Days → 30 Minutes

8 Hours → 13 Minutes

3 Hours → 16 Minutes

GPU Computing 1.0

(Ignoring prehistory: Ikonas, Pixel Machine, Pixel-Planes…)

GPU Computing 1.0: compute pretending to be graphics

Disguise data as textures or geometry

Disguise algorithm as render passes

Trick graphics pipeline into doing your computation!

Term GPGPU coined by Mark Harris
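
The "disguise" can be pictured on the CPU. The C++ sketch below is an illustration only, not code from the talk: the `Texture` struct stands in for a GPU texture, `render_pass` for drawing a full-screen quad, and the lambda for a fragment shader. All of these names, and the choice of a 4-neighbour averaging step as the algorithm, are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for a 2D single-channel float texture.
struct Texture {
    std::size_t width;
    std::size_t height;
    std::vector<float> texels;

    Texture(std::size_t w, std::size_t h, float fill = 0.0f)
        : width(w), height(h), texels(w * h, fill) {}

    // Clamp-to-edge fetch, loosely modelled on a texture sampler.
    float fetch(long x, long y) const {
        const long max_x = static_cast<long>(width) - 1;
        const long max_y = static_cast<long>(height) - 1;
        if (x < 0) x = 0;
        if (x > max_x) x = max_x;
        if (y < 0) y = 0;
        if (y > max_y) y = max_y;
        return texels[static_cast<std::size_t>(y) * width +
                      static_cast<std::size_t>(x)];
    }
};

// One "render pass": the shader runs once per output pixel and may only
// read from the input texture (gather), never write elsewhere (no scatter).
template <typename Shader>
void render_pass(const Texture& input, Texture& output, Shader shader) {
    assert(input.width == output.width && input.height == output.height);
    for (std::size_t y = 0; y < output.height; ++y)
        for (std::size_t x = 0; x < output.width; ++x)
            output.texels[y * output.width + x] =
                shader(input, static_cast<long>(x), static_cast<long>(y));
}

// The algorithm "disguised" as render passes: repeated 4-neighbour
// averaging, ping-ponging between two textures.
inline Texture relax(Texture a, int passes) {
    Texture b(a.width, a.height);
    for (int p = 0; p < passes; ++p) {
        render_pass(a, b, [](const Texture& t, long x, long y) {
            return 0.25f * (t.fetch(x - 1, y) + t.fetch(x + 1, y) +
                            t.fetch(x, y - 1) + t.fetch(x, y + 1));
        });
        std::swap(a, b);  // the output of this pass is the input of the next
    }
    return a;
}
```

Calling `relax(grid, n)` runs `n` passes over `grid`. The restriction that a shader can only gather from its inputs is what made this style awkward for general algorithms.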

Typical GPGPU Constructs

GPGPU Hardware & Algorithms

GPUs get progressively more capable

Fixed-function → register combiners → shaders

fp32 pixel hardware greatly extends reach

Algorithms get more sophisticated

Cellular automata → PDE solvers → ray tracing

Clever graphics tricks

High-level shading languages emerge

GPU Computing 2.0 - Enter CUDA

GPU Computing 2.0: direct compute

Program GPU directly, no graphics-based restrictions

GPU Computing supplants graphics-based GPGPU

November 2006: NVIDIA introduces CUDA

CUDA In One Slide

Thread: per-thread local memory

Block: per-block shared memory; local barrier

Kernel foo(), Kernel bar(), . . .: per-device global memory; global barrier
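
A rough CPU analogue of this hierarchy, written as a hedged C++ sketch rather than real CUDA: a two-stage sum in which each "block" accumulates into its own scratch variable (standing in for per-block shared memory) and a second stage combines the per-block results from "global memory". The function names `block_sums` and `hierarchical_sum` are hypothetical, and plain loops replace the threads and barriers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// "Kernel 1": every block reduces its own slice of the input.
// On a GPU the threads of a block would cooperate through shared memory
// and a local barrier; here one loop plays the whole block.
inline std::vector<long> block_sums(const std::vector<int>& global_in,
                                    std::size_t block_size) {
    assert(block_size > 0);
    const std::size_t num_blocks =
        (global_in.size() + block_size - 1) / block_size;
    std::vector<long> partial(num_blocks, 0);  // stands in for global memory
    for (std::size_t block = 0; block < num_blocks; ++block) {
        long shared_acc = 0;  // stands in for per-block shared memory
        for (std::size_t thread = 0; thread < block_size; ++thread) {
            const std::size_t i = block * block_size + thread;
            if (i < global_in.size()) shared_acc += global_in[i];
        }
        partial[block] = shared_acc;
    }
    return partial;
}

// "Kernel 2": combine the per-block results. The return of block_sums
// plays the role of the global barrier: all blocks have finished before
// any partial sum is read.
inline long hierarchical_sum(const std::vector<int>& data,
                             std::size_t block_size) {
    const std::vector<long> partial = block_sums(data, block_size);
    long total = 0;
    for (long p : partial) total += p;
    return total;
}
```

For the values 1 to 10 and a block size of 4, the first stage produces three partial sums and the second stage adds them up.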

CUDA C Example

Serial C Code

    void saxpy_serial(int n, float a, float *x, float *y)
    {
        for (int i = 0; i < n; ++i)
            y[i] = a*x[i] + y[i];
    }

    // Invoke serial SAXPY kernel
    saxpy_serial(n, 2.0, x, y);

Parallel C Code

    __global__ void saxpy_parallel(int n, float a, float *x, float *y)
    {
        int i = blockIdx.x*blockDim.x + threadIdx.x;
        if (i < n)  y[i] = a*x[i] + y[i];
    }

    // Invoke parallel SAXPY kernel with 256 threads/block
    int nblocks = (n + 255) / 256;
    saxpy_parallel<<<nblocks, 256>>>(n, 2.0, x, y);
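
The index arithmetic of the parallel version can be checked without a GPU. The following C++ sketch is an assumption-laden emulation, not CUDA: two nested loops stand in for the grid of blocks and the threads of each block, and `saxpy_emulated` is a hypothetical name. It shows why the `i < n` guard is needed when `n` is not a multiple of the block size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial reference: the same loop as the serial C code.
inline void saxpy_serial(std::size_t n, float a,
                         const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() >= n && y.size() >= n);
    for (std::size_t i = 0; i < n; ++i) y[i] = a * x[i] + y[i];
}

// CPU emulation of the kernel launch: the outer loop enumerates blocks,
// the inner loop enumerates the threads of one block.
inline void saxpy_emulated(std::size_t n, float a,
                           const std::vector<float>& x, std::vector<float>& y,
                           std::size_t block_dim = 256) {
    assert(block_dim > 0 && x.size() >= n && y.size() >= n);
    // Round up so that every element is covered by some thread.
    const std::size_t nblocks = (n + block_dim - 1) / block_dim;
    for (std::size_t block_idx = 0; block_idx < nblocks; ++block_idx) {
        for (std::size_t thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            const std::size_t i = block_idx * block_dim + thread_idx;
            // The last block may have threads past the end of the data.
            if (i < n) y[i] = a * x[i] + y[i];
        }
    }
}
```

With `n = 1000` and 256 threads per block, four blocks (1024 threads) are launched and the last 24 threads do nothing.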

Heterogeneous Programming

Use the right processor for the right job

Serial Code

. . .

Parallel Kernel: foo<<< nBlk, nTid >>>(args);

Serial Code

. . .

Parallel Kernel: bar<<< nBlk, nTid >>>(args);

GPU Computing 3.0 - An Ecosystem

GPU Computing 3.0: an emerging ecosystem

Hardware & product lines

Algorithmic sophistication

Cross-platform standards

Education & research

Consumer applications

High-level languages

GPU Begins to "Vanish" Behind codes, platforms, apps

Products

CUDA is in products from laptops to supercomputers

Emerging HPC Products

New class of hybrid GPU-CPU servers

SuperMicro 1U GPU Server: 2 Tesla M1060 GPUs

Bull Bullx Blade Enclosure: up to 18 Tesla M1060 GPUs

Algorithmic Sophistication

Sort

Sparse matrix

Hash tables

Fast multipole method

Ray tracing (parallel tree traversal)

CPU Results from "Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms", Williams et al, Supercomputing 2007

Cross-Platform Standards

GPU Computing Applications

NVIDIA GPU with the CUDA Parallel Computing Architecture

CUDA C, OpenCL™, DirectCompute

CUDA Fortran

Java, Python, .NET

MATLAB, …

OpenCL is a trademark of Apple Inc., used under license to the Khronos Group Inc.

CUDA Momentum

Over 250 universities teach CUDA

Over 1000 research papers

Over 500 CUDA Apps: www.nvidia.com/CUDA

CUDA-powered TSUBAME: 29th fastest supercomputer in the world

180 Million CUDA GPUs

100,000 active developers

thrust

thrust: a library of data-parallel algorithms & data structures for CUDA, with an interface similar to the C++ Standard Template Library

C++ template metaprogramming automatically chooses the fastest code path at compile time

Data Structures

thrust::device_vector

thrust::host_vector

thrust::device_ptr

Etc.

Algorithms

thrust::sort

thrust::reduce

thrust::exclusive_scan

Etc.

thrust::sort

sort.cu

    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/generate.h>
    #include <thrust/sort.h>
    #include <cstdlib>

    int main(void)
    {
        // generate random data on the host
        thrust::host_vector<int> h_vec(1000000);
        thrust::generate(h_vec.begin(), h_vec.end(), rand);

        // transfer to device and sort
        thrust::device_vector<int> d_vec = h_vec;

        // sort 140M 32b keys/sec on GT200
        thrust::sort(d_vec.begin(), d_vec.end());

        return 0;
    }

http://thrust.googlecode.com
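
For comparison, here is a host-only sketch of the same steps using only the C++ standard library, to illustrate what "an interface similar to the STL" means. It is not Thrust code: `std::vector` takes the place of `thrust::host_vector`, there is no transfer to a device, and the function name `generate_and_sort` and its `seed` parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Fill a vector with pseudo-random values and sort it on the host.
// The calls mirror thrust::generate and thrust::sort one-for-one.
inline std::vector<int> generate_and_sort(std::size_t count, unsigned seed) {
    std::srand(seed);
    std::vector<int> h_vec(count);
    std::generate(h_vec.begin(), h_vec.end(), [] { return std::rand(); });
    std::sort(h_vec.begin(), h_vec.end());
    assert(std::is_sorted(h_vec.begin(), h_vec.end()));
    return h_vec;
}
```

The iterator-pair call shape (`begin()`, `end()`) is the same in both libraries; in the Thrust version the container type decides whether the work runs on the host or the device.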

GPU Begins to Vanish

Ever-increasing number of codes & commercial packages "just go faster" when GPU is present

Some bio/chem codes available or porting:

NAMD / VMD, GROMACS (alpha), HOOMD, GPU HMMER, MUMmerGPU, AutoDock…

LAMMPS, CHARMM, Q-Chem, Gaussian, AMBER…

Consumer applications, e.g. Photoshop

Badaboom, vReveal, Nero MoveIt

OS: Microsoft Windows 7, Mac OS X Snow Leopard

Next-Gen GPU Architecture: Fermi

3 billion transistors

Over 2x the cores (512 total)

~2x the memory bandwidth

L1 and L2 caches

8x the peak DP performance

ECC

C++

SM Microarchitecture

[Diagram: SM with Instruction Cache; two Scheduler / Dispatch pairs; Register File; 32 Cores; Load/Store Units x 16; Special Func Units x 4; Interconnect Network; 64K Configurable Cache/Shared Mem; Uniform Cache]

Objective: optimize for GPU computing

New ISA

Revamp issue / control flow

New CUDA core architecture

32 cores per SM (512 total)

64KB of configurable L1$ / shared memory

             FP32   FP64   INT   SFU   LD/ST
Ops / clk      32     16    32     4      16

SM Microarchitecture

New IEEE 754-2008 arithmetic standard

Fused Multiply-Add (FMA) for SP & DP

New integer ALU optimized for 64-bit and extended precision ops

[Diagram: SM block diagram as above, with one CUDA Core expanded: Dispatch Port, Operand Collector, FP Unit, INT Unit, Result Queue]
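
What a fused multiply-add buys can be shown on any CPU with the C++ standard library's `std::fma`, which computes `a*b + c` with a single rounding. The sketch below is an illustration, not something from the talk: the helper name `product_rounding_error` is hypothetical, and it assumes default floating-point settings (no fast-math). It uses FMA to recover the exact rounding error of a double-precision product, something a separate multiply followed by an add cannot do.

```cpp
#include <cassert>
#include <cmath>

// Returns (exact a*b) - round(a*b) for doubles, assuming no overflow.
// 'rounded' carries one rounding error; the fused operation then forms
// a*b - rounded exactly and rounds once, which recovers that error.
inline double product_rounding_error(double a, double b) {
    const double rounded = a * b;
    assert(std::isfinite(rounded));
    return std::fma(a, b, -rounded);
}
```

For `a = 1 + 2^-30`, the exact square is `1 + 2^-29 + 2^-60`; the last term is lost when the product is rounded to double, and the function returns exactly that lost term.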

Hardware Thread Scheduling

Concurrent kernel execution + faster context switch

[Diagram: timelines of Kernels 1 to 5 under Serial Kernel Execution vs. Parallel Kernel Execution]

More Fermi Goodness

ECC protection for DRAM, L2, L1, RF

Unified 40-bit address space for local, shared, global

5-20x faster atomics

Dual DMA engines for CPU ↔ GPU transfers

ISA extensions for C++ (e.g. virtual functions)

Key CUDA Challenges

Express other programming models elegantly

Producer-consumer: persistent thread blocks reading and writing work queues

Task-parallel: kernel, thread block or warp as parallel task

Both common "power-user" CUDA patterns

Foster more high-level languages & platforms

Improve & mature development environment
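
The producer-consumer pattern named above, with persistent workers reading and writing a work queue, can be sketched on the CPU. This is a sequential C++ emulation under stated assumptions, not CUDA and not the talk's code: the workers take turns in round-robin order instead of running concurrently, and the names `Range` and `sum_with_work_queue` are hypothetical. A worker that pops a large range pushes two halves back (producer); a worker that pops a small range sums it (consumer).

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical work item: the half-open index range [begin, end).
struct Range {
    std::size_t begin;
    std::size_t end;
};

// Each "persistent worker" stays alive until the queue is empty,
// repeatedly taking one item and either splitting it or processing it.
inline long sum_with_work_queue(const std::vector<int>& data,
                                std::size_t num_workers, std::size_t grain) {
    assert(num_workers > 0 && grain > 0);
    std::deque<Range> queue;
    if (!data.empty()) queue.push_back({0, data.size()});
    std::vector<long> per_worker(num_workers, 0);
    std::size_t worker = 0;
    while (!queue.empty()) {
        const Range r = queue.front();
        queue.pop_front();
        if (r.end - r.begin > grain) {
            // Producer side: write two smaller work items to the queue.
            const std::size_t mid = r.begin + (r.end - r.begin) / 2;
            queue.push_back({r.begin, mid});
            queue.push_back({mid, r.end});
        } else {
            // Consumer side: do the actual work for this item.
            for (std::size_t i = r.begin; i < r.end; ++i)
                per_worker[worker] += data[i];
        }
        worker = (worker + 1) % num_workers;  // next worker takes a turn
    }
    long total = 0;
    for (long s : per_worker) total += s;
    return total;
}
```

On a GPU the queue operations would need atomics and the workers would be thread blocks launched once and kept running, which is part of why the slide lists expressing this elegantly as a challenge.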

Key GPU Workloads

Computational graphics

Scientific and numeric computing

Image processing - video & images

Computer vision

Speech & natural language

Machine learning

The Future of Computing?

Forward-looking statements:

All future interesting problems are throughput problems.

GPUs will evolve to be the general-purpose throughput processors.

CPUs as we know them will become (already are?) "good enough", and shrink to a corner of the die.

Final Thoughts - Education

We should teach parallel computing in CS 1 or CS 2

Computers don't get faster, just wider

Manycore is the future of computing

Heapsort and mergesort

Both O(n lg n)

One parallel-friendly, one not

Students need to understand this early

now
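
The slide does not say which of the two sorts is the parallel-friendly one; the usual reading is mergesort, because its merges operate on disjoint pieces of the array. The C++ sketch below, a bottom-up merge sort with the hypothetical name `merge_sort_bottom_up`, is offered under that assumption. It runs serially, but every iteration of the inner loop reads and writes its own slice, so the iterations of one pass could be handed to separate threads.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Bottom-up merge sort: pass k merges sorted runs of length 2^k.
inline void merge_sort_bottom_up(std::vector<int>& a) {
    const std::size_t n = a.size();
    std::vector<int> buf(n);
    for (std::size_t width = 1; width < n; width *= 2) {
        // Iterations of this loop touch disjoint slices [lo, lo + 2*width),
        // so they are independent of one another within a pass.
        for (std::size_t lo = 0; lo < n; lo += 2 * width) {
            const auto first = a.begin() + static_cast<std::ptrdiff_t>(lo);
            const auto mid =
                a.begin() + static_cast<std::ptrdiff_t>(std::min(lo + width, n));
            const auto last =
                a.begin() + static_cast<std::ptrdiff_t>(std::min(lo + 2 * width, n));
            std::merge(first, mid, mid, last,
                       buf.begin() + static_cast<std::ptrdiff_t>(lo));
        }
        a.swap(buf);  // the merged runs become the input of the next pass
    }
    assert(std::is_sorted(a.begin(), a.end()));
}
```

Heapsort, by contrast, repeatedly sifts elements through one shared heap, so each step depends on the result of the previous one.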

Miscellaneous Thoughts

NVIDIA Resources Available

NVIDIA enables heterogeneous computing research and teaching:

Grants for leading researchers in GPU Computing

Ph.D. Fellowship program

Professor Partnership program

Discounted hardware

GPU Computing Ventures - Venture fund focused on GPU computing

Technical Training and Assistance