
Applications of GPU Computing Alex Karantza

0306-722 Advanced Computer Architecture Fall 2011


Outline

• Introduction

• GPU Architecture

▫ Multiprocessing

▫ Vector ISA

• GPUs in Industry

▫ Scientific Computing

▫ Image Processing

▫ Databases

• Examples and Benefits


Introduction

“GPUs have evolved to the point where many real world applications are easily implemented on them and run significantly faster than on multi-core systems. Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs.”

- Prof. Jack Dongarra, director of the Innovative Computing Laboratory at the University of Tennessee and co-author of LINPACK


(As typified by NVIDIA CUDA)


GPU Architecture

• Parallel coprocessor to a conventional CPU

▫ Implements a SIMD-style structure (SIMT in NVIDIA's terms): many threads run the same code

• Grid of blocks of threads

▫ Thread-local registers

▫ Block-local memory and control

▫ Global memory


Grids, Blocks, and Threads

• Thread, run by a thread processor: contains local registers and memory; scalar processor

• Block, run by a multiprocessor: shared memory and registers; shared control logic

• Grid, run by the device(s): global memory, can be easily distributed across devices
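
The thread/block/grid hierarchy can be illustrated on the CPU. The following is only a sketch in plain C++, not CUDA: `ThreadCoord`, `global_index`, `blocks_for`, and `scale_all` are hypothetical names standing in for CUDA's built-in block and thread indices and for a kernel launch, and the nested loops run sequentially where a GPU would run the iterations concurrently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the block index and thread index a GPU thread sees.
struct ThreadCoord {
    std::size_t block;   // which block in the grid
    std::size_t thread;  // which thread inside that block
};

// Global element index owned by one thread: block * block_dim + thread.
std::size_t global_index(ThreadCoord c, std::size_t block_dim) {
    assert(c.thread < block_dim);
    return c.block * block_dim + c.thread;
}

// Number of blocks needed to cover n elements (ceiling division).
std::size_t blocks_for(std::size_t n, std::size_t block_dim) {
    assert(block_dim > 0);
    return (n + block_dim - 1) / block_dim;
}

// Emulate launching a grid: every (block, thread) pair runs the same code
// on its own element.
std::vector<float> scale_all(const std::vector<float>& in, float factor,
                             std::size_t block_dim) {
    std::vector<float> out(in.size());
    const std::size_t grid_dim = blocks_for(in.size(), block_dim);
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            const std::size_t i = global_index({b, t}, block_dim);
            if (i < in.size()) {  // the last block may be only partly used
                out[i] = in[i] * factor;
            }
        }
    }
    return out;
}
```

The bounds check inside the inner loop matters because the grid is rounded up to a whole number of blocks, so some threads in the last block have no element to work on.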


GPU Architecture

• Processors also implement vector instructions

▫ Vectors of length 2 or 4 (.v2, .v4) of any non-predicate fundamental type: integer, float, bits; a 3-element vector is held in a padded .v4

▫ Instructions for conversion between vector and scalar

• To encourage uniform execution, use predicates rather than branching for conditionals

▫ All instructions can be conditionally executed based on predicate registers


Vectors and Predicates

.global .v4 .f32 V;    // a length-4 vector of floats
.shared .v2 .u16 uv;   // a length-2 vector of unsigned 16-bit ints
.global .v4 .b8 v;     // a length-4 vector of bytes
.reg .s32 a, b;        // two 32-bit signed ints
.reg .pred p;          // a predicate register
setp.lt.s32 p, a, b;   // if a < b, set p
// Schematic: real PTX arithmetic works on scalar registers, so read this as a guarded add on V's x component
@p add.v4.f32 V, V, {1,0,0,0};   // if p, V.x = V.x + 1
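
The idea behind the guarded add can be mirrored in plain C++. This is a sketch only: `Vec4`, `bump_x_branch`, `bump_x_predicated`, and `check_forms_agree` are hypothetical names, and whether a C++ compiler actually emits branch-free code for the second form is up to the compiler. The point is the shape of the computation: the predicate is computed once and then gates the update, so every thread follows the same instruction sequence.

```cpp
#include <array>
#include <cassert>

using Vec4 = std::array<float, 4>;

// Branching form: threads that disagree on (a < b) would take different paths.
Vec4 bump_x_branch(Vec4 v, int a, int b) {
    if (a < b) {
        v[0] += 1.0f;
    }
    return v;
}

// Predicated form: the comparison result gates the update instead of
// steering control flow.
Vec4 bump_x_predicated(Vec4 v, int a, int b) {
    const bool p = a < b;       // plays the role of: setp.lt.s32 p, a, b
    v[0] += p ? 1.0f : 0.0f;    // plays the role of the @p-guarded add
    return v;
}

// Usage: both forms agree whichever way the comparison goes.
void check_forms_agree(const Vec4& v, int a, int b) {
    assert(bump_x_predicated(v, a, b) == bump_x_branch(v, a, b));
}
```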


NSF Keeneland: 360 Tesla 20-series GPUs


GPUs in Industry

• Many applications have been developed to use GPUs for supercomputing in various fields

▫ Scientific Computing: CFD, molecular dynamics, genome sequencing, mechanical simulation, quantum electrodynamics

▫ Image Processing: registration, interpolation, feature detection, recognition, filtering

▫ Data Analysis: databases, sorting and searching, data mining


Major Categories of Algorithm

• 2D/3D filtering operations

• n-body simulations

• Parallel tree operations: searching and sorting

• All are suited to GPUs because the work is data-parallel and every element runs the same (uniform) kernel


Computational Fluid Dynamics

• Simulate fluids in a discrete volume over time

• Involves solving the Navier-Stokes partial differential equations iteratively on a grid

▫ Each iteration can be considered a filtering operation: a cell is updated from its neighbors

• When parallelized on a GPU using multigrid solvers, 10x speedups have been reported
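
To make the "iterative solve as a filter" point concrete, here is a minimal CPU sketch in C++. It is an assumption-laden stand-in, not a fluid solver: it relaxes Laplace's equation, a far simpler PDE than Navier-Stokes, with plain Jacobi sweeps rather than multigrid, and `Grid`, `jacobi_sweep`, and `relax` are hypothetical names. Each sweep applies the same small neighbor stencil to every interior cell, which is exactly the access pattern of an image filter.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Grid = std::vector<std::vector<double>>;

// One Jacobi sweep: every interior cell becomes the average of its four
// neighbors. Boundary cells are held fixed. Each cell's update reads only
// the previous grid, so all cells could be updated in parallel.
Grid jacobi_sweep(const Grid& u) {
    assert(!u.empty());
    Grid next = u;
    for (std::size_t y = 1; y + 1 < u.size(); ++y) {
        assert(u[y].size() == u[0].size());  // rectangular grid expected
        for (std::size_t x = 1; x + 1 < u[y].size(); ++x) {
            next[y][x] = 0.25 * (u[y - 1][x] + u[y + 1][x] +
                                 u[y][x - 1] + u[y][x + 1]);
        }
    }
    return next;
}

// Repeat the sweep a fixed number of times.
Grid relax(Grid u, int sweeps) {
    for (int s = 0; s < sweeps; ++s) {
        u = jacobi_sweep(u);
    }
    return u;
}
```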


Molecular Dynamics

• Large set of particles with forces between them, as in protein behavior and material simulation

• Calculating forces between particles can be done in parallel for each particle

• Accumulation of forces can be implemented as multilevel parallel sums
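
A multilevel sum can be sketched as a pairwise tree reduction. The C++ below runs on the CPU and is only an illustration: `tree_sum` and `tree_levels` are hypothetical names, and the values could be, for example, the individual force contributions on one particle along one axis. Within each level the additions touch disjoint elements, so on a GPU they could all run at once, leaving roughly log2(n) sequential levels instead of n - 1 sequential additions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum by repeated pairwise addition. Each pass folds the upper half of the
// active range onto the lower half.
double tree_sum(std::vector<double> values) {
    if (values.empty()) {
        return 0.0;
    }
    std::size_t active = values.size();
    while (active > 1) {
        const std::size_t half = (active + 1) / 2;  // elements left after this level
        for (std::size_t i = 0; i + half < active; ++i) {
            values[i] += values[i + half];          // independent of the other i
        }
        active = half;
    }
    assert(active == 1);
    return values[0];
}

// Number of levels the reduction above needs for n inputs.
std::size_t tree_levels(std::size_t n) {
    std::size_t levels = 0;
    while (n > 1) {
        n = (n + 1) / 2;
        ++levels;
    }
    return levels;
}
```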


Genetics

• Long genome sequences must be searched to organize and identify samples

• GPUs enable multiple parallel queries to the database to perform string matching

• Again, order-of-magnitude speedups have been reported
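
The reason many queries parallelize well is that each one is independent of the others. The C++ sketch below shows that structure on the CPU, under loud assumptions: it uses naive exact matching via `std::string::find`, which stands in for whatever indexing a real aligner uses, and `find_all` and `match_queries` are hypothetical names. The outer loop over queries is the part a GPU would spread across threads.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// All positions where `query` occurs in `genome` (overlaps allowed).
std::vector<std::size_t> find_all(const std::string& genome,
                                  const std::string& query) {
    assert(!query.empty());
    std::vector<std::size_t> hits;
    for (std::size_t pos = genome.find(query); pos != std::string::npos;
         pos = genome.find(query, pos + 1)) {
        hits.push_back(pos);
    }
    return hits;
}

// One independent task per query: no query reads another query's results,
// so the iterations of this loop could run concurrently.
std::vector<std::vector<std::size_t>> match_queries(
        const std::string& genome, const std::vector<std::string>& queries) {
    std::vector<std::vector<std::size_t>> results(queries.size());
    for (std::size_t i = 0; i < queries.size(); ++i) {
        results[i] = find_all(genome, queries[i]);
    }
    return results;
}
```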


Electrodynamics

• Simulation of electric fields and Coulomb forces

• Requires iterative solving of partial differential equations

• Cell phone modeling applications have reported 50x speedups using GPUs


Image Processing

• Medical imaging was an early adopter

▫ Registration of massive 3D voxel images

▫ Both the cost function for deformable registration and the interpolation of results are filtering operations

• Generic feature detection, recognition, and object extraction are all filters

• For object recognition, one can search a database of objects in parallel

• Moving these algorithms off the CPU and onto the GPU can allow real-time interaction


Data Analysis

• Huge databases for web services must return results almost instantly to many simultaneous users

• There is insufficient room in main memory, and disk is too slow and does not allow parallel reads

• GPUs can split up the data, each keeping its own section in memory, and perform fast searches on it
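
The split-and-search idea can be sketched on the CPU in C++. Everything here is illustrative: `partition_keys` and `contains` are hypothetical names, the "devices" are just vectors, and integer keys with binary search stand in for a real database index. Each slice is searched without reference to the others, which is what lets every GPU answer from its own memory.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Split sorted keys into `parts` contiguous slices, one per (hypothetical)
// device. Trailing slices may be shorter or empty.
std::vector<std::vector<int>> partition_keys(const std::vector<int>& sorted_keys,
                                             std::size_t parts) {
    assert(parts > 0);
    assert(std::is_sorted(sorted_keys.begin(), sorted_keys.end()));
    std::vector<std::vector<int>> slices(parts);
    const std::size_t chunk = (sorted_keys.size() + parts - 1) / parts;
    for (std::size_t i = 0; i < sorted_keys.size(); ++i) {
        slices[i / chunk].push_back(sorted_keys[i]);
    }
    return slices;
}

// Ask every slice whether it holds the key and combine the answers.
// The per-slice searches are independent and could run at the same time.
bool contains(const std::vector<std::vector<int>>& slices, int key) {
    bool found = false;
    for (const std::vector<int>& slice : slices) {
        found |= std::binary_search(slice.begin(), slice.end(), key);
    }
    return found;
}
```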


Example: Filtering Operation

• Many algorithms can be reduced to a filtering operation. As an example, consider image convolution for blurring

Kernel = Gaussian2D(size);
for (x,y) in Input {
    for (p,q) in Kernel {
        Output(x,y) += Input(x+p,y+q) * Kernel(p,q);
    }
}


Example: Filtering Operation

• A quick optimization is available for many filters: they are separable, so the 2D filter can be applied as one 1D pass per dimension, with the second pass reading the result of the first

Kernel = Gaussian1D(size);
for (x,y) in Input {
    for (p) in Kernel {
        Temp(x,y) += Input(x+p,y) * Kernel(p);    // pass 1: filter along x
    }
}
for (x,y) in Temp {
    for (q) in Kernel {
        Output(x,y) += Temp(x,y+q) * Kernel(q);   // pass 2: filter along y
    }
}
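
The separability claim can be checked numerically on the CPU. The C++ below is a sketch under stated assumptions: pixels outside the image are treated as zero, the 2D kernel is the outer product of a 1D kernel with itself, and `Image`, `at`, `blur_2d`, `blur_1d`, `blur_separable`, and `max_abs_diff` are hypothetical names. With those assumptions, running a row pass and then a column pass on the row pass's output matches the direct 2D filter up to floating-point rounding, while using 2m taps per pixel instead of m*m.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Image = std::vector<std::vector<double>>;

// Read a pixel, treating everything outside the image as zero.
double at(const Image& img, long y, long x) {
    if (y < 0 || x < 0 || y >= static_cast<long>(img.size()) ||
        x >= static_cast<long>(img[y].size())) {
        return 0.0;
    }
    return img[y][x];
}

// Direct 2D filter with the outer product k(p) * k(q): m*m taps per pixel.
Image blur_2d(const Image& in, const std::vector<double>& k) {
    assert(k.size() % 2 == 1);
    const long r = static_cast<long>(k.size() / 2);
    Image out = in;  // same shape; every pixel is overwritten below
    for (long y = 0; y < static_cast<long>(in.size()); ++y) {
        for (long x = 0; x < static_cast<long>(in[y].size()); ++x) {
            double sum = 0.0;
            for (long q = -r; q <= r; ++q) {
                for (long p = -r; p <= r; ++p) {
                    sum += at(in, y + q, x + p) * k[p + r] * k[q + r];
                }
            }
            out[y][x] = sum;
        }
    }
    return out;
}

// One 1D pass, along x when along_x is true, otherwise along y: m taps per pixel.
Image blur_1d(const Image& in, const std::vector<double>& k, bool along_x) {
    assert(k.size() % 2 == 1);
    const long r = static_cast<long>(k.size() / 2);
    Image out = in;
    for (long y = 0; y < static_cast<long>(in.size()); ++y) {
        for (long x = 0; x < static_cast<long>(in[y].size()); ++x) {
            double sum = 0.0;
            for (long t = -r; t <= r; ++t) {
                sum += (along_x ? at(in, y, x + t) : at(in, y + t, x)) * k[t + r];
            }
            out[y][x] = sum;
        }
    }
    return out;
}

// Two passes: the column pass reads the output of the row pass.
Image blur_separable(const Image& in, const std::vector<double>& k) {
    return blur_1d(blur_1d(in, k, true), k, false);
}

// Largest absolute per-pixel difference between two same-shaped images.
double max_abs_diff(const Image& a, const Image& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t y = 0; y < a.size(); ++y) {
        assert(a[y].size() == b[y].size());
        for (std::size_t x = 0; x < a[y].size(); ++x) {
            worst = std::max(worst, std::fabs(a[y][x] - b[y][x]));
        }
    }
    return worst;
}
```

Note that the second pass must read the intermediate image. Running both passes over the original input and adding the results would give the sum of two 1D blurs, not the 2D blur.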


Example: Filtering Operation

• This is still O(n·n·m) on a sequential processor: about 2·n·n·m multiply-accumulates, taking the image as n×n pixels and the kernel as m taps long

• Each output pixel is independent, but shares spatially local data and a constant kernel

UploadGPU(Kernel, CONSTANT);
UploadGPU(Input, TEXTURE);
ConvolveColumnsGPU<<<blocks,threads>>>();
ConvolveRowsGPU<<<blocks,threads>>>();
DownloadGPU(Output, TEXTURE);


Example: Filtering Operation

• Complexity remains the same; however, each MAC (multiply-accumulate) instruction can be executed on as many processors as are available, and memory can be accessed quickly because of how work is assigned to blocks and because the input is read from texture memory

• In practice, the overhead of uploading to and downloading from the GPU is far less than the time saved in the kernel


Example: Filtering Operation

__global__ void convolutionColumnsKernel(
    float *d_Dst,
    float *d_Src,
    int imageW,
    int imageH,
    int pitch
){
    // Tile of the image staged in shared memory; the +1 pads each row, a common way to avoid bank conflicts
    __shared__ float s_Data[COLUMNS_BLOCKDIM_X]
        [(COLUMNS_RESULT_STEPS + 2 * COLUMNS_HALO_STEPS) * COLUMNS_BLOCKDIM_Y + 1];

    // *snip* Populate s_Data from d_Src (presumably also offsetting d_Src and d_Dst to this thread's base pixel)

    __syncthreads();  // wait until the whole block has finished loading s_Data

    #pragma unroll
    for(int i = COLUMNS_HALO_STEPS; i < COLUMNS_HALO_STEPS + COLUMNS_RESULT_STEPS; i++){
        float sum = 0;

        // Multiply-accumulate over the kernel taps along the column
        #pragma unroll
        for(int j = -KERNEL_RADIUS; j <= KERNEL_RADIUS; j++)
            sum += c_Kernel[KERNEL_RADIUS - j] *
                   s_Data[threadIdx.x][threadIdx.y + i * COLUMNS_BLOCKDIM_Y + j];

        d_Dst[i * COLUMNS_BLOCKDIM_Y * pitch] = sum;
    }
}


Even More Fun

• Some of that overhead can be avoided when the destination of the GPU's data is graphics

• Texture memory can be shared between general-purpose computations and normal rendering

• For post-processing effects or visualizing particles, the pixel/vertex data never needs to leave the GPU


Conclusions

Certain classes of problem appear in many different fields and involve highly data-parallel operations such as filtering, sorting, or integration

By taking advantage of the architectural decisions behind graphics processing units, such as their multiprocessing and native vector operations, these problems can be solved quickly and cheaply


References

• 1. Ziegler, Gernot. Introduction to the CUDA Architecture. [Online] 2009. http://www.cse.scitech.ac.uk/disco/workshops/200907/Day1_01_Intro_CUDA_Architecture.pdf.

• 2. NVIDIA Corporation. NVIDIA Compute PTX: Parallel Thread Execution ISA Version 1.1. 2007.

• 3. Göddeke, Dominik. Fast and Accurate Finite-Element Multigrid Solvers for PDE Simulations on GPU Clusters. Berlin : Logos Verlag, 2010. 978-3-8325-2768-6.

• 4. John E Stone, James C Phillips, Peter L Freddolino, David J Hardy, Leonardo G Trabuco, and Klaus Schulten. Accelerating molecular modeling applications with graphics processors. Journal of Computational Chemistry, 2007, 28:2618-2640.

• 5. Michael C Schatz, Cole Trapnell, Arthur L Delcher, and Amitabh Varshney. High-throughput sequence alignment using Graphics Processing Units. s.l. : BMC Bioinformatics, 2007.

• 6. ANSYS, Inc. ANSYS Unveils GPU Computing for Accelerated Engineering Simulations. [Online] 2010. http://investors.ansys.com/releasedetail.cfm?releaseid=509436.

• 7. Warburton, Tim. Parallel Numerical Methods for Partial Differential Equations. Rocky Mountain Mathematics Consortium. [Online] 2008. http://www.caam.rice.edu/~timwar/RMMC/gpuDG.html.

• 8. Ansorge, Richard. AIRWC: Accelerated Image Registration With CUDA. BSS Group, Cavendish Laboratory, University of Cambridge UK. 2008.

• 9. N. Cornelis, L. Van Gool. Fast Scale Invariant Feature Detection and Matching on Programmable Graphics Hardware. s.l. : CVPR 2008 Workshop, 2008.

• 10. Andrea DiBlas, Tim Kaldewey. Data Monster: Why graphics processors will transform database processing. IEEE Spectrum. [Online] 2009. http://spectrum.ieee.org/computing/software/data-monster/0.

• 11. Podlozhnyuk, Victor. Image Convolution with CUDA. [Online] 2007. http://developer.download.nvidia.com/compute/DevZone/C/html/C/src/convolutionSeparable/doc/convolutionSeparable.pdf.

• 12. Goodnight, Nolan. CUDA/OpenGL Fluid Simulation. [Online] 2007. http://new.math.uiuc.edu/MA198-2008/schaber2/fluidsGL.pdf.


Questions?