GPU Computing: A Brief Overview
GPU COMPUTING
Presented by
Rajiv Kumar, No. 34
S7C
Graphics Processing Units (GPUs):
Powerful, Programmable, and Highly Parallel
Jen-Hsun Huang: "GPU power is set to increase 570x, whereas CPU power will increase a mere 3x, over the same six-year time frame."
INTRODUCTION:
• GPUs have long powered the display of computers
• Designed for real-time, high-resolution 3D graphics tasks
• Commercial GPU-based systems are becoming common
• NVIDIA and AMD are expanding processor sophistication and software development tools
• High accuracy through higher floating-point precision
• GPUs are now on a development cycle much closer to that of CPUs
• GPUs are not constrained by sockets
• Very little backwards compatibility is needed in firmware; the rest is delivered through the driver implementation
• Computational requirements are large
• Parallelism is substantial
• Throughput is more important than latency
GPU-Based Software Requirements
Application requirements for targeting GPGPU programming:
• Large data sets
• High parallelism
• Minimal dependencies between data elements
• High arithmetic intensity
• Lots of work to do without CPU intervention
Task parallelism:
• Independent processes with little communication
Data parallelism:
• Lots of data on which the same computation is being executed
• No dependencies between data elements in each step of the computation
• Can saturate many ALUs
Task vs. Data Parallelism
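The distinction above can be sketched in a few lines of Python, using a thread pool as a CPU-side stand-in for the GPU (the functions `task_a`, `task_b`, and `kernel` are hypothetical examples, not part of the original slides):

```python
from multiprocessing.dummy import Pool  # thread pool, CPU stand-in

# Task parallelism: independent functions with little communication.
def task_a():
    return "render audio"

def task_b():
    return "decode video"

with Pool(2) as pool:
    tasks = [pool.apply_async(task_a), pool.apply_async(task_b)]
    task_results = [t.get() for t in tasks]

# Data parallelism: the same computation over many independent elements,
# which is the pattern that can saturate a GPU's many ALUs.
def kernel(x):
    return x * x

with Pool(4) as pool:
    squares = pool.map(kernel, range(8))

print(task_results)  # ['render audio', 'decode video']
print(squares)       # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note that the data-parallel case is one function over many elements, while the task-parallel case is many functions over (possibly) one element each.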
GPU vs. CPU
• A CPU is designed to process a single task as fast as possible (low latency), while a GPU processes the maximum number of tasks over large amounts of data (high throughput)
• A CPU divides work in time, while a GPU divides work in space
Graphics Pipeline:
• Input to the GPU is a list of geometric primitives
• Vertex Operations: primitives are transformed into screen space and each vertex is shaded, computing its interaction with the lights in the scene
• Primitive Assembly: vertices are assembled into triangles
• Rasterization: determines which screen-space pixels are covered by each triangle
• Fragment Operations: using color information, each fragment is shaded to determine its final color; each pixel's color value may be computed from several fragments
• Composition: fragments are assembled into a final image with one color per pixel
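The stages above can be sketched as a chain of ordinary functions. This is a toy CPU-side sketch with made-up data layouts (one fragment per vertex, a single flat color), not how a real hardware pipeline works:

```python
def vertex_stage(vertices, offset):
    # Vertex Operations: transform each vertex into "screen space".
    return [(x + offset, y + offset) for (x, y) in vertices]

def primitive_assembly(verts):
    # Group consecutive triples of vertices into triangles.
    return [tuple(verts[i:i + 3]) for i in range(0, len(verts), 3)]

def rasterize(triangles):
    # For this sketch, emit one fragment per vertex of each triangle.
    return [v for tri in triangles for v in tri]

def fragment_stage(fragments, color):
    # Shade each fragment with a single color.
    return [(pos, color) for pos in fragments]

def composite(shaded):
    # One color per pixel: later fragments overwrite earlier ones.
    image = {}
    for pos, color in shaded:
        image[pos] = color
    return image

verts = [(0, 0), (1, 0), (0, 1)]
image = composite(fragment_stage(rasterize(primitive_assembly(
    vertex_stage(verts, 1))), "red"))
print(image)  # {(1, 1): 'red', (2, 1): 'red', (1, 2): 'red'}
```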
Graphics Pipeline:
Evolution of GPU Architecture:
• The fixed-function pipeline lacked the generality for complex effects
• Fixed-function per-vertex and per-fragment operations were replaced by vertex and fragment programs
• Vertex and fragment programs grew in complexity as the Shader Model evolved
• Support for unified shader models
Shader Models:
• A shader, written in a language such as GLSL, provides a user-defined programmable alternative to the hard-coded fixed-function approach
• A vertex shader describes the traits (position, color, depth value, etc.) of a vertex
• A geometry shader adds volumetric detail; its output is then sent to the rasterizer
• A pixel/fragment shader describes the traits (color, z-depth, and alpha value) of a pixel
GPU Programming Model
• Follows an SPMD programming model
• Each element is independent from other elements in the base programming model
• Many parallel elements are processed by a single program
• Each element can operate on integer or float data with a reasonably complete instruction set
• Reads data from shared memory via scatter and gather operations
• Code executes in SIMD fashion
• A different execution path is allowed for each element
• If elements branch in different directions, both branches are computed
• Computation proceeds in blocks on the order of 16 elements
• Finally, branches are permitted for programmers, but they are not free
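The "both branches are computed" behaviour above can be mimicked with NumPy masks (an illustrative CPU-side sketch; on a real GPU the hardware applies the mask per warp, which is why divergent branches are permitted but not free):

```python
import numpy as np

# SIMD-style branching: both sides of the branch are evaluated for
# every element, then a mask selects which result survives.
data = np.array([1.0, -2.0, 3.0, -4.0])
mask = data > 0

positive_path = data * 2.0   # computed for ALL elements
negative_path = -data        # also computed for ALL elements

# Per-element selection, analogous to predicated execution on a GPU.
result = np.where(mask, positive_path, negative_path)
print(result)  # [2. 2. 6. 4.]
```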
GPU Architecture: NVIDIA
NVIDIA 8800 GTX architecture (top); a pair of SMs (right)
Memory Architecture
• The GPU is capable of reading and writing anywhere in local (GPU) memory or elsewhere
• These non-cached memories have large read/write latencies, which can be masked by the extremely long pipeline as long as instructions do not wait on a read
GPGPU Programming
Stream processing is a new paradigm for maximizing the efficiency of parallel computing. It can be decomposed into two parts:
• Stream: a collection of objects that can be operated on in parallel and that require the same computation
• Kernel: a function applied to the entire stream; it looks like a "for each" loop
Terminology:
Streams
• Collection of records requiring similar computation, e.g. vertex positions, voxels, etc.
• Provide data parallelism
Kernels
• Functions applied to each element in the stream (transforms)
• No dependencies between stream elements encourage high arithmetic intensity
Gather
• Indirect read from memory (x = a[i])
• Naturally maps to a texture fetch
• Used to access data structures and data streams
Scatter
• Indirect write to memory (a[i] = x)
• Needed for building many data structures
• Usually done on the CPU
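The terms above map naturally onto NumPy array operations (a CPU-side sketch; the `kernel` function and the index values are illustrative, not from the slides):

```python
import numpy as np

# Stream: an array of elements that all need the same computation.
stream = np.array([0.0, 1.0, 2.0, 3.0])

def kernel(x):
    # Kernel: the same computation for every stream element,
    # with no dependencies between elements ("for each" loop).
    return x * x + 1.0

out = kernel(stream)            # applied elementwise to the whole stream

indices = np.array([3, 1])
gathered = stream[indices]      # gather: x = a[i], like a texture fetch

target = np.zeros(4)
target[indices] = gathered      # scatter: a[i] = x

print(out)       # [ 1.  2.  5. 10.]
print(gathered)  # [3. 1.]
print(target)    # [0. 1. 0. 3.]
```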
What can you do on GPUs other than graphics?
• Large matrix/vector operations (BLAS)
• Protein folding (molecular dynamics)
• FFT (SETI, signal processing)
• Ray tracing
• Physics simulation (cloth, fluid, collision)
• Sequence matching (hidden Markov models)
• Speech recognition (hidden Markov models, neural nets)
• Databases
• Sort/search
• Medical imaging (image segmentation, processing)
…and many, many more
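As a flavour of the first item, the dense matrix-vector product is the archetypal GPGPU workload; here NumPy stands in for a GPU BLAS such as cuBLAS (values are illustrative):

```python
import numpy as np

# Every element of y is an independent dot product of a row of A with x,
# so the work is fully data-parallel with no cross-element dependencies.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])
y = A @ x
print(y)  # [3. 7.]
```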
Future of GPU Computing:
• Higher-bandwidth PCI-E bus path between CPU and GPU
• AMD's Fusion and Intel's Ivy Bridge place both CPU and GPU elements on a single chip
• Addition of AVX instructions to CPU architectures
• Fully programmable pipelines, beyond the current few programmable shading stages in the fixed graphics pipeline
• Flexibility to combine a variety of rendering with general-purpose processing
Looking Ahead:
Problems in GPGPU Computing
• A killer app?
• Programming models and tools; their proprietary nature?
• The GPU in tomorrow's computer: will it get dissolved or absorbed?
• Relationship to other parallel hardware and software
• Managing rapid change
• Performance evaluation and performance cliffs
• A broader toolbox for computation and data structures; a "vertical" model for application development
• Faults and lack of precision
Drawbacks:
• Power consumption
• Increasing die size
• Multi-die solutions requiring inter-die connections increase packaging and wafer cost
• An increasing amount of die space goes to control logic, registers, and cache as the GPU becomes more flexible and programmable
• Comparing CPUs to GPUs is more like comparing apples to oranges
• Still lots of fixed-function hardware
• Integration of multimedia fixed functions within CPUs
References:
• GPU Computing Gems, Emerald Edition, by Wen-mei W. Hwu
• CUDA by Example: An Introduction to General-Purpose GPU Computing, by J. Sanders and E. Kandrot (July 2010)
• http://www.oxford-man.ox.ac.uk/gpuss/simd.html
• http://idlastro.gsfc.nasa.gov/idl_html_help/About_Shader_Programs.html
• GPU Computing, Proceedings of the IEEE, May 2008
• Evolution of GPU, by Chris Sietz
Thank You All…
Any Questions?