GPU Computing: A Brief Overview
GPU COMPUTING
Presented by
Rajiv Kumar, No. 34
S7C
Graphics Processing Units (GPUs):
Powerful, Programmable, and Highly Parallel
Jen-Hsun Huang: "GPU power is set to increase 570x, whereas CPU power will increase a mere 3x, over the same six-year time frame."
INTRODUCTION:
• GPUs have long powered the display of computers
• Designed for real-time, high-resolution 3D graphics tasks
• Commercial GPU-based systems are becoming common
• NVIDIA and AMD are expanding processor sophistication and software development tools
• High accuracy through higher floating-point precision
• GPUs are now on a development cycle much closer to that of CPUs
• GPUs are not constrained by sockets
• Very little backwards compatibility is needed in firmware; the rest is delivered through the driver implementation
• Computational requirements are large
• Parallelism is substantial
• Throughput is more important than latency
GPU-Based Software Requirements
Application requirements for targeting GPGPU programming:
• Large data sets
• High parallelism
• Minimal dependencies between data elements
• High arithmetic intensity
• Lots of work to do without CPU intervention
Task parallelism:
• Independent processes with little communication
Data parallelism:
• Lots of data on which the same computation is being executed
• No dependencies between data elements in each step of the computation
• Can saturate many ALUs
Task vs. Data Parallelism
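The distinction above can be sketched in a few lines of Python, using a thread pool as a CPU-side stand-in for the GPU (the functions `task_a`, `task_b`, and `kernel` are hypothetical examples, not part of the original slides):

```python
from multiprocessing.dummy import Pool  # thread pool, CPU stand-in

# Task parallelism: independent functions with little communication.
def task_a():
    return "render audio"

def task_b():
    return "decode video"

with Pool(2) as pool:
    tasks = [pool.apply_async(task_a), pool.apply_async(task_b)]
    task_results = [t.get() for t in tasks]

# Data parallelism: the same computation over many independent elements,
# which is the pattern that can saturate a GPU's many ALUs.
def kernel(x):
    return x * x

with Pool(4) as pool:
    squares = pool.map(kernel, range(8))

print(task_results)  # ['render audio', 'decode video']
print(squares)       # [0, 1, 4, 9, 16, 25, 36, 49]
```

Note that the data-parallel case is one function over many elements, while the task-parallel case is many functions over (possibly) one element each.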
GPU vs. CPU
• A CPU is designed to process a single task as fast as possible (low latency), while a GPU processes the maximum number of tasks over large amounts of data (high throughput)
• A CPU divides work in time, while a GPU divides work in space
Graphics Pipeline:
• Input to the GPU is a list of geometric primitives
• Vertex Operations: primitives are transformed into screen space and each vertex is shaded, computing its interaction with the lights in the scene
• Primitive Assembly: vertices are assembled into triangles
• Rasterization: determines which screen-space pixels are covered by each triangle
• Fragment Operations: using color information, each fragment is shaded to determine its final color; each pixel's color value may be computed from several fragments
• Composition: fragments are assembled into a final image with one color per pixel
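The stages above can be sketched as a chain of ordinary functions. This is a toy CPU-side sketch with made-up data layouts (one fragment per vertex, a single flat color), not how a real hardware pipeline works:

```python
def vertex_stage(vertices, offset):
    # Vertex Operations: transform each vertex into "screen space".
    return [(x + offset, y + offset) for (x, y) in vertices]

def primitive_assembly(verts):
    # Group consecutive triples of vertices into triangles.
    return [tuple(verts[i:i + 3]) for i in range(0, len(verts), 3)]

def rasterize(triangles):
    # For this sketch, emit one fragment per vertex of each triangle.
    return [v for tri in triangles for v in tri]

def fragment_stage(fragments, color):
    # Shade each fragment with a single color.
    return [(pos, color) for pos in fragments]

def composite(shaded):
    # One color per pixel: later fragments overwrite earlier ones.
    image = {}
    for pos, color in shaded:
        image[pos] = color
    return image

verts = [(0, 0), (1, 0), (0, 1)]
image = composite(fragment_stage(rasterize(primitive_assembly(
    vertex_stage(verts, 1))), "red"))
print(image)  # {(1, 1): 'red', (2, 1): 'red', (1, 2): 'red'}
```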
Graphics Pipeline:
Evolution of GPU Architecture:
• The fixed-function pipeline lacked the generality for complex effects
• Fixed-function per-vertex and per-fragment operations were replaced by vertex and fragment programs
• Vertex and fragment programs grew in complexity as the Shader Model evolved
• Support for unified shader models
Shader Models:
• A shader, written in a language such as GLSL, provides a user-defined programmable alternative to the hard-coded fixed-function approach
• A vertex shader describes the traits (position, color, depth value, etc.) of a vertex
• A geometry shader adds volumetric detail; its output is then sent to the rasterizer
• A pixel/fragment shader describes the traits (color, z-depth, and alpha value) of a pixel
GPU Programming Model
• Follows an SPMD programming model
• Each element is independent from other elements in the base programming model
• Many parallel elements are processed by a single program
• Each element can operate on integer or float data with a reasonably complete instruction set
• Reads data from shared memory via scatter and gather operations
• Code executes in SIMD fashion
• A different execution path is allowed for each element
• If elements branch in different directions, both branches are computed
• Computation proceeds in blocks on the order of 16 elements
• Finally, branches are permitted for programmers, but they are not free
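The "both branches are computed" behaviour above can be mimicked with NumPy masks (an illustrative CPU-side sketch; on a real GPU the hardware applies the mask per warp, which is why divergent branches are permitted but not free):

```python
import numpy as np

# SIMD-style branching: both sides of the branch are evaluated for
# every element, then a mask selects which result survives.
data = np.array([1.0, -2.0, 3.0, -4.0])
mask = data > 0

positive_path = data * 2.0   # computed for ALL elements
negative_path = -data        # also computed for ALL elements

# Per-element selection, analogous to predicated execution on a GPU.
result = np.where(mask, positive_path, negative_path)
print(result)  # [2. 2. 6. 4.]
```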
GPU Architecture: NVIDIA
NVIDIA 8800 GTX architecture (top); a pair of SMs (right)
Memory Architecture
• The GPU is capable of reading and writing anywhere in local (GPU) memory or elsewhere
• These non-cached memories have large read/write latencies, which can be masked by the extremely long pipeline as long as instructions do not wait on a read
GPGPU Programming
Stream processing is a new paradigm for maximizing the efficiency of parallel computing. It can be decomposed into two parts:
• Stream: a collection of objects that can be operated on in parallel and that require the same computation
• Kernel: a function applied to the entire stream; it looks like a "for each" loop
Terminology:
Streams
• Collection of records requiring similar computation, e.g. vertex positions, voxels, etc.
• Provide data parallelism
Kernels
• Functions applied to each element in the stream (transforms)
• No dependencies between stream elements encourage high arithmetic intensity
Gather
• Indirect read from memory (x = a[i])
• Naturally maps to a texture fetch
• Used to access data structures and data streams
Scatter
• Indirect write to memory (a[i] = x)
• Needed for building many data structures
• Usually done on the CPU
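The terms above map naturally onto NumPy array operations (a CPU-side sketch; the `kernel` function and the index values are illustrative, not from the slides):

```python
import numpy as np

# Stream: an array of elements that all need the same computation.
stream = np.array([0.0, 1.0, 2.0, 3.0])

def kernel(x):
    # Kernel: the same computation for every stream element,
    # with no dependencies between elements ("for each" loop).
    return x * x + 1.0

out = kernel(stream)            # applied elementwise to the whole stream

indices = np.array([3, 1])
gathered = stream[indices]      # gather: x = a[i], like a texture fetch

target = np.zeros(4)
target[indices] = gathered      # scatter: a[i] = x

print(out)       # [ 1.  2.  5. 10.]
print(gathered)  # [3. 1.]
print(target)    # [0. 1. 0. 3.]
```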
What can you do on GPUs other than graphics?
• Large matrix/vector operations (BLAS)
• Protein folding (molecular dynamics)
• FFT (SETI, signal processing)
• Ray tracing
• Physics simulation (cloth, fluid, collision)
• Sequence matching (hidden Markov models)
• Speech recognition (hidden Markov models, neural nets)
• Databases
• Sort/search
• Medical imaging (image segmentation, processing)
…and many, many more
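As a flavour of the first item, the dense matrix-vector product is the archetypal GPGPU workload; here NumPy stands in for a GPU BLAS such as cuBLAS (values are illustrative):

```python
import numpy as np

# Every element of y is an independent dot product of a row of A with x,
# so the work is fully data-parallel with no cross-element dependencies.
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
x = np.array([1.0, 1.0])
y = A @ x
print(y)  # [3. 7.]
```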
Future of GPU Computing:
• Higher-bandwidth PCI-E bus path between CPU and GPU
• AMD's Fusion and Intel's Ivy Bridge place both CPU and GPU elements on a single chip
• Addition of AVX instructions to CPU architectures
• Fully programmable pipelines, beyond the current few programmable shading stages in the fixed graphics pipeline
• Flexibility to combine a variety of rendering with general-purpose processing
Looking Ahead:
Problems in GPGPU Computing
• A killer app?
• Programming models and tools; their proprietary nature?
• The GPU in tomorrow's computer: will it get dissolved or absorbed?
• Relationship to other parallel hardware and software
• Managing rapid change
• Performance evaluation and performance cliffs
• A broader toolbox for computation and data structures; a "vertical" model for application development
• Faults and lack of precision
Drawbacks:
• Power consumption
• Increasing die size
• Multi-die solutions requiring inter-die connections increase packaging and wafer cost
• An increasing amount of die space goes to control logic, registers, and cache as the GPU becomes more flexible and programmable
• Comparing CPUs to GPUs is more like comparing apples to oranges
• Still lots of fixed-function hardware
• Integration of multimedia fixed functions within CPUs
References:
• GPU Computing Gems, Emerald Edition, by Wen-mei W. Hwu
• CUDA by Example: An Introduction to General-Purpose GPU Computing, by J. Sanders and E. Kandrot (July 2010)
• http://www.oxford-man.ox.ac.uk/gpuss/simd.html
• http://idlastro.gsfc.nasa.gov/idl_html_help/About_Shader_Programs.html
• GPU Computing, Proceedings of the IEEE, May 2008
• Evolution of GPU, by Chris Sietz
Thank You All…
Any Questions?