aaron lefohn university of california, davis with updates from slides by suresh venkatasubramanian,...

59
Aaron Lefohn Aaron Lefohn University of California, Davis University of California, Davis With updates from slides by With updates from slides by Suresh Venkatasubramanian, Suresh Venkatasubramanian, University of Pennsylvania University of Pennsylvania Updates performed by Gary J. Katz, Updates performed by Gary J. Katz, University of Pennsylvania University of Pennsylvania GPU Memory Model Overview GPU Memory Model Overview

Post on 21-Dec-2015

229 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

Aaron LefohnAaron LefohnUniversity of California, DavisUniversity of California, DavisWith updates from slides by With updates from slides by Suresh Venkatasubramanian,Suresh Venkatasubramanian,

University of PennsylvaniaUniversity of PennsylvaniaUpdates performed by Gary J. Katz,Updates performed by Gary J. Katz,

University of PennsylvaniaUniversity of Pennsylvania

GPU Memory Model GPU Memory Model OverviewOverview

Page 2: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

ReviewReview

VertexIndex

Stream

3D APICommands

AssembledPrimitives

PixelUpdates

PixelLocationStream

ProgrammableFragmentProcessor

ProgrammableFragmentProcessor

Tra

nsf

orm

ed

Vert

ices

ProgrammableVertex

Processor

ProgrammableVertex

Processor

GPUFront End

GPUFront End

PrimitiveAssembly

PrimitiveAssembly

Frame Buffer

Frame Buffer

RasterOperations

Rasterizationand

Interpolation

3D API:OpenGL orDirect3D

3D API:OpenGL orDirect3D

3DApplication

Or Game

3DApplication

Or Game

Pre

-transfo

rmed

Vertice

s

Pre

-transfo

rmed

Fragm

en

ts

Tra

nsf

orm

ed

Fragm

en

ts

GPU

Com

mand &

Data

Stre

am

CPU-GPU Boundary (AGP/PCIe)

Fixed-function pipeline

Page 3: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

OverviewOverview

• Color Buffers– Front-left– Front-right– Back-left– Back-right

• Depth Buffer (z-buffer)• Stencil Buffer• Accumulation Buffer

Page 4: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

OverviewOverview

• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline

Page 5: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

5

Memory HierarchyMemory Hierarchy

• CPU and GPU Memory Hierarchy

Disk

CPU Main Memory

GPU Video Memory

CPU Caches

CPU Registers GPU Caches

GPU Temporary Registers

GPU Constant Registers

Page 6: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

6

CPU Memory ModelCPU Memory Model

• At any program point– Allocate/free local or global memory– Random memory access

• Registers– Read/write

• Local memory– Read/write to stack

• Global memory– Read/write to heap

• Disk– Read/write to disk

Page 7: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

7

GPU Memory ModelGPU Memory Model

• Much more restricted memory access– Allocate/free memory only before computation– Limited memory access during computation

(kernel)• Registers

– Read/write

• Local memory– Does not exist

• Global memory– Read-only during computation– Write-only at end of computation (pre-computed

address)

• Disk access– Does not exist

Page 8: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

GPU Memory ModelGPU Memory Model

• Where is GPU Data Stored?– Vertex buffer– Frame buffer– Texture

Vertex BufferVertex

ProcessorRasterizer

FragmentProcessor

Texture

Frame Buffer(s)

VS 3.0 GPUs

Page 9: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

9

GPU Memory APIGPU Memory API

• Each GPU memory type supports subset of the following operations– CPU interface– GPU interface

Page 10: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

10

GPU Memory APIGPU Memory API

• CPU interface– Allocate– Free– Copy CPU GPU– Copy GPU CPU– Copy GPU GPU– Bind for read-only vertex stream access– Bind for read-only random access– Bind for write-only framebuffer access

Page 11: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

11

GPU Memory APIGPU Memory API

• GPU (shader/kernel) interface– Random-access read– Stream read

Page 12: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

Vertex BuffersVertex Buffers

• GPU memory for vertex data• Vertex data required to initiate

render pass

Vertex BufferVertex

ProcessorRasterizer

FragmentProcessor

Texture

Frame Buffer(s)

VS 3.0 GPUs

Page 13: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

13

Vertex BuffersVertex Buffers

• Supported Operations– CPU interface

• Allocate• Free• Copy CPU GPU• Copy GPU GPU (Render-to-vertex-array)• Bind for read-only vertex stream access

– GPU interface• Stream read (vertex program only)

Page 14: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

14

Vertex BuffersVertex Buffers

• Limitations– CPU

• No copy GPU CPU• No bind for read-only random access• No bind for write-only framebuffer access

– ATI supported this in uberbuffers. Perhaps we’ll see this as an OpenGL extension?

– GPU• No random-access reads• No access from fragment programs

Page 15: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

TexturesTextures

• Random-access GPU memory

Vertex BufferVertex

ProcessorRasterizer

FragmentProcessor

Texture

Frame Buffer(s)

VS 3.0 GPUs

Page 16: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

16

TexturesTextures

• Supported Operations– CPU interface

• Allocate• Free• Copy CPU GPU• Copy GPU CPU• Copy GPU GPU (Render-to-texture)• Bind for read-only random access (vertex or

fragment)• Bind for write-only framebuffer access

– GPU interface• Random read

Page 17: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

17

TexturesTextures

• Limitations– No bind for vertex stream access

Page 18: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

FramebufferFramebuffer

• Memory written by fragment processor

• Write-only GPU memory

Vertex BufferVertex

ProcessorRasterizer

FragmentProcessor

Texture

Frame Buffer(s)

VS 3.0 GPUs

Page 19: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

19

OpenGL Framebuffer ObjectsOpenGL Framebuffer Objects

• General idea– Framebuffer object is lightweight struct of

pointers– Bind GPU memory to framebuffer as write-only– Memory cannot be read while bound to

framebuffer

• Which memory? – Texture– Renderbuffer– Vertex buffer??

Texture(RGBA)

Renderbuffer(Depth)

FramebufferObject

Page 20: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

20

Framebuffer ObjectFramebuffer Object

• New OpenGL extension– Enables true render-to-texture in OpenGL– Mix-and-match depth/stencil buffers– Replaces pbuffers!– More details coming later in talk…

http://oss.sgi.com/projects/ogl-sample/registry/EXT/framebuffer_object.txt

Page 21: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

21

What is a Renderbuffer?What is a Renderbuffer?

• “Traditional” framebuffer memory – Write-only GPU memory

• Color buffer• Depth buffer• Stencil buffer

• New OpenGL memory object– Part of Framebuffer Object extension

Page 22: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

22

RenderbufferRenderbuffer

• Supported Operations– CPU interface

• Allocate• Free• Copy GPU CPU• Bind for write-only framebuffer access

Page 23: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

23

Pixel Buffer ObjectsPixel Buffer Objects

• Mechanism to efficiently transfer pixel data– API nearly identical to vertex buffer objects

Vertex BufferVertex

ProcessorRasterizer

FragmentProcessor

Texture

Frame Buffer(s)

VS 3.0 GPUs

Page 24: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

24

Pixel Buffer ObjectsPixel Buffer Objects

• Uses– Render-to-vertex-array

• glReadPixels into GPU-based pixel buffer• Use pixel buffer as vertex buffer

– Fast streaming textures• Map PBO into CPU memory space• Write directly to PBO• Reduces one or more copies

Page 25: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

25

Pixel Buffer ObjectsPixel Buffer Objects

• Uses (continued)– Asynchronous readback

• Non-blocking GPU CPU data copy• glReadPixels into PBO does not block• Blocks when PBO is mapped into CPU memory

Page 26: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

Summary : Render-to-Summary : Render-to-TextureTexture

• Basic operation in GPGPU apps

• OpenGL Support– Save up to 16, 32-bit floating values per pixel

• Multiple Render Targets (MRTs) on ATI and NVIDIA

1. Copy-to-texture• glCopyTexSubImage

2. Render-to-texture• GL_EXT_framebuffer_object

Page 27: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

27

Summary : Render-To-Vertex-Summary : Render-To-Vertex-ArrayArray• Enable top-of-pipe feedback loop

• OpenGL Support– Copy-to-vertex-array

• GL_ARB_pixel_buffer_object• NVIDIA and ATI

– Render-to-vertex-array• Maybe future extension to framebuffer objects

Page 28: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

28

Multiple Render to Texture (MRT) Multiple Render to Texture (MRT) [nv40][nv40]

Fragmentprogram

Fragmentprogram

MRT allows us to compress multiple passes into a single one.

This does not fundamentally change the model though, since read/write access is still not allowed.

Page 29: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

29

OverviewOverview

• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline

Page 30: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

30

GPU Data Structure GPU Data Structure BasicsBasics• Summary of “Implementing Efficient

Parallel Data Structures on GPUs”– Chapter 33, GPU Gems II

• Low-level details– See the “Glift” talk for high-level view of GPU

data structures

• Now for the gory details…

Page 31: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

31

GPU ArraysGPU Arrays

• Large 1D Arrays– Current GPUs limit 1D array sizes to 2048 or

4096– Pack into 2D memory– 1D-to-2D address translation

Page 32: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

32

GPU ArraysGPU Arrays

• 3D Arrays– Problem

• GPUs do not have 3D frame buffers• No render-to-slice-of-3D-texture yet (coming

soon?)

– Solutions1. Stack of 2D slices2. Multiple slices per 2D buffer

Page 33: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

33

GPU ArraysGPU Arrays

• Problems With 3D Arrays for GPGPU– Cannot read stack of 2D slices as 3D texture– Must know which slices are needed in advance– Visualization of 3D data difficult

• Solutions– Flat 3D textures– Need render-to-slice-of-3D-texture

– Maybe with GL_EXT_framebuffer_object

– Volume rendering of flattened 3D data– “Deferred Filtering: Rendering from Difficult Data

Formats,” GPU Gems 2, Ch. 41, p. 667

Page 34: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

34

GPU ArraysGPU Arrays

• Higher Dimensional Arrays– Pack into 2D buffers– N-D to 2D address translation– Same problems as 3D arrays if data does not

fit in a single 2D texture

Page 35: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

35

Sparse/Adaptive Data Sparse/Adaptive Data StructuresStructures• Why?

– Reduce memory pressure– Reduce computational workload

• Examples– Sparse matrices

• Krueger et al., Siggraph 2003• Bolz et al., Siggraph 2003

– Deformable implicit surfaces (sparse volumes/PDEs)• Lefohn et al., IEEE Visualization 2003 / TVCG 2004

– Adaptive radiosity solution (Coombe et al.)

Premoze et al.Eurographics 2003

Page 36: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

36

Sparse/Adaptive Data Sparse/Adaptive Data StructuresStructures• Basic Idea

– Pack “active” data elements into GPU memory

Page 37: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

37

GPU Data StructuresGPU Data Structures

• Conclusions– Fundamental GPU memory primitive is a fixed-

size 2D array

– GPGPU needs more general memory model

– Building and modifying complex GPU-based data structures is an open research topic…

Page 38: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

38

OverviewOverview

• GPU Memory Model• GPU-Based Data Structures• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline

Page 39: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

39

Introduction to Framebuffer Introduction to Framebuffer ObjectsObjects• Where is the “Pbuffer Survival

Guide”?– Gone!!!– Framebuffer objects replace pbuffers– Simple, intuitive, fast render-to-texture in

OpenGL

http://oss.sgi.com/projects/ogl-sample/registry/EXT/framebuffer_object.txt

Page 40: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

40

Framebuffer ObjectsFramebuffer Objects

• What is an FBO?– A struct that holds pointers to memory

objects– Each bound memory object can be a

framebuffer rendering surface– Platform-independent

Texture(RGBA)

Renderbuffer(Depth)

FramebufferObject

Page 41: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

41

Framebuffer ObjectsFramebuffer Objects

• Which memory can be bound to an FBO?– Textures– Renderbuffers

• Depth, stencil, color• Traditional write-only framebuffer surfaces

Page 42: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

42

Framebuffer ObjectsFramebuffer Objects

• Usage models– Keep N textures bound to one FBO (up to 16)

• Change render targets with glDrawBuffers

– Keep one FBO for each size/format• Change render targets with attach/unattach textures

– Keep several FBOs with textures attached• Change render targets by binding FBO

Page 43: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

43

Framebuffer ObjectsFramebuffer Objects

• Performance– Render-to-texture

• glDrawBuffers is fastest on NVIDIA/ATI– As-fast or faster than pbuffers

• Attach/unattach textures same as changing FBOs

– Slightly slower than glDrawBuffers but faster than wglMakeCurrent

• Keep format/size identical for all attached memory

– Current driver limitation, not part of spec

– Readback• Same as pbuffers for NVIDIA and ATI

Page 44: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

44

Framebuffer ObjectsFramebuffer Objects

• Driver support still evolving– GPUBench FBO tests coming soon…

• “fbocheck” evalulates completeness• Other tests…

Page 45: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

45

Framebuffer ObjectFramebuffer Object

• Code examples– Simple C++ FBO and Renderbuffer classes

• HelloWorld example• http://gpgpu.sourceforge.net/

– OpenGL Spechttp://oss.sgi.com/projects/ogl-sample/registry/EXT/

framebuffer_object.txt

Page 46: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

OverviewOverview

• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline

Page 47: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

47

The fragment pipelineThe fragment pipeline

Color R G B A

Position X Y Z W

Texture coordinat

esX Y [Z] -

Texture coordinat

esX Y [Z] -

Input: Fragment

Attributes

Interpolated from vertex information

Input: Texture Image

X Y Z W

• Each element of texture is 4D vector

• Textures can be “square” or rectangular (power-of-two or not)

32 bits = float

16 bits = half

Page 48: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

48

The fragment pipelineThe fragment pipeline

Input: Uniform parameters

• Can be passed to a fragment program like normal parameters

• set in advance before the fragment program executes

Example:A counter that tracks which pass the algorithm is in.

Input: Constant parameters

• Fixed inside program

• E.g. float4 v = (1.0, 1.0, 1.0, 1.0)

Examples:3.14159..Size of compute window

Page 49: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

49

The fragment pipelineThe fragment pipeline

Math ops: USE THEM !

• cos(x)/log2(x)/pow(x,y)

• dot(a,b)• mul(v, M)• sqrt(x)• cross(u, v)

Using built-in ops is more efficient than writing your own

Swizzling/masking: an easy way to move data around.

v1 = (4,-2,5,3); // Initializev2 = v1.yx; // v2 = (-2,4)s = v1.w; // s = 3v3 = s.rrr; // v3 = (3,3,3)

Write masking:

v4 = (1,5,3,2);v4.ar = v2; // v4=(4,5,4,-2)

Page 50: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

50

The fragment pipelineThe fragment pipeline

x

yfloat4 v = tex2D(IMG, float2(x,y))

Texture access is like an array lookup.

The value in v can be used

to perform another lookup!

This is called a dependent readTexture reads (and dependent reads) are expensive resources, and are limited in different GPUs. Use them wisely !

Page 51: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

51

The fragment pipelineThe fragment pipelineControl flow:• (<test>)?a:b operator.• if-then-else conditional

– [nv3x] Both branches are executed, and the condition code is used to decide which value is used to write the output register.

– [nv40] True conditionals• for-loops and do-while

– [nv3x] limited to what can be unrolled (i.e no variable loop limits)

– [nv40] True looping.

WARNING: Even though nv40 has true flow control, performance will suffer if there is no coherence (more on this later)

Page 52: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

52

The fragment pipelineThe fragment pipelineFragment programs use call-by-result

Notes:• Only output color can be modified• Textures cannot be written• Setting different values in different channels of

result can be useful for debugging

out float4 result : COLOR

// Do computation

result = <final answer>

Page 53: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

OverviewOverview

• GPU Memory Model• GPU Data Structure Basics• Introduction to Framebuffer Objects• Fragment Pipeline• Vertex Pipeline

Page 54: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

54

The Vertex PipelineThe Vertex Pipeline

Input: vertices• position, color, texture coords.Input: uniform and constant

parameters.• Matrices can be passed to a vertex

program.• Lighting/material parameters can

also be passed.

Page 55: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

55

The Vertex PipelineThe Vertex Pipeline

Operations:• Math/swizzle ops• Matrix operators• Flow control (as before)[nv3x] No access to textures.Output:• Modified vertices (position, color)• Vertex data transmitted to primitive

assembly.

Page 56: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

56

Vertex programs are usefulVertex programs are useful• We can replace the entire geometry

transformation portion of the fixed-function pipeline.

• Vertex programs used to change vertex coordinates (move objects around)

• There are many fewer vertices than fragments: shifting operations to vertex programs improves overall pipeline performance.

• Much of shader processing happens at vertex level.

• We have access to original scene geometry.

Page 57: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

57

Vertex programs are not usefulVertex programs are not useful

• Fragment programs allow us to exploit full parallelism of GPU pipeline (“a processor at every pixel”).

• Vertex programs can’t read input ! [nv3x]

• Current Cards can read vertex textures but can not read FBOs

Rule of thumb:If computation requires intensive calculation,it should probably be in the fragment processor.

If it requires more geometric/graphic computing, it should be in the vertex processor.

Page 58: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

58

ConclusionsConclusions

• GPU Memory Model Evolving– Writable GPU memory forms loop-back in an

otherwise feed-forward pipeline– Memory model will continue to evolve as GPUs

become more general data-parallel processors

• GPGPU Data Structures– Basic memory primitive is limited-size, 2D texture– Use address translation to fit all array dimensions

into 2D– See “Glift” talk for higher-level GPU data structures

Page 59: Aaron Lefohn University of California, Davis With updates from slides by Suresh Venkatasubramanian, University of Pennsylvania Updates performed by Gary

59

AcknowledgementsAcknowledgements

• Adam Moerschell, Shubho Sengupta UCDavis• Mike Houston Stanford

University• John Owens, Ph.D. advisor UC Davis

• National Science Foundation Graduate Fellowship

• Extra slides were added by Gary Katz from Suresh Venkatasubramanian, lecture 3 found at http://www.cis.upenn.edu/~suvenkat/700/

• Alteration to this slide package were made without the authorization by the original authors and should be used for educational purposes only.