GPU Data Formatting and Addressing
Aaron Lefohn, University of California, Davis


Page 1

GPU Data Formatting and Addressing

Aaron Lefohn, University of California, Davis

Page 2

Overview

• GPU Memory Model

• GPU-Based Data Structures

• Performance Considerations

Page 3

GPU Memory Model

• GPU Data Storage
  – Vertex data
  – Texture data
  – Frame buffer

[Figure: pipeline diagram showing vertex data feeding the vertex processor, then the rasterizer and fragment processor, which writes the frame buffer(s); texture data is read by the processors, with vertex texture reads labeled "PS3.0 GPUs"]

Page 4

GPU Memory Model

• Read-Only
  – Traditional use of GPU memory
  – CPU writes, GPU reads

• Read/Write
  – Save frame buffer(s) for later use as a texture or vertex array
  – Save up to sixteen 32-bit floating-point values per pixel
    • Multiple Render Targets (MRTs); see the sketch below
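
The per-pixel limit above (four 4-channel float buffers, sixteen floats total) corresponds to a fragment program with multiple color outputs. A minimal Cg sketch, assuming the application has bound four floating-point draw buffers; the names and the example math are illustrative, not from the slides:

    // Write four RGBA float results from one fragment program invocation.
    struct FragOut
    {
        float4 buf0 : COLOR0;
        float4 buf1 : COLOR1;
        float4 buf2 : COLOR2;
        float4 buf3 : COLOR3;
    };

    FragOut writeMRT( float2 texCoord : TEXCOORD0,
                      uniform sampler2D srcTex )
    {
        FragOut OUT;
        float4 v = tex2D( srcTex, texCoord );
        OUT.buf0 = v;                          // e.g. the value itself
        OUT.buf1 = v * v;                      // e.g. a derived quantity
        OUT.buf2 = float4( texCoord, 0, 1 );   // e.g. bookkeeping data
        OUT.buf3 = float4( 0, 0, 0, 1 );       // fourth 4-vector of results
        return OUT;
    }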

Page 5

How to Save Render Results

1. Copy framebuffer result to "other GPU memory"
   – Copy-to-texture
   – Copy-to-vertex-array

2. Write directly to "other GPU memory"
   – Render-to-texture
   – Render-to-vertex-array

Page 6

OpenGL GPU Memory Writes

• Texture
  1. Copy frame buffer to texture
  2. Render-to-texture
     • WGL_ARB_render_texture
     • GL_EXT_render_target
     • Superbuffers

• Vertex Array
  1. Copy frame buffer to vertex array
     • GL_EXT_pixel_buffer_object
     • Superbuffers
  2. Render-to-vertex-array
     • Superbuffers

Page 7

Render-To-Texture: 1

• Copy-To-Texture
  – Good
    • Cross-platform texture writes
    • Flexible output
    • 2D output can be copied to a 1D, 2D, or 3D texture
  – Bad
    • Slow
    • Consumes internal GPU memory bandwidth

Page 8

Render-To-Texture: 2

• WGL_ARB_render_texture
  – Render-to-texture (RTT) using pbuffers
    http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_render_texture.txt
  – Good
    • Fast RTT
    • Current state of the art for RTT
  – Bad
    • Only works on Windows
    • Slow OpenGL context switches
    • Many hacks to avoid this bottleneck

Page 9

Render-To-Texture: 3

• GL_EXT_render_target
  – Proposed extension for cross-platform RTT
    http://www.opengl.org/resources/features/GL_EXT_render_target.txt
  – Good
    • Cross-platform, efficient RTT solution
    • Lightweight, simple extension
  – Bad
    • Specification not approved (April 24, 2004)
    • No implementations exist (April 24, 2004)

Page 10

Render-To-Texture: 4

• Superbuffers
  – Proposed new memory model for GPUs
    http://www.ati.com/developer/gdc/SuperBuffers.pdf
  – Good
    • Unified GPU memory model
    • Render to any GPU memory
    • Cross-platform (OpenGL owns the memory, not the OS)
    • Mix-and-match depth/stencil/color buffers
  – Bad
    • Large, complex extension
    • Specification not approved (April 24, 2004)
    • Only driver support is an alpha version (ATI)

Page 11

Render-To-Texture Summary

• OpenGL RTT Currently Only Under Windows
  – Pbuffers
    • Complex and awkward RTT mechanism
    • Current state of the art

• Cross-Platform RTT Coming Soon…

Page 12

Render-To-Vertex-Array: 1

• GL_EXT_pixel_buffer_object
  – Copy framebuffer to vertex buffer object
    http://developer.nvidia.com/object/nvidia_opengl_specs.html
  – Good
    • Uses only GPU/AGP memory bandwidth
    • Works with current drivers (NVIDIA)
  – Bad
    • No direct render-to-vertex-array (slower than true RTVA)
    • No ATI implementation

Page 13

Render-To-Vertex-Array: 2

• Superbuffers
  – Write to "memory object" as a render target
  – Read from "memory object" as a vertex array
  – Good
    • Direct render-to-vertex-array (fast)
  – Bad
    • Can render results always be interpreted as vertex data?
    • Large, complex, unapproved extension, …

Page 14

Render-To-Vertex-Array Summary

• Current OpenGL Support
  – NVIDIA: GL_EXT_pixel_buffer_object
  – ATI: Superbuffers

• Semantics Still Under Development…

Page 15

Fbuffer: Capturing Fragments

• Idea
  – "Rasterization-Order FIFO Buffer"
  – Render results are fragment values instead of pixel values
  – Mark and Proudfoot, Graphics Hardware 2001
    http://graphics.stanford.edu/projects/shading/pubs/hwws2001-fbuffer/

• Uses
  – Designed for multi-pass rendering with transparent geometry
  – New possibilities for GPGPU?
    • Varying number of results per pixel
    • RTT and RTVA with an fbuffer?

Page 16

Fbuffer: Capturing Fragments

• Implementations
  – ATI Radeon 9800 and newer ATI GPUs
  – Not yet exposed to the user (ask for it!)

• Problems
  – Size of the fbuffer is not known before rendering
  – GPUs cannot perform dynamic memory allocation
  – How to handle buffer overflow?

Page 17

Overview

• GPU Memory Model

• GPU-Based Data Structures

• Performance Considerations

Page 18

GPU-Based Data Structures

• Building Blocks
  – GPU memory addresses
    • Address generation
    • Address use
    • Pointers
  – Multi-dimensional arrays
  – Sparse representations

Page 19

GPU Memory Addresses

• Where Are Addresses Generated?
  – CPU: vertex stream or textures
  – Vertex processor: input stream, ALU ops, or textures
  – Rasterizer: interpolation
  – Fragment processor: input stream, ALU ops, or textures

[Figure: pipeline diagram of CPU, vertex processor, rasterizer, and fragment processor]

Page 20

GPU Memory Addresses

• Where Are Addresses Used?
  – Vertex textures (PS3.0 GPUs)
  – Fragment textures

[Figure: pipeline diagram with texture data read by the vertex and fragment processors]

Page 21

GPU Memory Addresses

• Pointers
  – Store addresses in a texture
  – Dependent texture read
  – Example: see Tim Purcell's ray tracing talk

float2 addr = tex2D( addrTex, texCoord );

float2 data = tex2D( dataTex, addr );

[Figure: a four-texel address texture (indices 0-3) whose values point into a four-texel data texture (indices 0-3)]

Page 22

GPU-Based Data Structures

• Building Blocks
  – GPU memory addresses
    • Address generation
    • Address use
    • Pointers
  – Multi-dimensional arrays
  – Sparse representations

Page 23

Multi-Dimensional Arrays

• Build Data Structures in 2D Memory
  – Read/write GPU memory is optimized for 2D (images)

• But Isn't Physical Memory 1D?
  – GPU memory hierarchy is optimized to capture 2D locality
    • Rasterization
    • Texture filtering
    • Igehy, Eldridge, Proudfoot, "Prefetching in a Texture Cache Architecture," Graphics Hardware 1998

• Conclusion: use the illusion of 2D physical memory

Page 24

GPU Arrays

• Large 1D Arrays
  – Current GPUs limit 1D array sizes to 2048 or 4096 elements
  – Pack into 2D memory
  – 1D-to-2D address translation (see the sketch below)
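
A minimal Cg sketch of the 1D-to-2D translation; the function and parameter names are illustrative, not from the slides. The 1D index is split into a row and column of the 2D texture holding the packed array, offset to the texel center, and normalized:

    // Convert a 1D array index into a normalized 2D texture coordinate.
    // texWidth/texHeight describe the 2D texture that stores the packed array.
    float2 addr1Dto2D( float index, float texWidth, float texHeight )
    {
        float row = floor( index / texWidth );
        float col = index - row * texWidth;    // index mod texWidth
        // Offset to the texel center, then normalize to [0,1]
        return float2( (col + 0.5) / texWidth, (row + 0.5) / texHeight );
    }

    // Usage inside a fragment program:
    //   float4 value = tex2D( dataTex, addr1Dto2D( i, 1024.0, 1024.0 ) );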

Page 25

GPU Arrays

• 3D Arrays
  – Problem
    • GPUs do not have 3D frame buffers
    • No RTT to a slice of a 3D texture (except Superbuffers)
  – Solutions
    1. Stack of 2D slices
    2. Multiple slices per 2D buffer (see the addressing sketch below)
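
A minimal Cg sketch of the addressing for solution 2; the names are illustrative, not from the slides. The Z slices of the volume are tiled left to right, top to bottom in one large 2D texture, and a voxel coordinate is translated into a 2D texture coordinate:

    // Address voxel (x,y,z) of an sx*sy*sz volume whose Z slices are tiled
    // into one large 2D texture.
    float2 addr3Dto2D( float3 voxel,        // integer voxel coordinate (x,y,z)
                       float3 volSize,      // (sx, sy, sz)
                       float  tilesPerRow,  // number of slices per row of tiles
                       float2 texSize )     // size of the flattened 2D texture
    {
        float tileRow = floor( voxel.z / tilesPerRow );
        float tileCol = voxel.z - tileRow * tilesPerRow;
        float2 texel  = float2( tileCol, tileRow ) * volSize.xy + voxel.xy + 0.5;
        return texel / texSize;              // normalized 2D texture coordinate
    }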

Page 26

GPU Arrays

• Problems With 3D Arrays for GPGPU
  – Cannot read a stack of 2D slices as a 3D texture
  – Must know which slices are needed in advance
  – Visualization of 3D data is difficult

• Solutions
  – Need render-to-slice-of-3D-texture (Superbuffers)
  – Volume rendering of slice-based 3D data
    • Course 28, "Real-Time Volume Graphics," SIGGRAPH 2004

Page 27

GPU Arrays

• Higher-Dimensional Arrays
  – Pack into 2D buffers
  – N-D-to-2D address translation
  – Same problems as 3D arrays if the data does not fit in a single 2D texture

• Conclusions
  – The fundamental GPU memory primitive is a fixed-size 2D array
  – GPGPU needs a more general memory model

Page 28

GPU-Based Data Structures

• Building Blocks
  – GPU memory addresses
    • Address generation
    • Address use
    • Pointers
  – Multi-dimensional arrays
  – Sparse representations

Page 29

Sparse Data Structures

• Why Sparse Data Structures?
  – Reduce computational workload
  – Reduce memory pressure

• Examples
  – Sparse matrices
    • Krueger et al., SIGGRAPH 2003
    • Bolz et al., SIGGRAPH 2003
  – Implicit surface computations (sparse volumes)
    • Sherbondy et al., IEEE Visualization 2003
    • Lefohn et al., IEEE Visualization 2003
    • Premoze et al., Eurographics 2003

Page 30

Sparse Computation

• Option 1: Store Complete Data Set on GPU
  – Cull unused data
  – Conditional execution tricks (discussed earlier)

• Option 2: Store Only Sparse Data on GPU
  – Saves memory
  – Potentially much faster than culling
  – Much more complicated (especially if time-varying)

Page 31

Sparse Data Structures

• Basic Idea
  – Pack "active" data elements into GPU memory
  – For more information:
    • Linear algebra section in this course: static structures
    • Level-set case study in this course: dynamic structures

Page 32

Sparse Data Structures

• Addressing Sparse Data
  – Neighborhoods are no longer implicitly defined on a grid
  – Use pointer-based data structures to locate neighbors (see the sketch below)
    • Pre-compute neighbor addresses if possible
      – Use the CPU or vertex processor
      – Removes pointer dereference from the fragment program
  – Separate the common addressing case from boundary conditions
    • Common case must be cache coherent
    • See the Harris and Lefohn case studies for the "substream" technique
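
A minimal Cg sketch of the pointer-based neighbor lookup; the texture names are illustrative, not from the slides. A precomputed address texture stores, for each packed element, the 2D address of one of its neighbors, and a single dependent read dereferences it:

    // One level of indirection: fetch the precomputed neighbor address,
    // then read the neighbor's value from the packed data texture.
    float4 sparseNeighbor( float2 texCoord : TEXCOORD0,
                           uniform sampler2D neighborAddrTex,
                           uniform sampler2D dataTex ) : COLOR
    {
        float2 nbrAddr = tex2D( neighborAddrTex, texCoord ).xy;  // pointer
        return tex2D( dataTex, nbrAddr );                        // dereference
    }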

Page 33

Overview

• GPU Memory Model

• GPU-Based Data Structures

• Performance Considerations

Page 34

Memory Performance Issues

• Pbuffer Survival Guide

• Dependent Texture Costs

• Computational Frequency

Page 35

Pbuffer Survival Guide

• Pbuffers Give Us Render-To-Texture
  – Designed to create an environment map or two
  – Never intended to be used for GPGPU (100s of pbuffers)
  – Problem
    • Each pbuffer has its own OpenGL render context
    • Each pbuffer may have a depth and/or stencil buffer
    • Changing OpenGL contexts is slow
  – Solution
    • Many optimizations to avoid this bottleneck…

Page 36

Pbuffer Survival Guide

1. Pack Scalar Data Into RGBA (see the sketch below)
   – >4x memory savings
   – 4x reduction in context switches
   – Be careful of the read-modify-write hazard

[Figure: scalar data in four RGBA pbuffers packed into a single RGBA pbuffer]
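
A minimal Cg sketch of operating on packed data; the update rule is only a stand-in, not from the slides. Because four scalar elements live in the RGBA channels of one texel, each fragment program invocation reads and updates four elements at once:

    // Four scalar elements are packed into one RGBA texel; one fragment
    // updates all four channels in a single pass.
    float4 updatePacked( float2 texCoord : TEXCOORD0,
                         uniform sampler2D packedTex,
                         uniform float scale,
                         uniform float bias ) : COLOR
    {
        float4 fourElements = tex2D( packedTex, texCoord );
        return fourElements * scale + bias;   // applied to all four channels
    }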

Page 37

Pbuffer Survival Guide

2. Use Multi-Surface Pbuffers
   – Each RGBA surface is its own render-texture
     • Front, Back, AuxN (N = 0, 1, 2, …)
   – Greatly reduces context switches
   – Technically illegal, but "blessed" by ATI; works on NVIDIA

[Figure: five pbuffers with one RGBA surface each versus one pbuffer with five RGBA surfaces]

Page 38

Pbuffer Survival Guide

2. Using Multi-Surface Pbuffers
   a) Allocate a double-buffered pbuffer (and/or with AUX buffers)
   b) Set the render target to the back buffer:
      glDrawBuffer(GL_BACK)
   c) Bind the front buffer as a texture:
      wglBindTexImageARB(hpbuffer, WGL_FRONT_ARB)
   d) Render
   e) Switch buffers:
      wglReleaseTexImageARB(hpbuffer, WGL_FRONT_ARB)
      glDrawBuffer(GL_FRONT)
      wglBindTexImageARB(hpbuffer, WGL_BACK_ARB)

Page 39

Pbuffer Survival Guide

3. Pack 2D Domains Into a Large Buffer
   – "Flat 3D textures"
   – Be careful of the read-modify-write hazard

[Figure: a 3D volume and its flattened 2D layout]

Page 40

Dependent Texture Costs

• Cache Coherency
  – Dependent reads are fast if they hit the cache
    • Even chained dependencies can be the same speed as non-dependent reads
  – Very slow if out of cache
    • Example: 3 levels of dependent cache misses can be >10x slower
  – More detail in "GPU Computation Strategies and Tricks"

Page 41

Computational Frequency

• Compute Memory Addresses at Low Frequency
  – Compute memory addresses in the vertex program (see the sketch below)
    • Let rasterizer interpolation create per-fragment addresses
    • Compute neighbor addresses this way
  – Avoid fragment-level address computation whenever possible
    • Consumes fragment instructions
    • Computation is often redundant with neighboring fragments
    • May defeat texture pre-fetch
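
A minimal Cg sketch of moving the address math to the vertex processor; the structure and parameter names are illustrative, not from the slides. The vertex program writes the element's own texture coordinate and its left/right neighbor coordinates to interpolants, so the rasterizer produces per-fragment addresses at no fragment-instruction cost:

    struct VertOut
    {
        float4 hpos     : POSITION;
        float2 center   : TEXCOORD0;   // this element's address
        float2 leftNbr  : TEXCOORD1;   // address of the -x neighbor
        float2 rightNbr : TEXCOORD2;   // address of the +x neighbor
    };

    VertOut addressVP( float4 position : POSITION,
                       float2 texCoord : TEXCOORD0,
                       uniform float4x4 modelViewProj,
                       uniform float2   texelSize )   // 1 / texture dimensions
    {
        VertOut OUT;
        OUT.hpos     = mul( modelViewProj, position );
        OUT.center   = texCoord;
        OUT.leftNbr  = texCoord - float2( texelSize.x, 0.0 );
        OUT.rightNbr = texCoord + float2( texelSize.x, 0.0 );
        return OUT;
    }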

Page 42

Conclusions

• GPU Memory Model Evolving
  – Writable GPU memory forms a loop-back in an otherwise feed-forward streaming pipeline
  – The memory model will continue to evolve as GPUs become more general stream processors

• GPGPU Data Structures
  – The basic memory primitive is a limited-size 2D texture
  – Use address translation to fit all array dimensions into 2D
  – Maintain 2D cache locality

• Render-To-Texture
  – Use pbuffers with care and eagerly adopt their successor

Page 43

Selected References

• J. Bolz, I. Farmer, E. Grinspun, P. Schröder, "Sparse Matrix Solvers on the GPU: Conjugate Gradients and Multigrid," SIGGRAPH 2003

• N. Goodnight, C. Woolley, G. Lewin, D. Luebke, G. Humphreys, “A Multigrid Solver for Boundary Value Problems Using Programmable Graphics Hardware,” Graphics Hardware 2003

• M. Harris, W. Baxter, T. Scheuermann, A. Lastra, "Simulation of Cloud Dynamics on Graphics Hardware," Graphics Hardware 2003

• H. Igehy, M. Eldridge, K. Proudfoot, “Prefetching in a Texture Cache Architecture,” Graphics Hardware 1998

• J. Krueger, R. Westermann, “Linear Algebra Operators for GPU Implementation of Numerical Algorithms,” SIGGRAPH 2003

• A. Lefohn, J. Kniss, C. Hansen, R. Whitaker, "A Streaming Narrow-Band Algorithm: Interactive Computation and Visualization of Level Sets," IEEE Transactions on Visualization and Computer Graphics 2004

Page 44

Selected References

• A. Lefohn, J. Kniss, C. Hansen, R. Whitaker, “Interactive Deformation and Visualization of Level Set Surfaces Using Graphics Hardware,” IEEE Visualization 2003

• W. Mark, K. Proudfoot, “The F-Buffer: A Rasterization-Order FIFO Buffer for Multi-Pass Rendering,” Graphics Hardware 2001

• T. Purcell, C. Donner, M. Cammarano, H. W. Jensen, P. Hanrahan, “Photon Mapping on Programmable Graphics Hardware,” Graphics Hardware 2003

• A. Sherbondy, M. Houston, S. Napel, “Fast Volume Segmentation With Simultaneous Visualization Using Programmable Graphics Hardware,” IEEE Visualization 2003

Page 45

OpenGL References

• GL_EXT_pixel_buffer_object
  http://www.nvidia.com/dev_content/nvopenglspecs/GL_EXT_pixel_buffer_object.txt

• GL_EXT_render_target
  http://www.opengl.org/resources/features/GL_EXT_render_target.txt

• OpenGL Extension Registry
  http://oss.sgi.com/projects/ogl-sample/registry/

• Superbuffers
  http://www.ati.com/developer/gdc/SuperBuffers.pdf

• WGL_ARB_render_texture
  http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_render_texture.txt
  http://oss.sgi.com/projects/ogl-sample/registry/ARB/wgl_pbuffer.txt

Page 46

Questions?

• Acknowledgements
  – Cass Everitt, Craig Kolb, Chris Seitz, and Jeff Juliano at NVIDIA
  – Mark Segal, Rob Mace, and Evan Hart at ATI
  – GPGPU SIGGRAPH 2004 course presenters
  – Joe Kniss and Ross Whitaker
  – Brian Budge
  – John Owens
  – National Science Foundation Graduate Fellowship
  – Pixar Animation Studios