real-time reyes: programmable pipelines and research challenges anjul patney university of...

57
Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Upload: marshall-welch

Post on 17-Dec-2015

219 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Real-Time Reyes: Programmable Pipelines and Research Challenges

Anjul PatneyUniversity of California, Davis

Page 2: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Real-Time Reyes-Style Adaptive Surface Subdivision

Anjul Patney and John D. OwensSIGGRAPH Asia 2008 (to appear)

http://graphics.idav.ucdavis.edu/publications/print_pub?pub_id=952

Page 3: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Cinematic rendering looks goodImage courtesy: Pixar

Page 4: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

High geometric complexityImage courtesy: Pixar

Page 5: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

High shading complexityImage courtesy: Pixar

Page 6: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Motion blur, Depth-of-fieldImage courtesy: Pixar

Page 7: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Complex lightingImage courtesy: Pixar

Page 8: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

All this is slowImage courtesy: Pixar

Page 9: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Our Goal

• Make it faster– As much as possible in real time

• Use the GPU– Massive available parallelism– Fixed-function texture/raster units– High Programmability

Page 10: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Enter Reyes• Developed in 1980s, offline

• Pixel-accurate smooth surfaces

• Eye-space shading

• Stochastic sampling

• Order-independent transparency

Page 11: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Bound/Split

Dice (tessellate)

Displace

Shade

Compose

Sample

Vertex Shade

Geometry Shade

Rasterize

Fragment Shade

Blend

Depth-buffer

Reyes Direct3D 10

Page 12: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Bound/Split

Dice (tessellate)

Displace

Shade

Compose

Sample

• High-quality geometry

• Convenient surface shading

• Cinematic quality

• Well-behaved (?)

Reyes

Page 13: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Bound/Split

Dice (tessellate)

Displace

Shade

Compose

Sample

Reyes

• Input:Smooth surfaces

• Output:0.5 x 0.5 pixel quads (micropolygons)

Page 14: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Outline

• Reyes Subdivision – algorithm

• Subdivision on GPU – implementation

• Results

• Limitations

Page 15: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Outline

• Reyes Subdivision – algorithm

• Subdivision on GPU – implementation

• Results

• Limitations

Page 16: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Split-Dice

• Recursively split a surface till dicing makes sense

• Uniformly sample it to form a grid of micropolygons

Page 17: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Split

Bound/Cull

Dice

Diceable?

No: Split

Yes:Dice

Page 18: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Dice

Grid

Micropolygon

Post-split surface

Page 19: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Split-Dice is hard

• Split–Recursive, serial–Rapid primitive

generation/destruction

• Dice–Huge memory demand

Page 20: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Can we do this in parallel?

Page 21: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Parallel Split

A lot of independent operations

Regular computation

Page 22: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Analogy: A Dynamic work queue

A B C D E F G H I

A A B C EE F H

Split No ActionCull

C

Page 23: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

How can we do these efficiently?

• Creating new primitives– How to dynamically allocate space?

• Culling unneeded primitives– How to avoid fragmentation?

Page 24: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

A child primitive is offset by the queue length

Our Choice – keep it simple…

A B C D E F G H I

A B C E F H A C E

Page 25: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

…and get rid of the holes later

A B C E F H A C E

A B C E F H A C E

Scan-based compact is fast!

(Sengupta ‘07)

Page 26: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Storage Issues for Dice

• Too many micropolygons– Cannot reject early

• Screen-space buckets (tiles)– In parallel ?– Ideal bucket size ?

Page 27: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Outline

• Reyes Subdivision – algorithm

• Subdivision on GPU – implementation

• Results

• Limitations

Page 28: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Platform

• NVIDIA GeForce 8800 GTX– 16 SIMD Multiprocessors– 16KB shared memory– 768 MB memory (no cache)

• NVIDIA CUDA 1.1– Grids/Blocks/Threads– OpenGL interface

Page 29: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Implementation Details

• Input choice: Bicubic Bézier Surfaces– Only affects implementation

• View Dependent Subdivision every frame– Single CPU-GPU transfer

• Final micropolygons written to a VBO– Flat-shaded and displayed

Page 30: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Kernels Implemented

• Dice– Regular, symmetric, parallel– 256 threads per patch– Primitive information in shared memory

• Bound/Split– Hard to ensure efficiency

Page 31: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Bound/Split: Efficiency Goals

• Memory Coherence– Off-chip memory accesses must be

efficient

• Computational Efficiency– Hardware SIMD must be maximally utilized

Page 32: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Memory Coherence during Split

• Compact work-queue after each iteration– Primitives always contiguous in memory

• Structure-Of-Arrays representation– Primitive attributes adjacent in memory

• 99.5% of all accesses fully coalesced

Page 33: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

SIMD utilization during split• Intra-Primitive parallelism

– Independent control points– Negligible divergence

• Vectorized Split– 16 Threads per patch

• 90.16% of all branches SIMD coherent

32 threads (1 warp)

Page 34: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Outline

• Reyes Subdivision – algorithm

• Subdivision on GPU – implementation

• Results

• Limitations

Page 35: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Results - Killeroo

• 14426 grids

• 5 levels of subdivision

• Bound/Split: 6.99 ms• Dice: 7.21ms

• 29.69 frames/second(19.92 with 16x AA)

Killeroo model courtesy: Headus Inc.

Page 36: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Results - Teapot

• 4823 grids

• 11 levels of subdivision

• Bound/Split: 3.46 ms• Dice: 2.42 ms

• 60.07 frames/second(30.02 with 16x AA)

Page 37: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Results – Random scenes

0 5000 10000 15000 200000

5

10

15

20

25

30

Number of grids

Tim

e (

ms)

Subdivision time proportional to number

of micropolygons

Dice

Split

Total

Page 38: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Teapot Killeroo Random Model (30 patches)

0%

25%

50%

75%

100%

Rendering

CUDA-OpenGL trans-fer

Normal generation

Subdivision

Render time BreakdownRendering lots of

small triangles

Should ideally be zero: CUDA limitation

Page 39: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Results - Screen-space buckets

0 128 256 384 5120

10

20

30

40

50

60

70

80

90

100

0

10

20

30

40

50

Subdivision time Memory Usage (max.) Memory Usage (avg.)

Bucket width (pixels)

Su

bd

ivis

ion

Tim

e (

ms)

Mem

ory

Usag

e (

MB

)Acceptable performance,

Small memory footprint

Page 40: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Outline

• Reyes Subdivision – algorithm

• Subdivision on GPU – implementation

• Results

• Limitations

Page 41: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Limitations

• Can’t split and dice in parallel

• Uniform dicing

• Cracks

Page 42: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Limitations – Uniform dicing

Page 43: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Limitations - Cracks

Page 44: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Limitations - Cracks

Page 45: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Conclusions

• Breadth-first recursive subdivision– Suits GPUs– Works fast

• Dicing Programmable tessellation is fast– 500M micropolygons/sec

• It is time to experiment with alternate graphics pipelines

Page 46: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Reyes in real-time rendering

• Visually superior to polygon pipeline

• Regular, highly parallel workload

• Extremely well-studied

• Good candidate for a real-time system?

Page 47: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Future Work

• Cracks, Displacement mapping

• Rest of Reyes– Offline quality shading– Interactive lighting– Parallel Stochastic Sampling (Wei 2008)– A-buffer (Myers 2007)

Page 48: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Thanks to

• Feedback and suggestions– Per Christensen, Charles Loop, Dave Luebke,

Matt Pharr and Daniel Wexler

• Financial support– DOE Early Career PI Award– National Science Foundation– SciDAC Institute for Ultrascale Visualization

• Equipment support from NVIDIA

Page 49: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Teapot VideoReal-time footage

Page 50: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Killeroo VideoReal-time footage

Page 51: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

BACKUP SLIDESReal-Time Reyes-Style Adaptive Surface Subdivision

Page 52: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

CUDA Thread Structure

Image courtesy: NVIDIA CUDA Programming Guide, 1.1

Page 53: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

CUDA Memory Architecture

Image courtesy: NVIDIA CUDA Programming Guide, 1.1

Page 54: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Displacement Mapping

• Fairly simple if cracks can be avoided– Displace adjacent grids together

Page 55: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Shading & Lighting

• Interactive preview– Lpics / Lightspeed

• Shaders– Intelligent textures– File access

• Shadows

Page 56: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Composite/Filter Stuff

• Stochastic sampling– 20x speedup on a GPU (Wei 2008)

• A-buffer

Page 57: Real-Time Reyes: Programmable Pipelines and Research Challenges Anjul Patney University of California, Davis

Special Effects

• Motion Blur and DOF

• Global Illumination

• Ambient Occlusion