Fragment-Parallel Composite and Filter Anjul Patney, Stanley Tzeng, and John D. Owens University of California, Davis

Post on 19-Dec-2015

Fragment-Parallel Composite and Filter

Anjul Patney, Stanley Tzeng, and John D. Owens
University of California, Davis

Parallelism in Interactive Graphics

• Well-expressed in hardware as well as APIs

• Consistently growing in degree & expression
– More and more cores on upcoming GPUs
– From programmable shaders to pipelines

• We should rethink algorithms to exploit this

• This paper provides one example
– Parallelization of composite/filter stages

A Feed-Forward Rendering Pipeline

[Figure: Primitives → Geometry Processing → Rasterization → Composite → Filter → Pixels]

Composite & Filter

• Input
– Unordered list of fragments

• Output
– Pixel colors

• Assumption
– No fragments are discarded

[Figure: a pixel and its sample locations]

Basic Idea

[Figure: Pixel-Parallel assignment of work to processors]

Basic Idea

[Figure labels: Insufficient parallelism; Irregularity; Fragment-Parallel; Processors]

Motivation

• Most applications have low depth complexity
– Pixel-level parallelism is sufficient

• We are interested in applications with
– Very high depth complexity
– High variation in depth complexity

• Further
– Future platforms will demand more parallelism
– High depth complexity can limit pixel-parallelism

Motivation

[Figure: Distribution of Depth Complexity. Horizontal axis: number of depth layers; vertical axis: number of subpixels (logarithmic scale)]

Related Work

Order-Independent Transparency (OIT)

• Depth-Peeling [Everitt 01]

– One pass per transparent layer

• Stencil-Routed A-buffer [Myers & Bavoil 07]

– One pass per 8 depth layers¹

• Bucket Depth-Peeling [Liu et al. 09]

– One pass per up to 32 layers²

¹ Maximum MSAA samples per pixel
² Maximum render targets

Related Work

Order-Independent Transparency (OIT)

• OIT using Direct3D 11 [Gruen et al. 10]

– Use fragment linked-lists
– Per-pixel sort and composite

• Hair Self-Shadowing [Sintorn et al. 09]

– Each fragment computes its contribution
– Assumes constant opacity

Related Work

Programmable Rendering Pipelines

• RenderAnts [Zhou et al. 09]

– Sort fragments globally
– Per-pixel composite/filter

• FreePipe [Liu et al. 10]

– Sort fragments globally
– Per-pixel composite/filter

Pixel-Parallel Formulation

[Figure: pixels Pi, P(i+1), P(i+2) span subsamples Sj to S(j+6); thread IDs j to (j+6), one thread per subsample]

P: Pixel, S: Subsample

Fragment-Parallel Formulation

[Figure: pixels Pi, P(i+1), P(i+2) span subsamples Sj to S(j+6); thread IDs j to (j+23), one thread per fragment]

P: Pixel, S: Subsample

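Fragment-Parallel Formulation

• How can this behavior be achieved?

• Revisit the composite equation

$$C_s = \underbrace{\alpha_1 C_1}_{\text{fragment 1}} + (1-\alpha_1)\Big\{\underbrace{\alpha_2 C_2}_{\text{fragment 2}} + (1-\alpha_2)\big(\cdots(\alpha_N C_N + (1-\alpha_N)\underbrace{C_B}_{\text{background}})\cdots\big)\Big\}$$

$$\begin{aligned}
C_s ={}& 1\cdot\alpha_1 C_1 + (1-\alpha_1)\,\alpha_2 C_2 + (1-\alpha_1)(1-\alpha_2)\,\alpha_3 C_3 + \cdots \\
&+ (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1})\,\alpha_k C_k + \cdots \\
&+ (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_N)\,C_B
\end{aligned}$$

In the k-th term:
– Global Contribution $G_k = (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1})$
– Local Contribution $L_k = \alpha_k C_k$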

Fragment-Parallel Formulation

• $L_k$ is trivially parallel (local computation)

• $G_k$ is the result of a scan operation (product)

• For the list of input fragments
– Compute G[ ] and L[ ], multiply
– Perform reduction to add subpixel contributions

$$C_s = G_1 L_1 + G_2 L_2 + G_3 L_3 + \cdots + G_N L_N$$

$$G_k = (1-\alpha_1)(1-\alpha_2)\cdots(1-\alpha_{k-1}), \qquad L_k = \alpha_k C_k$$
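
The scan formulation can be checked against the nested composite equation with a small sequential sketch. This is an illustration only, not the authors' CUDA code: the function names are hypothetical, colors are single scalar channels, fragments are assumed to be already sorted front to back, and the background is added explicitly with the weight from the last term of the expanded equation.

```python
from itertools import accumulate
import operator


def composite_nested(alphas, colors, background):
    """Reference: evaluate the nested composite equation back to front."""
    result = background
    for a, c in zip(reversed(alphas), reversed(colors)):
        result = a * c + (1.0 - a) * result
    return result


def composite_scan(alphas, colors, background):
    """Same value via a product scan (G_k) and local terms (L_k)."""
    transmittance = [1.0 - a for a in alphas]
    # Product scan seeded with 1: yields G_1 .. G_N plus the background weight.
    G = list(accumulate([1.0] + transmittance, operator.mul))
    # Local contributions need no communication between fragments.
    L = [a * c for a, c in zip(alphas, colors)]
    # Reduction over G_k * L_k; zip stops after the N fragment terms.
    return sum(g * l for g, l in zip(G, L)) + G[-1] * background
```

For example, with opacities `[0.5, 0.25, 1.0]`, colors `[1.0, 0.0, 0.5]`, and background `0.2`, both functions return `0.6875`. On a GPU the `accumulate` call would be replaced by a parallel scan and the `sum` by a parallel reduction.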

Fragment-Parallel Formulation

• Filter, for every pixel:

$$C_p = C_{s_1}\kappa_1 + C_{s_2}\kappa_2 + \cdots + C_{s_M}\kappa_M$$

• This can be expressed as another reduction
– After multiplying with subpixel weights $\kappa_m$
– Can be merged with previous reduction
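
Why the two reductions can be merged is easiest to see on a small example. The sketch below is an assumption-laden illustration (hypothetical function names, scalar colors): `subsample_terms[m]` holds the already-multiplied products G_k·L_k for subsample m, and `weights[m]` is the filter weight κ_m.

```python
def filter_two_stage(subsample_terms, weights):
    """Reduce each subsample to C_s first, then weight and reduce to C_p."""
    subsample_colors = [sum(terms) for terms in subsample_terms]
    return sum(k * c for k, c in zip(weights, subsample_colors))


def filter_merged(subsample_terms, weights):
    """Premultiply every fragment term by kappa_m, then reduce once per pixel."""
    weighted = [k * t
                for k, terms in zip(weights, subsample_terms)
                for t in terms]
    return sum(weighted)
```

Both functions give the same pixel color up to floating-point rounding, because the weight κ_m distributes over the sum of fragment terms in subsample m.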

Fragment-Parallel Composite & Filter

Final Algorithm

1. Two-key sort (Subpixel ID, depth)

2. Segmented Scan (obtain Gk)

3. Premultiply with weights (Lk, κm)

4. Segmented Reduction
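
The four steps can be sketched end to end on the CPU. This is a sequential stand-in under stated assumptions, not the CUDPP-based implementation: the `Fragment` record and function name are hypothetical, colors are scalars, subpixel IDs are global (pixel index times subpixels per pixel, plus the sample index), and the background term is omitted.

```python
from collections import namedtuple

# Hypothetical fragment record: global subpixel ID, depth, opacity, scalar color.
Fragment = namedtuple("Fragment", "subpixel_id depth alpha color")


def composite_and_filter(fragments, weights):
    """Return {pixel index: color} from an unordered fragment list."""
    m = len(weights)  # subpixels per pixel

    # 1. Two-key sort: subpixel ID first, then depth (nearest fragment first).
    frags = sorted(fragments, key=lambda f: (f.subpixel_id, f.depth))

    # 2. Segmented product scan of (1 - alpha); one segment per subpixel.
    G = []
    running = 1.0
    for i, f in enumerate(frags):
        if i == 0 or frags[i - 1].subpixel_id != f.subpixel_id:
            running = 1.0  # segment head: restart the scan
        G.append(running)
        running *= 1.0 - f.alpha

    # 3. Premultiply with the local term L_k and the filter weight kappa_m.
    terms = [g * f.alpha * f.color * weights[f.subpixel_id % m]
             for g, f in zip(G, frags)]

    # 4. Segmented sum reduction; one segment per pixel.
    pixels = {}
    for f, t in zip(frags, terms):
        pixel = f.subpixel_id // m
        pixels[pixel] = pixels.get(pixel, 0.0) + t
    return pixels
```

The loops in steps 2 and 4 only mimic what a data-parallel segmented scan and segmented reduction would compute; every fragment's term in step 3 is independent, which is what exposes fragment-level parallelism.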

Fragment-Parallel Formulation

[Figure: fragments of pixels Pi, P(i+1), P(i+2) processed by a Segmented Scan (product), then a Segmented Reduction (sum)]

P: Pixel, S: Subsample

Implementation

• Hardware used: NVIDIA GeForce GTX 280

• We require fast Segmented Scan and Reduce
– CUDPP library provides that
– Restricts implementation to NVIDIA CUDA

• No direct access to hardware rasterizer
– We wrote our own

Example System – Polygons

• Applications
– Games

• Depth Complexity
– 1 to few tens of layers
– Suited to pixel-parallel

• Fragment-parallel software rasterizer

Example System – Particles

• Applications
– Simulations, games

• Depth Complexity
– Hundreds of layers
– High depth-variance

• Particle-parallel sprite rasterizer

Example System – Volumes

• Applications
– Scientific Visualization

• Depth Complexity
– Tens to Hundreds of layers
– Low depth-variance

• Major-axis-slice rasterizer

Example System – Reyes

• Applications
– Offline rendering

• Depth Complexity
– Tens of layers
– Moderate depth variance

• Data-parallel micropolygon rasterizer

Performance Results

[Figure: bar chart of Rendering Time (ms) for the Particles, Volume, Reyes (grass), and Polygon scenes. Series: Fragment Generation, Pixel-Parallel Composite/Filter, Fragment-Parallel Composite/Filter]

Performance Variation

[Figure: Fragments per second (logarithmic scale) versus Depth Complexity. Series: Fragment-Parallel, Pixel-Parallel]

Limitations

• Increased memory traffic
– Several passes through CUDPP primitives

• Unclear how to optimize for special cases
– Threshold opacity
– Threshold depth complexity

Summary and Conclusion

• Parallel formulation of composite equation
– Maps well to known primitives
– Can be integrated with filter
– Consistent performance across varying workloads

• FPC is applicable to future rendering pipelines
– Exploits higher degree of parallelism
– Better related to size of rendering workload

• A tool for building programmable pipelines

Future Work

• Performance
– Reduction in memory traffic
– Extension to special-case scenes
– Hybrid PPC-FPC formulations

• Applications
– Integration with hardware rasterizer
– Cinematic rendering, Photoshop

Acknowledgments

• NSF Award 0541448

• SciDAC Institute for Ultrascale Visualization

• NVIDIA Research Fellowship

• Equipment donated by NVIDIA

• Discussions and Feedback
– Shubho Sengupta (UC Davis), Matt Pharr (Intel), Aaron Lefohn (Intel), Mike Houston (AMD)
– Anonymous reviewers

• Implementation assistance
– Jeff Stuart, Shubho Sengupta

Thanks!