embarrassingly parallel computation for occlusion culling

Embarrassingly Parallel Computation for Visibility Jasin Bushnaief Umbra Software

Upload: jasinb

Post on 12-Dec-2014


DESCRIPTION

One of the key challenges of modern 3D game rendering engines powering the next generation of console games is to minimize resources spent on assets that do not actually contribute to the user experience. More specifically, determining which surfaces are hidden behind (occluded by) other surfaces can be a very hard problem to solve in real time, but solving it typically yields significant performance gains. Real-time occlusion culling typically requires either a vast amount of manual labor or a computationally intensive pre-processing step. In this talk, I will show how the occluder generation step can actually be considered embarrassingly parallel, and distributed across multiple nodes accordingly. I will also discuss how this model can be further improved.

TRANSCRIPT

Page 1: Embarrassingly Parallel Computation for Occlusion Culling

Embarrassingly Parallel Computation for Visibility

Jasin Bushnaief, Umbra Software

Page 2

Who are we?

• The only occlusion culling middleware company in the world

• Founded in 2006

• Based in Helsinki

• 12 people

• Customers: Bungie (Halo), Guerrilla (Killzone), Remedy (Alan Wake), Bioware (Mass Effect), CD Projekt (Witcher), ArenaNet (Guild Wars) and many more

Page 3

We’re going to talk about

• The past
  – Brief introduction to occlusion culling
  – Traditional methods of visibility computation

• The present
  – Umbra’s visibility computation algorithm
  – How it can be distributed

• The future
  – Challenges of modern games and engines

Page 4

The Past: SO, WHAT’S OCCLUSION CULLING ANYWAY?

Page 5

Graphics in games

• Game development process:
  – Artists create content
  – Engine runtime renders it

• Rendering
  – Content consists of objects
  – Which consist of triangles
  – Which get rendered by the GPU

• Our business: rendering optimization

Page 6

Occlusion culling explained

• “Culling is the process of removing breeding animals from a group based on specific criteria.” (Wikipedia)

• Hidden surface removal: “Which surfaces do not contribute to the final rendered image on the screen?”

• Some popular HSR methods:
  – Frustum culling
  – Backface culling
  – Occlusion culling

Page 7

Occlusion culling explained

• Occlusion culling: “Which surfaces are blocked (occluded) by other surfaces?”

• Depth buffering is one way to do OC
  – Very accurate (i.e. pixel level)
  – Ubiquitous on hardware, easy problem to solve
  – Occurs very late in the pipeline
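The depth-buffering points above can be illustrated with a minimal sketch. It assumes fragments arrive as hypothetical `(x, y, z, color)` tuples and that a smaller `z` means closer to the camera; neither detail comes from the talk. Note that every fragment is still processed before it can be rejected, which is why the test counts as happening late in the pipeline.

```python
def rasterize(fragments, width, height):
    """Keep, per pixel, the color of the fragment closest to the camera."""
    depth = [[float("inf")] * width for _ in range(height)]
    color = [[None] * width for _ in range(height)]
    for x, y, z, c in fragments:
        if z < depth[y][x]:  # depth test: the nearer fragment wins
            depth[y][x] = z
            color[y][x] = c
    return color


# Hypothetical fragments: the crate at pixel (0, 0) hides the wall behind it.
frags = [(0, 0, 5.0, "wall"), (0, 0, 2.0, "crate"), (1, 0, 3.0, "wall")]
image = rasterize(frags, width=2, height=1)
print(image)  # [['crate', 'wall']]
```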

Page 8

Occlusion culling explained

• Higher-level methods complement depth buffering nicely

• These cull entire objects, groups of objects or entire sections of the scene
  – Not easy!

• The earlier, the better

Page 9

Occlusion culling

Only the objects visible to the camera are rendered

Page 10

“Traditional” way to do OC

• Preprocess:
  – Divide scene into cells
  – Compute visibility between cells
    • Results in a visibility matrix (PVS)

• Runtime:
  – Locate the camera
  – Do a lookup into the PVS matrix

Page 11

Simple example

Page 12

Split scene into cells

[Figure: the example scene divided into cells A, B, C and D]

Page 13

Compute visibility (sampling)

     A  B  C  D
A    1  1  1  0
B
C
D

[Figure: scene cells A, B, C and D]

Page 14

Compute visibility

     A  B  C  D
A    1  1  1  0
B    1  1  0  1
C
D

[Figure: scene cells A, B, C and D]

Page 15

Compute visibility

     A  B  C  D
A    1  1  1  0
B    1  1  0  1
C    1  0  1  1
D

[Figure: scene cells A, B, C and D]

Page 16

Compute visibility

     A  B  C  D
A    1  1  1  0
B    1  1  0  1
C    1  0  1  1
D    0  1  1  1

[Figure: scene cells A, B, C and D]

Page 17

Runtime PVS culling

     A  B  C  D
A    1  1  1  0
B    1  1  0  1
C    1  0  1  1
D    0  1  1  1

[Figure: scene cells A, B, C and D]
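The runtime lookup can be sketched with the matrix from the example. The function name `visible_cells` is made up for illustration, and the camera's cell is assumed to be already known (the "locate the camera" step is not shown).

```python
# Visibility matrix (PVS) from the example: row = camera cell,
# column = target cell, 1 = potentially visible.
CELLS = ["A", "B", "C", "D"]
PVS = [
    [1, 1, 1, 0],  # from A
    [1, 1, 0, 1],  # from B
    [1, 0, 1, 1],  # from C
    [0, 1, 1, 1],  # from D
]


def visible_cells(camera_cell):
    """Return the cells potentially visible from the camera's cell."""
    row = PVS[CELLS.index(camera_cell)]
    return [cell for cell, flag in zip(CELLS, row) if flag]


print(visible_cells("C"))  # ['A', 'C', 'D']
```

The lookup itself is a single row read, which is what makes the runtime side of a PVS cheap; all of the cost sits in filling the matrix.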

Page 18

Problem?

• Solving visibility between cells is very difficult
  – E.g. solving analytically is actually O(n^4)

• Global operation by nature

• Doesn’t play well with dynamic scenes
  – Worst case: a change in one cell requires recomputation of the entire matrix

Page 19

The Present: UMBRA DOES IT BETTER

Page 20

Welcome to the 2010s

• Modern game worlds are huge

• So it’d be cool if you didn’t need the entire scene in memory, ever

• It’d be even cooler if the heavy lifting could be distributed. Or sent to the Cloud™

• Buildings collapse. Things change.

Page 21

The Umbra approach

• Don’t actually compute visibility for the entire scene

• Instead, process geometry to create a data structure to solve visibility in the runtime

• Portal culling in the runtime

Page 22

Data generation

• Data = portal graph

• Generate local graphs individually for reasonably sized geometry chunks (tiles), in parallel

• Combine the results into a global portal graph that can be quickly traversed

• Solve visibility quickly in the runtime using this graph

Page 23

Will this work?

• Portal generation
  – Is very hard, but possible to do automatically
  – Only local geometry needed
  → Pretty much an embarrassingly parallel problem

• Runtime
  – Not as simple as a PVS lookup, but still quite fast

Page 24

Simple example revisited

Page 25

Split geometry into tiles
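One simple way to split geometry into tiles is a uniform grid. The sketch below is only an illustration under that assumption: the talk does not say how tiles are chosen, and `TILE_SIZE`, `split_into_tiles` and the scene contents are hypothetical.

```python
from collections import defaultdict

TILE_SIZE = 100.0  # hypothetical tile edge length in world units


def split_into_tiles(objects, tile_size=TILE_SIZE):
    """Group objects by the grid tile that contains their position."""
    tiles = defaultdict(list)
    for name, (x, y) in objects.items():
        key = (int(x // tile_size), int(y // tile_size))
        tiles[key].append(name)
    return dict(tiles)


# Hypothetical scene: object name -> 2D position.
scene = {
    "house": (10.0, 20.0),
    "tree": (150.0, 30.0),
    "wall": (90.0, 99.0),
    "tower": (120.0, 180.0),
}
tiles = split_into_tiles(scene)
print(tiles)  # {(0, 0): ['house', 'wall'], (1, 0): ['tree'], (1, 1): ['tower']}
```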

Page 26

Dispatch tiles to worker nodes

[Figure: Tile 0, Tile 1, Tile 2, Tile 3]

Page 27

Generate portals

[Figure: Tile 0, Tile 1, Tile 2, Tile 3]

Page 28

Combine portal graph

Page 29

Runtime query: traverse portals
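A portal traversal can be sketched as a graph search that only steps through portals currently in view. This is a simplified illustration, not Umbra's runtime: a real portal culler narrows the view volume at each portal, which is replaced here by a hypothetical `portal_in_view` callback. The graph, cell names and portal names are invented for the example.

```python
from collections import deque

# Hypothetical portal graph: cell -> list of (portal_name, neighbor_cell).
PORTAL_GRAPH = {
    "hall": [("door_1", "kitchen"), ("door_2", "yard")],
    "kitchen": [("door_1", "hall")],
    "yard": [("door_2", "hall"), ("gate", "street")],
    "street": [("gate", "yard")],
}


def traverse_portals(graph, camera_cell, portal_in_view):
    """Breadth-first search that only passes through portals in view."""
    visible = {camera_cell}
    queue = deque([camera_cell])
    while queue:
        cell = queue.popleft()
        for portal, neighbor in graph[cell]:
            if neighbor not in visible and portal_in_view(portal):
                visible.add(neighbor)
                queue.append(neighbor)
    return visible


# With door_2 out of view, nothing beyond the hall and kitchen is reached.
print(sorted(traverse_portals(PORTAL_GRAPH, "hall", lambda p: p != "door_2")))
# ['hall', 'kitchen']
```

Unlike the PVS lookup, the cost here depends on how many cells are actually reachable from the camera, which matches the "not as simple as a PVS lookup, but still quite fast" trade-off.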

Page 30

What did we do here?

• Essentially a map-reduce
  – Split scene into distributable tiles
  – Generate local portal graph for each tile
  – Combine results, link global portal graph

[Diagram: Scene → Map → Tile 0, Tile 1, ..., Tile n → Portals 0, Portals 1, ..., Portals n → Reduce → Global portal graph → Query → Visible objects, with a region labeled Runtime]
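The map-reduce shape of the pipeline can be sketched as follows. Everything here is a stand-in: `generate_portals` returns a trivial edge list instead of analyzing geometry, a thread pool plays the role of the worker nodes, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def generate_portals(tile):
    """Map step (stand-in): build a local portal graph for one tile.

    A tile is (tile_id, neighbor_ids) and the "local graph" is just the
    list of edges leaving the tile. Real portal generation would analyze
    the tile's geometry instead.
    """
    tile_id, neighbor_ids = tile
    return [(tile_id, n) for n in neighbor_ids]


def combine(local_graphs):
    """Reduce step: link the local graphs into one global adjacency map."""
    global_graph = {}
    for edges in local_graphs:
        for src, dst in edges:
            global_graph.setdefault(src, set()).add(dst)
            global_graph.setdefault(dst, set()).add(src)
    return global_graph


def build_global_graph(tiles, workers=4):
    # Each tile is processed independently, so the map step needs no
    # communication between workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_graphs = list(pool.map(generate_portals, tiles))
    return combine(local_graphs)


tiles = [(0, [1]), (1, [0, 2]), (2, [1, 3]), (3, [2])]
graph = build_global_graph(tiles)
print(graph)  # {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
```

Because the map step touches only one tile at a time, swapping the thread pool for separate machines changes the dispatch mechanism but not the structure.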

Page 31

The Future: THE NEXT GENERATION

Page 32

Turns out...

• Even the initial “map” is too much for large game worlds

• A global graph of a vast world is too expensive in the runtime

• You need to support multiple versions of some chunks for dynamic content
  – Quite a combinatorial problem

→ Next-gen games require an even better solution!
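To see why multiple chunk versions become a combinatorial problem, consider a hypothetical world where three chunks each have a few states. If a separate global graph were built for every combination of states, the count would be the product of the per-chunk version counts. The chunk names and states below are invented for illustration.

```python
from itertools import product

# Hypothetical dynamic chunks and the versions each one can be in.
versions = {
    "bridge": ["intact", "collapsed"],
    "tower": ["intact", "damaged", "rubble"],
    "gate": ["open", "closed"],
}

# One global graph per combination of chunk versions.
combinations = list(product(*versions.values()))
print(len(combinations))  # 12
```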

Page 33

So we did something like this

[Diagram: Tile 0, Tile 1, Tile 2, Tile 3, ..., Tile n → Portals 0, Portals 1, Portals 2, Portals 3, ..., Portals n; Combine → Graph A → Query → Visible objects; Combine → Graph B → Query → Visible objects; with a region labeled Runtime]

Page 34

Got rid of “map”

[Diagram: Tile 0, Tile 1, Tile 2, Tile 3, ..., Tile n → Portals 0, Portals 1, Portals 2, Portals 3, ..., Portals n; Combine → Graph A → Query → Visible objects; Combine → Graph B → Query → Visible objects; with a region labeled Runtime]

Page 35

Split up “reduce”, moved to runtime

[Diagram: Tile 0, Tile 1, Tile 2, Tile 3, ..., Tile n → Portals 0, Portals 1, Portals 2, Portals 3, ..., Portals n; Combine → Graph A → Query → Visible objects; Combine → Graph B → Query → Visible objects; with a region labeled Runtime]
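Splitting up the "reduce" can be sketched as combining only a chosen subset of per-tile results on demand, giving several smaller graphs instead of one global graph. This is a guess at the general shape rather than Umbra's implementation: the `PORTALS` table, the `combine` function and the choice of which tiles go into Graph A and Graph B are all assumptions.

```python
# Hypothetical precomputed per-tile results: tile id -> ids of the tiles
# reachable through that tile's portals.
PORTALS = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}


def combine(portals, tile_ids):
    """Build a graph from only the requested tiles.

    Portals leading into tiles that are not part of the request are dropped.
    """
    loaded = set(tile_ids)
    return {t: {n for n in portals[t] if n in loaded} for t in loaded}


graph_a = combine(PORTALS, [0, 1])
graph_b = combine(PORTALS, [2, 3])
print(graph_a)  # {0: {1}, 1: {0}}
print(graph_b)  # {2: {3}, 3: {2}}
```

Each graph can then be queried on its own, and a changed tile only requires recombining the graphs that include it.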

Page 36

Questions?

[email protected]