Coherent Hierarchical Culling: Hardware Occlusion Queries Made Useful

Jiri Bittner (1), Michael Wimmer (1), Harald Piringer (2), Werner Purgathofer (1)

(1) Vienna University of Technology, (2) VRVis Vienna

Uploaded by brian-williams; posted on 21-Jan-2016



Page 2

Michael Wimmer, Vienna University of Technology

Coherent Hierarchical Culling

Motivation

[Figure: typical hardware occlusion culling scenario. CPU and GPU timelines over time, with the CPU's waiting time marked. Legend: R = render, Q = occlusion query, C = cull.]

Page 3

Occlusion Culling: Offline vs. Online

Offline
Global information about visibility (from region)
- Difficult to implement
- Accuracy and maintenance problems
+ No runtime overhead

Online
Local information about visibility (from point)
+ Easier to implement
+ Greater accuracy, easy maintenance
- Runtime overhead

Page 4

Online Occlusion Culling

Object space methods
- Need complex geometric calculations (hard to handle detailed scenes)
+ Do not require rasterization

Image space methods
+ No geometric calculations (easier to handle detailed scenes)
- Require rasterization

Page 5

Hardware Occlusion Culling

Hardware is good at rasterization!
Hardware counts rasterized fragments, but does not need to update the frame buffer

NV/ARB_occlusion_query
Asynchronous
Allows multiple simultaneous occlusion queries

General algorithm idea: render a simple approximation (bounding box) first
Invisible: cull the object
Visible: render the object
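The bounding-box test can be illustrated in software. Below is a minimal Python sketch, assuming a hypothetical `occlusion_query` helper, a depth buffer stored as a list of rows, and a box that covers a screen-space rectangle at one constant depth (real hardware rasterizes the box faces with interpolated depth). It counts fragments that pass a less-than depth test and, like the hardware query, only reads the depth buffer.

```python
def occlusion_query(depth_buffer, rect, box_depth):
    """Count fragments of a screen-space rectangle (stand-in for a rasterized
    bounding box) that pass a less-than depth test.

    The depth buffer is only read, never written, mirroring a query that
    counts fragments without updating the frame buffer."""
    x0, y0, x1, y1 = rect  # half-open pixel ranges [x0, x1) and [y0, y1)
    passed = 0
    for y in range(y0, y1):
        for x in range(x0, x1):
            if box_depth < depth_buffer[y][x]:
                passed += 1
    return passed


def cull_or_render(depth_buffer, rect, box_depth):
    """General idea: test the cheap approximation first, then decide."""
    return "render" if occlusion_query(depth_buffer, rect, box_depth) > 0 else "cull"


if __name__ == "__main__":
    # 4x4 depth buffer: an occluder at depth 0.2 covers the two left columns.
    depth = [[0.2, 0.2, 1.0, 1.0] for _ in range(4)]
    print(occlusion_query(depth, (1, 0, 4, 4), 0.5))  # 8 fragments pass
    print(cull_or_render(depth, (0, 0, 2, 4), 0.5))   # cull
```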

Page 6

Hardware Occlusion Culling

Advantages
Pixel-exact
No explicit occluder rendering
Exploit the rasterization power of the GPU
Easy to use (API calls)

Problems
Delay in availability of the results
Time to execute queries
If fill-bound: only useful if several objects are culled

Page 7

Hierarchical Stop&Wait (S&W)

Front-to-back hierarchy traversal
1. Issue a visibility query for the node
2. Stop and wait for the result
Invisible: cull the subtree
Visible: render, or continue with step 1 recursively

Advantage: the hierarchy can cull huge subtrees

Problems
Waiting causes CPU stalls and GPU starvation
Huge rasterization costs (especially for large interior nodes)
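For comparison, here is a Python sketch of hierarchical Stop&Wait under strong assumptions: `Node` and `stop_and_wait` are hypothetical names, each node carries a fixed pixel count that stands in for its query result (so rendering does not change later results), and the "query" is answered immediately. In a real renderer step 2 blocks the CPU until the GPU returns the count, so every counted query is a full round trip.

```python
import heapq


class Node:
    """Hypothetical hierarchy node; `pixels` stands in for a query result."""

    def __init__(self, name, dist, pixels=0, children=()):
        self.name, self.dist, self.children = name, dist, list(children)
        # Interior nodes report the sum over their children (bounding-box stand-in).
        self.pixels = sum(c.pixels for c in self.children) if self.children else pixels


def stop_and_wait(root):
    """Front-to-back traversal that waits for every query before continuing."""
    rendered, queries = [], 0
    heap = [(root.dist, id(root), root)]  # id() breaks ties between equal distances
    while heap:
        _, _, node = heapq.heappop(heap)
        queries += 1                # 1. issue visibility query for the node
        if node.pixels == 0:        # 2. stop and wait for the result
            continue                # invisible: cull the whole subtree
        if not node.children:
            rendered.append(node.name)      # visible leaf: render
        else:
            for child in node.children:     # visible interior node: recurse
                heapq.heappush(heap, (child.dist, id(child), child))
    return rendered, queries
```

Every reached node, including each interior node, costs one blocking query; that is the overhead the following slides set out to remove.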

Page 8

[Figure: hierarchical Stop&Wait timeline. Legend: Rx = render object x, Qx = query object x, Cx = cull object x. CPU and GPU timelines over time for the sequence R1, Q2, R2, Q3, C3, Q4, R4, with the waiting time marked as CPU stalls and GPU starvation.]

Page 9

Solution: Coherent Hierarchical Culling

Scheduling based on temporal coherence
Skipping certain visibility tests
Immediate rendering of certain geometry

Clever interleaving of queries and rendering
Maintaining a queue of running occlusion queries

Design goal: easy implementation

Page 10

Coherent Hierarchical Culling (CHC)

[Figure: CHC timeline with the same legend (Rx = render object x, Qx = query object x, Cx = cull object x). CPU and GPU timelines over time for R1, Q2, R2, Q3, C3, Q4, R4; annotations: "visible in previous frame", "assume independent occlusion".]

Page 11

CHC Algorithm Outline

Front-to-back hierarchy traversal
1. Node handling
   Interior node
      Previously invisible: issue visibility query
      Previously visible: continue with step 1 recursively
   Leaf
      Issue visibility query
      Previously visible: render immediately
2. Check availability of query results
   Invisible: propagate visibility change
   Visible: render, or continue with step 1 recursively
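The outline can be sketched in Python. This is a simplified reconstruction from the slide, not the authors' code, and everything specific in it is an assumption for illustration: `Node`, `MockQueries`, and `chc_frame` are hypothetical names; a query "returns" a fixed per-node pixel count instead of rasterizing anything (so rendering does not affect later results); asynchrony is simulated by a result becoming available a fixed number of ticks after it was issued; view-frustum culling is left out. Previously visible leaves are queried and rendered in the same step, previously visible interior nodes are opened without a query, and a visible result pulls visibility up to the ancestors.

```python
import heapq
from collections import deque


class Node:
    """Hypothetical hierarchy node; `pixels` stands in for a query result."""

    def __init__(self, name, dist, pixels=0, children=()):
        self.name, self.dist = name, dist
        self.children = list(children)
        self.parent = None
        for child in self.children:
            child.parent = self
        # Interior nodes report the sum over their children (bounding-box stand-in).
        self.pixels = sum(c.pixels for c in self.children) if self.children else pixels
        self.visible = False    # classification from the last visit
        self.last_visited = -1  # frame in which the node was last visited

    def is_leaf(self):
        return not self.children


class MockQueries:
    """Simulated asynchronous queries: a result is available `latency` ticks
    after the query was issued (hypothetical stand-in for the GPU)."""

    def __init__(self, latency):
        self.latency, self.clock, self.issued = latency, 0, {}

    def issue(self, node):
        self.issued[node] = self.clock

    def available(self, node):
        return self.clock - self.issued[node] >= self.latency

    def result(self, node):
        del self.issued[node]
        return node.pixels

    def tick(self):
        self.clock += 1


def chc_frame(root, frame, queries):
    rendered, stats = [], {"queries": 0, "stalls": 0}
    distance_queue = [(root.dist, id(root), root)]  # front-to-back order
    query_queue = deque()                           # (node, was_visible) pairs

    def traverse(node):
        if node.is_leaf():
            rendered.append(node.name)  # render the leaf
            queries.tick()              # rendering also takes time
        else:
            for child in node.children:
                heapq.heappush(distance_queue, (child.dist, id(child), child))

    def pull_up(node):
        # Mark the node and all not-yet-visible ancestors as visible.
        while node is not None and not node.visible:
            node.visible = True
            node = node.parent

    while distance_queue or query_queue:
        # 2. Check availability of query results; block only when idle.
        while query_queue and (queries.available(query_queue[0][0]) or not distance_queue):
            node, was_visible = query_queue.popleft()
            if not queries.available(node):
                stats["stalls"] += 1  # the CPU would wait here
            if queries.result(node) > 0:
                pull_up(node)
                if not was_visible:   # previously visible leaves were already rendered
                    traverse(node)
        # 1. Node handling for the nearest unprocessed node.
        if distance_queue:
            _, _, node = heapq.heappop(distance_queue)
            was_visible = node.visible and node.last_visited == frame - 1
            node.visible = False      # restored by pull_up if a query finds it visible
            node.last_visited = frame
            if not was_visible or node.is_leaf():
                queries.issue(node)
                stats["queries"] += 1
                query_queue.append((node, was_visible))
            if was_visible:
                traverse(node)        # open interior node / render leaf immediately
        queries.tick()
    return rendered, stats
```

With the three-leaf test tree and a latency of 2 ticks, the first frame (all nodes previously invisible) queries all six nodes and blocks twice; the second frame opens the three interior nodes without queries, queries only the leaves, and never blocks.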

Page 12

Why Interleaving Works…

Processing a node depends only on…
1. Front-to-back order
2. Results of queries for already processed nodes, where:

Visibility in previous frame (processed node / current node) | Depends on result: S&W | CHC
visible / visible | yes | no
visible / invisible | yes | no
invisible / visible | yes | no
invisible / invisible (different subtrees) | yes | no
invisible / invisible (parent to child, refinement of visibility) | yes | yes

Page 13

CHC: Hierarchy Traversal

[Figure: example hierarchy with nodes 1 to 13 numbered in front-to-back order, each marked as previously visible or previously invisible. Annotations: no queries for previously visible interior nodes; assume no query dependencies; hidden regions: queries depend on parents.]

Page 14

CHC Features

Reduction of CPU stalls and GPU starvation
Interleaving queries with rendering previously visible geometry

Reduction of the number of queries
Avoids expensive redundant queries for interior nodes
Size of tested regions adapts to visibility
pull-up: occluded region growing
pull-down: visible region growing

Page 15

Implementation Issues

Front-to-back traversal
Priority queue: allows various hierarchical data structures

Checking query results
glGetOcclusionQueryivNV with GL_PIXEL_COUNT_AVAILABLE_NV
Very cheap operation

Queries for previously visible nodes
Use the actual geometry as occludee (instead of the bounding box)
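A sketch of the priority-queue traversal in Python, assuming nodes are ordered by the distance from the viewpoint to their axis-aligned bounding box; `aabb_distance`, `front_to_back`, `children_of`, and `bounds_of` are hypothetical names. Because the hierarchy is reached only through the two accessor functions, the same loop can walk a kD-tree, an octree, or a bounding volume hierarchy.

```python
import heapq
import math


def aabb_distance(p, lo, hi):
    """Euclidean distance from point p to the box [lo, hi] (0 if p is inside)."""
    return math.sqrt(sum(max(l - c, 0.0, c - h) ** 2 for c, l, h in zip(p, lo, hi)))


def front_to_back(root, viewpoint, children_of, bounds_of):
    """Yield the nodes of any hierarchy ordered by box distance to the viewpoint.

    `children_of(node)` returns a list of children and `bounds_of(node)`
    returns a (lo, hi) pair of corner tuples; both are caller-supplied."""
    heap = [(aabb_distance(viewpoint, *bounds_of(root)), 0, root)]
    counter = 1  # tie-breaker so nodes themselves are never compared
    while heap:
        _, _, node = heapq.heappop(heap)
        yield node
        for child in children_of(node):
            lo, hi = bounds_of(child)
            heapq.heappush(heap, (aabb_distance(viewpoint, lo, hi), counter, child))
            counter += 1
```

When child boxes are nested inside their parent's box (as in a kD-tree), a child is never closer than its parent, so nodes leave the queue in nondecreasing distance.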

Page 16

Further Optimizations

Conservative visibility testing
Assume a visible node remains visible for n frames
+ Saves additional occlusion queries

Approximate visibility
#visible pixels < threshold: node is treated as invisible
+ Saves rendered geometry
- Produces image errors
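Both optimizations reduce to simple decision rules. A Python sketch with hypothetical function names: `assumed_visible_frames` plays the role of n and `threshold` the role of the pixel threshold on the slide (the optimization results later use 2 frames and a 25-pixel threshold).

```python
def needs_query(last_visible_frame, frame, assumed_visible_frames):
    """Conservative visibility testing: a node found visible in
    `last_visible_frame` is assumed to stay visible for the next
    `assumed_visible_frames` frames, so no query is issued in between."""
    return frame - last_visible_frame > assumed_visible_frames


def classify(visible_pixels, threshold):
    """Approximate visibility: fewer than `threshold` visible pixels means the
    node is treated as invisible (saves geometry, may cause image errors)."""
    return "invisible" if visible_pixels < threshold else "visible"
```

With `threshold = 1` the rule is exact: only nodes with no visible pixels are culled.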

Page 17

Results – Test Scenes

Teapots: 11.5M triangles, 21k kD-tree nodes
City: 1M triangles, 33k kD-tree nodes
Power plant: 12.7M triangles, 18.7k kD-tree nodes

Page 18

Results – Speedup

[Chart: speedup (axis 0 to 7) for the Teapots, City, and Powerplant scenes; series: VFC, S&W, CHC, Ideal.]

Ideal: zero overhead, render only visible geometry

Page 19

Results – Summary

Comparison to hierarchical S&W
Number of queries reduced by a factor of almost 2
Time spent in stalls reduced 20–60x (to 0.18–1.31 ms)

Close to the ideal algorithm!
Only 2–9 ms slower
Overhead due to query time

Page 20

Results – Teapot

Page 21

Results – City

Page 22

Results – Powerplant

Page 23

Optimization Results

Conservative culling, 2 frames assumed visible
Good for deep hierarchies with simple leaf geometry
Further speedup of up to 21%

Approximate culling, 25-pixel threshold
Good for scenes with complex visible geometry
Further speedup of up to 33%

Page 24

Conclusion

Efficient scheduling of hardware occlusion queries
Greatly reduces CPU stalls and GPU starvation
Reduces the number of required queries

Simple to implement
Arbitrary hierarchical data structure

Speedup of ~4 over VFC
Close to ideal solution for tested scenes

Watch out for GPU Gems II

Page 25

Thanks for Your Attention

Page 26

CHC: Example

[Animated figure: CHC traversal of the example hierarchy (nodes 1 to 13), showing the query queue, the issued queries, and the GPU command stream (including R4, Q5, Q6/R6, Q7, Q8, R7, Q10/R10, Q11). Annotated steps: previously visible interior node: continue with step 1 recursively; previously visible leaf: issue query + render; previously invisible node: query; query result available: render, cull, mark visible, or continue with step 1 recursively; pull-up invisibility; final classification.]