Real-time Mesh Simplification Using the GPU Christopher DeCoro Natasha Tatarchuk 3D Application Research Group

Post on 19-Dec-2015


Real-time Mesh Simplification Using the GPU

Christopher DeCoro, Natasha Tatarchuk

3D Application Research Group


Introduction

• Implement Mesh Decimation in real-time
• Utilizes new Geometry Shader stage of GPU
• Achieves a 20x speedup over CPU


Project Motivation

• Massive increases in submitted geometry
  • Geometry rendered per shadow map (6x for cubemap!)
  • Not always needed at highest resolution

• Geometry not always known at build-time
  • Dynamically-skinned objects only finalized at run-time
  • May be customized to the user's machine based on capabilities, so it would need to be adapted at program load time
  • Could be dynamically generated per level, so it would need to be adapted at level load time

• Simplification therefore needs to be fast (or even real-time)

Also, just as importantly…

• We want applications that exercise & stress GS/GPU
  • Evaluate new capabilities of the GPU
  • Learn how to adapt previously CPU-bound algorithms
  • Develop GPU-centric methodologies

• Identify future feature set for GS/GPU as a whole
  • Limitations still exist – which should be addressed?


Contributions

• Mapping of Decimation to GPU
  • 20x speedup vs. CPU
  • Enables load-time or real-time usage

• Detail Preservation by Non-linear Warping
  • Also applicable to CPU out-of-core decimation

• General-purpose GPU Octree
  • Adaptive decimation w/ constant memory
  • Applications not limited to simplification: collision detection, frustum culling, etc.


Outline

• Project Introduction and Motivation
• Background
  • Decimation with Vertex Clustering
  • Geometry Shaders in Direct3D 10
• Geometry Shader-based Vertex Clustering
• Adaptive Simplification w/ Non-linear Warps
• Probabilistic Octrees on the GPU


Vertex Clustering

• Reduces mesh resolution
  • High-res mesh as input
  • Low-res as output

• All implemented on the GPU
  • Ideal for processing streamed-out data
  • Useful when rendering multiple times (e.g. shadows)
  • Can handle enormous models from scanned data

• Based on “Out-of-Core Simplification of Large Polygonal Models,” P. Lindstrom, 2000

Figure from [Lindstrom 2000]


Previous Rendering Pipeline

• Vertex Shaders and Pixel Shaders
• Limited to 1 output per 1 input
  • No culling of triangles for decimation
• Fixed destination for each stage
  • Result meshes cannot be (easily) saved and reused


DirectX10 Rendering Pipeline

• Geometry Shader in between VS & PS
  • Called for each primitive (usually triangle)

• Able to access all vertices of a primitive
  • Can compute per-face quantities

• Breaks 1:1 input-output limitation
  • Allows triangles to be culled from pipeline

• Allows stream-out of processed geometry
  • Decimated meshes can easily be saved and reused


Outline

• Project Introduction and Motivation
• Background
• Geometry Shader-based Vertex Clustering
  • Overview
  • Quadric Generation
  • Optimal Position Computation
  • Final Clustering
• Adaptive Simplification w/ Non-linear Warps
• Probabilistic Octrees on the GPU


Algorithm Overview

• Start with the input mesh
  • Shown divided into clusters

• Pass 1: Compute the quadric map from mesh
  • Use GS to compute quadric
  • Accumulate in cluster map, a render target (RT) used as a large array

• Pass 2: For each cluster, compute optimal position
  • Solves a linear system given by quadrics

• Pass 3: Collapse each vertex to representative
  • 9x9x9 grid shown

Model Courtesy of Stanford Graphics Lab


Vertex Clustering Pipeline

• Pass 1: Create Quadric Map
  • Input: Original Mesh
  • Computation:
    • Determine plane equation, face quadrics for triangle
    • Compute the cluster and address of each vertex
    • Pack quadric into RT at appropriate address
  • Output: Render Targets representing clusters with packed quadrics and average positions
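The "cluster and address" step can be illustrated on the CPU. The following is a minimal Python sketch, not the shader code from the talk: the names `cluster_id` and `write_addr`, the bounding-box parameters, and the row-major texel layout are all assumptions made for illustration.

```python
def cluster_id(pos, bbox_min, bbox_max, grid):
    """Map a 3D position inside the bounding box to a linear cell index."""
    idx = []
    for p, lo, hi, n in zip(pos, bbox_min, bbox_max, grid):
        t = (p - lo) / (hi - lo)            # normalize the coordinate to [0, 1]
        idx.append(min(int(t * n), n - 1))  # keep the upper face in the last cell
    ix, iy, iz = idx
    return ix + grid[0] * (iy + grid[1] * iz)


def write_addr(cid, map_width):
    """Map a linear cluster id to (column, row) texel coordinates.

    Row-major layout is an assumption of this sketch.
    """
    return cid % map_width, cid // map_width
```

With a 9x9x9 grid there are 729 cells, which fit exactly into a 27x27 cluster map; the corner `(1, 1, 1)` of a unit bounding box lands in cell 728, that is texel `(26, 26)`.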


Quadric Map Implementation


//Map a point to its location in the cluster map array
//(clusterId(), expand(), outer() and packQuadric() are helpers not shown on the slide)
float2 writeAddr( float3 vPos )
{
    uint iX = clusterId(vPos) / iClusterMapSize.x;
    uint iY = clusterId(vPos) % iClusterMapSize.y;
    return expand( float2(iX,iY)/float(iClusterMapSize.x) ) + 1.0/iClusterMapSize.x;
}

[maxvertexcount(3)]
void main( triangle ClipVertex input[3], inout PointStream<FragmentData> stream )
{
    //For the current triangle, compute the area weight and normal
    float3 vNormal = (cross( input[1].vWorldPos - input[0].vWorldPos, input[2].vWorldPos - input[0].vWorldPos ));
    //length(cross)/2 is the triangle area; dividing by 6 gives one third of it,
    //presumably one share for each of the three vertices
    float fArea = length(vNormal)/6;
    vNormal = normalize(vNormal);

    //Then compute the distance of plane to the origin along the normal
    float fDist = -dot(vNormal, input[0].vWorldPos);

    //Compute the components of the face quadrics using the plane coefficients
    float3x3 qA = fArea*outer(vNormal, vNormal);
    float3 qb = fArea*vNormal*fDist;
    float qc = fArea*fDist*fDist;

    //Loop over each vertex in input triangle primitive
    for(int i=0; i<3; i++)
    {
        //Assign the output position in the quadric map
        //(the slide called writeAddress(); the function defined above is writeAddr())
        FragmentData output;
        output.vPos = float4(writeAddr(input[i].vPos),0,1);

        //Write the quadric to be accumulated in the quadric map
        packQuadric( qA, qb, qc, output );
        stream.Append( output );
    }
}


Vertex Clustering Pipeline

• Pass 2: Find Optimal Positions
  • Input: Cluster Map Render Targets, Full-screen Quad
  • Computation:
    • Determine if we can solve for optimal position
    • If not, fall back to vertex average
  • Output: Render Targets representing clusters with optimal position of representative vertex


Optimal Positions

• For each cell, need representative

• Naïve solution: Use averages
  • Looks very blocky
  • Does not consider the original faces, only vertices

• Implemented solution: Use quadrics
  • Quadrics are a measure of surface
  • We can solve for optimal position

Original Mesh

Simplified w/ Averages

Simplified w/ Quadrics
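To make "solve for optimal position" concrete, here is a small CPU-side Python sketch. It assumes the usual quadric error form Q(x) = xᵀAx + 2bᵀx + c, which is minimized where Ax = -b. The function names and the use of Cramer's rule are choices made for this illustration; the fallback to a supplied average follows the behaviour described for Pass 2.

```python
def plane_quadric(normal, point, weight=1.0):
    """Quadric (A, b, c) of the squared distance to a plane with unit normal."""
    d = -sum(n * p for n, p in zip(normal, point))
    A = [[weight * ni * nj for nj in normal] for ni in normal]
    b = [weight * d * ni for ni in normal]
    return A, b, weight * d * d


def add_quadrics(q1, q2):
    """Component-wise sum, mirroring additive accumulation per cluster."""
    A = [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(q1[0], q2[0])]
    b = [x + y for x, y in zip(q1[1], q2[1])]
    return A, b, q1[2] + q2[2]


def det3(m):
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def optimal_position(A, b, fallback, eps=1e-11):
    """Solve A x = -b with Cramer's rule; return `fallback` if A is near singular."""
    det = det3(A)
    if det <= eps:
        return tuple(fallback)
    rhs = [-v for v in b]
    x = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        x.append(det3(m) / det)
    return tuple(x)
```

For example, accumulating the quadrics of the planes x = 1, y = 2 and z = 3 yields the corner point (1, 2, 3), whereas a single plane gives a singular A and the sketch returns the fallback average.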


Optimal Positions Implementation


float3 optimalPosition(float2 vTexcoord)
{
    float3 vPos = float3(0,0,0);
    float4 dataWorld, dataA0, dataB, dataA1;

    //Read the vertex average from the cluster map
    dataWorld = tClusterMap0.SampleLevel( sClusterMap0, vTexcoord, 0 );
    int iCount = dataWorld.w;

    //Only compute optimal position if there are vertices in this cluster
    if( iCount != 0 )
    {
        //Read all the data from the clustermap to reconstruct the quadric
        dataA0 = tClusterMap1.SampleLevel( sClusterMap1, vTexcoord, 0 );
        dataA1 = tClusterMap2.SampleLevel( sClusterMap2, vTexcoord, 0 );
        dataB = tClusterMap3.SampleLevel( sClusterMap3, vTexcoord, 0 );

        //Then reassemble the quadric (A is symmetric, so 6 values suffice)
        float3x3 qA = { dataA0.x, dataA0.y, dataA0.z,
                        dataA0.y, dataA0.w, dataA1.x,
                        dataA0.z, dataA1.x, dataA1.y };
        float3 qB = dataB.xyz;
        float qC = dataA1.z;

        //Determine if inverting A is stable, if so, compute optimal position
        //If not, default to using the average position
        //(the slide referred to qA and qB here as quadricA and quadricB;
        // inverse() is a helper not shown on the slide)
        const float SINGULAR_THRESHOLD = 1e-11;
        if( determinant(qA) > SINGULAR_THRESHOLD )
            vPos = -mul( inverse(qA), qB );
        else
            vPos = dataWorld.xyz / dataWorld.w;
    }
    return vPos;
}


Vertex Clustering Pipeline

• Pass 3: Decimate Mesh
  • Input: Cluster Map Render Targets, Input Mesh
  • Computation:
    • Find clusters, remap vertices to representative
    • Determine if triangle becomes degenerate
    • If not, stream output new triangle at new positions
  • Output: Low-resolution Mesh
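The decimation pass can be sketched on the CPU in a few lines of Python. This is an illustrative reference only: `decimate`, `cluster_of` and `representative` are hypothetical names, and the output is an unindexed list of triangles, similar in spirit to what stream-out would produce.

```python
def decimate(vertices, triangles, cluster_of, representative):
    """Collapse vertices to cluster representatives and drop degenerate triangles.

    vertices:       list of (x, y, z) tuples
    triangles:      list of (i0, i1, i2) index triples into `vertices`
    cluster_of:     function mapping a position to a cluster id
    representative: dict mapping a cluster id to its representative position
    """
    out = []
    for tri in triangles:
        ids = [cluster_of(vertices[i]) for i in tri]
        # A triangle survives only if its corners fall in three different clusters
        if len(set(ids)) == 3:
            out.append(tuple(representative[c] for c in ids))
    return out
```

A triangle with two corners in the same cell collapses to a line or point after remapping, so it is discarded rather than emitted.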


Final Clustering Implementation


[maxvertexcount(3)]
void main( triangle ClipVertex input[3], inout TriangleStream<StreamoutVertex> stream )
{
    //Only emit a triangle if all three vertices are in different clusters
    if( all_different(clusterId(input[0].vPos),
                      clusterId(input[1].vPos),
                      clusterId(input[2].vPos)) )
    {
        for(int i=0; i<3; i++)
        {
            //Look up the optimal position for vertex i in the RT computed in Pass 2
            //(the slide indexed input[0] here, which would collapse all three
            // vertices to the same position)
            StreamoutVertex output;  //assumed here to have a float4 vPos member
            output.vPos = tClusterMap3.SampleLevel( sClusterMap3, readAddr(input[i].vPos), 0 );

            //Output vertex to stream out
            stream.Append( output );
        }
    }
    return;
}


Vertex Clustering Pipeline

• Alternate Pass 2: Downsample RTs
  • Input and Output as before
  • Computation:
    • Collapse 8 adjacent cells by adding cluster quadrics
    • Compute optimal position for 2x larger cell

• Create multiple lower levels of detail without repeatedly incurring Pass 1 overhead (~75%)
  • Pass 3 can use previous streamed-out mesh
  • Lower levels of detail almost free
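Because quadrics add, a coarser cluster map can be derived from a finer one without revisiting the mesh. The Python sketch below illustrates the idea on a sparse dictionary; the name `downsample` and the dictionary layout are assumptions for illustration, not the render-target layout used in the talk.

```python
def downsample(cluster_map):
    """Merge each 2x2x2 block of cells by summing their accumulated values.

    cluster_map: dict mapping (i, j, k) cell indices to a list of accumulated
                 quadric components (any fixed-length list of numbers).
    """
    coarse = {}
    for (i, j, k), q in cluster_map.items():
        key = (i // 2, j // 2, k // 2)  # parent cell in the 2x coarser grid
        if key in coarse:
            coarse[key] = [a + b for a, b in zip(coarse[key], q)]
        else:
            coarse[key] = list(q)
    return coarse
```

Calling `downsample` repeatedly gives a chain of coarser maps; the optimal position for each coarse cell would then be solved from the summed quadric as in Pass 2.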


Timing Results

• Recorded Time Spent in Decimation
  • GPU: AMD/ATI XXX
  • CPU: 3 GHz Intel P4

• Significant Improvement over CPU
  • Averages ~20x speedup on large models
  • Scales linearly


More Results

• Models shown at varying resolutions

Models Courtesy of Stanford Graphics Lab

Buddha, 45x130x45 grid

Bunny, 90x90x90 grid

Dragon, 100x60x20 grid


More Results

• Models shown at varying resolutions

Buddha, 20x70x20 grid

Bunny, 60x60x60 grid

Dragon, 50x25x10 grid


More Results

• Models shown at varying resolutions

Buddha, 10x40x10 grid

Bunny, 20x20x20 grid

Dragon, 30x15x6 grid


Outline

• Project Introduction and Motivation
• Background
• Geometry Shader-based Vertex Clustering
• Adaptive Simplification w/ Non-linear Warps
  • View-dependent Simplification
  • Region-of-interest Simplification
• Probabilistic Octrees on the GPU


View-dependent Simplification

• Standard simplification does not consider view
  • Preserves uniform amount of detail all over

• Simplify in post-projection space to use view
  • Preserves more detail closer to viewer (left)

View Direction


Arbitrary Warping Functions

• View transform is a special case of a nonlinear warp
  • Can use arbitrary warp for adaptive simplification

• Regular grids allow data-independence, parallelism
  • Constant time mapping from position to grid cell
  • Maps well onto GPU render targets
  • Forces uniform resolution throughout output mesh

• Irregular geometry grids allow non-uniform output
  • Cells can be larger/smaller in certain regions
  • Corresponds to lower/greater output triangle density
  • We lose constant-time mapping of position to cell

• Solution: apply inverse warp to vertices
  • Equivalent to applying forward warp to grid cells
  • Clustering still performed in uniform grid
  • Flexibility of irregular geometry w/ speed of regular
  • One proposal: Gaussian weighting functions
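The effect of warping vertices before uniform clustering can be seen in one dimension. In this Python sketch, `warped_cell` and `sqrt_warp` are hypothetical names, and the square-root warp is only a stand-in chosen because it is easy to reason about; it is not a warp proposed in the talk.

```python
def warped_cell(x, warp, cells):
    """Index of the uniform-grid cell that receives the warped coordinate.

    x is assumed to lie in [0, 1]; the lookup stays constant-time because
    the grid itself is still uniform.
    """
    return min(int(warp(x) * cells), cells - 1)


def sqrt_warp(x):
    """Illustrative warp on [0, 1] that stretches the region near 0."""
    return x ** 0.5
```

With 4 cells, the uniform boundaries 0.25, 0.5 and 0.75 in warped space correspond to 0.0625, 0.25 and 0.5625 in the original space, so cells near 0 are smaller and keep more detail there.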


Region-of-Interest Specification

• Importance specified w/ biased Gaussian

  • Highest preservation at mean
  • Width of region given by sigma
  • Bias prevents falloff to zero

• Integrate to produce corresponding warp function

(Derivation given in paper)
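As a rough illustration of "integrate to produce a warp function", the Python sketch below integrates a biased, unnormalized Gaussian in one dimension and rescales the result so that [0, 1] maps onto [0, 1]. This is one plausible form under those assumptions and is not taken from the paper's derivation; `make_warp` and its parameters are names chosen for the example.

```python
import math


def make_warp(mu, sigma, bias):
    """Build a 1D warp from the importance f(x) = bias + exp(-(x-mu)^2 / (2 sigma^2))."""

    def antiderivative(x):
        # Integral of the bias term plus the closed-form integral of the Gaussian
        gauss = sigma * math.sqrt(math.pi / 2.0) * math.erf((x - mu) / (sigma * math.sqrt(2.0)))
        return bias * x + gauss

    f0, f1 = antiderivative(0.0), antiderivative(1.0)

    def warp(x):
        # Rescale so that warp(0) = 0 and warp(1) = 1
        return (antiderivative(x) - f0) / (f1 - f0)

    return warp
```

The warp is steepest around `mu`, so a uniform grid in warped space has its smallest cells there, while the positive bias keeps the slope above zero far from the region of interest.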


Region-of-Interest Specification

• Warping allows non-uniform/adaptive level of detail

• Head has most semantic importance

• Detail lost in uniform simplification

• We can warp first to expand center

• Equivalent to grid density increasing

• Adaptive simplification preserves head detail


Outline

• Project Introduction and Motivation
• Background
• Geometry Shader-based Vertex Clustering
• Adaptive Simplification w/ Non-linear Warps
• Probabilistic Octrees on the GPU
  • Motivation
  • Probabilistic Storage
  • Adaptive Simplification
  • Randomized Construction
  • Results


Octrees - Motivation

• Basic grid
  • Regular geometry, regular topology
  • Limitations as we discussed

• Warped grid
  • Irregular geometry, regular topology
  • Much improved; however, we can do better
  • May be difficult to know required detail a priori

• CPU Solution: Multi-resolution grid (i.e. octree)
  • Irregular topology (irregular geometry w/ warping)
  • Store grid at many levels of detail
  • Measure error at each level, use as coarse a level as possible
  • Efficiency requires dynamic memory, storage O(L^3)
  • Requires O(L) writes to produce correct tree


GPU Solution – Probabilistic Octrees

• Proposal
  • Successful storage not guaranteed, w/ Prob. <= 1
  • However, storage failure detected on read

• Assumptions allow much flexibility
  • We can have unlimited depth tree (but lim P=0)
  • Sparse storage of data

• Require conservative algorithms for task
  • Vertex clustering (conveniently!) is such an example
  • So are collision detection and frustum culling

• Only studied in brief in this paper; we would like to analyze more in future work


Implementation Details

• Storage: Spatial Hashes
  • Map (position, level) to cell, cell hashed to index
  • Additive blending for quadric accumulation (app-specific)
  • Max blending to store (key, -key) with data (i.e. min_key, max_key)

• Retrieval:
  • Again map (position, level) to index
  • Retrieve key value from data, collision iff min_key != max_key
  • On collision, use parent level, which will have higher storage probability

• Usage for Adaptive Simplification
  • For each vertex, find maximum error level below some threshold
  • Use this as the representative vertex
  • Can perform binary search along path
  • Conservative, because we can maintain validity even when using parent of optimal node (just adds some error)
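The storage and retrieval rules above can be mimicked on the CPU. The Python sketch below is only an analogy for the blending-based scheme: `ProbabilisticStore`, its modulo hash and the integer keys are assumptions for illustration, with `max()` standing in for max blending and `+=` for additive blending.

```python
class ProbabilisticStore:
    """Fixed-size table in which writes may collide; collisions are detected on read."""

    def __init__(self, size):
        self.size = size
        self.max_key = [float("-inf")] * size      # max blending on key
        self.max_neg_key = [float("-inf")] * size  # max blending on -key (tracks min key)
        self.data = [0.0] * size                   # additive blending of the payload

    def store(self, key, value):
        slot = key % self.size  # hypothetical hash: a plain modulo
        self.max_key[slot] = max(self.max_key[slot], key)
        self.max_neg_key[slot] = max(self.max_neg_key[slot], -key)
        self.data[slot] += value

    def lookup(self, key):
        """Return the accumulated value, or None if the slot is empty or collided."""
        slot = key % self.size
        min_key = -self.max_neg_key[slot]
        if self.max_key[slot] != key or min_key != key:
            return None
        return self.data[slot]


def lookup_with_fallback(store, keys_fine_to_coarse):
    """Try the finest key first, then fall back along the path of parent keys."""
    for key in keys_fine_to_coarse:
        value = store.lookup(key)
        if value is not None:
            return value
    return None
```

Two different keys hashing to the same slot leave min_key and max_key unequal, so a reader can tell the summed payload is invalid and move to the parent level instead.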


Probabilistic Octree Results

• Adaptive simplification shown on bunny (~4K tris)
  • Preserves detail around leg, eyes and ears
  • Simplifies significantly on large, flat regions

• Using 8% of storage of total tree, we have < 10% collisions
  • Only ~20% performance hit vs. standard grids


Conclusions

• GS is a powerful tool for interactive graphics

• Amplification and decimation are important applications of GS


Geometry Shaders and Other Feature Wish-List

• Bring back the Point fill mode
  • Important for scatter in GPGPU applications

• Data amplification improvements with indexed stream out
  • Avoiding triangle soups very non-trivial

• Efficient indexable temps


Thanks a lot!

• Various people here…


Questions?