Page 1: Using GPUs for Collision detection, Recent Advances in Real-Time Collision and Proximity Computations for Games and Simulations (EUROGRAPHICS 2012)

USING GPUS FOR COLLISION DETECTION

AMD Takahiro Harada

2 | Eurographics 2012 | Takahiro Harada

GPUS


GPU RAW PERFORMANCE

§ High memory bandwidth

§ High TFLOPS

§ Radeon HD 7970

–  32x4 SIMD engines

–  64 wide SIMD

–  3.79 TFLOPS

–  264 GB/s

[Chart: TFLOPS (axis 0 to 4) and GB/s (axis 0 to 300) for Radeon HD 4890, 5870, 6970 and 7970]


RAW PERFORMANCE IS HIGH, BUT

§ GPUs perform well only if used correctly

–  Divergence

–  ALU op / Memory op ratio

–  etc.

§ Some algorithms require operations GPUs are not good at
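
As a rough illustration of the ALU op / memory op ratio, the peak figures quoted for the Radeon HD 7970 (3.79 TFLOPS, 264 GB/s) can be divided. The 4-byte value size is an assumption (single-precision floats), not something stated on the slides:

```latex
\[
\frac{3.79 \times 10^{12}\ \text{FLOP/s}}{264 \times 10^{9}\ \text{B/s}} \approx 14.4\ \text{FLOP per byte}
\quad\Rightarrow\quad
14.4 \times 4 \approx 57\ \text{FLOP per 4-byte value fetched}
\]
```

Under these assumptions, a kernel that performs far fewer arithmetic operations per fetched value than this is likely to be limited by memory traffic rather than by ALU throughput.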


EX) HASH GRID (LINKED LIST)

§ Low ALU op / Mem op ratio

§ Spatial query using hash grid

    do
    {
        write( node );
        node = node->next;
    } while( node );

§ CPUs are better for this

[Chart: "Query" (axis 0 to 60) for HD7970, HD6970, HD5870, HD6870 and PhenomXII6(1) through PhenomXII6(6); label: "CPU"]
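
The query loop above can be sketched as a small CPU-side C++ example. This is a minimal sketch, not the implementation behind the slides: the `Node` layout, the `hashCell` function with its prime multipliers, and the `HashGrid` type are all hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical node layout: one singly linked list per hash bucket.
struct Node {
    std::uint32_t particle;  // index of the stored particle
    Node* next;              // next node in the same bucket, or nullptr
};

// Hypothetical hash from an integer cell coordinate to a bucket index.
// The multipliers are commonly used large primes, not taken from the slides.
inline std::uint32_t hashCell(int x, int y, int z, std::uint32_t numBuckets) {
    std::uint32_t h = (static_cast<std::uint32_t>(x) * 73856093u) ^
                      (static_cast<std::uint32_t>(y) * 19349663u) ^
                      (static_cast<std::uint32_t>(z) * 83492791u);
    return h % numBuckets;
}

struct HashGrid {
    std::vector<Node*> heads;  // head of each bucket's list
    std::deque<Node> pool;     // node storage; deque keeps element addresses stable

    explicit HashGrid(std::uint32_t numBuckets) : heads(numBuckets, nullptr) {}

    // Push a particle onto the front of its cell's bucket list.
    void insert(int x, int y, int z, std::uint32_t particle) {
        std::uint32_t b = hashCell(x, y, z, static_cast<std::uint32_t>(heads.size()));
        pool.push_back(Node{particle, heads[b]});
        heads[b] = &pool.back();
    }

    // Walk the bucket list: one dependent memory fetch and one copy per node,
    // with almost no arithmetic in between (low ALU op / memory op ratio).
    std::vector<std::uint32_t> query(int x, int y, int z) const {
        std::vector<std::uint32_t> out;
        const Node* node = heads[hashCell(x, y, z, static_cast<std::uint32_t>(heads.size()))];
        while (node) {
            out.push_back(node->particle);  // corresponds to write( node )
            node = node->next;
        }
        return out;
    }
};
```

The sketch uses `while` instead of `do ... while` so that an empty bucket is handled. Because different cells can hash to the same bucket, a real query would also have to filter the returned particles by cell or by distance.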


EX) HASH GRID (LINKED LIST)

§ Low ALU op / Mem op ratio

§ Spatial query using hash grid

–  while()

–  {

§ Fetch element

§ Copy

–  }

§ CPUs are better for this

[Chart: "Query" (axis 0 to 60) for HD7970, HD6970, HD5870, HD6870 and PhenomXII6(1) through PhenomXII6(6); label: "GPU"]


REASON FOR THE PERFORMANCE JUMP

§ Key architectural changes

–  No VLIW

–  Improved memory system


GPU COLLISION DETECTION


COLLISION DETECTION WAS HARD FOR GPUS

§ Old GPUs were not as flexible as today’s GPUs

–  Fixed function

–  Programmable shader (Vertex shader, pixel shader)

§ Grid based fluid simulation

§ Cloth simulation

Crane et al., Real-Time Simulation and Rendering of 3D Fluids, GPU Gems3 (2007)


UNIFORM GRID

§ Simplest but useful data structure

§ Before compute languages

–  Vertex shader, pixel shader

–  Vertex texture fetch -> used for random write

–  Depth test, blending etc. -> used for atomics

§ Now it is very easy

–  Random write and atomics are supported

struct Cell { u32 m_counter; u32 m_data[N]; };
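
To show how random writes and atomics make grid construction simple, here is a minimal CPU-side sketch that uses `std::atomic` as a stand-in for a GPU atomic increment. The capacity `N = 4`, the `UniformGrid` type, and the clamping of out-of-range positions are assumptions made for this example; the slides leave `N` unspecified.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using u32 = std::uint32_t;
constexpr u32 N = 4;  // assumed per-cell capacity

struct Cell {
    std::atomic<u32> m_counter{0};  // number of insert attempts into this cell
    u32 m_data[N];                  // particle indices stored in this cell
};

struct UniformGrid {
    float cellSize;
    int nx, ny, nz;
    std::vector<Cell> cells;

    UniformGrid(float cellSize_, int nx_, int ny_, int nz_)
        : cellSize(cellSize_), nx(nx_), ny(ny_), nz(nz_),
          cells(static_cast<std::size_t>(nx_) * ny_ * nz_) {}

    static int clampIndex(int i, int n) { return std::max(0, std::min(i, n - 1)); }

    // Map a position to a linear cell index; positions outside the grid are clamped.
    std::size_t cellIndex(float x, float y, float z) const {
        int ix = clampIndex(static_cast<int>(x / cellSize), nx);
        int iy = clampIndex(static_cast<int>(y / cellSize), ny);
        int iz = clampIndex(static_cast<int>(z / cellSize), nz);
        return (static_cast<std::size_t>(iz) * ny + iy) * nx + ix;
    }

    // Each particle can be inserted by a different thread: the atomic increment
    // reserves a unique slot, then the index is written to that slot.
    // Returns false when the cell is already full (the particle is dropped).
    bool insert(u32 particle, float x, float y, float z) {
        Cell& c = cells[cellIndex(x, y, z)];
        u32 slot = c.m_counter.fetch_add(1);  // atomic
        if (slot >= N) return false;
        c.m_data[slot] = particle;            // random write
        return true;
    }
};
```

Note that `m_counter` keeps counting after a cell overflows, so a reader should treat `min(m_counter, N)` as the number of valid entries.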



WITH UNIFORM GRID

§ Often used for particle-based simulation

§ Other steps are embarrassingly parallel

§ Distinct Element Method (DEM)

§ Particles can be extended to…


SMOOTHED PARTICLE HYDRODYNAMICS

§ Fluid simulation

§ Solving the Navier-Stokes equation on particles

Harada et al., Smoothed Particle Hydrodynamics on GPUs, CGI (2007)
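
For context, the standard SPH approximation (general SPH background, not taken from the slides) evaluates a quantity at particle i as a kernel-weighted sum over its neighbors j, which is why a fast neighbor search matters. For density:

```latex
\[
\rho_i = \sum_j m_j \, W(\mathbf{x}_i - \mathbf{x}_j, h)
\]
```

Here m_j is the mass of particle j, W is the smoothing kernel, and h is its smoothing length; only particles within the kernel's support contribute to the sum.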


RIGID BODY SIMULATION

§ Represent a rigid body with a set of particles

§ Rigid body collision = Particle collision

Harada et al., Real-time Rigid Body Simulation on GPUs, GPU Gems3 (2007)
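
The idea that rigid body collision reduces to particle collision can be sketched as follows. This is an illustrative brute-force version under stated assumptions: all particles share one radius, each body is given only by the world-space centers of its particles, and the names `Vec3`, `ParticleBody`, `particlesOverlap` and `countContacts` are hypothetical. A practical version would find candidate pairs with a uniform grid instead of testing every pair.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };

// Two equal-radius particles overlap when their centers are closer than one diameter.
inline bool particlesOverlap(const Vec3& a, const Vec3& b, float radius) {
    float dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    float diameter = 2.0f * radius;
    return dx * dx + dy * dy + dz * dz < diameter * diameter;
}

// A rigid body, represented only by the world-space centers of its particles.
using ParticleBody = std::vector<Vec3>;

// Counts overlapping particle pairs between two bodies (brute force, for brevity).
int countContacts(const ParticleBody& a, const ParticleBody& b, float radius) {
    int contacts = 0;
    for (const Vec3& pa : a)
        for (const Vec3& pb : b)
            if (particlesOverlap(pa, pb, radius)) ++contacts;
    return contacts;
}
```

Two bodies are in contact when `countContacts` returns a non-zero value; every particle pair costs the same amount of work, which suits a wide SIMD machine.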


COUPLING RIGID BODIES + FLUID


CLOTH COLLISION DETECTION

§ Particle vs. triangle mesh collision

§ Dual uniform grid

–  1st grid: particles

–  2nd grid: triangles

Harada et al., Real-time Fluid Simulation Coupled with Cloth, TPCG (2007)


SEVERAL PHYSICS PROBLEMS WERE SOLVED

§ Particle simulations: granular materials, fluids, rigid bodies

§ Cloth

§ Coupling


PROBLEM SOLVED??

§ Not yet

§ Uniform grid is not a perfect solution

§ More complicated collision detection is necessary

–  E.g., rigid body simulation

–  Broad-phase collision detection

–  Narrow-phase collision detection


BROAD-PHASE COLLISION DETECTION

§ Sweep & prune

–  Sort end points

–  Sweep sorted list

§ Optimizations

–  Split sweep of a large object

–  Workspace subdivision

–  Cell subdivision

Liu et al., Real-time Collision Culling of a Million Bodies on Graphics Processing Units, TOG(2010)
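
The two basic sweep and prune steps (sort end points, sweep the sorted list) can be sketched on a single axis. This is a serial CPU sketch for illustration only; it does not reproduce the GPU method of the cited paper or its optimizations, and the `Interval` type and `sweepAndPrune` function are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Projection of an object's bounding box onto the sweep axis.
struct Interval { float min, max; };

// Returns all pairs (i, j), i < j, whose intervals overlap on the axis.
std::vector<std::pair<int, int>> sweepAndPrune(const std::vector<Interval>& objs) {
    struct EndPoint { float value; int object; bool isMin; };
    std::vector<EndPoint> pts;
    pts.reserve(objs.size() * 2);
    for (int i = 0; i < static_cast<int>(objs.size()); ++i) {
        pts.push_back({objs[i].min, i, true});
        pts.push_back({objs[i].max, i, false});
    }
    // Step 1: sort end points. At equal values a min comes before a max,
    // so intervals that only touch are still reported as overlapping.
    std::sort(pts.begin(), pts.end(), [](const EndPoint& a, const EndPoint& b) {
        if (a.value != b.value) return a.value < b.value;
        return a.isMin && !b.isMin;
    });
    // Step 2: sweep. Objects whose min was seen but whose max was not are "active".
    std::vector<int> active;
    std::vector<std::pair<int, int>> pairs;
    for (const EndPoint& p : pts) {
        if (p.isMin) {
            for (int other : active)
                pairs.push_back({std::min(other, p.object), std::max(other, p.object)});
            active.push_back(p.object);
        } else {
            active.erase(std::find(active.begin(), active.end(), p.object));
        }
    }
    std::sort(pairs.begin(), pairs.end());
    return pairs;
}
```

The example also hints at why a large object is a problem: it stays in the active list for a long stretch of the sweep and is paired with everything it spans.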


NARROW-PHASE COLLISION DETECTION

§ Variety of work

–  So many shape combinations -> divergence

§ Unified shape representation

§ Each collision computation is parallelizable


MULTIPLE GPU COLLISION DETECTION


MULTIPLE GPU PROGRAMMING


Parallelization using Multiple GPUs

• Two levels of parallelization

• 1GPU

• Multiple GPUs

[Figure: blocks labeled "Memory"]


PARTICLE SIMULATION ON MULTIPLE GPUS


Data Management

• Not all the data has to be sent

• Only the data required for the computation has to be sent

• Two kinds of particles have to be sent to the adjacent GPU

1. Escaped particles: particles that have moved out of this GPU's subdomain into the adjacent subdomain (the adjacent GPU will be responsible for these particles in the next time step)

2. Ghost particles in the ghost region
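
The two kinds of particles can be told apart with a simple position test. The sketch below assumes a 1D slab decomposition along x and a ghost region of fixed width next to the boundary; `Subdomain`, `SendKind`, `ghostWidth` and both functions are hypothetical names, and only the right-hand neighbor is handled to keep it short.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1D slab decomposition: this GPU owns x in [xMin, xMax).
struct Subdomain { float xMin, xMax; };

enum class SendKind { None, Escaped, Ghost };

// ghostWidth is an assumed parameter, e.g. the particle interaction distance.
SendKind classifyForRightNeighbor(float x, const Subdomain& d, float ghostWidth) {
    if (x >= d.xMax) return SendKind::Escaped;             // left this subdomain to the right
    if (x >= d.xMax - ghostWidth) return SendKind::Ghost;  // still owned, but the neighbor needs a copy
    return SendKind::None;                                 // nothing to send
}

struct SendLists {
    std::vector<std::size_t> escaped;  // ownership moves to the neighbor
    std::vector<std::size_t> ghost;    // copies used only for the neighbor's collision step
};

// Collect the indices of the particles that must be sent to the right neighbor.
SendLists collectForRightNeighbor(const std::vector<float>& xs, const Subdomain& d, float ghostWidth) {
    SendLists lists;
    for (std::size_t i = 0; i < xs.size(); ++i) {
        SendKind k = classifyForRightNeighbor(xs[i], d, ghostWidth);
        if (k == SendKind::Escaped) lists.escaped.push_back(i);
        else if (k == SendKind::Ghost) lists.ghost.push_back(i);
    }
    return lists;
}
```

Everything classified as `None` stays on the GPU, which is why only a thin layer of data near each boundary has to be transferred per time step.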


Data Transfer between GPUs

• No direct data transfer is provided on current hardware

• Transfer via main memory

• As many buffers as there are connectors are allocated

• Each GPU writes data to its defined location at the same time

• Each GPU reads data from its defined location at the same time

[Figure: GPU0, GPU1, GPU2, GPU3 connected through Main Memory]
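
The staging scheme can be sketched with plain host-side containers. This is only a model of the idea, not the actual transfer code: `Payload` stands in for packed particle data, a "connector" is taken to mean a pair of adjacent GPUs, and splitting each connector buffer into two directions is an assumption of this sketch.

```cpp
#include <cassert>
#include <vector>

using Payload = std::vector<int>;  // stand-in for packed particle data

// One staging buffer per connector, living in main memory.
struct ConnectorBuffer {
    Payload leftToRight;  // written by the left GPU, read by the right GPU
    Payload rightToLeft;  // written by the right GPU, read by the left GPU
};

struct HostStaging {
    std::vector<ConnectorBuffer> connectors;  // connector i joins GPU i and GPU i+1

    explicit HostStaging(int numGpus) : connectors(numGpus > 1 ? numGpus - 1 : 0) {}

    // Write phase: each GPU writes only to its own predefined locations.
    void send(int gpu, const Payload& toLeft, const Payload& toRight) {
        if (gpu > 0) connectors[gpu - 1].rightToLeft = toLeft;
        if (gpu < static_cast<int>(connectors.size())) connectors[gpu].leftToRight = toRight;
    }

    // Read phase: valid only after every GPU has finished its write phase.
    Payload receiveFromLeft(int gpu) const {
        return gpu > 0 ? connectors[gpu - 1].leftToRight : Payload{};
    }
    Payload receiveFromRight(int gpu) const {
        return gpu < static_cast<int>(connectors.size()) ? connectors[gpu].rightToLeft : Payload{};
    }
};
```

Because every location has exactly one writer and one reader, all GPUs can write at the same time without conflicts; a single synchronization between the write phase and the read phase is enough.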


Computation of 1 Time Step

[Figure: the same sequence of steps runs on GPU0, GPU1, GPU2 and GPU3, with transfers going through Main Memory]

1. ComptForce

2. Update

3. PrepareSend

4. Send

5. Synchronization

6. Receive

7. PostReceive


Environment

• 4GPU server (Simulation) + 1GPU (Rendering)

• 1M particles

• 6GPU server boxes (Simulation) + 1GPU (Rendering) @ GDC2008

• 3M particles

[Figure: two groups of GPUs, each labeled "Simulation GPUs"]


MOVIE

Harada et al., Massive Particles: Particle-based Simulations on Multiple GPUs, SIG Talk(2008)


Results

[Chart: Computation Time (ms), axis 0 to 100, versus Number of Particles (10^3 particles), axis 0 to 1200. Series: Total (1GPU), Sim (2GPU), Total (2GPU), Sim (4GPU), Total (4GPU)]


PROBLEM SOLVED??

§ Not yet

§ Most of the problems had identical object sizes

–  e.g., Particles

§ The reason is the GPU architecture

–  Not designed to solve non-uniform problems


HETEROGENEOUS COLLISION DETECTION


PARTICLE BASED SIMULATION ON THE GPU

§ Large number of particles

§ Particles with identical size

–  Work granularity is almost the same

–  Good for the wide SIMD architecture

Harada et al. 2007


MIXED PARTICLE SIMULATION

§ Not only small particles

§ Difficulty for GPUs

–  Large particles interact with small particles

–  Large-large collision


CHALLENGE

§ Non-uniform work granularity

–  Small-small (SS) collision

§ Uniform, GPU

–  Large-large (LL) collision

§ Non-uniform, CPU

–  Large-small (LS) collision

§ Non-uniform, CPU
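
The classification above can be written down as a small dispatch rule. This is a sketch under stated assumptions: the slides do not say how "large" is decided, so the radius cutoff `largeThreshold` and all type and function names here are hypothetical.

```cpp
#include <cassert>

enum class PairKind { SmallSmall, LargeSmall, LargeLarge };
enum class Processor { GPU, CPU };

// largeThreshold is a hypothetical radius cutoff separating small from large particles.
inline bool isLarge(float radius, float largeThreshold) { return radius >= largeThreshold; }

// Classify a collision pair by the sizes of its two particles.
inline PairKind classifyPair(float radiusA, float radiusB, float largeThreshold) {
    bool largeA = isLarge(radiusA, largeThreshold);
    bool largeB = isLarge(radiusB, largeThreshold);
    if (largeA && largeB) return PairKind::LargeLarge;
    if (largeA || largeB) return PairKind::LargeSmall;
    return PairKind::SmallSmall;
}

// Uniform work (SS) goes to the GPU; non-uniform work (LL, LS) goes to the CPU.
inline Processor assignProcessor(PairKind kind) {
    return kind == PairKind::SmallSmall ? Processor::GPU : Processor::CPU;
}
```

In a simulation the split would normally be made once per particle rather than per pair, so that small and large particles live in separate buffers from the start.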


FUSION ARCHITECTURE

§ CPU and GPU are:

–  On the same die

–  Much closer

–  Efficient data sharing

§ CPU and GPU are good at different kinds of work

–  CPU: serial computation, conditional branch

–  GPU: parallel computation

§ Able to dispatch work:

–  Serial work with varying granularity → CPU

–  Parallel work with uniform granularity → GPU


METHOD


TWO SIMULATIONS

§ Small particles

–  Steps: Build Acc. Structure, SS Collision, S Integration

–  Data: Position, Velocity, Force, Grid

§ Large particles

–  Steps: Build Acc. Structure, LL Collision, L Integration

–  Data: Position, Velocity, Force

§ LS Collision connects the two simulations


CLASSIFY BY WORK GRANULARITY

§ Small particles

§ Large particles

§ Uniform Work: Build Acc. Structure, SS Collision, S Integration, L Integration

§ Non Uniform Work: LL Collision, LS Collision, Build Acc. Structure

[Diagram buffers: Position, Velocity, Force, Grid; Position, Velocity, Force]


CLASSIFY BY WORK GRANULARITY, ASSIGN PROCESSOR

§ Small particles

§ Large particles

§ GPU: Build Acc. Structure, SS Collision, S Integration, L Integration

§ CPU: LL Collision, LS Collision, Build Acc. Structure

[Diagram buffers: Position, Velocity, Force, Grid; Position, Velocity, Force]


DATA SHARING

§ Small particles

§ Large particles

§ The grid and small-particle data have to be shared with the CPU for LS collision

–  Allocated as zero copy buffer

[Diagram: GPU and CPU lanes. Stages: Build Acc. Structure, SS Collision, S Integration, L Integration, LL Collision, Build Acc. Structure, LS Collision. Buffers: Position, Velocity, Force, Grid; Position, Velocity, Force; shared Position, Velocity, Grid and Force used by LS Collision]


SYNCHRONIZATION

§ Small particles

§ Large particles

§ The grid and small-particle data have to be shared with the CPU for LS collision

–  Allocated as zero copy buffer

[Diagram: GPU and CPU lanes. Stages: Build Acc. Structure, SS Collision, S Integration, L Integration, LL Collision, LS Collision, with two Synchronization points. Buffers: Position, Velocity, Force, Grid; Position, Velocity, Force; shared Position, Velocity, Grid and Force used by LS Collision]


VISUALIZING WORKLOADS

§ Small particles

§ Large particles

§ Grid construction can be moved to the end of the pipeline

–  Unbalanced workload

[Diagram: GPU and CPU lanes. Stages: Build Acc. Structure, SS Collision, S Integration, LL Collision, LS Collision, Synchronization, L Integration. Buffers: Position, Velocity, Force, Grid; Position, Velocity, Force]


LOAD BALANCING

§ Small particles

§ Large particles

§ To get better load balancing

–  The sync is for passing the force buffer filled by the CPU to the GPU

–  Move the LL collision after the sync

[Diagram: GPU and CPU lanes. Stages: Build Acc. Structure, SS Collision, S Integration, LS Collision, Synchronization, LL Collision, L Integration. Buffers: Position, Velocity, Force, Grid; Position, Velocity, Force]


MULTI THREADING (4 THREADS)


OPTIMIZATION2: IMPROVING SMALL-SMALL COLLISION

[Diagram: GPU lane: Build Acc. Structure, SS Collision, S Integ. CPU0, CPU1 and CPU2 lanes: each has LS Collision, Merge and S Sorting, with Synchronization points between stages; LL Coll. and L Integ. appear once]

Harada, Heterogeneous Particle-Based Simulation, SIG ASIA Talk(2011)


DEMO

[Labels: GPU Work, CPU Work]


QUESTIONS?