Enhancing locality in ray tracing algorithms

612 presentations by

Vidhyashankar Venkataraman

Biswanath Panda

Introduction

Two-lecture series

We will be discussing two methods to preserve locality
  By processing data groups that are likely to be accessed at the same time

Today: A locality-aware algorithm in ray tracing

What is Computer Graphics (CG)?

Generating images
  Lots of cool ones in this talk!

Deals with
  Geometric modeling: the math and physics
  Rendering: model to images
  Animation: time-dependent behavior of objects

Applications in games, real-world simulations, CAD

Rendering an image

Produce the scene image on the image plane

Three parts:
  Geometry modeling
  Illumination of objects
  Surface complexity: texture

1) Modeling geometry

Regular objects are easy to represent
  E.g. a sphere: (R, x, y, z)

Complicated objects through a ‘mesh’ of polygons

Millions of primitives for a single scene

2) Illumination modeling: Shading

Lighting of objects (shading)

Light energy is absorbed, reflected or transmitted
  Degree varies with the nature of each object
  Expressed for R, G, B

Various aspects to think of
  Diffuse and specular lighting
  Refraction
  Shadows

Mathematical models available

Global illumination

3) Surface complexity - Texture

To represent surface roughness
  The ‘jaggedness’

Texture map: a simple 2-D image mapped onto a 3-D surface
  Can add geometric detail
  Difficult with polygons

Also used to represent complicated surfaces
  E.g. marbles
  Reflection of a scene on a complex polished surface

Storage complexity?
  100s of KB to 100s of MB!

More pictures…

Rendering an image

Process of converting a 3-D scene to the actual image
  Projection of the 3-D objects onto an image plane

Global illumination: more realistic

Various methods available
  Ray tracing
  Scan-line conversion

Ray tracing

Introduced in 1980 by Turner Whitted

First global illumination algorithm

Insight: to find the color of each pixel, trace backwards
  Trace rays from the eye (through each pixel) into the scene
  Rays intersect objects and get reflected or transmitted
  Shadows, reflections, refractions

Algorithm in pictures
[Figures: no intersection; single intersection with an object; intersected object could be directly illuminated]

Algorithm in pictures
[Figures: shadow region; reflection]

Algorithm in pictures
[Figures: refraction (transmission of rays); multiple reflections]

In short…

Shoot a ray from the eye through a pixel into the scene

Obtain the intersection point, if any

Spawn new rays along the incident directions for reflection, refraction and direct lighting (shadow rays)

Color of the pixel is the sum of the light energies of all of them (called the radiance)

The secondary rays also spawn new rays: performed recursively

The algorithm in text

For each pixel (x,y) in image, generate corresponding ray in 3D

Image(x,y) := TraceRay(ray)

TraceRay(ray)
1) Compute nearest surface-ray intersection
2) If none found, return background color
3) Compute direct illumination from each light source
4) Compute illumination arriving from reflected direction
5) Compute illumination arriving from refracted direction
6) Combine all illuminations
7) Return resulting color

Step 3 involves testing visibility of the source by shooting a shadow ray towards it

Steps 4 and 5 involve recursive calls to TraceRay using the corresponding rays
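The seven steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the talk: the scene is a list of hypothetical sphere tuples (center, radius, color, reflectivity), there is a single point light, shading is plain diffuse, ray directions are unit length, and step 5 (refraction) is left out. All function names and the constants MAX_DEPTH and BACKGROUND are made up for the sketch.

```python
import math

BACKGROUND = (0.0, 0.0, 0.0)
MAX_DEPTH = 3  # assumed cap on the depth of the ray tree

def add(a, b): return (a[0] + b[0], a[1] + b[1], a[2] + b[2])
def sub(a, b): return (a[0] - b[0], a[1] - b[1], a[2] - b[2])
def scale(a, s): return (a[0] * s, a[1] * s, a[2] * s)
def dot(a, b): return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
def normalize(a): return scale(a, 1.0 / math.sqrt(dot(a, a)))

def hit_sphere(origin, direction, center, radius):
    """Smallest positive t at which a unit-length ray hits the sphere, or None."""
    oc = sub(origin, center)
    b = dot(oc, direction)
    disc = b * b - (dot(oc, oc) - radius * radius)
    if disc < 0.0:
        return None
    root = math.sqrt(disc)
    for t in (-b - root, -b + root):
        if t > 1e-6:
            return t
    return None

def nearest_hit(origin, direction, spheres):
    """Step 1: brute-force search for the closest intersection."""
    best = None
    for sphere in spheres:
        t = hit_sphere(origin, direction, sphere[0], sphere[1])
        if t is not None and (best is None or t < best[0]):
            best = (t, sphere)
    return best

def trace_ray(origin, direction, spheres, light_pos, depth=0):
    hit = nearest_hit(origin, direction, spheres)
    if hit is None:
        return BACKGROUND                                     # step 2
    t, (center, radius, color, reflectivity) = hit
    point = add(origin, scale(direction, t))
    normal = normalize(sub(point, center))
    point = add(point, scale(normal, 1e-4))                   # nudge off the surface
    # Step 3: shadow ray towards the single point light.
    to_light = sub(light_pos, point)
    dist = math.sqrt(dot(to_light, to_light))
    light_dir = scale(to_light, 1.0 / dist)
    blocker = nearest_hit(point, light_dir, spheres)
    lit = blocker is None or blocker[0] > dist
    result = scale(color, max(dot(normal, light_dir), 0.0) if lit else 0.0)
    # Step 4: recursive call for the reflected ray (step 5, refraction, is omitted).
    if reflectivity > 0.0 and depth < MAX_DEPTH:
        refl_dir = sub(direction, scale(normal, 2.0 * dot(direction, normal)))
        reflected = trace_ray(point, refl_dir, spheres, light_pos, depth + 1)
        result = add(result, scale(reflected, reflectivity))  # step 6
    return result                                             # step 7
```

Note that nearest_hit tests every primitive for every ray, which is exactly the cost the later slides try to avoid.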

The ray tree

Recursive calls represented as a tree

RT: Backward tracing

First ray-traced image

Surface-ray intersection

Most important part
  Closest intersection
  Surface primitives: polygons, spheres, cubes

Too expensive to test every surface primitive in the scene
  Moving GB of geometry in and out of memory!

Optimizations:
  Curb the depth of the tree
  Faster and fewer intersection calculations
    Bounding volume of each object by some regular shape (sphere / cube)
    Spatial subdivision (discussed in next slide)
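As an illustration of the bounding-volume idea, here is a minimal sketch of a ray test against an axis-aligned bounding box using the standard slab method. The function name and argument layout are assumptions made for this example; the slides do not say which box test was used.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does the ray (t >= 0) pass through the axis-aligned box?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True
```

Only when this cheap test succeeds does the renderer need to run the expensive intersection tests on the primitives inside the box.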

Optimizations – ‘Voxel’ subdivision

[Figures: uniform subdivision; adaptive subdivision (octree)]

Voxel is a 3-D sub-region of a scene
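For the uniform case, locating the voxel that contains a point is simple arithmetic. The sketch below assumes cubic voxels of edge voxel_size and a grid anchored at grid_min; both helper names are hypothetical. The second helper lists the voxels that a primitive's bounding box overlaps, which is how a primitive could be assigned to voxels.

```python
from itertools import product

def voxel_index(point, grid_min, voxel_size, resolution):
    """Integer (i, j, k) of the voxel containing point, clamped to the grid."""
    index = []
    for p, g, n in zip(point, grid_min, resolution):
        i = int((p - g) // voxel_size)
        index.append(min(max(i, 0), n - 1))
    return tuple(index)

def voxels_overlapping(box_min, box_max, grid_min, voxel_size, resolution):
    """All voxel indices touched by an axis-aligned bounding box."""
    lo = voxel_index(box_min, grid_min, voxel_size, resolution)
    hi = voxel_index(box_max, grid_min, voxel_size, resolution)
    return list(product(*(range(a, b + 1) for a, b in zip(lo, hi))))
```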

Issues in rendering

Pros and cons of RT
  Pros:
    Almost accurately lit if the tree is sufficiently deep
    Simple algorithm
  Cons:
    The standard traversals used for faster rendering may not be memory-coherent, hence can lead to a large number of page faults

Other rendering algorithm
  Scan-line based: can render complex scenes
  Inaccurate illumination: very unrealistic
  Much faster than RT

Advent of GPUs
  Processors exclusively for CG: faster rendering
  Parallelism and pipelining
  Aggressive prefetching from memory

Examples of Scan Conversion

Poor lighting; more use of texture maps

A more memory-coherent RT algorithm could improve things

Enough of intro…

612 in CG!
  Enhance locality in RT to avoid memory issues
  Take this! An image having 10 million primitives with 400 MB of geometry
    Involved 2 GB of I/O! Took 5 hours of rendering with RT!

First paper in the two-lecture series: Pharr et al. (SIGGRAPH ‘97)
  Lazy creation of texture and geometry to manage scene complexity: caching in main memory
  Increase locality of reference by dynamically reordering the rendering computation

Essential Ideas

Statically reorder geometry into voxels of triangles
  Remember voxels? Uniform 3-D cubes enclosing some geometry

Maintain geometry cache

Texture data pre-filtered and cached

Application-level caching

Process one bunch of rays after another (from a queue)
  Rays partitioned into coherent groups
  Calculate illumination wherever rays intersect, possibly spawn new ones and queue them

Terminate when all rays are finished

Block diagram of system

Scheduling of rays - Reordering

Goal: to process rays in a particular order so as to
  Minimize cache misses (here, page faults)
  Advance computation towards completion

Each queued ray must be independent of the result or state of other rays

Take advantage of the structure of the illumination computation

Decompose Computation

Illumination computation at point x in direction $\omega_r$ is of the form:
  $L_o(x, \omega_r) = L_e(x, \omega_r) + \sum_i W(x, \omega_i, \omega_r, \Theta_i)\, L_i(x, \omega_i)$

Where
  $L_o$ = outgoing radiance
  $L_e$ = emitted radiance
  $L_i$ = incoming radiance through direction $\omega_i$ hitting at x
  $\Theta_i$ = angle between $\omega_i$ and the surface normal at x
  W is a factor that depends on the material at x and on whether there is reflection or refraction

We can successively multiply the W’s as we go down the tree!
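To see why the weights multiply, the same expression can be substituted for an incoming radiance term. The following is a sketch under an assumption not spelled out on the slide: the radiance arriving at x along direction i is taken to be the outgoing radiance at the next hit point, written $x_i$ here, and primes mark weights evaluated at $x_i$. The shorthand $W_i$ and $W'_j$ is introduced only for this illustration.

```latex
\begin{aligned}
L_i(x,\omega_i) &= L_o(x_i,-\omega_i)
  = L_e(x_i,-\omega_i) + \sum_j W'_j \, L_j(x_i,\omega_j) \\
L_o(x,\omega_r) &= L_e(x,\omega_r)
  + \sum_i W_i \, L_e(x_i,-\omega_i)
  + \sum_i \sum_j W_i \, W'_j \, L_j(x_i,\omega_j) \\
\text{with}\quad W_i &= W(x,\omega_i,\omega_r,\Theta_i), \qquad
W'_j = W(x_i,\omega_j,-\omega_i,\Theta_j)
\end{aligned}
```

Every term is an emitted radiance scaled by the product of the weights along its path from the eye, so each ray's contribution can be added to its pixel independently of the other rays.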

Decompose Computation

Each ray is associated with a weight and a source pixel location

A spawned ray’s weight is multiplied by the weight of its parent ray

If a ray hits a light source, the emitted radiance is multiplied by the ray’s weight and the result is added to the source pixel

[Figure: ray tree with accumulated weights W1, W1.W2, W3, W3.W4, W3.W5]
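The bookkeeping described on this slide can be sketched as a queue of (pixel, weight, ray) records. This is a toy model under stated assumptions: shade is a hypothetical callback that returns the emitted radiance seen by a ray plus a list of (W, child_ray) pairs, and radiance is a single number rather than an RGB triple.

```python
from collections import deque

def render_with_weights(primary_rays, shade):
    """primary_rays: list of (pixel, ray).
    shade(ray) -> (emitted, [(w, child_ray), ...]) is a hypothetical callback."""
    image = {}
    queue = deque((pixel, 1.0, ray) for pixel, ray in primary_rays)
    while queue:
        pixel, weight, ray = queue.popleft()
        emitted, children = shade(ray)
        # The contribution depends only on this ray's own record.
        image[pixel] = image.get(pixel, 0.0) + weight * emitted
        for w, child in children:
            # A spawned ray carries the product of the weights above it.
            queue.append((pixel, weight * w, child))
    return image
```

Because each record is self-contained, the queue can be drained in any order without changing the image, which is what leaves the scheduler free to reorder rays.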

Ray Grouping

Closely spaced rays are likely to intersect closely spaced geometry primitives

Scene uniformly divided into another grid of voxels: the scheduling grid

Each scheduling voxel has the following state
  Queue of rays passing through it
  The geometry voxels overlapping it

Voxel with the highest ratio of benefit to cost is chosen by the scheduler

For each ray in the queue, test for intersection in the voxel
  If yes, calculate illumination and spawn new rays
  Else, queue it up in the next voxel
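The loop on this slide can be sketched with one ray queue per scheduling voxel. Everything here is a hypothetical stand-in for illustration: pick_voxel plays the scheduler, intersect returns a hit (or None) plus any spawned (voxel, ray) pairs, and next_voxel returns the voxel a missed ray enters next, or None when the ray leaves the scene.

```python
from collections import defaultdict, deque

def process_scheduling_grid(initial, pick_voxel, intersect, next_voxel):
    """initial: iterable of (voxel, ray). Returns a list of (ray, hit) pairs."""
    queues = defaultdict(deque)
    for voxel, ray in initial:
        queues[voxel].append(ray)
    hits = []
    while queues:                       # terminate when all rays are finished
        voxel = pick_voxel(queues)      # scheduler decision
        rays = queues.pop(voxel)        # take this voxel's whole queue
        for ray in rays:
            hit, spawned = intersect(voxel, ray)
            if hit is not None:
                hits.append((ray, hit))
                for v, r in spawned:    # queue newly spawned rays
                    queues[v].append(r)
            else:
                nxt = next_voxel(voxel, ray)
                if nxt is not None:     # forward the ray to the next voxel
                    queues[nxt].append(ray)
    return hits
```

All rays queued in a voxel are tested against the same geometry back to back, which is where the locality comes from.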

The algorithm

Issues: Size of scheduling voxel

Scheduling voxel: small enough for the overlapping geometry voxels to fit into memory

Non-uniform geometry: can use adaptive subdivision (octree)

Avoid geometry cache misses (page faults)
  Schedule voxels that have all their geometry in cache
  Defer processing rays whose geometry is not in cache
  That leaves lots of waiting rays: have a ray cache as well

Issues: Voxel Scheduling

Choose the voxel with the highest ratio of benefit to cost

Cost:
  How much overlapping geometry is not in cache
  Difficult to estimate a priori if access is lazy
  Reduce cost a lot (by 90%) if all geometry is in cache

Benefit:
  How much progress towards completion
  Number of rays, their weights?
  The weighted sum?
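One possible reading of this slide as code is shown below. The cost formula (geometry to process plus geometry to load) and the use of summed ray weights as the benefit are assumptions chosen for illustration; only the 90% reduction for fully cached voxels comes from the slide.

```python
def voxel_cost(total_geometry, missing_geometry):
    """Hypothetical cost: geometry to process plus geometry still to be loaded."""
    cost = total_geometry + missing_geometry
    if missing_geometry == 0:
        cost *= 0.1                     # "reduce cost by 90%" when fully cached
    return max(cost, 1e-9)              # guard against division by zero

def voxel_benefit(ray_weights):
    """Hypothetical benefit: summed weights of the rays queued in the voxel."""
    return sum(ray_weights)

def choose_voxel(voxels):
    """voxels: dict name -> (ray_weights, total_geometry, missing_geometry)."""
    def ratio(name):
        weights, total, missing = voxels[name]
        return voxel_benefit(weights) / voxel_cost(total, missing)
    return max(voxels, key=ratio)
```

With this model a voxel holding a few rays but fully cached geometry can outrank a voxel holding many rays whose geometry must first be read from disk.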

Scene cache

Geometry represented as meshes of triangles
  Even spheres, cubes..!
  For ease of subdividing into voxels
  Only one kind of intersection test

Storage of geometry:
  Triangle meshes stored as voxels on disk
  Tessellated patches also as triangles
  Procedurally generated geometry
  Texture-based data stored as extra geometry

Scene cache

Size of geometry cache in main memory
  Make volume of voxel roughly equal to size of block
  A few thousand triangles per voxel
  Divided into sub-voxels for ray intersection acceleration

Remember voxels may not occupy the same amount of space
  To avoid fragmentation, special allocation routines were written
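A geometry cache with lazy loading can be sketched as follows. The least-recently-used eviction policy, the byte-based capacity and the load_voxel callback are assumptions made for the example; the slides only say that geometry is cached in main memory and created lazily.

```python
from collections import OrderedDict

class GeometryCache:
    """Lazy-loading cache of geometry voxels with LRU eviction (assumed policy)."""

    def __init__(self, capacity_bytes, load_voxel):
        self.capacity = capacity_bytes
        self.load_voxel = load_voxel    # hypothetical: voxel_id -> (triangles, size_bytes)
        self.entries = OrderedDict()    # voxel_id -> (triangles, size_bytes)
        self.used = 0
        self.misses = 0

    def get(self, voxel_id):
        if voxel_id in self.entries:
            self.entries.move_to_end(voxel_id)      # mark as most recently used
            return self.entries[voxel_id][0]
        self.misses += 1                             # geometry is created on first use
        triangles, size = self.load_voxel(voxel_id)
        while self.entries and self.used + size > self.capacity:
            _, (_, old_size) = self.entries.popitem(last=False)  # evict the oldest
            self.used -= old_size
        self.entries[voxel_id] = (triangles, size)
        self.used += size
        return triangles
```

Voxels that no ray ever visits are never loaded, which is the saving measured in the "caching but no reordering" experiments.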

Texture Cache

Similar to one proposed earlier by Peachey

Texture data pre-filtered into a set of multi-resolution images
  Choose the image depending on the resolution at which the texture is needed
  Called mip-maps

Shading calculation of a pixel makes a small number of accesses to some local part of the texture
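A minimal sketch of the mip-map idea, assuming square power-of-two textures stored as lists of rows, a 2x2 box filter for pre-filtering, and level selection by the number of texels covered per pixel edge. These choices and both function names are illustrative only.

```python
import math

def build_mip_chain(image):
    """Return [level0, level1, ...], halving the resolution by 2x2 averaging."""
    levels = [image]
    while len(levels[-1]) > 1:
        src = levels[-1]
        half = len(src) // 2
        levels.append([
            [(src[2 * y][2 * x] + src[2 * y][2 * x + 1] +
              src[2 * y + 1][2 * x] + src[2 * y + 1][2 * x + 1]) / 4.0
             for x in range(half)]
            for y in range(half)
        ])
    return levels

def mip_level(texels_per_pixel, num_levels):
    """Pick the pre-filtered level whose texel size best matches one pixel."""
    if texels_per_pixel <= 1.0:
        return 0
    level = int(math.floor(math.log2(texels_per_pixel)))
    return min(level, num_levels - 1)
```

Coarser levels are far smaller than the full-resolution image, so a distant surface touches only a small, local part of the texture data.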

Block Diagram

Results

Experiments performed on a 190 MHz MIPS R10000 processor with 1 GB of memory

I/O buffering disabled to increase memory constraints

Scenes occupy between 431 MB and 1.9 GB

Rendered scenes – Tree by lake

Maximum of 3.3 million triangles for the tree

Terrain and lake used displacement mapping: more triangles

Total of 9.6 million primitives: 440 MB needed

677 X 288 resolution

Rendered scenes – Office building

Very complex scene with dense occlusions

Office building has two floors with four offices

46.4 million primitives with 1.9 GB of memory

Lit by sunlight and some lights in the ceiling

672 X 384 resolution

Rendered Scenes - Cathedral

Base of 11K triangles; with displacement map: 5.1 million primitives! A total of 431 MB

576 X 864 resolution

1495 texture maps of 116 MB!

Simple light source

Caching but no reordering

Unlimited cache size but with lazy loading

Both memory and running time costs decrease

22% memory use reduction in the Cathedral case (geometry that is never accessed is never loaded)

Only 18% of the total scene accessed in the indoor case

Obvious result!

Caching but no reordering

Performance of geometry caching when DFS ray tracing is used

Limited cache size

Performance decrease not very significant

Scheduling & Reordering

Rendering Lake scene

Cache size of 10% of maximum gives orders of magnitude performance gain

Ray cache of 100K rays (6% of total number of rays)

80% of scene memory

Scheduling and Reordering

Lake scene rendering

Without reordering and 325 MB of geometry cache, 2.1 GB of I/O!

With reordering and 50 MB cache, 938 MB in total

Accessed 15-20 times

Average access = 8 times

Conclusions

Enhance locality in RT through caching and reordering

Gives orders of magnitude performance gain

Algorithm performs well!
  Ideas not very seminal... but the work is!

Future work: experiments could be redone on the IBM Cell processor to confirm the bottlenecks
  Designed for PlayStation 3
  4.6 GHz specialized graphics processors…

Next lecture: a static method to perform data grouping