z-buffer optimizations
Patrick Cozzi, University of Pennsylvania, CIS 565 - Fall 2013
![Page 1: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/1.jpg)
Z-Buffer Optimizations
Patrick Cozzi
University of Pennsylvania
CIS 565 - Fall 2013
![Page 2: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/2.jpg)
Announcements
Student work on course website
Twitter
Eric Haines’ talk
![Page 3: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/3.jpg)
Announcements
Project 4 demos today
Project 5
  Due tomorrow
  Demos on Monday
Project 6 – Deferred shading
  Released Friday. Due next Friday 11/15
Hackathon
  Next Saturday 11/16, 6pm-12am, SIG lab
![Page 4: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/4.jpg)
Overview
Hardware: Early-Z
Software: Front-to-Back Sorting
Hardware: Double-Speed Z-Only
Software: Early-Z Pass
Software: Deferred Shading
Hardware: Buffer Compression
Hardware: Fast Clear
Hardware: Z-Cull
Future: Programmable Culling Unit
![Page 5: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/5.jpg)
Z-Buffer
Also called Depth Buffer
Fragment vs Pixel
Alternatives: Painter’s, Ray Casting, etc.
![Page 6: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/6.jpg)
Z-Buffer History
“Brute-force approach”
“Ridiculously expensive”
Sutherland, Sproull, and Schumacker, “A Characterization of Ten Hidden-Surface Algorithms”, 1974
![Page 7: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/7.jpg)
Z-Buffer Quiz
10 triangles cover a pixel. Rendering these in random order with a Z-buffer, what is the average number of times the pixel’s z-value is written?
See Subtle Tools Slides: erich.realtimerendering.com
![Page 8: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/8.jpg)
Z-Buffer Quiz
1st triangle writes depth
2nd triangle has 1/2 chance of writing depth
3rd triangle has 1/3 chance of writing depth
1 + 1/2 + 1/3 + … + 1/10 = 2.9289…
See Subtle Tools Slides: erich.realtimerendering.com
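The sum on this slide can be checked directly. The following is a minimal sketch, assuming a less-than depth test and distinct depths; the function name `expected_depth_writes` is made up for illustration.

```python
from fractions import Fraction


def expected_depth_writes(n):
    """Expected depth writes when n triangles cover a pixel in random order."""
    # The k-th triangle writes depth only if it is the nearest of the first k
    # triangles, which happens with probability 1/k.
    return sum(Fraction(1, k) for k in range(1, n + 1))


writes = expected_depth_writes(10)
print(writes, float(writes))
```

Using `Fraction` keeps the sum exact, so the decimal value is only rounded when it is printed.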
![Page 9: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/9.jpg)
Z-Buffer Quiz
See Subtle Tools Slides: erich.realtimerendering.com
Harmonic Series
# Triangles    # Depth Writes
1              1
4              2.08
11             3.02
31             4.03
83             5
12,367         10
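The triangle counts in this table are the points where the harmonic sum first reaches each whole number of depth writes. A small sketch that recovers them, with the hypothetical helper name `triangles_for_writes`:

```python
def triangles_for_writes(target):
    """Smallest triangle count whose expected depth writes reach target."""
    total = 0.0
    n = 0
    while total < target:
        n += 1
        total += 1.0 / n   # add the 1/n chance that triangle n writes depth
    return n


for target in (1, 2, 3, 4, 5, 10):
    print(target, triangles_for_writes(target))
```

The growth is logarithmic: going from 5 to 10 expected writes takes the triangle count from 83 to 12,367.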
![Page 10: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/10.jpg)
Z-Test in the Pipeline
When is the Z-Test?
[Diagram: Fragment Shader → Z-Test, or Z-Test → Fragment Shader]
![Page 11: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/11.jpg)
Early-Z
Avoid expensive fragment shaders
[Diagram: Z-Test → Fragment Shader]
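To see what Early-Z saves, one can count fragment shader invocations for a single pixel with the test placed after or before shading. This is a sketch only, assuming a less-than depth test and a shader that neither discards nor writes depth; `shade_counts` is an illustrative name, not a GPU API.

```python
def shade_counts(depths):
    """Return (late_z, early_z) fragment shader invocations for one pixel.

    depths: fragment depths in draw order; smaller means nearer.
    """
    late_z = len(depths)        # late Z: every fragment is shaded, then tested
    early_z = 0
    nearest = float("inf")      # freshly cleared depth value
    for z in depths:
        if z < nearest:         # early Z: test first
            early_z += 1        # only fragments that pass are shaded
            nearest = z
    return late_z, early_z


print(shade_counts([0.9, 0.5, 0.7, 0.2]))  # back-to-front-ish order
print(shade_counts([0.2, 0.5, 0.7, 0.9]))  # front-to-back order
```

Early-Z helps most when near fragments arrive first, since every later fragment at that pixel is then rejected before shading.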
![Page 12: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/12.jpg)
Early-Z
Automatically enabled on GeForce (8?) unless [1]
  Fragment shader discards or writes depth
  Depth writes and alpha-test [2] are enabled
Fine-grained as opposed to Z-Cull
ATI: “Top of the Pipe Z Reject”
[Diagram: Z-Test → Fragment Shader]
[1] See NVIDIA GPU Programming Guide for exact details
[2] Alpha-test is deprecated in GL 3
![Page 13: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/13.jpg)
Front-to-Back Sorting
Utilize Early-Z for opaque objects
Old hardware still has fewer z-buffer writes
CPU overhead. Need efficient sorting
  Bucket Sort
  Octree
Conflicts with state sorting [1]
[Diagram: depth buckets 0 - 0.25, 0.25 - 0.5, 0.5 - 0.75, 0.75 - 1]
[1] For example, see http://home.comcast.net/~tom_forsyth/blog.wiki.html#%5B%5BRenderstate%20change%20costs%5D%5D
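A coarse bucket sort like the one in the diagram could look like the sketch below. It assumes each object carries a single normalized depth in [0, 1]; the object names and the function `bucket_sort_front_to_back` are invented for the example.

```python
def bucket_sort_front_to_back(objects, num_buckets=4):
    """Coarsely order (name, depth) pairs by depth in [0, 1], nearest first."""
    buckets = [[] for _ in range(num_buckets)]
    for name, depth in objects:
        # Clamp so that depth == 1.0 falls into the last bucket.
        index = min(int(depth * num_buckets), num_buckets - 1)
        buckets[index].append((name, depth))
    # Objects inside a bucket keep submission order; no exact sort is done.
    return [obj for bucket in buckets for obj in bucket]


scene = [("tree", 0.8), ("player", 0.1), ("wall", 0.55), ("rock", 0.3), ("sky", 1.0)]
draw_order = [name for name, _ in bucket_sort_front_to_back(scene)]
print(draw_order)
```

The sort runs in one linear pass, which keeps CPU overhead low, and an approximate order is enough because the depth test still resolves visibility exactly.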
![Page 14: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/14.jpg)
Double Speed Z-Only
GeForce FX and later render at double speed when writing only depth or stencil
Enabled when
  Color writes are disabled
  Fragment shader does not discard or write depth
  Alpha-test is disabled
See NVIDIA GPU Programming Guide for exact details
![Page 15: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/15.jpg)
Early-Z Pass
Software technique to utilize Early-Z and Double Speed Z-Only
Two passes
  Render depth only. “Lay down depth”
    Double Speed Z-Only
  Render with full shaders and no depth writes
    Early-Z (and Z-Cull)
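The two passes can be imitated on the CPU to show that each pixel is shaded only once. This is a simplified sketch: the equality comparison in the second pass is an assumption (a less-or-equal test is also common), and `render_with_depth_prepass` is a hypothetical name.

```python
def render_with_depth_prepass(fragments, width):
    """fragments: (x, z, color) tuples in draw order.

    Returns (colors, shader_runs) for a one-row framebuffer.
    """
    depth = [float("inf")] * width
    # Pass 1: lay down depth. No color writes and no shading work.
    for x, z, _ in fragments:
        if z < depth[x]:
            depth[x] = z
    # Pass 2: full shading with depth writes off. Only fragments that
    # match the stored depth survive the test (assumed EQUAL comparison).
    colors = [None] * width
    shader_runs = 0
    for x, z, color in fragments:
        if z == depth[x]:
            shader_runs += 1
            colors[x] = color
    return colors, shader_runs


frags = [(0, 0.9, "red"), (0, 0.4, "green"), (1, 0.5, "blue"), (0, 0.6, "white")]
print(render_with_depth_prepass(frags, 2))
```

The cost is that the geometry is submitted twice, which is why the technique pays off mainly when fragment shading is the bottleneck.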
![Page 16: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/16.jpg)
Early-Z Pass
Optimizations
  Depth pass
    Coarse sort front-to-back
    Only render major occluders
  Color pass
    Sort by state
    Render depth just for non-occluders
![Page 17: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/17.jpg)
Deferred Shading
Similar to Early-Z Pass
  1st Pass: Visibility tests
  2nd Pass: Shading
Different than Early-Z Pass
  Geometry is only transformed once
![Page 18: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/18.jpg)
Deferred Shading
1st Pass
  Render geometry into G-Buffers: Diffuse, Normals, Depth
Images from Tabula Rasa. See Resources.
![Page 19: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/19.jpg)
Deferred Shading
2nd Pass
  Shading == post processing effects
  Render full screen quads that read from G-Buffers
Geometry is no longer needed
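A toy version of the two passes is sketched below. The G-buffer layout (depth, diffuse, normal per pixel) follows the slides, but the Lambert lighting term, the one-row framebuffer, and the function names are assumptions made for illustration.

```python
def geometry_pass(fragments, width):
    """Pass 1: keep the nearest fragment's attributes per pixel (the G-buffer)."""
    gbuffer = [{"depth": float("inf"), "diffuse": None, "normal": None}
               for _ in range(width)]
    for x, z, diffuse, normal in fragments:
        if z < gbuffer[x]["depth"]:
            gbuffer[x] = {"depth": z, "diffuse": diffuse, "normal": normal}
    return gbuffer


def shading_pass(gbuffer, light_dir):
    """Pass 2: shade each pixel once from the G-buffer; geometry is not used."""
    image = []
    for texel in gbuffer:
        if texel["diffuse"] is None:          # nothing was drawn at this pixel
            image.append((0.0, 0.0, 0.0))
            continue
        # Hypothetical lighting model: simple Lambert term, clamped at zero.
        n_dot_l = max(0.0, sum(n * l for n, l in zip(texel["normal"], light_dir)))
        image.append(tuple(c * n_dot_l for c in texel["diffuse"]))
    return image


frags = [
    (0, 0.8, (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)),   # far red surface
    (0, 0.3, (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)),   # near green surface, same pixel
]
image = shading_pass(geometry_pass(frags, 2), (0.0, 0.0, 1.0))
print(image)
```

The shading loop runs once per pixel regardless of how many fragments overlapped it, at the price of storing every attribute the lighting needs.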
![Page 20: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/20.jpg)
Deferred Shading
Light Accumulation Result
Image from Tabula Rasa. See Resources.
![Page 21: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/21.jpg)
Deferred Shading
Eliminates shading fragments that fail Z-Test
Increases video memory requirement
How does it affect bandwidth?
![Page 22: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/22.jpg)
Buffer Compression
Reduce depth buffer bandwidth
Generally does not reduce memory usage of actual depth buffer
Same architecture applies to other buffers, e.g. color and stencil
![Page 23: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/23.jpg)
Buffer Compression
Tile Table: Status for n×n tile of depths, e.g. n=8
  [state, zmin, zmax]
  state is either compressed, uncompressed, or cleared
[Figure: example tile of depth values between 0.1 and 0.8, with Tile Table entry [uncompressed, 0.1, 0.8]]
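Building a Tile Table entry amounts to taking the minimum and maximum over the tile. In the sketch below the 4×4 depth values are invented to match the 0.1 to 0.8 range of the slide's example, and `tile_table_entry` is a hypothetical helper.

```python
def tile_table_entry(tile, state="uncompressed"):
    """Build the [state, zmin, zmax] entry for one n x n tile of depths."""
    depths = [z for row in tile for z in row]
    return [state, min(depths), max(depths)]


# Made-up 4x4 tile; real hardware would use e.g. 8x8.
tile = [
    [0.1, 0.5, 0.5, 0.1],
    [0.5, 0.5, 0.8, 0.8],
    [0.5, 0.5, 0.8, 0.8],
    [0.1, 0.5, 0.5, 0.1],
]
entry = tile_table_entry(tile)
print(entry)
```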
![Page 24: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/24.jpg)
Buffer Compression
[Diagram labels: Tile Table; Decompress; Compress; Compressed Z-Buffer; Rasterizer; updated z-values; updated z-max; n×n uncompressed z-values; [zmin, zmax]]
![Page 25: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/25.jpg)
Buffer Compression
Depth Buffer Write
  Rasterizer modifies copy of uncompressed tile
  Tile is losslessly compressed (if possible) and sent to actual depth buffer
  Update Tile Table
    zmin and zmax
    status: compressed or uncompressed
![Page 26: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/26.jpg)
Buffer Compression
Depth Buffer Read
  Tile Status
    Uncompressed: Send tile
    Compressed: Decompress and send tile
    Cleared: See Fast Clear
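The write and read paths can be sketched with a small class. The compression scheme here is a deliberately trivial stand-in (a tile whose depths are all equal is stored as one value); real hardware uses other lossless encodings, and every name in the block is hypothetical.

```python
class TiledDepthBuffer:
    """Toy depth buffer with a tile table of [state, zmin, zmax] entries."""

    def __init__(self, num_tiles, depths_per_tile, clear_value=1.0):
        self.depths_per_tile = depths_per_tile
        self.clear_value = clear_value
        self.memory = [None] * num_tiles      # stands in for the actual depth buffer
        self.table = [["cleared", clear_value, clear_value]
                      for _ in range(num_tiles)]

    def write_tile(self, index, depths):
        # Stand-in lossless scheme: identical depths collapse to one value.
        if len(set(depths)) == 1:
            self.memory[index] = depths[0]
            state = "compressed"
        else:
            self.memory[index] = list(depths)
            state = "uncompressed"
        self.table[index] = [state, min(depths), max(depths)]

    def read_tile(self, index):
        state = self.table[index][0]
        if state == "cleared":
            return [self.clear_value] * self.depths_per_tile   # memory not read
        if state == "compressed":
            return [self.memory[index]] * self.depths_per_tile  # decompress
        return list(self.memory[index])                         # send as stored


buf = TiledDepthBuffer(num_tiles=3, depths_per_tile=4)
buf.write_tile(0, [0.5, 0.5, 0.5, 0.5])
buf.write_tile(1, [0.1, 0.5, 0.5, 0.8])
print(buf.table)
```

Note that the uncompressed path still reserves full storage per tile, which matches the point that compression saves bandwidth rather than memory.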
![Page 27: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/27.jpg)
Buffer Compression
ATI: Writing depth interferes with compression
  Render those objects last
Minimize far/near ratio
  Improves Zmin, Zmax precision
![Page 28: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/28.jpg)
Fast Clear
Doesn’t touch depth buffer
glClear sets state of each tile to cleared
When the rasterizer reads a cleared buffer
  A tile filled with GL_DEPTH_CLEAR_VALUE is sent
  Depth buffer is not accessed
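A rough way to see why this is fast is to count memory traffic. The sketch below is an assumption-laden model, not a description of any particular GPU: `memory_accesses` counts depth values moved to or from buffer memory, and the class name is invented.

```python
class FastClearDepthBuffer:
    """Toy model that counts depth values read from or written to memory."""

    def __init__(self, num_tiles, depths_per_tile, clear_value=1.0):
        self.depths_per_tile = depths_per_tile
        self.clear_value = clear_value
        self.tiles = [[clear_value] * depths_per_tile for _ in range(num_tiles)]
        self.state = ["cleared"] * num_tiles
        self.memory_accesses = 0

    def clear(self):
        # Only the per-tile state changes; stored depths are left stale.
        self.state = ["cleared"] * len(self.tiles)

    def write_tile(self, index, depths):
        self.tiles[index] = list(depths)
        self.state[index] = "uncompressed"
        self.memory_accesses += len(depths)

    def read_tile(self, index):
        if self.state[index] == "cleared":
            # Synthesize a tile of the clear value without touching memory.
            return [self.clear_value] * self.depths_per_tile
        self.memory_accesses += self.depths_per_tile
        return list(self.tiles[index])


buf = FastClearDepthBuffer(num_tiles=2, depths_per_tile=4)
buf.write_tile(0, [0.2, 0.3, 0.4, 0.5])
buf.clear()
print(buf.read_tile(0), buf.memory_accesses)
```

In this model the clear and the following read cost no memory accesses at all; only the earlier tile write is counted.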
![Page 29: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/29.jpg)
Fast Clear
Use glClear
  Not full screen quads
  Not the skybox
  No "one frame positive, one frame negative" trick
Clear stencil together with depth – they are stored in the same buffer
![Page 30: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/30.jpg)
Z-Cull
Cull blocks of fragments before shading
Coarse-grained as opposed to Early-Z
Also called Hierarchical Z
![Page 31: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/31.jpg)
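Z-Cull
Zmax-Culling
  Rasterizer fetches zmax for each tile it processes
  Compute $z^{\text{triangle}}_{\min}$ for a triangle
  Culled if $z^{\text{triangle}}_{\min} > z_{\max}$
[Diagram: Z-Cull stage ahead of the Fragment Shader; a triangle is culled when $z^{\text{triangle}}_{\min}$ > tile’s $z_{\max}$]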
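The Zmax test is a single comparison per tile. The sketch below assumes a less-than depth test and bounds the triangle's nearest depth by its nearest vertex, which is conservative for the part of the triangle inside a tile; `zmax_cull` is an illustrative name.

```python
def zmax_cull(triangle_depths, tile_zmax):
    """True if the triangle lies entirely behind everything stored in the tile."""
    triangle_zmin = min(triangle_depths)   # nearest vertex bounds the triangle
    return triangle_zmin > tile_zmax


print(zmax_cull([0.90, 0.95, 0.85], tile_zmax=0.8))  # hidden: skip the whole tile
print(zmax_cull([0.70, 0.95, 0.85], tile_zmax=0.8))  # may be visible: keep
```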
![Page 32: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/32.jpg)
Z-Cull
Zmin-Culling
  Support different depth tests
  Avoid depth buffer reads
  If triangle is in front of tile, depth tests for each pixel are unnecessary
[Diagram: Z-Cull stage ahead of the Fragment Shader; per-pixel depth tests are skipped when $z^{\text{triangle}}_{\max}$ < tile’s $z_{\min}$]
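Combining both bounds gives a three-way decision per tile. This is a sketch under the assumption of a less-than depth test, with vertex depths used as conservative bounds; the function name and return labels are made up.

```python
def classify_triangle(triangle_depths, tile_zmin, tile_zmax):
    """Decide how a tile should treat a triangle using only [zmin, zmax]."""
    tri_zmin = min(triangle_depths)
    tri_zmax = max(triangle_depths)
    if tri_zmin > tile_zmax:
        return "culled"            # Zmax-culling: entirely behind the tile
    if tri_zmax < tile_zmin:
        return "accepted"          # Zmin-culling: entirely in front, no depth reads
    return "per-pixel test"        # ranges overlap: read depths and test each pixel


print(classify_triangle([0.90, 0.95, 0.85], tile_zmin=0.3, tile_zmax=0.8))
print(classify_triangle([0.10, 0.20, 0.15], tile_zmin=0.3, tile_zmax=0.8))
print(classify_triangle([0.40, 0.90, 0.50], tile_zmin=0.3, tile_zmax=0.8))
```

Only the third case needs the tile's depth values, so the first two avoid depth buffer reads entirely.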
![Page 33: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/33.jpg)
Z-Cull
Automatically enabled on GeForce (6?) cards unless
  glClear isn’t used
  Fragment shader writes depth (or discards?)
  Direction of depth test is changed
ATI: avoid = and != depth compares on old cards
ATI: avoid stencil fail and stencil depth fail operations
Less efficient when depth varies a lot within a few pixels
See NVIDIA GPU Programming Guide for exact details
![Page 34: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/34.jpg)
ATI HyperZ
HyperZ = Early Z + Z Compression + Fast Z clear + Z-Cull
See ATI's Depth-in-depth
![Page 35: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/35.jpg)
Programmable Culling Unit
Cull before fragment shader even if the shader writes depth or discards
Run part of shader over an entire tile to determine lower bound z value
Hasselgren and Akenine-Möller, “PCU: The Programmable Culling Unit,” 2007
![Page 36: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/36.jpg)
Summary
What was once “ridiculously expensive” is now the primary visible surface algorithm for rasterization
![Page 37: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/37.jpg)
Resources
www.realtimerendering.com
Sections 7.9.2 and 18.3
![Page 38: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/38.jpg)
Resources
developer.nvidia.com/object/gpu_programming_guide.html
GeForce 8 Guide: sections 3.4.9, 3.6, and 4.8
GeForce 7 Guide: section 3.6
![Page 39: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/39.jpg)
Resources
http://developer.amd.com/media/gpu_assets/Depth_in-depth.pdf
Depth In-depth
![Page 40: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/40.jpg)
Resources
http://www.graphicshardware.org/previous/www_2000/presentations/ATIHot3D.pdf
ATI Radeon HyperZ Technology
Steve Morein
![Page 41: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/41.jpg)
Resources
http://ati.amd.com/developer/dx9/ATI-DX9_Optimization.pdf
Performance Optimization Techniques for ATI Graphics Hardware with DirectX® 9.0
Guennadi Riguer
Sections 6.5 and 8
![Page 42: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/42.jpg)
Resources
developer.nvidia.com/object/gpu_gems_home.html
Chapter 28: Graphics Pipeline Performance
![Page 43: Z-Buffer Optimizations](https://reader033.vdocuments.site/reader033/viewer/2022050809/56816252550346895dd29af5/html5/thumbnails/43.jpg)
Resources
developer.nvidia.com/object/gpu-gems-3.html
Chapter 19: Deferred Shading in Tabula Rasa