Low-Overhead Rendering with Direct3D
Evan Hart Principal Engineer - NVIDIA
Ground Rules
● No DX9
● Need to move fast
● Big topic in 30 minutes
● Assuming experienced audience
● Everything is a tradeoff
● These are suggestions if an app is SW limited
Why care about overhead?
● Overhead translates to draws per second
● Everyone wants more draws
● More detailed/interesting world
● Further draw distance
● More shadows
Do I have a problem?
● Optimizing the wrong thing is harmful
● All games are CPU limited somewhere
  ● Also likely GPU-bound somewhere
● Need to find the tradeoff for your game
● Genre/style can influence balance
  ● Online / strategy can be worse
Game / Driver Interaction
[Figure: Game Thread and Driver Thread, with a repeating sequence of Draw, SetSRV, SetCB, and SetVB commands between them]
Game/Driver Interaction
● Driver thread does most real API work
  ● Not easily visible in app threads
● App threads just queue commands
  ● Typically very fast
● Deferred context is slightly different
  ● Driver work is done on app thread
  ● Work is kept minimal though
State of the World
● 5 Million draws / second is feasible
● >50k draws / scene
● 290K draws / second w/ state changes
● Change most relevant state
● Drawing is cheap
● Object references are expensive
“Full” Draw Call
Ctx->IASetInputLayout( … );
Ctx->IASetVertexBuffers( … );
Ctx->IASetIndexBuffer( … );
Ctx->VSSetShader( … );
Ctx->Map( … );
Ctx->VSSetConstantBuffers( … );
Ctx->PSSetShader( … );
Ctx->Map( … );
Ctx->PSSetConstantBuffers( … );
Ctx->PSSetShaderResources( … );
Ctx->PSSetSamplers( … );
Ctx->DrawIndexed( … );
Cost of the “Full” Draw Call
● Binding 5+ GPU memory objects
● Vertices, Indices, Constants, Textures
● Two memory management operations
● Map + DISCARD
● Whole bunch of indirections and cache misses
User costs for “Full” draw call
● ~14 COM calls
● Copying of data tied to buffer update
● Gathering of any data needed to make the calls
● Map probably looks the most expensive
● Requires driver to provide an immediate response
Driver costs for a “Full” draw
● Every object ref potentially requires
● Pointer indirection
● Object lifetime management
● Object residency management
● Map + Discard requires
● Rename active object
● Manage memory pool
What can be done?
● Parallelize the load
● Reduce non-draw functions
● Reduce draw calls
  ● Seems counter-intuitive to our goals
Deferred Contexts
● Can allow the load to be spread out
● Biggest help is Map related to constants
● 30-50% faster isn’t unreasonable
● On drivers that support them natively
● Not a panacea as the driver thread still does a lot of serial work
Reducing Draw Setup
● Übershaders
● Optimize Constant Buffers
● Sub-allocate Vertex and Index Buffers
● Managing SRVs
● Managing Samplers
Übershaders
● Changing shaders snowballs costs
● Shader is connected to everything
● Conditionals are fairly cheap
● if (hasSpec) / for( numLayers )
● Can have a negative impact on GPU costs
● Secondary factors like register pressure
Reducing Constant Buffer Costs
[Figure: Vertex Shader and Pixel Shader, with three constant buffers: Per-Frame Constants, Per-View Constants, Per-Draw Constants]
Shared Constant Buffers DX 11.1
PSSetConstantBuffers1(
// Parameters from older methods
UINT StartSlot,
UINT NumBuffers,
ID3D11Buffer *const *Buffers,
// Offset in number of constants (16 bytes each); must be a multiple of 16
const UINT* FirstConstant,
// Size of the block in constants ( Num % 16 == 0)
const UINT* NumConstants
)
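The unit rules in the comments above can be sketched as plain arithmetic. This is a minimal sketch, not part of Direct3D: `ConstantRange`, `ConstantsForBytes`, and `RangeForDraw` are hypothetical helper names, and the sketch assumes every draw in a shared buffer uses the same block size.

```cpp
#include <cassert>
#include <cstdint>

// One shader constant is a float4, i.e. 16 bytes.
constexpr uint32_t kBytesPerConstant = 16;
// Offsets and sizes are given in constants and must be multiples of 16.
constexpr uint32_t kConstantGranularity = 16;

// Hypothetical helper type: the two per-buffer values passed to the call.
struct ConstantRange {
    uint32_t firstConstant;  // value for FirstConstant
    uint32_t numConstants;   // value for NumConstants
};

// Convert a byte size to constants, rounded up to a whole 16-constant block.
constexpr uint32_t ConstantsForBytes(uint32_t bytes) {
    uint32_t constants = (bytes + kBytesPerConstant - 1) / kBytesPerConstant;
    return (constants + kConstantGranularity - 1) / kConstantGranularity *
           kConstantGranularity;
}

// Range of draw number drawIndex when all draws share one block size.
constexpr ConstantRange RangeForDraw(uint32_t drawIndex, uint32_t bytesPerDraw) {
    uint32_t stride = ConstantsForBytes(bytesPerDraw);
    return ConstantRange{drawIndex * stride, stride};
}
```

For example, an 80-byte per-draw block (5 constants) rounds up to 16 constants, so draw 3 would start at constant 48.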
Shared Constant Buffers DX 11
// Vertex Shader
// VS constants at the beginning
float4x4 WVP_matrix : packoffset( c0 );
float4x4 W_matrix : packoffset( c4 );
…
// Pixel Shader
// PS constants later
float4 Ambient : packoffset( c16 );
…
Vertex and Index Suballocation
● Simple concept
● Multiple objects can share the same buffer
● Most games standardize vertex formats
● Handful of configurations
● DrawIndexed offers StartIndexLocation/BaseVertexLocation
● Vastly cuts down on # of IASetXXX calls
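The suballocation idea can be sketched as a simple bump allocator over one shared vertex buffer and one shared index buffer. This is a minimal sketch under stated assumptions: `MeshRange` and `SharedGeometryAllocator` are hypothetical names, all meshes are assumed to use the same vertex format, and freeing or defragmentation is left out.

```cpp
#include <cassert>
#include <cstdint>

// Where one mesh lives inside the shared buffers. The fields map directly
// onto the DrawIndexed arguments.
struct MeshRange {
    uint32_t indexCount;  // IndexCount
    uint32_t startIndex;  // StartIndexLocation
    int32_t  baseVertex;  // BaseVertexLocation
};

// Hypothetical bump allocator: capacities are counted in vertices and
// indices, not bytes.
class SharedGeometryAllocator {
public:
    SharedGeometryAllocator(uint32_t vertexCapacity, uint32_t indexCapacity)
        : vertexCapacity_(vertexCapacity), indexCapacity_(indexCapacity) {}

    // Reserves space for one mesh. Returns false if either buffer is full.
    bool Allocate(uint32_t vertexCount, uint32_t indexCount, MeshRange& out) {
        if (vertexCount > vertexCapacity_ - nextVertex_ ||
            indexCount > indexCapacity_ - nextIndex_) {
            return false;
        }
        out.indexCount = indexCount;
        out.startIndex = nextIndex_;
        out.baseVertex = static_cast<int32_t>(nextVertex_);
        nextVertex_ += vertexCount;
        nextIndex_ += indexCount;
        return true;
    }

private:
    uint32_t vertexCapacity_;
    uint32_t indexCapacity_;
    uint32_t nextVertex_ = 0;
    uint32_t nextIndex_ = 0;
};
```

With the shared buffers bound once, each mesh would then be drawn with something like `Ctx->DrawIndexed(range.indexCount, range.startIndex, range.baseVertex)` and no IASetXXX call in between.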
SRV Management Basics
● Assign slots for “global” textures
● Shadow maps, environment maps
● Don’t “Clean-up” slots
● Binding NULL does have a cost
● Group SRVs that change together
● Albedo and normal
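One way to act on this advice is to track what is already bound and issue a bind only when a slot actually changes. The slides do not describe this technique, so treat it as an illustrative sketch: `SrvSlotCache` is a hypothetical name, and `SrvHandle` stands in for `ID3D11ShaderResourceView*` so the sketch compiles without the D3D headers.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Stand-in for ID3D11ShaderResourceView* (assumption for this sketch).
using SrvHandle = const void*;

// Remembers the SRV bound to each pixel shader slot. Slots are never reset
// to null between draws; a slot is touched only when its content changes.
class SrvSlotCache {
public:
    // D3D11 exposes 128 shader resource slots per stage.
    static constexpr uint32_t kSlots = 128;

    // Returns true when the caller needs to issue PSSetShaderResources for
    // this slot, false when the same SRV is already bound there.
    bool Bind(uint32_t slot, SrvHandle srv) {
        assert(slot < kSlots);
        if (bound_[slot] == srv) {
            return false;
        }
        bound_[slot] = srv;
        ++bindCalls_;
        return true;
    }

    uint32_t bindCalls() const { return bindCalls_; }

private:
    std::array<SrvHandle, kSlots> bound_{};
    uint32_t bindCalls_ = 0;
};
```

A "global" texture such as a shadow map would be bound to its reserved slot once and then left alone, while per-material slots change only when the material does.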
SRV Management Advanced
● Suballocate from texture arrays
● Sizes and format are typically quasi-standard
● Terrain tiles are a fantastic example
● Just need to add an extra index to CB
● Need to take some care
● Allocating the max array for all combinations probably doesn’t make sense
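Suballocating from a texture array can be sketched as a free list of slice indices, with the chosen index stored in the per-draw constants. This is a minimal sketch: `TextureArraySlots` and `PerDrawConstants` are hypothetical names, and the sketch assumes all textures in one array share size and format.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hands out slice indices of one texture array (for example terrain tiles).
class TextureArraySlots {
public:
    explicit TextureArraySlots(uint32_t sliceCount) {
        // Push in reverse so that slice 0 is handed out first.
        for (uint32_t i = sliceCount; i > 0; --i) {
            freeSlices_.push_back(i - 1);
        }
    }

    // Returns a free slice index, or -1 when the array is full.
    int32_t Acquire() {
        if (freeSlices_.empty()) {
            return -1;
        }
        uint32_t slice = freeSlices_.back();
        freeSlices_.pop_back();
        return static_cast<int32_t>(slice);
    }

    // Makes a slice available again once its texture is no longer needed.
    void Release(int32_t slice) {
        assert(slice >= 0);
        freeSlices_.push_back(static_cast<uint32_t>(slice));
    }

private:
    std::vector<uint32_t> freeSlices_;
};

// The slice index travels with the other per-draw constants, so switching
// textures needs no SRV rebind.
struct PerDrawConstants {
    float    world[16];
    uint32_t albedoSlice;  // index into the texture array
    uint32_t padding[3];   // keeps the struct a multiple of 16 bytes
};
static_assert(sizeof(PerDrawConstants) % 16 == 0, "constant data is 16-byte granular");
```

In the shader, the index would be used as the array coordinate when sampling the texture array.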
Sampler Management
● Repeat SRV basic advice
● Samplers don’t change much
● Fairly limited set of common choices
Revised Draw Call
// Setup:
// ubershader constants
// texture array indices
// normal constants, like transform
Ctx->Map( … );
// Don’t need to rebind CBs
// Offsets in multiuse IB/VB
// StartIndexLocation -> offset to IB
// BaseVertexLocation -> offset for vertices
Ctx->DrawIndexed( … );
DX 11.1 is better still
● Use one Map() for batch of draw calls
● Change offset into constant buffer per draw
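The batching idea can be sketched on the CPU side as packing every draw's constants into one block of memory and recording an offset per draw. This is a minimal sketch under stated assumptions: `PerDraw` and `PackBatch` are hypothetical names, and a `std::vector` stands in for the pointer a single Map() call would return.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical per-draw constants: a transform plus a tint (80 bytes).
struct PerDraw {
    float world[16];
    float tint[4];
};

constexpr uint32_t kBytesPerConstant = 16;
// Each draw gets a 16-constant (256-byte) block so its offset is a legal
// FirstConstant value.
constexpr uint32_t kBlockBytes = 16 * kBytesPerConstant;

// Copies all draws into `mapped` (standing in for the memory returned by one
// Map call) and returns the FirstConstant offset to use for each draw.
std::vector<uint32_t> PackBatch(const std::vector<PerDraw>& draws,
                                std::vector<unsigned char>& mapped) {
    static_assert(sizeof(PerDraw) <= kBlockBytes, "one draw must fit in one block");
    mapped.assign(draws.size() * kBlockBytes, 0);
    std::vector<uint32_t> firstConstant;
    firstConstant.reserve(draws.size());
    for (std::size_t i = 0; i < draws.size(); ++i) {
        std::memcpy(mapped.data() + i * kBlockBytes, &draws[i], sizeof(PerDraw));
        firstConstant.push_back(
            static_cast<uint32_t>(i * kBlockBytes / kBytesPerConstant));
    }
    return firstConstant;
}
```

Per draw, only the offset passed to the `*SetConstantBuffers1` call would change; the buffer itself is written once for the whole batch.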
Reducing draw calls
● You’ve heard it for years
● Instancing
● Nearly every game could benefit
● Same model drawn N times in a frame
● There are practical concerns
● Depth sorting
● Constant buffer size
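Turning "same model drawn N times" into instanced draws can be sketched as grouping draw requests by mesh. This is a minimal sketch: `DrawRequest`, `InstancedBatch`, and `BuildBatches` are hypothetical names, the per-batch cap stands in for the constant buffer size limit, and the grouping deliberately ignores depth order, which is one of the practical concerns listed above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical per-instance data; a real engine would store a transform.
struct InstanceData {
    float x, y, z;
};

struct DrawRequest {
    uint32_t     meshId;
    InstanceData instance;
};

// One batch becomes one instanced draw call.
struct InstancedBatch {
    uint32_t                  meshId;
    std::vector<InstanceData> instances;
};

// Groups requests by mesh and splits groups larger than the cap.
std::vector<InstancedBatch> BuildBatches(const std::vector<DrawRequest>& requests,
                                         std::size_t maxInstancesPerBatch) {
    assert(maxInstancesPerBatch > 0);
    std::map<uint32_t, std::vector<InstanceData>> byMesh;
    for (const DrawRequest& request : requests) {
        byMesh[request.meshId].push_back(request.instance);
    }

    std::vector<InstancedBatch> batches;
    for (const auto& entry : byMesh) {
        const std::vector<InstanceData>& all = entry.second;
        for (std::size_t first = 0; first < all.size(); first += maxInstancesPerBatch) {
            std::size_t count = std::min(maxInstancesPerBatch, all.size() - first);
            batches.push_back({entry.first,
                               std::vector<InstanceData>(all.begin() + first,
                                                         all.begin() + first + count)});
        }
    }
    return batches;
}
```

Each batch would map to one `DrawIndexedInstanced` call with `instances.size()` as the instance count.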
Wrap-up
● Analyze your situation
● Consider API options
● Get the engine out of the way
● See appendix slides online
Thanks
NVIDIA Devtech team
NVIDIA D3D Driver Performance Team
Bryan Dudash
Dan Baker