gameworks.nvidia.com
NVIDIA Graphics Technology libraries
…and more in the works
TXAA WaveWorks
HBAO+ NVDOF FaceWorks
Agenda
Introducing NVIDIA GeForce Works
TXAA
WaveWorks
FaceWorks
HBAO+
NVDOF
GeForce Works
Set of libraries and modules of visual effects and enhancements for games
GPU optimized
Self-contained
Easy to integrate and tune
Scalable
Covering a wide spectrum of effects (water sim, post-processing, GI, tessellation, …)
NVIDIA Confidential
TXAA
Cinematic Antialiasing
TXAA
TXAA is a form of temporal anti-aliasing mixed with MSAA
TXAA replaces the MSAA resolve
TXAA also provides a higher-quality resolve filter
Better than the default MSAA box filter
TXAA is provided as a library
Library supported on NVIDIA Kepler and future GPUs
Library currently DX11 only (GL in progress)
2 versions of the library: TXAA 2.1 and TXAA 2.E
Looking at unifying in the future
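For reference, the default MSAA box-filter resolve can be sketched as the unweighted mean of a pixel's own samples. The second function adds a wider 3x3 tent filter over neighboring pixels, only to illustrate what a wider resolve kernel means. It is not TXAA's filter, whose kernel is not described here. The `MsaaImage` layout and the tent weights are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical single-channel MSAA image, stored pixel-major:
// all samples of pixel 0, then all samples of pixel 1, and so on.
struct MsaaImage {
    int width = 0, height = 0, samples = 0;
    std::vector<float> data;
    float at(int x, int y, int s) const {
        return data[(static_cast<std::size_t>(y) * width + x) * samples + s];
    }
};

// Default MSAA resolve: a box filter averaging only the pixel's own samples.
std::vector<float> boxResolve(const MsaaImage& img) {
    std::vector<float> out(static_cast<std::size_t>(img.width) * img.height);
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x) {
            float sum = 0.0f;
            for (int s = 0; s < img.samples; ++s) sum += img.at(x, y, s);
            out[static_cast<std::size_t>(y) * img.width + x] = sum / img.samples;
        }
    return out;
}

// Illustrative wider resolve: 3x3 tent weights (1-2-1 per axis) applied to the
// box-resolved values of the pixel and its neighbors, renormalized at image edges.
std::vector<float> tentResolve(const MsaaImage& img) {
    const std::vector<float> box = boxResolve(img);
    const float w1d[3] = {1.0f, 2.0f, 1.0f};
    std::vector<float> out(box.size());
    for (int y = 0; y < img.height; ++y)
        for (int x = 0; x < img.width; ++x) {
            float sum = 0.0f, wsum = 0.0f;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    if (nx < 0 || ny < 0 || nx >= img.width || ny >= img.height) continue;
                    const float w = w1d[dx + 1] * w1d[dy + 1];
                    sum += w * box[static_cast<std::size_t>(ny) * img.width + nx];
                    wsum += w;
                }
            out[static_cast<std::size_t>(y) * img.width + x] = sum / wsum;
        }
    return out;
}
```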
MSAA
[Diagram: 2x or 4x MSAA color → MSAA resolve → resolved color → processing after resolve]
TXAA 2.1 - Motion Vector Input
[Diagram: 2x or 4x MSAA color, motion vectors and the prior TXAA output → TXAA resolve → resolved color → processing after resolve]
Motion vector is the offset to the location of the pixel in the prior frame
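The inputs in this diagram match a generic temporal-AA step, sketched below on a single-channel CPU image. The step follows each pixel's motion vector to its location in the prior frame and fetches the prior output there. It then clamps that value to the current neighborhood and blends. The nearest-texel fetch, the 3x3 clamp and the blend weight `alpha` are assumptions for illustration, not TXAA internals.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical single-channel image with edge-clamped texel fetches.
struct Image {
    int width, height;
    std::vector<float> px;
    float fetch(int x, int y) const {
        x = std::clamp(x, 0, width - 1);
        y = std::clamp(y, 0, height - 1);
        return px[static_cast<std::size_t>(y) * width + x];
    }
};

// One generic temporal-AA step. motion[i] is the offset (in pixels) from pixel i
// to its location in the prior frame; history is the prior frame's output.
Image temporalResolve(const Image& current, const Image& history,
                      const std::vector<Vec2>& motion, float alpha) {
    Image out{current.width, current.height, std::vector<float>(current.px.size())};
    for (int y = 0; y < current.height; ++y)
        for (int x = 0; x < current.width; ++x) {
            const std::size_t i = static_cast<std::size_t>(y) * current.width + x;
            // Follow the motion vector into the prior frame (nearest texel).
            const int hx = static_cast<int>(std::lround(x + motion[i].x));
            const int hy = static_cast<int>(std::lround(y + motion[i].y));
            float prior = history.fetch(hx, hy);
            // Clamp the history to the current 3x3 neighborhood to limit ghosting.
            float lo = current.fetch(x, y), hi = lo;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const float c = current.fetch(x + dx, y + dy);
                    lo = std::min(lo, c);
                    hi = std::max(hi, c);
                }
            prior = std::clamp(prior, lo, hi);
            // Exponential blend: mostly history, plus a fraction alpha of the new frame.
            out.px[i] = alpha * current.px[i] + (1.0f - alpha) * prior;
        }
    return out;
}
```

A small `alpha` keeps most of the accumulated history, which smooths edges over time. The clamp gives up some of that smoothing in exchange for less ghosting when the history no longer matches the current frame.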
TXAA 2.E - Motion Vector Input
[Diagram: 2x or 4x MSAA color, resolved depth and the prior TXAA output → TXAA resolve (converts depth to camera motion) → resolved color → processing after resolve]
TXAA Library API
Very simple library interface
TxaaOpenDX()
TxaaResolveDX() - call in place of the MSAA resolve
TxaaCloseDX()
Built-in debug visualization modes
TXAA library header provides
An example of a good-precision depth-to-camera-motion-vector transform
For those who want to roll their own motion vectors (TXAA 2.1)
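The transform shipped in the TXAA header is not reproduced here, but a common way to turn depth into camera motion can be sketched in three steps:

1. Unproject the pixel with the current frame's inverse view-projection matrix.
2. Project the resulting world position with the prior frame's view-projection matrix.
3. Take the difference.

The row-major `Mat4`, the NDC inputs and the function names are assumptions. Doubles are used because the reprojection is precision-sensitive. The result captures camera motion only, so moving objects would still need their own motion vectors.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using Mat4 = std::array<std::array<double, 4>, 4>;  // row-major, multiplies column vectors
using Vec4 = std::array<double, 4>;

Vec4 mul(const Mat4& m, const Vec4& v) {
    Vec4 r{0.0, 0.0, 0.0, 0.0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j) r[i] += m[i][j] * v[j];
    return r;
}

// Camera-motion vector for one pixel, in NDC units.
// ndcX, ndcY: pixel position in [-1, 1]; depth: value read from the depth buffer.
std::array<double, 2> cameraMotion(double ndcX, double ndcY, double depth,
                                   const Mat4& invViewProjCur, const Mat4& viewProjPrev) {
    // Unproject the pixel to world space and apply the homogeneous divide.
    Vec4 world = mul(invViewProjCur, Vec4{ndcX, ndcY, depth, 1.0});
    const double w = world[3];
    for (double& c : world) c /= w;
    // Project the same world point with the prior frame's camera.
    const Vec4 prev = mul(viewProjPrev, world);
    const double prevX = prev[0] / prev[3];
    const double prevY = prev[1] / prev[3];
    // Offset from the current pixel to its location in the prior frame.
    return {prevX - ndcX, prevY - ndcY};
}
```

To get an offset in pixels, scale the NDC result by half the render-target width and height, and flip y if the texture origin is top-left.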
TXAA/MSAA Quality = # of Samples
Visual edges require more than 1 sample/pixel
Otherwise quality will be poor
Visual edges are caused by more than just triangles
Alpha-tested surfaces from textures
Soft-particle to opaque-triangle edges
Post processing (depth of field, motion blur, etc.)
Requires smart application of selective super-sampling
Progression of AA Quality
Looking at samples used to reconstruct pixel color
More shaded samples increase quality
TXAA/MSAA + Post Processing Problem
Post processing applied per pixel on edges removes AA
[Figure: No Depth Of Field vs. Per-Pixel Depth Of Field; with per-pixel DOF the edge AA is gone]
TXAA/MSAA + Post Processing Fix
Many kinds of effects
Motion Blur
Depth of Field
Screen Space Ambient Occlusion
Soft Particle Blending
Screen Space Reflections
etc.
Fix by masking simple and complex pixels
Apply post processing per sample on complex pixels
[Figure: Complex Pixel Mask From Prior Slide]
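The masking fix can be sketched as follows. A pixel is marked complex when its MSAA samples disagree, which here means their depths differ by more than a threshold. A post-processing effect then runs once per sample on complex pixels and once per pixel elsewhere. The depth-only test, the threshold and the callback-style effect are illustrative assumptions; a real classifier may also compare colors or normals.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical MSAA depth buffer, pixel-major: depth[pixel * samples + s].
// Returns one flag per pixel: true = complex (samples differ), false = simple.
std::vector<bool> complexPixelMask(const std::vector<float>& depth, int pixelCount,
                                   int samples, float threshold) {
    std::vector<bool> mask(pixelCount, false);
    for (int p = 0; p < pixelCount; ++p) {
        const float d0 = depth[static_cast<std::size_t>(p) * samples];
        for (int s = 1; s < samples; ++s)
            if (std::fabs(depth[static_cast<std::size_t>(p) * samples + s] - d0) > threshold) {
                mask[p] = true;
                break;
            }
    }
    return mask;
}

// Runs effect(pixel, sample) once (sample 0) on simple pixels and once per
// sample on complex pixels, averaging the per-sample results.
std::vector<float> applyPost(const std::vector<bool>& mask, int samples,
                             const std::function<float(int, int)>& effect) {
    std::vector<float> out(mask.size());
    for (std::size_t p = 0; p < mask.size(); ++p) {
        const int pi = static_cast<int>(p);
        if (!mask[p]) {
            out[p] = effect(pi, 0);
            continue;
        }
        float sum = 0.0f;
        for (int s = 0; s < samples; ++s) sum += effect(pi, s);
        out[p] = sum / samples;
    }
    return out;
}
```

Only edge pixels pay the per-sample cost, which is why the mask keeps the fix affordable.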
High Quality Alpha Test
Manually super-sample the alpha test in the pixel shader
Set the coverage mask to the samples that pass the test
GL: interpolateAtSample(), gl_SampleMask
DX: EvaluateAttributeAtSample(), SV_Coverage
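On the CPU, the shader logic can be sketched as two steps. First, evaluate alpha at each sample position inside the pixel; this is the role `EvaluateAttributeAtSample()` or `interpolateAtSample()` plays in the shader. Second, set the coverage bit for every sample that passes the test; the shader writes that mask to `SV_Coverage` or `gl_SampleMask`. The `alphaAt` callback stands in for a texture fetch, and the 4x rotated-grid offsets (in 1/16-pixel units) are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Assumed 4x rotated-grid sample offsets, in 1/16-pixel units from the pixel center.
constexpr int kSampleOffsets[4][2] = {{-2, -6}, {6, -2}, {-6, 2}, {2, 6}};

// Returns a coverage mask with bit s set when sample s passes the alpha test.
// alphaAt(u, v) is a hypothetical stand-in for sampling the alpha texture at a
// position in pixel units; (px + 0.5, py + 0.5) is the pixel center.
uint32_t alphaTestCoverage(int px, int py, float alphaRef,
                           const std::function<float(float, float)>& alphaAt) {
    uint32_t coverage = 0;
    for (int s = 0; s < 4; ++s) {
        const float u = px + 0.5f + kSampleOffsets[s][0] / 16.0f;
        const float v = py + 0.5f + kSampleOffsets[s][1] / 16.0f;
        if (alphaAt(u, v) >= alphaRef) coverage |= (1u << s);
    }
    return coverage;
}
```

An edge that crosses the pixel yields a partial mask instead of all-or-nothing, so the resolve sees an anti-aliased alpha-tested edge.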
Image captured from Call of Duty Black Ops 2 with 4xTXAA+4xSGSSAA
WaveWorks
Real-time, GPU-simulated, wind-driven waves on large bodies of water
WaveWorks is…
...a library for simulating wind-driven waves on large bodies of water, in real time, using GPU acceleration
(BUT: the application still defines the look)
(DEMO)
What WaveWorks Offers: A Breakdown
1. Wave simulation: produces a height map that evolves over time, controllable by wind speed
2. Mesh generation for the water surface: quad-tree LOD system
3. Feedback for vessel dynamics: game-object interaction, injecting spray and audio
4. Guidelines for water-body rendering: sample shader code for different water appearances
WaveWorks’ mission space
Wide range of weather conditions: Beaufort 1 to Beaufort 12
Range of hardware: CPU to GPU
Range of lengths: 1cm features to 1km features
Detour: The Beaufort scale
An empirical observation-based wind speed scale
Devised in 1805 by Francis Beaufort, British Navy
Originally 0 to 12
0 = Flat
6 = Long waves begin to form. White foam crests are very frequent. Some airborne spray is present.
12 = Huge waves. Sea is completely white with foam and spray. Air is filled with driving spray, greatly reducing visibility.
Modern extensions go to 13 and beyond
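Since the simulation is controlled by wind speed, a Beaufort number is a convenient way to choose that speed. The sketch below uses the empirical relation v = 0.836 B^(3/2) m/s for mean wind speed 10 m above the sea surface. WaveWorks' own Beaufort presets, listed for release 1.3, may use different values, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Approximate mean wind speed in m/s for Beaufort number B,
// using the empirical relation v = 0.836 * B^(3/2).
double beaufortToWindSpeed(double beaufort) {
    return 0.836 * std::pow(beaufort, 1.5);
}
```

For example, Beaufort 4 maps to about 6.7 m/s and Beaufort 12 to roughly 35 m/s.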
The length range challenge
Need both uniqueness and fine detail for a simulation to look good
UNIQUENESS: minimize objectionable repeating patterns over large areas
FINE DETAIL: accurately portray near-camera short-wavelength features
The length range challenge
WaveWorks supports a length range of 64,000 : 1
64,000 represents the uniqueness dimension, within which the simulation produces non-repeating results
1 is the finest spacing of height samples seen near the camera
e.g. 1.5cm detail vs. 1km uniqueness
Handles wavelengths on the order of 1km, which matters for Beaufort 12 seas
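As a quick check, the example figures are consistent with the quoted ratio:

```latex
\frac{1\,\mathrm{km}}{64{,}000} = \frac{100{,}000\,\mathrm{cm}}{64{,}000} \approx 1.56\,\mathrm{cm}
```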
The Simulation Algorithm
Tessendorf's spectral algorithm, based on the Phillips spectrum
Used in many movies, going back to Waterworld and Titanic ('97), and more recently Life of Pi
Simulation step runs in <2ms on a GTX680 at max settings
The Algorithm: Wave Composition
The ocean surface is composed of an enormous number of simple waves
Each simple wave is a hybrid sine wave (a Gerstner wave)
Each mass point on the surface moves in a vertical circle
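A single Gerstner wave can be sketched following Tessendorf's formulation. A surface point with rest position x0 is displaced horizontally along the wave direction as well as vertically, so it traces a vertical circle of radius A. The deep-water dispersion relation ω = sqrt(g·k) is used here. The reduction to one horizontal axis and the function name are simplifying assumptions.

```cpp
#include <cassert>
#include <cmath>

struct SurfacePoint { double x, z; };  // horizontal position and height, in meters

// One Gerstner wave travelling along +x (a 2D slice of the ocean surface).
// x0: rest position (m); k: wavenumber (rad/m); A: amplitude (m); t: time (s).
SurfacePoint gerstner(double x0, double k, double A, double t) {
    const double g = 9.81;                  // gravity, m/s^2
    const double omega = std::sqrt(g * k);  // deep-water dispersion relation
    const double phase = k * x0 - omega * t;
    // Horizontal and vertical offsets are 90 degrees out of phase: circular motion.
    return {x0 - A * std::sin(phase), A * std::cos(phase)};
}
```

Summing many such waves with different directions, lengths and amplitudes gives the composed surface. The spectral method performs that sum with FFTs rather than wave by wave.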
The Algorithm: the Phillips spectrum
The distribution of the simple waves' lengths, speeds and amplitudes follows the statistical model of the Phillips spectrum
Explainer:
Predicted by Phillips in the 1950s, working from first principles of fluid mechanics
Subsequently validated by experimental oceanographic data in the 1960s
Models a 'fully developed' sea state
So a great choice for the 'default' spectrum
NB: the rest of the algorithm is spectrum-agnostic
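In Tessendorf's formulation, the Phillips spectrum gives the expected energy of a wave with wave vector k, for wind speed V and unit wind direction ŵ:

P_h(k) = A exp(-1/(kL)^2) / k^4 · |k̂·ŵ|^2, with L = V²/g.

A direct transcription follows. The amplitude constant `A` is a free tuning parameter, and the optional small-wave damping factor is omitted. How WaveWorks parameterizes its spectrum (for example via `wind_dependency`) is not shown here.

```cpp
#include <cassert>
#include <cmath>

// Phillips spectrum P_h(k) for wave vector (kx, ky).
// windSpeed: V in m/s; (windX, windY): unit wind direction; A: amplitude constant.
double phillips(double kx, double ky, double windSpeed, double windX, double windY,
                double A) {
    const double g = 9.81;
    const double k2 = kx * kx + ky * ky;
    if (k2 == 0.0) return 0.0;                   // no energy in the DC term
    const double L = windSpeed * windSpeed / g;  // largest wave from a continuous wind
    const double k = std::sqrt(k2);
    const double kDotW = (kx * windX + ky * windY) / k;  // cosine of angle to the wind
    return A * std::exp(-1.0 / (k2 * L * L)) / (k2 * k2) * kDotW * kDotW;
}
```

Waves running perpendicular to the wind get no energy. Raising the wind speed lengthens L and shifts energy toward longer waves, which is why wind speed is the dominant simulation parameter.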
Explainer: the Phillips spectrum
[Plot: Phillips spectrum vs. wavelength λ (m, log scale, 1 to 10,000) for Beaufort 6, 7, 8 and 9]
The Diagram of the Runtime Algorithm
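Per-params-change (CUDA/CPU): initial spectrum, with ξ_r and ξ_i independent Gaussian random draws
$$H_0(\mathbf{k}) = \frac{1}{\sqrt{2}}\,(\xi_r + i\,\xi_i)\,\sqrt{P_h(\mathbf{k})}$$
Per-frame (CUDA/CPU): time evolution and horizontal displacement spectra, followed by three inverse FFTs (F^{-1}) giving Dx, Dy and Dz
$$H(\mathbf{k},t) = H_0(\mathbf{k})\,e^{i\omega(\mathbf{k})t} + H_0^{*}(-\mathbf{k})\,e^{-i\omega(\mathbf{k})t}$$
$$D_x(\mathbf{k},t) = i\,\frac{k_x}{k}\,H(\mathbf{k},t), \qquad D_y(\mathbf{k},t) = i\,\frac{k_y}{k}\,H(\mathbf{k},t)$$
Per-frame (PS): displacement, normal and folding maps generated from Dx, Dy, Dz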
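The per-frame CUDA/CPU stage can be sketched for a single wave vector k in two steps. First, H(k,t) = H0(k) e^{iωt} + conj(H0(-k)) e^{-iωt}. Second, Dx = i(kx/k)H and Dy = i(ky/k)H. ω(k) = sqrt(g|k|) is the deep-water dispersion relation from Tessendorf's method. The function name and the use of `std::complex` are assumptions. A full simulation runs this over the whole k-grid before the three inverse FFTs. The construction guarantees H(-k,t) = conj(H(k,t)), and that Hermitian symmetry is what makes the inverse FFTs yield real displacement maps.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

using cplx = std::complex<double>;

struct SpectrumSample { cplx h, dx, dy; };  // H(k,t), Dx(k,t), Dy(k,t)

// h0: H0(k); h0MinusK: H0(-k); (kx, ky): wave vector in rad/m; t: time in seconds.
SpectrumSample evolve(cplx h0, cplx h0MinusK, double kx, double ky, double t) {
    const double g = 9.81;
    const double k = std::sqrt(kx * kx + ky * ky);
    if (k == 0.0) return {cplx(0.0), cplx(0.0), cplx(0.0)};  // P_h(0) = 0: no DC wave
    const double omega = std::sqrt(g * k);      // deep-water dispersion relation
    const cplx e = std::polar(1.0, omega * t);  // e^{i omega t}
    // H(k,t) = H0(k) e^{i omega t} + conj(H0(-k)) e^{-i omega t}
    const cplx h = h0 * e + std::conj(h0MinusK) * std::conj(e);
    const cplx i(0.0, 1.0);
    // Horizontal (choppy) displacement spectra.
    return {h, i * (kx / k) * h, i * (ky / k) * h};
}
```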
What is in the Distribution
32-bit and 64-bit libraries: ./bin/x86/, ./bin/x64/, ./lib/x86/, ./lib/x64/
Header files: ./inc/
Shaders: ./shader/
Sample code for rendering: ./demo/
Documents: ./IntegrationNotes.txt
WaveWorks 1.2
1.2 is the latest release
Multi-res simulation for a 64,000 : 1 length range
Geometry LODing
Coarse level: quad tree
Fine level, D3D9/10: geo-morphing
Fine level, D3D11: tessellation
Supported configurations: [Table: CPU sim / GPU sim × D3D9 / D3D10 / D3D11]
WaveWorks 1.3
New features:
Foam
GPU acceleration for evolving spectra
Beaufort presets
How to exploit
The obvious use-case: render a good-looking sea
Also: as prime mover for secondary / tertiary effects
For vessel physics
For vessel spray/foam production
And from there, audio for spray/foam
Conceptually: a big high-quality animated noise field
API Usage – init/teardown
[Diagram: App → WaveWorks: init/teardown]
// Use these calls to globally initialize/release on D3D device create/destroy.
nvsdk_result NVSDK_Water_InitD3D9(IDirect3DDevice9* pD3DDevice);
nvsdk_result NVSDK_Water_ReleaseD3D9(IDirect3DDevice9* pD3DDevice);
nvsdk_result NVSDK_Water_InitD3D10(ID3D10Device* pD3DDevice);
nvsdk_result NVSDK_Water_ReleaseD3D10(ID3D10Device* pD3DDevice);
nvsdk_result NVSDK_Water_InitD3D11(ID3D11Device* pD3DDevice);
nvsdk_result NVSDK_Water_ReleaseD3D11(ID3D11Device* pD3DDevice);
API Usage – configure a simulation
[Diagram: App → WaveWorks simulation]
struct NVSDK_Water_Simulation_Params
{
...
float2 wind_dir;
float wind_speed; // <- the most important param!
float wind_dependency;
float choppy_scale;
bool readback_displacements;
...
};
// D3D11
NVSDK_Water_Simulation_CreateD3D11( const NVSDK_Water_Simulation_Params& params,
ID3D11Device* pD3DDevice,
NvWaterSimulationHandle* pResult);
// D3D9/10 creation is similar
NVSDK_Water_Simulation_UpdateParams( NvWaterSimulationHandle hSim,
const NVSDK_Water_Simulation_Params& params);
API Usage – pumping the simulation
[Diagram: App → WaveWorks simulation: Update()]
// Make these calls once per frame per simulation
NVSDK_Water_Simulation_SetTime(NvWaterSimulationHandle hSim, float fAppTime);
NVSDK_Water_Simulation_UpdateTick( NvWaterSimulationHandle hSim,
IUnknown* pDC,
NvWaterSavestateHandle hSavestate);
API Usage – rendering the results
[Diagram: App → WaveWorks simulation: Update(); the App's vertex and pixel shaders include the WaveWorks attribute fragments]
struct NV_WATER_VERTEX_OUTPUT
{
NV_WATER_INTERPOLATED_VERTEX_OUTPUT interp;
float3 pos_world;
float3 pos_world_undisplaced;
float3 world_displacement;
};
NV_WATER_VERTEX_OUTPUT
NV_GetDisplacedWaterVertex(NV_WATER_VERTEX_INPUT In);
struct NV_WATER_SURFACE_ATTRIBUTES
{
float3 normal;
float3 eye_dir;
float fold;
};
NV_WATER_SURFACE_ATTRIBUTES
NV_GetWaterSurfaceAttributes(NV_WATER_INTERPOLATED_VERTEX_OUTPUT In);
API Usage – rendering the results
[Diagram: App → WaveWorks simulation: Update(); shader reflection yields shader constant indices; the App's vertex and pixel shaders include the WaveWorks attribute fragments]
struct NVSDK_Water_ShaderInput_Desc
{
enum InputType {
VertexShader_FloatConstant = 0,
VertexShader_ConstantBuffer,
VertexShader_Texture,
VertexShader_Sampler,
//... Etc.
};
InputType Type;
nvsdk_cstr Name;
nvsdk_uint RegisterOffset;
};
// D3D9/10 are similar
UINT NVSDK_Water_Simulation_GetShaderInputCountD3D11();
nvsdk_result
NVSDK_Water_Simulation_GetShaderInputDescD3D11(
UINT inputIndex,
NVSDK_Water_ShaderInput_Desc* pDesc);
API – rendering the results
[Diagram: App → WaveWorks simulation: Update(), SetRenderState(); shader reflection yields shader constant indices; the App's vertex and pixel shaders include the WaveWorks attribute fragments]
// C APIs – getting the simulation to set its render state
NVSDK_Water_Simulation_SetRenderState( NvWaterSimulationHandle hSim,
IUnknown* pDC,
const float4x4& matView,
const uint* pShaderInputRegisterMappings,
NvWaterSavestateHandle hSavestate);
API – rendering the results
Shader hookup is likely the trickiest aspect of integration
Demo app: ships as source within the distribution
Includes an example using:
Named constants
FX file format
Shader reflection to get constant offsets from compiled FX files
A good way to follow the workings
API – geometry generator
[Diagram: App → WaveWorks simulation: Update(), SetRenderState(); shader reflection yields shader constant indices; the App's vertex and pixel shaders include the WaveWorks attribute and geomgen fragments; App → WaveWorks geomgen: Draw()]
API – quad-tree generator
Quadtree is a “stock” generator shipped in the lib
Does frustum culling
Does mesh LOD
Tiling is calculated in world space
Ensures the mesh is stable w.r.t. camera rotation, with no “swimming” artifact
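The world-space tiling and distance-based LOD can be sketched as a recursive subdivision. Nodes sit on a fixed world-space grid, and a node splits while the camera is close relative to its size. Because the result depends only on the camera's position, rotating the camera never changes the tiling. Frustum culling is omitted here. The split rule, `lodFactor` and `minSize` are assumptions, not the stock generator's actual heuristics.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Node { double x, y, size; };  // world-space lower-left corner and edge length

// Recursively subdivides a world-space node. A node splits while the
// camera-to-center distance is below lodFactor * size and the node is larger
// than minSize. Leaves are appended to `leaves` as the patches to draw.
void buildQuadtree(const Node& n, double camX, double camY, double lodFactor,
                   double minSize, std::vector<Node>& leaves) {
    const double cx = n.x + 0.5 * n.size, cy = n.y + 0.5 * n.size;
    const double dist = std::hypot(camX - cx, camY - cy);
    if (n.size <= minSize || dist >= lodFactor * n.size) {
        leaves.push_back(n);
        return;
    }
    const double h = 0.5 * n.size;
    buildQuadtree({n.x, n.y, h}, camX, camY, lodFactor, minSize, leaves);
    buildQuadtree({n.x + h, n.y, h}, camX, camY, lodFactor, minSize, leaves);
    buildQuadtree({n.x, n.y + h, h}, camX, camY, lodFactor, minSize, leaves);
    buildQuadtree({n.x + h, n.y + h, h}, camX, camY, lodFactor, minSize, leaves);
}
```

Patches near the camera end up small and dense, distant ones large. Since node corners are always multiples of the leaf sizes in world space, a patch's vertices do not slide as the camera turns.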
Case study: Just Cause 2
Used release 0.5
Bimodal simulation: Perlin noise for distant LOD
Integration points of interest:
- Explicit mode for quadtree
- Save/restore of graphics state
- Readbacks for vessel dynamics
JC2 – readbacks
[Diagram: App → WaveWorks simulation: Update(), SetRenderState(), GetDisplacements(); App → WaveWorks geomgen: Draw(); shader reflection yields shader constant indices; the App's shaders include the WaveWorks attribute and geomgen fragments]
NVSDK_Water_Simulation_GetDisplacements(NvWaterSimulationHandle hSim,
IUnknown* pDC,
const float2* inSamplePoints,
float4* outDisplacements,
uint numSamples);
JC2 – save/restore
[Diagram: App → WaveWorks simulation: Update(), SetRenderState(), GetDisplacements(); App → WaveWorks geomgen: Draw(); shader reflection yields shader constant indices; the App's shaders include the WaveWorks attribute and geomgen fragments; App → save/restore: Restore()]
Case study: 1.3 demo
Vessel dynamics
Spray generation, including audio
Reinjecting spray back into the foam map
Demo – vessel dynamics
[Diagram: App reads 20K hull sensors from the WaveWorks simulation via GetDisplacements(); the depth readings drive vessel dynamics]
Demo – vessel dynamics
Sensors distributed evenly over the hull area for an unbiased reading
Readback data is parameterized over undisplaced coords
So the readback data must be searched to find the water height at a world position
i.e. reverse lookups are intrinsically hard
Roadmap plans should help here
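Readback data is indexed by undisplaced coordinates, so finding the water height at a world position means inverting the horizontal displacement. One common approach, sketched here, is a fixed-point iteration:

1. Guess the undisplaced point.
2. Displace it.
3. Correct the guess by the horizontal miss.

The `displacement` callback is a hypothetical stand-in for sampling the `GetDisplacements()` results. Convergence assumes the horizontal displacement field is gentle, meaning no folding.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

struct Disp { double dx, dy, dz; };  // displacement of an undisplaced surface point

// Water height at world position (px, py): iterate u <- p - D_xy(u) to find the
// undisplaced point u whose displaced position lands on p, then return D_z(u).
double heightAtWorldPos(double px, double py,
                        const std::function<Disp(double, double)>& displacement,
                        int maxIters = 16, double tol = 1e-6) {
    double ux = px, uy = py;  // initial guess: no horizontal displacement
    for (int it = 0; it < maxIters; ++it) {
        const Disp d = displacement(ux, uy);
        const double nx = px - d.dx, ny = py - d.dy;
        const bool converged = std::hypot(nx - ux, ny - uy) < tol;
        ux = nx;
        uy = ny;
        if (converged) break;
    }
    return displacement(ux, uy).dz;
}
```

Each query costs several displacement lookups, which is part of why reverse lookups are expensive.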
Demo – spray gen
[Diagram: App reads 20K hull sensors from the WaveWorks simulation via GetDisplacements(); the depth readings drive vessel dynamics and, through a 1-frame FIFO, a spraygen model that produces audio FX and spray particles]
Demo – spray reinjection
[Diagram: App spray particles → spray reinject shader → WaveWorks foam map]
Demo – spray reinjection
[Diagram: the App's spray reinject shader includes the WaveWorks attribute fragments and uses the neighborhood height map bound via SetRenderState() to write spray particles into the WaveWorks foam map]
WaveWorks Roadmap
Productize 1.3 demo features
Spray generation
Hull sensing
GPU acceleration for reverse lookup
Server-mode for Linux
User-defined spectra
FaceWorks
Pretty faces
FaceWorks
Photo-realistic skin rendering
Subsurface scattering
Skin Reflectance
High definition skin detail
Scalable approach from low-end to high-end GPUs
Coming soon!
HBAO+
Fast, High Quality Ambient Occlusion
HBAO+
State-of-the-art SSAO
Fullscreen
Flicker free
Artifact free
Performance mode: 1.2ms (1920x1200)
Quality mode: 4.0ms (1920x1200)
(On a GTX660)
HBAO+ Design Goals
Full-res SSAO, not half-res
Rendering SSAO in half-res tends to cause bad flickering on thin geometry (e.g. alpha-tested surfaces)
Look better than the HBAO algorithm
HBAO suffers from over-occlusion behind thin objects
Better efficiency than the HBAO algorithm
Minimize the math ops / TEX sample, and have the highest possible texture cache hit rate
Easy to integrate
Only requires a depth buffer as input
HBAO
GPU time: 5.22ms
Over-occlusion & flickering
No HBAO+
With HBAO+
HBAO+ (QUALITY mode): 2.5 ms at 1920x1200 on GTX 680
DOF
High Quality Depth of Field with Bokeh
DOF + Bokeh
Diffusion DOF solver
Arbitrary blur size
Constant cost
FFT-based Bokeh
Fixed cost
Uses a DirectCompute FFT implementation
Fast convolution in frequency space
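Diffusion DOF treats blur as heat diffusion whose strength follows the circle of confusion, and solves it implicitly. Each row, and then each column, therefore needs one tridiagonal solve regardless of blur size. The one-row sketch below follows that idea, in the spirit of Kass, Lefohn and Owens' simulated-diffusion depth of field. The coupling rule `min(coc)^2` and the single-channel input are assumptions, and this is not the library's optimized solver.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One row of diffusion DOF. Solves the implicit diffusion system
//   -beta[i-1] y[i-1] + (1 + beta[i-1] + beta[i]) y[i] - beta[i] y[i+1] = color[i]
// with the O(n) Thomas algorithm, so the cost does not depend on blur size.
std::vector<double> diffuseRow(const std::vector<double>& color,
                               const std::vector<double>& coc) {
    const std::size_t n = color.size();
    if (n == 0) return {};
    // beta[i] couples pixels i and i+1; the last entry stays 0 (row edge).
    std::vector<double> beta(n, 0.0);
    for (std::size_t i = 0; i + 1 < n; ++i) {
        const double r = std::min(coc[i], coc[i + 1]);  // assumed coupling rule
        beta[i] = r * r;
    }
    // Forward sweep: sub[i] = -beta[i-1], diag[i] = 1 + beta[i-1] + beta[i], super[i] = -beta[i].
    std::vector<double> cp(n), dp(n);
    for (std::size_t i = 0; i < n; ++i) {
        const double sub = (i > 0) ? -beta[i - 1] : 0.0;
        const double diag = 1.0 + ((i > 0) ? beta[i - 1] : 0.0) + beta[i];
        const double m = diag - ((i > 0) ? sub * cp[i - 1] : 0.0);
        cp[i] = -beta[i] / m;
        dp[i] = (color[i] - ((i > 0) ? sub * dp[i - 1] : 0.0)) / m;
    }
    // Back substitution.
    std::vector<double> y(n);
    y[n - 1] = dp[n - 1];
    for (std::size_t i = n - 1; i-- > 0;) y[i] = dp[i] - cp[i] * y[i + 1];
    return y;
}
```

A zero circle of confusion leaves pixels untouched. The row's total intensity is preserved, so the blur spreads energy rather than adding or losing it.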
DOF Goals
DOF with arbitrary blur size
Technique based on Diffusion Depth of Field (DDOF)
Optimized DDOF solver
Fixed-cost Bokeh
Bokeh transformed to frequency space through an FFT
Frame transformed to frequency space through an FFT
Convolution of the Bokeh shape with the game frame done in frequency space as one complex multiplication per pixel
Transform back to image space with an iFFT
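The frequency-space step relies on the convolution theorem: a circular convolution in image space becomes one complex multiplication per frequency bin. The 1D sketch below shows this with a naive O(n²) DFT standing in for the DirectCompute FFT. The names are illustrative. A real implementation would use a 2D FFT and pad the frame to avoid wrap-around at the borders.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

using cvec = std::vector<std::complex<double>>;

// Naive discrete Fourier transform; inverse = true gives the normalized inverse.
cvec dft(const cvec& in, bool inverse) {
    const std::size_t n = in.size();
    const double pi = std::acos(-1.0);
    const double sign = inverse ? 1.0 : -1.0;
    cvec out(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::complex<double> sum(0.0, 0.0);
        for (std::size_t j = 0; j < n; ++j)
            sum += in[j] * std::polar(1.0, sign * 2.0 * pi * double(k * j) / double(n));
        out[k] = inverse ? sum / double(n) : sum;
    }
    return out;
}

// Circularly convolves `image` with a same-length `kernel` in frequency space:
// forward transforms, one complex multiply per bin, inverse transform.
std::vector<double> convolveFreq(const std::vector<double>& image,
                                 const std::vector<double>& kernel) {
    const cvec a(image.begin(), image.end()), b(kernel.begin(), kernel.end());
    const cvec fa = dft(a, false), fb = dft(b, false);
    cvec prod(fa.size());
    for (std::size_t k = 0; k < fa.size(); ++k) prod[k] = fa[k] * fb[k];
    const cvec back = dft(prod, true);
    std::vector<double> out(back.size());
    for (std::size_t i = 0; i < back.size(); ++i) out[i] = back[i].real();
    return out;
}
```

The per-bin multiply costs the same whatever the Bokeh shape or size, which is where the fixed cost comes from. The FFTs dominate the total.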
Performance at 1920x1200
Technique | GPU time on GTX680 (ms)
Diffusion DOF | 2.7
DDOF + FFT Bokeh | 3.6
Other / Future GeForce Works Libraries
PN-AEN Tessellation
Particle Shadows
Plug and Play Tessellation
High Quality Shadows
Volumetric Lights
Global Illumination
and more…
http://developer.nvidia.com
Questions?