Floored: 3D Visualization & Virtual Reality for Real Estate

Posted 12-Jul-2015

TRANSCRIPT

Floored

3D Visualization & Virtual Reality for Real Estate

Introduction

- Hi, my name is Nick Brancaccio

- I work on graphics at Floored

- Floored does real time 3D architectural visualization on the web

Introduction

- Demo

- Check out floored.com for more

Challenges

Architectural

- Clean light-filled aesthetic

- Can't hide tech art deficiencies with grungy textures

Challenges

Interior Spaces

- Many secondary light sources rather than single key light

- Direct light fairly high frequency (directionally and spatially)

- Sunlight does not dominate many of our scenes

- Especially in NYC

Challenges

Real world material representation

- Important for communicating quality, mood, and feel

- Comparable real-life counterparts

- Customers are comparing to high-quality offline rendering

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline rendering for quite some time

- Maxwell, V-Ray, Arnold, etc.

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high, but so is reusability

- Floored has a variety of art assets: spaces, furniture, lighting, materials

- PBS supports reusability across projects

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization

Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
    albedo = color;
    specularColor = vec3(0.04);
} else {
    albedo = vec3(0.0);
    specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview

- For each model
  - For each primitive
    - For each vertex
      - Transform vertex by modelViewProjectionMatrix
    - For each pixel
      - For each light
        - outgoing radiance += incoming radiance * brdf * projected area
      - Remap outgoing radiance to perceptual display domain
        - Tonemap
        - Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case

for (int i = 0; i < MAX_NUM_LIGHTS; ++i) {
    outgoing radiance += incoming radiance * brdf * projected area;
}

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model
  - For each primitive
    - For each vertex
      - Transform vertex by modelViewProjectionMatrix
    - For each pixel
      - Write geometric and material data to g-buffer
- For each light
  - For each pixel inside light volume
    - Read geometric and material data from texture
    - outgoing radiance = incoming radiance * brdf * projected area
    - Blend add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec3 vPositionScreenSpace;
varying vec3 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets not well supported

Challenges: Storage

- Reading from render buffer depth getting better

Challenges: Storage

- Texture float support quite good

Challenges: Storage

- Texture half float support getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
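These representability limits are easy to verify on the CPU. A small JavaScript sketch (using `Math.fround`, which rounds a Number to the nearest 32-bit float):

```javascript
// 32-bit floats have a 24-bit significand: every integer up to 2^24 is exact.
const MAX_EXACT = Math.pow(2, 24); // 16777216

// Math.fround rounds a Number to the nearest representable 32-bit float.
console.log(Math.fround(MAX_EXACT - 1) === MAX_EXACT - 1); // true: exact
console.log(Math.fround(MAX_EXACT) === MAX_EXACT);         // true: exact
// Above 2^24 the step size doubles, so 2^24 + 1 collapses back to 2^24.
console.log(Math.fround(MAX_EXACT + 1) === MAX_EXACT);     // true
```

This is why the 24-bit packing below targets 0 to 16777215 and no more.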

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
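The same shift-by-multiply logic can be mirrored in plain JavaScript for quick host-side sanity checks. This port is mine; the function names just follow the slides:

```javascript
// Pack three 8-bit integers into one 24-bit integer, shifting left via
// multiplies exactly as the GLSL does (no bitwise operators needed).
function uint8_8_8_to_uint24([x, y, z]) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z);
}

// Unpack by shifting right with divisions and subtracting the high parts.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  const y = -x * 256.0 + temp;
  const z = -temp * 256.0 + raw;
  return [x, y, z];
}

// Round trip across the 24-bit domain boundaries.
for (const v of [0, 1, 255, 256, 65535, 65536, 16777215]) {
  console.assert(uint8_8_8_to_uint24(uint24_to_uint8_8_8(v)) === v);
}
```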

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
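For reference, the octahedral mapping the slides rely on can be sketched as follows. This is my JavaScript port of the standard [Cigolle 14] encode/decode (the slides' `octohedronEncode` / `octohedronDecode` presumably do the equivalent in GLSL); helper names are mine:

```javascript
// Sign that treats 0 as positive, matching GLSL-style signNotZero helpers.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

// Project a unit normal onto the octahedron, fold the lower hemisphere
// over the diagonals, then remap from -1..1 to the full 0..1 domain.
function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * invL1, py = y * invL1;
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(py)) * signNotZero(px);
    const fy = (1.0 - Math.abs(px)) * signNotZero(py);
    px = fx; py = fy;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

// Invert the remap and the fold, then renormalize back onto the sphere.
function octDecode([u, v]) {
  const px = u * 2.0 - 1.0, py = v * 2.0 - 1.0;
  let x = px, y = py, z = 1.0 - Math.abs(px) - Math.abs(py);
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Without quantization the round trip is exact up to float error; the 14-bit storage in the G-Buffer formats below is where discretization actually happens.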

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
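The YCoCg transform itself is just a cheap linear change of basis. A sketch with the standard YCoCg coefficients (which is what the `rgbToYcocg` / `YcocgToRgb` helpers in the later slides presumably implement):

```javascript
// RGB -> YCoCg: Y is a luminance-like term, Co/Cg are chroma offsets.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co (orange-blue axis)
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg (green-purple axis)
  ];
}

// Exact inverse: the transform is lossless in floating point.
function ycocgToRgb([y, co, cg]) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}

// Gray input has zero chroma, which is what makes chroma cheap to subsample.
console.log(rgbToYcocg([0.5, 0.5, 0.5])); // [0.5, 0, 0]
```

Because neutral colors carry all their information in Y, throwing away half the chroma samples (the checkerboard) costs little for the mostly desaturated architectural palettes described earlier.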

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity. -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
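The depth-sign trick is worth spelling out. A minimal CPU-side sketch, assuming depth > 0 for valid surfaces (0 is reserved for infinity) and treating metallic as a boolean folded into the sign bit; the slides' GLSL instead multiplies by a metallic value of +1 or -1 and extracts it with sign():

```javascript
// If not metallic, negate depth; the bool rides in the sign bit for free.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

// abs() recovers depth; the sign recovers metallic, as sign() does in GLSL.
function unpackDepthMetallic(packed) {
  return {
    depth: Math.abs(packed),
    metallic: packed >= 0.0,
  };
}
```

This is why decoding depth alone is so cheap: a single abs() on the alpha channel, with no unpacking arithmetic.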

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance

RGB Lighting 100% | YC Lighting 100% | RGB Lighting 25% | YC Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
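The weighting scheme is easy to check on the CPU. A scalar JavaScript port of the same idea (my own reduction of the GLSL above: one reconstructed chroma value from four neighbor [luminance, chroma] pairs):

```javascript
// Weight each neighbor's chroma by how close its luminance is to the center's.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let weightedChroma = 0.0, totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard black samples, as step() does in the GLSL
    weightedChroma += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? weightedChroma / totalWeight : 0.0;
}
```

When all neighbors share the center's luminance, this degenerates to a plain average of their chroma; a strong luminance edge suppresses the far side's contribution, which is exactly the bilateral behavior wanted here.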

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER

httpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Tiled Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Introduction- Hi my name is Nick Brancaccio

- I work on graphics at Floored

- Floored does real time 3D architectural visualization on the web

Introduction- Demo

- Check out flooredcom for more

Challenges

Architectural

- Clean light-filled aesthetic

- Canrsquot hide tech art deficiencies with grungy textures

Challenges

Challenges

Interior Spaces

- Many secondary light sources rather than single key light

- Direct light fairly high frequency (directionally and spatially)

- Sunlight does not dominate many of our scenes

- Especially in NYC

Challenges

Real world material representation

- Important for communicating quality mood feel

- Comparable real-life counterparts

- Customers are comparing to high-quality offline rendering

Challenges

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline

rendering for quite some time

- Maxwell VRay Arnold etc

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high, but so is reusability
- Floored has a variety of art assets: spaces, furniture, lighting, materials
- PBS supports reusability across projects

Physically Based Shading


Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization: Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization: Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12][Karis 13]

if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
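Where those numbers come from can be sanity checked numerically: Fresnel reflectance at normal incidence for a dielectric is F0 = ((n - 1) / (n + 1))^2. A quick JavaScript aside (my own sketch, not from the deck; `f0FromIor` is a hypothetical helper name):

```javascript
// Fresnel reflectance at normal incidence for a dielectric:
// F0 = ((n - 1) / (n + 1))^2, where n is the index of refraction.
function f0FromIor(n) {
  const r = (n - 1) / (n + 1);
  return r * r;
}

// Water (n ~1.33) gives ~0.02; glass (n = 1.5) gives exactly 0.04,
// which is where the hardcoded vec3(0.04) above comes from.
console.log(f0FromIor(1.33)); // ~0.02
console.log(f0FromIor(1.5));  // ~0.04
```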

Standard Material Parameterization

- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to perceptual display domain
- Tonemap
- Gamma / Color Space Conversion

Forward Pipeline Cons
- Challenging to effectively cull lights
- Typically pay cost of worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
- outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview
- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to g-buffer
- For each light
- For each pixel inside light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend-add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend-add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer
- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity
- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data
- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ... and after skipping some tangential details ...

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets: not well supported

Challenges: Storage

- Reading depth from the render buffer: getting better

Challenges: Storage

- Texture float support: quite good

Challenges: Storage

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
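The 2^24 precision cliff is easy to verify on the CPU with Math.fround, which rounds a number to 32-bit float precision (a quick Node.js aside, not part of the deck):

```javascript
// A 32-bit float has a 24-bit significand: every integer up to 2^24
// is exact; above that the step size between representable values is 2.
const MAX_EXACT = Math.pow(2, 24); // 16777216

// Still exact just below the boundary...
console.log(Math.fround(MAX_EXACT - 1) === MAX_EXACT - 1); // true

// ...but 2^24 + 1 is not representable and rounds back to 2^24.
console.log(Math.fround(MAX_EXACT + 1) === MAX_EXACT);     // true
```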

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
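The same shift-by-multiply arithmetic can be exercised outside the shader; JavaScript numbers are doubles, so uint24 values are exact. A hypothetical CPU-side port of the pair above:

```javascript
// Pack three 8-bit integers into one float-representable uint24.
// "Shift left" is done with multiplies, exactly as in the GLSL version.
function uint8_8_8_to_uint24(v) {
  return v[0] * 65536 + v[1] * 256 + v[2];
}

// Unpack: "shift right" via divide + floor, "mask" via subtraction.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}

const packed = uint8_8_8_to_uint24([12, 34, 56]);
console.log(uint24_to_uint8_8_8(packed)); // [ 12, 34, 56 ]
```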

Unit Testing

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
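For reference, the octahedral mapping can be sketched on the CPU as follows (adapted from the scheme surveyed in [Cigolle 14]; the function names and the sign-of-zero handling are my own, not Floored's):

```javascript
// Encode a unit normal [x, y, z] into two values in [0, 1].
function octahedronEncode(n) {
  const invL1 = 1 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let px = n[0] * invL1;
  let py = n[1] * invL1;
  if (n[2] < 0) {
    // Fold the lower hemisphere over the diagonals.
    // (px || 1) treats zero as positive so sign() never returns 0.
    const fx = (1 - Math.abs(py)) * Math.sign(px || 1);
    const fy = (1 - Math.abs(px)) * Math.sign(py || 1);
    px = fx; py = fy;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5]; // remap [-1, 1] -> [0, 1]
}

// Decode back to a unit normal.
function octahedronDecode(e) {
  const px = e[0] * 2 - 1, py = e[1] * 2 - 1;
  let nx = px, ny = py;
  const nz = 1 - Math.abs(px) - Math.abs(py);
  if (nz < 0) {
    // Undo the hemisphere fold.
    nx = (1 - Math.abs(py)) * Math.sign(px || 1);
    ny = (1 - Math.abs(px)) * Math.sign(py || 1);
  }
  const len = Math.hypot(nx, ny, nz);
  return [nx / len, ny / len, nz / len];
}
```

Without quantization the round trip is exact up to float error; the 14-bit quantization in the G-Buffer is what actually discretizes it.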

Emission

- Don't pack emission. Forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
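For reference (the standard transform, not code from the deck), the RGB to YCoCg pair used by [Mavridis 12] can be written as:

```javascript
// RGB -> YCoCg: Y is luma in [0, 1]; Co / Cg are chroma in [-0.5, 0.5].
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// YCoCg -> RGB: the exact inverse, so the pair round-trips losslessly
// (the coefficients are powers of two, so no precision is lost).
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(ycocgToRgb(rgbToYcocg([0.2, 0.4, 0.6])));
```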

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
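The sign trick is easy to see in isolation. A JavaScript sketch with hypothetical names, assuming depth is strictly positive (0 is reserved for "nothing rendered") and metallic arrives as a boolean:

```javascript
// Encode: metallic surfaces keep positive depth, non-metallic go negative.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

// Full decode recovers both fields...
function unpackDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: Math.sign(w) > 0 };
}

// ...while depth-only consumers (AO, ray marching) just need abs(),
// which is why the depth decode is so cheap.
const fastDepthDecode = Math.abs;

console.log(unpackDepthMetallic(packDepthMetallic(3.5, false))); // { depth: 3.5, metallic: false }
```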

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to constant 0.04
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process

Artifacts

Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance! (detail crops: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)

Enhance! (detail crops: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)

Enhance! (detail crops: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)

Enhance! (detail crops: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%)

Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame

Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular

YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component

YC Lighting
- Works fine with the spherical Gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
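A quick numerical aside (not from the deck): the spherical Gaussian constants track pow(1 - vDotH, 5) closely enough for shading, which can be checked directly:

```javascript
// Classic Schlick power term vs. the spherical Gaussian approximation
// (constants from [Lagarde 12]).
const schlickPower = (vDotH) => Math.pow(1 - vDotH, 5);
const sphericalGaussianPower = (vDotH) =>
  Math.pow(2, (-5.55473 * vDotH - 6.98316) * vDotH);

// Sample the hemisphere and record the worst disagreement.
let maxError = 0;
for (let vDotH = 0; vDotH <= 1; vDotH += 0.01) {
  maxError = Math.max(maxError,
    Math.abs(schlickPower(vDotH) - sphericalGaussianPower(vDotH)));
}
console.log(maxError < 0.01); // true
```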

YC Lighting
- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting
- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
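The same weighting scheme can be exercised on the CPU. A JavaScript mirror of the function above (my port, keeping the two guards from the slide; each neighbor sample is a [luminance, chroma] pair):

```javascript
// Weighted average of neighbor chroma, weighted by luminance similarity:
// neighbors whose luminance matches the center pixel dominate the result.
function reconstructChromaHDR(center, samples) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0;
  let chromaSum = 0;
  for (const [luma, chroma] of samples) {
    let weight = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where a sample is black.
    weight *= luma >= 1e-5 ? 1 : 0;
    totalWeight += weight;
    chromaSum += chroma * weight;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5
    ? [center[1], chromaSum / totalWeight]
    : [0, 0];
}

// A neighbor with matching luminance dominates the reconstruction.
console.log(reconstructChromaHDR([0.5, 0.1],
  [[0.5, 0.3], [0.9, -0.2], [0.0, 0.5], [0.5, 0.3]]));
```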

Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com/ 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Introduction- Demo

- Check out flooredcom for more

Challenges

Architectural

- Clean light-filled aesthetic

- Canrsquot hide tech art deficiencies with grungy textures

Challenges

Challenges

Interior Spaces

- Many secondary light sources rather than single key light

- Direct light fairly high frequency (directionally and spatially)

- Sunlight does not dominate many of our scenes

- Especially in NYC

Challenges

Real world material representation

- Important for communicating quality mood feel

- Comparable real-life counterparts

- Customers are comparing to high-quality offline rendering

Challenges

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline

rendering for quite some time

- Maxwell VRay Arnold etc

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high but so is reusability

- Floored has a variety of art assets spaces furniture lighting

materials

- PBS supports reusability across projects

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend-add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend-add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
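The same shift-by-multiply arithmetic can be sanity checked on the CPU; a minimal JavaScript mirror of the pack / unpack pair above (function names hypothetical):

```javascript
// Pack three 8-bit integers into one float via multiplies (left shifts).
// Safe because the result stays below 2^24, where floats are exact.
function packUint888(r, g, b) {
  return r * 65536.0 + g * 256.0 + b;
}

// Unpack by dividing (right shifts) and flooring, mirroring the GLSL decode.
function unpackUint888(packed) {
  const x = Math.floor(packed / 65536.0);
  const temp = Math.floor(packed / 256.0);
  return [x, temp - x * 256.0, packed - temp * 256.0];
}
```

This is the same arithmetic as `uint8_8_8_to_uint24` / `uint24_to_uint8_8_8`, just in a form that is easy to loop over exhaustively outside the GPU.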

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
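A CPU sketch of the octahedral mapping described in [Cigolle 14] (JavaScript, for illustration; the production GLSL encode / decode differs in detail):

```javascript
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

// Project a unit normal onto the octahedron (L1 normalization), fold the
// lower hemisphere over, then remap from [-1, 1] to [0, 1] for storage.
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    [x, y] = [(1.0 - Math.abs(y)) * signNotZero(x),
              (1.0 - Math.abs(x)) * signNotZero(y)];
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

// Invert the mapping and renormalize back to a unit vector.
function octDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    [x, y] = [(1.0 - Math.abs(y)) * signNotZero(x),
              (1.0 - Math.abs(x)) * signNotZero(y)];
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Without quantization the round trip is exact up to floating point error; the discretization cost only appears once the two components are snapped to 14-bit integers.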

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
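The YCoCg transform itself is a small linear change of basis, which is what makes pre-transforming swatches cheap; a JavaScript sketch:

```javascript
// RGB -> YCoCg: Y is luminance-like; Co / Cg are chroma offsets that fall in
// [-0.5, 0.5] for inputs in [0, 1].
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,
    0.5 * r - 0.5 * b,
    -0.25 * r + 0.5 * g - 0.25 * b,
  ];
}

// Exact inverse: the basis change loses nothing at full precision.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Loss only enters when the two chroma components are checkerboard-subsampled and later reconstructed, which is the tradeoff the following slides measure.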

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: VelocityX 10 Bits, NormalX 14 Bits

B: VelocityY 10 Bits, NormalY 14 Bits

A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: NormalX 12 Bits, NormalY 12 Bits

B: Depth 31 Bits, Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)

G: NormalX 9 Bits (sign bit), Gloss 3 Bits

B: NormalY 9 Bits (sign bit), Gloss 3 Bits

A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit

G: NormalX 9 Bits (sign bit), Gloss 3 Bits

B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: VelocityX 10 Bits, NormalX 14 Bits

B: VelocityY 10 Bits, NormalY 14 Bits

A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);

  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);

  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
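The sign trick is easy to sanity check on the CPU (JavaScript sketch, names hypothetical; assumes depth > 0, since 0 is reserved for infinity):

```javascript
// Metallic rides on the sign bit of depth: metallic surfaces store +depth,
// non-metallic surfaces store -depth. Decode recovers both with abs()/sign().
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) > 0.0 };
}
```

This is why depth decodes so cheaply: a single `abs()` recovers it, with no integer unpacking at all.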

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100%

YC Lighting 100%, YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100%, YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100%, YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100%, YC Lighting 25%

RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
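Because YCoCg is a linear transform of RGB, the YC Fresnel should agree exactly with transforming the RGB Fresnel result; a JavaScript check under that assumption (helper names hypothetical):

```javascript
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

// Y interpolates toward 1 (white at grazing angles); chroma decays toward 0.
function fresnelSchlickYC(vDotH, [fy, fc]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - fy) * power + fy, fc * -power + fc];
}

// Luminance (Y) and Co rows of the RGB -> YCoCg basis, for comparison.
function rgbToYco([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}
```

Since white (1, 1, 1) maps to Y = 1 with zero chroma, the "(1 - F0) * power" term survives only in the luminance component, which is exactly why the chroma component simply decays.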

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);

  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
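A hypothetical CPU port of this reconstruction, useful for verifying the weighting logic off the GPU (JavaScript; `a1..a4` are the cross-neighborhood (Y, C) pairs):

```javascript
// Neighbors whose luminance is close to the center pixel contribute more
// chroma; black samples and all-zero weight sums are guarded as in the GLSL.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    weight *= luma >= 1e-5 ? 1.0 : 0.0; // step(1e-5, luminance)
    totalWeight += weight;
    chromaSum += chroma * weight;
  }
  return totalWeight > 1e-5
    ? [center[1], chromaSum / totalWeight]
    : [0.0, 0.0];
}
```

In a flat neighborhood this degenerates to a plain average of the neighbors' chroma; across a strong luminance edge the mismatched side is exponentially suppressed, which is what keeps chroma from bleeding.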

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Challenges

Architectural

- Clean light-filled aesthetic

- Canrsquot hide tech art deficiencies with grungy textures

Challenges

Challenges

Interior Spaces

- Many secondary light sources rather than single key light

- Direct light fairly high frequency (directionally and spatially)

- Sunlight does not dominate many of our scenes

- Especially in NYC

Challenges

Real world material representation

- Important for communicating quality mood feel

- Comparable real-life counterparts

- Customers are comparing to high-quality offline rendering

Challenges

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline

rendering for quite some time

- Maxwell VRay Arnold etc

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high but so is reusability

- Floored has a variety of art assets spaces furniture lighting

materials

- PBS supports reusability across projects

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec4 vPositionScreenSpace;

varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);

gl_Position = vPositionScreenSpace;

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w

- vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
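The same reprojection math can be sketched on the CPU side in JavaScript (a hypothetical helper, not Floored's code; positions are assumed to be clip-space [x, y, z, w] arrays):

```javascript
// Sketch: NDC-space velocity from current and previous clip-space positions,
// mirroring the fragment shader math above.
function screenSpaceVelocity(clipPos, clipPosOld) {
  // Perspective divide each position, then subtract.
  return [
    clipPos[0] / clipPos[3] - clipPosOld[0] / clipPosOld[3],
    clipPos[1] / clipPos[3] - clipPosOld[1] / clipPosOld[3],
  ];
}
```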

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)

? texture2D(material_uColorMap, colorUV).rgb

: colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;

buffer.color = color;

buffer.gloss = gloss;

buffer.normal = normalCameraSpace;

buffer.depth = depthViewSpace;

buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA

unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw) {

return floor(raw * 255.0);

}

float uint8_8_8_to_uint24(const in vec3 raw) {

const float SHIFT_LEFT_16 = 256.0 * 256.0;

const float SHIFT_LEFT_8 = 256.0;

return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);

}

vec3 color888;

color888.r = normalizedFloat_to_uint8(color.r);

color888.g = normalizedFloat_to_uint8(color.g);

color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw) {

const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);

const float SHIFT_RIGHT_8 = 1.0 / 256.0;

const float SHIFT_LEFT_8 = 256.0;

vec3 res;

res.x = floor(raw * SHIFT_RIGHT_16);

float temp = floor(raw * SHIFT_RIGHT_8);

res.y = -res.x * SHIFT_LEFT_8 + temp;

res.z = -temp * SHIFT_LEFT_8 + raw;

return res;

}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;

color.r = uint8_to_normalizedFloat(color888.r);

color.g = uint8_to_normalizedFloat(color888.g);

color.b = uint8_to_normalizedFloat(color888.b);
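The same shift-by-multiply arithmetic is easy to sanity-check outside the shader. A JavaScript sketch of the pack/unpack pair (helper names are illustrative):

```javascript
// Pack three 8-bit integers (0..255) into a single 24-bit integer that a
// 32-bit float can represent exactly, then unpack using only floor,
// multiply, and add -- the same operations available in GLSL.
function uint888ToUint24(r, g, b) {
  return r * 65536 + g * 256 + b; // r << 16 | g << 8 | b, via arithmetic
}

function uint24ToUint888(packed) {
  const r = Math.floor(packed / 65536);  // shift right 16
  const temp = Math.floor(packed / 256); // shift right 8
  const g = -r * 256 + temp;             // subtract out the high byte
  const b = -temp * 256 + packed;        // subtract out the high bytes
  return [r, g, b];
}
```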

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main() {

// Covers the range of all uint24 with a 4k x 4k canvas.

// Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.

vec2 pixelCoord = floor(vUV * pass_uViewportResolution);

float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

// Encode, decode, and compare.

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

if (expectedDecoded == expected) {

// Packing successful.

gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);

} else {

// Packing failed.

gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);

}

}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {

// Covers the range of all uint24 with a 4k x 4k canvas.

// Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.

vec2 pixelCoord = floor(vUV * pass_uViewportResolution);

float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));

}

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main() {

// Covers the range of all uint24 with a 4k x 4k canvas.

// Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.

vec2 pixelCoord = floor(vUV * pass_uViewportResolution);

float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

vec3 encoded = texture2D(encodedSampler, vUV).xyz;

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

if (decoded == expected) {

// Packing successful.

gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);

} else {

// Packing failed.

gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);

}

}

- Pass 2 Read texture unpack data compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode
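One way to sketch the octahedral mapping, following [Cigolle 14] (a JavaScript sketch; Floored's exact GLSL may differ):

```javascript
// Sketch: octahedral encode/decode of a unit normal to two values in [0, 1].
// Project onto the octahedron |x|+|y|+|z|=1, fold the lower hemisphere
// outward, and remap from [-1, 1] to [0, 1] for storage.
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let u = n[0] * invL1;
  let v = n[1] * invL1;
  if (n[2] < 0.0) { // fold lower hemisphere
    const fu = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    const fv = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
    u = fu; v = fv;
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode(e) {
  let u = e[0] * 2.0 - 1.0;
  let v = e[1] * 2.0 - 1.0;
  let z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold lower hemisphere
    const fu = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    const fv = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
    u = fu; v = fv;
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```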

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
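The RGB to YCoCg transform itself is just a few multiplies and adds. A JavaScript sketch (using the standard YCoCg definition, which may differ slightly from Floored's shader code):

```javascript
// Sketch: RGB <-> YCoCg. Y carries luminance; Co/Cg carry chroma, which is
// what the checkerboard stores at half frequency.
function rgbToYcocg(r, g, b) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}
```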

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: VelocityX 10 Bits, NormalX 14 Bits

B: VelocityY 10 Bits, NormalY 14 Bits

A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: NormalX 12 Bits, NormalY 12 Bits

B: Depth 31 Bits, Metallic 1 Bit

- RGB Float 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in

WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)

G: NormalX 9 Bits (sign bit), Gloss 3 Bits

B: NormalY 9 Bits (sign bit), Gloss 3 Bits

A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit

G: NormalX 9 Bits (sign bit), Gloss 3 Bits

B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: VelocityX 10 Bits, NormalX 14 Bits

B: VelocityY 10 Bits, NormalY 14 Bits

A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float 128bpp

- Let's take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {

vec4 res;

// Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.

vec3 colorYcocg = rgbToYcocg(components.color);

vec2 colorYc;

colorYc.x = colorYcocg.x;

colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

colorYc.y += CHROMA_BIAS;

res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(components.normal);

vec2 normalOctohedronQuantized;

normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);

normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.

// -512 and 511 both represent infinity.

vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;

velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));

velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));

res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
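The velocity quantization step, sketched in JavaScript (SUB_PIXEL_PRECISION_STEPS is assumed to be 4 here for illustration; the real value is whatever the engine uses):

```javascript
// Sketch: map a [-1, 1] screen-space velocity into an unsigned 10-bit lane
// (0..1023), biasing the signed range as the packing code above does.
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed value

function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0)); // clamp; the ends mean "out of range"
  return q + 512.0; // bias into 0..1023 for unsigned storage
}

function dequantizeVelocity(q, resolution) {
  return (q - 512.0) / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```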

Packing Depth and Metallic

// Pack depth and metallic together.

// If not metallic, negate depth. Extract bool as sign().

res.w = components.depth * components.metallic;

return res;

}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space

sampling shaders such as AO
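The sign trick is trivially testable in isolation. A JavaScript sketch (the shader stores metallic as a ±1 multiplier; a boolean is used here for clarity):

```javascript
// Sketch: hide a 1-bit metallic flag in the sign of the depth value.
// Valid surface depth is assumed > 0; a stored value of 0 means "infinity".
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```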

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler,

const in vec2 uv,

const in vec2 gBufferResolution,

const in vec2 inverseGBufferResolution) {

gBufferComponents res;

vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.

if (res.depth <= 0.0) {

res.color = vec3(0.0);

return res;

}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);

vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;

normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);

normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);

res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {

// When velocity is out of representable range, throw it outside of screen space for culling in future passes:

// sqrt(2) + 1e-3

res.velocity = vec2(1.41521356);

} else {

res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;

}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;

colorYcocg.x = colorGlossData.x;

colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;

gBufferSampleYc1.y -= CHROMA_BIAS;

gBufferSampleYc2.y -= CHROMA_BIAS;

gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));

vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));

vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));

vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);

float gBufferSampleDepth1 = abs(gBufferSample1.w);

float gBufferSampleDepth2 = abs(gBufferSample2.w);

float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.

gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);

gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);

gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);

gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,

gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);

colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting.

res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted; approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {

float power = pow(1.0 - vDotH, 5.0);

return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;

}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {

float power = pow(1.0 - vDotH, 5.0);

return vec2(

(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,

reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y

);

}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an

ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD

and an ADD from the skipped 3rd component.

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {

float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);

return vec2(

(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,

reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y

);

}
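A quick numeric check of the two parameterizations (a JavaScript sketch): for a grayscale reflection coefficient the chroma channel is zero, and the YC luminance matches any RGB channel of the standard form.

```javascript
// Sketch: Schlick fresnel in RGB and in luminance/chroma (YC) form.
function fresnelSchlick(vDotH, r0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return r0.map((c) => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, r0YC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - r0YC[0]) * power + r0YC[0], // luminance: standard Schlick
    r0YC[1] * -power + r0YC[1],        // chroma: fades to zero at grazing
  ];
}
```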

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {

vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);

vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);

vec4 lumaDelta = abs(luminance - vec4(center.x));

const float SENSITIVITY = 25.0;

vec4 weight = exp2(-SENSITIVITY * lumaDelta);

// Guard the case where a sample is black.

weight *= step(1e-5, luminance);

float totalWeight = weight.x + weight.y + weight.z + weight.w;

// Guard the case where all weights are 0.

return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);

}
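A JavaScript port of the same weighting scheme (a sketch for exercising the logic off-GPU; names are illustrative):

```javascript
// Sketch: reconstruct the missing chroma component from 4 cross neighbors,
// weighting each neighbor's chroma by how close its luminance is to the
// center pixel's luminance.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard: ignore black samples
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard: if every weight is 0, fall back to zero chroma.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```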

Thanks for listening

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars

Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources[WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode Sébastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01 Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02 Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van

Waveren, Ignacio Castaño 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A Lobanchikov, Holger Gruen, Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V Sander, Pitchaya Sitthi-amorn, Jason Lawrence,

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P.

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R Marschner, Hongsong Li, Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Masking-Shadowing Function

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sébastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K Nayar 1994

Challenges

Challenges

Interior Spaces

- Many secondary light sources rather than single key light

- Direct light fairly high frequency (directionally and spatially)

- Sunlight does not dominate many of our scenes

- Especially in NYC

Challenges

Real world material representation

- Important for communicating quality mood feel

- Comparable real-life counterparts

- Customers are comparing to high-quality offline rendering

Challenges

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline

rendering for quite some time

- Maxwell VRay Arnold etc

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high but so is reusability

- Floored has a variety of art assets spaces furniture lighting

materials

- PBS supports reusability across projects

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
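The GLSL pair above is easy to verify off-GPU. Here is a direct Python port of the same arithmetic (a sketch for testing the math, not production code):

```python
from math import floor

def uint8_8_8_to_uint24(r, g, b):
    # Multiplies stand in for left shifts: GLSL ES 1.0 has no bitwise ops.
    return r * 256.0 * 256.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    x = floor(raw / (256.0 * 256.0))   # top 8 bits
    temp = floor(raw / 256.0)          # top 16 bits
    y = -x * 256.0 + temp              # middle 8 bits
    z = -temp * 256.0 + raw            # bottom 8 bits
    return x, y, z

samples = [(0, 0, 0), (255, 255, 255), (1, 2, 3), (200, 100, 50)]
assert all(uint24_to_uint8_8_8(uint8_8_8_to_uint24(r, g, b)) == (r, g, b)
           for r, g, b in samples)
```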

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two-pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
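For illustration, a hedged Python sketch of one common octahedral encode / decode variant (as described in [Cigolle 14]; Floored's actual octohedronEncode / octohedronDecode may differ in details such as sign handling):

```python
from math import copysign, sqrt

def sign_not_zero(v):
    # GLSL sign() returns 0 at 0, which breaks the fold; use +/-1 instead.
    return copysign(1.0, v)

def oct_encode(n):
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)            # project onto the octahedron
    px, py = x / s, y / s
    if z < 0.0:                             # fold the lower hemisphere
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    return 0.5 * px + 0.5, 0.5 * py + 0.5   # remap to the 0..1 domain

def oct_decode(u, v):
    px, py = 2.0 * u - 1.0, 2.0 * v - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:                             # unfold the lower hemisphere
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    length = sqrt(px * px + py * py + z * z)
    return px / length, py / length, z / length

n = (0.6, 0.0, -0.8)
assert all(abs(a - b) < 1e-6 for a, b in zip(oct_decode(*oct_encode(n)), n))
```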

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
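For reference, the standard RGB to YCoCg transform is linear and exactly invertible; the shader's rgbToYcocg / YcocgToRgb are assumed to implement something equivalent. A Python sketch:

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance
    co =  0.5  * r            - 0.5 * b   # orange chroma, in [-0.5, 0.5]
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma, in [-0.5, 0.5]
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the matrix above.
    return y + co - cg, y + cg, y - co - cg

rgb = (0.2, 0.7, 0.4)
assert all(abs(a - b) < 1e-9
           for a, b in zip(ycocg_to_rgb(*rgb_to_ycocg(*rgb)), rgb))
```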

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: VelocityX 10 bits | NormalX 14 bits

B: VelocityY 10 bits | NormalY 14 bits

A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: NormalX 12 bits | NormalY 12 bits

B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving. RGB Float is deprecated in WebGL: it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (sign bit)

G: NormalX 9 bits (sign bit) | Gloss 3 bits

B: NormalY 9 bits (sign bit) | Gloss 3 bits

A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit

G: NormalX 9 bits (sign bit) | Gloss 3 bits

B: NormalY 9 bits (sign bit) | Gloss 3 bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: VelocityX 10 bits | NormalX 14 bits

B: VelocityY 10 bits | NormalY 14 bits

A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
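The sign() trick can be modeled in a few lines. A Python sketch (assumes view-space depth is strictly positive for shaded surfaces, with zero reserved for samples at infinity):

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0.0 assumed; non-metallic surfaces carry a negative sign.
    return depth if metallic else -depth

def unpack_depth_metallic(packed):
    # abs() recovers depth; the sign recovers the metallic flag.
    return abs(packed), packed > 0.0

assert unpack_depth_metallic(pack_depth_metallic(3.5, False)) == (3.5, False)
assert unpack_depth_metallic(pack_depth_metallic(3.5, True)) == (3.5, True)
```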

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance!

(Four detail-comparison slides follow, each showing crops labeled RGB Lighting 100%, YC Lighting 100%, RGB Lighting 25%, and YC Lighting 25%.)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
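A quick numerical check of that claim: the luminance lane behaves exactly like scalar Schlick, while the chroma lane starts at its reflection coefficient and decays to zero at grazing. A Python sketch of the slide's math:

```python
def fresnel_schlick(v_dot_h, r0):
    # Scalar Schlick: r0 at normal incidence, 1.0 at grazing.
    power = (1.0 - v_dot_h) ** 5.0
    return (1.0 - r0) * power + r0

def fresnel_schlick_yc(v_dot_h, r0_y, r0_c):
    power = (1.0 - v_dot_h) ** 5.0
    y = (1.0 - r0_y) * power + r0_y   # luminance: same shape as scalar Schlick
    c = r0_c * -power + r0_c          # chroma: r0_c at normal incidence, 0 at grazing
    return y, c

y, c = fresnel_schlick_yc(0.0, 0.04, 0.01)       # grazing angle
assert abs(y - fresnel_schlick(0.0, 0.04)) < 1e-12 and abs(c) < 1e-12
y, c = fresnel_schlick_yc(1.0, 0.04, 0.01)       # normal incidence
assert abs(y - 0.04) < 1e-12 and abs(c - 0.01) < 1e-12
```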

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
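A Python port of the same weighting scheme makes the two guards easy to exercise (the SENSITIVITY default here is an assumption taken from the slide's constant):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs; a1..a4 is the cross neighborhood.
    luminance = [a[0] for a in (a1, a2, a3, a4)]
    chroma = [a[1] for a in (a1, a2, a3, a4)]
    # Weight neighbors by luminance similarity; zero out black samples.
    weights = [2.0 ** (-sensitivity * abs(l - center[0])) * (1.0 if l >= 1e-5 else 0.0)
               for l in luminance]
    total = sum(weights)
    if total <= 1e-5:
        # All samples black or rejected: no chroma to reconstruct.
        return (0.0, 0.0)
    reconstructed = sum(c * w for c, w in zip(chroma, weights)) / total
    return (center[1], reconstructed)

# Neighbors with matching luminance dominate the reconstruction.
yc = reconstruct_chroma_hdr((0.5, 0.2), (0.5, 0.3), (0.5, 0.3), (0.9, -0.4), (0.0, 0.0))
assert abs(yc[1] - 0.3) < 0.01
```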

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions?

nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER

httpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources

[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994


- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
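As an illustration of the octahedral mapping (a sketch of the standard [Cigolle 14] construction, not Floored's exact shader code), the fold step maps the lower hemisphere onto the outer triangles of the unit square:

```python
import math

def oct_encode(n):
    # n: unit vector (x, y, z) -> octahedral coordinates in [0, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals.
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    # Remap from [-1, 1] to [0, 1] so the full unorm domain is used.
    return (px * 0.5 + 0.5, py * 0.5 + 0.5)

def oct_decode(e):
    px, py = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        # Unfold the lower hemisphere.
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    length = math.sqrt(px * px + py * py + z * z)
    return (px / length, py / length, z / length)

# Round trip a lower-hemisphere unit vector.
v = (0.6, -0.48, -0.64)
d = oct_decode(oct_encode(v))
assert all(abs(a - b) < 1e-6 for a, b in zip(v, d))
```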

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
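For illustration, the YCoCg transform and a checkerboard chroma selector might look like this (a sketch; `checkerboard_interlace` here simply picks Co or Cg by pixel parity, which is the idea behind the interlacing, not the production shader):

```python
def rgb_to_ycocg(r, g, b):
    # Y carries luminance; Co / Cg carry chroma.
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r           - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the linear transform above.
    return (y + co - cg, y + cg, y - co - cg)

def checkerboard_interlace(co, cg, px, py):
    # Store Co on one checkerboard parity and Cg on the other.
    return co if (px + py) % 2 == 0 else cg

rgb = (0.2, 0.4, 0.8)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert all(abs(a - b) < 1e-9 for a, b in zip(rgb, back))
```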

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump / 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;

    return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
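The sign trick above is easy to mirror on the CPU. A minimal sketch, assuming depth is strictly positive and metallic is stored as ±1.0 as in the slides:

```python
def encode_depth_metallic(depth, metallic_sign):
    # metallic_sign is +1.0 (metallic) or -1.0 (non-metallic);
    # the flag rides in the float's sign bit for free.
    return depth * metallic_sign

def decode_depth_metallic(w):
    depth = abs(w)
    metallic_sign = 1.0 if w > 0.0 else -1.0  # sign() in GLSL
    return depth, metallic_sign

assert decode_depth_metallic(encode_depth_metallic(3.5, -1.0)) == (3.5, -1.0)
```

Decode costs a single abs(), which is why depth stays cheap to read for AO-style shaders.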

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
        const in sampler2D gBufferSampler,
        const in vec2 uv,
        const in vec2 gBufferResolution,
        const in vec2 inverseGBufferResolution) {
    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;
    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
        // sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity
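The 10-bit velocity quantization round trip can be sketched on the CPU as follows. This is an illustration in pixel units; the value of SUB_PIXEL_PRECISION_STEPS is an assumption, since the slides don't state the actual constant:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed granularity; not stated in the slides

def encode_velocity(v_pixels):
    # Quantize signed pixel velocity into 10 bits (0..1023), biased by 512.
    q = math.floor(min(max(v_pixels * SUB_PIXEL_PRECISION_STEPS, -512.0), 511.0))
    return q + 512.0

def decode_velocity(q):
    v = q - 512.0
    if abs(v) > 510.0:
        return None  # out of range: both extremes represent "infinity"
    return v / SUB_PIXEL_PRECISION_STEPS

assert decode_velocity(encode_velocity(3.25)) == 3.25
assert decode_velocity(encode_velocity(1000.0)) is None  # clamped to the infinity marker
```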

Decode G-Buffer: RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

    return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (detail crop comparison slides)

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
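Why this works: Schlick's approximation is linear in the reflection coefficient, and RGB to YCoCg is a linear transform that maps white to (Y=1, Co=0, Cg=0), so evaluating fresnel per channel in RGB and then converting agrees exactly with the YC formulas above. A quick Python check of that claim (the helper names here are illustrative, not from the slides):

```python
def rgb_to_ycocg(r, g, b):
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: F0 * (1 - p) + p, with p = (1 - vDotH)^5.
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - f) * p + f for f in f0)

def schlick_yc(v_dot_h, y0, c0):
    p = (1.0 - v_dot_h) ** 5
    # Luminance keeps the RGB shape; chroma decays toward zero at grazing angles.
    return (1.0 - y0) * p + y0, c0 * -p + c0

f0 = (0.2, 0.5, 0.3)
v_dot_h = 0.7
y0, co0, cg0 = rgb_to_ycocg(*f0)
y_ref, co_ref, cg_ref = rgb_to_ycocg(*schlick_rgb(v_dot_h, f0))
y, co = schlick_yc(v_dot_h, y0, co0)
_, cg = schlick_yc(v_dot_h, y0, cg0)
assert abs(y - y_ref) < 1e-12 and abs(co - co_ref) < 1e-12 and abs(cg - cg_ref) < 1e-12
```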

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
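A Python port of the same luminance-weighted reconstruction makes its behavior easy to poke at off the GPU (a sketch mirroring the GLSL above; SENSITIVITY taken as 25.0):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # Each argument is a (luminance, chroma) pair; neighbors vote on the
    # missing chroma component, weighted by luminance similarity.
    SENSITIVITY = 25.0
    neighbors = (a1, a2, a3, a4)
    weights = [2.0 ** (-SENSITIVITY * abs(l - center[0])) * (1.0 if l >= 1e-5 else 0.0)
               for l, _ in neighbors]
    total = sum(weights)
    if total <= 1e-5:
        return (0.0, 0.0)  # all neighbors rejected
    chroma = sum(c * w for (_, c), w in zip(neighbors, weights)) / total
    return (center[1], chroma)

# Neighbors with matching luminance dominate; a bright outlier and a black
# (infinity) sample contribute essentially nothing.
out = reconstruct_chroma_hdr((0.5, 0.2), (0.5, 0.1), (0.5, 0.1), (5.0, 0.9), (0.0, 0.9))
assert out[0] == 0.2 and abs(out[1] - 0.1) < 1e-6
```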

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Challenges

Real world material representation

- Important for communicating quality mood feel

- Comparable real-life counterparts

- Customers are comparing to high-quality offline rendering

Challenges

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline

rendering for quite some time

- Maxwell VRay Arnold etc

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high but so is reusability

- Floored has a variety of art assets spaces furniture lighting

materials

- PBS supports reusability across projects

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode

Emission

- Don't pack emission; forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system is sensitive to luminance shifts

- Human perceptual system is fairly insensitive to chroma shifts

- Color swatches and textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
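For reference, the YCoCg transform and the checkerboard selection are only a few multiply-adds. A C sketch (the matrix follows [Mavridis 12]; helper names are illustrative, not the deck's GLSL):

```c
/* RGB -> YCoCg: one luminance axis plus two chroma axes. */
static void rgb_to_ycocg(const float rgb[3], float ycocg[3]) {
    ycocg[0] =  0.25f * rgb[0] + 0.5f * rgb[1] + 0.25f * rgb[2];  /* Y  */
    ycocg[1] =  0.5f  * rgb[0]                 - 0.5f  * rgb[2];  /* Co */
    ycocg[2] = -0.25f * rgb[0] + 0.5f * rgb[1] - 0.25f * rgb[2];  /* Cg */
}

/* YCoCg -> RGB: the exact inverse, adds only. */
static void ycocg_to_rgb(const float ycocg[3], float rgb[3]) {
    rgb[0] = ycocg[0] + ycocg[1] - ycocg[2];
    rgb[1] = ycocg[0]            + ycocg[2];
    rgb[2] = ycocg[0] - ycocg[1] - ycocg[2];
}

/* Checkerboard interlace: even pixels keep (Y, Co), odd pixels keep (Y, Cg),
   so each pixel writes only one of the two chroma components. */
static float checkerboard_interlace(const float chroma[2], int px, int py) {
    return ((px + py) & 1) == 0 ? chroma[0] : chroma[1];
}
```

The transform is lossless in float, so all of the loss comes from storing each chroma component at half frequency.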

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float: 128bpp

- Sign bits of R, G and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: NormalX 12 Bits | NormalY 12 Bits

B: Depth 31 Bits | Metallic 1 Bit

- RGB Float: 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL; could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)

G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float: 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit

G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float: 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump / 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float: 128bpp

- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
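Because metallic only occupies the sign, the fast-path depth decode is a single abs(). A C sketch of the idea (names hypothetical; metallic is assumed pre-mapped to +1.0 / -1.0 as in the slide):

```c
#include <math.h>

/* Encode: metallic_sign is +1.0 for metallic surfaces, -1.0 otherwise.
   Depth is strictly positive; a w of 0.0 is reserved for "at infinity". */
static float encode_depth_metallic(float depth, float metallic_sign) {
    return depth * metallic_sign;
}

/* Fast path for AO / ray marching shaders: depth is just the magnitude. */
static float decode_depth(float gbuffer_w) {
    return fabsf(gbuffer_w);
}

/* Full decode also recovers the metallic flag from the sign. */
static int decode_metallic(float gbuffer_w) {
    return gbuffer_w > 0.0f;
}
```

Screen-space sampling passes that only need depth never pay for the metallic extraction at all.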

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    res.velocity = vec2(1.41521356);  // sqrt(2) + 1e-3
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

  - Luminance Similarity

  - Geometric Similarity

    - Depth

    - Normal

    - Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

  - OIT Transparency Composite

  - Anti-Aliasing

  - Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);

  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks: Floored Engineering
Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final?
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Challenges

Challenges

webGL

- Limited OpenGL ES API

- Variable browser support

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization

Approach

- Physically Based Shading

- Deferred Rendering

- Temporal Amortization [Yang 09][Herzog 10][Wronski 14][Karis 14]

Physically Based Shading

Physically Based Shading

- Scalable Quality

- Architectural visualization industry has embraced PBS in offline

rendering for quite some time

- Maxwell VRay Arnold etc

- High Standards

- Vocabulary of PBS connects real time and offline disciplines

- Offline can more readily consume real time assets

- Real time can more readily consume offline assets

Physically Based Shading

- Authoring cost is high but so is reusability

- Floored has a variety of art assets spaces furniture lighting

materials

- PBS supports reusability across projects

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctahedron = octahedronEncode(components.normal);
  vec2 normalOctahedronQuantized;
  normalOctahedronQuantized.x = normalizedFloat_to_uint14(normalOctahedron.x);
  normalOctahedronQuantized.y = normalizedFloat_to_uint14(normalOctahedron.y);

  // Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctahedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctahedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
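The depth / metallic trick is easy to mirror on the CPU. A minimal JavaScript sketch (function names are ours, not from the codebase; depth is assumed strictly positive, with 0 reserved for infinity):

```javascript
// Metallic is a boolean flag folded into the sign of the always-positive
// view-space depth, so decoding costs a single abs() or sign test.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepth(packed) {
  return Math.abs(packed); // cheap: one abs()
}

function unpackMetallic(packed) {
  return packed > 0.0; // the sign carries the flag
}
```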

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctahedron;
normalOctahedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctahedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octahedronDecode(normalOctahedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space
  // for culling in future passes: sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
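Why this works: YCoCg is a linear transform, and Schlick's formula is affine in the reflection coefficient, so evaluating fresnel in YC space matches converting the RGB result afterwards. A quick JavaScript check (helper names are ours):

```javascript
// Schlick per RGB channel: F = (1 - R0) * p + R0, with p = (1 - vDotH)^5.
function fresnelSchlickRgb(vDotH, r0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return r0.map((c) => (1.0 - c) * p + c);
}

// YC variant from the slides: luminance as usual, chroma with the inverted term.
function fresnelSchlickYc(vDotH, [r0y, r0c]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - r0y) * p + r0y, r0c * -p + r0c];
}

// Luminance and Co rows of the YCoCg transform, for the comparison.
const lumaOf = ([r, g, b]) => 0.25 * r + 0.5 * g + 0.25 * b;
const coOf = ([r, , b]) => 0.5 * r - 0.5 * b;
```

Per channel, F = p + R0 * (1 - p); luminance of F is p + Y(R0) * (1 - p), and the chroma of the constant p term is zero, which is exactly the inverted chroma expression.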

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
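The same filter can be mirrored on the CPU for testing. A JavaScript sketch of the luminance-weighted reconstruction above (25.0 sensitivity as in the shader; samples are [luminance, storedChroma] pairs):

```javascript
// Neighbors whose luminance is close to the center's get exponentially more
// weight; black samples are rejected so silhouettes don't bleed chroma.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let chromaSum = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard: sample is black / at infinity
    chromaSum += chroma * w;
    totalWeight += w;
  }
  if (totalWeight <= 1e-5) return [0.0, 0.0]; // guard: all weights are zero
  return [center[1], chromaSum / totalWeight];
}
```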

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Shadow Masking Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sebastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994


Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D, Normal Distribution Function: GGX [Walter 07]

- G, Geometry / Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F, Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization: Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give the color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay the cost of the worst case:

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to G-Buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend add outgoing radiance to render target
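The inverted loop structure above can be sketched in JavaScript (everything here is illustrative; the real shading runs in the light proxy's fragment shader with blend-add into the accumulation buffer):

```javascript
// gBuffer: array of decoded per-pixel surface data.
// lights: scene lights with a scalar intensity.
// brdf(surface, light): returns reflectance * projected area for this sketch.
function accumulateDirectLight(gBuffer, lights, brdf) {
  const radiance = new Float32Array(gBuffer.length); // light accumulation buffer
  for (const light of lights) {             // lights on the outside...
    for (let px = 0; px < gBuffer.length; ++px) { // ...pixels on the inside
      const surface = gBuffer[px];          // read geometric + material data
      radiance[px] += light.intensity * brdf(surface, light); // blend add
    }
  }
  return radiance;
}
```

Because each light only adds into the buffer, lights are independent and can be culled or batched freely, which is the point of the deferred arrangement.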

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
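Since JavaScript numbers are doubles (which also represent every integer up to 2^24 exactly), the same shift-by-multiply arithmetic can be sanity-checked on the CPU before trusting it in a shader. A sketch with illustrative names:

```javascript
// Pack three 8-bit integers into one float-representable 24-bit integer,
// using only multiplies, floors, and adds (mirrors the GLSL above).
function uint888ToUint24([x, y, z]) {
  return x * 65536 + y * 256 + z; // shift left via multiply
}

function uint24ToUint888(raw) {
  const x = Math.floor(raw / 65536);   // shift right via divide + floor
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}
```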

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
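A JavaScript sketch of the octahedral mapping, following [Cigolle 14] (helper names are ours; inputs are unit vectors):

```javascript
// Project the unit sphere onto an octahedron, then unfold it to a square
// in [0,1]^2. The lower hemisphere is folded over the diagonal edges.
function octahedronEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) { // fold the lower hemisphere over
    const fu = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    const fv = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
    u = fu; v = fv;
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // -1..1 to the full 0..1 domain
}

function octahedronDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold the lower hemisphere
    const fu = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    const fv = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
    u = fu; v = fv;
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```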

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
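The octahedral mapping behind octohedronEncode / octohedronDecode [Cigolle 14] can be sketched in Python (without the 14-bit quantization step; a GLSL-style sign-not-zero is assumed so folded points stay on the octahedron):

```python
def _sign(x):
    # sign() that maps 0 -> 1, so the fold never collapses to the origin.
    return -1.0 if x < 0.0 else 1.0

def oct_encode(n):
    """Unit 3-vector -> two floats in 0..1."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)          # project onto the octahedron
    x, y = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals.
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)  # remap -1..1 to 0..1

def oct_decode(e):
    """Two floats in 0..1 -> unit 3-vector."""
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    l = (u * u + v * v + z * z) ** 0.5     # renormalize
    return (u / l, v / l, z / l)

# Round trip a lower-hemisphere normal (the folded case).
import math
v = (1.0, 2.0, -3.0)
l = math.sqrt(sum(c * c for c in v))
n = tuple(c / l for c in v)
assert all(abs(a - b) < 1e-9 for a, b in zip(n, oct_decode(oct_encode(n))))
```

The full 0..1 domain is used, which is what makes the 14-bit quantization in the G-buffer reasonably uniform across the sphere.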

Decode G-Buffer: RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of
        // screenspace for culling in future passes: sqrt(2) + 1e-3.
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer: RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
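For reference, the luma/chroma basis this pipeline leans on can be sketched in Python; the coefficients below are the standard YCoCg transform (assumed to match the engine's rgbToYcocg). It round-trips exactly, and chroma collapses to zero on achromatic surfaces, which is what makes checkerboard subsampling cheap:

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luma
    co =  0.5  * r            - 0.5  * b  # orange/blue chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green/magenta chroma
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    return y + co - cg, y + cg, y - co - cg

# Exact inverse; greys carry zero chroma.
for rgb in [(1.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.2, 0.7, 0.1)]:
    back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
    assert all(abs(a - b) < 1e-9 for a, b in zip(rgb, back))
assert rgb_to_ycocg(0.5, 0.5, 0.5)[1:] == (0.0, 0.0)
```

Storing Y every pixel and alternating Co/Cg per pixel is exactly the compact frame buffer layout of [Mavridis 12].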

Decode G-Buffer: RGB Lighting

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
        gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB->YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look.

Enhance

[Four detail-crop comparison screenshots, each labeled: RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%]

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}
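A quick numeric check of Schlick's endpoints makes the behavior concrete: at normal incidence the term returns the base reflectance F0, and at grazing angles every material trends toward a perfect mirror. A scalar Python sketch:

```python
def fresnel_schlick(v_dot_h, f0):
    # Schlick's approximation of Fresnel reflectance for one channel.
    power = (1.0 - v_dot_h) ** 5.0
    return (1.0 - f0) * power + f0

# Normal incidence (vDotH = 1): reflectance is exactly F0 (0.04 for dielectrics).
assert fresnel_schlick(1.0, 0.04) == 0.04
# Grazing incidence (vDotH = 0): reflectance goes to 1 regardless of F0.
assert abs(fresnel_schlick(0.0, 0.04) - 1.0) < 1e-9
```

This is the luminance behavior the YC variant below keeps; only the chroma channel is inverted.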

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
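The spherical gaussian form replaces pow with exp2, which is cheaper on many GPUs. A Python sketch comparing the two over the visible range shows they agree closely (the 0.02 bound is an observed tolerance, not a published error figure):

```python
def schlick_power(v_dot_h):
    # Exact Schlick falloff term.
    return (1.0 - v_dot_h) ** 5.0

def spherical_gaussian_power(v_dot_h):
    # exp2-based approximation [Lagarde 12].
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Worst-case absolute difference over vDotH in [0, 1] stays well under 0.02.
worst = max(abs(schlick_power(x / 100.0) - spherical_gaussian_power(x / 100.0))
            for x in range(101))
assert worst < 0.02
```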

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
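A direct Python port of the GLSL above is handy for sanity checking the weighting off-GPU. Same constants, same guards; inputs are (luminance, chroma) pairs and the return is (known chroma, reconstructed chroma):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    """center, a1..a4: (luminance, chroma) pairs from the cross neighborhood."""
    SENSITIVITY = 25.0
    samples = [a1, a2, a3, a4]
    weights = []
    for luma, _ in samples:
        w = 2.0 ** (-SENSITIVITY * abs(luma - center[0]))  # exp2 falloff on luma delta
        w *= 1.0 if luma >= 1e-5 else 0.0                  # guard black samples (step)
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:                                      # guard all-zero weights
        return (0.0, 0.0)
    blended = sum(w * c for w, (_, c) in zip(weights, samples)) / total
    return (center[1], blended)

# Neighbors whose luminance matches the center dominate: two matching
# neighbors with chroma 0.2 outweigh two dim outliers carrying 0.9.
known, recon = reconstruct_chroma_hdr((1.0, 0.5),
                                      (1.0, 0.2), (1.0, 0.2),
                                      (0.2, 0.9), (0.0, 0.9))
assert known == 0.5
assert abs(recon - 0.2) < 0.01
```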

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Tomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994



Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);

colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
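The checkerboard bookkeeping can be made concrete with a tiny Python sketch. The pixel-parity convention here is an assumption; the slides' getCheckerboard is not shown:

```python
def get_checkerboard(px, py):
    # Which chroma plane (Co or Cg) this pixel stored; alternates per pixel
    # in both x and y, like a checkerboard.
    return (px + py) % 2

def order_chroma(px, py, stored, reconstructed):
    # The G-Buffer holds Co on one checkerboard phase and Cg on the other,
    # so the (Co, Cg) ordering flips with pixel parity after reconstruction.
    if get_checkerboard(px, py) == 0:
        return (stored, reconstructed)
    return (reconstructed, stored)
```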

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg; returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later in the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered with:

- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count (a 25% target covers only 6.25% of the pixels)

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
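Because Schlick's formula is affine in the reflection coefficient, and YCoCg is a linear transform in which white has Y = 1 and zero chroma, the YC evaluation matches the RGB one exactly. An illustrative Python check of that claim (ports are mine, not the slides' GLSL):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg forward transform; white maps to (1, 0, 0).
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * power + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Luminance term unchanged; chroma term inverted, approaching zero
    # at perpendicular (power -> 1) since chroma of white is zero.
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - f0_yc[0]) * power + f0_yc[0],
            f0_yc[1] * -power + f0_yc[1])

f0 = (0.95, 0.64, 0.54)  # copper-like specular color, for illustration
expected = rgb_to_ycocg(*fresnel_schlick_rgb(0.3, f0))
y0, co0, _ = rgb_to_ycocg(*f0)
got = fresnel_schlick_yc(0.3, (y0, co0))
```

Evaluating Fresnel in RGB and converting to YCoCg gives the same (Y, Co) as evaluating directly in YC, so nothing is approximated by the switch of basis.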

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to the RG components of the render target

- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth stencil test light proxies
- Interesting for tiled deferred [Olsson 11] and clustered [Billeter 12] approaches
- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);

  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
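A direct Python port of the function above, to illustrate the weighting behavior (names are pythonized; the logic follows the GLSL):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # center and the four neighbors are (luminance, chroma) pairs; the
    # neighbors carry the chroma component the center pixel is missing.
    SENSITIVITY = 25.0
    luminance = [a[0] for a in (a1, a2, a3, a4)]
    chroma = [a[1] for a in (a1, a2, a3, a4)]
    # Weight neighbors by luminance similarity to the center.
    weight = [2.0 ** (-SENSITIVITY * abs(l - center[0])) for l in luminance]
    # Guard the case where a sample is black.
    weight = [w if l >= 1e-5 else 0.0 for w, l in zip(weight, luminance)]
    total = sum(weight)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    return (center[1],
            sum(c * w for c, w in zip(chroma, weight)) / total)
```

Neighbors whose luminance differs strongly from the center contribute almost nothing, which is what keeps the reconstructed chroma from bleeding across strong luminance edges.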

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994



and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, SIGGRAPH 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf

[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/

[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/

[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home

[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/

Resources

[Shishkovtsov 05] Deferred Shading in STALKER. Oles Shishkovtsov, 2005. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt

[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3

[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf

[Olsson 11] Clustered Deferred and Forward Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading

Resources

[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading

[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf

[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf

[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/

[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf

Resources

[Heitz 14] Understanding the Shadow Masking Function. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/

[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf


Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization - Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures: better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS kept small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend / Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec3 vPositionScreenSpace;
varying vec3 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
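Because the packing above is ordinary float arithmetic, it can be prototyped off-GPU before touching a shader. A small Python sketch of the same encode/decode math (illustrative only; the function names mirror the GLSL, and Python doubles stand in for GLSL highp floats):

```python
import math

def uint8_8_8_to_uint24(r, g, b):
    # "Shift left" with multiplies: equivalent to (r << 16) | (g << 8) | b.
    return r * 256.0 * 256.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    # "Shift right" with divides and floors, then peel each byte back off.
    x = math.floor(raw / (256.0 * 256.0))
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return float(x), float(y), float(z)

# Round-trip a sample of the uint24 domain.
for packed in range(0, 1 << 24, 4097):
    r, g, b = uint24_to_uint8_8_8(float(packed))
    assert uint8_8_8_to_uint24(r, g, b) == float(packed)
```

The same stride-sampled loop is essentially the CPU version of the exhaustive 4096 x 4096 GPU test described in the next section.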

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
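The octahedral mapping can be sketched in a few lines. This is a Python port of the approach surveyed in [Cigolle 14] (illustrative names, not the shader's `octohedronEncode`/`octohedronDecode`): project onto the octahedron, fold the lower hemisphere over the upper, and invert by rebuilding z and renormalizing.

```python
def octahedron_encode(n):
    # Project the unit vector onto the octahedron |x|+|y|+|z| = 1, then fold
    # the lower hemisphere over the upper; the result lands in [-1, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    return px, py

def octahedron_decode(e):
    # Inverse: unfold the lower hemisphere, rebuild z, renormalize.
    px, py = e
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    length = (px * px + py * py + z * z) ** 0.5
    return px / length, py / length, z / length

# Round trip an arbitrary unit normal.
n = octahedron_decode(octahedron_encode((0.6, 0.0, 0.8)))
assert all(abs(a - b) < 1e-6 for a, b in zip(n, (0.6, 0.0, 0.8)))
```

In the G-Buffer the two encoded components are then quantized to 14 bits each with `normalizedFloat_to_uint14`, so the only loss is the quantization step itself.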

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
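The RGB <-> YCoCg pair is linear and exactly invertible, which is what makes checkerboarding safe. A minimal Python sketch of the standard transform (as used by [Mavridis 12]; not Floored's exact shader code):

```python
def rgb_to_ycocg(r, g, b):
    # Y is luminance in [0, 1]; Co/Cg are signed chroma in [-0.5, 0.5].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    tmp = y - cg
    return tmp + co, y + cg, tmp - co

# White (and every grey) carries zero chroma, which is what makes chroma
# subsampling cheap for the mostly neutral interiors described earlier.
assert rgb_to_ycocg(1.0, 1.0, 1.0) == (1.0, 0.0, 0.0)
```

Note the signed chroma range is why the shader adds a `CHROMA_BIAS` before packing into an unsigned 8-bit slot.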

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. It could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
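Storing the metallic flag in the sign of depth costs no extra bits because view-space depth is strictly positive (0.0 is reserved for infinity, which the decode's early-out handles). The same trick in Python, mirroring the `res.w` line above (illustrative names):

```python
def pack_depth_metallic(depth, metallic):
    # depth is strictly positive view-space depth; metallic is +1.0 or -1.0,
    # so the flag rides for free on the sign bit of the packed float.
    return depth * metallic

def unpack_depth_metallic(packed):
    # abs() recovers depth; the sign recovers the metallic flag, like
    # the shader's abs(encodedGBuffer.w) and sign(encodedGBuffer.w).
    metallic = 1.0 if packed > 0.0 else -1.0
    return abs(packed), metallic

assert unpack_depth_metallic(pack_depth_metallic(3.5, -1.0)) == (3.5, -1.0)
```

This is also why the decode path below can fetch depth with a single `abs()`, keeping AO-style ray-march shaders cheap.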

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color is stored as sRGB->YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
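Why the chroma term loses its trailing `+ power`: Y is an affine combination of RGB whose weights sum to one (so the `power` term survives), while Co/Cg weights sum to zero (so it cancels, and chroma simply decays toward zero at grazing angles). A quick numeric check in Python, assuming the standard YCoCg weights (illustrative, not shader code):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg: Y weights sum to 1, Co/Cg weights sum to 0.
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * power + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    power = (1.0 - v_dot_h) ** 5.0
    # Luminance keeps the "+ power" term; chroma decays straight to zero.
    return ((1.0 - f0_y) * power + f0_y, f0_c * -power + f0_c)

f0 = (0.9, 0.6, 0.2)                 # some chromatic specular color
y0, co0, cg0 = rgb_to_ycocg(*f0)
y, co, cg = rgb_to_ycocg(*fresnel_schlick_rgb(0.3, f0))
y_yc, co_yc = fresnel_schlick_yc(0.3, y0, co0)
assert abs(y - y_yc) < 1e-12 and abs(co - co_yc) < 1e-12
```

So evaluating fresnel directly in YC space is mathematically identical to evaluating it in RGB and converting afterward, per chroma component.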

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994


Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]
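For reference, the specular factors named above can be sketched on the CPU. A minimal JavaScript sketch; the function names and the alpha = roughness² convention are my own assumptions, not from the deck:

```javascript
const PI = Math.PI;

// GGX normal distribution; alpha is perceptual roughness squared.
function dGGX(nDotH, alpha) {
  const a2 = alpha * alpha;
  const d = nDotH * nDotH * (a2 - 1.0) + 1.0;
  return a2 / (PI * d * d);
}

// Height-correlated Smith visibility term, i.e. G / (4 * nDotL * nDotV).
function vSmithGGXCorrelated(nDotV, nDotL, alpha) {
  const a2 = alpha * alpha;
  const lambdaV = nDotL * Math.sqrt(nDotV * nDotV * (1.0 - a2) + a2);
  const lambdaL = nDotV * Math.sqrt(nDotL * nDotL * (1.0 - a2) + a2);
  return 0.5 / (lambdaV + lambdaL);
}

// Schlick Fresnel for a scalar reflectance.
function fresnelSchlick(vDotH, f0) {
  return f0 + (1.0 - f0) * Math.pow(1.0 - vDotH, 5.0);
}
```

Useful for spot-checking shader output: at normal incidence dGGX reduces to 1 / (π·alpha²) and fresnelSchlick returns f0.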

Standard Material Parameterization- Time to shamelessly steal from Real-Time Rendering [Möller 08]


Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of

specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay the cost of the worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend: add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
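The shader math above boils down to a perspective divide and a subtract. A CPU-side sketch for testing (hypothetical helper; clip-space positions as [x, y, z, w] arrays):

```javascript
// Derive NDC-space velocity from this frame's and last frame's clip-space
// positions of the same vertex, mirroring the fragment shader math above.
function screenSpaceVelocity(clipNow, clipOld) {
  const ndcNow = [clipNow[0] / clipNow[3], clipNow[1] / clipNow[3]];
  const ndcOld = [clipOld[0] / clipOld[3], clipOld[1] / clipOld[3]];
  return [ndcNow[0] - ndcOld[0], ndcNow[1] - ndcOld[1]];
}
```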

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA

unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
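A JavaScript mirror of the GLSL arithmetic above is handy for checking round trips on the CPU; the helper names here are illustrative, not from the deck:

```javascript
// Pack three normalized floats into one float holding a 24-bit integer,
// mirroring normalizedFloat_to_uint8 / uint8_8_8_to_uint24 above.
function packUint888(r, g, b) {
  const toU8 = (v) => Math.floor(v * 255.0);
  return toU8(r) * 65536.0 + toU8(g) * 256.0 + toU8(b);
}

// Mirror of uint24_to_uint8_8_8 / uint8_to_normalizedFloat.
function unpackUint888(packed) {
  const x = Math.floor(packed / 65536.0);
  const temp = Math.floor(packed / 256.0);
  const y = temp - x * 256.0;
  const z = packed - temp * 256.0;
  return [x / 255.0, y / 255.0, z / 255.0];
}
```

Every step stays below 2^24, so 32-bit (and 64-bit) float arithmetic is exact and the round trip is lossless up to the 8-bit quantization.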

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
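The octahedral mapping from [Cigolle 14] can be sketched as follows; a JavaScript sketch of encode / decode, with illustrative function names (the deck's GLSL uses octohedronEncode / octohedronDecode):

```javascript
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

// Map a unit vector to a point in [-1, 1]^2.
function octEncode(n) {
  // Project onto the octahedron, then fold the lower hemisphere over.
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1;
  let y = n[1] / l1;
  if (n[2] < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x, y];
}

// Map a point in [-1, 1]^2 back to a unit vector.
function octDecode(e) {
  let x = e[0], y = e[1];
  let z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

The encode output uses the full [-1, 1] square, which is what makes the quantization to 14 bits per component reasonably uniform across the sphere.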

Emission

- Don't pack emission; forward render it instead

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer.

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
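The RGB to YCoCg transform used here is linear and cheaply invertible; a JavaScript sketch of the standard definition:

```javascript
// Standard RGB -> YCoCg forward transform. Y is luma; Co and Cg are the
// two chroma components that get checkerboard-interlaced in the G-Buffer.
function rgbToYcocg(r, g, b) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r - 0.5 * b,            // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Inverse transform: three adds/subtracts, no multiplies.
function ycocgToRgb(y, co, cg) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

Note the chroma components live in [-0.5, 0.5] for inputs in [0, 1], which is why the packing code below biases them before storage.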

G-Buffer Packing: Format

G-Buffer Format

- R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

- G: VelocityX 10 Bits, NormalX 14 Bits

- B: VelocityY 10 Bits, NormalY 14 Bits

- A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

- R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

- G: NormalX 12 Bits, NormalY 12 Bits

- B: Depth 31 Bits, Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in

webGL. Could be an RGBA Float texture under the hood

G-Buffer Format

- R: ColorY 7 Bits, ColorC 5 Bits (sign bit)

- G: NormalX 9 Bits (sign bit), Gloss 3 Bits

- B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

- R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit

- G: NormalX 9 Bits (sign bit), Gloss 3 Bits

- B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

- R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

- G: VelocityX 10 Bits, NormalX 14 Bits

- B: VelocityY 10 Bits, NormalY 14 Bits

- A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space

sampling shaders such as AO
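The sign-bit trick above folds a boolean into an otherwise wasted bit; a JavaScript sketch (assumes view-space depth is strictly positive, as the early-out in the decode below does):

```javascript
// Fold a metallic flag into the sign of a positive depth value.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

// Recover both: depth via abs(), metallic via the sign.
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```

Depth of exactly 0.0 is reserved for "nothing rendered here", which is why the decoder can early-out on depth <= 0.0.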

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%


Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an

ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD

and an ADD from the skipped 3rd component
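The YC form works because YCoCg is a linear transform of RGB and the Fresnel curve blends toward white, whose luma is 1 and whose chroma is 0. A JavaScript sanity-check sketch (names illustrative):

```javascript
// Scalar Schlick Fresnel, as in the RGB version above.
function fresnelSchlickScalar(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return (1.0 - f0) * power + f0;
}

// YC Schlick: luma blends toward 1, chroma decays toward 0.
function fresnelSchlickYC(vDotH, f0Y, f0C) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0Y) * power + f0Y, // same shape as the scalar/RGB luma
    f0C * -power + f0C,        // i.e. f0C * (1 - power)
  ];
}
```

Because luma is a linear combination of R, G, and B, evaluating Schlick on the luma of F0 matches taking the luma of the RGB Schlick result.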

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once (YC, YC)

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
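A CPU mirror of the reconstruction above makes the weighting easy to test: neighbors whose luminance matches the center dominate the reconstructed chroma. Sketch in JavaScript; the return layout mirrors the GLSL (the pixel's own stored chroma, plus the reconstructed missing one):

```javascript
// center: [luma, storedChroma]; neighbors: array of [luma, chroma] pairs.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight by luminance similarity; guard the case where a sample is black.
    const w = luma > 1e-5
      ? Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]))
      : 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```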

Thanks for listening

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars

Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van

Waveren, Ignacio Castaño 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Physically Based Shading

- Authoring cost is high but so is reusability

- Floored has a variety of art assets spaces furniture lighting

materials

- PBS supports reusability across projects

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material Parameterization- Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {

albedo = color;

specularColor = vec3(0.04);

} else {

albedo = vec3(0.0);

specularColor = color;

}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend: add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;

varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);

gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;

buffer.color = color;

buffer.gloss = gloss;

buffer.normal = normalCameraSpace;

buffer.depth = depthViewSpace;

buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla webGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
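The 2^24 limit can be spot-checked off the GPU. A minimal Python sketch (my own illustration, not deck code), emulating a GLSL highp 32-bit float with the stdlib struct module:

```python
import struct

def f32(x):
    # Round-trip a Python float through IEEE 754 single precision,
    # emulating a GLSL highp (32-bit) float.
    return struct.unpack('<f', struct.pack('<f', float(x)))[0]

def is_exact_in_f32(n):
    # True when the integer n is exactly representable as a 32-bit float.
    return int(f32(n)) == n
```

Every integer through 16777215 (and 2^24 itself) survives the round trip; 2^24 + 1 rounds back down to 2^24, which is why the packing scheme stops at 24 bits.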

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
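The encode / decode pair can be mirrored on the CPU for a quick sanity check. A hedged Python sketch (function names copied from the slides; the f32 helper emulating GLSL highp floats is mine):

```python
import math
import struct

def f32(x):
    # Emulate a GLSL highp float (IEEE 754 single precision).
    return struct.unpack('<f', struct.pack('<f', float(x)))[0]

def uint8_8_8_to_uint24(r, g, b):
    # Shift left with multiplies: (r << 16) | (g << 8) | b.
    return f32(r * 65536.0 + g * 256.0 + b)

def uint24_to_uint8_8_8(raw):
    # Shift right with divisions; subtract the higher bytes back out.
    x = math.floor(f32(raw / 65536.0))
    temp = math.floor(f32(raw / 256.0))
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)
```

Because 65536 and 256 are powers of two, the divisions only shift the float's exponent, so every value in the 24-bit domain unpacks exactly; the full domain can be swept the same way the GPU unit test below does.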

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
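For reference, here is a hedged Python sketch of one common octahedral encode / decode variant in the spirit of [Cigolle 14] (not Floored's shader code; outputs land in [-1, 1]^2 before quantization):

```python
import math

def octahedron_encode(n):
    # Project the unit vector n onto the octahedron |x|+|y|+|z| = 1,
    # then fold the lower hemisphere over the XY plane.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        fx = (1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0)
        fy = (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0)
        px, py = fx, fy
    return (px, py)

def octahedron_decode(e):
    # Unfold the lower hemisphere and renormalize.
    ex, ey = e
    x, y = ex, ey
    z = 1.0 - abs(ex) - abs(ey)
    if z < 0.0:
        x = (1.0 - abs(ey)) * (1.0 if ex >= 0.0 else -1.0)
        y = (1.0 - abs(ex)) * (1.0 if ey >= 0.0 else -1.0)
    inv_len = 1.0 / math.sqrt(x * x + y * y + z * z)
    return (x * inv_len, y * inv_len, z * inv_len)
```

Without quantization the round trip is exact up to floating point error; the 14-bit (or 12-bit, 9-bit) quantization in the formats below is what introduces discretization.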

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
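The RGB <-> YCoCg transform pair is simple enough to sketch directly. A hedged Python version (coefficients from the standard YCoCg definition, not lifted from Floored's shaders):

```python
def rgb_to_ycocg(r, g, b):
    # For r, g, b in [0, 1]: Y lands in [0, 1], Co and Cg in [-0.5, 0.5]
    # (hence the chroma bias applied before packing).
    y = 0.25 * r + 0.5 * g + 0.25 * b    # luminance
    co = 0.5 * r - 0.5 * b               # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b  # green chroma
    return (y, co, cg)

def ycocg_to_rgb(y, co, cg):
    # Exact inverse: only adds and subtracts.
    return (y + co - cg, y + cg, y - co - cg)
```

Since the forward and inverse transforms use only power-of-two coefficients and adds, the round trip is lossless in float arithmetic; the loss comes solely from checkerboarding one chroma component away.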

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float: 128bpp

- Sign bits of R, G, and B are available for use as flags

- ie Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float: 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float: 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float: 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float: 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity. -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screenspace for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in a non-linear space to distribute precision perceptually: stored as sRGB -> YCoCg, returned as linear RGB for lighting

res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
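A quick CPU cross-check of the two variants above, as a hedged Python port (my own sketch; f0_y and f0_c stand in for reflectionCoefficientYC):

```python
def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    # Luminance follows scalar Schlick; chroma decays to zero at grazing
    # angles, where the fresnel response goes to white.
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - f0_y) * power + f0_y, f0_c * -power + f0_c)

def fresnel_schlick_sg_yc(v_dot_h, f0_y, f0_c):
    # Spherical gaussian approximation of the pow term [Lagarde 12].
    power = 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)
    return ((1.0 - f0_y) * power + f0_y, f0_c * -power + f0_c)
```

At vDotH = 1 both return the base reflection coefficient; at vDotH = 0 luminance goes to 1 and chroma to 0, matching the "inverted" chroma behavior described above.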

YC Lighting- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
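The reconstruction filter ports directly to Python for experimentation. A hedged sketch (my own port, not deck code; center and a1..a4 are (luminance, chroma) pairs from the cross neighborhood):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # Weight each cross neighbor by how close its luminance is to the
    # center sample, then blend the neighbors' chroma.
    luminance = [a[0] for a in (a1, a2, a3, a4)]
    chroma = [a[1] for a in (a1, a2, a3, a4)]
    weight = [2.0 ** (-sensitivity * abs(l - center[0])) for l in luminance]
    # Guard the case where a sample is black (e.g. at infinity): step(1e-5, l).
    weight = [w if l >= 1e-5 else 0.0 for w, l in zip(weight, luminance)]
    total = sum(weight)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    return (center[1], sum(c * w for c, w in zip(chroma, weight)) / total)
```

Neighbors whose luminance matches the center dominate the vote, so the reconstructed chroma hugs edges in the luminance signal rather than bleeding across them.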

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Shadow Masking Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal


Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular
  - D: Normal Distribution Function: GGX [Walter 07]
  - G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]
  - F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]
- Microfacet Diffuse
  - Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization - Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]
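That 0.02 to 0.05 window isn't arbitrary: it follows from Fresnel reflectance at normal incidence, F0 = ((n - 1) / (n + 1))^2 for a dielectric of index of refraction n in air. A quick sketch (the helper name is ours, not from the deck):

```javascript
// Fresnel reflectance at normal incidence for a dielectric in air:
// F0 = ((n - 1) / (n + 1))^2, where n is the index of refraction.
function f0FromIor(n) {
  const r = (n - 1) / (n + 1);
  return r * r;
}

// Water (n ~1.33) and glass (n ~1.5) bracket most common dielectrics.
console.log(f0FromIor(1.33)); // ~0.02
console.log(f0FromIor(1.5));  // ~0.04
```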

Standard Material Parameterization

- Our standard material does not support:
  - Translucency (Skin, Foliage, Snow)
  - Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
  - Layered Materials (Clear coat)
  - Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview

- For each model
  - For each primitive
    - For each vertex
      - Transform vertex by modelViewProjectionMatrix
  - For each pixel
    - For each light
      - outgoing radiance += incoming radiance * brdf * projected area
    - Remap outgoing radiance to perceptual display domain
      - Tonemap
      - Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights
- Typically pay cost of worst case
  - for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
    - outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model
  - For each primitive
    - For each vertex
      - Transform vertex by modelViewProjectionMatrix
  - For each pixel
    - Write geometric and material data to g-buffer
- For each light
  - For each pixel inside light volume
    - Read geometric and material data from texture
    - outgoing radiance = incoming radiance * brdf * projected area
    - Blend-add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth
  - Read G-Buffer for each light source
- Heavy on write bandwidth
  - Blend-add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials

G-Buffer

- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
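Stripped of shader plumbing, the computation is just a perspective divide per frame and a difference; a sketch with hypothetical helper names:

```javascript
// Perspective divide: clip-space [x, y, w] -> NDC xy.
function ndc(clip) {
  return [clip[0] / clip[2], clip[1] / clip[2]];
}

// Screen-space velocity: NDC delta between the current and previous
// frame's transforms of the same vertex.
function screenSpaceVelocity(clipNow, clipOld) {
  const a = ndc(clipNow);
  const b = ndc(clipOld);
  return [a[0] - b[0], a[1] - b[1]];
}

// A vertex that moved from NDC (0, 0) to (0.5, 0) has velocity (0.5, 0).
console.log(screenSpaceVelocity([1, 0, 2], [0, 0, 2]));
```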

Read Material Data

- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...
- ...and after skipping some tangential details...

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support [WebGLStats]

Challenges: Storage

- Multiple render targets: not well supported

- Reading depth from the renderbuffer: getting better

- Texture float support: quite good

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
  - Step size increases at integers > 2^24
  - 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
  - Step size increases at integers > 2^11
  - 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
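The 2^24 limit is easy to verify on the CPU with `Math.fround`, which rounds to 32-bit float precision:

```javascript
// A 32-bit float has a 24-bit significand, so every integer up to
// 2^24 = 16777216 is exact; above that, odd integers start colliding
// with their even neighbors. Math.fround rounds to 32-bit precision.
console.log(Math.fround(16777215) === 16777215); // true
console.log(Math.fround(16777217) === 16777216); // true: 16777217 is not representable

// "Shifting" without bitwise operators: multiply to shift left,
// divide and floor to shift right.
const packed = 3 * 256 * 256 + 7 * 256 + 9; // three bytes: 3, 7, 9
console.log(Math.floor(packed / (256 * 256))); // 3
```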

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
  - Impractical for general single-bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
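The same pack / unpack math is easy to sanity-check on the CPU; a JavaScript port of the pair above:

```javascript
// Pack three 8-bit integers into one float-exact 24-bit integer.
function uint888ToUint24(x, y, z) {
  return x * 65536 + y * 256 + z;
}

// Unpack: emulate right-shifts with divide + floor, as in the GLSL version.
function uint24ToUint888(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  const y = -x * 256 + temp;
  const z = -temp * 256 + raw;
  return [x, y, z];
}

// Round-trip a few values, including the extremes.
for (const [x, y, z] of [[0, 0, 0], [255, 255, 255], [12, 34, 56]]) {
  const [rx, ry, rz] = uint24ToUint888(uint888ToUint24(x, y, z));
  console.assert(rx === x && ry === y && rz === z);
}
```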

Unit Testing

- Important to unit test packing functions
  - Easy to miss collisions
  - Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
  - WebGL has no support for readPixels on floating point textures
  - Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
  - Pack ID
  - Unpack ID
  - Compare unpacked ID to pixel ID
  - Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
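The mapping can be sketched on the CPU; this is our own JS port of the idea in [Cigolle 14], with hypothetical names:

```javascript
// Octahedral normal encoding [Cigolle 14]: project the unit normal onto
// the octahedron |x| + |y| + |z| = 1, then fold the lower hemisphere
// onto the upper. Result lands in [-1, 1]^2 (remap to [0, 1] to store).
function octEncode([x, y, z]) {
  const s = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let u = x / s, v = y / s;
  if (z < 0) {
    const fu = (1 - Math.abs(v)) * Math.sign(u || 1); // treat 0 as +
    const fv = (1 - Math.abs(u)) * Math.sign(v || 1);
    u = fu; v = fv;
  }
  return [u, v];
}

function octDecode([u, v]) {
  let x = u, y = v, z = 1 - Math.abs(u) - Math.abs(v);
  if (z < 0) {
    const fx = (1 - Math.abs(y)) * Math.sign(x || 1);
    const fy = (1 - Math.abs(x)) * Math.sign(y || 1);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```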

Emission

- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
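The RGB <-> YCoCg pair referenced above ([Mavridis 12]) is a cheap linear transform; a sketch with our own helper names:

```javascript
// RGB -> YCoCg: luminance Y plus two chroma axes (orange, green).
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r           - 0.5 * b,  // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Algebraic inverse of the transform above.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

// Round-trips up to floating point rounding.
console.log(ycocgToRgb(rgbToYcocg([0.2, 0.4, 0.6])));
```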

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity. -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
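The helper uint10_14_to_uint24 used above is never shown in the deck; a plausible CPU-side sketch of the split (hypothetical implementation, same multiply / floor shifting as the 8_8_8 pair):

```javascript
// Hypothetical counterpart to the deck's uint10_14_to_uint24: a 10-bit
// value in the high bits, a 14-bit value in the low bits, all within
// the float-exact 24-bit integer range.
const SHIFT_LEFT_14 = 16384; // 2^14

function uint10_14ToUint24(hi10, lo14) {
  return hi10 * SHIFT_LEFT_14 + lo14;
}

function uint24ToUint10_14(raw) {
  const hi10 = Math.floor(raw / SHIFT_LEFT_14);
  return [hi10, raw - hi10 * SHIFT_LEFT_14];
}

// Maximum values still round-trip: 1023 * 2^14 + 16383 = 2^24 - 1.
console.log(uint24ToUint10_14(uint10_14ToUint24(1023, 16383))); // [ 1023, 16383 ]
```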

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

  // Color stored as sRGB -> YCoCg; returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

- Color stored in non-linear space to distribute precision perceptually

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?

Light Pre-pass

- Many resources
  - [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
  - Keep fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: detail crops comparing RGB Lighting 100%, YC Lighting 100%, RGB Lighting 25%, and YC Lighting 25%

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation the same
  - Chroma calculation inverted: approaches zero at perpendicular

YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting - YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
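The endpoint behavior is easy to check in a CPU-side port (array in place of vec2, otherwise the same math):

```javascript
// Schlick's approximation evaluated in YC space: luminance interpolates
// from the F0 luminance toward 1 at grazing angles, while chroma decays
// toward 0 (the fresnel rim goes white).
function fresnelSchlickYC(vDotH, f0Yc) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0Yc[0]) * power + f0Yc[0], // luminance: f0 -> 1
    f0Yc[1] * -power + f0Yc[1],        // chroma: f0 -> 0
  ];
}

// At normal incidence (vDotH = 1) the surface reflects its own F0;
// at grazing incidence (vDotH = 0) it reflects white: Y -> 1, C -> 0.
console.log(fresnelSchlickYC(1.0, [0.04, 0.02])); // [ 0.04, 0.02 ]
console.log(fresnelSchlickYC(0.0, [0.04, 0.02])); // Y near 1, C exactly 0
```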

YC Lighting - Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting - Write YC to RG components of render target

- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting - Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
  - Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
  - Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting - Reconstruct missing chroma component in a post process

- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping

YC Lighting - Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
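The weighting behavior is straightforward to test in a CPU-side port (a loop over neighbor [luminance, chroma] pairs in place of the vec4 arithmetic):

```javascript
// Luminance-weighted chroma reconstruction, a port of the function above:
// neighbors whose luminance is close to the center pixel's contribute
// more chroma; black samples are ignored.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chr] of neighbors) {
    // exp2 falloff on luminance difference, zeroed for black samples.
    const w = lum >= 1e-5 ? Math.pow(2, -SENSITIVITY * Math.abs(lum - center[0])) : 0.0;
    totalWeight += w;
    chromaSum += chr * w;
  }
  // Guard the case where all weights are zero.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}

// Four neighbors at the center's luminance average their chroma equally.
const out = reconstructChromaHDR([0.5, 0.1], [[0.5, 0.2], [0.5, 0.4], [0.5, 0.2], [0.5, 0.4]]);
console.log(out); // second component is roughly 0.3
```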

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Physically Based Shading

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready! Now we just need to write it out

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth: getting better

Challenges Storage

- Texture float support: quite good

Challenges Storage

- Texture half float support: getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture?

- Pack the data!

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
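The 2^24 limit is easy to verify off-GPU. Here is a small Python sketch (illustrative only, not part of the renderer) that round-trips integers through IEEE 754 single precision via the stdlib struct module:

```python
import struct

def to_float32(x):
    # Round-trip a value through a 32-bit IEEE 754 float
    return struct.unpack('f', struct.pack('f', float(x)))[0]

# Every integer up to 2^24 survives exactly
assert to_float32(16777215) == 16777215.0   # 2^24 - 1
assert to_float32(16777216) == 16777216.0   # 2^24

# Above 2^24 the step size becomes 2, so odd integers collapse
assert to_float32(16777217) == 16777216.0   # 2^24 + 1 is not representable
```

The same exercise with format character 'e' (half float) shows the step size growing past 2^11, which is why the half-float budget in the slides stops at 2048.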

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
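The same arithmetic can be mirrored on the CPU for a quick sanity check before testing on the GPU. A Python sketch (function names mirror the GLSL above; floats only, no bit operators):

```python
import math

SHIFT_LEFT_16 = 256.0 * 256.0
SHIFT_LEFT_8 = 256.0

def uint8_8_8_to_uint24(r, g, b):
    # Pack three 8-bit integers into one 24-bit integer held in a float
    return r * SHIFT_LEFT_16 + (g * SHIFT_LEFT_8 + b)

def uint24_to_uint8_8_8(raw):
    # Unpack using only floor, multiply, and add, as in the GLSL version
    x = math.floor(raw / SHIFT_LEFT_16)
    temp = math.floor(raw / SHIFT_LEFT_8)
    y = -x * SHIFT_LEFT_8 + temp
    z = -temp * SHIFT_LEFT_8 + raw
    return x, y, z

# Round-trip a few representative triples
for rgb in [(0, 0, 0), (255, 255, 255), (1, 2, 3), (128, 0, 255)]:
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*rgb)) == rgb
```

Python's doubles have more precision than GLSL's highp floats, so this checks the algebra, not the 2^24 precision ceiling; the GPU-side unit tests below cover that.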

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack
phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
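As a rough CPU-side illustration of the idea (a sketch after [Cigolle 14]; these are not the renderer's shader functions), the octahedral encode / decode looks like:

```python
def sign_not_zero(v):
    # GLSL-style sign() variant that never returns 0
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(x, y, z):
    # Project the unit vector onto the octahedron, then fold the lower
    # hemisphere over the diagonals. Result lies in [-1, 1]^2.
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    return px, py

def oct_decode(px, py):
    # Undo the fold, then renormalize back onto the sphere
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    length = (px * px + py * py + z * z) ** 0.5
    return px / length, py / length, z / length

# Round-trip a few unit vectors
for n in [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.6, 0.0, -0.8)]:
    d = oct_decode(*oct_encode(*n))
    assert all(abs(a - b) < 1e-6 for a, b in zip(n, d))
```

The encoded (px, py) pair is then remapped to the 0 to 1 domain and quantized to 14 bits per component for the G-Buffer formats below.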

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
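The YCoCg transform itself is a cheap linear change of basis. A Python sketch of the forward and inverse transforms (illustrative; exact inverse holds in the unquantized case):

```python
def rgb_to_ycocg(r, g, b):
    y  = 0.25 * r + 0.5 * g + 0.25 * b    # luminance
    co = 0.5 * r - 0.5 * b                # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact algebraic inverse of the matrix above
    return y + co - cg, y + cg, y - co - cg

# Pure gray carries no chroma, so dropping chroma samples costs it nothing
y, co, cg = rgb_to_ycocg(0.5, 0.5, 0.5)
assert abs(co) < 1e-9 and abs(cg) < 1e-9

# Round trip is exact for arbitrary colors
r, g, b = ycocg_to_rgb(*rgb_to_ycocg(0.25, 0.5, 0.75))
assert abs(r - 0.25) < 1e-9 and abs(g - 0.5) < 1e-9 and abs(b - 0.75) < 1e-9
```

The checkerboard then stores Y everywhere but alternates which of Co / Cg each pixel keeps, halving color storage.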

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit

- RGB Float 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity. -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
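The sign trick relies on view-space depth always being positive, which frees the sign bit to carry the metallic flag. A minimal sketch of the idea (Python, with metallic encoded as +1.0 / -1.0 as in the shader):

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0; metallic is +1.0 (metal) or -1.0 (non-metal).
    # Non-metals simply negate depth.
    return depth * metallic

def unpack_depth_metallic(w):
    # abs() recovers depth; the sign recovers the flag
    return abs(w), (1.0 if w > 0.0 else -1.0)

assert unpack_depth_metallic(pack_depth_metallic(7.5, 1.0)) == (7.5, 1.0)
assert unpack_depth_metallic(pack_depth_metallic(7.5, -1.0)) == (7.5, -1.0)
```

Because abs() and sign() are essentially free on the GPU, this costs nothing at decode time, which is why depth is the cheapest channel to unpack.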

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {

  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screenspace for culling in future passes: sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look...

Enhance! RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% YC Lighting 25%

RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component.

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
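A line-for-line Python port of this reconstruction (illustrative only; the renderer runs it in GLSL) can help tune the sensitivity constant offline:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs from the cross
    # neighborhood. Neighbors vote on the missing chroma component,
    # weighted by how close their luminance is to the center's.
    samples = (a1, a2, a3, a4)
    weights = []
    for lum, _ in samples:
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        # Guard the case where the sample is black
        weights.append(w if lum >= 1e-5 else 0.0)
    total = sum(weights)
    # Guard the case where all weights are 0
    if total <= 1e-5:
        return (0.0, 0.0)
    reconstructed = sum(w * c for w, (_, c) in zip(weights, samples)) / total
    return (center[1], reconstructed)

# Neighbors with identical luminance contribute equally: chroma averages
y, c = reconstruct_chroma_hdr((0.5, 0.3), (0.5, 0.2), (0.5, 0.4), (0.5, 0.2), (0.5, 0.4))
assert abs(y - 0.3) < 1e-9 and abs(c - 0.3) < 1e-9
```

Raising the sensitivity makes the filter trust only near-identical luminances, which sharpens chroma edges at the cost of more single-sample reconstructions.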

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks Floored EngineeringJuan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Physically Based Shading

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D Normal Distribution Function GGX [Walter 07]

- G Geometry Shadow Masking Function Height-Correlated Smith [Heitz 14]

- F Fresnel Spherical Gaussian Schlickrsquos Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
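The encode / decode pair can be sanity checked with a direct CPU-side port of the same arithmetic. A Python sketch using numpy float32 to mirror GLSL highp floats (names mirror the shader functions; divisions by powers of two are exact in binary floating point, so the floors are safe):

```python
import numpy as np

f32 = np.float32

def uint8_8_8_to_uint24(x, y, z):
    # Shift left by multiplying: (x << 16) | (y << 8) | z, in float arithmetic.
    return f32(x) * f32(65536.0) + (f32(y) * f32(256.0) + f32(z))

def uint24_to_uint8_8_8(raw):
    # Shift right by multiplying with inverse powers of two (exact), then floor.
    x = np.floor(raw * f32(1.0 / 65536.0))
    temp = np.floor(raw * f32(1.0 / 256.0))
    y = -x * f32(256.0) + temp
    z = -temp * f32(256.0) + raw
    return int(x), int(y), int(z)

packed = uint8_8_8_to_uint24(18, 52, 86)
assert uint24_to_uint8_8_8(packed) == (18, 52, 86)
```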

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color
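The same ID-roundtrip idea can be sketched on the CPU with numpy. Testing all 2^24 IDs is cheap on the GPU; here a boundary-plus-random sample keeps the sketch light (structure is illustrative, the arithmetic matches the shader):

```python
import numpy as np

f32 = np.float32
rng = np.random.default_rng(0)

# Boundary values plus a random sample of the uint24 domain, as float32 IDs.
ids = np.concatenate([
    np.array([0, 1, 255, 256, 65535, 65536, 2**24 - 1]),
    rng.integers(0, 2**24, size=100_000),
]).astype(np.float32)

# Unpack to three 8-bit fields, then repack, all in float32 as in the shader.
hi = np.floor(ids * f32(1.0 / 65536.0))
temp = np.floor(ids * f32(1.0 / 256.0))
mid = -hi * f32(256.0) + temp
lo = -temp * f32(256.0) + ids
repacked = hi * f32(65536.0) + (mid * f32(256.0) + lo)

assert np.array_equal(repacked, ids)
assert hi.max() <= 255 and mid.max() <= 255 and lo.max() <= 255
```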

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
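A minimal CPU-side sketch of octahedral encode / decode in the spirit of [Cigolle 14] (float64 for clarity; the shader version quantizes the two components, and a sign_not_zero variant would be needed at exactly zero components, which this sketch elides):

```python
import numpy as np

def oct_encode(n):
    # Project the unit vector onto the octahedron |x| + |y| + |z| = 1.
    n = np.asarray(n, dtype=np.float64)
    n = n / np.abs(n).sum()
    if n[2] < 0.0:
        # Fold the lower hemisphere over the diagonals.
        n[:2] = (1.0 - np.abs(n[[1, 0]])) * np.sign(n[:2])
    return n[:2] * 0.5 + 0.5  # remap [-1, 1] to the [0, 1] domain

def oct_decode(e):
    e = np.asarray(e, dtype=np.float64) * 2.0 - 1.0
    n = np.array([e[0], e[1], 1.0 - np.abs(e[0]) - np.abs(e[1])])
    if n[2] < 0.0:
        n[:2] = (1.0 - np.abs(n[[1, 0]])) * np.sign(n[:2])
    return n / np.linalg.norm(n)

v = np.array([0.3, -0.5, 0.8])
v = v / np.linalg.norm(v)          # upper hemisphere
v2 = np.array([1.0, 2.0, -2.0]) / 3.0  # lower hemisphere (exercises the fold)

assert np.allclose(oct_decode(oct_encode(v)), v)
assert np.allclose(oct_decode(oct_encode(v2)), v2)
```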

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
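The YCoCg transform is linear and exactly invertible, which is what makes it safe to store and reconstruct chroma separately. A small sketch of the forward and inverse transforms (illustrative, not the talk's shader code):

```python
import numpy as np

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    y  =  0.25 * r + 0.5 * g + 0.25 * b  # luminance
    co =  0.5  * r           - 0.5  * b  # orange chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b  # green chroma
    return np.array([y, co, cg])

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return np.array([y + co - cg, y + cg, y - co - cg])

rgb = np.array([0.7, 0.2, 0.4])
assert np.allclose(ycocg_to_rgb(rgb_to_ycocg(rgb)), rgb)

# Pure grey carries no chroma at all.
assert np.allclose(rgb_to_ycocg(np.array([0.5, 0.5, 0.5])), [0.5, 0.0, 0.0])
```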

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: NormalX 12 bits, NormalY 12 bits
B: Depth 31 bits, Metallic 1 bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 bits, ColorC 5 bits (sign bit)
G: NormalX 9 bits (sign bit), Gloss 3 bits
B: NormalY 9 bits (sign bit), Gloss 3 bits
A: Depth 15 bits, Metallic 1 bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
G: NormalX 9 bits (sign bit), Gloss 3 bits
B: NormalY 9 bits (sign bit), Gloss 3 bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate: probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
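The sign-bit trick for depth + metallic can be sketched outside GLSL as well (hypothetical helper names; note that depth 0.0 is reserved as the "infinity" sentinel the decode shader early-outs on, so valid depths must be strictly positive):

```python
def encode_depth_metallic(depth, metallic):
    # Borrow the float's sign bit: positive depth = metallic, negative = non-metallic.
    assert depth > 0.0  # 0.0 is reserved for "nothing rendered here"
    return depth if metallic else -depth

def decode_depth_metallic(w):
    # abs() recovers the depth; the sign recovers the boolean.
    return abs(w), w > 0.0

assert decode_depth_metallic(encode_depth_metallic(3.5, True)) == (3.5, True)
assert decode_depth_metallic(encode_depth_metallic(3.5, False)) == (3.5, False)
```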

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for our microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance! (four detail-crop slides comparing RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, and RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting: RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting: YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
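The chroma form follows from pushing Schlick through the YCoCg transform: the luma weights sum to 1 and the chroma weights sum to 0, so a per-channel lerp toward white becomes a lerp of Y toward 1.0 and a plain decay of each chroma component toward 0.0. A quick numeric check of that equivalence (Python, illustrative names; chroma handled as the full Co/Cg pair for clarity):

```python
def rgb_to_ycocg(r, g, b):
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_ycocg(v_dot_h, f0_ycocg):
    p = (1.0 - v_dot_h) ** 5.0
    y, co, cg = f0_ycocg
    # Luminance lerps toward 1.0; chroma simply decays toward 0.0 at grazing.
    return ((1.0 - y) * p + y, co * (1.0 - p), cg * (1.0 - p))

f0 = (0.95, 0.64, 0.54)  # gold-ish specular color, illustrative
for vdh in (0.0, 0.25, 0.5, 1.0):
    via_rgb = rgb_to_ycocg(*fresnel_schlick_rgb(vdh, f0))
    direct = fresnel_schlick_ycocg(vdh, rgb_to_ycocg(*f0))
    assert all(abs(a - b) < 1e-12 for a, b in zip(via_rgb, direct))
```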

YC Lighting: Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting: Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
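A CPU-side port of this reconstruction filter makes its behavior easy to probe (Python, for illustration; each sample is a (luminance, chroma) pair as in the YC buffer):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    luma_c = center[0]
    total = 0.0
    acc = 0.0
    for luma, chroma in (a1, a2, a3, a4):
        # Weight neighbors by luminance similarity, falling off exponentially.
        w = 2.0 ** (-sensitivity * abs(luma - luma_c))
        if luma < 1e-5:   # guard the case where a sample is black
            w = 0.0
        acc += chroma * w
        total += w
    if total <= 1e-5:     # guard the case where all weights are 0
        return (0.0, 0.0)
    return (center[1], acc / total)

# Equal-luminance neighbors contribute equally: result is their mean chroma.
got = reconstruct_chroma_hdr((0.5, 0.3), (0.5, 0.1), (0.5, 0.2), (0.5, 0.3), (0.5, 0.4))
assert got[0] == 0.3 and abs(got[1] - 0.25) < 1e-12

# A black neighbor is ignored entirely, however strong its chroma.
got = reconstruct_chroma_hdr((0.5, 0.3), (0.5, 0.2), (0.0, 0.9), (0.5, 0.2), (0.5, 0.2))
assert abs(got[1] - 0.2) < 1e-12
```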

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, Siggraph 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf

[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/

[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/

[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home

[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/

Resources

[Shishkovtsov 05] Deferred Shading in STALKER. Oles Shishkovtsov, 2005. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt

[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3

[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf

[Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading

Resources

[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading

[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf

[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf

[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/

[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/

[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf

Physically Based Shading

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry / Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren Nayar [Oren 94]

Standard Material Parameterization

Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give the color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to G-Buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend / add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend / add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla WebGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
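The octahedral mapping can be sketched on the CPU as well. A minimal JavaScript version, assuming the standard [Cigolle 14] formulation (the helper names octEncode / octDecode are mine, mirroring the shader's octohedronEncode / octohedronDecode):

```javascript
// Octahedral normal encoding: unit vector <-> [0,1]^2.
// signNotZero avoids Math.sign(0) === 0 breaking the hemisphere fold.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) { // fold the lower hemisphere across the diagonals
    const [ou, ov] = [u, v];
    u = (1.0 - Math.abs(ov)) * signNotZero(ou);
    v = (1.0 - Math.abs(ou)) * signNotZero(ov);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // remap -1..1 to 0..1
}

function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) { // unfold the lower hemisphere
    const [ou, ov] = [u, v];
    u = (1.0 - Math.abs(ov)) * signNotZero(ou);
    v = (1.0 - Math.abs(ou)) * signNotZero(ov);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```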

Emission

- Don't pack emission; forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
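As a concrete sketch of the perceptual-basis transform, here is the RGB <-> YCoCg pair in JavaScript (helper names are mine; they mirror the spirit of the shader's rgbToYcocg / YcocgToRgb):

```javascript
// RGB <-> YCoCg: Y carries luminance, Co/Cg carry chroma.
// The transform is linear and exactly invertible.
function rgbToYcocg([r, g, b]) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,  // Y
    r * 0.5 - b * 0.5,              // Co
    -r * 0.25 + g * 0.5 - b * 0.25  // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

White (1,1,1) maps to Y=1, Co=Cg=0, which is why the chroma channels can be stored at half frequency with little visible loss.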

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits

G: VelocityX 10 bits, NormalX 14 bits

B: VelocityY 10 bits, NormalY 14 bits

A: Depth 31 bits, Metallic 1 bit

- RGBA Float 128bpp

- Sign bits of R, G and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits

G: NormalX 12 bits, NormalY 12 bits

B: Depth 31 bits, Metallic 1 bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits, ColorC 5 bits (+ sign bit)

G: NormalX 9 bits (+ sign bit), Gloss 3 bits

B: NormalY 9 bits (+ sign bit), Gloss 3 bits

A: Depth 15 bits, Metallic 1 bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit

G: NormalX 9 bits (+ sign bit), Gloss 3 bits

B: NormalY 9 bits (+ sign bit), Gloss 3 bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits

G: VelocityX 10 bits, NormalX 14 bits

B: VelocityY 10 bits, NormalY 14 bits

A: Depth 31 bits, Metallic 1 bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
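The sign() trick used for the depth / metallic channel can be illustrated host-side (a JavaScript sketch; function names are mine, and depth is assumed strictly positive as in the G-Buffer):

```javascript
// Metallic is folded into the sign bit of a strictly-positive depth:
// metallic surfaces store +depth, non-metallic surfaces store -depth.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  // abs() recovers depth cheaply; the sign recovers the flag.
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```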

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
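The velocity quantization round trip can be sketched in JavaScript (the SUB_PIXEL_PRECISION_STEPS value here is an assumption, and the shader additionally remaps NDC velocity into pixels first):

```javascript
// Quantize pixel-space velocity to 10 bits with sub-pixel precision:
// scale, clamp to -512..511, then bias into 0..1023.
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed step count

function quantizeVelocity(vPixels) {
  const q = Math.floor(Math.min(Math.max(vPixels * SUB_PIXEL_PRECISION_STEPS, -512.0), 511.0));
  return q + 512.0; // 0..1023 fits in 10 bits
}

function dequantizeVelocity(q) {
  return (q - 512.0) / SUB_PIXEL_PRECISION_STEPS;
}
```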

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  // Sample the G-Buffer cross neighborhood.
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look.

Enhance! [Detail crops comparing RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%]

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
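Why this works: Schlick's formula is linear in the reflection coefficient, and YCoCg is a linear transform of RGB, so evaluating fresnel directly on (Y, chroma) matches transforming the RGB result (white has zero chroma, which is where the negated power term comes from). A JavaScript sanity check; the helper names are mine:

```javascript
function rgbToYcocg([r, g, b]) {
  return [r * 0.25 + g * 0.5 + b * 0.25, r * 0.5 - b * 0.5, -r * 0.25 + g * 0.5 - b * 0.25];
}

function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * p + c);
}

// Y uses the usual lerp-to-white; chroma decays toward 0 at grazing angles.
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
}
```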

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
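A CPU port of the same weighting scheme, useful for experimenting with the SENSITIVITY constant (a JavaScript sketch; neighbors are [Y, C] pairs as in the shader):

```javascript
// Reconstruct the missing chroma component from the cross neighborhood,
// weighting each neighbor by how close its luminance is to the center's.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [y, c] of neighbors) {
    // Guard the case where a sample is black.
    const w = y >= 1e-5 ? Math.pow(2.0, -SENSITIVITY * Math.abs(y - center[0])) : 0.0;
    totalWeight += w;
    chromaSum += c * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```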

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Physically Based Shading

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal


Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization

- Time to shamelessly steal from Real-Time Rendering [Möller 08]


Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * BRDF * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * BRDF * projected area

- MAX_NUM_LIGHTS is small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * BRDF * projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec3 vPositionScreenSpace;
varying vec3 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready; now we just need to write it out

- ...after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the renderbuffer: support getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
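The 2^24 limit is easy to demonstrate in JavaScript, where Math.fround rounds a number to 32-bit float precision:

```javascript
// Every integer up to 2^24 = 16777216 survives a round trip through a
// 32-bit float; 2^24 + 1 does not, because the step size doubles to 2.
const f32 = Math.fround;

const exact = f32(16777215) === 16777215;     // 2^24 - 1 is representable
const collapses = f32(16777217) === 16777216; // 2^24 + 1 rounds to 2^24
```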

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
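The same arithmetic ports directly to JavaScript, which is handy for testing the packing math outside the shader (a sketch; the function names simply mirror the GLSL):

```javascript
// Pack three 8-bit integers into one uint24-valued number and back,
// using only multiplies, divides and floor (no bitwise operators).
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536 + y * 256 + z;
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}
```

Looping this over all 2^24 inputs on the CPU mirrors the GPU unit test below.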

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract the bool as sign()
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
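The depth / metallic sign trick is easy to sanity check outside of a shader. A minimal Python sketch of the same arithmetic (plain floats standing in for the GLSL values; metallic is stored as +1.0 / -1.0):

```python
def pack_depth_metallic(depth, metallic):
    # metallic is +1.0 or -1.0; non-metallic surfaces get a negated depth
    return depth * metallic

def unpack_depth_metallic(packed):
    # abs() recovers depth, the sign recovers the metallic flag
    depth = abs(packed)
    metallic = 1.0 if packed > 0.0 else -1.0
    return depth, metallic
```

Depth 0.0 is reserved for infinity, which is why the decode path below early-outs on res.depth <= 0.0.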

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer, RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer, RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer, RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer, RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
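The quantize / dequantize round trip above can be checked in plain Python. SUB_PIXEL_PRECISION_STEPS is not given in the talk; 4 steps per pixel is an assumed value for illustration:

```python
import math

# Assumed constant for illustration; the talk doesn't give the exact value
SUB_PIXEL_PRECISION_STEPS = 4.0

def quantize_velocity(velocity_ndc, resolution):
    # -1..1 screen space velocity -> 0..1023 biased sub-pixel steps (10 bits)
    q = velocity_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    q = math.floor(min(max(q, -512.0), 511.0))
    return q + 512.0

def dequantize_velocity(q, resolution):
    # Undo the bias; the 0.5 from encode becomes a 2.0 on decode
    return (q - 512.0) * 2.0 / (SUB_PIXEL_PRECISION_STEPS * resolution)
```

Velocities larger than the representable range saturate at the end codes, which the decoder treats as "out of range" and culls.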

Decode G-Buffer, RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer, RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer, RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer, RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer, RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
      gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

  // Color stored in non-linear space to distribute precision perceptually
  // Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer, RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular color

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
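The YC variant isn't just cheaper, it's exact: Schlick's formula is linear in the reflection coefficient, and YCoCg is a linear transform of RGB with white mapping to Y = 1, chroma = 0, so evaluating fresnel in YCoCg matches evaluating in RGB and converting afterwards. A quick Python check (using one common YCoCg weighting; the full 3-component version is shown so both chroma channels can be compared):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y
            0.5 * r - 0.5 * b,               # Co
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * power + c for c in f0)

def fresnel_schlick_ycocg(v_dot_h, f0_ycocg):
    power = (1.0 - v_dot_h) ** 5.0
    y, co, cg = f0_ycocg
    # Luminance interpolates toward 1, chroma decays toward 0
    return ((1.0 - y) * power + y, co * -power + co, cg * -power + cg)
```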

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
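The same weighting is easy to verify scalar-by-scalar. A Python port of the function above (samples are (luminance, chroma) pairs; the sensitivity constant mirrors the GLSL):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs; weight each
    # neighbor's chroma by its luminance similarity to the center
    samples = [a1, a2, a3, a4]
    weights = []
    for luminance, _ in samples:
        w = 2.0 ** (-sensitivity * abs(luminance - center[0]))
        # Guard the case where the sample is black (step(1e-5, luminance))
        w *= 1.0 if luminance >= 1e-5 else 0.0
        weights.append(w)
    total = sum(weights)
    # Guard the case where all weights are 0
    if total <= 1e-5:
        return (0.0, 0.0)
    chroma = sum(w * s[1] for w, s in zip(weights, samples)) / total
    return (center[1], chroma)
```

Neighbors whose luminance matches the center dominate the reconstruction; black samples (e.g. the sky) are excluded entirely.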

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

- [WebGLStats] WebGL Stats. http://webglstats.com, 2014
- [Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
- [Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, Siggraph 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
- [Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
- [Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
- [Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
- [Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
- [Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/
- [Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/
- [Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
- [Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home
- [Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/
- [Shishkovtsov 05] Deferred Shading in STALKER. Oles Shishkovtsov, 2005. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
- [Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt
- [Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
- [Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
- [Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
- [Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
- [Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
- [Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
- [Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
- [Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
- [Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
- [Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
- [Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf
- [Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
- [Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sebastien Lagarde, 2012. http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
- [Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf

Material Parameterization

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]
- G: Geometry / Shadow-Masking Function: Height-Correlated Smith [Heitz 14]
- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization

Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give the color parameter conditional meaning [Burley 12][Karis 13]

if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter G-Buffer storage
- Fewer textures, better download times
- What control did we lose?
- Video of non-metallic materials sweeping through the physically plausible range of specular colors
- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:
- Translucency (Skin, Foliage, Snow)
- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)
- Layered Materials (Clear coat)
- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview

- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- For each light
- outgoing radiance += incoming radiance * brdf * projected area
- Remap outgoing radiance to the perceptual display domain
- Tonemap
- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights
- Typically pay the cost of the worst case
- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
-     outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS kept small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model
- For each primitive
- For each vertex
- Transform vertex by modelViewProjectionMatrix
- For each pixel
- Write geometric and material data to the g-buffer
- For each light
- For each pixel inside the light volume
- Read geometric and material data from texture
- outgoing radiance = incoming radiance * brdf * projected area
- Blend / Add outgoing radiance to the render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In the vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In the fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
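The fragment-shader math is just two perspective divides and a subtraction. A Python sketch with hypothetical clip space positions (x, y, z, w):

```python
def screen_space_velocity(clip_now, clip_old):
    # Perspective-divide both clip space positions, then difference the
    # resulting NDC xy (-1..1) coordinates, as in the fragment shader above
    x0, y0, w0 = clip_old[0], clip_old[1], clip_old[3]
    x1, y1, w1 = clip_now[0], clip_now[1], clip_now[3]
    return (x1 / w1 - x0 / w0, y1 / w1 - y0 / w0)
```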

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets: not well supported

Challenges: Storage

- Reading depth from the render buffer: getting better

Challenges: Storage

- Texture float support: quite good

Challenges: Storage

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
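The precision cliffs are easy to demonstrate by forcing values through the IEEE 754 formats. A Python check using the stdlib struct module ('f' is binary32, 'e' is binary16):

```python
import struct

def roundtrip_f32(x):
    # Force x through IEEE 754 binary32, like a highp float
    return struct.unpack('f', struct.pack('f', x))[0]

def roundtrip_f16(x):
    # Force x through IEEE 754 binary16, like a mediump half float
    return struct.unpack('e', struct.pack('e', x))[0]
```

2^24 survives the float32 round trip; 2^24 + 1 does not (it rounds back to 2^24). Likewise 2048 survives binary16 while 2049 does not.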

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
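The encode / decode pair above round-trips exactly for every 8-bit triple. A direct Python port of the arithmetic (no bitwise operators, just multiplies, divides and floors):

```python
import math

def uint8_8_8_to_uint24(r, g, b):
    # Shift left via multiplies: r << 16 | g << 8 | b
    return r * 65536.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    # Shift right via divides + floor, undo the partial shifts by subtraction
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)
```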

Unit Testing

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL. Not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to the expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to the expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL. Not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL. Not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to the expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
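The octahedral mapping [Cigolle 14] folds the lower hemisphere over the upper one and remaps to the unit square. A Python sketch of the encode / decode pair (one common formulation; sign(0) is taken as +1 here):

```python
import math

def _sign(v):
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # Project the unit normal onto the octahedron, fold the lower
    # hemisphere over, then remap from -1..1 to the 0..1 domain
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def octahedron_decode(e):
    x = e[0] * 2.0 - 1.0
    y = e[1] * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)
```

Without quantization the round trip is exact up to floating point error; the 14-bit quantization in the G-Buffer format is what introduces the discretization.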

Emission

- Don't pack emission. Forward render it
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality dxt compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
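The RGB to YCoCg basis change is linear and exactly invertible, so storing YC and reconstructing the missing chroma is the only lossy step. A Python sketch of the forward and inverse transforms (standard YCoCg weights):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y:  luminance
            0.5 * r - 0.5 * b,               # Co: orange/blue chroma
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg: green/purple chroma

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)
```

Grays map to zero chroma, which is why chroma can be stored at half resolution with little visible loss.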

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL; could be an RGBA Float texture under the hood


- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry (Shadow-Masking) Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]
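For reference, the microfacet specular term these pieces plug into is the standard one from [Walter 07]; written out (the equation is implied by the slide, not printed on it):

$$ f_{\mathrm{spec}}(\mathbf{l},\mathbf{v}) = \frac{D(\mathbf{h})\,G(\mathbf{l},\mathbf{v})\,F(\mathbf{v},\mathbf{h})}{4\,(\mathbf{n}\cdot\mathbf{l})\,(\mathbf{n}\cdot\mathbf{v})}, \qquad D_{\mathrm{GGX}}(\mathbf{h}) = \frac{\alpha^2}{\pi\left((\mathbf{n}\cdot\mathbf{h})^2(\alpha^2-1)+1\right)^2} $$

where h is the half vector between l and v, and α is the roughness derived from the gloss parameter.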

Standard Material Parameterization

Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give the color parameter conditional meaning [Burley 12][Karis 13]

if (metallic) {
    albedo = vec3(0.0);
    specularColor = color;
} else {
    albedo = color;
    specularColor = vec3(0.04);
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay the cost of the worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
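These limits are easy to confirm host-side. A small illustrative JavaScript check (Math.fround rounds a double to the nearest representable 32-bit float):

```javascript
// Math.fround rounds a number to the nearest representable 32-bit float.
const f32 = Math.fround;

console.log(f32(16777215) === 16777215); // 2^24 - 1: exactly representable
console.log(f32(16777216) === 16777216); // 2^24: a power of two, still exact
console.log(f32(16777217) === 16777216); // 2^24 + 1: rounds away, step size is now 2
```

All three comparisons print true: the integer grid is exact up to 2^24 and then starts skipping.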

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
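Since the shifts are plain multiplies and floored divides, the same logic ports directly to JavaScript for sanity checking. A sketch mirroring the GLSL names (illustrative, not the shipping code):

```javascript
// Pack three 8-bit integers into one 24-bit integer that a float holds exactly.
function uint8_8_8_to_uint24(v) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return v[0] * SHIFT_LEFT_16 + v[1] * SHIFT_LEFT_8 + v[2];
}

// Invert the packing with floored divisions standing in for right shifts.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / (256.0 * 256.0));
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}

console.log(uint24_to_uint8_8_8(uint8_8_8_to_uint24([12, 34, 56]))); // [ 12, 34, 56 ]
```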

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
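A host-side sketch of the octahedral encode / decode from [Cigolle 14], before the 14-bit quantization step (the helper names here are made up for illustration):

```javascript
function signNotZero(x) { return x >= 0 ? 1 : -1; }

// Map a unit vector to a point in the [-1,1]^2 octahedral square.
function octEncode(n) {
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1, y = n[1] / l1;
  if (n[2] < 0) {
    // Lower hemisphere: fold the octahedron's bottom faces outward.
    const ox = (1 - Math.abs(y)) * signNotZero(x);
    const oy = (1 - Math.abs(x)) * signNotZero(y);
    x = ox; y = oy;
  }
  return [x, y];
}

// Invert the mapping and renormalize back onto the sphere.
function octDecode(e) {
  let x = e[0], y = e[1];
  let z = 1 - Math.abs(x) - Math.abs(y);
  if (z < 0) {
    const ox = (1 - Math.abs(y)) * signNotZero(x);
    const oy = (1 - Math.abs(x)) * signNotZero(y);
    x = ox; y = oy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}

console.log(octDecode(octEncode([0.6, 0.48, -0.64]))); // recovers the input vector
```

Without quantization the round trip is exact up to floating point error; the quantizer is what introduces the "reasonably uniform" discretization noted above.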

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
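For reference, a JavaScript version of the transform pair (these are the commonly used RGB ↔ YCoCg matrices; the deck does not print its rgbToYcocg source):

```javascript
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y: luminance
    0.5 * r - 0.5 * b,              // Co: orange-blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg: green-purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

console.log(rgbToYcocg([1, 1, 1])); // grey has zero chroma: [ 1, 0, 0 ]
```

The inverse is exact, which is what makes discarding one chroma component per pixel (rather than lossy matrix math) the only source of error in the checkerboard scheme.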

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit

- RGBA Float: 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: NormalX 12 bits, NormalY 12 bits
B: Depth 31 bits, Metallic 1 bit

- RGB Float: 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits, ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits
A: Depth 15 bits, Metallic 1 bit

- RGBA Half-float: 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
G: NormalX 9 bits (+ sign bit), Gloss 3 bits
B: NormalY 9 bits (+ sign bit), Gloss 3 bits

- RGB Half-float: 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
G: VelocityX 10 bits, NormalX 14 bits
B: VelocityY 10 bits, NormalY 14 bits
A: Depth 31 bits, Metallic 1 bit

- RGBA Float: 128 bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
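Numerically, the quantizer maps signed screen space velocity into a biased 10-bit integer. A JavaScript model of the round trip; SUB_PIXEL_PRECISION_STEPS = 4 (quarter-pixel precision) is an assumed value, the slides do not state it:

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4; // assumed quarter-pixel precision

// NDC velocity (-1..1 per axis) -> biased 0..1023 integer (10 bits).
function quantizeVelocity(v, res) {
  let q = v * res * 0.5 * SUB_PIXEL_PRECISION_STEPS;
  q = Math.floor(Math.min(511.0, Math.max(-512.0, q)));
  return q + 512.0;
}

// Inverse: back to NDC velocity, accurate to one sub-pixel step.
function dequantizeVelocity(q, res) {
  return (q - 512.0) / (res * 0.5 * SUB_PIXEL_PRECISION_STEPS);
}

const q = quantizeVelocity(0.01, 1920.0); // 0.01 NDC ≈ 9.6 pixels on a 1920-wide target
console.log(q); // 550
```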

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;

return res;

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
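The sign trick is simple enough to model outside the shader. A minimal JavaScript sketch of the w-channel encode / decode, with metallic passed as +1.0 / -1.0 as in the shader above:

```javascript
// Encode: store view-space depth, negated when the surface is non-metallic.
function encodeDepthMetallic(depth, metallic /* +1.0 or -1.0 */) {
  return depth * metallic;
}

// Decode: depth is the magnitude, metallic is the sign. 0.0 marks infinity.
function decodeDepth(w) { return Math.abs(w); }
function decodeMetallic(w) { return Math.sign(w); }

const w = encodeDepthMetallic(12.5, -1.0); // non-metallic surface at depth 12.5
console.log(decodeDepth(w), decodeMetallic(w)); // 12.5 -1
```

This is why the depth-only decode used by AO and ray marching passes is a single abs() on one channel.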

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;

vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered with:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look...

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% / YC Lighting 25%

RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Standard Material Parameterization

Full Artist Control

- Albedo

- Specular Color

- Alpha

- Emission

- Gloss

- Normal

Physically Coupled

- Metallic

- Color

- Alpha

- Emission

- Gloss

- Normal

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization- Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = color;
  specularColor = vec3(0.04);
} else {
  albedo = vec3(0.0);
  specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of
specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
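The shift arithmetic above is easy to sanity-check on the CPU. A small Python sketch of the same encode / decode pair (the names mirror the GLSL; this is an illustration, not the production shader):

```python
import math

def uint8_8_8_to_uint24(r, g, b):
    # "Shift left" with multiplies: r << 16 | g << 8 | b, done in float arithmetic.
    # Every intermediate stays an exact integer below 2^24, so no precision is lost.
    return r * 65536.0 + g * 256.0 + b

def uint24_to_uint8_8_8(raw):
    # "Shift right" with divide + floor, then subtract to isolate each byte
    x = float(math.floor(raw / 65536.0))
    temp = float(math.floor(raw / 256.0))
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)
```

Looping this round trip over all 2^24 values reproduces the exhaustive GPU unit test on the CPU.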

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack
phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
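As a concrete sketch of the mapping (a Python illustration after [Cigolle 14]; these helper names are mine, not the shader's): the normal is projected onto the octahedron |x| + |y| + |z| = 1, the lower hemisphere is folded over the diagonals, and the result is remapped to the 0 to 1 domain.

```python
def sign_not_zero(v):
    # GLSL-style sign that treats 0 as positive
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # n is a unit normal (x, y, z); returns a 2D point in [0, 1]^2
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    # Remap from [-1, 1] to the full [0, 1] storage domain
    return (px * 0.5 + 0.5, py * 0.5 + 0.5)

def octahedron_decode(e):
    px, py = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        # Unfold the lower hemisphere
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    length = (px * px + py * py + z * z) ** 0.5
    return (px / length, py / length, z / length)
```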

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
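The YCoCg transform itself is just a pair of cheap linear mappings. A Python sketch (not the engine's code) showing the forward and inverse transforms; note that grayscale inputs land entirely in Y, with both chroma components at zero:

```python
def rgb_to_ycocg(r, g, b):
    # Luma plus two chroma axes (orange and green)
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return (y, co, cg)

def ycocg_to_rgb(y, co, cg):
    # Exact inverse: only adds and subtracts
    return (y + co - cg, y + cg, y - co - cg)
```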

G-Buffer Packing: Format

G-Buffer Format

- R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
- G: VelocityX 10 bits, NormalX 14 bits
- B: VelocityY 10 bits, NormalY 14 bits
- A: Depth 31 bits, Metallic 1 bit

- RGBA Float, 128bpp

- Sign bits of R, G and B are available for use as flags

- i.e. Material Type

G-Buffer Format

- R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
- G: NormalX 12 bits, NormalY 12 bits
- B: Depth 31 bits, Metallic 1 bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
webGL. Could be an RGBA Float texture under the hood

G-Buffer Format

- R: ColorY 7 bits, ColorC 5 bits (+ sign bit)
- G: NormalX 9 bits (+ sign bit), Gloss 3 bits
- B: NormalY 9 bits (+ sign bit), Gloss 3 bits
- A: Depth 15 bits, Metallic 1 bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

- R: ColorY 7 bits, ColorC 4 bits, Metallic 1 bit
- G: NormalX 9 bits (+ sign bit), Gloss 3 bits
- B: NormalY 9 bits (+ sign bit), Gloss 3 bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

- R: ColorY 8 bits, ColorC 8 bits, Gloss 8 bits
- G: VelocityX 10 bits, NormalX 14 bits
- B: VelocityY 10 bits, NormalY 14 bits
- A: Depth 31 bits, Metallic 1 bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;

res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;
return res;

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
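The sign-bit trick for depth and metallic is easy to see in isolation. A hypothetical Python sketch, where the metallic flag is carried as the sign of the strictly positive view space depth:

```python
def pack_depth_metallic(depth, metallic):
    # depth must be > 0; the metallic bool becomes the sign of the stored float
    return depth if metallic else -depth

def unpack_depth_metallic(packed):
    # abs() recovers depth, the sign recovers the flag
    return (abs(packed), packed > 0.0)
```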

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
    gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma
component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%


Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
and an ADD from the skipped 3rd component
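Why lighting in YCoCg is even allowed to work here: the YCoCg transform is linear and maps white (1, 1, 1) to Y = 1 with zero chroma, so applying the transform commutes with Schlick's affine blend. The Y row keeps the RGB formula, and the chroma rows lose the white term, leaving only the `-power` contribution. A Python check of that equivalence (helper names are mine, not the shader's):

```python
def rgb_to_ycocg(c):
    r, g, b = c
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def schlick_rgb(v_dot_h, rc):
    # Standard Schlick blend toward white at grazing angles
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in rc)

def schlick_ycocg(v_dot_h, rc_ycocg):
    p = (1.0 - v_dot_h) ** 5.0
    y, co, cg = rc_ycocg
    # Luminance keeps the RGB form; chroma drops the white term
    return ((1.0 - y) * p + y, co * -p + co, cg * -p + cg)
```

Transforming the RGB result into YCoCg gives the same numbers as evaluating directly in YCoCg, which is what justifies the vec2 version on the slide.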

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
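The same weighting logic, ported to Python for experimentation (an illustrative sketch, not the shader itself; `neighbors` holds the four cross samples as (luma, chroma) pairs):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma) at the pixel; neighbors: four (luma, chroma) cross samples
    total_weight = 0.0
    chroma_sum = 0.0
    for luma, chroma in neighbors:
        # Weight neighbors by luminance similarity: large luma deltas decay fast
        weight = 2.0 ** (-sensitivity * abs(luma - center[0]))
        # Guard the case where a sample is black
        if luma < 1e-5:
            weight = 0.0
        chroma_sum += chroma * weight
        total_weight += weight
    # Guard the case where all weights are 0
    if total_weight > 1e-5:
        return (center[1], chroma_sum / total_weight)
    return (0.0, 0.0)
```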

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars
Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Microfacet BRDF

- Microfacet Specular

- D: Normal Distribution Function: GGX [Walter 07]

- G: Geometry Shadow-Masking Function: Height-Correlated Smith [Heitz 14]

- F: Fresnel: Spherical Gaussian Schlick's Approximation [Schlick 94]

- Microfacet Diffuse

- Qualitative Oren-Nayar [Oren 94]

Standard Material Parameterization - Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give the color parameter conditional meaning [Burley 12][Karis 13]

if (metallic) {
    albedo = vec3(0.0);
    specularColor = color;
} else {
    albedo = color;
    specularColor = vec3(0.04);
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * BRDF * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons - Challenging to effectively cull lights

- Typically pay the cost of the worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * BRDF * projected area

- MAX_NUM_LIGHTS kept small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * BRDF * projected area

- Blend add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer - Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
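The 2^24 limit is easy to verify from JavaScript with Math.fround, which rounds a number to 32-bit float precision:

```javascript
// 32-bit floats hold every integer exactly up to 2^24; above that, steps skip.
const MAX_EXACT = Math.pow(2, 24); // 16777216

console.log(Math.fround(MAX_EXACT - 1)); // still exact
console.log(Math.fround(MAX_EXACT + 1)); // rounds back to 16777216: 2^24 + 1 is not representable
```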

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
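The same pack / unpack arithmetic can be mirrored in JavaScript for quick CPU-side sanity checks (a sketch; the function names follow the GLSL above):

```javascript
// Pack three bytes into one float-representable uint24: shift left via multiply.
function uint8_8_8_to_uint24(x, y, z) {
  return x * 65536 + y * 256 + z;
}

// Unpack: shift right via divide + floor, mask via multiply-subtract.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, temp - x * 256, raw - temp * 256];
}
```

Looping the round trip over all 2^24 values is the CPU analogue of the GPU unit test described next.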

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
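A JavaScript sketch of octahedral encode / decode in the spirit of [Cigolle 14] (helper names are ad hoc; the GLSL versions would follow the same arithmetic):

```javascript
// Avoid Math.sign(0) === 0, which would break the hemisphere fold.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

// n: unit vector [x, y, z] -> [u, v] in [-1, 1]^2
function octahedronEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let u = n[0] * invL1;
  let v = n[1] * invL1;
  if (n[2] < 0.0) { // fold the lower hemisphere over the diagonals
    const fu = (1.0 - Math.abs(v)) * signNotZero(u);
    const fv = (1.0 - Math.abs(u)) * signNotZero(v);
    u = fu; v = fv;
  }
  return [u, v];
}

// [u, v] in [-1, 1]^2 -> unit vector [x, y, z]
function octahedronDecode(e) {
  let x = e[0], y = e[1];
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) { // unfold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Remapping the [-1, 1] result to [0, 1] then quantizing gives the uint14 components used by the packing code later.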

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
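The RGB <-> YCoCg transform itself is just a pair of linear transforms. A sketch (JavaScript, ad hoc names; the swatch pre-transform would use the same math):

```javascript
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y: luma
     0.5  * r            - 0.5  * b, // Co: orange-blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green-purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note the chroma components live in [-0.5, 0.5], which is why the packing code below biases them before quantization.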

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;

    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
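The sign() trick is simple enough to sketch on the CPU (JavaScript, ad hoc names; assumes depth > 0 for valid surfaces, with 0 reserved for infinity):

```javascript
// Metallic flag stored as the sign of view-space depth.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) > 0.0 };
}
```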

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes:
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for our microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results - All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (Four detail-crop slides, each comparing RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, and RGB Lighting 25%)

Results - Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting - Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting - YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
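Why lighting the luma directly is legitimate: Schlick's approximation is affine in the reflection coefficient, and luma is a weighted sum of channels whose weights sum to 1, so per-channel fresnel followed by luma equals fresnel applied directly to the luma. A quick numeric check (JavaScript; the copper-like F0 value is just an example):

```javascript
// Scalar Schlick fresnel
function fresnelSchlick(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return (1.0 - f0) * power + f0;
}

// YCoCg luma of an RGB triple (weights sum to 1)
const luma = ([r, g, b]) => 0.25 * r + 0.5 * g + 0.25 * b;

const f0Rgb = [0.95, 0.64, 0.54]; // example copper-like F0
const vDotH = 0.3;
const perChannel = f0Rgb.map((c) => fresnelSchlick(vDotH, c));

// These two agree: luma of per-channel fresnel vs fresnel of the luma
console.log(luma(perChannel), fresnelSchlick(vDotH, luma(f0Rgb)));
```

The chroma weights sum to 0 instead, which drops the constant term and yields exactly the inverted chroma formula in fresnelSchlickYC above.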

YC Lighting - Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER


Standard Material Parameterization

- Time to shamelessly steal from Real-Time Rendering [Möller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
    albedo = color;
    specularColor = vec3(0.04);
} else {
    albedo = vec3(0.0);
    specularColor = color;
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter
- Fewer knobs help enforce physically plausible materials
- Significantly lighter g-buffer storage
- Fewer textures, better download times
- What control did we lose?
  - Video of non-metallic materials sweeping through the physically plausible range of specular colors
  - 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:
  - Translucency (skin, foliage, snow)
  - Anisotropic gloss (brushed metal, hair, fabrics)
  - Layered materials (clear coat)
  - Partially metallic / filtered hybrid materials (car paints, sci-fi materials)

Deferred Rendering

Forward Pipeline Overview

- For each model
  - For each primitive
    - For each vertex
      - Transform vertex by modelViewProjectionMatrix
    - For each pixel
      - For each light
        - outgoing radiance += incoming radiance * brdf * projected area
      - Remap outgoing radiance to perceptual display domain
        - Tonemap
        - Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights
- Typically pay cost of worst case
  - for (int i = 0; i < MAX_NUM_LIGHTS; ++i)
    - outgoing radiance += incoming radiance * brdf * projected area
- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model
  - For each primitive
    - For each vertex
      - Transform vertex by modelViewProjectionMatrix
    - For each pixel
      - Write geometric and material data to g-buffer
- For each light
  - For each pixel inside light volume
    - Read geometric and material data from texture
    - outgoing radiance = incoming radiance * brdf * projected area
    - Blend / add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth
  - Read G-Buffer for each light source
- Heavy on write bandwidth
  - Blend / add outgoing radiance for each light source
- Material parameterization limited by G-Buffer storage
- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
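The same computation can be sketched on the CPU. A hypothetical JS helper (names are illustrative, not from the deck) that derives screen-space velocity from the current and previous clip-space positions:

```javascript
// Perspective-divide a clip-space position down to 2D NDC.
function perspectiveDivide(clip) {
  return [clip[0] / clip[3], clip[1] / clip[3]];
}

// Velocity is the difference of the perspective-divided current and
// previous positions, as in the fragment shader above.
function screenSpaceVelocity(clipNow, clipOld) {
  const now = perspectiveDivide(clipNow);
  const old = perspectiveDivide(clipOld);
  return [now[0] - old[0], now[1] - old[1]];
}
```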

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
  - Step size increases at integers > 2^24
  - 0 to 16777215
- 16-bit half float can represent every integer to 2^11 precisely
  - Step size increases at integers > 2^11
  - 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
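The 2^24 limit is easy to observe from JS: Math.fround rounds a number to the nearest float32, so it shows exactly where 32-bit floats stop representing consecutive integers.

```javascript
// Every integer up to 2^24 fits exactly in a float32; 2^24 + 1 is the
// first integer that does not (it rounds back down to 2^24).
function isExactInFloat32(n) {
  return Math.fround(n) === n;
}
```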

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods and adds
  - Impractical for general single-bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
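A JS port of the pack/unpack pair above makes the round trip easy to check outside the shader. Math.fround stands in for storing the packed value in a float32 texture channel:

```javascript
// Pack three 8-bit integers into one 24-bit integer held in a float32.
function uint8_8_8_to_uint24(r, g, b) {
  // 256^2 = 65536; result stays below 2^24, so fround is lossless here.
  return Math.fround(r * 65536 + (g * 256 + b));
}

// Mirror of the GLSL decode: shift right with divisions plus floor.
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, -x * 256 + temp, -temp * 256 + raw];
}
```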

Unit Testing

Unit Testing

- Important to unit test packing functions
  - Easy to miss collisions
  - Easy to miss precision issues
- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
  - WebGL has no support for readPixels on floating point textures
  - Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
  - Assign each pixel a unique integer ID
  - Pack ID
  - Unpack ID
  - Compare unpacked ID to pixel ID
  - Write success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?
- Surface Properties:
  - Normal
  - Emission
  - Color
  - Gloss
  - Metallic
  - Depth
  - Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
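The octahedral mapping can be sketched in a few lines of JS (a hypothetical port following [Cigolle 14], not code from the deck): fold the unit sphere onto an octahedron, unfold it onto a square, and remap to the 0..1 domain.

```javascript
// sign() that never returns 0, so the fold is well defined on the axes.
function signNotZero(v) {
  return v >= 0 ? 1 : -1;
}

function octEncode(n) {
  const invL1 = 1 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let u = n[0] * invL1;
  let v = n[1] * invL1;
  if (n[2] < 0) {
    // Fold the lower hemisphere over the diagonals.
    const t = u;
    u = (1 - Math.abs(v)) * signNotZero(t);
    v = (1 - Math.abs(t)) * signNotZero(v);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // remap -1..1 to 0..1
}

function octDecode(e) {
  const u = e[0] * 2 - 1;
  const v = e[1] * 2 - 1;
  let x = u;
  let y = v;
  let z = 1 - Math.abs(u) - Math.abs(v);
  if (z < 0) {
    // Unfold the lower hemisphere.
    x = (1 - Math.abs(v)) * signNotZero(u);
    y = (1 - Math.abs(u)) * signNotZero(v);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```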

Emission

- Don't pack emission. Forward render
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
  - Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
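The YCoCg basis itself is a cheap linear transform. A JS version (a sketch; the deck's rgbToYcocg is GLSL) for reference:

```javascript
// RGB -> YCoCg: Y is luminance, Co/Cg are the two chroma components.
function rgbToYcocg(rgb) {
  const r = rgb[0], g = rgb[1], b = rgb[2];
  return [
    0.25 * r + 0.5 * g + 0.25 * b,   // Y
    0.5 * r - 0.5 * b,               // Co
    -0.25 * r + 0.5 * g - 0.25 * b,  // Cg
  ];
}

// Exact inverse of the transform above.
function ycocgToRgb(ycocg) {
  const y = ycocg[0], co = ycocg[1], cg = ycocg[2];
  return [y + co - cg, y + cg, y - co - cg];
}
```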

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp
- Sign bits of R, G and B are available for use as flags
  - i.e. Material Type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
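checkerboardInterlace and getCheckerboard are not shown in the deck; a plausible JS sketch (pixel-coordinate based, which is an assumption about their implementation) alternates which chroma component a pixel stores:

```javascript
// +1 on "even" checkerboard pixels, -1 on "odd" ones.
function getCheckerboard(px, py) {
  return (px + py) % 2 === 0 ? 1 : -1;
}

// Store Co on even pixels and Cg on odd pixels, so each pixel keeps
// luminance plus exactly one of the two chroma components.
function checkerboardInterlace(co, cg, px, py) {
  return getCheckerboard(px, py) > 0 ? co : cg;
}
```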

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
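uint10_14_to_uint24 is used here but never defined in the deck. A plausible JS sketch (the high/low split is an assumption): put the 10-bit velocity in the high bits and the 14-bit normal in the low bits of a 24-bit integer, which still fits exactly in a float32.

```javascript
const SHIFT_LEFT_14 = 16384; // 2^14

// Pack a 10-bit value (0..1023) and a 14-bit value (0..16383) into 24 bits.
function uint10_14_to_uint24(hi10, lo14) {
  return hi10 * SHIFT_LEFT_14 + lo14;
}

function uint24_to_uint10_14(raw) {
  const hi10 = Math.floor(raw / SHIFT_LEFT_14);
  return [hi10, raw - hi10 * SHIFT_LEFT_14];
}
```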

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA
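A small JS illustration of why filtering and blending break (a sketch, reusing the 8-8-8 scheme from earlier): averaging two packed values, which is what bilinear filtering or alpha blending would do, decodes to garbage rather than the average of the components.

```javascript
function pack888(r, g, b) { return r * 65536 + g * 256 + b; }
function unpack888(p) {
  const r = Math.floor(p / 65536);
  const g = Math.floor((p - r * 65536) / 256);
  return [r, g, p - r * 65536 - g * 256];
}

// "Filter" in packed space: average the packed values, then decode.
const red = pack888(255, 0, 0);
const black = pack888(0, 0, 0);
const filtered = unpack888((red + black) / 2);
// Component-wise filtering would give ~[127.5, 0, 0]; instead the carry
// from the red channel bleeds into the green channel.
```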

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

    // Color is stored in non-linear space to distribute precision perceptually:
    // sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
  - [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
  - Bad for microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to constant 0.04
  - Keep fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation the same
  - Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
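This works because Schlick's approximation is linear in the reflection coefficient and YCoCg is a linear transform of RGB, so evaluating directly on (Y, chroma) matches transforming the RGB result. A hypothetical JS sanity check of that equivalence for the Y and Co channels:

```javascript
// RGB -> (Y, Co) only; enough to check both branch shapes of the YC form.
function rgbToYco(rgb) {
  const r = rgb[0], g = rgb[1], b = rgb[2];
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}

function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map(function (c) { return (1 - c) * p + c; });
}

// Y channel uses the usual form; chroma of white is 0, so the chroma
// channel simplifies to c * (1 - p), i.e. c * -p + c.
function fresnelSchlickYC(vDotH, yc) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - yc[0]) * p + yc[0], yc[1] * -p + yc[1]];
}
```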

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to RG components of render target
- Frees up B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
    - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
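The idea is easier to see in a scalar JS port (a hypothetical sketch of the function above): neighbors whose luminance is close to the center's contribute more of their chroma.

```javascript
// center: [luma, chroma] of this pixel; neighbors: array of [luma, chroma].
// Returns [center chroma, reconstructed missing chroma].
function reconstructChromaHDR(center, neighbors, sensitivity) {
  if (sensitivity === undefined) sensitivity = 25.0;
  let totalWeight = 0;
  let chroma = 0;
  for (const [luma, c] of neighbors) {
    // Weight falls off exponentially with luminance difference.
    let w = Math.pow(2, -sensitivity * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0; // guard black samples
    totalWeight += w;
    chroma += c * w;
  }
  // Guard the all-zero-weight case.
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0, 0];
}
```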

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Standard Material ParameterizationTime to shameless steal from Real-Time Rendering [Moumlller 08]

Standard Material Parameterization

- Give color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic)

albedo = color

specularColor = vec3(004)

else

albedo = vec3(00)

specularColor = color

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Less knobs help enforce physically plausible materials

- Significantly lighter g-buffer storage

- Less textures better download times

- What control did we lose

- Video of non-metallic materials sweeping through physically plausible range of

specular colors

- 002 to 005 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support

- Translucency (Skin Foliage Snow)

- Anisotropic Gloss (Brushed Metal Hair Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic Filtered Hybrid Materials (Car paints Sci Fi Materials)

Deferred Rendering

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance brdf projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0 i lt MAX_NUM_LIGHTS ++i)

- outgoing radiance += incoming radiance brdf projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer;

buffer.metallic = metallic;

buffer.color = color;

buffer.gloss = gloss;

buffer.normal = normalCameraSpace;

buffer.depth = depthViewSpace;

buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the render buffer: getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float
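The precision cliff above is easy to demonstrate outside the shader. A minimal plain-JavaScript sketch (illustrative, not part of the deck's GLSL) using Math.fround, which rounds a JS double to the nearest 32-bit float:

```javascript
// A 32-bit float represents every integer up to 2^24 exactly;
// above that the step size grows to 2, so packed values collide.
const MAX_EXACT_UINT_IN_FLOAT32 = Math.pow(2, 24); // 16777216

// 2^24 itself survives the round trip to float32...
const exact = Math.fround(MAX_EXACT_UINT_IN_FLOAT32);        // 16777216
// ...but 2^24 + 1 rounds back down: the safe packing domain ends here.
const collided = Math.fround(MAX_EXACT_UINT_IN_FLOAT32 + 1); // 16777216
```

This is why three 8-bit values (24 bits total) is the most that fits losslessly in one float component.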

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
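The same shift-via-multiply arithmetic can be mirrored in plain JavaScript for host-side checking. A sketch (function names follow the slides; this is not the shader itself):

```javascript
// Pack three 8-bit integers into one float-representable uint24,
// using only multiplies, divides, and floor — no bitwise operators,
// exactly as a GLSL shader would have to.
function uint8_8_8_to_uint24([r, g, b]) {
  return r * 256 * 256 + g * 256 + b; // shift left via multiplies
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / (256 * 256)); // shift right 16
  const temp = Math.floor(raw / 256);      // shift right 8
  const y = temp - x * 256;                // mask low 8 bits via subtract
  const z = raw - temp * 256;
  return [x, y, z];
}
```

The maximum packed value, 255*65536 + 255*256 + 255 = 16777215, is exactly the 2^24 - 1 limit from the previous slide.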

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode
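For reference, the octahedral mapping from [Cigolle 14] can be sketched on the host side like this (plain JavaScript, illustrative names; the deck's GLSL encode/decode is analogous):

```javascript
// sign() that never returns 0, so boundary folds stay consistent.
function signNotZero(v) { return v >= 0 ? 1 : -1; }

// Unit vector -> 2D point in [0,1]^2.
function octEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let px = x / l1, py = y / l1;       // project onto the octahedron
  if (z < 0) {                        // fold lower hemisphere outward
    const tx = (1 - Math.abs(py)) * signNotZero(px);
    const ty = (1 - Math.abs(px)) * signNotZero(py);
    px = tx; py = ty;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5]; // bias into 0..1 for storage
}

// 2D point in [0,1]^2 -> unit vector.
function octDecode([u, v]) {
  const fx = u * 2 - 1, fy = v * 2 - 1;
  let x = fx, y = fy;
  const z = 1 - Math.abs(fx) - Math.abs(fy);
  if (z < 0) {                        // unfold lower hemisphere
    x = (1 - Math.abs(fy)) * signNotZero(fx);
    y = (1 - Math.abs(fx)) * signNotZero(fy);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Note how the whole [0,1]^2 domain is used, which is exactly what makes the subsequent uint14 quantization efficient.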

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
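The YCoCg basis itself is a cheap linear transform. A plain-JavaScript sketch (illustrative; the deck's rgbToYcocg / YcocgToRgb GLSL helpers compute the same thing):

```javascript
// RGB -> YCoCg: Y carries luminance, Co/Cg carry chroma.
// Chroma lands in [-0.5, 0.5] for RGB in [0, 1], hence the bias at pack time.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse: the transform is lossless in float.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Grayscale inputs produce zero chroma, which is what makes subsampling Co/Cg in a checkerboard relatively benign.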

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: VelocityX 10 Bits, NormalX 14 Bits

B: VelocityY 10 Bits, NormalY 14 Bits

A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float: 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: NormalX 12 Bits, NormalY 12 Bits

B: Depth 31 Bits, Metallic 1 Bit

- RGB Float: 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving. RGB Float is deprecated in WebGL; could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)

G: NormalX 9 Bits (sign bit), Gloss 3 Bits

B: NormalY 9 Bits (sign bit), Gloss 3 Bits

A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float: 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit

G: NormalX 9 Bits (sign bit), Gloss 3 Bits

B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float: 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits

G: VelocityX 10 Bits, NormalX 14 Bits

B: VelocityY 10 Bits, NormalY 14 Bits

A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float: 128 bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctahedron = octahedronEncode(components.normal);
vec2 normalOctahedronQuantized;
normalOctahedronQuantized.x = normalizedFloat_to_uint14(normalOctahedron.x);
normalOctahedronQuantized.y = normalizedFloat_to_uint14(normalOctahedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
// quantized pixel velocity. -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctahedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctahedronQuantized.y));

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;
return res;

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
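The sign-bit trick is worth a tiny host-side sketch (plain JavaScript, illustrative names; here metallic is treated as a boolean, with positive depth assumed, as in the deck's early-out where depth <= 0 means infinity):

```javascript
// Fold a boolean into the sign of depth: view-space depth is strictly
// positive, so its sign bit is free storage.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth; // negate when non-metallic
}

function unpackDepthMetallic(w) {
  // abs() recovers depth; the sign recovers the flag, as sign() does in GLSL.
  return { depth: Math.abs(w), metallic: w > 0 ? 1 : 0 };
}
```

Decode is a single abs() plus a sign test, which is why depth is the cheapest channel to read back.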

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctahedron;
normalOctahedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctahedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octahedronDecode(normalOctahedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screenspace for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance! (Detail crops, repeated for four regions, each comparing: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
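Because Schlick's formula is linear in the reflection coefficient, the YC form above is exactly the RGB form pushed through the RGB-to-YCoCg transform. A plain-JavaScript sanity check of that equivalence (illustrative names; rgbToYco keeps only Y and one chroma component, mirroring what a YC buffer stores):

```javascript
// Componentwise RGB Schlick.
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map(c => (1 - c) * p + c);
}

// YC Schlick as on the slide: luminance unchanged, chroma decays toward 0.
function fresnelSchlickYC(vDotH, [fy, fc]) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - fy) * p + fy, fc * -p + fc];
}

// Y and Co of an RGB triple.
function rgbToYco([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}
```

Evaluating fresnelSchlickRgb and converting the result to YC gives the same numbers as running fresnelSchlickYC directly on the converted reflection coefficient.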

YC Lighting

- Write YC to RG components of render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model. http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function. http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sebastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Standard Material Parameterization

- Give the color parameter conditional meaning [Burley 12] [Karis 13]

if (metallic) {
  albedo = vec3(0.0);
  specularColor = color;
} else {
  albedo = color;
  specularColor = vec3(0.04);
}

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
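The mapping is simple enough to sketch on the CPU. A minimal JavaScript version of octahedral encode / decode (illustrative names; the deck's GLSL equivalents are octohedronEncode / octohedronDecode):

```javascript
// Illustrative octahedral normal encode/decode [Cigolle 14].
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

function octEncode(n) {
  // Project onto the octahedron (L1 normalization), fold the lower
  // hemisphere over, then remap [-1, 1]^2 to [0, 1]^2 so quantization
  // uses the full domain.
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx;
    y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

function octDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx;
    y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

In the G-Buffer path, the [0, 1] outputs would then be quantized to uint14 by normalizedFloat_to_uint14.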

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer.
  Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
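For reference, RGB to YCoCg and back is a linear, exactly invertible transform. A JavaScript sketch using the standard YCoCg definition (the deck's GLSL rgbToYcocg / YcocgToRgb presumably perform the same arithmetic):

```javascript
// Standard RGB <-> YCoCg transform pair (illustrative sketch).
function rgbToYcocg(c) {
  const [r, g, b] = c;
  return [
     0.25 * r + 0.5 * g + 0.25 * b,  // Y: luminance-like component
     0.5  * r - 0.5 * b,             // Co: orange vs blue, in [-0.5, 0.5]
    -0.25 * r + 0.5 * g - 0.25 * b   // Cg: green vs purple, in [-0.5, 0.5]
  ];
}

function ycocgToRgb(c) {
  const [y, co, cg] = c;
  return [y + co - cg, y + cg, y - co - cg];
}
```

Only Y plus one checkerboard-alternated chroma component is stored per pixel; the other chroma component is reconstructed from neighbors later.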

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
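The 10 + 14 bit pair packing used for the velocity / normal channels can be mirrored in JavaScript with the same multiply / floor arithmetic GLSL is restricted to. A hypothetical analogue of uint10_14_to_uint24 / uint24_to_uint10_14 (names mirror the slides, but this is an illustrative sketch, not Floored's source):

```javascript
// A 10-bit value in the high bits, a 14-bit value in the low bits,
// packed into a float-representable 24-bit integer without bitwise ops.
const SHIFT_LEFT_14 = 16384.0; // 2^14

function uint10_14_to_uint24(hi10, lo14) {
  // Max result: 1023 * 2^14 + 16383 = 2^24 - 1, still exact in a 32-bit float.
  return hi10 * SHIFT_LEFT_14 + lo14;
}

function uint24_to_uint10_14(packed) {
  const hi10 = Math.floor(packed / SHIFT_LEFT_14);
  const lo14 = packed - hi10 * SHIFT_LEFT_14;
  return [hi10, lo14];
}
```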

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float render target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen
  // space for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%

Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%

Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%

Enhance: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
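Because YCoCg is a linear transform of RGB, and pure white has zero chroma, the YC form above is algebraically identical to evaluating Schlick in RGB and transforming the result. A quick numeric check (helper names are hypothetical):

```javascript
// Verify: rgbToYc(fresnelRGB(...)) == fresnelYC(..., rgbToYc(f0)).
function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * p + c);
}

function fresnelSchlickYC(vDotH, f0yc) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0yc[0]) * p + f0yc[0], // luminance: lerps toward 1 at grazing
    f0yc[1] * -p + f0yc[1]         // chroma: decays toward 0 at grazing
  ];
}

// Y and Co of the standard YCoCg transform (the two components the YC
// pipeline keeps per pixel).
function rgbToYc(c) {
  return [0.25 * c[0] + 0.5 * c[1] + 0.25 * c[2], 0.5 * c[0] - 0.5 * c[2]];
}
```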

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
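A JavaScript port of the kernel makes its behavior easy to unit test on the CPU. Each sample is a [luminance, chroma] pair; the SENSITIVITY constant follows the assumption made in the listing above:

```javascript
// CPU-side sketch of the luminance-weighted chroma reconstruction.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const samples = [a1, a2, a3, a4];
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of samples) {
    // Weight neighbors by luminance similarity; zero out black samples.
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1.0 : 0.0; // step(1e-5, luminance)
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

A flat neighborhood should return the neighbors' common chroma unchanged; an all-black neighborhood should fall through to the zero guard.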

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Standard Material Parameterization

- Can throw out a whole vec3 parameter

- Fewer knobs help enforce physically plausible materials

- Significantly lighter G-Buffer storage

- Fewer textures, better download times

- What control did we lose?

- Video of non-metallic materials sweeping through the physically plausible range of specular colors

- 0.02 to 0.05 [Hoffman 10][Lagarde 11]

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered Hybrid Materials (Car paints, Sci-Fi materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay the cost of the worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to the G-Buffer

- For each light

- For each pixel inside the light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend add outgoing radiance to the render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read the G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In the vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In the fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
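The math of the fragment-shader velocity is easy to check in isolation: perspective-divide the current and previous clip-space positions, then difference the NDC xy (a sketch of the arithmetic, not the actual shader):

```javascript
// clipPos arguments are [x, y, z, w] clip-space positions.
function screenSpaceVelocity(clipPos, clipPosOld) {
  return [
    clipPos[0] / clipPos[3] - clipPosOld[0] / clipPosOld[3],
    clipPos[1] / clipPos[3] - clipPosOld[1] / clipPosOld[3]
  ];
}
```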

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the renderbuffer: support getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
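The 2^24 limit is easy to demonstrate in JavaScript, since Math.fround rounds a number to the nearest representable 32-bit float:

```javascript
// Every integer up to 2^24 is exactly representable in a 32-bit float
// (24-bit significand); past that, the step size doubles.
const a = Math.fround(16777215); // 2^24 - 1: exact
const b = Math.fround(16777216); // 2^24: exact (a power of two)
const c = Math.fround(16777217); // 2^24 + 1: rounds back to 2^24
```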

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
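The encode / decode pair ports directly to JavaScript with the same multiply / floor arithmetic, which is handy for CPU-side sanity checks of the exact arithmetic the shader performs:

```javascript
// JavaScript port of the 8 + 8 + 8 integer packing.
function uint8_8_8_to_uint24(v) {
  // v is [x, y, z], each an integer in 0..255.
  return v[0] * 65536.0 + v[1] * 256.0 + v[2];
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}
```

A strided sweep over the 0..2^24 domain mirrors the exhaustive 4k x 4k GPU test in spirit while staying fast.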

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;

    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
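The velocity quantization ports directly to the CPU for checking. A sketch, where SUB_PIXEL_PRECISION_STEPS = 4 is an assumed value (the deck doesn't state it):

```javascript
// Sketch of the -512..511 velocity quantization above.
// SUB_PIXEL_PRECISION_STEPS = 4 is assumed; -512/511 are reserved for out-of-range.
const SUB_PIXEL_PRECISION_STEPS = 4.0;

function quantizeVelocity(v, resolution) {
  // Screen space -1..1 velocity -> quantized sub-pixel steps, biased to 0..1023 for 10-bit storage.
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0;
}

function dequantizeVelocity(q, inverseResolution) {
  // Inverse of the above; matches the decode's inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS scale.
  return (q - 512.0) * inverseResolution * (2.0 / SUB_PIXEL_PRECISION_STEPS);
}
```

Round-trip error is bounded by one quantization step, 1 / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5).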

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space
  sampling shaders such as AO
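The sign trick is easy to sanity check outside GLSL. A sketch, assuming metallic is a boolean stored as ±1 on the depth value (depth <= 0 already means "infinity" in this scheme, so depth 0 is not a concern):

```javascript
// Sketch of the depth/metallic sign packing: the sign bit of the packed float
// carries the metallic flag, and abs() recovers view space depth.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed >= 0.0 };
}
```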

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
        // sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB->YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to their constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: four detail-crop comparison slides, each showing RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, and RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: chroma approaches zero at grazing incidence, where reflectance becomes achromatic

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
  ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
  and an ADD from the skipped 3rd component.
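Because sRGB→YCoCg is a linear transform, the YC fresnel is exactly the RGB fresnel expressed in the new basis (white maps to Y = 1, chroma = 0). A quick JS check, with rgbToYcocg written out for the comparison and an arbitrary example F0:

```javascript
// Verify fresnelSchlickYC == rgbToYcocg(fresnelSchlick) on the Y and Co components.
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b,   // Y
          0.5 * r - 0.5 * b,               // Co
          -0.25 * r + 0.5 * g - 0.25 * b]; // Cg
}

function fresnelSchlick(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * power + y, c * -power + c];
}
```

The luminance weights sum to 1 and white has zero chroma, which is why the two paths agree term by term.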

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to RG components of the render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of the render target

- Could write to an RGBA target and light 2 pixels at once (YCYC)

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
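A direct JS port of the reconstruction is handy for unit testing the weighting logic on the CPU (a sketch, mirroring the GLSL above):

```javascript
// Weights neighbor chroma by luminance similarity (exp2 falloff), guarding
// black samples and the all-zero-weight case, as in reconstructChromaHDR.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) weight = 0.0; // guard black samples
    totalWeight += weight;
    chromaSum += chroma * weight;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```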

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Standard Material Parameterization

- Our standard material does not support:

- Translucency (Skin, Foliage, Snow)

- Anisotropic Gloss (Brushed Metal, Hair, Fabrics)

- Layered Materials (Clear coat)

- Partially Metallic / Filtered / Hybrid Materials (Car paints, Sci-Fi Materials)

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA
  unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
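The same pack / unpack pair ports to JS for quick CPU-side checking; Math.fround here emulates float32 storage (a sketch mirroring the GLSL, not the deck's code):

```javascript
// CPU-side mirror of the uint8_8_8 <-> uint24 packing above. Math.fround
// rounds to float32, mimicking storage in a floating point render target;
// every integer up to 2^24 survives exactly.
function uint8_8_8_to_uint24([x, y, z]) {
  return Math.fround(x * 65536.0 + (y * 256.0 + z));
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}
```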

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, Decode, and Compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode
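The deck references octohedronEncode / octohedronDecode without listing them; a JS sketch of the standard octahedral mapping from [Cigolle 14] (names mirror the GLSL, shown before the 14-bit quantization step):

```javascript
// Octahedral unit-vector encoding [Cigolle 14]: project onto the octahedron
// |x|+|y|+|z| = 1, fold the lower hemisphere, and remap to the 0..1 square.
const signNotZero = v => (v >= 0.0 ? 1.0 : -1.0);

function octahedronEncode([x, y, z]) {
  const s = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let u = x / s, v = y / s;
  if (z < 0.0) {
    [u, v] = [(1.0 - Math.abs(v)) * signNotZero(u),
              (1.0 - Math.abs(u)) * signNotZero(v)];
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octahedronDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0, v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    [u, v] = [(1.0 - Math.abs(v)) * signNotZero(u),
              (1.0 - Math.abs(u)) * signNotZero(v)];
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

Before quantization the round trip is exact up to floating point; the discretization error then comes only from the 14-bit snap.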

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer.
  Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
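A sketch of the YCoCg round trip [Mavridis 12] in JS (coefficients are the standard ones; the deck's rgbToYcocg / YcocgToRgb helpers are assumed equivalent):

```javascript
// RGB <-> YCoCg: Y is a luminance-like term; Co/Cg are chroma offsets in -0.5..0.5.
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b,   // Y
          0.5 * r - 0.5 * b,               // Co
          -0.25 * r + 0.5 * g - 0.25 * b]; // Cg
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

The transform is linear and exactly invertible, which is what later lets the BRDF be evaluated directly in YC space.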


EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Deferred Rendering

Forward Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay the cost of the worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS is small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to the g-buffer

- For each light

- For each pixel inside the light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend / add outgoing radiance to the render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend / add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In the vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In the fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the render buffer: getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- Range: 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- Range: 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
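The 2^24 limit is easy to verify on the CPU: JavaScript can round a value through single precision with a Float32Array. A small sanity-check sketch (not part of the shader pipeline):

```javascript
// Round a number through IEEE 754 single precision.
function float32(x) {
  return new Float32Array([x])[0];
}

// Every integer up to 2^24 = 16777216 is exactly representable...
const exactLimit = Math.pow(2, 24);
const stillExact = float32(exactLimit - 1) === exactLimit - 1;

// ...but above 2^24 the step size doubles, so 2^24 + 1 collides with 2^24.
const collides = float32(exactLimit + 1) === exactLimit;
```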

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
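Because the packing is pure floating point arithmetic, the same functions port directly to JavaScript for CPU-side sanity checks. A sketch mirroring the GLSL above:

```javascript
// Pack three 8-bit integers into one float-representable 24-bit integer,
// shifting left via multiplies (no bitwise operators, as in the shader).
function uint8_8_8_to_uint24(r, g, b) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return r * SHIFT_LEFT_16 + (g * SHIFT_LEFT_8 + b);
}

// Unpack by shifting right with divisions and flooring.
function uint24_to_uint8_8_8(raw) {
  const SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const SHIFT_RIGHT_8 = 1.0 / 256.0;
  const SHIFT_LEFT_8 = 256.0;
  const x = Math.floor(raw * SHIFT_RIGHT_16);
  const temp = Math.floor(raw * SHIFT_RIGHT_8);
  return [x, -x * SHIFT_LEFT_8 + temp, -temp * SHIFT_LEFT_8 + raw];
}
```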

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform the normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
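A CPU-side sketch of an octahedral encode / decode pair in the spirit of [Cigolle 14]; helper names here are illustrative (the shader code later in the deck assumes octohedronEncode / octohedronDecode):

```javascript
// Map a sign to +1 / -1, treating +0 as positive (the usual signNotZero helper).
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

// Encode a unit vector into [0, 1]^2 octahedral coordinates.
function octEncode([x, y, z]) {
  const s = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let u = x / s;
  let v = y / s;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fu = (1.0 - Math.abs(v)) * signNotZero(u);
    const fv = (1.0 - Math.abs(u)) * signNotZero(v);
    u = fu;
    v = fv;
  }
  // Remap from [-1, 1] to [0, 1] for storage.
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

// Decode back to a unit vector.
function octDecode([eu, ev]) {
  const u = eu * 2.0 - 1.0;
  const v = ev * 2.0 - 1.0;
  let x = u;
  let y = v;
  let z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx;
    y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```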

Emission

- Don't pack emission. Forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
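The RGB <-> YCoCg transform pair is linear and exactly invertible. A JavaScript sketch of the math the rgbToYcocg / YcocgToRgb helpers are assumed to implement:

```javascript
// RGB -> YCoCg: one luminance and two chroma components.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y:  luminance
     0.5  * r            - 0.5  * b, // Co: orange-blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green-purple chroma
  ];
}

// Exact inverse of the transform above.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```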

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: VelocityX 10 bits | NormalX 14 bits

B: VelocityY 10 bits | NormalY 14 bits

A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. material type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: NormalX 12 bits | NormalY 12 bits

B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (sign bit)

G: NormalX 9 bits (sign bit) | Gloss 3 bits

B: NormalY 9 bits (sign bit) | Gloss 3 bits

A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64 bpp

- A half-float target is more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit

G: NormalX 9 bits (sign bit) | Gloss 3 bits

B: NormalY 9 bits (sign bit) | Gloss 3 bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: VelocityX 10 bits | NormalX 14 bits

B: VelocityY 10 bits | NormalY 14 bits

A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma, and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
vec3 colorYcocg = rgbToYcocg(components.color);

vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;

res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);

vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
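The velocity quantization can be mirrored on the CPU for testing. A sketch that assumes SUB_PIXEL_PRECISION_STEPS = 4 (the deck does not pin down the constant) and uses the exact inverse scale for decode:

```javascript
// Assumed quarter-pixel precision; adjust to match the shader's constant.
const SUB_PIXEL_PRECISION_STEPS = 4.0;

// Screen space -1..1 velocity -> biased 10-bit value (0..1023).
// After the bias, 0 and 1023 are the clamped out-of-range ends.
function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0;
}

// Exact inverse of the scale used above (up to the floor / clamp).
function dequantizeVelocity(q, resolution) {
  return (q - 512.0) / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```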

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;

return res;

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
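The sign() trick in isolation, as a CPU sketch; the shader stores metallic as ±1, here it is a boolean for clarity:

```javascript
// Depth is stored strictly positive (0 is reserved for "no geometry"),
// which leaves the sign free to carry the metallic flag.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```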

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)

gBufferComponents res;

vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screen space for culling in future passes: sqrt(2) + 1e-3.
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);

colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in non-linear space to distribute precision perceptually

// Color is stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(ycocgToRgb(colorYcocg));

return res;

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered with:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
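A quick CPU check of the claims above: the luminance lane matches scalar Schlick exactly, and the chroma lane falls to zero at grazing angles. A JavaScript sketch of the two functions:

```javascript
// Scalar Schlick Fresnel: rises from f0 toward 1 at grazing angles.
function fresnelSchlick(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return (1.0 - f0) * power + f0;
}

// YC variant: luminance lane identical to scalar Schlick,
// chroma lane inverted so it decays toward 0 at grazing angles.
function fresnelSchlickYC(vDotH, f0Y, f0C) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - f0Y) * power + f0Y,
    f0C * -power + f0C,
  ];
}
```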

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);

  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Forward Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- For each light

- outgoing radiance += incoming radiance * brdf * projected area

- Remap outgoing radiance to perceptual display domain

- Tonemap

- Gamma / Color Space Conversion

Forward Pipeline Cons- Challenging to effectively cull lights

- Typically pay cost of worst case

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

- outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
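The same math is easy to sanity-check off-GPU. A minimal JS sketch (hypothetical helper name; takes [x, y, z, w] clip-space positions produced by the current and previous model-view-projection matrices):

```javascript
// Screen-space velocity is the difference of the perspective-divided positions
// from the current and previous transforms, exactly as in the fragment shader.
function screenSpaceVelocity(clipNow, clipOld) {
  const nowX = clipNow[0] / clipNow[3], nowY = clipNow[1] / clipNow[3];
  const oldX = clipOld[0] / clipOld[3], oldY = clipOld[1] / clipOld[3];
  // Result is in NDC units; scale by 0.5 * resolution to get pixels.
  return [nowX - oldX, nowY - oldY];
}
```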

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA

unsigned byte texture. This isn't going to cut it

- What extensions can we pull in

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048
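These limits follow from the 24-bit significand of an IEEE 754 float32, and can be verified from the host side with `Math.fround`, which rounds a JS number to the nearest representable 32-bit float:

```javascript
// Demonstrate where 32-bit float integer precision runs out.
const MAX_EXACT = Math.pow(2, 24); // 16777216

// Every integer up to 2^24 survives the float32 round trip...
console.log(Math.fround(MAX_EXACT - 1) === MAX_EXACT - 1); // true

// ...but above 2^24 the step size doubles: 2^24 + 1 is not representable.
console.log(Math.fround(MAX_EXACT + 1) === MAX_EXACT + 1); // false
console.log(Math.fround(MAX_EXACT + 1) === MAX_EXACT);     // true (rounds to nearest)
```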

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
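A JS mirror of the same arithmetic (hypothetical helper names) is handy for checking the shift math off-GPU before committing it to a shader:

```javascript
// Pack three 8-bit integers into one 24-bit integer that float32 can hold
// exactly, using the same multiply/divide "shifts" as the GLSL above.
function pack888(r, g, b) {
  return r * 65536 + g * 256 + b;
}

function unpack888(packed) {
  const r = Math.floor(packed / 65536);
  const temp = Math.floor(packed / 256);
  const g = -r * 256 + temp;      // same multiply-add shape as the shader
  const b = -temp * 256 + packed;
  return [r, g, b];
}
```

For example, `unpack888(pack888(12, 34, 56))` recovers `[12, 34, 56]`, and the maximum value `pack888(255, 255, 255)` is 16777215, exactly the float32 limit noted above.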

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color
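The same exhaustive strategy can also run on the CPU as a quick pre-check (JS sketch with hypothetical helper names mirroring the GLSL; on the GPU the equivalent loop is a 4096 x 4096 fullscreen pass):

```javascript
// Exhaustive CPU-side version of the GPU unit test: give every 24-bit ID a
// pack/unpack round trip and count any value that does not survive.
function pack888(r, g, b) { return r * 65536 + g * 256 + b; }
function unpack888(p) {
  const r = Math.floor(p / 65536);
  const t = Math.floor(p / 256);
  return [r, t - r * 256, p - t * 256];
}

let failures = 0;
for (let id = 0; id < (1 << 24); ++id) {
  const [r, g, b] = unpack888(id);
  if (pack888(r, g, b) !== id) ++failures;
}
console.log(failures); // 0
```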

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL; not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write / read from textures in between pack / unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL; not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in webGL; not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode
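A CPU-side sketch of the octahedral mapping from [Cigolle 14] (hypothetical helper names; the GLSL versions are vectorized but follow the same fold):

```javascript
// Octahedral normal encoding: project the unit sphere onto an octahedron,
// fold the lower hemisphere over the upper one, and store 2D in [0, 1]^2.
const sgn = (v) => (v >= 0 ? 1 : -1); // sgn(0) = 1 keeps the fold invertible

function octEncode([x, y, z]) {
  const inv = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let px = x * inv, py = y * inv;
  if (z < 0) { // fold lower hemisphere
    const fx = (1 - Math.abs(py)) * sgn(px);
    const fy = (1 - Math.abs(px)) * sgn(py);
    px = fx; py = fy;
  }
  return [px * 0.5 + 0.5, py * 0.5 + 0.5]; // remap to [0, 1] for storage
}

function octDecode([u, v]) {
  const px = u * 2 - 1, py = v * 2 - 1;
  let x = px, y = py, z = 1 - Math.abs(px) - Math.abs(py);
  if (z < 0) { // unfold
    const fx = (1 - Math.abs(y)) * sgn(x);
    const fy = (1 - Math.abs(x)) * sgn(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Without quantization the round trip is exact up to float error; the precision cost only appears once the two components are discretized to the bit budget in the G-Buffer formats below.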

Emission

- Don't pack emission. Forward render it instead

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer.

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
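The RGB to YCoCg transform itself is a cheap linear change of basis. A JS sketch (hypothetical helper names mirroring the shader's rgbToYcocg / YcocgToRgb):

```javascript
// RGB <-> YCoCg: Y carries luminance; Co/Cg carry chroma and stay within
// [-0.5, 0.5] for inputs in [0, 1], which is why the shader adds a chroma bias
// before storing them in an unsigned channel.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

The transform is exactly invertible, so checkerboarding (dropping one chroma component per pixel) is the only lossy step in the color path.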

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: NormalX 12 Bits | NormalY 12 Bits

B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving. RGB Float is deprecated in

webGL; could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)

G: NormalX 9 Bits (sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit

G: NormalX 9 Bits (sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space

sampling shaders such as AO
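The sign-bit trick is worth spelling out on its own. A minimal JS sketch (hypothetical helper names; in the shader, metallic is stored as +1.0 / -1.0 so a single multiply does the packing):

```javascript
// A strictly positive depth value can carry one boolean in its sign bit,
// recovered with abs() / a sign test on decode. Depth 0.0 is reserved for
// "sampling infinity", so the sign stays unambiguous for real surfaces.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth; // metallic flag lives in the sign
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed >= 0 };
}
```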

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {

  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored in sRGB->YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

[Four "Enhance" detail-comparison slides, each showing crops of: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%]

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from the Luminous Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted; approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an

ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD

and ADD from the skipped 3rd component
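The behavior is easy to verify numerically. A scalar JS sketch of the YC variant (hypothetical helper name, taking the Y and chroma reflection coefficients as separate scalars):

```javascript
// YC Fresnel: luminance follows the standard Schlick curve toward 1.0 at
// grazing angles, while chroma fades toward 0 (the grazing highlight is white).
function fresnelSchlickYC(vDotH, rcY, rcC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [
    (1.0 - rcY) * power + rcY, // luminance: standard Schlick
    rcC * -power + rcC,        // chroma: scales toward 0 as power -> 1
  ];
}
```

At normal incidence (vDotH = 1, power = 0) this returns the base reflectance [rcY, rcC]; at grazing incidence (vDotH = 0, power = 1) it returns [1, 0].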

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
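A scalar JS sketch of the same idea (hypothetical helper name; samples are [Y, C] pairs) makes the weighting behavior easy to inspect: neighbors whose luminance is close to the center's dominate the average, black samples are rejected, and wildly different luminances contribute almost nothing.

```javascript
// Luminance-similarity weighted average of neighbor chroma, mirroring the
// GLSL reconstructChromaHDR above.
function reconstructChroma(center, neighbors) { // center, neighbors: [Y, C]
  const SENSITIVITY = 25.0;
  let total = 0, sum = 0;
  for (const [y, c] of neighbors) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(y - center[0]));
    if (y < 1e-5) w = 0; // guard black samples
    total += w;
    sum += c * w;
  }
  return total > 1e-5 ? [center[1], sum / total] : [0, 0];
}
```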

Thanks for listening

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars

Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com. 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf. Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/. Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf. Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf. Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/. Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf. J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home. Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/. Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html. Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3. Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3. Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf. Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading. Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/. Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx. Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Shadow Masking Function. http://jcgt.org/published/0003/02/03/paper.pdf. Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf. Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/. Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf. Michael Oren, Shree K. Nayar, 1994

Forward Pipeline Cons

- Challenging to effectively cull lights

- Typically pay cost of worst case:

- for (int i = 0; i < MAX_NUM_LIGHTS; ++i)

-   outgoing radiance += incoming radiance * brdf * projected area

- MAX_NUM_LIGHTS small due to MAX_FRAGMENT_UNIFORM_VECTORS

Deferred Pipeline Overview

- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to G-Buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance * brdf * projected area

- Blend: add outgoing radiance to render target

Deferred Pipeline Cons

- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend: add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready! Now we just need to write it out...

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the renderbuffer: getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- Range: 0 to 16777215

- 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- Range: 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
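The same shift-by-multiply arithmetic is easy to check on the CPU. A minimal JavaScript sketch (helper names mirror the GLSL above but are illustrative, not Floored's actual code), relying on the fact that JS doubles, like GPU 32-bit floats, represent every integer in the uint24 range exactly:

```javascript
// CPU-side sketch of the uint8_8_8 <-> uint24 packing arithmetic.
// Illustrative port of the GLSL above; not production code.

function normalizedFloatToUint8(raw) {
  return Math.floor(raw * 255.0);
}

function uint888ToUint24(x, y, z) {
  const SHIFT_LEFT_16 = 256.0 * 256.0;
  const SHIFT_LEFT_8 = 256.0;
  return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z);
}

function uint24ToUint888(raw) {
  const SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const SHIFT_RIGHT_8 = 1.0 / 256.0;
  const SHIFT_LEFT_8 = 256.0;
  const x = Math.floor(raw * SHIFT_RIGHT_16);
  const temp = Math.floor(raw * SHIFT_RIGHT_8);
  const y = -x * SHIFT_LEFT_8 + temp;
  const z = -temp * SHIFT_LEFT_8 + raw;
  return [x, y, z];
}

// Round-trip a color: normalized floats -> bytes -> one float -> bytes.
const bytes = [0.2, 0.5, 0.8].map(normalizedFloatToUint8);
const packed = uint888ToUint24(bytes[0], bytes[1], bytes[2]);
const unpacked = uint24ToUint888(packed);
```

Three 8-bit fields total 24 bits, which is exactly the integer range a 32-bit float can represent without collisions.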

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color
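The same scheme can be mirrored on the CPU for quick iteration. An illustrative JavaScript sketch (hypothetical helper names) that walks IDs through the uint24 domain with a stride to keep runtime short; on the GPU, the 4096 x 4096 target covers every value exhaustively:

```javascript
// CPU analogue of the GPU packing unit test: pack each ID, unpack it,
// and compare. Illustrative sketch, not Floored's actual test harness.

function uint888ToUint24(x, y, z) {
  return x * 65536.0 + y * 256.0 + z;
}

function uint24ToUint888(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, temp - x * 256.0, raw - temp * 256.0];
}

function testDomain(stride) {
  const MAX_UINT24 = 16777215; // 2^24 - 1
  for (let id = 0; id <= MAX_UINT24; id += stride) {
    const [x, y, z] = uint24ToUint888(id);
    if (uint888ToUint24(x, y, z) !== id) return id; // first failing ID
  }
  return -1; // all sampled IDs round-tripped
}
```

A stride of 1 reproduces the exhaustive GPU test; a large prime stride samples the domain quickly while still hitting varied byte patterns.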

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
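For illustration, the octahedral mapping from [Cigolle 14] can be sketched on the CPU. A hedged JavaScript version (hypothetical names; the deck's shader-side equivalents would be the octohedronEncode / octohedronDecode used in the packing code):

```javascript
// Octahedral normal encoding sketch: project the unit vector onto the
// octahedron, fold the lower hemisphere over the diagonals, and bias
// the result into [0, 1] x [0, 1]. Illustrative, not production code.

function octahedronEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let px = n[0] * invL1;
  let py = n[1] * invL1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1.0 - Math.abs(py)) * (px >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(px)) * (py >= 0.0 ? 1.0 : -1.0);
    px = fx; py = fy;
  }
  // Bias -1..1 into the full 0..1 domain.
  return [px * 0.5 + 0.5, py * 0.5 + 0.5];
}

function octahedronDecode(e) {
  const px = e[0] * 2.0 - 1.0;
  const py = e[1] * 2.0 - 1.0;
  const nz = 1.0 - Math.abs(px) - Math.abs(py);
  let nx = px, ny = py;
  if (nz < 0.0) {
    // Unfold the lower hemisphere.
    nx = (1.0 - Math.abs(py)) * (px >= 0.0 ? 1.0 : -1.0);
    ny = (1.0 - Math.abs(px)) * (py >= 0.0 ? 1.0 : -1.0);
  }
  const len = Math.hypot(nx, ny, nz);
  return [nx / len, ny / len, nz / len];
}
```

Both directions are a handful of adds, multiplies, and abs calls, which is what makes the encoding attractive for per-pixel G-Buffer traffic.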

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
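As a concrete reference for the transform itself, here is an illustrative JavaScript sketch of RGB ↔ YCoCg using the standard coefficients (not Floored's exact shader code): Y carries luminance, while Co / Cg are the two chroma axes that get checkerboard-interlaced at half frequency.

```javascript
// Standard RGB <-> YCoCg transform. The forward transform is linear,
// and the inverse recovers RGB exactly (up to floating point rounding).

function rgbToYcocg(r, g, b) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,  // Y  (luminance)
    r * 0.5 - b * 0.5,              // Co (orange-blue chroma)
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg (green-purple chroma)
  ];
}

function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that white maps to Y = 1 with zero chroma, which is what makes chroma subsampling (and the YC lighting trick later in the deck) behave well.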

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit), Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit), Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
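The sign() trick is easy to show in isolation. An illustrative JavaScript sketch (hypothetical names, assuming view-space depth is strictly positive for shaded pixels and 0 marks infinity, as in the deck):

```javascript
// Metallic rides in the sign bit of depth: encode by multiplying by
// +1 (metallic) or -1 (non-metallic); decode with abs() and sign().
// Illustrative sketch, not Floored's actual code.

function encodeDepthMetallic(depthViewSpace, metallic) {
  return depthViewSpace * (metallic ? 1.0 : -1.0);
}

function decodeDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: Math.sign(w) };
}
```

Because the decode is a single abs(), shaders that only need depth (AO, ray marching) pay almost nothing for the piggybacked metallic flag.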

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting into an RGB Float render target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for our microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look...

Enhance! (four detail crops, each comparing) RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
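The YC form is exact, not an approximation: the YCoCg transform is linear, Schlick's blend interpolates F0 toward white, and white has zero chroma. A small JavaScript check (illustrative names, standard YCoCg coefficients assumed) confirms that converting the RGB Schlick result gives the same Y and chroma as evaluating the YC form directly:

```javascript
// Verify: rgbToYcocg(schlickRGB(v, F0)) == schlickYC(v, Y0, C0)
// for the luminance and Co chroma channels. Illustrative sketch.

function rgbToYcocg(c) {
  return [
    0.25 * c[0] + 0.5 * c[1] + 0.25 * c[2],  // Y
    0.5 * c[0] - 0.5 * c[2],                 // Co
    -0.25 * c[0] + 0.5 * c[1] - 0.25 * c[2], // Cg
  ];
}

function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

function fresnelSchlickYC(vDotH, y0, c0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  // Luminance interpolates toward 1; chroma decays toward 0.
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
}

// Compare both paths for a gold-ish F0 at a mid angle.
const f0 = [1.0, 0.71, 0.29];
const [y0, co0] = rgbToYcocg(f0);
const viaRgb = rgbToYcocg(fresnelSchlickRgb(0.5, f0));
const viaYC = fresnelSchlickYC(0.5, y0, co0);
// viaRgb[0] ~= viaYC[0], viaRgb[1] ~= viaYC[1]
```

At grazing (vDotH = 0) both paths hit white: Y = 1, chroma = 0, matching the "approaches zero at perpendicular" behavior above.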

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);

  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
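The same reconstruction is easy to exercise on the CPU. An illustrative JavaScript port (same constants; [luminance, chroma] pairs stand in for the vec2s) shows the behavior: chroma averages across similar-luminance neighbors but is rejected across strong luminance edges.

```javascript
// CPU port of the luminance-weighted chroma reconstruction above.
// Illustrative sketch, not Floored's actual code.

function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let weightedChroma = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    // Weight falls off exponentially with luminance difference.
    let weight = 2 ** (-SENSITIVITY * Math.abs(luma - center[0]));
    // Guard the case where the sample is black (step(1e-5, luminance)).
    weight *= luma >= 1e-5 ? 1.0 : 0.0;
    weightedChroma += chroma * weight;
    totalWeight += weight;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5
    ? [center[1], weightedChroma / totalWeight]
    : [0.0, 0.0];
}
```

With SENSITIVITY = 25, a neighbor one full stop of luminance away already contributes almost nothing, which is what keeps chroma from bleeding across bright edges.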

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong, and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Deferred Pipeline Overview- For each model

- For each primitive

- For each vertex

- Transform vertex by modelViewProjectionMatrix

- For each pixel

- Write geometric and material data to g-buffer

- For each light

- For each pixel inside light volume

- Read geometric and material data from texture

- outgoing radiance = incoming radiance brdf projected area

- Blend Add outgoing radiance to render target

Deferred Pipeline Cons- Heavy on read bandwidth

- Read G-Buffer for each light source

- Heavy on write bandwidth

- Blend add outgoing radiance for each light source

- Material parameterization limited by G-Buffer storage

- Challenging to support non-standard materials

G-Buffer

G-Buffer- Parameters What data do we need to execute shading

- Rasterization How do we access these parameters

- Storage How do we store these parameters

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode
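A CPU sketch of the octahedral mapping (assumption: the standard [Cigolle 14] formulation; the deck's own octohedronEncode / octohedronDecode helpers may differ in details). The unit sphere is projected onto an octahedron, and the lower hemisphere is unfolded into the corners of the unit square:

```javascript
// Octahedral normal encode/decode sketch, per [Cigolle 14].
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

// Unit vector -> [0,1]^2
function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) {
    // Unfold the lower hemisphere over the diagonals.
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // uses the full 0..1 domain
}

// [0,1]^2 -> unit vector
function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```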

Emission

- Don't pack emission. Forward render it instead.

- Avoids another vec3 in the G-Buffer

- Emission only needs to be accessed when adding to the light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
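The RGB-to-YCoCg transform itself is a cheap linear change of basis, and its inverse needs only adds and subtracts. A sketch (JS, using the standard [Waveren 07] / [Mavridis 12] definition; chroma lands in -0.5..0.5 for inputs in 0..1, which is why the packed chroma channel needs the bias shown later):

```javascript
// RGB -> YCoCg: Y is a luminance-like term, Co/Cg are chroma offsets.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse: adds and subtracts only.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```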

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- e.g. material type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: NormalX 12 Bits | NormalY 12 Bits

B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)

G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp

- A half-float target is more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit

G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;

    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
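The velocity quantization can be exercised on the CPU too. A sketch in pixel units (assumptions: SUB_PIXEL_PRECISION_STEPS = 4, i.e. quarter-pixel steps, and the ±512 "infinity" sentinel convention from the snippet above; the real shader additionally converts between NDC and pixel units):

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4; // assumed quarter-pixel precision

// Pixel-space velocity -> biased 10-bit integer (0..1023).
function quantizeVelocity(pixels) {
  let q = pixels * SUB_PIXEL_PRECISION_STEPS;
  q = Math.floor(Math.min(Math.max(q, -512), 511));
  return q + 512;
}

// Biased 10-bit integer -> pixel-space velocity. |q - 512| > 510 is the
// "unrepresentable" sentinel, reported here as Infinity.
function dequantizeVelocity(q) {
  const centered = q - 512;
  if (Math.abs(centered) > 510) return Infinity;
  return centered / SUB_PIXEL_PRECISION_STEPS;
}
```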

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;

    return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen-space sampling shaders such as AO
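The depth/metallic trick in one line: view-space depth is strictly positive wherever geometry was rendered (0 is reserved for "infinity"), so its sign bit is free to carry a boolean. A sketch:

```javascript
// Fold a boolean into the sign of a positive depth value.
function packDepthMetallic(depth, metallic) {
  // depth > 0 assumed; 0 is reserved for "infinity" / no geometry.
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```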

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;

    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
        // sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color is stored in non-linear space to distribute precision perceptually

    // Color is stored as sRGB -> YCoCg. Return it as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

    return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting - YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
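One reason this works out exactly: YCoCg is a linear transform, the chroma of white is 0, and the luma of white is 1, so Schlick's lerp toward white commutes with the basis change. A quick numerical check (JS sketch; rgbToYcocg uses the standard definition, which is an assumption about the deck's helper):

```javascript
// Y and Co only; Cg behaves identically since chroma(white) = 0.
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}

// Standard per-channel Schlick.
function fresnelSchlickRgb(vDotH, r0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return r0.map(c => (1.0 - c) * power + c);
}

// YC variant: luma lerps toward 1, chroma decays toward 0.
function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * power + y, c * -power + c];
}

const r0 = [1.0, 0.71, 0.29]; // gold-ish F0
const viaRgb = rgbToYcocg(fresnelSchlickRgb(0.5, r0));
const direct = fresnelSchlickYC(0.5, rgbToYcocg(r0));
console.log(viaRgb, direct); // identical up to float rounding
```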

YC Lighting - Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting - Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
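A CPU port of the function above makes the weighting behavior easy to inspect: neighbors whose luminance differs from the center rapidly lose influence (the exp2 falloff), so chroma does not bleed across bright edges. Sketch (a straight translation, with center and neighbors as [luma, chroma] pairs):

```javascript
// Luminance-weighted chroma reconstruction, ported from the GLSL above.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard black samples
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are ~0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```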

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Tomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994


Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing: Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;

    return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
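The depth / metallic trick above is plain sign packing: one float channel carries a positive depth and a boolean. A minimal Python sketch of the same idea (function names are mine, not the talk's):

```python
def pack_depth_metallic(depth, metallic):
    # Store metallic in the sign: positive depth means metallic,
    # negative means dielectric. depth must be strictly positive,
    # since 0.0 is reserved as the "infinity / no surface" sentinel.
    return depth if metallic else -depth

def unpack_depth_metallic(packed):
    # abs() recovers depth, the sign recovers the flag, mirroring the
    # abs() / sign() pair the decode shader uses.
    return abs(packed), packed > 0.0
```

Decode stays a single abs(), which is why depth is the cheapest channel to recover for ray marching passes such as AO.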

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of the representable range, throw it outside of
        // screen space for culling in future passes: sqrt(2) + 1e-3.
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer: RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color is stored in a non-linear space to distribute precision perceptually

    // Color is stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

    return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting: RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting: YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
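Because YCoCg is a linear transform with Y(1,1,1) = 1 and Co(1,1,1) = Cg(1,1,1) = 0, the YC form above is exactly Schlick evaluated in RGB and then transformed: luminance lerps toward 1 while chroma decays toward 0 at grazing angles. A quick numeric check in Python (helper names are mine, illustrative only):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg forward transform.
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, y0, c0):
    # Luminance is the usual Schlick lerp toward white;
    # the chroma term is inverted and approaches zero at grazing.
    p = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - y0) * p + y0, c0 * -p + c0)

f0 = (1.0, 0.71, 0.29)  # gold-ish reflectance, purely illustrative
y0, co0, cg0 = rgb_to_ycocg(*f0)
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    y_rgb, co_rgb, cg_rgb = rgb_to_ycocg(*fresnel_schlick_rgb(v_dot_h, f0))
    y_yc, co_yc = fresnel_schlick_yc(v_dot_h, y0, co0)
    assert abs(y_rgb - y_yc) < 1e-12 and abs(co_rgb - co_yc) < 1e-12
```

The agreement holds for any f0 because the chroma basis vectors have zero-sum weights.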

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);

    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
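The same filter is easy to sanity test on the CPU. A Python transliteration (names are mine; the sensitivity constant mirrors the slide's):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center = (luma, chroma); neighbors = four (luma, chroma) cross taps.
    # Returns (center chroma, reconstructed missing chroma).
    center_luma, center_chroma = center
    total_weight = 0.0
    chroma_sum = 0.0
    for luma, chroma in neighbors:
        # Weight each tap by luminance similarity...
        weight = 2.0 ** (-sensitivity * abs(luma - center_luma))
        # ...and guard the case where the sample is black.
        if luma < 1e-5:
            weight = 0.0
        total_weight += weight
        chroma_sum += weight * chroma
    # Guard the case where all weights are 0.
    if total_weight <= 1e-5:
        return (0.0, 0.0)
    return (center_chroma, chroma_sum / total_weight)
```

With equal-luminance neighbors it degenerates to a plain average; with all-black neighbors it falls through to (0, 0).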

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions?

nick@floored.com
@pastasfuture

Resources

- [WebGLStats] WebGL Stats. http://webglstats.com, 2014
- [Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
- [Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
- [Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
- [Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
- [Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013
- [Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
- [Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
- [Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
- [Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
- [Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
- [Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches, Naty Hoffman, 2009
- [Shishkovtsov 05] Deferred Shading in STALKER. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
- [Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
- [Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
- [Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
- [Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
- [Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011
- [Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
- [Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
- [Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
- [Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
- [Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
- [Walter 07] Microfacet Models for Refraction through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
- [Heitz 14] Understanding the Masking-Shadowing Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
- [Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
- [Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
- [Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

G-Buffer

G-Buffer

- Parameters: What data do we need to execute shading?
- Rasterization: How do we access these parameters?
- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
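Numerically, the fragment-shader step is just two perspective divides and a subtraction. A toy Python version (clip-space inputs assumed already transformed by the current and previous frame's MVP):

```python
def screen_space_velocity(clip_now, clip_old):
    # clip_* are (x, y, z, w) clip-space positions of the same vertex
    # under this frame's and last frame's model-view-projection.
    now_x, now_y = clip_now[0] / clip_now[3], clip_now[1] / clip_now[3]
    old_x, old_y = clip_old[0] / clip_old[3], clip_old[1] / clip_old[3]
    # NDC-space (-1..1) velocity, as consumed by the packing code later on.
    return (now_x - old_x, now_y - old_y)
```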

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- ...and after skipping some tangential details...

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets: not well supported

Challenges: Storage

- Reading from render buffer depth: getting better

Challenges: Storage

- Texture float support: quite good

Challenges: Storage

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- Range: 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- Range: 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
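The 2^24 limit is easy to confirm by round-tripping through a real IEEE 754 binary32 with Python's struct module:

```python
import struct

def to_f32(x):
    # Round-trip a Python float (binary64) through a binary32.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Every integer up to 2^24 survives exactly...
assert to_f32(16777215.0) == 16777215.0   # 2^24 - 1
assert to_f32(16777216.0) == 16777216.0   # 2^24

# ...but above 2^24 the step size becomes 2, so odd integers collapse.
assert to_f32(16777217.0) == 16777216.0   # 2^24 + 1 rounds away
```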

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
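The same shift-by-multiply packing can be exercised off-GPU. A Python port of the two functions above (exact for every uint24, since doubles cover that range with room to spare):

```python
import math

def uint8_8_8_to_uint24(x, y, z):
    # "Shift left" with multiplies: equivalent to (x << 16) | (y << 8) | z.
    return x * 65536.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # "Shift right" with divisions plus floor, as in the GLSL decode.
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)

# Spot-check values across the domain round-trip exactly.
for triple in ((0, 0, 0), (18, 52, 86), (255, 255, 255), (1, 0, 255)):
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*triple)) == triple
```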

Unit Testing

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- A single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
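A CPU sketch of the octahedral mapping described in [Cigolle 14] (my own port for illustration, not Floored's shader code):

```python
import math

def _sign(v):
    # sign() that maps 0 to +1, as octahedral mappings conventionally do.
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # Project the unit normal onto the octahedron, fold the lower
    # hemisphere over, and remap from -1..1 to the full 0..1 domain.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    return (px * 0.5 + 0.5, py * 0.5 + 0.5)

def octahedron_decode(e):
    fx, fy = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(fx) - abs(fy)
    if z < 0.0:
        fx, fy = (1.0 - abs(fy)) * _sign(fx), (1.0 - abs(fx)) * _sign(fy)
    length = math.sqrt(fx * fx + fy * fy + z * z)
    return (fx / length, fy / length, z / length)

# Round trip normals from both hemispheres.
for n in ((0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.6, 0.0, 0.8)):
    decoded = octahedron_decode(octahedron_encode(n))
    assert all(abs(a - b) < 1e-9 for a, b in zip(n, decoded))
```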

Emission

- Don't pack emission: forward render it
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
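The YCoCg transform pair and the checkerboard layout can be sketched as follows (a Python illustration; helper names are mine, the transform itself is the standard one used by [Mavridis 12]):

```python
def rgb_to_ycocg(r, g, b):
    y  = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return (y, co, cg)

def ycocg_to_rgb(y, co, cg):
    return (y + co - cg, y + cg, y - co - cg)

def checkerboard_chroma(co, cg, px, py):
    # Each pixel keeps luma plus ONE chroma component, alternating in a
    # checkerboard; the other component is reconstructed from neighbors.
    return co if (px + py) % 2 == 0 else cg

# The transform is exactly invertible.
rgb = (0.2, 0.5, 0.7)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```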

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump / 16-bit float is preferable


- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

G-Buffer

- Parameters: What data do we need to execute shading?

- Rasterization: How do we access these parameters?

- Storage: How do we store these parameters?

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the renderbuffer: getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- Range: 0 to 16,777,215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- Range: 0 to 2,048

- Example: pack three 8-bit integer values into a 32-bit float
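The 2^24 claim is easy to check on the CPU. A small Python sketch (illustrative, not part of the deck) round-trips values through an IEEE 754 32-bit float using the standard `struct` module:

```python
import struct

def to_float32(x):
    # Round-trip a Python float through an IEEE 754 32-bit float.
    return struct.unpack('<f', struct.pack('<f', x))[0]

# Every integer up to 2^24 is exactly representable...
assert to_float32(16777215.0) == 16777215.0  # 2^24 - 1: exact
assert to_float32(16777216.0) == 16777216.0  # 2^24: exact

# ...but above 2^24 the step size doubles, so odd integers collapse.
assert to_float32(16777217.0) == 16777216.0  # 2^24 + 1 rounds to 2^24
```

This is exactly why the packing schemes below stay within a 24-bit integer budget per 32-bit float channel.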

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
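The same shift-by-multiply arithmetic ports directly to the CPU, which makes the round trip easy to sanity check. A Python sketch (function names mirror the slides; this is a test harness, not shader code):

```python
import math

def uint8_8_8_to_uint24(x, y, z):
    # Shift left by 16 / 8 bits via multiplies, as in the shader.
    return x * 65536.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # Shift right via divisions plus floor, as in the shader.
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return x, y, z

# Exhaustive testing would be 2^24 iterations; spot-check the boundaries.
for value in (0.0, 255.0, 256.0, 65535.0, 12345678.0, 16777215.0):
    assert uint8_8_8_to_uint24(*uint24_to_uint8_8_8(value)) == value
```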

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- A single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
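The slides don't show the octahedral functions themselves. A common formulation of the mapping from [Cigolle 14], sketched in Python (the deck's GLSL `octohedronEncode`/`octohedronDecode` may differ in details):

```python
def octahedron_encode(n):
    # n: normalized (x, y, z). Project onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere outward, remap from [-1, 1] to [0, 1].
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    return (px * 0.5 + 0.5, py * 0.5 + 0.5)

def octahedron_decode(e):
    # Inverse: remap to [-1, 1], unfold the lower hemisphere, renormalize.
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:
        u, v = ((1.0 - abs(v)) * (1.0 if u >= 0.0 else -1.0),
                (1.0 - abs(u)) * (1.0 if v >= 0.0 else -1.0))
    length = (u * u + v * v + z * z) ** 0.5
    return (u / length, v / length, z / length)

# Round-trip a few unit vectors, including the folded lower hemisphere.
for n in [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (1.0, 0.0, 0.0),
          (0.6, 0.48, 0.64), (0.6, 0.48, -0.64)]:
    d = octahedron_decode(octahedron_encode(n))
    assert max(abs(a - b) for a, b in zip(n, d)) < 1e-6
```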

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
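The YCoCg transform pair is a cheap linear change of basis. A Python sketch of the standard coefficients (the deck's `rgbToYcocg`/`YcocgToRgb` presumably match, but that is an assumption):

```python
def rgb_to_ycocg(r, g, b):
    # Luminance weights sum to 1; chroma weights sum to 0.
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the forward transform.
    return y + co - cg, y + cg, y - co - cg

# Grey input carries no chroma; arbitrary colors round-trip exactly.
assert rgb_to_ycocg(1.0, 1.0, 1.0) == (1.0, 0.0, 0.0)
rgb = (0.2, 0.7, 0.4)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert max(abs(a - b) for a, b in zip(rgb, back)) < 1e-12
```

The zero-sum chroma weights are what make chroma cheap to subsample: a luminance-preserving neighborhood average perturbs only Co/Cg.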

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64bpp

- A half-float target is more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
    // quantized pixel velocity. -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
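Since metallic is a single bit, it rides in the sign of the depth channel. The logic reduces to a few lines; a Python sketch (here metallic is modeled as a bool, where the shader multiplies by a signed metallic value):

```python
def encode_depth_metallic(depth, metallic):
    # depth: positive view-space depth; 0.0 is reserved for infinity.
    # Non-metallic surfaces store negated depth: metallic lives in the sign.
    return depth if metallic else -depth

def decode_depth_metallic(w):
    # abs() recovers depth; the sign recovers the metallic flag.
    return abs(w), w > 0.0

assert decode_depth_metallic(encode_depth_metallic(3.5, True)) == (3.5, True)
assert decode_depth_metallic(encode_depth_metallic(3.5, False)) == (3.5, False)
```

Decode is a single `abs()` plus a `sign()` test, which is why depth stays cheap for screen space passes.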

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of
        // screen space for culling in future passes: sqrt(2) + 1e-3.
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer: RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered with:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look...

Enhance! Four detail-shot comparisons, each showing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
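Because luminance and chroma are linear combinations of RGB (luma weights summing to 1, chroma weights summing to 0), the YC form of Schlick's term is algebraically exact, not an extra approximation. A Python check of that claim (YCoCg luma and Co weights assumed; the gold-ish F0 is just sample data):

```python
def fresnel_schlick_rgb(v_dot_h, f0):
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Mirrors fresnelSchlickYC: luma lerps toward 1, chroma decays toward 0.
    p = (1.0 - v_dot_h) ** 5
    return ((1.0 - f0_yc[0]) * p + f0_yc[0],
            f0_yc[1] * -p + f0_yc[1])

def rgb_to_yc(r, g, b):
    # Luma weights sum to 1, chroma (Co) weights sum to 0.
    return 0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b

f0 = (0.95, 0.64, 0.54)  # gold-ish reflectance, for illustration
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    direct = rgb_to_yc(*fresnel_schlick_rgb(v_dot_h, f0))
    in_yc = fresnel_schlick_yc(v_dot_h, rgb_to_yc(*f0))
    assert all(abs(a - b) < 1e-12 for a, b in zip(direct, in_yc))
```

The same argument covers the spherical gaussian variant, since only the scalar `power` term changes.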

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
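A CPU-side Python port makes the weighting behavior easy to see: neighbors whose luminance matches the center dominate the reconstructed chroma, and black or mismatched samples fall away (the SENSITIVITY constant is taken from the slide above):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma) at the pixel; neighbors: four (luma, chroma)
    # cross samples carrying the opposite chroma component.
    weights = []
    for luma, _ in neighbors:
        delta = abs(luma - center[0])
        w = 2.0 ** (-sensitivity * delta)
        # Guard the case where a sample is black (mirrors step(1e-5, luma)).
        weights.append(w if luma >= 1e-5 else 0.0)
    total = sum(weights)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recon)

# A neighbor with matching luminance dominates three with a big mismatch.
center = (0.5, 0.2)
neighbors = [(0.5, 0.8), (5.0, -0.8), (5.0, -0.8), (5.0, -0.8)]
luma_chroma = reconstruct_chroma_hdr(center, neighbors)
assert abs(luma_chroma[1] - 0.8) < 1e-6
```

The exponential falloff means a luminance delta of just 0.2 already cuts a neighbor's weight by a factor of 32, which is what keeps strong chroma from bleeding across lighting edges.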

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, SIGGRAPH 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf

[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/

[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/

[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home

[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. Oles Shishkovtsov, 2005. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/10.1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt

[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3

[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf

[Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading

Resources

[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

G-Buffer Parameters

Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
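The same math can be sketched outside the shader. A minimal JavaScript version (illustrative helper, with plain arrays standing in for clip-space vec4s):

```javascript
// Sketch: NDC velocity from current and previous clip-space positions.
// Each position is [x, y, z, w]; perspective divide, then difference.
function screenSpaceVelocity(clip, clipOld) {
  return [
    clip[0] / clip[3] - clipOld[0] / clipOld[3],
    clip[1] / clip[3] - clipOld[1] / clipOld[3],
  ];
}

// A vertex that moved from NDC (0, 0) to (0.5, 0) between frames:
const v = screenSpaceVelocity([1.0, 0.0, 0.0, 2.0], [0.0, 0.0, 0.0, 2.0]);
```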

Read Material Data

- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out
- …and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?
- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float
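The 2^24 cutoff above is easy to verify in JavaScript, where Math.fround rounds a double to the nearest 32-bit float:

```javascript
// 2^24 = 16777216 is the last point where float32 still steps by 1.
const belowLimit = Math.fround(16777215); // representable exactly
const atLimit = Math.fround(16777216);    // still exact
const pastLimit = Math.fround(16777217);  // not representable: rounds down
```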

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode

Packing Example Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
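The two GLSL functions above can be mirrored in JavaScript for CPU-side sanity checking (a sketch; JS doubles represent every uint24 exactly, so the arithmetic carries over):

```javascript
// Pack three 8-bit values into one float via multiplies (shift left).
function uint8_8_8_to_uint24(r, g, b) {
  return r * 65536 + (g * 256 + b);
}

// Unpack via divide + floor (shift right) and subtract (mask high bits).
function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  return [x, -x * 256 + temp, -temp * 256 + raw];
}

const packed = uint8_8_8_to_uint24(12, 34, 56);
const unpacked = uint24_to_uint8_8_8(packed);
```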

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode
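A sketch of the octahedral mapping in JavaScript, following [Cigolle 14] (function names here are illustrative, not the shader's):

```javascript
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

// Project the unit sphere onto an octahedron, unfold it to a 2D square,
// then remap from [-1, 1] to the [0, 1] storage domain.
function octEncode(n) {
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1;
  let y = n[1] / l1;
  if (n[2] < 0.0) { // fold the lower hemisphere over
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

function octDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) { // unfold the lower hemisphere
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```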

Emission

- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
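For reference, the RGB ↔ YCoCg transform is a cheap linear change of basis; a sketch in JavaScript (helper names illustrative):

```javascript
// Forward transform: Y is a luminance weighting, Co/Cg are chroma axes.
function rgbToYcocg(r, g, b) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,   // Y
    r * 0.5 - b * 0.5,               // Co
    -r * 0.25 + g * 0.5 - b * 0.25,  // Cg
  ];
}

// Inverse transform: exact, only adds and subtracts.
function ycocgToRgb(y, co, cg) {
  return [y + co - cg, y + cg, y - co - cg];
}

const [y, co, cg] = rgbToYcocg(0.25, 0.5, 0.75);
const rgb = ycocgToRgb(y, co, cg);
```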

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;

return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
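The sign() trick in isolation, sketched in JavaScript (illustrative helpers; depth must be strictly positive, since 0 is reserved for infinity):

```javascript
// Encode: the metallic flag rides in the sign bit of depth.
function encodeDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

// Decode: abs() recovers depth, the sign recovers the flag.
function decodeDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: w > 0.0 };
}

const w = encodeDepthMetallic(3.5, false);
const decoded = decodeDepthMetallic(w);
```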

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation
- Bad for microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
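To see how close the two are, a quick JavaScript comparison of the pow and exp2 forms of the Schlick power term:

```javascript
const schlickPower = (vDotH) => Math.pow(1.0 - vDotH, 5.0);

// Spherical gaussian approximation [Lagarde 12]: exp2 is typically
// cheaper than pow on GPUs.
const schlickPowerSG = (vDotH) =>
  Math.pow(2.0, (-5.55473 * vDotH - 6.98316) * vDotH);

// The two agree closely over the valid 0..1 domain of vDotH.
for (const x of [0.0, 0.25, 0.5, 0.75, 1.0]) {
  console.log(x, schlickPower(x).toFixed(4), schlickPowerSG(x).toFixed(4));
}
```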

YC Lighting

- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
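The same weighting scheme, sketched in JavaScript for clarity (an illustrative port, not the production shader):

```javascript
// Each neighbor is [luminance, chroma]. Neighbors whose luminance is
// close to the center's contribute more chroma; black samples are ignored.
function reconstructChromaHDR(center, neighbors, sensitivity = 25.0) {
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [y, c] of neighbors) {
    let w = Math.pow(2.0, -sensitivity * Math.abs(y - center[0]));
    if (y < 1e-5) w = 0.0; // guard the case where sample is black
    chromaSum += c * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}

const result = reconstructChromaHDR(
  [0.5, 0.2],
  [[0.5, 0.1], [0.5, 0.3], [0.5, 0.1], [0.5, 0.3]]
);
```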

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks Floored Engineering

Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

pastasfuture


Lit Scene

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
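For reference, a CPU-side JavaScript sketch of the octahedral mapping described in [Cigolle 14] (function names are illustrative; the shader's octohedronEncode / octohedronDecode follow the same math):

```javascript
// Octahedral normal encoding: unit vector <-> 2D point in [0,1]^2.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

function octEncode(n) {
  // Project the unit sphere onto the octahedron |x| + |y| + |z| = 1.
  const l1 = Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]);
  let x = n[0] / l1, y = n[1] / l1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5]; // remap to the full 0..1 domain
}

function octDecode(e) {
  let x = e[0] * 2.0 - 1.0, y = e[1] * 2.0 - 1.0;
  let z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```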

Emission

- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
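The transform itself is a cheap linear change of basis. A JavaScript sketch for illustration (the shader's rgbToYcocg / YcocgToRgb helpers are assumed to implement the same math):

```javascript
// RGB <-> YCoCg: Y carries luminance, Co/Cg carry chroma in [-0.5, 0.5].
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

The coefficients are all powers of two, so the round trip is exact in float arithmetic, which matters when chroma is later quantized to 8 bits.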

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. It could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp
- Half-float target is more challenging
- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
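The velocity quantization can be sketched on the CPU as follows. SUB_PIXEL_PRECISION_STEPS = 4.0 is an assumed value, and this sketch divides out the same constants on decode rather than using a separate folded inverse constant as the shader does:

```javascript
// Quantize a signed screen space velocity into a 10-bit integer (0..1023),
// reserving the extreme codes (+-512/511 before bias) as "out of range" sentinels.
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed sub-pixel resolution

function quantizeVelocity(velocityNdc, resolution) {
  // NDC -1..1 velocity -> sub-pixel steps (0.5 halves the 2-wide NDC range).
  let v = velocityNdc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  v = Math.floor(Math.min(Math.max(v, -512.0), 511.0));
  return v + 512.0; // bias into 0..1023 for unsigned storage
}

function dequantizeVelocity(q, resolution) {
  const v = q - 512.0;
  if (Math.abs(v) > 510.0) return Infinity; // sentinel: velocity not representable
  return v / (resolution * SUB_PIXEL_PRECISION_STEPS * 0.5);
}
```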

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;
return res;

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
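The sign() trick costs nothing extra at pack time. A JavaScript illustration of the idea:

```javascript
// Store metallic (a boolean) in the sign bit of the view space depth.
// Metallic encodes as +1.0 (metal) or -1.0 (dielectric); depth is assumed > 0.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```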

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

// Color is stored as sRGB -> YCoCg. Return it as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

- Color stored in non-linear space to distribute precision perceptually

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
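Because YCoCg is a linear transform and white maps to (Y, Co, Cg) = (1, 0, 0), evaluating Schlick directly in YC space matches transforming the RGB result. A JavaScript check of that equivalence, for illustration (Cg behaves identically to Co, so only one chroma channel is shown):

```javascript
// RGB Schlick fresnel: lerp from f0 toward white as vDotH falls off.
function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

// YC Schlick fresnel: luminance lerps toward 1, chroma decays toward 0.
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
}

function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];
}
```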

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
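A CPU-side JavaScript port of this filter, for illustration (arrays stand in for the vec2 / vec4 types; the constants mirror the shader above):

```javascript
// Reconstruct the missing chroma component from 4 cross neighbors,
// weighting each neighbor by luminance similarity to the center pixel.
// center = [luma, knownChroma]; neighbors = four [luma, otherChroma] pairs.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let weightedChroma = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard the case where a sample is black
    totalWeight += w;
    weightedChroma += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], weightedChroma / totalWeight] : [0.0, 0.0];
}
```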

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
http://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
http://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sebastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

G-Buffer Color

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...
- ...and after skipping some tangential details...

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets: not well supported

Challenges: Storage

- Reading from render buffer depth: getting better

Challenges: Storage

- Texture float support: quite good

Challenges: Storage

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- A 16-bit half float can represent every integer to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;

vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace
    // for culling in future passes. sqrt(2) + 1e-3:
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity
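Round-tripping the velocity quantization is straightforward to emulate. A Python sketch; SUB_PIXEL_PRECISION_STEPS = 4 is an assumed value (the talk never states it), and note the decode returns a UV-space delta, i.e. half the NDC delta:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed; the slides only name the constant
INVERSE_SUB_PIXEL_PRECISION_STEPS = 1.0 / SUB_PIXEL_PRECISION_STEPS

def encode_velocity(v_ndc, resolution):
    """-1..1 NDC velocity -> 0..1023 quantized sub-pixel steps (-512/511 mean 'out of range')."""
    vq = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    vq = math.floor(min(max(vq, -512.0), 511.0))
    return vq + 512.0

def decode_velocity(vq, inverse_resolution):
    """Quantized steps -> UV-space velocity, or a sentinel guaranteed off-screen."""
    v = vq - 512.0
    if abs(v) > 510.0:
        return 1.41521356  # sqrt(2) + 1e-3, outside any screen-space UV delta
    return v * inverse_resolution * INVERSE_SUB_PIXEL_PRECISION_STEPS
```

The round-trip error stays within one sub-pixel step, and anything clamped at either end decodes to the off-screen sentinel.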

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results
- All results are rendered with:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look...

Enhance! (four detail crops, each comparing RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting - YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
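Because Schlick's formula is affine in the reflection coefficient, and YCoCg is a linear transform of RGB in which white carries zero chroma, evaluating Fresnel directly in YC space reproduces the RGB result exactly. A Python check of that equivalence (the rgb_to_ycocg here is the standard basis, assumed to match the talk's):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg basis: white (1,1,1) maps to Y=1, Co=Cg=0.
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: F = (1 - F0) * (1 - vDotH)^5 + F0
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * power + c for c in f0)

def fresnel_schlick_ycocg(v_dot_h, y0, co0, cg0):
    # Luminance term is unchanged; chroma terms flip the sign on power,
    # because the grazing limit (white) carries zero chroma.
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - y0) * power + y0,
            co0 * -power + co0,
            cg0 * -power + cg0)
```

The chroma rows each save an ADD relative to the RGB form, matching the operation-count argument above.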

YC Lighting - Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting
- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting
- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting - Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
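A direct Python port of reconstructChromaHDR makes the weighting logic easy to test on the CPU (2.0 ** x stands in for GLSL exp2):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    """center, a1..a4: (luminance, chroma) pairs from the checkerboard cross.
    Returns (center's own chroma, reconstructed opposite chroma)."""
    neighbors = (a1, a2, a3, a4)
    weights = []
    for luma, _ in neighbors:
        # Weight falls off exponentially with luminance dissimilarity.
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        # Guard the case where a sample is black.
        w *= 1.0 if luma >= 1e-5 else 0.0
        weights.append(w)
    total = sum(weights)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recon)
```

Neighbors with matching luminance dominate the reconstruction; wildly different or black neighbors are effectively ignored.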

Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats. 2014. http://webglstats.com
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008.
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, Siggraph 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf
[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/
[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf
[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf
[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/
[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/
[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf
[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home
[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/
[Shishkovtsov 05] Deferred Shading in STALKER. Oles Shishkovtsov, 2005. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt
[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3
[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3
[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf
[Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading
[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading
[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf
[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf
[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/
[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf
[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/
[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf

G-Buffer Metallic

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...

- ...and after skipping some tangential details...

G-Buffer Storage

Challenges Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the render buffer: getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16,777,215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
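The 2^24 and 2^11 integer limits are easy to confirm from Python by round-tripping values through IEEE 32-bit and 16-bit floats via struct:

```python
import struct

def to_f32(x):
    """Round-trip a Python float through an IEEE 754 32-bit float."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def to_f16(x):
    """Round-trip through an IEEE 754 16-bit half float."""
    return struct.unpack('<e', struct.pack('<e', x))[0]
```

Every integer up to the limit survives exactly; one past it, the step size doubles and odd integers collapse onto their even neighbors.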

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw)
{
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw)
{
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw)
{
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
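The encode / decode pair ports to Python almost verbatim. Emulating 32-bit float arithmetic with struct shows the round trip is exact for any uint24 (the divisions are by powers of two, so no rounding occurs); the test checks edge values plus a stride across the domain rather than all 16.7M values:

```python
import math
import struct

def f32(x):
    """Emulate a GLSL highp float (IEEE 754 binary32)."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def uint8_8_8_to_uint24(r, g, b):
    """Shift left with multiplies: r, g, b are 0..255 integers stored as floats."""
    return f32(r * 65536.0 + (g * 256.0 + b))

def uint24_to_uint8_8_8(raw):
    """Shift right with divisions and floor; no bitwise operators needed."""
    x = math.floor(f32(raw / 65536.0))
    temp = math.floor(f32(raw / 256.0))
    y = f32(-x * 256.0 + temp)
    z = f32(-temp * 256.0 + raw)
    return x, y, z
```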

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main()
{
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, Decode, and Compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main()
{
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main()
{
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in webGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties:

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform the normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
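For reference, a Python sketch of octahedral encode / decode in the style of [Cigolle 14] (this is the standard construction, not necessarily Floored's exact code), including the 14-bit quantization used by the G-Buffer format:

```python
import math
import random

def octahedron_encode(n):
    """Unit vector -> 2D point in [0,1]^2 via the octahedral map."""
    x, y, z = n
    inv_l1 = 1.0 / (abs(x) + abs(y) + abs(z))
    px, py = x * inv_l1, y * inv_l1
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals.
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    return px * 0.5 + 0.5, py * 0.5 + 0.5

def octahedron_decode(e):
    """Inverse map back to a unit vector."""
    fx, fy = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(fx) - abs(fy)
    if z < 0.0:
        fx, fy = ((1.0 - abs(fy)) * (1.0 if fx >= 0.0 else -1.0),
                  (1.0 - abs(fx)) * (1.0 if fy >= 0.0 else -1.0))
    l = math.sqrt(fx * fx + fy * fy + z * z)
    return fx / l, fy / l, z / l

def quantize14(v):
    """Emulate a 14-bit normalized quantize / dequantize round trip."""
    return math.floor(v * 16383.0 + 0.5) / 16383.0
```

Even with 14-bit quantization, random unit normals round-trip to within a tiny angular error.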

Emission

- Don't pack emission. Forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system is sensitive to luminance shifts

- Human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
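The YCoCg transform pair, sketched in Python (this is the standard basis used by [Mavridis 12] and [Waveren 07]; Floored's rgbToYcocg may differ in scaling):

```python
def rgb_to_ycocg(r, g, b):
    """Y in [0,1]; Co, Cg in [-0.5, 0.5] for RGB in [0,1]."""
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    """Exact inverse of the transform above: adds and subtracts only."""
    r = y + co - cg
    g = y + cg
    b = y - co - cg
    return r, g, b
```

The transform is exactly invertible, and grayscale colors carry zero chroma, which is what makes dropping one chroma component per pixel cheap perceptually.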

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the render buffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump / 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;

res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
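checkerboardInterlace and getCheckerboard are simple pixel-parity functions; a Python sketch of the idea (the exact parity convention is an assumption). The bias 0.5 * 256.0 / 255.0 recenters the signed chroma so that a chroma of zero quantizes exactly to uint8 value 128:

```python
def get_checkerboard(px, py):
    """+1.0 on 'even' pixels, -1.0 on 'odd' pixels (parity convention assumed)."""
    return 1.0 if (px + py) % 2 == 0 else -1.0

def checkerboard_interlace(co, cg, px, py):
    """Store Co on even pixels and Cg on odd ones: each pixel writes
    luminance plus ONE chroma component, halving chroma storage."""
    return co if get_checkerboard(px, py) > 0.0 else cg
```

The decode pass reads this value back, subtracts the bias, and reconstructs the missing component from the cross neighborhood, swizzling by the same parity.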


return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

G-Buffer Gloss

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ... and after skipping some tangential details ...

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
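The representability limits above are easy to verify CPU-side. A quick NumPy sketch (NumPy is my choice here, not part of the deck's WebGL pipeline):

```python
import numpy as np

# Every integer up to 2^24 fits exactly in a 32-bit float
# (23 explicit significand bits + the implicit leading 1).
assert int(np.float32(2**24)) == 16777216

# At 2^24 the step size becomes 2: adding 1 is lost to rounding.
assert np.float32(2**24) + np.float32(1) == np.float32(2**24)
assert np.float32(2**24) + np.float32(2) == np.float32(2**24 + 2)

# Half floats hit the same wall at 2^11 (10 + 1 significand bits).
assert np.float16(2**11) + np.float16(1) == np.float16(2**11)
```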

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
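As a sanity check, the encode / decode pair can be ported line-for-line to Python and round-tripped over the uint24 domain (a CPU-side sketch; the authoritative test runs on the GPU, as described next):

```python
import math

def uint8_8_8_to_uint24(x, y, z):
    # Mirrors the GLSL encode: shift left via multiplies.
    SHIFT_LEFT_16 = 256.0 * 256.0
    SHIFT_LEFT_8 = 256.0
    return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z)

def uint24_to_uint8_8_8(raw):
    # Mirrors the GLSL decode: shift right via divide + floor,
    # masking via multiply-and-subtract.
    x = math.floor(raw / (256.0 * 256.0))
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return x, y, z

# Round-trip a spread of the 24-bit domain.
for value in range(0, 2**24, 4099):
    x, y, z = uint24_to_uint8_8_8(float(value))
    assert (x, y, z) == (value // 65536, (value // 256) % 256, value % 256)
    assert uint8_8_8_to_uint24(x, y, z) == float(value)
```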

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
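The deck doesn't show the octahedral code itself; below is a CPU-side sketch of the standard [Cigolle 14] mapping (the names and the NumPy dependency are mine, not from the talk):

```python
import numpy as np

def sign_not_zero(v):
    # A sign() that never returns 0, needed on the octahedron fold seams.
    return np.where(v >= 0.0, 1.0, -1.0)

def octahedron_encode(n):
    # Project the unit vector onto the octahedron (L1 normalize) ...
    p = n / np.sum(np.abs(n))
    if p[2] < 0.0:
        # ... and fold the lower hemisphere over the diagonals.
        p[:2] = (1.0 - np.abs(p[[1, 0]])) * sign_not_zero(p[:2])
    # Remap from [-1, 1] to the [0, 1] texture domain.
    return p[:2] * 0.5 + 0.5

def octahedron_decode(e):
    e = e * 2.0 - 1.0
    n = np.array([e[0], e[1], 1.0 - abs(e[0]) - abs(e[1])])
    if n[2] < 0.0:
        n[:2] = (1.0 - np.abs(n[[1, 0]])) * sign_not_zero(n[:2])
    return n / np.linalg.norm(n)

# Round trip is exact up to float error for directions across the sphere.
rng = np.random.default_rng(7)
for _ in range(256):
    v = rng.normal(size=3)
    v /= np.linalg.norm(v)
    assert np.dot(v, octahedron_decode(octahedron_encode(v))) > 0.999999
```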

Emission

- Don't pack emission. Forward render it.

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
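The RGB to YCoCg transform is a small linear change of basis, sketched here CPU-side (coefficients per the YCoCg literature; NumPy is my choice):

```python
import numpy as np

def rgb_to_ycocg(rgb):
    # Y = luma; Co / Cg = orange and green chroma axes. For rgb in [0, 1],
    # Y stays in [0, 1] and each chroma lands in [-0.5, 0.5] -- which is
    # why the encoder later adds a 0.5 chroma bias before packing.
    r, g, b = rgb
    return np.array([
        0.25 * r + 0.5 * g + 0.25 * b,
        0.5 * r - 0.5 * b,
        -0.25 * r + 0.5 * g - 0.25 * b,
    ])

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return np.array([y + co - cg, y + cg, y - co - cg])

# The transform is exactly invertible.
rng = np.random.default_rng(3)
for _ in range(100):
    rgb = rng.uniform(size=3)
    assert np.allclose(ycocg_to_rgb(rgb_to_ycocg(rgb)), rgb)
```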

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
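The velocity quantization round trip can be sketched CPU-side. SUB_PIXEL_PRECISION_STEPS is never given a value in the deck; 4.0 (quarter-pixel steps) is my assumption:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed value, not stated in the deck

def quantize_velocity(v_ndc, resolution):
    # NDC velocity spans -1..1 (two screen widths); * 0.5 converts to UV
    # units, * resolution to pixels, * steps to sub-pixel steps.
    v = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    v = math.floor(min(max(v, -512.0), 511.0))
    return v + 512.0  # biased into an unsigned 10-bit range, 0..1023

def dequantize_velocity(q, inverse_resolution):
    v = q - 512.0
    if abs(v) > 510.0:
        return None  # -512 and 511 are the out-of-range sentinels
    return v * inverse_resolution / SUB_PIXEL_PRECISION_STEPS

# Round trip stays within one quantization step of the true UV velocity.
q = quantize_velocity(0.01, 1920.0)
uv_velocity = dequantize_velocity(q, 1.0 / 1920.0)
assert abs(uv_velocity - 0.01 * 0.5) < 1.0 / (1920.0 * SUB_PIXEL_PRECISION_STEPS)

# Velocities past the representable range decode to the sentinel.
assert dequantize_velocity(quantize_velocity(1.0, 1920.0), 1.0 / 1920.0) is None
```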

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract bool as sign().
    res.w = components.depth * components.metallic;

    return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
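The sign trick is worth a tiny sketch in isolation. Here metallic is already the ±1.0 flag the decoder's sign() expects, and view-space depth is strictly positive, with 0 reserved for infinity (per the decode early-out later in the deck):

```python
def pack_depth_metallic(depth, metallic_flag):
    # metallic_flag: +1.0 for metallic, -1.0 for dielectric.
    # Depth is assumed > 0; the sign bit is free to carry the flag.
    return depth * metallic_flag

def unpack_depth_metallic(packed):
    depth = abs(packed)
    metallic_flag = 1.0 if packed > 0.0 else -1.0
    return depth, metallic_flag

assert unpack_depth_metallic(pack_depth_metallic(3.25, 1.0)) == (3.25, 1.0)
assert unpack_depth_metallic(pack_depth_metallic(3.25, -1.0)) == (3.25, -1.0)
```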

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
    gBufferComponents res;

    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources: [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for microfacet models. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance! (four detail comparison shots, each showing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
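Because the RGB to YCoCg transform is linear and white maps to (Y, Co, Cg) = (1, 0, 0), the YC form is mathematically identical to converting the RGB Schlick result after the fact. A CPU-side check (NumPy and the gold-ish F0 value are my choices, not from the deck):

```python
import numpy as np

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return np.array([0.25*r + 0.5*g + 0.25*b,
                     0.5*r - 0.5*b,
                     -0.25*r + 0.5*g - 0.25*b])

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5.0
    return (1.0 - f0) * power + f0

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # f0_yc is (luminance, one chroma component).
    power = (1.0 - v_dot_h) ** 5.0
    return np.array([(1.0 - f0_yc[0]) * power + f0_yc[0],
                     f0_yc[1] * -power + f0_yc[1]])

f0_rgb = np.array([1.0, 0.71, 0.29])  # gold-ish reflection coefficient
f0_ycocg = rgb_to_ycocg(f0_rgb)
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    expected = rgb_to_ycocg(fresnel_schlick_rgb(v_dot_h, f0_rgb))
    y_co = fresnel_schlick_yc(v_dot_h, f0_ycocg[[0, 1]])  # luma + Co
    y_cg = fresnel_schlick_yc(v_dot_h, f0_ycocg[[0, 2]])  # luma + Cg
    assert np.allclose([y_co[0], y_co[1], y_cg[1]], expected)
```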

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
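A direct Python port makes the weighting behavior easy to poke at (assuming SENSITIVITY really is 25.0; the transcript's formatting dropped the decimal points):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # center and each neighbor a_i are (luminance, chroma) pairs; the
    # cross neighbors carry the chroma component the center is missing.
    SENSITIVITY = 25.0
    neighbors = (a1, a2, a3, a4)
    weights = [2.0 ** (-SENSITIVITY * abs(luma - center[0])) for luma, _ in neighbors]
    # Guard the case where a sample is black.
    weights = [w if luma >= 1e-5 else 0.0 for w, (luma, _) in zip(weights, neighbors)]
    total = sum(weights)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    blended = sum(chroma * w for (_, chroma), w in zip(neighbors, weights)) / total
    return (center[1], blended)

# Flat region: every neighbor agrees, so the missing chroma is their value.
luma, chroma = reconstruct_chroma_hdr((0.5, 0.1), (0.5, 0.2), (0.5, 0.2), (0.5, 0.2), (0.5, 0.2))
assert luma == 0.1 and abs(chroma - 0.2) < 1e-9

# Strong luma edge: the similar neighbor dominates the dissimilar ones.
_, chroma = reconstruct_chroma_hdr((0.5, 0.1), (0.5, 0.2), (2.0, 0.9), (2.0, 0.9), (2.0, 0.9))
assert abs(chroma - 0.2) < 0.01
```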

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks! Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

G-Buffer Depth

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
  - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
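The same reprojection math can be sketched on the CPU. This is a minimal Python model (not the shader itself): project a model-space point with the current and previous frame's MVP matrices, perspective divide, and difference the results.

```python
def mat_vec(m, v):
    # m: 4x4 row-major matrix, v: length-4 vector.
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]

def screen_space_velocity(mvp, mvp_old, position):
    """Mirror of the shader: project with current and previous MVP,
    perspective divide, and difference the xy results."""
    p = [position[0], position[1], position[2], 1.0]
    clip = mat_vec(mvp, p)          # current clip-space position
    clip_old = mat_vec(mvp_old, p)  # previous frame's clip-space position
    return [clip[0] / clip[3] - clip_old[0] / clip_old[3],
            clip[1] / clip[3] - clip_old[1] / clip_old[3]]

IDENTITY = [[1.0 if r == c else 0.0 for c in range(4)] for r in range(4)]

# A static camera (identical matrices) yields zero velocity.
assert screen_space_velocity(IDENTITY, IDENTITY, (0.25, -0.5, 0.1)) == [0.0, 0.0]
```

A camera that moved between frames produces a nonzero xy delta, which is exactly what the temporal reprojection passes consume.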

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ...and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float
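The precision claims above are easy to check on the CPU by rounding Python doubles through IEEE single and half precision with the `struct` module (a sketch; the GPU of course does this natively):

```python
import struct

def to_float32(x):
    # Round a Python float to the nearest IEEE 754 single-precision value.
    return struct.unpack('<f', struct.pack('<f', x))[0]

def to_float16(x):
    # Round to the nearest IEEE 754 half-precision value.
    return struct.unpack('<e', struct.pack('<e', x))[0]

# Every integer up to 2^24 survives a round-trip through float32...
assert to_float32(16777215.0) == 16777215.0
# ...but above 2^24 the step size doubles: 2^24 + 1 collapses to 2^24.
assert to_float32(16777217.0) == 16777216.0
# Half floats behave the same way above 2^11.
assert to_float16(2049.0) == 2048.0
```

This is why 24 bits is the practical packing budget per 32-bit float channel, and roughly 11 bits per half-float channel.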

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
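A direct Python port of the two GLSL pack/unpack functions makes the multiply/divide "bit shifts" easy to verify off-GPU (a sketch of the same arithmetic, not the shader code itself):

```python
from math import floor

def uint8_8_8_to_uint24(x, y, z):
    # Shift left with multiplies: (x << 16) | (y << 8) | z, in float arithmetic.
    return x * 65536.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # Shift right with divides; subtract the shifted-off high bits instead of masking.
    x = floor(raw / 65536.0)
    temp = floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return x, y, z

packed = uint8_8_8_to_uint24(18.0, 52.0, 86.0)
assert uint24_to_uint8_8_8(packed) == (18.0, 52.0, 86.0)
```

Every intermediate here is an integer below 2^24, so the arithmetic stays exact in 32-bit float as well.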

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color
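The same test strategy can be mimicked on the CPU by forcing every intermediate through float32 rounding. This Python sketch (an assumption about how the shader math maps to CPU emulation, not the GPU test itself) spot-checks edge values plus a random sample; the GPU version covers the full 2^24 domain with the 4096 x 4096 target:

```python
import random
import struct
from math import floor

def f32(x):
    # Emulate GLSL highp float by rounding through IEEE single precision.
    return struct.unpack('<f', struct.pack('<f', x))[0]

def pack_unpack_f32(raw):
    # uint24 -> (8, 8, 8) -> uint24, holding every intermediate at float32.
    x = f32(floor(raw / 65536.0))
    temp = f32(floor(raw / 256.0))
    y = f32(-x * 256.0 + temp)
    z = f32(-temp * 256.0 + raw)
    return f32(f32(x * 65536.0) + f32(y * 256.0 + z))

values = [0, 1, 255, 256, 65535, 65536, 2 ** 24 - 1]
values += [random.randrange(2 ** 24) for _ in range(10000)]
assert all(pack_unpack_f32(float(v)) == float(v) for v in values)
```

A failure here would indicate exactly the kind of collision or precision bug the slides warn about.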

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
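A reference implementation of the octahedral mapping is small enough to sketch in Python (the [-1, 1] form; the slides' 0-to-1 domain is just the remap `(e + 1) / 2` before quantization). Function names here are illustrative, not the deck's GLSL identifiers:

```python
from math import sqrt

def _sign(v):
    # GLSL-style sign, except sign(0) is treated as +1 so the fold is stable.
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    """Map a direction to 2D octahedral coordinates in [-1, 1]^2."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)   # project onto the octahedron (L1 normalize)
    px, py = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals.
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    return px, py

def octahedron_decode(e):
    px, py = e
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        # Unfold the lower hemisphere.
        px, py = (1.0 - abs(py)) * _sign(px), (1.0 - abs(px)) * _sign(py)
    length = sqrt(px * px + py * py + z * z)
    return (px / length, py / length, z / length)
```

Round-tripping unit vectors through encode/decode reproduces them exactly (before quantization), which is the property the 14-bit and 12-bit G-Buffer variants rely on.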

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system is sensitive to luminance shifts

- Human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
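The YCoCg transform referenced here is a pair of cheap linear maps. A Python sketch of the forward and inverse transforms (standard YCoCg as used by [Mavridis 12], not the deck's exact GLSL):

```python
def rgb_to_ycocg(r, g, b):
    # Forward transform: Y in [0, 1], Co/Cg in [-0.5, 0.5] for RGB in [0, 1].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Inverse transform: adds and subtracts only.
    return y + co - cg, y + cg, y - co - cg

rgb = (0.25, 0.5, 0.75)
assert ycocg_to_rgb(*rgb_to_ycocg(*rgb)) == rgb
```

Because the checkerboard stores Y everywhere but only one of Co/Cg per pixel, the missing component must later be reconstructed from neighbors, which is the subject of the chroma reconstruction pass below.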

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- e.g. material type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH.

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%


Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
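The YC form is not just cheaper, it is exact: because the YCoCg transform is linear and the chroma of white is zero, evaluating Schlick in YC space matches converting the RGB result. A Python sketch verifying this (R0 values here are illustrative, not from the deck):

```python
def rgb_to_ycocg(r, g, b):
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, r0):
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in r0)

def fresnel_schlick_yc(v_dot_h, y0, chroma0):
    p = (1.0 - v_dot_h) ** 5
    # Luminance: same shape as the RGB term.
    # Chroma: the (1 - R0) factor becomes -R0 because white has zero chroma.
    return (1.0 - y0) * p + y0, chroma0 * -p + chroma0

# A gold-like reflection coefficient, evaluated both ways, must agree on Y and Co.
r0 = (1.0, 0.71, 0.29)
y0, co0, cg0 = rgb_to_ycocg(*r0)
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    y_rgb, co_rgb, _ = rgb_to_ycocg(*fresnel_schlick_rgb(v_dot_h, r0))
    y_yc, co_yc = fresnel_schlick_yc(v_dot_h, y0, co0)
    assert abs(y_rgb - y_yc) < 1e-12 and abs(co_rgb - co_yc) < 1e-12
```

At grazing angles (power toward 1) the chroma term decays to zero, matching the slide's note that chroma approaches zero at perpendicular.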

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
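A scalar Python port of the same reconstruction makes the weighting behavior easy to inspect (a sketch mirroring the GLSL above; `sensitivity` defaults to the slide's 25.0):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    """center: (luma, known_chroma). a1..a4: (luma, other_chroma) samples
    from the cross neighborhood. Returns (known, reconstructed) chroma."""
    total = 0.0
    chroma_sum = 0.0
    for luma, chroma in (a1, a2, a3, a4):
        # Weight each neighbor's chroma by luminance similarity to the center.
        weight = 2.0 ** (-sensitivity * abs(luma - center[0]))
        # Guard the case where the sample is black (step(1e-5, luminance)).
        weight *= 1.0 if luma >= 1e-5 else 0.0
        total += weight
        chroma_sum += chroma * weight
    # Guard the case where all weights are zero.
    if total > 1e-5:
        return (center[1], chroma_sum / total)
    return (0.0, 0.0)
```

Neighbors whose luminance matches the center dominate the estimate, while a dissimilar or black neighbor contributes almost nothing, which is what keeps chroma from bleeding across strong luminance edges.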

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

G-Buffer Normal

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader

- In fragment shader

varying vec3 vPositionScreenSpace

varying vec3 vPositionScreenSpaceOld

vPositionScreenSpace = model_uModelViewProjectionMatrix vec4(aPosition 10)

vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld vec4(aPosition 10)

gl_Position = vPositionScreenSpace

vec2 velocity = vPositionScreenSpacexy vPositionScreenSpacew

- vPositionScreenSpaceOldxy vPositionScreenSpaceOldw

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
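The octahedral mapping is simple enough to sketch off-GPU. Below is a Python version of the encode / decode pair described in [Cigolle 14]; the function names are mine (the deck's GLSL equivalents are octohedronEncode / octohedronDecode), so treat this as a sketch rather than the shader code:

```python
import math

def octahedron_encode(n):
    # Project the unit normal onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere over, then remap [-1, 1] to [0, 1].
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    return (px * 0.5 + 0.5, py * 0.5 + 0.5)

def octahedron_decode(e):
    # Undo the remap and the fold, then normalize back onto the sphere.
    px, py = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    length = math.sqrt(px * px + py * py + z * z)
    return (px / length, py / length, z / length)
```

The two outputs use the full 0-to-1 domain, which is what makes the 14-bit (and smaller) quantizers in the G-Buffer formats practical.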

Emission

- Don't pack emission; forward render it instead
- Avoids another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer, not many times a frame like the other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- The human perceptual system is sensitive to luminance shifts
- The human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
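The RGB ↔ YCoCg transform this relies on is linear and exactly invertible with adds and halvings. A Python sketch of the pair (function names are mine, mirroring the rgbToYcocg / YcocgToRgb helpers the packing code assumes):

```python
def rgb_to_ycocg(r, g, b):
    # Y (luma) is a weighted average; Co / Cg are signed chroma axes in [-0.5, 0.5].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse: only adds and subtracts, cheap to evaluate per pixel.
    tmp = y - cg
    return tmp + co, y + cg, tmp - co
```

Because the inverse is exact, all of the quality loss comes from quantization and the checkerboard subsampling of Co / Cg, not from the basis change itself.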

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp
- A half-float target is more challenging
- Probably not practical; depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
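The depth / metallic sign trick is worth isolating. A Python sketch under the deck's convention that metallic is ±1.0, depth is positive view-space depth, and 0.0 marks a cleared pixel at infinity (the helper names here are mine):

```python
def encode_depth_metallic(depth, metallic):
    # The metallic flag rides in the sign bit: positive depth means
    # metallic, negated depth means non-metallic. This mirrors
    # res.w = depth * metallic with metallic stored as +1.0 / -1.0.
    assert depth > 0.0
    return depth if metallic else -depth

def decode_depth_metallic(w):
    # abs() recovers depth and the sign recovers the flag; a stored 0.0
    # means nothing was rasterized (the early-out case in decodeGBuffer).
    return abs(w), w > 0.0
```

Because the decode is a single abs(), depth stays cheap to fetch for screen-space passes that only need position reconstruction.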

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {

  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model; we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (four detail crops, each comparing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, and we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
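Why is it legitimate to run Schlick's lerp directly on luma and chroma? Because both the lerp and the RGB→YCoCg transform are linear, they commute: the luma weights sum to 1 (so Y lerps toward white) while each chroma row's weights sum to 0 (so chroma decays toward zero). A Python spot check of that claim (helper names are mine, not from the deck):

```python
def fresnel_schlick_rgb(v_dot_h, f0):
    # Classic Schlick: lerp each channel from F0 toward 1.0.
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, y0, c0):
    # Luminance lerps toward 1 (white); chroma decays toward 0.
    p = (1.0 - v_dot_h) ** 5
    return (1.0 - y0) * p + y0, c0 * (1.0 - p)

# Luma / Co weights from the YCoCg transform: sum to 1 and 0 respectively.
luma = lambda r, g, b: 0.25 * r + 0.5 * g + 0.25 * b
co = lambda r, g, b: 0.5 * r - 0.5 * b

f0 = (0.9, 0.6, 0.2)
for vh in (0.0, 0.25, 0.5, 1.0):
    y, c = fresnel_schlick_yc(vh, luma(*f0), co(*f0))
    rgb = fresnel_schlick_rgb(vh, f0)
    assert abs(luma(*rgb) - y) < 1e-12 and abs(co(*rgb) - c) < 1e-12
```

The same argument applies per chroma axis, which is why the checkerboarded single-chroma channel can be lit directly.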

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
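A Python mirror of reconstructChromaHDR is handy for CPU-side sanity checks of the weighting behavior (same structure as the GLSL above; the function name and tuple convention here are mine):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center = (luma, chroma) for this pixel; neighbors = four (luma, chroma)
    # cross samples that carry the *other* chroma component.
    weights = []
    for luma, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        # Guard the case where the sample is black (step(1e-5, luminance)).
        weights.append(w if luma >= 1e-5 else 0.0)
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are 0.
        return (0.0, 0.0)
    other = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], other)
```

With identical neighbor luminance the result is a plain average; as luminance deltas grow, exp2(-25 * delta) collapses toward the most similar neighbors, which is what suppresses chroma bleeding across edges.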

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

G-Buffer Velocity

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;
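The fragment-shader line is just a perspective divide of both positions followed by a difference. A Python sketch of the same arithmetic, taking (x, y, w) clip-space triples (function name is mine):

```python
def screen_space_velocity(clip_pos, clip_pos_old):
    # Perspective-divide the current and previous clip positions into NDC,
    # then difference; the result is in the -1..1 screen range that the
    # G-Buffer encoder later quantizes to sub-pixel velocity.
    (x, y, w), (ox, oy, ow) = clip_pos, clip_pos_old
    return (x / w - ox / ow, y / w - oy / ow)
```

A static vertex under a static camera yields (0, 0); anything else reprojects the previous frame's position for temporal passes.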

Read Material Data

- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
  ? texture2D(material_uColorMap, colorUV).rgb
  : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready; now we just need to write it out
- ...and after skipping some tangential details...

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets not well supported

Challenges: Storage

- Reading from render buffer depth getting better

Challenges: Storage

- Texture float support quite good

Challenges: Storage

- Texture half float support getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- A 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- Range: 0 to 16777215
- A 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- Range: 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float
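These precision limits are easy to confirm on the CPU with numpy, whose float32 / float16 types match the GPU's IEEE single and half formats:

```python
import numpy as np

# 32-bit float: 24-bit significand, so integers are exact up to 2^24 = 16777216.
assert np.float32(2**24 - 1) != np.float32(2**24)   # still distinguishable
assert np.float32(2**24 + 1) == np.float32(2**24)   # step size is now 2

# 16-bit half float: 11-bit significand, so integers are exact up to 2^11 = 2048.
assert np.float16(2**11 - 1) != np.float16(2**11)
assert np.float16(2**11 + 1) == np.float16(2**11)
```

Any packing scheme has to keep its packed integers under these ceilings, which is why three 8-bit fields (24 bits) fit a float channel and why half-float targets are so much tighter.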

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods and adds
- Impractical for general single bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
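The encode / decode pair ports directly to Python, where doubles hold uint24 integers exactly; here is a quick round-trip spot check of the same shift-by-multiply arithmetic (a sketch mirroring the GLSL, not the shader code itself):

```python
def uint8_8_8_to_uint24(x, y, z):
    # "Shift left" with multiplies: x << 16 | y << 8 | z, in float arithmetic.
    return x * 65536.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # "Shift right" with divide + floor; subtract the shifted-back high
    # parts instead of masking.
    x = raw // 65536.0
    temp = raw // 256.0
    return x, temp - x * 256.0, raw - temp * 256.0

for triple in [(0, 0, 0), (255, 255, 255), (1, 2, 3), (255, 0, 128)]:
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*triple)) == triple
```

The exhaustive version of this check over all 2^24 triples is exactly what the GPU unit test renders into a 4k x 4k target.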

Unit Testing

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for microfacet models. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component
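Because Schlick's curve is affine in the reflection coefficient and YCoCg is a linear transform, evaluating Fresnel directly in YC agrees with converting the RGB result afterward. A quick Python sketch of that equivalence (the gold-ish F0 value and helper names are illustrative, not from the deck):

```python
def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: lerp from F0 toward white as the angle grazes
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - f) * power + f for f in f0)

def fresnel_schlick_yc(v_dot_h, y0, co0):
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - y0) * power + y0,  # luminance: pulled toward white
            co0 * -power + co0)       # chroma: pulled toward zero

def rgb_to_yco(r, g, b):
    # Y and Co rows of the YCoCg transform
    return (0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b)

f0 = (1.0, 0.71, 0.29)  # arbitrary gold-like reflection coefficient
for v_dot_h in (0.1, 0.5, 0.9):
    expected = rgb_to_yco(*fresnel_schlick_rgb(v_dot_h, f0))
    got = fresnel_schlick_yc(v_dot_h, *rgb_to_yco(*f0))
    assert all(abs(a - b) < 1e-12 for a, b in zip(expected, got))
```

This is why the chroma term can simply decay toward zero: white has zero chroma, so blending F0 toward white in RGB is the same as blending its chroma toward zero.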

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YC|YC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
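The same weighting scheme is easy to prototype off-GPU. A Python sketch of the luminance-similarity reconstruction, with scalar loops standing in for the vec4 math (an assumption for illustration, not the shipped shader):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center and each neighbor are (luminance, chroma) pairs
    weights = []
    for lum, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        # Guard the case where a neighbor is black (e.g. at infinity)
        weights.append(w if lum >= 1e-5 else 0.0)
    total = sum(weights)
    # Guard the case where all weights are 0
    if total <= 1e-5:
        return (0.0, 0.0)
    missing = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], missing)

# Neighbors matching the center's luminance contribute equally...
got = reconstruct_chroma_hdr((0.5, 0.1), [(0.5, 0.2)] * 4)
assert abs(got[0] - 0.1) < 1e-12 and abs(got[1] - 0.2) < 1e-12
# ...and an all-black neighborhood falls back to zero chroma
assert reconstruct_chroma_hdr((0.5, 0.1), [(0.0, 0.3)] * 4) == (0.0, 0.0)
```

The exponential falloff means a neighbor whose luminance differs sharply from the center contributes almost nothing, which is what keeps chroma from bleeding across strong edges.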

Thanks for listening

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994

G-Buffer Rasterization

Screen Space Velocity

- Compute per pixel screen space velocity for temporal reprojection

- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
    - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;
buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out

- ... and after skipping some tangential details:

G-Buffer Storage

Challenges Storage

- In vanilla WebGL the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- Range: 0 to 16777215

- 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- Range: 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
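The integer headroom above can be checked directly. A small Python sketch, using `struct` to round-trip values through IEEE 754 single precision as a stand-in for a GLSL highp float:

```python
import struct

def to_f32(x):
    # Round-trip a Python float through 32-bit IEEE 754 storage
    return struct.unpack('f', struct.pack('f', x))[0]

# Every integer through 2^24 survives exactly...
assert to_f32(16777215.0) == 16777215.0   # 2^24 - 1
assert to_f32(16777216.0) == 16777216.0   # 2^24
# ...but above 2^24 the step size becomes 2, so odd integers are lost
assert to_f32(16777217.0) == 16777216.0   # 2^24 + 1 rounds to the nearest even step
```

This is exactly why a 24-bit payload (e.g. three 8-bit channels) is the safe ceiling for float-arithmetic packing.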

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
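The same shift-by-multiply arithmetic can be sanity-checked off-GPU. A minimal Python mirror of the encode / decode pair above, using only float math and floor (no bitwise operators, just like the GLSL):

```python
def uint8_8_8_to_uint24(x, y, z):
    # Shift left with multiplies: (x << 16) | (y << 8) | z, in float arithmetic
    return x * 65536.0 + (y * 256.0 + z)

def uint24_to_uint8_8_8(raw):
    # Shift right with multiplies by reciprocals, then peel each byte off
    x = float(int(raw * (1.0 / 65536.0)))   # floor(), inputs are non-negative
    temp = float(int(raw * (1.0 / 256.0)))
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)

# Round trip is exact for every byte triple
for triple in [(0.0, 0.0, 0.0), (255.0, 255.0, 255.0), (12.0, 34.0, 56.0)]:
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*triple)) == triple
```

Python's doubles have more headroom than GLSL floats, so this only demonstrates the arithmetic; the GPU-side unit tests below are still what prove it on real hardware.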

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
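The octahedral mapping is simple enough to sketch on the CPU. A scalar Python version of the idea (helper names are illustrative, not the shader's; see [Cigolle 14] for the reference GLSL):

```python
import math

def sign_not_zero(v):
    # sign() that treats 0 as positive, as octahedral encoding requires
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # Project the unit vector onto the octahedron, then onto the z = 0 plane
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        # Fold the lower hemisphere across the diagonals
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    return (px, py)  # each in [-1, 1]; remap to [0, 1] before quantizing

def octahedron_decode(px, py):
    z = 1.0 - abs(px) - abs(py)
    x, y = px, py
    if z < 0.0:
        x = (1.0 - abs(py)) * sign_not_zero(px)
        y = (1.0 - abs(px)) * sign_not_zero(py)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

# Encode / decode round trip recovers the input direction
for n in [(0.0, 0.0, 1.0), (0.0, 0.0, -1.0), (0.6, 0.8, 0.0)]:
    decoded = octahedron_decode(*octahedron_encode(n))
    assert all(abs(a - b) < 1e-6 for a, b in zip(n, decoded))
```

The decode is just an abs / fold / normalize, which is why it stays cheap even when run per-pixel per-light.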

Emission

- Don't pack emission. Forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
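For reference, the YCoCg transform pair is cheap and exactly invertible. A Python sketch of the forward and inverse transforms:

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luminance
    co =  0.5  * r            - 0.5  * b  # orange chroma, in [-0.5, 0.5]
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green chroma,  in [-0.5, 0.5]
    return (y, co, cg)

def ycocg_to_rgb(y, co, cg):
    # Inverse: only adds and subtracts
    return (y + co - cg, y + cg, y - co - cg)

# Round trip is exact up to floating point error. Note the chroma terms are
# signed, which is why the G-Buffer encode below adds a CHROMA_BIAS before
# storing them in an unsigned 0..1 channel.
for rgb in [(1.0, 0.0, 0.0), (0.2, 0.7, 0.4), (1.0, 1.0, 1.0)]:
    back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
    assert all(abs(a - b) < 1e-12 for a, b in zip(rgb, back))
```

White maps to (1, 0, 0): full luminance, zero chroma. That property is what makes discarding one chroma channel per pixel so forgiving.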

G-Buffer PackingFormat

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float: 128bpp

- Sign bits of R, G and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float: 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float: 64bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float: 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float: 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;

res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

// Pack depth and metallic together
// If not metallic, negate depth. Extract the bool as sign()
res.w = components.depth * components.metallic;

return res;

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
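The sign trick above fits in a couple of lines of Python (a sketch: `copysign` stands in for GLSL `sign()`, and depth 0.0 stays reserved for infinity via the decode's early-out):

```python
import math

def pack_depth_metallic(depth, metallic):
    # metallic is a +1.0 / -1.0 flag; non-metallic negates the depth
    return depth * metallic

def unpack_depth_metallic(w):
    # abs() recovers depth; the sign recovers the flag
    return (abs(w), math.copysign(1.0, w))

assert unpack_depth_metallic(pack_depth_metallic(7.25, -1.0)) == (7.25, -1.0)
assert unpack_depth_metallic(pack_depth_metallic(7.25, 1.0)) == (7.25, 1.0)
```

Because the flag rides in the sign bit, depth remains a single `abs()` away, which is what makes the fast-path depth decode for AO-style shaders essentially free.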

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard


httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Screen Space Velocity

- Compute per-pixel screen space velocity for temporal reprojection
- In vertex shader:

varying vec4 vPositionScreenSpace;
varying vec4 vPositionScreenSpaceOld;

vPositionScreenSpace = model_uModelViewProjectionMatrix * vec4(aPosition, 1.0);
vPositionScreenSpaceOld = model_uModelViewProjectionMatrixOld * vec4(aPosition, 1.0);
gl_Position = vPositionScreenSpace;

- In fragment shader:

vec2 velocity = vPositionScreenSpace.xy / vPositionScreenSpace.w
              - vPositionScreenSpaceOld.xy / vPositionScreenSpaceOld.w;

Read Material Data

- Rely on dynamic branching for swatch vs. texture sampling

vec3 color = (material_uTextureAssignedColor > 0.0)
    ? texture2D(material_uColorMap, colorUV).rgb
    : colorSwatch;

Encode

gBufferComponents buffer;

buffer.metallic = metallic;
buffer.color = color;
buffer.gloss = gloss;
buffer.normal = normalCameraSpace;
buffer.depth = depthViewSpace;
buffer.velocity = velocity;

- Our data is ready. Now we just need to write it out...
- ...and after skipping some tangential details:

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA
unsigned byte texture. This isn't going to cut it.
- What extensions can we pull in?
- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets not well supported

Challenges: Storage

- Reading depth from the render buffer: getting better

Challenges: Storage

- Texture float support quite good

Challenges: Storage

- Texture half float support getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer up to 2^24 precisely
- Step size increases at integers > 2^24
- 0 to 16777215
- 16-bit half float can represent every integer up to 2^11 precisely
- Step size increases at integers > 2^11
- 0 to 2048
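These representable-integer limits are easy to verify on the CPU. A small NumPy check (Python here is purely a reference environment, not part of the WebGL pipeline):

```python
import numpy as np

# 32-bit float: integers are exact up to 2^24; the next integer collides.
assert int(np.float32(2 ** 24)) == 16777216
assert int(np.float32(2 ** 24 + 1)) == 16777216  # rounds back down

# 16-bit half float: integers are exact up to 2^11.
assert int(np.float16(2 ** 11)) == 2048
assert int(np.float16(2 ** 11 + 1)) == 2048  # step size is now 2
```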

- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND, OR operator simulation through multiplies, mods, and adds
- Impractical for general single-bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
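The same shift-by-multiply arithmetic can be mirrored on the CPU to sanity-check the pack / unpack pair before porting to GLSL. A Python sketch (function names mirror the shader code above):

```python
import math

def uint8_8_8_to_uint24(x, y, z):
    # Shift left with multiplies: x << 16 | y << 8 | z, in float arithmetic.
    return x * 256.0 * 256.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    x = math.floor(raw / (256.0 * 256.0))
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)

# Round-trip some representative byte triples, including the extremes.
for triple in [(0, 0, 0), (255, 255, 255), (1, 2, 3), (128, 0, 255)]:
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*triple)) == triple
```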

Unit Testing

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple
arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures in between the pack / unpack
phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
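A CPU-side sketch of the octahedral mapping from [Cigolle 14] (NumPy, for illustration; the GLSL version is a direct transliteration of the same arithmetic):

```python
import numpy as np

def octahedron_encode(n):
    # n is a unit vector. Project onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere onto the upper one, remap into 0..1.
    p = n / np.sum(np.abs(n))
    xy = p[:2].copy()
    if p[2] < 0.0:
        xy = (1.0 - np.abs(p[[1, 0]])) * np.sign(p[:2])
    return xy * 0.5 + 0.5

def octahedron_decode(e):
    f = e * 2.0 - 1.0
    n = np.array([f[0], f[1], 1.0 - abs(f[0]) - abs(f[1])])
    if n[2] < 0.0:  # undo the hemisphere fold
        n[:2] = (1.0 - np.abs(n[[1, 0]])) * np.sign(n[:2])
    return n / np.linalg.norm(n)

# Round-trip a lower-hemisphere normal (a 3-4-12 Pythagorean direction).
n = np.array([3.0, -4.0, -12.0]) / 13.0
assert np.allclose(octahedron_decode(octahedron_encode(n)), n, atol=1e-6)
```

Without quantization the round trip is exact up to float error; the 14-bit (or 12-bit, 9-bit) quantization in the formats below is what actually discretizes the sphere.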

Emission

- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer.
Not accessed many times a frame like other material parameters.
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
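For reference, the YCoCg transform pair is cheap and exactly invertible (all coefficients are powers of two), and for RGB inputs in [0, 1] the chroma components land in [-0.5, 0.5], which is what the CHROMA_BIAS in the packing code later accounts for. A Python sketch:

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r           - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    return (y + co - cg, y + cg, y - co - cg)

# Exactly invertible, since all coefficients are powers of two.
assert ycocg_to_rgb(*rgb_to_ycocg(1.0, 1.0, 1.0)) == (1.0, 1.0, 1.0)
# Chroma of a gray value is zero.
assert rgb_to_ycocg(0.5, 0.5, 0.5)[1:] == (0.0, 0.0)
```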

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
WebGL. It could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (sign bit)
G: NormalX 9 bits (sign bit) | Gloss 3 bits
B: NormalY 9 bits (sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (sign bit) | Gloss 3 bits
B: NormalY 9 bits (sign bit) | Gloss 3 bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range into 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
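checkerboardInterlace and getCheckerboard are small helpers; a plausible Python sketch of the idea (the exact shader implementation may differ; the point is that pixel parity selects which chroma basis a pixel stores):

```python
def get_checkerboard(px, py):
    # +1.0 on one checker color, -1.0 on the other, alternating per pixel.
    return 1.0 if (int(px) + int(py)) % 2 == 0 else -1.0

def checkerboard_interlace(co, cg, px, py):
    # Each pixel stores luminance plus ONE chroma component; parity picks which.
    return co if get_checkerboard(px, py) > 0.0 else cg

# Neighboring pixels alternate between the two chroma bases.
assert checkerboard_interlace(0.25, 0.75, 2, 4) == 0.25
assert checkerboard_interlace(0.25, 0.75, 2, 5) == 0.75
```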

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
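The quantization round trip can be checked on the CPU. A Python sketch; SUB_PIXEL_PRECISION_STEPS is whatever granularity you choose, and 4 steps per pixel is an assumption here, not a value from the slides:

```python
import math

# Assumed granularity: 4 quantization steps per pixel of motion.
SUB_PIXEL_PRECISION_STEPS = 4.0

def quantize_velocity(v_ndc, resolution):
    # NDC velocity (2.0 spans the full screen) -> signed sub-pixel steps,
    # clamped to the 10-bit budget and biased into 0..1023.
    q = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    q = math.floor(min(max(q, -512.0), 511.0))
    return q + 512.0

def dequantize_velocity(q, inverse_resolution):
    # Returns UV-space velocity (mirrors the decode pass's multiply by
    # inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS).
    return (q - 512.0) * inverse_resolution / SUB_PIXEL_PRECISION_STEPS

# 1/1024 of the screen in NDC maps to 3.75 steps -> quantized bucket 515.
assert quantize_velocity(1.0 / 1024.0, 1920.0) == 515.0
# Out-of-range motion clamps to the sentinel buckets.
assert quantize_velocity(1.0, 1920.0) == 1023.0
assert quantize_velocity(-1.0, 1920.0) == 0.0
```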

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;
return res;

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space
sampling shaders such as AO
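The sign trick is trivial to model (Python sketch; metallic is represented as +1.0 or -1.0, so the flag costs the G-Buffer only a sign bit):

```python
def pack_depth_metallic(depth, metallic):
    # depth is strictly positive view-space depth; metallic is +1.0 or -1.0.
    # Non-metallic negates depth, so the flag rides in the sign bit.
    return depth * metallic

def unpack_depth_metallic(packed):
    metallic = 1.0 if packed > 0.0 else -1.0
    return abs(packed), metallic

assert unpack_depth_metallic(pack_depth_metallic(7.25, -1.0)) == (7.25, -1.0)
assert unpack_depth_metallic(pack_depth_metallic(7.25, 1.0)) == (7.25, 1.0)
```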

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of
  // screen space for culling in future passes: sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer: RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light
our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color is stored in a non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct light only
- No anti-aliasing
- No temporal techniques
- G-Buffer color component: YCoCg checkerboard interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look...

Enhance!

(Four detail crops, each comparing: RGB Lighting 100% | YC Lighting 100% | RGB Lighting 25% | YC Lighting 25%)

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's approximation of Fresnel:
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar
arithmetic: we save an ADD in the 2nd component. Not to mention we are now
operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
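A Python transliteration of reconstructChromaHDR, useful for testing the weighting behavior off-GPU (a sketch; it mirrors the GLSL above):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and the four neighbors are (luminance, chroma) pairs.
    neighbors = (a1, a2, a3, a4)
    weights = []
    for luma, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        weights.append(w if luma >= 1e-5 else 0.0)  # guard black samples
    total = sum(weights)
    if total <= 1e-5:  # guard the all-zero-weights case
        return (0.0, 0.0)
    reconstructed = sum(c * w for (_, c), w in zip(neighbors, weights)) / total
    return (center[1], reconstructed)

# Neighbors whose luminance matches the center dominate; black samples and
# wildly different luminances contribute essentially nothing.
out = reconstruct_chroma_hdr((0.5, 0.1), (0.5, 0.2), (0.5, 0.2), (5.0, -0.4), (0.0, 0.3))
assert out[0] == 0.1
assert abs(out[1] - 0.2) < 1e-4
```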

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats, http://webglstats.com, 2014

[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model, https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer, http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading, https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches, http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER, http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function, http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Read Material Data

- Rely on dynamic branching for swatch vs texture sampling

vec3 color = (material_uTextureAssignedColor gt 00)

texture2D(material_uColorMap colorUV)rgb

colorSwatch

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
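The same arithmetic ports directly to JavaScript, which makes it easy to check the round trip on the CPU before worrying about GPU precision (function names mirror the GLSL above):

```javascript
// Pack three 8-bit integers into a single 24-bit integer using only
// multiply / divide / floor -- no bitwise operators, exactly as in the GLSL.
function uint8_8_8_to_uint24(r, g, b) {
  return r * 256.0 * 256.0 + g * 256.0 + b;
}

function uint24_to_uint8_8_8(packed) {
  const x = Math.floor(packed / (256.0 * 256.0));
  const temp = Math.floor(packed / 256.0);
  const y = -x * 256.0 + temp;      // temp mod 256, without mod()
  const z = -temp * 256.0 + packed; // packed mod 256, without mod()
  return [x, y, z];
}

// Round-trip an arbitrary triple.
const packed = uint8_8_8_to_uint24(12, 34, 56);
console.log(packed);                      // 795192
console.log(uint24_to_uint8_8_8(packed)); // [12, 34, 56]
```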

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough
    // precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing successful.
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing failed.
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties:

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
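A JavaScript sketch of the octahedral mapping (following [Cigolle 14]; it assumes a normalized input and uses a sign() that never returns 0, which the reference code requires):

```javascript
// sign() that treats 0 as positive, as octahedral reference code requires.
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

// Encode a unit normal onto the [-1, 1]^2 octahedral square.
function octEncode([x, y, z]) {
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    [u, v] = [(1.0 - Math.abs(v)) * signNotZero(u),
              (1.0 - Math.abs(u)) * signNotZero(v)];
  }
  return [u, v];
}

// Decode back to a unit normal.
function octDecode([u, v]) {
  let z = 1.0 - Math.abs(u) - Math.abs(v);
  let x = u;
  let y = v;
  if (z < 0.0) {
    [x, y] = [(1.0 - Math.abs(v)) * signNotZero(u),
              (1.0 - Math.abs(u)) * signNotZero(v)];
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}

// Round-trip a lower-hemisphere normal (the folded case).
console.log(octDecode(octEncode([0.48, 0.36, -0.8])));
```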

Emission

- Don't pack emission. Forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs to be accessed when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system is sensitive to luminance shifts

- Human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
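The YCoCg transform itself is just a pair of cheap linear maps, sketched here in JavaScript (standard YCoCg coefficients; for inputs in [0, 1], chroma lands in [-0.5, 0.5], which is what the CHROMA_BIAS in the packing code later shifts into the 0..1 range):

```javascript
// RGB -> YCoCg: Y is luminance, Co/Cg are orange and green chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Exact inverse: YCoCg -> RGB.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

const roundTrip = ycocgToRgb(rgbToYcocg([0.2, 0.5, 0.7]));
console.log(roundTrip); // ~[0.2, 0.5, 0.7]
```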

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- e.g. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving. RGB Float is deprecated in WebGL: could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target is more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;

    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
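One component of the velocity quantization can be sketched in JavaScript. SUB_PIXEL_PRECISION_STEPS is an assumed constant here (4, i.e. quarter-pixel precision), and the dequantize step returns UV-space velocity, matching the scale used in the decode pass later:

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed: quarter-pixel precision

// Quantize one screen-space velocity component (-1..1 NDC units per frame)
// to a 10-bit integer in 0..1023, mirroring the packing shader.
function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0;
}

// Inverse: recover velocity in UV units (half of NDC), as the decode pass does.
function dequantizeVelocity(q, resolution) {
  return (q - 512.0) / (resolution * SUB_PIXEL_PRECISION_STEPS);
}

// A 0.1 NDC velocity at a 1024-pixel resolution round-trips to ~0.05 in
// UV units, within one quantization step.
console.log(dequantizeVelocity(quantizeVelocity(0.1, 1024.0), 1024.0));
```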

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;

    return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
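The sign-bit trick for metallic can be sketched the same way (metallic is stored as ±1.0, matching the shader's sign() extraction):

```javascript
// Pack: multiplying by +1 / -1 keeps depth's magnitude intact and hides
// the metallic flag in the sign bit.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

// Unpack: magnitude is depth, sign is the flag.
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}

console.log(unpackDepthMetallic(packDepthMetallic(3.5, false))); // { depth: 3.5, metallic: false }
```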

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of
        // screenspace for culling in future passes. sqrt(2) + 1e-3:
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer: RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
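How close is the spherical gaussian term to the pow() it replaces? A quick JavaScript comparison of the two power terms (illustrative, not a benchmark):

```javascript
// Classic Schlick power term.
function fresnelPower(vDotH) {
  return Math.pow(1.0 - vDotH, 5.0);
}

// Lagarde's spherical gaussian approximation of the same term.
function fresnelPowerSG(vDotH) {
  return Math.pow(2.0, (-5.55473 * vDotH - 6.98316) * vDotH);
}

// Print both across the 0..1 domain; they agree to within a few thousandths.
for (const v of [0.0, 0.25, 0.5, 0.75, 1.0]) {
  console.log(v, fresnelPower(v).toFixed(4), fresnelPowerSG(v).toFixed(4));
}
```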

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
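Ported to JavaScript (a sketch of the same weighting logic; the sensitivity constant is assumed to match the shader's), the behavior is easy to sanity-check: neighbors whose luminance matches the center dominate the reconstructed chroma, and black samples are ignored:

```javascript
const SENSITIVITY = 25.0; // assumed to match the shader constant

// center and each neighbor are [luminance, chroma] pairs.
function reconstructChromaHDR(center, neighbors) {
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Weight falls off exponentially with luminance difference...
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // ...and black samples (luminance below epsilon) are excluded, as step() does.
    if (luma < 1e-5) w = 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}

// Three matching-luminance neighbors agree on chroma 0.3; the black
// sample's chroma 0.9 is ignored, so the reconstruction lands on ~0.3.
console.log(reconstructChromaHDR([0.5, 0.2], [[0.5, 0.3], [0.5, 0.3], [0.0, 0.9], [0.5, 0.3]]));
```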

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Encode

gBufferComponents buffer

buffermetallic = metallic

buffercolor = color

buffergloss = gloss

buffernormal = normalCameraSpace

bufferdepth = depthViewSpace

buffervelocity = velocity

- our data is ready Now we just need to write it out

- and after skipping some tangential details

G-Buffer Storage

Challenges Storage

- In vanilla webGL largest pixel storage we can write to is a single RGBA

unsigned byte texture This isnrsquot going to cut it

- What extensions can we pull in

- Poll webglstatscom for support

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth; extract the bool as sign()
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg; returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
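Since both the RGB and YC forms are affine in the reflection coefficient, the YC version can be sanity-checked on the CPU against the RGB version transformed into YCoCg. A JavaScript sketch (the helper names here are ours, not from the deck):

```javascript
// Y and Co rows of the standard RGB -> YCoCg transform
function rgbToYco(r, g, b) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}

function fresnelSchlickRGB(vDotH, rc) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return rc.map(c => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, rcYC) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - rcYC[0]) * power + rcYC[0], rcYC[1] * -power + rcYC[1]];
}

const rc = [0.95, 0.64, 0.54]; // copper-ish reflection coefficient
const vDotH = 0.3;

// Path 1: Fresnel in RGB, then transform the result to YCo
const viaRGB = rgbToYco(...fresnelSchlickRGB(vDotH, rc));
// Path 2: transform the coefficient to YCo, then run the YC Fresnel
const direct = fresnelSchlickYC(vDotH, rgbToYco(...rc));
// Both paths agree to floating point precision.
```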

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);

  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
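The weighting logic above is easy to unit test off the GPU. A CPU-side JavaScript port (a sketch; only the function name mirrors the GLSL, the signature is adapted to take an array of neighbors):

```javascript
// center: [luma, chroma]; samples: four [luma, chroma] cross neighbors
function reconstructChromaHDR(center, samples) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let weightedChroma = 0.0;
  for (const [luma, chroma] of samples) {
    // Weight neighbors by luminance similarity
    let weight = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) weight = 0.0; // guard the case where a sample is black
    weightedChroma += chroma * weight;
    totalWeight += weight;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], weightedChroma / totalWeight] : [0.0, 0.0];
}

// Neighbors with identical luminance contribute equally, so the
// reconstructed chroma is their plain average:
const out = reconstructChromaHDR(
  [0.5, 0.2],
  [[0.5, 0.1], [0.5, 0.3], [0.5, 0.1], [0.5, 0.3]]
);
// out = [0.2, 0.2]
```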

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, Journal of Computer Graphics Techniques, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

G-Buffer Storage

Challenges: Storage

- In vanilla WebGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets: not well supported

Challenges: Storage

- Reading depth from the render buffer: getting better

Challenges: Storage

- Texture float support: quite good

Challenges: Storage

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- Range: 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- Range: 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
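The 2^24 limit can be demonstrated on the CPU: `Math.fround` rounds a JS number to 32-bit float precision, so we can watch integers start to collide right above the 24-bit significand (a small sketch, not from the deck):

```javascript
// A 32-bit float has a 24-bit significand: every integer up to 2^24 is
// exact, and the step size doubles immediately above it.
const LIMIT = Math.pow(2, 24); // 16777216

const belowLimit = Math.fround(LIMIT - 1); // 16777215, exactly representable
const aboveLimit = Math.fround(LIMIT + 1); // 16777217 rounds to 16777216

// 2^24 + 1 collides with 2^24 itself, which is why packed values
// must stay within 0 to 16777215.
```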

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);

float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
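The same shift-by-multiply arithmetic ports directly to JavaScript, which makes a convenient CPU-side check of the round trip (a sketch; the function names mirror the GLSL, but the signatures are adapted to plain numbers and arrays):

```javascript
function uint8_8_8_to_uint24(x, y, z) {
  const SHIFT_LEFT_16 = 256 * 256;
  const SHIFT_LEFT_8 = 256;
  return x * SHIFT_LEFT_16 + (y * SHIFT_LEFT_8 + z);
}

function uint24_to_uint8_8_8(raw) {
  const SHIFT_RIGHT_16 = 1 / (256 * 256);
  const SHIFT_RIGHT_8 = 1 / 256;
  const SHIFT_LEFT_8 = 256;
  const x = Math.floor(raw * SHIFT_RIGHT_16);       // top 8 bits
  const temp = Math.floor(raw * SHIFT_RIGHT_8);
  const y = -x * SHIFT_LEFT_8 + temp;               // middle 8 bits
  const z = -temp * SHIFT_LEFT_8 + raw;             // bottom 8 bits
  return [x, y, z];
}

// Spot check the round trip across the uint24 domain with a coarse stride
// (all intermediate values stay below 2^24, so the math is exact):
let collisions = 0;
for (let v = 0; v < 16777216; v += 12347) {
  const [x, y, z] = uint24_to_uint8_8_8(v);
  if (uint8_8_8_to_uint24(x, y, z) !== v) collisions++;
}
// collisions === 0
```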

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, Decode, and Compare
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties:

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform the normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
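The octahedral mapping folds the unit sphere onto a unit square, so a 3D normal fits in two quantizable scalars. A CPU sketch in the spirit of [Cigolle 14] (JavaScript; the function names are ours, not the deck's GLSL):

```javascript
// sign() that never returns 0, so the fold works on axis-aligned normals
function signNotZero(v) {
  return v >= 0.0 ? 1.0 : -1.0;
}

function octEncode([x, y, z]) {
  // Project onto the octahedron |x| + |y| + |z| = 1
  const invL1 = 1.0 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1;
  let v = y * invL1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // into the 0 to 1 domain
}

function octDecode([eu, ev]) {
  let u = eu * 2.0 - 1.0;
  let v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    // Unfold the lower hemisphere
    const t = u;
    u = (1.0 - Math.abs(v)) * signNotZero(t);
    v = (1.0 - Math.abs(t)) * signNotZero(v);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```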

Emission

- Don't pack emission: forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
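For reference, the standard RGB to YCoCg transform and its exact inverse, sketched CPU-side in JavaScript (the deck's GLSL helpers would follow the same arithmetic; these names are ours):

```javascript
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y:  luminance
     0.5  * r           - 0.5  * b, // Co: orange / blue chroma
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg: green / purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  // Exact inverse of the transform above
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}

// The round trip is lossless (up to floating point):
const rgb = [0.25, 0.5, 0.75];
const roundTrip = ycocgToRgb(rgbToYcocg(rgb));
// roundTrip = [0.25, 0.5, 0.75]
```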

G-Buffer Packing: Format


- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Challenges: Storage

- In vanilla webGL, the largest pixel storage we can write to is a single RGBA unsigned byte texture. This isn't going to cut it.

- What extensions can we pull in?

- Poll webglstats.com for support

Challenges: Storage

- Multiple render targets: not well supported

Challenges: Storage

- Reading depth from the render buffer: getting better

Challenges: Storage

- Texture float support: quite good

Challenges: Storage

- Texture half float support: getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- A 16-bit half float can represent every integer up to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
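The 2^24 limit above is easy to demonstrate host-side. This is an illustrative aside (not from the original deck): JavaScript's `Math.fround` rounds a double to the nearest 32-bit float, so we can probe single-precision integer limits without touching the GPU.

```javascript
// Math.fround rounds a JS double to the nearest 32-bit float, letting us
// check which integers survive a round trip through single precision.
function isExactInFloat32(n) {
  return Math.fround(n) === n;
}

console.log(isExactInFloat32(16777216)); // 2^24: exact
console.log(isExactInFloat32(16777217)); // 2^24 + 1: rounds back to 16777216
```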

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode
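The shift-by-arithmetic idea can be sketched on the CPU (an illustrative aside, not from the deck; GLSL ES 1.0 has no bitwise operators, so shifts and masks are emulated with multiplies, divisions, floors, and mods):

```javascript
// Emulating bit operations with float-safe arithmetic, as a shader must.
const shiftLeft8  = (x) => x * 256.0;             // x << 8
const shiftRight8 = (x) => Math.floor(x / 256.0); // x >> 8
const maskLow8    = (x) => x % 256.0;             // x & 0xFF

console.log(shiftRight8(43981)); // high byte of 0xABCD
console.log(maskLow8(43981));    // low byte of 0xABCD
```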

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
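The same pack / unpack pair is easy to verify host-side before trusting it in a shader. A minimal JS mirror of the scheme (an illustrative sketch, not part of the deck):

```javascript
// JS mirror of the GLSL uint8_8_8 <-> uint24 packing, for CPU-side testing.
function uint888ToUint24([x, y, z]) {
  return x * 65536 + (y * 256 + z);
}

function uint24ToUint888(raw) {
  const x = Math.floor(raw / 65536);
  const temp = Math.floor(raw / 256);
  const y = -x * 256 + temp;
  const z = -temp * 256 + raw;
  return [x, y, z];
}

// Spot check the round trip on a few byte triples.
for (const triple of [[0, 0, 0], [255, 255, 255], [12, 34, 56]]) {
  const unpacked = uint24ToUint888(uint888ToUint24(triple));
  console.assert(unpacked.every((v, i) => v === triple[i]));
}
```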

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in webGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties:

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
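The octahedral mapping can be sketched host-side. This is a generic reference implementation in the spirit of [Cigolle 14] (an illustrative aside, not Floored's exact `octohedronEncode` shader code):

```javascript
// Octahedral unit-vector encoding: project the unit sphere onto an
// octahedron (L1 normalization), fold the lower hemisphere over the
// diagonals, and remap to the 0..1 square for storage.
function octEncode([x, y, z]) {
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1, v = y * invL1;
  if (z < 0) { // fold the lower hemisphere
    const [u0, v0] = [u, v];
    u = (1 - Math.abs(v0)) * Math.sign(u0);
    v = (1 - Math.abs(u0)) * Math.sign(v0);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode([eu, ev]) {
  let u = eu * 2 - 1, v = ev * 2 - 1;
  const z = 1 - Math.abs(u) - Math.abs(v);
  if (z < 0) { // unfold the lower hemisphere
    const [u0, v0] = [u, v];
    u = (1 - Math.abs(v0)) * Math.sign(u0);
    v = (1 - Math.abs(u0)) * Math.sign(v0);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```

Note that `Math.sign(0)` is 0, so components that are exactly zero on the folded hemisphere need special handling in a production version.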

Emission

- Don't pack emission; forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
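The RGB to YCoCg transform referenced above is a small linear change of basis and is exactly invertible. A host-side sketch (illustrative, not the deck's `rgbToYcocg` shader code):

```javascript
// RGB <-> YCoCg: luminance (Y) plus orange (Co) and green (Cg) chroma axes.
function rgbToYcocg([r, g, b]) {
  const y  =  0.25 * r + 0.5 * g + 0.25 * b;
  const co =  0.5  * r            - 0.5  * b;
  const cg = -0.25 * r + 0.5 * g - 0.25 * b;
  return [y, co, cg];
}

function ycocgToRgb([y, co, cg]) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

Neutral colors have zero chroma (for white, Co = Cg = 0), which is exactly what makes chroma cheap to subsample.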

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in webGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

  // Color is stored in a non-linear space to distribute precision perceptually:
  // sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered with:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (Four detail crops, each comparing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation: the same

- Chroma calculation: inverted, approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
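Why the chroma component can drop the `1.0 -` term: Schlick's formula is affine in F0, and the RGB to YCoCg transform is linear with zero chroma for white, so chroma simply decays by (1 - power). A host-side numerical check (an illustrative aside; the F0 value is hypothetical):

```javascript
// Verify that evaluating Schlick directly on (Y, chroma) matches
// evaluating in RGB and converting the result to YCoCg afterward.
const rgbToYcocg = ([r, g, b]) =>
  [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];

const fresnelSchlickRgb = (vDotH, f0) => {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
};

const fresnelSchlickYC = (vDotH, [y, c]) => {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * p + y, c * -p + c];
};

const f0 = [1.0, 0.71, 0.29]; // illustrative gold-ish reflectance
const vDotH = 0.37;
const [yRef, coRef] = rgbToYcocg(fresnelSchlickRgb(vDotH, f0));
const [y0, co0] = rgbToYcocg(f0);
const [yYc, coYc] = fresnelSchlickYC(vDotH, [y0, co0]);
// yYc matches yRef and coYc matches coRef to floating point precision.
```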

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
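The same weighting scheme is easy to exercise outside the shader. A host-side port (an illustrative sketch, not part of the deck):

```javascript
// Luminance-weighted chroma reconstruction: neighbors whose luminance is
// close to the center pixel's luminance contribute more chroma.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0;
  let chroma = 0;
  for (const [lum, chr] of neighbors) {
    // Guard the case where a sample is black (matches step(1e-5, luminance)).
    const w = lum >= 1e-5 ? Math.pow(2, -SENSITIVITY * Math.abs(lum - center[0])) : 0;
    totalWeight += w;
    chroma += chr * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0, 0];
}

// With equal-luminance neighbors the result is the plain chroma average.
const [, recon] = reconstructChromaHDR([0.5, 0.1],
  [[0.5, 0.2], [0.5, 0.4], [0.5, 0.2], [0.5, 0.4]]);
console.log(recon); // approximately 0.3
```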

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Challenges Storage

- Multiple render targets not well supported

Challenges Storage

- Reading depth from the render buffer: getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float
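The idea can be sketched CPU-side in Python (a hypothetical illustration, not the shader code itself; function names are ours). Multiplying by powers of 256 plays the role of left shifts, and division plus floor() plays the role of right shifts, all staying exact below 2^24:

```python
import math

def pack_uint8_8_8(r, g, b):
    # "Shift left" by 16 and 8 bits via multiplication; exact below 2^24.
    return r * 65536.0 + g * 256.0 + b

def unpack_uint8_8_8(packed):
    # "Shift right" via division and floor(); recover each byte by
    # subtracting off the higher bytes.
    r = math.floor(packed / 65536.0)
    t = math.floor(packed / 256.0)
    g = t - r * 256.0
    b = packed - t * 256.0
    return r, g, b
```

Note that Python floats are 64-bit, so this demonstrates the arithmetic rather than the 2^24 precision cliff of a 32-bit GLSL float.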

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color
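Off the GPU, the same round-trip idea can be sketched in plain Python (a hypothetical harness, not the WebGL test; Python's 64-bit floats are exact over this range, so this validates the arithmetic rather than highp float precision):

```python
import math
import random

def roundtrip_ok(i):
    # Encode an ID to 8-8-8 "bytes" with float arithmetic, decode, compare.
    r = math.floor(i / 65536.0)
    t = math.floor(i / 256.0)
    g = t - r * 256.0
    b = i - t * 256.0
    return r * 65536.0 + g * 256.0 + b == i

# Sample the uint24 domain, always including both endpoints.
random.seed(0)
sample = [0.0, float(2**24 - 1)] + [float(random.randrange(2**24)) for _ in range(100000)]
all_ok = all(roundtrip_ok(i) for i in sample)
```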

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
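A minimal CPU-side sketch of octahedral encode / decode, following [Cigolle 14] (Python; function names are illustrative, not the shader's):

```python
import math

def sign_not_zero(v):
    # sign() that never returns 0, as octahedral mapping requires.
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    # Unit vector (x, y, z) -> (u, v) in the [0, 1]^2 domain.
    s = abs(n[0]) + abs(n[1]) + abs(n[2])
    x, y = n[0] / s, n[1] / s
    if n[2] < 0.0:
        # Fold the lower hemisphere over the diagonals.
        x, y = ((1.0 - abs(y)) * sign_not_zero(x),
                (1.0 - abs(x)) * sign_not_zero(y))
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(u, v):
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = ((1.0 - abs(y)) * sign_not_zero(x),
                (1.0 - abs(x)) * sign_not_zero(y))
    l = math.sqrt(x * x + y * y + z * z)
    return (x / l, y / l, z / l)
```

Without quantization the round trip is exact up to floating point error; the G-Buffer then snaps the (u, v) pair to 14-bit integers.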

Emission

- Don't pack emission. Forward render it.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate: probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at the packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity, and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
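The depth / metallic trick is just a stolen sign bit. A Python sketch of the convention (metallic held as ±1.0, as the encode above assumes):

```python
def pack_depth_metallic(depth, metallic):
    # depth > 0.0; metallic is +1.0 (metal) or -1.0 (dielectric).
    return depth * metallic

def unpack_depth_metallic(packed):
    # sign() recovers metallic, abs() recovers depth;
    # a value of 0.0 still reads as "infinity" for the early out.
    metallic = 1.0 if packed > 0.0 else -1.0
    return abs(packed), metallic
```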

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

- Many resources

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
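Why this is legitimate: YCoCg is a linear transform with Y(white) = 1 and chroma(white) = 0, so evaluating Schlick directly in YC space must match transforming the RGB result. A Python check under those assumptions (the F0 value is illustrative):

```python
def rgb_to_ycocg(r, g, b):
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, y0, c0):
    # Y behaves like the scalar Schlick term; chroma decays toward 0.
    p = (1.0 - v_dot_h) ** 5.0
    return (1.0 - y0) * p + y0, c0 * -p + c0

f0 = (0.95, 0.64, 0.54)                       # gold-ish reflectance, illustrative
y0, co0, _ = rgb_to_ycocg(*f0)
y_ref, co_ref, _ = rgb_to_ycocg(*fresnel_schlick_rgb(0.3, f0))
y_yc, co_yc = fresnel_schlick_yc(0.3, y0, co0)
```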

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
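For intuition, a CPU-side sketch of the same luminance-weighted reconstruction (Python; names are ours, and the loop stands in for the shader's vec4 arithmetic):

```python
def reconstruct_chroma(center_y, neighbors, sensitivity=25.0):
    # neighbors: (luminance, chroma) pairs from the cross pattern.
    total_w = 0.0
    total_c = 0.0
    for y, c in neighbors:
        # Weight falls off exponentially with luminance difference.
        w = 2.0 ** (-sensitivity * abs(y - center_y))
        if y < 1e-5:          # guard black / infinity samples, like step()
            w = 0.0
        total_w += w
        total_c += c * w
    # Guard the case where all weights are 0.
    return total_c / total_w if total_w > 1e-5 else 0.0
```

Neighbors whose luminance matches the center dominate the weighted average, which is what keeps reconstructed chroma from bleeding across strong edges.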

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com/ 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Challenges Storage

- Reading from render buffer depth getting better

Challenges Storage

- Texture float support quite good

Challenges Storage

- Texture half float support getting better

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
    vec4 res;

    // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;

    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract bool as sign().
    res.w = components.depth * components.metallic;

    return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
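The channel arithmetic above can be sanity-checked off-GPU. Here is a hedged JavaScript emulation of the 10+14-bit pack/unpack (function names mirror the GLSL helpers, but this sketch is illustrative, not Floored's code); doubles, like 32-bit GPU floats over this range, represent every integer below 2^24 exactly, so multiply/floor stand in for shifts:

```javascript
// Emulate uint10_14_to_uint24 / uint24_to_uint10_14 with float arithmetic only.
const SHIFT_LEFT_14 = 16384.0;        // 2^14
const SHIFT_RIGHT_14 = 1.0 / 16384.0;

function uint10_14_to_uint24(hi10, lo14) {
  // hi10 in [0, 1023], lo14 in [0, 16383]; packed value stays below 2^24.
  return hi10 * SHIFT_LEFT_14 + lo14;
}

function uint24_to_uint10_14(packed) {
  const hi10 = Math.floor(packed * SHIFT_RIGHT_14);
  const lo14 = packed - hi10 * SHIFT_LEFT_14;
  return [hi10, lo14];
}

// Round-trip a quantized velocity (10 bits) and an octahedral normal coordinate (14 bits).
const packed = uint10_14_to_uint24(731, 9001);
const [vel, nrm] = uint24_to_uint10_14(packed);
```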

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;

    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
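The sign-bit trick used for the A channel can be sketched in a few lines of JavaScript (hypothetical helper names; depth 0.0 stays reserved for "infinity / no surface", as in the early-out above):

```javascript
// Metallic rides on the sign of depth, so decode is a single abs() plus sign().
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: Math.sign(packed) > 0.0 };
}

const a = unpackDepthMetallic(packDepthMetallic(12.5, true));
const b = unpackDepthMetallic(packDepthMetallic(3.25, false));
```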

Decode G-Buffer RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of screenspace
        // for culling in future passes.
        res.velocity = vec2(1.41521356); // sqrt(2) + 1e-3
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0.
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

    // Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

    return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance (detail crops, shown four times): RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
    - Luminance calculation the same
    - Chroma calculation inverted; approaches zero at perpendicular

YC Lighting: RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting: YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and ADD from the skipped 3rd component.
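The YC form is valid, not just cheaper, and that claim can be checked numerically: Schlick's formula is affine in F0, and RGB to YCoCg is linear with white mapping to (Y=1, Co=0, Cg=0), so evaluating directly on (Y, C) must agree with converting the RGB result. A small JavaScript check (the F0 value below is illustrative):

```javascript
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b,   // Y
          0.5 * r - 0.5 * b,               // Co
          -0.25 * r + 0.5 * g - 0.25 * b]; // Cg
}

function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

function fresnelSchlickYC(vDotH, [fY, fC]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  // Luminance keeps the usual form; chroma simply decays by (1 - power).
  return [(1.0 - fY) * power + fY, fC * -power + fC];
}

const f0 = [0.95, 0.64, 0.54]; // gold-ish specular color, illustrative
const vDotH = 0.3;
const viaRgb = rgbToYcocg(fresnelSchlickRGB(vDotH, f0));
const [y0, co0] = rgbToYcocg(f0);
const [yDirect, coDirect] = fresnelSchlickYC(vDotH, [y0, co0]);
// yDirect matches viaRgb[0]; coDirect matches viaRgb[1] (Cg behaves like Co).
```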

YC Lighting: Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to RG components of render target
- Frees up B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
  - Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
  - Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping

YC Lighting: Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
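A CPU port of the same weighting scheme (same math, arrays in place of vec4) is handy for experimenting with the SENSITIVITY constant outside the shader:

```javascript
function reconstructChromaHDR(center, neighbors, sensitivity = 25.0) {
  let chromaSum = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    // Neighbors whose luminance matches the center dominate the estimate.
    let w = Math.pow(2.0, -sensitivity * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard the case where a sample is black
    chromaSum += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are ~0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}

// A neighbor with identical luminance should dictate the reconstructed chroma.
const out = reconstructChromaHDR(
  [0.5, 0.2],
  [[0.5, 0.7], [5.0, -0.4], [5.0, -0.4], [0.0, 0.9]]
);
```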

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks: Floored Engineering

Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com/ 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs. http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Challenges: Storage

- Texture float support quite good

Challenges: Storage

- Texture half float support getting better

Challenges: Encode / Decode

- Texture float looks like our best option
- Can we store all our G-Buffer data in a single floating point texture?
- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers
- 32-bit float can represent every integer to 2^24 precisely
  - Step size increases at integers > 2^24
  - 0 to 16,777,215
- 16-bit half float can represent every integer to 2^11 precisely
  - Step size increases at integers > 2^11
  - 0 to 2048
- Example: pack 3 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
  - Impractical for general single bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;

    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);

vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
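The same arithmetic is easy to validate outside GLSL. A JavaScript emulation of the encode/decode pair (doubles represent every integer below 2^24 exactly, matching 32-bit GPU floats, so the multiply/floor shifts are lossless over the whole 8-8-8 domain):

```javascript
function uint8_8_8_to_uint24([x, y, z]) {
  return x * 65536.0 + (y * 256.0 + z);
}

function uint24_to_uint8_8_8(raw) {
  const x = Math.floor(raw / 65536.0);
  const temp = Math.floor(raw / 256.0);
  return [x, -x * 256.0 + temp, -temp * 256.0 + raw];
}

// Spot-check round trips across the byte range.
let ok = true;
for (const triple of [[0, 0, 0], [255, 255, 255], [12, 200, 7], [1, 0, 255]]) {
  const out = uint24_to_uint8_8_8(uint8_8_8_to_uint24(triple));
  ok = ok && out[0] === triple[0] && out[1] === triple[1] && out[2] === triple[2];
}
```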

Unit Testing

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
  - Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack ID
- Unpack ID
- Compare unpacked ID to pixel ID
- Write success / fail color

Packing Unit Test: Single Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
    // to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, Decode, and Compare.
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

    if (expectedDecoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
    // to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
    // to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

    if (decoded == expected) {
        // Packing Successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    } else {
        // Packing Failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
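A sketch of the octahedral mapping in JavaScript (one common formulation of [Cigolle 14]; the exact fold Floored uses may differ): L1-normalize onto the octahedron, fold the lower hemisphere over the diagonals, remap to the full 0..1 square.

```javascript
function octahedronEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let u = x / l1;
  let v = y / l1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const uo = u;
    u = (1.0 - Math.abs(v)) * (uo >= 0.0 ? 1.0 : -1.0);
    v = (1.0 - Math.abs(uo)) * (v >= 0.0 ? 1.0 : -1.0);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // full 0..1 domain
}

function octahedronDecode([eu, ev]) {
  const u = eu * 2.0 - 1.0;
  const v = ev * 2.0 - 1.0;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  let x = u;
  let y = v;
  if (z < 0.0) { // unfold the lower hemisphere
    x = (1.0 - Math.abs(v)) * (u >= 0.0 ? 1.0 : -1.0);
    y = (1.0 - Math.abs(u)) * (v >= 0.0 ? 1.0 : -1.0);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}

const n = [0.267261, -0.534522, -0.801784]; // roughly (1, -2, -3) normalized
const rt = octahedronDecode(octahedronEncode(n));
```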

Emission

- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
  - Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
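The basis change itself is a cheap linear pair with an exact inverse, which is what makes pre-transforming swatches and textures practical. A minimal JavaScript sketch:

```javascript
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b,   // Y: luminance
          0.5 * r - 0.5 * b,               // Co: orange vs. blue
          -0.25 * r + 0.5 * g - 0.25 * b]; // Cg: green vs. magenta
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg]; // exact inverse of the above
}

const rgb = [0.8, 0.3, 0.1];
const back = ycocgToRgb(rgbToYcocg(rgb));
```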


- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Challenges: Storage

- Texture half float support is getting better

Challenges: Encode / Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data in a single floating point texture?

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers > 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers > 2^11

- 0 to 2048

- Example: pack 3 8-bit integer values into a 32-bit float
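These precision limits are easy to verify on the CPU. A small Python sketch (our own, not from the talk), emulating IEEE-754 rounding with the standard struct module:

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest IEEE-754 binary32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]

def to_f16(x: float) -> float:
    """Round a Python float to the nearest IEEE-754 binary16 (half float) value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

# 32-bit float: integers are exact up to 2^24 = 16777216; above that the step size grows
assert to_f32(16777215.0) == 16777215.0
assert to_f32(16777216.0 + 1.0) == 16777216.0   # 16777217 is not representable

# 16-bit half float: integers are exact up to 2^11 = 2048
assert to_f16(2047.0) == 2047.0
assert to_f16(2048.0 + 1.0) == 2048.0           # 2049 rounds back down
```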

Integer Packing

- No bitwise operators

- Can shift left with multiplies, right with divisions

- AND, OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw)
{
    return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw)
{
    const float SHIFT_LEFT_16 = 256.0 * 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw)
{
    const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
    const float SHIFT_RIGHT_8 = 1.0 / 256.0;
    const float SHIFT_LEFT_8 = 256.0;
    vec3 res;
    res.x = floor(raw * SHIFT_RIGHT_16);
    float temp = floor(raw * SHIFT_RIGHT_8);
    res.y = -res.x * SHIFT_LEFT_8 + temp;
    res.z = -temp * SHIFT_LEFT_8 + raw;
    return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
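As a sanity check outside the shader, the same shift-by-multiply arithmetic can be mirrored in plain Python (a sketch of the GLSL above; every intermediate stays below 2^24, so doubles behave exactly like the shader's float math here):

```python
import math

def uint8_8_8_to_uint24(r: float, g: float, b: float) -> float:
    # "Shift left" with multiplies: r << 16 | g << 8 | b, in float arithmetic
    return r * 65536.0 + (g * 256.0 + b)

def uint24_to_uint8_8_8(raw: float):
    # "Shift right" with divisions, then peel each byte off with floor()
    x = math.floor(raw / 65536.0)
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return (x, y, z)

# Round-trip a few byte triples through the 24-bit float encoding
for triple in [(0.0, 0.0, 0.0), (255.0, 255.0, 255.0), (12.0, 34.0, 56.0)]:
    assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(*triple)) == triple
```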

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack ID

- Unpack ID

- Compare unpacked ID to pixel ID

- Write success / fail color

Packing Unit Test: Single Pass

void main()
{
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    // Encode, decode, and compare
    vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
    float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
    if (expectedDecoded == expected)
    {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    }
    else
    {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}
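The same sweep is easy to prototype on the CPU before wiring up the GPU test. A Python sketch (ours, not from the talk) that spot-checks boundary IDs plus a random sample rather than rasterizing the full 4096 x 4096 target:

```python
import math
import random

def pack(value: float):
    # uint24 -> three uint8 "bytes", mirroring the GLSL encode path
    x = math.floor(value / 65536.0)
    temp = math.floor(value / 256.0)
    return (x, -x * 256.0 + temp, -temp * 256.0 + value)

def unpack(b):
    return b[0] * 65536.0 + (b[1] * 256.0 + b[2])

# Boundary IDs plus a random sample stand in for the exhaustive 4096 x 4096 GPU sweep
random.seed(0)
ids = [0.0, 1.0, 255.0, 256.0, 2.0 ** 24 - 1.0] + \
      [float(random.randrange(2 ** 24)) for _ in range(10000)]
failures = [i for i in ids if unpack(pack(i)) != i]
assert failures == []
```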

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main()
{
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
    gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main()
{
    // Covers the range of all uint24 with a 4k x 4k canvas.
    // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target.
    vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
    float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

    vec3 encoded = texture2D(encodedSampler, vUV).xyz;
    float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
    if (decoded == expected)
    {
        // Packing successful
        gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
    }
    else
    {
        // Packing failed
        gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
    }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
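For reference, a minimal Python sketch of octahedral encode / decode in the spirit of [Cigolle 14] (function names are ours, not from the talk):

```python
import math

def oct_encode(n):
    # Project the unit vector onto the octahedron, then fold the lower hemisphere
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        x, y = (1.0 - abs(y)) * (1.0 if x >= 0.0 else -1.0), \
               (1.0 - abs(x)) * (1.0 if y >= 0.0 else -1.0)
    # Map from [-1, 1] to the full [0, 1] storage domain
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(e):
    x, y = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = (1.0 - abs(y)) * (1.0 if x >= 0.0 else -1.0), \
               (1.0 - abs(x)) * (1.0 if y >= 0.0 else -1.0)
    length = math.sqrt(x * x + y * y + z * z)
    return (x / length, y / length, z / length)

n = (0.267261, 0.534522, -0.801784)  # roughly normalized (1, 2, -3)
d = oct_decode(oct_encode(n))
assert all(abs(a - b) < 1e-4 for a, b in zip(n, d))
```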

Emission

- Don't pack emission: forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer

- Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
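The transform itself is a cheap, exactly invertible change of basis. A Python sketch (ours) using the standard YCoCg coefficients:

```python
def rgb_to_ycocg(r, g, b):
    # Y carries luminance; Co / Cg carry chroma in roughly [-0.5, 0.5]
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r           - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    return y + co - cg, y + cg, y - co - cg

# Round trip is exact up to float error
r, g, b = 0.9, 0.4, 0.1
y, co, cg = rgb_to_ycocg(r, g, b)
rr, gg, bb = ycocg_to_rgb(y, co, cg)
assert max(abs(rr - r), abs(gg - g), abs(bb - b)) < 1e-12
```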

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate; probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
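The sign() trick is easy to model on the CPU. A Python sketch (ours; it assumes depth is strictly positive, as guaranteed by the early-out on infinity):

```python
def pack_depth_metallic(depth: float, metallic: bool) -> float:
    # depth must be strictly positive; its sign carries the metallic flag
    return depth if metallic else -depth

def unpack_depth_metallic(packed: float):
    # abs() recovers depth, the sign recovers the flag, mirroring sign() in the shader
    return abs(packed), packed > 0.0

assert unpack_depth_metallic(pack_depth_metallic(3.5, True)) == (3.5, True)
assert unpack_depth_metallic(pack_depth_metallic(3.5, False)) == (3.5, False)
```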

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer / RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

    gBufferComponents res;
    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity
    if (res.depth <= 0.0)
    {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer / RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer / RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer / RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;
    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0)
    {
        // When velocity is out of representable range, throw it outside of screen space
        // for culling in future passes: sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    }
    else
    {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity

Decode G-Buffer / RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
    res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer / RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer / RGB Lighting

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer / RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer / RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
    return res;

Decode G-Buffer / RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

[Enhance! Four zoomed detail slides follow, each comparing the same crop: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%]

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
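Because YCoCg is a linear transform and Schlick's blend is linear in the reflection coefficient, evaluating Fresnel in YC space gives exactly the RGB result expressed in the new basis. A quick Python check (helpers are ours; this uses the full 3-component YCoCg, whereas the shader carries only the interlaced 2-component YC):

```python
def rgb_to_ycocg(c):
    r, g, b = c
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Luminance blends toward 1 (white) at grazing; chroma decays toward 0
    p = (1.0 - v_dot_h) ** 5
    y, co, cg = f0_yc
    return ((1.0 - y) * p + y, co * (1.0 - p), cg * (1.0 - p))

f0 = (1.0, 0.71, 0.29)  # gold-ish F0
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    lhs = rgb_to_ycocg(fresnel_schlick_rgb(v_dot_h, f0))
    rhs = fresnel_schlick_yc(v_dot_h, rgb_to_ycocg(f0))
    assert all(abs(a - b) < 1e-12 for a, b in zip(lhs, rhs))
```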

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YC|YC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
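A direct Python transcription (ours, for illustration) shows the intended behavior: neighbors whose luminance matches the center dominate the reconstructed chroma:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center = (luminance, known chroma); neighbors = four (luminance, other-chroma) samples
    weights = []
    for lum, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        weights.append(w if lum >= 1e-5 else 0.0)  # guard black samples
    total = sum(weights)
    if total <= 1e-5:                              # guard all-zero weights
        return (0.0, 0.0)
    missing = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], missing)

# Two neighbors share the center's luminance (1.0); two sit across a bright edge (10.0).
# The reconstructed chroma follows the similar-luminance neighbors.
chroma = reconstruct_chroma_hdr((1.0, 0.2), [(1.0, 0.4), (1.0, 0.4), (10.0, -0.3), (10.0, -0.3)])
assert abs(chroma[1] - 0.4) < 1e-3
```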

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Challenges Encode Decode

- Texture float looks like our best option

- Can we store all our G-Buffer data into a single floating point texture

- Pack the data

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators
- Can shift left with multiplies, right with divisions
- AND / OR operator simulation through multiplies, mods, and adds
  - Impractical for general single-bit manipulation
- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
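The same shift-by-multiply arithmetic can be mirrored on the CPU, which is handy for debugging before moving to shader-side tests. A sketch in plain JavaScript (host-side helpers with illustrative names, not part of the shader):

```javascript
// Pack three 8-bit integers (0..255) into one 24-bit integer,
// which is exactly representable in a 32-bit float.
function uint888ToUint24(r, g, b) {
  return r * 65536 + g * 256 + b; // left shifts by 16 and 8 via multiplies
}

// Recover the three bytes with floored divides (the right shifts).
function uint24ToUint888(packed) {
  const x = Math.floor(packed / 65536);
  const temp = Math.floor(packed / 256);
  return [x, temp - x * 256, packed - temp * 256];
}
```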

Unit Testing

Unit Testing

- Important to unit test packing functions
  - Easy to miss collisions
  - Easy to miss precision issues
  - Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
  - WebGL has no support for readPixels on floating point textures
  - Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
  - Assign each pixel a unique integer ID
  - Pack the ID
  - Unpack the ID
  - Compare the unpacked ID to the pixel ID
  - Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct
  - Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write to / read from textures between the pack / unpack phases
- Better to run a more exhaustive two pass test
  - Pass 1: Pack data, render to texture
  - Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?
- Surface Properties
  - Normal
  - Emission
  - Color
  - Gloss
  - Metallic
  - Depth
  - Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transforms the normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses the full 0 to 1 domain
- Cheap encode / decode
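A host-side sketch of the octahedral mapping, following [Cigolle 14] (function names here are illustrative, not the shader's):

```javascript
// sign() that treats 0 as positive, matching GLSL-style octahedral code.
const sgn = (v) => (v >= 0 ? 1 : -1);

// Map a unit normal onto the unit square [0,1]^2.
function octEncode([x, y, z]) {
  const invL1 = 1 / (Math.abs(x) + Math.abs(y) + Math.abs(z));
  let u = x * invL1, v = y * invL1;
  if (z < 0) { // fold the lower hemisphere over the diagonals
    const [u0, v0] = [u, v];
    u = (1 - Math.abs(v0)) * sgn(u0);
    v = (1 - Math.abs(u0)) * sgn(v0);
  }
  return [u * 0.5 + 0.5, v * 0.5 + 0.5]; // remap to the 0..1 domain
}

// Invert the mapping and renormalize.
function octDecode([eu, ev]) {
  let u = eu * 2 - 1, v = ev * 2 - 1;
  const z = 1 - Math.abs(u) - Math.abs(v);
  if (z < 0) { // unfold the lower hemisphere
    const [u0, v0] = [u, v];
    u = (1 - Math.abs(v0)) * sgn(u0);
    v = (1 - Math.abs(u0)) * sgn(v0);
  }
  const len = Math.hypot(u, v, z);
  return [u / len, v / len, z / len];
}
```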

Emission

- Don't pack emission. Forward render it.
- Avoids another vec3 in the G-Buffer
- Emission only needs to be accessed when adding to the light accumulation buffer
  - Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
  - Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
  - The human perceptual system is sensitive to luminance shifts
  - The human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
  - Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
  - Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
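The YCoCg transform itself is just a cheap linear change of basis. A host-side sketch (illustrative names; the deck's shader equivalents are rgbToYcocg / YcocgToRgb):

```javascript
// RGB -> YCoCg: Y is a luminance-like term, Co and Cg are chroma axes.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5 * b, // Co (orange-blue)
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg (green-magenta)
  ];
}

// Exact inverse of the transform above.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that for any gray input the two chroma components are exactly zero, which is what makes subsampling them cheap perceptually.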

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
  - i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range into the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
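The velocity quantization can be sketched in isolation. A simplified scalar host-side version, assuming the velocity is already in pixel units and SUB_PIXEL_PRECISION_STEPS (a tuning constant from the shader) is 1:

```javascript
const SUB_PIXEL_PRECISION_STEPS = 1; // assumed tuning constant

// Quantize a signed pixel velocity into 10 bits (0..1023); the clamped
// extremes -512 and 511 are reserved to mean "out of range / infinity".
function quantizeVelocity(vPixels) {
  const scaled = vPixels * SUB_PIXEL_PRECISION_STEPS;
  return Math.floor(Math.min(Math.max(scaled, -512), 511)) + 512; // bias to unsigned
}

function dequantizeVelocity(q) {
  return (q - 512) / SUB_PIXEL_PRECISION_STEPS;
}
```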

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
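The depth / metallic trick is just a sign-bit fold. Mirrored on the CPU (assuming metallic is a boolean and depth is strictly positive for valid surfaces, with zero meaning infinity):

```javascript
// Store a boolean in the sign of a positive depth value.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

// Recover depth with abs() and the flag from the sign.
function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0 };
}
```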

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting into an RGB Float render target
- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screen space for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color is stored in a non-linear space to distribute precision perceptually

  // Color is stored as sRGB -> YCoCg. Return it as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources
  - [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
  - Keeps Fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered with:
  - Direct Light Only
  - No Anti-Aliasing
  - No Temporal Techniques
  - G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance!

[Four detail-crop comparison slides, each showing the same region at: RGB Lighting 100% | YC Lighting 100% | RGB Lighting 25% | YC Lighting 25%]

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation is the same
  - Chroma calculation is inverted; approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
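Why this works: Schlick's formula is linear in the reflection coefficient, so it commutes with any linear color transform, including RGB to YCoCg. A quick host-side check (Co chroma only for brevity; names are illustrative):

```javascript
// Per-channel Schlick Fresnel.
function schlick(f0, vDotH) {
  const p = Math.pow(1 - vDotH, 5);
  return (1 - f0) * p + f0;
}

// Schlick evaluated directly on [Y, Co]. The chroma of white is 0,
// hence the sign flip on the chroma term.
function schlickYC([y0, c0], vDotH) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - y0) * p + y0, c0 * -p + c0];
}

// Y and Co of an RGB triple.
function rgbToYco([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b];
}
```

Evaluating per-channel Schlick and then converting to YC gives the same result as evaluating Schlick directly in YC space.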

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings, where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth / stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
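The same reconstruction logic, mirrored on the CPU for testing (scalar chroma; weights and guards follow the shader above):

```javascript
// Luminance-weighted chroma reconstruction. Each neighbor is [luma, chroma].
function reconstructChroma(centerLuma, neighbors, sensitivity = 25.0) {
  let totalWeight = 0;
  let chromaSum = 0;
  for (const [luma, chroma] of neighbors) {
    // Weight falls off exponentially with luminance difference,
    // and black samples (luma below epsilon) are ignored entirely.
    const w = luma >= 1e-5 ? Math.pow(2, -sensitivity * Math.abs(luma - centerLuma)) : 0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // If every weight vanished, fall back to zero chroma.
  return totalWeight > 1e-5 ? chromaSum / totalWeight : 0;
}
```

When all neighbors share the center's luminance the result degenerates to a plain average of the neighbor chroma, which is the expected behavior in flat regions.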

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul, our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com / @pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Integer Packing

Integer Packing

- Use floating point arithmetic to store multiple bytes in large numbers

- 32-bit float can represent every integer to 2^24 precisely

- Step size increases at integers gt 2^24

- 0 to 16777215

- 16-bit half float can represent every integer to 2^11 precisely

- Step size increases at integers gt 2^11

- 0 to 2048

- Example pack 3 8-bit integer values into 32-bit float

Integer Packing

- No bitwise operators

- Can shift left with multiplies right with divisions

- AND OR operator simulation though multiples mods and adds

- Impractical for general single bit manipulation

- Must be high speed especially decode

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
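The sign trick and the velocity quantization from the slides above can be mirrored off the GPU; a Python sketch (SUB_PIXEL_PRECISION_STEPS = 4 is a hypothetical value for illustration, not taken from the deck):

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # hypothetical; any small power of two works

def pack_depth_metallic(depth, metallic):
    # Store the metallic bool in the sign: negate depth when not metallic.
    # A stored value of 0 is reserved for "infinity" (no surface).
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    return abs(w), w > 0.0

def quantize_velocity(v_ndc, resolution):
    # -1..1 screen space velocity -> -512..511 sub-pixel velocity, biased to 0..1023
    q = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    return math.floor(min(max(q, -512.0), 511.0)) + 512.0

def dequantize_velocity(q, resolution):
    # Inverse of the above; the result is a 0..1 UV-space delta
    return (q - 512.0) / resolution / SUB_PIXEL_PRECISION_STEPS

assert unpack_depth_metallic(pack_depth_metallic(10.5, False)) == (10.5, False)
```

The velocity round trip is lossy by at most one sub-pixel step, which is exactly the tradeoff the 10-bit budget buys.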

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace
    // for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance

RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%

Enhance

RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%

Enhance

RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%

Enhance

RGB Lighting 100% / YC Lighting 100% / YC Lighting 25% / RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component
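This works because luminance is a linear combination of RGB with Y(1, 1, 1) = 1, while the chroma of white is 0, so Schlick evaluated in YCoCg matches Schlick evaluated in RGB and then transformed. A Python check of that identity (illustrative only; the gold-like F0 value is a hypothetical input):

```python
def fresnel_schlick_rgb(v_dot_h, f0):
    # Standard Schlick in RGB: F = F0 + (1 - F0) * (1 - vDotH)^5, componentwise
    power = (1.0 - v_dot_h) ** 5.0
    return tuple(f + (1.0 - f) * power for f in f0)

def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    # Luminance behaves like scalar Schlick; chroma simply decays toward 0
    power = (1.0 - v_dot_h) ** 5.0
    return f0_y + (1.0 - f0_y) * power, f0_c * (1.0 - power)

def luma_co(r, g, b):
    # YCoCg luminance and the Co chroma axis (the Cg axis behaves identically)
    return r / 4.0 + g / 2.0 + b / 4.0, r / 2.0 - b / 2.0

f0 = (1.0, 0.71, 0.29)  # hypothetical gold-like F0
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    y_ref, co_ref = luma_co(*fresnel_schlick_rgb(v_dot_h, f0))
    y, co = fresnel_schlick_yc(v_dot_h, *luma_co(*f0))
    assert abs(y - y_ref) < 1e-12 and abs(co - co_ref) < 1e-12
```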

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
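The same luminance-similarity weighting is easy to prototype off-GPU; a Python sketch operating on (luminance, chroma) pairs (hypothetical helper mirroring the function above):

```python
def reconstruct_chroma(center, neighbors, sensitivity=25.0):
    # center and the 4 neighbors are (luminance, chroma) pairs, as in the YC G-Buffer.
    # Neighbors whose luminance differs from the center are down-weighted exponentially;
    # black (infinity) samples are rejected outright.
    weights = [2.0 ** (-sensitivity * abs(luma - center[0])) * (1.0 if luma >= 1e-5 else 0.0)
               for luma, _ in neighbors]
    total = sum(weights)
    if total <= 1e-5:  # all neighbors rejected
        return 0.0, 0.0
    return center[1], sum(w * c for w, (_, c) in zip(weights, neighbors)) / total

# A much brighter neighbor across an edge barely contributes;
# the similar-luminance neighbors dominate the reconstructed chroma.
center = (0.5, 0.1)
neighbors = [(0.5, 0.3), (0.5, 0.3), (5.0, -0.4), (0.0, 0.2)]
own_chroma, reconstructed = reconstruct_chroma(center, neighbors)
assert abs(reconstructed - 0.3) < 0.05
```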

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference, 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Integer Packing

- Use floating-point arithmetic to store multiple bytes in large numbers

- A 32-bit float can represent every integer up to 2^24 exactly

- Step size increases at integers > 2^24

- Range: 0 to 16,777,215

- A 16-bit half float can represent every integer up to 2^11 exactly

- Step size increases at integers > 2^11

- Range: 0 to 2,048
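These limits are easy to verify on the CPU by round-tripping values through IEEE 754 storage; a quick Python sketch using struct:

```python
import struct

def to_f32(x):
    # Round-trip a Python float through an IEEE 754 32-bit float
    return struct.unpack('f', struct.pack('f', x))[0]

def to_f16(x):
    # Round-trip through a 16-bit half float
    return struct.unpack('e', struct.pack('e', x))[0]

assert to_f32(16777215.0) == 16777215.0  # every integer up to 2^24 is exact
assert to_f32(16777217.0) == 16777216.0  # above 2^24 the step size becomes 2
assert to_f16(2047.0) == 2047.0          # half floats are exact up to 2^11
assert to_f16(2049.0) == 2048.0          # above 2^11 the step size becomes 2
```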

- Example: pack three 8-bit integer values into a 32-bit float

Integer Packing

- No bitwise operators in GLSL ES

- Can shift left with multiplies, shift right with divisions

- AND / OR operators can be simulated with multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;

  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
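The shift-by-multiply arithmetic above can be sanity-checked off the GPU; a small Python sketch mirroring the GLSL helpers:

```python
import math

def uint8_8_8_to_uint24(x, y, z):
    # Shift left by multiplying: (x << 16) + (y << 8) + z, exact below 2^24
    return x * 256.0 * 256.0 + y * 256.0 + z

def uint24_to_uint8_8_8(raw):
    # Shift right by dividing, then peel off each byte
    x = math.floor(raw / (256.0 * 256.0))
    temp = math.floor(raw / 256.0)
    y = -x * 256.0 + temp
    z = -temp * 256.0 + raw
    return int(x), int(y), int(z)

assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(12, 34, 56)) == (12, 34, 56)
assert uint24_to_uint8_8_8(uint8_8_8_to_uint24(255, 255, 255)) == (255, 255, 255)
```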

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating-point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color
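The recipe above (unique ID, pack, unpack, compare) can be prototyped CPU-side before committing it to a shader; a Python sketch over a reduced domain (a 512 x 512 stand-in for the 4k x 4k target):

```python
import math

def pack(x, y, z):
    return x * 65536.0 + y * 256.0 + z

def unpack(raw):
    x = math.floor(raw / 65536.0)
    t = math.floor(raw / 256.0)
    return int(x), int(-x * 256.0 + t), int(-t * 256.0 + raw)

WIDTH = 512  # exhaustively covers this reduced ID domain
failures = 0
for pixel_id in range(WIDTH * WIDTH):
    # Split the ID into three "bytes", round-trip it, and compare
    x, y, z = unpack(pack(pixel_id // 65536, (pixel_id // 256) % 256, pixel_id % 256))
    if x * 65536 + y * 256 + z != pixel_id:
        failures += 1  # the shader would write the "fail" color here
print("failures:", failures)  # failures: 0
```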

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to the expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two-pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to the expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision
  // to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to the expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform the normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
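For reference, a Python sketch of one common formulation of the octahedral encode / decode (after [Cigolle 14]); Floored's exact shader variant may differ in details such as the output range remap:

```python
import math

def sign_not_zero(v):
    # sign() that treats 0 as positive, as the octahedral mapping requires
    return 1.0 if v >= 0.0 else -1.0

def oct_encode(n):
    # Project the unit vector onto the octahedron, then flatten to 2D
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:  # unfold the lower hemisphere over the diagonals
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    return px, py  # each in [-1, 1]; remap to [0, 1] before quantizing

def oct_decode(px, py):
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:  # fold the lower hemisphere back
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    length = math.sqrt(px * px + py * py + z * z)
    return px / length, py / length, z / length

n = (0.6, 0.0, -0.8)
decoded = oct_decode(*oct_encode(n))
assert all(abs(a - b) < 1e-6 for a, b in zip(n, decoded))
```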

Emission

- Don't pack emission. Forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
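The YCoCg basis itself is a cheap linear transform; a Python sketch of the standard forward and inverse transforms (before any checkerboarding, bias, or quantization):

```python
def rgb_to_ycocg(r, g, b):
    y  =  r / 4.0 + g / 2.0 + b / 4.0   # luminance
    co =  r / 2.0           - b / 2.0   # orange chroma axis, in [-0.5, 0.5]
    cg = -r / 4.0 + g / 2.0 - b / 4.0   # green chroma axis, in [-0.5, 0.5]
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse: lossless round trip in floating point
    return y + co - cg, y + cg, y - co - cg

rgb = (0.25, 0.5, 0.75)
back = ycocg_to_rgb(*rgb_to_ycocg(*rgb))
assert all(abs(a - b) < 1e-9 for a, b in zip(rgb, back))
```

The signed chroma range is why the G-Buffer encode applies a 0.5-ish bias before storing the interlaced component.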

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: NormalX 12 Bits, NormalY 12 Bits
B: Depth 31 Bits, Metallic 1 Bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float, 64 bpp

- A half-float target is more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump / 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128 bpp

- Let's take a look at the packing code for this format


[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Integer Packing

- No bitwise operators in WebGL's GLSL

- Can shift left with multiplies, right with divisions

- AND / OR operator simulation through multiplies, mods, and adds

- Impractical for general single-bit manipulation

- Must be high speed, especially decode

Packing Example: Encode

float normalizedFloat_to_uint8(const in float raw) {
  return floor(raw * 255.0);
}

float uint8_8_8_to_uint24(const in vec3 raw) {
  const float SHIFT_LEFT_16 = 256.0 * 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  return raw.x * SHIFT_LEFT_16 + (raw.y * SHIFT_LEFT_8 + raw.z);
}

vec3 color888;
color888.r = normalizedFloat_to_uint8(color.r);
color888.g = normalizedFloat_to_uint8(color.g);
color888.b = normalizedFloat_to_uint8(color.b);
float colorPacked = uint8_8_8_to_uint24(color888);

Packing Example: Decode

vec3 uint24_to_uint8_8_8(const in float raw) {
  const float SHIFT_RIGHT_16 = 1.0 / (256.0 * 256.0);
  const float SHIFT_RIGHT_8 = 1.0 / 256.0;
  const float SHIFT_LEFT_8 = 256.0;
  vec3 res;
  res.x = floor(raw * SHIFT_RIGHT_16);
  float temp = floor(raw * SHIFT_RIGHT_8);
  res.y = -res.x * SHIFT_LEFT_8 + temp;
  res.z = -temp * SHIFT_LEFT_8 + raw;
  return res;
}

vec3 color888 = uint24_to_uint8_8_8(colorPacked);
vec3 color;
color.r = uint8_to_normalizedFloat(color888.r);
color.g = uint8_to_normalizedFloat(color888.g);
color.b = uint8_to_normalizedFloat(color888.b);
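As a host-side sanity check, the same shift-by-multiply arithmetic can be emulated outside the shader. A minimal Python sketch (the function names mirror the GLSL above, but this is an illustration, not the production code):

```python
import math

def normalized_float_to_uint8(raw):
    # Quantize a 0..1 float to an integer value 0..255, as in the shader.
    return math.floor(raw * 255.0)

def uint8_8_8_to_uint24(r, g, b):
    # "Shift left" with multiplies: equivalent to (r << 16) | (g << 8) | b.
    return r * (256.0 * 256.0) + g * 256.0 + b

def uint24_to_uint8_8_8(packed):
    # "Shift right" with divide + floor, then subtract to mask out each byte.
    x = math.floor(packed * (1.0 / (256.0 * 256.0)))
    temp = math.floor(packed * (1.0 / 256.0))
    y = -x * 256.0 + temp
    z = -temp * 256.0 + packed
    return x, y, z

# Round trip a single 8:8:8 triple (0xAB, 0xCD, 0xEF).
packed = uint8_8_8_to_uint24(171.0, 205.0, 239.0)
assert uint24_to_uint8_8_8(packed) == (171.0, 205.0, 239.0)
```

Python doubles have far more mantissa than a 24-bit-mantissa GLSL highp float, so this only checks the arithmetic, not the precision limits the unit tests below are after.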

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- The single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties:

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transforms the normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
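For reference, the octahedral mapping can be sketched in a few lines of host code. This is a Python illustration of the [Cigolle 14] encoding, not Floored's shader; sign conventions vary between implementations:

```python
import math

def sign_not_zero(v):
    # sign() that never returns 0, so folds preserve quadrant information.
    return 1.0 if v >= 0.0 else -1.0

def octahedral_encode(x, y, z):
    # Project the unit vector onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere over, then remap -1..1 to 0..1.
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = (1.0 - abs(py)) * sign_not_zero(px), (1.0 - abs(px)) * sign_not_zero(py)
    return px * 0.5 + 0.5, py * 0.5 + 0.5

def octahedral_decode(u, v):
    # Inverse: remap 0..1 back to -1..1, unfold, and renormalize.
    px, py = u * 2.0 - 1.0, v * 2.0 - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        px, py = (1.0 - abs(py)) * sign_not_zero(px), (1.0 - abs(px)) * sign_not_zero(py)
    l = math.sqrt(px * px + py * py + z * z)
    return px / l, py / l, z / l

# Round trip an arbitrary unit normal.
n = (1.0 / math.sqrt(14.0), 2.0 / math.sqrt(14.0), 3.0 / math.sqrt(14.0))
d = octahedral_decode(*octahedral_encode(*n))
assert all(abs(a - b) < 1e-6 for a, b in zip(n, d))
```

Note both encode outputs land in [0, 1], which is what lets the shader quantize them directly with normalizedFloat_to_uint14.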

Emission

- Don't pack emission. Forward render it.

- Avoids another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
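The YCoCg transform itself is just a handful of adds and halvings. A quick Python sketch of the forward and inverse transforms (illustrative; the rgbToYcocg / YcocgToRgb shader helpers referenced later are assumed to implement this same matrix):

```python
def rgb_to_ycocg(r, g, b):
    # Luma Y lands in 0..1; chroma Co/Cg land in -0.5..0.5
    # (hence the CHROMA_BIAS applied before packing later on).
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    return y + co - cg, y + cg, y - co - cg

# The transform is exactly invertible.
assert ycocg_to_rgb(*rgb_to_ycocg(1.0, 0.0, 0.0)) == (1.0, 0.0, 0.0)
assert ycocg_to_rgb(*rgb_to_ycocg(0.25, 0.5, 0.75)) == (0.25, 0.5, 0.75)
```

Because the coefficients are powers of two, the round trip is lossless in float arithmetic; the loss comes only from storing the chroma pair at checkerboarded half resolution.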

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp

- Sign bits of R, G, and B are available for use as flags

- e.g., material type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96 bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64 bpp

- A half-float target is more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate; probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp

- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
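The sign-bit trick is easy to see in host code. A hypothetical Python sketch (assuming metallic is stored as ±1.0, matching the sign() extraction in the decode pass; depth must be strictly positive, since depth <= 0 signals a sample at infinity):

```python
def pack_depth_metallic(depth, metallic):
    # The sign bit of the packed float carries the metallic flag.
    return depth * (1.0 if metallic else -1.0)

def unpack_depth_metallic(packed):
    # abs() recovers depth; the sign recovers the flag.
    return abs(packed), packed > 0.0

assert unpack_depth_metallic(pack_depth_metallic(42.5, True)) == (42.5, True)
assert unpack_depth_metallic(pack_depth_metallic(42.5, False)) == (42.5, False)
```

This is why depth decode is so cheap: a single abs() (or nothing at all, for shaders that only compare magnitudes).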

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in a non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?

Light Pre-pass

- Many resources: [Geldreich 04], [Shishkovtsov 05], [Lobanchikov 09], [Mittring 09], [Hoffman 09], [Sousa 13], [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct light only

- No anti-aliasing

- No temporal techniques

- G-Buffer color component: YCoCg checkerboard interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%


Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting: RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting: YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting: Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting: Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting: Reconstruct the missing chroma component in a post process

- Bilateral filter

- Luminance similarity

- Geometric similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting: Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
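The same weighting scheme is easy to prototype in host code. A Python sketch of the reconstruction (an illustrative port, assuming the sensitivity constant above; (luma, chroma) tuples stand in for the vec2s):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma); neighbors: four (luma, chroma) cross samples
    # holding the opposite checkerboard chroma component.
    weights = []
    for luma, _ in neighbors:
        # Weight falls off exponentially with luminance difference.
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        w *= 1.0 if luma >= 1e-5 else 0.0   # guard black samples
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:                        # guard all-zero weights
        return 0.0, 0.0
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return center[1], recon

# With equal luminance everywhere, reconstruction is a plain average of neighbor chroma.
_, c = reconstruct_chroma_hdr((0.5, 0.1), [(0.5, 0.2), (0.5, 0.2), (0.5, 0.4), (0.5, 0.4)])
assert abs(c - 0.3) < 1e-9
```

The luminance-similarity weighting is what keeps chroma from bleeding across strong lighting edges, at the cost of the boundary artifacts discussed in the results.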

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com / @pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014.

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008.

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, SIGGRAPH 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf

[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/

[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/

[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home

[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. Oles Shishkovtsov, 2005. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt

[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3

[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf

[Olsson 11] Tiled Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading

Resources

[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading

[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf

[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf

[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/

[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf

Resources

[Heitz 14] Understanding the Masking-Shadowing Function. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/

[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf

Packing Example Encodefloat normalizedFloat_to_uint8(const in float raw)

return floor(raw 2550)

float uint8_8_8_to_uint24(const in vec3 raw)

const float SHIFT_LEFT_16 = 2560 2560

const float SHIFT_LEFT_8 = 2560

return rawx SHIFT_LEFT_16 + (rawy SHIFT_LEFT_8 + rawz)

vec3 color888

color888r = normalizedFloat_to_uint8(colorr)

color888g = normalizedFloat_to_uint8(colorg)

color888b = normalizedFloat_to_uint8(colorb)

float colorPacked = uint8_8_8_to_uint24(color888)

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]
- Transform normal to a 2D basis
- Reasonably uniform discretization across the sphere
- Uses full 0 to 1 domain
- Cheap encode / decode
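A minimal octahedral codec sketch in JavaScript, following [Cigolle 14]. The deck's actual octohedronEncode / octohedronDecode aren't shown, so the details here (including the signNotZero helper and the 0..1 remap) are assumptions:

```javascript
const signNotZero = (v) => (v >= 0.0 ? 1.0 : -1.0);

// Unit vector -> octahedral coordinates in the 0..1 domain.
function octEncode([nx, ny, nz]) {
  const l1 = Math.abs(nx) + Math.abs(ny) + Math.abs(nz);
  let x = nx / l1, y = ny / l1;
  if (nz < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

// Octahedral coordinates -> unit vector.
function octDecode([ex, ey]) {
  let x = ex * 2.0 - 1.0, y = ey * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * signNotZero(x);
    const fy = (1.0 - Math.abs(x)) * signNotZero(y);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Before the 14-bit quantization step, the encode / decode pair round-trips exactly up to floating point.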

Emission

- Don't pack emission. Forward render it.
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
- Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCbCr, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
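For reference, the YCoCg transform pair used by [Mavridis 12] can be sketched like this on the CPU; the deck's rgbToYcocg presumably matches something close to it (function names are mine):

```javascript
// Forward transform: luminance Y plus two chroma axes (orange, green).
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y
    0.5 * r - 0.5 * b,              // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

// Inverse transform, exact up to floating point.
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Grays map to (Y, 0, 0), which is one way to see why chroma tolerates aggressive subsampling.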

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Sign bits of R, G and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp
- Half-float target more challenging
- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized.
- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
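The velocity quantization step above round-trips like this (a JavaScript sketch; the value of SUB_PIXEL_PRECISION_STEPS is an assumption, the deck doesn't state it):

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed: quarter-pixel precision

// Screen space -1..1 velocity -> 0..1023 value for the 10-bit field.
function quantizeVelocity(v, resolution) {
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0;
}

// Inverse, matching the decode shader's scale: yields UV-space velocity
// (0..1 span), i.e. half of the original -1..1 NDC-span value.
function dequantizeVelocity(q, resolution) {
  return (q - 512.0) * (1.0 / resolution) * (1.0 / SUB_PIXEL_PRECISION_STEPS);
}
```

The round-trip error is bounded by one quantization step, 1 / (resolution * SUB_PIXEL_PRECISION_STEPS).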

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of the representable range, throw it outside of screen space for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model. We want nDotH.
- Could light pre-pass all non-metallic pixels due to the constant 0.04
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance! RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
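The reason lighting in YC is legitimate here: Schlick's formula is affine in the reflection coefficient, so a linear color transform commutes with it. A quick numeric check in JavaScript (luminance weights are the YCoCg Y row; helper names are mine):

```javascript
// Per-channel RGB Schlick fresnel.
function fresnelSchlick(vDotH, rgb) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return rgb.map((c) => (1.0 - c) * p + c);
}

// Scalar Schlick fresnel applied directly to the luminance component.
function fresnelSchlickY(vDotH, y) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return (1.0 - y) * p + y;
}

const luma = ([r, g, b]) => 0.25 * r + 0.5 * g + 0.25 * b;

// Because the luminance weights sum to 1, luma(F(rgb)) equals F(luma(rgb)).
const r0 = [0.04, 0.05, 0.06];
const direct = luma(fresnelSchlick(0.3, r0));
const viaY = fresnelSchlickY(0.3, luma(r0));
```

The chroma rows of the transform sum to 0, which is why the chroma component only keeps the `c * (1 - power)` term, the "inverted" calculation noted above.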

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
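A CPU-side port of the reconstruction above, handy for eyeballing the weighting behavior (the vec4 structure-of-arrays is flattened to a loop; names are mine):

```javascript
// Weight each neighbor's chroma by luminance similarity to the center.
// center is [luminance, chroma]; neighbors is a list of the same pairs.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0.0; // guard black samples
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

A black neighbor contributes nothing, and neighbors whose luminance matches the center average their chroma evenly.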

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats, http://webglstats.com, 2014
[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010
[Lagarde 11] Feeding a Physically-Based Shading Model, https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer, http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading, https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches, http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER, http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction Through Rough Surfaces, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function, http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Packing Example Decodevec3 uint24_to_uint8_8_8(const in float raw)

const float SHIFT_RIGHT_16 = 10 (2560 2560)

const float SHIFT_RIGHT_8 = 10 2560

const float SHIFT_LEFT_8 = 2560

vec3 res

resx = floor(raw SHIFT_RIGHT_16)

float temp = floor(raw SHIFT_RIGHT_8)

resy = -resx SHIFT_LEFT_8 + temp

resz = -temp SHIFT_LEFT_8 + raw

return res

vec3 color888 = uint24_to_uint8_8_8(colorPacked)

vec3 color

colorr = uint8_to_normalizedFloat(color888r)

colorg = uint8_to_normalizedFloat(color888g)

colorb = uint8_to_normalizedFloat(color888b)

Unit Testing

Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for glsl functions such as mod() that expand to multiple

arithmetic instructions

- Desirable to test on the gpu

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 not a very large number

- Can exhaustively test entire domain with a 4096 x 4096 render target

- Assign pixel unique integer ID

- pack ID

- unpack ID

- Compare unpacked ID to pixel ID

- Write success fail color

Packing Unit Test Single Passvoid main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

Encode Decode and Compare

vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncode))

if (expectedDecoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1 Pack data upack data compare to expected value

- In practice we will write read from textures in between pack unpack

phases

- Better to run a more exhaustive two pass test

- Pass 1 Pack data render to texture

- Pass 2 Read texture unpack data compare to expected value

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

gl_FragColorrgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected))

- Pass 1 Pack data render to texture

Packing Unit Test Two Pass

void main()

Covers the range of all uint24 with a 4k x 4k canvas

Avoid floor(gl_FragCoord) here Itrsquos mediump in webGL Not enough precision to uniquely identify pixels in a 4k target

vec2 pixelCoord = floor(vUV pass_uViewportResolution)

float expected = pixelCoordy pass_uViewportResolutionx + pixelCoordx

vec3 encoded = texture2D(encodedSampler vUV)xyz

float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))

if (decoded == expected)

Packing Successful

gl_FragColor = vec4(00 10 00 10)

else

Packing Failed

gl_FragColor = vec4(10 00 00 10)

- Pass 2 Read texture unpack data compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);

colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction further down the pipe?
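The decode above leans on a YCoCg transform pair (the `YcocgToRgb` helper in the GLSL). As a reference for the math, here is a standalone JavaScript sketch; the helper names are mine, not the deck's shader identifiers:

```javascript
// YCoCg <-> RGB transform pair, as used by the G-Buffer color path.
// Chroma (Co, Cg) is signed in roughly -0.5..0.5 for inputs in 0..1, which is
// why the shader biases it by CHROMA_BIAS = 0.5 * 256.0 / 255.0 before storage.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,  // Y: luminance
    0.5 * r - 0.5 * b,              // Co: orange vs. blue
    -0.25 * r + 0.5 * g - 0.25 * b  // Cg: green vs. purple
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that a neutral gray carries zero chroma, which is part of why subsampling chroma is forgiving for the clean, light-filled interiors described earlier.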

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
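The checkerboard helper that drives the interlacing is not shown in the transcript; its job is just pixel parity. A hypothetical JavaScript sketch, including the per-frame flip mentioned above (the `frameParity` parameter is my own addition):

```javascript
// Checkerboard parity for a pixel: returns 0 or 1 in a chessboard pattern.
// Parity selects which chroma basis (Co or Cg) a pixel stores.
// Flipping frameParity each frame alternates the pattern, so temporal
// techniques see both chroma components at every pixel over two frames.
function getCheckerboard(pixelX, pixelY, frameParity) {
  return (pixelX + pixelY + frameParity) & 1;
}
```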

Implementation

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation the same
- Chroma calculation inverted; approaches zero at perpendicular

YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
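The inverted chroma term is not ad hoc: YCoCg is a linear transform whose chroma rows have weights summing to zero, so per-channel RGB Schlick transformed into YCoCg yields exactly the (1 - power) scaling on chroma. A JavaScript check of that identity (the helpers mirror the GLSL above; `rgbToYcocg` is the standard YCoCg transform, added here for a self-contained check):

```javascript
function rgbToYcocg([r, g, b]) {
  return [0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b, -0.25 * r + 0.5 * g - 0.25 * b];
}

// Schlick per RGB channel, as in fresnelSchlick above.
function fresnelSchlickRGB(vDotH, rc) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return rc.map((c) => (1.0 - c) * power + c);
}

// Schlick on a [Y, chroma] pair, as in fresnelSchlickYC above.
function fresnelSchlickYC(vDotH, [rcY, rcC]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - rcY) * power + rcY, rcC * -power + rcC];
}
```

Evaluating the RGB version and converting the result to YCoCg matches the YC version evaluated on the converted reflectance, so lighting directly in YC loses nothing for Fresnel.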

YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
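To reason about the weighting outside the shader, here is a JavaScript port of the function above: same math, but a scalar loop instead of vec4 arithmetic (the array-of-[luma, chroma] interface is my own):

```javascript
// Luminance-weighted chroma reconstruction: neighbors whose luminance is close
// to the center's get exponentially more weight; black samples (including
// those zeroed out for being at infinity) are ignored.
function reconstructChromaHDR(center, neighbors, sensitivity = 25.0) {
  let weightedChroma = 0.0;
  let totalWeight = 0.0;
  for (const [luma, chroma] of neighbors) {
    let weight = Math.pow(2.0, -sensitivity * Math.abs(luma - center[0]));
    if (luma < 1e-5) weight = 0.0; // guard black samples, as step() does in GLSL
    weightedChroma += chroma * weight;
    totalWeight += weight;
  }
  // Guard the case where all weights are 0.
  if (totalWeight <= 1e-5) return [center[1], 0.0];
  return [center[1], weightedChroma / totalWeight];
}
```

With a cross neighborhood whose luminance matches the center, the reconstructed chroma is simply the average of the neighbors' stored chroma; mismatched or black neighbors fall out of the average.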

Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Unit Testing

- Important to unit test packing functions
- Easy to miss collisions
- Easy to miss precision issues
- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions
- Desirable to test on the GPU
- WebGL has no support for readPixels on floating point textures
- Requires packing

Unit Testing

- 2^24 is not a very large number
- Can exhaustively test the entire domain with a 4096 x 4096 render target
- Assign each pixel a unique integer ID
- Pack the ID
- Unpack the ID
- Compare the unpacked ID to the pixel ID
- Write a success / fail color
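The uint24 <-> 8_8_8 helpers under test are not shown in the transcript. A JavaScript sketch of the same scheme makes the roundtrip being verified concrete; integer ops are used here, where the GLSL versions must emulate them with float arithmetic (the function names echo the shader's but are my reconstruction):

```javascript
// Split a uint24 into three bytes, high to low.
function uint24ToUint888(v) {
  return [(v >>> 16) & 0xff, (v >>> 8) & 0xff, v & 0xff];
}

// Normalize bytes to 0..1, as written to an 8-bit-per-channel render target.
function uint888ToSample(bytes) {
  return bytes.map((b) => b / 255.0);
}

// Recover bytes from normalized channel values read back from the target.
function sampleToUint888(sample) {
  return sample.map((s) => Math.round(s * 255.0));
}

// Reassemble the uint24 from three bytes.
function uint888ToUint24([hi, mid, lo]) {
  return hi * 65536 + mid * 256 + lo;
}
```

Looping a pixel ID through all four steps must return it unchanged for every value in [0, 2^24); any collision or precision loss shows up as a red pixel in the shader test below.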

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));

  if (expectedDecoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001


Unit Testing

- Important to unit test packing functions

- Easy to miss collisions

- Easy to miss precision issues

- Watch out for GLSL functions such as mod() that expand to multiple arithmetic instructions

- Desirable to test on the GPU

- WebGL has no support for readPixels on floating point textures

- Requires packing

Unit Testing

- 2^24 is not a very large number

- Can exhaustively test the entire domain with a 4096 x 4096 render target

- Assign each pixel a unique integer ID

- Pack the ID

- Unpack the ID

- Compare the unpacked ID to the pixel ID

- Write a success / fail color
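The exhaustive round-trip test can be sketched on the CPU as well. Below is a minimal Python stand-in for the GLSL helpers (the `uint24_to_uint8_8_8` family here is an illustrative reimplementation, not Floored's shader code), round-tripping IDs through normalized 0..1 channel samples the way a texture write would:

```python
def uint24_to_uint8_8_8(v):
    """Split a 24-bit integer into three 8-bit channel values."""
    return (v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF)

def uint8_8_8_to_uint24(c):
    r, g, b = c
    return r | (g << 8) | (b << 16)

def uint8_8_8_to_sample(c):
    """8-bit channel integers -> normalized 0..1 samples, as stored in a texture."""
    return tuple(x / 255.0 for x in c)

def sample_to_uint8_8_8(s):
    return tuple(int(round(x * 255.0)) for x in s)

def round_trip_ok(v):
    encoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(v))
    decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded))
    return decoded == v

# The GPU version walks all 2**24 IDs via a 4096 x 4096 target; here we
# spot-check the boundaries plus a strided sweep of the domain.
assert all(round_trip_ok(v) for v in (0, 1, 255, 256, 2**24 - 1))
assert all(round_trip_ok(v) for v in range(0, 2**24, 9973))
```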

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value
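The two-pass flow can also be simulated CPU-side, with a bytearray standing in for the render target between the pack and unpack passes. This is a sketch; `WIDTH` and the pass structure are illustrative, not the engine's test harness:

```python
# Pass 1 packs each pixel ID into a byte "texture"; pass 2 reads the bytes
# back and unpacks, counting mismatches.
WIDTH = 256  # a 256 x 256 "target" instead of 4k x 4k, for the sketch

def pack_pass(texture):
    for pixel_id in range(WIDTH * WIDTH):
        texture[pixel_id * 3 + 0] = pixel_id & 0xFF
        texture[pixel_id * 3 + 1] = (pixel_id >> 8) & 0xFF
        texture[pixel_id * 3 + 2] = (pixel_id >> 16) & 0xFF

def unpack_pass(texture):
    failures = 0
    for pixel_id in range(WIDTH * WIDTH):
        decoded = (texture[pixel_id * 3]
                   | texture[pixel_id * 3 + 1] << 8
                   | texture[pixel_id * 3 + 2] << 16)
        if decoded != pixel_id:
            failures += 1
    return failures

texture = bytearray(WIDTH * WIDTH * 3)
pack_pass(texture)
assert unpack_pass(texture) == 0
```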

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
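The octahedral mapping [Cigolle 14] can be sketched in a few lines of CPU-side Python. `oct_encode` / `oct_decode` are illustrative names (not the engine's `octohedronEncode` shader helpers), and quantization to 14 bits is omitted:

```python
import math

def _sign(x):
    # sign() with sign(0) == 1, as octahedral encoding requires
    return 1.0 if x >= 0.0 else -1.0

def oct_encode(n):
    """Unit vector -> 2D point in the 0..1 square."""
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)  # project onto the octahedron
    x, y = x / s, y / s
    if z < 0.0:  # fold the lower hemisphere over the diagonals
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return (x * 0.5 + 0.5, y * 0.5 + 0.5)

def oct_decode(e):
    u, v = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(u) - abs(v)
    if z < 0.0:  # unfold the lower hemisphere
        u, v = (1.0 - abs(v)) * _sign(u), (1.0 - abs(u)) * _sign(v)
    length = math.sqrt(u * u + v * v + z * z)
    return (u / length, v / length, z / length)

n = (0.267, -0.534, 0.802)  # approximately unit length
decoded = oct_decode(oct_encode(n))
assert sum(a * b for a, b in zip(n, decoded)) > 0.999
```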

Emission

- Don't pack emission; forward render it

- Avoid another vec3 in the G-Buffer

- Emission only needs to be accessed when adding to the light accumulation buffer; not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
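For reference, the RGB to YCoCg transform and its inverse can be sketched as below. This is the commonly published transform; the deck's `rgbToYcocg` shader helper is assumed to match it, possibly up to scaling:

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (r * 0.25 + g * 0.5 + b * 0.25,   # Y: luminance
            r * 0.5 - b * 0.5,               # Co: orange-blue chroma
            -r * 0.25 + g * 0.5 - b * 0.25)  # Cg: green-purple chroma

def ycocg_to_rgb(ycocg):
    # Exact inverse of the transform above.
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)

# The round trip is lossless in full precision; loss only comes from
# quantization and the checkerboard chroma subsampling.
rgb = (0.25, 0.5, 0.75)
back = ycocg_to_rgb(rgb_to_ycocg(rgb))
assert all(abs(a - b) < 1e-9 for a, b in zip(rgb, back))
```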

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type
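Why 24 usable bits per float channel: a 32-bit float has a 24-bit significand (23 stored bits plus one implicit), so every integer up to 2^24 is exact and 2^24 + 1 is not. A small sketch round-tripping through float32 via the standard library:

```python
import struct

def to_float32(x):
    # Round-trip a Python float through 32-bit IEEE 754 storage.
    return struct.unpack('f', struct.pack('f', float(x)))[0]

assert to_float32(2**24) == 2**24          # still exact
assert to_float32(2**24 + 1) != 2**24 + 1  # first integer float32 cannot hold
```

This is why the packing functions treat each float channel as a uint24, and why the exhaustive unit test domain is exactly 4096 x 4096 = 2^24 pixels.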

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical; depth precision is the real killer here
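The depth problem can be made concrete: a 16-bit float has only an 11-bit significand, so integers are exact only up to 2^11 = 2048, and a 15-bit depth field packed into a half channel quantizes heavily. A sketch using Python's binary16 struct format (`'e'`):

```python
import struct

def to_float16(x):
    # Round-trip a Python float through 16-bit IEEE 754 storage.
    return struct.unpack('e', struct.pack('e', float(x)))[0]

assert to_float16(2048) == 2048
assert to_float16(2049) != 2049  # rounds to a neighboring representable value
```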

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate; probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
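The 10 + 14 bit sharing of one 24-bit channel can be sketched on the CPU. The function names below are illustrative counterparts of the shader helpers, operating on integers rather than normalized floats:

```python
def uint10_14_to_uint24(v10, v14):
    """Pack a 10-bit value and a 14-bit value into one 24-bit integer."""
    assert 0 <= v10 < 1024 and 0 <= v14 < 16384
    return v10 << 14 | v14

def uint24_to_uint10_14(v):
    return v >> 14, v & 0x3FFF

def quantize_velocity(v_pixels):
    """Clamp pixel velocity into the representable -512..511 range, floor, bias."""
    q = max(-512.0, min(511.0, v_pixels))
    return int(q // 1) + 512  # floor + bias, as in the shader

packed = uint10_14_to_uint24(quantize_velocity(-3.7), 12345)
v10, v14 = uint24_to_uint10_14(packed)
assert (v10 - 512, v14) == (-4, 12345)  # velocity floored, normal intact
```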

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
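The sign-bit trick is worth spelling out: metallic is stored as the sign of depth, with metallic represented as +1.0 / -1.0 as the shader comment implies. A minimal CPU sketch:

```python
def pack_depth_metallic(depth, metallic):
    # Depth must be strictly positive so the sign carries the flag
    # (zero / negative packed depth is reserved for "infinity").
    assert depth > 0.0
    return depth * (1.0 if metallic else -1.0)

def unpack_depth_metallic(packed):
    # abs() recovers depth; the sign recovers the metallic flag.
    return abs(packed), packed > 0.0

assert unpack_depth_metallic(pack_depth_metallic(7.5, True)) == (7.5, True)
assert unpack_depth_metallic(pack_depth_metallic(7.5, False)) == (7.5, False)
```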

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float render target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of the representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in a non-linear space to distribute precision perceptually

  // Color stored as sRGB->YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%


Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted; approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
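The behavior claimed above can be checked numerically: the Y component of the YC variant matches scalar Schlick exactly, and the chroma term fades to zero at grazing incidence (vDotH approaching 0), whitening the response. A small Python sketch (illustrative CPU port, not the shader code):

```python
def fresnel_schlick(v_dot_h, f0):
    # Scalar Schlick: lerp from F0 toward 1.0 as the angle grazes.
    power = (1.0 - v_dot_h) ** 5.0
    return (1.0 - f0) * power + f0

def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - f0_y) * power + f0_y,  # luminance: same shape as RGB Schlick
            f0_c * -power + f0_c)         # chroma: inverted, fades at grazing

y, c = fresnel_schlick_yc(0.0, 0.04, 0.01)  # grazing incidence
assert abs(y - 1.0) < 1e-9 and abs(c) < 1e-9
assert abs(fresnel_schlick_yc(0.7, 0.04, 0.01)[0]
           - fresnel_schlick(0.7, 0.04)) < 1e-12
```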

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
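How good is the exp2 substitute? A quick sweep comparing pow(1 - vDotH, 5) against the spherical gaussian form with the constants from the slide shows the two curves stay close over the whole 0..1 range:

```python
def pow5(v):
    return (1.0 - v) ** 5.0

def sg_approx(v):
    # Spherical gaussian approximation of pow5 [Lagarde 12].
    return 2.0 ** ((-5.55473 * v - 6.98316) * v)

# Sample vDotH across [0, 1] and measure the worst-case deviation.
max_err = max(abs(pow5(i / 100.0) - sg_approx(i / 100.0)) for i in range(101))
assert max_err < 0.02  # close enough for a Fresnel falloff curve
```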

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
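The weighting logic is easy to exercise on the CPU. Below is a hedged Python port of the function above (each sample is a (luma, chroma) pair; `sensitivity` mirrors the SENSITIVITY constant), showing that neighbors with luma close to the center dominate the reconstruction:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    chroma_sum, total = 0.0, 0.0
    for luma, chroma in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        w *= 0.0 if luma < 1e-5 else 1.0  # guard black samples, as step() does
        chroma_sum += chroma * w
        total += w
    if total <= 1e-5:  # guard the all-zero-weight case
        return (0.0, 0.0)
    return (center[1], chroma_sum / total)

# Two neighbors share the center's luma; the luma-distant and black
# neighbors contribute almost nothing, so reconstruction lands near 0.3.
center = (0.5, 0.2)
neighbors = [(0.5, 0.3), (0.5, 0.3), (0.05, -0.9), (0.0, 0.9)]
own_c, rec_c = reconstruct_chroma_hdr(center, neighbors)
assert abs(rec_c - 0.3) < 0.01
```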

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Packing Unit Test: Single Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  // Encode, decode, and compare.
  vec3 expectedEncoded = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
  float expectedDecoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(expectedEncoded));
  if (expectedDecoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

Unit Testing

- Single pass verifies our packing functions are mathematically correct
- Pass 1: Pack data, unpack data, compare to expected value
- In practice we will write / read from textures in between the pack / unpack phases
- Better to run a more exhaustive two-pass test
- Pass 1: Pack data, render to texture
- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, not enough
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing successful.
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed.
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
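
The octahedral mapping can be sketched outside the shader as well. This JavaScript version is an illustration of the same idea; the names octEncode / octDecode are illustrative, not the deck's actual GLSL helpers, and production code would replace Math.sign with a signNotZero to avoid zeros on the fold seam:

```javascript
// Octahedral normal encoding [Cigolle 14]: project the unit sphere onto an
// octahedron, then unfold the lower hemisphere into the corners of the unit
// square. Cheap, uses the full 0..1 domain, reasonably uniform discretization.
function octEncode(n) {
  // n is a unit-length [x, y, z] normal.
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let u = n[0] * invL1;
  let v = n[1] * invL1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const oldU = u;
    u = (1.0 - Math.abs(v)) * Math.sign(oldU);
    v = (1.0 - Math.abs(oldU)) * Math.sign(v);
  }
  // Remap from [-1, 1] to [0, 1] for storage.
  return [u * 0.5 + 0.5, v * 0.5 + 0.5];
}

function octDecode(e) {
  // Remap from [0, 1] back to [-1, 1].
  const u = e[0] * 2.0 - 1.0;
  const v = e[1] * 2.0 - 1.0;
  let x = u;
  let y = v;
  const z = 1.0 - Math.abs(u) - Math.abs(v);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    x = (1.0 - Math.abs(v)) * Math.sign(u);
    y = (1.0 - Math.abs(u)) * Math.sign(v);
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Round-tripping a normal through encode / decode recovers the original direction up to quantization of the stored 2D value.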

Emission

- Don't pack emission; forward render it
- Avoids another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer; not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system is sensitive to luminance shifts
- Human perceptual system is fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]
- Store chroma components at a lower frequency
- Write 2 components of the signal, alternating between chroma bases
- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
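
The luminance / chroma split is just a linear transform. A minimal JavaScript sketch of the RGB to YCoCg pair and its inverse (scalar stand-ins for the deck's rgbToYcocg / YcocgToRgb shader helpers):

```javascript
// RGB -> YCoCg: Y carries luminance, Co/Cg carry chroma. Chroma can then be
// stored at half frequency (checkerboarded) with little perceptual loss.
function rgbToYcocg(r, g, b) {
  const y  =  0.25 * r + 0.5 * g + 0.25 * b;
  const co =  0.5  * r           - 0.5  * b;
  const cg = -0.25 * r + 0.5 * g - 0.25 * b;
  return [y, co, cg];
}

// YCoCg -> RGB: the exact inverse of the transform above.
function ycocgToRgb(y, co, cg) {
  const tmp = y - cg;
  return [tmp + co, y + cg, tmp - co];
}
```

The transform is exactly invertible, so dropping one chroma channel per pixel (the checkerboard) is the only lossy step.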

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
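
The uint24 to 8/8/8 split these formats rely on can be illustrated with integer bit twiddling. A JavaScript sketch (names are illustrative; the GLSL versions necessarily use float arithmetic since WebGL 1 shaders have no bitwise operators):

```javascript
// Split a uint24 into three 8-bit channel values (0..255) and reassemble.
// In the shader, each channel would additionally be normalized to 0..1
// before being written to an 8-bit render target channel.
function uint24ToUint8_8_8(x) {
  const hi  = (x >> 16) & 0xff;
  const mid = (x >>  8) & 0xff;
  const lo  =  x        & 0xff;
  return [hi, mid, lo];
}

function uint8_8_8ToUint24(c) {
  return (c[0] << 16) | (c[1] << 8) | c[2];
}
```

A 4k x 4k canvas has 16,777,216 pixels, which is exactly the number of distinct uint24 values, which is why the unit tests above cover the full range.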

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: NormalX 12 bits | NormalY 12 bits
B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits
A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit
G: NormalX 9 bits (+ sign bit) | Gloss 3 bits
B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits
G: VelocityX 10 bits | NormalX 14 bits
B: VelocityY 10 bits | NormalY 14 bits
A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511
  // quantized pixel velocity. -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth; extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
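
The velocity quantization above can be sanity checked on the CPU. This JavaScript sketch works on one scalar component and assumes SUB_PIXEL_PRECISION_STEPS = 4, an illustrative value not confirmed by the deck:

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4.0; // assumed sub-pixel precision

// Quantize one component of screen-space velocity (NDC, -1..1) into the
// 10-bit 0..1023 range used by the G-Buffer.
function quantizeVelocity(v, resolution) {
  // -512..511 quantized pixel velocity; -512 and 511 both represent infinity.
  let q = v * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.max(-512.0, Math.min(q, 511.0)));
  return q + 512.0; // bias into 0..1023 for 10-bit storage
}

function dequantizeVelocity(q, resolution) {
  const v = q - 512.0;
  if (Math.abs(v) > 510.0) {
    // Out of representable range: throw it outside of screenspace
    // (sqrt(2) + 1e-3) for culling in later passes.
    return 1.41521356;
  }
  return v / (resolution * SUB_PIXEL_PRECISION_STEPS); // UV-space velocity
}
```

Note the decoded value comes back in UV units (half the NDC magnitude), matching the deck's multiply by inverseGBufferResolution on decode; round-tripping loses only sub-quarter-pixel detail.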

Packing Challenges

- Must balance packing efficiency against the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float render target
- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

  // Decode Depth. Early out if sampling infinity.
  res.depth = abs(encodedGBuffer.w);
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

  // Decode Metallic.
  res.metallic = sign(encodedGBuffer.w);

  // Decode Normal.
  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

  // Decode Velocity.
  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

  // Decode Gloss.
  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

  // Decode Color YC.
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;
  // Now we need to reconstruct the missing chroma sample in order to light
  // our G-Buffer in RGB space.

  // Sample G-Buffer Cross Neighborhood.
  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

  // Decode G-Buffer Cross Neighborhood Color YC.
  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

  // Decode G-Buffer Cross Neighborhood Depth.
  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

  // Guard against chroma samples at infinity by setting their luminance
  // and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

  // Reconstruct the missing chroma sample based on luminance similarity.
  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

  // Swizzle chroma samples based on the subsampled checkerboard layout.
  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

  // Color is stored in a non-linear space to distribute precision
  // perceptually (sRGB -> YCoCg). Return it as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later in the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance
- Keep Fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
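
A minimal sketch of such frame-alternating checkerboard selection. The deck's getCheckerboard works in UV space; this integer-pixel version with an explicit frame index is illustrative only:

```javascript
// Per-pixel checkerboard selector: decides which chroma component (Co or Cg)
// a pixel stores. Folding in the frame parity flips the pattern every frame,
// so a temporal filter sees both chroma components at each pixel over time.
function getCheckerboard(px, py, frameIndex) {
  return ((px + py + frameIndex) & 1) === 0 ? 1.0 : -1.0;
}
```

Horizontally or vertically adjacent pixels always get opposite signs within a frame, and each pixel's sign flips between consecutive frames.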

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space
- Modify the incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes a vec2 chroma
- Modify the BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation: the same
- Chroma calculation: inverted, approaches zero at perpendicular

YC Lighting: RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting: YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.

YC Lighting: Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting: Write YC to the RG components of the render target

- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting: Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting: Reconstruct the missing chroma component in a post process

- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass; plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting: Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);

  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
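
For reference, the same reconstruction can be exercised on the CPU. This JavaScript port is a hedged sketch (loop form instead of vec4 arithmetic; neighbor samples are [luminance, chroma] pairs):

```javascript
// Luminance-weighted chroma reconstruction: neighbors whose luminance is
// close to the center's get exponentially more say in the missing chroma.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chroma = 0.0;
  for (const [lum, chr] of [a1, a2, a3, a4]) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    if (lum < 1e-5) w = 0.0; // guard: sample is black (step() in GLSL)
    totalWeight += w;
    chroma += chr * w;
  }
  // Guard: all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0.0, 0.0];
}
```

With four neighbors of identical luminance the result is a plain average of their chroma; a neighbor across a strong luminance edge is effectively ignored.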

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Unit Testing

- A single pass verifies our packing functions are mathematically correct

- Pass 1: Pack data, unpack data, compare to expected value

- In practice we will write to / read from textures in between the pack / unpack phases

- Better to run a more exhaustive two pass test

- Pass 1: Pack data, render to texture

- Pass 2: Read texture, unpack data, compare to expected value

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;
  gl_FragColor.rgb = uint8_8_8_to_sample(uint24_to_uint8_8_8(expected));
}

- Pass 1: Pack data, render to texture

Packing Unit Test Two Pass

void main() {
  // Covers the range of all uint24 with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here. It's mediump in WebGL: not enough precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));
  if (decoded == expected) {
    // Packing Successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing Failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
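Outside the shader, the packing math itself is easy to sanity check on the CPU. Below is a minimal JavaScript sketch of the uint24 ↔ uint8_8_8 split; the helper names mirror the GLSL above, but this is an illustrative port, not Floored's actual code:

```javascript
// Illustrative port of the uint24 <-> uint8_8_8 helpers. Every intermediate
// value stays exactly representable as a float, which is what makes the
// GLSL version viable in a float render target.
function uint24ToUint888(x) {
  const hi = Math.floor(x / 65536.0);                  // top 8 bits
  const mid = Math.floor((x - hi * 65536.0) / 256.0);  // middle 8 bits
  const lo = x - hi * 65536.0 - mid * 256.0;           // bottom 8 bits
  return [hi, mid, lo];
}

function uint888ToUint24(v) {
  return v[0] * 65536.0 + v[1] * 256.0 + v[2];
}

// Round trip across the uint24 boundaries.
for (const x of [0, 1, 255, 256, 65535, 65536, 12345678, 16777215]) {
  if (uint888ToUint24(uint24ToUint888(x)) !== x) {
    throw new Error("round trip failed at " + x);
  }
}
```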

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity


Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
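To make the encoding concrete, here is an illustrative JavaScript sketch of octahedral mapping in the spirit of [Cigolle 14] (function names are assumptions, not Floored's implementation); the lower hemisphere is folded over the diagonals so the whole sphere lands in the unit square:

```javascript
// Octahedral normal encoding: project a unit vector onto the octahedron,
// fold the lower hemisphere, then remap [-1, 1] to the full [0, 1] domain.
function octahedronEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    // Fold the lower hemisphere over the diagonals.
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5];
}

function octahedronDecode(e) {
  let x = e[0] * 2.0 - 1.0;
  let y = e[1] * 2.0 - 1.0;
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    // Unfold the lower hemisphere.
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  const len = Math.sqrt(x * x + y * y + z * z);
  return [x / len, y / len, z / len];
}
```

Before quantization the mapping round trips exactly (up to floating point), which is what makes it safe to then discretize to 14, 12, or 9 bits per component as in the format tables below.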

Emission

- Don't pack emission: forward render it

- Avoids another vec3 in the G-Buffer

- Emission only needs to be accessed when adding to the light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches and textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
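The transform pair and the checkerboard selection can be sketched as follows (an illustrative JavaScript port; names are assumptions mirroring the GLSL helpers used later):

```javascript
// YCoCg forward transform: Y is a weighted average (luminance-like),
// Co and Cg are signed chroma differences in [-0.5, 0.5] for inputs in [0, 1].
function rgbToYcocg(c) {
  const y  =  0.25 * c[0] + 0.5 * c[1] + 0.25 * c[2];
  const co =  0.5  * c[0]              - 0.5  * c[2];
  const cg = -0.25 * c[0] + 0.5 * c[1] - 0.25 * c[2];
  return [y, co, cg];
}

// Exact inverse: R = Y + Co - Cg, G = Y + Cg, B = Y - Co - Cg.
function ycocgToRgb(c) {
  const tmp = c[0] - c[2]; // Y - Cg
  return [tmp + c[1], c[0] + c[2], tmp - c[1]];
}

// Keep Co on "even" pixels and Cg on "odd" pixels of a 2D checkerboard,
// halving chroma storage [Mavridis 12].
function checkerboardInterlace(coCg, pixelX, pixelY) {
  return ((pixelX + pixelY) % 2 === 0) ? coCg[0] : coCg[1];
}
```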

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: VelocityX 10 bits | NormalX 14 bits

B: VelocityY 10 bits | NormalY 14 bits

A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: NormalX 12 bits | NormalY 12 bits

B: Depth 31 bits | Metallic 1 bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 bits | ColorC 5 bits (+ sign bit)

G: NormalX 9 bits (+ sign bit) | Gloss 3 bits

B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

A: Depth 15 bits | Metallic 1 bit

- RGBA Half-float, 64bpp

- Half-float target more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 bits | ColorC 4 bits | Metallic 1 bit

G: NormalX 9 bits (+ sign bit) | Gloss 3 bits

B: NormalY 9 bits (+ sign bit) | Gloss 3 bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate: probably too discretized

- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 bits | ColorC 8 bits | Gloss 8 bits

G: VelocityX 10 bits | NormalX 14 bits

B: VelocityY 10 bits | NormalY 14 bits

A: Depth 31 bits | Metallic 1 bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}
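The uint10_14_to_uint24 lane packing used above is plain fixed-point arithmetic; a hypothetical JavaScript equivalent (the 10-high / 14-low bit split is assumed from the format table):

```javascript
// Pack a 10-bit value (high) and a 14-bit value (low) into one number.
// Max result is 1023 * 16384 + 16383 = 2^24 - 1, which still fits exactly
// in a single-precision float mantissa, so it survives a Float render target.
function uint10_14ToUint24(v) {
  return v[0] * 16384.0 + v[1]; // v[0] in [0, 1023], v[1] in [0, 16383]
}

function uint24ToUint10_14(x) {
  const hi = Math.floor(x / 16384.0);
  return [hi, x - hi * 16384.0];
}
```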

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
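The depth/metallic scheme amounts to folding a boolean into the sign bit, which is why depth decode stays a single abs(). An illustrative sketch (assuming depth > 0, with 0 reserved for infinity; names are not Floored's):

```javascript
// Metallic rides in the sign of depth: positive = metallic, negative = dielectric.
function packDepthMetallic(depth, metallic) {
  return metallic ? depth : -depth;
}

function unpackDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: w > 0.0 };
}
```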

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float render target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes.
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color is stored in a non-linear space to distribute precision perceptually

  // Color is stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Or keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- The chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
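The two formulations agree exactly: YCoCg is a linear transform, Schlick's formula blends toward white, and white is (1, 0, 0) in YCoCg, so the chroma channels simply scale toward zero. A quick illustrative check in JavaScript (not production code):

```javascript
// Schlick's approximation in RGB, per channel.
function fresnelSchlickRGB(vDotH, r0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return r0.map((r) => (1.0 - r) * p + r);
}

// Same approximation evaluated directly on (Y, one chroma component).
function fresnelSchlickYC(vDotH, r0YC) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - r0YC[0]) * p + r0YC[0], r0YC[1] * -p + r0YC[1]];
}

function rgbToYcocg(c) {
  return [0.25 * c[0] + 0.5 * c[1] + 0.25 * c[2],
          0.5 * c[0] - 0.5 * c[2],
          -0.25 * c[0] + 0.5 * c[1] - 0.25 * c[2]];
}

// Evaluate both paths for an arbitrary reflection coefficient.
const r0 = [0.9, 0.6, 0.3];
const viaRgb = rgbToYcocg(fresnelSchlickRGB(0.3, r0));
const yc0 = rgbToYcocg(r0);
const direct = fresnelSchlickYC(0.3, [yc0[0], yc0[1]]);
if (Math.abs(direct[0] - viaRgb[0]) > 1e-12 || Math.abs(direct[1] - viaRgb[1]) > 1e-12) {
  throw new Error("YC Fresnel does not match RGB Fresnel");
}
```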

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
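An illustrative CPU port of the same filter, useful for probing the weighting behavior outside the shader (neighbors are passed as [luminance, chroma] pairs; this mirrors the GLSL but is not Floored's code):

```javascript
// Weight neighbor chroma by luminance similarity, guarding black samples
// and the all-zero-weight case, exactly as the GLSL version does.
function reconstructChromaHDR(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chroma = 0.0;
  for (const [luma, c] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    w *= luma >= 1e-5 ? 1.0 : 0.0; // guard the case where a sample is black
    totalWeight += w;
    chroma += c * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chroma / totalWeight] : [0.0, 0.0];
}
```

With equal neighbor luminances the result degenerates to a plain average of neighbor chroma, and the sharper the luminance edge, the harder mismatched neighbors are rejected.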

Thanks for listening

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012


- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Packing Unit Test: Two Pass

void main() {
  // Covers the range of all uint24 values with a 4k x 4k canvas.
  // Avoid floor(gl_FragCoord) here: it's mediump in WebGL, which lacks the
  // precision to uniquely identify pixels in a 4k target.
  vec2 pixelCoord = floor(vUV * pass_uViewportResolution);
  float expected = pixelCoord.y * pass_uViewportResolution.x + pixelCoord.x;

  vec3 encoded = texture2D(encodedSampler, vUV).xyz;
  float decoded = uint8_8_8_to_uint24(sample_to_uint8_8_8(encoded));

  if (decoded == expected) {
    // Packing successful
    gl_FragColor = vec4(0.0, 1.0, 0.0, 1.0);
  } else {
    // Packing failed
    gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0);
  }
}

- Pass 2: Read texture, unpack data, compare to expected value
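The round trip this unit test verifies can be sketched outside the shader. These JavaScript helpers are hypothetical stand-ins for the GLSL `uint24_to_uint8_8_8` / `sample_to_uint8_8_8` family, assuming the usual normalized 8-bit-per-channel storage:

```javascript
// Sketch of the uint24 <-> three 8-bit channel round trip the unit test
// verifies. Channel values mimic a normalized RGB8 texture fetch.
function uint24ToUint888(x) {
  // Split a 0..16777215 integer into three 8-bit digits.
  return [Math.floor(x / 65536), Math.floor(x / 256) % 256, x % 256];
}
function uint888ToUint24(d) {
  return d[0] * 65536 + d[1] * 256 + d[2];
}
// Simulate storage in a normalized 8-bit texture and the read back.
function uint888ToSample(d) { return d.map(v => v / 255); }
function sampleToUint888(s) { return s.map(v => Math.round(v * 255)); }

function roundTrip(x) {
  const stored = uint888ToSample(uint24ToUint888(x));
  return uint888ToUint24(sampleToUint888(stored));
}
```

Because each digit is an integer in 0..255, the divide-by-255 / round-trip is exact, which is exactly the property the two-pass shader test asserts per pixel.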

G-Buffer Packing: Compression

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to a 2D basis

- Reasonably uniform discretization across the sphere

- Uses the full 0 to 1 domain

- Cheap encode / decode
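The octahedral mapping can be sketched in JavaScript; this follows the standard [Cigolle 14] construction (project onto the octahedron, fold the lower hemisphere, remap to the 0..1 square) and is an illustration, not the deck's exact shader code:

```javascript
// Octahedral normal encoding: unit vec3 <-> point in the 0..1 square.
function octEncode(n) {
  const invL1 = 1 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1, y = n[1] * invL1;
  if (n[2] < 0) {
    // Fold the lower hemisphere across the diagonals.
    const fx = (1 - Math.abs(y)) * Math.sign(x || 1);
    const fy = (1 - Math.abs(x)) * Math.sign(y || 1);
    x = fx; y = fy;
  }
  return [x * 0.5 + 0.5, y * 0.5 + 0.5]; // full 0..1 domain
}
function octDecode(e) {
  let x = e[0] * 2 - 1, y = e[1] * 2 - 1;
  const z = 1 - Math.abs(x) - Math.abs(y);
  if (z < 0) {
    // Unfold the lower hemisphere.
    const fx = (1 - Math.abs(y)) * Math.sign(x || 1);
    const fy = (1 - Math.abs(x)) * Math.sign(y || 1);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Quantizing each encoded component to 14 bits (as in the 128bpp format below) then gives a near-uniform discretization over the whole sphere.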

Emission

- Don't pack emission. Forward render it.

- Avoids another vec3 in the G-Buffer

- Emission only needs to be accessed when adding to the light accumulation buffer,
  not many times per frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- The human perceptual system is sensitive to luminance shifts

- The human perceptual system is fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal, alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
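The YCoCg basis itself is a cheap linear transform. A minimal sketch, using the standard RGB to YCoCg convention (the shader presumably does the same in `rgbToYcocg` / `YcocgToRgb`):

```javascript
// RGB <-> YCoCg, the perceptual basis used for the G-Buffer color channel.
// Luminance (Y) keeps full resolution; Co/Cg are checkerboard-subsampled.
function rgbToYcocg(c) {
  const r = c[0], g = c[1], b = c[2];
  return [ r / 4 + g / 2 + b / 4,   // Y
           r / 2 - b / 2,           // Co
          -r / 4 + g / 2 - b / 4 ]; // Cg
}
function ycocgToRgb(c) {
  const y = c[0], co = c[1], cg = c[2];
  return [y + co - cg, y + cg, y - co - cg];
}
```

Note that grayscale colors land on Co = Cg = 0, which is why throwing away one chroma component per pixel costs so little perceptually.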

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96bpp

- Throw out velocity, discretize normals a bit more

- In practice, not a reliable bandwidth saving: RGB Float is deprecated in
  WebGL. Could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64bpp

- Half-float target is more challenging

- Probably not practical. Depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile, where mediump / 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp

- Let's take a look at packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to 0.0 to 1.0.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space
  sampling shaders such as AO
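The depth/metallic trick rides a boolean in the sign bit of the depth channel, so depth decodes with a single abs() and the flag with sign(). A minimal sketch (function names are illustrative, not the shader's):

```javascript
// Pack a boolean into the sign bit of a positive depth value.
function packDepthMetallic(depth, metallic) {
  // depth > 0; metallic maps to +1 / -1 so the payload survives in one float.
  return depth * (metallic ? 1.0 : -1.0);
}
// Depth is recovered with abs(), the flag with the sign.
function decodeDepth(packed) { return Math.abs(packed); }
function decodeMetallic(packed) { return packed > 0 ? 1 : 0; }
```

This is also why the decode pass can treat a non-positive stored depth as "sampling infinity": a cleared G-Buffer channel holds 0, which abs() maps to depth 0.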

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
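The velocity quantization round trip can be sketched in JavaScript. `STEPS` (sub-pixel precision steps per pixel) is an assumed value for illustration; note the encoder's 0.5 factor means the decoded velocity comes back in UV space (half the NDC delta):

```javascript
// Sketch of the 10-bit velocity quantization used by the packing code.
// STEPS is an assumed sub-pixel precision constant, not the deck's value.
const STEPS = 4;

function quantizeVelocity(vNdc, resolution) {
  // NDC-space -1..1 velocity -> signed sub-pixel units, clamped to 10 bits.
  let v = vNdc * resolution * STEPS * 0.5;
  v = Math.floor(Math.max(-512, Math.min(511, v)));
  return v + 512; // biased to 0..1023 for storage
}

function dequantizeVelocity(q, resolution) {
  const v = q - 512;
  // -512 and 511 both represent out-of-range / infinity.
  if (Math.abs(v) > 510) return Number.POSITIVE_INFINITY;
  return v / (resolution * STEPS); // UV-space velocity
}
```

The quantization error is bounded by one sub-pixel step, i.e. 1 / (resolution * STEPS) in UV units.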

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light
  our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer: RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color is stored in a non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 specular reflectance

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg, Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count (25% dimensions is 1/16 the pixels)

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look.

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame
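The checkerboard selector behind `getCheckerboard` / `checkerboardInterlace` can be sketched in JavaScript. The parity convention (even picks Co, odd picks Cg) and the frame offset are assumptions for illustration:

```javascript
// Checkerboard selector for chroma interlacing: +1 -> store Co, -1 -> store Cg.
// Adding the frame index alternates the pattern each frame, so a temporal
// pass sees both chroma bases at every pixel across two frames.
function getCheckerboard(px, py, frame) {
  return (px + py + frame) % 2 === 0 ? 1 : -1;
}

function checkerboardInterlace(co, cg, px, py, frame) {
  return getCheckerboard(px, py, frame) > 0 ? co : cg;
}
```

The decode-side swizzle seen earlier (`colorYcocg.yz` vs `colorYcocg.zy`) is the mirror image of this choice: it puts the stored component back into the correct Co or Cg slot.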

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: it approaches zero at grazing angles, where Fresnel reflectance goes achromatic

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an
  ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD
  and an ADD from the skipped 3rd component.
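Why the chroma term simplifies: YCoCg is a linear transform and white has zero chroma, so Schlick evaluated in YCoCg must match evaluating in RGB and converting afterwards. A sketch that checks this (the copper-like F0 value is illustrative only, assuming the standard YCoCg convention):

```javascript
// Verify: Schlick's Fresnel commutes with the linear RGB -> YCoCg transform.
// White (the grazing limit) has Co = Cg = 0, so chroma just decays with power.
function rgbToYcocg(c) {
  const r = c[0], g = c[1], b = c[2];
  return [r / 4 + g / 2 + b / 4, r / 2 - b / 2, -r / 4 + g / 2 - b / 4];
}
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1 - vDotH, 5);
  return f0.map(c => (1 - c) * p + c);
}
function fresnelSchlickYC(vDotH, f0Yc) {
  const p = Math.pow(1 - vDotH, 5);
  return [(1 - f0Yc[0]) * p + f0Yc[0], f0Yc[1] * -p + f0Yc[1]];
}
```

The same argument holds per checkerboard pixel: whichever chroma basis (Co or Cg) the pixel stores, that component is just scaled by (1 - power).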

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
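The reconstruction filter ports directly to JavaScript for testing. This sketch assumes the same exp2 luminance-similarity falloff and black-sample guard; the SENSITIVITY default is read from the shader constant:

```javascript
// Luminance-similarity weighted chroma reconstruction, after the shader's
// reconstructChromaHDR. center: [luma, chroma]; neighbors: four [luma, chroma].
function reconstructChromaHDR(center, neighbors, sensitivity) {
  if (sensitivity === undefined) sensitivity = 25.0;
  let sum = 0, totalWeight = 0;
  for (const [luma, chroma] of neighbors) {
    // Neighbors with similar luminance get exponentially more weight.
    let w = Math.pow(2, -sensitivity * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0; // guard black / infinity samples
    sum += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are ~0.
  return totalWeight > 1e-5 ? [center[1], sum / totalWeight] : [0, 0];
}
```

Intuitively: chroma varies slowly where luminance varies slowly, so neighbors that match the center's luminance are trusted to supply the missing chroma basis.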

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/10.1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

G-Buffer PackingCompression

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Compression

- What surface properties can we compress to make packing easier

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (+ sign bit)

G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp

- Half-float target more challenging

- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit

G: NormalX 9 Bits (+ sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (+ sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized

- Maybe useful on mobile where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;

  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
  // -512 and 511 both represent infinity
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;

  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

  // Pack depth and metallic together
  // If not metallic, negate depth. Extract bool as sign()
  res.w = components.depth * components.metallic;

  return res;
}

- Phew, we're done

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
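The sign-bit trick above makes that fast path trivial. A minimal JavaScript mirror (assuming metallic is stored as ±1.0, as the negate-depth comment implies; names are illustrative):

```javascript
// Fold the metallic flag into depth's sign bit: metallic is +1.0 or -1.0.
function encodeDepthMetallic(depth, metallic) {
  return depth * metallic;
}

// Decode is a single abs() for depth, a single sign() for metallic --
// cheap enough to inline in AO / ray marching shaders.
function decodeDepthMetallic(w) {
  return { depth: Math.abs(w), metallic: Math.sign(w) };
}
```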

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;

  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
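The encode / decode pair above can be exercised end to end in JavaScript. This sketch assumes SUB_PIXEL_PRECISION_STEPS = 4 (a hypothetical value; the deck does not state it) and omits the 24-bit packing step:

```javascript
const SUB_PIXEL_PRECISION_STEPS = 4.0; // hypothetical; not stated in the deck

// NDC (-1..1) velocity -> biased 10-bit values in 0..1023.
function quantizeVelocity(velocityNdc, resolution) {
  return velocityNdc.map((v, i) => {
    let q = v * resolution[i] * SUB_PIXEL_PRECISION_STEPS * 0.5;
    q = Math.floor(Math.min(Math.max(q, -512.0), 511.0)); // clamp to 10-bit range
    return q + 512.0; // bias into 0..1023
  });
}

// Biased 10-bit values -> UV-space (0..1) velocity, as in the decode shader.
function dequantizeVelocity(quantized, resolution) {
  return quantized.map((q, i) => (q - 512.0) / (resolution[i] * SUB_PIXEL_PRECISION_STEPS));
}
```

Note the asymmetry: encode takes NDC velocity (spanning 2 units) while decode returns UV velocity (spanning 1 unit), so a roundtrip yields half the NDC value, matching the shader's use of the result in UV space.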

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer color component: YCoCg, checkerboard interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance (four detail-crop comparison slides)

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted; approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
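The luminance-similarity weighting is easy to exercise on the CPU. An illustrative JavaScript sketch of the same scheme (not Floored's shader code; the neighbor list stands in for the four cross samples):

```javascript
// center: [luma, chroma] of the current pixel; neighbors: four [luma, chroma] cross samples.
// Neighbors whose luminance is close to the center's get exponentially more weight.
function reconstructChromaHDRJs(center, neighbors) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let accum = 0.0;
  for (const [luma, chroma] of neighbors) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard black samples
    accum += chroma * w;
    totalWeight += w;
  }
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? [center[1], accum / totalWeight] : [0.0, 0.0];
}
```

With uniform luminance the result is just the average neighbor chroma; across a strong luminance edge the far side is effectively ignored, which is why artifacts concentrate at strong chroma boundaries with matching luminance.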

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com/, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994


[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Compression

- What surface properties can we compress to make packing easier?

- Surface Properties

- Normal

- Emission

- Color

- Gloss

- Metallic

- Depth

- Velocity

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode / decode
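A minimal CPU-side Python sketch of the octahedral mapping described above (function names are mine, not the shader helpers used later in the deck):

```python
import math

def octahedron_encode(n):
    # Project the unit normal onto the octahedron |x|+|y|+|z| = 1,
    # fold the lower hemisphere over, remap [-1, 1]^2 to [0, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    return (px * 0.5 + 0.5, py * 0.5 + 0.5)

def octahedron_decode(e):
    # Inverse mapping: unfold the lower hemisphere, then normalize.
    px, py = e[0] * 2.0 - 1.0, e[1] * 2.0 - 1.0
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * (1.0 if px >= 0.0 else -1.0),
                  (1.0 - abs(px)) * (1.0 if py >= 0.0 else -1.0))
    length = math.sqrt(px * px + py * py + z * z)
    return (px / length, py / length, z / length)
```

The mapping is bijective before quantization, so a round trip returns the input normal up to float error; the discretization comes from quantizing each component to 14 bits, as the G-Buffer formats below do.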

Emission

- Don't pack emission. Forward render.

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to the light accumulation buffer. Not accessed many times a frame like other material parameters.

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures, elevator switches, clocks, computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches / textures can be pre-transformed

- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
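The YCoCg transform referenced here is a cheap linear basis change; a Python sketch of the forward / inverse transform and the checkerboard chroma selection (function names are mine, assuming the [Mavridis 12] layout):

```python
def rgb_to_ycocg(rgb):
    r, g, b = rgb
    y  =  0.25 * r + 0.5 * g + 0.25 * b  # luminance-like component, in [0, 1]
    co =  0.5  * r           - 0.5  * b  # orange/cyan chroma axis, in [-0.5, 0.5]
    cg = -0.25 * r + 0.5 * g - 0.25 * b  # green/magenta chroma axis, in [-0.5, 0.5]
    return (y, co, cg)

def ycocg_to_rgb(ycocg):
    # Exact inverse: the transform is a lossless linear basis change.
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)

def checkerboard_interlace(co, cg, px, py):
    # Keep only one chroma component per pixel, alternating in a checkerboard.
    return co if (px + py) % 2 == 0 else cg
```

The chroma components are signed, which is why the packing code below adds a 0.5-ish bias before writing them to an unsigned 8-bit field.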

G-Buffer Packing: Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float 128bpp

- Sign bits of R, G, and B are available for use as flags

- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: NormalX 12 Bits | NormalY 12 Bits

B: Depth 31 Bits | Metallic 1 Bit

- RGB Float 96bpp

- Throw out velocity, discretize normals a bit more

- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL. Could be an RGBA Float texture under the hood.

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)

G: NormalX 9 Bits (sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical. Depth precision is the real killer here.

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit

G: NormalX 9 Bits (sign bit) | Gloss 3 Bits

B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from the renderbuffer

- Future work to evaluate. Probably too discretized.

- Maybe useful on mobile where mediump / 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits

G: VelocityX 10 Bits | NormalX 14 Bits

B: VelocityY 10 Bits | NormalY 14 Bits

A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float 128bpp

- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)

vec4 res;

// Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

// Pack depth and metallic together
// If not metallic, negate depth. Extract bool as sign()
res.w = components.depth * components.metallic;
return res;

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
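The pack helpers above (uint10_14_to_uint24 and friends) aren't shown in the deck; conceptually they shift two fields into a 24-bit integer, which is exactly representable in a float32 mantissa. A Python sketch of that layout plus the sign-bit depth / metallic trick (integer and bool arithmetic here; the shader does the equivalent with float math):

```python
def uint10_14_to_uint24(v10, v14):
    # 10-bit field in the high bits, 14-bit field in the low bits;
    # 24 bits fits losslessly in a float32 mantissa (values < 2^24).
    assert 0 <= v10 < (1 << 10) and 0 <= v14 < (1 << 14)
    return (v10 << 14) | v14

def uint24_to_uint10_14(packed):
    # Split the 24-bit value back into its 10-bit and 14-bit fields.
    return packed >> 14, packed & ((1 << 14) - 1)

def pack_depth_metallic(depth, metallic):
    # Metallic rides in the sign: negative depth means non-metallic.
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    # Recover depth as abs(), metallic as the sign test.
    return abs(w), w > 0.0
```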

Packing Challenges

- Must balance packing efficiency with cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;

vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity
if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
    // sqrt(2) + 1e-3
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting Rendered at 100%

YC Lighting Rendered at 100%

RGB Lighting Rendered at 25%

YC Lighting Rendered at 25%

Let's take a closer look

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic. We save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and ADD from the skipped 3rd component.
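Why the chroma term flips sign: Schlick lerps F0 toward white as vDotH falls off, and white is (1, 0, 0) in YCoCg, so luminance keeps the usual Schlick shape while each chroma component simply decays by (1 - power). A small Python check of that equivalence (helper names are mine):

```python
def rgb_to_ycocg(rgb):
    # Standard linear YCoCg basis change.
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    # Classic Schlick: lerp F0 toward white by (1 - vDotH)^5.
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_y, f0_c):
    # Luminance keeps the Schlick shape; chroma scales by (1 - p),
    # because white has zero chroma in YCoCg.
    p = (1.0 - v_dot_h) ** 5
    return ((1.0 - f0_y) * p + f0_y, f0_c * -p + f0_c)
```

Because the YCoCg transform is linear, evaluating Schlick in RGB and then transforming gives the same result as evaluating the YC form directly.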

YC Lighting

- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
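The same weighting can be checked on the CPU; a Python sketch mirroring reconstructChromaHDR (my function and parameter names):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma) of this pixel.
    # neighbors: four (luma, chroma) cross-neighborhood samples that carry
    # the *other* chroma component of the checkerboard.
    weights = []
    for luma, _ in neighbors:
        delta = abs(luma - center[0])
        w = 2.0 ** (-sensitivity * delta)  # exp2 falloff on luminance difference
        # Guard the case where a sample is black (e.g. at infinity)
        weights.append(w if luma >= 1e-5 else 0.0)
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are zero
        return (0.0, 0.0)
    other = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    # Return (known chroma, reconstructed chroma), like the GLSL version.
    return (center[1], other)
```

With equal neighbor luminances the reconstruction degenerates to a plain average of the neighbor chroma; luminance edges pull the weights toward the similar side, which is what suppresses chroma bleeding across strong boundaries.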

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Normal Compression

- Normal data encoded in octahedral space [Cigolle 14]

- Transform normal to 2D Basis

- Reasonably uniform discretization across the sphere

- Uses full 0 to 1 domain

- Cheap encode decode

Emission

- Donrsquot pack emission Forward render

- Avoid another vec3 in the G-Buffer

- Emission only needs access when adding to light accumulation buffer

Not accessed many times a frame like other material parameters

- Emissive surfaces are geometrically lightweight in common cases

- Light fixtures elevator switches clocks computer monitors

- Emissive surfaces are uncommon in general

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered with:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance!

RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

(Four detail-shot slides compare crops at these four settings.)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel:
- Luminance calculation stays the same
- Chroma calculation is inverted; approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and operating on a vec2 also saves a MADD and an ADD from the skipped 3rd component.
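Why the chroma term can simply flip the sign on the power term: RGB-to-YCoCg is a linear transform, and it maps white (1,1,1) to (1,0,0), so the Schlick lerp toward white becomes a lerp toward zero chroma. A Python sketch (not the production GLSL) checking that algebra:

```python
def rgb_to_ycocg(rgb):
    # Standard YCoCg forward transform (a linear change of basis)
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: lerp from F0 toward white at grazing angles
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # YC form from the slides: luminance term identical to RGB Schlick,
    # chroma term lerps toward zero instead of toward one
    p = (1.0 - v_dot_h) ** 5.0
    y, c = f0_yc
    return ((1.0 - y) * p + y, c * -p + c)
```

Evaluating Schlick in RGB and then transforming matches evaluating directly in Y/Co, up to floating-point noise.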

YC Lighting

- Works fine with the spherical Gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
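For sanity-checking the weighting behavior outside the shader, here is a near line-for-line Python transcription of reconstructChromaHDR (center and a1..a4 are (luminance, chroma) pairs):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # Returns (stored chroma, reconstructed chroma) for this pixel.
    samples = [a1, a2, a3, a4]
    weights = []
    for lum, _ in samples:
        # Neighbors with similar luminance get exponentially more weight
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        # Guard the case where a sample is black (step(1e-5, luminance))
        w *= 1.0 if lum >= 1e-5 else 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are 0
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, samples)) / total
    return (center[1], recon)
```

With equal-luminance neighbors this degenerates to a plain average of the neighbors' chroma, which is the intended behavior in flat regions.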

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, SIGGRAPH 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf

[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/

[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/

[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home

[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. Oles Shishkovtsov, 2005. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt

[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3

[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf

[Olsson 11] Clustered Deferred and Forward Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading

Resources

[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading

[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf

[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf

[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/

[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf

Resources

[Heitz 14] Understanding the Shadow Masking Function. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/

[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf

Emission

- Don't pack emission; forward render it
- Avoid another vec3 in the G-Buffer
- Emission only needs access when adding to the light accumulation buffer
- Not accessed many times a frame like other material parameters
- Emissive surfaces are geometrically lightweight in common cases
- Light fixtures, elevator switches, clocks, computer monitors
- Emissive surfaces are uncommon in general

Color Compression

- Transform to a perceptual basis: YUV, YCrCb, YCoCg
- Human perceptual system sensitive to luminance shifts
- Human perceptual system fairly insensitive to chroma shifts
- Color swatches / textures can be pre-transformed
- Already a practice for higher quality DXT compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]
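The YCoCg basis used above is a cheap linear transform with an exact inverse; a Python sketch of the round trip (illustrative only, matching the usual [Mavridis 12] formulation):

```python
def rgb_to_ycocg(rgb):
    # Forward transform: luminance plus two chroma components
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

def ycocg_to_rgb(yco):
    # Exact inverse of the forward transform
    y, co, cg = yco
    tmp = y - cg
    return (tmp + co, y + cg, tmp - co)
```

Neutral colors land on the Y axis (zero chroma), which is what makes chroma cheap to subsample.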

G-Buffer Packing / Format

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice not a reliable bandwidth saving: RGB Float is deprecated in WebGL, so it could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp
- Half-float target more challenging
- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
vec4 res;
// Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range
vec3 colorYcocg = rgbToYcocg(components.color);
vec2 colorYc;
colorYc.x = colorYcocg.x;
colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
colorYc.y += CHROMA_BIAS;
res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
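The odd-looking 0.5 * 256.0 / 255.0 bias centers the signed chroma range on an exactly representable 8-bit code, so zero chroma survives quantization without drift. A Python sketch of that behavior (helper names are mine, not the production code):

```python
CHROMA_BIAS = 0.5 * 256.0 / 255.0  # == 128 / 255

def encode_chroma_8bit(c):
    # Bias roughly [-0.5, 0.5] chroma into [0, 1], then quantize to 8 bits
    return round((c + CHROMA_BIAS) * 255.0)

def decode_chroma_8bit(q):
    return q / 255.0 - CHROMA_BIAS
```

Zero chroma maps to code 128 and decodes back to exactly 0.0; other values round-trip within one quantization step.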

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);
// Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity
// -512 and 511 both represent infinity
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;
res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing Depth and Metallic

// Pack depth and metallic together
// If not metallic, negate depth. Extract bool as sign()
res.w = components.depth * components.metallic;
return res;

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen-space sampling shaders such as AO
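The depth/metallic sign trick in miniature; a Python sketch assuming metallic is stored as +/-1.0 and depth is strictly positive for on-screen geometry (0.0 is reserved for infinity):

```python
def pack_depth_metallic(depth, metallic):
    # metallic: +1.0 (metal) or -1.0 (non-metal); depth > 0.0
    return depth * metallic

def unpack_depth_metallic(w):
    # abs() recovers depth, the sign recovers the metallic flag
    return abs(w), (1.0 if w > 0.0 else -1.0)
```

Because the flag lives in the sign bit, the depth decode is a single abs(), which is why it is the cheapest channel to read back.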

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
  const in sampler2D gBufferSampler,
  const in vec2 uv,
  const in vec2 gBufferResolution,
  const in vec2 inverseGBufferResolution)
gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);
// Early out if sampling infinity
if (res.depth <= 0.0) {
  res.color = vec3(0.0);
  return res;
}

- Decode Depth

Decode G-Buffer RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
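octohedronEncode / octohedronDecode follow the octahedral unit-vector mapping surveyed in [Cigolle 14]; the slides don't show the exact variant, so here is a Python sketch of one common formulation:

```python
def octahedral_encode(n):
    # Unit 3D normal -> 2D point in [-1, 1]^2
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    x, y, z = x / s, y / s, z / s
    if z < 0.0:
        # Fold the lower hemisphere over the diagonals
        ox = (1.0 - abs(y)) * (1.0 if x >= 0.0 else -1.0)
        oy = (1.0 - abs(x)) * (1.0 if y >= 0.0 else -1.0)
        return (ox, oy)
    return (x, y)

def octahedral_decode(o):
    # 2D point in [-1, 1]^2 -> unit 3D normal
    ox, oy = o
    z = 1.0 - abs(ox) - abs(oy)
    if z < 0.0:
        ox, oy = ((1.0 - abs(oy)) * (1.0 if ox >= 0.0 else -1.0),
                  (1.0 - abs(ox)) * (1.0 if oy >= 0.0 else -1.0))
    l = (ox * ox + oy * oy + z * z) ** 0.5
    return (ox / l, oy / l, z / l)
```

The two encoded components are then quantized to 14 bits each with normalizedFloat_to_uint14.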

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screen space for culling in future passes
  // sqrt(2) + 1e-3
  res.velocity = vec2(1.41521356);
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity
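The velocity path, end to end, as a Python sketch. SUB_PIXEL_PRECISION_STEPS is not given in the slides, so the value below is an assumed example; note the decode lands in UV space (an NDC delta is half a UV delta, which is where the 0.5 in the encode goes):

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed example value, not from the slides

def quantize_velocity(v_ndc, resolution):
    # Per-axis -1..1 screen-space velocity -> biased 10-bit code.
    # Codes 0 and 1023 (i.e. -512 and 511) are reserved for out-of-range.
    q = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    q = math.floor(min(max(q, -512.0), 511.0))
    return q + 512.0

def dequantize_velocity(code, resolution):
    # Mirrors the decode path: result is a UV-space delta
    return (code - 512.0) * (1.0 / resolution) * (1.0 / SUB_PIXEL_PRECISION_STEPS)
```

The round trip is exact up to one sub-pixel quantization step, 1 / (resolution * SUB_PIXEL_PRECISION_STEPS) in UV units.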

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Color Compression

- Transform to perceptual basis YUV YCrCb YCoCg

- Human perceptual system sensitive to luminance shifts

- Human perceptual system fairly insensitive to chroma shifts

- Color swatches textures can be pre-transformed

- Already a practice for higher quality dxt compression [Waveren 07]

- Store chroma components at a lower frequency

- Write 2 components of the signal alternating between chroma bases

- Color data encoded in checkerboarded YCoCg space [Mavridis 12]

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA
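The filtering caveat is easy to demonstrate: because multiple fields share one channel, a linear blend of packed values leaks the fractional part of the high field into the low field. A tiny JavaScript illustration with two 8-bit fields in one number (the values are hypothetical):

```javascript
// Pack two 8-bit fields into one number; blending packed values is wrong.
function pack2(hi, lo) { return hi * 256 + lo; }
function unpack2(p) { return [Math.floor(p / 256), p % 256]; }

// 50/50 "bilinear" blend of the *packed* texels:
const filtered = (pack2(11, 0) + pack2(20, 0)) / 2;  // 3968
// Decoding the filtered value:
const decoded = unpack2(filtered);                   // [15, 128]
// But the component-wise blend we actually wanted is [15.5, 0]:
// the fractional high field has bled into the low field.
```

The same failure applies to the checkerboarded chroma, where adjacent texels do not even hold the same component, which is why the G-Buffer must be point sampled.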

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting into an RGB Float render target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of the representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
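The quantize / dequantize pair can be sanity-checked on the CPU. A JavaScript sketch of the round trip; the step constant is an assumption (the slides never give its value), with 4 meaning quarter-pixel precision:

```javascript
// Assumed constant: 4 sub-pixel steps per pixel of motion.
const SUB_PIXEL_PRECISION_STEPS = 4;

// NDC-space velocity (-1..1) -> quantized steps, biased into 0..1023.
function quantizeVelocity(vNdc, resolution) {
  let q = vNdc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  q = Math.floor(Math.min(Math.max(q, -512.0), 511.0));
  return q + 512.0;
}

// Mirrors the decode path: unbias, then scale by 1/resolution and 1/steps.
// Note the result comes out in UV space (0..1), i.e. half the NDC magnitude.
function dequantizeVelocity(q, inverseResolution) {
  return (q - 512.0) * inverseResolution * (1.0 / SUB_PIXEL_PRECISION_STEPS);
}
```

The round-trip error is bounded by one sub-pixel step, which is the point of the floor/clamp/bias dance above.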

Decode G-Buffer RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2,
      gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

  // Color is stored as sRGB->YCoCg. Return it as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

  return res;
}

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?
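For reference, the YCoCg transform used throughout is linear and exactly invertible. A JavaScript sketch matching the standard YCoCg definition, which the slides' rgbToYcocg helper presumably follows:

```javascript
// Standard RGB <-> YCoCg transform (assumed to match the slides' helpers).
function rgbToYcocg([r, g, b]) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,   // Y:  luminance
    r * 0.5 - b * 0.5,               // Co: orange-blue chroma
    -r * 0.25 + g * 0.5 - b * 0.25,  // Cg: green-purple chroma
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Because the forward matrix uses only powers of two, the round trip is exact in floating point, which matters when the chroma plane is later thrown away and reconstructed.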

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes a vec2 chroma

- Modify the BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper! Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
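The YC form is not merely an approximation of the RGB form: Schlick's formula is affine in the reflection coefficient and YCoCg is a linear transform, so evaluating Fresnel in YC space is exactly equivalent to evaluating in RGB and converting afterward. A quick JavaScript check of that claim (the f0 value is a made-up example):

```javascript
function fresnelSchlickRGB(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * p + c);
}

// YC variant from the slides: luminance as in RGB, chroma term inverted.
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
}

// Y and Co rows of the YCoCg transform, for comparing the two paths.
// Co works because Co(1,1,1) = 0; the same argument covers Cg.
function rgbToYCo([r, g, b]) {
  return [r * 0.25 + g * 0.5 + b * 0.25, r * 0.5 - b * 0.5];
}
```

The key facts are Y(1,1,1) = 1 and Co(1,1,1) = Cg(1,1,1) = 0, which is why the luminance lerps toward 1 while the chroma simply decays toward 0 at grazing angles.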

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
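The weighting behavior is easy to verify off-GPU. A JavaScript port of the function above, using the same constants, shows that neighbors with similar luminance dominate the reconstructed chroma while black (infinity) and dissimilar-luma neighbors are suppressed:

```javascript
function reconstructChromaHDR(center, ...neighbors) {
  // center and each neighbor are [luminance, chroma] pairs.
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [y, c] of neighbors) {
    // Weight neighbors by luminance similarity (exp2 falloff)...
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(y - center[0]));
    // ...and guard the case where a sample is black: step(1e-5, luminance).
    if (y < 1e-5) w = 0.0;
    totalWeight += w;
    chromaSum += c * w;
  }
  // Guard the case where all weights are 0.
  if (totalWeight <= 1e-5) return [0.0, 0.0];
  return [center[1], chromaSum / totalWeight];
}
```

With SENSITIVITY = 25, a neighbor half a luminance unit away contributes a weight of 2^-12.5, i.e. effectively nothing next to an exact-match neighbor's weight of 1.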

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks: Floored Engineering

Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01 Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02 Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

G-Buffer PackingFormat

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Sign Bits of R G and B are available for use as flags

- ie Material Type

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG NormalX 12 Bits NormalY 12 Bits

- RGB Float 96bpp

- Throw out velocity discretize normals a bit more

- In practice not reliable bandwidth saving RGB Float is deprecated in

webGL Could be RGBA Float texture under the hood

B Depth 31 Bits Metallic 1 Bit

G-Buffer Format

R ColorY 7 Bits ColorC 5 Bits (sign bit)

G NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

A Depth 15 Bits Metallic 1 Bit

- RGBA Half-float 64 bpp

- Half-float target more challenging

- Probably not practical Depth precision is the real killer here

G-Buffer Format

R ColorY 7 Bits ColorC 4 Bits Metallic 1

BitG NormalX 9 Bits (sign bit) Gloss 3 Bits

B NormalY 9 Bits (sign bit) Gloss 3 Bits

- RGB Half-float 48 bpp

- Rely on WEBGL_depth_texture support to read depth from renderbuffer

- Future work to evaluate Probably too discretized

- Maybe useful on mobile where mediump 16-bit float preferable

G-Buffer Format

R ColorY 8 Bits ColorC 8 Bits Gloss 8

BitsG VelocityX 10 Bits NormalX 14 Bits

B VelocityY 10 Bits NormalY 14 Bits

A Depth 31 Bits Metallic 1 Bit

- RGBA Float 128bpp

- Letrsquos take a look at packing code for this format

Packing Color and Glossvec4 encodeGBuffer(const in gBufferComponents components const in vec2 uv const in vec2 resolution)

vec4 res

Interlace chroma and bias -05 to 05 chroma range to 00 to 10 range

vec3 colorYcocg = rgbToYcocg(componentscolor)

vec2 colorYc

colorYcx = colorYcocgx

colorYcy = checkerboardInterlace(colorYcocgyz uv resolution)

const float CHROMA_BIAS = 05 2560 2550

colorYcy += CHROMA_BIAS

resx = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc componentsgloss)))

Packing Normal and Velocityvec2 normalOctohedron = octohedronEncode(componentsnormal)

vec2 normalOctohedronQuantized

normalOctohedronQuantizedx = normalizedFloat_to_uint14(normalOctohedronx)

normalOctohedronQuantizedy = normalizedFloat_to_uint14(normalOctohedrony)

takes in screen space -10 to 10 velocity and stores -512 to 511 quantized pixel velocity

-512 and 511 both represent infinity

vec2 velocityQuantized = componentsvelocity resolution SUB_PIXEL_PRECISION_STEPS 05

velocityQuantized = floor(clamp(velocityQuantized -5120 5110))

velocityQuantized += 5120

resy = uint10_14_to_uint24(vec2(velocityQuantizedx normalOctohedronQuantizedx))

resz = uint10_14_to_uint24(vec2(velocityQuantizedy normalOctohedronQuantizedy))

Packing Depth and Metallic

Pack depth and metallic together

If not metallic negate depth Extract bool as sign()

resw = componentsdepth componentsmetallic

return res

- Phew wersquore done

- Depth is the cheapest to encode decode

- Can write fast depth decode function for ray marching screen space

sampling shaders such as AO

Packing Challenges

- Must balance packing efficiency with cost of encoding decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Sign bits of R, G, and B are available for use as flags
- i.e. Material Type
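Packing multiple fields into one float channel works because a float32 mantissa holds 24 bits exactly, and GLSL ES 1.00 (WebGL 1) has no bitwise operators, so the packing is done with multiplies and divides. A JS sketch of what a helper pair like the deck's uint8_8_8_to_uint24 might look like (the actual helpers are GLSL; these names just mirror them):

```javascript
// Sketch of 8:8:8 packing into a single float channel. Three 8-bit integers
// occupy 24 bits, which a float32 mantissa represents exactly. Illustrative
// JS mirror of the shader-side helpers referenced later in the deck.
function uint8_8_8_to_uint24(r, g, b) {
  return r * 65536 + g * 256 + b; // fits losslessly in 24 bits
}

function uint24_to_uint8_8_8(v) {
  const r = Math.floor(v / 65536);
  const g = Math.floor((v - r * 65536) / 256);
  const b = v - r * 65536 - g * 256;
  return [r, g, b];
}
```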

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: NormalX 12 Bits | NormalY 12 Bits
B: Depth 31 Bits | Metallic 1 Bit

- RGB Float, 96 bpp
- Throw out velocity, discretize normals a bit more
- In practice, not a reliable bandwidth saving: RGB Float is deprecated in WebGL, and could be an RGBA Float texture under the hood

G-Buffer Format

R: ColorY 7 Bits | ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits
A: Depth 15 Bits | Metallic 1 Bit

- RGBA Half-float, 64 bpp
- A half-float target is more challenging
- Probably not practical: depth precision is the real killer here

G-Buffer Format

R: ColorY 7 Bits | ColorC 4 Bits | Metallic 1 Bit
G: NormalX 9 Bits (sign bit) | Gloss 3 Bits
B: NormalY 9 Bits (sign bit) | Gloss 3 Bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate; probably too discretized
- Maybe useful on mobile, where mediump (16-bit float) is preferable

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format

Packing: Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
{
    vec4 res;

    // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;
    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));

Packing: Normal and Velocity

    vec2 normalOctohedron = octohedronEncode(components.normal);
    vec2 normalOctohedronQuantized;
    normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
    normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

    // Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
    // -512 and 511 both represent infinity.
    vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
    velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
    velocityQuantized += 512.0;
    res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
    res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));

Packing: Depth and Metallic

    // Pack depth and metallic together.
    // If not metallic, negate depth. Extract the bool as sign().
    res.w = components.depth * components.metallic;
    return res;
}

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen-space sampling shaders such as AO
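The depth/metallic trick above is easy to sanity-check on the CPU: metallic keeps depth positive, non-metallic negates it, so decode is just abs() and a sign test. A minimal JS sketch (function names are illustrative, not the engine's):

```javascript
// Depth/metallic packing from the slides: the metallic flag rides in the
// sign of the depth value. Decode is abs() for depth, sign test for the flag.
function packDepthMetallic(depth, metallic) {
  return depth * (metallic ? 1.0 : -1.0);
}

function unpackDepthMetallic(packed) {
  return { depth: Math.abs(packed), metallic: packed > 0.0 };
}
```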

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard
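A proxy only needs to cover the pixels a light can visibly affect. As a hedged illustration (the Decay Exponent uniform above suggests a power-law falloff, but the deck does not spell out the engine's exact attenuation), a sphere proxy radius could be sized by solving for the distance where intensity drops below a visibility threshold:

```javascript
// Hypothetical sphere-proxy sizing for a point light, assuming a power-law
// falloff intensity / d^decayExponent. Solve for d where the contribution
// falls to `threshold`. Names and falloff model are illustrative assumptions.
function pointLightProxyRadius(luminousIntensity, decayExponent, threshold) {
  return Math.pow(luminousIntensity / threshold, 1.0 / decayExponent);
}
```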

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;

vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
}

- Decode Depth

Decode G-Buffer: RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity
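The 1.41521356 sentinel is just sqrt(2) + 1e-3: valid decoded velocities are normalized screen-space offsets with components in [-1, 1], so no legitimate velocity can reach that magnitude. A quick numeric check:

```javascript
// The out-of-range velocity sentinel from the decode path: sqrt(2) plus a
// small epsilon, guaranteed beyond any representable screen-space offset,
// so later passes can recognize and cull it.
const INVALID_VELOCITY = Math.sqrt(2.0) + 1e-3;
```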

Decode G-Buffer: RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction further down the pipe?
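The rgbToYcocg / YcocgToRgb pair used throughout is the standard YCoCg transform; the deck does not show its GLSL, so here is the textbook form as a JS sketch. For inputs in [0, 1], Co and Cg land in [-0.5, 0.5], which is exactly why the encoder adds CHROMA_BIAS before storage:

```javascript
// Standard RGB <-> YCoCg transform (textbook form; the deck's GLSL helpers
// are assumed to match). Y is the luma-like term; Co/Cg are chroma.
function rgbToYcocg([r, g, b]) {
  return [
     0.25 * r + 0.5 * g + 0.25 * b, // Y
     0.5  * r            - 0.5  * b, // Co
    -0.25 * r + 0.5 * g - 0.25 * b, // Cg
  ];
}

function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```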

Light Pre-pass

Light Pre-pass

- Many resources: [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg, Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (Four detail crops follow, each comparing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
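The claimed behavior, luminance following the usual Schlick curve while chroma fades to zero at grazing incidence, can be checked numerically. A direct JS port of the function above:

```javascript
// JS port of fresnelSchlickYC for a numeric sanity check. At vDotH = 1 the
// input reflection coefficient comes back unchanged; at vDotH = 0 (grazing)
// luminance rises to 1.0 while chroma collapses to 0.0.
function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * power + y, c * -power + c];
}
```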

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
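A JS port of reconstructChromaHDR makes the weighting behavior testable: neighbors whose luminance matches the center dominate the reconstructed chroma, black samples are ignored, and an all-black neighborhood falls back to zero.

```javascript
// JS port of the reconstructChromaHDR shader function above, for testing.
// center = [Y, C]; a1..a4 = the four cross-neighborhood [Y, C] samples.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chroma] of [a1, a2, a3, a4]) {
    // exp2 falloff on luminance difference, like the shader.
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(lum - center[0]));
    w *= lum >= 1e-5 ? 1.0 : 0.0; // guard: ignore black samples (step())
    totalWeight += w;
    chromaSum += chroma * w;
  }
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```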

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com/, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994


Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

G-Buffer Format

R: ColorY 7 Bits, ColorC 5 Bits (sign bit)
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits
A: Depth 15 Bits, Metallic 1 Bit

- RGBA Half-float, 64 bpp
- Half-float target is more challenging
- Probably not practical: depth precision is the real killer here
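The half-float depth concern is easy to quantify. A quick Python sketch (mine, not from the deck) round-trips values through IEEE 754 binary16 via the stdlib struct module to show how coarse the spacing gets:

```python
import struct

def to_half(x):
    # Round-trip a Python float through IEEE 754 half precision (binary16).
    return struct.unpack('<e', struct.pack('<e', x))[0]

# binary16 carries an 11-bit significand: in [512, 1024) the spacing between
# representable values is already 0.5, and it doubles every power of two.
assert to_half(1000.3) == 1000.5   # 1000.3 is not representable
assert to_half(2049.0) == 2048.0   # spacing is 2.0 past 2048
```

With only about 11 significant bits, depth stored in a half-float channel aliases badly at architectural scales, which is why the slide calls depth precision the real killer.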

G-Buffer Format

R: ColorY 7 Bits, ColorC 4 Bits, Metallic 1 Bit
G: NormalX 9 Bits (sign bit), Gloss 3 Bits
B: NormalY 9 Bits (sign bit), Gloss 3 Bits

- RGB Half-float, 48 bpp
- Rely on WEBGL_depth_texture support to read depth from the renderbuffer
- Future work to evaluate. Probably too discretized
- Maybe useful on mobile, where mediump 16-bit float is preferable

G-Buffer Format

R: ColorY 8 Bits, ColorC 8 Bits, Gloss 8 Bits
G: VelocityX 10 Bits, NormalX 14 Bits
B: VelocityY 10 Bits, NormalY 14 Bits
A: Depth 31 Bits, Metallic 1 Bit

- RGBA Float, 128 bpp
- Let's take a look at the packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution)
{
    vec4 res;

    // Interlace chroma and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
    vec3 colorYcocg = rgbToYcocg(components.color);
    vec2 colorYc;
    colorYc.x = colorYcocg.x;
    colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
    colorYc.y += CHROMA_BIAS;

    res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
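checkerboardInterlace is not expanded in the deck. A plausible Python sketch of the idea (the function name mirrors the GLSL; the parity convention is an assumption): each pixel keeps luma plus exactly one of the two chroma components, alternating in a checkerboard, which is what later makes the cross-neighborhood reconstruction necessary.

```python
def checkerboard_interlace(co, cg, x, y):
    # Store Co on "even" checkerboard pixels and Cg on "odd" ones.
    return co if (x + y) % 2 == 0 else cg

# Two horizontally adjacent pixels carry complementary chroma components.
assert checkerboard_interlace(0.1, -0.3, 4, 7) == -0.3
assert checkerboard_interlace(0.1, -0.3, 5, 7) == 0.1
```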

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
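The uint10_14_to_uint24 helper pair is likewise not shown. A Python sketch of the arithmetic it presumably performs (names mirror the GLSL): the packed result stays below 2^24, so it survives storage in a float32 channel, whose 24-bit significand represents every integer up to 2^24 exactly.

```python
def uint10_14_to_uint24(v10, v14):
    # Pack a 10-bit and a 14-bit unsigned integer into one 24-bit integer.
    assert 0 <= v10 < 1024 and 0 <= v14 < 16384
    return v10 * 16384 + v14          # equivalent to (v10 << 14) | v14

def uint24_to_uint10_14(u24):
    # Inverse: recover the (10-bit, 14-bit) pair.
    return u24 // 16384, u24 % 16384

packed = uint10_14_to_uint24(512 + 3, 9001)  # e.g. quantized velocity + octohedron normal
assert uint24_to_uint10_14(packed) == (515, 9001)
# Every integer up to 2**24 is exactly representable in a float32 significand.
assert float(2**24 - 1) == 2**24 - 1
```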

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract the bool as sign().
res.w = components.depth * components.metallic;

return res;

- Phew, we're done
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching screen space sampling shaders such as AO
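The sign trick itself can be modeled in a few lines of Python (a sketch, assuming depth is always strictly positive for valid surfaces):

```python
def encode_depth_metallic(depth, metallic):
    # Store metallic in the sign: positive depth = metallic, negative = dielectric.
    assert depth > 0.0
    return depth if metallic else -depth

def decode_depth_metallic(w):
    # abs() recovers depth; the sign recovers the boolean.
    return abs(w), w > 0.0

w = encode_depth_metallic(12.5, False)
assert decode_depth_metallic(w) == (12.5, False)
```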

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)
{
    gBufferComponents res;

    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity.
    if (res.depth <= 0.0)
    {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0)
{
    // When velocity is out of representable range, throw it outside of screenspace
    // for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
}
else
{
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer: RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color is stored in a non-linear space to distribute precision perceptually

// Color is stored as sRGB -> YCoCg. Return as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?
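For reference, the rgbToYcocg / YcocgToRgb pair used above is the standard YCoCg transform. A Python sketch of one common formulation (assumed; the deck does not list the matrix) with its exact inverse:

```python
def rgb_to_ycocg(r, g, b):
    # Y = luma, Co = orange chroma, Cg = green chroma.
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5  * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    tmp = y - cg
    return tmp + co, y + cg, tmp - co

rgb = (0.25, 0.5, 0.75)
assert all(abs(a - b) < 1e-9
           for a, b in zip(ycocg_to_rgb(*rgb_to_ycocg(*rgb)), rgb))
```

Note that white maps to (Y, Co, Cg) = (1, 0, 0), which is what makes the YC Fresnel trick later in the deck work out.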

Light Pre-pass

- Many resources:
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

[Enhance: detail crops comparing RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%, repeated for four image regions]

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation is the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
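Why this works: YCoCg is a linear transform of RGB and Schlick's approximation is affine in the reflection coefficient, so lerping in YC reproduces the transform of the RGB result exactly — Y lerps toward white's luma (1), while chroma decays toward white's chroma (0). A Python check of that equivalence (helper names are mine, not the deck's):

```python
def schlick_rgb(v_dot_h, f0):
    # Per-channel Schlick: F = (1 - F0) * (1 - v.h)^5 + F0.
    power = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * power + c for c in f0)

def schlick_yc(v_dot_h, y0, c0):
    power = (1.0 - v_dot_h) ** 5
    # Luminance lerps toward 1; chroma decays toward 0, as on the slide.
    return (1.0 - y0) * power + y0, c0 * -power + c0

def rgb_to_yco(r, g, b):
    # Y and Co rows of the YCoCg transform (Cg behaves like Co and is omitted).
    return 0.25 * r + 0.5 * g + 0.25 * b, 0.5 * r - 0.5 * b

f0 = (0.95, 0.64, 0.54)                 # copper-ish specular color
direct = rgb_to_yco(*schlick_rgb(0.7, f0))   # Schlick in RGB, then convert
via_yc = schlick_yc(0.7, *rgb_to_yco(*f0))   # convert, then Schlick in YC
assert all(abs(a - b) < 1e-9 for a, b in zip(direct, via_yc))
```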

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);

    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
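A scalar Python sketch of the same weighting logic (the 25.0 sensitivity is my reading of the slide's constant), handy for experimenting with the falloff outside a shader:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center = (luma, stored chroma); neighbors = four (luma, chroma) cross samples.
    weights = [2.0 ** (-sensitivity * abs(l - center[0])) for l, _ in neighbors]
    # Guard the case where a sample is black.
    weights = [w if l >= 1e-5 else 0.0 for w, (l, _) in zip(weights, neighbors)]
    total = sum(weights)
    if total <= 1e-5:
        # Guard the case where all weights are 0.
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recon)   # (stored chroma, reconstructed chroma)

# Neighbors whose luma matches the center dominate; a black sample and a
# very bright sample contribute essentially nothing.
center = (0.5, 0.2)
neighbors = [(0.5, 0.3), (0.5, 0.3), (5.0, -0.9), (0.0, 0.7)]
stored, recon = reconstruct_chroma_hdr(center, neighbors)
assert stored == 0.2
assert abs(recon - 0.3) < 0.01
```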

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01/GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994


httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

G-Buffer Format

R: ColorY 8 Bits | ColorC 8 Bits | Gloss 8 Bits
G: VelocityX 10 Bits | NormalX 14 Bits
B: VelocityY 10 Bits | NormalY 14 Bits
A: Depth 31 Bits | Metallic 1 Bit

- RGBA Float, 128bpp
- Let's take a look at packing code for this format

Packing Color and Gloss

vec4 encodeGBuffer(const in gBufferComponents components, const in vec2 uv, const in vec2 resolution) {
  vec4 res;

  // Interlace chroma, and bias the -0.5 to 0.5 chroma range to the 0.0 to 1.0 range.
  vec3 colorYcocg = rgbToYcocg(components.color);
  vec2 colorYc;
  colorYc.x = colorYcocg.x;
  colorYc.y = checkerboardInterlace(colorYcocg.yz, uv, resolution);
  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  colorYc.y += CHROMA_BIAS;
  res.x = uint8_8_8_to_uint24(sample_to_uint8_8_8(vec3(colorYc, components.gloss)));
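The two helpers the encode path leans on, rgbToYcocg and checkerboardInterlace, can be sketched in Python. This is a hedged mirror of the idea, not Floored's code; it assumes the standard YCoCg transform with Y in [0, 1] and chroma in [-0.5, 0.5]:

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg: Y in [0, 1]; Co, Cg in [-0.5, 0.5] for RGB in [0, 1].
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above.
    return y + co - cg, y + cg, y - co - cg

def checkerboard_interlace(co, cg, px, py):
    # Keep Co on one checkerboard parity and Cg on the other;
    # the missing component is reconstructed later from neighbors.
    return co if (px + py) % 2 == 0 else cg
```

The roundtrip through YCoCg is lossless before quantization; only the checkerboard subsampling discards information.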

Packing Normal and Velocity

  vec2 normalOctohedron = octohedronEncode(components.normal);
  vec2 normalOctohedronQuantized;
  normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
  normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

  // Takes in screen-space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
  // -512 and 511 both represent infinity.
  vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
  velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
  velocityQuantized += 512.0;
  res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
  res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
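octohedronEncode / octohedronDecode map a unit normal to two floats in [0, 1], which is what makes the 14-bit quantization above work well [Cigolle 14]. A minimal Python sketch of the standard octahedral mapping (an illustration of the technique, not Floored's exact code):

```python
import math

def _sign(v):
    # sign() convention with _sign(0) = 1, matching octahedral folding.
    return -1.0 if v < 0.0 else 1.0

def oct_encode(nx, ny, nz):
    # Project onto the octahedron |x| + |y| + |z| = 1, fold the lower
    # hemisphere over the diagonals, then remap [-1, 1] to [0, 1].
    s = abs(nx) + abs(ny) + abs(nz)
    x, y = nx / s, ny / s
    if nz < 0.0:
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    return x * 0.5 + 0.5, y * 0.5 + 0.5

def oct_decode(u, v):
    # Inverse mapping: unfold, then renormalize back to the unit sphere.
    x, y = u * 2.0 - 1.0, v * 2.0 - 1.0
    z = 1.0 - abs(x) - abs(y)
    if z < 0.0:
        x, y = (1.0 - abs(y)) * _sign(x), (1.0 - abs(x)) * _sign(y)
    l = math.sqrt(x * x + y * y + z * z)
    return x / l, y / l, z / l
```

The mapping distributes precision fairly evenly over the sphere, which is why it beats storing raw XYZ at the same bit budget.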

Packing Depth and Metallic

  // Pack depth and metallic together.
  // If not metallic, negate depth. Extract the bool as sign().
  res.w = components.depth * components.metallic;
  return res;
}

- Phew, we're done!
- Depth is the cheapest to encode / decode
- Can write a fast depth decode function for ray marching / screen-space sampling shaders such as AO
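The sign trick can be mirrored in a couple of lines of Python (a sketch; it assumes depth is strictly positive for valid surfaces, with 0 reserved to mean infinity, and uses a bool where the GLSL uses a ±1 multiplier):

```python
def pack_depth_metallic(depth, metallic):
    # Fold the metallic flag into the sign: negate depth when non-metallic.
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    # abs() recovers depth; the sign recovers the flag.
    return abs(w), w > 0.0
```

A depth-only consumer (like an AO pass) just takes abs() of the channel and never touches the flag, which is what makes this decode so cheap.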

Packing Challenges
- Must balance packing efficiency with the cost of encoding / decoding
- Packed pixels cannot be correctly hardware filtered
- Deferred decals cannot be alpha blended
- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone / Pyramid Proxy
- Directional Light = Billboard

Decode G-Buffer / RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;
  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer / RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer / RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);
  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer / RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screen space for culling in future passes.
    res.velocity = vec2(1.41521356); // sqrt(2) + 1e-3
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
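The encode and decode halves of the velocity path can be paired up in Python to check the roundtrip. This is a sketch: SUB_PIXEL_PRECISION_STEPS = 4 is an assumed quarter-pixel precision (the deck does not state the constant), and the decoded value lands in UV space rather than the NDC the encoder consumed:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0  # assumed precision constant

def quantize_velocity(v_ndc, resolution):
    # [-1, 1] screen-space velocity -> [0, 1023] quantized sub-pixel steps.
    q = v_ndc * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5
    q = math.floor(min(max(q, -512.0), 511.0))
    return q + 512.0

def dequantize_velocity(q, inverse_resolution):
    v = q - 512.0
    if abs(v) > 510.0:
        return None  # out of representable range: treat as invalid / cull
    return v * inverse_resolution / SUB_PIXEL_PRECISION_STEPS
```

The sentinel check mirrors the shader: anything that clamped to the ±512/511 endpoints is rejected rather than reprojected.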

Decode G-Buffer / RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer / RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer / RGB Lighting

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer / RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer / RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
- Color stored in non-linear space to distribute precision perceptually

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

Decode G-Buffer / RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass
- Many resources
- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls fresnel out of the integral with an nDotV approximation
- Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep fresnel inside the integral for nDotH evaluation
- Requires running through all lights twice

YC Lighting

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results
- All results are rendered
- Direct Light Only
- No Anti-Aliasing
- No Temporal Techniques
- G-Buffer Color Component: YCoCg Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate checkerboard pattern each frame
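Alternating the pattern just means folding the frame index into the checker parity; with a temporal resolve, every pixel then sees both chroma components over two frames. A tiny sketch (the formulation is an assumption, not from the deck):

```python
def checkerboard(px, py, frame):
    # 0 -> this pixel stores Co, 1 -> it stores Cg (convention assumed).
    # Adding the frame index flips the parity every frame.
    return (px + py + frame) % 2
```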

Implementation

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the Luminous Intensity uniform
- Color becomes vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and since we now operate on a vec2, we also save a MADD and an ADD from the skipped 3rd component.
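The YC form is exact, not a further approximation: YCoCg is a linear transform whose luma weights sum to 1 and whose chroma weights sum to 0, and Schlick's formula is affine in the reflection coefficient, so evaluating in YC agrees with converting the RGB result. A quick Python check (standard YCoCg transform; the f0 value is just an illustrative gold-like coefficient):

```python
def rgb_to_ycocg(r, g, b):
    return (0.25 * r + 0.5 * g + 0.25 * b,   # Y: weights sum to 1
            0.5 * r - 0.5 * b,               # Co: weights sum to 0
            -0.25 * r + 0.5 * g - 0.25 * b)  # Cg: weights sum to 0

def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel RGB Schlick: lerp from f0 toward 1 by power.
    p = (1.0 - v_dot_h) ** 5.0
    return [(1.0 - f) * p + f for f in f0]

def fresnel_schlick_yc(v_dot_h, y0, c0):
    p = (1.0 - v_dot_h) ** 5.0
    # Luminance interpolates toward 1; chroma decays toward 0.
    return (1.0 - y0) * p + y0, c0 * -p + c0
```

The same chroma expression works for Co and Cg alike, which is why the shader only needs a vec2.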

YC Lighting
- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting
- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth / stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
- Luminance Similarity
- Geometric Similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
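The same weighting logic can be mirrored in Python to see its behavior on concrete numbers (a sketch; the (luminance, chroma) tuples stand in for the a1..a4 vec2s of the cross neighborhood):

```python
def reconstruct_chroma_hdr(center_y, center_c, neighbors, sensitivity=25.0):
    # neighbors: (luminance, chroma) pairs from the cross pattern.
    weights = []
    for luma, _ in neighbors:
        # Weight each neighbor by luminance similarity to the center pixel.
        w = 2.0 ** (-sensitivity * abs(luma - center_y))
        if luma < 1e-5:
            w = 0.0  # guard: black (or at-infinity) samples contribute nothing
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:
        return 0.0, 0.0  # guard: all weights zero
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return center_c, recon
```

With equal-luminance neighbors the reconstruction degenerates to a plain average of their chroma, and a black neighbor is simply excluded.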

Thanks for listening!

Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats, http://webglstats.com, 2014

[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model, https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer, http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading, https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches, http://www.realtimerendering.com/blog/deferred-lighting-approaches, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R., http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs, http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Packing Normal and Velocity

vec2 normalOctohedron = octohedronEncode(components.normal);
vec2 normalOctohedronQuantized;
normalOctohedronQuantized.x = normalizedFloat_to_uint14(normalOctohedron.x);
normalOctohedronQuantized.y = normalizedFloat_to_uint14(normalOctohedron.y);

// Takes in screen space -1.0 to 1.0 velocity and stores -512 to 511 quantized pixel velocity.
// -512 and 511 both represent infinity.
vec2 velocityQuantized = components.velocity * resolution * SUB_PIXEL_PRECISION_STEPS * 0.5;
velocityQuantized = floor(clamp(velocityQuantized, -512.0, 511.0));
velocityQuantized += 512.0;

res.y = uint10_14_to_uint24(vec2(velocityQuantized.x, normalOctohedronQuantized.x));
res.z = uint10_14_to_uint24(vec2(velocityQuantized.y, normalOctohedronQuantized.y));
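The octohedronEncode / octohedronDecode helpers above refer to octahedral normal encoding [Cigolle 14]. A minimal Python sketch of the mapping (illustrative; the CPU-side style and function names are mine, not from the deck):

```python
import math

def sign_not_zero(v):
    # GLSL-style sign() except 0.0 maps to +1.0, keeping the fold well defined
    return 1.0 if v >= 0.0 else -1.0

def octahedron_encode(n):
    # Project the unit normal onto the octahedron |x|+|y|+|z| = 1, then fold
    # the lower hemisphere over the diagonals; the result lies in [-1, 1]^2.
    x, y, z = n
    s = abs(x) + abs(y) + abs(z)
    px, py = x / s, y / s
    if z < 0.0:
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    return (px, py)

def octahedron_decode(e):
    px, py = e
    z = 1.0 - abs(px) - abs(py)
    if z < 0.0:  # unfold the lower hemisphere
        px, py = ((1.0 - abs(py)) * sign_not_zero(px),
                  (1.0 - abs(px)) * sign_not_zero(py))
    length = math.sqrt(px * px + py * py + z * z)
    return (px / length, py / length, z / length)
```

Both components land in [-1, 1], which is why a normalizedFloat_to_uint14 quantizer is all that is needed on top.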

Packing Depth and Metallic

// Pack depth and metallic together.
// If not metallic, negate depth. Extract bool as sign().
res.w = components.depth * components.metallic;

return res;

- Phew, we're done!

- Depth is the cheapest to encode / decode

- Can write a fast depth decode function for ray marching / screen space sampling shaders such as AO
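The velocity quantization and the depth / metallic sign trick can be sketched end-to-end in Python. This is an illustration under assumptions: SUB_PIXEL_PRECISION_STEPS is not given in the deck, so quarter-pixel precision (4.0) is assumed, and the decoded velocity comes out in UV space (half the NDC magnitude), matching the decode shown later:

```python
import math

SUB_PIXEL_PRECISION_STEPS = 4.0          # assumed quarter-pixel precision
INVERSE_SUB_PIXEL_PRECISION_STEPS = 1.0 / SUB_PIXEL_PRECISION_STEPS

def quantize_velocity(velocity_ndc, resolution):
    # NDC-space velocity -> biased 10-bit sub-pixel steps; after the clamp,
    # -512 and 511 (stored as 0 and 1023) both stand in for "infinity".
    vq = [v * r * SUB_PIXEL_PRECISION_STEPS * 0.5
          for v, r in zip(velocity_ndc, resolution)]
    vq = [math.floor(min(max(c, -512.0), 511.0)) for c in vq]
    return [c + 512.0 for c in vq]

def dequantize_velocity(vq, inverse_resolution):
    v = [c - 512.0 for c in vq]
    if max(abs(v[0]), abs(v[1])) > 510.0:
        # Out of range: push outside screen space (sqrt(2) + 1e-3) for culling.
        return [1.41521356, 1.41521356]
    return [c * ir * INVERSE_SUB_PIXEL_PRECISION_STEPS
            for c, ir in zip(v, inverse_resolution)]

def pack_depth_metallic(depth, metallic):
    # Depth is strictly positive, so its sign bit is free to carry the flag.
    return depth if metallic else -depth

def unpack_depth_metallic(w):
    return abs(w), w > 0.0
```

The roundtrip loses at most one sub-pixel step to the floor(), and anything faster than ~128 pixels per frame collapses to the infinity sentinel.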

Packing Challenges

- Must balance packing efficiency with the cost of encoding / decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone / Pyramid Proxy

- Directional Light = Billboard
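The deck does not say how the proxy geometry is sized. One common approach (an assumption here, not Floored's stated method) is to grow a point light's sphere proxy to the distance where its decaying intensity drops below a visual cutoff, using the Decay Exponent and Luminous Intensity uniforms above:

```python
def sphere_proxy_radius(luminous_intensity, decay_exponent=2.0, cutoff=0.01):
    # Solve I / d^decay = cutoff for d: pixels farther than this receive
    # negligible light, so the rasterized proxy sphere can stop there.
    return (luminous_intensity / cutoff) ** (1.0 / decay_exponent)
```

With inverse-square decay, quadrupling the intensity doubles the proxy radius.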

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution)

gBufferComponents res;
vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
res.depth = abs(encodedGBuffer.w);

// Early out if sampling infinity.
if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
}

- Decode Depth

Decode G-Buffer: RGB Lighting

res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctohedron;
normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octohedronDecode(normalOctohedron);

- Decode Normal

Decode G-Buffer: RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;
if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes.
    res.velocity = vec2(1.41521356);  // sqrt(2) + 1e-3
} else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer: RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;
gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
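The storage path above (sRGB encoded as YCoCg with a chroma bias, then YcocgToRgb and sRgbToRgb on decode) can be verified with a small Python roundtrip. The helper names are hypothetical; the YCoCg transform is the standard one:

```python
CHROMA_BIAS = 0.5 * 256.0 / 255.0  # centers signed chroma in an unsigned channel

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,    # Y:  luma
            0.5 * r - 0.5 * b,                # Co: orange chroma, in [-0.5, 0.5]
            -0.25 * r + 0.5 * g - 0.25 * b)   # Cg: green chroma,  in [-0.5, 0.5]

def ycocg_to_rgb(ycocg):
    y, co, cg = ycocg
    return (y + co - cg, y + cg, y - co - cg)

def srgb_to_linear(c):
    # Standard sRGB EOTF
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def decode_color(stored):
    # stored = (Y, biased Co, biased Cg) as written to the G-Buffer
    y, co_biased, cg_biased = stored
    ycocg = (y, co_biased - CHROMA_BIAS, cg_biased - CHROMA_BIAS)
    return tuple(srgb_to_linear(c) for c in ycocg_to_rgb(ycocg))
```

Note the ordering: YCoCg is applied to the sRGB-encoded value, and linearization happens only after converting back to RGB, exactly as in the decode above.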

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Enhance!
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame
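The alternation can be sketched as a parity function over integer pixel coordinates (the deck's getCheckerboard takes uv and resolution instead; the frame_index term is the assumed mechanism for flipping the pattern each frame):

```python
def get_checkerboard(pixel_x, pixel_y, frame_index=0):
    # +1.0 where this pixel stores (Y, Co), -1.0 where it stores (Y, Cg).
    # Adding the frame index flips the pattern every frame, so each pixel
    # sees both chroma components over any two consecutive frames.
    return 1.0 if (pixel_x + pixel_y + frame_index) % 2 == 0 else -1.0
```

Combined with temporal reprojection, this gives every pixel a full chroma history and hides the checkerboard boundaries.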

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify the incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from the Luminous Intensity Uniform

- Color becomes vec2 chroma

- Modify the BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
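The YC form is not an approximation: Schlick's formula is linear in the reflection coefficient, and YCoCg is a linear transform that maps white to (Y = 1, chroma = 0), so luminance lerps toward 1 while chroma simply scales by (1 - power). A Python check of this equivalence (illustrative, not from the deck):

```python
def fresnel_schlick_rgb(v_dot_h, f0_rgb):
    # [Schlick 94]: F = (1 - F0) * (1 - v.h)^5 + F0, per channel
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - f) * p + f for f in f0_rgb)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    # Luminance interpolates toward 1; chroma decays toward 0 at grazing angles.
    p = (1.0 - v_dot_h) ** 5.0
    y, c = f0_yc
    return ((1.0 - y) * p + y, c * -p + c)

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)
```

Applying the RGB formula and converting the result to YCoCg matches applying the YC formula to the converted F0, for either chroma component.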

YC Lighting

- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth / stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
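A direct Python port of reconstructChromaHDR (same guards and weighting; the 25.0 sensitivity follows the constant above):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a_i are (luminance, chroma) pairs; the cross neighbors hold
    # the chroma component the center pixel is missing.
    neighbors = [a1, a2, a3, a4]
    weights = []
    for lum, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        w *= 0.0 if lum < 1e-5 else 1.0  # guard: black samples contribute nothing
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:                    # guard: all weights are zero
        return (0.0, 0.0)
    recon = sum(c * w for (_, c), w in zip(neighbors, weights)) / total
    return (center[1], recon)
```

Neighbors whose luminance differs from the center are down-weighted exponentially, so chroma does not bleed across strong lighting edges.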

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats. http://webglstats.com, 2014

[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production. Naty Hoffman, Siggraph 2010. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf

[Lagarde 11] Feeding a Physically-Based Shading Model. Sébastien Lagarde, 2011. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/

[Burley 12] Physically-Based Shading at Disney. Brent Burley, 2012. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf

[Karis 13] Real Shading in Unreal Engine 4. Brian Karis, 2013. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf

[Pranckevičius 09] Encoding Floats to RGBA - The Final. Aras Pranckevičius, 2009. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014. http://jcgt.org/published/0003/02/01/

[Mavridis 12] The Compact YCoCg Frame Buffer. Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012. http://jcgt.org/published/0001/01/02/

[Waveren 07] Real-Time YCoCg-DXT Compression. J.M.P. van Waveren, Ignacio Castaño, 2007. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf

[Geldreich 04] Deferred Lighting and Shading. Rich Geldreich, Matt Pritchard, John Brooks, 2004. https://sites.google.com/site/richgel99/home

[Hoffman 09] Deferred Lighting Approaches. Naty Hoffman, 2009. http://www.realtimerendering.com/blog/deferred-lighting-approaches/

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. Oles Shishkovtsov, 2005. http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1. Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt

[Mittring 09] A Bit More Deferred - CryEngine 3. Martin Mittring, 2009. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3

[Sousa 13] The Rendering Technologies of Crysis 3. Tiago Sousa, 2013. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3

[Pranckevičius 13] Physically Based Shading in Unity. Aras Pranckevičius, Game Developers Conference 2013. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf

[Olsson 11] Clustered Deferred and Forward Shading. Ola Olsson, Ulf Assarsson, 2011. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading

[Billeter 12] Clustered Deferred and Forward Shading. Markus Billeter, Ola Olsson, Ulf Assarsson, 2012. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading

[Yang 09] Amortized Supersampling. Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf

[Herzog 10] Spatio-Temporal Upsampling on the GPU. Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf

[Wronski 14] Temporal Supersampling and Antialiasing. Bart Wronski, 2014. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/

[Karis 14] High Quality Temporal Supersampling. Brian Karis, 2014. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces. Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf

[Heitz 14] Understanding the Shadow Masking Function. Eric Heitz, 2014. http://jcgt.org/published/0003/02/03/paper.pdf

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. Christophe Schlick, 1994. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. Sébastien Lagarde, 2012. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/

[Oren 94] Generalization of Lambert's Reflectance Model. Michael Oren, Shree K. Nayar, 1994. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf


[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Packing Challenges

- Must balance packing efficiency with the cost of encoding/decoding

- Packed pixels cannot be correctly hardware filtered

- Deferred decals cannot be alpha blended

- No MSAA

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms

- ClipFar: float

- Color: vec3

- Decay Exponent: float

- Gobo: sampler2D

- HotspotLengthScreenSpace: float

- Luminous Intensity: float

- Position: vec3

- TextureAssignedGobo: float

- ViewProjectionMatrix: mat4

- ViewMatrix: mat4
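For intuition, the Decay Exponent / Luminous Intensity / ClipFar uniforms above suggest an inverse-power distance falloff clipped at ClipFar. The deck does not show Floored's actual falloff, so the sketch below is an illustrative assumption:

```javascript
// Hypothetical sketch: attenuate a light by distance using the
// Decay Exponent, Luminous Intensity, and ClipFar uniforms listed above.
function lightAttenuation(distance, luminousIntensity, decayExponent, clipFar) {
  if (distance >= clipFar) return 0.0;  // hard clip at ClipFar
  const d = Math.max(distance, 1e-4);   // avoid division by zero near the light
  return luminousIntensity / Math.pow(d, decayExponent);
}
```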

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone/Pyramid Proxy

- Directional Light = Billboard
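A proxy only needs to cover pixels the light can meaningfully affect. One common way to size a sphere proxy (an assumption here, not spelled out in the deck) is to solve the falloff for the distance at which intensity drops below a cutoff:

```javascript
// Radius at which luminousIntensity / r^decayExponent falls below `cutoff`.
// Solving I / r^k = cutoff for r gives r = (I / cutoff)^(1/k).
function sphereProxyRadius(luminousIntensity, decayExponent, cutoff) {
  return Math.pow(luminousIntensity / cutoff, 1.0 / decayExponent);
}
```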

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {

    gBufferComponents res;

    vec4 encodedGBuffer = texture2D(gBufferSampler, uv);

    res.depth = abs(encodedGBuffer.w);

    // Early out if sampling infinity
    if (res.depth <= 0.0) {
        res.color = vec3(0.0);
        return res;
    }

- Decode Depth

Decode G-Buffer: RGB Lighting

    res.metallic = sign(encodedGBuffer.w);

- Decode Metallic

Decode G-Buffer: RGB Lighting

    vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
    vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

    vec2 normalOctohedron;
    normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
    normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);

    res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
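octohedronDecode above follows the octahedral unit-vector mapping surveyed in [Cigolle 14]. A JavaScript sketch of the standard encode/decode pair (function names are mine, not the deck's helpers):

```javascript
// Octahedral normal encoding/decoding ([Cigolle 14]).
// Maps a unit vector to a 2D point in [-1, 1]^2 and back.
function signNotZero(v) { return v >= 0.0 ? 1.0 : -1.0; }

function octEncode([x, y, z]) {
  const l1 = Math.abs(x) + Math.abs(y) + Math.abs(z);
  let px = x / l1, py = y / l1;
  if (z < 0.0) {
    // Fold the lower hemisphere over the diagonals
    [px, py] = [(1.0 - Math.abs(py)) * signNotZero(px),
                (1.0 - Math.abs(px)) * signNotZero(py)];
  }
  return [px, py];
}

function octDecode([px, py]) {
  let x = px, y = py, z = 1.0 - Math.abs(px) - Math.abs(py);
  if (z < 0.0) {
    [x, y] = [(1.0 - Math.abs(y)) * signNotZero(x),
              (1.0 - Math.abs(x)) * signNotZero(y)];
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```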

Decode G-Buffer: RGB Lighting

    res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
    res.velocity -= 512.0;

    if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
        // When velocity is out of representable range, throw it outside of
        // screen space for culling in future passes: sqrt(2) + 1e-3
        res.velocity = vec2(1.41521356);
    } else {
        res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
    }

- Decode Velocity
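The same decode can be sketched off-GPU: the 512 bias centers the 10-bit quantized velocity, ±510 is the representable range, and out-of-range values are pushed outside clip space (sqrt(2) + 1e-3) so later passes cull them. The sub-pixel step count is an assumption; the deck only names the constant:

```javascript
// Assumption: 16 sub-pixel precision steps per pixel; the deck names
// INVERSE_SUB_PIXEL_PRECISION_STEPS without giving its value.
const INVERSE_SUB_PIXEL_PRECISION_STEPS = 1.0 / 16.0;

function decodeVelocity(qx, qy, invResX, invResY) {
  let vx = qx - 512.0, vy = qy - 512.0; // remove quantization bias
  if (Math.max(Math.abs(vx), Math.abs(vy)) > 510.0) {
    // Out of representable range: throw outside screen space (sqrt(2) + 1e-3)
    return [1.41521356, 1.41521356];
  }
  return [vx * invResX * INVERSE_SUB_PIXEL_PRECISION_STEPS,
          vy * invResY * INVERSE_SUB_PIXEL_PRECISION_STEPS];
}
```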

Decode G-Buffer: RGB Lighting

    vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

    res.gloss = colorGlossData.z;

- Decode Gloss
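The uint24_to_uint8_8_8 / uint8_8_8_to_sample helpers are not shown in the deck; one plausible sketch of that packing (the exact bit layout is an assumption):

```javascript
// Split a 24-bit integer into three bytes (high to low) and back.
function uint24ToUint888(u) {
  return [(u >> 16) & 0xFF, (u >> 8) & 0xFF, u & 0xFF];
}
function uint888ToUint24([a, b, c]) {
  return (a << 16) | (b << 8) | c;
}
// Normalize each byte to [0, 1], like a GLSL sampler value.
function uint888ToSample(bytes) {
  return bytes.map(x => x / 255.0);
}
```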

Decode G-Buffer: RGB Lighting

    const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

    vec3 colorYcocg;
    colorYcocg.x = colorGlossData.x;
    colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space

Decode G-Buffer: RGB Lighting

    vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
    vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
    vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

    vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
    vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
    vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
    vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

    gBufferSampleYc0.y -= CHROMA_BIAS;
    gBufferSampleYc1.y -= CHROMA_BIAS;
    gBufferSampleYc2.y -= CHROMA_BIAS;
    gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

    float gBufferSampleDepth0 = abs(gBufferSample0.w);
    float gBufferSampleDepth1 = abs(gBufferSample1.w);
    float gBufferSampleDepth2 = abs(gBufferSample2.w);
    float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

    // Account for samples at infinity by setting their luminance and chroma to 0
    gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
    gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
    gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
    gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

    colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

    float offsetDirection = getCheckerboard(uv, gBufferResolution);
    colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout
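getCheckerboard just needs to alternate per pixel so that neighboring pixels carry opposite chroma components. A minimal sketch (the deck's own implementation is not shown):

```javascript
// Returns +1 or -1 in a checkerboard pattern over integer pixel coordinates.
// Pixels with +1 store (Y, Co); pixels with -1 store (Y, Cg), or vice versa.
function getCheckerboard(px, py) {
  return ((px + py) & 1) === 0 ? 1.0 : -1.0;
}
```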

- Color stored in non-linear space to distribute precision perceptually

    // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
    res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

    return res;
}
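The YcocgToRgb call relies on the standard YCoCg transform [Mavridis 12][Waveren 07], which is exactly invertible. In JavaScript:

```javascript
// RGB <-> YCoCg ([Waveren 07]): Y is luminance-like, Co/Cg are chroma.
function rgbToYcocg([r, g, b]) {
  return [ r / 4 + g / 2 + b / 4,   // Y
           r / 2         - b / 2,   // Co
          -r / 4 + g / 2 - b / 4 ]; // Cg
}
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```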

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model: we want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct the missing chroma component in a post process

Artifacts

Results

- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (detail crops: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Enhance! (detail crops: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Enhance! (detail crops: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Enhance! (detail crops: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminous Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation: the same

- Chroma calculation: inverted; approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.
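Because the YCoCg transform is linear and Schlick's approximation is affine per channel, the YC form matches the RGB form exactly: the constant `power` term cancels in chroma (the Co weights 1/2, 0, -1/2 sum to zero), leaving c * (1 - power). A quick JavaScript check, with Co as the single chroma channel:

```javascript
// Verify the YC Fresnel against the RGB form pushed through YCoCg.
function fresnelSchlickRGB(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map(c => (1.0 - c) * power + c);
}
function fresnelSchlickYC(vDotH, [y, c]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y) * power + y, c * -power + c];
}
// Project RGB onto (Y, Co) only, for this single-chroma comparison.
function rgbToYCo([r, g, b]) {
  return [r / 4 + g / 2 + b / 4, r / 2 - b / 2];
}

const f0 = [0.95, 0.64, 0.54]; // a gold-like reflectance, for illustration
const vDotH = 0.3;
const fromRgb = rgbToYCo(fresnelSchlickRGB(vDotH, f0));
const direct = fresnelSchlickYC(vDotH, rgbToYCo(f0));
```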

YC Lighting

- Works fine with the spherical Gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);

    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
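A direct JavaScript port of the function above, useful for sanity-checking the weighting behavior off-GPU (arrays stand in for vec2/vec4):

```javascript
// Port of reconstructChromaHDR: each sample is [luminance, chroma].
// Neighbors with similar luminance get exponentially larger weights.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0, weightedChroma = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    if (luma < 1e-5) w = 0.0; // guard: sample is black
    totalWeight += w;
    weightedChroma += chroma * w;
  }
  // Guard: all weights are 0
  return totalWeight > 1e-5 ? [center[1], weightedChroma / totalWeight]
                            : [0.0, 0.0];
}
```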

Thanks for listening!

Oh right, we're hiring

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - the final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in STALKER
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Direct Light

Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression


Accumulation Buffer

- Accumulate opaque surface direct lighting to an RGB Float Render Target

- Half Float where supported

Light Uniforms
- ClipFar: float
- Color: vec3
- Decay Exponent: float
- Gobo: sampler2D
- HotspotLengthScreenSpace: float
- Luminous Intensity: float
- Position: vec3
- TextureAssignedGobo: float
- ViewProjectionMatrix: mat4
- ViewMatrix: mat4

Rasterize Proxy
- Point Light = Sphere Proxy
- Spot Light = Cone/Pyramid Proxy
- Directional Light = Billboard

Decode G-Buffer: RGB Lighting

gBufferComponents decodeGBuffer(
    const in sampler2D gBufferSampler,
    const in vec2 uv,
    const in vec2 gBufferResolution,
    const in vec2 inverseGBufferResolution) {
  gBufferComponents res;

  vec4 encodedGBuffer = texture2D(gBufferSampler, uv);
  res.depth = abs(encodedGBuffer.w);

  // Early out if sampling infinity.
  if (res.depth <= 0.0) {
    res.color = vec3(0.0);
    return res;
  }

- Decode Depth

Decode G-Buffer: RGB Lighting

  res.metallic = sign(encodedGBuffer.w);

- Decode Metallic
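The depth/metallic packing can be sketched on the CPU for clarity. This is a hypothetical JS model of the trick the shader relies on: linear depth lives in the w channel, the metallic flag rides in its sign, and a magnitude of 0.0 doubles as the cleared/infinity marker. Which sign denotes metallic is an assumption here; the slides only show `sign()` and `abs()`.

```javascript
// Hypothetical model of the G-Buffer w channel: |w| = depth, sign(w) = metallic.
function encodeDepthMetallic(depth, metallic) {
  // Assumes depth > 0 for shaded pixels; 0.0 is reserved for "nothing here".
  return metallic ? depth : -depth;
}

function decodeDepthMetallic(encoded) {
  return {
    depth: Math.abs(encoded),          // res.depth = abs(encodedGBuffer.w)
    metallic: Math.sign(encoded) > 0,  // res.metallic = sign(encodedGBuffer.w)
    atInfinity: Math.abs(encoded) <= 0.0,
  };
}
```

One float channel thus carries a flag for free, at the cost of reserving depth 0.0 for the clear value.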

Decode G-Buffer: RGB Lighting

  vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
  vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

  vec2 normalOctohedron;
  normalOctohedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
  normalOctohedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
  res.normal = octohedronDecode(normalOctohedron);

- Decode Normal
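The `octohedronDecode` above assumes the octahedral unit-vector encoding surveyed in [Cigolle 14]. A minimal sketch of one common variant (the slides' exact implementation may differ in detail); the shader further quantizes the two components to 14 bits each:

```javascript
// Octahedral encode: project the unit vector onto the L1-normalized octahedron,
// folding the lower hemisphere over the diagonals. Output is two values in [-1, 1].
function octEncode(n) {
  const invL1 = 1.0 / (Math.abs(n[0]) + Math.abs(n[1]) + Math.abs(n[2]));
  let x = n[0] * invL1;
  let y = n[1] * invL1;
  if (n[2] < 0.0) {
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  return [x, y];
}

// Octahedral decode: undo the fold, then renormalize.
function octDecode(e) {
  let x = e[0], y = e[1];
  const z = 1.0 - Math.abs(x) - Math.abs(y);
  if (z < 0.0) {
    const fx = (1.0 - Math.abs(y)) * (x >= 0.0 ? 1.0 : -1.0);
    const fy = (1.0 - Math.abs(x)) * (y >= 0.0 ? 1.0 : -1.0);
    x = fx; y = fy;
  }
  const len = Math.hypot(x, y, z);
  return [x / len, y / len, z / len];
}
```

Unlike spherical coordinates, the mapping is exact before quantization, so the round trip only loses precision to the 14-bit storage.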

Decode G-Buffer: RGB Lighting

  res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
  res.velocity -= 512.0;
  if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
    // When velocity is out of representable range, throw it outside of
    // screenspace for culling in future passes: sqrt(2) + 1e-3.
    res.velocity = vec2(1.41521356);
  } else {
    res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
  }

- Decode Velocity
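The layout the velocity/normal decode assumes can be sketched as plain integer packing. Everything below is inferred from the function names and the bias in the shader: each 24-bit channel splits into a 10-bit velocity component and a 14-bit normal component (the high/low ordering is an assumption), the 10-bit velocity is biased by 512 so signed offsets fit in [0, 1023], and the sub-pixel precision constant is hypothetical since the slides never state it:

```javascript
// Pack/unpack a 10-bit and a 14-bit integer into one 24-bit value
// (mirrors the shader's uint24_to_uint10_14; ordering is assumed).
function packUint10_14(v10, v14) {
  return v10 * 16384 + v14; // v10 in [0, 1023], v14 in [0, 16383]
}

function unpackUint10_14(packed) {
  const v10 = Math.floor(packed / 16384);
  return [v10, packed - v10 * 16384];
}

// Velocity: signed sub-pixel offsets -> biased unsigned 10-bit storage.
const SUB_PIXEL_PRECISION_STEPS = 4.0; // hypothetical constant
const encodeVelocity = (pixels) => Math.round(pixels * SUB_PIXEL_PRECISION_STEPS) + 512.0;
const decodeVelocity = (stored) => (stored - 512.0) / SUB_PIXEL_PRECISION_STEPS;
```

Offsets beyond the representable ±510 stored units fall through to the out-of-range sentinel in the shader above.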

Decode G-Buffer: RGB Lighting

  vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
  res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer: RGB Lighting

  const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;
  vec3 colorYcocg;
  colorYcocg.x = colorGlossData.x;
  colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC
- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
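The YCoCg transform behind this storage scheme is a cheap linear change of basis. A sketch of the conversion (matrix as in [Mavridis 12]), plus the CHROMA_BIAS that recenters the signed Co/Cg channels into [0, 1] storage; the slides' extra sRGB encode/decode around it is omitted here:

```javascript
// Bias that centers signed chroma into unsigned 8-bit storage.
const CHROMA_BIAS = 0.5 * 256.0 / 255.0;

// RGB -> YCoCg: Y is a luma-like term, Co/Cg are signed chroma.
function rgbToYcocg([r, g, b]) {
  return [
    0.25 * r + 0.5 * g + 0.25 * b,   // Y
    0.5 * r - 0.5 * b,               // Co
    -0.25 * r + 0.5 * g - 0.25 * b,  // Cg
  ];
}

// YCoCg -> RGB (exact inverse, only adds/subtracts).
function ycocgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}
```

Grays land on chroma = 0, which is why storing only one chroma channel per pixel in a checkerboard costs so little for interior scenes dominated by neutral tones.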

Decode G-Buffer: RGB Lighting

  vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
  vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
  vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

  vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
  vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
  vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
  vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

  gBufferSampleYc0.y -= CHROMA_BIAS;
  gBufferSampleYc1.y -= CHROMA_BIAS;
  gBufferSampleYc2.y -= CHROMA_BIAS;
  gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer: RGB Lighting

  float gBufferSampleDepth0 = abs(gBufferSample0.w);
  float gBufferSampleDepth1 = abs(gBufferSample1.w);
  float gBufferSampleDepth2 = abs(gBufferSample2.w);
  float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

  // Account for samples at infinity by setting their luminance and chroma to 0.
  gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
  gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
  gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
  gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer: RGB Lighting

  colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
                                             gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

  float offsetDirection = getCheckerboard(uv, gBufferResolution);
  colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

  // Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting.
  res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
  return res;
}

- Color stored in non-linear space to distribute precision perceptually

Decode G-Buffer: RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources:
  - [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
  - Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient
- Keep Fresnel inside the integral for nDotH evaluation
  - Requires running through all lights twice

YC Lighting

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results
- All results are rendered:
  - Direct Light Only
  - No Anti-Aliasing
  - No Temporal Techniques
  - G-Buffer Color Component: YCoCg, Checkerboard Interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (four detail-shot comparison slides)
RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%

Results
- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
  - Depends on art direction
- Temporal techniques can significantly mitigate artifacts
  - Can alternate the checkerboard pattern each frame

Implementation

YC Lighting
- Light our G-Buffer in chroma subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
  - Access light color in YCoCg space
  - Already have Y from the Luminous Intensity uniform
  - Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
  - Schlick's Approximation of Fresnel
  - Luminance calculation the same
  - Chroma calculation inverted: approaches zero at perpendicular

YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and since we now operate on a vec2 we also save a MADD and an ADD from the skipped 3rd component.
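Why the YC form is exact rather than an approximation: Schlick's interpolation is linear in the reflection coefficient, and YCoCg is a linear transform of RGB, so blending toward white in RGB is blending toward (Y = 1, chroma = 0) in YC; luminance keeps the usual formula and chroma simply decays by the interpolation factor. A JS sanity check of that equivalence (the YCoCg matrix repeated here for self-containment; the copper-ish F0 value is illustrative):

```javascript
const rgbToYcocg = ([r, g, b]) => [
  0.25 * r + 0.5 * g + 0.25 * b,   // Y of white = 1
  0.5 * r - 0.5 * b,               // Co of white = 0
  -0.25 * r + 0.5 * g - 0.25 * b,  // Cg of white = 0
];

// Standard RGB Schlick.
function fresnelSchlickRgb(vDotH, f0) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * p + c);
}

// YC Schlick from the slides: Y interpolates toward 1, chroma decays toward 0.
function fresnelSchlickYC(vDotH, [y0, c0]) {
  const p = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - y0) * p + y0, c0 * -p + c0];
}
```

Evaluating either path and comparing Y and Co shows they agree to floating-point precision.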

YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth/stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work

YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping

YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));

  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black.
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
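A direct JS port of the reconstruction above is handy for probing the weighting behavior on the CPU. Inputs are [luma, chroma] pairs for the center pixel and its four cross neighbors; the SENSITIVITY constant matches the shader:

```javascript
// CPU port of reconstructChromaHDR: edge-aware average of neighbor chroma,
// weighted by luminance similarity to the center sample.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [luma, chroma] of [a1, a2, a3, a4]) {
    // Weight falls off exponentially with luminance difference...
    let w = Math.pow(2.0, -SENSITIVITY * Math.abs(luma - center[0]));
    // ...and black (at-infinity) samples are excluded entirely, as in step().
    w *= luma >= 1e-5 ? 1.0 : 0.0;
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

With equal-luminance neighbors the result degenerates to a plain average of their chroma, which is exactly the behavior you want inside flat regions; the weights only diverge at luminance edges, where borrowing chroma would bleed color.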

Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources
[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/10.1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Light Uniforms- ClipFar float

- Color vec3

- Decay Exponent float

- Gobo sampler2D

- HotspotLengthScreenSpace float

- Luminous Intensity float

- Position vec3

- TextureAssignedGobo float

- ViewProjectionMatrix mat4

- ViewMatrix mat4

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Rasterize Proxy

- Point Light = Sphere Proxy

- Spot Light = Cone Pyramid Proxy

- Directional Light = Billboard

Decode G-Buffer RGB Lighting

gBufferComponents decodeGBuffer(

const in sampler2D gBufferSampler

const in vec2 uv

const in vec2 gBufferResolution

const in vec2 inverseGBufferResolution)

gBufferComponents res

vec4 encodedGBuffer = texture2D(gBufferSampler uv)

resdepth = abs(encodedGBufferw)

Early out if sampling infinity

if (resdepth lt= 00)

rescolor = vec3(00)

return res

- Decode Depth

Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0.
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1,
                                           gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored in sRGB -> YCoCg. Returned as linear RGB for lighting.
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

    return res;
}
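The swizzle step above depends only on the pixel's checkerboard parity: even-parity pixels store (Y, Co), odd-parity pixels store (Y, Cg), in the style of [Mavridis 12]. A small sketch of that bookkeeping (`getCheckerboard` here is an illustrative stand-in for the deck's GLSL helper, working on integer pixel coordinates rather than UVs):

```javascript
// Checkerboard parity for the interleaved Co/Cg layout.
function getCheckerboard(px, py) {
  return (px + py) % 2 === 0 ? 1.0 : -1.0;
}

// Given the chroma this pixel stored and the value reconstructed from its
// neighbors, order the pair as (Co, Cg) regardless of which was stored.
function orderChroma(px, py, stored, reconstructed) {
  return getCheckerboard(px, py) > 0.0
    ? [stored, reconstructed]  // even parity: this pixel stored Co
    : [reconstructed, stored]; // odd parity: this pixel stored Cg
}
```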

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for microfacet models. We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 0.04 reflection coefficient

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting: Rendered at 100%

YC Lighting: Rendered at 100%

RGB Lighting: Rendered at 25%

YC Lighting: Rendered at 25%

Let's take a closer look

Enhance: RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance: RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Results- Chroma artifacts incurred from YC lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame
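Alternating the checkerboard each frame means every pixel stores Co on one frame and Cg on the next, so temporal reuse eventually sees both channels at every pixel. A minimal sketch of the idea (function name is illustrative, not from the deck):

```javascript
// Flip the checkerboard parity with the frame index: a pixel that stores Co
// this frame stores Cg next frame, so a temporal pass can accumulate both.
function storesCo(px, py, frameIndex) {
  return (px + py + frameIndex) % 2 === 0;
}
```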

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation: the same

- Chroma calculation: inverted; approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));

    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black.
    weight *= step(1e-5, luminance);

    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0.
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
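The same reconstruction is easy to sanity-check outside the shader. A direct JS port of the function above (each sample is a `[luminance, chroma]` pair, mirroring the GLSL vec2s):

```javascript
// Weight each neighbor's chroma by how closely its luminance matches the
// center pixel's luminance; falloff is exponential in the luma delta.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const SENSITIVITY = 25.0;
  let totalWeight = 0.0;
  let chromaSum = 0.0;
  for (const [lum, chroma] of [a1, a2, a3, a4]) {
    let w = Math.pow(2, -SENSITIVITY * Math.abs(lum - center[0]));
    w *= lum >= 1e-5 ? 1.0 : 0.0; // guard the case where a sample is black
    totalWeight += w;
    chromaSum += chroma * w;
  }
  // Guard the case where all weights are 0.
  return totalWeight > 1e-5 ? [center[1], chromaSum / totalWeight] : [0.0, 0.0];
}
```

With matched luminances the neighbors average directly; a wildly brighter neighbor (an HDR highlight) is driven to near-zero weight, which is the point of the exponential falloff.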

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions: nick@floored.com

@pastasfuture

Resources[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994


Decode G-Buffer RGB Lighting

resmetallic = sign(encodedGBufferw)

- Decode Metallic

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14((encodedGBuffery))

vec2 velocityNormalQuantizedY = uint24_to_uint10_14((encodedGBufferz))

vec2 normalOctohedron

normalOctohedronx = uint14_to_normalizedFloat(velocityNormalQuantizedXy)

normalOctohedrony = uint14_to_normalizedFloat(velocityNormalQuantizedYy)

resnormal = octohedronDecode(normalOctohedron)

- Decode Normal

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and since we now operate on a vec2, we also save the MADD and ADD of the skipped 3rd component.
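Because Y, Co and Cg are each linear combinations of R, G, B, and the YCoCg transform maps white to (1, 0, 0), evaluating Schlick directly in YC space matches converting the RGB result afterward. A small Python check of that equivalence (helper names are ours, and the copper-like F0 is illustrative only):

```python
def rgb_to_ycocg(r, g, b):
    # Standard YCoCg transform: white maps to (1, 0, 0)
    return (r / 4 + g / 2 + b / 4, r / 2 - b / 2, -r / 4 + g / 2 - b / 4)

def fresnel_schlick_rgb(v_dot_h, f0):
    power = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * power + c for c in f0)

def fresnel_schlick_yc(v_dot_h, f0_yc):
    power = (1.0 - v_dot_h) ** 5
    y, chroma = f0_yc
    # Luminance lerps toward 1 (white); chroma scales by (1 - power),
    # vanishing exactly where the RGB Fresnel saturates to white.
    return ((1.0 - y) * power + y, chroma * -power + chroma)

f0 = (0.95, 0.64, 0.54)  # copper-ish F0, illustrative only
for i in range(11):
    v = i / 10
    y_ref, co_ref, _ = rgb_to_ycocg(*fresnel_schlick_rgb(v, f0))
    y, co = fresnel_schlick_yc(v, rgb_to_ycocg(*f0)[:2])
    assert abs(y - y_ref) < 1e-9 and abs(co - co_ref) < 1e-9
```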

YC Lighting- Works fine with the spherical Gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
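As a quick sanity check, the spherical Gaussian form stays close to pow(1.0 - vDotH, 5.0) across the whole [0, 1] range; a Python comparison (our own sketch):

```python
def schlick_power(v_dot_h):
    return (1.0 - v_dot_h) ** 5

def sg_power(v_dot_h):
    # exp2((-5.55473 * vDotH - 6.98316) * vDotH) from [Lagarde 12]
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Sample the absolute error over [0, 1]; it stays small everywhere
max_err = max(abs(schlick_power(i / 256) - sg_power(i / 256)) for i in range(257))
```

The two curves agree to within a few thousandths, which is well below what the 8-bit-per-channel targets here can represent.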

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once: YC|YC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);

  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;

  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
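A direct Python port (our naming) makes the behavior easy to see: neighbors whose luminance matches the center dominate the weighted average, black samples drop out, and a fully rejected neighborhood falls back to zero:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4):
    # center = (luma, stored chroma); a1..a4 = (luma, chroma) cross neighbors
    neighbors = (a1, a2, a3, a4)
    SENSITIVITY = 25.0
    weights = []
    for luma, _ in neighbors:
        # Weight falls off exponentially with luminance difference
        w = 2.0 ** (-SENSITIVITY * abs(luma - center[0]))
        # Guard the case where the sample is black (step(1e-5, luma) in GLSL)
        weights.append(w if luma >= 1e-5 else 0.0)
    total = sum(weights)
    # Guard the case where all weights are 0
    if total <= 1e-5:
        return (0.0, 0.0)
    missing = sum(c * w for (_, c), w in zip(neighbors, weights)) / total
    return (center[1], missing)
```

With a uniform neighborhood the missing chroma is taken verbatim; a neighbor across a strong luminance edge contributes almost nothing.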

Thanks for listening

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources[WebGLStats] WebGL Stats

http://webglstats.com, 2014

[Möller 08] Real-Time Rendering

Tomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.

http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Decode G-Buffer RGB Lighting

vec2 velocityNormalQuantizedX = uint24_to_uint10_14(encodedGBuffer.y);
vec2 velocityNormalQuantizedY = uint24_to_uint10_14(encodedGBuffer.z);

vec2 normalOctahedron;
normalOctahedron.x = uint14_to_normalizedFloat(velocityNormalQuantizedX.y);
normalOctahedron.y = uint14_to_normalizedFloat(velocityNormalQuantizedY.y);
res.normal = octahedronDecode(normalOctahedron);

Decode G-Buffer RGB Lighting

res.velocity = vec2(velocityNormalQuantizedX.x, velocityNormalQuantizedY.x);
res.velocity -= 512.0;

if (max(abs(res.velocity.x), abs(res.velocity.y)) > 510.0) {
  // When velocity is out of representable range, throw it outside of screenspace for culling in future passes
  res.velocity = vec2(1.41521356); // sqrt(2) + 1e-3
} else {
  res.velocity *= inverseGBufferResolution * INVERSE_SUB_PIXEL_PRECISION_STEPS;
}

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));
res.gloss = colorGlossData.z;

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;
colorYcocg.x = colorGlossData.x;
colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light our G-Buffer in RGB space
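For reference, the YCoCg transform and chroma bias behind this decode can be sketched in Python (helper names are ours; the bias matches the deck's 0.5 * 256.0 / 255.0, which centers zero chroma exactly on a representable 8-bit value):

```python
CHROMA_BIAS = 0.5 * 256.0 / 255.0  # = 128 / 255, exactly representable in 8 bits

def rgb_to_ycocg(r, g, b):
    # Y in [0, 1]; Co and Cg are signed, hence the storage bias above
    return (r / 4 + g / 2 + b / 4, r / 2 - b / 2, -r / 4 + g / 2 - b / 4)

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above
    return (y + co - cg, y + cg, y - co - cg)
```

The round trip is exact up to floating-point rounding, so the only information lost in the G-Buffer is the checkerboarded chroma component.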

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;
vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;
vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;
vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;
gBufferSampleYc1.y -= CHROMA_BIAS;
gBufferSampleYc2.y -= CHROMA_BIAS;
gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));
vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));
vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));
vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB->YCoCg. Returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later down the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflectance

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Decode G-Buffer RGB Lighting

resvelocity = vec2(velocityNormalQuantizedXx velocityNormalQuantizedYx)

resvelocity -= 5120

if (max(abs(resvelocityx) abs(resvelocityy)) gt 5100)

When velocity is out of representable range throw it outside of screenspace for culling in future passes

sqrt(2) + 1e-3

resvelocity = vec2(141521356)

else

resvelocity = inverseGBufferResolution INVERSE_SUB_PIXEL_PRECISION_STEPS

- Decode Velocity

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBufferx))

resgloss = colorGlossDataz

- Decode Gloss

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

Decode G-Buffer RGB Lighting

vec3 colorGlossData = uint8_8_8_to_sample(uint24_to_uint8_8_8(encodedGBuffer.x));

res.gloss = colorGlossData.z;

- Decode Gloss
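The unpack helpers above are shader-side and the deck doesn't show their bodies; as a rough CPU-side Python sketch of what they do (the high-to-low byte order and the [0, 1] normalization are assumptions):

```python
def uint24_to_uint8_8_8(u24):
    # Split a 24-bit integer into three bytes, high to low.
    return ((u24 >> 16) & 0xFF, (u24 >> 8) & 0xFF, u24 & 0xFF)

def uint8_8_8_to_sample(bytes3):
    # Normalize each byte to a [0, 1] sample value.
    return tuple(b / 255.0 for b in bytes3)

# A G-Buffer texel packing luminance, biased chroma, and gloss
# into one 24-bit channel:
encoded = (128 << 16) | (64 << 8) | 255
y, c, gloss = uint8_8_8_to_sample(uint24_to_uint8_8_8(encoded))
```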

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 0.5 * 256.0 / 255.0;

vec3 colorYcocg;

colorYcocg.x = colorGlossData.x;

colorYcocg.y = colorGlossData.y - CHROMA_BIAS;

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space
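For intuition, here is the YCoCg transform and the role of CHROMA_BIAS as a CPU-side Python sketch (standard YCoCg as in [Mavridis 12]: Y lands in [0, 1], Co and Cg in [-0.5, 0.5], and the bias of 0.5 * 256/255 recenters signed chroma so that zero chroma quantizes exactly to byte 128):

```python
CHROMA_BIAS = 0.5 * 256.0 / 255.0  # centers signed chroma for 8-bit storage

def rgb_to_ycocg(r, g, b):
    # Forward transform: Y in [0, 1], Co and Cg in [-0.5, 0.5].
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r            - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Inverse transform.
    tmp = y - cg
    return tmp + co, y + cg, tmp - co
```

Note that neutral greys have co = cg = 0, so the biased, quantized chroma byte is exactly 128 and decodes back to zero chroma with no error.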

Decode G-Buffer RGB Lighting

vec4 gBufferSample0 = texture2D(gBufferSampler, vec2(uv.x - inverseGBufferResolution.x, uv.y));

vec4 gBufferSample1 = texture2D(gBufferSampler, vec2(uv.x + inverseGBufferResolution.x, uv.y));

vec4 gBufferSample2 = texture2D(gBufferSampler, vec2(uv.x, uv.y + inverseGBufferResolution.y));

vec4 gBufferSample3 = texture2D(gBufferSampler, vec2(uv.x, uv.y - inverseGBufferResolution.y));

- Sample G-Buffer Cross Neighborhood

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0.x)).xy;

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1.x)).xy;

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2.x)).xy;

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3.x)).xy;

gBufferSampleYc0.y -= CHROMA_BIAS;

gBufferSampleYc1.y -= CHROMA_BIAS;

gBufferSampleYc2.y -= CHROMA_BIAS;

gBufferSampleYc3.y -= CHROMA_BIAS;

- Decode G-Buffer Cross Neighborhood Color YC

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);

float gBufferSampleDepth1 = abs(gBufferSample1.w);

float gBufferSampleDepth2 = abs(gBufferSample2.w);

float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer Cross Neighborhood Depth

// Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);

gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);

gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);

gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);

colorYcocg.yz = offsetDirection > 0.0 ? colorYcocg.yz : colorYcocg.zy;

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB -> YCoCg. Returned as linear RGB for lighting

res.color = sRgbToRgb(YcocgToRgb(colorYcocg));

return res;

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma component

- Can we defer reconstruction until later in the pipe?

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in a resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for a microfacet model. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04 reflection coefficient

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Reconstruct the missing chroma component in a post process
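The checkerboard interleave means each pixel stores luminance plus one of the two chroma components, alternating per pixel. A minimal Python sketch of the parity test (the deck's getCheckerboard works from uv and the G-Buffer resolution; the integer pixel coordinates and sign convention here are assumptions):

```python
def get_checkerboard(px, py):
    # +1.0 on even-parity pixels (store one chroma component),
    # -1.0 on odd-parity pixels (store the other).
    return 1.0 if (px + py) % 2 == 0 else -1.0
```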

Artifacts

Results

- All results are rendered:

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg, Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! (detail crop: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Enhance! (detail crop: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Enhance! (detail crop: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Enhance! (detail crop: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the Luminance Intensity uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted: approaches zero at perpendicular

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {

float power = pow(1.0 - vDotH, 5.0);

return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;

}

YC Lighting

- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {

float power = pow(1.0 - vDotH, 5.0);

return vec2(

(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,

reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y

);

}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an

ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD

and an ADD from the skipped 3rd component
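Why the chroma term loses its additive power contribution: Schlick's formula is affine in the reflection coefficient, YCoCg is a linear transform of RGB, and white maps to (Y, Co, Cg) = (1, 0, 0), so only the luminance row keeps the "+ power" term. A CPU-side Python check of the equivalence (coefficient values are illustrative only):

```python
def fresnel_schlick_rgb(v_dot_h, k):
    # Per-channel Schlick: (1 - k) * power + k
    p = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - c) * p + c for c in k)

def fresnel_schlick_yc(v_dot_h, k_y, k_c):
    # Luminance keeps the additive power term; chroma does not.
    p = (1.0 - v_dot_h) ** 5.0
    return (1.0 - k_y) * p + k_y, k_c * -p + k_c

def rgb_to_ycocg(rgb):
    # Standard YCoCg forward transform.
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)
```

Evaluating Schlick per RGB channel and then converting to YCoCg gives the same result as evaluating the YC form directly on the YCoCg-transformed coefficient, which is exactly what lets the BRDF run in YC space.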

YC Lighting

- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {

float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);

return vec2(

(1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,

reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y

);

}
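As a numeric sanity check that the spherical gaussian fit tracks pow(1 - vDotH, 5) closely over the valid range, a small Python sketch (the 0.01 error bound is an observed tolerance, not a claim from the deck):

```python
def schlick_power(v_dot_h):
    # Reference Schlick power term.
    return (1.0 - v_dot_h) ** 5.0

def spherical_gaussian_power(v_dot_h):
    # [Lagarde 12] exp2-based fit, cheaper than pow on many GPUs.
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Maximum absolute error over vDotH in [0, 1].
max_err = max(abs(schlick_power(i / 1000.0) - spherical_gaussian_power(i / 1000.0))
              for i in range(1001))
```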

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {

vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);

vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);

vec4 lumaDelta = abs(luminance - vec4(center.x));

const float SENSITIVITY = 25.0;

vec4 weight = exp2(-SENSITIVITY * lumaDelta);

// Guard the case where a sample is black

weight *= step(1e-5, luminance);

float totalWeight = weight.x + weight.y + weight.z + weight.w;

// Guard the case where all weights are 0

return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);

}
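A line-for-line Python port of reconstructChromaHDR makes the guard behavior easier to reason about offline (same constants; the (luma, chroma) pair layout mirrors the vec2s above):

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    # center: (luma, chroma); neighbors: four (luma, chroma) cross samples.
    # Weight each neighbor by luminance similarity, as in the shader.
    weights = []
    for luma, _ in neighbors:
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:          # guard the case where a sample is black
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:            # guard the case where all weights are 0
        return (0.0, 0.0)
    recon = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recon)
```

With four neighbors of identical luminance the reconstruction reduces to a plain average of their chroma; a black neighbor contributes nothing.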

Thanks for listening!

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks! Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources

[WebGLStats] WebGL Stats

http://webglstats.com/ 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Decode G-Buffer RGB Lighting

const float CHROMA_BIAS = 05 2560 2550

vec3 colorYcocg

colorYcocgx = colorGlossDatax

colorYcocgy = colorGlossDatay - CHROMA_BIAS

- Decode Color YC

- Now we need to reconstruct the missing chroma sample in order to light

our G-Buffer in RGB space

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Decode G-Buffer RGB Lighting

vec2 gBufferSampleYc0 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample0x))xy

vec2 gBufferSampleYc1 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample1x))xy

vec2 gBufferSampleYc2 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample2x))xy

vec2 gBufferSampleYc3 = uint8_8_8_to_sample(uint24_to_uint8_8_8(gBufferSample3x))xy

gBufferSampleYc0y -= CHROMA_BIAS

gBufferSampleYc1y -= CHROMA_BIAS

gBufferSampleYc2y -= CHROMA_BIAS

gBufferSampleYc3y -= CHROMA_BIAS

- Decode G-Buffer Cross Neighborhood Color YC

vec4 gBufferSample0 = texture2D(gBufferSampler vec2(uvx - inverseGBufferResolutionx uvy))

vec4 gBufferSample1 = texture2D(gBufferSampler vec2(uvx + inverseGBufferResolutionx uvy))

vec4 gBufferSample2 = texture2D(gBufferSampler vec2(uvx uvy + inverseGBufferResolutiony))

vec4 gBufferSample3 = texture2D(gBufferSampler vec2(uvx uvy - inverseGBufferResolutiony))

- Sample G-Buffer Cross Neighborhood

Decode G-Buffer RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0w)

float gBufferSampleDepth1 = abs(gBufferSample1w)

float gBufferSampleDepth2 = abs(gBufferSample2w)

float gBufferSampleDepth3 = abs(gBufferSample3w)

- Decode G-Buffer Cross Neighborhood Depth

Account for samples at infinity by setting their luminance and chroma to 0

gBufferSampleYc0 = gBufferSampleDepth0 gt 00 gBufferSampleYc0 vec2(00)

gBufferSampleYc1 = gBufferSampleDepth1 gt 00 gBufferSampleYc1 vec2(00)

gBufferSampleYc2 = gBufferSampleDepth2 gt 00 gBufferSampleYc2 vec2(00)

gBufferSampleYc3 = gBufferSampleDepth3 gt 00 gBufferSampleYc3 vec2(00)

- Guard Against Chroma Samples at Infinity

Decode G-Buffer RGB Lighting

colorYcocgyz = reconstructChromaComponent(colorYcocgxy gBufferSampleYc0 gBufferSampleYc1 gBufferSampleYc2

gBufferSampleYc3)

- Reconstruct missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv gBufferResolution)

colorYcocgyz = offsetDirection gt 00 diffuseYcocgyz diffuseYcocgzy

- Swizzle chroma samples based on subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

Color stored in sRGB-gtYCoCg Returned as linear RGB for lighting

rescolor = sRgbToRgb(YcocgToRgb(colorYcocg))

return res

Decode G-Buffer RGB Lighting

- Quite a bit of work went into reconstructing that missing chroma

component

- Can we defer reconstruction later down the pipe

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
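As a sanity check (a Python sketch, not part of the talk), the exp2-based spherical gaussian power tracks the pow-based Schlick power closely across the valid vDotH range, and the YC chroma term vanishes at perpendicular incidence (vDotH = 0), as the slides state:

```python
def schlick_power(v_dot_h):
    # Classic Schlick falloff term (1 - vDotH)^5.
    return (1.0 - v_dot_h) ** 5

def spherical_gaussian_power(v_dot_h):
    # [Lagarde 12] approximation: exp2((-5.55473 * vDotH - 6.98316) * vDotH).
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

def fresnel_schlick_yc(v_dot_h, f0_luma, f0_chroma):
    # Mirrors fresnelSchlickYC: luminance rises toward 1.0 at grazing angles,
    # chroma fades toward 0 at perpendicular (power -> 1).
    power = schlick_power(v_dot_h)
    return ((1.0 - f0_luma) * power + f0_luma, f0_chroma * -power + f0_chroma)
```

At vDotH = 0 the chroma component is exactly f0_chroma * (1 - 1) = 0, which is the "inverted" behavior called out on the implementation slide.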

YC Lighting
- Write YC to the RG components of the render target
- Frees up the B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth/stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work

YC Lighting
- Reconstruct the missing chroma component in a post process
- Bilateral filter
  - Luminance similarity
  - Geometric similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT transparency composite
  - Anti-aliasing
  - Tonemapping

YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
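To make the reconstruction concrete, here is a CPU-side sketch (Python, illustrative only; the sensitivity value and tuple layout are our assumptions) mirroring the logic of reconstructChromaHDR: weight each neighbor's chroma by how similar its luminance is to the center pixel's:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    """center = (luma, known_chroma); neighbors = four (luma, other_chroma) samples.

    Returns (known_chroma, reconstructed_chroma), matching the GLSL vec2 result.
    """
    total_weight = 0.0
    chroma_sum = 0.0
    for luma, chroma in neighbors:
        # Neighbors with similar luminance get exponentially larger weights.
        weight = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:  # guard the case where a sample is black
            weight = 0.0
        total_weight += weight
        chroma_sum += chroma * weight
    if total_weight <= 1e-5:  # guard the case where all weights are 0
        return 0.0, 0.0
    return center[1], chroma_sum / total_weight
```

On a flat region (all neighbors share the center's luminance) the weights are equal and the reconstruction collapses to a plain average, so the missing chroma is recovered exactly; the weighting only matters across luminance edges, which is where the checkerboard artifacts would otherwise appear.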

Thanks for listening!

Oh right, we're hiring
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
  - Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources
[WebGLStats] WebGL Stats. http://webglstats.com, 2014
[Möller 08] Real-Time Rendering. Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008
[Hoffman 10] Physically-Based Shading Models in Film and Game Production. http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010
[Lagarde 11] Feeding a Physically-Based Shading Model. https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011
[Burley 12] Physically-Based Shading at Disney. http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012
[Karis 13] Real Shading in Unreal Engine 4. http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final. http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009
[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors. http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014
[Mavridis 12] The Compact YCoCg Frame Buffer. http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012
[Waveren 07] Real-Time YCoCg-DXT Compression. http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007
[Geldreich 04] Deferred Lighting and Shading. https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004
[Hoffman 09] Deferred Lighting Approaches. http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R. http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005
[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1. http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009
[Mittring 09] A Bit More Deferred - CryEngine 3. http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009
[Sousa 13] The Rendering Technologies of Crysis 3. http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013
[Pranckevičius 13] Physically Based Shading in Unity. http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013
[Olsson 11] Tiled Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources
[Billeter 12] Clustered Deferred and Forward Shading. http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012
[Yang 09] Amortized Supersampling. http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009
[Herzog 10] Spatio-Temporal Upsampling on the GPU. https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010
[Wronski 14] Temporal Supersampling and Antialiasing. http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014
[Karis 14] High Quality Temporal Supersampling. https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014
[Walter 07] Microfacet Models for Refraction through Rough Surfaces. http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources
[Heitz 14] Understanding the Masking-Shadowing Function. http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014
[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering. http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994
[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel. https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012
[Oren 94] Generalization of Lambert's Reflectance Model. http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Decode G-Buffer: RGB Lighting

float gBufferSampleDepth0 = abs(gBufferSample0.w);
float gBufferSampleDepth1 = abs(gBufferSample1.w);
float gBufferSampleDepth2 = abs(gBufferSample2.w);
float gBufferSampleDepth3 = abs(gBufferSample3.w);

- Decode G-Buffer cross-neighborhood depth

// Account for samples at infinity by setting their luminance and chroma to 0
gBufferSampleYc0 = gBufferSampleDepth0 > 0.0 ? gBufferSampleYc0 : vec2(0.0);
gBufferSampleYc1 = gBufferSampleDepth1 > 0.0 ? gBufferSampleYc1 : vec2(0.0);
gBufferSampleYc2 = gBufferSampleDepth2 > 0.0 ? gBufferSampleYc2 : vec2(0.0);
gBufferSampleYc3 = gBufferSampleDepth3 > 0.0 ? gBufferSampleYc3 : vec2(0.0);

- Guard against chroma samples at infinity

Decode G-Buffer: RGB Lighting

colorYcocg.yz = reconstructChromaComponent(colorYcocg.xy, gBufferSampleYc0, gBufferSampleYc1, gBufferSampleYc2, gBufferSampleYc3);

- Reconstruct the missing chroma sample based on luminance similarity

float offsetDirection = getCheckerboard(uv, gBufferResolution);
colorYcocg.yz = offsetDirection > 0.0 ? diffuseYcocg.yz : diffuseYcocg.zy;

- Swizzle chroma samples based on the subsampled checkerboard layout

- Color stored in non-linear space to distribute precision perceptually

// Color stored as sRGB->YCoCg; returned as linear RGB for lighting
res.color = sRgbToRgb(YcocgToRgb(colorYcocg));
return res;
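The checkerboard swizzle above alternates which chroma component each pixel stores. A minimal sketch (Python, illustrative only; the parity convention and function names are our assumptions, mirroring getCheckerboard):

```python
def get_checkerboard(x, y):
    # Alternates 1.0 / 0.0 in a checkerboard over integer pixel coordinates.
    return float((x + y) % 2)

def pick_chroma_order(x, y, co, cg):
    # Odd-parity pixels read their stored pair as (Co, Cg); even-parity
    # pixels stored the components swapped, so the decode swizzles (Cg, Co)
    # back into a consistent (Co, Cg) order before converting to RGB.
    return (co, cg) if get_checkerboard(x, y) > 0.0 else (cg, co)
```

Alternating the parity each frame (as the Results slide suggests) lets a temporal pass see both chroma components of every pixel over two frames.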

Decode G-Buffer: RGB Lighting
- Quite a bit of work went into reconstructing that missing chroma component
- Can we defer reconstruction later down the pipe?

Light Pre-pass

Light Pre-pass
- Many resources
  - [Geldreich 04] [Shishkovtsov 05] [Lobanchikov 09] [Mittring 09] [Hoffman 09] [Sousa 13] [Pranckevičius 13]
- Accumulate lighting unmodulated by albedo or specular color
- Modulate by albedo and specular color in a resolve pass
- Pulls Fresnel out of the integral with an nDotV approximation
  - Bad for a microfacet model: we want nDotH
- Could light pre-pass all non-metallic pixels due to the constant 0.04
  - Keep Fresnel inside the integral for the nDotH evaluation
  - Requires running through all lights twice

YC Lighting

YC Lighting
- Light our G-Buffer in chroma-subsampled YC space
- Reconstruct the missing chroma component in a post process

Artifacts

Results
- All results are rendered with:
  - Direct light only
  - No anti-aliasing
  - No temporal techniques
  - G-Buffer color component: YCoCg, checkerboard interlaced
- Unique settings will accompany each result
- Percentages represent render target dimensions, not pixel count

RGB Lighting: rendered at 100%

YC Lighting: rendered at 100%

RGB Lighting: rendered at 25%

YC Lighting: rendered at 25%

Let's take a closer look

Enhance (detail-shot comparison slides): RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%


httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Light Pre-pass

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls Fresnel out of the integral with an nDotV approximation

- Bad for microfacet models. We want nDotH

- Could light pre-pass all non-metallic pixels due to the constant 0.04

- Keep Fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice
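The two-pass split above can be sketched CPU-side. This is a toy scalar model with hypothetical names (`accumulateLighting`, `resolve`), not Floored's shader code; it only illustrates why Fresnel gets pulled out of the per-light sum:

```javascript
// Pass 1: accumulate lighting unmodulated by albedo / specular color.
function accumulateLighting(lights, nDotLOf) {
  let diffuse = 0.0, specular = 0.0;
  for (const light of lights) {
    const nDotL = nDotLOf(light);
    if (nDotL <= 0.0) continue;
    diffuse += light.intensity * nDotL;
    // Fresnel is NOT evaluated here: that would need per-light nDotH,
    // which is exactly what the pre-pass loses (see bullets above).
    specular += light.intensity * Math.pow(nDotL, 32); // toy specular lobe
  }
  return { diffuse, specular };
}

// Pass 2: resolve, modulating accumulated light by surface color.
function resolve(accum, albedo, specColor) {
  return albedo * accum.diffuse + specColor * accum.specular;
}
```

With `specColor` near the non-metal constant 0.04, the resolve-time modulation is why a pre-pass over all non-metallic pixels is tempting despite the Fresnel compromise.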

YC Lighting

YC Lighting- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct missing chroma component in a post process
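The scheme rests on the RGB to YCoCg transform ([Waveren 07], [Mavridis 12]) and on storing only one chroma channel per pixel in a checkerboard. A minimal JavaScript sketch (helper names are illustrative, not Floored's actual code):

```javascript
// RGB -> YCoCg: Y is luminance, Co/Cg are orange/green chroma.
function rgbToYCoCg([r, g, b]) {
  return [
    r * 0.25 + g * 0.5 + b * 0.25,  // Y
    r * 0.5 - b * 0.5,              // Co
    -r * 0.25 + g * 0.5 - b * 0.25, // Cg
  ];
}

// Exact inverse of the transform above.
function yCoCgToRgb([y, co, cg]) {
  return [y + co - cg, y + cg, y - co - cg];
}

// The G-Buffer keeps only (Y, C) per pixel: Co on "even" checkerboard
// cells, Cg on "odd" ones; the missing chroma is reconstructed later.
function packYC(rgb, x, yPix) {
  const [y, co, cg] = rgbToYCoCg(rgb);
  return [y, (x + yPix) % 2 === 0 ? co : cg];
}
```

Full-rate luminance with half-rate chroma is the same bet video codecs make: the eye is far more sensitive to luminance detail than to chroma detail.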

Artifacts

Results- All results are rendered with

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component: YCoCg, Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, Rendered at 100%

YC Lighting, Rendered at 100%

RGB Lighting, Rendered at 25%

YC Lighting, Rendered at 25%

Let's take a closer look

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Enhance! RGB Lighting 100%

YC Lighting 100% | YC Lighting 25%

RGB Lighting 25%

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame
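Alternating the checkerboard each frame just means flipping the phase of the chroma selection with the frame index; over any two consecutive frames every pixel has stored both Co and Cg, which is what lets a temporal resolve hide the boundaries. A minimal sketch (illustrative helper name):

```javascript
// Flip the checkerboard phase every frame: a pixel that stores Co this
// frame stores Cg next frame, so a temporal pass can blend both.
function storesCo(x, y, frame) {
  return (x + y + frame) % 2 === 0;
}
```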

Implementation

YC Lighting- Light our G-Buffer in chroma-subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlick's Approximation of Fresnel

- Luminance calculation stays the same

- Chroma calculation is inverted; it approaches zero at perpendicular

YC Lighting- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper! Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.

YC Lighting- Works fine with the spherical Gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
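Why lighting directly in YC is legitimate: the YCoCg transform is linear, Schlick's form is affine in the reflection coefficient, and pure white has zero chroma, so evaluating Fresnel in YC agrees exactly with evaluating in RGB and converting afterward. A quick JavaScript check of that identity (transform rows as in [Waveren 07]; the gold-ish F0 is just a test value):

```javascript
// RGB Schlick Fresnel, component-wise over an F0 triple.
function fresnelSchlick(vDotH, f0) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return f0.map((c) => (1.0 - c) * power + c);
}

// YC Schlick Fresnel on a (luminance, chroma) reflection coefficient.
function fresnelSchlickYC(vDotH, [fy, fc]) {
  const power = Math.pow(1.0 - vDotH, 5.0);
  return [(1.0 - fy) * power + fy, fc * -power + fc];
}

// Y and Co rows of the RGB -> YCoCg transform.
const lumaOf = ([r, g, b]) => r * 0.25 + g * 0.5 + b * 0.25;
const coOf = ([r, , b]) => r * 0.5 - b * 0.5;

const f0 = [0.95, 0.64, 0.54]; // test F0, roughly gold
const vDotH = 0.3;
const rgb = fresnelSchlick(vDotH, f0);
const yc = fresnelSchlickYC(vDotH, [lumaOf(f0), coOf(f0)]);
// yc[0] matches lumaOf(rgb); yc[1] matches coOf(rgb)
```

The chroma component shrinking toward zero as `power` approaches 1 is the "inverted" calculation from the bullets above: at grazing angles Fresnel reflectance goes white, so its chroma vanishes.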

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once (YCYC)

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
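For reference, a direct JavaScript port of the function above (arrays standing in for vec2/vec4; behavior should match the GLSL): `center` is the pixel's (luminance, stored chroma) pair and `a1..a4` are the cross neighbors, which hold the chroma channel the center is missing.

```javascript
// Luminance-weighted bilateral blend of the four neighbors' chroma.
function reconstructChromaHDR(center, a1, a2, a3, a4) {
  const samples = [a1, a2, a3, a4];
  const luminance = samples.map((s) => s[0]);
  const chroma = samples.map((s) => s[1]);
  const SENSITIVITY = 25.0;
  let weight = luminance.map(
    (l) => Math.pow(2.0, -SENSITIVITY * Math.abs(l - center[0]))
  );
  // Guard the case where a sample is black.
  weight = weight.map((w, i) => (luminance[i] >= 1e-5 ? w : 0.0));
  const totalWeight = weight.reduce((a, b) => a + b, 0.0);
  // Guard the case where all weights are 0.
  if (totalWeight <= 1e-5) return [0.0, 0.0];
  const blended =
    chroma.reduce((acc, c, i) => acc + c * weight[i], 0.0) / totalWeight;
  return [center[1], blended];
}
```

In a flat region (neighbor luminance equal to the center's) this degenerates to a plain average of the neighbors' chroma; across a strong luminance edge the exp2 falloff suppresses the far-side samples, which is what keeps chroma from bleeding across silhouettes.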

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering- Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources [WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources [Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources [Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.

http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources [Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources [Heitz 14] Understanding the Masking-Shadowing Function

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Light Pre-pass

- Many resources

- [Geldreich 04][Shishkovtsov 05][Lobanchikov 09][Mittring 09][Hoffman 09][Sousa 13][Pranckevičius 13]

- Accumulate lighting unmodulated by albedo or specular color

- Modulate by albedo and specular color in resolve pass

- Pulls fresnel out of the integral with nDotV approximation

- Bad for microfacet model We want nDotH

- Could light pre-pass all non-metallic pixels due to constant 004

- Keep fresnel inside the integral for nDotH evaluation

- Requires running through all lights twice

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Reconstruct missing chroma component in a post process

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources

[WebGLStats] WebGL Stats, http://webglstats.com/, 2014

[Möller 08] Real-Time Rendering, Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production, http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model, https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney, http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4, http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer, http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading, https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches, http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R., http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Masking-Shadowing Function, http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Reconstruct the missing chroma component in a post process
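The two bullets above can be made concrete with a small numerical sketch (ours, for illustration only, not Floored's pipeline code): the RGB-to-YCoCg transform from [Mavridis 12] and the checkerboard interlacing that stores luma plus only one chroma component per pixel.

```python
def rgb_to_ycocg(r, g, b):
    # Forward YCoCg transform (see [Mavridis 12])
    y = 0.25 * r + 0.5 * g + 0.25 * b
    co = 0.5 * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the forward transform
    return y + co - cg, y + cg, y - co - cg

def interlaced_pixel(x, y_px, rgb):
    """Each pixel keeps luma plus ONE chroma component,
    alternating Co/Cg in a checkerboard pattern."""
    y, co, cg = rgb_to_ycocg(*rgb)
    return (y, co) if (x + y_px) % 2 == 0 else (y, cg)
```

The transform is a cheap, exactly invertible linear map, which is what makes lighting directly in YC and converting back after reconstruction viable.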

Artifacts

Results

- All results are rendered with:

- Direct light only

- No anti-aliasing

- No temporal techniques

- G-Buffer color component: YCoCg, checkerboard interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions, not pixel count

RGB Lighting, rendered at 100%

YC Lighting, rendered at 100%

RGB Lighting, rendered at 25%

YC Lighting, rendered at 25%

Let's take a closer look

Enhance! (four detail crops, each comparing: RGB Lighting 100% | YC Lighting 100% | YC Lighting 25% | RGB Lighting 25%)

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings

- Challenging to find artifacts when viewed at 100%

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate the checkerboard pattern each frame
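To picture that last bullet, here is a hypothetical sketch (the helper is ours, not from the deck): flipping the checkerboard parity with the frame index means every pixel stores Co on one frame and Cg on the next, so a temporal filter sees true samples of both chroma channels over any two-frame window.

```python
def stored_chroma(x, y, frame):
    # Which chroma component pixel (x, y) stores this frame;
    # the checkerboard phase alternates with the frame index.
    return 'Co' if (x + y + frame) % 2 == 0 else 'Cg'
```

For example, pixel (3, 4) stores Cg on even frames and Co on odd ones, so its missing channel only ever has to be interpolated for a single frame.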

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg space

- Access light color in YCoCg space

- Already have Y from the luminance intensity uniform

- Color becomes a vec2 chroma

- Modify BRDF evaluation to run in YCoCg space

- Schlick's Approximation of Fresnel

- Luminance calculation is the same

- Chroma calculation is inverted: it approaches zero at perpendicular (grazing) incidence

YC Lighting

- RGB Schlick's Approximation of Fresnel [Schlick 94]:

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's Approximation of Fresnel:

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled by the expansion from vector to scalar arithmetic: we save an ADD in the 2nd component, and because we now operate on a vec2, we also save the MADD and ADD of the skipped 3rd component.
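This is not an approximation: Schlick's formula is affine in the reflection coefficient, YCoCg is a linear transform of RGB, and white maps to Y = 1, Co = Cg = 0, so evaluating Fresnel directly in YCoCg matches evaluating in RGB and converting afterward. A quick Python check of that claim (helper names are ours, not from the slides):

```python
def fresnel_schlick_rgb(v_dot_h, f0):
    # Per-channel RGB Schlick: F = (1 - F0) * (1 - vDotH)^5 + F0
    p = (1.0 - v_dot_h) ** 5
    return [(1.0 - c) * p + c for c in f0]

def fresnel_schlick_yc(v_dot_h, y, chroma):
    # Luminance as in RGB; chroma inverted, fading to 0 at grazing
    p = (1.0 - v_dot_h) ** 5
    return (1.0 - y) * p + y, chroma * -p + chroma

def rgb_to_ycocg(rgb):
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,
            0.5 * r - 0.5 * b,
            -0.25 * r + 0.5 * g - 0.25 * b)

f0 = (0.95, 0.64, 0.54)  # a gold-like F0, purely illustrative
y0, co0, cg0 = rgb_to_ycocg(f0)
for v_dot_h in (0.0, 0.3, 0.7, 1.0):
    y_ref, co_ref, cg_ref = rgb_to_ycocg(fresnel_schlick_rgb(v_dot_h, f0))
    y_yc, co_yc = fresnel_schlick_yc(v_dot_h, y0, co0)
    _, cg_yc = fresnel_schlick_yc(v_dot_h, y0, cg0)
    assert abs(y_ref - y_yc) < 1e-12
    assert abs(co_ref - co_yc) < 1e-12 and abs(cg_ref - cg_yc) < 1e-12
```

So the cheaper YC form is exact, not a quality tradeoff, for any F0.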

YC Lighting

- Works fine with the spherical Gaussian approximation [Lagarde 12] too:

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
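The constant pair is the spherical Gaussian fit of the Schlick power term: exp2((-5.55473 * x - 6.98316) * x) approximates (1 - x)^5, trading a pow for an exp2. A quick numerical sanity check (the error bound asserted below is our observation, not a figure from the slides):

```python
def pow5(x):
    # Exact Schlick power term
    return (1.0 - x) ** 5

def sg_approx(x):
    # Spherical Gaussian approximation of (1 - x)^5 [Lagarde 12]
    return 2.0 ** ((-5.55473 * x - 6.98316) * x)

# Sample the approximation error over the valid domain x in [0, 1]
max_err = max(abs(pow5(i / 100.0) - sg_approx(i / 100.0)) for i in range(101))
```

The fit is exact at x = 0 and stays within a small absolute error across the domain, which is well below what is visible in shaded output.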

YC Lighting

- Write YC to the RG components of the render target

- Frees up the B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] and clustered [Billeter 12] approaches

- Future work

YC Lighting

- Reconstruct the missing chroma component in a post process

- Bilateral filter

- Luminance similarity

- Geometric similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT transparency composite

- Anti-aliasing

- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data:

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
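For intuition, here is our Python transliteration of that shader (not Floored's code): `center` holds (luma, stored chroma) and `a1..a4` are the neighboring samples carrying the opposite chroma component; neighbors whose luminance is close to the center's dominate the reconstructed chroma.

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    """Weight each neighbor's chroma by how close its luminance is to
    the center pixel's; returns (stored chroma, reconstructed chroma)."""
    samples = [a1, a2, a3, a4]
    luminance = [s[0] for s in samples]
    chroma = [s[1] for s in samples]
    # Luminance-similarity weights (the bilateral term)
    weight = [2.0 ** (-sensitivity * abs(l - center[0])) for l in luminance]
    # Guard the case where a sample is black
    weight = [w * (1.0 if l >= 1e-5 else 0.0) for w, l in zip(weight, luminance)]
    total = sum(weight)
    if total <= 1e-5:
        # Guard the case where all weights are 0
        return (0.0, 0.0)
    return (center[1], sum(c * w for c, w in zip(chroma, weight)) / total)
```

In a flat region the weights are equal and the missing chroma is a plain average; across a strong luminance edge the far-side samples are suppressed, which is what keeps chroma from bleeding between surfaces.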

Thanks for listening

Oh right, we're hiring!

- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Artifacts

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Results- All results are rendered

- Direct Light Only

- No Anti-Aliasing

- No Temporal Techniques

- G-Buffer Color Component YCoCg Checkerboard Interlaced

- Unique settings will accompany each result

- Percentages represent render target dimensions not pixel count

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

RGB Lighting Rendered at 100

YC Lighting Rendered at 100

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with the spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
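Unlike the exact YC rewrite, the spherical gaussian form approximates pow(1 - vDotH, 5). A quick numeric check of how close it stays (the 0.01 tolerance is our choice):

```python
def schlick_power(v_dot_h):
    return (1.0 - v_dot_h) ** 5

def schlick_power_sg(v_dot_h):
    # exp2-based spherical gaussian approximation [Lagarde 12];
    # exp2 is typically cheaper than pow on GPUs.
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)

# Stays within roughly 0.004 of the exact power term over [0, 1].
worst = max(abs(schlick_power(i / 100) - schlick_power_sg(i / 100))
            for i in range(101))
assert worst < 0.01
```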

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once (YCYC)

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Can't conservatively depth/stencil test light proxies

- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass. Plenty of candidates:

- OIT Transparency Composite

- Anti-Aliasing

- Tonemapping

YC Lighting- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);
    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;
    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
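A direct Python port of reconstructChromaHDR is handy for inspecting the weighting offline. The list-of-neighbors signature is our adaptation of the four vec2 arguments:

```python
def reconstruct_chroma_hdr(center, neighbors, sensitivity=25.0):
    """center and each neighbor are (luminance, chroma) pairs; the
    neighbors carry the chroma component the center pixel lacks."""
    weights = []
    for lum, _ in neighbors:
        # Weight falls off exponentially with luminance difference.
        w = 2.0 ** (-sensitivity * abs(lum - center[0]))
        if lum < 1e-5:      # guard: black samples carry no usable chroma
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:       # guard: all weights zero
        return (0.0, 0.0)
    missing = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], missing)
```

Neighbors whose luminance matches the center dominate the weighted average, while a neighbor across a strong luminance edge is suppressed, which is what keeps chroma from bleeding over edges.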

Thanks for listening!

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources[WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.

http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources[Heitz 14] Understanding the Masking-Shadowing Function

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994


EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

RGB Lighting Rendered at 25

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting Rendered at 25

Letrsquos take a closer look

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function
http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar, 1994

Let's take a closer look.

[Figure series: detail crops comparing RGB Lighting vs. YC Lighting, each shown at 100% and 25% zoom]

Results

- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame
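The checkerboard idea above can be sketched on the CPU. This is an illustrative Python sketch, not Floored's code; the helper names are mine. It stores full-resolution luma plus one chroma component per pixel, with the checkerboard flipping each frame so temporal filtering eventually sees both components everywhere:

```python
def stored_chroma_index(x, y, frame):
    """0 -> this pixel stores Co this frame, 1 -> it stores Cg."""
    # Flipping with `frame` alternates the checkerboard per frame.
    return (x + y + frame) & 1

def encode_pixel(luma, co, cg, x, y, frame):
    """Pack (luma, one chroma component) -- the 'YC' layout."""
    chroma = co if stored_chroma_index(x, y, frame) == 0 else cg
    return (luma, chroma)
```

Neighboring pixels (and the same pixel on the next frame) carry the other chroma component, which is what the bilateral reconstruction described later recovers.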

Implementation

YC Lighting

- Light our G-Buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the luminance intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's Approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: approaches zero at perpendicular
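For reference, the YCoCg transform assumed throughout is the standard one used by the compact YCoCg frame buffer of [Mavridis 12]. A Python sketch (function names are mine):

```python
def rgb_to_ycocg(r, g, b):
    y  =  0.25 * r + 0.5 * g + 0.25 * b   # luma
    co =  0.5  * r           - 0.5  * b   # orange/cyan chroma
    cg = -0.25 * r + 0.5 * g - 0.25 * b   # green/magenta chroma
    return y, co, cg

def ycocg_to_rgb(y, co, cg):
    # Exact inverse of the transform above
    return y + co - cg, y + cg, y - co - cg
```

Note that white (1, 1, 1) maps to luma 1 and zero chroma, which is what makes the Fresnel math below work out.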

YC Lighting - RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient)
{
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting - YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component, not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
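Why evaluating Schlick directly in YC space is safe: the lerp from F0 toward white is linear, YCoCg is a linear transform, and white has luma 1 and zero chroma, so the YC result matches converting the RGB result exactly. A quick Python check (sketch only; names are mine):

```python
def schlick_rgb(v_dot_h, f0):
    # Classic RGB Schlick: lerp from F0 toward white by (1 - vDotH)^5
    p = (1.0 - v_dot_h) ** 5
    return tuple((1.0 - c) * p + c for c in f0)

def schlick_yc(v_dot_h, f0_yc):
    # Same lerp on (luma, one chroma); white is (1, 0) in YC space
    p = (1.0 - v_dot_h) ** 5
    return ((1.0 - f0_yc[0]) * p + f0_yc[0],
            f0_yc[1] * -p + f0_yc[1])
```

Taking F0 = (0.9, 0.6, 0.3), its luma/Co pair is (0.6, 0.3); converting the RGB Fresnel result to YCoCg reproduces the YC Fresnel result to machine precision.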

YC Lighting - Works fine with the spherical Gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC)
{
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}
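The spherical Gaussian version only swaps the pow for an exp2. A Python sketch comparing the two power terms, assuming [Lagarde 12]'s constants (helper names are mine):

```python
def schlick_power(v_dot_h):
    # Schlick's original pow(1 - vDotH, 5)
    return (1.0 - v_dot_h) ** 5

def spherical_gaussian_power(v_dot_h):
    # exp2-based approximation from [Lagarde 12]
    return 2.0 ** ((-5.55473 * v_dot_h - 6.98316) * v_dot_h)
```

The two stay within about 0.01 of each other over [0, 1], while exp2 maps to cheaper GPU instructions than pow.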

YC Lighting - Write YC to the RG components of the render target

- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting - Write YC to the RG components of the render target

- Could write to an RGBA target and light 2 pixels at once: YC|YC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work

YC Lighting - Reconstruct the missing chroma component in a post process

- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT Transparency Composite
- Anti-Aliasing
- Tonemapping

YC Lighting - Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);

    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
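A CPU-side Python port of the reconstruction above can be handy for sanity-checking the weighting behavior (illustrative only; names are mine). Each sample is a (luma, stored_chroma) pair; the four neighbors in the checkerboard hold the chroma component the center pixel is missing:

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    """Returns (center's own chroma, recovered missing chroma)."""
    neighbors = (a1, a2, a3, a4)
    weights = []
    for luma, _ in neighbors:
        # Weight falls off exponentially with luminance difference
        w = 2.0 ** (-sensitivity * abs(luma - center[0]))
        if luma < 1e-5:          # guard the case where a sample is black
            w = 0.0
        weights.append(w)
    total = sum(weights)
    if total <= 1e-5:            # guard the case where all weights are 0
        return (0.0, 0.0)
    recovered = sum(w * c for w, (_, c) in zip(weights, neighbors)) / total
    return (center[1], recovered)
```

In a flat region all four neighbors contribute equally; across a strong luminance edge the far-side samples are suppressed, which is what keeps chroma from bleeding across geometry.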

Thanks for listening!

Oh right, we're hiring! If you enjoy working on these sorts of problems, let us know.

- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions?

nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009




EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

EnhanceRGB Lighting 100

YC Lighting 100 YC Lighting 25

RGB Lighting 25

Results- Chroma artifacts incurred from YC Lighting seem a fair tradeoff for decode savings

- Challenging to find artifacts when viewed at 100

- Easy to find artifacts in detail shots

- Artifacts occur at strong chroma boundaries

- Depends on art direction

- Temporal techniques can significantly mitigate artifacts

- Can alternate checkerboard pattern each frame

Implementation

YC Lighting- Light our G-Buffer in chroma subsampled YC space

- Modify incoming radiance evaluation to run in YCoCg Space

- Access light color in YCoCg Space

- Already have Y from Luminance Intensity Uniform

- Color becomes vec2 chroma

- Modify BRDF evaluation to run in YCoCg Space

- Schlickrsquos Approximation of Fresnel

- Luminance calculation the same

- Chroma calculation inverted approaches zero at perpendicular

YC Lighting- RGB Schlickrsquos Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH const in vec3 reflectionCoefficient)

float power = pow(10 - vDotH 50)

return (10 - reflectionCoefficient) power + reflectionCoefficient

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources

[Pranckevičius 09] Encoding Floats to RGBA - The Final, http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors, http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer, http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression, http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading, https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches, http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

Resources

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R., http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1, http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3, http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3, http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity, http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

Resources

[Billeter 12] Clustered Deferred and Forward Shading, http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling, http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU, https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing, http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling, https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces, http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

Resources

[Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs, http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering, http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel, https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model, http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

Enhance!

[Side-by-side comparison crops: RGB Lighting 100%, YC Lighting 100%, YC Lighting 25%, RGB Lighting 25%]

Results

- Chroma artifacts incurred from YC lighting seem a fair tradeoff for the decode savings
- Challenging to find artifacts when viewed at 100%
- Easy to find artifacts in detail shots
- Artifacts occur at strong chroma boundaries
- Depends on art direction
- Temporal techniques can significantly mitigate artifacts
- Can alternate the checkerboard pattern each frame

Implementation

YC Lighting

- Light our G-buffer in chroma-subsampled YC space
- Modify incoming radiance evaluation to run in YCoCg space
- Access light color in YCoCg space
- Already have Y from the luminance intensity uniform
- Color becomes a vec2 chroma
- Modify BRDF evaluation to run in YCoCg space
- Schlick's approximation of Fresnel
- Luminance calculation stays the same
- Chroma calculation is inverted: it approaches zero at perpendicular
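For reference, the luma/chroma split above can be sketched with the standard RGB to YCoCg forward transform (the exact matrix is an assumption on our part; the deck does not show it). A light's RGB color collapses to one luma scalar Y plus a 2-component chroma (Co, Cg):

```c
#include <assert.h>

typedef struct { float y, co, cg; } YCoCg;

/* Standard RGB -> YCoCg forward transform. */
YCoCg rgbToYCoCg(float r, float g, float b) {
    YCoCg out;
    out.y  =  0.25f * r + 0.50f * g + 0.25f * b;  /* luma */
    out.co =  0.50f * r             - 0.50f * b;  /* orange-blue chroma */
    out.cg = -0.25f * r + 0.50f * g - 0.25f * b;  /* green-magenta chroma */
    return out;
}
```

Grays map to zero chroma, which is why a white light needs only its luminance intensity uniform and a vec2 of chroma for tinted lights.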

YC Lighting

- RGB Schlick's approximation of Fresnel [Schlick 94]:

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
    float power = pow(1.0 - vDotH, 5.0);
    return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting

- YC Schlick's approximation of Fresnel:

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = pow(1.0 - vDotH, 5.0);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving a MADD and an ADD from the skipped 3rd component.

YC Lighting

- Works fine with the spherical Gaussian approximation [Lagarde 12] too:

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
    float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
    return vec2(
        (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
        reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
    );
}

YC Lighting

- Write YC to the RG components of the render target
- Frees up the B component
- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting

- Write YC to the RG components of the render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
- Write bandwidth savings
- Where typical scenes are bottlenecked
- Only applicable for billboard rasterization
- Can't conservatively depth/stencil test light proxies
- Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
- Future work
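One way to lay out the two chroma planes, in the spirit of the compact YCoCg frame buffer [Mavridis 12]: every pixel stores luma, and which chroma channel it stores alternates in a checkerboard. Folding the frame index into the parity is one sketch of the per-frame alternation mentioned in the Results; the convention below is ours, the deck does not specify one.

```c
#include <assert.h>

/* Returns 1 if pixel (x, y) stores Co this frame, 0 if it stores Cg.
   Adding the frame index flips the checkerboard every frame, so a
   temporal resolve sees both chroma channels at each pixel over time. */
int pixelStoresCo(int x, int y, int frame) {
    return ((x + y + frame) & 1) == 0;
}
```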

YC Lighting

- Reconstruct the missing chroma component in a post process
- Bilateral filter
- Luminance similarity
- Geometric similarity
- Depth
- Normal
- Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
- OIT transparency composite
- Anti-aliasing
- Tonemapping

YC Lighting

- Simple luminance-based chroma reconstruction function for radiance data:

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where a sample is black
    weight *= step(1e-5, luminance);

    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}


Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting
- RGB Schlick's Approximation of Fresnel [Schlick 94]

vec3 fresnelSchlick(const in float vDotH, const in vec3 reflectionCoefficient) {
  float power = pow(1.0 - vDotH, 5.0);
  return (1.0 - reflectionCoefficient) * power + reflectionCoefficient;
}

YC Lighting
- YC Schlick's Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = pow(1.0 - vDotH, 5.0);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}

- Slightly cheaper. Don't be fooled that we expanded from vector to scalar arithmetic: we save an ADD in the 2nd component. Not to mention we are now operating on a vec2, saving us a MADD and an ADD from the skipped 3rd component.
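Because Schlick's approximation is affine in the reflection coefficient, the YC form above is exactly the RGB form seen through a linear luma/chroma transform. A quick CPU check of that equivalence (my sketch, not from the deck; it uses the usual YCoCg-style weights and tracks only the (Y, Co) pair):

```python
def fresnel_schlick_rgb(v_dot_h, f0_rgb):
    # Per-channel Schlick: lerp from f0 toward white at grazing angles.
    power = (1.0 - v_dot_h) ** 5.0
    return tuple((1.0 - f0) * power + f0 for f0 in f0_rgb)

def rgb_to_yco(rgb):
    # Linear transform; luma weights sum to 1, chroma weights sum to 0.
    r, g, b = rgb
    return (0.25 * r + 0.5 * g + 0.25 * b,  # luma Y
            0.5 * r - 0.5 * b)              # chroma Co

def fresnel_schlick_yc(v_dot_h, f0_yc):
    power = (1.0 - v_dot_h) ** 5.0
    return ((1.0 - f0_yc[0]) * power + f0_yc[0],  # luma pulled toward 1
            f0_yc[1] * -power + f0_yc[1])         # chroma pulled toward 0

# RGB Fresnel then convert == convert then YC Fresnel, at any angle.
f0 = (0.04, 0.05, 0.06)
for v in (0.0, 0.3, 0.7, 1.0):
    a = rgb_to_yco(fresnel_schlick_rgb(v, f0))
    b = fresnel_schlick_yc(v, rgb_to_yco(f0))
    assert all(abs(x - y) < 1e-12 for x, y in zip(a, b))
```

The chroma term needs no `1.0 - f0` because white has zero chroma: the grazing-angle target is (1, 0) in YC, which is what makes the single-MADD form possible.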

YC Lighting
- Works fine with the spherical gaussian approximation [Lagarde 12] too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH, const in vec2 reflectionCoefficientYC) {
  float power = exp2((-5.55473 * vDotH - 6.98316) * vDotH);
  return vec2(
    (1.0 - reflectionCoefficientYC.x) * power + reflectionCoefficientYC.x,
    reflectionCoefficientYC.y * -power + reflectionCoefficientYC.y
  );
}
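A quick numeric sanity check (mine, not from the deck) that the spherical gaussian form stays close to pow(1 - v, 5) over the whole range, which is why either can feed the YC Fresnel:

```python
def pow5(v):
    # Reference Schlick falloff term.
    return (1.0 - v) ** 5.0

def sg(v):
    # Spherical gaussian approximation from [Lagarde 12].
    return 2.0 ** ((-5.55473 * v - 6.98316) * v)

# Worst-case absolute error over v in [0, 1], sampled at 1% steps.
max_err = max(abs(pow5(i / 100.0) - sg(i / 100.0)) for i in range(101))
assert max_err < 0.02
```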

YC Lighting
- Write YC to RG components of render target
- Frees up B component
  - Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting
- Write YC to RG components of render target
- Could write to an RGBA target and light 2 pixels at once: YCYC
  - Write bandwidth savings
    - Where typical scenes are bottlenecked
  - Only applicable for billboard rasterization
    - Can't conservatively depth/stencil test light proxies
  - Interesting for tiled deferred [Olsson 11] / clustered [Billeter 12] approaches
  - Future work
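The storage scheme implied above can be sketched as follows: each pixel of an RG target keeps luma plus one chroma component, alternating Co/Cg in a checkerboard, in the spirit of the compact YCoCg frame buffer [Mavridis 12]. Function names and the parity convention are my assumptions:

```python
def rgb_to_ycocg(rgb):
    # Standard YCoCg transform (luma row sums to 1).
    r, g, b = rgb
    y  =  0.25 * r + 0.5 * g + 0.25 * b
    co =  0.5  * r - 0.5 * b
    cg = -0.25 * r + 0.5 * g - 0.25 * b
    return y, co, cg

def pack_yc(rgb, px, py):
    """Return the (Y, C) pair stored at integer pixel (px, py):
    even-parity pixels keep Co, odd-parity pixels keep Cg."""
    y, co, cg = rgb_to_ycocg(rgb)
    return (y, co) if (px + py) % 2 == 0 else (y, cg)
```

The missing chroma component at each pixel is what the reconstruction pass below has to recover from neighbors.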

YC Lighting
- Reconstruct missing chroma component in a post process
- Bilateral Filter
  - Luminance Similarity
  - Geometric Similarity
    - Depth
    - Normal
    - Plane
- Wrap into a pre-existing billboard pass. Plenty of candidates:
  - OIT Transparency Composite
  - Anti-Aliasing
  - Tonemapping
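The deck lists the similarity terms but not how they combine; one plausible way to fold them into a single bilateral weight (entirely my sketch; the sensitivity constants and dict layout are hypothetical):

```python
def bilateral_weight(center, sample, luma_sens=25.0, depth_sens=50.0, normal_sens=8.0):
    """center/sample are dicts with 'luma', 'depth', and unit-length 'normal'.
    Each similarity term falls off independently; the product rejects a
    sample if ANY term says the surfaces differ."""
    w_luma  = 2.0 ** (-luma_sens * abs(center['luma'] - sample['luma']))
    w_depth = 2.0 ** (-depth_sens * abs(center['depth'] - sample['depth']))
    n_dot   = sum(a * b for a, b in zip(center['normal'], sample['normal']))
    w_norm  = max(n_dot, 0.0) ** normal_sens
    return w_luma * w_depth * w_norm
```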

YC Lighting
- Simple luminance-based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4) {
  vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
  vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
  vec4 lumaDelta = abs(luminance - vec4(center.x));
  const float SENSITIVITY = 25.0;
  vec4 weight = exp2(-SENSITIVITY * lumaDelta);
  // Guard the case where a sample is black
  weight *= step(1e-5, luminance);
  float totalWeight = weight.x + weight.y + weight.z + weight.w;
  // Guard the case where all weights are 0
  return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
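A line-for-line Python transcription of the GLSL above, useful for testing the weighting logic on the CPU (samples are (luma, chroma) tuples; the structure mirrors the shader, not a production path):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    luminance = [a[0] for a in (a1, a2, a3, a4)]
    chroma    = [a[1] for a in (a1, a2, a3, a4)]
    # Samples with luma close to the center get exponentially more weight.
    luma_delta = [abs(l - center[0]) for l in luminance]
    weight = [2.0 ** (-sensitivity * d) for d in luma_delta]
    # Guard the case where a sample is black (step(1e-5, luminance)).
    weight = [w if l >= 1e-5 else 0.0 for w, l in zip(weight, luminance)]
    total = sum(weight)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    # Keep the center's stored chroma; average the neighbors' for the other.
    return (center[1], sum(c * w for c, w in zip(chroma, weight)) / total)
```

With four neighbors whose luma matches the center exactly, the reconstruction reduces to a plain average of their chroma.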

Thanks for listening!

Oh right, we're hiring!
- If you enjoy working on these sorts of problems, let us know
- Contact Josh Paul
- Our very own talent scout: josh@floored.com

Thanks, Floored Engineering: Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com
@pastasfuture

Resources

[WebGLStats] WebGL Stats
http://webglstats.com, 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf, Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/, Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf, Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf, Brian Karis, 2013

[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/, Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/, Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/, Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf, J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home, Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/, Naty Hoffman, 2009

[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html, Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt, Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3, Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3, Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf, Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading, Ola Olsson, Ulf Assarsson, 2011

[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading, Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf, Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf, Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/, Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx, Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf, Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007

[Heitz 14] Understanding the Shadow Masking Function
http://jcgt.org/published/0003/02/03/paper.pdf, Eric Heitz, 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering
http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf, Christophe Schlick, 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel
https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/, Sébastien Lagarde, 2012

[Oren 94] Generalization of Lambert's Reflectance Model
http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf, Michael Oren, Shree K. Nayar, 1994

YC Lighting- YC Schlickrsquos Approximation of Fresnel

vec2 fresnelSchlickYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = pow(10 - vDotH 50)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

- Slightly cheaper Donrsquot be fooled that we expanded from vector to scalar arithmetic Save an

ADD in the 2nd component Not to mention we are now operating on a vec2 saving us a MADD

and ADD from the skipped 3rd component

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting- Works fine with spherical gaussian [Lagarde 12] approximation too

vec2 fresnelSchlickSphericalGaussianYC(const in float vDotH const in vec2 reflectionCoefficientYC)

float power = exp2((-555473 vDotH - 698316) vDotH)

return vec2(

(10 - reflectionCoefficientYCx) power + reflectionCoefficientYCx

reflectionCoefficientYCy -power + reflectionCoefficientYCy

)

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting- Write YC to RG components of render target

- Frees up B component

- Could write outgoing radiance unmodulated by albedo for more accurate light meter data

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting- Write YC to RG components of render target

- Could write to an RGBA target and light 2 pixels at once YCYC

- Write bandwidth savings

- Where typical scenes are bottlenecked

- Only applicable for billboard rasterization

- Canrsquot conservatively depth stencil test light proxies

- Interesting for tiled deferred [Olsson 11] clustered [Billeter 12] approaches

- Future work

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center, const in vec2 a1, const in vec2 a2, const in vec2 a3, const in vec2 a4)
{
    vec4 luminance = vec4(a1.x, a2.x, a3.x, a4.x);
    vec4 chroma = vec4(a1.y, a2.y, a3.y, a4.y);
    vec4 lumaDelta = abs(luminance - vec4(center.x));
    const float SENSITIVITY = 25.0;
    vec4 weight = exp2(-SENSITIVITY * lumaDelta);

    // Guard the case where sample is black
    weight *= step(1e-5, luminance);
    float totalWeight = weight.x + weight.y + weight.z + weight.w;

    // Guard the case where all weights are 0
    return totalWeight > 1e-5 ? vec2(center.y, dot(chroma, weight) / totalWeight) : vec2(0.0);
}
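For readers outside GLSL, the filter above amounts to the following — a line-for-line Python transcription for illustration only (helper name and default sensitivity are mine):

```python
def reconstruct_chroma_hdr(center, a1, a2, a3, a4, sensitivity=25.0):
    # center and a1..a4 are (luminance, chroma) pairs from the YC checkerboard.
    samples = [a1, a2, a3, a4]
    # Weight each neighbour by luminance similarity (the cheap bilateral term).
    weights = [2.0 ** (-sensitivity * abs(lum - center[0])) for lum, _ in samples]
    # Guard the case where a sample is black (mirrors step(1e-5, luminance)).
    weights = [w if lum >= 1e-5 else 0.0 for w, (lum, _) in zip(weights, samples)]
    total = sum(weights)
    # Guard the case where all weights are 0.
    if total <= 1e-5:
        return (0.0, 0.0)
    reconstructed = sum(w * c for w, (_, c) in zip(weights, samples)) / total
    return (center[1], reconstructed)
```

With equal-luminance neighbours this degenerates to a plain average of their chroma; across a strong luminance edge the mismatched neighbours are suppressed, which is what keeps chroma from bleeding between surfaces.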

Thanks for listening

Oh right, we're hiring- If you enjoy working on these sorts of problems, let us know

- Contact Josh Paul

- Our very own talent scout: josh@floored.com

Thanks Floored Engineering- Juan Andres Andrango, Neha Batra, Dustin Byrne, Emma Carlson, Won Chun, Andrey Dmitrov, Lars Hamre, Judy He, Josh Karges, Ben LeVeque, Yingxue Li, Rob Thomas, Angela Wei

Questions? nick@floored.com

@pastasfuture

Resources- [WebGLStats] WebGL Stats

http://webglstats.com 2014

[Möller 08] Real-Time Rendering

Thomas Akenine-Möller, Eric Haines, Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis 2013

Resources- [Pranckevičius 09] Encoding Floats to RGBA - The Final

http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño 2007

[Geldreich 04] Deferred Lighting and Shading

https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman 2009

Resources- [Shishkovtsov 05] Deferred Shading in STALKER

http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game World's STALKER: Clear Sky - a Showcase for Direct3D 10.0/1

http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson 2011

Resources- [Billeter 12] Clustered Deferred and Forward Shading

http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance 2007

Resources- [Heitz 14] Understanding the Masking-Shadowing Function in Microfacet-Based BRDFs

http://jcgt.org/published/0003/02/03/paper.pdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

http://www.cs.virginia.edu/~jdl/bib/appearance/analytic%20models/schlick94b.pdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong, Phong and Fresnel

https://seblagarde.wordpress.com/2012/06/03/spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel/ Sébastien Lagarde 2012

[Oren 94] Generalization of Lambert's Reflectance Model

http://www1.cs.columbia.edu/CAVE/publications/pdfs/Oren_SIGGRAPH94.pdf Michael Oren, Shree K. Nayar 1994

YC Lighting- Reconstruct missing chroma component in a post process

- Bilateral Filter

- Luminance Similarity

- Geometric Similarity

- Depth

- Normal

- Plane

- Wrap into a pre-existing billboard pass Plenty of candidates

- OIT Transparency Composite

- Anti-Aliasing

Tonemapping

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

YC Lighting- Simple luminance based chroma reconstruction function for radiance data

vec2 reconstructChromaHDR(const in vec2 center const in vec2 a1 const in vec2 a2 const in vec2 a3 const in vec2 a4)

vec4 luminance = vec4(a1x a2x a3x a4x)

vec4 chroma = vec4(a1y a2y a3y a4y)

vec4 lumaDelta = abs(luminance - vec4(centerx))

const float SENSITIVITY = 250

vec4 weight = exp2(-SENSITIVITY lumaDelta)

Guard the case where sample is black

weight = step(1e-5 luminance)

float totalWeight = weightx + weighty + weightz + weightw

Guard the case where all weights are 0

return totalWeight gt 1e-5 vec2(centery dot(chroma weight) totalWeight) vec2(00)

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Thanks for listening

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Oh right wersquore hiring- If you enjoy working on these sorts of problems let us know

- Contact Josh Paul

- Our very own talent scout joshflooredcom

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Thanks Floored EngineeringJuan Andres Andrango Neha Batra Dustin Byrne Emma Carlson Won Chun Andrey Dmitrov Lars

Hamre Judy He Josh Karges Ben LeVeque Yingxue Li Rob Thomas Angela Wei

Questionsnickflooredcom

pastasfuture

Resources[WebGLStats] WebGL Stats

httpwebglstatscom 2014

[Moumlller 08] Real-Time Rendering

Thomas Akenine-Moumlller Eric Haines Naty Hoffman 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production

httprenderwonkcompublicationss2010-shading-coursehoffmans2010_physically_based_shading_hoffman_a_notespdf Naty Hoffman Siggraph 2010

[Lagarde 11] Feeding a Physically-Based Shading Model

httpseblagardewordpresscom20110817feeding-a-physical-based-lighting-mode Seacutebastien Lagarde 2011

[Burley 12] Physically-Based Shading at Disney

httpdisney-animations3amazonawscomlibrarys2012_pbs_disney_brdf_notes_v2pdf Brent Burley 2012

[Karis 13] Real Shading in Unreal Engine 4

httpblogselfshadowcompublicationss2013-shading-coursekariss2013_pbs_epic_notes_v2pdf Brian Karis 2013

Resources[Pranckevičius 09] Encoding Floats to RGBA - The final

httparas-pinfoblog20090730encoding-floats-to-rgba-the-final Aras Pranckevičius 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors

httpjcgtorgpublished00030201 Cigolle Donow Evangelakos Mara McGuire Meyer 2014

[Mavridis 12] The Compact YCoCg Frame Buffer

httpjcgtorgpublished00010102 Mavridis and Papaioannou Journal of Computer Graphics Techniques 2012

[Waveren 07] Real-Time YCoCg-DXT Compression

httpdeveloperdownloadnvidiacomwhitepapers2007Real-Time-YCoCg-DXT-CompressionReal-Time20YCoCg-DXT20Compressionpdf JMP van

Waveren Ignacio Castantildeo 2007

[Geldreich 04] Deferred Lighting and Shading

httpssitesgooglecomsiterichgel99home Rich Geldreich Matt Pritchard John Brooks 2004

[Hoffman 09] Deferred Lighting Approaches

httpwwwrealtimerenderingcomblogdeferred-lighting-approaches Naty Hoffman 2009

Resources[Shishkovtsov 05] Deferred Shading in STALKER

httphttpdevelopernvidiacomGPUGems2gpugems2_chapter09html Oles Shishkovtsov 2005

[Lobanchikov 09] GSC Game Worldrsquos STALKER Clear Sky - a Showcase for Direct3D 1001

httpamd-devwpenginenetdna-cdncomwordpressmedia20121001GDC09AD3DDStalkerClearSky210309ppt Igor A Lobanchikov Holger Gruen Game

Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3

httpwwwcrytekcomcryenginecryengine3presentationsa-bit-more-deferred---cryengine3 Martin Mittring 2009

[Sousa 13] The Rendering Technologies of Crysis 3

httpwwwcrytekcomcryenginepresentationsthe-rendering-technologies-of-crysis-3 Tiago Sousa 2013

[Pranckevičius 13] Physically Based Shading in Unity

httparas-pinfotextsfiles201403-GDC_UnityPhysicallyBasedShading_notespdf Aras Pranckevičius Game Developers Conference 2013

[Olsson 11] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=tiled_shading Ola Olsson Ulf Assarsson 2011

Resources[Billeter 12] Clustered Deferred and Forward Shading

httpwwwcsechalmersse~olaolssmain_framephpcontents=publicationampid=clustered_shading Markus Billeter Ola Olsson Ulf Assarsson 2012

[Yang 09] Amortized Supersampling

httpresearchmicrosoftcomen-usumpeoplehoppesupersamplepdf Lei Yang Diego Nehab Pedro V Sander Pitchaya Sitthi-amorn Jason Lawrence

Hugues Hoppe 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU

httpspeoplempi-infmpgde~rherzogPapersspatioTemporalUpsampling_preprintI3D2010pdf Robert Herzog Elmar Eisemann Karol Myszkowski H-P

Seidel 2010

[Wronski 14] Temporal Supersampling and Antialiasing

httpbartwronskicom20140315temporal-supersampling-and-antialiasing Bart Wronski 2014

[Karis 14] High Quality Temporal Supersampling

httpsde45xmedrsdbpcloudfrontnetResourcesfilesTemporalAA_small-71938806pptx Brian Karis 2014

[Walter 07] Microfacet Models for Refraction Through Rough Surfaces

httpwwwcscornelledu~srmpublicationsEGSR07-btdfpdf Bruce Walter Stephan R Marschner Hongsong Li Kenneth E Torrance 2007

Resources[Heitz 14] Understanding the Shadow Masking Function

httpjcgtorgpublished00030203paperpdf Eric Heitz 2014

[Schlick 94] An Inexpensive BRDF Model for Physically-based Rendering

httpwwwcsvirginiaedu~jdlbibappearanceanalytic20modelsschlick94bpdf Christophe Schlick 1994

[Lagarde 12] Spherical Gaussian Approximation for Blinn-Phong Phong and Fresnel

httpseblagardewordpresscom20120603spherical-gaussien-approximation-for-blinn-phong-phong-and-fresnel Sebastien Lagarde 2012

[Oren 94] Generalization of Lambertrsquos Reflectance Model

httpwww1cscolumbiaeduCAVEpublicationspdfsOren_SIGGRAPH94pdf Michael Oren Shree K Nayar 1994

Questionsnickflooredcom

pastasfuture

Resources
[WebGLStats] WebGL Stats
http://webglstats.com/ 2014

[Möller 08] Real-Time Rendering
Thomas Akenine-Möller, Eric Haines, Naty Hoffman, 2008

[Hoffman 10] Physically-Based Shading Models in Film and Game Production
http://renderwonk.com/publications/s2010-shading-course/hoffman/s2010_physically_based_shading_hoffman_a_notes.pdf Naty Hoffman, SIGGRAPH 2010

[Lagarde 11] Feeding a Physically-Based Shading Model
https://seblagarde.wordpress.com/2011/08/17/feeding-a-physical-based-lighting-mode/ Sébastien Lagarde, 2011

[Burley 12] Physically-Based Shading at Disney
http://disney-animation.s3.amazonaws.com/library/s2012_pbs_disney_brdf_notes_v2.pdf Brent Burley, 2012

[Karis 13] Real Shading in Unreal Engine 4
http://blog.selfshadow.com/publications/s2013-shading-course/karis/s2013_pbs_epic_notes_v2.pdf Brian Karis, 2013

Resources
[Pranckevičius 09] Encoding Floats to RGBA - The Final
http://aras-p.info/blog/2009/07/30/encoding-floats-to-rgba-the-final/ Aras Pranckevičius, 2009

[Cigolle 14] A Survey of Efficient Representations for Independent Unit Vectors
http://jcgt.org/published/0003/02/01/ Cigolle, Donow, Evangelakos, Mara, McGuire, Meyer, 2014

[Mavridis 12] The Compact YCoCg Frame Buffer
http://jcgt.org/published/0001/01/02/ Mavridis and Papaioannou, Journal of Computer Graphics Techniques, 2012

[Waveren 07] Real-Time YCoCg-DXT Compression
http://developer.download.nvidia.com/whitepapers/2007/Real-Time-YCoCg-DXT-Compression/Real-Time%20YCoCg-DXT%20Compression.pdf J.M.P. van Waveren, Ignacio Castaño, 2007

[Geldreich 04] Deferred Lighting and Shading
https://sites.google.com/site/richgel99/home Rich Geldreich, Matt Pritchard, John Brooks, 2004

[Hoffman 09] Deferred Lighting Approaches
http://www.realtimerendering.com/blog/deferred-lighting-approaches/ Naty Hoffman, 2009

Resources
[Shishkovtsov 05] Deferred Shading in S.T.A.L.K.E.R.
http://developer.nvidia.com/GPUGems2/gpugems2_chapter09.html Oles Shishkovtsov, 2005

[Lobanchikov 09] GSC Game World's S.T.A.L.K.E.R.: Clear Sky - a Showcase for Direct3D 10.0/1
http://amd-dev.wpengine.netdna-cdn.com/wordpress/media/2012/10/01GDC09AD3DDStalkerClearSky210309.ppt Igor A. Lobanchikov, Holger Gruen, Game Developers Conference 2009

[Mittring 09] A Bit More Deferred - CryEngine 3
http://www.crytek.com/cryengine/cryengine3/presentations/a-bit-more-deferred---cryengine3 Martin Mittring, 2009

[Sousa 13] The Rendering Technologies of Crysis 3
http://www.crytek.com/cryengine/presentations/the-rendering-technologies-of-crysis-3 Tiago Sousa, 2013

[Pranckevičius 13] Physically Based Shading in Unity
http://aras-p.info/texts/files/201403-GDC_UnityPhysicallyBasedShading_notes.pdf Aras Pranckevičius, Game Developers Conference 2013

[Olsson 11] Tiled Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=tiled_shading Ola Olsson, Ulf Assarsson, 2011

Resources
[Billeter 12] Clustered Deferred and Forward Shading
http://www.cse.chalmers.se/~olaolss/main_frame.php?contents=publication&id=clustered_shading Markus Billeter, Ola Olsson, Ulf Assarsson, 2012

[Yang 09] Amortized Supersampling
http://research.microsoft.com/en-us/um/people/hoppe/supersample.pdf Lei Yang, Diego Nehab, Pedro V. Sander, Pitchaya Sitthi-amorn, Jason Lawrence, Hugues Hoppe, 2009

[Herzog 10] Spatio-Temporal Upsampling on the GPU
https://people.mpi-inf.mpg.de/~rherzog/Papers/spatioTemporalUpsampling_preprintI3D2010.pdf Robert Herzog, Elmar Eisemann, Karol Myszkowski, H.-P. Seidel, 2010

[Wronski 14] Temporal Supersampling and Antialiasing
http://bartwronski.com/2014/03/15/temporal-supersampling-and-antialiasing/ Bart Wronski, 2014

[Karis 14] High Quality Temporal Supersampling
https://de45xmedrsdbp.cloudfront.net/Resources/files/TemporalAA_small-71938806.pptx Brian Karis, 2014

[Walter 07] Microfacet Models for Refraction through Rough Surfaces
http://www.cs.cornell.edu/~srm/publications/EGSR07-btdf.pdf Bruce Walter, Stephan R. Marschner, Hongsong Li, Kenneth E. Torrance, 2007
