optimized effects for mobile devices

45
Optimized Effects for Mobile Devices Ed Plowman, Director of Performance Analysis, ARM Stacy Smith, Senior Software Engineer, ARM

Upload: shauna

Post on 23-Feb-2016

48 views

Category:

Documents


3 download

DESCRIPTION

Optimized Effects for Mobile Devices. Ed Plowman, Director of Performance Analysis, ARM Stacy Smith, Senior Software Engineer, ARM . Grab Your Crystal Balls. Top 3 questions I get asked: Q. What does the future of mobile content look like? - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Optimized Effects for Mobile Devices

Optimized Effects for Mobile Devices

Ed Plowman, Director of Performance Analysis, ARM

Stacy Smith, Senior Software Engineer, ARM

Page 2: Optimized Effects for Mobile Devices

Grab Your Crystal Balls Top 3 questions I get asked:

Q. What does the future of mobile content look like? A. That depends on how much GPU capability you have?

Q. How much performance will content developers need? A. As much as you can give them!

Q. When will the mobile reach console quality?

Well, lets take a look at that and see if we can answer the others along the way…

Page 3: Optimized Effects for Mobile Devices

Mobile GPU Compute Year On Year

2006 2008 2010 2012 2014 2016 20180

1000

2000

3000

4000

5000

6000 How long before Desktop GPU compute is in Mobile?

GFL

OPS

/Sec

PS3Xbox 360Sa

te o

f the

Art

Desk

top

Sate of the ArtMobile

Page 4: Optimized Effects for Mobile Devices

2006 2008 2010 2012 2014 2016 20180.000

50.000

100.000

150.000

200.000

250.000

300.000

350.000

400.000 How long before Desktop GPU Bandwidth is seen in Mobile?

Gig

a by

tes/

sec

Mobile GPU BW Growth Year on Year

PS3 Xbox 360Sa

te o

f the

Art

Desk

top

Sate of the ArtMobile

Page 5: Optimized Effects for Mobile Devices

Why is BW Not Progressing as Fast? Simple… Power! I did the graph, but desktop GPU power is too horrific!

Desktop = 170 Watts to >300 Watts… that’s just the GPU! Console = 80-100 Watts (CPU/GPU/WiFi/Network) Mobile Platform = 3 - 7 Watts (CPU/GPU/Modem/WiFi)!

Page 6: Optimized Effects for Mobile Devices

How to Get 100W of Work from 3Watts?“I believe the sign of maturity is accepting

deferred gratification.” - Peggy Cahn

Five main suppliers of GPU tech in mobile Three are deferred renderers And those three make up >90% of the volume This is not a coincidence! Deferred rendering is most efficient GPU tech for Mobile

Efficiency of BW, HW resource and Power Getting the most from it requires slightly different thinking…

Page 7: Optimized Effects for Mobile Devices

Thinking in a Deferred World… Minimize draw calls and state changes

Draw Calls/API calls are not free use them wisely Grouping draw calls with like state = good

But… don’t go crazy Large object batches with high potential occlusion can be costly Remember those vertices still need processing

Draw Target Bind/unbind on each draw call = bad Seen (disappointingly) in a lot of commercial engines Can cause flush and reload cycles of tile/cache memory Bind it once, issue all draw calls, unbind it… Hint: Take a look at the use of glDiscardFramebufferEXT() Indicates to driver that render attachment is done with/complete

Page 8: Optimized Effects for Mobile Devices

Thinking in a Deferred World… Use Vertex Buffer Objects

Client side vertex buffers use copy on write (CoW) on each Draw Call VBO’s don’t, so they provide a considerable performance increase Avoid dynamic VBO or IBO updates using glBufferSubData()

Multiple Render Targets (New for OpenGL® ES 3.0) Very efficient on deferred GPU Make sure sum of bits/frag is “do-able” “in tile” for max performance Different criteria for each GPU provider

Page 9: Optimized Effects for Mobile Devices

Avoiding Blocking Behaviours Deferred GPU’s use a pipeline

glReadPixels(), glCopyTexImage(), glTexSubImage() = bad… If you must use glReadPixels use PBO’s Use FBO instead of glCopyTexImage() Also Occlusion Query (OpenGL ES 3.0) - Results delayed by 1-2 frames Busy waiting on OQ bad idea!

Build Command Vertex Shading Fragment Shading

Frame N [-----] [-----]Frame N+1 Frame N [-----]Frame N+2 Frame N+1 Frame N

Frame N+1 [-----] [-----]

Page 10: Optimized Effects for Mobile Devices

Make Every Access Count Think about “cacheability” of data

De-interleave vertex data

Think about representation Do you really need a FP32/component for a texture coordinates accessing a 512x512

texture?

X Y Z W RGB TexCordVertex = Vert 0 Vert 1

Vert 2 Vert 3Cache line 1 =

Cache line 2 =

Vert 0(XYZW)

(RG

B)Cache line 1 =

Cache line 2 =

Vert 1(XYZW)

Vert 2(XYZW)

Vert 3(XYZW)

Tex CordCache line 3 =

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

(RG

B)

Tex Cord

Tex Cord

Tex Cord

Tex Cord

Tex Cord

Tex Cord

Tex Cord

Page 11: Optimized Effects for Mobile Devices

Compress, Compress, Compress!

8 5.12 3.56 2 1.28 0.89000000000000125354555

Compression Rate (bpp)

PSN

R (d

B)

ASTC = Adaptive Scalable Texture Compression New texture compression standard developed by ARM, adopted by Khronos

KHR_texture_compression_astc_ldr for OpenGL ES and Open GL Increased quality and fidelity at low bit-rates Expansive range of input formats offers complete flexibility

Choice of base format, 2D and 3D plus addition of HDR formats

Page 12: Optimized Effects for Mobile Devices

Compression in the Pre-ASTC World

LLA

X+YHDR L

RGBXY+ZRBGA

RGB+AHDR X+YHDR RGB

HDR XY+Z

HDR RGBA

HDR RGB+A

1 2 3 4 5 6 7 8Compressed bits/pixel

Inpu

t Col

or F

orm

ats

8

16

16

16

24

24

32

32

32

48

48

64

64

Inpu

t bits

/pixe

l

ETC, BC5

ETC, BC4

BC7

All Major Players

PVRTC PVRTC

ETC, BC1

BC6

ETC, BC2BC3, BC7

Page 13: Optimized Effects for Mobile Devices

ASTC Choices

LLA

X+YHDR L

RGBXY+ZRBGA

RGB+AHDR X+YHDR RGB

HDR XY+Z

HDR RGBA

HDR RGB+A

1 2 3 4 5 6 7 8Compressed bits/pixel

Inpu

t Col

or F

orm

ats

16

24

24

32

32

32

48

48

64

64

Inpu

t bits

/pixe

l

All ASTC

8

16

16

Page 14: Optimized Effects for Mobile Devices

Look it up or Calculate it? You would be surprised what you can get done in a cycle…

precision mediump float;varying vec4 detailtc_envtc, bumptrans;uniform sampler2D dettex, envtex, colormap;uniform float color_param, bumpstrength;void main() { vec4 bt = bumptrans; vec2 bt_crossmul = bt.xy * bt.wz; float diffuse = max(0.0, bt_crossmul.x-bt_crossmul.y); vec4 bump_cr = texture2D(dettex,detailtc_envtc.xy); vec4 tbump = bt * bumpstrength * bump_cr.xyxy; vec2 envtc = tbump.xy + tbump.zw + detailtc_envtc.zw; vec4 col = texture2D(colormap, vec2(bump_cr.z, color_param)); vec4 env = texture2D(envtex,envtc); gl_FragColor = col * diffuse + env * bump_cr.w;}

Shader Features:• Paletted color mapping • Environment mapping • Bump mapping • Variable reflectance mapping • Diffuse falloff of the texture color • Adjustable bump map strength• Adjustable color table

Mali-T600 series = 3 Cycles

Page 15: Optimized Effects for Mobile Devices

GDC 2012 Demo: Timbuktu

Page 16: Optimized Effects for Mobile Devices

Asset ConditioningCross platform - desktop & mobile

Desktop build - caching

Mobile build - loads caches

Asset pipeline - utility functions

Page 17: Optimized Effects for Mobile Devices

BatchingDeferred immediate mode renderingglDrawElements and glDrawArrays have an

overheadLess draw calls, less overhead.DrawCall class stitches multiple objects into one

drawMacro functions in shaders make batching as simple

as:

vec4 pos=transform[getInstance()]*getPosition();

Page 18: Optimized Effects for Mobile Devices

Batching

Page 19: Optimized Effects for Mobile Devices

Batching

uniform mat4 transforms[4];

Page 20: Optimized Effects for Mobile Devices

Object InstancingMultiple geometries, or single object instances:

for(int i=0;i<50;i++) drawbuilder.addGeometry(geo1);

drawbuilder.Build()

Can implement LOD switching, when objects are sorted front to back and correctly culled.

Seen in TrueForce

Page 21: Optimized Effects for Mobile Devices

Special Effects

Page 22: Optimized Effects for Mobile Devices

Special Effects: Bloom

Page 23: Optimized Effects for Mobile Devices

Special Effects: Bloom

When considering uses of greater colour resolution the first thought was HDR and Bloom.

But how to do bloom without the HDR images?

Page 24: Optimized Effects for Mobile Devices

Special Effects: Bloom

Page 25: Optimized Effects for Mobile Devices

Special Effects: BloomRender to low res FBO mapped to texture

Value filter and blur in 1st post- processing pass, onto second FBO textureSample vertical blur in second pass then apply to full resolution frame buffer

Page 26: Optimized Effects for Mobile Devices

Special Effects: Depth of Field

Page 27: Optimized Effects for Mobile Devices

Special Effects: Depth of Field

16bit depth buffers as textures opened the possibility of a variable blur for depth of field

But how to do it without 16bit textures?

Page 28: Optimized Effects for Mobile Devices

Special Effects: Depth of Field

Page 29: Optimized Effects for Mobile Devices

Special Effects: Depth of Field

Additive

Mix

Bloom

Page 30: Optimized Effects for Mobile Devices

Special Effects: Terrain Mapping

Page 31: Optimized Effects for Mobile Devices

Special Effects: Terrain Mapping

Uniform Buffers and Vertex IDs can be used to implement tessellated mesh subdivision

But how can this be approximated without the buffers or IDs?

Page 32: Optimized Effects for Mobile Devices

Special Effects: Terrain Mapping

Page 33: Optimized Effects for Mobile Devices

Special Effects: Terrain Mapping

Page 34: Optimized Effects for Mobile Devices

SIGGRAPH 2012 Demo: Timbuktu 2

Page 35: Optimized Effects for Mobile Devices

Timbuktu 2: Extended featuresOpenGL® ES 3.0!

3D textures

Shadow comparison

16 bit depth textures

HDR lighting

Page 36: Optimized Effects for Mobile Devices

3D TexturesGive more definition to deformed track

3D Textures mipmap in all 3 dimensions

Instead used 2D Texture arrays

Page 37: Optimized Effects for Mobile Devices

3D Textures

floor(z)fract(z)ceil(z)

texture2DArray

mix(t1, t2, frac)

Page 38: Optimized Effects for Mobile Devices

Shadow MappingDepth rendered to FBO from viewpoint of light

Projected onto scene to compare to distance

Doing this in the shader yields some interesting results

Page 39: Optimized Effects for Mobile Devices

Shadow Comparison Texture

Compare

Interpolate

Compare

CompareCompare

Compare

Interpolate

OpenGL ES 2.0 Texture Compare:

OpenGL ES 3.0 Shadow Texture Compare:

Page 40: Optimized Effects for Mobile Devices

16 Bit Depth BuffersAlso used for:

Soft Particles

Better fidelity of DOF

Page 41: Optimized Effects for Mobile Devices

Particle Lighting Displacement Mapping:

Texture offset to X and Y coords

Distortion strength increases over time

Page 42: Optimized Effects for Mobile Devices

HDR LightingRGBA 10 10 10 2 format used

Everything gets normalised

Bright spots need to stand out

Make everything else darker!

Set exposure in post processing

Page 43: Optimized Effects for Mobile Devices

Main conference ARM Sponsored SessionsThursday, March 28th 10:00 – 11:00amRoom 3022

Optimized Effects for Mobile DevicesStacy Smith, Senior Software Engineer, ARMEd Plowman, Director of Performance Analysis, ARM

1:00 – 2:00pmRoom 3016

The Future of Mobile GamingPANEL SESSIONModerator: Jason Della Rocca, Co-founder of Execution LabsBaudouin Corman, Vice President of Publishing, Americas for GameloftDavid Helgason, CEO, UnityDr Chris Doran, Founder & COO, GeomericsNiccolo De Masi, CEO, GluJasper Smith, CEO, PlayJamNizar Romdhane, Director of Ecosystem, ARM

Page 44: Optimized Effects for Mobile Devices

ARM #1124 in-booth Educational TheaterOver 30 talks from ARM and partners such as EpicGames,

Havok, PlayJam, Geomerics, Softkinetics, Metaio, Marmalade

20 minute length talks with Q&A at the endAn Android tablet prize draw at each sessionSummary and videos of all Educational Theater Talks at

http://malideveloper.arm.com/gdc2013

Page 45: Optimized Effects for Mobile Devices

malideveloper.arm.com

Thank youAny questions?