gpu acceleration for the web : aug14.pdf

28
© Copyright Khronos Group 2014 - Page 1 GPU Acceleration for the Web State of the Union Neil Trevett Khronos President NVIDIA Vice President Mobile Content

Upload: nguyenkhue

Post on 27-Dec-2016

232 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 1

GPU Acceleration for the Web State of the Union

Neil Trevett Khronos President

NVIDIA Vice President Mobile Content

Page 2: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 2

Khronos Connects Software to Silicon

Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration

Defining the roadmap for low-level silicon interfaces needed on every platform

Graphics, compute, rich media, vision, sensor and camera

processing

Rigorous specifications AND conformance tests for cross-

vendor portability

Acceleration APIs BY the Industry

FOR the Industry

Well over a BILLION people use Khronos APIs Every Day…

Page 3: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 3

http://accelerateyourworld.org/

Page 4: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 4

Mobile Web is a Real Time Application

Buttery smooth touch interaction needs continuous

60Hz updates

Apple iPhone

320x480 153K Pixels

163 DPI

Apple iPad

1024x768 786K Pixels

132 DPI

2048x1536 3100K Pixels

326 DPI

Apple iPad Mini

In 5 years the number of pixels to process on

mobile screens has gone up by factor of TWENTY

+ =

Need GPU Acceleration for Web Rendering!

Page 5: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 5

Access to 3D on Over 2 BILLION Devices

300M Desktops / year

1.9B Mobiles / year

1B Browsers / year

Source: Gartner (December 2013)

Page 6: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 6

Pervasive WebGL

http://caniuse.com/#feat=webgl

WebGL on EVERY major desktop browser And coming to ALL major mobile browsers

Portable (NO source change) 3D applications are possible for the first time

Page 7: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 7

WebGL Tool/Engine Ecosystem

https://www.youtube.com/watch?v=l9KRBuVBjVo

Epic Citadel - WebGL HTML 5 Benchmark (Firefox 22)

Page 8: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 8

WebGL on Mobile

http://crypt-webgl.unigine.com/

Unigine Engine Demo

Page 9: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 9

WebGL Roadmap

2003 1.0

2004 1.1

2007 2.0

2012 3.0

2014 3.1

Driver Update

Silicon Update

Silicon Update

Driver Update

Compute Shaders

32-bit integers and floats NPOT, 3D/depth textures Texture arrays Multiple Render Targets

Programmable Shaders Fixed function

Pipeline

2011

WebGL 2.0 - Open Review http://www.khronos.org/registry/webgl/specs/latest/2.0/

WebGL 2.0 Under Development

WebGL 1.0

Page 10: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 10

glTF - Transmitting 3D Assets to WebGL Apps • ‘GL Transmission Format’

- Runtime asset format for WebGL, OpenGL ES, and OpenGL applications

• Efficient Representation = Small Size AND Minimal Load Processing - JSON for scene structure and other high-level constructs - Binary mesh and animation data - Little or no processing to drop glTF data into client application

• Runtime Neutral - Can be created and used by any app or runtime

• Khronos is prototyping standards-based pipeline - Conditioning of COLLADA assets into glTF for WebGL applications

Playback Authoring

Page 11: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 11

COLLADA and glTF Ecosystem

Tool Interop

Three.js glTF Importer. Rest3D initiative

COLLADA2GLTF Translator

OpenCOLLADA Importer/Exporter

and COLLADA Conformance Tests

On GitHUB

Pervasive WebGL deployment

Other authoring formats

Web-based Tools

Page 12: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 12

Open Source Resources for glTF • COLLADA2GLTF open-source converter is gaining robustness and momentum

- https://github.com/KhronosGroup/glTF/tree/master/converter/COLLADA2GLTF - Binaries are available on GitHUB for easy use

• Three.js glTF loader - https://github.com/KhronosGroup/glTF/tree/master/loaders/threejs - Most glTF features are already supported

• Open specification; Open process - Spec, and sample code: https://github.com/KhronosGroup/glTF - All features backed up by multiple implementations in code - glTF 0.8 schema available - getting very close to glTF 1.0!

• Convertor using Open3DGC to compress 3D Meshes, Skinning, Animations - Available at https://github.com/fabrobinet/glTF-webgl-viewer

Page 13: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 13

glTF and Compression Extension • Benchmarking 3D compression formats for implementation as glTF extensions

- Baseline is GZIP - MPEG royalty-free Scalable Complexity

3D Mesh Compression codec MPEG-SC3DMC - Open3DGC JavaScript and C/C++ implementation

- WebGL-loader is Google lightweight compression format for WebGL content

Format

CAD Models (Mbytes)

3D Scanned Models (Mbytes)

MPEG dataset (Mbytes)

OBJ 1310 (100%) 736 (100%) 600 (100%) Gzip 336 (26%) 204 (28%) 157 (26%) Webgl-loader 219 (17%) 117 (16%) 103 (17%) Open3DGC 67 (5%) 22 (3%) 22 (4%) Webgl-loader + Gzip 80 (6%) 38 (5%) 26 (4%)

Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.5x more efficient than webgl-loader

Page 14: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 14

Motivation for WebCL • Parallel acceleration for compute-intensive web applications

- Portable and efficient access to heterogeneous multicore devices in JavaScript

• Typical Use Cases - 3D asset codecs, video codecs and processing, imaging and vision processing - Physics for WebGL games, Online data visualization, Augmented Reality

• WebCL 1.0 specification officially released at GDC March 2014 - https://www.khronos.org/webcl

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

Page 15: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 15

WebCL - Heterogeneous Computing for Web • OpenCL = Two APIs and C-based Kernel language

- Platform Layer API to query, select and initialize compute devices - Kernel language - Subset of ISO C99 + language extensions - C Runtime API to build and execute kernels across multiple devices

• WebCL defines JavaScript binding to the OpenCL APIs - Enables initiation of OpenCL C Kernels from within the browser

OpenCL

Kernel Code

OpenCL Kernel Code

OpenCL Kernel Code

OpenCL C Kernel Code

GPU

DSP CPU

CPU HW

JavaScript Platform API To query, select and initialize

compute devices

JavaScript Runtime API To build and execute kernels

across multiple devices

Page 16: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 16

Content JavaScript, HTML, CSS, ...

WebGL/WebCL Ecosystem

JavaScript Middleware

JavaScript HTML5 / CSS

Browser provides WebGL and WebCL Alongside other HTML5 technologies

No plug-in required

OS Provided Drivers WebGL uses OpenGL ES 2.0 or

Angle for OpenGL ES 2.0 over DX9 WebCL uses OpenCL 1.X

Content downloaded from the Web

Middleware can make WebGL and WebCL accessible to non-expert programmers

E.g. three.js library: http://threejs.org/ used by majority of WebGL content

Low-level APIs provide a powerful foundation for a rich JavaScript

middleware ecosystem

Page 17: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 17

WebCL - Designed-in Architectural Security • Leverages OpenCL 1.2 robustness/security extensions

- Context Termination: to prevent DoS from long running kernels - Memory Initialization: no leakage from out of bounds memory access

• API and Language Restrictions to ensure no unsafe behavior is possible - Structures are not supported as kernel arguments - Kernels name must be less than 256 characters - Mapping of CL memory objects into host memory space is not supported - Program binaries are not supported - Some OpenCL API functions & builtin functions may require translation

• WebCL Kernel Validator - Open source on GitHub - Static and dynamic kernel checking - Verifies memory accesses are inside valid memory areas - Run-time checks injected in code if neccesary - https://github.com/KhronosGroup/webcl-validator

Page 18: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 18

• Implementations - Nokia - Firefox extension (Mozilla Public License 2.0)

- https://github.com/toaarnio/webcl-firefox - Samsung - WebKit (BSD)

- https://github.com/SRA-SiliconValley/webkit-webcl - Motorola - Uses Node.js (BSD)

- https://github.com/Motorola-Mobility/node-webcl - AMD –Chromium build

- https://github.com/amd/Chromium-WebCL

• WebCL Kernel Validator (open source) - https://github.com/KhronosGroup/webcl-validator

• OpenCL to WebCL Translator - https://github.com/wolfviking0/webcl-translator

• OpenCL Conformance Tests - https://github.com/KhronosGroup/WebCL-conformance/

WebCL Open Source Resources

http://fract.ured.me/

Based on Iñigo Quilez, Shader Toy

Based on Apple QJulia

Based on Iñigo Quilez, Shader Toy

Page 19: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 19

Path Rendering Acceleration Offload the CPU so the application can run as fast as possible

Make maximum use of the GPU for best performance and power

CPU creates paths

Use standard 3D commands to

process polygons

CPU renders paths

CPU creates paths

CPU tessellates paths into polygons

Define new OpenGL path commands to

process paths directly

CPU creates paths

- Software Scanline renderers can be high quality and portable

- CPU has to process complete pipeline – stealing cycles

from the application - Software rendering limits

performance

- Tessellation loads the CPU – stealing cycles from the application so perf

sometimes slower than software alone - Tessellation consumes a lot of data

and memory bandwidth = power - Quality can be compromised due to

tessellation accuracy

CPU

GPU

- Maximum CPU offload - Compact data format sent

to GPU renderer - GPU provides excellent performance and power

- GPU can increase quality and functionality

Page 20: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 20

NV_path_rendering OpenGL Extension Brings Path processing directly to OpenGL

OpenGL processes paths as fundmental primitive No tessellation necessary

Goals Functionally complete for key standards: SVG, Canvas, PostScript etc. Much faster—often 4x to 100x faster than CPUs Enhanced quality – can avoid approximations needed by CPU renderers Lower power by leveraging GPU hardware New functionality – e.g. mix 2D paths with 3D and programmable shading

Page 21: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 21

Stencil then Cover Approach Create a path object and pass directly to the GPU

Cubic & quadratic Bezier segments, line segments, partial elliptical arcs

GPU “Stencils” the path object into the stencil buffer GPU provides massively parallel stenciling of filled or stroked paths Calculate winding rule or containment at every sub-pixel sample in parallel

“Cover” the path object and stencil test against its coverage Test against path coverage determined in the 1st step and shade the path

Uses GPU MSAA anti-aliasing 8 or 16 samples/pixel gives good quality

Step 1 Stencil

Step 2: Cover

repeat

Page 22: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 22

Enhanced Quality on GPU

conflation artifacts on CPU conflation free on GPU Eliminate Conflation Artifacts

Multiple color AND stencil samples per pixel

color bleeding

Cairo NV_path_rendering Skia

feathers? weird big holes

Stroking approximations avoided by GPU regular grid on CPU - sub-optimal Antialiasing

jitter pattern on GPU for better Antialiasing

GPU Offers Jittered Sampling for Free

GPU

Qt

Cairo

Moiré artifacts Similar for Qt & Skia

Proper gradient filtering on GPU

GPUs great at texturing: Mip-mapping Anisotropic filtering Wrap modes

Page 23: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 23

New GPU Functionality

light source position for BUMP Mapping

Programmable Shading Paint in GLSL – for filter and blending acceleration

Projective Transformation

Fast Arbitrary Path Clipping

Mixing depth tested Text, 3D, and Paths

linear RGB transition between saturated red and saturated blue has dark purple region

sRGB perceptually smooth transition from saturated red to saturated blue

Fully sRGB Correct Rendering

Page 24: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 24

NVPR Resources and Adoption Adobe Illustrator CC shipping with NVPR!

Significant application acceleration

Developer resources http://developer.nvidia.com/nv-path-rendering Open source SVG Renderer - pr_svg Whitepapers, FAQ, specification NVprSDK—software development kit Email: [email protected]

Availability Shipping in Release 275 drivers and beyond All CUDA-capable NVIDIA GPUs – including mobile

Proposed open standard Submitted royalty-free to Khronos For use in any OpenGL-family API including WebGL

Page 25: GPU Acceleration for the web : Aug14.pdf

© 2014 NVIDIA - Page 25

Path Rendering Acceleration on Android Tablet

Page 26: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 26

Vision Pipeline Challenges and Opportunities

• Light / Proximity • 2 cameras • 3 microphones • Touch • Position

- GPS - WiFi (fingerprint) - Cellular trilateration - NFC/Bluetooth Beacons

• Accelerometer • Magnetometer • Gyroscope • Pressure / Temp / Humidity

19

Sensor Proliferation Diverse sensor awareness of the user and surroundings

• Camera sensors >20MPix • Novel sensor configurations • Stereo pairs • Plenoptic Arrays • Active Structured Light • Active TOF

Growing Camera Diversity Capturing color, range

and lightfields

Diverse Vision Processors Driving for high performance

and low power

• Camera ISPs • Dedicated vision IP blocks • DSPs and DSP arrays • Programmable GPUs • Multi-core CPUs

Flexible sensor and camera control to generate

required image stream

Use best processing available for image stream processing –

with code portability

Control/fuse vision data by/with all other sensor data

on device

Page 27: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 27

Khronos and W3C Cooperation • Khronos and W3C liaison for Web APIs

- Leverage proven native APIs - Fast API development/deployment - Designed by hardware community - Familiar foundation reduces

developer learning curve

Native APIs shipping or Khronos working group

JavaScript API shipping, acceleration being developed or work underway

WebVX? Vision

Processing

WebKCAM? Camera control

Possible future JavaScript APIs or acceleration

WebStream? Sensor Fusion

Native

JavaScript Canvas

Path Rendering

W3C Augmented Web Community Group discussing many of these vision issues for the Web: e.g. leveraging WebRTC in the short term http://w3.org/community/ar

WebSL? JS Binding to

WebAudio

Page 28: GPU Acceleration for the web : Aug14.pdf

© Copyright Khronos Group 2014 - Page 28

Questions?

• www.khronos.org • [email protected] • @neilt3d