Bringing GPU to the Web (HTML5 Dev Conference) Oct13


Uploaded by: neil-trevett

Posted on 08-May-2015


DESCRIPTION

This presentation explores three open standards bringing the power of the GPU to the Web, with cutting-edge examples of each:
- WebGL is a significant advance in the evolution of 3D on the Web, enabling foundational, GPU-accelerated 3D to be delivered by the browser without the need for a plug-in;
- WebCL is a direct JavaScript binding to the OpenCL standard framework for heterogeneous parallel computation in web applications;
- NVIDIA has spearheaded research and development into innovative OpenGL functionality that enables full GPU acceleration of vector-based APIs such as SVG.

TRANSCRIPT

Page 1

© Copyright Khronos Group 2013 - Page 1

Harnessing the Power of the GPU in Web Applications

Neil Trevett

Khronos President, NVIDIA Vice President Mobile Content

Page 2

GPUs are everywhere the Web goes. Making full use of GPUs is essential for any modern computing platform.

But traditionally, the Web has not made effective use of GPUs. That is changing…

Page 3

© 2013 NVIDIA - Page 3

Mobile is the New Epicenter of Innovation

Page 4

Mobile Web is a Real-Time Application

Buttery-smooth touch interaction needs continuous 60Hz updates

- Apple iPhone: 320x480, 153K pixels, 163 DPI
- Apple iPad: 1024x768, 786K pixels, 132 DPI
- Apple iPad Mini: 2048x1536, 3100K pixels, 326 DPI

In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY

Continuous 60Hz updates + twenty times more pixels = Need GPU Acceleration for everything Web!
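The pixel counts on this slide can be checked directly. A minimal sketch (the helper names and the frame-budget framing are illustrative, not from the slides) that computes each display's pixel count, the growth factor, and the full-screen pixel throughput needed at 60Hz:

```python
# Display resolutions quoted on the slide (width, height).
DISPLAYS = {
    "Apple iPhone": (320, 480),
    "Apple iPad": (1024, 768),
    "Apple iPad Mini": (2048, 1536),
}

REFRESH_HZ = 60  # continuous updates for smooth touch interaction


def pixel_count(width: int, height: int) -> int:
    """Pixels that must be produced for one full-screen frame."""
    return width * height


def pixels_per_second(width: int, height: int, hz: int = REFRESH_HZ) -> int:
    """Full-screen pixel throughput needed to redraw every frame."""
    return pixel_count(width, height) * hz


frame_budget_ms = 1000 / REFRESH_HZ  # time available to produce each frame

for name, (w, h) in DISPLAYS.items():
    print(f"{name}: {pixel_count(w, h):,} px/frame, "
          f"{pixels_per_second(w, h):,} px/s at {REFRESH_HZ}Hz")

growth = pixel_count(2048, 1536) / pixel_count(320, 480)
print(f"Growth: {growth:.2f}x, frame budget: {frame_budget_ms:.1f} ms")
```

The 2048x1536 display needs roughly 189 million pixels per second just to repaint the screen once per frame, inside a budget of about 16.7 ms per frame, before any content-specific work is done.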

Page 5

Mobile SOC Performance Increases

[Chart: aggregate CPU/GPU performance (log scale, 1 to 100) against device shipping dates, 2011 to 2015: 100x perf increase in four years]

- 2011: Tegra 2, Dual A9
- 2012: Tegra 3, Quad A9, power saver 5th core (HTC One X+, Google Nexus 7)
- 2013: Tegra 4, Quad A15 (NVIDIA Shield)
- 2014: Logan, full Kepler GPU, CUDA 5.5, OpenGL 4.4
- 2015: Parker, Denver 64-bit CPU, Maxwell GPU
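"100x in four years" can be restated as a yearly rate. This short worked example assumes smooth compound growth, which the chart only approximates, and derives the implied per-year factor and doubling time:

```python
import math

TOTAL_SPEEDUP = 100.0  # "100x perf increase in four years"
YEARS = 4

# Compound growth: annual_factor ** YEARS == TOTAL_SPEEDUP
annual_factor = TOTAL_SPEEDUP ** (1 / YEARS)

# Months for performance to double at that compound rate.
doubling_months = 12 * math.log(2) / math.log(annual_factor)

print(f"{annual_factor:.2f}x per year, doubling every {doubling_months:.1f} months")
```

That is about 3.16x per year, or a doubling roughly every 7 months, far faster than the pixel growth on the previous slide.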

Page 6

NVIDIA Logan Mobile SOC

Kepler GPU architecture now on PC and mobile. Can run essentially the same code, scaled for different power constraints

Page 7

How are GPUs Accessible to the Web?

- Hardware composition: within the browser stack, under the hood
- Vector acceleration for SVG: using NVIDIA OpenGL extensions
- 3D developer functionality: OpenGL ES functionality through JavaScript
- Compute acceleration: offloading compute-intensive code to the GPU
- Compression and streaming of 3D assets: for network transmission
- Camera, vision and sensor processing: future JavaScript bindings to native APIs?

Page 8

Khronos Connects Software to Silicon

- ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration
- Low-level silicon-to-software interfaces needed on every platform
- Graphics, video, audio, compute, vision, sensor and camera processing
- Defines the forward-looking roadmap for the silicon community
- Shipping on billions of devices across multiple operating systems
- Rigorous conformance tests for cross-vendor consistency
- Khronos is OPEN for any company to join and participate

Acceleration APIs BY the Industry FOR the Industry

Page 9

Khronos Standards and AR

Over 100 companies defining royalty-free APIs to connect software to silicon

- Native visual computing: gaming and professional apps; advanced scene construction
- 3D asset authoring: advanced authoring pipelines; glTF 3D Asset Transmission Format with streaming and compression
- Sensor processing: mobile vision acceleration; on-device sensor fusion; Camera Control API
- Acceleration in the browser: WebGL for 3D in browsers; WebCL, heterogeneous computing for the Web

Page 10

Mobile OS Adoption of Khronos APIs

Android:
- OpenGL ES 2.0 shipping in Android 2.2
- OpenSL ES 1.0 (subset) shipping in Android 2.3
- OpenMAX AL 1.0 (subset) shipping in Android 4.0
- EGL 1.4 shipping (SDK -> NDK)
- WebGL: Opera and Firefox now, Chrome soon

Apple:
- OpenGL 3.2 on MacOS
- OpenCL 1.2 on MacOS
- OpenGL ES 3.0 on iOS
- WebGL: can be enabled in MacOS Safari; iOS 5 enables WebGL for iAds

Page 11

WebGL – 3D on the Web – No Plug-in!

• Leveraging HTML5 and the <canvas> element
  - WebGL defines a JavaScript binding to OpenGL ES 2.0
  - Enables a 3D context for the canvas
• Low-level foundational Web API for accessing the GPU
  - Flexibility and direct GPU access
  - Enables higher-level frameworks and middleware

[Diagram: WebGL combines the availability of OpenGL and OpenGL ES on almost every web-capable device, a JavaScript binding to OpenGL ES 2.0, increasing JavaScript performance, and the HTML5 canvas tag]

Page 12

WebGL Implementation Anatomy

[Diagram, top to bottom: content (JavaScript, HTML, CSS, ...); JavaScript middleware; browser (HTML5, JavaScript, CSS, WebGL); OS-provided drivers (OpenGL ES 2.0, OpenGL, or DX9 via ANGLE)]

- Content is downloaded from the Web. Middleware can make WebGL accessible to non-expert 3D programmers; much WebGL content uses the three.js library: http://threejs.org/
- The browser provides WebGL functionality alongside other HTML5 technologies, with no plug-in required
- OS-provided drivers. WebGL on Windows can use Direct3D; for example, the ANGLE open source project creates OpenGL ES 2.0 over DX9

Page 13

WebGL Availability in Browsers

- Microsoft: “where you have IE11, you have WebGL – turned on by default and working all the time”
- Microsoft: WebGL is also enabled for Windows applications, via the web app framework and web view
- Chrome on Android is now shipping with WebGL
- Chrome OS: WebGL is the only cross-platform API to program the GPU
- Apple: WebGL is present but must be explicitly turned on in Mac Safari, and is only exposed on iOS for iAds

Page 14

Microsoft PhotoSynth2

• Demonstrated at Build 2013

http://channel9.msdn.com/Events/Build/2013/4-072 (1:50)

Page 15

WebGL on Logan Android Tablet

Page 16

WebGL on Logan Android Tablet

Page 17

Cross-OS Portability

[Diagram: each platform's preferred native environment (C/C++, SDK Dalvik (Java), Objective C, C#, DirectX), with HTML/CSS available on every platform]

- Preferred development environments not designed for portability
- Native code is portable, but apps must cope with different available APIs and libraries
- HTML5 provides cross-platform portability. GPU accessibility through WebGL available soon on ~90% of mobile systems

Page 18

OpenGL 3D API Family Tree

[Figure: timeline from 2002 to 2013.
Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3 and 4.4, then GL-Next.
Mobile 3D: OpenGL ES 1.0 and 1.1 (fixed function 3D pipeline), OpenGL ES 2.0 and 3.0 (programmable vertex and fragment shaders), then ES-Next, with bands labeled OpenGL ES 1.1, 2.0 and 3.0 content.
Web: WebGL 1.0, then WebGL 2.0.]

- ES3 is backward compatible, so new features can be added incrementally
- OpenGL 4.4 is a superset of DX11
- WebGL 2.0 is in development now and will bring OpenGL ES 3.0 functionality to the Web

http://www.khronos.org/webgl/public-mailing-list/
http://www.khronos.org/registry/webgl/specs/latest/
http://www.khronos.org/webgl/wiki/Testing/Conformance

Page 19

OpenGL ES 3.0 Highlights

• Better-looking, faster-performing games and apps, at lower power
  - Incorporates proven features from OpenGL 3.3 / 4.x
  - 32-bit integers and floats in shader programs
  - NPOT, 3D textures, depth textures, texture arrays
  - Multiple Render Targets for deferred rendering, Occlusion Queries
  - Instanced Rendering, Transform Feedback …
• Make life better for the programmer
  - Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
  - OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
  - #1 developer request!

Page 20

Texture Compression is Key

• Texture compression saves precious resources
  - Network bandwidth, device memory space AND device memory bandwidth
• Developers need the same texture compression EVERYWHERE
  - Otherwise portable apps, such as those using WebGL, need multiple copies of the same texture

[Chart: texture formats plotted by quality against pervasive deployment over time: 2008-2010, 2012-2013, 2014->]

- DXTC/S3TC (Windows) and PVRTC (iOS): NOT royalty-free. Platform fragmentation
- ETC1: mandated in Android Froyo (400M devices). Royalty-free BUT only optional in ES. Only 4bpp | 3 channel. No alpha support
- ETC2 / EAC: MANDATED in OpenGL ES 3.0 and OpenGL 4.3. Royalty-free. Backward compatible with ETC1. ETC2: 4bpp | 3 channel; EAC: 4 (8) bpp | 1 (2) channel; COMBINED: RGBA 8bpp | 4 channel WITH ALPHA. Does not have 1-2 bit compression
- ASTC: OpenGL ES 3.0 and OpenGL 4.3 extensions -> core once proven. Royalty-free. Best quality. Independent control of bit rate and # channels: 1 to 4 channel, 1-8bpp in fine steps
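The memory savings follow directly from the bit rates above. This sketch assumes a 1024x1024 texture and, for the "multiple copies" case, assumes three platform-specific 4bpp copies (for example DXTC, PVRTC and ETC1); both are illustrative choices, not figures from the slide:

```python
# Bit rates quoted on the slide; 32bpp RGBA8 is the uncompressed baseline.
BPP = {
    "RGBA8 (uncompressed)": 32,
    "ETC1 / ETC2 RGB": 4,
    "ETC2 + EAC RGBA": 8,
}


def texture_bytes(width: int, height: int, bpp: float) -> int:
    """Bytes for one 2D texture level at the given bit rate.

    Block-compressed formats pad to whole blocks (4x4 texels for ETC), so
    this is exact only when width and height are multiples of the block size.
    """
    return int(width * height * bpp) // 8


SIZE = 1024
for name, bpp in BPP.items():
    kib = texture_bytes(SIZE, SIZE, bpp) / 1024
    print(f"{name}: {kib:.0f} KiB for a {SIZE}x{SIZE} texture")

# Without one universal format, a portable app may ship the same texture
# once per platform format; three 4bpp copies are assumed here.
copies = 3
total = copies * texture_bytes(SIZE, SIZE, 4)
print(f"{copies} platform-specific copies: {total // 1024} KiB")
```

Compression cuts the 4096 KiB baseline to 512 KiB (RGB) or 1024 KiB (RGBA), but shipping three incompatible 4bpp copies gives back much of that gain in download size, which is the slide's argument for one format everywhere.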

Page 21

ASTC – Universal Texture Standard

• Adaptive Scalable Texture Compression (ASTC)
  - Quality significantly exceeds S3TC or PVRTC at the same bit rate
• Industry-leading orthogonal compression rate and format flexibility
  - 1 to 4 color components: R / RG / RGB / RGBA
  - Choice of bit rate: from 8bpp to <1bpp in fine steps
• ASTC is royalty-free and so is available to be universally adopted
  - Shipping as OpenGL/OpenGL ES extension today for industry feedback

[Image comparison: original at 24bpp vs. ASTC compression at 8bpp, 3.56bpp and 2bpp]
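ASTC stores every block in 128 bits and sets the bit rate by choosing the block footprint. Under that rule, the sketch below lists the bit rate of each 2D footprint; the 8bpp, 3.56bpp and 2bpp examples correspond to 4x4, 6x6 and 8x8 blocks, and 12x12 blocks give the "<1bpp" end of the range:

```python
# ASTC encodes every block in 128 bits regardless of footprint, so the bit
# rate is determined purely by how many texels the block covers.
ASTC_BLOCK_BITS = 128

# The 2D block footprints defined for ASTC (width x height in texels).
FOOTPRINTS = [
    (4, 4), (5, 4), (5, 5), (6, 5), (6, 6), (8, 5), (8, 6),
    (10, 5), (10, 6), (8, 8), (10, 8), (10, 10), (12, 10), (12, 12),
]


def astc_bpp(block_w: int, block_h: int) -> float:
    """Bits per pixel for a given ASTC 2D block footprint."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


for w, h in FOOTPRINTS:
    print(f"{w}x{h}: {astc_bpp(w, h):.2f} bpp")
```

Because the block size, not the channel count, sets the rate, bit rate and number of components can be chosen independently, which is the "orthogonal" flexibility the slide describes.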

Page 22

Why Khronos for WebGL?

• Hardware API standards must take into account silicon design cycles
  - Multi-year pipeline of APIs that affect chips that take $100Ms to execute
  - Deep insights into silicon and driver architectures
  - Rigorous conformance tests and infrastructure
• Khronos is committed to being a good citizen in the larger Web community
  - Opened Khronos WebGL processes to enable cooperation with the web community
• Khronos is the industry forum to drive hardware consensus and cooperation
  - Help create foundational support for higher-level Web standards that access hardware capabilities

Page 23

OpenCL – Portable Heterogeneous Computing

• Native framework for programming diverse parallel computing resources
  - CPU, GPU, DSP etc.
• OpenCL C kernel language
  - Very close to C99
• APIs to discover compute resources and distribute kernels
  - Across all available compute resources

[Diagram: the same OpenCL kernel code dispatched to CPU, GPU, DSP and hardware (HW) units]

One code tree can be executed on CPUs, GPUs, DSPs and hardware. Dynamically interrogate system load and load balance work across available processors.
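The execution model above can be emulated in a few lines of plain Python: one kernel function applied at every index of a global range (the work-items' global IDs), with the range split across devices. This is a conceptual sketch, not the OpenCL API; the function names, device names and throughput weights are hypothetical, and a real OpenCL application would enqueue kernels per device through the OpenCL runtime:

```python
from typing import Callable, Dict, List, Tuple

# A "kernel" is modeled as a plain function called once per work-item with
# that item's global ID (the role get_global_id(0) plays in OpenCL C).
Kernel = Callable[[int, List[float], List[float], List[float]], None]


def vector_add(gid: int, a: List[float], b: List[float], out: List[float]) -> None:
    """Kernel body: each work-item handles exactly one element."""
    out[gid] = a[gid] + b[gid]


def partition(global_size: int, weights: Dict[str, float]) -> Dict[str, Tuple[int, int]]:
    """Split [0, global_size) into contiguous ranges in proportion to each
    device's relative throughput weight (hypothetical numbers)."""
    total = sum(weights.values())
    names = list(weights)
    ranges: Dict[str, Tuple[int, int]] = {}
    start = 0
    for i, name in enumerate(names):
        if i == len(names) - 1:
            end = global_size  # last device takes whatever remains
        else:
            end = min(global_size, start + round(global_size * weights[name] / total))
        ranges[name] = (start, end)
        start = end
    return ranges


def run_ndrange(kernel: Kernel, global_size: int, weights: Dict[str, float],
                a: List[float], b: List[float]):
    """Run the kernel over every global ID, one sub-range per device.
    A real runtime would execute the sub-ranges concurrently."""
    out = [0.0] * global_size
    ranges = partition(global_size, weights)
    for lo, hi in ranges.values():
        for gid in range(lo, hi):
            kernel(gid, a, b, out)
    return out, ranges


a = [float(i) for i in range(10)]
b = [10.0] * 10
out, ranges = run_ndrange(vector_add, 10, {"CPU": 1.0, "GPU": 3.0, "DSP": 1.0}, a, b)
print(ranges)  # {'CPU': (0, 2), 'GPU': (2, 8), 'DSP': (8, 10)}
print(out)
```

Because each work-item touches only its own element, the result is the same however the range is divided, which is what lets a runtime rebalance work across processors as load changes.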

Page 24

OpenCL as Parallel Compute Foundation

OpenCL provides vendor-optimized, cross-platform, cross-vendor access to heterogeneous compute resources. Higher-level languages and bindings built on it:

- OpenCL HLM: C++ syntax/compiler extensions
- WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
- River Trail: language extensions to JavaScript
- C++ AMP: Shevlin Park uses Clang and LLVM
- Harlan: high-level language for GPU programming
- Compiler directives for Fortran, C and C++
- Aparapi: Java language extensions for parallelism
- PyOpenCL: Python wrapper around OpenCL

Page 25

WebCL – Parallel Computing for the Web

• JavaScript bindings to OpenCL APIs
  - Enables initiation of kernels written in OpenCL C within the browser

http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc

Page 26

Leveraging Proven Native APIs into HTML5

• Khronos and W3C exploring liaison
  - Leverage proven native API investments into the Web
  - Fast API development and deployment
  - Designed by the hardware community
  - Familiar foundation reduces developer learning curve

[Diagram: native APIs (shipping or in a Khronos working group) paired with JavaScript APIs (shipping, acceleration being developed, or work underway) and possible future JavaScript APIs or acceleration:
- Vision processing: WebVX?
- Camera control and video processing: WebCAM(!)
- Sensor fusion: WebStream?
- Path rendering: Canvas, HTML]

Page 27

StreamInput - Sensor Fusion

• Defines access to high-quality fused sensor stream and context changes
  - Implementers can optimize and innovate generation of the sensor stream

[Diagram: sensors and sensor hubs feed a low-level native API (StreamInput) that defines access to the fused sensor data stream and context-awareness; above it sit OS sensor APIs (e.g. Android SensorManager or iOS CoreMotion), middleware (e.g. augmented reality engines, gaming engines) and applications]

- StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection, enabling sensor subsystem vendors to increase ADDED VALUE
- Platforms can provide increased access to an improved sensor data stream, driving faster, deeper sensor usage by applications
- Middleware engines need platform-portable access to the native, low-level sensor data stream
- Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
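StreamInput defines access to a fused stream rather than a particular fusion algorithm. To illustrate what "fusion" means, here is a complementary filter, a common simple technique swapped in purely as an example (it is not taken from StreamInput): it combines a gyroscope, which is smooth but drifts when integrated, with an accelerometer tilt estimate, which is noisy but drift-free. The 0.98 weight and the simulated bias are assumptions:

```python
from typing import List


def complementary_filter(gyro_rates: List[float], accel_angles: List[float],
                         dt: float, alpha: float = 0.98) -> List[float]:
    """Fuse gyroscope rates with accelerometer tilt estimates.

    gyro_rates:   angular velocity samples (rad/s); smooth but biased, so
                  integrating them alone drifts over time
    accel_angles: tilt angles (rad) derived from gravity; drift-free but noisy
    alpha:        weight on the integrated-gyro path (0.98 is an assumption)
    """
    angle = accel_angles[0]  # start from the drift-free estimate
    fused = []
    for rate, accel_angle in zip(gyro_rates, accel_angles):
        # High-pass the gyro path, low-pass the accelerometer path.
        angle = alpha * (angle + rate * dt) + (1 - alpha) * accel_angle
        fused.append(angle)
    return fused


# Simulated device held still at 0.5 rad; the gyro has a 0.1 rad/s bias.
dt, n, true_angle, bias = 0.01, 1000, 0.5, 0.1
gyro = [bias] * n                 # true rate is 0, so the gyro reads only its bias
accel = [true_angle] * n          # noise-free for simplicity
fused = complementary_filter(gyro, accel, dt)
gyro_only = true_angle + bias * dt * n   # naive integration after 10 s

print(f"gyro only: {gyro_only:.3f} rad, fused: {fused[-1]:.3f} rad")
```

After 10 simulated seconds the integrated gyro has drifted a full radian, while the fused estimate settles within alpha * bias * dt / (1 - alpha) = 0.049 rad of the true angle. Producing this kind of stream efficiently, close to the sensors, is the work StreamInput lets implementations compete on.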

Page 28

OpenVX – Power Efficient Vision Acceleration

• Complementary to the OpenCV open source project
  - Which is great for prototyping
• OpenVX is a tightly specified API with conformance
  - Portable, production-grade vision functions
• OpenVX enables a graph of vision processing
  - Each node in the graph can be implemented in software or accelerated hardware
• Nodes may be fused and optimized
  - e.g. an implementation may stripe execution over image sections in cache

[Diagram layers: application; OpenCV open source library and other higher-level CV libraries; a graph of OpenVX nodes; open source sample implementation and hardware vendor implementations]
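The idea of node fusion and striping can be shown with a tiny two-node graph. This is plain Python, not the OpenVX API, and the node functions are hypothetical. It runs the graph node by node (materializing each full intermediate image) and then fused, pushing one stripe of rows through every node at a time, and confirms both produce the same output:

```python
from typing import Callable, List

Image = List[List[int]]
Node = Callable[[Image], Image]


def blur_rows(img: Image) -> Image:
    """Node: 1x3 horizontal box blur (integer average, edges clamped)."""
    out = []
    for row in img:
        n = len(row)
        out.append([
            (row[max(x - 1, 0)] + row[x] + row[min(x + 1, n - 1)]) // 3
            for x in range(n)
        ])
    return out


def threshold(img: Image, level: int = 128) -> Image:
    """Node: binary threshold."""
    return [[255 if p >= level else 0 for p in row] for row in img]


def run_graph(nodes: List[Node], img: Image) -> Image:
    """Execute node by node, materializing each full intermediate image."""
    for node in nodes:
        img = node(img)
    return img


def run_fused(nodes: List[Node], img: Image, stripe_rows: int) -> Image:
    """Execute all nodes on one stripe of rows at a time, so intermediates
    stay small enough to live in cache. Valid here because both nodes read
    only within a row; a vertical filter would need overlapping stripes."""
    out: Image = []
    for top in range(0, len(img), stripe_rows):
        out.extend(run_graph(nodes, img[top:top + stripe_rows]))
    return out


image = [[(x * 37 + y * 91) % 256 for x in range(16)] for y in range(12)]
graph = [blur_rows, threshold]
print(run_fused(graph, image, stripe_rows=4) == run_graph(graph, image))  # True
```

Because the application describes the whole graph up front, an implementation is free to choose the fused schedule (or hardware nodes) without changing the result the application sees.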

Page 29

Typical Imaging Pipeline

• Processing pre- and post-ISP can be done on CPU, GPU, DSP
  - E.g. using OpenCL or OpenVX
• BUT applications have often had limited control over the actual camera and ISP
  - ISP controls camera via 3A algorithms
  - Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)

[Diagram: lens -> CMOS sensor with color filter array -> Bayer data -> pre-processing -> Image Signal Processor (ISP) -> RGB/YUV -> post-processing -> app; 3A algorithms drive lens, sensor and aperture control]

Need for advanced camera control API:
- to provide more flexible app camera control
- over more types of camera sensors
- with tighter integration with the rest of the system

Page 30

Khronos APIs for Augmented Reality

[Diagram components: MEMS sensors; sensor fusion; Camera Control API for advanced camera control and stream generation; vision processing; application on CPUs, GPUs and DSPs; 3D rendering and video composition on GPU; audio rendering; EGLStream to stream data between APIs; precision timestamps on all sensor samples]

AR needs not just advanced sensor processing, vision acceleration, computation and rendering, but also needs all these subsystems to work efficiently together.

The W3C Augmented Web Community Group is discussing many of these issues for the Web, e.g. leveraging WebRTC in the short term: http://w3.org/community/ar

Page 31

3D Needs a Transmission Format!

• Compression and streaming of 3D assets becoming essential
  - Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
  - 3D is more complex: diverse asset types and use cases
• Needs to be royalty-free
  - Avoid an ‘internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
  - High-performance and low power, but pragmatic adoption strategy is key

Audio: MP3 | Video: H.264 | Images: JPEG | 3D: ?

An effective and widely adopted codec ignites previously unimagined opportunities for a media type

Page 32

glTF – OpenGL Transmission Format

• Binary file format for efficient transmission of 3D assets
  - Reduce network bandwidth and minimize client processing overhead
• Run-time neutral
  - DO NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
  - Can be used by any app or run-time, usually WebGL accelerated
• Scalable to handle compression and streaming
  - Though baseline format does not include compression
• ‘Direct load efficiency’ for WebGL
  - Little or NO processing to drop glTF data into WebGL client
• Carry conditioned data from any authoring format
  - Prototyping and optimizing efficient handling of COLLADA assets

A standards-based content pipeline for rich native and Web 3D applications: authoring -> playback
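"Direct load efficiency" means the asset is stored in the layout the GPU consumes, so the client does no parsing or conversion. A minimal sketch of that idea, assuming vertex positions stored as little-endian 32-bit floats plus a small JSON descriptor; the descriptor keys are hypothetical and this is not the glTF schema:

```python
import json
import struct

# Triangle vertex positions (x, y, z) to be sent to a WebGL-style client.
positions = [
    (0.0, 0.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
]

# Pack as little-endian 32-bit floats: the layout a GPU vertex buffer
# expects, so the client can upload it without parsing or conversion.
flat = [c for vertex in positions for c in vertex]
blob = struct.pack(f"<{len(flat)}f", *flat)

# A small JSON descriptor tells the client where the data lives. These keys
# are hypothetical, chosen for illustration, not the glTF schema itself.
descriptor = json.dumps({
    "position": {"byteOffset": 0, "byteLength": len(blob),
                 "components": 3, "count": len(positions)}
})

# Client side: read the descriptor, then view the bytes directly as floats
# (zero-copy). memoryview.cast uses native byte order, so this assumes a
# little-endian host, which covers typical x86 and ARM devices.
meta = json.loads(descriptor)["position"]
view = memoryview(blob)[meta["byteOffset"]:meta["byteOffset"] + meta["byteLength"]]
floats = view.cast("f")
print(len(blob), list(floats))
```

In a browser the equivalent client step would be wrapping the downloaded ArrayBuffer in a Float32Array and passing it to gl.bufferData, again with no per-vertex processing.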

Page 33

COLLADA and glTF Open Source Ecosystem

[Diagram: other authoring formats and web-based tools connect through tool interop to the OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests; the COLLADA2GLTF Translator converts assets to glTF; the Three.js glTF Importer and Rest3D initiative lead to pervasive WebGL deployment]

On GitHub:

https://github.com/KhronosGroup/glTF

https://github.com/KhronosGroup/OpenCOLLADA

https://github.com/KhronosGroup/COLLADA-CTS

Page 34

WebGL as Test-bed for 3D Asset Compression

• Integrating and benchmarking 3D geometry compression formats with glTF
  - Baseline is GZIP
• Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
  - Royalty-free graphics compression technology from MPEG (MIT License)
  - Open3DGC is an efficient JavaScript and C/C++ implementation
  - Converter using Open3DGC to compress 3D meshes, skinning, animations
  - Available at https://github.com/fabrobinet/glTF-webgl-viewer
• WebGL-loader is Google's lightweight compression for WebGL content

Page 35

Compression Efficiency – Early Results

Format                CAD models (Mbytes)   3D scanned models (Mbytes)   MPEG dataset (Mbytes)
OBJ                   1310 (100%)           736 (100%)                   600 (100%)
Gzip                  336 (26%)             204 (28%)                    157 (26%)
Webgl-loader          219 (17%)             117 (16%)                    103 (17%)
Open3DGC              67 (5%)               22 (3%)                      22 (4%)
Webgl-loader + Gzip   80 (6%)               38 (5%)                      26 (4%)

Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.7x more efficient than webgl-loader + Gzip
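The efficiency ratios can be recomputed from the table. This sketch divides each format's size by the Open3DGC size for the same dataset (sizes in Mbytes, copied from the table):

```python
# Sizes in Mbytes from the table: CAD, 3D scanned, MPEG dataset.
SIZES = {
    "OBJ": (1310, 736, 600),
    "Gzip": (336, 204, 157),
    "Webgl-loader": (219, 117, 103),
    "Open3DGC": (67, 22, 22),
    "Webgl-loader + Gzip": (80, 38, 26),
}


def ratios(baseline: str, candidate: str):
    """How many times smaller `candidate` is than `baseline`, per dataset."""
    return [b / c for b, c in zip(SIZES[baseline], SIZES[candidate])]


for other in ("Gzip", "Webgl-loader", "Webgl-loader + Gzip"):
    r = ratios(other, "Open3DGC")
    print(f"Open3DGC vs {other}: " + ", ".join(f"{x:.1f}x" for x in r))
```

On these figures Open3DGC output is 5.0x-9.3x smaller than Gzip, about 1.2x-1.7x smaller than webgl-loader combined with Gzip, and roughly 3x-5x smaller than webgl-loader alone.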

Page 36

Decoding Speed

• For mobile, need to balance file size AND decompression processing
  - Extensive processing can take more time/power than transmission
• OpenCTM is also promising but LZMA is very processor intensive
  - Work may lead to LZMA in hardware?

Platform                                       Hand (100K tri.)   Dilo (54K tri.)   Octopus (34K tri.)
Win7 64-bit, 10GB RAM, i7-2600 CPU @ 3.4GHz    130 ms             86 ms             65 ms
Samsung Galaxy S4, Android 4.2.2               1045 ms            768 ms            457 ms
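The size-versus-decode trade-off can be made concrete with a simple time model: total load time is transfer time plus decode time. The sketch uses the 1045 ms Galaxy S4 decode time for the Hand model from the table; the 4 MB and 0.5 MB file sizes and the bandwidths are hypothetical, the larger file is assumed to need negligible decoding, and power is not modeled:

```python
def load_time_ms(size_mb: float, bandwidth_mbps: float, decode_ms: float = 0.0) -> float:
    """Download time for size_mb megabytes over a bandwidth_mbps link,
    plus client-side decode time (power is not modeled)."""
    transfer_ms = size_mb * 8 / bandwidth_mbps * 1000
    return transfer_ms + decode_ms


def compression_pays_off(raw_mb: float, compressed_mb: float,
                         decode_ms: float, bandwidth_mbps: float) -> bool:
    """True if fetching the smaller file and decoding it finishes first."""
    return (load_time_ms(compressed_mb, bandwidth_mbps, decode_ms)
            < load_time_ms(raw_mb, bandwidth_mbps))


def break_even_mbps(raw_mb: float, compressed_mb: float, decode_ms: float) -> float:
    """Bandwidth above which the decode time costs more than the bytes saved."""
    return (raw_mb - compressed_mb) * 8 * 1000 / decode_ms


# Hypothetical asset: 4 MB cheap-to-load file vs. 0.5 MB heavily compressed
# file, decoded in 1045 ms (Galaxy S4, Hand model).
for mbps in (2.0, 50.0):
    print(mbps, "Mbps:", compression_pays_off(4.0, 0.5, 1045, mbps))
print(f"break-even: {break_even_mbps(4.0, 0.5, 1045):.1f} Mbps")
```

Under these assumptions heavy compression wins on a 2 Mbps mobile link but loses above roughly 27 Mbps, which is why decode cost on mobile CPUs matters as much as file size.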

Page 37

Path Rendering Acceleration

Offload the CPU so the application can run as fast as possible. Make maximum use of the GPU for best performance and power.

• CPU creates paths, CPU renders paths
  - Software scanline renderers can be high quality and portable
  - CPU has to process the complete pipeline, stealing cycles from the application
  - Software rendering limits performance
• CPU creates paths, CPU tessellates paths into polygons, standard 3D commands process the polygons
  - Tessellation loads the CPU, stealing cycles from the application, so performance is sometimes slower than software alone
  - Tessellation consumes a lot of data and memory bandwidth = power
  - Quality can be compromised due to tessellation accuracy
• CPU creates paths, new OpenGL path commands process paths directly on the GPU
  - Maximum CPU offload
  - Compact data format sent to GPU renderer
  - GPU provides excellent performance and power
  - GPU can increase quality and functionality

Page 38

NV_path_rendering OpenGL Extension

Brings path processing directly to OpenGL; no tessellation necessary

Goals:
- Functionally complete for key standards: SVG, Canvas, PostScript etc.
- Much faster: often 4x to 100x faster than CPUs
- Enhanced quality: can avoid approximations needed by CPU renderers
- Lower power by leveraging dedicated hardware
- New functionality: e.g. mix 2D paths with 3D and programmable shading

Page 39

Stencil then Cover Approach

Create a path object and pass it directly to the GPU
- Cubic & quadratic Bezier segments, line segments, partial elliptical arcs

Step 1: Stencil
- GPU “stencils” the path object into the stencil buffer
- GPU provides massively parallel stenciling of filled or stroked paths
- Calculate winding rule or containment at every sub-pixel sample in parallel

Step 2: Cover
- “Cover” the path object and stencil test against its coverage
- Test against path coverage determined in the 1st step and shade the path
- Uses GPU MSAA anti-aliasing: 8 or 16 samples/pixel gives good quality

Repeat for each path
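The two steps can be emulated on the CPU to show what each pass computes. This is a simplified sketch: paths are restricted to line segments (the GPU extension handles curves and arcs directly), each pixel has one sample at its center rather than 8 or 16 MSAA samples, and the bounding box stands in for the cover geometry. The function names are illustrative, not the NV_path_rendering API:

```python
from typing import List, Tuple

Point = Tuple[float, float]
Path = List[List[Point]]  # list of closed subpaths (line segments only here)


def winding_number(px: float, py: float, path: Path) -> int:
    """Signed count of how many times the path winds around (px, py)."""
    wn = 0
    for sub in path:
        for i in range(len(sub)):
            (x0, y0), (x1, y1) = sub[i], sub[(i + 1) % len(sub)]
            side = (x1 - x0) * (py - y0) - (px - x0) * (y1 - y0)
            if y0 <= py < y1 and side > 0:    # upward edge, sample on its left
                wn += 1
            elif y1 <= py < y0 and side < 0:  # downward edge, sample on its right
                wn -= 1
    return wn


def stencil_then_cover(path: Path, width: int, height: int, rule: str = "nonzero"):
    stencil = [[0] * width for _ in range(height)]
    color = [[0] * width for _ in range(height)]

    # Step 1 (stencil): every sample independently accumulates the path's
    # winding number; on a GPU all samples are processed in parallel.
    for y in range(height):
        for x in range(width):
            stencil[y][x] = winding_number(x + 0.5, y + 0.5, path)

    # Step 2 (cover): visit conservative covering geometry (the path's
    # bounding box), shade samples that pass the fill-rule test, and reset
    # the stencil to zero so the next path starts clean.
    xs = [p[0] for sub in path for p in sub]
    ys = [p[1] for sub in path for p in sub]
    for y in range(max(0, int(min(ys))), min(height, int(max(ys)) + 1)):
        for x in range(max(0, int(min(xs))), min(width, int(max(xs)) + 1)):
            w = stencil[y][x]
            inside = w != 0 if rule == "nonzero" else w % 2 == 1
            if inside:
                color[y][x] = 1
            stencil[y][x] = 0
    return color


# Two overlapping squares with the same orientation: the overlap has winding 2.
square_a = [(0, 0), (4, 0), (4, 4), (0, 4)]
square_b = [(2, 2), (6, 2), (6, 6), (2, 6)]
path = [square_a, square_b]

nonzero = stencil_then_cover(path, 6, 6, "nonzero")
evenodd = stencil_then_cover(path, 6, 6, "evenodd")
print(sum(map(sum, nonzero)), sum(map(sum, evenodd)))  # 28 24
```

The nonzero rule fills the overlap (winding 2) while even-odd leaves it empty. Because the stencil pass computes exact containment per sample, no tessellation into triangles is ever needed.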

Page 40

Excellent Geometric Fidelity for Stroking

- Correct stroking is hard; lots of CPU implementations approximate stroking
- GPU-accelerated stroking avoids such short-cuts
- GPU has FLOPS to compute true stroke point containment

[Figure: stroking with a tight end-point curve, compared across GPU-accelerated, OpenVG reference, Cairo and Qt renderers]

Page 41

Micrography

“Girl with Words in Her Hair”: 591 paths, 338,507 commands, 1,244,474 coordinates

Ron Maharik, Mikhail Bessmeltsev, Alla Sheffer, Ariel Shamir and Nathan Carr, SIGGRAPH 2011

Page 42

More Details on nvpr Functionality

• Union of all major path rendering standards
  - Enables mixing traditional functionality with 3D and programmable shading
• Point sampling for path filling is exact
  - No approximations due to tessellation or subdivision
• Path stroking is exact
  - Line segments & quadratic Bezier segments stroking is exact
  - All stroke cap + join styles supported
  - Dashing fully supported
• Minimal pre-computation required
  - NO tessellation involved, NO recursive subdivision
  - Fast to animate, morph, or edit paths

Page 43

Enhanced Quality on GPU

- Eliminate conflation artifacts: multiple color AND stencil samples per pixel. [Figure: conflation artifacts (color bleeding) on CPU renderers Cairo and Skia vs. conflation-free NV_path_rendering on GPU]
- Stroking approximations avoided by GPU. [Figure: CPU stroking artifacts labeled “feathers?” and “weird big holes”]
- GPU offers jittered sampling for free: a regular sample grid on CPU gives sub-optimal antialiasing, while a jitter pattern on GPU gives better antialiasing. [Figure: GPU vs. Qt and Cairo; Moiré artifacts, similar for Qt & Skia]
- Proper gradient filtering on GPU: GPUs are great at texturing (mip-mapping, anisotropic filtering, wrap modes)

Page 44

Comparing Performance

Page 45

[Chart: Comparative Performance (Logarithmic Scale). Performance ratios NVpr16/Cairo, NVpr16/SkiaBitmap, NVpr16/SkiaGanesh, NVpr16/Direct2D GPU and NVpr16/Direct2D WARP (y axis 0.10 to 1000.00), at window sizes from 100x100 to 1100x1100, for the scenes tiger, Welsh_dragon, Celtic_round_dogs, butterfly, spikes, American_Samoa, cowboy, Buonaparte, Embrace_the_World, Yokozawa, Cougar and tiger_clipped_by_he. GeForce GTX 480. Release drivers V.300. x16 MSAA.]

Page 46

New GPU Functionality

- Programmable shading: paint in GLSL, for filter and blending acceleration (e.g. light source position for bump mapping)
- Projective transformation
- Fast arbitrary path clipping
- Mixing depth-tested text, 3D, and paths
- Fully sRGB-correct rendering: a linear RGB transition between saturated red and saturated blue has a dark purple region, while sRGB gives a perceptually smooth transition from saturated red to saturated blue
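The red-to-blue comparison can be reproduced numerically. Reading the slide's captions as the difference between interpolating stored sRGB-encoded values directly and interpolating in linear light (an interpretation, since the slide gives only captions), the sketch below uses the standard sRGB transfer functions to compute both midpoints:

```python
def srgb_to_linear(c: float) -> float:
    """Decode an sRGB-encoded channel value in [0, 1] to linear light."""
    if c <= 0.04045:
        return c / 12.92
    return ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode a linear-light channel value in [0, 1] with the sRGB curve."""
    if c <= 0.0031308:
        return c * 12.92
    return 1.055 * c ** (1 / 2.4) - 0.055


RED, BLUE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)


def mix_encoded(a, b, t):
    """Blend the stored (sRGB-encoded) values directly."""
    return tuple((1 - t) * x + t * y for x, y in zip(a, b))


def mix_linear(a, b, t):
    """Decode to linear light, blend, then re-encode for display."""
    return tuple(
        linear_to_srgb((1 - t) * srgb_to_linear(x) + t * srgb_to_linear(y))
        for x, y in zip(a, b)
    )


print(mix_encoded(RED, BLUE, 0.5))  # (0.5, 0.0, 0.5)
print(mix_linear(RED, BLUE, 0.5))   # red and blue channels about 0.735
```

Blending the encoded values leaves each channel at 0.5, which the display shows at only about 21% linear intensity, hence the dark purple band. Blending in linear light and re-encoding gives about 0.735, a visibly brighter midpoint. GPUs can do the decode and encode in hardware through sRGB textures and framebuffers.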

Page 47

Mixing 2D and 3D

Page 48

Resolution-independent Font Support

• Fonts are a standard, first-class part of all path rendering systems
  - Foreign to 3D graphics systems such as OpenGL and Direct3D
• NV_path_rendering has built-in font support
  - Can specify a range of path objects with a specified font and a sequence or range of Unicode character points
• No requirement for applications to use the font API to load glyphs
  - You can also load glyphs “manually” from your own glyph outlines
  - Functionality provides OS portability

Page 49

Path Geometric Queries

- glIsPointInFillPathNV: determine if an object-space (x,y) position is inside or outside a path, given a winding number mask
- glIsPointInStrokePathNV: determine if an object-space (x,y) position is inside the stroke of a path; accounts for dash pattern, joins, and caps
- glGetPathLengthNV: returns an approximation of the geometric length of a given sub-range of path segments
- glPointAlongPathNV: returns the object-space (x,y) position and 2D tangent vector at a given offset into a specified path object; useful for “text follows a path”

Queries are modeled after OpenVG queries
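What the length and point-along-path queries return can be illustrated on a simple polyline. This sketch is hypothetical Python modeled loosely on the descriptions above (a segment sub-range for length, a distance offset for position plus tangent); the real queries operate on GPU path objects with curved segments:

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def segments(points: List[Point]) -> List[Tuple[Point, Point]]:
    return list(zip(points, points[1:]))


def path_length(points: List[Point], start: int = 0, count: Optional[int] = None) -> float:
    """Length of a sub-range of segments (cf. the glGetPathLengthNV query)."""
    segs = segments(points)[start: None if count is None else start + count]
    return sum(math.dist(a, b) for a, b in segs)


def point_along_path(points: List[Point], distance: float):
    """((x, y), (tx, ty)): position and unit tangent at `distance` along the
    polyline (cf. glPointAlongPathNV); clamps past the end of the path."""
    last = None
    for a, b in segments(points):
        seg_len = math.dist(a, b)
        if seg_len == 0:
            continue  # skip degenerate segments
        tangent = ((b[0] - a[0]) / seg_len, (b[1] - a[1]) / seg_len)
        last = (b, tangent)
        if distance <= seg_len:
            t = max(distance, 0.0) / seg_len
            return (a[0] + t * (b[0] - a[0]), a[1] + t * (b[1] - a[1])), tangent
        distance -= seg_len
    if last is None:
        raise ValueError("path has no non-degenerate segments")
    return last  # distance beyond the end: final point, last tangent


route = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
print(path_length(route))            # 7.0
print(path_length(route, 1, 1))      # 4.0
print(point_along_path(route, 5.0))  # ((3.0, 2.0), (0.0, 1.0))

# "Text follows a path": one glyph every 1.5 units, rotated to the tangent.
for i, ch in enumerate("HELLO"):
    (x, y), (tx, ty) = point_along_path(route, 1.5 * i)
    print(ch, (x, y), round(math.degrees(math.atan2(ty, tx))))
```

Position plus tangent is exactly what text layout along a curve needs: each glyph is translated to the point and rotated by the tangent's angle.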

Page 50

Open Source Accelerated SVG Renderer

Partial SVG renderer: pr_svg
- Path filling, transformations and grouping
- Path stroking with all stroking embellishments
- Clipping, including clipping paths to other arbitrary paths
- Painting with linear/radial gradients and images
- Basic compositing
- Coming in next update: markers and text

Stuff that’s missing from pr_svg
- Filters, blending, opacity groups, animation, JavaScript integration
- Not hard, just best done in the context of a browser

NVIDIA welcomes any community involvement: http://developer.nvidia.com/nv-path-rendering

Page 51

More Information

• Best drivers: OpenGL 4.4
  - www.nvidia.com/drivers: grab the latest drivers for your OS & GPU
  - Runs on any CUDA-capable GPU (GeForce 8 onwards)
• Developer resources: http://developer.nvidia.com/nv-path-rendering
  - Whitepapers, FAQ, specification
  - NVprSDK: software development kit
  - NVprDEMOs: pre-compiled Windows demos
  - YouTube videos demonstrate various NVpr DEMOs


Page 52

Standardization and Adoption Pipeline

NVIDIA is proposing nvpr to the OpenGL working group at Khronos to create an open, royalty-free, cross-platform foundation for vector graphics acceleration

- Vendor extension to OpenGL (nvpr is here!): initial functionality proposal. Prove concepts. Solicit industry feedback
- OpenGL extension or core: pervasive multi-vendor availability. Widespread application usage inspires silicon optimizations
- Vector acceleration pervasive on desktop and mobile: OpenGL vector acceleration adopted into OpenGL and OpenGL ES

Desktop and mobile displays typically >300 DPI. Mobile silicon is CUDA/OpenCL capable

Page 53

Path Rendering Acceleration on Android Tablet

Page 54

Summary

• Open standards such as WebGL and WebCL are enabling web applications to reach the power of the GPU through JavaScript
• GPU acceleration will soon become vital for Web applications wanting to leverage advanced use of camera and sensors
• Direct acceleration of path primitives on GPUs will drive browser performance for new classes of applications and devices
• Work is starting on 3D asset streaming and compression standards, to enable 3D as a social media type on the Web
• The Web and hardware communities have a significant opportunity to leverage each other’s efforts for the benefit of the industry
• Khronos is committed to enabling the hardware community to be a good citizen in creating the next generation of accelerated web standards
