bringing gpu to the web (html5 dev conference) oct13
DESCRIPTION
This presentation explores three open standards bringing the power of the GPU to the Web with cutting edge examples of each:
- WebGL is a significant advance in the evolution of 3D on the Web, enabling foundational, GPU-accelerated 3D to be delivered by the browser without the need for a plug-in;
- WebCL is a direct JavaScript binding to the OpenCL standard framework for heterogeneous parallel computation in web applications;
- NVIDIA has spearheaded research and development into innovative OpenGL functionality that enables full GPU acceleration of vector based APIs such as SVG.
TRANSCRIPT
© Copyright Khronos Group 2013 - Page 1
Harnessing the Power of the GPU in Web Applications
Neil Trevett
Khronos President, NVIDIA Vice President Mobile Content
GPUs are everywhere the Web goes. Making full use of GPUs is essential for any modern computing platform.
But.. Traditionally the Web has not made effective use of GPUs. That is changing…
© 2013 NVIDIA - Page 3
Mobile is the New Epicenter of Innovation
Mobile Web is a Real Time Application
Buttery smooth touch interaction needs continuous 60Hz updates
Apple iPhone: 320x480, 153K Pixels, 163 DPI
Apple iPad: 1024x768, 786K Pixels, 132 DPI
Apple iPad Mini: 2048x1536, 3100K Pixels, 326 DPI
In 5 years the number of pixels to process on mobile screens has gone up by a factor of TWENTY
Need GPU Acceleration for everything Web!
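The factor-of-twenty claim can be checked directly from the resolutions on this slide. The sketch below is illustrative only: the `SCREENS` table and the helper names `pixel_count` and `pixels_per_second` are not from the talk, and the 60 Hz rate is the update rate quoted on the slide.

```python
# Screen resolutions quoted on the slide (width, height in pixels).
SCREENS = {
    "Apple iPhone": (320, 480),
    "Apple iPad": (1024, 768),
    "Apple iPad Mini": (2048, 1536),
}

REFRESH_HZ = 60  # continuous update rate quoted on the slide


def pixel_count(width: int, height: int) -> int:
    """Total number of pixels on a screen."""
    return width * height


def pixels_per_second(width: int, height: int, hz: int = REFRESH_HZ) -> int:
    """Pixels that must be produced each second to redraw every frame."""
    return pixel_count(width, height) * hz


baseline = pixel_count(*SCREENS["Apple iPhone"])
for name, (w, h) in SCREENS.items():
    count = pixel_count(w, h)
    print(f"{name}: {count / 1000:.0f}K pixels, "
          f"{count / baseline:.1f}x the iPhone, "
          f"{pixels_per_second(w, h) / 1e6:.1f}M pixels/s at {REFRESH_HZ}Hz")

# Growth from the smallest to the largest screen on the slide.
growth = pixel_count(*SCREENS["Apple iPad Mini"]) / baseline
# Time available to produce one frame at the quoted refresh rate.
frame_budget_ms = 1000 / REFRESH_HZ
```

With these resolutions the largest screen has about 20.5 times the pixels of the smallest, and each frame has to be finished in roughly 16.7 ms.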
Mobile SOC Performance Increases
[Chart: CPU/GPU aggregate performance (logarithmic scale, 1 to 100) against device shipping dates, 2011 to 2015]
Tegra 2: Dual A9
Tegra 3: Quad A9, power saver 5th core
Tegra 4: Quad A15
Logan: Full Kepler GPU, CUDA 5.5, OpenGL 4.4
Parker: Denver 64-bit CPU, Maxwell GPU
Devices shown: HTC One X+, Google Nexus 7, NVIDIA Shield
100x perf increase in four years
NVIDIA Logan Mobile SOC
Kepler GPU Architecture now on PC and Mobile. Can run essentially the same code – scaled for different power constraints
How are GPUs Accessible to the Web?
Hardware composition: Within the browser stack – under the hood
Vector Acceleration for SVG: Using NVIDIA OpenGL extensions
3D Developer Functionality: OpenGL ES functionality through JavaScript
Compute Acceleration: Offloading compute intensive code to GPU
Compression and streaming of 3D assets: For network transmission
Camera, vision and sensor processing: Future JavaScript bindings to native APIs?
Khronos Connects Software to Silicon
ROYALTY-FREE, OPEN STANDARD APIs for advanced hardware acceleration
Low level silicon to software interfaces needed on every platform
Graphics, video, audio, compute, vision, sensor and camera processing
Defines the forward looking roadmap for the silicon community
Shipping on billions of devices across multiple operating systems
Rigorous conformance tests for cross-vendor consistency
Khronos is OPEN for any company to join and participate
Acceleration APIs BY the Industry FOR the Industry
Khronos Standards and AR
Native Visual Computing
- Gaming and professional apps
- Advanced scene construction
3D Asset Authoring
- Advanced Authoring pipelines
- glTF 3D Asset Transmission Format with streaming and compression
Sensor Processing
- Mobile Vision Acceleration
- On-device Sensor Fusion
Camera Control API
Acceleration in the Browser
- WebGL for 3D in browsers
- WebCL – Heterogeneous Computing for the web
Over 100 companies defining royalty-free APIs to connect software to silicon
Mobile OS Adoption of Khronos APIs
OpenGL ES 2.0 Shipping - Android 2.2
OpenSL ES 1.0 (subset) Shipping – Android 2.3
OpenMAX AL 1.0 (subset) Shipping - Android 4.0
EGL 1.4 Shipping under SDK -> NDK
Opera and Firefox WebGL now Chrome soon
OpenGL 3.2 on MacOS
OpenCL 1.2 on MacOS
OpenGL ES 3.0 on iOS
Can enable on MacOS Safari iOS5 enables WebGL for iAds
WebGL – 3D on the Web – No Plug-in!
• Leveraging HTML 5 and <canvas> element
- WebGL defines JavaScript binding to OpenGL ES 2.0
- Enables a 3D context for the canvas
• Low-level foundational Web API for accessing the GPU
- Flexibility and direct GPU access
- Enables higher-level frameworks and middleware
Availability of OpenGL and OpenGL ES on almost every web-capable device
JavaScript binding to OpenGL ES 2.0. Increasing JavaScript performance. HTML 5 Canvas Tag
WebGL Implementation Anatomy
Content: JavaScript, HTML, CSS, ...
- Content downloaded from the Web
JavaScript Middleware
- Middleware can make WebGL accessible to non-expert 3D programmers
- Much WebGL content uses three.js library: http://threejs.org/
Browser: HTML5, JavaScript, CSS
- Browser provides WebGL functionality alongside other HTML5 technologies - no plug-in required
OS Provided Drivers: OpenGL ES 2.0, OpenGL, DX9/Angle
- WebGL on Windows can use Direct3D - for example Angle open source project creates OpenGL ES 2.0 over DX9
WebGL Availability in Browsers
- Microsoft – “where you have IE11, you have WebGL – turned on by default and working all the time”
- Microsoft - WebGL also enabled for Windows applications - web app framework and web view
- Chrome on Android now shipping with WebGL
- Chrome OS - WebGL is the only cross-platform API to program the GPU
- Apple - WebGL is present – but must be explicitly turned on in Mac Safari and is only exposed on iOS for iAds
Microsoft PhotoSynth2
• Demonstrated at Build 2013
http://channel9.msdn.com/Events/Build/2013/4-072 1:50
WebGL on Logan Android Tablet
Cross-OS Portability
[Diagram labels: C/C++; SDK: Dalvik (Java), Objective C, C#; DirectX; HTML/CSS on each platform]
Native code is portable - but apps must cope with different available APIs and libraries
Preferred development environments not designed for portability
HTML5 provides cross platform portability. GPU accessibility through WebGL available soon on ~90% mobile systems
OpenGL 3D API Family Tree
[Timeline figure, 2002 to 2013]
Desktop 3D: OpenGL 1.3, 1.5, 2.0, 2.1, 3.0, 3.1, 3.2, 3.3, 4.0, 4.1, 4.2, 4.3, 4.4, then GL-Next
Mobile 3D: OpenGL ES 1.0, 1.1, 2.0, 3.0, then ES-Next
Web: WebGL 1.0, WebGL 2.0
OpenGL ES 1.1 Content: Fixed function 3D Pipeline
OpenGL ES 2.0 Content: Programmable vertex and fragment shaders
OpenGL ES 3.0 Content: ES3 is backward compatible so new features can be added incrementally
OpenGL 4.4 is a superset of DX11
WebGL 2.0 is in development now - will bring OpenGL ES 3.0 functionality to the Web
http://www.khronos.org/webgl/public-mailing-list/
http://www.khronos.org/registry/webgl/specs/latest/
http://www.khronos.org/webgl/wiki/Testing/Conformance
OpenGL ES 3.0 Highlights
• Better looking, faster performing games and apps – at lower power
- Incorporates proven features from OpenGL 3.3 / 4.x
- 32-bit integers and floats in shader programs
- NPOT, 3D textures, depth textures, texture arrays
- Multiple Render Targets for deferred rendering, Occlusion Queries
- Instanced Rendering, Transform Feedback …
• Make life better for the programmer
- Tighter requirements for supported features to reduce implementation variability
• Backward compatible with OpenGL ES 2.0
- OpenGL ES 2.0 apps continue to run unmodified
• Standardized Texture Compression
- #1 developer request!
Texture Compression is Key
• Texture compression saves precious resources
- Network bandwidth, device memory space AND device memory bandwidth
• Developers need the same texture compression EVERYWHERE
- Otherwise portable apps – such as WebGL – need multiple copies of same texture
[Chart: Quality against Pervasive Deployment, with periods 2008-2010, 2012-2013 and 2014->]
DXTC/S3TC (Windows) and PVRTC (iOS)
- NOT Royalty-free. Platform Fragmentation
ETC1: Mandated in Android Froyo (400M devices)
- Royalty-free BUT only optional in ES
- Only 4bpp | 3 channel. No alpha support
ETC2 / EAC: MANDATED in OpenGL ES 3.0 and OpenGL 4.3
- Royalty-free. Backward compatible with ETC1
- ETC2: 4bpp | 3 channel
- EAC: 4 (8) bpp | 1 (2) channel
- COMBINED: RGBA 8bpp | 4 channel WITH ALPHA
- Does not have 1-2 bit compression
ASTC: OpenGL ES 3.0 and OpenGL 4.3 extensions -> Core once proven
- Royalty-free. Best quality
- Independent control of bit-rate and # channels
- 1 to 4 channel, 1-8bpp in fine steps
ASTC – Universal Texture Standard
• Adaptive Scalable Texture Compression (ASTC)
- Quality significantly exceeds S3TC or PVRTC at same bit rate
• Industry-leading orthogonal compression rate and format flexibility
- 1 to 4 color components: R / RG / RGB / RGBA
- Choice of bit rate: from 8bpp to <1bpp in fine steps
• ASTC is royalty-free and so is available to be universally adopted
- Shipping as OpenGL/OpenGL ES extension today for industry feedback
[Image comparison: Original 24bpp against ASTC Compression at 8bpp, 3.56bpp and 2bpp]
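The bit rates in the image comparison (8bpp, 3.56bpp, 2bpp) follow from how ASTC stores data: every block occupies 128 bits, so the rate depends only on the block footprint. The sketch below is a size calculation, not an encoder. The 1024x1024 texture and the function names are assumptions made for illustration, and the four footprints are just a sample of the sizes the format allows.

```python
import math

ASTC_BLOCK_BITS = 128  # every ASTC block is stored in 128 bits


def astc_bits_per_pixel(block_w: int, block_h: int) -> float:
    """Bit rate implied by a 2D block footprint of block_w x block_h texels."""
    return ASTC_BLOCK_BITS / (block_w * block_h)


def astc_size_bytes(tex_w: int, tex_h: int, block_w: int, block_h: int) -> int:
    """Compressed size; partial blocks at the edges still cost a full block."""
    blocks = math.ceil(tex_w / block_w) * math.ceil(tex_h / block_h)
    return blocks * ASTC_BLOCK_BITS // 8


def uncompressed_size_bytes(tex_w: int, tex_h: int, bits_per_pixel: int = 24) -> int:
    """Size of the raw image at the given bit depth."""
    return tex_w * tex_h * bits_per_pixel // 8


TEX_W = TEX_H = 1024  # hypothetical texture size, not from the slides
raw = uncompressed_size_bytes(TEX_W, TEX_H)
print(f"original 24bpp: {raw} bytes")
for bw, bh in [(4, 4), (6, 6), (8, 8), (12, 12)]:
    size = astc_size_bytes(TEX_W, TEX_H, bw, bh)
    print(f"{bw}x{bh} blocks: {astc_bits_per_pixel(bw, bh):.2f}bpp, "
          f"{size} bytes ({100 * size / raw:.0f}% of original)")
```

The 4x4, 6x6 and 8x8 footprints give the three rates shown in the comparison, and the 12x12 footprint is the one that drops below 1bpp.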
Why Khronos for WebGL?
• Hardware API standards must take into account silicon design cycles
- Multi-year pipeline of APIs that affect chips that take $100Ms to execute
- Deep insights into silicon and driver architectures
- Rigorous conformance tests and infrastructure
• Khronos is committed to being a good citizen in the larger Web community
- Opened Khronos WebGL processes to enable cooperation with web community
• Khronos is the industry forum to drive hardware consensus and cooperation
- Help create foundational support for higher-level Web standards that access hardware capabilities
OpenCL – Portable Heterogeneous Computing
• Native framework for programming diverse parallel computing resources
- CPU, GPU, DSP etc.
• OpenCL C kernel language
- Very close to C99
• APIs to discover compute resources and distribute kernels
- Across all available compute resources
[Diagram: OpenCL Kernel Code dispatched to CPU, CPU, GPU, DSP and HW]
One code tree can be executed on CPUs, GPUs, DSPs and hardware. Dynamically interrogate system load and load balance work across available processors
OpenCL as Parallel Compute Foundation
OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources
- WebCL: JavaScript binding to OpenCL for initiation of OpenCL C kernels
- River Trail: Language extensions to JavaScript
- C++ syntax/compiler extensions: OpenCL HLM, C++ AMP (Shevlin Park uses Clang and LLVM)
- Harlan: High level language for GPU programming
- Compiler directives for Fortran, C and C++
- Aparapi: Java language extensions for parallelism
- PyOpenCL: Python wrapper around OpenCL
WebCL – Parallel Computing for the Web
• JavaScript bindings to OpenCL APIs
- Enables initiation of Kernels written in OpenCL C within the browser
http://www.youtube.com/user/SamsungSISA#p/a/u/1/9Ttux1A-Nuc
Leveraging Proven Native APIs into HTML5
• Khronos and W3C exploring liaison
- Leverage proven native API investments into the Web
- Fast API development and deployment
- Designed by the hardware community
- Familiar foundation reduces developer learning curve
[Diagram legend]
- Native APIs shipping or Khronos working group
- JavaScript API shipping, acceleration being developed or work underway
- Possible future JavaScript APIs or acceleration
[Diagram labels: Native; JavaScript; HTML; Canvas; Path Rendering; Camera Control]
WebVX? Vision Processing
WebCAM(!) Camera control and video processing
WebStream? Sensor Fusion
StreamInput - Sensor Fusion
• Defines access to high-quality fused sensor stream and context changes
- Implementers can optimize and innovate generation of the sensor stream
[Diagram labels: Applications; OS sensor APIs (E.g. Android SensorManager or iOS CoreMotion); Middleware (E.g. Augmented Reality engines, gaming engines); Sensor Hubs; Sensors]
Low-level native API defines access to fused sensor data stream and context-awareness
StreamInput implementations compete on sensor stream quality, reduced power consumption, environment triggering and context detection – enabling sensor subsystem vendors to deliver increased ADDED VALUE
Platforms can provide increased access to improved sensor data stream – driving faster, deeper sensor usage by applications
Middleware engines need platform-portable access to native, low-level sensor data stream
Mobile or embedded platforms without sensor fusion APIs can provide direct application access to StreamInput
OpenVX – Power Efficient Vision Acceleration
• Complementary to OpenCV open source project
- Which is great for prototyping
• OpenVX is tightly specified API with conformance
- Portable, production-grade vision functions
• OpenVX enables graph of vision processing
- Each Node in graph can be implemented in software or accelerated hardware
• Nodes may be fused and optimized
- e.g. implementation may stripe execution over image sections in cache
[Diagram labels: Application; OpenCV open source library; Other higher-level CV libraries; graph of OpenVX Nodes; Open source sample implementation; Hardware vendor implementations]
Typical Imaging Pipeline
• Processing pre- and post-ISP can be done on CPU, GPU, DSP
- E.g. using OpenCL or OpenVX
• BUT.. Applications have often had limited control over the actual camera and ISP
- ISP controls camera via 3A algorithms
- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)
[Diagram labels: Lens; Color Filter Array; CMOS sensor; Bayer; Pre-processing; Image Signal Processor (ISP); RGB/YUV; Post-processing; App; 3A; Lens, sensor, aperture control]
Need for advanced camera control API:
- to provide more flexible app camera control
- over more types of camera sensors
- with tighter integration with the rest of the system
Khronos APIs for Augmented Reality
[Diagram labels]
- MEMS Sensors
- Sensor Fusion
- Precision timestamps on all sensor samples
- Camera Control API: Advanced Camera Control and stream generation
- Vision Processing
- 3D Rendering and Video Composition on GPU
- Audio Rendering
- Application on CPUs, GPUs and DSPs
- EGLStream - stream data between APIs
AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together
W3C Augmented Web Community Group discussing many of these issues for the Web: e.g. leveraging WebRTC in the short term http://w3.org/community/ar
3D Needs a Transmission Format!
• Compression and streaming of 3D assets becoming essential
- Mobile and connected devices need access to increasingly large asset databases
• 3D is the last media type to define a compressed format
- 3D is more complex – diverse asset types and use cases
• Needs to be royalty-free
- Avoid an ‘internet video codec war’ scenario
• Eventually enable hardware implementations of successful codecs
- High-performance and low power – but pragmatic adoption strategy is key
Audio: MP3
Video: H.264
Images: JPEG
3D: ?
An effective and widely adopted codec ignites previously unimagined opportunities for a media type
glTF – OpenGL Transmission Format
• Binary file format for efficient transmission of 3D assets
- Reduce network bandwidth and minimize client processing overhead
• Run-time neutral
- DO NOT IMPLY OR MANDATE ANY RUN-TIME BEHAVIOR
- Can be used by any app or run-time – usually WebGL accelerated
• Scalable to handle compression and streaming
- Though baseline format does not include compression
• ‘Direct load efficiency’ for WebGL
- Little or NO processing to drop glTF data into WebGL client
• Carry conditioned data from any authoring format
- Prototyping and optimizing efficient handling of COLLADA assets
A standards-based content pipeline for rich native and Web 3D applications
[Diagram labels: Authoring; Playback]
COLLADA and glTF Open Source Ecosystem
Tool Interop
Three.js glTF Importer. Rest3D initiative
COLLADA2GLTF Translator
OpenCOLLADA Importer/Exporter and COLLADA Conformance Tests on GitHub
Pervasive WebGL deployment
Other authoring formats
Web-based Tools
https://github.com/KhronosGroup/glTF
https://github.com/KhronosGroup/OpenCOLLADA
https://github.com/KhronosGroup/COLLADA-CTS
WebGL as Test-bed for 3D Asset Compression
• Integrating and benchmarking 3D geometry compression formats with glTF
- Baseline is GZIP
• Scalable Complexity 3D Mesh Compression codec MPEG-SC3DMC
- Royalty-free graphics compression technology from MPEG (MIT License)
- Open3DGC is efficient JavaScript and C/C++ implementation
- Converter using Open3DGC to compress 3D Meshes, Skinning, Animations
- Available at https://github.com/fabrobinet/glTF-webgl-viewer
• WebGL-loader is Google lightweight compression for WebGL content
Compression Efficiency – Early Results
Format                CAD Models (Mbytes)   3D Scanned Models (Mbytes)   MPEG dataset (Mbytes)
OBJ                   1310 (100%)           736 (100%)                   600 (100%)
Gzip                  336 (26%)             204 (28%)                    157 (26%)
Webgl-loader          219 (17%)             117 (16%)                    103 (17%)
Open3DGC              67 (5%)               22 (3%)                      22 (4%)
Webgl-loader + Gzip   80 (6%)               38 (5%)                      26 (4%)
Open3DGC is 5x-9x more efficient than Gzip and 1.2x-1.5x more efficient than webgl-loader
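The percentages and ratios on this slide can be recomputed from the raw sizes. The sketch below is only a check of the arithmetic using the figures as transcribed here; the dictionary layout and function names are illustrative and not part of any of the tools named.

```python
# Compressed sizes in Mbytes, as transcribed from the slide.
# Column order: CAD Models, 3D Scanned Models, MPEG dataset.
DATASETS = ("CAD Models", "3D Scanned Models", "MPEG dataset")
SIZES_MB = {
    "OBJ": (1310, 736, 600),
    "Gzip": (336, 204, 157),
    "Webgl-loader": (219, 117, 103),
    "Open3DGC": (67, 22, 22),
    "Webgl-loader + Gzip": (80, 38, 26),
}


def percent_of_baseline(fmt: str, baseline: str = "OBJ") -> list:
    """Size of each dataset in `fmt` as a rounded percentage of the baseline."""
    return [round(100 * size / base)
            for size, base in zip(SIZES_MB[fmt], SIZES_MB[baseline])]


def times_smaller(fmt: str, reference: str) -> list:
    """How many times smaller `fmt` output is than `reference` output."""
    return [ref / size
            for size, ref in zip(SIZES_MB[fmt], SIZES_MB[reference])]


for fmt in SIZES_MB:
    print(fmt, dict(zip(DATASETS, percent_of_baseline(fmt))))

for reference in ("Gzip", "Webgl-loader", "Webgl-loader + Gzip"):
    ratios = [round(r, 1) for r in times_smaller("Open3DGC", reference)]
    print(f"Open3DGC vs {reference}: {ratios}")
```

On these figures Open3DGC output is about 5.0x to 9.3x smaller than Gzip output, and about 1.2x to 1.7x smaller than the webgl-loader + Gzip row.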
Decoding Speed
• For mobile - need to balance file size AND decompression processing
- Extensive processing can take more time/power than transmission
• OpenCTM is also promising but LZMA is very processor intensive
- Work may lead to LZMA in hardware?
Platform                                      Hand (100K Tri.)   Dilo (54K Tri.)   Octopus (34K Tri.)
Win7 64-bit, 10GB RAM, i7-2600 CPU @ 3.4GHz   130 ms             86 ms             65 ms
Samsung Galaxy S4, Android 4.2.2              1045 ms            768 ms            457 ms
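To put the decode timings in perspective, the sketch below turns them into triangles decoded per second and a mobile-to-desktop slowdown. It assumes the first row of timings belongs to the Win7 desktop and the second to the Samsung Galaxy S4, which is the order in which they appear in this transcript; the variable and function names are illustrative.

```python
# Triangle counts and decode times (ms) as transcribed from the slide.
TRIANGLES = {"Hand": 100_000, "Dilo": 54_000, "Octopus": 34_000}
DESKTOP_MS = {"Hand": 130, "Dilo": 86, "Octopus": 65}    # assumed: Win7 desktop row
MOBILE_MS = {"Hand": 1045, "Dilo": 768, "Octopus": 457}  # assumed: Galaxy S4 row


def triangles_per_second(triangles: int, decode_ms: float) -> float:
    """Decode throughput implied by one timing."""
    return triangles / (decode_ms / 1000)


def slowdown(model: str) -> float:
    """How many times longer the mobile decode takes than the desktop decode."""
    return MOBILE_MS[model] / DESKTOP_MS[model]


for model, count in TRIANGLES.items():
    desktop = triangles_per_second(count, DESKTOP_MS[model])
    mobile = triangles_per_second(count, MOBILE_MS[model])
    print(f"{model}: desktop {desktop / 1000:.0f}K tri/s, "
          f"mobile {mobile / 1000:.0f}K tri/s, "
          f"slowdown {slowdown(model):.1f}x")
```

Under that row assignment the phone decodes roughly 7x to 9x more slowly than the desktop, which is the cost the slide asks to be weighed against smaller files.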
Path Rendering Acceleration
Offload the CPU so the application can run as fast as possible
Make maximum use of the GPU for best performance and power
Approach 1: CPU creates paths, CPU renders paths
- Software Scanline renderers can be high quality and portable
- CPU has to process complete pipeline – stealing cycles from the application
- Software rendering limits performance
Approach 2: CPU creates paths, CPU tessellates paths into polygons, GPU uses standard 3D commands to process polygons
- Tessellation loads the CPU – stealing cycles from the application so perf sometimes slower than software alone
- Tessellation consumes a lot of data and memory bandwidth = power
- Quality can be compromised due to tessellation accuracy
Approach 3: CPU creates paths, define new OpenGL path commands so the GPU can process paths directly
- Maximum CPU offload
- Compact data format sent to GPU renderer
- GPU provides excellent performance and power
- GPU can increase quality and functionality
NV_path_rendering OpenGL Extension
Brings Path processing directly to OpenGL
No tessellation necessary
Goals
- Functionally complete for key standards: SVG, Canvas, PostScript etc.
- Much faster, often 4x to 100x faster than CPUs
- Enhanced quality – can avoid approximations needed by CPU renderers
- Lower power by leveraging dedicated hardware
- New functionality – e.g. mix 2D paths with 3D and programmable shading
Stencil then Cover Approach
Create a path object and pass directly to the GPU
- Cubic & quadratic Bezier segments, line segments, partial elliptical arcs
Step 1: Stencil. GPU “Stencils” the path object into the stencil buffer
- GPU provides massively parallel stenciling of filled or stroked paths
- Calculate winding rule or containment at every sub-pixel sample in parallel
Step 2: Cover. “Cover” the path object and stencil test against its coverage
- Test against path coverage determined in the 1st step and shade the path
Repeat steps 1 and 2
Uses GPU MSAA anti-aliasing
- 8 or 16 samples/pixel gives good quality
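The two steps can be illustrated with a small CPU simulation. This is a sketch of the idea only, not the extension's API or implementation: it handles closed polygons made of line segments, takes a single sample at each pixel centre instead of multiple sub-pixel samples, and runs serially where the GPU works in parallel. All function names here are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def winding_number(path: List[Point], x: float, y: float) -> int:
    """Signed number of times the closed polygon `path` winds around (x, y)."""
    wn = 0
    n = len(path)
    for i in range(n):
        x0, y0 = path[i]
        x1, y1 = path[(i + 1) % n]
        # Sign of the cross product says which side of the edge the sample is on.
        side = (x1 - x0) * (y - y0) - (x - x0) * (y1 - y0)
        if y0 <= y < y1 and side > 0:      # upward edge, sample on its left
            wn += 1
        elif y1 <= y < y0 and side < 0:    # downward edge, sample on its right
            wn -= 1
    return wn


def stencil_step(path: List[Point], width: int, height: int) -> List[List[int]]:
    """Step 1: record the winding number of every pixel-centre sample."""
    return [[winding_number(path, px + 0.5, py + 0.5) for px in range(width)]
            for py in range(height)]


def cover_step(path: List[Point], stencil: List[List[int]],
               framebuffer: List[List[str]], color: str) -> None:
    """Step 2: shade samples in the path's bounding box that pass the stencil test."""
    height, width = len(stencil), len(stencil[0])
    xs = [p[0] for p in path]
    ys = [p[1] for p in path]
    x_lo, x_hi = max(0, math.floor(min(xs))), min(width, math.ceil(max(xs)))
    y_lo, y_hi = max(0, math.floor(min(ys))), min(height, math.ceil(max(ys)))
    for py in range(y_lo, y_hi):
        for px in range(x_lo, x_hi):
            if stencil[py][px] != 0:       # non-zero fill rule
                framebuffer[py][px] = color
                stencil[py][px] = 0        # reset so the next path starts clean


WIDTH, HEIGHT = 8, 8
square = [(1.0, 1.0), (5.0, 1.0), (5.0, 5.0), (1.0, 5.0)]
framebuffer = [["." for _ in range(WIDTH)] for _ in range(HEIGHT)]
stencil = stencil_step(square, WIDTH, HEIGHT)
cover_step(square, stencil, framebuffer, "#")
print("\n".join("".join(row) for row in framebuffer))
```

Separating the two steps means the shading pass never needs to know the path's shape; it only tests the stencil value left behind by the first pass.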
Excellent Geometric Fidelity for Stroking
Correct stroking is hard
- Lots of CPU implementations approximate stroking
GPU-accelerated stroking avoids such short-cuts
- GPU has FLOPS to compute true stroke point containment
[Image comparison: stroking with tight end-point curve, rendered by GPU-accelerated, OpenVG reference, Cairo and Qt]
Micrography
“Girl with Words in Her Hair” 591 paths 338,507 commands 1,244,474 coordinates
Ron Maharik, Mikhail Bessmeltsev, Alla Sheffer, Ariel Shamir and Nathan Carr SIGGRAPH 2011
More Details on nvpr
Functionality is the union of all major path rendering standards
- Enables mixing traditional functionality with 3D and programmable shading
Point sampling for path filling is exact
- No approximations due to tessellation or subdivision
Path stroking is exact
- Line segments & quadratic Bezier segments stroking is exact
- All stroke cap + join styles supported
- Dashing fully supported
Minimal pre-computation required
- NO tessellation involved, NO recursive subdivision
- Fast to animate, morph, or edit paths
Enhanced Quality on GPU
Eliminate Conflation Artifacts
- Conflation artifacts on CPU, conflation free on GPU
- Multiple color AND stencil samples per pixel
- [Image labels: Cairo, NV_path_rendering, Skia; color bleeding]
Stroking approximations avoided by GPU
- [Image labels: feathers? weird big holes]
GPU Offers Jittered Sampling for Free
- Regular grid on CPU - sub-optimal Antialiasing
- Jitter pattern on GPU for better Antialiasing
- [Image labels: GPU, Qt, Cairo]
Proper gradient filtering on GPU
- Moiré artifacts. Similar for Qt & Skia
- GPUs great at texturing: Mip-mapping, Anisotropic filtering, Wrap modes
Comparing Performance
[Chart: Comparative Performance (Logarithmic Scale), vertical axis from 0.10 to 1000.00, window sizes from 100x100 to 1100x1100 for each scene]
Scenes: tiger, Welsh_dragon, Celtic_round_dogs, butterfly, spikes, American_Samoa, cowboy, Buonaparte, Embrace_the_World, Yokozawa, Cougar, tiger_clipped_by_he
Series: NVpr16/Cairo, NVpr16/SkiaBitmap, NVpr16/SkiaGanesh, NVpr16/Direct2D GPU, NVpr16/Direct2D WARP
GeForce GTX 480. Release drivers V.300. x16 MSAA
New GPU Functionality
- Programmable Shading: Paint in GLSL – for filter and blending acceleration
- [Image label: light source position for BUMP Mapping]
- Projective Transformation
- Fast Arbitrary Path Clipping
- Mixing depth tested Text, 3D, and Paths
- Fully sRGB Correct Rendering
  linear RGB: transition between saturated red and saturated blue has dark purple region
  sRGB: perceptually smooth transition from saturated red to saturated blue
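The gradient images from this slide are not reproduced in the transcript, so exactly which computation each caption refers to is an assumption here. The sketch below uses the standard sRGB transfer functions to show why the blending space matters for a red-to-blue transition: averaging the encoded values directly gives a much darker midpoint than averaging in linear light and encoding the result. The function names are illustrative.

```python
from typing import Tuple

Color = Tuple[float, float, float]

RED: Color = (1.0, 0.0, 0.0)
BLUE: Color = (0.0, 0.0, 1.0)


def srgb_to_linear(c: float) -> float:
    """Decode one sRGB-encoded channel (0..1) to linear light."""
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def linear_to_srgb(c: float) -> float:
    """Encode one linear-light channel (0..1) as sRGB."""
    return 12.92 * c if c <= 0.0031308 else 1.055 * c ** (1 / 2.4) - 0.055


def lerp(a: float, b: float, t: float) -> float:
    return a + (b - a) * t


def blend_encoded(a: Color, b: Color, t: float) -> Color:
    """Interpolate the stored sRGB values directly, ignoring the encoding."""
    return tuple(lerp(x, y, t) for x, y in zip(a, b))


def blend_linear_light(a: Color, b: Color, t: float) -> Color:
    """Decode to linear light, interpolate there, then re-encode as sRGB."""
    return tuple(linear_to_srgb(lerp(srgb_to_linear(x), srgb_to_linear(y), t))
                 for x, y in zip(a, b))


naive_mid = blend_encoded(RED, BLUE, 0.5)
correct_mid = blend_linear_light(RED, BLUE, 0.5)
print("midpoint, encoded-space blend:", naive_mid)
print("midpoint, linear-light blend: ", tuple(round(c, 3) for c in correct_mid))
print("light emitted by the encoded-space midpoint:",
      round(srgb_to_linear(naive_mid[0]), 3))
```

The encoded-space midpoint stores 0.5 in red and blue, which corresponds to only about 21% of full light output, while the linear-light midpoint stores about 0.735 in each.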
Mixing 2D and 3D
Resolution-independent Font Support
Fonts are a standard, first-class part of all path rendering systems
- Foreign to 3D graphics systems such as OpenGL and Direct3D
NV_path_rendering has built-in font support
- Can specify a range of path objects with a specified font and a sequence or range of Unicode character points
No requirement for applications to use a font API to load glyphs
- You can also load glyphs “manually” from your own glyph outlines
- Functionality provides OS portability
Path Geometric Queries
glIsPointInFillPathNV
- Determine if object-space (x,y) position is inside or outside path, given a winding number mask
glIsPointInStrokePathNV
- Determine if object-space (x,y) position is inside the stroke of a path
- Accounts for dash pattern, joins, and caps
glGetPathLengthNV
- Returns approximation of geometric length of a given sub-range of path segments
glPointAlongPathNV
- Returns the object-space (x,y) position and 2D tangent vector at a given offset into a specified path object
- Useful for “text follows a path”
Queries are modeled after OpenVG queries
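To make the length and point-along-path queries concrete, here is a CPU sketch of what they compute for the simplest case, a path made only of straight line segments. It does not call the extension and does not reproduce its signatures: `path_length` and `point_along_path` are hypothetical names, and curved segments, which the real queries also cover, are left out.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def path_length(points: List[Point], start: int = 0,
                count: Optional[int] = None) -> float:
    """Length of `count` segments beginning at segment index `start`."""
    lengths = [math.dist(a, b) for a, b in zip(points, points[1:])]
    end = len(lengths) if count is None else start + count
    return sum(lengths[start:end])


def point_along_path(points: List[Point],
                     distance: float) -> Tuple[float, float, float, float]:
    """Position (x, y) and unit tangent (tx, ty) at `distance` along the path.

    Distances outside the path are clamped to its first or last point.
    """
    remaining = max(0.0, distance)
    last = None
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len = math.dist((x0, y0), (x1, y1))
        if seg_len == 0.0:
            continue  # zero-length segments have no tangent
        tx, ty = (x1 - x0) / seg_len, (y1 - y0) / seg_len
        if remaining <= seg_len:
            return (x0 + tx * remaining, y0 + ty * remaining, tx, ty)
        remaining -= seg_len
        last = (x1, y1, tx, ty)
    if last is None:
        raise ValueError("path has no segment with non-zero length")
    return last


# An L-shaped path: 3 units to the right, then 4 units along +y.
path = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
print(path_length(path))            # whole path
print(path_length(path, 1, 1))      # second segment only
print(point_along_path(path, 5.0))  # 2 units into the second segment
```

For the "text follows a path" case, each glyph would be placed at the returned position and rotated to the returned tangent.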
Open Source Accelerated SVG Renderer
Partial SVG Renderer - pr_svg
- Path filling, transformations and grouping
- Path stroking with all stroking embellishments
- Clipping – including clipping paths to other arbitrary paths
- Painting with linear/radial gradients and images
- Basic compositing
- Coming in next update: markers and text
Stuff that’s missing from pr_svg
- Filters, Blending, Opacity groups, Animation, JavaScript integration
- Not hard, just best done in context of a browser
NVIDIA welcomes any community involvement
http://developer.nvidia.com/nv-path-rendering
More Information
Best drivers: OpenGL 4.4
- www.nvidia.com/drivers
- Grab the latest drivers for your OS & GPU
- Runs on any CUDA-capable GPU (GeForce 8 onwards)
Developer resources
- http://developer.nvidia.com/nv-path-rendering
- Whitepapers, FAQ, specification
- NVprSDK – software development kit
- NVprDEMOs – pre-compiled Windows demos
- YouTube videos demonstrate various NVpr DEMOs
Email: [email protected]
Standardization and Adoption Pipeline
NVIDIA is proposing nvpr to OpenGL working group at Khronos to create open, royalty-free cross platform foundation for vector graphics acceleration
Stage 1: Vendor Extension to OpenGL (nvpr is here!)
- Initial functionality proposal. Prove concepts. Solicit industry feedback
Stage 2: OpenGL Extension or Core
- OpenGL vector acceleration adopted into OpenGL and OpenGL ES
Stage 3: Vector acceleration pervasive on desktop and mobile
- Pervasive multi-vendor availability. Widespread application usage inspires silicon optimizations
Desktop and mobile displays typically >300 DPI
Mobile silicon is CUDA/OpenCL capable
Path Rendering Acceleration on Android Tablet
Summary
• Open standards such as WebGL and WebCL are enabling web applications to reach the power of the GPU through JavaScript
• GPU acceleration will soon become vital for Web applications wanting to leverage advanced use of camera and sensors
• Acceleration of path primitives directly on GPUs will drive browser performance for new classes of applications and devices
• Work starting on 3D asset streaming and compression standards – to enable 3D as a social media type on the web
• The Web and hardware communities have significant opportunity to leverage each other's efforts for the benefit of the industry
• Khronos is committed to enabling the hardware community to be a good citizen in creating the next generation of accelerated web standards