Vision BOF - SIGGRAPH 2014

Upload: the-khronos-group-inc

Post on 10-Jun-2015

DESCRIPTION

Vision slide presentation from the 2014 SIGGRAPH BOF, including OpenKCAM, OpenVX, EGL and StreamInput

TRANSCRIPT

Page 1: Vision BOF - SIGGRAPH 2014

© Copyright Khronos Group 2014 - Page 1

Open Standard APIs for Vision and Camera Processing

Neil Trevett Vice President Mobile Ecosystem, NVIDIA

President, Khronos Group

Page 2: Vision BOF - SIGGRAPH 2014

Khronos Connects Software to Silicon

Open Consortium creating ROYALTY-FREE, OPEN STANDARD APIs for hardware acceleration

Defining the roadmap for low-level silicon interfaces needed on every platform

Graphics, compute, rich media, vision, sensor and camera processing

Rigorous specifications AND conformance tests for cross-vendor portability

Acceleration APIs BY the Industry FOR the Industry

Well over a BILLION people use Khronos APIs Every Day…

Page 3: Vision BOF - SIGGRAPH 2014

Khronos Standards

Visual Computing

- 3D Graphics

- Heterogeneous Parallel Computing

3D Asset Handling

- 3D authoring asset interchange

- 3D asset transmission format with compression

Acceleration in HTML5

- 3D in browser – no Plug-in

- Heterogeneous computing for JavaScript

Sensor Processing

- Vision Acceleration

- Camera Control

- Sensor Fusion

Over 100 companies defining royalty-free APIs to connect software to silicon

Page 4: Vision BOF - SIGGRAPH 2014

Visual Computing = Graphics PLUS Vision

Real-time GPU Compute Research project on CUDA-enabled laptop

"High-Quality Reflections, Refractions, and Caustics in Augmented Reality and their Contribution to Visual Coherence", P. Kán, H. Kaufmann, Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria

https://www.youtube.com/watch?v=i2MEwVZzDaA

[Diagram labels: Imagery; Data; Vision Processing; Graphics Processing]

Enhanced sensor and vision capability deepens the interaction between real and virtual worlds

Page 5: Vision BOF - SIGGRAPH 2014

Mobile Visual Computing = New Experiences

Augmented Reality

Face, Body and Gesture Tracking

Computational Photography and Videography

3D Scene/Object Reconstruction

Need for advanced sensors and the GPU throughput to process them

Page 6: Vision BOF - SIGGRAPH 2014

Vision Pipeline Challenges and Opportunities

Sensor Proliferation: Diverse sensor awareness of the user and surroundings

• Light / Proximity

• 2 cameras

• 3 microphones

• Touch

• Position

- GPS

- WiFi (fingerprint)

- Cellular trilateration

- NFC/Bluetooth Beacons

• Accelerometer

• Magnetometer

• Gyroscope

• Pressure / Temp / Humidity

19

Growing Camera Diversity: Capturing color, range and lightfields

• Camera sensors >20MPix

• Novel sensor configurations

• Stereo pairs

• Plenoptic Arrays

• Active Structured Light

• Active TOF

Diverse Vision Processors: Driving for high performance and low power

• Camera ISPs

• Dedicated vision IP blocks

• DSPs and DSP arrays

• Programmable GPUs

• Multi-core CPUs

Flexible sensor and camera control to generate required image stream

Use best processing available for image stream processing – with code portability

Control/fuse vision data by/with all other sensor data on device

Page 7: Vision BOF - SIGGRAPH 2014

Vision Processing Power Efficiency

• Depth sensors = significant processing

- Generate/use environmental information

• Wearables will need ‘always-on’ vision

- With smaller thermal limit / battery than phones!

• GPUs have x10 the imaging power efficiency of CPUs

- GPUs architected for efficient pixel handling

• Traditional cameras have dedicated hardware

- ISP = Image Signal Processor – on all SOCs today

• SOCs have space for more transistors

- But can’t turn them all on at the same time = Dark Silicon

• Potential for dedicated sensor/vision silicon

- Can trigger full CPU/GPU complex

[Chart: Power Efficiency versus Computation Flexibility. Labels: Dedicated Hardware; GPU Compute; Multi-core CPU. Scale: X1, X10, X100. Annotations: Advanced Sensors; Wearables]

But how to program specialized processors? Performance and Functional Portability

Page 8: Vision BOF - SIGGRAPH 2014

OpenVX – Power Efficient Vision Acceleration

• Out-of-the-Box vision acceleration framework

- Low-power, real-time, mobile and embedded

• Performance portability for diverse hardware

- ISPs, Dedicated vision blocks, DSPs and DSP arrays, GPUs, Multi-core CPUs

• Suited for low-power, always-on acceleration

- Can run solely on dedicated vision hardware

• Foundational API for vision acceleration

- Can be used by middleware or applications

• Complementary to OpenCV

- Which is great for prototyping

• Khronos open source sample implementation

- To be released with final specification

[Diagram labels: Application; OpenCV open source library; Other higher-level CV libraries; Open source sample implementation; Hardware vendor implementations]

Page 9: Vision BOF - SIGGRAPH 2014

OpenVX Graphs – The Key to Efficiency

• Vision processing directed graphs for power and performance efficiency

- Each Node can be implemented in software or accelerated hardware

- Nodes may be fused by the implementation to eliminate memory transfers

- Processing can be tiled to keep data entirely in local memory/cache

• VXU Utility Library for access to single nodes

- Easy way to start using OpenVX by calling each node independently

• EGLStreams can provide data and event interop with other Khronos APIs

- BUT use of other Khronos APIs is not mandated

[Diagram: Example OpenVX Graph. Labels: Native Camera Control; OpenVX Node (x4); Downstream Application Processing]
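
The node-fusion point above can be illustrated with a toy sketch. This is not the OpenVX API: `run_unfused`, `run_fused`, `brighten` and `threshold` are hypothetical names, and the "image" is just a list of 8-bit values. It only shows why fusing two per-pixel nodes removes the intermediate image write that a node-by-node execution would need.

```python
from typing import Callable, List, Tuple

PointOp = Callable[[int], int]  # a per-pixel node (hypothetical stand-in for a graph node)

def run_unfused(image: List[int], ops: List[PointOp]) -> Tuple[List[int], int]:
    """Run each node separately; every node writes a full intermediate image."""
    writes = 0
    current = image
    for op in ops:
        current = [op(p) for p in current]  # one full-image write per node
        writes += 1
    return current, writes

def fuse(ops: List[PointOp]) -> PointOp:
    """Combine a chain of per-pixel nodes into one per-pixel function."""
    def fused(p: int) -> int:
        for op in ops:
            p = op(p)
        return p
    return fused

def run_fused(image: List[int], ops: List[PointOp]) -> Tuple[List[int], int]:
    """Run the fused chain; only the final image is written."""
    fused = fuse(ops)
    return [fused(p) for p in image], 1

def brighten(p: int) -> int:
    return min(p + 40, 255)

def threshold(p: int) -> int:
    return 255 if p >= 128 else 0

image = [10, 90, 100, 200]
unfused_out, unfused_writes = run_unfused(image, [brighten, threshold])
fused_out, fused_writes = run_fused(image, [brighten, threshold])
print(unfused_out, unfused_writes)
print(fused_out, fused_writes)
```

Both paths produce the same pixels; the fused path simply never materializes the brightened image. A real implementation would make this decision inside the graph verifier, which is why the graph has to be declared up front.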

Page 10: Vision BOF - SIGGRAPH 2014

OpenVX 1.0 Function Overview

• Core data structures

- Images and Image Pyramids

- Processing Graphs, Kernels, Parameters

• Image Processing

- Arithmetic, Logical, and statistical operations

- Multichannel Color and BitDepth Extraction and Conversion

- 2D Filtering and Morphological operations

- Image Resizing and Warping

• Core Computer Vision

- Pyramid computation

- Integral Image computation

• Feature Extraction and Tracking

- Histogram Computation and Equalization

- Canny Edge Detection

- Harris and FAST Corner detection

- Sparse Optical Flow

OpenVX 1.0 defines framework for creating, managing and executing graphs

Focused set of widely used functions that are readily accelerated

Implementers can add functions as extensions

Widely used extensions adopted into future versions of the core

OpenVX Specification Evolution

Page 11: Vision BOF - SIGGRAPH 2014

Example Graph - Stereo Machine Vision

[Diagram: OpenVX Graph. Labels: Camera 1; Camera 2; Stereo Rectify with Remap (x2); Compute Depth Map (User Node); Image Pyramid; Delay; Compute Optical Flow; Detect and track objects (User Node); Object coordinates]

Tiling extension enables user nodes (extensions) to also optimally run in local memory

Page 12: Vision BOF - SIGGRAPH 2014

OpenVX and OpenCV are Complementary

Governance

- OpenCV: Community driven open source with no formal specification

- OpenVX: Formal specification defined and implemented by hardware vendors

Conformance

- OpenCV: No conformance tests for consistency and every vendor implements different subset

- OpenVX: Full conformance test suite / process creates a reliable acceleration platform

Portability

- OpenCV: APIs can vary depending on processor

- OpenVX: Hardware abstracted for portability

Scope

- OpenCV: Very wide; 1000s of imaging and vision functions; multiple camera APIs/interfaces

- OpenVX: Tight focus on hardware accelerated functions for mobile vision; use external camera API

Efficiency

- OpenCV: Memory-based architecture; each operation reads and writes memory

- OpenVX: Graph-based execution; optimizable computation, data transfer

Use Case

- OpenCV: Rapid experimentation

- OpenVX: Production development & deployment

Page 13: Vision BOF - SIGGRAPH 2014

OpenVX Participants and Timeline

• Provisional 1.0 specification released November 2013 for industry feedback

- An update to the provisional spec published in July

• OpenVX 1.0 final release planned for 2014

- With conformance tests

• Itseez is working group chair (the convener of OpenCV)

- Qualcomm and TI are specification editors

Page 14: Vision BOF - SIGGRAPH 2014

NVIDIA VisionWorks Uses OpenVX

• VisionWorks library contains diverse vision and imaging primitives

• Leverages OpenVX for optimized primitive execution

• Can extend VisionWorks nodes through CUDA accelerated primitives

• Provided with sample library of fully accelerated pipelines

[Diagram labels: Applications and Middleware; Vision Pipeline Samples (Object Detection, SLAM, 3rd Party Pipelines …); VisionWorks Framework; VisionWorks Primitives (Classifier, Corner Detection, 3rd Party); CUDA Libraries; Tegra K1]

Page 15: Vision BOF - SIGGRAPH 2014

OpenCL – Portable Heterogeneous Computing

• Portable Heterogeneous programming of diverse compute resources

- Targeting supercomputers -> embedded systems -> mobile devices

• One code tree can be executed on CPUs, GPUs, DSPs and hardware

- Dynamically interrogate system load and balance work across available processors

• OpenCL = Two APIs and C-based Kernel language

- Platform Layer API to query, select and initialize compute devices

- Kernel language - Subset of ISO C99 + language extensions

- C Runtime API to build and execute kernels across multiple devices

[Diagram labels: OpenCL Kernel Code (x4); GPU; DSP; CPU; CPU; HW]

Page 16: Vision BOF - SIGGRAPH 2014

OpenCL as Parallel Language Backend

OpenCL provides vendor optimized, cross-platform, cross-vendor access to heterogeneous compute resources

- JavaScript binding for initiation of OpenCL C kernels

- River Trail: Language extensions to JavaScript

- MulticoreWare open source project on Bitbucket

- Harlan: High level language for GPU programming

- Compiler directives for Fortran, C and C++

- Java language extensions for parallelism

- PyOpenCL: Python wrapper around OpenCL

- Language for image processing and computational photography

- Embedded array language for Haskell

SPIR: Standard Portable Intermediate Representation (extending LLVM for parallel computation)

SPIR 2.0 Released here at SIGGRAPH

Page 17: Vision BOF - SIGGRAPH 2014

Mixamo - Avatar Videoconferencing

• Real time facial animation capture on mobile – ported directly from PC

• Animate an avatar while conferencing

• Full GPU acceleration of vision processing using OpenCL

NVIDIA Tegra K1 Development Board

Page 18: Vision BOF - SIGGRAPH 2014

Khronos APIs for Vision Processing

GPU Compute Shaders (OpenGL 4.X and OpenGL ES 3.1)

Pervasively available on almost any mobile device or OS

Easy integration into graphics apps – no compute API interop needed

Program in GLSL not C

Limited to acceleration on a single GPU

General Purpose Heterogeneous Programming Framework

Flexible, low-level access to any devices with OpenCL compiler

Single programming and run-time framework for CPUs, GPUs, DSPs, hardware

Open standard for any device or OS – being used as a backend by many languages and frameworks

Needs full compiler stack and IEEE precision

Out of the Box Vision Framework - Operators and graph framework library

Can run on dedicated hardware – no compiler needed

Easier performance portability to diverse hardware

Suited for low-power, always-on acceleration

Fixed set of operators – but can be extended

It is possible to use OpenCL or GLSL to build OpenVX Nodes on programmable devices

Page 19: Vision BOF - SIGGRAPH 2014

Kari Pulli, NVIDIA Research

Page 20: Vision BOF - SIGGRAPH 2014

Advanced Camera Control Use Cases

• High-dynamic range (HDR) and computational flash photography

- High-speed burst with individual frame control over exposure and flash

• Subject isolation and depth detection

- High-speed burst with individual frame control over focus

• Rolling shutter elimination

- High-precision intra-frame synchronization between camera and motion sensor

• Augmented Reality

- 60Hz, low-latency capture with motion sensor synchronization

- Multiple Region of Interest (ROI) capture

- Synchronized stereo sensors for scene scaling

- Detailed feedback on camera operation per frame

• Time-of-flight or structured light depth camera processing

- Aligned stacking of data from multiple sensors

Page 21: Vision BOF - SIGGRAPH 2014

Typical Imaging Pipeline

• Pre-processing is non-existent in basic use-cases

• Pre- and Post-processing can be done on CPU, GPU, DSP…

• ISP controls camera via 3A algorithms

- Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)

[Diagram labels: Lens; Color Filter Array; CMOS sensor; Bayer; Pre-processing; Image Signal Processor (ISP); RGB; YUV; Post-processing; App; Lens, sensor, aperture control]

Page 22: Vision BOF - SIGGRAPH 2014

High Dynamic Range (HDR)

• HDR works by combining differing exposures into the same image

• A variety of methods for HDR, based on application

- Multiple frame HDR (requires frame memory)

- Interlace HDR

- Multiple Zone HDR

• HDR requires

- Precise control over camera parameters (exposure)

- Fast capture and processing of multiple images

- Note: with interlace HDR, only 1 image is needed

[Diagram labels: Short exposure; Long exposure; Optional mid exposure; HDR processing]
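
As a rough illustration of multiple-frame HDR, the sketch below merges a short and a long exposure into one radiance estimate per pixel. It assumes a linear sensor response and a simple "hat" weighting that trusts mid-range values most; `hat_weight` and `merge_hdr` are hypothetical names, and this is one common textbook approach rather than the method of any particular camera stack.

```python
from typing import List, Tuple

def hat_weight(v: int) -> float:
    """Trust mid-range 8-bit values most; fully dark or clipped values get weight 0."""
    return 1.0 - abs(v - 127.5) / 127.5

def merge_hdr(frames: List[Tuple[float, List[int]]]) -> List[float]:
    """frames: (exposure_time_seconds, pixel values 0..255), all the same size.

    Returns a per-pixel radiance estimate in sensor units per second.
    """
    shortest_t, shortest_pixels = min(frames, key=lambda f: f[0])
    radiance = []
    for i in range(len(frames[0][1])):
        num = den = 0.0
        for t, pixels in frames:
            w = hat_weight(pixels[i])
            num += w * pixels[i] / t  # each frame's own estimate is value / exposure time
            den += w
        if den == 0.0:
            # Every exposure is clipped or black: fall back to the shortest exposure.
            radiance.append(shortest_pixels[i] / shortest_t)
        else:
            radiance.append(num / den)
    return radiance

short = (0.01, [20, 200])  # short exposure: dark pixel is noisy, bright pixel is fine
long_ = (0.04, [80, 255])  # long exposure: dark pixel is fine, bright pixel is clipped
radiance = merge_hdr([short, long_])
print(radiance)
```

The clipped value in the long exposure gets zero weight, so the bright pixel is recovered from the short exposure alone. This is also why the slide asks for precise exposure control: the merge divides by the exposure time actually used for each frame.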

Page 23: Vision BOF - SIGGRAPH 2014

Image stitching, panoramic images

• Made with

• Requires processing of multiple images

• Requires position / geometry information

• Requires control of camera (e.g. AE lock)

Page 24: Vision BOF - SIGGRAPH 2014

Typical Burst Sequence Applications

Page 25: Vision BOF - SIGGRAPH 2014

Pipelined Sensor Model

• Traditional one-shot sensor model

- Need to know which parameters were used

- Resetting the pipeline between shots is slow

• Viewfinding / video mode:

- Pipelined, high frame rate

- Settings changes take effect later

• Need new model for Computational Photography

- Need parameterized SEQUENCE of images to feed advanced algorithms

• Real image sensors are pipelined

- While one frame is exposing

- Next one is being prepared

- Previous one is being read out
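
The three overlapping stages above can be sketched as a toy simulation. The stage names follow the slide (prepare, expose, read out); `run_pipeline` and the one-tick-per-stage timing are assumptions made purely for illustration.

```python
from collections import deque
from typing import List, Tuple

def run_pipeline(exposures_ms: List[int]) -> List[Tuple[int, int]]:
    """Push one request per tick through prepare -> expose -> read out.

    Returns (tick, exposure_ms) for every frame that is read out.
    """
    pending = deque(exposures_ms)
    prepare = expose = None
    frames = []
    tick = 0
    while pending or prepare is not None or expose is not None:
        readout = expose  # the frame exposed during the last tick is read out now
        expose = prepare  # the frame prepared during the last tick is exposed now
        prepare = pending.popleft() if pending else None
        if readout is not None:
            frames.append((tick, readout))
        tick += 1
    return frames

# The exposure changes from 10 ms to 40 ms at the third request (tick 2).
frames = run_pipeline([10, 10, 40, 40])
print(frames)
```

In this model the change requested at tick 2 first shows up in the frame read out at tick 4, which is the "settings changes take effect later" behavior, while one frame still completes every tick. Draining and refilling the pipeline for every shot, as the one-shot model does, would cost three ticks per frame instead.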

Page 26: Vision BOF - SIGGRAPH 2014

Need for Camera Control API - OpenKCAM

• Advanced control of ISP and camera subsystem – with cross-platform portability

- Generate sophisticated image stream for advanced imaging & vision apps

• No platform API currently fulfills all developer requirements

- Portable access to growing sensor diversity: e.g. depth sensors and sensor arrays

- Cross sensor synch: e.g. synch of camera and MEMS sensors

- Advanced, high-frequency per-frame burst control of camera/sensor: e.g. ROI

- Multiple input, output re-circulating streams with RAW, Bayer or YUV Processing

[Diagram labels: Image Signal Processor (ISP); Image/Vision Applications; EGLStreams]

Defines control of Sensor, Color Filter Array, Lens, Flash, Focus, Aperture

Auto Exposure (AE), Auto White Balance (AWB), Auto Focus (AF)

Page 27: Vision BOF - SIGGRAPH 2014

OpenKCAM API Requirements

• Provide functional portability for advanced camera applications

- Reduce extreme fragmentation for ISVs wanting more than point and shoot

• Application control over ISP processing (including 3A)

- Including multiple, re-entrant ISPs

• Control multiple sensors with synch and alignment

- E.g. Stereo pairs, Plenoptic arrays, TOF or structured light depth cameras

• Enhanced per frame detailed control

- Format flexibility, Region of Interest (ROI) selection

• Global timing & synchronization

- E.g. Between cameras and MEMS sensors

• Flexible processing/streaming

- Multiple input and output streams with RAW, Bayer or YUV Processing

- Streaming of rows (not just frames)

Enable advanced camera functionality not available on current platforms

Page 28: Vision BOF - SIGGRAPH 2014

OpenKCAM is FCAM-based

• FCAM (2010) Stanford/Nokia, open source

• Capture stream of camera images with precision control

- A pipeline that converts requests into image stream

- All parameters packed into the requests - no visible state

- Programmer has full control over sensor settings for each frame in stream

• Control over focus and flash

- No hidden daemon running

• Control ISP

- Can access supplemental statistics from ISP if available

• No global state

- State travels with image requests

- Every pipeline stage may have different state

- Enables fast, deterministic state changes
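
The "state travels with image requests" idea can be sketched as follows. This is not the FCAM or OpenKCAM API: `Request`, `Frame`, `capture` and the toy brightness model are hypothetical, chosen only to show a pipeline with no global camera state, where each returned frame carries the exact parameters that produced it.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class Request:
    """All capture parameters for ONE frame; nothing is stored in the camera."""
    exposure_us: int
    gain: float
    flash: bool = False

@dataclass(frozen=True)
class Frame:
    request: Request   # the settings actually used for this frame
    mean_level: float  # stand-in for pixel data

def capture(requests: Iterable[Request], scene_brightness: float) -> Iterator[Frame]:
    """Turn a stream of requests into a stream of frames, one per request."""
    for req in requests:
        level = scene_brightness * req.exposure_us * req.gain
        if req.flash:
            level *= 2.0  # toy model: flash doubles the light reaching the sensor
        yield Frame(request=req, mean_level=min(level, 255.0))

burst = [
    Request(exposure_us=1000, gain=1.0),
    Request(exposure_us=4000, gain=1.0),
    Request(exposure_us=1000, gain=2.0, flash=True),
]
frames = list(capture(burst, scene_brightness=0.02))
for frame in frames:
    print(frame.request, frame.mean_level)
```

Because every frame names its own request, a burst with different exposure, gain and flash per frame needs no reset between shots, and downstream code such as an HDR merge never has to guess which settings were in effect.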

Page 29: Vision BOF - SIGGRAPH 2014

OpenKCAM Design Philosophy

• C-language API starting from proven designs

- e.g. FCAM, Android camera platform

• Design alignment with widely used hardware standards

- e.g. MIPI CSI

• Focus on mobile, power-limited devices

- But do not preclude other use cases such as automotive, DSLR…

• Minimize overlap and maximize interoperability with other Khronos APIs

- But other Khronos APIs are not required

• Provide support for vendor-specific extensions

Page 30: Vision BOF - SIGGRAPH 2014

Potential Adoption on Android

• Android exposes Java camera APIs to developers

- Controls underlying Camera HAL

• Camera HAL v1 API simplified basic point and shoot apps

- Difficult or impossible to do much else

• Camera HAL v3 API is a fundamentally different API

- Streams-based to enable more sophisticated camera applications

[Diagram labels: Open source project developed by Nokia and Stanford; Camera API]

HAL V3 adopts many FCAM ideas and can use EGL in its implementation

OpenKCAM builds on FCAM with a goal of being forward compatible with Android architecture

OpenKCAM may be used to IMPLEMENT Android Camera HAL – and provide an advanced native camera API in NDK

Page 31: Vision BOF - SIGGRAPH 2014

Participating Companies and Milestones

Apr13

Jul13

Group charter approved

3Q14

Sample implementation and tests

1Q15

Specification ratification

Page 32: Vision BOF - SIGGRAPH 2014

OpenKCAM Working Group

• Royalty free API for portable access to advanced mobile camera functionality

- Reduce fragmentation and encourage more advanced camera applications

• Control for the new wave of sensors to enable advanced imaging and vision

- Multiple sensors, depth cameras, synchronized sensors

• Provide sophisticated camera functionality not available on today’s platforms

- But work to enable easy adoption by platform vendors

• Eager to contribute? Join Khronos OpenKCAM WG!

- http://www.khronos.org/camera

• Mikaël Bourges-Sévenier

- [email protected]

Page 33: Vision BOF - SIGGRAPH 2014

Neil Trevett, NVIDIA

Page 34: Vision BOF - SIGGRAPH 2014

Sensor Industry Fragmentation …

Page 35: Vision BOF - SIGGRAPH 2014

Sensor Types

• Basic sensor data:

- Acceleration, Magnetic Field, Angular Rates

- Pressure, Ambient Light, Proximity, Temperature, Humidity, RGB light, UV light

- Heart rate, Blood Oxygen Level, Skin Hydration, Breathalyzer

• Sensor fusion

- Orientation (Quaternion or Euler Angles), Gravity, Linear Acceleration

- Position

• Context awareness

- Device Motion: general movement of the device: still, free-fall, …

- Carry: how the device is being held by a user: in pocket, in hand, …

- Posture: how the body holding the device is positioned: standing, sitting, step, …

- Transport: about the environment around the device: in elevator, in car, …
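
The sensor-fusion bullet lists orientation as either a quaternion or Euler angles. As a hedged illustration of how the two relate, the sketch below converts a unit quaternion to roll, pitch and yaw. The ZYX (yaw-pitch-roll) convention and the function name `quaternion_to_euler` are assumptions for this example; the slides do not say which convention StreamInput uses.

```python
import math
from typing import Tuple

def quaternion_to_euler(w: float, x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Convert a unit quaternion to (roll, pitch, yaw) in radians, ZYX convention."""
    roll = math.atan2(2.0 * (w * x + y * z), 1.0 - 2.0 * (x * x + y * y))
    # Clamp to guard against rounding pushing the value just outside [-1, 1].
    sin_pitch = max(-1.0, min(1.0, 2.0 * (w * y - z * x)))
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2.0 * (w * z + x * y), 1.0 - 2.0 * (y * y + z * z))
    return roll, pitch, yaw

# A 90 degree rotation about the vertical (z) axis.
half_angle = math.radians(90.0) / 2.0
roll, pitch, yaw = quaternion_to_euler(math.cos(half_angle), 0.0, 0.0, math.sin(half_angle))
print(math.degrees(roll), math.degrees(pitch), math.degrees(yaw))
```

Quaternions avoid the gimbal-lock singularity that Euler angles hit at a pitch of plus or minus 90 degrees, which is one reason fused orientation is often delivered as a quaternion and converted only for display.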

Page 36: Vision BOF - SIGGRAPH 2014

Low-level Sensor Abstraction API

Apps Need Sophisticated Access to Sensor Data

- Without coding to specific sensor hardware

Apps request semantic sensor information

- StreamInput defines possible requests, e.g.

- Read Physical or Virtual Sensors e.g. “Game Quaternion”

- Context detection e.g. “Am I in an elevator?”

StreamInput processing graph provides optimized sensor data stream

- High-value, smart sensor fusion middleware can connect to apps in a portable way

Apps can gain ‘magical’ situational awareness

Advanced Sensors Everywhere

- Multi-axis motion/position, quaternions, context-awareness, gestures, activity monitoring, health and environmental sensors

Sensor Discoverability

Sensor Code Portability

Page 37: Vision BOF - SIGGRAPH 2014

StreamInput: Platform Integration

[Diagram labels: Applications; OS Sensor APIs (e.g. Android SensorManager or iOS CoreMotion); Middleware (e.g. context-awareness engines, gaming engines); Sensor Hub (x2); Sensor (x3)]

Low-level native API defines portable access to fused sensor data stream and context-awareness

Flexible native API to integrate where needed depending on existing platform sensor stacks

Page 38: Vision BOF - SIGGRAPH 2014

Sensor OSP Announcement

• Proposal to converge OSP (Open Sensor Platform) APIs with StreamInput

- Sensor Platforms is StreamInput Spec Editor

Page 39: Vision BOF - SIGGRAPH 2014

EGL 1.5 Released at GDC 2014

• EGL 1.5 brings functionality from multiple extensions into core

- Increased reliability and portability

• EGLImages

- Sharing textures and renderbuffers

• Context Robustness

- Defending against malicious code

• EGLSync objects

- Improved OpenGL/OpenCL interop

• Platform extensions

- Standardized interactions for multiple OSes, e.g. Android and 64-bit platforms

• sRGB colorspace rendering

[Diagram labels: Applications; OS and Display Platforms]

Application Portability: EGL abstracts graphics context management, surface and buffer binding and rendering synchronization

API Interop: EGL provides efficient transfer of data and events between Khronos APIs

Page 40: Vision BOF - SIGGRAPH 2014

Potential EGL Future Directions

• EGLImageStream extensions are very powerful today

- But need wider implementation in drivers

- Stream other types of data – unformatted buffers for metadata and more

- GPU-to-GPU streaming and invoking client API activities directly from other client APIs without CPU intervention

• Separation of traditional context/surface functionality from “hub” functionality

• Support for new Khronos APIs where appropriate

- Streaming video + image processing + display use case

Page 41: Vision BOF - SIGGRAPH 2014

Khronos APIs for Augmented Reality

[Diagram labels: Advanced Camera Control and stream generation; Vision Processing; MEMS Sensors; Sensor Fusion; Application on CPUs, GPUs and DSPs; 3D Rendering and Video Composition On GPU; Audio Rendering]

EGLStream - stream data between APIs

Precision timestamps on all sensor samples

AR needs not just advanced sensor processing, vision acceleration, computation and rendering - but also for all these subsystems to work efficiently together
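
One reason precision timestamps matter is that camera frames and motion-sensor samples rarely arrive at the same instants. The sketch below estimates a sensor reading at a frame's timestamp by linear interpolation. It assumes both streams share one clock and that sensor timestamps are sorted; `sample_at` and the sample values are hypothetical.

```python
from bisect import bisect_left
from typing import List

def sample_at(timestamps_us: List[int], values: List[float], t_us: int) -> float:
    """Linearly interpolate a sensor value at time t_us (timestamps sorted ascending)."""
    if not timestamps_us[0] <= t_us <= timestamps_us[-1]:
        raise ValueError("frame time is outside the range of sensor samples")
    i = bisect_left(timestamps_us, t_us)  # first sample at or after t_us
    if timestamps_us[i] == t_us:
        return values[i]
    t0, t1 = timestamps_us[i - 1], timestamps_us[i]
    alpha = (t_us - t0) / (t1 - t0)
    return values[i - 1] + alpha * (values[i] - values[i - 1])

# Gyroscope rate (rad/s) sampled every 5 ms; a camera frame is stamped at 7 ms.
gyro_t = [0, 5000, 10000, 15000]
gyro_rate = [0.0, 1.0, 2.0, 1.0]
rate_at_frame = sample_at(gyro_t, gyro_rate, 7000)
print(rate_at_frame)
```

Without a common, precise time base the interpolation weight `alpha` would be wrong, and any correction that relies on it (for example rolling-shutter compensation or AR pose prediction) would be applied to the wrong moment.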

Page 42: Vision BOF - SIGGRAPH 2014

Summary

• Khronos is building a family of interoperating APIs for portable and power-efficient vision processing

• OpenVX 1.0 has been provisionally released and non-members are invited to provide feedback on the forums

- http://www.khronos.org/message_boards/forumdisplay.php/110-OpenVX-General

• OpenKCAM and StreamInput APIs are currently in design and complement and integrate with OpenVX

• Any company is welcome to join Khronos to influence the direction of mobile and embedded vision processing!

- $15K annual membership fee for access to all Khronos API working groups

- Well-defined IP framework protects your IP and conformant implementations

• www.khronos.org

- [email protected]