
INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES

Eli Savransky

Principal Architect - CTO Office, Mobile BU

NVIDIA Corp.

Computer Vision in Mobile

Tegra K1

It’s time!

AGENDA

Use case categories

Underlying technologies examples

Performance and power considerations

Software considerations and dilemmas

VISION FUNCTIONALITY TAXONOMY

3D Reconstruction

User Facing: Body Modeling, Facial Modeling

Scene Facing: Object Reconstruction, Scene Reconstruction

Tracking

User Facing: Face, eye and hand gesture tracking; Body Tracking

Scene Facing: Environmental Feature Tracking; Indoor/Outdoor Positional Tracking

Scale: Small Scale, Large Scale

Markets

UI / Smart TV / STB

Gaming

Automotive

Social/Media

E-commerce

Modeling/Architecture/DIY/3D printing

UNDERLYING TECHNOLOGY: DEPTH EXTRACTION

Obtain a depth map for many points on a 2D picture

Not necessarily per every pixel

From there, we can calculate:

— 3D geometry and model

— Body position and movement

— Face features and expression

Aggregating models is easy

— From different shots

— From different sources
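As an illustration of how a sparse depth map turns into 3D geometry, here is a minimal sketch that back-projects depth samples through a pinhole camera model. The function names and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this example and are not taken from the slides. Merging clouds is only this simple if they are already expressed in the same coordinate frame, which the sketch assumes.

```python
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def backproject(depth: List[List[Optional[float]]],
                fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Convert a sparse depth map into 3D points in the camera frame.

    depth[v][u] is the depth along the optical axis at pixel (u, v),
    or None where no depth was measured (not every pixel needs a value).
    fx, fy, cx, cy are assumed pinhole intrinsics in pixel units.
    """
    points: List[Point3D] = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z is None:  # no depth sample for this pixel
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


def merge_clouds(*clouds: List[Point3D]) -> List[Point3D]:
    """Aggregate point clouds from several shots or sources.

    Assumption: all clouds are already in one common coordinate frame.
    """
    merged: List[Point3D] = []
    for cloud in clouds:
        merged.extend(cloud)
    return merged
```

For example, `backproject([[None, 2.0], [4.0, None]], 2.0, 2.0, 0.0, 0.0)` yields one 3D point for each of the two pixels that carry a depth value.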

3D SCANNING: THE TECHNOLOGIES

Different approaches:

— Structured light: project an IR pattern, find the pattern symbols on the image, triangulate to find depth

— Stereo: capture two or more images, find corresponding points, triangulate to find depth

— Structure from Motion (SfM): similar to stereo, but using the same camera over time (instead of multiple cameras)

— Coded / multiple aperture: project different patterns and solve for depth

— Time of Flight: project a pulse of light, capture the returned phase
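Two of the approaches above reduce to short formulas, sketched here under stated assumptions. For stereo, the sketch assumes a rectified camera pair, so depth is focal length times baseline divided by disparity. For time of flight, it assumes continuous-wave modulation, so distance follows from the measured phase shift. The function and parameter names are illustrative only.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth from a rectified stereo pair: Z = f * B / d.

    focal_px     focal length in pixels
    baseline_m   distance between the two camera centres in metres
    disparity_px horizontal shift of the matched point in pixels
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a finite depth")
    return focal_px * baseline_m / disparity_px


def tof_distance(phase_rad: float, modulation_hz: float) -> float:
    """Distance from the phase shift of a continuous-wave ToF signal.

    The light travels to the target and back, so
    d = c * phase / (4 * pi * f_mod).
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)
```

Note that the time-of-flight result is only unambiguous up to `c / (2 * f_mod)`, because the phase wraps around after a full cycle.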


UNDERLYING TECHNOLOGY: VISUAL ODOMETRY

The use of data from cameras to estimate device change in position over time

1. Uses either single, stereo, or omnidirectional cameras

2. Image correction for lens distortion

3. Feature detection

4. Construct optical flow field

5. Estimation of the camera motion from the optical flow

   — Kalman filter or cost function minimization

6. Check potential tracking errors and remove outliers

7. Periodic repopulation of points to maintain coverage across the image

Images from Davide Scaramuzza
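Steps 4 to 6 of the list above can be sketched in a heavily simplified form. This toy version estimates only a 2D image-plane translation from matched feature points, whereas a real visual odometry pipeline recovers full camera motion. It uses the mean of the inlier flow vectors, which is the least-squares solution for a translation-only model, and a median-based outlier test. The function name and the `max_residual` threshold are assumptions made for this example.

```python
import math
from statistics import median
from typing import List, Tuple

Point = Tuple[float, float]


def estimate_translation(prev_pts: List[Point], curr_pts: List[Point],
                         max_residual: float = 2.0) -> Tuple[Point, int]:
    """Estimate a 2D translation between two frames from matched points.

    Returns the estimated (tx, ty) in pixels and the number of inliers used.
    """
    if not prev_pts or len(prev_pts) != len(curr_pts):
        raise ValueError("need two non-empty, equally sized point lists")

    # Step 4: build the (sparse) optical flow field.
    flow = [(c[0] - p[0], c[1] - p[1]) for p, c in zip(prev_pts, curr_pts)]

    # Robust initial guess: per-axis median of the flow vectors.
    mx = median(f[0] for f in flow)
    my = median(f[1] for f in flow)

    # Step 6: remove flow vectors that disagree with the initial guess.
    inliers = [f for f in flow
               if math.hypot(f[0] - mx, f[1] - my) <= max_residual]
    if not inliers:
        raise ValueError("no inliers left; tracking is probably lost")

    # Step 5: minimize the squared-error cost; for a pure translation
    # the minimizer is the mean of the inlier flow vectors.
    n = len(inliers)
    tx = sum(f[0] for f in inliers) / n
    ty = sum(f[1] for f in inliers) / n
    return (tx, ty), n
```

Step 7 would sit around this function: when the inlier count drops too low, the caller detects new features to keep coverage across the image.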

ARE WE THERE YET?

Performance

— Do the algorithms fit in the HW? Is the HW fast enough?

— Do they leave enough headroom for the actual application?

— Do the algorithms and the applications work together efficiently?

Power

— Does it fit the thermal, maximum-current and battery-life constraints?

Cost

— New sensors, light sources, etc.

SW infrastructure

— Do the right APIs exist?

— Is the imaging pipeline flexible enough?

— Are there programming languages/environments to support this?

TEGRA K1: A MAJOR LEAP FORWARD FOR MOBILE & EMBEDDED APPLICATIONS

KEPLER GPU, 192 CORES

CUDA

12GB/S BANDWIDTH

VIDEO IMAGE COMPOSITOR (VIC)

DESIGNED FOR MOBILE DEVICES

Block diagram components:

HD Video Processor: 1080p24/30 video decode, 1080p24/30 video encode; H.264 | MPEG4 | VC1 | MPEG2 | VP8

Kepler GeForce® GPU w/CUDA: OpenGL-ES nextgen, 192 stream processors, 2D graphics/scaling

Quad Cortex-A15: 4x cores (1+ GHz), NEON SIMD, 2 MB L2 (shared), ARM TrustZone

Shadow LP C-A15 CPU

ARM7 Audio Processor

Image Processor: 25MP sensor support, ISP, 1080p60, enhanced JPEG engine

Memory: DDR3 Ctlr 64b, 800+ MHz

Display: Display x2, HDMI, eDP/LVDS

I/O: DAP x5 (I2S/TDM), PCIe* G2 x4 + x1, CSI x4 + x4, SATA2 x1, USB 2.0 x3, USB 3.0* x2, UART x4, I2C x5, SPI x4, SDIO/MMC x4, NOR Flash

Security Engine

Package: 28 nm HPM, 23x23mm, 0.7mm pitch HS-FCBGA

TEGRA K1

KEPLER Architecture

192 CUDA Cores, SM3.2

ISA compatible with GeForce, Quadro, Tesla

64 KB L1 Cache and Shared Memory

128 KB L2 Cache

128 KB Register File


SW CONSIDERATIONS

Need APIs and frameworks to develop SW

— Flexible and complete enough for experimentation

— Fast and stable enough for productization

— Portable for installed base

APIs and libraries

— Android Camera HAL v.3

— OpenCV

— OpenVX

— StreamInput

— VisionWorks

— CUDA

ANDROID CAMERA HAL V3

Camera HAL v3 is a fundamentally new API

— Flexible primitives for building sophisticated use-cases

— Interface is clean and easily extensible

— Apps can have more control, and more responsibility

Enables sophisticated camera applications

Faster time to market and higher quality

— 1 request yields 1 capture: 1 result metadata + N image buffers
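The per-frame model described on this slide (one request, one capture, one result metadata record plus N image buffers) can be pictured with a small toy model. This is not the Android API: every class, field and function name below is hypothetical and exists only to show the shape of the data flow.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaptureRequest:
    """Hypothetical stand-in for a per-frame capture request."""
    frame_number: int
    settings: Dict[str, float]   # e.g. exposure or gain chosen by the app
    output_streams: List[str]    # one image buffer is produced per stream


@dataclass
class CaptureResult:
    """Hypothetical stand-in for what one capture hands back."""
    frame_number: int
    metadata: Dict[str, float]   # settings actually applied to this frame
    buffers: Dict[str, bytes]    # N image buffers, keyed by stream name


def process_request(request: CaptureRequest) -> CaptureResult:
    """Simulate one capture: exactly one result per request."""
    # A real device would fill these buffers with pixels; the toy model
    # uses empty placeholders.
    buffers = {name: b"" for name in request.output_streams}
    return CaptureResult(request.frame_number, dict(request.settings), buffers)
```

Because settings travel with each request, an application can change them frame by frame and match every result back to the request that produced it through the frame number.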

OPENCV LIBRARY

Version 2.4.5

>900 functions (x the data types)

OpenCV4Tegra acceleration:

— CUDA, NEON, GLSL, TBB multithreading

Image processing: General Image Processing, Segmentation, Machine Learning, Detection, Image Pyramids, Transforms, Fitting

Video, Stereo, and 3D: Camera Calibration, Features, Depth Maps, Optical Flow, Inpainting, Tracking

VISIONWORKS

Sobel

Convolve

Bilateral Filter

Integral Image

Integral Histogram

Corner Harris

Corner FAST

Image Pyramid

Optical Flow PyrLK

Optical Flow Farneback

Warp Perspective

Hough Lines

Fast NLM Denoising

Stereo Block Matching

IME (Iterative Motion Estimation)

HOG (Histogram of Oriented Gradients)

Soft Cascade Detector

Object Tracker

TLD Object Tracker

SLAM

Path Estimator

MedianFlow Estimator
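To give a feel for what one of the simpler primitives in this list computes, here is a plain-Python sketch of an integral image with a constant-time box sum. It does not use or mirror the VisionWorks API; the function names are assumptions, and an accelerated library would run the same idea on the GPU.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Return the summed-area table of img, padded with a zero row and column.

    ii[y][x] holds the sum of all pixels in rows 0..y-1 and columns 0..x-1.
    """
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii: List[List[int]], x0: int, y0: int, x1: int, y1: int) -> int:
    """Sum of pixels in columns x0..x1-1 and rows y0..y1-1, in four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]
```

Once the table is built, the sum over any rectangle costs four lookups regardless of its size, which is why detectors and box filters that evaluate many rectangles per frame rely on it.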

IT IS HAPPENING!

Use cases emerging

Tegra K1 compute power in mobile devices

Software Infrastructure

THANKS
