INTEGRATING COMPUTER VISION SENSOR INNOVATIONS INTO MOBILE DEVICES
Eli Savransky, Principal Architect, CTO Office, Mobile BU, NVIDIA Corp.
Computer Vision in Mobile
Tegra K1
It’s time!
AGENDA
Use case categories
Examples of underlying technologies
Performance and power considerations
Software considerations and dilemmas
VISION FUNCTIONALITY TAXONOMY
3D Reconstruction
— User Facing: Body Modeling, Facial Modeling
— Scene Facing: Object Reconstruction, Scene Reconstruction
Tracking
— User Facing: Face, eye and hand gesture tracking; Body Tracking
— Scene Facing: Environmental Feature Tracking; Indoor/Outdoor Positional Tracking
Markets
UI / Smart TV / STB
Gaming
Automotive
Social/Media
E-commerce
Modeling/Architecture/DIY/3D printing
Small Scale
Large Scale
UNDERLYING TECHNOLOGY: DEPTH EXTRACTION
Obtain a depth map for many points on a 2D picture
Not necessarily per every pixel
From there, we can calculate:
— 3D geometry and model
— Body position and movement
— Face features and expression
Aggregating models is easy
— From different shots
— From different sources
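As an illustration of how 3D geometry can be derived from a depth map, here is a minimal sketch that assumes a pinhole camera model. The intrinsics (fx, fy, cx, cy), the sparse depth dictionary, and the helper names back_project and to_world are assumptions made for this example; they do not come from the slides. Moving each shot's points into a shared world frame is one simple way to aggregate shots, provided the camera pose for each shot is known.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]                  # (u, v) image coordinates
Point3D = Tuple[float, float, float]     # (x, y, z) in metres


def back_project(depth: Dict[Pixel, float],
                 fx: float, fy: float, cx: float, cy: float) -> List[Point3D]:
    """Turn a sparse depth map into 3D points in the camera frame.

    Only pixels that have a depth sample are listed, so the map does not
    need to cover every pixel.
    """
    points = []
    for (u, v), z in sorted(depth.items()):
        if z <= 0:                       # skip invalid or missing samples
            continue
        x = (u - cx) * z / fx            # pinhole model: u = fx * x / z + cx
        y = (v - cy) * z / fy
        points.append((x, y, z))
    return points


def to_world(points: Sequence[Point3D],
             rotation: Sequence[Sequence[float]],
             translation: Sequence[float]) -> List[Point3D]:
    """Apply a rigid transform (3x3 rotation, then translation) to each point."""
    world = []
    for p in points:
        world.append(tuple(
            sum(rotation[r][c] * p[c] for c in range(3)) + translation[r]
            for r in range(3)))
    return world


if __name__ == "__main__":
    shot = back_project({(320, 240): 2.0, (420, 240): 2.0},
                        fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    # Aggregating shots: concatenate clouds once they share a world frame.
    merged = to_world(shot, identity, (0.0, 0.0, 0.0)) + \
        to_world(shot, identity, (1.0, 0.0, 0.0))
    print(len(merged), "points in the merged cloud")
```

In practice the hard part of aggregation is estimating the pose of each shot; this sketch takes the poses as given.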
3D SCANNING: THE TECHNOLOGIES
Different approaches:
— Structured light
Project IR pattern
Find the pattern symbols on the image
Triangulate to find depth
— Stereo
Capture two or more images
Find corresponding points
Triangulate to find depth
— Structure from Motion (SfM)
Similar to Stereo but using the same camera over time (instead of multiple cameras)
— Coded / multiple aperture
Project different patterns and solve for depth
— Time of Flight
Project pulse of light
Capture returned phase
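The depth calculations behind two of these approaches can be sketched in a few lines. The example assumes a rectified stereo pair, where depth follows from disparity as Z = f * B / d, and a continuous-wave time-of-flight sensor, where depth follows from the returned phase as d = c * phi / (4 * pi * f_mod). The function names and parameter values are illustrative assumptions, not taken from the slides. Structured light is often triangulated in a similar way, with the projector playing the role of the second camera.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # metres per second


def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point from its disparity in a rectified stereo pair.

    focal_px     focal length in pixels (assumed equal for both cameras)
    baseline_m   distance between the two camera centres in metres
    disparity_px horizontal shift of the point between the two images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / disparity_px


def tof_depth(phase_rad: float, modulation_hz: float) -> float:
    """Depth from the phase shift of amplitude-modulated light.

    The light travels to the scene and back, hence the factor 4*pi
    instead of 2*pi. The result is only unambiguous for phases in
    [0, 2*pi), i.e. up to a range of c / (2 * modulation_hz).
    """
    return SPEED_OF_LIGHT * phase_rad / (4.0 * math.pi * modulation_hz)


if __name__ == "__main__":
    print("stereo depth (m):", stereo_depth(700.0, 0.1, 35.0))
    print("ToF depth (m):", tof_depth(math.pi, 20e6))
```

Both formulas show why the sensor geometry matters: a short stereo baseline or a high modulation frequency limits the usable depth range.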
UNDERLYING TECHNOLOGY: VISUAL ODOMETRY
Using camera data to estimate the device's change in position over time
1. Uses either single, stereo, or omnidirectional cameras
2. Image correction for lens distortion
3. Feature detection
4. Construct optical flow field
5. Estimation of the camera motion from the optical flow
— Kalman filter or cost function minimization
6. Check potential tracking errors and remove outliers
7. Periodic repopulation of points to maintain coverage across the image
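Steps 5 to 7 can be illustrated with a deliberately simplified sketch. It assumes the camera motion appears as a pure 2D image translation (for example, a camera looking straight down at a flat surface), and it swaps the Kalman filter or cost minimization for a plain median-based estimate with outlier rejection. The names estimate_translation and needs_repopulation, and the thresholds, are hypothetical.

```python
from statistics import median
from typing import List, Tuple

Flow = Tuple[float, float]  # per-feature displacement (dx, dy) in pixels


def estimate_translation(flow: List[Flow],
                         max_residual: float = 2.0) -> Tuple[Flow, List[Flow]]:
    """Estimate a 2D image translation from optical flow vectors.

    A median gives a robust first guess (step 5); vectors that disagree
    with it by more than max_residual pixels are dropped as outliers
    (step 6); the inliers are then averaged for the final estimate.
    """
    if not flow:
        raise ValueError("need at least one flow vector")
    mx = median(dx for dx, _ in flow)
    my = median(dy for _, dy in flow)
    inliers = [(dx, dy) for dx, dy in flow
               if abs(dx - mx) <= max_residual and abs(dy - my) <= max_residual]
    if not inliers:                      # no consensus: fall back to the median
        return (mx, my), []
    n = len(inliers)
    tx = sum(dx for dx, _ in inliers) / n
    ty = sum(dy for _, dy in inliers) / n
    return (tx, ty), inliers


def needs_repopulation(inliers: List[Flow], min_features: int = 50) -> bool:
    """Step 7: signal that new features should be detected when too few remain."""
    return len(inliers) < min_features


if __name__ == "__main__":
    vectors = [(1.0, 0.0), (1.2, 0.1), (0.8, -0.1), (9.0, 7.0)]
    motion, kept = estimate_translation(vectors)
    print("estimated shift:", motion, "inliers:", len(kept))
    print("re-detect features:", needs_repopulation(kept))
```

A full pipeline would estimate a 3D rotation and translation rather than a 2D shift, but the estimate, reject, repopulate loop has the same shape.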
Images from Davide Scaramuzza
ARE WE THERE YET?
Performance
— Do the algorithms fit in the HW? Is the HW fast enough?
— Do they leave enough headroom for the actual application?
— Do the algorithms and the applications work together efficiently?
Power
— Does it fit the thermal, maximum-current, and battery-life constraints?
Cost
— New sensors, light sources, etc.
SW infrastructure
— Do the right APIs exist?
— Is the imaging pipeline flexible enough?
— Are there programming languages and environments to support this?
TEGRA K1: A MAJOR LEAP FORWARD FOR MOBILE & EMBEDDED APPLICATIONS
KEPLER GPU, 192 CORES
CUDA
12GB/S BANDWIDTH
VIDEO IMAGE COMPOSITOR (VIC)
DESIGNED FOR MOBILE DEVICES
HD Video Processor 1080p24/30 Video Decode 1080p24/30 Video Encode H.264 | MPEG4 | VC1 | MPEG2 VP8
Kepler GeForce®
GPU w/CUDA
OpenGL-ES nextgen
192 Stream Processors
2D Graphics/Scaling
DAP x5 (I2S/TDM)
HDMI eDP/LVDS
ARM7
Audio Processor
Image Processor
25MP Sensor Support ISP 1080p60 Enhanced JPEG Engine
PCIe* G2 x4 + x1
CSI x4 + x4
SATA2 x1 USB 2.0 x3
Security Engine
Display x2
NOR Flash
UART x4 I2C x5
DDR3 Ctlr 64b
800+ MHz
SPI x4 SDIO/MMC x4
28 nm HPM 23x23mm, 0.7mm pitch HS-FCBGA
USB 3.0* x2
Quad Cortex-A15
4x Cores (1+ GHz) NEON SIMD 2 MB L2 (Shared) ARM Trust Zone
Shadow LP C-A15 CPU
TEGRA K1
KEPLER Architecture
— 192 CUDA Cores, SM3.2
— ISA compatible with GeForce, Quadro, Tesla
— 64 KB L1 Cache and Shared Memory
— 128 KB L2 Cache
— 128 KB Register File
SW CONSIDERATIONS
Need APIs and frameworks to develop SW
— Flexible and complete enough for experimentation
— Fast and stable enough for productization
— Portable for installed base
APIs and libraries
— Android Camera HAL v.3
— OpenCV
— OpenVX
— StreamInput
— VisionWorks
— CUDA
ANDROID CAMERA HAL V3
Camera HAL v3 is a fundamentally new API
— Flexible primitives for building sophisticated use-cases
— Interface is clean and easily extensible
— Apps can have more control, and more responsibility
Enables sophisticated camera applications
Faster time to market and higher quality
— 1 request → 1 capture → 1 result metadata + N image buffers
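The per-frame model (one request, one capture, one result metadata plus N image buffers) can be pictured with a toy sketch. This is not the Android API: CaptureRequest, CaptureResult, ToyCamera, and the setting and stream names are all hypothetical, chosen only to show how each request carries its own settings and list of output streams.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CaptureRequest:
    """Hypothetical request: settings for one frame plus the target streams."""
    settings: Dict[str, float]
    output_streams: List[str]


@dataclass
class CaptureResult:
    """Hypothetical result: one metadata record plus one buffer per stream."""
    frame_number: int
    metadata: Dict[str, float]
    buffers: Dict[str, bytes]


class ToyCamera:
    """Stand-in for a camera device; it produces empty buffers."""

    def __init__(self) -> None:
        self._frame_number = 0

    def capture(self, request: CaptureRequest) -> CaptureResult:
        # Exactly one capture per request; the result echoes the settings
        # that were applied and fills one buffer for each requested stream.
        self._frame_number += 1
        metadata = dict(request.settings)
        buffers = {name: b"" for name in request.output_streams}
        return CaptureResult(self._frame_number, metadata, buffers)


if __name__ == "__main__":
    camera = ToyCamera()
    # Per-frame control: alternate exposure times for consecutive frames.
    for exposure_ms in (5.0, 20.0):
        request = CaptureRequest({"exposure_ms": exposure_ms},
                                 ["preview", "vision"])
        result = camera.capture(request)
        print(result.frame_number, result.metadata, sorted(result.buffers))
```

Because settings travel with each request, a vision application could change exposure or focus frame by frame and still know which settings produced which buffers.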
OPENCV LIBRARY
Version 2.4.5
>900 functions (x the datatypes)
OpenCV4Tegra acceleration:
— CUDA, NEON, GLSL, TBB multithreading
Image processing
— General Image Processing, Segmentation, Machine Learning, Detection, Image Pyramids, Transforms, Fitting
Video, Stereo, and 3D
— Camera Calibration, Features, Depth Maps, Optical Flow, Inpainting, Tracking
VISIONWORKS
Sobel
Convolve
Bilateral Filter
Integral Image
Integral Histogram
Corner Harris
Corner FAST
Image Pyramid
Optical Flow PyrLK
Optical Flow Farneback
Warp Perspective
Hough Lines
Fast NLM Denoising
Stereo Block Matching
IME (Iterative Motion Estimation)
HOG (Histogram of Oriented Gradients)
Soft Cascade Detector
Object Tracker
TLD Object Tracker
SLAM
Path Estimator
MedianFlow Estimator
IT IS HAPPENING!
Use cases emerging
Tegra K1 compute power in mobile devices
Software Infrastructure
THANKS