3D Object Recognition Using Computer Vision

Upload: melissa-stephens

Post on 03-Jan-2016


TRANSCRIPT

3D Object Recognition Using Computer Vision

VanGogh Imaging, Inc.

Kenneth Lee
CEO/Founder

[email protected]

Corporate Overview

Founded in 2007, located in McLean, VA

Mission: “Provide easy to use, real-time 3D computer vision (CV) technology for embedded and mobile applications”
– 2D to 3D for better visualization, higher reliability, and accuracy
– Solve problems that require spatial measurements (e.g., parts inspection)

Target customer: Application and system developers
– Enhance existing products or develop new products

Product: ‘Starry Night’ 3D-CV Middleware (Unity Plugin)
– Operating systems: Android and Linux
– 3D sensors: Occipital Structure and Intel RealSense
– Processors: ARM and Xilinx Zynq

Our focus
– Object recognition
– Feature detection
– Analysis (e.g., measurements)

Potential Applications

– 3D Printing
– Parts Inspection
– Robotics
– Entertainment
– Automotive Safety
– Security
– Medical Imaging

Challenges for Implementing Real-Time 3D Computer Vision

– Busy, uncontrolled real-world environment
– Limited processing power and memory
– Noisy and uncalibrated low-cost scanners
– Difficult-to-use libraries
– Hard to find proficient computer vision engineers
– Lack of standards
– Large development investment

Starry Night Unity Plugin (patent pending)

Starry Night Video: https://www.youtube.com/watch?v=IZX-9PH7Erw&feature=youtu.be

The ‘Starry Night’ Template-Based 3D Model Reconstruction

Reliable – The output is always a fully formed 3D model with known feature points despite noisy or partial scans

Easy to use – Fully automated process

Powerful – Known data structure for easy analysis and measurement

Fast – Real-time modeling

Input Scan (Partial) + Reference Model = Full 3D Model

3D Object Recognition Algorithm for Mobile and Embedded Devices

Challenges - Scene

Busy scene, object orientation, and occlusion

Challenges - Platform

Mobile and Embedded Devices
– ARM Cortex-A9 or Cortex-A15, <2 GB RAM
– Existing libraries were built for laptop/desktop platforms
– GPU processing is not always available

Previous Approaches

(2D) Texture-Based Methods
– Color-based → depends heavily on lighting or color of the object
– Machine learning → robust, but requires training for each object
– Neither method provides a transform (i.e., orientation)

(3D) Methods
– Hough transform and geometric hashing → slow
– Geometric hashing → even slower
– Tensor matching → not good for noisy and sparse scenes
– Correspondence-based methods using rigid geometric descriptors
  – The models must have distinctive feature points, which is not true for most models (e.g., a cylinder)

Tried

General Concept for CV-Based Object Recognition

[Diagram labels: Reference Object; Descriptor; Scene; Compare; Distance & Normal; Distance & Normal of Random Sample Points; Match Criteria; Fine-Tune Orientation; Location; Transpose]

Block Diagram

Model Descriptor (Pre-Processed)

Sample all point pairs in the model that are separated by the same distance D

Use the surface normals of each pair to group them into the hash table

Key           Point pairs
(α1,β1,Ω1)    (P1, P2), (P3, P4)
(α2,β2,Ω2)    (P5, P6), (P7, P8), (P9, P10), (P11, P12)
(α3,β3,Ω3)    (P13, P14)

Note: In the bear example, D = 5 cm, which resulted in 1000 pairs

Note: The keys are angles derived from the normals of the points.
– alpha (α) = first normal to second point
– beta (β) = second normal to first point
– omega (Ω) = angle of the plane between two points
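
The descriptor table can be sketched in Python roughly as follows. This is a minimal illustration, not the Starry Night implementation. The function names, the 10-degree bin width, the distance tolerance `tol`, and the reading of Ω as the angle between the two planes spanned by each normal and the line joining the points are all assumptions made for the example.

```python
import math
from collections import defaultdict
from itertools import combinations

import numpy as np


def angle(u, v):
    """Angle in radians between two vectors."""
    c = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return math.acos(float(np.clip(c, -1.0, 1.0)))


def pair_key(p1, n1, p2, n2, bin_deg=10.0):
    """Quantized (alpha, beta, omega) key for one oriented point pair."""
    d = p2 - p1
    alpha = angle(n1, d)    # first normal to second point
    beta = angle(n2, -d)    # second normal to first point
    # Assumed meaning of omega: angle between the planes (n1, d) and (n2, d).
    a, b = np.cross(n1, d), np.cross(n2, d)
    if np.linalg.norm(a) < 1e-9 or np.linalg.norm(b) < 1e-9:
        omega = 0.0         # a normal is parallel to d, so its plane is undefined
    else:
        omega = angle(a, b)
    step = math.radians(bin_deg)
    return tuple(int(x // step) for x in (alpha, beta, omega))


def build_descriptor(points, normals, D, tol):
    """Hash table: key -> list of (i, j) index pairs with |distance - D| <= tol.

    Brute force over all pairs, which is acceptable for offline pre-processing.
    """
    table = defaultdict(list)
    for i, j in combinations(range(len(points)), 2):
        if abs(np.linalg.norm(points[j] - points[i]) - D) > tol:
            continue
        for a, b in ((i, j), (j, i)):   # store both orderings of the pair
            key = pair_key(points[a], normals[a], points[b], normals[b])
            table[key].append((a, b))
    return table
```

Because the key uses only angles measured relative to the pair itself, it does not change when the model is rotated or translated, which is what lets the same hash function be applied to the scene later.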

Object Recognition Workflow

– Grab scene
– Sample point pair w/ distance D using RANSAC
– Generate key using same hash function
– Use key to retrieve similarly oriented points in the model & rough transform
– Match criteria to find the best match
– Use ICP to refine transform

Note: The example scene has around 16K points

Note: We iterated this sampling process 100 times

Note: Entire process can be easily parallelized

Very important: Multiple models can be found using a single hash table for each sampled point pair in the scene
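
The sampling part of this workflow might look like the sketch below. It is an illustration under stated assumptions, not the presenter's code: `lookup` is a hypothetical callable standing in for the hash-table retrieval (it receives the scene pair and returns candidate model pairs as `(p1, n1, p2)` tuples), the match criterion is assumed to be the fraction of model points that land near a scene point, and the ICP refinement step is omitted.

```python
import numpy as np


def pair_frame(p1, n1, p2):
    """Right-handed frame on an oriented pair: origin p1, x-axis along p2 - p1.

    Assumes n1 is not parallel to p2 - p1.
    """
    x = (p2 - p1) / np.linalg.norm(p2 - p1)
    y = n1 - np.dot(n1, x) * x      # part of the normal orthogonal to x
    y /= np.linalg.norm(y)
    z = np.cross(x, y)
    return np.column_stack([x, y, z]), p1


def rough_transform(model_pair, scene_pair):
    """Rigid (R, t) that maps the model pair onto the scene pair."""
    Fm, om = pair_frame(*model_pair)
    Fs, os_ = pair_frame(*scene_pair)
    R = Fs @ Fm.T
    return R, os_ - R @ om


def match_score(model_pts, scene_pts, R, t, eps):
    """Fraction of transformed model points within eps of some scene point.

    Brute-force distances; a spatial index would be needed for large scenes.
    """
    moved = model_pts @ R.T + t
    d = np.linalg.norm(moved[:, None, :] - scene_pts[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) <= eps))


def recognize(model_pts, scene_pts, scene_nrm, lookup, D, tol, iters, eps, rng):
    """Random-sampling loop: returns (best score, R, t)."""
    best = (0.0, None, None)
    n = len(scene_pts)
    for _ in range(iters):
        i = rng.integers(n)
        dist = np.linalg.norm(scene_pts - scene_pts[i], axis=1)
        cand = np.flatnonzero(np.abs(dist - D) <= tol)
        if cand.size == 0:
            continue                # no partner at distance D for this point
        j = rng.choice(cand)
        scene_pair = (scene_pts[i], scene_nrm[i], scene_pts[j])
        for model_pair in lookup(scene_pts[i], scene_nrm[i], scene_pts[j], scene_nrm[j]):
            R, t = rough_transform(model_pair, scene_pair)
            s = match_score(model_pts, scene_pts, R, t, eps)
            if s > best[0]:
                best = (s, R, t)
    return best
```

Each iteration is independent of the others, which is consistent with the note that the process parallelizes easily: iterations can be split across threads and the best-scoring hypothesis kept at the end.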

Implementation

Result

Object Recognition Video: https://www.youtube.com/watch?v=h7whfei0fTw&feature=youtu.be

Object Recognition Examples

Adaptive 3D Object Recognition Algorithm

Resize and Reshape

Object Recognition for Different Sizes & Shapes

Objects in the real world are not always identical

Similarity Factor, S%, can be used to denote the % of shape difference
– This allows recognition of an object that is similar to, but does not have the exact shape of, the reference model

Size Factor, Z%, can be used to denote the % of size difference that can still be recognized
– This allows recognition of objects whose size differs from that of the reference model
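
One way to picture the size factor is as a range of scaled copies of the reference model. The sketch below assumes Z% is a symmetric tolerance around the reference size and that scaling is done about the model centroid. The slides do not define either detail, and `resized_candidates` and `steps` are hypothetical names chosen for the example.

```python
import numpy as np


def resized_candidates(model_pts, z_percent, steps=5):
    """Copies of the model scaled about its centroid over [1 - Z%, 1 + Z%].

    Returns a list of (scale, points) tuples.
    """
    centroid = model_pts.mean(axis=0)
    lo, hi = 1 - z_percent / 100, 1 + z_percent / 100
    scales = np.linspace(lo, hi, steps)
    return [(float(s), centroid + s * (model_pts - centroid)) for s in scales]
```

Scaling a point set by a factor s scales every pairwise distance by s as well, so a resized model would need its point pairs sampled at a correspondingly scaled distance D.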

General Approach

Dynamically resizes the reference model

Dynamically reshapes the reference model
– Uses our ‘Shape-based Registration’ technique

Hence, the reference model is ‘deformed’ to match the object in the scene

Results in very robust object recognition

The end reference model best represents the object in the scene both in size and shape

Block Diagram – Adaptive Object Recognition with Feedback

Reference model is iteratively modified with every new frame until it converges to the object in the scene

Note: Currently in the process of being implemented and will be available in Version 1.2 later this year

Object Recognition Performance Numbers

Reliability (w/ bear model)

Reliability
– % false positives (depends on the scene)
  – Clean scene: <1%
  – Noisy scene: 5% (1 out of 20 frames)
– % negative results (cannot find the object)
  – Clean scene: <1%
  – Noisy scene: 10% (also takes longer)

Effect of orientation on success ratio
– Model facing front: >99%
– Model facing backwards: >99%
– Model facing sideways (narrower): 85%

Performance - Mobile

Performance on a 2 GHz ARM Cortex-A15 (on an Android mobile device)
– Amount of time it takes to find one object
  – Single thread: 2 seconds
  – Multi-thread & NEON: 0.3 second
– Amount of time it takes to find two objects
  – Single thread: 2.5 seconds
  – Multi-thread & NEON: 0.5 second

Note: Effective use of NEON led to significant performance gains of 2.5x for certain functions
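
The overall gains implied by these timings can be worked out directly. The snippet below only does arithmetic on the numbers from the slide; the `speedup` helper and the dictionary layout are illustrative.

```python
def speedup(baseline_s, optimized_s):
    """How many times faster the optimized run is than the baseline."""
    return baseline_s / optimized_s


# Timings in seconds, copied from the slide: (single thread, multi-thread & NEON).
timings = {
    "one object": (2.0, 0.3),
    "two objects": (2.5, 0.5),
}

for case, (single, multi) in timings.items():
    print(f"{case}: {speedup(single, multi):.1f}x")

# Marginal cost of finding the second object in each configuration.
extra_single = timings["two objects"][0] - timings["one object"][0]
extra_multi = timings["two objects"][1] - timings["one object"][1]
print(f"second object adds {extra_single:.1f} s single-threaded, "
      f"{extra_multi:.1f} s multi-threaded")
```

This works out to roughly 6.7x for one object and 5x for two, with the second object adding 0.5 s single-threaded and 0.2 s multi-threaded. The 2.5x NEON figure in the note applies to individual functions, so the end-to-end gain also reflects multi-threading.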

Hardware Acceleration Using FPGA

• Xilinx Zynq SoC provides 20 to 1,000 parallel voxel processors depending on the size of the FPGA

[Diagram: Zynq containing an ARM block and an FPGA block; the FPGA holds Processor 1 through Processor 20+, each fed a voxel from the scan]

Hardware Acceleration: FPGA (Xilinx Zynq)

Select Functions to Be Implemented in Zynq
– FPGA: Matrix operations
– Dual-core ARM: Data management + floating point
– Entire implementation done in C++ (Xilinx Vivado HLS)

Performance: Embedded Using FPGA

Note: Currently, only 30% of the computationally intensive functions are implemented on the FPGA, with the rest still running on the ARM Cortex-A9. Speed will be much improved once the remaining high-intensity functions are transferred to the FPGA.

Performance on Xilinx Zynq (Cortex-A9 at 800 MHz + FPGA)
– Amount of time it takes to find one object
  – Zynq 7020: 0.7 second
  – Zynq 7045 (est.): 0.1 second
– No test results for two objects, but should scale the same way as for the ARM

Future

The chosen algorithm works well in most real-world conditions

The chosen algorithm is tolerant of size and shape differences with respect to the reference model

The chosen algorithm can find multiple objects at the same time with minimal additional processing power

Additional improvements in performance are needed
– Algorithm
– Application-specific parameters (e.g., size of the model descriptor)
– ARM NEON
– Optimize the use of the FPGA core

Summary

Key implementation issues
– Model descriptor
– Data structure
– Sampling technique
– Platform

IMPORTANT
– Both ARM & FPGA provide scalability

Therefore
– Real-time 3D object recognition was very difficult but was successfully implemented on both mobile and embedded platforms!

LIVE DEMO AT THE Xilinx BOOTH!

Resources

www.vangoghimaging.com

Android 3D printing: http://www.youtube.com/watch?v=7yCAVCGvvso

“Challenges and Techniques in Using CPUs and GPUs for Embedded Vision” by Ken Lee, VanGogh Imaging: http://www.embedded-vision.com/platinum-members/vangogh-imaging/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit

“Using FPGAs to Accelerate Embedded Vision Applications” by Kamalina Srikant, National Instruments: http://www.embedded-vision.com/platinum-members/national-instruments/embedded-vision-training/videos/pages/september-2012-embedded-vision-summit

“Demonstration of Optical Flow algorithm on an FPGA”: http://www.embedded-vision.com/platinum-members/bdti/embedded-vision-training/videos/pages/demonstration-optical-flow-algorithm-fpg

* Reference: “An Efficient RANSAC for 3D Object Recognition in Noisy and Occluded Scenes” by Chavdar Papazov and Darius Burschka, Technische Universität München (TUM), Germany.
