
3D Object Recognition Pipeline

Kurt Konolige, Radu Rusu, Victor Eruhimov, Suat Gedikli

Willow Garage

Stefan Holzer, Stefan Hinterstoisser

TUM

Morgan Quigley, Stephen Gould

Stanford

Marius Muja

UBC


3D and Object Recognition

• Provides more info than just visual texture

• Good for scale and segmentation

• Verification

Need a good device for 3D info


3D Cameras

• Stereo (Newcombe, Davison CVPR 2010). Pro: real-time, good resolution. Con: not dense, smearing; needs registration + regularization.
• Stereo + texture (WG device). Pro: dense, real-time, good resolution. Con: short range.
• Laser line scan (STAIR Borg scanner). Pro: dense, most accurate. Con: short range, not real time.
• Structured light (PrimeSense). Pro: dense, real-time, good resolution. Con: short range, ambient light/scene texture.
• Phase shift (SR4, PMD, Canesta). Pro: dense, real-time, medium range. Con: low resolution, low accuracy, gross errors.
• Gated reflectance (3DV). Pro: dense, real-time. Con: low resolution, low accuracy.

Tabletop manipulation requires:
• Short range
• High resolution
• High range accuracy
• Real-time

WG Projected Texture Stereo Device

• Paint the scene with texture from a projector
  - vs. single camera with structured light
• Advantages:
  - Simple projector
  - Standard algorithms
  - Full frame rates (640x480)
  - Dynamic scenes

WG projected texture device

Projector:
• Red LED
• Eye safe
• Synchronized to cameras
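Why projected texture helps stereo can be shown with a minimal sketch, assuming rectified images and a plain sum-of-absolute-differences block matcher (the slide only says "standard algorithms"; the matcher inside the WG device is not described here). The sketch recovers disparity on a randomly textured scanline, returns an arbitrary answer on a flat one, and converts disparity to depth with Z = fB/d. The focal length (500 px), baseline (0.1 m) and all function names are hypothetical.

```python
import numpy as np

def block_match_row(left_row, right_row, max_disp, half_win=3):
    """Integer disparity for each pixel of one rectified scanline (SAD block matching).

    Convention: a point at column x in the left image appears at column x - d
    in the right image.
    """
    n = left_row.shape[0]
    disp = np.zeros(n, dtype=int)
    for x in range(half_win + max_disp, n - half_win):
        patch = left_row[x - half_win : x + half_win + 1]
        costs = [
            np.abs(patch - right_row[x - d - half_win : x - d + half_win + 1]).sum()
            for d in range(max_disp + 1)
        ]
        disp[x] = int(np.argmin(costs))  # ties (no texture) resolve to d = 0
    return disp

def depth_from_disparity(d, focal_px, baseline_m):
    """Pinhole stereo: Z = f * B / d; zero disparity maps to infinity."""
    d = np.asarray(d, dtype=float)
    with np.errstate(divide="ignore"):
        return np.where(d > 0, focal_px * baseline_m / d, np.inf)

def depth_step(z, focal_px, baseline_m, disp_step=1.0):
    """Depth change caused by a disparity error of disp_step pixels: Z^2 * dd / (f * B)."""
    return z ** 2 * disp_step / (focal_px * baseline_m)

# Hypothetical rig: f = 500 px, B = 0.1 m.
rng = np.random.default_rng(0)
left = rng.random(300)           # random "projected texture" along one scanline
right = np.roll(left, -50)       # the same texture shifted by 50 px
d = block_match_row(left, right, max_disp=64)
print(int(d[100]), float(depth_from_disparity(d[100], 500.0, 0.1)))  # 50 1.0
print(depth_step(1.0, 500.0, 0.1), depth_step(3.0, 500.0, 0.1))      # 0.02 0.18
```

On a constant (untextured) row every candidate disparity has the same cost, so the match carries no information; painting texture onto the scene removes that ambiguity. The depth step grows with the square of range, which is one plausible reason the table lists stereo + texture as short range.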

3D Fly-thru


Object Recognition Pipeline

• Textured objects via keypoints [Victor Eruhimov, Suat Gedikli]

• Untextured objects via DOT [Stefan Holzer, Stefan Hinterstoisser]

• Simple 3D model matching [Marius Muja]

• STAIR 2D/3D features [Stephen Gould]

Pre-filter → Detect → Verify
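The pre-filter, detect, verify structure can be sketched as a small skeleton, assuming each stage is a list of interchangeable callables: pre-filters transform the scene (for example, cropping to table-top clusters), every detector proposes object hypotheses, and a hypothesis survives only if every verifier accepts it. The class names, the dict-based scene and the toy stages are hypothetical; the slide does not specify these interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable, List, Optional

Scene = dict  # hypothetical scene container (image, disparity, point cloud, ...)

@dataclass
class Hypothesis:
    object_id: str
    score: float
    pose: Optional[Any] = None  # e.g. a 4x4 transform filled in by a detector or verifier

@dataclass
class RecognitionPipeline:
    prefilters: List[Callable[[Scene], Scene]]                # e.g. crop to table-top clusters
    detectors: List[Callable[[Scene], Iterable[Hypothesis]]]  # e.g. keypoints, DOT, SVL
    verifiers: List[Callable[[Scene, Hypothesis], bool]]      # e.g. 3D consistency checks

    def run(self, scene: Scene) -> List[Hypothesis]:
        for f in self.prefilters:  # cheap stages shrink the search space first
            scene = f(scene)
        candidates = [h for det in self.detectors for h in det(scene)]
        # a hypothesis is kept only if every verifier accepts it
        return [h for h in candidates if all(v(scene, h) for v in self.verifiers)]

# Toy stages, for illustration only.
def keep_positive(scene):
    return {**scene, "points": [p for p in scene["points"] if p > 0]}

def toy_detector(scene):
    return [Hypothesis("mug", 0.9), Hypothesis("box", 0.4)] if scene["points"] else []

pipe = RecognitionPipeline([keep_positive], [toy_detector], [lambda s, h: h.score > 0.5])
print([h.object_id for h in pipe.run({"points": [-1.0, 2.0]})])  # ['mug']
```

Keeping detectors as a list lets the textured (keypoint), untextured (DOT) and SVL detectors share the same pre-filtering and verification.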


MOPED – Textured object recognition with pose

• Model: Stereo view of an object at a known pose

• Extract keypoints and features

• For a new scene, match keypoints to each model

• Run SfM geometric check to verify and recover pose

Torres, Romea, Srinivasa ICRA 2010


- Needs texture
- Needs a high-resolution camera
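The match-then-verify idea can be sketched as follows. This is a stand-in, not MOPED's estimator: it assumes that both the stereo model and the scene give 3D positions for their keypoints, matches descriptors with a nearest-neighbour ratio test, and then runs a RANSAC rigid-transform fit (Kabsch) as the geometric check that rejects inconsistent matches and recovers the pose. The ratio 0.8, the 1 cm inlier tolerance, the synthetic data and the function names are assumptions.

```python
import numpy as np

def ratio_test_matches(desc_model, desc_scene, ratio=0.8):
    """Nearest-neighbour descriptor matching with a ratio test.

    Returns an (M, 2) array of (model_index, scene_index) pairs.
    """
    dist = np.linalg.norm(desc_model[:, None, :] - desc_scene[None, :, :], axis=2)
    order = np.argsort(dist, axis=1)
    rows = np.arange(len(desc_model))
    best, second = order[:, 0], order[:, 1]
    keep = dist[rows, best] < ratio * dist[rows, second]
    return np.stack([rows[keep], best[keep]], axis=1)

def kabsch(src, dst):
    """Least-squares rigid transform with dst ~= src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def ransac_pose(src, dst, iters=200, tol=0.01, seed=0):
    """Geometric check: keep the rigid transform supported by the most matches."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        sample = rng.choice(len(src), size=3, replace=False)
        R, t = kabsch(src[sample], dst[sample])
        inliers = np.linalg.norm(src @ R.T + t - dst, axis=1) < tol
        if inliers.sum() > best.sum():
            best = inliers
    if best.sum() < 3:
        return None  # no consistent pose: reject the object hypothesis
    R, t = kabsch(src[best], dst[best])  # refit on all inliers
    return R, t, best

# Synthetic model and scene (hypothetical data).
rng = np.random.default_rng(1)
model_pts, model_desc = rng.random((50, 3)), rng.random((50, 32))
c, s = np.cos(0.5), np.sin(0.5)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.1, -0.2, 0.3])
scene_pts = model_pts @ R_true.T + t_true
scene_pts[:10] += rng.random((10, 3))  # 10 geometrically wrong matches
scene_desc = model_desc + 0.01 * rng.standard_normal((50, 32))
m = ratio_test_matches(model_desc, scene_desc)
R, t, inliers = ransac_pose(model_pts[m[:, 0]], scene_pts[m[:, 1]])
print(np.allclose(R, R_true), int(inliers.sum()))  # True 40
```

The descriptor stage alone would accept all 50 matches; only the geometric check exposes the 10 whose 3D positions are inconsistent with a single rigid pose.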

Dominant Orientation Templates (DOT)

Stefan Hinterstoisser, Stefan Holzer (TUM; CVPR 2010, ECCV 2010)

● DOT is a template-matching-based approach

[Figure: template and current scene]

- The template is slid over the image to compute a response at each image position
- If the response is above a threshold, the position is considered a detection of the template
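The slide-and-threshold loop can be sketched generically. Here zero-mean normalized cross-correlation (NCC) on gray values serves as the response, purely as a stand-in: DOT itself replaces gray values with gradient orientations. The threshold 0.9 and the function names are assumptions.

```python
import numpy as np

def ncc_response(image, template):
    """Zero-mean normalized cross-correlation of the template at every valid position."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.linalg.norm(t)
    out = np.zeros((image.shape[0] - th + 1, image.shape[1] - tw + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            w = image[y:y + th, x:x + tw]
            w = w - w.mean()
            denom = np.linalg.norm(w) * t_norm
            out[y, x] = (w * t).sum() / denom if denom > 0 else 0.0
    return out

def detect_template(image, template, threshold=0.9):
    """Every position whose response exceeds the threshold counts as a detection."""
    ys, xs = np.nonzero(ncc_response(image, template) > threshold)
    return list(zip(ys.tolist(), xs.tolist()))

rng = np.random.default_rng(0)
scene = rng.random((40, 40))
template = scene[10:18, 20:28].copy()
print(detect_template(scene, template))              # [(10, 20)]
print(detect_template(2.0 * scene + 0.3, template))  # same position: NCC ignores gain and offset
```

Brute-force scanning costs one template comparison per pixel, which is why DOT-style methods invest in a representation that makes each comparison cheap.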

DOT – Basic Principle

● DOT uses gradients instead of color or gray values

[Figure: template and current scene]

- Gradients are less sensitive to illumination changes
- Gradients have orientation and magnitude
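One plausible reading of the dominant-orientation idea is sketched below; it is not the TUM implementation. Gradient orientations are quantized into N_BINS = 8 bins over [0, π) with the sign ignored, weak gradients (magnitude at or below mag_thresh) are dropped, the template keeps one bit for its dominant orientation in each 4×4 region, the scene keeps a bit for every orientation present, and similarity counts regions whose bits overlap under bitwise AND. Bin count, region size, threshold and all names are assumptions.

```python
import numpy as np

N_BINS = 8  # assumption: 8 orientation bins over [0, pi); gradient sign is ignored

def quantized_orientations(img, mag_thresh=0.1):
    """Orientation bin per pixel, or -1 where the gradient magnitude is too weak."""
    gy, gx = np.gradient(img.astype(float))
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # fold opposite directions together
    bins = np.minimum((ang / np.pi * N_BINS).astype(int), N_BINS - 1)
    return np.where(np.hypot(gx, gy) > mag_thresh, bins, -1)

def _regions(img, region, mag_thresh):
    """Yield (row, col, strong_bins) for each region x region cell."""
    q = quantized_orientations(img, mag_thresh)
    for i in range(q.shape[0] // region):
        for j in range(q.shape[1] // region):
            cell = q[i * region:(i + 1) * region, j * region:(j + 1) * region].ravel()
            yield i, j, cell[cell >= 0]

def template_masks(img, region=4, mag_thresh=0.1):
    """Template side: one bit for the most frequent orientation in each region."""
    masks = np.zeros((img.shape[0] // region, img.shape[1] // region), dtype=np.uint8)
    for i, j, bins in _regions(img, region, mag_thresh):
        if bins.size:
            masks[i, j] = 1 << int(np.bincount(bins, minlength=N_BINS).argmax())
    return masks

def scene_masks(img, region=4, mag_thresh=0.1):
    """Scene side: one bit for every orientation present in the region."""
    masks = np.zeros((img.shape[0] // region, img.shape[1] // region), dtype=np.uint8)
    for i, j, bins in _regions(img, region, mag_thresh):
        bits = 0
        for b in np.unique(bins):
            bits |= 1 << int(b)
        masks[i, j] = bits
    return masks

def similarity(tmpl, scene):
    """Number of regions where the template's bit is also set in the scene (bitwise AND)."""
    return int(np.count_nonzero(tmpl & scene))

edge = np.zeros((16, 16))
edge[:, 8:] = 1.0                                # vertical step edge
t = template_masks(edge)
print(similarity(t, scene_masks(edge)))          # 8
print(similarity(t, scene_masks(3 * edge + 2)))  # 8: brighter, higher contrast
print(similarity(t, scene_masks(1 - edge)))      # 8: inverted contrast
print(similarity(t, scene_masks(edge.T)))        # 0: horizontal edge
```

Because only quantized orientations are compared, changing brightness or contrast leaves the score unchanged, and the comparison reduces to cheap bitwise operations.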

Offline Learning

● Good learning is necessary to reduce the false-positive rate
● We try to use all available information to segment the object:

● Point cloud from narrow stereo is used to detect the table and segment the point cloud of the object

● The object point cloud is used to create an initial mask
● The mask is refined using GrabCut (see OpenCV)
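The slide does not say how the table is found. A common choice, assumed here, is a RANSAC plane fit, after which the object is taken to be the points lying above that plane. The 5 mm plane tolerance, the 1 cm height threshold, the "up" direction and the function names are hypothetical. Projecting the selected points into the image would give the initial mask that GrabCut (cv2.grabCut in OpenCV) then refines; that step is not shown.

```python
import numpy as np

def fit_plane_ransac(points, iters=200, tol=0.005, seed=0):
    """RANSAC plane n . p + d = 0 (unit normal n) with the most points within tol metres."""
    rng = np.random.default_rng(seed)
    best_n, best_d, best_count = None, None, -1
    for _ in range(iters):
        p0, p1, p2 = points[rng.choice(len(points), size=3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(n)
        if norm < 1e-12:
            continue  # collinear sample, no unique plane
        n = n / norm
        d = -float(n @ p0)
        count = int(np.count_nonzero(np.abs(points @ n + d) < tol))
        if count > best_count:
            best_n, best_d, best_count = n, d, count
    return best_n, best_d

def points_above_table(points, n, d, min_height=0.01, up=(0.0, 0.0, 1.0)):
    """Boolean mask of points more than min_height above the plane (on the 'up' side)."""
    if n @ np.asarray(up) < 0:  # orient the normal towards 'up'
        n, d = -n, -d
    return points @ n + d > min_height

# Synthetic cloud: a table at z = 0 and a small object on top (hypothetical data).
rng = np.random.default_rng(0)
table = np.c_[rng.uniform(-0.5, 0.5, (400, 2)), np.zeros(400)]
mug = np.c_[rng.uniform(-0.05, 0.05, (100, 2)), rng.uniform(0.05, 0.15, 100)]
cloud = np.vstack([table, mug])
n, d = fit_plane_ransac(cloud)
obj = points_above_table(cloud, n, d)
print(obj[:400].any(), obj[400:].all())  # False True
```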

False-Positive Rejection

● Two more precise templates for validation:
  - a more precise, non-discretized gradient template
  - a disparity template to compare expected with observed disparities
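The disparity check can be sketched as the fraction of object pixels whose observed disparity agrees with the learned one. This assumes that the expected template is NaN outside the object mask, that NaN in the observation marks stereo failures, and that a tolerance of 1 px and an acceptance fraction of 0.7 are reasonable. All of these values and the function names are hypothetical.

```python
import numpy as np

def disparity_agreement(expected, observed, tol=1.0):
    """Fraction of object pixels whose observed disparity is within tol of the expected one.

    expected: disparity template, NaN outside the object mask.
    observed: disparity at the detected position, NaN where stereo failed.
    """
    on_object = np.isfinite(expected)
    with np.errstate(invalid="ignore"):
        close = np.abs(observed - expected) <= tol
    agree = on_object & np.isfinite(observed) & close
    return agree.sum() / max(int(on_object.sum()), 1)

def accept_detection(expected, observed, tol=1.0, min_fraction=0.7):
    return disparity_agreement(expected, observed, tol) >= min_fraction

expected = np.full((10, 10), np.nan)
expected[2:8, 2:8] = 20.0                 # learned object disparities
real = np.full((10, 10), 5.0)
real[2:8, 2:8] = 20.4                     # object at the expected depth
poster = np.full((10, 10), 12.0)          # right appearance, wrong depth
print(accept_detection(expected, real), accept_detection(expected, poster))  # True False
```

A flat picture of the object can pass the gradient templates but not this check, because its disparities do not match the learned 3D shape.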

False-Positive Rejection

● Compute the error between the reference point cloud and the point cloud at the detected position

● Optimize the initial 3D point cloud pose given by the detection

● This directly gives the object pose if the model is associated with learned point clouds
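The slide does not name the optimizer. Point-to-point ICP is assumed here as one standard way to refine the detector's initial pose and to measure the remaining point-cloud error (RMS nearest-neighbour distance). Nearest neighbours are found by brute force for clarity; the 5 mm acceptance threshold, the grid-shaped test cloud and the function names are hypothetical.

```python
import numpy as np

def kabsch(src, dst):
    """Least-squares rigid transform with dst ~= src @ R.T + t."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    U, _, Vt = np.linalg.svd((src - cs).T @ (dst - cd))
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # avoid reflections
    R = Vt.T @ D @ U.T
    return R, cd - R @ cs

def nearest(src, dst):
    """Brute-force nearest neighbour in dst for every point of src."""
    d2 = ((src[:, None, :] - dst[None, :, :]) ** 2).sum(axis=2)
    idx = d2.argmin(axis=1)
    return idx, np.sqrt(d2[np.arange(len(src)), idx])

def refine_pose(model, scene, R, t, iters=20):
    """Point-to-point ICP starting from the detector's pose (R, t)."""
    for _ in range(iters):
        moved = model @ R.T + t
        idx, _ = nearest(moved, scene)
        dR, dt = kabsch(moved, scene[idx])
        R, t = dR @ R, dR @ t + dt  # compose the increment with the current pose
    _, dist = nearest(model @ R.T + t, scene)
    return R, t, float(np.sqrt(np.mean(dist ** 2)))  # RMS residual = point-cloud error

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

g = np.arange(4) * 0.1
model = np.stack(np.meshgrid(g, g, g, indexing="ij"), axis=-1).reshape(-1, 3)
R_true, t_true = rot_z(0.1), np.array([0.5, -0.2, 1.0])
scene = model @ R_true.T + t_true  # reference cloud observed at the true pose
R, t, rms = refine_pose(model, scene, rot_z(0.08), t_true + np.array([0.01, 0.0, 0.0]))
print(rms < 0.005)  # accept under a hypothetical 5 mm threshold
```

A large residual after refinement indicates that the detection does not fit the reference cloud and can be rejected; a small one yields the refined object pose.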

STAIR Vision Library (SVL)

Stanford STAIR project [Andrew Ng, Stephen Gould]

• Initially developed to support the Stanford AI Robot (STAIR) project

• Builds on top of OpenCV computer vision library and Eigen matrix library

• Provides a range of software infrastructure for:
  - computer vision
  - machine learning
  - probabilistic graphical models

• Hosted on SourceForge

Object Detection in SVL

• Sliding-window object detector

• Features are extracted from a local window

• A learned boosted decision-tree classifier scores each window

• Image is scanned at multiple resolutions to detect objects at different scales
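Multi-resolution sliding-window scanning can be sketched as follows, assuming nearest-neighbour downsampling by scale_step per pyramid level and a fixed window size. score_fn is a hypothetical stand-in for SVL's boosted decision-tree classifier (here simply the window's mean brightness). The function names and parameters are illustrative and are not SVL's API.

```python
import numpy as np

def downscale(img, s):
    """Nearest-neighbour resampling by a factor s >= 1."""
    h, w = int(img.shape[0] / s), int(img.shape[1] / s)
    ys = (np.arange(h) * s).astype(int)
    xs = (np.arange(w) * s).astype(int)
    return img[np.ix_(ys, xs)]

def detect_multiscale(img, score_fn, win=16, stride=4, scale_step=1.25, thresh=0.5):
    """Score every window at every pyramid level; boxes are (score, y, x, size) in input pixels."""
    detections, s = [], 1.0
    while img.shape[0] / s >= win and img.shape[1] / s >= win:
        level = downscale(img, s)
        for y in range(0, level.shape[0] - win + 1, stride):
            for x in range(0, level.shape[1] - win + 1, stride):
                score = score_fn(level[y:y + win, x:x + win])
                if score > thresh:
                    # map the window back to original image coordinates
                    detections.append((score, round(y * s), round(x * s), round(win * s)))
        s *= scale_step
    return detections

img = np.zeros((64, 64))
img[32:, 32:] = 1.0  # a bright 32x32 "object"
dets = detect_multiscale(img, np.mean, win=16, stride=16, scale_step=2.0, thresh=0.99)
print(len(dets), max(dets, key=lambda d: d[3])[1:])  # 5 (32, 32, 32)
```

A fixed-size window on a shrinking image is equivalent to a growing window on the original, so one classifier trained at a single size can find objects at several scales.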

Image Channels

• The image is decomposed into multiple channels

• Depth at each pixel, obtained from a laser scanner, can be thought of as an additional channel

[Figure: intensity image, edge map, depth map]

[Quigley et al., ICRA 2009]
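Treating depth as one more channel can be sketched by stacking it with intensity and an edge map into a single H×W×3 array. This assumes that the laser depth has already been registered to the camera pixels and that missing returns (NaN) are set to 0; the gradient-magnitude edge map is a simple stand-in for whatever edge operator SVL uses. The function name is hypothetical.

```python
import numpy as np

def image_channels(intensity, depth):
    """Stack intensity, an edge map and registered depth into an H x W x 3 array."""
    intensity = intensity.astype(float)
    gy, gx = np.gradient(intensity)
    edges = np.hypot(gx, gy)  # gradient magnitude as a simple edge map
    depth = np.nan_to_num(depth.astype(float), nan=0.0)  # assumption: missing returns -> 0
    return np.stack([intensity, edges, depth], axis=-1)

intensity = np.tile(np.linspace(0.0, 1.0, 8), (6, 1))  # horizontal ramp
depth = np.full((6, 8), 1.5)
depth[0, 0] = np.nan                                   # one missing laser return
ch = image_channels(intensity, depth)
print(ch.shape)  # (6, 8, 3)
```

Once depth is just another channel, the same feature extraction code can run on all three channels unchanged.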

Object Detection Features

• Learn a “patch” dictionary over intensity, edge and depth channels

• Patches encode localized templates for matching

• Depth patches capture shape; intensity and edge patches capture appearance

• Patch responses (over entire dictionary) are combined to form the feature vector
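A patch-dictionary feature vector can be sketched as follows, assuming that each dictionary entry stores a channel index, a small patch cut from a training window, its expected location inside the detection window, and a search radius. Each entry's response is then the maximum normalized cross-correlation of the patch within that small neighbourhood. This follows the general idea on the slide; the exact response function used in SVL, the channel ordering and all names here are assumptions.

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class PatchEntry:
    channel: int       # assumed ordering: 0 = intensity, 1 = edges, 2 = depth
    patch: np.ndarray  # small template cut from a training window
    y: int             # expected location inside the detection window
    x: int
    radius: int = 2    # search neighbourhood around (y, x)

def ncc(a, b):
    a, b = a - a.mean(), b - b.mean()
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float((a * b).sum() / den) if den > 0 else 0.0

def patch_response(window, e):
    """Best NCC of the patch near its expected location in the chosen channel."""
    ph, pw = e.patch.shape
    ch = window[..., e.channel]
    best = -1.0
    for dy in range(-e.radius, e.radius + 1):
        for dx in range(-e.radius, e.radius + 1):
            y, x = e.y + dy, e.x + dx
            if y >= 0 and x >= 0 and y + ph <= ch.shape[0] and x + pw <= ch.shape[1]:
                best = max(best, ncc(ch[y:y + ph, x:x + pw], e.patch))
    return best

def feature_vector(window, dictionary):
    """One response per dictionary entry; this vector is what the classifier sees."""
    return np.array([patch_response(window, e) for e in dictionary])

rng = np.random.default_rng(0)
window = rng.random((20, 20, 3))  # hypothetical intensity/edge/depth window
near = PatchEntry(2, window[5:10, 5:10, 2].copy(), y=4, x=6, radius=2)  # found within radius
far = PatchEntry(2, window[5:10, 5:10, 2].copy(), y=12, x=12, radius=0)
print(feature_vector(window, [near, far]).round(3))
```

The search radius gives each patch a little positional tolerance, and because depth patches live in their own channel they respond to shape rather than appearance.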


Results

• 150 images of cluttered indoor scenes

• 5-fold cross-validation

• Depth information significantly improves the area under the precision-recall curve


[Figure: precision-recall results, annotated with 8%, 3%, and 38% improvements]
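The evaluation protocol (k-fold cross-validation, area under the precision-recall curve) can be sketched as follows. Non-interpolated average precision is used here as one common summary of that area; the slide does not say which variant was computed. The fit/predict callables and the data are toy placeholders, not the reported experiment.

```python
import numpy as np

def average_precision(scores, labels):
    """Non-interpolated average precision: mean precision at the rank of each positive."""
    labels = np.asarray(labels)[np.argsort(-np.asarray(scores), kind="stable")]
    n_pos = labels.sum()
    if n_pos == 0:
        return float("nan")  # undefined without positives
    tp = np.cumsum(labels)
    precision = tp / np.arange(1, len(labels) + 1)
    return float((precision * labels).sum() / n_pos)

def k_fold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and split into k nearly equal folds."""
    return np.array_split(np.random.default_rng(seed).permutation(n), k)

def cross_validated_ap(X, y, fit, predict, k=5):
    """Train on k-1 folds, score the held-out fold, average AP over folds."""
    folds = k_fold_indices(len(y), k)
    aps = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train], y[train])
        aps.append(average_precision(predict(model, X[test]), y[test]))
    return float(np.nanmean(aps))  # folds without positives are skipped

print(average_precision([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))  # 1.0
print(average_precision([0.9, 0.8, 0.3, 0.1], [0, 0, 1, 1]))  # 5/12, about 0.417
```

Comparing the cross-validated score of a detector with and without its depth features is the kind of comparison the results slide reports.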


Conclusions

• Real-time, accurate 3D devices are becoming available

• 3D can help in object detection for untextured objects
  - A combination of visual and 3D features is best

• 3D is useful for verification

• Check out the PR2 Grasping Demo!
