Vision for Robotics – Detection and Tracking


Page 1: Vision for Robotics – Detection and Tracking

Vision for Robotics – Detection and Tracking

Markus Vincze
Automation Control Institute
Vienna University of Technology
[email protected]

www.acin.tuwien.ac.at

PSFMR – Fermo, 11.-16.9.2006

Page 2: Vision for Robotics – Detection and Tracking

Content
• Overview
• Tracking
  – Model-based tracking
  – Interest point tracking
  – Maximum tracking velocity
• Detection
  – Perceptual grouping
• Cognitive Vision

Page 3: Vision for Robotics – Detection and Tracking

Motivation

"Robot Assistant": "James, please bring me my cup"

Research fields
• Machine & cognitive vision
• Robotics, visual servoing
• System integration

Page 4: Vision for Robotics – Detection and Tracking

3D (6 DoF) Object Tracking
RobVision – EU Project 1998 - 2001

• Model-based: line, ellipse → determination of 3D pose
• Robustness: integration of model and image cues
• Real time, 25 Hz

Page 5: Vision for Robotics – Detection and Tracking

Robot Navigation in Office
Vision for Robotics (V4R Tracking Tool)

Page 6: Vision for Robotics – Detection and Tracking

Detection of Structure
• Find cylinders
• First step to perceive function
  – Container; graspable
• Structure to reduce combinatorial complexity

Page 7: Vision for Robotics – Detection and Tracking

Vision for Natural Interaction

ActIPret: Interpretation of human activities with objects
ActIPret – EU Project 2001 - 2004

System functions
• Detect, track, recognise
• Spatio-temporal object relationships in 3D
• Semantic interpretation

Page 8: Vision for Robotics – Detection and Tracking

MOVEMENT
MOVEMENT – EU IST Project 2004-2007

• Movement of
  – Persons, objects, data
• Task: autonomous navigation
  – Wheelchair and table
  – Obstacle avoidance
  – Navigation

[Diagram: person – information – object]

Page 9: Vision for Robotics – Detection and Tracking

Sensor Concept

Page 10: Vision for Robotics – Detection and Tracking

Rationale of Sensor Concept
• Stereo vision
  – 3D: detect tables, chairs
  – Cheap; only alternative is TOF (time-of-flight camera)
  – Both investigated
• Infrared
  – Special directions: door traversal
• Bumpers
  – Last resort, hopefully never used

Page 11: Vision for Robotics – Detection and Tracking

Example: Table Scene
• Objects learned from one scan
• Detection in one view, 2 sec.

Page 12: Vision for Robotics – Detection and Tracking

Summary
Vision for Automation

• 2D: robust detection and tracking
• 3D: classes of features
• Spatio-temporal relationships
• Prediction, context, cognitive approaches
• Framework – integration of vision and robots

Page 13: Vision for Robotics – Detection and Tracking

(Some) Tasks of Perception
• Object detection
  – Objects? Rather a collection of primitives
  – Primitives or features
    • Interest points
    • Edge features: line, junction, parallels, rectangle; arc, ellipse
    • Surface patches
  – Result: object location in image (2D) and/or pose (3D)
• Object tracking: following a detected object/feature
  – Feature location in image sequences
  – Result: real-time pose through sequence (2D and/or 3D)

Page 14: Vision for Robotics – Detection and Tracking

Approaches
• Model based
  – CAD model of object, environment; geometric features
• Appearance based
  – Enables easier learning of objects
  – Interest points or "whole" object
• Mixture: structure in data – Gestalt principles
  – Model physics of world and imaging process (rather than objects)
  – Features, perceptual grouping

Page 15: Vision for Robotics – Detection and Tracking

Content
• Overview
• Tracking
  – Model-based tracking
  – Interest point tracking
  – Maximum tracking velocity
• Detection
  – Perceptual grouping
• Cognitive Vision

Page 16: Vision for Robotics – Detection and Tracking

System View

• Task is known
• Objects involved are known – some model
• Environment is partly known

Page 17: Vision for Robotics – Detection and Tracking

Object Tracking
• Arbitrary motion in 3D
  – Navigation, manipulation
• Robustness in real world environment
• System integration: dynamic aspects

Page 18: Vision for Robotics – Detection and Tracking

State of the Art (1/2)
Model-based Object Tracking

• Model based object tracking
  – Gradient: Harris’88, Dickmanns’88, Lowe’92, Nagel’00, Thompson’01, Drummond’02, Kragic’03
  – Motion model: Dickmanns’88, Gennery’92, Isard’98

[Example trackers: Thompson’01, Drummond’02, Kragic’03]

Page 19: Vision for Robotics – Detection and Tracking

State of the Art (2/2)
Model-based Object Tracking

• Integration of image cues (cue integration)
  – Edge classification: Hoff’89, Poggio’89
  – Region based: Aloimonos’89, Toyama’99, Kragic’01, Schiele’02

Show object: colour + texture = found

Page 20: Vision for Robotics – Detection and Tracking

Tracking in V4R (Vision for Robotics)

• Model-based system for object tracking
• Robustness by integrating and evaluating cues

[Block diagram: image → features → object tracking → 3D object pose]

Page 21: Vision for Robotics – Detection and Tracking

Approach
• Window warping (Hager98):
  – e.g. a line is vertical in the image (see the sketch below)
• Colour Edge Projected Integration of Cues (CEPIC):
  – Pre-selection of relevant edgels
  – Local cues:
    • Image: intensity, colour, (texture)
    • Model: region belonging to object

[Flow diagram: window → CEPIC → feature candidates]
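A minimal sketch of the window-warping idea in Python/OpenCV; the function name and the exact geometry handling are illustrative assumptions, not the V4R implementation:

```python
import cv2
import numpy as np

def warp_tracking_window(image, center, angle_deg, size=32):
    """Cut a tracking window out of `image` and rotate it so that the model
    edge predicted at `angle_deg` (degrees in the image plane) becomes
    roughly vertical; edge search then reduces to a 1-D scan per row."""
    # forward rotation about the predicted feature position
    M = cv2.getRotationMatrix2D(center, angle_deg - 90.0, 1.0)
    # shift so the feature lands in the middle of the output window
    M[0, 2] += size / 2.0 - center[0]
    M[1, 2] += size / 2.0 - center[1]
    return cv2.warpAffine(image, M, (size, size))

# usage: window = warp_tracking_window(gray, center=(120.0, 88.0), angle_deg=35.0)
```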

Page 22: Vision for Robotics – Detection and Tracking

Tracking Windows

Warping [Hager98]

Page 23: Vision for Robotics – Detection and Tracking

Integration of Image and Model Cues
EPIC – Edge Projected Integration of Cues (1/2)

1. Edge detection: all edges

2. For each edgel, combine the image and model cues on its left and right side into a likelihood:

   e = Σ_{i=1..cues} ( w_left,i · H_left,i + w_right,i · H_right,i )

   Image cues: intensity, colour; adaptation of μ and σ
   Model cue: side belonging to the object

[Figure: edgel neighbourhood with regions Y, T1, T2]
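A minimal sketch of this weighted cue combination, assuming the per-edgel cue values have already been looked up in the adapted cue models (the data layout and names are illustrative, not from the talk):

```python
import numpy as np

def epic_likelihood(n_edgels, cues, w_left, w_right):
    """Combine left/right cue values of every edgel into one likelihood e,
    as in the EPIC formula above.  `cues` maps a cue name to a pair of
    per-edgel arrays (H_left, H_right) already evaluated against the
    adapted cue model (mu, sigma); weights are per cue."""
    e = np.zeros(n_edgels)
    for name, (H_left, H_right) in cues.items():
        e += w_left[name] * H_left + w_right[name] * H_right
    return e
```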

Page 24: Vision for Robotics – Detection and Tracking

Integration of Image and Model Cues
EPIC – Edge Projected Integration of Cues (2/2)

3. Selection of the most likely edgels using an adaptive threshold

[Figure: histogram of the number of edgels over likelihood e, with the adaptive threshold marked]

Page 25: Vision for Robotics – Detection and Tracking

Extension – Occlusion Handling

Gradient only vs. CEPIC

Page 26: Vision for Robotics – Detection and Tracking

Tests – Example Magazine Box

• Maximum gradient only
• Local and global cues

Page 27: Vision for Robotics – Detection and Tracking

Model-based Approach
• Topological Integration of Cues (TOPIC):
  – Test of the feature topologies from the model: junctions, parallel lines
  – Global evaluation of sets of feature candidates
• Pose Validation:
  – Validation of the image-feature to model-feature fit
  – Final feature selection
  – Detection of outliers

[Flow diagram: feature candidates → TOPIC → object candidates → Pose Validation → 3D pose]

Page 28: Vision for Robotics – Detection and Tracking

Approach – Self-evaluation
• Scene-dependent evaluation of cues
• Ambiguity of elements is a measure for the perceived scene complexity
  – e.g. # candidates / feature
• Implementation:
  – Optional call of global evaluation methods (i.e. TOPIC and Pose Validation)

[Flow diagram: feature candidates → switch → (optional) TOPIC → object candidates → Pose Validation → 3D pose]

Page 29: Vision for Robotics – Detection and Tracking

Tests – Example Toy Helicopter

• CEPIC
• Local and global cues

Page 30: Vision for Robotics – Detection and Tracking

Tests – Results

Method                  % correct   % wrong   factor time
Max. Gradient              54.1       22.6       1*
EPIC (intensity only)      61.3       13.2       1.09
CEPIC                      71.5       11.0       3.06
CEPIC+TOPIC                74.5        7.8       3.33
TOPIC+Pose                 68.3       11.7       1.54
CEPIC+TOPIC+Pose           77.1        5.0       3.35
Switch                     77.6        4.6       3.3

* factor 1 = 5.4 ms/line

Page 31: Vision for Robotics – Detection and Tracking

Conclusion – Tracking (1/2)
Model-based Object Tracking

• Improvement with each additional cue
• Edges tracked: 77.6 %, wrong edge: 4.6 %
• Remaining: 14.0 %
  – Bad contrast, reflections, camera saturation

Page 32: Vision for Robotics – Detection and Tracking

Conclusion – Tracking (2/2)
Model-based Object Tracking

• Increasing robustness by self-evaluation using perceived redundancy
• Size: is known, easy to estimate, exploit it
• Limits
  – Texture, multi-coloured regions
  – Few control points
• Problem: automatic initialisation
• Run live – otherwise tuned to sequences
• V4R homepage: http://robsens.acin.tuwien.ac.at/v4r/

Page 33: Vision for Robotics – Detection and Tracking

Content
• Overview
• Tracking
  – Model-based tracking
  – Interest point tracking
  – Maximum tracking velocity
• Detection
  – Perceptual grouping
• Cognitive Vision

Page 34: Vision for Robotics – Detection and Tracking

Approaches
• Model based
  – CAD model of object, environment; geometric features
• Appearance based
  – Enables easier learning of objects
  – Interest points or "whole" object
• Mixture: structure in data – Gestalt principles
  – Model physics of world and imaging process (rather than objects)
  – Features, perceptual grouping

Page 35: Vision for Robotics – Detection and Tracking

Objects and Interest Points
• Extraction of interest points (characteristic locations)
• Computation of local descriptors
• Determining correspondences
• Detect similar image parts (objects)
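A minimal sketch of this detect–describe–match pipeline with OpenCV; SIFT and the ratio-test threshold are one possible choice, not necessarily the one used in the talk:

```python
import cv2

def match_objects(img_model, img_scene, ratio=0.75):
    """Extract interest points, compute local descriptors, and establish
    correspondences with Lowe's ratio test (grayscale input images)."""
    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img_model, None)   # characteristic locations + descriptors
    kp2, des2 = sift.detectAndCompute(img_scene, None)
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)
    # keep a match only if it is clearly better than the second-best candidate
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    return kp1, kp2, good

# usage:
# kp1, kp2, matches = match_objects(cv2.imread("cup.png", 0), cv2.imread("table.png", 0))
```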

Page 36: Vision for Robotics – Detection and Tracking

Extraction of Interest Points
• Corner detectors
  – Harris, Hessian (a minimal Harris sketch follows below)
• Multi-scale corner detectors (with scale selection)
  – Scale invariant Harris and Hessian corners
  – Difference of Gaussian (DoG) (Lowe)
• Affine covariant regions
  – Harris-Affine (Mikolajczyk and Schmid ’02; Schaffalitzky and Zisserman ’02)
  – Hessian-Affine (Mikolajczyk and Schmid ’02)
  – Maximally stable extremal regions (MSER) (Matas et al. ’02)
  – Intensity based regions (IBR) (Tuytelaars and Van Gool ’00)
  – Edge based regions (EBR) (Tuytelaars and Van Gool ’00)
  – Entropy-based regions (salient regions) (Kadir et al. ’04)
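A minimal OpenCV sketch of the plain single-scale Harris detector from the first item above; the file name and the 0.01 response-threshold factor are illustrative values:

```python
import cv2
import numpy as np

# Harris corner response on a grayscale image, then keep local strong responses.
gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
response = cv2.cornerHarris(gray, blockSize=3, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())   # (row, col) of interest points
```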

Page 37: Vision for Robotics – Detection and Tracking

Scale Invariant Harris Points
• Multi-scale extraction of Harris interest points

• Selection of points at characteristic scale in scale space

Characteristic scale:

• Maximum in scale space

• Scale invariant

[Mikolajczyk 04]

Page 38: Vision for Robotics – Detection and Tracking

Difference of Gaussian (DoG)• Detect peaks in the difference of Gaussian pyramid

[Lowe 04]

Page 39: Vision for Robotics – Detection and Tracking

Affine Covariant Regions

[Mikolajczyk 04]

Page 40: Vision for Robotics – Detection and Tracking

Harris-Affine and Hessian-Affine (1)

[Mik05]

Page 41: Vision for Robotics – Detection and Tracking

Harris-Affine and Hessian-Affine (2)
• Initialization with multi-scale interest points
• Iterative modification of location, scale and neighborhood

[Mik04]

Page 42: Vision for Robotics – Detection and Tracking

Maximally Stable Extremal Regions (MSER)

[Mik05]

Page 43: Vision for Robotics – Detection and Tracking

Maximally Stable Extremal Regions (MSER)

• Threshold image intensities: I > I₀
• Extract connected components ("extremal regions")
• Find the threshold at which an extremal region is "maximally stable", i.e. a local minimum of the relative growth of its area
• Approximate a region with an ellipse
• Local affine frame

[Matas 02]
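A minimal OpenCV sketch of MSER extraction followed by the ellipse approximation mentioned above; file name and parameters are placeholders:

```python
import cv2

# Extract MSERs from a grayscale image and approximate each region by an ellipse.
gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
mser = cv2.MSER_create()
regions, bboxes = mser.detectRegions(gray)   # lists of pixel sets and bounding boxes
ellipses = [cv2.fitEllipse(r.reshape(-1, 1, 2)) for r in regions if len(r) >= 5]
```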

Page 44: Vision for Robotics – Detection and Tracking

Computation of Local Descriptors

• Distinctive

• Robust

• Invariant to geometric & photometric transformation

• Descriptors
  – Sampled image patch
  – Gradient orientation histogram – SIFT (Lowe)
  – Shape context (Belongie et al. ’02)
  – PCA-SIFT (Ke and Sukthankar ’04)
  – Moment invariants (Van Gool ’96)
  – Gaussian derivative-based (Koenderink ’87, Freeman ’91)
  – Complex filters (Baumberg ’00, Schaffalitzky and Zisserman ’02)

Page 45: Vision for Robotics – Detection and Tracking

Gradient Orientation Histogram (SIFT – Scale Invariant Feature Transform)
• Thresholded image gradients are sampled over a 16x16 array of locations in scale space
• Create array of orientation histograms
• 8 orientations x 4 x 4 histogram array = 128 dimensions

[Lowe 04]
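A quick check of that descriptor layout with OpenCV's SIFT, which returns one 128-dimensional vector (8 orientations x 4x4 grid) per keypoint; the image name is a placeholder:

```python
import cv2

gray = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)
kp, des = cv2.SIFT_create().detectAndCompute(gray, None)
assert des.shape[1] == 128   # one 128-d descriptor per detected keypoint
```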

Page 46: Vision for Robotics – Detection and Tracking

PCA-SIFT Local Descriptor

From Sukthankar 2004 [Ke04]

Page 47: Vision for Robotics – Detection and Tracking

Interest points can be used for ...
• Object recognition

• Object recognition and segmentation

• Robot Localization

• Tracking

Page 48: Vision for Robotics – Detection and Tracking

Planar Recognition
• Planar surfaces can be reliably recognized at a rotation of 60° away from the camera
• Affine fit approximates perspective projection
• Only 3 points are needed for recognition
• Copes with occlusion

[Lowe]
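A small sketch of the three-point fit: three model/image correspondences (hypothetical coordinates below) fully determine a 2D affine transform, which approximates the perspective projection of a planar surface:

```python
import cv2
import numpy as np

model_pts = np.float32([[0, 0], [100, 0], [0, 100]])        # points on the planar model (hypothetical)
image_pts = np.float32([[212, 80], [305, 96], [198, 171]])  # their matched image positions (hypothetical)
A = cv2.getAffineTransform(model_pts, image_pts)            # 2x3 affine matrix from 3 correspondences
projected = cv2.transform(model_pts.reshape(-1, 1, 2), A)   # re-project model points into the image
```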

Page 49: Vision for Robotics – Detection and Tracking

Recognition Under Occlusion

[Lowe]

Page 50: Vision for Robotics – Detection and Tracking

Recognition and Segmentation
• Initialisation of object surface with dense features
• Iterative search for visible features using affine refinement of features

[Ferrari 04]

Page 51: Vision for Robotics – Detection and Tracking

Robot Localization

[Se 05]

Page 52: Vision for Robotics – Detection and Tracking

Tracking of Interest Points

Page 53: Vision for Robotics – Detection and Tracking

Interest Point Tracking and Occlusion Reasoning

• Grouping KLT features based on motion

• Detect occlusion based on appearance and disappearance of interest points
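A minimal sketch of one KLT tracking step with OpenCV's pyramidal Lucas-Kanade; the grouping-by-motion and occlusion-reasoning logic on top of it is omitted, and names are illustrative:

```python
import cv2
import numpy as np

def klt_step(prev_gray, gray, points):
    """Track interest points from prev_gray to gray; points whose status
    drops to 0 have disappeared, which is the cue used for occlusion reasoning."""
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, points, None)
    ok = status.ravel() == 1
    return nxt[ok], points[ok], np.flatnonzero(~ok)   # tracked, previous positions, lost indices

# initial interest points (Shi-Tomasi corners, a common choice for KLT):
# p0 = cv2.goodFeaturesToTrack(first_gray, maxCorners=300, qualityLevel=0.01, minDistance=7)
```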

Page 54: Vision for Robotics – Detection and Tracking

Approaches
• Model based
  – CAD model of object, environment; geometric features
• Appearance based
  – Enables easier learning of objects
  – Interest points or "whole" object
• Mixture: structure in data – Gestalt principles
  – Model physics of world and imaging process (rather than objects)
  – Features, perceptual grouping

Page 55: Vision for Robotics – Detection and Tracking

Appearance-based Object Recognition

• Training with segmented images
• Representation in high-dimensional or reduced (Principal Component Analysis, PCA) space (sketch below)
• Separate objects linearly or non-linearly (kernel methods, SVM)
• Challenges
  – Illumination, scale, occlusion

[Bischof, Summer School 2005]
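A minimal sketch of the reduced (PCA) appearance representation with scikit-learn; training data, image size, and the number of components are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA

# Segmented training views are stacked as vectors and projected into a
# low-dimensional eigenspace; recognition is nearest neighbour in that space.
train_views = np.random.rand(200, 32 * 32)     # placeholder for 200 segmented 32x32 views
pca = PCA(n_components=20)
coeffs = pca.fit_transform(train_views)        # 20-d appearance coefficients per view

def recognise(view):
    q = pca.transform(view.reshape(1, -1))
    return np.argmin(np.linalg.norm(coeffs - q, axis=1))   # index of the closest training view
```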

Page 56: Vision for Robotics – Detection and Tracking

PCA for visual recognition and pose estimation

[Bischof 02]

Page 57: Vision for Robotics – Detection and Tracking

Object Recognition using SVM

• Approximately 200 training images per object (RGB, different views, different light)
• Background training images
• Hyperspace with 3072 dimensions
• Iterative calculation of the separating surface between two classes of objects

[Diagram: SVM separating hyperplanes H1 and H2, normal vector w, margin, origin]

[Zillich 01]
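A minimal scikit-learn sketch of the SVM step; the 3072-dimensional vectors correspond to flattened 32x32 RGB images, and the random data and RBF kernel are illustrative assumptions:

```python
import numpy as np
from sklearn.svm import SVC

X = np.random.rand(400, 3072)        # placeholder: ~200 images per object, two objects
y = np.repeat([0, 1], 200)           # object labels
clf = SVC(kernel="rbf").fit(X, y)    # learn the separating surface between the two classes

# usage: clf.predict(test_image.reshape(1, 3072))
```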

Page 58: Vision for Robotics – Detection and Tracking

Histograms for Object Representation

[Diagram: the histogram of an image with an unknown object is compared by histogram intersection against a database of histograms of object models]

[Swain 90]
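A minimal sketch of histogram-intersection matching in the spirit of [Swain 90]; the database layout is a hypothetical assumption:

```python
import numpy as np

def histogram_intersection(h_image, h_model):
    """Intersection of two normalised colour histograms; larger means more similar."""
    return np.minimum(h_image, h_model).sum()

def recognise(h_image, database):
    # `database` is a hypothetical dict {object_name: normalised histogram}
    return max(database, key=lambda name: histogram_intersection(h_image, database[name]))
```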

Page 59: Vision for Robotics – Detection and Tracking

Tracking using Colour Histograms

• Simple approach

• Very fast (~30 fps)
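The slide does not name the tracking algorithm; as one common, very fast realisation of colour-histogram tracking, a sketch using hue back-projection and OpenCV's CamShift:

```python
import cv2

def track_colour(frame, hist, window):
    """Track an object described by a hue histogram `hist`, starting from
    the previous search window (x, y, w, h)."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    backproj = cv2.calcBackProject([hsv], [0], hist, [0, 180], 1)   # per-pixel object likelihood
    criteria = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1.0)
    box, window = cv2.CamShift(backproj, window, criteria)
    return box, window

# hist: hue histogram of the selected object region, e.g.
# hist = cv2.calcHist([hsv_roi], [0], None, [16], [0, 180])
# cv2.normalize(hist, hist, 0, 255, cv2.NORM_MINMAX)
```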

Page 60: Vision for Robotics – Detection and Tracking

Summary
• Model-based
  + Robust
  – Difficult to model: how to extract high-level features for a wire-frame model?
• Appearance-based
  + Learning by showing is possible
  – Sensitive to illumination, view point, pose
• Interest points are presently en vogue

Page 61: Vision for Robotics – Detection and Tracking

Content
• Overview
• Tracking
  – Model-based tracking
  – Interest point tracking
  – Maximum tracking velocity
• Detection
  – Perceptual grouping
• Cognitive Vision

Page 62: Vision for Robotics – Detection and Tracking

Velocity of Target

Video: Slow motion of target object.

Page 63: Vision for Robotics – Detection and Tracking

Velocity of Target

Video: Fast motion of target object.

Page 64: Vision for Robotics – Detection and Tracking

Maximum Target Velocity

• Maximum velocity of target in image:

  v = radius / latency   [pixel/s]

[Figure: image with a tracking window of the given radius around the target]

Page 65: Vision for Robotics – Detection and Tracking

Radius

• Calculation time ∝ #pixels: for a 2r x 2r tracking window, time = 4Cr²
• C depends on the image processing method
  – e.g. PETS Workshops, IEEE ICRA, ECCV, CVPR

Page 66: Vision for Robotics – Detection and Tracking

Latency

• Sum of all times in the control loop (T)
  – e.g. image acquisition, data transfer time, other latencies
• Plus time for image processing

[Block diagram: vision system measures offsets Δx, Δy and feeds them to the controller, which issues the control signal]

Page 67: Vision for Robotics – Detection and Tracking

Maximum Tracking Velocity

  v = radius / latency = r / (T + 4Cr²)   [pixel/s]

⇒ Maximum where calculation time = sum of latencies,
   i.e. at T = 4Cr², resp. at r = (1/2)·√(T/C)
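A small numeric illustration of this formula; T and C below are arbitrary example values, not measurements from the experiments:

```python
import numpy as np

T = 0.020            # fixed latencies in the control loop [s] (example value)
C = 5e-6             # processing time per pixel [s/pixel] (example value)

def v_max(r):
    return r / (T + 4 * C * r**2)          # tracking velocity [pixel/s] for window radius r

r_opt = 0.5 * np.sqrt(T / C)               # radius at which v_max peaks (T = 4Cr^2)
print(r_opt, v_max(r_opt))                 # ~31.6 pixels, ~791 pixel/s for these values
```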

Page 68: Vision for Robotics – Detection and Tracking

Tessellation of Image with Fovea

[Figures: log-polar fovea ([Sandini], [IBIDEM retina]) and image pyramid]
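A minimal sketch of a log-polar "fovea" resampling with OpenCV's warpPolar; resolution high at the centre, falling off towards the periphery, so a large view angle is covered with few pixels. Output size and radius are illustrative:

```python
import cv2

img = cv2.imread("scene.png")
h, w = img.shape[:2]
fovea = cv2.warpPolar(img, (128, 128), (w / 2, h / 2), maxRadius=min(h, w) / 2,
                      flags=cv2.INTER_LINEAR | cv2.WARP_POLAR_LOG)
```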

Page 69: Vision for Robotics – Detection and Tracking

Tracking Velocity

Video: Fast motion of target object

Page 70: Vision for Robotics – Detection and Tracking

Maximum Tracking Velocity

⇒ exploit full view angle

[Plot: tracking velocity [pixel/s] over radius [pixel] for increasing size of the fovea]

Page 71: Vision for Robotics – Detection and Tracking

Experiments

[Plot: tracking velocity [pixel/s] over radius [pixel] for an image pyramid, a fovea of 21 pixels, and a constant-resolution window]

Page 72: Vision for Robotics – Detection and Tracking

Summary – Obtaining High Tracking Velocity

• Cameras with fovea
• Presently: CCD, CMOS sensors
  – Adjust tracking window to the latency of the control loop
• Reduce latency or resolution (1:1)
• Faster computer, higher frame rate (2:√2)
• Results independent of controller
  – Imperfect controller only reduces the height of the peak

Page 73: Vision for Robotics – Detection and Tracking

State of the Art