Computational Vision: Object Recognition Object Recognition Jeremy Wyatt

Post on 19-Dec-2015

Computational Vision: Object Recognition

Object Recognition

Jeremy Wyatt

Plan

David Marr: the model based approach to vision

Model based approaches: Geons, Model Fitting

Appearance based approaches: PCA, SIFT, implicit shape model

Psychological Evidence: View dependent vs. view independent recognition

Summary: who is right?

Model based vision

David Marr was a brilliant young British vision researcher who defined a coherent approach to the study of vision during the 1970s

According to one tradition coming out of Marr’s work:

• Vision is the process of reconstructing the 3d scene from 2d information

• The vision system has representations of 3d geometric structures

• Visual pipeline: intensity image → primal sketch → 2.5d sketch → model selection

• So selecting models and recovering their parameters from image data is a key task in vision

Model based vision

There is an infinite variety of objects. How do we represent, store and access models of them efficiently?

One suggestion was the use of a small library of 3d parts from which many complex models can be constructed

There are many schemes: generalised cylinders, Geons, Superquadrics

Vision researchers set about applying them

Models vs Appearances

But they didn’t work very well …

By the early 1990s people were experimenting with statistical techniques, e.g. PCA

These learn a statistical summary of the appearance of each view of an object

Appearance Model
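A minimal sketch of the idea, not the method of any specific system: each object (or view) gets a mean image plus its leading principal component, learned from a few training images, and a new image is assigned to the model that reconstructs it with the smallest error. The tiny 4-pixel "images", the function names, and the use of a single component found by power iteration are all assumptions made for illustration.

```python
import math

def fit_model(images, iters=100):
    """Return (mean, leading principal component) for equal-length pixel vectors."""
    n, d = len(images), len(images[0])
    mean = [sum(col) / n for col in zip(*images)]
    centred = [[p - m for p, m in zip(img, mean)] for img in images]
    # Sample covariance matrix of the centred training images.
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return mean, v

def reconstruction_error(image, model):
    """Distance between the image and its projection onto the model's 1-d subspace."""
    mean, v = model
    c = [p - m for p, m in zip(image, mean)]
    coeff = sum(a * b for a, b in zip(c, v))
    residual = [a - coeff * b for a, b in zip(c, v)]
    return math.sqrt(sum(r * r for r in residual))

def classify(image, models):
    """Pick the appearance model with the smallest reconstruction error."""
    return min(models, key=lambda name: reconstruction_error(image, models[name]))

# Hypothetical training data: three 4-pixel "images" per object.
models = {
    "A": fit_model([[1, 2, 3, 4], [2, 4, 6, 8], [3, 6, 9, 12]]),
    "B": fit_model([[4, 3, 2, 1], [8, 6, 4, 2], [12, 9, 6, 3]]),
}
label = classify([4, 8, 12, 16], models)
```

Real appearance models of this kind keep several components per view and work on full images, which is why a separate model is needed for each view.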

Appearance based recognition: SIFT

These statistical approaches characterise some aspects of the appearance of an object that can be used to recognise it

But this means they are (largely) view dependent: you have to learn a different statistical model for each different view

e.g. SIFT based recognition (David Lowe, UBC)

• Find interest points in the scale space

• Re-describe the interest points so that they are robust to image translation, scaling and rotation, and partially invariant to illumination changes, affine and 3d projection changes
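The first step can be illustrated with a simplified 1-d sketch: blur the signal at several scales, subtract neighbouring scales (difference of Gaussians), and keep points that are extrema in both position and scale. This is an illustration under stated assumptions, not Lowe's implementation: the 1-d signal, the scale values, the contrast threshold and all function names are chosen here for the example.

```python
import math

def gaussian_kernel(sigma):
    radius = int(3 * sigma) + 1
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]

def blur(signal, sigma):
    """Convolve with a Gaussian, clamping indices at the borders."""
    k = gaussian_kernel(sigma)
    r, n = len(k) // 2, len(signal)
    out = []
    for x in range(n):
        acc = 0.0
        for j, w in enumerate(k):
            acc += w * signal[min(max(x + j - r, 0), n - 1)]
        out.append(acc)
    return out

def dog_extrema(signal, sigmas, threshold=1e-3):
    """Return (position, scale) pairs that are extrema of the DoG stack."""
    levels = [blur(signal, s) for s in sigmas]
    dog = [[b - a for a, b in zip(levels[i], levels[i + 1])]
           for i in range(len(levels) - 1)]
    points = []
    for s in range(1, len(dog) - 1):
        for x in range(1, len(signal) - 1):
            v = dog[s][x]
            neighbours = [dog[s + ds][x + dx]
                          for ds in (-1, 0, 1) for dx in (-1, 0, 1)
                          if (ds, dx) != (0, 0)]
            if abs(v) > threshold and (v > max(neighbours) or v < min(neighbours)):
                points.append((x, sigmas[s]))
    return points

def make_bump(centre, width=2.0, n=64):
    """Hypothetical test signal: a single Gaussian blob."""
    return [math.exp(-((x - centre) ** 2) / (2 * width * width)) for x in range(n)]

sigmas = [1.0, 1.6, 2.56, 4.096, 6.5536]
points = dog_extrema(make_bump(20), sigmas)
shifted = dog_extrema(make_bump(30), sigmas)
```

Translating the blob by 10 samples moves the detected points by the same amount, which is the sense in which the interest points are robust to image translation.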

Category level recognition (Thanks to Bastian Leibe)

Constellation model (Thanks to Bastian Leibe)

Implicit Shape Model (Thanks to Bastian Leibe)

Towards Scalable Representations of Object Categories: Learning a Hierarchy of Parts

Aleš Leonardis and Sanja Fidler

University of Ljubljana, Faculty of Computer and Information Science

Visual Cognitive Systems Laboratory

Reproduced with permission

Framework

Main properties of the framework:

• Computational plausibility

Hierarchical representation

Compositionality (parts composed of parts)

Indexing & matching recognition scheme

• Statistics driven learning (unsupervised learning)

• Fast, incremental (continuous) learning

Recognition: Indexing and matching

image

car motorcycle dog person

hypotheses

verification

Gradually limiting the search

LEARN
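The hypothesise-then-verify flow on this slide can be sketched as follows: detected parts index into the categories that contain them (cheap, and narrows the search), and only the surviving hypotheses are matched against their full models. The part names, the voting rule and the verification threshold are assumptions for illustration, not the scheme used in the presented framework.

```python
from collections import Counter

# Hypothetical part library: which parts each category model expects.
MODELS = {
    "car": {"wheel", "window", "bumper"},
    "motorcycle": {"wheel", "handlebar", "seat"},
    "dog": {"leg", "tail", "snout"},
    "person": {"leg", "arm", "head"},
}

def build_index(models):
    """Inverted index: part -> categories that contain it."""
    index = {}
    for category, parts in models.items():
        for part in parts:
            index.setdefault(part, set()).add(category)
    return index

def hypothesise(detected, index, top_k=2):
    """Indexing: each detected part votes for the categories it belongs to."""
    votes = Counter()
    for part in detected:
        for category in index.get(part, ()):
            votes[category] += 1
    return [category for category, _ in votes.most_common(top_k)]

def verify(detected, models, hypotheses, min_fraction=0.6):
    """Matching: accept the hypothesis whose model is best covered by the detections."""
    best, best_score = None, 0.0
    for category in hypotheses:
        score = len(models[category] & detected) / len(models[category])
        if score >= min_fraction and score > best_score:
            best, best_score = category, score
    return best

def recognise(detected, models=MODELS):
    index = build_index(models)
    return verify(detected, models, hypothesise(detected, index))

result = recognise({"wheel", "handlebar", "seat"})
```

A single ambiguous part such as a wheel produces hypotheses (car, motorcycle) but none passes verification, so `recognise` returns `None`.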

Overview of the architecture

Starts with simple, local features and learns more and more complex compositions

Learns layer after layer to exploit the regularities in natural images as efficiently and compactly as possible

Builds computationally feasible layers of parts by selecting only the most statistically significant compositions of specific granularity

Learns lower layers in a category independent way (to obtain optimally sharable parts) and category specific higher layers which contain only a small number of highly generalizable parts for each category

New categories can efficiently and continuously be added to the representation without the need to restructure the complete hierarchy

Implements parts in a robust, layered interplay of indexing & matching
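One way to picture the layer-building step is a frequency count: form candidate compositions from pairs of lower-layer parts and their relative offset, then keep only those that recur across many images. The sketch below substitutes a plain support threshold for the statistical significance test, and the part names, coordinates and function names are hypothetical; it is not the authors' learning procedure.

```python
from collections import Counter
from itertools import combinations

def compositions_in(image):
    """Candidate compositions: (part_a, part_b, dx, dy) for every pair of detections."""
    found = set()
    for (pa, xa, ya), (pb, xb, yb) in combinations(sorted(image), 2):
        found.add((pa, pb, xb - xa, yb - ya))
    return found

def learn_layer(images, min_support=0.6):
    """Keep compositions that occur in at least min_support of the images."""
    counts = Counter()
    for image in images:
        counts.update(compositions_in(image))
    n = len(images)
    return {comp for comp, k in counts.items() if k / n >= min_support}

# Hypothetical layer-1 detections: (part name, x, y) per image.
images = [
    [("edge_h", 0, 0), ("edge_v", 1, 0), ("blob", 5, 5)],
    [("edge_h", 3, 2), ("edge_v", 4, 2)],
    [("edge_h", 1, 1), ("edge_v", 2, 1), ("blob", 0, 7)],
]
layer2 = learn_layer(images)
```

Only the horizontal-edge plus vertical-edge pair at offset (1, 0) survives, because it appears in every image regardless of where it sits; the learned compositions would then serve as the parts for the next layer.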

Part based appearance recognition (Fidler & Leonardis 07)

Results

Learned hierarchy for faces and cars (first three layers are the same; links show compositionality for each of the categories; spatial variability of parts is not shown)

Part based appearance recognition (Fidler & Leonardis 07)

Results - Detections

Results - Specific categories, faces

Detection of Layer 5 parts

Evidence from biology

Is human object recognition view dependent?

Shepard & Metzler

Pinker & Tarr

There is quite a large body of experimental data that supports the view dependent camp.

Appearance based approaches fit neatly with this camp.

Summary

This is not a resolved debate

There is evidence for both sides

Structural 3d information is almost certainly extracted by the brain too

Model based: how do we extract good enough low level features (e.g. a depth map)?

Appearance based: only seems to be good for recognition, which is a small part of the vision problem.