07. Object Recognition (2001)

Page 1: 07. Object Recognition (2001)

Arbib: CS564 - Brain Theory and Artificial Intelligence, USC, Fall 2001. Lecture 7. Object Recognition

CS564 – Lecture 7. Object Recognition and Scene Analysis

Reading Assignments:

TMB2: Sections 2.2 and 5.2

“Handout”: Extracts from HBTNN 2e Drafts:
- Shimon Edelman and Nathan Intrator: Visual Processing of Object Structure
- Guy Wallis and Heinrich Bülthoff: Object recognition, neurophysiology
- Simon Thorpe and Michèle Fabre-Thorpe: Fast Visual Processing

(My thanks to Laurent Itti and Bosco Tjan for permission to use the slides they prepared for lectures on this topic.)

Page 3: 07. Object Recognition (2001)

Bottom-Up Segmentation or Top-Down Control?

Page 4: 07. Object Recognition (2001)

Object Recognition

What is Object Recognition?
- Segmentation/Figure-Ground Separation: prerequisite or consequence?
- Labeling an object [The focus of most studies]
- Extracting a parametric description as well

Object Recognition versus Scene Analysis
- An object may be part of a scene or itself be recognized as a “scene”

What is Object Recognition for?
- As a context for recognizing something else (locating a house by the tree in the garden)
- As a target for action (climb that tree)

Page 5: 07. Object Recognition (2001)

“What” versus “How” in Human

[Figure: Visual Cortex projects to Parietal Cortex along the “How” (dorsal) pathway (reach programming, grasp programming) and to Inferotemporal Cortex along the “What” (ventral) pathway.]

- DF (Goodale and Milner): lesion in the ventral pathway. Inability to verbalize or pantomime size or orientation.
- AT (Jeannerod et al.): lesion in the dorsal pathway. Inability to preshape (except for objects with size “in the semantics”).
- Monkey data: Mishkin and Ungerleider on “What” versus “Where”.

Page 6: 07. Object Recognition (2001)

Clinical Studies

Studies of patients with visual deficits strongly argue that tight interaction between the where and what/how visual streams is necessary for scene interpretation.

Visual agnosia: can see objects, copy drawings of them, etc., but cannot recognize or name them!

Dorsal agnosia: cannot recognize objects if more than two are presented simultaneously: problem with localization

Ventral agnosia: cannot identify objects.

Page 7: 07. Object Recognition (2001)

These studies suggest…

- We bind features of objects into objects (feature binding)
- We bind objects in space into some arrangement (space binding)
- We perceive the scene.

Feature binding = what/how stream

Space binding = where stream

Double role of spatial relationships:
- To relate different portions of an object or scene as a guide to recognition
- Augmented by other “how” parameters, to guide our behavior with respect to the observed scene.

Page 8: 07. Object Recognition (2001)

Inferotemporal Pathways

Later stages of IT (AIT/CIT) connect to the frontal lobe, whereas earlier ones (CIT/PIT) connect to the parietal lobe. This functional distinction may well be important in forming a complete picture of inter-lobe interaction.

Page 9: 07. Object Recognition (2001)

Shape perception and scene analysis

- Shape-selective neurons in cortex
- Coding: one neuron per object or population codes?
- Biologically-inspired algorithms for shape perception
- The "gist" of a scene: how can we get it in 100ms or less?
- Visual memory: how much do we remember of what we have seen?
- The world as an outside memory and our eyes as a lookup tool

Page 10: 07. Object Recognition (2001)

Face Cells in Monkey

Page 11: 07. Object Recognition (2001)

Object recognition

- The basic issues
- Translation and rotation invariance
- Neural models that do it
- 3D viewpoint invariance (data and models)
- Classical computer vision approaches: template matching and matched filters; wavelet transforms; correlation; etc.
- Examples: face recognition.
- More examples of biologically-inspired object recognition systems which work remarkably well

Page 12: 07. Object Recognition (2001)

Extended Scene Perception

Attention-based analysis: Scan scene with attention, accumulate evidence from detailed local analysis at each attended location.

Main issues:
- what is the internal representation?
- how detailed is memory?
- do we really have a detailed internal representation at all?

Gist: Can very quickly (120ms) classify entire scenes or do simple recognition tasks; can only shift attention twice in that much time!

Page 13: 07. Object Recognition (2001)

Thorpe: Recognizing Whether a Scene Contains an Animal

[Figure, panel A: reaction-time distributions (0 to 1000 ms) for Targets and Distractors, with the Minimum Response Time and the ERP difference onset marked. Panel B: ERP traces (mean of 15 subjects, scale -6 to 6 µV, 100 to 300 ms) for Animal, Non-animal, and their Difference.]

Claim: This is so quick that only feedforward processing can be involved

Page 14: 07. Object Recognition (2001)

Eye Movements: Beyond Feedforward Processing

1) examine scene freely
2) estimate material circumstances of family
3) give ages of the people
4) surmise what family has been doing before arrival of “unexpected visitor”
5) remember clothes worn by the people
6) remember position of people and objects
7) estimate how long the “unexpected visitor” has been away from family

Page 15: 07. Object Recognition (2001)

The World as an Outside Memory

Kevin O’Regan, early 90s:

why build a detailed internal representation of the world?

too complex…

not enough memory…

… and useless?

The world is the memory. Attention and the eyes are a look-up tool!

Page 16: 07. Object Recognition (2001)

The “Attention Hypothesis”

Rensink, 2000

No “integrative buffer”

Early processing extracts information up to “proto-object” complexity in massively parallel manner

Attention is necessary to bind the different proto-objects into complete objects, as well as to bind object and location

Once attention leaves an object, the binding “dissolves.” Not a problem, it can be formed again whenever needed, by shifting attention back to the object.

Only a rather sketchy “virtual representation” is kept in memory, and attention/eye movements are used to gather details as needed

Page 17: 07. Object Recognition (2001)

Challenges of Object Recognition

The binding problem: binding different features (color, orientation, etc.) to yield a unitary percept. (see next slide)

Bottom-up vs. top-down processing: how much is assumed top-down vs. extracted from the image?

Perception vs. recognition vs. categorization: seeing an object vs. seeing it as something. Matching views of known objects to memory vs. matching a novel object to object categories in memory.

Viewpoint invariance: a major issue is to recognize objects irrespective of the viewpoint from which we see them.

Page 18: 07. Object Recognition (2001)

Four stages of representation (Marr, 1982)

1) pixel-based (light intensity)
2) primal sketch (discontinuities in intensity)
3) 2½D sketch (oriented surfaces, relative depth between surfaces)
4) 3D model (shapes, spatial relationships, volumes)

TMB2 view: This may work in ideal cases, but in general “cooperative computation” of multiple visual cues and perceptual schemas will be required.

Problem: computationally intractable!
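
To make the step from stage 1 to stage 2 concrete, here is a minimal sketch that marks intensity discontinuities along one row of pixels. It is only an illustration: the function name, the threshold, and the sample row are assumptions, and simple thresholding of neighboring differences is a crude stand-in rather than the operator Marr actually proposed.

```python
def intensity_discontinuities(row, threshold):
    """Return each index i where the jump from row[i] to row[i + 1] exceeds threshold."""
    return [
        i
        for i in range(len(row) - 1)
        if abs(row[i + 1] - row[i]) > threshold
    ]


# Hypothetical row of light intensities: dark, then a bright bar, then dark again.
row = [10, 10, 11, 50, 52, 51, 12, 10]
edges = intensity_discontinuities(row, threshold=20)
print(edges)  # positions just before each large jump in intensity
```

The two reported positions bracket the bright bar, which is the kind of information a primal sketch hands on to later stages.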

Page 19: 07. Object Recognition (2001)

VISIONS

A computer vision system from 1987 developed by Allen Hanson and Edward Riseman on the basis of the HEARSAY system for speech understanding (TMB2 Sec. 4.2) and Arbib’s Schema Theory (TMB2 Sec. 2.2 and Chap. 5)

This is schema-based and can be “mapped” onto hypotheses about cooperative computation in the brain.

Key idea: Bringing context and scene knowledge into play so that recognition of objects proceeds via islands of reliability to yield a consensus interpretation of the scene.

See TMB2 Sec. 5.2 for the figures.

Page 20: 07. Object Recognition (2001)

Biederman: Recognition by Components

Biederman et al. (1991 – )

“geons”: units of 3D geometric structure

Page 21: 07. Object Recognition (2001)

JIM 3 (Hummel)

Page 22: 07. Object Recognition (2001)

Collection of Fragments (Edelman and Intrator)

Page 23: 07. Object Recognition (2001)

Collection of Fragments 2

Page 25: 07. Object Recognition (2001)

Viewpoint Invariance

Major problem for recognition.

Biederman & Gerhardstein, 1994: We can recognize two views of an unfamiliar object as being the same object.

Thus, viewpoint invariance cannot rely only on matching views to memory.

Page 26: 07. Object Recognition (2001)

Models of Object Recognition

See Hummel, 1995, The Handbook of Brain Theory & Neural Networks

Direct Template Matching:

Processing hierarchy yields activation of view-tuned units.

A collection of view-tuned units is associated with one object.

View-tuned units are built from V4-like units, using sets of weights which differ for each object.

e.g., Poggio & Edelman, 1990; Riesenhuber & Poggio, 1999
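
A minimal sketch of the view-tuned idea, under loudly stated assumptions: each stored view is a short feature vector, each view-tuned unit is a Gaussian bump centered on one stored view, and an object's score is the plain sum of its units (all weights set to 1). The feature vectors, `sigma`, and the function names are hypothetical; the cited models learn their weights and use far richer features.

```python
import math


def view_unit(x, stored_view, sigma=1.0):
    """Gaussian response of one view-tuned unit: 1.0 at its stored view, less elsewhere."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, stored_view))
    return math.exp(-d2 / (2 * sigma ** 2))


def object_score(x, stored_views, sigma=1.0):
    """Sum the responses of all view-tuned units associated with one object."""
    return sum(view_unit(x, v, sigma) for v in stored_views)


def recognize(x, memory, sigma=1.0):
    """Return the object whose collection of view-tuned units responds most strongly."""
    return max(memory, key=lambda name: object_score(x, memory[name], sigma))


# Hypothetical feature vectors standing in for the outputs of V4-like units.
memory = {
    "cup": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "chair": [[0.0, 1.0, 1.0], [0.0, 0.8, 1.0]],
}
label = recognize([0.9, 0.1, 0.0], memory)
print(label)
```

The test input lies between the two stored "cup" views, so both cup units respond strongly even though neither view was matched exactly.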

Page 27: 07. Object Recognition (2001)

Computational Model of Object Recognition (Riesenhuber and Poggio, 1999)

Page 28: 07. Object Recognition (2001)

The model neurons are tuned for size and 3D orientation of object

Page 29: 07. Object Recognition (2001)

Models of Object Recognition

Hierarchical Template Matching:

Image passed through layers of units with progressively more complex features at progressively less specific locations.

Hierarchical in that features at one stage are built from features at earlier stages.

e.g., the Neocognitron of Fukushima & Miyake (1982): Several processing layers, comprising simple (S) and complex (C) cells.
- S-cells in one layer respond to conjunctions of C-cells in previous layer.
- C-cells in one layer are excited by small neighborhoods of S-cells.
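
The S/C alternation can be sketched in one dimension. This is a toy under stated assumptions, not the Neocognitron itself: the S-layer here is an exact binary template match (a conjunction), the C-layer is a maximum over non-overlapping neighborhoods, and the signals, template, and pool size are made up for illustration.

```python
def s_layer(signal, template):
    """S-cells: fire (1) only where the local window matches the whole template."""
    n = len(template)
    return [
        1 if signal[i:i + n] == template else 0
        for i in range(len(signal) - n + 1)
    ]


def c_layer(s_out, pool):
    """C-cells: fire if any S-cell in a small neighborhood fires."""
    return [max(s_out[i:i + pool]) for i in range(0, len(s_out), pool)]


template = [1, 1]
a = [0, 1, 1, 0, 0, 0, 0, 0]
b = [0, 0, 1, 1, 0, 0, 0, 0]  # the same feature shifted one step to the right

s_a, s_b = s_layer(a, template), s_layer(b, template)
c_a, c_b = c_layer(s_a, pool=4), c_layer(s_b, pool=4)
print(s_a != s_b, c_a == c_b)
```

The S-layer outputs differ because the feature moved, while the C-layer outputs agree: this is the sense in which features become "less specific" about location at each stage.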

Page 30: 07. Object Recognition (2001)

Models of Object Recognition

Transform & Match:

First take care of rotation, translation, scale, etc. invariances.

Then recognize based on standardized pixel representation of objects.

e.g., Olshausen et al., 1993, dynamic routing model

Template match: e.g., with an associative memory based on a Hopfield network.
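
As an illustration of the matching stage only, here is a small Hopfield-style associative memory in plain Python. It assumes the transform stage has already produced a standardized pattern of +1/-1 pixels; the two stored patterns, the Hebbian outer-product rule, and the number of update sweeps are choices made for this sketch and are not taken from the cited model.

```python
def train_hopfield(patterns):
    """Hebbian weights: sum of outer products of the stored patterns, zero diagonal."""
    n = len(patterns[0])
    w = [[0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += p[i] * p[j]
    return w


def recall(w, state, sweeps=5):
    """Update units one at a time so the state settles toward a stored pattern."""
    state = list(state)
    n = len(state)
    for _ in range(sweeps):
        for i in range(n):
            h = sum(w[i][j] * state[j] for j in range(n))
            state[i] = 1 if h >= 0 else -1
    return state


# Two hypothetical standardized templates (mutually orthogonal).
p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
w = train_hopfield([p1, p2])

noisy = [-1, 1, 1, 1, -1, -1, -1, -1]  # p1 with its first pixel flipped
restored = recall(w, noisy)
print(restored == p1)
```

Starting from the corrupted input, the network settles on the nearest stored template, which is what "template match with an associative memory" amounts to in this toy setting.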

Page 31: 07. Object Recognition (2001)

Recognition by Components

Structural approach to object recognition:

Biederman, 1987:

Complex objects are composed of simpler pieces

We can recognize a novel/unfamiliar object by parsing it in terms of its component pieces, then comparing the assemblage of pieces to those of known objects.
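
The compare step can be sketched as follows, assuming the parsing has already been done. Each object is represented as a set of (part, relation, part) triples and two descriptions are compared by set overlap; the part and relation names and the overlap measure are hypothetical choices for illustration, not Biederman's actual representation.

```python
def similarity(desc_a, desc_b):
    """Jaccard overlap between two structural descriptions (sets of triples)."""
    a, b = set(desc_a), set(desc_b)
    return len(a & b) / len(a | b)


# Hypothetical structural descriptions of two known objects.
known = {
    "mug": {("cylinder", "side-attached", "curved-handle")},
    "pail": {("cylinder", "top-attached", "curved-handle")},
}

# A novel object, already parsed into its component pieces.
novel = {("cylinder", "side-attached", "curved-handle")}
best = max(known, key=lambda name: similarity(novel, known[name]))
print(best)
```

The two known objects share the same pieces and differ only in how they are attached, so the relation has to be part of the description for the match to come out right.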

Page 32: 07. Object Recognition (2001)

Recognition by components (Biederman, 1987)

GEONS: geometric elements of which all objects are composed (cylinders, cones, etc.). On the order of 30 different shapes.

Skips 2½D sketch: Geons are directly recognized from edges, based on their nonaccidental properties (i.e., 3D features that are usually preserved by the projective imaging process).

Page 33: 07. Object Recognition (2001)

Basic Properties of GEONs

- They are sufficiently different from each other to be easily discriminated
- They are view-invariant (look identical from most viewpoints)
- They are robust to noise (can be identified even with parts of image missing)

Page 34: 07. Object Recognition (2001)

Support for RBC: We can recognize partially occluded objects easily if the occlusions do not obscure the set of geons which constitute the object.

Page 35: 07. Object Recognition (2001)

Potential difficulties

Edelman, 1997

A. Structural description not enough, also need metric info

B. Difficult to extract geons from real images

C. Ambiguity in the structural description: most often we have several candidates

D. For some objects, deriving a structural representation can be difficult

Page 36: 07. Object Recognition (2001)

Geon Neurons in IT?

These are preferred stimuli for some IT neurons.

Page 38: 07. Object Recognition (2001)

Fusiform Face Area in Humans

Page 39: 07. Object Recognition (2001)

Standard View on Visual Processing (Tjan, 1999)

[Figure: representation plotted against visual processing, moving from the first set of properties to the second.]

- Image specific; supports fine discrimination; noise tolerant
- Image invariant; supports generalization; noise sensitive

Page 40: 07. Object Recognition (2001)

Early visual processing

[Figure: primary visual processing feeding separate sites: Face, Place, Common objects? (e.g., Kanwisher et al.; Ishai et al.)]

(Tjan, 1999) Multiple memory/decision sites

Page 41: 07. Object Recognition (2001)

Tjan’s “Recognition by Anarchy”

[Figure: primary visual processing feeds a series of sensory memory sites. Each site makes an independent decision (“R1” … “Ri” … “Rn”) that arrives after its own delay (t1 … ti … tn). The homunculus’ response is the first arriving response.]
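
The decision rule in this scheme is simple enough to state in a few lines. The sketch below assumes each site reports a (delay, response) pair; the delays, their units, and the example responses are hypothetical and only illustrate "take whichever response arrives first".

```python
def first_arriving_response(site_outputs):
    """Return (response, delay) from the site whose decision arrives first."""
    delay, response = min(site_outputs, key=lambda pair: pair[0])
    return response, delay


# Hypothetical (delay, response) pairs from three independent sites.
site_outputs = [(120, "e"), (80, "e"), (200, "c")]
response, delay = first_arriving_response(site_outputs)
print(response, delay)
```

No site waits for or consults the others: the slowest site's disagreeing answer is simply never used.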

Page 42: 07. Object Recognition (2001)

A toy visual system

Task: Identify letters from arbitrary positions & orientations

[Figure: example letter “e”]

Page 43: 07. Object Recognition (2001)

[Figure: processing stages between Image and memory: normalize position, normalize orientation, down-sampling.]
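
One of these stages, position normalization, can be sketched on a small binary grid. The approach here (shift the pattern so its bounding box starts at the top-left corner) is an assumption chosen for simplicity and is not necessarily how the toy system does it; orientation normalization and down-sampling are not shown.

```python
def normalize_position(image):
    """Shift the non-zero pattern so its bounding box starts at row 0, column 0."""
    height, width = len(image), len(image[0])
    rows = [r for r in range(height) if any(image[r])]
    cols = [c for c in range(width) if any(row[c] for row in image)]
    out = [[0] * width for _ in range(height)]
    if not rows:
        return out  # blank image: nothing to shift
    r0, c0 = rows[0], cols[0]
    for r in range(height):
        for c in range(width):
            if image[r][c]:
                out[r - r0][c - c0] = image[r][c]
    return out


# The same hypothetical 3-pixel shape at two different positions.
a = [[0, 0, 0, 0],
     [0, 1, 1, 0],
     [0, 1, 0, 0],
     [0, 0, 0, 0]]
b = [[0, 0, 0, 0],
     [0, 0, 0, 0],
     [0, 0, 1, 1],
     [0, 0, 1, 0]]
print(normalize_position(a) == normalize_position(b))
```

After normalization the two inputs are identical, so a memory that stores the normalized form needs only one entry per shape rather than one per position.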

Page 44: 07. Object Recognition (2001)

[Figure: the same stages (Image, normalize position, normalize orientation, down-sampling), now with a separate memory at each of three sites: Site 1, Site 2, Site 3.]

Page 45: 07. Object Recognition (2001)

Study stimuli: 5 orientations, 20 positions at high SNR

Test stimuli:
1) familiar (studied) views,
2) new positions,
3) new positions & orientations

Signal-to-Noise Ratio {RMS Contrast}: 1800 {30%}, 1500 {25%}, 800 {20%}, 450 {15%}, 210 {10%}

Page 46: 07. Object Recognition (2001)

[Figure: the three sites (Site 1, Site 2, Site 3), labeled raw image, norm. pos., norm. ori.]

Processing speed for each recognition module depends on recognition difficulty by that module.

Page 47: 07. Object Recognition (2001)

[Figure: three panels (Familiar views; Novel positions; Novel positions & orientations) plotting Proportion Correct (0 to 1) against Contrast (%) (10 to 100) for Site 1, Site 2, and Site 3 (raw image, norm. pos., norm. ori.).]

Page 48: 07. Object Recognition (2001)

[Figure: three panels (Familiar views; Novel positions; Novel positions & orientations) plotting Proportion Correct (0 to 1) against Contrast (%) (10 to 100) for Site 1, Site 2, and Site 3 (raw image, norm. pos., norm. ori.).]

Black curve: full model in which recognition is based on the fastest of the responses from the three stages.

Page 50: 07. Object Recognition (2001)

Experimental techniques in visual neuroscience

- Recording from neurons: electrophysiology

- Multi-unit recording using electrode arrays

- Stimulating while recording

- Anesthetized vs. awake animals

- Single-neuron recording in awake humans

- Probing the limits of vision: visual psychophysics

- Functional neuroimaging: Techniques

- Experimental design issues

- Optical imaging

- Transcranial magnetic stimulation