
Visual Attention and Recognition Through Neuromorphic Modeling of “Where” and “What” Pathways

Zhengping Ji
Embodied Intelligence Laboratory
Computer Science and Engineering
Michigan State University, Lansing, USA

Outline

- Attention and recognition: a chicken-and-egg problem
- Motivation: brain-inspired, neuromorphic modeling of the brain’s visual pathway
- Saliency-based attention
- Where-What Network (WWN):
  - How to integrate saliency-based attention and top-down attention control
  - How attention and recognition help each other
- Conclusions and future work

What is attention?

Bottom-up Attention (Saliency)

Attention Shifting

Spatial Top-down Attention Control

e.g., pay attention to the center

Object-based Top-down Attention Control

e.g., pay attention to the square

Chicken-egg Problem

- Without attention, recognition cannot do well: recognition requires attended areas for further processing.
- Without recognition, attention is limited: attention relies not only on bottom-up saliency-based cues, but also on top-down object-dependent signals and top-down spatial controls.

Problem

Challenge

- High-dimensional input space
- Background noise
- Large variance: scale, shape, illumination, viewpoint, …

Saliency-based Attention (I)

[Figure: an IHDR-tree-based road-following system; attention windows Win1-Win6 with labels e1-e6 along the desired path]

- Boundary detection part: the mapping from two visual images to the correct road boundary type for each sub-window (reinforcement learning)
- Action generation part: the mapping from road boundary type to the correct heading direction (supervised learning)
- Naïve way: choose the attention window by guessing

Saliency-based Attention (II)

- Low-level image processing
- Itti & Koch et al., 1998
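As a rough illustration of this style of low-level saliency computation, the sketch below builds an intensity map from center-surround differences at a few Gaussian scales. It is a minimal sketch of the center-surround idea only, not the full Itti-Koch model; the scales and the normalization are illustrative assumptions.

    # Minimal center-surround intensity saliency (illustrative; not the full
    # Itti-Koch model). Assumes a grayscale image with values in [0, 1].
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def intensity_saliency(img, center_sigmas=(1, 2, 4), surround_factor=4):
        """Sum |center - surround| contrast maps over a few Gaussian scales."""
        saliency = np.zeros_like(img, dtype=float)
        for sigma in center_sigmas:
            center = gaussian_filter(img, sigma)
            surround = gaussian_filter(img, sigma * surround_factor)
            saliency += np.abs(center - surround)    # center-surround contrast
        return saliency / (saliency.max() + 1e-12)   # peak normalized to 1

    # The most salient pixel can serve as the first bottom-up fixation:
    # r, c = np.unravel_index(np.argmax(intensity_saliency(img)), img.shape)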

Review

- Attention and recognition: a chicken-and-egg problem
- Motivation: brain-inspired, neuromorphic modeling of the brain’s visual pathway
- Saliency-based attention
- Where-What Network (WWN):
  - How to integrate saliency-based attention and top-down attention control
  - How attention and recognition help each other
- Conclusions and future work

Biological Motivations

Challenge: Foreground Teaching

- How does a neuron separate a foreground from a complex background?
- No teacher is needed to hand-segment the foreground
- Fixed foreground, changing background (e.g., during a baby’s object tracking)
- The background weights are averaged out, so the background has no effect during neuronal competition (see the sketch below)
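A tiny numerical sketch of this averaging argument (toy 1-D “images”; all sizes and counts are arbitrary assumptions): with a fixed foreground and independently random backgrounds, an accumulated average keeps the foreground contrast while the background flattens toward its mean.

    # Toy demonstration: averaging inputs with a fixed foreground and random
    # backgrounds leaves the foreground pattern over a near-flat background.
    import numpy as np

    rng = np.random.default_rng(0)
    foreground = np.array([1.0, 0.0, 1.0, 0.0])   # fixed 4-pixel foreground
    acc = np.zeros(12)
    n_samples = 10000

    for _ in range(n_samples):
        x = rng.random(12)          # random background "image"
        x[4:8] = foreground         # foreground at a fixed position
        acc += x                    # Hebbian-style accumulation

    avg = acc / n_samples
    print(np.round(avg, 2))
    # Background entries hover near 0.5 (their mean); the foreground entries
    # keep their contrast, so neuronal competition is driven by the foreground.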

Novelty

Prior work:
- Bottom-up attention: Koch & Ullman 1985; Itti & Koch et al. 1998; Baker et al. 2001; etc.
- Position-based top-down control: Olshausen et al. 1993; Tsotsos et al. 1995; Mozer et al. 1996; Schill et al. 2001; Rao et al. 2004; etc.
- Object-based top-down control: Deco & Rolls 2004 (no performance evaluation); etc.

Our work:
- Saliency is based on developed features
- Both bottom-up and top-down attention control
- Top-down control: by object, by position, or none
- Attention and recognition form a single process

ICDL Architecture

- Pathway: Image (40*40) → V1 → V2 → “what” motor and “where” motor
- Local receptive fields: 11*11 and 21*21; connections to the two motors are global
- “Where” motor: (r, c), a 40*40 pixel-based representation
- Foreground size fixed: 20*20

Multi-level Receptive Fields

Layer Computation

- Compute the pre-response of cell (i, j) at time t
- Sort the pre-responses: z1 ≥ z2 ≥ … ≥ zk ≥ … ≥ zm
- Only the top-k neurons respond, to keep selectiveness and long-term memory
- The response range is normalized
- Update the local winners (see the sketch below)
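A minimal sketch of this top-k competition, assuming the pre-responses of one layer are collected in a vector; the exact normalization used in the network may differ from the baseline chosen here.

    # Top-k spatial competition: only the k strongest neurons respond, with
    # their responses normalized; all other neurons are silenced.
    import numpy as np

    def topk_responses(pre_responses, k):
        """pre_responses: 1-D array of z_i; returns sparse responses, winners."""
        z = np.asarray(pre_responses, dtype=float)
        order = np.argsort(z)[::-1]          # z1 >= z2 >= ... >= zm
        winners = order[:k]
        z1 = z[order[0]]
        z_cut = z[order[k]] if k < z.size else z.min()  # (k+1)-th as baseline
        y = np.zeros_like(z)
        y[winners] = (z[winners] - z_cut) / (z1 - z_cut + 1e-12)  # into [0, 1]
        return y, winners                    # only the winners are updated later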

In-place Learning Rule

- Do not use back-propagation:
  - Not biologically plausible
  - Does not give long-term memory
- Do not use any distribution model (e.g., a Gaussian mixture): avoids the high complexity of a covariance matrix
- New Hebbian-like rule with automatic plasticity scheduling: only winners update (see the sketch below)
- Minimum error toward the target at every incremental estimation stage (local first principal component)
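A minimal sketch of a Hebbian-like winner update with a plasticity schedule in this spirit, using an amnesic average; the schedule function and its constants are illustrative assumptions, not the exact values of this work.

    # Hebbian-like in-place update: only a winner updates, with retention and
    # learning rates scheduled by the neuron's own firing age n.
    def amnesic_mu(n, t1=20, t2=200, c=2.0, r=2000.0):
        """Amnesic parameter: zero early on, then grows so that recent inputs
        keep influence (the constants here are illustrative)."""
        if n < t1:
            return 0.0
        if n < t2:
            return c * (n - t1) / (t2 - t1)
        return c + (n - t2) / r

    def winner_update(w, x, y, n):
        """Move winner weights w toward the response-weighted input y * x."""
        mu = amnesic_mu(n)
        retain = (n - 1 - mu) / n        # retention rate
        learn = (1 + mu) / n             # learning rate; retain + learn == 1
        return retain * w + learn * y * x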

Top-down Attention

- Recruit and identify class-invariant features
- Recruit and identify position-invariant features

Experiment

- Foreground objects defined by the “what” motor (20*20)
- Attended areas defined by the “where” motor
- Randomly selected background patches (40*40)

Developed Layer 1

Bottom-up synaptic weights of neurons in Layer 1, developed from randomly selected patches of natural images.

Developed Layer 2

Bottom-up synaptic weights of neurons in Layer 2. These raw weights are not intuitive to interpret!

Response Weighted Stimuli for Layer 2
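Response-weighted stimuli make the developed Layer 2 features interpretable: each neuron is visualized by the average of the inputs that drove it, weighted by its response. A minimal sketch; the exact weighting scheme used in this work is an assumption here.

    # Response-weighted stimulus: visualize a neuron by the response-weighted
    # average of the input images presented during development.
    import numpy as np

    def response_weighted_stimulus(stimuli, responses):
        """stimuli: (T, H, W) inputs; responses: (T,) responses of one neuron."""
        r = np.asarray(responses, dtype=float)
        x = np.asarray(stimuli, dtype=float)
        weighted_sum = np.tensordot(r, x, axes=1)   # sum over t of r(t) * x(t)
        return weighted_sum / (r.sum() + 1e-12)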

Experimental Result I

Recognition rate with incremental learning

Experimental Result II

(a) Examples of input images; (b) responses of the attention (“where”) motor when supervised by the “what” motor; (c) responses of the attention (“where”) motor when “what” supervision is not available.

Summary

- The “what” motor helps direct the network’s attention to the features of a particular object.
- The “where” motor helps direct attention to positional information (recognition improves from 45% to 100% accuracy when “where” information is present).
- Saliency-based bottom-up attention, location-based top-down attention, and object-based top-down attention are integrated in the top-k spatial competition rule.

Problems

- The accuracy of the “where” motor is not good: 45.53%
- Layer 1 was developed offline
- More layers are needed to handle more positions
- The “where” motor should be given externally, instead of as a retina-based representation
- No internal iterations, which are especially needed when the number of hidden layers is larger than one
- No cross-level projections

Fully Implemented WWN (Original Design)

- “What” pathway: Image (40*40) → V1 (40*40) → V2 (40*40) → V4 (40*40) → IT (40*40) → “what” motor: 4 objects
- “Where” pathway through V3, MT, LIP, and PP → “where” motor: (r, c), 25 centers
- Local receptive fields: 11*11, 11*11, 11*11, 21*21, and 31*31; connections to the motors are global
- Fixed-size motor

Problems

- The accuracies of the “where” and “what” motors are not good: 25.53% for the “what” motor and 4.15% for the “where” motor
- Too many parameters to be tuned
- Training is extremely slow
- How to do the internal iterations (contrasted in the sketch below):
  - “Sweeping” way: always use the most recently updated weights and responses
  - Alternative: always use the weights and responses from iteration p-1, where p is the current iteration count
  - The responses should not be normalized within each lateral inhibition neighborhood
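A small sketch contrasting the two internal-iteration schemes on a toy chain of layers; the neighbor-blending step is a hypothetical stand-in for the real layer computation, not the network’s actual update.

    # Two schemes for iterating a chain of layer responses until they settle.
    # step() is a hypothetical stand-in for combining bottom-up and top-down input.
    import numpy as np

    def step(prev_resp, next_resp):
        return 0.5 * (prev_resp + next_resp)

    def iterate_sweeping(resp, p_max):
        """'Sweeping': each layer immediately sees freshly updated neighbors
        (Gauss-Seidel style)."""
        r = [x.copy() for x in resp]
        for _ in range(p_max):
            for l in range(1, len(r) - 1):
                r[l] = step(r[l - 1], r[l + 1])    # neighbors may be updated
        return r

    def iterate_previous(resp, p_max):
        """Alternative: all layers use responses from iteration p-1
        (Jacobi style)."""
        r = [x.copy() for x in resp]
        for _ in range(p_max):
            old = [x.copy() for x in r]            # freeze iteration p-1
            for l in range(1, len(r) - 1):
                r[l] = step(old[l - 1], old[l + 1])
        return r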

Modified Simple Architecture

- Pathway: Image (40*40) → V1 → V2 → “what” motor (5 objects) and “where” motor: (r, c), 5 centers
- Local receptive fields: 11*11 and 21*21; connections to the two motors are global
- Foreground size fixed: 20*20
- Retina-based supervision

Advantages

- Internal iterations are not necessary
- The network runs much faster
- It is easier to track neural representations and evaluate performance
- Performance evaluation:
  - The “what” motor reaches 100% accuracy on the disjoint test
  - The “where” motor reaches 41.09% accuracy on the disjoint test

Problems

[Figure: bottom-up responses + top-down responses (top-down projection from the motor) = total responses]

Dominance by the top-down projection: the total response is dominated by the top-down term.

Solution

- Sparsify the bottom-up responses by keeping only the local top-k winners of the bottom-up responses (see the sketch below)
- The performance of the “where” motor increases from around 40% to 91%
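A minimal sketch of this fix, assuming bottom-up and top-down responses are combined additively; the neighborhood radius and the additive combination are illustrative assumptions.

    # Sparsify bottom-up responses with a local top-k rule before adding the
    # top-down projection, so the top-down term cannot dominate everywhere.
    import numpy as np

    def local_topk_sparsify(bottom_up, k, radius=2):
        """Keep a bottom-up response only if it is among the top-k within its
        (2*radius+1)^2 neighborhood; zero it otherwise."""
        h, w = bottom_up.shape
        sparse = np.zeros_like(bottom_up)
        for i in range(h):
            for j in range(w):
                i0, i1 = max(0, i - radius), min(h, i + radius + 1)
                j0, j1 = max(0, j - radius), min(w, j + radius + 1)
                hood = bottom_up[i0:i1, j0:j1].ravel()
                kth = np.sort(hood)[-min(k, hood.size)]  # k-th largest locally
                if bottom_up[i, j] >= kth:
                    sparse[i, j] = bottom_up[i, j]
        return sparse

    def total_response(bottom_up, top_down, k=3):
        return local_topk_sparsify(bottom_up, k) + top_down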

Fully Implemented WWN (Latest)

- “What” pathway: Image (40*40) → V1 (40*35) → V2 (40*40) → V4 (40*40) → “what” motor: 5 objects (smoothed by a Gaussian)
- “Where” pathway: MT (40*40) → “where” motor: (r, c), 3*3 centers (smoothed by a Gaussian)
- Local receptive fields: 11*11, 11*11, 11*11, and 21*21; foreground fixed size: 20*20
- Each cortex: Modified ADAST

Modified ADAST

[Figure: laminar circuit within each cortex. Previous cortex → L4 → L2/3 → next cortex, with L6 (ranking) and L5 (ranking).]

Other Improvements

- Smooth the external motors using a Gaussian function (see the sketch below)
- “Where” motors are evaluated by regression errors
- The local top-k is adaptive to neuron positions
- The network does not converge through internal iterations; the learning rate for top-down excitation is adapted over internal iterations
- Use context information
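A minimal sketch of Gaussian smoothing for an external “where” motor target (the grid size and sigma are illustrative assumptions): instead of a one-hot target at the true (r, c), the supervision becomes a Gaussian bump, so neighboring positions share credit.

    # Gaussian-smoothed "where" motor target: a soft bump centered at the true
    # (r, c) instead of a one-hot position.
    import numpy as np

    def gaussian_motor_target(r, c, shape=(40, 40), sigma=2.0):
        rows = np.arange(shape[0])[:, None]
        cols = np.arange(shape[1])[None, :]
        d2 = (rows - r) ** 2 + (cols - c) ** 2
        target = np.exp(-d2 / (2.0 * sigma ** 2))
        return target / target.max()      # peak value 1 at the true position

    # Example: supervise the "where" motor with a bump at row 10, column 25.
    # target = gaussian_motor_target(10, 25)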

Layer 1 – Bottom-up Weights

Layer 2 – Response-weighted Stimuli

Layer 3 (Where) – Top-down Weights

Layer 3 (What) – Top-down Weights

Test Samples

[Figure columns: input; “where” motor (ground truth); “what” motor (ground truth); “where” output (saliency-based); “where” output (“what”-supervised); “what” output (saliency-based); “what” output (“where”-supervised)]

Performance Evaluation

Average error of the “where” and “what” motors (250 test samples):

                                           Without supervision | Supervise “where” | Supervise “what”
“Where” motor (regression error, MSE)      4.137 pixels        | N/A               | 4.137 pixels
“What” motor (classification error, %)     12.7%               | 12.1%             | N/A

Discussions