Visual Attention and Recognition Through Neuromorphic Modeling of “Where” and “What” Pathways
Zhengping Ji
Embodied Intelligence Laboratory
Computer Science and Engineering
Michigan State University, East Lansing, USA
Outline
- Attention and recognition: the chicken-egg problem
- Motivation: brain-inspired, neuromorphic modeling of the brain’s visual pathway
- Saliency-based attention
- Where-What Network (WWN):
  - How to integrate saliency-based attention and top-down attention control
  - How attention and recognition help each other
- Conclusions and future work
What is attention?
Bottom-up Attention (Saliency)

Attention Shifting
Spatial Top-down Attention Control
e.g. pay attention to the center

Object-based Top-down Attention Control
e.g. pay attention to the square
Chicken-egg Problem
Without attention, recognition cannot do well: recognition requires attended areas for further processing.
Without recognition, attention is limited: attention must draw not only on bottom-up saliency-based cues, but also on top-down object-dependent signals and top-down spatial control.
Problem
Challenge
- High-dimensional space
- Background noise
- Large variance: scale, shape, illumination, viewpoint, ...
Saliency-based Attention (I)
[Diagram: IHDR trees, heading direction, and attention windows Win1–Win6 with errors e1–e6 along a desired path]
- Boundary detection part: the mapping from two visual images to the correct road boundary type for each sub-window (reinforcement learning).
- Action generation part: the mapping from road boundary type to the correct heading direction (supervised learning).
- Naïve way: choose the attention windows by guessing.
Saliency-based Attention (II)
Low-level image processing (Itti & Koch et al. 1998)
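The Itti & Koch-style saliency map is built from low-level center-surround contrasts. A minimal NumPy sketch of that idea, assuming a single grayscale channel and using box filters in place of the Gaussian pyramid and the color/orientation channels of the actual model:

```python
import numpy as np

def box_blur(img, k):
    """Separable box blur of width k (a crude stand-in for one pyramid level)."""
    kern = np.ones(k) / k
    tmp = np.apply_along_axis(lambda r: np.convolve(r, kern, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, kern, mode='same'), 0, tmp)

def center_surround_saliency(img, center=3, surround=15):
    """Absolute difference between a fine-scale and a coarse-scale blur,
    normalized to [0, 1]: regions that differ from their surround pop out."""
    diff = np.abs(box_blur(img, center) - box_blur(img, surround))
    return diff / (diff.max() + 1e-12)

# A lone bright pixel on a dark field is most salient at its own location.
img = np.zeros((40, 40))
img[20, 20] = 1.0
sal = center_surround_saliency(img)
```

This is only the saliency front end; the full model also normalizes and sums maps across features and scales before the winner-take-all stage.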
Review
- Attention and recognition: the chicken-egg problem
- Motivation: brain-inspired, neuromorphic modeling of the brain’s visual pathway
- Saliency-based attention
- Where-What Network (WWN):
  - How to integrate saliency-based attention and top-down attention control
  - How attention and recognition help each other
- Conclusions and future work
Biological Motivations
Challenge: Foreground Teaching
How does a neuron separate a foreground from a complex background?
- No teacher is needed to hand-segment the foreground.
- Fixed foreground, changing background (e.g., during a baby’s object tracking).
- The background weights are averaged out (no effect during neuronal competition).
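The claim that a changing background "averages out" of a neuron's weights can be checked with a toy simulation. This is not the WWN learning rule itself, just an incremental mean over inputs with a fixed foreground and a freshly drawn random background each presentation:

```python
import numpy as np

rng = np.random.default_rng(0)

w = np.zeros((8, 8))                 # one neuron's weights over an 8x8 input
for t in range(1, 2001):
    x = rng.random((8, 8))           # background redrawn every presentation
    x[2:6, 2:6] = 1.0                # fixed 4x4 foreground (e.g., tracked object)
    w += (x - w) / t                 # incremental mean of all inputs so far

# Foreground weights lock onto the object; background weights flatten
# toward the background's mean (~0.5), carrying almost no discriminative
# signal for the neuronal competition.
fg_w = w[2:6, 2:6]
bg_mask = np.ones((8, 8), bool)
bg_mask[2:6, 2:6] = False
bg_w = w[bg_mask]
```

After a few hundred presentations the background weights are nearly uniform, which is why no hand-segmentation of the foreground is needed.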
Novelty
- Bottom-up attention: Koch & Ullman 1985; Itti & Koch et al. 1998; Baker et al. 2001; etc.
- Position-based top-down control: Olshausen et al. 1993; Tsotsos et al. 1995; Mozer et al. 1996; Schill et al. 2001; Rao et al. 2004; etc.
- Object-based top-down control: Deco & Rolls 2004 (no performance evaluation); etc.
Our work:
- Saliency is built from developed features.
- Both bottom-up and top-down attention control.
- Top-down control: by object, by position, or none.
- Attention and recognition form a single process.
ICDL Architecture
[Diagram: Image (40*40) → V1 → V2 → motors, with receptive fields of 11*11, 11*11, and 21*21. “What”-motor: global connection. “Where”-motor: (r, c), 40*40, pixel-based; foreground size fixed at 20*20; global connection.]
Multi-level Receptive Fields
Layer Computation
- Compute the pre-response of cell (i, j) at time t.
- Sort: z1 ≥ z2 ≥ … ≥ zk ≥ … ≥ zm.
- Only the top-k neurons respond, to keep selectiveness and long-term memory.
- The response range is normalized.
- Update the local winners.
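The top-k competition step above can be sketched as follows, assuming the normalization shifts by the (k+1)-th pre-response and scales by the winner's range (the exact normalization varies across WWN versions):

```python
import numpy as np

def topk_compete(z, k):
    """Top-k spatial competition: only the k neurons with the largest
    pre-responses fire; their responses are rescaled to (0, 1] and the
    rest are silenced, preserving selectiveness and long-term memory."""
    z = np.asarray(z, dtype=float)
    out = np.zeros_like(z)
    order = np.argsort(z)[::-1]                  # indices by pre-response, descending
    winners = order[:k]
    z_ref = z[order[k]] if k < z.size else 0.0   # (k+1)-th largest as the floor
    span = z[winners[0]] - z_ref
    if span > 0:
        out[winners] = (z[winners] - z_ref) / span
    else:
        out[winners] = 1.0                       # all winners tied
    return out
```

For example, pre-responses [0.9, 0.1, 0.5, 0.3] with k = 2 keep only neurons 0 and 2, with normalized responses 1.0 and 1/3.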
In-place Learning Rule
- Do not use back-prop: it is not biologically plausible and does not give long-term memory.
- Do not use any distribution model (e.g., Gaussian mixture): avoid the high complexity of the covariance matrix.
- New Hebbian-like rule with automatic plasticity scheduling: only winners update.
- Minimum error toward the target in every incremental estimation stage (local first principal component).
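The Hebbian-like rule can be sketched as an amnesic incremental average of response-weighted inputs; only a winning neuron calls this, and its own update count n implements the automatic plasticity scheduling. The amnesic parameters (t1, t2, c, r) here are illustrative, not the published values:

```python
import numpy as np

def inplace_update(w, x, y, n, t1=20, t2=200, c=2.0, r=2000.0):
    """One in-place (no back-prop) update of a winner's weight vector.

    w: synaptic weights; x: input; y: this neuron's normalized response;
    n: the neuron's own firing count. The amnesic term mu(n) boosts the
    learning rate for mature neurons so late samples are not drowned out,
    giving a minimum-error incremental estimate that tracks the local
    first principal component of the neuron's input cluster.
    """
    if n < t1:
        mu = 0.0
    elif n < t2:
        mu = c * (n - t1) / (t2 - t1)
    else:
        mu = c + (n - t2) / r
    lr = (1.0 + mu) / n                  # retention rate is 1 - lr
    return (1.0 - lr) * w + lr * y * x
```

For n below t1 this is an exact running mean: repeatedly feeding the same input with response 1 leaves w equal to that input, with no covariance matrix ever formed.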
Top-down Attention
- Recruit and identify class-invariant features.
- Recruit and identify position-invariant features.
Experiment
- Foreground objects defined by the “what” motor (20*20).
- Attended areas defined by the “where” motor.
- Randomly selected background patches (40*40).
Developed Layer 1
Bottom-up synaptic weights of neurons in Layer 1, developed from randomly selected patches of natural images.
Developed Layer 2
Bottom-up synaptic weights of neurons in Layer 2.
Not intuitive to interpret!
Response-Weighted Stimuli for Layer 2
Experimental Result I
Recognition rate with incremental learning
Experimental Result II
(a) Examples of input images; (b) responses of the attention (“where”) motor when supervised by the “what” motor; (c) responses of the attention (“where”) motor when “what” supervision is not available.
Summary
- The “what” motor helps direct the network’s attention to features of a particular object.
- The “where” motor helps direct attention to positional information (from 45% to 100% accuracy when “where” information is present).
- Saliency-based bottom-up attention, location-based top-down attention, and object-based top-down attention are integrated in the top-k spatial competition rule.
Problems
- The accuracy of the “where” motor is not good: 45.53%.
- Layer 1 was developed offline.
- More layers are needed to handle more positions.
- The “where” motor should be given externally, instead of as a retina-based representation.
- No internal iterations, especially when the number of hidden layers is larger than one.
- No cross-level projections.
Fully Implemented WWN (Original Design)
[Diagram: Image (40*40) → V1 (40*40) → V2 (40*40) → V4 (40*40), plus MT, V3/LIP (31*31), PP, and IT (40*40); receptive fields 11*11, 11*11, 11*11, and 21*21; global connections. “Where”-motor: (r, c), 25 centers. “What”-motor: 4 objects, fixed-size motor.]
Problems
- The accuracies of the “where” and “what” motors are not good: 25.53% for the “what” motor and 4.15% for the “where” motor.
- Too many parameters to tune.
- Training is extremely slow.
- How to do the internal iterations?
  - “Sweeping” way: always use the most recently updated weights and responses.
  - Alternative: always use the weights and responses from iteration p-1, where p is the current iteration count.
  - The response should not be normalized in each lateral inhibition neighborhood.
Modified Simple Architecture
[Diagram: Image (40*40) → V1 → V2 → motors, with receptive fields of 11*11, 11*11, and 21*21; retina-based supervision. “What”-motor: 5 objects, global connection. “Where”-motor: (r, c), 5 centers, global connection; foreground size fixed at 20*20.]
Advantage
- Internal iterations are not necessary.
- The network runs much faster.
- It is easier to track neural representations and evaluate performance.
- Performance evaluation:
  - The “what” motor reaches 100% accuracy on the disjoint test.
  - The “where” motor reaches 41.09% accuracy on the disjoint test.
Problems
[Figure: top-down projection from the motor + bottom-up responses → top-down responses and total responses]
Dominance by the top-down projection: the total responses are dominated by the top-down projection rather than by the bottom-up responses.
Solution
- Sparsify the bottom-up responses by keeping only the local top-k winners of the bottom-up responses.
- The performance of the “where” motor increases from around 40% to 91%.
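The fix for top-down dominance, keeping only the local top-k bottom-up winners before the top-down input is added, might look like the following sketch (the neighborhood radius and k are illustrative, not the deck's values):

```python
import numpy as np

def local_topk_sparsify(resp, radius=1, k=1):
    """Zero every bottom-up response that is not among the top-k in its
    (2*radius+1)^2 neighborhood. A dense bottom-up map is easily drowned
    out by a strong top-down projection; sparsifying it first preserves
    the stimulus-driven peaks in the total response."""
    h, w = resp.shape
    out = np.zeros_like(resp)
    for i in range(h):
        for j in range(w):
            i0, i1 = max(0, i - radius), min(h, i + radius + 1)
            j0, j1 = max(0, j - radius), min(w, j + radius + 1)
            hood = resp[i0:i1, j0:j1].ravel()
            kth = np.sort(hood)[-min(k, hood.size)]   # k-th largest nearby
            if resp[i, j] >= kth:
                out[i, j] = resp[i, j]
    return out
```

Only response peaks that win locally survive to compete with the top-down projection.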
Fully Implemented WWN (Latest)
[Diagram: Image (40*40) → V1 (40*35) → V2 (40*40) → V4 (40*40), plus MT; receptive fields 11*11, 11*11, 11*11, and 21*21; each cortex uses the modified ADAST. “Where”-motor: (r, c), 3*3 centers, fixed size 20*20, smoothed by a Gaussian (40*40). “What”-motor: 5 objects, smoothed by a Gaussian.]
Modified ADAST
[Diagram: previous cortex → L4 → L2/3 → next cortex, with L6 (ranking) and L5 (ranking) layers.]
Other Improvements
- Smooth the external motors using a Gaussian function.
- “Where” motors are evaluated by regression errors.
- The local top-k is adaptive to neuron positions.
- The network does not converge through internal iterations.
- The learning rate for top-down excitation is adaptive over internal iterations.
- Use context information.
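Smoothing an external motor with a Gaussian can be sketched as replacing the one-hot supervision vector with a bump centered on the target unit, so neighboring motor neurons receive graded rather than all-or-nothing teaching signals (the sigma value is illustrative):

```python
import numpy as np

def gaussian_motor(target, n, sigma=1.0):
    """Gaussian-smoothed supervision for an n-unit motor: the target unit
    gets 1.0 and its neighbors fall off smoothly instead of being zero."""
    idx = np.arange(n)
    v = np.exp(-0.5 * ((idx - target) / sigma) ** 2)
    return v / v.max()
```

This graded target also makes a regression-error evaluation of the "where" motor natural, since nearby positions are no longer penalized as hard misses.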
Layer 1 – Bottom-up Weights
Layer 2 – Response-weighted Stimuli
Layer 3 (Where) – Top-down Weights
Layer 3 (What) – Top-down Weights
Test Samples
[Figure: input; “where” motor (ground truth); “what” motor (ground truth); “where” output (saliency-based); “where” output (“what”-supervised); “what” output (saliency-based); “what” output (“where”-supervised)]
Performance Evaluation
Average error for the “where” and “what” motors (250 test samples):

Motor                            | Without supervision | Supervise “where” | Supervise “what”
“Where” (regression error: MSE)  | 4.137 pixels        | N/A               | 4.137 pixels
“What” (classification error: %) | 12.7%               | 12.1%             | N/A
Discussions