qingqun kong 2011.7 - iavision.ia.ac.cn/zh/senimar/reports/visnet.pdf · visnet a model of...
Qingqun Kong
2011.7.12
Visnet A model of invariant object recognition
Edmund T. Rolls and Gustavo Deco, "Computational Neuroscience of Vision", Oxford University Press, 2002
Visnet A model of invariant object representation
Hierarchical network
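Outline
Introduction to object recognition
Physiological mechanisms of object recognition
Approaches to object recognition
Visnet
Implementation of Visnet and analysis of results
Future work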
Invariant object recognition
[Diagram: Inputs (images of different objects at different positions) -> model (Visnet) -> Outputs (invariant object representation; labels)]
Invariant object recognition
Solving translation (view, size, …) invariance: responding to the same local spatial arrangement while ignoring the global position of the object
Recognizing the object under different transforms after just a few seconds of inspecting it
Neurophysiological mechanisms
Hierarchical network
Feed-forward connections
Lateral connections
Neurophysiological mechanisms
Hierarchical network
Sparse representation
Local representation
Distributed representation
Representing similarity by vector correlation;
Exponential coding capacity;
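The contrast between local and distributed representations can be illustrated with a small sketch. It assumes the population sparseness measure a = (Σy/N)² / (Σy²/N) and uses the Pearson correlation as the "vector correlation"; the function names and the toy firing-rate vectors are hypothetical and do not come from the slides.

```python
import math

def sparseness(rates):
    """Population sparseness a = (mean rate)^2 / (mean squared rate)."""
    n = len(rates)
    mean = sum(rates) / n
    mean_sq = sum(r * r for r in rates) / n
    return (mean * mean) / mean_sq if mean_sq > 0 else 0.0

def correlation(a, b):
    """Pearson correlation between two firing-rate vectors of equal length."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

n_cells = 4
local_code = [1.0, 0.0, 0.0, 0.0]        # one cell per stimulus
distributed_a = [1.0, 1.0, 1.0, 0.0]     # toy distributed patterns
distributed_b = [1.0, 1.0, 0.0, 0.0]

# With binary cells, a local code can label n_cells stimuli,
# while a distributed code has up to 2 ** n_cells distinct patterns.
capacity_local = n_cells
capacity_distributed = 2 ** n_cells

similarity = correlation(distributed_a, distributed_b)
```

In this toy case the local code has sparseness 1/N, and overlapping distributed patterns give a graded similarity value instead of the all-or-none match a local code would provide.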
Neurophysiological mechanisms
Hierarchical network
Sparse coding
Temporal properties
When an object is translated to a nearby position, the shift occurs within a short period, so the membrane of the postsynaptic neuron is still in its 'Hebb-modifiable' state, and the presynaptic afferents activated by the object in its new position can be strengthened onto that neuron.
Approaches to invariant object recognition
Feature space
Regardless of the relative arrangement of the features
Some birds (pigeons)
Structural descriptions and syntactic pattern recognition
3D descriptions
Necessary for language to provide descriptions of objects
Template matching and the alignment approach
Active vision (some invertebrates)
Feature hierarchies and 2D view-based object recognition
Visnet
Visnet
Architecture of Visnet
The forward connections to individual cells are derived from a topologically corresponding region of the preceding layer, using a Gaussian distribution of connection probabilities.
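The connection scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the layer size (32), the number of connections per cell (50), the Gaussian radius (3.0) and the wrap-around at the layer edges are all assumptions chosen to keep the example short, and `sample_afferents` is a hypothetical helper name.

```python
import random

def sample_afferents(cell_xy, prev_size, n_connections, radius, rng):
    """Pick presynaptic cells in the preceding layer for one postsynaptic cell.

    Positions are drawn from a Gaussian centred on the topologically
    corresponding location, so nearby cells are more likely to connect.
    """
    cx, cy = cell_xy
    chosen = set()
    while len(chosen) < n_connections:
        # Gaussian offset around the corresponding position; wrapping at the
        # edges is an assumption made only to keep the sketch simple.
        x = int(round(rng.gauss(cx, radius))) % prev_size
        y = int(round(rng.gauss(cy, radius))) % prev_size
        chosen.add((x, y))  # the set keeps each afferent unique
    return sorted(chosen)

rng = random.Random(0)  # fixed seed for a repeatable example
afferents = sample_afferents((16, 16), prev_size=32, n_connections=50,
                             radius=3.0, rng=rng)
```

Each cell in the next layer would call the helper with its own position, giving every cell a different but topographically organised set of inputs.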
Input to Visnet
[Diagram: Camera -> images of different objects at different positions on the retina, $I(x, y)$ -> filter $\Gamma_{xy}(\rho, \theta, f)$ (V1) -> Inputs -> Visnet -> Outputs]
The input to Visnet is the filtered image:
$$I(x, y) * \Gamma_{xy}(\rho, \theta, f)$$
$$\Gamma_{xy}(\rho, \theta, f) = \rho \left[ e^{-\left( \frac{x\cos\theta + y\sin\theta}{\sqrt{2}/f} \right)^2} - \frac{1}{1.6}\, e^{-\left( \frac{x\cos\theta + y\sin\theta}{1.6\sqrt{2}/f} \right)^2} \right] e^{-\left( \frac{x\sin\theta - y\cos\theta}{3\sqrt{2}/f} \right)^2}$$
$\rho = \pm 1$; $f = 0.5, 0.25, 0.125, 0.0625$; $\theta = 0°, 45°, 90°, 135°$
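As an illustration of the filtering stage, the sketch below evaluates an oriented difference-of-Gaussians filter of the form given in Rolls and Deco (2002), with sign ρ = ±1, four spatial frequencies and four orientations. The kernel half-width of 6 pixels and the function names are assumptions for the example only.

```python
import math

def dog_filter(x, y, rho, theta_deg, f):
    """Value of the oriented difference-of-Gaussians filter at offset (x, y)."""
    theta = math.radians(theta_deg)
    u = x * math.cos(theta) + y * math.sin(theta)  # coordinate across the bar
    v = x * math.sin(theta) - y * math.cos(theta)  # coordinate along the bar
    s = math.sqrt(2.0) / f                         # width set by frequency f
    centre = math.exp(-(u / s) ** 2)
    surround = math.exp(-(u / (1.6 * s)) ** 2) / 1.6
    envelope = math.exp(-(v / (3.0 * s)) ** 2)
    return rho * (centre - surround) * envelope

def filter_kernel(rho, theta_deg, f, half_width):
    """Sample the filter on a square grid (half_width is an assumed size)."""
    span = range(-half_width, half_width + 1)
    return [[dog_filter(x, y, rho, theta_deg, f) for x in span] for y in span]

# One kernel per (sign, orientation, spatial frequency) combination.
bank = {
    (rho, theta, f): filter_kernel(rho, theta, f, half_width=6)
    for rho in (1, -1)
    for theta in (0, 45, 90, 135)
    for f in (0.5, 0.25, 0.125, 0.0625)
}
```

Convolving the retinal image with each kernel in `bank` would give the set of filtered maps from which the first layer samples its inputs.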
Learning process (taking layer 1 as an example)
1. The activation $h_i$ of each neuron:
$$h_i = \sum_j x_j w_{ij}$$
2. Competition and lateral inhibition:
$$r = h * I$$
3. Contrast enhancement (a parameter of this step is used to control the sparseness of firing rates within each layer)
4. Updating weights:
$$\delta w_{ij} = \alpha\, \bar{y}_i^{\tau} x_j^{\tau}$$
$$\bar{y}_i^{\tau} = (1 - \eta)\, y_i^{\tau} + \eta\, \bar{y}_i^{\tau - 1}$$
$$w_{ij} = w_{ij} + \delta w_{ij}$$
$$\delta w_{ij} = \alpha\, y_i (x_j - w_{ij})$$
5. Return to step 1
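The learning loop can be sketched for a toy layer as follows. Several parts are assumptions rather than the method on the slides: lateral inhibition is reduced to subtracting the layer mean (instead of convolving with a spatial filter), contrast enhancement is a sigmoid whose threshold is set from a target sparseness, and each weight vector is renormalised to unit length after a trace-rule update. All parameter values and function names are hypothetical.

```python
import math

def activations(x, w):
    """Step 1: h_i = sum_j x_j * w_ij."""
    return [sum(xj * wij for xj, wij in zip(x, row)) for row in w]

def lateral_inhibition(h):
    """Step 2 (simplified assumption): subtract the mean activation."""
    mean = sum(h) / len(h)
    return [hi - mean for hi in h]

def contrast_enhance(r, sparseness, beta):
    """Step 3: sigmoid with its threshold at the k-th largest activation."""
    k = max(1, int(round(sparseness * len(r))))
    threshold = sorted(r, reverse=True)[k - 1]
    return [1.0 / (1.0 + math.exp(-2.0 * beta * (ri - threshold))) for ri in r]

def trace_update(w, x, y, y_trace, alpha, eta):
    """Step 4: trace rule, then renormalise each weight vector (assumption)."""
    new_trace = [(1.0 - eta) * yi + eta * ti for yi, ti in zip(y, y_trace)]
    new_w = []
    for row, ti in zip(w, new_trace):
        row = [wij + alpha * ti * xj for wij, xj in zip(row, x)]
        norm = math.sqrt(sum(v * v for v in row)) or 1.0
        new_w.append([v / norm for v in row])
    return new_w, new_trace

# Toy usage: two cells, three inputs, two successive views of one object.
w = [[0.6, 0.8, 0.0], [0.0, 0.6, 0.8]]
trace = [0.0, 0.0]
for x in ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]):
    h = activations(x, w)
    y = contrast_enhance(lateral_inhibition(h), sparseness=0.5, beta=5.0)
    w, trace = trace_update(w, x, y, trace, alpha=0.1, eta=0.8)  # step 5: repeat
```

Because the trace carries activity over from the previous view, a cell that responded to the first view also has its weights to the second view strengthened, which is the mechanism intended to produce invariance.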
Testing process (taking layer 1 as an example)
1. The activation $h_i$ of each neuron:
$$h_i = \sum_j x_j w_{ij}$$
2. Competition and lateral inhibition:
$$r = h * I$$
3. Contrast enhancement
Experiment
Each image is 64×64 pixels and is shown at different positions on the 128×128 "retina".
The image is translated by 8 pixels at each move.
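The stimulus set described above can be generated with a short sketch. The slides do not say how many positions were used, so covering the full grid of 8-pixel steps is an assumption, as are the function names and the uniform placeholder image.

```python
def place_on_retina(image, top, left, retina_size=128):
    """Return a retina_size x retina_size grid with `image` pasted at (top, left)."""
    size = len(image)  # the image is assumed to be square
    if not (0 <= top <= retina_size - size and 0 <= left <= retina_size - size):
        raise ValueError("image would fall outside the retina")
    retina = [[0.0] * retina_size for _ in range(retina_size)]
    for r, row in enumerate(image):
        retina[top + r][left:left + size] = row
    return retina

def translation_offsets(image_size=64, retina_size=128, step=8):
    """All (top, left) positions reachable in `step`-pixel moves."""
    limit = retina_size - image_size
    return [(t, l)
            for t in range(0, limit + 1, step)
            for l in range(0, limit + 1, step)]

image = [[1.0] * 64 for _ in range(64)]  # placeholder for a 64x64 object image
offsets = translation_offsets()
stimuli = [place_on_retina(image, t, l) for t, l in offsets]
```

During training, presenting the stimuli of one object in order of neighbouring offsets gives the temporally contiguous sequence that the trace rule relies on.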
Experiment 1
Experiment 2
Learning rule:
$$\delta w_{ij} = \alpha\, \bar{y}_i^{\tau - 1} x_j^{\tau}$$
Conclusion
Forming feature combinations at an early stage of processing
Trace learning rule
Solving translation invariance (responding to the same local spatial arrangement while ignoring the global position of the object)
Recognizing the object under different transforms after just a few seconds of inspecting it
It would be less suitable for guiding actions in 3D space
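Analysis of results
Input
Filter:
$$\Gamma_{xy}(\rho, \theta, f) = \rho \left[ e^{-\left( \frac{x\cos\theta + y\sin\theta}{\sqrt{2}/f} \right)^2} - \frac{1}{1.6}\, e^{-\left( \frac{x\cos\theta + y\sin\theta}{1.6\sqrt{2}/f} \right)^2} \right] e^{-\left( \frac{x\sin\theta - y\cos\theta}{3\sqrt{2}/f} \right)^2}$$
Input connections
Determination of … (two parameters; their symbols are missing from the source)
The output of Visnet, taken as the output of a competitive network, is used for classification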
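One simple way to read out a competitive network for classification is winner-take-all: the index of the most active output cell is taken as the label. The sketch below is an assumption about how such a readout could look, not the procedure used in the talk, and `winner` and `is_invariant` are hypothetical helper names.

```python
def winner(output_rates):
    """Index of the most active output cell (winner-take-all readout)."""
    return max(range(len(output_rates)), key=output_rates.__getitem__)

def is_invariant(outputs_per_position):
    """True if the same output cell wins at every tested position."""
    winners = {winner(rates) for rates in outputs_per_position}
    return len(winners) == 1

# Toy output vectors for one object shown at two positions.
same_cell = [[0.1, 0.9, 0.3], [0.2, 0.7, 0.1]]
different_cells = [[0.9, 0.1, 0.3], [0.2, 0.7, 0.1]]
```

With this readout, `is_invariant(same_cell)` holds while `is_invariant(different_cells)` does not, which gives a quick check of whether an object's representation survives translation.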
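Future work
Continue investigating the cause and get Visnet's translation invariance working;
Test Visnet's invariance to view and size
Consider the role of feedback
The relationship between object recognition and stereo vision
Thanks!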