![Page 1: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/1.jpg)
Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning
![Page 2: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/2.jpg)
Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning
![Page 3: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/3.jpg)
reinforcement learning
input s
action a
weights
![Page 4: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/4.jpg)
actor
simple input → go right!
complex input → go right? go left?
reinforcement learning
input (state space)
![Page 5: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/5.jpg)
sensory input
reward
action
complex input
scenario: bars controlled by the actions ‘up’, ‘down’, ‘left’, ‘right’;
a reward is given if the horizontal bar reaches a specific position
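A minimal sketch of such a bars world (grid size, the rewarded row, and all names are illustrative assumptions, not the paper's code):

```python
import numpy as np

class BarsEnv:
    """One horizontal and one vertical bar on a grid; actions move them.
    Reward arrives when the horizontal bar sits at the rewarded row."""

    def __init__(self, size=12, reward_row=3, seed=0):
        self.size, self.reward_row = size, reward_row
        self.rng = np.random.default_rng(seed)
        self.reset()

    def reset(self):
        self.h_row = int(self.rng.integers(self.size))  # horizontal bar
        self.v_col = int(self.rng.integers(self.size))  # vertical bar (distractor)
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.h_row, :] = 1.0
        img[:, self.v_col] = 1.0
        return img.ravel()                              # flattened input I

    def step(self, action):
        # 0: 'up', 1: 'down' move the horizontal bar; 2: 'left', 3: 'right' the vertical one
        if action == 0:   self.h_row = max(self.h_row - 1, 0)
        elif action == 1: self.h_row = min(self.h_row + 1, self.size - 1)
        elif action == 2: self.v_col = max(self.v_col - 1, 0)
        elif action == 3: self.v_col = min(self.v_col + 1, self.size - 1)
        reward = 1.0 if self.h_row == self.reward_row else 0.0
        return self.observe(), reward
```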
![Page 6: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/6.jpg)
need another layer(s) to pre-process complex data
action: $P(a{=}1) = \mathrm{softmax}(Qs)_a$
state: $s = \mathrm{softmax}(WI)$, a feature detector for the position of the relevant bar
$I$: input; $Q$, $W$: weight matrices
the network encodes the value $v = a\,Q\,s$
minimize the error: $E = (0.9\,v(s',a') - v(s,a))^2 = \delta^2$
learning rules: $\Delta Q \approx dE/dQ = \delta\,a\,s$ and $\Delta W \approx dE/dW = \delta\,Qs\,I + \varepsilon$
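A hedged sketch of this two-layer network in NumPy; adding the immediate reward to the TD error as in standard SARSA, reading the slide's $\delta\,Qs\,I$ gradient elementwise for the selected action, and the learning rate and initialization are all assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class TwoLayerRL:
    """Feature layer s = softmax(W I) feeding an actor with
    P(a) = softmax(Q s) and value v = a·Qs, as on the slide."""

    def __init__(self, n_input, n_state, n_action, lr=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = self.rng.normal(0.0, 0.1, (n_state, n_input))
        self.Q = self.rng.normal(0.0, 0.1, (n_action, n_state))
        self.lr = lr

    def forward(self, I):
        s = softmax(self.W @ I)               # feature detector
        a = self.rng.choice(len(self.Q), p=softmax(self.Q @ s))
        return s, a

    def value(self, s, a):
        return self.Q[a] @ s                  # v = a·Qs with one-hot a

    def update(self, I, s, a, s_next, a_next, reward):
        # TD error; E = delta^2 as on the slide (reward term assumed)
        delta = reward + 0.9 * self.value(s_next, a_next) - self.value(s, a)
        self.Q[a] += self.lr * delta * s                        # ΔQ ≈ δ a s
        self.W += self.lr * delta * np.outer(self.Q[a] * s, I)  # ΔW ≈ δ Qs I
```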
![Page 7: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/7.jpg)
SARSA with WTA input layer
![Page 8: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/8.jpg)
memory extension
the model uses the previous state and action to estimate the current state
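One simple way to give the network such a memory (a sketch; the concatenation scheme is an assumption, not necessarily the model's mechanism):

```python
import numpy as np

def memory_input(I, s_prev, a_prev, n_action):
    """Append the previous state activation and a one-hot of the previous
    action to the current input, so the feature layer can estimate the
    current state from them."""
    a_onehot = np.zeros(n_action)
    a_onehot[a_prev] = 1.0
    return np.concatenate([I, s_prev, a_onehot])
```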
![Page 9: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/9.jpg)
learning the ‘short bars’ data
(figure: input data, learned feature weights, and RL action weights; reward and action indicated)
![Page 10: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/10.jpg)
short bars in a 12×12 grid; average number of steps to goal: 11
![Page 11: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/11.jpg)
learning the ‘long bars’ data
(figure: input data, learned feature weights, and RL action weights; input, reward, and the 2 actions not shown)
![Page 12: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/12.jpg)
WTA, non-negative weights
SoftMax, non-negative weights
SoftMax, no weight constraints
![Page 13: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/13.jpg)
models’ background:
- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)
- reward-modulated Hebbian learning: Triesch, Neural Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neural Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)
- reward-modulated activity leads to input selection: Nakahara, Neural Comp 14, 819-44 (2002)
- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neural Comp 19(6), 1468-1502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...
- RL models learn partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)
![Page 14: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/14.jpg)
unsupervised learning in cortex
reinforcement learning in basal ganglia
state space, actor
Doya, 1999
![Page 15: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/15.jpg)
![Page 16: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/16.jpg)
![Page 17: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/17.jpg)
Discussion
- may help reinforcement learning work with real-world data
... real visual processing!
![Page 18: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/18.jpg)
Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning
![Page 19: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/19.jpg)
Representation of depth
• How to learn disparity-tuned neurons in V1?
![Page 20: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/20.jpg)
Reinforcement learning in a neural network
• after vergence: input at a new disparity
• if the disparity is zero, a reward is given
![Page 21: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/21.jpg)
Attention-Gated Reinforcement Learning
Hebbian-like weight learning:
(Roelfsema, van Ooyen, 2005)
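A minimal sketch of a reward-modulated, Hebbian-like update in this spirit (AGREL's attentional gating is richer; the expected-reward baseline and learning rate here are assumptions):

```python
import numpy as np

def reward_modulated_hebb(W, pre, post, reward, expected_reward, lr=0.05):
    """Hebbian-like learning gated by a global reward signal:
    ΔW ∝ (r − E[r]) · post ⊗ pre. In the vergence task, reward = 1
    when the post-vergence disparity is zero."""
    return W + lr * (reward - expected_reward) * np.outer(post, pre)
```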
![Page 22: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/22.jpg)
Six types of tuning curves (Poggio, Gonzalez, Krause, 1988)
Measured disparity tuning curves
![Page 23: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/23.jpg)
All six types of tuning curves emerge in the hidden layer!
Development of disparity tuning
![Page 24: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/24.jpg)
Discussion
- requires application
... use 2D images from 3D space
... open question as to the implementation of the reward
... learning of attention?
![Page 25: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/25.jpg)
Outline
1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction
2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control
3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning
![Page 26: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/26.jpg)
Reinforcement Learning leads to a fixed reactive system that always strives for the same goal
value actor units
task: in the exploration phase, learn a general model to allow the agent to plan a route to any goal
![Page 27: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/27.jpg)
Learning
actor
state space
randomly move around the state space
learn world models:
● associative model
● inverse model
● forward model
![Page 28: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/28.jpg)
Learning: Associative Model
weights to associate neighbouring states
use these to find any possible routes between agent and goal
$s'_i = \sum_j w^{s's}_{ij}\, s_j$
$\Delta w^{s's}_{ij} = \varepsilon\, \tilde{s}'_i\, s_j$  ($\tilde{s}'$: the actually visited next state)
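A sketch of the associative model, assuming one-hot state vectors (function names are illustrative):

```python
import numpy as np

def associative_update(W_ss, s_next, s, eps=0.1):
    """Hebbian association of neighbouring states:
    Δw^{s's}_{ij} = ε s̃'_i s_j, with s̃' the actually visited next state."""
    return W_ss + eps * np.outer(s_next, s)

def spread_activation(W_ss, s, steps=1):
    """Spread activation through the learned associations to find the
    states (and hence routes) reachable from s."""
    act = s.copy()
    for _ in range(steps):
        act = np.clip(W_ss @ act, 0.0, 1.0)
    return act
```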
![Page 29: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/29.jpg)
Learning: Inverse Model
weights to “postdict” the action given a state pair
use these to identify the action that leads to a desired state
$a_k = \sum_{ij} w^{as's}_{kij}\, s'_i\, s_j$
$\Delta w^{as's}_{kij} = \varepsilon\, (\tilde{a}_k - a_k)\, s'_i\, s_j$  ($\tilde{a}$: the action actually taken)
sum over products: Sigma-Pi neuron model
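A sketch of the Sigma-Pi inverse model with a third-order weight tensor (tensor shape and names are assumptions):

```python
import numpy as np

def inverse_predict(W_inv, s_next, s):
    """'Postdict' the action from a state pair, summing over products of
    the two state vectors: a_k = Σ_ij w^{as's}_{kij} s'_i s_j."""
    return np.einsum('kij,i,j->k', W_inv, s_next, s)

def inverse_update(W_inv, a_taken, s_next, s, eps=0.1):
    """Delta rule: Δw^{as's}_{kij} = ε (ã_k − a_k) s'_i s_j,
    with ã the action actually taken (one-hot)."""
    err = a_taken - inverse_predict(W_inv, s_next, s)
    return W_inv + eps * np.einsum('k,i,j->kij', err, s_next, s)
```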
![Page 30: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/30.jpg)
Learning: Forward Model
weights to predict the state given a state-action pair
use these to predict the next state given the chosen action
$s'_i = \sum_{jk} w^{s'as}_{ikj}\, a_k\, s_j$
$\Delta w^{s'as}_{ikj} = \varepsilon\, (\tilde{s}'_i - s'_i)\, a_k\, s_j$  ($\tilde{s}'$: the observed next state)
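The forward model follows the same Sigma-Pi pattern, with the delta rule from the slide (again a sketch with assumed shapes):

```python
import numpy as np

def forward_predict(W_fwd, a, s):
    """Predict the next state from the state-action pair:
    s'_i = Σ_jk w^{s'as}_{ikj} a_k s_j."""
    return np.einsum('ikj,k,j->i', W_fwd, a, s)

def forward_update(W_fwd, s_next, a, s, eps=0.1):
    """Delta rule: Δw^{s'as}_{ikj} = ε (s̃'_i − s'_i) a_k s_j,
    with s̃' the observed next state."""
    err = s_next - forward_predict(W_fwd, a, s)
    return W_fwd + eps * np.einsum('i,k,j->ikj', err, a, s)
```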
![Page 31: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/31.jpg)
Planning
![Page 32: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/32.jpg)
Planning
![Page 33: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/33.jpg)
Planning
![Page 34: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/34.jpg)
Planning
![Page 35: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/35.jpg)
Planning
![Page 36: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/36.jpg)
Planning
![Page 37: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/37.jpg)
Planning
![Page 38: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/38.jpg)
Planning
![Page 39: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/39.jpg)
Planning
![Page 40: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/40.jpg)
Planning
![Page 41: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/41.jpg)
Planning
![Page 42: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/42.jpg)
Planning
![Page 43: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/43.jpg)
Planning
![Page 44: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/44.jpg)
Planning
goal
actor units
agent
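Putting the three models together, a hedged sketch of one planning step as animated above (the direction of activation spreading, the number of spreading iterations, and the neighbour selection are assumptions):

```python
import numpy as np

def plan_step(W_ss, W_inv, s_agent, s_goal, spread_steps=20):
    """Spread activation out from the goal through the associative weights,
    pick the agent's neighbour with the highest goal activation as the
    desired next state, and let the inverse model 'postdict' the action."""
    act = s_goal.copy()
    for _ in range(spread_steps):
        act = np.clip(act + W_ss.T @ act, 0.0, 1.0)   # routes back to the goal
    neighbours = np.clip(W_ss @ s_agent, 0.0, 1.0)    # states next to the agent
    desired = np.zeros_like(act)
    desired[np.argmax(act * neighbours)] = 1.0
    a = np.einsum('kij,i,j->k', W_inv, desired, s_agent)
    return int(np.argmax(a))                          # action towards the goal
```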
![Page 45: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/45.jpg)
Planning
![Page 46: Outline](https://reader036.vdocuments.site/reader036/viewer/2022062804/568148f7550346895db6171d/html5/thumbnails/46.jpg)
Discussion
- requires embedding
... learn state space from sensor input
... only random exploration implemented
... hand-designed planning phases
... hierarchical models?