Page 1: Outline

Outline

1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction

2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control

3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning


Page 3: Outline

Figure: reinforcement learning network with input s, action a, and weights.

Page 4: Outline

Figure: an actor deciding "go right? go left?". With a simple input the decision ("go right!") is straightforward; with a complex input, reinforcement learning needs a suitable input (state space) representation.

Page 5: Outline

Figure labels: sensory input, reward, action, complex input.

Scenario: bars are controlled by the actions 'up', 'down', 'left', 'right'; reward is given if the horizontal bar is at a specific position.
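
To make the scenario concrete, here is a toy implementation sketch. It assumes a 12x12 grid (the size used on the later results slide), one horizontal and one vertical bar, actions that each shift one bar, and a reward when the horizontal bar reaches a target row; the target row, the use of full-length bars and the termination rule are illustrative assumptions, not details from the paper.

```python
import numpy as np

class BarsEnv:
    """Toy bar-control scenario: a horizontal and a vertical bar on a grid.
    'up'/'down' move the horizontal bar, 'left'/'right' move the vertical bar;
    reward is given when the horizontal (relevant) bar reaches the target row."""

    def __init__(self, size=12, target_row=6):
        self.size, self.target_row = size, target_row
        self.reset()

    def reset(self):
        self.row = np.random.randint(self.size)   # horizontal bar position (relevant)
        self.col = np.random.randint(self.size)   # vertical bar position (distractor)
        return self.observe()

    def observe(self):
        img = np.zeros((self.size, self.size))
        img[self.row, :] = 1.0                    # horizontal bar
        img[:, self.col] = 1.0                    # vertical bar
        return img.ravel()                        # flattened sensory input I

    def step(self, action):                       # 0: up, 1: down, 2: left, 3: right
        if action == 0: self.row = max(self.row - 1, 0)
        if action == 1: self.row = min(self.row + 1, self.size - 1)
        if action == 2: self.col = max(self.col - 1, 0)
        if action == 3: self.col = min(self.col + 1, self.size - 1)
        reward = 1.0 if self.row == self.target_row else 0.0
        return self.observe(), reward, reward > 0  # observation, reward, done
```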

Page 6: Outline

Need another layer (or layers) to pre-process the complex data.

Network: input I; state s = softmax(W I), a feature-detector layer (W: weight matrix) that encodes the position of the relevant bar; action a, chosen with P(a=1) = softmax(Q s)_a (Q: weight matrix); the state-action pair encodes the value v = a Q s.

Minimize the error:

E = (0.9 v(s',a') - v(s,a))² = δ²

Learning rules:

ΔQ ≈ dE/dQ = δ a s

ΔW ≈ dE/dW = δ Q s I + ε
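
A minimal numerical sketch of the two-layer SARSA scheme above: the feature layer s = softmax(W I) feeds the action layer P(a) = softmax(Q s), and both weight matrices are updated with the TD error δ. Layer sizes and the learning rate are illustrative; the reward r is included in δ (standard SARSA, whereas the slide writes only the discounted term), and the feature-layer update follows the slide's δ Q s I form using the chosen action's row of Q, which may differ in detail from the paper.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(0)
n_input, n_state, n_action = 144, 24, 4        # e.g. flattened 12x12 input, 4 actions
W = rng.normal(0.0, 0.1, (n_state, n_input))   # feature weights, input I -> state s
Q = rng.normal(0.0, 0.1, (n_action, n_state))  # action weights,  state s -> action a
gamma, lr = 0.9, 0.01                          # discount (0.9 as on the slide), learning rate

def act(I):
    s = softmax(W @ I)                          # s = softmax(W I): feature-detector layer
    a = rng.choice(n_action, p=softmax(Q @ s))  # P(a) = softmax(Q s): stochastic action choice
    return s, a, Q[a] @ s                       # v = a Q s: value of the chosen action

def sarsa_step(I, s, a, v, r, I_next):
    """One SARSA transition (s,a) -> r -> (s',a'); updates Q and W in place."""
    s2, a2, v2 = act(I_next)
    delta = r + gamma * v2 - v                  # TD error (reward term included here)
    Q[a] += lr * delta * s                      # dQ ~ delta a s   (row of the chosen action)
    W[:] += lr * delta * np.outer(Q[a] * s, I)  # dW ~ delta Q s I (slide's feature-layer rule)
    return s2, a2, v2
```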

Page 7: Outline

SARSA with WTA input layer

Page 8: Outline

memory extension

model uses previous state and action to estimate current state
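
The slide does not spell out the wiring of this memory extension; one plausible reading, sketched below with hypothetical names, is that the state estimate is driven by the current input plus the previous state and the previous (one-hot) action.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def estimate_state(W_in, W_state, W_act, I, s_prev, a_prev):
    """Current state estimate from the input I, the previous state s_prev and the
    previous one-hot action a_prev (a simple recurrent 'memory' extension)."""
    return softmax(W_in @ I + W_state @ s_prev + W_act @ a_prev)
```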

Page 9: Outline

Learning the 'short bars' data.

Figure: data, feature weights, RL action weights, reward, action.

Page 10: Outline

Short bars in a 12x12 grid; average number of steps to goal: 11.

Page 11: Outline

Learning the 'long bars' data.

Figure: data, feature weights, RL action weights; input, reward and the 2 actions are not shown.

Page 12: Outline

Figure panels: WTA with non-negative weights; SoftMax with non-negative weights; SoftMax with no weight constraints.

Page 13: Outline

The models' background:

- gradient descent methods generalize RL to several layers: Sutton & Barto, RL book (1998); Tesauro (1992; 1995)

- reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & van Ooyen, Neur Comp 17, 2176-2214 (2005); Franz & Triesch, ICDL (2007)

- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)

- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, J Neurophysiol 98, 3648-65 (2007); ...

- RL models learn partitioning of the input space: e.g. McCallum, PhD thesis, Rochester, NY, USA (1996)

Page 14: Outline

Figure (after Doya, 1999): unsupervised learning in cortex provides the state space; reinforcement learning in the basal ganglia provides the actor.

Page 17: Outline

Discussion

- may help reinforcement learning work with real-world data

... real visual processing!

Page 18: Outline

Outline

1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction

2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control

3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning

Page 19: Outline

Representation of depth

• How to learn disparity-tuned neurons in V1?

Page 20: Outline

Reinforcement learning in a neural network

• after vergence: input at a new disparity

• if the disparity is zero, a reward is given

Page 21: Outline

Attention-Gated Reinforcement Learning

Hebbian-like weight learning:

(Roelfsema, van Ooyen, 2005)
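
The slides state the reward rule (reward once the disparity after vergence is zero) but not the update formula. Below is a hedged sketch of a reward-modulated, attention-gated Hebbian-like update in the spirit of AGREL (Roelfsema & van Ooyen, 2005); the function names, the disparity tolerance and the exact gating are illustrative assumptions, not the paper's equations.

```python
import numpy as np

def vergence_reward(disparity, tol=0.5):
    """Reward after a vergence movement: 1 if the remaining disparity is (near) zero."""
    return 1.0 if abs(disparity) < tol else 0.0

def hebbian_like_update(W, pre, post, feedback, delta, lr=0.01):
    """Attention-gated, reward-modulated Hebbian update: the weight change is the
    product of presynaptic activity, postsynaptic activity, the attentional feedback
    reaching the postsynaptic unit, and a global reward-prediction error delta."""
    return W + lr * delta * np.outer(post * feedback, pre)
```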

Page 22: Outline

Six types of tuning curves (Poggio, Gonzalez, Krause, 1988)

Measured disparity tuning curves

Page 23: Outline

All six types of tuning curves emerge in the hidden layer!

Development of disparity tuning

Page 24: Outline

Discussion

- requires application

... use 2D images from 3D space

... open question as to the implementation of the reward

... learning of attention?

Page 25: Outline

Outline

1) Goal-Directed Feature Learning (Weber & Triesch, IJCNN 2009) Task 4.1 Visual processing based on feature abstraction

2) Emergence of Disparity Tuning (Franz & Triesch, ICDL 2007) Task 4.3 Learning of attention and vergence control

3) From Exploration to Planning (Weber & Triesch, ICANN 2008) Task 6.4 Learning hierarchical world models for planning

Page 26: Outline

Reinforcement learning leads to a fixed reactive system that always strives for the same goal.

Figure labels: value, actor units.

Task: in an exploration phase, learn a general model that allows the agent to plan a route to any goal.

Page 27: Outline

Learning

Figure labels: actor, state space.

Randomly move around the state space and learn world models:
● associative model
● inverse model
● forward model

Page 28: Outline

Learning: Associative Model

Weights to associate neighbouring states; use these to find any possible routes between agent and goal.

s'_i = Σ_j w^(s's)_ij s_j

Δw^(s's)_ij = ε s̃'_i s_j    (Hebbian; ~ marks the value actually observed during exploration)
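
A minimal sketch of the associative model, assuming vector-coded (e.g. one-hot) states and the plain Hebbian update reconstructed above; function names are illustrative.

```python
import numpy as np

def assoc_predict(W_ss, s):
    """s'_i = sum_j w_ij s_j: activates the states that neighbour state s."""
    return W_ss @ s

def assoc_update(W_ss, s_next_obs, s, eps=0.1):
    """Hebbian association of the observed next state with the previous one:
    Delta w_ij = eps * s'_i * s_j."""
    return W_ss + eps * np.outer(s_next_obs, s)
```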

Page 29: Outline

Learning: Inverse Model

Weights to "postdict" the action given a state pair; use these to identify the action that leads to a desired state.

a_k = Σ_ij w^(as's)_kij s'_i s_j    (sum-of-products "Sigma-Pi" neuron model)

Δw^(as's)_kij = ε ã_k s'_i s_j
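
A corresponding sketch of the Sigma-Pi inverse model, with the weights held in a 3-D tensor indexed (action, next state, state); the names and the exact update form are assumptions based on the reconstructed slide equations.

```python
import numpy as np

def inverse_postdict(W_a, s_next, s):
    """Sigma-Pi unit: a_k = sum_ij w_kij * s'_i * s_j, the action that presumably
    led from state s to state s'."""
    return np.einsum('kij,i,j->k', W_a, s_next, s)

def inverse_update(W_a, a_taken, s_next, s, eps=0.1):
    """Strengthen the products for the action actually taken during exploration:
    Delta w_kij = eps * a~_k * s'_i * s_j."""
    return W_a + eps * np.einsum('k,i,j->kij', a_taken, s_next, s)
```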

Page 30: Outline

Learning: Forward Model

Weights to predict the state given a state-action pair; use these to predict the next state given the chosen action.

s'_i = Σ_jk w^(s'as)_ikj a_k s_j

Δw^(s'as)_ikj = ε (s̃'_i - s'_i) a_k s_j    (delta rule: observed minus predicted next state)
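
And a sketch of the forward model with the delta rule reconstructed above (observed minus predicted next state); tensor layout and names are illustrative.

```python
import numpy as np

def forward_predict(W_f, a, s):
    """Forward model: s'_i = sum_jk w_ikj * a_k * s_j, the predicted next state."""
    return np.einsum('ikj,k,j->i', W_f, a, s)

def forward_update(W_f, s_next_obs, a, s, eps=0.1):
    """Delta rule: Delta w_ikj = eps * (observed s'_i - predicted s'_i) * a_k * s_j."""
    err = s_next_obs - forward_predict(W_f, a, s)
    return W_f + eps * np.einsum('i,k,j->ikj', err, a, s)
```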

Pages 31-45: Planning

Figure sequence (labels: goal, actor units, agent).

Page 46: Outline

Discussion

- requires embedding

... learn state space from sensor input

... only random exploration implemented

... hand-designed planning phases

... hierarchical models?