Actor-Critic models: from ventral striatal reward-related activity to robotics simulations


Post on 08-Jan-2016


slide # 1 / 59

Actor-Critic models: from ventral striatal reward-related activity to robotics simulations.

Dr. Mehdi Khamassi (1, 2)

1. LPPA, UMR CNRS 7152, Collège de France, Paris
2. AnimatLab-LIP6 / SIMA-ISIR, Université Pierre et Marie Curie, Paris 6

slide # 2 / 59

OBJECTIVE

Help understand how mammals adapt their behavior to maximize the reward obtained from the environment.

Help understand the brain mechanisms underlying these cognitive processes.

slide # 3 / 59

OBJECTIVE

Challenging goal: different levels of decision, different learning processes, different types of representation.

Multidisciplinary approach:

• Behavioral Neurophysiology
• Computational Modelling
• Autonomous Robotics

slide # 4 / 59

ACTOR-CRITIC MODEL

CRITIC: learns to predict reward

ACTOR: learns to select actions

• Developed in the AI community (reinforcement learning, RL)
• Explains some reward-seeking behaviors
• Resembles some parts of the brain (dopaminergic neurons & striatum)

slide # 5 / 59

Outline

1. Introduction: How does an Actor-Critic model work?
2. Electrophysiology: Reward predictions in the rat ventral striatum
3. Computational modelling: An Actor-Critic model in a simulated robot
4. Discussion

slide # 6 / 59

The Actor-Critic model

• Learning from reward

[Figure: a sequence of actions 1 to 5, with a reward at the end of the sequence.]

slide # 7 / 59

The Actor-Critic model

• Learning from reward

[Figure: a sequence of actions 1 to 5, with a reward at the end of the sequence; the reward produces a reinforcement signal.]

slide # 8 / 59

The Actor-Critic model

• Learning from reward

[Figure: a sequence of actions 1 to 5, with a reward at the end of the sequence; the reinforcement signal depends on the reward and on the reward prediction P(t-1).]

Rescorla and Wagner (1972).
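As an illustration of the Rescorla and Wagner (1972) idea referenced on this slide, here is a minimal sketch in which the reinforcement is the difference between the reward received and the current reward prediction. The function name, the learning rate `alpha` and the trial sequence are illustrative assumptions, not values from the talk.

```python
def rescorla_wagner(rewards, alpha=0.1, p0=0.0):
    """Return the reward prediction P after each trial.

    On every trial: reinforcement = reward - P, then P += alpha * reinforcement.
    alpha (learning rate) and p0 (initial prediction) are illustrative choices.
    """
    p = p0
    history = []
    for r in rewards:
        reinforcement = r - p          # prediction error on this trial
        p += alpha * reinforcement     # move the prediction toward the reward
        history.append(p)
    return history


if __name__ == "__main__":
    predictions = rescorla_wagner([1.0] * 50)
    print(predictions[0], predictions[-1])
```

With a constant reward, the prediction rises toward the reward value and the reinforcement shrinks accordingly.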

slide # 9 / 59

The Actor-Critic model

• Temporal-Difference (TD) learning

[Figure: a sequence of actions 1 to 5, with a reward at the end of the sequence; successive reward predictions P(t-1) and P(t); reinforcement signal ȓ.]

Sutton and Barto (1998).
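The TD rule named on this slide can be sketched on a five-step sequence like the one in the figure. This is a minimal tabular TD(0) example in the style of Sutton and Barto (1998); the discount factor `gamma`, the learning rate `alpha` and the number of episodes are assumptions chosen for illustration.

```python
def td_chain(n_states=5, episodes=200, alpha=0.1, gamma=0.9):
    """Tabular TD(0) on a fixed sequence of n_states steps.

    The reward (1.0) arrives after the last step. P[s] is the reward
    prediction at step s; P[n_states] is the end of the sequence (always 0).
    """
    P = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            r = 1.0 if s == n_states - 1 else 0.0
            delta = r + gamma * P[s + 1] - P[s]   # TD reinforcement signal
            P[s] += alpha * delta
    return P[:n_states]


if __name__ == "__main__":
    for step, p in enumerate(td_chain(), start=1):
        print(step, round(p, 3))
```

Unlike the single-prediction rule, the prediction here propagates backward from the rewarded step to earlier steps, discounted by `gamma` at each step.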

slide # 10 / 59

The Actor-Critic model

• Analogy with dopaminergic neurons

[Figure: dopaminergic neuron activity; labels S and R; reinforcement signal: +1.]

Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).

slide # 11 / 59

The Actor-Critic model

• Analogy with dopaminergic neurons

[Figure: dopaminergic neuron activity; labels S and R; reinforcement signal: +1.]

Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).

slide # 12 / 59

The Actor-Critic model

• Analogy with dopaminergic neurons

[Figure: dopaminergic neuron activity; labels S and R; reinforcement signal: 0.]

Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).

slide # 13 / 59

The Actor-Critic model

• Analogy with dopaminergic neurons

[Figure: dopaminergic neuron activity; labels S and R; reinforcement signal: -1.]

Romo & Schultz (1990); Houk et al. (1995); Schultz et al. (1997).
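The three reinforcement values shown on the preceding slides (+1, 0, -1) can be reproduced with the TD error formula. The sketch below assumes, following the classic account in Schultz et al. (1997), that they correspond to an unpredicted reward, a fully predicted reward and an omitted predicted reward; the function name and the choice `gamma = 1.0` are illustrative.

```python
def td_error(reward, p_now, p_prev, gamma=1.0):
    """TD reinforcement signal: reward + gamma * P(t) - P(t-1)."""
    return reward + gamma * p_now - p_prev


# Assumed reading of the three cases (not stated explicitly on the slides):
unpredicted_reward = td_error(reward=1.0, p_now=0.0, p_prev=0.0)  # no prior prediction
predicted_reward = td_error(reward=1.0, p_now=0.0, p_prev=1.0)    # reward was predicted
omitted_reward = td_error(reward=0.0, p_now=0.0, p_prev=1.0)      # predicted, not delivered

if __name__ == "__main__":
    print(unpredicted_reward, predicted_reward, omitted_reward)
```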

slide # 14 / 59

The Actor-Critic model

Actor-Critic models

Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002). See Joel et al. (2002) for a review.

[Figure: Actor-Critic architecture; label: Dopaminergic neuron.]

slide # 15 / 59

The Actor-Critic model

Actor-Critic models

[Figure: Actor-Critic architecture with a dopaminergic neuron; labels L and E. Reward predictions: P = 0, P = 0, P = 0, P = 0. Rewards: r = 0, r = 1.]

slide # 16 / 59

The Actor-Critic model

Actor-Critic models

[Figure: Actor-Critic architecture with a dopaminergic neuron; labels L and E. Reward predictions: P = 0, P = 0, P = 0, P = 1. Rewards: r = 0, r = 1. Signal values shown: 1, 1.]

slide # 17 / 59

The Actor-Critic model

Actor-Critic models

[Figure: Actor-Critic architecture with a dopaminergic neuron; labels L and E. Reward predictions: P = 1, P = 0, P = 0, P = 1. Rewards: r = 0, r = 1. Signal values shown: 1, 1 and 1, 1.]
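To make the division of labour between Critic and Actor concrete, here is a minimal tabular Actor-Critic sketch. It is not the model used in the talk: the corridor task, the softmax action selection, the function names and all parameter values are assumptions for illustration. The Critic learns reward predictions `P`, and the same TD signal `delta` trains the Actor's action preferences.

```python
import math
import random


def softmax_choice(prefs, rng):
    """Sample an action index with probability proportional to exp(preference)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    x = rng.random() * sum(exps)
    acc = 0.0
    for action, e in enumerate(exps):
        acc += e
        if x < acc:
            return action
    return len(prefs) - 1


def train_actor_critic(n_states=5, episodes=500, alpha=0.1, beta=0.1,
                       gamma=0.9, seed=0, max_steps=200):
    """Corridor task (hypothetical): action 1 moves right, action 0 moves left.

    Reaching position n_states yields reward 1 and ends the episode.
    """
    rng = random.Random(seed)
    P = [0.0] * (n_states + 1)                      # Critic: reward predictions
    prefs = [[0.0, 0.0] for _ in range(n_states)]   # Actor: action preferences
    for _ in range(episodes):
        s = 0
        for _ in range(max_steps):
            a = softmax_choice(prefs[s], rng)
            s_next = s + 1 if a == 1 else max(s - 1, 0)
            r = 1.0 if s_next == n_states else 0.0
            delta = r + gamma * P[s_next] - P[s]    # shared reinforcement signal
            P[s] += alpha * delta                   # Critic update
            prefs[s][a] += beta * delta             # Actor update
            s = s_next
            if s == n_states:
                break
    return P[:n_states], prefs


if __name__ == "__main__":
    values, preferences = train_actor_critic()
    print([round(v, 2) for v in values])
    print([[round(p, 2) for p in pair] for pair in preferences])
```

The design choice worth noting is that the Actor never sees the reward directly: it only receives the Critic's reinforcement signal, which is the role the slides attribute to the dopaminergic neuron.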

slide # 18 / 59

The rat brain

[Figure adapted from Tierney (2006).]

slide # 19 / 59

The striatum

[Figure adapted from Voorn et al. (2004).]

slide # 20 / 59

The striatum

[Figure: Actor-Critic mapping onto the striatum. Labels: Ventral Striatum, Dorsal Striatum, Dopaminergic neurons (VTA / SNc), Actions, ACTOR, CRITIC.]

(Barto, 1995; Houk et al., 1995; Montague et al., 1996; Schultz et al., 1997; Doya et al., 2002; O’Doherty et al., 2004)

slide # 21 / 59

The striatum

Learning based on reward prediction in VS...

In the monkey: (Hikosaka et al., 1989; Hollerman et al., 1998; Kawagoe et al., 1998; Hassani et al., 2001; Cromwell and Schultz, 2003)
In the rat: (Carelli et al., 2000; Daw et al., 2002; Setlow et al., 2003; Nicola et al., 2004; Wilson and Bowman, 2005)

... on dopamine reinforcements.

... modelled by Temporal Difference (TD)-learning

(Barto, 1995; Houk et al., 1995; Schultz et al., 1997; Doya et al., 2002)

(Schultz et al., 1992; Satoh et al., 2003; Nakahara et al., 2004)

slide # 22 / 59

The striatum

... using precise timing of reward prediction in TD-learning

[Figure adapted from Suri and Schultz (2001). Panels: simulation of a TD-learning model; activity recorded from the monkey striatum.]

(Montague et al., 1996; Suri and Schultz, 2001; Perez-Uribe, 2001; Alexander and Sporns, 2002)

slide # 23 / 59

Electrophysiology: Methods

Recording in the rat VS

Simple electrodes

slide # 24 / 59

Electrophysiology: Behavioral methods

The plus-maze task

slide # 25 / 59

Electrophysiology: Behavioral methods

The plus-maze task

[Figure: time course of a trial. Labels: Center departure, running, Box arrival, immobile. Axis: Time.]

slide # 26 / 59

Electrophysiology: Results

170 neurons recorded; 91 neurons with behavioral correlates.

[Figure: neuronal activity over the course of a trial. Labels: Departure, Center, Arrival. Axis: Time.]

slide # 27 / 59

Electrophysiology: Results: Reward anticipation

Ventral striatal neuron.

Activity anticipating each reward droplet.

Independent from locomotor behavior.

Khamassi, Mulder et al. (in revision) J Neurophysiol.

slide # 28 / 59

Electrophysiology: Results: Reward anticipation

Ventral striatal neuron.

Activity anticipating each reward droplet.

Independent from locomotor behavior.

Khamassi, Mulder et al. (in revision) J Neurophysiol.

slide # 29 / 59

Electrophysiology: Results: Reward anticipation

Ventral striatal neuron.

Activity anticipating each reward droplet.

Independent from locomotor behavior.

Anticipation of an extra reward.

Khamassi, Mulder et al. (in revision) J Neurophysiol.

slide # 30 / 59

Modelling with TD-learning: Results

TD-learning with a temporal representation of stimuli (Montague et al., 1996).

[Figure: TD-learning simulations; panel labels: 7 droplets, 5, 3, 1.]

• Incomplete temporal representation
• Ambiguous visual input
• No spatial information

slide # 31 / 59

Modelling with TD-learning: Results

TD-learning with a temporal representation of stimuli (Montague et al., 1996).

[Figure: TD-learning simulations; panel labels: 7 droplets, 5, 3, 1.]

• Incomplete temporal representation
• Same context after the last drop as during droplet delivery
• No spatial information

slide # 32 / 59

Modelling with TD-learning: Results

TD-learning with a temporal representation of stimuli (Montague et al., 1996).

[Figure: TD-learning simulations; panel labels: 7 droplets, 5, 3, 1.]

• Incomplete temporal representation
• Ambiguous visual input
• No spatial information

slide # 33 / 59

Modelling with TD-learning: Results

TD-learning with a temporal representation of stimuli (Montague et al., 1996).

[Figure: TD-learning simulations; panel labels: 7 droplets, 5, 3, 1.]

• Incomplete temporal representation
• Ambiguous visual input
• No spatial information
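The idea of a temporal representation of stimuli can be sketched as follows: each time step since stimulus onset gets its own weight, so that TD-learning can predict successive reward droplets at specific times. This is only a sketch in the spirit of Montague et al. (1996), not the simulation shown in the talk; the droplet times, trial length, `alpha` and `gamma` are hypothetical values.

```python
def td_droplets(n_steps=12, droplet_times=(4, 6, 8), trials=300,
                alpha=0.2, gamma=0.9):
    """TD-learning with one weight per time step since stimulus onset.

    A droplet (reward 1.0) is delivered at each time in droplet_times.
    Returns the learned reward prediction for every time step of a trial.
    """
    w = [0.0] * (n_steps + 1)          # w[n_steps] marks the end of the trial
    for _ in range(trials):
        for t in range(n_steps):
            r = 1.0 if t in droplet_times else 0.0
            delta = r + gamma * w[t + 1] - w[t]
            w[t] += alpha * delta
    return w[:n_steps]


if __name__ == "__main__":
    for t, p in enumerate(td_droplets()):
        print(t, round(p, 3))
```

In this sketch the prediction ramps up before each droplet and falls to zero after the last one, which is the qualitative pattern of anticipatory activity discussed in this part of the talk.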

slide # 34 / 59

TD-learning could reproduce neural anticipatory activity.

Can it reproduce the rat's locomotor behavior in the same task?

Khamassi, Mulder et al. (in revision) J Neurophysiol.

slide # 35 / 59

Autonomous robotics: Methods

Virtual plus-maze

[Figure: simulated robot in a virtual plus-maze. Labels: Visual perceptions, Actions, reward.]

slide # 36 / 59

Autonomous robotics: Methods

Virtual plus-maze

[Figure: simulated robot in a virtual plus-maze. Actions numbered 1 to 5 and visual perceptions numbered 1 to 5; labels: reward.]

slide # 37 / 59

Autonomous robotics: Methods

Results expected

[Figure: sequence of steps 1 to 5 leading to the reward.]

slide # 38 / 59

Autonomous robotics: Methods

Actor-Critic models

Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002). See Joel et al. (2002) for a review.

Simplistic Actor. Most often: discrete environments.

[Figure: Actor-Critic architecture; label: Dopaminergic neuron.]

slide # 39 / 59

Autonomous robotics: Methods

Actor-Critic models

Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002). See Joel et al. (2002) for a review.

Simplistic Actor. Most often: discrete environments.

Continuous environments: coordination of modules.

• gating network: Baldassarre (2002); Doya et al. (2002).
• hand-tuned (independent from modules' performance): Suri and Schultz (2001).

[Figure: Actor-Critic architecture; label: Dopaminergic neuron.]

slide # 40 / 59

Autonomous robotics: Methods

Actor-Critic models

Barto (1995); Houk et al. (1995); Montague et al. (1996); Schultz et al. (1997); Berns and Sejnowski (1996); Suri and Schultz (1999); Doya (2000); Suri et al. (2001); Baldassarre (2002). See Joel et al. (2002) for a review.

Simplistic Actor. Most often: discrete environments.

Continuous environments: coordination of modules.

• gating network: Baldassarre (2002); Doya et al. (2002).
• hand-tuned (independent from modules' performance): Suri and Schultz (2001).

Test principles within a common framework.

[Figure: Actor-Critic architecture; label: Dopaminergic neuron.]

slide # 41 / 59

Autonomous robotics: Methods

Implemented framework

slide # 42 / 59

Autonomous robotics: Methods

Gurney, Prescott & Redgrave (2001). Adapted by Girard et al. (2002; 2003).

slide # 43 / 59

Autonomous robotics: Methods

Module coordination

slide # 44 / 59

Autonomous robotics: Methods

1. gating network (tests modules' capacity for state prediction)
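One way to read "tests modules' capacity for state prediction" is that each module predicts the next state and the gating network gives more weight to the modules that predict better. The sketch below shows that principle only; the Gaussian form, the parameter `sigma` and the function name are assumptions in the style of mixture-of-experts gating, not the exact rule used in the talk or in the cited papers.

```python
import math


def responsibilities(prediction_errors, sigma=1.0):
    """Turn per-module state-prediction errors into gating weights.

    Each weight is proportional to exp(-error^2 / (2 * sigma^2)), and the
    weights are normalised to sum to 1. A smaller error gives a larger weight.
    """
    scores = [-(e * e) / (2.0 * sigma * sigma) for e in prediction_errors]
    m = max(scores)                                  # for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


if __name__ == "__main__":
    # Hypothetical prediction errors for three modules.
    print(responsibilities([0.1, 1.0, 2.0]))
```

The resulting weights can then scale each module's contribution to action selection and to learning, so that the module that best predicts the current situation dominates.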

slide # 45 / 59

Autonomous robotics: Methods

2. hand-tuned (independent from modules' performance)

[Figure labels: Visual perceptions, Categorization, reward.]

slide # 46 / 59

Autonomous robotics: Methods

3. unsupervised categorization (Self-Organizing Maps)
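As a reminder of how a Self-Organizing Map can categorize inputs without supervision, here is a minimal one-dimensional SOM sketch. The map size, the linear decay schedules, the initialisation along the first input axis and the toy two-dimensional inputs are all illustrative assumptions; the talk's model categorizes the robot's visual perceptions, which are not reproduced here.

```python
import math


def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to input x."""
    dists = [sum((wi - xi) ** 2 for wi, xi in zip(w, x)) for w in weights]
    return dists.index(min(dists))


def train_som(data, n_units=4, epochs=100, lr0=0.5, lr1=0.05,
              sigma0=1.0, sigma1=0.1):
    """Train a 1-D SOM; learning rate and neighbourhood width decay linearly."""
    dim = len(data[0])
    # Hypothetical initialisation: units spread along the first input axis.
    weights = [[i / (n_units - 1)] + [0.5] * (dim - 1) for i in range(n_units)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 + (lr1 - lr0) * frac
        sigma = sigma0 + (sigma1 - sigma0) * frac
        for x in data:
            bmu = best_matching_unit(weights, x)
            for i, w in enumerate(weights):
                # Units close to the winner on the map are updated more.
                h = math.exp(-((i - bmu) ** 2) / (2.0 * sigma ** 2))
                for d in range(dim):
                    w[d] += lr * h * (x[d] - w[d])
    return weights


if __name__ == "__main__":
    # Two toy groups of inputs, near (0, 0) and near (1, 1).
    samples = [(0.0, 0.0), (1.0, 1.0), (0.05, 0.0),
               (0.95, 1.0), (0.0, 0.05), (1.0, 0.95)]
    som = train_som(samples)
    print(best_matching_unit(som, (0.0, 0.0)), best_matching_unit(som, (1.0, 1.0)))
```

The index of the best matching unit acts as the category of an input, and a coordination scheme of this kind could then assign one module to each category.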

slide # 47 / 59

Autonomous robotics: Methods

4. random robot

slide # 48 / 59

Autonomous robotics: Results

[Figure: results plot; label: average.]

slide # 49 / 59

Autonomous robotics: Results

Nb of iterations required (average performance during the second half of the experiment):

1. gating network: 3,500
2. hand-tuned: 94
3. unsupervised categorization (SOM): 404
4. random robot: 30,000

slide # 50 / 59

Autonomous robotics: Results

Nb of iterations required (average performance during the second half of the experiment):

1. gating network: 3,500
2. hand-tuned: 94
3. unsupervised categorization (SOM): 404
4. random robot: 30,000

slide # 51 / 59

Discussion

Contributions

• Critic-like reward anticipation in the ventral striatum
• Coordinating multiple modules with SOM

slide # 52 / 59

Discussion

Contributions

• Critic-like reward anticipation in the ventral striatum
• Coordinating multiple modules with SOM
• Prediction: dopamine signal for missing final drop

slide # 53 / 59

Discussion

Contributions

• Critic-like reward anticipation in the ventral striatum
• Coordinating multiple modules with SOM
• Prediction: dopamine signal for missing final drop

Perspectives

• Vary intervals between droplet rewards
• Integrate action values (Samejima et al., 2005)
• Improve the model based on other robotics multi-module reinforcement learning methods (Uchibe et al., 2004; Brunskill et al., 2006)

slide # 54 / 59

The Actor-Critic model

Actor-Critic models

[Figure: Actor-Critic architecture with a dopaminergic neuron; labels L and E. Reward predictions: P = 1, P = 0, P = 0, P = 1. Rewards: r = 0, r = 1. Signal values shown: 1, 1 and 1, 1.]

slide # 55 / 59

Model-based reinforcement learning

[Figure: reward predictions P = 1, P = 0, P = 0, P = 1. Rewards: r = 0, r = 1.]

slide # 56 / 59

General discussion

[Figure: navigation strategies arranged along two axes.]

Strategy dimension: Visual (cue-guided strategy) or Place (place strategy).

Action selection process (Daw et al., 2005):

• Model-free: inflexible, slow to acquire (Stimulus-Response associations)
• Model-based: flexible, rapidly learned (Action-outcome contingencies; cognitive map)

Other labels in the figure: Place recognition-triggered response, Trullier et al. (1997); Cue-guided strategy, Dickinson and Balleine (1998).

slide # 57 / 59

General discussion

Reinterpret inconsistent behavioral results:

• spatial more rapidly acquired than cue-guided (Packard and McGaugh, 1996)
• cue-guided more rapidly acquired than spatial (Pych et al., 2005)

Evidence for involvement of the prefronto-striatal system in model-based strategies:

• In mPFC: A-O contingencies (Mulder et al., 2003), spatial goals (Hok et al., 2005)
• Lesions of the striatum impair model-based strategies (Kelley et al., 1997; Corbit et al., 2001; Yin et al., 2005)

slide # 58 / 59

Perspective

EC Project ICEA (Integrating Cognition, Emotion and Autonomy)

Bioinspired interfaces for assessing new hypotheses

• Neurophysiological experiments, LPPA
• Autonomous robotics, LIP6/ISIR

Webots software, (c) Wany Robotics

Klusters software, (c) L. Hazan in Buzsáki’s lab

slide # 59 / 59

Collaborators

Thesis advisors: Agnès Guillot, Sidney I. Wiener

LPPA Collège de France: Alain Berthoz, Benoît Girard, Adrien Peyrache, Karim Benchenane

IDIAP Research Institute: Ricardo Chavarriaga

ISIR, Université Paris 6: Jean-Arcady Meyer, Laurent Dollé, Louis-Emmanuel Martinet, Olivier Sigaud

Universiteit van Amsterdam: Francesco P. Battaglia, Antonius B. Mulder

Toyama Faculty of Food nutrition: Eichi Tabuchi