Neural Activity of the Basal Ganglia and its Correlation with Behavior
Kazuyuki Samejima
ATR Human Information Science Laboratories &
"Creating the Brain" CREST, JST
Collaboration: Department of Physiology II, Kyoto Prefectural University of Medicine (Yasumasa Ueda, Research Associate; Prof. Minoru Kimura)
Kenji Doya, ATR Human Information Science Laboratories
Basal Ganglia
• Involved in the execution of movement; disorders (Parkinson's disease, Huntington's disease)
• Learning of movement sequences (Hikosaka et al., 1999)
• Reward prediction (Shidara et al., 1998; Kawagoe et al., 1998)
• Reward prediction error (Schultz et al., 1997)
Overview
1. Data analysis of recorded data – striatum activity in remembered sequential movement
2. Task design based on the reinforcement learning paradigm – does the striatum represent the "value function" used in decision making?
3. Model of dopamine neuron activity – how does the basal ganglia compute the temporal difference of reward expectation?
Data analysis of Striatum neurons
When this collaboration began, the basal ganglia were thought to be involved in learning movement sequences, and possibly in short-term memory of motor sequences.
– Hypothesis: visually guided -> PM - M1
  memory guided -> SMA - BG
Instructed – remembered motor sequence task
(Ueda and Kimura, 1999)
Information analysis
• Former method of analyzing neural activity – which types of neurons, and how many, were found?
• The neuron categories set by the experimenter are arbitrary.
• If a neuron is task-related but does not fit the categories, it is left as "unclassified" and its information is lost.
• Information analysis – total information across the whole set of recorded neurons – time course of information
(Sugase et al., 1999; Kitazawa et al., 1998)
Information analysis: definition
Information of movement sequence:
$$I(S;R) = H(S) - H(S|R)$$
$$H(S) = -\sum_s p(s)\log p(s), \qquad H(S|R) = -\sum_r p(r) \sum_s p(s|r)\log p(s|r)$$
Two types of sequence category are computed:
1. First movement direction (Left or Right)
2. Second movement type (Stick or Button)
Information analysis: example
Total distribution → H(S)
Conditional distribution → H(S|R)
I(S;R) = H(S) - H(S|R)
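For concreteness, a minimal Python sketch of this estimate from discrete trial labels and binned spike counts; `entropy`, `mutual_information`, and the input arrays are hypothetical names, not the original analysis code:

```python
# I(S;R) = H(S) - H(S|R) from trial labels S and discretized responses R.
import numpy as np

def entropy(p):
    """Shannon entropy in bits, ignoring zero-probability bins."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(seq_labels, spike_counts):
    """seq_labels, spike_counts: 1-D integer arrays, one entry per trial."""
    _, s_counts = np.unique(seq_labels, return_counts=True)
    h_s = entropy(s_counts / s_counts.sum())         # total distribution: H(S)
    h_s_given_r = 0.0
    r_vals, r_counts = np.unique(spike_counts, return_counts=True)
    for r, n_r in zip(r_vals, r_counts):             # conditional: H(S|R)
        _, c = np.unique(seq_labels[spike_counts == r], return_counts=True)
        h_s_given_r += (n_r / len(spike_counts)) * entropy(c / c.sum())
    return h_s - h_s_given_r
```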
Result: first movement direction
[Figure: time course of L/R information in the INSTRUCTED task and the REMEMBERED task, aligned to the Start, Center, and Hold events; x-axis: time (ms); 300 ms scale bar; values 0.011 and 0.041 shown on the panels]
Instructed task:
• From about 200 ms after the first stimulus, the direction (L/R) information rose rapidly.
• After the first movement, a high level of L/R information was maintained.
Remembered task:
• The L/R information rose slowly, starting about 800 ms before the first stimulus.
• After the first movement, a high level of L/R information was maintained.
Result: Second movement type
[Figure: time course of S/B information in the instructed and remembered tasks, aligned to Hold Center, 1st Stim, 1st Move, 2nd Stim, 2nd Move, and Reward; x-axis: time (ms)]
Instructed task
Remembered task:
• The S/B information was suppressed during the holding period.
• After the first movement, the S/B information rose.
• From about 200 ms after the 2nd stimulus, the S/B information rose rapidly and was maintained after reward delivery.
Result: number of significant cells
1. The number of cells with significant L/R information increased before the first stimulus (B1) in the remembered task (orange bar, upper left).
2. The number of cells with significant S/B information decreased before the first stimulus and increased before the second stimulus in the remembered task (upper right).
Conclusion
• The activity of striatal neurons carries information about the direction and type of movement, not only during execution of the movement but also during the preparatory delay period of the REMEMBERED task. The activity carries information about the next movement, but little information about the movement two steps ahead.
What this analysis showed:
• The striatum shows predictive responses.
• However, the prediction is mostly about the immediately following movement element.
• Whether this is prediction of reward, or prediction of the stimulus or action, cannot be determined without varying the reward conditions.
→ This motivates changing the task conditions.
Overview
1. Data analysis of recorded data – striatum activity in remembered sequential movement
2. Task design based on the reinforcement learning paradigm – does the striatum represent the "value function" (reward expectation) used in decision making?
3. Model of the basal ganglia as a reinforcement learner – how does the basal ganglia compute the temporal difference of reward expectation?
Ventral striatum activity in successive stages approaching reward
(Shidara et al., 1998)
The approach-to-reward stage is essential for the ventral striatal activity
Reward expectation
(Kawagoe et al., 1998)
Reinforcement learning
• A model of how animal behavior is shaped toward obtaining reward.
[Diagram: the agent/animal interacts with the environment through state, action, and reward]
The agent maximizes the reward it obtains by acting on the environment.
Value function
• Expected future reward
• Learning the value function and the policy
– The value function indicates the goodness of a state or action.
– The policy selects actions by comparing the values of candidate actions or predicted states.
• Temporal difference error
$$V(x(t)) = E\left[\, r(t) + \gamma r(t+1) + \gamma^2 r(t+2) + \cdots \,\right]$$
$$\delta(t) = r(t) + \gamma V(t+1) - V(t)$$
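A small numerical illustration of the definitions above; the value and reward vectors are made up for illustration:

```python
# Illustration of delta(t) = r(t) + gamma * V(t+1) - V(t) with toy numbers.
import numpy as np

gamma = 0.9
V = np.array([0.0, 0.5, 1.0, 0.0])   # value estimates V(t) at four time steps
r = np.array([0.0, 0.0, 1.0])        # reward delivered at t = 2

delta = r + gamma * V[1:] - V[:-1]   # TD error at t = 0, 1, 2
print(delta)                         # [0.45, 0.4, 0.0]
```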
Substantia Nigra and Temporal Difference error
• This suggests that TD learning is implemented in the BG network.
(Schultz et al., 1997)
[Figure: dopamine neuron activity compared with the TD error]
Computational model of striatum
– Action value function Q = matrix compartment; state value function V = striosome compartment
– GP performs stochastic action selection.
– SNc dopamine neurons carry the evaluation error signal (temporal difference error).
(Doya, 2000)
Designing a new task inspired by the reinforcement learning model
• Does the striatum represent an "action value function"?
• If the neural activity represents a value function:
– it correlates with reward expectation;
– it does not merely represent a simple action;
– it does not merely represent a simple stimulus response.
• To show that the activity represents a value function, we need conditions with the same stimulus and the same action but different reward expectations.
• Action selection based on reward expectation.
Related work
– Reward prediction (Kawagoe et al., 1998) • ADR/1DR task – same stimulus and same action but different reward expectation. Stimulus and action are coupled (delayed saccade to the location indicated by the cue signal).
-> Dissociate stimulus and action.
– Progress through multiple stages toward reward (Shidara et al., 1998) • Only one action (a single lever release).
-> Multiple actions and decision making.
Stochastic reward and target task
• Decision making
• The monkey turns the lever left or right.
• Stochastic feedback (an LED indicates whether the target was reached, and reward).
• Reward is delivered with probability P(x), where x is the target position.
State transition diagram of task
States: N = goal position is hidden; R = right is the goal position; L = left is the goal position.
Actions: l = select a left turn; r = select a right turn.
Reward probability: P(R) when the right target is reached; P(L) when the left target is reached.
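Below is a minimal sketch of one possible reading of this state-transition structure, consistent with the later prediction Q(N,l) ≈ (1/2)P(L); the goal-assignment rule and function names are assumptions, not the original task code:

```python
# One lever turn under this reading: picking the goal side yields reward with
# probability P(goal) and hides the goal again; a miss reveals the goal side.
import random

def trial_step(state, action, goal, p_reward):
    """state: 'N' (goal hidden) or 'R'/'L' (goal known); action: 'l' or 'r';
    goal: current goal side; p_reward: dict of reward probabilities per side."""
    chose_goal = (goal == 'L' and action == 'l') or (goal == 'R' and action == 'r')
    if chose_goal:                        # LED feedback: target reached
        reward = 1.0 if random.random() < p_reward[goal] else 0.0
        return reward, 'N'                # next trial starts with the goal hidden
    return 0.0, goal                      # LED feedback reveals the goal side
```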
Does the monkey act on the basis of the hidden target?
Yes
Does the monkey change its action selection when the reward probabilities change?
Yes
Model: 3-state RL model
• Reinforcement learning (value update):
$$Q_{t+1}(S(t),a(t)) = Q_t(S(t),a(t)) + \alpha \left\{\, r(S(t),a(t)) - Q_t(S(t),a(t)) \,\right\}$$
• Action selection (softmax):
$$P(a \mid S) = \frac{e^{\beta Q(S,a)}}{\sum_{a'} e^{\beta Q(S,a')}}$$
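A minimal tabular sketch of these two formulas; the values of α and β and the data structures are illustrative assumptions:

```python
# Tabular Q update and softmax choice for states N, R, L and actions l, r.
import math
import random

alpha, beta = 0.1, 3.0
Q = {(s, a): 0.0 for s in 'NRL' for a in 'lr'}

def softmax_action(state):
    """P(a|S) = exp(beta Q(S,a)) / sum_a' exp(beta Q(S,a'))."""
    weights = [math.exp(beta * Q[(state, a)]) for a in 'lr']
    return random.choices('lr', weights=weights)[0]

def update(state, action, reward):
    """Q_{t+1}(S,a) = Q_t(S,a) + alpha {r - Q_t(S,a)}."""
    Q[(state, action)] += alpha * (reward - Q[(state, action)])
```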
Prediction: short term
• N-state activity versus R- or L-state activity reflects reward expectation:
$$Q(N,l) \approx \tfrac{1}{2} P(L), \qquad Q(R,r) \approx P(R)$$
– i.e., compare the internal states in which the target position is unknown (N) and known (R or L).
Prediction: Long term change of reward prediction
• Prediction from a model with fixed α (learning rate) and β (inverse temperature)
Estimating parameters and hyper-parameters from behavioral data
The predictions above assumed a reinforcement learner with a fixed learning rate and a fixed stochastic-behavior parameter, but these might change over time.
• We therefore have to estimate the parameters and hyper-parameters of the learning system: a sequential Monte Carlo method is used to estimate the Q values, the learning rate, and the stochastic-behavior parameter.
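A schematic particle-filter sketch of this idea, reduced to two actions; the priors, drift scale, and likelihood form are assumptions for illustration, not the estimator actually used:

```python
# Particle filter over a latent Q-value difference and a drifting learning
# rate, weighted by the likelihood of each observed choice.
import numpy as np

rng = np.random.default_rng(0)
N = 1000
alpha = rng.uniform(0.0, 1.0, N)   # particles for the learning rate
dq = np.zeros(N)                   # particles for Q(right) - Q(left)
beta = 3.0                         # softmax inverse temperature (fixed here)

def smc_step(chose_right, reward):
    """One behavioral trial: propagate, weight by the choice, resample."""
    global alpha, dq
    alpha = np.clip(alpha + 0.02 * rng.standard_normal(N), 0.0, 1.0)
    target = reward if chose_right else -reward
    dq = dq + alpha * (target - dq)              # each particle's RL update
    p_right = 1.0 / (1.0 + np.exp(-beta * dq))   # two-action softmax
    w = p_right if chose_right else 1.0 - p_right
    idx = rng.choice(N, N, p=w / w.sum())        # importance resampling
    alpha, dq = alpha[idx], dq[idx]
```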
Overview
1. Data analysis of recorded data – striatum activity in remembered sequential movement
2. Task design based on the reinforcement learning paradigm
– Does the striatum represent the "value function" used in decision making?
3. Model of dopamine neural activity – how does the basal ganglia compute the temporal difference of reward expectation?
Substantia Nigra and Temporal Difference error
• This suggests that TD learning is implemented in the BG network.
(Schultz et al., 1997)
[Figure: dopamine neuron activity compared with the TD error]
Classical Conditioning
• Reward expectation (value function) V(t)
• Reward r(t)
• Temporal difference error
$$\delta(t) = r(t) + \gamma V(t) - V(t-1)$$
[Figure: time courses of the conditioned stimulus, reward r(t), reward prediction V(t), and TD error δ(t), before learning, after learning, and with the reward omitted]
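A minimal tabular TD simulation reproducing this qualitative picture; the trial layout and learning rate are illustrative assumptions:

```python
# Tabular TD(0) over time steps within a conditioning trial: after learning,
# the TD error appears at CS onset instead of at reward time.
import numpy as np

T, cs_t, r_t = 20, 5, 15          # trial length, CS time, reward time (steps)
gamma, alpha = 1.0, 0.1
V = np.zeros(T + 1)               # V(t) for each time step; V[T] stays 0

for trial in range(200):
    r = np.zeros(T)
    r[r_t] = 1.0                  # set r[r_t] = 0.0 to simulate reward omission
    for t in range(T):
        delta = r[t] + gamma * V[t + 1] - V[t]
        if t >= cs_t:             # states before the (unpredictable) CS carry no value
            V[t] += alpha * delta

# After learning: delta at t = cs_t - 1 is ~1 (response to CS), delta at r_t is
# ~0; with the reward omitted, delta at r_t becomes ~-1 (the dopamine dip).
```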
Computational model of BG
• TD model (Barto, 1995; Suri, 2000)
Delay!!
(Barto, 1995)
What is the biological implementation of the delay required for the temporal difference of reward expectation?
Direct-Indirect delay model
• Houk et al., 1995
– Direct inhibitory pathway: -V(t-1)
– Indirect disinhibitory pathway: +V(t)
[Diagram: cortical inputs (C) drive spiny projection neurons (SPs) and the subthalamic pathway (ST); the dopamine (DA) neuron combines primary reinforcement r(t), -V(t-1), and V(t)]
(Houk, Adams, & Barto, 1995)
TD models
How does the BG compute the TD of reward expectation?
• Network model of BG
– The value function V(s) is learned through cortico-striatal synaptic plasticity
  • dopamine-dependent plasticity in cortico-striatal synapses
– The temporal difference is computed from the difference in receptor delay or dynamics between the direct and indirect pathways.
• Integrate-and-fire type neuron model
Basal ganglia network
[Diagram: basal ganglia network. Cortex projects to the Striatum (matrix and striosome compartments); the striatum inhibits GPe, GPi, and SNr; STN provides excitation; GPi/SNr inhibit the Thalamus; SNc dopamine neurons receive a reward input and project to the striatum]
Basal ganglia model
Idea:
1. The cortico-striatal projection computes the value V(s(t)).
2. Two kinds of synaptic-current dynamics compute the temporal difference of the value V: GABA_A-mediated fast disinhibition carries V(t), while GABA_B-mediated slow inhibition carries V(t-Δt).
3. Dopamine-mediated LTP and LTD act on the cortico-striatal synapses.
[Diagram: Cortex -> Striatum (matrix and striosome); the striosome projects to SNc via a fast GABA_A and a slow GABA_B pathway; reward r(t) input to SNc; SNc dopamine projects back to the striatum; SNr also shown]
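A sketch of idea 2: two first-order synaptic filters with the fast and slow time constants from the slides, whose difference transiently approximates the temporal difference of V (the input signal is made up for illustration):

```python
# Fast (GABA_A-like) and slow (GABA_B-like) first-order filters of the same
# value signal V; their difference is a transient ~ temporal difference of V.
import numpy as np

dt = 1.0                                  # ms
tau_fast, tau_slow = 10.0, 100.0          # time constants from the slides
t = np.arange(0.0, 500.0, dt)
V = (t > 100.0).astype(float)             # toy value signal: step at t = 100 ms

fast = np.zeros_like(V)
slow = np.zeros_like(V)
for i in range(1, len(t)):
    fast[i] = fast[i-1] + dt / tau_fast * (V[i] - fast[i-1])
    slow[i] = slow[i-1] + dt / tau_slow * (V[i] - slow[i-1])

td_like = fast - slow                     # positive transient after the step in V
```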
Integrate-and-fire (IF) neuron model
• A single-neuron model
– Synaptic current: exponential function
– Spontaneous activity: random current injection
– Threshold dynamics and reset
Membrane potential:
$$C_m \frac{dV}{dt} = -(V(t) - V_{rest}) + I(t) + N(t)$$
where N(t) is a random injected current producing spontaneous activity.
Synaptic current (exponential):
$$I_k(t) = I_{max}\, e^{-(t - t_k)/\tau} \ \ (t > t_k), \qquad I_C(t) = \sum_k I_k(t)$$
with τ ≈ 10 ms for GABA_A and τ ≈ 100 ms for GABA_B input pulses.
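A minimal leaky integrate-and-fire sketch following these equations; all parameter values are assumptions:

```python
# Leaky IF neuron with exponentially decaying synaptic currents
# (GABA_A-like tau = 10 ms, GABA_B-like tau = 100 ms) and noise injection.
import numpy as np

dt, T = 0.1, 500.0                     # time step and duration (ms)
C_m, V_rest, V_th, V_reset = 1.0, -70.0, -54.0, -70.0
tau_fast, tau_slow = 10.0, 100.0       # synaptic time constants (ms)
pulse_step = int(100.0 / dt)           # one input pulse arriving at t = 100 ms

steps = int(T / dt)
V = np.full(steps, V_rest)
I_fast = I_slow = 0.0

for i in range(1, steps):
    if i == pulse_step:                # I_k jumps by I_max at pulse arrival...
        I_fast += 20.0
        I_slow += 20.0
    I_fast *= np.exp(-dt / tau_fast)   # ...then decays as exp(-(t - t_k)/tau)
    I_slow *= np.exp(-dt / tau_slow)
    noise = 2.0 * np.random.randn()    # random current for spontaneous activity
    dV = (-(V[i-1] - V_rest) + I_fast + I_slow + noise) / C_m
    V[i] = V[i-1] + dt * dV
    if V[i] >= V_th:                   # threshold crossing: spike and reset
        V[i] = V_reset
```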
Dopamine-dependent plasticity in the cortico-striatal connection
• SNc dopamine regulates synaptic plasticity in the rat neostriatum (Reynolds & Wickens, 2001)
• Dopamine-sensitive plasticity:
$$\Delta w_{ij}(t) = \hat{\delta}(t)\, x_i(t - T_{delay})\, y_j(t)$$
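A sketch of this three-factor update as reconstructed above; the learning rate η, array shapes, and helper names are assumptions:

```python
# Three-factor cortico-striatal update: dopamine (delta_hat) gates the product
# of the delayed presynaptic trace x and the postsynaptic activity y.
import numpy as np

eta, T_delay = 0.01, 10                  # learning rate and delay (time steps)

def weight_update(w, x_hist, y, delta_hat, t):
    """w[i, j] += eta * delta_hat(t) * x_i(t - T_delay) * y_j(t)."""
    x_delayed = x_hist[t - T_delay]      # presynaptic activity T_delay steps ago
    return w + eta * delta_hat * np.outer(x_delayed, y)
```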
Simulation
• Leaky integrate-and-fire neurons with glutamate, GABA_A, and GABA_B receptors
• Dopamine-mediated LTP and LTD
• Orthogonal traveling activity as the frontal-cortex input
[Diagram: simulation setup. 8 cortical input neurons provide CS-triggered inputs (100 ms pulses at 50 Hz; background 10 Hz) to the matrix and striosome; fast synapse τ = 10 ms, slow synapse τ = 100 ms; reward input R; the striosome projects to SNc, and SNc dopamine projects back to the striatum; SNr also shown]
Result: Shifting phasic activity
[Figure: model vs. experiment (Schultz et al., 1997): dopamine responses to CS and R before conditioning (reward only) and after conditioning (CS only), showing the phasic response shifting from the reward to the CS]
Result: Omitted reward
[Figure: model vs. experiment (Schultz et al., 1997): after conditioning, a response to the CS followed by omission of the reward (no R)]
Result: Weight change of cortico-striatal synapses
Prediction
• T_delay > 100 ms
• Activity shifts through learning
[Figure: response histograms over trial blocks 1-20, 21-40, 41-60, 61-80, and 81-100, aligned to CS and R]