
FIGURE 4 Responses of dopamine neurons to unpredicted primary reward (top) and the transfer of this response to progressively earlier reward-predicting conditioned stimuli with training (middle). The bottom record shows a control baseline task when the reward is predicted by an earlier stimulus and not the light. From Schultz et al. (1995) with permission.

http://images.neuron.org/images/journal_images/0896-6273/PIIS0896627303004744.GR1.lrg.gif

Odor-selective cells in the amygdala fire preferentially according to the outcome or reward value of an odor before the animal has demonstrated that it has learned this outcome or value.

Odor-selective cells in the amygdala fire preferentially according to the outcome or reward value of an odor at the same time as the animal demonstrates that it has learned this outcome or value.

In rats without an amygdala, cells in orbitofrontal cortex (OFC) show less selectivity for outcome. This indicates a role for the amygdala in conveying motivational/reward information to the OFC.

http://images.neuron.org/images/journal_images/0896-6273/PIIS0896627303004744.GR6.lrg.gif

http://www.sciencemag.org/content/vol301/issue5636/images/large/se3231783001.jpeg

http://www.sciencemag.org/content/vol301/issue5636/images/large/se3231783002.jpeg

Dopamine, reward processing and optimal prediction

ONLY AS A REFERENCE FOR THOSE WHO ARE INTERESTED IN BEGINNING TO CROSS THE NEUROBEHAVIORAL/COMPUTATIONAL DIVIDE (MAYBE AFTER THE EXAM?)

Human dopaminergic system

Cortical and striatal projections

Schultz, 1998

Koob & Le Moal, 2001

Schultz, Dayan & Montague, 1997

Expected Reward

$v = wu$

v : expected reward
w : weight (association)
u : stimulus (binary)

Rescorla-Wagner Rule

Association update rule: $w \leftarrow w + \alpha \delta u$

w : weight (association)
α : learning rate
u : stimulus

Prediction error: $\delta = r - v$

r : actual reward
v : expected reward
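A minimal Python sketch of the Rescorla-Wagner rule for a single binary stimulus, using v = wu, δ = r - v and w ← w + αδu as written above. The function name `rescorla_wagner`, the learning rate α = 0.1 and the trial counts are illustrative choices, not values from the slides. It shows acquisition, extinction and, with alternating rewarded and unrewarded trials, partial reinforcement.

```python
def rescorla_wagner(trials, alpha=0.1, w0=0.0):
    """Apply the Rescorla-Wagner rule to a sequence of (u, r) trials.

    u: stimulus present (1) or absent (0); r: actual reward.
    Returns the association weight w after each trial.
    """
    w = w0
    history = []
    for u, r in trials:
        v = w * u                  # expected reward, v = wu
        delta = r - v              # prediction error, delta = r - v
        w = w + alpha * delta * u  # association update, w <- w + alpha*delta*u
        history.append(w)
    return history


# 100 rewarded trials (acquisition) followed by 100 unrewarded trials (extinction).
ws = rescorla_wagner([(1, 1.0)] * 100 + [(1, 0.0)] * 100)
# Partial reinforcement: rewarded and unrewarded trials alternate.
partial = rescorla_wagner([(1, 1.0), (1, 0.0)] * 100)
print(ws[99], ws[199], partial[-1])
```

Under partial reinforcement the weight settles near the average reward per trial (about 0.5 here) and keeps fluctuating around it, because every trial still produces a non-zero prediction error.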

The Rescorla-Wagner rule accounts for:

• Some Pavlovian conditioning
• Extinction
• Partial reinforcement

and, with more than one stimulus:

• Blocking
• Inhibitory conditioning
• Overshadowing

… but not:

• Latent inhibition (CS preexposure effect)
• Secondary conditioning
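With several stimuli, the prediction becomes the summed association v = Σ_i w_i u_i, and every present stimulus is updated with the same error δ. The sketch below (hypothetical helper `rw_step`, illustrative α = 0.2) illustrates blocking: once stimulus A fully predicts the reward, δ is close to zero on compound AB trials, so B gains almost no weight, whereas in a control group trained on AB from the start, A and B split the prediction between them.

```python
def rw_step(w, u, r, alpha=0.2):
    """One Rescorla-Wagner trial with several stimuli.

    w: list of weights, u: list of binary stimuli, r: reward.
    The prediction is the summed association v = sum_i w_i u_i.
    """
    v = sum(wi * ui for wi, ui in zip(w, u))
    delta = r - v
    return [wi + alpha * delta * ui for wi, ui in zip(w, u)]


# Blocking group: weights for stimulus A and stimulus B.
w = [0.0, 0.0]
for _ in range(50):            # phase 1: A alone predicts the reward
    w = rw_step(w, [1, 0], 1.0)
for _ in range(50):            # phase 2: compound AB predicts the same reward
    w = rw_step(w, [1, 1], 1.0)

# Control group: AB compound from the start.
control = [0.0, 0.0]
for _ in range(50):
    control = rw_step(control, [1, 1], 1.0)

print(w, control)  # B's weight stays near 0 in the blocking group only
```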

A recent update: uncertainty ($\sigma_i^2$)

Kakade, Montague & Dayan, 2001

Kalman weight update rule:

$w_i \leftarrow w_i + \alpha_i \delta$

With associability:

$\alpha_i = \frac{\sigma_i^2 u_i}{\sum_j \sigma_j^2 u_j + E}$
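A sketch of the Kalman associability and weight update, assuming binary stimuli u_i and per-stimulus uncertainties σ_i². The slides do not define E; the code treats it as a small positive constant (E = 0.1, an assumed value) that keeps the denominator away from zero. The function names are hypothetical. Note that α_i already contains u_i, so, unlike the Rescorla-Wagner rule, the weight update has no separate u_i factor.

```python
def associabilities(sigma2, u, E=0.1):
    """alpha_i = sigma_i^2 u_i / (sum_j sigma_j^2 u_j + E).

    sigma2: per-stimulus uncertainties sigma_i^2
    u:      binary stimulus vector (1 = present)
    E:      constant added to the denominator (assumed positive; value is hypothetical)
    """
    denom = sum(s * uj for s, uj in zip(sigma2, u)) + E
    return [s * ui / denom for s, ui in zip(sigma2, u)]


def kalman_weight_update(w, sigma2, u, r, E=0.1):
    """One trial of w_i <- w_i + alpha_i * delta, with delta = r - sum_j w_j u_j."""
    v = sum(wi * ui for wi, ui in zip(w, u))
    delta = r - v
    alpha = associabilities(sigma2, u, E)
    return [wi + ai * delta for wi, ai in zip(w, alpha)]


# Two stimuli presented together; stimulus 0 is much more uncertain than stimulus 1.
alpha = associabilities([1.0, 0.1], [1, 1])
w_new = kalman_weight_update([0.0, 0.0], [1.0, 0.1], [1, 1], r=1.0)
print(alpha, w_new)  # the more uncertain stimulus takes most of the update
```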

An example:

[Figure: network with inputs u1 to u5 (together U(t)), reward r(t), weights w(t), estimated weights ŵ(t), prediction v(t) and prediction error δ(t) = r(t) - v(t) (error rule). Inset: each input u_i has a weight w_i and an uncertainty σ_i.]

Kalman learning & associability

Weight update rule:

$\hat{w}_i(t+1) = \hat{w}_i(t) + \alpha_i(t)\,\delta(t)$

Associability:

$\alpha_i(t) = \frac{\sigma_i(t)^2\, u_i(t)}{\sum_j \sigma_j(t)^2\, u_j(t) + E}$
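The slides give the associability but not how σ_i(t)² changes from trial to trial. The sketch below assumes the standard Kalman-filter variance update in a diagonal approximation, σ_i² ← σ_i²(1 - α_i u_i), plus an optional drift term q; both are our assumptions, as are E = 0.1 and the function names. In this sketch, unrewarded pre-exposure shrinks a stimulus's uncertainty and with it its associability, so later conditioning is slower, a pattern resembling latent inhibition, which the slides list as a failure of Rescorla-Wagner.

```python
def kalman_trial(w, sigma2, u, r, E=0.1, q=0.0):
    """One trial of Kalman learning with a diagonal uncertainty approximation.

    w:      estimated weights w_hat_i(t)
    sigma2: uncertainties sigma_i(t)^2
    u:      binary stimuli u_i(t); r: reward r(t)
    E:      assumed observation-noise constant; q: assumed drift variance per trial
    """
    v = sum(wi * ui for wi, ui in zip(w, u))                 # v(t)
    delta = r - v                                             # delta(t) = r(t) - v(t)
    denom = sum(s * ui for s, ui in zip(sigma2, u)) + E
    alpha = [s * ui / denom for s, ui in zip(sigma2, u)]      # alpha_i(t)
    w = [wi + ai * delta for wi, ai in zip(w, alpha)]         # w_hat_i(t+1)
    # Presented stimuli become less uncertain; absent ones change only by drift q.
    sigma2 = [s * (1 - ai * ui) + q for s, ai, ui in zip(sigma2, alpha, u)]
    return w, sigma2


def conditioned_weight(n_preexposures, n_conditioning=3):
    """Weight after unrewarded pre-exposures followed by rewarded pairings."""
    w, sigma2 = [0.0], [1.0]
    for _ in range(n_preexposures):        # stimulus alone, no reward
        w, sigma2 = kalman_trial(w, sigma2, [1], 0.0)
    for _ in range(n_conditioning):        # stimulus paired with reward
        w, sigma2 = kalman_trial(w, sigma2, [1], 1.0)
    return w[0]


novel = conditioned_weight(0)
preexposed = conditioned_weight(20)
print(novel, preexposed)  # the pre-exposed stimulus has learned much less
```

The pre-exposure trials leave the weight at zero (δ is zero when neither reward nor prediction is present) but still reduce σ², because the variance update does not depend on the reward.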

Stimulus uncertainties

Reward prediction

Predicting future reward

Single time steps: $v = wu$

v : expected reward
w : weight (association)
u : stimulus

Total predicted reward:

$v(t) = \sum_{\tau=0}^{t} w(\tau)\, u(t-\tau)$

t : current time step within a trial
τ : delay, i.e. how many time steps before t the stimulus occurred
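A sketch of the total predicted reward as a sum over delays, assuming `w[tau]` holds one weight per delay τ and `u[t]` is a binary stimulus trace (hypothetical names; `w` must be at least as long as `u`). For a single stimulus at t = 2, v(t) simply reads out the weight for the elapsed delay t - 2.

```python
def total_predicted_reward(w, u):
    """v(t) = sum_{tau=0}^{t} w(tau) * u(t - tau) for every time step t of a trial.

    w[tau]: weight for a stimulus that occurred tau steps earlier
    u[t]:   stimulus at time step t (1 if present, else 0)
    """
    return [sum(w[tau] * u[t - tau] for tau in range(t + 1)) for t in range(len(u))]


u = [0, 0, 1, 0, 0, 0]              # stimulus at t = 2
w = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]  # illustrative weights per delay
print(total_predicted_reward(w, u))
```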

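Sum of discounted future rewards:

$v(t) = \sum_{\tau=0}^{\infty} \gamma^{\tau}\, r(t+\tau)$, with $0 \le \gamma \le 1$

In recursive form:

$v(t) = r(t) + \gamma\, v(t+1)$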

Schultz, Dayan & Montague, 1997
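To check that the discounted sum and its recursive form agree, the sketch below computes both for a short reward sequence, assuming rewards after the last listed step are zero (so v is 0 beyond the trial). The value γ = 0.5 and the function names are illustrative choices.

```python
def discounted_return_direct(rewards, gamma):
    """v(t) = sum_{tau>=0} gamma**tau * r(t + tau), computed term by term."""
    T = len(rewards)
    return [sum(gamma ** k * rewards[t + k] for k in range(T - t)) for t in range(T)]


def discounted_return_recursive(rewards, gamma):
    """Same values from v(t) = r(t) + gamma * v(t + 1), with v(T) = 0."""
    v = [0.0] * (len(rewards) + 1)
    for t in reversed(range(len(rewards))):
        v[t] = rewards[t] + gamma * v[t + 1]
    return v[:-1]


rewards = [0, 0, 0, 1, 0]   # a single reward at t = 3
direct = discounted_return_direct(rewards, 0.5)
recursive = discounted_return_recursive(rewards, 0.5)
print(direct, recursive)
```

The recursive form is what makes a learning rule possible: v(t) can be adjusted from the observed r(t) and the next estimate v(t+1), without waiting for all future rewards.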

[Figure: exponential discounting, γ = 0.95. Reward value (0 to 1) plotted against time steps (0 to 100).]
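The plotted curve is γ^t, the weight given to a reward that arrives t time steps in the future. A short computation with γ = 0.95, the value in the figure, shows how quickly it falls (variable names are illustrative):

```python
gamma = 0.95
# Weight given to a reward that arrives t steps in the future, t = 0..100.
discount = [gamma ** t for t in range(101)]
# First time step at which a future reward counts for less than half its value.
steps_to_half = next(t for t, d in enumerate(discount) if d < 0.5)
print(steps_to_half, discount[100])
```

With γ = 0.95 a reward 14 steps away is already worth less than half of an immediate one, and a reward 100 steps away is worth under 1% of it.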

Temporal difference rule

Total estimated future reward: $v(t) = r(t) + \gamma v(t+1)$, so $r(t) = v(t) - \gamma v(t+1)$

Temporal difference rule: $\delta(t) = r(t) + \gamma v(t+1) - v(t)$

(With single time steps: $\delta = r - v$, where r : actual reward, v : expected reward.)
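The sketch below applies δ(t) = r(t) + γv(t+1) - v(t) over repeated trials in which a conditioned stimulus (CS) at t = 5 is followed by reward at t = 15, to show the shift of the error signal from the reward to the CS that Figure 4 describes for dopamine neurons. It assumes a tapped-delay-line stimulus representation (one weight per time since CS onset), γ = 1, α = 0.3 and weight updates applied once at the end of each trial; these choices and the name `td_trials` are hypothetical. With this indexing, the error caused by the CS appears at step t_cs - 1, where v jumps from 0 to w[0].

```python
def td_trials(n_trials, T=20, t_cs=5, t_r=15, alpha=0.3, gamma=1.0):
    """Temporal difference learning over repeated conditioning trials.

    Hypothetical tapped-delay-line representation: w[k] is the weight for
    "the CS appeared k steps ago", so v(t) = w[t - t_cs] for t >= t_cs and 0 before.
    Weights are updated once per trial from that trial's TD errors.
    Returns the TD errors delta(t), t = 0..T-1, of the first and the last trial.
    """
    w = [0.0] * T
    deltas = []
    for _ in range(n_trials):
        # Predictions for this trial, with v(T) = 0 after the trial ends.
        v = [w[t - t_cs] if t >= t_cs else 0.0 for t in range(T)] + [0.0]
        r = [1.0 if t == t_r else 0.0 for t in range(T)]
        # delta(t) = r(t) + gamma * v(t+1) - v(t)
        delta = [r[t] + gamma * v[t + 1] - v[t] for t in range(T)]
        # w(tau) <- w(tau) + alpha * delta(t) * u(t - tau); only the CS trace is active.
        for t in range(t_cs, T):
            w[t - t_cs] += alpha * delta[t]
        deltas.append(delta)
    return deltas[0], deltas[-1]


first, last = td_trials(200)
# Index 4 (= t_cs - 1) registers the CS; index 15 is the time of reward.
print("first trial, CS / reward:", first[4], first[15])
print("last trial,  CS / reward:", round(last[4], 3), round(last[15], 3))
```

In the first trial the only error is at the reward; after training the reward is fully predicted and the error occurs at the CS instead. The step before the CS has no stimulus trace and therefore no weight, so the CS itself stays unpredicted and keeps producing an error.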

Temporal difference rule (undiscounted case, γ = 1)

Total estimated future reward: $v(t) = r(t) + v(t+1)$, so $r(t) = v(t) - v(t+1)$

Temporal difference rule: $\delta(t) = r(t) + v(t+1) - v(t)$

(With single time steps: $\delta = r - v$, where r : actual reward, v : expected reward.)

Schultz, 1996

Anatomical interpretation


Temporal Difference Rule for Navigation

Between successive steps u and u′:

$\delta = r_a(u) + \gamma v(u') - v(u)$

[Figure panels: behavior evaluation; hippocampal place field.]

Foster, Morris & Dayan, 2000
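A minimal sketch of the navigation TD rule, δ = r_a(u) + γv(u′) - v(u), evaluated between successive places. The slides pair the rule with hippocampal place fields; this sketch replaces them with a simple table of discrete places on a linear track, a random left/right policy and a single rewarded goal at the end (all hypothetical simplifications, as are the parameter values). It only learns the value of places (the evaluation part), not which way to move.

```python
import random


def td_navigation(n_episodes=2000, n_places=6, gamma=0.9, alpha=0.1, seed=0):
    """Tabular TD(0) evaluation of a random walk along a linear track.

    Places 0..n_places-1; reaching the last place gives reward 1 and ends the episode.
    v[u] estimates the discounted future reward from place u.
    """
    rng = random.Random(seed)
    v = [0.0] * n_places
    goal = n_places - 1
    for _ in range(n_episodes):
        u = 0
        while u != goal:
            u_next = max(0, u + rng.choice([-1, 1]))   # step left or right; wall at 0
            r = 1.0 if u_next == goal else 0.0         # reward r_a(u) for this move
            v_next = 0.0 if u_next == goal else v[u_next]  # goal is terminal
            delta = r + gamma * v_next - v[u]          # delta = r + gamma v(u') - v(u)
            v[u] += alpha * delta
            u = u_next
    return v


values = td_navigation()
print([round(x, 2) for x in values])  # values rise toward the goal
```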

Spatial learning

Foster, Morris & Dayan, 2000

Conclusions

• Behavioral study of (nonhuman) neural systems is interesting

• Neural processes are amenable to contemporary learning theory

• ... and they may play distinct roles within a normative framework of learning,

e.g. VTA, hippocampus, subiculum; also ACh in NBM/SI, NE in LC, 5-HT, ventral striatum, lateral connections, core/shell distinctions of the NAcc, patch-matrix anatomy in the basal ganglia, and the superior colliculus

psychoalphabetadiscobioaquadodoo
