
Dopamine, Uncertainty and TD Learning

CNS 2004

Yael Niv, Michael Duff, Peter Dayan

Gatsby Computational Neuroscience Unit, UCL

What is the function of Dopamine?

[Figure: dopamine pathways; labelled regions: Ventral Tegmental Area, Substantia Nigra, Dorsal Striatum (Caudate, Putamen), Nucleus Accumbens (Ventral Striatum), Amygdala, Prefrontal Cortex]

Parkinson’s Disease -> Movement control?

Intracranial self-stimulation; Drug addiction -> Reward pathway? -> Learning?

Also involved in:
- Working memory
- Novel situations
- ADHD
- Schizophrenia
- …

What does phasic Dopamine encode?

[Figure: DA responses to an unpredicted reward (neutral/no stimulus), a predicted reward (learned task) and an omitted reward (probe trial) (Schultz et al.)]

The TD Hypothesis of Dopamine

Phasic DA encodes a reward prediction error

• Precise theory for generation of DA firing patterns

• Compelling account for the role of DA in classical conditioning

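The value V(t) is the expected sum of future rewards r:

$$V(t) = \langle r(t+1) + r(t+2) + r(t+3) + \ldots \rangle$$

so that $V(t) = \langle r(t+1) + V(t+1) \rangle$, and the temporal difference error is

$$\delta(t+1) = r(t+1) + V(t+1) - V(t)$$

(Sutton + Barto 1987, Schultz, Dayan, Montague 1997)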
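A minimal sketch of this rule, assuming a tapped-delay-line (one-hot) representation in which each time step after CS onset has its own value weight, a unit reward on the final transition of a 10-step delay, and a learning rate α = 0.1. The helper `run_td` and all parameter values are hypothetical, not taken from the original model. The error is computed as δ(t+1) = r(t+1) + V(t+1) − V(t), with V = 0 before the (unpredictable) CS and after the trial. After training it reproduces the three Schultz et al. cases qualitatively: a positive error at the CS, no error at a predicted reward, and a negative error when the reward is omitted.

```python
import numpy as np

def run_td(w, reward, alpha=0.1, learn=True):
    """One trial of TD(0) over a tapped delay line (hypothetical helper).

    w[k] is V for the k-th time step after CS onset and is updated in place.
    The pre-CS state and the post-trial state both have V = 0; the reward is
    delivered on the final transition. Returns the TD errors: index 0 is the
    CS onset, the last index is the reward time.
    """
    n = len(w)
    deltas = np.zeros(n + 1)
    deltas[0] = w[0] - 0.0                      # CS onset: prediction jumps from 0 to V(0)
    for k in range(n):
        v_next = w[k + 1] if k + 1 < n else 0.0
        r_next = reward if k + 1 == n else 0.0
        delta = r_next + v_next - w[k]          # delta(t+1) = r(t+1) + V(t+1) - V(t)
        if learn:
            w[k] += alpha * delta
        deltas[k + 1] = delta
    return deltas

w = np.zeros(10)
first = run_td(w, reward=1.0)                   # naive animal: error only at the reward
for _ in range(2000):                           # deterministic reward (p = 100%)
    run_td(w, reward=1.0)
trained = run_td(w, reward=1.0, learn=False)    # predicted reward
probe = run_td(w, reward=0.0, learn=False)      # omitted reward (probe trial)
print(first.round(2))
print(trained.round(2))
print(probe.round(2))
```

Learning runs forward through the trial, so each weight is updated with its successor's value from the previous trial; with α = 0.1 the prediction needs a few hundred trials to propagate back to the CS.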

But: Fiorillo, Tobler & Schultz 2003
• Introduce inherent uncertainty into the classical conditioning paradigm
• Five visual stimuli indicating different reward probabilities: P = 100%, 75%, 50%, 25%, 0%

Stimulus = 2 sec visual stimulus
Reward (probabilistic) = drops of juice

Fiorillo, Tobler & Schultz 2003
At stimulus time: DA represents the mean expected reward
Delay activity: a ramp in activity up to the reward

Hypothesis: DA ramp encodes uncertainty in reward
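For a reward of fixed size delivered with probability p, the mean expected reward is p and the variance, one natural measure of its uncertainty, is p(1 − p). The sketch below tabulates both for the five FTS stimuli, assuming a unit reward size (a hypothetical normalisation of the juice volume). Unlike the mean, the variance peaks at P = 50% and vanishes at 0% and 100%.

```python
probs = [1.0, 0.75, 0.5, 0.25, 0.0]   # reward probabilities of the five stimuli
reward_size = 1.0                     # assumed unit reward (hypothetical normalisation)

stats = {}
for p in probs:
    mean = p * reward_size                      # mean expected reward
    variance = p * (1 - p) * reward_size ** 2   # Bernoulli variance: the reward uncertainty
    stats[p] = (mean, variance)
    print(f"P = {p:4.0%}  mean = {mean:.4f}  variance = {variance:.4f}")

most_uncertain = max(stats, key=lambda p: stats[p][1])
print("maximal uncertainty at P =", most_uncertain)
```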

“Uncertainty Ramping” and TD error?
• The uncertainty is predictable from the stimulus
• TD predicts away predictable quantities
If it represents uncertainty, the ramping activity should disappear with learning according to TD.

Uncertainty ramping is not easily compatible with the TD hypothesis

Are the ramps really coding uncertainty?

At time of reward:
• Prediction errors result from probabilistic reward delivery
• Crucially: positive and negative errors cancel out
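The cancellation can be made explicit. Assuming a unit reward delivered with probability p and a learned prediction V = p just before the reward, the error at reward time is 1 − p on rewarded trials and −p on unrewarded ones, so its trial average is zero:

```latex
\langle \delta \rangle = p\,(1 - p) + (1 - p)\,(0 - p) = p(1 - p) - p(1 - p) = 0
```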

A closer look at FTS’s results

[Figure panels: p = 50%, p = 75%]

• TD prediction error δ(t) can be positive or negative
• Neuronal firing rate is only positive (negative values can be encoded relative to the base firing rate)

But: DA base firing rate is low -> asymmetric encoding of δ(t)

A TD Resolution:

[Figure panels: δ(t) and DA; annotations: 270%, 55%]

Negative δ(t) scaled by d = 1/6 prior to PSTH summation
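A minimal sketch of this read-out, assuming the simplest piecewise-linear form: positive errors pass unchanged and negative errors are multiplied by d = 1/6 before averaging into a PSTH. The function `da_response` is hypothetical. Applied to reward-time errors at p = 50% (δ = +0.5 on rewarded trials, δ = −0.5 on unrewarded ones), the raw average is zero but the scaled average is positive.

```python
import numpy as np

D = 1 / 6   # scaling applied to negative prediction errors (d in the text)

def da_response(delta, d=D):
    """Hypothetical asymmetric read-out: keep positive errors, shrink negative ones by d."""
    delta = np.asarray(delta, dtype=float)
    return np.where(delta > 0, delta, d * delta)

# Reward-time TD errors for p = 50% with a learned prediction V = 0.5:
# half the trials are rewarded (delta = +0.5), half are not (delta = -0.5).
deltas = np.array([0.5, -0.5] * 500)

raw_mean = deltas.mean()                    # positive and negative errors cancel
psth_mean = da_response(deltas).mean()      # asymmetric coding leaves a net positive response
print(raw_mean, psth_mean)
```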

Simulating TD with asymmetric errors

Learning proceeds normally (without scaling):
- Necessary to produce the right predictions
- Can be biologically plausible

With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) (for a unit reward predicted as V = p it equals (1 - d)p(1 - p)) => maximal at p = 50%
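Both ingredients can be combined in a small simulation. It is a sketch under assumptions rather than the original model: a tapped delay line with one weight per time step, a unit reward with probability p at the end of a 20-step delay, TD(0) learning on the raw (unscaled) errors with an assumed learning rate α = 0.1, and the d = 1/6 read-out applied only when averaging recorded errors across trials. Because the learned values keep fluctuating with the recent reward history, the intermediate errors V(t+1) − V(t) are nonzero on individual trials. They cancel in the raw average but not after asymmetric scaling, and the scaled average at reward time approaches (1 − d)p(1 − p). The function `simulate` and its parameters are hypothetical.

```python
import numpy as np

def simulate(p, n_steps=20, alpha=0.1, d=1/6, n_train=2000, n_record=5000, seed=0):
    """Hypothetical TD(0) simulation of a probabilistic-reward delay task.

    Learning always uses raw errors; the asymmetric scaling by d is applied only
    to the recorded errors before averaging, as in PSTH summation. Returns the
    trial-averaged scaled error for each post-CS transition (last entry = reward time).
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(n_steps)                        # w[k] = V k steps after CS onset
    total = np.zeros(n_steps)
    for trial in range(n_train + n_record):
        reward = 1.0 if rng.random() < p else 0.0
        for k in range(n_steps):
            v_next = w[k + 1] if k + 1 < n_steps else 0.0
            r_next = reward if k + 1 == n_steps else 0.0
            delta = r_next + v_next - w[k]       # delta(t+1) = r(t+1) + V(t+1) - V(t)
            w[k] += alpha * delta                # learning proceeds without scaling
            if trial >= n_train:                 # record only after training
                total[k] += delta if delta > 0 else d * delta
    return total / n_record

ramp = simulate(p=0.5)
print(ramp.round(3))
```

Rerunning with other values of p shows the reward-time entry tracking (1 − d)p(1 − p), largest at p = 0.5.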

However:
• No need to assume explicit coding of uncertainty: ramping is explained by neural constraints.
• Explanation for the puzzling absence of a ramp in trace conditioning results.
• Experimental test: is the ramp a within-trial or a between-trial phenomenon?

Challenges: TD and noise; conditioned inhibition; additivity

DA - Uncertainty or Temporal Difference?
[Figure panels: Experiment, Model]

Trace conditioning: A puzzle and its resolution

• Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)

• Resolution: lower learning rate in trace conditioning eliminates ramp

CS = short visual stimulus

Trace period

US (probabilistic) = drops of juice
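One way to see why a lower learning rate could remove the ramp, assuming a delta-rule value update V ← V + α(r − V) with a unit reward on a fraction p of trials: the learned value never settles but keeps tracking recent outcomes, with stationary variance αp(1 − p)/(2 − α) (from Var = (1 − α)²Var + α²p(1 − p)). These fluctuations are what TD backs up into the delay period, so shrinking α shrinks them while the mean prediction stays at p. The sketch checks the formula numerically; the function `value_fluctuation` and the α values are hypothetical, not fitted to the trace-conditioning data.

```python
import numpy as np

def value_fluctuation(alpha, p=0.5, n_trials=200_000, burn_in=2_000, seed=1):
    """Empirical mean and variance of V under V <- V + alpha * (r - V), r ~ Bernoulli(p)."""
    rng = np.random.default_rng(seed)
    rewards = (rng.random(n_trials) < p).astype(float)
    v = 0.0
    trace = np.empty(n_trials)
    for i, r in enumerate(rewards):
        trace[i] = v                       # prediction before this trial's outcome
        v += alpha * (r - v)               # delta-rule update
    kept = trace[burn_in:]                 # discard the initial transient from V = 0
    return kept.mean(), kept.var()

results = {alpha: value_fluctuation(alpha) for alpha in (0.02, 0.2)}
for alpha, (mean, var) in results.items():
    theory = alpha * 0.5 * 0.5 / (2 - alpha)   # stationary variance alpha p (1 - p) / (2 - alpha)
    print(f"alpha = {alpha}: mean V = {mean:.3f}, var V = {var:.5f}, theory = {theory:.5f}")
```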

Other sources of uncertainty: Representational Noise (1)

• Rate coding is inherently stochastic
• Add noise to tapped delay line representation
=> TD learning is robust to this type of noise

[Figure panels: prediction error and weights for σ = 0.0577, 0.0866, 0.1155]

Mirenowicz and Schultz (1996)
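A sketch of how such rate noise might enter the representation, assuming additive noise on every unit of a one-hot tapped delay line. The three σ values quoted for this slide equal a/√3 for a = 0.1, 0.15 and 0.2, which is the standard deviation of uniform noise on [−a, a]. The sketch therefore assumes uniform noise, although the original may have used a different distribution. The function `noisy_delay_line` and its arguments are hypothetical.

```python
import numpy as np

def noisy_delay_line(n_steps, t, a, rng):
    """Hypothetical one-hot tapped-delay-line state for time step t, plus uniform noise on [-a, a]."""
    x = np.zeros(n_steps)
    x[t] = 1.0                                   # the unit coding "t steps since the CS" is on
    return x + rng.uniform(-a, a, size=n_steps)  # rate noise on every unit

rng = np.random.default_rng(0)
empirical_sigma = {}
for a in (0.1, 0.15, 0.2):
    samples = np.array([noisy_delay_line(10, 3, a, rng) for _ in range(20_000)])
    noise = samples - np.eye(10)[3]              # subtract the noiseless representation
    empirical_sigma[a] = noise.std()
    print(f"a = {a}: a/sqrt(3) = {a / np.sqrt(3):.4f}, empirical std = {empirical_sigma[a]:.4f}")
```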

Other sources of uncertainty: Representational Noise (2)

• Neural timing of events is necessarily inaccurate
• Add temporal noise to tapped delay line representation
=> Devastating effects of even small amounts of temporal noise on TD predictions

[Figure panels: ε = 0.05, ε = 0.10]
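Temporal noise can be sketched, purely as an assumption about the model, as a delay-line "clock" that on each tick slips (stalls or skips one step) with probability ε. Even rare slips accumulate over the delay, so the unit that is active when the reward arrives varies from trial to trial, and no single weight can consistently predict the reward. The slip mechanism and the function `unit_at_reward` are hypothetical; ε is used here only as a per-tick slip probability.

```python
import numpy as np

def unit_at_reward(n_steps, eps, rng):
    """Hypothetical jittered delay line: index of the active unit when the reward arrives.

    On each of the n_steps clock ticks the internal index advances by 1, except that
    with probability eps it instead advances by 0 (stall) or 2 (skip), equally likely.
    """
    idx = 0
    for _ in range(n_steps):
        if rng.random() < eps:
            idx += rng.choice([0, 2])   # slip: stall or skip
        else:
            idx += 1
    return idx

rng = np.random.default_rng(0)
n_steps, n_trials = 10, 5_000
on_time = {}
for eps in (0.0, 0.05, 0.10):
    units = np.array([unit_at_reward(n_steps, eps, rng) for _ in range(n_trials)])
    on_time[eps] = np.mean(units == n_steps)   # fraction of trials landing on the nominal unit
    print(f"eps = {eps:.2f}: reward hits the nominal unit on {on_time[eps]:.0%} of trials")
```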
