Dopamine, Uncertainty and TD Learning
Yael Niv, Michael Duff, Peter Dayan
Gatsby Computational Neuroscience Unit, UCL
CNS 2004
What is the function of Dopamine?
[Figure labels: Dorsal Striatum (Caudate, Putamen); Ventral Tegmental Area; Substantia Nigra; Amygdala; Nucleus Accumbens (Ventral Striatum); Prefrontal Cortex]
• Parkinson’s Disease -> Movement control?
• Intracranial self-stimulation; Drug addiction -> Reward pathway? -> Learning?
• Also involved in: Working memory, Novel situations, ADHD, Schizophrenia…
What does phasic Dopamine encode?
• Unpredicted reward (neutral/no stimulus)
• Predicted reward (learned task)
• Omitted reward (probe trial)
(Schultz et al.)
The TD Hypothesis of Dopamine
Phasic DA encodes a reward prediction error
• Precise theory for generation of DA firing patterns
• Compelling account for the role of DA in classical conditioning
Value V, reward r:
$$V(t) = E[\,r(t+1) + r(t+2) + r(t+3) + \dots\,] = E[\,r(t+1) + V(t+1)\,]$$
(Sutton & Barto 1987; Schultz, Dayan & Montague 1997)
Temporal difference error:
$$\delta(t+1) = r(t+1) + V(t+1) - V(t)$$
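A minimal sketch of how the temporal difference error δ(t+1) = r(t+1) + V(t+1) - V(t) reproduces the three phasic responses (unpredicted, predicted and omitted reward). It assumes a deterministic reward of size 1, one value per time step, an unlearnable pre-stimulus step (V[0] fixed at 0), and illustrative values for the learning rate and trial length; the function names are hypothetical and not taken from the talk.

```python
import numpy as np

def td_errors(V, r):
    """TD errors for one trial: delta[t+1] = r[t+1] + V[t+1] - V[t]."""
    delta = np.zeros(len(r))
    delta[1:] = r[1:] + V[1:] - V[:-1]
    return delta

def learn(n_trials=500, n_steps=10, alpha=0.2):
    """Learn V by TD updates; stimulus at step 1, reward at the last step."""
    V = np.zeros(n_steps)
    r = np.zeros(n_steps)
    r[-1] = 1.0  # assumed reward size
    for _ in range(n_trials):
        # V[0] (before the stimulus) is never updated: stimulus onset is unpredictable.
        # V[-1] is never updated either: no reward follows the final step.
        for t in range(1, n_steps - 1):
            delta = r[t + 1] + V[t + 1] - V[t]
            V[t] += alpha * delta
    return V, r

V, r = learn()
naive = td_errors(np.zeros_like(V), r)    # before learning: error at reward time
trained = td_errors(V, r)                 # after learning: error at stimulus time
omitted = td_errors(V, np.zeros_like(r))  # probe trial: negative error at reward time
```

In this sketch the error sits at the reward step before learning, moves to the stimulus step after learning, and turns negative at the expected reward time when the reward is withheld.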
But: Fiorillo, Tobler & Schultz 2003
• Introduce inherent uncertainty into the classical conditioning paradigm
• Five visual stimuli indicating different reward probabilities: P = 100%, 75%, 50%, 25%, 0%
Stimulus = 2 sec visual stimulus
Reward (probabilistic) = drops of juice
Fiorillo, Tobler & Schultz 2003
At stimulus time - DA represents mean expected reward
Delay activity - A ramp in activity up to reward
Hypothesis: DA ramp encodes uncertainty in reward
“Uncertainty Ramping” and TD error?
• The uncertainty is predictable from the stimulus
• TD predicts away predictable quantities
If the ramp represents uncertainty, it should disappear with learning according to TD.
Uncertainty ramping is not easily compatible with the TD hypothesis
Are the ramps really coding uncertainty?
At time of reward:
• Prediction errors result from probabilistic reward delivery
• Crucially: Positive and negative errors cancel out
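The cancellation can be checked directly. The worked equation below assumes a reward of size r delivered with probability p, and a fully learned prediction of exactly p·r just before reward time (an idealization, not a claim about the recorded data):

```latex
\begin{aligned}
\delta_{\text{rewarded}} &= r - p\,r = (1-p)\,r \\
\delta_{\text{omitted}}  &= 0 - p\,r = -p\,r \\
E[\delta] &= p\,(1-p)\,r + (1-p)\,(-p\,r) = 0
\end{aligned}
```

With symmetric coding of positive and negative errors, the trial-averaged error at reward time is therefore zero for every p.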
A closer look at FTS’s results
[Figure panels: p = 50%; p = 75%]
• TD prediction error δ(t) can be positive or negative
• Neuronal firing rate is only positive (negative values can be encoded relative to base firing rate)
But: DA base firing rate is low -> asymmetric encoding of δ(t)
A TD Resolution:
[Figure labels: δ(t); DA; 55%; 270%]
Negative δ(t) scaled by d = 1/6 prior to PSTH summation
Simulating TD with asymmetric errors
Learning proceeds normally (without scaling)
− Necessary to produce the right predictions
− Can be biologically plausible
With asymmetric coding of errors, the mean TD error at the time of reward is proportional to p(1-p) => maximal at p = 50%
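A short sketch of why the mean error at reward time scales with p(1-p) once negative errors are compressed. It uses the scaling factor d = 1/6 quoted on the slide; the reward size of 1, the fully learned prediction V = p·r, and the function name `mean_scaled_error` are assumptions made for illustration.

```python
D = 1.0 / 6.0  # scaling of negative errors, as quoted on the slide

def mean_scaled_error(p, d=D, r=1.0):
    """Trial-averaged error at reward time when negative errors are scaled by d."""
    v = p * r      # assumed fully learned prediction just before reward
    pos = r - v    # error on rewarded trials (probability p)
    neg = 0.0 - v  # error on unrewarded trials (probability 1 - p)
    return p * pos + (1 - p) * d * neg  # equals (1 - d) * p * (1 - p) * r

probs = [1.0, 0.75, 0.5, 0.25, 0.0]  # the five stimulus probabilities
means = {p: mean_scaled_error(p) for p in probs}
peak = max(means, key=means.get)     # probability with the largest mean error
```

Setting d = 1 recovers the symmetric case, in which the mean is zero for every p; any d < 1 leaves a positive residue that peaks at p = 0.5.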
However:
• No need to assume explicit coding of uncertainty - ramping is explained by neural constraints.
• Explanation for puzzling absence of ramp in trace conditioning results.
• Experimental test: Ramp as within or between trial phenomenon?
Challenges: TD and noise; Conditioned inhibition, additivity
DA - Uncertainty or Temporal Difference?
[Figure panels: Experiment; Model]
Trace conditioning: A puzzle and its resolution
• Same (if not more) uncertainty, but no DA ramping (Fiorillo et al.; Morris, Arkadir, Nevet, Vaadia & Bergman)
• Resolution: lower learning rate in trace conditioning eliminates ramp
CS = short visual stimulus
Trace period
US (probabilistic) = drops of juice
Other sources of uncertainty: Representational Noise (1)
• Rate coding is inherently stochastic
• Add noise to tapped delay line representation
=> TD learning is robust to this type of noise
[Figure panels: σ = 0.0577; σ = 0.0866; σ = 0.1155. Labels: prediction error weights]
Mirenowicz and Schultz (1996)
Other sources of uncertainty: Representational Noise (2)
• Neural timing of events is necessarily inaccurate
• Add temporal noise to tapped delay line representation
=> Devastating effects of even small amounts of temporal noise on TD predictions
[Figure panels: ε = 0.05; ε = 0.10]