Prefrontal cortex as a Meta-reinforcement learning system
Matthew Botvinick
DeepMind, London, UK
Gatsby Computational Neuroscience Unit, UCL
Mnih et al., Nature (2015)
Yamins & DiCarlo, 2016
Schultz et al, Science (1997)
Jaderberg et al., 2016
Mante et al., Nature, 2013
Song et al., Elife, 2017
Lake et al, BBS (2017)
Harlow, Psychological Review, 1949
“Learning to learn”
[Figure: x-axis: Training episodes]
https://deepmind.com/blog/impala-scalable-distributed-deeprl-dmlab-30/
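[Figure: meta-RL agent architecture. Inputs: o_t, a_{t-1}, r_{t-1}; outputs: a_t, v_t; recurrent network labeled PFC; learning signal δ labeled DA]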
Wang et al., Nature Neuroscience (2018); Wang et al., Cog. Sci. (2016); Duan et al., arXiv (2016)
[Figure: example bandit reward probabilities: 0.7 0.4 0.6 0.9 0.3 0.1 0.8 0.7]
[Figure: cumulative regret (y-axis, 1 to 4) vs. trial (x-axis, 1 to 100); curves: Gittins indices, UCB, Thompson sampling. Second panel: left/right choices by trial and episode]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
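The cumulative-regret comparison on this slide can be illustrated with a small simulation. The following is a minimal sketch, not the method used in the cited papers: it assumes a two-armed Bernoulli bandit, uses textbook Thompson sampling (Beta(1, 1) prior) and UCB1 as stand-ins for the baselines named in the figure, and measures expected regret as the gap between the best arm's reward probability and the chosen arm's. The function names and the reward probabilities (0.7, 0.3) are illustrative choices.

```python
import math
import random


def cumulative_regret(choices, probs):
    """Running sum of (best arm probability - chosen arm probability)."""
    best = max(probs)
    total, curve = 0.0, []
    for arm in choices:
        total += best - probs[arm]
        curve.append(total)
    return curve


def thompson_sampling(probs, n_trials, rng):
    """Beta-Bernoulli Thompson sampling with a Beta(1, 1) prior per arm."""
    k = len(probs)
    wins, losses = [1] * k, [1] * k
    choices = []
    for _ in range(n_trials):
        # Sample a plausible reward probability for each arm, pick the largest.
        samples = [rng.betavariate(wins[i], losses[i]) for i in range(k)]
        arm = samples.index(max(samples))
        if rng.random() < probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
        choices.append(arm)
    return choices


def ucb1(probs, n_trials, rng):
    """UCB1: play each arm once, then maximise mean + sqrt(2 ln t / n)."""
    k = len(probs)
    counts, means = [0] * k, [0.0] * k
    choices = []
    for t in range(1, n_trials + 1):
        if t <= k:
            arm = t - 1
        else:
            scores = [means[i] + math.sqrt(2.0 * math.log(t) / counts[i])
                      for i in range(k)]
            arm = scores.index(max(scores))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
        choices.append(arm)
    return choices


if __name__ == "__main__":
    probs = (0.7, 0.3)  # illustrative arm reward probabilities
    for name, policy in (("Thompson sampling", thompson_sampling), ("UCB1", ucb1)):
        curve = cumulative_regret(policy(probs, 100, random.Random(0)), probs)
        print(name, "cumulative regret after 100 trials:", round(curve[-1], 2))
```

Averaging such curves over many episodes with freshly sampled reward probabilities would give a curve comparable in kind to those in the figure; a single run is noisy.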
[Figure: example bandit reward probabilities: 0.7 0.3 0.6 0.4 0.3 0.7 0.8 0.2]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
[Figure: cumulative regret (y-axis, 1 to 4) vs. trial (x-axis, 1 to 100); curves: Gittins indices, UCB, Thompson sampling. Second panel: choices by trial and episode]
Wang et al., Nature Neuroscience (2018), Wang et al., Cog. Sci. (2016)
[Figure: x-axis: Training episodes]
Volkmann et al., Nature Reviews Neurology, 2010
[Figure: two panels; axes: log2(R_R/R_L) and log2(C_R/C_L), each ranging from -4 to 4]
Tsutsui et al., Nature Comms, 2016
Wang et al., Nature Neuroscience (2018)
[Figure: bar chart; y-axis: Proportion (0.1 to 0.6); categories: a_{t-1}, r_{t-1}, a_{t-1} × r_{t-1}, v_t]
Tsutsui et al., Nature Comms, 2016
[Figure: bar chart; y-axis: Correlation (0.1 to 0.6); categories: a_{t-1}, r_{t-1}, a_{t-1} × r_{t-1}, v_t]
Wang et al., Nature Neuroscience (2018)
[Figure: panels A and B; x-axis: Step (0 to 200); y-axis: 0 to 1; traces: reward probability, inferred/decoded volatility, learning rate; labels: action, feedback]
Behrens et al., Nature Neuroscience, 2007
Wang et al., Nature Neuroscience (2018)
Volkmann et al., Nature Reviews Neurology, 2010
Bromberg-Martin et al, J Neurophys, 2010
[Figure: reversal task (label: REVERSAL); conditions: left rewarded, right rewarded]
Wang et al., Nature Neuroscience (2018)
Miller, Botvinick & Brody, Nat. Neuro., 2017; Daw et al., Neuron, 2011
Model-based RL (from model-free RL)
[Figure: scatter plot; axes: Model-based RPE and Meta-RL RPE, each from -1 to 1; r^2 = 0.89; labels: Stage 2, Reward]
Wang et al., Nature Neuroscience (2018)
Optogenetic manipulation of dopamine
• DA blocked upon food reward from large/risky option
• DA blocked upon food reward from small/certain option
• DA triggered upon food omission from large/risky option
Wang et al., arXiv, 2018; Stopper et al., Neuron, 2014
Mnih et al, Nature (2015)
Current / Future Work
• Richer environments / abstractions (Espeholt et al., arXiv, 2018)
• Architectural biases (e.g., Raposo et al., NIPS, 2017)
• Complementary forms of meta-learning (e.g., Fernando et al., under review)
• Episodic reinstatement (Ritter et al., in press)
Neuroscience and AI: A virtuous circle
Collaborators
Jane Wang, Zeb Kurth-Nelson, Dharshan Kumaran, Chris Summerfield, Hubert Soyer, Joel Leibo, Sam Ritter, Adam Santoro, Tim Lillicrap, David Barrett, Dhruva Tirumala, Remi Munos, Charles Blundell, Demis Hassabis
DeepMind, London, UK
Gatsby Computational Neuroscience Unit, UCL