hierarchical bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05...

76
Hierarchical Bayesian modelling of volatile environments Christoph Mathys Scuola Internazionale Superiore di Studi Avanzati (SISSA) Trieste Quantitative Life Sciences Guest Seminar The Abdus Salam International Centre for Theoretical Physics (ICTP) Trieste, 8 March 2017

Upload: others

Post on 24-Jun-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HierarchicalBayesianmodellingofvolatileenvironments

ChristophMathys

Scuola Internazionale Superiore diStudi Avanzati (SISSA)

Trieste

QuantitativeLifeSciencesGuestSeminarTheAbdus SalamInternationalCentreforTheoreticalPhysics(ICTP)

Trieste,8March2017

Page 2: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

• The inferential (i.e., Bayesian) brain

• Reduction of Bayesian inference to precision-weighted prediction errors

• Non-stationary environments and the hierarchical Gaussian filter (HGF)

• Experimental studies and results

Outline

2

Page 3: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Computationalmodellingandtheinferentialbrain

Stephanetal.(2016),Front.Hum.Neurosci.,10:5503

Page 4: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Abstract:

«Thedesignofacomplexregulatoroftenincludesthemakingofamodelofthesystemtoberegulated.Themakingofsuchamodelhashithertobeenregardedasoptional,asmerelyoneofmanypossibleways.

Inthispaperatheoremispresentedwhichshows,underverybroadconditions,thatanyregulatorthatismaximallybothsuccessfulandsimplemust beisomorphicwiththesystembeingregulated.(Theexactassumptionsaregiven.)Makingamodelisthusnecessary.

Thetheoremhastheinterestingcorollarythatthelivingbrain,sofarasitistobesuccessfulandefficientasaregulatorforsurvival,must proceed,inlearning,bytheformationofamodel(ormodels)ofitsenvironment.»

“Everygoodregulatorofasystemmustbeamodelofthatsystem”(Conant&Ashby,1970)

4

Page 5: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

• «Bayesian inference» simply means inference on uncertain quantitiesaccording to the rules of probability theory (i.e., according to logic).

• Agents who use Bayesian inference will make better predictions (providedthey have a good model of their environment), which will give them anevolutionary advantage.

• We may therefore assume that evolved biological agents use Bayesianinference, or a close approximation to it.

• But how can we reduce Bayesian inference to a simple algorithm that canbe implemented by neurons?

Whattheinferentialbraindoes:Bayesianinference

5

Page 6: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Imagine the following situation:

You’re on a boat, you’re lost in a storm and trying to get back to shore. Alighthouse has just appeared on the horizon, but you can only see it whenyou’re at the peak of a wave. Your GPS etc., has all been washed overboard,but what you can still do to get an idea of your position is to measure the anglebetween north and the lighthouse. These are your measurements (in degrees):

76,73,75,72,77

What number are you going to base your calculation on?

Right. The mean: 74.6. How do you calculate that?

BeforewereturntoBayes:averysimpleexampleofupdatinginresponsetonewinformation

6

Page 7: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

The usual way to calculate the mean �̅� of 𝑥#, 𝑥%, … , 𝑥' is to take

�̅� =1𝑛+𝑥,

'

,-#

This requires you to remember all 𝑥, , which can become inefficient. Since themeasurements arrive sequentially, we would like to update �̅� sequentially asthe 𝑥, come in – without having to remember them.

It turns out that this is possible. After some algebra (see next slide), we get

�̅�'.# = �̅�' +1

𝑛 + 1 𝑥'.# − �̅�'

Updatingthemeanofaseriesofobservations

7

Page 8: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Proof of sequential update formula:

�̅�'.# =1

𝑛 + 1+ 𝑥, =𝑥'.#𝑛 + 1

'.#

,-#

+1

𝑛 + 1+𝑥, =𝑥'.#𝑛 + 1

'

,-#

+𝑛

𝑛 + 11𝑛+𝑥,

'

,-#-2̅3

=

=𝑥'.#𝑛 + 1 +

𝑛𝑛 + 1 �̅�' = �̅�' +

𝑥'.#𝑛 + 1 +

𝑛𝑛 + 1 �̅�' −

𝑛 + 1𝑛 + 1 �̅�' =

= �̅�' +1

𝑛 + 1 𝑥'.# + 𝑛 − 𝑛 − 1 �̅�' = �̅�' +1

𝑛 + 1 𝑥'.# − �̅�'

q.e.d.

Updatingthemeanofaseriesofobservations

8

Page 9: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

The seqential updates in our example now look like this:

Updatingthemeanofaseriesofobservations

72 73 74 75 76 77

�̅�# = 76

�̅�% = 76 +12 73 − 76 = 74.5

�̅�; = 74.5 +13 75 − 74.5 = 74.6<

�̅�= = 74.6< +14 72 − 74.6< = 74

�̅�> = 74 +15 77 − 74 = 74.6

9

Page 10: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

�̅�'.# = �̅�' +1

𝑛 + 1 𝑥'.# − �̅�'

Whatarethebuildingblocksoftheupdateswe’vejustseen?

prediction

predictionerror

newinput

weight(learningrate)

Is this a general pattern?

More specifically, does it generalize to Bayesian inference?

Indeed, it turns out that in many cases, Bayesian inference can be based onparameters that are updated using precision-weighted prediction errors.

10

Page 11: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Think boat, lighthouse, etc., again, but now we’re doing Bayesian inference.

Before we make the next observation, our belief about the true value of the parameter 𝜗 can bedescribed by a Gaussian prior:

𝑝(𝜗) ∼ 𝒩(𝜇F, 𝜋FH#)

The likelihood of an observation 𝑥 is also Gaussian, with precision 𝜋I :

𝑝 𝑥 𝜗 ∼ 𝒩 𝜗, 𝜋IH#

Bayes’ rule now tells us that the posterior is Gaussian again:

𝑝 𝜗 𝑥 =𝑝 𝑥 𝜗 𝑝(𝜗)

∫ 𝑝 𝑥 𝜗′ 𝑝 𝜗′ d𝜗′��

∼ 𝒩 𝜇F|2, 𝜋F|2H#

UpdatesinasimpleGaussianmodel

11

Page 12: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Here’s how the updates to the sufficent statistics 𝜇 and 𝜋 describing our belief look like:

𝜋F|2 = 𝜋F + 𝜋I

𝜇F|2 = 𝜇F +𝜋I𝜋F|2

(𝑥 − 𝜇F)

The mean is updated by an uncertainty-weighted (more specifically: precision-weighted)prediction error.

The size of the update is proportional to the likelihood precision and inversely proportional to theposterior precision.

This pattern is not specific to the univariate Gaussian case, but generalizes to Bayesian updatesfor all exponential families of likelihood distributions with conjugate priors (i.e., to all formaldescriptions of inference you are ever likely to need).

UpdatesinasimpleGaussianmodel

prediction weight(learningrate)=OPQRSTOQUVWUXUYWZ[Z\OUWU

OPQRSTOQUYXWUY]^_ZPQ

predictionerror

12

Page 13: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Reminder (Gaussian update):

𝜇F|2 = 𝜇F +𝜋I𝜋F|2

𝑥 − 𝜇F = 𝜇F +𝜋I

𝜋F + 𝜋I(𝑥 − 𝜇F)

Reducing by 𝜋`the fraction of precisions that make the learning rate, we get

𝜇F|2 = 𝜇F +1

𝜋F𝜋I+ 1

(𝑥 − 𝜇F)

As we shall see, this is the equation for updating an arithmetic mean, but with the number ofobservations 𝑛 replaced by ab

ac.

This shows that Bayesian inference on the mean of a Gaussian distribution entails nothing morethan updating the arithmetic mean of observations with ab

ac=: 𝜈 as a proxy for the number of

prior observations, i.e. for theweight of the prior relative to the observation.

Reductiontomeanupdating

13

Page 14: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Many of the most widely used probability distributions are families of exponential distributions.

For example, the Gaussian distribution is an exponential family of distributions (and so are the beta,gamma, binomial, Bernoulli, multinomial, categorical, Dirichlet, Wishart, Gaussian-gamma, log-Gaussian, multivariate Gaussian,Poisson, and exponential distributions, among others). This means it can be written the following way:

𝑝 𝒙 𝝑 = ℎ 𝒙 exp 𝜼 𝝑 m 𝑻 𝒙 − 𝐴(𝝑) =12𝜋𝜎� exp −

𝑥 − 𝜇 %

2𝜎with

𝒙 = 𝑥, 𝝑 = 𝜇, 𝜎 u, ℎ 𝒙 =12𝜋� , 𝜼 𝝑 =

𝜇𝜎 , −

12𝜎

u, 𝑻 𝒙 = 𝑥, 𝑥% u, 𝐴 𝝑 =

𝜇%

𝜎 +ln 𝜎2

This allows us to look at Bayesian belief updating in a very general way for all exponentialfamilies of distributions.

Generalizationtoallexponentialfamiliesofdistributions

14

Page 15: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Our likelihood is an exponential family in its general form:

𝑝 𝒙 𝝑 = ℎ 𝒙 exp 𝜼 𝝑 m 𝑻 𝒙 − 𝐴(𝝑)

The vector 𝑻 𝒙 (a function of the observation 𝒙) is called the sufficient statistic.

For the prior, we may assume that we have made 𝜈 observations with sufficient statistic 𝝃:

𝑝 𝝑 𝝃, 𝜈 = 𝑧 𝝃, 𝜈 exp 𝜈 𝜼 𝝑 m 𝝃 − 𝐴(𝝑) (where𝑧 𝝃, 𝜈 isanormlizationconstant)

It then turns out that the posterior has the same form, but with an updated 𝝃 and 𝜈 replaced with𝜈 + 1:

𝑝 𝝑 𝒙, 𝝃, 𝜈 = 𝑧 𝝃z, 𝜈 + 1 exp 𝜈 + 1 𝜼 𝝑 m 𝝃′ − 𝐴(𝝑)

𝝃z = 𝝃 +1

𝜈 + 1 𝑻 𝒙 − 𝝃

Generalizationtoallexponentialfamiliesofdistributions(Mathys,2016)

15

Page 16: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Proofoftheupdateequation

𝑝 𝝑 𝒙, 𝝃, 𝜈{P|}UW[PW

∝ 𝑝 𝒙 𝝑X[_UX[OPP]

𝑝 𝝑 𝝃, 𝜈{W[PW

= ℎ 𝒙 exp 𝜼 𝝑 m 𝑻 𝒙 − 𝐴(𝝑) 𝑧 𝝃, 𝜈 exp 𝜈 𝜼 𝝑 m 𝝃 − 𝐴(𝝑)

∝ exp 𝜼 𝝑 m 𝑻 𝒙 + 𝜈𝝃 − 𝜈 + 1 𝐴 𝝑

= exp 𝜈 + 1 𝜼 𝝑 m1

𝜈 + 1 𝑻 𝒙 + 𝜈𝝃 − 𝐴 𝝑

= exp 𝜈 + 1 𝜼 𝝑 m 𝝃 +1

𝜈 + 1 𝑻 𝒙 + 𝜈𝝃 − 𝜈 + 1 𝝃 − 𝐴 𝝑

= exp 𝜈 + 1 𝜼 𝝑 m 𝝃 +1

𝜈 + 1 𝑻 𝒙 − 𝝃-:𝝃z

− 𝐴 𝝑

⟹ 𝑝 𝝑 𝒙, 𝝃, 𝜈 = 𝑧 𝝃z, 𝜈z exp 𝜈′ 𝜼 𝝑 m 𝝃z − 𝐴 𝝑

with𝜈z ≔ 𝜈 + 1, 𝝃z ≔ 𝝃 +1

𝜈 + 1 𝑻 𝒙 − 𝝃

q.e.d.

16

Page 17: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Univariate Gaussian model with unkown mean but known precision (our example from thebeginning):

𝑻 𝑥 = 𝑥

This means updating beliefs about the mean simply requires tracking the mean of observations

Univariate Gaussianmodel with unkownmean and unkown precision:

𝑻 𝑥 = 𝑥, 𝑥% u

Updating beliefs about both mean and precision of a Gaussian requires tracking the means ofobservations and squared observations; this amounts to the first and second moments by which aGaussian distribution is fully characterized.

In themultivariate Gaussian case we have 𝑻 𝒙 = 𝒙, 𝒙𝒙u u

Someexamples

17

Page 18: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Bernoullimodel (one out of two possible outcomes, coded as 0 and 1; e.g., coin flipping):

𝑻 𝑥 = 𝑥

The prior here turns out to be a beta distribution corresponding to 𝜈 pseudo-observations withmean 𝜉. All we need to do to get the posterior (i.e., to update our belief) is to update the mean asnew observations come in.

Categoricalmodel (one out of several possible outcomes, with the observed outcome coded as 1,the rest as 0)

𝑻 𝒙 = 𝒙

The prior and posterior here are Dirichlet distributions, and again, all we need to do to updatebeliefs that have a Dirichlet form is to track the means of observed succeses (1) and failures (0).

Someexamples

18

Page 19: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Betamodel (an outcome bounded between 0 and 1):

𝑻 𝑥 = ln 𝑥 , ln 1 − 𝑥 u

Gammamodel (an outcome bounded below at 0):

𝑻 𝑥 = ln 𝑥 , 𝑥 u

Someexamples

19

Page 20: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

…we have dealt (among others) with beliefs about states that are• binary (Bernoulli)

• categorical

• bounded on both sides (beta)• bounded on one side (gamma)

• and unbounded (Gaussian)

This includes most kinds of states we can have beliefs about. Notably,

• All Bayesian (i.e., probabilistic, rational) updates of such beliefs take the form ofprecision-weighted prediction errors.

• These prediction errors and their precision weights are easy to compute.

• The prediction errors are simple functions of inputs.

...soinsum:

20

Page 21: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

• Examples of distributions that are not exponential families: Student’s t, Cauchy

• These distributions are popular because of their «fat tails». However, fat tails canalso be achieved with appropriate hierarchies of Gaussians (cf. the hierarchicalGaussian filter, HGF)

• A further kind of distributions that are not exponential families are found inmixture models.

• Such models are popular because of they provide multimodal distributions. Butagain, appropriate hierarchies of distributions may save the day.

Limitations

21

Page 22: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

• Formulate the problem hierarchically (i.e., imitate evolution: whenit built a brain that supports a mind which is a model of itsenvironment, it came up with a (largely) hierarchical solution)

• Separate levels using a mean-field approximation

• Derive update equations

• Example: HGF

Howtorevealtheprecision-weightingofpredictionerrorswhensimpleexponential-familylikelihoodswillnotdo

22

Page 23: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Neurobiology:predictivecoding(e.g.,Friston,2005)

23

Page 24: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

𝜆 𝑥

Sensoryinput

Truehiddenstates

Inferredhiddenstates

Action

𝑢

𝑎

WorldAgent No,thedynamicsaremissing!

But:doesinferenceaswe’vedescribeditadequatelydescribethesituationofactualbiologicalagents?

24

Page 25: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Up to now, we’ve only looked at inference on static quantities, but biologicalagents live in a continually changing world.

In our example, the boat’s position changes and with it the angle to thelighthouse.

How can we take into account that old information becomes obsolete? If wedon’t, our learning rate becomes smaller and smaller because our eqationswere derived under the assumption that we’re accumulating informationabout a stable quantity.

Whataboutdynamics?

25

Page 26: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Keep it constant!

So, taking the update equation for the mean of our observations as our point of departure...

�̅�' = �̅�'H# +1𝑛 𝑥' − �̅�'H# ,

... we simply replace #'with a constant 𝛼:

𝜇' = 𝜇'H# + 𝛼 𝑥' − 𝜇'H# .

This is called Rescorla-Wagner learning [although it wasn’t this line of reasoning that ledRescorla & Wagner (1972) to their formulation].

What’sthesimplestwaytokeepthelearningratefromgoingtoolow?

26

Page 27: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Partly: it implies a certain rate of forgetting because it amounts to taking onlythe 𝑛 = #

�last data points into account. But...

... if the learning rate is supposed to reflect uncertainty in Bayesian inference,then how do we

(a) know that 𝛼 reflects the right level of uncertainty at any one time, and

(b) account for changes in uncertainty if 𝛼 is constant?

What we really need is an adaptive learning that accurately reflectsuncertainty.

Doesaconstantlearningratesolveourproblems?

27

Page 28: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

This requires us to think a bit more about what kinds of uncertainty we are dealing with.

A possible taxonomy of uncertainty is (cf. Yu & Dayan, 2003; Payzan-LeNestour & Bossaerts,2011):

(a) outcome uncertainty that remains unaccounted for by the model, called risk byeconomists (𝜋I in our Bayesian example); this uncertainty remains even when we know allparameters exactly,

(b) informational or expected uncertainty about the value of model parameters (𝜋F|2 in theBayesian example),

(c) environmental or unexpected uncertainty owing to changes in model parameters (notaccounted for in our Bayesian example, hence unexpected).

Needed:anadaptivelearningratethataccuratelyreflectsuncertainty

28

Page 29: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Various efforts have been made to come up with an adaptive learning rate:– Kalman (1960)

– Sutton (1992)

– Nassar et al. (2010)– Payzan-LeNestour & Bossaerts (2011)

– Mathys et al. (2011)

– Wilson et al. (2013)

The Kalman filter is optimal for linear dynamical systems, but realistic data usuallyrequire non-linear models.

Mathys et al. use a generic non-linear hierarchical Bayesian model that allows us toderive update equations that are optimal in the sense that they minimize surprise.

Anadaptivelearningratethataccuratelyreflectsuncertainty

29

Page 30: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

ThehierarchicalGaussianfilter(HGF,Mathysetal.,2011;2014)

𝑝 𝑥'(�)

𝑥%(�H#)

𝑥'(�)~𝒩 𝑥'

(�H#), 𝜗

𝑥#(�)~𝒩 𝑥#

�H# , 𝑓# 𝑥%

𝑝 𝑥#(�)

𝑥#(�H#)

𝑥%(�)~𝒩 𝑥%

�H# , 𝑓% 𝑥;

𝑝 𝑥%(�)

𝑥%(�H#)

𝑥,(�)~𝒩 𝑥,

�H# , 𝑓, 𝑥,.#

𝑝 𝑥,(�)

𝑥,(�H#)

30

The HGF provides a generic solution to the problem of adapting one’s learning rate in avolatile environment.

Variationalinversionleadstoupdateequationsthatare– youguessedit– precision-weightedpredictionerrors.

Page 31: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Couplingbetweenlevels

Since𝑓 hastobeeverywherepositive,wecannotapproximateitbyexpandinginpowers.Instead,weexpanditslogarithm.

𝑓 𝑥 > 0∀𝑥 ⟹∃𝑔: 𝑓 𝑥 = exp 𝑔 𝑥 ∀𝑥

𝑔 𝑥 = 𝑔 𝑎 + 𝑔z 𝑎 m 𝑥 − 𝑎 + 𝑂 2 = log 𝑓 𝑥 =

= log 𝑓 𝑎 +𝑓′(𝑎)𝑓(𝑎) m 𝑥 − 𝑎 + 𝑂(2) =

=𝑓′(𝑎)𝑓(𝑎)≝�

m 𝑥 + log 𝑓 𝑎 − 𝑎 m𝑓z(𝑎)𝑓(𝑎)

≝�

+ 𝑂 2 =

= 𝜅𝑥 + 𝜔 + 𝑂 2

⟹ 𝑓 𝑥 ≈ exp 𝜅𝑥 + 𝜔

31

Page 32: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Variationalinversion

• Aquadraticapproximationisfoundbyexpandingtosecondorderabouttheexpectation𝜇(�H#).

• Theupdateinthesufficientstatisticsoftheapproximateposterioristhenperformedbyanalyticallyfindingthemaximumofthequadraticapproximation.

x

μ(k) μ(k-1)

Ĩ(x)I(x)

Expansion point

Our quadraticapproximation

Variationalenergy

Maximum of Ĩ

Laplace’s quadraticapproximation

Mathys et al. (2011). Front. Hum. Neurosci., 5:39.

32

Page 33: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Δ𝜇, ∝𝜋�,H#𝜋,

𝛿,H#

• Inversion proceeds by introducing a mean field approximation and fittingquadratic approximations to the resulting variational energies (Mathys et al.,2011).

• This leads to simple one-step update equations for the sufficient statistics(mean and precision) of the approximate Gaussian posteriors of the states𝑥, .

• The updates of the means have the same structure as value updates inRescorla-Wagner learning:

• Furthermore, the updates are precision-weighted prediction errors.

Variationalinversionandupdateequations

Prediction error

Precisions determine learning rate

33

Page 34: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Precision-weightingofvolatilityupdates

Comparisontothesimplenon-hierarchicalBayesianupdate:

HGF:

SimpleGaussian:

𝜇,(�) = 𝜇,

(�H#) +12 𝜅,H#𝑣,H#

(�) m𝜋�,H#(�)

𝜋,(�) m 𝛿,H#

(�)

𝜇F|2 = 𝜇F +𝜋I𝜋F|2

(𝑥 − 𝜇F)

Precision-weighted prediction error

34

Page 35: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

At the outcome level (i.e., at the very bottom of the hierarchy), we have

𝑢(�)~𝒩 𝑥#� , 𝜋��H#

This gives us the following update for our belief on 𝑥# (our quantity of interest):

𝜋#(�) = 𝜋�#

(�) + 𝜋��

𝜇#(�) = 𝜇#

(�H#) +𝜋��𝜋#(�) 𝑢 � − 𝜇#

(�H#)

The familiar structure again – but now with a learning rate that is responsive to allkinds of uncertainty, including environmental (unexpected) uncertainty.

Updatesattheoutcomelevel

35

Page 36: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Unpacking the learning rate, we see:

𝜋��𝜋#(�) =

𝜋��𝜋�#(�) + 𝜋��

=𝜋��

1𝜎#(�H#) + exp 𝜅#𝜇%

(�H#) + 𝜔#+ 𝜋��

ThelearningrateintheHGF

informationaluncertainty

environmentaluncertainty

outcomeuncertainty

36

Page 37: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

3-levelHGFforcontinuousobservations

37

Page 38: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Exampleofprecisionweighttrajectory

38

Page 39: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Contexteffectsonthelearningrate

Simulation: 4.1 ,2.2 ,5.0 =-== kwJ

39

Page 40: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Iglesiasetal,Neuron,2013)

40

Page 41: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Modelcomparison:

HGF:empiricalevidence(Iglesiasetal,Neuron,2013)

41

Page 42: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Modelcomparison:

HGF:empiricalevidence(Iglesiasetal,Neuron,2013)

42

Page 43: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Iglesiasetal,Neuron,2013)

43

Page 44: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Iglesiasetal,Neuron,2013)

44

Page 45: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Iglesiasetal,Neuron,2013)

45

Page 46: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(DeBerkeretal,NatureComms,2016)

46

Page 47: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(DeBerkeretal,NatureComms,2016)

47

Page 48: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(DeBerkeretal,NatureComms,2016)

48

Page 49: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(DeBerkeretal,NatureComms,2016)

49

Page 50: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Marshall&Mathysetal.,PLOSBiol.,2016)

50

Page 51: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Marshall&Mathysetal.,PLOSBiol.,2016)

51

Page 52: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Marshall&Mathysetal.,PLOSBiol.,2016)

52

Page 53: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Marshall&Mathysetal.,PLOSBiol.,2016)

53

Page 54: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Diaconescu etal.,inpreparation)

54

Page 55: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Hierarchy

3. Outcome PE

1. Cue-Related PE

2. Advice PE

5. Volatility PE

4. Advice Precision

6. Volatility Precision

x=5, y=0, z=35

x=7, y=7, z=-2

x=57, y=-44, z=7

x=6, y=33, z=20

x=9, y=42, z=11 x=9, y=4, z=52

x=7, y=37, z=32

x=-7, y=-56, z=20

Positive

x=7, y=39, z=38

Negative

x=-30, y=-50, z=46

!"#$ = & $ − "# $

!($ = & $ − )*(

$

+,($) = /*(

$ + +*,$

!,$ = /, + ),

$ − ),$1( ,

+*,$ − (

+3($4 = /*3

$ +5,

, 6,$ 6,

$ + 7,$ !,

$

!8$ = & $ − 8 $

HGF:empiricalevidence(Diaconescu etal.,inpreparation)

55

Page 56: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Diaconescu etal.,inpreparation)

Scal

p Lo

catio

n

peak: 134ms

posterior

anterior

1. Cue-related PE

peak: 152ms

2a. Negative Advice PE

4. Advice Precision

peak: 358ms

peak: 400ms

peak: 414ms

peak: 310ms

3. Outcome PE 5. Volatility PE

50 Time (ms) 550

6. Volatility Precision

Hierarchy

2b. Positive Advice PE

peak: 242ms

56

Page 57: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Diaconescu etal.,inpreparation)

57

Page 58: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

�������������

A

tone outcome

300ms 150ms

response1500-2500ms

orface

nonoise

mednoise

highnoise

loworhigh

house

C

*

**

**

BASD

ASD

ASD

ASD

Communication Score (ADOS-2)

UE-

E R

T (m

s)a.

u

E

D

UE-

E ou

tcom

es (a

.u.)

NTASD

*

*

NT

NT

NT

NT

time(ms)

HGF:empiricalevidence(Lawsonetal.,inrevision)

58

Page 59: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

HGF:empiricalevidence(Lawsonetal.,inrevision)

Effect of precision-weighted volatility prediction error 𝜺𝟑 on pupil diameter:

59

Page 60: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Howtoestimateandcomparemodels:theHGFToolbox

• Availableathttps://www.tnu.ethz.ch/tapas

• StartwithREADME,manual,andinteractivedemo

• Modular,extensible

• Matlab-based

60

Page 61: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

ThanksRickAdamsArchiedeBerker

SvenBestmannKayBrodersen

JeanDaunizeau

Andreea DiaconescuRayDolan

KarlFristonSandraIglesias

LarsKasperRebeccaLawson

EkaterinaLomakinaBerkMirza

ReadMontagueTobiasNolte

Saee PaliwalRobbRutledge

Klaas EnnoStephanPhilippSchwartenbeck

SimoneVosselLilianWeber

61

Page 62: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Applicationtobinarydata

State of the world Model

Log-volatilityx3

of tendency

Gaussianrandom walk with

constant step size ϑ

p(x3(k)) ~ N(x3

(k-1),ϑ)

Tendencyx2

towards category “1”

Gaussian random walk with

step sizeexp(κx3+ω)

p(x2(k)) ~ N(x2

(k-1), exp(κx3+ω))

Stimulus categoryx1

(“0” or “1”)

Sigmoid trans-formation of x2

p(x1=1) = s(x2)p(x1=0) = 1-s(x2)

0

x2

1

p(x1=1)

𝑥#(�H#)

𝜅, 𝜔

𝜗

𝑥;(�H#)

𝑥%(�H#)

𝑥;(�)

𝑥%(�)

𝑥#(�)

x3(k-1)

p(x3(k))

x2(k-1)

p(x2(k))

Mathys et al. (2011). Front. Hum. Neurosci., 5:39.62

Page 63: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Updateequationforbinaryobservations

• 𝑥# ∈ {0,1} isobservedbytheagent.Eachobservationleadstoanupdateinthebeliefon𝑥%, 𝑥;, …,andsoonupthehierarchy.

• Theupdatesfor𝑥% canbederivedinthesamemannerasabove.

• Atfirst,thissimplylookslikeanuncertainty-weightedupdate.However,whenweunpack𝜎% anddoaTaylorexpansioninpowersof𝜋�#,weseethatitisagainproportionaltotheprecisionofthepredictiononthelevelbelow:

• Atallhigherlevels,theupdatesareaspreviouslyderived.

𝐼 𝑥%(�) = ln 𝑠 𝑥%

(�) + 𝑥%(�) 𝑥#

(�) − 1 −12𝜋�%

� 𝑥%(�) − 𝜇%

(�H#) %

𝜇%(�) = 𝜇%

(�H#) + 𝜎%(�)𝛿#

(�)

𝜎%(�) =

𝜋�#�

𝜋�%� 𝜋�#

� + 1= 𝜋�#

� − 𝜋�%� 𝜋�#

� %+ 𝜋�%

� %𝜋�#� ;

+ 𝑂(4)

63

Page 64: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

VAPEsandVOPEs

Theupdatesofthebeliefon𝑥# aredrivenbyvaluepredictionerrors(VAPEs)

𝜇#(�) = 𝜇#

(�H#) +𝜋��𝜋#(�) 𝑢(�) − 𝜇#

(�H#) ,

whilethe𝑥%-updatesaredrivenbyvolatilitypredictionerrors(VOPEs)

𝜇%(�) = 𝜇%

(�H#) +12 𝜅#𝑣#

(�) 𝜋�#(�)

𝜋%(�) 𝛿#

(�)

𝛿#(�) ≝

𝜎#� + 𝜇#

� − 𝜇#(�H#) %

𝜎#(�H#) + exp 𝜅#𝜇%

(�H#) + 𝜔#− 1

VAPE

VOPE

64

Page 65: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

3-levelHGFforbinaryobservations

𝑥;

𝑥%

𝜗

𝜅, 𝜔

𝑥#

𝑥;� ~𝒩 𝑥;

�H# , 𝜗

𝑥%� ~𝒩 𝑥%

�H# , exp 𝜅𝑥;� + 𝜔

𝑥#� ~Bern 𝑠 𝑥%

Mathysetal.,2011;Iglesiasetal.,2013;Vosseletal.,2014a;Hauseretal.,2014;Diaconescuetal.,2014;Vosseletal.,2014b;...

VAPE

VOPE

65

Page 66: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Notation

𝑦

ζ

≝ζ

𝑦(�H#) 𝑦(�) 𝑦(�.#)

𝑘 = 1,… , 𝑛

ζ

≝ζ

𝑦(�)

𝑘 = 1,… , 𝑛

𝑦

𝑥

≝𝑘 = 1,… , 𝑛

𝑢 𝑢(�H#) 𝑢(�) 𝑢(�.#)

𝑥(�H#) 𝑥(�) 𝑥(�.#)

66

Page 67: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

3-levelHGFforcontinuousobservations

𝑢

𝑥;

𝑥#

𝜗

𝜅#, 𝜔#

𝜋��

VAPE

VOPE

𝑥#(�)~𝒩 𝑥#

�H# , exp 𝜅#𝑥%(�) + 𝜔#

𝑥;(�)~𝒩 𝑥;

�H# , 𝜗

𝑢(�)~𝒩 𝑥#� , 𝜋��H#

𝑥%𝜅%, 𝜔%

VOPE

𝑥%(�)~𝒩 𝑥%

�H# , exp 𝜅%𝑥;(�) + 𝜔%

67

Page 68: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Variabledrift

𝑢

𝑥% 𝑧%

𝑥# 𝑧#

𝜗2 𝜗©

𝜅2, 𝜔2 𝜅©, 𝜔©

𝜋��

VAPE

VAPE

VOPE VOPE

𝑥#(�)~𝒩 𝑥#

�H# + 𝑧#(�H#), exp 𝜅2𝑥%

(�) + 𝜔2

68

Page 69: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

JumpingGaussianestimationtask

DatafromChaohui Guo

69

Page 70: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Independentmeanandvariancemodel

𝑢

𝑥ª 𝛼ª

𝑥 𝛼

𝜗2 𝜗�

𝜅2, 𝜔2 𝜅�, 𝜔�

𝜅�, 𝜔�

VAPE VAPE

VOPE VOPE

𝑢(�)~𝒩 𝑥 � , exp 𝜅�𝛼(�) + 𝜔�

70

Page 71: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

JumpingGaussianestimationtask

71

Page 72: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Actionasactiveinference

Stephanetal.(2016),Front.Hum.Neurosci.,10:55072

Page 73: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Actionasactiveinference

Stephanetal.(2016),Front.Hum.Neurosci.,10:550

Model:

Inference(i.e.,beliefupdate):

73

Page 74: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Actionasactiveinference

Stephanetal.(2016),Front.Hum.Neurosci.,10:550

Deltapriorsonmeanandprecisionofstate:

Negativeoflog-evidenceL isShannonsurpriseS:

74

Page 75: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Actionasactiveinference

Stephanetal.(2016),Front.Hum.Neurosci.,10:550

Definitionofaction:

ActioninducesgradientdescentonsurpriseS:

75

precision-weightedpredictionerror

Page 76: Hierarchical Bayesian modelling of volatile environments › wp-content › uploads › 2015 › 05 › QLS_ICT… · Abstract: «The design of a complex regulator often includes

Actionasactiveinference

Stephanetal.(2016),Front.Hum.Neurosci.,10:55076