Bayesian theories of conditioning in a changing world
Advanced Signal Processing 2, SS2012
Alexander Melzer
SPSC, TU Graz
May 6, 2012
Introduction Classical and Statistical Conditioning Sigmoid Belief Networks References
Outline
1 Introduction
2 Classical and Statistical Conditioning
3 Sigmoid Belief Networks
Alexander Melzer SPSC, TU Graz Bayesian theories of conditioning in a changing world
Introduction
Issue: The finding that surprising events provoke animals to learn faster
Prediction of biologically significant events
Quantitative models of conditioning
Recent interest: Reframing in explicitly statistical terms
Introduction
Surprise causes faster learning due to signal change → increased uncertainty
Pearce’s theory of surprise in conditioning ↔ Bayesian inference
Change is a relatively unexplored aspect of the Bayesian model space
Stimuli differentiation
Conditioned stimuli, CSs
Neutral stimuli, events unknown to the animal
Examples: bells, lights

Unconditioned stimuli, USs
Biologically significant reinforcers for animals
Examples: food, shock

Conditioned responses, CRs
Animals' predictions under various patterns of CS/US pairings
Example: light → food
Stimuli differentiation - Pavlovian conditioning
Figure: Pavlovian conditioning
Bayesian accounts of conditioning and change
Interpret the animal's responding as a report of the likelihood of reinforcement, given its experience
Use conditioned responding to reflect subjects’ estimates
P(US(t)|CS(t),D) (1)
where D is the training history of CSs
Different Bayesian accounts can differ in what sort of model they assume → World Models
World Models

Discriminative
P(US(t)|CS(t),D) (2)
→ Reinforcement given the current stimuli
Generative
P(US(t),CS(t)|D) (3)
→ Predict full pattern of both stimuli and reinforcement
Change
How to incorporate the possibility of change?

w(t − 1) → w(t)

w(t) = [w_0(t), w_1(t), ..., w_n(t)]^T
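One simple way to realise the transition w(t − 1) → w(t) is a Gaussian random walk on the weight vector. This is only an illustrative sketch: the drift model and its scale `drift_std` are assumptions, not taken from the source.

```python
import numpy as np

def evolve_weights(w_prev, drift_std=0.05, rng=None):
    """Illustrative random-walk transition w(t-1) -> w(t).

    w_prev    : weight vector [w_0, ..., w_n] at trial t-1
    drift_std : assumed standard deviation of the per-trial drift
    """
    rng = np.random.default_rng(0) if rng is None else rng
    # Each weight drifts independently between trials, modelling a
    # (possibly) changing world.
    return w_prev + rng.normal(0.0, drift_std, size=len(w_prev))
```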
Historical models
Consider an experiment with a light l(t) and a sound s(t):

w(t) = [w_l(t), w_s(t)]^T

Pavlovian conditioning: positive association with reward r(t)

Δw_l(t) = α_l(t) (r(t) − w_l(t)) l(t)    (4)

l(t) ∈ {0, 1} ... presence of the light (CS)
r(t) ............ reward (US)
w_l(t) .......... strength of the expectation of reward
α_l(t) .......... learning rate

Similarly,

Δw_s(t) = α_s(t) (r(t) − w_s(t)) s(t)    (5)
Historical models
US-processing theory: Delta rule

w_l(t) = w_l(t − 1) + Δw_l(t),  Δw_l(t) = α_l(t) (r(t) − w_l(t)) l(t)    (6)

and

w_s(t) = w_s(t − 1) + Δw_s(t),  Δw_s(t) = α_s(t) (r(t) − w_s(t)) s(t)    (7)

Associative strength:

V(t) = w_l(t) l(t) + w_s(t) s(t)    (8)

Prediction error: δ(t) = r(t) − V(t)
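The delta rule above can be simulated directly. A minimal sketch, using the per-stimulus errors of (6) and (7) as written on this slide (the classic Rescorla-Wagner rule would use the shared error δ(t) instead); the fixed learning rate `alpha` is an illustrative choice:

```python
import numpy as np

def delta_rule(l, s, r, alpha=0.1):
    """Simulate Eqs. (6)-(8) over T trials.

    l, s : arrays of {0, 1} stimulus presence (light, sound)
    r    : reward on each trial
    Returns weight trajectories w_l, w_s and the prediction errors delta.
    """
    T = len(r)
    wl = np.zeros(T + 1)
    ws = np.zeros(T + 1)
    delta = np.zeros(T)
    for t in range(T):
        wl[t + 1] = wl[t] + alpha * (r[t] - wl[t]) * l[t]  # Eq. (6)
        ws[t + 1] = ws[t] + alpha * (r[t] - ws[t]) * s[t]  # Eq. (7)
        V = wl[t] * l[t] + ws[t] * s[t]                    # Eq. (8)
        delta[t] = r[t] - V                                # prediction error
    return wl, ws, delta
```

With the light always present and a constant reward, w_l converges towards r while w_s stays at zero.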
Paradigms in conditioning
Figure: Paradigms in conditioning (Dickinson, 1980; Mackintosh, 1983)
Paradigms in conditioning
Unblocking with qualitative change in reinforcement
Figure: Unblocking with qualitative change in reinforcement (Courville, Daw and Touretzky, 2006)
Paradigms in conditioning
Overshadowing counteracting latent inhibition

Figure: Overshadowing counteracting latent inhibition (Courville, Daw and Touretzky, 2006)
Paradigms in conditioning
Competition between different stimuli → competition between learning rates

Blocking: nothing unexpected happens when the second stimulus is added (it is shadowed)

Unblocking: learning resumes after a qualitative change in reinforcement (e.g. a delay between rewards)

Extension to the multivariate problem ⇒ statistical formulation
Statistical formulation
Parametrized probability distribution
P[r(t)|s(t), l(t)] (9)
Maximum likelihood inference → maximize the probability P over all samples
⇒ Three natural models of P[r(t)|s(t), l(t)]
Three natural models of P[r(t)|s(t), l(t)]
1) Rescorla-Wagner (Rescorla and Wagner, 1972)

P_G[r(t) | s(t), l(t)] = N[w_l l(t) + w_s s(t), σ²]    (10)

The only addition compared to (8) is the noise variance σ²
Learning of r(t) might be corrupted if substantial noise is present
Downwards unblocking suggests that animals are not using P_G as the basis for their predictions
Three natural models of P[r(t)|s(t), l(t)] (cont)

2) Competitive mixture of experts (Nowlan, 1991; Jacobs et al, 1991)

Mixture of Gaussians model, EM (Expectation-Maximization) algorithm

M step:

Δw_l(t) ∝ (r(t) − w_l(t)) q_l(t)    (11)

where

q_l(t) ∝ π_l(t) exp(−(r(t) − w_l l(t))² / 2σ²)    (12)

and π_l(t) (together with π_s(t)) are the mixing proportions.

P_M[r(t) | s(t), l(t)] = π_l(t) N[w_l, σ²] + π_s(t) N[w_s, σ²] + π̄(t) N[w̄, τ²]    (13)
Model captures downwards unblocking
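One E/M sweep of this competitive mixture can be sketched as follows; the values of σ², τ² and the learning rate are illustrative assumptions, and the mixing proportions are passed in as a fixed tuple for simplicity:

```python
import numpy as np

def competitive_update(r, l, s, wl, ws, wbar, pi,
                       sigma2=0.25, tau2=1.0, alpha=0.1):
    """One E/M sweep of the competitive mixture, Eqs. (11)-(13).

    pi = (pi_l, pi_s, pi_bar) : mixing proportions of the three experts
    """
    # E step: unnormalised responsibilities as in Eq. (12), one per expert,
    # then normalised so they sum to 1.
    ql = pi[0] * np.exp(-(r - wl * l) ** 2 / (2 * sigma2))
    qs = pi[1] * np.exp(-(r - ws * s) ** 2 / (2 * sigma2))
    qb = pi[2] * np.exp(-(r - wbar) ** 2 / (2 * tau2))
    Z = ql + qs + qb
    ql, qs, qb = ql / Z, qs / Z, qb / Z
    # M step: responsibility-weighted delta rules, Eq. (11)
    wl = wl + alpha * (r - wl) * ql * l
    ws = ws + alpha * (r - ws) * qs * s
    wbar = wbar + alpha * (r - wbar) * qb
    return wl, ws, wbar, (ql, qs, qb)
```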
Three natural models of P[r(t)|s(t), l(t)] (cont)
3) Cooperative mixture of experts (Jacobs et al, 1991)
P_J[r(t) | s(t), l(t)] = N[w_l π_l(t) l(t) + w_s π_s(t) s(t), σ²]    (14)

Idea:

P[w_l(t) | r] = N[r, ρ_l^{−1}(t)]    (15)
P[w_s(t) | r] = N[r, ρ_s^{−1}(t)]    (16)

where ρ_l(t) and ρ_s(t) are the inverse variances. Thus,

σ² = (ρ_l(t) + ρ_s(t))^{−1}    (17)

π_l(t) = ρ_l(t) σ²,  π_s(t) = ρ_s(t) σ²    (18)
Three natural models of P[r(t)|s(t), l(t)] (cont)
3) Cooperative mixture of experts (Jacobs et al, 1991)

Normative learning rule:

Δw_l = α_w (π_l(t) / ρ_l(t)) δ(t)    (19)

where δ(t) = r(t) − π_l(t) w_l(t) − π_s(t) w_s(t) is the prediction error
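The precision-weighted quantities (17)–(19) can be combined into a single update step; a sketch, with `alpha_w` as an illustrative learning rate:

```python
def cooperative_update(r, rho_l, rho_s, wl, ws, alpha_w=0.1):
    """One step of the cooperative mixture of experts, Eqs. (14)-(19).

    rho_l, rho_s : inverse variances (precisions) of the two experts
    """
    sigma2 = 1.0 / (rho_l + rho_s)               # Eq. (17)
    pi_l = rho_l * sigma2                        # Eq. (18); pi_l + pi_s = 1
    pi_s = rho_s * sigma2
    delta = r - pi_l * wl - pi_s * ws            # prediction error
    wl = wl + alpha_w * (pi_l / rho_l) * delta   # Eq. (19)
    ws = ws + alpha_w * (pi_s / rho_s) * delta
    return wl, ws, delta
```

Note that π_l/ρ_l = σ², so both weights share the same precision-scaled step size.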
Three natural models of P[r(t)|s(t), l(t)] (summary)
1) Rescorla-Wagner

P_G[r(t) | s(t), l(t)] = N[w_l l(t) + w_s s(t), σ²]

2) Competitive mixture of experts

P_M[r(t) | s(t), l(t)] = π_l(t) N[w_l, σ²] + π_s(t) N[w_s, σ²] + π̄(t) N[w̄, τ²]

3) Cooperative mixture of experts

P_J[r(t) | s(t), l(t)] = N[w_l π_l(t) l(t) + w_s π_s(t) s(t), σ²]
Second-order conditioning
Paradigm           Phase 1 training   Phase 2 training   Test
1st-order cond.    S1-US              -                  S1?
2nd-order cond.    S1-US              S2-S1              S2?
Sensory precond.   S2-S1              S1-US              S2?

Table: Phases of first- and second-order conditioning
Second-order conditioning (cont)
Figure: Transience of second-order conditioning (Gewirtz and Davis, 2000)
Second-order conditioning (cont)
Figure: Schematic representation of hypothetical associations (Gewirtz and Davis, 2000)
Second-order conditioning
Figure: Second-order fear conditioning (Gewirtz and Davis, 2000)
Sigmoid Belief Networks
Conditional probabilities defined as functions of weighted sums of parent nodes:

P(y_j = 1 | x_1, ..., x_c, w_m, m) = 1 / (1 + exp(−∑_i w_ij x_i − w_yj))    (20)

and

P(y_j = 0 | x_1, ..., x_c, w_m, m) = 1 − P(y_j = 1 | x_1, ..., x_c, w_m, m)

w_ij ... weight: influence of the parent node x_i on the child node y_j
w_yj ... bias term
w_m .... model parameters for model structure m
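Equation (20) is simply a logistic function of the summed parental input; a sketch:

```python
import math

def p_child_on(x, w, w_bias):
    """Conditional probability of a child node y_j = 1, Eq. (20).

    x      : list of parent states x_i in {0, 1}
    w      : list of weights w_ij from each parent i to this child
    w_bias : bias term w_yj
    """
    a = sum(wi * xi for wi, xi in zip(w, x)) + w_bias
    return 1.0 / (1.0 + math.exp(-a))
```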
Model representation
Directed graph model
Figure: Sigmoid Belief Network (Courville et al, 2003)
Sigmoid Belief Likelihood
Stimuli are mutually independent given the latent causes → conditional joint probability of the observed stimuli:

∏_{j=1}^{s} P(y_j | x_1, ..., x_c, w_m, m)

Similarly, we assume trials are drawn from a stationary process. Resulting likelihood function of the training data:

P(D | w_m, m) = ∏_{t=1}^{T} ∑_x ∏_{j=1}^{s} P(y_j(t) | x, w_m, m) P(x | w_m, m)    (21)
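For small binary networks, the sum over latent configurations x in (21) can be evaluated by brute-force enumeration. A sketch of the per-trial likelihood; the parameter layout (per-cause biases `wx` giving P(x_i = 1) = sigmoid(wx_i), per-stimulus biases `wy`) is an assumption for illustration:

```python
import itertools
import math

def trial_likelihood(y, w, wx, wy):
    """One trial's contribution to Eq. (21), marginalised over latent x.

    y  : observed stimulus states y_j in {0, 1}
    w  : w[i][j], weight from latent cause i to stimulus j
    wx : biases of the latent causes
    wy : biases of the stimuli
    """
    def sigmoid(a):
        return 1.0 / (1.0 + math.exp(-a))

    c, s = len(wx), len(wy)
    total = 0.0
    for x in itertools.product([0, 1], repeat=c):  # all latent configurations
        px = 1.0
        for i in range(c):                         # P(x | w_m, m)
            p1 = sigmoid(wx[i])
            px *= p1 if x[i] else 1.0 - p1
        py = 1.0
        for j in range(s):                         # prod_j P(y_j | x, w_m, m)
            p1 = sigmoid(sum(w[i][j] * x[i] for i in range(c)) + wy[j])
            py *= p1 if y[j] else 1.0 - p1
        total += px * py
    return total
```

The full likelihood of D is then the product of `trial_likelihood` over the T trials.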
Prediction under Parameter Uncertainty
Consider particular network structure m with parameters wm
Uncertainty associated with the parameters → posterior distribution over w_m:

p(w_m | m, D) ∝ P(D | w_m, m) p(w_m | m)

where the first factor is the likelihood (21) and the second is the prior distribution.

Assume the model parameters are a priori independent:

p(w_m | m) = ∏_{ij} p(w_ij) ∏_i p(w_xi) ∏_j p(w_yj)
Prediction under Parameter Uncertainty (cont)
Measure uncertainty by testing the CR (conditioned response):

P(US | CS, m, D) = ∫ P(US | CS, w_m, m, D) p(w_m | m, D) dw_m    (22)
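The integral (22) is typically intractable in closed form. One standard approximation is a Monte Carlo average over samples w^(k) from the parameter posterior (e.g. obtained by MCMC, as in Courville et al, 2003); the sampling mechanism itself is assumed given here:

```python
def predict_cr(posterior_samples, p_us_given_cs):
    """Monte Carlo approximation of Eq. (22).

    posterior_samples : iterable of parameter samples w^(k) ~ p(w_m | m, D)
    p_us_given_cs     : function w -> P(US | CS, w, m), the network's
                        prediction for fixed parameters
    """
    preds = [p_us_given_cs(w) for w in posterior_samples]
    return sum(preds) / len(preds)  # average over posterior samples
```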
Prediction under Model Uncertainty
Which model m is the correct one to choose?
Standard Bayesian approach: marginalize out the influence of the model choice

P(US | CS, D) = ∑_m P(US | CS, m, D) P(m | D)    (23)

Posterior over models:

P(m | D) = P(D | m) P(m) / ∑_{m'} P(D | m') P(m')

with the marginal likelihood

P(D | m) = ∫ P(D | w_m, m) p(w_m | m) dw_m
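Given per-model marginal likelihoods and priors, the averaging in (23) is just a posterior-weighted sum over the candidate structures; a sketch with illustrative inputs:

```python
import numpy as np

def model_average(preds, marg_liks, priors):
    """Bayesian model averaging, Eq. (23).

    preds     : P(US | CS, m, D) for each candidate structure m
    marg_liks : marginal likelihoods P(D | m)
    priors    : structure priors P(m)
    """
    post = np.array(marg_liks, dtype=float) * np.array(priors, dtype=float)
    post /= post.sum()                  # posterior P(m | D) over models
    return float(np.dot(post, preds)), post
```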
Prediction under Model Uncertainty (cont)
Tradeoff between model fidelity and complexity
Figure: Marginal likelihood (Courville et al, 2003)
Experiments and results
Figure: Experiments summary (Yin et al, 1994)
Experiments and results (cont)
Figure: Simulation results (Courville et al, 2003)
Experiments and results (cont)
Figure: Corresponding Sigmoid Belief Networks (Courville et al, 2003)
References
A. C. Courville, N. D. Daw, G. J. Gordon, and D. S. Touretzky. Model uncertainty in classical conditioning. In Advances in Neural Information Processing Systems, pages 977–984. MIT Press, 2003.

A. C. Courville, N. D. Daw, G. J. Gordon, and D. S. Touretzky. Bayesian theories of conditioning in a changing world. Trends in Cognitive Sciences, pages 294–300. Elsevier, 2006.

P. Dayan and T. Long. Statistical models of conditioning. In NIPS, pages 117–123. MIT Press, 1999.

J. C. Gewirtz and M. Davis. Using Pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. University of Minnesota, Minneapolis, 2000.