Bayesian theories of conditioning in a changing world Advanced Signal Processing 2, SS2012 Alexander Melzer SPSC, TU Graz May 6, 2012


Page 1

Bayesian theories of conditioning in a changing world

Advanced Signal Processing 2, SS2012

Alexander Melzer

SPSC, TU Graz

May 6, 2012

Page 2

Introduction Classical and Statistical Conditioning Sigmoid Belief Networks References

Outline

1 Introduction

2 Classical and Statistical Conditioning

3 Sigmoid Belief Networks

Alexander Melzer SPSC, TU Graz Bayesian theories of conditioning in a changing world

Page 3

Introduction

Issue: The finding that surprising events provoke animals to learn faster

Prediction of biologically significant events

Quantitative models of conditioning

Recent interest: Reframing in explicitly statistical terms

Page 4

Introduction

Surprise causes faster learning due to signal change → increased uncertainty

Pearce’s theory of surprise in conditioning ↔ Bayesian inference

Change is a relatively unexplored aspect of the Bayesian model space

Page 5

Stimuli differentiation

Conditioned stimuli (CSs): neutral stimuli, initially without significance for the animal. Examples: bells, lights

Unconditioned stimuli (USs): biologically significant reinforcers. Examples: food, shock

Conditioned responses (CRs): the animal's predictions arising from various patterns of CS/US pairings. Example: light → food

Page 6

Stimuli differentiation - Pavlovian conditioning

Figure: Pavlovian conditioning

Page 7

Bayesian accounts of conditioning and change

Interpret the animal's responding as a report of the likelihood of reinforcement, given its experience

Use conditioned responding to reflect subjects’ estimates

P(US(t)|CS(t),D) (1)

where D is the training history of CSs

Different Bayesian accounts can differ in what sort of model they assume → World Models

Page 8

World Models

Discriminative

P(US(t)|CS(t),D) (2)

→ Reinforcement given the current stimuli

Generative

P(US(t),CS(t)|D) (3)

→ Predict full pattern of both stimuli and reinforcement

Change: How to incorporate the possibility of change?

w(t − 1)→ w(t)

w(t) = [w0(t),w1(t), ...,wn(t)]T

Page 9

Historical models

Consider experiment: Light l(t) and sound s(t)

w(t) = [wl(t), ws(t)]^T

Pavlovian conditioning: positive association with reward r(t)

∆wl(t) = αl(t) (r(t)− wl(t)) l(t) (4)

l(t) ∈ {0, 1} … presence of light (CS)
r(t) … reward (US)
wl(t) … strength of expectation of reward
αl(t) … learning rate

Similarly,

∆ws(t) = αs(t) (r(t) − ws(t)) s(t) (5)
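The update rules (4)-(5) can be sketched in a few lines of Python; the function and variable names here are my own:

```python
def update_weight(w, alpha, r, present):
    """One trial of eq. (4)/(5): w <- w + alpha * (r - w) * present."""
    return w + alpha * (r - w) * present

# Repeated light -> reward pairings drive w_l toward r = 1.
w_l = 0.0
for _ in range(50):
    w_l = update_weight(w_l, alpha=0.2, r=1.0, present=1)
```

Note that an absent stimulus (present = 0) leaves its weight untouched.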

Page 10

Historical models

US-processing theory: Delta rule

wl(t) = wl(t − 1) + ∆wl(t), with ∆wl(t) = αl(t) (r(t) − wl(t)) l(t) (6)

and

ws(t) = ws(t − 1) + ∆ws(t), with ∆ws(t) = αs(t) (r(t) − ws(t)) s(t) (7)

Associative strength

V (t) = wl(t)l(t) + ws(t)s(t) (8)

Prediction error: δ(t) = r(t)− V (t)
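With the shared prediction error δ(t) = r(t) − V(t), the delta rule reproduces blocking; a minimal simulation in the standard Rescorla-Wagner form (error shared across all present stimuli; variable names are my own):

```python
def rw_trial(w, stimuli, r, alpha=0.1):
    """One Rescorla-Wagner trial: present stimuli share the error r - V(t)."""
    V = sum(w[c] for c in stimuli)   # associative strength, eq. (8)
    delta = r - V                    # shared prediction error
    for c in stimuli:
        w[c] += alpha * delta

w = {"l": 0.0, "s": 0.0}
for _ in range(200):
    rw_trial(w, ["l"], r=1.0)        # phase 1: light alone predicts reward
for _ in range(200):
    rw_trial(w, ["l", "s"], r=1.0)   # phase 2: light + sound, reward unchanged
# delta is already ~0 in phase 2, so the sound acquires almost no strength
```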

Page 11

Paradigms in conditioning

Figure: Paradigms in conditioning (Dickinson, 1980; Mackintosh, 1983)

Page 12

Paradigms in conditioning

Unblocking with qualitative change in reinforcement

Figure: Unblocking with qualitative change in reinforcement (Courville, Daw and Touretzky, 2006)

Page 13

Paradigms in conditioning

Overshadowing counteracting latent inhibition

Figure: Overshadowing counteracting latent inhibition (Courville, Daw and Touretzky, 2006)

Page 14

Paradigms in conditioning

Competition between different stimuli → competition between learning rates

Blocking: nothing unexpected happens when the second (shadowed) stimulus is added

Unblocking: learning about the added stimulus resumes when reinforcement changes qualitatively (e.g. a delay in between rewards)

Extension to multivariate problem ⇒ Statistical formulation

Page 15

Statistical formulation

Parametrized probability distribution

P[r(t)|s(t), l(t)] (9)

Maximum likelihood inference → maximize probability P over all samples

⇒ Three natural models of P[r(t)|s(t), l(t)]

Page 16

Three natural models of P[r(t)|s(t), l(t)]

1) Rescorla Wagner (Rescorla and Wagner, 1972)

PG [r(t)|s(t), l(t)] = N [wl l(t) + wss(t), σ2] (10)

Only σ2 added compared to (8)

Learning of r(t) might be corrupted if substantial noise is present

Downwards unblocking suggests that animals are not using PG as the basis for their predictions
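As a sketch, the model PG of eq. (10) is just a Gaussian density centered on the summed associative strength (function and variable names are my own):

```python
import math

def p_g(r, l, s, w_l, w_s, sigma2):
    """Eq. (10): Gaussian likelihood of reward r given cue intensities l, s."""
    mu = w_l * l + w_s * s
    return math.exp(-(r - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
```

The density is highest when the observed reward matches the summed prediction, and heavy observation noise (large σ2) flattens it.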

Page 17

Three natural models of P[r(t)|s(t), l(t)] (cont.)

2) Competitive mixture of experts (Nowlan, 1991; Jacobs et al., 1991)

Mixture-of-Gaussians model, fit with the EM (Expectation-Maximization) algorithm

M step:

∆wl(t) ∝ (r(t) − wl(t)) ql(t) (11)

where

ql(t) ∝ πl(t) exp(−(r(t) − wl l(t))² / 2σ2) (12)

and πl(t) (together with πs(t)) are the mixing proportions.

PM[r(t)|s(t), l(t)] = πl(t)N[wl, σ2] + πs(t)N[ws, σ2] + π̄(t)N[w̄, τ2] (13)

Model captures downwards unblocking
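One EM sweep for the competitive mixture can be sketched as follows; this is a simplified two-expert version of eqs. (11)-(12), with names of my own:

```python
import math

def e_step(r, l, s, w_l, w_s, pi_l, pi_s, sigma2):
    """Eq. (12): normalized responsibilities q for the light and sound experts."""
    q_l = pi_l * math.exp(-(r - w_l * l) ** 2 / (2 * sigma2))
    q_s = pi_s * math.exp(-(r - w_s * s) ** 2 / (2 * sigma2))
    total = q_l + q_s
    return q_l / total, q_s / total

def m_step(w, r, q, lr=0.1):
    """Eq. (11): weight update scaled by the expert's responsibility q."""
    return w + lr * (r - w) * q

# The expert whose prediction matches the reward takes most of the credit.
q_l, q_s = e_step(r=1.0, l=1, s=1, w_l=1.0, w_s=0.0, pi_l=0.5, pi_s=0.5, sigma2=0.1)
```

Because each expert learns in proportion to its own responsibility, a stimulus can be "explained away" without its weight being driven down, which is how the model captures downwards unblocking.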

Page 18

Three natural models of P[r(t)|s(t), l(t)] (cont)

3) Cooperative mixture of experts (Jacobs et al, 1991)

PJ [r(t)|s(t), l(t)] = N [wlπl(t)l(t) + wsπs(t)s(t), σ2] (14)

Idea:

P[wl(t)|r] = N[r, ρl(t)−1] (15)

P[ws(t)|r] = N[r, ρs(t)−1] (16)

where ρl(t) and ρs(t) are the inverse variances. Thus,

σ2 = (ρl(t) + ρs(t))−1 (17)

πl(t) = ρl(t)σ2,  πs(t) = ρs(t)σ2 (18)
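Eqs. (17)-(18) in code: the mixing proportions are just normalized precisions, so they sum to one and favor the more reliable cue (a small sketch, names my own):

```python
def mixing_proportions(rho_l, rho_s):
    """Eqs. (17)-(18): combined variance and precision-weighted proportions."""
    sigma2 = 1.0 / (rho_l + rho_s)
    return rho_l * sigma2, rho_s * sigma2

pi_l, pi_s = mixing_proportions(rho_l=4.0, rho_s=1.0)
# the higher-precision (more reliable) cue gets the larger share: pi_l = 0.8
```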

Page 19

Three natural models of P[r(t)|s(t), l(t)] (cont)

3) Cooperative mixture of experts (Jacobs et al, 1991)Normative learning rules

∆wl = αw (πl(t)/ρl(t)) δ(t) (19)

where δ(t) = r(t)− πl(t)wl(t)− πs(t)ws(t) is the prediction error

Page 20

Three natural models of P[r(t)|s(t), l(t)] (summary)

1) Rescorla Wagner

PG [r(t)|s(t), l(t)] = N [wl l(t) + wss(t), σ2]

2) Competitive mixture of experts

PM[r(t)|s(t), l(t)] = πl(t)N[wl, σ2] + πs(t)N[ws, σ2] + π̄(t)N[w̄, τ2]

3) Cooperative mixture of experts

PJ [r(t)|s(t), l(t)] = N [wlπl(t)l(t) + wsπs(t)s(t), σ2]

Page 21

Second-order conditioning

Paradigm           Phase 1 training   Phase 2 training   Test
1st-order cond.    S1-US              -                  S1?
2nd-order cond.    S1-US              S2-S1              S2?
Sensory precond.   S2-S1              S1-US              S2?

Table: Phases of 1st and 2nd order conditioning

Page 22

Second-order conditioning (cont)

Figure: Transience of second-order conditioning (Gewirtz and Davis, 2000)

Page 23

Second-order conditioning (cont)

Figure: Schematic representation of hypothetical associations (Gewirtz and Davis, 2000)

Page 24

Second-order conditioning

Figure: Second-order fear conditioning (Gewirtz and Davis, 2000)

Page 25

Sigmoid Belief Networks

Conditional probabilities defined as functions of weighted sums of parent nodes

P(yj = 1|x1, ..., xc, wm, m) = 1 / (1 + exp(−Σi wij xi − wyj)) (20)

and

P(yj = 0|x1, ..., xc ,wm,m) = 1− P(yj = 1|x1, ..., xc ,wm,m)

wij … weight: influence of the parent node xi on the child node yj
wyj … bias term
wm … model parameters for model structure m
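Eq. (20) is the standard logistic unit; a direct transcription (function and argument names are my own):

```python
import math

def p_child_on(x, w_j, bias_j):
    """Eq. (20): P(y_j = 1 | parents x) = sigmoid(sum_i w_ij * x_i + bias)."""
    activation = sum(w * xi for w, xi in zip(w_j, x)) + bias_j
    return 1.0 / (1.0 + math.exp(-activation))
```

With all parents off and zero bias the child is on with probability 0.5; strong positive weights from active parents push it toward 1.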

Page 26

Model representation

Directed graph model

Figure: Sigmoid Belief Network (Courville et al, 2003)

Page 27

Sigmoid Belief Likelihood

Given the latent causes, the stimuli are mutually independent → conditional joint probability of the observed stimuli:

∏_{j=1}^{s} P(yj|x1, ..., xc, wm, m)

Similarly, we assume trials are drawn from a stationary process. The resulting likelihood function of the training data:

P(D|wm, m) = ∏_{t=1}^{T} Σ_x ∏_{j=1}^{s} P(yj(t)|x, wm, m) P(x|wm, m) (21)
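The per-trial factor of eq. (21) can be computed exactly for a small network by enumerating the binary latent causes x; a sketch with my own names, taking P(x|wm, m) uniform purely for illustration:

```python
import itertools
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def trial_likelihood(y, W, bias):
    """Sum over latent x of P(y|x) * P(x); W[i][j] links cause i to stimulus j."""
    c = len(W)                         # number of latent causes
    total = 0.0
    for x in itertools.product([0, 1], repeat=c):
        p_x = 0.5 ** c                 # assumed uniform prior over x
        p_y = 1.0
        for j, yj in enumerate(y):
            p1 = sigmoid(sum(W[i][j] * x[i] for i in range(c)) + bias[j])
            p_y *= p1 if yj == 1 else 1.0 - p1
        total += p_x * p_y
    return total
```

The full likelihood then multiplies this factor over all T trials.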

Page 28

Prediction under Parameter Uncertainty

Consider particular network structure m with parameters wm

Uncertainty associated with the parameters → posterior distribution over wm

p(wm|m, D) ∝ P(D|wm, m) · p(wm|m)

where P(D|wm, m) is the likelihood (21) and p(wm|m) the prior distribution

Assume model parameters are a priori independent:

p(wm|m) = ∏_{ij} p(wij) · ∏_{i} p(wxi) · ∏_{j} p(wyj)

Page 29

Prediction under Parameter Uncertainty (cont)

Measure uncertainty by testing the CR (Conditioned Response):

P(US|CS, m, D) = ∫ P(US|CS, wm, m, D) p(wm|m, D) dwm (22)
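In practice the integral in eq. (22) can be approximated by averaging over samples from the parameter posterior; a Monte Carlo sketch in which the Gaussian "posterior" and the sigmoid prediction are illustrative stand-ins, not the paper's model:

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def predict_us(posterior_samples, cs=1):
    """Average the CS -> US prediction over posterior parameter samples."""
    return sum(sigmoid(w * cs) for w in posterior_samples) / len(posterior_samples)

random.seed(0)
samples = [random.gauss(2.0, 0.5) for _ in range(5000)]  # assumed posterior over w
p = predict_us(samples)  # parameter uncertainty softens the point prediction
```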

Page 30

Prediction under Model Uncertainty

Which model m is the correct one to choose?

Standard Bayesian approach: marginalize out influence of modelchoice

P(US|CS, D) = Σ_m P(US|CS, m, D) P(m|D) (23)

Posterior over models:

P(m|D) = P(D|m)P(m) / Σ_{m′} P(D|m′)P(m′)

where the marginal likelihood is

P(D|m) = ∫ P(D|wm, m) p(wm|m) dwm
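Eq. (23) in code: model posteriors are normalized products of marginal likelihood and prior, and the prediction is their weighted average (illustrative numbers, names my own):

```python
def model_posterior(marg_lik, prior):
    """Normalize P(D|m) * P(m) over the candidate model structures."""
    z = sum(ml * p for ml, p in zip(marg_lik, prior))
    return [ml * p / z for ml, p in zip(marg_lik, prior)]

def predict(pred_per_model, marg_lik, prior):
    """Eq. (23): average each model's US prediction by its posterior weight."""
    post = model_posterior(marg_lik, prior)
    return sum(p * q for p, q in zip(pred_per_model, post))

# two candidate structures: one predicts the US strongly, one weakly
p_us = predict([0.9, 0.2], marg_lik=[0.03, 0.01], prior=[0.5, 0.5])
```

A model with three times the marginal likelihood gets three times the weight, so the averaged prediction leans toward it without discarding the alternative.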

Page 31

Prediction under Model Uncertainty (cont)

Tradeoff between model fidelity and complexity

Figure: Marginal likelihood (Courville et al, 2003)

Page 32

Experiments and results

Figure: Experiments summary (Yin et al, 1994)

Page 33

Experiments and results (cont)

Figure: Simulation results (Courville et al, 2003)

Page 34

Experiments and results (cont)

Figure: Corresponding Sigmoid Belief Networks (Courville et al, 2003)

Page 35

References

A. C. Courville, N. D. Daw, G. J. Gordon, and D. S. Touretzky. Model uncertainty in classical conditioning. In NIPS, pages 977–984. MIT Press, 2003.

A. C. Courville, N. D. Daw, G. J. Gordon, and D. S. Touretzky. Bayesian theories of conditioning in a changing world. Pages 294–300. Elsevier, 2006.

P. Dayan and T. Long. Statistical models of conditioning. In NIPS, pages 117–123. MIT Press, 1999.

J. C. Gewirtz and M. Davis. Using Pavlovian higher-order conditioning paradigms to investigate the neural substrates of emotional learning and memory. University of Minnesota, Minneapolis, 2000.
