Informatics and Mathematical Modelling / Lars Kai Hansen Adv. Signal Proc. 2006 Variational Bayes 101

Post on 21-Dec-2015




The Bayes scene

Exact averaging in discrete/small models (Bayes networks)

Approximate averaging:
- Monte Carlo methods
- Ensemble/mean field
- Variational Bayes methods

Variational-Bayes.org, MLpedia, Wikipedia

• ISP Bayes:
- ICA: mean field, Kalman, dynamical systems
- NeuroImaging: optimal signal detector
- Approximate inference
- Machine learning methods


Bayes’ methodology

Minimal error rate is obtained when the detector is based on the posterior probability (Bayes decision theory):

$$P(M|D) = \frac{P(D|M)\,P(M)}{P(D)}, \qquad D = \{x_n \mid n = 1,\dots,N\}$$

The likelihood may contain unknown parameters:

$$P(D|M) = \int P(D|\theta)\,p(\theta|M)\,d\theta = \int \Big[\prod_n P(x_n|\theta)\Big]\,p(\theta|M)\,d\theta$$
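As a minimal sketch of the parameter integral above, the following evaluates the marginal likelihood on a grid for a toy model that is not in the slides: binary samples with P(x=1|θ) = θ and a uniform prior on (0, 1). The function names and the choice of model are illustrative assumptions; for this model the integral has a closed form (a Beta function), which is used as a check.

```python
import math


def log_marginal_likelihood_grid(data, num_grid=20001):
    """Approximate log P(D|M) = log of the integral of prod_n P(x_n|theta) p(theta|M).

    Assumed toy model (not from the slides): x_n in {0, 1}, P(x=1|theta) = theta,
    and a uniform prior p(theta|M) = 1 on (0, 1).
    """
    k = sum(data)          # number of ones
    n = len(data)
    h = 1.0 / (num_grid - 1)
    total = 0.0
    for i in range(num_grid):
        theta = i * h
        likelihood = theta ** k * (1.0 - theta) ** (n - k)
        weight = 0.5 if i in (0, num_grid - 1) else 1.0  # trapezoid rule
        total += weight * likelihood * h                 # prior density is 1
    return math.log(total)


def log_marginal_likelihood_exact(data):
    """Closed form for the same toy model: the Beta function B(k+1, n-k+1)."""
    k, n = sum(data), len(data)
    return math.lgamma(k + 1) + math.lgamma(n - k + 1) - math.lgamma(n + 2)


data = [1, 0, 1, 1, 0, 1, 0, 1]
grid_value = log_marginal_likelihood_grid(data)
exact_value = log_marginal_likelihood_exact(data)
```

Grid integration only scales to a handful of parameters, which is one reason the approximate averaging methods listed earlier are needed.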


Bayes’ methodology

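Conventional approach is to use the most probable parameters:

$$P(D|M) \approx \prod_n P(x_n|\theta^*_M), \qquad \theta^*_M = \arg\max_\theta \prod_n P(x_n|\theta)\,p(\theta|M)$$

However: the averaged model is generalization optimal (Hansen, 1999), i.e.:

$$P_{\mathrm{BayesianAverage}}(x|D) = \int P(x|\theta)\,P(\theta|D)\,d\theta, \qquad P_{\mathrm{BayesianAverage}}(x|D) = \arg\max_{P(x|D)} \langle \log P(x|D) \rangle$$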


The hidden agenda of learning

Typically learning proceeds by generalization from limited set of samples…but

We would like to identify the model that generated the data

... choose the least complex model compatible with data

That I figured out in 1386


Generalization!

Generalizability is defined as the expected performance on a random new sample: the mean performance of a model on a "fresh" data set is an unbiased estimate of generalization.

Typical loss functions: ⟨−log p(x)⟩, ⟨# prediction errors⟩, ⟨[g(x) − ĝ(x)]²⟩, ⟨log p(x,g)/(p(x)p(g))⟩, etc.

Results can be presented as "bias-variance trade-off curves" or "learning curves".
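A minimal sketch of estimating generalization with the loss ⟨−log p(x)⟩: a Gaussian is fitted by maximum likelihood on a small training set, and its mean loss on a "fresh" test set is used as the generalization estimate. The data source (standard normal), sample sizes and function names are assumptions made for illustration only.

```python
import math
import random


def neg_log_gauss(x, mean, var):
    """Loss -log p(x) for a univariate Gaussian with the given mean and variance."""
    return 0.5 * math.log(2.0 * math.pi * var) + (x - mean) ** 2 / (2.0 * var)


def fit_gaussian(samples):
    """Maximum likelihood estimates of mean and variance."""
    mean = sum(samples) / len(samples)
    var = sum((x - mean) ** 2 for x in samples) / len(samples)
    return mean, var


def mean_loss(train, evaluation):
    """Fit on `train`, return the mean -log p(x) over `evaluation`."""
    mean, var = fit_gaussian(train)
    return sum(neg_log_gauss(x, mean, var) for x in evaluation) / len(evaluation)


rng = random.Random(0)                                   # fixed seed for repeatability
train = [rng.gauss(0.0, 1.0) for _ in range(20)]
fresh = [rng.gauss(0.0, 1.0) for _ in range(5000)]

train_loss = mean_loss(train, train)   # optimistic: evaluated on the fitting data
test_loss = mean_loss(train, fresh)    # estimate of generalization on fresh data
```

Repeating this for increasing training set sizes and plotting `test_loss` against the size gives a learning curve of the kind mentioned above.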


Generalization optimal predictive distribution

"The game of guessing a pdf"

Assume: a random teacher drawn from P(θ), and a random data set, D, drawn from P(x|θ).

The prediction / generalization error is

$$\Gamma(\theta, D, A) = \int [-\log p(x|D,A)]\,P(x|\theta)\,dx$$

$$\Gamma(A) = \iint \Gamma(\theta, D, A)\,P(\theta)\,P(D|\theta)\,d\theta\,dD$$

Here p(x|D,A) is the predictive distribution of model A and P(x|θ) is the test sample distribution.


Generalization optimal predictive distribution

We define the "generalization functional" (Hansen, NIPS 1999)

$$H[q(\cdot|\cdot)] = -\iiint \log q(x|D)\,P(x|\theta)\,dx\,P(D|\theta)\,dD\,P(\theta)\,d\theta + \int \lambda(D)\Big[\int q(x|D)\,dx - 1\Big]\,dD$$

Minimized by the "Bayesian averaging" predictive distribution

$$q(x|D) = \int P(x|\theta)\,\frac{P(D|\theta)\,P(\theta)}{\int P(D|\theta')\,P(\theta')\,d\theta'}\,d\theta$$
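The optimality of Bayesian averaging can be checked exactly in a small case. The sketch below assumes a toy teacher that is not in the slides: a coin with bias θ drawn uniformly on (0, 1) and n training flips. For this teacher the average over θ and D reduces to a finite sum over the number of ones k, so the generalization error of any predictor can be computed without sampling. The two predictors compared are the Bayesian average (k+1)/(n+2) and an arbitrary alternative with a different pseudo-count.

```python
import math


def generalization_error(predictor, n):
    """Exact average of -log q(x|D) over teacher, data set and test sample.

    Assumed toy setup: theta ~ U(0, 1), D = n coin flips, x a further flip.
    predictor(k, n) returns the predicted probability of x = 1 after k ones.
    """
    total = 0.0
    for k in range(n + 1):
        p_k = 1.0 / (n + 1)          # P(k ones in n flips) under the uniform prior
        p_one = (k + 1) / (n + 2)    # true P(x = 1 | D) for this teacher
        q_one = predictor(k, n)
        loss = -(p_one * math.log(q_one) + (1.0 - p_one) * math.log(1.0 - q_one))
        total += p_k * loss
    return total


def bayes_average(k, n):
    """Bayesian averaging predictive distribution for the uniform prior."""
    return (k + 1) / (n + 2)


def half_count(k, n):
    """An alternative smoothed estimate, used only for comparison."""
    return (k + 0.5) / (n + 1)


error_bayes = generalization_error(bayes_average, n=10)
error_other = generalization_error(half_count, n=10)
```

The plug-in maximum likelihood predictor k/n is left out on purpose: it assigns probability zero to an outcome that can still occur, so its generalization error is infinite.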


Bias-variance trade-off and averaging

Now averaging is good, can we average "too much"?

Define the family of tempered posterior distributions

$$q(x|D,T) = \int P(x|\theta)\,\frac{\big(P(D|\theta)\,P(\theta)\big)^{1/T}}{\int \big(P(D|\theta')\,P(\theta')\big)^{1/T}\,d\theta'}\,d\theta$$

Case: univariate normal dist. w. unknown mean parameter…

High temperature: widened posterior average

Low temperature: narrow average
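For the univariate normal case the tempered predictive distribution is available in closed form. The sketch below assumes a known noise variance and a zero-mean Gaussian prior on the mean; both are assumptions added for illustration, and the parameter names are hypothetical. Raising likelihood times prior to the power 1/T leaves the posterior mean unchanged and multiplies the posterior variance by T.

```python
def tempered_predictive(data, temperature, noise_var=1.0, prior_var=10.0):
    """Mean and variance of q(x|D,T) for x ~ N(theta, noise_var).

    Assumptions (not from the slides): noise_var is known and the prior on
    theta is N(0, prior_var). The tempered posterior is Gaussian with the
    usual posterior mean and a variance scaled by the temperature.
    """
    n = len(data)
    precision = n / noise_var + 1.0 / prior_var      # posterior precision at T = 1
    post_mean = (sum(data) / noise_var) / precision
    post_var = temperature / precision               # widened for T > 1
    return post_mean, noise_var + post_var           # predictive mean and variance


data = [1.0, 2.0, 3.0]
mean_cold, var_cold = tempered_predictive(data, temperature=0.01)
mean_bayes, var_bayes = tempered_predictive(data, temperature=1.0)
mean_hot, var_hot = tempered_predictive(data, temperature=4.0)
```

T = 1 recovers ordinary Bayesian averaging, T → 0 approaches the plug-in predictor with the most probable mean, and T > 1 averages over a wider set of parameters.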


Bayes' model selection, example

Let three models A, B, C be given:

A) x is normal N(0,1)
B) x is normal N(0,σ²), σ² is uniform U(0,∞)
C) x is normal N(μ,σ²), μ, σ² are uniform U(0,∞)

$$m^2 = \frac{1}{N}\sum_{n=1}^{N} x_n^2 = \mu_x^2 + \sigma_x^2, \qquad \mu_x = \frac{1}{N}\sum_{n=1}^{N} x_n, \qquad \sigma_x^2 = \frac{1}{N}\sum_{n=1}^{N} (x_n - \mu_x)^2$$


Model A

The likelihood of N samples is given by

$$P(D|A) = \left(\frac{1}{2\pi}\right)^{N/2} \exp\left(-\frac{N m^2}{2}\right)$$


Model B

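The likelihood of N samples is given by

$$P(D|B) = \int_0^\infty P(D|0,\sigma^2)\,P(\sigma^2)\,d\sigma^2 = \int_0^\infty \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left(-\frac{N m^2}{2\sigma^2}\right) d\sigma^2 = (2\pi)^{-N/2}\;\Gamma\!\left(\frac{N-2}{2}\right) \left(\frac{N m^2}{2}\right)^{-\frac{N-2}{2}}$$

where Γ(·) is the gamma function and N > 2.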


Model C

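The likelihood of N samples is given by

$$P(D|C) = \iint P(D|\mu,\sigma^2)\,P(\mu,\sigma^2)\,d\mu\,d\sigma^2 = \iint \left(\frac{1}{2\pi\sigma^2}\right)^{N/2} \exp\left(-\frac{N[(\mu-\mu_x)^2 + \sigma_x^2]}{2\sigma^2}\right) d\mu\,d\sigma^2 = (2\pi)^{-\frac{N-1}{2}}\;N^{-\frac{1}{2}}\;\Gamma\!\left(\frac{N-3}{2}\right) \left(\frac{N \sigma_x^2}{2}\right)^{-\frac{N-3}{2}}$$

where Γ(·) is the gamma function and N > 3. The closed form takes the σ² integral over (0,∞) and the μ integral over the whole real line.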
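The three evidences can be compared numerically. The sketch below uses closed forms obtained by carrying out the integrals for models A, B and C with flat, unnormalized priors (for C the mean is integrated over the whole real line). The function names are hypothetical, and because the priors are improper the absolute values depend on the units of x, so the comparison is only an illustration.

```python
import math


def sample_statistics(x):
    """Return N, sample mean, sample variance (1/N) and mean square m^2."""
    n = len(x)
    mu = sum(x) / n
    var = sum((v - mu) ** 2 for v in x) / n
    m2 = sum(v * v for v in x) / n
    return n, mu, var, m2


def log_evidence_a(x):
    """Model A: x ~ N(0, 1), no free parameters."""
    n, _, _, m2 = sample_statistics(x)
    return -0.5 * n * math.log(2.0 * math.pi) - 0.5 * n * m2


def log_evidence_b(x):
    """Model B: x ~ N(0, sigma^2), flat prior on sigma^2 (requires N > 2)."""
    n, _, _, m2 = sample_statistics(x)
    half = 0.5 * (n - 2)
    return (-0.5 * n * math.log(2.0 * math.pi)
            + math.lgamma(half) - half * math.log(0.5 * n * m2))


def log_evidence_c(x):
    """Model C: x ~ N(mu, sigma^2), flat priors on mu and sigma^2 (requires N > 3)."""
    n, _, var, _ = sample_statistics(x)
    half = 0.5 * (n - 3)
    return (-0.5 * (n - 1) * math.log(2.0 * math.pi) - 0.5 * math.log(n)
            + math.lgamma(half) - half * math.log(0.5 * n * var))


wide = [10.0, -10.0, 10.0, -10.0, 10.0, -10.0]        # zero mean, large variance
shifted = [5.0, 6.0, 5.0, 6.0, 5.0, 6.0, 5.0, 6.0]    # clearly non-zero mean
```

For the `wide` data model B is strongly preferred over A, and for the `shifted` data model C is preferred over B, in line with the structure each model can express.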


• Bayesian model selection
• C (green) is the correct model, what if only A (red) + B (blue) are known?


• Bayesian model selection
• A (red) is the correct model


Bayesian inference

• Bayesian averaging

$$p(g|x,D) = \int p(g|x,\theta)\,p(\theta|x,D)\,d\theta, \qquad p(g|x,D) \approx p(g|x,\hat{\theta}(D))$$

• Caveats: Bayes can rarely be implemented exactly

Not optimal if the model family is incorrect: "Bayes cannot detect bias"

However, still asymptotically optimal if the observation model is correct and the prior is "weak" (Hansen, 1999).


Hierarchical Bayes models

• Multi-level models in Bayesian averaging

$$p(g|x,D) = \iint p(g|x,\theta)\,p(\theta|x,D,\alpha)\,p(\alpha|x,D)\,d\theta\,d\alpha, \qquad p(g|x,D) \approx p(g|x,\hat{\theta}(D,\hat{\alpha}(D)))$$

C.P. Robert: The Bayesian Choice - A Decision-Theoretic Motivation. Springer Texts in Statistics, Springer Verlag, New York (1994).

G. Golub, M. Heath and G. Wahba: Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21, pp. 215–223 (1979).

K. Friston: A theory of cortical responses. Phil. Trans. R. Soc. B 360:815–836 (2005).


Hierarchical Bayes models

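$$p(g|x,D) = \iint p(g|x,\theta)\,p(\theta|D,\alpha)\,p(\alpha|D)\,d\theta\,d\alpha$$

Posterior, prior and "evidence":

$$p(\theta|D,\alpha) = \frac{p(D|\theta)\,p(\theta|\alpha)}{p(D|\alpha)}, \qquad p(\theta|\alpha) = C(\alpha)\exp(-\alpha f(\theta)), \qquad p(D|\alpha) = \int p(D|\theta)\,p(\theta|\alpha)\,d\theta$$

Target at maximal evidence:

$$\langle f(\theta) \rangle_{\mathrm{posterior}(\alpha)} = \langle f(\theta) \rangle_{\mathrm{prior}(\alpha)}$$

"Learning hyperparameters by adjusting prior expectations"
- Empirical Bayes: MacKay (1992)
- Hansen et al. (Eusipco, 2006)
- Cf. Boltzmann learning (Hinton et al. 1983)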


Hyperparameter dynamics

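Gaussian prior with adaptive hyperparameter:

$$p(\theta_j|\alpha_j) = C(\alpha_j)\exp\!\left(-\tfrac{1}{2}\alpha_j\theta_j^2\right), \qquad \langle \theta_j^2 \rangle_{\mathrm{posterior}(\alpha)} = \langle \theta_j^2 \rangle_{\mathrm{prior}(\alpha)}$$

With a likelihood that is approximately Gaussian in θ_j, centred on θ_{j,ML} with precision N A_j, this condition reads

$$\left(\frac{N A_j\,\theta_{j,ML}}{N A_j + \alpha_j}\right)^2 + \frac{1}{N A_j + \alpha_j} = \frac{1}{\alpha_j} \quad\Rightarrow\quad \alpha_j^{OPT} = \frac{1}{\theta_{j,ML}^2 - \frac{1}{N A_j}} \;\text{ if } N A_j\,\theta_{j,ML}^2 > 1, \qquad \alpha_j^{OPT} = \infty \;\text{ otherwise}$$

θ_ML is the maximum likelihood optimum.

θ²A is a signal-to-noise measure.

Discontinuity: the parameter is pruned at low signal-to-noise.
- Hansen & Rasmussen, Neural Comp (1994)
- Tipping, "Relevance vector machine" (1999)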
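The pruning discontinuity can be sketched under explicit assumptions that go beyond what the slide states: a prior proportional to exp(−αθ²/2), and a likelihood that is Gaussian in θ with precision N·A around θ_ML. Matching the posterior and prior expectations of θ² then gives a finite hyperparameter only when the signal-to-noise measure N·A·θ_ML² exceeds one. The function and argument names are hypothetical.

```python
import math


def optimal_alpha(theta_ml, curvature, n):
    """Hyperparameter at which <theta^2>_posterior equals <theta^2>_prior.

    Assumptions: prior exp(-alpha * theta^2 / 2), Gaussian likelihood with
    precision n * curvature centred on theta_ml. Returns math.inf when the
    signal-to-noise measure is too low, which corresponds to pruning.
    """
    snr = n * curvature * theta_ml ** 2
    if snr <= 1.0:
        return math.inf                      # parameter is pruned
    return 1.0 / (theta_ml ** 2 - 1.0 / (n * curvature))


def posterior_second_moment(theta_ml, curvature, n, alpha):
    """<theta^2> under the Gaussian posterior for a given alpha."""
    precision = n * curvature + alpha
    mean = n * curvature * theta_ml / precision
    return mean ** 2 + 1.0 / precision


alpha_strong = optimal_alpha(theta_ml=2.0, curvature=1.0, n=1)   # SNR = 4
alpha_weak = optimal_alpha(theta_ml=0.5, curvature=1.0, n=1)     # SNR = 0.25
```

With an infinite hyperparameter the prior pins the parameter at zero, so the same update that adapts the prior also removes weakly supported parameters.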


Hyperparameter dynamics

Hyperparameters dynamically updated implies pruning

Pruning decisions based on SNR

Mechanism for cognitive selection, attention?


Hansen & Rasmussen, Neural Comp (1994)


Approximations needed for posteriors

- Approximations using asymptotic expansions (Laplace etc.) - JL
- Approximation of posteriors using tractable (factorized) pdf's by KL-fitting…
- Approximation of products using EP - AH Wednesday
- Approximation by MCMC - OWI Thursday


P. Højen-Sørensen: Thesis (2001)

Illustration of approximation by a Gaussian pdf


Variational Bayes

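Notation: $y_n, x_n$ are observables and hidden variables. We analyse the log likelihood of a mixture model:

$$\log p(\mathbf{y}|M) = \log \iint p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M)\,d\boldsymbol{\theta}\,d\mathbf{x}$$

$$p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M) = p(\mathbf{y}|\mathbf{x},\boldsymbol{\theta},M)\,p(\mathbf{x}|\boldsymbol{\theta},M)\,p(\boldsymbol{\theta}|M)$$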


Variational Bayes

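$$\log p(\mathbf{y}|M) = \log \iint p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M)\,d\boldsymbol{\theta}\,d\mathbf{x}$$

$$= \log \iint q(\mathbf{x})\,r(\boldsymbol{\theta})\,\frac{p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M)}{q(\mathbf{x})\,r(\boldsymbol{\theta})}\,d\boldsymbol{\theta}\,d\mathbf{x}$$

$$\geq \iint q(\mathbf{x})\,r(\boldsymbol{\theta})\,\log\frac{p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M)}{q(\mathbf{x})\,r(\boldsymbol{\theta})}\,d\boldsymbol{\theta}\,d\mathbf{x}$$

$$= \iint q(\mathbf{x})\,r(\boldsymbol{\theta})\,\log\frac{p(\boldsymbol{\theta},\mathbf{x}|\mathbf{y},M)}{q(\mathbf{x})\,r(\boldsymbol{\theta})}\,d\boldsymbol{\theta}\,d\mathbf{x} + \log p(\mathbf{y}|M)$$

The inequality is Jensen's; the gap is the KL divergence from q(x)r(θ) to the exact posterior p(θ,x|y,M).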


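Variational Bayes:

$$q(\mathbf{x}) \propto \exp\,\langle \log p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M) \rangle_{r(\boldsymbol{\theta})}$$

$$r(\boldsymbol{\theta}) \propto \exp\,\langle \log p(\mathbf{y},\boldsymbol{\theta},\mathbf{x}|M) \rangle_{q(\mathbf{x})}$$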
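The two coupled updates can be iterated directly when θ and x each take only a few discrete values. The sketch below is a toy illustration, not the mixture model of the slides: the joint p(y, θ, x | M) for the observed y is given as a small table with made-up numbers, and the function name is hypothetical. It alternates the updates for r(θ) and q(x) and records the lower bound on log p(y|M) after each sweep.

```python
import numpy as np


def variational_bayes(log_joint, num_iters=50):
    """Coordinate ascent on a factorized q(x) r(theta).

    log_joint[i, j] = log p(y, theta_i, x_j | M) for the fixed observed y.
    Assumes all table entries are finite (strictly positive probabilities).
    Returns q, r and the list of lower bounds on log p(y | M).
    """
    n_theta, n_x = log_joint.shape
    q = np.full(n_x, 1.0 / n_x)
    r = np.full(n_theta, 1.0 / n_theta)
    bounds = []
    for _ in range(num_iters):
        # r(theta) proportional to exp <log p>_{q(x)}
        log_r = log_joint @ q
        r = np.exp(log_r - log_r.max())
        r /= r.sum()
        # q(x) proportional to exp <log p>_{r(theta)}
        log_q = r @ log_joint
        q = np.exp(log_q - log_q.max())
        q /= q.sum()
        # Lower bound: sum of q r log(p / (q r))
        factorized = np.outer(r, q)
        bounds.append(float(np.sum(factorized * (log_joint - np.log(factorized)))))
    return q, r, bounds


# Made-up joint table: 3 values of theta (rows), 2 values of x (columns).
joint = np.array([[0.10, 0.05],
                  [0.02, 0.20],
                  [0.08, 0.03]])
log_evidence = float(np.log(joint.sum()))          # exact log p(y | M)
q, r, bounds = variational_bayes(np.log(joint))
```

The recorded bounds never decrease from sweep to sweep and stay below `log_evidence`; the remaining gap is the KL divergence between the factorized approximation and the exact posterior.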


Conjugate exponential families

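$$p(\mathbf{y},\mathbf{x}|\boldsymbol{\theta},M) = f(\mathbf{y},\mathbf{x})\,g(\boldsymbol{\theta})\,\exp[\phi(\boldsymbol{\theta})^T \mathbf{u}(\mathbf{y},\mathbf{x})]$$

$$p(\boldsymbol{\theta}|\eta,\boldsymbol{\nu}) = h(\eta,\boldsymbol{\nu})\,g(\boldsymbol{\theta})^{\eta}\,\exp(\phi(\boldsymbol{\theta})^T \boldsymbol{\nu})$$

$$p(\mathbf{y},\mathbf{x}|\boldsymbol{\theta},M)\,p(\boldsymbol{\theta}|\eta,\boldsymbol{\nu}) \propto h(\eta',\boldsymbol{\nu}')\,g(\boldsymbol{\theta})^{\eta+1}\,\exp[\phi(\boldsymbol{\theta})^T(\boldsymbol{\nu} + \mathbf{u}(\mathbf{y},\mathbf{x}))]$$

$$\eta' = \eta + 1, \qquad \boldsymbol{\nu}' = \boldsymbol{\nu} + \mathbf{u}(\mathbf{y},\mathbf{x})$$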
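The conjugate update amounts to counting observations and summing sufficient statistics. As a minimal sketch, the example below uses a Poisson likelihood with a Gamma prior, chosen here for illustration and with no hidden variables: in the form above g(θ) = exp(−θ), φ(θ) = log θ and u(y) = y, so the prior g(θ)^η exp(φ(θ)ν) is a Gamma density with shape ν + 1 and rate η. The function name is hypothetical.

```python
def conjugate_update(eta, nu, sufficient_statistics):
    """Apply eta' = eta + 1 and nu' = nu + u once per observation."""
    for u in sufficient_statistics:
        eta += 1
        nu += u
    return eta, nu


# Prior Gamma(shape=2, rate=1) corresponds to eta = 1, nu = 1.
prior_eta, prior_nu = 1, 1
counts = [3, 0, 2]                       # Poisson observations, u(y) = y

post_eta, post_nu = conjugate_update(prior_eta, prior_nu, counts)
posterior_shape = post_nu + 1            # Gamma shape after the update
posterior_rate = post_eta                # Gamma rate after the update
posterior_mean = posterior_shape / posterior_rate
```

The result agrees with the textbook Gamma-Poisson rule (shape plus the sum of counts, rate plus the number of observations), which is why conjugate-exponential models keep the VB updates in closed form.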


Mini exercise

- What are the natural parameters for a Gaussian?
- What are the natural parameters for a MoG (mixture of Gaussians)?


• Observation model and "Bayes factor"


• "Normal inverse gamma" prior: the conjugate prior for the GLM observation model



• The Bayes factor is the ratio between the normalization constants of the NIGs:


Exercises

- Matthew Beal's Mixture of Factor Analyzers code
  - Code available (variational-bayes.org)
- Code a VB version of the BGML for signal detection
  - Code available for exact posterior