Bayesian Learning & Estimation Theory


TRANSCRIPT

Page 1: Bayesian Learning & Estimation Theory

Bayesian Learning & Estimation Theory

Page 2: Bayesian Learning & Estimation Theory

Maximum likelihood estimation

$L = \prod_n P(x_n \mid \theta)$, maximized by $\hat{\theta}_{\mathrm{ML}} = \arg\max_\theta L$

• Example: For Gaussian likelihood $P(x \mid \theta) = \mathcal{N}(x \mid \mu, \sigma^2)$, maximizing $L$ gives the sample mean $\hat{\mu} = \frac{1}{N}\sum_n x_n$ and variance $\hat{\sigma}^2 = \frac{1}{N}\sum_n (x_n - \hat{\mu})^2$

Objective of regression: minimize the error

$E(\mathbf{w}) = \frac{1}{2} \sum_n \left( t_n - y(x_n, \mathbf{w}) \right)^2$
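
A minimal numerical check of the Gaussian example, assuming NumPy and synthetic data (none of this appears on the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)  # synthetic data: true mu = 2.0, sigma = 1.5

# Maximizing L = prod_n N(x_n | mu, sigma^2) gives the sample mean
# and the (biased, 1/N) sample variance:
mu_ml = x.mean()
var_ml = ((x - mu_ml) ** 2).mean()

print(mu_ml, np.sqrt(var_ml))  # close to 2.0 and 1.5
```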

Page 3: Bayesian Learning & Estimation Theory

A probabilistic view of linear regression

• Likelihood: $p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_n \mathcal{N}\!\left( t_n \mid y(x_n, \mathbf{w}),\, \beta^{-1} \right)$, where the precision $\beta = 1/\sigma^2$ is the inverse variance of the observation noise

• Compare to the error function: $E(\mathbf{w}) = \frac{1}{2} \sum_n \left( t_n - y(x_n, \mathbf{w}) \right)^2$

• Since $\arg\min_{\mathbf{w}} E(\mathbf{w}) = \arg\max_{\mathbf{w}} p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)$, regression is equivalent to ML estimation of $\mathbf{w}$
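
The equivalence follows from $\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = -\beta E(\mathbf{w}) + \frac{N}{2} \ln \beta - \frac{N}{2} \ln 2\pi$: the same $\mathbf{w}$ minimizes $E$ and maximizes the likelihood. A sketch with an assumed cubic model and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 1, 30)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
beta = 1 / 0.3**2                            # precision of the data noise

Phi = np.vander(x, 4, increasing=True)       # cubic polynomial features
w = np.linalg.lstsq(Phi, t, rcond=None)[0]   # argmin_w E(w)

E = 0.5 * np.sum((t - Phi @ w) ** 2)
# ln p(t|x,w,beta) = -beta*E(w) + (N/2) ln beta - (N/2) ln 2*pi:
log_lik = -beta * E + 0.5 * x.size * (np.log(beta) - np.log(2 * np.pi))
print(E, log_lik)                            # the same w optimizes both
```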

Page 4: Bayesian Learning & Estimation Theory

Bayesian learning

• View the data $D$ and parameter $\theta$ as random variables (for regression, $D = (\mathbf{x}, \mathbf{t})$ and $\theta = \mathbf{w}$)

• The data induces a distribution over the parameter:

$P(\theta \mid D) = P(D, \theta) / P(D) \propto P(D, \theta)$

• Substituting $P(D, \theta) = P(D \mid \theta)\, P(\theta)$, we obtain Bayes’ theorem:

$P(\theta \mid D) \propto P(D \mid \theta)\, P(\theta)$, i.e. Posterior $\propto$ Likelihood $\times$ Prior
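
A tiny discrete illustration of Posterior ∝ Likelihood × Prior, on a hypothetical coin-flip example not taken from the slides:

```python
import numpy as np

# theta = probability of heads, on a coarse grid
theta = np.linspace(0.01, 0.99, 99)
prior = np.ones_like(theta) / theta.size        # flat prior P(theta)

heads, tails = 7, 3                             # observed data D
likelihood = theta**heads * (1 - theta)**tails  # P(D | theta)

posterior = likelihood * prior
posterior /= posterior.sum()                    # divide by P(D) to normalize
print(theta[posterior.argmax()])                # 0.7, the most probable theta
```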

Page 5: Bayesian Learning & Estimation Theory

Bayesian prediction

• Predictions (e.g., predicting $t$ from $x$ using data $D$) are mediated through the parameter:

$P(\text{prediction} \mid D) = \int P(\text{prediction} \mid \theta)\, P(\theta \mid D)\, d\theta$

• Maximum a posteriori (MAP) estimation:

$\hat{\theta}_{\mathrm{MAP}} = \arg\max_\theta P(\theta \mid D)$, so that $P(\text{prediction} \mid D) \approx P(\text{prediction} \mid \hat{\theta}_{\mathrm{MAP}})$

– Accurate when $P(\theta \mid D)$ is concentrated around $\hat{\theta}_{\mathrm{MAP}}$

Page 6: Bayesian Learning & Estimation Theory

A probabilistic view of regularized regression

• $E(\mathbf{w}) = \frac{1}{2} \sum_n \left( t_n - y(x_n, \mathbf{w}) \right)^2 + \frac{\lambda}{2} \sum_m w_m^2$

– The two terms correspond to $-\ln p(\mathbf{t} \mid \mathbf{x}, \mathbf{w})$ and $-\ln p(\mathbf{w})$, up to constants

• Prior: the $w_m$’s are IID Gaussian,

$p(\mathbf{w}) = \prod_m \sqrt{\lambda / 2\pi}\; \exp\{ -\lambda w_m^2 / 2 \}$

• Since $\arg\min_{\mathbf{w}} E(\mathbf{w}) = \arg\max_{\mathbf{w}} p(\mathbf{t} \mid \mathbf{x}, \mathbf{w})\, p(\mathbf{w})$, regularized regression is equivalent to MAP estimation of $\mathbf{w}$ (see the sketch below)
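
A minimal sketch of the MAP weights in closed form, assuming NumPy, polynomial features, and synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)

M, lam = 9, 5e-3
Phi = np.vander(x, M + 1, increasing=True)  # polynomial features

# argmin_w 0.5*||t - Phi w||^2 + (lam/2)*||w||^2 has the closed form
# w = (lam*I + Phi^T Phi)^{-1} Phi^T t: the MAP weights under the
# Gaussian prior p(w) ~ exp{-lam * sum_m w_m^2 / 2}
w_map = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)
print(w_map)
```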

Page 7: Bayesian Learning & Estimation Theory

Bayesian linear regression

• Likelihood: $p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta) = \prod_n \mathcal{N}\!\left( t_n \mid y(x_n, \mathbf{w}),\, \beta^{-1} \right)$

– $\beta$ specifies the precision of the data noise

• Prior: $p(\mathbf{w}) = \prod_{m=0}^{M} \mathcal{N}(w_m \mid 0, \alpha^{-1})$

– $\alpha$ specifies the precision of the weights

• Posterior: $p(\mathbf{w} \mid \mathbf{x}, \mathbf{t}) \propto p(\mathbf{t} \mid \mathbf{x}, \mathbf{w}, \beta)\, p(\mathbf{w})$

– This is an $(M+1)$-dimensional Gaussian density, computed using linear algebra (see textbook)

• Prediction: $p(t \mid x, D) = \int p(t \mid x, \mathbf{w})\, p(\mathbf{w} \mid D)\, d\mathbf{w}$
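
The posterior and predictive distributions have standard closed forms: $S_N^{-1} = \alpha I + \beta \Phi^\top \Phi$, $\mathbf{m}_N = \beta S_N \Phi^\top \mathbf{t}$, and $p(t \mid x, D) = \mathcal{N}\big(t \mid \mathbf{m}_N^\top \boldsymbol{\phi}(x),\, \beta^{-1} + \boldsymbol{\phi}(x)^\top S_N \boldsymbol{\phi}(x)\big)$. A sketch in NumPy, with synthetic data standing in for the slides' example:

```python
import numpy as np

def bayes_linreg(Phi, t, alpha, beta):
    """Posterior N(w | m_N, S_N) for a N(0, alpha^{-1} I) prior on w."""
    S_N = np.linalg.inv(alpha * np.eye(Phi.shape[1]) + beta * Phi.T @ Phi)
    m_N = beta * S_N @ Phi.T @ t
    return m_N, S_N

def predict(phi_x, m_N, S_N, beta):
    """Predictive mean and variance at a single feature vector phi(x)."""
    return phi_x @ m_N, 1 / beta + phi_x @ S_N @ phi_x

rng = np.random.default_rng(3)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)
Phi = np.vander(x, 10, increasing=True)                  # M = 9 polynomial

m_N, S_N = bayes_linreg(Phi, t, alpha=5e-3, beta=11.1)   # values from the next slides
mean, var = predict(np.vander([0.5], 10, increasing=True)[0], m_N, S_N, beta=11.1)
print(mean, np.sqrt(var))   # predictive mean and one std dev at x = 0.5
```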

Page 8: Bayesian Learning & Estimation Theory

Example: $y(x) = w_0 + w_1 x$

[Figure: rows show the state after no data, the 1st point, the 2nd point, …, the 20th point; columns show the likelihood of the newest data point, the prior/posterior over $(w_0, w_1)$, and curves $y(x)$ sampled from the posterior]

Page 9: Bayesian Learning & Estimation Theory

Example: $y(x) = w_0 + w_1 x + \dots + w_M x^M$

• $M = 9$, $\alpha = 5 \times 10^{-3}$: gives a reasonable range of functions

• $\beta = 11.1$: known precision of the noise

[Figure: mean and one standard deviation of the predictive distribution]

Page 10: Bayesian Learning & Estimation Theory

Example: $y(x) = w_0 + w_1 \phi_1(x) + \dots + w_M \phi_M(x)$

Gaussian basis functions: $\phi_j(x) = \exp\{ -(x - \mu_j)^2 / 2s^2 \}$, with centers $\mu_j$ spread over the input range $[0, 1]$
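
A minimal sketch of the corresponding design matrix, assuming evenly spaced centers and a hand-picked width $s$ (both choices are illustrative, not from the slides):

```python
import numpy as np

def gaussian_design_matrix(x, centers, s):
    """Phi[n, j] = exp(-(x_n - mu_j)^2 / (2 s^2)), plus a leading
    column of ones so that w_0 acts as the bias term."""
    phi = np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * s**2))
    return np.hstack([np.ones((x.size, 1)), phi])

x = np.linspace(0, 1, 50)
Phi = gaussian_design_matrix(x, centers=np.linspace(0, 1, 9), s=0.1)
print(Phi.shape)   # (50, 10): a bias column plus M = 9 Gaussian basis functions
```

This `Phi` slots directly into `bayes_linreg` above in place of the polynomial features.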

Page 11: Bayesian Learning & Estimation Theory

How are we doing on the pass sequence?

• Least squares regression…

[Figure: hand-labeled horizontal coordinate, $t$, vs. frame, with the least-squares fit shown in red]

• The red line doesn’t reveal different levels of uncertainty in predictions

• Cross-validation reduced the training data, so the red line isn’t as accurate as it should be

• Choosing a particular $M$ and $\mathbf{w}$ seems wrong – we should hedge our bets

Page 12: Bayesian Learning & Estimation Theory

How are we doing on the pass sequence?

• Bayesian regression…

[Figure: hand-labeled horizontal coordinate, $t$, vs. frame, with the Bayesian regression fit shown alongside the three least-squares concerns repeated from the previous slide]

Page 13: Bayesian Learning & Estimation Theory

Estimation theory

• Provided with a predictive distribution $p(t \mid x)$, how do we estimate a single value for $t$?

– Example: In the pass sequence, Cupid must aim at and hit the man in the white shirt, without hitting the man in the striped shirt

• Define $L(t, t^*)$ as the loss incurred by estimating $t^*$ when the true value is $t$

• Assuming $p(t \mid x)$ is correct, the expected loss is

$E[L] = \int_t L(t, t^*)\, p(t \mid x)\, dt$

• The minimum-loss estimate is found by minimizing $E[L]$ w.r.t. $t^*$

Page 14: Bayesian Learning & Estimation Theory

Squared loss

• A common choice: $L(t, t^*) = (t - t^*)^2$, giving

$E[L] = \int_t (t - t^*)^2\, p(t \mid x)\, dt$

– Not appropriate for Cupid’s problem

• To minimize $E[L]$, set its derivative to zero:

$dE[L]/dt^* = -2 \int_t (t - t^*)\, p(t \mid x)\, dt = 0 \;\Rightarrow\; t^* = \int_t t\, p(t \mid x)\, dt$

• Minimum mean squared error (MMSE) estimate: $t^* = E[t \mid x] = \int_t t\, p(t \mid x)\, dt$

• For regression: $t^* = y(x, \mathbf{w})$
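
A Monte Carlo check that the posterior mean minimizes expected squared loss; the skewed sample distribution standing in for $p(t \mid x)$ is an assumption:

```python
import numpy as np

rng = np.random.default_rng(4)
t = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # samples from a skewed p(t|x)

# Expected squared loss over a grid of candidate estimates t*
candidates = np.linspace(0, 6, 601)
exp_loss = [((t - c) ** 2).mean() for c in candidates]
print(candidates[np.argmin(exp_loss)], t.mean())   # both approx E[t|x] = 2.0
```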

Page 15: Bayesian Learning & Estimation Theory

Other loss functions

[Figure: squared loss and absolute loss, each plotted as a function of $t - t^*$]

Page 16: Bayesian Learning & Estimation Theory

Absolute loss

$L = |t^* - t_1| + |t^* - t_2| + |t^* - t_3| + |t^* - t_4| + |t^* - t_5| + |t^* - t_6| + |t^* - t_7|$

[Figure: seven points $t_1, \dots, t_7$ on the $t$-axis, with the estimate $t^*$ marked]

• Consider moving $t^*$ to the left by a small $\epsilon$: $L$ decreases by $\epsilon$ for every point to the left of $t^*$ and increases by $\epsilon$ for every point to its right

– The changes in $L$ are balanced when $t^*$ has equally many points on each side, i.e. $t^* = t_4$

• The median of $t$ under $p(t \mid x)$ minimizes absolute loss

• Important: the median is invariant to monotonic transformations of $t$

[Figure: a skewed distribution with its median and mean marked]
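
The same Monte Carlo check for absolute loss; the minimizer is now the median rather than the mean (they differ because the assumed distribution is skewed):

```python
import numpy as np

rng = np.random.default_rng(5)
t = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # same skewed p(t|x) as above

candidates = np.linspace(0, 6, 601)
exp_loss = [np.abs(t - c).mean() for c in candidates]
print(candidates[np.argmin(exp_loss)], np.median(t))  # both approx 1.68, not the mean 2.0
```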

Page 17: Bayesian Learning & Estimation Theory

D-dimensional estimation

• Suppose $t$ is $D$-dimensional, $\mathbf{t} = (t_1, \dots, t_D)$

– Example: 2-dimensional tracking

• Approach 1: minimum marginal loss estimation

– Find the $t_d^*$ that minimizes $\int_{t_d} L(t_d, t_d^*)\, p(t_d \mid x)\, dt_d$

• Approach 2: minimum joint loss estimation (contrasted in the sketch below)

– Define a joint loss $L(\mathbf{t}, \mathbf{t}^*)$

– Find the $\mathbf{t}^*$ that minimizes $\int_{\mathbf{t}} L(\mathbf{t}, \mathbf{t}^*)\, p(\mathbf{t} \mid x)\, d\mathbf{t}$
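
A sketch contrasting the two approaches on a skewed 2-D sample cloud standing in for $p(\mathbf{t} \mid x)$; the log-normal choice is an assumption, picked so the two estimates visibly differ:

```python
import numpy as np

rng = np.random.default_rng(6)
g = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=50_000)
samples = np.exp(g)                      # correlated, skewed 2-D samples of t

# Approach 1: minimum marginal loss -- estimate each t_d on its own.
# With absolute loss per coordinate, this is the per-coordinate median.
t_marginal = np.median(samples, axis=0)  # approx [1.0, 1.0]

# Approach 2: minimum joint loss -- with joint squared loss
# L(t, t*) = ||t - t*||^2, the minimizer is the joint mean E[t|x].
t_joint = samples.mean(axis=0)           # approx [1.65, 1.65], pulled by the tail

print(t_marginal, t_joint)
```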

Page 18: Bayesian Learning & Estimation Theory

Questions?

Page 19: Bayesian Learning & Estimation Theory

How are we doing on the pass sequence?

• Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data

• Can we track the man in the white shirt?

[Figure: the feature $x$ is the fraction of pixels in each column with intensity > 0.9, plotted over horizontal locations 0 to 320; its 1st moment gives $x = 224$, while the hand-labeled horizontal coordinate is $t = 290$ – the man in the white shirt is occluded]

Page 20: Bayesian Learning & Estimation Theory

How are we doing on the pass sequence?

• Bayesian regression and estimation enable us to track the man in the striped shirt based on labeled data

• Can we track the man in the white shirt? Not very well.

[Figure: feature $x$ vs. hand-labeled horizontal coordinate, $t$, with the regression fit through the labeled data]

• Regression fails to identify that there really are two classes of solution
