HW2-Lighthouse problem (kurs1447/kddl5_2011.pdf · 2011-02-11)


Page 1

%HW2-Sivia
N=10000;
alphas=rand(N,1);
alphas=(alphas-1/2)*pi;          % flash angles, uniform on (-pi/2, pi/2)
xs=tan(alphas);                  % shore positions (standard Cauchy, true x0 = 0)
xs=sort(xs);
figure; plot(xs)
meanest=mean(xs)
medest=median(xs)
llh=[];
for x0=-1:0.01:1
  llh=[llh sum(-log(1+(xs-x0).^2))];   % log-likelihood of x0 for the Cauchy data xs
end
figure; plot(llh)

HW2-Lighthouse problem

meanest =-0.1099 medest =0.0047
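For reference, the likelihood behind the script above is the Cauchy (Lorentzian) density of Sivia's lighthouse problem, with the true position x0 = 0 in this simulation; the formula below is standard material, not text from the slide:

\[ p(x \mid x_0) = \frac{1}{\pi\,\bigl(1+(x-x_0)^2\bigr)}, \qquad \log L(x_0) = -\sum_i \log\bigl(1+(x_i-x_0)^2\bigr) + \text{const.} \]

Because the Cauchy distribution has no mean, the sample mean (meanest) is an unreliable estimator, while the median and the maximum of the log-likelihood concentrate near x0.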

[Figures: plot of the sorted sample xs over its 10000 indices, and the log-likelihood profile llh over the x0 grid.]

Page 2

Statistical Data models, Non-parametrics, Dynamics

Non-informative, proper and improper priors

•  For a real quantity bounded to an interval, the standard prior is the uniform distribution

•  For an unbounded real quantity, the standard is again uniform, but with what density?

•  For a real quantity on a half-open interval (a scale s > 0), the standard prior is f(s) = 1/s, but its integral diverges! (see the note below)

•  Divergent priors are called improper; they can only be used with likelihoods that keep the posterior convergent
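As a short note on the scale prior in the third bullet (standard material, not text from the slide):

\[ f(s) = \frac{1}{s}, \quad s > 0, \qquad \int_0^\infty \frac{ds}{s} = \infty, \]

so it cannot be normalized; its appeal is that it keeps the same form under rescaling s -> cs. The posterior is still proper whenever \int f(\mathrm{data} \mid s)\, s^{-1}\, ds converges.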

Page 3

Dirichlet distribution: prior for a discrete distribution

Mean of the Dirichlet: Laplace's estimator
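The slide's formulas are images; a standard statement of both items, in the notation assumed here, is

\[ p(\theta_1,\dots,\theta_K \mid \alpha_1,\dots,\alpha_K) = \frac{\Gamma\!\left(\sum_i \alpha_i\right)}{\prod_i \Gamma(\alpha_i)} \prod_{i=1}^{K} \theta_i^{\alpha_i-1}, \qquad \theta_i \ge 0, \ \sum_i \theta_i = 1, \]

and, given counts n_1, ..., n_K with N = \sum_i n_i and the uniform prior \alpha_i = 1, the posterior mean is Laplace's estimator

\[ \mathrm{E}[\theta_i \mid n] = \frac{n_i + 1}{N + K}. \]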

Page 4

Occurrence table probability

Occurrence table probability, uniform prior:
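The formula itself is an image; under the uniform Dirichlet prior, the standard marginal probability of a particular sequence with occurrence counts n_1, ..., n_K (N = \sum_i n_i) is

\[ P(n_1,\dots,n_K) = \int (K-1)! \prod_{i=1}^{K} \theta_i^{\,n_i}\, d\theta = \frac{(K-1)!\,\prod_i n_i!}{(N+K-1)!}, \]

and the probability of the table of counts (rather than the ordered sequence) multiplies this by the multinomial coefficient N!/\prod_i n_i!.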

Page 5

Non-parametric inference

•  How to perform inference about a distribution without assuming a distribution family?

•  A distribution over the reals can be approximated by a piecewise uniform distribution, or by a mixture of simple distributions

•  But how many parts? This is non-parametric inference

Non-parametric inference: change-points, Rao-Blackwell

•  Given times for events (e.g. coal-mining disasters), infer a piecewise constant intensity function (the change-point problem)

•  The state is the set of change-points, with intensities in between

•  But how many pieces? This is non-parametric inference

•  MCMC: given the current state, propose a change in a segment boundary or intensity

•  But it is possible to integrate out the proposed intensities (see the sketch below)
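A sketch of that integration, assuming a Gamma(a, b) prior on each segment intensity (the slide's actual prior is not in the extracted text): for a segment of length L containing n events, the marginal likelihood is

\[ \int_0^\infty \lambda^{n} e^{-\lambda L}\, \frac{b^{a}\lambda^{a-1} e^{-b\lambda}}{\Gamma(a)}\, d\lambda = \frac{b^{a}}{\Gamma(a)}\, \frac{\Gamma(a+n)}{(b+L)^{a+n}}, \]

so the intensities never need to be carried in the MCMC state.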

Page 6

Probability ratio in MCMC

For a proposed merge of intervals j and j+1, with sizes proportional to (α, 1-α): were the counts n_j and n_{j+1} obtained by tossing a 'coin' with success probability α, or not? Compute the model probability ratio as in HW1. Also, the total number of breakpoints has a Poisson prior distribution with parameter (mean) λ. Probability ratio in favor of a split:

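The formula itself is an image in the slide; a plausible reconstruction, assuming a uniform prior on the split proportion as in the HW1 coin problem (the slide's version may differ in detail), is, with n = n_j + n_{j+1} and k the current number of breakpoints,

\[ \frac{P(\mathrm{split})}{P(\mathrm{merge})} = \frac{\lambda}{k+1}\cdot\frac{\int_0^1 p^{\,n_j}(1-p)^{\,n_{j+1}}\,dp}{\alpha^{\,n_j}(1-\alpha)^{\,n_{j+1}}} = \frac{\lambda}{k+1}\cdot\frac{n_j!\; n_{j+1}!}{(n+1)!\;\alpha^{\,n_j}(1-\alpha)^{\,n_{j+1}}}. \]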
Averaging over the MCMC run: positions and number of breakpoints

Page 7

Averaging over the MCMC run: positions, with uniform test data

Mixture of Normals

Page 8

Mixture of Normals: elimination of nuisance parameters

Mixture of Normals: elimination of nuisance parameters

(integrate using the normalization constants of the Gaussian and Gamma distributions)
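A sketch of the resulting closed form, assuming a conjugate Normal-Gamma prior NG(\mu_0, \kappa_0, a_0, b_0) on each component's mean and precision (the slide's exact prior is not in the extracted text): for the n points currently assigned to a component, with sample mean \bar{x},

\[ p(x_{1:n}) = (2\pi)^{-n/2}\, \frac{\Gamma(a_n)}{\Gamma(a_0)}\, \frac{b_0^{\,a_0}}{b_n^{\,a_n}}\, \left(\frac{\kappa_0}{\kappa_n}\right)^{1/2}, \qquad \kappa_n = \kappa_0 + n,\ \ a_n = a_0 + \tfrac{n}{2},\ \ b_n = b_0 + \tfrac12\sum_i (x_i-\bar{x})^2 + \frac{\kappa_0\, n\,(\bar{x}-\mu_0)^2}{2\kappa_n}, \]

obtained exactly by the route the slide names: integrating the mean against the Gaussian normalization constant and the precision against the Gamma normalization constant.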

Page 9

Matlab Mixture of Normals, MCMC (AutoClass method)

function [lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
%[lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]= MMNONU1(x,N,k,labi,NN);
% 1D MCMC mixture modelling
% inputs
%  x         - 1D data column vector
%  N         - MCMC iterations
%  k         - number of components
%  lab, labi - component labelling of data vector
%  NN        - thinning (optional)

Matlab Mixture of Normals, MCMC

function [lab,trlh,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
%[lh,lab,trlpost,trm,trstd,trlab,trct,nbounc]= MMNONU1(x,N,k,labi,NN);
% outputs
%  trlh  - thinned trace of log probability (optional)
%  trm   - thinned trace of means vector (optional)
%  trstd - thinned vector of standard deviations (optional)
%  trlab - thinned trace of labels vector, size(x,1) by N/NN (optional)
%  trct  - thinned trace of mixing proportions

Page 10

Matlab Mixture of Normals, MCMC

N=10000; NN=100;
x=[randn(100,1)-1; randn(100,1)*3; randn(100,1)+1];   % 3 components synthetic data
k=2;
labi=ceil(rand(size(x))*2);
[llhc,lab2,trl,trm,trstd,trlab,trct,nbounc]= mmnonu1(x,N,k,labi,NN);
[llhc2,lab2,trl2,trm2,trstd2,trlab2,trct2,nbounc]= mmnonu1(x,N,k,lab2,NN);
% ... (k=3, 4, 5)

Matlab Mixture of Normals, MCMC

The three components and the joint empirical distribution

Page 11

Matlab Mixture of Normals, MCMC: putting them together makes the identification seem harder.

Matlab Mixture of Normals, MCMC

K=2: [Figure: MCMC traces of component standard deviations and means]

Page 12

Matlab Mixture of Normals, MCMC

K=3: [Figure: MCMC traces of component standard deviations and means]. Burn-in progressing.

Matlab Mixture of Normals, MCMC

K=3: [Figure: MCMC traces of component standard deviations and means]. Burnt in.

Page 13

Matlab Mixture of Normals, MCMC

K=4: low probability. [Figure: MCMC traces of component standard deviations and means]. No focus: no interpretation as 4 clusters.

Matlab Mixture of Normals, MCMC

K=5: low probability. [Figure: MCMC traces of component standard deviations and means]

Page 14

Matlab Mixture of Normals, MCMC

X sample: indices 1-100: (mean -1, std 1); 101-200: (mean 0, std 3); 201-300: (mean 1, std 1)

[Figures: trace of state labels: unsorted sample; label trace sorted]

Mixtures of multivariate normals

•  This works the same way, but instead of a Gamma distribution for the variance we use the Wishart distribution, a matrix-valued distribution over covariance matrices.

•  Competes well with both clustering and Expectation Maximization, which are prone to overfitting (clustering cannot handle overlapping components)
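For reference, a standard form of the density (assuming here that the Wishart is placed on the d x d precision matrix \Lambda, the conjugate choice; a prior directly on the covariance matrix would instead be an inverse-Wishart):

\[ p(\Lambda \mid W, \nu) = \frac{|\Lambda|^{(\nu-d-1)/2}\, \exp\!\left(-\tfrac12\,\mathrm{tr}(W^{-1}\Lambda)\right)}{2^{\nu d/2}\, |W|^{\nu/2}\, \Gamma_d(\nu/2)}, \]

where \Gamma_d is the multivariate gamma function.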

Page 15

Dynamic Systems, time series

•  An abundance of linear prediction models exists

•  For non-linear and chaotic systems, methods were developed in the 1990s (Santa Fe)

•  Gershenfeld, Weigend: The Future of Time Series

•  Online/offline: prediction/retrodiction

Page 16

Berry and Linoff have eloquently stated their preferences with the often quoted sentence: "Neural networks are a good choice for most classification problems when the results of the model are more important than understanding how the model works." "Neural networks typically give the right answer"

Dynamic Systems and Takens' Theorem

•  Lag vectors (x_i, x_{i-1}, ..., x_{i-T}), for all i, occupy a submanifold of E^T if T is large enough (see the sketch below)

•  This manifold is 'diffeomorphic' to the original state space and can be used to create a good dynamic model

•  Takens' theorem assumes no noise and must be empirically verified.
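A minimal Matlab sketch of building the lag vectors; the example series x and the choice T = 3 are illustrative assumptions, not from the slide:

% build delay-embedding (lag) vectors from a scalar time series
x = sin(0.1*(1:1000)') + 0.01*randn(1000,1);   % assumed example series
T = 3;                                         % lag-vector length (embedding dimension)
n = numel(x);
X = zeros(n-T+1, T);
for i = T:n
  X(i-T+1,:) = x(i:-1:i-T+1)';                 % lag vector (x_i, x_{i-1}, ..., x_{i-T+1})
end
plot3(X(:,1), X(:,2), X(:,3), '.')             % view the reconstructed manifold in E^3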

Page 17

Dynamic Systems and Takens' Theorem

Santa Fe 1992 Competition

Unstable Laser

Intensive Care Unit Data, Apnea

Exchange rate Data

Synthetic series with drift

White Dwarf Star Data

Bach’s unfinished Fugue

Page 18

Stereoscopic 3D view of the state-space manifold, series A (Laser). The points seem to lie on a surface, which means that a lag vector of length 3 gives good prediction of the time series. The surface is either produced for a training batch, or produced on-the-fly from neighboring data points (possibly downweighting very old points).

The figure in the book is misleading: the origin is where the surface touches the ground.

Page 19

Variational Bayes

Page 20

True trajectory in state space (Valpola-Karhunen 2002)

Reconstructed trajectory in inferred state space

Page 21

Hidden Markov Models

•  Given a sequence of discrete signals x_i

•  Is there a model likely to have produced x_i from a sequence of states s_i of a finite Markov chain?

•  P(.|s) - transition probability in state s

•  S(.|s) - signal probability in state s

•  Speech recognition, bioinformatics, …

Hidden Markov Models

function [Pn,Sn,stn,trP,trS,trst,tll]= hmmsim(A,N,n,s,prop,Po,So,sto,NN);
%[Pn,Sn,stn,trP,trS,trst]=HMMSIM(A,N,n,s,prop,Po,So,sto,NN);
% Compute trace of posterior for hmm parameters
%  A    - the sequence of signals
%  N    - the length of trace
%  n    - number of states in Markov chain
%  s    - number of signal values
%  prop - proposal stepsize
% optional inputs:
%  Po   - starting transition matrix (each of n columns a discrete pdf in n-vector)
%  So   - starting signal matrix (each of n columns a discrete pdf

Page 22

Hidden Markov Models

function [Pn,Sn,stn,trP,trS,trst,tll]= hmmsim(A,N,n,s,prop,Po,So,sto,NN);
%         in s-vector)
%  sto  - starting state sequence (congruent to vector A)
%  NN   - thinning of trace, default 10
% outputs
%  Pn   - last transition matrix in trace
%  Sn   - last signal emission matrix
%  stn  - last hidden state vector (congruent to A)
%  trP  - trace of transition matrices
%  trS  - trace of signal matrices
%  trst - trace of hidden state vectors
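A minimal usage sketch based on the help text above; the signal sequence A and the proposal step size are illustrative assumptions, not values from the slides:

A = [ones(1,100) 2*ones(1,100) ones(1,100)];  % assumed synthetic signal sequence (s = 2 signal values)
N = 100000;                                   % length of MCMC trace
n = 2;                                        % number of states in the Markov chain
s = 2;                                        % number of signal values
prop = 0.1;                                   % proposal stepsize (assumed scale)
[Pn,Sn,stn,trP,trS,trst,tll] = hmmsim(A,N,n,s,prop);
% the optional arguments Po, So, sto and the thinning NN can be added as documented above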

Hidden Markov Models

Page 23

Hidden Markov Models

Hidden Markov Models

Page 24

Hidden Markov Models: over 100000 iterations, the burn-in is visible. 2 states, 2 signals. P: transition matrix, S: signal matrix.

Chapman-Kolmogorov version of Bayes' rule

\[ f(\theta_t \mid D_t) \propto f(d_t \mid \theta_t) \int f(\theta_t \mid \theta_{t-1})\, f(\theta_{t-1} \mid D_{t-1})\, d\theta_{t-1} \]

Page 25

Chapman-Kolmogorov version of Bayes' rule

\[ f(\theta_t \mid D_t) \propto f(d_t \mid \theta_t) \int f(\theta_t \mid \theta_{t-1})\, f(\theta_{t-1} \mid D_{t-1})\, d\theta_{t-1} \]

Observation and video based particle filter tracking

Defence: tracking with heterogeneous observations

Crowd analysis: tracking from video

Page 26

Cycle in Particle filter

[Figure: the particle filter cycle: importance (weighted) sample → resampled ordinary sample → diffused sample → weighted by likelihood. X: state, Z: observation.]

Time step cycle
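A minimal bootstrap particle filter sketch in Matlab illustrating the cycle above; the 1D random-walk state model, the noise scales and the particle count are assumptions for illustration, not the tracking models of these slides:

Np = 1000; T = 50;
q = 0.5; r = 1.0;                        % assumed process / observation noise std
xtrue = cumsum(q*randn(T,1));            % simulated state trajectory (X)
z = xtrue + r*randn(T,1);                % simulated observations (Z)
xp = zeros(Np,1);                        % particles: ordinary (unweighted) sample
xest = zeros(T,1);
for t = 1:T
  xp = xp + q*randn(Np,1);               % diffuse the sample (prediction step)
  w = exp(-0.5*((z(t)-xp)/r).^2);        % weight by likelihood -> importance (weighted) sample
  w = w/sum(w);
  xest(t) = w'*xp;                       % posterior mean estimate at time t
  edges = [0; cumsum(w)]; edges(end) = 1;
  [~,idx] = histc(rand(Np,1), edges);    % resample -> ordinary sample again
  xp = xp(idx);
end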

Particle filter: general tracking