Markov Chain Monte Carlo: theory and worked examples
Dario Digiuni
Academic Year 2007/2008
Markov Chain Monte Carlo
• A class of sampling algorithms
• High sampling efficiency
• Can sample from a distribution whose normalization constant is unknown
• Often the only way to solve problems in time polynomial in the number of dimensions
▫ e.g. evaluating the volume of a convex body
MCMC: applications
• Statistical Mechanics
▫ Metropolis-Hastings
• Optimization
▫ Simulated annealing
• Bayesian Inference
▫ Metropolis-Hastings
▫ Gibbs sampling
The Monte Carlo principle
• Sample a set of N independent, identically distributed variables x^(i) ~ p(x)
• Approximate the target p.d.f. with the empirical expression p_N(x) = (1/N) Σ_{i=1}^N δ(x − x^(i))
• … then approximate the integrals: ∫ f(x) p(x) dx ≈ (1/N) Σ_{i=1}^N f(x^(i))
Rejection Sampling
• Sample x from a proposal q(x) and accept with probability p(x) / (M q(x)), where M is chosen so that M q(x) ≥ p(x) everywhere
• Drawbacks:
1. It needs finding M!
2. Low acceptance rate
Idea
• I can use the previously sampled value to find the following one
• Exploration of the configuration space by means of Markov chains:
▫ def.: a Markov process is a stochastic process whose next state depends only on the current one, p(x^(i+1) | x^(i), …, x^(1)) = p(x^(i+1) | x^(i))
▫ def.: a Markov chain is the sequence of states generated by a Markov process
Invariant distribution
• Stability conditions:
1. Irreducibility: for every state there exists a finite probability of visiting any other state
2. Aperiodicity: there are no loops
• Sufficient condition:
1. Detailed balance principle: p(x) T(x → x') = p(x') T(x' → x)
• MCMC algorithms are aperiodic, irreducible Markov chains having the target pdf as the invariant distribution
Example
• What is the probability of finding the lift at the ground floor of a three-floor building?
▫ 3-state Markov chain
▫ Lift = random walker
▫ Transition matrix T
▫ Looking for the invariant distribution
… burn-in …
Example - 2
• I can apply the matrix T (acting on the right) to any of the states: since T does not depend on time, this is a homogeneous Markov chain
• After the burn-in, ~50% is the probability of finding the lift at the ground floor
• Google's PageRank:
▫ Websites are the states, T is defined by the number of hyperlinks among them, and the user is the random walker: the webpages are displayed following the invariant distribution!
Metropolis-Hastings
• Given the target distribution p(x):
1. Choose a starting value x^(0)
2. Sample a candidate x* from a proposal distribution q(x* | x^(i)) (the equivalent of the transition matrix T)
3. Accept the new value with probability a = min{1, [p(x*) q(x^(i) | x*)] / [p(x^(i)) q(x* | x^(i))]}
4. Return to step 2
• The ratio in a is independent of the normalization!
• In the Metropolis algorithm the proposal is symmetric: q(x* | x^(i)) and q(x^(i) | x*) are equal
M.-H. – Pros and Cons
• Very general sampling method:
▫ I can sample from an unnormalized distribution
▫ It does not require an upper bound for the function
• Good performance depends on the choice of the proposal distribution
▫ well-mixing condition
M.-H. - Example
• In Statistical Mechanics it is important to evaluate the partition function, e.g. for the Ising model
• Summing over every possible spin state is unfeasible: in a 10 × 10 × 10 spin cube I would have to sum over 2^1000 possible states
• MCMC APPROACH:
1. Evaluate the system's energy
2. Pick a spin at random and flip it:
1. If the energy decreases, this is the new spin configuration
2. If the energy increases, this is the new spin configuration with probability exp(−ΔE / k_B T)
Simulated Annealing
• It allows one to find the global maximum of a generic pdf
▫ No comparison between the values of the local maxima is required
▫ It applies to the maximum-likelihood method
• It is a non-homogeneous Markov chain whose invariant distribution keeps changing: at step i the chain targets p(x)^{1/T_i}, with the temperature T_i decreasing towards 0
Simulated Annealing: example
• Let us apply the algorithm to a simple, one-dimensional case
• The optimal cooling scheme is logarithmic, T_i ∝ 1 / ln i
Simulated Annealing: Pros and Cons
• The global maximum is uniquely determined
▫ Even if the walker starts next to a local (non-global!) maximum, it converges to the true global maximum
• It requires a good tuning of the parameters
Gibbs Sampler
• Optimal method to marginalize multidimensional distributions
• Let us assume we have an n-dimensional vector x and that we know all the conditional probability expressions p(x_j | x_1, …, x_{j−1}, x_{j+1}, …, x_n) for the pdf
• We take the full conditionals as the proposal distribution: each candidate changes one component x_j, drawn from p(x_j | x_{−j})
Gibbs Sampler - 2
• Then, substituting this proposal into the Metropolis-Hastings ratio, the acceptance probability is always a = 1: every proposed move is accepted
▫ very efficient method!
Gibbs Sampler – practically
1. Initialize x^(0) = (x_1^(0), …, x_n^(0))
2. for (i = 0; i < N; i++)
• Sample x_1^(i+1) ~ p(x_1 | x_2^(i), …, x_n^(i))
• Sample x_2^(i+1) ~ p(x_2 | x_1^(i+1), x_3^(i), …, x_n^(i))
• Sample x_j^(i+1) ~ p(x_j | x_1^(i+1), …, x_{j−1}^(i+1), x_{j+1}^(i), …, x_n^(i))
• Sample x_n^(i+1) ~ p(x_n | x_1^(i+1), …, x_{n−1}^(i+1))
▫ i.e. fix n−1 coordinates and sample from the resulting conditional pdf
Gibbs Sampler – example
• Let us pretend we cannot determine the normalization constant…
• … but we can make a comparison with the true marginalized pdf…
Gibbs Sampler – results
• Comparison between Gibbs sampling and true M.-H. sampling from the marginalized pdf
• Good χ² agreement
A complex MCMC application
A radioactive source decays with rate λ1 and a detector records only every k1-th event; then, at the moment tc, the decay rate changes to λ2 and only one event out of k2 is recorded.
Apparently λ1, k1, tc, λ2 and k2 are undetermined. We wish to find them.
Preparation
• The waiting time for the k-th event in a Poissonian process with rate λ follows the Erlang (gamma) distribution p(t) = λ^k t^(k−1) e^(−λt) / (k−1)!
• I can sample a large number of events from this pdf, switching the parameters from λ1, k1 to λ2, k2 at time tc
• I evaluate the likelihood of the recorded waiting times
Idea
• I assume the likelihood (stored as a log-likelihood) to be the invariant distribution!
▫ Which are the Markov chain states? Points of the parameter space, each carrying its log-likelihood value:

struct State {
    // parameter space
    double lambda1, lambda2;
    double tc;
    int k1, k2;
    // corresponding log-likelihood value
    double plog;

    State(double la1, double la2, double t, int kk1, int kk2)
        : lambda1(la1), lambda2(la2), tc(t), k1(kk1), k2(kk2) {}
    State() {}
};
Practically
• I have to find an appropriate proposal distribution to move among the states
▫ Attention: when varying the λi and ki I have to prevent the acceptance rate from being too low… but also too high!
• The acceptance ratio a is evaluated as the ratio between the final-state and initial-state likelihood values.
• Guess starting values for the λi, ki and tc
• Let the chain evolve for a burn-in time and then record the results.
Results
• Even if the initial guess is quite far from the real values, the random walker converges.
▫ guess: λ1 = 5, λ2 = 5, k1 = 3, k2 = 2
▫ real: λ1 = 1, λ2 = 2, k1 = 1, k2 = 1
Results - 2
• Estimate of the uncertainty: posterior distributions of λ1 and λ2
Results - 3
• All the parameters can be determined quickly
▫ guess: tc = 150, real: tc = 300
References
• C. Andrieu, N. De Freitas, A. Doucet and M.I. Jordan, Machine Learning 50 (2003), 5-43.
• G. Casella and E.I. George, The American Statistician 46, 3 (1992), 167-174.
• W.H. Press, S.A. Teukolsky, W.T. Vetterling and B.P. Flannery, Numerical Recipes, Third Edition, Cambridge University Press, 2007.
• M. Loreti, Teoria degli errori e fondamenti di statistica, Decibel-Zanichelli, 1998.
• B. Walsh, Markov Chain Monte Carlo and Gibbs Sampling, Lecture Notes for EEB 581.