Intro to Sampling Methods
DESCRIPTION

Intro to Sampling Methods. CSE586 Computer Vision II, Spring 2010, Penn State Univ. A brief overview of sampling: Monte Carlo integration, sampling and expected values, inverse transform sampling (CDF), rejection sampling, importance sampling.

TRANSCRIPT
CSE586, PSU, Robert Collins
Intro to Sampling Methods
CSE586 Computer Vision IISpring 2010, Penn State Univ
A Brief Overview of Sampling
Monte Carlo Integration
Sampling and Expected Values
Inverse Transform Sampling (CDF)
Rejection Sampling
Importance Sampling
Integration and Area
http://videolectures.net/mlss08au_freitas_asm/
[Figure sequence: the region's area is estimated as (# boxes falling inside the region / total # boxes) x the total area, with progressively more boxes]
Integration and Area
• As we use more samples, our answer should get more and more accurate.
• It doesn't matter what the shape looks like.
[Figures: over the unit square from (0,0) to (1,1): left, the area under a curve, aka integration; right, an arbitrary region (even disconnected)]
Monte Carlo Integration
Goal: compute the definite integral of a function f(x) from a to b.

Pick an upper bound c with c >= f(x) on [a, b]. Generate N uniform random samples in the upper bound volume [a, b] x [0, c], and count the K samples that fall below the f(x) curve. Then:

    Answer = (K / N) * Area of upper bound volume
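This counting scheme is easy to sketch in code. Below is a minimal Python illustration (the function name and example integrand are my own, not from the lecture):

```python
import random

def mc_integrate_hit_or_miss(f, a, b, c, n=100_000, seed=0):
    """Estimate the integral of f over [a, b], given an upper bound c >= f(x),
    by counting uniform samples that land under the curve."""
    rng = random.Random(seed)
    k = 0
    for _ in range(n):
        x = rng.uniform(a, b)      # uniform sample in the bounding box
        y = rng.uniform(0.0, c)
        if y <= f(x):              # K = samples below the f(x) curve
            k += 1
    box_area = c * (b - a)         # area of the upper bound volume
    return (k / n) * box_area      # Answer = (K / N) * box area

# Example: integrate f(x) = x^2 on [0, 1]; the exact answer is 1/3.
estimate = mc_integrate_hit_or_miss(lambda x: x * x, 0.0, 1.0, 1.0)
```

With more samples N, the estimate concentrates around the true integral, exactly as the slide's box-counting intuition suggests.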
Monte Carlo Integration

Sampling-based integration is useful for computing the normalizing constant Z that turns an arbitrary non-negative function f(x) into a probability density function p(x):

    Z = ∫ f(x) dx,    p(x) = (1/Z) f(x)

Compute Z via sampling (Monte Carlo integration). Note: for complicated, multidimensional functions, this is the ONLY way we can compute this normalizing constant.
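As a sketch, here is one standard Monte Carlo estimator for Z in one dimension (the sample-mean estimator rather than the box-counting one; the target function is an arbitrary example of mine):

```python
import math
import random

def estimate_Z(f, a, b, n=200_000, seed=0):
    """Estimate Z = integral of f over [a, b] by averaging f at uniform
    samples: Z ~= (b - a) * mean(f(x_i))."""
    rng = random.Random(seed)
    total = sum(f(rng.uniform(a, b)) for _ in range(n))
    return (b - a) * total / n

# Unnormalized Gaussian f(x) = exp(-x^2 / 2); Z should approach sqrt(2*pi).
f = lambda x: math.exp(-x * x / 2)
Z = estimate_Z(f, -5.0, 5.0)
p = lambda x: f(x) / Z   # the resulting (approximately) normalized density
```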
Sampling: A Motivation

If we can generate random samples x_i from a given distribution P(x), then we can estimate expected values of functions under this distribution by summation, rather than integration. That is, we can approximate:

    E_P[f(x)] = ∫ f(x) P(x) dx

by first generating N i.i.d. samples x_i from P(x) and then forming the empirical estimate:

    E_P[f(x)] ≈ (1/N) ∑ f(x_i)
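For instance (an illustrative sketch; taking P to be a standard normal and f(x) = x^2 is my choice, not the slide's):

```python
import random

# Estimate E_P[f(x)] by averaging f over samples drawn from P,
# instead of integrating f(x) * P(x) dx.
rng = random.Random(0)
samples = [rng.gauss(0.0, 1.0) for _ in range(200_000)]   # x_i ~ N(0, 1)
est = sum(x * x for x in samples) / len(samples)          # E[x^2] = 1 for N(0, 1)
```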
Inverse Transform Sampling
It is easy to sample from a discrete 1D distribution, using the cumulative distribution function.
[Figure: a discrete distribution with weights w_k over indices k = 1..N, and its cumulative distribution function rising from 0 to 1]
Inverse Transform Sampling

1) Generate uniform u in the range [0,1].
2) Visualize a horizontal line at height u intersecting the bars of the CDF.
3) If the index of the intersected bar is j, output new sample x_k = j.
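The three steps can be sketched as follows (hypothetical helper name; the "horizontal line" test becomes a running-sum comparison):

```python
import random

def sample_discrete(weights, u):
    """Inverse transform sampling from a discrete distribution:
    return the first index j whose cumulative weight exceeds u."""
    cum = 0.0
    for j, w in enumerate(weights):
        cum += w            # running value of the CDF
        if u < cum:         # the horizontal line at height u hits bar j
            return j
    return len(weights) - 1 # guard against floating-point round-off

rng = random.Random(0)
weights = [0.2, 0.5, 0.3]
counts = [0, 0, 0]
for _ in range(100_000):
    counts[sample_discrete(weights, rng.random())] += 1
```

Over many draws, the empirical frequencies match the weights.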
Inverse Transform Sampling
Why it works: if u is uniform on [0,1] and F is the cumulative distribution function, then x = F^{-1}(u) (the inverse cumulative distribution function applied to u) is distributed according to F, since Pr(x <= a) = Pr(u <= F(a)) = F(a).
Efficiently Generating Many Samples
Basic idea: choose one initial small random number; deterministically sample the rest by “crawling” up the cdf function. This is O(N).
Due to Arulampalam
Odd property: you generate the "random" numbers in sorted order...
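A sketch of that crawl (this matches what is usually called systematic resampling; function and variable names are my own):

```python
import random

def systematic_sample(weights, rng):
    """Draw N samples from a normalized discrete distribution in O(N):
    one random offset, then evenly spaced points crawled up the CDF."""
    n = len(weights)
    u = rng.random() / n              # the single small random number
    out, cum, j = [], weights[0], 0
    for i in range(n):
        target = u + i / n            # evenly spaced points on [0, 1)
        while cum < target and j < n - 1:
            j += 1                    # crawl up the CDF
            cum += weights[j]
        out.append(j)                 # indices come out in sorted order
    return out

idx = systematic_sample([0.1, 0.2, 0.3, 0.4], random.Random(0))
```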
A Brief Overview of Sampling
Inverse Transform Sampling (CDF)
Rejection Sampling
Importance Sampling
For these two, we can sample from continuous distributions, and they do not even need to be normalized.
That is, to sample from distribution P, we only need to know a function P*, where P = P* / c , for some normalization constant c.
Rejection Sampling

Need a proposal density Q(x) [e.g. uniform or Gaussian], and a constant c such that cQ(x) is an upper bound for P*(x).
Example with Q(x) uniform: generate uniform random samples in the upper bound volume under cQ(x), and accept the samples that fall below the P*(x) curve. The marginal density of the x coordinates of the accepted points is then proportional to P*(x). Note the relationship to Monte Carlo integration.
Rejection Sampling
More generally:
1) generate sample xi from a proposal density Q(x)
2) generate sample u from uniform [0,cQ(xi)]
3) if u <= P*(xi) accept xi; else reject
[Figure: curves P*(x) and cQ(x); at a proposed sample x_i, u is drawn from [0, cQ(x_i)] and compared against P*(x_i)]
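A concrete sketch of these three steps, with a uniform proposal on [-5, 5] and an unnormalized Gaussian target (both are my example choices, not the lecture's):

```python
import math
import random

def rejection_sample(p_star, rng, a=-5.0, b=5.0, c=10.0):
    """Rejection sampling with uniform proposal Q(x) = 1/(b - a) on [a, b];
    c is chosen so that c * Q(x) >= P*(x) everywhere."""
    q = 1.0 / (b - a)
    while True:
        x = rng.uniform(a, b)            # 1) sample x_i from Q(x)
        u = rng.uniform(0.0, c * q)      # 2) sample u from [0, cQ(x_i)]
        if u <= p_star(x):               # 3) accept if u falls under P*
            return x

p_star = lambda x: math.exp(-x * x / 2)  # unnormalized N(0, 1), max value 1
rng = random.Random(0)
samples = [rejection_sample(p_star, rng) for _ in range(20_000)]
```

Note that P*(x) never has to be normalized; only the ratio against cQ(x) matters.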
Importance “Sampling”
Not for generating samples: it is a method to estimate the expected value of a function f(x) directly.

1) Generate samples x_i from Q(x).
2) An empirical estimate of E_Q[f(x)], the expected value of f(x) under distribution Q(x), is then:

    E_Q[f(x)] ≈ (1/N) ∑ f(x_i)

3) However, we want E_P[f(x)], which is the expected value of f(x) under distribution P(x) = P*(x)/Z.
Importance Sampling
[Figure: proposal Q(x) overlaid on target P*(x)]
When we generate from Q(x), values of x where Q(x) is greater than P*(x) are overrepresented, and values where Q(x) is less than P*(x) are underrepresented.
To mitigate this effect, introduce a weighting term w_i = P*(x_i) / Q(x_i).
Importance Sampling
New procedure to estimate E_P[f(x)]:

1) Generate N samples x_i from Q(x).
2) Form importance weights w_i = P*(x_i) / Q(x_i).
3) Compute the empirical estimate of E_P[f(x)], the expected value of f(x) under distribution P(x), as:

    E_P[f(x)] ≈ ∑ w_i f(x_i) / ∑ w_i
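Putting the three steps together in Python (my example target is an unnormalized standard normal, with a wider Gaussian as Q; f(x) = x^2, so the true answer is 1):

```python
import math
import random

p_star = lambda x: math.exp(-x * x / 2)         # unnormalized N(0, 1)
sigma_q = 2.0                                   # wider Gaussian proposal
q = lambda x: math.exp(-x * x / (2 * sigma_q**2)) / (sigma_q * math.sqrt(2 * math.pi))
f = lambda x: x * x                             # E_P[x^2] = 1 under N(0, 1)

rng = random.Random(0)
xs = [rng.gauss(0.0, sigma_q) for _ in range(100_000)]  # 1) samples from Q
ws = [p_star(x) / q(x) for x in xs]                     # 2) importance weights
est = sum(w * f(x) for w, x in zip(ws, xs)) / sum(ws)   # 3) weighted estimate
```

Dividing by the sum of the weights cancels the unknown normalizing constant Z.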
Resampling
Note: we thus have a set of weighted samples {(x_i, w_i) | i = 1, ..., N}.

If we really need random samples from P, we can generate them by resampling such that the likelihood of choosing value x_i is proportional to its weight w_i.

This involves sampling from a discrete distribution over N possible values (the N values of x_i). Therefore, regardless of the dimensionality of the vector x, we are resampling from a 1D distribution (we are essentially sampling from the indices 1...N, in proportion to the importance weights w_i). So we can use the inverse transform sampling method discussed earlier.
Note on Proposal Functions
Computational efficiency is best if the proposal distribution looks a lot like the desired distribution (the area between the curves is small).

These methods can fail badly when the proposal distribution has zero density in a region where the desired distribution has non-negligible density.

For this last reason, it is said that the proposal distribution should have heavy tails.
Sequential Monte Carlo Methods
Sequential Importance Sampling (SIS) and the closely related algorithm Sampling Importance Resampling (SIR) are known by various names in the literature:

- bootstrap filtering
- particle filtering
- Condensation algorithm
- survival of the fittest
General idea: Importance sampling on time series data, with samples and weights updated as each new data term is observed. Well-suited for simulating Markov chains and HMMs!
Markov-Chain Monte Carlo
Problem
Intuition: in high-dimensional problems, the "typical set" (the volume of non-negligible probability in the state space) is a small fraction of the total space.
High Dimensional Spaces
each pixel has two states: on and off
This ignores the neighborhood structure of the pixel lattice and empirical evidence that images are "smooth".
Recall: Markov Chains
Markov Chain:
• A sequence of random variables X1, X2, X3, ...
• Each variable is a distribution over a set of states (a, b, c, ...)
• The transition probability of going to the next state depends only on the current state, e.g. P(Xn+1 = a | Xn = b)
[Figure: a chain with states a, b, c, d]

Transition probs can be arranged in an NxN table of elements kij = P(Xn+1 = j | Xn = i), where the rows sum to one.
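A small sketch of such a table and a simulation of the chain (the 3x3 transition table is an arbitrary example of mine, not from the slides):

```python
import random

# K[i][j] = P(X_{n+1} = j | X_n = i); each row sums to one.
K = [
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.1, 0.2, 0.7],
]
assert all(abs(sum(row) - 1.0) < 1e-12 for row in K)

def step(state, rng):
    """Sample the next state from row `state` of the transition table."""
    u, cum = rng.random(), 0.0
    for j, p in enumerate(K[state]):
        cum += p
        if u < cum:
            return j
    return len(K) - 1

# Fraction of time spent in each state over a long run.
rng = random.Random(0)
state, counts = 0, [0, 0, 0]
for _ in range(200_000):
    state = step(state, rng)
    counts[state] += 1
occupancy = [c / 200_000 for c in counts]
```

The long-run occupancy fractions answer the "percentage of time" question on the next slide.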
K = transpose of the transition prob table {kij} (columns sum to one). We do this for computational convenience (next slide).
Question: Assume you start in some state, and then run the simulation for a large number of time steps. What percentage of time do you spend at X1, X2 and X3?
[Figure: four possible initial distributions, e.g. [.33 .33 .33], and the distribution after each time step]

All eventually end up with the same distribution -- this is the stationary distribution!
in matlab: [E,D] = eigs(K)
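A Python/NumPy equivalent of that matlab call (the 3x3 column-stochastic matrix is my example; the stationary distribution is the eigenvector of K with eigenvalue 1, rescaled to sum to one):

```python
import numpy as np

# Column-stochastic K (the transpose of a row-stochastic transition table).
K = np.array([
    [0.5, 0.2, 0.1],
    [0.3, 0.6, 0.2],
    [0.2, 0.2, 0.7],
])
vals, vecs = np.linalg.eig(K)              # analogue of [E, D] = eigs(K)
i = int(np.argmax(np.isclose(vals, 1.0)))  # a stochastic K always has eigenvalue 1
pi = np.real(vecs[:, i])
pi = pi / pi.sum()                         # rescale so the entries sum to one
```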
The PageRank of a webpage as used by Google is defined by a Markov chain: it is the probability of being at page i in the stationary distribution of the following Markov chain on all (known) webpages. If N is the number of known webpages and page i has ki links, then page i has transition probability (1-q)/ki + q/N to each page it links to, and q/N to each page it does not link to. The parameter q is taken to be about 0.15.
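The transition rule can be written down directly (a sketch on a tiny made-up three-page web; the column-stochastic convention matches the earlier slides, and every page is assumed to have at least one outlink):

```python
import numpy as np

def pagerank_matrix(links, q=0.15):
    """Build the column-stochastic PageRank transition matrix.
    links[i] lists the pages that page i links to (assumed non-empty)."""
    N = len(links)
    K = np.full((N, N), q / N)              # q/N to every page (teleport)
    for i, outs in enumerate(links):
        for j in outs:
            K[j, i] += (1 - q) / len(outs)  # plus (1-q)/k_i to linked pages
    return K

# Hypothetical web: page 0 -> 1, page 1 -> 0 and 2, page 2 -> 0.
K = pagerank_matrix([[1], [0, 2], [0]])
r = np.full(3, 1 / 3)                       # start uniform
for _ in range(200):
    r = K @ r                               # power iteration to stationarity
```

Page 0, with two in-links, ends up with the largest PageRank in this toy example.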
Another Question: Assume you want to spend a particular percentage of time at X1, X2 and X3. What should the transition probabilities be?
[Figure: three states X1, X2, X3]

P(x1) = .2, P(x2) = .3, P(x3) = .5

K = [ ? ? ?
      ? ? ?
      ? ? ? ]
we will discuss this next time