Statistics Lab Rodolfo Metulini IMT Institute for Advanced Studies, Lucca, Italy Lesson 5 - Introduction to Bootstrap and Introduction to Markov Chains - 23.01.2014


Page 1: Introduction to Bootstrap and elements of Markov Chains

Statistics Lab

Rodolfo Metulini

IMT Institute for Advanced Studies, Lucca, Italy

Lesson 5 - Introduction to Bootstrap and Introduction to Markov Chains - 23.01.2014

Page 2: Introduction to Bootstrap and elements of Markov Chains

Introduction

Let’s assume, for a moment, the CLT:

If random samples of n observations y1, y2, ..., yn are drawn from a population with mean µ and standard deviation σ, then, for n sufficiently large, the sampling distribution of the sample mean can be approximated by a normal density with mean µ and variance σ²/n.

I Averages taken from any distribution will have a normal distribution

I The standard deviation decreases as the number of observations increases

But nobody tells us exactly how big the sample has to be.

Page 3: Introduction to Bootstrap and elements of Markov Chains

Why Bootstrap?

Sometimes we cannot make use of the CLT, because:

1. Nobody tells us exactly how big the sample has to be

2. The sample can be really small.

So we are not encouraged to make any distributional assumption. We just have the data, and we let the raw data speak.

The bootstrap method attempts to determine the probability distribution from the data itself, without recourse to the CLT.

N.B. The bootstrap method is not a way of reducing the error! It only tries to estimate it.

Page 4: Introduction to Bootstrap and elements of Markov Chains

Basic Idea of Bootstrap

Use the original sample as the population: draw N samples (the bootstrap samples) from the original sample, and define the estimator using the bootstrap samples.

Figure: Real World versus Bootstrap World

Page 5: Introduction to Bootstrap and elements of Markov Chains

Structure of Bootstrap

1. Originally, from a list of data (the sample), one computes a statistic (an estimate)

2. Create an artificial list of data (a new sample) by randomly drawing elements from the original list, with replacement

3. Compute a new statistic (estimate) from the new sample

4. Repeat steps 2) and 3), say, 1000 times, and look at the distribution of these 1000 statistics
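The four steps above can be sketched in a few lines (the lab itself works in R; this is a Python sketch, and the toy sample values are made up for illustration):

```python
import random
import statistics

def bootstrap_means(sample, n_boot=1000, seed=0):
    """Steps 2-3 repeated n_boot times: resample with replacement
    (same size as the original sample) and record each mean."""
    rng = random.Random(seed)
    n = len(sample)
    return [statistics.mean(rng.choices(sample, k=n)) for _ in range(n_boot)]

sample = [4.1, 5.3, 3.8, 6.0, 4.7, 5.1, 3.9, 5.8, 4.4, 5.0]  # toy data, n = 10
boot = bootstrap_means(sample)
mb = statistics.mean(boot)       # bootstrap sample mean
vb = statistics.variance(boot)   # bootstrap sample variance (M - 1 denominator)
```

Looking at the spread of the 1000 bootstrap means (step 4) is what replaces the distributional assumption.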

Page 6: Introduction to Bootstrap and elements of Markov Chains

Sample mean

Suppose we extracted a sample x = (x1, x2, ..., xn) from the population X. Let's say the sample size is small: n = 10.

We can compute the sample mean x̄ using the values of the sample x. But, since n is small, the CLT does not hold, so we cannot say anything about the sample mean distribution.

APPROACH: we extract M samples (or sub-samples) of dimension n from the sample x (with replacement).

We can define the bootstrap sample means x̄_{b,i}, i = 1, ..., M. These become the new sample, of dimension M.

Bootstrap sample mean:

M_b(X) = Σ_{i=1}^{M} x̄_{b,i} / M

Bootstrap sample variance:

V_b(X) = Σ_{i=1}^{M} (x̄_{b,i} − M_b(X))² / (M − 1)

Page 7: Introduction to Bootstrap and elements of Markov Chains

Bootstrap confidence interval with variance estimation

Let’s take a random sample of size 25 from a normal distributionwith mean 10 and standard deviation 3.

We can consider the sampling distribution of the sample mean. From that, we estimate the intervals.

The bootstrap estimates the standard error by resampling the data in our original sample. Instead of repeatedly drawing samples of size 25 from the population, we repeatedly draw new samples of size 25 from our original sample, resampling with replacement.

We can estimate the standard error of the sample mean using the standard deviation of the bootstrapped sample means.
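A minimal sketch of this procedure, with a normal sample of size 25 as above (Python rather than R; 1.96 is the usual normal critical value):

```python
import random
import statistics

rng = random.Random(42)
original = [rng.gauss(10, 3) for _ in range(25)]   # one observed sample, n = 25

# Resample the observed sample (not the population), 1000 times
boot_means = [statistics.mean(rng.choices(original, k=25)) for _ in range(1000)]
se_boot = statistics.stdev(boot_means)             # bootstrap estimate of SE(x-bar)

# Normal-theory interval built from the bootstrap standard error
ci = (statistics.mean(original) - 1.96 * se_boot,
      statistics.mean(original) + 1.96 * se_boot)
```

With σ = 3 and n = 25, the CLT value σ/√n = 0.6 gives a benchmark for `se_boot`.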

Page 8: Introduction to Bootstrap and elements of Markov Chains

Confidence interval with quantiles

Suppose we have a sample of data from an exponential distribution with parameter λ:

f(x | λ) = λ e^{−λx} (recall the estimate λ̂ = 1/x̄_n).

An alternative to using bootstrap-estimated standard errors (estimating the sd of λ̂ for an exponential is not straightforward) is the use of bootstrap quantiles.

We can obtain M bootstrap estimates λ̂_b and define q*(α) as the α quantile of the bootstrap distribution.

The new bootstrap confidence interval for λ will be:

[2 λ̂ − q*(1 − α/2), 2 λ̂ − q*(α/2)]
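A sketch of this quantile-based (basic) interval; the true λ = 2 and the sample size 30 are made up for the example (Python rather than R):

```python
import random
import statistics

rng = random.Random(1)
lam_true = 2.0                                   # illustrative true parameter
data = [rng.expovariate(lam_true) for _ in range(30)]
lam_hat = 1 / statistics.mean(data)              # estimate from the slide: 1 / x-bar

# M = 1000 bootstrap estimates of lambda, sorted for quantiles
boot = sorted(1 / statistics.mean(rng.choices(data, k=len(data)))
              for _ in range(1000))
q_lo, q_hi = boot[24], boot[974]                 # ~2.5% and ~97.5% quantiles

# Basic (pivotal) interval from the slide: [2*lam - q(1-a/2), 2*lam - q(a/2)]
ci = (2 * lam_hat - q_hi, 2 * lam_hat - q_lo)
```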

Page 9: Introduction to Bootstrap and elements of Markov Chains

Regression model coefficient estimate with Bootstrap

Now we will consider the situation where we have data on two variables: the type of data that arises in a linear regression setting. It doesn't make sense to bootstrap the two variables separately, so they must remain linked when bootstrapped.

For example, if our original data contains the observations (1,3), (2,6), (4,3), and (6,2), we re-sample this original sample in pairs.

Recall that the linear regression model is: y = β0 + β1 x + ε

We are going to construct a bootstrap interval for the slope coefficient β1:

1. Draw M bootstrap samples

2. Compute the OLS β1 coefficient for each bootstrap sample

3. Define the bootstrap quantiles, and use the 0.025 and the 0.975 quantiles to define the confidence interval for β1
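The three steps can be sketched as follows, reusing the four pairs from the example (Python rather than R; `ols_slope` is an illustrative helper):

```python
import random
import statistics

def ols_slope(pairs):
    """Least-squares slope beta1 for a list of (x, y) pairs."""
    xs, ys = zip(*pairs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in pairs)
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

data = [(1, 3), (2, 6), (4, 3), (6, 2)]          # the four pairs from the slide
rng = random.Random(7)
slopes = []
while len(slopes) < 1000:
    resample = rng.choices(data, k=len(data))    # resample pairs: x and y stay linked
    if len({x for x, _ in resample}) > 1:        # skip degenerate draws (all x equal)
        slopes.append(ols_slope(resample))
slopes.sort()
ci = (slopes[24], slopes[974])                   # 0.025 and 0.975 bootstrap quantiles
```

With only n = 4 pairs, some resamples repeat a single x value; the sketch simply discards those degenerate draws.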

Page 10: Introduction to Bootstrap and elements of Markov Chains

Regression model coefficient estimate with Bootstrap: sampling the residuals

An alternative solution for the regression coefficient is a two-stage method in which:

1. You fit the regression on the original sample and obtain the n residuals; you then draw M bootstrap samples of these residuals

2. You add each resampled set of residuals to the fitted values of the dependent variable and re-estimate the regression, defining M bootstrapped β1

The method consists in using the 0.025 and the 0.975 quantiles of the bootstrapped β1 to define the confidence interval.
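A minimal sketch of the residual-resampling scheme, reusing the four pairs from the previous slide (Python rather than R; `fit` is an illustrative helper):

```python
import random
import statistics

def fit(pairs):
    """Return (beta0, beta1) for a simple OLS fit on (x, y) pairs."""
    xs, ys = zip(*pairs)
    xbar, ybar = statistics.mean(xs), statistics.mean(ys)
    b1 = (sum((x - xbar) * (y - ybar) for x, y in pairs)
          / sum((x - xbar) ** 2 for x in xs))
    return ybar - b1 * xbar, b1

data = [(1, 3), (2, 6), (4, 3), (6, 2)]
b0, b1 = fit(data)                       # stage 1: fit once on the original sample
fitted = [b0 + b1 * x for x, _ in data]
resid = [y - f for (_, y), f in zip(data, fitted)]

rng = random.Random(3)
slopes = sorted(                         # stage 2: resample residuals, rebuild y, refit
    fit([(x, f + e) for (x, _), f, e in
         zip(data, fitted, rng.choices(resid, k=len(resid)))])[1]
    for _ in range(1000)
)
ci = (slopes[24], slopes[974])           # 0.025 and 0.975 quantiles
```

Because the x values stay fixed, every resample yields a valid slope, unlike the pairs method.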

Page 11: Introduction to Bootstrap and elements of Markov Chains

References

Efron, B., Tibshirani, R. (1993). An Introduction to the Bootstrap (Vol. 57). CRC Press.

Figure: Efron and Tibshirani's foundational book

Page 12: Introduction to Bootstrap and elements of Markov Chains

Routines in R

1. boot, by Brian Ripley.

Functions and datasets for bootstrapping from the book Bootstrap Methods and Their Applications by A. C. Davison and D. V. Hinkley (1997, CUP).

2. bootstrap, by Rob Tibshirani.

Software (bootstrap, cross-validation, jackknife) and data for the book An Introduction to the Bootstrap by B. Efron and R. Tibshirani, 1993, Chapman and Hall.

Page 13: Introduction to Bootstrap and elements of Markov Chains

Markov Chain

Markov chains are an important concept in probability and in many other areas of research.

They are used to model the probability of being in a certain state in a certain period, given the state in the previous period.

Weather example: what is the probability that tomorrow will be sunny, given that today is rainy?

The main properties of Markov Chain processes are:

I Memory of the process (usually the memory is fixed at 1)

I Stationarity of the distribution

Page 14: Introduction to Bootstrap and elements of Markov Chains

Chart 1

A picture of an easy example of a Markov chain with 2 possible states and transition probabilities.

Figure: An example of 2 states markov chain

Page 15: Introduction to Bootstrap and elements of Markov Chains

Notation

We define a stochastic process {X_n, n = 0, 1, 2, ...} that takes on a finite or countable number of possible values.

Let the possible values be non-negative integers (i.e. X_n ∈ Z+). If X_n = i, then the process is said to be in state i at time n.

The Markov process (in discrete time) is defined as follows:

P_ij = P[X_{n+1} = j | X_n = i, X_{n−1} = i_{n−1}, ..., X_0 = i_0] = P[X_{n+1} = j | X_n = i], ∀ i, j ∈ Z+

We call Pij a 1-step transition probability because we moved fromtime n to time n + 1.

It is a first-order Markov chain (memory = 1) because the probability of being in state j at time (n + 1) only depends on the state at time n.
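A quick simulation illustrates these 1-step transition probabilities (Python rather than R; the two-state chain and its probabilities are made up for the example):

```python
import random

# Hypothetical 2-state chain (states 0 and 1); P[i][j] = P[X_{n+1} = j | X_n = i]
P = {0: [0.7, 0.3], 1: [0.4, 0.6]}

rng = random.Random(0)
state, path = 0, [0]
for _ in range(100_000):                 # simulate the chain step by step
    state = rng.choices([0, 1], weights=P[state])[0]
    path.append(state)

# Empirical estimate of P_00 from observed 0 -> 0 transitions
n00 = sum(1 for a, b in zip(path, path[1:]) if a == 0 and b == 0)
n0 = sum(1 for a in path[:-1] if a == 0)
p00_hat = n00 / n0                       # should be close to the true 0.7
```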

Page 16: Introduction to Bootstrap and elements of Markov Chains

Notation - 2

The n-step transition probability: P^n_ij = P[X_{n+k} = j | X_k = i], ∀ n ≥ 0, i, j ≥ 0

The Chapman-Kolmogorov equations allow us to compute these n-step transition probabilities. They state that:

P^{n+m}_ij = Σ_k P^n_ik · P^m_kj, ∀ n, m ≥ 0, ∀ i, j ≥ 0

N.B. Basic probability properties:

1. P_ij ≥ 0, ∀ i, j ≥ 0

2. Σ_{j≥0} P_ij = 1, i = 0, 1, 2, ...

Page 17: Introduction to Bootstrap and elements of Markov Chains

Example: conditional probability

Consider two states: 0 = rain and 1 = no rain.

Define two probabilities:

α = P_00 = P[X_{n+1} = 0 | X_n = 0], the probability it will rain tomorrow given it rained today;

β = P_10 = P[X_{n+1} = 0 | X_n = 1], the probability it will rain tomorrow given it did not rain today.

What is the probability it will rain the day after tomorrow, given it rained today?

The transition probability matrix will be:

P =
[ α    1 − α ]
[ β    1 − β ]
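A sketch of the computation, with illustrative values α = 0.7 (rain today, rain tomorrow) and β = 0.4 (no rain today, rain tomorrow): by Chapman-Kolmogorov, the answer to the question above is the (0,0) entry of P².

```python
def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

alpha, beta = 0.7, 0.4                 # illustrative transition probabilities
P = [[alpha, 1 - alpha],               # row 0: today it rains
     [beta, 1 - beta]]                 # row 1: today it does not rain
P2 = matmul2(P, P)                     # Chapman-Kolmogorov: 2-step probabilities
rain_in_two_days = P2[0][0]            # = alpha**2 + (1 - alpha) * beta
```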

Page 18: Introduction to Bootstrap and elements of Markov Chains

Example: unconditional probability

What is the unconditional probability it will rain the day after tomorrow?

We need to define the unconditional, or marginal, distribution of the state at time n:

P[X_n = j] = Σ_i P[X_n = j | X_0 = i] · P[X_0 = i] = Σ_i P^n_ij · α_i,

where α_i = P[X_0 = i], ∀ i ≥ 0,

and P[X_n = j | X_0 = i] is the conditional probability just computed before.

Page 19: Introduction to Bootstrap and elements of Markov Chains

Stationary distributions

A stationary distribution π is a probability distribution such that, once the Markov chain reaches it, the chain remains in that distribution forever.

It means we are asking this question: what is the probability of being in a particular state in the long run?

Let's define π_j as the limiting probability that the process will be in state j:

π_j = lim_{n→∞} P^n_ij

Using Fubini's theorem, we can characterize the stationary distribution as:

π_j = Σ_i π_i · P_ij

Page 20: Introduction to Bootstrap and elements of Markov Chains

Example: stationary distribution

Back to our example.

We can compute the 2-step, 3-step, ..., n-step transition distributions, and check WHEN they reach convergence.

An alternative method to compute the stationary distribution consists in using these simple closed-form expressions (for the two-state chain above):

π_0 = β / (1 − α + β)

π_1 = (1 − α) / (1 − α + β)
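For the two-state chain with rows (α, 1 − α) and (β, 1 − β), the closed forms π_0 = β/(1 − α + β) and π_1 = (1 − α)/(1 − α + β) can be checked against iterated powers of P (illustrative values, Python rather than R):

```python
def matmul2(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

alpha, beta = 0.7, 0.4                           # illustrative transition probabilities
P = [[alpha, 1 - alpha], [beta, 1 - beta]]

Pn = P
for _ in range(50):                              # iterate P^n until the rows converge
    Pn = matmul2(Pn, P)

pi0 = beta / (1 - alpha + beta)                  # closed-form stationary probabilities
pi1 = (1 - alpha) / (1 - alpha + beta)
```

After convergence, both rows of P^n equal (π_0, π_1): the starting state no longer matters.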

Page 21: Introduction to Bootstrap and elements of Markov Chains

References

Ross, S. M. (2006). Introduction to Probability Models. Elsevier.

Figure: Cover of the 10th edition

Page 22: Introduction to Bootstrap and elements of Markov Chains

Routines in R

I markovchain, by Giorgio Alfredo Spedicato.

A package for easily handling discrete Markov chains.

I MCMCpack, by Andrew D. Martin, Kevin M. Quinn, andJong Hee Park.

Perform Monte Carlo simulations based on Markov Chainapproach.