
Bootstrap Resampling

Nathaniel E. Helwig

Assistant Professor of Psychology and Statistics, University of Minnesota (Twin Cities)

Updated 04-Jan-2017


Copyright

Copyright © 2017 by Nathaniel E. Helwig


Outline of Notes

1) Background Information: Statistical inference; Sampling distributions; Need for resampling

2) Bootstrap Basics: Overview; Empirical distribution; Plug-in principle

3) Bootstrap in Practice: Bootstrap in R; Bias and mean-squared error; The Jackknife

4) Bootstrapping Regression: Regression review; Bootstrapping residuals; Bootstrapping pairs

For a thorough treatment see: Efron, B. & Tibshirani, R.J. (1993). An Introduction to the Bootstrap. Chapman & Hall/CRC.


Background Information


Background Information Statistical Inference

The Classic Statistical Paradigm

$X$ is some random variable, e.g., age in years.

$\mathcal{X} = \{x_1, x_2, x_3, \ldots\}$ is some population of interest, e.g.,
- Ages of all students at the University of Minnesota
- Ages of all people in the state of Minnesota

At the population level...
- $F(x) = P(X \leq x)$ for all $x \in \mathcal{X}$ is the population CDF
- $\theta = t(F)$ is the population parameter, where $t$ is some function of $F$

At the sample level...
- $\mathbf{x} = (x_1, \ldots, x_n)'$ is a sample of data with $x_i \overset{iid}{\sim} F$ for $i \in \{1, \ldots, n\}$
- $\hat{\theta} = s(\mathbf{x})$ is the sample statistic, where $s$ is some function of $\mathbf{x}$


Background Information Statistical Inference

The Classic Statistical Paradigm (continued)

$\hat{\theta}$ is a random variable that depends on $\mathbf{x}$ (and thus on $F$).

The sampling distribution of $\hat{\theta}$ refers to the CDF (or PDF) of $\hat{\theta}$.

If $F$ is known (or assumed to be known), then the sampling distribution of $\hat{\theta}$ may have some known distribution.
- If $x_i \overset{iid}{\sim} N(\mu, \sigma^2)$, then $\bar{x} \sim N(\mu, \sigma^2/n)$, where $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$
- Note that in the above example, $\theta \equiv \mu$ and $\hat{\theta} \equiv \bar{x}$

How can we make inferences about $\theta$ using $\hat{\theta}$ when $F$ is unknown?


Background Information Sampling Distributions

The Hypothetical Ideal

Assume that $\mathcal{X}$ is too large to measure all members of the population.

If we had a really LARGE research budget, we could collect $B$ independent samples from the population $\mathcal{X}$:
- $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})'$ is the $j$-th sample with $x_{ij} \overset{iid}{\sim} F$
- $\hat{\theta}_j = s(\mathbf{x}_j)$ is the statistic (parameter estimate) for the $j$-th sample

The sampling distribution of $\hat{\theta}$ can then be estimated via the distribution of $\{\hat{\theta}_j\}_{j=1}^B$.


Background Information Sampling Distributions

The Hypothetical Ideal: Example 1 (Normal Mean)

Sampling Distribution of $\bar{x}$ with $x_i \overset{iid}{\sim} N(0,1)$ for $n = 100$:

[Figure: histograms of $\bar{x}$ for $B \in \{200, 500, 1000, 2000, 5000, 10000\}$ simulated samples, each overlaid with the true $\bar{x}$ pdf, $N(0, 1/n)$.]


Background Information Sampling Distributions

The Hypothetical Ideal: Example 1 R Code

# hypothetical ideal: example 1 (normal mean)
set.seed(1)
n = 100
B = c(200, 500, 1000, 2000, 5000, 10000)
xseq = seq(-0.4, 0.4, length=200)
quartz(width=12, height=8)   # macOS graphics device; use dev.new() on other platforms
par(mfrow=c(2,3))
for(k in 1:6){
  X = replicate(B[k], rnorm(n))
  xbar = apply(X, 2, mean)
  hist(xbar, freq=F, xlim=c(-0.4,0.4), ylim=c(0,5),
       main=paste("Sampling Distribution: B =", B[k]))
  lines(xseq, dnorm(xseq, sd=1/sqrt(n)))
  legend("topright", expression(bar(x)*" pdf"), lty=1, bty="n")
}


Background Information Sampling Distributions

The Hypothetical Ideal: Example 2 (Normal Median)

Sampling Distribution of $\text{median}(\mathbf{x})$ with $x_i \overset{iid}{\sim} N(0,1)$ for $n = 100$:

[Figure: histograms of the sample median for $B \in \{200, 500, 1000, 2000, 5000, 10000\}$ simulated samples, each overlaid with the $\bar{x}$ pdf for reference.]


Background Information Sampling Distributions

The Hypothetical Ideal: Example 2 R Code

# hypothetical ideal: example 2 (normal median)
set.seed(1)
n = 100
B = c(200, 500, 1000, 2000, 5000, 10000)
xseq = seq(-0.4, 0.4, length=200)
quartz(width=12, height=8)
par(mfrow=c(2,3))
for(k in 1:6){
  X = replicate(B[k], rnorm(n))
  xmed = apply(X, 2, median)
  hist(xmed, freq=F, xlim=c(-0.4,0.4), ylim=c(0,5),
       main=paste("Sampling Distribution: B =", B[k]))
  lines(xseq, dnorm(xseq, sd=1/sqrt(n)))
  legend("topright", expression(bar(x)*" pdf"), lty=1, bty="n")
}


Background Information Need for Resampling

Back to the Real World

In most cases, we only have one sample of data. What do we do?

If $n$ is large and we only care about $\bar{x}$, we can use the CLT.

Sampling Distribution of $\bar{x}$ with $x_i \overset{iid}{\sim} U[0,1]$ for $B = 10000$:

[Figure: histograms of $\bar{x}$ for $n \in \{3, 5, 10, 20, 50, 100\}$ with $B = 10000$, each overlaid with the asymptotic (CLT) pdf.]


Background Information Need for Resampling

The Need for a Nonparametric Resampling Method

For most statistics other than the sample mean, there is no theoretical argument to derive the sampling distribution.

To make inferences, we need to somehow obtain (or approximate) the sampling distribution of any generic statistic $\hat{\theta}$.
- Parametric statistics overcome this issue by assuming some particular distribution for the data.
- The nonparametric bootstrap overcomes this problem by resampling the observed data to approximate the sampling distribution of $\hat{\theta}$.


Bootstrap Basics


Bootstrap Basics Overview

Problem of Interest

In statistics, we typically want to know the properties of our estimates, e.g., precision, accuracy, etc.

In a parametric situation, we can often derive the distribution of our estimate given our assumptions about the data (or MLE principles).

In a nonparametric situation, we can use the bootstrap to examine the properties of our estimates in a variety of different situations.


Bootstrap Basics Overview

Bootstrap Procedure

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to make inferences about some statistic $\hat{\theta} = s(\mathbf{x})$.

We can use the Monte Carlo bootstrap:
1. Sample $x_i^*$ with replacement from $\{x_1, \ldots, x_n\}$ for $i \in \{1, \ldots, n\}$
2. Calculate $\hat{\theta}_b^* = s(\mathbf{x}^*)$ for the $b$-th sample, where $\mathbf{x}^* = (x_1^*, \ldots, x_n^*)'$
3. Repeat 1–2 a total of $B$ times to get the bootstrap distribution of $\hat{\theta}$
4. Compare $\hat{\theta} = s(\mathbf{x})$ to the bootstrap distribution

The estimated standard error of $\hat{\theta}$ is the standard deviation of $\{\hat{\theta}_b^*\}_{b=1}^B$:

$$\hat{\sigma}_B = \sqrt{\frac{1}{B-1}\sum_{b=1}^B \left(\hat{\theta}_b^* - \bar{\theta}^*\right)^2}$$

where $\bar{\theta}^* = \frac{1}{B}\sum_{b=1}^B \hat{\theta}_b^*$ is the mean of the bootstrap distribution of $\hat{\theta}$.
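To make the procedure concrete, here is a minimal sketch of steps 1–4 in base R, using the sample median as the statistic (the bootsamp and bootse functions developed later handle the general case):

# Monte Carlo bootstrap SE of the sample median (minimal sketch)
set.seed(1)
x = rnorm(100)                                   # one observed sample
B = 2000                                         # number of bootstrap samples
thetastar = replicate(B, median(sample(x, replace=TRUE)))
sd(thetastar)                                    # bootstrap standard error estimate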


Bootstrap Basics Empirical Distribution

Empirical Cumulative Distribution Functions

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to estimate the CDF $F$.

The empirical cumulative distribution function (ECDF) $\hat{F}_n$ is defined as

$$\hat{F}_n(x) = \hat{P}(X \leq x) = \frac{1}{n}\sum_{i=1}^n I_{\{x_i \leq x\}}$$

where $I_{\{\cdot\}}$ denotes an indicator function.

The ECDF assigns probability $1/n$ to each value $x_i$, which implies that

$$\hat{P}_n(A) = \frac{1}{n}\sum_{i=1}^n I_{\{x_i \in A\}}$$

for any set $A$ in the sample space of $X$.
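In R, the ecdf function computes $\hat{F}_n$ directly; a quick sketch:

# Fn(x) is just the proportion of sample values <= x
set.seed(1)
x = rnorm(10)
Fn = ecdf(x)   # step function assigning probability 1/n to each x_i
Fn(0)          # identical to mean(x <= 0)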


Bootstrap Basics Empirical Distribution

Some Properties of ECDFs

For any fixed value $x$, we have that
- $E[\hat{F}_n(x)] = F(x)$
- $V[\hat{F}_n(x)] = \frac{1}{n} F(x)[1 - F(x)]$

As $n \to \infty$, we have that

$$\sup_{x \in \mathbb{R}} |\hat{F}_n(x) - F(x)| \overset{as}{\to} 0$$

which is the Glivenko-Cantelli theorem.
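A quick simulation sketch of the Glivenko-Cantelli result, using the fact that for a step function the supremum is attained at a data point (here $F$ is the standard normal CDF):

# sup |Fn - F| shrinks as n grows
set.seed(1)
for(n in c(100, 1000, 10000)){
  x = sort(rnorm(n))
  Fn = (1:n)/n
  # check |Fn - F| just after and just before each jump
  print(max(pmax(abs(Fn - pnorm(x)), abs(Fn - 1/n - pnorm(x)))))
}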


Bootstrap Basics Empirical Distribution

ECDF Visualization for Normal Distribution

[Figure: ECDFs of simulated $N(0,1)$ samples for $n = 100$, $n = 500$, and $n = 1000$, each with the true normal CDF overlaid; $\hat{F}_n$ hugs $F$ more closely as $n$ increases.]

set.seed(1)
par(mfrow=c(1,3))
n = c(100, 500, 1000)
xseq = seq(-4, 4, length=100)
for(j in 1:3){
  x = rnorm(n[j])
  plot(ecdf(x), main=paste("n =", n[j]))
  lines(xseq, pnorm(xseq), col="blue")
}


Bootstrap Basics Empirical Distribution

ECDF Example

Table 3.1 from An Introduction to the Bootstrap (Efron & Tibshirani, 1993):

School | LSAT (y) | GPA (z)      School | LSAT (y) | GPA (z)
1      | 576      | 3.39         9      | 651      | 3.36
2      | 635      | 3.30         10     | 605      | 3.13
3      | 558      | 2.81         11     | 653      | 3.12
4      | 578      | 3.03         12     | 575      | 2.74
5      | 666      | 3.44         13     | 545      | 2.76
6      | 580      | 3.07         14     | 572      | 2.88
7      | 555      | 3.00         15     | 594      | 2.96
8      | 661      | 3.43

Defining $A = \{(y,z) : 0 < y < 600,\ 0 < z < 3.00\}$, we have

$$\hat{P}_{15}(A) = \frac{1}{15}\sum_{i=1}^{15} I_{\{(y_i, z_i) \in A\}} = 5/15$$
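This calculation is easy to verify in R, since the plug-in probability is just a sample proportion:

# Table 3.1 data: P15(A) with A = {(y,z): 0 < y < 600, 0 < z < 3.00}
lsat = c(576, 635, 558, 578, 666, 580, 555, 661, 651, 605, 653, 575, 545, 572, 594)
gpa  = c(3.39, 3.30, 2.81, 3.03, 3.44, 3.07, 3.00, 3.43, 3.36, 3.13, 3.12, 2.74, 2.76, 2.88, 2.96)
mean(lsat < 600 & gpa < 3.00)   # 5/15 = 0.333...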


Bootstrap Basics Plug-In Principle

Plug-In Parameter Estimates

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to estimate some parameter $\theta = t(F)$ that depends on the CDF $F$.
- Example: we want to estimate the expected value $E(X) = \int x f(x)\, dx$

The plug-in estimate of $\theta = t(F)$ is given by

$$\hat{\theta} = t(\hat{F})$$

which is the statistic calculated using the ECDF in place of the CDF.


Bootstrap Basics Plug-In Principle

Plug-In Estimate of Mean

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to estimate the expected value $\theta = E(X) = \int x f(x)\, dx$.

The plug-in estimate of the expected value is the sample mean

$$\hat{\theta} = E_{\hat{F}}(x) = \sum_{i=1}^n x_i \hat{f}_i = \frac{1}{n}\sum_{i=1}^n x_i = \bar{x}$$

where $\hat{f}_i = \frac{1}{n}$ is the sample probability from the ECDF.


Bootstrap Basics Plug-In Principle

Standard Error of Mean

Let $\mu_F = E_F(x)$ and $\sigma_F^2 = V_F(x) = E_F[(x - \mu_F)^2]$ denote the mean and variance of $X$, and denote this using the notation $X \sim (\mu_F, \sigma_F^2)$.

If $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, then the sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i$ has mean and variance $\bar{x} \sim (\mu_F, \sigma_F^2/n)$.

The standard error of the mean $\bar{x}$ is the square root of the variance of $\bar{x}$:

$$SE_F(\bar{x}) = \sigma_F/\sqrt{n}$$


Bootstrap Basics Plug-In Principle

Plug-In Estimate of Standard Error of Mean

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to estimate the standard error of the mean $SE_F(\bar{x}) = \sigma_F/\sqrt{n} = \sqrt{E_F[(x - \mu_F)^2]/n}$.

The plug-in estimate of the standard deviation is given by

$$\hat{\sigma} = \sigma_{\hat{F}} = \left\{\frac{1}{n}\sum_{i=1}^n (x_i - \bar{x})^2\right\}^{1/2}$$

so the plug-in estimate of the standard error of the mean is

$$\hat{\sigma}/\sqrt{n} = \sigma_{\hat{F}}/\sqrt{n} = \left\{\frac{1}{n^2}\sum_{i=1}^n (x_i - \bar{x})^2\right\}^{1/2}$$
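Note that the plug-in estimate divides by $n$ rather than the usual $n - 1$; a quick sketch of the (small) difference:

# plug-in SE of the mean vs. the usual estimate
set.seed(1)
x = rnorm(50)
n = length(x)
sqrt(sum((x - mean(x))^2) / n^2)   # plug-in estimate (divides by n)
sd(x)/sqrt(n)                      # usual estimate (divides by n-1); slightly larger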


Bootstrap in Practice


Bootstrap in Practice Bootstrap in R

Bootstrap Standard Error (revisited)

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to make inferences about some statistic $\hat{\theta} = s(\mathbf{x})$.

The estimated standard error of $\hat{\theta}$ is the standard deviation of $\{\hat{\theta}_b^*\}_{b=1}^B$:

$$\hat{\sigma}_B = \sqrt{\frac{1}{B-1}\sum_{b=1}^B \left(\hat{\theta}_b^* - \bar{\theta}^*\right)^2}$$

where $\bar{\theta}^* = \frac{1}{B}\sum_{b=1}^B \hat{\theta}_b^*$ is the mean of the bootstrap distribution of $\hat{\theta}$.

As the number of bootstrap samples goes to infinity, we have that

$$\lim_{B \to \infty} \hat{\sigma}_B = SE_{\hat{F}}(\hat{\theta})$$

where $SE_{\hat{F}}(\hat{\theta})$ is the plug-in estimate of $SE_F(\hat{\theta})$.


Bootstrap in Practice Bootstrap in R

Illustration of Bootstrap Standard Error

Figure 6.1: An Introduction to the Bootstrap (Efron & Tibshirani, 1993).

[Figure 6.1 (scanned): the bootstrap algorithm for estimating the standard error of a statistic $\hat{\theta} = s(\mathbf{x})$. Each bootstrap sample is an independent random sample of size $n$ from $\hat{F}$, yielding replications $s(\mathbf{x}^{*1}), \ldots, s(\mathbf{x}^{*B})$. The number of bootstrap replications $B$ for estimating a standard error is usually between 25 and 200; as $B \to \infty$, $\hat{se}_B$ approaches the plug-in estimate of $se_F(\hat{\theta})$.]


Bootstrap in Practice Bootstrap in R

Illustration of Bootstrap Procedure

Figure 8.1: An Introduction to the Bootstrap (Efron & Tibshirani, 1993).

[Figure 8.1 (scanned): a schematic diagram of the bootstrap for one-sample problems. In the real world, the unknown probability distribution $F$ gives the data $\mathbf{x} = (x_1, \ldots, x_n)$ by random sampling, from which we calculate the statistic of interest $\hat{\theta} = s(\mathbf{x})$. In the bootstrap world, $\hat{F}$ generates $\mathbf{x}^*$ by random sampling, giving $\hat{\theta}^* = s(\mathbf{x}^*)$. There is only one observed value of $\hat{\theta}$, but we can generate as many bootstrap replications $\hat{\theta}^*$ as affordable. The crucial step in the bootstrap process is constructing from $\mathbf{x}$ an estimate $\hat{F}$ of the unknown population $F$.]


Bootstrap in Practice Bootstrap in R

An R Function for Bootstrap Resampling

We can design our own bootstrap sampling function:

bootsamp <- function(x, nsamp=10000){
  x = as.matrix(x)
  nx = nrow(x)
  # resample the rows of x with replacement, nsamp times
  bsamp = replicate(nsamp, x[sample.int(nx, replace=TRUE), ])
  bsamp
}

If x is a vector of length $n$, then bootsamp returns an $n \times B$ matrix, where $B$ is the number of bootstrap samples (controlled via nsamp).

If x is a matrix of order $n \times p$, then bootsamp returns an $n \times p \times B$ array, where $B$ is the number of bootstrap samples.


Bootstrap in Practice Bootstrap in R

An R Function for Bootstrap Standard Error

We can design our own bootstrap standard error function:

bootse <- function(bsamp, myfun, ...){
  if(is.matrix(bsamp)){
    theta = apply(bsamp, 2, myfun, ...)
  } else {
    theta = apply(bsamp, 3, myfun, ...)
  }
  if(is.matrix(theta)){
    return(list(theta=theta, cov=cov(t(theta))))
  } else {
    return(list(theta=theta, se=sd(theta)))
  }
}

Returns a list where theta contains the bootstrap statistics $\{\hat{\theta}_b^*\}_{b=1}^B$, and se contains the bootstrap standard error estimate (or cov contains the bootstrap covariance matrix).


Bootstrap in Practice Bootstrap in R

Example 1: Sample Mean

> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, mean)
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694
> hist(bse$theta)

[Figure: histogram of the bootstrap means bse$theta.]


Bootstrap in Practice Bootstrap in R

Example 2: Sample Median

> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, median)
> median(x)
[1] 0.9632217
> bse$se
[1] 0.04299574
> hist(bse$theta)

[Figure: histogram of the bootstrap medians bse$theta.]


Bootstrap in Practice Bootstrap in R

Example 3: Sample Variance

> set.seed(1)
> x = rnorm(500, sd=2)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, var)
> var(x)
[1] 4.095996
> bse$se
[1] 0.2690615
> hist(bse$theta)

[Figure: histogram of the bootstrap variances bse$theta.]


Bootstrap in Practice Bootstrap in R

Example 4: Mean Difference

> set.seed(1)
> x = rnorm(500, mean=3)
> y = rnorm(500)
> z = cbind(x, y)
> bsamp = bootsamp(z)
> myfun = function(z) mean(z[,1]) - mean(z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] 3.068584
> sqrt( (var(z[,1]) + var(z[,2]))/nrow(z) )
[1] 0.06545061
> bse$se
[1] 0.06765369
> hist(bse$theta)

[Figure: histogram of the bootstrap mean differences bse$theta.]


Bootstrap in Practice Bootstrap in R

Example 5: Median Difference

> set.seed(1)
> x = rnorm(500, mean=3)
> y = rnorm(500)
> z = cbind(x, y)
> bsamp = bootsamp(z)
> myfun = function(z) median(z[,1]) - median(z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] 2.984479
> bse$se
[1] 0.07699423
> hist(bse$theta)

[Figure: histogram of the bootstrap median differences bse$theta.]


Bootstrap in Practice Bootstrap in R

Example 6: Correlation Coefficient

> set.seed(1)
> x = rnorm(500)
> y = rnorm(500)
> Amat = matrix(c(1, -0.25, -0.25, 1), 2, 2)
> Aeig = eigen(Amat, symmetric=TRUE)
> evec = Aeig$vec
> evalsqrt = diag(Aeig$val^0.5)
> Asqrt = evec %*% evalsqrt %*% t(evec)
> z = cbind(x, y) %*% Asqrt
> bsamp = bootsamp(z)
> myfun = function(z) cor(z[,1], z[,2])
> bse = bootse(bsamp, myfun)
> myfun(z)
[1] -0.2884766
> (1 - myfun(z)^2)/sqrt(nrow(z) - 3)
[1] 0.04112326
> bse$se
[1] 0.03959024
> hist(bse$theta)

[Figure: histogram of the bootstrap correlations bse$theta.]


Bootstrap in Practice Bootstrap in R

Example 7: Uniform[0, θ], a Bootstrap Failure

> set.seed(1)
> x = runif(500)
> bsamp = bootsamp(x)
> myfun = function(x) max(x)
> bse = bootse(bsamp, myfun)
> myfun(x)
[1] 0.9960774
> bse$se
[1] 0.001472801
> hist(bse$theta)

$\hat{F}$ is not a good estimate of $F$ in the extreme tails.

[Figure: histogram of the bootstrap maxima bse$theta, piled up just below the sample maximum.]
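The failure is easy to see directly: the bootstrap distribution of $\max(\mathbf{x}^*)$ puts probability $1 - (1 - 1/n)^n \approx 1 - e^{-1} \approx 0.632$ on the observed maximum itself, so it cannot mimic the sampling variability of the true maximum. A quick sketch:

# the bootstrap max equals the sample max about 63% of the time
set.seed(1)
x = runif(500)
bmax = replicate(10000, max(sample(x, replace=TRUE)))
mean(bmax == max(x))   # roughly 0.63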


Bootstrap in Practice Bias and Mean-Squared Error

Measuring the Quality of an Estimator

We have focused on the standard error to measure the precision of $\hat{\theta}$. A small standard error is good, but other qualities are important too!

Consider the following toy example:
- Suppose $\{x_i\}_{i=1}^n \overset{iid}{\sim} (\mu, \sigma^2)$ and we want to estimate the mean of $X$
- Define $\hat{\mu} = 10 + \bar{x}$ to be our estimate of $\mu$
- The standard error of $\hat{\mu}$ is $\sigma/\sqrt{n}$, and $\lim_{n \to \infty} \sigma/\sqrt{n} = 0$
- But $\hat{\mu} = 10 + \bar{x}$ is clearly not an ideal estimate of $\mu$


Bootstrap in Practice Bias and Mean-Squared Error

Bias of an Estimator

Suppose $\mathbf{x} = (x_1, \ldots, x_n)'$ with $x_i \overset{iid}{\sim} F(x)$ for $i \in \{1, \ldots, n\}$, and we want to make inferences about some statistic $\hat{\theta} = s(\mathbf{x})$.

The bias of an estimate $\hat{\theta} = s(\mathbf{x})$ of $\theta = t(F)$ is defined as

$$\text{Bias}_F = E_F[s(\mathbf{x})] - t(F)$$

where the expectation is taken with respect to $F$.
- Bias is the difference between the expectation of the estimate and the parameter.
- In the example on the previous slide, we have that $\text{Bias}_F = E_F(\hat{\mu}) - \mu = E_F(10 + \bar{x}) - \mu = 10 + E_F(\bar{x}) - \mu = 10$


Bootstrap in Practice Bias and Mean-Squared Error

Bootstrap Estimate of Bias

The bootstrap estimate of bias substitutes $\hat{F}$ for $F$ in the bias definition:

$$\text{Bias}_{\hat{F}} = E_{\hat{F}}[s(\mathbf{x}^*)] - t(\hat{F})$$

where the expectation is taken with respect to the ECDF $\hat{F}$.
- Note that $t(\hat{F})$ is the plug-in estimate of $\theta$
- $t(\hat{F})$ is not necessarily equal to $\hat{\theta} = s(\mathbf{x})$

Given $B$ bootstrap samples, we can estimate the bias using

$$\widehat{\text{Bias}}_B = \bar{\theta}^* - t(\hat{F})$$

where $\bar{\theta}^* = \frac{1}{B}\sum_{b=1}^B \hat{\theta}_b^*$ is the mean of the bootstrap distribution of $\hat{\theta}$.


Bootstrap in Practice Bias and Mean-Squared Error

An R Function for Bootstrap Bias

We can design our own bootstrap bias estimation function:

bootbias <- function(bse, theta, ...){
  if(is.matrix(bse$theta)){
    return(apply(bse$theta, 1, mean) - theta)
  } else {
    return(mean(bse$theta) - theta)
  }
}

The first input bse is the object output from bootse, and the second input theta is the plug-in estimate of $\theta$ used for the bias calculation.


Bootstrap in Practice Bias and Mean-Squared Error

Sample Mean is Unbiased

> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, mean)
> mybias = bootbias(bse, mean(x))
> mybias
[1] 0.0003689287
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694


Bootstrap in Practice Bias and Mean-Squared Error

Toy Example

> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, function(x) mean(x)+10)
> mybias = bootbias(bse, mean(x))
> mybias
[1] 10.00037
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> bse$se
[1] 0.04530694


Bootstrap in Practice Bias and Mean-Squared Error

Mean Squared Error (MSE)

The mean-squared error (MSE) of an estimate $\hat{\theta} = s(\mathbf{x})$ of $\theta = t(F)$ is

$$\text{MSE}_F = E_F\{[s(\mathbf{x}) - t(F)]^2\} = V_F(\hat{\theta}) + [\text{Bias}_F(\hat{\theta})]^2$$

where the expectation is taken with respect to $F$.
- MSE is the expected squared difference between $\hat{\theta}$ and $\theta$.
- In the toy example on the previous slide, we have that

$$\text{MSE}_F = E_F\{(\hat{\mu} - \mu)^2\} = E_F\{(10 + \bar{x} - \mu)^2\} = E_F(10^2) + 2 E_F[10(\bar{x} - \mu)] + E_F[(\bar{x} - \mu)^2] = 100 + \sigma^2/n$$


Bootstrap in Practice Bias and Mean-Squared Error

Toy Example (revisited)

> set.seed(1)
> x = rnorm(500, mean=1)
> bsamp = bootsamp(x)
> bse = bootse(bsamp, function(x) mean(x)+10)
> mybias = bootbias(bse, mean(x))
> c(bse$se, mybias)
[1]  0.04530694 10.00036893
> c(bse$se, mybias)^2
[1] 2.052718e-03 1.000074e+02
> mse = (bse$se^2) + (mybias^2)
> mse
[1] 100.0094
> 100 + 1/length(x)
[1] 100.002


Bootstrap in Practice Bias and Mean-Squared Error

Balance between Accuracy and Precision

MSE quantifies both the accuracy (bias) and the precision (variance) of an estimator.

Ideal estimators are accurate (small bias) and precise (small variance).

Having some bias can be an OK (or even good) thing, despite the negative connotations of the word "biased".

For example:
- Q: Would you rather have an estimator that is biased by 1 unit with a standard error of 1 unit? Or one that is unbiased but has a standard error of 1.5 units?
- A: The first estimator is better with respect to MSE: $1^2 + 1^2 = 2$ versus $0 + 1.5^2 = 2.25$.


Bootstrap in Practice Bias and Mean-Squared Error

Accuracy and Precision Visualization

[Figure: four scatterplots of estimates around the truth, one per panel: low accuracy and low precision; low accuracy and high precision; high accuracy and low precision; high accuracy and high precision. Legend: Truth, Estimates.]


Bootstrap in Practice The Jackknife

Jackknife Sample

Before the bootstrap, the jackknife was used to estimate bias and SE.

The $i$-th jackknife sample of $\mathbf{x} = (x_1, \ldots, x_n)$ is defined as

$$\mathbf{x}_{(i)} = (x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_n)$$

for $i \in \{1, \ldots, n\}$. Note that...
- $\mathbf{x}_{(i)}$ is the original data vector without the $i$-th observation
- $\mathbf{x}_{(i)}$ is a vector of length $n - 1$


Bootstrap in Practice The Jackknife

Jackknife Replication

The $i$-th jackknife replication $\hat{\theta}_{(i)}$ of the statistic $\hat{\theta} = s(\mathbf{x})$ is

$$\hat{\theta}_{(i)} = s(\mathbf{x}_{(i)})$$

which is the statistic calculated using the $i$-th jackknife sample.

For plug-in statistics $\hat{\theta} = t(\hat{F})$, we have that

$$\hat{\theta}_{(i)} = t(\hat{F}_{(i)})$$

where $\hat{F}_{(i)}$ is the empirical distribution of $\mathbf{x}_{(i)}$.


Bootstrap in Practice The Jackknife

Jackknife Estimate of Standard Error

The jackknife estimate of standard error is defined as

$$\hat{\sigma}_{jack} = \sqrt{\frac{n-1}{n}\sum_{i=1}^n \left(\hat{\theta}_{(i)} - \bar{\theta}_{(\cdot)}\right)^2}$$

where $\bar{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \hat{\theta}_{(i)}$ is the mean of the jackknife estimates of $\hat{\theta}$.

Note that the $\frac{n-1}{n}$ factor is derived by considering the special case $\hat{\theta} = \bar{x}$, where

$$\hat{\sigma}_{jack} = \sqrt{\frac{1}{(n-1)n}\sum_{i=1}^n (x_i - \bar{x})^2}$$

which is an unbiased estimator of the standard error of $\bar{x}$.
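A quick numerical sketch verifying this special case (leave-one-out means computed by brute force):

# jackknife SE of the sample mean equals sd(x)/sqrt(n) exactly
set.seed(1)
x = rnorm(50)
n = length(x)
theta = sapply(1:n, function(i) mean(x[-i]))
sqrt( ((n-1)/n) * sum((theta - mean(theta))^2) )
sd(x)/sqrt(n)   # identical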


Bootstrap in Practice The Jackknife

Jackknife Estimate of Bias

The jackknife estimate of bias is defined as

$$\widehat{\text{Bias}}_{jack} = (n-1)\left(\bar{\theta}_{(\cdot)} - \hat{\theta}\right)$$

where $\bar{\theta}_{(\cdot)} = \frac{1}{n}\sum_{i=1}^n \hat{\theta}_{(i)}$ is the mean of the jackknife estimates of $\hat{\theta}$.

This approach only works for plug-in statistics $\hat{\theta} = t(\hat{F})$, as shown in the sketch below:
- Only works if $t(\hat{F})$ is smooth (e.g., mean or ratio)
- Doesn't work if $t(\hat{F})$ is unsmooth (e.g., median)
- Gives a bias estimate using only $n$ recomputations (typically $n \ll B$)
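As a sketch, consider the plug-in variance, a smooth plug-in statistic whose true bias is $-\sigma^2/n$; the jackknife bias estimate recovers this (the jackknife-corrected plug-in variance is exactly the usual unbiased sample variance):

# jackknife bias estimate for the plug-in variance
set.seed(1)
x = rnorm(50)
n = length(x)
varplug = function(x) mean((x - mean(x))^2)   # plug-in variance: t(Fhat)
theta = sapply(1:n, function(i) varplug(x[-i]))
(n - 1) * (mean(theta) - varplug(x))          # equals -var(x)/n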


Bootstrap in Practice The Jackknife

Smooth versus Unsmooth t(F )

Suppose we have a sample of data $(x_1, \ldots, x_n)$, and consider the mean and median as a function of $x_1$:

[Figure: the sample mean (left panel, "mean") and sample median (right panel, "median") plotted as a function of $x_1$; the mean changes smoothly with $x_1$, whereas the median changes in flat steps.]


Bootstrap in Practice The Jackknife

Smooth versus Unsmooth t(F ) R Code

# mean is smooth
meanfun <- function(x, z) mean(c(x, z))
set.seed(1)
z = rnorm(100)
x = seq(-4, 4, length=200)
meanval = rep(0, 200)
for(j in 1:200) meanval[j] = meanfun(x[j], z)
quartz(width=6, height=6)
plot(x, meanval, xlab=expression(x[1]), main="mean")

# median is unsmooth
medfun <- function(x, z) median(c(x, z))
set.seed(1)
z = rnorm(100)
x = seq(-4, 4, length=200)
medval = rep(0, 200)
for(j in 1:200) medval[j] = medfun(x[j], z)
quartz(width=6, height=6)
plot(x, medval, xlab=expression(x[1]), main="median")


Bootstrap in Practice The Jackknife

Some R Functions for Jackknife Resampling

We can design our own jackknife functions:

jacksamp <- function(x){
  nx = length(x)
  jsamp = matrix(0, nx-1, nx)
  for(j in 1:nx) jsamp[,j] = x[-j]
  jsamp
}

jackse <- function(jsamp, myfun, ...){
  nx = ncol(jsamp)
  theta = apply(jsamp, 2, myfun, ...)
  se = sqrt( ((nx-1)/nx) * sum( (theta - mean(theta))^2 ) )
  list(theta=theta, se=se)
}

These functions work similarly to the bootsamp and bootse functions if x is a vector and the statistic produced by myfun is unidimensional.


Bootstrap in Practice The Jackknife

Example 1: Sample Mean (revisited)

> set.seed(1)
> x = rnorm(500, mean=1)
> jsamp = jacksamp(x)
> jse = jackse(jsamp, mean)
> mean(x)
[1] 1.022644
> sd(x)/sqrt(500)
[1] 0.04525481
> jse$se
[1] 0.04525481
> hist(jse$theta)

[Figure: histogram of the jackknife replications jse$theta.]


Bootstrap in Practice The Jackknife

Example 2: Sample Median (revisited)

> set.seed(1)
> x = rnorm(500, mean=1)
> jsamp = jacksamp(x)
> jse = jackse(jsamp, median)
> median(x)
[1] 0.9632217
> jse$se
[1] 0.01911879
> hist(jse$theta)

Note that the jackknife SE (0.019) is far from the bootstrap estimate (0.043) obtained for the same data in Example 2, illustrating that the jackknife fails for the unsmooth median.

[Figure: histogram of the jackknife replications jse$theta.]


Bootstrapping Regression


Bootstrapping Regression Regression Review

Simple Linear Regression Model: Scalar Form

The simple linear regression model has the form

$$y_i = b_0 + b_1 x_i + e_i$$

for $i \in \{1, \ldots, n\}$, where
- $y_i \in \mathbb{R}$ is the real-valued response for the $i$-th observation
- $b_0 \in \mathbb{R}$ is the regression intercept
- $b_1 \in \mathbb{R}$ is the regression slope
- $x_i \in \mathbb{R}$ is the predictor for the $i$-th observation
- $e_i \overset{iid}{\sim} (0, \sigma^2)$ is zero-mean measurement error

This implies that $(y_i | x_i) \overset{ind}{\sim} (b_0 + b_1 x_i, \sigma^2)$.


Bootstrapping Regression Regression Review

Simple Linear Regression Model: Matrix Form

The simple linear regression model has the form

$$\mathbf{y} = \mathbf{X}\mathbf{b} + \mathbf{e}$$

where
- $\mathbf{y} = (y_1, \ldots, y_n)' \in \mathbb{R}^n$ is the $n \times 1$ response vector
- $\mathbf{X} = [\mathbf{1}_n, \mathbf{x}] \in \mathbb{R}^{n \times 2}$ is the $n \times 2$ design matrix, where $\mathbf{1}_n$ is an $n \times 1$ vector of ones and $\mathbf{x} = (x_1, \ldots, x_n)' \in \mathbb{R}^n$ is the $n \times 1$ predictor vector
- $\mathbf{b} = (b_0, b_1)' \in \mathbb{R}^2$ is the $2 \times 1$ vector of regression coefficients
- $\mathbf{e} = (e_1, \ldots, e_n)' \sim (\mathbf{0}_n, \sigma^2 \mathbf{I}_n)$ is the $n \times 1$ error vector

This implies that $(\mathbf{y}|\mathbf{x}) \sim (\mathbf{X}\mathbf{b}, \sigma^2 \mathbf{I}_n)$.


Bootstrapping Regression Regression Review

Ordinary Least Squares: Scalar Form

The ordinary least squares (OLS) problem is

$$\min_{b_0, b_1 \in \mathbb{R}} \sum_{i=1}^n (y_i - b_0 - b_1 x_i)^2$$

and the OLS solution has the form

$$\hat{b}_0 = \bar{y} - \hat{b}_1 \bar{x}, \qquad \hat{b}_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2}$$

where $\bar{x} = (1/n)\sum_{i=1}^n x_i$ and $\bar{y} = (1/n)\sum_{i=1}^n y_i$.
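A quick sketch checking these formulas against R's lm() on simulated data:

# OLS coefficients by hand vs. lm()
set.seed(1)
x = rnorm(20)
y = 1 + 2*x + rnorm(20)
b1 = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 = mean(y) - b1 * mean(x)
c(b0, b1)
coef(lm(y ~ x))   # same values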


Bootstrapping Regression Regression Review

Ordinary Least Squares: Matrix Form

The ordinary least squares (OLS) problem is

$$\min_{\mathbf{b} \in \mathbb{R}^2} \|\mathbf{y} - \mathbf{X}\mathbf{b}\|^2$$

where $\|\cdot\|$ denotes the Frobenius norm; the OLS solution has the form

$$\hat{\mathbf{b}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

where

$$(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n\sum_{i=1}^n (x_i - \bar{x})^2}\begin{pmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{pmatrix}, \qquad \mathbf{X}'\mathbf{y} = \begin{pmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n x_i y_i \end{pmatrix}$$


Bootstrapping Regression Regression Review

OLS Coefficients are Random Variables

Note that $\hat{\mathbf{b}}$ is a linear function of $\mathbf{y}$, so we can derive the following.

The expectation of $\hat{\mathbf{b}}$ is given by

$$E(\hat{\mathbf{b}}) = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}] = E[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\mathbf{b} + \mathbf{e})] = E[\mathbf{b}] + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{e}] = \mathbf{b}$$

and the covariance matrix is given by

$$V(\hat{\mathbf{b}}) = V[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\, V[\mathbf{y}]\, \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma^2 \mathbf{I}_n)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$
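A numerical sketch of the covariance result; note that R's vcov() plugs in the unbiased estimate $\hat{\sigma}^2 = \text{RSS}/(n - 2)$ for $\sigma^2$:

# covariance of the OLS coefficients: sigma^2 (X'X)^{-1}
set.seed(1)
n = 30
x = rnorm(n)
y = 1 + 2*x + rnorm(n)
fit = lm(y ~ x)
X = cbind(1, x)
s2 = sum(residuals(fit)^2) / (n - 2)
s2 * solve(crossprod(X))   # matches vcov(fit)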


Bootstrapping Regression Regression Review

Fitted Values are Random Variables

Similarly, $\hat{\mathbf{y}} = \mathbf{X}\hat{\mathbf{b}}$ is a linear function of $\mathbf{y}$, so we can derive...

The expectation of $\hat{\mathbf{y}}$ is given by

$$E(\hat{\mathbf{y}}) = E[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}] = E[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\mathbf{b} + \mathbf{e})] = E[\mathbf{X}\mathbf{b}] + \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E[\mathbf{e}] = \mathbf{X}\mathbf{b}$$

and the covariance matrix is given by

$$V(\hat{\mathbf{y}}) = V[\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}] = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\, V[\mathbf{y}]\, \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma^2\mathbf{I}_n)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}' = \sigma^2 \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'$$


Bootstrapping Regression Regression Review

Need for the Bootstrap

If the residuals are Gaussian, i.e., $e_i \overset{iid}{\sim} N(0, \sigma^2)$, then we have that
- $\hat{\mathbf{b}} \sim N(\mathbf{b}, \sigma^2(\mathbf{X}'\mathbf{X})^{-1})$
- $\hat{\mathbf{y}} \sim N(\mathbf{X}\mathbf{b}, \sigma^2\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}')$

so it is possible to make probabilistic statements about $\hat{\mathbf{b}}$ and $\hat{\mathbf{y}}$.

If $e_i \overset{iid}{\sim} F$ for some arbitrary distribution $F$ with $E_F(e_i) = 0$, we can use the bootstrap to make inferences about $\hat{\mathbf{b}}$ and $\hat{\mathbf{y}}$:
- Use the bootstrap with the ECDF $\hat{F}$ as the distribution of $e_i$


Bootstrapping Regression Bootstrapping Residuals

Bootstrapping Regression Residuals

We can use the following bootstrap procedure:
1. Fit the regression model to obtain $\hat{\mathbf{y}}$ and $\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}}$
2. Sample $e_i^*$ with replacement from $\{\hat{e}_1, \ldots, \hat{e}_n\}$ for $i \in \{1, \ldots, n\}$
3. Define $y_i^* = \hat{y}_i + e_i^*$ and $\mathbf{b}^* = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}^*$
4. Repeat 2–3 a total of $B$ times to get the bootstrap distribution of $\hat{\mathbf{b}}$

We don't need Monte Carlo simulation to get bootstrap standard errors:

$$V(\mathbf{b}^*) = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\, V(\mathbf{y}^*)\, \mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \hat{\sigma}_{\hat{F}}^2 (\mathbf{X}'\mathbf{X})^{-1}$$

given that $V(\mathbf{y}^*) = \hat{\sigma}_{\hat{F}}^2 \mathbf{I}_n$, where $\hat{\sigma}_{\hat{F}}^2 = \sum_{i=1}^n \hat{e}_i^2 / n$.


Bootstrapping Regression Bootstrapping Residuals

Bootstrapping Regression Residuals in R

> set.seed(1)
> n = 500
> x = rexp(n)
> e = runif(n, min=-2, max=2)
> y = 3 + 2*x + e
> linmod = lm(y ~ x)
> linmod$coef
(Intercept)           x
   2.898325    2.004220
> yhat = linmod$fitted.values
> bsamp = bootsamp(linmod$residuals)
> bsamp = matrix(yhat, n, ncol(bsamp)) + bsamp
> myfun = function(y, x) lm(y ~ x)$coef
> bse = bootse(bsamp, myfun, x=x)
> bse$cov
            (Intercept)            x
(Intercept)  0.006105788 -0.003438571
x           -0.003438571  0.003605852
> sigsq = mean(linmod$residuals^2)
> solve(crossprod(cbind(1,x))) * sigsq
                            x
   0.006112136 -0.003412774
x -0.003412774  0.003573180
> par(mfcol=c(2,1))
> hist(bse$theta[1,], main=expression(hat(b)[0]))
> hist(bse$theta[2,], main=expression(hat(b)[1]))

[Figure: histograms of the bootstrap intercepts $\hat{b}_0$ and slopes $\hat{b}_1$.]


Bootstrapping Regression Bootstrapping Pairs

Bootstrapping Pairs Instead of Residuals

We could also use the following bootstrap procedure:
1. Fit the regression model to obtain $\hat{\mathbf{y}}$ and $\hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}}$
2. Sample $\mathbf{z}_i^* = (x_i^*, y_i^*)$ with replacement from $\{(x_1, y_1), \ldots, (x_n, y_n)\}$ for $i \in \{1, \ldots, n\}$
3. Define $\mathbf{x}^* = (x_1^*, \ldots, x_n^*)$, $\mathbf{X}^* = [\mathbf{1}_n, \mathbf{x}^*]$, $\mathbf{y}^* = (y_1^*, \ldots, y_n^*)$, and $\mathbf{b}^* = (\mathbf{X}^{*\prime}\mathbf{X}^*)^{-1}\mathbf{X}^{*\prime}\mathbf{y}^*$
4. Repeat 2–3 a total of $B$ times to get the bootstrap distribution of $\hat{\mathbf{b}}$

Bootstrapping pairs only assumes that the $(x_i, y_i)$ are iid from some $F$.


Bootstrapping Regression Bootstrapping Pairs

Bootstrapping Regression Pairs in R

> set.seed(1)
> n = 500
> x = rexp(n)
> e = runif(n, min=-2, max=2)
> y = 3 + 2*x + e
> linmod = lm(y ~ x)
> linmod$coef
(Intercept)           x
   2.898325    2.004220
> z = cbind(y, x)
> bsamp = bootsamp(z)
> myfun = function(z) lm(z[,1] ~ z[,2])$coef
> bse = bootse(bsamp, myfun)
> bse$cov
            (Intercept)       z[, 2]
(Intercept)  0.006376993 -0.003913989
z[, 2]      -0.003913989  0.004308720
> sigsq = mean(linmod$residuals^2)
> solve(crossprod(cbind(1,x))) * sigsq
                            x
   0.006112136 -0.003412774
x -0.003412774  0.003573180
> par(mfcol=c(2,1))
> hist(bse$theta[1,], main=expression(hat(b)[0]))
> hist(bse$theta[2,], main=expression(hat(b)[1]))

[Figure: histograms of the bootstrap intercepts $\hat{b}_0$ and slopes $\hat{b}_1$.]


Bootstrapping Regression

Bootstrapping Regression: Pairs or Residuals?

- Bootstrapping pairs requires fewer assumptions about the data: it only assumes the $(x_i, y_i)$ are iid from some $F$.
- Bootstrapping residuals assumes $(y_i | x_i) \overset{ind}{\sim} (b_0 + b_1 x_i, \sigma^2)$.

Bootstrapping pairs can be dangerous when working with categorical predictors and/or continuous predictors with skewed distributions.

Bootstrapping residuals is preferable when the regression model is reasonably specified (because $\mathbf{X}$ remains unchanged).