TRANSCRIPT
Stat 302
Module 12: Asymptotic Results
Ruben Zamar ([email protected])
Motivation

Suppose we are interested in the overall performance of UBC students in the Stat 302 Final Exam.

Some questions we may have are:

- What is the mean performance across all UBC students?
- Do women do better than men on average?
- Do CPSC students do better than other FOSC students?
- How likely is it for a student to fail this test? Does this probability change across disciplines?
Motivation (Continued)

Suppose we measure the independent performances of n = 65 students (e.g. on April 12...).

The students' performances X_i could be modeled as iid rv's with (unknown) mean µ and variance σ².

A quantity of possible interest is

P(|X̄ − µ| < 5) = ?

If this probability is large, then we are confident that X̄ estimates µ within a 5-point error margin.
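This probability cannot be computed without knowing the distribution of the X_i, but a Monte Carlo sketch shows what it looks like under an assumed model. The Normal(µ = 70, σ = 15) scores below are a hypothetical choice for illustration, not something from the slides:

```python
import random
import statistics

# Hypothetical model: exam scores ~ Normal(mu = 70, sigma = 15).
# Estimate P(|Xbar - mu| < 5) for n = 65 students by simulation.
random.seed(0)
mu, sigma, n, reps = 70.0, 15.0, 65, 10_000

hits = 0
for _ in range(reps):
    xbar = statistics.fmean(random.gauss(mu, sigma) for _ in range(n))
    if abs(xbar - mu) < 5:
        hits += 1

# sd(Xbar) = 15 / sqrt(65) ≈ 1.86, so the estimate is typically near 0.99
print(hits / reps)
```

Under this assumed model the question reduces to a normal-tail calculation, which is exactly the kind of shortcut this module develops.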
Confidence Intervals

Other examples of quantities of possible interest are:

P(L(X1, ..., Xn) < µ < U(X1, ..., Xn)) = 0.95?

where L(X1, ..., Xn) and U(X1, ..., Xn) are some functions (called estimates) that depend only on the data.

P(L(X1, ..., Xn) < µ_W − µ_M < U(X1, ..., Xn)) = 0.95?

Unfortunately, we cannot compute these probabilities exactly because we don't know the actual distribution of the X_i.

Even if we knew the cdf of the X_i, it may be inconvenient or unfeasible to calculate the exact probabilities when L(X1, ..., Xn) and U(X1, ..., Xn) are complicated (non-linear) functions.
Asymptotic Approximation

We will learn in this course how to approximate these quantities when n is "large".

These approximations are called "asymptotic calculations".

They are obtained by taking the limit as n → ∞.

The limiting results often give good approximations for moderate values of n (e.g. n = 20).

We will see that the standard normal cdf Φ(z) is a main tool for asymptotic calculations.
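As a first look at how Φ enters such calculations, this sketch compares an exact Binomial(20, 0.5) probability with its Φ-based approximation at the "moderate" n = 20 mentioned above; the continuity correction (the +0.5) is a standard refinement, not something stated on the slide:

```python
from math import comb, sqrt
from statistics import NormalDist

# Exact P(X <= k) for X ~ Binomial(n, p) versus the Phi-based approximation.
n, p, k = 20, 0.5, 12
exact = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

# Continuity-corrected normal approximation: Phi((k + 0.5 - np) / sqrt(np(1-p)))
approx = NormalDist().cdf((k + 0.5 - n * p) / sqrt(n * p * (1 - p)))

print(round(exact, 4), round(approx, 4))  # both are about 0.868
```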
The General Setting

Let Xn be a sequence of random variables,

Xn ~ Fn,  n = 1, 2, ...

There are several ways in which the sequence Xn may approach a "target" random variable X ~ F as n → ∞.

Sometimes the target random variable is in fact a constant c, i.e. F(x) = I_[c,∞)(x), and so X = c with probability 1.
Different Types of Convergence

An (incomplete) list of different types of convergence:

(1) Convergence with Probability One (Almost Sure Convergence)
(2) Convergence in Probability
(3) Convergence in Distribution
(4) Convergence in Quadratic Mean
Almost Sure Convergence

Suppose that the rv's Xn and X are all defined on the same sample space Ω.

Notation: Xn → X a.s. means "Xn converges almost surely to X as n → ∞".

Definition: Xn → X a.s. if

P({ω : lim_{n→∞} Xn(ω) = X(ω)}) = 1

This is a very strong type of convergence. a.s. convergence implies (not proved here):

- convergence in probability, and
- convergence in distribution.
The Strong Law of Large Numbers (SLLN)

Suppose that the random variables Xn are independent and all have the same mean µ.

The SLLN states that

(1/n) ∑_{i=1}^n X_i → µ a.s.

as n → ∞.

In this case the limiting rv is a constant: the common mean µ.
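A one-path simulation sketch of the SLLN; the Uniform(0, 1) draws (so µ = 0.5) are an arbitrary choice for illustration:

```python
import random

# Running mean of iid Uniform(0, 1) draws along one realization;
# by the SLLN it settles near mu = 0.5 as n grows.
random.seed(1)
total = 0.0
for i in range(1, 100_001):
    total += random.random()
    if i in (100, 10_000, 100_000):
        print(i, round(total / i, 4))
```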
Examples

Example 1: Suppose that Xn ~ Binom(n, p).

- Notice that Xn = Y1 + Y2 + ... + Yn, where the Yi are iid Bernoulli(p).
- E(Yi) = p, for all i.
- In this case E(Xn) = np, for all n.
- By the SLLN,

p_n = Xn/n = (Y1 + Y2 + ... + Yn)/n → p a.s.

p_n is called the "sample proportion" and p is called the "population proportion".
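Example 1 as a simulation sketch; p = 0.3 and n = 50,000 are arbitrary choices for illustration:

```python
import random

# Sample proportion p_n = X_n / n from n Bernoulli(p) trials;
# by the SLLN it settles near the population proportion p.
random.seed(2)
p, n = 0.3, 50_000
successes = sum(random.random() < p for _ in range(n))
p_n = successes / n
print(p_n)  # near 0.3
```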
a.s. Convergence and Continuity

Suppose that Xn → X a.s. and g(x) is a continuous function.

(It's enough for g(x) to be continuous on the range of X.)

We can (easily) show that

g(Xn) → g(X) a.s.

Proof: just notice that if

Xn(ω) → X(ω)

then

g[Xn(ω)] → g[X(ω)].
Examples (Continued)

Example 2: Suppose that the Xn are iid Exp(λ).

In this case E(Xn) = 1/λ, for all n.

By the SLLN,

X̄ = (1/n) ∑_{i=1}^n X_i → 1/λ a.s.

Now we use that a.s. convergence is preserved by continuous functions:

1/X̄ = n / ∑_{i=1}^n X_i → λ a.s.
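A simulation sketch of Example 2; λ = 2 is an arbitrary choice for illustration:

```python
import random

# Xbar from iid Exp(lambda) draws converges to 1/lambda, and by continuity
# of t -> 1/t (away from 0), 1/Xbar converges to lambda.
random.seed(3)
lam, n = 2.0, 100_000
xbar = sum(random.expovariate(lam) for _ in range(n)) / n
print(round(xbar, 3), round(1 / xbar, 3))  # near 0.5 and 2.0
```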
The Sample Variance

Example 3: Suppose that the Xn are iid with mean µ and variance σ².

By the SLLN,

X̄ = (1/n) ∑_{i=1}^n X_i → µ a.s.

Moreover, since E(X_i²) = σ² + µ², we have that

(1/n) ∑_{i=1}^n X_i² → σ² + µ² a.s.

Since a.s. convergence is preserved by continuous functions, we have

(1/n) ∑_{i=1}^n X_i² − X̄² → σ² + µ² − µ² = σ² a.s.

That is,

(1/n) ∑_{i=1}^n (X_i − X̄)² → σ² a.s.
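Example 3 as a simulation sketch; the Normal(µ = 1, σ = 3) draws (so σ² = 9) are an arbitrary choice for illustration:

```python
import random

# (1/n) * sum((X_i - Xbar)^2) converges a.s. to sigma^2 = 9 here.
random.seed(4)
mu, sigma, n = 1.0, 3.0, 100_000
xs = [random.gauss(mu, sigma) for _ in range(n)]
xbar = sum(xs) / n
var_hat = sum((x - xbar) ** 2 for x in xs) / n
print(round(var_hat, 3))  # near 9
```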
Convergence in Probability

Suppose that the rv's Xn and X are all defined on the same sample space Ω.

Notation: Xn →p X means "Xn converges in probability to X as n → ∞".

Definition: Xn →p X if, for all ε > 0,

lim_{n→∞} P(|Xn − X| > ε) = 0

This type of convergence is weaker than a.s. convergence.
The Weak Law of Large Numbers (WLLN)

Suppose that the random variables Xn are independent and all have the same mean µ and finite variance σ².

The WLLN states that

(1/n) ∑_{i=1}^n X_i →p µ

as n → ∞.

We will give a simple proof of this result using Chebyshev's Inequality (for a rv X with mean µ and variance σ²):

P(|X − µ| > ε) ≤ σ²/ε²
Proof for the WLLN

We have

E((1/n) ∑_{i=1}^n X_i) = µ

and

Var((1/n) ∑_{i=1}^n X_i) = σ²/n

By Chebyshev's Inequality,

lim_{n→∞} P(|X̄ − µ| > ε) ≤ lim_{n→∞} σ²/(nε²) = 0
Some Remarks

It can be shown (with some more work) that convergence in probability is preserved by continuous functions.

More precisely, suppose that

- Xn →p X and Yn →p Y,
- g(s, t) is a continuous function.

Then g(Xn, Yn) →p g(X, Y).

Example: suppose that Xn →p Z, where Z ~ N(0, 1). Then X_n² →p Z², and Z² ~ Gamma(1/2, 1/2), which is called Chi-square with one degree of freedom.
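The chi-square fact at the end can be checked by simulation. This sketch (an added construction, not from the slides) compares the simulated P(Z² ≤ 1) with P(|Z| ≤ 1) = erf(1/√2) ≈ 0.6827, which is the chi-square(1) cdf evaluated at 1:

```python
import random
from math import erf, sqrt

# If Z ~ N(0,1), then P(Z^2 <= 1) = P(|Z| <= 1) = erf(1/sqrt(2)).
random.seed(6)
reps = 100_000
frac = sum(random.gauss(0, 1) ** 2 <= 1 for _ in range(reps)) / reps
print(round(frac, 4), round(erf(1 / sqrt(2)), 4))  # both near 0.6827
```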
Max and Min of Independent Unif(a,b)

Suppose that X1, X2, ..., Xn are iid Unif(a, b).

The common cdf is

F(x) = (x − a)/(b − a), for a < x < b

Let

Un = min{X1, X2, ..., Xn}

and

Vn = max{X1, X2, ..., Xn}
The Min Converges in Probability to a

We will show that Un →p a.

Recall that, for all a < u < b,

F_Un(u) = 1 − [1 − (u − a)/(b − a)]^n = 1 − [(b − u)/(b − a)]^n
The Min Converges in Probability to a

In fact, let 0 < ε < b − a be given. Then,

P(|Un − a| > ε) = P(Un > a + ε) + P(Un < a − ε)    [the second term is 0]
               = [(b − a − ε)/(b − a)]^n
               = [1 − ε/(b − a)]^n → 0,

as n → ∞.

Students should show that Vn →p b.
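The exact tail formula above can be checked against simulation; a = 2, b = 5, n = 50, ε = 0.3 are arbitrary choices for illustration:

```python
import random

# Compare simulated P(|U_n - a| > eps) with the exact (1 - eps/(b-a))^n.
random.seed(7)
a, b, n, eps, reps = 2.0, 5.0, 50, 0.3, 20_000

exact = (1 - eps / (b - a)) ** n
exceed = sum(
    min(random.uniform(a, b) for _ in range(n)) > a + eps for _ in range(reps)
) / reps
print(round(exceed, 4), round(exact, 4))  # both near 0.005
```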
Convergence in Distribution. Preliminary Concepts

Unlike a.s. convergence and convergence in probability, convergence in distribution doesn't require the Xn to all be defined on the same sample space.

Technical concept: Given a cdf F(x), we define the set

CF = {x : F is continuous at x}

CF is the set of "continuity points of F".

It can be shown that CF^c (the set of discontinuity points of F) is either finite or at most countable.

Proving this is a challenge question worth 5/100 marks added to your midterm exam.
Convergence in Distribution

Let F(x) be the cdf of the "target" rv X, and let CF be its set of continuity points.

Definition: We say that Xn converges in distribution to X, and write Xn →d X, if Fn(x) → F(x) for all x ∈ CF.

Note: the convergence of Fn(x) to F(x) may fail at points outside CF.
Restriction to Continuity Points

Let Xn = 1/n with probability one and let X = 0 with probability one. Then

Fn(x) = 0 if x < 1/n,   Fn(x) = 1 if x ≥ 1/n

and

F(x) = 0 if x < 0,   F(x) = 1 if x ≥ 0

In this case CF = {x : x ≠ 0}.

Notice that Fn(0) = 0 for all n while F(0) = 1. That is, Fn(0) ↛ F(0).

However, Fn(x) → F(x) for all x ≠ 0. Therefore 1/n → 0 in distribution!
Asymptotic Distribution of the Max of Unif(a,b)

Example: Suppose that X1, X2, ..., Xn are iid Unif(a, b).

We have shown above that

Vn = max{X1, X2, ..., Xn} →p b

We will now show that

n(b − Vn) →d Exp(1/(b − a))
Max of Unif(a,b) (continued)

We will first obtain the cdf of n(b − Vn).

For any d > 0 such that d/n < b − a, we have

P[n(b − Vn) ≤ d] = P[b − d/n ≤ Vn]
                = 1 − P[Vn < b − d/n]
                = 1 − [((b − d/n) − a)/(b − a)]^n
                = 1 − [1 − d/(n(b − a))]^n
Max of Unif(a,b) (continued)

Recall that

(1 + x/n)^n → e^x, for all x, as n → ∞

Therefore

1 − [1 − d/(n(b − a))]^n → 1 − exp(−d/(b − a)), for all d > 0.

This is the cdf of an Exp(1/(b − a)).
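This limiting result can also be seen by simulation. The Python sketch below (illustrative, not from the slides) draws the scaled gap n(b − Vn) repeatedly and compares its empirical behavior with the Exp(1/(b − a)) limit, whose mean is b − a.

```python
# Simulation sketch: n(b - V_n) for V_n = max of n iid Unif(a, b) draws
# should be approximately Exp(1/(b - a)), with mean b - a.
import random
import statistics

def scaled_max_gaps(n, a=2.0, b=5.0, reps=4000, seed=1):
    """Draw reps copies of n(b - V_n), where V_n = max of n Unif(a, b) draws."""
    rng = random.Random(seed)
    return [n * (b - max(rng.uniform(a, b) for _ in range(n)))
            for _ in range(reps)]

gaps = scaled_max_gaps(200)
print(statistics.mean(gaps))                       # close to b - a = 3
print(sum(g <= 3.0 for g in gaps) / len(gaps))     # close to 1 - 1/e
```

The empirical probability P(gap ≤ b − a) should sit near 1 − e^{−1} ≈ 0.632, matching the exponential cdf derived above.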
Asymptotic Distribution of the Min of Unif(a,b)
The derivation of the asymptotic distribution of n(Un − a) is a challenge problem worth a 1/100 increase in the midterm grade.
Convergence in Distribution and Continuity

It can be shown (this time with a considerable level of difficulty) that convergence in distribution is preserved by continuous functions.

More precisely, if Xn →d X and g(x) is continuous, then g(Xn) →d g(X).

Example: Suppose that Xn →d X ∼ N(0, 1). Then

2 + 3Xn →d 2 + 3X ∼ N(2, 9)
Another Example

Example: Suppose that Xn →d X ∼ Unif(0, 1). What is the limiting distribution of −log(Xn)?

Solution: By continuity,

−log(Xn) →d −log(X)

Moreover, for all y > 0,

P(−log(X) ≤ y) = P(log(X) ≥ −y) = P(X ≥ e^{−y}) = 1 − e^{−y}

Therefore, −log(Xn) →d Exp(1).
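A quick Python check of the calculation above (an illustrative sketch, not course code): draw uniforms, apply −log, and compare the sample mean and the empirical cdf at 1 with the Exp(1) values.

```python
# Sketch: if U ~ Unif(0, 1), then -log(U) ~ Exp(1), so the sample mean
# should be near 1 and the fraction of draws <= 1 near 1 - 1/e.
import math
import random
import statistics

rng = random.Random(2)
# 1 - rng.random() lies in (0, 1], avoiding log(0) at the boundary
ys = [-math.log(1.0 - rng.random()) for _ in range(10000)]

print(statistics.mean(ys))                     # Exp(1) mean is 1
print(sum(y <= 1.0 for y in ys) / len(ys))     # F(1) = 1 - e^{-1}
```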
An Important Technical Result (on MGF's)

Suppose that Xn ∼ Fn and Mn(t) = E(e^{tXn}), for all n.

Suppose that X ∼ F and M(t) = E(e^{tX}).

Suppose that Mn(t) → M(t) for all −ε < t < ε, for some ε > 0.

Then Fn(x) → F(x) for all x ∈ CF.

In other words, Mn(t) → M(t) for all −ε < t < ε, for some ε > 0, implies that Xn →d X.
The Central Limit Theorem (CLT)

Suppose that X1, X2, ..., Xn are iid with common mean µ and common variance σ².

Then

Zn = (X̄n − E(X̄n)) / SD(X̄n) = √n (X̄n − µ)/σ →d Z ∼ N(0, 1)
CLT (continued)

To summarize, for large n (e.g. n ≥ 20),

√n (X̄n − µ)/σ ≈ N(0, 1)

⇒ X̄n − µ ≈ (σ/√n) N(0, 1) = N(0, σ²/n)

⇒ X̄n ≈ µ + N(0, σ²/n) = N(µ, σ²/n)

⇒ X̄n ≈ N[E(X̄n), Var(X̄n)]
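The summary above is easy to verify by simulation. The following Python sketch (illustrative; the function name and the choice of Unif(0, 1) summands are mine) builds standardized sample means Zn = √n(X̄n − µ)/σ and checks that they behave like N(0, 1).

```python
# Sketch: standardized means of Unif(0, 1) summands (mu = 1/2,
# sigma^2 = 1/12) should be approximately N(0, 1) even for modest n.
import math
import random
import statistics

def standardized_means(n, reps=5000, seed=3):
    """Draw reps copies of Z_n = sqrt(n) (Xbar_n - mu) / sigma."""
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)
    rng = random.Random(seed)
    return [
        math.sqrt(n) * (statistics.mean(rng.random() for _ in range(n)) - mu) / sigma
        for _ in range(reps)
    ]

zs = standardized_means(30)
print(statistics.mean(zs))                          # near 0
print(sum(abs(z) <= 1.96 for z in zs) / len(zs))    # near 0.95
```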
CLT (continued)

Proof. We can assume without loss of generality that µ = 0 and σ² = 1.

Challenge Question: Show that we can indeed assume µ = 0 and σ² = 1 without loss of generality. This is for an increase of 0.5/30 in your MT 2 grade.

Then Zn = √n X̄n and, using the iid assumption,

MZn(t) = M_{√n X̄n}(t) = M_{(∑Xi)/√n}(t) = M_{∑Xi}(t/√n) = [M(t/√n)]^n
CLT (continued)

Moreover,

M(t/√n) = E[exp(tX1/√n)]
        = E[1 + tX1/√n + t²X1²/(2n) + o(t²/n)]
        = 1 + t²/(2n) + o(t²/n)
CLT (continued)

Hence

[M(t/√n)]^n = [1 + t²/(2n) + o(t²/n)]^n → exp(t²/2),

which is the MGF of a standard normal random variable. This proves the result.
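The key limit in the last step, [1 + t²/(2n)]^n → exp(t²/2), can be confirmed numerically (a quick sketch; the value t = 1.3 is an arbitrary choice):

```python
# Sketch: (1 + t^2/(2n))^n converges to exp(t^2 / 2) as n grows.
import math

t = 1.3
target = math.exp(t**2 / 2)
for n in (10, 100, 10000):
    approx = (1 + t**2 / (2 * n)) ** n
    print(n, approx, target)   # approx moves toward target
```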
Some Remarks

We have shown that X̄n →p µ. By continuity, X̄n − µ →p 0 and therefore

(X̄n − µ)/σ →p 0

The CLT shows that √n is the right "magnification" of the difference between X̄n and µ for appreciating its behavior when n is large.

Less magnification (such as n^{1/3} or log(n)) would not be enough, because the magnified difference would still collapse to zero.

More magnification (such as n³ or e^n) would be too much, because the magnified absolute difference would blow up to infinity.
Bounded in Probability

A normal random variable is unbounded (it can take any value between −∞ and ∞).

However, Z ∼ N(0, 1) is bounded in probability, because P(|Z| > K) = 2(1 − Φ(K)) → 0 as K → ∞.

A sequence Yn is bounded in probability if for all δ > 0 there exist Kδ > 0 and Nδ > 0 such that

P(|Yn| ≤ Kδ) ≥ 1 − δ, for all n ≥ Nδ.
Bounded in Probability and the CLT

The CLT also means that the difference |√n (X̄n − µ)| is bounded in probability.

Notation: X̄n − µ = Op(1/√n), meaning that

|(X̄n − µ)/(1/√n)| = |√n (X̄n − µ)| is bounded in probability.
CLT Application: Normal Approximation for the Binomial

Suppose that Xn ∼ Binom(n, p).

Then

Xn = Y1 + Y2 + ··· + Yn,   Yi ∼ iid Binom(1, p)

E(Yi) = p,   Var(Yi) = p(1 − p)

We can write

Xn/n = (Y1 + Y2 + ··· + Yn)/n = Ȳn
CLT Application (continued): Normal Approximation for the Binomial

By the CLT,

√n (Xn/n − p)/√(p(1 − p)) = √n (Ȳn − p)/√(p(1 − p)) →d Z ∼ N(0, 1)

This means that

√n (Xn/n − p)/√(p(1 − p)) ≈ N(0, 1)
CLT Application (continued): Normal Approximation for the Binomial

Therefore

⇒ (Xn/n − p)/√(p(1 − p)) ≈ (1/√n) N(0, 1) = N(0, 1/n)

⇒ Xn/n − p ≈ √(p(1 − p)) N(0, 1/n) = N(0, p(1 − p)/n)

⇒ Xn/n ≈ p + N(0, p(1 − p)/n) = N(p, p(1 − p)/n)

⇒ Xn ≈ n N(p, p(1 − p)/n) = N(np, n² p(1 − p)/n)
CLT Application (continued): Normal Approximation for the Binomial

In summary,

Binomial(n, p) ≈ N(np, np(1 − p)) = N(E(Xn), Var(Xn))

That is, when n is large and p is not too close to 0 or 1 (so that the original distribution is not too asymmetric).

Rule of thumb: min{np, n(1 − p)} ≥ 5
CLT Application (continued): Normal Approximation for the Binomial

As a numerical example, let X ∼ Binom(20, 0.4).

Use the normal distribution to approximate

(a) P(X ≥ 6)   (b) P(6 < X < 9)

Solution:

E(X) = 20 × 0.4 = 8,   Var(X) = 20 × 0.4 × 0.6 = 4.8,   SD(X) = √4.8 = 2.1909
CLT Application (continued)

(a)

P(X ≥ 6) = 1 − P(X < 6) = 1 − P(X ≤ 5)
         ≈ 1 − Φ((5 − 8)/2.1909) = 1 − Φ(−1.3693)
         = 0.9145472
CLT Application (continued)

(b)

P(6 < X < 9) = P(6 < X ≤ 8)
             ≈ Φ((8 − 8)/2.1909) − Φ((6 − 8)/2.1909)
             = 0.5 − Φ(−0.91287) = 0.3193445
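The two approximations in (a) and (b) can be reproduced with a few lines of Python, using the identity Φ(x) = (1 + erf(x/√2))/2 for the standard normal cdf (a sketch; the helper name `phi` is mine):

```python
# Sketch: normal approximation for X ~ Binom(20, 0.4) using math.erf.
import math

def phi(x):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sd = 8.0, math.sqrt(4.8)
p_a = 1.0 - phi((5 - mu) / sd)                  # (a) P(X >= 6)
p_b = phi((8 - mu) / sd) - phi((6 - mu) / sd)   # (b) P(6 < X < 9)
print(round(p_a, 4), round(p_b, 4))
```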
CLT Application (continued): Normal Approximation for the Binomial

The exact probabilities calculated using the Binomial distribution are 0.874401 and 0.3455881.

With a "correction for discreteness" we have

1 − Φ((5.5 − 8)/2.1909) = 1 − Φ(−1.1411) = 0.8730858

and

Φ((8.5 − 8)/2.1909) − Φ((6.5 − 8)/2.1909) = Φ(0.22822) − Φ(−0.68465) = 0.34348

Fortunately for all of us, these approximations are nowadays obsolete, as the exact quantities can easily be computed using statistical software.
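The comparison above (exact binomial probabilities vs. continuity-corrected normal approximations) can be reproduced with the standard library alone; this sketch computes the binomial pmf directly and reuses the erf-based normal cdf (helper names are mine):

```python
# Sketch: exact Binom(20, 0.4) probabilities vs. the continuity-corrected
# normal approximation for P(X >= 6) and P(6 < X < 9).
import math

def binom_pmf(n, k, p):
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def phi(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

n, p = 20, 0.4
exact_a = sum(binom_pmf(n, k, p) for k in range(6, n + 1))  # P(X >= 6)
exact_b = sum(binom_pmf(n, k, p) for k in range(7, 9))      # P(X = 7) + P(X = 8)

sd = math.sqrt(n * p * (1 - p))
cc_a = 1.0 - phi((5.5 - n * p) / sd)
cc_b = phi((8.5 - n * p) / sd) - phi((6.5 - n * p) / sd)

print(exact_a, cc_a)   # exact vs. corrected approximation for (a)
print(exact_b, cc_b)   # exact vs. corrected approximation for (b)
```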
CLT Application: Monte Carlo Integration

Suppose we wish to compute the integral

I = ∫_a^b g(x) dx

We notice that

I = (b − a) · (1/(b − a)) ∫_a^b g(x) dx = (b − a) E[g(U)],   U ∼ Unif(a, b)

Therefore

I = (b − a) A, where A = E[g(U)].
CLT Application (continued): Monte Carlo Integration

Monte Carlo Integration Method:

1. Generate a large enough number N of iid Ui with common distribution Unif(a, b).

2. Estimate the "population mean" A by the "sample mean" Â, where

Â = (1/N) ∑_{i=1}^{N} g(Ui)
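The two steps above translate directly into code. Here is a minimal Python sketch of the method (the function name and the test integrand are illustrative choices, not from the slides):

```python
# Sketch of Monte Carlo integration: estimate the integral of g over [a, b]
# as (b - a) times the sample mean of g at uniform draws.
import random

def mc_integrate(g, a, b, n=100_000, seed=7):
    """Estimate integral_a^b g(x) dx as (b - a) * mean of g(U_i), U_i ~ Unif(a, b)."""
    rng = random.Random(seed)
    return (b - a) * sum(g(rng.uniform(a, b)) for _ in range(n)) / n

# Sanity check on an integral with a known value: integral_0^1 x^2 dx = 1/3
print(mc_integrate(lambda x: x * x, 0.0, 1.0))
```

By the properties listed on the next slide, the error of this estimate shrinks at the 1/√N rate.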
CLT Application (continued)

Properties of the Monte Carlo Estimate:

1. Â is an unbiased estimate of A: E(Â) = A, because the sample mean is an unbiased estimate of the population mean.

2. Â → A a.s. (and also in probability), because of the SLLN (and the WLLN).

3. √N(Â − A) →d N(0, τ²), where τ² = Var(g(U)) (by the CLT).

4. τ̂² = (1/N) ∑ g²(Ui) − [(1/N) ∑ g(Ui)]² → τ² a.s. (and also in probability), because of the SLLN (and the WLLN).
Slutzky's Results

Suppose that Xn →d X and Yn →p c. Then

(a) Xn + Yn →d X + c
(b) XnYn →d cX
(c) Xn/Yn →d X/c, provided c ≠ 0.
Application of Slutzky's Results to MC Integration

By property 3 we have

√N(Â − A)/τ →d N(0, 1)   (*)

By property 4 we have

τ̂² →p τ²

By continuity we also have

τ̂ →p τ   and   τ/τ̂ →p 1   (**)
Application of Slutzky’s Results to MC Integration (continued)

By (*), (**), and Slutzky’s Result (c) we have

[√N (Â − A) / τ] / (τ̂ / τ) →d N(0, 1)

That is,

√N (Â − A) / τ̂ →d N(0, 1)

In summary,

Â ≈ N(A, τ̂² / N)
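The practical payoff of Â ≈ N(A, τ̂²/N) is an error bar for the Monte Carlo estimate. A minimal Python sketch (the slides use R) under an assumed example integrand g(u) = u² on (0, 1), for which A = 1/3:

```python
import random

random.seed(0)

def g(u):
    # Assumed example integrand; A = E[g(U)] = 1/3 for U ~ Unif(0, 1)
    return u * u

N = 100_000
vals = [g(random.random()) for _ in range(N)]

a_hat = sum(vals) / N                                  # A-hat
tau2_hat = sum(v * v for v in vals) / N - a_hat ** 2   # tau-hat^2 (property 4)
se = (tau2_hat / N) ** 0.5

# Approximate 95% interval from A-hat ~ N(A, tau-hat^2 / N)
lo, hi = a_hat - 1.96 * se, a_hat + 1.96 * se
print(a_hat, lo, hi)
```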
Controlling the MC-Integration Error

Fix the desired estimation precision (or relative precision): ε

Fix the desired probability of achieving that precision: 1 − α

Run an “appropriately sized” pilot sample to calculate τ̂0
Controlling the MC-Integration Error

Notice that

P(|Â − A| < ε) = P(√N |Â − A| / τ < √N ε / τ) ≈ 2Φ(√N ε / τ) − 1 ≈ 2Φ(√N ε / τ̂0) − 1
Controlling the MC-Integration Error

Solve for N in the following equation:

2Φ(√N ε / τ̂0) − 1 = 1 − α

N* = [(τ̂0 / ε) Φ⁻¹(1 − α/2)]²
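The closed form for N* is easy to evaluate. A minimal Python sketch using the standard library's NormalDist for Φ⁻¹ (the slides use R, where qnorm plays this role; the function name here is illustrative):

```python
import math
from statistics import NormalDist

def required_sample_size(tau0, eps, alpha):
    """N* = [(tau0 / eps) * PhiInverse(1 - alpha/2)]^2, rounded up."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((tau0 / eps * z) ** 2)

# Values from the numerical example on the later slides
n_star = required_sample_size(0.2139757, 0.001, 0.0001)
print(n_star)  # reproduces the slides' N* = 693,044
```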
Numerical Example

Example: Calculate the integral

I = (1/√(2π)) ∫₋₀.₅^1.5 exp(−x²/2) dx

within a 0.001 error margin with probability 0.9999.

In this case ε = 0.001, α = 0.0001, b − a = 2,

g(x) = exp(−x²/2)

and

A = (1/2) ∫₋₀.₅^1.5 exp(−x²/2) dx
Numerical Example (Continued)

Hence

I = (2/√(2π)) A = √(2/π) A

We generate (using R) N = 100,000 iid Unif(−0.5, 1.5) random variables Uᵢ to compute

τ̂0² = (1/N) ∑ᵢ₌₁^N [exp(−Uᵢ²/2)]² − [(1/N) ∑ᵢ₌₁^N exp(−Uᵢ²/2)]²
    = (1/N) ∑ᵢ₌₁^N exp(−Uᵢ²) − [(1/N) ∑ᵢ₌₁^N exp(−Uᵢ²/2)]² = 0.0457856

τ̂0 = 0.2139757
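The slides compute τ̂0 in R; an equivalent Python sketch of the pilot run is below (seed and exact output will differ from the slides' figures):

```python
import math
import random

random.seed(42)

N0 = 100_000  # pilot sample size
u = [random.uniform(-0.5, 1.5) for _ in range(N0)]
vals = [math.exp(-x * x / 2) for x in u]

mean = sum(vals) / N0
tau2_0 = sum(v * v for v in vals) / N0 - mean ** 2  # plug-in variance of g(U)
tau0 = math.sqrt(tau2_0)
print(tau0)  # close to the slides' 0.2139757
```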
Numerical Example (Continued)

Therefore

N* = [(τ̂0 / ε) Φ⁻¹(1 − α/2)]² = [(0.2139757 / 0.001) Φ⁻¹(1 − 0.0001/2)]²

N* = 693,044 (rounding up)
Numerical Example (Continued)

Now we generate 693,044 iid Unif(−0.5, 1.5) random variables Uᵢ to compute

Â = (1/N*) ∑ᵢ₌₁^N* exp(−Uᵢ²/2) = 0.782887

Therefore

Î = √(2/π) Â = √(2/π) × 0.782887 = 0.624654

Compare with

pnorm(1.5) − pnorm(−0.5) = 0.6246553

|Î − [pnorm(1.5) − pnorm(−0.5)]| = 0.000002
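The full run can be reproduced in Python (the slides use R; with a different seed the estimate will differ slightly but should stay within the ε = 0.001 target with probability 1 − α):

```python
import math
import random
from statistics import NormalDist

random.seed(7)

N_star = 693_044
total = sum(math.exp(-u * u / 2)
            for u in (random.uniform(-0.5, 1.5) for _ in range(N_star)))
a_hat = total / N_star

# I = sqrt(2 / pi) * A, since the interval (-0.5, 1.5) has length 2
i_hat = math.sqrt(2 / math.pi) * a_hat

exact = NormalDist().cdf(1.5) - NormalDist().cdf(-0.5)  # pnorm(1.5) - pnorm(-0.5)
print(i_hat, exact, abs(i_hat - exact))
```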
Numerical Example (Continued)

In practice we don’t know I.

Still, we may wish to check whether the desired precision has likely been achieved.

Recompute N** using the full Monte Carlo sample:

N** = 692,553

Compare N** with N*:

N** = 692,553 < N* = 693,044

Since N** ≤ N*, the sample size actually used was large enough for the target precision.
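This final check, recomputing the required sample size from the full sample, can be sketched as follows in Python (the slides use R; the exact output depends on the seed, and N** can land on either side of N*, though typically close to it):

```python
import math
import random
from statistics import NormalDist

random.seed(7)

eps, alpha = 0.001, 0.0001
N_star = 693_044

vals = [math.exp(-u * u / 2)
        for u in (random.uniform(-0.5, 1.5) for _ in range(N_star))]
mean = sum(vals) / N_star
tau_full = math.sqrt(sum(v * v for v in vals) / N_star - mean ** 2)

# Recompute the required sample size using the full-sample tau-hat
z = NormalDist().inv_cdf(1 - alpha / 2)
n_check = math.ceil((tau_full / eps * z) ** 2)
print(n_check, N_star)
```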