sampling
S.Lan
Basic Concepts
CI of Means
CI of Variances
CI of Proportions
Lecture 5 Interval Estimation
Shiwei Lan
School of Mathematical and Statistical Sciences, Arizona State University
STP427 Mathematical Statistics, Fall 2019
Table of Contents
1 Basic Concepts
2 CI of Means
3 CI of Variances
4 CI of Proportions
Point Estimate vs Interval Estimate
Confidence Interval
Interval Estimation
Definition (Confidence Interval)
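If $\hat\theta_l$ and $\hat\theta_u$ are values of the random variables $\hat\Theta_l$ and $\hat\Theta_u$ such that
$$P[\hat\Theta_l < \theta < \hat\Theta_u] = 1-\alpha$$
for some specified probability $1-\alpha$ with $0 < \alpha < 1$, then we refer to the interval $\hat\theta_l < \theta < \hat\theta_u$ as a $(1-\alpha)100\%$ confidence interval (CI) for $\theta$. The probability $1-\alpha$ is called the degree of confidence (also the confidence level), and $\hat\theta_l$ and $\hat\theta_u$ are called the lower and upper confidence limits.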
Example
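Suppose $X_i \overset{iid}{\sim} N(\mu, \sigma^2)$. Then $Z = \dfrac{\bar X - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$. We have
$$P[|Z| < z_{\alpha/2}] = 1-\alpha$$
This implies $\bar x - z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}} < \mu < \bar x + z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.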
Interpretation of Confidence Interval
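• Once computed, the $(1-\alpha)100\%$ CI $(\hat\theta_l, \hat\theta_u)$ is deterministic. Therefore, for any specific (hypothesized) value $\theta_0$, the event $\theta_0 \in (\hat\theta_l, \hat\theta_u)$ has probability either 0 or 1: $P[\theta_0 \in (\hat\theta_l, \hat\theta_u)] \neq 1-\alpha$ !!!
• The $(1-\alpha)100\%$ CI for $\theta$ can be interpreted as follows: if the experiment were repeated many times and a CI computed from each repetition, then about $(1-\alpha)100\%$ of those intervals would contain $\theta$.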
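The repeated-sampling interpretation can be checked with a small simulation. The sketch below is only an illustration: the values of `mu`, `sigma`, `n`, `reps` and the seed are arbitrary assumptions, not taken from the lecture. It draws many normal samples, builds a known-variance z interval from each, and reports the fraction of intervals that contain the true mean.

```python
import random
from statistics import NormalDist, fmean

def z_interval(sample, sigma, alpha):
    """Two-sided (1 - alpha) z interval for the mean, sigma known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    margin = z * sigma / n ** 0.5
    xbar = fmean(sample)
    return xbar - margin, xbar + margin

def coverage(mu=5.0, sigma=2.0, n=25, alpha=0.05, reps=2000, seed=0):
    """Fraction of simulated intervals that contain the true mean mu.

    All default values are illustrative assumptions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        lo, hi = z_interval(sample, sigma, alpha)
        hits += lo < mu < hi  # True counts as 1
    return hits / reps

rate = coverage()
print(f"empirical coverage: {rate:.3f} (nominal 0.95)")
```

The empirical rate should land near the nominal level, while each individual interval either contains `mu` or does not.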
One-Sample Mean, known variance
Theorem
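If $\bar x$ is the value of the mean of a random sample of size $n$ from a (normal) population with the known variance $\sigma^2$, then
$$\bar x - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar x + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
is an (approximate) $(1-\alpha)100\%$ CI for the population mean $\mu$.
The quantity $z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$ is called the margin of error. Note that we also have
$$P\left[\frac{\bar X - \mu}{\sigma/\sqrt{n}} < z_{\alpha}\right] = 1-\alpha, \quad\text{or}\quad P\left[\frac{\bar X - \mu}{\sigma/\sqrt{n}} > -z_{\alpha}\right] = 1-\alpha$$
This gives the one-sided $(1-\alpha)100\%$ CIs for the population mean $\mu$:
$$\mu > \bar x - z_{\alpha}\frac{\sigma}{\sqrt{n}}, \quad\text{or}\quad \mu < \bar x + z_{\alpha}\frac{\sigma}{\sqrt{n}}$$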
One-Sample Mean, known variance
Example
Suppose the lifetime of a particular brand of light bulbs is normally distributed with standard deviation $\sigma = 75$ hours and unknown mean. Suppose the sample average lifetime of 30 bulbs is $\bar x = 843$ hours. Construct a 95% CI for the overall average lifetime of light bulbs of this brand. How about a 90% one-sided CI?
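One way to work this example numerically is sketched below. It applies the known-variance formulas to the numbers stated in the example; the variable names are illustrative, and the normal quantiles come from Python's standard library rather than a printed table.

```python
from statistics import NormalDist

sigma, n, xbar = 75.0, 30, 843.0   # values stated in the example
se = sigma / n ** 0.5              # standard error sigma / sqrt(n)

# Two-sided 95% CI: xbar +/- z_{0.025} * se
z_two = NormalDist().inv_cdf(0.975)
two_sided = (xbar - z_two * se, xbar + z_two * se)

# One-sided 90% bounds use z_{0.10} instead of z_{0.05}
z_one = NormalDist().inv_cdf(0.90)
lower_bound = xbar - z_one * se    # mu > lower_bound
upper_bound = xbar + z_one * se    # mu < upper_bound

print(f"95% two-sided CI: ({two_sided[0]:.2f}, {two_sided[1]:.2f})")
print(f"90% one-sided: mu > {lower_bound:.2f}, or mu < {upper_bound:.2f}")
```

Note that the one-sided bound puts all of $\alpha$ in one tail, so it uses $z_{\alpha}$ rather than $z_{\alpha/2}$.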
One-Sample Mean, known variance
Example
Suppose the lifetime of a particular brand of light bulbs is normally distributed with standard deviation $\sigma = 75$ hours and unknown mean. What is the minimum sample size required if we wish to estimate the overall average lifetime of the light bulbs to within 10 hours with 90% confidence?
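Requiring the margin of error $z_{\alpha/2}\,\sigma/\sqrt{n}$ to be at most $E$ gives $n \ge (z_{\alpha/2}\,\sigma/E)^2$. A minimal sketch of that calculation follows; the function name `min_sample_size` is a hypothetical helper, not something from the lecture.

```python
import math
from statistics import NormalDist

def min_sample_size(sigma, margin, confidence):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= margin (sigma known)."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / margin) ** 2)  # round up to a whole bulb

n_required = min_sample_size(sigma=75.0, margin=10.0, confidence=0.90)
print(n_required)
```

Rounding up is deliberate: rounding down would leave the margin of error slightly above the target.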
One-Sample Mean, unknown variance
Note: when the population variance is unknown, we substitute the sample variance for it and obtain the following t statistic
$$T = \frac{\bar X - \mu}{S/\sqrt{n}} \sim t_{n-1}$$
Theorem
If $\bar x$ and $s$ are the values of the mean and the standard deviation of a random sample of size $n$ from a (normal) population with unknown variance, then
$$\bar x - t_{\alpha/2,n-1}\frac{s}{\sqrt{n}} < \mu < \bar x + t_{\alpha/2,n-1}\frac{s}{\sqrt{n}}$$
is an (approximate) $(1-\alpha)100\%$ CI for the population mean $\mu$.
One-Sample Mean, unknown variance
Example
Suppose the lifetime of a particular brand of light bulbs is normally distributed with unknown mean and variance. Suppose a sample of 30 bulbs is taken, the sample average lifetime is $\bar x = 843$ hours, and the sample standard deviation is $s = 75$ hours. Construct a 95% CI for the overall average lifetime of light bulbs of this brand.
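A sketch of the computation for this example is given below. Python's standard library has no t quantile function, so the critical value $t_{0.025,29} = 2.045$ is entered by hand from a standard t table; that constant is an assumption supplied here, not a number given on the slide.

```python
n, xbar, s = 30, 843.0, 75.0   # values stated in the example
T_CRIT = 2.045                 # t_{0.025, 29} from a t table (assumed input)

se = s / n ** 0.5              # estimated standard error s / sqrt(n)
margin = T_CRIT * se
ci = (xbar - margin, xbar + margin)

print(f"95% t interval: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Compared with the known-variance case, the only change is the wider t critical value in place of $z_{0.025} = 1.960$, which makes the interval slightly longer.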
Two-Sample Means, known variances
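Note: when there are two populations with known variances, we have the following Z statistic for the difference of the means of two independent random samples
$$Z = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0, 1)$$
Theorem
If $\bar x_1$ and $\bar x_2$ are the values of the means of independent random samples of sizes $n_1$ and $n_2$ from (normal) populations with the known variances $\sigma_1^2$ and $\sigma_2^2$, then
$$(\bar x_1 - \bar x_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
is an (approximate) $(1-\alpha)100\%$ CI for the difference between the two population means.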
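Two-Sample Means, known variances
Example
• $X_1, \dots, X_{15} \sim N(\mu_X, \sigma_X^2)$,
• $Y_1, \dots, Y_8 \sim N(\mu_Y, \sigma_Y^2)$,
• $\bar x = 70.1$, $\bar y = 75.3$, $\sigma_X^2 = 60$, $\sigma_Y^2 = 40$.
Find the 90% CI for $\mu_X - \mu_Y$. (Note: $z_{0.1} = 1.282$, $z_{0.05} = 1.645$, $z_{0.025} = 1.960$)
• Note, $\bar x - \bar y = -5.2$, so the 90% CI is $-5.2 \pm 1.645\sqrt{\frac{60}{15} + \frac{40}{8}}$
• margin of error: $1.645\sqrt{\frac{60}{15} + \frac{40}{8}} = 1.645 \times 3 = 4.935$
• Finally, 90% CI $= [-10.135, -0.265]$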
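The arithmetic in this example can be reproduced with a few lines of code. The function `two_sample_z_interval` below is a hypothetical helper written for illustration; it uses the standard library's normal quantile instead of the rounded table value 1.645, so the endpoints agree with the slide to about three decimals.

```python
from statistics import NormalDist

def two_sample_z_interval(mean1, mean2, var1, var2, n1, n2, confidence):
    """CI for mu1 - mu2 from independent samples with known variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # z_{alpha/2}
    margin = z * (var1 / n1 + var2 / n2) ** 0.5
    diff = mean1 - mean2
    return diff - margin, diff + margin

lo, hi = two_sample_z_interval(70.1, 75.3, 60, 40, 15, 8, confidence=0.90)
print(f"90% CI for mu_X - mu_Y: [{lo:.3f}, {hi:.3f}]")
```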
Two-Sample Means, unknown variances (pooled)
Note: when the population variances are unknown, we assume they are equal, denote the common value by $\sigma^2$, and estimate it with the following pooled estimator
$$S_p^2 = \frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}$$
which can be shown to be an unbiased estimator of $\sigma^2$. Then we have the following t statistic for the difference between the population means
$$T = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{S_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim t_{n_1+n_2-2}$$
For two populations with different variances, we could use the following unpooled t statistic, which is approximately t distributed:
$$T^* = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}} \sim t_{df}$$
where we set the degrees of freedom to $df = (n_1 \wedge n_2) - 1 = \min(n_1, n_2) - 1$, or use the degrees of freedom of Welch's t-test.
Two-Sample Means, unknown variances (pooled)
Theorem
If $\bar x_1$, $\bar x_2$, $s_1$ and $s_2$ are the values of the means and the standard deviations of independent random samples of sizes $n_1$ and $n_2$ from normal populations with equal variances, then
$$(\bar x_1 - \bar x_2) - t_{\alpha/2,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} < \mu_1 - \mu_2 < (\bar x_1 - \bar x_2) + t_{\alpha/2,n_1+n_2-2}\, s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
is a $(1-\alpha)100\%$ CI for the difference between the two population means.
What if the normality assumption is absent???
Two-Sample Means, unknown variances (pooled)
Example
• An investigation to determine if women employees are as well paid as their male counterparts in comparable jobs:
• Random samples of 14 males and 11 females in junior academic positions are selected.
• Male: sample mean $48,530, sample sd $780
• Female: sample mean $47,620, sample sd $750
• Assume that the populations are normally distributed with equal variances.
Construct a 95% CI for the difference between the two population means.
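A sketch of the pooled computation for this salary example follows. As the standard library has no t quantile function, the critical value $t_{0.025,23} = 2.069$ is typed in from a standard t table; it is an assumed input rather than a number from the slide, and the variable names are illustrative.

```python
n1, mean1, s1 = 14, 48530.0, 780.0   # male sample
n2, mean2, s2 = 11, 47620.0, 750.0   # female sample
T_CRIT = 2.069                       # t_{0.025, 23} from a t table (assumed input)

# Pooled variance estimate with n1 + n2 - 2 degrees of freedom
sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2)
se = sp2 ** 0.5 * (1 / n1 + 1 / n2) ** 0.5

diff = mean1 - mean2
margin = T_CRIT * se
ci = (diff - margin, diff + margin)

print(f"pooled sd: {sp2 ** 0.5:.1f}")
print(f"95% CI for the mean difference: ({ci[0]:.1f}, {ci[1]:.1f})")
```

Under these assumptions the interval lies entirely above zero, which is the feature one would examine when comparing the two groups.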
One-Sample Variance
Recall that for a normal population, we have the following $\chi^2$ statistic
$$\frac{(n-1)S^2}{\sigma^2} \sim \chi^2_{n-1}$$
To obtain the $(1-\alpha)100\%$ CI for the population variance $\sigma^2$, we need
$$P\left[\chi^2_{1-\alpha/2,n-1} < \frac{(n-1)S^2}{\sigma^2} < \chi^2_{\alpha/2,n-1}\right] = 1-\alpha$$
Theorem
If $s^2$ is the value of the variance of a random sample of size $n$ from a normal population, then
$$\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}$$
is a $(1-\alpha)100\%$ CI for the population variance.
One-Sample Variance
Example
Let $X_1, \dots, X_{13}$ be normally distributed with unknown population mean and variance. Let $12 S^2 = 128.41$. Find the 90% CI for the variance, given $\chi^2_{0.05,12} = 21.03$ and $\chi^2_{0.95,12} = 5.226$.
$$\left[\frac{(n-1)s^2}{\chi^2_{\alpha/2,n-1}},\ \frac{(n-1)s^2}{\chi^2_{1-\alpha/2,n-1}}\right] = \left[\frac{128.41}{21.03},\ \frac{128.41}{5.226}\right] = [6.11, 24.57]$$
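The same calculation as a short sketch, using the two $\chi^2$ quantiles quoted in the example as plain constants (the variable names are illustrative). Note that the larger quantile produces the lower limit, because it sits in the denominator.

```python
sum_sq = 128.41        # (n - 1) * s^2 with n = 13, as given in the example
CHI2_UPPER = 21.03     # chi^2_{0.05, 12}, quoted in the example
CHI2_LOWER = 5.226     # chi^2_{0.95, 12}, quoted in the example

var_lo = sum_sq / CHI2_UPPER   # dividing by the larger quantile gives the lower limit
var_hi = sum_sq / CHI2_LOWER

print(f"90% CI for sigma^2: [{var_lo:.2f}, {var_hi:.2f}]")
print(f"90% CI for sigma:   [{var_lo ** 0.5:.2f}, {var_hi ** 0.5:.2f}]")
```

Taking square roots of both limits gives a CI for the standard deviation, since the square root is monotone.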
Two-Sample Variances*
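When there are two normal populations, we have the following F statistic for the ratio of variances with two independent random samples
$$F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \sim F_{n_1-1,n_2-1}$$
To obtain the $(1-\alpha)100\%$ CI for the variance ratio $\dfrac{\sigma_1^2}{\sigma_2^2}$, we need
$$P\left[f_{1-\alpha/2,n_1-1,n_2-1} < F < f_{\alpha/2,n_1-1,n_2-1}\right] = 1-\alpha$$
Theorem
If $s_1^2$ and $s_2^2$ are the values of the variances of independent random samples of sizes $n_1$ and $n_2$ from normal populations, then
$$\frac{s_1^2}{s_2^2}\cdot\frac{1}{f_{\alpha/2,n_1-1,n_2-1}} < \frac{\sigma_1^2}{\sigma_2^2} < \frac{s_1^2}{s_2^2}\cdot f_{\alpha/2,n_2-1,n_1-1}$$
is a $(1-\alpha)100\%$ CI for the population variance ratio. The upper limit uses the identity $f_{1-\alpha/2,n_1-1,n_2-1} = 1/f_{\alpha/2,n_2-1,n_1-1}$.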
One-Sample Proportion
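In many scenarios, we are interested in proportions. For the binomial population $X \sim \mathrm{binom}(n, \theta)$, we have the following asymptotic Z statistic by the CLT:
$$Z_n = \frac{X - n\theta}{\sqrt{n\theta(1-\theta)}} \overset{D}{\to} N(0, 1)$$
We need to solve $\left|\dfrac{x}{n} - \theta\right| < z_{\alpha/2}\sqrt{\dfrac{\theta(1-\theta)}{n}}$. How?
Theorem
If $X$ is a binomial random variable with parameters $n$ and $\theta$, $n$ is large, and $\hat\theta = \dfrac{x}{n}$, then
$$\hat\theta - z_{\alpha/2}\sqrt{\frac{\hat\theta(1-\hat\theta)}{n}} < \theta < \hat\theta + z_{\alpha/2}\sqrt{\frac{\hat\theta(1-\hat\theta)}{n}}$$
is an approximate $(1-\alpha)100\%$ CI for the population proportion $\theta$.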
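The theorem sidesteps the inequality by plugging $\hat\theta$ into the standard error. Solving the inequality exactly for $\theta$ instead means solving a quadratic, and the result is usually called the Wilson score interval; that name and the comparison below are additions for illustration, not part of the lecture. The counts `x = 40` and `n = 100` are made-up inputs.

```python
from statistics import NormalDist

def plug_in_interval(x, n, confidence):
    """Interval from the theorem: theta_hat replaces theta in the standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    margin = z * (p * (1 - p) / n) ** 0.5
    return p - margin, p + margin

def wilson_interval(x, n, confidence):
    """Exact solution of |x/n - theta| < z * sqrt(theta * (1 - theta) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = x / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half = z * (p * (1 - p) / n + z ** 2 / (4 * n ** 2)) ** 0.5 / denom
    return center - half, center + half

# Hypothetical data: 40 successes in 100 trials
print(plug_in_interval(40, 100, 0.95))
print(wilson_interval(40, 100, 0.95))
```

For large $n$ the two intervals nearly coincide, which is why the simpler plug-in form is the one stated in the theorem.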
One-Sample Proportion
Example
• Just prior to an important election, in a random sample of 749 voters, 397 preferred Candidate Y over Candidate Z.
• Construct a 90% confidence interval for the overall proportion of voters who prefer Candidate Y over Candidate Z.
• $X = 397$, $n = 749$, $\hat p = \frac{X}{n} = \frac{397}{749} = 0.53$.
• The confidence interval: $\hat p \pm z_{\alpha/2}\sqrt{\frac{\hat p(1-\hat p)}{n}}$.
• 90% confidence level: $\alpha = 0.10$, $z_{\alpha/2} = 1.645$
• $0.53 \pm 1.645\sqrt{\frac{0.53 \cdot 0.47}{749}} = (0.50, 0.56)$
Two-Sample Proportions
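Note: when there are two populations, we are interested in the difference of the two proportions. By the CLT, we have the following asymptotic Z statistic with two independent random samples
$$Z_n = \frac{(\hat\theta_1 - \hat\theta_2) - (\theta_1 - \theta_2)}{\sqrt{\dfrac{\theta_1(1-\theta_1)}{n_1} + \dfrac{\theta_2(1-\theta_2)}{n_2}}} \overset{D}{\to} N(0, 1)$$
Theorem
If $X_1 \sim b(n_1, \theta_1)$ and $X_2 \sim b(n_2, \theta_2)$ are independent, and $\hat\theta_i = \dfrac{x_i}{n_i}$, $i = 1, 2$, then
$$(\hat\theta_1 - \hat\theta_2) - z_{\alpha/2}\sqrt{\frac{\hat\theta_1(1-\hat\theta_1)}{n_1} + \frac{\hat\theta_2(1-\hat\theta_2)}{n_2}} < \theta_1 - \theta_2 < (\hat\theta_1 - \hat\theta_2) + z_{\alpha/2}\sqrt{\frac{\hat\theta_1(1-\hat\theta_1)}{n_1} + \frac{\hat\theta_2(1-\hat\theta_2)}{n_2}}$$
is an approximate $(1-\alpha)100\%$ CI for $\theta_1 - \theta_2$.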
Two-Sample Proportions
Example
In a comparative study of two new drugs, A and B, 120 patients were treated with drug A and 150 patients with drug B, and the following results were obtained.
             Drug A   Drug B
Cured            78      111
Not Cured        42       39
Total           120      150
• Construct a 95% confidence interval for the difference in the cure rates of the two drugs.
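A sketch of how this exercise could be worked with the two-sample proportion formula, using the counts from the table. The function name `two_proportion_interval` is a hypothetical helper, and the difference is taken as drug A minus drug B.

```python
from statistics import NormalDist

def two_proportion_interval(x1, n1, x2, n2, confidence):
    """Approximate CI for theta1 - theta2 from independent binomial counts."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p1, p2 = x1 / n1, x2 / n2
    se = (p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2) ** 0.5
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Cured counts and group sizes from the table: drug A 78/120, drug B 111/150
lo, hi = two_proportion_interval(78, 120, 111, 150, confidence=0.95)
print(f"95% CI for theta_A - theta_B: ({lo:.4f}, {hi:.4f})")
```

Whether the resulting interval contains 0 is the natural thing to check when judging if the two cure rates differ.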