unit3: statistical inferences
![Page 1: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/1.jpg)
Unit3: Statistical Inferences
Wenyaw Chan
Division of Biostatistics
School of Public Health
University of Texas - Health Science Center at Houston
![Page 2: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/2.jpg)
Estimation
• Point Estimates
– A point estimate of a parameter θ is a single number used as an estimate of the value of θ.
– e.g. A natural estimate of the population mean µ is the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.
• Interval Estimation
– If a random interval I = (L, U) satisfies Pr(L < θ < U) = 1 − α, then the observed values of L and U for a given sample are called a 1 − α confidence interval estimate for θ.
Which one is more accurate? Which one is more precise?
![Page 3: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/3.jpg)
Estimation
What to estimate?
• B(n, p): proportion p
• Poisson(µ): mean µ
• N(µ, σ²): mean µ and/or variance σ²
![Page 4: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/4.jpg)
Estimation of the Mean of a Distribution
• A point estimator of the population mean µ is the sample mean $\bar{X}$.
• The sampling distribution of $\bar{X}$ is the distribution of values of $\bar{x}$ over all possible samples of size n that could have been selected from the reference population.
• $E(\bar{X}) = \mu$
![Page 5: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/5.jpg)
Estimation
• An estimator of a parameter is unbiased if its expectation is equal to the parameter.
• Note: Unbiasedness alone is not a sufficient criterion for choosing an estimator.
• The unbiased estimator with the minimum variance (MVUE) is preferred.
• If the population is normal, then $\bar{X}$ is the MVUE of µ.
![Page 6: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/6.jpg)
Sample Mean
• Standard error (of the mean) = standard deviation of the sample mean = $\sqrt{\sigma^2/n} = \sigma/\sqrt{n}$
• The estimated standard error is $s/\sqrt{n}$, where s is the sample standard deviation.
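As a small illustration of the estimated standard error s/√n, the sketch below uses only Python's standard library. The function name and the data values are hypothetical and chosen only for illustration.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Return s / sqrt(n), where s is the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample s.d. with the n - 1 denominator
    return s / math.sqrt(n)

# Hypothetical data, not taken from the slides.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
se = estimated_standard_error(sample)
print(round(se, 4))
```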
![Page 7: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/7.jpg)
Central Limit Theorem
• Let X1,…,Xn be a random sample from some population with mean µ and variance σ².
Then, for large n, $\bar{X}$ is approximately distributed as $N(\mu, \sigma^2/n)$.
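A quick simulation can make the theorem concrete. The sketch below is an illustration only: it assumes an exponential population with mean 1 and variance 1 (a choice made here, not in the slides) and checks that the simulated sample means have mean near µ and variance near σ²/n.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(rate=1) and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(n=50, reps=4000)
mean_of_means = statistics.fmean(means)    # should be close to mu = 1
var_of_means = statistics.variance(means)  # should be close to sigma^2 / n = 1 / 50
print(mean_of_means, var_of_means)
```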
![Page 8: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/8.jpg)
Interval Estimation
• Let X1,…,Xn be a random sample from a normal population N(µ, σ²). If σ² is known, a 95% confidence interval (C.I.) for µ is
$\left(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right)$
why? (next slide)
![Page 9: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/9.jpg)
Interval Estimation
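If $\bar{X} \sim N(\mu, \sigma^2/n)$, then $\Pr\left(-1.96 < \dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} < 1.96\right) = .95$,
i.e.
$\Pr\left(-1.96\,\sigma/\sqrt{n} < \bar{X} - \mu < 1.96\,\sigma/\sqrt{n}\right) = .95$
$\Pr\left(-\bar{X} - 1.96\,\sigma/\sqrt{n} < -\mu < -\bar{X} + 1.96\,\sigma/\sqrt{n}\right) = .95$
$\Pr\left(\bar{X} - 1.96\,\sigma/\sqrt{n} < \mu < \bar{X} + 1.96\,\sigma/\sqrt{n}\right) = .95$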
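The interval derived above can be computed directly. This is a minimal sketch assuming σ is known. The function name, the data, and σ = 6 are hypothetical values for illustration.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, level=0.95):
    """Two-sided C.I. for the mean of a normal population with known sigma."""
    n = len(sample)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 when level = 0.95
    half_width = z * sigma / math.sqrt(n)
    xbar = fmean(sample)
    return xbar - half_width, xbar + half_width

# Hypothetical sample with mean 120; sigma = 6 is assumed known.
sample = [118.0, 122.0, 125.0, 115.0, 120.0, 119.0, 121.0, 124.0, 116.0]
lower, upper = z_confidence_interval(sample, sigma=6.0)
print(round(lower, 2), round(upper, 2))
```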
![Page 10: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/10.jpg)
Interval Estimation
Interpretation of Confidence Interval
• Over the collection of 95% confidence intervals that could be constructed from repeated random samples of size n, 95% of them will contain the parameter.
• It is wrong to say: There is a 95% chance that the parameter will fall within a particular 95% confidence interval.
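The repeated-sampling interpretation can be checked by simulation. The sketch below assumes a normal population with known σ. The values µ = 120, σ = 24, n = 25 and the function name are hypothetical. It counts how often the interval covers the true µ, which should be close to 95% of the time.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, reps, level=0.95, seed=1):
    """Fraction of simulated known-sigma confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

rate = coverage(mu=120.0, sigma=24.0, n=25, reps=4000)
print(rate)
```

Each individual interval either contains µ or it does not; the 95% describes the long-run fraction over repeated samples.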
![Page 11: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/11.jpg)
Interval Estimation
• Note:
1. When σ and n are fixed, the 99% C.I. is wider than the 95% C.I.
2. If the width of the C.I. is specified, the sample size can be determined.
n ↑ ⇒ length ↓; σ ↑ ⇒ length ↑
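Note 2 can be sketched as follows. With known σ, the two-sided interval has length 2·z·σ/√n, so requiring a length of at most `width` gives n ≥ (2·z·σ/width)². The function name and the numbers are hypothetical.

```python
import math
from statistics import NormalDist

def n_for_ci_width(sigma, width, level=0.95):
    """Smallest n for which the known-sigma C.I. has total length <= width."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((2 * z * sigma / width) ** 2)

n95 = n_for_ci_width(sigma=10.0, width=4.0)              # 95% interval
n99 = n_for_ci_width(sigma=10.0, width=4.0, level=0.99)  # 99% interval needs more data
print(n95, n99)
```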
![Page 12: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/12.jpg)
Hypothesis Testing
• Null hypothesis (H0): the statement to be tested, usually reflecting the status quo.
• Alternative hypothesis (H1): the logical complement of H0.
• Note: the null hypothesis is analogous to the defendant in court. It is presumed to be true unless the data argue overwhelmingly to the contrary.
![Page 13: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/13.jpg)
Hypothesis Testing
• Four possible outcomes of the decision:

| Decision | Truth: H0 | Truth: H1 |
|---|---|---|
| Accept H0 | OK | Type II error |
| Reject H0 | Type I error | OK |

• Notation:
– α = Pr(Type I error) = level of significance
– β = Pr(Type II error)
– 1 − β = power = Pr(reject H0 | H1 is true)
![Page 14: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/14.jpg)
Hypothesis Testing
• Goal: to make α and β both small
• Facts: α ↓ then β ↑; β ↓ then α ↑
• General Strategy: fix α, minimize β
![Page 15: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/15.jpg)
Testing for the Population Mean
• When the sample is from a normal population:
H0 : µ = 120 vs H1 : µ < 120
• The best test is based on $\bar{X}$, which is called the test statistic. The "best test" means that the test has the highest power among all tests with a given type I error.
Is there any bad test? Yes.
• Rejection Region:
– range of values of the test statistic for which H0 is rejected.
![Page 16: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/16.jpg)
One-tailed test
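• Our rejection region is $\bar{X} < c$.
• Now,
$\alpha = \Pr(\text{Type I error} \mid H_0 \text{ is true})$
$= \Pr\left(\bar{X} < c \mid \bar{X} \sim N(\mu_0, \sigma^2/n)\right)$
$= \Phi\left(\dfrac{c - \mu_0}{\sigma/\sqrt{n}}\right)$,
i.e. $\dfrac{c - \mu_0}{\sigma/\sqrt{n}} = Z_\alpha$ or $c = \mu_0 + Z_\alpha\,\sigma/\sqrt{n}$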
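The cutoff c can be computed numerically. The sketch below assumes Z_α is the lower α-th percentile of N(0, 1), which is negative for α < 0.5. µ0 = 120 is taken from the earlier slide, while σ = 24 and n = 100 are hypothetical values. It also confirms that the resulting type I error equals α.

```python
import math
from statistics import NormalDist

def lower_critical_value(mu0, sigma, n, alpha=0.05):
    """Cutoff c such that Pr(Xbar < c | mu = mu0) = alpha, with sigma known."""
    z_alpha = NormalDist().inv_cdf(alpha)  # lower alpha-th percentile of N(0, 1)
    return mu0 + z_alpha * sigma / math.sqrt(n)

c = lower_critical_value(mu0=120.0, sigma=24.0, n=100)

# Check: under H0, Xbar ~ N(mu0, sigma^2 / n), so Pr(Xbar < c) should equal alpha.
type_i = NormalDist(mu=120.0, sigma=24.0 / math.sqrt(100)).cdf(c)
print(round(c, 3), round(type_i, 4))
```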
![Page 17: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/17.jpg)
Result
• To test H0 : µ = µ0 vs H1 : µ < µ0, based on a sample taken from a normal population with mean and variance unknown, the test statistic is $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$.
• Assume the level of significance is α. Then,
– if $t < t_{n-1,\,\alpha}$, we reject H0.
– if $t \ge t_{n-1,\,\alpha}$, we do not reject H0.
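A minimal sketch of the critical value method for this test follows. The data are hypothetical, and the critical value t_{9, 0.05} = −1.833 is hard-coded from a standard t table because Python's standard library has no t distribution.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Test statistic t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu0) / (s / math.sqrt(n))

# Hypothetical sample of size 10, testing H0: mu = 120 vs H1: mu < 120.
sample = [112.0, 118.0, 109.0, 121.0, 115.0, 110.0, 117.0, 113.0, 119.0, 116.0]
t = one_sample_t(sample, mu0=120.0)

T_CRIT = -1.833  # t_{9, 0.05}, copied from a t table (assumption: alpha = 0.05)
reject = t < T_CRIT
print(round(t, 3), reject)
```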
![Page 18: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/18.jpg)
P-value
• The minimum α-level at which we can reject H0 based on the sample.
• The P-value can also be thought of as the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained from the sample, given that the null hypothesis is true.
![Page 19: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/19.jpg)
Remarks
• Two different approaches to determining statistical significance:
– Critical value method
– P-value method
![Page 20: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/20.jpg)
One-tailed test
• Testing H0: µ = µ0 vs H1: µ > µ0 when σ² is unknown and the population is normal
Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
Rejection Region: $t > t_{n-1,\,1-\alpha}$
p-value = $1 - F_{t,n-1}(t)$, where $F_{t,n-1}(\cdot)$ is the cdf of the t distribution with df = n−1.
• Note: If σ² is known, the s in the test statistic is replaced by σ, $t_{n-1,\,1-\alpha}$ in the rejection region is replaced by $z_{1-\alpha}$, and $F_{t,n-1}(t)$ is replaced by Ф(t).
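The known-variance version described in the note can be sketched with the standard normal distribution alone. The numbers below are hypothetical, and the upper critical value is taken as the (1 − α)-th percentile of N(0, 1), consistent with the percentile convention used for the two-sided test.

```python
import math
from statistics import NormalDist

def upper_tailed_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """H0: mu = mu0 vs H1: mu > mu0 with known sigma."""
    nd = NormalDist()
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 1 - nd.cdf(z)        # upper-tail probability
    z_crit = nd.inv_cdf(1 - alpha) # (1 - alpha)-th percentile
    return z, p_value, z > z_crit

z, p, reject = upper_tailed_z_test(xbar=124.0, mu0=120.0, sigma=24.0, n=100)
print(round(z, 3), round(p, 4), reject)
```

The critical value method and the P-value method give the same decision: z exceeds the critical value exactly when p < α.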
![Page 21: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/21.jpg)
Testing For Two-Sided Alternative
• Let X1,…,Xn be a random sample from the population N(µ, σ²), where σ² is unknown.
• H0 : µ=µ0 vs H1 : µ≠µ0
– Test Statistic: $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$
– Rejection Region: $|t| > t_{n-1,\,1-\alpha/2}$
– p-value = $2\,F_{t,n-1}(t)$ if t ≤ 0; $2\,[1 - F_{t,n-1}(t)]$ if t > 0. (see figures on next slide)
• Warning: exact p-value requires use of computer.
![Page 22: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/22.jpg)
Testing For Two-Sided Alternative
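[Figure: the two-sided p-value shown as the sum of two tail areas. Panel "P-value for $\bar{x} > \mu_0$": tails below $2\mu_0 - \bar{x}$ and above $\bar{x}$. Panel "P-value for $\bar{x} \le \mu_0$": tails below $\bar{x}$ and above $2\mu_0 - \bar{x}$.]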
![Page 23: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/23.jpg)
The Power of A Test
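• To test H0 : µ=µ0 vs H1 : µ<µ0 in a normal population with known variance σ², the power is $\Phi\left[Z_\alpha + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right]$, where µ1 is the true mean under H1.
• Review: Power = Pr[rejecting H0 | H0 is false]
• Factors Affecting the Power
1. α ↓ ⇒ $Z_\alpha$ ↓ ⇒ power ↓
2. |µ0 − µ1| ↑ ⇒ power ↑
3. σ ↑ ⇒ power ↓
4. n ↑ ⇒ power ↑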
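The power expression can be evaluated numerically. The sketch below assumes Z_α is the lower α-th percentile of N(0, 1). The values µ0 = 120, µ1 = 115, σ = 24 and n = 100 are hypothetical.

```python
import math
from statistics import NormalDist

def power_lower_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the known-sigma test of H0: mu = mu0 vs H1: mu < mu0 at true mean mu1."""
    nd = NormalDist()
    z_alpha = nd.inv_cdf(alpha)  # lower alpha-th percentile, negative for alpha < 0.5
    return nd.cdf(z_alpha + (mu0 - mu1) * math.sqrt(n) / sigma)

power = power_lower_tailed(mu0=120.0, mu1=115.0, sigma=24.0, n=100)
power_bigger_n = power_lower_tailed(mu0=120.0, mu1=115.0, sigma=24.0, n=200)
print(round(power, 3), round(power_bigger_n, 3))
```

Changing one argument at a time reproduces the four factors listed on the slide.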
![Page 24: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/24.jpg)
The Power of The 1-Sample T Test
• To test H0 : µ=µ0 vs H1 : µ<µ0 in a normal population with unknown variance σ², the power, for true mean µ1 and true s.d. σ, is $F(t_{n-1,\,.05})$, where F( ) is the c.d.f. of the non-central t distribution with df = n−1 and non-centrality $(\mu_1 - \mu_0)\sqrt{n}/\sigma$.
Notes:
1. $t_{n-1,\,0.05} = -t_{n-1,\,0.95}$.
2. If X and Y are independent random variables such that Y ~ N(δ, 1) and X ~ χ² with d.f. = m, then $Y/\sqrt{X/m}$ is said to have a non-central t distribution with non-centrality δ.
![Page 25: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/25.jpg)
Power Function For Two-Sided Alternative
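• To test H0 : µ=µ0 vs H1 : µ≠µ0 in a normal population with known variance σ², the power is
$\Phi\left[-Z_{1-\alpha/2} + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right] + \Phi\left[-Z_{1-\alpha/2} + (\mu_1 - \mu_0)\sqrt{n}/\sigma\right]$,
where µ1 is the true alternative mean.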
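The two-sided power function can be sketched in the same way. The parameter values are hypothetical. A useful sanity check is that the power reduces to α when µ1 = µ0.

```python
import math
from statistics import NormalDist

def power_two_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the known-sigma two-sided test of H0: mu = mu0 at true mean mu1."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)  # Z_{1 - alpha/2}
    shift = (mu0 - mu1) * math.sqrt(n) / sigma
    return nd.cdf(-z + shift) + nd.cdf(-z - shift)

power_alt = power_two_sided(mu0=120.0, mu1=115.0, sigma=24.0, n=100)
power_null = power_two_sided(mu0=120.0, mu1=120.0, sigma=24.0, n=100)
print(round(power_alt, 3), round(power_null, 3))
```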
![Page 26: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/26.jpg)
Case of Unknown Variance
• For the same test with an unknown variance population, the power is $F(-t_{n-1,\,1-\alpha/2}) + 1 - F(t_{n-1,\,1-\alpha/2})$, where F( ) is the c.d.f. of the non-central t distribution with df = n−1 and non-centrality $(\mu_1 - \mu_0)\sqrt{n}/\sigma$.
![Page 27: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/27.jpg)
Sample Size Determination
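For example: H0 : µ=µ0 vs H1 : µ<µ0
power: $\Phi\left[Z_\alpha + (\mu_0 - \mu_1)\sqrt{n}/\sigma\right] = 1 - \beta$ if $\mu_1 < \mu_0$
Hence,
$Z_\alpha + (\mu_0 - \mu_1)\sqrt{n}/\sigma = Z_{1-\beta}$
$n = \dfrac{(Z_{1-\beta} - Z_\alpha)^2\,\sigma^2}{(\mu_0 - \mu_1)^2}$
$n = \dfrac{(Z_{1-\beta} + Z_{1-\alpha})^2\,\sigma^2}{(\mu_0 - \mu_1)^2}$, since $-Z_\alpha = Z_{1-\alpha}$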
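The final formula translates directly into code. This sketch rounds n up to the next integer. The inputs µ0 = 120, µ1 = 115, σ = 24, α = 0.05 and power = 0.80 are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n giving the requested power for the one-sided known-sigma test."""
    nd = NormalDist()
    z_sum = nd.inv_cdf(power) + nd.inv_cdf(1 - alpha)  # Z_{1-beta} + Z_{1-alpha}
    return math.ceil(sigma**2 * z_sum**2 / (mu0 - mu1) ** 2)

n_required = sample_size_one_sided(mu0=120.0, mu1=115.0, sigma=24.0)
print(n_required)
```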
![Page 28: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/28.jpg)
Factor Affecting Sample Size
1. σ² ↑ ⇒ n ↑
2. α ↓ ⇒ n ↑
3. 1 − β ↑ ⇒ n ↑
4. |µ0 − µ1| ↑ ⇒ n ↓
• To test H0 : µ=µ0 vs H1 : µ≠µ0, where σ² is known, the sample size calculation is
$n = \dfrac{(Z_{1-\alpha/2} + Z_{1-\beta})^2\,\sigma^2}{(\mu_0 - \mu_1)^2}$
![Page 29: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/29.jpg)
Relationship between Hypothesis Testing and Confidence Interval
• To test H0 : µ=µ0 vs H1 : µ≠µ0, H0 is rejected by a two-sided level α test if and only if the two-sided 100(1 − α)% confidence interval for µ does not contain µ0.
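The equivalence can be checked numerically. The sketch below uses the known-σ z test and z interval as a simplifying assumption (the slide's statement is not limited to that case). The function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_rejects(xbar, mu0, sigma, n, alpha=0.05):
    """Two-sided level-alpha z test of H0: mu = mu0."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    return abs(z) > NormalDist().inv_cdf(1 - alpha / 2)

def ci_excludes(xbar, mu0, sigma, n, alpha=0.05):
    """True when the 100(1 - alpha)% z interval for mu does not contain mu0."""
    half = NormalDist().inv_cdf(1 - alpha / 2) * sigma / math.sqrt(n)
    return not (xbar - half < mu0 < xbar + half)

# Compare the two decisions over several hypothetical null values.
candidates = [110.0, 112.0, 118.0, 119.5, 120.0, 121.0]
decisions = [(z_test_rejects(115.0, m, 24.0, 100), ci_excludes(115.0, m, 24.0, 100))
             for m in candidates]
print(decisions)
```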
![Page 30: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/30.jpg)
One Sample Test for the Variance of A Normal Population
![Page 31: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/31.jpg)
One Sample Test for A Proportion
![Page 32: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/32.jpg)
Exact Method
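• If $\hat{p} < p_0$, the p-value
$= 2 \times \Pr[\le \text{\# of events in the observation} \mid X \sim B(n, p_0)]$
$= 2 \sum_{k=0}^{\text{\# of events}} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$
• If $\hat{p} \ge p_0$, the p-value
$= 2 \times \Pr[\ge \text{\# of events in the observation} \mid X \sim B(n, p_0)]$
$= 2 \sum_{k=\text{\# of events}}^{n} \binom{n}{k} p_0^k (1 - p_0)^{n-k}$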
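The exact method amounts to doubling a binomial tail sum. In the sketch below the result is capped at 1, which is an assumption added here because the doubled tail can exceed 1; the slide does not state the cap. The function names and inputs are hypothetical.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def exact_binomial_p_value(x, n, p0):
    """Two-sided exact p-value for H0: p = p0, given x events in n observations."""
    if x / n < p0:
        tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # Pr(X <= x)
    else:
        tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # Pr(X >= x)
    return min(1.0, 2 * tail)  # cap at 1 (assumption, not on the slide)

p_low = exact_binomial_p_value(2, 10, 0.5)
p_high = exact_binomial_p_value(8, 10, 0.5)
p_mid = exact_binomial_p_value(5, 10, 0.5)
print(p_low, p_high, p_mid)
```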
![Page 33: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/33.jpg)
Power and Sample size
![Page 34: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/34.jpg)
One-Sample Inference for the Poisson Distribution
• X ~ Poisson with mean µ
• To test H0 : µ=µ0 vs H1 : µ≠µ0 at α level of significance,
– Obtain a two-sided 100(1 − α)% C.I. for µ, say (C1, C2)
– If µ0 ∈ (C1, C2), we accept H0; otherwise reject H0.
![Page 35: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/35.jpg)
One-Sample Inference for the Poisson Distribution
• The p-value (for the above two-sided test)
– If observed X < µ0, then $P = \min[2F(x \mid \mu_0),\ 1]$
– If observed X > µ0, then $P = \min[2(1 - F(x - 1 \mid \mu_0)),\ 1]$
where F(x | µ0) is the Poisson c.d.f. with mean = µ0.
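The two p-value formulas can be sketched with a directly summed Poisson c.d.f. The slide covers X < µ0 and X > µ0; treating X = µ0 with the upper-tail formula is an assumption made here. The function names and inputs are hypothetical.

```python
import math

def poisson_cdf(x, mu):
    """F(x | mu) = Pr(X <= x) for X ~ Poisson(mu); returns 0 for x < 0."""
    if x < 0:
        return 0.0
    return sum(math.exp(-mu) * mu**k / math.factorial(k) for k in range(x + 1))

def poisson_p_value(x, mu0):
    """Two-sided exact p-value for H0: mu = mu0 given an observed count x."""
    if x < mu0:
        return min(2 * poisson_cdf(x, mu0), 1.0)
    # x >= mu0 (the x == mu0 case is an assumption, see lead-in)
    return min(2 * (1 - poisson_cdf(x - 1, mu0)), 1.0)

p_below = poisson_p_value(0, 1.0)
p_above = poisson_p_value(3, 1.0)
print(round(p_below, 4), round(p_above, 4))
```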
![Page 36: Unit3: Statistical Inferences](https://reader036.vdocuments.site/reader036/viewer/2022062521/56816016550346895dcf1857/html5/thumbnails/36.jpg)
Large-Sample Test for Poisson (for µ0 ≥ 10)
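• To test H0 : µ=µ0 vs H1 : µ≠µ0 at α level of significance,
– Test Statistic: $X^2 = \dfrac{(x - \mu_0)^2}{\mu_0} = \mu_0\left(\dfrac{\text{SMR}}{100} - 1\right)^2 \sim \chi^2_1$ under H0, where SMR = 100·x/µ0
– Rejection Region: $X^2 > \chi^2_{1,\,1-\alpha}$
– p-value: $\Pr(\chi^2_1 > X^2)$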
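A minimal sketch of the large-sample test follows. It relies on the fact that a χ² variable with 1 df is the square of a standard normal, so Pr(χ²₁ > c) = Pr(|Z| > √c) and χ²_{1,1−α} = (Z_{1−α/2})². The observed count x = 18 and µ0 = 10 are hypothetical.

```python
import math
from statistics import NormalDist

def poisson_large_sample_test(x, mu0, alpha=0.05):
    """Chi-square (1 df) test of H0: mu = mu0 for a Poisson count x."""
    x2 = (x - mu0) ** 2 / mu0
    # Pr(chi2_1 > c) = Pr(|Z| > sqrt(c)) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(x2 / 2))
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2  # chi2_{1, 1-alpha}
    return x2, p_value, x2 > crit

x2, p, reject = poisson_large_sample_test(x=18, mu0=10.0)
print(round(x2, 2), round(p, 4), reject)
```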