unit3: statistical inferences

Unit3: Statistical Inferences

Wenyaw Chan

Division of Biostatistics, School of Public Health

University of Texas Health Science Center at Houston

Estimation

• Point Estimates

– A point estimate of a parameter θ is a single number used as an estimate of the value of θ.

– e.g. A natural estimate of the population mean µ is the sample mean $\bar{X} = \sum_{i=1}^{n} X_i / n$.

• Interval Estimation

– If a random interval I = (L, U) satisfies Pr(L < θ < U) = 1 - α, the observed values of L and U for a given sample are called a 1 - α confidence interval estimate for θ.

Which one is more accurate? Which one is more precise?

Estimation

What to estimate?

• B(n, p): proportion p

• Poisson(µ): mean µ

• N(µ, σ²): mean µ and/or variance σ²

Estimation of the Mean of a Distribution

• A point estimator of the population mean µ is the sample mean $\bar{X}$.

• The sampling distribution of $\bar{X}$ is the distribution of values of $\bar{X}$ over all possible samples of size n that could have been selected from the reference population.

• $E(\bar{X}) = \mu$

Estimation

• An estimator of a parameter is an unbiased estimator if its expectation is equal to the parameter.

• Note: Unbiasedness is not sufficient as the only criterion for choosing an estimator.

• The unbiased estimator with the minimum variance (MVUE) is preferred.

• If the population is normal, then $\bar{X}$ is the MVUE of µ.

Sample Mean

• Standard error (of the mean) = standard deviation of the sample mean $= \sqrt{\sigma^2/n} = \sigma/\sqrt{n}$

• The estimated standard error $= s/\sqrt{n}$, where s is the sample standard deviation.
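As a small illustration of the estimated standard error s/√n, here is a sketch in Python using only the standard library. The function name and the data values are made up for this example and are not part of the slides.

```python
import math
import statistics

def estimated_standard_error(sample):
    """Return s / sqrt(n), where s is the sample standard deviation."""
    n = len(sample)
    s = statistics.stdev(sample)  # sample s.d. (divides by n - 1)
    return s / math.sqrt(n)

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observations
print("sample mean:", statistics.fmean(data))
print("estimated standard error:", estimated_standard_error(data))
```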

Central Limit Theorem

• Let X1, …, Xn be a random sample from some population with mean µ and variance σ².

Then, for large n, approximately

$$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$$
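The theorem can be checked informally by simulation. The sketch below assumes an Exponential(1) population (mean 1, variance 1), which is a choice made only for this illustration, and compares the mean and variance of many simulated sample means with µ and σ²/n.

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(1); return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means

means = simulate_sample_means(n=50, reps=4000)
# Theory: mean of the sample means is about 1, variance about 1/50 = 0.02
print("mean of sample means:", statistics.fmean(means))
print("variance of sample means:", statistics.variance(means))
```

A histogram of `means` would look close to a normal curve even though the population itself is strongly skewed.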

Interval Estimation

• Let X1, …, Xn be a random sample from a normal population N(µ, σ²). If σ² is known, a 95% confidence interval (C.I.) for µ is

$$\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right)$$

why? (next slide)

Interval Estimation

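If $\bar{X} \sim N(\mu, \sigma^2/n)$, then

$$\Pr\left(-1.96 < \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} < 1.96\right) = .95,$$

i.e.

$$\Pr\left(-1.96\frac{\sigma}{\sqrt{n}} < \bar{X}-\mu < 1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$

$$\Pr\left(-\bar{X}-1.96\frac{\sigma}{\sqrt{n}} < -\mu < -\bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$

$$\Pr\left(\bar{X}-1.96\frac{\sigma}{\sqrt{n}} < \mu < \bar{X}+1.96\frac{\sigma}{\sqrt{n}}\right) = .95$$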
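The interval X̄ ± 1.96 σ/√n can be computed directly. The following Python sketch assumes σ is known and uses `statistics.NormalDist` for the normal percentile; the function name and the data are illustrative only.

```python
import math
from statistics import NormalDist, fmean

def z_confidence_interval(sample, sigma, level=0.95):
    """C.I. for the mean of a normal population with known sigma."""
    n = len(sample)
    xbar = fmean(sample)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for level = 0.95
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

data = [118, 122, 121, 119]  # hypothetical measurements, mean 120
low, high = z_confidence_interval(data, sigma=10)
print(low, high)
```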

Interval Estimation

Interpretation of Confidence Interval

• Over the collection of 95% confidence intervals that could be constructed from repeated random samples of size n, 95% of them will contain the parameter µ.

• It is wrong to say: There is a 95% chance that the parameter µ will fall within a particular 95% confidence interval. The parameter is a fixed number; once an interval has been computed, it either contains µ or it does not.
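The repeated-sampling interpretation can be illustrated by simulation. This sketch assumes a normal population with known σ and made-up values for µ, σ and n; it counts how often the 95% interval built from each simulated sample contains the true mean.

```python
import math
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, reps, level=0.95, seed=1):
    """Fraction of simulated known-sigma intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(reps):
        xbar = fmean([rng.gauss(mu, sigma) for _ in range(n)])
        if xbar - half_width < mu < xbar + half_width:
            hits += 1
    return hits / reps

cov = coverage(mu=120, sigma=10, n=25, reps=4000)
print("observed coverage:", cov)  # expected to be close to 0.95
```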

Interval Estimation

• Note:

1. When σ and n are fixed, a 99% C.I. is wider than a 95% C.I.

2. If the width of the C.I. is specified, the sample size can be determined. The length of the 95% C.I. is $2(1.96)\sigma/\sqrt{n}$, so n ↑ ⇒ length ↓ and σ ↑ ⇒ length ↑.
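To make note 2 concrete, the length formula 2 z σ/√n can be solved for n. The sketch below assumes σ is known; the function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def n_for_ci_length(sigma, length, level=0.95):
    """Smallest n whose known-sigma C.I. is no longer than `length`."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return math.ceil((2 * z * sigma / length) ** 2)

n95 = n_for_ci_length(sigma=10, length=4)
n99 = n_for_ci_length(sigma=10, length=4, level=0.99)
print(n95, n99)  # the 99% interval needs the larger sample
```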

Hypothesis Testing

• Null hypothesis(H0): the statement to be tested, usually reflecting the status quo.

• Alternative hypothesis (H1): the logical complement of H0.

• Note: the null hypothesis is analogous to the defendant in court. It is presumed to be true unless the data argue overwhelmingly to the contrary.

Hypothesis Testing

• Four possible outcomes of the decision:

                        Truth
Decision        H0              H1
Accept H0       OK              Type II error
Reject H0       Type I error    OK

• Notation:

– α = Pr(Type I error) = level of significance

– β = Pr(Type II error)

– 1 - β = power = Pr(reject H0 | H1 is true)

Hypothesis Testing

• Goal: to make α and β both small

• Facts: α ↓ then β ↑; β ↓ then α ↑

• General Strategy: fix α, minimize β

Testing for the Population Mean

• When the sample is from a normal population:

H0 : µ = 120 vs H1 : µ < 120

• The best test is based on $\bar{X}$, which is called the test statistic. The "best test" means that the test has the highest power among all tests with a given type I error.

Is there any bad test? Yes.

• Rejection Region:

– range of values of test statistic for which H0 is rejected.

One-tailed test

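• Our rejection region is $\bar{X} < c$.

• Now,

$$\alpha = \Pr(\text{Type I error}) = \Pr(\bar{X} < c \mid H_0 \text{ is true}) = \Pr\left(\bar{X} < c \mid \bar{X} \sim N(\mu_0, \sigma^2/n)\right) = \Phi\left(\frac{c-\mu_0}{\sigma/\sqrt{n}}\right),$$

i.e. $\frac{c-\mu_0}{\sigma/\sqrt{n}} = z_\alpha$, or $c = \mu_0 + z_\alpha\,\sigma/\sqrt{n}$, where $z_\alpha$ is the 100α-th percentile of N(0, 1).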
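The cutoff c = µ0 + z_α σ/√n can be evaluated numerically. This Python sketch assumes σ is known and takes z_α to be the lower 100α-th percentile of N(0, 1); the values of σ and n are made up for illustration.

```python
import math
from statistics import NormalDist

def lower_tail_critical_value(mu0, sigma, n, alpha=0.05):
    """Cutoff c such that Pr(Xbar < c | mu = mu0) = alpha."""
    z_alpha = NormalDist().inv_cdf(alpha)  # negative when alpha < 0.5
    return mu0 + z_alpha * sigma / math.sqrt(n)

c = lower_tail_critical_value(mu0=120, sigma=20, n=25)
# Check: under H0, Xbar ~ N(120, 20**2 / 25), so Pr(Xbar < c) should equal alpha
type_one_error = NormalDist(120, 20 / math.sqrt(25)).cdf(c)
print(c, type_one_error)
```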

Result

• To test H0 : µ = µ0 vs H1 : µ < µ0, based on a sample taken from a normal population with mean µ and variance σ² unknown, the test statistic is $t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$.

• Assume the level of significance is α. Then,

– if $t < t_{n-1,\alpha}$, then we reject H0.

– if $t \ge t_{n-1,\alpha}$, then we do not reject H0.
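A minimal sketch of the critical value method for this test follows. The data are hypothetical, and because the Python standard library has no t distribution, the critical value t_{9, 0.05} = -1.833 is taken from a t table rather than computed.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (xbar - mu0) / (s / sqrt(n))."""
    n = len(sample)
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)
    return (xbar - mu0) / (s / math.sqrt(n))

data = [112, 118, 109, 121, 115, 117, 110, 119, 113, 116]  # hypothetical sample, n = 10
t = one_sample_t(data, mu0=120)
t_crit = -1.833  # t_{9, 0.05}, looked up in a t table
decision = "reject H0" if t < t_crit else "do not reject H0"
print(t, decision)
```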

P-value

• The minimum α-level at which we can reject H0 based on the sample.

• The p-value can also be thought of as the probability of obtaining a test statistic as extreme as or more extreme than the actual test statistic obtained from the sample, given that the null hypothesis is true.

Remarks

• Two different approaches to determining statistical significance:

– Critical value method

– P-value method

One-tailed test

• Testing H0: µ = µ0 vs H1: µ > µ0 when σ² is unknown and the population is normal

Test Statistic: $t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$

Rejection Region: $t > t_{n-1,1-\alpha}$

p-value = $1 - F_{t,n-1}(t)$, where $F_{t,n-1}(\cdot)$ is the cdf of the t distribution with df = n-1.

• Note: If σ² is known, the s in the test statistic will be replaced by σ, $t_{n-1,1-\alpha}$ in the rejection region will be replaced by $z_{1-\alpha}$, and $F_{t,n-1}(t)$ will be replaced by Φ(t).
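For the known-σ case described in the note, both approaches (critical value and p-value) can be sketched with the standard normal distribution. The function name and inputs below are illustrative assumptions, not part of the slides.

```python
import math
from statistics import NormalDist

def upper_tail_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu = mu0 vs H1: mu > mu0 when sigma is known."""
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    p_value = 1 - NormalDist().cdf(z)         # upper-tail probability
    z_crit = NormalDist().inv_cdf(1 - alpha)  # z_{1 - alpha}
    return z, p_value, z > z_crit

z, p_value, reject = upper_tail_z_test(xbar=124, mu0=120, sigma=10, n=25)
print(z, p_value, reject)
```

The two methods always agree: z exceeds the critical value exactly when the p-value is below α.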

Testing For Two-Sided Alternative

• Let X1, …, Xn be a random sample from the population N(µ, σ²), where σ² is unknown.

• H0 : µ = µ0 vs H1 : µ ≠ µ0

– Test Statistic: $t = \frac{\bar{x}-\mu_0}{s/\sqrt{n}}$

– Rejection Region: $|t| > t_{n-1,1-\alpha/2}$

– p-value = $2F_{t,n-1}(t)$ if t ≤ 0; $2[1 - F_{t,n-1}(t)]$ if t > 0. (see figures on next slide)

• Warning: exact p-value requires use of computer.

Testing For Two-Sided Alternative

[Figure: the two-sided p-value shown as the two tail areas beyond $\bar{x}$ and $2\mu_0 - \bar{x}$, with one panel for $\bar{x} > \mu_0$ and one for $\bar{x} \le \mu_0$.]

The Power of A Test

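• To test H0 : µ = µ0 vs H1 : µ < µ0 in a normal population with known variance σ², the power is $\Phi\left[z_\alpha + (\mu_0-\mu_1)\sqrt{n}/\sigma\right]$, where µ1 is the true mean under H1.

• Review: Power = Pr[rejecting H0 | H0 is false]

• Factors Affecting the Power

1. α ↓ ⇒ $z_\alpha$ ↓ ⇒ power ↓

2. |µ0 - µ1| ↑ ⇒ power ↑

3. σ ↑ ⇒ power ↓

4. n ↑ ⇒ power ↑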
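The power formula Φ[z_α + (µ0 - µ1)√n/σ] and the four factors can be explored numerically. The sketch below assumes σ is known; all parameter values are made up for illustration.

```python
import math
from statistics import NormalDist

def power_lower_tail(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the level-alpha z test of H0: mu = mu0 vs H1: mu < mu0."""
    z_alpha = NormalDist().inv_cdf(alpha)
    return NormalDist().cdf(z_alpha + (mu0 - mu1) * math.sqrt(n) / sigma)

base = power_lower_tail(mu0=120, mu1=115, sigma=20, n=100)
print("baseline power:", base)
print("smaller alpha:", power_lower_tail(120, 115, 20, 100, alpha=0.01))
print("larger difference:", power_lower_tail(120, 110, 20, 100))
print("larger sigma:", power_lower_tail(120, 115, 30, 100))
print("larger n:", power_lower_tail(120, 115, 20, 200))
```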

The Power of The 1-Sample T Test

• To test H0 : µ = µ0 vs H1 : µ < µ0 in a normal population with unknown variance σ², the power, for true mean µ1 and true s.d. σ, is $F(t_{n-1,.05})$, where F( ) is the c.d.f. of the non-central t distribution with df = n-1 and non-centrality $(\mu_1-\mu_0)\sqrt{n}/\sigma$.

Notes:

1. $t_{n-1,0.05} = -t_{n-1,0.95}$.

2. If X and Y are independent random variables such that Y ~ N(δ, 1) and X ~ χ² with d.f. = m, then $Y/\sqrt{X/m}$ is said to have a non-central t distribution with non-centrality δ.

Power Function For Two-Sided Alternative

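• To test H0 : µ = µ0 vs H1 : µ ≠ µ0 in a normal population with known variance σ², the power is

$$\Phi\left[-z_{1-\alpha/2} + \frac{(\mu_0-\mu_1)\sqrt{n}}{\sigma}\right] + \Phi\left[-z_{1-\alpha/2} + \frac{(\mu_1-\mu_0)\sqrt{n}}{\sigma}\right],$$

where µ1 is the true alternative mean.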
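The two-sided power is the sum of two normal tail terms, Φ[-z_{1-α/2} + (µ0 - µ1)√n/σ] + Φ[-z_{1-α/2} + (µ1 - µ0)√n/σ]. The Python sketch below evaluates it, assuming σ is known; the function name and numbers are illustrative.

```python
import math
from statistics import NormalDist

def power_two_sided(mu0, mu1, sigma, n, alpha=0.05):
    """Power of the level-alpha z test of H0: mu = mu0 vs H1: mu != mu0."""
    phi = NormalDist().cdf
    z = NormalDist().inv_cdf(1 - alpha / 2)
    shift = (mu0 - mu1) * math.sqrt(n) / sigma
    return phi(-z + shift) + phi(-z - shift)

print(power_two_sided(mu0=120, mu1=115, sigma=20, n=100))
print(power_two_sided(mu0=120, mu1=120, sigma=20, n=100))  # equals alpha when mu1 = mu0
```

The power depends on µ1 only through |µ0 - µ1|, so alternatives equally far above and below µ0 give the same value.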

Case of Unknown Variance

• For the same test with an unknown variance population, the power is $F(-t_{n-1,1-\alpha/2}) + 1 - F(t_{n-1,1-\alpha/2})$, where F( ) is the c.d.f. of the non-central t distribution with df = n-1 and non-centrality $(\mu_1-\mu_0)\sqrt{n}/\sigma$.

Sample Size Determination

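For example: H0 : µ = µ0 vs H1 : µ < µ0

power: $\Phi\left[z_\alpha + (\mu_0-\mu_1)\sqrt{n}/\sigma\right] = 1-\beta$ if $z_\alpha + (\mu_0-\mu_1)\sqrt{n}/\sigma = z_{1-\beta}$.

Hence,

$$\sqrt{n} = \frac{(z_{1-\beta} - z_\alpha)\,\sigma}{\mu_0-\mu_1} = \frac{(z_{1-\beta} + z_{1-\alpha})\,\sigma}{\mu_0-\mu_1}$$

$$n = \frac{(z_{1-\beta} + z_{1-\alpha})^2\,\sigma^2}{(\mu_0-\mu_1)^2}$$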
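The one-sided sample size formula n = (z_{1-β} + z_{1-α})² σ² / (µ0 - µ1)² can be applied as follows. This is a sketch assuming σ is known, with the result rounded up to a whole number of subjects; the names and numbers are illustrative.

```python
import math
from statistics import NormalDist

def sample_size_one_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Smallest n giving at least the requested power for a one-sided z test."""
    z = NormalDist().inv_cdf
    n = (z(power) + z(1 - alpha)) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    return math.ceil(n)

n_needed = sample_size_one_sided(mu0=120, mu1=115, sigma=20)
print(n_needed)
```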

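Factors Affecting Sample Size

1. σ² ↑ ⇒ n ↑

2. α ↓ ⇒ n ↑

3. 1 - β ↑ ⇒ n ↑

4. |µ0 - µ1| ↑ ⇒ n ↓

• To test H0 : µ = µ0 vs H1 : µ ≠ µ0 when σ² is known, the sample size calculation is

$$n = \frac{(z_{1-\alpha/2} + z_{1-\beta})^2\,\sigma^2}{(\mu_0-\mu_1)^2}$$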
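For the two-sided test the same calculation uses z_{1-α/2} in place of z_{1-α}, that is n = (z_{1-α/2} + z_{1-β})² σ² / (µ0 - µ1)². A minimal sketch, again assuming known σ and illustrative inputs:

```python
import math
from statistics import NormalDist

def sample_size_two_sided(mu0, mu1, sigma, alpha=0.05, power=0.80):
    """Approximate n for a two-sided z test with the requested power."""
    z = NormalDist().inv_cdf
    n = (z(1 - alpha / 2) + z(power)) ** 2 * sigma ** 2 / (mu0 - mu1) ** 2
    return math.ceil(n)

n_two_sided = sample_size_two_sided(mu0=120, mu1=115, sigma=20)
print(n_two_sided)  # larger than the one-sided requirement for the same inputs
```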

Relationship between Hypothesis Testing and Confidence Interval

• To test H0 : µ = µ0 vs H1 : µ ≠ µ0, H0 is rejected with a two-sided level α test if and only if the two-sided 100(1 - α)% confidence interval for µ does not contain µ0.

One Sample Test for the Variance of A Normal Population

One Sample Test for A Proportion

Exact Method

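• If $\hat{p} < p_0$, the p-value

$$= 2\Pr[\le \#\text{ of events in the observation} \mid X \sim B(n, p_0)] = 2\sum_{k=0}^{\#\text{ of events}} \binom{n}{k} p_0^{k}(1-p_0)^{n-k}$$

• If $\hat{p} \ge p_0$, the p-value

$$= 2\Pr[\ge \#\text{ of events in the observation} \mid X \sim B(n, p_0)] = 2\sum_{k=\#\text{ of events}}^{n} \binom{n}{k} p_0^{k}(1-p_0)^{n-k}$$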
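The exact method doubles a binomial tail probability computed under p0: the lower tail when p̂ < p0 and the upper tail otherwise. The sketch below implements it with `math.comb`. The function names are illustrative, and the cap at 1 is an assumption added here so the result stays a valid probability; it does not appear in the slide formulas.

```python
from math import comb

def binom_pmf(k, n, p):
    """Pr(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def exact_binomial_p_value(x, n, p0):
    """Two-sided exact p-value for H0: p = p0, given x events in n trials."""
    if x / n < p0:
        tail = sum(binom_pmf(k, n, p0) for k in range(0, x + 1))  # Pr(X <= x)
    else:
        tail = sum(binom_pmf(k, n, p0) for k in range(x, n + 1))  # Pr(X >= x)
    return min(2 * tail, 1.0)  # cap at 1: an assumption, not in the slides

print(exact_binomial_p_value(x=2, n=10, p0=0.5))
```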

Power and Sample size

One-Sample Inference for the Poisson Distribution

• X ~ Poisson with mean µ

• To test H0 : µ = µ0 vs H1 : µ ≠ µ0 at α level of significance,

– Obtain a two-sided 100(1 - α)% C.I. for µ, say (C1, C2)

– If µ0 ∈ (C1, C2), we accept H0; otherwise we reject H0.

One-Sample Inference for the Poisson Distribution

• The p-value (for the above two-sided test)

– If observed X < µ0, then $P = \min[2F(x \mid \mu_0),\ 1]$

– If observed X > µ0, then $P = \min[2(1 - F(x-1 \mid \mu_0)),\ 1]$

where F(x | µ0) is the Poisson c.d.f. with mean µ0.
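The two-sided Poisson p-value uses min[2F(x | µ0), 1] when x < µ0 and min[2(1 - F(x - 1 | µ0)), 1] when x > µ0. A minimal Python sketch follows. The function names are illustrative, and treating the case x = µ0 with the upper-tail branch is an assumption, since the slide covers only x < µ0 and x > µ0.

```python
import math

def poisson_cdf(x, mu):
    """F(x | mu) = Pr(X <= x) for X ~ Poisson(mu); 0 when x < 0."""
    if x < 0:
        return 0.0
    return sum(math.exp(-mu) * mu ** k / math.factorial(k) for k in range(x + 1))

def exact_poisson_p_value(x, mu0):
    """Two-sided exact p-value for H0: mu = mu0, given an observed count x."""
    if x < mu0:
        return min(2 * poisson_cdf(x, mu0), 1.0)
    # x >= mu0 (the x == mu0 case is an assumption, see the note above)
    return min(2 * (1 - poisson_cdf(x - 1, mu0)), 1.0)

print(exact_poisson_p_value(x=0, mu0=1.0))
print(exact_poisson_p_value(x=5, mu0=1.0))
```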

Large-Sample Test for Poisson (for µ0 ≥ 10)

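• To test H0 : µ = µ0 vs H1 : µ ≠ µ0 at α level of significance,

– Test Statistic: $X^2 = \frac{(x-\mu_0)^2}{\mu_0} = \mu_0\left(\frac{\text{SMR}}{100} - 1\right)^2 \sim \chi^2_1$ under H0, where SMR = 100·x/µ0 is the standardized mortality (or morbidity) ratio.

– Rejection Region: $X^2 > \chi^2_{1,1-\alpha}$

– p-value: $\Pr(\chi^2_1 > X^2)$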
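The large-sample test compares X² = (x - µ0)²/µ0 with a chi-square distribution with 1 d.f. Because a χ²₁ variable is the square of a standard normal, the sketch below computes Pr(χ²₁ > X²) as 2[1 - Φ(√X²)] and the critical value as z²_{1-α/2}, using only the standard library. The function name and the numbers are illustrative.

```python
import math
from statistics import NormalDist

def large_sample_poisson_test(x, mu0, alpha=0.05):
    """Chi-square (1 d.f.) test of H0: mu = mu0; intended for mu0 >= 10."""
    x2 = (x - mu0) ** 2 / mu0
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(x2)))   # Pr(chi2_1 > x2)
    critical = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # chi2_{1, 1 - alpha}
    return x2, p_value, x2 > critical

x2, p_value, reject = large_sample_poisson_test(x=24, mu0=16)
print(x2, p_value, reject)
```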
