ch9-4710

Inferences on Proportions Tieming Ji Fall 2012 1 / 19


Inferences on Proportions

Tieming Ji

Fall 2012


Example: One measure of quality and customer satisfaction is repeat business. A supplier of paper used for computer printouts sampled 75 customer accounts last year and found that 40 of these had placed more than one order during the year. Estimate the proportion of repeat business, and give a $100(1-\alpha)\%$ confidence interval for the proportion of repeat business.

Let X indicate whether a customer reorders: X = 1 for repeat business, X = 0 for no repeat business.

X ∼ Bernoulli(p)

We want to estimate p. Notice that $E(X) = p$. Thus, for an i.i.d. sample of size n, an unbiased point estimator is

$$\hat p = \bar X = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{40}{75} = \frac{8}{15}.$$
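As a minimal sketch, assuming the 75 accounts are coded as a list of 0/1 indicators (the individual outcomes are hypothetical; only the count of 40 repeat customers comes from the example), the estimate is just the sample mean of the indicators:

```python
from fractions import Fraction

# Hypothetical 0/1 outcomes: 40 repeat customers (X = 1) out of n = 75 accounts.
outcomes = [1] * 40 + [0] * 35

n = len(outcomes)
p_hat = Fraction(sum(outcomes), n)  # sample mean X-bar, kept exact as a fraction

print(p_hat, float(p_hat))  # 8/15 0.5333333333333333
```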


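We have found an unbiased point estimator for p. What about a C.I. estimate?

Consider $\hat p = \bar X$. What have we learned about $\bar X$? By the CLT,

$$\frac{\bar X - \mu_X}{\sigma_X/\sqrt{n}} \overset{\cdot}{\sim} N(0, 1) \quad \text{for sufficiently large } n.$$

We have $\mu_X = E(X) = p$ and $\sigma_X^2 = \mathrm{Var}(X) = p(1-p)$. So

$$\frac{\hat p - p}{\sqrt{p(1-p)/n}} \overset{\cdot}{\sim} N(0, 1).$$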
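To get a feel for how good the normal approximation is at n = 75, one can compute exactly, using $n\hat p \sim \text{Binomial}(n, p)$, the probability that the standardized statistic falls in $[-1.96, 1.96]$. This sketch assumes p = 0.5 purely for illustration (the true p is unknown); the result should come out close to, but not exactly equal to, the nominal 0.95.

```python
import math

def exact_coverage(n: int, p: float, z: float = 1.96) -> float:
    """P(|(p_hat - p) / sqrt(p(1-p)/n)| <= z) when n * p_hat ~ Binomial(n, p)."""
    se = math.sqrt(p * (1 - p) / n)
    total = 0.0
    for k in range(n + 1):
        p_hat = k / n
        if abs((p_hat - p) / se) <= z:
            # add the exact binomial probability of observing k successes
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return total

print(round(exact_coverage(75, 0.5), 4))
```

Because $\hat p$ only takes the values k/n, the exact probability moves in small jumps as n or p changes; the CLT only promises it approaches 0.95 as n grows.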


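Since $\frac{\hat p - p}{\sqrt{p(1-p)/n}} \overset{\cdot}{\sim} N(0, 1)$,

$$P\left(-z_{\alpha/2} \le \frac{\hat p - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) \approx 1-\alpha,$$

$$P\left(\hat p - z_{\alpha/2}\sqrt{p(1-p)/n} \le p \le \hat p + z_{\alpha/2}\sqrt{p(1-p)/n}\right) \approx 1-\alpha.$$

These bounds still involve the unknown p, so we replace p by $\hat p$ inside the square root:

$$P\left(\hat p - z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n} \le p \le \hat p + z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}\right) \approx 1-\alpha.$$

C.I. for p:

$$\hat p \pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}.$$

Thus, the C.I. for p at level $(1-\alpha)$ confidence is

$$\frac{40}{75} \pm z_{\alpha/2}\sqrt{\frac{40}{75}\left(1-\frac{40}{75}\right)\Big/75}.$$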
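A minimal sketch of this interval (often called the Wald interval), assuming a 95% level; `wald_ci` is a hypothetical helper name, and $z_{\alpha/2}$ is taken from Python's `statistics.NormalDist` instead of a table:

```python
import math
from statistics import NormalDist

def wald_ci(x: int, n: int, conf: float = 0.95) -> tuple:
    """Return p_hat -/+ z_{alpha/2} * sqrt(p_hat (1 - p_hat) / n)."""
    alpha = 1 - conf
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}, about 1.96 for 95%
    p_hat = x / n
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)  # half interval length d
    return p_hat - half, p_hat + half

lo, hi = wald_ci(40, 75)
print(f"[{lo:.3f}, {hi:.3f}]")  # [0.420, 0.646]
```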


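For the C.I. $\hat p \pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}$, the half interval length is

$$d = z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}.$$

If we want to control the interval length (2d) or the half interval length (d), solving this for n shows that the sample size should be at least

$$n \approx \frac{z_{\alpha/2}^2\,\hat p(1-\hat p)}{d^2}.$$

Since $\hat p(1-\hat p) \le 1/4$, when no estimate of p is available, use

$$n \approx \frac{z_{\alpha/2}^2}{4d^2}.$$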
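A small sketch of the two sample-size formulas, assuming a hypothetical target half-length d = 0.05 at 95% confidence. Rounding up with `math.ceil` is a choice made here so that n is "at least" the formula value; `sample_size` and `p_guess` are hypothetical names.

```python
import math
from statistics import NormalDist

def sample_size(d, conf=0.95, p_guess=None):
    """Smallest integer n with z_{alpha/2}^2 * p(1-p) / d^2 <= n."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    # With no prior estimate, use the worst case p(1-p) = 1/4.
    pq = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return math.ceil(z**2 * pq / d**2)

print(sample_size(0.05))                # 385 (no estimate of p)
print(sample_size(0.05, p_guess=8/15))  # 383 (using p_hat = 8/15)
```

The worst-case value is always at least as large as the one using a guess, which is why it is the safe default.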


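Now, we want to test whether the proportion of repeat business is bigger than $p_0 = 0.5$ at the significance level $\alpha = 0.05$. That is, we test

$$H_0: p \le p_0 \quad \text{vs.} \quad H_1: p > p_0$$

(or $H_0: p = p_0$ vs. $H_1: p > p_0$).

Rationale: We would reject $H_0$ if $\hat p = 40/75$ is much bigger than $p_0 = 0.5$.

If $H_0$ is true, then at the boundary value $p = p_0$,

$$\frac{\hat p - p}{\sqrt{p(1-p)/n}} = \frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{\cdot}{\sim} N(0, 1).$$

If $\frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}}$ is much bigger than 0, we reject $H_0$.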


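To control the type I error (rejecting $H_0$ when $H_0$ is true) at $\alpha = 0.05$, we reject $H_0$ if

$$\frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} \ge z_\alpha.$$

We observe

$$\frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{40/75 - 0.5}{\sqrt{0.5(1-0.5)/75}} = 0.577.$$

Since $z_{0.05} = 1.645$ and $0.577 < 1.645$, we fail to reject $H_0$.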
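The arithmetic of this right-tailed test can be checked with a few lines; this sketch only reproduces the numbers above, taking $z_\alpha$ from `statistics.NormalDist` rather than a table:

```python
import math
from statistics import NormalDist

x, n, p0, alpha = 40, 75, 0.5, 0.05
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)  # standardize under H0 using p0
z_crit = NormalDist().inv_cdf(1 - alpha)           # z_alpha, about 1.645
reject = z >= z_crit                               # right-tailed rejection region
print(round(z, 3), round(z_crit, 3), reject)  # 0.577 1.645 False
```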


Statistical inferences for proportions:

Summary:

1. $\hat p = \bar X$ is an unbiased estimator for p.

2. A $(1-\alpha)$ confidence interval for p is $\hat p \pm z_{\alpha/2}\sqrt{\hat p(1-\hat p)/n}$.

3. In hypothesis tests (right-tailed, left-tailed, two-tailed), use the test statistic $\frac{\hat p - p}{\sqrt{p(1-p)/n}}$. When $H_0$ is true (and $p_0$ is the boundary value in the hypothesis),

$$\frac{\hat p - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{\cdot}{\sim} N(0, 1).$$

Find the critical value and rejection region according to the type of the test.
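The rejection regions in item 3 can be collected into one function. This is a sketch: the name `one_prop_ztest` and the labels `"greater"`, `"less"`, `"two-sided"` are hypothetical choices for this example, and the decision uses critical values exactly as in the summary (reject if $z \ge z_\alpha$, $z \le -z_\alpha$, or $|z| \ge z_{\alpha/2}$).

```python
import math
from statistics import NormalDist

def one_prop_ztest(x: int, n: int, p0: float, alternative: str, alpha: float = 0.05):
    """Return (z, reject) for H1: p > p0 ("greater"), p < p0 ("less") or p != p0 ("two-sided")."""
    z = (x / n - p0) / math.sqrt(p0 * (1 - p0) / n)  # statistic evaluated at the boundary p0
    nd = NormalDist()
    if alternative == "greater":      # right-tailed: reject if z >= z_alpha
        reject = z >= nd.inv_cdf(1 - alpha)
    elif alternative == "less":       # left-tailed: reject if z <= -z_alpha
        reject = z <= -nd.inv_cdf(1 - alpha)
    elif alternative == "two-sided":  # two-tailed: reject if |z| >= z_{alpha/2}
        reject = abs(z) >= nd.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject

print(one_prop_ztest(40, 75, 0.5, "greater"))  # the repeat-business example
```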


Comparison of two proportions

Example: A study is conducted to compare computer usage in Canadian businesses to that of businesses in the United States. Independent random samples of 375 businesses each are selected from the populations of Canadian and United States businesses, respectively. It is found that 221 of the Canadian firms and 232 of the firms in the United States have mainframe computers.

Let $p_1$ be the proportion of businesses in Canada having mainframe computers, and $p_2$ the proportion of businesses in the United States having mainframe computers.

(1) Estimation: How do we give point and interval estimates of $p_1 - p_2$?
(2) Test: How do we test whether $p_1 - p_2 > p_0$, $p_1 - p_2 < p_0$, or $p_1 - p_2 \ne p_0$?


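We have an i.i.d. sample of size $n_1$ with $x_1$ successes from Canada and an i.i.d. sample of size $n_2$ with $x_2$ successes from the United States. Then

$$E\left(\frac{x_1}{n_1} - \frac{x_2}{n_2}\right) = E\left(\frac{x_1}{n_1}\right) - E\left(\frac{x_2}{n_2}\right) = p_1 - p_2.$$

Thus, an unbiased point estimate for $p_1 - p_2$ is

$$\hat p_1 - \hat p_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2} = \frac{221}{375} - \frac{232}{375} \approx -0.03.$$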
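Doing the subtraction with exact fractions shows where the rounded value −0.03 comes from:

```python
from fractions import Fraction

x1, n1 = 221, 375  # Canadian firms with mainframe computers
x2, n2 = 232, 375  # U.S. firms with mainframe computers

diff = Fraction(x1, n1) - Fraction(x2, n2)  # p1_hat - p2_hat, exact
print(diff, round(float(diff), 4))  # -11/375 -0.0293
```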


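What about a C.I. estimator for $p_1 - p_2$? That relies on the distribution of the point estimator $\hat p_1 - \hat p_2$.

According to the Central Limit Theorem (applied to each of the two independent samples), we have

$$\frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$

Using $\hat p_1$ and $\hat p_2$ to replace $p_1$ and $p_2$ in the denominator, we have

$$\frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$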


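$$P\left(-z_{\alpha/2} \le \frac{\hat p_1 - \hat p_2 - (p_1 - p_2)}{\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}} \le z_{\alpha/2}\right) \approx 1-\alpha$$

Thus, the $(1-\alpha)$ C.I. is

$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}.$$

A 95% C.I. is

$$-0.03 \pm 1.96\sqrt{(0.589)(0.411)/375 + (0.619)(0.381)/375} = -0.03 \pm 0.07 = [-0.10, 0.04].$$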
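A minimal sketch of the two-sample interval, assuming 95% confidence; `two_prop_ci` is a hypothetical helper name. Using unrounded proportions gives roughly [−0.099, 0.041], which agrees with the rounded [−0.10, 0.04] above.

```python
import math
from statistics import NormalDist

def two_prop_ci(x1, n1, x2, n2, conf=0.95):
    """Return (p1_hat - p2_hat) -/+ z_{alpha/2} * unpooled standard error."""
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)  # z_{alpha/2}
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # each sample contributes its own variance
    diff = p1 - p2
    return diff - z * se, diff + z * se

lo, hi = two_prop_ci(221, 375, 232, 375)
print(f"[{lo:.3f}, {hi:.3f}]")  # [-0.099, 0.041]
```

Since the interval contains 0, it is consistent with no difference between the two countries at this level.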


Consider the following tests:

1. Right-tail test:
$H_0: p_1 - p_2 = (p_1 - p_2)_0$ or $H_0: p_1 - p_2 \le (p_1 - p_2)_0$
$H_1: p_1 - p_2 > (p_1 - p_2)_0$

2. Left-tail test:
$H_0: p_1 - p_2 = (p_1 - p_2)_0$ or $H_0: p_1 - p_2 \ge (p_1 - p_2)_0$
$H_1: p_1 - p_2 < (p_1 - p_2)_0$

3. Two-tail test:
$H_0: p_1 - p_2 = (p_1 - p_2)_0$
$H_1: p_1 - p_2 \ne (p_1 - p_2)_0$


Consider two cases: $(p_1 - p_2)_0 \ne 0$ and $(p_1 - p_2)_0 = 0$. The test statistics are different for these two cases.

Case 1: $(p_1 - p_2)_0 \ne 0$. Example:

At the significance level $\alpha = 0.05$, we want to test

$$H_0: p_1 - p_2 = -0.05 \quad \text{vs.} \quad H_1: p_1 - p_2 \ne -0.05.$$

We will reject $H_0$ if $\hat p_1 - \hat p_2$ is too far from −0.05.

If $H_0$ is true, by the Central Limit Theorem, we have

$$\frac{\hat p_1 - \hat p_2 - (p_1 - p_2)_0}{\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$


We use this standardized difference, evaluated at the hypothesized value $(p_1 - p_2)_0$, as the test statistic. Since this is a two-tailed test, we reject $H_0$ if

$$\left|\frac{\hat p_1 - \hat p_2 - (p_1 - p_2)_0}{\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}}\right| \ge z_{\alpha/2}.$$

We observe

$$\left|\frac{\hat p_1 - \hat p_2 - (p_1 - p_2)_0}{\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}}\right| = \frac{|-0.03 - (-0.05)|}{\sqrt{\frac{0.589(1-0.589)}{375} + \frac{0.619(1-0.619)}{375}}} = 0.560.$$

Since $z_{0.025} = 1.96$ and $0.560 < 1.96$, we fail to reject $H_0$.
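The same Case 1 computation as a short script. It uses the unrounded proportions, so the statistic comes out near 0.579 rather than the 0.560 obtained above from the rounded difference −0.03; the conclusion is unchanged.

```python
import math
from statistics import NormalDist

x1, n1, x2, n2 = 221, 375, 232, 375
c0, alpha = -0.05, 0.05  # hypothesized difference (p1 - p2)_0 and test level

p1, p2 = x1 / n1, x2 / n2
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled standard error
z = (p1 - p2 - c0) / se
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # z_{0.025}, about 1.96
reject = abs(z) >= z_crit                      # two-tailed rejection region
print(round(z, 3), reject)  # 0.579 False
```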


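Case 2: $(p_1 - p_2)_0 = 0$. Example:

At the significance level $\alpha = 0.05$, we want to test

$$H_0: p_1 = p_2 \quad \text{vs.} \quad H_1: p_1 \ne p_2.$$

We will reject $H_0$ if $\hat p_1 - \hat p_2$ is too far from 0.

When $H_0$ is true, $p_1 = p_2 = p$ for some common value p, so

$$\frac{\hat p_1 - \hat p_2 - 0}{\sqrt{p(1-p)(1/n_1 + 1/n_2)}} \overset{\cdot}{\sim} N(0, 1).$$

When $H_0$ is true, both samples estimate the same p, so we estimate it by pooling them: $\hat p = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2}$. Thus

$$\frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}} \overset{\cdot}{\sim} N(0, 1).$$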


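If $H_0$ is true, this pooled statistic is approximately N(0, 1). Since this is a two-tailed test, at level $\alpha$ we reject $H_0$ if

$$\left|\frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}}\right| \ge z_{\alpha/2}.$$

We observe $\hat p = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2} = \frac{375(0.589) + 375(0.619)}{375 + 375} = 0.604$, and

$$\left|\frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}}\right| = \left|\frac{-0.03}{\sqrt{0.604(1-0.604)\left(\frac{1}{375} + \frac{1}{375}\right)}}\right| = 0.840.$$

Since $z_{0.025} = 1.96$ and $0.840 < 1.96$, we fail to reject $H_0$.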
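A short check of the pooled test. Note that $(n_1\hat p_1 + n_2\hat p_2)/(n_1 + n_2)$ is simply $(x_1 + x_2)/(n_1 + n_2)$, the overall success rate. With unrounded proportions the statistic is about −0.821 (the 0.840 above used the rounded difference −0.03); either way $H_0$ is not rejected.

```python
import math
from statistics import NormalDist

x1, n1, x2, n2 = 221, 375, 232, 375
alpha = 0.05

p1, p2 = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)  # equals (n1*p1 + n2*p2) / (n1 + n2)
se0 = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))  # SE under H0: p1 = p2
z = (p1 - p2) / se0
reject = abs(z) >= NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed at level alpha
print(round(p_pool, 3), round(z, 3), reject)  # 0.604 -0.821 False
```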


Comparing Two Proportions

Summary:

A random sample of size $n_1$ from population 1 with $x_1$ successes. A random sample of size $n_2$ from population 2 with $x_2$ successes.

1. $E\left(\frac{x_1}{n_1} - \frac{x_2}{n_2}\right) = p_1 - p_2$.

2. A $(1-\alpha)$ C.I. for $p_1 - p_2$ is

$$\hat p_1 - \hat p_2 \pm z_{\alpha/2}\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}.$$


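3. Tests:

(1) To test $p_1 \ne p_2$, $p_1 > p_2$, or $p_1 < p_2$ (in $H_1$), under $H_0$ use the test statistic

$$\frac{\hat p_1 - \hat p_2}{\sqrt{\hat p(1-\hat p)(1/n_1 + 1/n_2)}} \overset{\cdot}{\sim} N(0, 1),$$

where $\hat p = \frac{n_1\hat p_1 + n_2\hat p_2}{n_1 + n_2}$.

(2) To test $p_1 - p_2 \ne c_0$, $p_1 - p_2 > c_0$, or $p_1 - p_2 < c_0$ (in $H_1$, with $c_0 \ne 0$), under $H_0$ use the test statistic

$$\frac{\hat p_1 - \hat p_2 - c_0}{\sqrt{\hat p_1(1-\hat p_1)/n_1 + \hat p_2(1-\hat p_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$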
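The two cases of the test summary can be combined into a single sketch that picks the pooled standard error when $c_0 = 0$ and the unpooled one otherwise. The function name `two_prop_ztest` and its argument names are hypothetical; the rejection rules are the usual critical-value rules for each alternative.

```python
import math
from statistics import NormalDist

def two_prop_ztest(x1, n1, x2, n2, c0=0.0, alternative="two-sided", alpha=0.05):
    """Return (z, reject) for H1: p1 - p2 != c0, > c0 ("greater") or < c0 ("less")."""
    p1, p2 = x1 / n1, x2 / n2
    if c0 == 0:
        # Case (1): under H0, p1 = p2, so pool both samples to estimate the common p.
        p = (x1 + x2) / (n1 + n2)
        se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    else:
        # Case (2): c0 != 0, so use the unpooled standard error.
        se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = (p1 - p2 - c0) / se
    nd = NormalDist()
    if alternative == "two-sided":
        reject = abs(z) >= nd.inv_cdf(1 - alpha / 2)
    elif alternative == "greater":
        reject = z >= nd.inv_cdf(1 - alpha)
    elif alternative == "less":
        reject = z <= -nd.inv_cdf(1 - alpha)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return z, reject

print(two_prop_ztest(221, 375, 232, 375))            # pooled test of p1 = p2
print(two_prop_ztest(221, 375, 232, 375, c0=-0.05))  # unpooled test of p1 - p2 = -0.05
```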
