ch9-4710

Inferences on Proportions

Tieming Ji

Fall 2012


Example: One measure of quality and customer satisfaction is repeat business. A supplier of paper used for computer printouts sampled 75 customer accounts last year and found that 40 of these had placed more than one order during the year. Estimate the proportion of repeat business, and give a 100(1 − α)% confidence interval for the proportion of repeat business.

X: whether a customer reorders. X = 1 for repeat business; X = 0 for no repeat business.

X ∼ Bernoulli(p)

We want to estimate p. Notice E(X) = p. Thus, for an i.i.d. sample with sample size n, an unbiased point estimator is:

$$\hat{p} = \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{40}{75} = \frac{8}{15}.$$
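The point estimate can be checked numerically. The following is a minimal sketch that encodes the 75 accounts as 0/1 outcomes and takes their mean; the variable names are illustrative and not part of the original slides.

```python
from fractions import Fraction

# Data from the example: 40 of the 75 sampled accounts reordered.
n = 75
repeat_orders = 40

# Encode the sample as Bernoulli outcomes (1 = repeat business, 0 = none).
sample = [1] * repeat_orders + [0] * (n - repeat_orders)

# The sample mean of the 0/1 outcomes is the point estimate of p.
p_hat = Fraction(sum(sample), len(sample))

print(p_hat)          # exact fraction
print(float(p_hat))   # decimal value
```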


We have found an unbiased point estimator for p. What about a C.I. estimate?

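Consider: p̂ = X̄. What have we learnt about X̄? By the CLT, we have

$$\frac{\bar{X} - \mu_X}{\sigma_X/\sqrt{n}} \sim N(0, 1) \quad \text{for sufficiently large } n.$$

We have $\mu_X = E(X) = p$ and $\sigma_X^2 = \mathrm{Var}(X) = p(1-p)$. So

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1).$$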


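$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \sim N(0, 1)$$

$$P\left(-z_{\alpha/2} \le \frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \le z_{\alpha/2}\right) = 1 - \alpha$$

$$P\left(\hat{p} - z_{\alpha/2}\sqrt{p(1-p)/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{p(1-p)/n}\right) = 1 - \alpha$$

$$P\left(\hat{p} - z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right) \approx 1 - \alpha$$

C.I. for p:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}.$$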

Thus, the C.I. for p at confidence level (1 − α) is:

$$\frac{40}{75} \pm z_{\alpha/2}\sqrt{\frac{40}{75}\left(1 - \frac{40}{75}\right)/75}.$$
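To make the interval concrete, here is a minimal sketch that evaluates it for the paper-supplier data at α = 0.05. It assumes Python 3.8 or later for `statistics.NormalDist`; the function name `proportion_ci` is hypothetical, and the choice α = 0.05 is an illustration, since the example leaves α general.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, alpha=0.05):
    """Large-sample confidence interval p_hat +/- z_{alpha/2} * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - alpha / 2)          # z_{alpha/2}
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)   # half interval length
    return p_hat - half_width, p_hat + half_width

# 40 repeat customers out of 75 sampled accounts (from the example).
lower, upper = proportion_ci(40, 75, alpha=0.05)
print(f"95% C.I. for p: ({lower:.3f}, {upper:.3f})")
```

Because the standard error uses p̂ in place of p, the coverage is only approximately 1 − α, as the last probability statement indicates.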


C.I. for p:

$$\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}.$$

Half interval length: $d = z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$. If we want to control the interval length (2d) or half interval length (d), the sample size n should be at least:

$$n \approx \frac{z_{\alpha/2}^2\,\hat{p}(1-\hat{p})}{d^2}.$$

Since $\hat{p}(1-\hat{p}) \le 1/4$, when no estimate of p is available, use

$$n \approx \frac{z_{\alpha/2}^2}{4d^2}.$$
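The two sample-size formulas can be sketched as a small helper. The function name `sample_size`, the target half length d = 0.05, and rounding up to the next integer are assumptions made for illustration; they do not appear in the slides.

```python
from math import ceil
from statistics import NormalDist

def sample_size(d, alpha=0.05, p_guess=None):
    """Smallest integer n with half interval length at most d (approximately).

    If no prior estimate of p is supplied, use the bound p(1-p) <= 1/4.
    """
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    variance_term = 0.25 if p_guess is None else p_guess * (1 - p_guess)
    return ceil(z**2 * variance_term / d**2)

n_conservative = sample_size(0.05)                 # no estimate of p available
n_with_estimate = sample_size(0.05, p_guess=8/15)  # using p_hat = 8/15
print(n_conservative, n_with_estimate)
```

The conservative version never gives a smaller n than the version that uses an estimate, because p(1 − p) is largest at p = 1/2.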


Now, we want to test if the proportion of repeat business is bigger than p0 = 0.5 at the significance level α = 0.05. This is to test

H0 : p ≤ p0 vs. H1 : p > p0.

(or H0 : p = p0 vs. H1 : p > p0)

Rationale: We would reject H0 if p̂ = 40/75 is much bigger than p0 = 0.5.

If H0 is true,

$$\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{\cdot}{\sim} N(0, 1).$$

If $\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ is much bigger than 0, we will reject H0.


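To control the type I error (rejecting H0 when H0 is true) at α = 0.05, we reject H0 if

$$\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \ge z_{\alpha}.$$

We observe:

$$\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} = \frac{40/75 - 0.5}{\sqrt{0.5(1-0.5)/75}} = 0.577.$$

$z_{0.05} = 1.645$

Since 0.577 < 1.645, we fail to reject H0.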
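The right-tailed test above can be reproduced with a few lines of Python. This is a sketch under the same large-sample normal approximation; the p-value line is an addition for illustration and is not computed in the slides.

```python
from math import sqrt
from statistics import NormalDist

n, successes = 75, 40      # sample from the example
p0, alpha = 0.5, 0.05      # boundary value and significance level

p_hat = successes / n
z_stat = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)   # test statistic under H0
z_crit = NormalDist().inv_cdf(1 - alpha)          # z_alpha for a right-tailed test
p_value = 1 - NormalDist().cdf(z_stat)            # illustrative addition

reject = z_stat >= z_crit
print(f"z = {z_stat:.3f}, critical value = {z_crit:.3f}, reject H0: {reject}")
```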


Statistical inferences for proportions:

Summary:

1. p̂ = X̄ is an unbiased estimator for p.

2. A (1 − α) confidence interval for p is: $\hat{p} \pm z_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}$.

3. In hypothesis tests (right-tailed, left-tailed, two-tailed), use the test statistic $\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, where p0 is the boundary value in the hypotheses. When H0 is true with p = p0,

$$\frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} \overset{\cdot}{\sim} N(0, 1).$$

Find the critical value and rejection region according to the type of the test.
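How the rejection region depends on the type of test can be sketched as follows. The function name `one_proportion_z_test` and the `tail` argument are hypothetical conveniences, not notation from the slides.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0, alpha=0.05, tail="right"):
    """Return (z statistic, reject H0?) for a large-sample test on a proportion."""
    std_normal = NormalDist()
    p_hat = successes / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    if tail == "right":        # H1: p > p0, reject for large z
        reject = z >= std_normal.inv_cdf(1 - alpha)
    elif tail == "left":       # H1: p < p0, reject for small z
        reject = z <= -std_normal.inv_cdf(1 - alpha)
    elif tail == "two":        # H1: p != p0, reject for large |z|
        reject = abs(z) >= std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("tail must be 'right', 'left', or 'two'")
    return z, reject

# Repeat-business data (40 of 75, p0 = 0.5) under each type of alternative.
results = {tail: one_proportion_z_test(40, 75, 0.5, tail=tail)
           for tail in ("right", "left", "two")}
for tail, (z, reject) in results.items():
    print(f"{tail:>5}-tailed: z = {z:.3f}, reject H0: {reject}")
```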


Comparison of two proportions

Example: A study is conducted to compare computer usage in Canadian businesses to that of businesses in the United States. Independent random samples of 375 businesses each are selected from the populations of Canadian and United States businesses, respectively. It is found that 221 of the Canadian firms and 232 of the firms in the United States have mainframe computers.

Proportion of businesses in Canada having mainframe computers: p1.
Proportion of businesses in the United States having mainframe computers: p2.
(1) Estimation: How to give point and interval estimates of p1 − p2?
(2) Test: How to test if p1 − p2 > p0, p1 − p2 < p0, or p1 − p2 ≠ p0?


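An i.i.d. sample of size n1 with x1 successes from Canada; an i.i.d. sample of size n2 with x2 successes from the United States.

We have

$$E\left(\frac{x_1}{n_1} - \frac{x_2}{n_2}\right) = E\left(\frac{x_1}{n_1}\right) - E\left(\frac{x_2}{n_2}\right) = p_1 - p_2.$$

Thus, an unbiased point estimate for p1 − p2 is

$$\hat{p}_1 - \hat{p}_2 = \frac{x_1}{n_1} - \frac{x_2}{n_2} = \frac{221}{375} - \frac{232}{375} \approx -0.03.$$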


What about a C.I. estimator for p1 − p2? That relies on the distribution of the point estimator p̂1 − p̂2.

According to the Central Limit Theorem, we have

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{p_1(1-p_1)/n_1 + p_2(1-p_2)/n_2}} \sim N(0, 1).$$

Using p̂1 and p̂2 to replace p1 and p2 in the denominator, we have:

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$


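$$P\left(-z_{\alpha/2} \le \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \le z_{\alpha/2}\right) \approx 1 - \alpha$$

Thus, the (1 − α) C.I. is:

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}.$$

A 95% C.I. is

$$-0.03 \pm 1.96\sqrt{(0.589)(0.411)/375 + (0.619)(0.381)/375} = -0.03 \pm 0.07 = [-0.10, 0.04].$$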
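The interval for the mainframe example can be recomputed without intermediate rounding. This is a minimal sketch; the function name `two_proportion_ci` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, alpha=0.05):
    """Large-sample C.I. for p1 - p2 using the unpooled standard error."""
    p1_hat, p2_hat = x1 / n1, x2 / n2
    diff = p1_hat - p2_hat
    se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    return diff - z * se, diff + z * se

# Canada: 221 of 375; United States: 232 of 375 (from the example).
ci_low, ci_high = two_proportion_ci(221, 375, 232, 375)
print(f"95% C.I. for p1 - p2: [{ci_low:.3f}, {ci_high:.3f}]")
```

The interval contains 0, so at this confidence level the data do not separate the two proportions.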


Consider the following tests:

1. Right-tail test:

H0 : p1 − p2 = (p1 − p2)0 or H0 : p1 − p2 ≤ (p1 − p2)0

H1 : p1 − p2 > (p1 − p2)0

2. Left-tail test:

H0 : p1 − p2 = (p1 − p2)0 or H0 : p1 − p2 ≥ (p1 − p2)0

H1 : p1 − p2 < (p1 − p2)0

3. Two-tail test:

H0 : p1 − p2 = (p1 − p2)0

H1 : p1 − p2 ≠ (p1 − p2)0


Consider two cases: (p1 − p2)0 ≠ 0 and (p1 − p2)0 = 0. The test statistics are different for these two cases.

Case 1: (p1 − p2)0 ≠ 0 example:

At the significance level α = 0.05, we want to test

H0 : p1 − p2 = −0.05 vs. H1 : p1 − p2 ≠ −0.05

We will reject H0 if p̂1 − p̂2 is too far from −0.05.

If H0 is true, by the Central Limit Theorem, we have

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$


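We use the test statistic

$$\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}},$$

which is approximately N(0, 1) when H0 is true.

Since this is a two-tailed test, we reject H0 if

$$\left|\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}\right| \ge z_{\alpha/2}.$$

We observe

$$\left|\frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}}\right| = \left|\frac{-0.03 - (-0.05)}{\sqrt{\frac{0.589(1-0.589)}{375} + \frac{0.619(1-0.619)}{375}}}\right| = 0.560.$$

$z_{0.025} = 1.96$. Since 0.560 < 1.96, we fail to reject H0.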
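The Case 1 calculation can be sketched in Python as follows. Note one deliberate difference: the code keeps p̂1 − p̂2 = −11/375 unrounded rather than using −0.03, so the statistic comes out near 0.58 instead of 0.560; the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 221, 375          # Canada
x2, n2 = 232, 375          # United States
c0, alpha = -0.05, 0.05    # hypothesized difference (p1 - p2)_0 and level

p1_hat, p2_hat = x1 / n1, x2 / n2
# Unpooled standard error, appropriate when the hypothesized difference is nonzero.
se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
z_unpooled = ((p1_hat - p2_hat) - c0) / se

z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2} for a two-tailed test
reject_case1 = abs(z_unpooled) >= z_crit
print(f"|z| = {abs(z_unpooled):.3f}, critical value = {z_crit:.2f}, "
      f"reject H0: {reject_case1}")
```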


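Case 2: (p1 − p2)0 = 0 example:

At the significance level α = 0.05, we want to test

H0 : p1 = p2 vs. H1 : p1 ≠ p2

We will reject H0 if p̂1 − p̂2 is too far from 0.

When H0 is true, p1 and p2 share a common value p, and

$$\frac{(\hat{p}_1 - \hat{p}_2) - 0}{\sqrt{p(1-p)(1/n_1 + 1/n_2)}} \sim N(0, 1).$$

When H0 is true, estimate the common p by the pooled estimator $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$; thus

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \overset{\cdot}{\sim} N(0, 1).$$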


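If H0 is true, then

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \overset{\cdot}{\sim} N(0, 1).$$

Since this is a two-tailed test, at the level α, we reject H0 if

$$\left|\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}\right| \ge z_{\alpha/2}.$$

We observe

$$\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2} = \frac{375(0.589) + 375(0.619)}{375 + 375} = 0.604,$$

and

$$\left|\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}}\right| = \left|\frac{-0.03}{\sqrt{0.604(1-0.604)\left(\frac{1}{375} + \frac{1}{375}\right)}}\right| = 0.840.$$

Since $z_{0.025} = 1.96$, we fail to reject H0.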
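The pooled test for Case 2 can be sketched the same way. As an assumption of this sketch, the difference is kept unrounded (−11/375 rather than −0.03), so |z| is about 0.82 rather than 0.840; the decision is unchanged.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 221, 375   # Canada
x2, n2 = 232, 375   # United States
alpha = 0.05

p1_hat, p2_hat = x1 / n1, x2 / n2
# Pooled estimate of the common proportion under H0: p1 = p2.
p_pooled = (n1 * p1_hat + n2 * p2_hat) / (n1 + n2)
se_pooled = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
z_pooled = (p1_hat - p2_hat) / se_pooled

reject_case2 = abs(z_pooled) >= NormalDist().inv_cdf(1 - alpha / 2)
print(f"pooled p = {p_pooled:.3f}, |z| = {abs(z_pooled):.3f}, "
      f"reject H0: {reject_case2}")
```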


Comparing Two Proportions

Summary:

A random sample of size n1 from population 1 with x1 successes.
A random sample of size n2 from population 2 with x2 successes.

1. $E\left(\frac{x_1}{n_1} - \frac{x_2}{n_2}\right) = p_1 - p_2$.

2. A (1 − α) C.I. for p1 − p2 is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}.$$


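3. Tests:

(1) To test if p1 ≠ p2, p1 > p2 or p1 < p2 (in H1), under H0, use the test statistic

$$\frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})(1/n_1 + 1/n_2)}} \overset{\cdot}{\sim} N(0, 1),$$

where $\hat{p} = \frac{n_1\hat{p}_1 + n_2\hat{p}_2}{n_1 + n_2}$.

(2) To test if p1 − p2 ≠ c0, p1 − p2 > c0 or p1 − p2 < c0 (in H1 and c0 ≠ 0), under H0, use the test statistic

$$\frac{(\hat{p}_1 - \hat{p}_2) - c_0}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n_1 + \hat{p}_2(1-\hat{p}_2)/n_2}} \overset{\cdot}{\sim} N(0, 1).$$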
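The rule for choosing between the two test statistics can be sketched as one helper that switches on whether c0 is zero. The function name `two_proportion_z` is hypothetical, and the inputs are left unrounded.

```python
from math import sqrt

def two_proportion_z(x1, n1, x2, n2, c0=0.0):
    """z statistic for H0: p1 - p2 = c0.

    Uses the pooled standard error when c0 == 0 and the
    unpooled standard error otherwise.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    if c0 == 0:
        p_pooled = (x1 + x2) / (n1 + n2)   # equals (n1*p1_hat + n2*p2_hat)/(n1+n2)
        se = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    else:
        se = sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    return ((p1_hat - p2_hat) - c0) / se

z_equal = two_proportion_z(221, 375, 232, 375)              # H0: p1 = p2
z_shift = two_proportion_z(221, 375, 232, 375, c0=-0.05)    # H0: p1 - p2 = -0.05
print(f"c0 = 0: z = {z_equal:.3f};  c0 = -0.05: z = {z_shift:.3f}")
```

The resulting z is then compared with z_α, −z_α, or ±z_{α/2} according to whether H1 is right-tailed, left-tailed, or two-tailed.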
