statistics study guide - inference for proportions


Key Terms and Concepts

Before taking the Quiz, you need to be able to explain the meanings (and recognize symbols in cases where there is an associated symbol) of each of these terms or concepts. You should also know when and how to use them in statistics problems. These terms and concepts are defined in Key Terms.

- confidence interval for a single proportion
- confidence interval for two proportions
- margin of error for a single proportion
- pooled population proportions
- pooled standard error of the difference between two population proportions
- sample size for a given margin of error
- significance tests for a single proportion
- significance tests for single population proportions vs. confidence intervals for single population proportions
- standard error of the difference between two population proportions
- z-intervals for a single proportion

______________________________ Copyright 2011 Apex Learning Inc. (See Terms of Use at www.apexvs.com/TermsOfUse)

AP Statistics Review: Inference for Proportions

ToniLine


Objectives, Example Problems, and Study Tips

Confidence Intervals and Hypothesis Tests for a Single Population Proportion

Objective 1
Identify why and when it's proper to use z-procedures when dealing with proportions.

Examples
1. Why are z-procedures used instead of t-procedures for doing inference for proportions?
2. What assumptions are necessary to use z-procedures when doing inference for proportions?

Tip
The answer has something to do with the binomial distribution, and the normal approximation to the binomial distribution.

Answers
1. The count of successes, X, in a sample drawn from a much larger population follows approximately a binomial distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$. This means the sample proportion $\hat{p} = X/n$ has an approximately normal distribution with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. You can use the normal distribution (and z-procedures) when doing inference for proportions if the necessary assumptions are met (see #2 below). (In a more advanced statistics course, you may learn to do these problems by doing exact binomial calculations.)

2. Here are the assumptions necessary to use z-procedures when doing inference for proportions:
- The sample data are a simple random sample from the population of interest.
- For the sampling distribution to be considered a binomial distribution, the population must be at least 10 times the size of the sample. (Some textbooks say 20 times, but for this review we'll use 10.) Just be sure you check: you'll get credit as long as you cite one of these commonly used rules.
- The quantities $n\hat{p}$ and $n(1-\hat{p})$ must both be at least 10, and if you have a hypothesized value of P ($p_0$), $np_0$ and $n(1-p_0)$ must both be at least 10. (Some textbooks say "at least 5" or "greater than 5," but for this review we'll use "at least 10.") Just be sure you check: you'll get credit as long as you cite one of these commonly used rules.
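If it helps to see the arithmetic checks spelled out, here is a minimal Python sketch of the two rules of thumb above. The function name and the dictionary keys are made up for illustration, and the numbers are those of the Sen. Porkbarrel poll used later in this review. Whether the sample is a simple random sample can't be checked by arithmetic, so that condition is left to you.

```python
def z_procedure_conditions(n, population_size, p):
    """Check this review's rules of thumb for a proportion p.

    Pass the sample proportion when building an interval, or the
    hypothesized value p0 when running a significance test.
    """
    return {
        # Population at least 10 times the sample size
        "population_at_least_10n": population_size >= 10 * n,
        # Expected successes and failures both at least 10
        "successes_at_least_10": n * p >= 10,
        "failures_at_least_10": n * (1 - p) >= 10,
    }


# Hypothetical usage: 650 "yes" answers out of 1,500 voters sampled from 500,000
checks = z_procedure_conditions(n=1500, population_size=500_000, p=650 / 1500)
print(checks)
print("All conditions met:", all(checks.values()))
```

If any entry comes back False, z-procedures aren't justified under the "10" rules used in this review.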

Objective 2
Construct a confidence interval for a single population proportion using the point estimate, the standard error, and the critical z-value.

Examples
1. What's the standard error of a sample proportion when you're constructing a confidence interval for a single population proportion?
2. Construct a 99% confidence interval for the percentage of voters planning to vote for Senator Porkbarrel if a sample of size 1,500 from this senator's district (voting population 500,000) identifies 650 who say they plan to vote for that candidate.

Tips
The formula for a confidence interval for a single population proportion is $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, where the point estimate is the sample proportion $\hat{p}$, and the standard error is $\sqrt{\hat{p}(1-\hat{p})/n}$.

Remember these criteria:
- The sample data are a simple random sample from the population of interest.
- For the sampling distribution to be considered a binomial distribution, the population is at least 10 times the size of the sample. (Some textbooks say 20 times, but for this review we'll use 10.) Just be sure you check: you'll get credit as long as you cite one of these commonly used rules.
- The quantities $n\hat{p}$ and $n(1-\hat{p})$ must both be at least 10, and if you have a hypothesized value of P ($p_0$), then $np_0$ and $n(1-p_0)$ must both be at least 10. (Some textbooks say "at least 5" or "greater than 5," but for this review we'll use "at least 10.") Just be sure you check: you'll get credit as long as you cite one of these commonly used rules.

Answers
1. The standard error for a confidence interval for a single population proportion is $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.

2. We can use a z-procedure because N > 10n, and $n\hat{p}$ and $n(1-\hat{p})$ are both greater than 10.

Use $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$.

For a 99% confidence interval for a population proportion, $z^{*} = 2.58$, and $\hat{p} = 650/1{,}500 = .43$. This gives us $.43 \pm 2.58\sqrt{.43(1-.43)/1{,}500} = .43 \pm 2.58(.013)$, which works out to roughly $.43 \pm .034$, or (.396, .464).

Remember that on a quiz you should show the details as we've done here; don't just report the answer you got from your calculator.
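To check a hand calculation like the one above, here is a short Python sketch. The function name is an illustrative choice, and the critical value comes from the standard library's `statistics.NormalDist` rather than from a printed table, so it uses $z^{*} \approx 2.576$ instead of the rounded 2.58.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_interval(successes, n, confidence):
    """z-interval for a single proportion: p_hat +/- z* * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    # Critical value z* for a two-sided interval at the given confidence level
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


# Sen. Porkbarrel poll: 650 of 1,500 voters, 99% confidence
low, high = one_proportion_interval(650, 1500, 0.99)
print(f"{low:.3f} to {high:.3f}")
```

Because nothing is rounded along the way, the result (about .400 to .466) differs slightly from the hand calculation, which rounds $\hat{p}$ to .43 and the standard error to .013.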

Objective 3
Determine the sample size needed for a given margin of error when constructing a confidence interval for a population proportion.

Examples
1. You want to estimate the percentage of voters who'll vote for Sen. Porkbarrel in the next election to within 2% with 95% confidence. What's the minimum value of n needed to do this?
2. Suppose we're pretty sure that Sen. Porkbarrel will receive very close to 60% of the vote. What's the minimum sample size needed to be within 2% at the 95% level of confidence? At the 99% level?

Tips
The formula for the minimum sample size is $n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*})$, where $P^{*}$ is your guess for the population proportion P and m is the margin of error. If you don't have a good reason to predict a value of P, use P = .5, or use the formula $n = \left(\frac{z^{*}}{2m}\right)^{2}$.

Answers
1. Assuming P = .5, $n = \left(\frac{z^{*}}{2m}\right)^{2} = \left(\frac{1.96}{2(.02)}\right)^{2} = 2{,}401$.

2. Assuming P = .6, $n = \left(\frac{z^{*}}{m}\right)^{2} p^{*}(1-p^{*}) = \left(\frac{1.96}{.02}\right)^{2}(.6)(.4) = 2{,}304.96$. So 2,305 subjects will be needed for this poll. Note that we need fewer subjects than when we assumed that P = .5. In general, P = .5 will give the greatest sample size (and thus the safest estimate) for the expression $n = \left(\frac{z^{*}}{m}\right)^{2} p^{*}(1-p^{*})$. For a 99% confidence interval, $n = \left(\frac{z^{*}}{m}\right)^{2} p^{*}(1-p^{*}) = \left(\frac{2.58}{.02}\right)^{2}(.6)(.4) = 3{,}993.84$, so we'd need 3,994 subjects. You can see that it might be expensive to move from 95% confidence to 99% confidence!
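The three sample-size answers above can be reproduced with a few lines of Python. This is only a sketch: the function name and its default guess of .5 are illustrative choices, and the critical values 1.96 and 2.58 are the rounded table values used in this review.

```python
from math import ceil


def minimum_sample_size(z_star, margin, p_guess=0.5):
    """Smallest n with z* * sqrt(p(1 - p)/n) <= margin, for a guessed proportion p."""
    raw = (z_star / margin) ** 2 * p_guess * (1 - p_guess)
    # Clear away floating-point noise first, then always round UP to a whole subject
    return ceil(round(raw, 6))


print(minimum_sample_size(1.96, 0.02))        # 95% confidence, no good guess for P
print(minimum_sample_size(1.96, 0.02, 0.6))   # 95% confidence, P guessed at .6
print(minimum_sample_size(2.58, 0.02, 0.6))   # 99% confidence, P guessed at .6
```

Always round up, never to the nearest integer: rounding 2,304.96 down would leave the margin of error slightly larger than the 2% you asked for.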

Objective 4
Explain the difference between the standard error of a sample proportion for a confidence interval, and the standard deviation of a sample proportion for a significance test.

Example
Explain the difference between the standard error of a sample proportion for a confidence interval and the standard deviation of a sample proportion for a significance test.

Answer
For a significance test, you're assuming that you know the population proportion (the hypothesized value, $p_0$), and so you use $p_0$ instead of $\hat{p}$ in the formula $\sqrt{\hat{p}(1-\hat{p})/n}$, and it becomes $\sqrt{p_0(1-p_0)/n}$. Since you're using $p_0$ instead of $\hat{p}$, you now have a standard deviation. To summarize:

- For confidence intervals, the standard error of $\hat{p}$ is $\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p}$ is the sample proportion, and n is the sample size.
- For hypothesis tests, the standard deviation of $\hat{p}$ is $\sqrt{p_0(1-p_0)/n}$, where $p_0$ is the hypothesized value of P, and n is the sample size.

Objective 5
Conduct a significance test about a single population proportion.

Example
Sen. Porkbarrel's official pollster predicts that he'll receive 50.1% of the vote. A simple random sample of 1,500 voters (voting population 500,000) identifies 700 who say they plan to vote for Sen. Porkbarrel. Conduct a test of the hypothesis that the population proportion who will vote for Sen. Porkbarrel is .501. Assume the alternative is that the pollster is wrong.

Tips

Remember to use $\sqrt{p_0(1-p_0)/n}$ instead of $\sqrt{\hat{p}(1-\hat{p})/n}$ for a significance test.

The formula for a test statistic for a single population proportion is

$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$.

Remember that when doing a significance test, the null hypothesis states that $H_0$: $p = p_0$.

Answer

$H_0$: $p = .501$ (50.1% of the voters will vote for Sen. Porkbarrel), and $H_a$: $p \neq .501$ (the percentage of voters voting for Sen. Porkbarrel will be different from .501). Assuming that the 1,500 is an SRS from all eligible voters, the population from which the sample is drawn is more than 10 times greater than the sample size. In addition, (1,500)(.501) and (1,500)(.499) are both greater than 10. Thus we're justified in using z-procedures for this proportion problem.

Thus we have $\hat{p} = 700/1{,}500 = .467$, $\sigma_{\hat{p}} = \sqrt{(.501)(.499)/1{,}500} = .013$ (remember in a significance test you use the hypothesized value of P, not $\hat{p}$), and $z = \frac{.467 - .501}{.013} = -2.62$, so P = 2(.004) = .008. This P-value is quite low, which leads us to reject our null hypothesis and conclude that the pollster was probably wrong in his belief that Sen. Porkbarrel would receive 50.1% of the vote.
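Here is a minimal Python sketch of the same two-sided test, offered as a way to check your work rather than as the required method. The function name is hypothetical. Notice that the denominator uses the hypothesized value $p_0$, not the sample proportion, which is exactly the standard deviation versus standard error distinction from Objective 4.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n
    # Standard deviation of p_hat computed under the null hypothesis (uses p0)
    sd_under_null = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd_under_null
    # Two-sided P-value: area in both tails beyond |z|
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


# Pollster's claim: p = .501; sample: 700 of 1,500 voters
z, p_value = one_proportion_z_test(700, 1500, 0.501)
print(f"z = {z:.2f}, P-value = {p_value:.4f}")
```

The unrounded z comes out a little further from zero than the hand-computed value (which rounds $\hat{p}$ and the standard deviation to three decimals), but the P-value still rounds to about .008 and the conclusion is the same.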



The Difference Between Two Proportions

Objective 1
State the main criterion necessary to use two-sample procedures to compare two population proportions.

Example
State the main criterion necessary to use two-sample procedures to compare two population proportions.

Answer
The proportions must be for the same variable, but from two separate populations. The two populations can be two separate groups (proportion of Republicans in California compared to the proportion in Washington), or they may be the same group measured at different times (proportion of Californians in 1998 who are Republicans compared to the proportion of Californians in 2000 who are Republicans).

Objective 2
Construct confidence intervals for the difference between two population proportions.

Examples
1. Three weeks before the election, a poll of 1,500 voters (voting population 500,000) found that 700 planned to vote for Sen. Porkbarrel. One week before the election, a new sample of 1,200 voters found that 500 people planned to vote for Sen. Porkbarrel. Construct a 99% confidence interval for the difference between the population proportions.
2. Does the confidence interval you constructed in example 1 give evidence that support is declining for Sen. Porkbarrel?

Tip
The formula for a confidence interval for the difference between two population proportions is

$(\hat{p}_1 - \hat{p}_2) \pm z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.

Answers

1. Use $(\hat{p}_1 - \hat{p}_2) \pm z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. In this example, $\hat{p}_1 = 700/1{,}500 = .467$ and $\hat{p}_2 = 500/1{,}200 = .417$. Thus we have

$(.467 - .417) \pm (2.58)\sqrt{\frac{.467(.533)}{1{,}500} + \frac{.417(.583)}{1{,}200}} = .05 \pm 2.58(.019) = (0^{+}, .10)$.

(The $0^{+}$ indicates that the value is minutely greater than 0; the arithmetic gives about .001.) Note that the expression for the standard error of the estimate uses the values of $\hat{p}_1$ and $\hat{p}_2$.

2. Zero is on the verge of being in the interval, so technically we can't conclude that this difference is unlikely if the proportions were the same in the population. However, since 0 is so close to being in the interval, we should be able to argue that there's evidence (how strong is debatable) that the support for Sen. Porkbarrel seems to have declined between the two pollings.
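To see just how close to zero the lower endpoint is, you can run the interval without any intermediate rounding. The sketch below is illustrative only: the function name is made up, and the critical value is taken from `statistics.NormalDist` (about 2.576) rather than the table value 2.58.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_interval(x1, n1, x2, n2, confidence):
    """z-interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Each sample contributes its own p_hat(1 - p_hat)/n term (no pooling)
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    margin = z_star * standard_error
    return difference - margin, difference + margin


# Poll 1: 700 of 1,500; poll 2: 500 of 1,200; 99% confidence
low, high = two_proportion_interval(700, 1500, 500, 1200, 0.99)
print(f"{low:.4f} to {high:.4f}")
```

The lower endpoint lands just barely above zero, which is why the answer to example 2 treats the evidence of a decline as real but debatable.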

Objective 3
State the justification criteria for pooling variances to estimate the standard error of the difference between two population proportions when doing a significance test.

Example
For means, it isn't a good idea, generally, to pool variances when calculating the standard error for the difference between two population means. Why is it okay, even required, when doing a significance test for proportions?

Tip
Remember that when doing a significance test for the difference between two population proportions, the null assumes that the two proportions are equal.

Answer
When testing for a difference between two population means, we assume that the population means are equal, but we can't assume that the population standard deviations are equal without some further analysis. When testing for a difference between two proportions, the standard error is defined in terms of proportions, and the null hypothesis assumes that the proportions are equal.




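To summarize it mathematically:

$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, but $H_0$: $p_1 = p_2$. Thus

$s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1} + \frac{\hat{p}(1-\hat{p})}{n_2}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$.

This expression uses the pooled value of $\hat{p}$:

$\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$.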
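As a rough illustration of the two expressions above, the following Python sketch computes the unpooled and pooled standard errors side by side for the two Sen. Porkbarrel polls (700 of 1,500 and 500 of 1,200). The function names are hypothetical.

```python
from math import sqrt


def unpooled_standard_error(x1, n1, x2, n2):
    """Standard error for a confidence interval: each sample keeps its own p_hat."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def pooled_standard_error(x1, n1, x2, n2):
    """Standard error for a significance test: H0 says p1 = p2, so pool the counts."""
    p_pooled = (x1 + x2) / (n1 + n2)
    return sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))


print(f"unpooled: {unpooled_standard_error(700, 1500, 500, 1200):.4f}")
print(f"pooled:   {pooled_standard_error(700, 1500, 500, 1200):.4f}")
```

With samples this large and proportions this similar, the two values agree to about three decimal places, but you're still expected to use the pooled version for the test and the unpooled version for the interval.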

Objective 4
Conduct a significance test for the difference between two population proportions.

Examples
1. Three weeks before the election, a poll of 1,500 voters (voting population 500,000) found that 700 planned to vote for Sen. Porkbarrel. One week before the election, a new sample of 1,200 voters found that 500 people planned to vote for Sen. Porkbarrel. Conduct a test of the hypothesis that support for Sen. Porkbarrel has declined.
2. The test described in example 1 is one-sided since we're testing for a decline only. Suppose we were only interested in whether the support level had changed. Use your answer from example 1 to state the probability that the difference between the two sample proportions was due to chance if, in fact, the population proportions were the same.

Tips
There are four steps to every significance-test problem:
1. State the hypotheses in the context of the problem.
2. State the test you plan to use and justify the assumptions needed to use it.
3. Calculate a test statistic and P-value.
4. State a conclusion.

In your calculations for the standard error $s_{\hat{p}}$, remember to use the pooled value for the sample proportion: $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$.

The test statistic for the difference between two population proportions is

$z = \frac{\hat{p}_1 - \hat{p}_2}{s_{\hat{p}}}$, where $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, and $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$.




Answers

1. Step 1:
$H_0$: $p_1 = p_2$ (The proportion of the population that will vote for Sen. Porkbarrel hasn't changed.)
$H_a$: $p_1 > p_2$ (The proportion of the population that will vote for Sen. Porkbarrel has declined.)

$\hat{p}_1 = \frac{700}{1{,}500} = .467$, $\hat{p}_2 = \frac{500}{1{,}200} = .417$, $\hat{p} = \frac{700 + 500}{1{,}500 + 1{,}200} = .444$

Step 2:
The population of people eligible to vote is much larger than the samples; also (.444)(2,700) and (.556)(2,700) are both greater than 10. We're justified in using a z-test for this two-proportion problem.

Step 3:
$z = \frac{.467 - .417}{\sqrt{(.444)(.556)\left(\frac{1}{1{,}500} + \frac{1}{1{,}200}\right)}} = \frac{.05}{.019} = 2.63$, P-value = .004

Note that the standard error of the estimate uses the pooled value of $\hat{p}$: $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$.

Step 4:
This P-value is quite low and provides strong evidence in support of rejecting the null hypothesis. We can conclude that support for Sen. Porkbarrel has declined significantly. (Note: If a significance level isn't stated beforehand, usually $P < .05$ is evidence for rejecting the null.)

2. To go from a P-value for a one-sided alternative hypothesis to a P-value for a two-sided alternative hypothesis, multiply the one-sided P-value by two. Thus you'd get 2(.004) = .008.
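The calculation in Step 3, and the doubling rule in answer 2, can be sketched in Python as follows. The function name and the choice to return both P-values at once are illustrative assumptions, not part of the course materials.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """z-test of H0: p1 = p2 using the pooled sample proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    one_sided = 1 - NormalDist().cdf(z)        # Ha: p1 > p2 (support declined)
    two_sided = 2 * NormalDist().cdf(-abs(z))  # Ha: p1 != p2 (support changed)
    return z, one_sided, two_sided


# Poll 1: 700 of 1,500; poll 2: 500 of 1,200
z, one_sided, two_sided = two_proportion_z_test(700, 1500, 500, 1200)
print(f"z = {z:.2f}, one-sided P = {one_sided:.4f}, two-sided P = {two_sided:.4f}")
```

Carrying full precision gives a z slightly smaller than the hand-computed 2.63, because the hand calculation rounds the standard error to .019. Both P-values remain well below .05, so the conclusion doesn't change.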




Summary of Formulas

Although you'll be provided with a formula sheet containing the elements of the formulas in this Review to use on the unit quiz, it's very important that you understand what these formulas mean and how they're used. Some of the formulas here are listed with extra notes as to how they're used, what the different symbols mean, or how to calculate degrees of freedom. You won't be given this information on the unit quiz. In some cases the formula sheet may not give you the exact formula you see here, so you'll need to understand the formulas well enough to be able to adapt them. For example, if you need to look up the standard error of $\hat{p}$ on the formula sheet, you may only find the formula for the standard deviation of a population proportion: $\sqrt{p(1-p)/n}$. You'd need to know that you have to modify this formula slightly to get the standard error of $\hat{p}$: $\sqrt{\hat{p}(1-\hat{p})/n}$.

If you use these formulas a lot as you study and you understand what they mean and where they come from, you should have no trouble using them on a quiz or exam. (By the time you've used the formulas enough and understand them well, you'll probably find that you've memorized most of them anyway.)

Confidence interval for a single population proportion:

$\hat{p} \pm z^{*}(s_{\hat{p}})$, where $s_{\hat{p}}$ = standard error of $\hat{p}$ = $\sqrt{\hat{p}(1-\hat{p})/n}$

Putting this all together, you get:

$\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$

Note: You'll need to know that the general form for a confidence interval is (estimate) ± (critical value)(standard error or standard deviation), and you'll also need to know which estimate or formula to plug in for each element.




Minimum sample size for a given margin of error and confidence level:

$n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*})$, where $P^{*}$ is your guess for the population proportion P.

If you don't have a good reason to predict a value of P, use P = .5, or use the formula

$n = \left(\frac{z^{*}}{2m}\right)^{2}$.
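One way to recover both sample-size formulas from the confidence interval is sketched below; treat it as a study aid, not as the wording of the formula sheet. The idea is to set the margin of error m equal to $z^{*}$ times the standard error, with your guess $P^{*}$ standing in for the unknown proportion, and then solve for n.

```latex
\begin{aligned}
m &= z^{*}\sqrt{\frac{P^{*}(1-P^{*})}{n}} \\
m^{2} &= (z^{*})^{2}\,\frac{P^{*}(1-P^{*})}{n} \\
n &= \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*}) \\
n &= \left(\frac{z^{*}}{m}\right)^{2}(.5)(.5) = \left(\frac{z^{*}}{2m}\right)^{2}
  \quad \text{when } P^{*} = .5
\end{aligned}
```

Since $P^{*}(1-P^{*})$ is largest at $P^{*} = .5$, the last line gives the most conservative (largest) sample size.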

Note: You'll either need to memorize these formulas, or know how to derive them from the formula for a confidence interval.

Test statistic for a hypothesis test on a single population proportion:

$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$, where $\sqrt{p_0(1-p_0)/n}$ is the standard deviation of $\hat{p}$.

Note: You'll need to know that a test statistic is usually constructed as:

(difference between two values) / (standard deviation or standard error of the statistic).

You'll also need to know what to plug in for the numerator and denominator. The elements will be given to you on the formula sheet, but you'll have to know how they fit together.

Confidence interval for the difference between two population proportions:

$(\hat{p}_1 - \hat{p}_2) \pm z^{*}(s_{\hat{p}_1 - \hat{p}_2})$, where $s_{\hat{p}_1 - \hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

Putting this together, you get: $(\hat{p}_1 - \hat{p}_2) \pm z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.

Note: You'll need to know that the general form for a confidence interval is (estimate) ± (critical value)(standard error or standard deviation), and you'll also need to know which estimate or formula to plug in for each element.




Test statistic for a hypothesis test for the difference between two population proportions:

$z = \frac{\hat{p}_1 - \hat{p}_2}{s_{\hat{p}}}$, where $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$, and $\hat{p} = \frac{X_1 + X_2}{n_1 + n_2}$.

Note: You'll need to know that a test statistic is usually constructed as:

(difference between two values) / (standard deviation or standard error of the statistic).

You'll also need to know what to plug in for the numerator and denominator. The elements will be given to you on the formula sheet, but you'll have to know how they fit together.




About the Unit Quiz

What to Bring
- Scratch paper
- Calculator
- Approved formula sheet
- Approved tables

You can't have any reference materials other than those specifically mentioned above. You won't be able to ask for help during the quiz.

Hints and Tips for the Free-Response Portion
- Show your work. The test corrector won't assume you used proper setup and methods if you reach the correct answer. It's up to you to communicate the methods that you used. Answers alone, without appropriate justification, will receive no credit.
- Take your time reading the question. Since we want to see how well you can apply your knowledge to new and somewhat unfamiliar situations, take some time to think about the question. If you don't understand the question, you're unlikely to find the right answer. Read the entire question before beginning to answer.
- Most questions will be given in several parts. The answers from one section will often be used in subsequent sections. Missing points in an early section does not mean you'll lose points in subsequent sections. Again, read the entire question to see how the different sections connect to each other.
- The calculator. As in the AP Exam, this quiz will test you on how well you know statistics, not on how well you can use your calculator. Be sure you understand the concepts behind the calculator operations. Don't use "calculator-speak" in your answer; the instructor doesn't want to read a set of steps for the calculator! Use your calculator for doing the mechanics, but be sure to clearly communicate your process for solving the problem.
- Use units. If units are given in the problem, make sure that you give them in your answer.
- Answer the question. Finally, be very careful to answer the question asked. Before you move on, read over your answer to make sure you're providing exactly what the question asks for. Generally, an answer to a question you weren't asked will receive no credit.


