statistics study guide - inference for proportions

Upload: ldlewis

Post on 02-Nov-2015


DESCRIPTION

Inference for Proportions

TRANSCRIPT

  • Page 1 of 13

    Key Terms and Concepts

    Before taking the Quiz, you need to be able to explain the meanings (and recognize symbols in cases where there is an associated symbol) of each of these terms or concepts. You should also know when and how to use them in statistics problems. These terms and concepts are defined in Key Terms.

      confidence interval for a single proportion
      confidence interval for two proportions
      margin of error for a single proportion
      pooled population proportions
      pooled standard error of the difference between two population proportions
      sample size for a given margin of error
      significance tests for a single proportion
      significance tests for single population proportions vs. confidence intervals for single population proportions
      standard error of the difference between two population proportions
      z-intervals for a single proportion

    ______________________________ Copyright 2011 Apex Learning Inc. (See Terms of Use at www.apexvs.com/TermsOfUse)

    AP Statistics Review: Inference for Proportions

    ToniLine

  • Page 2 of 13

    Objectives, Example Problems, and Study Tips

    Confidence Intervals and Hypothesis Tests for a Single Population Proportion

    Objective 1
    Identify why and when it's proper to use z-procedures when dealing with proportions.

    Examples
    1. Why are z-procedures used instead of t-procedures for doing inference for proportions?
    2. What assumptions are necessary to use z-procedures when doing inference for proportions?

    Tip
    The answer has something to do with the binomial distribution, and the normal approximation to the binomial distribution.

    Answers
    1. The count of successes, X, in a sample drawn from a much larger population follows approximately a binomial distribution with mean $np$ and standard deviation $\sqrt{np(1-p)}$. This means the sample proportion $\hat{p} = X/n$ has an approximately normal distribution with mean $\mu_{\hat{p}} = p$ and standard deviation $\sigma_{\hat{p}} = \sqrt{p(1-p)/n}$. You can use the normal distribution (and z-procedures) when doing inference for proportions if the necessary assumptions are met (see #2 below). (In a more advanced statistics course, you may learn to do these problems by doing exact binomial calculations.)

    2. Here are the assumptions necessary to use z-procedures when doing inference for proportions:

       The sample data are a simple random sample from the population of interest.

       For the sampling distribution to be considered a binomial distribution, the population must be at least 10 times the size of the sample. (Some textbooks say 20 times, but for this review we'll use 10.) Just be sure you check; you'll get credit as long as you cite one of these commonly used rules.

       The quantities $n\hat{p}$ and $n(1-\hat{p})$ must both be at least 10, and if you have a hypothesized value of P ($p_0$), $np_0$ and $n(1-p_0)$ must both be at least 10. (Some textbooks say "at least 5" or "greater than 5," but for this review we'll use "at least 10.") Just be sure you check; you'll get credit as long as you cite one of these commonly used rules.
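    The numeric conditions above can be checked mechanically. The following is a minimal Python sketch, not part of the original review; the function name `z_procedure_checks` and its parameter names are hypothetical, and the thresholds default to the "10 times" and "at least 10" rules this review uses. Whether the sample is a simple random sample cannot be verified from the numbers, so that condition is left to the reader.

```python
def z_procedure_checks(successes, n, population_size, p0=None,
                       min_count=10, population_factor=10):
    """Evaluate the rule-of-thumb conditions for z-procedures on a proportion.

    The SRS condition is about how the data were collected, so it is
    deliberately not checked here.
    """
    p_hat = successes / n
    checks = {
        # Population at least population_factor times the sample size.
        "population_large_enough": population_size >= population_factor * n,
        # Observed successes and failures: n*p_hat and n*(1 - p_hat).
        "successes_ok": n * p_hat >= min_count,
        "failures_ok": n * (1 - p_hat) >= min_count,
    }
    if p0 is not None:
        # With a hypothesized proportion, also check n*p0 and n*(1 - p0).
        checks["expected_successes_ok"] = n * p0 >= min_count
        checks["expected_failures_ok"] = n * (1 - p0) >= min_count
    return checks


# Poll of 1,500 voters, 650 in favor, voting population 500,000.
poll = z_procedure_checks(successes=650, n=1500, population_size=500_000)
print(all(poll.values()))
```

    Passing `min_count=5` or `population_factor=20` would apply the alternative textbook rules mentioned above.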

    Objective 2
    Construct a confidence interval for a single population proportion using the point estimate, the standard error, and the critical z-value.

    Examples
    1. What's the standard error of a sample proportion when you're constructing a confidence interval for a single population proportion?


  • Page 3 of 13

    2. Construct a 99% confidence interval for the percentage of voters in Senator Porkbarrel's district who plan to vote for him, if a sample of size 1,500 from the district (voting population 500,000) identifies 650 who say they plan to vote for him.

    Tips
    The formula for a confidence interval for a single population proportion is $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, where the point estimate is the sample proportion $\hat{p}$, and the standard error is $\sqrt{\hat{p}(1-\hat{p})/n}$.

    Remember these criteria:

       The sample data are a simple random sample from the population of interest.

       For the sampling distribution to be considered a binomial distribution, the population is at least 10 times the size of the sample. (Some textbooks say 20 times, but for this review we'll use 10.) Just be sure you check; you'll get credit as long as you cite one of these commonly used rules.

       The quantities $n\hat{p}$ and $n(1-\hat{p})$ must both be at least 10, and if you have a hypothesized value of P ($p_0$), then $np_0$ and $n(1-p_0)$ must both be at least 10. (Some textbooks say "at least 5" or "greater than 5," but for this review we'll use "at least 10.") Just be sure you check; you'll get credit as long as you cite one of these commonly used rules.

    Answers
    1. The standard error for a confidence interval for a single population proportion is $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})/n}$.

    2. We can use a z-procedure because N > 10n, and $n\hat{p}$ and $n(1-\hat{p})$ are both greater than 10.

       Use $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$.

       For a 99% confidence interval for a population proportion, z* = 2.58, and $\hat{p} = 650/1{,}500 = .43$. This gives us $.43 \pm 2.58\sqrt{.43(1-.43)/1{,}500} = .43 \pm 2.58(.013)$, which is about $.43 \pm .034$, an interval from roughly .396 to .464.

       Remember that on a quiz you should show the details as we've done here; don't just report the answer you got from your calculator.
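    As a cross-check on the hand calculation, here is a minimal Python sketch of the same interval. It is an illustration rather than part of the original review: the function name `one_proportion_ci` is hypothetical, and the critical value comes from the standard library's `statistics.NormalDist` instead of a table. Because it keeps unrounded intermediate values, its result differs slightly from a hand calculation that rounds $\hat{p}$ to .43 and z* to 2.58.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_ci(successes, n, confidence):
    """z-interval for one proportion: p_hat +/- z* * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * standard_error
    return p_hat - margin, p_hat + margin


low, high = one_proportion_ci(successes=650, n=1500, confidence=0.99)
print(f"99% CI: ({low:.3f}, {high:.3f})")
```

    The calculator-style output is only a check; on a quiz the written steps still need to be shown.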

    Objective 3
    Determine the sample size needed for a given margin of error when constructing a confidence interval for a population proportion.

    Examples
    1. You want to estimate the percentage of voters who'll vote for Sen. Porkbarrel in the next election to within 2% with 95% confidence. What's the minimum value of n needed to do this?


  • Page 4 of 13

    2. Suppose we're pretty sure that Sen. Porkbarrel will receive very close to 60% of the

    vote. What's the minimum sample size needed to be within 2% at the 95% level of confidence? At the 99% level?

    Tips

    The formula for the minimum sample size is $n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*})$, where P* is your guess for the population proportion P and m is the desired margin of error. If you don't have a good reason to predict a value of P, use P = .5, or use the formula $n = \left(\frac{z^{*}}{2m}\right)^{2}$.

    Answers

    1. Assuming P = .5, $n = \left(\frac{z^{*}}{2m}\right)^{2} = \left(\frac{1.96}{2(.02)}\right)^{2} = 2{,}401$.

    2. Assuming P = .6, $n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*}) = \left(\frac{1.96}{.02}\right)^{2}(.6)(.4) = 2{,}304.96$. So 2,305 subjects will be needed for this poll. Note that we need fewer subjects than when we assumed that P = .5. In general, P = .5 will give the greatest sample size (and thus the safest estimate) for the expression $n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*})$. For a 99% confidence interval, $n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*}) = \left(\frac{2.58}{.02}\right)^{2}(.6)(.4) = 3{,}993.84$, so we'd need 3,994 subjects. You can see that it might be expensive to move from 95% confidence to 99% confidence!
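    The sample-size arithmetic above can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not part of the original review: the function name `min_sample_size` is hypothetical, and the critical values 1.96 and 2.58 are the rounded table values used in this review. The inputs are converted to exact fractions because binary floating-point round-off could otherwise nudge a whole-number result upward before it is rounded up.

```python
from fractions import Fraction
from math import ceil


def min_sample_size(z_star, margin, p_guess=0.5):
    """Smallest n satisfying n >= (z*/m)^2 * p*(1 - p*) for a guessed p*."""
    # Going through str() turns decimals such as 1.96 into exact fractions
    # (49/25), so the final round-up is not affected by float round-off.
    z, m, p = (Fraction(str(v)) for v in (z_star, margin, p_guess))
    return ceil((z / m) ** 2 * p * (1 - p))


print(min_sample_size(1.96, 0.02))        # 2401 (95%, no prior guess)
print(min_sample_size(1.96, 0.02, 0.6))   # 2305 (95%, guess of .6)
print(min_sample_size(2.58, 0.02, 0.6))   # 3994 (99%, guess of .6)
```

    Leaving `p_guess` at its default of .5 gives the most conservative (largest) sample size.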

    Objective 4
    Explain the difference between the standard error of a sample proportion for a confidence interval, and the standard deviation of a sample proportion for a significance test.

    Example
    Explain the difference between the standard error of a sample proportion for a confidence interval and the standard deviation of a sample proportion for a significance test.

    Answer
    For a significance test, you're assuming that you know the population proportion (the hypothesized value, $p_0$), and so you use $p_0$ instead of $\hat{p}$ in the formula $\sqrt{\hat{p}(1-\hat{p})/n}$, and it becomes $\sqrt{p_0(1-p_0)/n}$. Since you're using $p_0$ instead of $\hat{p}$, you now have a standard deviation. To summarize:


  • Page 5 of 13

    For confidence intervals, the standard error of $\hat{p}$ is $\sqrt{\hat{p}(1-\hat{p})/n}$, where $\hat{p}$ is the sample proportion, and n is the sample size.

    For hypothesis tests, the standard deviation of $\hat{p}$ is $\sqrt{p_0(1-p_0)/n}$, where $p_0$ is the hypothesized value of P, and n is the sample size.

    Objective 5
    Conduct a significance test about a single population proportion.

    Example
    Sen. Porkbarrel's official pollster predicts that he'll receive 50.1% of the vote. A simple random sample of 1,500 voters (voting population 500,000) identifies 700 who say they plan to vote for Sen. Porkbarrel. Conduct a test of the hypothesis that the population proportion who will vote for Sen. Porkbarrel is .501. Assume the alternative is that the pollster is wrong.

    Tips

    Remember to use $\sqrt{p_0(1-p_0)/n}$ instead of $\sqrt{\hat{p}(1-\hat{p})/n}$ for a significance test.

    The formula for a test statistic for a single population proportion is $z = \frac{\hat{p}-p_0}{\sigma_{\hat{p}}} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$.

    Remember that when doing a significance test, the null hypothesis states that $H_0: p = p_0$.

    Answer

    $H_0: p = .501$ (50.1% of the voters will vote for Sen. Porkbarrel), and $H_a: p \ne .501$ (the proportion of voters voting for Sen. Porkbarrel will be different from .501). Assuming that the 1,500 is an SRS from all eligible voters, the population from which the sample is drawn is more than 10 times greater than the sample size. In addition, (1,500)(.501) and (1,500)(.499) are both greater than 10. Thus we're justified in using z-procedures for this proportion problem.

    Thus we have $\hat{p} = 700/1{,}500 = .467$, $\sigma_{\hat{p}} = \sqrt{(.501)(.499)/1{,}500} = .013$ (remember in a significance test you use the hypothesized value of P, not $\hat{p}$), and $z = \frac{.467-.501}{.013} = -2.62 \Rightarrow P = 2(.004) = .008$. This P-value is quite low, which leads us to reject our null hypothesis and conclude that the pollster was probably wrong in his belief that Sen. Porkbarrel would receive 50.1% of the vote.
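    The test above can be sketched in a few lines of Python. This is an illustrative sketch, not part of the original review: the function name `one_proportion_z_test` is hypothetical, and the normal tail area comes from the standard library's `statistics.NormalDist` rather than a z-table. It keeps unrounded intermediate values, so the z statistic comes out near -2.66 instead of the rounded hand value.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Two-sided z-test of H0: p = p0 against Ha: p != p0."""
    p_hat = successes / n
    # Under H0 the standard deviation uses the hypothesized p0, not p_hat.
    sd = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / sd
    # Two-sided P-value: twice the tail area beyond |z|.
    p_value = 2 * NormalDist().cdf(-abs(z))
    return z, p_value


z, p_value = one_proportion_z_test(successes=700, n=1500, p0=0.501)
print(f"z = {z:.2f}, two-sided P-value = {p_value:.4f}")
```

    The conclusion matches the hand calculation: the P-value is well below .05, so the null hypothesis is rejected.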


  • Page 6 of 13

    The Difference Between Two Proportions

    Objective 1
    State the main criterion necessary to use two-sample procedures to compare two population proportions.

    Example
    State the main criterion necessary to use two-sample procedures to compare two population proportions.

    Answer
    The proportions must be for the same variable, but from two separate populations. The two populations can be two separate groups (proportion of Republicans in California compared to the proportion in Washington), or they may be the same group measured at different times (proportion of Californians in 1998 who are Republicans compared to the proportion of Californians in 2000 who are Republicans).

    Objective 2
    Construct confidence intervals for the difference between two population proportions.

    Examples
    1. Three weeks before the election, a poll of 1,500 voters (voting population 500,000) found that 700 planned to vote for Sen. Porkbarrel. One week before the election, a new sample of 1,200 voters found that 500 people planned to vote for Sen. Porkbarrel. Construct a 99% confidence interval for the difference between the population proportions.

    2. Does the confidence interval you constructed in example 1 give evidence that support is declining for Sen. Porkbarrel?

    Tip
    The formula for a confidence interval for the difference between two population proportions is $(\hat{p}_1-\hat{p}_2) \pm z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.

    Answers

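    1. Use $(\hat{p}_1-\hat{p}_2) \pm z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$. In this example, $\hat{p}_1 = 700/1{,}500 = .467$ and $\hat{p}_2 = 500/1{,}200 = .417$. Thus we have

       $(.467-.417) \pm (2.58)\sqrt{\frac{.467(.533)}{1{,}500}+\frac{.417(.583)}{1{,}200}} = .05 \pm 2.58(.019) = (0^{+}, .099)$.

       (The $0^{+}$ indicates that the value is minutely greater than 0; the lower bound works out to about .001.) Note that the expression for the standard error of the estimate uses the values of $\hat{p}_1$ and $\hat{p}_2$.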


  • Page 7 of 13

    2. Zero is on the verge of being in the interval, so technically we can't conclude that this difference is unlikely if the proportions were the same in the population. However, since 0 is so close to being in the interval, we should be able to argue that there's evidence (how strong is debatable) that the support for Sen. Porkbarrel seems to have declined between the two pollings.
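    To see how close zero is to the edge of the interval, the two-proportion interval can be recomputed without rounding. The sketch below is illustrative only and not part of the original review; the function name `two_proportion_ci` is hypothetical, and the critical value is taken from `statistics.NormalDist` rather than the rounded table value 2.58.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence):
    """z-interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Each sample contributes its own p_hat(1 - p_hat)/n term.
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z_star * standard_error, diff + z_star * standard_error


# First poll: 700 of 1,500; second poll: 500 of 1,200.
low, high = two_proportion_ci(700, 1500, 500, 1200, confidence=0.99)
print(f"99% CI for p1 - p2: ({low:.4f}, {high:.4f})")
```

    With unrounded inputs the lower bound stays just above zero, which is consistent with the "on the verge" description above.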

    Objective 3
    State the justification criteria for pooling variances to estimate the standard error of the difference between two population proportions when doing a significance test.

    Example
    For means, it generally isn't a good idea to pool variances when calculating the standard error for the difference between two population means. Why is it okay (even required) when doing a significance test for proportions?

    Tip
    Remember that when doing a significance test for the difference between two population proportions, the null assumes that the two proportions are equal.

    Answer
    When testing for a difference between two population means, we assume that the population means are equal, but we can't assume that the population standard deviations are equal without some further analysis. When testing for a difference between two proportions, the standard error is defined in terms of proportions, and the null hypothesis assumes that the proportions are equal, so a single pooled estimate of that common proportion belongs in the standard error.


  • Page 8 of 13

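    To summarize it mathematically:

    $s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$, but $H_0: p_1 = p_2$. Thus

    $s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n_1}+\frac{\hat{p}(1-\hat{p})}{n_2}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$.

    This expression uses the pooled value of $\hat{p}$: $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$.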
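    A small numerical comparison may make the pooling step concrete. The Python sketch below is an illustration, not part of the original review; the function names `unpooled_se` and `pooled_se` are hypothetical, and the counts are the two Porkbarrel polls used in this review (700 of 1,500 and 500 of 1,200).

```python
from math import sqrt


def unpooled_se(x1, n1, x2, n2):
    """Standard error for a confidence interval: separate sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    return sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)


def pooled_se(x1, n1, x2, n2):
    """Standard error for a significance test: one pooled proportion under H0."""
    p = (x1 + x2) / (n1 + n2)
    return sqrt(p * (1 - p) * (1 / n1 + 1 / n2))


se_interval = unpooled_se(700, 1500, 500, 1200)
se_test = pooled_se(700, 1500, 500, 1200)
print(f"unpooled: {se_interval:.5f}, pooled: {se_test:.5f}")
```

    For these data the two values agree to about three decimal places, which is why both hand calculations in this review round to .019.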

    Objective 4
    Conduct a significance test for the difference between two population proportions.

    Examples
    1. Three weeks before the election, a poll of 1,500 voters (voting population 500,000) found that 700 planned to vote for Sen. Porkbarrel. One week before the election, a new sample of 1,200 voters found that 500 people planned to vote for Sen. Porkbarrel. Conduct a test of the hypothesis that support for Sen. Porkbarrel has declined.

    2. The test described in example 1 is one-sided since we're testing for a decline only. Suppose we were only interested in whether the support level had changed. Use your answer from example 1 to state the probability that the difference between the two sample proportions was due to chance if, in fact, the population proportions were the same.

    Tips

    There are four steps to every significance-test problem:
    1. State the hypotheses in the context of the problem.
    2. State the test you plan to use and justify the assumptions needed to use it.
    3. Calculate a test statistic and P-value.
    4. State a conclusion.

    In your calculations for the standard error of $\hat{p}$, remember to use the pooled value for the sample proportion: $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$.

    The test statistic for the difference between two population proportions is $z = \frac{\hat{p}_1-\hat{p}_2}{s_{\hat{p}}}$, where $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$, and $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$.


  • Page 9 of 13

    Answers
    1. Step 1:
       $H_0: p_1 = p_2$ (The proportion of the population that will vote for Sen. Porkbarrel hasn't changed.)
       $H_a: p_1 > p_2$ (The proportion of the population that will vote for Sen. Porkbarrel has declined.)

       $\hat{p}_1 = \frac{700}{1{,}500} = .467$, $\hat{p}_2 = \frac{500}{1{,}200} = .417$, $\hat{p} = \frac{700+500}{1{,}500+1{,}200} = .444$

       Step 2:
       The population of people eligible to vote is much larger than the samples; also (.444)(2,700) and (.556)(2,700) are both greater than 10. We're justified in using a z-test for this two-proportion problem.

       Step 3:
       $z = \frac{.467-.417}{\sqrt{(.444)(.556)\left(\frac{1}{1{,}500}+\frac{1}{1{,}200}\right)}} = \frac{.05}{.019} = 2.63 \Rightarrow$ P-value = .004

       Note that the standard error of the estimate uses the pooled value of $\hat{p}$: $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$.

       Step 4:
       This P-value is quite low and provides strong evidence in support of rejecting the null hypothesis. We can conclude that support for Sen. Porkbarrel has declined significantly. (Note: If a significance level isn't stated beforehand, usually $P \le .05$ is evidence for rejecting the null.)

    2. To go from a P-value for a one-sided alternative hypothesis to a P-value for a two-sided alternative hypothesis, multiply the one-sided P-value by two. Thus you'd get 2(.004) = .008.
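    The calculation in Step 3 and the one-sided to two-sided conversion can be sketched in Python as follows. This is an illustration under stated assumptions, not part of the original review: the function name `two_proportion_z_test` is hypothetical, and tail areas come from `statistics.NormalDist` rather than a table. With unrounded intermediate values the statistic is close to 2.60 instead of the hand value 2.63, so the P-values differ slightly from .004 and .008.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z-test of H0: p1 = p2; returns z and both P-values."""
    p1, p2 = x1 / n1, x2 / n2
    # Under H0 both samples estimate one proportion, so pool the counts.
    p_pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    p_one_sided = 1 - NormalDist().cdf(z)   # Ha: p1 > p2
    p_two_sided = 2 * NormalDist().cdf(-abs(z))  # Ha: p1 != p2
    return z, p_one_sided, p_two_sided


z, p_one, p_two = two_proportion_z_test(700, 1500, 500, 1200)
print(f"z = {z:.2f}, one-sided P = {p_one:.4f}, two-sided P = {p_two:.4f}")
```

    Either way, the one-sided P-value is far below .05, so the conclusion in Step 4 is unchanged.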


  • Page 10 of 13

    Summary of Formulas

    Although you'll be provided with a formula sheet containing the elements of the formulas in this Review to use on the unit quiz, it's very important that you understand what these formulas mean and how they're used. Some of the formulas here are listed with extra notes as to how they're used, what the different symbols mean, or how to calculate degrees of freedom. You won't be given this information on the unit quiz. In some cases the formula sheet may not give you the exact formula you see here, so you'll need to understand the formulas well enough to be able to adapt them. For example, if you need to look up the standard error of $\hat{p}$ on the formula sheet, you may only find the formula for the standard deviation of a population proportion: $\sqrt{p(1-p)/n}$. You'd need to know that you have to modify this formula slightly to get the standard error of $\hat{p}$: $\sqrt{\hat{p}(1-\hat{p})/n}$.

    If you use these formulas a lot as you study and you understand what they mean and where they come from, you should have no trouble using them on a quiz or exam. (By the time you've used the formulas enough and understand them well, you'll probably find that you've memorized most of them anyway.)

    Confidence interval for a single population proportion:

    $\hat{p} \pm z^{*}(s_{\hat{p}})$, where $s_{\hat{p}}$ = standard error of $\hat{p}$ = $\sqrt{\hat{p}(1-\hat{p})/n}$

    Putting this all together, you get:

    $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$

    Note: You'll need to know that the general form for a confidence interval is (estimate) ± (critical value)(standard error or standard deviation), and you'll also need to know which estimate or formula to plug in for each element.


  • Page 11 of 13

    Minimum sample size for a given margin of error and confidence level:

    $n = \left(\frac{z^{*}}{m}\right)^{2} P^{*}(1-P^{*})$, where P* is your guess for the population proportion P.

    If you don't have a good reason to predict a value of P, use P = .5, or use the formula $n = \left(\frac{z^{*}}{2m}\right)^{2}$.

    Note: You'll either need to memorize these formulas, or know how to derive them from the formula for a confidence interval.

    Test statistic for a hypothesis test on a single population proportion:

    $z = \frac{\hat{p}-p_0}{\sigma_{\hat{p}}} = \frac{\hat{p}-p_0}{\sqrt{p_0(1-p_0)/n}}$, where $\sqrt{p_0(1-p_0)/n}$ is the standard deviation of $\hat{p}$.

    Note: You'll need to know that a test statistic is usually constructed as:

    $\frac{\text{difference between two values}}{\text{standard deviation or standard error of the statistic}}$.

    You'll also need to know what to plug in for the numerator and denominator. The elements will be given to you on the formula sheet, but you'll have to know how they fit together.

    Confidence interval for the difference between two population proportions:

    $(\hat{p}_1-\hat{p}_2) \pm z^{*}(s_{\hat{p}_1-\hat{p}_2})$, where $s_{\hat{p}_1-\hat{p}_2} = \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$

    Putting this together, you get: $(\hat{p}_1-\hat{p}_2) \pm z^{*}\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$.

    Note: You'll need to know that the general form for a confidence interval is (estimate) ± (critical value)(standard error or standard deviation), and you'll also need to know which estimate or formula to plug in for each element.


  • Page 12 of 13

    Test statistic for a hypothesis test for the difference between two population proportions:

    $z = \frac{\hat{p}_1-\hat{p}_2}{s_{\hat{p}}}$, where $s_{\hat{p}} = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}$, and $\hat{p} = \frac{X_1+X_2}{n_1+n_2}$.

    Note: You'll need to know that a test statistic is usually constructed as:

    $\frac{\text{difference between two values}}{\text{standard deviation or standard error of the statistic}}$.

    You'll also need to know what to plug in for the numerator and denominator. The elements will be given to you on the formula sheet, but you'll have to know how they fit together.


  • Page 13 of 13

    About the Unit Quiz

    What to Bring
      Scratch paper
      Calculator
      Approved formula sheet
      Approved tables

    You can't have any reference materials other than those specifically mentioned above. You won't be able to ask for help during the quiz.

    Hints and Tips for the Free-Response Portion

    Show your work. The test corrector won't assume you used proper setup and methods if you reach the correct answer. It's up to you to communicate the methods that you used. Answers alone, without appropriate justification, will receive no credit.

    Take your time reading the question. Since we want to see how well you can apply your knowledge to new and somewhat unfamiliar situations, take some time to think about the question. If you don't understand the question, you're unlikely to find the right answer. Read the entire question before beginning to answer.

    Most questions will be given in several parts. The answers from one section will often be used in subsequent sections. Missing points in an early section does not mean you'll lose points in subsequent sections. Again, read the entire question to see how the different sections connect to each other.

    The calculator. As in the AP Exam, this quiz will test you on how well you know statistics, not on how well you can use your calculator. Be sure you understand the concepts behind the calculator operations. Don't use "calculator-speak" in your answer; the instructor doesn't want to read a set of steps for the calculator! Use your calculator for doing the mechanics, but be sure to clearly communicate your process for solving the problem.

    Use units. If units are given in the problem, make sure that you give them in your answer.

    Answer the question. Finally, be very careful to answer the question asked. Before you move on, read over your answer to make sure you're providing exactly what the question asks for. Generally, an answer to a question you weren't asked will receive no credit.
