chapter10 revised

Chapter 10 Introduction to Inference Section 10.1 Estimating with Confidence

Upload: broward-county-schools

Post on 20-Jun-2015


Category:

Education



DESCRIPTION

powerpoint for chapter 10 AP Stat

TRANSCRIPT

Page 1: Chapter10 Revised

Chapter 10: Introduction to Inference

Section 10.1: Estimating with Confidence

Page 2: Chapter10 Revised

Twenty-five samples from the same population gave these 95% confidence intervals. In the long run, 95% of all samples give an interval that contains the population mean µ.

Page 3: Chapter10 Revised

In 95% of all samples, x-bar lies within ±9 of the unknown population mean µ. Equivalently, µ lies within ±9 of x-bar in those samples.

Page 4: Chapter10 Revised

To say that x-bar ± 9 is a 95% confidence interval for the population mean µ is to say that, in repeated samples, 95% of these intervals capture µ.
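The "95% of intervals capture µ" idea can be checked by simulation. The sketch below is only an illustration: the population mean, standard deviation, sample size, number of trials and random seed are hypothetical values chosen for the demonstration, not numbers from the slides.

```python
import random
from math import sqrt

random.seed(10)  # fixed seed so the run is repeatable (arbitrary choice)

# Hypothetical population and sampling plan, assumed for illustration only.
MU, SIGMA, N = 50.0, 10.0, 25
Z_STAR = 1.960          # critical value for 95% confidence
TRIALS = 2000

margin = Z_STAR * SIGMA / sqrt(N)
captured = 0
for _ in range(TRIALS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    x_bar = sum(sample) / N
    # The interval x-bar ± margin captures µ when µ lies between its endpoints.
    if x_bar - margin <= MU <= x_bar + margin:
        captured += 1

coverage = captured / TRIALS
print(f"Fraction of intervals that captured mu: {coverage:.3f}")
```

The printed fraction should land near 0.95. Any single interval either contains µ or does not; the 95% describes the method over many samples.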

Page 5: Chapter10 Revised

The central probability 0.8 under a standard normal curve lies between –1.28 and +1.28. That is, there is area 0.1 to the left of –1.28 and area 0.1 to the right of +1.28.

Page 6: Chapter10 Revised

The central probability C under a standard normal curve lies between –z* and +z*. Because z* has area (1-C)/2 to its right under the curve, we call it the upper (1-C)/2 critical value.

Page 7: Chapter10 Revised

• As we learned in Chapter 9, because of sampling variability, the statistic calculated from a sample is rarely equal to the true parameter of interest. Therefore, when we are trying to estimate a parameter, we must go beyond our statistic value to construct a reasonable range that captures the true parameter value.

Page 8: Chapter10 Revised

Confidence Interval:

• The data must come from an SRS of the population of interest

• The sampling distribution of x-bar is approximately normal

Page 9: Chapter10 Revised

Sampling Distribution of p-hat Proportions:

• Let p-hat be the proportion of successes in a random sample of size n from a population whose proportion of successes is p. The mean of p-hat is µp-hat and its standard deviation is σp-hat.

• Rule 1: µp-hat = p

• Rule 2: σp-hat = √(p(1 – p)/n)

• Rule 3: when n is large and p is not very close to 0 or 1, the sampling distribution of p-hat is approximately normal.

Page 10: Chapter10 Revised

Sampling Distribution of x-bar – Means

• Let x-bar be the mean of the observations in a random sample of size n from a population having mean µ and standard deviation σ. The mean of the x-bar distribution is µxbar and its standard deviation is σxbar.

• Rule 1: µxbar = µ

• Rule 2: σxbar = σ/√n, as long as no more than 10% of the population is in the sample.

• Rule 3: when the population distribution is normal, the sampling distribution of x-bar is also normal for any sample size n.

• Rule 4 (Central Limit Theorem, CLT): when n is sufficiently large, the sampling distribution of x-bar is well approximated by a normal curve even when the population is not itself normal.

Page 11: Chapter10 Revised

Confidence Level C    Single Tail Area (1-C)/2    Critical z value

90%                   .05                         1.645

95%                   .025                        1.960

99%                   .005                        2.576
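The critical values in this table can be reproduced with the inverse of the standard normal cumulative distribution. This is a minimal sketch using Python's standard library; the helper name critical_z is made up for the illustration.

```python
from statistics import NormalDist

def critical_z(confidence: float) -> float:
    """Return z* with area (1 - C)/2 to its right under the standard normal curve."""
    tail = (1 - confidence) / 2
    return NormalDist().inv_cdf(1 - tail)

for c in (0.90, 0.95, 0.99):
    print(f"C = {c:.0%}   tail area = {(1 - c) / 2:.3f}   z* = {critical_z(c):.3f}")
```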

Page 12: Chapter10 Revised

Estimating with Confidence – Interval Behavior

Margin of Error = (z critical value) · σ/√n

What are some ways in which we could lower our margin of error?

• z gets smaller (that is, we use a lower confidence level)

• σ gets smaller

• n gets larger
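The three ways of shrinking the margin of error can be seen numerically. The sketch below starts from σ = 43 and n = 20 (the values used in Example Problem 1) at 95% confidence; the alternative values σ = 30 and n = 80 are hypothetical, chosen only to show the direction of the change.

```python
from math import sqrt

def margin_of_error(z_star: float, sigma: float, n: int) -> float:
    """Margin of error for a z interval: z* · σ/√n."""
    return z_star * sigma / sqrt(n)

base = margin_of_error(1.960, 43, 20)              # 95% confidence
lower_confidence = margin_of_error(1.645, 43, 20)  # 90% confidence: smaller z*
smaller_sigma = margin_of_error(1.960, 30, 20)     # hypothetical smaller σ
larger_n = margin_of_error(1.960, 43, 80)          # hypothetical n, four times larger

print(f"baseline:          {base:.2f}")
print(f"lower confidence:  {lower_confidence:.2f}")
print(f"smaller sigma:     {smaller_sigma:.2f}")
print(f"larger n:          {larger_n:.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the margin of error.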

Page 13: Chapter10 Revised

Confidence Interval for p (Single Proportion)

If an SRS of size n is chosen from a population with unknown proportion p, then a level C confidence interval for p is:

p-hat ± (z critical value) · √(p-hat(1 – p-hat)/n)

Confidence Interval for µ (Single Mean, σ known)

If an SRS of size n is chosen from a population with unknown mean µ, then a level C confidence interval for µ is:

x-bar ± (z critical value) · σ/√n

Page 14: Chapter10 Revised

We will follow the 4 step process outlined below throughout the chapter.

Page 15: Chapter10 Revised

Inference Toolbox: Confidence Intervals

To Construct a Confidence Interval for a parameter

(1) Identify the population of interest and the parameter you want to draw conclusions about

(2) Choose the appropriate inference procedure. Verify the conditions for the selected procedure

(3) If the conditions are met, carry out the inference procedure.

CI = estimate ± margin of error

(4) Interpret the results in the context of the problem

Page 16: Chapter10 Revised

Example Problem 1

A manufacturer of high resolution video terminals must control the tension of the viewing screen to avoid tears and wrinkles. The tension is measured in millivolts (mV). Careful study has shown that when the process is operating properly, the standard deviation of the tension reading is 43 mV.

(1) Given a sample of 20 screens, with xbar = 306.3 mV, from a single day’s production, construct a 90% confidence interval for the mean tension µ of all the screens produced on this day. Follow the steps from the previous slide.

Page 17: Chapter10 Revised

(1) Identify the population of interest and the parameter you want to draw conclusions about: all video terminals produced on the same day; the parameter is their mean tension µ.

(2) Choose the appropriate inference procedure: σ is known, so use a confidence interval for µ based on an SRS of 20. Verify the conditions: x-bar is approximately normal.

(3) If the conditions are met, carry out the inference procedure.

CI = estimate ± margin of error

CI = x-bar ± (critical z value) · σ/√n

CI = 306.3 ± 1.645 · 43/√20 = 306.3 ± 15.8 = (290.5, 322.1)

(4) Interpret the results in the context of the problem: We are 90% confident that the true mean tension is between 290.5 and 322.1 mV.

Page 18: Chapter10 Revised

• 2. Suppose the manufacturer wants 99% confidence rather than 90%. Using your data from problem 1, construct a 99% CI. How does it compare to the 90% CI? Why?

306.3 ± 2.576 (43/√20) =

306.3 ± 24.768 =

(281.532, 331.068)

The interval is wider. Greater confidence that the interval captures the true value requires a larger margin of error.
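The intervals in problems 1 and 2 can be checked with a few lines of code. This is only a sketch; the function name z_interval is invented for the illustration, and the critical values are taken from the table earlier in the section.

```python
from math import sqrt

def z_interval(x_bar: float, sigma: float, n: int, z_star: float) -> tuple[float, float]:
    """Confidence interval for µ when σ is known: x-bar ± z* · σ/√n."""
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

ci_90 = z_interval(306.3, 43, 20, 1.645)
ci_99 = z_interval(306.3, 43, 20, 2.576)

print(f"90% CI: ({ci_90[0]:.1f}, {ci_90[1]:.1f})")
print(f"99% CI: ({ci_99[0]:.1f}, {ci_99[1]:.1f})")
print(f"99% interval is wider: {ci_99[1] - ci_99[0] > ci_90[1] - ci_90[0]}")
```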

Page 19: Chapter10 Revised

• 3. Company management wants to report the mean screen tension for the day’s production accurate within 5 mV with 95% confidence. How large a sample of video monitors must be measured to comply with this request?

1.960(43)/√n ≤ 5

84.28 ≤ 5√n

16.856 ≤ √n

n ≥ 284.12, so round up to n = 285
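The sample-size calculation can be written as a small function. This is a sketch with a hypothetical helper name; it solves z* · σ/√n ≤ m for n and always rounds up, because rounding down would give a margin of error slightly larger than requested.

```python
from math import ceil

def required_sample_size(z_star: float, sigma: float, margin: float) -> int:
    """Smallest n with z* · σ/√n <= margin, i.e. n >= (z* · σ / margin)²."""
    return ceil((z_star * sigma / margin) ** 2)

n_needed = required_sample_size(1.960, 43, 5)
print(n_needed)
```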

Page 20: Chapter10 Revised

Do problems 1,7 and 10 from The Practice of Statistics 2nd edition.

Page 21: Chapter10 Revised

Section 10.2

Tests of Significance

Page 22: Chapter10 Revised

Tests of Significance:

We have learned that Confidence Intervals can be used to estimate a parameter. Often in statistics we want to use sample data to determine whether or not a claim or hypothesis about a parameter is plausible. A test of significance is a procedure in which we can use sample data to test such a claim. We will focus on tests about a population mean μ and proportions p.

Page 23: Chapter10 Revised

Note about Identifying H0 and HA

Page 24: Chapter10 Revised

Practice: Identify the Null and Alternative Hypothesis. Express the corresponding null and alternative hypotheses in symbolic form.

a) The proportion of drivers who admit to running red lights is greater than 0.5.

Page 25: Chapter10 Revised

a) The proportion of drivers who admit to running red lights is greater than 0.5.

• First we express the given claim as p > 0.5.

• Next, we notice that the statement does not contain an equality, so it has to be the alternative hypothesis.

H0: p = 0.5

HA: p > 0.5

Page 26: Chapter10 Revised

Practice: Identify the Null and Alternative Hypothesis. Express the corresponding null and alternative hypotheses in symbolic form.

b) The mean height of professional basketball players is at most 7 ft.

Page 27: Chapter10 Revised

b) The mean height of professional basketball players is at most 7 ft.

• Express "a mean of at most 7 ft" in symbols: µ ≤ 7.

• The expression µ ≤ 7 contains an equality, so it is the null hypothesis.

H0: µ ≤ 7

HA: µ > 7

Page 28: Chapter10 Revised

Practice: Identify the Null and Alternative Hypothesis. Express the corresponding null and alternative hypotheses in symbolic form.

c) The standard deviation of IQ scores of actors is equal to 15.

Page 29: Chapter10 Revised

c) The standard deviation of IQ scores of actors is equal to 15.

• Express the given claim as σ = 15. This is the null hypothesis.

H0: σ = 15

HA: σ ≠ 15

Page 30: Chapter10 Revised

Example Problem 2

Spinifex pigeons, one of the few bird species that inhabit the desert of Western Australia, rely on seeds for food. The article “Field metabolism and water requirements of Spinifex pigeons in Western Australia” reported the following Minitab analysis of the weight of seed in grams in the stomach contents of Spinifex pigeons. Use the analysis to construct and interpret a 95% confidence interval for the average weight of seeds in this type of pigeon’s stomach and compare your results to the hypothesis that the mean seed amount is 1g for all Spinifex pigeons.

TEST OF MU = 1.000 VS MU N.E. 1.000

          N     MEAN    STDEV   SE MEAN      T   PVALUE

WEIGHT   16    1.373    1.034     0.258   1.44     0.17

Page 31: Chapter10 Revised

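The Spinifex pigeon problem

• z = (1.373 – 1.000)/.258 = 1.445

• Two-sided P-value: p = .1490

• 95% CI: 1.373 ± 1.96 (1.034/√16)

• 1.373 ± 1.96 (.258)

• .8664 < μ < 1.880

• We are 95% confident the mean seed weight is between .8664 and 1.880 g

• (.8664, 1.880)

• Because 1 lies inside this interval, the data do not contradict the hypothesis that the mean seed weight is 1 g.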
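The pigeon calculation can be reproduced from the Minitab summary numbers. The sketch below follows the slide and uses the normal (z) approximation with the unrounded standard error 1.034/√16. The Minitab output itself reports a t statistic and a P-value of 0.17, which comes from the t distribution; Python's standard library has no t distribution, so that value is not recomputed here.

```python
from math import sqrt
from statistics import NormalDist

n, mean, stdev, mu0 = 16, 1.373, 1.034, 1.000

se = stdev / sqrt(n)                      # standard error of the mean
z = (mean - mu0) / se                     # test statistic
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

lower = mean - 1.96 * se                  # 95% interval, z-based as on the slide
upper = mean + 1.96 * se

print(f"SE = {se:.4f}, z = {z:.3f}, two-sided P = {p_two_sided:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
print(f"Hypothesized mean inside interval: {lower < mu0 < upper}")
```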

Page 32: Chapter10 Revised

• Do problems 12, 30 and 42 from The Practice of Statistics 2nd edition

Page 33: Chapter10 Revised

Section 10.3

Making Sense of Statistical Significance

Page 34: Chapter10 Revised

Inference Toolbox: Test of Significance

(1) Identify the population of interest and the parameter you want to draw conclusions about

(2) State null and alternative hypotheses in words and symbols

(3) Choose the appropriate inference procedure

(4) Verify the conditions for using the procedure

(5) If the conditions are met, carry out the inference procedures

(6) Interpret the results in the context of the problem

Page 35: Chapter10 Revised

Assumptions for Testing Claims About a Population Proportion p

1) The sample observations are a simple random sample.

2) The conditions for a binomial experiment are satisfied.

3) The conditions np ≥ 5 and n(1 – p) ≥ 5 are satisfied, so the binomial distribution of sample proportions can be approximated by a normal distribution with µ = np and σ = √(np(1 – p)).

Page 36: Chapter10 Revised

To determine the significance at a set level, alpha, one needs to calculate the P-value for the observed statistic. Careful consideration of the test statistic, z, can also be used to determine significance:

Page 37: Chapter10 Revised

P-Value

The P-value (or p-value or probability value) is the probability of getting a value of the test statistic that is at least as extreme as the one representing the sample data, assuming that the null hypothesis is true. The null hypothesis is rejected if the P-value is very small, such as 0.05 or less.

Page 38: Chapter10 Revised
Page 39: Chapter10 Revised

Two-tailed Test

H0: =

H1: ≠

α is divided equally between the two tails of the critical region.

≠ means less than or greater than.

Page 40: Chapter10 Revised

Right-tailed Test

H0: =

H1: > Points Right

Page 41: Chapter10 Revised

Left-tailed Test

H0: =

H1: < Points Left

Page 42: Chapter10 Revised
Page 43: Chapter10 Revised

Critical Region

The critical region (or rejection region) is the set of all values of the test statistic that cause us to reject the null hypothesis.

Page 44: Chapter10 Revised

Practice: Finding P-values. First determine whether the given conditions result in a right-tailed test, a left-tailed test, or a two-tailed test, then find the P-values and state a conclusion about the null hypothesis.

a) A significance level of α = 0.05 is used in testing the claim that p > 0.25, and the sample data result in a test statistic of z = 1.18.

Page 45: Chapter10 Revised

Practice: Finding P-values. First determine whether the given conditions result in a right-tailed test, a left-tailed test, or a two-tailed test, then find the P-values and state a conclusion about the null hypothesis.

a) With a claim of p > 0.25, the test is right-tailed. Because the test is right-tailed, the P-value is the area to the right of the test statistic z = 1.18. Using Table A, we find that the area to the right of z = 1.18 is 0.1190. The P-value of 0.1190 is greater than the significance level α = 0.05, so we fail to reject the null hypothesis.

Page 46: Chapter10 Revised

Practice: Finding P-values. First determine whether the given conditions result in a right-tailed test, a left-tailed test, or a two-tailed test, then find the P-values and state a conclusion about the null hypothesis.

b) A significance level of α = 0.05 is used in testing the claim that p ≠ 0.25, and the sample data result in a test statistic of z = 2.34.

Page 47: Chapter10 Revised

Practice: Finding P-values. First determine whether the given conditions result in a right-tailed test, a left-tailed test, or a two-tailed test, then find the P-values and state a conclusion about the null hypothesis.

b) With a claim of p ≠ 0.25, the test is two-tailed. Because the test is two-tailed, and because the test statistic z = 2.34 is to the right of the center, the P-value is twice the area to the right of z = 2.34. Using Table A, we find that the area to the right of z = 2.34 is 0.0096, so the P-value = 2 x 0.0096 = 0.0192. The P-value of 0.0192 is less than the significance level α = 0.05, so we reject the null hypothesis.
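Both practice P-values can be checked without Table A. The function below is a sketch; its name and the "right" / "left" / "two" labels are conventions invented for this illustration.

```python
from statistics import NormalDist

def p_value(z: float, tail: str) -> float:
    """P-value for a z test statistic; tail is 'right', 'left' or 'two'."""
    cdf = NormalDist().cdf
    if tail == "right":
        return 1 - cdf(z)
    if tail == "left":
        return cdf(z)
    if tail == "two":
        return 2 * (1 - cdf(abs(z)))
    raise ValueError("tail must be 'right', 'left' or 'two'")

alpha = 0.05
p_a = p_value(1.18, "right")   # practice (a): claim p > 0.25
p_b = p_value(2.34, "two")     # practice (b): claim p ≠ 0.25

for label, p in (("a", p_a), ("b", p_b)):
    decision = "reject H0" if p <= alpha else "fail to reject H0"
    print(f"({label}) P-value = {p:.4f}: {decision}")
```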

Page 48: Chapter10 Revised

Assumptions for Testing Claims About Population Means

1) The sample is a simple random sample.

2) The value of the population standard deviation σ is known.

3) Either or both of these conditions is satisfied: the population is normally distributed or n > 30.

Page 49: Chapter10 Revised

Example Problem 3 10.13 pg. 572

• Ho: μ = 128

• HA: μ ≠ 128

• σ = 15, x-bar = 126.07, n = 72

• z = (x-bar – μ)/(σ/√n) = (126.07 – 128)/(15/√72) = –1.09

• P-value = 2 P(Z ≥ |–1.09|) = 2(1 – .8621) = .2758

Page 50: Chapter10 Revised

Example Problem 3 #10.13 pg. 572

• More than 27% of the time, an SRS of size 72 will have a mean blood pressure at least as far away from 128 as this sample's mean. This is not good evidence that the true mean differs from 128. We fail to reject the null hypothesis.
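The test statistic and two-sided P-value for Example Problem 3 can be recomputed as follows. This is a sketch under the slide's assumptions (σ known, normal sampling distribution); the exact P-value differs slightly from the table-based .2758 because z is not rounded to –1.09 first.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, x_bar, n = 128, 15, 126.07, 72

z = (x_bar - mu0) / (sigma / sqrt(n))
p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided P-value = {p_two_sided:.4f}")
```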

Page 51: Chapter10 Revised

Example 4: We are given a data set of 106 body temperatures having a mean of 98.20°F. Assume that the sample is a simple random sample and that the population standard deviation is known to be 0.62°F. Use a 0.05 significance level to test the common belief that the mean body temperature of healthy adults is equal to 98.6°F. Use the P-value method.

H0: µ = 98.6

H1: µ ≠ 98.6

α = 0.05, x-bar = 98.2, σ = 0.62, n = 106

z = (x-bar – µ)/(σ/√n) = (98.2 – 98.6)/(0.62/√106) = –6.64

This is a two-tailed test and the test statistic is to the left of the center, so the P-value is twice the area to the left of z = –6.64. We refer to Table A to find that the area to the left of z = –6.64 is 0.0001, so the P-value is 2(0.0001) = 0.0002.

Page 52: Chapter10 Revised


Page 53: Chapter10 Revised


Because the P-value of 0.0002 is less than the significance level of α = 0.05, we reject the null hypothesis. There is sufficient evidence to conclude that the mean body temperature of healthy adults differs from 98.6°F.
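The body temperature test can be verified in code. The sketch below applies the P-value decision rule directly. Table A bottoms out at 0.0001, so the slide's 0.0002 is an upper bound; the computed P-value is far smaller, and the conclusion is the same.

```python
from math import sqrt
from statistics import NormalDist

mu0, x_bar, sigma, n, alpha = 98.6, 98.2, 0.62, 106, 0.05

z = (x_bar - mu0) / (sigma / sqrt(n))
# Two-tailed test with z to the left of center: twice the area to the left of z.
p_two_sided = 2 * NormalDist().cdf(-abs(z))

decision = "reject H0" if p_two_sided <= alpha else "fail to reject H0"
print(f"z = {z:.2f}, P-value = {p_two_sided:.2e}, decision: {decision}")
```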

Page 54: Chapter10 Revised

Example 5: A survey of n = 880 randomly selected adult drivers showed that 56% (p-hat = 0.56) of the respondents admitted to running red lights. Find the value of the test statistic for the claim that the majority of all adult drivers admit to running red lights.

Page 55: Chapter10 Revised

The claim is that the majority of all Americans run red lights. That is, p > 0.5. The sample data are n = 880 and p-hat = 0.56.

np = (880)(0.5) = 440 ≥ 5

n(1 – p) = (880)(0.5) = 440 ≥ 5

Page 56: Chapter10 Revised

Example 5: An article distributed by the Associated Press included these results from a nationwide survey: Of 880 randomly selected drivers, 56% admitted that they run red lights. The claim is that the majority of all Americans run red lights. That is, p > 0.5. The sample data are n = 880 and p-hat = 0.56. We will use the P-value method.

H0: p = 0.5

H1: p > 0.5

α = 0.05

z = (p-hat – p)/√(p(1 – p)/n) = (0.56 – 0.5)/√((0.5)(0.5)/880) = 3.56

Referring to Table A, we see that for values of z = 3.50 and higher, we use 0.9999 for the cumulative area to the left of the test statistic. The P-value is 1 – 0.9999 = 0.0001.
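The one-proportion z test in Example 5 can be sketched in a few lines. The variable names are illustrative. The computed P-value (about 0.0002) differs a little from the table-based 0.0001 because Table A stops at 0.9999; either value is far below α.

```python
from math import sqrt
from statistics import NormalDist

n, p_hat, p0, alpha = 880, 0.56, 0.5, 0.05

# Normal approximation conditions, using the hypothesized p.
conditions_met = n * p0 >= 5 and n * (1 - p0) >= 5

z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)          # right-tailed test

print(f"conditions met: {conditions_met}")
print(f"z = {z:.2f}, P-value = {p_value:.5f}")
print("reject H0" if p_value <= alpha else "fail to reject H0")
```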

Page 57: Chapter10 Revised

Interpretation: We know from previous chapters that a z score of 3.56 is exceptionally large. Because the corresponding P-value of 0.0001 is less than the significance level of α = 0.05, we reject the null hypothesis. There is sufficient evidence to support the claim that the majority of all Americans run red lights.

Page 58: Chapter10 Revised


Page 59: Chapter10 Revised

Do problems 42 and 44

Page 60: Chapter10 Revised

Section 10.4

Inference as Decision

Page 61: Chapter10 Revised

Decision Criterion

Traditional method: Reject H0 if the test statistic falls within the critical region. Fail to reject H0 if the test statistic does not fall within the critical region.

Page 62: Chapter10 Revised

Decision Criterion**

P-value method: Reject H0 if P-value ≤ α (where α is the significance level, such as 0.05). Fail to reject H0 if P-value > α.

** this is the method you will use

Page 63: Chapter10 Revised

Decision Criterion

Confidence Intervals: Because a confidence interval estimate of a population parameter contains the likely values of that parameter, reject a claim that the population parameter has a value that is not included in the confidence interval.

Page 64: Chapter10 Revised
Page 65: Chapter10 Revised

Wording of Final Conclusion

Page 66: Chapter10 Revised

Accept versus Fail to Reject

Some texts use “accept the null hypothesis.”

We will use the phrase “fail to reject” instead.

We are not proving the null hypothesis.

The sample evidence is not strong enough to warrant rejection (such as not enough evidence to convict a suspect).

Page 67: Chapter10 Revised

Type I Error

A Type I error is the mistake of rejecting the null hypothesis when it is true.

The symbol α (alpha) is used to represent the probability of a Type I error.

Page 68: Chapter10 Revised

Type II Error

A Type II error is the mistake of failing to reject the null hypothesis when it is false.

The symbol β (beta) is used to represent the probability of a Type II error.

Page 69: Chapter10 Revised
Page 70: Chapter10 Revised

Controlling Type I and Type II Errors

For any fixed α, an increase in the sample size n will cause a decrease in β.

For any fixed sample size n, a decrease in α will cause an increase in β. Conversely, an increase in α will cause a decrease in β.

To decrease both α and β, increase the sample size.

Page 71: Chapter10 Revised

• The ability to detect a false hypothesis is called the power of the test.

• The power of the test is the probability that it correctly rejects a false null hypothesis.

• Power = 1 – β

                     Ho True             Ho False

Reject Ho            Type I error (α)    Power (1 – β)

Fail to Reject Ho    OK                  Type II error (β)
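The relationships among α, β, n and power can be illustrated numerically. The sketch below assumes a right-tailed z test of H0: µ = µ0 with σ known, for which power = 1 – Φ(z_α – (µa – µ0)/(σ/√n)). All the numbers (µ0 = 100, true mean µa = 103, σ = 15, the sample sizes) are hypothetical values chosen only for the demonstration.

```python
from math import sqrt
from statistics import NormalDist

def power_right_tailed(mu0: float, mu_alt: float, sigma: float, n: int, alpha: float) -> float:
    """Probability of rejecting H0: µ = mu0 (vs µ > mu0) when the true mean is mu_alt."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha)        # rejection cutoff on the z scale
    shift = (mu_alt - mu0) / (sigma / sqrt(n))     # true mean measured in standard errors
    return 1 - std_normal.cdf(z_alpha - shift)

# Hypothetical scenario, assumed for illustration only.
power_n25 = power_right_tailed(100, 103, 15, 25, 0.05)
power_n100 = power_right_tailed(100, 103, 15, 100, 0.05)
power_small_alpha = power_right_tailed(100, 103, 15, 25, 0.01)

print(f"n = 25,  alpha = 0.05: power = {power_n25:.3f}, beta = {1 - power_n25:.3f}")
print(f"n = 100, alpha = 0.05: power = {power_n100:.3f}, beta = {1 - power_n100:.3f}")
print(f"n = 25,  alpha = 0.01: power = {power_small_alpha:.3f}, beta = {1 - power_small_alpha:.3f}")
```

With α fixed, the larger sample gives higher power (smaller β). With n fixed, the smaller α gives lower power (larger β).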

Page 72: Chapter10 Revised

• When power is high we can be confident

• If we fail to reject a false null hypothesis, the test's power comes into question

• Was the sample big enough?

• Did we miss an effect?

• Did we fail to gather sufficient data?

• Was there too much variability?

• What power is needed? The choice of power is frequently determined by financial or scientific concerns.

Page 73: Chapter10 Revised

Summary of the P-Value Method

1. Identify the specific claim or hypothesis to be tested, and put it in symbolic form.

2. Give the symbolic form that must be true when the original claim is false.

3. Of the two symbolic expressions, let the alternative hypothesis HA be the one not containing the equality, so that HA uses the symbol > or < or ≠. Let the null hypothesis H0 be the symbolic expression that the parameter equals the fixed value being considered.

4. Select the significance level α based on the seriousness of a Type I error. Make α small if the consequences of rejecting a true H0 are severe. The values of 0.05 and 0.01 are most frequently used.

5. Identify the statistic that is relevant to this test and determine its sampling distribution (such as normal, t, or chi square).

6. Find the test statistic and find the P-value. Draw a graph and show the test statistic and P-value.

7. Reject H0 if the P-value is less than or equal to the significance level α . Fail to reject H0 if the P-value is greater than α .

8. Restate the decision in simple, non-technical terms and address the original claim.

Page 74: Chapter10 Revised

• Do problems 68 & 82 from The Practice of Statistics 2nd Edition

Page 75: Chapter10 Revised

Baseball Problem

Check conditions: (1) plausible independence condition (2) random condition (3) 10% condition (4) np > 10 and n(1-p)> 10

In 2002 in Major League Baseball there were 2425 regular season games. Home teams won 1314 of 2425, or 54.2%. Can this be explained by natural sampling variability, or is there a home field advantage? We want to know whether the home team in professional baseball is more likely to win. The parameter of interest is the proportion of home team wins. With no advantage, we'd expect that proportion to be .50.

Page 76: Chapter10 Revised

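p-hat = .542

z = (p-hat – p)/SD(p-hat) = (.542 – .5)/.01015 = 4.14

P(z > 4.14) < .0001. If there were no home field advantage, there would be less than a .0001 chance of home teams winning 54.2% or more of their games. This is strong evidence that the home team is more likely to win.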

Page 77: Chapter10 Revised

CAUTION

When the calculation of p-hat results in a decimal with many places, store the number on your calculator and use all of the decimals when evaluating the z test statistic.

Large errors can result from rounding p-hat too much.
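The baseball data give a small illustration of this caution. The sketch below computes z once with the full-precision p-hat = 1314/2425 and once with p-hat rounded to .542; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

wins, games, p0 = 1314, 2425, 0.5

sd_p_hat = sqrt(p0 * (1 - p0) / games)     # standard deviation of p-hat under H0

z_full = (wins / games - p0) / sd_p_hat    # all decimals of p-hat kept
z_rounded = (0.542 - p0) / sd_p_hat        # p-hat rounded to three places

p_value = 1 - NormalDist().cdf(z_full)     # right-tailed test

print(f"z with full precision: {z_full:.2f}")
print(f"z with rounded p-hat:  {z_rounded:.2f}")
print(f"P-value below .0001:   {p_value < 0.0001}")
```

Here the rounding shifts z only in the second decimal place and the conclusion is unchanged, but with coarser rounding or a borderline P-value the difference could matter.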