week 15 powerpoint

Copyright ©2005 Brooks/Cole, a division of Thomson Learning, Inc.

The Diversity of Samples from the Same Population

Chapter 19


19.1 Setting the Stage

Working Backward from Samples to Populations
• Start with question about population.
• Collect a sample from the population, measure variable.
• Answer question of interest for sample.
• With statistics, determine how close such an answer, based on a sample, would tend to be to the actual answer for the population.

Understanding Dissimilarity among Samples
• Suppose most samples are likely to provide an answer that is within 10% of the population answer.
• Then the population answer is expected to be within 10% of whatever value the sample gave.
• So, can make a good guess about the population value.


19.2 What to Expect of Sample Proportions

40% of population carry a certain gene.

[Figure: a slice of the population, with one symbol marking people who do not carry the gene and X marking people who do carry the gene.]

Possible Samples

Sample 1: Proportion with gene = 12/25 = 0.48 = 48%
Sample 2: Proportion with gene = 9/25 = 0.36 = 36%
Sample 3: Proportion with gene = 10/25 = 0.40 = 40%
Sample 4: Proportion with gene = 7/25 = 0.28 = 28%

• Each sample gave a different answer.
• Sample answer may or may not match population answer.
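The variation among samples can be imitated with a short simulation. This is only a sketch: the function name `sample_proportion` and the seed value are illustrative choices, not part of the slides, and the proportions it prints will differ from the four samples shown above.

```python
import random

def sample_proportion(rng, true_proportion, sample_size):
    """Draw one random sample and return the proportion that has the trait."""
    carriers = sum(1 for _ in range(sample_size) if rng.random() < true_proportion)
    return carriers / sample_size

# Fixed seed (an arbitrary choice) so that the run is repeatable.
rng = random.Random(19)

# Four samples of 25 people from a population in which 40% carry the gene.
proportions = [sample_proportion(rng, 0.40, 25) for _ in range(4)]
for i, p in enumerate(proportions, start=1):
    print(f"Sample {i}: proportion with gene = {p:.2f}")
```

Changing the seed gives a different set of four proportions, which is the point of the slide: each sample tends to give a somewhat different answer.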


Conditions for Rule for Sample Proportions

1. There exists an actual population with fixed proportion who have a certain trait.
   Or: There exists a repeatable situation for which a certain outcome is likely to occur with fixed probability.

2. Random sample selected from population (so probability of observing the trait is same for each sample unit).
   Or: Situation repeated numerous times, with outcome each time independent of all other times.

3. Size of sample or number of repetitions is relatively large – large enough to see at least 5 of each of the two possible responses.


Example 1: Election Polls

Pollster wants to estimate proportion of voters who favor a certain candidate. Voters are the population units, and favoring candidate is opinion of interest.

Example 2: Television Ratings

TV rating firm wants to estimate proportion of households with television sets tuned to a certain television program. Collection of all households with television sets makes up the population, and being tuned to program is trait of interest.


Example 3: Consumer Preferences

Manufacturer of soft drinks wants to know what proportion of consumers prefers new mixture of ingredients compared with old recipe. Population consists of all consumers, and response of interest is preference of new formula over old one.

Example 4: Testing ESP

Researcher wants to know the probability that people can successfully guess which of 5 symbols is on a hidden card. Each symbol is equally likely. Repeatable situation is a guess, and response of interest is successful guess. Is the probability of correct guess higher than 20%?


Defining the Rule for Sample Proportions

If numerous samples or repetitions of the same size are taken, the frequency curve made from proportions from various samples will be approximately bell-shaped.

Mean will be true proportion from the population.

Standard deviation will be:

$$\sqrt{\frac{(\text{true proportion})(1 - \text{true proportion})}{\text{sample size}}}$$


Example 5: Using Rule for Sample Proportions

Suppose 40% of all voters in U.S. favor candidate X. Pollsters take a sample of 2400 people. What sample proportion would be expected to favor candidate X?

The sample proportion could be anything from a bell-shaped curve with mean 0.40 and standard deviation:

$$\sqrt{\frac{(0.40)(1 - 0.40)}{2400}} = 0.01$$

For our sample of 2400 people:
• 68% chance sample proportion is between 39% and 41%
• 95% chance sample proportion is between 38% and 42%
• almost certain sample proportion is between 37% and 43%
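The arithmetic in Example 5 can be reproduced in a few lines. This is a minimal sketch; the helper name `sd_of_sample_proportion` is an illustrative choice, and the 1, 2 and 3 standard deviation ranges correspond to the 68%, 95% and "almost certain" statements on the slide.

```python
import math

def sd_of_sample_proportion(true_proportion, sample_size):
    """Standard deviation given by the Rule for Sample Proportions."""
    return math.sqrt(true_proportion * (1 - true_proportion) / sample_size)

true_proportion = 0.40
sd = sd_of_sample_proportion(true_proportion, 2400)
print(f"standard deviation = {sd:.4f}")

# Ranges of 1, 2 and 3 standard deviations around the true proportion.
for k, label in ((1, "68% chance"), (2, "95% chance"), (3, "almost certain")):
    low = true_proportion - k * sd
    high = true_proportion + k * sd
    print(f"{label}: between {low:.0%} and {high:.0%}")
```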


19.3 What to Expect of Sample Means

• Want to estimate average weight loss for all who attend national weight-loss clinic for 10 weeks.

• Unknown to us, population mean weight loss is 8 pounds and standard deviation is 5 pounds.

• If weight losses are approximately bell-shaped, 95% of individual weight losses will fall between –2 (a gain of 2 pounds) and 18 pounds lost.


Possible Samples

Sample 1: 1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 8, 8, 9, 9, 11, 11, 13, 13, 14, 14, 15, 16, 16
Sample 2: –2, –2, 0, 0, 3, 4, 4, 4, 5, 5, 6, 6, 8, 8, 9, 9, 9, 9, 9, 10, 11, 12, 13, 13, 16
Sample 3: –4, –4, 2, 3, 4, 5, 7, 8, 8, 9, 9, 9, 9, 9, 10, 10, 11, 11, 11, 12, 12, 13, 14, 16, 18
Sample 4: –3, –3, –2, 0, 1, 2, 2, 4, 4, 5, 7, 7, 9, 9, 10, 10, 10, 11, 11, 12, 12, 14, 14, 14, 19

Results:
Sample 1: Mean = 8.32 pounds, std dev = 4.74 pounds
Sample 2: Mean = 6.76 pounds, std dev = 4.73 pounds
Sample 3: Mean = 8.48 pounds, std dev = 5.27 pounds
Sample 4: Mean = 7.16 pounds, std dev = 5.93 pounds

• Each sample gave a different sample mean, but close to 8.
• Sample standard deviation also close to 5 pounds.
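The summary figures for Sample 1 can be checked directly from its 25 listed weight losses. This sketch uses Python's standard `statistics` module; the variable names are illustrative, and `stdev` is the sample standard deviation (dividing by n – 1), which is assumed to be what the slide reports.

```python
import statistics

# The 25 weight losses (in pounds) listed for Sample 1.
sample_1 = [1, 1, 2, 3, 4, 4, 4, 5, 6, 7, 7, 7, 8, 8, 9, 9,
            11, 11, 13, 13, 14, 14, 15, 16, 16]

mean_1 = statistics.mean(sample_1)
sd_1 = statistics.stdev(sample_1)  # sample standard deviation (n - 1 divisor)

print(f"Sample 1: mean = {mean_1:.2f} pounds, std dev = {sd_1:.2f} pounds")
```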


Conditions for Rule for Sample Means

1. Population of measurements is bell-shaped, and a random sample of any size is measured.

OR

2. Population of measurements of interest is not bell-shaped, but a large random sample is measured. Sample of size 30 is considered “large,” but if there are extreme outliers, better to have a larger sample.


Example 6: Average Weight Loss

Weight-loss clinic interested in average weight loss for participants in its program. Weight losses assumed to be bell-shaped, so Rule applies for any sample size. Population is all current and potential clients, and measurement is weight loss.

Example 7: Average Age at Death

Researcher is interested in average age at which left-handed adults die, assuming they have lived to be at least 50. Ages at death not bell-shaped, so need at least 30 such ages at death. Population is all left-handed people who live to be at least 50 years old. The measurement is age at death.


Defining the Rule for Sample Means

If numerous samples or repetitions of the same size are taken, the frequency curve of means from various samples will be approximately bell-shaped.

Mean will be same as mean for the population.

Standard deviation will be:

$$\frac{\text{population standard deviation}}{\sqrt{\text{sample size}}}$$


Example 9: Using Rule for Sample Means

Weight-loss example: population mean and standard deviation were 8 pounds and 5 pounds, respectively, and we were taking random samples of size 25.

Potential sample means represented by a bell-shaped curve with mean of 8 pounds and standard deviation:

$$\frac{5}{\sqrt{25}} = 1 \text{ pound}$$

• 68% chance sample mean is between 7 and 9 pounds
• 95% chance sample mean is between 6 and 10 pounds
• almost certain sample mean is between 5 and 11 pounds



Increasing the Size of the Sample

Weight-loss example: suppose a sample of 100 people instead of 25 was taken.

Potential sample means still represented by a bell-shaped curve with mean of 8 pounds but standard deviation:

$$\frac{5}{\sqrt{100}} = 0.5 \text{ pounds}$$

• 68% chance sample mean is between 7.5 and 8.5 pounds
• 95% chance sample mean is between 7 and 9 pounds
• almost certain sample mean is between 6.5 and 9.5 pounds


Larger samples tend to result in more accurate estimates of population values than do smaller samples.
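The effect of sample size on the spread of sample means can be sketched as below. The helper name `sd_of_sample_mean` is an illustrative choice, and the sample size of 400 is an extra value added here for comparison; it does not appear in the slides.

```python
import math

def sd_of_sample_mean(population_sd, sample_size):
    """Standard deviation given by the Rule for Sample Means."""
    return population_sd / math.sqrt(sample_size)

population_mean = 8   # pounds, from the weight-loss example
population_sd = 5     # pounds, from the weight-loss example

# 25 and 100 are the sample sizes used in the slides; 400 is an added comparison.
results = {n: sd_of_sample_mean(population_sd, n) for n in (25, 100, 400)}
for n, sd in results.items():
    low, high = population_mean - 2 * sd, population_mean + 2 * sd
    print(f"n = {n}: S.D. = {sd} pounds, 95% of sample means between {low} and {high}")
```

Quadrupling the sample size halves the standard deviation, because the sample size enters through its square root.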


19.4 What to Expect in Other Situations

• So far two common situations – (1) want to know what proportion of a population fall into one category of a categorical variable, (2) want to know the mean of a population for a measurement variable.

• Many other situations arise, and similar rules apply to most of them.


Two Basic Statistical Techniques

• Confidence Intervals
Interval of values the researcher is fairly sure covers the true value for the population.

• Hypothesis Testing
Uses sample data to attempt to reject the hypothesis that nothing interesting is happening; that is, to reject the notion that chance alone can explain the sample results.


Case Study 19.1: Do Americans Really Vote When They Say They Do?

Reported in Time magazine (Nov 28, 1994):

• Telephone poll of 800 adults (2 days after election) – 56% reported they had voted.

• Committee for Study of American Electorate stated only 39% of American adults had voted.

Could it be that the results of the poll simply reflected a sample that, by chance, voted with greater frequency than the general population?


Case Study 19.1: Do Americans Really Vote When They Say They Do?

Suppose only 39% of American adults voted. We can expect sample proportions to be represented by a bell-shaped curve with mean 0.39 and standard deviation:

$$\sqrt{\frac{(0.39)(1 - 0.39)}{800}} = 0.017$$

For our sample of 800 adults, we can be almost certain to see a sample proportion between 33.9% and 44.1%. The reported 56% is far above 44.1%.

The standard score for 56% is: (0.56 – 0.39)/0.017 = 10.
Virtually impossible to see a standard score of 10 or more.
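The standard score in this case study can be recomputed as a check. This is a sketch with illustrative variable names; it keeps the standard deviation unrounded, so the score comes out slightly below the slide's rounded value of 10.

```python
import math

claimed_proportion = 0.39    # proportion of adults said to have voted
observed_proportion = 0.56   # proportion in the poll who reported voting
sample_size = 800

# Standard deviation of sample proportions if the claimed 39% is true.
sd = math.sqrt(claimed_proportion * (1 - claimed_proportion) / sample_size)

# Standard score: how many standard deviations the poll result lies above 39%.
standard_score = (observed_proportion - claimed_proportion) / sd

print(f"standard deviation = {sd:.3f}")
print(f"standard score = {standard_score:.1f}")
```

Whether the standard deviation is rounded or not, the poll result lies roughly 10 standard deviations above 39%, far beyond the 3 standard deviations that cover almost all samples.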


For Those Who Like Formulas


Estimating Proportions with Confidence

Chapter 20


20.1 Confidence Intervals

Confidence Interval: an interval of values computed from sample data that is almost sure to cover the true population number.

• Most common level of confidence used is 95%. So willing to take a 5% risk that the interval does not actually cover the true value.

• We never know for sure whether any one given confidence interval covers the truth. However, …

• In long run, 95% of all confidence intervals tagged with 95% confidence will be correct and 5% of them will be wrong.
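The long-run claim can be illustrated by simulation. The sketch below assumes a true proportion of 0.40, samples of 500 and 1,000 repetitions, all arbitrary choices made for illustration. It builds each interval as sample proportion ± 2 standard deviations, the formula developed in Section 20.3, and counts how often the interval covers the truth.

```python
import math
import random

def interval_covers(rng, true_proportion, sample_size):
    """Take one random sample, build the interval, report whether it covers the truth."""
    hits = sum(1 for _ in range(sample_size) if rng.random() < true_proportion)
    p_hat = hits / sample_size
    sd = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat - 2 * sd <= true_proportion <= p_hat + 2 * sd

rng = random.Random(20)  # fixed seed (arbitrary) so that the run is repeatable
trials = 1000
covered = sum(interval_covers(rng, 0.40, 500) for _ in range(trials))
coverage = covered / trials

print(f"{coverage:.1%} of the {trials} intervals covered the true proportion")
```

The printed share should land near 95%, although any single run will be a little above or below it.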


20.2 Three Examples of Confidence Intervals from the Media

Most polls report a margin of error along with the proportion of the sample that had each opinion.

Formula for a 95% confidence interval:

sample proportion ± margin of error


Example 1: A Public Opinion Poll

Poll reported in The Sacramento Bee (Nov 19, 2003, p. A20): 54% of respondents agreed that gay and lesbian couples could be good parents. Report also provided …

Source: Pew Research Center for the People and the Press survey of 1,515 U.S. adults, Oct 15-19; margin of error 3 percentage points.

What proportion of entire adult population at that time would agree that gay and lesbian couples could be good parents?

A 95% confidence interval for that population proportion:

sample proportion ± margin of error
54% ± 3%
51% to 57%

Interval resides above 50% => a majority of adult Americans in 2003 believed such couples can be good parents.


Example 2: Number of AIDS Cases in U.S.

Story reported in the Davis (CA) Enterprise (14 December 1993, p. A5)

For the first time, survey data is now available on a randomly chosen cross section of Americans. Conducted by the National Center for Health Statistics, it concludes that 550,000 Americans are actually infected.

Dr. Geraldine McQuillan, who presented the analysis, said the statistical margin of error in the survey suggests that the true figure is probably between 300,000 and just over 1 million people. “The real number may be a little under or a bit over the CDC estimate” [of 1 million], she said, “but it is not 10 million.”

Some estimates had been as high as 10 million people, but the CDC had estimated the number at about 1 million. Could the results of this survey rule out the possibility that the number of people infected may be as high as 10 million?


Example 3: The Debate Over Passive Smoking

Source: Wall Street Journal (July 28, 1993, pp. B-1, B-4)

U.S. EPA says there is a 90% probability that the risk of lung cancer for passive smokers is somewhere between 4% and 35% higher than for those who aren’t exposed to environmental smoke. To statisticians, this calculation is called the “90% confidence interval.”

And that, say tobacco-company statisticians, is the rub. “Ninety-nine percent of all epidemiological studies use a 95% confidence interval,” says Gio B. Gori, director of the Health Policy Center in Bethesda, Md.

The EPA believes it is inconceivable that breathing in smoke containing cancer-causing substances could be healthy and any hint in the report that it might be would be meaningless and confusing. (p. B-4)

Problem: EPA used 90% because the amount of data available at the time did not allow an extremely accurate estimate of true change in risk of lung cancer for passive smokers. The 95% confidence interval actually went below zero percent, indicating it’s possible passive smoke reduces risk.


20.3 Constructing a Confidence Interval for a Proportion

Recall Rule for Sample Proportions:
If numerous samples or repetitions of same size are taken, the frequency curve made from proportions from various samples will be approximately bell-shaped. Mean will be true proportion from the population. Standard deviation will be:

$$\sqrt{\frac{(\text{true proportion})(1 - \text{true proportion})}{\text{sample size}}}$$

In 95% of all samples, the sample proportion will fall within 2 standard deviations of the mean, which is the true proportion.


A Confidence Interval for a Proportion

In 95% of all samples, the true proportion will fall within 2 standard deviations of the sample proportion.

Problem: Standard deviation uses the unknown “true proportion”.
Solution: Substitute the sample proportion for the true proportion in the formula for the standard deviation.

A 95% confidence interval for a population proportion:

sample proportion ± 2(S.D.)

where S.D. = $\sqrt{\frac{(\text{sample proportion})(1 - \text{sample proportion})}{\text{sample size}}}$

A technical note: To be exact, we would use 1.96(S.D.) instead of 2(S.D.). However, rounding 1.96 off to 2.0 will not make much difference.


Example 4: Wife Taller than the Husband?

In a random sample of 200 British couples, the wife was taller than the husband in only 10 couples.

• sample proportion = 10/200 = 0.05 or 5%

• standard deviation = $\sqrt{\frac{(0.05)(1 - 0.05)}{200}} = 0.015$

• confidence interval = .05 ± 2(0.015) = .05 ± .03 or 0.02 to 0.08

Interpretation: We are 95% confident that of all British couples, between .02 (2%) and .08 (8%) are such that the wife is taller than her husband.
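The steps in Example 4 can be wrapped in a small function. This is a sketch only: the name `confidence_interval_95` is illustrative, and it uses the multiplier 2 from the slides rather than the exact 1.96.

```python
import math

def confidence_interval_95(successes, sample_size):
    """Approximate 95% confidence interval: sample proportion ± 2(S.D.)."""
    p_hat = successes / sample_size
    sd = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat - 2 * sd, p_hat + 2 * sd

# Example 4: the wife was taller than the husband in 10 of 200 couples.
low, high = confidence_interval_95(10, 200)
print(f"95% confidence interval: {low:.2f} to {high:.2f}")
```

The same function can be applied to the counts in the later examples. Because it does not round the intermediate values, its endpoints may differ from the slides in the last digit.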


Example 5: Experiment in ESP

Experiment: Subject tried to guess which of four videos the “sender” was watching in another room. Of the 165 cases, 61 resulted in successful guesses.

• sample proportion = 61/165 = 0.37 or 37%

• standard deviation = $\sqrt{\frac{(0.37)(1 - 0.37)}{165}} = 0.038$

• confidence interval = .37 ± 2(0.038) = .37 ± .08 or 0.29 to 0.45

Interpretation: We are 95% confident that the probability of a successful guess in this situation is between .29 (29%) and .45 (45%). Notice this interval lies entirely above the 25% value expected by chance.


Example 6: Quit Smoking with the Patch

Study: Of 120 volunteers randomly assigned to use a nicotine patch, 55 had quit smoking after 8 weeks.

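• sample proportion = 55/120 = 0.46 or 46%

• standard deviation = $\sqrt{\frac{(0.46)(1 - 0.46)}{120}} = 0.045$

• confidence interval = .46 ± 2(0.045) = .46 ± .09 or 0.37 to 0.55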

Interpretation: We are 95% confident that between 37% and 55% of smokers treated in this way would quit smoking after 8 weeks. The placebo group confidence interval is 13% to 27%, which does not overlap with the nicotine patch interval.


Other Levels of Confidence

• 68% confidence interval: sample proportion ± 1(S.D.)

• 99.7% confidence interval: sample proportion ± 3(S.D.)

• 90% confidence interval: sample proportion ± 1.645(S.D.)

• 99% confidence interval: sample proportion ± 2.576(S.D.)
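The multipliers listed above come from the bell-shaped (normal) curve, and they can be recovered with the standard library's `statistics.NormalDist`. The function name `multiplier` is an illustrative choice.

```python
from statistics import NormalDist

def multiplier(confidence_level):
    """Number of standard deviations that captures the central
    `confidence_level` share of a standard normal curve."""
    upper_cutoff = 0.5 + confidence_level / 2
    return NormalDist().inv_cdf(upper_cutoff)

for level in (0.68, 0.90, 0.95, 0.99, 0.997):
    print(f"{level:.1%} confidence: sample proportion ± {multiplier(level):.3f}(S.D.)")
```

The values for 68% and 99.7% come out close to, but not exactly, 1 and 3, just as the value for 95% is 1.96 rather than exactly 2.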


How the Margin of Error was Derived

Two formulas for a 95% confidence interval:

• sample proportion ± $1/\sqrt{n}$ (from conservative m.e. in Chapter 4)

• sample proportion ± 2(S.D.)

The two formulas are equivalent when the proportion used in the formula for S.D. is 0.50.

Then 2(S.D.) is simply $1/\sqrt{n}$, which is our conservative formula for margin of error. It is called conservative because the true margin of error is actually likely to be smaller.
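The link between the two formulas can be checked numerically. The sketch below uses the sample size of 1,515 from the poll in Example 1; the function names are illustrative, and the proportion 0.20 is an arbitrary value chosen to show a case where 2(S.D.) is smaller than the conservative margin.

```python
import math

def two_sd(proportion, n):
    """Margin of error computed as 2(S.D.) for a given proportion."""
    return 2 * math.sqrt(proportion * (1 - proportion) / n)

def conservative_margin(n):
    """Conservative margin of error: 1 divided by the square root of n."""
    return 1 / math.sqrt(n)

n = 1515  # sample size of the poll in Example 1
print(f"conservative margin:      {conservative_margin(n):.4f}")
print(f"2(S.D.) at proportion .5: {two_sd(0.50, n):.4f}")
print(f"2(S.D.) at proportion .2: {two_sd(0.20, n):.4f}")
```

The product (proportion)(1 – proportion) is largest at 0.50, so any other proportion gives a 2(S.D.) below the conservative margin.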


Case Study 20.1: A Winning Confidence Interval Loses in Court

• Sears company erroneously collected and paid city sales taxes for sales made to individuals outside the city limits.

• Sears took a random sample of sales slips for the period in question to estimate the proportion of all such sales.

• 95% confidence interval for true proportion of all sales made to out-of-city customers: .367 ± .03, or .337 to .397.

• To estimate amount of tax owed, multiplied percentage by total tax they had paid of $76,975. Result = $28,250 with 95% confidence interval from $25,940 to $30,559.

Source: Gastwirth (1988, p. 495)
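The dollar figures follow from multiplying the proportion and the endpoints of its interval by the total tax paid. This is a sketch with illustrative variable names. It uses the rounded proportion .367 and margin .03 given above, so the results agree with the reported amounts only to within about a dollar.

```python
total_tax_paid = 76975   # dollars of city sales tax Sears had paid
proportion = 0.367       # estimated proportion of sales to out-of-city customers
margin = 0.03            # margin of error of the 95% confidence interval

estimate = proportion * total_tax_paid
low = (proportion - margin) * total_tax_paid
high = (proportion + margin) * total_tax_paid

print(f"estimated refund: ${estimate:,.0f}")
print(f"95% confidence interval: ${low:,.0f} to ${high:,.0f}")
```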


Case Study 20.1: A Winning Confidence Interval Loses in Court

• Judge did not accept use of sampling and required Sears to examine all sales records.

• In total, they were owed about $27,586. The sampling method Sears had used provided a fairly accurate estimate of the amount they were owed.

• Took Sears 300 person-hours to conduct the sample and 3384 hours to do the full audit.

• In fairness, the judge in this case was simply following the law; the sales tax return required a sale-by-sale computation.

A well designed sampling audit may yield a more accurate estimate than a less carefully carried out complete audit.

Source: Gastwirth (1988, p. 495)


For Those Who Like Formulas
