confidence intervals: sampling distribution

Statistics: Unlocking the Power of Data STAT 101 Dr. Kari Lock Morgan 9/13/12 Confidence Intervals: Sampling Distribution SECTIONS 3.1, 3.2 Sampling Distributions (3.1) Confidence Intervals (3.2)

Upload: reyna

Post on 09-Feb-2016

TRANSCRIPT

Page 1: Confidence Intervals: Sampling Distribution

Statistics: Unlocking the Power of Data Lock5

STAT 101

Dr. Kari Lock Morgan

9/13/12

Confidence Intervals: Sampling Distribution

SECTIONS 3.1, 3.2

• Sampling Distributions (3.1)

• Confidence Intervals (3.2)

Page 2: Confidence Intervals: Sampling Distribution

The Big Picture

Population → Sampling → Sample

Sample → Statistical Inference → Population

Page 3: Confidence Intervals: Sampling Distribution

Statistical Inference

Statistical inference is the process of drawing conclusions about the entire population based on information in a sample.

Page 4: Confidence Intervals: Sampling Distribution

Statistic and Parameter

A parameter is a number that describes some aspect of a population.

A statistic is a number that is computed from data in a sample.

We usually have a sample statistic and want to use it to make inferences about the population parameter

Page 5: Confidence Intervals: Sampling Distribution

The Big Picture

Population (PARAMETERS) → Sampling → Sample (STATISTICS)

Sample (STATISTICS) → Statistical Inference → Population (PARAMETERS)

Page 6: Confidence Intervals: Sampling Distribution

Parameter versus Statistic

 

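Mean: parameter μ, statistic x̄

Proportion: parameter p, statistic p̂

Standard deviation: parameter σ, statistic s

Correlation: parameter ρ, statistic r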

Page 7: Confidence Intervals: Sampling Distribution

Election Polls

Over the weekend (9/7/12 – 9/9/12), 1000 registered voters were asked who they plan to vote for in the 2012 presidential election.

What proportion of voters plan to vote for Obama?

p̂ = 0.50

p = ???

http://www.politico.com/p/2012-election/polls/president

Page 8: Confidence Intervals: Sampling Distribution

Point Estimate

We use the statistic from a sample as a point estimate for a population parameter.

Point estimates will not match population parameters exactly, but they are our best guess, given the data.

Page 9: Confidence Intervals: Sampling Distribution

Election Polls

Actually, several polls were conducted over the weekend (9/7/12 – 9/9/12):

http://www.politico.com/p/2012-election/polls/president

Page 10: Confidence Intervals: Sampling Distribution

IMPORTANT POINTS

• Sample statistics vary from sample to sample, so they will not match the parameter exactly.

• KEY QUESTION: For a given sample statistic, what are plausible values for the population parameter? How much uncertainty surrounds the sample statistic?

• KEY ANSWER: It depends on how much the statistic varies from sample to sample!

Page 11: Confidence Intervals: Sampling Distribution

Reese’s Pieces

• What proportion of Reese’s pieces are orange?

• Take a random sample of 10 Reese’s pieces

• What is your sample proportion? (Plot it on the class dotplot.)

• Give a range of plausible values for the population proportion

Page 12: Confidence Intervals: Sampling Distribution

Sampling Distribution

A sampling distribution is the distribution of sample statistics computed for different samples of the same size from the same population.

A sampling distribution shows us how the sample statistic varies from sample to sample.
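
The definition above can be tried out with a small simulation. This is only a sketch: the population proportion `ASSUMED_P = 0.5` is a made-up value (the slides do not give the true proportion of orange Reese’s Pieces), and the helper names are hypothetical. Each entry of `props` plays the role of one dot on the class dotplot: one sample statistic from one sample of size 10.

```python
import random
import statistics

def sample_proportion(p, n, rng):
    """Draw n candies at random and return the proportion that are orange."""
    return sum(rng.random() < p for _ in range(n)) / n

def sampling_distribution(p, n, num_samples, rng):
    """Sample proportions from many samples, all of the same size n."""
    return [sample_proportion(p, n, rng) for _ in range(num_samples)]

rng = random.Random(101)   # fixed seed so the run is repeatable
ASSUMED_P = 0.5            # hypothetical population proportion, not from the slides

props = sampling_distribution(ASSUMED_P, n=10, num_samples=5000, rng=rng)

print("first five sample proportions:", props[:5])
print("mean of sample proportions:", round(statistics.mean(props), 3))
print("SD of sample proportions:", round(statistics.stdev(props), 3))
```

The individual sample proportions differ from one another, but their mean should land close to the assumed population proportion.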

Page 13: Confidence Intervals: Sampling Distribution

Sampling Distribution

In the Reese’s Pieces sampling distribution, what does each dot represent?

a) One Reese’s piece

b) One sample statistic

Page 14: Confidence Intervals: Sampling Distribution

Reese’s Pieces

• www.lock5stat.com/statkey

Page 15: Confidence Intervals: Sampling Distribution

Sample Size Matters!

As the sample size increases, the variability of the sample statistics tends to decrease and the sample statistics tend to be closer to the true value of the population parameter.

For larger sample sizes, you get less variability in the statistics, so less uncertainty in your estimates.
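
A rough way to see this effect is to simulate sampling distributions for several sample sizes and compare their spreads. The sketch below assumes a population proportion of 0.5 and sample sizes of 10, 100, and 1000; all of these values are illustrative choices, not numbers from the slides.

```python
import random
import statistics

def sd_of_sample_proportions(p, n, num_samples, rng):
    """SD of the sample proportions across many samples of size n."""
    props = [sum(rng.random() < p for _ in range(n)) / n
             for _ in range(num_samples)]
    return statistics.stdev(props)

rng = random.Random(3)   # fixed seed so the run is repeatable
ASSUMED_P = 0.5          # hypothetical population proportion

# Spread of the sampling distribution for each (illustrative) sample size
spread = {n: sd_of_sample_proportions(ASSUMED_P, n, 2000, rng)
          for n in (10, 100, 1000)}

for n, sd in spread.items():
    print(f"n = {n:4d}: SD of sample proportions = {sd:.3f}")
```

The printed spread should shrink as n grows, which is the sense in which larger samples give less uncertainty.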

Page 16: Confidence Intervals: Sampling Distribution

Random Samples

• If you take random samples, the sampling distribution will be centered around the true population parameter

• If sampling bias exists (if you do not take random samples), your sampling distribution may give you bad information about the true parameter
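
The two bullets above can be illustrated with a toy comparison. The sketch below uses a made-up population of word lengths (it is not the text of any real document) and a hypothetical biased scheme in which longer words are more likely to be picked. The random samples should center on the true mean, while the biased samples should center too high.

```python
import random
import statistics

rng = random.Random(16)   # fixed seed so the run is repeatable

# Hypothetical population: 500 word lengths (invented for illustration)
population = [rng.choice([1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9]) for _ in range(500)]
true_mean = statistics.mean(population)

def random_sample_mean(n):
    """Mean of a simple random sample of n words."""
    return statistics.mean(rng.sample(population, n))

def biased_sample_mean(n):
    """Mean of n words picked (with replacement) with chance proportional to length."""
    return statistics.mean(rng.choices(population, weights=population, k=n))

random_means = [random_sample_mean(10) for _ in range(3000)]
biased_means = [biased_sample_mean(10) for _ in range(3000)]

print("true population mean:         ", round(true_mean, 2))
print("center of random-sample means:", round(statistics.mean(random_means), 2))
print("center of biased-sample means:", round(statistics.mean(biased_means), 2))
```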

Page 17: Confidence Intervals: Sampling Distribution

Lincoln’s Gettysburg Address

Page 18: Confidence Intervals: Sampling Distribution

Interval Estimate

An interval estimate gives a range of plausible values for a population parameter.

Page 19: Confidence Intervals: Sampling Distribution

Margin of Error

One common form for an interval estimate is

statistic ± margin of error

where the margin of error reflects the precision of the sample statistic as a point estimate for the parameter.

How do we determine the margin of error?

Page 20: Confidence Intervals: Sampling Distribution

Sampling Distribution

• We can use the spread of the sampling distribution to determine the margin of error for a statistic

Page 21: Confidence Intervals: Sampling Distribution

Margin of Error

The higher the standard deviation of the sampling distribution, the ______ the margin of error.

a) higher

b) lower

Page 22: Confidence Intervals: Sampling Distribution

Election Polling

http://www.realclearpolitics.com/epolls/2012/president/us/general_election_romney_vs_obama-1171.html

Why is the margin of error smaller for the Gallup poll than the ABC news poll?

Page 23: Confidence Intervals: Sampling Distribution

Interval Estimate

Using the Gallup poll, calculate an interval estimate for the proportion of registered voters who plan to vote for Obama.

Page 24: Confidence Intervals: Sampling Distribution

Election Polling

Page 25: Confidence Intervals: Sampling Distribution

Confidence Interval

A confidence interval for a parameter is an interval computed from sample data by a method that will capture the parameter for a specified proportion of all samples.

The success rate (proportion of all samples whose intervals contain the parameter) is known as the confidence level.

A 95% confidence interval will contain the true parameter for 95% of all samples.

Page 26: Confidence Intervals: Sampling Distribution

Confidence Intervals

www.lock5stat.com/StatKey

The parameter is fixed

The statistic is random (depends on the sample)

The interval is random (depends on the statistic)

Page 27: Confidence Intervals: Sampling Distribution

Sampling Distribution

If you had access to the sampling distribution, how would you find the margin of error to ensure that intervals of the form

statistic ± margin of error

would capture the parameter for 95% of all samples?

Page 28: Confidence Intervals: Sampling Distribution

Standard Error

The standard error of a statistic, SE, is the standard deviation of the sample statistic.

The standard error can be calculated as the standard deviation of the sampling distribution.

Page 29: Confidence Intervals: Sampling Distribution

95% Confidence Interval

If the sampling distribution is relatively symmetric and bell-shaped, a 95% confidence interval can be estimated using

statistic ± 2 × SE
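
One way to check this rule is to simulate it. The sketch below assumes a population proportion of 0.5 and a sample size of 200 (both hypothetical, chosen so the sampling distribution is roughly bell-shaped). It first estimates the SE as the standard deviation of a simulated sampling distribution, then builds statistic ± 2 × SE intervals from fresh samples and counts how many capture the true proportion.

```python
import random
import statistics

rng = random.Random(95)   # fixed seed so the run is repeatable
TRUE_P = 0.5              # hypothetical population proportion
N = 200                   # hypothetical sample size

def sample_proportion():
    """Proportion of successes in one random sample of size N."""
    return sum(rng.random() < TRUE_P for _ in range(N)) / N

# Step 1: SE = standard deviation of the (simulated) sampling distribution
se = statistics.stdev([sample_proportion() for _ in range(2000)])

# Step 2: build an interval from each new sample and check whether it captures TRUE_P
NUM_INTERVALS = 2000
hits = 0
for _ in range(NUM_INTERVALS):
    p_hat = sample_proportion()
    if p_hat - 2 * se <= TRUE_P <= p_hat + 2 * se:
        hits += 1

coverage = hits / NUM_INTERVALS
print(f"estimated SE: {se:.4f}")
print(f"share of intervals containing the true p: {coverage:.3f}")
```

The share of intervals that capture the parameter should come out close to 95%, though not exactly, since the factor of 2 is an approximation and sample proportions are discrete.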

Page 30: Confidence Intervals: Sampling Distribution

Economy

A survey of 1,502 Americans in January 2012 found that 86% consider the economy a “top priority” for the president and congress this year.

The standard error for this statistic is 0.01.

What is the 95% confidence interval for the true proportion of all Americans that considered the economy a “top priority” at that time?

(a) (0.85, 0.87)

(b) (0.84, 0.88)

(c) (0.82, 0.90)

http://www.people-press.org/2012/01/23/public-priorities-deficit-rising-terrorism-slipping/

statistic ± 2×SE

0.86 ± 2×0.01

0.86 ± 0.02

(0.84, 0.88)
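
The same arithmetic as a short sketch, with a hypothetical helper `interval_95`. The last lines go one step beyond the slides: they use the standard formula for the standard error of a sample proportion, sqrt(p̂(1 − p̂)/n), which is an added assumption here and not the simulation-based approach of this lecture, to show that an SE of about 0.01 is plausible for a sample of 1,502.

```python
import math

def interval_95(statistic, se):
    """Rough 95% confidence interval: statistic ± 2 × SE."""
    margin = 2 * se
    return statistic - margin, statistic + margin

# Values from the survey slide: p-hat = 0.86, SE = 0.01
low, high = interval_95(0.86, 0.01)
print(f"95% CI: ({low:.2f}, {high:.2f})")

# Assumption (not from the slides): textbook formula for the SE of a proportion
se_formula = math.sqrt(0.86 * (1 - 0.86) / 1502)
print(f"formula-based SE: {se_formula:.4f}")
```

The interval matches option (b), and the formula-based SE is roughly 0.009, which rounds to the 0.01 quoted on the slide.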

Page 31: Confidence Intervals: Sampling Distribution

Interpreting a Confidence Interval

95% of all samples yield intervals that contain the true parameter, so we say we are “95% sure” or “95% confident” that one interval contains the truth.

“We are 95% confident that the true proportion of all Americans that considered the economy a ‘top priority’ in January 2012 is between 0.84 and 0.88.”

Page 32: Confidence Intervals: Sampling Distribution

Reese’s Pieces

The standard error for the proportion of orange Reese’s Pieces in a random sample of 10 is closest to

a) 0.05

b) 0.15

c) 0.25

d) 0.35

Page 33: Confidence Intervals: Sampling Distribution

Reese’s Pieces

Each of you will create a 95% confidence interval based on your sample. If you all sampled randomly, and all created your CI correctly, what percentage of your intervals do you expect to include the true p?

a) 95%

b) 5%

c) All of them

d) None of them

Page 34: Confidence Intervals: Sampling Distribution

Reese’s Pieces

Use StatKey to more precisely estimate the SE

Use this estimated SE and your p̂ to create a 95% confidence interval based on your data.

Come up to the board and draw your interval

How many include (our best guess at) the truth?

Page 35: Confidence Intervals: Sampling Distribution

Reese’s Pieces

Did your 95% confidence interval include the true p?

a) Yes

b) No

Page 36: Confidence Intervals: Sampling Distribution

Confidence Intervals

If context were added, which of the following would be an appropriate interpretation for a 95% confidence interval?

a) “we are 95% sure the interval contains the parameter”

b) “there is a 95% chance the interval contains the parameter”

c) Both (a) and (b)

d) Neither (a) nor (b)

95% of all samples yield intervals that contain the true parameter, so we say we are “95% sure” or “95% confident” that one interval contains the truth.

We can’t make probabilistic statements such as (b) because the interval either contains the truth or it doesn’t, and also the 95% pertains to all intervals that could be generated, not just the one you’ve created.

Page 37: Confidence Intervals: Sampling Distribution

Confidence Intervals

Population → Sample, Sample, Sample, Sample, Sample, Sample, . . .

Calculate statistic for each sample

Sampling Distribution

Standard Error (SE): standard deviation of sampling distribution

Margin of Error (ME) (95% CI: ME = 2×SE)

Confidence Interval: statistic ± ME

Page 38: Confidence Intervals: Sampling Distribution

Summary

• To create a plausible range of values for a parameter:

o Take many random samples from the population, and compute the sample statistic for each sample

o Compute the standard error as the standard deviation of all these statistics

o Use statistic ± 2×SE

• One small problem…

Page 39: Confidence Intervals: Sampling Distribution

Reality

… WE ONLY HAVE ONE SAMPLE!!!!

• How do we know how much sample statistics vary, if we only have one sample?!?

… to be continued

Page 40: Confidence Intervals: Sampling Distribution

To Do

Read Sections 3.1, 3.2

Do Homework 2 (due Tuesday, 9/18)