Inferential Statistics Part 1 Chapter 8 P. 253- 278

Upload: constance-gray

Post on 05-Jan-2016



Collecting a random sample

Goal: to understand characteristics of a population

Examples:
• What’s the average commuting time for city residents?
• What’s the average household income of the patrons of a particular grocery store?
• What’s the average leaf size of birch trees on August 1 in a particular state park?
• What proportion of people in a particular tropical city have had malaria?

Estimating the mean

One of the most common goals of statistical inference is estimating a population mean with a sample mean

Central Limit Theorem

When we have n independent, identically distributed random variables $X_1, \dots, X_n$, each with mean $\mu$ and variance $\sigma^2$, the distribution of their mean $\bar{X}$ approaches a normal distribution with mean $\mu$ and variance $\sigma^2/n$ as n gets large.

Independence of random variables means that the value of one observation has no effect on the value of another observation.

Identical distribution of random variables means that each random variable comes from the same population (e.g., repeated rolls of the same die, repeated flips of the same coin).
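A quick simulation can make the theorem concrete. This sketch assumes a fair six-sided die as the population (μ = 3.5, σ² = 35/12, a flat and clearly non-normal distribution), draws many samples of size n = 30, and checks that the sample means have a mean near μ, a variance near σ²/n, and roughly 95% of their values within 1.96 standard errors of μ, as a normal distribution would. The die, n = 30, the number of repetitions, and the seed are illustrative choices, not values from the slides.

```python
import math
import random
import statistics

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from a fair six-sided die and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.randint(1, 6) for _ in range(n)) for _ in range(reps)]

n = 30
mu = 3.5            # population mean of one fair die roll
sigma2 = 35 / 12    # population variance of one fair die roll
means = simulate_sample_means(n, reps=20_000)

# CLT prediction: the sample means are approximately N(mu, sigma2 / n)
se = math.sqrt(sigma2 / n)
coverage = sum(abs(m - mu) <= 1.96 * se for m in means) / len(means)

print("mean of sample means:", round(statistics.fmean(means), 3))
print("variance of sample means:", round(statistics.pvariance(means), 4),
      "vs sigma^2/n:", round(sigma2 / n, 4))
print("share within 1.96 standard errors of mu:", round(coverage, 3))
```

Even though a single die roll is uniform, a histogram of `means` would look bell-shaped, which is exactly what the theorem predicts.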

Simple random sampling

Each observation drawn does not depend on the others drawn

Thus observations are independent

Each observation (i.e., each random variable) is identically distributed

The population has a distribution that doesn’t change (each observation is randomly drawn from an identical distribution – the distribution of the population).

So the Central Limit Theorem applies! (when n is large)

What does this mean?

Suppose we take a sample of n = 50 observations from a population that has this distribution:

[Figure: frequency distribution of the population, with the x-axis running from 0 to 30]

Mean (μ) = 20

Variance (σ²) = 100; Std. dev (σ) = 10

We then find the mean of this sample (suppose this mean = 19). Take another sample of 50 observations and find the mean (suppose it’s 24). Do this many times, and we’ll come up with a distribution of means. The Central Limit Theorem tells us this distribution will look approximately like the next slide (as long as n is “large”, and 50 is large enough):

The normal curve

[Figure: normal curve for the distribution of sample means, centered at 20, with axis ticks at 16, 18, 20, 22, and 24]

Mean (μ) = 20; Sample size (n) = 50; variance of the sample mean $\sigma_{\bar{x}}^2 = \sigma^2/n = 100/50 = 2$
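The arithmetic on this slide can be checked in a few lines. This sketch uses the slide's values (μ = 20, σ = 10, n = 50); `standard_error` is a hypothetical helper name, and the ±1.96 range is the standard normal rule applied to the distribution of sample means.

```python
import math

def standard_error(sigma, n):
    """Standard deviation of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)

mu, sigma, n = 20, 10, 50
se = standard_error(sigma, n)
var_of_mean = se ** 2                         # equals sigma^2 / n
low, high = mu - 1.96 * se, mu + 1.96 * se    # central ~95% of sample means

print("variance of the sample mean:", round(var_of_mean, 6))
print("standard error:", round(se, 3))
print("about 95% of sample means fall between", round(low, 2), "and", round(high, 2))
```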

Symbols

Population parameter: $\theta$

Estimate: $\hat{\theta}$

Expected value of the estimate: $E(\hat{\theta})$

Basic Types of Inference

Point inference: the value of a population parameter is estimated using a single value

Examples: mean, standard deviation, etc.

Interval inference: attaching a probability to an estimate (i.e., making a confidence interval)

Example: we are 95% confident that μ is between 10 and 20

Judging the Quality of the Estimator

Bias – the difference between $E(\hat{\theta})$ and $\theta$ (i.e., $\text{Bias} = E(\hat{\theta}) - \theta$)

Bias may be positive or negative (e.g., a positively biased estimator tends to overestimate, indicating the population parameter is higher than it actually is)

Efficiency – how tightly clustered the distribution of $\hat{\theta}$ is (i.e., how “peaked” its distribution is)

Judging the Quality of the Estimator

Best-case scenario: an unbiased estimator with a high level of efficiency

We can measure the quality of an estimator using the Mean Squared Error (MSE) or its counterpart, the RMSE (the square root of the MSE):

$\text{MSE} = \text{Bias}^2 + \text{Variance}$

Remember that the variance here is the variance of the estimator, which is itself a random variable; for the sample mean we use the equation:

$\text{Variance} = \sigma^2/n$
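The slides do not name a specific estimator to judge, so this sketch picks a familiar pair as an assumption: two estimators of σ² from a normal population, one dividing the sum of squares by n and one by n − 1. It estimates bias, variance, and MSE by Monte Carlo and confirms that MSE = Bias² + Variance. For a normal population the divide-by-n version has expected value (n − 1)σ²/n, so with the illustrative choices σ² = 4 and n = 5 its bias should come out near −0.8.

```python
import random
import statistics

def var_divide_by_n(xs):
    """Variance estimator that divides by n (biased downward for sigma^2)."""
    return statistics.pvariance(xs)

def var_divide_by_n_minus_1(xs):
    """Variance estimator that divides by n - 1 (unbiased for sigma^2)."""
    return statistics.variance(xs)

def bias_variance_mse(estimator, sigma2, n, reps=10_000, seed=2):
    """Monte Carlo bias, variance, and MSE of an estimator of sigma^2 (normal population, mean 0)."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    estimates = [estimator([rng.gauss(0, sigma) for _ in range(n)]) for _ in range(reps)]
    bias = statistics.fmean(estimates) - sigma2
    variance = statistics.pvariance(estimates)
    mse = statistics.fmean((e - sigma2) ** 2 for e in estimates)
    return bias, variance, mse

for estimator in (var_divide_by_n, var_divide_by_n_minus_1):
    bias, variance, mse = bias_variance_mse(estimator, sigma2=4.0, n=5)
    # The last two columns should agree: MSE = bias^2 + variance
    print(estimator.__name__, round(bias, 3), round(variance, 3),
          round(mse, 3), round(bias ** 2 + variance, 3))
```

Being unbiased is not the whole story: the MSE also depends on how spread out the estimates are, which is why the slide pairs bias with efficiency.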

Point Estimates (inferring population parameters from samples)

Population mean: $\hat{\mu} = \bar{x}$

Population proportion: $\hat{\pi} = p = X/n$

Population variance: $\hat{\sigma}^2 = s^2$

Population standard deviation: $\hat{\sigma} = s$
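As a sketch, Python's standard `statistics` module computes these point estimates directly. The data reuse the heights from the Class Problem later in this chapter and the 63-of-150 count from Example #3; note that `statistics.variance` and `statistics.stdev` divide by n − 1, which is the usual definition of s.

```python
import statistics

# Heights (inches) from the Class Problem
heights = [56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47, 60, 54, 45, 45, 69, 62, 67, 49, 43, 59]

x_bar = statistics.mean(heights)      # point estimate of mu
s2 = statistics.variance(heights)     # point estimate of sigma^2 (n - 1 divisor)
s = statistics.stdev(heights)         # point estimate of sigma

# Example #3: 63 of the 150 people sampled were over 6 ft tall
p = 63 / 150                          # point estimate of pi

print(round(x_bar, 2), round(s2, 2), round(s, 2), p)
```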

Confidence Intervals

The degree of confidence we have in our estimates, expressed as a percentage

Common examples: 90, 95, or 99% confident

The confidence level is defined with the α symbol: confidence level = 1 − α

In confidence intervals, alpha (α) is the proportion of time your confidence interval is wrong (i.e., over many repeated samples, the proportion of intervals that miss the true parameter)

The typical usage is: $z_{\alpha/2}$

Why do we divide by 2?

Confidence Interval Example

What is the 95% confidence interval for a normally distributed variable?

α = 1 − desired confidence level

α = 1 − 0.95 = 0.05

Remember that we divide α by 2 since we have uncertainty both above and below the mean (i.e., 2 tails)

Therefore we use $z_{0.025}$ for the 95% confidence interval

From the z-table we find that $z_{0.025} = 1.96$

What does this mean?
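The table lookup can be reproduced with `statistics.NormalDist` (Python 3.8+), which provides an inverse CDF. This is a sketch; `z_critical` is a hypothetical helper name. It also checks what the value means: the area under the standard normal curve between −z and +z equals the confidence level.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-tailed critical value z_{alpha/2} for a confidence level such as 0.95."""
    alpha = 1 - confidence
    # An upper-tail area of alpha/2 corresponds to a cumulative probability of 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    z = z_critical(level)
    # Area between -z and +z recovers the confidence level
    covered = NormalDist().cdf(z) - NormalDist().cdf(-z)
    print(level, round(z, 3), round(covered, 3))
```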

Interval Estimation (making confidence intervals for population parameters estimated from samples)

Case #1: estimating an interval for μ when X is normally distributed and we know σ

This is the simplest case because normality allows us to use the z-table

This case is also unlikely in practice, since it requires knowing the distribution and σ (and if we knew σ, we would likely already know μ)

Example #1: Create a confidence interval for μ

A town is considering building a new bridge over a river. The primary goal is to reduce workers’ commute times from a particular community. A random sample of workers in that community is asked to estimate their reduction in commute time if the bridge were built.

Our goal is to estimate the mean reduction in commute time for the whole community if the bridge were built. Create a 95% confidence interval for this mean.

Example #1 Data

n = 100 workers are sampled
$\bar{x} = 17$ minutes
σ = 30 minutes

What is the 95% confidence interval for the mean?

Constructing a confidence interval

Construct a 95% confidence interval around the sample mean:

$P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$

$P\left(17 - 1.96\frac{30}{\sqrt{100}} \le \mu \le 17 + 1.96\frac{30}{\sqrt{100}}\right) = 0.95$

$P(17 - 1.96 \cdot 3 \le \mu \le 17 + 1.96 \cdot 3) = 0.95$

$P(17 - 5.88 \le \mu \le 17 + 5.88) = 0.95$

So we can say that the 95% C.I. is 17 ± 5.88, or (11.12, 22.88)
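The same calculation as a small function, assuming σ is known as in Case #1. `z_interval` is a hypothetical name; it computes $z_{\alpha/2}$ with `statistics.NormalDist` rather than reading 1.96 from a table, which gives the same limits to two decimals here.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence=0.95):
    """Confidence interval for mu when sigma is known: x_bar +/- z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}, about 1.96 for 95%
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example #1: n = 100 workers, x_bar = 17 minutes, sigma = 30 minutes
low, high = z_interval(17, 30, 100)
print(round(low, 2), round(high, 2))
```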


Example #1 Questions

What would happen to our interval if we used a 99% confidence interval instead?

What would happen to our confidence interval if we sampled 200 people instead of 100 people?
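One way to explore these two questions is to compare the half-width of the interval, $z_{\alpha/2}\,\sigma/\sqrt{n}$, under each change while holding everything else from Example #1 fixed. This is a sketch with a hypothetical helper `z_margin`.

```python
import math
from statistics import NormalDist

def z_margin(sigma, n, confidence):
    """Half-width of a known-sigma confidence interval: z_{alpha/2} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sigma / math.sqrt(n)

base = z_margin(30, 100, 0.95)      # Example #1 as given
higher = z_margin(30, 100, 0.99)    # same data, 99% confidence
bigger_n = z_margin(30, 200, 0.95)  # 95% confidence, 200 workers
print(round(base, 2), round(higher, 2), round(bigger_n, 2))
```

Raising the confidence level widens the interval (about 7.73 instead of 5.88 minutes on each side), while doubling n shrinks the half-width by a factor of √2 (to about 4.16 minutes).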

Interval Estimation (making confidence intervals for population parameters estimated from samples)

Case #2: estimating an interval for μ when X is not normally distributed and we know σ

In this case, n matters a lot. Why?

This is also unlikely in practice, since it requires knowing σ (and if we knew σ, we would likely already know μ)

Interval Estimation (making confidence intervals for population parameters estimated from samples)

Case #3: estimating an interval for μ when σ and the distribution are unknown

What should we use instead of σ?

Can we use the z-table in this case?

This case is what we see most commonly

t-distribution vs. z-distribution

When we only have s (and not σ), we use the t-distribution rather than the z-distribution

To do so, we use the t-table

How are they different?

• The t-distribution changes depending on the degrees of freedom (n − 1); this is reflected in the table and in the symbol $t_{\alpha/2,\,n-1}$
• The t-distribution accounts for more uncertainty (i.e., wider confidence intervals), since s is just an estimate of σ

t-distribution vs. z-distribution

As n approaches infinity, t and z become equal

This means that even when we have s instead of σ, we can use the z-distribution if n is large

Central Limit Theorem: “…as n gets large.” What is “large”? Rule of thumb: 30

For n less than 30, the distribution of $(\bar{x} - \mu)/(s/\sqrt{n})$ does not follow the normal distribution accurately enough.

But it does closely follow a t-distribution with n − 1 degrees of freedom for sample sizes of less than 30 (assuming the population itself is roughly normal).

For this class use the t-distribution any time you have s instead of σ
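To see t approach z numerically: Python's standard library has no t-distribution quantile function (SciPy's `scipy.stats.t.ppf` provides one), so this self-contained sketch integrates the t density with Simpson's rule and inverts it by bisection. The helper names are hypothetical, and this is an illustration rather than a production-grade quantile routine. The printed values should be close to the t-table entries (for example, about 2.131 for 15 degrees of freedom) and shrink toward 1.96 as the degrees of freedom grow.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_area_from_0(x, df, steps=1000):
    """Area under the t density between 0 and x (x >= 0), by Simpson's rule (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return total * h / 3

def t_critical(upper_tail, df):
    """t value with P(T > t) = upper_tail (0 < upper_tail < 0.5), found by bisection."""
    target = 0.5 - upper_tail                # required area between 0 and t
    lo, hi = 0.0, 1.0
    while t_area_from_0(hi, df) < target:    # widen the bracket until it contains t
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_area_from_0(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# t_{0.025, df} shrinks toward z_{0.025} = 1.96 as the degrees of freedom grow
for df in (5, 15, 30, 100, 1000):
    print(df, round(t_critical(0.025, df), 3))
```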

Example #2

n = 16
$\bar{x} = 30$
$s^2 = 1600$

What is the 95% C.I. for the mean?

Example #2

$s = \sqrt{1600} = 40$

Degrees of freedom = n − 1 = 15

From the t-table: $t_{\alpha/2,\,n-1} = t_{0.05/2,\,16-1} = t_{0.025,\,15} = 2.131$

$P\left(-2.131 \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le 2.131\right) = 0.95$

$P\left(30 - 2.131\frac{40}{\sqrt{16}} \le \mu \le 30 + 2.131\frac{40}{\sqrt{16}}\right) = 0.95$

$P(30 - 2.131 \cdot 10 \le \mu \le 30 + 2.131 \cdot 10) = 0.95$

$P(30 - 21.31 \le \mu \le 30 + 21.31) = 0.95$

The 95% confidence interval for the mean is (8.69, 51.31)
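A minimal sketch of the t-based interval for Case #3. The critical value is passed in from the t-table because Python's standard library has no t quantile function; `t_interval` is a hypothetical helper name.

```python
import math

def t_interval(x_bar, s, n, t_crit):
    """Confidence interval for mu when sigma is unknown; t_crit is t_{alpha/2, n-1} from a t-table."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Example #2: n = 16, x_bar = 30, s^2 = 1600, t_{0.025, 15} = 2.131
low, high = t_interval(30, math.sqrt(1600), 16, 2.131)
print(round(low, 2), round(high, 2))
```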

Interval Estimation (making confidence intervals for population parameters estimated from samples)

Case #4: estimating an interval for a proportion π based on a sample proportion p

Remember that p = x/n. In other words, p = the number of “successes” divided by the number of observations sampled. For example: the proportion of people over 6 ft tall

In this case we don’t need s or σ, but we do need the standard deviation of p:

$\sigma_p = \sqrt{\frac{\pi(1-\pi)}{n}}$

Which we estimate as:

$s_p = \sqrt{\frac{p(1-p)}{n}}$

Interval Estimation (making confidence intervals for population parameters estimated from samples)

Case #4 continued

Equation:

$p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

We use the z-distribution for estimating an interval for a proportion π based on a sample proportion p

This also limits us to using only large samples (in this case n > 100)

For smaller samples, we calculate the entire distribution using the binomial mass function (i.e., solve for all x values):

$P(x) = {}_nC_x\, \pi^x (1-\pi)^{n-x}$
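For the small-sample case, the binomial mass function can be evaluated for every x with `math.comb` (Python 3.8+). This is a sketch; the values n = 10 and π = 0.42 are hypothetical, chosen only to illustrate, and the helper names are not from the slides.

```python
import math

def binomial_pmf(x, n, pi):
    """P(X = x) = nCx * pi^x * (1 - pi)^(n - x) for a binomial count X."""
    return math.comb(n, x) * pi ** x * (1 - pi) ** (n - x)

def binomial_distribution(n, pi):
    """The entire distribution: P(X = x) for every x = 0, 1, ..., n."""
    return [binomial_pmf(x, n, pi) for x in range(n + 1)]

# Hypothetical small sample: n = 10 people, assumed proportion pi = 0.42
dist = binomial_distribution(10, 0.42)
for x, prob in enumerate(dist):
    print(x, round(prob, 4))
```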

Example #3

n = 150 people sampled at a convention

63 of the people sampled were over 6 feet tall

What is the 99% C.I. for the true proportion π of all people ≥ 6 ft tall at the convention?

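Example #3

p = 63/150 = 0.42

99% C.I. → $z_{\alpha/2} = z_{0.005} = 2.58$ (from the z-table)

$p - z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}} \le \pi \le p + z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$

$0.42 - 2.58\sqrt{\frac{0.42 \cdot 0.58}{150}} \le \pi \le 0.42 + 2.58\sqrt{\frac{0.42 \cdot 0.58}{150}}$

$0.42 - 2.58 \cdot 0.0403 \le \pi \le 0.42 + 2.58 \cdot 0.0403$

$0.42 - 0.104 \le \pi \le 0.42 + 0.104$

The 99% confidence interval for π, based on p = 0.42, is (0.316, 0.524)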
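A minimal sketch of the large-sample interval for a proportion, applied to Example #3. `proportion_interval` is a hypothetical helper name; it computes the exact $z_{\alpha/2}$ (about 2.576) instead of the table's 2.58, which happens to give the same limits to three decimals here.

```python
import math
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample (z-based) confidence interval for a population proportion pi."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # z_{alpha/2}
    margin = z * math.sqrt(p * (1 - p) / n)                 # z * s_p
    return p - margin, p + margin

# Example #3: 63 of 150 people over 6 ft, 99% confidence
low, high = proportion_interval(63, 150, confidence=0.99)
print(round(low, 3), round(high, 3))
```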

Sample Size Determination

Often, before we collect a sample, we want to know how large a sample we need

Required sample sizes can be determined for population parameters (mean, proportions, etc.) by rearranging the interval equations we’ve been working through

An additional component is the error (E)

This is the term that defines how far off we are willing to be (i.e., the margin of acceptable error)

Strictly speaking, E is one-half the difference between the upper and lower limits of the interval at a given confidence level

Note that E is not the same as the C.I.

Sample Size Determination

Equation for μ:

$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$

Equation for π:

$n = \frac{z_{\alpha/2}^2\, p(1-p)}{E^2}$

What obvious flaw do you see?

Example #4

A movie theatre wants to know the mean number of tickets sold per day. How many days must they count to know the mean daily ticket sales within 100 tickets with 95% confidence?

From previous sales reports, it is determined that σ = 175

Example #4

What numbers do we plug into our equation?

$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2$

What should $z_{\alpha/2}$ be?

What should E be? Why don’t we multiply this by 2?

What should σ be?

Example #4

z = 1.96
E = 100
σ = 175
n = number of days we should sample

$n = \left(\frac{z_{\alpha/2}\,\sigma}{E}\right)^2 = \left(\frac{1.96 \cdot 175}{100}\right)^2 = (3.43)^2 = 11.765$

Since we cannot count a fraction of a day, we round up: n = 12 days
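The sample-size formula for μ as a sketch. `sample_size_mean` is a hypothetical helper name; rounding up with `math.ceil` reflects the usual convention that a required sample size is never rounded down, since rounding down would leave the margin slightly larger than E.

```python
import math

def sample_size_mean(z, sigma, E):
    """Smallest n with z * sigma / sqrt(n) <= E, i.e. (z * sigma / E)^2 rounded up."""
    return math.ceil((z * sigma / E) ** 2)

# Example #4: z = 1.96 (95%), sigma = 175 tickets, E = 100 tickets
n_raw = (1.96 * 175 / 100) ** 2
print(round(n_raw, 3), sample_size_mean(1.96, 175, 100))
```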

Example #5

A city council election is being held with several candidates expecting reasonably large returns.

To avoid a run-off between the top 2 vote getters, the leading candidate must receive at least 45% of the vote

How many people do we need to sample using exit polls to determine with 99% confidence and an acceptable error of 0.005 whether there will be a run-off vote?

Example #5

z = 2.58
E = 0.005
p = 0.45
n = number of people we should sample

$n = \frac{z_{\alpha/2}^2\, p(1-p)}{E^2} = \left(\frac{2.58\sqrt{0.45 \cdot 0.55}}{0.005}\right)^2 = \left(\frac{2.58\sqrt{0.2475}}{0.005}\right)^2 \approx 65{,}898.4$

Rounding up, we need to sample about 65,899 people
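The proportion version of the sample-size formula, sketched the same way. `sample_size_proportion` is a hypothetical helper name, and p = 0.45 is the planning value used in Example #5 (the formula needs some guess for p before any data are collected).

```python
import math

def sample_size_proportion(z, p, E):
    """Smallest n with z * sqrt(p(1-p)/n) <= E, i.e. z^2 p(1-p) / E^2 rounded up."""
    return math.ceil(z ** 2 * p * (1 - p) / E ** 2)

# Example #5: z = 2.58 (99%), planning value p = 0.45, E = 0.005
n_raw = 2.58 ** 2 * 0.45 * 0.55 / 0.005 ** 2
print(round(n_raw, 1), sample_size_proportion(2.58, 0.45, 0.005))
```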


Class Problem

Given this sample of middle school kid heights (in inches)

56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47, 60, 54, 45, 45, 69, 62, 67, 49, 43, 59

What is the 99% confidence interval for the population mean (μ)?

Solution

n = 21

$\bar{x} = 1175/21 = 55.95$

s = 8.96

$t_{\alpha/2,\,n-1} = t_{0.005,\,20} = 2.845$

$P\left(-2.845 \le \frac{\bar{X} - \mu}{s/\sqrt{n}} \le 2.845\right) = 0.99$

$P\left(55.95 - 2.845\frac{8.96}{\sqrt{21}} \le \mu \le 55.95 + 2.845\frac{8.96}{\sqrt{21}}\right) = 0.99$

$P(55.95 - 5.563 \le \mu \le 55.95 + 5.563) = 0.99$

So the 99% C.I. for the population mean (μ) is [50.387, 61.513]
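The whole Class Problem can be checked from the raw heights. This sketch computes n, x̄, and s with the `statistics` module and takes $t_{0.005,\,20} = 2.845$ from the t-table, since the standard library has no t quantile function.

```python
import math
import statistics

heights = [56, 64, 52, 69, 66, 64, 63, 46, 46, 49, 47, 60, 54, 45, 45, 69, 62, 67, 49, 43, 59]

n = len(heights)
x_bar = statistics.mean(heights)
s = statistics.stdev(heights)          # sample standard deviation (n - 1 divisor)
t_crit = 2.845                         # t_{0.005, 20} from the t-table
margin = t_crit * s / math.sqrt(n)
low, high = x_bar - margin, x_bar + margin

print(n, round(x_bar, 2), round(s, 2), round(margin, 3))
print(round(low, 3), round(high, 3))
```

Because the code carries full precision for x̄ and s instead of the rounded 55.95 and 8.96, its upper limit comes out near 61.517 rather than 61.513.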

For Friday

Read Chapter 9: pages 280-306

For Monday

Come with questions about homework #6