Econ 140, Lecture 6: Inference about a Mean, Part II

Today’s Plan

• Confidence Intervals

• Hypothesis testing

– Small samples

– Large samples

• Types of errors

• Quick review of what we’ve learned so far

What we’ve seen so far

• We’ve worked with univariate populations

– Recall that we have the standardized normal variate Z, distributed Z ~ N(0,1):

$$Z = \frac{Y - \mu_Y}{\sigma_Y}$$

• Ask questions such as: what is E(Y)? What is the probability that someone selected at random will have earnings between $300 and $400?

What we’ve seen so far

• Before, when we were considering the distribution around µy, we were considering the distribution of Y

• Now we are considering Ȳ as a point estimator for µy

• The difference is that the distribution of Ȳ has a variance of σ²/n, whereas Y has a variance of σ²

• Having obtained an estimate of a parameter (µy), and considered the properties of the estimator (BLUE), we need to find out how ‘good’ the estimate is. Estimation is the first side of statistical inference.

• The other side of statistical inference: hypothesis testing
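The claim that Ȳ has variance σ²/n while Y has variance σ² can be checked with a small simulation. This is only a sketch: the population values (mean 350, standard deviation 50), the sample size and the helper name `variance_of_sample_means` are assumptions made for illustration, not figures from the lecture.

```python
import random
import statistics


def variance_of_sample_means(mu, sigma, n, reps, seed=0):
    """Draw `reps` samples of size n from N(mu, sigma^2); return the variance of their means."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = [
        statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
        for _ in range(reps)
    ]
    return statistics.variance(means)


# Hypothetical population (illustrative values only).
sigma, n = 50.0, 25
simulated = variance_of_sample_means(mu=350.0, sigma=sigma, n=n, reps=5000)
theoretical = sigma ** 2 / n  # sigma^2 / n

print(f"simulated Var(Y-bar) = {simulated:.1f}, sigma^2/n = {theoretical:.1f}")
```

The simulated variance of the sample means should land close to σ²/n, far below the population variance σ².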

Confidence Intervals

• Recall our picture showing the distributions of Y and Ȳ

[Figure: distributions of Y (variance σ²) and Ȳ (variance σ²/n), both centered on µy]

• You repeatedly take samples from the population and get different estimates of µy

– Sampling distribution is the probability distribution for the values of Ȳ in different samples (of a given size).

Confidence Intervals (2)

• How do we assign probability bounds on our estimate?

• We don’t know what µy is, but we know the sample size and the sample estimates of Y

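• We can estimate µy give or take some amount of error:

$$\mu_Y = \bar{Y} \pm \text{allowance for random error}$$

• We know that Ȳ is distributed

$$\bar{Y} \sim N(\mu_Y,\ s^2/n)$$

– We use s² as an estimate of σ²

– Our distribution of Ȳ:

[Figure: distribution of Ȳ, centered on µy with variance σ²/n]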



• Remember: (as a rule of thumb - more precision later)

– Large samples: use the Z distribution

– Small samples: use the t distribution

• We’ll use the Z distribution for this example

– Our expression for the Z statistic is

$$Z = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \sim N(0,1)$$

$$Z\,(s/\sqrt{n}) = \bar{Y} - \mu_Y$$

$$\mu_Y = \bar{Y} \pm Z\,(s/\sqrt{n})$$



• We have the standard normal distribution around µy

[Figure: normal curve centered on µy, with -Z and +Z marked on either side]

• We want to describe how much area is between -Z and +Z

• We can create a 95% confidence interval around Z



• We can write the confidence interval as

$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$$

• Where did we get the values -1.96 and +1.96?

– Look at the standard normal table

– We see that 47.5% of the area under the curve can be found between 0 and 1.96. Or 95% between +/- 1.96

[Figure: standard normal curve with 47.5% of the area between -1.96 and 0, and 47.5% between 0 and +1.96]
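The table lookup can be reproduced in code. The sketch below uses Python's standard-library `statistics.NormalDist` in place of the printed standard normal table; the variable names are only illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)  # Z ~ N(0, 1)

# Area under the curve between 0 and 1.96 (what the table reports).
area_0_to_196 = std_normal.cdf(1.96) - std_normal.cdf(0.0)

# Area between -1.96 and +1.96.
area_between = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the Z value that leaves 2.5% in the upper tail.
z_for_95 = std_normal.inv_cdf(0.975)

print(f"area 0 to 1.96:     {area_0_to_196:.4f}")
print(f"area -1.96 to 1.96: {area_between:.4f}")
print(f"Z for 95%:          {z_for_95:.4f}")
```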



• So we can rewrite

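$$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$$

$$-1.96\,(s/\sqrt{n}) \le \bar{Y} - \mu_Y \le 1.96\,(s/\sqrt{n})$$

$$\mu_Y = \bar{Y} \pm 1.96\,(s/\sqrt{n})$$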

• This is the confidence interval estimate for µy at a 95% level of confidence

– You can choose other levels of confidence

– As you increase the confidence level you increase the range of possible values µy can take
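As a rough illustration of the interval Ȳ ± Z(s/√n), the sketch below computes it for two confidence levels. The summary statistics (Ȳ = 320, s = 80, n = 64) and the function name `z_confidence_interval` are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math
from statistics import NormalDist


def z_confidence_interval(ybar, s, n, confidence=0.95):
    """Large-sample interval: ybar +/- Z * s / sqrt(n)."""
    alpha = 1.0 - confidence
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 when confidence = 0.95
    half_width = z * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


# Hypothetical sample summary (not data from the lecture).
lo95, hi95 = z_confidence_interval(ybar=320.0, s=80.0, n=64, confidence=0.95)
lo99, hi99 = z_confidence_interval(ybar=320.0, s=80.0, n=64, confidence=0.99)

print(f"95% interval: ({lo95:.2f}, {hi95:.2f})")
print(f"99% interval: ({lo99:.2f}, {hi99:.2f})")
```

With these numbers the standard error is 80/√64 = 10, so the 95% interval is roughly 320 ± 19.6, and the 99% interval is wider.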

Using the t distribution

• If we have a small sample, we should use the t distribution

• Our t statistic looks like

$$t = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}}$$

• What will our confidence interval look like?

– We substitute t for Z

$$\mu_Y = \bar{Y} \pm t_{\alpha/2}\,(s/\sqrt{n})$$

• We don’t know the underlying population distribution

– But we can use the central limit theorem to assume that the sampling distribution of Ȳ is approximately normal

– We can use the t distribution to approximate the distribution of sample means.

Using the t distribution (2)

• We have to choose the confidence level (1-α), which requires a choice of α

• The area between the two t values is the confidence interval

[Figure: t distribution centered on µy; the area between -t and +t is the confidence interval]

• The usual accepted confidence level is 95% (α = 0.05)

Using the t distribution (3)

• If (1-α) is the area between the two t values, then α is the sum of the area under the two tails

– if (1-α) = 0.95, α = 0.05

– α/2 = 0.05/2 = 0.025

– So for a 95% confidence level, 0.025 of the area of the curve is found in each tail of the distribution

The t Table

• In the first row, there is an upper number and a lower number

– The upper gives you the area in one tail given a two tail test

– The lower number gives the area in one tail or in two tails combined

• At an infinite number of observations, 2.5% of the area under the curve is found in each of the tails when our t statistic is 1.96 - it approximates the normal

• With 10 degrees of freedom (a sample of 11 observations), 95% of the area under the t distribution is between -2.228 and +2.228

– Note: the t has fatter tails than the standard normal
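The table values can be approximated numerically. The sketch below is a pure-Python stand-in for the printed t table: it integrates the t density with Simpson's rule and finds the critical value by bisection. Both numerical methods, and the function names, are my own choices for illustration; a statistics package would normally be used instead. Note that 2.228 is the table entry for 10 degrees of freedom.

```python
import math


def t_pdf(x, df):
    """Density of the t distribution with `df` degrees of freedom."""
    log_const = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                 - 0.5 * math.log(df * math.pi))
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x), using Simpson's rule on [0, |x|] and the symmetry of the t."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + math.copysign(total * h / 3, x)


def t_critical(df, alpha=0.05):
    """Value t such that alpha/2 of the area lies in each tail (bisection on [0, 50])."""
    lo, hi = 0.0, 50.0
    target = 1.0 - alpha / 2.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


t_10 = t_critical(10)         # small sample: 10 degrees of freedom
t_large = t_critical(100000)  # very many observations: approaches the normal

print(f"t critical, df=10:     {t_10:.3f}")
print(f"t critical, df=100000: {t_large:.3f}")
```

The small-sample critical value is larger than 1.96, which is the fatter-tails point made above.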

The t Table (2)

• For a small sample size, the t values corresponding to a 95% confidence interval are larger in absolute value than the Z values for the same interval

• The confidence interval we get depends on two things:

– Sample size

– Whether or not we know the population value for σ

Hypothesis Testing

• We want to ask:

– What is the probability that µy is equal to some value?

• Using hypothesis testing we can determine whether or not it’s plausible that µy equals a certain value

• We have two types of sample (approximate rules)

– Large: n > 30

– Small: n ≤ 30

Large Samples

• Large samples

– Doesn’t matter if the population distribution is skewed or normal

– Doesn’t matter if the population variance is known (use σ) or unknown (calculate s, the sample estimate of the standard deviation)

– Use the Z table

Small Samples

• Small samples

– If the population is normally distributed and the population variance is known, use the Z table

– If the population is normally distributed but the population variance is unknown, use the t distribution with n-1 degrees of freedom (calculate the sample variance as an estimate of the population variance).

– If the population is non-normally distributed, use neither the t nor the Z (I will never give you a case like this)
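The large-sample and small-sample rules of thumb can be collected into one small decision function. This is just a sketch of the rules as stated on these slides; the function name `choose_table` and its return strings are hypothetical.

```python
def choose_table(n, population_normal, variance_known):
    """Return which table the lecture's rules of thumb point to."""
    if n > 30:
        # Large sample: shape of the population and knowledge of the variance do not matter.
        return "Z"
    if not population_normal:
        # Small sample from a non-normal population: neither table applies.
        return "neither"
    if variance_known:
        return "Z"
    return f"t with {n - 1} degrees of freedom"


print(choose_table(n=100, population_normal=False, variance_known=False))
print(choose_table(n=12, population_normal=True, variance_known=True))
print(choose_table(n=12, population_normal=True, variance_known=False))
print(choose_table(n=12, population_normal=False, variance_known=False))
```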

Setting Up Hypotheses

• In hypothesis testing you set up a null hypothesis H0

• Under the null hypothesis µy will take a particular value

– Example: we can create a null such that

H0 : µy = 300

• Once we have a null hypothesis we can set up an alternative hypothesis H1

One and Two-Tailed Tests

• We can represent this in the following graph:

[Figure: sampling distribution of Ȳ centered on µy = 300 under the null, with standard error s/√n]

• One-tail tests

– We calculate the area in the right-hand tail if H1 : µy > 300

– We calculate the area in the left-hand tail if H1 : µy <300

• Two tail test:

– Find the area under both tails if H1 : µy ≠ 300
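The tail areas for the three alternatives can be computed directly. In this sketch the null value 300 comes from the slides, but the sample summary (Ȳ = 320, s = 80, n = 64) and the function name `tail_area` are hypothetical, and the Z distribution is used on the assumption that the sample is large.

```python
import math
from statistics import NormalDist


def tail_area(ybar, mu0, s, n, alternative):
    """Area in the tail(s) beyond the observed Z statistic under H0: mu = mu0."""
    z = (ybar - mu0) / (s / math.sqrt(n))
    upper = 1.0 - NormalDist().cdf(z)  # area to the right of z
    if alternative == "greater":       # H1: mu > mu0, right-hand tail
        return upper
    if alternative == "less":          # H1: mu < mu0, left-hand tail
        return 1.0 - upper
    if alternative == "two-sided":     # H1: mu != mu0, both tails
        return 2.0 * min(upper, 1.0 - upper)
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical sample: Z = (320 - 300) / (80 / 8) = 2.0
right = tail_area(320.0, 300.0, 80.0, 64, "greater")
left = tail_area(320.0, 300.0, 80.0, 64, "less")
both = tail_area(320.0, 300.0, 80.0, 64, "two-sided")

print(f"right tail: {right:.4f}, left tail: {left:.4f}, both tails: {both:.4f}")
```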

Intervals and Regions

• We also need to assign a significance level (or confidence level)

• For a two-tailed test we are looking to see if a value of 300 lies within the confidence interval

• With hypothesis tests we are creating an acceptance region bound by critical values

– Critical values are taken off the Z and t tables

– The regions in the tails are the critical regions

Intervals and Regions (2)

[Figure: distribution with a central acceptance region of area 1 - α between two critical values, and a critical region of area α/2 in each tail]

• α is the significance level

• If you fail to reject the null, the Z or t statistic must fall in the acceptance region

• If you reject the null, the Z or t must fall in one of the critical regions

Types of Errors

• Type I errors

– Rejecting a hypothesis when it is in fact true

– Example: In the confidence interval example we constructed the confidence interval (254 ≤ µy ≤ 380). If the true population mean is 400 we can make H0 : µy = 400. In this case we’d falsely reject the null hypothesis!

• Type II errors

– Not rejecting a false hypothesis

– Example: if the true mean is 400 but we do not reject H0 : µy = 300 we would not be rejecting a false hypothesis

Types of Errors (2)

• Statisticians worry about Type I errors

– They choose a significance level that minimizes Type I errors

• To minimize Type I errors choose a small α, where α is the total area in both tails

– Thus the area in each tail is α/2
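The link between α and Type I errors can be seen in a simulation where the null hypothesis is true by construction. This is a sketch under assumed values: a normal population with mean 300 and known σ = 80, samples of size 25, and a two-tailed Z test. The function name `type_one_error_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def type_one_error_rate(alpha, mu0=300.0, sigma=80.0, n=25, trials=4000, seed=1):
    """Share of samples in which a true H0: mu = mu0 is rejected by a two-tailed Z test."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # alpha/2 in each tail
    rejections = 0
    for _ in range(trials):
        # The sample is drawn with true mean mu0, so every rejection is a Type I error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials


rate_05 = type_one_error_rate(alpha=0.05)
rate_01 = type_one_error_rate(alpha=0.01)

print(f"alpha = 0.05: rejected a true null in {rate_05:.3f} of samples")
print(f"alpha = 0.01: rejected a true null in {rate_01:.3f} of samples")
```

The rejection share should sit close to α in each case, so a smaller α means fewer false rejections.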

Types of Errors (3)

• As α decreases, the likelihood of rejecting a true null hypothesis also decreases

• Most of the time α = 5% is used, and α/2 = 2.5%

• We can say that we do not reject or reject the null, but we can’t say that we accept the alternative!

• Examples:

Hypothesis Testing in General

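• Null (H0): µY = µ0

Alternative (H1)    Critical Region

µY > µ0             right tail: $Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} > Z_{\alpha}$

µY < µ0             left tail: $Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}} < -Z_{\alpha}$

µY ≠ µ0             two tail: $|Z| = \left|\frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}\right| > Z_{\alpha/2}$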

• If you are using the t instead, replace the Z’s with t’s
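The three critical regions can be written as one decision rule. The sketch below uses the Z distribution; the numbers (Ȳ = 318, µ0 = 300, σ = 80, n = 64) and the function name `z_test` are hypothetical and chosen so that the one-tail and two-tail tests disagree.

```python
import math
from statistics import NormalDist


def z_test(ybar, mu0, sigma, n, alternative, alpha=0.05):
    """Return (Z statistic, reject?) for H0: mu = mu0 against the given alternative."""
    z = (ybar - mu0) / (sigma / math.sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":       # critical region: right tail, area alpha
        reject = z > std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "less":        # critical region: left tail, area alpha
        reject = z < -std_normal.inv_cdf(1.0 - alpha)
    elif alternative == "two-sided":   # critical region: both tails, alpha/2 each
        reject = abs(z) > std_normal.inv_cdf(1.0 - alpha / 2.0)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, reject


# Hypothetical sample: Z = (318 - 300) / (80 / 8) = 1.8
z, reject_right = z_test(318.0, 300.0, 80.0, 64, "greater")
_, reject_left = z_test(318.0, 300.0, 80.0, 64, "less")
_, reject_two = z_test(318.0, 300.0, 80.0, 64, "two-sided")

print(f"Z = {z:.2f}")
print(f"reject against mu > 300:  {reject_right}")
print(f"reject against mu < 300:  {reject_left}")
print(f"reject against mu != 300: {reject_two}")
```

Here Z = 1.8 exceeds the one-tail critical value (about 1.645) but not the two-tail value (about 1.96), so the conclusion depends on how H1 is set up.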

Where are we now?

• So far we have learned about inference and testing hypotheses using assumptions about distributions

• Distributions

– We had samples and populations and used weights to make inferences about the population using sample statistics

– We assumed distributional forms such as the Z or t distributions

• Sampling distribution of the mean

– You should know the difference between E(Y) and E(Ȳ)

Where are we now? (2)

• BLUE: we’ll return to this in the next lecture

• Estimation and hypothesis testing

• We now return to the regression line and consider the estimators for a and b from:

$$\hat{Y}_i = a + bX_i$$

$$e_i = Y_i - \hat{Y}_i$$

• We have to consider the properties of the OLS estimator (BLUE), and how to construct hypothesis tests on the estimates of the parameters a and b.
