Econ 140, Lecture 6: Inference about a Mean, Part II
TRANSCRIPT
Today’s Plan
• Confidence Intervals
• Hypothesis testing
– Small samples
– Large samples
• Types of errors
• Quick review of what we’ve learned so far
What we’ve seen so far
• We’ve worked with univariate populations
– Recall that we have the standardized normal variate Z, distributed Z ~ N(0,1):
$Z = \frac{Y - \mu_Y}{\sigma_Y}$
• We asked questions such as: What is E(Y)? What is the probability that someone selected at random will have earnings between $300 and $400?
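As a rough illustration of the earnings question, the sketch below standardizes the two bounds and reads the area off the standard normal distribution. The population mean and standard deviation are hypothetical values chosen for the example, not figures from the lecture, and `prob_between` is an illustrative helper name.

```python
from statistics import NormalDist

# Hypothetical population parameters (assumed for illustration only):
# earnings with mean 350 and standard deviation 50.
mu_y, sigma_y = 350.0, 50.0

def prob_between(low, high, mu, sigma):
    """P(low <= Y <= high) for Y ~ N(mu, sigma^2), via Z = (Y - mu) / sigma."""
    z_low = (low - mu) / sigma
    z_high = (high - mu) / sigma
    std_normal = NormalDist()  # Z ~ N(0, 1)
    return std_normal.cdf(z_high) - std_normal.cdf(z_low)

p = prob_between(300, 400, mu_y, sigma_y)
print(round(p, 4))  # bounds are one standard deviation either side of the mean
```

With these assumed values the bounds standardize to Z = -1 and Z = +1, so the answer is the familiar area within one standard deviation of the mean.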
What we’ve seen so far
• Before, when we considered the distribution around µy, we were considering the distribution of Y
• Now we are considering $\bar{Y}$ as a point estimator for µy
• The difference is that the distribution for $\bar{Y}$ has a variance of $\sigma^2/n$, whereas Y has a variance of $\sigma^2$
• Having obtained an estimate of a parameter (µy), and considered the properties of the estimator (BLUE), we need to find out how ‘good’ the estimate is. Estimation is the first side of statistical inference.
• The other side of statistical inference: hypothesis testing
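The claim that the sample mean has variance $\sigma^2/n$ can be checked with a small simulation. This is only a sketch: the population is assumed normal, and the values of `mu`, `sigma`, `n` and `reps` are hypothetical choices made for the example.

```python
import random

random.seed(0)

# Hypothetical population and sampling setup (assumed for illustration).
mu, sigma = 300.0, 100.0   # population mean and standard deviation
n, reps = 25, 20000        # sample size and number of repeated samples

# Draw many samples of size n and record the mean of each one.
sample_means = []
for _ in range(reps):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    sample_means.append(sum(sample) / n)

# Variance of the simulated sample means.
grand_mean = sum(sample_means) / reps
var_of_means = sum((m - grand_mean) ** 2 for m in sample_means) / reps

theoretical = sigma ** 2 / n   # sigma^2 / n
print(round(var_of_means, 1), theoretical)
```

The simulated variance of the sample means should land close to `theoretical`, which is far smaller than the variance of Y itself.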
Confidence Intervals
• Recall our picture showing the distributions of Y and $\bar{Y}$
[Figure: distributions of Y (variance $\sigma^2$) and $\bar{Y}$ (variance $\sigma^2/n$), both centered on µy]
• You repeatedly take samples from the population and get different estimates $\bar{Y}$
– The sampling distribution is the probability distribution for the values of $\bar{Y}$ in different samples (of a given size).
Confidence Intervals (2)
• How do we assign probability bounds on our estimate?
• We don’t know what µy is, but we know the sample size and the sample estimates of Y
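• We can estimate µy give or take some amount of error: $\mu_Y = \bar{Y} \pm \text{allowance for random error}$
• We know that $\bar{Y}$ is distributed $\bar{Y} \sim N(\mu_Y, s^2/n)$
– We use $s^2$ as an estimate of $\sigma^2$
– Our distribution of $\bar{Y}$:
[Figure: distribution of $\bar{Y}$, centered on µy with variance $\sigma^2/n$]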
Confidence Intervals (3)
• Remember: (as a rule of thumb - more precision later)
– Large samples: use the Z distribution
– Small samples: use the t distribution
• We’ll use the Z distribution for this example
– Our expression for the Z statistic is
$Z = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \sim N(0,1)$
$Z\,(s/\sqrt{n}) = \bar{Y} - \mu_Y$
$\mu_Y = \bar{Y} - Z\,(s/\sqrt{n})$
Confidence Intervals (4)
• We have the standard normal distribution around µy
[Figure: normal curve centered on µy, with -Z and +Z marked on either side]
• We want to describe how much area is between -Z and +Z
• We can create a 95% confidence interval around Z
Confidence Intervals (5)
• We can write the confidence interval as
$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$
• Where did we get the values -1.96 and +1.96?
– Look at the standard normal table
– We see that 47.5% of the area under the curve can be found between 0 and 1.96. Or 95% between +/- 1.96
[Figure: standard normal curve with 47.5% of the area between -1.96 and 0, and 47.5% between 0 and +1.96]
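The table lookup can be reproduced numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed standard normal table; the variable names are illustrative.

```python
from statistics import NormalDist

std_normal = NormalDist()  # Z ~ N(0, 1)

# Area under the curve between 0 and 1.96 (one side of the mean).
area_0_to_196 = std_normal.cdf(1.96) - std_normal.cdf(0.0)

# Area between -1.96 and +1.96 (both sides).
area_between = std_normal.cdf(1.96) - std_normal.cdf(-1.96)

# Going the other way: the Z value that leaves 2.5% in the upper tail.
z_crit = std_normal.inv_cdf(0.975)

print(round(area_0_to_196, 3), round(area_between, 3), round(z_crit, 2))
```

Reading the table "backwards" with `inv_cdf` is how critical values for other confidence levels can be found.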
Confidence Intervals (6)
• So we can rewrite
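$\Pr\left(-1.96 \le \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}} \le 1.96\right) = 0.95$
$-1.96\,(s/\sqrt{n}) \le \bar{Y} - \mu_Y \le 1.96\,(s/\sqrt{n})$
$\mu_Y = \bar{Y} \pm 1.96\,(s/\sqrt{n})$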
• This is the confidence interval estimate for µy at a 95% level of confidence
– You can choose other levels of confidence
– As you increase the confidence level you increase the range of possible values µy can take
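A minimal sketch of the large-sample interval follows. The sample mean, standard deviation and sample size are hypothetical numbers chosen for the example, and `z_confidence_interval` is an illustrative helper, not something defined in the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_confidence_interval(y_bar, s, n, level=0.95):
    """Interval y_bar +/- Z * s / sqrt(n) for a given confidence level."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # 1.96 when level is 0.95
    margin = z * s / sqrt(n)
    return y_bar - margin, y_bar + margin

# Hypothetical sample summary (assumed): mean 320, s = 150, n = 100.
lo95, hi95 = z_confidence_interval(320.0, 150.0, 100, level=0.95)
lo99, hi99 = z_confidence_interval(320.0, 150.0, 100, level=0.99)

print(round(lo95, 1), round(hi95, 1))
print(round(lo99, 1), round(hi99, 1))  # wider than the 95% interval
```

Raising the confidence level from 95% to 99% widens the interval, which is the trade-off described in the last bullet.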
Using the t distribution
• If we have a small sample, we should use the t distribution
• Our t statistic looks like
$t = \frac{\bar{Y} - \mu_Y}{s/\sqrt{n}}$
• What will our confidence interval look like?
– We substitute t for Z
$\mu_Y = \bar{Y} \pm t_{\alpha/2}\,(s/\sqrt{n})$
• We don’t know the underlying population distribution
– But we can use the central limit theorem to assume that the sampling distribution of $\bar{Y}$ is approximately normal
– We can use the t distribution to approximate the distribution of sample means.
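The small-sample interval can be sketched the same way, substituting a t value for Z. The data below are hypothetical, and the critical value 2.228 is the t-table entry for 10 degrees of freedom at the 95% level, passed in by hand as it would be after a table lookup.

```python
from math import sqrt
from statistics import mean, stdev

def t_confidence_interval(sample, t_crit):
    """Interval y_bar +/- t * s / sqrt(n), with t looked up for n - 1 df."""
    n = len(sample)
    y_bar = mean(sample)
    s = stdev(sample)              # sample standard deviation (n - 1 divisor)
    margin = t_crit * s / sqrt(n)
    return y_bar - margin, y_bar + margin

# Hypothetical sample of 11 earnings observations, so 10 degrees of freedom.
sample = [280, 310, 295, 340, 305, 325, 290, 315, 300, 330, 310]

T_CRIT_95_DF10 = 2.228   # from the t table: 95% level, 10 df
Z_CRIT_95 = 1.96         # standard normal value, for comparison only

t_lo, t_hi = t_confidence_interval(sample, T_CRIT_95_DF10)
z_lo, z_hi = t_confidence_interval(sample, Z_CRIT_95)

print(round(t_lo, 1), round(t_hi, 1))
print(round(z_lo, 1), round(z_hi, 1))  # narrower: Z understates the uncertainty
```

Both intervals are centered on the same sample mean; only the multiplier on the standard error changes.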
Using the t distribution (2)
• We have to choose the confidence level (1 - α), which requires a choice of α
• The area between the two t values is the confidence level; the range between them is the confidence interval
[Figure: t distribution centered on µy, with the confidence interval between -t and +t]
• The usual accepted confidence level is 95% (α = 0.05)
Using the t distribution (3)
• If (1 - α) is the area between the two t values, then α is the sum of the areas under the two tails
– if (1 - α) = 0.95, then α = 0.05
– 0.05/2 = .025
– So for a 95% confidence level, 0.025 of the area of the curve is found in each tail of the distribution
The t Table
• In the first row, there is an upper number and a lower number
– The upper gives you the area in one tail given a two tail test
– The lower number gives the area in one tail or in two tails combined
• At an infinite number of observations, 2.5% of the area under the curve is found in each of the tails when our t statistic is 1.96 - it approximates the normal
• With 10 degrees of freedom (a sample size of 11), 95% of the area under the t distribution is between -2.228 and +2.228
– Note: the t has fatter tails than the standard normal
The t Table (2)
• For a small sample size, the t values corresponding to a 95% confidence interval are larger in absolute value than the Z values for the same interval
• Depending on 2 things we get a very different approximation of the confidence interval
– Sample size
– Whether or not we know the population value for σ
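To see how the t values shrink toward the Z value as the sample grows, the sketch below computes t critical values numerically instead of reading them from the table. It integrates the t density with Simpson's rule and inverts by bisection; this is an illustrative stand-in for a table lookup, and the function names are made up for the example.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist

def t_pdf(x, df):
    """Density of the t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(x, df, steps=2000):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps is even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(df, alpha=0.05):
    """t value leaving alpha/2 in each tail, found by bisection."""
    lo, hi = 0.0, 100.0
    target = 1 - alpha / 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_crit = NormalDist().inv_cdf(0.975)
for df in (5, 10, 30, 120):
    print(df, round(t_critical(df), 3), round(z_crit, 3))
```

Every t value printed is larger than the Z value, and the gap narrows as the degrees of freedom increase.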
Hypothesis Testing
• We want to ask:
– What is the probability that µy is equal to some value?
• Using hypothesis testing we can determine whether or not it’s plausible that µy equals a certain value
• We have two types of sample (approximate rules)
– Large: n > 30
– Small: n ≤ 30
Large Samples
• Large samples
– Doesn’t matter if the population distribution is skewed or normal
– Doesn’t matter if the population variance is known (use σ) or unknown (calculate s - the sample estimate of the standard deviation)
– Use the Z table
Small Samples
• Small samples
– If the population is normally distributed and the population variance is known, use the Z table
– If the population is normally distributed but the population variance is unknown, use the t distribution with n-1 degrees of freedom (calculate the sample variance as an estimate of the population variance).
– If the population is non-normally distributed, use neither the t nor the Z (I will never give you a case like this)
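The large-sample and small-sample rules above amount to a short decision procedure. The sketch below encodes them as stated, using the approximate n > 30 cutoff from the lecture; `choose_table` and its argument names are illustrative.

```python
def choose_table(n, population_normal, variance_known):
    """Return which table to use under the lecture's rules of thumb."""
    if n > 30:
        # Large sample: Z table regardless of shape or known variance.
        return "Z"
    if not population_normal:
        # Small sample from a non-normal population: neither table applies.
        return None
    if variance_known:
        return "Z"
    return f"t with {n - 1} degrees of freedom"

print(choose_table(100, population_normal=False, variance_known=False))
print(choose_table(11, population_normal=True, variance_known=True))
print(choose_table(11, population_normal=True, variance_known=False))
print(choose_table(11, population_normal=False, variance_known=False))
```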
Setting Up Hypotheses
• In hypothesis testing you set up a null hypothesis H0
• Under the null hypothesis µy will take a particular value
– Example: we can create a null such that
H0 : µy = 300
• Once we have a null hypothesis we can set up an alternative hypothesis H1
One- and Two-Tailed Tests
• We can represent this in the following graph:
[Figure: sampling distribution of $\bar{Y}$ under H0, centered on µy = 300, with standard error $s/\sqrt{n}$]
• One-tail tests
– We calculate the area in the right-hand tail if H1 : µy > 300
– We calculate the area in the left-hand tail if H1 : µy <300
• Two tail test:
– Find the area under both tails if H1 : µy ≠ 300
Intervals and Regions
• We also need to assign a significance level (or confidence level)
• For a two-tailed test we are looking to see if a value of 300 lies within the confidence interval
• With hypothesis tests we are creating an acceptance region bound by critical values
– Critical values are taken off the Z and t tables
– The regions in the tails are the critical regions
Intervals and Regions (2)
[Figure: distribution with a central acceptance region of area 1 - α between two critical values, and a critical region of area α/2 in each tail]
α is the significance level
• If you fail to reject the null, the Z or t statistic must fall in the acceptance region
• If you reject the null, the Z or t must fall in one of the critical regions
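The two ways of looking at a two-tailed test, checking whether the statistic falls in the acceptance region or whether the hypothesized value lies inside the confidence interval, give the same answer. The sketch below illustrates this with a Z test; the sample summary is hypothetical and `two_tailed_z_test` is an illustrative helper.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(y_bar, s, n, mu_0, alpha=0.05):
    """Return (Z statistic, in acceptance region?, mu_0 inside the interval?)."""
    se = s / sqrt(n)                               # standard error of y_bar
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)   # critical value
    z_stat = (y_bar - mu_0) / se
    in_acceptance_region = -z_crit <= z_stat <= z_crit
    ci_low, ci_high = y_bar - z_crit * se, y_bar + z_crit * se
    mu_0_in_interval = ci_low <= mu_0 <= ci_high
    return z_stat, in_acceptance_region, mu_0_in_interval

# Hypothetical sample summary (assumed): mean 320, s = 150, n = 100.
z300, accept300, inside300 = two_tailed_z_test(320.0, 150.0, 100, mu_0=300.0)
z400, accept400, inside400 = two_tailed_z_test(320.0, 150.0, 100, mu_0=400.0)

print(round(z300, 2), accept300, inside300)  # fail to reject H0: mu = 300
print(round(z400, 2), accept400, inside400)  # reject H0: mu = 400
```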
Types of Errors
• Type I errors
– Rejecting a hypothesis when it is in fact true
– Example: In the confidence interval example we constructed the confidence interval (254 ≤ µy ≤ 380). If the true pop. mean is 400 we can make H0 : µy = 400. In this case we’d falsely reject the null hypothesis!
• Type II errors
– Not rejecting a false hypothesis
– Example: if the true mean is 400 but we do not reject H0 : µy = 300, we would be failing to reject a false hypothesis
Types of Errors (2)
• Statisticians worry about Type I errors
– They choose a significance level that minimizes Type I errors
• To minimize Type I errors choose a small α, where α is the total area in both tails
– Thus the area in each tail is α/2
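A simulation can make the link between α and Type I errors concrete. In the sketch below the null hypothesis is true by construction, so every rejection is a Type I error. The population values, sample size and number of repetitions are hypothetical, the population is assumed normal with known σ, and `type_i_error_rate` is an illustrative name.

```python
import random
from math import sqrt
from statistics import NormalDist

def type_i_error_rate(alpha, mu_0=300.0, sigma=100.0, n=50, reps=5000, seed=1):
    """Share of two-tailed Z tests that reject H0 when H0 is actually true."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Draw a sample from a population whose mean really is mu_0.
        sample = [rng.gauss(mu_0, sigma) for _ in range(n)]
        y_bar = sum(sample) / n
        z_stat = (y_bar - mu_0) / (sigma / sqrt(n))
        if abs(z_stat) > z_crit:
            rejections += 1   # a true null was rejected: Type I error
    return rejections / reps

rate_05 = type_i_error_rate(alpha=0.05)
rate_01 = type_i_error_rate(alpha=0.01)
print(rate_05, rate_01)
```

The rejection rates should sit near the chosen α in each case, so a smaller α means fewer false rejections.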
Types of Errors (3)
• As α decreases, the likelihood of rejecting a true null hypothesis also decreases
• Most of the time α = 5% is used, and α/2 = 2.5%
• We can say that we do not reject or reject the null, but we can’t say that we accept the alternative!
• Examples:
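Hypothesis Testing in General
• Null (H0): $\mu_Y = \mu_0$
• Alternative (H1) and critical region, with $Z = \frac{\bar{Y} - \mu_0}{\sigma/\sqrt{n}}$:
– $\mu_Y > \mu_0$: right tail, $Z > Z_{\alpha}$
– $\mu_Y < \mu_0$: left tail, $Z < -Z_{\alpha}$
– $\mu_Y \neq \mu_0$: two tail, $|Z| > Z_{\alpha/2}$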
• If you are using the t instead, replace the Z’s with t’s
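The three cases can be gathered into one function. This is a sketch of the Z version only, with hypothetical inputs; the `alternative` labels ("greater", "less", "two-sided") are naming choices made for the example, not notation from the lecture.

```python
from math import sqrt
from statistics import NormalDist

def z_test(y_bar, mu_0, sigma, n, alternative, alpha=0.05):
    """Return (Z statistic, reject H0?) for the chosen alternative hypothesis."""
    z_stat = (y_bar - mu_0) / (sigma / sqrt(n))
    std_normal = NormalDist()
    if alternative == "greater":        # H1: mu > mu_0, right tail
        reject = z_stat > std_normal.inv_cdf(1 - alpha)
    elif alternative == "less":         # H1: mu < mu_0, left tail
        reject = z_stat < -std_normal.inv_cdf(1 - alpha)
    elif alternative == "two-sided":    # H1: mu != mu_0, both tails
        reject = abs(z_stat) > std_normal.inv_cdf(1 - alpha / 2)
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z_stat, reject

# Hypothetical inputs (assumed): sample mean 318, H0: mu = 300, sigma 100, n 100.
z, reject_right = z_test(318.0, 300.0, 100.0, 100, "greater")
_, reject_left = z_test(318.0, 300.0, 100.0, 100, "less")
_, reject_two = z_test(318.0, 300.0, 100.0, 100, "two-sided")
print(round(z, 2), reject_right, reject_left, reject_two)
```

With these numbers the same Z statistic leads to rejection in the right-tail test but not in the two-tail test, because the two-tail test splits α across both tails and so uses a larger critical value.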
Where are we now?
• So far we have learned about inference and testing hypotheses using assumptions about distributions
• Distributions
– We had samples and populations and used weights to make inferences about the population using sample statistics
– We assumed distributional forms such as the Z or t distributions
• Sampling distribution of the mean
– You should know the difference between $E(Y)$ and $E(\bar{Y})$
Where are we now? (2)
• BLUE: we’ll return to this in the next lecture
• Estimation and hypothesis testing
• We now return to the regression line and consider the estimators for a and b from:
$\hat{Y}_i = a + bX_i$
$e_i = Y_i - \hat{Y}_i$
• We have to consider the properties of the OLS estimator (BLUE), and how to construct hypothesis tests on the estimates of the parameters a and b
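As a reminder of what a and b look like in practice, the sketch below fits the line by ordinary least squares and computes the residuals. The data are hypothetical, and the formulas used (slope as the ratio of the cross-product sum to the sum of squared deviations of X, intercept from the sample means) are the standard textbook OLS expressions, which this slide does not restate.

```python
def ols_fit(x, y):
    """Return (a, b) for the fitted line Y_hat = a + b * X."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

# Hypothetical data, assumed for illustration only.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = ols_fit(x, y)
fitted = [a + b * xi for xi in x]                    # Y_hat_i = a + b * X_i
residuals = [yi - fi for yi, fi in zip(y, fitted)]   # e_i = Y_i - Y_hat_i

print(round(a, 2), round(b, 2))
print(round(sum(residuals), 10))  # OLS residuals sum to zero, up to rounding
```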