Welcome to Econ 420 Applied Regression Analysis Study Guide Week Six

Upload: loraine-caitlin-armstrong

Post on 03-Jan-2016



The F-Test of Overall Significance of the Equation

Testing to see if, in general, our equation is any good at all

Step 1: State the null and alternative hypotheses.

Step 2: Choose the level of significance; find the critical F (pages 316-319; d.f. of numerator = k and d.f. of denominator = n-k-1); state the decision rule.

Step 3: Estimate the regression; find the F-stat (formula on page 56; EViews calculates the F-stat automatically).

Step 4: Apply the decision rule.
• If F-stat > critical F, reject the null hypothesis
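The four steps can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's or EViews' procedure: it assumes the standard formula F = (ESS/k) / (RSS/(n-k-1)), which is presumed (not verified) to match the formula on page 56, and it uses made-up sums of squares. The critical value is typed in by hand from an F table, as in Step 2.

```python
def f_statistic(ess, rss, n, k):
    """Overall F-stat: (ESS / k) / (RSS / (n - k - 1)).

    ess = explained sum of squares, rss = residual sum of squares,
    n = number of observations, k = number of independent variables.
    """
    df_num = k
    df_den = n - k - 1
    if df_den <= 0:
        raise ValueError("need n > k + 1 for the F-test")
    return (ess / df_num) / (rss / df_den)


def reject_null(f_stat, critical_f):
    """Decision rule from Step 4: reject H0 if F-stat > critical F."""
    return f_stat > critical_f


# Hypothetical numbers, chosen only for illustration.
ess, rss, n, k = 70.0, 30.0, 23, 2
f_stat = f_statistic(ess, rss, n, k)

# 5% critical F for 2 and 20 degrees of freedom, read from an F table.
critical_f = 3.49

print(round(f_stat, 2), reject_null(f_stat, critical_f))
```

Here the null hypothesis is that all slope coefficients are zero; with these made-up numbers the F-stat is far above the critical value, so the null is rejected.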

The overall fit of the estimated model

• Graph of total, explained, and residual sums of squares

• TSS = RSS + ESS
• Divide both sides by TSS
• 1 = RSS/TSS + ESS/TSS
• The coefficient of determination (R2)
• R2 = ESS/TSS, or R2 = 1 – RSS/TSS
• Definition: Percentage of the total variation of the dependent variable around its mean that is explained by the independent variables

R2

• R2 = 1 – RSS/TSS
• The smaller the sum of squared residuals, the _______ the R2

• Under what condition is R2 = 1?
• Under what condition is R2 = 0?
• In the presence of an intercept, 0 ≤ R2 ≤ 1
• Suppose we got an R2 = 0.7. What does this number mean?
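The decomposition TSS = RSS + ESS can be checked numerically. The sketch below fits a one-variable regression with an intercept by OLS and computes the three sums of squares; the height and weight numbers are invented for illustration, and the function names are hypothetical.

```python
def simple_ols(x, y):
    """OLS intercept and slope for y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def sums_of_squares(x, y):
    """Return (TSS, ESS, RSS) for the fitted one-variable regression."""
    b0, b1 = simple_ols(x, y)
    y_bar = sum(y) / len(y)
    fitted = [b0 + b1 * xi for xi in x]
    tss = sum((yi - y_bar) ** 2 for yi in y)            # total
    ess = sum((fi - y_bar) ** 2 for fi in fitted)       # explained
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual
    return tss, ess, rss


# Made-up data: height in inches, weight in pounds.
height = [60, 62, 64, 66, 68, 70, 72]
weight = [115, 120, 135, 140, 155, 160, 180]

tss, ess, rss = sums_of_squares(height, weight)
r2 = ess / tss
print(tss, ess + rss, r2, 1 - rss / tss)
```

Both ways of computing R2 agree because the model includes an intercept; without one, the decomposition need not hold.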

Problem of R2

• Remember our height-weight example
• Suppose R2 = 0.7
• Now suppose we add another independent variable to our model: pairs of shoes each individual owns

• Does R2 go up?
– Maybe
• Should it go up?
– No

Problem: The addition of an irrelevant variable never decreases R2

• Why?
1. If there is no correlation between the added variable and the dependent variable, then the estimated coefficient will be zero and RSS does not change
2. Sometimes the addition of an irrelevant independent variable to the model increases R2
• Why?
• There may (accidentally) be a correlation between weight and pairs of shoes. This diminishes the sum of squared residuals
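A small numerical check of this point, using invented data: weight is regressed on height alone, then on height plus an arbitrary "pairs of shoes" variable. The sketch works with deviations from means (which accounts for the intercept) and the closed-form OLS solution for two regressors; all names and numbers are hypothetical.

```python
def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def r2_one(x1, y):
    """R2 from regressing y on x1 (with intercept)."""
    x1d, yd = demean(x1), demean(y)
    b1 = dot(x1d, yd) / dot(x1d, x1d)
    rss = sum((yi - b1 * xi) ** 2 for xi, yi in zip(x1d, yd))
    return 1 - rss / dot(yd, yd)


def r2_two(x1, x2, y):
    """R2 from regressing y on x1 and x2 (with intercept)."""
    x1d, x2d, yd = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(x1d, x1d), dot(x2d, x2d), dot(x1d, x2d)
    s1y, s2y = dot(x1d, yd), dot(x2d, yd)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are perfectly collinear
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    rss = sum((yi - b1 * a - b2 * b) ** 2 for a, b, yi in zip(x1d, x2d, yd))
    return 1 - rss / dot(yd, yd)


# Made-up data; "shoes" is an arbitrary column with no intended meaning.
height = [60, 62, 64, 66, 68, 70, 72]
weight = [115, 120, 135, 140, 155, 160, 180]
shoes = [3, 8, 2, 10, 4, 6, 5]

r2_small = r2_one(height, weight)
r2_big = r2_two(height, shoes, weight)
print(r2_small, r2_big)
```

Because OLS picks the coefficients that minimize RSS, the larger model can always reproduce the smaller one by setting the shoes coefficient to zero, so its R2 cannot be lower.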

R Bar Squared (adjusts R squared for degrees of freedom)

Adjusted R Squared

• R bar squared = 1 – (1 – R2) [(n-1)/(n-k-1)]

• As k goes up, what happens to R bar squared?
1. The sum of squared residuals may go down.
– What does this do to R bar squared?
– R bar squared may go up
2. (n-k-1) goes down, so the term in the bracket goes up
– R bar squared goes down
– R bar squared goes up if the first effect is stronger than the second effect.
– This is more likely to happen if the added independent variable is a relevant variable

• Note: A high R2 or R bar squared is not the only sign of a good fit.

• EViews reports both R2 and R bar squared
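The two opposing effects can be seen with a quick calculation. The sketch below assumes the standard degrees-of-freedom adjustment, R bar squared = 1 - (1 - R2)(n-1)/(n-k-1); the R2 values are hypothetical, chosen so that adding a variable raises R2 only slightly.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R squared: 1 - (1 - R2) * (n - 1) / (n - k - 1).

    n = number of observations, k = number of independent variables.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


n = 20

# Hypothetical: one regressor gives R2 = 0.700; adding a second
# (irrelevant) regressor nudges R2 up to 0.702.
before = adjusted_r2(0.700, n, k=1)
after = adjusted_r2(0.702, n, k=2)

print(round(before, 4), round(after, 4))
```

In this made-up case R2 rises but R bar squared falls, because the small drop in the sum of squared residuals does not offset the lost degree of freedom.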

Steps in Applied Regression Analysis (Chapter 4)

1. Identify the question

2. Review the literature

a) Theoretical literature will help you to
• Specify the model
• Dependent and independent variables
• Real/nominal variables
• Omitted variables
• Extra variables
• Functional form
• Hypothesize the expected signs of coefficients
• A perfect but useless regression (cause and effect rather than equality)

Effects of Omitted Variables

• Example

• True equation is Y = f (X1,X2)

– Where
– Y = GPA

– X1 = hours of study

– X2 = IQ score

• We fail to include X2 in our model

• Does this violate any assumptions?
– Go back and study the assumptions to answer this question
• Violates assumption 1. Why?
• May violate assumption 3. Why?

Effects of Omitted Variables

• What if X1 and X2 are correlated?

– Does this violate any assumptions?

• OLS is not BLUE

• The estimated coefficient of X1 (that is, B^1) is biased

• Bias depends on the correlation between X1 and X2 and on the coefficient of X2 in the true regression equation.

Direction of Bias

The sign (direction) of Bias

• Bias is zero either
1. if X2 does not affect Y (Bomitted is zero), or
2. if X2 is not correlated with X1

• How do you expect IQ (X2) to affect GPA (Y)?

• How are IQ (X2) and Hours of study (X1) correlated?

• What is the direction of bias in our example?
– Will B^1 be bigger or smaller than it actually should be?
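One way to reason about the direction of bias is the in-sample identity: (coefficient on X1 with X2 omitted) = (coefficient on X1 with X2 included) + (coefficient on X2) × (slope from regressing X2 on X1). The sketch below checks this with invented GPA, hours, and IQ numbers. The data are hypothetical and are not meant as the answer to the questions above; they only show how the two ingredients combine.

```python
def demean(v):
    m = sum(v) / len(v)
    return [vi - m for vi in v]


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def ols_two(x1, x2, y):
    """Slopes (b1, b2) from regressing y on x1 and x2 with an intercept."""
    x1d, x2d, yd = demean(x1), demean(x2), demean(y)
    s11, s22, s12 = dot(x1d, x1d), dot(x2d, x2d), dot(x1d, x2d)
    s1y, s2y = dot(x1d, yd), dot(x2d, yd)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


def ols_one(x, y):
    """Slope from regressing y on x with an intercept."""
    xd, yd = demean(x), demean(y)
    return dot(xd, yd) / dot(xd, xd)


# Made-up data for illustration only.
hours = [5, 10, 12, 15, 20, 22, 25, 30]
iq = [95, 100, 105, 102, 110, 115, 112, 120]
gpa = [2.0, 2.4, 2.7, 2.8, 3.2, 3.5, 3.4, 3.9]

b1_full, b2_full = ols_two(hours, iq, gpa)  # IQ included
b1_short = ols_one(hours, gpa)              # IQ omitted
a1 = ols_one(hours, iq)                     # how IQ moves with hours

# The gap between the two estimates of the hours coefficient
# equals b2_full * a1.
print(b1_short - b1_full, b2_full * a1)
```

The sign of the gap is the sign of the omitted variable's coefficient times the sign of its relationship with the included variable, which matches the two conditions under which the bias is zero.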

The Variance of the Estimated Coefficient

• Fact:
– When we omit a relevant independent variable that is correlated with the other independent variables, the variance of the estimated coefficients of the included independent variables goes down, so the t-statistic goes up and the t-test may yield a significant coefficient when it should not

When should we suspect the omitted variable problem?

1. The adjusted R squared is low

2. The magnitude or the sign of the estimated coefficients is not as expected

3. The unimportant variables end up being highly significant

Correction for Omitted Variables

• Study the theoretical literature again

• Include the omitted variable based on the expected bias analysis

Irrelevant Variable Problem

• Suppose the true regression model: GPA = f (Hours of study), but

• Our version of the true model: GPA = f (hours of study, and weight of the person)

• Does our model violate assumption 1?
• Are any other assumptions violated?
• Is our estimator biased?
– Not necessarily: if the expected value of the error term is zero, the expected value of Bhat on hours of study = B
• Does our estimator have the minimum variance?
• No, our estimator does not have the smallest variance (not the most efficient)
• How does this affect the t-test?
– The variance of the estimated coefficient of hours of study goes up, so the t-statistic goes down and the t-test may not yield a significant coefficient on hours of study when it should.
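The effect on the variance can be sketched with the usual two-regressor result under the classical assumptions: Var(Bhat1) = sigma² / [S11 (1 - r12²)], where S11 is the sum of squared deviations of X1 and r12 is the sample correlation between X1 and X2. With X2 left out, the (1 - r12²) term drops away. The numbers below are hypothetical, and the error variance and coefficient are held fixed purely for illustration.

```python
import math


def var_b1(sigma2, s11, r12=0.0):
    """Variance of the X1 coefficient.

    sigma2 = error variance, s11 = sum of squared deviations of X1,
    r12 = sample correlation between X1 and the other regressor
    (use 0.0 when X1 is the only regressor).
    """
    return sigma2 / (s11 * (1 - r12 ** 2))


# Hypothetical values, for illustration only.
sigma2, s11 = 4.0, 100.0
b1 = 0.5    # assumed coefficient on hours of study
r12 = 0.6   # assumed correlation between hours and weight

var_alone = var_b1(sigma2, s11)             # hours only
var_with_extra = var_b1(sigma2, s11, r12)   # hours plus weight

t_alone = b1 / math.sqrt(var_alone)
t_with_extra = b1 / math.sqrt(var_with_extra)

print(var_alone, var_with_extra, t_alone, t_with_extra)
```

With these made-up values the variance rises from 0.04 to 0.0625 and the t-statistic falls from 2.5 to 2.0; if the extra variable were uncorrelated with hours (r12 = 0), this formula would give no increase.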

 

Should we include X in the set of our independent variables?

• Yes, if
1. Theory calls for its inclusion (the most important criterion)

2. t-test: the estimated coefficient of X is significant in the right direction (Note: this does not mean that if the estimated coefficient is insignificant you have to drop the variable from your model.)

3. As you include X, the adjusted R squared goes up.

4. As you include X, the other variables’ coefficients change significantly.

b) Empirical literature will help you to

• See what others have done

• Their variables

• Their functional forms

• Their data sets

• Their findings

3. Choose a sample & collect data

• Cross Sectional/ Time Series

• Degrees of freedom

4. Estimate and evaluate the equation

a) Overall quality of estimation
• Adjusted R squared
• F-test

b) Test your hypotheses

5. Document the results

• Predictions

• Policy recommendations

Assignment 5 (5 questions for 10 points each, total = 50 points)
Due: before 10 PM on Friday, October 5

1. Use the data set in the dvd4 file to
• run an F-test of the overall significance of the equation.
• test the significance of all of the estimated coefficients at the 1% level. Make sure not to skip any of the 4 steps in hypothesis testing. Attach your EViews output.
• construct a 95% confidence interval for the coefficient on income.

Assignment 5 (continued)

2. #17, Page 63

3. #4, PP 81-82

4. #5, Page 82

5. #6, Page 83