TRANSCRIPT
Econ 642, Monday March 24, class 1
Econ 642, Wednesday March 26, class 2
Robert de Jong
Department of Economics, Ohio State University
Robert de Jong Econ 642, Wednesday March 26, class 2
Outline
1 Econ 642, Monday March 24, class 1
  Model and Least squares principle
  Interpretation of coefficients
  Software, taking logarithms, and R2
  Inference and GRE example
  Model assumptions
Simple regression: estimates the model
$y_i = \beta_0 + \beta_1 x_i + u_i$
Multiple regression: estimates the model
$y_i = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik} + u_i$
Interpretation of $\beta_j$:
$\beta_j$ is the amount by which $E(y_i \mid x_{i1}, \dots, x_{ik})$ increases if $x_{ij}$ goes up by one unit, keeping all other variables constant
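Least squares principle - simple regression
Minimizing
$$\sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_i)\right)^2$$
over all possible values of $\beta_0$ and $\beta_1$ gives
$$\hat\beta_1 = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \cdot \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$
and
$$\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}.$$
Note $\bar{y} = n^{-1} \sum_{i=1}^{n} y_i$, the average of the $y_i$; $\bar{x}$ is defined in the same way
The mathematical calculation requires being able to find the minimum of a function of two variables using differentiation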
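As a rough numerical check of the closed-form slope and intercept expressions, the following Python sketch applies them to a small made-up data set and compares the result with NumPy's `polyfit`. The data values and variable names are assumptions chosen for illustration only; they do not come from the course.

```python
import numpy as np

# Hypothetical illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])
n = len(x)

# Closed-form least squares slope and intercept for simple regression.
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / (n * np.sum(x**2) - np.sum(x) ** 2)
b0 = y.mean() - b1 * x.mean()

# Cross-check with NumPy's degree-1 polynomial fit (returns slope first).
slope, intercept = np.polyfit(x, y, 1)

print(f"closed form: b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"np.polyfit:  b0 = {intercept:.4f}, b1 = {slope:.4f}")
```

Both routes should agree up to floating-point rounding, since `polyfit` with degree 1 solves the same minimization problem.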
Least squares principle - multiple regression
Minimize
$$\sum_{i=1}^{n} \left(y_i - (\beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik})\right)^2$$
This problem can be solved using matrix algebra
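One way to see the matrix algebra at work is the sketch below. It stacks a column of ones and the regressors into a matrix X and solves the least squares problem twice: with NumPy's `lstsq`, and via the normal equations X'X b = X'y. The data are hypothetical and built without an error term, an assumption made here so that the true coefficients are recovered exactly.

```python
import numpy as np

# Hypothetical data: 6 observations on 2 regressors (not from the lecture).
X_raw = np.array([
    [1.0, 2.0],
    [2.0, 1.0],
    [3.0, 4.0],
    [4.0, 3.0],
    [5.0, 6.0],
    [6.0, 5.0],
])
# y is built with no error term, so least squares should recover (1, 2, -0.5).
y = 1.0 + 2.0 * X_raw[:, 0] - 0.5 * X_raw[:, 1]

# Add the constant term as a column of ones.
X = np.column_stack([np.ones(len(y)), X_raw])

# Route 1: generic least squares solver.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Route 2: solve the normal equations X'X b = X'y directly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

print("lstsq:           ", np.round(beta_hat, 6))
print("normal equations:", np.round(beta_normal, 6))
```

In practice a solver such as `lstsq` is usually preferred over forming X'X explicitly, because it is numerically more stable when regressors are nearly collinear.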
Interpretation of coefficients
We obtain the regression line
$\hat{y} = \hat\beta_0 + \hat\beta_1 x$
$y_i$: demand for housing of individual # i, in dollars annually
$x_i$: income in dollars annually
$\hat\beta_1$ is how many additional dollars an individual is predicted to spend on housing when income increases by one dollar
$\hat\beta_0$ is how many dollars an individual is predicted to spend on housing when income equals zero
Example:
$y_i$ is the wage of individual $i$
$x_{i1}$ is nr. of years of education
$x_{i2}$ is nr. of years of work experience
$x_{i3}$ indicates male/female
Model:
$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + u_i$
$\beta_1$ is the effect on wage of a 1 unit increase in nr. of years of education, keeping nr. of years of work experience and gender constant
$\beta_3$ is the effect on wage of a 1 unit increase in the male/female variable, keeping nr. of years of work experience and nr. of years of education constant
Interesting question/hypothesis in the previous regression: $\beta_3 = 0$?
Often the interesting questions are answered by multivariate regression and not by simple regression
"Obvious" comment
$y = \beta_0 + \beta_1 x$
If $\beta_1 \approx 0$, then $x_i$ does not "influence" $y_i$
Earlier example: if $\beta_1 \approx 0$, then expenditure on housing is not "influenced" by one's income
Often, important questions correspond to a coefficient of 0
Does being a woman negatively impact earnings potential?
Is there a correlation between percentage of foreigners in a neighborhood and the crime rate?
Is there a relationship between number of cans of soda sold in a stadium and temperature?
Computer software: Eviews will calculate the regression coefficients for us
Other packages that can do this:
Excel, or other spreadsheet program
SAS, SPSS
Stata, Eviews
Creating logarithms of variables
Remember: log(x) is increasing, log(1) = 0; so if we find a positive slope, this still suggests a positive correlation between y and x
Often in econometrics, we use logarithms of y and x instead of y and x themselves
We call this a double-logarithmic specification instead of a regression in levels
Reasons:
1 Interpretation: elasticity
2 It works
Q: How much does the demand $q$ for pizza go up if income $i$ increases by 1%?
A: by the income elasticity of the demand for pizza
Mathematically:
$$\frac{\Delta q / q}{\Delta i / i}$$
Now
$$\frac{\Delta q / q}{\Delta i / i} \approx \frac{dq / q}{di / i}$$
and $d\log(q)/dq = 1/q$, so $d\log(q) \approx dq/q$
Conclusion:
$$\frac{\Delta q / q}{\Delta i / i} \approx \frac{d\log(q)}{d\log(i)}$$
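The link between the double-logarithmic slope and the elasticity can be checked numerically. The sketch below assumes a hypothetical constant-elasticity demand curve q = 0.05 * i^0.8 (the functional form, the constant 0.05, and the income values are all made up for illustration). It regresses log(q) on log(i), then compares the slope with the actual percentage change in q after a 1% rise in income.

```python
import numpy as np

# Hypothetical incomes and a constant-elasticity demand curve (assumptions).
income = np.array([20000.0, 30000.0, 45000.0, 60000.0, 90000.0])
elasticity_true = 0.8
quantity = 0.05 * income ** elasticity_true

# Slope of the regression of log(q) on log(i).
slope, intercept = np.polyfit(np.log(income), np.log(quantity), 1)

# Actual percentage change in q when the first income rises by 1%.
q_before = quantity[0]
q_after = 0.05 * (income[0] * 1.01) ** elasticity_true
pct_change = (q_after / q_before - 1.0) * 100.0

print(f"log-log slope: {slope:.4f}")
print(f"% change in q for a 1% rise in income: {pct_change:.4f}")
```

The two numbers are close but not identical, which reflects the approximation sign in the derivation: the log-log slope is an exact derivative, while a 1% change is a small but finite step.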
$R^2$, the coefficient of determination
A measure between 0 and 1 of how well the data fit the model
0: poor fit; 1: perfect fit
used for:
1 evaluating the quality of a regression
2 comparing models with different data sets and different functional form
$R^2$ is the fraction of the variation in $y_i$ that is explained by the model
Values you can expect for $R^2$: totally silly guidelines
1 Time series: often "high" values: 0.8 - 0.99 range
2 Cross-sections: often values in the 0.1 - 0.4 range
3 Panel data: often in the 0.3 - 0.7 range
What $R^2$ do we expect for:
1 Regression of national consumption on national income
2 Regression of cigarettes smoked on income
3 Wage on nr. of years of education and various other regressors
Issues with $R^2$:
1. Some regressions with low $R^2$ can still be interesting (smoking on income) while some regressions with high $R^2$ can be uninteresting (macro regression, e.g. exports on national income)
2. Adding variables to a regression never makes $R^2$ go down, and in practice almost always makes it go up (mathematical necessity), even if the added variables are irrelevant
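The second issue can be illustrated with a small simulation. The sketch below computes R^2 as one minus the ratio of the residual sum of squares to the total sum of squares, then adds a regressor that is pure noise by construction. The helper name `r_squared`, the seed, and the simulated data are assumptions made for this example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of a least squares regression of y on X plus a constant."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible
n = 100
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)
junk = rng.normal(size=n)  # unrelated to y by construction

r2_small = r_squared(x.reshape(-1, 1), y)
r2_big = r_squared(np.column_stack([x, junk]), y)

print(f"R^2 with x only:     {r2_small:.4f}")
print(f"R^2 with x and junk: {r2_big:.4f}")
```

The larger model cannot fit worse, because least squares is always free to set the coefficient on the extra regressor to zero.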
Importance of inference
How do we obtain statistical evidence on whether a coefficient equals zero?
(or any other interesting value)
Examples:
$y_i$: Apgar score; regressor: income
Question: does income affect Apgar score?
$y_i$: wage; regressors: nr. of years of education, type of profession, male/female, etc.
Question: does gender have an impact on compensation?
Econometrician's mindset
We have observed data: the $y_i$ and $x_i$.
We can calculate the regression line: calculate $\hat\beta_0$ and $\hat\beta_1$.
The data are generated as follows:
1 Start with $x_i$
2 We move to the value suggested by the true regression line: $\beta_0 + \beta_1 x_i$ (Note: no "hats" on the betas here!)
3 A random error $u_i$ is added: $y_i = \beta_0 + \beta_1 x_i + u_i$
Situation:
1 We have true but unknown coefficients $\beta_0$ and $\beta_1$
2 We try to approximate these coefficients using $\hat\beta_0$ and $\hat\beta_1$
Example:
$y_i$: GPA in college
$x_i$: GPA in high school
Question: what happens if $x_i$ has no impact on $y_i$? (this is a statistical hypothesis)
Answer: if $x_i$ has no impact on $y_i$, then $\beta_1 = 0$
NOTE: if $\beta_1 = 0$, then $\hat\beta_1$ will be positive or negative, but not exactly zero
However $\hat\beta_1$ will be close to 0 in a statistical sense if $\beta_1 = 0$
Question: what does this mean: $\hat\beta_1$ close to 0?
Testing the hypothesis $\beta_1 = 0$:
1. t-values
2. standard errors
3. p-values
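To make "close to 0 in a statistical sense" concrete, the sketch below simulates data in which the true slope is exactly zero and then estimates the regression. The sample size, seed, and error distribution are assumptions for illustration. The standard error uses the usual simple-regression formula under homoskedasticity: the square root of the residual variance divided by the sum of squared deviations of x.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the run is reproducible
n = 200
beta0_true, beta1_true = 2.0, 0.0  # x truly has no impact on y

x = rng.normal(size=n)
u = rng.normal(size=n)  # random error term
y = beta0_true + beta1_true * x + u

# Least squares estimates (np.polyfit returns the slope first).
b1_hat, b0_hat = np.polyfit(x, y, 1)

# Standard error of the slope under homoskedasticity.
resid = y - (b0_hat + b1_hat * x)
sigma2_hat = (resid @ resid) / (n - 2)
se_b1 = np.sqrt(sigma2_hat / np.sum((x - x.mean()) ** 2))

print(f"estimated slope: {b1_hat:.4f}")
print(f"standard error:  {se_b1:.4f}")
print(f"slope / standard error: {b1_hat / se_b1:.2f}")
```

The estimated slope is not exactly zero even though the true slope is. Whether it counts as "close" is judged relative to its standard error, not in absolute terms.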
GRE scores: are GRE and SAT biased against women and ethnic groups?
$$GRE_i = 172.4 + \underset{(10.9)}{39.7}\, G_i + \underset{(10.4)}{78.9}\, GPA_i + \underset{(0.071)}{0.203}\, SATM_i + \underset{(0.058)}{0.110}\, SATV_i$$
where
$GRE_i$ = score of $i$th student on test
$G_i$ = 1 if student is male, 0 otherwise
$GPA_i$ = GPA in economics classes
$SATM_i$ = score on SAT-mathematical
$SATV_i$ = score on SAT-verbal
Things to note on previous transparency
1 Dummy variable: a variable that takes on the values 0 and 1 only (male/female dummy)
  This means that if $G_i = 1$ - i.e. for a man - we predict a higher GRE score than if $G_i = 0$ (i.e. a woman)
2 Between parentheses: standard errors (we get those from Eviews output too!)
  Standard errors measure the statistical uncertainty in the coefficient
  cf. margin of error in the Bush-Kerry election
  95% confidence interval (likely values for the coefficient):
  coefficient plus or minus 1.96 times the standard error
Example: estimated GRE equation
coefficient for Gi (gender: 1 male, 0 female): 39.7
suggests 39.7 extra GRE points for males
standard error: 10.9
Note: 39.7 is more than 1.96 standard errors away from 0
Conclusion: the positive coefficient for $G_i$ is statistically "remote" from 0
Conclusion: we have statistical evidence that men score higher on the GRE, holding GPA and SAT scores constant
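The 95% confidence interval for the gender coefficient can be worked out from the two numbers on the slide (coefficient 39.7, standard error 10.9). The short sketch below does the arithmetic; the variable names are illustrative.

```python
# Values taken from the estimated GRE equation.
coef_g = 39.7  # coefficient on the male dummy G
se_g = 10.9    # its standard error
z = 1.96       # critical value for a 95% confidence interval

ci_lower = coef_g - z * se_g
ci_upper = coef_g + z * se_g
zero_inside = ci_lower <= 0.0 <= ci_upper

print(f"95% confidence interval: [{ci_lower:.3f}, {ci_upper:.3f}]")
print(f"zero inside the interval: {zero_inside}")
```

The interval runs from roughly 18.3 to 61.1 and excludes zero, which is another way of saying that the coefficient is more than 1.96 standard errors away from zero.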
Revisit GRE example:
$$GRE_i = 172.4 + \underset{(10.9)}{39.7}\, G_i + \underset{(10.4)}{78.9}\, GPA_i + \underset{(0.071)}{0.203}\, SATM_i + \underset{(0.058)}{0.110}\, SATV_i$$
where
$GRE_i$ = score of $i$th student on test
$G_i$ = 1 if student is male, 0 otherwise
$GPA_i$ = GPA in economics classes
$SATM_i$ = score on SAT-mathematical
$SATV_i$ = score on SAT-verbal
t-value: the number of standard errors that the coefficient is away from zero
Therefore,
$$t\text{-value} = \frac{\text{coefficient}}{\text{standard error}}$$
1 t-values are the third column in Eviews output
2 sometimes t-values instead of standard errors are reported between parentheses under coefficients
3 coefficient more than 1.96 standard errors away from zero ⇐⇒ absolute value of the t-value exceeds 1.96
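Applying the t-value formula to the GRE equation gives the sketch below. It assumes each standard error belongs to the coefficient printed directly above it on the slide (10.9 for G, 10.4 for GPA, 0.071 for SATM, 0.058 for SATV). No standard error is reported for the intercept, so it is left out.

```python
# (coefficient, standard error) pairs read off the estimated GRE equation.
# The pairing assumes each standard error sits under its own coefficient.
estimates = {
    "G": (39.7, 10.9),
    "GPA": (78.9, 10.4),
    "SATM": (0.203, 0.071),
    "SATV": (0.110, 0.058),
}

# t-value = coefficient / standard error.
t_values = {name: coef / se for name, (coef, se) in estimates.items()}

# Reject "coefficient equals zero" when |t| exceeds 1.96.
significant = {name: abs(t) > 1.96 for name, t in t_values.items()}

for name, t in t_values.items():
    print(f"{name:5s} t = {t:6.3f}  |t| > 1.96: {significant[name]}")
```

Under that pairing, the coefficients on G, GPA, and SATM are more than 1.96 standard errors from zero, while the coefficient on SATV falls just short at about 1.90.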
Theory of regression:
Model assumptions
⇒
Estimated coefficients are (approximately) normally distributed
⇒
Considering 1.96 standard errors is correct practice
1.96 standard error procedure corresponds to a test at the 95% confidence level
If we want to test at the 99% confidence level, we need 2.576 standard errors
If the absolute value of the t-value > 1.96, then we reject the null hypothesis that the coefficient equals zero
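The number of standard errors needed at a given confidence level is a quantile of the standard normal distribution. The sketch below computes it with Python's standard library; the helper name `two_sided_critical_value` is an assumption for this example. It relies on the normal approximation, which is what the 1.96 rule uses.

```python
from statistics import NormalDist

def two_sided_critical_value(confidence):
    """Number of standard errors for a two-sided test at this confidence level."""
    alpha = 1.0 - confidence
    # Put alpha/2 in each tail of the standard normal distribution.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

z95 = two_sided_critical_value(0.95)
z99 = two_sided_critical_value(0.99)

print(f"95% level: {z95:.3f} standard errors")
print(f"99% level: {z99:.3f} standard errors")
```

This gives about 1.960 at the 95% level and about 2.576 at the 99% level.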
Looking at 1.96 standard errors is only valid practice if the model assumptions are correct
Conclusion:
We need to understand the model assumptions
Model assumptions are of a probabilistic nature
The model assumptions
Model assumptions: statistical and mathematical assumptions that are needed to make our “two standard errors” procedure work
Model assumptions
1 $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + u$, where $\beta_0, \beta_1, \dots, \beta_k$ are the unknown parameters (constants) of interest, and $u$ is an unobservable random error or random disturbance term.
2 (random sampling) We have a random sample of $n$ observations from the above linear population model.
3 (zero conditional mean) $E(u \mid x_1, \dots, x_k) = 0$.
4 (no perfect collinearity) In the sample (and therefore in the population), none of the independent variables is constant, and there are no exact linear relationships among the independent variables.
5 (homoskedasticity) $\mathrm{Var}(u \mid x_1, \dots, x_k) = \sigma^2$.
1. The regression model is linear in the coefficients, is correctly specified, and has an additive error term.
This means:
1 No "relevant" omitted variables
2 No nonlinearity
Implication: Regression of unemployment on minimum wage is invalid if the impact of a $5 to $7 change in minimum wage is bigger than the impact of a $12 to $14 change
Q: what is "relevant"?
2. Random sampling assumption: no autocorrelation
Two cases:
A. Cross-section: it seems unlikely that individuals in a cross-section influence each other's behavior
Conclusion: This assumption is not very strict for a cross-section
B. Time series: this assumption can be problematic
Example: National consumption vs. national income: both time series are pretty well predictable from past values
3. All explanatory variables are uncorrelated with the error term.
This rules out that yi impacts xi inappropriately
Endogeneity: xi cannot be assumed given for yi
Examples: is endogeneity an issue in a regression of:
1 Soda sales on temperature?
2 Pizza consumption on income?
3 Alcohol use on index of marital happiness?
4 GDP on index of corruption in third world nations?
5 National consumption on national income?
6 Wage on male/female dummy?
4. No explanatory variable is a perfect linear function of any other explanatory variable(s).
This is called: no perfect multicollinearity
Rules out, e.g.
1 a regression of wage on a constant, "education years", and "education years" (identical regressors)
2 a regression of wage on a constant, "education years", and "education months", if the education months variable is 12 times education years
3 a regression of wage on a constant, a "female" dummy variable, and a "male" dummy variable
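The third case can be checked by looking at the rank of the regressor matrix. In the sketch below, with made-up dummy values for six hypothetical individuals, the constant equals the sum of the female and male dummies. The three columns therefore span only two dimensions, and the least squares coefficients are not uniquely determined.

```python
import numpy as np

# Hypothetical gender dummies for six individuals (illustrative values only).
female = np.array([1.0, 0.0, 1.0, 1.0, 0.0, 0.0])
male = 1.0 - female
const = np.ones_like(female)

# Constant + female + male: const = female + male, an exact linear relationship.
X_trap = np.column_stack([const, female, male])
# Dropping one dummy removes the exact linear relationship.
X_ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print(f"columns: {X_trap.shape[1]}, rank: {rank_trap}")
print(f"columns: {X_ok.shape[1]}, rank: {rank_ok}")
```

Dropping either dummy (or the constant) restores full column rank. The coefficient on the remaining dummy then measures the difference relative to the omitted group.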
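5. The error term has a constant variance.
This is called homoskedasticity; non-constant variance of the error term is called heteroskedasticity
Classical situation where this can fail:
Regression of housing expenditure on income: high-income households have more room to vary their housing spending, so the error variance tends to grow with income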
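A simulation can show what heteroskedasticity looks like in the housing example. The sketch below assumes, purely for illustration, that the standard deviation of the error is proportional to income. All numbers (the income range, the coefficients, the 0.05 factor, the seed) are made up. After fitting the regression, it compares the residual variance in the upper and lower halves of the income distribution.

```python
import numpy as np

rng = np.random.default_rng(1)  # fixed seed so the run is reproducible
n = 2000

# Hypothetical annual incomes in dollars.
income = rng.uniform(20_000, 120_000, size=n)
# Assumption: the error standard deviation is proportional to income.
u = rng.normal(size=n) * 0.05 * income
housing = 3000.0 + 0.25 * income + u

# Fit the regression of housing expenditure on income (slope is returned first).
b1, b0 = np.polyfit(income, housing, 1)
resid = housing - (b0 + b1 * income)

# Compare residual variance for low-income and high-income households.
median_income = np.median(income)
var_low = resid[income < median_income].var()
var_high = resid[income >= median_income].var()
var_ratio = var_high / var_low

print(f"residual variance ratio (high income / low income): {var_ratio:.2f}")
```

Under homoskedasticity this ratio would be close to 1. A ratio well above 1, as this setup produces, is the pattern the constant-variance assumption rules out.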