![Page 1: Linear Regression Models Powerful modeling technique Tease out relationships between “independent” variables and 1 “dependent” variable Models not perfect…need](https://reader035.vdocuments.site/reader035/viewer/2022062407/56649d7e5503460f94a61e70/html5/thumbnails/1.jpg)
Linear Regression Models

- Powerful modeling technique.
- Tease out relationships between “independent” variables and one “dependent” variable.
- Models are not perfect, so they need an error term: measurement errors, wrong model, omitted variables, inherent randomness.
- Linear models are often misused.
Example: Lake Water Quality

- Chlorophyll-a (C) is a widely used indicator: a measure of eutrophication.
- Nitrogen (N) is associated with eutrophication.
- Q: Golf course development. Nitrogen is expected to increase. By how much will C increase or decrease?
- How should we proceed?
Plot C vs. N

[Figure: Chlorophyll (vertical axis, 0 to 150) plotted against Nitrogen (horizontal axis, 5 to 25).]
A “Better” Model

- Explain the (single) regression line (model?).
- The negative relationship suggests a problem.
- Omitted variable: Phosphorus (P).
- Want to tease out the effects of N and P separately.
- Write a multiple linear regression model:

$$C_i = \beta_0 + \beta_1 P_i + \beta_2 N_i + \varepsilon_i$$

- The model is designed to “tease out” the effect of N and the effect of P, separately, on C.
- (**) Define and interpret variables, parameters.
Estimation

- Use data to estimate the parameter values that give the “best fit”: b0 = -9.4, b1 = 0.3, b2 = 1.2.
- Answer: a one-unit increase in N results in about a 1.2-unit increase in C, holding P constant.
- Importance: omitting phosphorus from the model introduced significant bias!
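The omitted-variable effect described here can be reproduced on made-up data. The sketch below is only an illustration: the data are simulated, the "true" coefficients simply borrow the estimates quoted above, and the strength of the N/P correlation is an assumption chosen so that the effect is visible.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Synthetic lake data (simulated for illustration, not the lecture's data set).
P = rng.uniform(10, 60, n)                    # phosphorus
N = 15 - 0.1 * P + rng.normal(0, 0.5, n)      # nitrogen, assumed negatively correlated with P
C = -9.4 + 0.3 * P + 1.2 * N + rng.normal(0, 1, n)

def ols(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = ols([N], C)      # P omitted: coefficients are [intercept, N]
full = ols([P, N], C)    # P included: coefficients are [intercept, P, N]

print("N coefficient with P omitted: ", round(short[1], 2))
print("N coefficient with P included:", round(full[2], 2))
```

In this simulation the regression that omits P attributes part of the phosphorus effect to nitrogen, so the N coefficient comes out with the wrong sign, while the two-regressor fit recovers a value close to the one used to generate the data.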
Question: US Gas Consumption

- Gasoline consumption produces many negative byproducts.
- Policy may be directed at increasing the price of gas to reduce consumption.
- But what is the effect of a price change?
- Question: What is the price elasticity of demand for gasoline in the U.S.?
Some Gasoline Data

[Figure: two panels. First panel: G.POP (vertical axis, 0.7 to 1.2) against YEAR (1962 to 1992). Second panel: G.POP (vertical axis, 0.7 to 1.2) against PG (horizontal axis, 0.6 to 4.1).]
Gas Data Cont’d

- Gas consumption increases through time, but that plot carries no information about price.
- The next plot shows a positive (+) relationship between gas price and gas consumption.
- Note that this is the opposite of a demand curve. Something is wrong here…
- Just as in the eutrophication problem, we may have omitted important variables.
- May have other problems, too.
The OLS “Estimator”

- Estimator: a rule or strategy for using data to estimate an unknown parameter. Defined before the data are drawn.
- The Ordinary Least Squares (OLS) estimator finds the parameter values that minimize the sum of squared deviations (see the C vs. N plot).
- Several assumptions must hold for the OLS estimator to apply to a model.
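As a minimal sketch of what "minimizes the sum of squared deviations" means in practice, the snippet below fits a line to simulated data with NumPy's least-squares solver and checks that moving the parameters away from the OLS solution increases the criterion. The data and the helper names (`sum_squared_deviations`, `ols_fit`) are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def sum_squared_deviations(beta, X, y):
    """Sum of squared vertical distances between the data and the fitted line."""
    resid = y - X @ beta
    return float(resid @ resid)

def ols_fit(X, y):
    """Parameter values that minimize the sum of squared deviations."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)                       # simulated regressor
y = 2.0 + 0.5 * x + rng.normal(0, 1, 50)         # simulated response
X = np.column_stack([np.ones_like(x), x])        # intercept column plus x

beta_hat = ols_fit(X, y)
best = sum_squared_deviations(beta_hat, X, y)

# Any other parameter values give a larger sum of squared deviations.
nudged = sum_squared_deviations(beta_hat + np.array([0.1, -0.05]), X, y)
print(beta_hat, best, nudged)
```

The same estimate also solves the normal equations `X.T @ X @ beta = X.T @ y`, which is the closed form usually quoted for OLS.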
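Linear Model

- The model must be linear.
- Linear in parameters, not in variables.
  - Difference between parameter, variable.
- Examples:

$$Y_t = \alpha + \beta X_t + \varepsilon_t$$

$$Y_t = \alpha + \beta X_t^2 + \varepsilon_t$$

$$Y_t = \alpha + \beta \log(X_t) + \gamma Z_t^3 + \varepsilon_t$$

$$R_{t+1} = S_t \, e^{\beta (1 - S_t)} \, \varepsilon_t$$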
Transforming Models

- The previous “Ricker” model is nonlinear (in the parameter).
- Sometimes, can transform the model so that it is linear.
- When plotted, the graph is nonlinear.
- Take the log of both sides, giving:

$$\log(R_{t+1}) = \log(S_t) + \beta (1 - S_t) + \log(\varepsilon_t)$$
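A minimal sketch of why the log transform helps, assuming the Ricker form R[t+1] = S[t] · exp(β(1 − S[t])) · ε[t] with a multiplicative error (one reading of the equation above). The data are simulated and `beta_true` is a hypothetical value. After taking logs, log(R) − log(S) is linear in β, so a no-intercept least-squares fit recovers it.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
beta_true = 0.8                               # hypothetical parameter value
S = rng.uniform(0.2, 2.0, n)                  # simulated stock sizes
eps = np.exp(rng.normal(0, 0.1, n))           # multiplicative (log-normal) error
R_next = S * np.exp(beta_true * (1 - S)) * eps

# Logged model: log(R_next) - log(S) = beta * (1 - S) + log(eps).
# This is linear in beta, so OLS without an intercept applies.
y = np.log(R_next) - np.log(S)
x = 1 - S
beta_hat = (x @ y) / (x @ x)                  # no-intercept least-squares slope
print(round(beta_hat, 3))
```

The term log(S) has a known coefficient of one, which is why it is moved to the left-hand side before fitting rather than estimated.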
CLRM (Classical Linear Regression Model): Assumption 1

- The dependent variable (Y) is a function of a specific set of independent variables (X’s).
  - Linear in parameters
  - Additive error
  - Coefficients are constant but unknown
- Violations are called “specification errors”, e.g.
  - Wrong regressors (a.k.a. independent variables; X’s)
  - Nonlinearity
  - Changing parameters (e.g. through time)
CLRM: Assumption 2

- Disturbances ($\varepsilon_i$’s) are independently and identically distributed ~ $(0, \sigma^2)$.
  - Typically we assume $\varepsilon_i \sim N(0, \sigma^2)$
  - Mean = 0
  - Constant variance, $\sigma^2$ (but unknown)
  - Errors uncorrelated with one another
- Examples of violations:
  - Measurement bias (seep gas flux)
  - Heteroskedasticity (variance differs)
  - Autocorrelated errors (disturbances correlated)
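To make "errors uncorrelated with one another" concrete, the sketch below simulates one disturbance series that satisfies the assumption and one that violates it through first-order autocorrelation, then compares their lag-1 sample correlations. The autocorrelation value `rho` and the helper `lag1_correlation` are assumptions made for the illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 2000

iid = rng.normal(0, 1, n)          # independent disturbances (assumption holds)

rho = 0.8                          # hypothetical autocorrelation strength
shocks = rng.normal(0, 1, n)
ar1 = np.empty(n)
ar1[0] = shocks[0]
for t in range(1, n):
    # Each disturbance carries over part of the previous one.
    ar1[t] = rho * ar1[t - 1] + shocks[t]

def lag1_correlation(e):
    """Sample correlation between each disturbance and the one before it."""
    return float(np.corrcoef(e[:-1], e[1:])[0, 1])

print("iid:", round(lag1_correlation(iid), 2))
print("AR(1):", round(lag1_correlation(ar1), 2))
```

In applied work the same check is run on the residuals of a fitted model, since the true disturbances are not observed.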
CLRM: Assumption 3

- It is possible to repeat the sample with the same independent variables.
  - If we had the same levels of the explanatory variables, would it be possible to generate the same value of Y?
- Common violations:
  - Errors in variables: measurement error in X.
  - Autoregression: when a lagged dependent variable should be an independent variable.
  - Simultaneous equations: several relationships act jointly.
Properties of Estimators

- Estimators have many properties. “6” is an estimator, but not a very good one.
- Two main properties we care about:
  - Unbiased: the expected value of the estimator equals the quantity it is estimating (expected error is 0).
  - Efficient: small variance (spread).
- “6” is biased, but has a very small variance (zero).
- Under the CLRM assumptions, the OLS estimator is unbiased and has minimum variance among all linear unbiased estimators.
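The contrast between the constant estimator “6” and OLS can be sketched with a small simulation. Everything here is an assumption for illustration: the true slope `beta_true`, the noise level, and the number of repeated samples.

```python
import numpy as np

rng = np.random.default_rng(4)
beta_true = 2.0                    # hypothetical true slope
x = np.linspace(0, 10, 30)         # fixed regressor values, reused in every sample
n_samples = 5000

ols_estimates = np.empty(n_samples)
for k in range(n_samples):
    # Draw a fresh sample of disturbances and re-estimate the slope.
    y = 1.0 + beta_true * x + rng.normal(0, 3, x.size)
    slope, intercept = np.polyfit(x, y, 1)
    ols_estimates[k] = slope

six_estimates = np.full(n_samples, 6.0)   # the estimator that always answers 6

print("OLS: mean", round(ols_estimates.mean(), 3), "variance", round(ols_estimates.var(), 4))
print('"6": mean', six_estimates.mean(), "variance", six_estimates.var())
```

Across repeated samples the OLS slopes average out close to the true value while varying from sample to sample; the constant estimator never varies but is off by the same amount every time.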
Correlation vs. Causation

- Now we know just enough to be dangerous!
  - Can estimate how any set of variables affects some other variable. Very powerful.
- Problem is: correlation doesn’t imply causation! This is why data mining is bad.
  - Example: chicken production and global CO2.
  - The relationship may be “spurious” (no underlying relationship).
- Difficult to tease out statistically.
  - “Granger Causality”
Violations & Consequences

| Problem | Consequences |
| --- | --- |
| Autocorrelation | Unbiased, wrong inference |
| Heteroskedasticity | Unbiased, wrong inference |
| Contemporaneous correlation (X, $\varepsilon$ correlated) | Biased |
| Multicollinearity | Usually OK |
| Omitted variables | Biased |
| Included (irrelevant) regressors | Unbiased, extra noise |
| True model nonlinear | Biased, wrong inference |
Guide to Model Specification

1. Start with theory to generate model
2. Check assumptions of CLRM
3. Collect and plot data
4. Estimate model, test restrictions
   - Possibly perform Box-Cox transform
5. Check R2, and “Adjusted R2”
6. Plot residuals: look for patterns
7. Seek explanations for patterns
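For step 5, a minimal sketch of how R2 and adjusted R2 are computed, using the standard definitions. The helper name `r_squared` and the simulated data are assumptions. The example adds a regressor that is pure noise to show why the adjusted version is worth checking.

```python
import numpy as np

def r_squared(X, y):
    """Return (R2, adjusted R2) for an OLS fit of y on X; X must include the intercept column."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    ssr = resid @ resid                      # unexplained variation
    sst = ((y - y.mean()) ** 2).sum()        # total variation
    n, p = X.shape                           # p counts the intercept
    r2 = 1 - ssr / sst
    adj = 1 - (1 - r2) * (n - 1) / (n - p)   # penalizes extra parameters
    return r2, adj

rng = np.random.default_rng(8)
n = 60
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)       # simulated data
noise_regressor = rng.normal(size=n)         # unrelated to y by construction

X_small = np.column_stack([np.ones(n), x])
X_big = np.column_stack([X_small, noise_regressor])
r2_small, adj_small = r_squared(X_small, y)
r2_big, adj_big = r_squared(X_big, y)
print(r2_small, adj_small)
print(r2_big, adj_big)
```

R2 cannot fall when a regressor is added, even a meaningless one, whereas adjusted R2 rises only if the new regressor reduces the residual variance by enough to offset the lost degree of freedom.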
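What’s a Residual?

General form of linear model:

$$Y_i = \alpha + \beta X_i + \varepsilon_i \quad \text{(true)}$$

$$\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i \quad \text{(predicted)}$$

$$\hat{\varepsilon}_i = Y_i - \hat{Y}_i \quad \text{("residual")}$$

Graphically on board.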
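The distinction between true, predicted, and residual values can be sketched numerically. The data below are simulated and the "true" line is a hypothetical choice; only the observed `y` and the fitted values would be available in practice.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(0, 10, 40)                     # simulated regressor
y = 3.0 + 1.5 * x + rng.normal(0, 2, 40)       # hypothetical true line plus disturbance

X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ coef            # predicted values from the fitted line
residuals = y - y_hat       # residual = observed minus predicted

# With an intercept in the model, OLS residuals sum to (numerically) zero
# and are uncorrelated with the regressor.
print(residuals.sum(), residuals @ x)
```

The residuals are estimates of the unobserved disturbances, which is why they are the natural thing to inspect when checking the assumptions on the error term.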
Residual Plots

[Figure: two panels. Residuals vs. Fit: residuals (about -40 to 60) against fitted values (50 to 200) from the Phosphorus + Nitrogen model. Normal Quantile Plot: residuals (about -40 to 60) against quantiles of the standard normal (-2 to 2). Points 7, 10, and 14 are labeled in both panels.]
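The coordinates behind these two diagnostic plots are simple to compute. The sketch below is an assumption-laden illustration: the fitted values and residuals are made up, `residual_plot_data` is a hypothetical helper, and the plotting positions (i − 0.5)/n are one common convention among several.

```python
import numpy as np
from statistics import NormalDist

def residual_plot_data(fitted, residuals):
    """Return (x, y) pairs for a residuals-vs-fit plot and a normal quantile plot."""
    fitted = np.asarray(fitted, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    # Plotting positions for the sorted residuals.
    probs = (np.arange(1, n + 1) - 0.5) / n
    normal_q = np.array([NormalDist().inv_cdf(float(p)) for p in probs])
    return (fitted, residuals), (normal_q, np.sort(residuals))

rng = np.random.default_rng(6)
fit = rng.uniform(50, 200, 25)      # made-up fitted values
res = rng.normal(0, 20, 25)         # made-up residuals
(vs_fit_x, vs_fit_y), (qq_x, qq_y) = residual_plot_data(fit, res)
```

A patternless cloud in the first plot and a roughly straight line in the second are what the constant-variance and normality assumptions would lead one to expect.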
Back to Gasoline Consumption

- Recall, we are interested in how gas consumption is affected by a price increase (say $0.10/gal.).
- Variables:
  - Gas consumption per capita (G)
  - Gas price (Pg)
  - Income (Y)
  - New car price (Pnc)
  - Used car price (Puc)
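2 Alternative Specifications

- Linear specification:

$$G_t = \beta_0 + \beta_1 Pg_t + \beta_2 Y_t + \beta_3 Pnc_t + \beta_4 Puc_t + \varepsilon_t$$

- Log-log specification (often used with economic data):

$$\log(G_t) = \beta_0 + \beta_1 \log(Pg_t) + \beta_2 \log(Y_t) + \beta_3 \log(Pnc_t) + \beta_4 \log(Puc_t) + \varepsilon_t$$

- One way to test the specification is the Box-Cox transform (see 3 lectures back).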
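One reason the log-log form is popular with economic data is that its slope coefficients can be read directly as elasticities. The sketch below illustrates this with a single regressor on simulated data; the elasticity value, the price range, and the noise level are all assumptions, not results from the gasoline data.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 400
elasticity_true = -0.3                        # hypothetical value, for illustration only

price = rng.uniform(0.6, 4.1, n)              # simulated prices
# Constant-elasticity demand with multiplicative noise (made-up data).
quantity = 1.5 * price ** elasticity_true * np.exp(rng.normal(0, 0.05, n))

X = np.column_stack([np.ones(n), np.log(price)])
coef, *_ = np.linalg.lstsq(X, np.log(quantity), rcond=None)
elasticity_hat = coef[1]                      # slope of log quantity on log price

# Interpretation check: a 1% price increase changes quantity by roughly elasticity_hat percent.
pct_change = (1.01 ** elasticity_hat - 1) * 100
print(round(elasticity_hat, 3), round(pct_change, 3))
```

In the linear specification, by contrast, the price coefficient is a change in units of G per unit of price, so the implied elasticity varies with the point at which it is evaluated.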
Results of Linear Model

- Parameter estimate, (p-value of t-test). A low p-value means “statistically significant”.
- R2 measures the goodness of fit of the model.
- A low p-value of the F statistic means the model has explanatory power.

| b0 | b1 | b2 | b3 | b4 | R2 | p (F) |
| --- | --- | --- | --- | --- | --- | --- |
| -.09 (.08) | -.04 (.002) | .0002 (.000) | -.10 (.11) | -.04 (.08) | .97 | .000 |
Answer to Question

- A 1 unit increase in price leads to a .04 unit decrease in gas consumption.
- Units are: G (1000 gallons), Pg ($).
- So, a $0.10 increase in gas price leads to, on average, a 4 gallon decrease in gas consumption per capita… not much!
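The unit conversion behind this answer can be spelled out as a short calculation. The variable names are illustrative; the numbers are the price coefficient and units quoted above.

```python
b1 = -0.04              # estimated price coefficient: change in G (1000 gallons) per $1 change in Pg
price_increase = 0.10   # dollars per gallon

change_in_G = b1 * price_increase          # in units of 1000 gallons
change_in_gallons = change_in_G * 1000     # convert to gallons

print(round(change_in_gallons, 1))   # -4.0
```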