Module II Lecture 3: Misspecification: Non-linearities
Graduate School Quantitative Research Methods
Gwilym Pryce
Summary of Lecture 2:
1. ANOVA in regression
2. Prediction
3. F-Test
4. Regression assumptions
5. Properties of OLS estimates
TSS = REGSS + RSS
The sum of squared deviations of y from the mean (i.e. the numerator in the variance of y equation) is called the TOTAL SUM OF SQUARES (TSS).
The sum of squared deviations of the error e is called the RESIDUAL SUM OF SQUARES (RSS), sometimes called the “error sum of squares”.
The difference between TSS and RSS is called the REGRESSION SUM OF SQUARES (REGSS), sometimes called the “explained sum of squares” or “model sum of squares”.
TSS = REGSS + RSS
R^2 = REGSS / TSS
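The decomposition can be checked numerically. The sketch below is only an illustration: the data are made up and the helper name `simple_ols` is an assumption, not something from the lecture. It fits a simple OLS line and confirms that TSS = REGSS + RSS.

```python
# Illustrative data only (not from the lecture).
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]


def simple_ols(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    return mean_y - b * mean_x, b


a, b = simple_ols(x, y)
fitted = [a + b * xi for xi in x]
mean_y = sum(y) / len(y)

tss = sum((yi - mean_y) ** 2 for yi in y)               # total sum of squares
rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
regss = sum((fi - mean_y) ** 2 for fi in fitted)        # regression sum of squares
r_squared = regss / tss

print(f"TSS={tss:.3f}  REGSS={regss:.3f}  RSS={rss:.3f}  R^2={r_squared:.4f}")
```

Note that the identity relies on the regression including an intercept, as it does here.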
4. Regression assumptions
For estimation of a and b and for regression inference to be correct:
1. Equation is correctly specified:
– Linear in parameters (can still transform variables)
– Contains all relevant variables
– Contains no irrelevant variables
– Contains no variables with measurement errors
2. Error term has zero mean
3. Error term has constant variance
4. Error term is not autocorrelated
– i.e. not correlated with the error term from previous time periods
5. Explanatory variables are fixed
– observe normal distribution of y for repeated fixed values of x
6. No linear relationship between RHS variables
– i.e. no “multicollinearity”
5. Properties of OLS estimates
If the above assumptions are met, OLS estimates are said to be BLUE:
– Best, i.e. most efficient = least variance
– Linear, i.e. best amongst linear estimates
– Unbiased, i.e. in repeated samples, mean of b = β
– Estimates, i.e. estimates of the population parameters.
Plan of Lecture 3:
1. Consequences of non-linearities
2. Testing for non-linearities
– (a) visual inspection of plots
– (b) t-statistics
– (c) structural break tests
3. Solutions
– (a) transform variables
– (b) split the sample
– (c) dummies
– (d) use non-linear estimation techniques
1. Consequences of non-linearities
Depending on how severe the non-linearity is, a and b will be misleading:
– estimates may be “biased”, i.e. they will not reflect the “true” values of the population parameters
[Figure: scatter plot of y on x]
[Figure: the estimator marked with a tilde (~) is a biased estimator of the true parameter]
2. Testing for non-linearities: (a) visual inspection of plots
Scatter plots of two variables:
– if you only have two or three variables, then looking at scatter plots of these variables can help identify non-linear relationships in the data
– but when there are more than three variables, non-linearities can be very complex and difficult to identify visually
– what can appear to be random variation of data points around a linear line of best fit in a 2-D plot can turn out to have a systematic cause when a third variable is included and a 3-D scatter plot is examined
• The same is true when comparing 3-D with higher dimensions
• e.g. suppose that there is a quadratic relationship between x, y and z, but that this is only visible in the data if one controls for the influence of a fourth variable, w. If one does not know this, then x, y and z will appear to have a linear relationship.
2. Testing for non-linearities: (b) t-statistics
Sometimes variables that we would expect (from intuition or theory) to have a strong effect on the dependent variable turn out to have low t-values.
– If so, then one might suspect non-linearities.
– Try transforming the variable (e.g. take logs) and re-examine the t-values
• e.g. HOUSING DEMAND = a + b AGE OF BORROWER
– surprisingly, age of borrower may not be that significant
– but this might be because of a non-linearity: housing demand rises with age until mid-life, and starts to decrease as children leave home. Try Age^2 instead and check the t-value.
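The housing-demand example can be sketched numerically. Everything below is an assumption for illustration: the data are invented (demand peaking at age 45), the helper `ols_with_t` is hypothetical, and the squared term is entered alongside age rather than strictly instead of it. The point is only that the t-value on age is low in the linear model, while the t-value on the squared term is large.

```python
def ols_with_t(rows, y):
    """OLS via the normal equations; returns (coefficients, t-values)."""
    n, k = len(rows), len(rows[0])
    # Augmented matrix [X'X | I]; Gauss-Jordan turns it into [I | (X'X)^-1].
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    inv = [row[k:] for row in m]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    coef = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    resid = [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(rows, y)]
    s2 = sum(e * e for e in resid) / (n - k)  # estimated error variance
    t_values = [c / (s2 * inv[i][i]) ** 0.5 for i, c in enumerate(coef)]
    return coef, t_values


# Made-up data: demand peaks at age 45 (an assumption for illustration only).
ages = list(range(20, 71, 2))
noise = [1.5, -2.0, 0.5, -1.0, 2.0, -1.0]
demand = [100 - 0.1 * (age - 45) ** 2 + noise[i % 6] for i, age in enumerate(ages)]

_, t_linear = ols_with_t([[1.0, age] for age in ages], demand)
coef_quad, t_quad = ols_with_t([[1.0, age, age ** 2] for age in ages], demand)

print("t-value on AGE, linear model:     ", round(t_linear[1], 2))
print("t-value on AGE^2, quadratic model:", round(t_quad[2], 2))
```

In practice a statistics package would report these t-values directly; the hand-rolled solver is used here only to keep the sketch free of external libraries.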
There may be non-linearities caused by interactions between variables:
– try interacting explanatory variables and examining t-values
• e.g. HOUSE PRICE = a + b SIZE OF WINDOW + c VIEW
• But size of window may only add value to a house if there is a nice view, and having a nice view may only add value if there are windows.
• Try including an interactive term as well/instead:
– HOUSE PRICE = a +…+ d SIZE OF WINDOW * VIEW
• In SPSS you would do this by creating a new variable using the COMPUTE command:
– COMPUTE SIZE_VEW = SIZE OF WINDOW * VIEW
and then including the new variable in the regression.
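For readers working outside SPSS, the same two steps (build the product variable, then include it in the regression) might look like the sketch below. The data are invented so that window size only adds value when there is a view, and the helper `ols` is a hypothetical stand-in for whatever regression routine is available.

```python
def ols(rows, y):
    """Least-squares coefficients, solved from the normal equations X'X b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


# Made-up data: window size only adds value when there is a view (view = 1).
noise = [0.5, -0.5, 1.0, -1.0]
data = []
for view in (0, 1):
    for size in range(1, 9):
        price = 100 + 20 * size * view + noise[len(data) % 4]
        data.append((size, view, price))

# Equivalent of the SPSS COMPUTE step: build the product as a new variable.
size_vew = [size * view for size, view, _ in data]

rows = [[1.0, size, view, sv] for (size, view, _), sv in zip(data, size_vew)]
a, b, c, d = ols(rows, [price for _, _, price in data])
print(f"b (size) = {b:.2f}, c (view) = {c:.2f}, d (size*view) = {d:.2f}")
```

With these data the coefficient d on the interaction term carries the effect, while b on window size alone stays close to zero.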
2. Testing for non-linearities: (c) shifts & structural break tests
Sometimes certain observations display consistently higher y values.
If this difference can be modelled as a parallel shift of the regression line, then we can incorporate it into our model simply by including an appropriate dummy variable
– e.g. male = 1 or 0
Apparent Intercept Shift in data:
Data shifts in 3 dimensions (NB: the shift is the slightly lower prices for terrace = 1):
[Figure: 3-D scatter plot of Price against Floor Area (m sq) and Terrace (1 = terr, 0 = other)]
However, sometimes there is an apparent shift in the slope, as well as or instead of a shift in the intercept.
This is difficult to observe visually if you have lots of variables, since the visual symptoms will only reveal themselves if the data have been ordered appropriately.
Apparent slope shift:
[Figure: “Slope shift”, scatter plot of y on x]
Solutions: (a) Transforming Variables
Note that “linear” regression analysis does not preclude analysis of non-linear relationships (a common misconception).
– It merely precludes estimation of certain types of non-linear relationships
• i.e. those that are non-linear in parameters:
• y = ax + az + bxz
However, so long as the non-linearity can fit within the basic structure of
• y = a + bx
• i.e. it is linear in parameters
– then we can make suitable transformations of the variables and estimate by OLS:
– e.g. 1 y = a + b x^2
• we can simply create a new variable, z = x^2, and run a regression of y = a + b z
• including the square of x is appropriate if the scatter plot of y on x is “n” shaped or “u” shaped
– e.g. 2 y = a + b x^3
• we can create a new variable, z = x^3, and run a regression of y = a + b z
• including the cube of x is appropriate if the scatter plot of y on x is “s” shaped or has a back-to-front “s” shape.
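A minimal sketch of the first transformation, assuming invented “u”-shaped data and a hypothetical helper `fit_line`: create z = x^2, regress y on z, and compare the fit with the regression of y on the untransformed x.

```python
def fit_line(x, y):
    """Return (a, b, r_squared) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1 - rss / tss


# Made-up "u"-shaped data: y = 50 + 2*x^2 plus a little noise.
x = list(range(-25, 30, 5))
noise = [30, -20, 10, -30, 20, -10]
y = [50 + 2 * xi ** 2 + noise[i % 6] for i, xi in enumerate(x)]

z = [xi ** 2 for xi in x]          # the transformed variable, z = x^2

_, _, r2_x = fit_line(x, y)        # y regressed on x
a, b, r2_z = fit_line(z, y)        # y regressed on z = x^2
print(f"R^2 on x: {r2_x:.3f};  R^2 on z = x^2: {r2_z:.3f};  b = {b:.3f}")
```

The cubic case works the same way with `z = [xi ** 3 for xi in x]`.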
E.g. 1 Scatter plot suggests a quadratic relationship
[Figure: scatter plot of y on x]
Regressing y on the square of x should give a better fit
[Figure: scatter plot of y on z, where z = x^2]
E.g. 2 Scatter plot suggests a cubic relationship
[Figure: scatter plot of y on x]
Regressing y on the cube of x should give a better fit:
[Figure: scatter plot of y on x^3]
E.g. 3 Scatter plot suggests a cubic relationship
[Figure: scatter plot of y on x]
Cubing x should give a better fit
[Figure: scatter plot of y on x^3]
E.g. 4 Scatter plot suggests a quadratic relationship
[Figure: scatter plot of y on x]
Squaring x should give a better fit
[Figure: scatter plot of y on x^2]
Log-log and log-linear models
One of the most common transformations of the dependent variable and/or the explanatory variables is to take logs.
– It is appropriate to take the log of x if the scatter plot of y on x has either an “r” shape or an “L” shape.
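As an illustration of the log-log case, the sketch below logs both variables and estimates ln(y) = a + b ln(x) by OLS, so that the slope b can be read as an elasticity. The data (generated from y = 3 * x^0.5 with small multiplicative noise) and the helper `fit_line` are assumptions made for this example.

```python
import math


def fit_line(x, y):
    """Return (a, b) for the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b


# Made-up data following y = 3 * x^0.5 with small multiplicative noise.
x = list(range(1, 21))
noise = [0.02, -0.02, 0.01, -0.01]
y = [3 * xi ** 0.5 * (1 + noise[i % 4]) for i, xi in enumerate(x)]

# Log-log model: ln(y) = a + b*ln(x), estimated by OLS on the logged variables.
ln_x = [math.log(xi) for xi in x]
ln_y = [math.log(yi) for yi in y]
a, b = fit_line(ln_x, ln_y)

print(f"b = {b:.3f} (true exponent 0.5), exp(a) = {math.exp(a):.3f} (true scale 3)")
```

Logging only x and leaving y unlogged follows the same pattern: pass `ln_x` and `y` to the fitting routine.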
E.g. 5 Scatter plot suggests a logarithmic relationship
[Figure: scatter plot of y on x]
Taking the log of x should result in a better fit
[Figure: scatter plot of y on ln(x)]
E.g. 6 Scatter plot suggests a logarithmic relationship
[Figure: scatter plot of y on x]
Taking the log of x should result in a better fit
[Figure: scatter plot of y on ln(x)]
E.g. 7 Scatter plot suggests an exponential relationship
[Figure: scatter plot of y on x]
Taking the exponent of x should result in a better fit
[Figure: scatter plot of y on exp(0.18*x)]
Solutions: (b) Split the sample
Quite a drastic measure:
– split the sample and estimate two OLS lines separately
– in practice it's not easy to decide where exactly to split the sample
– we can do an F-test to help us test whether there really is a structural break: “Chow Tests”
– but even if the F-test shows that there is a break, it can often be remedied by squaring the offending variable, or using slope dummies...
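A rough sketch of the Chow test for a single explanatory variable is given below. It uses the standard form of the statistic, F = [(RSS_pooled − (RSS_1 + RSS_2)) / k] / [(RSS_1 + RSS_2) / (n_1 + n_2 − 2k)], where k is the number of parameters in each regression. The data, the split point and the helper `rss_of_line` are all assumptions made for illustration.

```python
def rss_of_line(x, y):
    """Residual sum of squares from the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


# Made-up data with a slope change at x = 10 (the split point is assumed known).
x = list(range(1, 21))
noise = [1.0, -1.0, 0.5, -0.5]
y = [(10 + 2 * xi if xi <= 10 else -30 + 6 * xi) + noise[i % 4]
     for i, xi in enumerate(x)]

x1, y1 = x[:10], y[:10]      # first sub-sample
x2, y2 = x[10:], y[10:]      # second sub-sample

k = 2                        # parameters per regression (a and b)
rss_pooled = rss_of_line(x, y)
rss_split = rss_of_line(x1, y1) + rss_of_line(x2, y2)

# Chow F-statistic with (k, n1 + n2 - 2k) degrees of freedom.
f_stat = ((rss_pooled - rss_split) / k) / (rss_split / (len(x) - 2 * k))
print(f"Chow F({k}, {len(x) - 2 * k}) = {f_stat:.1f}")
```

The statistic is then compared with the tabulated critical value of the F distribution for those degrees of freedom; a large value suggests a structural break.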
Solutions: (c) Dummy variables
A dummy variable is one that takes the values 0 or 1:
– e.g. 1 if male, 0 if female
If we include the dummy as a separate variable in the regression we call it an Intercept Dummy
If we multiply it by one of the explanatory variables, then we call it a Slope Dummy
Intercept Dummies:
[Figure: “Scatter Plot with Intercept shift”, scatter plot of y on x]
Original equation:
y = a + bx
Now add a dummy (e.g. D = 0 if white, D = 1 if non-white):
y = a + bx + cD
c measures how much higher (lower if c is negative) the dependent variable is for non-whites
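To make the intercept dummy concrete, here is a minimal sketch with invented data in which the D = 1 group lies 10 units above the D = 0 group at every x. The helper `ols` is a hypothetical stand-in for any regression routine; the estimate of c should come out close to 10.

```python
def ols(rows, y):
    """Least-squares coefficients, solved from the normal equations X'X b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


# Made-up data: y = 5 + 2x + 10D plus noise, i.e. a parallel shift of 10.
noise = [0.5, -0.5, 1.0, -1.0, 0.0]
rows, y = [], []
for dummy in (0, 1):
    for x in range(1, 11):
        rows.append([1.0, x, dummy])          # columns: constant, x, D
        y.append(5 + 2 * x + 10 * dummy + noise[len(y) % 5])

a, b, c = ols(rows, y)
print(f"a = {a:.2f}, b = {b:.2f}, c (intercept shift) = {c:.2f}")
```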
Slope Dummies:
[Figure: “Slope shift”, scatter plot of y on x]
Suppose race has an effect on the slope of the regression line rather than the intercept.
You can account for this by simply multiplying the relevant explanatory variable by the race dummy:
y = a + bx + cD*x
c measures how much higher (lower if c is negative) the b slope parameter would be for non-whites
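The slope dummy can be sketched the same way: multiply the dummy by x and include the product as a regressor. The data below are invented so that the slope is 2 when D = 0 and 3.5 when D = 1, and `ols` is again a hypothetical helper; the estimate of c should be close to 1.5.

```python
def ols(rows, y):
    """Least-squares coefficients, solved from the normal equations X'X b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


# Made-up data: y = 5 + 2x + 1.5*D*x plus noise, i.e. a steeper line when D = 1.
noise = [0.5, -0.5, 1.0, -1.0, 0.0]
rows, y = [], []
for dummy in (0, 1):
    for x in range(1, 11):
        rows.append([1.0, x, dummy * x])      # columns: constant, x, D*x
        y.append(5 + 2 * x + 1.5 * dummy * x + noise[len(y) % 5])

a, b, c = ols(rows, y)
print(f"slope when D = 0: {b:.2f};  slope when D = 1: {b + c:.2f}")
```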
Solutions: (d) Non-linear estimation
When you can’t satisfactorily deal with the non-linearity by simply transforming variables, you can fit a non-linear curve to the data.
Non-linear estimation techniques are usually based on some sort of grid search (i.e. trial and error) for the correct value of the non-linear parameter.
– e.g. y = a + b1*exp(b2*x) + b3*z
• cannot be transformed to linearity in a way that would allow us to derive estimates for b2 and b3
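One simple way to picture the grid search: for each trial value of b2 the model is linear in a, b1 and b3, so it can be estimated by OLS, and the trial value giving the smallest RSS is kept. The sketch below assumes invented data (true b2 = 0.18), an arbitrary grid from 0.05 to 0.40, and a hypothetical helper `ols_rss`; it is not the algorithm any particular package uses.

```python
import math


def ols_rss(rows, y):
    """Return (coefficients, RSS) from OLS via the normal equations."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y], reduced by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    coef = [m[i][k] for i in range(k)]
    rss = sum((yi - sum(c * v for c, v in zip(coef, r))) ** 2
              for r, yi in zip(rows, y))
    return coef, rss


# Made-up data from y = 2 + 3*exp(0.18*x) + 0.5*z plus small noise.
noise = [0.05, -0.05, 0.02, -0.02]
x = list(range(20))
z = [(7 * i) % 5 for i in range(20)]
y = [2 + 3 * math.exp(0.18 * xi) + 0.5 * zi + noise[i % 4]
     for i, (xi, zi) in enumerate(zip(x, z))]

best = None
for step in range(5, 41):                # trial b2 values 0.05, 0.06, ..., 0.40
    b2 = step / 100
    rows = [[1.0, math.exp(b2 * xi), zi] for xi, zi in zip(x, z)]
    coef, rss = ols_rss(rows, y)         # given b2, the model is linear in a, b1, b3
    if best is None or rss < best[0]:
        best = (rss, b2, coef)

best_rss, best_b2, (a, b1, b3) = best
print(f"best b2 = {best_b2:.2f}, a = {a:.2f}, b1 = {b1:.2f}, b3 = {b3:.2f}")
```

A finer grid, or a second search around the best value, would sharpen the estimate of b2 at the cost of more regressions.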
SPSS does allow non-linear estimation
– go to Analyse, Regression, non-linear
But we shall not cover this topic in any more detail on this course since most types of non-linearity in data can be adequately dealt with using transformations of the variables.
Summary
1. Consequences of non-linearities
2. Testing for non-linearities
– (a) visual inspection of plots
– (b) t-statistics
– (c) structural break tests
3. Solutions
– (a) transform variables
– (b) split the sample
– (c) dummies
– (d) use non-linear estimation techniques
Reading:
Kennedy (1998) “A Guide to Econometrics”, Chapters 3, 5 and 6