Post on 19-Dec-2015
TAKE HOME PROJECT 2
Group C:
Robert Matarazzo, Michael Stromberg, Yuxing Zhang,
Yin Chu, Leslie Wei, and Kurtis Hollar
Introduction
We chose to forecast the imported petroleum price index.
Petroleum has many uses but is mainly used to produce fuels.
The price of petroleum heavily influences the price of gasoline.
Original Data
Decay in the autocorrelation
Large spike at lag 1 in the partial autocorrelation
This structure indicates non-stationary data
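The correlogram pattern described above can be reproduced on simulated data. The sketch below is only an illustration under stated assumptions: it uses a simulated random walk (not the petroleum series), and `sample_acf` is a hypothetical helper name. It shows that a non-stationary series has autocorrelations that stay close to 1 at low lags, while its first differences do not.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(max_lag + 1)
    ]


random.seed(0)  # fixed seed so the simulated series is reproducible

# Simulated random walk: a textbook non-stationary series (NOT the petroleum data).
walk = [0.0]
for _ in range(256):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))

# First differences of a random walk are white noise, hence stationary.
diffs = [b - a for a, b in zip(walk, walk[1:])]

acf_walk = sample_acf(walk, 12)
acf_diffs = sample_acf(diffs, 12)

print("random walk:", [round(r, 2) for r in acf_walk[:6]])
print("differences:", [round(r, 2) for r in acf_diffs[:6]])
```

The autocorrelations of the walk stay high across the first lags, whereas those of the differences hover near zero.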
Original Data
ADF Test Statistic -0.443169
1% Critical Value* -3.4575
5% Critical Value -2.8729
10% Critical Value -2.5728
*MacKinnon critical values for rejection of hypothesis of a unit root.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(PETROP)
Method: Least Squares
Date: 05/28/10 Time: 11:42
Sample(adjusted): 1989:01 2010:04
Included observations: 256 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
PETROP(-1) -0.004220 0.009521 -0.443169 0.6580
C 1.450960 1.370614 1.058621 0.2908
R-squared 0.000773 Mean dependent var 0.960156
Adjusted R-squared -0.003161 S.D. dependent var 12.89963
S.E. of regression 12.92000 Akaike info criterion 7.963212
Sum squared resid 42399.31 Schwarz criterion 7.990909
Log likelihood -1017.291 F-statistic 0.196399
Durbin-Watson stat 0.728269 Prob(F-statistic) 0.658021
The ADF test statistic (-0.443169) is well above even the 10% critical value (-2.5728), so the unit-root hypothesis cannot be rejected: the data are non-stationary
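The decision rule behind this conclusion can be written out in a few lines. This is a minimal sketch using only the numbers reported in the ADF output above; the function name `rejects_unit_root` is a hypothetical helper, not part of any package.

```python
# Values copied from the ADF output above.
adf_stat = -0.443169
critical_values = {"1%": -3.4575, "5%": -2.8729, "10%": -2.5728}


def rejects_unit_root(stat, crit):
    """For each level, report whether stat is more negative than the critical value."""
    return {level: stat < value for level, value in crit.items()}


decisions = rejects_unit_root(adf_stat, critical_values)

# The ADF statistic is the t-ratio on PETROP(-1): coefficient / standard error.
# The ratio of the rounded table entries differs from the reported value only by rounding.
t_ratio = -0.004220 / 0.009521

print(decisions)
print(round(t_ratio, 3))
```

The unit root is rejected only when the statistic falls below the critical value, which does not happen at any of the three levels here.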
Application of Step Function and Dummy Variables
Dependent Variable: PETROP
Method: Least Squares
Date: 05/28/10 Time: 11:45
Sample: 1988:12 2010:04
Included observations: 257
Variable Coefficient Std. Error t-Statistic Prob.
C 105.3547 4.906600 21.47203 0.0000
D1 314.1453 75.53614 4.158874 0.0000
D2 266.1453 75.53614 3.523417 0.0005
D3 183.5453 75.53614 2.429901 0.0158
D4 96.24534 75.53614 1.274163 0.2038
D5 45.44534 75.53614 0.601637 0.5480
STEP 130.5766 19.47246 6.705704 0.0000
R-squared 0.241580 Mean dependent var 117.0074
Adjusted R-squared 0.223378 S.D. dependent var 85.53265
S.E. of regression 75.37662 Akaike info criterion 11.50973
Sum squared resid 1420409. Schwarz criterion 11.60640
Log likelihood -1472.000 F-statistic 13.27215
Durbin-Watson stat 0.111977 Prob(F-statistic) 0.000000
Regression of petroleum price against dummy variables and step function
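The regressors D1 to D5 and STEP can be built as 0/1 series. The sketch below is an illustration only: the slides do not state which dates the dummies cover, so the index `first_pulse` and the placement of the step right after the pulses are hypothetical, as is the assumption that each dummy marks a single observation.

```python
def pulse_dummy(n, index):
    """Series that is 1 at one observation and 0 elsewhere."""
    return [1 if t == index else 0 for t in range(n)]


def step_function(n, start):
    """Series that is 0 before `start` and 1 from `start` onward."""
    return [1 if t >= start else 0 for t in range(n)]


n_obs = 257        # observations in the regression above (1988:12 to 2010:04)
first_pulse = 230  # HYPOTHETICAL index: the slides do not give the dummy dates

# D1..D5 as five consecutive single-month pulses (an assumption).
d = [pulse_dummy(n_obs, first_pulse + j) for j in range(5)]

# STEP switching on right after the last pulse (also an assumption).
step = step_function(n_obs, first_pulse + 5)

print([sum(col) for col in d], sum(step))
```

Each pulse absorbs one outlying month, while the step shifts the level of the series for every month from its start date onward.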
Application of Step Function and Dummy Variables
Decay in the autocorrelation
Large spike at lag 1 in the partial autocorrelation
This structure indicates non-stationary data
Application of Step Function and Dummy Variables
Breusch-Godfrey Serial Correlation LM Test:
F-statistic 5887.195 Probability 0.000000
Obs*R-squared 251.6986 Probability 0.000000
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 05/28/10 Time: 11:49
Variable Coefficient Std. Error t-Statistic Prob.
C 1.306229 0.715046 1.826776 0.0689
D1 -373.8602 11.44297 -32.67159 0.0000
D2 62.97759 26.14838 2.408470 0.0167
D3 -1.306229 10.89304 -0.119914 0.9046
D4 -1.306229 10.89304 -0.119914 0.9046
D5 -1.306229 10.89304 -0.119914 0.9046
STEP 1.904366 2.810381 0.677618 0.4986
RESID(-1) 1.207329 0.062577 19.29363 0.0000
RESID(-2) -0.178742 0.065813 -2.715908 0.0071
R-squared 0.979372 Mean dependent var 5.73E-14
Adjusted R-squared 0.978706 S.D. dependent var 74.48806
S.E. of regression 10.86954 Akaike info criterion 7.644198
Sum squared resid 29300.44 Schwarz criterion 7.768484
Log likelihood -973.2794 F-statistic 1471.799
Durbin-Watson stat 1.900201 Prob(F-statistic) 0.000000
The F-statistic (5887.195, p = 0.000000) rejects the null of no serial correlation: the residuals of this regression are still serially correlated
Further steps must be taken
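The Obs*R-squared line of the Breusch-Godfrey output can be checked by hand. The sketch below assumes the auxiliary regression uses all 257 observations (the output does not print the count) and relies on the fact that, with two lagged residuals, the statistic is compared with a chi-square distribution with 2 degrees of freedom, whose upper tail is exp(-x/2).

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2.0)


n_obs = 257           # ASSUMPTION: same sample as the original regression
r_squared = 0.979372  # R-squared of the auxiliary (test) regression above

lm_stat = n_obs * r_squared    # the Obs*R-squared statistic
p_value = chi2_sf_2df(lm_stat)

print(round(lm_stat, 4), p_value)
```

The product agrees with the reported Obs*R-squared of 251.6986, and the tail probability is effectively zero, in line with the reported 0.000000.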
Logarithm Transformation and First Differenced Data
Regression of the first difference of the logarithm of petroleum price against dummy variables and step function
Dependent Variable: DLNPETROP
Method: Least Squares
Date: 05/29/10 Time: 23:20
Sample(adjusted): 1989:01 2009:04
Included observations: 244 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 0.010559 0.004479 2.357406 0.0192
DD1 -0.113533 0.069245 -1.639584 0.1024
DD2 -0.245607 0.098132 -2.502824 0.0130
DD3 -0.507634 0.120437 -4.214950 0.0000
DD4 -0.877988 0.139356 -6.300309 0.0000
DD5 -1.178878 0.156127 -7.550778 0.0000
DSTEP -1.236968 0.171380 -7.217702 0.0000
R-squared 0.222473 Mean dependent var
Adjusted R-squared 0.202789 S.D. dependent var
S.E. of regression 0.069100 Akaike info criterion
Sum squared resid 1.131627 Schwarz criterion
Log likelihood 309.3474 F-statistic
Durbin-Watson stat 1.102828 Prob(F-statistic)
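The transformation used for this regression can be sketched as follows. The variable names DLNPETROP, DD1 and DSTEP suggest that the log price and the dummies were each first differenced; the slides do not spell this out, so treat that reading as an assumption. The price values below are hypothetical and serve only to show the arithmetic.

```python
import math


def log_diff(prices):
    """First difference of the natural log: ln(p_t) - ln(p_{t-1})."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]


def diff(series):
    """Plain first difference: x_t - x_{t-1}."""
    return [b - a for a, b in zip(series, series[1:])]


prices = [100.0, 110.0, 121.0, 108.9]  # HYPOTHETICAL index values
dln = log_diff(prices)                 # roughly +10%, +10%, -10% in log terms

step = [0, 0, 1, 1, 1]   # a step function in levels
dstep = diff(step)       # differencing a step leaves a single pulse

pulse = [0, 1, 0, 0]     # a pulse dummy in levels
dpulse = diff(pulse)     # differencing a pulse gives +1 followed by -1

print([round(v, 4) for v in dln], dstep, dpulse)
```

Log differences approximate percentage changes, which is why the differenced series is on a much smaller scale than the original index.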
Logarithm Transformation and First Differenced Data
ADF Test Statistic -7.814147
1% Critical Value* -3.4580
5% Critical Value -2.8731
10% Critical Value -2.5729
*MacKinnon critical values for rejection of hypothesis of a unit root.
Augmented Dickey-Fuller Test Equation
Dependent Variable: D(DLNPETROP)
Method: Least Squares
Date: 05/30/10 Time: 15:57
Sample(adjusted): 1989:06 2010:04
Included observations: 251 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
DLNPETROP(-1) -0.700127 0.089597 -7.814147 0.0000
D(DLNPETROP(-1)) 0.315975 0.082244 3.841921 0.0002
D(DLNPETROP(-2)) 0.098728 0.077206 1.278768 0.2022
D(DLNPETROP(-3)) 0.143967 0.067344 2.137800 0.0335
D(DLNPETROP(-4)) 0.069445 0.063319 1.096741 0.2738
C 0.004222 0.004091 1.032067 0.3031
R-squared 0.286618 Mean dependent var -6.30E-05
Adjusted R-squared 0.272059 S.D. dependent var 0.075215
S.E. of regression 0.064173 Akaike info criterion -2.630867
Sum squared resid 1.008940 Schwarz criterion -2.546593
Log likelihood 336.1738 F-statistic 19.68690
Durbin-Watson stat 2.007980 Prob(F-statistic) 0.000000
The ADF test statistic (-7.814147) is below the 1% critical value (-3.4580), so the unit-root hypothesis is rejected: the transformed data are stationary
Logarithm Transformation and First Differenced Data
Notice the significant spikes at lags 1, 2, and 11
The spike at lag 10 may also be significant
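Whether a spike counts as significant is usually judged against an approximate band of plus or minus 1.96 divided by the square root of the sample size. The sketch below illustrates that rule only: the sample size of 256 is an assumption, and the autocorrelation values are made up to mimic the pattern described above, since the correlogram itself is not reproduced in the transcript.

```python
import math


def significant_lags(acf, n_obs, z=1.96):
    """Lags (1-based) whose autocorrelation lies outside +/- z / sqrt(n_obs)."""
    limit = z / math.sqrt(n_obs)
    return [lag for lag, r in enumerate(acf, start=1) if abs(r) > limit]


n_obs = 256                      # ASSUMPTION: number of differenced observations
band = 1.96 / math.sqrt(n_obs)   # approximate 95% band

# HYPOTHETICAL autocorrelations for lags 1..12, invented for illustration.
example_acf = [0.40, -0.15, 0.02, 0.05, -0.03, 0.01,
               0.04, -0.06, 0.08, 0.12, 0.17, 0.03]

flagged = significant_lags(example_acf, n_obs)
print(round(band, 4), flagged)
```

In this made-up example lag 10 sits just inside the band, which is the kind of borderline case where a term may or may not be worth adding to the model.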
Building The Model
Dependent Variable: DLNPETROP
Method: Least Squares
Date: 05/29/10 Time: 23:26
Sample(adjusted): 1989:03 2009:04
Included observations: 242 after adjusting endpoints
Convergence achieved after 6 iterations
Backcast: 1988:04 1989:02
Variable Coefficient Std. Error t-Statistic Prob.
C 0.010550 0.006695 1.575659 0.1165
DD1 -0.118552 0.060062 -1.973814 0.0496
DD2 -0.253143 0.109485 -2.312124 0.0216
DD3 -0.530545 0.146073 -3.632045 0.0003
DD4 -0.894882 0.172724 -5.180999 0.0000
DD5 -1.194821 0.194876 -6.131179 0.0000
DSTEP -1.245789 0.212066 -5.874541 0.0000
AR(1) 0.519082 0.065070 7.977273 0.0000
AR(2) -0.196711 0.064619 -3.044167 0.0026
MA(11) 0.176080 0.066417 2.651109 0.0086
R-squared 0.428059 Mean dependent var 0.004986
Adjusted R-squared 0.405871 S.D. dependent var 0.077414
S.E. of regression 0.059671 Akaike info criterion -2.759510
Sum squared resid 0.826055 Schwarz criterion -2.615339
Log likelihood 343.9007 F-statistic 19.29289
Durbin-Watson stat 1.980779 Prob(F-statistic) 0.000000
Inverted AR Roots .26+.36i .26 -.36i
Inverted MA Roots .82+.24i .82 -.24i .56 -.65i .56+.65i
.12 -.85i .12+.85i -.35+.78i -.35 -.78i
-.72+.46i -.72 -.46i -.85
Based on the correlogram, we tried a model with AR(1), AR(2), and MA(11) terms
All coefficients except the constant (p = 0.1165) are significant at the 5% level
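The "Inverted AR Roots" line of the output can be recovered from the AR(1) and AR(2) coefficients. The sketch below is a check on that line rather than part of the original analysis; `inverted_ar2_roots` is a hypothetical helper name. The inverted roots solve z^2 - phi1*z - phi2 = 0, and the AR part is stationary when both lie inside the unit circle.

```python
import cmath


def inverted_ar2_roots(phi1, phi2):
    """Roots of z^2 - phi1*z - phi2 = 0 (inverted roots of 1 - phi1*L - phi2*L^2)."""
    disc = cmath.sqrt(phi1 * phi1 + 4.0 * phi2)
    return ((phi1 + disc) / 2.0, (phi1 - disc) / 2.0)


# AR(1) and AR(2) coefficients from the estimation output above.
phi1, phi2 = 0.519082, -0.196711

roots = inverted_ar2_roots(phi1, phi2)
moduli = [abs(r) for r in roots]
is_stationary = all(m < 1.0 for m in moduli)

for r in roots:
    print(f"{r.real:.2f} {r.imag:+.2f}i")
print(is_stationary)
```

Both roots have modulus well below 1, so the estimated AR part is stationary.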
Building The Model
Breusch-Godfrey Serial Correlation LM Test:
F-statistic 0.776204 Probability 0.461351
Obs*R-squared 1.622180 Probability 0.444374
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 05/29/10 Time: 23:30
Variable Coefficient Std. Error t-Statistic Prob.
C -8.72E-05 0.006703 -0.013005 0.9896
DD1 0.001103 0.060235 0.018311 0.9854
DD2 0.002716 0.110076 0.024676 0.9803
DD3 0.006493 0.147130 0.044130 0.9648
DD4 0.009790 0.173997 0.056265 0.9552
DD5 0.010948 0.196074 0.055836 0.9555
DSTEP 0.011014 0.213172 0.051666 0.9588
AR(1) 0.583091 0.772579 0.754733 0.4512
AR(2) 0.028583 0.158990 0.179781 0.8575
MA(11) 0.001298 0.066538 0.019508 0.9845
RESID(-1) -0.572796 0.773672 -0.740359 0.4598
RESID(-2) -0.362230 0.330054 -1.097488 0.2736
R-squared 0.006703 Mean dependent var -6.23E-05
Adjusted R-squared -0.040802 S.D. dependent var 0.058546
S.E. of regression 0.059728 Akaike info criterion -2.749708
Sum squared resid 0.820517 Schwarz criterion -2.576703
Log likelihood 344.7147 F-statistic 0.141104
Durbin-Watson stat 1.998401 Prob(F-statistic) 0.999515
The F-statistic (0.776204, p = 0.461351) does not reject the null of no serial correlation: the residuals are no longer serially correlated
Building The Model
The resulting correlogram indicates that the spike at lag 10 may still be significant
We will try incorporating an MA(10) into a new model
Building The Model
Adding an MA(10) term made the coefficients more significant in general
All coefficients except the constant (p = 0.1606) are significant at the 5% level
Dependent Variable: DLNPETROP
Method: Least Squares
Date: 05/29/10 Time: 23:56
Sample(adjusted): 1989:03 2009:04
Included observations: 242 after adjusting endpoints
Convergence achieved after 6 iterations
Backcast: 1988:04 1989:02
Variable Coefficient Std. Error t-Statistic Prob.
C 0.010651 0.007566 1.407688 0.1606
DD1 -0.122285 0.059446 -2.057094 0.0408
DD2 -0.274387 0.108398 -2.531304 0.0120
DD3 -0.553364 0.144543 -3.828359 0.0002
DD4 -0.923081 0.171140 -5.393715 0.0000
DD5 -1.216743 0.192998 -6.304429 0.0000
DSTEP -1.274860 0.210151 -6.066406 0.0000
AR(1) 0.515431 0.064487 7.992847 0.0000
AR(2) -0.191517 0.064415 -2.973152 0.0033
MA(10) 0.157807 0.064670 2.440197 0.0154
MA(11) 0.185034 0.064715 2.859226 0.0046
R-squared 0.441873 Mean dependent var 0.004986
Adjusted R-squared 0.417712 S.D. dependent var 0.077414
S.E. of regression 0.059073 Akaike info criterion -2.775696
Sum squared resid 0.806103 Schwarz criterion -2.617107
Log likelihood 346.8592 F-statistic 18.28842
Durbin-Watson stat 1.977425 Prob(F-statistic) 0.000000
Inverted AR Roots .26+.35i .26 -.35i
Inverted MA Roots .86+.26i .86 -.26i .56+.70i .56 -.70i
.08 -.88i .08+.88i -.41 -.75i -.41+.75i
-.71 -.39i -.71+.39i -.78
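One way to compare this model with the earlier one that has only an MA(11) term is through the information criteria printed in both outputs. The slides do not say the authors made this comparison, so the sketch below is offered only as a supporting check. It uses the per-observation forms (-2*logL + 2k)/n and (-2*logL + k*ln n)/n, which reproduce the reported values when k counts all estimated coefficients; the function names are hypothetical.

```python
import math


def aic_per_obs(loglik, n_obs, k):
    """Akaike criterion scaled by the number of observations."""
    return (-2.0 * loglik + 2.0 * k) / n_obs


def sc_per_obs(loglik, n_obs, k):
    """Schwarz criterion scaled by the number of observations."""
    return (-2.0 * loglik + k * math.log(n_obs)) / n_obs


n_obs = 242

# Model A: AR(1) AR(2) MA(11), 10 coefficients, log likelihood 343.9007.
aic_a = aic_per_obs(343.9007, n_obs, 10)
sc_a = sc_per_obs(343.9007, n_obs, 10)

# Model B: AR(1) AR(2) MA(10) MA(11), 11 coefficients, log likelihood 346.8592.
aic_b = aic_per_obs(346.8592, n_obs, 11)
sc_b = sc_per_obs(346.8592, n_obs, 11)

print(round(aic_a, 6), round(aic_b, 6))
print(round(sc_a, 6), round(sc_b, 6))
```

Both criteria are lower for the model with the MA(10) term, although the Schwarz criterion improves only marginally because it penalizes the extra coefficient more heavily.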
Building The Model
The correlogram has no significant spikes
All remaining lags are even less significant than without the MA(10) term
Building The Model
Once again, there is no serial correlation in this model
Breusch-Godfrey Serial Correlation LM Test:
F-statistic 0.617507 Probability 0.540183
Obs*R-squared 1.298123 Probability 0.522536
Test Equation:
Dependent Variable: RESID
Method: Least Squares
Date: 05/29/10 Time: 23:59
Variable Coefficient Std. Error t-Statistic Prob.
C -8.47E-05 0.007580 -0.011180 0.9911
DD1 0.000363 0.059634 0.006085 0.9952
DD2 0.000184 0.108973 0.001688 0.9987
DD3 0.001832 0.145544 0.012587 0.9900
DD4 0.004007 0.172313 0.023253 0.9815
DD5 0.004465 0.194091 0.023006 0.9817
DSTEP 0.004825 0.211181 0.022846 0.9818
AR(1) 0.343824 0.801985 0.428716 0.6685
AR(2) 0.064231 0.162570 0.395097 0.6931
MA(10) -0.005656 0.064992 -0.087020 0.9307
MA(11) 0.001445 0.064821 0.022295 0.9822
RESID(-1) -0.331181 0.802074 -0.412905 0.6801
RESID(-2) -0.273438 0.337831 -0.809393 0.4191
R-squared 0.005364 Mean dependent var 2.72E-06
Adjusted R-squared -0.046757 S.D. dependent var 0.057834
S.E. of regression 0.059171 Akaike info criterion -2.764545
Sum squared resid 0.801779 Schwarz criterion -2.577123
Log likelihood 347.5100 F-statistic 0.102918
Durbin-Watson stat 1.998171 Prob(F-statistic) 0.999950
Building The Model
ARCH Test:
F-statistic 0.213734 Probability 0.644277
Obs*R-squared 0.215330 Probability 0.642622
Test Equation:
Dependent Variable: RESID^2
Method: Least Squares
Date: 05/30/10 Time: 00:00
Sample(adjusted): 1989:04 2009:04
Included observations: 241 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 0.003238 0.000444 7.299747 0.0000
RESID^2(-1) 0.029897 0.064668 0.462314 0.6443
R-squared 0.000893 Mean dependent var 0.003338
Adjusted R-squared -0.003287 S.D. dependent var 0.006005
S.E. of regression 0.006015 Akaike info criterion -7.380936
Sum squared resid 0.008646 Schwarz criterion -7.352016
Log likelihood 891.4027 F-statistic 0.213734
Durbin-Watson stat 2.001542 Prob(F-statistic) 0.644277
We ran the ARCH test to check for heteroskedasticity
The F-statistic (0.213734, p = 0.644277) does not reject the null of no ARCH effects, so the residuals appear homoskedastic
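The mechanics of the ARCH test can be sketched in a few lines: regress the squared residuals on a constant and their own first lag, then compare n times R-squared with a chi-square distribution with 1 degree of freedom. The code below is a minimal illustration, not the software's implementation; the simulated residuals are hypothetical stand-ins for the model residuals, and `arch1_lm` is a made-up helper name.

```python
import math
import random


def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


def arch1_lm(resid):
    """ARCH(1) LM test: returns (n * R^2, p-value) from e_t^2 on a constant and e_{t-1}^2."""
    sq = [e * e for e in resid]
    y, x = sq[1:], sq[:-1]
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r_squared = (sxy * sxy) / (sxx * syy)  # one regressor: R^2 = squared correlation
    lm = n * r_squared
    return lm, chi2_sf_1df(lm)


# Check against the reported Obs*R-squared of 0.215330.
reported_p = chi2_sf_1df(0.215330)

# HYPOTHETICAL residuals: homoskedastic Gaussian noise, 242 observations.
random.seed(1)
sim_resid = [random.gauss(0.0, 0.06) for _ in range(242)]
lm_sim, p_sim = arch1_lm(sim_resid)

print(round(reported_p, 4), round(lm_sim, 4), round(p_sim, 4))
```

The tail probability for the reported statistic agrees with the 0.642622 shown in the output, far above any conventional significance level.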
Building The Model
Dependent Variable: DLNPETROP
Method: ML - ARCH
Date: 05/30/10 Time: 00:03
Sample(adjusted): 1989:03 2009:04
Included observations: 242 after adjusting endpoints
Convergence achieved after 97 iterations
Backcast: 1988:04 1989:02
Coefficient Std. Error z-Statistic Prob.
C 0.010161 0.007176 1.415926 0.1568
DD1 -0.121103 3.887236 -0.031154 0.9751
DD2 -0.273365 6.337982 -0.043131 0.9656
DD3 -0.550069 15.65666 -0.035133 0.9720
DD4 -0.920318 21.19041 -0.043431 0.9654
DD5 -1.212506 21.40283 -0.056652 0.9548
DSTEP -1.271241 21.42383 -0.059338 0.9527
AR(1) 0.508052 0.073322 6.929041 0.0000
AR(2) -0.196208 0.075139 -2.611254 0.0090
MA(10) 0.163558 0.066876 2.445691 0.0145
MA(11) 0.164319 0.066187 2.482657 0.0130
Variance Equation
C 0.000432 0.000555 0.777128 0.4371
ARCH(1) 0.067695 0.065380 1.035413 0.3005
GARCH(1) 0.803114 0.214145 3.750337 0.0002
R-squared 0.441553 Mean dependent var 0.004986
Adjusted R-squared 0.409711 S.D. dependent var 0.077414
S.E. of regression 0.059477 Akaike info criterion -2.771904
Sum squared resid 0.806566 Schwarz criterion -2.570065
Log likelihood 349.4004 F-statistic 13.86729
Durbin-Watson stat 1.961365 Prob(F-statistic) 0.000000
Inverted AR Roots .25+.36i .25 -.36i
Inverted MA Roots .86 -.26i .86+.26i .56 -.69i .56+.69i
.07 -.87i .07+.87i -.41 -.74i -.41+.74i
-.70 -.37i -.70+.37i -.75
To fix this slight structure, we tried adding ARCH/GARCH to the model
However, the dummy and step coefficients (DD1 to DD5 and DSTEP) became insignificant
We returned to an AR(1) AR(2) MA(10) MA(11) model
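For reference, the variance equation in the ARCH/GARCH output corresponds to the recursion h_t = C + ARCH(1) * e_{t-1}^2 + GARCH(1) * h_{t-1}. The sketch below plugs in the estimated coefficients; the residual values are hypothetical, and starting the recursion at the long-run variance is an assumption rather than what the software necessarily does.

```python
def garch11_variance(resid, omega, alpha, beta):
    """Conditional variances h_t = omega + alpha * e_{t-1}^2 + beta * h_{t-1}."""
    h = [omega / (1.0 - alpha - beta)]  # ASSUMPTION: start at the long-run variance
    for e in resid[:-1]:
        h.append(omega + alpha * e * e + beta * h[-1])
    return h


# Variance-equation estimates from the output above: C, ARCH(1), GARCH(1).
omega, alpha, beta = 0.000432, 0.067695, 0.803114

# Long-run (unconditional) variance implied by the estimates.
long_run_var = omega / (1.0 - alpha - beta)

example_resid = [0.05, -0.02, 0.10, 0.01]  # HYPOTHETICAL residuals
h = garch11_variance(example_resid, omega, alpha, beta)

print(round(long_run_var, 6), [round(v, 6) for v in h])
```

The implied long-run variance is close to the mean squared residual reported in the ARCH test (0.003338), and the ARCH(1) coefficient itself is insignificant (p = 0.3005), which fits the decision to drop the variance equation.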
Conclusion
There is an upward trend in the forecast suggesting an increase in future petroleum price
Because of this, companies that rely heavily on oil may want to hedge against rising prices
Example: Southwest Airlines (2007)