course paper on regression analysis of gold prices
Multiple Linear Regression Model
For Predicting GOLD Prices
Project report submitted to
Prof. Dhiman Bhadra
Associate: Kshitij G Trivedi
For the requirements of the course
Probability and Statistics – 3
On October 25, 2012
By
Group E12
Vineet Singh
Sonam Yadav
Saurabh Phelix Kachhap
Arun Kumar K
Rohit Raj Nikhil Pandey
Contents
1. Introduction
2. Data Description
3. Exploratory Analysis
4. Regression Modeling
a. Model Building
b. Coefficient of Multiple Determination
c. T-tests of Regression Coefficients
d. Residual Analysis
e. Multicollinearity Correction
f. Durbin Watson Test
g. Positive Autocorrelation Correction: Lagged Variable
h. Model Validation
5. Conclusion
6. Further Improvement
1. Introduction
Gold has attracted interest from all sorts of investors, including several central banks, the IMF,
hedge fund managers, and retail investors, especially in India. Over the last 5 years, gold has
generated cumulative returns of about 130%, versus about -9% for the S&P 500 Index. Given its
growing importance as an asset class, we intend to predict gold prices using the regression
techniques learnt during the course.
2. Data Description
Gold prices have been empirically observed to be related to several macroeconomic factors.
For our analysis we have considered the following key variables:
1. Value of Dollar (Euro v/s US Dollar): Since gold is priced in US dollars and the US dollar
is the reserve currency, demand for gold increases when the US dollar loses value.
2. Equity Indices (S&P 500 Index and BNY Mellon BRICs ADR Index): A lower
correlation between gold and equity indices increases the demand for gold as a
diversifier in a portfolio.
3. Commodity Prices (Thomson Reuters/Jefferies CRB Index): Rising commodity prices
signal that inflationary pressures in the economy are building and that the purchasing
power of the dollar is declining, so the gold price should be rising as well.
4. Monetary Policy (US M1 Money Supply): Expansive monetary policy increases the risk
of high inflation, thus increasing demand for gold as an inflation hedge.
5. Inflation Expectations (University of Michigan Expected Inflation): Rising inflation
increases demand for 'real' assets like gold, as investors seek to hedge themselves against
erosion in the real value of money.
6. Interest Rates (US Treasury Rate 1 year): Higher real interest rates increase the
opportunity cost of holding gold and decrease its demand.
Our data sources are listed in Table 1.

| S No | Factors affecting demand | Independent variable in multiple regression | Data source |
|---|---|---|---|
| 1 | Dollar Value | USD v/s EURO Exchange Rate | REUTERS |
| 2 | Equity Indices | S&P 500 Index; BNY Mellon BRICs ADR Index | REUTERS |
| 3 | Commodity Prices | TR/J CRB Index | REUTERS |
| 4 | Monetary Policy | US Money Supply M1 | REUTERS |
| 5 | Inflation Expectations | University of Michigan Expected Inflation | REUTERS |
| 6 | Interest Rates | US Treasury Rate 1 year | REUTERS |

Table 1: Variables and Data Sources
3. Exploratory Analysis
We used scatter plots to inspect whether a linear relationship exists between gold prices
and each of the explanatory variables identified.
Plotting the gold price against the Value of Dollar (USD/EUR Rate), Equity Indices
(S&P 500 Index and BNY Mellon BRICs ADR Index), Commodity Prices (CRB Index),
Monetary Policy (US M1 Money Supply) and Inflation Expectations, we obtained the following
plots.
[Figure: Scatterplot of Gold Price U$/Oz vs USD/EUR Rate]
[Figure: Scatterplot of Gold Price U$/Oz vs S&P 500 Index]
There seems to be a positive relationship between the gold price and each of the CRB Index,
the BRICs ADR Index and the US Money Supply M1: the gold price tends to increase when
any of these increases.
The scatter plot of the gold price against the value of the dollar shows a negative relationship:
when the value of the dollar decreases, the gold price increases.
For the Treasury Rate, the scatter plot obtained below does not allow us to draw any conclusion
about linearity.
We therefore plotted the gold price against Ln (Treasury Rate), the natural logarithm of the
Treasury rate. The resulting scatter plot shows an approximately linear relationship between
gold prices and Ln (Treasury Rate). Since the slope is negative, we can infer that the gold price
and the Treasury rate (in natural logarithmic terms) are negatively associated: as
Ln (Treasury Rate) increases, the gold price decreases.
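The effect of the logarithmic transformation can be illustrated with a small sketch. The data below are synthetic and chosen only to show the mechanics (they are not the project's gold or Treasury series), and the helper name `pearson_r` is an assumption made for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Synthetic illustration only: a response that falls linearly in ln(rate).
rates = [0.1, 0.2, 0.5, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
response = [1500 - 250 * math.log(r) for r in rates]

r_raw = pearson_r(rates, response)
r_log = pearson_r([math.log(r) for r in rates], response)

print(f"correlation with rate:     {r_raw:.3f}")
print(f"correlation with ln(rate): {r_log:.3f}")
```

In this constructed case the correlation with the raw rate is negative but weaker than the correlation with its logarithm, which is the pattern that motivates using Ln (Treasury Rate) as the predictor.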
[Figure: Scatterplot of Gold Price U$/Oz vs BRICs ADR Index]
[Figure: Scatterplot of Gold Price U$/Oz vs CRB Index]
[Figure: Scatterplot of Gold Price U$/Oz vs M1]
[Figure: Scatterplot of Gold Price U$/Oz vs Expected Inflation]
The following is the scatter plot matrix for the gold price data set.
Owing to the different ranges of the variables, no plot shows a perfect positive or
negative relationship, but several can be said to show a broadly positive relation,
as depicted below.
Some plots show a positive association and others a negative one, which can be misleading when
each plot is read in isolation. To get the correct picture, we need to control for the other
explanatory variables, and this is done with the help of a multiple regression model.
[Figure: Scatterplot of Gold Price U$/Oz vs Treasury Rate]
[Figure: Scatterplot of Gold Price U$/Oz vs LN_Treasury Rate]
[Figure: Matrix Plot of Explanatory Variables (USD/EUR Rate, S&P 500 Index, CRB Index, M1, BRICs ADR Index, Expected Inflation, LN_Treasury Rate)]
In addition to the scatter-plot matrix, a correlation matrix was constructed depicting the
correlation coefficients between the response and the predictors and also between the predictors.
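Each cell shows the correlation coefficient with its p-value in parentheses.

| | Gold Price U$/Oz | USD/EUR Rate | S&P 500 Index | CRB Index | M1 | BRICs ADR Index | Expected Inflation |
|---|---|---|---|---|---|---|---|
| USD/EUR Rate | -0.636 (0) | | | | | | |
| S&P 500 Index | 0.209 (0.022) | -0.445 (0) | | | | | |
| CRB Index | 0.602 (0) | -0.833 (0) | 0.707 (0) | | | | |
| M1 | 0.954 (0) | -0.573 (0) | 0.09 (0.331) | 0.476 (0) | | | |
| BRICs ADR Index | 0.814 (0) | -0.783 (0) | 0.578 (0) | 0.834 (0) | 0.659 (0) | | |
| Expected Inflation | 0.296 (0.001) | -0.501 (0) | 0.608 (0) | 0.794 (0) | 0.179 (0.052) | 0.54 (0) | |
| LN_Treasury Rate | -0.756 (0) | 0.245 (0.007) | 0.38 (0) | -0.03 (0.743) | -0.808 (0) | -0.35 (0) | 0.138 (0.136) |

Table 2: Correlation Matrix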
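The p-values in Table 2 test whether each correlation differs from zero. A minimal sketch of the usual test statistic, t = r * sqrt((n - 2) / (1 - r^2)), is given below for two entries of the table. It assumes that the correlations are based on the same n = 119 observations as the regression; the function name is chosen for this example only.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, referred to a t distribution with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

n = 119  # assumption: same number of observations as the regression

# Correlations with the gold price, taken from Table 2.
for name, r in [("S&P 500 Index", 0.209), ("M1", 0.954)]:
    t = correlation_t_statistic(r, n)
    print(f"{name}: r = {r:+.3f}, t = {t:.2f} on {n - 2} df")
```

A weak correlation such as 0.209 can still be statistically significant with 119 observations, which is why the p-values should be read together with the size of the coefficients.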
4. Regression Modeling
a. Model Building
In the scatter plots we related the gold price to a single explanatory variable at a time. However,
in real-life applications, the gold price will depend on more than one explanatory
variable. Hence, all the explanatory variables were taken into account at once to estimate the
value of the response. Performing multiple regression in MINITAB, we got the following
regression equation:
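MODEL A:
$$
\begin{aligned}
\text{Gold Price} = {} & -1498 + 572\,(\text{USD/EUR Rate}) - 0.0912\,(\text{S\&P 500 Index}) \\
& + 1.26\,(\text{CRB Index}) + 0.955\,(\text{M1}) + 0.0830\,(\text{BRICs ADR Index}) \\
& - 30.6\,(\text{Expected Inflation}) - 64.7\,\ln(\text{Treasury Rate})
\end{aligned}
$$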
Effect of value of dollar (USD/EUR): The slope of the USD/EUR rate is positive, so, controlling
for the rest of the explanatory variables, the gold price is positively associated with the value
of the dollar. Specifically, the predicted gold price increases by US$572 per ounce for every
one-unit increase in the USD/EUR rate. This sign is opposite to that of the simple correlation in
Table 2 (-0.636), which can happen once the other predictors are held fixed.
Effect of value of S&P Index: The slope of the S&P 500 Index is negative, so, controlling for the
rest of the explanatory variables, the gold price is negatively associated with the S&P 500 Index.
Specifically, the predicted gold price decreases by US$0.0912 for every one-point increase in the
S&P 500 Index.
Effect of value of CRB Index: The slope of the CRB Index is positive, so, controlling for the rest
of the explanatory variables, the gold price is positively associated with the CRB Index.
Specifically, the predicted gold price increases by US$1.26 for every one-point increase in the
CRB Index.
Effect of value of Money supply (M1): The slope of the money supply is positive, so,
controlling for the rest of the explanatory variables, the gold price is positively
associated with the money supply. Specifically, the predicted gold price increases by US$0.955
for every one-unit increase in M1.
Effect of value of BRICs ADR index: The slope of the BRICs ADR Index is positive, so,
controlling for the rest of the explanatory variables, the gold price is positively
associated with the BRICs ADR Index. Specifically, the predicted gold price increases by
US$0.0830 for every one-point increase in the BRICs ADR Index.
Effect of value of inflation: The slope of expected inflation is negative, so, controlling for the
rest of the explanatory variables, the gold price is negatively associated with expected
inflation. Specifically, the predicted gold price decreases by US$30.6 for every one-unit increase
in expected inflation.
Effect of value of US Treasury Rate: The slope of the natural log of the US Treasury rate is
negative, so, controlling for the rest of the explanatory variables, the gold price is
negatively associated with the US Treasury rate. Specifically, the predicted gold price decreases
by US$64.7 for every one-unit increase in the natural log of the US Treasury rate, that is,
whenever the rate is multiplied by e (about 2.72).
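The interpretation of the slopes can be checked numerically. The sketch below evaluates Model A with the coefficients reported in Table 4; the input values are hypothetical and serve only to show that raising one predictor by one unit, with the others held fixed, changes the prediction by exactly its coefficient. The function and dictionary names are assumptions made for this example.

```python
import math

# Coefficients of Model A as reported in Table 4.
MODEL_A = {
    "intercept": -1498.0,
    "usd_eur": 571.63,
    "sp500": -0.091,
    "brics_adr": 0.083,
    "crb": 1.261,
    "m1": 0.955,
    "expected_inflation": -30.55,
    "ln_treasury": -64.73,
}

def predict_model_a(usd_eur, sp500, brics_adr, crb, m1,
                    expected_inflation, treasury_rate):
    """Predicted gold price (US$/oz) from Model A; treasury_rate must be > 0."""
    c = MODEL_A
    return (c["intercept"]
            + c["usd_eur"] * usd_eur
            + c["sp500"] * sp500
            + c["brics_adr"] * brics_adr
            + c["crb"] * crb
            + c["m1"] * m1
            + c["expected_inflation"] * expected_inflation
            + c["ln_treasury"] * math.log(treasury_rate))

# Hypothetical inputs, chosen only to show the mechanics.
base = dict(usd_eur=0.80, sp500=1400.0, brics_adr=3000.0, crb=300.0,
            m1=2200.0, expected_inflation=3.0, treasury_rate=0.2)

price = predict_model_a(**base)
bumped = predict_model_a(**{**base, "m1": base["m1"] + 1})

print(f"predicted price:          {price:.2f}")
print(f"effect of one unit of M1: {bumped - price:.3f}")
```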
b. Coefficient of Multiple Determination
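This coefficient measures the proportion of variation in gold prices that is simultaneously
explained by the set of predictors (USD/EUR, S&P 500, Expected Inflation, BRICs ADR Index,
CRB Index, US Money Supply M1 & US Treasury Rate). It is the multiple-regression analogue of
the R² used in the simple regression setup. By construction, 0 ≤ R² ≤ 1, with higher values of
R² indicating a better fitting model and vice versa. R² is given by
$$
R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}
$$
where SSR, SSE and SST are the regression, residual (error) and total sums of squares.
However, R² can never decrease when additional predictor variables are added to the model,
even when they add little explanatory value.
Increasing the predictors will also increase the number of parameters and the computational cost.
In order to achieve a tradeoff between these two factors, an adjusted coefficient of multiple
determination is used:
$$
R^2_{\text{adj}} = 1 - \frac{SSE/(n-p-1)}{SST/(n-1)}
$$
where n is the number of observations and p is the number of predictors.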
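The following results were obtained:

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 7 | 17463193 | 2494742 | 981.56 | 0 |
| Residual Error | 111 | 282119 | 2542 | | |
| Total | 118 | 17745312 | | | |

Table 3: ANOVA
So we have
$$
R^2 = \frac{17463193}{17745312} \approx 0.9841, \qquad
R^2_{\text{adj}} = 1 - \frac{282119/111}{17745312/118} \approx 0.9831
$$
Thus USD/EUR, S&P 500, Expected Inflation, BRICs ADR Index, CRB Index, US
Money Supply M1 & US Treasury Rate together explain about 98.41% of the total variation in the gold price.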
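The quantities derived from Table 3 can be recomputed from its regression and residual sums of squares alone. The sketch below does this; the variable names are chosen for this example, and the total sum of squares is taken as the sum of the two components.

```python
# Sums of squares and degrees of freedom from Table 3.
ss_regression = 17463193.0
ss_residual = 282119.0
df_regression = 7      # number of predictors p
df_residual = 111      # n - p - 1 with n = 119

ss_total = ss_regression + ss_residual
df_total = df_regression + df_residual

r_squared = ss_regression / ss_total
adj_r_squared = 1 - (ss_residual / df_residual) / (ss_total / df_total)
f_statistic = (ss_regression / df_regression) / (ss_residual / df_residual)

print(f"R-squared:          {r_squared:.4f}")
print(f"adjusted R-squared: {adj_r_squared:.4f}")
print(f"F statistic:        {f_statistic:.2f}")
```

The F statistic obtained this way agrees with the value reported in Table 3, which is a useful check that the sums of squares have been read correctly.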
c. T-tests of Regression Coefficients
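To test the significance of each predictor, the null and alternative hypotheses are
$$
H_0: \beta_j = 0 \quad \text{versus} \quad H_1: \beta_j \neq 0, \qquad j = 1, \dots, 7
$$
The test statistic is given by
$$
t_j = \frac{b_j}{SE(b_j)}
$$
where b_j is the estimated coefficient of the j-th predictor; under the null hypothesis it follows
a t distribution with n - p - 1 = 111 degrees of freedom.
The following results were obtained: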
| Predictor | Coefficient | SE Coefficient | T statistic | P Value | VIF |
|---|---|---|---|---|---|
| Constant | -1498.000 | 112.900 | -13.270 | 0.000 | |
| USD/EUR Rate | 571.630 | 91.170 | 6.270 | 0.000 | 5.019 |
| S&P 500 Index | -0.091 | 0.054 | -1.700 | 0.092 | 4.659 |
| BRICs ADR Index | 0.083 | 0.007 | 11.530 | 0.000 | 7.371 |
| CRB Index | 1.261 | 0.292 | 4.320 | 0.000 | 19.931 |
| M1 | 0.955 | 0.063 | 15.230 | 0.000 | 9.475 |
| Expected Inflation | -30.550 | 14.240 | -2.140 | 0.034 | 4.432 |
| Ln (Treasury Rate) | -64.730 | 14.390 | -4.500 | 0.000 | 11.337 |

Table 4: Regression Diagnostics
At the 10% significance level, we can observe from the table that every p-value is below 0.10.
Hence, for each predictor, we reject the null hypothesis in favour of the alternative. This means
that every predictor has a significant effect on the gold price at the 10% significance level.
However, at the 5% significance level, the S&P 500 Index has a p-value of 0.092, which exceeds
0.05, and hence it is not significant.
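The decisions above follow mechanically from Table 4. The sketch below recomputes each t statistic as coefficient divided by standard error and lists the predictors that are significant at a given level; the names `TABLE_4` and `significant_predictors` are assumptions made for this example. Because the table entries are rounded, the recomputed t statistics can differ slightly from the reported ones.

```python
# (coefficient, standard error, p-value) as reported in Table 4.
TABLE_4 = {
    "USD/EUR Rate": (571.63, 91.17, 0.000),
    "S&P 500 Index": (-0.091, 0.054, 0.092),
    "BRICs ADR Index": (0.083, 0.007, 0.000),
    "CRB Index": (1.261, 0.292, 0.000),
    "M1": (0.955, 0.063, 0.000),
    "Expected Inflation": (-30.55, 14.24, 0.034),
    "Ln (Treasury Rate)": (-64.73, 14.39, 0.000),
}

def significant_predictors(table, alpha):
    """Names of predictors whose p-value is below the significance level."""
    return [name for name, (_, _, p) in table.items() if p < alpha]

for name, (coef, se, p) in TABLE_4.items():
    t = coef / se  # recomputed from rounded inputs
    print(f"{name:20s} t = {t:7.2f}  p = {p:.3f}")

print("significant at 10%:", significant_predictors(TABLE_4, 0.10))
print("significant at 5%: ", significant_predictors(TABLE_4, 0.05))
```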
d. Residual Analysis
To test the appropriateness of the multiple regression model we used the following procedure:
1) The standardized residuals were plotted against the fitted values to check the linearity of
the regression model and the constancy of the error variance. The residuals fluctuate more or
less randomly about 0 with no noticeable trend or change in spread. Hence we conclude that the
gold price can be assumed to be linearly related to the predictors and that the error variance
is roughly constant.
2) To check the validity of the normal distributional assumption, a histogram and a normal
probability plot of the residuals were drawn. The following plots were obtained:
[Figure: Versus Fits (response is Gold Price U$/Oz): Standardized Residual vs Fitted Value]
[Figure: Normal Probability Plot (response is Gold Price U$/Oz): Percent vs Standardized Residual]
The above plots indicate that the errors can be assumed to have a symmetric, bell-shaped
distribution. In the normal probability plot the points lie close to a straight line, so the
error distribution can be assumed to be normal.
e. Multicollinearity Correction
To determine whether any of the variables in the model should be removed because of
multicollinearity, stepwise regression was carried out with the settings given below.
Stepwise Regression: Forward Selection and Backward Elimination
Alpha-to-Enter: 0.05, Alpha-to-Remove: 0.05
Response is Gold Price U$/Oz on 7 predictors, with Number of Observations = 119
The stepwise regression terminated at the end of six steps (Table 5). The regression
equation identified using MINITAB was
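MODEL B:
$$
\begin{aligned}
\text{Gold Price} = {} & -1486.3 + 0.914\,(\text{M1}) + 0.0767\,(\text{BRICs ADR Index}) \\
& - 80.1\,\ln(\text{Treasury Rate}) + 542\,(\text{USD/EUR Rate}) + 1.26\,(\text{CRB Index}) \\
& - 33\,(\text{Expected Inflation})
\end{aligned}
$$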
[Figure: Histogram (response is Gold Price U$/Oz): Frequency vs Standardized Residual]
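| Step | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Constant | -1650.6 | -1338.4 | -965.7 | -1314.4 | -1460.6 | -1486.3 |
| M1 | 1.624 | 1.257 | 0.992 | 1.049 | 0.963 | 0.914 |
| T-Value | 34.56 | 35.47 | 18.25 | 19.58 | 17.35 | 15.67 |
| P-Value | 0 | 0 | 0 | 0 | 0 | 0 |
| BRICs ADR Index | | 0.0725 | 0.0834 | 0.0948 | 0.0808 | 0.0767 |
| T-Value | | 15.72 | 18.78 | 18.39 | 13.19 | 12.29 |
| P-Value | | 0 | 0 | 0 | 0 | 0 |
| LN_Treasury Rate | | | -54.6 | -45.7 | -71.1 | -80.1 |
| T-Value | | | -5.97 | -5.09 | -6.58 | -7.11 |
| P-Value | | | 0 | 0 | 0 | 0 |
| USD/EUR Rate | | | | 283 | 449 | 542 |
| T-Value | | | | 3.81 | 5.43 | 6.01 |
| P-Value | | | | 0 | 0 | 0 |
| CRB Index | | | | | 0.72 | 1.26 |
| T-Value | | | | | 3.8 | 4.27 |
| P-Value | | | | | 0 | 0 |
| Expected Inflation | | | | | | -33 |
| T-Value | | | | | | -2.35 |
| P-Value | | | | | | 0.021 |
| S | 116 | 66 | 57.9 | 54.8 | 51.8 | 50.8 |
| R-Sq | 91.08 | 97.15 | 97.82 | 98.07 | 98.29 | 98.37 |
| R-Sq(adj) | 91 | 97.1 | 97.77 | 98 | 98.21 | 98.28 |
| Mallows Cp | 507.8 | 86 | 40.9 | 25.8 | 12.5 | 8.9 |

Table 5: Stepwise Regression Output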
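The VIF column of Table 4 gives a direct view of the multicollinearity addressed in this section. Since VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing predictor j on the other predictors, each VIF can be converted back into that R_j^2. The sketch below does so and flags predictors above a threshold of 10; this threshold is a common rule of thumb and an assumption of this example, not a value used in the report.

```python
# Variance inflation factors as reported in Table 4.
VIF = {
    "USD/EUR Rate": 5.019,
    "S&P 500 Index": 4.659,
    "BRICs ADR Index": 7.371,
    "CRB Index": 19.931,
    "M1": 9.475,
    "Expected Inflation": 4.432,
    "Ln (Treasury Rate)": 11.337,
}

VIF_THRESHOLD = 10.0  # assumption: a common rule of thumb

def implied_r_squared(vif):
    """R^2 of a predictor regressed on the other predictors, from its VIF."""
    return 1 - 1 / vif

flagged = [name for name, v in VIF.items() if v > VIF_THRESHOLD]

for name, v in VIF.items():
    print(f"{name:20s} VIF = {v:6.3f}  implied R^2 = {implied_r_squared(v):.3f}")
print("above threshold:", flagged)
```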
f. Durbin Watson Test
To detect the presence of autocorrelation in the residuals, the Durbin-Watson test was performed.
The Durbin-Watson test statistic obtained from MINITAB was 0.729223. The critical values dL and
dU for k = 6 predictors and n = 119 observations were obtained by interpolating between the
tabulated values for n = 100 and n = 150.

| For k=6 | n | dL | dU |
|---|---|---|---|
| From D-W Tables | 100 | 1.421 | 1.67 |
| From D-W Tables | 150 | 1.543 | 1.708 |
| By Interpolation | 119 | 1.499 | 1.694 |

Table 6: Durbin Watson Test
Our Durbin-Watson statistic denotes high positive autocorrelation because it lies below the
lower critical value:
0.729 < dL < dU
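For reference, the Durbin-Watson statistic is the sum of squared successive differences of the residuals divided by the sum of squared residuals, and the decision compares it with dL and dU. The sketch below implements both steps under the standard decision rule; the residual series is synthetic and the function names are assumptions made for this example, while the critical values are those of Table 6.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def dw_decision(d, d_lower, d_upper):
    """Classify a Durbin-Watson statistic using the bounds dL and dU."""
    if d < d_lower:
        return "positive autocorrelation"
    if d > 4 - d_lower:
        return "negative autocorrelation"
    if d_upper < d < 4 - d_upper:
        return "no evidence of autocorrelation"
    return "inconclusive"

# Synthetic residuals that drift slowly, as positively autocorrelated errors do.
smooth = [1.0, 1.2, 1.1, 0.9, 0.4, -0.2, -0.8, -1.1, -1.0, -0.5]
print(f"DW of synthetic residuals: {durbin_watson(smooth):.3f}")

# Statistic reported by MINITAB, with the bounds from Table 6.
print(dw_decision(0.729, 1.499, 1.694))
```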
g. Positive Autocorrelation Correction: Lagged Variable
To correct for autocorrelation we introduce a lagged variable 'Gold Price (-1)', the gold price
for the previous month. The whole stepwise regression process was repeated with the lagged
variable 'Gold Price (-1)' included, and this time the stepwise regression terminated at the end
of 7 steps. The following regression was obtained with the new variables:
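MODEL C:
$$
\begin{aligned}
\text{Gold Price} = {} & -731.6 + 0.533\,(\text{Gold Price}(-1)) + 0.402\,(\text{M1}) \\
& + 0.0371\,(\text{BRICs ADR Index}) - 45.3\,\ln(\text{Treasury Rate}) + 325\,(\text{USD/EUR Rate}) \\
& + 0.83\,(\text{CRB Index}) - 29\,(\text{Expected Inflation})
\end{aligned}
$$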
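| Step | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Constant | 1.512 | -185.654 | -433.719 | -366.181 | -564.617 | -686.812 | -731.6 |
| Gold Price (-1) | 1.015 | 0.914 | 0.696 | 0.617 | 0.581 | 0.543 | 0.533 |
| T-Value | 84.36 | 24.21 | 12.26 | 10.51 | 9.68 | 8.89 | 8.93 |
| P-Value | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| M1 | | 0.177 | 0.401 | 0.377 | 0.436 | 0.434 | 0.402 |
| T-Value | | 2.82 | 5.43 | 5.32 | 5.83 | 5.92 | 5.54 |
| P-Value | | 0.006 | 0 | 0 | 0 | 0 | 0 |
| BRICs ADR Index | | | 0.0242 | 0.0346 | 0.043 | 0.0394 | 0.0371 |
| T-Value | | | 4.83 | 6.12 | 6.37 | 5.82 | 5.57 |
| P-Value | | | 0 | 0 | 0 | 0 | 0 |
| LN_Treasury Rate | | | | -24.8 | -22.8 | -36.9 | -45.3 |
| T-Value | | | | -3.46 | -3.22 | -4.02 | -4.76 |
| P-Value | | | | 0.001 | 0.002 | 0 | 0 |
| USD/EUR Rate | | | | | 139 | 231 | 325 |
| T-Value | | | | | 2.19 | 3.15 | 4.06 |
| P-Value | | | | | 0.03 | 0.002 | 0 |
| CRB Index | | | | | | 0.36 | 0.83 |
| T-Value | | | | | | 2.35 | 3.55 |
| P-Value | | | | | | 0.021 | 0.001 |
| Expected Inflation | | | | | | | -29 |
| T-Value | | | | | | | -2.61 |
| P-Value | | | | | | | 0.01 |
| S | 49 | 47.5 | 43.5 | 41.5 | 40.8 | 40 | 39 |
| R-Sq | 98.41 | 98.51 | 98.77 | 98.89 | 98.93 | 98.98 | 99.04 |
| R-Sq(adj) | 98.4 | 98.49 | 98.73 | 98.85 | 98.88 | 98.93 | 98.98 |
| Mallows Cp | 66.7 | 57 | 30.3 | 18.8 | 15.6 | 11.8 | 7 |

Table 7: Stepwise Regression with Lagged Variable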
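The lagged variable and the resulting model can be sketched as follows. The first function pairs each month's price with the previous month's price, which drops the first observation; the second evaluates the step 7 coefficients of Table 7. All input values are hypothetical and the names `with_lag`, `MODEL_C` and `predict_model_c` are assumptions made for this example.

```python
import math

def with_lag(series, lag=1):
    """Pair each value with the value `lag` periods earlier.

    The first `lag` observations are dropped because they have no predecessor.
    """
    return [(series[t], series[t - lag]) for t in range(lag, len(series))]

# Step 7 coefficients from Table 7.
MODEL_C = {
    "intercept": -731.6,
    "gold_lag1": 0.533,
    "m1": 0.402,
    "brics_adr": 0.0371,
    "ln_treasury": -45.3,
    "usd_eur": 325.0,
    "crb": 0.83,
    "expected_inflation": -29.0,
}

def predict_model_c(gold_lag1, m1, brics_adr, treasury_rate,
                    usd_eur, crb, expected_inflation):
    """One-month-ahead gold price (US$/oz) from Model C; treasury_rate > 0."""
    c = MODEL_C
    return (c["intercept"]
            + c["gold_lag1"] * gold_lag1
            + c["m1"] * m1
            + c["brics_adr"] * brics_adr
            + c["ln_treasury"] * math.log(treasury_rate)
            + c["usd_eur"] * usd_eur
            + c["crb"] * crb
            + c["expected_inflation"] * expected_inflation)

# Hypothetical monthly prices, for illustration only.
gold = [1650.0, 1700.0, 1680.0, 1720.0]
rows = with_lag(gold)  # (current price, previous price) pairs

forecast = predict_model_c(gold_lag1=1700.0, m1=2200.0, brics_adr=3000.0,
                           treasury_rate=0.2, usd_eur=0.80, crb=300.0,
                           expected_inflation=3.0)
print(rows)
print(f"forecast: {forecast:.2f}")
```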
The Durbin-Watson test was performed on the new regression equation in Model C, and we
obtained a Durbin-Watson statistic of 1.96059. This implies that there is no evidence of
autocorrelation, because the statistic lies above the upper critical value and close to 2:
dL < dU < 1.96059 < 4 - dU
| For k=6 | n | dL | dU |
|---|---|---|---|
| From D-W Tables | 100 | 1.421 | 1.67 |
| From D-W Tables | 150 | 1.543 | 1.708 |
| By Interpolation | 119 | 1.499 | 1.694 |

Table 8: Durbin-Watson Test for Model C
Hence, the final multiple regression model to predict the gold price is as follows:
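$$
\begin{aligned}
\text{Gold Price} = {} & -731.6 + 0.533\,(\text{Gold Price}(-1)) + 0.402\,(\text{M1}) \\
& + 0.0371\,(\text{BRICs ADR Index}) - 45.3\,\ln(\text{Treasury Rate}) + 325\,(\text{USD/EUR Rate}) \\
& + 0.83\,(\text{CRB Index}) - 29\,(\text{Expected Inflation})
\end{aligned}
$$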
h. Model Validation
The gold prices were predicted using each of the three models discussed above and were
compared with the actual prices to assess the forecast accuracy.
[Figure: Model Validation. Actual gold price and the predictions of Model A, Model B and Model C, monthly from 01/12/11 to 01/09/12; vertical axis from 1500 to 1950]
Model C, obtained after correcting for positive autocorrelation, is finally used to predict gold
prices because, as observed in the chart above, it has better forecasting accuracy than
Model A and Model B.
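The comparison in the chart is visual; a numerical summary such as the root mean squared error (RMSE) or the mean absolute percentage error (MAPE) could complement it. The sketch below shows how such a comparison might be computed. These metrics are an assumption of this example rather than part of the analysis above, and every number in it is a hypothetical placeholder, not one of the project's forecasts.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted))
                     / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error, in percent; actual values must be nonzero."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical values for illustration only.
actual = [1650.0, 1700.0, 1600.0]
forecasts = {
    "Model A": [1750.0, 1600.0, 1700.0],
    "Model C": [1640.0, 1690.0, 1620.0],
}

for model, predicted in forecasts.items():
    print(f"{model}: RMSE = {rmse(actual, predicted):.1f}, "
          f"MAPE = {mape(actual, predicted):.2f}%")

best = min(forecasts, key=lambda m: rmse(actual, forecasts[m]))
print("lowest RMSE:", best)
```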
5. Conclusion
Forecasting gold prices can be useful for investors and policy makers. We used
multiple linear regression to develop Model A, which predicts gold prices based on the Exchange
Rate (USD/EUR), S&P 500 Index, BRICs ADR Index, Commodity Prices (CRB Index), Money
Supply (M1), Expected Inflation and Ln(Treasury Rate). We then performed stepwise regression to
correct for multicollinearity and obtained Model B. However, the Durbin-Watson test gave us
evidence of positive autocorrelation, which we corrected by adding a lagged variable, the gold
price for the previous period (Gold Price (-1)). We finally obtained Model C, which has better
predictive power than Model A and Model B.
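MODEL C:
$$
\begin{aligned}
\text{Gold Price} = {} & -731.6 + 0.533\,(\text{Gold Price}(-1)) + 0.402\,(\text{M1}) \\
& + 0.0371\,(\text{BRICs ADR Index}) - 45.3\,\ln(\text{Treasury Rate}) + 325\,(\text{USD/EUR Rate}) \\
& + 0.83\,(\text{CRB Index}) - 29\,(\text{Expected Inflation})
\end{aligned}
$$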
6. Further Improvement
In our study we have not applied a correction for heteroskedasticity, as we assumed the error
variance was almost constant across observations. Also, a more sophisticated regression model
could be obtained by choosing, for each explanatory variable, the lag at which its correlation
with the gold price is maximized.
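One way to search for such a lag is to compute the correlation between the response and the predictor shifted by 0, 1, 2, ... periods and keep the lag with the largest absolute correlation. The sketch below illustrates the idea on synthetic data in which the response follows the predictor with a delay of two periods; the function names and the data are assumptions made for this example.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def best_lag(response, predictor, max_lag):
    """Return (lag, r) maximizing |corr(response_t, predictor_{t-lag})|."""
    correlations = {}
    for lag in range(max_lag + 1):
        y = response[lag:]                        # response from period `lag` on
        x = predictor[:len(predictor) - lag]      # predictor `lag` periods earlier
        correlations[lag] = pearson_r(x, y)
    lag = max(correlations, key=lambda k: abs(correlations[k]))
    return lag, correlations[lag]

# Synthetic data: the response reacts to the predictor two periods later.
predictor = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0, 5.0, 3.0, 5.0, 8.0]
response = [0.0, 0.0] + [2 * x + 1 for x in predictor[:-2]]

lag, r = best_lag(response, predictor, max_lag=4)
print(f"best lag = {lag}, correlation = {r:.3f}")
```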