FORECASTING WITH REGRESSION MODELS
TREND ANALYSIS
BUSINESS FORECASTING
Prof. Dr. Burç Ülengin
ITU Management Engineering Faculty
Fall 2011
OVERVIEW
• The bivariate regression model
• Data inspection
• Regression forecast process
• Forecasting with simple linear trend
• Causal regression model
• Statistical evaluation of regression model
• Examples...
The Bivariate Regression Model
The bivariate regression model is also known as the simple regression model.
It is a statistical tool that estimates the relationship between a dependent variable (Y) and a single independent variable (X).
The dependent variable is the variable we want to forecast.
The Bivariate Regression Model
General form: Y = f(X)
Y: dependent variable
X: independent variable
Specific form, the linear regression model:
Y = β0 + β1X + ε
ε: random disturbance
The Bivariate Regression Model
Y = β0 + β1X + ε
• The regression model is in fact a line equation.
• β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
• If β1 = 5, a one-unit increase in X causes a 5-unit increase in Y.
• ε is the random disturbance; because of it, Y can take different values for a given X.
• The objective is to estimate β0 and β1 in such a way that the fitted values are as close as possible to the observed values.
The Bivariate Regression ModelGeometrical Representation
[Scatter plot of Y against X with two candidate lines: a poor fit and a good fit.]
The red line is closer to the data points than the blue one.
Best Fit Estimates
Population regression model: Y = β0 + β1X + ε
Sample regression model: Ŷ = b0 + b1X
Error term: e = Y − Ŷ
Ordinary Least Squares (OLS) estimate:
min Σe² = Σ(Y − Ŷ)² = Σ(Y − b0 − b1X)²
Best Fit Estimates-OLS
Ordinary Least Squares (OLS) estimate:
min Σe² = Σ(Y − Ŷ)² = Σ(Y − b0 − b1X)²
b1 = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²)
b0 = Ȳ − b1·X̄
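The closed-form OLS estimators can be computed directly from data. A minimal sketch, using a small hypothetical series (not the car-sales data discussed later):

```python
import numpy as np

# Hypothetical data for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.1, 5.9, 8.2, 9.9])
n = len(X)

# b1 = (sum(XY) - n*Xbar*Ybar) / (sum(X^2) - n*Xbar^2); b0 = Ybar - b1*Xbar
b1 = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
b0 = Y.mean() - b1 * X.mean()

fitted = b0 + b1 * X
residuals = Y - fitted   # e = Y - Yhat
```

For this series the slope works out to 1.97 and the intercept to 0.13, so each unit increase in X raises the fitted Y by about 1.97.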
Misleading Best Fits
[Four scatter plots of Y against X, each with the same sum of squared errors, Σe² = 100, but very different data patterns: an identical fit statistic can hide a misleading fit.]
THE CLASSICAL ASSUMPTIONS
1. The regression model is linear in the coefficients, correctly specified, and has an additive error term.
2. E(ε) = 0.
3. All explanatory variables are uncorrelated with the error term.
4. Errors corresponding to different observations are uncorrelated with each other.
5. The error term has a constant variance.
6. No explanatory variable is an exact linear function of any other explanatory variable(s).
7. The error term is normally distributed such that εᵢ ~ iid N(0, σ²).
Regression Forecasting Process
Data considerations: plot the graph of each variable over time and as a scatter plot; look at trend, seasonal fluctuation, and outliers.
To forecast Y we need the forecasted value of X: Ŷ(T+1) = b0 + b1·X(T+1)
Reserve a holdout period for evaluation and test the estimated equation over the holdout period.
An Example: Retail Car Sales
The main explanatory variables:
• Income
• Price of a car
• Interest rates and credit usage
• General price level
• Population
• Car park (number of cars sold up to time t) and replacement purchases
• Expectations about the future
For a simple bivariate regression, income is chosen as the explanatory variable.
Bi-variate Regression Model
Population regression model: RCS_t = β0 + β1·DPI_t + ε_t
Our expectation is β1 > 0.
But we do not have all available data at hand; the data set only covers the 1990s.
We have to estimate the model over the sample period.
Sample regression model: RCS_t = b0 + b1·DPI_t + e_t
Retail Car Sales and Disposable Personal Income Figures
[Time-series plot, 1990–1998: quarterly retail car sales (RCS, thousand cars) and disposable personal income (DPI, $).]
OLS Estimate
Dependent Variable: RCS
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C 541010.9 746347.9 0.724878 0.4735
DPI 62.39428 40.00793 1.559548 0.1281
R-squared 0.066759 Mean dependent var 1704222.
Adjusted R-squared 0.039311 S.D. dependent var 164399.9
S.E. of regression 161136.1 Akaike info criterion 26.87184
Sum squared resid 8.83E+11 Schwarz criterion 26.95981
Log likelihood -481.6931 F-statistic 2.432189
Durbin-Watson stat 1.596908 Prob(F-statistic) 0.128128
Fitted model: RCS = b0 + b1·DPI
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
When DPI increases by one dollar, the number of cars sold increases by about 62.
Hypothesis test related to β1: H0: β1 = 0, H1: β1 ≠ 0; the t test is used to test the validity of H0.
t = b1/se(b1)
• If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
• t = 1.56 < t table, or Pr = 0.1281 > 0.05 → Do not reject H0
• DPI has no statistically significant effect on RCS.
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
R² = 0 indicates no explanatory power of X (the equation).
R² = 1 indicates perfect explanation of Y by X (the equation).
R² = 0.066 indicates very weak explanatory power.
Hypothesis test related to R²: H0: R² = 0, H1: R² ≠ 0; the F test checks this hypothesis.
• If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
• F-statistic = 2.43 < F table, or Pr = 0.1281 > 0.05 → Do not reject H0
• The estimated equation has no power to explain the RCS figures.
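The t and F decision rules can be reproduced numerically from the regression output. A sketch using the coefficient and standard error reported above (scipy is assumed to be available):

```python
from scipy import stats

b1, se_b1 = 62.39428, 40.00793   # slope and standard error from the RCS/DPI output
n, k = 36, 1                      # observations, number of explanatory variables
df = n - k - 1                    # residual degrees of freedom

t_stat = b1 / se_b1
p_t = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value for H0: beta1 = 0

f_stat = t_stat ** 2                     # with a single regressor, F = t^2
p_f = stats.f.sf(f_stat, k, df)          # p-value for H0: R^2 = 0
```

Both p-values come out at about 0.128, matching the Prob. columns of the output, so H0 is not rejected at α = 0.05.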
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1990–1998.]
Residuals show a clear seasonal pattern.
Model Improvement
When we look at the graphs of the series, RCS exhibits clear seasonal fluctuations, but DPI does not.
Remove the seasonality using a seasonal adjustment method.
Then use the seasonally adjusted RCS as the dependent variable.
Seasonal Adjustment
Sample: 1990:1 1998:4
Included observations: 36
Ratio to Moving Average
Original Series: RCS
Adjusted Series: RCSSA
Scaling Factors:
  1   0.941503
  2   1.119916
  3   1.016419
  4   0.933083
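The ratio-to-moving-average idea can be sketched in a few lines: compute a centered moving average, average the ratios by quarter, and divide the series by its quarter's factor. This is a simplified version; details such as the normalization of the factors may differ from EViews.

```python
import numpy as np

def ratio_to_moving_average(y, s=4):
    """Multiplicative seasonal adjustment for a series with period s (s=4: quarterly)."""
    y = np.asarray(y, dtype=float)
    half = s // 2
    # Centered moving average: mean of two adjacent s-term averages
    cma = np.full(len(y), np.nan)
    for t in range(half, len(y) - half):
        cma[t] = (y[t - half:t + half].mean() + y[t - half + 1:t + half + 1].mean()) / 2
    ratios = y / cma
    # Average the ratios season by season, then rescale so the factors average to 1
    factors = np.array([np.nanmean(ratios[q::s]) for q in range(s)])
    factors *= s / factors.sum()
    adjusted = y / factors[np.arange(len(y)) % s]
    return adjusted, factors
```

Applied to a series with a linear trend and known quarterly factors, the function recovers factors close to the true ones and returns the de-seasonalized series.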
Seasonally Adjusted RCS and RCS
[RCS and its seasonally adjusted counterpart RCSSA, 1990–1998.]
OLS Estimate
Dependent Variable: RCSSA
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C 481394.3 464812.8 1.035674 0.3077
DPI 65.36559 24.91626 2.623411 0.0129
R-squared 0.168344 Mean dependent var 1700000.
Adjusted R-squared 0.143883 S.D. dependent var 108458.4
S.E. of regression 100352.8 Akaike info criterion 25.92472
Sum squared resid 3.42E+11 Schwarz criterion 26.01270
Log likelihood -464.6450 F-statistic 6.882286
Durbin-Watson stat 0.693102 Prob(F-statistic) 0.012939
Fitted model: RCSSA = b0 + b1·DPI
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
When DPI increases by one dollar, the number of cars sold increases by about 65.
Hypothesis test related to β1: H0: β1 = 0, H1: β1 ≠ 0; the t test is used to test the validity of H0.
t = b1/se(b1)
• If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
• t = 2.62 > t table, or Pr = 0.012 < 0.05 → Reject H0
• DPI has a statistically significant effect on RCS.
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
• R² = 0 indicates no explanatory power of X (the equation).
• R² = 1 indicates perfect explanation of Y by X (the equation).
• R² = 0.1683 indicates weak explanatory power.
Hypothesis test related to R²: H0: R² = 0, H1: R² ≠ 0; the F test checks this hypothesis.
• If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
• F-statistic = 6.88 > F table, or Pr = 0.012 < 0.05 → Reject H0
• The estimated equation has some power to explain the RCS figures.
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted model, 1990–1998.]
No seasonality remains, but the residuals still do not look like a random disturbance.
Omitted variable? Business cycle?
Trend Models
Special case of the simple regression model: the trend model
Y_t = b0 + b1·t + e_t
• The independent variable is time, t = 1, 2, 3, ..., T−1, T.
• There is no need to forecast the independent variable.
• Using simple transformations, a variety of nonlinear trend equations can be estimated, so the estimated model can mimic the pattern of the data.
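Fitting a linear trend is just OLS with t as the regressor, so a one-liner such as `numpy.polyfit` suffices. A sketch with a hypothetical series (the numbers are illustrative, not the FEE data):

```python
import numpy as np

# Hypothetical quarterly index following a linear trend
t = np.arange(36)
y = 115.7 + 3.84 * t

b1, b0 = np.polyfit(t, y, 1)       # slope, intercept of y = b0 + b1*t
future_t = np.arange(36, 40)        # four quarters beyond the sample
forecast = b0 + b1 * future_t       # no need to forecast the regressor: it is time itself
```

Because the independent variable is the time index, out-of-sample values of the regressor are known exactly, which is the practical appeal of trend models.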
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Chapter 3 Exercise 13: College Tuition Consumers' Price Index by Quarter
[Plot of FEE, 1986–1995, with the holdout period marked.]
OLS Estimates
Dependent Variable: FEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C 115.7312 1.982166 58.38624 0.0000
@TREND 3.837580 0.097399 39.40080 0.0000
R-squared 0.978568 Mean dependent var 182.8889
Adjusted R-squared 0.977938 S.D. dependent var 40.87177
S.E. of regression 6.070829 Akaike info criterion 6.498820
Sum squared resid 1253.069 Schwarz criterion 6.586793
Log likelihood -114.9788 F-statistic 1552.423
Durbin-Watson stat 0.284362 Prob(F-statistic) 0.000000
Fitted model: fee = b0 + b1·t
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
Each quarter the tuition index increases by about 3.84 points.
Hypothesis test related to β1: H0: β1 = 0, H1: β1 ≠ 0; the t test is used to test the validity of H0.
t = b1/se(b1)
• If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
• t = 39.4 > t table, or Pr = 0.0000 < 0.05 → Reject H0
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
R² = 0 indicates no explanatory power of X (the equation).
R² = 1 indicates perfect explanation of Y by X (the equation).
R² = 0.9785 indicates very strong explanatory power.
Hypothesis test related to R²: H0: R² = 0, H1: R² ≠ 0; the F test checks this hypothesis.
• If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
• F-statistic = 1552 > F table, or Pr = 0.0000 < 0.05 → Reject H0
• The estimated equation has explanatory power.
Graphical Evaluation of Fit
[FEE and its forecast FEEF, 1986–1995, with the holdout period marked.]
ACTUAL versus FORECAST over the holdout period:
1995 Q1   260.00   253.88
1995 Q2   259.00   257.72
1995 Q3   266.00   261.55
1995 Q4   274.00   265.39
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1986–1994.]
Residuals exhibit a clear pattern; they are not random.
The seasonal fluctuations also cannot be modelled this way.
The regression model is misspecified.
Model Improvement
The data may exhibit an exponential trend.
In this case, take the logarithm of the dependent variable and estimate the trend by OLS.
After the OLS estimation, forecast the holdout period.
Take the exponential of the logarithmic forecasted values in order to return to the original units.
Exponential trend: Y_t = A·e^(b1·t)
Taking logarithms gives the linear model ln(Y_t) = b0 + b1·t + e_t, which can be estimated by OLS.
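The log-then-exponentiate procedure can be sketched in a few lines, again with a hypothetical exponentially trending series rather than the actual FEE data:

```python
import numpy as np

# Hypothetical series growing at roughly 2.1% per period
t = np.arange(36)
y = np.exp(4.82 + 0.021 * t)

b1, b0 = np.polyfit(t, np.log(y), 1)   # OLS fit of ln(y) = b0 + b1*t
h = np.arange(36, 40)                   # holdout / forecast periods
forecast = np.exp(b0 + b1 * h)          # exponentiate to return to original units
```

The slope b1 of the log-linear fit is (approximately) the per-period growth rate of the original series.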
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Original and Logarithmic Transformed Data
[FEE and its logarithm LFEE, 1986–1995.]
LOG(FEE)   FEE
4.844187   127.000
4.844187   127.000
4.867534   130.000
4.912655   136.000
4.912655   136.000
4.919981   137.000
4.941642   140.000
4.976734   145.000
4.983607   146.000
OLS Estimate of the Logarithmic Trend Model
Dependent Variable: LFEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36
Variable Coefficient Std. Error t-Statistic Prob.
C 4.816708 0.005806 829.5635 0.0000
@TREND 0.021034 0.000285 73.72277 0.0000
R-squared 0.993783 Mean dependent var 5.184797
Adjusted R-squared 0.993600 S.D. dependent var 0.222295
S.E. of regression 0.017783 Akaike info criterion -5.167178
Sum squared resid 0.010752 Schwarz criterion -5.079205
Log likelihood 95.00921 F-statistic 5435.047
Durbin-Watson stat 0.893477 Prob(F-statistic) 0.000000
Fitted model: ln(fee) = b0 + b1·t
Forecast Calculations
obs      FEE        LFEEF      FEELF = exp(LFEEF)
1993:1   228.0000   5.405651   222.6610
1993:2   228.0000   5.426684   227.3940
1993:3   235.0000   5.447718   232.2276
1993:4   243.0000   5.468751   237.1639
1994:1   244.0000   5.489785   242.2052
1994:2   245.0000   5.510819   247.3536
1994:3   251.0000   5.531852   252.6114
1994:4   259.0000   5.552886   257.9810
1995:1   260.0000   5.573920   263.4648
1995:2   259.0000   5.594953   269.0651
1995:3   266.0000   5.615987   274.7845
1995:4   274.0000   5.637021   280.6254
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the logarithmic trend model, 1986–1994.]
Residuals exhibit a clear pattern; they are not random.
The seasonal fluctuations also cannot be modelled this way.
The regression model is misspecified.
Model Improvement
In order to deal with the seasonal variation:
• Remove the seasonal pattern from the data.
• Fit a regression model to the seasonally adjusted data.
• Generate forecasts.
• Add the seasonal movements back to the forecasted values.
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Multiplicative Seasonal Adjustment
Included observations: 40
Ratio to Moving Average
Original Series: FEE
Adjusted Series: FEESA
Scaling Factors:
  1   1.002372
  2   0.985197
  3   0.996746
  4   1.015929
Original and Seasonally Adjusted Data
[FEE and the seasonally adjusted series FEESA, 1986–1995.]
OLS Estimate of the Seasonally Adjusted Trend Model
Dependent Variable: FEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          115.0387      1.727632     66.58749      0.0000
@TREND     3.897488      0.076240     51.12152      0.0000
R-squared 0.985668   Mean dependent var 191.0397
Adjusted R-squared 0.985291   S.D. dependent var 45.89346
S.E. of regression 5.566018   Akaike info criterion 6.319943
Sum squared resid 1177.261   Schwarz criterion 6.404387
Log likelihood -124.3989   F-statistic 2613.410
Durbin-Watson stat 0.055041   Prob(F-statistic) 0.000000
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted trend model, 1986–1995.]
Residuals exhibit a clear pattern; they are not random.
There are no seasonal fluctuations.
The regression model is misspecified.
Model Improvement
• Take the logarithm in order to remove the existing nonlinearity.
• Apply additive seasonal adjustment to the logarithmic data.
• Apply OLS to the seasonally adjusted logarithmic data.
• Forecast the holdout period.
• Add the seasonal movements back to obtain seasonal (logarithmic) forecasts.
• Take the exponential in order to reach the original seasonal forecasts.
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Logarithmic Transformation and Additive Seasonal Adjustment
Sample: 1986:1 1995:4
Included observations: 40
Difference from Moving Average
Original Series: LFEE =log(FEE)
Adjusted Series: LFEESA
Scaling Factors:
1 0.002216
2 -0.014944
3 -0.003099
4 0.015828
Original and Logarithmic Additive Seasonally Adjustment Series
[FEE and the additively seasonally adjusted logarithmic series LFEESA, 1986–1995.]
OLS Estimate of the Logarithmic Additive Seasonally Adjusted Data
Dependent Variable: LFEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40
Variable Coefficient Std. Error t-Statistic Prob.
C 4.822122 0.004761 1012.779 0.0000
@TREND 0.020618 0.000210 98.12760 0.0000
R-squared 0.996069 Mean dependent var 5.224171
Adjusted R-squared 0.995966 S.D. dependent var 0.241508
S.E. of regression 0.015340 Akaike info criterion -5.468039
Sum squared resid 0.008942 Schwarz criterion -5.383595
Log likelihood 111.3608 F-statistic 9629.026
Durbin-Watson stat 0.149558 Prob(F-statistic) 0.000000
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted logarithmic trend model, 1986–1995.]
Residuals exhibit a clear pattern; they are not random.
There are no seasonal fluctuations.
The regression model is misspecified.
Autoregressive Model
In some cases a growth model may be more suitable for the data.
If the data exhibit nonlinearity, the autoregressive model can be adapted to model an exponential pattern.
Population model: Y_t = α0 + α1·Y_{t−1} + ε_t
Sample model: Y_t = a0 + a1·Y_{t−1} + e_t
Population model (logarithmic): Ln(Y_t) = α0 + α1·Ln(Y_{t−1}) + ε_t
Sample model (logarithmic): Ln(Y_t) = a0 + a1·Ln(Y_{t−1}) + e_t
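The sample autoregressive model is just an OLS regression of the series on its own lag. A minimal sketch (the function name and data are illustrative):

```python
import numpy as np

def fit_ar1(y):
    """OLS estimates (a0, a1) of the sample model Y_t = a0 + a1*Y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # regressors: constant, lagged Y
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef   # [a0, a1]
```

For the logarithmic variant, apply the same function to `np.log(y)`. Note that with a lagged dependent variable the first observation is lost, which is why EViews reports "after adjusting endpoints".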
OLS Estimate of Autoregressive Model
Dependent Variable: FEE
Method: Least Squares
Sample(adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 0.739490 2.305654 0.320729 0.7502
FEE(-1) 1.016035 0.011884 85.49718 0.0000
R-squared 0.994964 Mean dependent var 192.7179
Adjusted R-squared 0.994828 S.D. dependent var 45.45787
S.E. of regression 3.269285 Akaike info criterion 5.256940
Sum squared resid 395.4643 Schwarz criterion 5.342251
Log likelihood -100.5103 F-statistic 7309.767
Durbin-Watson stat 1.888939 Prob(F-statistic) 0.000000
Fitted model: fee_t = b0 + b1·fee_{t−1}
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the autoregressive model, 1987–1995.]
Clear seasonal pattern
Model is misspecified
Model Improvement
To remove the seasonal fluctuations:
• Seasonally adjust the data.
• Apply OLS to the autoregressive trend model.
• Forecast the seasonally adjusted data.
• Add the seasonal movement back to the forecasted values.
OLS Estimate of the Seasonally Adjusted Autoregressive Model
Dependent Variable: FEESA
Method: Least Squares
Sample(adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           1.125315      0.811481     1.386743      0.1738
FEESA(-1)   1.013445      0.004181     242.4027      0.0000
R-squared 0.999371   Mean dependent var 192.6894
Adjusted R-squared 0.999354   S.D. dependent var 45.27587
S.E. of regression 1.151024   Akaike info criterion 3.169101
Sum squared resid 49.01968   Schwarz criterion 3.254412
Log likelihood -59.79748   F-statistic 58759.08
Durbin-Watson stat 1.335932   Prob(F-statistic) 0.000000
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted autoregressive model, 1987–1995.]
No seasonal pattern in the residuals.
The model specification seems more correct than the previous estimates.
Seasonal Autoregressive Model
If the data exhibit seasonal fluctuations, the growth model should be remodelled.
If the data exhibit nonlinearity and seasonality together, the seasonal autoregressive model can be adapted to model the exponential pattern.
Population model: Y_t = α0 + α1·Y_{t−s} + ε_t
Sample model: Y_t = a0 + a1·Y_{t−s} + e_t
Population model (logarithmic): Ln(Y_t) = α0 + α1·Ln(Y_{t−s}) + ε_t
Sample model (logarithmic): Ln(Y_t) = a0 + a1·Ln(Y_{t−s}) + e_t
New Product ForecastingGrowth Curve Fitting
For new products, the main problem is typically a lack of historical data.
Trend or seasonal patterns cannot be determined.
Forecasters can use a number of models that generally fall into the category called diffusion models.
These models are alternatively called S-curves, growth models, saturation models, or substitution curves.
They imitate the life cycle of products. Life cycles follow a common pattern:
• A period of slow growth just after the introduction of the new product
• A period of rapid growth
• Slowing growth in a mature phase
• Decline
New Product ForecastingGrowth Curve Fitting
Each growth model has its own lower and upper limits.
A significant benefit of using diffusion models is to identify and predict the timing of the four phases of the life cycle.
The transition from very slow initial growth to rapid growth is usually the result of solutions to technical difficulties and the market's acceptance of the new product or technology.
There is an upper limit, and a maturity phase occurs in which growth slows and finally ceases.
[S-curve illustration: sales against time over ten periods.]
GOMPERTZ CURVE
The Gompertz function is given as
Y_t = L·e^(−a·e^(−b·t))
where L = upper limit of Y, e = natural number = 2.71828..., and a and b = coefficients describing the curve.
The Gompertz curve ranges in value from zero to L as t varies from zero to infinity.
The Gompertz curve is a way to summarize the growth with a few parameters.
GOMPERTZ CURVE: An Example
HDTV: LCD and plasma TV sales figures
YEAR HDTV
2000 1200
2001 1500
2002 1770
2003 3350
2004 5500
2005 9700
2006 15000
[Left: observed HDTV sales, 2000–2006. Right: Gompertz forecast HDTVF extended to 2050, levelling off near its upper limit.]
GOMPERTZ CURVE: An Example
Dependent Variable: HDTV
Method: Least Squares
Sample (adjusted): 2000 2006
Included observations: 7 after adjustments
Convergence achieved after 61 iterations
HDTV = C(1)*EXP(-C(2)*EXP(-C(3)*@TREND))
        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    332940        850837       0.391         0.716
C(2)    6.718         2.023        3.321         0.029
C(3)    0.128         0.087        1.477         0.214
R-squared 0.992   Mean dependent var 5431.429
Adjusted R-squared 0.988   S.D. dependent var 5178.199
S.E. of regression 559.922   Akaike info criterion 15.791
Sum squared resid 1254049   Schwarz criterion 15.76782
Log likelihood -52.26849   Durbin-Watson stat 0.704723
LOGISTIC CURVE
The logistic function is given as
Y_t = L / (1 + a·e^(−b·t))
where L = upper limit of Y, e = natural number = 2.71828..., and a and b = coefficients describing the curve.
The logistic curve ranges in value from zero to L as t varies from zero to infinity.
The logistic curve is symmetric about its point of inflection; the Gompertz curve is not necessarily symmetric.
LOGISTIC or GOMPERTZ CURVE?
The answer lies in whether, in a particular situation, it becomes easier or more difficult to attain the maximum value the closer you get to it.
• Are there factors assisting the attainment of the maximum value once you get close to it, or
• Are there factors preventing the attainment of the maximum value once it is nearly attained?
If there is an offsetting factor such that growth is more difficult to maintain as the maximum is approached, the Gompertz curve will be the better choice.
If there are no such offsetting factors hindering the attainment of the maximum value, the logistic curve will be the better choice.
[HDTV actual values and forecast HDTVF, horizon extended to 2050.]
LOGISTIC CURVE: An Example
HDTV: LCD and plasma TV sales figures
YEAR   HDTV
2000   1200
2001   1500
2002   1770
2003   3350
2004   5500
2005   9700
2006   15000
Dependent Variable: HDTV
Method: Least Squares
Sample (adjusted): 2000 2006
Included observations: 7 after adjustments
Convergence achieved after 1 iteration
HDTV = C(1)/(1+C(2)*EXP(-C(3)*@TREND))
        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    149930.000    350258.500   0.428         0.691
C(2)    199.182       432.110      0.461         0.669
C(3)    0.517         0.073        7.048         0.002
R-squared 0.997   Mean dependent var #######
Adjusted R-squared 0.995   S.D. dependent var #######
S.E. of regression 370.451   Akaike info criterion 14.965
Sum squared resid 548936   Schwarz criterion 14.942
Log likelihood -49.377   Durbin-Watson stat 1.632
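The logistic fit can be sketched the same way; only the curve function changes. Starting values are again assumptions near the reported estimates:

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(7)                                           # years 2000..2006
sales = np.array([1200, 1500, 1770, 3350, 5500, 9700, 15000], dtype=float)

def logistic(t, L, a, b):
    # Y_t = L / (1 + a * exp(-b*t))
    return L / (1 + a * np.exp(-b * t))

(L, a, b), _ = curve_fit(logistic, t, sales, p0=(150000.0, 200.0, 0.5), maxfev=20000)
ssr = np.sum((sales - logistic(t, L, a, b)) ** 2)   # the slide reports about 548,936
```

The logistic specification achieves a smaller residual sum of squares than the Gompertz fit on this sample, consistent with the two output tables.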
LOGISTIC versus GOMPERTZ CURVES
[Comparison of the two fitted forecasts, HDTVF_G (Gompertz) and HDTVF_L (logistic), extended to 2050.]
FORECASTING WITH MULTIPLE REGRESSION MODELS
BUSINESS FORECASTING
CONTENT
• DEFINITION
• INDEPENDENT VARIABLE SELECTION
• FORECASTING WITH THE MULTIPLE REGRESSION MODEL
• STATISTICAL EVALUATION OF THE MODEL
• SERIAL CORRELATION
• SEASONALITY TREATMENT
• GENERAL AUTOREGRESSIVE MODEL
• ADVICE
• EXAMPLES...
MULTIPLE REGRESSION MODEL
THE DEPENDENT VARIABLE, Y, IS A FUNCTION OF MORE THAN ONE INDEPENDENT VARIABLE, X1, X2, ..., Xk
GENERAL FORM: Y = f(X1, X2, ..., Xk)
LINEAR FORM, POPULATION REGRESSION: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
LINEAR FORM, SAMPLE REGRESSION: Y = b0 + b1X1 + b2X2 + ... + bkXk + e
SELECTING INDEPENDENT VARIABLES
• FIRST, DETERMINE THE DEPENDENT VARIABLE.
• SEARCH THE LITERATURE, USE COMMON SENSE, AND LIST THE MAIN POTENTIAL EXPLANATORY VARIABLES.
• IF TWO VARIABLES SHARE THE SAME INFORMATION, SUCH AS GDP AND GNP, SELECT THE MORE RELEVANT ONE.
• IF A VARIABLE SHOWS VERY LITTLE VARIATION, LOOK FOR A MORE VARIABLE ALTERNATIVE.
• SET THE EXPECTED SIGNS OF THE PARAMETERS TO BE ESTIMATED.
AN EXAMPLE: SELECTING INDEPENDENT VARIABLES
LIQUID PETROLEUM GAS (LPG) MARKET SIZE FORECAST
POTENTIAL EXPLANATORY VARIABLES: POPULATION, PRICE, URBANIZATION RATIO, GNP or GDP
MODEL: LPG_t = β0 + β1·POP_t + β2·GDP_t + β3·UR_t + β4·PRICE_t + ε_t
EXPECTATIONS: β1 > 0, β2 > 0, β3 > 0, β4 < 0
PARAMETER ESTIMATES: OLS ESTIMATION
Population regression model: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
Sample regression model: Ŷ = b0 + b1X1 + b2X2 + ... + bkXk
Error term: e = Y − Ŷ
Ordinary Least Squares (OLS) estimate:
min Σe² = Σ(Y − Ŷ)² = Σ(Y − b0 − b1X1 − b2X2 − ... − bkXk)²
IT IS VERY COMPLEX TO CALCULATE THE b's BY HAND; MATRIX ALGEBRA IS USED TO ESTIMATE THEM: b = (X′X)⁻¹X′Y.
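The matrix-algebra solution is a one-liner in practice. A sketch of the normal-equations estimator, checked on synthetic data with known coefficients:

```python
import numpy as np

def ols(X, y):
    """OLS estimate b = (X'X)^(-1) X'y, with an intercept column prepended to X."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(y, dtype=float))
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, which is both faster and numerically safer than a literal `inv(X.T @ X)`.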
FORECASTING WITH MULTIPLE REGRESSION MODEL
Ln(SALES_t) = b0 + 1.24·Ln(GDP_t) − 0.90·Ln(PRICE_t)
IF GDP INCREASES 1%, SALES INCREASE 1.24%; IF PRICE INCREASES 1%, SALES DECREASE 0.9%.
PERIOD   GDP    PRICE   SALES
100      1245   100     230
101      1300   103     ?
Ln(SALES_101) = b0 + 1.24·Ln(1300) − 0.90·Ln(103) ≈ 5.46
SALES_101 = e^5.46 ≈ 235
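The arithmetic behind the log-log forecast can be checked directly. As a sketch, the intercept below is backed out from the period-100 observation so the example is self-consistent (the slide does not report it separately):

```python
import math

# Elasticities from the estimated log-log model
b_gdp, b_price = 1.24, -0.90

# Back out the intercept from the period-100 row: ln(230) = b0 + 1.24*ln(1245) - 0.90*ln(100)
b0 = math.log(230) - b_gdp * math.log(1245) - b_price * math.log(100)

# Forecast period 101 with GDP = 1300 and PRICE = 103, then exponentiate
ln_sales = b0 + b_gdp * math.log(1300) + b_price * math.log(103)
sales = math.exp(ln_sales)   # roughly 235, as on the slide
```

The exponentiation at the end is essential: the regression forecasts the logarithm, not the level.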
EXAMPLE : LPG FORECAST
[Annual LPG sales (TUPSATAY series), 1968–1996.]
LOGARITHMIC TRANSFORMATION
[LSATA: logarithm of LPG sales, 1968–1996.]
SCATTER DIAGRAM
[Scatter plots of LSATA against LGNP and LSATA against LP. The positive relation with LP is unexpected.]
LSATA=f(LGNP)
Dependent Variable: LSATA
Method: Least Squares
Sample: 1968 1997
Included observations: 30
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -44.91150     3.097045     -14.50140     0.0000
LGNP       4.081938      0.220265     18.53195      0.0000
R-squared 0.924616   Mean dependent var 12.47858
Adjusted R-squared 0.921924   S.D. dependent var 0.736099
S.E. of regression 0.205681   Akaike info criterion -0.260637
Sum squared resid 1.184535   Schwarz criterion -0.167224
Log likelihood 5.909555   F-statistic 343.4333
Durbin-Watson stat 0.485414   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LGNP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1968–1996.]
NOT RANDOM
LSATA=f(LP)
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          11.70726      0.081886     142.9694      0.0000
LP         0.190128      0.015096     12.59492      0.0000
R-squared 0.854551   Mean dependent var 12.53724
Adjusted R-squared 0.849164   S.D. dependent var 0.674006
S.E. of regression 0.261768   Akaike info criterion 0.223756
Sum squared resid 1.850107   Schwarz criterion 0.318052
Log likelihood -1.244459   F-statistic 158.6319
Durbin-Watson stat 0.187322   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
NOT RANDOM
LSATA=f(LGNP,LP)
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -30.808410    7.715902     -3.992846     0.0005
LGNP       3.066655      0.556533     5.510284      0.0000
LP         0.045318      0.028281     1.602436      0.1211
R-squared 0.932905   Mean dependent var 12.53724
Adjusted R-squared 0.927744   S.D. dependent var 0.674006
S.E. of regression 0.181176   Akaike info criterion -0.480999
Sum squared resid 0.853443   Schwarz criterion -0.339555
Log likelihood 9.974488   F-statistic 180.7558
Durbin-Watson stat 0.364799   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LGNP + β2·LP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
NOT RANDOM
WHAT IS MISSING?
GNP AND PRICE ARE THE MOST IMPORTANT VARIABLES, BUT THE COEFFICIENT OF PRICE IS NOT SIGNIFICANT AND HAS AN UNEXPECTED SIGN.
THE RESIDUAL DISTRIBUTION IS NOT RANDOM. WHAT IS MISSING?
• WRONG FUNCTION / NONLINEAR MODEL?
• LACK OF DYNAMIC MODELLING?
• MISSING IMPORTANT VARIABLE? POPULATION?
LSATA = f(LGNP, LP, LPOP)
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -50.913420    3.992134     -12.75343     0.0000
LGNP       0.755445      0.337894     2.235746      0.0345
LP         -0.131508     0.021528     -6.108568     0.0000
LPOP       4.955945      0.486887     10.17885      0.0000
R-squared 0.986958   Mean dependent var 12.53724
Adjusted R-squared 0.985393   S.D. dependent var 0.674006
S.E. of regression 0.081461   Akaike info criterion -2.049934
Sum squared resid 0.165899   Schwarz criterion -1.861342
Log likelihood 33.72405   F-statistic 630.6084
Durbin-Watson stat 0.398661   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LGNP + β2·LP + β3·LPOP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
NOT RANDOM
WHAT IS MISSING?
GNP, POPULATION, AND PRICE ARE THE MOST IMPORTANT VARIABLES:
• THEY ARE SIGNIFICANT
• THEY HAVE THE EXPECTED SIGNS
THE RESIDUAL DISTRIBUTION IS NOT RANDOM. WHAT IS MISSING?
• WRONG FUNCTION / NONLINEAR MODEL?
• LACK OF DYNAMIC MODELLING? YES.
• MISSING IMPORTANT VARIABLE? YES, URBANIZATION.
LSATA = f(LGNP, LP, LPOP, LSATA_{t−1})
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -16.185910    3.832897     -4.222893     0.0003
LGNP        0.523657      0.150971     3.468585      0.0020
LP          -0.033964     0.013483     -2.518934     0.0188
LPOP        1.279753      0.419566     3.050182      0.0055
LSATA(-1)   0.619986      0.060756     10.20446      0.0000
R-squared 0.997557   Mean dependent var 12.53724
Adjusted R-squared 0.997150   S.D. dependent var 0.674006
S.E. of regression 0.035983   Akaike info criterion -3.655968
Sum squared resid 0.031074   Schwarz criterion -3.420227
Log likelihood 58.01154   F-statistic 2450.048
Durbin-Watson stat 2.118752   Prob(F-statistic) 0.000000
Model: LSATA_t = β0 + β1·LGNP_t + β2·LP_t + β3·LPOP_t + β4·LSATA_{t−1} + ε_t
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
RANDOM
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
When GNP increases by 1%, the volume of LPG sales increases by 0.52%.
Hypothesis test related to β1:
• H0: β1 = 0
• H1: β1 ≠ 0
• The t test is used to test the validity of H0: t = b1/se(b1)
If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
t = 3.46 > t table, or Pr = 0.002 < 0.05 → Reject H0
GNP has a significant effect on LPG sales.
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
• R² = 0 indicates no explanatory power of X (the equation).
• R² = 1 indicates perfect explanation of Y by X (the equation).
• R² = 0.9975 indicates very strong explanatory power.
Hypothesis test related to R²:
• H0: R² = 0
• H1: R² ≠ 0
• The F test checks the hypothesis.
If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
F-statistic = 2450 > F table, or Pr = 0.0000 < 0.05 → Reject H0
The estimated equation has power to explain the LPG sales figures.
SHORT AND LONG TERM IMPACTS
If we specify a dynamic model, we can estimate the short- and long-term impacts of the independent variables on the dependent variable simultaneously.
y_t = β0 + β1·x_t + β2·y_{t−1} + ε_t
In equilibrium, y_t = y_{t−1} = y, so:
y = β0 + β1·x + β2·y
y·(1 − β2) = β0 + β1·x
y = β0/(1 − β2) + [β1/(1 − β2)]·x
Short-term effect of x: β1
Long-term effect of x: β1/(1 − β2)
AN EXAMPLE: SHORT AND LONG TERM IMPACTS
         Short Term Impact   Long Term Impact
LGNP     0.523657            1.3778
LP       -0.033964           -0.0892
LPOP     1.279753            3.3657
IF GNP INCREASES 1% AT TIME t, LPG SALES INCREASE 0.52% AT TIME t.
IN THE LONG RUN (WITHIN 3–5 YEARS), LPG SALES INCREASE 1.38%.
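The long-term impacts are just the short-term coefficients divided by one minus the coefficient on the lagged dependent variable. A quick check with the values from the dynamic LSATA equation:

```python
# Coefficients from the dynamic LSATA equation
short_run = {"LGNP": 0.523657, "LP": -0.033964, "LPOP": 1.279753}
b_lag = 0.619986   # coefficient on LSATA(-1)

# Long-run effect = short-run effect / (1 - coefficient on lagged dependent variable)
long_run = {name: b / (1 - b_lag) for name, b in short_run.items()}
```

This reproduces the table's long-term impacts (1.378 for LGNP, -0.089 for LP, about 3.37 for LPOP) up to rounding.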
SESONALITY AND MULTIPLE REGRESSION MODEL
• SEASONAL DUMMY VARIABLES CAN BE USED TO MODEL SEASONAL PATTERNS.
• A DUMMY VARIABLE IS A BINARY VARIABLE THAT ONLY TAKES THE VALUES 0 AND 1.
• DUMMY VARIABLES ARE INDICATOR VARIABLES: IF THE DUMMY VARIABLE TAKES THE VALUE 1 IN A GIVEN PERIOD, IT MEANS THAT SOMETHING HAPPENS IN THAT PERIOD.
SEASONAL DUMMY VARIABLES THE SOMETHING CAN BE SPECIFIC SEASON THE DUMMY VARIABLE INDICATES THE SPECIFIC SEASON D1 IS A DUMMY VARIABLE WHICH INDICATES THE FIRST QUARTERS
» 1990Q1 1» 1990Q2 0» 1990Q3 0» 1990Q4 0» 1991Q1 1» 1991Q2 0» 1991Q3 0» 1991Q4 0» 1992Q1 1» 1992Q2 0» 1992Q3 0» 1992Q4 0
BASE PERIOD
DATE D1 D2 D3 1990 Q1 1 0 0 1990 Q2 0 1 0 1990 Q3 0 0 1 1990 Q4 0 0 0 1990 Q1 1 0 0 1991 Q2 0 1 0 1991 Q3 0 0 1 1991 Q4 0 0 0 1992 Q1 1 0 0 1992 Q2 0 1 0 1992 Q3 0 0 1 1992 Q4 0 0 0
FULL SEASONAL DUMMY VARIABLE REPRESENTATION
[FIGURE: COLLEGE TUITION CONSUMERS' PRICE INDEX (FEE) BY QUARTER, 1986-1995]
THE DATA ARE QUARTERLY, THEREFORE 3 DUMMY VARIABLES ARE SUFFICIENT TO CAPTURE THE SEASONAL PATTERN.
DATE     D1  D2  D3
1990 Q1   1   0   0
1990 Q2   0   1   0
1990 Q3   0   0   1
1990 Q4   0   0   0
SEASONAL PATTERN MODELLED
COLLEGE TUITION PRICE INDEX TREND ESTIMATION
Dependent Variable: LOG(FEE)
Method: Least Squares
Sample(adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 4.832335 0.006948 695.4771 0.0000
@TREND 0.020780 0.000232 89.57105 0.0000
D1 -0.011259 0.007202 -1.563344 0.1275
D1(-1) -0.029526 0.007198 -4.101948 0.0003
D1(-2) -0.017082 0.007010 -2.436806 0.0204
R-squared 0.995921 Mean dependent var 5.244170
Adjusted R-squared 0.995427 S.D. dependent var 0.231661
S.E. of regression 0.015666 Akaike info criterion -5.352558
Sum squared resid 0.008099 Schwarz criterion -5.137087
Log likelihood 106.6986 F-statistic 2014.429
Durbin-Watson stat 0.161634 Prob(F-statistic) 0.000000
LFEE_t = β0 + β1·t + β2·D1_t + β3·D1_(t-1) + β4·D1_(t-2) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1987-1995. THE RESIDUALS ARE NOT RANDOM.]
COLLEGE TUITION PRICE INDEX AUTOREGRESSIVE TREND ESTIMATION
Dependent Variable: LOG(FEE)
Method: Least Squares
Sample(adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.050887    0.022969     2.215524  0.0337
LOG(FEE(-1))     0.997510    0.004375     227.9958  0.0000
D1              -0.031634    0.002833    -11.16704  0.0000
D1(-1)          -0.035335    0.002833    -12.47301  0.0000
D1(-2)          -0.006775    0.002761    -2.454199  0.0196

R-squared            0.999368   Mean dependent var     5.244170
Adjusted R-squared   0.999292   S.D. dependent var     0.231661
S.E. of regression   0.006165   Akaike info criterion -7.217678
Sum squared resid    0.001254   Schwarz criterion     -7.002206
Log likelihood       142.1359   F-statistic            13051.60
Durbin-Watson stat 1.605178 Prob(F-statistic) 0.000000
LFEE_t = β0 + β1·LFEE_(t-1) + β2·D1_t + β3·D1_(t-1) + β4·D1_(t-2) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1987-1995. THE RESIDUALS ARE RANDOM.]
IN THIS EQUATION, THE DUMMY TERMS ARE THE SEASONAL PART OF THE MODEL AND THE LAGGED LFEE TERM IS THE DYNAMIC PART OF THE MODEL.
COLLEGE TUITION PRICE INDEX GENERALIZED AUTOREGRESSIVE TREND ESTIMATION
Dependent Variable: LFEE
Method: Least Squares
Sample(adjusted): 1987:1 1995:4
Included observations: 36 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 0.048752 0.024114 2.021760 0.0529
LFEE(-1) 1.126366 0.182970 6.156010 0.0000
LFEE(-2) 0.292152 0.256488 1.139051 0.2643
LFEE(-3) -0.344963 0.253185 -1.362491 0.1839
LFEE(-4) -0.076855 0.181751 -0.422857 0.6756
D1 -0.043879 0.005597 -7.840118 0.0000
D1(-1) -0.048562 0.010241 -4.742040 0.0001
D1(-2) -0.005369 0.009855 -0.544814 0.5902

R-squared            0.999502   Mean dependent var     5.263841
Adjusted R-squared   0.999377   S.D. dependent var     0.221681
S.E. of regression   0.005532   Akaike info criterion -7.363447
Sum squared resid    0.000857   Schwarz criterion     -7.011554
Log likelihood       140.5420   F-statistic            8025.362
Durbin-Watson stat 1.892211 Prob(F-statistic) 0.000000
LFEE_t = β0 + Σ_(i=1..s) βi·LFEE_(t-i) + δ1·D1_t + δ2·D1_(t-1) + δ3·D1_(t-2) + ε_t
GAP SALES FORECAST
[FIGURE: GAP SALES AND LOG SALES (LSALES) BY QUARTER, 1985-1999]
SIMPLE AUTOREGRESSIVE REGRESSION MODEL
Dependent Variable: LSALES
Method: Least Squares
Sample(adjusted): 1985:2 1999:4
Included observations: 59 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.613160    0.484163     1.266433  0.2105
LSALES(-1)       0.958714    0.036128     26.53623  0.0000

R-squared            0.925115   Mean dependent var     13.43549
Adjusted R-squared   0.923802   S.D. dependent var     0.848687
S.E. of regression   0.234272   Akaike info criterion -0.031358
Sum squared resid    3.128350   Schwarz criterion      0.039067
Log likelihood       2.925062   F-statistic            704.1714
Durbin-Watson stat 2.159164 Prob(F-statistic) 0.000000
SEASONALITY IS NOT MODELLED
LSALES_t = β0 + β1·LSALES_(t-1) + ε_t
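Once the autoregressive coefficients are estimated, a one-step-ahead forecast is a single substitution. A minimal sketch using the slide's estimates, with an assumed "last observed" log-sales value:

```python
# Sketch: one-step-ahead forecast from LSALES_t = b0 + b1*LSALES_(t-1).
# b0 and b1 are the slide's OLS estimates; last_lsales is an ASSUMED
# illustrative value for the most recent observed LSALES.

import math

b0 = 0.613160
b1 = 0.958714
last_lsales = 14.8   # assumed last observed log sales (illustrative)

next_lsales = b0 + b1 * last_lsales
next_sales = math.exp(next_lsales)   # back-transform to the sales scale
print(round(next_lsales, 3))
```

Because the dependent variable is in logs, the forecast must be exponentiated to return to the original sales units.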
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1986-1999. THE RESIDUALS ARE NOT RANDOM.]
AUTOREGRESSIVE REGRESSION MODEL WITH SEASONAL DUMMIES
Dependent Variable: LSALES
Method: Least Squares
Sample(adjusted): 1985:3 1999:4
Included observations: 58 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.299734    0.111564     2.686656  0.0096
LSALES(-1)       0.994473    0.008213     121.0873  0.0000
D1              -0.547251    0.018685    -29.28766  0.0000
D1(-1)          -0.175405    0.018732    -9.364126  0.0000
D1(-2)           0.033281    0.018458     1.803073  0.0771

R-squared            0.996547   Mean dependent var     13.46547
Adjusted R-squared   0.996287   S.D. dependent var     0.823972
S.E. of regression   0.050210   Akaike info criterion -3.062940
Sum squared resid    0.133616   Schwarz criterion     -2.885316
Log likelihood       93.82526   F-statistic            3824.335
Durbin-Watson stat 1.828642 Prob(F-statistic) 0.000000
LSALES_t = β0 + β1·LSALES_(t-1) + β2·D1_t + β3·D1_(t-1) + β4·D1_(t-2) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1986-1999. THE RESIDUALS ARE RANDOM.]
ALTERNATIVE SEASONAL MODELLING
FOR NONSEASONAL DATA, THE AUTOREGRESSIVE MODEL CAN BE WRITTEN AS
y_t = β0 + β1·y_(t-1) + ε_t
IF THE LENGTH OF THE SEASONALITY IS s, THE SEASONAL AUTOREGRESSIVE MODEL CAN BE WRITTEN AS
y_t = β0 + β1·y_(t-s) + ε_t
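The seasonal autoregression is still a bivariate regression: each observation is paired with its value one seasonal cycle earlier. A minimal sketch with s = 4 on a simulated quarterly series (not the GAP sales data):

```python
# Sketch: estimating y_t = b0 + b1*y_(t-s) by OLS with s = 4 (quarterly data).
# The series below is SIMULATED with a repeating quarterly pattern plus a
# drift; it is not the GAP sales series from the lecture.

s = 4
y = [10.0, 8.0, 9.0, 12.0,
     10.5, 8.4, 9.5, 12.6,
     11.0, 8.9, 10.0, 13.1,
     11.4, 9.3, 10.4, 13.7]

# Pair each observation with its value one seasonal cycle (s periods) earlier
xs = [y[t - s] for t in range(s, len(y))]
ys = [y[t] for t in range(s, len(y))]
n = len(xs)

mx = sum(xs) / n
my = sum(ys) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
     sum((a - mx) ** 2 for a in xs)
b0 = my - b1 * mx
print(round(b0, 3), round(b1, 3))
```

For a seasonal series like this one, the slope comes out close to 1, mirroring the LSALES(-4) coefficient of about 0.99 in the EViews output that follows.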
SEASONAL LAGGED AUTOREGRESSIVE REGRESSION MODEL
Dependent Variable: LSALES
Method: Least Squares
Sample(adjusted): 1986:1 1999:4
Included observations: 56 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.329980    0.169485     1.946953  0.0567
LSALES(-4)       0.990877    0.012720     77.89949  0.0000

R-squared            0.991180   Mean dependent var     13.50893
Adjusted R-squared   0.991016   S.D. dependent var     0.804465
S.E. of regression   0.076248   Akaike info criterion -2.274583
Sum squared resid    0.313945   Schwarz criterion     -2.202249
Log likelihood       65.68834   F-statistic            6068.330
Durbin-Watson stat 0.434696 Prob(F-statistic) 0.000000
LSALES_t = β0 + β1·LSALES_(t-4) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1986-1999]