FORECASTING WITH REGRESSION MODELS
TREND ANALYSIS
BUSINESS FORECASTING
Prof. Dr. Burç Ülengin
ITU Management Engineering Faculty
Fall 2011
OVERVIEW
• The bivariate regression model
• Data inspection
• Regression forecast process
• Forecasting with simple linear trend
• Causal regression model
• Statistical evaluation of regression model
• Examples...
The Bivariate Regression Model
The bivariate regression model is also known as the simple regression model.
It is a statistical tool that estimates the relationship between a dependent variable (Y) and a single independent variable (X).
The dependent variable is the variable we want to forecast.
The Bivariate Regression Model
General form: Y = f(X)
Y: dependent variable
X: independent variable
Specific form, the linear regression model:
Y = β0 + β1X + ε
ε: random disturbance
The Bivariate Regression Model
Y = β0 + β1X + ε
• The regression model is in fact a line equation.
• β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
• If β1 = 5, a one-unit increase in X causes a 5-unit increase in Y.
• ε is the random disturbance; because of it, Y can take different values for a given X.
• The objective is to estimate β0 and β1 in such a way that the fitted values are as close as possible to the observed values.
The Bivariate Regression ModelGeometrical Representation
[Scatter plot of Y against X with two candidate lines: a poor fit and a good fit.]
The red line is closer to the data points than the blue one.
Best Fit Estimates
Population regression model: Y = β0 + β1X + ε
Sample regression model: Ŷ = b0 + b1X
Error term: e = Y − Ŷ
Ordinary Least Squares (OLS) estimate:
min Σe² = Σ(Y − Ŷ)² = Σ(Y − b0 − b1X)²
Best Fit Estimates-OLS
Ordinary Least Squares (OLS) estimate:
min Σe² = Σ(Y − Ŷ)² = Σ(Y − b0 − b1X)²
b1 = (ΣXY − n·X̄·Ȳ) / (ΣX² − n·X̄²)
b0 = Ȳ − b1·X̄
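The closed-form OLS estimators can be computed directly from data. A minimal sketch, using a small hypothetical series (not the car-sales data discussed later):

```python
import numpy as np

# Hypothetical data for illustration only
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.1, 5.9, 8.2, 9.9])
n = len(X)

# b1 = (sum(XY) - n*Xbar*Ybar) / (sum(X^2) - n*Xbar^2); b0 = Ybar - b1*Xbar
b1 = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
b0 = Y.mean() - b1 * X.mean()

fitted = b0 + b1 * X
residuals = Y - fitted   # e = Y - Yhat
```

For this series the slope works out to 1.97 and the intercept to 0.13, so each unit increase in X raises the fitted Y by about 1.97.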
Misleading Best Fits
[Four scatter plots of Y against X, each with the same sum of squared errors, Σe² = 100, but very different data patterns: an identical fit statistic can hide a misleading fit.]
THE CLASSICAL ASSUMPTIONS
1. The regression model is linear in the coefficients, correctly specified, and has an additive error term.
2. E(ε) = 0.
3. All explanatory variables are uncorrelated with the error term.
4. Errors corresponding to different observations are uncorrelated with each other.
5. The error term has a constant variance.
6. No explanatory variable is an exact linear function of any other explanatory variable(s).
7. The error term is normally distributed such that εᵢ ~ iid N(0, σ²).
Regression Forecasting Process
Data considerations: plot the graph of each variable over time and as a scatter plot; look at trend, seasonal fluctuation, and outliers.
To forecast Y we need the forecasted value of X: Ŷ(T+1) = b0 + b1·X(T+1)
Reserve a holdout period for evaluation and test the estimated equation over the holdout period.
An Example: Retail Car Sales
The main explanatory variables:
• Income
• Price of a car
• Interest rates and credit usage
• General price level
• Population
• Car park (number of cars sold up to time t) and replacement purchases
• Expectations about the future
For a simple bivariate regression, income is chosen as the explanatory variable.
Bi-variate Regression Model
Population regression model: RCS_t = β0 + β1·DPI_t + ε_t
Our expectation is β1 > 0.
But we do not have all available data at hand; the data set only covers the 1990s.
We have to estimate the model over the sample period.
Sample regression model: RCS_t = b0 + b1·DPI_t + e_t
Retail Car Sales and Disposable Personal Income Figures
[Time-series plot, 1990–1998: quarterly retail car sales (RCS, thousand cars) and disposable personal income (DPI, $).]
OLS Estimate
Dependent Variable: RCS
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C 541010.9 746347.9 0.724878 0.4735
DPI 62.39428 40.00793 1.559548 0.1281
R-squared 0.066759 Mean dependent var 1704222.
Adjusted R-squared 0.039311 S.D. dependent var 164399.9
S.E. of regression 161136.1 Akaike info criterion 26.87184
Sum squared resid 8.83E+11 Schwarz criterion 26.95981
Log likelihood -481.6931 F-statistic 2.432189
Durbin-Watson stat 1.596908 Prob(F-statistic) 0.128128
Fitted model: RCS = b0 + b1·DPI
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
When DPI increases by one dollar, the number of cars sold increases by about 62.
Hypothesis test related to β1: H0: β1 = 0, H1: β1 ≠ 0; the t test is used to test the validity of H0.
t = b1/se(b1)
• If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
• t = 1.56 < t table, or Pr = 0.1281 > 0.05 → Do not reject H0
• DPI has no statistically significant effect on RCS.
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
R² = 0 indicates no explanatory power of X (the equation).
R² = 1 indicates perfect explanation of Y by X (the equation).
R² = 0.066 indicates very weak explanatory power.
Hypothesis test related to R²: H0: R² = 0, H1: R² ≠ 0; the F test checks this hypothesis.
• If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
• F-statistic = 2.43 < F table, or Pr = 0.1281 > 0.05 → Do not reject H0
• The estimated equation has no power to explain the RCS figures.
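The t and F decision rules can be reproduced numerically from the regression output. A sketch using the coefficient and standard error reported above (scipy is assumed to be available):

```python
from scipy import stats

b1, se_b1 = 62.39428, 40.00793   # slope and standard error from the RCS/DPI output
n, k = 36, 1                      # observations, number of explanatory variables
df = n - k - 1                    # residual degrees of freedom

t_stat = b1 / se_b1
p_t = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value for H0: beta1 = 0

f_stat = t_stat ** 2                     # with a single regressor, F = t^2
p_f = stats.f.sf(f_stat, k, df)          # p-value for H0: R^2 = 0
```

Both p-values come out at about 0.128, matching the Prob. columns of the output, so H0 is not rejected at α = 0.05.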
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1990–1998.]
Residuals show a clear seasonal pattern.
Model Improvement
When we look at the graphs of the series, RCS exhibits clear seasonal fluctuations, but DPI does not.
Remove the seasonality using a seasonal adjustment method.
Then use the seasonally adjusted RCS as the dependent variable.
Seasonal Adjustment
Sample: 1990:1 1998:4
Included observations: 36
Ratio to Moving Average
Original Series: RCS
Adjusted Series: RCSSA
Scaling Factors:
  1   0.941503
  2   1.119916
  3   1.016419
  4   0.933083
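The ratio-to-moving-average idea can be sketched in a few lines: compute a centered moving average, average the ratios by quarter, and divide the series by its quarter's factor. This is a simplified version; details such as the normalization of the factors may differ from EViews.

```python
import numpy as np

def ratio_to_moving_average(y, s=4):
    """Multiplicative seasonal adjustment for a series with period s (s=4: quarterly)."""
    y = np.asarray(y, dtype=float)
    half = s // 2
    # Centered moving average: mean of two adjacent s-term averages
    cma = np.full(len(y), np.nan)
    for t in range(half, len(y) - half):
        cma[t] = (y[t - half:t + half].mean() + y[t - half + 1:t + half + 1].mean()) / 2
    ratios = y / cma
    # Average the ratios season by season, then rescale so the factors average to 1
    factors = np.array([np.nanmean(ratios[q::s]) for q in range(s)])
    factors *= s / factors.sum()
    adjusted = y / factors[np.arange(len(y)) % s]
    return adjusted, factors
```

Applied to a series with a linear trend and known quarterly factors, the function recovers factors close to the true ones and returns the de-seasonalized series.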
Seasonally Adjusted RCS and RCS
[RCS and its seasonally adjusted counterpart RCSSA, 1990–1998.]
OLS Estimate
Dependent Variable: RCSSA
Method: Least Squares
Sample: 1990:1 1998:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C 481394.3 464812.8 1.035674 0.3077
DPI 65.36559 24.91626 2.623411 0.0129
R-squared 0.168344 Mean dependent var 1700000.
Adjusted R-squared 0.143883 S.D. dependent var 108458.4
S.E. of regression 100352.8 Akaike info criterion 25.92472
Sum squared resid 3.42E+11 Schwarz criterion 26.01270
Log likelihood -464.6450 F-statistic 6.882286
Durbin-Watson stat 0.693102 Prob(F-statistic) 0.012939
Fitted model: RCSSA = b0 + b1·DPI
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
When DPI increases by one dollar, the number of cars sold increases by about 65.
Hypothesis test related to β1: H0: β1 = 0, H1: β1 ≠ 0; the t test is used to test the validity of H0.
t = b1/se(b1)
• If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
• t = 2.62 > t table, or Pr = 0.012 < 0.05 → Reject H0
• DPI has a statistically significant effect on RCS.
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
• R² = 0 indicates no explanatory power of X (the equation).
• R² = 1 indicates perfect explanation of Y by X (the equation).
• R² = 0.1683 indicates weak explanatory power.
Hypothesis test related to R²: H0: R² = 0, H1: R² ≠ 0; the F test checks this hypothesis.
• If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
• F-statistic = 6.88 > F table, or Pr = 0.012 < 0.05 → Reject H0
• The estimated equation has some power to explain the RCS figures.
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted model, 1990–1998.]
No seasonality remains, but the residuals still do not look like a random disturbance.
Omitted variable? Business cycle?
Trend Models
Special case of the simple regression model: the trend model
Y_t = b0 + b1·t + e_t
• The independent variable is time, t = 1, 2, 3, ..., T−1, T.
• There is no need to forecast the independent variable.
• Using simple transformations, a variety of nonlinear trend equations can be estimated, so the estimated model can mimic the pattern of the data.
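Fitting a linear trend is just OLS with t as the regressor, so a one-liner such as `numpy.polyfit` suffices. A sketch with a hypothetical series (the numbers are illustrative, not the FEE data):

```python
import numpy as np

# Hypothetical quarterly index following a linear trend
t = np.arange(36)
y = 115.7 + 3.84 * t

b1, b0 = np.polyfit(t, y, 1)       # slope, intercept of y = b0 + b1*t
future_t = np.arange(36, 40)        # four quarters beyond the sample
forecast = b0 + b1 * future_t       # no need to forecast the regressor: it is time itself
```

Because the independent variable is the time index, out-of-sample values of the regressor are known exactly, which is the practical appeal of trend models.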
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Chapter 3 Exercise 13: College Tuition Consumers' Price Index by Quarter
[Plot of FEE, 1986–1995, with the holdout period marked.]
OLS Estimates
Dependent Variable: FEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C 115.7312 1.982166 58.38624 0.0000
@TREND 3.837580 0.097399 39.40080 0.0000
R-squared 0.978568 Mean dependent var 182.8889
Adjusted R-squared 0.977938 S.D. dependent var 40.87177
S.E. of regression 6.070829 Akaike info criterion 6.498820
Sum squared resid 1253.069 Schwarz criterion 6.586793
Log likelihood -114.9788 F-statistic 1552.423
Durbin-Watson stat 0.284362 Prob(F-statistic) 0.000000
Fitted model: fee = b0 + b1·t
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
Each quarter the tuition index increases by about 3.84 points.
Hypothesis test related to β1: H0: β1 = 0, H1: β1 ≠ 0; the t test is used to test the validity of H0.
t = b1/se(b1)
• If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
• t = 39.4 > t table, or Pr = 0.0000 < 0.05 → Reject H0
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
R² = 0 indicates no explanatory power of X (the equation).
R² = 1 indicates perfect explanation of Y by X (the equation).
R² = 0.9785 indicates very strong explanatory power.
Hypothesis test related to R²: H0: R² = 0, H1: R² ≠ 0; the F test checks this hypothesis.
• If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
• If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
• F-statistic = 1552 > F table, or Pr = 0.0000 < 0.05 → Reject H0
• The estimated equation has explanatory power.
Graphical Evaluation of Fit
[FEE and its forecast FEEF, 1986–1995, with the holdout period marked.]
ACTUAL versus FORECAST over the holdout period:
1995 Q1   260.00   253.88
1995 Q2   259.00   257.72
1995 Q3   266.00   261.55
1995 Q4   274.00   265.39
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1986–1994.]
Residuals exhibit a clear pattern; they are not random.
The seasonal fluctuations also cannot be modelled this way.
The regression model is misspecified.
Model Improvement
The data may exhibit an exponential trend.
In this case, take the logarithm of the dependent variable and estimate the trend by OLS.
After the OLS estimation, forecast the holdout period.
Take the exponential of the logarithmic forecasted values in order to return to the original units.
Exponential trend: Y_t = A·e^(b1·t)
Taking logarithms gives the linear model ln(Y_t) = b0 + b1·t + e_t, which can be estimated by OLS.
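The log-then-exponentiate procedure can be sketched in a few lines, again with a hypothetical exponentially trending series rather than the actual FEE data:

```python
import numpy as np

# Hypothetical series growing at roughly 2.1% per period
t = np.arange(36)
y = np.exp(4.82 + 0.021 * t)

b1, b0 = np.polyfit(t, np.log(y), 1)   # OLS fit of ln(y) = b0 + b1*t
h = np.arange(36, 40)                   # holdout / forecast periods
forecast = np.exp(b0 + b1 * h)          # exponentiate to return to original units
```

The slope b1 of the log-linear fit is (approximately) the per-period growth rate of the original series.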
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Original and Logarithmic Transformed Data
[FEE and its logarithm LFEE, 1986–1995.]
LOG(FEE)   FEE
4.844187   127.000
4.844187   127.000
4.867534   130.000
4.912655   136.000
4.912655   136.000
4.919981   137.000
4.941642   140.000
4.976734   145.000
4.983607   146.000
OLS Estimate of the Logarithmic Trend Model
Dependent Variable: LFEE
Method: Least Squares
Sample: 1986:1 1994:4
Included observations: 36
Variable Coefficient Std. Error t-Statistic Prob.
C 4.816708 0.005806 829.5635 0.0000
@TREND 0.021034 0.000285 73.72277 0.0000
R-squared 0.993783 Mean dependent var 5.184797
Adjusted R-squared 0.993600 S.D. dependent var 0.222295
S.E. of regression 0.017783 Akaike info criterion -5.167178
Sum squared resid 0.010752 Schwarz criterion -5.079205
Log likelihood 95.00921 F-statistic 5435.047
Durbin-Watson stat 0.893477 Prob(F-statistic) 0.000000
Fitted model: ln(fee) = b0 + b1·t
Forecast Calculations
obs      FEE        LFEEF      FEELF = exp(LFEEF)
1993:1   228.0000   5.405651   222.6610
1993:2   228.0000   5.426684   227.3940
1993:3   235.0000   5.447718   232.2276
1993:4   243.0000   5.468751   237.1639
1994:1   244.0000   5.489785   242.2052
1994:2   245.0000   5.510819   247.3536
1994:3   251.0000   5.531852   252.6114
1994:4   259.0000   5.552886   257.9810
1995:1   260.0000   5.573920   263.4648
1995:2   259.0000   5.594953   269.0651
1995:3   266.0000   5.615987   274.7845
1995:4   274.0000   5.637021   280.6254
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the logarithmic trend model, 1986–1994.]
Residuals exhibit a clear pattern; they are not random.
The seasonal fluctuations also cannot be modelled this way.
The regression model is misspecified.
Model Improvement
In order to deal with the seasonal variation:
• Remove the seasonal pattern from the data.
• Fit a regression model to the seasonally adjusted data.
• Generate forecasts.
• Add the seasonal movements back to the forecasted values.
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Multiplicative Seasonal Adjustment
Included observations: 40
Ratio to Moving Average
Original Series: FEE
Adjusted Series: FEESA
Scaling Factors:
  1   1.002372
  2   0.985197
  3   0.996746
  4   1.015929
Original and Seasonally Adjusted Data
[FEE and the seasonally adjusted series FEESA, 1986–1995.]
OLS Estimate of the Seasonally Adjusted Trend Model
Dependent Variable: FEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          115.0387      1.727632     66.58749      0.0000
@TREND     3.897488      0.076240     51.12152      0.0000
R-squared 0.985668   Mean dependent var 191.0397
Adjusted R-squared 0.985291   S.D. dependent var 45.89346
S.E. of regression 5.566018   Akaike info criterion 6.319943
Sum squared resid 1177.261   Schwarz criterion 6.404387
Log likelihood -124.3989   F-statistic 2613.410
Durbin-Watson stat 0.055041   Prob(F-statistic) 0.000000
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted trend model, 1986–1995.]
Residuals exhibit a clear pattern; they are not random.
There are no seasonal fluctuations.
The regression model is misspecified.
Model Improvement
• Take the logarithm in order to remove the existing nonlinearity.
• Apply additive seasonal adjustment to the logarithmic data.
• Apply OLS to the seasonally adjusted logarithmic data.
• Forecast the holdout period.
• Add the seasonal movements back to obtain seasonal (logarithmic) forecasts.
• Take the exponential in order to reach the original seasonal forecasts.
Suitable Data Pattern
[Grid of data patterns: seasonality (none / additive / multiplicative) crossed with trend (none / additive / multiplicative).]
Logarithmic Transformation and Additive Seasonal Adjustment
Sample: 1986:1 1995:4
Included observations: 40
Difference from Moving Average
Original Series: LFEE =log(FEE)
Adjusted Series: LFEESA
Scaling Factors:
1 0.002216
2 -0.014944
3 -0.003099
4 0.015828
Original and Logarithmic Additive Seasonally Adjustment Series
[FEE and the additively seasonally adjusted logarithmic series LFEESA, 1986–1995.]
OLS Estimate of the Logarithmic Additive Seasonally Adjusted Data
Dependent Variable: LFEESA
Method: Least Squares
Sample: 1986:1 1995:4
Included observations: 40
Variable Coefficient Std. Error t-Statistic Prob.
C 4.822122 0.004761 1012.779 0.0000
@TREND 0.020618 0.000210 98.12760 0.0000
R-squared 0.996069 Mean dependent var 5.224171
Adjusted R-squared 0.995966 S.D. dependent var 0.241508
S.E. of regression 0.015340 Akaike info criterion -5.468039
Sum squared resid 0.008942 Schwarz criterion -5.383595
Log likelihood 111.3608 F-statistic 9629.026
Durbin-Watson stat 0.149558 Prob(F-statistic) 0.000000
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted logarithmic trend model, 1986–1995.]
Residuals exhibit a clear pattern; they are not random.
There are no seasonal fluctuations.
The regression model is misspecified.
Autoregressive Model
In some cases a growth model may be more suitable for the data.
If the data exhibit nonlinearity, the autoregressive model can be adapted to model an exponential pattern.
Population model: Y_t = α0 + α1·Y_{t−1} + ε_t
Sample model: Y_t = a0 + a1·Y_{t−1} + e_t
Population model (logarithmic): Ln(Y_t) = α0 + α1·Ln(Y_{t−1}) + ε_t
Sample model (logarithmic): Ln(Y_t) = a0 + a1·Ln(Y_{t−1}) + e_t
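The sample autoregressive model is just an OLS regression of the series on its own lag. A minimal sketch (the function name and data are illustrative):

```python
import numpy as np

def fit_ar1(y):
    """OLS estimates (a0, a1) of the sample model Y_t = a0 + a1*Y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])   # regressors: constant, lagged Y
    coef, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return coef   # [a0, a1]
```

For the logarithmic variant, apply the same function to `np.log(y)`. Note that with a lagged dependent variable the first observation is lost, which is why EViews reports "after adjusting endpoints".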
OLS Estimate of Autoregressive Model
Dependent Variable: FEE
Method: Least Squares
Sample(adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 0.739490 2.305654 0.320729 0.7502
FEE(-1) 1.016035 0.011884 85.49718 0.0000
R-squared 0.994964 Mean dependent var 192.7179
Adjusted R-squared 0.994828 S.D. dependent var 45.45787
S.E. of regression 3.269285 Akaike info criterion 5.256940
Sum squared resid 395.4643 Schwarz criterion 5.342251
Log likelihood -100.5103 F-statistic 7309.767
Durbin-Watson stat 1.888939 Prob(F-statistic) 0.000000
Fitted model: fee_t = b0 + b1·fee_{t−1}
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the autoregressive model, 1987–1995.]
Clear seasonal pattern
Model is misspecified
Model Improvement
To remove the seasonal fluctuations:
• Seasonally adjust the data.
• Apply OLS to the autoregressive trend model.
• Forecast the seasonally adjusted data.
• Add the seasonal movement back to the forecasted values.
OLS Estimate of the Seasonally Adjusted Autoregressive Model
Dependent Variable: FEESA
Method: Least Squares
Sample(adjusted): 1986:2 1995:4
Included observations: 39 after adjusting endpoints
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           1.125315      0.811481     1.386743      0.1738
FEESA(-1)   1.013445      0.004181     242.4027      0.0000
R-squared 0.999371   Mean dependent var 192.6894
Adjusted R-squared 0.999354   S.D. dependent var 45.27587
S.E. of regression 1.151024   Akaike info criterion 3.169101
Sum squared resid 49.01968   Schwarz criterion 3.254412
Log likelihood -59.79748   F-statistic 58759.08
Durbin-Watson stat 1.335932   Prob(F-statistic) 0.000000
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series for the seasonally adjusted autoregressive model, 1987–1995.]
No seasonal pattern in the residuals.
The model specification seems more correct than the previous estimates.
Seasonal Autoregressive Model
If the data exhibit seasonal fluctuations, the growth model should be remodelled.
If the data exhibit nonlinearity and seasonality together, the seasonal autoregressive model can be adapted to model the exponential pattern.
Population model: Y_t = α0 + α1·Y_{t−s} + ε_t
Sample model: Y_t = a0 + a1·Y_{t−s} + e_t
Population model (logarithmic): Ln(Y_t) = α0 + α1·Ln(Y_{t−s}) + ε_t
Sample model (logarithmic): Ln(Y_t) = a0 + a1·Ln(Y_{t−s}) + e_t
New Product ForecastingGrowth Curve Fitting
For new products, the main problem is typically a lack of historical data.
Trend or seasonal patterns cannot be determined.
Forecasters can use a number of models that generally fall into the category called diffusion models.
These models are alternatively called S-curves, growth models, saturation models, or substitution curves.
They imitate the life cycle of products. Life cycles follow a common pattern:
• A period of slow growth just after the introduction of the new product
• A period of rapid growth
• Slowing growth in a mature phase
• Decline
New Product ForecastingGrowth Curve Fitting
Each growth model has its own lower and upper limits.
A significant benefit of using diffusion models is to identify and predict the timing of the four phases of the life cycle.
The transition from very slow initial growth to rapid growth is usually the result of solutions to technical difficulties and the market's acceptance of the new product or technology.
There is an upper limit, and a maturity phase occurs in which growth slows and finally ceases.
[S-curve illustration: sales against time over ten periods.]
GOMPERTZ CURVE
The Gompertz function is given as
Y_t = L·e^(−a·e^(−b·t))
where L = upper limit of Y, e = natural number = 2.71828..., and a and b = coefficients describing the curve.
The Gompertz curve ranges in value from zero to L as t varies from zero to infinity.
The Gompertz curve is a way to summarize the growth with a few parameters.
GOMPERTZ CURVE: An Example
HDTV: LCD and plasma TV sales figures
YEAR HDTV
2000 1200
2001 1500
2002 1770
2003 3350
2004 5500
2005 9700
2006 15000
[Left: observed HDTV sales, 2000–2006. Right: Gompertz forecast HDTVF extended to 2050, levelling off near its upper limit.]
GOMPERTZ CURVE: An Example
Dependent Variable: HDTV
Method: Least Squares
Sample (adjusted): 2000 2006
Included observations: 7 after adjustments
Convergence achieved after 61 iterations
HDTV = C(1)*EXP(-C(2)*EXP(-C(3)*@TREND))
        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    332940        850837       0.391         0.716
C(2)    6.718         2.023        3.321         0.029
C(3)    0.128         0.087        1.477         0.214
R-squared 0.992   Mean dependent var 5431.429
Adjusted R-squared 0.988   S.D. dependent var 5178.199
S.E. of regression 559.922   Akaike info criterion 15.791
Sum squared resid 1254049   Schwarz criterion 15.76782
Log likelihood -52.26849   Durbin-Watson stat 0.704723
LOGISTIC CURVE
The logistic function is given as
Y_t = L / (1 + a·e^(−b·t))
where L = upper limit of Y, e = natural number = 2.71828..., and a and b = coefficients describing the curve.
The logistic curve ranges in value from zero to L as t varies from zero to infinity.
The logistic curve is symmetric about its point of inflection; the Gompertz curve is not necessarily symmetric.
LOGISTIC or GOMPERTZ CURVE?
The answer lies in whether, in a particular situation, it becomes easier or more difficult to attain the maximum value the closer you get to it.
• Are there factors assisting the attainment of the maximum value once you get close to it, or
• Are there factors preventing the attainment of the maximum value once it is nearly attained?
If there is an offsetting factor such that growth is more difficult to maintain as the maximum is approached, the Gompertz curve will be the better choice.
If there are no such offsetting factors hindering the attainment of the maximum value, the logistic curve will be the better choice.
[HDTV actual values and forecast HDTVF, horizon extended to 2050.]
LOGISTIC CURVE: An Example
HDTV: LCD and plasma TV sales figures
YEAR   HDTV
2000   1200
2001   1500
2002   1770
2003   3350
2004   5500
2005   9700
2006   15000
Dependent Variable: HDTV
Method: Least Squares
Sample (adjusted): 2000 2006
Included observations: 7 after adjustments
Convergence achieved after 1 iteration
HDTV = C(1)/(1+C(2)*EXP(-C(3)*@TREND))
        Coefficient   Std. Error   t-Statistic   Prob.
C(1)    149930.000    350258.500   0.428         0.691
C(2)    199.182       432.110      0.461         0.669
C(3)    0.517         0.073        7.048         0.002
R-squared 0.997   Mean dependent var #######
Adjusted R-squared 0.995   S.D. dependent var #######
S.E. of regression 370.451   Akaike info criterion 14.965
Sum squared resid 548936   Schwarz criterion 14.942
Log likelihood -49.377   Durbin-Watson stat 1.632
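The logistic fit can be sketched the same way; only the curve function changes. Starting values are again assumptions near the reported estimates:

```python
import numpy as np
from scipy.optimize import curve_fit

t = np.arange(7)                                           # years 2000..2006
sales = np.array([1200, 1500, 1770, 3350, 5500, 9700, 15000], dtype=float)

def logistic(t, L, a, b):
    # Y_t = L / (1 + a * exp(-b*t))
    return L / (1 + a * np.exp(-b * t))

(L, a, b), _ = curve_fit(logistic, t, sales, p0=(150000.0, 200.0, 0.5), maxfev=20000)
ssr = np.sum((sales - logistic(t, L, a, b)) ** 2)   # the slide reports about 548,936
```

The logistic specification achieves a smaller residual sum of squares than the Gompertz fit on this sample, consistent with the two output tables.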
LOGISTIC versus GOMPERTZ CURVES
[Comparison of the two fitted forecasts, HDTVF_G (Gompertz) and HDTVF_L (logistic), extended to 2050.]
FORECASTING WITH MULTIPLE REGRESSION MODELS
BUSINESS FORECASTING
CONTENT
• DEFINITION
• INDEPENDENT VARIABLE SELECTION
• FORECASTING WITH THE MULTIPLE REGRESSION MODEL
• STATISTICAL EVALUATION OF THE MODEL
• SERIAL CORRELATION
• SEASONALITY TREATMENT
• GENERAL AUTOREGRESSIVE MODEL
• ADVICE
• EXAMPLES...
MULTIPLE REGRESSION MODEL
THE DEPENDENT VARIABLE, Y, IS A FUNCTION OF MORE THAN ONE INDEPENDENT VARIABLE, X1, X2, ..., Xk
GENERAL FORM: Y = f(X1, X2, ..., Xk)
LINEAR FORM, POPULATION REGRESSION: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
LINEAR FORM, SAMPLE REGRESSION: Y = b0 + b1X1 + b2X2 + ... + bkXk + e
SELECTING INDEPENDENT VARIABLES
• FIRST, DETERMINE THE DEPENDENT VARIABLE.
• SEARCH THE LITERATURE, USE COMMON SENSE, AND LIST THE MAIN POTENTIAL EXPLANATORY VARIABLES.
• IF TWO VARIABLES SHARE THE SAME INFORMATION, SUCH AS GDP AND GNP, SELECT THE MORE RELEVANT ONE.
• IF A VARIABLE SHOWS VERY LITTLE VARIATION, LOOK FOR A MORE VARIABLE ALTERNATIVE.
• SET THE EXPECTED SIGNS OF THE PARAMETERS TO BE ESTIMATED.
AN EXAMPLE: SELECTING INDEPENDENT VARIABLES
LIQUID PETROLEUM GAS (LPG) MARKET SIZE FORECAST
POTENTIAL EXPLANATORY VARIABLES: POPULATION, PRICE, URBANIZATION RATIO, GNP or GDP
MODEL: LPG_t = β0 + β1·POP_t + β2·GDP_t + β3·UR_t + β4·PRICE_t + ε_t
EXPECTATIONS: β1 > 0, β2 > 0, β3 > 0, β4 < 0
PARAMETER ESTIMATES: OLS ESTIMATION
Population regression model: Y = β0 + β1X1 + β2X2 + ... + βkXk + ε
Sample regression model: Ŷ = b0 + b1X1 + b2X2 + ... + bkXk
Error term: e = Y − Ŷ
Ordinary Least Squares (OLS) estimate:
min Σe² = Σ(Y − Ŷ)² = Σ(Y − b0 − b1X1 − b2X2 − ... − bkXk)²
IT IS VERY COMPLEX TO CALCULATE THE b's BY HAND; MATRIX ALGEBRA IS USED TO ESTIMATE THEM: b = (X′X)⁻¹X′Y.
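The matrix-algebra solution is a one-liner in practice. A sketch of the normal-equations estimator, checked on synthetic data with known coefficients:

```python
import numpy as np

def ols(X, y):
    """OLS estimate b = (X'X)^(-1) X'y, with an intercept column prepended to X."""
    X = np.column_stack([np.ones(len(X)), np.asarray(X, dtype=float)])
    return np.linalg.solve(X.T @ X, X.T @ np.asarray(y, dtype=float))
```

Solving the normal equations with `np.linalg.solve` avoids forming the inverse explicitly, which is both faster and numerically safer than a literal `inv(X.T @ X)`.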
FORECASTING WITH MULTIPLE REGRESSION MODEL
Ln(SALES_t) = b0 + 1.24·Ln(GDP_t) − 0.90·Ln(PRICE_t)
IF GDP INCREASES 1%, SALES INCREASE 1.24%; IF PRICE INCREASES 1%, SALES DECREASE 0.9%.
PERIOD   GDP    PRICE   SALES
100      1245   100     230
101      1300   103     ?
Ln(SALES_101) = b0 + 1.24·Ln(1300) − 0.90·Ln(103) ≈ 5.46
SALES_101 = e^5.46 ≈ 235
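The arithmetic behind the log-log forecast can be checked directly. As a sketch, the intercept below is backed out from the period-100 observation so the example is self-consistent (the slide does not report it separately):

```python
import math

# Elasticities from the estimated log-log model
b_gdp, b_price = 1.24, -0.90

# Back out the intercept from the period-100 row: ln(230) = b0 + 1.24*ln(1245) - 0.90*ln(100)
b0 = math.log(230) - b_gdp * math.log(1245) - b_price * math.log(100)

# Forecast period 101 with GDP = 1300 and PRICE = 103, then exponentiate
ln_sales = b0 + b_gdp * math.log(1300) + b_price * math.log(103)
sales = math.exp(ln_sales)   # roughly 235, as on the slide
```

The exponentiation at the end is essential: the regression forecasts the logarithm, not the level.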
EXAMPLE : LPG FORECAST
[Annual LPG sales (TUPSATAY series), 1968–1996.]
LOGARITHMIC TRANSFORMATION
[LSATA: logarithm of LPG sales, 1968–1996.]
SCATTER DIAGRAM
[Scatter plots of LSATA against LGNP and LSATA against LP. The positive relation with LP is unexpected.]
LSATA=f(LGNP)
Dependent Variable: LSATA
Method: Least Squares
Sample: 1968 1997
Included observations: 30
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -44.91150     3.097045     -14.50140     0.0000
LGNP       4.081938      0.220265     18.53195      0.0000
R-squared 0.924616   Mean dependent var 12.47858
Adjusted R-squared 0.921924   S.D. dependent var 0.736099
S.E. of regression 0.205681   Akaike info criterion -0.260637
Sum squared resid 1.184535   Schwarz criterion -0.167224
Log likelihood 5.909555   F-statistic 343.4333
Durbin-Watson stat 0.485414   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LGNP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1968–1996.]
NOT RANDOM
LSATA=f(LP)
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          11.70726      0.081886     142.9694      0.0000
LP         0.190128      0.015096     12.59492      0.0000
R-squared 0.854551   Mean dependent var 12.53724
Adjusted R-squared 0.849164   S.D. dependent var 0.674006
S.E. of regression 0.261768   Akaike info criterion 0.223756
Sum squared resid 1.850107   Schwarz criterion 0.318052
Log likelihood -1.244459   F-statistic 158.6319
Durbin-Watson stat 0.187322   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
NOT RANDOM
LSATA=f(LGNP,LP)
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -30.808410    7.715902     -3.992846     0.0005
LGNP       3.066655      0.556533     5.510284      0.0000
LP         0.045318      0.028281     1.602436      0.1211
R-squared 0.932905   Mean dependent var 12.53724
Adjusted R-squared 0.927744   S.D. dependent var 0.674006
S.E. of regression 0.181176   Akaike info criterion -0.480999
Sum squared resid 0.853443   Schwarz criterion -0.339555
Log likelihood 9.974488   F-statistic 180.7558
Durbin-Watson stat 0.364799   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LGNP + β2·LP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
NOT RANDOM
WHAT IS MISSING?
GNP AND PRICE ARE THE MOST IMPORTANT VARIABLES, BUT THE COEFFICIENT OF PRICE IS NOT SIGNIFICANT AND HAS AN UNEXPECTED SIGN.
THE RESIDUAL DISTRIBUTION IS NOT RANDOM. WHAT IS MISSING?
• WRONG FUNCTION / NONLINEAR MODEL?
• LACK OF DYNAMIC MODELLING?
• MISSING IMPORTANT VARIABLE? POPULATION?
LSATA = f(LGNP, LP, LPOP)
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable   Coefficient   Std. Error   t-Statistic   Prob.
C          -50.913420    3.992134     -12.75343     0.0000
LGNP       0.755445      0.337894     2.235746      0.0345
LP         -0.131508     0.021528     -6.108568     0.0000
LPOP       4.955945      0.486887     10.17885      0.0000
R-squared 0.986958   Mean dependent var 12.53724
Adjusted R-squared 0.985393   S.D. dependent var 0.674006
S.E. of regression 0.081461   Akaike info criterion -2.049934
Sum squared resid 0.165899   Schwarz criterion -1.861342
Log likelihood 33.72405   F-statistic 630.6084
Durbin-Watson stat 0.398661   Prob(F-statistic) 0.000000
Model: LSATA = β0 + β1·LGNP + β2·LP + β3·LPOP + ε
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
NOT RANDOM
WHAT IS MISSING?
GNP, POPULATION, AND PRICE ARE THE MOST IMPORTANT VARIABLES:
• THEY ARE SIGNIFICANT
• THEY HAVE THE EXPECTED SIGNS
THE RESIDUAL DISTRIBUTION IS NOT RANDOM. WHAT IS MISSING?
• WRONG FUNCTION / NONLINEAR MODEL?
• LACK OF DYNAMIC MODELLING? YES.
• MISSING IMPORTANT VARIABLE? YES, URBANIZATION.
LSATA = f(LGNP, LP, LPOP, LSATA_{t−1})
Dependent Variable: LSATA
Method: Least Squares
Sample(adjusted): 1969 1997
Included observations: 29 after adjusting endpoints
Variable    Coefficient   Std. Error   t-Statistic   Prob.
C           -16.185910    3.832897     -4.222893     0.0003
LGNP        0.523657      0.150971     3.468585      0.0020
LP          -0.033964     0.013483     -2.518934     0.0188
LPOP        1.279753      0.419566     3.050182      0.0055
LSATA(-1)   0.619986      0.060756     10.20446      0.0000
R-squared 0.997557   Mean dependent var 12.53724
Adjusted R-squared 0.997150   S.D. dependent var 0.674006
S.E. of regression 0.035983   Akaike info criterion -3.655968
Sum squared resid 0.031074   Schwarz criterion -3.420227
Log likelihood 58.01154   F-statistic 2450.048
Durbin-Watson stat 2.118752   Prob(F-statistic) 0.000000
Model: LSATA_t = β0 + β1·LGNP_t + β2·LP_t + β3·LPOP_t + β4·LSATA_{t−1} + ε_t
Graphical Evaluation of Fitand Error Terms
[Residual, actual, and fitted series, 1970–1996.]
RANDOM
Basic Statistical Evaluation
β1 is the slope coefficient that tells us the rate of change in Y per unit change in X.
When GNP increases by 1%, the volume of LPG sales increases by 0.52%.
Hypothesis test related to β1:
• H0: β1 = 0
• H1: β1 ≠ 0
• The t test is used to test the validity of H0: t = b1/se(b1)
If t statistic > t table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
If t statistic < t table → Do not reject H0, or Pr > α → Do not reject H0
t = 3.46 > t table, or Pr = 0.002 < 0.05 → Reject H0
GNP has a significant effect on LPG sales.
Basic Statistical Evaluation
R² is the coefficient of determination, which tells us the fraction of the variation in Y explained by X; 0 ≤ R² ≤ 1.
• R² = 0 indicates no explanatory power of X (the equation).
• R² = 1 indicates perfect explanation of Y by X (the equation).
• R² = 0.9975 indicates very strong explanatory power.
Hypothesis test related to R²:
• H0: R² = 0
• H1: R² ≠ 0
• The F test checks the hypothesis.
If F statistic > F table → Reject H0, or Pr < α (e.g. α = 0.05) → Reject H0
If F statistic < F table → Do not reject H0, or Pr > α → Do not reject H0
F-statistic = 2450 > F table, or Pr = 0.0000 < 0.05 → Reject H0
The estimated equation has power to explain the LPG sales figures.
SHORT AND LONG TERM IMPACTS
If we specify a dynamic model, we can estimate the short- and long-term impacts of the independent variables on the dependent variable simultaneously.
y_t = β0 + β1·x_t + β2·y_{t−1} + ε_t
In equilibrium, y_t = y_{t−1} = y, so:
y = β0 + β1·x + β2·y
y·(1 − β2) = β0 + β1·x
y = β0/(1 − β2) + [β1/(1 − β2)]·x
Short-term effect of x: β1
Long-term effect of x: β1/(1 − β2)
AN EXAMPLE: SHORT AND LONG TERM IMPACTS
         Short Term Impact   Long Term Impact
LGNP     0.523657            1.3778
LP       -0.033964           -0.0892
LPOP     1.279753            3.3657
IF GNP INCREASES 1% AT TIME t, LPG SALES INCREASE 0.52% AT TIME t.
IN THE LONG RUN (WITHIN 3–5 YEARS), LPG SALES INCREASE 1.38%.
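The long-term impacts are just the short-term coefficients divided by one minus the coefficient on the lagged dependent variable. A quick check with the values from the dynamic LSATA equation:

```python
# Coefficients from the dynamic LSATA equation
short_run = {"LGNP": 0.523657, "LP": -0.033964, "LPOP": 1.279753}
b_lag = 0.619986   # coefficient on LSATA(-1)

# Long-run effect = short-run effect / (1 - coefficient on lagged dependent variable)
long_run = {name: b / (1 - b_lag) for name, b in short_run.items()}
```

This reproduces the table's long-term impacts (1.378 for LGNP, -0.089 for LP, about 3.37 for LPOP) up to rounding.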
SESONALITY AND MULTIPLE REGRESSION MODEL
• SEASONAL DUMMY VARIABLES CAN BE USED TO MODEL SEASONAL PATTERNS.
• A DUMMY VARIABLE IS A BINARY VARIABLE THAT ONLY TAKES THE VALUES 0 AND 1.
• DUMMY VARIABLES ARE INDICATOR VARIABLES: IF THE DUMMY VARIABLE TAKES THE VALUE 1 IN A GIVEN PERIOD, IT MEANS THAT SOMETHING HAPPENS IN THAT PERIOD.
SEASONAL DUMMY VARIABLES THE SOMETHING CAN BE SPECIFIC SEASON THE DUMMY VARIABLE INDICATES THE SPECIFIC SEASON D1 IS A DUMMY VARIABLE WHICH INDICATES THE FIRST QUARTERS
» 1990Q1 1» 1990Q2 0» 1990Q3 0» 1990Q4 0» 1991Q1 1» 1991Q2 0» 1991Q3 0» 1991Q4 0» 1992Q1 1» 1992Q2 0» 1992Q3 0» 1992Q4 0
BASE PERIOD
DATE D1 D2 D3 1990 Q1 1 0 0 1990 Q2 0 1 0 1990 Q3 0 0 1 1990 Q4 0 0 0 1990 Q1 1 0 0 1991 Q2 0 1 0 1991 Q3 0 0 1 1991 Q4 0 0 0 1992 Q1 1 0 0 1992 Q2 0 1 0 1992 Q3 0 0 1 1992 Q4 0 0 0
FULL SEASONAL DUMMY VARIABLE REPRESENTATION
[FIGURE: COLLEGE TUITION CONSUMERS' PRICE INDEX (FEE) BY QUARTER, 1986-1995]
THE DATA ARE QUARTERLY, THEREFORE 3 DUMMY VARIABLES ARE SUFFICIENT TO CAPTURE THE SEASONAL PATTERN.
DATE     D1  D2  D3
1990 Q1   1   0   0
1990 Q2   0   1   0
1990 Q3   0   0   1
1990 Q4   0   0   0
SEASONAL PATTERN MODELLED
COLLEGE TUITION PRICE INDEX TREND ESTIMATION
Dependent Variable: LOG(FEE)
Method: Least Squares
Sample(adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 4.832335 0.006948 695.4771 0.0000
@TREND 0.020780 0.000232 89.57105 0.0000
D1 -0.011259 0.007202 -1.563344 0.1275
D1(-1) -0.029526 0.007198 -4.101948 0.0003
D1(-2) -0.017082 0.007010 -2.436806 0.0204
R-squared 0.995921 Mean dependent var 5.244170
Adjusted R-squared 0.995427 S.D. dependent var 0.231661
S.E. of regression 0.015666 Akaike info criterion -5.352558
Sum squared resid 0.008099 Schwarz criterion -5.137087
Log likelihood 106.6986 F-statistic 2014.429
Durbin-Watson stat 0.161634 Prob(F-statistic) 0.000000
LFEE_t = β0 + β1·t + β2·D1_t + β3·D1_(t-1) + β4·D1_(t-2) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1987-1995. THE RESIDUALS ARE NOT RANDOM.]
COLLEGE TUITION PRICE INDEX AUTOREGRESSIVE TREND ESTIMATION
Dependent Variable: LOG(FEE)
Method: Least Squares
Sample(adjusted): 1986:3 1995:4
Included observations: 38 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.050887    0.022969     2.215524  0.0337
LOG(FEE(-1))     0.997510    0.004375     227.9958  0.0000
D1              -0.031634    0.002833    -11.16704  0.0000
D1(-1)          -0.035335    0.002833    -12.47301  0.0000
D1(-2)          -0.006775    0.002761    -2.454199  0.0196

R-squared            0.999368   Mean dependent var     5.244170
Adjusted R-squared   0.999292   S.D. dependent var     0.231661
S.E. of regression   0.006165   Akaike info criterion -7.217678
Sum squared resid    0.001254   Schwarz criterion     -7.002206
Log likelihood       142.1359   F-statistic            13051.60
Durbin-Watson stat 1.605178 Prob(F-statistic) 0.000000
LFEE_t = β0 + β1·LFEE_(t-1) + β2·D1_t + β3·D1_(t-1) + β4·D1_(t-2) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1987-1995. THE RESIDUALS ARE RANDOM.]
IN THIS EQUATION, THE DUMMY TERMS ARE THE SEASONAL PART OF THE MODEL AND THE LAGGED LFEE TERM IS THE DYNAMIC PART OF THE MODEL.
COLLEGE TUITION PRICE INDEX GENERALIZED AUTOREGRESSIVE TREND ESTIMATION
Dependent Variable: LFEE
Method: Least Squares
Sample(adjusted): 1987:1 1995:4
Included observations: 36 after adjusting endpoints
Variable Coefficient Std. Error t-Statistic Prob.
C 0.048752 0.024114 2.021760 0.0529
LFEE(-1) 1.126366 0.182970 6.156010 0.0000
LFEE(-2) 0.292152 0.256488 1.139051 0.2643
LFEE(-3) -0.344963 0.253185 -1.362491 0.1839
LFEE(-4) -0.076855 0.181751 -0.422857 0.6756
D1 -0.043879 0.005597 -7.840118 0.0000
D1(-1) -0.048562 0.010241 -4.742040 0.0001
D1(-2) -0.005369 0.009855 -0.544814 0.5902

R-squared            0.999502   Mean dependent var     5.263841
Adjusted R-squared   0.999377   S.D. dependent var     0.221681
S.E. of regression   0.005532   Akaike info criterion -7.363447
Sum squared resid    0.000857   Schwarz criterion     -7.011554
Log likelihood       140.5420   F-statistic            8025.362
Durbin-Watson stat 1.892211 Prob(F-statistic) 0.000000
LFEE_t = β0 + Σ_(i=1..s) βi·LFEE_(t-i) + δ1·D1_t + δ2·D1_(t-1) + δ3·D1_(t-2) + ε_t
GAP SALES FORECAST
[FIGURE: GAP SALES AND LOG SALES (LSALES) BY QUARTER, 1985-1999]
SIMPLE AUTOREGRESSIVE REGRESSION MODEL
Dependent Variable: LSALES
Method: Least Squares
Sample(adjusted): 1985:2 1999:4
Included observations: 59 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.613160    0.484163     1.266433  0.2105
LSALES(-1)       0.958714    0.036128     26.53623  0.0000

R-squared            0.925115   Mean dependent var     13.43549
Adjusted R-squared   0.923802   S.D. dependent var     0.848687
S.E. of regression   0.234272   Akaike info criterion -0.031358
Sum squared resid    3.128350   Schwarz criterion      0.039067
Log likelihood       2.925062   F-statistic            704.1714
Durbin-Watson stat 2.159164 Prob(F-statistic) 0.000000
SEASONALITY IS NOT MODELLED
LSALES_t = β0 + β1·LSALES_(t-1) + ε_t
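Once the autoregressive coefficients are estimated, a one-step-ahead forecast is a single substitution. A minimal sketch using the slide's estimates, with an assumed "last observed" log-sales value:

```python
# Sketch: one-step-ahead forecast from LSALES_t = b0 + b1*LSALES_(t-1).
# b0 and b1 are the slide's OLS estimates; last_lsales is an ASSUMED
# illustrative value for the most recent observed LSALES.

import math

b0 = 0.613160
b1 = 0.958714
last_lsales = 14.8   # assumed last observed log sales (illustrative)

next_lsales = b0 + b1 * last_lsales
next_sales = math.exp(next_lsales)   # back-transform to the sales scale
print(round(next_lsales, 3))
```

Because the dependent variable is in logs, the forecast must be exponentiated to return to the original sales units.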
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1986-1999. THE RESIDUALS ARE NOT RANDOM.]
AUTOREGRESSIVE REGRESSION MODEL WITH SEASONAL DUMMIES
Dependent Variable: LSALES
Method: Least Squares
Sample(adjusted): 1985:3 1999:4
Included observations: 58 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.299734    0.111564     2.686656  0.0096
LSALES(-1)       0.994473    0.008213     121.0873  0.0000
D1              -0.547251    0.018685    -29.28766  0.0000
D1(-1)          -0.175405    0.018732    -9.364126  0.0000
D1(-2)           0.033281    0.018458     1.803073  0.0771

R-squared            0.996547   Mean dependent var     13.46547
Adjusted R-squared   0.996287   S.D. dependent var     0.823972
S.E. of regression   0.050210   Akaike info criterion -3.062940
Sum squared resid    0.133616   Schwarz criterion     -2.885316
Log likelihood       93.82526   F-statistic            3824.335
Durbin-Watson stat 1.828642 Prob(F-statistic) 0.000000
LSALES_t = β0 + β1·LSALES_(t-1) + β2·D1_t + β3·D1_(t-1) + β4·D1_(t-2) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1986-1999. THE RESIDUALS ARE RANDOM.]
ALTERNATIVE SEASONAL MODELLING
FOR NONSEASONAL DATA, THE AUTOREGRESSIVE MODEL CAN BE WRITTEN AS
y_t = β0 + β1·y_(t-1) + ε_t
IF THE LENGTH OF THE SEASONALITY IS s, THE SEASONAL AUTOREGRESSIVE MODEL CAN BE WRITTEN AS
y_t = β0 + β1·y_(t-s) + ε_t
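The seasonal autoregression is still a bivariate regression: each observation is paired with its value one seasonal cycle earlier. A minimal sketch with s = 4 on a simulated quarterly series (not the GAP sales data):

```python
# Sketch: estimating y_t = b0 + b1*y_(t-s) by OLS with s = 4 (quarterly data).
# The series below is SIMULATED with a repeating quarterly pattern plus a
# drift; it is not the GAP sales series from the lecture.

s = 4
y = [10.0, 8.0, 9.0, 12.0,
     10.5, 8.4, 9.5, 12.6,
     11.0, 8.9, 10.0, 13.1,
     11.4, 9.3, 10.4, 13.7]

# Pair each observation with its value one seasonal cycle (s periods) earlier
xs = [y[t - s] for t in range(s, len(y))]
ys = [y[t] for t in range(s, len(y))]
n = len(xs)

mx = sum(xs) / n
my = sum(ys) / n
b1 = sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / \
     sum((a - mx) ** 2 for a in xs)
b0 = my - b1 * mx
print(round(b0, 3), round(b1, 3))
```

For a seasonal series like this one, the slope comes out close to 1, mirroring the LSALES(-4) coefficient of about 0.99 in the EViews output that follows.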
SEASONAL LAGGED AUTOREGRESSIVE REGRESSION MODEL
Dependent Variable: LSALES
Method: Least Squares
Sample(adjusted): 1986:1 1999:4
Included observations: 56 after adjusting endpoints

Variable      Coefficient  Std. Error  t-Statistic  Prob.
C                0.329980    0.169485     1.946953  0.0567
LSALES(-4)       0.990877    0.012720     77.89949  0.0000

R-squared            0.991180   Mean dependent var     13.50893
Adjusted R-squared   0.991016   S.D. dependent var     0.804465
S.E. of regression   0.076248   Akaike info criterion -2.274583
Sum squared resid    0.313945   Schwarz criterion     -2.202249
Log likelihood       65.68834   F-statistic            6068.330
Durbin-Watson stat 0.434696 Prob(F-statistic) 0.000000
LSALES_t = β0 + β1·LSALES_(t-4) + ε_t
Graphical Evaluation of Fit and Error Terms
[FIGURE: RESIDUAL, ACTUAL, AND FITTED SERIES, 1986-1999]