Decision 411: Class 5 - Fuqua School of Business (rnau/Decision411...)


Decision 411: Class 5

HW#2 discussion
Introduction to regression forecasting
Example: rolling back the beer tax

Where we’ve been so far

Thus far we have looked at the most basic models for predicting future values of a time series Y from its own history: the mean model, random walk model, and smoothing/averaging models, possibly with seasonal adjustment.

These basic models assume that future values of Y are some sort of linear function of its past values, so we’ve also discussed the use of nonlinear data transformations (logging, deflating) to cover more possibilities.

We’ve also studied basic principles and tools for testing the assumptions of models and comparing the forecasting accuracy of different models.

Where we’re going next

Next we will consider models for predicting future values of Y as linear functions of already-known values of some other variable X, or possibly several other variables (X1, X2, etc.). These more general linear forecasting models are called “regression” models (for reasons to be explained…).

In some cases the X’s could be lagged (previous) values of Y, but in general they are other variables whose movements are in some way predictive of movements in Y. Our same general tools for testing and comparing models will still apply, but now there will be more assumptions to test and more models to compare.

Game plan for this week

Today (videos #3-6):
- Major concepts: correlation, R-squared & all that
- Regression tools for time series data
- Regression procedures available in Statgraphics
- Example: rolling back the beer tax

Friday (videos #15-16):
- Seasonality revisited: dummy variables
- Selecting regressors: manual vs. stepwise
- Modeling issues & diagnostic tests
- More examples

Game plan for next week

Tuesday, September 25:
- Quiz
- Nonlinear transformations
- Not-so-simple regression
- Multiplicative regression

Friday, September 25 (videos #17-18):
- Advanced regression techniques: ANOVA, general linear models, logistic regression, etc.

Linear Regression

- Is the most widely used (and abused!) of all statistical techniques
- Is about the fitting of straight lines to data

General equation of a (simple) regression line:

“Y = constant + beta*X”

[Scatterplot: Y1 vs. X with a fitted straight line; both axes run from 50 to 300]

Why assume linear relationships?

- Linear relationships are the simplest non-trivial relationships (start simple!)
- "True" relationships between variables are often at least approximately linear over the range of interest

[Scatterplot: Y vs. X illustrating an approximately linear relationship]

“Linearization” of relationships

Alternatively, we may be able to transform the variables in such a way as to “linearize” the relationships.

Nonlinear transformations (log, power, reciprocal, deflation, differences, percentage differences, ratios of variables, etc.) are therefore an important tool of regression modeling…

…but use with care and with good motivation!

Examples:
- Sales $$ = constant + beta * Advertising $$
- Δ Units sold = constant + beta * Δ Coupons distributed
- % Return on stock = constant + beta * % Return on market
- Log(Population) = constant + beta * Time
- Temperature(t) = constant + beta * Temperature(t−1)
- Δ WebHits(t) = constant + beta * Δ WebHits(t−1)

(Δ denotes “change in…”, i.e., delta, DIFF…)
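As a minimal sketch of why such transformations matter, the following made-up example (the growth rate, starting level, and noise are all invented for illustration) fits the Log(Population) = constant + beta * Time model from the list above: the series is exponential in the original units, but ordinary least squares works after logging.

```python
import numpy as np

# Hypothetical data: a population growing roughly 3% per period, with
# multiplicative noise. These numbers are made up for illustration.
rng = np.random.default_rng(0)
t = np.arange(50)
population = 1000 * np.exp(0.03 * t) * rng.lognormal(0, 0.02, size=t.size)

# A straight line fits poorly in the original units, but after the log
# transformation the relationship Log(Population) = constant + beta*Time
# is approximately linear, so a least-squares line applies directly.
beta, constant = np.polyfit(t, np.log(population), 1)
```

Here beta estimates the per-period growth rate and exp(constant) the starting level, which is exactly the sense in which logging "linearizes" exponential growth.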

History of regression

Regression was so-named by Sir Francis Galton, a 19th century scientist & adventurer.

Galton initially gained fame for his African explorations and wrote best-selling books on wilderness exploration that introduced the sleeping bag & other wilderness gear to the Western world (still in print).

Galton (warts and all)

- Was also a pioneer in the collection & analysis of biometric, anthropometric & psychometric data, inspired by the evolution theory of Darwin
- Invented weather maps and pioneered the scientific study of tea-brewing
- He was also wrong about some things (e.g., eugenics)
- His disciple, Karl Pearson, worked out the mathematics of correlation and regression

(Look him up in Google or Wikipedia, also Galton.org)

Galton’s observations

A taller-than-average parent tends to have a taller-than-average child, but the child is likely to be less tall than the parent relative to its own generation.

Parent’s height = x standard deviations from the mean ⇒ child’s predicted height = r·x standard deviations from the mean…

…where r is a number less than 1 in magnitude: the coefficient of correlation between heights of parents and children. This is a "regression toward mediocrity," or in modern terms a "regression to the mean."

[Figure: The first regression line (1877)]

Graphical interpretation of regression

If you “standardize” the X and Y variables by converting them to units of standard deviations from their own means, the prediction line passes through the origin and has a slope equal to the correlation coefficient, r.

Thus, the line “regresses” back toward the X axis, because this minimizes the squared errors in predicting Y from X.

[Scatterplot: standardize(Y3) vs. standardize(X); both axes run from −3 to 3]

On a “standardized” plot of Y vs. X, where the units are standard-deviations-from-the-mean, the data distribution is roughly symmetric around the 45-degree line…

…but the line for predicting Y from X “regresses” toward the X axis because this minimizes the squared error in the Y direction.

The slope of the regression line on the standardized plot is the correlation r (= 0.46 in this case).
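The claim that the standardized slope equals r can be checked numerically. This is a sketch with made-up data (the 0.5 coefficient and sample size are arbitrary), not the data behind the slide’s plot:

```python
import numpy as np

# Made-up data with correlation well below 1, so the "regression toward
# the X axis" is visible in the fitted slope.
rng = np.random.default_rng(1)
x = rng.normal(size=500)
y = 0.5 * x + rng.normal(size=500)

# Standardize: subtract each variable's mean, divide by its standard deviation.
zx = (x - x.mean()) / x.std()
zy = (y - y.mean()) / y.std()

# On the standardized scale the least-squares line passes through the
# origin with slope equal to the correlation coefficient r.
slope, intercept = np.polyfit(zx, zy, 1)
r = np.corrcoef(x, y)[0, 1]
```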

Graphical interpretation

[Scatterplot: standardize(Y3) vs. standardize(X); both axes run from −3 to 3]

If we instead wanted to predict X from Y, the line would regress to the Y axis instead! (This line would minimize the squared error measured in the X direction.)

Graphical interpretation of regression with time series data

In a simple regression of two time series, the forecast plot of Y is just a shifted and rescaled copy of the time series plot of X.

In a multiple regression of time series, the forecast plot of Y is a weighted sum* of the plots of the X’s.

In either case, the time pattern in Y should “look like” some of the time patterns in the X variables: trends and peaks and valleys and spikes in Y ideally should have their counterparts somewhere among the X’s.

* weights can be positive or negative

Regression is inescapable

- Your kids will probably be less exceptional than you and your spouse, for better or worse
- Your performance on the final exam in a course will probably be less exceptional than your score on the midterm
- A ballplayer’s performance during the 2nd half of a season will probably be less exceptional than in the 1st half
- The hottest mutual funds of the last 5 years will be less hot in the next 5 years

Regression is inescapable, cont’d

Your forecasting models will always produce sequences of forecasts that are smoother (less variable) than the actual data.

This doesn’t mean the future is guaranteed to be more mediocre (less interesting) than the past, but that’s the way to bet!

Why do predictions regress?

Is there a “restoring force” that pulls everything back to the mean? No! It’s a purely statistical phenomenon.

Here’s why:

Every observation of a random process is part “signal” (a predictable or inheritable component) and part “noise” (a random, unpredictable, zero-mean component).

An observation that is exceptional (far above or below the mean) is likely to be composed of a signal and a noise term with the same sign (both positive or both negative).

If the high- (or low-) achiever performs again (or has offspring), the expected signal will be just as strong and in the same direction, but the expected noise term will be zero.

Hence the second observation is likely (not guaranteed, just likely) to be closer to the mean.
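The signal-plus-noise argument is easy to see in a minimal simulation (all the numbers here are made up; the point is only the mechanism):

```python
import numpy as np

# Each "performance" = persistent signal + fresh noise. The signal carries
# over between the two observations; the noise does not.
rng = np.random.default_rng(2)
n = 100_000
signal = rng.normal(0, 1, n)
first = signal + rng.normal(0, 1, n)   # first observation
second = signal + rng.normal(0, 1, n)  # same signal, independent new noise

# Select the subjects whose FIRST score was exceptional (top 5%). Their
# average second score is pulled back toward the overall mean of zero,
# even though nothing "restores" it: only the lucky noise failed to repeat.
top = first > np.quantile(first, 0.95)
avg_first = first[top].mean()
avg_second = second[top].mean()
```

The selected group's average second score sits roughly halfway between its first-round average and zero, because in this setup the signal accounts for half the variance.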

Underlying assumptions of regression

- Linear relationship between variables
- Constant variance of errors (homoscedasticity)
- Normally distributed errors
- Independent errors (no autocorrelation)
- Stationary process (stable correlations over time)

These need to be tested! Error statistics and confidence intervals for forecasts are not reliable if the assumptions are badly violated.

Sufficient statistics for regression

Regression analysis depends only on the following summary statistics of the data:

- Means of all variables
- Variances (or standard deviations) of all variables
- Covariances (or correlations) between all pairs of variables

Given only these statistics, you can calculate all the coefficient estimates, standard errors, and forecasts for any regression model that might be fitted to any combination of the variables! (However, you still ought to look at residual plots, etc.…)
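To illustrate the sufficiency claim, the sketch below (made-up data) computes the simple-regression coefficients from the means, variance, and covariance alone, then cross-checks against a direct least-squares fit on the raw data:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(3)
x = rng.normal(10, 2, 200)
y = 3 + 1.5 * x + rng.normal(0, 1, 200)

# Summary statistics only: means, variance of X, covariance of X and Y.
mean_x, mean_y = x.mean(), y.mean()
var_x = x.var(ddof=1)
cov_xy = np.cov(x, y, ddof=1)[0, 1]

# Coefficients computed from the summary statistics, never touching the
# raw data again.
b1 = cov_xy / var_x
b0 = mean_y - b1 * mean_x

# Direct numerical least-squares fit on the raw data, for comparison.
slope, intercept = np.polyfit(x, y, 1)
```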

“Variance” measures the tendency of a variable to vary (away from its mean)

Population variance: VARP(Y) = AVG((Y − AVG(Y))²)

…the population variance is the average squared deviation of Y from its own mean.

Sample variance: VAR(Y) = (n/(n−1))·VARP(Y)

…an unbiased estimate of the true variance based on a finite sample of size n. The factor n/(n−1) adjusts for the “degree of freedom for error” that was used up in calculating the mean from the same sample.

Our forecasting task is to “explain” the variance in Y. Why does it vary in the way that it does, i.e., why isn’t it always constant?

“Covariance” measures the tendency of two variables to vary together

Population covariance: COVP(X,Y) = AVG((X − AVG(X))·(Y − AVG(Y)))

…the average product of the deviations of X and Y from their respective means.

- If Y and X tend to be on the same side of their respective means at the same time (both above or both below), the average product of deviations is positive.
- If they tend to be on opposite sides of their own means at any given time, the average product is negative.
- If their variations around their own means are unrelated, the average product is zero.
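These definitions translate directly into code. A sketch with made-up data, verifying the n/(n−1) relation between the population and sample variance:

```python
import numpy as np

# Made-up samples for illustration.
rng = np.random.default_rng(4)
y = rng.normal(50, 5, 30)
x = rng.normal(20, 3, 30)
n = y.size

# VARP(Y): average squared deviation of Y from its own mean.
varp = np.mean((y - y.mean()) ** 2)

# VAR(Y) = (n/(n-1)) * VARP(Y): the unbiased sample variance.
var = (n / (n - 1)) * varp

# COVP(X,Y): average product of deviations from the respective means.
covp = np.mean((x - x.mean()) * (y - y.mean()))
```

NumPy's built-ins use the same conventions: `var(ddof=0)` is VARP and `var(ddof=1)` is VAR.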

Sample covariance

Sample covariance: COV(X,Y) = (n/(n−1))·COVP(X,Y)

…an unbiased estimate of the true covariance based on a sample of size n, analogous to the sample variance.

Correlation

The correlation coefficient is the covariance “standardized” by dividing by the product of standard deviations:

r = COV(X,Y)/(STDEV(X)·STDEV(Y))
  = COVP(X,Y)/(STDEVP(X)·STDEVP(Y))
  = CORREL(X,Y) in Excel

It measures the strength of the linear relationship between X & Y on a relative scale of −1 to +1. When the correlation is significantly different from zero, variations in X can help to predict variations in Y.
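A quick check (made-up data) that the three expressions for r above agree, because the n/(n−1) factors in the sample covariance and sample standard deviations cancel:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(5)
x = rng.normal(size=100)
y = 0.8 * x + rng.normal(size=100)

# r from SAMPLE covariance and standard deviations (ddof=1)...
cov = np.cov(x, y, ddof=1)[0, 1]
r_sample = cov / (x.std(ddof=1) * y.std(ddof=1))

# ...and from POPULATION covariance and standard deviations (ddof=0).
covp = np.cov(x, y, ddof=0)[0, 1]
r_pop = covp / (x.std(ddof=0) * y.std(ddof=0))

# np.corrcoef plays the role of Excel's CORREL.
r_builtin = np.corrcoef(x, y)[0, 1]
```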

Simple regression formulas

Model assumption: Yt = β0 + β1·Xt + εt

…where β0 is the “intercept”, β1 is the “slope”, and εt is an independent, identically normally distributed error.

Prediction equation: Ŷt = β̂0 + β̂1·Xt

“Least squares” coefficient estimates:

β̂1 = COV(X,Y)/VAR(X) = r·(STDEV(Y)/STDEV(X))

β̂0 = AVG(Y) − β̂1·AVG(X)

The slope coefficient is just the correlation multiplied by the ratio of standard deviations!

We have exact formulas for the coefficient estimates; we don’t need to use Solver to minimize squared error.
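A sketch of the exact formulas with made-up data, showing in particular that COV(X,Y)/VAR(X) and r·(STDEV(Y)/STDEV(X)) are the same number:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(6)
x = rng.normal(100, 15, 80)
y = 10 + 0.7 * x + rng.normal(0, 5, 80)

r = np.corrcoef(x, y)[0, 1]

# Slope two equivalent ways, per the slide:
b1 = np.cov(x, y, ddof=1)[0, 1] / x.var(ddof=1)   # COV(X,Y)/VAR(X)
b1_alt = r * (y.std(ddof=1) / x.std(ddof=1))      # r * STDEV(Y)/STDEV(X)

# Intercept: b0 = AVG(Y) - b1*AVG(X); this forces the residuals to
# average exactly zero, a defining property of the least-squares line.
b0 = y.mean() - b1 * x.mean()
residuals = y - (b0 + b1 * x)
```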

Multiple regression formulas

The formulas for coefficient estimates and forecast standard errors for the multiple regression model are merely matrix versions of the preceding formulas.

If you’re interested in the gory details, see the “Regression formulas” worksheet (SIMPREG.XLS) posted on the Course Outline web page (lecture 5 links).
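The slides don't spell out the matrix version, but the standard OLS algebra it refers to is β̂ = (X′X)⁻¹X′y. A sketch with made-up data (two regressors plus an intercept column):

```python
import numpy as np

# Made-up data: y depends linearly on two regressors plus noise.
rng = np.random.default_rng(7)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 2 + 1.0 * x1 - 0.5 * x2 + rng.normal(0, 0.1, n)

# Design matrix: a column of ones (for the intercept) plus the regressors.
X = np.column_stack([np.ones(n), x1, x2])

# beta_hat = (X'X)^(-1) X'y, computed by solving the normal equations
# rather than forming an explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# beta_hat ~ [intercept, coefficient on x1, coefficient on x2]
```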

Standard error of the regression

The standard error of the regression, a.k.a. “standard error of the estimate”, is the RMSE adjusted for the number of coefficients estimated:

s = √((n−1)/(n−2)) · STDEV(e) = √(((n−1)/(n−2)) · (1 − r²) · VAR(Y))

…where STDEV(e) is the sample standard deviation of the residuals (errors), the factor (n−1)/(n−2) is the adjustment for the number of coefficients estimated (2), (1 − r²) is the fraction of variance “unexplained”, and VAR(Y) is the original sample variance.

s is the estimated standard deviation of the true error process (εt), and in general it is slightly larger than the sample standard deviation of the residuals, due to the adjustment for additional coefficients estimated besides the constant. All the other “standard errors” (for coefficients, means, forecasts, etc.) are proportional to this quantity.
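Both expressions for s give the same number, which can be verified on made-up data:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(8)
x = rng.normal(size=60)
y = 1 + 2 * x + rng.normal(0, 1.5, 60)
n = x.size

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)                     # residuals
r = np.corrcoef(x, y)[0, 1]

# s from the residuals: sqrt((n-1)/(n-2)) * STDEV(e)...
s_from_residuals = np.sqrt((n - 1) / (n - 2)) * e.std(ddof=1)

# ...and s from r and the variance of Y:
# sqrt(((n-1)/(n-2)) * (1 - r^2) * VAR(Y)).
s_from_r = np.sqrt(((n - 1) / (n - 2)) * (1 - r**2) * y.var(ddof=1))
```

As the slide notes, s comes out slightly larger than the plain sample standard deviation of the residuals.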

Standard errors of the coefficient estimates

Standard error of the slope coefficient:

SE(β̂1) = (s / STDEVP(X)) · (1/√n)

The larger the sample size (n), the more precise the coefficient estimate.

t-statistic of the slope coefficient:

t = β̂1 / SE(β̂1)

The p-value of the t-stat is TDIST(|t|, n−2, 2) in Excel.
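A sketch of these formulas on made-up data, with SciPy's t distribution standing in for Excel's TDIST:

```python
import numpy as np
from scipy import stats

# Made-up data for illustration.
rng = np.random.default_rng(9)
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(0, 1, 40)
n = x.size

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
s = np.sqrt((n - 1) / (n - 2)) * e.std(ddof=1)   # SE of the regression

# SE(b1) = (s/STDEVP(X)) * (1/sqrt(n)), then the t-stat and its
# two-sided p-value, equivalent to TDIST(|t|, n-2, 2) in Excel.
se_b1 = (s / x.std(ddof=0)) * (1 / np.sqrt(n))
t_stat = b1 / se_b1
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
```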

Standard error of the mean

The standard error of the mean at X = Xt is the standard deviation of the error in estimating the “true” height of the regression line at that point:

SEmean = (s/√n) · √(1 + (Xt − AVG(X))²/VARP(X))

The s/√n term is the same as the standard error of the mean in the “mean” model; the second factor is a correction for the distance of Xt from the mean.

Standard error of the forecast

The standard error of the forecast is

SEfcst = √(s² + SEmean²) = s · √(1 + (1/n)·(1 + (Xt − AVG(X))²/VARP(X)))

The s² term measures the “noise” (unexplained variation) in the data; the SEmean² term measures the error in estimating the height of the “true” regression line at X = Xt.

Note that this is almost the same formula we used for the “mean” model in class 1. The only difference is that calculating SEmean is slightly more complicated here; it depends on the value of Xt.
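Both formulas can be evaluated directly. A sketch with made-up data and one hypothetical new point Xt:

```python
import numpy as np

# Made-up data for illustration.
rng = np.random.default_rng(10)
x = rng.normal(100, 10, 50)
y = 5 + 0.3 * x + rng.normal(0, 2, 50)
n = x.size

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
s = np.sqrt((n - 1) / (n - 2)) * e.std(ddof=1)   # SE of the regression

xt = 130  # a hypothetical point fairly far from AVG(X)
d2 = (xt - x.mean()) ** 2 / x.var(ddof=0)        # (Xt-AVG(X))^2 / VARP(X)

se_mean = (s / np.sqrt(n)) * np.sqrt(1 + d2)     # SE of the mean at Xt
se_fcst = np.sqrt(s**2 + se_mean**2)             # SE of the forecast at Xt
```

Moving xt farther from the mean of x inflates d2 and hence both standard errors, which is the key point of the next slide.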

Lower bounds on standard errors

s/√n is a lower bound on the standard error of the mean (…equalled only when X = AVG(X)).

s·√(1 + 1/n) is the corresponding lower bound on the standard error of the forecast.

Key point: the standard errors of the forecasts for Y are larger for values of X that are farther from the mean, i.e., farther from the “center” of the data distribution.

Confidence limits

Confidence limits for a forecast are obtained by adding and subtracting the appropriate multiples of the forecast standard error (as usual).

For large n (>20) a rough 95% confidence interval is plus or minus 2 standard errors.

The exact number of standard errors for a 95% interval, for any n, is given by TINV(.05, n−2) in Excel.

A 50% interval is roughly 1/3 as wide (plus or minus 2/3 standard error).
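Excel's TINV(.05, n−2) is the 97.5th percentile of the t distribution with n−2 degrees of freedom, which SciPy exposes directly. A sketch showing how the exact multiplier approaches the rough value of 2 as n grows:

```python
import numpy as np
from scipy import stats

def multiplier_95(n):
    """Exact number of standard errors for a 95% interval with sample
    size n in a simple regression: the t quantile with n-2 degrees of
    freedom, equivalent to TINV(.05, n-2) in Excel."""
    return stats.t.ppf(0.975, df=n - 2)

# Small n demands a wider interval; by n > 20 the multiplier is near 2.
small, medium, large = multiplier_95(5), multiplier_95(22), multiplier_95(200)
```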

Page 20

[Figure: scatterplot of Y2 vs. X (X from 50 to 300, Y2 from 50 to 250) showing the 95% confidence interval for the mean, evaluated at Xt = 210.]

Note that confidence intervals are wider when X is far from the center; this probably understates the danger of over-extrapolating a linear model!

[Figure: the same scatterplot of Y2 vs. X, now showing the 95% confidence interval for the forecast at Xt = 210.]

The confidence interval for the forecast reflects both the “parameter risk” concerning the slope & intercept of the regression line and the “intrinsic risk” of random variations around it.

Page 21

Strange but true

For any regression model:

VAR(Y) = VAR(Ŷ) + VAR(e)

Total variance = “Explained” variance + “Unexplained” variance

For a simple regression model:

VAR(Ŷ)/VAR(Y) = r²

…i.e., fraction of variance explained = “r squared”

R-squared

The term “R squared” refers to the fraction of variance explained, i.e., the ratio VAR(Ŷ)/VAR(Y), regardless of the number of regressors. It measures the improvement of the regression model over the mean model for Y.

A bigger R-squared is usually better, for the same Y, but R-squared should not be used to compare models that may have used different transformations of Y and/or different data samples.

R-squared can be very misleading for regressions involving time series data: 90% or more is not necessarily “good”, and 10% or less is not necessarily “bad.”
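The variance decomposition is easy to verify numerically. A sketch on synthetic data, with np.polyfit standing in for the regression procedure:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 3 - 2 * x + rng.normal(size=200)

b, a = np.polyfit(x, y, 1)   # slope, intercept of the fitted line
y_hat = a + b * x
e = y - y_hat

var_y, var_yhat, var_e = np.var(y), np.var(y_hat), np.var(e)
# VAR(Y) = VAR(Yhat) + VAR(e): total = explained + unexplained
assert np.isclose(var_y, var_yhat + var_e)
# For a simple regression, fraction of variance explained = r squared
r = np.corrcoef(x, y)[0, 1]
assert np.isclose(var_yhat / var_y, r**2)
```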

Page 22

Example: rolling back the beer tax

Suppose the 1991 beer tax had been rolled back in July 2007, resulting in an immediate 10-point drop in the beer price index (from 118.23 to 108.23).

What would be the expected effect on per capita real consumption (“BeerPerCapita”)?

What would we predict for the consumption rate in July 2007? (June 2007 rate is $268.65 per year SAAR in year-2000 beer dollars.)

Page 23

[Figure: scatterplot of BeerPerCapita (110 to 290) vs. BeerRelPrice (0.55 to 0.95), alongside time series plots of both variables, 1960–2010.]

In search of a linear model…

Scatterplot of BeerPerCapita vs. relative price (BeerPrice/CPI) reveals a strong negative correlation and a highly linear relationship (except for a mid-’90s anomaly).

The actual correlation is −0.94, which suggests that 0.94² ≈ 88% of the variance in BeerPerCapita can be “explained” by BeerRelPrice.

[Plot annotations: “Post tax hike anomaly”; “Assumed relative price drop in June ’07”; “What will happen to per capita consumption in July 2007?”]

Summary statistics of variables

Variable definitions:
BeerPerCapita = 100000*Beer/(BeerPrice*Population)
BeerRelPrice = BeerPrice/CPI

Here are summary stats and correlations of the two variables, obtained with the Multiple-Variable Analysis procedure. Note that the standard deviation of BeerPerCapita is $39.15, which is essentially the forecast standard error we would get by using the mean model to predict it. How much better can we do with a regression model? Well, there is a very strong negative correlation of −0.94 with BeerRelPrice, and the square of the correlation is the fraction by which the error variance can be reduced by regressing BeerPerCapita on BeerRelPrice instead of using the mean model.
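As a quick arithmetic check of that claim, the standard error left after regression should be about s_Y·√(1 − r²). Using the rounded numbers above (the regression's actual standard error, reported later, is $13.80, slightly higher because of rounding and the degrees-of-freedom adjustment):

```python
import math

# Rounded slide values: SD of BeerPerCapita and its correlation with price
s_y, r = 39.15, -0.94
s_reg_approx = s_y * math.sqrt(1 - r**2)   # SE remaining after regression
print(round(s_reg_approx, 2))              # about 13.4
```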

Page 24

Fitting a simple regression model:

Relate/Multiple Factors/Multiple Regression on the Statgraphics menu

Typical regression output

Standard error of the regression, a.k.a. standard error of the estimate, is the RMSE adjusted for # coefficients estimated. ← The bottom line, IF it is really representative of future accuracy.

R-squared & adjusted* R-squared. ← Not the bottom line!

Coefficients & their standard errors, t-stats (= coeff./std. error) & p-values. ← Used to test whether some variables are insignificant in the presence of the others.

Residual plots and diagnostic tests. ← Used to test assumptions of linearity, normality, no autocorrelation, etc.

*Adjusted for # coefficients in the same way as the standard error of the regression, to be able to compare among models with different #’s of coefficients
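These same numbers can be reproduced outside Statgraphics. A minimal numpy sketch on synthetic data shaped loosely like the beer example (not the course's actual output):

```python
import numpy as np

def simple_regression_report(x, y):
    """Compute the key numbers of a typical regression report:
    coefficients, their standard errors, t-stats, the standard error of
    the regression (RMSE adjusted for 2 coefficients), and R-squared."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])           # intercept + one regressor
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
    resid = y - X @ beta
    s = np.sqrt(resid @ resid / (n - 2))           # std. error of regression
    cov = s**2 * np.linalg.inv(X.T @ X)            # coefficient covariance
    se = np.sqrt(np.diag(cov))
    t_stats = beta / se                            # t-stat = coeff./std. error
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return beta, se, t_stats, s, r2

rng = np.random.default_rng(2)
x = rng.uniform(0.5, 1.0, 100)
y = 386 - 280 * x + rng.normal(0, 14, 100)   # shaped like the beer data
beta, se, t_stats, s, r2 = simple_regression_report(x, y)
```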

Page 25

What to look for in regression output

Error measures: smaller is better.

t-statistics of coefficients greater than 2 in magnitude? (p-values < 0.05) ⇒ variables appear “significant”*

Economic interpretations of coefficients.

Residual plots & diagnostic tests:
- Residuals vs. predicted (nonlinearity?)
- Normal probability plot (skew? fat tails? outliers?)
- Residuals vs. time (for time series data)
- Residual autocorrelation plot (for time series data)

*Not a hard and fast rule, but variables that don’t pass this test can often be removed without being missed. If a variable’s presence in the model is strongly supported by intuition or theory, then a low t-stat may be OK: its effect may just be hard to measure.

Basic regression output

R-squared = 88% as expected; slope coefficient (−280.8) is highly significant (t-stat = −64). Standard error of regression is $13.80, much less than the original standard deviation, but still a lot of error in predicting next month’s per capita consumption! Durbin-Watson stat and lag-1 autocorrelation are also very bad! (DW should be close to 2, not zero; lag-1 autocorrelation should be close to zero, not 1!)

The “Interval plot” option plots the regression line vs. the dependent variable or time index.

Page 26

Deconstruction of R-squared

The variance of the dependent variable is (39.15)² ≈ 1533. This is the error variance you would get by using the mean model.

The variance of the regression forecast errors is the square of the regression standard error, which is (13.8)² ≈ 190.

The fraction of the original variance that remains “unexplained” is 190/1533 ≈ 12%, hence the fraction “explained” is 88%.

This is the reduction in error variance compared to using the mean model instead.

What’s the Durbin-Watson statistic?

It’s just an alternative statistic for measuring lag-1 autocorrelation in the residuals, which is popular in regression analysis for obscure historical reasons.

0 < DW < 4, and ideally DW ≈ 2.

DW ≈ 2(1 − r1), where r1 = lag-1 autocorrelation.

r1 is easier to interpret: a good value is close to 0, and r1² is roughly the percentage of further variance reduction that could be achieved by fine-tuning to reduce the autocorrelation.
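The approximation DW ≈ 2(1 − r1) is easy to check numerically. A sketch with synthetic, strongly autocorrelated "residuals" (an AR(1) series; not the beer model's actual residuals):

```python
import numpy as np

def durbin_watson(e):
    """DW = sum of squared successive differences over sum of squares."""
    return np.sum(np.diff(e)**2) / np.sum(e**2)

def lag1_autocorr(e):
    """Lag-1 autocorrelation of a zero-mean residual series."""
    return np.sum(e[1:] * e[:-1]) / np.sum(e**2)

# Synthetic residuals with strong positive lag-1 autocorrelation
rng = np.random.default_rng(3)
eps = rng.normal(size=500)
e = np.empty(500)
e[0] = eps[0]
for t in range(1, 500):
    e[t] = 0.9 * e[t - 1] + eps[t]   # AR(1) with true autocorrelation 0.9
e = e - e.mean()                      # regression residuals have zero mean

dw, r1 = durbin_watson(e), lag1_autocorr(e)
# DW is approximately 2*(1 - r1): near 0 here, far from the ideal value 2
assert abs(dw - 2 * (1 - r1)) < 0.1
```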

Page 27

Economic interpretation of model

The slope coefficient of −280.8 suggests that a .01 decrease in the relative price (X) should increase consumption (Y) by $2.81.

The proposed tax rollback would decrease the relative price by 0.049 (from 0.569 to 0.520).

Thus, predicted consumption in July ’07 will increase by 0.049*280.8 = $13.76 from its predicted June ’07 value.

But the model’s prediction for June ’07 is already way off!

Hence the prediction for July ’07 is less than the actual June ’07 value, despite the price drop.

Forecasting equation of model 1

Forecasting equation of this model:

Yt = 386.3 − 280.8 Xt

For July ’07:

Yt = 386.3 − 280.8(0.52) ≈ 240.20

           BeerRelPrice (X)   BeerPerCapita (Y)
May ’07    0.565              272.33
June ’07   0.569              268.65
July ’07   0.520              240.20 ??

The forecast for Y depends (only) on the current value of X, not on recent values of Y.

Page 28

The forecast (“Reports”) report

The multiple regression procedure automatically shows forecasts (on the “Reports” report) if future values are provided for the independent variable(s). Here, a July 2007 value of 0.52 was plugged in for BeerRelPrice on the data spreadsheet, and the resulting forecast for BeerPerCapita is $240.20, which is $28.45 below the current value of $268.65. The upper 95% limit of $267.4 is even below the current value!

Here’s the plot of the regression line with 95% confidence limits for the forecasts. This is the “Interval plots” chart drawn with 95% intervals for “predicted values” (a right-mouse-button option). The interval for the July ’07 prediction is at the upper left, where BeerRelPrice = 0.52.

[Plot annotations: “Last data point (268.65) is somewhere in here”; “95% CI for forecast for BeerRelPrice = 0.52”]

Page 29

Plot of residuals vs. row number (time) shows severe autocorrelation, i.e., long runs of values with the same sign, as foretold by the bad DW stat and lag-1 autocorrelation, and the most recent errors have been especially large.

Plot of predicted values (red) vs. row number (“Interval plot”) shows poor fit to data, and the predicted jump in July ’07 falls well short of the June ’07 value. The predicted values are actually just a shifted and rescaled version of BeerRelPrice.

Regression option in Forecasting procedure

You can also fit the same regression model in the Forecast/User-Specified Model procedure. Choose the “Mean” model type and hit the “Regression” button to add independent variables. This approach allows you to use the model-comparison features and additional residual diagnostics in the forecasting procedure. Caveat: no more than 4 independent variables are allowed here.

Here model E is specified as a “mean + 1 regressor” model.

Page 30

Same regression results and forecast, but… the normal probability plot and autocorrelation plot of the residuals look terrible, and the comparison with simpler time series models is not flattering!

Conclusion (so far…)

Although this model provides a plausible estimate of the macroeconomic relationship between relative price and per capita consumption (assuming that the long-term upward trend in consumption is entirely caused by the long-term downward trend in relative price!), it does not do a very plausible job of forecasting the near future.

Why not? It is a “cross-sectional” model that does not exploit the time dimension in the data: it predicts consumption for a “randomly chosen” relative price.

Due to other, unmodeled factors, the data wanders away from the regression line and does not return very quickly; errors are strongly correlated.

Page 31

How to incorporate the time dimension in a regression model?

Some possible approaches:
- Predict changes instead of levels (i.e., use a first-difference transformation)
- Use lagged variables (recent values of dependent and independent variables) as additional regressors, to serve as proxies for effects of unmodeled variables*
- Use an “autocorrelation correction” (e.g., Cochrane-Orcutt or ARIMA error structure) as a proxy for unmodeled factors*

*We’ll discuss these in later classes…

Let’s look at monthly changes…

Here’s a plot of the original BeerPerCapita series obtained in the Time Series/Descriptive Methods procedure. No transformations have been performed yet.

Page 32

On the right-mouse-button Analysis Options panel, entering a 1 in the Differencing box performs a first-difference transformation. Now we are seeing the plot of month-to-month changes in BeerPerCapita.

Here are time series plots of both BeerPerCapita and BeerRelPrice, before and after a first-difference transformation. Note that the differenced series appear to be “stationary”: no trend, constant variance, etc. The circled point in the lower right is the assumed price impact of the tax rollback in July 2007. (This 4-chart arrangement was made by pasting the plots into the “Statgallery.”)

Page 33

[Figure: scatterplot of diff(BeerPerCapita) (−15 to 15) vs. diff(BeerRelPrice) (−0.03 to 0.04).]

Scatterplot of the differenced variables indicates a weaker but still significant negative correlation (−0.33). The two circled points in the lower right are the drops in Jan. and Feb. 1991 due to the beer tax increase. Even when these two points are de-selected, there is still a significant negative correlation of −0.28.

Here the Plot/Scatterplot/X-Y Plot procedure was used to plot diff(BeerPerCapita) vs. diff(BeerRelPrice).

Statistics of the differenced variables

The correlations and other summary stats of the differenced variables were computed using the Describe/Numeric Data/Multiple-Variable Analysis procedure. Note that the standard deviation of diff(BeerPerCapita) is only $2.637. This is roughly the forecast standard error you would get by using a random walk with drift model to predict BeerPerCapita, because the random walk model merely predicts that each change will equal the mean change. Hence we can already see that the forecast standard error of the RW model is smaller than that of the original cross-sectional regression by roughly a factor of 5. However, let’s see if we can improve on the RW model by regressing diff(BeerPerCapita) on diff(BeerRelPrice)….
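The level-vs-change distinction can be sketched on synthetic stand-ins for the two series (not the real beer data; the trends, coefficients, and noise levels here are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 300
# Downward-trending relative price, plus a small random-walk component
price = 1.0 - 0.001 * np.arange(n) + np.cumsum(rng.normal(0, 0.003, n))
# Consumption responds to price, plus its own unmodeled random-walk drift
consumption = 380 - 280 * price + np.cumsum(rng.normal(0, 0.8, n))

r_levels = np.corrcoef(price, consumption)[0, 1]
r_changes = np.corrcoef(np.diff(price), np.diff(consumption))[0, 1]
# The shared trend makes the level correlation very strong; the differenced
# (stationary) series are still negatively correlated, but more weakly
assert r_levels < r_changes < 0
```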

Page 34

Simple regression of diff(BeerPerCapita) on diff(BeerRelPrice)

In a simple regression of the differenced variables, the change in BeerPerCapita is predicted from the change in BeerRelPrice. This is a “micro” prediction rather than a “macro” prediction. Our predicted level of BeerPerCapita in the next period will be equal to the current level plus the predicted change.

The estimated coefficient of diff(BeerRelPrice) is −257.7, in the same ballpark as the coefficient of BeerRelPrice in the earlier model. Hence a similar change in consumption per unit change in relative price is predicted. However, this model is directly predicting the change, not the level. The predicted change in July ’07 is positive (+$12.65), in line with intuition. But what happened to R-squared? It’s fallen to around 11%! (Horrors)

Still some autocorrelation, but not nearly as bad.

Regression standard error is vastly superior!

Page 35

What happened to R-squared??

The previous model explained 88% of the variance… in the monthly level of BeerPerCapita. Because BeerPerCapita is a nonstationary, trended variable, it has a lot of variance to explain!

This model directly predicts the change in BeerPerCapita, which is a stationary series with a much lower variance to begin with.

Hence, less variance remains to be explained by this regression model, and an R-squared of only 11% is actually a much better performance.

Another way to look at it:

When the dependent variable is undifferenced, R-squared measures the reduction in error variance compared to using the mean model.

When the dependent variable is differenced, R-squared measures the reduction in error variance compared to using the random walk with drift model on the original variable.

Here, a random walk model (or another simple time series model) would have been a much better “reference point” for predictions of monthly per capita beer consumption.

The regression of differenced variables is a “walk” model in which the steps are not completely random: they depend on the change in price!

Page 36

Deconstruction of R-squared

The variance of the differenced dependent variable is (2.637)² = 6.95. This is the error variance you would get by using the random walk with drift model on the original undifferenced variable.

The variance of the regression forecast errors is the square of the regression standard error, which is now (2.494)² = 6.22.

The fraction of the original variance that remains “unexplained” is 6.22/6.95 ≈ 89%, hence the fraction “explained” is 11%.

This is not a huge improvement over the random walk model in terms of forecast accuracy, but it does allow us to factor in the price sensitivity of consumers.

Forecasting equation for model 2

Forecasting equation for the change in Y:

(Yt − Yt−1) = 0.0893 − 257.7(Xt − Xt−1)

For July ’07:

(Yt − Yt−1) = 0.0893 − 257.7(0.520 − 0.569) = 12.65

“Undifferenced” forecast for the new level of Y:

Yt = Yt−1 + 12.65 = 268.65 + 12.65 = 281.30

           BeerRelPrice (X)   BeerPerCapita (Y)
May ’07    0.565              272.33
June ’07   0.569              268.65
July ’07   0.520              281.30 !!

The ultimate forecast from this model “steps off” from the last actual value of Y, as in the random walk model, but now the step size depends on the change in X.
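Side by side, the two forecasting equations can be written as small functions (hypothetical helpers; the coefficients are the rounded values reported on the slides, so the results differ from the slides' 240.20 and 281.30 in the second decimal):

```python
def model1_forecast(x_t):
    """Cross-sectional model: predicts the level of Y from the level of X
    alone, ignoring the recent level of Y."""
    return 386.3 - 280.8 * x_t

def model2_forecast(y_prev, x_t, x_prev):
    """Differenced model: steps off from the last actual Y by the
    predicted change, which depends on the change in X."""
    return y_prev + 0.0893 - 257.7 * (x_t - x_prev)

# July '07 forecasts, given June '07 values Y = 268.65, X = 0.569,
# and an assumed new X = 0.520 after the tax rollback
f1 = model1_forecast(0.520)
f2 = model2_forecast(268.65, 0.520, 0.569)
print(round(f1, 2), round(f2, 2))
```

Note that model 1 forecasts a drop below the June level, while model 2 forecasts a rise above it, which is the whole point of the comparison.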


Same model in Forecasting procedure

There are several ways in which the differenced regression model can be fitted in the Forecasting procedure. The simplest way is to specify it as an ARIMA model with 1 order of nonseasonal differencing plus 1 regressor and a constant term. The first-difference transformation is applied to both variables prior to fitting the regression model.

Almost the same regression results and forecast (slightly different estimation procedure)…

…and the normal probability plot and autocorrelation plot of the residuals are much better (not perfect, but acceptable). The differenced regression model (B) is best on all error measures, but not by a large margin.


More fine tuning?

The differenced model still has a technically significant lag-1 autocorrelation of −0.23.

Because it is negative, it means the model is over-reacting rather than under-reacting to recent changes in the data.

By the r-squared rule, this suggests that 0.23² ≈ 0.05, i.e., about 5% of the remaining variance might be explained via more fine-tuning (e.g., adding lagged variables).

This is not a large improvement: it corresponds to about a 2.5% further reduction in the standard error, hence a 2.5% shrinkage in confidence intervals.
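The rule-of-thumb arithmetic on this slide can be verified directly (the standard-error reduction works out to roughly 2.7%, which the slide rounds to about 2.5%):

```python
# Rule of thumb: the squared residual autocorrelation approximates the
# fraction of the REMAINING variance that further fine-tuning could explain.
r = -0.23                               # residual lag-1 autocorrelation
extra_variance = r ** 2                 # fraction of remaining variance
se_reduction = 1 - (1 - r ** 2) ** 0.5  # relative shrinkage of the standard error

print(f"extra variance explained = {extra_variance:.1%}, "
      f"standard error reduction = {se_reduction:.1%}")
# prints: extra variance explained = 5.3%, standard error reduction = 2.7%
```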

Class 5 recap

“Regression to mediocrity” is inescapable

Correlations and scatterplots help to reveal strengths of linear relationships

How to interpret regression output & test residuals

Much of the variance in the original data may be “explainable” merely by an appropriate transformation of the data, such as a first-difference transformation applied to nonstationary time series variables.

R-squared is not the bottom line!