decision 411: forecasting - fuqua school of businessrnau/decision411_2007/411class01_… ·...

Decision 411: Forecasting

Professor: Bob Nau

Course content:
How to predict the future
How to learn from the past
…using data analysis

Who should be interested:
Anyone on a quantitative career track (financial investments, marketing research, consulting, operations, accounting, econometrics, engineering, environmental science …)

Anyone who wants more experience in computer modeling & data analysis

Anyone who needs to make decisions based on forecasts provided by others

Forecasts are used at every organizational level

Corporate Strategy

Finance, Marketing, Accounting

Production, Operations & Supply Chain

Sales

Many numbers…. or one number?

2003 Nobel Prize(s) in Economics awarded for forecasting methods

Robert F. Engle: “for methods of analyzing economic time series with time-varying volatility (ARCH)”

Clive W.J. Granger: “for methods of analyzing economic time series with common trends (cointegration)”

http://www.nobel.se/economics/laureates/2003/

Recent history (pitfalls of forecasting)

[Figure: DJIA to March 2000. x-axis: 1980 to 2005; y-axis: 0 to 3 (X 10000).]

[Figure: DJIA to March 2000 with forecasts (GRW) and lower/upper 95% limits.]

[Figure: the same plot with DJIA since March 2000 added.]

Today’s agenda
⇒ Course introduction
Forecasting tools & principles
How to obtain data & move it around
Statistical graphics
Forecasts and confidence intervals: the simplest case (mean model)
March madness

Course objectives
• How to use data to predict the future & aid decision-making
• Data acquisition and integration
• Statistical & graphical data analysis
• Regression and other forecasting models
• Time series concepts
• Management of forecasting

Course map

Forecasting methods
  Statistical
    Extrapolative (one variable): Naive, Decomposition, Smoothing, ARIMA
    Associative (many variables): One equation (regression), Many equations (econometric), Nonlinear (data mining via neural nets, classification trees, etc.)
  Non-statistical
    Simulation (what-if)
    Subjective (expert consensus, field estimates)
    Betting markets

(Diagram annotation: “We are mainly here”)

Course outline
Week 1: Data concepts & simple models: linear trend & random walk
Week 2: Seasonal adjustment & exponential smoothing (HW#1 due Tues 3/27)
Week 3: Regression (HW#2 due Tues 4/3)
Week 4: More regression (Quiz on Tues 4/10)
Week 5: ARIMA models (HW#3 due Tues 4/17)
Week 6: Additional topics (automatic, nonlinear…)
Final project (due at end of exam week, Thur 5/3)

Readings
My notes handed out in class, also on course web page:
http://faculty.fuqua.duke.edu/~rnau/Decision411CoursePage.html
PowerPoint slides from lectures
Additional materials on web page, bulletin board, & CDs
Optional stats textbook by Schleifer & Bell (or any other MBA-level stats textbook)

Software
Statgraphics XV (in lab & on your PC)
Excel
Library databases (Economagic, etc.)
Google

Decision 411 CDs
Video files that provide a tour of Statgraphics & Economagic on your own PC
View with Camtasia Player (included on CD)
Hit Alt-Enter to toggle the control bar

Bulletin board
Main course b-board: mba.spring2007_session4.decision411.forecasting
https://www.fuquaworld.duke.edu/FuquaWorld/main/fwForum2.jsp?forumID=9576
Will be used for answers to FAQs, additional comments on lecture topics, & discussions of statistics in the news and in the workplace. Check it frequently.
Feel free to post your own examples of good/bad/interesting stats (extra credit for class participation!)
Do not post any assignment-related questions.

E-mail
If you have a question for me, send it by e-mail rather than posting on a b-board…
…but check the main b-board first to see if it has already been asked and answered
Use a descriptive subject line beginning with “Forecasting:…”

Grading basis
45% homework (3 assignments)
15% quiz
30% final project
10% class participation

Study group policy
Work in teams of 2 (max)
Try to find a partner by Friday
OK to team up with someone from other section
Send me e-mail if still seeking a partner

Final project
Final project may be based on a data set and modeling goal of YOUR choice
Should get started by 5th week of class
Alternatively, there will be several “designated project” options (essentially a fourth homework assignment)
Can work in groups of 2 on final project as well as regular homework

Honor code issues
You are encouraged to consult your classmates for general advice on forecasting concepts and software use
Specific details of data analysis assignments should be discussed only with your study-group partner
Don’t post notes on b-board that are at all related to assignments prior to due dates. Send any questions to me by e-mail.

Suggestions & examples welcome!
If you are interested in particular forecasting problems or can suggest particular examples that might be useful for classroom discussion, please send me e-mail (include data if you have it)
Exception: no examples from graded assignments in other ongoing courses!

Today’s agenda
Course introduction
⇒ Forecasting tools & principles





How can we predict the future?
Look for statistical patterns that were stable in the past and which can be expected to remain stable
Extrapolate those patterns into the future
“I have seen the future and it is very much like the present, only longer.”

Example: stable mean & variance

[Figure: Time Sequence Plot for X.]

[Figure: Time Sequence Plot for X, constant mean = 49.4977, showing actual values, forecast, and 95.0% limits.]

Example: stable trend

[Figure: Time Series Plot for Y.]

[Figure: Time Sequence Plot for Y, linear trend = 94.184 + 1.44936 t.]

[Figure: Time Sequence Plot for Y, random walk with drift.]

Example: stable seasonality

[Figure: Time Series Plot for RetailxautoNSA, 1/92 to 1/08.]

Example: stable correlations

[Figure: plot of the variables age, features, price, sqfeet, tax.]

Transformations
Sometimes a stable pattern is not apparent on a graph of the “raw” data
Transformations of the data (deflation, logging, differencing, seasonal adjustment) may help to reveal the underlying pattern

Example: stock prices

Pattern: exponential growth curve with 1990’s bubble

Logged stock prices

Natural log transformation linearizes the growth: slope of trend line in logged units is average percentage growth

Logged stock prices

Logged indices since 1990

Logged & differenced stock prices

Difference of natural log ≈ percent change between periods (the approximation is close when the changes are small)
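The relationship between log differences and percent changes can be checked numerically. The sketch below is illustrative only: the price series is hypothetical, and the function names `pct_changes` and `log_diffs` are made up for this example.

```python
import math

def pct_changes(prices):
    """Period-to-period fractional change: (p[t] - p[t-1]) / p[t-1]."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]

def log_diffs(prices):
    """First difference of the natural log: ln(p[t]) - ln(p[t-1])."""
    return [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

# Hypothetical prices: three small moves followed by one very large jump.
prices = [100.0, 102.0, 99.0, 101.0, 151.5]

for pct, dlog in zip(pct_changes(prices), log_diffs(prices)):
    print(f"percent change = {pct:+.4f}   log difference = {dlog:+.4f}")
```

For the small moves the two columns nearly coincide; for the 50% jump they differ noticeably. Log differences also add up across periods, so their sum equals the log of the overall growth ratio.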

[Figure: Time Series Plot for adjusted SP500monthclose, 1/80 to 1/08; y-axis from -0.25 to 0.25.]

Example: U.S. retail sales (excluding autos)

Pattern: strong nominal growth & seasonal pattern

Deflated and seasonally adjusted sales x-autos

Pattern: real growth accelerated in late ’90’s, flattened after March 2000 peak, dipped in September 2001, ramped up again, but recently…?

[Figure: RetailexautoSA/CPIcityavg, 1/92 to 1/08.]

What if patterns are not stable?
Trends, seasonality, etc., may vary in time
This may limit the amount of past data that should be used for fitting the model (don’t merely use all data “because it is there”)
More sophisticated forecasting models are capable of tracking time-varying parameters
Expert opinion can also be used to anticipate changes in patterns

A changing pattern: Housing Starts

Strong seasonal pattern, big drop in last year!

(A few) Forecasting Principles
Use the most relevant & recent data
Seek diverse & independent data sources
Let model selection be guided by theory and domain knowledge, not just “fit” to past data
Keep It Simple
Test the assumptions behind the model
Validate the model on hold-out data
Report confidence intervals with forecasts
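The hold-out principle can be sketched in a few lines. This is a minimal illustration, not the course's procedure: the series is hypothetical, and the two candidate models (a mean model and a "last value" model) plus all function names are assumptions chosen for the example.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mean_model(train, horizon):
    """Forecast every future period with the mean of the fitted data."""
    m = sum(train) / len(train)
    return [m] * horizon

def last_value_model(train, horizon):
    """Forecast every future period with the last fitted value."""
    return [train[-1]] * horizon

def holdout_rmse(series, n_holdout, model):
    """Fit on all but the last n_holdout points, then score on those points."""
    train, test = series[:-n_holdout], series[-n_holdout:]
    return rmse(test, model(train, len(test)))

# Hypothetical series with a stable mean and no trend.
series = [10, 12, 9, 11, 10, 13, 8, 11, 10, 12]

print("mean model      :", round(holdout_rmse(series, 4, mean_model), 3))
print("last-value model:", round(holdout_rmse(series, 4, last_value_model), 3))
```

The model with the smaller hold-out error is the better candidate for predicting the future, which is not always the model that fits the past best.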

The “best” forecasting model
Is the one that can be expected to make the SMALLEST ERRORS…
…when predicting the FUTURE*
*not always the same thing as giving the best fit to the past!
Is intuitively reasonable
Is no more complicated than necessary
Provides insight into trends & causes
Can be explained to your boss or client

Forecasting risks (sources of error)
1. Intrinsic risk (random error)
2. Parameter risk (estimation error)
3. Model risk (erroneous assumptions)
Note: statistical confidence intervals are based on estimates of intrinsic risk and parameter risk, not model risk

Intrinsic risk
Even the best model cannot be expected to make perfect predictions (“forecasting is hard, especially when it’s about the future…”)
Intrinsic risk is measured by error statistics such as the “standard error of the estimate” (RMS error, adjusted for number of coefficients)
Intrinsic risk can be reduced, in principle, by finding a “better” model based on more detailed assumptions and data

Parameter risk
Even if you have the “correct” forecasting model, its parameters may not be exactly known: they must be estimated from available data
Parameter risk is measured by standard errors and t-statistics of model coefficients
Parameter risk can be reduced, in principle, by using more past data to estimate the model
The “blur of history” problem: older data may be “stale” and not reflect current conditions
Parameter risk is usually a smaller component of forecast error than intrinsic risk or model risk

Model risk
This is often the most serious risk, and its effects are not taken into account in the calculation of confidence intervals
Model risk can be reduced by following good forecasting principles:
• Exploratory data analysis to make sure important patterns or related variables are not overlooked
• Statistical tests of key assumptions
• Out-of-sample validation of statistical model
• Use of domain knowledge and expert judgment




⇒ How to obtain data & move it around




Where to get data
Internet sources (Economagic, library databases, government agencies…)
Your corporate database
Trade associations & journals
Econometric consulting firms
Designed experiments and surveys

How to move data around
Most computer programs use their own idiosyncratic “binary” file formats for storing data (word processors, spreadsheets, stat programs, database programs…)
All programs must also read and write text files in order to communicate with people
Hence, different programs can always exchange data with each other in the form of text files
1 character of text data = 1 byte of storage (for plain ASCII text)

Text files
May be either “fixed format” or “delimited”

In a fixed format file, data fields are delineated by character position within a line

xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx

In a delimited file, data fields are separated by delimiting characters (commas, tabs, spaces)

xxxxxxxxxx, xxxxxxxx, xxxxxxxxxx, xxxxxxxxxx,

Statgraphics & Excel can easily read tab- or comma-delimited files as well as XLS files
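To make the distinction concrete, here is a minimal sketch of reading one record in each format using only Python's standard library. The field widths, column names, and data values are hypothetical, and the helper names `parse_fixed` and `parse_delimited` are invented for this illustration.

```python
import csv
import io

def parse_fixed(line, widths):
    """Split a fixed-format line by character position; widths are field lengths."""
    fields, pos = [], 0
    for width in widths:
        fields.append(line[pos:pos + width].strip())
        pos += width
    return fields

def parse_delimited(text, delimiter=","):
    """Split delimited text into rows of fields using the csv module."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter))

# Hypothetical record: a date, a sales figure, and a price index.
fixed_line = f"{'1992-01':<10}{'146376':>10}{'138.1':>8}"
delimited_text = "date,sales,cpi\n1992-01,146376,138.1\n"

print(parse_fixed(fixed_line, [10, 10, 8]))
print(parse_delimited(delimited_text))
```

The fixed-format reader needs to know the column widths in advance, whereas the delimited reader only needs to know the delimiter, which is one reason CSV files are convenient for moving data between programs.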

From Economagic to Statgraphics*
Save several series to personal workspace
Create Excel file or CSV (comma-delimited text) file
Open the file in Excel & clean it up (delete extraneous rows, add more descriptive column headings as variable names)
Save the cleaned-up file under a new name, CLOSE IT, and open it in Statgraphics
* See video for details





⇒ Statistical graphics

Forecasts and confidence intervals: the simplest case (mean model)



Wizards & integrated plotting procedures make charting easy
Complex patterns in data can be uncovered and communicated by following principles of good graphic design
Charts can also be boring, confusing, or deceptive if produced thoughtlessly

Tufte’s graphical principles*
Above all else, show the data
Avoid “chartjunk”: dark grid lines, false perspective, unintentional optical art, self-promoting graphics
Maximize the ratio of data ink to non-data ink
Mobilize every graphical element, perhaps several times over, to show the data (e.g., data values printed on a bar chart)
* The Visual Display of Quantitative Information by E. Tufte

Charts vs. tables
Charts are most effective when data are numerous and/or multi-dimensional
If the data are one-dimensional and not too numerous, or if numerical details are important, a table may be better than a chart
“A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them”

Focus attention
Don’t embed important numbers in sentences of text. Set them apart in a table or chart.
Treat tables & charts as “paragraphs”, and include them in the narrative at the appropriate points
Annotate charts with appropriate comments
Maximize data density: “graphs can be shrunk way down” so that more than one will fit on a page or slide

Excel & Statgraphics tips
Embed small, well-labeled, well-chosen charts & tables in your reports
Make points and lines thick enough to “show the data”
Suppress gridlines where not needed
Use an appropriate chart type (e.g., line plots for time series, scatterplots for cross-sectional data, bar charts or tables rather than pie charts)

Often it is instructive to plot more than one variable on the same graph—here different left and right axis scales were used to align the two series

Economagic will superimpose bars indicating periods of recession

Economagic GIF charts

Scatterplot matrix

[Figure: scatterplot matrix of age, features, price, sqfeet, tax.]

Describe/Numeric Variables/Multiple-Variable Analysis

This chart provides detailed views of relationships between many variables that may be helpful in regression analysis

Residual time series plot

Residuals-vs-time or vs-row-number is an option in Forecasting, Multiple Regression, & Advanced Regression

[Figure: Residual Plot for adjusted DJIAtoMarch2000, random walk with drift; 1/80 to 1/05.]

“Residuals” are forecast errors within the sample that was fitted by the model.

Look for non-random patterns, changes in variance, outliers (this one is not bad except for a couple of outliers)

Residual probability plot (vertical)

Plot/Exploratory Plots/Normal Probability Plot (also a residual plot “pane option” in Forecasting & Advanced Regression)

[Figure: normal probability plot of residuals for adjusted DJIAtoMarch2000, random walk with drift; x-axis: residual, y-axis: proportion.]

Deviations from diagonal line reveal non-normality of error distribution (this one is not bad, except for two negative outliers)

Residual autocorrelation plot

[Figure: Residual Autocorrelations for RSJEWEL, Winter's exp. smoothing with alpha = 0.7587, beta = 0.0289, gamma = 0.7882; x-axis: lag (0 to 25), y-axis: autocorrelations (-1 to 1).]

Ideally all the autocorrelation bars should be within the red 95% significance bands. This plot shows significant autocorrelation at lag 12, indicating a poor fit to the seasonal pattern in the data.
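A residual autocorrelation check of this kind can be sketched as follows. Two assumptions are worth flagging: the band ±1.96/√n is a common large-sample approximation for uncorrelated errors and is not necessarily the formula Statgraphics uses for its bands, and the "residuals" here are a synthetic series with a built-in 12-period cycle, not the RSJEWEL residuals.

```python
import math

def autocorrelation(series, lag):
    """Sample autocorrelation of a series at the given lag."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[t] - mean) * (series[t - lag] - mean) for t in range(lag, n))
    return num / denom

def approx_95_band(n):
    """Approximate 95% band for autocorrelations of uncorrelated data (assumption)."""
    return 1.96 / math.sqrt(n)

# Synthetic residuals with a leftover seasonal cycle of period 12 (hypothetical).
residuals = [math.sin(2 * math.pi * t / 12) for t in range(120)]

band = approx_95_band(len(residuals))
for lag in (1, 6, 12):
    r = autocorrelation(residuals, lag)
    flag = "significant" if abs(r) > band else "ok"
    print(f"lag {lag:2d}: r = {r:+.3f}  (band = ±{band:.3f})  {flag}")
```

A large spike at lag 12 in monthly data is the signature of seasonality that the model has failed to capture.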

Today’s agenda
Course introduction

⇒ Forecasts and confidence intervals: the simplest case (mean model)


Consider the following time series:

[Figure: Time Series Plot for X; x-axis: 0 to 20, y-axis: 50 to 170.]

How to forecast?
If you have reason to believe the observations are statistically independent and identically distributed, with no trend*, the appropriate forecasting model is the MEAN model

Just predict that future observations will equal the mean of the past values

*These assumptions might be based on domain knowledge, or else they could be tested by comparing alternative models and looking at autocorrelations, etc.

Stats review: sampling from a population

X = random variable, n = sample size

$\mu, \sigma$ = population mean & standard deviation

$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = sample mean (AVERAGE)

$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}}$ = sample std. dev. (STDEV)

Standard error of the mean

$SE_{mean} = \frac{S}{\sqrt{n}}$

This is the estimated standard deviation of the “sampling distribution of the mean”

It measures the precision of our estimate of the (unknown) population mean

As n gets larger, $SE_{mean}$ gets smaller and the sampling distribution becomes normal*

*Central Limit Theorem

Std. deviation vs. std. error?
The term “standard deviation” (usually) refers to the actual root-mean-squared deviation of a given population or sample around its mean

The term “standard error” refers to the expected root-mean-squared deviation of an estimate or forecast around the true value under repeated sampling, i.e., the “standard deviation of the error”

Forecasting with the mean model

Let $\hat{x}_{n+1}$ denote a forecast of $x_{n+1}$ based on data observed up to period n

If $x_{n+1}$ is assumed to be independently drawn from the same population as the sample $x_1, \ldots, x_n$, the forecast that minimizes mean squared error is simply the sample mean:

$\hat{x}_{n+1} = \bar{X}$

Forecast standard error
The standard error of the forecast has two components:

$SE_{fcst} = \sqrt{S^2 + SE_{mean}^2} = S\sqrt{1 + \frac{1}{n}}$

The $S^2$ term measures the intrinsic risk (“noise” in the data)

The $SE_{mean}^2$ term measures the parameter risk (error in estimating the “signal” in the data)

Note that variances, rather than standard deviations, are additive

For the mean model, the result is that the forecast standard error is slightly larger than the sample standard deviation
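The size of "slightly larger" depends only on n. The sketch below evaluates the formula for a unit standard deviation; the function name `forecast_standard_error` is made up for this illustration.

```python
import math

def forecast_standard_error(s, n):
    """Mean-model forecast standard error from sample std. dev. s and sample size n."""
    se_mean = s / math.sqrt(n)               # parameter risk
    return math.sqrt(s ** 2 + se_mean ** 2)  # variances add, not standard deviations

# With s = 1 the result is the inflation factor sqrt(1 + 1/n).
for n in (5, 20, 100):
    factor = forecast_standard_error(1.0, n)
    print(f"n = {n:3d}: SE_fcst = {factor:.4f} x S")
```

Even for a modest sample size the parameter-risk term adds only a few percent to the forecast standard error, consistent with the earlier remark that parameter risk is usually the smaller component.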

Confidence intervals for forecasts
A point forecast should always be accompanied by a confidence interval to indicate its accuracy… but what is a confidence interval?

An x% confidence interval is an interval calculated by a rule which has the property that the interval will cover the true value x% of the time under simulated conditions, assuming the model is correct.

Loosely speaking, there is an x% chance that your data will fall in your x% confidence interval, but only if your model and its underlying assumptions are correct! (This is why we test assumptions.)

Confidence interval = point forecast ± t standard errors

If the distribution of forecast errors is assumed to be normal, a 95% confidence interval for the forecast is

$\hat{x}_{n+1} \pm t_{.05,\,n-1} \, SE_{fcst}$

…where $t_{.05,\,n-1}$ is the critical value of the “Student’s t” distribution* with a two-tailed probability of .05 and n−1 “degrees of freedom” (in Excel, =TINV(.05, n−1))

*discovered by W.S. Gosset of Guinness Brewery

http://en.wikipedia.org/wiki/William_Sealey_Gosset
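Outside Excel, the critical t value can be approximated with nothing but the standard library. The sketch below is one possible approach, not the course's method: it integrates the t density numerically (Simpson's rule) and finds the critical value by bisection. All function names are invented for this example, and the inputs to `forecast_interval` (point forecast 96.35, forecast standard error 29.68, n = 20) are illustrative values.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tail_prob(t, df, steps=1000):
    """P(|T| > t), using Simpson's rule on [0, t] (steps must be even)."""
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3.0

def t_critical(alpha, df):
    """Two-tailed critical value, found by bisection on [0, 50]."""
    lo, hi = 0.0, 50.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_tail_prob(mid, df) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def forecast_interval(point, se_fcst, n, alpha=0.05):
    """Point forecast plus or minus t critical value times forecast standard error."""
    t = t_critical(alpha, n - 1)
    return point - t * se_fcst, point + t * se_fcst

low, high = forecast_interval(96.35, 29.68, 20)
print(f"t(.05, 19) = {t_critical(0.05, 19):.3f}")
print(f"95% interval: {low:.2f} to {high:.2f}")
```

In this sketch `t_critical(alpha, df)` is intended to play the same role as Excel's TINV(alpha, df), which also takes a two-tailed probability.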

t vs. normal distribution

The t distribution is the distribution of

$\frac{\bar{X} - \mu}{SE_{mean}}$

i.e., the number of “standard errors from the true mean” when the standard deviation is unknown.

The t distribution resembles a standard normal (z) distribution but with “fatter tails” for small n

Normal vs. t: much difference?

[Figure: density curves from -4 to 4 for the normal distribution and for t with 20, 10, and 5 d.f.]

The number of standard errors (±) computed from the normal and t distributions is very close except for very low d.f. or very high confidence

Confidence level (2-sided)
d.f.      90.0%   95.0%   99.0%   99.5%   99.9%
Normal    1.645   1.960   2.576   2.807   3.291
200       1.653   1.972   2.601   2.838   3.340
100       1.660   1.984   2.626   2.871   3.390
50        1.676   2.009   2.678   2.937   3.496
20        1.725   2.086   2.845   3.153   3.850
10        1.812   2.228   3.169   3.581   4.587
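Critical values like those tabulated for the normal and t distributions can be recomputed numerically. The following is a minimal sketch using only the Python standard library; it is not how Excel or Statgraphics computes them. The helper names (`t_pdf`, `t_central_prob`, `t_critical`) are made up for this illustration, and the approach (Simpson's rule for the central probability, then bisection) is chosen for transparency rather than speed.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_central_prob(x, df, steps=1000):
    """P(-x <= T <= x): Simpson's rule on [0, x], doubled by symmetry."""
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value c such that P(|T| <= c) = confidence."""
    lo, hi = 0.0, 50.0
    for _ in range(50):  # bisection on the monotone central probability
        mid = (lo + hi) / 2
        if t_central_prob(mid, df) < confidence:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

levels = [0.90, 0.95, 0.99, 0.995, 0.999]
print("d.f.    " + "  ".join(f"{100 * c:5.1f}%" for c in levels))
print("Normal  " + "  ".join(f"{NormalDist().inv_cdf((1 + c) / 2):6.3f}" for c in levels))
for df in (200, 100, 50, 20, 10):
    print(f"{df:<8d}" + "  ".join(f"{t_critical(c, df):6.3f}" for c in levels))
```

In practice a statistics library would normally be used for this; the hand-rolled version is shown only to make the definition of a two-sided critical value concrete.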

Empirical rules of thumb

For n ≈ 20 or more, the critical t value is approximately 2, so the "empirical" 95% CI is roughly the point forecast plus or minus two standard errors, however…

A prediction interval that covers 95% of the data is often too wide to be managerially useful: 50% (a "coin flip") or 80% might be easier for a manager to understand

A 50% confidence interval is roughly plus or minus two-thirds of a standard error
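The "two standard errors" and "two-thirds of a standard error" rules of thumb can be checked against the normal distribution. This is a small sketch that assumes the large-sample (normal) approximation rather than the exact t distribution; the function name `normal_multiplier` is invented for the example.

```python
from statistics import NormalDist

def normal_multiplier(confidence):
    """Number of standard errors on each side for a two-sided normal interval."""
    return NormalDist().inv_cdf((1 + confidence) / 2)

for level in (0.50, 0.80, 0.95):
    m = normal_multiplier(level)
    print(f"{level:.0%} interval: point forecast ± {m:.3f} standard errors")
```

For small samples the t-based multipliers are somewhat larger, so these figures are best read as lower bounds on the interval half-width.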

Example, continued

Time series X (n=20*, d.f.=19**):

114, 126, 123, 112, 68, 116, 50, 108, 163, 79
67, 98, 131, 83, 56, 109, 81, 61, 90, 92

Statistics: $\bar{X} = 96.35$, $S = 28.96$

$$SE_{mean} = 28.96/\sqrt{20} = 6.48$$

*True parameters: $\mu = 100$, $\sigma = 30$

$$SE_{fcst} = \sqrt{28.96^2 + 6.48^2} = 29.68$$
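The summary statistics for the example series can be reproduced with the Python standard library. This is a sketch of the arithmetic only; the variable names are illustrative and are not taken from the lecture.

```python
import math
import statistics

x = [114, 126, 123, 112, 68, 116, 50, 108, 163, 79,
     67, 98, 131, 83, 56, 109, 81, 61, 90, 92]

n = len(x)
mean = statistics.mean(x)       # point forecast under the mean model
s = statistics.stdev(x)         # sample standard deviation (n - 1 in the denominator)
se_mean = s / math.sqrt(n)      # standard error of the mean (parameter risk)
se_fcst = math.sqrt(s ** 2 + se_mean ** 2)  # forecast standard error

print(f"n = {n}, mean = {mean:.2f}, S = {s:.2f}")
print(f"SE_mean = {se_mean:.2f}, SE_fcst = {se_fcst:.2f}")
```

Note that the forecast standard error is only slightly larger than S itself, because with n = 20 the uncertainty in the estimated mean is small relative to the noise in the data.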

Confidence intervals for predictions

Exact 95% CI* = 96.35 ± 2.093 × 29.68 = [34.2, 158.5]

Exact 50% CI** = 96.35 ± 0.688 × 29.68 = [75.9, 116.8]

*$t_{.05,19} = 2.093$

**$t_{.5,19} = 0.688$
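The interval arithmetic is simply "point forecast ± critical value × forecast standard error". The sketch below plugs in the rounded figures from the example (96.35, 29.68) and the two tabulated critical values (2.093 and 0.688); the function name `forecast_interval` is hypothetical.

```python
def forecast_interval(point_forecast, se_fcst, t_crit):
    """Return (lower, upper) limits: point forecast ± t_crit standard errors."""
    half_width = t_crit * se_fcst
    return point_forecast - half_width, point_forecast + half_width

lo95, hi95 = forecast_interval(96.35, 29.68, 2.093)
lo50, hi50 = forecast_interval(96.35, 29.68, 0.688)

print(f"95% interval: [{lo95:.1f}, {hi95:.1f}]")
print(f"50% interval: [{lo50:.1f}, {hi50:.1f}]")
print(f"width ratio (50% / 95%): {(hi50 - lo50) / (hi95 - lo95):.3f}")
```

The width ratio equals the ratio of the two critical values, which is why the 50% interval comes out at about one-third the width of the 95% interval.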

Statgraphics output: mean model

[Figure: Time Sequence Plot for X; constant mean = 96.35; horizontal axis 0 to 40, vertical axis 25 to 175]

Statgraphics output: mean model

[Figure: Time Sequence Plot for X; constant mean = 96.35; horizontal axis 0 to 40, vertical axis 25 to 175]

A 50% confidence interval is about 1/3 the width of a 95% confidence interval.

What if there's really a trend?

[Figure: Time Sequence Plot for X; linear trend = 114.611 + -1.7391 t; horizontal axis 0 to 40, vertical axis 25 to 175]

That's a different modeling assumption, and it leads to very different forecasts and confidence intervals.

Actually, t = 1.61 (in absolute value) for the slope coefficient, so the trend is not significant and this model would be rejected at the .05 level of significance.
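The trend line and the t-statistic of its slope can be reproduced with ordinary least squares. This is a sketch under one assumption that is not stated on the slide: the time index runs t = 1, 2, …, 20. The variable names are illustrative, and the critical value 2.101 (two-sided, .05 level, 18 d.f.) is a standard tabulated figure rather than something taken from the lecture.

```python
import math

x = [114, 126, 123, 112, 68, 116, 50, 108, 163, 79,
     67, 98, 131, 83, 56, 109, 81, 61, 90, 92]
t = list(range(1, len(x) + 1))  # assumed time index 1..20

n = len(x)
t_bar = sum(t) / n
x_bar = sum(x) / n
s_tt = sum((ti - t_bar) ** 2 for ti in t)
s_tx = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))

slope = s_tx / s_tt
intercept = x_bar - slope * t_bar

# Residual standard error uses n - 2 d.f. (two coefficients estimated).
residuals = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]
s_resid = math.sqrt(sum(r * r for r in residuals) / (n - 2))
se_slope = s_resid / math.sqrt(s_tt)
t_stat = slope / se_slope

T_CRIT_18DF = 2.101  # two-sided .05 critical value for 18 d.f. (tabulated)
print(f"trend = {intercept:.3f} + {slope:.4f} t")
print(f"t-statistic for slope = {t_stat:.2f}")
print("significant at .05" if abs(t_stat) > T_CRIT_18DF else "not significant at .05")
```

Because the slope's t-statistic falls short of the critical value in magnitude, the data do not give strong evidence of a trend, even though the fitted line produces quite different forecasts from the mean model.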

Yes, it's simple, but...

The mean model is the foundation for more sophisticated models we will encounter later (RW, regression, ARIMA)

It has the same generic features:
→ A coefficient to be estimated
→ A standard error for the coefficient that reflects parameter risk
→ A forecast standard error that reflects intrinsic risk & parameter risk
→ …and model risk too!







⇒ March madness

Market-based forecasting

Betting markets are often an efficient way to aggregate diverse opinions (and to share risks… or have fun)

Probabilistic forecasts derived from contract prices are often well-calibrated

Caveats: markets don't always work; they may exhibit "herding" or distortions when bettors lack independent information or have highly correlated personal stakes in events

Some applications are controversial (e.g. "terrorism futures")

March madness: forecasting basketball games via a betting market

These are the price quotes on Tradesports.com at 10am on Monday, March 19, 2007.

As of Monday…

Florida’s stock is rising…

Recap of today's topics

• Forecasts and confidence intervals: the simplest case (mean model)

• March madness
