decision 411: forecasting - fuqua school of businessrnau/decision411_2007/411class01_… ·...
TRANSCRIPT
Decision 411: Forecasting
Professor: Bob Nau
Course content:
How to predict the future
How to learn from the past
…using data analysis
Who should be interested:
Anyone on a quantitative career track (financial investments, marketing research, consulting, operations, accounting, econometrics, engineering, environmental science …)
Anyone who wants more experience in computer modeling & data analysis
Anyone who needs to make decisions based on forecasts provided by others
Forecasts are used at every organizational level
Corporate Strategy
Finance, Marketing, Accounting
Production, Operations & Supply Chain
Sales
Many numbers…. or one number?
2003 Nobel Prize(s) in Economics awarded for forecasting methods
Robert F. Engle: “for methods of analyzing economic time series with time-varying volatility (ARCH)”
Clive W.J. Granger: “for methods of analyzing economic time series with common trends (cointegration)”
www.nobel.se/economics/laureates/2003/
Recent history (pitfalls of forecasting)
[Figure: DJIA to March 2000; horizontal axis 1980 to 2005, vertical axis 0 to 3 (X 10000)]
Recent history
[Figure: DJIA to March 2000 with forecasts (GRW) and lower/upper 95% limits; same axes]
Recent history
[Figure: DJIA to March 2000, forecasts (GRW), lower/upper 95% limits, and DJIA since March 2000; same axes]
Today’s agenda
⇒ Course introduction
Forecasting tools & principles
How to obtain data & move it around
Statistical graphics
Forecasts and confidence intervals: the simplest case (mean model)
March madness
Course objectives
• How to use data to predict the future & aid decision-making
• Data acquisition and integration
• Statistical & graphical data analysis
• Regression and other forecasting models
• Time series concepts
• Management of forecasting
Course map
Forecasting methods
Statistical Non-statistical
Extrapolative (one variable)
Associative (many variables)
Naive
Decomposition
Smoothing
ARIMA
One equation (regression)
Many equations (econometric)
Nonlinear (data mining via neural nets, classification trees, etc.)
Simulation (what-if)
Subjective (expert consensus, field estimates)
We are mainly here
Betting markets
Course outline
Week 1: Data concepts & simple models: linear trend & random walk
Week 2: Seasonal adjustment & exponential smoothing (HW#1 due Tues 3/27)
Week 3: Regression (HW#2 due Tues 4/3)
Week 4: More regression (Quiz on Tues 4/10)
Week 5: ARIMA models (HW#3 due Tues 4/17)
Week 6: Additional topics (automatic, nonlinear…)
Final project (due at end of exam week, Thur 5/3)
Readings
My notes handed out in class, also on course web page
faculty.fuqua.duke.edu/~rnau/Decision411CoursePage.html
PowerPoint slides from lectures
Additional materials on web page, bulletin board, & CD’s
Optional stats textbook by Schleifer & Bell (or any other MBA-level stats textbook)
Software
Statgraphics XV (in lab & on your PC)
Excel
Library databases (Economagic, etc.)
Google
Decision 411 CD’s
Video files that provide a tour of Statgraphics & Economagic on your own PC
View with Camtasia Player (included on CD)
Hit Alt-Enter to toggle the control bar
Bulletin board
Main course b-board:
mba.spring2007_session4.decision411.forecasting
Will be used for answers to FAQ’s, additional comments on lecture topics, & discussions of statistics in the news and in the workplace; check it frequently
Feel free to post your own examples of good/bad/interesting stats (extra credit for class participation!)
Do not post any assignment-related questions.
E-mail
If you have a question for me, send it by e-mail rather than posting on a b-board…
…but check main b-board first to see if it has already been asked and answered
Use a descriptive subject line beginning with “Forecasting:…”
Grading basis
45% homework (3 assignments)
15% quiz
30% final project
10% class participation
Study group policy
Work in teams of 2 (max)
Try to find a partner by Friday
OK to team up with someone from other section
Send me e-mail if still seeking a partner
Final project
Final project may be based on a data set and modeling goal of YOUR choice
Should get started by 5th week of class
Alternatively, there will be several “designated project” options (essentially a fourth homework assignment)
Can work in groups of 2 on final project as well as regular homework
Honor code issues
You are encouraged to consult your classmates for general advice on forecasting concepts and software use
Specific details of data analysis assignments should be discussed only with your study-group partner
Don’t post notes on b-board that are at all related to assignments prior to due dates; send any questions to me by e-mail.
Suggestions & examples welcome!
If you are interested in particular forecasting problems or can suggest particular examples that might be useful for classroom discussion, please send me e-mail (include data if you have it)
Exception: no examples from graded assignments in other ongoing courses!
Today’s agenda
Course introduction
⇒ Forecasting tools & principles
How to obtain data & move it around
Statistical graphics
Forecasts and confidence intervals: the simplest case (mean model)
March madness
How can we predict the future?
Look for statistical patterns that were stable in the past and which can be expected to remain stable
Extrapolate those patterns into the future
“I have seen the future and it is very much like the present, only longer.”
Example: stable mean & variance
[Figure: Time Sequence Plot for X; horizontal axis 0 to 150, vertical axis 0 to 100]
[Figure: Time Sequence Plot for X, constant mean = 49.4977; legend: actual, forecast, 95.0% limits]
Example: stable trend
[Figure: Time Series Plot for Y; horizontal axis 0 to 150, vertical axis 0 to 400]
[Figure: Time Sequence Plot for Y, linear trend = 94.184 + 1.44936 t; legend: actual, forecast, 95.0% limits]
[Figure: Time Sequence Plot for Y, random walk with drift; legend: actual, forecast, 95.0% limits]
Example: stable seasonality
[Figure: Time Series Plot for RetailxautoNSA; horizontal axis 1/92 to 1/08, vertical axis 1 to 3.4 (X 100000)]
Example: stable correlations
[Figure: variables age, features, price, sqfeet, tax]
Transformations
Sometimes a stable pattern is not apparent on a graph of the “raw” data
Transformations of the data (deflation, logging, differencing, seasonal adjustment) may help to reveal the underlying pattern
Example: stock prices
Pattern: exponential growth curve with 1990’s bubble
Logged stock prices
Natural log transformation linearizes the growth: slope of trend line in logged units is (approximately) the average percentage growth per period
Logged stock prices
Logged indices since 1990
Logged & differenced stock prices
Difference of natural log ≈ percent change between periods (the approximation is close when changes are small)
[Figure: Time Series Plot for adjusted SP500monthclose; horizontal axis 1/80 to 1/08, vertical axis -0.25 to 0.25]
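The claim that a difference of natural logs behaves like a percent change can be checked with a few numbers. The sketch below is only an illustration: the price levels are made up, not taken from the S&P 500 series on the slide.

```python
import math

# Hypothetical month-end index levels (made-up numbers for illustration only).
prices = [100.0, 102.0, 99.0, 103.5]

# Exact fractional change between consecutive periods.
pct_changes = [(b - a) / a for a, b in zip(prices, prices[1:])]

# Difference of natural logs between consecutive periods.
log_diffs = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

for pct, ld in zip(pct_changes, log_diffs):
    print(f"percent change = {pct:+.4f}   diff of logs = {ld:+.4f}")
```

For changes of a few percent the two columns agree to about three decimal places; the gap widens as the changes get larger.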
Example: U.S. retail sales (excluding autos)
Pattern: strong nominal growth & seasonal pattern
Deflated and seasonally adjusted sales x-autos
Pattern: real growth accelerated in late ’90’s, flattened after March 2000 peak, dipped in September 2001, ramped up again, but recently…?
[Figure: RetailexautoSA/CPIcityavg; horizontal axis 1/92 to 1/08, vertical axis 810 to 1310]
What if patterns are not stable?
Trends, seasonality, etc., may vary in time
This may limit the amount of past data that should be used for fitting the model (don’t merely use all data “because it is there”)
More sophisticated forecasting models are capable of tracking time-varying parameters
Expert opinion can also be used to anticipate changes in patterns
A changing pattern: Housing Starts
Strong seasonal pattern, big drop in last year!
(A few) Forecasting Principles
Use the most relevant & recent data
Seek diverse & independent data sources
Let model selection be guided by theory and domain knowledge, not just “fit” to past data
Keep It Simple
Test the assumptions behind the model
Validate the model on hold-out data
Report confidence intervals with forecasts
The “best” forecasting model
Is the one that can be expected to make the SMALLEST ERRORS…
…when predicting the FUTURE*
*not always the same thing as giving the best fit to the past!
Is intuitively reasonable
Is no more complicated than necessary
Provides insight into trends & causes
Can be explained to your boss or client
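The point that the best fit to the past is not always the best predictor of the future can be illustrated with a hold-out comparison. This is a minimal sketch, not the course's procedure: the series is made up, the 12/4 split is arbitrary, and `fit_mean`, `fit_trend` and `rmse` are hypothetical helper names.

```python
import math

def rmse(actual, predicted):
    """Root-mean-squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def fit_mean(y):
    """Mean model: every forecast equals the average of the fitted data."""
    m = sum(y) / len(y)
    return lambda t: m

def fit_trend(y):
    """Linear trend fitted by ordinary least squares against t = 1..n."""
    n = len(y)
    ts = range(1, n + 1)
    t_bar = sum(ts) / n
    y_bar = sum(y) / n
    slope = (sum((t - t_bar) * (v - y_bar) for t, v in zip(ts, y))
             / sum((t - t_bar) ** 2 for t in ts))
    intercept = y_bar - slope * t_bar
    return lambda t: intercept + slope * t

# Hypothetical series (made-up numbers); the last 4 points are held out.
series = [52, 47, 55, 49, 51, 46, 53, 50, 48, 54, 51, 49, 47, 52, 50, 48]
n_fit = 12
fit_part, holdout = series[:n_fit], series[n_fit:]
fit_times = range(1, n_fit + 1)
holdout_times = range(n_fit + 1, len(series) + 1)

results = {}
for name, fitter in [("mean", fit_mean), ("linear trend", fit_trend)]:
    model = fitter(fit_part)
    in_sample = rmse(fit_part, [model(t) for t in fit_times])
    out_sample = rmse(holdout, [model(t) for t in holdout_times])
    results[name] = (in_sample, out_sample)
    print(f"{name:12s}  in-sample RMSE = {in_sample:.3f}  hold-out RMSE = {out_sample:.3f}")
```

The least-squares trend line can never fit the past worse than the mean, because the mean model is the special case with zero slope. Only the hold-out column says anything about which model forecasts better.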
Forecasting risks (sources of error)
1. Intrinsic risk (random error)
2. Parameter risk (estimation error)
3. Model risk (erroneous assumptions)
Note: statistical confidence intervals are based on estimates of intrinsic risk and parameter risk, not model risk
Intrinsic risk
Even the best model cannot be expected to make perfect predictions (“forecasting is hard, especially when it’s about the future…”)
Intrinsic risk is measured by error statistics such as the “standard error of the estimate” (RMS error, adjusted for number of coefficients)
Intrinsic risk can be reduced, in principle, by finding a “better” model based on more detailed assumptions and data
Parameter risk
Even if you have the “correct” forecasting model, its parameters may not be exactly known: they must be estimated from available data
Parameter risk is measured by standard errors and t-statistics of model coefficients
Parameter risk can be reduced, in principle, by using more past data to estimate the model
The “blur of history” problem: older data may be “stale” and not reflect current conditions
Parameter risk is usually a smaller component of forecast error than intrinsic risk or model risk
Model risk
This is often the most serious risk, and its effects are not taken into account in the calculation of confidence intervals
Model risk can be reduced by following good forecasting principles:
• Exploratory data analysis to make sure important patterns or related variables are not overlooked
• Statistical tests of key assumptions
• Out-of-sample validation of statistical model
• Use of domain knowledge and expert judgment
Today’s agenda
Course introduction
Forecasting tools & principles
⇒ How to obtain data & move it around
Statistical graphics
Forecasts and confidence intervals: the simplest case (mean model)
March madness
Where to get data
Internet sources (Economagic, library databases, government agencies…)
Your corporate database
Trade associations & journals
Econometric consulting firms
Designed experiments and surveys
How to move data around
Most computer programs use their own idiosyncratic “binary” file formats for storing data (word processors, spreadsheets, stat programs, database programs…)
All programs must also read and write text files in order to communicate with people
Hence, different programs can always exchange data with each other in the form of text files
1 character of text data = 1 byte of storage (for plain ASCII text)
Text files
May be either “fixed format” or “delimited”
In a fixed format file, data fields are delineated by character position within a line
xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx xxxxxxxxxx
In a delimited file, data fields are separated by delimiting characters (commas, tabs, spaces)
xxxxxxxxxx, xxxxxxxx, xxxxxxxxxx, xxxxxxxxxx,
Statgraphics & Excel can easily read tab- or comma-delimited files as well as XLS files
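To make the fixed-format versus delimited distinction concrete, here is a small sketch that reads the same two records both ways. The field names, column positions and values are all hypothetical, chosen only for illustration.

```python
import csv
import io

# Delimited: fields are separated by commas, and the first line names them.
delimited_text = "date,sales,region\n2007-01,1250,East\n2007-02,1310,West\n"
delimited_rows = [dict(row) for row in csv.DictReader(io.StringIO(delimited_text))]

# Fixed format: each field occupies a known range of character positions.
# Assumed layout: date in columns 0-6, sales in columns 7-13, region from column 15.
fixed_text = (
    "2007-01   1250 East\n"
    "2007-02   1310 West\n"
)
field_slices = {
    "date": slice(0, 7),
    "sales": slice(7, 14),
    "region": slice(15, None),
}
fixed_rows = [
    {name: line[pos].strip() for name, pos in field_slices.items()}
    for line in fixed_text.splitlines()
]

print(delimited_rows)
print(fixed_rows)
```

Both readers return the same records. The delimited file describes its own layout through the header line and separators, whereas the fixed-format file can only be read by someone who already knows the column positions.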
From Economagic to Statgraphics*
Save several series to personal workspace
Create Excel file or CSV (comma-delimited text) file
Open the file in Excel & clean it up (delete extraneous rows, add more descriptive column headings as variable names)
Save the cleaned-up file under a new name, CLOSE IT, and open it in Statgraphics
* See video for details
Today’s agenda
Course introduction
Forecasting tools & principles
How to obtain data & move it around
⇒ Statistical graphics
Forecasts and confidence intervals: the simplest case (mean model)
March madness
Statistical graphics
Wizards & integrated plotting procedures make charting easy
Complex patterns in data can be uncovered and communicated by following principles of good graphic design
Charts can also be boring, confusing, or deceptive if produced thoughtlessly
Tufte’s graphical principles*
Above all else, show the data
Avoid “chartjunk”: dark grid lines, false perspective, unintentional optical art, self-promoting graphics
Maximize the ratio of data ink to non-data ink
Mobilize every graphical element, perhaps several times over, to show the data (e.g., data values printed on a bar chart)
* The Visual Display of Quantitative Information by E. Tufte
Charts vs. tables
Charts are most effective when data are numerous and/or multi-dimensional
If the data are one-dimensional and not too numerous, or if numerical details are important, a table may be better than a chart
“A table is nearly always better than a dumb pie chart; the only worse design than a pie chart is several of them”
Focus attention
Don’t embed important numbers in sentences of text: set them apart in a table or chart.
Treat tables & charts as “paragraphs”, and include them in the narrative at the appropriate points
Annotate charts with appropriate comments
Maximize data density: “graphs can be shrunk way down” so that more than one will fit on a page or slide
Excel & Statgraphics tips
Embed small, well-labeled, well-chosen charts & tables in your reports
Make points and lines thick enough to “show the data”
Suppress gridlines where not needed
Use an appropriate chart type (e.g., line plots for time series, scatterplots for cross-sectional data, bar charts or tables rather than pie charts)
Often it is instructive to plot more than one variable on the same graph—here different left and right axis scales were used to align the two series
Economagic will superimpose bars indicating periods of recession
Economagic GIF charts
Scatterplot matrix
[Figure: scatterplot matrix of age, features, price, sqfeet, tax]
Describe/Numeric Variables/Multiple-Variable Analysis
This chart provides detailed views of relationships between many variables that may be helpful in regression analysis
Residual time series plot
(Residuals-vs-time or vs-row-number is an option in Forecasting, Multiple Regression, & Advanced Regression)
[Figure: Residual Plot for adjusted DJIAtoMarch2000, random walk with drift; horizontal axis 1/80 to 1/05, vertical axis: residual, -0.28 to 0.12]
“Residuals” are forecast errors within the sample that was fitted by the model. Look for non-random patterns, changes in variance, outliers (this one is not bad except for a couple of outliers).
Residual probability plot (vertical)
Plot/Exploratory Plots/Normal Probability Plot (also a residual plot “pane option” in Forecasting & Advanced Regression)
[Figure: Residual Plot for adjusted DJIAtoMarch2000, random walk with drift; horizontal axis: residual, -0.28 to 0.12; vertical axis: proportion, 0.1 to 99.9]
Deviations from diagonal line reveal non-normality of error distribution (this one is not bad, except for two negative outliers)
Residual autocorrelation plot
[Figure: Residual Autocorrelations for RSJEWEL, Winter's exp. smoothing with alpha = 0.7587, beta = 0.0289, gamma = 0.7882; horizontal axis: lag, 0 to 25; vertical axis: autocorrelations, -1 to 1]
Ideally all the autocorrelation bars should be within the red 95% significance bands. This plot shows significant autocorrelation at lag 12, indicating a poor fit to the seasonal pattern in the data.
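The lag-12 diagnostic can be reproduced in miniature. This is a sketch under stated assumptions: the "residuals" are a synthetic 12-period sine wave, not the RSJEWEL residuals, and the band of ±1.96/√n is a common large-sample approximation that may differ from the way Statgraphics draws its bands.

```python
import math

def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag (standard estimator)."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom

# Synthetic residuals with a leftover 12-period seasonal pattern (illustration only).
residuals = [math.sin(2 * math.pi * t / 12) for t in range(48)]

# Approximate 95% significance band for a single autocorrelation (an assumption).
band = 1.96 / math.sqrt(len(residuals))

r12 = autocorrelation(residuals, 12)
print(f"lag-12 autocorrelation = {r12:.3f}, 95% band = +/-{band:.3f}")
print("significant at lag 12" if abs(r12) > band else "within the band at lag 12")
```

A residual series that still repeats every 12 periods produces a lag-12 bar well outside the band, which is the signature of an unmodeled seasonal pattern described above.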
Today’s agenda
Course introduction
Forecasting tools & principles
How to obtain data & move it around
Statistical graphics
⇒ Forecasts and confidence intervals: the simplest case (mean model)
March madness
Consider the following time series:
[Figure: Time Series Plot for X; horizontal axis 0 to 20, vertical axis 50 to 170]
How to forecast?
If you have reason to believe the observations are statistically independent and identically distributed, with no trend*, the appropriate forecasting model is the MEAN model
Just predict that future observations will equal the mean of the past values
*These assumptions might be based on domain knowledge, or else they could be tested by comparing alternative models and looking at autocorrelations, etc.
Stats review: sampling from a population
X = random variable, n = sample size
$\mu, \sigma$ = population mean & standard deviation
$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ = sample mean (AVERAGE)
$S = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}}$ = sample std. dev. (STDEV)
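A quick sketch of the sample mean and sample standard deviation, computed from the definitions and checked against Python's `statistics` module. The eight data values are made up for illustration.

```python
import math
import statistics

# Hypothetical sample (made-up numbers).
x = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(x)

# Sample mean: the sum of the observations divided by n (Excel's AVERAGE).
x_bar = sum(x) / n

# Sample standard deviation: note the n - 1 divisor (Excel's STDEV).
s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))

print(f"n = {n}, sample mean = {x_bar}, sample std. dev. = {s:.4f}")

# The standard-library functions use the same definitions.
print(statistics.mean(x), statistics.stdev(x))
```

The n − 1 divisor is what distinguishes the sample standard deviation from the population version (`statistics.pstdev`), which divides by n.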
Standard error of the mean
$SE_{mean} = S/\sqrt{n}$
This is the estimated standard deviation of the “sampling distribution of the mean”
It measures the precision of our estimate of the (unknown) population mean
As n gets larger, $SE_{mean}$ gets smaller and the sampling distribution becomes approximately normal*
*Central Limit Theorem
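Because the standard error of the mean is S divided by the square root of n, precision improves slowly as the sample grows. The sketch below assumes a fixed S of 10, an arbitrary value chosen for illustration; `se_mean` is a hypothetical helper name.

```python
import math

def se_mean(s, n):
    """Standard error of the mean: sample std. dev. divided by sqrt(n)."""
    return s / math.sqrt(n)

s = 10.0  # assumed sample standard deviation (illustrative value)
for n in (25, 100, 400):
    print(f"n = {n:4d}  SE_mean = {se_mean(s, n):.2f}")
```

Quadrupling the sample size only halves the standard error, so halving the uncertainty about the mean takes four times as much data.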
Std. deviation vs. std. error?
The term “standard deviation” (usually) refers to the actual root-mean-squared deviation of a given population or sample around its mean
The term “standard error” refers to the expected root-mean-squared deviation of an estimate or forecast around the true value under repeated sampling, i.e., the “standard deviation of the error”
Forecasting with the mean model
Let $\hat{x}_{n+1}$ denote a forecast of $x_{n+1}$ based on data observed up to period n
If $x_{n+1}$ is assumed to be independently drawn from the same population as the sample $x_1, \ldots, x_n$, the forecast that minimizes mean squared error is simply the sample mean:
$\hat{x}_{n+1} = \bar{X}$
Forecast standard error
The standard error of the forecast has two components:
$SE_{fcst} = \sqrt{S^2 + SE_{mean}^2} = S\sqrt{1 + \frac{1}{n}}$
The $S^2$ term measures the intrinsic risk (“noise” in the data)
The $SE_{mean}^2$ term measures the parameter risk (error in estimating the “signal” in the data)
Note that variances, rather than standard deviations, are additive
For the mean model, the result is that the forecast standard error is slightly larger than the sample standard deviation
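The two components of the forecast standard error can be computed separately and then combined by adding variances. The sample here is made up, and the variable names are chosen only for this sketch.

```python
import math
import statistics

# Hypothetical sample (made-up numbers).
x = [110, 95, 120, 100, 105, 90, 115, 100, 105, 95, 110]
n = len(x)

s = statistics.stdev(x)        # intrinsic risk: "noise" in the data
se_mean = s / math.sqrt(n)     # parameter risk: error in estimating the mean

# Variances add, so the standard errors combine in quadrature.
se_fcst = math.sqrt(s ** 2 + se_mean ** 2)

# Equivalent closed form for the mean model.
se_fcst_closed = s * math.sqrt(1 + 1 / n)

print(f"S = {s:.3f}, SE_mean = {se_mean:.3f}, SE_fcst = {se_fcst:.3f}")
```

With n = 11 the forecast standard error is only about 4% larger than S, since sqrt(1 + 1/11) ≈ 1.045. The parameter risk contributes little once the sample is moderately large.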
Confidence intervals for forecasts
A point forecast should always be accompanied by a confidence interval to indicate its accuracy… but what is a confidence interval??
An x% confidence interval is an interval calculated by a rule which has the property that the interval will cover the true value x% of the time under simulated conditions, assuming the model is correct.
Loosely speaking, there is an x% chance that your data will fall in your x% confidence interval, but only if your model and its underlying assumptions are correct! (This is why we test assumptions.)
Confidence interval = point forecast ± t standard errors
If the distribution of forecast errors is assumed to be normal, a 95% confidence interval for the forecast is
$\hat{x}_{n+1} \pm t_{.05,n-1}\, SE_{fcst}$
…where $t_{.05,n-1}$ is the critical value of the “Student’s t” distribution* with a two-tailed probability of .05 and n−1 “degrees of freedom” (in Excel, = TINV(.05,n−1))
*discovered by W.S. Gosset of Guinness Brewery
en.wikipedia.org/wiki/William_Sealey_Gosset
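Putting the pieces together, here is a sketch of a 95% interval for a one-step-ahead forecast from the mean model. The 11 observations are made up. The critical value 2.228 for 10 degrees of freedom is hard-coded from a standard t table (it is what Excel's TINV(.05,10) returns), because the Python standard library has no t distribution.

```python
import math
import statistics

# Hypothetical sample of n = 11 observations (made-up numbers).
x = [110, 95, 120, 100, 105, 90, 115, 100, 105, 95, 110]
n = len(x)

point_forecast = statistics.mean(x)   # mean-model forecast for period n + 1
s = statistics.stdev(x)               # sample standard deviation (n - 1 divisor)
se_fcst = s * math.sqrt(1 + 1 / n)    # forecast standard error

# Two-sided 95% critical t value for n - 1 = 10 d.f. (from a t table).
t_crit = 2.228

lower = point_forecast - t_crit * se_fcst
upper = point_forecast + t_crit * se_fcst
print(f"forecast = {point_forecast:.2f}, 95% interval = [{lower:.2f}, {upper:.2f}]")
```

The interval is only as trustworthy as the assumptions behind it (independent, identically distributed, roughly normal data with no trend). It reflects intrinsic and parameter risk but not model risk.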
t vs. normal distribution
The t distribution is the distribution of
$(\bar{X} - \mu) / SE_{mean}$
i.e., the number of “standard errors from the true mean” when the standard deviation is unknown.
The t distribution resembles a standard normal (z) distribution but with “fatter tails” for small n
Normal vs. t: much difference?
[Figure: density curves of the normal distribution and of t with 20, 10, and 5 d.f., plotted from −4 to 4]
# standard errors ± computed from normal and t distributions are very close except for very low d.f. or very high confidence

Confidence level (2-sided)
d.f.      90.0%   95.0%   99.0%   99.5%   99.9%
Normal    1.645   1.960   2.576   2.807   3.291
200       1.653   1.972   2.601   2.838   3.340
100       1.660   1.984   2.626   2.871   3.390
50        1.676   2.009   2.678   2.937   3.496
20        1.725   2.086   2.845   3.153   3.850
10        1.812   2.228   3.169   3.581   4.587
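The critical values in the table can be reproduced approximately without a statistics package. The sketch below is one possible approach, not the method used to build the slide: it integrates the t density numerically (Simpson's rule) and finds the critical value by bisection. The helper names t_pdf, t_cdf, and t_critical are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=400):
    """P(T <= x) for x >= 0, using Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3

def t_critical(confidence, df):
    """Two-sided critical value c such that P(|T| <= c) = confidence."""
    target = 0.5 + confidence / 2
    lo, hi = 0.0, 20.0
    for _ in range(40):                     # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

z_95 = NormalDist().inv_cdf(0.975)          # normal critical value for 95%
print(f"Normal: {z_95:.3f}")
for df in (200, 100, 50, 20, 10):
    print(f"d.f. {df:>3}: {t_critical(0.95, df):.3f}")
```

The loop prints the 95% column of the table; changing the confidence argument gives the other columns.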
Empirical rules of thumb
For n ≈ 20 or more, the critical t value is approximately 2, so the “empirical” 95% CI is roughly the point forecast plus or minus two standard errors, however…
A prediction interval that covers 95% of the data is often too wide to be managerially useful: 50% (a “coin flip”) or 80% might be easier for a manager to understand
A 50% confidence interval is roughly plus or minus two-thirds of a standard error
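A quick check of these rules of thumb, assuming the normal approximation is acceptable (large n), is to look up the two-sided normal multipliers for the 50%, 80%, and 95% levels. The helper name z_multiplier is illustrative.

```python
from statistics import NormalDist

def z_multiplier(confidence):
    """Number of standard errors for a two-sided normal interval."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

for level in (0.50, 0.80, 0.95):
    print(f"{level:.0%} interval: +/- {z_multiplier(level):.3f} standard errors")
```

The 50% multiplier comes out near 0.67, which is the "two-thirds of a standard error" rule, and the 95% multiplier near 1.96, which is the "plus or minus two" rule.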
Example, continued
Time series X (n=20*, d.f. =19**):
114, 126, 123, 112, 68, 116, 50, 108, 163, 79
67, 98, 131, 83, 56, 109, 81, 61, 90, 92
Statistics: $\bar{X} = 96.35$, $S = 28.96$
$SE_{mean} = 28.96 / \sqrt{20} = 6.48$
$SE_{fcst} = \sqrt{28.96^2 + 6.48^2} = 29.68$
*True parameters: $\mu = 100$, $\sigma = 30$
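The summary statistics for this series can be recomputed directly from the 20 values. A minimal sketch using only the Python standard library (variable names are illustrative):

```python
import math
import statistics

x = [114, 126, 123, 112, 68, 116, 50, 108, 163, 79,
     67, 98, 131, 83, 56, 109, 81, 61, 90, 92]

n = len(x)
x_bar = statistics.fmean(x)                  # sample mean = point forecast
s = statistics.stdev(x)                      # sample standard deviation (n - 1 divisor)
se_mean = s / math.sqrt(n)                   # standard error of the mean
se_fcst = math.sqrt(s**2 + se_mean**2)       # standard error of the forecast

print(f"mean = {x_bar:.2f}, S = {s:.2f}")
print(f"SE_mean = {se_mean:.2f}, SE_fcst = {se_fcst:.2f}")
```

The forecast standard error combines the intrinsic variation of the data (S) with the uncertainty in the estimated mean (SE_mean), which is why it is slightly larger than S.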
Confidence intervals for predictions
Exact 95% CI* = 96.35 ± 2.093 × 29.68 = [34.2, 158.5]
Exact 50% CI** = 96.35 ± 0.688 × 29.68 = [75.9, 116.8]
*$t_{.05,19} = 2.093$
**$t_{.5,19} = 0.688$
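The interval arithmetic is simple enough to script. The sketch below takes the rounded values quoted in the example (point forecast 96.35, forecast standard error 29.68) and the two critical t values for 19 d.f. as given inputs; the function name forecast_interval is hypothetical.

```python
def forecast_interval(point_forecast, se_fcst, t_crit):
    """Return (lower, upper) = point forecast +/- t_crit * se_fcst."""
    half_width = t_crit * se_fcst
    return point_forecast - half_width, point_forecast + half_width

lo95, hi95 = forecast_interval(96.35, 29.68, 2.093)   # 95% interval
lo50, hi50 = forecast_interval(96.35, 29.68, 0.688)   # 50% interval

print(f"95% interval: [{lo95:.1f}, {hi95:.1f}]")
print(f"50% interval: [{lo50:.1f}, {hi50:.1f}]")
print(f"width ratio 50% / 95%: {(hi50 - lo50) / (hi95 - lo95):.3f}")
```

With these inputs the 50% interval works out to about [75.9, 116.8], and its width is about one third of the width of the 95% interval.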
Statgraphics output: mean model
[Figure: Time Sequence Plot for X, constant mean = 96.35, showing actual values, forecast, and 95.0% limits]
Statgraphics output: mean model
[Figure: Time Sequence Plot for X, constant mean = 96.35, showing actual values, forecast, and 50.0% limits]
A 50% confidence interval is 1/3 the width of a 95% confidence interval.
What if there’s really a trend?
[Figure: Time Sequence Plot for X, linear trend = 114.611 − 1.7391 t, showing actual values, forecast, and 50.0% limits]
That’s a different modeling assumption, and it leads to very different forecasts and confidence intervals.
Actually, |t| = 1.61 for the slope coefficient, which is below the critical value of about 2.1 for 18 d.f., so the trend is not significant and this model would be rejected at the .05 level of significance.
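The trend line and the t statistic for its slope can be reproduced with ordinary least squares on the same 20 values. This is a sketch that assumes the time index runs t = 1, 2, …, 20; variable names are illustrative.

```python
import math

x = [114, 126, 123, 112, 68, 116, 50, 108, 163, 79,
     67, 98, 131, 83, 56, 109, 81, 61, 90, 92]
n = len(x)
t = list(range(1, n + 1))                    # assumed time index 1..20

t_bar = sum(t) / n
x_bar = sum(x) / n
sxx = sum((ti - t_bar) ** 2 for ti in t)
sxy = sum((ti - t_bar) * (xi - x_bar) for ti, xi in zip(t, x))

slope = sxy / sxx
intercept = x_bar - slope * t_bar

# Residual standard error uses n - 2 d.f. (two estimated coefficients)
residuals = [xi - (intercept + slope * ti) for ti, xi in zip(t, x)]
s_resid = math.sqrt(sum(r * r for r in residuals) / (n - 2))
se_slope = s_resid / math.sqrt(sxx)
t_stat = slope / se_slope

print(f"trend = {intercept:.3f} + {slope:.4f} t")
print(f"t statistic for slope = {t_stat:.2f}")
```

The fitted slope is negative, so the t statistic is negative as well; it is its absolute value that is compared with the critical value.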
Yes, it’s simple, but...
The mean model is the foundation for more sophisticated models we will encounter later (RW, regression, ARIMA)
It has the same generic features:
→ A coefficient to be estimated
→ A standard error for the coefficient that reflects parameter risk
→ A forecast standard error that reflects intrinsic risk & parameter risk
→ …and model risk too!
Today’s agenda
Course introduction
Forecasting tools & principles
How to obtain data & move it around
Statistical graphics
Forecasts and confidence intervals: the simplest case (mean model)
⇒ March madness
Market-based forecasting
Betting markets are often an efficient way to aggregate diverse opinions (and to share risks… or have fun)
Probabilistic forecasts derived from contract prices are often well-calibrated
Caveats: markets don’t always work, and may exhibit “herding” or distortions when bettors lack independent information or have highly correlated personal stakes in events
Some applications are controversial (e.g. “terrorism futures”)
March madness: forecasting basketball games via a betting market
These are the price quotes on Tradesports.com at 10am on Monday, March 19, 2007.
As of Monday…
Florida’s stock is rising…
Recap of today’s topics
Course introduction
Forecasting tools & principles
How to obtain data & move it around
Statistical graphics
• Forecasts and confidence intervals: the simplest case (mean model)
• March madness