Advanced Topics in Analysis of Economic and Financial Data Using R and SAS

ZONGWU CAI
Department of Mathematics & Statistics and Department of Economics, University of North Carolina, Charlotte, NC 28223, U.S.A.
Wang Yanan Institute for Studies in Economics, Xiamen University, China
E-mail address: [email protected]

December 27, 2007

© 2007, ALL RIGHTS RESERVED by ZONGWU CAI

This manuscript may be printed and reproduced for individual or instructional use, but may not be printed for commercial purposes.
Preface
This is an advanced-level course in econometrics and financial econometrics with some basic theory and heavy applications. Here, our focus is on the SKILLS of analyzing real data using advanced econometric techniques and statistical software such as SAS and R. This is in line with our WISE's spirit "STRONG THEORETICAL FOUNDATION and SKILL EXCELLENCE". In other words, this course covers some advanced topics in the analysis of economic and financial data, particularly nonlinear time series models and some models related to economic and financial applications. The topics covered range from classical approaches to modern modeling techniques, even up to the research frontiers. The difference between this course and others is that you will learn step by step how to build a model based on data (or, the so-called "let the data speak for themselves" approach) through real data examples using statistical software, and how to explore real data using what you have learned. Therefore, no single book serves as a textbook for this course, so materials from several books and articles will be provided. The necessary handouts, including computer codes such as SAS and R codes, will also be provided (you might be asked to print out the materials yourself).
Five or six projects, including heavy computer work, are assigned throughout the term. Group discussion is allowed for the projects, particularly for writing the computer codes, but the final report for each project must be written in your own words. Copying from each other will be regarded as cheating. If you use the R language, which is similar to SPLUS, you can download it from the public web site at http://www.r-project.org/ and install it on your own computer, or you can use the PCs in our labs. You are STRONGLY encouraged (but not required) to use the package R since it is a very convenient programming language for statistical analysis and Monte Carlo simulations as well as various applications in quantitative economics and finance. Of course, you are welcome to use any other package such as SAS, MATLAB, GAUSS, STATA, SPSS or EViews, but I may not be able to help you if you do so.
Some materials are based on the lecture notes given by Professor Robert H. Shumway, Department of Statistics, University of California at Davis, and my colleague, Professor Stanislav Radchenko, Department of Economics, University of North Carolina at Charlotte, and the book by Tsay (2005). Some datasets are provided by Professor Robert H. Shumway, Department of Statistics, University of California at Davis, and Professor Philip Hans Franses, Erasmus University Rotterdam, the Netherlands. I am very grateful to them for providing their lecture notes and datasets. Finally, I express my many thanks to our Master's student Ms. Huiqun Ma for writing all the SAS codes.
How to Install R?
The main package used is R, which is available free from the R Project for Statistical Computing. Ox with G@RCH is also needed to fit various GARCH models. Students may use other packages or programs if they prefer.
Install R
(1) go to the web site http://www.r-project.org/;
(2) click CRAN;
(3) choose a site for downloading, say http://cran.cnr.Berkeley.edu;
(4) click Windows (95 and later);
(5) click base;
(6) click R-2.6.1-win32.exe (version of November 26, 2007) to save this file first and then run it to install (note that the setup program is 29 megabytes and is updated every three months).
The above steps install the basic R on your computer. If you need to install other packages, do the following:
(7) After it is installed, there is an icon on the screen. Click the icon to get into R;
(8) Go to the top, find Packages and click it;
(9) Go down to Install package(s)... and click it;
(10) A new window appears. Choose a location from which to download packages, say USA(CA1), move the mouse there and click OK;
(11) A new window lists all packages. You can select any one of them and click OK, or you can select all of them and then click OK.
Install Ox
OxMetrics™ is a family of software packages providing an integrated solution for the econometric analysis of time series, forecasting, financial econometric modelling, and statistical analysis of cross-section and panel data. OxMetrics consists of a front-end program called OxMetrics, and individual application modules such as PcGive, STAMP, etc.
OxMetrics Enterprise™ is a single product that includes all the important components: OxMetrics desktop, G@RCH, Ox Professional, PcGive and STAMP.
To install Ox Console, please download the file http://www.math.uncc.edu/~zcai/installation-Ox-G@RCH.doc and follow the steps.
Data Analysis and Graphics Using R – An Introduction (109 pages)
I encourage you to download the file r-notes.pdf from http://www.math.uncc.edu/~zcai/r-notes.pdf and learn it by yourself. Please see me if you have any questions.
Contents
1 Review of Multiple Regression Models
  1.1 Least Squared Estimation
  1.2 Model Diagnostics
    1.2.1 Box-Cox Transformation
    1.2.2 Reading Materials
  1.3 Computer Codes
  1.4 References

2 Classical and Modern Model Selection Methods
  2.1 Introduction
  2.2 Subset Approaches
  2.3 Sequential Methods
  2.4 Likelihood Based-Criteria
  2.5 Cross-Validation and Generalized Cross-Validation
  2.6 Penalized Methods
  2.7 Implementation in R
    2.7.1 Classical Models
    2.7.2 LASSO Type Methods
    2.7.3 Example
  2.8 Computer Codes
  2.9 References

3 Regression Models With Correlated Errors
  3.1 Methodology
  3.2 Nonparametric Models with Correlated Errors
  3.3 Predictive Regression Models
  3.4 Computer Codes
  3.5 References

4 Estimation of Covariance Matrix
  4.1 Methodology
  4.2 Details (see the paper by Zeileis)
  4.3 Computer Codes
  4.4 References

5 Seasonal Time Series Models
  5.1 Characteristics of Seasonality
  5.2 Modeling
  5.3 Nonlinear Seasonal Time Series Models
  5.4 Computer Codes
  5.5 References

6 Robust and Quantile Regressions
  6.1 Robust Regression
  6.2 Quantile Regression
  6.3 Computer Codes
  6.4 References

7 How to Analyze Boston House Price Data?
  7.1 Description of Data
  7.2 Analysis Methods
    7.2.1 Linear Models
    7.2.2 Nonparametric Models
  7.3 Computer Codes
  7.4 References

8 Value at Risk
  8.1 Methodology
  8.2 R Commands (R Menu)
    8.2.1 Generalized Pareto Distribution
    8.2.2 Background of the Generalized Pareto Distribution
  8.3 Reading Materials I and II (see Handouts)
  8.4 New Developments (Nonparametric Approaches)
    8.4.1 Introduction
    8.4.2 Framework
    8.4.3 Nonparametric Estimating Procedures
    8.4.4 Real Examples
  8.5 References

9 Long Memory Models and Structural Changes
  9.1 Long Memory Models
    9.1.1 Methodology
    9.1.2 Spectral Density
    9.1.3 Applications
  9.2 Related Problems and New Developments
    9.2.1 Long Memory versus Structural Breaks
    9.2.2 Testing for Breaks (Instability)
    9.2.3 Long Memory versus Trends
  9.3 Computer Codes
  9.4 References
List of Tables
2.1 AICC values for ten models for the recruits series
9.1 Critical Values of the QLR statistic with 15% Trimming
List of Figures
1.1 Scatterplot with regression line and lowess smoothed curve.
1.2 Scatterplot with regression line and both lowess and loess smoothed curves as well as the Theil-Sen estimated line.
1.3 Residual plots.
1.4 (a) Leverage plot. (b) Influential plot. (c) Plots without observation #34.
2.1 Monthly SOI (left) and simulated recruitment (right) from a model (n = 453 months, 1950-1987).
2.2 The SOI series (black solid line) compared with a 12 point moving average (red thicker solid line). The left panel: original data and the right panel: filtered series.
2.3 Multiple lagged scatterplots showing the relationship between SOI at the present time ($x_t$) versus the lagged values ($x_{t+h}$) at lags $1 \le h \le 16$.
2.4 Autocorrelation functions of SOI and recruitment and cross-correlation function between SOI and recruitment.
2.5 Multiple lagged scatterplots showing the relationship between the SOI at time $t+h$, say $x_{t+h}$ ($x$-axis), versus recruits at time $t$, say $y_t$ ($y$-axis), $0 \le h \le 15$.
2.6 Multiple lagged scatterplots showing the relationship between the SOI at time $t$, say $x_t$ ($x$-axis), versus recruits at time $t+h$, say $y_{t+h}$ ($y$-axis), $0 \le h \le 15$.
2.7 Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.
2.8 ACF of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).
3.1 Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log transformed earnings (right panel).
3.2 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA$(0,0,0) \times (1,0,0)_4$ residuals.
3.3 Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line (red) the Treasury 3-year constant maturity rate.
3.4 Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is 3-year rate versus 1-year rate, and the right panel is changes in 3-year rate versus changes in 1-year rate.
3.5 Residual series of linear regression Model I for two U.S. weekly interest rates: the left panel is the time plot and the right panel is the ACF.
3.6 Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate by the red dashed line.
3.7 Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).
5.1 US Retail Sales Data from 1967-2000.
5.2 Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01-1994.13.
5.3 Number of live births 1948(1)-1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA$(0,1,1) \times (0,1,1)_{12}$ model.
5.4 Autocorrelation functions and partial autocorrelation functions for the birth series (top two panels), the first difference (second two panels), an ARIMA$(0,1,0) \times (0,1,1)_{12}$ model (third two panels) and an ARIMA$(0,1,1) \times (0,1,1)_{12}$ model (last two panels).
5.5 Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), ARIMA$(0,1,0) \times (1,0,0)_4$ model (third two panels), and ARIMA$(0,1,1) \times (1,0,0)_4$ model (last two panels).
5.6 ACF and PACF for ARIMA$(0,1,1) \times (0,1,1)_4$ model (top two panels) and the residual plots of ARIMA$(0,1,1) \times (1,0,0)_4$ (left bottom panel) and ARIMA$(0,1,1) \times (0,1,1)_4$ model (right bottom panel).
5.7 Monthly simple return of CRSP Decile 1 index from January 1960 to December 2003: time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for the January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return (right bottom panel).
6.1 Different loss functions for quadratic, Huber's $\rho_c(\cdot)$, $\rho_{c,0}(\cdot)$, $\rho_{0.05}(\cdot)$, LAD and $\rho_{0.95}(\cdot)$, where $c = 1.345$.
7.1 The results from model (7.1).
7.2 (a) Residual plot for model (7.1). (b) Plot of $\widehat{g}_1(x_6)$ versus $x_6$. (c) Residual plot for model (7.2). (d) Density estimate of $Y$.
7.3 Boston Housing Price Data: displayed in (a)-(d) are the scatter plots of the house price versus the covariates $X_{13}$, $X_6$, $X_1$ and $X_1^*$, respectively.
7.4 Boston Housing Price Data: the plots of the estimated coefficient functions for three quantiles $\tau = 0.05$ (solid line) for model (7.4), $\tau = 0.50$ (dashed line), and $\tau = 0.95$ (dotted line), and the mean regression (dot-dashed line): $a_{0,\tau}(u)$ and $a_0(u)$ versus $u$ in (a), $a_{1,\tau}(u)$ and $a_1(u)$ versus $u$ in (b), and $a_{2,\tau}(u)$ and $a_2(u)$ versus $u$ in (c). The thick dashed lines indicate the 95% pointwise confidence interval for the median estimate with the bias ignored.
8.1 (a) 5% CVaR estimate for DJI index. (b) 5% CES estimate for DJI index.
8.2 (a) 5% CVaR estimates for IBM stock returns. (b) 5% CES estimates for IBM stock returns. (c) 5% CVaR estimates for three different values of lagged negative IBM returns (−0.275, −0.025, 0.325). (d) 5% CVaR estimates for three different values of lagged negative DJI returns (−0.225, 0.025, 0.425). (e) 5% CES estimates for three different values of lagged negative IBM returns (−0.275, −0.025, 0.325). (f) 5% CES estimates for three different values of lagged negative DJI returns (−0.225, 0.025, 0.425).
9.1 Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. Sample partial autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left middle panel) and equal-weighted (right middle panel) indexes. The log smoothed spectral density estimation of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.
9.2 Break testing results for the Nile River data: (a) Plot of $F$-statistics. (b) The scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
9.3 Break testing results for the oil price data: (a) Plot of $F$-statistics. (b) Scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
9.4 Break testing results for the consumer price index data: (a) Plot of $F$-statistics. (b) Scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
Chapter 1
Review of Multiple Regression Models
1.1 Least Squared Estimation
We begin our discussion of univariate and multivariate regression (time series) models by considering the idea of a simple regression model, which we have met before in other contexts such as statistics or econometrics courses. All of the multivariate methods follow, in some sense, from the ideas involved in simple univariate linear regression. In this case, we assume that there is some collection of fixed known functions of time, say $z_{t1}, z_{t2}, \ldots, z_{tq}$, that are influencing our output $y_t$, which we know to be random. We express this relation between the inputs (predictors or independent variables or covariates or exogenous variables) and the output (dependent or response variable) as
$$y_t = \beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq} + e_t \qquad (1.1)$$
at the time points $t = 1, 2, \ldots, n$, where $\beta_1, \ldots, \beta_q$ are unknown fixed regression coefficients and $e_t$ is a random error or noise, assumed to be white noise; this means that the observations have zero mean, equal variance $\sigma^2$, and are uncorrelated (independent). We traditionally assume also that the white noise series $e_t$ is Gaussian or normally distributed. Finally, of course, the basic assumption is that $E(y_t \mid z_{t1}, \ldots, z_{tq})$ is the linear function $\beta_1 z_{t1} + \beta_2 z_{t2} + \cdots + \beta_q z_{tq}$.
Question: If at least one of those (four) assumptions is violated, what should we do?
The linear regression model described by (1.1) can be conveniently written in slightly more general matrix notation by defining the column vectors $z_t = (z_{t1}, \ldots, z_{tq})^T$ and $\beta = (\beta_1, \ldots, \beta_q)^T$, so that we write (1.1) in the alternate form
$$y_t = \beta^T z_t + e_t. \qquad (1.2)$$
To find estimators for $\beta$ and $\sigma^2$, it is natural to determine the coefficient vector $\beta$ minimizing the sum of squared errors (SSE) $\sum_{t=1}^n e_t^2$ with respect to $\beta$. Of course, if the distribution of $e_t$ is known, one should instead maximize the likelihood. This yields the least squares estimator (LSE) or maximum likelihood estimator (MLE) $\widehat{\beta}$, and the maximum likelihood estimator for $\sigma^2$, which is proportional to the unbiased estimator
$$\widehat{\sigma}^2 = \frac{1}{n-q} \sum_{t=1}^n \left( y_t - \widehat{\beta}^T z_t \right)^2. \qquad (1.3)$$
Note that the LSE is exactly the same as the MLE when the distribution of $e_t$ is normal (WHY?). An alternative way of writing the model (1.2) is
$$y = Z\beta + e, \qquad (1.4)$$
where $Z^T = (z_1, z_2, \ldots, z_n)$ is a $q \times n$ matrix composed of the values of the input variables at the observed time points, $y^T = (y_1, y_2, \ldots, y_n)$ is the vector of observed outputs, and the errors are stacked in the vector $e = (e_1, e_2, \ldots, e_n)^T$. The ordinary least squares estimator $\widehat{\beta}$ is the solution to the normal equations
$$Z^T Z \widehat{\beta} = Z^T y.$$
You need not be concerned with how the above equation is solved in practice, as all computer packages have efficient software for inverting the $q \times q$ matrix $Z^T Z$ to obtain
$$\widehat{\beta} = \left( Z^T Z \right)^{-1} Z^T y. \qquad (1.5)$$
An important quantity that all software produces is a measure of uncertainty for the estimated regression coefficients, say
$$\widehat{\mathrm{Cov}}(\widehat{\beta}) = \widehat{\sigma}^2 \left( Z^T Z \right)^{-1} \equiv \widehat{\sigma}^2\, C \equiv \widehat{\sigma}^2\, (c_{ij}). \qquad (1.6)$$
Then, $\widehat{\mathrm{Cov}}(\widehat{\beta}_i, \widehat{\beta}_j) = \widehat{\sigma}^2 c_{ij}$, and a $100(1-\alpha)\%$ confidence interval for $\beta_i$ is
$$\widehat{\beta}_i \pm t_{df}(\alpha/2)\, \widehat{\sigma} \sqrt{c_{ii}}, \qquad (1.7)$$
where $t_{df}(\alpha/2)$ denotes the upper $100(\alpha/2)\%$ point of a $t$ distribution with $df$ degrees of freedom. What is the $df$ for our case?

Question: If at least one of those (four) assumptions is violated, are equations (1.6) and (1.7) still true? If not, how do we fix or modify them?
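As a minimal sketch of (1.6) and (1.7) under the classical assumptions, one can extract the estimated covariance matrix and the t-based intervals from a fitted lm object in R (the data and variable names below are illustrative, not from the text):

# Minimal sketch: coefficient covariance matrix (1.6) and 95% confidence
# intervals (1.7) from a fitted linear model; y, z1, z2 are illustrative.
fit <- lm(y ~ z1 + z2)
V   <- vcov(fit)                 # sigma^2-hat * (Z^T Z)^{-1}, as in (1.6)
se  <- sqrt(diag(V))             # sigma-hat * sqrt(c_ii)
df  <- fit$df.residual           # df = n - q
cbind(coef(fit) - qt(0.975, df) * se,
      coef(fit) + qt(0.975, df) * se)   # agrees with confint(fit)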
It seems that it is VERY IMPORTANT to make sure that all assumptions are satisfied.
The question is how to do this task. Well, that is what is called model checking or model diagnostics, discussed in the next section.
1.2 Model Diagnostics
1.2.1 Box-Cox Transformation
If the distribution of the error is not normal, a naive way to deal with this problem is to use
a transformation to the dependent variable. A simple and easy way to use a transformation
is the Box-Cox transformation in a regression model. When the errors are heterogeneous
and often non-Gaussian, a Box-Cox power transformation on the dependent variable is a
useful method to alleviate heteroscedasticity when the distribution of the dependent variable
is not known. For situations in which the dependent variable Y is known to be positive, the
following transformation can be used:
$$Y^{(\lambda)} = \begin{cases} \dfrac{Y^{\lambda} - 1}{\lambda}, & \text{if } \lambda \neq 0, \\ \log(Y), & \text{if } \lambda = 0. \end{cases}$$
Given the data observations $\{Y_i\}_{i=1}^n$, one way to select the power $\lambda$ is to use the $\lambda$ that maximizes the logarithm of the likelihood function
$$f(\lambda) = -\frac{1}{2}\,\log\left[ S_n(Y^{(\lambda)}) \right] - (\lambda - 1) \sum_{i=1}^n \log(Y_i),$$
where $S_n(Y^{(\lambda)})$ is the sample variance of the transformed data $Y_i^{(\lambda)}$. Generally, we can take $\lambda$ to be one of $-2 + j\Delta$ ($0 \le j \le 4/\Delta$) and choose the value that maximizes the logarithm of the likelihood function $f(\lambda)$.
Note that the Box-Cox transformation approach can also be applied to handle the case where a covariate enters nonlinearly. For this case, you need to apply a transformation to each individual covariate. When you apply the Box-Cox transformation to covariates, you need to minimize the SSE from the regression over $\lambda$ instead of maximizing the likelihood. There is no built-in function in standard statistical packages for this covariate version, so when implementing it you have to write your own code. Another way to choose $\lambda$ is to use the Q-Q plot. The command in R for the Q-Q plot is qqnorm() for the normal Q-Q plot or qqplot() for the Q-Q plot of one sample versus another.
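As a minimal sketch of the two approaches above: the MASS package provides boxcox() for profiling the likelihood over λ when transforming the response, while the covariate version can be coded by hand as a grid search over λ = −2 + jΔ minimizing the SSE (the grid, data and variable names here are illustrative; x must be positive):

library(MASS)
fit <- lm(y ~ x)                             # y, x are illustrative
boxcox(fit, lambda = seq(-2, 2, by = 0.1))   # likelihood profile for the response Y

# Hand-coded grid search for transforming a covariate: choose the lambda
# that minimizes the SSE of the resulting regression.
bc   <- function(v, lam) if (lam != 0) (v^lam - 1)/lam else log(v)
grid <- seq(-2, 2, by = 0.1)                 # lambda = -2 + j*Delta, Delta = 0.1
sse  <- sapply(grid, function(lam) sum(resid(lm(y ~ bc(x, lam)))^2))
grid[which.min(sse)]                         # selected lambda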
1.2.2 Reading Materials
See materials from file simple-reg.htm. Also, file r-reference.htm contains references
about how to use R.
1.3 Computer Codes
To fit a multiple regression in R, one can use lm() or glm(); see the following for details
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart,
offset, control = glm.control(...), model = TRUE,
method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...)
To fit a regression model without an intercept, you need to use
fit1=lm(y~-1+x1+...x9)
where fit1 is the fitted model object containing all the outputs you need. If you want to do model diagnostic checking, you need to use
plot(fit1)
For multivariate data, it is usually a good idea to view the data as a whole using the pairwise
scatter plots generated by the pairs() function:
pairs(data)
The smoothing parameter in lowess can be adjusted using the f argument. The default is f = 2/3, meaning that 2/3 of the data points are used in calculating the fitted value at each point. If we set f = 1, we get essentially the ordinary regression line. Specifying f = 1/3 should yield a choppier curve.
lowess(x, y = NULL, f = 2/3, iter = 3, delta = 0.01 * diff(range(xy$x[o])))
There’s a second loess function in R, loess, that has more options and generates more
output.
loess(formula, data, weights, subset, na.action, model = FALSE,
span = 0.75, enp.target, degree = 2,
parametric = FALSE, drop.square = FALSE, normalize = TRUE,
family = c("gaussian", "symmetric"),
method = c("loess", "model.frame"),
control = loess.control(...), ...)
Example 1.1: We examine fitting a simple linear regression model to data using R. The data are from Harder and Thompson (1989). As part of a study to investigate reproductive strategies in plants, these biologists recorded the time spent at sources of pollen and the proportions of pollen removed by bumblebee queens and honeybee workers pollinating a species of Erythronium lily. The data set consists of three variables: (1) removed: proportion of pollen removed; (2) duration: duration of visit in seconds; and (3) code: 1 = bumblebee queens, 2 = honeybee workers. The response variable of interest is removed; the predictor is duration. Because removed is a proportion, it offers special challenges for analysis. We will ignore these issues for the moment. We will also look only at the queen bumblebees. I will leave consideration of the full collection of bees as a homework exercise.
The following R code is for Example 1.1 and can be found in the file ex1-1.r.
# 10-12-2006
graphics.off()
beepollen<-read.table(’c:/res-teach/xiamen12-06/data/ex1-1.txt’,header=T)
#beepollen
queens<-beepollen[beepollen[,3]==1,]
attach(queens)
#print(names(queens))
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-1.1.eps",
horizontal=F,width=6,height=6)
#win.graph()
par(bg="light yellow")
plot(duration,removed)
bee.reg<-lm(removed~duration)
bee.reg
print(anova(bee.reg))
print(summary(bee.reg))
print(names(bee.reg))
abline(bee.reg,lty=1,lwd=3)
lowess(duration,removed)
lines(lowess(duration,removed),lty=2,col=2,lwd=2)
legend(35,.25,c(’regression’,’loess’),lty=c(1,2),
col=c(1,2),lwd=c(2,2),bty="n")
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-1.2.eps",
horizontal=F,width=6,height=6)
#win.graph()
par(bg="light grey")
plot(duration,removed)
abline(bee.reg,lty=1,lwd=3)
lines(lowess(duration,removed,f=1/3),lty=2,col=6,lwd=2)
loess.bee<-loess(removed~duration,family="symmetric")
print(names(loess.bee))
lines(sort(loess.bee$x),loess.bee$fitted[order(loess.bee$x)],
col=4,lty=2,lwd=2)
tsp1reg<-function(x,y)
{
#
# Compute the Theil-Sen regression estimator.
# Only a single predictor is allowed in this version
#
temp<-matrix(c(x,y),ncol=2)
temp<-elimna(temp) # Remove any pairs with missing values
x<-temp[,1]
y<-temp[,2]
ord<-order(x)
xs<-x[ord]
ys<-y[ord]
vec1<-outer(ys,ys,"-")
vec2<-outer(xs,xs,"-")
v1<-vec1[vec2>0]
v2<-vec2[vec2>0]
slope<-median(v1/v2)
coef<-0
coef[1]<-median(y)-slope*median(x)
coef[2]<-slope
res<-y-slope*x-coef[1]
list(coefficients=coef,residuals=res)
}
elimna<-function(m)
{
#
# remove any rows of data having missing values
#
ikeep<-c(1:nrow(m))
for(i in 1:nrow(m))if(sum(is.na(m[i,]))>=1)ikeep[i]<-0
elimna<-m[ikeep[ikeep>=1],]
elimna
}
ts.reg<-tsp1reg(duration,removed)
abline(reg=ts.reg,col=2,lty=1,lwd=4)
dev.off()
########################################################
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-1.3.eps",
horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light pink")
plot(bee.reg)
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-1.4.eps",
horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light blue")
h<-hatvalues(bee.reg)
plot(h,type=’h’)
abline(h=c(2,3)*mean(h),col=4,lty=2)
plot(c(0,.6),c(-3,2),xlab=’leverage’,
ylab=’studentized residual’,type=’n’)
cook<-sqrt(cooks.distance(bee.reg))
points(hatvalues(bee.reg),rstudent(bee.reg),
cex=10*cook/max(cook))
abline(h=c(-2,0,2),lty=2,col=2)
abline(v=c(2,3)*mean(hatvalues(bee.reg)),lty=2,col=4)
#identify(hatvalues(bee.reg),rstudent(bee.reg))
# take long time to do this
bee.reg1<-update(bee.reg,subset=-34)
bee.reg2<-update(bee.reg,subset=-c(14,34))
plot(duration,removed)
abline(bee.reg,col=1,lty=1,lwd=3)
abline(bee.reg1,col=2,lty=2,lwd=3)
abline(bee.reg2,col=3,lty=2,lwd=3)
abline(reg=ts.reg,col=4,lty=1,lwd=3)
legend(25,.3,c(’all observations’,’without #34’,’without #14, #34’,
’Theil-Sen’),lty=c(1,2,2,1),col=1:4,lwd=rep(2,4),bty=’n’)
dev.off()
print(c(’I am done’))
Figure 1.1: Scatterplot with regression line and lowess smoothed curve.
Figure 1.2: Scatterplot with regression line and both lowess and loess smoothed curves aswell as the Theil-Sen estimated line.
Figure 1.3: Residual plots (Residuals vs Fitted, Normal Q-Q, Scale-Location, and Residuals vs Leverage with Cook's distance contours).
Figure 1.4: (a) Leverage plot. (b) Influential plot. (c) Plots without observation #34.
1.4 References
Harder, L.D. and J.D. Thompson (1989). Evolutionary options for maximizing pollen dispersal of animal-pollinated plants. American Naturalist, 133, 323-344.
Chapter 2
Classical and Modern Model SelectionMethods
2.1 Introduction
Given a possibly large set of potential predictors, which ones do we include in our model?
Suppose $X_1, X_2, \ldots$ is a pool of potential predictors. The model with all predictors,
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \varepsilon,$$
is the most general model; it holds even if some of the individual $\beta_j$'s are zero. But if some $\beta_j$'s are zero or close to zero, it is better to omit those $X_j$'s from the model. There are two main reasons to omit variables whose coefficients are close to zero:
(a) Parsimony principle:
Given two models that perform equally well in terms of prediction, one should choose
the model that is more parsimonious (simple).
(b) Prediction principle:
The model should give predictions that are as accurate as possible, not just for the current observations, but for future observations as well. Including unnecessary predictors can apparently improve prediction for the current data, but can harm prediction for future data. Note that the sum of squared errors (SSE) never increases as we add more predictors.
Next, we discuss the main methods available in the literature.
2.2 Subset Approaches
The all-possible-regressions procedure calls for considering all possible subsets of the pool of potential predictors and identifying, for detailed examination, a few good subsets according to some criterion. The purpose of the all-possible-regressions approach is to identify a small group of regression models that are good according to a specified criterion (summary statistic), so that a detailed examination can be made of these models, leading to the selection of the final regression model to be employed. The main problem with this approach is that it is computationally expensive. For example, with 10 predictors, we need to investigate $2^{10} = 1024$ potential regression models. With the aid of modern computing power, this computation is possible, but examining 1024 possible models carefully would still be an overwhelming task for a data analyst.
Different criteria for comparing the regression models may be used with the all-possible-
regressions selection procedure. We discuss several summary statistics:
(i) $R_p^2$ (or SSE$_p$)
(ii) $R_{adj;p}^2$ (or MSE$_p$)
(iii) $C_p$
(iv) PRESS$_p$
(v) Sequential methods
(vi) AIC-type criteria
We shall denote the number of all potential predictors in the pool by K−1. Hence including
an intercept parameter β0, we have K potential parameters. The number of predictors in a
subset will be denoted by p− 1, as always, so that there are p parameters in the regression
function for this subset of predictors. Thus we have 1 ≤ p ≤ K. Now, we discuss each one
in detail.
1. $R_p^2$ (or SSE$_p$)

The subscript $p$ in $R_p^2$ indicates that there are $p$ parameters (or $p-1$ predictors) in the regression model. The coefficient of multiple determination $R_p^2$ is defined as
$$R_p^2 = 1 - \frac{\mathrm{SSE}_p}{\mathrm{SSTO}}.$$
• It measures the proportion of the variance of $Y$ explained by the $p-1$ predictors.
• $R_p^2$ always goes up as we add more predictors.
• $R_p^2$ varies inversely with SSE$_p$ because SSTO is constant for all possible regression models. That is, choosing the model with the largest $R_p^2$ is equivalent to choosing the model with the smallest SSE$_p$.
• $R_p^2$ might not be a good criterion. WHY?
2. $R_{adj;p}^2$ (or MSE$_p$)

One often considers models with a large $R_p^2$ value. However, $R_p^2$ always increases with the number of predictors, hence it cannot be used to compare models of different sizes. The adjusted coefficient of multiple determination $R_{adj;p}^2$ has been suggested as an alternative criterion:
$$R_{adj;p}^2 = 1 - \frac{\mathrm{SSE}_p/(n-p)}{\mathrm{SSTO}/(n-1)} = 1 - \left( \frac{n-1}{n-p} \right) \frac{\mathrm{SSE}_p}{\mathrm{SSTO}} = 1 - \frac{\mathrm{MSE}_p}{\mathrm{SSTO}/(n-1)}.$$

• It is like $R_p^2$ but with a penalty for adding unnecessary variables. $R_{adj;p}^2$ can go down when a useless predictor is added, and it can even be negative.
• $R_{adj;p}^2$ varies inversely with MSE$_p$ because SSTO/$(n-1)$ is constant for all possible regression models. That is, choosing the model with the largest $R_{adj;p}^2$ is equivalent to choosing the model with the smallest MSE$_p$.
• $R_p^2$ is useful when comparing models of the same size, while $R_{adj;p}^2$ (or $C_p$) is used to compare models of different sizes.
• $R_{adj;p}^2$ is better than $R_p^2$.
3. Mallows $C_p$

The Mallows $C_p$ is concerned with the total mean squared error of the $n$ fitted values for each subset regression model. The mean squared error concept involves the total error in each fitted value:
$$\widehat{Y}_i - \mu_i = \underbrace{\widehat{Y}_i - E(\widehat{Y}_i)}_{\text{random error}} + \underbrace{E(\widehat{Y}_i) - \mu_i}_{\text{bias}},$$
where $\mu_i$ is the true mean response at the $i$th observation. The mean squared error for $\widehat{Y}_i$ is defined as the expected value of the square of the total error above. It can be shown that
$$\mathrm{MSE}(\widehat{Y}_i) = E\left[ (\widehat{Y}_i - \mu_i)^2 \right] = \mathrm{Var}(\widehat{Y}_i) + \left[ \mathrm{Bias}(\widehat{Y}_i) \right]^2,$$
where $\mathrm{Bias}(\widehat{Y}_i) = E(\widehat{Y}_i) - \mu_i$. The total mean squared error for all $n$ fitted values $\widehat{Y}_i$ is the sum over the observations $i$:
$$\sum_{i=1}^n \mathrm{MSE}(\widehat{Y}_i) = \sum_{i=1}^n \mathrm{Var}(\widehat{Y}_i) + \sum_{i=1}^n \left[ \mathrm{Bias}(\widehat{Y}_i) \right]^2.$$
It can be shown that
$$\sum_{i=1}^n \mathrm{Var}(\widehat{Y}_i) = p\,\sigma^2 \quad \text{and} \quad \sum_{i=1}^n \left[ \mathrm{Bias}(\widehat{Y}_i) \right]^2 = (n-p)\left[ E(S_p^2) - \sigma^2 \right],$$
where $S_p^2$ is the MSE from the current model. Using this, we have
$$\sum_{i=1}^n \mathrm{MSE}(\widehat{Y}_i) = p\,\sigma^2 + (n-p)\left[ E(S_p^2) - \sigma^2 \right]. \qquad (2.1)$$
Dividing (2.1) by $\sigma^2$ makes it scale-free:
$$\frac{\sum_{i=1}^n \mathrm{MSE}(\widehat{Y}_i)}{\sigma^2} = p + (n-p)\,\frac{E(S_p^2) - \sigma^2}{\sigma^2}.$$
If the model does not fit well, then $S_p^2$ is a biased estimate of $\sigma^2$. We can estimate $E(S_p^2)$ by MSE$_p$ and estimate $\sigma^2$ by the MSE from the maximal model (the largest model we can consider), i.e., $\widehat{\sigma}^2 = \mathrm{MSE}_{K-1} = \mathrm{MSE}(X_1, \ldots, X_{K-1})$. Using these estimators for $E(S_p^2)$ and $\sigma^2$ gives
$$C_p = p + (n-p)\,\frac{\mathrm{MSE}_p - \mathrm{MSE}(X_1, \ldots, X_{K-1})}{\mathrm{MSE}(X_1, \ldots, X_{K-1})} = \frac{\mathrm{SSE}_p}{\mathrm{MSE}(X_1, \ldots, X_{K-1})} - (n - 2p).$$

• A small $C_p$ is a good thing. A small value of $C_p$ indicates that the model is relatively precise (has small variance) in estimating the true regression coefficients and predicting future responses. This precision will not improve much by adding more predictors, so look for models with small $C_p$.
• If we have enough predictors in the regression model so that all the significant predictors are included, then $\mathrm{MSE}_p \approx \mathrm{MSE}(X_1, \ldots, X_{K-1})$ and it follows that $C_p \approx p$.
• Thus $C_p$ close to $p$ is evidence that the predictors in the pool of potential predictors $(X_1, \ldots, X_{K-1})$ but not in the current model are not important.
• Models with considerable lack of fit have values of $C_p$ larger than $p$.
• $C_p$ can be used to compare models of different sizes.
• If we use all the potential predictors, then $C_p = K$.
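As a minimal sketch of the last formula above, $C_p$ is easy to compute in R from the candidate subset and the maximal model (the data and variable names are illustrative):

full <- lm(y ~ x1 + x2 + x3 + x4)      # maximal model: K - 1 = 4 predictors
sub  <- lm(y ~ x1 + x2)                # candidate subset: p = 3 parameters
n    <- length(resid(sub))
p    <- length(coef(sub))
mse.full <- sum(resid(full)^2) / full$df.residual
Cp   <- sum(resid(sub)^2) / mse.full - (n - 2 * p)
Cp                                     # compare with p: Cp >> p suggests lack of fit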
4. PRESS$_p$

The PRESS (prediction sum of squares) criterion is defined as
$$\mathrm{PRESS} = \sum_{i=1}^n \widehat{\varepsilon}_{(i)}^2,$$
where $\widehat{\varepsilon}_{(i)}$ is the PRESS (prediction) residual for the $i$th observation, similar to the jackknife residual. The PRESS residual is defined as $\widehat{\varepsilon}_{(i)} = Y_i - \widehat{Y}_{(i)}$, where $\widehat{Y}_{(i)}$ is the fitted value obtained by leaving out the $i$th observation. Models with small PRESS$_p$ fit well in the sense of having small prediction errors. PRESS$_p$ can be calculated without fitting the model $n$ times, each time deleting one of the $n$ cases: one can show that
$$\widehat{\varepsilon}_{(i)} = \frac{\widehat{\varepsilon}_i}{1 - h_{ii}},$$
where $h_{ii}$ is the $i$th diagonal element of $H = X(X^T X)^{-1} X^T$ (the hat matrix).
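The last identity makes PRESS a one-line computation in R; a minimal sketch (the fitted model is illustrative):

fit   <- lm(y ~ x1 + x2)                # any fitted lm object (illustrative)
h     <- hatvalues(fit)                 # diagonal elements h_ii of the hat matrix
press <- sum((resid(fit) / (1 - h))^2)  # PRESS_p without n refits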
2.3 Sequential Methods
1. Forward selection
(i) Start with the null model.
(ii) Add the most significant variable if its p-value is less than p_enter (equivalently, its F is larger than F_enter).
(iii) Continue until no more variables enter the model.
2. Backward elimination
(i) Start with the full model.
(ii) Eliminate the least significant variable whose p-value is larger than p_remove (equivalently, whose F is smaller than F_remove).
(iii) Continue until no more variables can be discarded from the model.
3. Stepwise selection
(i) Start with any model.
(ii) Check each predictor that is currently in the model. Suppose the current model contains $X_1, \ldots, X_k$. Then the $F$ statistic for $X_i$ is
$$F = \frac{\mathrm{SSE}(X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_k) - \mathrm{SSE}(X_1, \ldots, X_k)}{\mathrm{MSE}(X_1, \ldots, X_k)} \sim F(1;\, n-k-1).$$
Eliminate the least significant variable whose p-value is larger than p_remove (equivalently, whose $F$ is smaller than F_remove).
(iii) Continue until no more variables can be discarded from the model.
(iv) Add the most significant variable if its p-value is less than p_enter (equivalently, its F is larger than F_enter).
(v) Go to step (ii)
(vi) Repeat until no more predictors can be entered and no more can be discarded.
2.4 Likelihood Based-Criteria
The following is based on Akaike's approach (Akaike, 1973) and subsequent papers; see also the recent book by Burnham and Anderson (2003).

Suppose that $f(y)$ is the true model (unknown) giving rise to the data ($y$ is a vector of data) and $g(y, \theta)$ is a candidate model with parameter vector $\theta$. We want to find a model $g(y, \theta)$ "close to" $f(y)$. The Kullback-Leibler discrepancy (K-L distance) is
$$K(f, g) = E_f\left[ \log\left( \frac{f(Y)}{g(Y, \theta)} \right) \right].$$
This is a measure of how “far” model g is from model f (with reference to model f).
Properties:
$K(f, g) \ge 0$, and $K(f, g) = 0 \iff f(\cdot) = g(\cdot)$.
Of course, we can never know how far our model g is from f . But Akaike (1973) showed
that we might be able to estimate something almost as good.
Suppose we have two models under consideration: g(y, θ) and h(y, φ). Akaike (1973)
showed that we can estimate K(f, g)−K(f, h). It turns out that the difference of maximized
log-likelihoods, corrected for a bias, estimates the difference of K-L distances. The maximized
likelihoods are $L_g(y, \widehat{\theta})$ and $L_h(y, \widehat{\phi})$, where $\widehat{\theta}$ and $\widehat{\phi}$ are the ML estimates of the parameters. Akaike's result: $[\log(L_g) - q] - [\log(L_h) - r]$ is an asymptotically unbiased estimate (i.e., the bias approaches zero as the sample size increases) of $K(f, g) - K(f, h)$. Here $q$ is the number of parameters estimated in $\theta$ (model $g$) and $r$ is the number of parameters estimated in $\phi$ (model $h$). The price of parameters: the likelihoods in the above expression are penalized by the number of parameters.
The Akaike Information Criterion (AIC) for model g:
AIC = −2 log(Lg) + 2 q. (2.2)
A bias-corrected version of AIC was proposed by Hurvich and Tsai (1989), called AICC, defined by
$$\mathrm{AICC} = \mathrm{AIC} + \frac{2\,q(q+1)}{n-q-1} = -2\log(L_g) + \frac{2\,q\,n}{n-q-1}. \qquad (2.3)$$
The difference between AIC and AICC is the penalty term. Intuitively, one can think of
2qn/(n − q − 1) in (2.3) as a penalty term to discourage over-parameterization. Shibata
(1976) suggested that the AIC has a tendency to overestimate parameter q. By comparing
the penalty terms in (2.2) and (2.3), we can see that the factors, 2qn/(n− q− 1) and 2q, for
the AICC and AIC statistics are asymptotically equivalent as n → ∞. The AICC statistic
however has more extreme penalty for larger-order models which counteracts the over-fitting
tendency of the AIC.
Another approach is given by the much older notion of Bayesian statistics. In the Bayesian
approach, we assume that a priori uncertainty about the value of model parameters is rep-
resented by a prior distribution. Upon observing the data, this prior is updated, yielding a
posterior distribution. In order to make inferences about the model (rather than its param-
eters), we integrate across the posterior distribution. Under the assumption that all models
are a priori equally likely (because the Bayesian approach requires model priors as well as
parameter priors), Bayesian model selection chooses the model with highest marginal likeli-
hood. The ratio of two marginal likelihoods is called a Bayes factor (BF), which is a widely
used method of model selection in Bayesian inference. The two integrals in the Bayes factor
are nontrivial to compute unless they form a conjugate family. Monte Carlo methods are
usually required to compute BF, especially for highly parameterized models. A large sample
approximation of BF yields the easily-computable Bayesian information criterion (BIC)
BIC = −2 log(Lg) + q log n. (2.4)
In sum, both AIC and BIC, as well as their generalizations, have the similar form
$$\mathrm{LC} = -2 \log(L_g) + \lambda\, q, \qquad (2.5)$$
where $\lambda$ is a fixed constant. From (2.2), (2.3), and (2.4), we can see that the BIC statistic imposes a much heavier penalty than AIC and AICC when $n$ is large, to overcome over-fitting. Also, from (2.5), it is easy to see that LC includes AIC, AICC and BIC as special cases.

Recent developments suggest the use of a data-adaptive penalty to replace the fixed penalty $\lambda$; see Bai, Rao and Wu (1999) and Shen and Ye (2002). That is, $\lambda$ is estimated from the data in a complexity form based on the concept of generalized degrees of freedom.
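A minimal R sketch of (2.2)-(2.4) for a fitted model, where q counts all estimated parameters including σ² (the fitted model is illustrative):

fit  <- lm(y ~ x1 + x2)                            # illustrative model
n    <- length(resid(fit))
q    <- attr(logLik(fit), "df")                    # parameters incl. sigma^2
aic  <- -2 * as.numeric(logLik(fit)) + 2 * q       # same as AIC(fit), cf. (2.2)
aicc <- aic + 2 * q * (q + 1) / (n - q - 1)        # Hurvich-Tsai correction (2.3)
bic  <- -2 * as.numeric(logLik(fit)) + q * log(n)  # same as AIC(fit, k = log(n)), cf. (2.4)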
2.5 Cross-Validation and Generalized Cross-Validation
Cross-validation (CV) is the most commonly used method for model assessment and selection. The main idea is a direct estimate of the extra-sample error. The general version of CV splits the data into $K$ roughly equal-sized parts, fits the model to the other $K-1$ parts, and calculates the prediction error on the remaining part:
$$\mathrm{CV} = \sum_{i=1}^n (Y_i - \widehat{Y}_{-i})^2, \qquad (2.6)$$
where $\widehat{Y}_{-i}$ is the fitted value computed with the part of the data containing observation $i$ removed, and $Y_i - \widehat{Y}_{-i}$ is the jackknife residual.
A convenient approximation to CV for linear fitting with squared error loss is generalized cross-validation (GCV). A linear fitting method has the property $\widehat{Y} = S\, Y$, where $\widehat{Y}_i$ is the fitted value from the whole data and $S = (s_{ij})_{n \times n}$ is the smoothing (hat) matrix. For many linear fitting methods with leave-one-out ($k = 1$), it can be shown easily that
$$\mathrm{CV} = \sum_{i=1}^n (Y_i - \widehat{Y}_{-i})^2 = \sum_{i=1}^n \left( \frac{Y_i - \widehat{Y}_i}{1 - s_{ii}} \right)^2.$$
To avoid the intensive computation, the CV can be approximated by the GCV, defined by
$$\mathrm{GCV} = \sum_{i=1}^n \left( \frac{Y_i - \widehat{Y}_i}{1 - \mathrm{trace}(S)/n} \right)^2 = \frac{\sum_{i=1}^n (Y_i - \widehat{Y}_i)^2}{\left( 1 - \mathrm{trace}(S)/n \right)^2}. \qquad (2.7)$$
It has been shown that both the CV and GCV methods are very appealing in nonparametric modeling; see the book by Hastie and Tibshirani (1990). It follows from (2.7) that
$$\mathrm{GCV} \approx \sum_{i=1}^n (Y_i - \widehat{Y}_i)^2 \left( 1 + \mathrm{trace}(S)/n \right)^2 \approx \widehat{\sigma}^2 \left[ \mathrm{SSE}/\widehat{\sigma}^2 + 2\, \mathrm{trace}(S) \right], \qquad (2.8)$$
where $\widehat{\sigma}^2 = \sum_{i=1}^n (Y_i - \widehat{Y}_i)^2 / n$. Therefore, under the normality assumption, GCV is asymptotically equivalent to AIC since $\mathrm{trace}(S) = q$.
Recently, the leave-one-out cross-validation method was challenged by Shao (1993), who claimed that the popular leave-one-out cross-validation method, which is asymptotically equivalent to many other model selection methods such as the AIC, the $C_p$, and the bootstrap, is asymptotically inconsistent in the sense that the probability of selecting the model with the best predictive ability does not converge to 1 as the total number of observations $n \to \infty$. He showed that the inconsistency of the leave-one-out cross-validation can be rectified by using a leave-$n_\nu$-out cross-validation with $n_\nu$, the number of observations reserved for validation, satisfying $n_\nu/n \to 1$ as $n \to \infty$.
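For least squares fits, where the $s_{ii}$ are the hat values, (2.6) and (2.7) can be computed directly without refitting; a minimal sketch (the fitted model is illustrative):

fit <- lm(y ~ x1 + x2)              # illustrative model
e   <- resid(fit)
h   <- hatvalues(fit)               # s_ii for least squares
cv  <- sum((e / (1 - h))^2)         # leave-one-out CV, cf. (2.6)
gcv <- sum(e^2) / (1 - mean(h))^2   # GCV, cf. (2.7): trace(S)/n = mean(h)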
2.6 Penalized Methods
1. Bridge and Ridge:

Frank and Friedman (1993) proposed the $L_q$ ($q > 0$) penalized least squares criterion
$$\sum_{i=1}^n \Big( Y_i - \sum_j \beta_j X_{ij} \Big)^2 + \lambda \sum_j |\beta_j|^q,$$
which results in the so-called bridge estimator. If $q = 2$, the resulting estimator is called the ridge estimator, given by $\widehat{\beta} = (X^T X + \lambda I)^{-1} X^T Y$.
2. LASSO:

Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO), which is the minimizer of the following constrained least squares criterion
$$\sum_{i=1}^n \Big( Y_i - \sum_j \beta_j X_{ij} \Big)^2 + \lambda \sum_j |\beta_j|,$$
which results in the soft thresholding rule $\widehat{\beta}_j = \mathrm{sign}(\widehat{\beta}_j^0)\,(|\widehat{\beta}_j^0| - \lambda)_+$.
3. Non-concave Penalized LS:

Fan and Li (2001) proposed the non-concave penalized least squares criterion
$$\sum_{i=1}^n \Big( Y_i - \sum_j \beta_j X_{ij} \Big)^2 + \sum_j p_\lambda(|\beta_j|),$$
where the hard thresholding penalty function $p_\lambda(|\theta|) = \lambda^2 - (|\theta| - \lambda)^2\, I(|\theta| < \lambda)$ results in the hard thresholding rule $\widehat{\beta}_j = \widehat{\beta}_j^0\, I(|\widehat{\beta}_j^0| > \lambda)$. Finally, Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) model selection criterion with the penalty function defined through its derivative
$$p'_\lambda(\theta) = \lambda \left\{ I(\theta \le \lambda) + \frac{(a\lambda - \theta)_+}{(a-1)\lambda}\, I(\theta > \lambda) \right\} \quad \text{for some } a > 2 \text{ and } \theta > 0,$$
which results in the estimator
$$\widehat{\beta}_j = \begin{cases} \mathrm{sign}(\widehat{\beta}_j^0)\,(|\widehat{\beta}_j^0| - \lambda)_+, & \text{when } |\widehat{\beta}_j^0| \le 2\lambda, \\ \left[ (a-1)\,\widehat{\beta}_j^0 - \mathrm{sign}(\widehat{\beta}_j^0)\, a\lambda \right] / (a-2), & \text{when } 2\lambda < |\widehat{\beta}_j^0| \le a\lambda, \\ \widehat{\beta}_j^0, & \text{when } |\widehat{\beta}_j^0| > a\lambda. \end{cases}$$
Also, Fan and Li (2001) showed that the SCAD estimator satisfies three properties: (1) unbiasedness for large true coefficients, to avoid unnecessary estimation bias; (2) sparsity, estimating small coefficients as 0 to reduce model complexity; and (3) continuity of the resulting estimator, to avoid unnecessary variation in model prediction. Fan and Peng (2004) considered the case where the number of regressors depends on the sample size and goes to infinity at a certain rate. To choose the appropriate tuning parameters $a$ and $\lambda$ in the SCAD penalty function, the CV, GCV or BIC method can be used; see Fan and Li (2001), Fan and Peng (2004) and Wang, Li and Tsai (2007).

This approach can be generalized to a more general setting, called penalized likelihood, which can be applied for other purposes, say to select copulas in modeling the dependence structure of multivariate financial variables; see Cai, Chen, Fan and Wang (2007, "Selection of Copulas with Applications in Finance," working paper) and Cai and Wang (2007, "Selection of Mixture Distributions for Time Series," working paper). Also, it can be applied to quantile regression settings.
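The three thresholding rules above are easy to code directly. A minimal sketch applying them componentwise to an unpenalized estimate b0 (λ and a are tuning values; a = 3.7 is the value suggested by Fan and Li, 2001):

soft <- function(b0, lambda) sign(b0) * pmax(abs(b0) - lambda, 0)  # LASSO rule
hard <- function(b0, lambda) b0 * (abs(b0) > lambda)               # hard rule
scad <- function(b0, lambda, a = 3.7) {                            # SCAD rule
  ifelse(abs(b0) <= 2 * lambda,
         sign(b0) * pmax(abs(b0) - lambda, 0),
         ifelse(abs(b0) <= a * lambda,
                ((a - 1) * b0 - sign(b0) * a * lambda) / (a - 2),
                b0))
}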
2.7 Implementation in R
2.7.1 Classical Models
To fit a multiple regression in R, one can use lm() or glm(); see the following for details
lm(formula, data, subset, weights, na.action,
method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE,
singular.ok = TRUE, contrasts = NULL, offset, ...)
glm(formula, family = gaussian, data, weights, subset,
na.action, start = NULL, etastart, mustart,
offset, control = glm.control(...), model = TRUE,
method = "glm.fit", x = FALSE, y = TRUE, contrasts = NULL, ...)
To fit a regression model without an intercept, you need to use
fit1=lm(y~-1+x1+...x9)
where fit1 is the fitted model object containing all the outputs you need. If you want to do model diagnostic checking, you need to use
plot(fit1)
For multivariate data, it is usually a good idea to view the data as a whole using the pairwise
scatter plots generated by the pairs() function:
pairs(data)
To drop or add one variable from or to a regression model, you use the command drop1()
or add1(), for example,
drop1(fit1)
add1(fit1,~x10+x11+...+x20)
The last command means that you choose the "best" variable from X10 to X20 to add to the model. Adding and dropping terms using add1() and drop1() is a useful method for selecting a model when only a few terms are involved, but it can quickly become tedious.
Functions add1() and drop1() are based on the Cp criterion. The step() function provides
an automatic procedure for conducting stepwise model selection. The step() function re-
quires an initial model, often constructed explicitly as an intercept-only model. For example,
suppose that we want to find the “best” model involving X1, · · · , X10, we could create an
intercept-only model and then call step() as follows:
fit0=lm(y~1)
fit2=step(fit0,~x1+x2+...+x10, trace=F)
step(object, scope, scale = 0,
direction = c("both", "backward", "forward"),
trace = 1, keep = NULL, steps = 1000, k = 2, ...)
With “trace=T” or “trace=1”, step() displays the output of each step of the selection
process. The step() function is based on AIC or BIC by specifying k in the function. Also,
one can use the function stepAIC() in the package MASS for a wider range of object
classes.
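As a small self-contained illustration of step() on simulated data, in which only X1 and X2 matter (all names and numbers below are illustrative, not from the text):

set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)                     # only x1, x2 matter
fit0 <- lm(y ~ 1)                                    # intercept-only start
step(fit0, scope = ~ x1 + x2 + x3 + x4, trace = 0)   # AIC-based (k = 2)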
2.7.2 LASSO Type Methods
The package lasso2 provides many features for solving regression problems while imposing L1 constraints on the estimates, and the package lars provides efficient procedures for computing an entire LASSO path at the cost of a single least squares fit. LAR is the least angle regression, proposed by Efron, Hastie, Johnstone and Tibshirani (2004). In lars, you can use the function lars()
lars(x, y, type = c("lasso", "lar", "forward.stagewise"),trace = FALSE,
Gram, eps = .Machine$double.eps, max.steps, use.Gram = TRUE)
and the function cv.lars() to compute the K-fold cross-validated mean squared prediction
error for least angle regressions (lars), LASSO, or forward stagewise.
cv.lars(x, y, K = 10, fraction = seq(from = 0, to = 1, length = 100),
trace = FALSE, plot.it = TRUE, se = TRUE, ...)
In the package lasso2, the function l1ce() is for regression fitting with L1-constraint on the
parameters.
l1ce(formula, data = sys.parent(), weights, subset, na.action,
sweep.out = ~ 1, x = FALSE, y = FALSE,
contrasts = NULL, standardize = TRUE,
trace = FALSE, guess.constrained.coefficients = double(p),
bound = 0.5, absolute.t = FALSE)
or the function gl1ce() for fitting a generalized regression problem while imposing an L1
constraint on the parameters
gl1ce(formula, data = sys.parent(), weights, subset, na.action,
family = gaussian, control = glm.control(...), sweep.out = ~ 1,
x = FALSE, y = TRUE, contrasts = NULL, standardize = TRUE,
guess.constrained.coefficients = double(p), bound = 0.5, ...)
Also, you can use the function gcv() to extract the generalized cross-validation score(s) from fitted model objects.
gcv(object, ...)
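A small self-contained illustration of lars() and cv.lars() on simulated data; note that the design must be passed as a numeric matrix (all names and numbers below are illustrative):

library(lars)
set.seed(2)
n <- 100
X <- matrix(rnorm(n * 4), n, 4)          # four candidate predictors
y <- 1 + 2 * X[, 1] - X[, 2] + rnorm(n)  # only the first two matter
fit.lasso <- lars(X, y, type = "lasso")
plot(fit.lasso)                          # coefficient paths along the LASSO path
cv.lars(X, y, K = 10)                    # 10-fold CV prediction error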
2.7.3 Example
We begin by introducing several environmental and economic as well as financial time series
to serve as illustrative data for time series methodology. Figure 2.1 shows monthly values of
an environmental series called the Southern Oscillation Index (SOI) and associated recruit-
ment (number of new fish) computed from a model by Pierre Kleiber, Southwest Fisheries
Center, La Jolla, California. This data set is provided by Shumway (2006). Both series are
for a period of 453 months ranging over the years 1950−1987. The SOI measures changes in
air pressure that are related to sea surface temperatures in the central Pacific. The central
Pacific Ocean warms up every three to seven years due to the El Nino effect, which has been blamed, in particular, for floods in the midwestern portions of the U.S.
Both series in Figure 2.1 tend to exhibit repetitive behavior, with regularly repeating
(stochastic) cycles that are easily visible. This periodic behavior is of interest because un-
derlying processes of interest may be regular and the rate or frequency of oscillation char-
acterizing the behavior of the underlying series would help to identify them. One can also
remark that the cycles of the SOI are repeating at a faster rate than those of the recruitment
series. The recruit series also shows several kinds of oscillations, a faster frequency that
seems to repeat about every 12 months and a slower frequency that seems to repeat about
every 50 months. The study of the kinds of cycles and their strengths will be discussed later.
The two series also tend to be somewhat related; it is easy to imagine that somehow the fish
population is dependent on the SOI. Perhaps there is even a lagged relation, with the SOI
signalling changes in the fish population.
The study of the variation in the different kinds of cyclical behavior in a time series can
be aided by computing the power spectrum which shows the variance as a function of the
frequency of oscillation. Comparing the power spectra of the two series would then give
valuable information relating to the relative cycles driving each one. One might also want
to know whether or not the cyclical variations of a particular frequency in one of the series,
say the SOI, are associated with the frequencies in the recruitment series. This would be
measured by computing the correlation as a function of frequency, called the coherence.
The study of systematic periodic variations in time series is called spectral analysis. See
Figure 2.1: Monthly SOI (left) and simulated recruitment (right) from a model (n=453months, 1950-1987).
Shumway (1988), Shumway (2006), and Shumway and Stoffer (2000) for details.
We will need a characterization for the kind of stability that is exhibited by the environ-
mental and fish series. One can note that the two series seem to oscillate fairly regularly around
central values (0 for SOI and 64 for recruitment). Also, the lengths of the cycles and their
orientations relative to each other do not seem to be changing drastically over the time
histories.
We consider the twelve-month moving average with weights aj = 1/12 for j = 0, ±1, ±2, ±3,
±4, ±5, ±6 and zero otherwise. The result of applying this filter to the SOI index is shown
in Figure 2.2. It is clear that this filter removes some of the higher oscillations and produces
a smoother series.

Figure 2.2: The SOI series (black solid line) compared with a 12-point moving average (red thicker solid line). Left panel: original data; right panel: filtered series.

In fact, the yearly oscillations have been filtered out (see the right panel in Figure
2.2) and a lower frequency oscillation appears with a cycling rate of about 42 months. This
is the so-called El Nino effect that accounts for all kinds of phenomena. This filtering effect
will be examined further when we discuss spectral analysis, since it is extremely important
to know exactly how one is influencing the periodic oscillations by filtering.
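As an aside (a sketch of ours, not from the original notes), the same filtering step can be reproduced in R with the built-in filter() function, using the vector y that holds the SOI series in the code of Section 2.8:
wts=rep(1/12,13) # the weights a_j=1/12 for j=0,+-1,...,+-6
ysm=filter(y,wts,sides=2) # centered (two-sided) moving average
ts.plot(y) # original series
lines(ysm,col=2,lwd=3) # smoothed series in red, as in Figure 2.2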
In Figure 2.3, we have made lagged scatterplots of the SOI series at time t + h against
the SOI series at time t and obtained a high correlation, 0.412, between the series xt+12 and
the series xt shifted by 12 months. Lower order lags at t − 1, t − 2 also show correlation.
The scatterplots show the direction of the relation, which tends to be positive for lags 1,
2, 11, 12, and 13, and negative for lags 6, 7, and 8. The scatterplots also suggest that no
significant nonlinearities are present.

Figure 2.3: Multiple lagged scatterplots showing the relationship between the present SOI values (xt) and the lagged values (xt+h) at lags 1 ≤ h ≤ 16.

In order to develop a measure for this self correlation
or autocorrelation, we utilize a sample version of the scaled autocovariance function, say

ρx(h) = γx(h)/γx(0),

where

γx(h) = (1/n) Σ_{t=1}^{n−h} (x_{t+h} − x̄)(x_t − x̄),

which is the sample counterpart of the autocovariance, with x̄ = (1/n) Σ_{t=1}^{n} x_t. Under
the assumption that the underlying process xt is white noise, the approximate standard error
of the sample ACF is

σρ = 1/√n. (2.9)

That is, ρx(h) is approximately normal with mean 0 and variance 1/n.
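In R, the sample ACF and the standard error bound in (2.9) can be checked directly; a minimal sketch of ours, assuming the series is stored in a numeric vector x:
n=length(x)
rho=acf(x,lag.max=50,plot=FALSE)$acf # sample autocorrelations, lags 0 to 50
se=1/sqrt(n) # approximate standard error under white noise, equation (2.9)
which(abs(rho[-1])>2*se) # lags whose ACF exceeds two standard errors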
As an illustration, consider the autocorrelation functions computed for the environmental
and recruitment series shown in the top two panels of Figure 2.4. Both of the autocorrelation
functions show some evidence of periodic repetition. The ACF of SOI seems to repeat at
periods of 12 while the recruitment has a dominant period that repeats at about 12 to 16
time points. Again, the maximum values are well above two standard errors shown as dotted
lines above and below the horizontal axis.
Figure 2.4: Autocorrelation functions of SOI and recruitment and cross-correlation function between SOI and recruitment.
In order to examine this possibility, consider the lagged scatterplot matrix shown in
Figures 2.5 and 2.6. Figure 2.5 plots the SOI at time t + h, xt+h, versus the recruitment
series yt at lags 0 ≤ h ≤ 15.

Figure 2.5: Multiple lagged scatterplots showing the relationship between the SOI at time t + h, say xt+h (x-axis), versus recruits at time t, say yt (y-axis), 0 ≤ h ≤ 15.

There are no particularly strong linear relations apparent in these plots; that is, future
values of the SOI are not related to current recruitment, which means that the temperatures
are not responding to past recruitment. In Figure 2.6, the current SOI values xt are plotted
against the future recruitment values yt+h for 0 ≤ h ≤ 15. It is clear from Figure 2.6 that
the series are correlated negatively for lags h = 5, . . . , 9.
Figure 2.6: Multiple lagged scatterplots showing the relationship between the SOI at time t, say xt (x-axis), versus recruits at time t + h, say yt+h (y-axis), 0 ≤ h ≤ 15.

The
correlation at lag 6, for example, is −0.60, implying that increases in the SOI lead decreases
in the number of recruits by about 6 months. On the other hand, the series are hardly
correlated at all in the conventional sense (0.025), measured at lag h = 0. The general
pattern suggests that predicting recruits might be possible using the El Nino effect at lags
of 5, 6, 7, . . . months.
We show in Figure 2.7 the partial autocorrelation functions of
the SOI series (left panel) and the recruits series (right panel). Note that the PACF of the
SOI has a single peak at lag h = 1 and then relatively small values. This means, in effect,
that fairly good prediction can be achieved by using the immediately preceding point and
that adding further values does not really improve the situation. Hence we might try an
autoregressive model with p = 1. The recruits series has two peaks and then small values,
implying that the pure correlation between points is summarized by the first two lags.
Figure 2.7: Partial autocorrelation functions for the SOI (left panel) and the recruits (right panel) series.
Table 2.1: AICC values for ten models for the recruits series

p      1     2     3     4     5     6     7     8     9     10
AICC   5.75  5.52  5.53  5.54  5.54  5.55  5.55  5.56  5.57  5.58
We consider the simple problem of modeling the recruit series shown in the right panel
of Figure 2.1 using an autoregressive model. The top right panel of Figure 2.4 and the right
panel of Figure 2.7 show the autocorrelation and partial autocorrelation functions of the
recruit series. The PACF has large values for h = 1 and 2 and then is essentially zero for
higher order lags. This implies by the property of an autoregressive model that a second
order (p = 2) AR model might provide a good fit. Running the regression program for an
AR(2) model with intercept
xt = φ0 + φ1 xt−1 + φ2 xt−2 + wt
leads to the estimators φ0 = 61.8439(4.0121), φ1 = 1.3512(0.0417), φ2 = −0.4612(0.0416)
and σ2 = 89.53, where the estimated standard deviations are in parentheses. To determine
whether the above order is the best choice, we fitted models for 1 ≤ p ≤ 10, obtaining
corrected AIC (AICC) values summarized in Table 2.1 using (2.3). This shows that the
minimum AICC is obtained for p = 2 and we choose the second order model.
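The AR(2) fit above can be reproduced (up to numerical details) with the arima() function; a minimal sketch, with the recruits series stored in the vector x as in the code of Section 2.8:
fit=arima(x,order=c(2,0,0)) # AR(2) with a mean term
fit$coef # ar1, ar2 and the intercept (arima() reports the series mean)
sqrt(diag(fit$var.coef)) # standard errors of the estimates
fit$sigma2 # innovation variance estimate
# note: phi_0 = mean*(1-phi_1-phi_2), so the reported "intercept" is not phi_0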
The previous example uses various autoregressive models for the recruits series, for exam-
ple, one can fit a second-order regression model. We may also use this regression idea to fit
the model to other series such as a detrended version of the SOI given in previous discussions.
We have noted in our discussions of Figure 2.7 from the partial autocorrelation function that
a plausible model for this series might be a first order autoregression of the form given above
with p = 1. Again, putting the model above into the regression framework (1.2) for a single
coefficient leads to the estimators φ1 = 0.59 with standard error 0.04, σ2 = 0.09218 and
AICC(1) = −1.375. The ACF of these residuals shown in the left panel of Figure 2.8, how-
ever, will still show cyclical variation and it is clear that they still have a number of values
exceeding the 1.96/√n threshold. A suggested procedure is to try higher order autoregressive
models; successive models for 1 ≤ p ≤ 30 were fitted, and the AIC and AICC values are
plotted in the right panel of Figure 2.8. There is a clear minimum for a p = 16 order model.

Figure 2.8: ACF of residuals of AR(1) for SOI (left panel) and the plot of AIC and AICC values (right panel).
The estimated coefficient vector φ has the following components, with standard errors in parentheses:
0.4050(0.0469), 0.0740(0.0505), 0.1527(0.0499), 0.0915(0.0505), −0.0377(0.0500), −0.0803(0.0493),
−0.0743(0.0493), −0.0679(0.0492), 0.0096(0.0492), 0.1108 (0.0491), 0.1707(0.0492), 0.1606(0.0499),
0.0281(0.0504), −0.1902(0.0501), −0.1283(0.0510), −0.0413(0.0476), and σ2 = 0.07166.
Exercise: Please use LASSO type approach (you can choose any one) to re-run this example
again to see what you can obtain. As for the R command, you can use the package lasso2
or lars.
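One possible starting point for this exercise with the lars package is sketched below; the design matrix of 30 lagged SOI values is built with embed(), and all object names here are ours:
library(lars)
p=30
X=embed(y,p+1) # row t contains (y_t, y_{t-1}, ..., y_{t-p})
fit=lars(X[,-1],X[,1],type="lasso") # LASSO path for the AR(30) regression
cv=cv.lars(X[,-1],X[,1],K=10) # 10-fold cross-validation along the path
s=cv$index[which.min(cv$cv)] # fraction minimizing the CV error
coef(fit,s=s,mode="fraction") # coefficients of the selected model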
2.8 Computer Codes
#####################################################################
# This is Example 2.1 for Southern Oscillation Index and Recruits data
####################################################################
y<-read.table("c:/res-teach/xiamen12-06/data/ex2-1a.txt",header=F)
# read data file
x<-read.table("c:/res-teach/xiamen12-06/data/ex2-1b.txt",header=F)
y=y[,1]
x=x[,1]
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.1.eps",
horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(1,2),mex=0.4,bg="yellow")
# save the graph as a postscript file
ts.plot(y,type="l",lty=1,ylab="",xlab="")
# make a time series plot
title(main="Southern Oscillation Index",cex=0.5)
# set up the title of the plot
abline(0,0)
# make a straight line
#win.graph()
ts.plot(x,type="l",lty=1,ylab="",xlab="")
abline(mean(x),0)
title(main="Recruit",cex=0.5)
dev.off()
n=length(y)
n2=n-12
yma=rep(0,n2)
for(i in 1:n2)yma[i]=mean(y[i:(i+12)]) # compute the mean
yy=y[7:(n2+6)]
yy0=yy-yma
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.2.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4)
ts.plot(yy,type="l",lty=1,ylab="",xlab="")
points(1:n2,yma,type="l",lty=1,lwd=3,col=2)
ts.plot(yy0,type="l",lty=1,ylab="",xlab="")
points(1:n2,yma,type="l",lty=1,lwd=3,col=2) # make a point plot
abline(0,0)
dev.off()
m=17
n1=n-m
y.soi=rep(0,n1*m)
dim(y.soi)=c(n1,m)
y.rec=y.soi
for(i in 1:m){
y.soi[,i]=y[i:(n1+i-1)]
y.rec[,i]=x[i:(n1+i-1)]}
text_soi=c("1","2","3","4","5","6","7","8","9","10","11","12","13",
"14","15","16")
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.3.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,4),mex=0.4,bg="light blue")
for(i in 2:17){
plot(y.soi[,1],y.soi[,i],type="p",pch="o",ylab="",xlab="",
ylim=c(-1,1),xlim=c(-1,1))
text(0.8,-0.8,text_soi[i-1],cex=2)}
dev.off()
text1=c("ACF of SOI Index")
text2=c("ACF of Recruits")
text3=c("CCF of SOI and Recruits")
SOI=y
Recruits=x
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.4.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
acf(y,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")
# make an ACF plot
legend(10,0.8, text1) # set up the legend
acf(x,ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")
legend(10,0.8,text2)
ccf(y,x, ylab="",xlab="",ylim=c(-0.5,1),lag.max=50,main="")
legend(-40,0.8,text3)
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.5.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,4),mex=0.4,bg="light green")
for(i in 1:16){
plot(y.soi[,i],y.rec[,1],type="p",pch="o",ylab="",xlab="",
ylim=c(0,100),xlim=c(-1,1))
text(-0.8,10,text_soi[i],cex=2)}
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.6.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,4),mex=0.4,bg="light grey")
for(i in 1:16){
plot(y.soi[,1],y.rec[,i],type="p",pch="o",ylab="",xlab="",
ylim=c(0,100),xlim=c(-1,1))
text(-0.8,10,text_soi[i],cex=2)}
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.7.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light blue")
pacf(y,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="")
text(10,0.9,"PACF of SOI")
pacf(x,ylab="",xlab="",lag=30,ylim=c(-0.5,1),main="")
text(10,0.9,"PACF of Recruits")
dev.off()
################################################################
x<-read.table("c:/res-teach/xiamen12-06/data/ex2-1a.txt",header=T)
x.soi=x[,1]
n=length(x.soi)
aicc=0
if(aicc==1){
aic.value=rep(0,30) # max.lag=30
aicc.value=aic.value
sigma.value=rep(0,30)
for(i in 1:30){
fit3=arima(x.soi,order=c(i,0,0)) # fit an AR(i)
aic.value[i]=fit3$aic/n-2 # compute AIC
sigma.value[i]=fit3$sigma2 # obtain the estimated sigma^2
aicc.value[i]=log(sigma.value[i])+(n+i)/(n-i-2) # compute AICC
print(c(i,aic.value[i],aicc.value[i]))}
data=cbind(aic.value,aicc.value)
write(t(data),"c:/res-teach/xiamen12-06/soi_aic.dat",ncol=2)
}else{
data<-matrix(scan("c:/res-teach/xiamen12-06/soi_aic.dat"),byrow=T,ncol=2)}
text4=c("AIC", "AICC")
fit11=arima(x.soi,order=c(1,0,0))
resid1=fit11$residuals
postscript(file="c:/res-teach/xiamen12-06/figs/fig-2.8.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light yellow")
acf(resid1,ylab="",xlab="",lag.max=20,ylim=c(-0.5,1),main="")
text(10,0.8,"ACF of residuals of AR(1) for SOI")
matplot(1:30,data,type="b",pch="o",col=c(1,2),ylab="",xlab="Lag",cex=0.6)
legend(16,-1.40,text4,lty=1,col=c(1,2))
dev.off()
#fit2=arima(x.soi,order=c(16,0,0))
#print(fit2)
2.9 References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings of the 2nd International Symposium on Information Theory (B.N. Petrov and F. Csaki, eds.), 267-281. Akademiai Kiado, Budapest.

Bai, Z., C.R. Rao and Y. Wu (1999). Model selection with data-oriented penalty. Journal of Statistical Planning and Inference, 77, 103-117.

Burnham, K.P. and D. Anderson (2003). Model Selection and Multi-Model Inference: A Practical Information Theoretic Approach, 2nd edition. New York: Springer-Verlag.
Efron, B., T. Hastie, I. Johnstone and R. Tibshirani (2004). Least angle regression. TheAnnals of Statistics, 32, 407-451.
Eicker, F. (1967). Limit theorems for regression with unequal and dependent errors. InProceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability(L. LeCam and J. Neyman, eds.), University of California Press, Berkeley.
Fan, J. and R. Li (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. Journal of the American Statistical Association, 96, 1348-1360.

Fan, J. and H. Peng (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32, 928-961.

Frank, I.E. and J.H. Friedman (1993). A statistical view of some chemometric regression tools (with discussion). Technometrics, 35, 109-148.

Hastie, T.J. and R.J. Tibshirani (1990). Generalized Additive Models. London: Chapman and Hall.

Hurvich, C.M. and C.-L. Tsai (1989). Regression and time series model selection in small samples. Biometrika, 76, 297-307.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6,461-464.
Shao, J. (1993). Linear model selection by cross-validation. Journal of the AmericanStatistical Association, 88, 486-494.
Shen, X.T. and J.M. Ye (2002). Adaptive model selection. Journal of the American Sta-tistical Association, 97, 210-221.
Shibata, R. (1976). Selection of the order of an autoregressive model by Akaike's information criterion. Biometrika, 63, 117-126.

Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.

Shumway, R.H. (2006). Lecture Notes on Applied Time Series Analysis. Department of Statistics, University of California at Davis.

Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.

Tibshirani, R.J. (1996). Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society, Series B, 58, 267-288.

Wang, H., R. Li and C.-L. Tsai (2007). Tuning parameter selectors for the smoothly clipped absolute deviation method. Biometrika, 94, 553-568.
Chapter 3
Regression Models With Correlated Errors
3.1 Methodology
In many applications, the relationship between two time series is of major interest. The
market model in finance is an example that relates the return of an individual stock to
the return of a market index. The term structure of interest rates is another example in
which the time evolution of the relationship between interest rates with different maturities
is investigated. These examples lead to the consideration of a linear regression in the form
yt = β1 + β2 xt + et,
where yt and xt are two time series and et denotes the error term. The least squares (LS)
method is often used to estimate the above model. If et is a white noise series, then the
LS method produces consistent estimates. In practice, however, it is common to see that the
error term et is serially correlated. In this case, we have a regression model with time series
errors, and the LS estimates of β1 and β2 may not be consistent and efficient.
Regression models with time series errors are widely applicable in economics and finance, but
it is one of the most commonly misused econometric models because the serial dependence
in et is often overlooked. It pays to study the model carefully. The standard method for
dealing with correlated errors et in the regression model
yt = βT zt + et
is to try to transform the errors et into uncorrelated ones and then apply the standard least
squares approach to the transformed observations. For example, let P be an n × n matrix
that transforms the vector e = (e1, · · · , en)T into a set of independent identically distributed
variables with variance σ2. Then, transform the matrix version (1.4) to
Py = PZβ + Pe
and proceed as before. Of course, the major problem is deciding on what to choose for P
but in the time series case, happily, there is a reasonable solution, based again on time series
ARMA models. Suppose that we can find, for example, a reasonable ARMA model for the
residuals, say, for example, the ARMA(p, 0, 0) model
et = φ1 et−1 + φ2 et−2 + · · · + φp et−p + wt,
which defines a linear transformation of the correlated et to a sequence of uncorrelated wt.
We can ignore the problems near the beginning of the series by starting at t = p. In the
ARMA notation, using the back-shift operator L, we may write
φ(L) et = wt, (3.1)
where
φ(L) = 1 − φ1 L − φ2 L^2 − · · · − φp L^p (3.2)
and applying the operator to both sides of (1.2) leads to the model
φ(L) yt = βT φ(L) zt + wt, (3.3)
where the wt’s now satisfy the independence assumption. Doing ordinary least squares
on the transformed model is the same as doing weighted least squares on the untransformed
model. The only problem is that we do not know the values of the coefficients φk (1 ≤ k ≤ p)
in the transformation (3.2). However, if we knew the residuals et, it would be easy to estimate
the coefficients, since (3.2) can be written in the form
et = φT et−1 + wt, (3.4)
which is exactly the usual regression model (1.2) with φ = (φ1, · · · , φp)T replacing β and
et−1 = (et−1, et−2, · · · , et−p)T replacing zt. The above comments suggest a general approach
known as the Cochrane-Orcutt (1949) procedure for dealing with the problem of correlated
errors in the time series context.
1. Begin by fitting the original regression model (1.2) by least squares, obtaining β and
the residuals et = yt − βTzt.
2. Fit an ARMA to the estimated residuals, say φ(L) et = θ(L)wt.
3. Apply the ARMA transformation found to both sides of the regression equation (1.2)
to obtain

[φ(L)/θ(L)] yt = βT [φ(L)/θ(L)] zt + wt.
4. Run an ordinary least squares on the transformed values to obtain the new β.
5. Return to 2. if desired.
Often, one iteration is enough to develop the estimators under a reasonable correlation
structure. In general, the Cochrane-Orcutt procedure converges to the maximum likelihood
or weighted least squares estimators.
Note that there is a function in R to compute the Cochrane-Orcutt estimator
arima(x, order = c(0, 0, 0),
seasonal = list(order = c(0, 0, 0), period = NA),
xreg = NULL, include.mean = TRUE, transform.pars = TRUE,
fixed = NULL, init = NULL, method = c("CSS-ML", "ML", "CSS"),
n.cond, optim.control = list(), kappa = 1e6)
by specifying “xreg=...”, where xreg is a vector or matrix of external regressors, which must
have the same number of rows as “x”.
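For example (a minimal sketch of ours), a regression of yt on a regressor xt with AR(2) errors can be estimated in a single call by passing the regressor through xreg:
fit=arima(y,order=c(2,0,0),xreg=x) # y regressed on x with AR(2) errors
fit$coef # AR coefficients, intercept, and the regression slope on x
acf(fit$residuals) # the residuals should now look like white noise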
Example 3.1: The data shown in Figure 3.1 represent quarterly earnings per share for the
American company Johnson & Johnson from the fourth quarter of 1970 to the first
quarter of 1980. We might consider an alternative approach to treating the Johnson and
Johnson earnings series, assuming that yt = log(xt) = β1 + β2 t+ et. In order to analyze the
data with this approach, first we fit the model above, obtaining β1 = −0.6678(0.0349) and
β2 = 0.0417(0.0071). The residuals et = yt − β1 − β2 t can be computed easily;
the ACF and PACF are shown in the top two panels of Figure 3.2. Note that the ACF and
PACF suggest that a seasonal AR series will fit well and we show the ACF and PACF of
these residuals in the bottom panels of Figure 3.2. The seasonal AR model is of the form
et = Φ1 et−4 + wt, and we obtain Φ1 = 0.7614(0.0639) with σ2w = 0.00779.

Figure 3.1: Quarterly earnings for Johnson & Johnson (4th quarter, 1970 to 1st quarter, 1980, left panel) with log transformed earnings (right panel).

Figure 3.2: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the detrended log J&J earnings series (top two panels) and the fitted ARIMA(0,0,0) × (1,0,0)_4 residuals (bottom two panels).

Using these values,
we transform yt to
yt − Φ1 yt−4 = β1(1 − Φ1) + β2[t− Φ1(t− 4)] + wt
using the estimated value Φ1 = 0.7614. With this transformed regression, we obtain the
new estimators β1 = −0.7488(0.1105) and β2 = 0.0424(0.0018). The new estimator has the
advantage of being unbiased and having a smaller generalized variance.
To forecast, we consider the original model with the newly estimated β1 and β2. We
obtain the approximate forecast y^t_{t+h} = β1 + β2(t + h) + e^t_{t+h} for the log transformed
series, along with upper and lower limits depending on the estimated variance, which only
incorporates the prediction variance of e^t_{t+h}, treating the trend and seasonal autoregressive
parameters as fixed. The narrower upper and lower limits (the figure is not presented here)
are mainly a reflection of a slightly better fit to the residuals and the ability of the trend
model to take care of the nonstationarity.
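The same two-step fit can be approximated in a single call in R by estimating the trend regression and the seasonal AR(1) error jointly; a sketch of ours, assuming the log earnings are stored in a vector ylog (the name is made up):
tt=1:length(ylog)
fit=arima(ylog,order=c(0,0,0),
seasonal=list(order=c(1,0,0),period=4),xreg=tt)
# linear trend regression with a seasonal AR(1) error, estimated jointly
fit$coef # trend slope, intercept, and the seasonal AR coefficient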
Example 3.2: We consider the relationship between two U.S. weekly interest rate series: xt:
the 1-year Treasury constant maturity rate and yt: the 3-year Treasury constant maturity
rate. Both series have 1967 observations from January 5, 1962 to September 10, 1999 and
are measured in percentages. The series are obtained from the Federal Reserve Bank of St.
Louis.
Figure 3.3 shows the time plots of the two interest rates with solid line denoting the
1-year rate and dashed line for the 3-year rate. The left panel of Figure 3.4 plots yt versus
xt, indicating that, as expected, the two interest rates are highly correlated.

Figure 3.3: Time plots of U.S. weekly interest rates (in percentages) from January 5, 1962 to September 10, 1999. The solid line (black) is the Treasury 1-year constant maturity rate and the dashed line (red) is the Treasury 3-year constant maturity rate.

A naive way to
describe the relationship between the two interest rates is to use the simple model, Model I:
yt = β1 + β2 xt + et. This results in a fitted model yt = 0.911 + 0.924xt + et, with σ2e = 0.538
Figure 3.4: Scatterplots of U.S. weekly interest rates from January 5, 1962 to September 10, 1999: the left panel is the 3-year rate versus the 1-year rate, and the right panel is changes in the 3-year rate versus changes in the 1-year rate.
and R2 = 95.8%, where the standard errors of the two coefficients are 0.032 and 0.004,
respectively. This simple model (Model I) confirms the high correlation between the two
interest rates. However, the model is seriously inadequate as shown by Figure 3.5, which
Figure 3.5: Residual series of the linear regression Model I for two U.S. weekly interest rates: the left panel is the time plot and the right panel is the ACF.
gives the time plot and ACF of its residuals. In particular, the sample ACF of the residuals is highly significant and decays slowly, showing the pattern of a unit root nonstationary time series (we will discuss in detail how to test for a unit root later). The behavior of the residuals suggests that marked differences exist between the two interest rates. In modern econometric terminology, if one assumes that the two
interest rate series are unit root nonstationary, then the behavior of the residuals indicates
that the two interest rates are not co-integrated; see later chapters for discussion of unit
root and co-integration. In other words, the data fail to support the hypothesis that
there exists a long-term equilibrium between the two interest rates. In some sense, this is
not surprising because the pattern of an “inverted yield curve” did occur during the data span. By an inverted yield curve, we mean the situation in which interest rates are inversely related to their times to maturity.
The unit root behavior of both interest rates and the residuals leads to the consideration
of the change series of interest rates. Let ∆xt = xt − xt−1 = (1 − L) xt denote changes in the 1-year interest rate and ∆ yt = yt − yt−1 = (1 − L) yt denote changes in the 3-year interest rate. Consider the linear regression, Model II: ∆ yt = β1 + β2 ∆xt + et. Figure 3.6 shows time
plots of the two change series, whereas the right panel of Figure 3.4 provides a scatterplot
Figure 3.6: Time plots of the change series of U.S. weekly interest rates from January 12, 1962 to September 10, 1999: changes in the Treasury 1-year constant maturity rate are denoted by the black solid line, and changes in the Treasury 3-year constant maturity rate by the red dashed line.
between them. The change series remain highly correlated with a fitted linear regression
model given by ∆ yt = 0.0002 + 0.7811 ∆xt + et with σ2e = 0.0682 and R2 = 84.8%. The
standard errors of the two coefficients are 0.0015 and 0.0075, respectively. This model further
confirms the strong linear dependence between interest rates. The two top panels of Figure
3.7 show the time plot (left) and sample ACF (right) of the residuals (Model II). Once again,
the ACF shows some significant serial correlation in the residuals, but the magnitude of the
correlation is much smaller. This weak serial dependence in the residuals can be modeled by
Figure 3.7: Residual series of the linear regression models: Model II (top) and Model III (bottom) for two change series of U.S. weekly interest rates: time plot (left) and ACF (right).
using the simple time series models discussed in the previous sections, and we have a linear
regression with time series errors.
The main objective of this section is to discuss a simple approach for building a linear
regression model with time series errors. The approach is straightforward. We employ a
simple time series model discussed in this chapter for the residual series and estimate the
whole model jointly. For illustration, consider the simple linear regression in Model II.
Because residuals of the model are serially correlated, we identify a simple ARMA model for
the residuals. From the sample ACF of the residuals shown in the right top panel of Figure
3.7, we specify an MA(1) model for the residuals and modify the linear regression model to
(Model III): ∆ yt = β1 + β2 ∆xt + et and et = wt − θ1wt−1, where wt is assumed to be
a white noise series. In other words, we simply use an MA(1) model, without the constant
term, to capture the serial dependence in the error term of Model II. The two bottom panels
of Figure 3.7 show the time plot (left) and sample ACF (right) of the residuals (Model III).
The resulting model is a simple example of linear regression with time series errors. In practice, more elaborate time series models can be added to a linear regression equation to form a general regression model with time series errors.
Estimating a regression model with time series errors was not easy before the advent
of modern computers. Special methods such as the Cochrane-Orcutt estimator have been
proposed to handle the serial dependence in the residuals. By now, the estimation is as easy
as that of other time series models. If the time series model used is stationary and invertible,
then one can estimate the model jointly via the maximum likelihood method or conditional
maximum likelihood method. For the U.S. weekly interest rate data, the fitted version of Model III is ∆ yt = 0.0002 + 0.7824 ∆xt + et and et = wt + 0.2115 wt−1 with σ2w = 0.0668
and R2 = 85.4%. The standard errors of the parameters are 0.0018, 0.0077, and 0.0221,
respectively. The model no longer has a significant lag-1 residual ACF, even though some
minor residual serial correlations remain at lags 4 and 6. The incremental improvement of
adding additional MA parameters at lags 4 and 6 to the residual equation is small and the
result is not reported here.
Comparing the above three models, we make the following observations. First, the high R2 and coefficient 0.924 of Model I are misleading because the residuals of the model show
strong serial correlations. Second, for the change series, R2 and the coefficient of ∆xt of
Model II and Model III are close. In this particular instance, adding the MA(1) model to
the change series only provides a marginal improvement. This is not surprising because the
estimated MA coefficient is small numerically, even though it is statistically highly significant.
Third, the analysis demonstrates that it is important to check residual serial dependence in
linear regression analysis. Because the constant term of Model III is insignificant, the model
shows that the two weekly interest rate series are related as yt = yt−1 + 0.782 (xt − xt−1) +
wt + 0.212wt−1. The interest rates are concurrently and serially correlated.
Finally, we outline a general procedure for analyzing linear regression models with time
series errors: First, fit the linear regression model and check serial correlations of the residuals
(see later). Second, if the residual series is unit-root nonstationary, take the first difference
of both the dependent and explanatory variables. Go to step 1. If the residual series appears
to be stationary, identify an ARMA model for the residuals and modify the linear regression
model accordingly. Third, perform a joint estimation via the maximum likelihood method
and check the fitted model for further improvement. All of the above steps can be implemented in R with the command arima().
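For instance, the joint estimation step for Model III can be sketched as follows, assuming the change series y_diff and x_diff constructed in the code of Section 3.4:
# Regression of the change series with MA(1) errors, estimated jointly
fit=arima(y_diff,xreg=x_diff,order=c(0,0,1))
print(fit) # coefficient estimates and their standard errors
Box.test(fit$resid,lag=12,type="Ljung-Box") # check remaining serial correlation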
To check the serial correlations of residuals, we recommend that the Ljung-Box statistics
should be used instead of the Durbin-Watson (DW) statistic because the latter only considers
the lag-1 serial correlation. There are cases in which residual serial dependence appears at
higher order lags. This is particularly so when the time series involved exhibits some seasonal
behavior.
Remark: For a residual series et with T observations, the Durbin-Watson statistic is
DW = Σ_{t=2}^T (et − et−1)^2 / Σ_{t=1}^T et^2.
Straightforward calculation shows that DW ≈ 2(1− ρe(1)), where ρe(1) is the lag-1 ACF of
et.
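As a quick numerical check of this approximation, the two quantities can be compared directly; a sketch, assuming a residual series e1:
DW=sum(diff(e1)^2)/sum(e1^2) # Durbin-Watson statistic
rho1=acf(e1,plot=F)$acf[2] # lag-1 sample ACF (element 1 is lag 0)
print(c(DW,2*(1-rho1))) # the two values should be close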
Consider testing that several autocorrelation coefficients are simultaneously zero, i.e., H0 : ρ1 = ρ2 = · · · = ρm = 0. Under the null hypothesis, it is easy to show (see Box and Pierce (1970)) that
Q = T Σ_{k=1}^m ρ_k^2 −→ χ^2_m. (3.5)
Ljung and Box (1978) provided the following finite sample correction, which yields a better fit to the χ^2_m distribution for small sample sizes:
Q* = T (T + 2) Σ_{k=1}^m ρ_k^2/(T − k) −→ χ^2_m. (3.6)
Both are called Q-tests and are well known in the statistics literature. Of course, they are very useful in applications.
The function in R for the Ljung-Box test is
Box.test(x, lag = 1, type = c("Box-Pierce", "Ljung-Box"))
and the Durbin-Watson test for autocorrelation of disturbances (the function dwtest() in the package lmtest) is
dwtest(formula, order.by = NULL, alternative = c("greater","two.sided",
"less"),iterations = 15, exact = NULL, tol = 1e-10, data = list())
3.2 Nonparametric Models with Correlated Errors
See the paper by Xiao, Linton, Carroll and Mammen (2003).
3.3 Predictive Regression Models
See the papers by Stambaugh (1999), Amihud and Hurvich (2004), Torous, Valkanov and
Yan (2004), and Campbell and Yogo (2006) and the references therein.
3.4 Computer Codes
##################################################
# This is Example 3.1 for Johnson and Johnson data
##################################################
y=read.table(’c:/res-teach/xiamen12-06/data/ex3-1.dat’,header=F)
n=length(y[,1])
y_log=log(y[,1]) # log of data
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.1.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light yellow")
ts.plot(y,type="l",lty=1,ylab="",xlab="")
title(main="J&J Earnings",cex=0.5)
ts.plot(y_log,type="l",lty=1,ylab="",xlab="")
title(main="transformed log(earnings)",cex=0.5)
dev.off()
# MODEL 1: y_t=beta_0+beta_1 t+ e_t
z1=1:n
fit1=lm(y_log~z1) # fit log(z) versus time trend
e1=fit1$resid
# Now, we need to re-fit the model using the transformed data
x1=5:n
y_1=y_log[5:n]
y_2=y_log[1:(n-4)]
y_fit=y_1-0.7614*y_2
x2=x1-0.7614*(x1-4)
x1=(1-0.7614)*rep(1,n-4)
fit2=lm(y_fit~-1+x1+x2)
e2=fit2$resid
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.2.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
acf(e1, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(10,0.8,"detrended")
pacf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")
acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(15,0.8,"ARIMA(1,0,0)_4")
pacf(e2,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
#####################################################
# This is Example 3.2 for weekly interest rate series
#####################################################
z<-read.table("c:/res-teach/xiamen12-06/data/ex3-2.txt",header=F)
# first column=one year Treasury constant maturity rate;
# second column=three year Treasury constant maturity rate;
# third column=date
x=z[,1]
y=z[,2]
n=length(x)
u=seq(1962+1/52,by=1/52,length=n)
x_diff=diff(x)
y_diff=diff(y)
# Fit a simple regression model and examine the residuals
fit1=lm(y~x) # Model 1
e1=fit1$resid
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.3.eps",
horizontal=F,width=6,height=6)
matplot(u,cbind(x,y),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.4.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light grey")
plot(x,y,type="p",pch="o",ylab="",xlab="",cex=0.5)
plot(x_diff,y_diff,type="p",pch="o",ylab="",xlab="",cex=0.5)
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.5.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(1,2),mex=0.4,bg="light green")
plot(u,e1,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e1,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
# Take differences and fit a simple regression again
fit2=lm(y_diff~x_diff) # Model 2
e2=fit2$resid
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.6.eps",
horizontal=F,width=6,height=6)
matplot(u[-1],cbind(x_diff,y_diff),type="l",lty=c(1,2),col=c(1,2),
ylab="",xlab="")
abline(0,0)
dev.off()
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-3.7.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
ts.plot(e2,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e2, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
# fit a model to the differenced data with an MA(1) error
fit3=arima(y_diff,xreg=x_diff, order=c(0,0,1)) # Model 3
e3=fit3$resid
ts.plot(e3,type="l",lty=1,ylab="",xlab="")
abline(0,0)
acf(e3, ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
3.5 References
Amihud, Y. and C. Hurvich (2004). Predictive regressions: A reduced-bias estimation method. Journal of Financial and Quantitative Analysis, 39, 813-841.
Box, G. and D. Pierce (1970). Distribution of residual autocorrelations in autoregressive integrated moving average time series models. Journal of the American Statistical Association, 65, 1509-1526.
Campbell, J. and M. Yogo (2006). Efficient tests of stock return predictability. Journal of Financial Economics, 81, 27-60.
Cochrane, D. and G.H. Orcutt (1949). Applications of least squares regression to relationships containing autocorrelated errors. Journal of the American Statistical Association, 44, 32-61.
Ljung, G. and G. Box (1978). On a measure of lack of fit in time series models. Biometrika, 65, 297-303.
Stambaugh, R. (1999). Predictive regressions. Journal of Financial Economics, 54, 375-421.
Torous, W., R. Valkanov and S. Yan (2004). On predicting stock returns with nearly integrated explanatory variables. Journal of Business, 77, 937-966.
Xiao, Z., O.B. Linton, R.J. Carroll and E. Mammen (2003). More efficient local polynomial estimation in nonparametric regression with autocorrelated errors. Journal of the American Statistical Association, 98, 980-992.
Chapter 4
Estimation of Covariance Matrix
4.1 Methodology
Consider again the regression model in (1.2). There may exist situations in which the error et has serial correlations and/or conditional heteroscedasticity, but the main objective of the analysis is to make inference concerning the regression coefficients β. When et has serial correlations, we discussed methods in Example 3.1 and Example 3.2 above to overcome this difficulty. However, those methods assume that et follows an ARIMA-type model, and this assumption might not be satisfied in some applications. Here, we consider a general situation without making this assumption. In situations in which the ordinary least squares estimates of the coefficients remain consistent, methods are available to provide consistent estimates of the covariance matrix of the coefficients. Two such methods are widely used in economics and finance. The first is called the heteroscedasticity consistent (HC) estimator; see Eicker (1967) and White (1980). The second is called the heteroscedasticity and autocorrelation consistent (HAC) estimator; see Newey and West (1987).
To ease the discussion, we shall re-write the regression model as
yt = β^T xt + et,
where yt is the dependent variable, xt = (x1t, · · · , xpt)^T is a p-dimensional vector of explanatory variables including constant and lagged variables, and β = (β1, · · · , βp)^T is the
parameter vector. The LS estimate of β is given by
β = [Σ_{t=1}^n xt xt^T]^{−1} Σ_{t=1}^n xt yt,
and the associated covariance matrix has the so-called “sandwich” form
Σ_β = Cov(β) = [Σ_{t=1}^n xt xt^T]^{−1} C [Σ_{t=1}^n xt xt^T]^{−1},
which reduces to σ2e [Σ_{t=1}^n xt xt^T]^{−1} if et is iid,
where C is called the “meat,” given by
C = Var(Σ_{t=1}^n et xt),
and σ2e is the variance of et, estimated by the variance of the residuals of the regression. In the presence of serial correlations or conditional heteroscedasticity, the prior covariance matrix estimator is inconsistent, often resulting in inflated t-ratios for β.
The estimator of White (1980) is based on the following:
Σ_{β,hc} = [Σ_{t=1}^n xt xt^T]^{−1} C_hc [Σ_{t=1}^n xt xt^T]^{−1},
where, with et = yt − β^T xt being the residual at time t,
C_hc = n/(n − p) Σ_{t=1}^n et^2 xt xt^T.
The estimator of Newey and West (1987) is
Σ_{β,hac} = [Σ_{t=1}^n xt xt^T]^{−1} C_hac [Σ_{t=1}^n xt xt^T]^{−1},
where C_hac is given by
C_hac = Σ_{t=1}^n et^2 xt xt^T + Σ_{j=1}^l wj Σ_{t=j+1}^n (xt et et−j xt−j^T + xt−j et−j et xt^T),
where l is a truncation parameter and wj is a weight function such as the Bartlett weight function defined by wj = 1 − j/(l + 1). Other weight functions can also be used. Newey and West (1987) showed that if l → ∞ and l^4/T → 0, then C_hac is a consistent estimator of C. Newey and West (1987) suggested choosing l to be the integer part of 4(n/100)^{1/4}, and Newey and West (1994) suggested using some adaptive (data-driven) methods to choose l; see Newey and West (1994) for details. In general, this estimator essentially uses a nonparametric method to estimate the covariance matrix of Σ_{t=1}^n et xt, and a class of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators
was introduced by Andrews (1991). For example, the Bartlett weight wj above can be replaced by wj = K(j/(l + 1)), where K(·) is a kernel function such as the truncated kernel K(x) = I(|x| ≤ 1), the Tukey-Hanning kernel K(x) = (1 + cos(π x))/2 for |x| ≤ 1, the Parzen kernel
K(x) = 1 − 6 x^2 + 6 |x|^3 for 0 ≤ |x| ≤ 1/2, K(x) = 2 (1 − |x|)^3 for 1/2 ≤ |x| ≤ 1, and K(x) = 0 otherwise,
and the quadratic spectral kernel
K(x) = [25/(12 π^2 x^2)] [sin(6π x/5)/(6π x/5) − cos(6π x/5)].
Andrews (1991) suggested using a data-driven method to select the bandwidth l: l = 2.66 (α T)^{1/5} for the Parzen kernel, l = 1.7462 (α T)^{1/5} for the Tukey-Hanning kernel, and l = 1.3221 (α T)^{1/5} for the quadratic spectral kernel, where
α = [4 Σ_{i=1}^k ρ_i^2 σ_i^4/(1 − ρ_i)^8] / [Σ_{i=1}^n σ_i^4/(1 − ρ_i)^4]
with ρi and σi being parameters estimated from an AR(1) model for ut = et xt.
Example 4.1: (Continuation of Example 3.2) For illustration, we consider the first differ-
enced interest rate series in Model II in Example 3.2. The t-ratio of the coefficient of ∆xt
is 104.63 if both serial correlation and conditional heteroscedasticity in residuals are ignored;
it becomes 46.73 when the HC estimator is used, and it reduces to 40.08 when the HAC
estimator is employed.
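A sketch of how these three t-ratios can be reproduced, assuming the fitted object fit1 = lm(y_diff~x_diff) from the code in Section 4.3:
library(sandwich)
b=coef(fit1)["x_diff"]
print(b/sqrt(diag(vcov(fit1)))["x_diff"]) # OLS t-ratio (iid errors assumed)
print(b/sqrt(diag(vcovHC(fit1,type="HC0")))["x_diff"]) # HC (White) t-ratio
print(b/sqrt(diag(vcovHAC(fit1)))["x_diff"]) # HAC t-ratio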
To use the HC or HAC estimator, we can use the package sandwich in R; the commands are vcovHC(), vcovHAC(), and meatHAC(). There is a set of functions implementing a class of kernel-based heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators as introduced by Andrews (1991). In vcovHC(), these estimators differ in their choice of the ωi in Ω = Var(e) = diag(ω1, · · · , ωn); an overview of the most important cases is given in the following:
const: ωi = σ^2
HC0: ωi = e_i^2
HC1: ωi = [n/(n − k)] e_i^2
HC2: ωi = e_i^2/(1 − hi)
HC3: ωi = e_i^2/(1 − hi)^2
HC4: ωi = e_i^2/(1 − hi)^{δi}
where hi = Hii are the diagonal elements of the hat matrix and δi = min{4, hi/h̄}, with h̄ the mean of the hi.
vcovHC(x, type = c("HC3", "const", "HC", "HC0", "HC1", "HC2", "HC4"),
omega = NULL, sandwich = TRUE, ...)
meatHC(x, type = , omega = NULL)
vcovHAC(x, order.by = NULL, prewhite = FALSE, weights = weightsAndrews,
adjust = TRUE, diagnostics = FALSE, sandwich = TRUE, ar.method = "ols",
data = list(), ...)
meatHAC(x, order.by = NULL, prewhite = FALSE, weights = weightsAndrews,
adjust = TRUE, diagnostics = FALSE, ar.method = "ols", data = list())
kernHAC(x, order.by = NULL, prewhite = 1, bw = bwAndrews,
kernel = c("Quadratic Spectral", "Truncated", "Bartlett", "Parzen",
"Tukey-Hanning"), approx = c("AR(1)", "ARMA(1,1)"), adjust = TRUE,
diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", tol = 1e-7,
data = list(), verbose = FALSE, ...)
weightsAndrews(x, order.by = NULL,bw = bwAndrews,
kernel = c("Quadratic Spectral","Truncated","Bartlett","Parzen",
"Tukey-Hanning"), prewhite = 1, ar.method = "ols", tol = 1e-7,
data = list(), verbose = FALSE, ...)
bwAndrews(x,order.by=NULL,kernel=c("Quadratic Spectral", "Truncated",
"Bartlett","Parzen","Tukey-Hanning"), approx=c("AR(1)", "ARMA(1,1)"),
weights = NULL, prewhite = 1, ar.method = "ols", data = list(), ...)
Also, there is a set of functions implementing the Newey and West (1987, 1994) heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimators.
NeweyWest(x, lag = NULL, order.by = NULL, prewhite = TRUE, adjust = FALSE,
diagnostics = FALSE, sandwich = TRUE, ar.method = "ols", data = list(),
verbose = FALSE)
bwNeweyWest(x, order.by = NULL, kernel = c("Bartlett", "Parzen",
"Quadratic Spectral", "Truncated", "Tukey-Hanning"), weights = NULL,
prewhite = 1, ar.method = "ols", data = list(), ...)
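A usage sketch for these functions, again assuming fit1 from the code in Section 4.3:
library(sandwich)
# Newey-West (1987) standard errors with a fixed lag and Bartlett weights
print(sqrt(diag(NeweyWest(fit1,lag=4,prewhite=F))))
# Andrews (1991) kernel HAC with the quadratic spectral kernel
print(sqrt(diag(kernHAC(fit1,kernel="Quadratic Spectral",bw=bwAndrews))))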
4.2 Details (see the papers by Zeileis (2004, 2006))
4.3 Computer Codes
############################################
# This is Example 4.1 of using HC and HAC
##########################################
library(sandwich) # HC and HAC are in the package "sandwich"
library(zoo)
z<-read.table("c:/res-teach/xiamen12-06/data/ex4-1.txt",header=F)
x=z[,1]
y=z[,2]
x_diff=diff(x)
y_diff=diff(y)
# Fit a simple regression model and examine the residuals
fit1=lm(y_diff~x_diff)
print(summary(fit1))
e1=fit1$resid
# Heteroskedasticity-Consistent Covariance Matrix Estimation
#hc0=vcovHC(fit1,type="const")
#print(sqrt(diag(hc0)))
# type=c("const","HC","HC0","HC1","HC2","HC3","HC4")
# HC0 is the White estimator
hc1=vcovHC(fit1,type="HC0")
print(sqrt(diag(hc1)))
#Heteroskedasticity and autocorrelation consistent (HAC) estimation
#of the covariance matrix of the coefficient estimates in a
#(generalized) linear regression model.
hac1=vcovHAC(fit1,sandwich=T)
print(sqrt(diag(hac1)))
4.4 References
Andrews, D.W.K. (1991). Heteroskedasticity and autocorrelation consistent covariance matrix estimation. Econometrica, 59, 817-858.
Eicker, F. (1967). Limit theorems for regression with unequal and dependent errors. In Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability (L. LeCam and J. Neyman, eds.), University of California Press, Berkeley.
Newey, W.K. and K.D. West (1987). A simple, positive semi-definite, heteroskedasticity and autocorrelation consistent covariance matrix. Econometrica, 55, 703-708.
Newey, W.K. and K.D. West (1994). Automatic lag selection in covariance matrix estimation. Review of Economic Studies, 61, 631-653.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.
Zeileis, A. (2004). Econometric computing with HC and HAC covariance matrix estimators. Journal of Statistical Software, 11, Issue 10.
Zeileis, A. (2006). Object-oriented computation of sandwich estimators. Journal of Statistical Software, 16, 1-16.
Chapter 5
Seasonal Time Series Models
5.1 Characteristics of Seasonality
When time series (particularly, economic and financial time series) are observed each day or month or quarter, it is often the case that such a series displays a seasonal pattern (deterministic cyclical behavior). Similar to the feature of trend, there is no precise definition of seasonality. Usually we refer to seasonality when observations in certain seasons display strikingly different features from other seasons. For example, retail sales are always large in the fourth quarter (because of the Christmas spending) and small in the first quarter, as can be observed from Figure 5.1. It may also be possible that seasonality is reflected in the variance of a time series. For example, for daily observed stock market returns the volatility often seems highest on Mondays, basically because investors have to digest three days of news instead of only one day. For more details, see the book by Taylor (2005, §4.5).
Example 5.1: For Example 3.1, the data shown in Figure 3.1 represent quarterly earnings per share for the American company Johnson & Johnson from the fourth quarter of 1970 to the first quarter of 1980. It is easy to note some very nonstationary behavior in this series that cannot be eliminated completely by differencing or detrending because of the larger fluctuations that occur near the end of the record when the earnings are higher. The right panel of Figure 3.1 shows the log-transformed series, and we note that the later peaks have been attenuated so that the variance of the transformed series seems more stable. One would have to eliminate the trend still remaining in the above series to obtain stationarity. For more details on the current analyses of this series, see the later analyses and the papers by Burman and Shumway (1998) and Cai and Chen (2006).
Example 5.2: In this example we consider the monthly US retail sales series (not seasonally
adjusted) from January of 1967 to December of 2000 (in billions of US dollars). The data
can be downloaded from the web site at http://marketvector.com. The U.S. retail sales index
Figure 5.1: US Retail Sales Data from 1967-2000.
is one of the most important indicators of the US economy. There is a vast literature on the study of seasonal series like this one; see, e.g., Franses (1996, 1998), Ghysels and Osborn (2001), and Cai and Chen (2006). From Figure 5.1, we can observe that the
peaks occur in December and we can say that retail sales display seasonality. Also, it can be
observed that the trend is basically increasing but nonlinearly. The same phenomenon can
be observed from Figure 3.1 for the quarterly earnings for Johnson & Johnson.
If simple graphs are not informative enough to highlight possible seasonal variation, a formal regression model can be used. For example, one might consider the following regression model with seasonal dummy variables:
∆ yt = yt − yt−1 = Σ_{j=1}^s βj Dj,t + εt,
where Dj,t is a seasonal dummy variable and s is the number of seasons. Of course, one
can use a seasonal ARIMA model, denoted by ARIMA(p, d, q)× (P,D,Q)s, which will be
discussed later.
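A minimal sketch of the dummy-variable regression above for a quarterly series y (s = 4):
dy=diff(y) # change series
s=4
season=factor(rep(1:s,length.out=length(dy))) # season labels 1,...,s recycled
fit=lm(dy~season-1) # one coefficient beta_j per season, no intercept
print(summary(fit))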
Example 5.3: In this example, we consider a time series with pronounced seasonality displayed in Figure 5.2, which shows the logs of four-weekly advertising expenditures on radio and television in The Netherlands for 1978.01−1994.13. For these two marketing time series one
Figure 5.2: Four-weekly advertising expenditures on radio and television in The Netherlands, 1978.01 − 1994.13.
can observe clearly that the television advertising displays quite some seasonal fluctuation
throughout the entire sample and the radio advertising has seasonality only for the last five
years. Also, there seems to be a structural break in the radio series around observation
53. This break is related to an increase in radio broadcasting minutes in January 1982.
Furthermore, there is visual evidence that the trend changes over time.
Generally, it appears that many seasonally observed time series from business and economics as well as other applied fields display seasonality in the sense that the observations in certain seasons have properties that differ from those of the data points in other seasons. A second feature of many seasonal time series is that the seasonality changes over time, as studied by Cai and Chen (2006). Sometimes these changes appear abrupt, as is the case for advertising on the radio in Figure 5.2, and sometimes such changes occur only slowly. To capture these phenomena, Cai and Chen (2006) proposed a more general flexible seasonal effect model having the following form:
yij = α(ti) + βj(ti) + eij, i = 1, . . . , n, j = 1, . . . , s,
where yij = y(i−1)s+j, ti = i/n, α(·) is a (smooth) common trend function in [0, 1], βj(·) are (smooth) seasonal effect functions in [0, 1], either fixed or random, subject to a set of
constraints, and the error term eij is assumed to be stationary. For more details, see Cai
and Chen (2006).
5.2 Modeling
Some economic and financial as well as environmental time series such as quarterly earning
per share of a company exhibits certain cyclical or periodic behavior; see the later chapters
on more discussions on cycles and periodicity. Such a time series is called a seasonal
(deterministic cycle) time series. Figure 3.1 shows the time plot of quarterly earning per share
of Johnson and Johnson from the first quarter of 1960 to the last quarter of 1980. The data
possess some special characteristics. In particular, the earning grew exponentially during
the sample period and had a strong seasonality. Furthermore, the variability of earning
increased over time. The cyclical pattern repeats itself every year so that the periodicity of
the series is 4. If monthly data are considered (e.g., monthly sales of Wal-Mart Stores), then
the periodicity is 12. Seasonal time series models are also useful in pricing weather-related
derivatives and energy futures.
Analysis of seasonal time series has a long history. In some applications, seasonality is of
secondary importance and is removed from the data, resulting in a seasonally adjusted time
series that is then used to make inference. The procedure to remove seasonality from a time
series is referred to as seasonal adjustment. Most economic data published by the U.S.
government are seasonally adjusted (e.g., the growth rate of the gross domestic product and the
unemployment rate). In other applications such as forecasting, seasonality is as important
as other characteristics of the data and must be handled accordingly. Because forecasting
is a major objective of economic and financial time series analysis, we focus on the latter
approach and discuss some econometric models that are useful in modeling seasonal time
series.
When the autoregressive, differencing, or seasonal moving average behavior seems to
occur at multiples of some underlying period s, a seasonal ARIMA series may result. The
seasonal nonstationarity is characterized by slow decay at multiples of s and can often be eliminated by a seasonal differencing operator of the form ∆_s^D xt = (1 − L^s)^D xt. For example, when we have monthly data, it is reasonable that a yearly phenomenon will induce s = 12, and the ACF will be characterized by slowly decaying spikes at 12, 24, 36, 48, · · ·; we can obtain a stationary series by transforming with the operator (1 − L^{12}) xt = xt − xt−12, which is the difference between the current month and the value one year, or 12 months, ago.
If the autoregressive or moving average behavior is seasonal at period s, we define formally
the operators
Φ(L^s) = 1 − Φ1 L^s − Φ2 L^{2s} − · · · − ΦP L^{Ps} (5.1)
and
Θ(L^s) = 1 − Θ1 L^s − Θ2 L^{2s} − · · · − ΘQ L^{Qs}. (5.2)
The final form of the seasonal ARIMA(p, d, q) × (P,D,Q)s model is
Φ(L^s) φ(L) ∆_s^D ∆^d xt = Θ(L^s) θ(L) wt. (5.3)
Note that one special model of (5.3) is ARIMA(0, 1, 1) × (0, 1, 1)s, that is,
(1 − L^s)(1 − L) xt = (1 − θ1 L)(1 − Θ1 L^s) wt.
This model is referred to as the airline model or multiplicative seasonal model in the
literature; see Box and Jenkins (1970), Box, Jenkins, and Reinsel (1994, Chapter 9), and
Brockwell and Davis (1991). It has been found to be widely applicable in modeling seasonal
time series. The AR part of the model simply consists of the regular and seasonal differences,
whereas the MA part involves two parameters.
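The airline model can be fitted directly with the R command arima(); a minimal sketch for a monthly series x (s = 12):
fit=arima(x,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=12))
print(fit) # theta_1 and Theta_1 together with their standard errors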
We may also note the following properties, which parallel those given earlier for ordinary ARMA models.
Property 5.1: The ACF of a seasonally non-stationary time series decays very slowly at
lag multiples s, 2s, 3s, · · ·, with zeros in between, where s denotes a seasonal period, usually
4 for quarterly data or 12 for monthly data. The PACF of a non-stationary time series tends
to have a peak very near unity at lag s.
Property 5.2: For a seasonal autoregressive series of order P , the partial autocorrelation
function Φhh as a function of lag h has nonzero values at s, 2s, 3s, · · ·, Ps, with zeros in
between, and is zero for h > Ps, the order of the seasonal autoregressive process. There
should be some exponential decay.
Property 5.3: For a seasonal moving average series of order Q, note that the autocorrelation function (ACF) has nonzero values at s, 2s, 3s, · · ·, Qs and is zero for h > Qs.
Remark: Note that there is a built-in command in R called arima() which is a powerful tool for estimating and making inference for an ARIMA model. The command is
arima(x,order=c(0,0,0),seasonal=list(order=c(0,0,0),period=NA),
xreg=NULL,include.mean=TRUE, transform.pars=TRUE,fixed=NULL,init=NULL,
method=c("CSS-ML","ML","CSS"),n.cond,optim.control=list(),kappa=1e6)
See the manuals of R for details about this command.
Example 5.4: We illustrate by fitting the monthly birth series from 1948-1979 shown in
Figure 5.3. The period encompasses the boom that followed the Second World War and there
Figure 5.3: Number of live births 1948(1) − 1979(1) and residuals from models with a first difference, a first difference and a seasonal difference of order 12, and a fitted ARIMA(0, 1, 1) × (0, 1, 1)12 model.
is the expected rise which persists for about 13 years followed by a decline to around 1974.
The series appears to have long-term swings, with seasonal effects superimposed. The long-
term swings indicate possible non-stationarity and we verify that this is the case by checking
the ACF and PACF shown in the top panel of Figure 5.4. Note that by Property 5.1,
slow decay of the ACF indicates non-stationarity and we respond by taking a first difference.
The results shown in the second panel of Figure 5.3 indicate that the first difference has eliminated the strong low frequency swing. The ACF, shown in the second panel from the top in Figure 5.4, shows peaks at 12, 24, 36, 48, · · ·, with no decay. This behavior implies
Figure 5.4: Autocorrelation functions and partial autocorrelation functions for the birth series (top two panels), the first difference (second two panels), an ARIMA(0, 1, 0) × (0, 1, 1)12 model (third two panels), and an ARIMA(0, 1, 1) × (0, 1, 1)12 model (last two panels).
seasonal non-stationarity, by Property 5.1 above, with s = 12. A seasonal difference of
the first difference generates an ACF and PACF in Figure 5.4 that we expect for stationary
series.
Taking the seasonal difference of the first difference gives a series that looks stationary
and has an ACF with peaks at 1 and 12 and a PACF with a substantial peak at 12 and lesser peaks at 24, 36, · · ·. This suggests trying either a first order moving average term, or a first order seasonal moving average term with s = 12, by Property 5.3 above. We
choose to eliminate the largest peak first by applying a first-order seasonal moving average
model with s = 12. The ACF and PACF of the residual series from this model, i.e. from ARIMA(0, 1, 0) × (0, 1, 1)12, written as (1 − L)(1 − L^{12}) xt = (1 − Θ1 L^{12}) wt, are shown in the
fourth panel from the top in Figure 5.4. We note that the peak at lag one is still there, with
attending exponential decay in the PACF. This can be eliminated by fitting a first-order
moving average term and we consider the model ARIMA(0, 1, 1) × (0, 1, 1)12, written as
(1 − L)(1 − L^{12}) xt = (1 − θ1 L)(1 − Θ1 L^{12}) wt.
The ACF of the residuals from this model is relatively well behaved, with a number of peaks either near or exceeding the 95% test of no correlation. Fitting this final ARIMA(0, 1, 1) × (0, 1, 1)12 model leads to
(1 − L)(1 − L^{12}) xt = (1 − 0.4896 L)(1 − 0.6844 L^{12}) wt
with AICC = 4.95, R2 = 0.9804^2 = 0.961, and p-values (0.000, 0.000). The ARIMA search leads to the model
(1 − L)(1 − L^{12}) xt = (1 − 0.4088 L − 0.1645 L^2)(1 − 0.6990 L^{12}) wt,
yielding AICC = 4.92 and R2 = 0.981^2 = 0.962, slightly better than the ARIMA(0, 1, 1) × (0, 1, 1)12 model. Evaluating these latter models leads to the conclusion that the extra parameters do not add a practically substantial amount to the predictability. The model is expanded as
xt = xt−1 + xt−12 − xt−13 + wt − θ1 wt−1 − Θ1 wt−12 + θ1 Θ1 wt−13.
The forecasts are
x^t_{t+1} = xt + xt−11 − xt−12 − θ1 wt − Θ1 wt−11 + θ1 Θ1 wt−12
and
x^t_{t+2} = x^t_{t+1} + xt−10 − xt−11 − Θ1 wt−10 + θ1 Θ1 wt−11.
Continuing in the same manner, we obtain
x^t_{t+12} = x^t_{t+11} + xt − xt−1 − Θ1 wt + θ1 Θ1 wt−1
for the 12 month forecast.
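In R, these recursions are carried out by predict(); for example, using the fitted object fit5 (the ARIMA(0,1,1)X(0,1,1)_12 model) from the code in Section 5.4:
fc=predict(fit5,n.ahead=12) # 1- to 12-step ahead forecasts
print(fc$pred) # point forecasts
print(fc$pred+1.96*fc$se) # approximate upper 95% limits
print(fc$pred-1.96*fc$se) # approximate lower 95% limits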
Example 5.5: Figure 5.5 shows the autocorrelation function of the log-transformed J&J
earnings series that is plotted in Figure 3.1 and we note the slow decay indicating the
nonstationarity which has already been obvious in the Chapter 3 discussion. We may also
compare the ACF with that of a random walk, and note the close similarity. The partial
autocorrelation function is very high at lag one which, under ordinary circumstances, would
indicate a first order autoregressive AR(1) model, except that, in this case, the value is
close to unity, indicating a root close to 1 on the unit circle. The only question would be
whether differencing or detrending is the better transformation to stationarity. Following
in the Box-Jenkins tradition, differencing leads to the ACF and PACF shown in the second
panel and no simple structure is apparent. To force a next step, we interpret the peaks at 4,
8, 12, 16, · · ·, as contributing to a possible seasonal autoregressive term, leading to a possible
ARIMA(0, 1, 0) × (1, 0, 0)4 and we simply fit this model and look at the ACF and PACF of
the residuals, shown in the third two panels. The fit improves somewhat, with significant
peaks still remaining at lag 1 in both the ACF and PACF. The peak in the ACF seems more
isolated and there remains some exponentially decaying behavior in the PACF, so we try a
model with a first-order moving average. The bottom two panels show the ACF and PACF
of the resulting ARIMA(0, 1, 1)×(1, 0, 0)4 and we note only relatively minor excursions above
and below the 95% intervals under the assumption that the theoretical ACF is white noise.
The final model suggested is (yt = log xt)
(1 − Φ1 L^4)(1 − L) yt = (1 − θ1 L) wt, (5.4)
where Φ1 = 0.820(0.058), θ1 = 0.508(0.098), and σ2w = 0.0086. The model can be written in
forecast form as
yt = yt−1 + Φ1(yt−4 − yt−5) + wt − θ1wt−1.
The residual plot of the above model is given in the left bottom panel of Figure 5.6. To forecast the original series for, say, 4 quarters, we compute the forecast limits for yt = log xt and then exponentiate, i.e., x^t_{t+h} = exp(y^t_{t+h}).
Figure 5.5: Autocorrelation functions (ACF) and partial autocorrelation functions (PACF) for the log J&J earnings series (top two panels), the first difference (second two panels), the ARIMA(0, 1, 0) × (1, 0, 0)4 model (third two panels), and the ARIMA(0, 1, 1) × (1, 0, 0)4 model (last two panels).
Based on the exact likelihood method, Tsay (2005) considered the following seasonal ARIMA(0, 1, 1) × (0, 1, 1)4 model:
(1 − L)(1 − L^4) yt = (1 − 0.678 L)(1 − 0.314 L^4) wt, (5.5)
with σ2w = 0.089, where the standard errors of the two MA parameters are 0.080 and 0.101, respectively. The Ljung-Box statistics of the residuals give Q(12) = 10.0 with p-value 0.44. The model appears to be adequate. The ACF and PACF of the ARIMA(0, 1, 1) × (0, 1, 1)4
Figure 5.6: ACF and PACF for the ARIMA(0, 1, 1) × (0, 1, 1)4 model (top two panels) and the residual plots of the ARIMA(0, 1, 1) × (1, 0, 0)4 model (left bottom panel) and the ARIMA(0, 1, 1) × (0, 1, 1)4 model (right bottom panel).
model are given in the top two panels of Figure 5.6 and the residual plot is displayed in
the right bottom panel of Figure 5.6. Based on the comparison of the ACF and PACF of the two models (5.4) and (5.5) [the last two panels of Figure 5.5 and the top two panels of Figure 5.6], it seems that the ARIMA(0, 1, 1) × (0, 1, 1)4 model in (5.5) might perform better than the ARIMA(0, 1, 1) × (1, 0, 0)4 model in (5.4).
To illustrate the forecasting performance of the seasonal model in (5.5), we re-estimate
the model using the first 76 observations and reserve the last eight data points for fore-
casting evaluation. We compute 1-step to 8-step ahead forecasts and their standard errors
of the fitted model at the forecast origin t = 76. An anti-log transformation is taken to
obtain forecasts of earning per share using the relationship between normal and log-normal
distributions. Figure 2.15 in Tsay (2005, p.77) shows the forecast performance of the model,
where the observed data are in solid line, point forecasts are shown by dots, and the dashed
lines show 95% interval forecasts. The forecasts show a strong seasonal pattern and are close
to the observed data. For more comparisons of forecasts using different models, including semiparametric and nonparametric models, the reader is referred to the books by Shumway (1988) and Shumway and Stoffer (2000) and the papers by Burman and Shumway (1998)
and Cai and Chen (2006).
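A sketch of this forecasting exercise, assuming the log series y_log from the code in Section 5.4:
fit=arima(y_log[1:76],order=c(0,1,1),
seasonal=list(order=c(0,1,1),period=4),method="ML")
fc=predict(fit,n.ahead=8)
print(exp(fc$pred)) # anti-log point forecasts of earnings per share
# (a lognormal mean correction exp(fc$se^2/2) could also be applied)
print(exp(fc$pred+1.96*fc$se)) # approximate upper 95% interval forecasts
print(exp(fc$pred-1.96*fc$se)) # approximate lower 95% interval forecasts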
When the seasonal pattern of a time series is stable over time (e.g., close to a deterministic
function), dummy variables may be used to handle the seasonality. This approach is taken
by some analysts. However, deterministic seasonality is a special case of the multiplicative
seasonal model discussed before. Specifically, if Θ1 = 1, then the model contains a deterministic
seasonal component. Consequently, the same forecasts are obtained by using either dummy
variables or a multiplicative seasonal model when the seasonal pattern is deterministic. Yet the use of dummy variables can lead to inferior forecasts if the seasonal pattern is not deterministic. In practice, we recommend that the exact likelihood method be used to
estimate a multiplicative seasonal model, especially when the sample size is small or when
there is the possibility of having a deterministic seasonal component.
Example 5.6: To examine possibly deterministic seasonal behavior, consider the monthly simple return of the CRSP Decile 1 index from January 1960 to December 2003, for 528 observations. The series is shown in the left top panel of Figure 5.7, and the time series does not show any clear pattern of seasonality. However, the sample ACF of the return series shown in the left
Figure 5.7: Monthly simple return of the CRSP Decile 1 index from January 1960 to December 2003: time series plot of the simple return (left top panel), time series plot of the simple return after adjusting for the January effect (right top panel), the ACF of the simple return (left bottom panel), and the ACF of the adjusted simple return (right bottom panel).
bottom panel of Figure 5.7 contains significant lags at 12, 24, and 36 as well as at lag 1. If seasonal ARIMA models are entertained, a model of the form
(1 − φ1 L)(1 − Φ1 L^{12}) xt = α + (1 − Θ1 L^{12}) wt
is identified, where xt is the monthly simple return. Using the conditional likelihood, the fitted model is
(1 − 0.25 L)(1 − 0.99 L^{12}) xt = 0.0004 + (1 − 0.92 L^{12}) wt
with σw = 0.071. The MA coefficient is close to unity, indicating that the fitted model is close to being non-invertible. If the exact likelihood method is used, we have
(1 − 0.264 L)(1 − 0.996 L^{12}) xt = 0.0002 + (1 − 0.999 L^{12}) wt
with σw = 0.067. Cancellation between the seasonal AR and MA factors is clearly seen. This high-
lights the usefulness of using the exact likelihood method, and the estimation result suggests
that the seasonal behavior might be deterministic. To further confirm this assertion, we
define the dummy variable for January, that is,
Jt = 1 if t is January, and Jt = 0 otherwise,
and employ the simple linear regression
xt = β0 + β1 Jt + et.
The right panels of Figure 5.7 show the time series plot and the ACF of the residual series of the prior simple linear regression. From the ACF, there is no significant serial correlation at any multiple of 12, suggesting that the seasonal pattern has been successfully removed by the January dummy variable. Consequently, the seasonal behavior in the monthly simple return of Decile 1 is due to the January effect.
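This two-step check is implemented in the code of Section 5.4; in sketch form, assuming the return series decile1 and the January dummy jan defined there:
fit=arima(decile1,xreg=jan,order=c(0,0,1)) # regression with MA(1) errors
acf(fit$resid,lag=40) # the seasonal lags should now be insignificant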
5.3 Nonlinear Seasonal Time Series Models
See the papers by Burman and Shumway (1998) and Cai and Chen (2006) and the books
by Franses (1998) and Ghysels and Osborn (2001). The reading materials are the papers by
Burman and Shumway (1998) and Cai and Chen (2006).
5.4 Computer Codes
##########################################
# This is Example 5.2 for retail sales data
#########################################
y=read.table("c:/res-teach/xiamen12-06/data/ex5-2.txt",header=F)
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.1.eps",
horizontal=F,width=6,height=6)
ts.plot(y,type="l",lty=1,ylab="",xlab="")
dev.off()
############################################
# This is Example 5.3 for the marketing data
############################################
text_tv=c("television")
text_radio=c("radio")
data<-read.table("c:/res-teach/xiamen12-06/data/ex5-3.txt",header=T)
TV=log(data[,1])
RADIO=log(data[,2])
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.2.eps",
horizontal=F,width=6,height=6)
ts.plot(cbind(TV,RADIO),type="l",lty=c(1,2),col=c(1,2),ylab="",xlab="")
text(20,10.5,text_tv)
text(165,8,text_radio)
dev.off()
####################################################################
# This is Example 5.4 for the birth series
####################################
x<-matrix(scan("c:/res-teach/xiamen12-06/data/ex5-4.txt"),byrow=T,ncol=1)
n=length(x)
x_diff=diff(x)
x_diff_12=diff(x_diff,lag=12)
fit1=arima(x,order=c(0,0,0),seasonal=list(order=c(0,0,0)),include.mean=F)
resid_1=fit1$resid
fit2=arima(x,order=c(0,1,0),seasonal=list(order=c(0,0,0)),include.mean=F)
resid_2=fit2$resid
fit3=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,0),period=12),
include.mean=F)
resid_3=fit3$resid
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.4.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(5,2),mex=0.4,bg="light pink")
acf(resid_1, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="ACF",cex=0.7)
pacf(resid_1,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="PACF",cex=0.7)
text(20,0.7,"data",cex=1.2)
acf(resid_2, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
# differenced data
pacf(resid_2,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)")
acf(resid_3, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
# seasonal difference of differenced data
pacf(resid_3,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)X(0,1,0)_12",cex=0.8)
fit4=arima(x,order=c(0,1,0),seasonal=list(order=c(0,1,1),
period=12),include.mean=F)
resid_4=fit4$resid
fit5=arima(x,order=c(0,1,1),seasonal=list(order=c(0,1,1),
period=12),include.mean=F)
resid_5=fit5$resid
acf(resid_4, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
# ARIMA(0,1,0)*(0,1,1)_12
pacf(resid_4,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,0)X(0,1,1)_12",cex=0.8)
acf(resid_5, ylab="", xlab="",ylim=c(-0.5,1),lag=60,main="")
# ARIMA(0,1,1)*(0,1,1)_12
pacf(resid_5,ylab="",xlab="",ylim=c(-0.5,1),lag=60,main="")
text(30,0.7,"ARIMA(0,1,1)X(0,1,1)_12",cex=0.8)
dev.off()
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.3.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light blue")
ts.plot(x,type="l",lty=1,ylab="",xlab="")
text(250,375, "Births")
ts.plot(x_diff,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
text(255,45, "First difference")
abline(0,0)
ts.plot(x_diff_12,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
# time series plot of the seasonal difference (s=12) of differenced data
text(225,40,"ARIMA(0,1,0)X(0,1,0)_12")
abline(0,0)
ts.plot(resid_5,type="l",lty=1,ylab="",xlab="",ylim=c(-50,50))
text(225,40, "ARIMA(0,1,1)X(0,1,1)_12")
abline(0,0)
dev.off()
########################
# This is Example 5.5
########################
y=read.table(’c:/res-teach/xiamen12-06/data/ex3-1.txt’,header=F)
n=length(y[,1])
y_log=log(y[,1]) # log of data
y_diff=diff(y_log) # first-order difference
y_diff_4=diff(y_diff,lag=4) # first-order seasonal difference
fit1=ar(y_log,aic=F,order.max=1) # fit an AR(1) model
#print(fit1)
library(tseries) # call library(tseries)
library(zoo)
fit1_test=adf.test(y_log)
# do Augmented Dickey-Fuller test for testing unit root
#print(fit1_test)
fit1=arima(y_log,order=c(0,0,0),seasonal=list(order=c(0,0,0)),
include.mean=F)
resid_21=fit1$resid
fit2=arima(y_log,order=c(0,1,0),seasonal=list(order=c(0,0,0)),
include.mean=F)
resid_22=fit2$resid # residual for ARIMA(0,1,0)*(0,0,0)
fit3=arima(y_log,order=c(0,1,0),seasonal=list(order=c(1,0,0),period=4),
include.mean=F,method=c("CSS"))
resid_23=fit3$resid # residual for ARIMA(0,1,0)*(1,0,0)_4
# note that this model is non-stationary so that "CSS" is used
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-5.5.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(4,2),mex=0.4,bg="light green")
acf(resid_21, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF",cex=0.7)
text(16,0.8,"log(J&J)")
pacf(resid_21,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF",cex=0.7)
acf(resid_22, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"First Difference")
pacf(resid_22,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
acf(resid_23, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"ARIMA(0,1,0)X(1,0,0,)_4",cex=0.8)
pacf(resid_23,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
fit4=arima(y_log,order=c(0,1,1),seasonal=list(order=c(1,0,0),
period=4),include.mean=F,method=c("CSS"))
resid_24=fit4$resid # residual for ARIMA(0,1,1)*(1,0,0)_4
# note that this model is non-stationary
#print(fit4)
fit4_test=Box.test(resid_24,lag=12, type=c("Ljung-Box"))
#print(fit4_test)
acf(resid_24, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="")
text(16,0.8,"ARIMA(0,1,1)X(1,0,0,)_4",cex=0.8)
# ARIMA(0,1,1)*(1,0,0)_4
pacf(resid_24,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="")
dev.off()
fit5=arima(y_log,order=c(0,1,1),seasonal=list(order=c(0,1,1),period=4),
include.mean=F,method=c("ML"))
resid_25=fit5$resid # residual for ARIMA(0,1,1)*(0,1,1)_4
#print(fit5)
fit5_test=Box.test(resid_25,lag=12, type=c("Ljung-Box"))
#print(fit5_test)
postscript(file="c:\\res-teach\\xiamen12-06\\figs\\fig-5.6.eps",
horizontal=F,width=6,height=6,bg="light grey")
par(mfrow=c(2,2),mex=0.4)
acf(resid_25, ylab="", xlab="",ylim=c(-0.5,1),lag=30,main="ACF")
text(16,0.8,"ARIMA(0,1,1)X(0,1,1)_4",cex=0.8)
# ARIMA(0,1,1)*(0,1,1)_4
pacf(resid_25,ylab="",xlab="",ylim=c(-0.5,1),lag=30,main="PACF")
ts.plot(resid_24,type="l",lty=1,ylab="",xlab="")
title(main="Residual Plot",cex=0.5)
text(40,0.2,"ARIMA(0,1,1)X(1,0,0)_4",cex=0.8)
abline(0,0)
ts.plot(resid_25,type="l",lty=1,ylab="",xlab="")
title(main="Residual Plot",cex=0.5)
text(40,0.18,"ARIMA(0,1,1)X(0,1,1)_4",cex=0.8)
abline(0,0)
dev.off()
#########################
# This is Example 5.6
########################
z<-matrix(scan("c:/res-teach/xiamen12-06/data/ex5-6.txt"),byrow=T,ncol=4)
decile1=z[,2]
# Model 1: an ARIMA(1,0,0)*(1,0,1)_12
fit1=arima(decile1,order=c(1,0,0),seasonal=list(order=c(1,0,1),
period=12),include.mean=T)
#print(fit1)
e1=fit1$resid
n=length(decile1)
m=n/12
jan=rep(c(1,0,0,0,0,0,0,0,0,0,0,0),m)
feb=rep(c(0,1,0,0,0,0,0,0,0,0,0,0),m)
mar=rep(c(0,0,1,0,0,0,0,0,0,0,0,0),m)
apr=rep(c(0,0,0,1,0,0,0,0,0,0,0,0),m)
may=rep(c(0,0,0,0,1,0,0,0,0,0,0,0),m)
jun=rep(c(0,0,0,0,0,1,0,0,0,0,0,0),m)
jul=rep(c(0,0,0,0,0,0,1,0,0,0,0,0),m)
aug=rep(c(0,0,0,0,0,0,0,1,0,0,0,0),m)
sep=rep(c(0,0,0,0,0,0,0,0,1,0,0,0),m)
oct=rep(c(0,0,0,0,0,0,0,0,0,1,0,0),m)
nov=rep(c(0,0,0,0,0,0,0,0,0,0,1,0),m)
dec=rep(c(0,0,0,0,0,0,0,0,0,0,0,1),m)
de=cbind(decile1[jan==1],decile1[feb==1],decile1[mar==1],decile1[apr==1],
decile1[may==1],decile1[jun==1],decile1[jul==1],decile1[aug==1],
decile1[sep==1],decile1[oct==1],decile1[nov==1],decile1[dec==1])
# Model 2: a simple regression model without correlated errors
# to see the effect from January
fit2=lm(decile1~jan)
e2=fit2$resid
#print(summary(fit2))
# Model 3: a regression model with correlated errors
fit3=arima(decile1,xreg=jan,order=c(0,0,1),include.mean=T)
e3=fit3$resid
#print(fit3)
postscript(file="c:/res-teach/xiamen12-06/figs/fig-5.7.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light yellow")
ts.plot(decile1,type="l",lty=1,col=1,ylab="",xlab="")
title(main="Simple Returns",cex=0.5)
abline(0,0)
ts.plot(e3,type="l",lty=1,col=1,ylab="",xlab="")
title(main="January-adjusted returns",cex=0.5)
abline(0,0)
acf(decile1, ylab="", xlab="",ylim=c(-0.5,1),lag=40,main="ACF")
acf(e3,ylab="",xlab="",ylim=c(-0.5,1),lag=40,main="ACF")
dev.off()
5.5 References
Box, G.E.P. and Jenkins, G.M. (1970). Time Series Analysis, Forecasting, and Control. Holden Day, San Francisco.
Box, G.E.P., G.M. Jenkins and G.C. Reinsel (1994). Time Series Analysis, Forecasting and Control. 3rd Edn. Englewood Cliffs, NJ: Prentice-Hall.
Brockwell, P.J. and Davis, R.A. (1991). Time Series: Theory and Methods. New York: Springer.
Burman, P. and R.H. Shumway (1998). Semiparametric modeling of seasonal time series. Journal of Time Series Analysis, 19, 127-145.
Cai, Z. and R. Chen (2006). Flexible seasonal time series models. Advances in Econometrics, 20B, 63-87.
Franses, P.H. (1998). Time Series Models for Business and Economic Forecasting. New York: Cambridge University Press.
Franses, P.H. and D. van Dijk (2000). Nonlinear Time Series Models for Empirical Finance. New York: Cambridge University Press.
Ghysels, E. and D.R. Osborn (2001). The Econometric Analysis of Seasonal Time Series. New York: Cambridge University Press.
Shumway, R.H. (1988). Applied Statistical Time Series Analysis. Englewood Cliffs, NJ: Prentice-Hall.
Shumway, R.H., A.S. Azari and Y. Pawitan (1988). Modeling mortality fluctuations in Los Angeles as functions of pollution and weather effects. Environmental Research, 45, 224-241.
Shumway, R.H. and D.S. Stoffer (2000). Time Series Analysis & Its Applications. New York: Springer-Verlag.
Tiao, G.C. and R.S. Tsay (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Annals of Statistics, 11, 856-871.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. John Wiley & Sons, New York.
Chapter 6
Robust and Quantile Regressions
There is a large amount of material covering both robust regression and linear quantile
regression. Here we give only some basic concepts and a brief sketch of the methodology.
For more detailed discussions, see the books by Huber (1981) and Koenker (2005).
6.1 Robust Regression
Robustness means that good statistical procedures should work well even if the underlying
assumptions are somewhat in error. For the classical regression model
\[
Y_i = \beta^T X_i + \varepsilon_i,
\]
the general form of a robust estimator is the minimizer of
\[
\sum_{i=1}^{n} \rho(Y_i - \beta^T X_i),
\]
where ρ(·) is a "loss" function to be specified. If ρ(z) = z² (quadratic), this is the usual
least squares criterion. If ρ(z) = |z|, it gives the least absolute deviation (LAD) estimator
(median regression). Another important choice is Huber's (1981) function
\[
\rho_c(z) = \begin{cases} z^2/2, & \text{if } |z| \le c, \\ c\,|z| - c^2/2, & \text{if } |z| > c, \end{cases}
\]
where c is a fixed constant, or the truncated quadratic
\[
\rho_{c,0}(z) = \begin{cases} z^2, & \text{if } |z| \le c, \\ c^2, & \text{if } |z| > c, \end{cases}
\]
and the corresponding estimator is called an M-estimator. There are many other choices of
ρ(·); see the book by Serfling (1980) for details. These choices of ρ(·) down-weight cases
with large residuals. For more details on robust regression, see the book by Huber (1981).
To implement robust regression in R, you can use the command rlm() (M-estimation) or
lqs() (resistant regression such as least trimmed squares) in the package MASS [library(MASS)].
rlm(formula, data, weights, ..., subset, na.action,
method = c("M", "MM", "model.frame"),
wt.method = c("inv.var", "case"),
model = TRUE, x.ret = TRUE, y.ret = FALSE, contrasts = NULL)
lqs(formula, data, ...,
method = c("lts", "lqs", "lms", "S", "model.frame"),
subset, na.action, model = TRUE,
x.ret = FALSE, y.ret = FALSE, contrasts = NULL)
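As a quick illustration (a sketch, not part of the original notes; the built-in R dataset
stackloss is used purely as an example), one can compare an M-estimator and a resistant
fit with their least squares counterpart:

library(MASS)
# Ordinary least squares versus Huber M-estimation and least trimmed squares;
# stackloss is a standard dataset shipped with R.
fit_ols <- lm(stack.loss ~ ., data = stackloss)
fit_m   <- rlm(stack.loss ~ ., data = stackloss)                 # Huber M-estimator
fit_lts <- lqs(stack.loss ~ ., data = stackloss, method = "lts") # least trimmed squares
summary(fit_m)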
6.2 Quantile Regression
The above LAD estimator is indeed a special case (τ = 1/2) of the following quantile
regression criterion: minimize
\[
\sum_{i=1}^{n} \rho_\tau(Y_i - \beta^T X_i),
\]
where ρ_τ(z) = z(τ − I_{z<0}) is the so-called "check" function, and I_A is the indicator
function of a set A. All the aforementioned loss functions are plotted in Figure 6.1 using R
(the R code for this figure can be found in Section 6.3). Indeed, the above problem can be
formulated as a general quantile regression: for any 0 < τ < 1,
\[
F_y(q_\tau(X_i) \mid X_i) = \tau,
\]
where F_y(y | X_i) is the conditional distribution of Y_i given X_i, and q_τ(X_i) = β_τ^T X_i,
which allows the coefficients to depend on τ. This linear quantile regression model was
proposed by Koenker and Bassett (1978, 1982), where more details can be found. There are
several advantages of using a quantile regression:
• A quantile regression does not require knowing the distribution of the
dependent variable.
• It does not require the symmetry of the measurement error.
• It can characterize heterogeneity.
• It can estimate the mean and variance simultaneously.
• It is a robust procedure.
There are many more.

Figure 6.1: Different loss functions: quadratic, Huber's ρ_c(·), ρ_{c,0}(·), ρ_{0.05}(·), LAD, and ρ_{0.95}(·), where c = 1.345.
Clearly, a quantile regression model includes the LAD and the classical mean regression
models as special cases. To see this, for the classical regression model
\[
Y_i = \beta^T X_i + \varepsilon_i,
\]
its τ-th conditional quantile is
\[
q_\tau(X_i) = \beta^T X_i + q_{\tau,i}(\varepsilon),
\]
where q_{τ,i}(ε) is the τ-th quantile of ε_i. For example, if ε_i = σ(X_i) u_i with iid u_i,
then q_{τ,i}(ε) = σ(X_i) q_{τ,u}, where q_{τ,u} is the τ-th quantile of u_i, and
\[
q_\tau(X_i) = \beta^T X_i + \sigma(X_i)\, q_{\tau,u}.
\]
This property can be used as an informative graphical device to detect whether σ(X_i) is
constant: plot two quantile functions q_{τ1}(X_i) and q_{τ2}(X_i) for two distinct values τ1 ≠ τ2.
If the two quantile curves (lines) are parallel, then σ(X_i) is a constant. Further, if
σ(X_i) is a linear function of X_i, then the quantile regression model can be used to model
conditional heteroscedasticity, such as the autoregressive conditional heteroscedastic (ARCH)
and generalized autoregressive conditional heteroscedastic (GARCH) models as well as
stochastic volatility (SV) models. For details on this aspect, see the papers by Koenker and
Zhao (1996) and Xiao (2006). Of course, one can also model nonlinear versions of the ARCH,
GARCH or SV models; for more details, see the book by Koenker (2005). Finally, another
application of conditional quantiles is the construction of prediction intervals for the next
value given a small section of recent past values in a stationary time series (Granger, White,
and Kamstra, 1989; Koenker, 1994; Zhou and Portnoy, 1996; Koenker and Zhao, 1996;
Taylor and Bunn, 1999). In particular, Granger, White, and Kamstra (1989), Koenker and
Zhao (1996), and Taylor and Bunn (1999) considered interval forecasting for parametric
autoregressive conditional heteroscedastic type models. For example, quantile regression can
be used to construct a prediction interval (q_{0.025}(X_t), q_{0.975}(X_t)) such that
P(q_{0.025}(X_t) ≤ Y_{t+1} ≤ q_{0.975}(X_t) | X_t) = 0.95.
To fit a linear quantile regression in R, one can use the command rq() in the package
quantreg. For a nonlinear parametric model, the command is nlrq(). For a nonparametric
quantile model in the univariate case, one can use the command lprq(), which implements
local polynomial estimation. For an additive quantile regression, one can use the commands
rqss() and qss().
rq(formula, tau=.5, data, subset, weights, na.action,
method="br", model = TRUE, contrasts, ...)
lprq(x, y, h, tau = .5, m = 50)
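As a quick illustration (a sketch, not part of the original notes; the engel dataset shipped
with quantreg is used purely as an example), several quantile lines can be fitted at once,
and roughly parallel lines would suggest a constant error scale σ(X):

library(quantreg)
data(engel)  # Engel food-expenditure data included in quantreg
# Fit quantile regression lines at several values of tau at once.
fit_rq <- rq(foodexp ~ income, tau = c(0.05, 0.25, 0.50, 0.75, 0.95), data = engel)
summary(fit_rq)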
6.3 Computer Codes
The following is the R code for making Figure 6.1.
# define functions
rho1=function(x)x^2
rho2=function(x,c)(x^2/2)*(abs(x)<=c)+(c*abs(x)-c^2/2)*(abs(x)>c) # Huber loss; x^2/2 inside so the two pieces match at |x|=c
rho3=function(x,c)x^2*(abs(x)<=c)+c^2*(abs(x)>c)
rho_tau=function(x,tau)x*(tau-(x<0))
z=seq(-2,2,by=0.1)
c=1.345
y=cbind(rho1(z),rho2(z,c),rho3(z,c),rho_tau(z,0.05),rho_tau(z,0.5),
rho_tau(z,0.95))
text1=c("quadratic","Huber","Huber0","tau=0.05","LAD","tau=0.95")
c1=c*rep(1,10)
c2=seq(0,c,length=10)
c3=c2^2
postscript(file="c:/res-teach/xiamen12-06/figs/fig-6.1.ps",
horizontal=F,width=6,height=6)
# output the graph as a ps or eps file
#win.graph() # show the graph on the screen
par(bg="light green")
matplot(z,y,type="l",lty=1:6,ylab="",xlab="",ylim=c(-0.2,4))
points(c1,c3,type="l")
points(-c1,c3,type="l")
title(main="Loss Functions",col.main="red",cex=0.5)
legend(0,4,text1,lty=1:6,col=1:6,cex=0.7)
text(-c,-0.2,"-c")
text(c,-0.2,"c")
abline(0,0)
dev.off()
6.4 References
Granger, C.W.J., White, H. and Kamstra, M. (1989). Interval forecasting: an analysis based upon ARCH-quantile estimators. Journal of Econometrics, 40, 87-96.
Huber, P.J. (1981). Robust Statistics. New York: Wiley.
Koenker, R. (1994). Confidence intervals for regression quantiles. In Proceedings of the Fifth Prague Symposium on Asymptotic Statistics (P. Mandl and M. Huskova, eds.), 349-359. Physica, Heidelberg.
Koenker, R. (2005). Quantile Regression. Cambridge University Press, New York.
Koenker, R. and G.W. Bassett (1978). Regression quantiles. Econometrica, 46, 33-50.
Koenker, R. and G.W. Bassett (1982). Robust tests for heteroscedasticity based on regression quantiles. Econometrica, 50, 43-61.
Koenker, R. and Q. Zhao (1996). Conditional quantile estimation and inference for ARCH models. Econometric Theory, 12, 793-813.
Serfling, R.J. (1980). Approximation Theorems of Mathematical Statistics. John Wiley, New York.
Taylor, J.W. and Bunn, D.W. (1999). A quantile regression approach to generating prediction intervals. Management Science, 45, 225-237.
Xiao, Z. (2006). Conditional quantile estimation for GARCH models. Working Paper, Department of Economics, Boston College.
Zhou, K.Q. and Portnoy, S.L. (1996). Direct use of regression quantiles to construct confidence sets in linear models. The Annals of Statistics, 24, 287-306.
Chapter 7
How to Analyze Boston House Price Data?
7.1 Description of Data
The well-known Boston house price data set1 consists of 14 variables, collected on each
of 506 different houses from a variety of locations. The data set was used originally by
Harrison and Rubinfeld (1978) and was re-analyzed in Belsley, Kuh and Welsch (1980) with
various transformations (see the table on pages 244-261 there). The variables, denoted by
X1, ..., X13 and Y, are, in order:
CRIM per capita crime rate by town
ZN proportion of residential land zoned for lots over 25,000 sq.ft.
INDUS proportion of non-retail business acres per town
CHAS Charles River dummy variable (= 1 if tract bounds river; 0
otherwise)
NOX nitric oxides concentration (parts per 10 million)
RM average number of rooms per dwelling
AGE proportion of owner-occupied units built prior to 1940
DIS weighted distances to five Boston employment centers
RAD index of accessibility to radial highways
TAX full-value property-tax rate per 10,000USD
PTRATIO pupil-teacher ratio by town
B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
1This dataset can be downloaded from the web site at http://lib.stat.cmu.edu/datasets/boston.
LSTAT lower status of the population
MEDV Median value of owner-occupied homes in $1000’s
The dependent variable is Y, the median value of owner-occupied homes in $1,000's
(house price). The major factors possibly affecting the house prices used in the literature
are:
X13=proportion of population of lower educational status,
X6=the average number of rooms per house,
X1=the per capita crime rate,
X10=the full property tax rate,
and
X11=the pupil/teacher ratio.
For the complete description of all 14 variables, see Harrison and Rubinfeld (1978); see
Gilley and Pace (1996) for corrections.
7.2 Analysis Methods
7.2.1 Linear Models
Harrison and Rubinfeld (1978) were the first to analyze this data set, using a standard
regression model of Y versus all 13 variables, including some higher-order terms or
transformations of Y and the Xj's. The purpose of their study was to see whether pollution
affects housing prices, via a hedonic pricing methodology. Belsley, Kuh and Welsch (1980)
used this data set to illustrate the effects of using robust regression and outlier detection
strategies. From these results, we might conclude that the model might not be linear and
there might exist outliers. Also, Pace and Gilley (1997) added a georeferencing idea (spatial
statistics or econometrics) and used a spatial estimation method for this data set.
Exercise: Please use all possible methods to explore this dataset to see what is the best
linear model (including higher-order terms) you can obtain; a starting sketch is given below.
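As a possible starting point (a sketch only, not the intended solution; the file path and
column layout follow the code in Section 7.3):

# A baseline linear fit with the five major covariates listed above.
data <- read.table("c:/res-teach/xiamen12-06/data/ex7-1.txt")
y <- data[, 14]; x1 <- data[, 1]; x6 <- data[, 6]
x10 <- data[, 10]; x11 <- data[, 11]; x13 <- data[, 13]
fit_lm <- lm(log(y) ~ x13 + x6 + log(x1) + x10 + x11)
summary(fit_lm)  # try adding higher-order terms and transformations from here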
7.2.2 Nonparametric Models
Mean Additive Models
There have been several papers devoted to the analysis of this dataset using some non-
parametric methods. For example, Breiman and Friedman (1985), Pace (1993), Chaudhuri,
Doksum and Samarov (1997), and Opsomer and Ruppert (1998) used four covariates: X6,
X10, X11 and X13 or their transformations (including the transformation on Y ) to fit the
data through a mean additive regression model such as
\[
\log(Y) = \mu + g_1(X_6) + g_2(X_{10}) + g_3(X_{11}) + g_4(X_{13}) + \varepsilon, \qquad (7.1)
\]
where the additive components g_j(·) are unspecified smooth functions. Pace (1993) and
Chaudhuri, Doksum and Samarov (1997) also considered the nonparametric estimation of the
first derivative of each additive component, which measures how much the response changes
as one covariate is perturbed while the other covariates are held fixed. Let us use model (7.1)
to fit the Boston house price data. The results are summarized in Figure 7.1 (the R code
can be found in Section 7.3).
To fit an additive model or a partially additive model in R, the function is gam() in the
package gam. For details, please look at the help command help(gam) after loading the
package gam [library(gam)].
gam(formula, family = gaussian, data, weights, subset, na.action,
start, etastart, mustart, control = gam.control(...),
model=FALSE, method, x=FALSE, y=TRUE, ...)
Note that the function gam() also allows one to fit a semi-parametric additive model such as
\[
\log(Y) = \mu + g_1(X_6) + \beta_2 X_{10} + \beta_3 X_{11} + \beta_4 X_{13} + \varepsilon, \qquad (7.2)
\]
which can be done by entering some components without a smoother (e.g., x10 linearly
instead of lo(x10)).
Fan and Jiang (2005) used the generalized likelihood ratio (GLR) test [see Cai, Fan and Li
(2000), Cai, Fan and Yao (2000) and Fan, Zhang and Zhang (2001)] to test whether any
component is linear or insignificant. In other words, they tested an additive model against
a partially additive model or a smaller additive model; that is, H0: gj(x) = a + bx or
H0: gj(x) = 0. Let us use model (7.2) to fit the Boston house price data. The results are
summarized in Figure 7.2 (the R code can be found in Section 7.3).
Figure 7.1: The results from model (7.1): the estimated additive components lo(x6), lo(x10), lo(x11), and lo(x13).
Quantile Regression Approaches
Some existing analyses (e.g., Breiman and Friedman, 1985) using mean regression models
concluded that most of the variation seen in housing prices in the restricted data set can be
explained by two major variables: X6 and X13. Indeed, the correlation coefficients between
Y and X13 and between Y and X6 are −0.7377 and 0.6954, respectively. Figure 7.3 gives
the scatter plots of the house price versus the covariates X13, X6, X1 and X*1, respectively.
Two interesting features of this dataset are that the response variable is the median price
of a home in a given area and that the distributions of Y and the major covariate X13 are
skewed (look at the density estimates; see Figure 7.2(d) for the density estimate of Y, which
can be obtained using the command density() in R). To overcome the skewness, one way is
to use a transformation such as the Box-Cox transformation mentioned above. Another (and
better) way is to use quantile regression.
Figure 7.2: (a) Residual plot for model (7.1). (b) Plot of g1(x6) versus x6. (c) Residual plot for model (7.2). (d) Density estimate of Y.

Therefore, Yu and Lu (2004) employed the additive quantile technique and fitted the
following additive quantile model to the data: for any 0 < τ < 1,
\[
q_\tau(X) = \delta_\tau + \theta_{\tau,1}(X_6) + \theta_{\tau,2}(X^{*}_{10}) + \theta_{\tau,3}(X_{11}) + \theta_{\tau,4}(X^{*}_{13}), \qquad (7.3)
\]
where X*10 = log(X10), X*13 = log(X13), and the additive components θ_{τ,j}(·) are unspecified
smooth functions. Unfortunately, the detailed graphs are not provided and you are asked to
write code for model (7.3) to present the computational results, including graphs. Note that
you can use the command rqss() in R to compute the estimated values of each component
in model (7.3); a sketch is given below.
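As one possible sketch (not the author's code; the smoothing parameters are left at their
defaults, and the file path follows the convention of Section 7.3):

library(quantreg)
# Median (tau = 0.5) fit of the additive quantile model (7.3) via rqss().
data <- read.table("c:/res-teach/xiamen12-06/data/ex7-1.txt")
df <- data.frame(y = data[, 14], x6 = data[, 6], x10s = log(data[, 10]),
                 x11 = data[, 11], x13s = log(data[, 13]))
fit_rqss <- rqss(y ~ qss(x6) + qss(x10s) + qss(x11) + qss(x13s), tau = 0.5, data = df)
plot(fit_rqss)  # plot the four estimated additive components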
Recently, Senturk and Muller (2005a, 2005b, 2006) studied the correlation between the
house price Y and the crime rate X1, adjusted by the confounding variable X13, through a
varying coefficient model, and they concluded that the expected effect of increasing crime
rate on declining house prices seems to be observed only for lower educational status
neighborhoods in Boston.
Figure 7.3: Boston Housing Price Data: Displayed in (a)-(d) are the scatter plots of the house price versus the covariates X13 (LSTAT), X6 (RM), X1 (CRIM), and X*1 = log(CRIM), respectively.
Finally, it is surprising that all the nonparametric models mentioned above did not include
the crime rate X1, which may be an important factor affecting the housing price, and did
not consider interaction terms such as between X13 and X1. Based on the above discussions,
Cai and Xu (2005) studied the following nonparametric quantile regression model, which
might be well suited to the analysis of this dataset:
\[
q_\tau(X) = a_{0,\tau}(X_{13}) + a_{1,\tau}(X_{13})\, X_6 + a_{2,\tau}(X_{13})\, X^{*}_1, \qquad 1 \le t \le n = 506, \qquad (7.4)
\]
where X*1 = log(X1). The reason for using the logarithm of X1 in (7.4), instead of X1 itself,
is that the correlation between Yt and X*1 is slightly stronger than that between Yt and X1.
The estimated curves for the three coefficient functions are displayed in Figure 7.4. In the
model fitting, the covariates X6 and X*1 can be centered. Of course, you can add more
covariates to model (7.4), in which case you need to consider model diagnostics to see
whether the covariates are significant. Also, you might explore some semi-parametric models
for this data set. Unfortunately, the detailed code is not provided and you are asked to write
the code by yourself.

Figure 7.4: Boston Housing Price Data: The estimated coefficient functions for model (7.4) at three quantiles, τ = 0.05 (solid line), τ = 0.50 (dashed line), and τ = 0.95 (dotted line), together with the mean regression (dot-dashed line): a_{0,τ}(u) and a_0(u) versus u in (a), a_{1,τ}(u) and a_1(u) versus u in (b), and a_{2,τ}(u) and a_2(u) versus u in (c). The thick dashed lines indicate the 95% pointwise confidence interval for the median estimate with the bias ignored.
7.3 Computer Codes
The following is the R code for making Figures 7.1 and 7.2.
data=read.table("c:/res-teach/xiamen12-06/data/ex7-1.txt")
y=data[,14]
x1=data[,1]
x6=data[,6]
x10=data[,10]
x11=data[,11]
x13=data[,13]
y_log=log(y)
library(gam)
fit_gam=gam(y_log~lo(x6)+lo(x10)+lo(x11)+lo(x13))
resid=fit_gam$residuals
y_hat=fit_gam$fitted
postscript(file="c:/res-teach/xiamen12-06/figs/fig-7.1.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
plot(fit_gam)
title(main="Component of X_13",col.main="red",cex=0.6)
dev.off()
fit_gam1=gam(y_log~lo(x6)+x10+x11+x13)
s1=fit_gam1$smooth[,1] # obtain the smoothed component
resid1=fit_gam1$residuals
y_hat1=fit_gam1$fitted
print(summary(fit_gam1))
postscript(file="c:/res-teach/xiamen12-06/figs/fig-7.2.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4)
plot(y_hat,resid,type="p",pch="o",ylab="",xlab="y_hat")
title(main="Residual Plot of Additive Model",col.main="red",cex=0.6)
abline(0,0)
plot(x6,s1,type="p",pch="o",ylab="s1(x6)",xlab="x6")
title(main="Component of X_6",col.main="red",cex=0.6)
plot(y_hat1,resid1,type="p",pch="o",ylab="",xlab="y_hat")
title(main="Residual Plot of Model II",col.main="red",cex=0.5)
abline(0,0)
plot(density(y),ylab="",xlab="",main="Density of Y")
dev.off()
7.4 References
Belsley, D.A., E. Kuh and R.E. Welsch (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.
Breiman, L. and J.H. Friedman (1985). Estimating optimal transformations for multiple regression and correlation. Journal of the American Statistical Association, 80, 580-619.
Cai, Z., J. Fan and R. Li (2000). Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association, 95, 888-902.
Cai, Z., J. Fan and Q. Yao (2000). Functional-coefficient regression models for nonlinear time series. Journal of the American Statistical Association, 95, 941-956.
Cai, Z. and X. Xu (2005). Nonparametric quantile estimations for dynamic smooth coefficient models. Revised for Journal of the American Statistical Association.
Chaudhuri, P., K. Doksum and A. Samarov (1997). On average derivative quantile regression. The Annals of Statistics, 25, 715-744.
Fan, J. and J. Jiang (2005). Nonparametric inference for additive models. Journal of the American Statistical Association, 100, 890-907.
Fan, J., C. Zhang and J. Zhang (2001). Generalized likelihood ratio statistics and Wilks phenomenon. The Annals of Statistics, 29, 153-193.
Gilley, O.W. and R.K. Pace (1996). On the Harrison and Rubinfeld data. Journal of Environmental Economics and Management, 31, 403-405.
Harrison, D. and D.L. Rubinfeld (1978). Hedonic housing prices and demand for clean air. Journal of Environmental Economics and Management, 5, 81-102.
Opsomer, J.D. and D. Ruppert (1998). A fully automated bandwidth selection for additive regression models. Journal of the American Statistical Association, 93, 605-618.
Pace, R.K. (1993). Nonparametric methods with applications to hedonic models. Journal of Real Estate Finance and Economics, 7, 185-204.
Pace, R.K. and O.W. Gilley (1997). Using the spatial configuration of the data to improve estimation. Journal of Real Estate Finance and Economics, 14, 333-340.
Senturk, D. and H.G. Muller (2005a). Covariate-adjusted regression. Biometrika, 92, 75-89.
Senturk, D. and H.G. Muller (2005b). Covariate adjusted correlation analysis. Scandinavian Journal of Statistics, 32, 365-383.
Senturk, D. and H.G. Muller (2006). Inference for covariate adjusted regression via varying coefficient models. The Annals of Statistics, 34, 654-679.
Yu, K. and Z. Lu (2004). Local linear additive quantile regression. Scandinavian Journal of Statistics, 31, 333-346.
Chapter 8
Value at Risk
8.1 Methodology
Value at Risk (VaR) is one of the basic measures of the risk of a loss in a particular
portfolio. It is typically stated in a form like, "there is a one percent chance that, over a
particular day of trading, the position could lose" some dollar amount. We can state it more
generally as VaR(p, N) = V, where V is the dollar amount of the possible loss, p is the
percentage (managers are commonly interested in 1%, 5%, or 10%), and N is the time
period (one day, one week, one month). To get from one-day VaR to N-day VaR, if the
risks are independent and identically distributed, we multiply by √N (please verify this;
do you need any assumptions?).
Or, in one irreverent definition, VaR is "a number invented by purveyors of panaceas for
pecuniary peril intended to mislead senior management and regulators into false confidence
that market risk is adequately understood and controlled". VaR is called upon for a variety
of tasks (enumerated in Duffie and Pan, 1997): measure risk exposure; ensure correct capital
allocation; provide information to counterparties, regulators, auditors, and other
stakeholders; evaluate and provide incentives to profit centers within the firm; and protect
against financial distress. Note that the final desired outcome (protection against loss) is not
the only desired outcome. Being too safe costs money and loses business!
VaR is an important component of bank regulation: the Basle Accord sets capital based
on the 10-day 1% VaR (so, if risks are iid, the 10-day VaR is √10 ≈ 3 times larger than
the one-day VaR). The capital is set at least 3 times as high as the 10-day 1% VaR, and can
be as much as four times higher if the bank's own VaR calculations have performed poorly
in the past year. If a bank hits its own 1% VaR more than 3 times in a year (of 250 trading
days), then its capital adequacy rating is increased for the next year. Poor modeling can
carry a high cost!
If we graph the distribution of possible returns of a portfolio, we can interpret VaR as a
measure of a percentile. If we plot the cumulative distribution function (cdf) of the portfolio
value, then the VaR is simply the inverse of the probability:
\[
\mathrm{VaR} = F^{-1}(p).
\]
Or, if we graph the probability density function (pdf), then VaR is the value that gives a
particular area under the curve. This is obviously similar to hypothesis testing in statistics.
Typically we are concerned with the lower tail of losses; therefore some people might
refer to either the 1% VaR or the 99% VaR. From the mathematical description above, the
99% VaR would seem to be the upper tail probability (therefore the loss in a short position),
but this is rarely the intent; the formulation typically refers to the idea that 99% of losses
will be smaller (in absolute value) than V.
There are three basic methods of computing the Value at Risk of some portfolio:
(a) parametric models that assume some error distribution,
(b) historical returns that use a selection of past data, and
(c) computationally intensive Monte Carlo simulations that draw thousands of possible
future paths.
Each of these typically has two steps: first determine a possible path for the value of some
underlying asset, then determine the change in the value of the portfolio assets due to it.
The parametric model case assumes that future returns can be modeled with some
statistical distribution such as the normal. A statistician would note that
P((X − μ)/σ < −Z) = p and, in the case of the normal distribution, for p = 0.01 (a 1%
one-tailed critical value), −Z = −2.326. For distributions other than the normal, there
would be different critical values. Sometimes a modeler would use the t-distribution (with
some degrees of freedom) or even the Cauchy (a t-distribution with 1 df) to give heavier
tails, or the generalized Pareto distribution (see later), or some more sophisticated and
complex distributions (for example, hyperbolic distributions; see Eberlein and Keller (1995)).
Regardless of what distribution is chosen, the above probability can be rewritten as
P(X < μ − σZ) = p, so that the quantity μ − σZ is exactly V, the value at risk.
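As a minimal sketch (not part of the original notes; ret stands for an assumed vector of
daily log returns):

# One-day parametric (normal) VaR: the p-th quantile mu - sigma*Z of the
# normal distribution fitted to the returns.
parametric_var <- function(ret, p = 0.01) {
  mean(ret) + sd(ret) * qnorm(p)
}
# e.g. parametric_var(rnorm(2500, 0.00018, 0.013)) is roughly -0.030, i.e. -3%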
An extension of the parametric approach is to use the Greeks derived from the
Black-Scholes-Merton model (under the assumption of normally distributed errors) to
calculate possible changes in the portfolio values.
We can also incorporate information from options prices into the calculation of forward
value at risk. For instance, if S&P 500 option prices imply a high VIX (volatility), then the
model could use this volatility instead of the historical average. In extracting information
from derivatives prices, we must be careful of the difference between the risk-neutral
distribution and the distribution of actual price changes: VaR is based on actual changes,
not on some perfectly hedged portfolio!
The basic method of historical returns would just select some reference period of
history (say, the past 2 years), order the returns from that period, and select the 100×p%
worst one to represent the value at risk based on past trading history. Basically, instead
of using some parametric cdf, this method uses a historical cdf (an empirical cdf). The
choice of historical period means that this method is not quite as simple as it might sound.
The parametric method has a potential misspecification if the true distribution is not the
specified one. However, the historical cdf (the empirical cdf) is always a consistent estimate
of the true cdf. Therefore, the historical method can be better, although it is statistically
more involved.
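In the same hedged spirit (ret is again an assumed vector of daily returns), the
historical-simulation VaR is just an empirical quantile of past returns:

# Historical-simulation VaR: the empirical p-th quantile of past returns;
# type = 1 uses the order-statistic (empirical cdf) definition.
historical_var <- function(ret, p = 0.01) {
  quantile(ret, probs = p, type = 1)
}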
Monte Carlo simulations use computational power to compute a cumulative
distribution function. Possible future paths of asset prices and derivatives are computed and
the value of the portfolio is calculated along each one. If the errors are drawn from a
parametric distribution, then the Monte Carlo method will, in the limit, provide the same
result as the parametric method. If the errors are drawn from a historical distribution, then
the Monte Carlo method will give the same result as that method. Typically a Monte Carlo
simulation will use some of each.
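A minimal sketch (with purely illustrative numbers) showing that Monte Carlo draws from
a fitted normal reproduce the parametric answer in the limit:

set.seed(1)
sims <- rnorm(100000, mean = 0.00018, sd = 0.013)  # simulated one-day returns
quantile(sims, probs = 0.01)  # 1-day 1% VaR, close to mu - 2.326*sigma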
Each method has strengths and weaknesses. The parametric method is quick and easy to
implement with a minimum of computing power. It is good for calculating day-to-day trading
swings for thickly traded assets and conventional derivatives (plain vanilla or with
well-specified Greeks). But most common probability distributions were developed for
making inferences about behavior in the center of the probability mass (means and medians).
The tails tend to be poorly specified and so do not perform as well.
Each method must choose which underlying variables and which covariances are
considered. The covariances are tough (particularly for historical simulations), since the
"curse of dimensionality" quickly takes hold. For example, consider a histogram of 250
observations (one trading year) of returns on a single variable, compared with the
distribution of the same 250 observations but now incorporating the joint behavior of two
variables: you might see that now many of the boxes represent a single observation (they
are just 1 unit high), while the possible maximum value is 8 (down from 40 previously). A
series of "historical simulations" that drew from these observations would have very few
options, compared with a parametric draw that needs only the correlation between the
variables and a specification of a distributional form. The imposition of a normal distribution
might be unrealistic, but it might be less bad than the alternatives.
To handle the dimensionality problem, we might use principal components analysis,
which breaks down complex movements of variables (like the yield curve) into simpler
orthogonal "factor" movements that can then be used in place of the real changes.
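A small sketch of this idea with simulated (purely illustrative) data: one common "level"
movement drives five correlated series, and the first principal component recovers most of it.

set.seed(1)
level <- rnorm(250, sd = 0.05)  # a common "level" factor
dy <- sapply(1:5, function(j) level + rnorm(250, sd = 0.01))  # five maturities
pca <- prcomp(dy, center = TRUE)
summary(pca)             # the first component explains most of the variance
factors <- pca$x[, 1:2]  # use the leading factor movements in place of dy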
To consider the VaR constructions in more detail, consider how to define the VaR for a
portfolio with just a single asset. If this asset has a normal distribution, then a parametric
VaR would say there is a 1% chance that a day's trading will end more than 2.326 standard
deviations below average (see the example above for details on calculating V from a Z
statistic). We would use information on the past performance of the asset to determine its
average and standard deviation. A historical simulation VaR would rank, say, the past 4
years of data (approximately 1000 observations) to find the worst 10.
For the last 4 years of data on the OEX (S&P 100) (please download this dataset and
analyze it by yourself), the mean daily return (expressed in logs) is 0.018% and its standard
deviation is 0.013. Using the 1% Z value from above, −2.326, we find a parametric 1-day
1% VaR of −3.06%. If we instead use the historical returns, we find a 1-day 1% VaR of
3.49%. How different are these numbers? In that same time period, the parametric VaR
was exceeded on 1.3% of days; the historical-returns VaR would be hit only 0.4% of the
time if the parametric model were correct. We should add a number of caveats, principally
to note that the historical period which we are considering includes September 2001, when
trading stopped for a week. When trading resumed, the S&P 100 had fallen over 5%, but
this underestimates the financial impact: other world markets were open, so any positions
that crossed international borders would have lost more. We might choose not to include
that episode, in which case the VaR numbers rise to a parametric 3.03% or a historical 3.28%.
There is a general tradeoff between the amount of historical data used and the degree
of imposed inference. No model is completely theory-independent (the historical simulation
model assumes that the future will be sampled from some defined past). As the amount
of theory increases, more inferences can be drawn, but at a cost: if the theory is incorrect,
then the inferences will be worse (although if the theory is right, the inferences will be
sharper).
All these methods share the weakness of looking to the past for guidance rather than
seriously considering future events that are genuinely new. Stress testing tries to push them
further, by asking how a portfolio position would react to some of the most extreme
price/volume changes of past decades.
Back testing checks that the percentages line up. Not only must the number of actual
"hits" be close to the predicted number, but the hits should also be independent of each
other. "Hits" that are clustered indicate that the VaR is not adequately dependent upon
current market valuations. Back testing can also explore the adequacy of other percentile
measures of VaR, on the insight that if the 5% VaR (or 10%, or 0.5%, or whatever) is
wrong, then we might have more skepticism about the validity of the 1% measure. For more
details, see Chapter 6 of Crouhy, Galai, and Mark (2001).
All of these methods also find only a single percentile of the distribution. Suppose a firm
has some exotic option which will be worth $100 on 99.5% of trading days but will lose
$1000 on the other 0.5%. Then the 1% VaR is $100. Even if the option lost $10,000 or
$1,000,000 on 0.5% of trading days, the calculated VaR would not change! In the pictures
of the cdf and pdf, this simply means that the left-hand tail can be drawn out to stretch into
huge losses. Therefore some advocate the calculation of a Conditional VaR (expected
shortfall), which is the expected loss conditional on being among the worst 1% of trading
days.
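This remedy, the expected loss given that the VaR threshold is breached, is also easy to
sketch (again ret is an assumed vector of daily returns):

# Conditional VaR / expected shortfall: the average return over the worst
# 100p% of days, i.e. the mean conditional on falling below the VaR threshold.
expected_shortfall <- function(ret, p = 0.01) {
  v <- quantile(ret, probs = p)
  mean(ret[ret <= v])
}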
The VaR methods all basically assume that the trading desk is a machine that can
generate any desired basket of risk/return on the efficient frontier; the role of management
is to specify how much risk they want to take on (and therefore how high a return they
want). This assumes that risk is a univariate measure with a defined specification, but there
are many sources of risk. If management is genuinely so disconnected from the trading desk
that is taking on this risk, then there is a real problem that no VaR can solve! If management
is more closely involved, then they should understand the panoply of risks.
VaRs are a tool of risk management. Emanuel Derman writes that risk managers “need to
love markets and know what to worry about: bad marks, misspecified contracts, uncertainty,
risk, value-at-risk, illiquid positions, exotic options” (in Risk, May 2004). VaR is one part
of a long list.
8.2 R Commands (R Menu)
The package VaR contains mainly two functions: VaR.gpd(), which uses a fit of the
Generalized Pareto Distribution (GPD), and VaR.norm(), which uses a log-normal
distribution. The function VaR.gpd() estimates Value at Risk and Expected Shortfall (ES)
of a single risk factor with a given confidence by fitting a GPD to the part of the data
exceeding a given threshold (the Peak Over Threshold (POT) method). The input data are
transformed to percent daily returns. The transformed data are then sorted and only the
part exceeding a given threshold is kept; the threshold is calculated according to the
expression p.tr*std. A likelihood fit is then applied to obtain the values of VaR and ES,
after which confidence intervals for these values are calculated; see Embrechts, Kluppelberg
and Mikosch (1997) and Tsay (2005) for details. The R manual for the VaR package is
VaR.pdf, which can be downloaded, for example, from the web site at
http://cran.cnr.berkeley.edu/doc/packages/VaR.pdf.
8.2.1 Generalized Pareto Distribution
The Pareto distribution is a skewed, heavy-tailed distribution that is sometimes used to
model the distribution of incomes. Let F(x) = 1 − x^{−1/k} for x > 1, where k > 0 is a
parameter. The distribution defined by this function is called the Pareto distribution with
shape parameter k, and is named for the economist Vilfredo Pareto. The density function
f(·) is given by f(x) = k^{−1} x^{−(1+1/k)} for x > 1. Clearly, f(x) is decreasing for x > 1,
and f(·) decreases faster as k decreases. The quantile function is F^{−1}(p) = (1 − p)^{−k}
for 0 < p < 1. The Pareto distribution is heavy-tailed; thus, the mean, variance, and other
moments are finite only if the shape parameter k satisfies certain conditions.
As with many other distributions, the Pareto distribution is often generalized by adding
a scale parameter. Thus, suppose that Z has the Pareto distribution with shape parameter
k. If σ > 0, the random variable X = σZ has the Pareto distribution with shape parameter
k and scale parameter σ. Note that X takes values in the interval (σ, ∞). Analogues
of the results given above follow easily from basic properties of the scale transformation.
The density function is f(x) = k^{−1} σ^{1/k} x^{−(1+1/k)} for x > σ, the distribution
function is F(x) = 1 − (x/σ)^{−1/k} for x > σ, and the quantile function is
F^{−1}(p) = σ(1 − p)^{−k} for 0 < p < 1.
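The density and quantile functions above translate directly into R (a sketch written from
the formulas, not taken from any package):

# Pareto density and quantile with shape k > 0 and scale sigma > 0,
# exactly as defined above.
dpareto <- function(x, k, sigma = 1)
  ifelse(x > sigma, sigma^(1/k) * x^(-(1 + 1/k)) / k, 0)
qpareto <- function(p, k, sigma = 1) sigma * (1 - p)^(-k)
qpareto(0.99, k = 0.5)  # the 99th percentile when k = 0.5, sigma = 1 (equals 10)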
A more general form of the generalized Pareto distribution, with shape parameter k ≠ 0,
scale parameter σ, and threshold parameter θ, has density
\[
f(x) = \frac{1}{\sigma}\left(1 + k\,\frac{x-\theta}{\sigma}\right)^{-1-1/k}
\]
for θ < x when k > 0, or for θ < x < θ − σ/k when k < 0. In the limit as k → 0, the
density is
\[
f(x) = \frac{1}{\sigma}\exp\left(-\frac{x-\theta}{\sigma}\right)
\]
for θ < x. If k = 0 and θ = 0, the generalized Pareto distribution is equivalent to the
exponential distribution. If k > 0 and θ = σ, the generalized Pareto distribution is equivalent
to the Pareto distribution.
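A direct transcription of this density into R (a sketch from the formula above, not from any
extreme-value package):

# Generalized Pareto density with shape k, scale sigma, threshold theta.
dgpd_sketch <- function(x, k, sigma, theta = 0) {
  z <- (x - theta) / sigma
  if (abs(k) < 1e-12)  # the k = 0 (exponential) limit
    return(ifelse(z > 0, exp(-z) / sigma, 0))
  supp <- if (k > 0) z > 0 else (z > 0 & z < -1 / k)
  base <- pmax(1 + k * z, 0)
  ifelse(supp, base^(-1 - 1/k) / sigma, 0)
}
integrate(dgpd_sketch, 0, Inf, k = 0.5, sigma = 1)  # equals 1, as a density should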
8.2.2 Background of the Generalized Pareto Distribution
Like the exponential distribution, the generalized Pareto distribution is often used to model
the tails of another distribution. For example, you might have washers from a manufacturing
process. If random influences in the process lead to differences in the sizes of the washers,
a standard probability distribution, such as the normal, could be used to model those sizes.
However, while the normal distribution might be a good model near its mode, it might not
be a good fit to real data in the tails, and a more complex model might be needed to describe
the full range of the data. On the other hand, if you record only the sizes of washers larger
(or smaller) than a certain threshold, you can fit a separate model to those tail data, which
are known as exceedances. You can use the generalized Pareto distribution in this way, to
provide a good fit to extremes of complicated data.
The generalized Pareto distribution allows a continuous range of possible shapes that
includes both the exponential and Pareto distributions as special cases. You can use either
of those distributions to model a particular dataset of exceedances, but the generalized
Pareto distribution allows you to "let the data decide" which distribution is appropriate.
The generalized Pareto distribution has three basic forms, each corresponding to a limiting
distribution of exceedance data from a different class of underlying distributions.
(1) Distributions whose tails decrease exponentially, such as the normal, lead to a
generalized Pareto shape parameter of zero.
(2) Distributions whose tails decrease as a polynomial, such as Student’s t, lead to
a positive shape parameter.
(3) Distributions whose tails are finite, such as the beta, lead to a negative shape
parameter.
8.3 Reading Materials I and II (see Handouts)
The reading materials I and II are from Chapters 5 and 6 of the book by Crouhy, Galai, and
Mark (2001), covering the basic concepts and methodologies of value at risk.
8.4 New Developments (Nonparametric Approaches)
For details, see the paper by Cai and Wang (2006). If you would like to read the whole
paper, you can download it from the WORKING PAPER column of the web site at
http://www.wise.xmu.edu.cn/. Next we present only a part of Cai and Wang (2006).
8.4.1 Introduction
The value-at-risk (hereafter, VaR) and expected shortfall (ES) have become two popular
measures of market risk associated with an asset or a portfolio of assets during the last
decade. In particular, VaR has been chosen by the Basle Committee on Banking Supervision
as the benchmark risk measure for capital requirements, and both have been used by
financial institutions for asset management and minimization of risk, and have developed
rapidly as analytic tools to assess the riskiness of trading activities. See, to name just
a few, Morgan (1996), Duffie and Pan (1997), Jorion (2001, 2003), and Duffie and Singleton
(2003) for the financial background, statistical inferences, and various applications. In terms
of the formal definition, VaR is simply a quantile of the loss distribution (future portfolio
values) over a prescribed holding period (e.g., 2 weeks) at a given confidence level, while ES
is the expected loss, given that the loss is at least as large as some given quantile of the loss
distribution (e.g., the VaR). It is well known from Artzner, Delbaen, Eber and Heath (1999)
that ES is a coherent risk measure in that it satisfies the following four axioms:
• homogeneity: increasing the size of a portfolio by a factor should scale
its risk measure by the same factor;
• monotonicity: a portfolio must have greater risk if it has systematically
lower values than another;
• risk-free condition or translation invariance: adding some amount of
cash to a portfolio should reduce its risk by the same amount; and
• subadditivity: the risk of a portfolio must be less than the sum of the
separate risks, i.e., merging portfolios cannot increase risk.
VaR satisfies homogeneity, monotonicity, and the risk-free condition but is not
subadditive. See Artzner, et al. (1999) for details. As advocated by Artzner, et al. (1999),
ES is preferred due to its better properties, although VaR is widely used in applications.
Measures of risk may depend on the state of the economy, since economic and market
conditions vary over time. This requires that risk managers focus on the conditional
distributions of profit and loss, which take full account of current information about
the investment environment (macroeconomic and financial as well as political) in forecasting
future market values, volatilities, and correlations. As pointed out by Duffie and Singleton
(2003), not only are the prices of the underlying market indices changing randomly over
time, the portfolio itself is changing, as are the volatilities of prices, the credit qualities of
counterparties, and so on. On the other hand, one would expect the VaR to increase as the
past returns become very negative, because one bad day makes the probability of the next
somewhat greater. Similarly, very good days also increase the VaR, as would be the case for
volatility models. Therefore, VaR could depend on the past returns in some way. Hence, an
appropriate risk analysis tool or methodology should be allowed to adapt to varying market
conditions and to reflect the latest available information in a time series setting rather
than in an iid framework. Most of the existing risk management literature has concentrated
on unconditional distributions and the iid setting, although there have been some studies on
conditional distributions and time series data. For more background, see Chernozhukov and
Umanstev (2001), Cai (2002), Fan and Gu (2003), Engle and Manganelli (2004), Cai and
Xu (2005), Scaillet (2005), and Cosma, Scaillet and von Sachs (2006), and references therein
for conditional models, and Duffie and Pan (1997), Artzner, et al. (1999), Rockafellar and
Uryasev (2000), Acerbi and Tasche (2002), Frey and McNeil (2002), Scaillet (2004), Chen
and Tang (2005), Chen (2006), among others, for unconditional models. Also, most studies
in the literature and in applications are limited to parametric models, such as all standard
industry models like CreditRisk+, CreditMetrics, CreditPortfolio View and the model
proposed by the KMV corporation. See Chernozhukov and Umanstev (2001), Frey and
McNeil (2002), Engle and Manganelli (2004), and references therein on parametric models
in practice, and Fan and Gu (2003) and references therein for semiparametric models.
The main focus of this paper is on studying the conditional value-at-risk (CVaR) and
conditional expected shortfall (CES) and proposing a new nonparametric estimation
procedure to estimate the CVaR and CES functions, where the conditioning information is
allowed to contain economic and market (exogenous) variables as well as the past observed
returns. Parametric models for CVaR and CES can be most efficient if the underlying
functions are correctly specified; see Chernozhukov and Umanstev (2001) for a
polynomial-type regression model and Engle and Manganelli (2004) for a GARCH-type
parametric model for CVaR based on regression quantiles. However, a misspecification may
cause serious bias, and model constraints may distort the underlying distributions.
Nonparametric modeling is appealing in several respects. One of its advantages is that little
or no restrictive prior information on the functionals is needed. Further, it may provide
useful insight for further parametric fitting.
This paper makes several contributions to the literature. The first is to propose a new
nonparametric approach to estimating CVaR and CES. In essence, our estimator for CVaR is
based on inverting a newly proposed estimator of the conditional distribution function for
time series data, and the estimator for CES is obtained by a plug-in method: plugging
in the estimated conditional probability density function and the estimated CVaR function.
Note that they are analogous to the estimators studied by Scaillet (2005) using
the Nadaraya-Watson (NW) type double kernel (smoothing in both the y and x directions)
estimation, by Cai (2002) utilizing the weighted Nadaraya-Watson (WNW) kernel type
technique to avoid the so-called boundary effects, and by Yu and Jones (1998) employing
the double kernel local linear method. More precisely, our newly proposed estimator
combines the WNW method of Cai (2002) and the double kernel local linear technique of
Yu and Jones (1998), and is termed the weighted double kernel local linear (WDKLL)
estimator.
The second contribution is to establish the asymptotic properties of the WDKLL
estimators of the conditional probability density function (PDF) and cumulative distribution
function (CDF) for α-mixing time series at both boundary and interior points. It is
shown that the WDKLL method enjoys the same convergence rates as the
double kernel local linear estimator of Yu and Jones (1998) and the WNW estimator of Cai
(2002). It is also shown that the WDKLL estimators have the desired sampling properties at
both boundary and interior points of the support of the design density, which seems to be
seminal. Finally, we derive the WDKLL estimator of CVaR by inverting the WDKLL
conditional distribution estimator, and the WDKLL estimator of CES by plugging in the
WDKLL estimators of the PDF and CVaR. We show that the WDKLL estimator of CVaR
always exists, because the WDKLL estimator of the CDF is itself a distribution function,
and that it inherits all the better properties of the WDKLL estimator of the CDF; that is,
the WDKLL estimator of the CDF is a differentiable CDF, and it possesses asymptotic
properties such as design adaptation, avoidance of boundary effects, and mathematical
efficiency. Note that, to preserve shape constraints, Cosma, Scaillet and von Sachs (2006)
recently used a wavelet method to estimate conditional probability density and cumulative
distribution functions and then to estimate conditional quantiles.
Note that CVaR defined here is essentially the conditional quantile, or the quantile
regression of Koenker and Bassett (1978), based on the conditional distribution, rather than
the "CVaR" of some of the risk management literature (see, e.g., Rockafellar and Uryasev,
2000; Jorion, 2001, 2003), which is what we call ES here. Also, note that the ES here is
called TailVaR in Artzner, et al. (1999). Moreover, as mentioned above, CVaR can be
regarded as a special case of quantile regression; see Cai and Xu (2005) for the state of the
art of current research on nonparametric quantile regression, including CVaR. Further, note
that both ES and CES have been known for decades in the actuarial sciences and are very
popular in the insurance industry. Indeed, they have been used to assess risk on a portfolio
of potential claims and to design reinsurance treaties. See the book by Embrechts,
Kluppelberg, and Mikosch (1997) for an excellent review of this subject and the papers by
McNeil (1997), Hurlimann (2003), Scaillet (2005), and Chen (2006). Finally, ES or CES is
also closely related to other applied fields, such as the mean residual life function in
reliability and the biometric function in biostatistics; see Oakes and Dasu (1990) and Cai
and Qian (2000) and references therein.
The rest of the paper is organized as follows. Section 8.4.2 provides a blueprint for
the basic notation and concepts. In Section 8.4.3, we present the detailed motivations and
formulations of the new nonparametric estimation procedures for estimating the conditional
PDF, CDF, VaR and ES. We establish the asymptotic properties of these nonparametric
estimators at both boundary and interior points, with a comparison, in Section 4. Together
with a convenient and efficient data-driven method for selecting the bandwidth based on
the nonparametric Akaike information criterion (AIC), Monte Carlo simulation studies and
empirical applications to several stock index and stock returns are presented in Section
5. Finally, the derivations of the theorems are given in Section 6 with some lemmas, and
the Appendix contains the technical proofs of certain lemmas needed in the proofs of the
theorems presented in Section 6.
8.4.2 Framework
Assume that the observed data {(X_t, Y_t) : 1 ≤ t ≤ n}, X_t ∈ ℝ^d, are available and are
observed from a stationary time series model. Here Y_t is the risk or loss variable, which can
be the negative logarithm of return (log loss), and X_t is allowed to include both economic
and market (exogenous) variables and lagged values of Y_t; it can also be a vector. But,
for expositional purposes, we consider only the case where X_t is a scalar (d = 1). Note that
the proposed methodologies and their theory for the univariate case (d = 1) continue to hold
for multivariate situations (d > 1); the extension involves no fundamentally new ideas. Note
also that models with large d are often not practically useful due to the "curse of
dimensionality".
We now turn to the nonparametric estimation of the conditional expected shortfall µ_p(x), which is defined as

µ_p(x) = E[Y_t | Y_t ≥ ν_p(x), X_t = x],

where ν_p(x) is the conditional value-at-risk, defined as the solution of

P(Y_t ≥ ν_p(x) | X_t = x) = S(ν_p(x) | x) = p,

or expressed as ν_p(x) = S^{-1}(p | x), where S(y | x) is the conditional survival function of Y_t given X_t = x; S(y | x) = 1 − F(y | x), and F(y | x) is the conditional cumulative distribution function. It is easy to see that

µ_p(x) = ∫_{ν_p(x)}^∞ y f(y | x) dy / p,
where f(y |x) is the conditional probability density function of Yt given Xt = x. To estimate
µ_p(x), one can use the plug-in method as

µ̂_p(x) = ∫_{ν̂_p(x)}^∞ y f̂(y | x) dy / p,   (8.1)

where ν̂_p(x) is a nonparametric estimator of ν_p(x) and f̂(y | x) is a nonparametric estimator of f(y | x). Note that the bandwidths for ν̂_p(x) and f̂(y | x) need not be the same.
Note that Scaillet (2005) used the NW type double kernel method, due to Roussas (1969), to estimate f(y | x) first, with the estimate denoted by f̃(y | x), and then estimated ν_p(x) by inverting the estimated conditional survival function, with the result denoted by ν̃_p(x), and finally estimated µ_p(x) by plugging f̃(y | x) and ν̃_p(x) into (8.1), with the result denoted by µ̃_p(x), where ν̃_p(x) = S̃^{-1}(p | x) and S̃(y | x) = ∫_y^∞ f̃(u | x) du. But it is well documented (see, e.g., Fan and Gijbels, 1996) that the
NW kernel type procedures have serious drawbacks: the asymptotic bias involves the design density, so that they cannot be adaptive, and boundary effects exist, so that they require
boundary modifications. In particular, boundary effects might cause a serious problem for
estimating νp(x) since it is only concerned with the tail probability. The question is now
how to provide a better estimate for f(y |x) and νp(x) so that we have a good estimate for
µp(x). Therefore, we address this issue in the next section.
8.4.3 Nonparametric Estimating Procedures
We start with the nonparametric estimators of the conditional density function and its distribution function, and then turn to the nonparametric estimators of the conditional VaR and ES functions.
There are several methods available for estimating νp(x), f(y |x), and F (y |x) in the
literature, such as kernel and nearest-neighbor. To name just a few, see Lejeune and Sarda
(1988), Truong (1989), Samanta (1989), and Chaudhuri (1991) for iid errors, Roussas (1969) and Roussas (1991) for Markovian processes, and Truong and Stone (1992) and Boente
and Fraiman (1995) for mixing sequences. To attenuate these drawbacks of the kernel type
estimators mentioned in Section 8.4.2, recently, some new methods have been proposed to
estimate conditional quantiles. The first one, a more direct approach, by using the “check”
function such as the robustified local linear smoother, was provided by Fan, Hu, and Truong
(1994) and further extended by Yu and Jones (1997, 1998) for iid data. A more general
nonparametric setting was explored by Cai and Xu (2005) for time series data. This modeling
idea was initiated by Koenker and Bassett (1978) for linear regression quantiles and Fan, Hu, and Truong (1994) for nonparametric models. See Cai and Xu (2005) and references therein
for more discussions on models and applications. An alternative procedure is first to estimate
the conditional distribution function by using the double kernel local linear technique of Fan,
Yao, and Tong (1996) and then to invert the conditional distribution estimator to produce
an estimator of a conditional quantile or CVaR. Yu and Jones (1997, 1998) compared these
two methods theoretically and empirically and suggested that the double kernel local linear method performs better.
Estimation of Conditional PDF and CDF
To make a connection between the conditional density (distribution) function and nonpara-
metric regression problem, it is noted by the standard kernel estimation theory (see, e.g.,
Fan and Gijbels, 1996) that for a given symmetric density function K(·),
E[K_{h0}(y − Y_t) | X_t = x] = f(y | x) + (h_0²/2) µ_2(K) f^{(2,0)}(y | x) + o(h_0²) ≈ f(y | x), as h_0 → 0,   (8.2)

where K_{h0}(u) = K(u/h_0)/h_0, µ_2(K) = ∫_{−∞}^∞ u² K(u) du, f^{(2,0)}(y | x) = ∂²f(y | x)/∂y², and ≈ denotes an approximation that ignores higher order terms. Note that Y*_t(y) = K_{h0}(y − Y_t) can be regarded as an initial estimate of f(y | x) based on smoothing in the y direction. Also, note that this approximation ignores the higher order terms O(h_0^j) for j ≥ 2, since they are negligible if h_0 = o(h), where h is the bandwidth used in smoothing in the x direction (see (8.3) below).
Therefore, the smoothing in the y direction is not important in the context of this subject
so that intuitively, it should be under-smoothed. Thus, the left hand side of (8.2) can be
regarded as a nonparametric regression of the observed variable Y*_t(y) on X_t, and the local linear (or polynomial) fitting scheme of Fan and Gijbels (1996) can be applied here.
This leads us to consider the following locally weighted least squares regression problem:
∑_{t=1}^n {Y*_t(y) − a − b (X_t − x)}² W_h(x − X_t),   (8.3)
where W (·) is a kernel function and h = h(n) > 0 is the bandwidth satisfying h → 0 and
nh→ ∞ as n → ∞, which controls the amount of smoothing used in the estimation. Note
that (8.3) involves two kernels K(·) and W (·). This is the reason of calling “double kernel”.
Minimizing the locally weighted least squares in (8.3) with respect to a and b, we obtain the locally weighted least squares estimator of f(y | x), denoted by f̂(y | x), which is â. From Fan and Gijbels (1996) or Fan, Yao and Tong (1996), f̂(y | x) can be re-expressed in the linear form
as a linear estimator form as
f̂_ll(y | x) = ∑_{t=1}^n W_{ll,t}(x, h) Y*_t(y),

where, with S_{n,j}(x) = ∑_{t=1}^n W_h(x − X_t) (X_t − x)^j, the weights W_{ll,t}(x, h) are given by

W_{ll,t}(x, h) = [S_{n,2}(x) − (x − X_t) S_{n,1}(x)] W_h(x − X_t) / [S_{n,0}(x) S_{n,2}(x) − S_{n,1}²(x)].
Clearly, W_{ll,t}(x, h) satisfy the so-called discrete moment conditions: for 0 ≤ j ≤ 1,

∑_{t=1}^n W_{ll,t}(x, h) (X_t − x)^j = δ_{0,j} = { 1 if j = 0; 0 otherwise }   (8.4)
based on the least squares theory; see (3.12) of Fan and Gijbels (1996, p.63). Note that the
estimator f̂_ll(y | x) can take values outside [0, ∞); that is, it can be negative. The double kernel local linear estimator of F(y | x) is constructed (see (8) of Yu and Jones (1998)) by integrating f̂_ll(y | x):
F̂_ll(y | x) = ∫_{−∞}^y f̂_ll(u | x) du = ∑_{t=1}^n W_{ll,t}(x, h) G_{h0}(y − Y_t),

where G(·) is the distribution function of K(·) and G_{h0}(u) = G(u/h_0). Clearly, F̂_ll(y | x) is continuous and differentiable with respect to y, with F̂_ll(−∞ | x) = 0 and F̂_ll(∞ | x) = 1.
Note that the differentiability of the estimated distribution function can make the asymptotic
analysis much easier for the nonparametric estimators of CVaR and CES (see later).
Although Yu and Jones (1998) showed that the double kernel local linear estimator has
some attractive properties such as no boundary effects, design adaptation, and mathematical
efficiency (see, e.g., Fan and Gijbels, 1996), it has the disadvantage of producing conditional
distribution function estimators that are not constrained either to lie between zero and one
or to be monotone increasing, which is not good for estimating CVaR if the inverting method
is used. In both these respects, the NW method is superior, despite its rather large bias
and boundary effects. The properties of positivity and monotonicity are particularly advan-
tageous if the method of inverting conditional distribution estimator is applied to produce
the estimator of a conditional quantile or CVaR. To overcome these difficulties, Hall, Wolff,
and Yao (1999) and Cai (2002) proposed the WNW estimator based on an empirical likeli-
hood principle, which is designed to possess the superior properties of local linear methods
such as bias reduction and no boundary effects, and to preserve the property that the NW
estimator is always a distribution function, although it might require more computational
efforts since it requires estimating and optimizing additional weights aimed at the bias cor-
rection. Cai (2002) discussed the asymptotic properties of the WNW estimator at both
interior and boundary points for the mixing time series under some regularity assumptions
and showed that the WNW estimator has a better performance than other competitors. See
Cai (2002) for details. Recently, Cosma, Scaillet and von Sachs (2006) proposed a shape
preserving estimation method to estimate cumulative distribution functions and probability
density functions using the wavelet methodology for multivariate dependent data and then
to estimate a conditional quantile or CVaR.
The WNW estimator of the conditional distribution F (y |x) of Yt given Xt = x is defined
by
F̂_c1(y | x) = ∑_{t=1}^n W_{c,t}(x, h) I(Y_t ≤ y),   (8.5)
where the weights Wc,t(x, h) are given by
W_{c,t}(x, h) = p_t(x) W_h(x − X_t) / ∑_{t=1}^n p_t(x) W_h(x − X_t),   (8.6)

and p_t(x) is chosen to be p_t(x) = n^{-1} {1 + λ (X_t − x) W_h(x − X_t)}^{-1} ≥ 0, with λ, a function of the data and x, uniquely defined by maximizing the logarithm of the empirical likelihood
L_n(λ) = −∑_{t=1}^n log{1 + λ (X_t − x) W_h(x − X_t)}

subject to the constraints ∑_{t=1}^n p_t(x) = 1 and the discrete moment conditions in (8.4); that is,

∑_{t=1}^n W_{c,t}(x, h) (X_t − x)^j = δ_{0,j}   (8.7)
for 0 ≤ j ≤ 1. Also, see Cai (2002) for details on this aspect. In implementation, Cai (2002)
recommended using the Newton-Raphson scheme to find the root of equation L′n(λ) = 0.
Note that 0 ≤ F̂_c1(y | x) ≤ 1 and it is monotone in y. But F̂_c1(y | x) is not continuous in y and, of course, not differentiable in y either. Note that, in the regression setting, Cai (2001) provided a comparison of the local linear estimator and the WNW estimator and discussed the asymptotic minimax efficiency of the WNW estimator.
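To make the computation concrete, the following is a minimal R sketch (ours, not the authors' code) of the weights W_{c,t}(x, h) in (8.6), with W(·) taken to be the Gaussian kernel (an assumption) and λ found by Newton-Raphson applied to L′_n(λ) = 0, as recommended by Cai (2002); the function name wnw.weights is purely illustrative.

# Illustrative sketch: WNW weights in (8.6); W(.) = Gaussian kernel (an assumption)
wnw.weights = function(X, x, h, tol = 1e-8, maxit = 100)
{
  Wh = dnorm((x - X)/h)/h            # W_h(x - X_t)
  D  = (X - x)*Wh                    # (X_t - x) W_h(x - X_t)
  lambda = 0
  for(i in 1:maxit){                 # Newton-Raphson for L_n'(lambda) = 0
    g  = -sum(D/(1 + lambda*D))      # L_n'(lambda)
    gp =  sum(D^2/(1 + lambda*D)^2)  # L_n''(lambda)
    step = g/gp
    lambda = lambda - step
    if(abs(step) < tol) break
  }
  p = 1/(length(X)*(1 + lambda*D))   # p_t(x) from the empirical likelihood
  w = p*Wh
  w/sum(w)                           # W_{c,t}(x, h) in (8.6)
}

At the solution, the weights are nonnegative, sum to one, and satisfy the discrete moment conditions (8.7).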
To accommodate all nice properties (monotonicity, continuity, differentiability, and lying
between zero and one) and the attractive asymptotic properties (design adaptation, avoidance of boundary effects, and mathematical efficiency; see Cai (2002) for detailed discussions) of
both estimators Fll(y |x) and Fc1(y |x) under a unified framework, we propose the following
nonparametric estimators for the conditional density function f(y |x) and its conditional
distribution function F(y | x), termed weighted double kernel local linear estimation:
f̂_c(y | x) = ∑_{t=1}^n W_{c,t}(x, h) Y*_t(y),

where W_{c,t}(x, h) is given in (8.6), and

F̂_c(y | x) = ∫_{−∞}^y f̂_c(u | x) du = ∑_{t=1}^n W_{c,t}(x, h) G_{h0}(y − Y_t).   (8.8)
Note that if p_t(x) in (8.6) is constant for all t, that is, λ = 0, then f̂_c(y | x) reduces to the classical NW type double kernel estimator used by Scaillet (2005). However, Scaillet (2005) adopted a single bandwidth for smoothing in both the y and x directions. Clearly, f̂_c(y | x) is a probability density function, so that F̂_c(y | x) is a cumulative distribution function (monotone, 0 ≤ F̂_c(y | x) ≤ 1, F̂_c(−∞ | x) = 0, and F̂_c(∞ | x) = 1). Also, F̂_c(y | x) is continuous and differentiable in y. Further, as expected, it will be shown that, like F̂_c1(y | x), F̂_c(y | x) has attractive properties such as no boundary effects, design adaptation, and mathematical efficiency.
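As an illustration of (8.8), here is a minimal sketch (ours) of the WDKLL conditional CDF with a Gaussian K, so that G is the standard normal CDF and G_{h0}(u) = pnorm(u/h0); it reuses the illustrative wnw.weights() above.

# Illustrative sketch of (8.8) with Gaussian K, so G = pnorm
Fc.hat = function(y, x, X, Y, h, h0)
{
  w = wnw.weights(X, x, h)    # W_{c,t}(x, h) from (8.6)
  sum(w*pnorm((y - Y)/h0))    # F_c(y|x) in (8.8)
}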
Estimation of Conditional VaR and ES
We are now ready to formulate the nonparametric estimators of ν_p(x) and µ_p(x). To this end, from (8.8), ν_p(x) is estimated by inverting the estimated conditional survival function Ŝ_c(y | x) = 1 − F̂_c(y | x); the estimator is denoted by ν̂_p(x) and defined as ν̂_p(x) = Ŝ_c^{-1}(p | x). Note that ν̂_p(x) always exists since Ŝ_c(y | x) is itself a survival function. Plugging ν̂_p(x) and f̂_c(y | x) into (8.1), we obtain the nonparametric estimator of µ_p(x),

µ̂_p(x) = p^{-1} ∫_{ν̂_p(x)}^∞ y f̂_c(y | x) dy = p^{-1} ∑_{t=1}^n W_{c,t}(x, h) ∫_{ν̂_p(x)}^∞ y K_{h0}(y − Y_t) dy
        = p^{-1} ∑_{t=1}^n W_{c,t}(x, h) [Y_t Ḡ_{h0}(ν̂_p(x) − Y_t) + h_0 G_{1,h0}(ν̂_p(x) − Y_t)],   (8.9)

where Ḡ(u) = 1 − G(u), G_{1,h0}(u) = G_1(u/h_0), and G_1(u) = ∫_u^∞ v K(v) dv. Note that, as mentioned earlier, ν̂_p(x) in (8.9) can be any consistent estimator.
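Continuing the illustrative sketch, ν̂_p(x) can be computed by numerically inverting Ŝ_c(y | x) = 1 − F̂_c(y | x), and µ̂_p(x) by (8.9); for the Gaussian kernel, Ḡ(u) = 1 − Φ(u) and G_1(u) = ∫_u^∞ v φ(v) dv = φ(u), which simplifies the last line below. This is our own sketch under those assumptions, not the authors' implementation; the search interval for the root is a heuristic.

# Illustrative sketch of the CVaR and CES estimators (Gaussian kernel)
nu.hat = function(p, x, X, Y, h, h0)
{
  # solve F_c(y|x) = 1 - p for y, i.e., S_c(y|x) = p
  f = function(y) Fc.hat(y, x, X, Y, h, h0) - (1 - p)
  uniroot(f, interval = range(Y) + c(-4*h0, 4*h0))$root
}
mu.hat = function(p, x, X, Y, h, h0)
{
  w  = wnw.weights(X, x, h)
  nu = nu.hat(p, x, X, Y, h, h0)
  u  = (nu - Y)/h0
  sum(w*(Y*(1 - pnorm(u)) + h0*dnorm(u)))/p   # (8.9) with Gaussian K
}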
8.4.4 Real Examples
Example 1. Now we illustrate our proposed methodology by considering a real data set on
Dow Jones Industrials (DJI) index returns. We took a sample of 1801 daily prices from DJI
index, from November 3, 1998 to January 3, 2006, and computed the daily returns as 100
times the difference of the log of prices. Let Yt be the daily negative log return (log loss) of
DJI and Xt be the first lagged variable of Yt. The estimators proposed in this paper are used
to estimate the 5% CVaR and CES functions. The estimation results are shown in Figure 8.1
for the 5% CVaR estimate in Figure 8.1(a) and the 5% CES estimate in Figure 8.1(b). Both
CVaR and CES estimates exhibit a U-shape, which corresponds to the so-called “volatility
smile”. Therefore, the risk tends to be lower when the lagged log loss of DJI is close to the
empirical average and larger otherwise. We can also observe that the curves are asymmetric. This may indicate that the DJI is more likely to fall after a loss on the previous day than after a positive return of the same magnitude.
Example 2. We apply the proposed methods to estimate the conditional value-at-risk and
expected shortfall of the International Business Machine Co. (NYSE: IBM) security returns.
The data are daily prices recorded from March 1, 1996 to April 6, 2005. We use the same
method to calculate the daily returns as in Example 1. In order to estimate the value-
at-risk of a stock return, generally, the information set Xt may contain a market index of
corresponding capitalization and type, the industry index, and the lagged values of stock
return. For this example, Yt is the log loss of IBM stock returns and only two variables are
chosen as information set for the sake of simplicity. Let Xt1 be the first lagged variable of
Yt and Xt2 denote the first lagged daily log loss of Dow Jones Industrials (DJI) index. Our
Figure 8.1: (a) 5% CVaR estimate for DJI index. (b) 5% CES estimate for DJI index.
main results from the estimation of the model are summarized in Figure 8.2. The surfaces
of the estimators of IBM returns are given in Figure 8.2(a) for CVaR and in Figure 8.2(b)
for CES. For visual convenience, Figures 8.2(c) and (e) depict the estimated CVaR and CES
curves (as function of Xt2) for three different values of Xt1 = (−0.275, −0.025, 0.325) and
Figures 8.2(d) and (f) display the estimated CVaR and CES curves (as function of Xt1) for
three different values of Xt2 = (−0.225, 0.025, 0.425).
From Figures 8.2(c) - (f), we can observe that most of these curves are U-shaped. This is
consistent with the results observed in Example 1. Also, we can see that these three curves
in each figure are not parallel. This implies that the effects of lagged IBM and lagged DJI
variables on the risk of IBM are different and complex. To be concrete, let us examine Figure
8.2(d). Three curves are close to each other when the lagged IBM log loss is around −0.2
and far apart otherwise. This implies that the lagged DJI has less effect (less information) on the CVaR of IBM around this value, and a larger effect when the lagged IBM log loss is far from this value.
8.5 References
Acerbi, C. and D. Tasche (2002). On the coherence of expected shortfall. Journal of Banking and Finance, 26, 1487-1503.
Figure 8.2: (a) 5% CVaR estimates for IBM stock returns. (b) 5% CES estimates for IBM stock returns. (c) 5% CVaR estimates for three different values of lagged negative IBM returns (−0.275, −0.025, 0.325). (d) 5% CVaR estimates for three different values of lagged negative DJI returns (−0.225, 0.025, 0.425). (e) 5% CES estimates for three different values of lagged negative IBM returns (−0.275, −0.025, 0.325). (f) 5% CES estimates for three different values of lagged negative DJI returns (−0.225, 0.025, 0.425).
Artzner, P., F. Delbaen, J.M. Eber, and D. Heath (1999). Coherent measures of risk. Mathematical Finance, 9, 203-228.
Boente, G. and R. Fraiman (1995). Asymptotic distribution of smoothers based on local means and local medians under dependence. Journal of Multivariate Analysis, 54, 77-90.
Cai, Z. (2001). Weighted Nadaraya-Watson regression estimation. Statistics and Probability Letters, 51, 307-318.
Cai, Z. (2002). Regression quantiles for time series data. Econometric Theory, 18, 169-192.
Cai, Z. (2006). Trending time varying coefficient time series models with serially correlated errors. Journal of Econometrics, 136, xxx-xxx.
Cai, Z. and L. Qian (2000). Local estimation of a biometric function with covariate effects. In Asymptotics in Statistics and Probability (M. Puri, ed.), 47-70.
Cai, Z. and R.C. Tiwari (2000). Application of a local linear autoregressive model to BOD time series. Environmetrics, 11, 341-350.
Cai, Z. and X. Xu (2005). Nonparametric quantile estimations for dynamic smooth coefficient models. Working Paper, Department of Mathematics and Statistics, University of North Carolina at Charlotte. Revised for Journal of the American Statistical Association.
Carrasco, M. and X. Chen (2002). Mixing and moments properties of various GARCH and stochastic volatility models. Econometric Theory, 18, 17-39.
Chaudhuri, P. (1991). Nonparametric estimates of regression quantiles and their local Bahadur representation. The Annals of Statistics, 19, 760-777.
Chen, S.X. (2006). Nonparametric estimation of expected shortfall. Working paper, Department of Statistics, Iowa State University.
Chen, S.X. and C.Y. Tang (2005). Nonparametric inference of value at risk for dependent financial returns. Journal of Financial Econometrics, 3, 227-255.
Chernozhukov, V. and L. Umantsev (2001). Conditional value-at-risk: Aspects of modeling and estimation. Empirical Economics, 26, 271-292.
Cosma, A., O. Scaillet and R. von Sachs (2006). Multivariate wavelet-based shape preserving estimation for dependent observations. Bernoulli, in press.
Crouhy, M., D. Galai and R. Mark (2001). Risk Management. New York: McGraw-Hill.
Duffie, D. and J. Pan (1997). An overview of value at risk. Journal of Derivatives, 4, 7-49.
Duffie, D. and K.J. Singleton (2003). Credit Risk: Pricing, Measurement, and Management. Princeton: Princeton University Press.
Eberlein, E. and U. Keller (1995). Hyperbolic distributions in finance. Bernoulli, 1, 281-299.
Embrechts, P., C. Kluppelberg, and T. Mikosch (1997). Modelling Extremal Events for Insurance and Finance. New York: Springer-Verlag.
Engle, R.F. and S. Manganelli (2004). CAViaR: Conditional autoregressive value at risk by regression quantiles. Journal of Business and Economic Statistics, 22, 367-381.
Fan, J. and I. Gijbels (1996). Local Polynomial Modelling and Its Applications. London: Chapman and Hall.
Fan, J. and J. Gu (2003). Semiparametric estimation of value-at-risk. Econometrics Journal, 6, 261-290.
Fan, J., T.-C. Hu and Y.K. Truong (1994). Robust nonparametric function estimation. Scandinavian Journal of Statistics, 21, 433-446.
Fan, J., Q. Yao, and H. Tong (1996). Estimation of conditional densities and sensitivity measures in nonlinear dynamical systems. Biometrika, 83, 189-206.
Frey, R. and A.J. McNeil (2002). VaR and expected shortfall in portfolios of dependent credit risks: Conceptual and practical insights. Journal of Banking and Finance, 26, 1317-1334.
Hall, P., R.C.L. Wolff, and Q. Yao (1999). Methods for estimating a conditional distribution function. Journal of the American Statistical Association, 94, 154-163.
Hurlimann, W. (2003). A Gaussian exponential approximation to some compound Poisson distributions. ASTIN Bulletin, 33, 41-55.
Ibragimov, I.A. and Yu.V. Linnik (1971). Independent and Stationary Sequences of Random Variables. Groningen, the Netherlands: Wolters-Noordhoff.
Jorion, P. (2001). Value at Risk, 2nd Edition. New York: McGraw-Hill.
Jorion, P. (2003). Financial Risk Manager Handbook, 2nd Edition. New York: John Wiley.
Koenker, R. and G.W. Bassett (1978). Regression quantiles. Econometrica, 46, 33-50.
Lejeune, M.G. and P. Sarda (1988). Quantile regression: A nonparametric approach. Computational Statistics and Data Analysis, 6, 229-281.
Masry, E. and J. Fan (1997). Local polynomial estimation of regression functions for mixing processes. Scandinavian Journal of Statistics, 24, 165-179.
McNeil, A. (1997). Estimating the tails of loss severity distributions using extreme value theory. ASTIN Bulletin, 27, 117-137.
Morgan, J.P. (1996). RiskMetrics - Technical Document, 4th Edition. New York.
Oakes, D. and T. Dasu (1990). A note on residual life. Biometrika, 77, 409-410.
Potscher, B.M. and I.R. Prucha (1997). Dynamic Nonlinear Econometric Models: Asymptotic Theory. Berlin: Springer-Verlag.
Rockafellar, R. and S. Uryasev (2000). Optimization of conditional value-at-risk. Journal of Risk, 2, 21-41.
Roussas, G.G. (1969). Nonparametric estimation of the transition distribution function of a Markov process. The Annals of Mathematical Statistics, 40, 1386-1400.
Roussas, G.G. (1991). Estimation of transition distribution function and its quantiles in Markov processes: Strong consistency and asymptotic normality. In G.G. Roussas (ed.), Nonparametric Functional Estimation and Related Topics, pp. 443-462. Amsterdam: Kluwer Academic.
Samanta, M. (1989). Nonparametric estimation of conditional quantiles. Statistics and Probability Letters, 7, 407-412.
Scaillet, O. (2004). Nonparametric estimation and sensitivity analysis of expected shortfall. Mathematical Finance, 14, 115-129.
Scaillet, O. (2005). Nonparametric estimation of conditional expected shortfall. Revue Assurances et Gestion des Risques/Insurance and Risk Management Journal, 74, 639-660.
Truong, Y.K. (1989). Asymptotic properties of kernel estimators based on local median. The Annals of Statistics, 17, 606-617.
Truong, Y.K. and C.J. Stone (1992). Nonparametric function estimation involving time series. The Annals of Statistics, 20, 77-97.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. New York: John Wiley & Sons.
Yu, K. and M.C. Jones (1997). A comparison of local constant and local linear regression quantile estimation. Computational Statistics and Data Analysis, 25, 159-166.
Yu, K. and M.C. Jones (1998). Local linear quantile regression. Journal of the American Statistical Association, 93, 228-237.
Chapter 9
Long Memory Models and Structural Changes
Long memory time series have been a popular area of research in economics, finance, statistics, and other applied fields such as the hydrological sciences in recent years. Long memory dependence was first observed by the hydrologist Hurst (1951) while analyzing the minimal water flow of the Nile River in planning the Aswan Dam, and Granger (1966) initiated an intensive discussion of long memory dependence and its consequences in economics. But in many applications, it is not clear whether the observed dependence structure is real long memory or an artefact of some other phenomenon such as structural breaks or deterministic trends. Genuine long memory in the data would have strong consequences for modeling and inference.
9.1 Long Memory Models
9.1.1 Methodology
We have discussed that for a stationary ARMA time series the ACF decays exponentially to zero as the lag increases.
ACF converges to 1 for all fixed lags as the sample size increases; see Chan and Wei (1988)
and Tiao and Tsay (1983). There exist some time series whose ACF decays slowly to zero at
a polynomial rate as the lag increases. These processes are referred to as long memory or
long range dependent time series. One such example is the fractionally differenced
process defined by
(1 − L)^d x_t = w_t,  |d| < 0.5,   (9.1)

where w_t is a white noise series; d is called the long memory parameter, and H = d + 1/2
is called the Hurst parameter; see Hurst (1951). Properties of model (9.1) have been widely
studied in the literature (e.g., Beran, 1994). We summarize some of these properties below.
1. If d < 0.5, then x_t is a weakly stationary process and has the infinite MA representation

   x_t = w_t + ∑_{k=1}^∞ ψ_k w_{t−k}, with ψ_k = d(d + 1) · · · (d + k − 1)/k!, which equals the binomial coefficient C(k + d − 1, k).

2. If d > −0.5, then x_t is invertible and has the infinite AR representation

   ∑_{k=0}^∞ π_k x_{t−k} = w_t, with π_0 = 1 and π_k = (0 − d)(1 − d) · · · (k − 1 − d)/k! = C(k − d − 1, k),

   the π_k being exactly the expansion coefficients of (1 − L)^d.

3. For |d| < 0.5, the ACF of x_t is

   ρ_x(h) = d(1 + d) · · · (h − 1 + d) / [(1 − d)(2 − d) · · · (h − d)], h ≥ 1.

   In particular, ρ_x(1) = d/(1 − d) and, as h → ∞,

   ρ_x(h) ≈ [(−d)!/(d − 1)!] h^{2d−1}.

4. For |d| < 0.5, the PACF of x_t is φ_{h,h} = d/(h − d) for h ≥ 1.

5. For |d| < 0.5, the spectral density function f_x(·) of x_t, which is the Fourier transform of the ACF γ_x(h) of x_t, that is,

   f_x(ν) = (1/2π) ∑_{h=−∞}^∞ γ_x(h) exp(−i h π ν)

   for ν ∈ [−1, 1], where i = √−1, satisfies

   f_x(ν) ∼ ν^{−2d} as ν → 0,   (9.2)

   where ν ∈ [0, 1] denotes the frequency.
See the books by Hamilton (1994) and Brockwell and Davis (1991) for details about spectral analysis. The basic idea and properties of the spectral density and its estimation are discussed in Section 9.1.2 below.
Of particular interest here is the behavior of the ACF of x_t when 0 < d < 0.5. Property 3 says that ρ_x(h) ∼ h^{2d−1}, which decays at a polynomial rather than an exponential rate. For this reason, such an x_t process is called a long memory time series. A special characteristic of the spectral density function in (9.2) is that the spectrum diverges to infinity as ν → 0. In contrast, the spectral density function of a stationary ARMA process is bounded for all ν ∈ [−1, 1].
Earlier we used the binomial theorem for non-integer powers:

(1 − L)^d = ∑_{k=0}^∞ (−1)^k C(d, k) L^k,

where C(d, k) denotes the generalized binomial coefficient.
If the fractionally differenced series (1 − L)^d x_t follows an ARMA(p, q) model, then x_t is called a fractionally differenced autoregressive moving average (ARFIMA(p, d, q)) process, which generalizes the ARIMA model by allowing for non-integer d. In practice, if the sample ACF of a time series is not large in magnitude but decays slowly, then the series may have long memory. For more discussions, we refer to the book by Beran (1994). For the pure fractionally differenced model in (9.1), one can estimate d using either a maximum likelihood method in the time domain (by assuming that the distribution is known), the approximate Whittle likelihood (see below), or a regression method with the logged periodogram at the lower frequencies (using (9.2)) in the frequency domain; a small sketch of the latter follows. Finally, long memory models have attracted some attention in the finance literature, in part because of work on fractional Brownian motion in continuous time models.
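To illustrate the frequency-domain regression idea, the following is a minimal sketch (ours, not from the notes) of the log-periodogram regression estimator of d based on (9.2): regress the log periodogram on −2 log ν over the m lowest Fourier frequencies. The cutoff m = ⌊√n⌋ and the function name gph.d are illustrative choices, not recommendations from the notes.

# Illustrative sketch: log-periodogram regression estimate of d using (9.2)
gph.d = function(x, m = floor(sqrt(length(x))))
{
  n  = length(x)
  P  = Mod(fft(x - mean(x)))^2/(2*pi*n)   # periodogram
  nu = (1:m)/n                            # m lowest Fourier frequencies
  fit = lm(log(P[2:(m + 1)]) ~ I(-2*log(nu)))
  unname(coef(fit)[2])                    # slope is the estimate of d
}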
9.1.2 Spectral Density
We define the basic statistics used for detecting periodicities in time series and for deter-
mining whether different time series are related at certain frequencies. The power spectrum
is a measure of the power or variance of a time series at a particular frequency ν; the function that displays the power or variance for each frequency ν, say f_x(ν), is called the power
spectral density. Although the function fx(·) is the Fourier transform of the autocovariance
function γx(h), we shall not make much use of this theoretical result in our generally applied
discussion. For the theoretical results, please read the book by Brockwell and Davis (1991,
§10.3).
The discrete Fourier transform of a sampled time series {x_t}_{t=0}^{n−1} is defined as

X(ν_k) = n^{-1/2} ∑_{t=0}^{n−1} x_t exp(−2πi ν_k t),   (9.3)

where ν_k = k/n, k = 0, . . . , n − 1, defines the set of frequencies over which (9.3) is computed. This means that the frequencies over which (9.3) is evaluated are of the form ν = 0, 1/n, 2/n, . . . , (n − 1)/n. The evaluation of (9.3) proceeds using the fast Fourier transform (FFT) and usually assumes that the length n of the series is some power of 2. If n is not a power of 2, the series can be padded by adding zeros so that the extended series corresponds to the next highest power of two. One might want to do this anyway if frequencies of the form ν_k = k/n are not close enough to the frequencies of interest.
Since the values of (9.3) have a real and an imaginary part at every frequency and can be positive or negative, it is conventional to calculate first the squared magnitude of (9.3), which is called the periodogram. The periodogram is defined as

P_x(ν_k) = |X(ν_k)|²   (9.4)

and is just the sum of the squares of the sine and cosine transforms of the series. If we consider the power spectrum f_x(ν_k) as the quantity to estimate, it is approximately true (see the books by Hamilton (1994) and Brockwell and Davis (1991, §10.3)) that

E[P(ν_k)] = E[|X(ν_k)|²] ≈ f_x(ν_k),   (9.5)
which implies that the periodogram is an approximately unbiased estimator of the power
spectrum at frequency ν_k. (For Ph.D. students, I encourage you to read the related books to understand the details on this aspect, which is important for your future research.) One may also show that the approximate covariance between two frequencies ν_k and ν_l is zero for k ≠ l when both frequencies are multiples of 1/n. One can also show that the sine and cosine parts of (9.3) are uncorrelated and approximately normally distributed with equal variances f_x(ν_k)/2. This implies that the squared standardized real and imaginary components have chi-squared distributions with 1 degree of freedom each. The periodogram is the sum of two such squared variables and must then have a chi-squared distribution with 2 degrees of freedom. In other words, the approximate distribution of P(ν_k) is exponential with parameter λ_k = f(ν_k). The Whittle likelihood is based on the approximate exponential distribution of {P(ν_k)}_{k=0}^{n−1}.
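As a quick check of these definitions, here is a small sketch (ours): X(ν_k) and P(ν_k) are computed via R's fft() with the scaling in (9.3), and a Whittle-type estimate of d is obtained by minimizing a profiled objective over the m lowest frequencies. This is the local Whittle estimator, one common way to exploit the approximate exponential distribution of the periodogram; the cutoff m is an illustrative tuning choice.

# Illustrative sketch: periodogram via the FFT and a local Whittle estimate of d
whittle.d = function(x, m = floor(sqrt(length(x))))
{
  n  = length(x)
  X  = fft(x - mean(x))/sqrt(n)   # X(nu_k) as in (9.3)
  P  = Mod(X)^2                   # periodogram (9.4)
  nu = (1:m)/n
  Pk = P[2:(m + 1)]
  R  = function(d) log(mean(nu^(2*d)*Pk)) - 2*d*mean(log(nu))
  optimize(R, interval = c(-0.49, 0.49))$minimum
}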
It should be noted that trends introduce apparent long range periodicities which inflate the spectrum at low frequencies (small values of ν_k), since the periodogram treats the trend as part of one very long-period cycle. This behavior will obscure possibly important components at higher frequencies. For this reason (as with the auto and cross correlations) it
is important to work in (9.3) with the transform of the mean-adjusted (x_t − x̄) or detrended (x_t − a − b t) series. A further modification that sometimes improves the approximation in
the expectation in (9.5) is tapering, a procedure whereby each point x_t is replaced by a_t x_t,
where at is a function, often a cosine bell, that is maximum at the midpoint of the series and
decays away at the extremes. Tapering makes a difference in engineering applications where
there are large variations in the value of the spectral density fx(·). As a final comment,
we note that the fast Fourier transform algorithms for computing (9.3) generally work best
when the sample length is some power of two. For this reason, it is conventional to extend
the data series x_t, t = 0, 1, . . . , n − 1, to a length n* > n that is a power of two by setting x_t = 0 for t = n, . . . , n* − 1. If this is done, the frequency ordinates ν_k = k/n are replaced by ν*_k = k/n* and we proceed as before.
Theorem 10.3.2 in Brockwell and Davis (1991, p.347) shows that P (νk) is not a con-
sistent estimator of fx(ν). Since for large n, the periodogram ordinates are approximately
uncorrelated with variances changing only slightly over small frequency intervals, we might
hope to construct a consistent estimator of fx(ν) by averaging the periodogram ordinates
in a small neighborhood of ν, which is called the smoothing estimator. On the other hand,
such a smoothing estimator reduces the variability of the raw periodogram described above. This procedure leads to a spectral estimator of the form
f̂_x(ν_k) = (1/(2L + 1)) ∑_{l=−L}^{L} P_x(ν_k + l/n) = ∑_{l=−L}^{L} w_l P_x(ν_k + l/n),   (9.6)
when the periodogram is smoothed over 2L + 1 frequencies. The width of the interval
over which the frequency average is taken is called the bandwidth. Since there are 2L + 1
frequencies of width 1/n, the bandwidth B in this case is approximately B = (2L+1)/n. The
smoothing procedure improves the quality of the estimator for the spectrum since it is now
the average of 2L + 1 random variables, each having a chi-squared distribution with 2 degrees of
freedom. The distribution of the smoothed estimator then will have df = 2(2L+ 1) degrees
of freedom. If the series is adjusted to the next highest power of 2, say n∗, the adjusted
degrees of freedom for the estimator will be df ∗ = 2(2L + 1)n/n∗ and the new bandwidth
will be B = (2L+ 1)/n∗.
Clearly, in (9.6), w_l = 1/(2L + 1), which is called the Daniell window (the corresponding spectral window is given by the Daniell kernel). One popular approach is to take w_l to be w_l = w(ν_l)/n. If w(x) = r_n^{-1} sin²(r_n x/2)/sin²(x/2), it is the well known Bartlett or triangular window (the corresponding spectral window is given by the Fejér kernel), where r_n = ⌊n/(2L)⌋. If w(x) = sin((r_n + 0.5)x)/sin(x/2), it is the rectangular window (the corresponding spectral window is given by the Dirichlet kernel). See Brockwell and Davis (1991, §10.4) for more discussions.
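A minimal sketch (ours) of the Daniell-window smoother in (9.6), averaging 2L + 1 neighboring periodogram ordinates with the equal weights w_l = 1/(2L + 1); the circular filter wraps around the frequency grid, and the function name smooth.pgram is illustrative.

# Illustrative sketch of (9.6): smoothed periodogram with a Daniell window
smooth.pgram = function(x, L = 5)
{
  n = length(x)
  P = Mod(fft(x - mean(x))/sqrt(n))^2           # raw periodogram
  w = rep(1/(2*L + 1), 2*L + 1)                 # Daniell weights
  fhat = stats::filter(P, w, circular = TRUE)   # average P(nu_k + l/n), l = -L..L
  list(freq = (0:(n - 1))/n, spec = as.numeric(fhat))
}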
9.1.3 Applications
The usage of the function fracdiff() is
fracdiff(x, nar = 0, nma = 0,
ar = rep(NA, max(nar, 1)), ma = rep(NA, max(nma, 1)),
dtol = NULL, drange = c(0, 0.5), h, M = 100)
This function can be used to compute the maximum likelihood estimators of the parameters
of a fractionally-differenced ARIMA(p, d, q) model, together (if possible) with their estimated
covariance and correlation matrices and standard errors, as well as the value of the maximized
likelihood. The likelihood is approximated using the fast and accurate method of Haslett
and Raftery (1989). To generate simulated long-memory time series data from the fractional
ARIMA(p, d, q) model, we can use the following function fracdiff.sim() and its usage is
fracdiff.sim(n, ar = NULL, ma = NULL, d,
rand.gen = rnorm, innov = rand.gen(n+q, ...),
n.start = NA, allow.0.nstart = FALSE, ..., mu = 0.)
An alternative way to simulate a long memory time series is to use the function arima.sim().
The manual for the package fracdiff can be downloaded from the web site at
http://cran.cnr.berkeley.edu/doc/packages/fracdiff.pdf
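A short usage sketch (ours), simulating a pure fractional noise with fracdiff.sim() and recovering d with fracdiff():

library(fracdiff)
set.seed(12345)
sim = fracdiff.sim(2000, d = 0.3)   # pure ARFIMA(0, 0.3, 0)
fit = fracdiff(sim$series, nar = 0, nma = 0)
fit$d                               # should be close to the true d = 0.3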
The function spec.pgram() in R calculates the periodogram using a fast Fourier trans-
form, and optionally smooths the result with a series of modified Daniell smoothers (moving
averages giving half weight to the end values). The usage of this function is
spec.pgram(x, spans = NULL, kernel, taper = 0.1,
pad = 0, fast = TRUE, demean = FALSE, detrend = TRUE,
plot = TRUE, na.action = na.fail, ...)
We can also use the function spectrum() to estimate the spectral density of a time series
and its usage is
spectrum(x, ..., method = c("pgram", "ar"))
Finally, it is worth pointing out that there is a package called longmemo for long-memory processes, which can be downloaded from http://cran.cnr.berkeley.edu/doc/packages/longmemo.pdf. This package also provides a simple periodogram estimator via the function per(), and other functions such as llplot() and lxplot() for plotting spectral densities. See the manual for details.
Example 9.1: As an illustration, Figure 9.1 shows the sample ACFs of the absolute series
of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted
(right top panel) indexes from July 3, 1962 to December 31, 1997 and the sample partial
autocorrelation function of the absolute series of daily simple returns for the CRSP value-
weighted (left middle panel) and equal-weighted (right middle panel) indexes. The ACFs
are relatively small in magnitude, but decay very slowly; they appear to be significant at
the 5% level even after 300 lags. Only the first few lags of the PACFs lie outside the confidence interval; the rest are basically within it. For more
information about the behavior of sample ACF of absolute return series, see Ding, Granger,
and Engle (1993). To estimate the long memory parameter d, we can use the function fracdiff() in the package fracdiff in R; the results are d̂ = 0.1867 for the absolute returns of the value-weighted index and d̂ = 0.2732 for the absolute returns of the equal-weighted index. To support our conclusion above, we plot the log smoothed spectral density
bottom panel) and equal-weighted (right bottom panel). They show clearly that both log
spectral densities decay like a log function and they support the spectral densities behavior
like (9.2).
9.2 Related Problems and New Developments
The reading materials are the papers by Ding, Granger, and Engle (1993), Kramer, Sibbertsen and Kleiber (2002), Zeileis, Leisch, Hornik, and Kleiber (2002), and Sibbertsen (2004a).
Figure 9.1: Sample autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left top panel) and equal-weighted (right top panel) indexes. Sample partial autocorrelation function of the absolute series of daily simple returns for the CRSP value-weighted (left middle panel) and equal-weighted (right middle panel) indexes. The log smoothed spectral density estimation of the absolute series of daily simple returns for the CRSP value-weighted (left bottom panel) and equal-weighted (right bottom panel) indexes.
9.2.1 Long Memory versus Structural Breaks
It is a well known stylized fact that many financial time series, such as squares or absolute values of returns or volatilities, and even returns themselves, behave as if they had long memory; see Ding, Granger and Engle (1993) and Sibbertsen (2004b). On the other hand, it is also well known that long memory is easily confused with structural change, in the sense that the slow decay of empirical autocorrelations which is typical for a time series with long memory is also produced when a short-memory time series exhibits structural breaks. Therefore, it is of considerable theoretical and empirical interest to discriminate between these sources of slowly decaying empirical autocorrelations.
A structural break is another type of nonstationarity; it arises when the population regression function changes over the sample period. This may occur because of changes in economic policy, changes in the structure of the economy or industry, or events that change the dynamics of specific industries or firm-related quantities such as inventories, sales, and production. If such changes, called breaks, occur, then regression models that neglect those changes lead to misleading inference or forecasts.
Breaks may result from a discrete change (or changes) in the population regression coefficients at distinct dates or from a gradual evolution of the coefficients over a longer period of time. Discrete breaks may result from major changes in economic policy or in the economy (oil shocks), while "gradual" breaks, in which the population parameters evolve slowly over time, may result from the slow evolution of economic policy. The former might be characterized by an indicator function and the latter can be described by a smooth transition function.
If a break occurs in the population parameters during the sample, then the OLS regression estimates over the full sample will estimate a relationship that holds on "average". The question now is how to test for breaks.
9.2.2 Testing for Breaks (Instability)
Tests for breaks in the regression parameters depend on whether the break date is known or not. If the date of the hypothesized break in the coefficients is known, then the null hypothesis of no break can be tested using a dummy or indicator variable. For example, consider the following model:
y_t = β_0 + β_1 y_{t−1} + δ_1 x_t + u_t, if t ≤ τ,
y_t = (β_0 + γ_0) + (β_1 + γ_1) y_{t−1} + (δ_1 + γ_2) x_t + u_t, if t > τ,
where τ denotes the hypothesized break date. The null hypothesis of no break is H_0: γ_0 = γ_1 = γ_2 = 0, and the hypothesis of a break can be tested using the F-statistic. This is called a Chow test, proposed by Chow (1960), for a break at a known break date. Indeed,
the above structural break model can be regarded as a special case of the following trending
time series model
yt = β0(t) + β1(t) yt−1 + δ1(t)xt + ut.
For more discussions, see Cai (2007). If there is a distinct break in the regression function,
the date at which the largest Chow statistic occurs is an estimator of the break date.
If there are more variables or more lags, this test can be extended by constructing binary interaction variables for all the regressors. This approach can be modified
to check for a break in a subset of the coefficients. The break date is unknown in most applications, but you may suspect that a break occurred sometime between two dates, τ_0 and τ_1. The Chow test can be modified to handle this by testing for a break at all possible dates t between τ_0 and τ_1 and then using the largest of the resulting F-statistics to test for a break at an unknown date. This modified test is often called the Quandt likelihood ratio (QLR) statistic or the supWald or supF statistic:

supF = max{F(τ_0), F(τ_0 + 1), · · · , F(τ_1)}.
Since the supF statistic is the largest of many F-statistics, its distribution is not the same
as an individual F-statistic. The critical values for supF statistic must be obtained from a
special distribution. This distribution depends on the number of restrictions being tested, m,
τ0, τ1, and the subsample over which the F-statistics are computed expressed as a fraction
of the total sample size. Other types of F-tests are the average F and exponential F given by

aveF = (1/(τ_1 − τ_0 + 1)) ∑_{j=τ_0}^{τ_1} F_j

and

expF = log( (1/(τ_1 − τ_0 + 1)) ∑_{j=τ_0}^{τ_1} exp(F_j/2) ).
For details on modified F -tests, see the papers by Hansen (1992) and Andrews (1993).
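With the strucchange package (used again in Section 9.3), these statistics can be computed directly from the sequence of F-statistics. A small sketch, using the Nile data analyzed in Example 9.2 below:

# Sketch: supF, aveF, expF from a sequence of F statistics (strucchange)
library(strucchange)
data(Nile)
fs = Fstats(Nile ~ 1, from = 0.15, to = 0.85)   # 15% trimming
supF = max(fs$Fstats)
aveF = mean(fs$Fstats)
expF = log(mean(exp(fs$Fstats/2)))
sctest(fs, type = "expF")                       # p-values also via sctest()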
For the large-sample approximation to the distribution of the supF statistic to be a good
one, the subsample endpoints, τ_0 and τ_1, cannot be too close to the ends of the sample. That is why the supF statistic is computed over a "trimmed" subset of the sample. A popular choice is to use 15% trimming, that is, to set τ_0 = 0.15T and τ_1 = 0.85T. With 15%
trimming, the F-statistic is computed for break dates in the central 70% of the sample. Table 9.1 presents the critical values for the supF statistic computed with 15% trimming. This table is from Stock and Watson (2003), and you should check the book for a complete table. The
Table 9.1: Critical Values of the QLR statistic with 15% Trimming

Number of restrictions (m)    10%     5%     1%
 1                            7.12    8.68   12.16
 2                            5.00    5.86    7.78
 3                            4.09    4.71    6.02
 4                            3.59    4.09    5.12
 5                            3.26    3.66    4.53
 6                            3.02    3.37    4.12
 7                            2.84    3.15    3.82
 8                            2.69    2.98    3.57
 9                            2.58    2.84    3.38
10                            2.48    2.71    3.23
supF test can detect a single break, multiple discrete breaks, and a slow evolution of the
regression parameters.
Three classes of structural change tests (or tests for parameter instability) which have
been receiving much attention in both the statistics and econometrics communities but have
been developed in rather loosely connected lines of research are unified by embedding them
into the framework of generalized M-fluctuation tests (Zeileis and Hornik (2003)). These
classes are tests based on maximum likelihood scores (including the Nyblom-Hansen test),
on F statistics (supF, aveF, expF tests), and on OLS residuals (OLS-based CUSUM and MOSUM tests; see Chu, Hornik and Kuan (1995)), which are special cases of the so-called empirical fluctuation process, termed efp in Zeileis, Leisch, Hornik, and Kleiber (2002) and Zeileis and Hornik (2003). Zeileis (2005) showed that representatives from these classes
are special cases of the generalized M-fluctuation tests, based on the same functional central
limit theorem, but employing different functionals for capturing excessive fluctuations. After
embedding these tests into the same framework and thus understanding the relationship
between these procedures for testing in historical samples, it is shown how the tests can
also be extended to a monitoring situation. This is achieved by establishing a general M-
fluctuation monitoring procedure and then applying the different functionals corresponding
to monitoring with ML scores, F statistics and OLS residuals. In particular, an extension of
the supF test to a monitoring scenario is suggested and illustrated on a real world data set.
In R, there are two packages, strucchange, developed by Zeileis, Leisch, Hornik, and Kleiber (2002), and segmented, that provide several methods for testing for breaks. They can be downloaded at http://cran.cnr.berkeley.edu/doc/packages/strucchange.pdf and http://cran.cnr.berkeley.edu/doc/packages/segmented.pdf, respectively.
Example 9.2: We use the data built into R for the minimal water flow of the Nile River considered by Hurst (1951) in planning the Aswan Dam: yearly data from 1871 to 1970 with 100 observations, under the data name Nile. Alternatively, one might use the data built into the package longmemo in R: yearly data with 663 observations from 622 to 1284, under the data name NileMin. Here we use R to analyze the data set Nile. As a result, the p-value for testing a break
Figure 9.2: Break testing results for the Nile River data: (a) Plot of F-statistics. (b) The scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
is very small (see the computer output) so that H0 is rejected. The details are summarized
in Figures 9.2(a)-(b). It is clear from Figure 9.2(b) that there is one breakpoint for the Nile
River data: the annual flows drop in 1898 because the first Aswan dam was built. To test
the null hypothesis that the annual flow remains constant over the years, we also compute the OLS-based CUSUM process and plot it with standard linear and alternative boundaries in Figures 9.2(c)-(d).
Example 9.3: We build a time series model for the real (quarterly) oil price listed in the tenth column of the file "ex9-3.csv", ranging from the first quarter of 1959 to the third quarter of 2002. Before building such a model, we want to see whether there is any structural change in the oil price. As a result, the p-value for testing a break is very small (see the computer output) so that H0 is rejected. The details are summarized in Figure 9.3. It is clear from Figure
Figure 9.3: Break testing results for the oil price data: (a) Plot of F-statistics. (b) Scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
9.3(b) that there is one breakpoint for the oil price. We also compute the OLS-based CUSUM process and plot it with standard linear and alternative boundaries in Figures 9.3(c)-(d).
If we consider the quarterly price level (CPI) listed in the eighth column of the file "ex9-3.csv" (1959.1 – 2002.3), we want to see whether there is any structural change in the quarterly consumer price index. The details are summarized in Figure 9.4. It is clear from Figure 9.4(b) that there would be one breakpoint for the consumer price index. We also compute the OLS-based CUSUM process and plot it with standard linear and alternative boundaries in Figures 9.4(c)-(d). Please think about the conclusion for the CPI!!! Do you believe this result? If not, what happened?
Figure 9.4: Break testing results for the consumer price index data: (a) Plot of F-statistics. (b) Scatterplot with the breakpoint. (c) Plot of the empirical fluctuation process with linear boundaries. (d) Plot of the empirical fluctuation process with alternative boundaries.
Sometimes you might suspect that a series either has a unit root or is a trend stationary process with a structural break at some unknown period of time, and you would want to test the null hypothesis of a unit root against the alternative of a trend stationary process with a structural break. This is exactly the hypothesis testing procedure proposed by Zivot and Andrews (1992). In this testing procedure, the null hypothesis is a unit root process without any structural breaks and the alternative hypothesis is a trend stationary process with a possible structural change occurring at an unknown point in time. Zivot and Andrews (1992) suggested estimating the following regression:
x_t = µ + β t + α x_{t−1} + ∑_{i=1}^k c_i ∆x_{t−i} + e_t, if t ≤ τT,
x_t = [µ + θ] + [β t + γ (t − T_B)] + α x_{t−1} + ∑_{i=1}^k c_i ∆x_{t−i} + e_t, if t > τT,   (9.7)
where τ = T_B/T is the break fraction. Model (9.7) is estimated by OLS with the break point ranging over the sample, and the t-statistic for testing α = 1 is computed. The minimum t-statistic is reported. The 1%, 5% and 10% critical values are −5.34, −4.80 and −4.58, respectively. The appropriate number of lags in differences is estimated for each value of τ. Please read the papers by Zivot and Andrews (1992) and Sadorsky (1999) for more details about this method and empirical applications.
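One R implementation of this test is the function ur.za() in the package urca; a minimal sketch (ours), assuming the oil price series op from Example 9.3 is already loaded (the lag choice lag = 4 is purely illustrative):

# Sketch: Zivot-Andrews test via the urca package (lag choice illustrative)
library(urca)
za = ur.za(op, model = "both", lag = 4)   # break in both intercept and trend
summary(za)   # reports the minimum t-statistic and the estimated break point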
9.2.3 Long Memory versus Trends
A more general problem than distinguishing between long memory and structural breaks is the question of whether general trends in the data can cause the Hurst effect. The paper by Bhattacharya et al. (1983) was the first to deal with this problem. They found that adding a deterministic trend to a short memory process can cause spurious long memory, which makes modeling long memory time series more complicated. For this issue, we will follow Section 5 of Zeileis (2004a). Note that there is a lot of ongoing research (theoretical and empirical) in this area.
9.3 Computer Codes
#####################
# This is Example 9.1
#####################
z1<-matrix(scan("c:/res-teach/xiamen12-06/data/ex9-1.txt"),
byrow=T,ncol=5)
vw=abs(z1[,3])
n_vw=length(vw)
ew=abs(z1[,4])
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.1.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(3,2),mex=0.4,bg="light green")
acf(vw, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="")
text(200,0.38,"ACF for value-weighted index")
acf(ew, ylab="",xlab="",ylim=c(-0.1,0.4),lag=400,main="")
text(200,0.38,"ACF for equal-weighted index")
pacf(vw, ylab="",xlab="",ylim=c(-0.1,0.3),lag=400,main="")
text(200,0.28,"PACF for value-weighted index")
pacf(ew, ylab="",xlab="",ylim=c(-0.1,0.3),lag=400,main="")
text(200,0.28,"PACF for equal-weighted index")
library(fracdiff)
d1=fracdiff(vw,ar=0,ma=0)
d2=fracdiff(ew,ar=0,ma=0)
print(c(d1$d,d2$d))
m1=round(log(n_vw)/log(2)+0.5)
pad1=1-n_vw/2^m1
vw_spec=spec.pgram(vw,spans=c(5,7),demean=T,detrend=T,pad=pad1,plot=F)
ew_spec=spec.pgram(ew,spans=c(5,7),demean=T,detrend=T,pad=pad1,plot=F)
vw_x=vw_spec$freq[1:1000]
vw_y=vw_spec$spec[1:1000]
ew_x=ew_spec$freq[1:1000]
ew_y=ew_spec$spec[1:1000]
scatter.smooth(vw_x,log(vw_y),span=1/15,ylab="",xlab="",col=6,cex=0.7,
main="")
text(0.03,-7,"Log Smoothed Spectral Density of VW")
scatter.smooth(ew_x,log(ew_y),span=1/15,ylab="",xlab="",col=7,cex=0.7,
main="")
text(0.03,-7,"Log Smoothed Spectral Density of EW")
dev.off()
##########################
# This is for Example 9.2
##########################
graphics.off()
library(strucchange)
library(longmemo) # not used
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.2.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light blue")
#if(! "package:stats" %in% search()) library(ts)
## Nile data with one breakpoint: the annual flows drop in 1898
## because the first Aswan dam was built
data(Nile)
## test whether the annual flow remains constant over the years
fs.nile=Fstats(Nile ~ 1)
plot(fs.nile)
print(sctest(fs.nile))
plot(Nile)
lines(breakpoints(fs.nile))
## test the null hypothesis that the annual flow remains constant
## over the years
## compute OLS-based CUSUM process and plot
## with standard and alternative boundaries
ocus.nile=efp(Nile ~ 1, type = "OLS-CUSUM")
plot(ocus.nile)
plot(ocus.nile, alpha = 0.01, alt.boundary = TRUE)
## calculate corresponding test statistic
print(sctest(ocus.nile))
dev.off()
#########################
# This is for Example 9.3
#########################
y=read.csv("c:/res-teach/xiamen12-06/data/ex9-3.csv",header=T,skip=1)
op=y[,10] # oil price
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.3.eps",
horizontal=F,width=6,height=6)
par(mfrow=c(2,2),mex=0.4,bg="light pink")
op=ts(op)
fs.op=Fstats(op ~ 1) # no lags and covariate
#plot(op,type="l")
plot(fs.op)
print(sctest(fs.op))
## visualize the breakpoint implied by the argmax of the F statistics
plot(op,type="l")
lines(breakpoints(fs.op))
ocus.op=efp(op~ 1, type = "OLS-CUSUM")
plot(ocus.op)
plot(ocus.op, alpha = 0.01, alt.boundary = TRUE)
## calculate corresponding test statistic
print(sctest(ocus.op))
dev.off()
cpi=y[,8]
cpi=ts(cpi)
fs.cpi=Fstats(cpi~1)
print(sctest(fs.cpi))
postscript(file="c:/res-teach/xiamen12-06/figs/fig-9.4.eps",
horizontal=F,width=6,height=6)
#win.graph()
par(mfrow=c(2,2),mex=0.4,bg="light yellow")
#plot(cpi,type="l")
plot(fs.cpi)
## visualize the breakpoint implied by the argmax of the F statistics
plot(cpi,type="l")
lines(breakpoints(fs.cpi))
ocus.cpi=efp(cpi~ 1, type = "OLS-CUSUM")
plot(ocus.cpi)
plot(ocus.cpi, alpha = 0.01, alt.boundary = TRUE)
## calculate corresponding test statistic
print(sctest(ocus.cpi))
dev.off()
9.4 References
Andrews, D.W.K. (1993). Tests for parameter instability and structural change with unknown change point. Econometrica, 61, 821-856.
Bhattacharya, R.N., V.K. Gupta and W. Waymire (1983). The Hurst effect under trends. Journal of Applied Probability, 20, 649-662.
Beran, J. (1994). Statistics for Long-Memory Processes. London: Chapman and Hall.
Brockwell, P.J. and R.A. Davis (1991). Time Series: Theory and Methods. New York: Springer.
Chan, N.H. and C.Z. Wei (1988). Limiting distributions of least squares estimates of unstable autoregressive processes. Annals of Statistics, 16, 367-401.
Chow, G.C. (1960). Tests of equality between sets of coefficients in two linear regressions. Econometrica, 28, 591-605.
Chu, C.S.J., K. Hornik and C.M. Kuan (1995). MOSUM tests for parameter constancy. Biometrika, 82, 603-617.
Ding, Z., C.W.J. Granger and R.F. Engle (1993). A long memory property of stock returns and a new model. Journal of Empirical Finance, 1, 83-106.
Hamilton, J.D. (1994). Time Series Analysis. Princeton: Princeton University Press.
Hansen, B.E. (1992). Testing for parameter instability in linear models. Journal of Policy Modeling, 14, 517-533.
Haslett, J. and A.E. Raftery (1989). Space-time modelling with long-memory dependence: Assessing Ireland's wind power resource (with discussion). Applied Statistics, 38, 1-50.
Hsu, C.-C. and C.-M. Kuan (2000). Long memory or structural changes: Testing method and empirical examination. Working paper, Department of Economics, National Central University, Taiwan.
Hurst, H.E. (1951). Long-term storage capacity of reservoirs. Transactions of the American Society of Civil Engineers, 116, 770-799.
Granger, C.W.J. (1966). The typical spectral shape of an economic variable. Econometrica, 34, 150-161.
Kramer, W., P. Sibbertsen and C. Kleiber (2002). Long memory versus structural change in financial time series. Allgemeines Statistisches Archiv, 86, 83-96.
Nyblom, J. (1989). Testing for the constancy of parameters over time. Journal of the American Statistical Association, 84, 223-230.
Sadorsky, P. (1999). Oil price shocks and stock market activity. Energy Economics, 21, 449-469.
Sibbertsen, P. (2004a). Long memory versus structural breaks: An overview. Statistical Papers, 45, 465-515.
Sibbertsen, P. (2004b). Long memory in volatilities of German stock returns. Empirical Economics, 29, 477-488.
Stock, J.H. and M.W. Watson (2003). Introduction to Econometrics. Boston: Addison-Wesley.
Tiao, G.C. and R.S. Tsay (1983). Consistency properties of least squares estimates of autoregressive parameters in ARMA models. Annals of Statistics, 11, 856-871.
Tsay, R.S. (2005). Analysis of Financial Time Series, 2nd Edition. New York: John Wiley & Sons.
Zeileis, A. (2005). A unified approach to structural change tests based on ML scores, F statistics, and OLS residuals. Econometric Reviews, 24, 445-466.
Zeileis, A. and K. Hornik (2003). Generalized M-fluctuation tests for parameter instability. Report 80, SFB "Adaptive Information Systems and Modelling in Economics and Management Science". URL http://www.wu-wien.ac.at/am/reports.htm#80.
Zeileis, A., F. Leisch, K. Hornik and C. Kleiber (2002). strucchange: An R package for testing for structural change in linear regression models. Journal of Statistical Software, 7, Issue 2, 1-38.
Zivot, E. and D.W.K. Andrews (1992). Further evidence on the great crash, the oil price shock and the unit root hypothesis. Journal of Business and Economic Statistics, 10, 251-270.