
Journal of Statistical Planning and Inference 136 (2006) 3319–3338

www.elsevier.com/locate/jspi

Non parametric resampling for stationary Markov processes: The local grid bootstrap approach

Valérie Monbet a,∗, Pierre-François Marteau b

a UBS/SABRES, Campus Tohannic, F-56000 Vannes, France
b UBS/VALORIA, Campus Tohannic, F-56000 Vannes, France

Received 25 April 2003; accepted 16 November 2004. Available online 20 April 2005.

Abstract

A new resampling technique, referred to as the “local grid bootstrap” (LGB), based on nonparametric local bootstrap and applicable to a wide range of stationary general state space Markov processes is proposed. This nonparametric technique resamples local neighborhoods defined around the true samples of the observed multivariate time series. The asymptotic behavior of this resampling procedure is studied in detail. Applications to linear and nonlinear (in particular chaotic) simulated time series are presented and compared to the approach of Paparoditis and Politis [2002. J. Statist. Plan. Inf. 108, 301–328], referred to as the “local bootstrap” (LB) and developed in earlier similar works. The method proves to be efficient and robust even when the length of the observed time series is reasonably small.
© 2005 Elsevier B.V. All rights reserved.

Keywords: Nonlinear time series; Markov chains; Resampling; Smoothed bootstrap; Nonparametric estimation

1. Introduction

Simulation is a powerful tool that can be used to investigate the behavior of any estimation procedure when one is able to completely specify the sampled population. In these settings, one can obtain estimates for the standard errors of parameter estimators even when the standard error formulas have not been (or cannot be) determined analytically. Furthermore, one can determine confidence intervals for unknown parameter values via simulation. When

∗ Corresponding author. Tel.: +33 2 9701 7225; fax: +33 2 9701 7200. E-mail addresses: [email protected] (V. Monbet), [email protected] (P.-F. Marteau).

0378-3758/$ - see front matter © 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.jspi.2004.11.014


one is unable to completely specify the population under study because one only knows a sample, bootstrap methods still provide ways to obtain standard error estimates and/or confidence intervals. The bootstrap is a general method (more precisely, a collection of methods) which makes use of the information contained in a single sample from the population of interest, in conjunction with simulation results, to provide information about the distribution of a statistic. The parametric bootstrap requires partially specifying the population under study by assuming a particular family of probability distributions; the parameters of the chosen family are then estimated. The nonparametric bootstrap does not require any explicit assumptions about the population’s distribution but uses the single sample to provide an approximation of the population’s distribution.

The idea behind the bootstrap is due to Efron (1979), who proposed an extension of the jackknife to the case of i.i.d. variables. More recently, several different approaches for bootstrapping stationary (or almost stationary) observations have been proposed in the literature. A nonexhaustive list includes: the ‘residual bootstrap’ (cf. Freedman, 1984; Efron and Tibshirani, 1993), the block bootstrap (cf. Künsch, 1989; Liu and Singh, 1992; Politis and White, 2004), the blocks-of-blocks bootstrap (cf. Politis and Romano, 1992), the stationary bootstrap (cf. Politis and Romano, 1994), the tapered bootstrap (cf. Paparoditis and Politis, 2001a, b) and the frequency domain bootstrap (cf. Franke and Härdle, 1992). Shao and Tu (1995) give an overview of these methods. Bühlmann (2002) compares the block bootstrap, the AR-sieve bootstrap and the local bootstrap for time series. Härdle et al. (2003) discuss the accuracy of bootstrap methods for time series and describe some important unsolved problems. Horowitz (2003) demonstrates some results for higher order statistics in the context of Markov process resampling.

This paper introduces a new resampling procedure called the “local grid bootstrap” (LGB), applicable to strictly stationary, general state space, discrete time stochastic processes that follow a finite order Markovian structure with continuous joint probability density function. The LGB procedure proposed in this paper can be considered as a smoothed version of the local bootstrap algorithm of Paparoditis and Politis (2002). In practice, smoothing in bootstrap methods improves the estimation of some statistics, such as the probability of being in a given small set, when the sample size is small. LGB finds its main roots in the paper by Paparoditis and Politis (2002), which proposes a nonparametric local bootstrap procedure for stationary stochastic processes that follow a general autoregressive structure, and in the paper by Monbet and Marteau (2001), which presents a similar procedure for cyclostationary time series in the context of the analysis of sea state processes. Both papers propose a local bootstrap for Markov processes based on a nonparametric estimation of the transition probability density function. In their paper, Paparoditis and Politis (2002) demonstrate the asymptotic properties of the nonparametric local bootstrap procedure, and applications of the procedure in nonlinear time series analysis are considered and theoretically justified. In the following, we use the notation LB to refer to the bootstrap procedure of Paparoditis and Politis (2002).

Note that the local bootstrap has been proposed and developed within the scope of various applications: Shi’s work (1991) focused on i.i.d. regression, Falk and Reiss (1989) addressed the bootstrapping of conditional curves, Lall and Sharma (1996) addressed the modelling of hydrological time series, while Chan et al. (1997) tackled ecological series.


Our initial concern is the statistical characterization of the behavior of complex dynamical systems that are subjected to long-term multivariate random inputs. Two possible approaches exist to handle such problems: a deterministic (or analytical) approach, which allows short-term (and often precise) prediction, and a statistical modelling approach, for which a solution consists in synthesizing long-term input time series to excite a (simplified) physical model in order to test a large number of possible scenarios. Bootstrap methods are of great importance in this context since they can provide long-term sequences from a limited set of observed real data.

For example, the characterization of the erosion of a coastline depends on the long-term effect of random waves. In order to provide erosion statistics, long-term sea state time series are required that can be used as plausible inputs to the erosion models under study. As a matter of fact, long-term (at a millennium scale) sea state observations are not available. Bootstrapping is thus necessary to build a statistical methodology for erosion characterization (cf. Monbet and Marteau, 2001; Ailliot et al., 2003).

We address, in this paper, the resampling of processes defined as follows: X = {X_t, t = 1, 2, ...} denotes a d-dimensional stochastic process on a probability space (Ω, F, P) and follows a general Markovian structure, where t ∈ N and X_t ∈ R^d. The dimension d may be greater than 1. We assume that an integer p > 0 exists such that, for all t ∈ N, the state of the process X at time t depends only on the p previous states, i.e., for every t ∈ N and for every Borel set A ⊂ R^d,

P(X_{t+1} ∈ A | X_j, j ≤ t) = P(X_{t+1} ∈ A | X_j, t − p + 1 ≤ j ≤ t).

This class of Markov processes includes a large number of nonlinear models used in time series analysis: linear and nonlinear autoregressive processes, chaotic processes such as the Lorentz oscillator, sea state parameter processes, etc.

This paper proposes a bootstrap scheme to construct a time series X̂_1, ..., X̂_k of any length k sampled from an observed series X_1, ..., X_T of fixed length T. The proposed scheme samples observed and unobserved states and restores the statistical properties of the reference time series under the assumption of continuity of the mapping x ↦ F_x(.) = F(. | X_t = x), where F is the one-step transition cumulative distribution function of X. As stated, the generated data X̂_i are not necessarily observed in the sequence X = (X_i)_{i ∈ {1,...,T}}. Nevertheless, we show that the transition probabilities that govern the processes X and X̂ are asymptotically equivalent, where X̂ = {X̂_1, ..., X̂_n}. We also demonstrate on various examples that for fixed T we obtain a good approximation of some statistical properties of X.

In the first part of the paper we give the general assumptions on the basis of which we develop the LGB scheme. In the second part, the asymptotic properties and behavior of the LGB procedure are studied. Finally, in the third part, we compare our results with those obtained by Paparoditis and Politis (2002) for linear and nonlinear autoregressive processes and we propose an example on a chaotic multivariate Markovian process. Proofs of the theorems are given in the Appendix.


2. The local grid bootstrap procedure

2.1. Notations and assumptions

Let {X_t, t ≥ 1} be a d-dimensional stochastic process in (R^d, B^d), where B^d is the Borel σ-algebra over R^d.

(A1) We suppose that there exists a positive integer p < ∞ such that the stochastic process {Y_t, t > p} with Y_t = {X_t, X_{t−1}, ..., X_{t−p+1}} forms an aperiodic, strictly stationary and geometrically ergodic Markov chain on (R^{dp}, B^{dp}) with transition probability function P(y, A), where A ⊂ R^d.

We assume that this Markov chain admits a stationary distribution π with continuous density f_Y with respect to the Lebesgue measure. We also suppose that the transition distribution function F_y(x) = P(X_{t+1} ≤ x | Y_t = y) admits a continuous probability density function f(x|y).

Let us define, for any time t, the transition distribution F_y (Eq. (1)) and the stationary distribution F (Eq. (2)) of the observed (or reference) time series (Y_t):

F_y(x) = P(X_{t+1} < x | Y_t = y),  X_{t+1} ∈ R^d, Y_t ∈ R^{dp},   (1)

F(y) = P(Y_t < y),  Y_t ∈ R^{dp}.   (2)

We need to introduce some assumptions on the distribution functions F_y and F to ensure the convergence of their estimates.

(A2) F and F_y are absolutely continuous with respect to the Lebesgue measure on R^{dp} and R^d, respectively, and have bounded densities.
(A3) The mapping y ∈ S ↦ F_y(.) is Prohorov continuous, where S is a compact set of R^{dp}.
(A4) For every time t, the stationary probability density function f_Y of F is positive on a compact set S ⊂ R^{dp}. S is defined for a given reference process such that Y_t ∈ S a.s. for all times t.

We remark that the compactness of the support of f_Y is a technical assumption which is nonrestrictive for the applications. For instance, every physical system with finite energy has states that take their values in a compact subset, most biological parameters are bounded, etc.

Prohorov continuity is defined in a topological space equipped with the Prohorov metric. If M denotes the space of all probability measures on (Ω, F), the weak convergence topology on M is metrizable by the Prohorov distance d_pr(., .). The distance d_pr(G_1, G_2) between two measures G_1 and G_2 is given by

d_pr(G_1, G_2) = inf{ε | G_1(A) ≤ G_2(A^ε) + ε, for all Borel sets A},   (3)

where A^ε = {y | d(A, y) < ε} and d is a distance on R^d.

In the sequel, the probability density function of the stationary distribution and the transition probability will be approximated by appropriate kernel estimates. Let K_d be a probability density function on R^d, K_dp a probability density function on R^{dp}, and {h_T, T = 0, 1, 2, ...} a sequence of positive numbers such that h_T → 0 as T → ∞.


We suppose that the density kernels K_d and K_dp satisfy the following conditions:

(K1) K_i(.) is continuous, symmetric and bounded, i.e. sup{K_i(x) : x ∈ R^i} < ∞, for all i ∈ {d, dp}.
(K2) ∫_{R^i} x^2 K_i(x) dx < ∞ for all i ∈ {d, dp}.
(K3) κ : R → R is a kernel density satisfying conditions (K1)–(K2) with i = 1, such that K_i(x) = ∏_{j=1}^{i} κ(x_j) for all i ∈ {d, dp} and for x = (x_1, ..., x_i)′ ∈ R^i.
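As an illustration of conditions (K1)–(K3), here is a minimal sketch of a product Gaussian kernel; the Gaussian choice and the function names are assumptions made for illustration, not prescribed by the paper.

```python
import numpy as np

def kappa(u):
    """Univariate Gaussian density, a standard kernel satisfying (K1)-(K2)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def product_kernel(x):
    """Product kernel K_i(x) = prod_j kappa(x_j) of condition (K3), for x in R^i."""
    return float(np.prod(kappa(np.atleast_1d(x))))
```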

2.2. The LGB resampling scheme

The LGB resampling algorithm generates a series X̂_1, X̂_2, ..., X̂_N, where the length N may be chosen independently of the length T of the observed sequence X = (X_i)_{i ∈ {1,...,T}}. Let us denote by Ŷ_t = {X̂_t, X̂_{t−1}, ..., X̂_{t−p+1}} the state of the generated sequence at time t.

The generation of X̂_k is obtained by assigning probabilities to a finite subset of convenient states and sampling this subset according to the resulting discrete probability mass. The idea behind the generation of unobserved states is that, if the underlying transition distribution is regular enough (for instance continuous), it is possible through the kernel estimate to assign a probability to unobserved states around the observations {X_k} and to synthesize a new time series that includes observed and unobserved points. The set of “unobserved states” is defined by a discretization of the image, through the Markovian transition operator, of a neighborhood of the current point Ŷ_t.

The resampling scheme may be described as follows.

Initialization step: Select an initial state Ŷ_p, the bandwidth parameter h_T, the width δ_T of the neighborhood of a given state, the minimum number n_min of observations in the neighborhood, and the grid parameters: the discretization step δ_g and the edge length Δ_g = N_g · δ_g. Δ_g may depend on δ_T.

Step t:

• Let us suppose that the state Ŷ_t is already sampled. The neighborhood V_{Ŷ_t} of Ŷ_t is defined by the set of observed Y_l ∈ {Y | d(Y, Ŷ_t) ≤ δ_T/2}. At this step, if |{Y | d(Y, Ŷ_t) ≤ δ_T/2}| = 1, the parameter δ_T is increased so that the set contains at least n_min points. We denote by I[V_{Ŷ_t}] the set of time indices such that for all l ∈ I[V_{Ŷ_t}], Y_l ∈ V_{Ŷ_t}. Furthermore, we define the image (V_{Ŷ_t})^+ of V_{Ŷ_t} by (V_{Ŷ_t})^+ = {X_{l+1}, l ∈ I[V_{Ŷ_t}]} ⊂ R^d.

• Now, a grid G_t is built by discretizing a cube of R^d with grid step δ_g and edge length Δ_g. The cube is centered on the barycentre of (V_{Ŷ_t})^+ and the edge length Δ_g is defined such that the cube includes at least all the elements of (V_{Ŷ_t})^+. We write G_{Ŷ_t}^T = (V_{Ŷ_t})^+ ∪ G_t.

• Let J be a discrete random variable taking its values in I[G_{Ŷ_t}^T] = {k ∈ N, X_k ∈ G_{Ŷ_t}^T}, with probability mass function given by

$$P(J = k) = W_k = \frac{p(\hat{Y}_t, X_{k+1})}{\sum_{j \in I[G_{\hat{Y}_t}^T]} p(\hat{Y}_t, X_{j+1})}, \quad \forall k \in I[G_{\hat{Y}_t}^T], \qquad (4)$$

$$p(\hat{Y}_t, X_{k+1}) = p_k = \sum_{i \in I[V_{\hat{Y}_t}]} K_d\!\left(\frac{X_{k+1} - X_{i+1}}{h_T}\right) K_{dp}\!\left(\frac{\hat{Y}_t - Y_i}{h_T}\right). \qquad (5)$$


The sampled state at time t + 1 is such that X̂_{t+1} = X_J.

The above procedure assigns sampling probabilities to a set of points that includes the successors of the observed neighbors of the last sampled state and the points of a local grid positioned around these successors. This discrete probability mass may be considered as transition probabilities between Ŷ_t and X_J. It depends both on the density of the original sequence around X_J and on the density of the observed states around Ŷ_t. As the kernels K_d and K_dp are continuous on R^d and R^{dp} respectively, we are able to assign probabilities to unobserved points of the grid and consequently to sample unobserved states.
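To make the step concrete, here is a minimal sketch of one LGB transition for the simplest case p = 1 and d = 1, with a Gaussian product kernel; the helper names, the neighborhood-widening factor and the grid construction details are illustrative assumptions, not the authors' reference implementation. Iterating lgb_step from an initial observed state produces a series X̂_1, ..., X̂_N of arbitrary length.

```python
import numpy as np

rng = np.random.default_rng(0)

def gauss_kernel(u):
    """Univariate Gaussian kernel (an assumed choice satisfying (K1)-(K3))."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def lgb_step(X, y_hat, h_T, delta_T, n_min=5, N_g=10):
    """One LGB transition from the current state y_hat (case p = 1, d = 1).

    X     : observed series X_1, ..., X_T (1-d numpy array)
    y_hat : current (possibly unobserved) state
    Returns the next sampled state, possibly an unobserved grid point.
    """
    Y, succ = X[:-1], X[1:]                      # states Y_l and successors X_{l+1}
    # neighborhood V_{y_hat}: widen delta_T until it holds at least n_min states
    while True:
        idx = np.where(np.abs(Y - y_hat) <= delta_T / 2.0)[0]
        if len(idx) >= min(n_min, len(Y)):
            break
        delta_T *= 1.5                           # illustrative widening factor
    image = succ[idx]                            # (V_{y_hat})^+ : observed successors
    # local grid centered on the barycentre of the image and covering all its points
    center = image.mean()
    Delta_g = 2.0 * max(np.max(np.abs(image - center)), 1e-12)
    grid = center + np.linspace(-Delta_g / 2.0, Delta_g / 2.0, N_g)
    candidates = np.concatenate([image, grid])   # G = (V_{y_hat})^+ union grid
    # unnormalized weights p_k of Eq. (5), normalized as in Eq. (4)
    w = np.array([
        sum(gauss_kernel((x - succ[i]) / h_T) * gauss_kernel((y_hat - Y[i]) / h_T)
            for i in idx)
        for x in candidates
    ])
    return rng.choice(candidates, p=w / w.sum())
```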

2.3. Properties of the bootstrap

According to the above sampling scheme, for any fixed length T of the observed series, the generated series {X̂_t, t = 1, 2, ...} evolves following a dependence structure that remains to be characterized. Theorems 1 and 2 describe the asymptotic validity of the LGB procedure.

Theorem 1. For given T and X_1, ..., X_T, for every kernel satisfying (K1) and every fixed kernel bandwidth h_T, there exists with probability one t_0 ∈ N such that the generated series Ŷ^T = {Ŷ_t; t > t_0} is a positive recurrent, irreducible and aperiodic Markov chain.

The probabilistic properties of the bootstrap Markov chain Ŷ depend on the chosen kernel and on its bandwidth parameter h_T. Indeed, if h_T is sufficiently close to 0, the generated time series will be an exact reproduction of the reference series. On the contrary, if h_T is too large, the bootstrap procedure will be unable to restore the statistical properties of the reference Markov chain. The grid parameters have no influence on the estimation of the transition probabilities itself. We will see later that the grid parameters directly impact the reproduction ratios of subsequences of the reference time series inside the sampled one.

Theorem 2. Under assumptions (K1)–(K3) and (A1)–(A3), if h_T → 0 and T h_T^{dp} → ∞ as T → ∞, and if δ_T = c h_T with c a positive constant, then for almost all x ∈ S′ and y ∈ S we have, for the one-step transition and the stationary distributions:

(1) F̂_y(x) → F_y(x) weakly in the Prohorov sense as T → ∞,
(2) F̂(y) → F(y) weakly in the Prohorov sense as T → ∞,

where S′ is a compact subset of R^d such that f_X(x) > 0 for all x ∈ S′ and S = S′ × ··· × S′ ⊂ R^{dp}. f_X denotes the stationary probability density function of the random variable X_t.

The first part of Theorem 2 ensures the weak convergence of the transition distribution function of the simulated series to the transition distribution of the observed time series when the observation time becomes long enough. This result is important because a Markov process may be entirely specified by its transition distribution function once the stationary regime is reached.

The second part of the theorem gives the weak convergence of the stationary distribution function of the LGB Markov chain to the stationary distribution function of the reference time series. Furthermore, the weak convergence of the empirical distribution F̂ to F allows, through the delta method, the extension of the convergence property to statistics of the form φ(F̂), where φ is a Hadamard-differentiable function (see van der Vaart, 1998).

3. Applications

3.1. The choice of the resampling parameters

For the LGB procedure, the transition probabilities are estimated nonparametrically. One of the difficulties of kernel-based estimation lies in the choice of the bandwidth parameter h_T that is used to estimate the probability density functions. A first solution consists in assuming that the reference process may be approximated by a Gaussian process and then choosing the best bandwidth parameter for that Gaussian process. Another solution is to test several values of the bandwidth parameter and to compare the sampled time series; one can, for instance, compare the transition and the stationary distributions of the reference and the sampled time series. A third alternative consists in choosing h_T such that the local neighborhood contains at least a minimum number of observed points, as follows: select an initial bandwidth value; if the neighborhood contains fewer points than this minimum, increase the bandwidth until it contains at least that many observed points. For this last solution, for a fixed length T of the observed time series, h_T depends on the density f_Y. Nevertheless, as T tends to infinity, h_T needs to vanish and T h_T^{dp} to tend to infinity. The minimum number of points can be estimated experimentally, for instance by cross-validation on a subsequence of the reference time series.

The bandwidth parameter h_T may also be linked with δ_T, which defines the width of the local neighborhood of the current state. A particular case occurs when |{Y | d(Y, Ŷ_t) ≤ δ_T/2}| = 1; then the neighborhood width δ_T is increased until the number of neighbours is large enough, say greater than a given number n_min. In fact, such a case occurs only for rare events (in general for extreme values), so that n_min is not a very sensitive parameter and it can be chosen small with respect to T.
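A minimal sketch of the third bandwidth rule above for a scalar state; the link δ_T = h_T (i.e. c = 1) and the multiplicative growth factor are illustrative assumptions.

```python
import numpy as np

def bandwidth_from_min_neighbors(Y, h_init, n_min=5, grow=1.2):
    """Increase h_T until every observed state has at least n_min neighbors
    within a window of width h_T (assuming delta_T = h_T, i.e. c = 1)."""
    Y = np.atleast_1d(Y)
    h_T = h_init
    while True:
        counts = [np.sum(np.abs(Y - y) <= h_T / 2.0) for y in Y]
        if min(counts) >= min(n_min, len(Y)):
            return h_T
        h_T *= grow
```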

The grid parameters, the edge length Δ_g and the step δ_g, are also sensitive. They determine the number N_g of nonobserved states included in the neighborhood of the current point for the sampling: N_g = (Δ_g/δ_g)^d. When the ratio Δ_g/δ_g tends to zero, the LGB algorithm tends to become a standard LB procedure, as no unobserved state is added to the observations. If Δ_g is large compared to h_T, nonobserved states will be far from the center of the grid and they will have a very low probability of being sampled. Ideally, Δ_g has to be chosen so that each nonobserved state has a sufficiently large probability of being reached. Now, we can observe that for fixed Δ_g, the smaller δ_g, the larger the number of accessible states. Unfortunately, the complexity of the algorithm grows with the number of grid points, which is of order (Δ_g/δ_g)^d, so the grid step should not be too small.

3.2. Autoregressive models

In this section simulation examples are proposed in order to evaluate the properties of the LGB procedure and to compare it with the LB algorithm developed by Paparoditis and Politis (2002).


Table 1
Estimated “exact” and bootstrap estimates of the standard deviation σ_{r,T} of the sample lag-reversibility coefficient √(T − r) P̂_r, r = 1, 2.

                 σ_{r,T}   E(σ*_{r,T})  E(σ̃_{r,T})  STD(σ*_{r,T})  STD(σ̃_{r,T})  MSE(σ*_{r,T})  MSE(σ̃_{r,T})
T = 100
√(T − 1) P̂_1
  AR             0.291     0.292        0.300        0.020          0.020          4.01e−4        4.81e−4
  ARC            0.271     0.299        0.303        0.017          0.057          11.0e−4        4.3e−4
  NLAR           0.306     0.283        0.296        0.021          0.024          9.70e−4        6.76e−4
√(T − 2) P̂_2
  AR             0.281     0.287        0.290        0.019          0.019          3.97e−4        4.42e−4
  ARC            0.258     0.289        0.293        0.021          0.065          14.0e−4        54.0e−4
  NLAR           0.274     0.275        0.284        0.020          0.017          4.01e−4        3.89e−4
T = 200
√(T − 1) P̂_1
  AR             0.291     0.294        0.299        0.017          0.019          2.98e−4        4.25e−4
  ARC            0.271     0.293        0.296        0.018          0.021          0.80e−4        0.11e−4
  NLAR           0.306     0.287        0.294        0.019          0.027          7.22e−4        8.13e−4
√(T − 2) P̂_2
  AR             0.276     0.281        0.284        0.017          0.021          3.14e−4        5.05e−4
  ARC            0.251     0.284        0.284        0.018          0.021          14.0e−4        15.0e−4
  NLAR           0.269     0.271        0.281        0.017          0.019          2.93e−4        5.05e−4

Here, σ_{r,T} denotes the estimated “exact” standard deviation, E(σ*_{r,T}) the mean and STD(σ*_{r,T}) the standard error of the LB estimates, E(σ̃_{r,T}) the mean and STD(σ̃_{r,T}) the standard error of the LGB estimates; MSE denotes the mean square error.

Firstly, the statistics used for validation in the Paparoditis and Politis (2002) paper are calculated, e.g., the first and second lag-reversibility coefficients and the first-order autocorrelation coefficient. Secondly, the comparison of the LB and LGB algorithms is extended by plotting estimates of the instantaneous probability density functions and some persistence statistics.

Data series of size T = 100 and 200 have been generated from the following linear and nonlinear Markov models:

AR:   X_t = 0.8 X_{t−1} − 0.6 X_{t−2} + ε_t,
ARC:  X_t = 0.8 X_{t−1} − 0.6 X_{t−2} + u_t,
NLAR: X_t = 0.8 log(1 + 3 X_{t−1}^2) − 0.6 log(1 + 3 X_{t−3}^2) + ε_t,

where the errors {ε_t} and {u_t} are i.i.d. The distribution of the ε_t is Gaussian (0, 1), while that of u_t is a mixture of 90% Gaussian (−1, 1) and 10% Gaussian (1, 9).
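For reference, a minimal sketch of how such series could be simulated; reading the Gaussian parameters as (mean, variance) pairs is an assumption about the notation, and the burn-in length is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate(model, T, burn=200):
    """Simulate the AR, ARC or NLAR model above for T steps after a burn-in."""
    n = T + burn
    eps = rng.normal(0.0, 1.0, n)                       # Gaussian(0, 1) errors
    # mixture errors: 90% Gaussian(-1, var 1), 10% Gaussian(1, var 9)
    # (assumed mean/variance reading of the parameters)
    mix = rng.random(n) < 0.9
    u = np.where(mix, rng.normal(-1.0, 1.0, n), rng.normal(1.0, 3.0, n))
    x = np.zeros(n)
    for t in range(3, n):
        if model == "AR":
            x[t] = 0.8 * x[t - 1] - 0.6 * x[t - 2] + eps[t]
        elif model == "ARC":
            x[t] = 0.8 * x[t - 1] - 0.6 * x[t - 2] + u[t]
        elif model == "NLAR":
            x[t] = (0.8 * np.log(1 + 3 * x[t - 1] ** 2)
                    - 0.6 * np.log(1 + 3 * x[t - 3] ** 2) + eps[t])
    return x[burn:]

series = {m: simulate(m, 200) for m in ("AR", "ARC", "NLAR")}
```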

Several parameters are estimated and, for each of them, the standard deviation is calculated. Let us denote by σ_{r,T} the “exact” standard deviation of the considered parameter, by σ*_{r,T} the standard deviation obtained on the series sampled with the LB procedure, and by σ̃_{r,T} the standard deviation obtained on the series sampled with the LGB algorithm.


Table 2
Estimated “exact” and bootstrap estimates of the standard deviation σ_T of the sample first-order autocorrelation coefficient.

           σ_T     E(σ*_T)   E(σ̃_T)   STD(σ*_T)   STD(σ̃_T)   MSE(σ*_T)   MSE(σ̃_T)
T = 100
  AR       0.045   0.055     0.047     0.009       0.011       1.81e−4     1.25e−4
  ARC      0.043   0.058     0.068     0.020       0.018       6.25e−4     9.49e−4
  NLAR     0.084   0.086     0.084     0.016       0.013       2.60e−4     1.69e−4
T = 200
  AR       0.031   0.036     0.046     0.005       0.007       0.50e−4     2.74e−4
  ARC      0.031   0.035     0.045     0.008       0.010       0.80e−4     2.96e−4
  NLAR     0.059   0.058     0.059     0.009       0.009       0.82e−4     0.81e−4

Here, σ_T denotes the estimated “exact” standard deviation, E(σ*_T) the mean and STD(σ*_T) the standard error of the LB estimates, E(σ̃_T) the mean and STD(σ̃_T) the standard error of the LGB estimates; MSE denotes the mean square error.

Fig. 1. Probability density function of the ARC model. Solid line corresponds to the reference, dashed line to the LB sampled series and dash-dotted line to the LGB sampled series.

The LGB algorithm is compared to the LB procedure of Paparoditis and Politis (2002) on the estimation of three parameters: the sample lag-reversibility coefficients √(T − r) P̂_r, r = 1, 2, and the first-order autocorrelation coefficient r_1. For a time series (X_t)_t, the parameter P_r is the probability that X_t − X_{t−r} > 0.


Fig. 2. Upper graphics plot the bivariate distribution of (X_t, X_{t+1}) for the ARC model (LB, LGB and reference panels). The figure below shows the autocorrelation functions for the same model. Solid line corresponds to the reference, dashed line to the LB sampled series and dash-dotted line to the LGB sampled series.

For a given sample (X_t)_{t=1,...,T}, P_r is straightforwardly estimated by P̂_r, defined as follows:

$$\hat{P}_r = \frac{1}{T - r} \sum_{t=r+1}^{T} I(X_t - X_{t-r}),$$

where I(x) = 1 if x > 0 and I(x) = 0 otherwise.
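A minimal sketch of this estimator and of the scaled coefficient √(T − r) P̂_r reported in Table 1; the function names are illustrative.

```python
import numpy as np

def lag_reversibility(x, r):
    """Estimate P_r = P(X_t - X_{t-r} > 0) by the sample proportion."""
    x = np.asarray(x)
    diffs = x[r:] - x[:-r]
    return np.mean(diffs > 0)

def scaled_lag_reversibility(x, r):
    """Scaled coefficient sqrt(T - r) * P_r estimate, as reported in Table 1."""
    T = len(x)
    return np.sqrt(T - r) * lag_reversibility(x, r)
```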

The tests are performed on the basis of 100 trials and 250 bootstrap replications. It may be verified experimentally that increasing the number of trials does not change the results significantly. The bandwidth parameter h_T is calculated using the method described by Paparoditis and Politis (2002): the studied process is approximated by a linear autoregressive model and h_T is obtained by minimization of a quadratic risk. The grid parameters are chosen experimentally as follows: the edge length Δ_g is computed for every time t such that the cubic neighborhood includes all the observed X_k ∈ (V_{Ŷ_t})^+, and the discretization step is deduced by δ_g = Δ_g/N_g with fixed N_g = 10.

The results given in Table 1 for the lag-reversibility coefficients and in Table 2 for the first autocorrelation coefficient show that the LGB procedure has about the same properties as the LB procedure: most of the computed means and standard deviations in the tables are of the same order for LGB and LB.


Fig. 3. Cumulative distribution of the length of sojourns over levels 2 and 3. Solid line corresponds to the reference, dashed line to the LB sampled series and dash-dotted line to the LGB sampled series.

But we remark that LGB tends to overestimate some standard deviations STD(σ̃_{r,T}), especially for the ARC model. As unobserved states are sampled, a small sampling noise may be added to the time series, which increases the standard deviation of the estimators. Furthermore, the ARC process behaves like an AR process with switching, so it may be difficult to learn from short time series. A transformation of the data, such as a log transformation, could help to obtain better bootstrap estimates. However, the figures described below show that, although the standard deviations of the LGB estimates are slightly larger than those of the LB estimates, the bias of the LGB estimates seems to be small.

Figs. 1 and 4 show estimates of the stationary probability density functions of the reference signal and of the simulated series obtained by the LB and LGB procedures for the ARC and NLAR models. For these figures, 400 series of length T = 100 are computed and the mean is plotted. Fig. 1 (resp. Fig. 4) compares the kernel estimates of the stationary probability density functions for the ARC model (resp. the NLAR model). We remark that in both cases LGB better approximates the reference series. Figs. 2 and 5 evaluate the time dependence structure of the series. The upper part of these figures presents the kernel estimate of the bivariate probability density function of the couple (X_t, X_{t+1}), and the lower part plots the mean of the autocorrelation functions of the 400 series.


Fig. 4. Probability density function of the NLAR model. Solid line corresponds to the reference, dashed line to the LB sampled series and dash-dotted line to the LGB sampled series.

For the AR model such a figure is not meaningful because the process is Gaussian and the first-order correlation coefficient suffices to describe the dependence structure. We remark that both resampling procedures generate about the same results; LGB seems to better restore the bivariate distribution of (X_t, X_{t+1}) for the NLAR model. Figs. 3 and 6 show the cumulative distribution of the length of sojourns (or persistence) above some given levels: level 2 for the left graphics and level 3 for the right graphics. This statistic is linked both with the higher-order time dependence structure and with the high values of the data; it describes the behaviour of the time series above a fairly high level. Here we notice that the LGB technique proposed in this paper better restores the persistence distribution (Figs. 3 and 6).
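A minimal sketch of how such a persistence statistic could be computed, i.e. the empirical distribution of the lengths of the sojourns of a series above a fixed level; the names and the evaluation grid are illustrative.

```python
import numpy as np

def sojourn_lengths(x, level):
    """Lengths of the maximal runs during which the series x stays above `level`."""
    above = np.asarray(x) > level
    lengths, run = [], 0
    for flag in above:
        if flag:
            run += 1
        elif run > 0:
            lengths.append(run)
            run = 0
    if run > 0:
        lengths.append(run)
    return np.array(lengths)

def persistence_cdf(x, level, durations):
    """Empirical CDF of the sojourn lengths over `level`, evaluated at `durations`."""
    lengths = sojourn_lengths(x, level)
    if lengths.size == 0:
        return np.zeros(len(durations))
    return np.array([np.mean(lengths <= d) for d in durations])
```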

The results presented here for the linear and nonlinear one-dimensional autoregressive models (AR, ARC and NLAR) lead us to conclude that the LGB procedure globally presents the same performance as the LB algorithm. Nevertheless, in the LGB procedure the estimation of the transition probability is more constrained, since it combines a kernel estimate around the current state Ŷ_t with a kernel estimate on the image of a neighborhood of Ŷ_t through the Markovian transition operator. This can explain why the LGB procedure seems to better capture some statistics that describe the time dependence structure, for instance the persistence statistics (Fig. 6).


Fig. 5. Upper graphics plot the bivariate distribution of (X_t, X_{t+1}) for the NLAR model (LB, LGB and reference panels). The figure below gives the autocorrelation functions for the same model. Solid line corresponds to the reference, dashed line to the LB sampled series and dash-dotted line to the LGB sampled series.

3.3. Lorentz attractor

LGB also permits the resampling of multidimensional processes. The Lorentz dynamical system is chosen as an example. This oscillator model was first used by Lorentz in meteorology to model turbulence consisting of parallel convection rolls that appear in a horizontal layer of fluid heated from below. The simplified equations are given by

ẋ = N_Pr (y − x),
ẏ = −xz + rx − y,
ż = xy − bz,   (6)

where N_Pr is the Prandtl number and the parameters r and b depend on the Rayleigh number. This oscillator exhibits a chaotic behavior characterized by an attractor with two lobes, as shown in Fig. 7. The chaotic series has been computed with N_Pr = 16, b = 4, r = 45.92, integrating Eq. (6) by a fourth-order Runge–Kutta method with time step Δt = 0.02. The dimension of the attractor is 2.06.
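A minimal sketch of how such a reference trajectory could be produced with a classical fourth-order Runge–Kutta scheme; the initial condition and function names are assumptions for illustration.

```python
import numpy as np

def lorentz(state, N_Pr=16.0, r=45.92, b=4.0):
    """Right-hand side of Eq. (6)."""
    x, y, z = state
    return np.array([N_Pr * (y - x), -x * z + r * x - y, x * y - b * z])

def rk4_trajectory(x0, dt=0.02, n_steps=5000):
    """Integrate Eq. (6) with a fourth-order Runge-Kutta scheme."""
    traj = np.empty((n_steps + 1, 3))
    traj[0] = x0
    for k in range(n_steps):
        s = traj[k]
        k1 = lorentz(s)
        k2 = lorentz(s + 0.5 * dt * k1)
        k3 = lorentz(s + 0.5 * dt * k2)
        k4 = lorentz(s + dt * k3)
        traj[k + 1] = s + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return traj

reference = rk4_trajectory(np.array([1.0, 1.0, 1.0]))  # hypothetical initial state
```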


Fig. 6. Cumulative distribution of the length of sojourns over levels 2 (left) and 3 (right). Solid line corresponds to the reference, dashed line to the LB sampled series and dash-dotted line to the LGB sampled series.

The LGB resampling procedure has been applied to the Lorentz oscillator trajectories to obtain the result given in Fig. 8. The parameters of the resampling algorithm (h_T, δ_g) are initialized experimentally, and the bandwidth h_T is then increased so that the local neighborhood contains at least 5 points. We observe that the 3-dimensional structure of the attractor is globally restored, as well as the shape of the 1-dimensional time series. The LGB does, however, induce a small resampling noise that prevents the capture of fine details of the fractal attractor.

4. Concluding remarks

In this paper we have presented a nonparametric resampling procedure, referred to as LGB, that can be used to model and synthesize multivariate Markov processes of general order p. This procedure extends the one proposed by Paparoditis and Politis (2002) in two ways: firstly, it copes with multivariate processes and, secondly, unobserved states can be generated by using the smoothness properties of the transition kernel. The latter feature is obtained by means of a grid used to locally discretize the image, through the transition operator, of the neighborhood of the current state.


Fig. 7. Reference trajectories of the Lorentz attractor. On the left: 3D trajectory. On the right: 1D time series for the same signal.

Theoretical results show that the proposed resampling procedure is asymptotically correct. In practice, LGB reproduces quite well the results obtained by Paparoditis and Politis (2002) on linear and nonlinear autoregressive processes, although the expected lower rate of convergence of our approach should require slightly more observed samples. We show that the parameters associated with the local grid, namely the length of the grid edge and the size of the discretization step, directly control the number and the length of the subsequences of the reference series reproduced in the sampled time series. Even for large discretization steps, LGB significantly reduces the reproduction of such subsequences compared to LB. Furthermore, some statistics seem to be better captured by LGB than by LB; in particular, this seems to be the case for the invariant densities and the probabilities of sojourn within a compact subspace of the state space for the autoregressive examples that have been tested. Finally, LGB is applicable to the sampling of more complex multivariate processes. Some tests performed on the chaotic Lorentz oscillator show that the shape of the attractor is well captured, although some residual sampling noise prevents the extraction of fine details of the fractal structure of the attractor.


Fig. 8. Sampled trajectories of the Lorentz attractor. On the left: 3D trajectory. On the right: 1D time series for the same signal.

Appendix. Proofs

Proof of Theorem 1. For X_1, X_2, ..., X_T given, let B(E_T) be the σ-field of all finite subsets of E_T = (∪_{k=1}^{T} G_{Y_k}^T)^p, where (∪_{k=1}^{T} G_{Y_k}^T)^p denotes the Cartesian product of p copies of ∪_{k=1}^{T} G_{Y_k}^T. Let Ω_T = ∏_{k=1}^{∞} E_{k,T}, where E_{k,T} is a copy of E_T equipped with a copy of the σ-field B(E_T), and denote by A_T the product σ-field on Ω_T. Consider now the Markov chain Ŷ = {Ŷ_t, t ≥ 1} on the path space (Ω_T, A_T, P_{μ_0}) defined as follows: for any A ∈ A_T and for μ_0 an appropriate initial distribution, P_{μ_0}(Ŷ ∈ A) describes the probability of the event Ŷ ∈ A when L(Ŷ_1) = μ_0. Apart from the initial distribution μ_0, P_{μ_0} is determined by the transition probability kernel P̂(x, z) = P(Ŷ_{t+1} = z | Ŷ_t = x), which is obtained from Eqs. (4) and (5). Indeed, for all x = (x_p, ..., x_1) ∈ E_T, there is a time k such that P̂(x, G_{Y_k}^T) > 0 and

$$\hat{P}(x, z) = \begin{cases} \dfrac{p(x, z')}{\sum_{j \in I[G_{Y_k}^T]} p(x, z_j)} & \text{if } z' \in G_{Y_k}^T, \\ 0 & \text{otherwise,} \end{cases} \qquad (7)$$

where z = (z′, x_p, ..., x_2) and p(.) is defined in Eq. (5). Let us now define Ẽ_T = (∪_{k=p}^{T} G_{Y_k}^T)^p ⊂ E_T. For every kernel bandwidth h > 0 and every x ∈ Ẽ_T, we have by definition

$$\sum_{z \in \tilde{E}_T} \hat{P}(x, z) = \sum_{\{z \,:\, z' \in G_{Y_k}^T\}} \hat{P}(x, z) = 1.$$

Hence, Ẽ_T is an absorbing communicating class. Furthermore, there exists t_0 ∈ N such that P(Ŷ_{t_0} ∈ Ẽ_T | Ŷ_1 = x) = 1 for all x ∈ E_T \ Ẽ_T by Theorem 2.3 of Doob (1953). By Proposition 4.1.2 of Meyn and Tweedie (1993) it then follows that an irreducible Markov chain, denoted by Ŷ^T, exists whose state space is restricted to Ẽ_T, with transition matrix P_{Ẽ_T}, i.e., the restriction of the matrix P̂ = (P̂(x, z))_{x,z ∈ E_T} to the class Ẽ_T. The positive recurrence of the chain is a simple consequence of #Ẽ_T < ∞, where #A denotes the cardinality of A. The aperiodicity of the chain follows from the properties of the reference Markov chain X and from the fact that the kernels K_d and K_dp are everywhere positive for every bandwidth h > 0 by (K1). □

Proof of Theorem 2. Assertion 1 is a consequence of the following lemma, which does most of the work for the Prohorov consistency of F̂_y.

Let us denote by F_0 the mapping y ↦ F_y(.).

Lemma 1. Let ψ be a bounded measurable function that is continuous on a set of F_y-probability 1. Then, under conditions (i)–(iii) below,

$$\int \psi(x)\, d\hat{F}_y(x) \to \int \psi(x)\, dF_y(x) \quad \text{in pr.}$$

(i) F_0 is Prohorov continuous at y.
(ii) W_y → δ_y in pr.
(iii) n_y → ∞ in pr., where n_y = (∑_k W_k^2)^{-1} by definition.

We have to verify hypotheses (i) to (iii) of Lemma 1 in order to deduce Theorem 2.

Hypothesis (i) is given by Assumption (A3).
Hypothesis (ii) is trivial given the definition of the weights W_k, the properties (K1)–(K3) of the density kernels K_d and K_dp, and h_T → 0 as T → ∞.
Hypothesis (iii) is obtained if ∑_k W_k^2 tends to 0 as T tends to infinity. Given the definition of the kernel K_d (respectively K_dp), it is not restrictive to suppose that each term p_j of the sums in the weight W_k (Eqs. (4) and (5)) is bounded by two positive constants, 0 < C_1 ≤ p_j ≤ C_2 for all j, on a compact neighborhood centered on (X_{k+1}, Y_t) and of volume min(T h^{d(p+1)}, support(K_d) × support(K_dp)). Then,

$$\sum_k W_k^2 = \sum_{k \in I[G_{\hat{Y}_t}^T]} \left( \frac{p_k}{\sum_{j \in I[G_{\hat{Y}_t}^T]} p_j} \right)^2 \le \frac{C_2^2\, \#I[G_{\hat{Y}_t}^T]\, (\#I[V_{\hat{Y}_t}])^2}{C_1^2\, (\#I[G_{\hat{Y}_t}^T])^2\, (\#I[V_{\hat{Y}_t}])^2} = \frac{C_2^2}{C_1^2\, \#I[G_{\hat{Y}_t}^T]},$$

where #A denotes the cardinality of the set A. Now, by construction, the set I[G_{Ŷ_t}^T] contains the indices of the successors of the observed states lying in the local neighborhood, of diameter δ_T = c h_T, of the last generated state Ŷ_t, and the points of the grid defined on the image of this neighborhood with discretization step equal to δ_g. Then, two positive constants C and C′ exist such that

$$\#I[G_{\hat{Y}_t}^T] \sim C f(\hat{Y}_t)\, \delta_T^{dp}\, T + C' \frac{\delta_T^{d}}{\delta_g^{d}}. \qquad (8)$$

The constant C′ depends on the regularity of the Markov chain operator H : Y_t ↦ X_{t+1}. It implies that n_y = (∑_k W_k^2)^{-1} → ∞ as T → ∞.

It may be observed in Eq. (8) that the first term is sufficient to ensure the weak convergence, but the second term yields a faster convergence. The second term corresponds to the number of grid points that we add before sampling.

Assertion 2 is derived by means of the same arguments as those of the proof of Theorem 3.3 in Paparoditis and Politis (2002). The idea is to extract a subsequence of the sampled bootstrap. If F̂ = F̂_n is the stationary distribution of the bootstrap sequence Ŷ for a given length n, then by Helly’s selection theorem (Billingsley, 1995) there exists a subsequence {F̂_k}_k such that, for all continuity points y ∈ R^{dp} of G,

$$\lim_{k \to \infty} \hat{F}_k(y) = G(y),$$

where G is a right-continuous, nondecreasing function from R^{dp} into [0, 1]. Since the transition probability density function of the reference Markov process is strictly positive on the compact set S, the sequence {F̂_n}_n is tight and G is also a distribution function. Letting g : R^d × R^{dp} → R be any bounded and continuous function, we get

$$\int g(x, y)\, d\hat{F}(x, y) = \int g(x, y)\, \hat{F}_y(dx)\, d\hat{F}(y) = \int g(x, y)\, F_y(dx)\, d\hat{F}(y) + \int g(x, y)\, \big(\hat{F}_y(dx) - F_y(dx)\big)\, d\hat{F}(y) \to \int g(x, y)\, F_y(dx)\, dG(y)$$

as k → ∞, where the last convergence follows from the first assertion of the theorem, the continuity of y ↦ ∫ g(x, y) F_y(dx) and the weak convergence of F̂_k to G. Thus G is the stationary distribution of Y, from which it follows that G = F by the uniqueness of F. Since the above holds for any subsequence {F̂_k}_k, we conclude by a corollary in Billingsley (1995, p. 381).

Proof of Lemma 1. We follow Owen’s scheme of proof (Owen, 1987); Owen demonstrates the same result for independent variables.

Define μ = ∫ ψ(x) dF_y(x) and μ_i = ∫ ψ(x) dF_{y_i}(x). Let B = sup_x |ψ(x)| and ε > 0. Then

$$\int \psi(x)\, d\hat{F}_y(x) - \int \psi(x)\, dF_y(x) = \sum_{i} W_i(\psi(X_i) - \mu) - \mu\left(1 - \sum_{i} W_i\right). \qquad (9)$$

The second term in (9) converges weakly to 0 under (ii). By the continuous mapping theorem (Billingsley, 1968) there is an open set O_ε ⊂ S such that y_i ∈ O_ε implies |μ_i − μ| < ε. The first term of (9) may now be written

$$\sum_i W_i(\psi(X_i) - \mu) = \sum_{y_i \in O_\varepsilon} W_i(\psi(X_i) - \mu) + \sum_{y_i \notin O_\varepsilon} W_i(\psi(X_i) - \mu). \qquad (10)$$

Wi(�(Xi) − �). (10)

The second term in (10) converges weakly to 0 under (i) and (ii) because

|�(Xi) − �|�2B.

Let |W | =∑ |Wi |. Conditionally on the Ys the first term in (10) has expectation boundedin absolute value by |W |� and variance bounded by 8B2/ny . Indeed, conditionally on thesample X,

Var

⎛⎝∑yi∈O�

Wi(�(Xi) − �|X⎞⎠

�E

⎡⎢⎣⎛⎝∑

yi∈O�

Wi(�(Xi) − �)

⎞⎠2

|X⎤⎥⎦

�∑

yi ∈O�

Wi4B2 +∑

yi∈O�

∑yj ∈O�,j �=i

WiWjE[(�(Xi) − �)(�(Xj ) − �)]

� 8B2

ny

applying (Wi − Wj)2 = W 2

i + W 2j − 2WiWj > 0.


If |W| < 2 and n_y > 8B²/ε³, then by the Chebychev inequality the conditional probability that

$$\left| \sum_{y_i \in O_\varepsilon} W_i(\psi(X_i) - \mu) \right| > 3\varepsilon \qquad (11)$$

is less than ε. It follows that the unconditional probability satisfies

$$P\!\left( \left| \sum_{y_i \in O_\varepsilon} W_i(\psi(X_i) - \mu) \right| > 3\varepsilon \right) < P(n_y \le 8B^2/\varepsilon^3) + P(|W| \ge 2) + \varepsilon \to \varepsilon$$

by (ii) and (iii). This establishes the result of Lemma 1. □

References

Ailliot, P., Prevosto, M., Soukissian, T., Diamanti, C., Theodoulides, A., Politis, C., 2003. Simulation of sea state parameters process to study the profitability of a maritime line. Proceedings of the ISOPE Conference 2003.
Billingsley, P., 1968. Convergence of Probability Measures. Wiley, NY.
Billingsley, P., 1995. Probability and Measure. Wiley, NY.
Bühlmann, P., 2002. Bootstraps for time series. Statist. Sci. 17, 52–72.
Chan, K.S., Tong, H., Stenseth, N.C., 1997. Analyzing abundance data from periodically fluctuating rodent populations by threshold models: a nearest block bootstrap approach. Technical Report No. 258, Department of Statistics and Actuarial Science, University of Iowa.
Doob, J.L., 1953. Stochastic Processes. Wiley, NY.
Efron, B., 1979. Bootstrap methods: another look at the jackknife. Ann. Statist. 7, 1–26.
Efron, B., Tibshirani, R., 1993. An Introduction to the Bootstrap. Chapman & Hall, NY.
Falk, M., Reiss, R.D., 1989. Bootstrapping conditional curves. In: Jöckel, K.H., Rothe, G., Sendler, W. (Eds.), Bootstrapping and Related Techniques. Lecture Notes in Economics and Mathematical Systems, vol. 376. Springer, NY.
Franke, J., Härdle, W., 1992. On bootstrapping kernel spectral estimates. Ann. Statist. 20, 120–145.
Freedman, D.A., 1984. On bootstrapping two-stage least squares estimates in stationary linear models. Ann. Statist. 12, 827–842.
Härdle, W., Horowitz, J., Kreiss, J.P., 2003. Bootstrap methods for time series. Internat. Statist. Rev. 71, 435–459.
Horowitz, J.L., 2003. Bootstrap methods for Markov processes. Econometrica 71, 1049–1082.
Künsch, H.R., 1989. The jackknife and the bootstrap for general stationary observations. Ann. Statist. 17, 1217–1241.
Lall, U., Sharma, A., 1996. A nearest neighbor bootstrap for resampling hydrologic time series. Water Resour. Res. 32, 679–693.
Meyn, S.P., Tweedie, R.L., 1993. Markov Chains and Stochastic Stability. Springer, London.
Monbet, V., Marteau, P.F., 2001. Continuous space discrete time Markov models for multivariate sea state parameter processes. Proceedings of the ISOPE Conference 2001.
Owen, A.B., 1987. Nonparametric conditional estimation. Ph.D. Dissertation, Stanford University.
Paparoditis, E., Politis, D.N., 2001a. Tapered block bootstrap. Biometrika 88 (4), 1105–1119.
Paparoditis, E., Politis, D.N., 2001b. A Markovian local resampling scheme for nonparametric estimators in time series analysis. Econometric Theory 17 (3), 540–566.
Paparoditis, E., Politis, D.N., 2002. The local bootstrap for Markov processes. J. Statist. Plan. Inf. 108, 301–328.
Politis, D.N., Romano, J.P., 1994. The stationary bootstrap. J. Amer. Statist. Assoc. 89, 1303–1313.
Politis, D.N., White, H., 2004. Automatic block-length selection for the dependent bootstrap. Econometric Rev. 23 (1), 53–70.
Shao, J., Tu, D., 1995. The Jackknife and Bootstrap. Springer, NY.
Shi, S.G., 1991. Local bootstrap. Ann. Inst. Statist. Math. 43, 667–676.
van der Vaart, A.W., 1998. Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge.