Robust Estimation and Inference for Jumps in Noisy
High Frequency Data: A Local-to-Continuity Theory
for the Pre-averaging Method∗
Jia Li
Department of Economics
Duke University†
This Version: September 15, 2012
Abstract
We develop an asymptotic theory for the pre-averaging estimator when asset price
jumps are weakly identified, here modeled as local to zero. The theory unifies the con-
ventional asymptotic theory for continuous and discontinuous semimartingales as two
polar cases with a continuum of local asymptotics, and explains the breakdown of the
conventional procedures under weak identification. We propose simple bias-corrected
estimators for jump power variations, and construct robust confidence sets with valid
asymptotic size in a uniform sense. The method is also robust to microstructure noise.
Keywords: Confidence set; high frequency data; jump power variation; market
microstructure noise; pre-averaging; semimartingale; uniformity.
JEL Codes: C22.
∗ This paper is a revised version of part of my Ph.D. dissertation at the department of economics, Princeton University. I am very grateful to my advisors Yacine Aït-Sahalia, Ulrich Müller and Mark Watson, as well as Jean Jacod for their guidance. I am also grateful for comments from Tim Bollerslev, Valentina Corradi, Nour Maddahi, Andrew Patton, George Tauchen and Viktor Todorov on various versions of this paper. Comments from three referees and the co-editor have vastly improved the paper. The work is partially supported by NSF Grant SES-1227448. All errors are mine.
† Durham, NC 27708. E-mail: [email protected].
1 Introduction.
This note proposes a robust method for the estimation and inference of power variations
of asset price jumps. We model the asset price as a continuous-time semimartingale and
the pth power variation of its jumps (henceforth the jump power variation) over some time
interval [0, T] is defined as $\sum_{0 \le s \le T} |\Delta J_s|^p$, where $\Delta J_s$ is the jump at time s. The jump
power variation is a pathwise analogue of the absolute moment of jumps. It naturally serves
as a measure for the jump risk, and can be used for estimating parameters governing the
jump process (Aït-Sahalia (2004), Todorov and Bollerslev (2010)), as well as constructing
nonparametric specification tests related to jumps (Aït-Sahalia and Jacod (2011)). Dis-
entangling the jump power variation, or functionals of the jump component in general, is
nontrivial because jumps are convoluted with the drift and the diffusive parts of the price
process. This task is further confounded by the presence of microstructure noise (Andersen
et al. (2006)). To the best of our knowledge, the pre-averaging method proposed by Jacod
et al. (2010), hereafter denoted JPV, is the only method available in the current literature
for the estimation and inference of jump characteristics that is robust to noise. However,
the asymptotic theory of JPV does not provide a satisfactory finite-sample approximation
in the presence of jumps, as documented by Aït-Sahalia et al. (2012), henceforth AJL.
In this note, we examine the asymptotic properties of the pre-averaging estimator when
jumps are weakly identified, or “small”, here modeled as local to zero. As hinted in the title,
we label this local asymptotic setting as “local-to-continuity”. While the standard theory of
JPV describes very distinct asymptotic behaviors of the pre-averaging estimator depending
on whether jumps are present or not, our results provide a continuum of local asymptotics
which bridges their results as two polar cases. Our theory explains the breakdown of the
standard method when jumps are weakly identified. Constructively, we propose a simple
bias correction for the pre-averaging estimator; we also propose robust confidence sets (CS)
for jump power variations, which have valid asymptotic coverage uniformly against “possibly
small jumps”. The results are nonparametric in nature, are valid for almost unrestricted
semimartingales, and are robust to microstructure noise. Monte Carlo evidence strongly
supports our theoretical findings.
Our contribution is twofold. Firstly, our local asymptotic theory for the pre-averaging
estimator is novel. Secondly, to the best of our knowledge, the robust CS and the associated
uniformity result is the first example of this kind for discretely sampled semimartingales.
More generally, we believe that the local-to-continuity approach can be extended to many
other applications for studying asset price jumps based on high frequency data.
We now discuss the related literature. We analyze the estimators of JPV and AJL
under the local-to-continuity setting and propose a robustification for these estimators. The
inference problem considered here, i.e. constructing CS’s for jump power variations, is more
general than the testing problem of AJL. Prior works on noise-robust estimation for high
frequency data, see e.g. Barndorff-Nielsen et al. (2008) and references therein, typically
assume away jumps or treat jumps as a nuisance, and hence have a quite different focus
than here. Jumps are now known to be prevalent in financial data and have been actively
studied in financial econometrics, see Aït-Sahalia and Jacod (2011) for a recent survey.
The insight that local asymptotics often provides a deeper understanding of the finite-
sample behavior of statistical procedures is now well recognized in econometrics. Main
examples include the local-to-unity literature, see e.g. Phillips (1987), as well as the weak
identification literature, see Staiger and Stock (1997) and Stock and Wright (2000). Furthermore,
local asymptotics have been shown to play a crucial role for constructing uniformly
valid inference procedures, see Mikusheva (2007), Andrews and Cheng (Forthcoming) and
references therein. Our approach here is clearly inspired by the above literature, but it is
distinct from prior works because of the nonstandard nature of the fill-in asymptotics for
semimartingale models.
The note is organized as follows. Section 2 presents the model and the pre-averaging
estimator. Section 3 presents the theory. Section 4 concludes. Technical details are collected
in the Appendix, where we present the regularity conditions and construct an estimator for
the asymptotic variance. The web supplement of this note contains all proofs and simulation
results.
2 The setting.
2.1 The underlying process.
The underlying process is a one-dimensional Itô semimartingale on a filtered space
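$(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t \ge 0}, \mathbb{P})$ with the form
$$X^\eta_t = X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s + J_t, \quad \text{where} \tag{1}$$
$$J_t = \eta\left(\int_0^t\!\int_E \delta(s,z)\,1_{\{|\delta(s,z)| \le 1\}}\,\tilde\mu(ds,dz) + \int_0^t\!\int_E \delta(s,z)\,1_{\{|\delta(s,z)| > 1\}}\,\mu(ds,dz)\right),$$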
X0 is an F0-measurable random variable, W is a Brownian motion, σ is the stochastic
volatility process taking values in (0,∞) almost surely, η ∈ [0, 1] is a constant, δ is a
predictable function, µ is a Poisson random measure on R+ × E and its compensator is
ν(dt, dz) = dt ⊗ λ(dz) where (E, E) is an auxiliary space and λ is a σ-finite measure, and
$\tilde\mu = \mu - \nu$. In typical financial econometrics applications, the underlying process represents
the logarithm of an asset price sampled at regularly spaced discrete times $i\Delta_n$, $i \ge 0$, over
a fixed time interval [0, T ], with the time lag ∆n → 0 asymptotically.
On the right-hand side of (1), the component $X_0 + \int_0^t b_s\,ds + \int_0^t \sigma_s\,dW_s$ is a continuous Itô
semimartingale, and Jt is a purely discontinuous process which can be completely charac-
terized by its jumps. In the sequel, we refer to these two components as the continuous part
and the jump part, respectively. We use the parameter η to control the scale of the jumps.
This parameter plays an important role in our asymptotic theory. To separate η from other
modeling components in (1), we introduce an auxiliary process X by setting $X_t = X^1_t$.¹ In
particular, we have
∆J = η∆X, (2)
where for any càdlàg (i.e., right-continuous with left limits) process Y, the process ∆Y is
defined as ∆Yt = Yt − Yt−, t ≥ 0. The fixed jump process ∆X can be thought of as the
“direction” in which ∆J deviates from zero and η quantifies this deviation.
We stress that we are interested in the jumps of the underlying process, i.e. ∆J , rather
than η and ∆X separately. Indeed, while ∆J is identifiable upon observing the underlying
process in continuous time, η and ∆X cannot be identified separately because of (2). We
hence keep the dependence of J on η implicit in our notation.² Nevertheless, introducing
the scaling parameter η is useful for considering local asymptotics under a drifting sequence
of data generating processes. In Section 3, we derive asymptotic properties of the pre-averaging
estimator for a drifting sequence $\eta_n$ while keeping other coefficients (i.e. b, σ, δ,
µ) fixed. When $\eta_n \to 0$, the jump process $\Delta J = \eta_n \Delta X$ converges to zero asymptotically,
capturing the idea that jumps are “small” in an asymptotic sense. Moreover, the rate at
which $\eta_n$ vanishes describes how “small” the jumps are, and not surprisingly, it plays
an important role in the limiting theorems. Generally speaking, one could think of letting
¹ We note that X is the standard model for high frequency data and has been widely studied in the literature, see e.g. Jacod (2008), Aït-Sahalia and Jacod (2011), Bollerslev and Todorov (2011), Todorov and Tauchen (2012) and in particular JPV.
² One may find it notationally more consistent to write $J^\eta$ in place of J in order to emphasize its dependence on η; we suppress the superscript here to avoid the somewhat awkward notation $\sum_{s \le T} |\Delta J^\eta_s|^p$ for the pth jump power variation.
J drift to zero as an element in the space of stochastic processes, instead of governed by
the scaling sequence ηn. Our formulation here is only a special case, adopted largely due to
conceptual simplicity and technical convenience.³
To make the exposition as simple as possible, we present and discuss the regularity
conditions on the underlying process in Appendix A. Here, we only point out that the
assumptions are fairly standard and unrestrictive, allowing for stochastic volatility, jumps
of finite or infinite activity, and all manners of dependence between the characteristics of
the underlying process.
2.2 The noise.
We suppose that the underlying process can only be observed with an error: instead of $X^\eta_t$,
we observe
$$Z_t = X^\eta_t + \chi_t. \tag{3}$$
The error term χt is typically referred to as the “market microstructure noise”, which mainly
includes, but is not limited to, the bid-ask spread. We denote the conditional volatility of
the noise by $\alpha_t = \sqrt{E[\chi_t^2 \mid \mathcal{F}^{(0)}_t]}$, where $(\mathcal{F}^{(0)}_t)_{t \ge 0}$ is a filtration to which all processes in (1)
are adapted. The formal construction of the noise model and regularity conditions on the
noise are given in Appendix A.
The additive noise model considered here is standard in the literature, although the
specific assumption on the noise varies.⁴ Following Jacod et al. (2009), JPV and AJL, our
key assumption on the noise is that conditionally on the underlying process, the noise is
serially independent with mean zero. This assumption excludes some empirical features of
the noise that have been considered in the literature: we rule out the correlation between the
noise and the underlying process and the serial correlation of the noise process itself (Hansen
and Lunde (2006)), as well as the pure rounding case considered by Li and Mykland (2007).
This being said, we note that our assumption on noise is, to the best of our knowledge, the
most general setup known in the literature for studying the inference problem on jumps. In
³ I wish to thank an anonymous referee for interesting comments on alternative formulations of the local asymptotic embedding.
⁴ For various assumptions considered in the literature, see Zhou (1996), Zhang et al. (2005), Zhang (2006), Bandi and Russell (2006), Kalnina and Linton (2008), Barndorff-Nielsen et al. (2008, 2011), Aït-Sahalia et al. (2011). All these papers consider the case of estimating the quadratic variation and covariation, mostly under the setting without jumps. Hence, there is only minor overlap between these papers and the current paper, which is concerned with a general setting with jumps and functionals beyond the quadratic variation.
particular, we do allow the noise process to have stochastic conditional heteroskedasticity ($\alpha_t$
is a stochastic process), unconditional serial dependence and dependence on the underlying
process through higher moments. We also allow the situation with “smooth rounding” as
discussed in Jacod et al. (2009).
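As a rough illustration of the observation scheme in (1) to (3), the following sketch simulates noisy observations on a regular grid. It is not the simulation design of the web supplement. The function name, the parameter values, and the modeling shortcuts (constant drift and volatility, compound Poisson jumps with Gaussian sizes, i.i.d. Gaussian noise independent of the price) are all assumptions made here for illustration; they form a very special case of the general setting.

```python
import numpy as np

def simulate_noisy_prices(n, T=1.0, eta=0.5, b=0.0, sigma=0.2,
                          jump_rate=3.0, jump_scale=0.05,
                          noise_sd=0.0005, seed=0):
    """Simulate Z = X^eta + chi at times i*T/n, i = 0..n (illustrative only)."""
    rng = np.random.default_rng(seed)
    dt = T / n
    # Continuous part: Euler increments with constant b and sigma (an assumption).
    cont_incr = b * dt + sigma * rng.normal(0.0, np.sqrt(dt), size=n)
    # Auxiliary jump process (eta = 1): compound Poisson with Gaussian jump sizes.
    n_jumps = rng.poisson(jump_rate * T)
    jump_idx = rng.integers(0, n, size=n_jumps)
    jump_sizes = rng.normal(0.0, jump_scale, size=n_jumps)
    jump_incr = np.zeros(n)
    np.add.at(jump_incr, jump_idx, jump_sizes)
    # Scale the jumps by eta, mirroring Delta J = eta * Delta X in (2).
    x = np.concatenate(([0.0], np.cumsum(cont_incr + eta * jump_incr)))
    # Additive noise as in (3); i.i.d. Gaussian is a simplifying assumption.
    z = x + rng.normal(0.0, noise_sd, size=n + 1)
    return z, x, eta * jump_sizes

z, x, jumps = simulate_noisy_prices(n=23_400, eta=0.5)
print(z.shape, float(np.sum(np.abs(jumps) ** 4)))  # sample size and 4th jump power variation
```

Setting `eta=0.0` switches the jumps off entirely, while small positive values mimic the weakly identified case discussed in the text.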
2.3 Pre-averaging.
We now introduce the pre-averaging estimator proposed by JPV. The pre-averaging esti-
mator can be used to estimate jump power variations or integrated volatility functionals
depending on whether jumps are present or not. In order to define the pre-averaging window,
we choose a sequence of integers $k_n$ satisfying $k_n\sqrt{\Delta_n} = \theta + o(\Delta_n^{1/4})$ for some θ > 0.
Moreover, pre-averaging involves weighting the observations in the pre-averaging window.
In this paper, a function $g : \mathbb{R} \to \mathbb{R}_+$ is called a weight function if it is continuous, piecewise
$C^1$ with a piecewise Lipschitz derivative $g'$, supported on [0, 1] and satisfies $\int g(s)^2\,ds > 0$.
For q > 0, we denote $g(q) = \int |g(s)|^q\,ds$ and $g'(q) = \int |g'(s)|^q\,ds$.
With any process Y = (Yt)t≥0 and weight function g, we associate the following variables.
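For each integer i, we denote $g^n_i = g(i/k_n)$, $g'^n_i = g^n_i - g^n_{i-1}$, and $\Delta^n_i Y = Y_{i\Delta_n} - Y_{(i-1)\Delta_n}$.⁵
We then set
$$\bar Y(g)^n_i = \sum_{j=1}^{k_n-1} g^n_j\,\Delta^n_{i+j} Y, \qquad \hat Y(g)^n_i = \sum_{j=1}^{k_n} \left(g'^n_j\,\Delta^n_{i+j} Y\right)^2.$$
We also define
$$V(Y,g,q,l)^n_t = \sum_{i=0}^{\lfloor t/\Delta_n \rfloor - k_n} \left|\bar Y(g)^n_i\right|^q \left|\hat Y(g)^n_i\right|^l \tag{4}$$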
with $\lfloor \cdot \rfloor$ denoting the floor function (the largest integer not exceeding its argument).
Let p ≥ 2 be an even integer throughout the rest of this paper. We define $(\rho(p)_j)_{j=0,\dots,p/2}$
as the unique numbers solving the following triangular system of linear equations:
$$\rho(p)_0 = 1, \qquad \sum_{l=0}^{j} 2^l\, m_{2j-2l}\, C^{p-2j}_{p-2l}\, \rho(p)_l = 0, \quad j = 1, 2, \dots, p/2,$$
where $m_q$ denotes the qth absolute moment of the law N(0, 1), and for integers x and y,
$C^y_x = x!/(y!\,(x-y)!)$.
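Because the system is triangular with diagonal entries $2^j$, the constants can be obtained by forward substitution. The sketch below is only an illustration of that computation; the function names are ours and hypothetical, and even-order absolute moments of N(0, 1) are computed as double factorials.

```python
from math import comb, prod

def abs_moment_even(q):
    """q-th absolute moment of N(0,1) for even q >= 0, i.e. (q-1)!! (1 when q = 0)."""
    return prod(range(1, q, 2))

def rho_coefficients(p):
    """Solve rho_0 = 1 and sum_{l<=j} 2^l m_{2j-2l} C^{p-2j}_{p-2l} rho_l = 0, j = 1..p/2."""
    if p < 2 or p % 2:
        raise ValueError("p must be an even integer >= 2")
    rho = [1.0]
    for j in range(1, p // 2 + 1):
        # Terms with l < j are already known.
        partial = sum(2**l * abs_moment_even(2 * j - 2 * l)
                      * comb(p - 2 * l, p - 2 * j) * rho[l]
                      for l in range(j))
        # The l = j term is 2^j * m_0 * C^{p-2j}_{p-2j} * rho_j = 2^j * rho_j.
        rho.append(-partial / 2**j)
    return rho

print(rho_coefficients(2))  # [1.0, -0.5]
print(rho_coefficients(4))  # [1.0, -3.0, 0.75]
```

For p = 2 the single correction weight is −1/2, and for p = 4 the weights are −3 and 3/4, as follows directly from the two displayed equations.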
⁵ The notation $\Delta^n_i Y$ indicates that the increments form a triangular array; it is not meant to denote the nth difference.
Finally, for any process Y , the pre-averaging estimator is defined to be
$$V(Y,g,p)^n_t = \sum_{l=0}^{p/2} \rho(p)_l\, V(Y,g,p-2l,l)^n_t.$$
In the above display, the term $V(Y,g,p,0)^n_t$ serves as the leading term. The other terms
$V(Y,g,p-2l,l)^n_t$, l = 1, . . . , p/2, when weighted by the constants $\rho(p)_l$, correct the bias
arising from noise in the leading term.
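The definitions of this subsection can be put together in a few lines. The following is a minimal sketch, not the author's code: the function names are hypothetical, the default weight function g(s) = min(s, 1 − s) is an assumption chosen for convenience, and the estimator is evaluated over the full sample (so t is the last observation time).

```python
import numpy as np
from math import comb, prod

def rho_coefficients(p):
    """Forward substitution for the triangular system defining rho(p)_0..rho(p)_{p/2}."""
    m = lambda q: prod(range(1, q, 2))  # even absolute moments of N(0,1)
    rho = [1.0]
    for j in range(1, p // 2 + 1):
        s = sum(2**l * m(2*j - 2*l) * comb(p - 2*l, p - 2*j) * rho[l] for l in range(j))
        rho.append(-s / 2**j)
    return rho

def preaveraging_estimator(z, k_n, p, g=lambda s: np.minimum(s, 1.0 - s)):
    """Pre-averaging estimator V(Z, g, p) over observations z[0..n], p even."""
    dz = np.diff(z)                              # dz[m] is the (m+1)-th increment
    g_n = g(np.arange(k_n + 1) / k_n)            # g(j/k_n), j = 0..k_n
    g_diff = np.diff(g_n)                        # g(j/k_n) - g((j-1)/k_n), j = 1..k_n
    # Row i holds increments i+1, ..., i+k_n, for i = 0..n-k_n.
    windows = np.lib.stride_tricks.sliding_window_view(dz, k_n)
    y_bar = windows[:, :k_n - 1] @ g_n[1:k_n]    # weighted sum over j = 1..k_n-1
    y_hat = (windows ** 2) @ (g_diff ** 2)       # sum of squared weighted increments
    rho = rho_coefficients(p)
    return sum(rho[l] * np.sum(np.abs(y_bar) ** (p - 2*l) * y_hat ** l)
               for l in range(p // 2 + 1))

# Deterministic sanity check on a linear path: every window gives y_bar = 1, y_hat = 1/4.
z = np.arange(11.0)
print(preaveraging_estimator(z, k_n=4, p=2))  # 7 windows * (1 - 0.5 * 0.25) = 6.125
```

In practice $k_n$ would be tied to the sampling interval through $k_n\sqrt{\Delta_n} \approx \theta$; the toy call above ignores this and only checks the arithmetic.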
3 The local-to-continuity asymptotics.
3.1 The law of large numbers.
We start with the law of large numbers (LLN) of the pre-averaging estimator $V(Z,g,p)^n_t$
under a drifting sequence of models governed by $\eta_n$; we use $\xrightarrow{P,\eta_n}$ to indicate the convergence
in probability under such a sequence. The normalizing factor in the LLN is given by
$$d_n = \frac{\Delta_n^{1-p/4}}{1 + (\Delta_n^{-r^*}\eta_n)^p}, \quad \text{where } r^* = \frac{p-2}{4p}.$$
It is chosen to ensure that the probability limit is nondegenerate in all scenarios. We also
set
$$V(g,p)_t = m_p\,(\theta g(2))^{p/2} \int_0^t \sigma_s^p\,ds, \qquad U(g,p)_t = \theta g(p) \sum_{s \le t} |\Delta X_s|^p; \tag{5}$$
it is helpful to recall that the price jump ∆J is related to the auxiliary process ∆X by (2).
Before stating the LLN, we remind the reader that the exact statements of the assump-
tions are collected in Appendix A, and all proofs are in the web supplement of this paper.
In the sequel, for any function $f : \mathbb{R}_+ \to \mathbb{R}$, $f(\infty)$ should be understood as $\lim_{h\to\infty} f(h)$
provided that the limit exists; for any strictly positive sequences of real numbers $x_n$ and $y_n$,
we denote $x_n \sim y_n$ iff $\lim_{n\to\infty} x_n/y_n = 1$.
Theorem 1 (LLN) Suppose that Assumptions (H-2) and (N) hold. Let $(\eta_n)_{n \ge 1} \subset [0,1]$ be
a sequence satisfying $\Delta_n^{-r^*}\eta_n \to h$ for some $h \in [0,\infty]$. Then
$$d_n V(Z,g,p)^n_t \xrightarrow{P,\eta_n} \frac{1}{1+h^p}\,V(g,p)_t + \frac{h^p}{1+h^p}\,U(g,p)_t. \tag{6}$$
Comments. (i) First consider the special case in which $\eta_n$ vanishes at a polynomial rate,
i.e., $\eta_n = \Delta_n^r$ for some r ≥ 0. If $r = r^*$, the condition of Theorem 1 is satisfied with h = 1.
The normalizing factor is $d_n = \Delta_n^{1-p/4}/2$; the limiting variable is $(V(g,p)_t + U(g,p)_t)/2$,
the average between the contribution from the continuous part and the contribution from
the jump part, measured by $V(g,p)_t$ and $U(g,p)_t$ respectively. When $r > r^*$ (resp. $r < r^*$),
the theorem can be applied with h = 0 (resp. h = ∞), and the limit is $V(g,p)_t$ (resp.
$U(g,p)_t$). In other words, when jumps are “small” (resp. “large”), the continuous part
(resp. the jump part) dominates the first-order asymptotic behavior of the estimator. The
constant $r^*$ is precisely the critical vanishing rate which balances the contributions of the
continuous part and the jump part in the LLN. The theorem also allows $\eta_n$ to drift at
non-polynomial rates. For example, if $\eta_n = \Delta_n^{r^*}\log(1/\Delta_n)$, the theorem can be applied with
h = ∞.
(ii) The normalizing factor depends on $\eta_n$ in the following way: when h ∈ [0,∞),
$d_n \sim \Delta_n^{1-p/4}/(1+h^p)$; when h = ∞, $d_n \sim \Delta_n^{1/2}\eta_n^{-p} \sim \theta k_n^{-1}\eta_n^{-p}$.
(iii) The standard asymptotic theory of JPV under the non-local model can be recovered
by properly setting the sequence ηn. We first consider the case with jumps by taking ηn ≡ 1,
so h = 1 when p = 2 and h = ∞ when p ≥ 4. In view of comment (ii), (6) can be written as
$$\begin{cases} \Delta_n^{1/2}\, V(Z,g,p)^n_t \xrightarrow{P} V(g,p)_t + U(g,p)_t & \text{when } p = 2, \\ \Delta_n^{1/2}\, V(Z,g,p)^n_t \xrightarrow{P} U(g,p)_t & \text{when } p \ge 4. \end{cases} \tag{7}$$
Up to a multiplicative constant, (7) coincides with Theorem 3.4(a) of JPV.⁶ In the absence
of jumps, we can take $\eta_n \equiv 0$ and simplify (6) as
$$\Delta_n^{1-p/4}\, V(Z,g,p)^n_t \xrightarrow{P} V(g,p)_t.$$
This recovers Theorem 3.4(b) of JPV.
⁶ To be precise, Theorem 3.4(a) in JPV allows X to be an arbitrary semimartingale and does not require p to be an even integer when p > 2. Hence, we only recover special cases of their results. However, the loss of generality is inevitable here, because of our purpose of unifying the limiting theory for both continuous and discontinuous cases. Indeed, in the continuous case (Theorem 3.4(b)), JPV also require p to be an even integer.
(iv) The parameter h is a reparametrization of the scaling sequence $\eta_n$, which measures
the strength of the jump signal. As h increases from 0 to ∞, the limiting variable in
(6) shifts continuously from the limit in the standard continuous case to the limit in the
standard jump case. In other words, Theorem 1 bridges the standard asymptotic results as
polar cases with a continuum of local asymptotics.
(v) Finally, we discuss the role of the smoothing parameter θ. Fix some h ∈ (0,∞).
In this case, both the continuous part and the jump part have nonnegligible contributions
in the LLN, and their relative contribution can be measured by $V(g,p)_t/U(g,p)_t$. Other
things being equal, this ratio is proportional to $\theta^{p/2-1}$. When p > 2, $V(g,p)_t/U(g,p)_t$
increases in θ, suggesting that the relative strength of the jump signal decreases when we
over-smooth the data (i.e. θ and $k_n$ are large). When p = 2, this intuition is no longer true,
because the relative strength is invariant to θ.
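To make the interplay between $d_n$, $r^*$ and h in the comments above concrete, the following sketch evaluates the normalizing factor and the two mixing weights appearing in (6). The function names are hypothetical and the numerical values are chosen only for illustration.

```python
import math

def r_star(p):
    """Critical LLN rate r* = (p - 2) / (4p)."""
    return (p - 2) / (4 * p)

def lln_normalizer(delta_n, eta_n, p):
    """d_n = Delta_n^{1 - p/4} / (1 + (Delta_n^{-r*} eta_n)^p)."""
    h_n = delta_n ** (-r_star(p)) * eta_n
    return delta_n ** (1 - p / 4) / (1 + h_n ** p)

def lln_weights(h, p):
    """Weights on V(g,p)_t and U(g,p)_t in the limit (6); h may be math.inf."""
    if math.isinf(h):
        return 0.0, 1.0
    return 1.0 / (1.0 + h ** p), h ** p / (1.0 + h ** p)

delta_n, p = 1e-4, 4
# Borderline case eta_n = Delta_n^{r*}: h = 1 and d_n = Delta_n^{1 - p/4} / 2.
print(lln_normalizer(delta_n, delta_n ** r_star(p), p), lln_weights(1.0, p))
# Non-local case eta_n = 1: d_n is close to Delta_n^{1/2}, and all weight is on U as h -> inf.
print(lln_normalizer(delta_n, 1.0, p), math.sqrt(delta_n), lln_weights(math.inf, p))
```

The two printed cases correspond to comment (i) with $r = r^*$ and to the standard jump asymptotics of comment (iii), respectively.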
Theorem 1 has an important implication for estimating jump power variations
$\sum_{s \le t} |\Delta J_s|^p$ with p ≥ 4. It suggests that the pre-averaging estimator tends to overestimate
the jump power variation due to the continuous part in the price process; this source of over-
estimation is invisible in the standard asymptotics, see (7), but emerges naturally in the
local asymptotics. The theorem also suggests a simple procedure for correcting the higher-order
bias.⁷ Observing from (5) that $V(g,p)_t$ and $U(g,p)_t$ depend on the weight function g
in distinct ways when p > 2, we propose correcting the bias via multiple weight functions.
We consider d ≥ 2 weight functions $(g_i)_{1 \le i \le d}$ and a constant d-vector $\kappa = (\kappa_i)_{1 \le i \le d}$ such
that
$$\sum_{i=1}^d \kappa_i\, g_i(2)^{p/2} = 0, \qquad \theta \sum_{i=1}^d \kappa_i\, g_i(p) = 1. \tag{8}$$
The bias-corrected estimator is given by
$$H^n_t = \Delta_n^{1/2} \sum_{i=1}^d \kappa_i\, V(Z,g_i,p)^n_t. \tag{9}$$
Below, we compare the bias-corrected and the uncorrected estimators for the jump power
variation. For any nonrandom sequence $b_n > 0$, we denote by $o_{p,\eta_n}(b_n)$ a generic sequence
of variables $\xi_n$ which satisfies $\xi_n/b_n \xrightarrow{P,\eta_n} 0$.
⁷ As pointed out by an anonymous referee, the term “bias” here is somewhat imprecisely used. Nevertheless, for the ease of discussion, we abuse the terminology slightly by using “bias” to refer to the first term on the right-hand side of (6). This term is nonzero only if h < ∞, implying $\eta_n \to 0$ when p > 2. Hence, our discussion on “bias” does not directly speak to asymptotic results in the standard nonlocal setting.
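Corollary 1 Suppose that the same conditions as in Theorem 1 hold for some h ∈ (0,∞).
Let p ≥ 4 be an even integer, let $H^n_t$ be the bias-corrected estimator defined by (9), and let
$\tilde H^n_t = \Delta_n^{1/2}(\theta g(p))^{-1}\, V(Z,g,p)^n_t$ be the uncorrected estimator for some weight function
g. Then we have
$$H^n_t = \sum_{s\le t}|\Delta J_s|^p + o_{p,\eta_n}(\eta_n^p),$$
$$\tilde H^n_t = \sum_{s\le t}|\Delta J_s|^p + \frac{\eta_n^p\,V(g,p)_t}{h^p\,\theta g(p)} + o_{p,\eta_n}(\eta_n^p).$$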
Comments. (i) Corollary 1 shows that in the borderline case (0 < h < ∞), the bias-corrected
estimator $H^n_t$ is a valid estimator for the jump power variation. It is valid in the sense that
the estimation error is asymptotically negligible relative to the estimand, even though the
estimand vanishes asymptotically; recall from (2) that $\sum_{s\le t}|\Delta J_s|^p = \eta_n^p \sum_{s\le t}|\Delta X_s|^p$, where the auxiliary
process ∆X is fixed. In contrast, the uncorrected estimator $\tilde H^n_t$ is invalid in the same sense,
because it carries a positive estimation error which has the same order of magnitude as the
estimand.
(ii) It can be shown that the assertion of Corollary 1 also holds for h = ∞, with the
term $\eta_n^p V(g,p)_t/(h^p\theta g(p))$ in the assertion understood as $o_{p,\eta_n}(\eta_n^p)$. Hence, when h = ∞,
$H^n_t$ and $\tilde H^n_t$ are both “valid” estimators of the jump power variation in the same sense as
in comment (i). In particular, under the standard asymptotics (so $\eta_n \equiv 1$), both estimators
are consistent.
(iii) A convenient choice of the weight functions and the constants $\kappa_i$ is the following.
For any weight function $g_1$, if we take $g_2(x) = g_1(kx)$ for some k > 1, then $g_1(q) = k\,g_2(q)$
for any q > 0, so (8) is satisfied with $\kappa_1 = -1/(\theta g_1(p)(k^{p/2-1} - 1))$ and $\kappa_2 = -k^{p/2}\kappa_1$.
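The choice in comment (iii) is easy to verify numerically. The sketch below is illustrative only: it assumes the triangular weight function $g_1(s) = \min(s, 1-s)$ on [0, 1] (set to zero elsewhere), approximates the integrals $g_i(q)$ by the trapezoidal rule, and uses hypothetical function names and arbitrary values of θ, p and k.

```python
import numpy as np

def weight_integral(g, q, grid=200_001):
    """Trapezoidal approximation of g(q) = integral over [0, 1] of |g(s)|^q ds."""
    s = np.linspace(0.0, 1.0, grid)
    vals = np.abs(g(s)) ** q
    return float((s[1] - s[0]) * (vals.sum() - 0.5 * (vals[0] + vals[-1])))

def two_weight_kappa(g1, p, theta, k):
    """kappa_1, kappa_2 from comment (iii) for g_2(x) = g_1(kx), k > 1."""
    kappa1 = -1.0 / (theta * weight_integral(g1, p) * (k ** (p / 2 - 1) - 1.0))
    return kappa1, -(k ** (p / 2)) * kappa1

theta, p, k = 0.5, 4, 2.0
g1 = lambda s: np.clip(np.minimum(s, 1.0 - s), 0.0, None)  # zero outside [0, 1]
g2 = lambda s: g1(k * s)
kappa1, kappa2 = two_weight_kappa(g1, p, theta, k)

# The two constraints in (8): the first should be near 0, the second near 1.
first = kappa1 * weight_integral(g1, 2) ** (p / 2) + kappa2 * weight_integral(g2, 2) ** (p / 2)
second = theta * (kappa1 * weight_integral(g1, p) + kappa2 * weight_integral(g2, p))
print(kappa1, kappa2, first, second)
```

With these inputs $g_1(4) = 1/80$, so the formulas give $\kappa_1 = -160$ and $\kappa_2 = 640$, up to quadrature error.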
3.2 The central limit theorem.
We now describe the central limit theorem (CLT) of the pre-averaging estimator under the
local-to-continuity asymptotics. For a sequence of random variables $\xi_n$ defined on the probability
space (Ω, F, P), we write $\xi_n \xrightarrow{\mathcal{L}\text{-}s,\eta_n} \mathcal{MN}(0,\Sigma_\xi)$ if $\xi_n$ converges stably in law⁸ under the
drifting sequence $\eta_n$ to a random variable defined on an extension of the original probability
space and which, conditionally on F, has an $N(0,\Sigma_\xi)$ distribution. If $\Sigma_\xi$ is nonrandom, we
⁸ Stable convergence in law is slightly stronger than the usual notion of weak convergence. We need this stronger mode of convergence for inferential purposes, because the asymptotic variance here is random. See Barndorff-Nielsen et al. (2008) or Jacod and Shiryaev (2003) for detailed discussions.
write N (0,Σξ) in place of MN (0,Σξ). For applications, it is useful to consider the joint
convergence of estimators associated with multiple weight functions (gi)1≤i≤d for d ≥ 1,
which is our goal below.
We start by specifying the asymptotic variance. Consider two independent Brownian motions
$W^1$ and $W^2$, given on another auxiliary filtered probability space $(\Omega', \mathcal{F}', (\mathcal{F}'_t)_{t\ge 0}, \mathbb{P}')$.
For generic weight functions g and h, we define the following Wiener integral processes
$$L(g)_t = \int g(s-t)\,dW^1_s, \qquad L'(g)_t = \int g'(s-t)\,dW^2_s,$$
and L(h) and L′(h) are defined likewise with h instead of g, with the same $W^1$ and $W^2$.
The four dimensional process (L(g), L′(g), L(h), L′(h)) is continuous stationary centered
Gaussian. We then set for x, y ∈ R and q, q′ even integers:
$$m_q(g;x,y) = E'\big((xL(g)_1 + yL'(g)_1)^q\big),$$
$$m_{q,q'}(g,h;x,y)_t = E'\big((xL(g)_1 + yL'(g)_1)^q\,(xL(h)_t + yL'(h)_t)^{q'}\big),$$
$$\mu(g,h;x,y) = \sum_{r,r'=0}^{p/2} \rho(p)_r\,\rho(p)_{r'}\,(2y^2 g'(2))^r\,(2y^2 h'(2))^{r'} \int_0^2 \big(m_{p-2r,p-2r'}(g,h;x,y)_t - m_{p-2r}(g;x,y)\,m_{p-2r'}(h;x,y)\big)\,dt.$$
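The covariance matrix associated with the continuous part is a d × d positive semi-definite
matrix $\Sigma_C$ with entries
$$\Sigma_C^{ij} = \theta^{1-p} \int_0^t \mu(g_i, g_j; \theta\sigma_s, \alpha_s)\,ds.$$
We also consider four d × d positive semi-definite matrices $\Psi_-$, $\Psi_+$, $\Psi'_-$, $\Psi'_+$ with entries
$$\Psi^{ij}_\pm = \int_0^1 \Gamma(\pm,g_i)_t\,\Gamma(\pm,g_j)_t\,dt, \qquad \Psi'^{ij}_\pm = \int_0^1 \Gamma'(\pm,g_i)_t\,\Gamma'(\pm,g_j)_t\,dt,$$
where for any weight function g,
$$\Gamma(-,g)_t = \int_t^1 g(s)^{p-1} g(s-t)\,ds, \qquad \Gamma'(-,g)_t = \int_t^1 g(s)^{p-1} g'(s-t)\,ds,$$
$$\Gamma(+,g)_t = \int_0^{1-t} g(s)^{p-1} g(s+t)\,ds, \qquad \Gamma'(+,g)_t = \int_0^{1-t} g(s)^{p-1} g'(s+t)\,ds.$$
The covariance matrix associated with the jump part is the d × d matrix given by
$$\Sigma_J = \theta^2 p^2 \sum_{s\le t} |\Delta X_s|^{2p-2}\left(\theta\sigma^2_{s-}\Psi_- + \theta\sigma^2_s\Psi_+ + \frac{\alpha^2_{s-}}{\theta}\Psi'_- + \frac{\alpha^2_s}{\theta}\Psi'_+\right).$$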
For any sequence $(\eta_n)_{n\ge 1} \subset [0,1]$, the normalizing factor of the CLT is given by:
$$a_n = \frac{\Delta_n^{3/4-p/4}}{1 + (\Delta_n^{-r}\eta_n)^{p-1}}, \quad \text{where } r = \frac{p-2}{4(p-1)}.$$
The normalizing factor depends on ηn and is chosen to avoid degenerate limits. Below, we
describe the joint stable convergence in law of the centered and scaled variables
$$V(g_i,p)^n_t = a_n\left(V(Z,g_i,p)^n_t - \Delta_n^{p/4-1}\,V(g_i,p)_t - \Delta_n^{-1/2}\,\theta g_i(p) \sum_{s\le t}|\Delta J_s|^p\right), \quad 1 \le i \le d,$$
where the centering variable is motivated by Theorem 1.
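Theorem 2 (CLT) Suppose that Assumptions (H-1), (K) and (N) hold. Let $(\eta_n)_{n\ge1} \subset [0,1]$
be a sequence such that $\Delta_n^{-r}\eta_n \to h$ for some h ∈ [0,∞]. Then
$$\left(V(g_i,p)^n_t\right)_{1\le i\le d} \xrightarrow{\mathcal{L}\text{-}s,\eta_n} \mathcal{MN}(0,\Sigma),$$
where
$$\Sigma = \frac{1}{(1+h^{p-1})^2}\,\Sigma_C + \left(\frac{h^{p-1}}{1+h^{p-1}}\right)^2 \Sigma_J.$$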
Comments. (i) The constant r is the critical vanishing rate at which the continuous
part and the jump part are balanced in the CLT. When $\eta_n \sim \Delta_n^r$, both the continuous
part and the jump part have nonnegligible contributions to the asymptotic variance. When
$\Delta_n^{-r}\eta_n \to h = 0$ (resp. h = ∞), the continuous part (resp. jump part) dominates in the
asymptotic variance.
(ii) If the underlying process is assumed to be continuous, we can set $\eta_n \equiv 0$ and simplify
the statement of Theorem 2 as
$$\Delta_n^{-1/4}\left(\Delta_n^{1-p/4}\,V(Z,g_i,p)^n_t - V(g_i,p)_t\right)_{1\le i\le d} \xrightarrow{\mathcal{L}\text{-}s} \mathcal{MN}(0,\Sigma_C).$$
This result recovers Theorem 4.1 of JPV and justifies the interpretation that ΣC is con-
tributed by the continuous part.
(iii) The standard asymptotic result of JPV in the jump case can be recovered by setting
$\eta_n \equiv 1$. In particular, when p ≥ 4 (so r > 0 and h = ∞), the result of the theorem can be
simplified as
$$\Delta_n^{-1/4}\left(\Delta_n^{1/2}\,V(Z,g_i,p)^n_t - \theta g_i(p)\sum_{s\le t}|\Delta J_s|^p\right)_{1\le i\le d} \xrightarrow{\mathcal{L}\text{-}s} \mathcal{MN}(0,\Sigma_J).$$
Hence, the standard asymptotic theory not only ignores the higher-order bias in the LLN,
but also understates the sampling variability of the pre-averaging estimator, as it ignores
the contribution from the continuous part in the asymptotic variance. Therefore, if one
constructs a CS for the jump power variation based on the standard asymptotics, the CS
tends to be biased upward in location, and biased downward in scale, which may lead to
substantial size distortion. The size distortion in a testing context has been reported in the
simulation study of AJL.
(iv) Finally, it is interesting to observe that r > r∗ when p ≥ 4. Therefore, in view
of Theorem 1, the continuous part and the jump part cannot be balanced in both the
first- and the second-order asymptotics for the same drifting sequence ηn. This observation
confirms the necessity of treating ηn as unknown and considering a general specification of
this drifting sequence, instead of imposing that ηn vanishes at some ad hoc rate.
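The rate comparison in comment (iv) follows from $r - r^* = (p-2)/(4p(p-1))$, which is strictly positive for p > 2. A small check with exact rational arithmetic, using hypothetical helper names, is:

```python
from fractions import Fraction

def lln_rate(p):
    """Critical rate in the LLN: r* = (p - 2) / (4p)."""
    return Fraction(p - 2, 4 * p)

def clt_rate(p):
    """Critical rate in the CLT: r = (p - 2) / (4(p - 1))."""
    return Fraction(p - 2, 4 * (p - 1))

for p in (2, 4, 6, 8):
    gap = clt_rate(p) - lln_rate(p)
    # The gap equals (p - 2) / (4 p (p - 1)): zero at p = 2, positive for p >= 4.
    assert gap == Fraction(p - 2, 4 * p * (p - 1))
    print(p, lln_rate(p), clt_rate(p), gap)
```

For p = 4, for instance, the two rates are 1/8 and 1/6, so a sequence $\eta_n$ that balances the LLN is too large to balance the CLT, and vice versa.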
The CLT for the bias-corrected estimator $H^n_t$ is given by the following corollary of
Theorem 2. Its proof is obvious and thus omitted.
Corollary 2 Let p ≥ 4 be an even integer and $H^n_t$ be defined by (9) with weight functions
$(g_i)_{1\le i\le d}$ and constants $\kappa = (\kappa_i)_{1\le i\le d}$. Under the same settings as in Theorem 2, we have
$$\frac{a_n}{\Delta_n^{1/2}}\left(H^n_t - \sum_{s\le t}|\Delta J_s|^p\right) \xrightarrow{\mathcal{L}\text{-}s,\eta_n} \mathcal{MN}\left(0, \kappa^\top\Sigma\kappa\right).$$
3.3 Robust inference of jump power variations.
In this section, we construct robust CS’s for the pth jump power variation, i.e. $\sum_{s\le t}|\Delta J_s|^p$,
for p ≥ 4. More precisely, for any constant c ∈ (0, 1), we construct a sequence of set-valued
statistics $CS^n_{1-c}$ such that
$$\lim_{n\to\infty}\,\inf_{\eta\in[0,1]} P_\eta\left(\sum_{s\le t}|\Delta J_s|^p \in CS^n_{1-c}\right) = 1 - c, \tag{10}$$
where the notation $P_\eta$ emphasizes the dependence of the data generating process on the
scaling parameter η; recall that ∆J depends on η, see (2). By taking the infimum before
sending n → ∞, (10) requires that $CS^n_{1-c}$ has valid asymptotic coverage uniformly over
η ∈ [0, 1]. This uniformity requirement formalizes the notion of robustness against “possibly
small jumps”. For more discussions on the importance of uniformity, see Andrews and
Guggenberger (2009) and references therein.
Our construction relies on the bias-corrected estimator $H^n_t$. To gauge its sampling variability,
we need an estimator of the asymptotic variance Σ, which is associated with weight
functions $(g_i)_{1\le i\le d}$ as described in Theorem 2. To streamline the discussion, we suppose
that there exists a sequence of estimators $\Sigma_n$ which verifies Assumption (V) below. We
postpone the (somewhat complicated) construction of $\Sigma_n$ until Appendix B for the sake of
readability.
Assumption (V). For any sequence $(\eta_n)_{n\ge1} \subset [0,1]$ with $\Delta_n^{-r}\eta_n \to h \in [0,\infty]$, $a_n^2\,\Sigma_n \xrightarrow{P,\eta_n} \Sigma$.
We are now ready to define the robust CS. For any c ∈ (0, 1), let $S_{1-c}$ be a subset of $\mathbb{R}$
such that $P(\xi \in S_{1-c}) = 1 - c$ for ξ ∼ N(0, 1). For any x, y ∈ R and S ⊆ R, we denote
$x + yS \equiv \{x + yz : z \in S\}$. We then set
$$CS^n_{1-c} = H^n_t + \Delta_n^{1/2}\sqrt{\kappa^\top\Sigma_n\kappa}\;S_{1-c}. \tag{11}$$
The idea behind the construction of $CS^n_{1-c}$ is straightforward. By the properties of stable
convergence, Corollary 2 and Assumption (V) imply
$$\frac{H^n_t - \sum_{s\le t}|\Delta J_s|^p}{\Delta_n^{1/2}\sqrt{\kappa^\top\Sigma_n\kappa}} \xrightarrow{\mathcal{L}\text{-}s,\eta_n} N(0,1) \tag{12}$$
for all $\eta_n$ satisfying $\Delta_n^{-r}\eta_n \to h \in [0,\infty]$. Importantly, the variable on the left-hand side of
the above display is asymptotically pivotal; in particular, the limiting distribution does not
depend on the strength of the jump signal, measured by the nuisance parameter h. Based
on (12), $CS^n_{1-c}$ is the natural choice of CS with nominal level 1 − c.
The main result of this section is the following theorem, which summarizes the asymptotic
properties of $CS^n_{1-c}$.
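For the common equal-tailed choice $S_{1-c} = [-z_{1-c/2}, z_{1-c/2}]$, the set in (11) is an interval. The sketch below computes it, taking the bias-corrected estimate, the variance estimate $\Sigma_n$ and the vector κ as given inputs; the function and argument names are hypothetical, and the construction of $\Sigma_n$ (Appendix B) is not reproduced here.

```python
import math
from statistics import NormalDist
import numpy as np

def robust_confidence_interval(h_hat, sigma_n, kappa, delta_n, c=0.05):
    """Equal-tailed version of (11): H +/- z * sqrt(Delta_n * kappa' Sigma_n kappa)."""
    kappa = np.asarray(kappa, dtype=float)
    sigma_n = np.asarray(sigma_n, dtype=float)
    quad_form = float(kappa @ sigma_n @ kappa)
    if quad_form < 0:
        raise ValueError("kappa' Sigma_n kappa must be nonnegative")
    z = NormalDist().inv_cdf(1.0 - c / 2.0)       # standard normal quantile
    half_width = z * math.sqrt(delta_n * quad_form)
    return h_hat - half_width, h_hat + half_width

# Toy inputs (purely illustrative numbers, not estimates from data).
lo, hi = robust_confidence_interval(h_hat=2.0e-6,
                                    sigma_n=[[4.0e-9, 1.0e-9], [1.0e-9, 9.0e-9]],
                                    kappa=[-1.0, 2.0], delta_n=1.0 / 23_400)
print(lo, hi)
```

Other choices of $S_{1-c}$, such as the half-lines used for one-sided intervals, only change the set of quantiles multiplied by the scale factor.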
Theorem 3 (Uniformity) Suppose that Assumptions (H-1) and (N) hold. Let p ≥ 4 be
an even integer and $\Sigma_n$ be a sequence of estimators satisfying Assumption (V). For every
c ∈ (0, 1), we have
$$\lim_{n\to\infty}\,\inf_{\eta\in[0,1]} P_\eta\left(\sum_{s\le t}|\Delta J_s|^p \in CS^n_{1-c}\right) = 1 - c. \tag{13}$$
Moreover, (13) still holds if we replace “inf” with “sup”.
Comment. Theorem 3 shows that $CS^n_{1-c}$ has asymptotically valid coverage uniformly
over the scaling parameter η and the CS is not conservative; the existence of the limit is
part of the result. In addition, the second part of this theorem shows that the CS is also
asymptotically similar with respect to η.
As an interesting special case of (11), we consider one-sided confidence intervals. With
$S_{0.5}$ being [0,∞) or (−∞, 0] in (11), the corresponding CS takes the form $[H^n_t, \infty)$ or
$(-\infty, H^n_t]$ respectively. Theorem 3 implies that
$$\lim_{n\to\infty}\,\inf_{\eta\in[0,1]} P_\eta\left(\sum_{s\le t}|\Delta J_s|^p \ge H^n_t\right) = 0.5,$$
$$\lim_{n\to\infty}\,\inf_{\eta\in[0,1]} P_\eta\left(\sum_{s\le t}|\Delta J_s|^p \le H^n_t\right) = 0.5,$$
and the same results hold if we replace “inf” by “sup”. In other words, $H^n_t$ is a uniformly
(relative to η) asymptotically median unbiased estimator of the jump power variation. In
addition to Corollary 1, this result gives an alternative sense of robustness to the bias-corrected
estimator $H^n_t$. In our opinion, this alternative argument is conceptually more
appealing because it relies on the uniformity principle, rather than some ad hoc asymptotic
embedding of the drifting sequence ηn.
4 Concluding remarks.
Motivated by the intuition that jumps in asset prices may be weakly identified due to the
presence of microstructure noise, we propose a local-to-continuity asymptotic theory for
the pre-averaging estimator. In a noisy setting, our theory unifies the standard asymp-
totic results for continuous and discontinuous Itô semimartingales as two polar cases with
a continuum of local asymptotics. The theory explains the higher-order bias in the standard
estimator and the size distortion of the standard CS. More importantly, the theory
is constructive for designing methods that are robust to possibly weakly identified jumps.
Simulation evidence in the web supplement of this paper supports the theoretical claim: the
robust method generally outperforms the standard method, and the findings are robust to
various jump behaviors, microstructure noise, bouncebacks, rounding, and moderate per-
turbations on tuning parameters.
Appendix A Assumptions.
In this appendix, we present the assumptions mentioned in the main text. Let
$(\Omega^{(0)}, \mathcal{F}^{(0)}, (\mathcal{F}^{(0)}_t)_{t\ge 0}, \mathbb{P}^{(0)})$ be a filtered probability space on which the random quantities
in (1) are defined. We assume for r = 1 or 2,
Assumption (H-r): (a) the process $(b_t)$ is optional and locally bounded;
(b) the process $(\sigma_t)$ is càdlàg and adapted;
(c) the function δ is predictable, and there is a bounded nonnegative measurable function
γ on $(E,\mathcal{E})$, such that $\int_E \gamma(z)^r\,\lambda(dz) < \infty$ and the process $\sup_{z\in E}\big(|\delta(\omega^{(0)},t,z)|\wedge 1\big)/\gamma(z)$
is locally bounded;
(d) we have almost surely $\int_0^t \sigma_s^2\,ds > 0$ for all t > 0.
Assumption (K): We have Assumption (H-2) and $\sigma_t$ is also an Itô semimartingale which
can be written as
\[
\sigma_t = \sigma_0 + \int_0^t \tilde b_s\,ds + \int_0^t \tilde\sigma_s\,dW_s + M_t + \sum_{s\le t}\Delta\sigma_s\,\mathbf{1}_{\{|\Delta\sigma_s|>v\}},
\]
where M is a local martingale orthogonal to W and with bounded jumps and $\langle M,M\rangle_t = \int_0^t a_s\,ds$, and the compensator of
$\sum_{s\le t}\mathbf{1}_{\{|\Delta\sigma_s|>v\}}$ is $\int_0^t a'_s\,ds$, and where $\tilde b_t$, $a_t$, $a'_t$ are optional
locally bounded processes, and the processes $b_t$ and $\tilde\sigma_t$ are optional and càglàd (left-continuous
with right limits).
Conditions (a) and (b) of Assumption (H-r) impose very mild measurability and sample-
path regularity. Condition (c) is more restrictive when r = 1, in which case the price jumps
have finite variation; when r = 2, this condition is quite mild because the jumps of every
semimartingale are square-summable. The stronger version of this assumption (i.e., r = 1) is
needed for the central limit theorem because it is more difficult to disentangle the continuous
part from “small” jumps when we consider higher-order asymptotics; similar restrictions
have been adopted by Jacod (2008) and Todorov and Tauchen (2012). Condition (d) is
needed to avoid degenerate limit theorems. Assumption (K) allows the volatility to be
stochastic with finitely or infinitely active jumps, and dependent on the price process in all
manners. Overall, these assumptions are fairly unrestrictive and satisfied by most models
in finance. Of course, they do exclude some examples such as fractional Brownian motion
or models without a continuous martingale part in price.
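We now turn to the noise. Following JPV, we formalize the noise model as follows. For
each t ≥ 0, we have a transition probability $Q_t(\omega^{(0)},dz)$ from $(\Omega^{(0)},\mathcal{F}^{(0)}_t)$ into $\mathbb{R}$. The space
$\Omega^{(1)} = \mathbb{R}^{[0,\infty)}$ is endowed with the product Borel σ-field $\mathcal{F}^{(1)}$ and the “canonical process”
$(\chi_t : t\ge 0)$ and the probability $Q(\omega^{(0)},d\omega^{(1)})$ which is the product $\otimes_{t\ge 0}\,Q_t(\omega^{(0)},\cdot)$. The
filtered probability space $(\Omega,\mathcal{F},(\mathcal{F}_t)_{t\ge 0},\mathbb{P})$ in the main text is then defined as follows:
\[
\Omega = \Omega^{(0)}\times\Omega^{(1)}, \qquad \mathcal{F} = \mathcal{F}^{(0)}\otimes\mathcal{F}^{(1)},
\]
\[
\mathcal{F}_t = \mathcal{F}^{(0)}_t\otimes\sigma(\chi_s : s\in[0,t)),
\]
\[
\mathbb{P}(d\omega^{(0)},d\omega^{(1)}) = \mathbb{P}^{(0)}(d\omega^{(0)})\,Q(\omega^{(0)},d\omega^{(1)}).
\]
Any variable or process which is defined on either $\Omega^{(0)}$ or $\Omega^{(1)}$ can be considered in the usual
way as a variable or a process on Ω.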
The assumption on the noise is the following:
Assumption (N): For each q > 0, there is a sequence of $(\mathcal{F}^{(0)}_t)$-stopping times $(T_{q,n})_{n\ge 1}$
increasing to ∞, such that $\int Q_t(\omega^{(0)},dz)\,|z|^q \le n$ whenever $t < T_{q,n}(\omega^{(0)})$. We write
\[
\beta(q)_t(\omega^{(0)}) = \int Q_t(\omega^{(0)},dz)\,z^q, \qquad \alpha_t = \sqrt{\beta(2)_t},
\]
and we assume that the processes α and β(3) are càdlàg, and that β(1) ≡ 0.
For the results in this paper, we actually only need the moments of χt to be finite up to a
certain order, where the “minimal” order needed varies with the power index p in a nontrivial
way. For the purpose of simplifying the presentation, we adopt the stronger assumption that
χt has finite moments for all orders. This is nevertheless a very mild restriction in financial
applications as the noise is typically bounded within a few tick sizes. As mentioned in the
main text, the key requirement in this assumption is β(1) ≡ 0 in conjunction with the
conditional independence of the noise at different times.
Appendix B Estimation of the asymptotic variance.
In this appendix, we construct an estimator Σn which verifies Assumption (V). We fix
p ≥ 4 and weight functions (gi)1≤i≤d in the background. The main result is Corollary 3.
To clarify the idea behind our construction, we also provide auxiliary results (Theorems 4
and 5) which discuss estimators for the continuous component ΣC and the jump component
ΣJ separately. Estimators proposed here are similar to those proposed by AJL. The key
modification is on the estimation of ΣJ as we need to correct higher-order biases arising
from both the continuous part and the noise part under the local-to-continuity asymptotics;
such modification is necessary due to the strong requirement in Assumption (V). Technically
speaking, we derive the convergence of these estimators under the local-to-continuity setting,
which is beyond the scope of AJL.
We start with the estimation of $\Sigma_C$. We choose a sequence of truncation levels $u_n$ as
follows:
\[
u_n = \alpha\,\Delta_n^{\varpi}, \quad\text{where } \alpha > 0, \qquad \frac{p-1}{2(2p-1)} < \varpi < \frac{1}{4}.
\]
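As a small illustration of the constraint on the truncation exponent, the following Python sketch computes the admissible interval for ϖ and the resulting truncation level. The function name `truncation_level` and the default choice of the interval midpoint are assumptions made for this example only; the text prescribes no particular value inside the interval.

```python
from typing import Optional


def truncation_level(delta_n: float, p: float, alpha: float = 1.0,
                     varpi: Optional[float] = None) -> float:
    """Return u_n = alpha * delta_n ** varpi, checking the admissible range.

    The exponent must satisfy (p - 1) / (2 (2p - 1)) < varpi < 1/4.
    When varpi is None, the midpoint of that interval is used; this
    default is purely illustrative.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    lower = (p - 1) / (2 * (2 * p - 1))
    upper = 0.25
    if varpi is None:
        varpi = 0.5 * (lower + upper)
    if not lower < varpi < upper:
        raise ValueError(f"varpi must lie strictly between {lower} and {upper}")
    return alpha * delta_n ** varpi
```

For p = 4 the lower bound is 3/14, so the admissible interval is already narrow, and it shrinks toward 1/4 as p grows.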
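We then complete the notation (4) with a truncated version: for any weight function φ,
\[
V^*(Y,\phi,q,l)^n_t = \sum_{i=0}^{\lfloor t/\Delta_n\rfloor - k_n} \big|\bar Y(\phi)^n_i\big|^q\,\mathbf{1}_{\{|\bar Y(\phi)^n_i|\le u_n\}}\,\big|\hat Y(\phi)^n_i\big|^l.
\]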
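The truncated statistic is a plain thresholded sum, which the following Python sketch makes explicit. It assumes that the two pre-averaged sequences entering the q-th and l-th powers have already been computed and are passed in as arrays `ybar` and `yhat` over the index range of the sum; these names, and the function name `truncated_power_sum`, are hypothetical.

```python
import numpy as np


def truncated_power_sum(ybar, yhat, q: float, l: float, u_n: float) -> float:
    """Sum of |ybar_i|^q * |yhat_i|^l over indices with |ybar_i| <= u_n.

    ybar, yhat: equal-length arrays of pre-averaged quantities (assumed
    to be computed elsewhere); u_n: the truncation level.
    """
    ybar = np.asarray(ybar, dtype=float)
    yhat = np.asarray(yhat, dtype=float)
    if ybar.shape != yhat.shape:
        raise ValueError("ybar and yhat must have the same shape")
    keep = np.abs(ybar) <= u_n  # discard windows that likely contain a jump
    return float(np.sum(np.abs(ybar[keep]) ** q * np.abs(yhat[keep]) ** l))
```

Terms whose pre-averaged return exceeds the threshold are dropped entirely, so that the sum is driven by the continuous part and the noise.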
We also need to define a number of constants associated with the weight functions. We set
for any functions g and h and any integers w ≥ 1 and $w' \in \{0,\cdots,2w\}$:
\[
a(g,h)_t = \int_{1\vee t}^{1+1\wedge t} g(u-1)\,h(u-t)\,du,
\]
\[
a'(g,h;w,w')_t = \sum_{r=0}^{\lfloor w'/2\rfloor} C_{w'}^{2r}\,m_{2r}\,m_{2w-2r}\;a(g,g)_1^{\,w-w'}\;a(g,h)_t^{\,w'-2r}\,\big(a(g,g)_1\,a(h,h)_1 - a(g,h)_t^2\big)^r.
\]
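To make these constants concrete, the sketch below evaluates a(g, h)_t by a midpoint rule and a'(g, h; w, w')_t by the finite sum. Several ingredients are assumptions for illustration: the tent weight g(x) = x ∧ (1 − x) on [0, 1] (zero elsewhere), the reading of C as a binomial coefficient, and the reading of m_{2r} as the even moment (2r − 1)!! of a standard normal variable.

```python
from math import comb

import numpy as np


def tent(x):
    """Illustrative weight function: min(x, 1 - x) on [0, 1], zero elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((x >= 0.0) & (x <= 1.0), np.minimum(x, 1.0 - x), 0.0)


def a(g, h, t: float, n_grid: int = 20000) -> float:
    """Midpoint-rule value of the integral of g(u-1) h(u-t) over [1 v t, 1 + 1 ^ t]."""
    lo, hi = max(1.0, t), 1.0 + min(1.0, t)
    if hi <= lo:
        return 0.0
    step = (hi - lo) / n_grid
    u = lo + (np.arange(n_grid) + 0.5) * step
    return float(np.sum(g(u - 1.0) * h(u - t)) * step)


def m_even(k: int) -> int:
    """(k - 1)!! for even k >= 0: the k-th moment of N(0, 1) (assumed meaning of m_k)."""
    out = 1
    for j in range(1, k, 2):
        out *= j
    return out


def a_prime(g, h, w: int, w_prime: int, t: float) -> float:
    """Finite sum defining a'(g, h; w, w')_t, for 0 <= w_prime <= 2 w."""
    a_gg, a_hh, a_gh = a(g, g, 1.0), a(h, h, 1.0), a(g, h, t)
    total = 0.0
    for r in range(w_prime // 2 + 1):
        total += (comb(w_prime, 2 * r) * m_even(2 * r) * m_even(2 * w - 2 * r)
                  * a_gg ** (w - w_prime) * a_gh ** (w_prime - 2 * r)
                  * (a_gg * a_hh - a_gh ** 2) ** r)
    return total
```

With the tent weight, a(g, g)_1 equals 1/12. Under the stated reading of m, one can check that a'(g, h; 1, 2)_t reduces to a(h, h)_1 and a'(g, h; w, 0)_t to m_{2w} a(g, g)_1^w, which is a convenient sanity check on an implementation.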
Below, g and h may take values as weight functions or their derivatives. We then write for
$w \in \mathbb{N}$ and generic weight functions g and h,
\[
A(g,h;w)_t = \sum_{\substack{l,l'\in\{0,\cdots,p/2\}\\ l+l'\le p-w}}\;\sum_{w'=(2w-p+2l)^+}^{(2w)\wedge(p-2l')} \rho(p)_l\,\rho(p)_{l'}\,C_{p-2l}^{2w-w'}\,C_{p-2l'}^{w'}\,\big(2\bar{g'}(2)\big)^l\big(2\bar{h'}(2)\big)^{l'}\,a'(g,h;w,w')_t\;a'(g',h';p-l-l'-w,\,p-2l'-w')_t,
\]
\[
A'(g,h;w) = \int_0^2 A(g,h;w)_t\,dt - 2\,m_p^2\,\bar g(2)^{p/2}\,\bar h(2)^{p/2}\,\mathbf{1}_{\{w=p\}}.
\]
We choose any weight function φ and associate with it a sequence of statistics $\Sigma^n_C$, taking
values in the space of d × d matrices, with entries
\[
\Sigma^{n,ij}_C = \Delta_n^{-1/2}\sum_{w=0}^{p}\frac{\theta\,A'(g_i,g_j;w)}{m_{2w}\,2^{p-w}\,\bar\phi(2)^w\,\bar{\phi'}(2)^{p-w}}\sum_{l=0}^{w}\rho(2w)_l\,V^*(Z,\phi,2w-2l,\,p+l-w)^n_t. \tag{B.1}
\]
The asymptotic property of ΣnC is described in the following theorem.
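Theorem 4 Suppose that Assumptions (H-1) and (N) hold. Let p ≥ 4 and $(\eta_n)_{n\ge 1}\subset[0,1]$
be a sequence such that $\Delta_n^{-r}\eta_n \to h$ for some $h\in[0,\infty]$. Then
\[
a_n^2\,\Sigma^n_C \;\xrightarrow{\;\mathbb{P},\,\eta_n\;}\; \frac{1}{(1+h^{p-1})^2}\,\Sigma_C.
\]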
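We now turn to the estimation of $\Sigma_J$. Setting
\[
N(0,-)_t = \theta\sum_{s\le t}|\Delta X_s|^{2p-2}\sigma_{s-}^2, \qquad N(0,+)_t = \theta\sum_{s\le t}|\Delta X_s|^{2p-2}\sigma_s^2,
\]
\[
N(1,-)_t = \frac{1}{\theta}\sum_{s\le t}|\Delta X_s|^{2p-2}\alpha_{s-}^2, \qquad N(1,+)_t = \frac{1}{\theta}\sum_{s\le t}|\Delta X_s|^{2p-2}\alpha_s^2,
\]
we can rewrite
\[
\Sigma_J = \theta^2 p^2\big(\Psi_-\,N(0,-)_t + \Psi_+\,N(0,+)_t + \Psi'_-\,N(1,-)_t + \Psi'_+\,N(1,+)_t\big). \tag{B.2}
\]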
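To construct estimators for $N(m,\pm)_t$, m = 0 or 1, we choose another sequence $k'_n$ of integers
satisfying
\[
k'_n/k_n \to \infty, \qquad k'_n\Delta_n \to 0.
\]
For any weight functions φ and ψ, and any process Y, we consider the variables
\[
\xi(Y,\phi,0)^n_i = \frac{1}{\bar\phi(2)\,k_n k'_n\Delta_n}\sum_{j=1}^{k'_n}\Big(\big(\bar Y(\phi)^n_{i+j}\big)^2 - \tfrac{1}{2}\hat Y(\phi)^n_{i+j}\Big)\mathbf{1}_{\{|\bar Y(\phi)^n_{i+j}|\le u_n\}},
\]
\[
\xi(Y,\phi,1)^n_i = \frac{1}{2\bar{\phi'}(2)\,k_n k'_n\Delta_n}\sum_{j=1}^{k'_n}\hat Y(\phi)^n_{i+j}\,\mathbf{1}_{\{|\bar Y(\phi)^n_{i+j}|\le u_n\}},
\]
\[
v(Y,\psi)^n_i = \sum_{l=0}^{p-1}\rho(2p-2)_l\,\big|\bar Y(\psi)^n_i\big|^{2p-2-2l}\,\big|\hat Y(\psi)^n_i\big|^l.
\]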
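We define four processes as follows: for m = 0 or 1,
\[
N(Y,\phi,\psi,m,-)^n_t = \sum_{i=k_n+k'_n}^{\lfloor t/\Delta_n\rfloor-k_n} v(Y,\psi)^n_i\;\xi(Y,\phi,m)^n_{i-k_n-k'_n},
\]
\[
N(Y,\phi,\psi,m,+)^n_t = \sum_{i=0}^{\lfloor t/\Delta_n\rfloor-2k_n-k'_n+1} v(Y,\psi)^n_i\;\xi(Y,\phi,m)^n_{i+k_n-1}.
\]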
To further the discussion, we now describe the asymptotic behaviors of
$N(Z,\phi,\psi,m,\pm)^n_t$. To this end, we denote
\[
Q(0)_t = m_{2p-2}\,\theta^{p-1}\int_0^t \sigma_s^{2p}\,ds, \qquad
Q(1)_t = m_{2p-2}\,\theta^{p-1}\int_0^t \sigma_s^{2p-2}\alpha_s^2\,ds.
\]
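Theorem 5 Suppose that Assumptions (H-1) and (N) hold. Let p ≥ 4 and $(\eta_n)_{n\ge 1}\subset[0,1]$
be a sequence such that $\Delta_n^{-r}\eta_n \to h$ for some $h\in[0,\infty]$. Then for m = 0, 1, we have
\[
a_n^2\,N(Z,\phi,\psi,m,\pm)^n_t \;\xrightarrow{\;\mathbb{P},\,\eta_n\;}\; \Big(\frac{1}{1+h^{p-1}}\Big)^2\bar\psi(2)^{p-1}\,Q(m)_t + \Big(\frac{h^{p-1}}{1+h^{p-1}}\Big)^2\bar\psi(2p-2)\,N(m,\pm)_t.
\]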
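The two squared factors in this limit interpolate between the polar cases, which a few lines of Python make easy to inspect. The helper name `limit_weights` is hypothetical; the code only evaluates the two factors as functions of h and p.

```python
def limit_weights(h: float, p: float) -> tuple:
    """Return ((1/(1+x))^2, (x/(1+x))^2) with x = h^(p-1), for h in [0, inf].

    The first factor multiplies the continuous-part term Q(m)_t and the
    second the jump term N(m, +/-)_t.
    """
    if h < 0:
        raise ValueError("h must be nonnegative")
    if h <= 1.0:
        x = h ** (p - 1)
        share = x / (1.0 + x)
    else:
        # Work with the reciprocal power so that large or infinite h
        # underflows gracefully to zero instead of overflowing.
        share = 1.0 / (1.0 + h ** (-(p - 1)))
    return (1.0 - share) ** 2, share ** 2
```

At h = 0 all of the weight falls on the term involving Q(m)_t, at h = ∞ all of it falls on N(m,±)_t, and at h = 1 both factors equal 1/4.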
Theorem 5 shows that, when properly normalized, $N(Z,\phi,\psi,m,\pm)^n_t$ converges to
$N(m,\pm)_t$ plus a bias term, i.e., the term involving $Q(m)_t$. The bias is contributed by
the continuous part of the underlying price process. Again, we correct the bias by using
multiple weight functions. When p ≥ 4, we can pick two weight functions $\psi_1$ and $\psi_2$ and
real numbers $\lambda_1$ and $\lambda_2$ such that
\[
\sum_{j=1}^{2}\lambda_j\,\bar\psi_j(2)^{p-1} = 0, \qquad \sum_{j=1}^{2}\lambda_j\,\bar\psi_j(2p-2) = 1, \tag{B.3}
\]
and set $N(m,\pm)^n_t = \sum_{j=1}^{2}\lambda_j\,N(Z,\phi,\psi_j,m,\pm)^n_t$. By Theorem 5 and (B.3),
\[
a_n^2\,N(m,\pm)^n_t \;\xrightarrow{\;\mathbb{P},\,\eta_n\;}\; \Big(\frac{h^{p-1}}{1+h^{p-1}}\Big)^2 N(m,\pm)_t.
\]
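Condition (B.3) is a two-by-two linear system in (λ1, λ2), so the weights can be obtained numerically once two weight functions are fixed. The sketch below assumes that the barred quantity ψ̄(q) denotes the integral of |ψ(s)|^q over [0, 1], and it uses two illustrative weight functions (a tent and a parabola) that are not taken from the text.

```python
import numpy as np


def bar_moment(psi, q: float, n_grid: int = 20000) -> float:
    """Midpoint-rule value of the integral of |psi(s)|^q over [0, 1]."""
    s = (np.arange(n_grid) + 0.5) / n_grid
    return float(np.mean(np.abs(psi(s)) ** q))


def bias_correcting_weights(psi1, psi2, p: int) -> np.ndarray:
    """Solve (B.3): the first row annihilates the bias term, the second
    normalizes the coefficient of the jump term to one."""
    system = np.array([
        [bar_moment(psi1, 2) ** (p - 1), bar_moment(psi2, 2) ** (p - 1)],
        [bar_moment(psi1, 2 * p - 2), bar_moment(psi2, 2 * p - 2)],
    ])
    return np.linalg.solve(system, np.array([0.0, 1.0]))


def tent(s):
    """Illustrative weight function min(s, 1 - s) on [0, 1]."""
    return np.minimum(s, 1.0 - s)


def parabola(s):
    """Illustrative weight function s (1 - s) on [0, 1]."""
    return s * (1.0 - s)


lam = bias_correcting_weights(tent, parabola, p=4)
```

If the two weight functions make the system singular, `np.linalg.solve` raises `LinAlgError`, which signals that a different pair should be chosen. The solved weights necessarily have opposite signs, since the first equation forces the two bias contributions to cancel.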
Finally, in view of (B.2), we set
\[
\Sigma^n_J = \theta^2 p^2\big(\Psi_-\,N(0,-)^n_t + \Psi_+\,N(0,+)^n_t + \Psi'_-\,N(1,-)^n_t + \Psi'_+\,N(1,+)^n_t\big). \tag{B.4}
\]
The following corollary summarizes the results above. Its proof is elementary and thus
omitted.
Corollary 3 Suppose that Assumptions (H-1) and (N) hold. Let p ≥ 4 and $\Sigma^n = \Sigma^n_C + \Sigma^n_J$,
where $\Sigma^n_C$ and $\Sigma^n_J$ are given by (B.1) and (B.4), respectively. Then $\Sigma^n$ satisfies Assumption
(V).
Comment. Corollary 3 shows that imposing Assumption (V) in Theorem 3 is free, as
it is implied by Assumptions (H-1) and (N).
References
Aït-Sahalia, Y., 2004. Disentangling diffusion from jumps. Journal of Financial Economics
74, 487-528.
Aït-Sahalia, Y., Jacod, J., 2011. Analyzing the spectrum of asset returns: Jump and volatility
components in high frequency data. Journal of Economic Literature, Forthcoming.
Aït-Sahalia, Y., Jacod, J., Li, J., 2012. Testing for jumps in noisy high frequency data.
Journal of Econometrics 168, 207-222.
Aït-Sahalia, Y., Mykland, P. A., Zhang, L., 2011. Ultra high frequency volatility estimation
with dependent microstructure noise. Journal of Econometrics 160, 190-203.
Andersen, T. G., Bollerslev, T., Frederiksen, P. H., Nielsen, M. Ø., 2006. Comment on
realized variance and market microstructure noise. Journal of Business and Economic
Statistics 24, 173-179.
Andrews, D. W. K., Cheng, X., Forthcoming. Estimation and inference with weak,
semi-strong, and strong identification. Econometrica.
Andrews, D. W. K., Guggenberger, P., 2009. Validity of subsampling and “plug-in
asymptotic” inference for parameters defined by moment inequalities. Econometric Theory 25,
669-709.
Bandi, F. M., Russell, J. R., 2006. Separating microstructure noise from volatility. Journal
of Financial Economics 79, 655-692.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2008. Designing realized
kernels to measure ex-post variation of equity prices in the presence of noise. Econometrica
76, 1481-1536.
Barndorff-Nielsen, O. E., Hansen, P. R., Lunde, A., Shephard, N., 2011. Multivariate realised
kernels: Consistent positive semi-definite estimators of the covariation of equity prices
with noise and non-synchronous trading. Journal of Econometrics 162, 149-169.
Bollerslev, T., Todorov, V., 2011. Estimation of jump tails. Econometrica 79, 1727-1783.
Hansen, P. R., Lunde, A., 2006. Realized variance and market microstructure noise. Journal
of Business and Economic Statistics 24, 127-161.
Jacod, J., 2008. Asymptotic properties of realized power variations and related functionals
of semimartingales. Stochastic Processes and their Applications 118, 517-559.
Jacod, J., Li, Y., Mykland, P. A., Podolskij, M., Vetter, M., 2009. Microstructure noise in
the continuous case: The pre-averaging approach. Stochastic Processes and their
Applications 119, 2249-2276.
Jacod, J., Podolskij, M., Vetter, M., 2010. Limit theorems for moving averages of discretized
processes plus noise. Annals of Statistics 38, 1478-1545.
Jacod, J., Shiryaev, A. N., 2003. Limit Theorems for Stochastic Processes, 2nd Edition.
Springer-Verlag, New York.
Kalnina, I., Linton, O., 2008. Estimating quadratic variation consistently in the presence of
endogenous and diurnal measurement error. Journal of Econometrics 147, 47-59.
Li, Y., Mykland, P. A., 2007. Are volatility estimators robust with respect to modeling
assumptions? Bernoulli 13, 601-622.
Mikusheva, A., 2007. Uniform inference in autoregressive models. Econometrica 75,
1411-1452.
Phillips, P. C. B., 1987. Towards a unified asymptotic theory for autoregression. Biometrika
74, 535-547.
Staiger, D., Stock, J. H., 1997. Instrumental variables regression with weak instruments.
Econometrica 65, 557-586.
Stock, J. H., Wright, J. H., 2000. GMM with weak identification. Econometrica 68,
1055-1096.
Todorov, V., Bollerslev, T., 2010. Jumps and betas: A new framework for disentangling and
estimating systematic risks. Journal of Econometrics 157, 220-235.
Todorov, V., Tauchen, G., 2012. The realized Laplace transform of volatility. Econometrica
80, 1105-1127.
Zhang, L., 2006. Efficient estimation of stochastic volatility using noisy observations: A
multi-scale approach. Bernoulli 12, 1019-1043.
Zhang, L., Mykland, P. A., Aït-Sahalia, Y., 2005. A tale of two time scales: Determining
integrated volatility with noisy high-frequency data. Journal of the American Statistical
Association 100, 1394-1411.
Zhou, B., 1996. High-frequency data and volatility in foreign-exchange rates. Journal of
Business & Economic Statistics 14, 45-52.