Measurement Errors in Quantile Regression Models∗
Sergio Firpo† Antonio F. Galvao‡ Suyong Song§
November 8, 2016
Abstract
This paper develops estimation and inference for quantile regression models with measurement errors. We propose an easily implementable semiparametric two-step estimator when repeated measures for the covariates are available. Building on recent theory on Z-estimation with infinite-dimensional parameters, we establish consistency and asymptotic normality of the proposed estimator. We also develop statistical inference procedures and show the validity of a bootstrap approach to implement the methods in practice. Monte Carlo simulations assess the finite-sample performance of the proposed methods. We apply the methods to the investment equation model using firm-level data with repeated measures of investment demand, Tobin's q. We document strong heterogeneity in the sensitivity of investment to Tobin's q and cash flow across the conditional distribution of investment. The cash flow sensitivity is relatively larger at the lower part of the distribution, providing evidence that these firms are more exposed to and dependent on fluctuations in internal finance.
Key Words: Quantile regression; measurement errors; investment equation
JEL Classification: C14, C23, G31
∗The authors would like to express their appreciation to Stephane Bonhomme, Tim Conley, Silvia Goncalves, Stefan Hoderlein, Roger Koenker, Arthur Lewbel, Salvador Navarro, Yuya Sasaki, Susanne Schennach, Liang Wang, Zhijie Xiao, and seminar participants at Boston College, Syracuse University, University of Western Ontario, 2015 Midwest Econometrics Group, 2015 CMStatistics, 2016 Latin American Workshop in Econometrics of the Econometric Society, and 2016 North America Summer Meeting of the Econometric Society for helpful comments and discussions. All remaining errors are ours.
†Insper, Rua Quata 300, Sao Paulo, SP 04546-042.
‡Department of Economics, University of Iowa, W284 Pappajohn Business Building, 21 E. Market Street, Iowa City, IA 52242.
§Department of Economics, University of Iowa, W360 Pappajohn Business Building, 21 E. Market Street, Iowa City, IA 52242.
1 Introduction
Quantile regression (QR) models have provided a valuable tool in economics and statistics as
a way of capturing heterogeneous effects of covariates on the outcome of interest, exposing
a wide variety of forms of conditional heterogeneity under weak distributional assumptions.
Under some assumptions on the unobservable factors, QR can also be interpreted as providing
a structural relationship between the outcome of interest and its observable and unobservable
determinants. Also importantly, QR provides a framework for robust inference.
Measurement errors (ME) have important implications for the reliability of standard estimation and testing procedures. Variables used in empirical analysis are frequently measured with error, particularly if information is collected through one-time retrospective surveys, which are notoriously susceptible to recall errors. Recently, ME in covariates has received considerable attention in the QR literature. The standard QR estimator is biased in the presence of ME (see, e.g., He and Liang (2000)). Chesher (2001) studies the impact of covariate ME on quantile functions using a small-variance approximation argument. Schennach (2008) investigates identification of a nonparametric quantile function when an instrumental variable is measured on all sampling units. Wei and Carroll (2009) develop a QR method that corrects the ME bias by constructing joint estimating equations that hold simultaneously for all quantile levels. The method relies on an iterative algorithm and requires a parametric pre-specification of a conditional density to obtain a consistent estimator of the parameters of interest. Hu and Schennach (2008) establish that the availability of instruments enables the identification of nonclassical errors-in-variables models; the resulting identification induces an estimating equation that encompasses QR methods. Schennach (2014) studies a model defined by moment conditions with unobservables, which covers QR with ME. The parameters of interest are obtained by averaging the moment functions over the unobservables using a least favorable distribution obtained through an entropy maximization procedure.1,2
Thus, in the analysis of QR with mismeasured covariates, it has been common to employ
estimation methods that either impose parametric restrictions on nuisance functionals or use
1Notice that the conditions used in this paper for identification of the conditional density in the first stage are different from, and non-nested with, those in Hu and Schennach (2008) and Schennach (2014).
2Recently, Torres-Saavedra (2013) and Hausman, Luo, and Palmer (2014) study ME in the dependent variable of QR models. We refer to Ma and Yin (2011), Wang, Stefanski, and Zhu (2012), and Wu, Ma, and Yin (2015) for other recent developments in QR models with ME.
exogenous information, such as that provided by instrumental variables (see, e.g., Wei and Carroll (2009), Schennach (2008), and Chernozhukov and Hansen (2006)). However, methods relying on parametric assumptions are very sensitive to misspecification of those assumptions in practice. In addition, finding exogenous instrumental variables is known to be a nontrivial task in most empirical applications.
This paper contributes to both the QR and ME branches of the literature by developing estimation and inference methods for QR models in the presence of ME in the covariates. This is achieved by exploiting repeated measures of the true regressor.3 The main contributions are the following. First, we propose a simple and easily implementable two-step semiparametric estimation procedure for QR models that preserves the distribution-free and heteroscedastic features of the semiparametric model. The first step estimates the conditional density function nonparametrically. The second step uses the estimated densities as weights in a weighted QR estimation. For the first step, we propose a nonparametric estimator of the conditional density that imposes no distributional assumptions on the ME. Specifically, we show that two mismeasured covariates are sufficient to identify the conditional density of interest in the presence of ME. In turn, this result guarantees consistent estimation of the structural parameters of interest.
In the second main contribution we establish the asymptotic properties of the two-step
estimator, assuming that the conditional densities satisfy smoothness conditions and can
be estimated at an appropriate nonparametric rate. The third contribution is to develop
practical statistical inference and testing procedures for general linear hypotheses based on
the Wald statistic. To implement these tests in practice the critical values are computed
using a bootstrap method. We provide sufficient conditions under which the bootstrap is
theoretically valid, and discuss an algorithm for its practical implementation. Our method
leads to a simple algorithm that can be conveniently implemented in empirical applications.
Compared to existing procedures for QR models with ME, our approach has several distinctive advantages. First, the method employs a nonparametric estimator in the first step and does not assume global linearity at all quantile levels for the estimation of the conditional density function. This feature makes the procedure applicable to any τ-quantile
3Identification and estimation of conditional average regression models and conditional densities with repeated measures of the true regressor have been studied in Li and Vuong (1998), Li (2002), Schennach (2004a), Delaigle, Hall, and Meister (2008), and Hu and Sasaki (2015a), among others. We extend this literature to estimation and inference for QR models using repeated measures of the true regressor.
of interest, thus relaxing the requirement of joint estimation and providing more flexibility. In contrast, Wei and Carroll (2009) require a parametric pre-specification of a conditional density to implement their estimator. Second, our algorithm is computationally simple because estimating the weights does not require recursive algorithms: the weights for all observations are obtained in a single step. As a result, the second-step quantile estimation amounts to minimizing a single convex objective function at the quantile of interest. By contrast, Wei and Carroll (2009) use an iterative algorithm and require the estimating equations to be solved jointly for all quantiles, which increases the dimensionality of the problem substantially. Finally, the estimated weights are uniformly consistent, which makes it feasible to establish both consistency and asymptotic normality of the resulting estimators of the parameters of interest. Hence, the method admits standard inference and testing procedures.
Monte Carlo simulations assess the finite-sample properties of the proposed methods.
We evaluate the estimator in terms of empirical bias, standard deviation, and mean squared
error, and compare its performance with methods that are not designed for dealing with
ME issues. The experiments suggest that the proposed approach performs relatively well in
finite samples and effectively removes the bias induced by ME.
Our procedure should be useful in empirical settings based on QR models in which ME in the independent variables is a concern. To motivate and illustrate their applicability, we apply the developed methods to the investment equation model of Fazzari, Hubbard, and Petersen (1988), where a firm's investment is regressed on a proxy for investment demand (Tobin's q) and cash flow. This is a well-known model in the corporate investment literature.4 The QR approach is useful in this example because it allows us to capture heterogeneity in the sensitivities of investment to Tobin's q and cash flow along the conditional distribution of investment. Concerns about measurement errors have been emphasized in the context of the empirical investment model. Theory suggests that a firm's investment demand is captured by marginal q, but this quantity is unobservable, and researchers instead use its measurable proxy, average q. Because average q measures marginal q imperfectly, a measurement problem naturally arises (see, e.g., Hayashi (1982), Poterba (1988), Erickson and Whited
4Following Fazzari, Hubbard, and Petersen (1988), investment-cash flow sensitivities became a standard metric in the literature that examines the impact of financing imperfections on corporate investment (Stein (2003)).
(2000), and Almeida, Campello, and Galvao (2010)). Within that framework, finding valid
and strong instrumental variables to solve the endogeneity problem is not, in general, an
easy task.
In the empirical example we use a data set taken from Cummins, Hassett, and Oliner (2006), which contains two measures of average q. The first measure follows prior research and is constructed using standard equity prices. The second proxy for the firm's intrinsic value is based on analysts' earnings expectations. Thus, our method offers a natural solution to the ME problem when repeated measures of Tobin's q are available. The results document substantial heterogeneity in the sensitivity of investment to Tobin's q and cash flow across the conditional distribution of investment. In particular, cash flow sensitivity is larger at the lower part of the investment distribution, suggesting that these firms are more exposed to and dependent on fluctuations in internal finance. Our empirical findings support the idea that the proposed methods are a useful alternative to existing approaches in economic applications in which ME is an important concern.
The rest of the paper is organized as follows. Section 2 presents the model and discusses identification of the parameters of interest in the presence of ME. Section 3 proposes the two-step QR estimator. Section 4 establishes the asymptotic properties of the estimator. Inference is discussed in Section 5. Section 6 presents the Monte Carlo experiments. In Section 7, we illustrate the empirical usefulness of the new approach with an application to the investment equation model. Finally, Section 8 concludes the paper.
2 Model and identification
2.1 Model
We first introduce the model studied in this paper. Given a quantile τ ∈ (0, 1), we define
the following quantile regression (QR) model,
$$ Y_i = X_i\beta_0(\tau) + Z_i^\top\delta_0(\tau) + \varepsilon_i(\tau), \qquad (1) $$
where Yi is the scalar dependent variable of interest, Xi is a potentially mismeasured scalar continuous covariate,5 Zi is an L-vector of correctly observed covariates, and εi(τ) is the innovation term whose τ-th quantile is zero conditional on (Xi, Zi). The structural parameters of interest are θ0(τ) = (β0(τ), δ0(τ)⊤)⊤. In general, both β0(τ) and δ0(τ) depend on τ, but we treat τ as fixed throughout the paper and suppress this dependence for notational simplicity.
5Below we describe the extension to the multi-dimensional case.
Suppose (Yi, Xi, Zi) are i.i.d. random variables defined on a complete probability space
(Ω,F, P ). Denote the support of a random variable by supp(·). Define the population
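objective function for the τ-th conditional quantile as
$$ Q(\beta_0,\delta_0) := E\left[\psi_\tau\left(Y_i - X_i\beta_0 - Z_i^\top\delta_0\right)\begin{bmatrix} X_i \\ Z_i \end{bmatrix}\right] = 0, \qquad (2) $$
where ψτ(u) := τ − I{u < 0} and I{·} is the indicator function. When the true covariates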
(X, Z) are observed, β0 and δ0 in (1) can be consistently estimated by standard quantile regression, based on the sample analog of Q(β, δ) in (2),
$$ Q_n(\beta,\delta) := \frac{1}{n}\sum_{i=1}^{n}\psi_\tau\left(Y_i - X_i\beta - Z_i^\top\delta\right)\begin{bmatrix} X_i \\ Z_i \end{bmatrix} = 0. \qquad (3) $$
Because of the indicator function in equation (3), an exact zero may not exist. It is usual to write the estimator as a minimization problem and to solve it by linear programming. Writing (3) as an exact equality is therefore a slight abuse of notation; since every other expression below that involves observed data is an estimating equation that does have a zero, we nonetheless use the estimating-equation nomenclature. For more details on Z-estimators with non-smooth objective functions, see, e.g., He and Shao (1996, 2000).
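To make the linear-programming formulation concrete, the following is a minimal Python sketch (our illustration, not the authors' code). It assumes NumPy and SciPy are installed and uses `scipy.optimize.linprog` with the HiGHS solver; the helper names `check_loss` and `quantile_regression` are hypothetical. Each residual is split into nonnegative parts u⁺ and u⁻, so that minimizing Σi wi(τu⁺i + (1 − τ)u⁻i) reproduces the check-function objective whose subgradient is the score in (3).

```python
import numpy as np
from scipy.optimize import linprog


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1{u < 0}); psi_tau is its derivative."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0).astype(float))


def quantile_regression(W, y, tau, weights=None):
    """Minimize sum_i w_i * rho_tau(y_i - W_i @ theta) as a linear program.

    Decision vector: [theta (free), u_plus (>= 0), u_minus (>= 0)], subject to
    W @ theta + u_plus - u_minus = y, so that u_plus - u_minus is the residual.
    """
    W = np.asarray(W, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = W.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    c = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    A_eq = np.hstack([W, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


# Usage: median regression on simulated data with true theta = (1, 2).
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
W = np.column_stack([np.ones(n), x])
theta_hat = quantile_regression(W, y, tau=0.5)

# The sample score (3) at the solution is not exactly zero, but it is of order 1/n.
score = (0.5 - (y - W @ theta_hat < 0)) @ W / n
print(theta_hat, score)
```

The printed score illustrates the abuse of notation discussed above: at the LP solution only the few observations with zero residual prevent an exact zero.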
2.2 Measurement error bias and its solution
Under the assumption of perfectly measured regressors, the solution of (3) can be shown to produce consistent estimates of (β0, δ0). In practice, however, researchers often observe only a version of the regressor X that is measured with error. Using a mismeasured X in the standard QR estimation in (3) biases the estimates of the coefficients of interest (see, e.g., He and Liang, 2000), so that standard QR estimates under measurement errors (ME) are inconsistent. To overcome this drawback, we propose a methodology that makes use of repeated measures: two observed variables, both of which are mismeasured versions of the true covariate.
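The bias from using a mismeasured regressor in (3) can be seen in a small simulation. In the jointly Gaussian design assumed below (our illustration), the conditional median of Y given X1 is linear with slope λβ0, where λ = Var(X)/(Var(X) + Var(U1)) is the reliability ratio, so median regression on X1 is attenuated toward zero. The sketch assumes NumPy and SciPy (`linprog` with HiGHS); `median_regression` is a hypothetical helper.

```python
import numpy as np
from scipy.optimize import linprog


def median_regression(W, y):
    """Least absolute deviations (tau = 0.5) fit via a linear program."""
    n, p = W.shape
    c = np.concatenate([np.zeros(p), 0.5 * np.ones(2 * n)])
    A_eq = np.hstack([W, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


rng = np.random.default_rng(1)
n, beta0 = 600, 2.0
x = rng.normal(size=n)                     # true covariate, Var(X) = 1
y = 1.0 + beta0 * x + rng.normal(size=n)   # median of Y given X is 1 + 2X
x1 = x + rng.normal(size=n)                # error-laden measure, Var(U1) = 1

ones = np.ones(n)
slope_true = median_regression(np.column_stack([ones, x]), y)[1]
slope_naive = median_regression(np.column_stack([ones, x1]), y)[1]
reliability = 1.0 / (1.0 + 1.0)            # Var(X) / (Var(X) + Var(U1))
print(slope_true, slope_naive, reliability * beta0)
```

With the true X the slope is close to 2, while with X1 it is close to λβ0 = 1, which is the inconsistency the repeated-measures approach is designed to remove.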
Suppose that the true covariate X is unobservable due to ME. Instead, a researcher observes two error-laden measurements of X, defined as
$$ X_{1i} = X_i + U_{1i}, \qquad X_{2i} = X_i + U_{2i}, $$
where U1i and U2i are the ME. Therefore, the observed random variables are (Yi, X1i, X2i, Zi), and one seeks to estimate the parameters (β0, δ0).
We show how to use information from the measures X1 and X2 to obtain consistent
estimates of the parameters of interest. For that purpose, it is useful to rewrite Q(β, δ) as a
function of the density function as well as (β, δ):
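$$
\begin{aligned}
Q(\beta_0,\delta_0,f_0) &:= E\left[\psi_\tau\left(Y - X\beta_0 - Z^\top\delta_0\right)\begin{bmatrix} X \\ Z \end{bmatrix}\right] \\
&= \int \psi_\tau\left(y - x\beta_0 - z^\top\delta_0\right)\begin{bmatrix} x \\ z \end{bmatrix} f_{YXZ}(y,x,z)\,dy\,dx\,dz \\
&= \int \psi_\tau\left(y - x\beta_0 - z^\top\delta_0\right)\begin{bmatrix} x \\ z \end{bmatrix} f_{X|YZ}(x \mid y,z)\,f_{YZ}(y,z)\,dy\,dx\,dz \\
&= E\left[\int_x \psi_\tau\left(Y - x\beta_0 - Z^\top\delta_0\right)\begin{bmatrix} x \\ Z \end{bmatrix} f_{X|YZ}(x \mid Y,Z)\,dx\right] \qquad (4) \\
&= 0,
\end{aligned}
$$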
where fYXZ(y, x, z) and fYZ(y, z) are the joint densities of (Y, X, Z) and (Y, Z), respectively, and fX|YZ(x | y, z) ≡ f0 is the conditional density of X given (Y, Z). Replacing the outer expectation with its empirical counterpart, we write the sample analog of the population objective function (4) as
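$$ Q_n(\beta,\delta,f) := \frac{1}{n}\sum_{i=1}^{n}\int_x \psi_\tau\left(Y_i - x\beta - Z_i^\top\delta\right)\begin{bmatrix} x \\ Z_i \end{bmatrix} f_{X|YZ}(x \mid Y_i,Z_i)\,dx = 0. \qquad (5) $$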
The integration over x in (5) makes the function continuous in its arguments. The i-th summand of (5) is E[ψτ(Yi − Xβ − Zi⊤δ)(X, Zi⊤)⊤ | Yi, Zi], the conditional mean of the original score function given the observed Y and Z. Moreover, evaluated at (β0, δ0) with the true density f0, (5) is an unbiased estimating function, that is, it has mean zero, and it will be the basis for constructing estimating equations that yield consistent estimates of the parameters of interest.
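As a numerical check of the unbiasedness claim, the sketch below evaluates the sample analog (5) with the true conditional density in a Gaussian design with an intercept as the only Z (an illustrative assumption, as are the helper names), approximating the integral over x by a Riemann sum. At the true (β0, δ0) the score should be close to zero, while at β = 0 its x-component should not; for this design that component is near E|Y|/4 ≈ 0.28.

```python
import numpy as np


def normal_pdf(x, mean, sd):
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))


def smoothed_score(beta, delta, y, x_grid, tau, beta0):
    """Sample analog of (5) with Z = 1 (intercept only), using the true f(x | y).

    Illustrative design (an assumption): X ~ N(0, 1) and Y = beta0 * X + eps with
    eps ~ N(0, 1), so X | Y = y ~ N(beta0 * y / (1 + beta0**2), 1 / (1 + beta0**2)).
    The integral over x is replaced by a Riemann sum on x_grid.
    """
    var = 1.0 / (1.0 + beta0 ** 2)
    mean = beta0 * var * y[:, None]                            # (n, 1)
    dx = x_grid[1] - x_grid[0]
    w = normal_pdf(x_grid[None, :], mean, np.sqrt(var)) * dx   # (n, m) quadrature weights
    resid = y[:, None] - x_grid[None, :] * beta - delta
    psi = tau - (resid < 0).astype(float)
    g_x = (psi * w * x_grid[None, :]).sum(axis=1)              # summand, x-component
    g_z = (psi * w).sum(axis=1)                                # summand, intercept component
    return np.array([g_x.mean(), g_z.mean()])


rng = np.random.default_rng(2)
n, beta0, tau = 4000, 1.0, 0.5
x = rng.normal(size=n)
y = beta0 * x + rng.normal(size=n)
x_grid = np.linspace(-6.0, 6.0, 241)

score_true = smoothed_score(beta0, 0.0, y, x_grid, tau, beta0)   # close to (0, 0)
score_wrong = smoothed_score(0.0, 0.0, y, x_grid, tau, beta0)    # x-component near 0.28
print(score_true, score_wrong)
```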
Therefore, one would solve the new estimating equation (5) to estimate the parameters of interest. In empirical applications, however, the true conditional density fX|YZ(x | y, z) is unknown, and to implement the estimator (5) in practice one needs to replace it with f̂X|YZ(x | y, z), a consistent estimate of fX|YZ(x | y, z). Thus, a feasible estimator first estimates fX|YZ(x | y, z) and then uses the fitted density to estimate the coefficients of interest in a second step. With a consistent estimate of the conditional density, (β0, δ0) can be consistently estimated. In general, however, the conditional density is not identified, because the true X is unobservable.
In this paper, we make use of the repeated measures, X1 and X2, and show that two
mismeasured covariates are sufficient to identify the conditional density in the presence
of ME on the covariate. We also propose a nonparametric estimator for the conditional density that does not require the distribution of the ME to be known. In turn, this result guarantees consistent estimation of the structural parameters of interest.
In a related model, Wei and Carroll (2009) use an iterative algorithm to obtain a consistent estimator of the conditional density fX|YX1(x | y, x1) in the presence of ME on X.6 For simplicity, they focus on a model with one measurement of the true X (here X1) and no other observed covariates Z. Although their approach can be useful in some applications, it faces important technical challenges. First, implementing the estimator requires estimating the conditional density fX|YX1(x | y, x1), which in turn requires a pre-specified parametric form for fX|X1(x | x1). This exposes the estimator to potentially serious model misspecification. Second, and related to the first problem, solving the estimating equations is difficult, since estimating the conditional density fX|YX1(x | y, x1) involves estimating the entire process β0(τ) over quantiles τ. In other words, the estimating equations in Wei and Carroll (2009) need to be solved jointly for all τ's, which increases the dimensionality of the problem substantially and makes implementation considerably more difficult. These difficulties also limit the tractability of inference for their method.
In the next section we propose a procedure that yields a consistent estimator of (β0, δ0) in (5). We develop a method for QR with measurement errors that relies on estimating the conditional density function nonparametrically. The method is a two-step estimator: the first step estimates the density nonparametrically, and the second step employs a standard weighted QR procedure. Before proceeding to estimation, we present an identification result for the density function, which is essential for estimation. For expositional ease, we use fX|YZ(x | y, z) and f(x | y, z) interchangeably.
6We note that their conditional density is slightly different from ours since there is a mismeasured covariate, X1, in their conditioning set.
2.3 Conditional density
As described above, f(x | y, z) is an important element for the identification of the parameters of interest in QR with ME. This section describes the identification of the conditional density function f(x | y, z), which is required to compute the two-step estimator. The identification is based on the assumption that repeated measures of the true regressor are observed. We state the following assumptions to obtain the main identification result.
Assumption A.I: (i) E[U1 | X,U2] = 0; (ii) U2 ⊥ (Y,X,Z).
Assumption A.II: (i) E[|X|] < ∞; (ii) E[|U1|] < ∞; (iii) |E[exp(iζX2)]| > 0 for any
finite ζ ∈ R.
Assumption A.III: (i) sup{f(x | y, z) : (x, y, z) ∈ supp(X, Y, Z)} < ∞; (ii) f(x | y, z) is integrable on R for each (y, z) ∈ supp(Y, Z).
Assumption A.I imposes restrictions on the repeated measures of X. Assumption A.I(i) requires the ME in X1 to have conditional mean zero, but allows it to depend on (X, U2). Assumption A.I(ii) requires the ME in X2 to be independent of the true X as well as of the other variables; however, it does not require U2 to have mean zero. Our setting for the repeated measures can therefore accommodate, for example, a drift or trend in the mismeasured covariates. Assumption A.II imposes mild restrictions: existence of the first moments of X and U1, and a nonvanishing characteristic function of X2. These conditions are commonly assumed in the deconvolution literature (see, e.g., Fan (1991) and Fan and Truong (1993)). Assumption A.III is trivially satisfied by commonly used conditional densities.
Let φ(ζ, y, z) ≡ E[exp(iζX) | Y = y, Z = z] be the conditional characteristic function of X given Y and Z. The following theorem presents the identification of f(x | y, z).
Theorem 1 Suppose Assumptions A.I–A.III hold. Then, for (x, y, z) ∈ supp(X, Y, Z),
$$ f(x \mid y,z) = \frac{1}{2\pi}\int \phi(\zeta,y,z)\exp(-i\zeta x)\,d\zeta, \qquad (6) $$
where for each real ζ,
$$ \phi(\zeta,y,z) = \frac{E[e^{i\zeta X_2} \mid Y=y, Z=z]}{E[e^{i\zeta X_2}]}\exp\left(\int_0^{\zeta}\frac{iE[X_1e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi\right). $$
Proof. See Appendix.
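For intuition only, the key steps behind the expression for φ can be traced in a few lines. This is our sketch of the argument under Assumptions A.I and A.II, not the Appendix proof; it writes µ(ξ) ≡ E[exp(iξX)], so that µ′(ξ) = iE[X exp(iξX)], which is finite by A.II(i):

$$
\begin{aligned}
E[X_1e^{i\xi X_2}] &= E[Xe^{i\xi X}]\,E[e^{i\xi U_2}] + E\big[E[U_1 \mid X,U_2]\,e^{i\xi(X+U_2)}\big] = -i\,\mu'(\xi)\,E[e^{i\xi U_2}],\\
E[e^{i\xi X_2}] &= \mu(\xi)\,E[e^{i\xi U_2}] \;\Longrightarrow\; \frac{iE[X_1e^{i\xi X_2}]}{E[e^{i\xi X_2}]} = \frac{\mu'(\xi)}{\mu(\xi)} \;\Longrightarrow\; \exp\left(\int_0^{\zeta}\frac{iE[X_1e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi\right) = \mu(\zeta),\\
\frac{E[e^{i\zeta X_2}\mid Y=y,Z=z]}{E[e^{i\zeta X_2}]} &= \frac{\phi(\zeta,y,z)\,E[e^{i\zeta U_2}]}{\mu(\zeta)\,E[e^{i\zeta U_2}]} = \frac{\phi(\zeta,y,z)}{\mu(\zeta)}.
\end{aligned}
$$

The first line uses the independence of U2 from X in A.I(ii) and the conditional mean restriction in A.I(i), which makes the second expectation vanish; the divisions are legitimate because A.II(iii) implies that neither µ nor E[exp(iξU2)] vanishes, and µ(0) = 1 fixes the constant of integration; the last line uses U2 ⊥ (Y, X, Z). Multiplying the last two results gives φ(ζ, y, z), and (6) is then the Fourier inversion formula.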
The theorem implies that the conditional density f(x | y, z) can be written as a function of the distribution of observed variables only. To show this, we use standard properties of the Fourier transform; namely, we write f(x | y, z) as the inverse Fourier transform of φ(ζ, y, z). This simplifies identification, since under Assumptions A.I–A.III φ(ζ, y, z) is easily identified by removing the ME, U1 and U2, in the frequency domain (ζ, ξ). It is worth noting that the identification result is similar to that of Kotlarski (1967), who identifies the density of X from its repeated measurements by assuming mutual independence of X, U1, and U2. Our approach rests on weaker assumptions than mutual independence, as highlighted in Assumption A.I. In addition, the result in Theorem 1 is related to that in Schennach (2004b), who identifies the conditional mean of Y given X. Since we are interested in conditional quantile effects, we need to identify the conditional density of X, which requires a stronger independence condition (A.I(ii)).
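The exponential term in Theorem 1 is, under Assumptions A.I–A.II, the unconditional characteristic function E[exp(iζX)] (the Kotlarski-type step). The following NumPy sketch, using a simulated design of our choosing, replaces each expectation by a sample mean and the ξ-integral by the trapezoid rule; U2 is given a nonzero mean to illustrate that A.I(ii) does not require E[U2] = 0. The function name is hypothetical.

```python
import numpy as np


def cf_from_repeated_measures(x1, x2, zeta_grid):
    """Estimate mu(zeta) = E[exp(i zeta X)] on a grid that starts at zeta = 0 via
    exp( int_0^zeta i E[X1 exp(i xi X2)] / E[exp(i xi X2)] d xi ),
    with sample means for the expectations and the trapezoid rule for the integral.
    """
    e = np.exp(1j * np.outer(zeta_grid, x2))          # (grid points, n)
    integrand = 1j * (e * x1).mean(axis=1) / e.mean(axis=1)
    steps = 0.5 * (integrand[1:] + integrand[:-1]) * np.diff(zeta_grid)
    return np.exp(np.concatenate([[0.0], np.cumsum(steps)]))


rng = np.random.default_rng(3)
n = 20000
x = rng.normal(size=n)                                # true X ~ N(0, 1), not passed to the estimator
x1 = x + 0.5 * rng.normal(size=n)                     # U1 has mean zero
x2 = x + 0.3 + 0.5 * rng.normal(size=n)               # U2 independent, with a drift of 0.3
zeta = np.linspace(0.0, 1.5, 61)

mu_hat = cf_from_repeated_measures(x1, x2, zeta)
mu_true = np.exp(-0.5 * zeta ** 2)
print(np.abs(mu_hat - mu_true).max())
```

The drift in U2 cancels because it enters both the numerator and the denominator of the ratio, exactly as in the identification argument.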
We note that the current identification setup can be extended to a K-vector X. In this case, the integral in equation (5) becomes a multiple integral over the K-vector x. In addition, the integral in equation (6) becomes an integral over a K-vector ζ, while the integral in the definition of φ(·) becomes a path integral along a piecewise smooth path connecting 0 and ζ, where ξ and ζ are K-dimensional vectors. The estimating equations are adjusted accordingly.
3 Estimation
Given the identification condition in equation (6) of Theorem 1, we are able to estimate
the structural parameters of interest, (β0, δ0). We propose a semiparametric estimator that
involves two-step estimation. Implementation of the estimator is simple in practice. In the
first step, one estimates the nuisance parameter, the conditional density, using a nonparametric method that requires no optimization. In the second step, by plugging in these
estimates, a general weighted quantile regression (QR) is performed.
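Specifically, as in equation (4), we can write Q(β, δ, f) as
$$ Q(\beta,\delta,f) = E\left[\int_x \psi_\tau\left(Y - x\beta - Z^\top\delta\right)\begin{bmatrix} x \\ Z \end{bmatrix} f_{X|YZ}(x \mid Y,Z)\,dx\right], $$
which, once the conditional density is expressed through equation (6), does not require observations on X. Thus, estimation of (β0, δ0) follows from solving a feasible version of Qn(β, δ, f):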
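$$ Q_n(\beta,\delta,\hat f) = \frac{1}{n}\sum_{i=1}^{n}\int_x \psi_\tau\left(Y_i - x\beta - Z_i^\top\delta\right)\begin{bmatrix} x \\ Z_i \end{bmatrix}\hat f_{X|YZ}(x \mid Y_i,Z_i)\,dx, $$
where
$$ \hat f_{X|YZ}(x \mid Y_i,Z_i) = \frac{1}{2\pi}\int_\zeta \hat\phi(\zeta,Y_i,Z_i)\exp(-i\zeta x)\,d\zeta, $$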
and the only element of this sample objective function that has not yet been presented is φ̂, the estimate of φ, which is defined in the next subsection. In practice, as we discuss next, we approximate the integrals by sums and obtain a double sum (over observations and over grid values of X). Importantly, this representation shows that the estimates (β̂, δ̂) are obtained by a weighted QR whose weights are given by the estimate f̂X|YZ.
3.1 Estimation of nuisance parameter
In this subsection we discuss the estimation of the nuisance parameter in the first step, i.e.,
the conditional density f(x | y, z). We propose a nonparametric method by adapting the
class of flat-top kernels of infinite order suggested by Politis and Romano (1999). Consider
the following assumption.
Assumption A.IV: The real-valued kernel x ↦ k(x) is measurable and symmetric with ∫k(x)dx = 1, and its Fourier transform ξ ↦ κ(ξ) is bounded, compactly supported, and equal to one for |ξ| < ξ̄ for some ξ̄ > 0.
From Assumption A.IV, we allow for a kernel of the form (e.g., Li and Vuong (1998))
$$ k(x) = \frac{\sin(x)}{\pi x}, $$
with its Fourier transform such that
$$ \kappa(h_x\zeta) = \int \frac{1}{h_x}\,k\!\left(\frac{x}{h_x}\right)\exp(i\zeta x)\,dx, $$
for a bandwidth hx. This flat-top kernel of infinite order has the property that its Fourier transform is equal to one over the interval [−1, 1] and zero elsewhere, which guarantees that the bias goes to zero faster than any power of the bandwidth. We note that an ill-posed inverse problem arises whenever one tries to invert a convolution operation. This is the case for our proposed estimator, because it divides by a quantity that, by the Riemann-Lebesgue lemma, converges to zero as the frequency parameter goes to infinity. By estimating the numerator using a kernel whose Fourier transform is compactly supported, one can guarantee that the ratio remains under control: the numerator is set to zero beyond a finite frequency, before the denominator approaches zero. Compact support of the kernel's Fourier transform is easy to impose while preserving most properties of an original kernel. For instance, one can transform any given kernel k into a modified kernel k̃ with compact Fourier support by using a window function in the frequency domain that is constant in a neighborhood of the origin and vanishes beyond a given frequency.
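As a worked instance of this flat-top property, the Fourier transform of the sinc kernel k(x) = sin(x)/(πx) follows from the standard Dirichlet integral ∫ sin(ax)/x dx = π sign(a); the odd sine part of exp(itx) integrates to zero:

$$
\kappa(t) = \frac{1}{\pi}\int_{-\infty}^{\infty}\frac{\sin(x)\cos(tx)}{x}\,dx = \frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\sin((1+t)x)+\sin((1-t)x)}{x}\,dx = \frac{\operatorname{sign}(1+t)+\operatorname{sign}(1-t)}{2} = \begin{cases}1, & |t|<1,\\ 1/2, & |t|=1,\\ 0, & |t|>1.\end{cases}
$$

Hence κ(hxζ) = 1 for |ζ| < 1/hx and 0 for |ζ| > 1/hx, so with this kernel the integral in (7) runs only over the bounded band |ζ| < 1/hx, and the estimated ratio is never evaluated at higher frequencies.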
The following theorem summarizes the result.
Theorem 2 Suppose Assumptions A.I–A.III hold, and let k(·) satisfy Assumption A.IV. For (x, y, z) ∈ supp(X, Y, Z) and hx > 0, let
$$ f(x \mid y,z;h_x) \equiv \int \frac{1}{h_x}\,k\!\left(\frac{\tilde x - x}{h_x}\right) f(\tilde x \mid y,z)\,d\tilde x. $$
Then we have
$$ f(x \mid y,z;h_x) = \frac{1}{2\pi}\int \kappa(h_x\zeta)\,\phi(\zeta,y,z)\exp(-i\zeta x)\,d\zeta. $$
Proof. See Appendix.
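Theorem 2 can be checked numerically in a case with a closed form. For the sinc kernel, κ(hxζ) restricts the integral to |ζ| < 1/hx; for X ~ N(0, 1), the smoothed density at x = 0 is then (2π)^(−1/2) erf(1/(hx√2)). The NumPy sketch below (an illustration with hypothetical function names) computes the right-hand side of Theorem 2 by the trapezoid rule and shows how quickly the smoothing bias vanishes as hx shrinks, consistent with the infinite-order property of the flat-top kernel.

```python
import math

import numpy as np


def smoothed_density_from_cf(phi, x, h, n_grid=4001):
    """Theorem 2 with the sinc kernel: (1/2pi) int_{|zeta| < 1/h} phi(zeta) e^{-i zeta x} d zeta.

    Uses phi(-zeta) = conj(phi(zeta)), true for any characteristic function, so the
    integral reduces to (1/pi) int_0^{1/h} Re[phi(zeta) e^{-i zeta x}] d zeta.
    """
    zeta = np.linspace(0.0, 1.0 / h, n_grid)
    vals = np.real(phi(zeta) * np.exp(-1j * zeta * x))
    dz = zeta[1] - zeta[0]
    integral = dz * (vals.sum() - 0.5 * (vals[0] + vals[-1]))   # trapezoid rule
    return integral / math.pi


def phi_normal(z):
    """Characteristic function of N(0, 1)."""
    return np.exp(-0.5 * z ** 2)


pdf0 = 1.0 / math.sqrt(2.0 * math.pi)            # N(0, 1) density at x = 0
for h in (1.0, 0.5, 0.25):
    approx = smoothed_density_from_cf(phi_normal, 0.0, h)
    exact = pdf0 * math.erf(1.0 / (h * math.sqrt(2.0)))   # closed form of f(0; h)
    print(h, approx, exact, pdf0 - exact)         # smoothing bias shrinks very quickly in h
```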
Let $h_n \equiv (h_n^x, h_n^{(2)})$ with $h_n^{(2)} \equiv (h_n^y, h_n^z)$ be a set of smoothing parameters. Let $\hat E[\cdot]$ denote a sample average, i.e., $n^{-1}\sum_{i=1}^{n}[\cdot]$. Finally, we introduce a consistent nonparametric estimator of f(x | y, z) motivated by Theorem 2.
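Definition 2.3 The estimator of f(x | y, z) is defined as
$$ \hat f(x \mid y,z;h_n) \equiv \frac{1}{2\pi}\int \kappa(h_n^x\zeta)\,\hat\phi(\zeta,y,z,h_n^{(2)})\exp(-i\zeta x)\,d\zeta, \qquad (7) $$
for $h_n \to 0$ as $n \to \infty$, where
$$ \hat\phi(\zeta,y,z,h_n^{(2)}) \equiv \frac{\hat E[e^{i\zeta X_2} \mid Y=y, Z=z]}{\hat E[e^{i\zeta X_2}]}\exp\left(\int_0^{\zeta}\frac{i\hat E[X_1e^{i\xi X_2}]}{\hat E[e^{i\xi X_2}]}\,d\xi\right). $$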
The above estimator is convenient for computing the structural parameters of interest. Since it has an explicit closed form, it requires no optimization routine, unlike likelihood-based approaches. The conditional mean E[exp(iζX2) | Y = y, Z = z] can be estimated by any nonparametric method. For instance, one might use standard kernel estimation with $k_{h_n}(\cdot) \equiv h_n^{-1}k(\cdot/h_n)$ (e.g., the Epanechnikov kernel), defined as
$$ \hat E[e^{i\zeta X_2} \mid Y=y, Z=z] \equiv \frac{\hat E[e^{i\zeta X_2}\,k_{h_n^y}(Y-y)\,k_{h_n^z}(Z-z)]}{\hat E[k_{h_n^y}(Y-y)\,k_{h_n^z}(Z-z)]}, $$
where $k_{h_n^y}(Y-y)$ is a univariate kernel function and $k_{h_n^z}(Z-z)$ is an L-variate kernel function, with bandwidths $h_n^y$ and $h_n^z$, respectively.
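Putting Definition 2.3 together for a scalar Y and no Z, the sketch below is one possible implementation of (7) (ours, under stated assumptions): the sinc kernel, for which κ(hζ) is the indicator of |ζ| < 1/h; an Epanechnikov Nadaraya-Watson estimate of the conditional characteristic function; direct quadrature in place of the FFT; and a simulated Gaussian design in which the Theorem 2 target f(x | y; hx) is known. Bandwidths and grid sizes are arbitrary illustrative choices, not recommendations, and the function names are hypothetical.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2) on |u| <= 1."""
    return 0.75 * np.clip(1.0 - u ** 2, 0.0, None)


def density_estimate(x_eval, y0, x1, x2, y, h_x, h_y, n_zeta=81):
    """Sketch of estimator (7) for f(x | y0) with scalar Y and no Z.

    Sinc kernel: kappa(h_x * zeta) = 1{|zeta| < 1/h_x}, so only |zeta| < 1/h_x is used.
    Conditional mean E[exp(i zeta X2) | Y = y0]: Epanechnikov Nadaraya-Watson.
    """
    zeta = np.linspace(0.0, 1.0 / h_x, n_zeta)
    kw = epanechnikov((y - y0) / h_y)                  # the 1/h_y factor cancels in the ratio
    ratio = np.empty(n_zeta, dtype=complex)            # hat E[. | Y = y0] / hat E[.]
    integrand = np.empty(n_zeta, dtype=complex)        # integrand of the exponent
    for k, t in enumerate(zeta):
        e = np.exp(1j * t * x2)
        marginal = e.mean()
        ratio[k] = ((e * kw).sum() / kw.sum()) / marginal
        integrand[k] = 1j * (x1 * e).mean() / marginal
    dz = zeta[1] - zeta[0]
    exponent = np.concatenate([[0.0], np.cumsum(0.5 * (integrand[1:] + integrand[:-1]) * dz)])
    phi_hat = ratio * np.exp(exponent)
    # phi(-zeta) = conj(phi(zeta)): (1/2pi) int_{-1/h}^{1/h} equals (1/pi) int_0^{1/h} Re[...]
    vals = np.real(phi_hat[None, :] * np.exp(-1j * np.outer(x_eval, zeta)))
    trap = np.full(n_zeta, dz)
    trap[[0, -1]] = 0.5 * dz                           # trapezoid weights
    return vals @ trap / np.pi


rng = np.random.default_rng(4)
n = 40000
x = rng.normal(size=n)
y = x + rng.normal(size=n)                             # so X | Y = y ~ N(y / 2, 1 / 2)
x1 = x + 0.25 * rng.normal(size=n)
x2 = x + 0.25 * rng.normal(size=n)
x_eval = np.array([0.0, 1.0])
h_x = 0.5
f_hat = density_estimate(x_eval, 0.0, x1, x2, y, h_x=h_x, h_y=0.25)

# Smoothed target of Theorem 2 at y = 0, using the true phi(zeta, 0) = exp(-zeta^2 / 4).
zg = np.linspace(0.0, 1.0 / h_x, 2001)
vals = np.exp(-0.25 * zg ** 2)[None, :] * np.cos(np.outer(x_eval, zg))
f_target = 0.5 * (vals[:, 1:] + vals[:, :-1]).sum(axis=1) * (zg[1] - zg[0]) / np.pi
print(f_hat, f_target)
```

The comparison is against the smoothed density f(x | y; hx), not f(x | y) itself: with hx = 0.5 the sinc smoothing bias is still visible, which is the bias/variance trade-off governed by the choice of hx.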
3.2 Estimation of the structural parameters
This section describes the general estimator for QR models with ME. We propose a Z-estimator that involves two-step estimation. Given the identification condition in equation (5) and the estimator of the density function described in the previous subsection, we are able to estimate the structural parameters of interest. We estimate the parameters θ0 = (β0, δ0⊤)⊤, for a selected τ of interest, in the following two steps:
Step 1. Estimate f(xj | Yi, Zi; h) for each observation i and grid point j as in equation (7), where j ∈ J ≡ {1, 2, . . . , m} and m is the number of grid points used to approximate the numerical integral. The choice of kernels and bandwidths is described in Definition 2.3 above. The integrals in equation (7) are computed using the fast Fourier transform (FFT) algorithm, whose good behavior here is ensured by the smoothness of the characteristic function φ(·) and the finiteness of the moments.
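The FFT step can be sketched as follows, assuming a frequency grid ζj = −L + jΔζ and the reciprocal spatial grid Δx = 2π/(NΔζ), so that one call to `numpy.fft.fft` evaluates the Riemann sum (Δζ/2π)Σj φ(ζj)exp(−iζjxk) at all grid points xk at once; the two phase factors account for the grids not starting at zero. L, N, and the function name are illustrative choices, and as a check, inverting the standard normal characteristic function should recover the N(0, 1) density.

```python
import numpy as np


def density_from_cf_fft(phi, L=20.0, N=512):
    """Approximate f(x_k) = (1/2pi) int phi(z) exp(-i z x_k) dz for all k with one FFT.

    Frequency grid z_j = -L + j*dz (dz = 2L/N); spatial grid x_k = x0 + k*dx with
    dx = 2pi/(N*dz) and x0 = -N*dx/2. Since z_j*x_k = -L*x_k + j*dz*x0 + 2pi*j*k/N,
    the Riemann sum equals (dz/2pi) * exp(i*L*x_k) * FFT_k[phi(z_j) * exp(-i*j*dz*x0)].
    """
    dz = 2.0 * L / N
    dx = 2.0 * np.pi / (N * dz)
    x0 = -0.5 * N * dx
    j = np.arange(N)
    z = -L + j * dz
    x = x0 + j * dx
    a = phi(z) * np.exp(-1j * j * dz * x0)
    f = (dz / (2.0 * np.pi)) * np.exp(1j * L * x) * np.fft.fft(a)
    return x, f.real


def phi_normal(z):
    """Characteristic function of N(0, 1)."""
    return np.exp(-0.5 * z ** 2)


x, f = density_from_cf_fft(phi_normal)
pdf = np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)
print(np.abs(f - pdf).max())
```

In Step 1 one would instead pass ζ ↦ κ(hxζ)φ̂(ζ, Yi, Zi) for each observation i and then read off (or interpolate) the values at the grid points xj.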
Step 2. To compute equation (5) in practice, we approximate the integral over x numerically, which translates the problem into a weighted QR problem. Let x = (x1, x2, . . . , xm) be a fine grid of possible values xj, akin to a set of abscissas in Gaussian quadrature. For each τ, θ̂(τ) = (β̂(τ), δ̂(τ)⊤)⊤ can be computed by solving
$$ \sum_{i=1}^{n}\sum_{j=1}^{m}\psi_\tau\left(Y_i - x_j\beta - Z_i^\top\delta\right)\begin{bmatrix} x_j \\ Z_i \end{bmatrix}\hat f(x_j \mid Y_i,Z_i;h) = 0, \qquad (8) $$
where f̂(xj | Yi, Zi; h) is obtained from Step 1. The weighted QR of Yi on xj and Zi, with weights f̂(xj | Yi, Zi; h), can be readily computed using the function rq in the R package quantreg.
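The stacking in (8) can also be sketched in Python. The text uses R's rq; here we instead solve the same weighted check-function problem as a sparse linear program with `scipy.optimize.linprog` (HiGHS), which is our substitution. To isolate Step 2, the sketch plugs in oracle weights from the true conditional density of a Gaussian design rather than a Step 1 estimate, and it drops (i, j) pairs with negligible weight; all names and tuning values are hypothetical.

```python
import numpy as np
from scipy import sparse
from scipy.optimize import linprog


def weighted_qr(W, y, w, tau):
    """min_theta sum_r w_r * rho_tau(y_r - W_r @ theta), solved as a sparse LP."""
    n, p = W.shape
    c = np.concatenate([np.zeros(p), tau * w, (1.0 - tau) * w])
    eye = sparse.identity(n, format="csr")
    A_eq = sparse.hstack([sparse.csr_matrix(W), eye, -eye], format="csr")
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]


rng = np.random.default_rng(5)
n, tau = 800, 0.5
x = rng.normal(size=n)                       # true covariate, never passed to the estimator
y = 1.0 + 2.0 * x + rng.normal(size=n)       # true theta = (1, 2)

# Oracle weights on a grid: in this design X | Y = y ~ N(2 (y - 1) / 5, 1 / 5).
grid = np.linspace(-6.0, 6.0, 61)
cond_mean = 2.0 * (y - 1.0) / 5.0
dens = np.exp(-0.5 * (grid[None, :] - cond_mean[:, None]) ** 2 / 0.2)
dens /= dens.sum(axis=1, keepdims=True)      # each observation's weights sum to one

# Stack all (i, j) pairs with non-negligible weight into one weighted QR, as in (8).
ii, jj = np.nonzero(dens > 1e-6)
W_stack = np.column_stack([np.ones(ii.size), grid[jj]])
theta_hat = weighted_qr(W_stack, y[ii], dens[ii, jj], tau)
print(theta_hat)
```

Because the oracle weights satisfy the population condition (4) at the true parameters, the stacked weighted QR should recover values close to (1, 2) even though X itself is never used.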
4 Asymptotic properties
This section investigates the large sample properties of the proposed two-step estimator.
While these methods seem similar to the ones discussed by Wei and Carroll (2009), the
nonparametric estimation of the conditional density function raises some new issues for the
asymptotic analysis of the estimator. First, we establish the asymptotic results for the
estimator of the conditional density function given in (7). Second, we establish consistency
and asymptotic normality of the second step estimator in (8).
4.1 Asymptotic properties of the density estimator
In this subsection we establish the asymptotic properties of the density function estimator
in equation (7). Let µ(ζ) ≡ E[exp(iζX)], ω1(ζ) ≡ E[exp(iζX2)], and χ(ζ, y, z) ≡ E[exp(iζX2) | Y = y, Z = z]f(y, z). We denote the kernel density estimator of f(y, z) by $\hat f(y,z) \equiv \hat E[k_{h_n^y}(Y-y)\,k_{h_n^z}(Z-z)]$. Let $\mathcal{X} \equiv \mathbb{R}$ be the support of X, let $\mathcal{Y}\times\mathcal{Z}$ be a compact set contained in the support of (Y, Z), and let Dζ denote the partial derivative with respect to ζ. We impose the following assumptions.
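Assumption B.I: (i) There exist constants C1 > 0 and γµ ≥ 0 such that
$$ |D_\zeta \ln\mu(\zeta)| = \left|\frac{D_\zeta\mu(\zeta)}{\mu(\zeta)}\right| \le C_1(1+|\zeta|)^{\gamma_\mu}; $$
(ii) There exist constants Cφ > 0, αφ ≤ 0, νφ ≥ 0, and γφ ∈ R such that νφγφ ≥ 0 and
$$ \sup_{(y,z)\in\mathcal{Y}\times\mathcal{Z}} |\phi(\zeta,y,z)| \le C_\phi(1+|\zeta|)^{\gamma_\phi}\exp(\alpha_\phi|\zeta|^{\nu_\phi}), $$
and if αφ = 0, then γφ < −1;
(iii) There exist constants Cω > 0, αω ≤ 0, νω ≥ νφ ≥ 0, and γω ∈ R such that νωγω ≥ 0 and
$$ \min\left\{\inf_{(y,z)\in\mathcal{Y}\times\mathcal{Z}}|\chi(\zeta,y,z)|,\ |\omega_1(\zeta)|\right\} \ge C_\omega(1+|\zeta|)^{\gamma_\omega}\exp(\alpha_\omega|\zeta|^{\nu_\omega}). $$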
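Assumption B.II: (i) E[|X1|²] < ∞; (ii) E[|X1||X2|] < ∞; (iii) E[|X2|] < ∞.
Assumption B.III:
$$ \sup_{(y,z)\in\mathcal{Y}\times\mathcal{Z}} \left|\hat f(y,z) - f(y,z)\right| = O_p\left(\frac{(\ln n)^{1/2}}{(n h_y h_z)^{1/2}} + \sum_{s\in\{y,z\}} (h_s)^2\right). $$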
Assumptions B.I–B.III are standard for nonparametric deconvolution estimators, since
the rates of convergence depend on the tails of the Fourier transforms (see, e.g., Fan (1991)
and Fan and Truong (1993)).
Assumption B.I concerns the smoothness of the (conditional) characteristic functions of the true regressor $X$ and the observed measure $X_2$. The literature commonly adopts two types of smoothness assumptions: ordinary smoothness and super smoothness. An ordinarily smooth function has a Fourier transform whose tail decays to zero at a polynomial rate $|\zeta|^{\gamma}$, $\gamma < 0$, whereas a super smooth function has a Fourier transform whose tail decays to zero at an exponential rate $\exp(\alpha|\zeta|^{\gamma})$, $\alpha < 0$, $\gamma > 0$.$^{7}$ For notational ease, Assumption B.I covers ordinary and super smoothness simultaneously. Assumption B.I (i) restricts the tail behavior of $\mu(\zeta)$, the characteristic function of $X$. A term $\exp(\alpha_1|\zeta|^{\nu_1})$ is omitted with merely a small loss of generality, since $\ln\mu(\zeta)$ is indeed a power of $\zeta$ for all common ordinarily and super smooth functions. Assumption B.I (ii) concerns $\phi(\zeta, y, z)$, the conditional characteristic function of $X$ given $Y = y$ and $Z = z$. It states that the rate of decay of $\phi(\zeta, y, z)$ is governed by the conditional density of $X$ given $Y = y$ and $Z = z$ (i.e., $f(X \mid Y = y, Z = z)$), the parameter of interest in the first step. Assumption B.I (iii) jointly restricts $\chi(\zeta, y, z)$, the conditional characteristic function of $X_2$ given $Y = y$ and $Z = z$ weighted by the joint density of $(Y, Z)$ (i.e., $E[e^{i\zeta X_2} \mid Y = y, Z = z]f(y, z)$), and $\omega_1(\zeta)$, the characteristic function of $X_2$. Since $\omega_1(\zeta) = E[e^{i\zeta X_2}] = E[e^{i\zeta X}]E[e^{i\zeta U_2}]$, where the factorization uses the independence of $U_2$ and $X$ (Assumption A.I (ii)), the smoothness of $\omega_1(\zeta)$ is determined by the combination of $X$ and $U_2$. As is common for deconvolution-type estimators, $\chi(\zeta, y, z)$ and $\omega_1(\zeta)$ need to be bounded below because they appear in the linearization of the estimator. The restriction on $\chi(\zeta, y, z)$ typically requires the joint density of $(Y, X_2, Z)$ to be bounded away from zero on its support, which is generally satisfied by common continuous distributions supported on the real line.
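To make the two smoothness classes concrete, the following sketch (our own illustration, not part of the estimation procedure) evaluates the characteristic function of a mean-zero normal law, which is super smooth, and of a mean-zero Laplace law parameterized by its scale $b$, which is ordinarily smooth. The scale parameterization of the Laplace law is an assumption made for illustration.

```python
import numpy as np

def normal_cf(zeta, sigma):
    """Characteristic function of N(0, sigma^2): exp(-sigma^2 zeta^2 / 2), a super smooth function."""
    zeta = np.asarray(zeta, dtype=float)
    return np.exp(-0.5 * sigma**2 * zeta**2)

def laplace_cf(zeta, b):
    """Characteristic function of a Laplace law with location 0 and scale b (assumed
    parameterization): 1 / (1 + b^2 zeta^2), an ordinarily smooth function."""
    zeta = np.asarray(zeta, dtype=float)
    return 1.0 / (1.0 + b**2 * zeta**2)

# Super smooth: log cf / zeta^2 is constant, i.e. alpha = -sigma^2 / 2 and nu = 2.
z_small = np.array([1.0, 2.0, 5.0])
print(np.log(normal_cf(z_small, sigma=1.0)) / z_small**2)

# Ordinarily smooth: zeta^2 * cf tends to 1 / b^2, i.e. a polynomial tail with gamma = -2.
z_large = np.array([1.0, 10.0, 100.0])
print(z_large**2 * laplace_cf(z_large, b=1.0))
```

The normal tail matches the exponential form $\exp(\alpha|\zeta|^{\nu})$ exactly, while the Laplace tail only matches $|\zeta|^{-2}$ asymptotically, which is why Assumption B.I is stated as a bound rather than an equality.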
Assumption B.II imposes mild moment restrictions required for consistency. Assumption B.III imposes a standard condition on the nonparametric estimator $\hat f(y, z)$ of the joint density $f(y, z)$.
The next result establishes the asymptotic properties of the density function estimator.
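Theorem 3 Let Assumptions A.I–A.IV and B.I–B.III hold. Then for $(x, y, z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ and $h > 0$ satisfying $\max\{(h_{yn})^{-1}, (h_{zn})^{-1}\} = O(n^{\eta})$ and
$$(h_{xn})^{-1} = \begin{cases} O\big((\ln n)^{1/\nu_\omega - \eta}\big) & \text{if } \nu_\omega \neq 0,\\ O\big(n^{(1-20\eta)/(2(\gamma_\mu - \gamma_\omega))}\big) & \text{if } \nu_\omega = 0,\end{cases}$$
for some $\eta > 0$, we have
$$\sup_{(x,y,z)\in\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}} |\hat f(x \mid y, z; h) - f(x \mid y, z)| = O\Big(\big((h_x)^{-1}\big)^{\gamma_B}\exp\big(\alpha_B((h_x)^{-1})^{\nu_B}\big)\Big) + O_p\Big(n^{-1/2}\big((h_x)^{-1}\big)^{\gamma_L}\exp\big(\alpha_L((h_x)^{-1})^{\nu_L}\big)\Big),$$
with $\alpha_B \equiv \alpha_\phi\xi^{\nu_\phi}$, $\nu_B \equiv \nu_\phi$, $\gamma_B \equiv \gamma_\phi + 1$, $\alpha_L \equiv \alpha_\phi 1\{\nu_\phi = \nu_\omega\} - \alpha_\omega$, $\nu_L \equiv \nu_\omega$, $\gamma_L \equiv 1 + \gamma_\phi - \gamma_\omega$, and $\delta_L \equiv \gamma_\mu + \gamma_\phi - \gamma_\omega + 2$.
Proof. See Appendix.
$^{7}$The typical examples of ordinarily smooth functions are uniform, gamma, symmetric gamma, Laplace (or double exponential), and their mixtures. Normal, Cauchy, and their mixtures are super smooth functions.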
The theorem above establishes consistency and the uniform convergence rate of the proposed estimator. The conditions on the bandwidths guarantee that the nonlinear remainder term in the expansion of $\hat f(x \mid y, z; h) - f(x \mid y, z)$ is asymptotically negligible, so that, apart from the bias, the asymptotic behavior of this difference is essentially determined by the variance term of its linear approximation. The result also shows that the convergence rate depends on the tail behavior of the associated quantities. For instance, consider the case in which $\phi(\zeta, y, z)$ is ordinarily smooth. When $\chi(\zeta, y, z)$ and $\omega_1(\zeta)$ in Assumption B.I (iii) are also ordinarily smooth (i.e., $\nu_\omega = 0$), one can choose a smaller bandwidth, so that the resulting convergence rate of the estimator is faster than when one of them is super smooth.
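To see how the two terms in Theorem 3 trade off, the sketch below evaluates them up to multiplicative constants for user-supplied exponents and searches a grid for the value of $(h_x)^{-1}$ that minimizes their sum. The parameter values in the usage lines are hypothetical, and the grid search is our own illustration rather than a bandwidth rule from the paper.

```python
import numpy as np

def theorem3_terms(n, h_inv, alpha_B, gamma_B, nu_B, alpha_L, gamma_L, nu_L):
    """Bias and variance terms of the Theorem 3 bound, ignoring multiplicative constants.

    h_inv plays the role of (h_x)^{-1}; the exponents follow the notation of Theorem 3.
    """
    bias = h_inv**gamma_B * np.exp(alpha_B * h_inv**nu_B)
    variance = n**-0.5 * h_inv**gamma_L * np.exp(alpha_L * h_inv**nu_L)
    return bias, variance

def balance_bandwidth(n, grid, **params):
    """Return the grid value of h_inv minimizing bias + variance, and that minimum (a heuristic)."""
    totals = np.array([sum(theorem3_terms(n, g, **params)) for g in grid])
    best = int(np.argmin(totals))
    return grid[best], totals[best]

# Fully ordinarily smooth case (all alphas zero, so nu_phi = nu_omega = 0), with hypothetical
# gamma_phi = -3 and gamma_omega = -4: gamma_B = -3 + 1 = -2 and gamma_L = 1 + (-3) - (-4) = 2.
params = dict(alpha_B=0.0, gamma_B=-2.0, nu_B=0.0, alpha_L=0.0, gamma_L=2.0, nu_L=0.0)
print(theorem3_terms(1e8, 10.0, **params))   # both terms are approximately 0.01
print(balance_bandwidth(1e8, np.linspace(1.0, 50.0, 491), **params))
```

In this polynomial case the balancing value is $(h_x)^{-1} = n^{1/8}$ and the rate is $n^{-1/4}$; with a super smooth lower bound ($\nu_\omega > 0$, $\alpha_L > 0$) the variance term grows exponentially in $(h_x)^{-1}$, which forces the logarithmic bandwidth sequence in the theorem.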
We note that the uniform convergence rate for the first stage is obtained over the compact
set $\mathcal{Y}\times\mathcal{Z}$ contained in the support of (Y, Z). However, we highlight that this does not necessarily imply that the random variables must be compactly supported (see, e.g., Schennach
(2008)). Alternatively, it is possible to obtain a uniform convergence rate and establish the
asymptotic properties over an expanding set (see, e.g., Newey (1994b), Andrews (1995), and
Hansen (2008)). We consider uniformity over expanding sets that diverge slowly to infinity.
Detailed derivation of the uniform convergence rate over expanding sets is provided in the
Online Supplementary Appendix, and the results in Theorem 3 remain the same.
4.2 Asymptotic properties of the two-step estimator
In this subsection, we derive the asymptotic properties of the second step estimator of
parameters of interest in (8). We establish consistency and asymptotic normality for a
given quantile τ ∈ (0, 1).
4.2.1 Consistency
Consistency is a desirable property for most estimators. We wish to establish consistency of the estimator $\hat\theta = (\hat\beta, \hat\delta^\top)^\top$ defined in equation (8), where $\hat f$, given in (7), is an estimator of $f_0 = f(x \mid y, z)$.
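First, notice that from the estimating equation in (5) we have
$$Q_n(\beta, \delta, f) = \frac{1}{n}\sum_{i=1}^{n}\int \psi_\tau(Y_i - x\beta - Z_i^\top\delta)\,(x, Z_i^\top)^\top f(x \mid Y_i, Z_i)\,dx,$$
and its expectation is
$$Q(\beta, \delta, f) = E\left[\int \psi_\tau(Y_i - x\beta - Z_i^\top\delta)\,(x, Z_i^\top)^\top f(x \mid Y_i, Z_i)\,dx\right].$$
The estimator $\hat\theta = (\hat\beta, \hat\delta^\top)^\top$ is obtained by equating $Q_n(\beta, \delta, \hat f)$ to zero. Note that $Q(\beta, \delta, f_0) = 0$ if and only if $(\beta, \delta^\top)^\top = (\beta_0, \delta_0^\top)^\top \in \Theta$.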
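The sample estimating function can be evaluated numerically once an estimate of the conditional density is available. The sketch below approximates the integral over $x$ with Gauss–Hermite quadrature, the rule mentioned in the Monte Carlo section, after a change of variables with a user-chosen center and scale; the density callable and these two tuning arguments are hypothetical. Because $\psi_\tau$ is a step function in $x$ whenever $\beta \neq 0$, a modest number of nodes only approximates the integral, and a finer rule may be preferable in practice.

```python
import numpy as np

def psi_tau(u, tau):
    """Quantile score psi_tau(u) = tau - 1{u < 0}."""
    return tau - (np.asarray(u) < 0).astype(float)

def Qn(beta, delta, Y, Z, cond_density, tau, center=0.0, scale=1.0, n_nodes=40):
    """Approximate Q_n(beta, delta, f) by Gauss-Hermite quadrature over x.

    cond_density(x, y, z): hypothetical callable returning f(x | y, z) on an array x.
    center, scale: assumed location and spread of that density, used to place the nodes.
    Returns a vector of length 1 + dim(Z): the x-component first, then the Z-components.
    """
    t, w = np.polynomial.hermite.hermgauss(n_nodes)    # rule for the weight exp(-t^2)
    x = center + np.sqrt(2.0) * scale * t              # nodes on the x scale
    jac = np.sqrt(2.0) * scale * w * np.exp(t**2)      # turns the rule into a plain integral
    Y = np.asarray(Y, float)
    Z = np.asarray(Z, float).reshape(len(Y), -1)       # n x k matrix of covariates
    delta = np.asarray(delta, float).reshape(-1)
    total = np.zeros(1 + Z.shape[1])
    for yi, zi in zip(Y, Z):
        score = psi_tau(yi - x * beta - zi @ delta, tau)  # psi_tau at every node
        wts = jac * cond_density(x, yi, zi)               # quadrature weight times f(x | y_i, z_i)
        total[0] += np.sum(wts * score * x)
        total[1:] += np.sum(wts * score) * zi
    return total / len(Y)

def std_normal_density(x, y, z):
    """Toy conditional density: X independent of (Y, Z) and standard normal."""
    return np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)

Q = Qn(beta=0.0, delta=[0.5], Y=[1.0, -1.0, 0.2], Z=[[1.0], [2.0], [0.0]],
       cond_density=std_normal_density, tau=0.5)
print(Q)  # approximately [0.0, -0.1667]
```

With $\beta = 0$ the score does not depend on $x$, so the first component reduces to the mean of the density (zero here) and the second to the average of $\psi_\tau(Y_i - Z_i^\top\delta)Z_i$, which makes the toy output easy to check by hand. Solving $Q_n(\beta, \delta, \hat f) \approx 0$ would then be a separate root-finding or minimization step.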
Now we state the following sufficient conditions for the estimator to be consistent.
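Assumption C.I: For any δ > 0, there exists $\varepsilon_\delta > 0$ such that
$$\inf_{\|\theta - \theta_0\| > \delta}\left\| E\left[\int \psi_\tau(Y - x\beta - Z^\top\delta)\,(x, Z^\top)^\top f_0\,dx\right]\right\| > \varepsilon_\delta.$$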
Assumption C.II: (i) $E[|X|^{2+\varepsilon}] < \infty$; (ii) $E[\|Z\|^{2+\varepsilon}] < \infty$.
Assumption C.III: $E[|X| \mid Y, Z] < \infty$.
Assumption C.I is a standard identification condition in the QR literature (see, e.g., Chen, Linton, and Van Keilegom (2003) and Kato, Galvao, and Montes-Rojas (2012)). C.II is also standard in QR and requires slightly more than finite second moments (moments of order $2 + \varepsilon$) of the latent variable and the well-measured regressor; see, e.g., Angrist, Chernozhukov, and Fernandez-Val (2006) and Koenker (2005). C.III requires the first conditional moment of the latent variable to be finite.
The following theorem derives consistency of the proposed estimator, $\hat\theta = (\hat\beta, \hat\delta^\top)^\top$.
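Theorem 4 Suppose that $\theta_0$ is the unique solution of $E\left[\int \psi_\tau(Y - x\beta_0 - Z^\top\delta_0)\,(x, Z^\top)^\top f_0\,dx\right] = 0$, and that Assumptions C.I–C.III and the conditions of Theorem 3 are satisfied. Then, as $n \to \infty$, $\hat\theta \xrightarrow{p} \theta_0$.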
Proof. See Appendix.
A uniform law of large numbers for the first step estimator, $\hat f(x \mid y, z)$, is a standard
requirement in two-step estimation literature to establish consistency of the second step
estimator; see, e.g., Newey and McFadden (1994). We note that, for the proposed estimator,
this requirement is satisfied by Theorem 3.
4.2.2 Asymptotic normality
Now we derive the limiting distribution of the two-step estimator in (8). Let || · ||∞ be
the supremum norm over the argument. We impose the following assumptions to establish
asymptotic normality.
Assumption G.I: The distribution function $G_Y(y \mid X = x, Z = z)$ is absolutely continuous, with continuous densities $g_Y(y \mid X = x, Z = z)$ uniformly bounded away from 0 and ∞.
Assumption G.II: Let $\Gamma_1 := E[g_Y(X\beta_0 + Z^\top\delta_0 \mid X, Z)\,(X, Z^\top)^\top(X, Z^\top)]$ be positive definite and $V_n := \mathrm{var}[Q_n(\theta_0)]$. There exists a nonnegative definite matrix $V$ such that $V_n \to V$ as $n \to \infty$.
Assumption G.III: $\|\hat f - f_0\|_\infty = o_p(n^{-1/4})$.
Assumption G.IV: The function $f(x \mid y, z) \in \mathcal{F}$ is uniformly smooth, with uniform smoothness order $\delta > \dim(x, y, z)/2$, and Lipschitz.
Conditions G.I and G.II are standard in the QR literature; see, e.g., Koenker (2005).
Condition G.III requires the estimator of the nuisance parameter to converge at a rate faster than $n^{-1/4}$. A similar condition appears as condition (2.4) in Theorem 2 of Chen, Linton, and Van Keilegom (2003). We note that Assumption G.III can be verified for particular examples through Theorem 3. As shown in Theorem 3, the convergence rate is controlled by the smoothness of quantities such as $\phi(\zeta, y, z)$, $\chi(\zeta, y, z)$, and $\omega_1(\zeta)$. Thus, the rate of convergence depends on the possible combinations of smoothness of these quantities. For instance, if $\phi(\zeta, y, z)$ is ordinarily smooth and $\chi(\zeta, y, z)$ and $\omega_1(\zeta)$ are super smooth, a convergence rate of the form $(\ln n)^{-\upsilon}$ for some $\upsilon > 0$ is achieved. This case illustrates a very slow rate of convergence. On the other hand, a faster convergence rate, $n^{-\upsilon}$ for some $\upsilon > 0$, which satisfies Assumption G.III when $\upsilon > 1/4$, can be achieved when $\phi(\zeta, y, z)$ is also super smooth. In addition, if all three quantities, $\phi(\zeta, y, z)$, $\chi(\zeta, y, z)$, and $\omega_1(\zeta)$, are ordinarily smooth, the slow convergence problem is easily avoided. To illustrate this with a specific example, consider the case in which $\phi(\zeta, y, z)$ and $\omega_1(\zeta)$ are super smooth and both correspond to normal distributions with variance $\sigma^2$. Then $\nu_\phi = \nu_\omega \neq 0$ in Assumption B.I. From Theorem 3, one can show that the convergence rate is $n^{\alpha_B/(2(\alpha_L - \alpha_B)) + \eta}$ for some small $\eta > 0$. Because the characteristic function of the normal distribution has a tail of the form $\exp(-(\sigma^2/2)|\zeta|^2)$, we have $\alpha_\phi = \alpha_\omega = -\sigma^2/2$ and $\nu_\phi = \nu_\omega = 2$. As a result, the convergence rate becomes $n^{-1/2 + \eta}$, which is fast enough to satisfy the $o_p(n^{-1/4})$ requirement in Assumption G.III.$^{8}$ Therefore, the rate of convergence required in G.III is attainable under a proper combination of smoothness conditions (see, e.g., Fan (1991), Fan and Truong (1993), and Schennach (2004b)).
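The exponent in this normal example can be traced through Theorem 3 in a few lines. The derivation below assumes, as the example does, that $\nu_\phi = \nu_\omega = \nu$, and takes $(h_x)^{-1} = (c\ln n)^{1/\nu}$ for a constant $c > 0$; choosing $c$ to balance the two terms is our own bookkeeping step, and the polynomial factors $((h_x)^{-1})^{\gamma}$, being powers of $\ln n$, are absorbed by $\eta$.

$$
\begin{aligned}
&\text{bias: } \exp\big(\alpha_B((h_x)^{-1})^{\nu}\big) = n^{c\,\alpha_B}, \qquad
\text{variance: } n^{-1/2}\exp\big(\alpha_L((h_x)^{-1})^{\nu}\big) = n^{-1/2 + c\,\alpha_L},\\
&c\,\alpha_B = -\tfrac{1}{2} + c\,\alpha_L \;\Longrightarrow\; c = \frac{1}{2(\alpha_L - \alpha_B)}, \qquad \text{rate } n^{c\,\alpha_B} = n^{\alpha_B/(2(\alpha_L - \alpha_B))},\\
&\alpha_L = \alpha_\phi \cdot 1 - \alpha_\omega = -\tfrac{\sigma^2}{2} + \tfrac{\sigma^2}{2} = 0 \;\Longrightarrow\; \frac{\alpha_B}{2(0 - \alpha_B)} = -\tfrac{1}{2}.
\end{aligned}
$$

Because $\alpha_L = 0$, the exponent $-1/2$ does not depend on $\sigma^2$ or on $\alpha_B$, which is why the rate $n^{-1/2+\eta}$ holds for any common variance.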
Assumption G.IV is a smoothness condition on the conditional density function. A similar assumption appears in Newey (1994a) and Chernozhukov and Hansen (2006). Condition G.IV allows for a wide variety of nonparametric estimators, including the estimator described in Section 4.1 above. The role of G.IV is to accommodate an estimated density, which serves as the weight in the estimating equation in (8). This condition, together with Theorem 3, ensures that the weight is asymptotically well behaved enough to derive the limiting distribution of the estimator of the structural parameters.
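Asymptotic normality of the estimator, $\hat\theta = (\hat\beta, \hat\delta^\top)^\top$, is established in the following result.
Theorem 5 Suppose that $\theta_0$ is the unique solution of $E\left[\int \psi_\tau(Y - x\beta_0 - Z^\top\delta_0)\,(x, Z^\top)^\top f_0\,dx\right] = 0$, and that Assumptions C.I–C.III, G.I–G.IV, and the conditions of Theorem 3 are satisfied. Then, as $n \to \infty$,
$$\sqrt{n}(\hat\theta - \theta_0) \xrightarrow{d} N(0, \Lambda)$$
for some positive definite matrix $\Lambda = \Gamma_1^{-1} V \Gamma_1^{-1}$.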
Proof. See Appendix.
$^{8}$Similarly, a convergence rate satisfying Assumption G.III can be established for the case of an ordinarily smooth distribution. For instance, consider the Laplace distribution with mean µ and variance σ². The tail of the characteristic function of a Laplace distribution is of the form $|\zeta|^{-2}$. To establish the result, first, the convergence rate can be pinned down with $\nu_\omega = 0$. Second, the condition can be derived by plugging in $\gamma_\phi = \gamma_\omega = -2$.
It is worth noting that the density estimator in equation (7) requires multivariate kernel functions when X and Z are multi-dimensional, which slows down the rate of convergence of the density estimator. As in conventional kernel density estimation, it would be difficult to resolve the problem associated with the high dimensionality of X. On the other hand, one could use semiparametric single index models to mitigate the problem arising from the high dimensionality of Z. The convergence rate in the second stage would not be affected by the multi-dimensionality of X and Z as long as Assumption G.III still holds.
5 Inference
In this section, we turn our attention to inference in the quantile regression (QR) with measurement errors (ME) model. Important questions posed in the econometric and statistical literatures concern the nature of the impact of a policy intervention or treatment on the outcome distributions of interest; for example, whether a policy exerts a significant effect, a constant versus heterogeneous effect, or a non-decreasing effect. It is possible to formulate a wide variety of tests using variants of the proposed method, from simple tests on a single QR coefficient to joint tests involving many covariates simultaneously. We suggest a bootstrap-based inference procedure to test general linear hypotheses.
5.1 Test statistic
General hypotheses on the vector $\theta(\tau) = (\beta(\tau), \delta(\tau)^\top)^\top$ can be accommodated by standard tests. The proposed statistic and the associated limiting theory provide a natural foundation for testing the hypothesis $R\theta(\tau) = r$. Consider the following null hypothesis for a given τ of interest:
$$H_0: R\theta(\tau) - r = 0,$$
where $R$ is a full-rank $q \times \dim(\theta)$ matrix imposing $q$ restrictions on the parameters, and $r$ is a column vector of $q$ elements.
The following are examples of hypotheses that may be considered in this framework.
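Example 1 (No effect of the mismeasured variable). For a given τ, if the mismeasured variable has no effect in the model, then $H_0: \beta(\tau) = 0$, with $\theta(\tau) = (\beta(\tau), \delta(\tau)^\top)^\top$, $R = [1\ \ \mathbf{0}^\top]$, where $\mathbf{0}$ is a vector of zeros, and $r = 0$.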
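Example 2 (Location shifts). Hypotheses of location shifts for $\beta(\tau)$ and $\delta(\tau)$ can also be accommodated in the model. For the first case, $H_0: \beta(\tau) = \beta$, so that, with $\theta(\tau) = (\beta(\tau), \delta(\tau)^\top)^\top$, $R = [1\ \ \mathbf{0}^\top]$, with $\mathbf{0}$ a conformable vector of zeros, and $r = \beta$. For the latter case, $H_0: \delta(\tau) = \delta$, so that $R = [\mathbf{0}\ \ I]$, where $I$ is the identity matrix of the dimension of $\delta$ (this reduces to $R = [0\ \ 1]$ when $\delta$ is scalar), and $r = \delta$.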
Practical implementation of testing procedures for the null hypothesis can be based on the following test statistic:
$$W_n(\tau) = R\hat\theta(\tau) - r. \qquad (9)$$
From Theorem 5, for a given τ and under the null hypothesis, it follows that $\sqrt{n}(R\hat\theta(\tau) - r) \xrightarrow{d} N(0, R\Lambda R^\top)$. A chi-square test of $H_0$ could therefore be based on the statistic in equation (9). However, to carry out such inference, even for a fixed quantile of interest, and to construct a Wald statistic, one would first need to estimate Λ consistently, which in turn requires estimating nuisance parameters that depend on both the unknown $\theta_0$ and $f_0$ in a complicated way. The estimation of Λ is potentially difficult because it contains additional terms arising from the indirect effect of θ on the objective function through $f_0$. An alternative is to use the statistic $W_n$ directly and to rely on the bootstrap to compute critical values and form confidence regions. Therefore, for practical inference we suggest using bootstrap techniques to approximate the limiting distribution.
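For comparison with the bootstrap route, a Wald statistic could be computed directly if consistent estimates of $\Gamma_1$ and $V$ were available. The sketch below assumes such estimates are supplied as inputs (which is precisely the difficult step discussed above) and only assembles the sandwich form $\Lambda = \Gamma_1^{-1} V \Gamma_1^{-1}$ from Theorem 5; the numbers in the usage lines are hypothetical.

```python
import numpy as np

def wald_statistic(theta_hat, R, r, Gamma1_hat, V_hat, n):
    """Wald statistic n (R theta - r)' (R Lambda R')^{-1} (R theta - r),
    using the sandwich Lambda = Gamma1^{-1} V Gamma1^{-1} of Theorem 5.

    Gamma1_hat and V_hat are user-supplied estimates (hypothetical inputs here).
    Under H0 the statistic is asymptotically chi-square with q = rows(R) degrees of freedom.
    """
    R = np.atleast_2d(np.asarray(R, float))
    diff = R @ np.asarray(theta_hat, float) - np.asarray(r, float)
    G_inv = np.linalg.inv(np.asarray(Gamma1_hat, float))
    Lam = G_inv @ np.asarray(V_hat, float) @ G_inv
    middle = R @ Lam @ R.T
    return float(n * diff @ np.linalg.solve(middle, diff))

# Example 1 with a scalar delta: H0: beta(tau) = 0, R = [1 0], r = 0.
W = wald_statistic(theta_hat=[0.2, 1.0], R=[[1.0, 0.0]], r=[0.0],
                   Gamma1_hat=2.0 * np.eye(2), V_hat=4.0 * np.eye(2), n=100)
print(W)  # approximately 4.0, to be compared with the 5% chi-square(1) critical value 3.84
```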
5.2 Implementation of testing procedures
Practical implementation of the proposed tests is simple. To test $H_0$, one needs to compute the test statistic $W_n(\tau)$ for a given τ of interest. The steps for implementing the tests are as follows.
First, the estimates $\hat\theta(\tau)$ are computed by solving the problem in equation (8). Second, $W_n(\tau)$ is calculated by centering $R\hat\theta(\tau)$ at $r$. Third, after obtaining the test statistic, it is necessary to compute the critical values. We propose the following scheme. Take B as a large integer. For each $b = 1, \ldots, B$:
(i) Obtain the resampled data $(Y_i^b, X_{1i}^b, X_{2i}^b, Z_i^b)$, $i = 1, \ldots, n$.
(ii) Estimate $\hat\theta^b(\tau)$ and set $W_n^b(\tau) := R(\hat\theta^b(\tau) - \hat\theta(\tau))$.
(iii) Go back to step (i) and repeat the procedure B times.
Let $c^B_{1-\alpha}$ denote the empirical $(1-\alpha)$-quantile of the simulated sample $W_n^1, \ldots, W_n^B$, where $\alpha \in (0, 1)$ is the nominal size. We reject the null hypothesis if $W_n$ is larger than $c^B_{1-\alpha}$. Confidence intervals for the parameters of interest can be easily constructed by inverting the tests described above.
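The resampling scheme in steps (i) to (iii) can be sketched as follows. The estimator is passed in as a hypothetical callable, since solving equation (8) is outside the scope of this snippet. Resampling rows with replacement and comparing the Euclidean norms of the (possibly vector-valued) statistics are our assumptions about details the text leaves open; the default B = 250 mirrors the number of replications used in the empirical application.

```python
import numpy as np

def bootstrap_test(data, estimate_theta, R, r, alpha=0.05, B=250, seed=0):
    """Bootstrap test of H0: R theta = r.

    data: n x p array whose rows are observations (e.g. columns Y, X1, X2, Z).
    estimate_theta: hypothetical callable mapping a data array to theta_hat.
    Returns (size of W_n, critical value c^B_{1-alpha}, reject flag).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, float)
    R = np.atleast_2d(np.asarray(R, float))
    r = np.asarray(r, float)
    n = data.shape[0]
    theta_hat = np.asarray(estimate_theta(data), float)
    W_n = np.linalg.norm(R @ theta_hat - r)                       # W_n = R theta_hat - r
    W_b = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)                          # (i) resample rows with replacement
        theta_b = np.asarray(estimate_theta(data[idx]), float)    # (ii) re-estimate on the resample
        W_b[b] = np.linalg.norm(R @ (theta_b - theta_hat))        # W_n^b = R(theta^b - theta_hat)
    crit = np.quantile(W_b, 1.0 - alpha)                          # empirical (1 - alpha)-quantile
    return W_n, crit, bool(W_n > crit)

# Toy check with a median as the "estimator": H0: theta = 0 for data centered at 5.
toy = np.random.default_rng(1).normal(5.0, 1.0, size=(500, 1))
W, c, reject = bootstrap_test(toy, lambda d: [np.median(d[:, 0])], R=[[1.0]], r=[0.0])
print(W, c, reject)
```

For a single coefficient selected by $R$, inverting this norm-based test gives the interval $\hat\theta_j \pm c^B_{1-\alpha}$.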
We provide a formal justification of the simulation method. Let $\hat f^*$ be the bootstrap version of the estimator $\hat f$. Consider the following condition.
Assumption G.IB: With $P^*$-probability tending to one, $\|\hat f^* - \hat f\|_\infty = o_{p^*}(n^{-1/4})$.
Condition G.IB is a condition on the density estimation with the bootstrapped sample
and could be verified under the same assumptions implying condition G.III.
Lemma 1 Under Assumptions C.I–C.III, G.I–G.IV, and G.IB with “in probability” replaced by “almost surely”, and the conditions of Theorem 3, the bootstrap estimator of $\theta_0$ is $\sqrt{n}$-consistent and $\sqrt{n}(\hat\theta^* - \hat\theta) \xrightarrow{d} N(0, \Lambda)$ in $P^*$-probability.
Proof. See Appendix.
Lemma 1 establishes the consistency of the bootstrap procedure. It is important to highlight the connection between this result and the previous section. Lemma 1 shows that the limiting distribution of the bootstrap estimator is the same as that in Theorem 5, and hence the above resampling scheme mimics the asymptotic distribution of interest. Thus, the computation of critical values and practical inference are feasible.
6 Monte Carlo simulations
6.1 Monte Carlo design
In this section, we describe the design of a small simulation experiment conducted to assess the finite-sample performance of the proposed two-step estimator discussed in the previous sections. We consider the following model as a data generating process:
$$Y_i = \beta_1 + \beta_2 X_i + \varepsilon_i,$$
where $\varepsilon \sim N(0, 0.25)$, and $\beta_1$ and $\beta_2$ are the parameters of interest.$^{9}$ We set them as $(\beta_1, \beta_2) = (0.5, -0.5)$. The true variable $X$ is not observed by the researcher, and we use additive measurement errors (ME) to generate the mismeasured $X$ as follows:
$$X_{1i} = X_i + U_{1i}, \qquad X_{2i} = X_i + U_{2i},$$
where we generate $X \sim N(0, 1)$ and draw both measurement errors, $U_1$ and $U_2$, from a Laplace distribution $L(0, 0.25)$. We compute and report results for the proposed QR estimator. For comparison, we compute the density $f_{X\mid Y}$ using three different procedures. First, we construct our proposed estimator, which controls for ME, using the variables $(Y, X_1, X_2)$, where the density is estimated by the Fourier estimator. Second, we use the variables $(Y, X)$ to construct an “infeasible” kernel estimator of $f_{X\mid Y}$ in the first step. Finally, the variables $(Y, X_1)$ are used for a “naive” kernel estimator of $f_{X\mid Y}$, which still suffers from ME. For all estimators, we use a fourth-order Gaussian kernel. We approximate the inner summation in equation (8) using Gauss–Hermite quadrature, which is suited to integrals over the whole real line. We perform 1000 simulations with n = 500 and n = 1000. We scan a set of bandwidths for X and Y in order to find the empirically optimal bandwidths in terms of minimizing the mean squared error.
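The design above is straightforward to simulate. The sketch below draws one sample and, to illustrate the attenuation that motivates the correction, compares the least-squares slope on the true $X$ with the slope on the mismeasured $X_1$. Using OLS instead of the parametric median regression discussed in Section 6.2 is our simplification to keep the sketch dependency-free, and the two distributional readings noted in the docstring are assumptions.

```python
import numpy as np

def simulate_dgp(n, beta=(0.5, -0.5), laplace_scale=0.25, seed=0):
    """Draw (Y, X, X1, X2) from the Monte Carlo design of Section 6.1.

    Assumptions (ours): N(0, 0.25) is read as variance 0.25 (sd 0.5), and the second
    argument of L(0, 0.25) is treated as the Laplace scale parameter.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(0.0, 1.0, n)                     # latent regressor
    eps = rng.normal(0.0, 0.5, n)                   # regression error
    Y = beta[0] + beta[1] * X + eps
    X1 = X + rng.laplace(0.0, laplace_scale, n)     # first error-ridden measure
    X2 = X + rng.laplace(0.0, laplace_scale, n)     # repeated measure
    return Y, X, X1, X2

def ols_slope(y, x):
    """Least-squares slope of y on (1, x); used here only to show attenuation."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0][1]

Y, X, X1, X2 = simulate_dgp(200_000)
print(ols_slope(Y, X))    # close to -0.5
print(ols_slope(Y, X1))   # pulled toward zero by var(X) / (var(X) + var(U1))
```

Under the scale reading, $\mathrm{var}(U_1) = 2 \times 0.25^2 = 0.125$, so the naive slope is roughly $-0.5/1.125 \approx -0.44$; a different reading of $L(0, 0.25)$ would change this attenuation factor.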
6.2 Monte Carlo results
We report results for the following statistics of the coefficient $\beta_2$: bias (B), standard deviation (SD), and mean squared error (MSE). First, to illustrate the problem of ME in practice, we consider a model estimation in which the researcher ignores the ME problem and performs a parametric median regression of Y on $X_1$ without correcting for the ME in X. This simple regression yields a bias of 0.1686, a standard error of 0.02655, and an MSE of 0.02586. These results highlight the importance of correcting for the ME problem.
We now present and discuss the results for the nonparametric estimators with and without correction for ME. Tables 1–3 report the finite-sample performance of three different two-step estimators at the median: (i) our proposed estimator (Fourier estimator); (ii) the infeasible kernel estimator; (iii) the naive kernel estimator. These results are for n = 500, but the results for n = 1000 are similar. At the bottom of each table, B, SD, and MSE at the optimal bandwidth are reported. In Table 4 we vary the quantiles and present results for the different estimators across different deciles with n = 1000.
$^{9}$For simplicity, the perfectly-observed covariate Z is absent here.
Tables 1 - 3 Simulation Results
[ABOUT HERE]
Table 1 shows that the proposed estimator is effective in reducing the bias when true X
is measured with errors and repeated measures of the mismeasured covariate are available.
These results are comparable to the infeasible kernel estimator in Table 2. On the other
hand, the results in Table 3 from the naive kernel estimator ignoring ME in X show much
larger bias over all selected bandwidths. Therefore, our estimator outperforms the naive
kernel estimator in terms of both bias and MSE. The minimum MSE for our proposed
method is 0.00674 while the minimum MSE from the naive kernel estimator is 0.01008. This
result confirms that the methods proposed in this paper are beneficial in finite samples when
repeated measures of the mismeasured regressor are available to the researcher.
Table 4 reports finite-sample performance of three estimators over various quantiles with
n = 1000. For simplicity, we use the optimal bandwidths obtained from the simulation
results above. The results confirm that our proposed estimator performs well across different quantile levels.
Table 4 - Simulation Results
[ABOUT HERE]
7 Empirical application
This section illustrates the usefulness of the proposed methods with an empirical example. We study Fazzari, Hubbard, and Petersen’s (1988) investment equation model, where a firm’s investment is regressed on a proxy for investment demand (Tobin’s q) and cash flows. Theory suggests that the correct measure of a firm’s investment demand is marginal Tobin’s q. This measure stems from the relationship that equates a firm’s marginal benefit with its marginal cost in equilibrium. Nevertheless, the presence of financial constraints may distort this relationship by introducing other factors that influence the firm’s optimal investment level. More specifically, financial constraints create a wedge between internal and
external funding that invalidates theoretical arguments in the spirit of Modigliani and Miller
(1958) capital structure irrelevance proposition. In this scenario, a firm’s cash flows reflect the
presence of financial constraints and may contain information relevant for explaining the
differences in investment demand across firms. Following Fazzari, Hubbard, and Petersen
(1988), investment-cash flow sensitivities became a standard metric in the literature that
examines the impact of financing imperfections on corporate investment (Stein (2003)).
Fazzari, Hubbard, and Petersen (1988) develop estimators for the investment equation,
where a firm’s investment is the dependent variable, and independent variables are a proxy
for investment demand (Tobin’s q) and cash flow. Nevertheless, empirical models proposed
to assess the sensitivity of investment demand to firm characteristics are usually fraught
with the presence of measurement error (see e.g., Erickson and Whited (2000), and Almeida,
Campello, and Galvao (2010)). A typical example is the use of the average Tobin’s q for
describing the investment-capital ratio. Theory suggests that firm’s investment demand is
captured by marginal q, but this quantity is unobservable and researchers use instead its
measurable proxy, average q. Since the two variables are not the same, a measurement
problem naturally arises (Hayashi (1982), Poterba (1988)). The introduction of error when
measuring these variables causes bias in least squares estimators. This bias can lead to
erroneous interpretations of the effect of firm characteristics on investment demand. Thus,
Poterba (1988) introduces the idea that errors in measuring Tobin’s q may be responsible for
the observed investment-cash flow sensitivities. If cash flow were correlated with investment
opportunities not well measured by the proxy for the marginal Tobin’s q, investment-cash
flow sensitivities could arise. This argument minimizes the role of financing constraints in
determining the relationship between firms’ cash flows and investment.
A number of studies intend to control for the measurement error in Tobin’s q, while
analyzing the relationship between investment and cash flow. A common approach has been
to use the standard instrumental variables together with ordinary least squares (OLS) and
generalized method of moments (GMM) estimators to correct the measurement error problem
(see, e.g., Almeida, Campello, and Galvao (2010) and Lewellen and Lewellen (2014)). It has
been common to use lags of the observed Tobin’s q as instruments, by assuming that they
are uncorrelated with the error term in the regression equation. Almeida, Campello, and
Galvao (2010) use lagged Tobin’s q as instruments to resolve the measurement error problem.
They estimate standard conditional average models and the results show the importance of
both Tobin’s q and cash flow in investment equation models.
A related method in the literature is to use different proxies to the marginal Tobin’s q
as alternative instruments. Cummins, Hasset, and Oliner (2006) estimate the investment
models using two sources of firm-level data. They construct an analyst-based measure of
average q as well as a market-based measure of average q, and compare estimation results
based on each measure to assess the robustness of the findings in the literature. Estimating
conditional mean models, they find no evidence that the cash flow is a statistically significant
determinant of investment in U.S. companies.
More recently, Agca and Mozumdar (2016) propose using the two measures of q contemplated in Cummins, Hasset, and Oliner (2006) as instruments to control for measurement errors in marginal Tobin’s q. Their approach uses the lags of one of the variables as instruments for the other in the linear errors-in-variables model. This leads them to conclude that cash flow is a statistically significant cause of investment and that investment-cash flow sensitivity is higher for financially constrained firms.$^{10}$
We use quantile regression (QR) methods in the investment equation model. The proposed QR estimator is designed to correct for the measurement error problem while exploring heterogeneous covariate effects across the conditional quantile functions. The QR approach has two advantages. First, our QR method offers a solution to the measurement error problem in Tobin’s q by using repeated measures of this variable. Second, it accommodates possible heterogeneity in the effects of Tobin’s q and cash flow across the conditional distribution of investment, heterogeneity that conventional least squares procedures cannot reveal.
The objective is to estimate the following conditional quantile function:
$$Q_{IK_i}(\tau \mid q_i, CFK_i) = \alpha(\tau) + \beta(\tau)q_i + \delta(\tau)CFK_i, \qquad (10)$$
where the quantity IK denotes the ratio of investment, I, to capital stock, K; CFK is the
ratio of cash flow, CF , to capital stock; and q is the observed measure of average q, which
is a measure of the (latent) true marginal Tobin’s q.
We use a data set taken from Cummins, Hasset, and Oliner (2006). In this data there
are two measures of average q. The first measure is constructed using the standard equity
prices (qe). The second proxy for the firm’s intrinsic value is based on analysts’ earnings
expectations (q). Thus, the data on investment, the capital stock, the market-based measure
of average q, and cash flow are standard from Compustat, while the data on expected earnings are from I/B/E/S International Inc. The construction of qe and q is detailed in Appendix B.2 of Cummins, Hasset, and Oliner (2006). The sample consists of 11,431 observations over the 1982–1999 period. For each firm, in a given year, we have information on investment, cash flow, and two measures of Tobin’s q. For practical estimation, we standardize all variables by subtracting the mean and dividing by the corresponding standard deviation of each variable. The summary statistics (for the standardized data) are described in Table 5.
$^{10}$Erickson and Whited (2000) suggest another alternative solution, which relies on higher-order moments.
Table 5 - Summary Statistics
[ABOUT HERE]
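The standardization step amounts to one line per variable. The sketch below uses the sample standard deviation with `ddof=1`, which is our choice since the text does not specify the divisor. After this transformation, each coefficient in (10) is measured in standard-deviation units: a coefficient on CFK is the change, in standard deviations of IK, of the τ-th conditional quantile associated with a one-standard-deviation change in CFK.

```python
import numpy as np

def standardize(columns):
    """Center each column at its sample mean and divide by its sample standard deviation."""
    A = np.asarray(columns, float)
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

# Hypothetical three-observation panel with two variables.
print(standardize([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]))
```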
We present results for the estimates, and the corresponding 95% confidence intervals, using our proposed method (ME-QR). For completeness, we also provide results from standard QR and OLS, as well as from instrumental variables QR (IV-QR) and OLS (IV-OLS). The QR estimation strategy follows Koenker and Bassett (1978) and does not correct for ME. The IV-QR estimates use the method developed by Chernozhukov and Hansen (2006). For IV-QR and IV-OLS, we use the variable q as an instrument for qe. The IV strategy for QR is based on the assumption that q (the IV) is strongly related to qe but independent of unobservable factors of investment as well as of the ME. We conjecture that the IV approach delivers different estimates than our proposed ME estimator, since the two procedures rely on different sets of conditions. Our method is particularly useful for data sets where the IV is unlikely to be independent of the regression error, which contains the ME on q, since the IV is also potentially mismeasured.$^{11}$
To identify the effects of Tobin’s q and cash flow, we work with the conditions given in Section 2. Assumption A.I (i) requires the ME on q ($U_1$) to have zero conditional mean given the true Tobin’s q ($X$) and the ME of qe ($U_2$). This excludes correlation between $U_1$ and $X$ or between $U_1$ and $U_2$. Hence, a nonclassical reporting error assumption (e.g., Hu and Sasaki (2015b)) is not allowed in our setting.$^{12}$ Nevertheless, Assumption A.I (i) allows nonlinear dependence between the ME ($U_1$) and both the true Tobin’s q ($X$) and the ME of qe ($U_2$). This assumption is particularly useful when the mismeasurement changes with the level of Tobin’s q. Assumption A.I (ii) requires the ME of qe ($U_2$) to be independent of the true Tobin’s q ($X$) as well as of the other variables $(Y, Z)$. This does not necessarily require $U_2$ to have zero mean. Hence, this condition allows for the possibility that, on average, the measure of Tobin’s q based on analysts’ earnings expectations is either larger or smaller than the one based on standard equity prices.
$^{11}$We note that the independence condition required by the IV method implies independence between the ME on q and on qe. Because the analysts’ earnings expectations (q) depend on the standard equity prices (qe), their ME are likely to be related. However, our approach requires the weaker assumption of conditional mean zero, as in Assumption A.I (i).
$^{12}$We acknowledge that although our conditions partially relax the classical ME assumptions, it would be interesting to extend the current results to a fully nonclassical ME model in QR.
[Figure 1 about here: coefficients plotted against quantiles; panels “Tobin's q” and “Cash Flow”; series QR and OLS.]
Figure 1: Quantile Regression and Ordinary Least Squares. The left panel plots the coefficients on Tobin’s q; the right panel plots the coefficients on Cash Flow.
The results for both mean and quantile estimates of the sensitivity of investment to Tobin’s q and cash flow are presented in the left and right panels, respectively, of Figures 1–3. Figure 1 presents the results for QR and OLS. Figure 2 displays the results for IV-QR and IV-OLS. Finally, Figure 3 collects the results for ME-QR. All figures contain point estimates as well as the corresponding 95% pointwise confidence bands. In the nonparametric estimation step of our proposed estimator, for the choice of the bandwidth ($h_{xn}$ in equation (7)), we use an informal rule under which the estimates are not sensitive to marginal changes in the neighborhood of the optimal bandwidth. Other bandwidths are chosen based on Silverman’s rule of thumb. The number of bootstrap replications used to construct the confidence intervals is 250.
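For reference, one common form of Silverman’s rule of thumb for a univariate bandwidth is sketched below. The text does not state which variant is used for the remaining bandwidths, so the constant 0.9 and the robust spread measure $\min(\text{sd}, \text{IQR}/1.34)$ are assumptions; another widely used form is $1.06\,\text{sd}\,n^{-1/5}$.

```python
import numpy as np

def silverman_bandwidth(x):
    """Silverman's rule of thumb: 0.9 * min(sd, IQR / 1.34) * n^(-1/5) (one common variant)."""
    x = np.asarray(x, float)
    sd = x.std(ddof=1)
    q75, q25 = np.percentile(x, [75, 25])
    return 0.9 * min(sd, (q75 - q25) / 1.34) * x.size ** (-0.2)

print(silverman_bandwidth([0.0, 1.0, 2.0, 3.0, 4.0]))
```

The rule is derived for a second-order Gaussian kernel and a normal reference density, so with other kernels it serves only as a starting point.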
The left panels of Figures 1–3 show significantly positive estimates for Tobin’s q. All the quantile estimates in Figures 1–3 display evidence of a positive and increasing effect of investment demand, Tobin’s q, on investment spending across quantiles of the conditional distribution of investment. This result documents empirical evidence of strong heterogeneity in the effect of investment demand across the distribution of investment. Relative to the QR in Figure 1, the IV-QR in Figure 2 shows virtually no difference in the importance of Tobin’s q. This result might be interpreted as the IV approach being ineffective. On the other hand, the ME-QR in Figure 3 shows different results. We note that, according to the theory of the investment equation (see, e.g., Fazzari, Hubbard, and Petersen (1988)) and previous empirical studies (e.g., Kaplan and Zingales (1997)), on average, investment demand has a positive effect on investment spending. This result is confirmed by our ME-QR estimates, although after correcting for ME the estimates decrease relative to standard QR. Moreover, we find evidence that firms respond heterogeneously to changes in investment demand across the conditional distribution of investment.
[Figure 2 about here: coefficients plotted against quantiles; panels “Tobin's q” and “Cash Flow”; series IV-QR and IV-OLS.]
Figure 2: Quantile Regression Instrumental Variables and Two Stage Least Squares. The left panel plots the coefficients on Tobin’s q; the right panel plots the coefficients on Cash Flow.
The results regarding the sensitivity of investment to cash flow are presented in the right panels of Figures 1–3. The right panel of Figure 1 presents the results for standard QR and shows decreasing point estimates of cash flow effects on investment over the quantiles. The mean regression estimate is represented by the horizontal straight line, which shows a positive effect close to 0.1 that is statistically different from zero at the usual significance levels. The results for IV-QR in Figure 2 are virtually the same as those in Figure 1. Thus, again, these results might be interpreted as the instruments being invalid for resolving the measurement error problem.
[Figure 3 about here: coefficients plotted against quantiles; panels “Tobin's q” and “Cash Flow”; series ME-QR.]
Figure 3: Quantile Regression Measurement Error. The left panel plots the coefficients on Tobin’s q; the right panel plots the coefficients on Cash Flow.
The ME-QR estimates in the right panel of Figure 3 exhibit a distinct inverted U-shape, implying larger cash flow sensitivity for firms in the lower quantiles, that is, in the lower part of the conditional distribution of investment. In particular, the ME-QR results show positive and monotonically increasing cash flow sensitivity at low quantiles of the distribution: the estimated coefficient increases from 0.30 to 0.36 up to approximately the 0.45 quantile. The cash flow sensitivity starts to decrease at higher quantiles and becomes relatively smaller at the very top of the conditional investment distribution. These findings uncover several important features. First, they document important heterogeneity in the response of investment spending to cash flow along the conditional quantile function: firms in different quantiles of the conditional distribution of investment respond differently to marginal changes in cash flow. Second, Figure 3 shows evidence of strong heterogeneity with an inverted U-shape. This result can be interpreted in light of the effects of financial constraints on corporate policies, as in Almeida, Campello, and Weisbach (2004). Investment spending is more sensitive to cash flow (larger coefficients) for firms at lower quantiles. The large coefficients at lower quantiles are an intuitive result, because the cash flow coefficient captures the potential sensitivity of investment to fluctuations in available internal finance after controlling for investment opportunities. Thus, the results show evidence that, on the one hand, firms at low levels of investment spending, which are thus likely to be financially constrained, are in fact more exposed to and dependent on fluctuations in internal finance. On the other hand, firms at upper quantiles, which have higher levels of investment spending and are financially unconstrained, are less exposed to and dependent on fluctuations in internal finance. Third, this heterogeneous effect across quantiles also indicates that, for a fixed level of q, the variability of investment spending across the conditional distribution increases as the level of cash flow increases. Intuitively, firms with larger cash flow can invest over a wider range than firms with smaller cash flow.
Overall, the application illustrates that the QR method is an important tool for studying investment equation models. It allows us to study the impacts of Tobin’s q and cash flow at different quantiles of the conditional distribution of investment. The empirical results document that larger cash flow sensitivity occurs in the lower part of the investment distribution, showing evidence that these firms are more exposed to and dependent on fluctuations in internal finance.
8 Conclusion
This paper develops estimation and inference for quantile regression models with measure-
ment errors. We propose a semiparametric two-step estimator assuming availability of re-
peated measures of the true covariate. The asymptotic properties of the estimator are
established. We also develop statistical inference procedures and establish the validity of a
bootstrap approach to implement the methods in practice. Monte Carlo simulations assess the finite-sample performance of the proposed methods and show that it is good. We apply the methods to the investment equation model. The results document strong heterogeneity in the sensitivity of investment to Tobin’s q and cash flow across the conditional distribution of investment, and illustrate that our methods are useful in empirical models where measurement error is an important issue.
Many issues remain to be investigated. In this paper the quantile of interest, τ ∈ (0, 1), is fixed. Extending the results to the uniform case is desirable and important for uniform inference over the entire conditional quantile function, that is, over τ. Such an extension would require generalizing the current results; one of the key steps in the derivations would be to establish stochastic equicontinuity of the appropriately centered scores uniformly over the quantiles. In addition, quantile regression with nonclassical measurement error is a critical direction for future research. There are many potential applications for the proposed methods. Examples include employer-employee matched samples for wages, matched federal agency and firm-level data, bidders’ private signals in auctions, and earnings data in earnings dynamics, all of which would appear to be natural laboratories for further development of quantile regression models with repeated measures.
A Mathematical Appendix
Proof of Theorem 1. Given Assumption A.III, we have
$$\phi(\zeta, y, z) \equiv E[e^{i\zeta X} \mid Y = y, Z = z] = \int E[e^{i\zeta X} \mid Y = y, Z = z, X = x]\, f(x \mid y, z)\,dx = \int f(x \mid y, z)\, e^{i\zeta x}\,dx,$$
where the last expression is the Fourier transform of $f(x \mid y, z)$. Note that for $(x, y, z) \in \operatorname{supp}(X, Y, Z)$,
$$\frac{1}{2\pi}\int \phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta$$
is the inverse Fourier transform of $\phi(\zeta, y, z)$. Thus we have
$$f(x \mid y, z) = \frac{1}{2\pi}\int \phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta.$$
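We now need to show that
$$\phi(\zeta, Y, Z) = \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]}\exp\left( \int_0^\zeta \frac{iE[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi \right).$$
From Assumptions A.I–A.II,
$$\begin{aligned}
D_\xi \ln(E[e^{i\xi X}]) &= \frac{iE[X e^{i\xi X}]}{E[e^{i\xi X}]} = \frac{iE[X e^{i\xi X}]\,E[e^{i\xi U_2}]}{E[e^{i\xi X}]\,E[e^{i\xi U_2}]} = \frac{iE[X e^{i\xi(X + U_2)}]}{E[e^{i\xi(X + U_2)}]} \\
&= \frac{iE[X e^{i\xi(X + U_2)}] + iE[E(U_1 \mid X, U_2)\,e^{i\xi(X + U_2)}]}{E[e^{i\xi X_2}]} \\
&= \frac{iE[X e^{i\xi(X + U_2)}] + iE[E(U_1 e^{i\xi(X + U_2)} \mid X, U_2)]}{E[e^{i\xi X_2}]} \\
&= \frac{iE[X e^{i\xi(X + U_2)}] + iE[U_1 e^{i\xi(X + U_2)}]}{E[e^{i\xi X_2}]} = \frac{iE[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}.
\end{aligned}$$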
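Therefore, for each real ζ,
$$\begin{aligned}
\phi(\zeta, Y, Z) &\equiv E[e^{i\zeta X} \mid Y, Z] = \frac{E[e^{i\zeta X} \mid Y, Z]\,E[e^{i\zeta U_2}]}{E[e^{i\zeta X}]\,E[e^{i\zeta U_2}]}\,E[e^{i\zeta X}] = \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]}\,E[e^{i\zeta X}] \\
&= \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]}\exp\left( \ln(E[e^{i\zeta X}]) - \ln 1 \right) = \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]}\exp\left( \int_0^\zeta D_\xi \ln(E[e^{i\xi X}])\,d\xi \right) \\
&= \frac{E[e^{i\zeta X_2} \mid Y, Z]}{E[e^{i\zeta X_2}]}\exp\left( \int_0^\zeta \frac{iE[X_1 e^{i\xi X_2}]}{E[e^{i\xi X_2}]}\,d\xi \right),
\end{aligned}$$
where the third equality is obtained by $U_2 \perp (Y, X, Z)$.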
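The identity established in Theorem 1 can be checked numerically. The sketch below is an illustration under assumptions that are not taken from the paper. X, U1 and U2 are independent Gaussians with X1 = X + U1 and X2 = X + U2, so that E(U1 | X, U2) = 0 holds. Expectations are replaced by sample means, and the ξ-integral uses the trapezoid rule. The block recovers the unconditional characteristic function E[e^{iζX}], which is the exponential factor in φ(ζ, Y, Z). The helper name `kotlarski_cf` is hypothetical.

```python
import cmath
import math
import random


def kotlarski_cf(x1, x2, zeta, n_grid=40):
    """Estimate E[exp(i*zeta*X)] by exp( int_0^zeta i*E[X1 e^{i xi X2}] / E[e^{i xi X2}] d xi ),
    replacing expectations with sample means and integrating with the trapezoid rule."""
    n = len(x1)

    def integrand(xi):
        phases = [cmath.exp(1j * xi * b) for b in x2]
        num = sum(a * p for a, p in zip(x1, phases)) / n
        den = sum(phases) / n
        return 1j * num / den

    step = zeta / n_grid
    values = [integrand(k * step) for k in range(n_grid + 1)]
    integral = step * (sum(values) - 0.5 * (values[0] + values[-1]))
    return cmath.exp(integral)


if __name__ == "__main__":
    rng = random.Random(0)
    n = 10_000
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x1 = [v + rng.gauss(0.0, 0.5) for v in x]  # first repeated measure (assumed design)
    x2 = [v + rng.gauss(0.0, 0.5) for v in x]  # second repeated measure (assumed design)
    estimate = kotlarski_cf(x1, x2, zeta=1.0)
    print(estimate, math.exp(-0.5))  # true cf of N(0, 1) at zeta = 1
```

In this Gaussian design the ratio E[X1 e^{iξX2}]/E[e^{iξX2}] equals iξ, so the exponent is −ζ²/2. The estimate should therefore be close to e^{−1/2} without ever observing X.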
Proof of Theorem 2. Note that the inverse Fourier transform of $\kappa(h_x\zeta)$ is $k(x/h_x)/h_x$, and the inverse Fourier transform of $E[e^{i\zeta X} \mid Y = y, Z = z]$ is $f(x \mid y, z)$ by equation (13). Also note that, by the convolution theorem, the inverse Fourier transform of the product of $\kappa(h_x\zeta)$ and $E[e^{i\zeta X} \mid Y = y, Z = z]$ is the convolution of the inverse Fourier transform of $\kappa(h_x\zeta)$ with the inverse Fourier transform of $E[e^{i\zeta X} \mid Y = y, Z = z]$. Because Assumptions A.II(iii)–A.IV guarantee the existence of $f(x \mid y, z; h_x)$, we conclude that
$$f(x \mid y, z; h_x) \equiv \int \frac{1}{h_x}k\left( \frac{\tilde{x} - x}{h_x} \right) f(\tilde{x} \mid y, z)\,d\tilde{x} = \frac{1}{2\pi}\int \kappa(h_x\zeta)\,E[e^{i\zeta X} \mid Y = y, Z = z]\exp(-i\zeta x)\,d\zeta = \frac{1}{2\pi}\int \kappa(h_x\zeta)\,\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta.$$
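The smoothed inversion formula in Theorem 2 can be evaluated numerically once φ is known. The sketch assumes a trapezoidal flat-top κ, equal to 1 for |t| ≤ c and decaying linearly to 0 at |t| = 1, in the spirit of Politis and Romano (1999). The paper's exact κ is specified elsewhere. The values c = 0.5 and h_x = 0.1 and the Gaussian φ are illustrative assumptions.

```python
import cmath
import math


def flat_top_kappa(t, c=0.5):
    """Assumed trapezoidal flat-top kernel in the Fourier domain: 1 on |t| <= c, 0 for |t| >= 1."""
    a = abs(t)
    if a <= c:
        return 1.0
    if a >= 1.0:
        return 0.0
    return (1.0 - a) / (1.0 - c)


def smoothed_density(x, cf, h_x, n_steps=4000):
    """f(x; h_x) = (1/2pi) * int kappa(h_x*zeta) * cf(zeta) * exp(-i*zeta*x) d zeta,
    by the trapezoid rule on [-1/h_x, 1/h_x], outside of which kappa(h_x*zeta) vanishes."""
    lo, hi = -1.0 / h_x, 1.0 / h_x
    step = (hi - lo) / n_steps
    total = 0.0 + 0.0j
    for k in range(n_steps + 1):
        zeta = lo + k * step
        weight = 0.5 if k in (0, n_steps) else 1.0
        total += weight * flat_top_kappa(h_x * zeta) * cf(zeta) * cmath.exp(-1j * zeta * x)
    return (total * step / (2.0 * math.pi)).real


def std_normal_cf(zeta):
    return math.exp(-0.5 * zeta * zeta)


if __name__ == "__main__":
    for x in (0.0, 1.0, 2.0):
        exact = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        print(x, smoothed_density(x, std_normal_cf, h_x=0.1), exact)
```

Because κ(h_xζ) = 1 on |ζ| ≤ c/h_x, the smoothing only alters frequencies where the Gaussian φ is already negligible. This is the mechanism behind the bias bound in the proof of Theorem 3.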
The following lemma is helpful in deriving the result given in Theorem 3. Recall that $\mathcal{X} \equiv \mathbb{R}$ is the support of $X$, and $\mathcal{Y} \times \mathcal{Z}$ is a compact set contained in the support of $(Y, Z)$.
Lemma A.1 For $(x, y, z) \in \mathcal{X} \times \mathcal{Y} \times \mathcal{Z}$ and $h_n > 0$,
$$\hat{f}(x \mid y, z; h) - f(x \mid y, z) = B(x, y, z; h_x) + L(x, y, z; h) + R(x, y, z; h),$$
where $B(x, y, z; h_x)$ is a nonrandom “bias term” defined as
$$B(x, y, z; h_x) \equiv f(x \mid y, z; h_x) - f(x \mid y, z);$$
$L(x, y, z; h)$ is a “variance term” admitting the linear representation
$$L(x, y, z; h) \equiv \bar{f}(x \mid y, z; h) - f(x \mid y, z; h_x) = \hat{E}[\ell(x, y, z, h; Y, X_1, X_2, Z)],$$
where $\ell(x, y, z, h; Y, X_1, X_2, Z)$ is defined in the proof of the lemma; and $R(x, y, z; h)$ is a “remainder term,”
$$R(x, y, z; h) \equiv \hat{f}(x \mid y, z; h) - \bar{f}(x \mid y, z; h).$$
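Proof of Lemma A.1. Let $\omega_A(\zeta) \equiv E[A e^{i\zeta X_2}]$, where $A = 1, X_1$, and
$$\omega(\zeta, y, z) \equiv E[e^{i\zeta X_2} \mid Y = y, Z = z] = \int e^{i\zeta x_2} f(x_2 \mid y, z)\,dx_2 = \frac{\chi(\zeta, y, z)}{f(y, z)},$$
where $\chi(\zeta, y, z) \equiv \int e^{i\zeta x_2} f(x_2, y, z)\,dx_2$. Also let $\hat{\omega}_A(\zeta) \equiv \hat{E}[A e^{i\zeta X_2}]$ and $\delta\omega_A(\zeta) \equiv \hat{\omega}_A(\zeta) - \omega_A(\zeta)$, and let
$$\hat{\omega}(\zeta, y, z) \equiv \hat{E}[e^{i\zeta X_2} \mid Y = y, Z = z] \equiv \hat{\chi}(\zeta, y, z)/\hat{f}(y, z),$$
where
$$\hat{\chi}(\zeta, y, z) = \frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}} k_{h_y}(Y_j - y) k_{h_z}(Z_j - z) = \hat{E}[e^{i\zeta X_2} k_{h_y}(Y - y) k_{h_z}(Z - z)],$$
$$\hat{f}(y, z) = \frac{1}{n}\sum_{j=1}^n k_{h_y}(Y_j - y) k_{h_z}(Z_j - z) = \hat{E}[k_{h_y}(Y - y) k_{h_z}(Z - z)],$$
and $\delta\chi(\zeta, y, z) \equiv \hat{\chi}(\zeta, y, z) - \chi(\zeta, y, z)$ and $\delta f(y, z) \equiv \hat{f}(y, z) - f(y, z)$. We use the following representation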
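$$\frac{\hat{\omega}_{X_1}(\zeta)}{\hat{\omega}_1(\zeta)} = \frac{\omega_{X_1}(\zeta) + \delta\omega_{X_1}(\zeta)}{\omega_1(\zeta) + \delta\omega_1(\zeta)} = q_{X_1}(\zeta) + \delta q_{X_1}(\zeta), \tag{11}$$
where $q_{X_1}(\zeta) = \omega_{X_1}(\zeta)/\omega_1(\zeta)$ and where $\delta q_{X_1}(\zeta)$ can be written as either
$$\delta q_{X_1}(\zeta) = \left( \frac{\delta\omega_{X_1}(\zeta)}{\omega_1(\zeta)} - \frac{\omega_{X_1}(\zeta)\,\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2} \right)\left( 1 + \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^{-1}$$
or $\delta q_{X_1}(\zeta) = \delta_1 q_{X_1}(\zeta) + \delta_2 q_{X_1}(\zeta)$ with
$$\delta_1 q_{X_1}(\zeta) \equiv \frac{\delta\omega_{X_1}(\zeta)}{\omega_1(\zeta)} - \frac{\omega_{X_1}(\zeta)\,\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2},$$
$$\delta_2 q_{X_1}(\zeta) \equiv \frac{\omega_{X_1}(\zeta)}{\omega_1(\zeta)}\left( \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^2\left( 1 + \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^{-1} - \frac{\delta\omega_{X_1}(\zeta)}{\omega_1(\zeta)}\,\frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)}\left( 1 + \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^{-1}.$$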
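Similarly,
$$\frac{1}{\hat{\omega}_1(\zeta)} = \frac{1}{\omega_1(\zeta) + \delta\omega_1(\zeta)} = q_1(\zeta) + \delta q_1(\zeta), \tag{12}$$
where $q_1(\zeta) \equiv 1/\omega_1(\zeta)$, and where
$$\delta q_1(\zeta) = \left( -\frac{\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2} \right)\left( 1 + \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^{-1}$$
or $\delta q_1(\zeta) = \delta_1 q_1(\zeta) + \delta_2 q_1(\zeta)$ with
$$\delta_1 q_1(\zeta) \equiv -\frac{\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2}, \qquad \delta_2 q_1(\zeta) \equiv \frac{1}{\omega_1(\zeta)}\left( \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^2\left( 1 + \frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} \right)^{-1}.$$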
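And also
$$\frac{\hat{\chi}(\zeta, y, z)}{\hat{f}(y, z)} = \frac{\chi(\zeta, y, z) + \delta\chi(\zeta, y, z)}{f(y, z) + \delta f(y, z)} = q_2(\zeta, y, z) + \delta q_2(\zeta, y, z), \tag{13}$$
where $q_2(\zeta, y, z) \equiv \chi(\zeta, y, z)/f(y, z)$, and where
$$\delta q_2(\zeta, y, z) = \left( \frac{\delta\chi(\zeta, y, z)}{f(y, z)} - \frac{\chi(\zeta, y, z)\,\delta f(y, z)}{(f(y, z))^2} \right)\left( 1 + \frac{\delta f(y, z)}{f(y, z)} \right)^{-1}$$
or $\delta q_2(\zeta, y, z) = \delta_1 q_2(\zeta, y, z) + \delta_2 q_2(\zeta, y, z)$ with
$$\delta_1 q_2(\zeta, y, z) \equiv \frac{\delta\chi(\zeta, y, z)}{f(y, z)} - \frac{\chi(\zeta, y, z)\,\delta f(y, z)}{(f(y, z))^2},$$
$$\delta_2 q_2(\zeta, y, z) \equiv \frac{\chi(\zeta, y, z)}{f(y, z)}\left( \frac{\delta f(y, z)}{f(y, z)} \right)^2\left( 1 + \frac{\delta f(y, z)}{f(y, z)} \right)^{-1} - \frac{\delta\chi(\zeta, y, z)}{f(y, z)}\,\frac{\delta f(y, z)}{f(y, z)}\left( 1 + \frac{\delta f(y, z)}{f(y, z)} \right)^{-1}.$$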
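Representations (11)–(13) are all instances of one exact algebraic identity for a perturbed ratio (a + da)/(b + db). Representation (12) takes a = 1 and da = 0, and (13) takes a = χ and b = f. The sketch below checks numerically that q + δ₁q + δ₂q reproduces the perturbed ratio exactly. It uses complex inputs, as for characteristic functions, and the helper name is hypothetical. No approximation therefore enters until the second-order term δ₂q is bounded.

```python
def ratio_expansion(a, da, b, db):
    """Split (a + da) / (b + db) into q + d1 + d2 as in (11)-(13):
    q = a/b, d1 is the linear term, d2 collects the higher-order terms."""
    r = db / b                       # relative perturbation of the denominator
    q = a / b
    d1 = da / b - a * db / b**2      # first-order (linearized) term
    d2 = q * r**2 / (1 + r) - (da / b) * r / (1 + r)
    return q, d1, d2


if __name__ == "__main__":
    a, da = 0.8 + 0.3j, 0.05 - 0.02j
    b, db = 0.6 - 0.1j, -0.03 + 0.04j
    q, d1, d2 = ratio_expansion(a, da, b, db)
    print(abs((a + da) / (b + db) - (q + d1 + d2)))  # zero up to rounding
```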
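Let $Q_{X_1}(\zeta) \equiv \int_0^\zeta (i\omega_{X_1}(\xi)/\omega_1(\xi))\,d\xi$ and $\delta Q_{X_1}(\zeta) \equiv \int_0^\zeta (i\hat{\omega}_{X_1}(\xi)/\hat{\omega}_1(\xi))\,d\xi - Q_{X_1}(\zeta)$. Note that, for some random function $\delta\bar{Q}_{X_1}(\zeta)$ such that $|\delta\bar{Q}_{X_1}(\zeta)| \le |\delta Q_{X_1}(\zeta)|$ for all ζ,
$$\exp\left( Q_{X_1}(\zeta) + \delta Q_{X_1}(\zeta) \right) = \exp(Q_{X_1}(\zeta))\left( 1 + \delta Q_{X_1}(\zeta) + \frac{1}{2}\exp(\delta\bar{Q}_{X_1}(\zeta))\left( \delta Q_{X_1}(\zeta) \right)^2 \right). \tag{14}$$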
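Expansion (14) is a second-order Taylor expansion of exp with a mean-value remainder. For a real increment d it reads e^d = 1 + d + ½e^c d² for some c between 0 and d. The sketch below solves for e^c and confirms that it lies between 1 and e^d. It is an illustration restricted to real d, whereas the proof applies the expansion to the complex-valued δQ_{X1}.

```python
import math


def remainder_factor(d):
    """Return exp(c) defined by exp(d) = 1 + d + 0.5 * exp(c) * d**2, for real d != 0."""
    return 2.0 * (math.exp(d) - 1.0 - d) / d**2


if __name__ == "__main__":
    for d in (-2.0, -0.5, 0.1, 1.5):
        factor = remainder_factor(d)
        lower, upper = sorted((1.0, math.exp(d)))
        c = math.log(factor)
        # exp(c) lies between exp(0) and exp(d), so |c| <= |d|
        print(d, factor, lower <= factor <= upper, abs(c) <= abs(d))
```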
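From equations (11)–(14), we have
$$\begin{aligned}
&\hat{f}(x \mid y, z; h) - f(x \mid y, z; h_x) \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\,\hat{\phi}(\zeta, y, z, h^{(2)})\exp(-i\zeta x)\,d\zeta - \frac{1}{2\pi}\int \kappa(h_x\zeta)\,\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\exp(-i\zeta x)\left[ \frac{\hat{\omega}(\zeta, y, z)}{\hat{\omega}_1(\zeta)}\exp\left( \int_0^\zeta \frac{i\hat{\omega}_{X_1}(\xi)}{\hat{\omega}_1(\xi)}\,d\xi \right) - \frac{\omega(\zeta, y, z)}{\omega_1(\zeta)}\exp\left( \int_0^\zeta \frac{i\omega_{X_1}(\xi)}{\omega_1(\xi)}\,d\xi \right) \right] d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\exp(-i\zeta x)\Bigg[ -\frac{\omega(\zeta, y, z)}{\omega_1(\zeta)}\exp\left( \int_0^\zeta \frac{i\omega_{X_1}(\xi)}{\omega_1(\xi)}\,d\xi \right) \\
&\qquad + \left\{ \frac{\chi(\zeta, y, z)}{f(y, z)} + \frac{\delta\chi(\zeta, y, z)}{f(y, z)} - \frac{\chi(\zeta, y, z)\,\delta f(y, z)}{(f(y, z))^2} + \delta_2 q_2(\zeta, y, z) \right\} \times \left\{ \frac{1}{\omega_1(\zeta)} - \frac{\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2} + \delta_2 q_1(\zeta) \right\} \times \exp(Q_{X_1}(\zeta)) \\
&\qquad \times \left\{ 1 + \int_0^\zeta i\delta_1 q_{X_1}(\xi)\,d\xi + \int_0^\zeta i\delta_2 q_{X_1}(\xi)\,d\xi + \frac{1}{2}\exp(\delta\bar{Q}_{X_1}(\zeta))\left( \int_0^\zeta i\delta q_{X_1}(\xi)\,d\xi \right)^2 \right\} \Bigg] d\zeta.
\end{aligned}$$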
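We denote the linearization of $\hat{f}(x \mid y, z; h)$ by $\bar{f}(x \mid y, z; h)$. Then
$$\begin{aligned}
L(x, y, z; h) &\equiv \bar{f}(x \mid y, z; h) - f(x \mid y, z; h_x) \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\exp(-i\zeta x)\exp(Q_{X_1}(\zeta))\Bigg[ -\frac{\chi(\zeta, y, z)}{f(y, z)}\frac{\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2} + \frac{\chi(\zeta, y, z)}{f(y, z)}\frac{1}{\omega_1(\zeta)}\int_0^\zeta i\delta_1 q_{X_1}(\xi)\,d\xi \\
&\qquad + \frac{1}{\omega_1(\zeta)}\frac{\delta\chi(\zeta, y, z)}{f(y, z)} - \frac{1}{\omega_1(\zeta)}\frac{\chi(\zeta, y, z)\,\delta f(y, z)}{(f(y, z))^2} \Bigg] d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\Bigg[ -\frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} + \frac{\delta\chi(\zeta, y, z)}{\chi(\zeta, y, z)} - \frac{\delta f(y, z)}{f(y, z)} \\
&\qquad + \int_0^\zeta \left( \frac{i\delta\omega_{X_1}(\xi)}{\omega_1(\xi)} - \frac{i\omega_{X_1}(\xi)\,\delta\omega_1(\xi)}{(\omega_1(\xi))^2} \right) d\xi \Bigg] d\zeta \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\left( -\frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} + \frac{\delta\chi(\zeta, y, z)}{\chi(\zeta, y, z)} - \frac{\delta f(y, z)}{f(y, z)} \right) d\zeta \\
&\qquad + \frac{1}{2\pi}\int\int_\xi^{\pm\infty} \kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\,d\zeta\left( \frac{i\delta\omega_{X_1}(\xi)}{\omega_1(\xi)} - \frac{i\omega_{X_1}(\xi)\,\delta\omega_1(\xi)}{(\omega_1(\xi))^2} \right) d\xi \\
&= \frac{1}{2\pi}\int \kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z)\left( -\frac{\delta\omega_1(\zeta)}{\omega_1(\zeta)} + \frac{\delta\chi(\zeta, y, z)}{\chi(\zeta, y, z)} - \frac{\delta f(y, z)}{f(y, z)} \right) d\zeta \\
&\qquad + \frac{1}{2\pi}\int\int_\zeta^{\pm\infty} \kappa(h_x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi\left( \frac{i\delta\omega_{X_1}(\zeta)}{\omega_1(\zeta)} - \frac{i\omega_{X_1}(\zeta)\,\delta\omega_1(\zeta)}{(\omega_1(\zeta))^2} \right) d\zeta \\
&= \int \Bigg[ \left( -\frac{1}{2\pi}\frac{1}{\omega_1(\zeta)}\kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z) - \frac{1}{2\pi}\frac{i\omega_{X_1}(\zeta)}{(\omega_1(\zeta))^2}\int_\zeta^{\pm\infty}\kappa(h_x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi \right)\delta\omega_1(\zeta) \\
&\qquad + \left( \frac{1}{2\pi}\frac{i}{\omega_1(\zeta)}\int_\zeta^{\pm\infty}\kappa(h_x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi \right)\delta\omega_{X_1}(\zeta) \\
&\qquad + \left( \frac{1}{2\pi}\frac{1}{\chi(\zeta, y, z)}\kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z) \right)\delta\chi(\zeta, y, z) + \left( -\frac{1}{2\pi}\frac{1}{f(y, z)}\kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z) \right)\delta f(y, z) \Bigg] d\zeta \\
&= \int \Big[ \Psi_1(\zeta, x, y, z, h_x)\left( \hat{E}[e^{i\zeta X_2}] - E[e^{i\zeta X_2}] \right) + \Psi_2(\zeta, x, y, z, h_x)\left( \hat{E}[X_1 e^{i\zeta X_2}] - E[X_1 e^{i\zeta X_2}] \right) \\
&\qquad + \Psi_3(\zeta, x, y, z, h_x)\left( \hat{E}[e^{i\zeta X_2} k_{h_y}(Y - y) k_{h_z}(Z - z)] - E[e^{i\zeta X_2} k_{h_y}(Y - y) k_{h_z}(Z - z)] \right) \\
&\qquad + \Psi_4(\zeta, x, y, z, h_x)\left( \hat{E}[k_{h_y}(Y - y) k_{h_z}(Z - z)] - E[k_{h_y}(Y - y) k_{h_z}(Z - z)] \right) \Big] d\zeta \\
&= \hat{E}\Big[ \int \Big\{ \Psi_1(\zeta, x, y, z, h_x)\left( e^{i\zeta X_2} - E[e^{i\zeta X_2}] \right) + \Psi_2(\zeta, x, y, z, h_x)\left( X_1 e^{i\zeta X_2} - E[X_1 e^{i\zeta X_2}] \right) \\
&\qquad + \Psi_3(\zeta, x, y, z, h_x)\left( e^{i\zeta X_2} k_{h_y}(Y - y) k_{h_z}(Z - z) - E[e^{i\zeta X_2} k_{h_y}(Y - y) k_{h_z}(Z - z)] \right) \\
&\qquad + \Psi_4(\zeta, x, y, z, h_x)\left( k_{h_y}(Y - y) k_{h_z}(Z - z) - E[k_{h_y}(Y - y) k_{h_z}(Z - z)] \right) \Big\} d\zeta \Big] \\
&\equiv \hat{E}[\ell(x, y, z, h; Y, X_1, X_2, Z)],
\end{aligned}$$
where the following identity was used in the fourth equality: for any absolutely integrable function $g$,
$$\int_{-\infty}^{\infty}\int_0^\zeta g(\zeta, \xi)\,d\xi\,d\zeta = \int_0^\infty\int_\xi^\infty g(\zeta, \xi)\,d\zeta\,d\xi + \int_{-\infty}^0\int_\xi^{-\infty} g(\zeta, \xi)\,d\zeta\,d\xi \equiv \int\int_\xi^{\pm\infty} g(\zeta, \xi)\,d\zeta\,d\xi,$$
and where
$$\begin{aligned}
\Psi_1(\zeta, x, y, z, h_x) &\equiv -\frac{1}{2\pi}\frac{1}{\omega_1(\zeta)}\kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z) - \frac{1}{2\pi}\frac{i\omega_{X_1}(\zeta)}{(\omega_1(\zeta))^2}\int_\zeta^{\pm\infty}\kappa(h_x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi, \\
\Psi_2(\zeta, x, y, z, h_x) &\equiv \frac{1}{2\pi}\frac{i}{\omega_1(\zeta)}\int_\zeta^{\pm\infty}\kappa(h_x\xi)\exp(-i\xi x)\,\phi(\xi, y, z)\,d\xi, \\
\Psi_3(\zeta, x, y, z, h_x) &\equiv \frac{1}{2\pi}\frac{1}{\chi(\zeta, y, z)}\kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z), \\
\Psi_4(\zeta, x, y, z, h_x) &\equiv -\frac{1}{2\pi}\frac{1}{f(y, z)}\kappa(h_x\zeta)\exp(-i\zeta x)\,\phi(\zeta, y, z).
\end{aligned}$$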
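The change in the order of integration used in the fourth equality can be checked numerically. The sketch below is an illustration with an assumed test function g(ζ, ξ) = exp(−ζ² − (ξ − 0.5)²). This g is chosen because both inner integrals have closed forms in terms of the error function. The block compares the two sides of the identity after truncating the outer integrals to [−8, 8].

```python
import math

SQRT_PI = math.sqrt(math.pi)
SHIFT = 0.5  # assumed: g(zeta, xi) = exp(-zeta**2 - (xi - SHIFT)**2)


def trapezoid(f, lo, hi, n=4000):
    """Composite trapezoid rule for the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi)) + sum(f(lo + k * step) for k in range(1, n))
    return total * step


def lhs():
    """int_{-inf}^{inf} int_0^zeta g(zeta, xi) d xi d zeta, outer range cut to [-8, 8]."""
    def integrand(z):
        inner = 0.5 * SQRT_PI * (math.erf(z - SHIFT) - math.erf(-SHIFT))
        return math.exp(-z * z) * inner
    return trapezoid(integrand, -8.0, 8.0)


def rhs():
    """The two reordered pieces on the right-hand side of the identity."""
    def weight(xi):
        return math.exp(-(xi - SHIFT) ** 2)

    def positive(xi):  # xi >= 0: int_xi^{inf} exp(-zeta^2) d zeta
        return weight(xi) * 0.5 * SQRT_PI * math.erfc(xi)

    def negative(xi):  # xi <= 0: int_xi^{-inf} exp(-zeta^2) d zeta
        return -weight(xi) * 0.5 * SQRT_PI * math.erfc(-xi)

    return trapezoid(negative, -8.0, 0.0) + trapezoid(positive, 0.0, 8.0)


if __name__ == "__main__":
    closed_form = 0.5 * math.pi * (math.erf(SHIFT) - math.erf(SHIFT / math.sqrt(2.0)))
    print(lhs(), rhs(), closed_form)
```

The ξ-integral on the right is split at 0 because the two pieces have different inner limits. Mixing them at ξ = 0 would double-count that endpoint with the wrong sign.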
We use the following convenient notation for expositional simplicity.
Definition A.1 We write $f(\zeta) \lesssim g(\zeta)$ for $f, g: \mathbb{R} \mapsto \mathbb{R}$ when there exists a constant $C > 0$, independent of ζ, such that $f(\zeta) \le C g(\zeta)$ for all $\zeta \in \mathbb{R}$ (and similarly for $\gtrsim$). Analogously, we write $a_n \lesssim b_n$ for two sequences $a_n, b_n$ when there exists a constant $C$ independent of $n$ such that $a_n \le C b_n$ for all $n \in \mathbb{N}$.
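Proof of Theorem 3. To obtain the uniform convergence rate of $\hat{f}(x \mid y, z; h)$, we derive the asymptotic convergence rate of the bias term and the divergence rate of the variance term, and rely on the negligibility of the remainder term. First, from Parseval’s identity and Assumption A.IV, we have
$$\begin{aligned}
|B(x, y, z, h_x)| &= |f(x \mid y, z; h_x) - f(x \mid y, z)| = |f(x \mid y, z; h_x) - f(x \mid y, z; 0)| \\
&= \left| \frac{1}{2\pi}\int \kappa(h_x\zeta)\,\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta - \frac{1}{2\pi}\int \phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta \right| \\
&= \left| \frac{1}{2\pi}\int (\kappa(h_x\zeta) - 1)\,\phi(\zeta, y, z)\exp(-i\zeta x)\,d\zeta \right| \le \frac{1}{2\pi}\int |\kappa(h_x\zeta) - 1|\,|\phi(\zeta, y, z)|\,d\zeta \\
&= \frac{1}{\pi}\int_{\bar{\xi}/h_x}^\infty |\kappa(h_x\zeta) - 1|\,|\phi(\zeta, y, z)|\,d\zeta \lesssim \int_{\bar{\xi}/h_x}^\infty |\phi(\zeta, y, z)|\,d\zeta.
\end{aligned}$$
Then, by Assumption B.I(ii), we have
$$\begin{aligned}
\sup_{(x, y, z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}} |B(x, y, z, h_x)| &\lesssim \int_{\bar{\xi}/h_x}^\infty C_\phi(1 + |\zeta|)^{\gamma_\phi}\exp(\alpha_\phi|\zeta|^{\nu_\phi})\,d\zeta \lesssim \int_{\bar{\xi}/h_x}^\infty (1 + |\zeta|)^{\gamma_\phi}\exp(\alpha_\phi|\zeta|^{\nu_\phi})\,d\zeta \\
&= O\left( (\bar{\xi}/h_x)^{\gamma_\phi + 1}\exp\left( \alpha_\phi(\bar{\xi}/h_x)^{\nu_\phi} \right) \right) = O\left( (h_x)^{-\gamma_B}\exp\left( \alpha_B(h_x)^{-\nu_B} \right) \right).
\end{aligned} \tag{15}$$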
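For the asymptotic divergence rate of the variance term, let
$$a_n = \frac{(\ln n)^{1/2}}{(n h_y h_z)^{1/2}} + \sum_{s \in \{y, z\}}(h_s)^2,$$
and define
$$\Psi^+(h) \equiv \int \Psi_1^+(\zeta, h_x)\,d\zeta + \int \Psi_2^+(\zeta, h_x)\,d\zeta + a_n\int \Psi_3^+(\zeta, h_x)\,d\zeta + a_n\int \Psi_4^+(\zeta, h_x)\,d\zeta,$$
where $\Psi_A^+(\zeta, h_x) \equiv \sup_{(x, y, z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}|\Psi_A(\zeta, x, y, z, h_x)|$ for $A \in \{1, 2, 3, 4\}$. From Assumptions A.IV and B.II, and from arguments similar to those above, one can show that
$$E\left( n|\delta\omega_1(\zeta)|^2 \right) \lesssim 1, \quad E\left( n\left| a_n^{-1}\cdot\sup_{(y, z) \in \mathcal{Y}\times\mathcal{Z}}\delta\chi(\zeta, y, z) \right|^2 \right) \lesssim 1,$$
$$E\left( n|\delta\omega_{X_1}(\zeta)|^2 \right) \lesssim 1, \quad E\left( n\left| a_n^{-1}\cdot\sup_{(y, z) \in \mathcal{Y}\times\mathcal{Z}}\delta f(y, z) \right|^2 \right) \lesssim 1,$$
and
$$\begin{aligned}
\int \Psi_1^+(\zeta, h_x)\,d\zeta &\lesssim (1 + (h_x)^{-1})^{\gamma_\mu + \gamma_\phi - \gamma_\omega + 2}\exp\left( -\alpha_\omega((h_x)^{-1})^{\nu_\omega} \right)\exp\left( \alpha_\phi((h_x)^{-1})^{\nu_\phi} \right), \\
\int \Psi_2^+(\zeta, h_x)\,d\zeta &\lesssim (1 + (h_x)^{-1})^{\gamma_\phi - \gamma_\omega + 2}\exp\left( -\alpha_\omega((h_x)^{-1})^{\nu_\omega} \right)\exp\left( \alpha_\phi((h_x)^{-1})^{\nu_\phi} \right), \\
a_n\int \Psi_3^+(\zeta, h_x)\,d\zeta &\lesssim a_n(1 + (h_x)^{-1})^{\gamma_\phi - \gamma_\omega + 1}\exp\left( -\alpha_\omega((h_x)^{-1})^{\nu_\omega} \right)\exp\left( \alpha_\phi((h_x)^{-1})^{\nu_\phi} \right), \\
a_n\int \Psi_4^+(\zeta, h_x)\,d\zeta &\lesssim a_n(1 + (h_x)^{-1})^{\gamma_\phi + 1}\exp\left( \alpha_\phi((h_x)^{-1})^{\nu_\phi} \right).
\end{aligned}$$
Then we have
$$\Psi^+(h) = O\left( ((h_x)^{-1})^{\gamma_\mu + \gamma_\phi - \gamma_\omega + 2}\exp\left( (\alpha_\phi 1\{\nu_\phi = \nu_\omega\} - \alpha_\omega)((h_x)^{-1})^{\nu_\omega} \right) \right).$$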
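The rate a_n adds the stochastic order of the kernel estimates of χ and f to their smoothing bias. The following is a purely illustrative computation. It assumes h_y = h_z = n^(-1/6), a choice not taken from the paper that balances the two terms up to the log factor. Under that choice, a_n shrinks at roughly the rate (ln n)^(1/2) n^(-1/3).

```python
import math


def a_n(n, h_y, h_z):
    """a_n = (ln n)^{1/2} / (n h_y h_z)^{1/2} + h_y^2 + h_z^2."""
    return math.sqrt(math.log(n) / (n * h_y * h_z)) + h_y**2 + h_z**2


if __name__ == "__main__":
    for n in (10**3, 10**4, 10**5, 10**6):
        h = n ** (-1.0 / 6.0)  # assumed bandwidth rate for two conditioning variables
        print(n, round(h, 4), round(a_n(n, h, h), 4))
```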
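Note that, by Minkowski’s inequality,
$$\begin{aligned}
E\left[ \sup_{(x, y, z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}|L(x, y, z, h)| \right] &= E\left[ \sup_{(x, y, z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}|\bar{f}(x \mid y, z; h) - f(x \mid y, z; h_x)| \right] \\
&= E\left[ \sup_{(x, y, z)}\left| \int \left[ \Psi_1(\zeta, x, y, z, h_x)\,\delta\omega_1(\zeta) + \Psi_2(\zeta, x, y, z, h_x)\,\delta\omega_{X_1}(\zeta) + \Psi_3(\zeta, x, y, z, h_x)\,\delta\chi(\zeta, y, z) + \Psi_4(\zeta, x, y, z, h_x)\,\delta f(y, z) \right] d\zeta \right| \right] \\
&\le E\int \Big[ \Big( \sup_{(x, y, z)}|\Psi_1(\zeta, x, y, z, h_x)| \Big)|\delta\omega_1(\zeta)| + \Big( \sup_{(x, y, z)}|\Psi_2(\zeta, x, y, z, h_x)| \Big)|\delta\omega_{X_1}(\zeta)| \\
&\qquad + \Big( \sup_{(x, y, z)}|\Psi_3(\zeta, x, y, z, h_x)| \Big)\Big( \sup_{(y, z)}|\delta\chi(\zeta, y, z)| \Big) + \Big( \sup_{(x, y, z)}|\Psi_4(\zeta, x, y, z, h_x)| \Big)\Big( \sup_{(y, z)}|\delta f(y, z)| \Big) \Big] d\zeta \\
&\le \int \Big[ \Psi_1^+(\zeta, h_x)\,E\left( |\delta\omega_1(\zeta)|^2 \right)^{1/2} + \Psi_2^+(\zeta, h_x)\,E\left( |\delta\omega_{X_1}(\zeta)|^2 \right)^{1/2} \\
&\qquad + a_n\Psi_3^+(\zeta, h_x)\,E\left( \Big| a_n^{-1}\cdot\sup_{(y, z)}\delta\chi(\zeta, y, z) \Big|^2 \right)^{1/2} + a_n\Psi_4^+(\zeta, h_x)\,E\left( \Big| a_n^{-1}\cdot\sup_{(y, z)}\delta f(y, z) \Big|^2 \right)^{1/2} \Big] d\zeta \\
&\le n^{-1/2}\Big[ \int \Psi_1^+(\zeta, h_x)\,E\left( n|\delta\omega_1(\zeta)|^2 \right)^{1/2} d\zeta + \int \Psi_2^+(\zeta, h_x)\,E\left( n|\delta\omega_{X_1}(\zeta)|^2 \right)^{1/2} d\zeta \\
&\qquad + a_n\int \Psi_3^+(\zeta, h_x)\,E\left( n\Big| a_n^{-1}\cdot\sup_{(y, z)}\delta\chi(\zeta, y, z) \Big|^2 \right)^{1/2} d\zeta + a_n\int \Psi_4^+(\zeta, h_x)\,E\left( n\Big| a_n^{-1}\cdot\sup_{(y, z)}\delta f(y, z) \Big|^2 \right)^{1/2} d\zeta \Big] \\
&\lesssim n^{-1/2}\,\Psi^+(h),
\end{aligned}$$
where the suprema over $(x, y, z)$ and $(y, z)$ are taken over $\mathcal{X}\times\mathcal{Y}\times\mathcal{Z}$ and $\mathcal{Y}\times\mathcal{Z}$, respectively.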
Thus, by Markov’s inequality,
$$\sup_{(x, y, z) \in \mathcal{X}\times\mathcal{Y}\times\mathcal{Z}}|L(x, y, z, h)| = O_p\left( n^{-1/2}((h_x)^{-1})^{\gamma_\mu + \gamma_\phi - \gamma_\omega + 2}\exp\left( (\alpha_\phi 1\{\nu_\phi = \nu_\omega\} - \alpha_\omega)((h_x)^{-1})^{\nu_\omega} \right) \right). \tag{16}$$
From Assumptions B.II–B.III, the selection of the bandwidths in the statement of the theorem, and a minor adjustment of the argument for the variance term above, one can show that the remainder term is asymptotically negligible; the detailed proof is omitted for brevity. Putting equations (15) and (16) together then yields the result.
Proof of Theorem 4. To show consistency of the estimator, we apply Theorem 1 of Chen, Linton, and Van Keilegom (2003). Thus, we need to verify Conditions (1.1)–(1.5’) in Chen, Linton, and Van Keilegom (2003). Recall that
$$Q_n(\beta, \delta, f) = \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx,$$
and
$$Q(\beta, \delta, f) = E\int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx.$$
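a) First, we verify Condition (1.1) by showing that $Q_n(\hat{\beta}, \hat{\delta}, \hat{f}) = o_p(n^{-1/2})$, which implies the desired property. This result is standard in the quantile regression literature and generally follows from the boundedness of the score function and the computational property of quantile regression. Consider the following:
$$\begin{aligned}
\|Q_n(\hat{\beta}, \hat{\delta}, \hat{f})\| &= \Big\| \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x\hat{\beta} - Z_i^\top\hat{\delta})[x\ Z_i]\cdot\hat{f}(x \mid Y_i, Z_i)\,dx \Big\| \\
&= \Big\| \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x\hat{\beta} - Z_i^\top\hat{\delta})[x\ Z_i]\cdot\left( \hat{f}(x \mid Y_i, Z_i) - f(x \mid Y_i, Z_i) \right) dx + \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x\hat{\beta} - Z_i^\top\hat{\delta})[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx \Big\| \\
&\le \Big\| \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x\hat{\beta} - Z_i^\top\hat{\delta})[x\ Z_i]\cdot\left( \hat{f}(x \mid Y_i, Z_i) - f(x \mid Y_i, Z_i) \right) dx \Big\| + \Big\| \frac{1}{n}\sum_{i=1}^n \int \psi(Y_i - x\hat{\beta} - Z_i^\top\hat{\delta})[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx \Big\| \\
&\le C\cdot\Big\| \int \left( \hat{f}(x \mid Y_i, Z_i) - f(x \mid Y_i, Z_i) \right) dx \Big\|_\infty\cdot\frac{1}{n}\sup_{x, i}\|[x\ Z_i]\| + C\cdot\Big\| \int f(x \mid Y_i, Z_i)\,dx \Big\|_\infty\cdot\frac{1}{n}\sup_{x, i}\|[x\ Z_i]\| \\
&= o_p(n^{-1/2}),
\end{aligned}$$
where $C$ is a generic constant, the first inequality follows from the triangle inequality, and the last equality follows from $|\psi(\cdot)| \le 1$ and the computational property of quantile regression (Koenker and Bassett (1978), Theorem 3), from $\frac{1}{n}\sup_{x, i}\|[x\ Z_i]\| = o_p(n^{-1/2})$ by Assumption C.II, and from $f(x \mid Y_i, Z_i)$ being integrable by Assumption A.III(ii). The last assertion follows from C.II and an application of Markov’s inequality, since
$$P\left( \sup_{x, i}\|[x\ Z_i]\| > n^{1/2} \right) \le nP\left( \|[x\ Z]\| > n^{1/2} \right) \le \frac{nE\|[x\ Z]\|^{2+\varepsilon}}{n^{(2+\varepsilon)/2}} = o(1).$$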
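The computational property invoked above can be seen in the simplest case, an intercept-only quantile regression whose solution is an order statistic. The sketch below is an illustration, not the estimator of the paper. It takes the score ψ_τ(u) = τ − 1{u < 0} and lets q̂ be the ⌈nτ⌉-th order statistic of n distinct values. The averaged score (1/n) Σ ψ_τ(Y_i − q̂) then lies in (0, 1/n], so it is o(n^{-1/2}).

```python
import math
import random


def psi(u, tau):
    """Quantile-regression score psi_tau(u) = tau - 1{u < 0}."""
    return tau - (1.0 if u < 0 else 0.0)


def sample_quantile(ys, tau):
    """Intercept-only QR solution: the ceil(n * tau)-th order statistic."""
    ordered = sorted(ys)
    k = max(1, math.ceil(len(ys) * tau))
    return ordered[k - 1]


def mean_score(ys, tau):
    """(1/n) * sum_i psi_tau(Y_i - q_hat) at the fitted quantile."""
    q_hat = sample_quantile(ys, tau)
    return sum(psi(y - q_hat, tau) for y in ys) / len(ys)


if __name__ == "__main__":
    rng = random.Random(7)
    ys = [rng.gauss(0.0, 1.0) for _ in range(1001)]
    for tau in (0.1, 0.25, 0.5, 0.9):
        print(tau, mean_score(ys, tau), 1.0 / len(ys))
```

With k = ⌈nτ⌉ and no ties, exactly k − 1 observations fall below q̂. The averaged score is therefore τ − (k − 1)/n, which is positive and at most 1/n.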
b) Condition (1.2) holds directly by the identification condition C.I.
c) Now we show that Condition (1.3) is satisfied by verifying that $Q(\beta, \delta, f)$ is continuous in $f$ uniformly for all $(\beta, \delta^\top)^\top \in \Theta$. For any $\|f - f_0\|_\infty \le \varepsilon$,
$$\begin{aligned}
\|Q(\beta, \delta, f) - Q(\beta, \delta, f_0)\| &= \Big\| E\int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i)\,dx - E\int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f_0(x \mid Y_i, Z_i)\,dx \Big\| \\
&= \Big\| E\int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot[f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)]\,dx \Big\| \\
&\le E\int \|[x\ Z_i]\|\cdot\|f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\|_\infty\,dx.
\end{aligned}$$
The inequality holds by exchanging norms and integrals, the Cauchy inequality, and the fact that $|\psi(\cdot)| \le 1$. By Assumptions C.II and C.III, $E\int \|[x\ Z_i]\|\cdot\|f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\|_\infty\,dx < C$, and given that $\|f - f_0\|_\infty \le \varepsilon$, the result follows as $\varepsilon \to 0$. Therefore, Condition (1.3) holds.
d) Condition (1.4) is satisfied by our Theorem 3.
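e) It only remains to verify Condition (1.5’): for any $\varepsilon_n = o(1)$,
$$\sup_{(\beta, \delta) \in \Theta,\ \|f - f_0\|_\infty \le \varepsilon_n}\|Q_n(\beta, \delta, f) - Q(\beta, \delta, f)\| = o_p(1).$$
Noting that
$$\begin{aligned}
\|Q_n(\beta, \delta, f) - Q(\beta, \delta, f)\| &= \Big\| \int \Big( \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i) - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i) \Big) dx \Big\| \\
&\le \Big\| \int \sup_i f(x \mid Y_i, Z_i)\Big( \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] \Big) dx \Big\| \\
&\le \Big\| \int \sup_i f(x \mid Y_i, Z_i)\,dx \Big\|\cdot\Big\| \sup_x\Big( \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] \Big) \Big\|,
\end{aligned}$$
we have
$$\sup_{(\beta, \delta) \in \Theta,\ \|f - f_0\|_\infty \le \varepsilon_n}\|Q_n(\beta, \delta, f) - Q(\beta, \delta, f)\| \le C\cdot\sup_{(\beta, \delta) \in \Theta,\ x}\Big\| \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] \Big\|,$$
since $\|\int \sup_i f(x \mid Y_i, Z_i)\,dx\| < C$ by Assumption A.III. Denote $\phi_{\beta, \delta}(Y_i, x, Z_i) = \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]$. We need to show that $\{\phi_{\beta, \delta} : (\beta, \delta) \in \Theta\}$ is Glivenko–Cantelli (G-C). Because $\{\psi(Y_i - x\beta - Z_i^\top\delta) : (\beta, \delta) \in \Theta\}$ is bounded and VC, it is G-C. Also, $E[|X|] < \infty$ and $E[\|Z\|] < \infty$ by Assumption C.II. Finally, $\mathcal{F}_{\varepsilon_n} = \{f : \|f - f_0\|_\infty \le \varepsilon_n\}$ is G-C by Theorem 3. These conditions and Corollary 9.27(ii) of Kosorok (2008) lead to our conclusion.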
Proof of Theorem 5. We now apply Theorem 2 of Chen, Linton, and Van Keilegom (2003) to establish weak convergence. We need to check their Conditions (2.1)–(2.6).
a) Condition (2.1) was verified in the first part of the proof of Theorem 4 above.
b) To verify Condition (2.2), note that
$$\begin{aligned}
Q(\beta, \delta, f_0) &= E\int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f_0(x \mid Y_i, Z_i)\,dx = E\,E[\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] \mid Y_i, Z_i] = E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] \\
&= E\,E[\psi(Y - x\beta - Z^\top\delta)[x\ Z] \mid X, Z] = E\,E[\psi(Y - x\beta - Z^\top\delta) \mid X, Z][x\ Z] = E(\tau - G_Y(x\beta + Z^\top\delta))[x\ Z].
\end{aligned}$$
The derivative with respect to $(\beta, \delta)$, denoted by $\Gamma_1(\beta, \delta, f)$, is $-E\{g_Y(x\beta + Z^\top\delta)[x\ Z][x\ Z^\top]^\top\}$. It is continuous in $(\beta, \delta)$ at $(\beta_0, \delta_0)$, and $-\Gamma_1$ is positive definite by Assumptions G.I and G.II.
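The step E[ψ(Y − a) | X, Z] = τ − G_Y(a), and the resulting derivative −g_Y, can be illustrated by simulation. The sketch below assumes, for illustration only, that Y ~ N(0, 1) is independent of (X, Z), so that G_Y is the standard normal CDF Φ. The Monte Carlo mean of ψ_τ(Y − a) should then match τ − Φ(a). A central difference of τ − Φ(a) should match minus the standard normal density at a.

```python
import math
import random


def psi(u, tau):
    """Quantile-regression score psi_tau(u) = tau - 1{u < 0}."""
    return tau - (1.0 if u < 0 else 0.0)


def normal_cdf(a):
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


def normal_pdf(a):
    return math.exp(-0.5 * a * a) / math.sqrt(2.0 * math.pi)


def mc_mean_score(tau, a, n=200_000, seed=0):
    """Monte Carlo estimate of E[psi_tau(Y - a)] for Y ~ N(0, 1) (assumed design)."""
    rng = random.Random(seed)
    return sum(psi(rng.gauss(0.0, 1.0) - a, tau) for _ in range(n)) / n


if __name__ == "__main__":
    tau, a, eps = 0.25, 0.3, 1e-5
    print(mc_mean_score(tau, a), tau - normal_cdf(a))
    slope = ((tau - normal_cdf(a + eps)) - (tau - normal_cdf(a - eps))) / (2 * eps)
    print(slope, -normal_pdf(a))
```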
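c) Now we verify Condition (2.3). We first calculate the pathwise derivative of $Q(\beta, \delta, f)$ at $f_0$:
$$\Gamma_2(\beta, \delta, f_0)[f - f_0] = [Q(\beta, \delta, f_0 + \zeta(f - f_0)) - Q(\beta, \delta, f_0)]/\zeta = E\int \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot[f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)]\,dx.$$
For any $\varepsilon_n \downarrow 0$ such that $\|(\beta, \delta) - (\beta_0, \delta_0)\| \le \varepsilon_n$ and $\|f - f_0\|_\infty \le \varepsilon_n$,
$$\|Q(\beta, \delta, f) - Q(\beta, \delta, f_0) - \Gamma_2(\beta, \delta, f_0)[f - f_0]\| = 0$$
and
$$\begin{aligned}
&\|\Gamma_2(\beta, \delta, f_0)[f - f_0] - \Gamma_2(\beta_0, \delta_0, f_0)[f - f_0]\| \\
&= \Big\| E\int [\psi(Y_i - x\beta - Z_i^\top\delta) - \psi(Y_i - x\beta_0 - Z_i^\top\delta_0)][x\ Z_i]\cdot[f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)]\,dx \Big\| \\
&\le E\int |\psi(Y_i - x\beta - Z_i^\top\delta) - \psi(Y_i - x\beta_0 - Z_i^\top\delta_0)|\cdot\|[x\ Z_i]\|\,\|f(x \mid Y_i, Z_i) - f_0(x \mid Y_i, Z_i)\|_\infty\,dx = o(1).
\end{aligned}$$
The first inequality holds by exchanging the norm and the integrals. The last equality holds because the domain of integration is $O(1)$ by the indicator function, Assumptions C.II and C.III, and $\|f - f_0\|_\infty \le \varepsilon_n$. The result follows as $\varepsilon_n \to 0$.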
d) Condition (2.4) holds by Assumption G.III.
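e) Now we verify Condition (2.5’):
$$\sup_{\|\beta - \beta_0\| \le \varepsilon_n,\ \|\delta - \delta_0\| \le \varepsilon_n,\ \|f - f_0\|_\infty \le \varepsilon_n}\|Q_n(\beta, \delta, f) - Q(\beta, \delta, f) - Q_n(\beta_0, \delta_0, f_0)\| = o_p(1/\sqrt{n}).$$
Note that
$$\begin{aligned}
&\|Q_n(\beta, \delta, f) - Q(\beta, \delta, f) - Q_n(\beta_0, \delta_0, f_0)\| \\
&= \Big\| \int \Big( \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i) - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i]\cdot f(x \mid Y_i, Z_i) \\
&\qquad - \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x \mid Y_i, Z_i) + E\psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i]\cdot f_0(x \mid Y_i, Z_i) \Big) dx \Big\| \\
&\le \Big\| \int \Big( \sup_i f(x \mid Y_i, Z_i)\cdot\Big[ \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] \Big] \\
&\qquad - \sup_i f_0(x \mid Y_i, Z_i)\cdot\Big[ \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i] - E\psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i] \Big] \Big) dx \Big\| \\
&\le C\cdot\sup_x\Big\| \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - \Big( \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i] - E\psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i] \Big) \Big\|,
\end{aligned}$$
for a generic $C$ by Assumption A.III. So we need to show
$$\sup_{\|\beta - \beta_0\| \le \varepsilon_n,\ \|\delta - \delta_0\| \le \varepsilon_n,\ x}\Big\| \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - E\psi(Y_i - x\beta - Z_i^\top\delta)[x\ Z_i] - \frac{1}{n}\sum_{i=1}^n \psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i] + E\psi(Y_i - x\beta_0 - Z_i^\top\delta_0)[x\ Z_i] \Big\| = o_p(1/\sqrt{n}).$$
We need to show that $\phi_{\beta, \delta}$ is Donsker. Because $\{\psi(Y_i - x\beta - Z_i^\top\delta) : (\beta, \delta) \in \Theta\}$ is bounded and VC, it is Donsker with a constant envelope. Also, $E[|X|] < \infty$ and $E[\|Z\|] < \infty$ by Assumption C.II. The product of this class with $[x\ Z_i]$, that is, the class of functions $\phi_{\beta, \delta}$, also forms a Donsker class with a square-integrable envelope. Finally, the class $\mathcal{F}$, as defined in Assumption G.IV, is Donsker with a constant envelope. Given Assumption G.IV, by Corollary 2.7.4 in van der Vaart and Wellner (1996), the bracketing number of $\mathcal{F}$ satisfies
$$\log N_{[\cdot]}(\varepsilon, \mathcal{F}, L_2(P)) = O\left( \varepsilon^{-\dim(x, y, z)/\delta} \right) = O\left( \varepsilon^{-2-\delta'} \right)$$
for some $\delta' < 0$. These conditions and Corollary 9.32(iii) of Kosorok (2008) lead to our conclusion.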
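f) Finally, we verify Condition (2.6). Noting that $\sqrt{n}Q_n(\beta_0, \delta_0, f_0)$ converges weakly, and given Assumption G.II, we only need to verify that
$$\sqrt{n}\,\Gamma_2(\beta_0, \delta_0, f_0)[\hat{f} - f_0] = \sqrt{n}\,E\int \psi(Y - x\beta_0 - Z^\top\delta_0)[x\ Z]\cdot(\hat{f}(x \mid Y, Z) - f_0(x \mid Y, Z))\,dx$$
converges weakly. Also, since the bias of $\hat{f}$ is $o_p(1/\sqrt{n})$, we only need to verify that $\sqrt{n}\,E\int \psi(Y - x\beta_0 - Z^\top\delta_0)[x\ Z]\cdot(\hat{f}(x \mid Y, Z) - E\hat{f}(x \mid Y, Z))\,dx$ converges weakly:
$$\sqrt{n}\,E\int \psi(Y - x\beta_0 - Z^\top\delta_0)[x\ Z]\cdot\Big[ \frac{1}{2\pi}\int \kappa(h_{xn}\zeta)(\hat{\phi} - E\hat{\phi})e^{-i\zeta x}\,d\zeta \Big] dx. \tag{17}$$
First,
$$\sup_\zeta\Big| \frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}} - Ee^{i\zeta X_{2j}} \Big| \xrightarrow{p} 0.$$
This is because $e^{i\zeta X_{2j}} = \cos(\zeta X_{2j}) + i\sin(\zeta X_{2j})$ and these two terms are Lipschitz in ζ. Similarly, $\frac{1}{n}\sum_{j=1}^n X_{1j}e^{i\zeta X_{2j}} \xrightarrow{p} EX_1e^{i\zeta X_2}$. Therefore,
$$\frac{\frac{1}{n}\sum_{j=1}^n X_{1j}e^{i\zeta X_{2j}}}{\frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}}} \xrightarrow{p} \frac{EX_1e^{i\zeta X_2}}{Ee^{i\zeta X_2}}.$$
By the continuous mapping theorem,
$$\exp\left( \int_0^\zeta i\,\frac{\frac{1}{n}\sum_{j=1}^n X_{1j}e^{i\xi X_{2j}}}{\frac{1}{n}\sum_{j=1}^n e^{i\xi X_{2j}}}\,d\xi \right) \xrightarrow{p} \exp\left( \int_0^\zeta i\,\frac{EX_1e^{i\xi X_2}}{Ee^{i\xi X_2}}\,d\xi \right).$$
Also we have
$$\hat{E}[e^{i\zeta X_2} \mid Y = y, Z = z] \equiv \frac{\frac{1}{h_{yn}h_{zn}n}\sum_j e^{i\zeta X_{2j}}k_{h_{yn}}(Y_j - y)k_{h_{zn}}(Z_j - z)}{\frac{1}{h_{yn}h_{zn}n}\sum_j k_{h_{yn}}(Y_j - y)k_{h_{zn}}(Z_j - z)}.$$
So (17) equals
$$\begin{aligned}
&\sqrt{n}\int\int \psi(y - x\beta_0 - z^\top\delta_0)[x\ z]\cdot\Big[ \frac{1}{2\pi}\int \kappa(h_{xn}\zeta)(\hat{\phi} - \phi)e^{-i\zeta x}\,d\zeta \Big] dx\,dy\,dz \\
&= \sqrt{n}\int\int \psi(y - x\beta_0 - z^\top\delta_0)[x\ z]\cdot\Big[ \frac{1}{2\pi}\int \kappa(h_{xn}\zeta) \\
&\qquad\times\left( \frac{\frac{1}{h_{yn}h_{zn}n}\sum_j e^{i\zeta X_{2j}}k_{h_{yn}}(Y_j - y)k_{h_{zn}}(Z_j - z)\exp\left( \int_0^\zeta i\,\frac{\frac{1}{n}\sum_{j=1}^n X_{1j}e^{i\xi X_{2j}}}{\frac{1}{n}\sum_{j=1}^n e^{i\xi X_{2j}}}\,d\xi \right)}{\frac{1}{n}\sum_{j=1}^n e^{i\zeta X_{2j}}\cdot\frac{1}{h_{yn}h_{zn}n}\sum_j k_{h_{yn}}(Y_j - y)k_{h_{zn}}(Z_j - z)} - \phi \right) e^{-i\zeta x}\,d\zeta \Big] dx\,dy\,dz \\
&= \sqrt{n}\int\int \psi(y - x\beta_0 - z^\top\delta_0)[x\ z]\cdot\Big[ \frac{1}{2\pi}\int \kappa(h_{xn}\zeta)\left( \frac{\frac{1}{h_{yn}h_{zn}n}\sum_j e^{i\zeta X_{2j}}k_{h_{yn}}(Y_j - y)k_{h_{zn}}(Z_j - z)\left[ \exp\left( \int_0^\zeta i\,\frac{EX_1e^{i\xi X_2}}{Ee^{i\xi X_2}}\,d\xi \right) + o_p(1) \right]}{Ee^{i\zeta X_2}f(y, z) + o_p(1)} - \phi \right) e^{-i\zeta x}\,d\zeta \Big] dx\,dy\,dz \\
&= \sqrt{n}\,\frac{1}{n}\sum_j \int \psi(Y_j - x\beta_0 - Z_j^\top\delta_0)[x\ Z_j]\cdot\Big[ \frac{1}{2\pi}\int \kappa(h_{xn}\zeta)\,\frac{e^{i\zeta X_{2j}}\left[ \exp\left( \int_0^\zeta i\,\frac{EX_1e^{i\xi X_2}}{Ee^{i\xi X_2}}\,d\xi \right) + o_p(1) \right]}{Ee^{i\zeta X_2}f(y, z) + o_p(1)}\,e^{-i\zeta x}\,d\zeta \Big] dx - \phi + o_p(n^{-1/2}),
\end{aligned}$$
which converges weakly, and the result follows.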
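Two ingredients of the argument in f) can be sketched directly. The first is the uniform closeness, over a ζ-grid, of the empirical characteristic function to E e^{iζX2}. The second is the kernel-ratio estimator of E[e^{iζX2} | Y = y] displayed above. This is an illustrative sketch under assumptions not in the paper: a single conditioning variable Y (Z is dropped), a Gaussian kernel, Y ~ N(0, 1), and X2 = Y + U with U ~ N(0, 0.5²), so that E[e^{iζX2} | Y = y] = e^{iζy}e^{−ζ²/8}. The function names are hypothetical.

```python
import cmath
import math
import random


def empirical_cf(sample, zeta):
    """(1/n) * sum_j exp(i * zeta * X_j)."""
    return sum(cmath.exp(1j * zeta * s) for s in sample) / len(sample)


def sup_cf_error(sample, true_cf, grid):
    """Max over the grid of |empirical cf - true cf|."""
    return max(abs(empirical_cf(sample, z) - true_cf(z)) for z in grid)


def nw_conditional_cf(x2, y, zeta, y0, h):
    """Kernel ratio sum_j e^{i zeta X2j} K_h(Y_j - y0) / sum_j K_h(Y_j - y0), Gaussian K (assumed).
    The common 1/(n h) factor cancels in the ratio, so it is omitted."""
    weights = [math.exp(-0.5 * ((yj - y0) / h) ** 2) for yj in y]
    num = sum(w * cmath.exp(1j * zeta * v) for w, v in zip(weights, x2))
    return num / sum(weights)


if __name__ == "__main__":
    rng = random.Random(5)
    n = 5000
    y = [rng.gauss(0.0, 1.0) for _ in range(n)]
    x2 = [yj + rng.gauss(0.0, 0.5) for yj in y]
    grid = [k / 10 for k in range(-30, 31)]
    # X2 ~ N(0, 1.25), so E exp(i zeta X2) = exp(-0.625 zeta^2)
    print(sup_cf_error(x2, lambda z: math.exp(-0.625 * z * z), grid))
    target = cmath.exp(-0.125)  # e^{i*1*0} * e^{-1/8} at y0 = 0, zeta = 1
    print(nw_conditional_cf(x2, y, 1.0, 0.0, h=0.1), target)
```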
Proof of Lemma 1. The proof is a direct application of Theorem B in Chen, Linton, and Van Keilegom (2003) and parallels that of convergence in distribution.
Condition (2.4B) in Chen, Linton, and Van Keilegom (2003) is directly implied by Condition G.IB. We need
$$\sup_{\|\beta - \beta_0\| \le \varepsilon_n,\ \|\delta - \delta_0\| \le \varepsilon_n,\ \|f - f_0\|_\infty \le \varepsilon_n}\|Q_n^*(\beta, \delta, f) - Q_n(\beta, \delta, f) - [Q_n^*(\beta_0, \delta_0, f_0) - Q_n(\beta_0, \delta_0, f_0)]\| = o_{p^*}(1/\sqrt{n}).$$
The verification of Condition (2.5’B) in Chen, Linton, and Van Keilegom (2003) parallels that of (2.5’) in Theorem 5. Finally, to verify Condition (2.6’B) in Chen, Linton, and Van Keilegom (2003), we use Gine and Zinn (1990): the $P^*$-distribution of $\sqrt{n}\{Q_n^*(\beta, \delta, f) - Q_n(\beta, \delta, f)\}$ approximates the distribution of $\sqrt{n}\{Q_n(\beta, \delta, f) - Q(\beta, \delta, f)\}$, which is approximately the same as the distribution of $\sqrt{n}Q_n(\beta_0, \delta_0, f_0)$ by the verification of Condition (2.5’) in Theorem 5. For this we need $\hat{f}(\cdot)$ to possess the same smoothness as $f_0(\cdot)$, which is guaranteed by Theorem 3 and Condition G.IV.
References
Agca, S., and A. Mozumdar (2016): “Investment Cash Flow Sensitivity: Fact or Fiction?,” forthcoming in the Journal of Financial and Quantitative Analysis.
Almeida, H., M. Campello, and A. Galvao (2010): “Measurement Errors in Investment Equations,” Review of Financial Studies, 23, 3279–3328.
Almeida, H., M. Campello, and M. S. Weisbach (2004): “The Cash Flow Sensitivity of Cash,” Journal of Finance, 59, 1777–1804.
Andrews, D. W. K. (1995): “Nonparametric Kernel Estimation for Semiparametric Models,” Econometric Theory, 11, 560–596.
Angrist, J., V. Chernozhukov, and I. Fernandez-Val (2006): “Quantile Regression Under Misspecification, with an Application to the U.S. Wage Structure,” Econometrica, 74, 539–573.
Chen, X., O. Linton, and I. Van Keilegom (2003): “Estimation of Semiparametric Models When the Criterion Function is not Smooth,” Econometrica, 71, 1591–1608.
Chernozhukov, V., and C. Hansen (2006): “Instrumental Quantile Regression Inference for Structural and Treatment Effects Models,” Journal of Econometrics, 132, 491–525.
Chesher, A. (2001): “Parameter Approximations for Quantile Regressions with Measurement Error,” Working Paper CWP02/01, Department of Economics, University College London.
Cummins, J. G., K. A. Hassett, and S. D. Oliner (2006): “Investment Behavior, Observable Expectations, and Internal Funds,” American Economic Review, 96, 796–810.
Delaigle, A., P. Hall, and A. Meister (2008): “On Deconvolution With Repeated Measurements,” The Annals of Statistics, 36, 665–685.
Erickson, T., and T. Whited (2000): “Measurement Error and the Relationship between Investment and Q,” Journal of Political Economy, 108, 1027–1057.
Fan, J. (1991): “On the Optimal Rates of Convergence for Nonparametric Deconvolution Problems,” The Annals of Statistics, 19, 1257–1272.
Fan, J., and Y. K. Truong (1993): “Nonparametric Regression with Errors in Variables,” The Annals of Statistics, 21, 1900–1925.
Fazzari, S., R. G. Hubbard, and B. Petersen (1988): “Financing Constraints and Corporate Investment,” Brookings Papers on Economic Activity, 1, 141–195.
Gine, E., and J. Zinn (1990): “Bootstrapping General Empirical Measures,” Annals of Probability, 18, 851–869.
Hansen, B. E. (2008): “Uniform Convergence Rates for Kernel Estimation with Dependent Data,” Econometric Theory, 24, 726–748.
Hausman, J., Y. Luo, and C. Palmer (2014): “Errors in the Dependent Variable of Quantile Regression Models,” Working Paper, MIT.
Hayashi, F. (1982): “Tobin’s Marginal q and Average q: A Neoclassical Interpretation,” Econometrica, 50, 213–224.
He, X., and H. Liang (2000): “Quantile Regression Estimates for a Class of Linear and Partially Linear Errors-in-Variables Models,” Statistica Sinica, 10, 129–140.
He, X., and Q.-M. Shao (1996): “A General Bahadur Representation of M-estimators and Its Application to Linear Regression with Nonstochastic Designs,” The Annals of Statistics, 24, 2608–2630.
(2000): “On Parameters of Increasing Dimensions,” Journal of Multivariate Analysis, 73, 120–135.
Hu, Y., and Y. Sasaki (2015a): “Closed-form Estimation of Nonparametric Models with Non-classical Measurement Errors,” Journal of Econometrics, 185, 392–408.
(2015b): “Identification of Paired Nonseparable Measurement Error Models,” Working Paper, Johns Hopkins University.
Hu, Y., and S. M. Schennach (2008): “Instrumental Variable Treatment of Nonclassical Measurement Error Models,” Econometrica, 76, 195–216.
Kaplan, S., and L. Zingales (1997): “Do Financing Constraints Explain Why Investment Is Correlated with Cash Flow?,” Quarterly Journal of Economics, 112, 169–215.
Kato, K., A. F. Galvao, and G. Montes-Rojas (2012): “Asymptotics for Panel Quantile Regression Models with Individual Effects,” Journal of Econometrics, 170, 76–91.
Koenker, R. (2005): Quantile Regression. Cambridge University Press, New York, New York.
Koenker, R., and G. W. Bassett (1978): “Regression Quantiles,” Econometrica, 46, 33–49.
Kosorok, M. (2008): Introduction to Empirical Processes and Semiparametric Inference. Springer.
Kotlarski, I. (1967): “On Characterizing the Gamma and the Normal Distribution,” Pacific Journal of Mathematics, 20, 69–76.
Lewellen, J., and K. Lewellen (2014): “Investment and Cashflow: New Evidence,” Journal of Financial and Quantitative Analysis, Forthcoming.
Li, T. (2002): “Robust and Consistent Estimation of Nonlinear Errors-in-variables Models,” Journal of Econometrics, 110, 1–26.
Li, T., and Q. Vuong (1998): “Nonparametric Estimation of the Measurement Error Model Using Multiple Indicators,” Journal of Multivariate Analysis, 65, 139–165.
Ma, Y., and G. Yin (2011): “Censored Quantile Regression with Covariate Measurement Errors,” Statistica Sinica, 21, 949–971.
Modigliani, F., and M. Miller (1958): “The Cost of Capital, Corporation Finance and the Theory of Investment,” American Economic Review, 48, 261–297.
Newey, W. K. (1994a): “The Asymptotic Variance of Semiparametric Estimators,” Econometrica, 62, 1349–1382.
Newey, W. K. (1994b): “Kernel Estimation of Partial Means and a Generalized Variance Estimator,” Econometric Theory, 10, 233–253.
Newey, W. K., and D. L. McFadden (1994): “Large Sample Estimation and Hypothesis Testing,” in Handbook of Econometrics, Vol. 4, ed. by R. F. Engle, and D. L. McFadden. North-Holland, Elsevier, Amsterdam.
Politis, D. N., and J. P. Romano (1999): “Multivariate Density Estimation with General Flat-Top Kernels of Infinite Order,” Journal of Multivariate Analysis, 68, 1–25.
Poterba, J. (1988): “Financing Constraints and Corporate Investment: Comment,” Brookings Papers on Economic Activity, 1, 200–204.
Schennach, S. M. (2004a): “Estimation of Nonlinear Models with Measurement Error,” Econometrica, 72, 33–75.
Schennach, S. M. (2004b): “Nonparametric Regression in the Presence of Measurement Error,” Econometric Theory, 20, 1046–1093.
Schennach, S. M. (2008): “Quantile Regression with Mismeasured Covariates,” Econometric Theory, 24, 1010–1043.
Schennach, S. M. (2014): “Entropic Latent Variable Integration via Simulation,” Econometrica, 82, 345–385.
Stein, J. (2003): “Agency Information and Corporate Investment,” in Handbook of the Economics of Finance, ed. by M. Harris, and R. Stulz. Elsevier/North-Holland, Amsterdam.
Torres-Saavedra, P. (2013): “Quantile Regression for Repeated Responses Measured with Error,” North Carolina State University, mimeo.
van der Vaart, A., and J. A. Wellner (1996): Weak Convergence and Empirical Processes. Springer-Verlag, New York.
Wang, H. J., L. A. Stefanski, and Z. Zhu (2012): “Corrected-Loss Estimation for Quantile Regression with Covariate Measurement Errors,” Biometrika, 99, 405–421.
Wei, Y., and R. J. Carroll (2009): “Quantile Regression with Measurement Error,” Journal of the American Statistical Association, 104, 1129–1143.
Wu, Y., Y. Ma, and G. Yin (2015): “Smoothed and Corrected Score Approach to Censored Quantile Regression with Measurement Errors,” Journal of the American Statistical Association, 110, 1670–1683.
Table 1: Fourier Estimator
hx \ hy          0.2        0.3        0.4        0.5        0.6
1.5     B        0.09592    0.07751    0.09131    0.06610    0.09373
        SD       0.09765    0.08403    0.11718    0.06085    0.08298
        MSE      0.01874    0.01307    0.02207    0.00807    0.01567
1.6     B        0.09770    0.07553    0.08439    0.08457    0.09696
        SD       0.09206    0.07425    0.08380    0.07819    0.07446
        MSE      0.01802    0.01122    0.01414    0.01327    0.01495
1.7     B        0.10225    0.06963    0.07425    0.07394    0.09968
        SD       0.10788    0.04347    0.09206    0.08228    0.08720
        MSE      0.02209    0.00674    0.01399    0.01224    0.01754
1.8     B        0.09728    0.07464    0.07571    0.08745    0.10426
        SD       0.08762    0.05453    0.06794    0.07288    0.07359
        MSE      0.01714    0.00855    0.01035    0.01296    0.01629
1.9     B        0.08473    0.09663    0.08869    0.10269    0.10458
        SD       0.02795    0.09039    0.08893    0.11265    0.08798
        MSE      0.007961   0.017507   0.015774   0.023235   0.018679
Optimal: hx = 1.7, hy = 0.3; B = 0.06963, SD = 0.04347, MSE = 0.00674
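The B, SD, and MSE entries in Table 1 agree, up to rounding, with the usual decomposition MSE = B² + SD², assuming B denotes bias and SD the Monte Carlo standard deviation. The "optimal" row also matches the grid cell with the smallest MSE. The short Python sketch below is our own illustration, not the authors' code. It re-checks both points using the Table 1 values. The names `TABLE1`, `HY`, `decomposition_ok`, and `optimal_bandwidths` are hypothetical, and treating "optimal" as minimum tabulated MSE over the grid is an assumption.

```python
import math

# Bandwidth grid for hy (columns of Table 1).
HY = [0.2, 0.3, 0.4, 0.5, 0.6]

# hx -> (B row, SD row, MSE row), copied from Table 1 (Fourier estimator).
TABLE1 = {
    1.5: ([0.09592, 0.07751, 0.09131, 0.06610, 0.09373],
          [0.09765, 0.08403, 0.11718, 0.06085, 0.08298],
          [0.01874, 0.01307, 0.02207, 0.00807, 0.01567]),
    1.6: ([0.09770, 0.07553, 0.08439, 0.08457, 0.09696],
          [0.09206, 0.07425, 0.08380, 0.07819, 0.07446],
          [0.01802, 0.01122, 0.01414, 0.01327, 0.01495]),
    1.7: ([0.10225, 0.06963, 0.07425, 0.07394, 0.09968],
          [0.10788, 0.04347, 0.09206, 0.08228, 0.08720],
          [0.02209, 0.00674, 0.01399, 0.01224, 0.01754]),
    1.8: ([0.09728, 0.07464, 0.07571, 0.08745, 0.10426],
          [0.08762, 0.05453, 0.06794, 0.07288, 0.07359],
          [0.01714, 0.00855, 0.01035, 0.01296, 0.01629]),
    1.9: ([0.08473, 0.09663, 0.08869, 0.10269, 0.10458],
          [0.02795, 0.09039, 0.08893, 0.11265, 0.08798],
          [0.007961, 0.017507, 0.015774, 0.023235, 0.018679]),
}


def decomposition_ok(table, tol=1e-5):
    """True if every tabulated MSE equals B**2 + SD**2 up to rounding (assumed meaning of B, SD)."""
    return all(
        math.isclose(b * b + sd * sd, mse, abs_tol=tol)
        for bias_row, sd_row, mse_row in table.values()
        for b, sd, mse in zip(bias_row, sd_row, mse_row)
    )


def optimal_bandwidths(table, hy_grid):
    """Return (hx, hy, MSE) for the grid cell with the smallest tabulated MSE."""
    cells = (
        (hx, hy, mse)
        for hx, (_, _, mse_row) in table.items()
        for hy, mse in zip(hy_grid, mse_row)
    )
    return min(cells, key=lambda cell: cell[2])


print(decomposition_ok(TABLE1))        # True
print(optimal_bandwidths(TABLE1, HY))  # (1.7, 0.3, 0.00674)
```

The same two helpers apply unchanged to Tables 2 and 3 once their rows are entered. For those tables they return (0.3, 0.2) and (0.2, 0.2), the pairs reported in the "optimal" lines.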
Table 2: Infeasible Kernel Estimator
hx \ hy          0.1        0.2        0.3        0.4        0.5
0.1     B        0.00706    0.01670    0.01124    0.01097    0.03331
        SD       0.04271    0.04013    0.04270    0.04468    0.04288
        MSE      0.00187    0.00189    0.00195    0.00212    0.00295
0.2     B        0.01065    0.00200    0.00537    0.01181    0.03037
        SD       0.03394    0.03158    0.03454    0.02594    0.02880
        MSE      0.00127    0.00100    0.00122    0.00081    0.00175
0.3     B        0.00513    0.00737    0.00777    0.01121    0.01890
        SD       0.02867    0.02620    0.02737    0.02755    0.02652
        MSE      0.00085    0.00074    0.00081    0.00088    0.00106
0.4     B        0.01320    0.00935    0.01242    0.01729    0.02802
        SD       0.02411    0.02800    0.02527    0.02289    0.02464
        MSE      0.00076    0.00087    0.00079    0.00082    0.00139
0.5     B        0.02175    0.01778    0.01631    0.02742    0.03347
        SD       0.02270    0.02195    0.02541    0.02582    0.02875
        MSE      0.00099    0.00080    0.00091    0.00142    0.00195
Optimal: hx = 0.3, hy = 0.2; B = 0.00737, SD = 0.02620, MSE = 0.00074
Table 3: Naive Kernel Estimator
hx \ hy          0.1        0.2        0.3        0.4        0.5
0.1     B        0.11254    0.11132    0.10409    0.10296    0.13249
        SD       0.04776    0.04581    0.04199    0.04449    0.04084
        MSE      0.01495    0.01449    0.01260    0.01258    0.01922
0.2     B        0.09695    0.09553    0.10231    0.10989    0.12630
        SD       0.03399    0.03093    0.03381    0.02886    0.03153
        MSE      0.01055    0.01008    0.01161    0.01291    0.01695
0.3     B        0.10133    0.09800    0.10012    0.10388    0.11663
        SD       0.02957    0.02854    0.02820    0.02557    0.02600
        MSE      0.01114    0.01042    0.01082    0.01145    0.01428
0.4     B        0.10243    0.09939    0.10476    0.10609    0.11884
        SD       0.02715    0.03008    0.02441    0.02245    0.02630
        MSE      0.01123    0.01078    0.01157    0.01176    0.01481
0.5     B        0.10757    0.10313    0.10299    0.11340    0.11782
        SD       0.02196    0.02276    0.02703    0.02525    0.02700
        MSE      0.01205    0.01115    0.01134    0.01350    0.01461
Optimal: hx = 0.2, hy = 0.2; B = 0.09553, SD = 0.03093, MSE = 0.01008
Table 4: Simulation Results over Various Quantiles
τ \ estimator    Fourier    Infeasible    Naive
τ = 0.2  B       0.07068    0.00510       0.09976
         SD      0.06196    0.02002       0.02746
         MSE     0.00883    0.00043       0.01070
τ = 0.3  B       0.06372    0.00515       0.09785
         SD      0.05685    0.01775       0.02568
         MSE     0.00729    0.00034       0.01023
τ = 0.4  B       0.06014    0.00480       0.09737
         SD      0.05530    0.01734       0.02507
         MSE     0.00667    0.00032       0.01011
τ = 0.5  B       0.05943    0.00457       0.09778
         SD      0.05487    0.01770       0.02383
         MSE     0.00654    0.00033       0.01013
τ = 0.6  B       0.06029    0.00479       0.09818
         SD      0.05546    0.01792       0.02391
         MSE     0.00671    0.00034       0.01021
τ = 0.7  B       0.06326    0.00542       0.09915
         SD      0.05681    0.02003       0.02422
         MSE     0.00723    0.00043       0.01042
τ = 0.8  B       0.07213    0.00458       0.10130
         SD      0.05947    0.02265       0.02564
         MSE     0.00874    0.00053       0.01092
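Table 4 can be summarized by comparing each estimator's MSE across quantile levels. At every τ, the Fourier estimator's MSE lies strictly between the infeasible and naive kernel estimators. The ratio of Fourier to naive MSE is smallest at the median (about 0.65 at τ = 0.5) and largest at τ = 0.2 (about 0.83). The following Python sketch is an illustrative reading of the table, not part of the paper's procedure. It computes these ratios from the tabulated MSE values. `MSE_BY_TAU`, `relative_mse`, and `ordering_holds` are hypothetical names.

```python
# tau -> MSE of each estimator, copied from Table 4.
MSE_BY_TAU = {
    0.2: {"Fourier": 0.00883, "Infeasible": 0.00043, "Naive": 0.01070},
    0.3: {"Fourier": 0.00729, "Infeasible": 0.00034, "Naive": 0.01023},
    0.4: {"Fourier": 0.00667, "Infeasible": 0.00032, "Naive": 0.01011},
    0.5: {"Fourier": 0.00654, "Infeasible": 0.00033, "Naive": 0.01013},
    0.6: {"Fourier": 0.00671, "Infeasible": 0.00034, "Naive": 0.01021},
    0.7: {"Fourier": 0.00723, "Infeasible": 0.00043, "Naive": 0.01042},
    0.8: {"Fourier": 0.00874, "Infeasible": 0.00053, "Naive": 0.01092},
}


def relative_mse(table, estimator="Fourier", benchmark="Naive"):
    """MSE of `estimator` divided by MSE of `benchmark`, for each tau."""
    return {tau: row[estimator] / row[benchmark] for tau, row in table.items()}


def ordering_holds(table):
    """True if Infeasible < Fourier < Naive in MSE at every tau."""
    return all(r["Infeasible"] < r["Fourier"] < r["Naive"] for r in table.values())


ratios = relative_mse(MSE_BY_TAU)
for tau, ratio in sorted(ratios.items()):
    print(f"tau = {tau:.1f}: Fourier/Naive MSE ratio = {ratio:.2f}")
print(ordering_holds(MSE_BY_TAU))  # True
```

The MSE ratio was chosen as the summary because it is scale-free across τ. Bias ratios can be computed the same way from the B rows.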
Table 5: Summary Statistics (standardized data)
Variable   Min.      First Quartile   Median    Mean     Third Quartile   Max.
IK         −1.7590   −0.7140          −0.1595   0.0000   0.4972           6.0110
qe         −1.1130   −0.6819          −0.3478   0.0000   0.3019           7.9500
q          −1.2590   −0.6898          −0.3018   0.0000   0.3591           9.2710
CFK        −7.6380   −0.7284          −0.1864   0.0000   0.4988           5.4140
Number of observations: 11,431
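All variables in Table 5 are standardized, which is why every mean is 0.0000. The Python sketch below shows one way to produce such a table: z-score each variable, then report the minimum, quartiles, median, mean, and maximum. It is a minimal sketch under stated assumptions. The paper does not say whether the standard deviation uses an n or n − 1 divisor (we assume n − 1), nor which quartile convention was used (we assume linear interpolation, `method="inclusive"` in Python's `statistics.quantiles`). The input below is toy data, not the firm-level sample. `standardize` and `six_number_summary` are hypothetical helpers.

```python
import statistics


def standardize(values):
    """Z-score: subtract the sample mean, divide by the sample standard deviation.

    Assumption: n - 1 divisor (statistics.stdev); the paper does not state its convention.
    """
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def six_number_summary(values):
    """Min, first quartile, median, mean, third quartile, max (the Table 5 columns).

    Assumption: quartiles by linear interpolation (method="inclusive").
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "Min.": min(values),
        "First Quartile": q1,
        "Median": med,
        "Mean": statistics.fmean(values),
        "Third Quartile": q3,
        "Max.": max(values),
    }


# Toy input only (NOT the paper's data), right-skewed like the variables in Table 5.
toy = [0.05, 0.10, 0.12, 0.20, 0.45]
z = standardize(toy)
summary = six_number_summary(z)
for name, value in summary.items():
    print(f"{name:>15}: {value: .4f}")
```

With skewed inputs, the standardized median falls below the zero mean and the maximum lies far above the third quartile. Table 5 shows the same pattern for IK, qe, q, and CFK.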