
Testing Conditional Independence using Conditional Martingale Transforms

Kyungchul Song1

Department of Economics, University of Pennsylvania

March 13, 2007

Abstract

This paper investigates testing conditional independence between Y and Z given λθ(X) for some θ ∈ Θ ⊂ Rd, for a function λθ(·) known up to a parameter θ ∈ Θ. First, the paper proposes a new method of conditional martingale transforms under which tests are asymptotically pivotal and asymptotically unbiased against √n-converging Pitman local alternatives. Second, the paper performs an analysis of asymptotic power comparison using the limiting Pitman efficiency. From this analysis, it is found that there is an improvement in power when we use nonparametric estimators of conditional distribution functions in place of the true ones. When both Y and Z are continuous, the improvement in power due to the nonparametric estimation disappears after the conditional martingale transform, but when either Y or Z is binary, this phenomenon of power improvement reappears even after the conditional martingale transform. We perform Monte Carlo simulation studies to compare the martingale transform approach and the bootstrap approach.

Key words and Phrases: Conditional independence, asymptotic pivotal tests, conditional martingale transforms, limiting Pitman efficiency, approximate Bahadur efficiency.

JEL Classifications: C12, C14, C52.

1 I am grateful to Yoon-Jae Whang and Guido Kuersteiner, who gave me valuable advice at the initial stage of this research. I also thank Hyungsik Roger Moon, Qi Li, Joon Park, Bernard Salanié, Norm Swanson, and seminar participants at a KAEA meeting and workshops at the Greater New York Metropolitan Econometrics Colloquium, Rutgers University, Texas A&M, UC Riverside, and UCLA for their valuable comments. All errors are mine. Address correspondence to: Kyungchul Song, Department of Economics, University of Pennsylvania, 528 McNeil Building, 3718 Locust Walk, Philadelphia, Pennsylvania 19104-6297.


1 Introduction

Suppose that Y and Z are random variables, and let λθ(X) be a real valued function of a

random vector X indexed by a parameter θ ∈ Θ. This paper investigates the problem of

testing the conditional independence of Y and Z given λθ(X) for some θ ∈ Θ, i.e., (using

the notation of Dawid (1980))

Y ⊥ Z|λθ(X) for some θ ∈ Θ. (1)

The function λθ(·) is known up to a finite-dimensional parameter θ ∈ Θ.

The conditional independence restriction is often used as part of the specification of

an econometric model such as an identifying restriction. It is crucial for identification of

parameters of interest in nonseparable models (e.g. Altonji and Matzkin (2005) and Vytlacil and Yildiz (2006)). The restriction is also frequently used in the literature of program

evaluations. (e.g. Rubin (1978), Rosenbaum and Rubin (1983), and Hirano, Imbens, and

Ridder (2003)). The restriction is, sometimes, a direct implication of economic theory. For

example, in the literature of insurance, presence of positive conditional dependence between

coverage and risk is known to be a direct consequence of adverse selection. (e.g. Chiappori

and Salanié (2000)).

The literature on testing conditional independence for continuous variables appears rather recent and apparently involves relatively little research compared to that on other nonparametric or semiparametric tests. Linton and Gozalo (1999) proposed tests that are slightly

stronger than tests of conditional independence. Delgado and Gonzáles Manteiga (2001) also

proposed a bootstrap-based test of conditional independence. Su and White (2003a, 2003b,

2003c) studied several methods of testing conditional independence. Angrist and Kuersteiner

(2004) suggested distribution-free tests of conditional independence.

This paper’s framework of hypothesis testing is based on an unconditional-moment formu-

lation of the null hypothesis using indicator functions that run through an index space. Tests

in our framework have the usual local power properties that are shared by other empirical-

process based approaches such as Linton and Gozalo (1999), Delgado and Gonzáles Manteiga

(2001), and Angrist and Kuersteiner (2004). In contrast with Linton and Gozalo (1999) and

Delgado and Gonzáles Manteiga (2001), our tests are asymptotically pivotal (or asymp-

totically distribution-free), which means that asymptotic critical values do not depend on

unknowns, so that one does not need to resort to a simulation-based method like the bootstrap

to obtain approximate critical values.

This latter property of asymptotic pivotalness is shared, in particular, by Angrist and


Kuersteiner (2004) whose approach is close to ours in two aspects.2 First, their test has local

power properties that are similar to ours. Second, both their approach and ours employ

a method of martingale transform to obtain asymptotically pivotal tests. However, they

assume that Z is binary and the conditional probability P{Zi = 1|Xi} is parametrically specified, whereas we do not require such conditions. This latter contrast primarily stems

from completely different approaches of martingale transforms employed by the two papers.

In particular, Angrist and Kuersteiner (2004) employ martingale transforms that apply to

tests involving estimators of finite-dimensional parameters, whereas this paper uses condi-

tional martingale transforms that are amenable to testing semiparametric restrictions that

involve nonparametric estimators (Song (2007)).

The martingale transform approach was pioneered by Khmaladze (1988,1993) and has

been used both in the statistics and economics literature. For example, Stute, Thies, and Zhu

(1998) and Khmaladze and Koul (2004) studied testing nonlinear parametric specification of

regressions. Koul and Stute (1999) proposed specification tests of nonlinear autoregressions.

Recently in the econometrics literature, Koenker and Xiao (2002) and Bai (2003) employed

the martingale transform approach for testing parametric models, and Angrist and Kuer-

steiner (2004) investigated testing conditional independence as mentioned before. See also

Bai and Ng (2003) for testing conditional symmetry in a similar vein. Song (2007) devel-

oped a method of conditional martingale transform that applies to tests of semiparametric

conditional moment restrictions. Although the conditional independence restriction can be

viewed as a special case of semiparametric conditional moment restrictions, the restriction

lies beyond the scope of Song (2007), and the manner in which the conditional martingale transforms

are employed is quite different.

One of the most distinct aspects of the hypothesis testing framework of focus in this

paper is that we allow the conditioning variable λθ(X) to depend on an unknown parameter

θ. This is particularly important when the dimension of the random vector X is large. For

example, the number of variables in X used in Chiappori and Salanié (2000) is 55. In such

a situation, instead of testing

Y ⊥ Z|X,

it is reasonable to impose a single-index restriction upon X and test the hypothesis of the

form in (1). One can check the robustness of the result by varying the parameterization used

for the single-index λθ(X).

2 Several tests suggested by Su and White are also asymptotically pivotal and consistent, but have different asymptotic power properties. Their tests are asymptotically unbiased against Pitman local alternatives that converge to the null at a rate slower than √n. However, when we extend the space of local alternatives beyond Pitman local alternatives, the optimal rate of convergence of local alternatives is typically slower than √n (e.g. Horowitz and Spokoiny (2002)).


This paper also contains an extensive analysis of the asymptotic power properties of tests by

using the limiting Pitman efficiency. First, we find that using the nonparametric estimator

of the conditional distribution function of Y given λθ(X) in the test statistic improves the

asymptotic power of the test as compared to the case where one uses a true one. This

may strike one as counter-intuitive because even when the true nonparametric function is

known, one does better by estimating the function. We affirm this phenomenon in terms of

the limiting Pitman efficiency, and provide an intuitive explanation. In fact, this phenomenon

can be viewed precisely as a hypothesis testing version of what Hirano, Imbens, and Ridder

(2003) have found in estimation.

When Y and Z are continuous, our approach suggests applying conditional martingale

transforms to the indicator functions involving Y and Z. Since the conditional martingale

transforms remove the additional local shift caused by the nonparametric estimation of

the conditional distribution function, the phenomenon of improved asymptotic power by

nonparametric estimation errors does not arise after we apply the conditional martingale

transforms. However, when Z is binary, this phenomenon of improved local asymptotic

power reappears, because in this case it suffices to apply the conditional martingale transform only to the continuous variable Y.

A conditional martingale transform alters the asymptotic power properties of the original

test. Under the setup of a nonseparable model Y = ξ(Z, ε), we compute the limiting Pitman efficiency of two Kolmogorov-Smirnov type tests, one using the original empirical process

and the other using the transformed empirical process. We characterize the set of alternatives

under which the conditional martingale transform weakly improves the asymptotic power of

the original test.

We perform Monte Carlo simulations to compare the approach of bootstrap and that of

martingale transforms. We find that the martingale transform-based tests overall appear to

perform as well as the bootstrap-based test.

In the next section, we define the null hypothesis of conditional independence and discuss

examples. In Section 3, we present the asymptotic representation of the semiparametric

empirical process. In Section 4, we introduce conditional martingale transforms and in

Section 5, we focus on the case where Z is binary. Section 6 is devoted to the asymptotic

power analysis of a variety of tests. In Section 7, we present and discuss results from

simulation studies and in Section 8, we conclude.


2 Testing Conditional Independence

2.1 The Null and the Alternative Hypotheses

Suppose that we are given a random vector (Y, Z, X) distributed by P and an unknown

real valued function λθ(·) on RdX , θ ∈ Θ ⊂ Rd. We assume the following for (Y, Z).

Assumption 1: (i) Y and Z are continuous random variables. (ii) Θ is compact in Rd.

When either Y or Z is binary, the development of this paper’s thesis becomes simpler,

as we will see when Z is binary in a later section. We introduce notation for indicator functions: for a random variable S and a real number s,

γs(S) := 1{S ≤ s}.

Hence we write γz(Z) = 1{Z ≤ z}, γy(Y) = 1{Y ≤ y}, and γλ(λθ(X)) := 1{λθ(X) ≤ λ}.

The null hypothesis in (1) requires us to check the conditional independence of Y and Z

given λθ(X) for each θ ∈ Θ until we find one θ that satisfies the conditional independence restriction. The following assumption simplifies this problem by making it sufficient to focus on a specific θ0 in Θ.

Assumption 2: There exists a unique parameter θ0 ∈ Θ such that (i) Y ⊥ Z|λθ0(X) whenever the null hypothesis of (1) holds, and (ii) for some δ > 0, λθ(X) is continuous for all θ ∈ B(θ0, δ) := {θ ∈ Θ : ||θ − θ0|| < δ}.

For example, the unique parameter θ0 can be defined as the unique solution to the following problem:

θ0 = argmin_{θ∈Θ} sup_{(y,z,λ)∈R³} |E[γλ(λθ(X))γz(Z)(γy(Y) − E[γy(Y)|λθ(X)])]|, (2)

when a unique solution exists. Under Assumption 2, we can write the null hypothesis equivalently as (see Theorem 9.2.1 in Chung (2001), p. 322, and Lemma 1 in Song (2007))

H0 : P{E(γy(Y)|U, Z) = E(γy(Y)|U) for all y ∈ R} = 1,

where U := Fθ0(λθ0(X)) and Fθ denotes the distribution function of λθ(X). The alternative hypothesis is given by the negation of the null:

H1 : P{E(γy(Y)|U, Z) = E(γy(Y)|U) for all y ∈ R} < 1.
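Since U := Fθ0(λθ0(X)) is the probability integral transform of the index, U is distributed Uniform[0, 1] whenever λθ0(X) is continuous, which is why the conditioning variable in the hypotheses above lives on [0, 1]. A quick numerical check of this fact (a standard normal index is assumed here purely for illustration):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
lam = rng.normal(size=100_000)            # stand-in for the index lambda_theta0(X)
Phi = np.vectorize(lambda t: 0.5 * (1.0 + erf(t / sqrt(2.0))))
U = Phi(lam)                              # U = F(lam): probability integral transform

# U should be approximately Uniform[0, 1]: mean 1/2, variance 1/12
print(round(U.mean(), 3), round(U.var(), 3))
```

The same property holds for the empirical quantile transform used later in the paper, up to sampling error.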


The next subsection provides examples in which this testing framework is relevant.

2.2 Examples

2.2.1 Nonseparable Models

The conditional independence restriction is often used in nonseparable models. The first

example is a model of weakly separable endogenous dummy variables. Suppose Y and Z are

generated by

Y = g(ν1(X,Z), ε) and Z = 1 {ν2(X) ≥ u} .

For example, this model can be useful when a researcher attempts to analyze the effect of

job training of a worker upon her earnings prospects. Here Y denotes the earnings outcome

of a worker, and Z denotes the indicator for the job training. It will be of interest to

a researcher whether the indicator for the job training remains endogenous even after controlling for covariates X, because in the presence of endogeneity, the researcher might need

to include some additional covariate into the specification of the endogenous job training that

has a sufficient degree of variation independent from X. (e.g. Vytlacil and Yildiz (2006)).

The exogeneity of Z for the outcome Y given the covariate X will be represented by the

conditional independence restriction:

Y ⊥ Z|ν1(X,Z), ν2(X).

A second example is from Altonji and Matzkin (2005), who considered the following nonseparable regression model: Y = m(Z, ε). One of the assumptions used to identify the local average response was that ε is independent of Z given an instrumental variable X. This gives the following testable implication of the conditional independence restriction: Y ⊥ Z|X.

2.2.2 Heterogeneous Treatment Effects

In the literature of program evaluations, often the restriction of conditional independence

(Y0, Y1) ⊥ Z | X

is considered (e.g. Rubin (1978), Rosenbaum and Rubin (1983), and Hirano, Imbens, and

Ridder (2003) to name but a few.)3 Here Y0 represents the outcome when the agent is not

treated and Y1, the outcome when treated. The variable X represents the agent’s covariates.

3 It is also worth noting that Heckman, Ichimura, and Todd (1998) point out that a weaker condition of mean independence suffices for the identification of the average treatment effect.


It is well-known (Rosenbaum and Rubin (1983)) that when p(X) := P(Z = 1|X) ∈ (0, 1),

(Y0, Y1) ⊥ Z | p(X).

This restriction is not directly testable because we do not observe Y0 and Y1 simultaneously

for each individual. However, as pointed out by Heckman, Smith, and Clemens (1997), we

have the testable implications: (1 − Z)Y ⊥ Z | p(X) or ZY ⊥ Z | p(X). When we specify p(·) parametrically, the testing framework belongs to (1).

2.2.3 Testing No Conditional Positive Dependence

In the literature of contract theory, there has been an interest in testing the relevance of asymmetric information. Under asymmetric information, it is known that the risk is positively related to the coverage of the contract conditional on all publicly observed variables. (See

Cawley and Phillipson (1999), Chiappori and Salanié (2000), Chiappori, Jullien, Salanié and

Salanié (2002), and references therein.) In this case, the null hypothesis is conditional inde-

pendence of the risk and the coverage, but the alternative hypothesis is restricted to positive

dependence of the risk (or probability of claims in the insurance data) and the coverage of

the insurance contract conditional on all observable variables. This paper’s framework can

be used to deal with this case of one-sided testing. At the end of the paper, we provide a

table of critical values for one-sided Kolmogorov-Smirnov statistics.

3 Asymptotic Representation of a Semiparametric Empirical Process

3.1 Asymptotic Representation

The null hypothesis of (1) is a conditional moment restriction. We can obtain an equivalent

formulation in terms of unconditional moment restrictions. Let us define

gYy(u) := E(γy(Y)|U = u) and gZz(u) := E(γz(Z)|U = u),

and S := R² × [0, 1]. Then the null hypothesis can be written as:

H0 : E[γu(U)γz(Z)(γy(Y) − gYy(U))] = 0, ∀(y, z, u) ∈ S.


The test statistics we analyze in this paper are based on the sample analogues of the above moment

restrictions.

Suppose that we are given a random sample {Si}i=1,…,n = {(Yi, Zi, Ui)}i=1,…,n of S = (Y, Z, U). Let us define

Uθ,i = Fn,θ,i(λθ(Xi)),

where Fn,θ,i(·) denotes the empirical distribution function of {λθ(Xj)}j=1,…,n with the omission of the i-th data point λθ(Xi). We assume the following.

Assumption 3: (i) {(Yi, Zi, Xi)}i=1,…,n is a random sample from P, where E||Y||p < ∞ and E||Z||p < ∞ for some p > 4.
(ii) There exists an estimator θ̂ of θ0 in Assumption 2 such that ||θ̂ − θ0|| = oP(n−1/4).
(iii) Λ := {λθ(·) : θ ∈ Θ} is uniformly bounded, and there exists δ > 0 such that for any θ1, θ2 ∈ B(θ0, δ),

|λθ1(x) − λθ2(x)| ≤ C||θ1 − θ2||

for some constant C > 0.

The estimator θ̂ in (ii) can be obtained from a sample version of the problem in (2). Its consistency can be established using the approach of Chen, Linton, and van Keilegom (2003). The uniform boundedness condition in (iii) for Λ is innocuous: for any strictly increasing function Φ mapping into [0, 1], we can redefine λ′θ = Φ ◦ λθ so that λ′θ is uniformly bounded. The null hypothesis, the alternative hypothesis, and the random variable Uθ,i remain invariant to this redefinition of λθ.
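As a concrete illustration, a crude sample version of the problem in (2) can be set up by replacing the population moments with sample averages and the conditional distribution function with a nonparametric estimate, and then minimizing over a grid of θ values. The sketch below is only suggestive: the Nadaraya-Watson weights, the grid sizes, the rule-of-thumb bandwidth, and the scale normalization fixing the first coordinate of θ at 1 are all assumptions of this illustration, not the paper's construction.

```python
import numpy as np

def sample_criterion(theta, X, Y, Z, grid=10):
    """Sample analogue of the criterion in (2):
    sup over (y, z, l) of |(1/n) sum_i 1{lam_i <= l} 1{Z_i <= z}
                           (1{Y_i <= y} - Ehat[1{Y <= y} | lam_i])|."""
    n = len(Y)
    lam = X @ theta
    # Nadaraya-Watson weights for the conditional CDF of Y given the index
    h = 1.06 * lam.std() * n ** (-0.2)
    K = np.exp(-0.5 * ((lam[:, None] - lam[None, :]) / h) ** 2)
    W = K / K.sum(axis=1, keepdims=True)
    crit = 0.0
    for y in np.quantile(Y, np.linspace(0.1, 0.9, grid)):
        ind = (Y <= y).astype(float)
        resid = ind - W @ ind                 # gamma_y(Y_i) - Ehat[gamma_y | lam_i]
        for z in np.quantile(Z, np.linspace(0.1, 0.9, grid)):
            a = (Z <= z) * resid
            for l in np.quantile(lam, np.linspace(0.1, 0.9, grid)):
                crit = max(crit, abs(np.mean((lam <= l) * a)))
    return crit

# grid search for theta_hat under data satisfying the null at theta = (1, 0.5)
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 2))
index = X @ np.array([1.0, 0.5])
Y = index + rng.normal(size=n)
Z = index + rng.normal(size=n)               # Y and Z independent given the index
thetas = [np.array([1.0, b]) for b in np.linspace(-1.0, 2.0, 13)]
crits = [sample_criterion(t, X, Y, Z) for t in thetas]
theta_hat = thetas[int(np.argmin(crits))]
```

In practice one would minimize with a proper optimizer rather than a coarse grid, and use the series estimator introduced below in place of the kernel stand-in.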

If the function gYy(·) were known and the quantile-transformed data {Ui}i=1,…,n for U were observed, a test statistic could be constructed as a functional of the following stochastic process:

νn(r) := (1/√n) ∑_{i=1}^{n} γu(Ui)γz(Zi)(γy(Yi) − gYy(Ui)), r = (y, z, u) ∈ S. (3)

In this section, we introduce a series estimator of gYy and investigate the asymptotic properties of the resulting empirical process that involves the estimator. We approximate gYy(u) by pK(u)′πy, which is constituted by a basis-function vector pK(u) and a coefficient vector πy. Given an appropriate estimator λθ̂(·) of λθ0(·), define

Ûi := (1/n) ∑_{j=1, j≠i}^{n} 1{λθ̂(Xj) ≤ λθ̂(Xi)}. (4)

The series-based estimator is defined as ĝYy(u) := pK(u)′π̂y, where π̂y := [P′P]−1P′ay, with ay


and P being defined by

ay := (γy(Y1), …, γy(Yn))′ and P := (pK(Û1), …, pK(Ûn))′. (5)
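To fix ideas, the leave-one-out empirical quantile transform in (4) and the series estimator in (5) can be sketched as follows. The polynomial basis, the number of terms K, and the linear index are choices made for this illustration only:

```python
import numpy as np

def quantile_transform(lam):
    """Leave-one-out empirical quantile transform of (4):
    U_i = (1/n) * #{j != i : lam_j <= lam_i}."""
    n = len(lam)
    return np.array([np.sum(np.delete(lam, i) <= lam[i]) / n for i in range(n)])

def series_cdf_estimator(U, Y, y, K=4):
    """Series estimator gYy(u) = pK(u)'pi_y of the conditional CDF,
    with pi_y = (P'P)^{-1} P'a_y, a_y[i] = 1{Y_i <= y}, and a
    polynomial basis pK(u) = (1, u, ..., u^{K-1})'."""
    P = np.vander(U, K, increasing=True)          # rows are pK(U_i)'
    a_y = (Y <= y).astype(float)
    pi_y = np.linalg.solve(P.T @ P, P.T @ a_y)
    return lambda u: np.vander(np.atleast_1d(u), K, increasing=True) @ pi_y

# toy data with a linear single index
rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 2))
lam = X @ np.array([1.0, 0.5])
U = quantile_transform(lam)
Y = lam + rng.normal(size=n)
g_hat = series_cdf_estimator(U, Y, y=0.0)         # estimates P(Y <= 0 | U = u)
```

Here P(Y ≤ 0 | U = u) is decreasing in u for this toy design, and the fitted g_hat reflects that; a spline or trigonometric basis satisfying Assumption A1 would replace the raw polynomial in a serious implementation.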

Using the estimated conditional mean function ĝYy, we can construct a feasible version of the process νn(r):

ν̂n(r) = (1/√n) ∑_{i=1}^{n} γu(Ûi)γz(Zi)(γy(Yi) − ĝYy(Ûi)). (6)

A test statistic can be constructed as a known functional of the process ν̂n(r). We assume the following for the distribution of λθ(X), θ ∈ B(θ0, δ), for some δ > 0. In the following, Sθ denotes the support of λθ(X).

Assumption 4: (i) There exists δ > 0 such that (a) the density fθ(λ) of λθ(X) satisfies sup_{θ∈B(θ0,δ)} sup_{λ∈Sθ} fθ(λ) < ∞, (b) for some C > 0, sup_{θ∈B(θ0,δ)} sup_{λ∈Sθ} |Fθ(λ + δ′) − Fθ(λ − δ′)| < Cδ′ for all δ′ > 0, and (c) there exists C > 0 such that for each u ∈ U := {Fθ ◦ λθ : θ ∈ B(θ0, δ)} and each δ′ > 0,

sup_{(u1,u2)∈[0,1]²:|u2−u1|<δ′} |fY,X|u(X)(y, x|u1) − fY,X|u(X)(y, x|u2)| ≤ ϕu(y, x)δ′,

where fY,X|u(X) denotes the conditional density function of (Y, X) given u(X), and ϕu(·, ·) is a real function that satisfies sup_{x∈RdX} ∫ϕu(y, x)dy < C and ∫ϕu(y, x)dx < CfY(y), with fY(·) denoting the density of Y.
(ii) The conditional distribution functions gYy(u) and gZz(u) are second-order continuously differentiable with uniformly bounded derivatives.

The conditions in Assumption 4 are primarily used to invoke a general result of Escanciano and Song (2007), on which Theorem 1 below relies. A condition similar to Condition (i)(a) was used by Stute and Zhu (2005). Condition (i)(b) says that the distribution function of λθ(X) should be Lipschitz continuous with a uniformly bounded Lipschitz coefficient. Condition (i)(c) requires Lipschitz continuity of the conditional density of (Y, X) given Fθ(λθ(X)) in the evaluation point of the conditioning variable.

Theorem 1: Suppose Assumptions 1-4 hold. Furthermore, assume that the basis function pK in (5) satisfies Assumption A1 in the Appendix. Then the following holds:

sup_{(y,z,u)∈S} |ν̂n(y, z, u) − (1/√n) ∑_{i=1}^{n} γu(Ui){γz(Zi) − gZz(Ui)}{γy(Yi) − gYy(Ui)}| = oP(1). (7)

The result of Theorem 1 provides a uniform asymptotic representation of the semiparametric empirical process ν̂n(r). This asymptotic representation motivates the construction of the proper conditional martingale transforms that we develop in a later section. The representation also constitutes the basis upon which we derive the asymptotic power properties of a test constructed using ν̂n(r). Indeed, observe that the process

(1/√n) ∑_{i=1}^{n} γu(Ui){γz(Zi) − gZz(Ui)}{γy(Yi) − gYy(Ui)}

is a zero-mean process only when the null hypothesis of conditional independence is true.

From the asymptotic representation, we can derive the asymptotic properties of the process ν̂n(r) as follows. Let D(S) denote the space of bounded functions on S that are right-continuous and have left limits, and endow D(S) with the uniform norm || · ||∞. The notation ⇝ denotes weak convergence in D(S) in the sense of Hoffmann-Jørgensen (e.g. van der Vaart and Wellner (1996)). Let us define the conditional inner product ⟨·, ·⟩u by

⟨f, g⟩u := ∫(fg)(s)dF(s|u),

where F(s|u) is the conditional distribution function of S given U = u.

Corollary 1: (i) (a) Under the null hypothesis, νn(r) ⇝ ν(r) in D(S), where ν is a Gaussian process whose covariance kernel is given by

c(r1; r2) = ∫_{−∞}^{u1∧u2} C(w1, w2; u)du

with C(w1, w2; u) = gZz1∧z2(u){gYy1∧y2(u) − gYy1(u)gYy2(u)}, wj = (zj, yj), j = 1, 2.

(b) Under fixed alternatives, n−1/2νn(r) ⇝ ⟨γu, γz(γy − gYy)⟩ in D(S).

(ii) (a) Under the null hypothesis, ν̂n(r) ⇝ ν1(r) in D(S), where ν1 is a Gaussian process whose covariance kernel is given by

c1(r1; r2) = ∫_{−∞}^{u1∧u2} C1(w1, w2; u)du

with C1(w1, w2; u) = {gZz1∧z2(u) − gZz1(u)gZz2(u)}{gYy1∧y2(u) − gYy1(u)gYy2(u)}.

(b) Under fixed alternatives, n−1/2ν̂n(r) ⇝ ⟨γu, γz(γy − gYy)⟩ in D(S).

The results of Corollary 1 lead to the asymptotic properties of tests based on the processes νn(r) and ν̂n(r). The limiting processes ν(r) and ν1(r) under the null hypothesis are Gaussian processes with different covariance kernels. Under fixed alternatives, the two empirical processes νn(r) and ν̂n(r) weakly converge to the same shift term after normalization by √n, providing the basis for local power properties against √n-converging Pitman local alternatives. The result in Corollary 1 gives rise to the surprising consequence that using the nonparametrically estimated distribution function ĝYy(u) instead of the true one gYy(u) can improve the asymptotic power of the tests. In particular, the covariance kernel of the limit process under the null hypothesis "shrinks" due to the use of ĝYy(u) instead of gYy(u), whereas the limit shift ⟨γu, γz(γy − gYy)⟩ under the alternatives remains intact. In a later section, we formalize this observation.

It is worth noting that the estimation error in λθ̂ does not play a role in determining the limit behavior of the process ν̂n(r). In other words, the results of Theorem 1 and Corollary 1 do not change when we replace λθ̂ by λθ0. A similar phenomenon was discovered and analyzed by Stute and Zhu (2005) in the context of testing single-index restrictions using a kernel method. This convenient phenomenon is due to the use of the empirical quantile transform of {λθ̂(Xi)}i=1,…,n.

We construct the following two-sided test statistics:

TKS = sup_{r∈S} |ν̂n(r)| and TCM = ∫_S ν̂n(r)² dr. (8)
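In practice the statistics in (8) can be approximated by evaluating the feasible process of (6) on a finite grid of (y, z, u) points. The sketch below uses a crude unconditional stand-in for the estimated conditional distribution function and grids over sample quantiles; these shortcuts are assumptions of this illustration, not part of the paper's procedure:

```python
import numpy as np

def ks_cm_statistics(Y, Z, U, g_hat, grid=15):
    """Evaluate nu_n(y, z, u) of (6) on a grid and return the
    Kolmogorov-Smirnov and Cramer-von Mises statistics of (8)."""
    n = len(Y)
    ys = np.quantile(Y, np.linspace(0.05, 0.95, grid))
    zs = np.quantile(Z, np.linspace(0.05, 0.95, grid))
    us = np.linspace(0.05, 0.95, grid)
    nu = np.empty((grid, grid, grid))
    for a, y in enumerate(ys):
        resid = (Y <= y).astype(float) - g_hat(y, U)   # gamma_y(Y_i) - gYy(U_i)
        for b, z in enumerate(zs):
            w = (Z <= z) * resid
            for c, u in enumerate(us):
                nu[a, b, c] = np.sum((U <= u) * w) / np.sqrt(n)
    return np.abs(nu).max(), np.mean(nu ** 2)          # T_KS, T_CM (grid average)

# toy run under the null: Y and Z conditionally independent given U
rng = np.random.default_rng(3)
n = 400
U = rng.uniform(size=n)
Y = U + 0.5 * rng.normal(size=n)
Z = U + 0.5 * rng.normal(size=n)
g_crude = lambda y, U: np.full(len(U), np.mean(Y <= y))  # unconditional stand-in
T_KS, T_CM = ks_cm_statistics(Y, Z, U, g_crude)
```

The series estimator ĝYy of the previous section would replace g_crude, and the grid average in T_CM stands in for the integral over S.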

The test statistic TKS is of Kolmogorov-Smirnov type and the test statistic TCM is of Cramér-von Mises type. The asymptotic properties of the tests based on TKS and TCM follow from Theorem 1. Indeed, under the null hypothesis,

TKS ⇝ sup_{r∈S} |ν1(r)| and TCM ⇝ ∫_S ν1(r)² dr, (9)

and under Pitman local alternatives Pn such that for some ay,z(·), En(γz(Z)(γy(Y) − gYy(U))|U = u) = ay,z(u)/√n + o(n−1/2), we have

ν̂n(r) ⇝ ν1(r) + ⟨γu, ay,z⟩ in D(S).

The presence of the non-zero shift term ⟨γu, ay,z⟩ gives the test nontrivial asymptotic power against such local alternatives.

The testing procedure based on (9) is not feasible. In particular, we cannot tabulate critical values from the limiting distribution, because the limit distributions of the test statistics depend on unknown components of the data-generating process and, in consequence, the tests are not asymptotically pivotal.

4 Conditional Martingale Transform

A recent work by the author (Song (2007)) proposes the method of conditional martingale transforms, which can be used to obtain asymptotically pivotal semiparametric tests. The method of conditional martingale transforms proposed in that work does not apply to the present testing problem of conditional independence for three reasons. First, the generalized residual function ρy,z := γz(γy − gYy) here is indexed by (y, z) ∈ R², whereas the generalized residual function ρτ,θ in Song (2007) does not depend on the index of the empirical process. Second, while Song (2007) requires that the conditional distribution of the instrumental variables given the variable inside the nonparametric function not be degenerate, this condition does not hold here. In the current situation of testing conditional independence, the instrumental variable in the conditional moment restriction and the variable inside the nonparametric function are identically λθ0(X). Therefore, the nondegeneracy of the conditional distribution fails. Third, the instrumental variable λθ0(X) here depends on the unknown parameter θ0.

The main idea of this paper is that we first take into account the null restriction of conditional independence when searching for a proper transformation of the semiparametric empirical process ν̂n(r) using conditional martingale transforms, and then analyze the asymptotic behavior of the semiparametric empirical process after the transform.4

4.1 Preliminary Heuristics

In this section, we provide some heuristics about the mechanics by which conditional martingale transforms work. Suppose that we are given an operator K on the space of indicator functions such that the transformed indicator functions γKz := Kγz and γKy := Kγy satisfy the orthogonality condition and the isometry condition with respect to the conditional inner

4 By noting that the limiting processes ν and ν1 are variants of a 4-sided tied-down Kiefer process, we may consider the martingale transform for a 4-sided tied-down Kiefer process that was suggested by McKeague and Sun (1996). The martingale transform for this case is represented as two sequential martingale transforms. However, the sequential martingale transforms would be very complicated in practice because they involve taking repeated martingale transforms of a function.


product almost surely:

Conditional Orthogonality: ⟨γKz, 1⟩U = 0 and ⟨γKy, 1⟩U = 0, and
Conditional Isometry: ⟨γKz1, γKz2⟩U = ⟨γz1, γz2⟩U and ⟨γKy1, γKy2⟩U = ⟨γy1, γy2⟩U.

An operator K that satisfies these two properties can be used to construct a transformed empirical process from which asymptotically pivotal tests can be generated. Recall that by Theorem 1, the process ν̂n(r) has the following asymptotic representation (using our conditional inner product notation ⟨γy, 1⟩Ui := E[γy(Yi)|Ui] = gYy(Ui)):

ν̂n(r) = (1/√n) ∑_{i=1}^{n} γu(Ui){γz(Zi) − ⟨γz, 1⟩Ui}{γy(Yi) − ⟨γy, 1⟩Ui} + oP(1).

When we replace the indicator functions γz and γy by the transformed ones γKz and γKy, the right-hand-side asymptotic representation turns into the following (by the conditional orthogonality of the transform):

(1/√n) ∑_{i=1}^{n} γu(Ui){γKz(Zi) − ⟨γKz, 1⟩Ui}{γKy(Yi) − ⟨γKy, 1⟩Ui} = (1/√n) ∑_{i=1}^{n} γu(Ui)γKz(Zi)γKy(Yi). (10)

We can show that the last term is equal to

√n E[γu(U)⟨γKz, γKy⟩U] + νK(r) + oP(1), (11)

where νK(r) is a certain Gaussian process. The asymptotic behavior of the leading two terms determines the asymptotic properties of tests that are based on the transformed empirical process. Suppose that we are under the null hypothesis. Then the first term in (11) becomes zero by the conditional independence restriction. As shown in the appendix, the Gaussian process νK(r) has a covariance function of the form

c(r1, r2) := E[γu1∧u2(U)γz1∧z2(Z)γy1∧y2(Y)]. (12)

This form is obtained by using the conditional isometry of the operator K and the null hypothesis of conditional independence. Hence the resulting Gaussian process is a time-transformed Brownian sheet, and we can rescale it into a standard Brownian sheet by using an appropriate normalization. Against alternatives P1 such that P1{|⟨γKz, γKy⟩U| > 0} > 0, a test properly based on the transformed process will be consistent.

The interesting phenomenon that the nonparametric estimation error improves the power disappears for tests based on transformed processes of the kind in (10). Consider the asymptotic representation of the empirical process νn(r) that uses the true ⟨γy, 1⟩Ui:

νn(r) = (1/√n) ∑_{i=1}^{n} γu(Ui)γz(Zi){γy(Yi) − ⟨γy, 1⟩Ui} + oP(1).

When we apply the transform K to the indicator functions γy and γz, the right-hand side is equal to

(1/√n) ∑_{i=1}^{n} γu(Ui)γKz(Zi){γKy(Yi) − ⟨γKy, 1⟩Ui} + oP(1)
= (1/√n) ∑_{i=1}^{n} γu(Ui)γKz(Zi)γKy(Yi) + oP(1).

Hence the asymptotic representation of the process after the transform K coincides with that in (10). The asymptotic power properties remain the same regardless of whether we use ĝYy(u) or gYy(u) once the transform K is applied.

4.2 Conditional Martingale Transforms

In this section, we discuss a method to obtain the operator K that satisfies the conditional orthogonality and conditional isometry properties. Khmaladze (1993) proposes using a class of isometric projection operators on an L2-space. This class of isometric projection operators is designed to be an isometry with respect to the usual inner product in the L2-space. Song (2007) extends this isometric projection to a conditional isometric projection by using the conditional inner product in place of the usual one, and finds that this class of projection operators can be used to generate semiparametric tests that are asymptotically pivotal. Just as the isometric projection is called a martingale transform, Song (2007) calls the conditional isometric projection a conditional martingale transform. See Song (2007) for the details.

By applying the conditional martingale transform in Song (2007) to our indicator functions, we obtain the transform K as follows:
$$\gamma_y^K(U, Y) \triangleq \gamma_y(Y) - E\big[\gamma_y(Y)h_y(U, Y)\,\big|\,U = u\big]_{y=Y} \quad\text{and} \tag{13}$$
$$\gamma_z^K(U, Z) \triangleq \gamma_z(Z) - E\big[\gamma_z(Z)h_z(U, Z)\,\big|\,U = u\big]_{z=Z},$$


where
$$h_y(U, Y) \triangleq \frac{1\{Y \le y\}}{1 - F_{Y|U}(Y|U)} \quad\text{and}\quad h_z(U, Z) \triangleq \frac{1\{Z \le z\}}{1 - F_{Z|U}(Z|U)}. \tag{14}$$

Observe that if $h_y(U, Y)$ and $h_z(U, Z)$ were taken to be the constant 1, the operator K would be just an orthogonal projection onto the orthogonal complement of the constant functions in $L_2(P)$. The factors $h_y(U, Y)$ and $h_z(U, Z)$ as defined in (14) render the transforms isometric.
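To see the transform in action, the following sketch (not part of the paper; it assumes the simplest possible case $F_{Y|U}(t|u) = t$, i.e., $Y$ given $U$ is uniform on $[0,1]$) checks the conditional orthogonality and isometry properties numerically. Under this specialization, the conditional expectation in (13) with the weight (14) reduces to the closed form $\int_0^{\min(t,y)} ds/(1-s) = -\log(1-\min(t,y))$:

```python
import numpy as np

rng = np.random.default_rng(0)


def indicator_transform(t, y):
    """Martingale transform of the indicator 1{t <= y} when
    F_{Y|U}(t|u) = t (Y|U uniform on [0,1]), following (13)-(14):
    gamma_y^K(t) = 1{t <= y} - int_0^{min(t,y)} ds/(1-s)
                 = 1{t <= y} + log(1 - min(t, y))."""
    return (t <= y).astype(float) + np.log(1.0 - np.minimum(t, y))


n = 400_000
Y = rng.uniform(size=n)

# Orthogonality: E[gamma_y^K(Y)] = 0 for each y.
for y in (0.2, 0.5, 0.8):
    assert abs(indicator_transform(Y, y).mean()) < 0.01

# Isometry: E[gamma_{y1}^K(Y) gamma_{y2}^K(Y)]
#         = E[1{Y <= y1} 1{Y <= y2}] = min(y1, y2).
y1, y2 = 0.3, 0.7
prod = (indicator_transform(Y, y1) * indicator_transform(Y, y2)).mean()
assert abs(prod - min(y1, y2)) < 0.02
print("orthogonality and isometry hold up to Monte Carlo error")
```

Both moments match their theoretical values up to simulation noise, which is exactly what the orthogonality and isometry conditions assert in this special case.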

Compared to the conditional martingale transform in Song (2007), the transform in this paper corresponds to the case where the function $q_0$ in Song (2007) equals 1, and hence it is simpler. This is primarily because the estimation error of $\lambda$ no longer needs to be taken care of in the transform: it plays no role in determining the asymptotic representation in Theorem 1, due to the sample quantile transform of the conditioning variable.

The following result shows that the transform K defined in (13) is the martingale transform that we have sought. Eventually, we focus on test statistics based on the following two processes $\nu_n^K(r)$ and $\nu_{n,E}^K(r)$ defined by
$$\nu_n^K(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\gamma_z^K(Z_i)\gamma_y^K(Y_i) \quad\text{and} \tag{15}$$
$$\nu_{n,E}^K(r) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\gamma_z^K(Z_i)\big(\gamma_y^K(Y_i) - \hat g_y^K(U_i)\big),$$
where $\hat g_y^K(u)$ denotes the nonparametric estimator $\hat g_y^Y(u)$ that uses $\gamma_y^K$ in place of $\gamma_y$.

Theorem 2: (i) For all $y_1, y_2, z_1$ and $z_2$ in $\mathbf{R}$,
$$\langle\gamma_{y_1}^K, 1\rangle_U = 0 \ \text{ and } \ \langle\gamma_{y_1}^K, \gamma_{y_2}^K\rangle_U = \langle\gamma_{y_1}, \gamma_{y_2}\rangle_U \ \text{a.s., and}$$
$$\langle\gamma_{z_1}^K, 1\rangle_U = 0 \ \text{ and } \ \langle\gamma_{z_1}^K, \gamma_{z_2}^K\rangle_U = \langle\gamma_{z_1}, \gamma_{z_2}\rangle_U \ \text{a.s.}$$
(ii) Suppose Assumptions 1-4 hold.
(a) Under $H_0$, the processes $\nu_n^K(r)$ and $\nu_{n,E}^K(r)$ defined in (15) satisfy
$$\nu_n^K(r) \rightsquigarrow \nu^K(r) \ \text{ and } \ \nu_{n,E}^K(r) \rightsquigarrow \nu^K(r), \ \text{ in } D(S), \tag{16}$$
where $\nu^K$ is a Gaussian process on $S$ whose covariance kernel is given by
$$c_3(r_1, r_2) \triangleq \int_{-\infty}^{u_1\wedge u_2} C_3(w_1, w_2; u)\,dF(u), \quad r_j = (u_j, w_j),\ j = 1, 2,$$
and the function $C_3(w_1, w_2; u)$ is defined by $C_3(w_1, w_2; u) \triangleq g_{z_1\wedge z_2}^Z(u)\,g_{y_1\wedge y_2}^Y(u)$.

(b) Under the fixed alternatives,
$$n^{-1/2}\nu_n^K(r) \rightsquigarrow E\big[\gamma_u(U)\langle\gamma_z^K, \gamma_y^K\rangle_U\big] \ \text{ and } \ n^{-1/2}\nu_{n,E}^K(r) \rightsquigarrow E\big[\gamma_u(U)\langle\gamma_z^K, \gamma_y^K\rangle_U\big], \ \text{ in } D(S),$$
where $\gamma_z^K$ and $\gamma_y^K$ are defined in (13).

The first statement of the theorem is immediately adapted from Theorem 6.1 of Khmaladze

and Koul (2004) to the "conditional inner product". This result tells us that the transform

satisfies the orthogonality condition and the isometry condition. From the results of (i)

and (ii), we can derive the asymptotic properties of tests that are constructed based on the

transformed processes νKn (r) and νKn,E(r).

The limiting Gaussian process $\nu^K(r)$ is a time-transformed Brownian sheet whose kernel still depends on the unknown nonparametric functions $g_z^Z(u)$ and $g_y^Y(u)$. Following an idea similar to that of Khmaladze (1993), we can turn the process into a standard Brownian sheet by replacing $\gamma_z^K(u, z)$ and $\gamma_y^K(u, y)$ with
$$\tilde\gamma_z^K(u, z) \triangleq \frac{\gamma_z^K(u, z)}{\sqrt{f_{Z|U}(z|u)}} \quad\text{and}\quad \tilde\gamma_y^K(u, y) \triangleq \frac{\gamma_y^K(u, y)}{\sqrt{f_{Y|U}(y|u)}},$$

where $f_{Z|U}$ and $f_{Y|U}$ denote the conditional density functions. Then the weak limit $\nu^K(r)$ in (16) has the covariance function
$$c(r_1, r_2) \triangleq (u_1\wedge u_2)(z_1\wedge z_2)(y_1\wedge y_2), \quad r_1, r_2 \in S,$$
which is that of a standard Brownian sheet. Observe that a test based on this conditional martingale transform is not yet asymptotically pivotal, because the range $S$ of the standard Brownian sheet depends on the distribution of $(Y, Z)$. This can be dealt with by rescaling the marginals of $Y$ and $Z$ appropriately. We explain this in the section devoted to the feasible conditional martingale transform.

4.3 Feasible Conditional Martingale Transforms

The method of conditional martingale transforms proposed in the previous section is not yet feasible for two reasons. First, the conditional martingale transform involves unknown components of the data generating process. Second, the index range $S$ is not known. Suppose $\hat F_n^Y$ and $\hat F_n^Z$ are the empirical quantile transforms of $(Y_i)_{i=1}^n$ and $(Z_i)_{i=1}^n$. Let us define
$$V_i^Y \triangleq \hat F_n^Y(Y_i) \quad\text{and}\quad V_i^Z \triangleq \hat F_n^Z(Z_i). \tag{17}$$
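The empirical quantile transform in (17) simply evaluates the empirical distribution function at each data point. A minimal sketch (the function name is ours):

```python
import numpy as np


def empirical_quantile_transform(x):
    """Empirical distribution function evaluated at the data points:
    V_i = F_n(x_i) = #{j : x_j <= x_i} / n, as in (17)."""
    x = np.asarray(x)
    return np.searchsorted(np.sort(x), x, side="right") / x.size


rng = np.random.default_rng(1)
y = rng.normal(size=1000)
v = empirical_quantile_transform(y)

# With no ties, v is a permutation of {1/n, 2/n, ..., 1}, hence
# approximately uniform on (0, 1].
n = y.size
assert v.max() == 1.0 and v.min() == 1 / n
assert abs(v.mean() - (n + 1) / (2 * n)) < 1e-9
```

This is why the transformed indices live in $[0, 1)^2 \times [0, 1]$ regardless of the original supports of $Y$ and $Z$.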

We define a feasible transform as follows. Let us define $P$ as in (5) and let
$$\hat\beta(y, \bar y) \triangleq \begin{bmatrix} 1\{c_0 \le V_1^Y \le y\wedge\bar y\}\Big/\Big\{\sqrt{\hat f(V_1^Y, U_1)}\,\big[1 - p_K(U_1)'\hat\pi(V_1^Y)\big]\Big\} \\ \vdots \\ 1\{c_0 \le V_n^Y \le y\wedge\bar y\}\Big/\Big\{\sqrt{\hat f(V_n^Y, U_n)}\,\big[1 - p_K(U_n)'\hat\pi(V_n^Y)\big]\Big\} \end{bmatrix}, \tag{18}$$
where $\hat\pi(y) \triangleq [P'P]^{-1}P'b(y)$, $b(y)$ is the column vector whose $i$-th entry equals $1\{0 < V_i^Y \le y\}$, and $\hat f(v, u)$ denotes a nonparametric estimator of the joint density of $V_i^Y$ and $U_i$.$^5$ Thus the feasible transform is given by
$$(\hat\gamma_y^K)(u, \bar y) = \frac{1\{\bar y \le y\}}{\sqrt{\hat f(\bar y, u)}} - p_K(u)'[P'P]^{-1}P'\hat\beta(y, \bar y). \tag{19}$$

We can similarly obtain $(\hat\gamma_z^K)(u, z)$ by exchanging the roles of $V_i^Y$ and $V_i^Z$. An estimated version of the transformed process is now defined by
$$\hat\nu_n^K(r) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\,\hat\gamma_z^K(U_i, V_i^Z)\,\hat\gamma_y^K(U_i, V_i^Y).$$
Due to the random rescaling of $Y_i$ and $Z_i$ in (17), the process is indexed by $r \in S_0$, where $S_0 \triangleq [0, 1)^2 \times [0, 1]$. Hence the dependence of the process upon the unknown support $S$ has disappeared.

We can construct the transformed two-sided test statistics in the following way:
$$T_{KS}^K = \sup_{r\in S_0}\big|\hat\nu_n^K(r)\big| \quad\text{and}\quad T_{CM}^K = \int_{S_0}\big(\hat\nu_n^K(r)\big)^2\,dr. \tag{20}$$

In the context of testing semiparametric conditional moment restrictions, Song (2007) delineated conditions under which the feasible conditional martingale transforms are asymptotically valid. We can establish the asymptotic validity of the feasible conditional martingale transforms in a similar manner, so that we obtain, combined with Theorem 2,
$$T_{KS}^K \to_d \sup_{r\in S_0}|W(r)| \quad\text{and}\quad T_{CM}^K \to_d \int_{S_0}|W(r)|^2\,dr,$$
where $W$ is a standard Brownian sheet on $S_0$. We sketch the derivation of the asymptotic validity in the appendix.

$^5$Note that since $U_i$ is uniformly distributed, we have $f(v|u) = f_{V^Y,U}(v, u)$, where $f_{V^Y,U}(v, u)$ is the joint density of $V_i^Y$ and $U_i$. Hence we can simply use an estimator of $f_{V^Y,U}(v, u)$ instead of $f(v|u)$.

When we are interested in testing the null of conditional independence against alternatives of conditional positive dependence, we can construct the following test statistics:
$$T_{KS,1}^K = \sup_{r\in S_0}\hat\nu_n^K(r) \to_d \sup_{r\in S_0}W(r) \quad\text{and}$$
$$T_{CM,1}^K = \int_{S_0}\big(\max\{\hat\nu_n^K(r), 0\}\big)^2\,dr \to_d \int_{S_0}\big(\max\{W(r), 0\}\big)^2\,dr.$$

5 When Zi is a Binary Variable

In some applications, either Y or Z is binary. Specifically, we consider the case where Z is a binary variable taking the value zero or one, and the variables Y and X are continuous. This setting is similar to Angrist and Kuersteiner (2004), but the difference is that we do not assume a parametric specification of the conditional probability $P\{Z = 1|X\}$. The application of the conditional martingale transform in this case becomes even simpler. First, we do not need to transform the indicator function of Z. Second, the dimension of the index space for the empirical process in the test statistic reduces from 3 to 2.

First, observe that the null hypothesis becomes
$$E\big[\gamma_u(U)1\{Z = z\}\{\gamma_y(Y) - E[\gamma_y(Y)|U]\}\big] = 0$$
for all $(u, y, z) \in [0, 1]^2 \times \{0, 1\}$. From the binary character of Z, we observe that the null hypothesis is equivalent to$^6$
$$E\big[\gamma_u(U)Z\{\gamma_y(Y) - E[\gamma_y(Y)|U]\}\big] = 0.$$
This latter formulation of the null hypothesis is convenient because the index for the indicator of $Z_i$ is fixed at one, and hence the corresponding empirical process has an index running over $[0, 1]^2$ instead of $[0, 1]^2 \times \{0, 1\}$.

$^6$Equivalently, we can write the null hypothesis as $E[\gamma_u(U)(1 - Z)\{\gamma_y(Y) - E[\gamma_y(Y)|U]\}] = 0$.

The corresponding feasible empirical process in the test


statistic before the martingale transform is given by
$$\hat\nu_n(u, y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\frac{Z_i\big\{\gamma_y(Y_i) - \hat E[\gamma_y(Y_i)|U_i]\big\}}{\sqrt{\hat P\{Z = 1|U_i\}}}, \quad (u, y) \in [0, 1]^2.$$

Using arguments similar to those in the proof of Theorem 1 and assuming regularity conditions for the estimator $\hat P\{Z = 1|U_i\}$, we obtain
$$\hat\nu_n(u, y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\frac{Z_i - E[Z_i|U_i]}{\sqrt{P\{Z = 1|U_i\}}}\big\{\gamma_y(Y_i) - E(\gamma_y(Y_i)|U_i)\big\} + o_P(1).$$

Hence it turns out that, in this case of binary Z, we only have to take a conditional martingale transform of the indicator function $\gamma_y$. Let us consider the following three martingale-transformed processes:
$$\nu_{na}^K(u, y) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i\,\gamma_u(U_i)\,(\gamma_y^K)(U_i, Y_i)}{\sqrt{P\{Z_i = 1|U_i\}}}, \tag{21}$$
$$\nu_{nb}^K(u, y) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{(1 - Z_i)\,\gamma_u(U_i)\,(\gamma_y^K)(U_i, Y_i)}{\sqrt{P\{Z_i = 0|U_i\}}}, \quad\text{and}$$
$$\nu_{n,E}^K(u, y) \triangleq \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i\,\gamma_u(U_i)\big\{(\gamma_y^K)(U_i, Y_i) - \hat E[(\gamma_y^K)(U_i, Y_i)|U_i]\big\}}{\sqrt{P\{Z_i = 1|U_i\} - P\{Z_i = 1|U_i\}^2}}.$$

In view of the previous results, these processes weakly converge in $D([0, 1] \times [0, 1))$ to a standard Brownian sheet under the null hypothesis, i.e., a Gaussian process with covariance function
$$c(u_1, y_1; u_2, y_2) \triangleq (u_1\wedge u_2)(y_1\wedge y_2).$$
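As an illustrative numerical check (not part of the paper's procedure), a standard Brownian sheet can be approximated by normalized partial sums of i.i.d. $N(0,1)$ cells on an $m \times m$ grid, and the sample covariance at two index points should match $(u_1\wedge u_2)(y_1\wedge y_2)$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, reps = 20, 20_000  # grid resolution and Monte Carlo replications

# W(u, y) ~= (1/m) * sum of i.i.d. N(0,1) cells over the rectangle [0,u] x [0,y]
e = rng.normal(size=(reps, m, m))
S = e.cumsum(axis=1).cumsum(axis=2) / m

u1, y1 = 0.5, 0.5
u2, y2 = 0.75, 1.0
w1 = S[:, int(u1 * m) - 1, int(y1 * m) - 1]
w2 = S[:, int(u2 * m) - 1, int(y2 * m) - 1]

cov_hat = (w1 * w2).mean()              # Monte Carlo estimate of E[W(r1) W(r2)]
cov_theory = min(u1, u2) * min(y1, y2)  # (u1 ^ u2)(y1 ^ y2) = 0.25
assert abs(cov_hat - cov_theory) < 0.02
```

The grid points are chosen so that $u_j m$ and $y_j m$ are integers, which makes the partial-sum approximation exact at those indices.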

In the next subsection, we compare the asymptotic power properties of tests based on these three processes.

Let us consider the concrete procedure for constructing a test statistic using a feasible conditional martingale transform. For brevity, we consider only the processes $\nu_{na}^K(u, y)$ and $\nu_{nb}^K(u, y)$. Let $\hat P(Z_i = 1|U_i)$ denote a nonparametric estimator of $P\{Z_i = 1|U_i\}$ and let $\hat\gamma_y^K$ be as defined in (19). Then we define the processes
$$\hat\nu_{na}^K(u, y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{Z_i\,1\{U_i \le u\}\,\hat\gamma_y^K(U_i, V_i^Y)}{\sqrt{\hat P(Z_i = 1|U_i)}} \quad\text{and}$$
$$\hat\nu_{nb}^K(u, y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \frac{(1 - Z_i)\,1\{U_i \le u\}\,\hat\gamma_y^K(U_i, V_i^Y)}{\sqrt{\hat P(Z_i = 0|U_i)}}.$$

We can construct the following three Kolmogorov-Smirnov type test statistics:
$$T_{1a}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\big|\hat\nu_{na}^K(u, y)\big|, \quad T_{1b}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\big|\hat\nu_{nb}^K(u, y)\big|, \quad\text{and}$$
$$T_{1S}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\big\{\big|\hat\nu_{na}^K(u, y)\big| + \big|\hat\nu_{nb}^K(u, y)\big|\big\}\big/\sqrt{2}.$$

The three tests have the same asymptotic critical values. However, their asymptotic power properties are quite different, as we shall see in the next section and in the section devoted to the simulation studies. The asymptotic power of $T_{1a}^K$ can be large or small depending on whether Z and Y are conditionally positively or negatively dependent given U. The symmetrized version $T_{1S}^K$ does not show such asymmetric power properties across the direction of the alternatives.

In the actual computation of the test statistic, we use grid points. We have found in the simulation studies discussed in the next section that 10-by-10 equally spaced grid points on the unit square appear to work well. The limiting distribution under the null is the Kolmogorov-Smirnov functional of a two-parameter Brownian sheet. Its critical values are reproduced from Table 3 of Khmaladze and Koul (2004) in the following table.

significance level | 0.50 | 0.25 | 0.20 | 0.10 | 0.05 | 0.025 | 0.01
critical value     | 1.46 | 1.81 | 1.91 | 2.21 | 2.46 | 2.70  | 3.03

The critical values are based on the computation of Brownrigg and can be found at his website: http://www.mcs.vuw.ac.nz/~ray/Brownian. Therefore, one rejects the null hypothesis of conditional independence at the 0.05 level if $T_n > 2.46$.
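The resulting decision rule is straightforward. In the sketch below, the array `nu_on_grid` stands in for the values of a computed transformed process on the 10-by-10 grid (the sketch does not construct that process itself; the random grid at the end is a purely hypothetical input):

```python
import numpy as np


def ks_decision(nu_on_grid, level=0.05):
    """Kolmogorov-Smirnov decision rule on a grid of a martingale-
    transformed process.  Critical values reproduced from Table 3 of
    Khmaladze and Koul (2004)."""
    critical = {0.50: 1.46, 0.25: 1.81, 0.20: 1.91, 0.10: 2.21,
                0.05: 2.46, 0.025: 2.70, 0.01: 3.03}
    t_n = np.abs(np.asarray(nu_on_grid)).max()  # sup over the grid
    return t_n, t_n > critical[level]


# A hypothetical 10-by-10 grid of process values:
rng = np.random.default_rng(3)
grid = 0.5 * rng.normal(size=(10, 10))
t_n, reject = ks_decision(grid, level=0.05)
print(t_n, reject)
```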

We can similarly construct one-sided Kolmogorov-Smirnov type test statistics:
$$T_{1a,1}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\hat\nu_{na}^K(u, y) \quad\text{and}\quad T_{1b,1}^K = \sup_{(u,y)\in[0,1]\times[0,1)}\hat\nu_{nb}^K(u, y).$$
The statistic $T_{1a,1}^K$ is for testing conditional independence against positive conditional dependence given $\lambda_\theta(X)$, and the statistic $T_{1b,1}^K$ is for testing conditional independence against negative conditional dependence.

6 Asymptotic Power Analysis

In this section, we embark on a formal comparison of a variety of tests, analyzing the effects of the nonparametric estimation error and of the conditional martingale transform on the asymptotic power of the tests. The basic tool of comparison that we employ is the limiting Pitman efficiency, which we can compute without difficulty thanks to the well-known results of Bahadur (1960) and Wieand (1976). In particular, Wieand (1976) specified sufficient conditions under which the limiting Pitman efficiency for asymptotically non-normal tests can be computed via the approximate Bahadur efficiency (Bahadur (1960)).

First, we find that the nonparametric estimation error improves the asymptotic power of the test. This finding may strike one as counter-intuitive, because it is tantamount to saying that even when perfect knowledge of the nonparametric function is available, it is better to use its estimator. A similar observation was made in the estimation context by Hahn (1998) and Hirano, Imbens, and Ridder (2003). We provide some discussion of this phenomenon later. Second, we analyze the effect of the conditional martingale transform on the asymptotic power properties. We find that when both Y and Z are continuous, the improvement in asymptotic power due to the nonparametric estimation error disappears under the conditional martingale transform, but when either Y or Z is binary, the improvement is partially preserved even under the conditional martingale transform. We also characterize the set of local alternatives for which the conditional martingale transform improves the asymptotic power of the tests.

6.1 Limiting Pitman Efficiency

First, we define the limiting Pitman efficiency as follows (cf. Nikitin (1995), pp. 2-3). Suppose that $\mathcal{P}$ is the space of probabilities P for $(Y, Z, X)$, and we are given a sequence of test statistics $T_n$ for testing the null of conditional independence. We choose a parametric submodel in $\mathcal{P}$ and define the limiting Pitman efficiency under the parametric submodel. We then demonstrate that certain bounds on the limiting Pitman efficiency do not depend on the specific parametrization.

We first consider a parametric submodel $\{P_\tau : \tau \in [0, \tau_0]\} \subset \mathcal{P}$ for some $\tau_0 > 0$, in which $P_0$ represents a probability under the null hypothesis and $P_\tau$ with $\tau \in (0, \tau_0]$ represents a

probability under the alternative hypothesis. Then, for any $\beta \in (0, 1)$ and $\tau \in [0, \tau_0]$, a real sequence $c_{T_n}(\beta, \tau)$ is chosen such that
$$P_\tau\{T_n > c_{T_n}(\beta, \tau)\} \le \beta \le P_\tau\{T_n \ge c_{T_n}(\beta, \tau)\}.$$
Then the number $\alpha_{T_n}(\beta, \tau) \triangleq P_0\{T_n \ge c_{T_n}(\beta, \tau)\}$ is the minimal size of the test based on $\{T_n\}$ such that the power at $P_\tau$ is not less than $\beta$. For any two test statistics $T_n$ and $S_n$, the relative efficiency of $T_n$ with respect to $S_n$ is defined as
$$e_{T,S}(\alpha, \beta, \tau) = \frac{\min\{n : \alpha_{S_m}(\beta, \tau) \le \alpha \text{ for all } m \ge n\}}{\min\{n : \alpha_{T_m}(\beta, \tau) \le \alpha \text{ for all } m \ge n\}}.$$

When $e_{T,S} > 1$, the test based on $\{T_n\}$ is preferable to that based on $\{S_n\}$, because the sample size required to achieve power $\beta$ at $P_\tau$ at the given level $\alpha$ is smaller for the test $T_n$ than for the test $S_n$. In general, the exact computation of $e_{T,S}(\alpha, \beta, \tau)$ is extremely difficult, and a variety of notions of asymptotic relative efficiency have been proposed. In this paper, we focus on the limiting Pitman efficiency. For other notions, the reader is referred to Nikitin (1995).

Following Wieand (1976), we define the upper and lower Pitman efficiencies as follows:
$$e_{T,S}^+(\alpha, \beta) = \sup\Big\{\limsup_{j\to\infty} e_{T,S}(\alpha, \beta, \tau_j) : \{\tau_j\} \in \mathcal{T}_1,\ \tau_j \downarrow 0\Big\} \quad\text{and}$$
$$e_{T,S}^-(\alpha, \beta) = \inf\Big\{\liminf_{j\to\infty} e_{T,S}(\alpha, \beta, \tau_j) : \{\tau_j\} \in \mathcal{T}_1,\ \tau_j \downarrow 0\Big\},$$
where $\mathcal{T}_1$ denotes the space of sequences $\{\tau_j\}_{j\in\mathbf{N}_+}$ with $\tau_j \in (0, \tau_0]$. When $\lim_{\alpha\to 0} e_{T,S}^+(\alpha, \beta)$

and $\lim_{\alpha\to 0} e_{T,S}^-(\alpha, \beta)$ exist and coincide, we call the common number the limiting Pitman efficiency and denote it by $e_{T,S}^*(\beta)$. This limiting Pitman efficiency is the criterion that we employ to compare the tests. When the test statistics are asymptotically normal, Bahadur (1960) showed that the limiting Pitman efficiency coincides with the limit of the approximate Bahadur efficiency as $\tau \downarrow 0$. Wieand (1976) showed that this coincidence is preserved in a more general setting, once a stronger condition on the asymptotic behavior of the test statistic under local alternatives is satisfied. In this latter case, the limiting Pitman efficiency is independent of $\beta$, and hence we denote it simply by $e_{T,S}^*$.

6.2 The Effect of Nonparametric Estimation Error

We consider the situation of Section 3, and analyze how the use of a nonparametric estimator of $g_y^Y(u)$ in the process $\hat\nu_n(r)$ affects the power of the test. We find that the power of the test improves when the estimator is used in place of the true function. In other words, the test becomes better when we use the estimator $\hat g_y^Y(u)$ even if we know the true conditional distribution function $g_y^Y(u)$.

For the stochastic processes defined in (3) and (6) — $\nu_n(r)$ using the true function $g_y^Y(u)$ and $\hat\nu_n(r)$ using its nonparametric estimator — define
$$T_n = \sup_{r\in S}|\nu_n(r)| \quad\text{and}\quad S_n = \sup_{r\in S}|\hat\nu_n(r)|,$$
and let $e_{T,S}^*(\beta)$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$. Let $g_z^Z(\cdot; \tau)$ and $g_y^Y(\cdot; \tau)$ denote the conditional distribution functions of Z and Y given U under $P_\tau$. Then we obtain the following result.

Corollary 2: Suppose the conditions of Theorem 1 hold. Then
$$e_{T,S}^* = \lim_{\tau\downarrow 0}\frac{\sup_{(z,y)\in\mathbf{R}^2}\int_0^1\big(g_z^Z(u;\tau) - g_z^Z(u;\tau)^2\big)\big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du}{\sup_{y\in\mathbf{R}}\int_0^1\big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du} \le \frac{1}{4},$$
if the limit above exists.
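The bound of 1/4 follows from the elementary inequality $p - p^2 \le 1/4$ for any $p \in [0, 1]$, applied pointwise to $p = g_z^Z(u; \tau)$:

```latex
% For every (z, y), since 0 <= g_z^Z(u;\tau) <= 1,
\int_0^1 \big(g_z^Z(u;\tau) - g_z^Z(u;\tau)^2\big)\big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du
\;\le\; \frac{1}{4}\int_0^1 \big(g_y^Y(u;\tau) - g_y^Y(u;\tau)^2\big)\,du.
% Taking suprema over (z, y) in the numerator and over y in the
% denominator preserves the inequality, so the ratio is at most 1/4.
```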

The result in Corollary 2 is well expected from the result of Theorem 1. The main reason is that while the nonparametric estimator $\hat g_y^Y(u)$, used in place of $g_y^Y(u)$, has the effect of "shrinking" the kernel of the limiting Gaussian process, the local shift of the test statistics under the local alternatives remains the same regardless of whether one uses $\hat g_y^Y(u)$ or $g_y^Y(u)$. This yields the counter-intuitive result that even when we know the conditional distribution function $g_y^Y(u)$, it is better to use its estimator $\hat g_y^Y(u)$ in the test statistic.$^7$

In the estimation literature, under certain circumstances, the estimation of nonparametric functions improves the efficiency of estimators relative to the case where the true functions are used (e.g., Hahn (1998), Hirano, Imbens, and Ridder (2003), and the references therein). This efficiency gain arises because the nonparametric estimation utilizes additional restrictions in a manner that is not exploited when the true functions are used.

The primary source of the improvement of the asymptotic power in Corollary 2 stems from the following asymptotic representation (for a general result, see Lemma 1U of Escanciano and Song (2007)):
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\gamma_z(Z_i)\big\{\gamma_y(Y_i) - \hat g_y^Y(U_i)\big\} = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\big\{\gamma_z(Z_i) - g_z^Z(U_i)\big\}\big\{\gamma_y(Y_i) - g_y^Y(U_i)\big\} + o_P(1).$$

$^7$Khmaladze and Koul (2004) also discuss a case where power is increased by using estimators instead of true functions, in the context of parametric tests.

As noted in the remarks after Theorem 1, the variance of the leading sum on the right-hand side is smaller under the null hypothesis than that of $\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\gamma_z(Z_i)\{\gamma_y(Y_i) - g_y^Y(U_i)\}$. In fact, when we replace $\gamma_z(Z)$ by $g_z^Z(U)$ on the left-hand side, we obtain
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\,g_z^Z(U_i)\big\{\gamma_y(Y_i) - \hat g_y^Y(U_i)\big\} = o_P(1). \tag{22}$$

This provides a contrast to the result obtained using the true nonparametric function $g_y^Y$:
$$\frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)\,g_z^Z(U_i)\big\{\gamma_y(Y_i) - g_y^Y(U_i)\big\} = O_P(1). \tag{23}$$

The rate of convergence for the sample version of $E\big[\gamma_u(U)g_z^Z(U)\{\gamma_y(Y) - g_y^Y(U)\}\big]$ is faster with the nonparametric estimator $\hat g_y^Y$ than with the true one. In fact, we obtain the same contrast between (22) and (23) when we replace $\gamma_u(U)g_z^Z(U)$ by any function $\varphi(U)$. In other words, by using nonparametric estimation in the test statistic, we utilize the conditional moment restriction $E\big[\gamma_y(Y) - g_y^Y(U)|U\big] = 0$ more effectively than by using the true function $g_y^Y$. This conditional moment restriction holds both under the null hypothesis and under the alternatives. Under the alternatives, the effective utilization of the conditional moment restriction by nonparametric estimation has an effect whose order is dominated by the $\sqrt{n}$-diverging shift term, which remains the same regardless of whether we use $\hat g_y^Y$ or $g_y^Y$. Combined with this, the first-order effect of nonparametric estimation under the null hypothesis results in an improvement in power over the test using the true function $g_y^Y$.

6.3 The Effect of Nonparametric Estimation Error After the Martingale Transform

In this section, we analyze the effect of the nonparametric estimation error on the test statistics under the conditional martingale transforms. We divide the discussion into the case where both Y and Z are continuous and the case where either Y or Z is binary.

6.3.1 When Y and Z Are Continuous

We consider the situation of Section 4.2. Recall the definitions of the processes $\nu_n^K(r)$ and $\nu_{n,E}^K(r)$ in (15). We now compare the two tests defined by
$$T_n = \sup_{r\in S}|\nu_{n,E}^K(r)| \quad\text{and}\quad S_n = \sup_{r\in S}|\nu_n^K(r)|,$$
and let $e_{T,S}^*$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$. The test $T_n$ uses the estimated version $\hat g_y^K(U)$ of $E[\gamma_y^K(Y)|U]$, and the test $S_n$ uses the true one $g_y^K(U) = E[\gamma_y^K(Y)|U]$, which is zero by Theorem 2(i). We then obtain the following result.

Corollary 3: Suppose the conditions of Theorems 1 and 2 hold. Then
$$e_{T,S}^* = 1.$$

The result of Corollary 3 shows that the effect of the nonparametric estimation error disappears under the conditional martingale transform. This was expected from Theorem 2, which showed that the asymptotic behavior of the two tests $T_n$ and $S_n$ is the same both under the null hypothesis and under the local alternatives. The main reason is that the conditional martingale transform completely wipes out the additional term due to the nonparametric estimation error that is responsible for the improvement in asymptotic power. Therefore, under the conditional martingale transform, one may prefer the simpler test statistic $S_n$ to $T_n$, which involves the nonparametric estimator $\hat g_y^K(U)$.

6.3.2 When Either Y or Z Is Binary

In this subsection, we consider the situation of Section 5. We noted that the nonparametric estimation error does not affect the asymptotic properties of the martingale-transformed processes. However, when Z is binary, the nonparametric estimation error can improve the power of the test even after the martingale transform. This is because we take a conditional martingale transform only of $\gamma_y$, and thereby wipe out the effect of the nonparametric estimation error only partially.

To substantiate these remarks, let us analyze the behavior of the two processes $\nu_{na}^K(u, y)$ and $\nu_{n,E}^K(u, y)$ in (21) under a fixed alternative. Based on Theorems 1 and 2, we can easily show that$^8$
$$n^{-1/2}\nu_{na}^K(u, y) \to_p \varphi_0(u, y; \tau) \triangleq E_\tau\!\left[\frac{Z\gamma_u(U)(\gamma_y^K)(U, Y)}{\sqrt{P_\tau\{Z = 1|U\}}}\right] \quad\text{and} \tag{24}$$
$$n^{-1/2}\nu_{n,E}^K(u, y) \to_p \varphi_1(u, y; \tau) \triangleq E_\tau\!\left[\frac{Z\gamma_u(U)(\gamma_y^K)(U, Y)}{\sqrt{P_\tau\{Z = 1|U\} - P_\tau\{Z = 1|U\}^2}}\right],$$
where $E_\tau$ denotes the expectation under $P_\tau$. Since the denominator of the shift term for $\nu_{na}^K(u, y)$ dominates that of the shift term for $\nu_{n,E}^K(u, y)$, the test based on the process

$^8$The second statement follows by the law of iterated conditional expectations and the orthogonality conditions in Theorem 2(i).

$\nu_{n,E}^K(u, y)$ has better local asymptotic power than the test based on $\nu_{na}^K(u, y)$. This again shows the interesting phenomenon that the test has better power when we use the nonparametric estimator of $E\big[\gamma_y^K(U, Y)|U\big]$ instead of the true one. As before, we can confirm this phenomenon in terms of the limiting Pitman efficiency. Define now

$$T_n = \sup_{(u,y)\in[0,1]\times\mathbf{R}}|\nu_{na}^K(u, y)| \quad\text{and}\quad S_n = \sup_{(u,y)\in[0,1]\times\mathbf{R}}|\nu_{n,E}^K(u, y)|,$$
and let $e_{T,S}^*$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$.

Corollary 4: Suppose that the conditions of Theorem 1 hold, except for those concerning Z. Then
$$e_{T,S}^* = \lim_{\tau\downarrow 0}\left\{\frac{\sup_{(u,y)\in[0,1]\times\mathbf{R}}|\varphi_0(u, y; \tau)|}{\sup_{(u,y)\in[0,1]\times\mathbf{R}}|\varphi_1(u, y; \tau)|}\right\}^2 \le 1,$$
if the limit above exists.

The improved power property in Corollary 4 is demonstrated in the simulation studies. Note that the local shift $E\big[Z\gamma_u(U)(\gamma_y^K)(U, Y)/\sqrt{P\{Z = 1|U\}}\big]$ in (24) can be large or small depending on whether Z and Y are conditionally positively or negatively dependent. Hence the asymptotic power properties can be asymmetric across the alternatives. The symmetrized version $T_{1S}^K$ removes part of this asymmetry.

6.4 The Effect of Conditional Martingale Transforms on Asymptotic Power of Tests

Based on the result of Theorem 2, we can analyze the effect of conditional martingale transforms on the asymptotic power using the limiting Pitman efficiency. In this subsection, we compare two tests based on the processes $\hat\nu_n(r)$ and $\nu_n^K(r)$ defined in (6) and (15). At first glance, the comparison is ambiguous, because the martingale transform dilates the variance of the empirical process under the null hypothesis, but at the same time moves the shift term under the local alternatives farther from zero. The eventual comparison should take into account the trade-off between these two changes. To facilitate the analysis, we consider the following specification of Y:
$$Y = \xi(Z, \varepsilon),$$
where $\xi(z, \varepsilon) : \mathbf{R}^2 \to \mathbf{R}$ and $\varepsilon$ is a random variable taking values in $\mathbf{R}$ that is conditionally independent of Z given U. In the following, we give the limiting Pitman efficiency of

Kolmogorov-Smirnov tests based on the processes $\hat\nu_n(r)$ and $\nu_n^K(r)$, and characterize a set of alternatives under which the conditional martingale transform improves the asymptotic power of the test.

Let us denote the conditional copula of Z and Y given U by
$$C_{Z,Y|U}(z, y|U; \tau) \triangleq P_\tau\big\{g_Z^Z(U; \tau) \le z,\ g_Y^Y(U; \tau) \le y\,\big|\,U\big\}.$$
The conditional copula $C_{Z,Y|U}(z, y|U; \tau)$ is the conditional joint distribution function of Z and Y under $P_\tau$ after normalization by the conditional quantile transforms. It summarizes the conditional joint dependence of Z and Y given U. For introductory details about copulas, see Nelsen (1999); for conditional copulas, see Patton (2006). Define

$$T_n = \sup_{r\in S_0}|\hat\nu_n(r)| \quad\text{and}\quad S_n = \sup_{r\in S_0}|\nu_n^K(r)|,$$
and let $e_{T,S}^*$ denote the limiting Pitman efficiency of $T_n$ with respect to $S_n$.

Corollary 5: Suppose the conditions of Theorems 1 and 2 hold. Furthermore, assume that $\xi(z, \varepsilon)$ is either strictly increasing in z for all $\varepsilon$ in the support of $\varepsilon$, or strictly decreasing in z for all $\varepsilon$ in the support of $\varepsilon$. Then
$$e_{T,S}^* = \lim_{\tau\downarrow 0}\frac{\sup_{(u,z,y)\in S}\Big|\int_0^u\big\{C_{Z,Y|U}\big(g_z^Z(\bar u; \tau), g_y^Y(\bar u; \tau)\big|\bar u; \tau\big) - g_z^Z(\bar u; \tau)g_y^Y(\bar u; \tau)\big\}\,d\bar u\Big|^2}{\sup_{(z,y)\in\mathbf{R}^2}\int_0^1\big(g_z^Z(u; \tau) - g_z^Z(u; \tau)^2\big)\big(g_y^Y(u; \tau) - g_y^Y(u; \tau)^2\big)\,du},$$
if the limit above exists. In particular, when $g_z^Z(u; \tau)$ and $g_y^Y(u; \tau)$ do not depend on u for all $\tau \in [0, \epsilon)$ for some $\epsilon > 0$,
$$e_{T,S}^* \le \frac{1}{4}.$$

Corollary 5 provides a representation of the limiting Pitman efficiency $e_{T,S}^*$. When $g_z^Z(u; \tau)$ and $g_y^Y(u; \tau)$ do not depend on u in a neighborhood of $\tau = 0$, so that the test of conditional independence becomes a test of independence, it follows that $e_{T,S}^* \le 1/4$; hence in this case, the conditional martingale transform weakly improves the asymptotic power, as long as $\xi(z, \varepsilon)$ satisfies the condition of Corollary 5. In general, however, we cannot say that either of the two tests dominates the other.


7 Simulation Studies

7.1 Conditional Martingale Transform and Wild Bootstrap

In this section, we present and discuss results from simulation studies that compare the conditional martingale transform approach and the bootstrap approach. We consider testing the conditional independence of $Y_i$ and $Z_i$ given $X_i$. Each variable is real-valued. The variable $Z_i$ is binary, taking the value zero or one according to the rule
$$Z_i = 1\{X_i + \eta_i > 0\}.$$
The variable $Y_i$ is determined by the data generating process
$$Y_i = \beta X_i + \kappa Z_i + \varepsilon_i.$$
The parameter $\beta$ is set to 0.5. The variable $X_i$ is drawn from the uniform distribution on $[-1, 1]$, and the errors $\eta_i$ and $\varepsilon_i$ are independently drawn from $N(0, 1)$. Note that $\kappa = 0$ corresponds to the null hypothesis of conditional independence between Y and Z given X. The magnitude of $\beta$ controls the strength of the relation between $Y_i$ and $X_i$.
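The design above can be generated as follows (a straightforward sketch; the function name and the seed are ours):

```python
import numpy as np


def simulate(n, beta=0.5, kappa=0.0, rng=None):
    """One Monte Carlo sample from the Section 7.1 design:
    Z_i = 1{X_i + eta_i > 0},  Y_i = beta*X_i + kappa*Z_i + eps_i,
    with X_i ~ U[-1, 1] and eta_i, eps_i ~ N(0, 1).
    kappa = 0 corresponds to the null of conditional independence."""
    rng = rng or np.random.default_rng()
    x = rng.uniform(-1.0, 1.0, size=n)
    z = (x + rng.normal(size=n) > 0).astype(float)
    y = beta * x + kappa * z + rng.normal(size=n)
    return y, z, x


y, z, x = simulate(300, kappa=0.0, rng=np.random.default_rng(4))
assert set(np.unique(z)) <= {0.0, 1.0}
# By the symmetry of X and eta around zero, P{Z = 1} = 1/2 in this design.
assert abs(z.mean() - 0.5) < 0.1
```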

We consider two tests, one based on the conditional martingale transform approach and the other based on the bootstrap approach. For both tests, we take a sample quantile transform of the marginals of $Y_i$ and $X_i$ to generate $V_i^Y$ and $U_i$. The bootstrap procedure is as follows:

(1) Compute $(\hat\varepsilon_{y,i})_{i=1}^n$ defined by $\hat\varepsilon_{y,i} = \gamma_y(Y_i) - \hat E(\gamma_y(Y_i)|U_i)$.
(2) Take a wild bootstrap sample $(\varepsilon_{y,i}^*)_{i=1}^n = (\omega_i\hat\varepsilon_{y,i})_{i=1}^n$, where $\{\omega_i\}$ is an i.i.d. sequence of random variables independent of $\{Y_i, Z_i, U_i\}$ such that $E(\omega_i) = 0$ and $E(\omega_i^2) = 1$.
(3) Construct $\gamma_{y,i}^* = \hat E(\gamma_y(Y_i)|U_i) + \varepsilon_{y,i}^*$.
(4) Using $(\gamma_{y,i}^*, U_i)_{i=1}^n$, estimate the bootstrap conditional mean function $\hat E(\gamma_{y,i}^*|U_i)$.
(5) Obtain the bootstrap empirical process $\nu_n^*(u, y) = \frac{1}{\sqrt{n}}\sum_{i=1}^n \gamma_u(U_i)Z_i\big[\gamma_{y,i}^* - \hat E(\gamma_{y,i}^*|U_i)\big]$.

The bootstrap procedure is the same as that considered in Delgado and González Manteiga (2001), p. 1490, except that they did not take the quantile transform of X. Based on the bootstrap process, we can construct the bootstrap test statistics
$$T_{KS}^B = \sup_{(u,y)\in[0,1]\times\mathbf{R}}|\nu_n^*(u, y)| \quad\text{and}\quad T_{CM}^B = \int\!\!\int \nu_n^*(u, y)^2\,du\,dy.$$
Nonparametric estimation is done using series estimation with Legendre polynomials.

We rescale the variables $X_i$ and $Y_i$ so that they have unit-interval support; the rescaling is done by taking their (empirical) quantile transformations. We also take into account unknown conditional heteroskedasticity by normalizing the semiparametric empirical process by $\hat\sigma(U_i)^{-1}$. The test statistics are of Kolmogorov-Smirnov type. The number of Monte Carlo iterations and the number of bootstrap Monte Carlo iterations are both set to 1000. The sample size is 300.

First, consider the size of the tests, whose results are in Table 1. The nominal size is set at 0.05. Order 1 denotes the number of terms included in the nonparametric estimation of the bivariate density function $f_{Y,U}(y, u)$, and Order 2 denotes the number of terms included in the nonparametric estimation of the conditional expectations given $U_i$. The martingale-transform based tests appear more sensitive to the choice of orders in the series estimation than the bootstrap-based test. This is not surprising, because the martingale transform approach involves more instances of nonparametric estimation than the bootstrap approach. The empirical size overall seems reasonable.

[INSERT TABLE 1]

Tables 2 and 3 contain the results on the power of the tests. The deviation from the null hypothesis is controlled by the magnitude of $\kappa$. Results in Table 2 are under alternatives of positive dependence between $Y_i$ and $Z_i$, and those in Table 3 are under alternatives of negative dependence. First, the power of the test based on $T_{1A}^K$ is shown to be disproportionate between alternatives of positive dependence (with $\kappa$ positive) and negative dependence (with $\kappa$ negative). This disproportionate behavior disappears for its "symmetrized" version $T_{1B}^K$. Second, the power of the test based on $T_2^K$ is better than that based on $T_{1A}^K$ or $T_{1B}^K$. Recall that the test statistic $T_2^K$ involves the nonparametric estimator of $E[(\gamma_y^K)(U_i, Y_i)|U_i] = 0$, whereas $T_{1A}^K$ and $T_{1B}^K$ simply use zero in its place. The result shows the interesting phenomenon that the use of the nonparametric estimator improves the power of the test even when we know that it is equal to zero in the population. The bootstrap-based tests also work well, and their performance is stable over the choice of the order in the series estimation and over the alternatives under which the test is performed.

[INSERT TABLES 2 AND 3]


7.2 The Effect of Nonparametric Estimation of $g_y^Y(u)$

We consider testing the conditional independence of Y and Z given U, where Z is a binary variable and (Y, U) has uniform marginals over [0, 1]². We generate

    Y = α((1 − κ)U + κ(Zε + (1 − Z)U)) + (1 − α)ε,

where ε follows a standard normal distribution and is independent of U and Z, and Z is generated by

    Z = 1{U − η > 0},

where η is a uniform variate on [0, 1] independent of U. Hence P{Z = 1} = 1/2. When κ = 0, the null hypothesis of conditional independence of Y and Z given U holds. As κ becomes larger, the role of Z in contributing to Y directly, rather than through U, becomes more significant. In this case, we can explicitly compute the conditional distribution function g^Y_y(u) as follows:

    g^Y_y(u) = P{Y ≤ y | U = u, Z = 1} P{Z = 1} + P{Y ≤ y | U = u, Z = 0} P{Z = 0}
             = (1/2) [ Φ( (y − α(1 − κ)u) / (1 − α(1 − κ)) ) + Φ( (y − αu) / (1 − α) ) ].

Note that E[Z|U] = U. Hence σ₂(U) = √(U − U²).
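This design is straightforward to simulate. The sketch below is our own illustrative code, not from the paper; the seed and the sample size are arbitrary choices. It draws (U, Z, Y) and checks the two moments just noted, P{Z = 1} = 1/2 and E[ZU] = E[U²] (which follows from E[Z|U] = U).

```python
import numpy as np

def simulate_design(n, alpha, kappa, rng):
    """Draw (U, Z, Y) from the Monte Carlo design of Section 7.2."""
    U = rng.uniform(size=n)
    eta = rng.uniform(size=n)                # uniform on [0, 1], independent of U
    Z = (U - eta > 0).astype(float)          # P{Z = 1 | U} = U, so P{Z = 1} = 1/2
    eps = rng.standard_normal(n)             # standard normal, independent of U and Z
    Y = alpha * ((1 - kappa) * U + kappa * (Z * eps + (1 - Z) * U)) + (1 - alpha) * eps
    return U, Z, Y

rng = np.random.default_rng(0)
U, Z, Y = simulate_design(100_000, alpha=0.5, kappa=0.0, rng=rng)
```

With κ = 0, Y is conditionally independent of Z given U, so this draw is from the null.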

We noted before that the estimation error in ĝ^Y_y(U_i) has the effect of improving the power of the test. In order to investigate this in finite samples, we consider the two test statistics

    T^INF_n  = sup_{(u,y)∈[0,1]²} | (1/√n) Σ_{i=1}^n [ 1{U_i ≤ u} Z_i / √(σ₂²(U_i)) ] ( 1{Y_i ≤ y} − g^Y_y(U_i) ) |  and

    T^FEAS_n = sup_{(u,y)∈[0,1]²} | (1/√n) Σ_{i=1}^n [ 1{U_i ≤ u} Z_i / √(σ₂²(U_i) − σ₂⁴(U_i)) ] ( 1{Y_i ≤ y} − ĝ^Y_y(U_i) ) |.

Note that g^Y_y(u) and σ₂(U) are completely known in this case.
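Statistics of this form can be approximated by a grid search over (u, y). The sketch below is our own code; the grid size and the clipping of the denominator away from zero are our numerical choices, not part of the paper, and σ₂²(u) = u − u² follows the design above.

```python
import numpy as np

def sup_statistic(U, Z, Y, g, weight, u_grid, y_grid):
    """Approximate sup over (u, y) of
    | n^{-1/2} sum_i 1{U_i <= u} Z_i weight(U_i) (1{Y_i <= y} - g(y, U_i)) |."""
    n = len(U)
    w = Z * weight(U)
    best = 0.0
    for y in y_grid:
        dev = (Y <= y).astype(float) - g(y, U)
        for u in u_grid:
            s = np.sum(w * (U <= u) * dev) / np.sqrt(n)
            best = max(best, abs(s))
    return best

# sigma_2^2(u) = u - u^2, clipped away from zero for numerical stability.
weight = lambda u: 1.0 / np.sqrt(np.clip(u - u**2, 1e-3, None))

rng = np.random.default_rng(0)
U = rng.uniform(size=500)
Z = (U - rng.uniform(size=500) > 0).astype(float)
Y = rng.uniform(size=500)
g_indep = lambda y, u: np.full_like(u, y)  # true g when Y ~ Uniform[0,1], independent of (Z, U)
grid = np.linspace(0.05, 0.95, 19)
stat = sup_statistic(U, Z, Y, g_indep, weight, grid, grid)
```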

We take the sample size equal to n = 3000 and set the number of Monte Carlo replications to 1000. We consider the estimated density of the test statistics both under the null (κ = 0) and under the alternative (κ = 0.5). Here we apply the quantile transform to the data and use series estimation with Legendre polynomials, with the number of terms equal to 5 in the univariate case and 5² in the bivariate case. The result is shown in Figure 1. Under the null, the behavior of the two test statistics is quite similar, consistent with the asymptotic theory. Under the alternative, however, the behavior becomes dramatically different. The shift is more prominent in the case of T^FEAS_n than in the case of T^INF_n. This was expected


from our analysis of power in the previous section, and reflects our theoretical observation that the use of ĝ^Y_y(U_i) instead of g^Y_y(U_i) improves the local power properties of the tests. This is also observed in the empirical processes inside T^INF_n and T^FEAS_n, as shown in Figure 2.
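The series estimation step described above can be sketched as follows. This is our own code (the paper publishes none): we use numpy's Legendre routines, shift the polynomials from [−1, 1] to [0, 1], and pick illustrative values for the sample size and the order K.

```python
import numpy as np

def series_cdf_estimate(U, Y, y, K):
    """Estimate g^Y_y(u) = E[1{Y <= y} | U = u] by least squares on the
    first K Legendre polynomials, shifted from [-1, 1] to [0, 1]."""
    design = np.polynomial.legendre.legvander(2.0 * U - 1.0, K - 1)  # n x K matrix
    ind = (Y <= y).astype(float)
    coef, *_ = np.linalg.lstsq(design, ind, rcond=None)
    return design @ coef                      # fitted values ghat(U_i)

rng = np.random.default_rng(1)
U = rng.uniform(size=2000)
Y = U                                         # degenerate example: 1{Y <= y} = 1{U <= y}
ghat = series_cdf_estimate(U, Y, y=0.5, K=5)
```

Because the degree-zero polynomial is a constant, the least-squares fitted values average exactly to the sample mean of the indicator.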

[INSERT FIGURES 1 AND 2]

8 Conclusion

A test of conditional independence has been proposed. The test is asymptotically unbiased against Pitman local alternatives and yet has critical values that can be tabulated from the limiting distribution theory. As in Song (2007), the main idea is to employ conditional martingale transforms, but the manner in which we apply the transforms in testing conditional independence is different: we apply the transform to the indicator functions of Z_i and Y_i, conditioning on U_i.

There are two possible extensions: first, the analysis of the asymptotic power function, and second, the accommodation of weakly dependent data. The analysis of the asymptotic power function will illuminate the effect of the martingale transform on the power of the tests. The i.i.d. assumption in testing conditional independence is restrictive, considering that testing for the presence of causal relations often presupposes a time series context. Extending the results in the paper to time series appears nontrivial, because the methods of this paper draw heavily on results from empirical process theory that are based on independent observations. However, there have been numerous efforts in the literature to extend the results of empirical processes to time series (e.g., Andrews (1991), Arcones and Yu (1994), and Nishiyama (2000)). It seems that the results in this paper can be extended to a time series context based on those results.

9 Appendix: Mathematical Proofs

9.1 Main Results

In this section, we present the proofs of the main results. Throughout the proofs, the notation C denotes a positive absolute constant that may assume different values in different contexts. We first prove Theorem 1. The following lemma is useful to this end.


Define

    γ_{u,n,λ}(x) ≜ 1{ (1/n) Σ_{i=1}^n 1{λ(X_i) ≤ λ(x)} ≤ u }.
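In words, γ_{u,n,λ}(x) indicates whether the empirical distribution function of λ(X), evaluated at λ(x), lies below u. A direct transcription (our own illustrative code, with a hypothetical data set and λ taken as the identity):

```python
import numpy as np

def gamma_u_n_lam(x, u, data, lam):
    """gamma_{u,n,lambda}(x) = 1{ (1/n) sum_i 1{lam(X_i) <= lam(x)} <= u }."""
    ecdf_at_x = np.mean(lam(data) <= lam(x))   # empirical CDF of lam(X) at lam(x)
    return float(ecdf_at_x <= u)

xs = np.array([0.1, 0.3, 0.5, 0.7, 0.9])
identity = lambda t: t
low = gamma_u_n_lam(0.3, 0.5, xs, identity)    # ECDF = 0.4 <= 0.5, so 1.0
high = gamma_u_n_lam(0.7, 0.5, xs, identity)   # ECDF = 0.8 >  0.5, so 0.0
```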

The function γ_{u,n,λ}(·) is indexed by (u, λ) ∈ [0, 1] × Λ₀, where Λ₀ ≜ {λ_θ(·) : θ ∈ B(θ₀, δ)}. The indicator function contains a nonparametric function, namely an empirical distribution function. Note that we cannot directly apply the framework of Chen, Linton, and van Keilegom (2003) to analyze the class of functions into which the realizations of γ_{u,n,λ}(x) fall. This is because, when the indicator function contains a nonparametric function, the procedure of Chen, Linton, and van Keilegom (2003) requires an entropy condition for the nonparametric functions with respect to the sup norm, and this condition does not help here because L_p uniform continuity fails with respect to the sup norm. Hence we establish the following result for our purpose.

Lemma A1: Assume that sup_{λ∈Λ₀} sup_{λ̄∈R} f_λ(λ̄) < ∞, where f_λ(·) denotes the density of λ(X). Then there exists a sequence of classes G_n of measurable functions such that

    P{ γ_{u,n,λ} ∈ G_n for all (u, λ) ∈ [0, 1] × Λ₀ } = 1,

and for any r ≥ 1,

    log N_[](C₂ε, G_n, ‖·‖_r) ≤ log N(ε^r/2, Λ₀, ‖·‖_∞) + C₁/ε,

where C₁ and C₂ are positive constants depending only on r.

Proof of Lemma A1: For a fixed sequence {x_i}_{i=1}^n ∈ R^{n d_X}, we define a function

    g_{n,λ,u}(λ̄) ≜ 1{ (1/n) Σ_{i=1}^n 1{λ(x_i) ≤ λ̄} ≤ u },

where g_{n,λ,u}(·) depends on the chosen sequence {x_i}_{i=1}^n. We introduce classes of functions as follows:

    G⁰_n ≜ { g_{n,λ,u} : (λ, {x_i}_{i=1}^n, u) ∈ Λ₀ × R^{n d_X} × [0, 1] }  and
    G_n ≜ { g ∘ λ : (g, λ) ∈ G⁰_n × Λ₀ }.

Then we have γ_{u,n,λ} ∈ G_n. Note that g_{n,λ,u}(λ̄) is decreasing in λ̄. Hence by Theorem 2.7.5 in van der Vaart and Wellner (1996), for any r ≥ 1,

    log N_[](ε, G⁰_n, ‖·‖_{Q,r}) ≤ C₁/ε,    (25)

for any probability measure Q and for some constant C₁ > 0.

Without loss of generality, assume that λ(X) ∈ [0, 1]; otherwise, we can consider Φ ∘ Λ₀ = {Φ ∘ λ : λ ∈ Λ₀} in place of Λ₀, where Φ is the distribution function of a standard normal variate. We choose {λ₁, ···, λ_{N₂}} such that for any λ ∈ Λ₀ there exists j with ‖λ_j − λ‖_∞ < ε^r/2. For each j, we define λ̄_j(x) as follows:

    λ̄_j(x) = mε^r/2 when λ_j(x) ∈ [mε^r/2, (m + 1)ε^r/2) for some m ∈ {0, 1, 2, ···, ⌊2/ε^r⌋},

where ⌊z⌋ denotes the greatest integer that does not exceed z. Note that the range of λ̄_j is finite and ‖λ − λ̄_j‖_∞ is equal to

    sup_x Σ_{k=0}^{⌊2/ε^r⌋} |λ(x) − kε^r/2| 1{ λ_j(x) ∈ [kε^r/2, (k + 1)ε^r/2) }
    ≤ sup_x Σ_{k=0}^{⌊2/ε^r⌋} ( |λ(x) − λ_j(x)| + |λ_j(x) − kε^r/2| ) 1{ λ_j(x) ∈ [kε^r/2, (k + 1)ε^r/2) } ≤ ε^r.

From (25), we have log N(ε, G⁰_n, ‖·‖_{Q,r}) ≤ C₁/ε. We choose {g₁, ···, g_{N₁}} such that for any g ∈ G⁰_n there exists j with ∫₀¹ |g(λ̄) − g_j(λ̄)|^r dλ̄ < ε^r. Then we have, for any λ ∈ Λ₀,

    E|g(λ(X)) − g_j(λ(X))|^r = ∫₀¹ |g(λ̄) − g_j(λ̄)|^r f_λ(λ̄) dλ̄ ≤ sup_λ sup_{λ̄} f_λ(λ̄) ε^r.

Now, take any h ∈ G_n such that h ≜ g ∘ λ, and choose (g_{j₁}, λ_{j₂}) from {g₁, ···, g_{N₁}} and {λ₁, ···, λ_{N₂}} such that ∫₀¹ |g(λ̄) − g_{j₁}(λ̄)|^r dλ̄ < ε^r and ‖λ_{j₂} − λ‖_∞ < ε^r/2. Then consider

    ‖h − g_{j₁} ∘ λ̄_{j₂}‖_r ≤ ‖g ∘ λ − g_{j₁} ∘ λ‖_r + ‖g_{j₁} ∘ λ − g_{j₁} ∘ λ̄_{j₂}‖_r
    ≤ ( sup_λ sup_{λ̄} f_λ(λ̄) )^{1/r} ε + ( E|(g_{j₁} ∘ λ)(X) − (g_{j₁} ∘ λ̄_{j₂})(X)|^r )^{1/r}.

The absolute value in the expectation of the second term is bounded by

    1{ (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ λ̄_{j₂}(X) − ε^r} ≤ u_{j₁} } − 1{ (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ λ̄_{j₂}(X) + ε^r} ≤ u_{j₁} },

or by

    1{ (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ λ̄_{j₂}(X) − ε^r} ≤ u_{j₁} < (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ λ̄_{j₂}(X) + ε^r} }
    = Σ_{m=0}^{⌊2/ε^r⌋} 1{ (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ mε^r/2 − ε^r} ≤ u_{j₁} < (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ mε^r/2 + ε^r} }
      × 1{ mε^r/2 ≤ λ_{j₂}(X) < (m + 1)ε^r/2 }.

Since P{λ_{j₂}(X) ∈ [mε^r/2, (m + 1)ε^r/2)} ≤ sup_λ sup_{λ̄} f_λ(λ̄) ε^r ≤ Cε^r by the mean-value theorem, and the sets

    A_m = { u ∈ [0, 1] : (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ mε^r/2 − ε^r} ≤ u < (1/n) Σ_{i=1}^n 1{λ_{j₁}(x_i) ≤ mε^r/2 + ε^r} }

constitute a partition of [0, 1], we deduce that

    E|(g_{j₁} ∘ λ)(X) − (g_{j₁} ∘ λ̄_{j₂})(X)|^r ≤ Cε^r.

Therefore, ‖h − g_{j₁} ∘ λ̄_{j₂}‖_r ≤ Cε, yielding the result that

    log N_[](Cε, G_n, ‖·‖_r) ≤ log N(Cε^r, Λ₀, ‖·‖_∞) + log N_[](ε, G⁰_n, ‖·‖_r)
                             ≤ log N(Cε^r, Λ₀, ‖·‖_∞) + C/ε.

Assumption A1: (i) λ_min( ∫ p^K(u) p^K(u)′ du ) > 0.

(ii) There exist d₁ and d₂ such that (a) there exist sets of vectors {π_y : y ∈ R} and {π_z : z ∈ R} such that

    sup_{(y,u)∈R×[0,1]} | p^K(u)′π_y − g^Y_y(u) | = O(K^{−d₁})  and
    sup_{(z,u)∈R×[0,1]} | p^K(u)′π_z − g^Z_z(u) | = O(K^{−d₂}),

and (b) for each ū ∈ [0, 1], there exist sets of vectors in R^K, {π_{y,ū} : y ∈ R}, such that

    sup_{(y,ū)∈R×[0,1]} ( ∫₀¹ | p^K(u)′π_{y,ū} − ġ^Y_y(u) 1{u ≤ ū} |^p du )^{1/p} = O(K^{−d₁}),

where ġ^Y_y(·) denotes the first-order derivative of g^Y_y(·).

(iii) For d₁ and d₂ in (ii), √n ζ²_{0,K} K^{−d₁} = o(1) and √n ζ_{1,K} K^{−d₂} = o(1).

(iv) For b in the definition of Λ₀, we have n^{1/2−2b} ζ³_{0,K} ζ_{2,K} = o(1), n^{−1/2+1/p} K^{1−1/p} ζ²_{0,K} = o(1), and n^{−b} ζ_{0,K} { √(ζ_{0,K} ζ_{2,K}) + ζ_{1,K} } = o(1).

Proof of Theorem 1: Choose a sequence λ within the shrinking neighborhood B(λ₀, δ_n) ⊂ Λ of λ₀ with diameter δ_n = o(1) in terms of ‖·‖₂. For G_n in Lemma A1, we define a process ν_n(g, y, z, λ) indexed by (g, y, z, λ) ∈ G_n × S × Λ₀ as

    ν_n(g, y, z, λ) ≜ (1/√n) Σ_{i=1}^n g(X_i) γ_z(Z_i) [ γ_y(Y_i) − p^K(U_i)′ π_{y,λ} ],

where π_{y,λ} is equal to π_y defined below (4), except that λ ∈ Λ₀ is used in place of λ_θ(·). By Lemma A1 and Assumption 3(iii), we obtain

    log N_[](ε, G_n, ‖·‖_p) ≤ log N(ε^r/2, Λ₀, ‖·‖_∞) + C₁/ε ≤ C log(1/ε) + C₁/ε.

This implies that the bracketing entropy of {gγ_z : (z, g) ∈ R × G_n} is bounded by C log(1/ε) + C₁/ε. We apply Lemma 1U of Escanciano and Song (2007) to ν_n(g, y, z, λ) to deduce that it is equal to

    (1/√n) Σ_{i=1}^n [ g(X_i)γ_z(Z_i) − E[g(X_i)γ_z(Z_i)|U_i] ] [ γ_y(Y_i) − P{Y_i ≤ y|U_i} ] + o_P(1),

where the o_P(1) term is uniform over (g, y, z, λ) ∈ G_n × R × R × Λ₀. Hence the above holds for any arbitrary sequence g_n ∈ G_n such that g_n(x) → γ_u((F_{θ₀} ∘ λ_{θ₀})(x)). This yields the desired result.

We define two pieces of notation that we use throughout the remaining proofs. First, P_n denotes the sample-mean operation: P_n f ≜ (1/n) Σ_{i=1}^n f(S_i). Second, ⊥ denotes the conditional mean deviation: γ^⊥_y(Y, U) ≜ γ_y(Y) − E[γ_y(Y)|U]. We also denote, for a real-valued function g(y, z) on R × R, Pg ≜ E[g(Y, Z)] and (P^U g)(u) ≜ E[g(Y, Z)|U = u].

Proof of Corollary 1: (i)(a) and (ii)(a): Assume the null hypothesis. We first consider ν¹_n(r). The convergence of the finite-dimensional distributions of the process √n P_n[γ_u γ^⊥_z γ^⊥_y] follows by the usual CLT. The finite second moment condition trivially follows from the uniform boundedness of the indicator functions, and it is straightforward to compute the covariance function of the process √n P_n[γ_u γ^⊥_z γ^⊥_y] and verify that it coincides with the covariance function of the limit Gaussian process in the theorem. The weak convergence of √n P_n[γ_u γ^⊥_z γ^⊥_y] follows from the fact that the class of univariate indicator functions is VC and that products of a finite number of indicator functions are also VC by the stability properties of VC classes (see van der Vaart and Wellner (1996)). We can likewise analyze the limit behavior of the process ν_n(r) = √n P_n[γ_u γ_z γ^⊥_y] to obtain the desired result.

(i)(b) and (ii)(b): Write

    n^{−1/2} ν_n(r) = P_n[γ_u γ_z γ^⊥_y]
                    = P_n[ γ_u [γ_z γ^⊥_y]^⊥ ] + P_n[ γ_u[P^U ρ_w] − P{γ_u[P^U ρ_w]} ] + P[γ_u ρ_w],

where ρ_w ≜ γ_z(γ_y − P^U γ_y). Now, assume that the null hypothesis does not hold. Using the preliminary result in (7), we write

    n^{−1/2} ν_n(r) = P_n[γ_u γ^⊥_z γ^⊥_y] + o_P(1)
                    = P_n[ γ_u [γ^⊥_z γ^⊥_y]^⊥ ] + P_n[ γ_u[P^U ρ_w] − P{γ_u[P^U ρ_w]} ] + P[γ_u ρ_w] + o_P(1),

by adding and subtracting terms. The first and second processes are mean-zero empirical processes, and, using arguments similar to those in (i), it is not hard to show that the processes are indexed by a Glivenko–Cantelli class. Hence the result follows.

Proof of Theorem 2: (i) The proof is similar to the proof of Proposition 6.1 of Khmaladze and Koul (2004).

(ii)(a) We apply Theorem 1 to the process

    ν^K_n(r) = (1/√n) Σ_{i=1}^n γ_u(U_i) γ^K_z(Z_i) { γ^K_y(Y_i) − (P^U γ^K_y)(Y_i, U_i) }

by replacing γ_z and γ_y with γ^K_z and γ^K_y. In view of the proof of Theorem 1, the main step is applying Lemma 1U of Escanciano and Song (2007) to the above process. To this end, it suffices to show that the classes {γ^K_y : y ∈ R} and {γ^K_z : z ∈ R} satisfy the required bracketing entropy conditions, which is shown in Lemma A3(a) of Song (2007). Hence, by following the same steps as in the proof of Theorem 1, we obtain that under the null hypothesis,

    ν^K_n(r) = √n P_n[ γ_u (γ^K_z)^⊥ (γ^K_y)^⊥ ] + o_P(1).    (26)

Furthermore, from (i) of this theorem, P^U(γ^K_y) = 0 and P^U(γ^K_z) = 0. Note that under the null hypothesis of conditional independence, (γ^K_z)(U_i, Z_i) and (γ^K_y)(U_i, Y_i) are conditionally independent given U_i. Therefore,

    P^U(γ^K_z γ^K_y) = P^U(γ^K_z) P^U(γ^K_y) = 0.

Therefore, under the null hypothesis, we have

    ν^K_n(r) = √n P_n[ γ_u (γ^K_z)(γ^K_y) ] + o_P(1).

We investigate the process √n P_n[γ_u (γ^K_z)(γ^K_y)] and establish its weak convergence. The convergence of finite-dimensional distributions follows by the usual central limit theorem. Note that the functions γ^K_y and γ^K_z are uniformly bounded. By Lemma A3(a) of Song (2007), the class A = {γ_u(γ^K_z)(γ^K_y) : (u, z, y) ∈ S} is totally bounded and the process √n P_n[γ_u(γ^K_z)(γ^K_y)] is stochastically equicontinuous in A; hence the weak convergence of the process follows by the Proposition in Andrews (1994, p. 2251).

The covariance function of this process is equal to P[γ_{u₁∧u₂} (γ^K_{z₁})(γ^K_{z₂})(γ^K_{y₁})(γ^K_{y₂})]. Under the null hypothesis of conditional independence, the covariance function becomes

    P[ γ_{u₁∧u₂} P^U[(γ^K_{z₁})(γ^K_{z₂})(γ^K_{y₁})(γ^K_{y₂})] ]
    = P[ γ_{u₁∧u₂} P^U[γ^K_{z₁} γ^K_{z₂}] P^U[γ^K_{y₁} γ^K_{y₂}] ]
    = P[ γ_{u₁∧u₂} P^U[γ_{z₁} γ_{z₂}] P^U[γ_{y₁} γ_{y₂}] ]
    = P[ γ_{u₁∧u₂} γ_{z₁∧z₂} γ_{y₁∧y₂} ].

By the null hypothesis of conditional independence, the above is equal to

    ∫^{u₁∧u₂} F_{Z|U}(z₁∧z₂|u) F_{Y|U}(y₁∧y₂|u) du.

Now, when we replace γ^K_u, γ^K_z, and γ^K_y by γ̄^K_u, γ̄^K_z, and γ̄^K_y, the covariance function becomes

    ∫^{u₁∧u₂} [ ∫^{z₁∧z₂} f^{−1}_{Z|U}(z|u) f_{Z|U}(z|u) dz ] [ ∫^{y₁∧y₂} f^{−1}_{Y|U}(y|u) f_{Y|U}(y|u) dy ] du
    = (u₁∧u₂)(z₁∧z₂)(y₁∧y₂).

(b) The result is immediate from (26).

Here, we review the notion of approximate Bahadur efficiency. We consider the parametrization introduced at the beginning of subsection 6.1. Denote by h_T(κ) and h_S(κ) the limits of the rejection probabilities of the tests based on T_n and S_n under the null hypothesis; that is, under the null hypothesis,

    P{T_n ≥ κ} → h_T(κ)  and  P{S_n ≥ κ} → h_S(κ)

as n → ∞. This often boils down to the computation of tail probabilities of a functional of Gaussian processes. Suppose that there exist constants b_T and b_S satisfying

    ln h_T(κ) = −(1/2) b_T κ² + o(1)  and  ln h_S(κ) = −(1/2) b_S κ² + o(1)  as κ → ∞.    (27)

On the other hand, suppose that under the fixed alternative P_τ, we have numbers φ_T and φ_S such that

    (1/√n) T_n →_p φ_T  and  (1/√n) S_n →_p φ_S.    (28)

Then the approximate Bahadur efficiency (denoted by e^B_{T,S}(τ)) at P_τ of a test based on the test statistic T_n relative to one based on S_n is computed from (Nikitin (1995))

    e^B_{T,S}(τ) = (b_T / b_S) × (φ_T / φ_S)².    (29)

The approximate Bahadur efficiency is composed of two ratios. The first ratio, b_T/b_S, measures the relative tail behavior of the limiting distributions of the test statistics under the null hypothesis. The second ratio, φ_T/φ_S, measures how sensitively the test statistics deviate from the null limiting distribution in large samples under the alternatives. The limiting Pitman efficiency e*_{T,S} coincides with lim_{τ↓0} e^B_{T,S}(τ) when the conditions of Wieand (1976) are satisfied and the limit lim_{τ↓0} e^B_{T,S}(τ) exists.
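The efficiency measure in (29) is a simple composition of the two ratios. As a sanity check, a direct transcription (our own code, with purely illustrative input values):

```python
def approx_bahadur_eff(b_T, b_S, phi_T, phi_S):
    """Approximate Bahadur efficiency e^B_{T,S} = (b_T / b_S) * (phi_T / phi_S)^2,
    where b_T, b_S are the tail constants in (27) and phi_T, phi_S the limits in (28)."""
    return (b_T / b_S) * (phi_T / phi_S) ** 2
```

For example, equal slopes φ_T = φ_S with b_T/b_S = 1/4 give an efficiency of 1/4.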

Proof of Corollary 2: It is obvious that the sequences {T_n} and {S_n} are standard sequences as defined in Bahadur (1960). Hence, we verify Condition III* of Wieand (1976). Let b_P = sup_{r∈S} |⟨γ_u, γ_z(γ_y − g^Y_y)⟩| and

    U_n(r) = ν_n(r) − √n ⟨γ_u, γ_z(γ_y − g^Y_y)⟩.

Then, by the result of Corollary 1, lim_{n→∞} P{sup_{r∈S} |U_n(r)| < v} = Q(v), where Q is the distribution function of sup_{r∈S} |ν¹(r)|. Now, observe that for any ε > 0 and δ ∈ (0, 1), there exists C > 0 such that for all n > C/b²_P,

    P{ |n^{−1/2} T_n − b_P| < ε b_P } ≥ P{ sup_{r∈S} |U_n(r)| < ε √n b_P } > 1 − δ

if P ∈ {P_τ : τ ∈ (0, τ₀]}, by the Lemma of Wieand (1976, p. 1007). Hence the sequence T_n satisfies Condition III* of Wieand (1976). Similarly, we can show that the sequence S_n also satisfies Condition III*, because the local shifts under the fixed alternatives are the same (see Corollary 1(i)(b) and (ii)(b)). Hence the limiting Pitman efficiency coincides with the approximate Bahadur efficiency.

From now on, we omit the notation representing dependence on τ for brevity. The covariance kernels of the limit Gaussian processes of ν(r) and ν¹(r) in Theorem 1 are respectively given by

    c(r₁, r₂)  = ∫₀^{u₁∧u₂} g^Z_{z₁∧z₂}(u; τ) ( g^Y_{y₁∧y₂}(u) − g^Y_{y₁}(u) g^Y_{y₂}(u) ) du,
    c₁(r₁, r₂) = ∫₀^{u₁∧u₂} ( g^Z_{z₁∧z₂}(u) − g^Z_{z₁}(u) g^Z_{z₂}(u) ) ( g^Y_{y₁∧y₂}(u) − g^Y_{y₁}(u) g^Y_{y₂}(u) ) du.

Therefore (e.g., Lemma 4.8.1 in Nikitin (1995)), we have

    lim_{κ→∞} κ^{−2} log P{ sup_{r∈S} |ν(r)| ≥ κ }  = −(1/2) b_T,
    lim_{κ→∞} κ^{−2} log P{ sup_{r∈S} |ν¹(r)| ≥ κ } = −(1/2) b_S,

where the constants b_T and b_S are given by

    b_T^{−1} = sup_{r∈S} c(r, r) = sup_{y∈R} ∫₀¹ ( g^Y_y(u) − g^Y_y(u)² ) du    (30)

and

    b_S^{−1} = sup_{r∈S} c₁(r, r) = sup_{(z,y)∈R²} ∫₀¹ ( g^Z_z(u) − g^Z_z(u)² )( g^Y_y(u) − g^Y_y(u)² ) du.

Hence

    b_T/b_S = [ sup_{(z,y)∈R²} ∫₀¹ ( g^Z_z(u) − g^Z_z(u)² )( g^Y_y(u) − g^Y_y(u)² ) du ] / [ sup_{y∈R} ∫₀¹ ( g^Y_y(u) − g^Y_y(u)² ) du ] ≤ 1/4.
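The bound of 1/4 comes from the pointwise inequality g^Z_z(u) − g^Z_z(u)² ≤ 1/4. For the special case where g^Z_z(u) = z and g^Y_y(u) = y do not depend on u (uniform marginals), the two suprema can be evaluated numerically; the grid check below is our own code, on a grid containing the maximizer 1/2.

```python
import numpy as np

z = y = np.linspace(0.0, 1.0, 201)             # grid containing z = y = 1/2
inv_bT = np.max(y - y**2)                      # sup_y int_0^1 (y - y^2) du = 1/4
inv_bS = np.max(np.outer(z - z**2, y - y**2))  # sup_{(z,y)} (z - z^2)(y - y^2) = 1/16
ratio = inv_bS / inv_bT                        # = b_T / b_S = 1/4 in this case
```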

Now let us consider the case of the alternatives. The results of Theorem 1 yield that the numbers φ_T and φ_S in (28) are equal, given by

    φ_T = sup_{(u,z,y)∈S} | P_τ[ γ_u( ⟨γ_z, γ_y⟩_U − ⟨γ_z, 1⟩_U ⟨γ_y, 1⟩_U ) ] |.

Therefore, we obtain the desired result from (29).

Proof of Corollary 3: The result is trivial from Theorem 2.

Proof of Corollary 4: We can check Condition III* of Wieand (1976) for T_n and S_n as in the proof of Corollary 2. Hence, we compute the approximate Bahadur efficiency. The covariance kernels of the limit Gaussian processes of ν^K_{n,a}(r) and ν^K_{n,E}(r) in Theorem 1 are identical; hence for b_T and b_S in (27), b_T = b_S. The desired result then follows by plugging (24) into (29).

Proof of Corollary 5: Again, we can check Condition III* of Wieand (1976) for T_n and S_n as in the proof of Corollary 2. The computation of e*_{T,S} is again via the approximate Bahadur efficiency.

The covariance kernel of the limit Gaussian process of ν^K(r) is given by

    c^K(r₁, r₂) = ∫₀^{u₁∧u₂} g^Z_{z₁∧z₂}(u) g^Y_{y₁∧y₂}(u) du,

yielding the result (e.g., Lemma 4.8.1 in Nikitin (1995)) that

    lim_{κ→∞} κ^{−2} log P{ sup_{r∈S} |ν^K(r)| ≥ κ } = −b_S/2,

where the constant b_S is given by b_S^{−1} = sup_{r∈S} c^K(r, r) = 1. Therefore,

    b_T/b_S = 1 / { sup_{(z,y)∈R²} ∫₀¹ ( g^Z_z(u) − g^Z_z(u)² )( g^Y_y(u) − g^Y_y(u)² ) du },

as the constant denoted b_S in (30) plays the role of b_T here. On the other hand, under P_τ, we have

    φ_T = sup_{(u,z,y)∈S} | E_τ[ γ_u{ ⟨γ_z, γ_y⟩_U − ⟨γ_z, 1⟩_U ⟨γ_y, 1⟩_U } ] |
        = sup_{(u,z,y)∈S} | E_τ[ γ_u(U){ C_{Z,Y|U}(g^Z_z(U), g^Y_y(U)|U) − g^Z_z(U) g^Y_y(U) } ] |.

Now, we turn to φ_S. Observe that

    φ_S = sup_{(u,z,y)∈S} | P_τ[ γ_u( ⟨γ^K_z, γ^K_y⟩_U − ⟨γ^K_z, 1⟩_U ⟨γ^K_y, 1⟩_U ) ] |    (31)
        = sup_{(u,z,y)∈S} | P_τ[ γ_u ⟨γ^K_z, γ^K_y⟩_U ] | = sup_{(u,z,y)∈S} | E_τ[ (Kγ_z)(U; Z)(Kγ_y)(U; Y)|U ] |.

The second equality follows from the conditional orthogonality property of K.

First, assume that ξ(z, ε) is strictly increasing in z. We observe that the expectation in the last term in (31) is equal to

    E_τ[ (Kγ_z)(U; Z)(Kγ_{ξ^{−1}(y,ε)})(U; Z)|U ]    (32)

in this case. (When ξ(z, ε) is strictly decreasing in z, the expectation in the last term in (31) is equal to

    E_τ[ (Kγ_z)(U; Z)(K(1 − γ_{ξ^{−1}(y,ε)}))(U; Z)|U ] = −E_τ[ (Kγ_z)(U; Z)(Kγ_{ξ^{−1}(y,ε)})(U; Z)|U ],

because K is a linear operator and Ka = 0 for any constant a. Then we can follow the subsequent steps in a similar manner.) By the conditional isometry condition, the term in (32) is equal to

    E_τ[ γ_z(Z) γ_{ξ^{−1}(y,ε)}(Z)|U ] = E_τ[ F_{Z|U}( z ∧ ξ^{−1}(y, ε) |U )|U ]
    = P_τ{ Z ≤ z, Z ≤ ξ^{−1}(y, ε) |U } = P_τ{ Z ≤ z, Y ≤ y|U }
    = C_{Z,Y|U}( g^Z_z(U; τ), g^Y_y(U; τ) |U ).

Therefore (omitting the notation for dependence on τ),

    φ_T = sup_{(z,y,u)∈S} | E[ γ_u(U){ C_{Z,Y|U}(g^Z_z(U), g^Y_y(U)|U) − g^Z_z(U) g^Y_y(U) } ] |  and
    φ_S = sup_{(z,y,u)∈S} E[ γ_u(U) C_{Z,Y|U}(g^Z_z(U), g^Y_y(U)|U) ] = 1.

Combining the results, we obtain the desired statement.

Suppose g^Z_z(u) and g^Y_y(u) do not depend on u. Then

    e*_{T,S} ≤ 4 max{ sup_{(z,y)∈[0,1]²} {z∧y − zy}², sup_{(z,y)∈[0,1]²} {zy − max{z + y − 1, 0}}² } = 1/4.

Recall that z∧y and max{z + y − 1, 0} represent the upper and lower Fréchet–Hoeffding bounds (e.g., Nelsen (1999)).
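The final bound can be verified numerically. The grid check below (our own code) evaluates the squared gaps between the independence copula zy and the two Fréchet–Hoeffding bounds; both attain a maximum of 1/16 at z = y = 1/2, which gives the 1/4.

```python
import numpy as np

z = y = np.linspace(0.0, 1.0, 201)
Zg, Yg = np.meshgrid(z, y)
upper_gap = (np.minimum(Zg, Yg) - Zg * Yg) ** 2              # (z ^ y - zy)^2
lower_gap = (Zg * Yg - np.maximum(Zg + Yg - 1.0, 0.0)) ** 2  # (zy - max{z+y-1, 0})^2
bound = 4.0 * max(upper_gap.max(), lower_gap.max())          # = 1/4
```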

9.2 Asymptotic Validity of Feasible Martingale Transforms

In this section, we briefly sketch how the feasible martingale transform introduced in a previous section can be shown to be asymptotically valid. The details can be filled in following the proof of Theorem 2 of Song (2007) and hence are omitted.

Let V^Y_i = F_Y(Y_i) and V^Z_i = F_Z(Z_i). From Theorem 2(ii), it is not hard to show that

    (1/√n) Σ_{i=1}^n γ_u(U_i) γ^K_z(U_i, V^Z_i) γ^K_y(U_i, V^Y_i) ⇝ W(r),

where γ^K_z(U_i, V^Z_i) and γ^K_y(U_i, V^Y_i) are equal to γ^K_z(U_i, Z_i) and γ^K_y(U_i, Y_i) with Z_i and Y_i replaced by V^Z_i and V^Y_i. Hence it suffices to show that

    (1/√n) Σ_{i=1}^n γ̂_u(Û_i) γ̂^K_z(Û_i, V̂^Z_i) γ̂^K_y(Û_i, V̂^Y_i) = (1/√n) Σ_{i=1}^n γ_u(U_i) γ^K_z(U_i, V^Z_i) γ^K_y(U_i, V^Y_i) + o_P(1),

uniformly over r ∈ S₀₀, where the hatted quantities denote the feasible versions based on the estimated transforms. The difference between the two sums is equal to

    (1/√n) Σ_{i=1}^n γ̂_u(Û_i) γ̂^K_z(Û_i, V̂^Z_i) { γ̂^K_y(Û_i, V̂^Y_i) − γ^K_y(U_i, V^Y_i) }
    + (1/√n) Σ_{i=1}^n γ̂_u(Û_i) γ^K_y(U_i, V^Y_i) { γ̂^K_z(Û_i, V̂^Z_i) − γ^K_z(U_i, V^Z_i) }
    + (1/√n) Σ_{i=1}^n { γ̂_u(Û_i) − γ_u(U_i) } γ^K_y(U_i, V^Y_i) γ^K_z(U_i, V^Z_i)
    = A_{1n} + A_{2n} + A_{3n}, say.

We consider A_{1n} first. Define

    ψ(z₀, x; z, u) ≜ γ_u(F_{n,θ}(λ_θ(x))) γ^K_z(F_{n,θ}(λ_θ(x)), F^Z_n(z₀))  and
    ψ₀(z₀, x; z, u) ≜ γ_u(F_{θ₀}(λ_{θ₀}(x))) γ^K_z(F_{θ₀}(λ_{θ₀}(x)), F_Z(z₀)),

and

    ϕ(y₀, x; y, v) ≜ 1{0 < F^Y_n(y₀) ≤ y∧v} / √( f(F^Y_n(y₀), F_{n,θ}(λ_θ(x))) { 1 − F(F^Y_n(y₀)|F_{n,θ}(λ_θ(x))) } )  and
    ϕ₀(y₀, x; y, v) ≜ 1{0 < F_Y(y₀) ≤ y∧v} / √( f(F_Y(y₀), F_{θ₀}(λ_{θ₀}(x))) { 1 − F(F_Y(y₀)|F_{θ₀}(λ_{θ₀}(x))) } ).

Then A_{1n} is equal to

    (1/√n) Σ_{i=1}^n ψ(Z_i, X_i; z, u) { E[ϕ(Y_i, X_i; y, ȳ)|U_i]_{ȳ=Y_i} − E[ϕ₀(Y_i, X_i; y, ȳ)|U_i]_{ȳ=Y_i} }.

Under regularity conditions of the kind in Assumption 6 and the conditions for basis functions in Assumption 2U of Song (2007) for ϕ and ψ, we can deduce that

    A_{1n} = (1/√n) Σ_{i=1}^n E[ ψ₀(Z_i, X_i; z, u) { ϕ₀(ȳ, X_i; y, Y_i) − E[ϕ₀(Y_i, X_i; y, ȳ)|U_i]_{ȳ=Y_i} } |U_i ]_{ȳ=Y_i} + o_P(1).    (33)


Under the null hypothesis of conditional independence,

    E[ ψ₀(Z_i, X_i; z, u) { ϕ₀(ȳ, X_i; y, Y_i) − E[ϕ₀(Y_i, X_i; y, ȳ)|U_i]_{ȳ=Y_i} } |U_i ]_{ȳ=Y_i}
    = E[ E[ψ₀(Z_i, X_i; z, u)|Y_i, U_i] { ϕ₀(ȳ, X_i; y, Y_i) − E[ϕ₀(Y_i, X_i; y, ȳ)|U_i]_{ȳ=Y_i} } |U_i ]_{ȳ=Y_i} = 0,

since

    E[ψ₀(Z_i, X_i; z, u)|Y_i, U_i] = E[ψ₀(Z_i, X_i; z, u)|U_i] = γ_u(U_i) ⟨γ^K_z, 1⟩_{U_i} = 0

by conditional independence and by Theorem 2(i). Under local alternatives such that |E[ψ₀(Z_i, X_i; z, u)|Y_i, U_i]| = o_P(1), we have A_{1n} = o_P(1), because the process in (33) is a mean-zero process. (This follows by establishing that the class of functions indexing the process in (33) is P-Donsker.) Hence, both under the null hypothesis and under local alternatives, A_{1n} = o_P(1). The second term A_{2n} can be dealt with similarly.

Let us turn to the third term A_{3n}. We replace F_{n,θ}(λ_θ(·)) in Û_i by an arbitrary deterministic function G that belongs to a class F_n such that

    P{ F_{n,θ}(λ_θ(·)) ∈ F_n } → 1  and  log N_[](ε, F_n, ‖·‖_p) < Cε^{−b}

for b ∈ [0, 2), and such that for each G ∈ F_n, we have ‖F_{θ₀} ∘ λ_{θ₀} − G‖_∞ = O(n^{−1/2}). Then, we consider the process

    (1/√n) Σ_{i=1}^n { γ_u(G(X_i)) − γ_u(U_i) } γ^K_y(U_i, V^Y_i) γ^K_z(U_i, V^Z_i),  (G, u, y, z) ∈ F_n × S,

which can be shown to be equivalent to

    √n { γ_u(G(X_i)) − γ_u(U_i) } E[ γ^K_y(U_i, V^Y_i) γ^K_z(U_i, V^Z_i) | G(X_i), U_i ] + o_P(1).

Then, using Lemma A2(ii) of Song (2006) and Theorem 2(i),

    E[ γ^K_y(U_i, V^Y_i) γ^K_z(U_i, V^Z_i) | G(X_i), U_i ] = O_P(n^{−1/2}).

Now, from the fact that ‖F_{θ₀} ∘ λ_{θ₀} − G‖_∞ = O(n^{−1/2}), it follows that A_{3n} = o_P(1).


References

[1] Ai, C. and X. Chen (2003), "Efficient estimation of models with conditional moment

restrictions containing unknown functions," Econometrica 71, 1795-1843.

[2] Altonji, J. and R. Matzkin (2005), "Cross-section and panel data estimators for non-

separable models with endogenous regressors," Econometrica 73, 1053-1102.

[3] Andrews, D. W. K (1991), "An empirical process central limit theorem for dependent

nonidentically distributed random variables," Journal of Multivariate Analysis, 38, 187-

203.

[4] Andrews, D. W. K (1994), “Empirical process method in econometrics,” in The Hand-

book of Econometrics, Vol. IV, ed. by R. F. Engle and D. L. McFadden, Amsterdam:

North-Holland.

[5] Andrews, D. W. K (1995), "Nonparametric kernel estimator for semiparametric mod-

els," Econometric Theory, 11, 560-595.

[6] Andrews, D. W. K (1997), “A conditional Kolmogorov test,” Econometrica, 65, 1097-

1128.

[7] Angrist, J. D. and G. M. Kuersteiner (2004), "Semiparametric causality tests using the

policy propensity score," Working paper.

[8] Arcones, M. A. and B. Yu (1994), "Central limit theorems for empirical and U-processes

of stationary mixing sequences," Journal of Theoretical Probability, 7, 47-71.

[9] Bai, J. (2003), "Testing parametric conditional distributions of dynamic models," Re-

view of Economics and Statistics, 85, 531-549.

[10] Bai, J. and S. Ng (2001), "A consistent test for conditional symmetry in time series

models," Journal of Econometrics, 103, 225-258.

[11] Bahadur, R. R. (1960), "Stochastic comparison of tests," Annals of Mathematical Sta-

tistics, 31, 276-295.

[12] Bierens, H. J. (1990), “A consistent conditional moment test of functional form,” Econo-

metrica 58, 1443-1458.

[13] Bierens, H. J. and W. Ploberger (1997), “Asymptotic theory of integrated conditional

moment tests,” Econometrica 65, 1129-1151.


[14] Cawley, J. and T. Phillipson (1999), "An empirical examination of information barriers

to trade insurance," American Economic Review 89, 827-846.

[15] Chen, X., O. Linton, and I. van Keilegom (2003), "Estimation of semiparametric models

when the criterion function is not smooth," Econometrica 71, 1591-1608.

[16] Chiappori, P-A, and B. Salanié (2000), "Testing for asymmetric information in insurance

markets," Journal of Political Economy, 108, 56-78.

[17] Chiappori, P-A, B. Jullien, B. Salanié, and F. Salanié (2002), "Asymmetric information

in insurance: general testable implications," mimeo.

[18] Chung, K. L. (2001), A Course in Probability Theory, Academic Press, 3rd Ed.

[19] Dawid, P. (1980), "Conditional independence for statistical operations," Annals of Sta-

tistics, 8, 598-617.

[20] Delgado, M. A. and W. González Manteiga (2001), "Significance testing in nonparametric regression based on the bootstrap," Annals of Statistics, 29, 1469-1507.

[21] Fan, Y. and Q. Li (1996), “Consistent model specification tests: omitted variables,

parametric and semiparametric functional forms,” Econometrica, 64, 865-890.

[22] Hahn, J. (1998), "On the role of the propensity score in efficient semiparametric esti-

mation of average treatment effects," Econometrica, 66, 315-331.

[23] Härdle, W. and E. Mammen (1993), “Comparing nonparametric versus parametric re-

gression fits,” Annals of Statistics, 21, 1926-1947.

[24] Heckman, J. J., H. Ichimura, J. Smith, and P. Todd (1998), "Characterizing selection

bias using experimental data," Econometrica, 66, 1017-1098.

[25] Heckman, J. J., J. Smith, and N. Clemens (1997), "Making the most out of programme

evaluations and social experiments: accounting for heterogeneity in programme im-

pacts," Review of Economic Studies, 64, 487-535.

[26] Hirano, K., G. W. Imbens, and G. Ridder (2003), "Efficient estimation of average

treatment effects using the estimated propensity score," Econometrica, 71, 1161-1189.

[27] Khmaladze, E. V. (1993), “Goodness of fit problem and scanning innovation martin-

gales,” Annals of Statistics, 21, 798-829.


[28] Khmaladze, E. V. and H. Koul (2004), “Martingale transforms goodness-of-fit tests in

regression models,” Annals of Statistics, 32, 995-1034.

[29] Linton, O. and P. Gozalo (1997), "Conditional independence restrictions: testing and

estimation", Cowles Foundation Discussion Paper 1140.

[30] McKeague, I. W. and Y. Sun (1996), “Transformations of gaussian random fields to

Brownian sheet and nonparametric change-point tests,” Statistics and Probability Let-

ters, 28, 311-319.

[31] Nelsen, R. B. (1999), An Introduction to Copulas, Springer-Verlag, New York.

[32] Newey, W. K. (1997), "Convergence rates and asymptotic normality of series estima-

tors," Journal of Econometrics, 79, 147-168.

[33] Nikitin, Y. (1995), Asymptotic Efficiency of Nonparametric Tests, Cambridge University

Press, New York.

[34] Nishiyama, Y. (2000), Entropy methods for martingales, CWI Tract, 128.

[35] Rosenbaum, P. and D. Rubin (1983), "The central role of the propensity score in ob-

servational studies for causal effects," Biometrika, 70, 41-55.

[36] Patton, A. (2006), "Modeling asymmetric exchange rate dependence," International

Economic Review, 47, 527-556.

[37] Rosenblatt, M. (1952), "Remarks on a multivariate transform," Annals of Mathematical

Statistics, 23, 470-472.

[38] Rubin, D. (1978), "Bayesian inference for causal effects: the role of randomization," Annals of Statistics, 6, 34-58.

[39] Song, K. (2006), "Uniform convergence of series estimators over function spaces," Work-

ing Paper.

[40] Song, K. (2007), "Testing semiparametric conditional moment restrictions using condi-

tional martingale transforms," Working Paper.

[41] Stinchcombe, M. B. and H. White (1998), "Consistent specification testing with nuisance

parameters present only under the alternative," Econometric Theory, 14, 295-325.

[42] Stute, W. and L. Zhu (2005), “Nonparametric checks for single-index models,” Annals

of Statistics, 33, 1048-1083.


[43] Su, L. and H. White (2003a), "Testing conditional independence via empirical likeli-

hood," Discussion Paper, University of California San Diego.

[44] Su, L. and H. White (2003b), "A nonparametric Hellinger metric test for conditional

independence," Discussion Paper, University of California San Diego.

[45] Su, L. and H. White (2003c), "A characteristic-function-based test for conditional in-

dependence," Discussion Paper, University of California San Diego.

[46] van der Vaart, A. W. and J. A. Wellner (1996), Weak Convergence and Empirical

Processes, Springer-Verlag.


Table 1: Size of the Tests (nominal size is set at 5 percent), n = 300 ⁹

Order 1   Order 2   MG trns (T^K_{1A})   MG trns (T^K_{1B})   MG trns (T^K_2)   Bootstrap
4²        4         0.034                0.054                0.061             0.060
4²        5         0.050                0.077                0.076             0.055
4²        6         0.058                0.072                0.073             0.056
4²        7         0.076                0.118                0.088             0.055
5²        4         0.050                0.044                0.070             0.039
5²        5         0.037                0.064                0.059             0.056
5²        6         0.059                0.099                0.078             0.046
5²        7         0.090                0.120                0.115             0.042
6²        4         0.035                0.040                0.055             0.050
6²        5         0.032                0.068                0.057             0.055
6²        6         0.054                0.092                0.085             0.056
6²        7         0.070                0.106                0.096             0.049
7²        4         0.041                0.048                0.067             0.061
7²        5         0.047                0.073                0.066             0.051
7²        6         0.062                0.084                0.082             0.048
7²        7         0.080                0.121                0.089             0.055

⁹ Order 1 represents the number of terms included in the bivariate density function estimation using Legendre polynomials. Order 2 represents the number of terms included in the estimation of the conditional expectation given U_i using Legendre polynomials. MG trns (T^K_{1A}) and MG trns (T^K_2) indicate tests based on martingale transforms, the former using the test statistic T^K_{1A}, which is based on the true conditional expectation of γ^K_y(Y) given U, and the latter using the test statistic T^K_2, which is based on the estimated conditional expectation of γ^K_y(Y) given U. MG trns (T^K_{1B}) is the symmetrized version of MG trns (T^K_{1A}).

Table 2: Power of the Tests: Conditional Positive Dependence, n = 300

          Order 1   Order 2   MG trns (T^K_{1A})   MG trns (T^K_{1B})   MG trns (T^K_2)   Bootstrap
κ = 0.2   5²        5         0.193                0.191                0.229             0.216
          5²        6         0.214                0.190                0.225             0.203
          6²        5         0.187                0.196                0.226             0.230
          6²        6         0.245                0.213                0.269             0.243
κ = 0.4   5²        5         0.575                0.581                0.678             0.676
          5²        6         0.604                0.605                0.667             0.664
          6²        5         0.591                0.585                0.679             0.673
          6²        6         0.626                0.612                0.694             0.701
κ = 0.6   5²        5         0.909                0.938                0.953             0.950
          5²        6         0.930                0.921                0.957             0.956
          6²        5         0.898                0.927                0.949             0.949
          6²        6         0.919                0.912                0.951             0.948

Table 3: Power of the Tests: Conditional Negative Dependence, n = 300

           Order 1   Order 2   MG trns (T^K_{1A})   MG trns (T^K_{1B})   MG trns (T^K_2)   Bootstrap
κ = −0.2   5²        5         0.024                0.176                0.208             0.235
           5²        6         0.039                0.217                0.204             0.208
           6²        5         0.029                0.197                0.191             0.238
           6²        6         0.036                0.229                0.201             0.201
κ = −0.4   5²        5         0.070                0.614                0.652             0.636
           5²        6         0.068                0.647                0.634             0.640
           6²        5         0.072                0.644                0.644             0.649
           6²        6         0.062                0.656                0.628             0.650
κ = −0.6   5²        5         0.367                0.944                0.956             0.935
           5²        6         0.319                0.965                0.957             0.928
           6²        5         0.359                0.943                0.941             0.925
           6²        6         0.306                0.949                0.950             0.933

[Figure 1: The empirical density of the test statistics under the null and under the alternative, n = 3000. Four curves are plotted: with true F(y|x) under H0, with estimated F(y|x) under H0, with true F(y|x) under H1, and with estimated F(y|x) under H1.]

[Figure 2: The empirical density of the empirical processes at (x, y) = (0.5, 0.5) under the null and under the alternative, n = 3000. The same four curves are plotted.]