

Austral. J. Statist. 39(1), 1997, 17-24

ESTIMATION FOR THE GENERAL SAMPLE SELECTION MODELS

YOU-GAN WANG¹ AND MING YIN²

CSIRO Division of Mathematics and Statistics and University of Florida

Summary

Consider a general regression model with an arbitrary and unknown link function and a stochastic selection variable that determines whether the outcome variable is observable or missing. The paper proposes U-statistics based on kernel functions as estimators for the directions of the parameter vectors in the link function and the selection equation, and shows that these estimators are consistent and asymptotically normal.

Key words: General regression model; kernel functions; selection equation; U-statistics.

1. Introduction

Consider a general regression problem: we observe the scalar outcome Y and an exogenous p-dimensional regression variable X, and are interested in studying the relationship between Y and X. Assume that the relationship is described by the general regression model (1) below,

where the link function G and the error distribution are unknown and arbitrary. The importance of this model and some new regression methods can be found in Duan & Li (1985) and Li & Duan (1989). When the true link function G is unknown, α is not identified and the vector β is identified only up to a multiplicative scalar (i.e. its direction is identified). Interestingly enough, Duan & Li (1991) find under certain conditions that many maximum likelihood-type regression estimators estimate the direction of β consistently, even when the link function is misspecified.

In many cases, the outcome might be missing, and the missingness also depends on the covariate X. In this case, 'missing' should be regarded as a special outcome, and so be incorporated into the analysis. We assume that there is a stochastic mechanism that determines whether the outcome is observable or missing, and that the outcome and selection equations take the form

Y = G(α + βX, ε),  (1)

S = I{s(η + ζX, δ) > 0},  (2)
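To fix ideas, the data-generating process (1)-(2) can be simulated once concrete choices are made for the pieces the model deliberately leaves unspecified. The link functions, parameter values and error law below are purely illustrative assumptions, not the paper's:

```python
import numpy as np

# Illustrative simulation of the sample selection model (1)-(2).
# The links G and s, the parameters and the error law are arbitrary choices
# for demonstration only; the paper leaves all of them unknown.
rng = np.random.default_rng(0)
n, p = 5000, 3

alpha, beta = 0.5, np.array([0.0, 1.0, -2.0])   # beta_1 = 0 (identification)
eta, zeta = -0.2, np.array([1.5, 0.5, 1.0])     # zeta_1 != 0

X = rng.normal(size=(n, p))
eps = rng.normal(size=n)
delta = rng.normal(size=n)

# Outcome equation (1), with the illustrative link G(t, e) = exp(t) + e:
Y = np.exp(alpha + X @ beta) + eps

# Selection equation (2), with the illustrative link s(t, d) = t + d:
S = (eta + X @ zeta + delta > 0).astype(int)

Y_obs = np.where(S == 1, Y, np.nan)   # the outcome is observed only when S = 1
print(f"fraction observed: {S.mean():.3f}")
```

Because selection depends on X, the observed outcomes are a non-random subsample, which is exactly the difficulty the paper addresses.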

Received December 1995; revised October 1996; accepted November 1996.
¹CSIRO Division of Mathematics and Statistics, PO Box 120, Cleveland, QLD 4163.
²Dept of Statistics, University of Florida, Gainesville, FL, USA.
Acknowledgments. The authors thank an anonymous referee for useful comments, and Professor Zhongguo Zheng for drawing our attention to this problem.


where I is the indicator function and S equals 0 or 1 according as s(·) ≤ 0 or s(·) > 0, indicating that the outcome is missing or observable.

The sample selection model is intended to deal with the problem of non-randomly missing outcomes and to provide a valid analysis. The link functions G and s are assumed unknown and arbitrary. The bivariate errors (ε, δ) associated with each observation are assumed to be independent and identically distributed (i.i.d.) with an unknown, arbitrary density.

In order to identify the outcome equation and the selection equation separately, assume that there is a component of X which affects the selection but not the outcome (Duan & Li, 1987 p.26). Without loss of generality, assume

β₁ = 0,  ζ₁ ≠ 0.

Duan & Li (1987) propose a corrected maximum likelihood estimate which is consistent for the slope vector β up to a multiplicative scalar. However, a strong condition is required: for all linear combinations b′X, the conditional expectation E(b′X | βX, ζX) must be linear. In this paper, we use U-statistics as an estimator of the slope vector β up to a multiplicative scalar. Under some smoothness conditions, it is proved that the estimator is consistent and asymptotically normal. To obtain consistent estimators for β, we need to obtain consistent estimators for ζ because of the sample selection. The nonparametric estimator for ζ is also based on U-statistics, which are also applicable to a general dichotomous regression model.

Suppose that the random variables (r.v.s) X and (ε, δ) are independent. Denote α + βX by θ₁ and η + ζX by θ₂; we use lower case x and y to denote the realisations of the r.v.s X and Y. From (1) and (2), we know that, given X = x and S = 1, Y depends on X only through θ₁ and θ₂. Therefore

p(y | X = x, S = 1) = h₁(θ₁, θ₂, y).  (3)

For given x, let h₂(θ₂) be the probability of {S = 1} conditional on X = x, i.e.

Pr{S = 1 | X = x} = h₂(θ₂).  (4)

Here both h₁ and h₂ are real functions of several variables.

Denote the densities of the r.v.s X and (X, Y) by u(x) and g(x, y). For convenience, we also define

u₁(x) = p(x, S = 1),  g₁(x, y) = p(x, y, S = 1).

2. Kernel Estimators

From (4) and (3),

u₁(x) = u(x) h₂(θ₂),  (5)

g₁(x, y) = u(x) h₂(θ₂) h₁(θ₁, θ₂, y).  (6)


Using (5) and (6), we obtain

E{[∂u₁(X)/∂X] u(X) − [∂u(X)/∂X] u₁(X)} = ζ E{u²(X) h₂′(θ₂)},  (7)

E{[∂g₁(Z)/∂X] u₁(X) − [∂u₁(X)/∂X] g₁(Z)} = β E{u₁²(X) ∂h₁(θ₁, θ₂, Y)/∂θ₁} + ζ E{u₁²(X) ∂h₁(θ₁, θ₂, Y)/∂θ₂}.  (8)

Observe that the left-hand side of (7) has the same direction as ζ, while the left-hand side of (8) is a linear combination of β and ζ.

Let K_{rq} denote a set of continuous functions ω(·) that vanish outside [0,1] and whose moments ∫₀¹ tʲ ω(t) dt, for j = 0, 1, …, q−1, equal 1 when j = r and 0 otherwise. It is obvious that K_{rq} is a non-empty set. Let ω₀(t) ∈ K_{0q} and ω₁(t) ∈ K_{1q}. In terms of the p- and (p+1)-vectors

x = (x⁽¹⁾, …, x⁽ᵖ⁾)′,  z = (x⁽¹⁾, …, x⁽ᵖ⁾, y)′,

define

Ω₀(x) = ∏_{i=1}^{p} ω₀(x⁽ⁱ⁾),  Λ₀(z) = ∏_{i=1}^{p+1} ω₀(z⁽ⁱ⁾),

and for any k = 1, …, p,

Ω_k(x) = ω₁(x⁽ᵏ⁾) ∏_{i=1, i≠k}^{p} ω₀(x⁽ⁱ⁾),  Λ_k(z) = ω₁(z⁽ᵏ⁾) ∏_{i=1, i≠k}^{p+1} ω₀(z⁽ⁱ⁾).
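A sketch of how members of K_{rq} and the product kernels above can be constructed, assuming the moment conditions require ∫₀¹ tʲ ω_r(t) dt to equal 1 for j = r and 0 for the other j = 0, …, q−1 (this normalisation, and the polynomial form, are our assumptions):

```python
import numpy as np

# Kernels in K_{rq} built as polynomials on [0,1].  Writing
# w(t) = c_0 + c_1 t + ... + c_{q-1} t^{q-1}, the j-th moment is
# sum_k c_k / (j + k + 1), so the assumed moment conditions are a
# linear system in the coefficients c.
def make_kernel(r, q):
    A = np.array([[1.0 / (j + k + 1) for k in range(q)] for j in range(q)])
    b = np.zeros(q)
    b[r] = 1.0
    c = np.linalg.solve(A, b)
    return lambda t: np.where((t >= 0) & (t <= 1),
                              sum(ck * t**k for k, ck in enumerate(c)), 0.0)

w0 = make_kernel(0, 2)   # a member of K_{0q} for q = 2: w0(t) = 4 - 6t
w1 = make_kernel(1, 2)   # a member of K_{1q} for q = 2: w1(t) = -6 + 12t

def Omega(x, k=None):
    # Product kernel: Omega_0(x) when k is None; Omega_k(x) replaces the
    # k-th factor by w1 (indices are 0-based here, vs 1..p in the text).
    return np.prod([w1(xi) if i == k else w0(xi) for i, xi in enumerate(x)])
```

The Λ₀ and Λ_k kernels are identical in form, with the product extended over the p+1 coordinates of z.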

Let z₁, …, z_n denote n (p+1)-vectors like z. Define, for 1 ≤ i, j, k ≤ n,

where the summation is over all the permutations of (i, j, k), h is the bandwidth, and I(y) = 0 if y is missing and 1 otherwise. We are now ready to introduce some U-statistics based on these kernel functions:

Let Z = (X, Y), ξ_n = (ξ_{n,1}, …, ξ_{n,p}) and β_n = (β_{n,1}, …, β_{n,p}). We show that ξ_n and β_n are consistent estimates of E{[∂u₁(X)/∂X] u(X) − [∂u(X)/∂X] u₁(X)} and E{[∂g₁(Z)/∂X] u₁(X) − [∂u₁(X)/∂X] g₁(Z)}.
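The estimators are averages over index triples. As a sketch of the general mechanics only (with a toy stand-in kernel, since the paper's kernels H_{n,k} additionally involve the product kernels, the bandwidth h and the indicator I(y)), a degree-3 U-statistic can be computed as:

```python
import numpy as np
from itertools import combinations, permutations

# Generic degree-3 U-statistic: average a kernel, symmetrised over the
# orderings of each index triple, across all unordered triples.
def u_statistic(H, data):
    total, count = 0.0, 0
    for i, j, k in combinations(range(len(data)), 3):
        # symmetrise over the 6 orderings of the triple (i, j, k)
        total += sum(H(data[a], data[b], data[c])
                     for a, b, c in permutations((i, j, k))) / 6.0
        count += 1
    return total / count

rng = np.random.default_rng(1)
z = rng.normal(size=10)
H3 = lambda a, b, c: a * (b - c)      # toy kernel, not symmetric on its own
val = u_statistic(H3, z)              # the symmetrised H3 averages to zero
```

The O(n³) loop is purely expository; practical implementations exploit kernel structure and subsampling.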


3. Asymptotic Properties

Assume that the kernel functions satisfy ω₀(·) ∈ K_{0,2q−1} and ω₁(·) ∈ K_{1,2q−1}. Our conclusions rely on the following condition.

Condition C1. u(x) and u₁(x) are qth-order differentiable and the qth partial derivatives are uniformly bounded, for some q ≥ 2.

Theorem 1. Under condition C1, there exists a sequence {h_n} such that ξ_n and β_n converge almost surely to aζ and bβ + cζ, respectively, for some constants a, b, c.

Under the constraint condition β₁ = 0, ζ₁ ≠ 0, and if a ≠ 0, define

b_n = β_n − (β_{n,1}/ξ_{n,1}) ξ_n.

From Theorem 1 we know that b_n is a consistent estimator of β up to a multiplicative scalar if b ≠ 0. We can also obtain asymptotic normality.
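Numerically, the elimination behind this estimator can be sketched as follows, assuming the definition b_n = β_n − (β_{n,1}/ξ_{n,1}) ξ_n (a hypothetical reconstruction consistent with Theorem 1 and the constraint β₁ = 0, ζ₁ ≠ 0). With the exact limits aζ and bβ + cζ in place of the estimates, the result is exactly bβ:

```python
import numpy as np

# Direction recovery implied by Theorem 1: xi_n estimates a*zeta and beta_n
# estimates b*beta + c*zeta.  Since beta_1 = 0 and zeta_1 != 0, the first
# coordinates identify c/a, and subtracting that multiple of xi_n removes the
# zeta-component, leaving a vector proportional to beta.
def direction_estimate(xi_n, beta_n):
    return beta_n - (beta_n[0] / xi_n[0]) * xi_n

# sanity check with the exact limits substituted for the estimates
beta = np.array([0.0, 2.0, -1.0])   # first component zero, as required
zeta = np.array([1.0, 0.5, 3.0])    # first component nonzero
a, b, c = 1.3, 0.7, -2.1
est = direction_estimate(a * zeta, b * beta + c * zeta)
```

With noisy ξ_n and β_n in place of the limits, est recovers the direction of β only approximately, which is what Theorem 2 quantifies.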

Theorem 2. If q > (p + 5)/2 and condition C1 is satisfied, there exists a sequence {h_n} such that

√n (ξ_n − aζ) → N(0, Σ₁),  (9)

√n (β_n − bβ − cζ) → N(0, Σ₂),  (10)

in distribution,

where Σ₁ = 9 cov{W⁽¹⁾(Z)} and Σ₂ = 9 cov{W⁽²⁾(Z)}. Here both W⁽¹⁾(Z) and W⁽²⁾(Z) are p-vector r.v.s.

These theorems follow from the four lemmas below. In them, we write h for h_n when there is no confusion.


Denote the three terms inside the square brackets by T₁, T₂ and T₃, respectively. Then

T₁ = h^{−2p} ∫∫ Ω_k((x₂ − x₁)/h) Ω₀((x₃ − x₁)/h) u(x₂) u₁(x₃) dx₂ dx₃,  (15)

(16)

If f(x) is a multivariate function that is qth-order differentiable, w has the same dimension as x, and ε is a small real number, then we have the Taylor expansion

f(x + εw) = Σ_{i=0}^{q−1} (εⁱ/i!) D_w^i f(x) + (ε^q/(q−1)!) ∫₀¹ (1 − t)^{q−1} D_w^q f(x + tεw) dt,

where D_w^i f(x) = (w′∇)ⁱ f(x). By applying the Taylor series expansions to u and u₁, (16) becomes
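A quick one-dimensional numerical check of this expansion (with the illustrative choice f = exp, so that D_w^i f(x) = wⁱ eˣ; all numbers below are arbitrary):

```python
import numpy as np
from math import factorial

# Check: f(x + eps*w) equals the first q Taylor terms plus the integral
# remainder  eps^q/(q-1)! * int_0^1 (1-t)^{q-1} D_w^q f(x + t*eps*w) dt.
f = np.exp
x, w, eps, q = 0.3, 1.7, 0.05, 3

main = sum(eps**i / factorial(i) * w**i * f(x) for i in range(q))

t = np.linspace(0.0, 1.0, 20001)
g = (1 - t)**(q - 1) * w**q * f(x + t * eps * w)
integral = np.sum((g[:-1] + g[1:]) / 2) * (t[1] - t[0])   # trapezoidal rule
remainder = eps**q / factorial(q - 1) * integral

assert abs(main + remainder - f(x + eps * w)) < 1e-10
```

In the proofs this expansion is what turns the kernel smoothing bias into the O(h^{q−1}) terms of Lemma 2.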

Similarly expanding the second and third terms in the square brackets, equation (11) follows. Equation (12) can be proved in a similar way, and (13) and (14) follow immediately from (11) and (12) by taking expectations.


Lemma 2. Under condition C1,

E ξ_{n,k} − E{[∂u₁(X)/∂x⁽ᵏ⁾] u(X) − [∂u(X)/∂x⁽ᵏ⁾] u₁(X)} = O(h^{q−1}),  (20)

E β_{n,k} − E{[∂g₁(Z)/∂x⁽ᵏ⁾] u₁(X) − [∂u₁(X)/∂x⁽ᵏ⁾] g₁(Z)} = O(h^{q−1}).  (21)

Equation (20) follows from (17) immediately, and (21) can be proved in a similar way.

Lemma 3.

Because ET₁ = ET₂, equations (17) and (18) yield (22) below.

Equation (23) can be obtained in a similar way. From (7) and (22), we know that E{[∂u₁(X)/∂X] u(X) − [∂u(X)/∂X] u₁(X)} has the same direction as ζ; from (8) and (23), we know that E{[∂g₁(Z)/∂X] u₁(X) − [∂u₁(X)/∂X] g₁(Z)} is a linear combination of ζ and β.

From Serfling (1980 p.188) we have the decompositions (24) and (25) of ξ_{n,k} and β_{n,k} into a projection part plus remainder terms R_{n,k}^{(1)} and R_{n,k}^{(2)}.


Lemma 4. Under condition C1,

Proof. Clearly, R_{n,k}^{(1)} is a U-statistic, and therefore, from Serfling (1980 p.183) we can express its variance as

var{R_{n,k}^{(1)}} = C(n,3)⁻¹ Σ_{c=1}^{3} C(3,c) C(n−3, 3−c) r_c,

where r₁ = var[E{H_{n,k}^{(1)}(Z₁, Z₂, Z₃) | Z₁}], r₂ = var[E{H_{n,k}^{(1)}(Z₁, Z₂, Z₃) | Z₁, Z₂}] and r₃ = var{H_{n,k}^{(1)}(Z₁, Z₂, Z₃)}. After some algebra, we obtain

r₁ = 0,  r₂ = O(h^{−(p+2)}),  r₃ = O(h^{−(2p+2)}),

which leads to (26). Equation (27) can be obtained in a similar way.

Proof of Theorem 1. From Lemma 4, Σ_{n=1}^{∞} var{R_{n,k}^{(1)}} < ∞ if we choose h_n so that h_n > 0 and h_n → 0 sufficiently slowly as n → ∞; for example, h_n = n^{−1/[2(p+3)]} will do. In these cases, R_{n,k}^{(1)} converges almost surely to 0. The conclusions in Theorem 1 follow from (24) and (25) and the Strong Law of Large Numbers.

Proof of Theorem 2. Under the conditions of Theorem 2, one can choose h_n (e.g. h_n = {n log(n)}^{−1/[2(q−1)]}) so that √n h_n^{q−1}, 1/(n h_n^{p+3}) and 1/(n² h_n^{2p+3}) all converge to 0. Then √n R_{n,k}^{(1)} converges to 0 in probability, and √n (ξ_n − aζ) has the same asymptotic distribution as its projection; the limit is given by (9), from the Central Limit Theorem. We can obtain (10) in a similar way.

As a referee has pointed out, our estimation method requires that the proportionality constants satisfy a ≠ 0 and b ≠ 0. From (7), (8) and Lemma 3, we know that these nonzero conditions are equivalent to

E{u²(X) h₂′(θ₂)} ≠ 0  and  E{[∂h₁(θ₁, θ₂, Y)/∂θ₁] u₁²(X)} ≠ 0.

These conditions clearly hold in general except for some uninteresting cases.


4. Conclusion

The estimation method proposed here does not require specification of the link functions or the error distributions. Therefore we expect robust performance but lower efficiency compared with parametric methods that are based on distributional assumptions and known link functions. Theorems 1 and 2 establish the consistency and asymptotic normality of the estimators analytically. It would be of great interest to carry out a numerical study to compare these different estimators and apply them to real data.

References

DUAN, N. & LI, K.-C. (1985). The ordinary least squares estimation for the general-link linear models, with applications. Technical Summary Report Ser. 2880, Math. Res. Center, University of Wisconsin, Madison.

— & — (1987). Distribution-free and link-free estimation for the sample selection model. J. Econometrics 35, 25-35.

— & — (1991). Slicing regression: a link-free regression method. Ann. Statist. 19, 505-530.

LI, K.-C. & DUAN, N. (1989). Regression analysis under link violation. Ann. Statist. 17, 1009-1052.

SERFLING, R. (1980). Approximation Theorems of Mathematical Statistics. New York: John Wiley.