Uniform Convergence of Semi-Nonparametric Densities and Their Derivatives, with Application to the Semi-Nonparametric Fractional Index Regression Model

Herman J. Bierens
Professor Emeritus of Economics
Pennsylvania State University

Introduction

The paper to be presented is a further elaboration of the following two papers:

• Bierens, H. J. (2014): "The Hilbert Space Theoretical Foundation of Semi-Nonparametric Modeling". Chapter 1 in: J. Racine, L. Su and A. Ullah (eds), The Oxford Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics, Oxford University Press,

• Bierens, H. J. (2014): "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076,

• and many more, of course.

In these two papers I have shown that any continuous density $h(u)$ on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$ can be approximated arbitrarily close by

$$h_n(u) = \frac{\left(1 + \sum_{k=1}^{n} \delta_k \sqrt{2}\cos(k\pi u)\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2},$$

where

$$\delta_m = \frac{\int_0^1 \sqrt{2}\cos(m\pi u)\sqrt{h(u)}\,du}{\int_0^1 \sqrt{h(u)}\,du}, \quad m \in \mathbb{N}.$$

Then

$$\lim_{n\to\infty} \int_0^1 |h(u) - h_n(u)|\,du = 0.$$

The latter implies that $\lim_{n\to\infty} h_n(u) = h(u)$ a.e. on $(0,1)$, i.e., $\lim_{n\to\infty} h_n(u) = h(u)$ pointwise in $u$ in a set with Lebesgue measure 1.
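The approximation $h_n$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the test density $h(u) = 0.5 + u$, the midpoint-rule integrator and all function names are assumptions chosen for illustration, not part of the original talk.

```python
import math

def integrate(func, n_grid=2000):
    """Midpoint rule on (0, 1); never evaluates at the endpoints."""
    step = 1.0 / n_grid
    return step * sum(func((i + 0.5) * step) for i in range(n_grid))

def h(u):
    # Hypothetical test density: continuous, positive on (0, 1), integrates to 1.
    return 0.5 + u

def delta(m):
    """delta_m = int sqrt(2) cos(m pi u) sqrt(h(u)) du / int sqrt(h(u)) du."""
    num = integrate(lambda u: math.sqrt(2.0) * math.cos(m * math.pi * u) * math.sqrt(h(u)))
    den = integrate(lambda u: math.sqrt(h(u)))
    return num / den

def make_hn(n):
    """Return the SNP density h_n built from the first n delta's."""
    d = [delta(m) for m in range(1, n + 1)]
    norm = 1.0 + sum(x * x for x in d)

    def hn(u):
        s = 1.0 + sum(d[k - 1] * math.sqrt(2.0) * math.cos(k * math.pi * u)
                      for k in range(1, n + 1))
        return s * s / norm

    return hn

def l1_error(n):
    """Numerical value of int_0^1 |h(u) - h_n(u)| du."""
    hn = make_hn(n)
    return integrate(lambda u: abs(h(u) - hn(u)))

# L1 approximation errors for a few truncation orders.
errors = {n: l1_error(n) for n in (1, 5, 20)}
```

For this particular test density the $L^1$ error shrinks as $n$ grows, in line with the limit statement above; other densities can be tried by swapping out `h`.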

Moreover, given an a priori chosen absolutely continuous distribution function $G(x)$ with continuous density $g(x)$ and support $\mathbb{R}$, any continuous density $f(x)$ with support $\mathbb{R}$ can be written as

$$f(x) = h(G(x))\,g(x),$$

where

$$h(u) = f(G^{-1}(u))\,\frac{dG^{-1}(u)}{du} = \frac{f(G^{-1}(u))}{g(G^{-1}(u))}$$

is a continuous density on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$. Thus, denoting

$$f_n(x) = \frac{\left(1 + \sum_{k=1}^{n} \delta_k \sqrt{2}\cos(k\pi G(x))\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2}\, g(x) = h_n(G(x))\,g(x)$$

with the $\delta_m$'s the same as before, we have $\lim_{n\to\infty} f_n(x) = f(x)$ a.e. on $\mathbb{R}$.

The questions I will address in this talk are:

• Under what conditions on $h(u)$ do we also have

$$\lim_{n\to\infty} \sup_{0\le u\le 1} |h_n(u) - h(u)| = 0,$$

$$\lim_{n\to\infty} \sup_{0\le u\le 1} \left|(d/du)^i h_n(u) - (d/du)^i h(u)\right| = 0, \quad i = 1, 2, \dots, \ell,$$

for some $\ell \in \mathbb{N}$?

• Under what conditions on $f(x)$ and $g(x)$ are these conditions on $h(u) = f(G^{-1}(u))/g(G^{-1}(u))$ satisfied, so that

$$\lim_{n\to\infty} \sup_{x\in\mathbb{R}} |f_n(x) - f(x)| = 0, \qquad \lim_{n\to\infty} \sup_{x\in\mathbb{R}} \left|(d/dx)^i f_n(x) - (d/dx)^i f(x)\right| = 0, \quad i = 1, 2, \dots, \ell,$$

as well?

• What are the rates of uniform convergence?

The answers to these questions play a key role in deriving the consistency and asymptotic normality of sieve estimators of semi-nonparametric (SNP) models.

In

• Bierens, H. J. (2014), "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076,

I have imposed these conditions by assuming that the parameters $\delta_m$ in

$$f_n(x) = \frac{\left(1 + \sum_{k=1}^{n} \delta_k \sqrt{2}\cos(k\pi G(x))\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2}\, g(x)$$

satisfy

$$\sum_{m=1}^{\infty} m^{\ell} |\delta_m| < \infty \quad \text{for some } \ell \in \mathbb{N}.$$

However, as will be shown, the latter condition can be derived from smoothness and moment conditions on $f$ and $g$.

SNP models

SNP models are models for which the functional form is only partly parametrized and where the non-specified parts are unknown functions which are approximated by a series expansion.

For example, the SNP discrete choice model takes the form

$$\Pr[Y = 1 \mid X] = F(\theta' X),$$

where $F(x)$ is an unknown distribution function, $Y \in \{0, 1\}$ is the dependent variable and $X$ is a vector of covariates.

This case is considered as an application of the approach in:

• Bierens, H. J. (2014), "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076.

Therefore, the following question arises:

• How to model distribution functions and density functions by series expansions in a flexible way?

The answer is:

• Use Hilbert space theory.

Hilbert spaces of functions

Let w(x) be a density on R.

Consider the space $L^2(w)$ of real functions $f(x)$ on $\mathbb{R}$ satisfying

$$\int f(x)^2 w(x)\,dx < \infty.$$

Endow the space $L^2(w)$ with the inner product

$$\langle f, g \rangle = \int f(x) g(x) w(x)\,dx$$

and associated norm $\|f\| = \sqrt{\langle f, f \rangle}$ and metric $\|f - g\|$.

Then $L^2(w)$ is a Hilbert space.

Recall that a Hilbert space $H$ is a vector space endowed with an inner product and associated norm and metric such that every Cauchy sequence in $H$ has a limit in $H$.

In particular, for any sequence $f_n \in L^2(w)$ satisfying

$$\lim_{\min(m,k)\to\infty} \|f_m - f_k\| = 0$$

(so that $f_n$ is a Cauchy sequence) it can be shown that there exists a function $f \in L^2(w)$ such that

$$\lim_{n\to\infty} \|f_n - f\| = 0.$$

Hence, $L^2(w)$ is a Hilbert space.

Complete orthonormal sequences

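Let $\{\varphi_j(x)\}_{j=0}^{\infty}$ be an orthonormal sequence in $L^2(w)$:

$$\langle \varphi_i, \varphi_j \rangle = \int \varphi_i(x)\varphi_j(x) w(x)\,dx = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases}$$

Let $f_n(x)$ be the linear projection of $f(x)$ on $\{\varphi_j(x)\}_{j=0}^{n}$:

$$f_n(x) = \sum_{j=0}^{n} \gamma_j \varphi_j(x), \quad \text{where } \|f - f_n\|^2 \text{ is minimal.}$$

The solutions to this problem are

$$\gamma_j = \langle f, \varphi_j \rangle = \int f(x)\varphi_j(x) w(x)\,dx, \qquad \sum_{j=0}^{\infty} \gamma_j^2 < \infty.$$

The $\gamma_j$'s involved are called the Fourier coefficients of $f(x)$.

An orthonormal sequence $\{\varphi_j(x)\}_{j=0}^{\infty}$ in $L^2(w)$ is called complete if for an arbitrary $f \in L^2(w)$,

$$\lim_{n\to\infty} \|f - f_n\| = \lim_{n\to\infty} \sqrt{\int (f(x) - f_n(x))^2 w(x)\,dx} = 0,$$

where

$$f_n(x) = \sum_{j=0}^{n} \gamma_j \varphi_j(x) \quad \text{with } \gamma_j = \langle f, \varphi_j \rangle.$$

Then $f(x) = \lim_{n\to\infty} f_n(x)$ a.e. in the support of $w(x)$, i.e., $f(x) = \lim_{n\to\infty} f_n(x)$ pointwise in $x$ in a set $X \subseteq \{x \in \mathbb{R} : w(x) > 0\}$ satisfying $\int_X w(x)\,dx = 1$.

Note that the rate of convergence may depend on $x$, i.e., for every $\varepsilon > 0$ and $x \in X$ there exists an $n_0(\varepsilon, x) \in \mathbb{N}$ such that

$$|f(x) - f_n(x)| < \varepsilon \quad \text{if } n > n_0(\varepsilon, x).$$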

Examples of complete orthonormal sequences

Hermite polynomials

In the case

$$w(x) = \exp(-x^2/2)/\sqrt{2\pi}, \quad x \in \mathbb{R},$$

the Hermite polynomials form a complete orthonormal sequence in the corresponding Hilbert space $L^2(w)$.

Hermite polynomials $\varphi_k(x)$, $k \ge 0$, on $\mathbb{R}$ can be generated recursively by

$$\sqrt{k+1}\,\varphi_{k+1}(x) - x\,\varphi_k(x) + \sqrt{k}\,\varphi_{k-1}(x) = 0, \quad k \ge 1,$$

starting from $\varphi_0(x) = 1$, $\varphi_1(x) = x$.
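The Hermite recursion can be turned into a few lines of code. This is an illustrative sketch only: the function names, the integration range $[-12, 12]$ and the midpoint rule used to check orthonormality under the standard normal weight are assumptions, not part of the talk.

```python
import math

def hermite_orthonormal(k_max, x):
    """Return [phi_0(x), ..., phi_{k_max}(x)] from the recursion
    sqrt(k+1) phi_{k+1}(x) = x phi_k(x) - sqrt(k) phi_{k-1}(x)."""
    phi = [1.0, x]
    for k in range(1, k_max):
        phi.append((x * phi[k] - math.sqrt(k) * phi[k - 1]) / math.sqrt(k + 1))
    return phi[: k_max + 1]

def inner_product(i, j, half_width=12.0, n_grid=6000):
    """Approximate <phi_i, phi_j> under the standard normal weight (midpoint rule)."""
    step = 2.0 * half_width / n_grid
    total = 0.0
    for s in range(n_grid):
        x = -half_width + (s + 0.5) * step
        phi = hermite_orthonormal(max(i, j), x)
        w = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
        total += phi[i] * phi[j] * w * step
    return total
```

Calling `inner_product(i, j)` for a few small indices should return values close to 1 when `i == j` and close to 0 otherwise.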

Legendre polynomials

In the case that $w(u)$ is the uniform density on $[0,1]$, $w(u) = I(0 \le u \le 1)$, the Hilbert space $L^2(w)$ involved is denoted by $L^2(0,1)$.

The Legendre polynomials form a complete orthonormal sequence in $L^2(0,1)$.

Legendre polynomials $\varphi_k(u)$ on $[0,1]$ can be generated recursively by

$$\frac{(k+1)/2}{\sqrt{2k+3}\sqrt{2k+1}}\,\varphi_{k+1}(u) + (0.5 - u)\,\varphi_k(u) + \frac{k/2}{\sqrt{2k+1}\sqrt{2k-1}}\,\varphi_{k-1}(u) = 0, \quad k \ge 1,$$

starting from $\varphi_0(u) = 1$, $\varphi_1(u) = \sqrt{3}(2u - 1)$.
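As a sanity check on the Legendre recursion, the sketch below generates the polynomials on $[0,1]$ and verifies orthonormality numerically. The function names and the midpoint integration rule are illustrative assumptions.

```python
import math

def legendre_orthonormal(k_max, u):
    """Return [phi_0(u), ..., phi_{k_max}(u)] on [0, 1] from the recursion
    a_k phi_{k+1}(u) + (0.5 - u) phi_k(u) + b_k phi_{k-1}(u) = 0."""
    phi = [1.0, math.sqrt(3.0) * (2.0 * u - 1.0)]
    for k in range(1, k_max):
        a = 0.5 * (k + 1) / (math.sqrt(2 * k + 3) * math.sqrt(2 * k + 1))
        b = 0.5 * k / (math.sqrt(2 * k + 1) * math.sqrt(2 * k - 1))
        phi.append(((u - 0.5) * phi[k] - b * phi[k - 1]) / a)
    return phi[: k_max + 1]

def inner_product(i, j, n_grid=4000):
    """Approximate int_0^1 phi_i(u) phi_j(u) du by the midpoint rule."""
    step = 1.0 / n_grid
    total = 0.0
    for s in range(n_grid):
        phi = legendre_orthonormal(max(i, j), (s + 0.5) * step)
        total += phi[i] * phi[j] * step
    return total
```

With $k = 1$ the recursion yields $\varphi_2(u) = \sqrt{5}(6u^2 - 6u + 1)$, which can be compared against `legendre_orthonormal(2, u)[2]` at any point `u`.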

Trigonometric series

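Other complete orthonormal sequences in $L^2(0,1)$ are:

• the cosine series,

$$\varphi_k(u) = \begin{cases} 1 & \text{for } k = 0, \\ \sqrt{2}\cos(k\pi u) & \text{for } k \in \mathbb{N}; \end{cases}$$

• the Fourier series,

$$\varphi_0(u) = 1, \quad \varphi_{2k-1}(u) = \sqrt{2}\sin(2k\pi u), \quad \varphi_{2k}(u) = \sqrt{2}\cos(2k\pi u), \quad k \in \mathbb{N};$$

• the sine series,

$$\varphi_k(u) = \sqrt{2}\sin(k\pi u), \quad k \in \mathbb{N}.$$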

How well (or badly) do the trigonometric series fit?

In the cases of the cosine and Fourier series I will use the test function

$$f(u) = u(4 - 3u), \quad 0 \le u \le 1,$$

whereas in the sine case I will use the test function

$$f'(u) = 4 - 6u, \quad 0 \le u \le 1.$$

Since $f(u)$ is a polynomial of order 2, we can represent $f(u)$ exactly by a linear combination of the first three Legendre polynomials, i.e.,

$$f(u) \equiv \alpha_0\varphi_0(u) + \alpha_1\varphi_1(u) + \alpha_2\varphi_2(u),$$

where

$$\varphi_0(u) = 1, \quad \varphi_1(u) = \sqrt{3}(2u - 1), \quad \varphi_2(u) = \sqrt{5}(6u^2 - 6u + 1)$$

and

$$\alpha_m = \int_0^1 f(u)\varphi_m(u)\,du, \quad m = 0, 1, 2.$$

However, the trigonometric series "wobble" with increasing frequency, so the question is whether these series are capable of approximating a smooth non-periodic function like $f(u) = u(4 - 3u)$ using only a modest number of leading elements of these series.

Cosine series

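Test function: $f(u) = u(4 - 3u)$, $0 \le u \le 1$, with SNP version

$$f_n(u) = \alpha_0 + \sum_{m=1}^{n} \alpha_m \sqrt{2}\cos(m\pi u),$$

where

$$\alpha_0 = \int_0^1 f(u)\,du = 1,$$

$$\alpha_m = \int_0^1 f(u)\sqrt{2}\cos(m\pi u)\,du = \begin{cases} -6\sqrt{2}\,(m\pi)^{-2} & \text{if } m \ge 2 \text{ is even}, \\ -2\sqrt{2}\,(m\pi)^{-2} & \text{if } m \ge 1 \text{ is odd}. \end{cases}$$

Note that in this case,

$$\lim_{n\to\infty} \sup_{0\le u\le 1} |f(u) - f_n(u)| \le \pi^{-2}\, 6\sqrt{2} \lim_{n\to\infty} \sum_{m=n+1}^{\infty} m^{-2} = 0.$$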
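The closed-form cosine coefficients can be cross-checked against numerical integration, and the sup error can be evaluated on a grid. This is a minimal sketch; the grid sizes and function names are assumptions made for illustration.

```python
import math

def f(u):
    return u * (4.0 - 3.0 * u)

def alpha_closed_form(m):
    """alpha_m = -6 sqrt(2)/(m pi)^2 for even m, -2 sqrt(2)/(m pi)^2 for odd m."""
    c = 6.0 if m % 2 == 0 else 2.0
    return -c * math.sqrt(2.0) / (m * math.pi) ** 2

def alpha_numeric(m, n_grid=4000):
    """Midpoint-rule approximation of int_0^1 f(u) sqrt(2) cos(m pi u) du."""
    step = 1.0 / n_grid
    total = 0.0
    for i in range(n_grid):
        u = (i + 0.5) * step
        total += f(u) * math.sqrt(2.0) * math.cos(m * math.pi * u)
    return step * total

def f_n(u, n):
    """Cosine series approximation of order n, with alpha_0 = 1."""
    return 1.0 + sum(alpha_closed_form(m) * math.sqrt(2.0) * math.cos(m * math.pi * u)
                     for m in range(1, n + 1))

def sup_error(n, n_grid=1000):
    """Maximum of |f - f_n| over an equally spaced grid that includes 0 and 1."""
    return max(abs(f(i / n_grid) - f_n(i / n_grid, n)) for i in range(n_grid + 1))
```

Comparing `sup_error(10)` with `sup_error(20)` gives a numerical impression of how the tail fit improves with the truncation order.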

The case n = 10

We see that $f_n(u)$ for $n = 10$ approximates $f(u)$ quite well, except in the tails of $f_n(u)$. The reason for the latter is that

$$f_n'(u) = -\sum_{k=1}^{n} \alpha_k k\pi \sqrt{2}\sin(k\pi u),$$

so that $f_n'(0) = f_n'(1) = 0$.

The case n = 20

As expected, the tail fit becomes better for larger truncation orders $n$.

Fourier series

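Test function: $f(u) = u(4 - 3u)$, $0 \le u \le 1$, with SNP version

$$f_n(u) = \gamma_0 + \sum_{k=1}^{n/2} \gamma_{2k}\sqrt{2}\cos(2k\pi u) + \sum_{k=1}^{n/2} \gamma_{2k-1}\sqrt{2}\sin(2k\pi u),$$

where

$$\gamma_0 = \int_0^1 f(u)\,du = 1,$$

$$\gamma_{2k} = \int_0^1 f(u)\sqrt{2}\cos(2k\pi u)\,du = -\left(3/\sqrt{2}\right)(k\pi)^{-2},$$

$$\gamma_{2k-1} = \int_0^1 f(u)\sqrt{2}\sin(2k\pi u)\,du = -\left(\sqrt{2}\,k\pi\right)^{-1}.$$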

The case n = 10

Clearly, the fit of the Fourier series approximation for $n = 10$ is pretty bad compared with the cosine series approximations, especially in the tails.

The case n = 20

For $n = 20$ the fit is slightly better, as expected, but still inferior to the cosine case.

This bad fit may be due to the slower rate of convergence to zero of $\gamma_{2k-1}$, i.e.,

$$\gamma_{2k-1} = \int_0^1 f(u)\sqrt{2}\sin(2k\pi u)\,du = O(k^{-1}),$$

compared with

$$\alpha_{2k-1} = \int_0^1 f(u)\sqrt{2}\cos((2k-1)\pi u)\,du = O(k^{-2})$$

in the cosine case, whereas $\gamma_{2k}$ and $\alpha_{2k}$ are both of order $O(k^{-2})$.
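The difference in fit between the Fourier and cosine series can be quantified without plots. The sketch below uses the closed-form coefficients stated for both series and compares sup and root-mean-square errors; the grid sizes and helper names are illustrative assumptions.

```python
import math

SQRT2 = math.sqrt(2.0)

def f(u):
    return u * (4.0 - 3.0 * u)

def cosine_fn(u, n):
    """Cosine series approximation with the closed-form alpha_m."""
    total = 1.0
    for m in range(1, n + 1):
        c = 6.0 if m % 2 == 0 else 2.0
        alpha_m = -c * SQRT2 / (m * math.pi) ** 2
        total += alpha_m * SQRT2 * math.cos(m * math.pi * u)
    return total

def fourier_fn(u, n):
    """Fourier series approximation with n/2 cosine and n/2 sine terms."""
    total = 1.0
    for k in range(1, n // 2 + 1):
        gamma_cos = -(3.0 / SQRT2) / (k * math.pi) ** 2
        gamma_sin = -1.0 / (SQRT2 * k * math.pi)
        total += gamma_cos * SQRT2 * math.cos(2 * k * math.pi * u)
        total += gamma_sin * SQRT2 * math.sin(2 * k * math.pi * u)
    return total

def sup_error(approx, n, n_grid=1000):
    """Maximum absolute error on a grid that includes the endpoints 0 and 1."""
    return max(abs(f(i / n_grid) - approx(i / n_grid, n)) for i in range(n_grid + 1))

def rms_error(approx, n, n_grid=2000):
    """Root of the integrated squared error, by the midpoint rule."""
    step = 1.0 / n_grid
    sq = sum((f((i + 0.5) * step) - approx((i + 0.5) * step, n)) ** 2
             for i in range(n_grid))
    return math.sqrt(step * sq)
```

Because $f(0) \ne f(1)$, the Fourier partial sums take the same value at both endpoints, so their endpoint error cannot vanish, whereas the cosine series error at the endpoints shrinks with $n$.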

Sine series

Test function: $f'(u) = 4 - 6u$, $0 \le u \le 1$, with SNP version

$$f_n'(u) = -\pi \sum_{m=1}^{n} \alpha_m m \sqrt{2}\sin(m\pi u),$$

where the $\alpha_m$'s are the same as in the cosine case.

By the completeness of the sine series in $L^2(0,1)$ we must have

$$\lim_{n\to\infty} f_n'(u) = f'(u) \quad \text{a.e. on } (0,1).$$

The case n = 10

As expected, the fit in the sine case is bad in the tails, because

$$f'(0) = 4, \quad f'(1) = -2,$$

whereas

$$f_n'(0) = f_n'(1) = 0.$$

The case n = 20

The fit in the straight middle part of $f'(u)$ has improved due to the fact that by the completeness of the sine series,

$$\lim_{n\to\infty} \int_0^1 (f'(u) - f_n'(u))^2\,du = 0.$$

To see what happens as $n$ becomes larger, consider next the case $n = 50$.

The case n = 50

Thus, what we see happening in these pictures is that

$$\lim_{n\to\infty} f_n'(u) = f'(u) \quad \text{a.e. on } (0,1),$$

as predicted, but obviously

$$\lim_{n\to\infty} \sup_{0\le u\le 1} |f_n'(u) - f'(u)| > 0.$$
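The contrast between mean-square convergence and the lack of uniform convergence in the sine case can be reproduced numerically. In this sketch the function names and grid sizes are illustrative assumptions; the coefficients are the closed-form $\alpha_m$ of the cosine case.

```python
import math

def f_prime(u):
    return 4.0 - 6.0 * u

def f_prime_n(u, n):
    """Sine series f'_n(u) = -pi * sum_m alpha_m m sqrt(2) sin(m pi u)."""
    total = 0.0
    for m in range(1, n + 1):
        c = 6.0 if m % 2 == 0 else 2.0
        alpha_m = -c * math.sqrt(2.0) / (m * math.pi) ** 2
        total += alpha_m * m * math.sqrt(2.0) * math.sin(m * math.pi * u)
    return -math.pi * total

def mean_square_error(n, n_grid=4000):
    """Midpoint-rule value of int_0^1 (f'(u) - f'_n(u))^2 du."""
    step = 1.0 / n_grid
    return step * sum((f_prime((i + 0.5) * step) - f_prime_n((i + 0.5) * step, n)) ** 2
                      for i in range(n_grid))

def sup_error(n, n_grid=1000):
    """Maximum absolute error on a grid that includes the endpoints 0 and 1."""
    return max(abs(f_prime(i / n_grid) - f_prime_n(i / n_grid, n))
               for i in range(n_grid + 1))
```

Since every sine term vanishes at $u = 0$, the error there equals $f'(0) = 4$ for all $n$, so the sup error stays bounded away from zero while the mean-square error keeps decreasing.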

Densities and distribution functions

Series representation of densities

Given a density $w(x)$ with support $(a, b) \subseteq \mathbb{R}$ and corresponding complete orthonormal sequence $\varphi_j(x)$, every density function $f(x)$ with support $(a, b)$ can be written as

$$f(x) = \lim_{n\to\infty} f_n(x) \quad \text{a.e. on } (a, b),$$

where

$$f_n(x) = w(x)\,\frac{\left(\sum_{j=0}^{n} \gamma_j \varphi_j(x)\right)^2}{\sum_{j=0}^{n} \gamma_j^2}$$

for some sequence $\{\gamma_j\}_{j=0}^{\infty}$ satisfying

$$\sum_{j=0}^{\infty} \gamma_j^2 = \int_a^b f(x)\,dx = 1.$$

However, there are uncountably many sequences $\gamma_j$ for which this is true, due to the square in the expression for $f_n(x)$.

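In particular, for an arbitrary $\tau \in (a, b)$ let

$$g(x|\tau) = \left(I(x \le \tau) - I(x > \tau)\right)\sqrt{f(x)/w(x)}.$$

Then $f(x) = w(x)\,g(x|\tau)^2$, hence $g(x|\tau) \in L^2(w)$. Consequently,

$$f(x) = \lim_{n\to\infty} f_n(x|\tau) \quad \text{a.e. on } (a, b)$$

as well, where

$$f_n(x|\tau) = w(x)\,\frac{\left(\sum_{j=0}^{n} \gamma_j(\tau)\varphi_j(x)\right)^2}{\sum_{j=0}^{n} \gamma_j(\tau)^2}$$

with

$$\gamma_j(\tau) = \int_a^b g(x|\tau)\varphi_j(x) w(x)\,dx = \int_a^{\tau} \varphi_j(x)\sqrt{f(x) w(x)}\,dx - \int_{\tau}^{b} \varphi_j(x)\sqrt{f(x) w(x)}\,dx.$$

If $\varphi_0(x) \equiv 1$, which is often the case, then

$$\gamma_0(\tau) = \int_a^{\tau} \sqrt{f(x) w(x)}\,dx - \int_{\tau}^{b} \sqrt{f(x) w(x)}\,dx,$$

so that we can choose

$$\gamma_0 \in \left(0, \int_a^b \sqrt{f(x) w(x)}\,dx\right).$$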

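Then the condition

$$\sum_{j=0}^{\infty} \gamma_j^2 = 1$$

can be implemented by reparametrizing the $\gamma_j$'s as

$$\gamma_0 = \frac{1}{\sqrt{1 + \sum_{m=1}^{\infty} \delta_m^2}}, \qquad \gamma_j = \frac{\delta_j}{\sqrt{1 + \sum_{m=1}^{\infty} \delta_m^2}}, \quad j \ge 1,$$

where

$$\sum_{m=1}^{\infty} \delta_m^2 < \infty.$$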

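Theorem: Given a density $w(x)$ with support $(a, b) \subseteq \mathbb{R}$, and corresponding complete orthonormal sequence $\{\varphi_m(x)\}_{m=0}^{\infty}$ in $L^2(w)$, with $\varphi_0(x) \equiv 1$, for every density function $f(x)$ with support $(a, b)$ there exist possibly uncountably many sequences $\{\delta_m\}_{m=1}^{\infty}$ satisfying

$$\sum_{m=1}^{\infty} \delta_m^2 < \infty$$

such that

$$f(x) = \lim_{n\to\infty} w(x)\,\frac{\left(1 + \sum_{k=1}^{n} \delta_k \varphi_k(x)\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2} \quad \text{a.e. on } (a, b).$$

However, if $w(x)$, $f(x)$ and the $\varphi_m(x)$'s are continuous on $(a, b)$ then the sequence $\{\delta_m\}_{m=1}^{\infty}$ is unique, and is given by

$$\delta_m = \frac{\int_a^b \varphi_m(x)\sqrt{w(x)}\sqrt{f(x)}\,dx}{\int_a^b \sqrt{w(x)}\sqrt{f(x)}\,dx}, \quad m \in \mathbb{N}.$$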

Gallant, A. R., and D. W. Nychka (1987): "Semi-Nonparametric Maximum Likelihood Estimation", Econometrica 55, 363-390,

use this approach to generalize the standard normal density to

$$f_n(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}} \times \frac{\left(1 + \sum_{k=1}^{n} \delta_k \varphi_k(x)\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2},$$

where the $\varphi_k(x)$'s are Hermite polynomials, generated recursively by

$$\sqrt{k+1}\,\varphi_{k+1}(x) - x\,\varphi_k(x) + \sqrt{k}\,\varphi_{k-1}(x) = 0, \quad k \ge 1,$$

starting from $\varphi_0(x) = 1$, $\varphi_1(x) = x$.

They call these densities semi-nonparametric (SNP) densities.

Density and distribution functions on the unit interval

Let $G(x)$ be an a priori chosen distribution function with density $g(x)$ and support

$$X = \{x \in \mathbb{R} : g(x) > 0\}.$$

Every absolutely continuous distribution function $F(x)$ with support $X$ can be written as

$$F(x) = H(G(x)),$$

where $H(u)$ is an absolutely continuous distribution function on $[0,1]$, namely

$$H(u) = F\left(G^{-1}(u)\right),$$

where $G^{-1}(u)$, $u \in [0,1]$, is the inverse of $G(x)$.

The density $f(x)$ of $F(x) = H(G(x))$ can be written as

$$f(x) = h(G(x))\,g(x),$$

where $h(u)$ is the density of $H(u)$:

$$h(u) = \frac{f\left(G^{-1}(u)\right)}{g\left(G^{-1}(u)\right)}.$$

Therefore, in modeling general density and distribution functions semi-nonparametrically, it suffices to model the density $h(u)$ and its c.d.f. $H(u)$ semi-nonparametrically.

Theorem: Given a complete orthonormal sequence $\varphi_m(u)$, with $\varphi_0(u) \equiv 1$, in the Hilbert space $L^2(0,1)$, where the functions $\varphi_m(u)$ are continuous on $(0,1)$, for every continuous density function $h(u)$ on $(0,1)$ satisfying $h(u) > 0$ on $(0,1)$ there exists a unique sequence $\{\delta_m\}_{m=1}^{\infty}$ defined by

$$\delta_m = \frac{\int_0^1 \varphi_m(u)\sqrt{h(u)}\,du}{\int_0^1 \sqrt{h(u)}\,du}$$

such that, with

$$h_n(u) = \frac{\left(1 + \sum_{k=1}^{n} \delta_k \varphi_k(u)\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2},$$

we have

$$h(u) = \lim_{n\to\infty} h_n(u) \quad \text{a.e. on } [0,1].$$

The latter is equivalent to

$$\lim_{n\to\infty} \int_0^1 |h_n(u) - h(u)|\,du = 0.$$

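The advantage of the cosine sequence

$$\varphi_m(u) = \sqrt{2}\cos(m\pi u),$$

with SNP density

$$h_n(u) = \frac{\left(1 + \sum_{k=1}^{n} \delta_k \sqrt{2}\cos(k\pi u)\right)^2}{1 + \sum_{m=1}^{n} \delta_m^2},$$

is that then the SNP distribution function

$$H_n(u) = \int_0^u h_n(z)\,dz$$

has a closed form expression:

$$H_n(u) = u + \frac{1}{1 + \sum_{m=1}^{n} \delta_m^2}\Bigg[2\sqrt{2}\sum_{k=1}^{n} \delta_k \frac{\sin(k\pi u)}{k\pi} + \sum_{m=1}^{n} \delta_m^2 \frac{\sin(2m\pi u)}{2m\pi}$$

$$\qquad + 2\sum_{k=2}^{n}\sum_{m=1}^{k-1} \delta_k\delta_m \frac{\sin((k+m)\pi u)}{(k+m)\pi} + 2\sum_{k=2}^{n}\sum_{m=1}^{k-1} \delta_k\delta_m \frac{\sin((k-m)\pi u)}{(k-m)\pi}\Bigg].$$

Of course, similar closed form expressions can be derived for the Fourier series and the sine series.

However, as has been demonstrated before, the cosine series yields the best fit.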
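The closed form of $H_n(u)$ for the cosine series can be verified against direct numerical integration of $h_n$. The sketch below is illustrative only: the function names, the list convention `delta[k-1]` for $\delta_k$, and the midpoint rule are assumptions.

```python
import math

def h_n(u, delta):
    """SNP density on [0, 1] based on the cosine series; delta[k-1] holds delta_k."""
    s = 1.0 + sum(d * math.sqrt(2.0) * math.cos((k + 1) * math.pi * u)
                  for k, d in enumerate(delta))
    return s * s / (1.0 + sum(d * d for d in delta))

def H_n(u, delta):
    """Closed-form SNP distribution function for the cosine series."""
    total = 0.0
    for k in range(1, len(delta) + 1):
        dk = delta[k - 1]
        total += 2.0 * math.sqrt(2.0) * dk * math.sin(k * math.pi * u) / (k * math.pi)
        total += dk * dk * math.sin(2 * k * math.pi * u) / (2 * k * math.pi)
        for m in range(1, k):
            cross = 2.0 * dk * delta[m - 1]
            total += cross * math.sin((k + m) * math.pi * u) / ((k + m) * math.pi)
            total += cross * math.sin((k - m) * math.pi * u) / ((k - m) * math.pi)
    return u + total / (1.0 + sum(d * d for d in delta))

def H_n_numeric(u, delta, n_grid=4000):
    """Midpoint-rule approximation of int_0^u h_n(z) dz, for comparison."""
    step = u / n_grid
    return step * sum(h_n((i + 0.5) * step, delta) for i in range(n_grid))
```

For any choice of `delta`, `H_n(0.0, delta)` is zero and `H_n(1.0, delta)` equals one up to rounding, because every sine term vanishes at the endpoints.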

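Uniform convergence of SNP functions on (0,1) based on the cosine series

Let $\varphi(u)$ be a real function on $(0,1)$ satisfying

$$\int_0^1 \varphi(u)^2\,du < \infty,$$

so that $\varphi(u) \in L^2(0,1)$. Recall that by the completeness of the cosine series we have

$$\lim_{n\to\infty} \varphi_n(u) = \varphi(u) \quad \text{a.e. on } (0,1),$$

where

$$\varphi_n(u) = \gamma_0 + \sum_{m=1}^{n} \gamma_m \sqrt{2}\cos(m\pi u)$$

is the SNP version of $\varphi(u)$, with

$$\gamma_0 = \int_0^1 \varphi(u)\,du, \qquad \gamma_m = \int_0^1 \sqrt{2}\cos(m\pi u)\varphi(u)\,du \quad \text{for } m \in \mathbb{N}.$$

The questions I will now address are:

Under what conditions do we have

$$\lim_{n\to\infty} \sup_{0\le u\le 1} |\varphi_n(u) - \varphi(u)| = 0,$$

$$\lim_{n\to\infty} \sup_{0\le u\le 1} \left|d^i\varphi_n(u)/(du)^i - d^i\varphi(u)/(du)^i\right| = 0 \quad \text{for } i = 1, 2, \dots, \ell,$$

and what are the rates of uniform convergence?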

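The case ℓ = 1

Suppose that $\varphi(u)$ is twice differentiable on $(0,1)$, and that $\varphi''(u) \in L^2(0,1)$.

Denote

$$\alpha_0 = \int_0^1 \varphi''(u)\,du = \varphi'(1) - \varphi'(0),$$

$$\alpha_m = \int_0^1 \varphi''(u)\sqrt{2}\cos(m\pi u)\,du \quad \text{for } m \in \mathbb{N},$$

$$\varphi_n''(u) = \alpha_0 + \sum_{m=1}^{n} \alpha_m \sqrt{2}\cos(m\pi u) \quad \text{for } n \in \mathbb{N}.$$

Then

$$\lim_{n\to\infty} \int_0^1 (\varphi_n''(u) - \varphi''(u))^2\,du = \lim_{n\to\infty} \sum_{m=n+1}^{\infty} \alpha_m^2 = 0.$$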

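Now the primitive of $\varphi_n''(u)$ takes the general form

$$\varphi_n'(u) = c + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{n} \frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)$$

for some constant $c$. Moreover, it can be shown that $\varphi'(0) \in \mathbb{R}$, $\varphi'(1) \in \mathbb{R}$.

Since $\varphi_n'(1) = \varphi'(1)$ and $\varphi_n'(0) = \varphi'(0)$ if $c = \varphi'(0)$, the latter is a natural choice for $c$, so that

$$\varphi_n'(u) = \varphi'(0) + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{n} \frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u) = \varphi'(0) + \int_0^u \varphi_n''(v)\,dv.$$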

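Since also

$$\varphi'(u) = \varphi'(0) + \int_0^u \varphi''(v)\,dv$$

we have

$$\limsup_{n\to\infty} \sup_{0\le u\le 1} |\varphi'(u) - \varphi_n'(u)| \le \limsup_{n\to\infty} \int_0^1 |\varphi''(v) - \varphi_n''(v)|\,dv$$

$$\le \limsup_{n\to\infty} \sqrt{\int_0^1 (\varphi''(v) - \varphi_n''(v))^2\,dv} = \sqrt{\lim_{n\to\infty} \sum_{m=n+1}^{\infty} \alpha_m^2} = 0.$$

Consequently, the series expansion

$$\varphi'(u) \equiv \varphi'(0) + (\varphi'(1) - \varphi'(0))u + \sum_{m=1}^{\infty} \frac{\alpha_m}{\pi m}\sqrt{2}\sin(m\pi u)$$

holds exactly and uniformly on $[0,1]$.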

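As to the rate of uniform convergence, note that

$$\sup_{0\le u\le 1} |\varphi'(u) - \varphi_n'(u)| \le \sqrt{2}\sum_{m=n+1}^{\infty} \frac{|\alpha_m|}{\pi m} = o(n^{-1/2}),$$

where the latter is due to the following lemma:

Lemma. $\sum_{m=1}^{\infty} \alpha_m^2 < \infty$ implies that for $c > 1/2$,

$$\sum_{m=n+1}^{\infty} m^{-c}|\alpha_m| = o\left(n^{1/2-c}\right).$$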
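The talk states the lemma without proof. One way to see why it holds is the Cauchy-Schwarz inequality combined with an integral bound on the tail sum; the following is a sketch of that argument, offered as a plausible route rather than the author's own proof.

```latex
\sum_{m=n+1}^{\infty} m^{-c}|\alpha_m|
  \le \left(\sum_{m=n+1}^{\infty} m^{-2c}\right)^{1/2}
      \left(\sum_{m=n+1}^{\infty} \alpha_m^2\right)^{1/2}
  \le \left(\int_n^{\infty} x^{-2c}\,dx\right)^{1/2}
      \left(\sum_{m=n+1}^{\infty} \alpha_m^2\right)^{1/2}
  = \frac{n^{1/2-c}}{\sqrt{2c-1}}\; o(1)
  = o\left(n^{1/2-c}\right).
```

The condition $c > 1/2$ is what makes the integral finite, and the $o(1)$ factor is the tail of the convergent series $\sum_m \alpha_m^2$.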

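Next, the primitive $\varphi(u)$ of $\varphi'(u)$ takes the general form

$$\varphi(u) = c + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty} \frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$

for some constant $c$. But then

$$\int_0^1 \varphi(v)\,dv = c + \frac{1}{2}\varphi'(0) + \frac{1}{6}(\varphi'(1) - \varphi'(0)),$$

so that

$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty} \frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$

exactly and uniformly on $[0,1]$.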

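Moreover, it can be shown that for $m \in \mathbb{N}$,

$$\int_0^1 u\sqrt{2}\cos(m\pi u)\,du = \sqrt{2}\,\frac{(-1)^m - 1}{(m\pi)^2}, \qquad \int_0^1 u^2\sqrt{2}\cos(m\pi u)\,du = 2\sqrt{2}\,\frac{(-1)^m}{(m\pi)^2},$$

hence, since

$$\lim_{n\to\infty} \sum_{m=n+1}^{\infty} m^{-2} = 0,$$

it follows that

$$u \equiv \frac{1}{2} + \sqrt{2}\sum_{m=1}^{\infty} \frac{(-1)^m - 1}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$

$$u^2 \equiv \frac{1}{3} + 2\sqrt{2}\sum_{m=1}^{\infty} \frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u).$$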


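Thus,

$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \varphi'(0)\sqrt{2}\sum_{m=1}^{\infty} \frac{(-1)^m - 1}{(m\pi)^2}\sqrt{2}\cos(m\pi u)$$

$$\qquad + (\varphi'(1) - \varphi'(0))\sqrt{2}\sum_{m=1}^{\infty} \frac{(-1)^m}{(m\pi)^2}\sqrt{2}\cos(m\pi u) - \sum_{m=1}^{\infty} \frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u).$$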


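However, since $\varphi \in L^2(0,1)$, $\varphi(u)$ also has the cosine series representation

$$\varphi(u) = \gamma_0 + \sum_{m=1}^{\infty} \gamma_m \sqrt{2}\cos(m\pi u) \quad \text{a.e. on } (0,1),$$

where

$$\gamma_0 = \int_0^1 \varphi(u)\,du, \qquad \gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du \quad \text{for } m \in \mathbb{N}.$$

Therefore we must have that for $m \in \mathbb{N}$,

$$\gamma_m = \int_0^1 \varphi(u)\sqrt{2}\cos(m\pi u)\,du = \frac{\sqrt{2}\left(\varphi'(1)(-1)^m - \varphi'(0)\right) - \alpha_m}{(m\pi)^2}.$$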

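Consequently, $\varphi(u)$ has two equivalent cosine series representations, namely

$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{\infty} \frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$

$$\equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty} \gamma_m \sqrt{2}\cos(m\pi u).$$

However, there is a difference between these two expressions, namely their rates of uniform convergence are different!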

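Denoting

$$\varphi_n(u) = \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}(\varphi'(1) - \varphi'(0)) + \varphi'(0)u + \frac{1}{2}(\varphi'(1) - \varphi'(0))u^2 - \sum_{m=1}^{n} \frac{\alpha_m}{(\pi m)^2}\sqrt{2}\cos(m\pi u)$$

we have

$$\sup_{0\le u\le 1} |\varphi_n(u) - \varphi(u)| \le \frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty} \frac{|\alpha_m|}{m^2} = o\left(n^{-3/2}\right),$$

where the latter is again due to

Lemma. $\sum_{m=1}^{\infty} \alpha_m^2 < \infty$ implies that for $c > 1/2$,

$$\sum_{m=n+1}^{\infty} m^{-c}|\alpha_m| = o\left(n^{1/2-c}\right).$$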

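On the other hand, denoting

$$\varphi_n^*(u) = \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{n} \gamma_m \sqrt{2}\cos(m\pi u),$$

we have

$$\sup_{0\le u\le 1} |\varphi_n^*(u) - \varphi(u)| \le \sqrt{2}\sum_{m=n+1}^{\infty} |\gamma_m|$$

$$\le \frac{2}{\pi^2}\left(|\varphi'(1)| + |\varphi'(0)|\right)\sum_{m=n+1}^{\infty} m^{-2} + \frac{\sqrt{2}}{\pi^2}\sum_{m=n+1}^{\infty} m^{-2}|\alpha_m|$$

$$= \begin{cases} o\left(n^{-3/2}\right) & \text{if } \varphi'(0) = \varphi'(1) = 0, \\ O(n^{-1}) & \text{if } \varphi'(0) \ne 0 \text{ or } \varphi'(1) \ne 0. \end{cases}$$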
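The difference in convergence rates between the two representations can be illustrated numerically. The sketch below uses the hypothetical test function $\varphi(u) = e^u$ (not from the talk), for which $\varphi'' = \varphi$ and hence $\alpha_m = \sqrt{2}\,(e(-1)^m - 1)/(1 + m^2\pi^2)$ in closed form; all names are assumptions made for illustration.

```python
import math

E = math.e
PHI_INT = E - 1.0     # int_0^1 exp(v) dv
D0, D1 = 1.0, E       # phi'(0) and phi'(1) for phi(u) = exp(u)

def alpha(m):
    """alpha_m = int_0^1 phi''(u) sqrt(2) cos(m pi u) du for phi = exp."""
    return math.sqrt(2.0) * (E * (-1) ** m - 1.0) / (1.0 + (m * math.pi) ** 2)

def gamma(m):
    """gamma_m from the identity linking gamma_m, alpha_m and phi'(0), phi'(1)."""
    return (math.sqrt(2.0) * (D1 * (-1) ** m - D0) - alpha(m)) / (m * math.pi) ** 2

def phi_n(u, n):
    """Representation with the quadratic polynomial part plus the alpha_m series."""
    poly = PHI_INT - 0.5 * D0 - (D1 - D0) / 6.0 + D0 * u + 0.5 * (D1 - D0) * u * u
    series = sum(alpha(m) / (m * math.pi) ** 2 * math.sqrt(2.0) * math.cos(m * math.pi * u)
                 for m in range(1, n + 1))
    return poly - series

def phi_star_n(u, n):
    """Plain truncated cosine series based on the gamma_m."""
    return PHI_INT + sum(gamma(m) * math.sqrt(2.0) * math.cos(m * math.pi * u)
                         for m in range(1, n + 1))

def sup_error(approx, n, n_grid=500):
    """Maximum absolute error on a grid that includes the endpoints 0 and 1."""
    return max(abs(math.exp(i / n_grid) - approx(i / n_grid, n))
               for i in range(n_grid + 1))
```

Because $\varphi'(0)$ and $\varphi'(1)$ are nonzero for this test function, the plain cosine truncation `phi_star_n` converges markedly more slowly in sup norm than `phi_n`, which carries the endpoint information in its polynomial part.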

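The case ℓ = 3

Now suppose that $\varphi(u)$ is four times differentiable on $(0,1)$, and that $\varphi''''(u) \in L^2(0,1)$. Denote

$$\alpha_0 = \int_0^1 \varphi''''(u)\,du = \varphi'''(1) - \varphi'''(0),$$

$$\alpha_m = \int_0^1 \sqrt{2}\cos(m\pi u)\varphi''''(u)\,du, \quad m \in \mathbb{N},$$

$$\varphi_n''''(u) = \alpha_0 + \sum_{m=1}^{n} \alpha_m \sqrt{2}\cos(m\pi u), \quad n \in \mathbb{N},$$

and recall that

$$\int_0^1 (\varphi_n''''(u) - \varphi''''(u))^2\,du = \sum_{m=n+1}^{\infty} \alpha_m^2 = o(1).$$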

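Then similar to the case ℓ = 1 we have

$$\varphi(u) \equiv P_4(u) + \sum_{m=1}^{\infty} \frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u),$$

$$\varphi'(u) \equiv P_4'(u) - \sum_{m=1}^{\infty} \frac{\alpha_m}{(m\pi)^3}\sqrt{2}\sin(m\pi u),$$

$$\varphi''(u) \equiv P_4''(u) - \sum_{m=1}^{\infty} \frac{\alpha_m}{(m\pi)^2}\sqrt{2}\cos(m\pi u),$$

$$\varphi'''(u) \equiv P_4'''(u) + \sum_{m=1}^{\infty} \frac{\alpha_m}{m\pi}\sqrt{2}\sin(m\pi u),$$

exactly and uniformly on $[0,1]$, where $P_4(u)$ is a polynomial in $u$ of order 4.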

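$$P_4(u) = \int_0^1 \varphi(v)\,dv - \frac{1}{2}\varphi'(0) - \frac{1}{6}\left(\varphi'(1) - \varphi'(0) - \frac{1}{2}\varphi'''(0) - \frac{1}{6}(\varphi'''(1) - \varphi'''(0))\right)$$

$$\qquad - \frac{1}{24}\varphi'''(0) - \frac{1}{120}(\varphi'''(1) - \varphi'''(0)) + \varphi'(0)u$$

$$\qquad + \frac{1}{2}\left(\varphi'(1) - \varphi'(0) - \frac{1}{2}\varphi'''(0) - \frac{1}{6}(\varphi'''(1) - \varphi'''(0))\right)u^2$$

$$\qquad + \frac{1}{6}\varphi'''(0)u^3 + \frac{1}{24}(\varphi'''(1) - \varphi'''(0))u^4.$$

Note that the coefficients of this polynomial depend on the tails of $\varphi'$ and $\varphi'''$ only, together with $\int_0^1 \varphi(v)\,dv$.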

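Moreover, note that

$$\int_0^1 P_4(v)\,dv = \int_0^1 \varphi(v)\,dv.$$

Note that if

$$\varphi'(1) = \varphi'(0) = 0 \quad \text{and} \quad \varphi'''(1) = \varphi'''(0) = 0$$

then

$$P_4(u) \equiv \int_0^1 \varphi(v)\,dv$$

and

$$\int_0^1 \varphi''(u)\,du = \int_0^1 \varphi''''(u)\,du = 0.$$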

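Without the conditions

$$\varphi'(1) = \varphi'(0) = 0 \quad \text{and} \quad \varphi'''(1) = \varphi'''(0) = 0$$

we can still write

$$\varphi(u) \equiv \gamma_0 + \sum_{m=1}^{\infty} \gamma_m \sqrt{2}\cos(m\pi u)$$

as well, but now

$$\gamma_m = \frac{\alpha_m}{(m\pi)^4} + \beta_m \quad \text{for } m \in \mathbb{N},$$

where

$$\beta_m = \int_0^1 P_4(u)\sqrt{2}\cos(m\pi u)\,du.$$

As in the case ℓ = 1, the rate of convergence to zero of $\sum_{m=n+1}^{\infty} |\beta_m|$ is slower than $\sum_{m=n+1}^{\infty} m^{-4}|\alpha_m| = o(n^{-7/2})$, hence

$$\sup_{0\le u\le 1} |\varphi_n(u) - \varphi(u)| \to 0$$

at a slower rate than $o(n^{-7/2})$.

Finally, note that in either case the condition $\varphi''''(u) \in L^2(0,1)$ implies

$$|\varphi(0)| + |\varphi(1)| < \infty, \qquad |\varphi''(0)| + |\varphi''(1)| < \infty.$$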

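Uniform convergence of SNP densities on [0,1] and their derivatives

Let $h(u)$ be a density on $[0,1]$ satisfying $h(u) > 0$ on $(0,1)$, and denote

$$\varphi(u) = \sqrt{h(u)}.$$

Suppose that $\varphi(u)$ is four times continuously differentiable on $(0,1)$ with $\varphi''''(u) \in L^2(0,1)$.

Moreover, suppose that

$$\varphi'(1) = \varphi'(0) = 0 \quad \text{and} \quad \varphi'''(1) = \varphi'''(0) = 0.$$

As we have seen before, we can write $\varphi(u) = \sqrt{h(u)}$ as

$$\varphi(u) \equiv \int_0^1 \varphi(v)\,dv + \sum_{m=1}^{\infty} \frac{\alpha_m}{(m\pi)^4}\sqrt{2}\cos(m\pi u)$$

$$\equiv \gamma_0 + \sum_{m=1}^{\infty} \gamma_m \sqrt{2}\cos(m\pi u), \text{ say,}$$

$$\equiv \frac{1 + \sum_{m=1}^{\infty} \delta_m \sqrt{2}\cos(m\pi u)}{\sqrt{1 + \sum_{k=1}^{\infty} \delta_k^2}},$$

where now $\sum_{m=0}^{\infty} \gamma_m^2 = 1$ and

$$\delta_m = \frac{\gamma_m}{\int_0^1 \varphi(v)\,dv} = \frac{\alpha_m}{(m\pi)^4 \int_0^1 \sqrt{h(v)}\,dv}.$$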

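Note that

$$\sum_{m=n+1}^{\infty} \delta_m^2 = O\left(\sum_{m=n+1}^{\infty} \frac{\alpha_m^2}{m^8}\right) = o(n^{-7}).$$

Moreover, with $H$ the c.d.f. of $h$ and $H_n$ the c.d.f. of $h_n$ we have

$$\sup_{0\le u\le 1} |H(u) - H_n(u)| \le \sup_{0\le u\le 1} |h(u) - h_n(u)| = o(n^{-7/2}).$$

These uniform convergence results depend crucially on the conditions that

$$\varphi''''(u) \in L^2(0,1)$$

and

$$\varphi'(1) = \varphi'(0) = 0 \quad \text{and} \quad \varphi'''(1) = \varphi'''(0) = 0,$$

where $\varphi(u) = \sqrt{h(u)}$.

So the question is how to impose these conditions.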

Tail conditions

Recall that, with $F(x)$ an absolutely continuous distribution function with density $f(x)$ and support $\mathbb{R}$, we can always write

$$F(x) = H(G(x)), \qquad f(x) = h(G(x))\,g(x),$$

where $G(x)$ is an a priori chosen absolutely continuous distribution function with density $g(x)$ and support $\mathbb{R}$, and $H(u)$ is an absolutely continuous distribution function on $[0,1]$ with density $h(u)$, given by

$$H(u) = F(G^{-1}(u)), \qquad h(u) = H'(u) = \frac{f(G^{-1}(u))}{g(G^{-1}(u))},$$

with $G^{-1}(u)$, $u \in [0,1]$, the inverse of $G(x)$.

Thus, with $\varphi(u) = \sqrt{h(u)}$, the question is now:

Under what conditions on $f$ and $g$ do we have $\varphi''''(u) \in L^2(0,1)$, $\varphi'(1) = \varphi'(0) = 0$ and $\varphi'''(1) = \varphi'''(0) = 0$?

First of all, $f$ and $g$ need to be four times differentiable on $\mathbb{R}$.

Second, a necessary condition is that $G$ is chosen such that

$$\varphi(u) = \sqrt{h(u)} = \frac{\sqrt{f(G^{-1}(u))}}{\sqrt{g(G^{-1}(u))}}$$

and its derivatives $\varphi'(u)$, $\varphi''(u)$ and $\varphi'''(u)$ are uniformly continuous on $[0,1]$, as otherwise the aforementioned uniform convergence results cannot hold.

Regarding $\varphi(u)$ itself, this requirement holds if and only if the tails of $f$ are not fatter than those of $g$, so that the ratio $f/g$ stays bounded in the tails, because then $\varphi(u)$ is bounded on $[0,1]$, which by continuity of $\varphi(u)$ on $(0,1)$ implies uniform continuity on $[0,1]$.

The choice of G

Given that

$$\int_{-\infty}^{\infty} |x| f(x)\,dx < \infty,$$

which implies that

$$\lim_{|x|\to\infty} x^2 f(x) = 0,$$

this tail condition holds if we choose for $G(x)$ the c.d.f. of the standard Cauchy distribution:

$$G(x) = 0.5 + \pi^{-1}\arctan(x), \qquad g(x) = \frac{1}{\pi(1 + x^2)}, \qquad G^{-1}(u) = \tan(\pi(u - 0.5)).$$

Then

$$h(u) = \pi\left(1 + \tan^2(\pi(u - 0.5))\right) f(\tan(\pi(u - 0.5))),$$

which satisfies

$$h(0) = \lim_{u\downarrow 0} h(u) = 0, \qquad h(1) = \lim_{u\uparrow 1} h(u) = 0.$$
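The Cauchy transformation is easy to try out. The sketch below takes the standard normal density as a hypothetical example of $f$ (an assumption for illustration; any density with a finite first absolute moment could be substituted) and checks that the implied $h$ integrates to one and vanishes near the endpoints.

```python
import math

def f_normal(x):
    """Standard normal density, used here only as an example of f."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def G(x):
    """Standard Cauchy c.d.f."""
    return 0.5 + math.atan(x) / math.pi

def g(x):
    """Standard Cauchy density."""
    return 1.0 / (math.pi * (1.0 + x * x))

def G_inv(u):
    """Inverse of the standard Cauchy c.d.f. on (0, 1)."""
    return math.tan(math.pi * (u - 0.5))

def h(u, f=f_normal):
    """h(u) = pi (1 + tan^2(pi(u - 0.5))) f(tan(pi(u - 0.5)))."""
    x = G_inv(u)
    return math.pi * (1.0 + x * x) * f(x)

def integral_of_h(n_grid=4000):
    """Midpoint-rule approximation of int_0^1 h(u) du."""
    step = 1.0 / n_grid
    return step * sum(h((i + 0.5) * step) for i in range(n_grid))
```

The identity $f(x) = h(G(x))g(x)$ can be checked pointwise by evaluating `h(G(x)) * g(x)` and comparing with `f_normal(x)`.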

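Tail conditions for the derivatives

Now

$$\varphi(u) = \sqrt{h(u)} = \sqrt{\pi}\left(1 + \tan^2(\pi(u - 0.5))\right)^{1/2} \eta(\tan(\pi(u - 0.5))),$$

where $\eta(x) = \sqrt{f(x)}$.

It can be shown that

$$\varphi'''(0) = \varphi'''(1) = 0, \qquad \varphi'(0) = \varphi'(1) = 0$$

if

$$\lim_{|x|\to\infty} x^4\eta(x) = 0, \quad \lim_{|x|\to\infty} x^5\eta'(x) = 0, \quad \lim_{|x|\to\infty} x^6\eta''(x) = 0, \quad \lim_{|x|\to\infty} x^7\eta'''(x) = 0.$$

The latter conditions also imply

$$\varphi(0) = \varphi(1) = 0, \qquad \varphi''(0) = \varphi''(1) = 0.$$

Sufficient conditions for

$$\lim_{|x|\to\infty} x^4\eta(x) = 0, \quad \lim_{|x|\to\infty} x^5\eta'(x) = 0, \quad \lim_{|x|\to\infty} x^6\eta''(x) = 0, \quad \lim_{|x|\to\infty} x^7\eta'''(x) = 0$$

are that

Assumption.
(a) $\int_{-\infty}^{\infty} |x|^3\eta(x)\,dx < \infty$, where $\eta(x) = \sqrt{f(x)}$;
(b) The set $\{x \in \mathbb{R} : \eta'(x) = 0 \text{ or } \eta''(x) = 0 \text{ or } \eta'''(x) = 0\}$ is finite.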

Finally, we have to set forth conditions such that $\varphi''''(u) \in L^2(0,1)$, which is the case if

$$\int_0^1 \varphi''''(u)^2\,du < \infty.$$

Sufficient conditions for the latter are that $\varphi''''(u)$ is continuous on $(0,1)$ and

$$\max\left(\lim_{u\downarrow 0} |\varphi''''(u)|, \lim_{u\uparrow 1} |\varphi''''(u)|\right) < \infty,$$

because then $\varphi''''(u)$ is uniformly continuous and thus bounded on $[0,1]$.

It can be shown that the latter condition is implied by the conditions

$$\lim_{|x|\to\infty} |x|^5\eta(x) < \infty, \quad \lim_{|x|\to\infty} x^6|\eta'(x)| < \infty, \quad \lim_{|x|\to\infty} |x^7\eta''(x)| < \infty,$$

$$\lim_{|x|\to\infty} x^8|\eta'''(x)| < \infty, \quad \lim_{|x|\to\infty} |x^9\eta''''(x)| < \infty.$$

However, it is difficult, if not impossible, to break down these conditions into more primitive general conditions.

On the other hand, suppose that

Assumption. In addition to the previous assumptions, the following conditions hold:
(a) $\int_{-\infty}^{\infty} x^4\eta(x)\,dx < \infty$, where $\eta(x) = \sqrt{f(x)}$;
(b) The set $\{x \in \mathbb{R} : \eta''''(x) = 0\}$ is finite.

Then

$$\varphi''''(0) = \varphi''''(1) = 0,$$

hence $\varphi''''(u)$ is uniformly continuous on $[0,1]$ and thus bounded on $[0,1]$, so that $\varphi''''(u) \in L^2(0,1)$.

Summary

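Let $f(x)$ be a four times continuously differentiable density with support $\mathbb{R}$, satisfying the following conditions:

(1) $\int_{-\infty}^{\infty} x^4\sqrt{f(x)}\,dx < \infty$;

(2) Denoting $\eta(x) = \sqrt{f(x)}$, the sets $\{x \in \mathbb{R} : \eta'(x) = 0\}$, $\{x \in \mathbb{R} : \eta''(x) = 0\}$, $\{x \in \mathbb{R} : \eta'''(x) = 0\}$, $\{x \in \mathbb{R} : \eta''''(x) = 0\}$ are finite.

Then $f(x)$ can be written as

$$f(x) = \frac{h\left(0.5 + \pi^{-1}\arctan(x)\right)}{\pi(1 + x^2)},$$

where

$$h(u) \equiv \pi\left(1 + \tan^2(\pi(u - 0.5))\right) f(\tan(\pi(u - 0.5))) \equiv \frac{\left(1 + \sum_{m=1}^{\infty} \delta_m \sqrt{2}\cos(m\pi u)\right)^2}{1 + \sum_{k=1}^{\infty} \delta_k^2}$$

uniformly on $[0,1]$, with

$$\delta_m = \frac{\int_0^1 \sqrt{h(u)}\sqrt{2}\cos(m\pi u)\,du}{\int_0^1 \sqrt{h(u)}\,du}, \quad m \in \mathbb{N}.$$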

The infinite-dimensional parameter space

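As a by-product of these results, it follows that the infinite-dimensional parameter $\delta = \{\delta_m\}_{m=1}^{\infty}$ involved is contained in the parameter space

$$\Delta_3 = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| < \infty\right\}.$$

Endowing this space with the norm

$$\|\delta\|_3 = \sum_{m=1}^{\infty} m^3|\delta_m|$$

and associated metric, it becomes a Banach space, i.e., every Cauchy sequence in $\Delta_3$ has a limit in $\Delta_3$.

Moreover, denoting for some $M \in (0, \infty)$,

$$\Delta_3(M) = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| \le M\right\},$$

endowed with the same norm $\|\delta\|_3$ and associated metric as before, the metric space $\Delta_3(M)$ is compact.

The latter follows from the following more general result.

Theorem. Denote

$$\Delta_\ell(M) = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^{\ell}|\delta_m| \le M\right\}$$

for some $\ell > 0.5$ and $M \in (0, \infty)$. Endow this space with the norm

$$\|\delta\|_\ell = \|\{\delta_m\}_{m=1}^{\infty}\|_\ell = \sum_{m=1}^{\infty} m^{\ell}|\delta_m|$$

and associated metric. Then $\Delta_\ell(M)$ is compact.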

An application to the SNP fractional index regression model

Let $Y$ be a fractional dependent variable, i.e., $\Pr[Y \in (0,1)] = 1$, and let $X \in \mathbb{R}^d$ be a vector of covariates. The SNP fractional index regression model takes the form

$$E[Y|X] = F_0(\theta_0' X),$$

where $F_0$ is an unknown distribution function on $\mathbb{R}$.

Note that $\theta_0$ and $F_0$ are not unique because for any constant $c \ne 0$,

$$E[Y|\theta_0' X] = E[Y|c\,\theta_0' X] \quad \text{a.s.},$$

so that without loss of generality we may normalize $\theta_0$ to $\|\theta_0\| = 1$, for example, or set one of the components of $\theta_0$ to 1 or −1.

Similarly, we cannot allow a constant component in $X$, because for any constant $c \ne 0$,

$$E[Y|\theta_0' X] = E[Y|c + \theta_0' X] \quad \text{a.s.}$$

Moreover, if all the components of $X$ are discrete then there exist multiple distinct $\theta$'s, and possibly uncountably many $\theta$'s, such that $E[Y|X] = E[Y|\theta' X]$ a.s., even after normalization. See

• Bierens, H. J., and J. Hartog (1988): "Non-linear Regression with Discrete Explanatory Variables, with an Application to the Earnings Function", Journal of Econometrics 38, 269-299.

Thus, at least one component of $X$ needs to be continuously distributed with support $\mathbb{R}$, and such a component needs to have a nonzero coefficient, which without loss of generality may be normalized to 1 or −1, in order for $\theta_0$ and $F_0$ to be identifiable.

In particular, in the case $d \ge 2$, partition $X$ as $X = (X_1, X_2')'$ and normalize $\theta_0$ as $\theta_0 = (1, \beta_0')'$, where conditional on $X_2$, $X_1$ is absolutely continuously distributed with support $\mathbb{R}$. Then

$$E[Y|X] = \begin{cases} F_0(X_1 + \beta_0' X_2) & \text{if } X = (X_1, X_2')' \in \mathbb{R}^d,\ d \ge 2, \\ F_0(X) & \text{if } X \in \mathbb{R}, \end{cases}$$

where $\Pr[Y \in (0,1)] = 1$, and $F_0$ is an unknown absolutely continuous c.d.f. on $\mathbb{R}$ with continuous density $f_0$ and support $\mathbb{R}$.

The model involved assumes that $X_1$ has a positive effect on $E[Y|X]$. If not, simply replace $X_1$ by $-X_1$, and similarly in the case $X = X_1$.

Semi-nonparametric identification

Theorem. The c.d.f. $F_0$, and the parameter vector $\beta_0$ in the case $d \ge 2$, are semi-nonparametrically identified under the following conditions:

In the case $d \ge 2$, the conditional distribution of $X_1$ given $X_2$ in the partition $X = (X_1, X_2')'$ is absolutely continuous with support $\mathbb{R}$. Moreover, $E[\|X_2\|^2] < \infty$ and $\det(\mathrm{Var}(X_2)) > 0$.

In the case $d = 1$, $X$ itself has an absolutely continuous distribution with support $\mathbb{R}$.

SNP identification in the case $d \ge 2$ means that if there exists a parameter vector $\beta \in \mathbb{R}^{d-1}$ and an absolutely continuous c.d.f. $F$ with support $\mathbb{R}$ such that

$$F(X_1 + \beta' X_2) = F_0(X_1 + \beta_0' X_2) \quad \text{a.s.}$$

then under the conditions in this theorem,

$$\beta = \beta_0 \quad \text{and} \quad F(x) = F_0(x) \text{ for all } x \in \mathbb{R}.$$

SNP modeling of F0 and its derivatives

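Now suppose that the aforementioned conditions for uniform convergence are satisfied.

Denote

$$F(x|\delta) = H(0.5 + \pi^{-1}\arctan(x)\,|\,\delta), \qquad f(x|\delta) = \frac{h(0.5 + \pi^{-1}\arctan(x)\,|\,\delta)}{\pi(1 + x^2)},$$

$$\delta \in \Delta_3 = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| < \infty\right\},$$

where

$$h(u|\delta) = \frac{\left(1 + \sum_{m=1}^{\infty} \delta_m \sqrt{2}\cos(m\pi u)\right)^2}{1 + \sum_{m=1}^{\infty} \delta_m^2}, \qquad H(u|\delta) = \int_0^u h(v|\delta)\,dv.$$

Then there exists a unique $\delta^0 \in \Delta_3$ such that

$$F_0(x) \equiv F(x|\delta^0), \quad f_0(x) \equiv f(x|\delta^0), \quad f_0'(x) \equiv f'(x|\delta^0), \quad f_0''(x) \equiv f''(x|\delta^0), \quad f_0'''(x) \equiv f'''(x|\delta^0).$$

Moreover, we may assume that

$$\delta^0 \in \Delta_3(M) = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| \le M\right\},$$

provided that $M > \|\delta^0\|_3$.

Recall that $\Delta_3(M)$ is compact.

Next, let $\pi_n$ be the truncation operator, i.e., $\pi_n$ applied to $\delta = \{\delta_m\}_{m=1}^{\infty}$ as $\pi_n\delta$ replaces all the $\delta_m$'s for $m > n$ by zeros.

Then the previous uniform convergence results read:

$$\sup_{x\in\mathbb{R}} \left|F_0(x) - F(x|\pi_n\delta^0)\right| = o(n^{-7/2}),$$

$$\sup_{x\in\mathbb{R}} \left|f_0(x) - f(x|\pi_n\delta^0)\right| = o(n^{-7/2}),$$

$$\sup_{x\in\mathbb{R}} \left|f_0'(x) - f'(x|\pi_n\delta^0)\right| = o(n^{-5/2}),$$

$$\sup_{x\in\mathbb{R}} \left|f_0''(x) - f''(x|\pi_n\delta^0)\right| = o(n^{-3/2}),$$

$$\sup_{x\in\mathbb{R}} \left|f_0'''(x) - f'''(x|\pi_n\delta^0)\right| = o(n^{-1/2}).$$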

Sieve nonlinear least squares estimation

Consistency

Given a random sample $\{(Y_j, X_{1,j}, X_{2,j})\}_{j=1}^{N}$ from $(Y, X_1, X_2)$ in the case $d \ge 2$, the NLLS objective function is

$$\widehat{Q}_N(\beta, \delta) = \frac{1}{N}\sum_{j=1}^{N} \left(Y_j - F(X_{1,j} + \beta' X_{2,j}\,|\,\delta)\right)^2,$$

where $\beta \in B \subset \mathbb{R}^{d-1}$, with $B$ a compact set w.r.t. the Euclidean metric $\|\beta_1 - \beta_2\|$, containing $\beta_0$ in its interior, and

$$\delta \in \Delta_3(M) = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{\infty} m^3|\delta_m| \le M\right\}$$

with

$$M > \|\delta^0\|_3 = \sum_{m=1}^{\infty} m^3|\delta_{0,m}|.$$

Recall that $\Delta_3(M)$ is compact w.r.t. the metric $\|\delta_1 - \delta_2\|_3$, hence $B \times \Delta_3(M)$ is compact w.r.t. the metric $\|\beta_1 - \beta_2\| + \|\delta_1 - \delta_2\|_3$, for example.

Since $\delta^0$ is unknown, the choice of $M > \|\delta^0\|_3$ is a matter of guess-work, of course. Therefore, choose $M$ very large, for example, $M = 100000$.

Similarly, since $\beta_0$ is unknown, the choice of $B$ is a matter of guess-work as well. Therefore, let $B = [-K, K]^{d-1}$ for some large $K > 0$, say $K = 1000$.
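To make the NLLS objective function concrete, the following sketch evaluates it for a finite-dimensional $\delta$, using the standard Cauchy c.d.f. for $G$ and the closed-form cosine-series c.d.f. for $H$. The data, the parameter values `beta_0` and `delta_0`, and all function names are hypothetical and serve illustration only; no estimation or optimization is performed.

```python
import math

def H(u, delta):
    """Closed-form SNP c.d.f. on [0, 1] for the cosine series; delta[k-1] is delta_k."""
    total = 0.0
    for k in range(1, len(delta) + 1):
        dk = delta[k - 1]
        total += 2.0 * math.sqrt(2.0) * dk * math.sin(k * math.pi * u) / (k * math.pi)
        total += dk * dk * math.sin(2 * k * math.pi * u) / (2 * k * math.pi)
        for m in range(1, k):
            cross = 2.0 * dk * delta[m - 1]
            total += cross * math.sin((k + m) * math.pi * u) / ((k + m) * math.pi)
            total += cross * math.sin((k - m) * math.pi * u) / ((k - m) * math.pi)
    return u + total / (1.0 + sum(d * d for d in delta))

def F(x, delta):
    """F(x | delta) = H(0.5 + arctan(x)/pi | delta)."""
    return H(0.5 + math.atan(x) / math.pi, delta)

def Q_N(beta, delta, data):
    """Average squared residual; data is a list of (y, x1, x2) with x2 a list like beta."""
    total = 0.0
    for y, x1, x2 in data:
        index = x1 + sum(b * z for b, z in zip(beta, x2))
        total += (y - F(index, delta)) ** 2
    return total / len(data)

# Hypothetical noiseless sample generated from beta_0 = (0.5,), delta_0 = (0.2, -0.1).
beta_0, delta_0 = [0.5], [0.2, -0.1]
covariates = [(-2.0, [1.0]), (-0.5, [0.3]), (0.0, [-1.2]), (0.8, [0.4]), (2.5, [-0.7])]
data = [(F(x1 + beta_0[0] * x2[0], delta_0), x1, x2) for x1, x2 in covariates]
```

On this noiseless sample the objective is zero at `(beta_0, delta_0)` and positive at other parameter values; with real data one would minimize `Q_N` over $\beta$ and the first $n$ elements of $\delta$, subject to the bound on $\sum_m m^3|\delta_m|$.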

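By the compactness of $B \times \Delta_3(M)$ it follows similar to the standard uniform strong law of large numbers that

$$\sup_{(\beta,\delta)\in B\times\Delta_3(M)} \left|\widehat{Q}_N(\beta, \delta) - Q(\beta, \delta)\right| \xrightarrow{a.s.} 0,$$

where

$$Q(\beta, \delta) = E[\widehat{Q}_N(\beta, \delta)]$$

is continuous on $B \times \Delta_3(M)$, and

$$(\beta_0, \delta^0) = \arg\min_{(\beta,\delta)\in B\times\Delta_3(M)} Q(\beta, \delta),$$

which under the aforementioned identification conditions is unique.

However,

$$(\widehat{\beta}, \widehat{\delta}) = \arg\min_{(\beta,\delta)\in B\times\Delta_3(M)} \widehat{Q}_N(\beta, \delta)$$

is not uniquely defined, and in general none of these solutions are consistent.

Therefore, the standard approach is sieve estimation, as follows.

Denote for $n \in \mathbb{N}$,

$$\Delta_{3,n}(M) = \left\{\delta = \{\delta_m\}_{m=1}^{\infty} : \sum_{m=1}^{n} m^3|\delta_m| \le M,\ \delta_m = 0 \text{ for } m > n\right\},$$

which is called a sieve space, and let

$$(\widehat{\beta}_n, \widehat{\delta}_n) = \arg\min_{(\beta,\delta)\in B\times\Delta_{3,n}(M)} \widehat{Q}_N(\beta, \delta).$$

Then for any subsequence $n_N$ of $N$ satisfying

$$\lim_{N\to\infty} n_N = \infty, \qquad \lim_{N\to\infty} n_N/N = 0,$$

it can be shown that

$$\widehat{\beta}_{n_N} \xrightarrow{a.s.} \beta_0, \qquad \|\widehat{\delta}_{n_N} - \delta^0\|_3 \xrightarrow{a.s.} 0,$$

hence

$$\sup_{x\in\mathbb{R}} \left|F_0(x) - F(x|\widehat{\delta}_{n_N})\right| \xrightarrow{a.s.} 0,$$

$$\sup_{x\in\mathbb{R}} \left|f_0(x) - f(x|\widehat{\delta}_{n_N})\right| \xrightarrow{a.s.} 0,$$

$$\sup_{x\in\mathbb{R}} \left|f_0'(x) - f'(x|\widehat{\delta}_{n_N})\right| \xrightarrow{a.s.} 0.$$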

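Asymptotic normality

The next step is to show that

$$\sqrt{N}(\widehat{\beta}_{n_N} - \beta_0) \xrightarrow{d} N_{d-1}(0, \Sigma)$$

for some asymptotic variance matrix $\Sigma$, by verifying the conditions in:

• Bierens, H. J. (2014): "Consistency and Asymptotic Normality of Sieve ML Estimators Under Low-Level Conditions", Econometric Theory 30, 1021-1076,

in particular Lemma 5.1 and Theorem 6.2 in this paper.

Due to the uniform convergence results

$$\sup_{x\in\mathbb{R}} \left|f_0(x) - f(x|\pi_n\delta^0)\right| = o(n^{-7/2}),$$

$$\sup_{x\in\mathbb{R}} \left|f_0'(x) - f'(x|\pi_n\delta^0)\right| = o(n^{-5/2}),$$

$$\sup_{x\in\mathbb{R}} \left|f_0''(x) - f''(x|\pi_n\delta^0)\right| = o(n^{-3/2}),$$

these conditions are easy to verify, with one exception.

In particular, denote

$$\xi = \{\xi_m\}_{m=1}^{\infty} = (\beta, \delta), \qquad \xi^0 = \{\xi_{0,m}\}_{m=1}^{\infty} = (\beta_0, \delta^0),$$

$$\rho(\xi) = \left(Y - F(X_1 + \beta' X_2\,|\,\delta)\right)^2.$$

Moreover, denote for $k, m \in \mathbb{N}$ and $n > d - 1$,

$$\nabla_k\rho(\xi^0) = \partial\rho(\xi)/\partial\xi_k\big|_{\xi=\xi^0}, \qquad \nabla_{k,m}\rho(\xi^0) = \partial^2\rho(\xi)/(\partial\xi_k\partial\xi_m)\big|_{\xi=\xi^0},$$

and

$$B_n = \begin{pmatrix} E\left[\nabla_{1,1}\rho(\xi^0)\right] & \cdots & E\left[\nabla_{1,n}\rho(\xi^0)\right] \\ \vdots & \ddots & \vdots \\ E\left[\nabla_{n,1}\rho(\xi^0)\right] & \cdots & E\left[\nabla_{n,n}\rho(\xi^0)\right] \end{pmatrix}, \qquad V_n = \mathrm{Var}\left(\left(\nabla_1\rho(\xi^0), \nabla_2\rho(\xi^0), \dots, \nabla_n\rho(\xi^0)\right)'\right).$$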

The crucial asymptotic normality conditions in Bierens (2014) are now:

• The sieve order $n = n_N$ is chosen such that

$$\lim_{N\to\infty} \sqrt{N}\sum_{m=n_N+1}^{\infty} |\xi_{0,m}| = 0,$$

which is the case if $n_N \propto N^{1/6}$ or faster, due to $\sum_{m=1}^{\infty} m^3|\xi_{0,m}| < \infty$.

• For each $n \in \mathbb{N}$ the matrices $B_n$ and $V_n$ are nonsingular, so that for $n > d - 1$,

$$\Sigma_n = (I_{d-1}, O_{d-1,n-d+1})\,B_n^{-1}V_nB_n^{-1}\begin{pmatrix} I_{d-1} \\ O_{n-d+1,d-1} \end{pmatrix}$$

is nonsingular.

• $\lim_{n\to\infty} \Sigma_n = \Sigma$ exists and is nonsingular.

Apart from the condition that $\lim_{n\to\infty} \Sigma_n = \Sigma$ exists and is nonsingular, the other conditions are straightforwardly verifiable for the SNP model under review.

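Under these conditions it follows from Theorem 6.2 in Bierens (2014) that

$$\sqrt{N}(\widehat{\beta}_{n_N} - \beta_0) \xrightarrow{d} N_{d-1}(0, \Sigma),$$

where $\Sigma = \lim_{n\to\infty} \Sigma_n$ with

$$\Sigma_n = (I_{d-1}, O_{d-1,n-d+1})\,B_n^{-1}V_nB_n^{-1}\begin{pmatrix} I_{d-1} \\ O_{n-d+1,d-1} \end{pmatrix}.$$

Note that if $\delta^0$ is finite-dimensional, so that $\delta^0 = \pi_n\delta^0$ for some fixed $n \in \mathbb{N}$, then the NLLS problem involved becomes fully parametric, hence

$$\sqrt{N}(\widehat{\beta}_n - \beta_0) \xrightarrow{d} N_{d-1}(0, \Sigma_n).$$

In this case the variance matrix $\Sigma_n$ can be consistently estimated by

$$\widehat{\Sigma}_n = (I_{d-1}, O_{d-1,n-d+1})\,\widehat{B}_n^{-1}\widehat{V}_n\widehat{B}_n^{-1}\begin{pmatrix} I_{d-1} \\ O_{n-d+1,d-1} \end{pmatrix},$$

where $\widehat{B}_n$ and $\widehat{V}_n$ are the usual consistent estimators of $B_n$ and $V_n$, respectively.

Then under the aforementioned conditions it can be shown that

$$\operatorname{plim}_{N\to\infty} \widehat{\Sigma}_{n_N} = \Sigma.$$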
