kernel estimators of density in spaces of an arbitrary nature



Journal of Mathematical Sciences, Vol. 103, No. 4, 2001

KERNEL ESTIMATORS OF DENSITY IN SPACES OF AN ARBITRARY NATURE

A. I. Orlov (Moscow, Russia) UDC 519.2

Introduction

Progress in applied research has made it necessary to treat various objects of a nonnumerical nature as statistical data. For this reason, methods for the statistical analysis of nonnumerical data [1] have been actively developed in our country since the mid-1970s. Examples of such objects are elements of spaces that are not linear (vector) spaces: binary relations (rankings, partitions, tolerances, etc.), sets, fuzzy sets, results of measurements on scales other than the absolute scale, and, as a generalization, elements of spaces of a general nature. The classical problems of statistics can be posed for nonnumerical data as well: data description (including classification), estimation (of parameters, characteristics, distribution densities, regression relationships, etc.), and hypothesis testing.

The mathematical apparatus of the statistics of nonnumerical data is based not on the linearity of the space but on the use of symmetrics and metrics. Therefore, it differs substantially from the classical one.

In the statistics of nonnumerical data, one can distinguish [1] the general theory from the statistics in specific spaces of a nonnumerical nature (for example, the statistics of rankings). The general theory has two main topics. The first concerns mean values and the asymptotic behavior of solutions of extremal statistical problems. The second deals with nonparametric estimators of density. This paper is devoted to the second topic.

Nonparametric Estimators of Density

Let $(\chi, \alpha)$ be a measurable space, and let $\upsilon$ and $\mu$ be two $\sigma$-finite measures on $(\chi, \alpha)$ such that $\upsilon$ is absolutely continuous with respect to $\mu$, that is, $\mu(A) = 0$ implies $\upsilon(A) = 0$ for $A \in \alpha$. In this case, there exists a nonnegative measurable function $f(x)$ on $(\chi, \alpha)$ such that

\[ \upsilon(A) = \int_A f(x)\,d\mu \]

for any $A \in \alpha$. The function $f(x)$ is called the Radon–Nikodym derivative of the measure $\upsilon$ with respect to the measure $\mu$, and if $\upsilon$ is a probability measure, then $f(x)$ is the density of $\upsilon$ with respect to $\mu$. Assume that a measure $\mu$ is given on a space of objects of a nonnumerical nature and that the measure $\upsilon$ corresponds to the distribution $P$ of a random element $\xi$ with values in the measurable space $(\chi, \alpha)$, that is,

\[ \upsilon(A) = P(\xi \in A). \]

If $\chi$ is a space consisting of a finite number of points, then it is reasonable to take as $\mu$ the counting measure (assigning unit weight to each point of the space), $\mu(A) = \operatorname{card}(A)$, or the normalized counting measure

\[ \mu_0(A) = \frac{\operatorname{card}(A)}{\operatorname{card}(\chi)}. \]

For the counting measure, the value of the density at a point $x \in \chi$ coincides with the probability of falling into this point, that is, $f(x) = P(\xi = x)$.
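As a small illustration of densities with respect to the two measures on a finite space, the following sketch computes the empirical analog of the density from a sample. The function name `empirical_density` and the toy data are assumptions made for this example only. With the normalized counting measure, the density is the probability of a point multiplied by the number of points in the space, so that its integral with respect to that measure is still 1.

```python
from collections import Counter
from fractions import Fraction


def empirical_density(sample, space, normalized=False):
    """Empirical density of a sample on a finite space.

    With the counting measure the density at x is the relative frequency of x.
    With the normalized counting measure it is that frequency times card(space).
    """
    counts = Counter(sample)
    n = len(sample)
    scale = len(space) if normalized else 1
    return {x: Fraction(counts[x], n) * scale for x in space}


# Hypothetical toy data: a four-point space and a sample of size four.
space = ["a", "b", "c", "d"]
sample = ["a", "a", "b", "d"]

density_counting = empirical_density(sample, space)
density_normalized = empirical_density(sample, space, normalized=True)

# Integrals with respect to the corresponding measures.
integral_counting = sum(density_counting.values())
integral_normalized = sum(density_normalized.values()) / len(space)
```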

We will not attempt to consider the entire variety of classification techniques in the statistics of nonnumerical data (see, e.g., [2]) but will concentrate on those methods which use distribution densities and their estimates. With known distribution densities of the classes, it is possible to solve basic classification problems such as the problems of cluster analysis or the problems of diagnostics. In the problems of cluster analysis, one may find the modes of the density and use them as the centers of clusters or as the initial points of iterative procedures of the "nuées dynamiques" type. In the problems of diagnostics (discriminant analysis, supervised pattern recognition), decisions on the classification of objects can be based on the ratios of the densities corresponding to the classes. With unknown densities, it seems reasonable to use their consistent estimators.

Translated from Statisticheskie Metody Otsenivaniya i Proverki Gipotez, pp. 68–75, Perm, Russia, 1996.

1072-3374/01/1034-0470$25.00 © 2001 Plenum Publishing Corporation

Some methods of distribution-density estimation in spaces of an arbitrary nature were proposed and preliminarily investigated in [3]. In particular, in the problems of classification of nonnumerical objects we suggest using nonparametric kernel density estimators of Parzen–Rosenblatt type (this type of estimator, as well as its name, was introduced in [3]):

\[ f_n(x) = \frac{1}{v_n(h_n, x)} \sum_{1 \le i \le n} K\left(\frac{d(x_i, x)}{h_n}\right), \]

where $K : \mathbb{R}^1_+ \to \mathbb{R}^1$ is the kernel function, $x_1, x_2, \ldots, x_n \in \chi$ is the sample from which the density is estimated, $d(x_i, x)$ is the distance between the sample element $x_i$ and the point $x$ at which the density is estimated, the sequence of smoothness coefficients $h_n$ is such that $h_n \to 0$ and $n h_n \to \infty$, and $v_n(h_n, x)$ is the normalizing factor providing the condition

\[ \int_\chi f_n(x)\,d\mu = 1. \]

Kernel (Parzen–Rosenblatt-type) estimators are special cases of linear estimators [3]. From the theoretical point of view, they are distinguished by the fact that results of the same type as in the classical one-dimensional case ($\chi = \mathbb{R}^1$) can be obtained for them, although, of course, with a completely different mathematical apparatus.
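A minimal sketch of such an estimator on a finite space with the counting measure may help fix ideas. Everything named here (`kernel_density`, `triangular_kernel`, `abs_dist`, the toy space and sample) is hypothetical. As a simplifying assumption, the normalizing factor is taken as a single constant that makes the estimate sum to 1 over the space; the paper allows a factor $v_n(h_n, x)$ of a more general form.

```python
def triangular_kernel(u):
    """Illustrative kernel on [0, inf): K(u) = max(0, 1 - u)."""
    return max(0.0, 1.0 - u)


def abs_dist(a, b):
    """Illustrative distance on a set of integers."""
    return abs(a - b)


def kernel_density(x, sample, space, dist, kernel, h):
    """Kernel estimate at x with respect to the counting measure on a finite space.

    Assumption of this sketch: the normalizer is one constant chosen so that
    the estimate sums to 1 over the whole space.
    """
    def raw(point):
        return sum(kernel(dist(xi, point) / h) for xi in sample)

    normalizer = sum(raw(y) for y in space)
    if normalizer == 0:
        raise ValueError("kernel vanishes on the whole space; increase h")
    return raw(x) / normalizer


# Hypothetical toy data: the space {0, ..., 9} and a sample of four points.
space = list(range(10))
sample = [2, 3, 3, 7]
density = {y: kernel_density(y, sample, space, abs_dist, triangular_kernel, 2.0)
           for y in space}
```

Any distance function on the space can be passed in place of `abs_dist`, which is the point of the construction: only distances to the sample elements are used, never linear operations on the objects.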

Consistency of Kernel Estimators

One of the basic ideas is to make the distance $d$ agree with the measure $\mu$. Namely, consider the balls of radius $t \ge 0$,

\[ L_t(x) = \{y \in \chi : d(y, x) \le t\}, \]

and their measures

\[ F_x(t) = \mu(L_t(x)). \]

Assume that for a fixed $x$ the function $F_x(t)$ is continuous and strictly increasing in $t$. Introduce the function

\[ d_1(x, y) = F_x(d(x, y)). \]

This is a monotonic transformation of the distance, and therefore $d_1(x, y)$ is a metric or a symmetric (that is, the triangle inequality may be violated) which, like $d(x, y)$, can be regarded as a measure of the closeness of $x$ and $y$.

Introduce $L^1_t(x) = \{y \in \chi : d_1(y, x) \le t\}$. Since the function $F_x^{-1}(u)$ is uniquely defined, we have

\[ L^1_t(x) = \{y \in \chi : d(y, x) \le F_x^{-1}(t)\} = L_T(x), \]

where $T = F_x^{-1}(t)$, and hence

\[ F^1_x(t) = \mu(L^1_t(x)) = \mu(L_T(x)) = F_x(F_x^{-1}(t)) = t. \]

The passage from $d$ to $d_1$ resembles the classical transformation $\eta = F(\xi)$ used by N. V. Smirnov, which transforms a random variable $\xi$ with a continuous distribution function $F(x)$ into a random variable $\eta$ uniformly distributed on $[0, 1]$. Both transformations considerably simplify the subsequent considerations.

The transformation $d_1 = F_x(d)$ depends on the point $x$; this does not affect the subsequent reasoning, since we restrict ourselves to pointwise convergence.

The function $d_1(x, y)$ for which the measure of a ball of radius $t$ is equal to $t$ is called (see [3]) a natural measure of distinction or a natural distance. In the case of the space $\mathbb{R}^k$ and the Euclidean distance $d$, we have

\[ d_1(x, y) = c_k d^k(x, y), \]

where $c_k$ is the volume of a ball of unit radius in $\mathbb{R}^k$.

Since we can write

\[ K\left[\frac{d(X_i, x)}{h_n}\right] = K_1\left[\frac{d_1(X_i, x)}{h_n}\right], \]

where

\[ K_1(u) = K\left[\frac{F_x^{-1}(u h_n)}{h_n}\right], \]

the passage from $d$ to $d_1$ corresponds to the passage from $K$ to $K_1$. The benefit of this passage is that the statements gain a simpler formulation.
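For the Euclidean case, the natural distance can be computed directly. The sketch below assumes Lebesgue measure on $\mathbb{R}^k$ and uses the standard closed form $c_k = \pi^{k/2} / \Gamma(k/2 + 1)$ for the volume of the unit ball, which is not stated in the paper. The function names are hypothetical.

```python
import math


def unit_ball_volume(k):
    """Volume c_k of the unit ball in R^k: pi^(k/2) / Gamma(k/2 + 1)."""
    return math.pi ** (k / 2) / math.gamma(k / 2 + 1)


def natural_distance(x, y):
    """Natural distance d_1(x, y) = c_k * d(x, y)^k for Euclidean d on R^k.

    By construction, the Lebesgue measure of the d_1-ball of radius t equals t.
    """
    k = len(x)
    d = math.dist(x, y)  # Euclidean distance (Python 3.8+)
    return unit_ball_volume(k) * d ** k


# On the real line the ball {y : |y - x| <= t} has length 2t, so d_1 = 2 |x - y|.
d1_line = natural_distance((0.0,), (1.5,))
# In the plane d_1 is the area of the disc whose radius is the Euclidean distance.
d1_plane = natural_distance((0.0, 0.0), (3.0, 4.0))
```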

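THEOREM 1. Let $d$ be a natural distance,

\[ \int_0^\infty K(u)\,du = 1, \qquad \int_0^\infty \left(|K(u)| + K^2(u)\right) du < \infty. \]

Assume that the density $f$ is continuous at $x$ and bounded on $\chi$. Moreover, let $f(x) > 0$. Then $v_n(h_n, x) = n h_n$, the estimator $f_n(x)$ is consistent, that is, $f_n(x) \to f(x)$ in probability as $n \to \infty$, and

\[ \lim_{n \to \infty} \left(n h_n\, D f_n(x)\right) = f(x) \int_0^\infty K^2(u)\,du, \]

where $D$ denotes the variance.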

Theorem 1 can be proved by the methods developed in [3]. However, the questions of the rate of convergence of kernel estimators (in particular, of the behavior of $\alpha_n = E(f_n(x) - f(x))^2$) and of the optimal choice of the smoothing parameters $h_n$ still remain open.
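The statement of Theorem 1 can be checked numerically in the simplest setting. The sketch below is an illustration only, under several assumptions not taken from the paper: the space is the real line with Lebesgue measure (so the natural distance is $2|x - y|$), the kernel is the indicator of $[0, 1]$, the sample is uniform on $[0, 1]$ (true density 1 at $x = 0.5$), and the bandwidth is $h_n = n^{-1/3}$, which satisfies $h_n \to 0$ and $n h_n \to \infty$.

```python
import random


def box_kernel(u):
    """K(u) = 1 on [0, 1] and 0 elsewhere; it integrates to 1 over [0, inf)."""
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def natural_kde(x, sample, h, kernel=box_kernel):
    """Estimate (1 / (n h)) * sum_i K(d_1(x_i, x) / h) with d_1 = 2 |x_i - x|."""
    n = len(sample)
    return sum(kernel(2.0 * abs(xi - x) / h) for xi in sample) / (n * h)


random.seed(0)  # fixed seed so the run is reproducible
n = 20000
sample = [random.random() for _ in range(n)]  # uniform on [0, 1]
h = n ** (-1.0 / 3.0)                         # assumed bandwidth choice
estimate = natural_kde(0.5, sample, h)        # should be close to the true density 1
```

With this kernel the estimator simply counts the sample points whose natural distance to $x$ is at most $h_n$ and divides by $n h_n$, which makes the normalization $v_n = n h_n$ easy to see.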

Convergence-Rate Estimates for Kernel Density Estimators

Introduce the circle distribution $G(x, t) = P\{d(X, x) \le t\}$ and the circle density $g(x, t) = G'_t(x, t)$.

THEOREM 2. Let the kernel function $K(u)$ be continuous and $K(u) = 0$ for $u > E$. Assume that the circle density admits the expansion

\[ g(x, t) = f(x) + t\, g'_t(x, 0) + \frac{t^2}{2}\, g''_{tt}(x, 0) + \frac{t^3}{3!}\, g'''_{ttt}(x, 0) + \cdots + \frac{t^k}{k!}\, g^{(k)}_{t^{(k)}}(x, 0) + o(h_n^k) \]

with the remainder term being uniformly bounded on $[0, h_n E]$. Let

\[ \int_0^E u^i K(u)\,du = 0, \qquad i = 1, 2, \ldots, k - 1. \]

Then

\[ \alpha_n = [E f_n(x) - f(x)]^2 + \mathsf{V} f_n(x) = h_n^{2k} \left(\int_0^E u^k K(u)\,du\right)^2 \left(g^{(k)}_{t^{(k)}}(x, 0)\right)^2 + \frac{f(x)}{n h_n} \int_0^E K^2(u)\,du + o\left(h_n^{2k} + \frac{1}{n h_n}\right). \]

Theorem 2 is proved using techniques developed in the statistics of nonnumerical data (see, e.g., [3]). The proof is voluminous and is therefore omitted here.

If the coefficients of the principal terms on the right-hand side are not equal to zero, then $\alpha_n$ attains its minimum,

\[ \alpha_n = O\left(n^{-1 + 1/(2k+1)}\right), \]

at $h_n = n^{-1/(2k+1)}$, which coincides with the classical results for the rather particular case of the real line, $\chi = \mathbb{R}^1$ (see [4, p. 316]). Note that to diminish the bias of the estimator one should use kernels $K(u)$ with alternating signs.
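The order of the optimal bandwidth follows from balancing the two principal terms of $\alpha_n$. The short derivation below is a sketch; the shorthand $A$ and $B$ for the two coefficients is introduced here for convenience and does not appear in the paper.

```latex
% Principal part of the mean squared error, with A > 0 and B > 0:
\alpha_n \approx A h^{2k} + \frac{B}{n h},
\qquad
A = \Bigl(\int_0^E u^k K(u)\,du\Bigr)^2 \bigl(g^{(k)}_{t^{(k)}}(x, 0)\bigr)^2,
\qquad
B = f(x) \int_0^E K^2(u)\,du.

% Setting the derivative in h to zero:
\frac{d}{dh}\Bigl(A h^{2k} + \frac{B}{n h}\Bigr)
  = 2kA\,h^{2k-1} - \frac{B}{n h^2} = 0
\;\Longrightarrow\;
h^{2k+1} = \frac{B}{2kA\,n},
\qquad
h_n = \Bigl(\frac{B}{2kA}\Bigr)^{1/(2k+1)} n^{-1/(2k+1)}.

% Both terms are then of the same order:
A h_n^{2k} \asymp n^{-2k/(2k+1)},
\qquad
\frac{B}{n h_n} \asymp n^{-1 + 1/(2k+1)} = n^{-2k/(2k+1)}.
```

Dropping the constant factor gives $h_n = n^{-1/(2k+1)}$ and the rate $O(n^{-1 + 1/(2k+1)})$ quoted above.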

Kernel Estimators in Discrete Spaces

In the case of discrete spaces, there are no natural distances. However, it is possible to obtain analogs of Theorems 1 and 2 by passing to the limit not only in the sample size $n$ but also in the discreteness parameter $m$.


Let $\chi_m$, $m = 1, 2, \ldots$, be a sequence of finite spaces, let $d_m$ be the distance in $\chi_m$, $m = 1, 2, \ldots$, and let

\[ \mu_m(A) = \frac{\operatorname{card}(A)}{\operatorname{card}(\chi_m)}, \qquad A \subseteq \chi_m. \]

Set

\[ F_{mx}(t) = \mu_m(\{y \in \chi_m : d_m(x, y) \le t\}), \]

\[ d_{1m}(x, y) = F_{mx}(d_m(x, y)), \]

\[ F^1_{mx}(t) = \mu_m(\{y \in \chi_m : d_{1m}(x, y) \le t\}). \]

Then the functions $F^1_{mx}(t)$ are piecewise constant and have jumps at some points $t_i$, $i = 1, 2, \ldots$, so that

\[ F^1_{mx}(t_i) = t_i. \]
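The construction can be traced on a very small finite space. The sketch below is an illustration with hypothetical names and a hypothetical space: all subsets of a three-element set, with the size of the symmetric difference as the distance. Exact fractions are used so that the identity at the jump points can be checked without rounding.

```python
from fractions import Fraction
from itertools import combinations


def ball_measure(x, t, space, dist):
    """Normalized counting measure of the closed ball {y : dist(x, y) <= t}."""
    inside = sum(1 for y in space if dist(x, y) <= t)
    return Fraction(inside, len(space))


def make_natural_distance(space, dist):
    """Return d_1(x, y) = F_x(d(x, y)) built from the ball measures of dist."""
    def natural(x, y):
        return ball_measure(x, dist(x, y), space, dist)
    return natural


def sym_diff_size(a, b):
    """Number of elements in the symmetric difference of two sets."""
    return len(a ^ b)


# Hypothetical toy space: the 8 subsets of {0, 1, 2}.
elements = (0, 1, 2)
space = [frozenset(c)
         for r in range(len(elements) + 1)
         for c in combinations(elements, r)]

x = frozenset()
d1 = make_natural_distance(space, sym_diff_size)

# Jump points t_i are the values taken by d_1(x, .); at each of them the
# measure of the d_1-ball of radius t_i equals t_i.
jump_points = sorted({d1(x, y) for y in space})
ball_measures_at_jumps = [ball_measure(x, t, space, d1) for t in jump_points]
```

Between the jump points the measure of the ball stays constant while $t$ grows, which is why the gaps $t_i - t_{i-1}$ control how far the discrete case is from the continuous one.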

THEOREM 3. If $\max(t_i - t_{i-1}) \to 0$ as $m \to \infty$ (in other words, $\sup |F^1_{mx}(t) - t| \to 0$ as $m \to \infty$), then there exists a sequence $m_n$ of discreteness parameters such that the conclusions of Theorems 1 and 2 remain valid as $n \to \infty$, $m \to \infty$, $m \ge m_n$.

We omit the proof of this theorem due to its volume.

Example 1. The space $\chi_m = 2^{\sigma(m)}$ of all subsets of a finite set $\sigma(m)$ of $m$ elements admits [5] an axiomatic definition of the distance $d(A, B) = 2^{-m} \operatorname{card}(A \,\Delta\, B)$, where $\Delta$ is the symbol of the symmetric difference of sets. Consider the nonparametric kernel density estimator of Parzen–Rosenblatt type

\[ f_{nm}(A) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{1}{h_n}\, \Phi\left(\frac{2 \operatorname{card}(A \,\Delta\, X_i) - m}{\sqrt{m}}\right)\right), \]

where $\Phi(\cdot)$ is the standard normal distribution function. It can be demonstrated that this estimator satisfies the conditions of Theorem 3 with $m_n = (\log n)^6$.
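A direct transcription of the estimator of Example 1 into code might look as follows. This is a sketch under stated assumptions: the kernel is the indicator of $[0, 1]$ (the paper does not fix a kernel), the function names are hypothetical, and the sample is a hand-made toy sample rather than real data.

```python
import math


def std_normal_cdf(z):
    """Standard normal distribution function Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def box_kernel(u):
    """Illustrative kernel: K(u) = 1 on [0, 1] and 0 elsewhere."""
    return 1.0 if 0.0 <= u <= 1.0 else 0.0


def subset_kde(a, sample, m, h, kernel=box_kernel):
    """Estimate f_nm(A) for subsets of an m-element set.

    Each term applies the kernel to Phi((2 card(A ^ X_i) - m) / sqrt(m)) / h.
    """
    n = len(sample)
    total = 0.0
    for xi in sample:
        z = (2 * len(a ^ xi) - m) / math.sqrt(m)
        total += kernel(std_normal_cdf(z) / h)
    return total / (n * h)


# Hypothetical toy sample of subsets of {0, ..., 15}.
m = 16
a = frozenset(range(8))
sample = [frozenset(range(8)), frozenset(range(16)), frozenset(range(8, 16))]
value = subset_kde(a, sample, m, h=0.6)
```

The argument of $\Phi$ standardizes the size of the symmetric difference, which for a subset chosen uniformly at random has mean $m/2$ and standard deviation $\sqrt{m}/2$; in this sense $\Phi$ plays the role of an approximate natural distance.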

Example 2. Consider the space of functions $f : Y_r \to Z_q$ defined on the finite set $Y_r = \{1/r, 2/r, \ldots, (r-1)/r, 1\}$ with values in the finite set $Z_q = \{0, 1/q, 2/q, \ldots, 1\}$. This space can be interpreted as a space of fuzzy sets (see [5]). It is obvious that $\operatorname{card}(\chi_m) = (q + 1)^r$. We will use the distance $d(f, g) = \sup |f(y) - g(y)|$. The nonparametric density estimator has the form

\[ f_{nm}(x) = \frac{1}{n h_n} \sum_{i=1}^{n} K\left(\frac{\left[2 \sup_y |x(y) - X_i(y)| + 1/q\right]^r}{h_n (1 + 1/q)^r}\right). \]

If $r = n^\alpha$ and $q = n^\beta$, then for $\beta > \alpha$ the conditions of Theorem 3 hold and, hence, Theorems 1 and 2 are valid.

Example 3. Consider the space of rankings of $m$ objects, and as the distance $d(A, B)$ between rankings $A$ and $B$ take the minimum number of inversions required to pass from $A$ to $B$. Then $\max(t_i - t_{i-1})$ does not tend to zero as $m \to \infty$, so that the conditions of Theorem 3 are violated.

Example 4. In applied research, the most frequently encountered example of objects of a nonnumerical nature is that of data of different types, where a real object is described by a vector, one part of the coordinates of which consists of the values of quantitative characteristics whereas the other part consists of qualitative characteristics (nominal and ordinal). For spaces of characteristics of different types, that is, for the Cartesian products of continuous and discrete spaces, various settings are possible. Let, for example, the number of levels of the qualitative characteristics remain constant. Then the nonparametric density estimator reduces to the product of the frequency of falling into a point in the space of qualitative characteristics by a classical Parzen–Rosenblatt estimator in the space of quantitative characteristics. In the general case, the distance $d(x, y)$ can be considered, for example, as the sum of the Euclidean distance $d_1$ between the quantitative factors, the distance $d_2$ between the nominal characteristics ($d_2(x, y) = 0$ if $x = y$ and $d_2(x, y) = 1$ if $x \ne y$), and the distance $d_3$ between the ordinal variables (if $x$ and $y$ are numbers of levels, then $d_3(x, y) = |x - y|$). The presence of quantitative factors leads to the continuity and strict monotonicity of $F_{mx}(t)$, and hence Theorems 1 and 2 are valid for nonparametric density estimators in spaces of characteristics of different types.
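The summed distance described in Example 4 can be sketched as follows. The record layout (a dictionary with the hypothetical keys `quant`, `nominal`, and `ordinal`) is an assumption of this illustration, as is the restriction to one nominal and one ordinal characteristic per object.

```python
import math


def mixed_distance(x, y):
    """Distance between two mixed-type records as the sum d_1 + d_2 + d_3.

    Each record is a dict with keys:
      'quant'   - tuple of floats (quantitative characteristics),
      'nominal' - any hashable label (nominal characteristic),
      'ordinal' - integer level number (ordinal characteristic).
    """
    d1 = math.dist(x["quant"], y["quant"])            # Euclidean part
    d2 = 0 if x["nominal"] == y["nominal"] else 1     # 0 if equal, 1 otherwise
    d3 = abs(x["ordinal"] - y["ordinal"])             # difference of level numbers
    return d1 + d2 + d3


# Hypothetical records for illustration.
first = {"quant": (0.0, 0.0), "nominal": "red", "ordinal": 1}
second = {"quant": (3.0, 4.0), "nominal": "blue", "ordinal": 3}
distance = mixed_distance(first, second)  # 5 + 1 + 2
```

Such a function can be passed as the distance to any kernel estimator that only requires pairwise distances between the point of interest and the sample elements.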

The approaches described in this paper were used to solve some applied problems, in particular, in the construction of the computer-aided workplace "Mathematics in Examination," in occupational medicine, and in lectures and seminars at the Economical Education Center of the Ministry of Higher Education of Russia and the Moscow State Institute of Electronics and Mathematics.

REFERENCES

1. A. I. Orlov, "Statistics of objects of a nonnumerical nature," Zavod. Lab., 56, No. 3, 76–83 (1990).

2. A. I. Orlov, "Notes in classification theory," Sociologiya: Met., Mat. Mod., No. 2, 28–50 (1991).

3. A. I. Orlov, "Nonparametric density estimators in topological spaces," in: Applied Statistics [in Russian], Finansy i Statistika, Moscow (1983), pp. 12–40.

4. I. A. Ibragimov and R. Z. Khas'minskii, Statistical Estimation: Asymptotic Theory, Springer, Berlin–New York (1981).

5. A. I. Orlov, Stability in Socioeconomic Models [in Russian], Nauka, Moscow (1979).