
Truncated Gaussians as Tolerance Sets

Fabio Cozman    Eric Krotkov

CMU-RI-TR-94-35

The Robotics Institute
Carnegie Mellon University
Pittsburgh, PA 15213

September 28, 1994

© 1994 Carnegie Mellon University

This research is supported in part by NASA under Grant NAGW-1175. Fabio Cozman is supported under a scholarship from CNPq, Brazil.


Abstract

This work focuses on the use of truncated Gaussian distributions as models for bounded data: measurements that are constrained to appear between fixed limits. We prove that the truncated Gaussian can be viewed as a maximum entropy distribution for truncated bounded data, when mean and covariance are given. We present the characteristic function for the truncated Gaussian; from this, we derive algorithms for calculation of mean, variance, summation, application of Bayes rule and filtering with truncated Gaussians. As an example of the power of our methods, we describe a derivation of the disparity constraint (used in computer vision) from our models. Our approach complements results in Statistics, but our proposal is not only to use the truncated Gaussian as a model for selected data; we propose to model measurements as fundamentally bounded in terms of truncated Gaussians.


1 Introduction

This work presents a new class of statistical models that are well suited for several Robotics applications, such as object recognition or computer vision. Our approach deals with bounded data: measurements that are constrained to appear in a bounded region in the measurement space. Bounded measurements have been studied in Statistics in connection with selection mechanisms; here we propose a different approach, in which the data is considered to be bounded to begin with.

To date, few statistical models for bounded variables are available, none of them satisfactory. The most common approach is to use the Gaussian distribution and model bounds through an ad hoc selection mechanism [2, 1]. Another possibility is the uniform distribution [8], but this approach has computational problems: summation of uniform variables does not yield a uniform variable and application of Bayes rule is hard [11]. In short: even though bounds contain a lot of information, they have not received proper attention yet.

Our work uses a class of distributions in the truncated Gaussian family in order to model bounded data. We derive a complete set of tractable algorithms for these models: calculation of moments, approximation methods for Bayes rule and summation, and noise filtering. Overall, our results make the truncated Gaussian family an operational tool, much more powerful than the uniform or the Gaussian distributions.

In order to illustrate the strength of our approach, we present a statistical derivation of the disparity constraint used in Computer Vision. So far no statistical analysis has been given for this constraint.

Our analysis complements results scattered in the literature of Statistics, Information Theory and Control Theory. We contribute to Robotics by indicating a proper way to model bounded measurements and deriving tractable algorithms to handle them. Besides contributing to Robotics, our algorithms demonstrate that Robotics has much to contribute to Statistics itself.

2 The Truncated Gaussian Family

Our basic model is the elliptically truncated Gaussian family. A distribution in this family is proportional to a Gaussian inside an ellipsoid and is zero outside the ellipsoid. The truncated Gaussian model has been proposed in a variety of contexts in Statistics [9] as a model of selection mechanisms. In Robotics, the first explicit mention of the possibility of using the truncated Gaussian appears to be by Erdmann [5].

A truncated Gaussian distribution for an $n$-dimensional random vector $x$ is referred to as $N_{\mu,M,k}(\nu,P)$; its mathematical expression is:

$$N_{\mu,M,k}(\nu,P) = \frac{(q(\nu,P,\mu,M,k))^{-1}}{|(2\pi)^n \det(P)|^{1/2}} \exp\left(-\frac{1}{2}(x-\nu)^T P^{-1}(x-\nu)\right) I_{\{x:(x-\mu)^T M^{-1}(x-\mu)\le k\}}(x), \qquad (1)$$

where $I_{(\cdot)}$ is the indicator function and $q(\nu,P,\mu,M,k)$ is a normalizing constant. The set $\{x : (x-\mu)^T M^{-1}(x-\mu) \le k\}$ defines an ellipsoid in $n$-dimensional space. Call $k$ the radius of the distribution.

As a special case, note that if $\mu = \nu$ and $M = P$, then the normalizing constant depends only on $k$:

$$q(\nu,P,\nu,P,k) = \Pr(\chi^2_n \le k) = \int_0^k (2^{n/2}\Gamma(n/2))^{-1} x^{n/2-1} e^{-x/2}\,dx.$$

In this case, $q(\nu,P,\nu,P,k)$ is the value (at $k$) of the distribution function of a chi-square variable with $n$ degrees of freedom. Due to the importance of this sub-family for modeling purposes, we call it the radially truncated Gaussian family.
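As a concrete illustration (a minimal sketch of ours, not part of the report; the function name is ours and SciPy is assumed), the radial normalizing constant is just a chi-square distribution function:

```python
from scipy.stats import chi2

def q_radial(n, k):
    """Normalizing constant q(nu, P, nu, P, k) of a radially truncated
    Gaussian: the chi-square distribution function with n degrees of
    freedom evaluated at the radius k."""
    return chi2.cdf(k, df=n)

# Mass of a 2-D Gaussian N(nu, P) inside the radius-1 defining ellipsoid:
print(q_radial(n=2, k=1.0))  # about 0.3935
```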

2.1 Genesis of a Truncated Gaussian

There are situations where a truncated model is appropriate because data is purposefully truncated. Consider an $n$-dimensional source of (unbounded) Gaussian noise and an ellipsoid in this $n$-dimensional space. If any measurement outside the ellipsoid is discarded, then the resulting data obeys a truncated Gaussian. Examples of these procedures are the algorithms of Cox [2] and Bar-Shalom/Fortmann [1]. These algorithms use the Gaussian distribution associated with ad hoc selection mechanisms; none of these algorithms uses appropriate models for bounded data. This approach to bounded data is employed in Statistics in order to model selection mechanisms [9]. Results derived in this work can be understood as new tools for this type of analysis.

There is a different way of looking at bounded data. We can use bounded models in order to model data that is fundamentally bounded, not bounded as the result of a selection. We elaborate on that. Suppose we know a random vector $x$ has mean $\mu$, covariance matrix $Q$, and $\Pr(x) = 0$ for all $x$ outside $\{x : (x-\mu)^T Q^{-1}(x-\mu) \le k\}$. In other words, possible values of $x$ concentrate around the mean in a symmetric fashion, up to the distance $k$ in the metric induced by $Q$. Under these conditions, we have (proof in Appendix A):

Theorem 1 Given an expected value $\mu$, a covariance matrix $Q$ and the fact that a distribution is zero outside the set $\{x : (x-\mu)^T Q^{-1}(x-\mu) \le k\}$, a maximum entropy distribution that obeys these conditions is a truncated Gaussian $N_{\mu,cQ,k'}(\mu, cQ)$, where $c = \Pr(\chi^2_n \le k)(\Pr(\chi^2_{n+2} \le k))^{-1}$ and $k' = k\Pr(\chi^2_{n+2} \le k)(\Pr(\chi^2_n \le k))^{-1}$.

This theorem strengthens the parallel between truncated and unbounded Gaussians.

Figure 1: Distributions in the truncated Gaussian family: $N_{0,1,0.01}(0,1)$ (left) and $N_{1.5,1,1}(0,1)$ (right).

The truncated Gaussian family is even more powerful than the unbounded Gaussian. The truncated Gaussian family can represent highly skewed distributions and distributions that are nearly constant. Figure 1 illustrates this claim.

2.2 Numerical Evaluation of $q(\nu,P,\mu,M,k)$

The central problem in the characterization of a truncated Gaussian is the determination of $q(\nu,P,\mu,M,k)$. We introduce an important linear transformation that will be used throughout the paper.

It is always possible to transform the original expression of a truncated Gaussian into the following:

$$N_{\omega,D,k}(0,I) = \frac{(q(\omega,D,k))^{-1}}{(2\pi)^{n/2}} \exp\left(-\frac{1}{2}z^T z\right) I_{\{z:(z-\omega)^T D^{-1}(z-\omega)\le k\}}(z),$$

where $D$ is a diagonal matrix. The result is obtained through a linear transformation, named double diagonalization:

$$z = \Theta^T \sqrt{\Lambda^{-1}}\, V^T (x - \nu),$$

where $\Lambda$ is the eigenvalue matrix of $P$, $V$ is the corresponding eigenvector matrix of $P$ and $\Theta$ is the eigenvector matrix of $(\sqrt{\Lambda^{-1}}V^T)M(V\sqrt{\Lambda^{-1}})$. The transformation is always possible since $P$ and $M$ are positive definite. The vector $\omega$ is $\Theta^T \sqrt{\Lambda^{-1}}\, V^T (\mu - \nu)$.

We work with the transformed variable $z$, since $q(\omega,D,k) = q(\nu,P,\mu,M,k)$. Then ($d_i$ is the inverse of the $i$-th element in the diagonal of $D$):

$$q(\omega,D,k) = \Pr\left(\sum_{i=1}^n d_i(z_i-\omega_i)^2 \le k\right) = \int_{\{\sum_{i=1}^n d_i(z_i-\omega_i)^2 \le k\}} \frac{1}{(2\pi)^{n/2}}\exp\left(-\frac{1}{2}z^T z\right) dz.$$

Kotz, Johnson and Boyd [9] give a numerical method for the evaluation of this integral based on its Laguerre expansion. We have:

$$q(\omega,D,k) = \Pr(\chi^2_n \le k/\beta) + \left(\frac{k}{2\beta}\right)^{n/2} e^{-\frac{k}{2\beta}} \sum_{j=1}^{\infty} \frac{c_j L_{j-1}(k/(2\beta))\,(j-1)!}{\Gamma(n/2+j)}$$

where:

$$L_r(y) = \sum_{i=0}^{r} \frac{(-y)^i}{i!\,(r-i)!}\,\frac{\Gamma(r+n/2+1)}{\Gamma(i+n/2+1)}$$

$$c_r = (2r)^{-1}\sum_{j=0}^{r-1} s_{r-j} c_j, \qquad c_0 = 1$$

$$s_r = (-r/\beta)\sum_{j=1}^{n} \omega_j^2 d_j (1 - d_j/\beta)^{r-1} + \sum_{j=1}^{n} (1 - d_j/\beta)^r.$$

Convergence is uniform for $\beta > \max_j(d_j)/2$. In general, large values of $\beta$ yield slow convergence. If we truncate the evaluation of the series at $j = N$, the truncation error is always smaller than:

$$2^{-N}\exp\left(k/(2\beta) + (n-1)\log 2 + 4\omega^T\omega\right).$$

As a one-dimensional example, consider $N_{1.5,1,1}(0,1)$ (shown in figure 1). Numerical integration with 10 significant digits yields $q(0,1,1.5,1,1) = 0.3023278734$. Using $\beta = 1.1$, $N = 9$, we get the same answer.
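Such values are easy to cross-check by simulation; the sketch below is ours (not from the report; the function name is ours, NumPy/SciPy assumed). In one dimension the example reduces to $q = \Pr(0.5 \le z \le 2.5) = \Phi(2.5) - \Phi(0.5)$:

```python
import numpy as np
from scipy.stats import norm

def q_monte_carlo(omega, d, k, n_samples=1_000_000, seed=0):
    """Estimate q(omega, D, k) = Pr(sum_i d_i (z_i - omega_i)^2 <= k), z ~ N(0, I)."""
    rng = np.random.default_rng(seed)
    omega, d = np.asarray(omega, float), np.asarray(d, float)
    z = rng.standard_normal((n_samples, omega.size))
    return float((((z - omega) ** 2 * d).sum(axis=1) <= k).mean())

# One-dimensional example: q(0, 1, 1.5, 1, 1) = Pr((z - 1.5)^2 <= 1).
print(norm.cdf(2.5) - norm.cdf(0.5))     # 0.3023278734, matching the text
print(q_monte_carlo([1.5], [1.0], 1.0))  # Monte Carlo estimate, close to it
```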

2.3 Moment Generating Function of a Truncated Gaussian

The moment generating function of any distribution is a fundamental tool in statistical analysis. We give here an expression for the moment generating function of a truncated Gaussian in the most general form (proof in Appendix A):

Theorem 2 The moment generating function of a truncated Gaussian $N_{\omega,D,k}(0,I)$ is:

$$\phi(t) = \frac{q(\omega - t,D,k)}{q(\omega,D,k)}\exp\left(\frac{t^T t}{2}\right). \qquad (2)$$

For a particular value of $t$, we can evaluate $\phi(t)$ using the Kotz, Johnson and Boyd recursions for $q(\omega - t,D,k)$ and then plugging the result into $(q(\omega,D,k))^{-1}q(\omega - t,D,k)\exp(t^T t/2)$.


2.4 Mean Vector and Covariance Matrix of a Truncated Gaussian

The mean and covariance can be obtained by successive differentiations of $\phi(t)$ [10]. Since the Kotz, Johnson and Boyd recursions for $\phi(t)$ are uniformly convergent, we can differentiate these expressions term by term. We use $\bar{\mu}$ for the mean and $\bar{P}$ for the covariance of a truncated Gaussian.

Direct differentiation of (2) yields:

Theorem 3 We have:

$$\bar{\mu} = k_o\sum_{j=1}^{\infty} k_j g_j$$

$$\bar{P} = I - \bar{\mu}\bar{\mu}^T + k_o\sum_{j=1}^{\infty} k_j G_j,$$

where ($k_r$ are scalars, $g_r$ and $h_r$ are vectors, $G_r$ and $H_r$ are matrices and $I$ is an identity matrix):

$$k_j = \frac{(j-1)!\,L_{j-1}(k/(2\beta))}{\Gamma(n/2+j)}, \qquad k_o = \left(\frac{k}{2\beta}\right)^{n/2}\frac{e^{-\frac{k}{2\beta}}}{q(\omega,D,k)}$$

$$g_r = (2r)^{-1}\sum_{j=0}^{r-1}(h_{r-j}c_j + s_{r-j}g_j), \qquad g_0 = 0$$

$$h_r = \frac{2r}{\beta}\,\mathrm{diag}\left[d_1(1-d_1/\beta)^{r-1}, \ldots, d_n(1-d_n/\beta)^{r-1}\right]\omega$$

$$G_r = (2r)^{-1}\sum_{j=0}^{r-1}\left(c_j H_{r-j} + g_j h_{r-j}^T + h_{r-j}g_j^T + s_{r-j}G_j\right), \qquad G_0 = 0$$

$$H_r = -\frac{2r}{\beta}\,\mathrm{diag}\left[d_1(1-d_1/\beta)^{r-1}, \ldots, d_n(1-d_n/\beta)^{r-1}\right]$$

For a distribution $N_{\nu,P,k}(\nu,P)$ in the radially truncated Gaussian family, the mean is $\nu$ and the covariance matrix is $\Pr(\chi^2_{n+2} \le k)(\Pr(\chi^2_n \le k))^{-1}P$ [9].
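The closed form for the radially truncated sub-family is easy to evaluate; a minimal sketch of ours (function name ours, SciPy assumed) for the mean and covariance of $N_{\nu,P,k}(\nu,P)$:

```python
import numpy as np
from scipy.stats import chi2

def radial_moments(nu, P, k):
    """Mean and covariance of N_{nu,P,k}(nu, P): the mean is nu and the
    covariance is Pr(chi2_{n+2} <= k) / Pr(chi2_n <= k) * P (Section 2.4)."""
    n = len(nu)
    shrink = chi2.cdf(k, df=n + 2) / chi2.cdf(k, df=n)
    return np.asarray(nu, float), shrink * np.asarray(P, float)

mean, cov = radial_moments([0.0, 0.0], np.eye(2), k=4.0)
print(cov)  # truncation shrinks the covariance relative to P
```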

2.5 Linear Transformation and Summation of Truncated Gaussians

A non-singular linear transformation applied to a random vector with truncated Gaussian distribution produces another truncated Gaussian random vector (proofs of theorems in this section are in Appendix A).


Theorem 4 If $x \sim N_{\mu,M,k}(\nu,P)$ and $y = Ax$ (where $A$ is any non-singular square matrix) then $y \sim N_{A\mu,AMA^T,k}(A\nu, APA^T)$.

It can be seen, through direct manipulation of truncated Gaussians even in one dimension, that the sum of truncated Gaussian random variables is not a truncated Gaussian. We can derive some results for particular cases. Suppose $z = x + y$ where $x \sim N_{\mu_x,M_x,k_x}(\nu_x,P_x)$ and $y \sim N_{\mu_y,M_y,k_y}(\nu_y,P_y)$, with $x$ and $y$ independent. Under these conditions:

Theorem 5 The distribution of $z$ has expected value $\bar{\mu}_z = \bar{\mu}_x + \bar{\mu}_y$ and covariance $\bar{P}_z = \bar{P}_x + \bar{P}_y$; the distribution of $z$ is positive only inside the ellipsoid defined by $\{z : (z-\mu_z)^T M_z^{-1}(z-\mu_z) \le 1\}$ where

$$\mu_z = \mu_x + \mu_y, \qquad M_z = k_x M_x + k_y M_y.$$

A reasonable approximation to the distribution of $z$ is $N_{\mu_z,M_z,1}(\bar{\mu}_z, \bar{P}_z)$.

A special but important case is represented by $\mu_x = \nu_x$, $M_x = P_x$, $\mu_y = \nu_y$, $M_y = P_y$, $k_x = k_y$. In this special case we can make statements about the distance between the approximation and the correct distribution. Call $G_z$ the correct distribution for $z$; then:

Theorem 6 For $\mu_z = \mu_x + \mu_y$, $M_z = M_x + M_y$, $k_z = k_x = k_y$, $\sup_z |G_z - N_{\mu_z,M_z,k_z}(\mu_z,M_z)|$ is $O(\exp(-k/2))$.

So for this case, the indicated approximation is fairly good.

Consider now the most general case: $z = x + y$, where $x$ and $y$ are arbitrary truncated Gaussians. We shall indicate approximation strategies that work well under specific conditions. We consider the vector $u$, a linear transformation of $z$ chosen so that the defining ellipsoid is $\{u : u^T u \le 1\}$. Call $\bar{\mu}_u$ the mean of $u$ and $\bar{P}_u$ the covariance matrix of $u$. The distribution of $u$ is obtained by convolution, so it is unimodal and closer to a bell-shaped function than the original distributions. We may expect the distribution of $u$ to be closer to a Gaussian than the distribution of $x$ or $y$, invoking the Central Limit Theorem.

Consider the approximation $p_1(u) = N_{0,I,1}(\bar{\mu}_u, \bar{P}_u)$. If all eigenvalues of $\bar{P}_u$ are much smaller than 1, then this approximation is reasonable because the mean will be approximately $\bar{\mu}_u$ and the variance will be approximately $\bar{P}_u$. The distribution will always have a maximum inside the defining ellipsoid, so it will always yield a smooth approximation to $p_U(u)$. An example will illustrate this. Consider distribution $N_{1.5,1,1}(0,1)$, depicted in figure 1. Its mean is 1.107 and its variance is 0.21288. If we add two random variables with this same distribution, we obtain (numerically) the distribution shown in figure 2.a. The mean is 2.213 and the variance is 0.4257. The approximation prescribed by Theorem 5, with mean 2.213 and variance 0.4257, is shown in figure 2.b. Both distributions are shown in figure 2.c. Agreement

Figure 2: Example: Approximation to Summation.

Figure 3: Example: Failure of Approximation to Summation.

is remarkable even in this highly skewed distribution; the strategy will work better if the means of the original summands are closer to zero.

We may take the approximation as valid for large variances, but there is a limit to this process. We cannot use this strategy for distributions with arbitrarily large variances. Again, we use an example to illustrate this situation. Consider the distribution $N_{0,1,1}(0,50)$, depicted in figure 3.a. Consider the summation of two random variables with this same distribution. Figure 3.b shows both the correct distribution and the approximation $N_{0,1,4}(0,100)$. The disagreement is evident. So we need a specific approximation for the case of large variances.

A different strategy would be to approximate the distribution of $u$ by a truncated Gaussian that matches the mean and variance of $u$. This approach is explored in Appendix B; the approximations derived there are valid under restrictive conditions on the second moment of the distribution of $u$. Derivation of good approximations for all cases is still an open problem.
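The one-dimensional moments quoted above are easy to reproduce by simulation; the sketch below is ours (not from the report; NumPy assumed). It rejection-samples $N_{1.5,1,1}(0,1)$ and checks the moments of one variable and of the sum of two:

```python
import numpy as np

def sample_trunc(mu, var, omega, M, k, size, rng):
    """Rejection-sample a one-dimensional N_{omega,M,k}(mu, var): draw from
    N(mu, var) and keep samples with (x - omega)^2 / M <= k."""
    out = []
    while len(out) < size:
        x = rng.normal(mu, np.sqrt(var), size)
        out.extend(x[(x - omega) ** 2 / M <= k])
    return np.array(out[:size])

rng = np.random.default_rng(0)
x = sample_trunc(0.0, 1.0, 1.5, 1.0, 1.0, 200_000, rng)
y = sample_trunc(0.0, 1.0, 1.5, 1.0, 1.0, 200_000, rng)
print(x.mean(), x.var())              # close to 1.107 and 0.21288
print((x + y).mean(), (x + y).var())  # close to 2.213 and 0.4257
```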

3 Inferences with the Truncated Gaussian

Inferences about a random variable are obtained by application of Bayes rule associated with a decision rule. We analyze two decision rules: the maximum a posteriori estimate and the minimum square loss estimate.

Since the truncated Gaussian is not closed under multiplication, we should not expect to be able to apply Bayes rule and obtain a truncated Gaussian distribution. The next example reveals the problems that may arise here.

Example 1 Suppose $x$ and $z$ are related by $z = x + \omega$, $p(x)$ is $N_{0,(1+\epsilon)^2,3}(0,(1+\epsilon)^2)$ and $p(z|x)$ is $N_{x,(1+\epsilon)^2,3}(x,(1+\epsilon)^2)$. Suppose the observation comes out to be $z = 2\sqrt{3}$. Application of Bayes rule for $p(x|z)$ yields a distribution concentrated on a small interval around $\sqrt{3}$, tending to a spike as $\epsilon$ goes to zero. $\Box$

Fortunately, we can find a good approximation methodology for a situation relevant to practical applications. We consider a linear set-up:

$$z = Hx + \omega_z \qquad (3)$$

where $x \in \Re^n$ and $z \in \Re^m$, $H$ is a matrix of appropriate dimensions, $x$ is distributed as a truncated Gaussian $N_{\mu_x,M_x,k_x}(\nu_x,P_x)$, and $\omega_z$ is distributed as a radially truncated Gaussian with zero mean, $N_{0,P_z,k_z}(0,P_z)$. As a result, $z \sim N_{Hx,P_z,k_z}(Hx,P_z)$.

The central question is: given that we observe the value of z, what can we say about x?

The posterior is not a truncated Gaussian because the region where the distribution is positive is not an ellipsoid (it is the intersection of two ellipsoids). The distribution still is proportional to a Gaussian in the points where it is positive, as we show now. Consider two distributions:

$$q_1 \exp\left(-\frac{1}{2}(x-\nu_x)^T P_x^{-1}(x-\nu_x)\right) I_A(x), \qquad q_2 \exp\left(-\frac{1}{2}(z-Hx)^T P_z^{-1}(z-Hx)\right) I_B(x),$$

where $A$ and $B$ are arbitrary convex sets, not necessarily ellipsoids. If we multiply these distributions together, the result is positive only on the set $A \cap B$; on this set the result is proportional to:

$$W = \exp\left(-\frac{1}{2}\left[(x-\nu_x)^T P_x^{-1}(x-\nu_x) + (z-Hx)^T P_z^{-1}(z-Hx)\right]\right).$$

By using the Matrix Inversion Lemma [1], we get the following standard result of linear systems theory:

$$W \propto \exp\left(-\frac{1}{2}(x-\nu')^T Q^{-1}(x-\nu')\right)$$

where $Q = P_x - P_x H^T(HP_xH^T + P_z)^{-1}HP_x$ and $\nu' = Q(P_x^{-1}\nu_x + H^T P_z^{-1}z)$. This shows that the shape of the distribution is proportional to a Gaussian. So the final distribution will be:

$$\frac{c}{|(2\pi)^n\det(Q)|^{1/2}}\exp\left(-\frac{1}{2}(x-\nu')^T Q^{-1}(x-\nu')\right) I_{A\cap B}(x), \qquad (4)$$

where $c$ is a normalizing constant.

So we can easily recover the shape of the distribution; the normalizing constant $c$ is still unknown and depends on $A \cap B$. The main difficulty is to find a good and tractable approximation to this region. This is presented in the next section.
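The shape parameters $Q$ and $\nu'$ are standard linear algebra; a minimal sketch of ours (function name ours, NumPy assumed):

```python
import numpy as np

def posterior_shape(nu_x, P_x, H, P_z, z):
    """Shape parameters of expression (4): the posterior is proportional to
    exp(-0.5 (x - nu')^T Q^{-1} (x - nu')) on the intersection region."""
    S = H @ P_x @ H.T + P_z                            # H P_x H^T + P_z
    Q = P_x - P_x @ H.T @ np.linalg.solve(S, H @ P_x)  # matrix inversion lemma
    nu = Q @ (np.linalg.solve(P_x, nu_x) + H.T @ np.linalg.solve(P_z, z))
    return Q, nu
```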

3.1 Intersecting Ellipsoids

In order to approximate the posterior, an ellipsoidal approximation can be built so that the approximate posterior is always a truncated Gaussian. This method was originally proposed by Fogel and Huang [7] in the context of tolerance sets for ARMA processes. Since the algorithm approximates only the geometric intersection of ellipsoids, the values of $\nu$ and $P$ do not matter. We indicate truncated ellipsoids by $N(\mu, M, k)$ in this section.

3.1.1 Preliminary Approximation

We make some preliminary approximations:

- Consider expression (3). Suppose $\omega_z \sim N(0, W, 1/a)$ and $x \sim N(\mu, P, b)$.

- Now take a linear transformation $A$ such that $AWA^T = I$ (always possible since $W$ must be positive definite). So we have $Az = AHx + A\omega_z$. Now consider $B = AH$, $y = Az$ and $\omega_y = A\omega_z$; we have:

$$y = Bx + \omega_y$$

and $\omega_y \sim N(0, I, 1/a)$.

- We know that $a\,\omega_y^T\omega_y \le 1$; instead of trying to satisfy this inequality, we will satisfy the set of inequalities:

$$a\,\omega_{y_i}^2 \le 1 \quad \text{for } i = 1, \ldots, m.$$

The meaning of this approximation is the following: the original constraint defines a spheroid in $\Re^m$; the new constraint defines the $m$-dimensional cube that circumscribes the spheroid.

- By rearranging terms we have:

$$a(y_i - B_i x)^2 \le 1 \quad \text{for } i = 1, \ldots, m, \qquad (5)$$

where $y_i$ is the $i$-th element of $y$ and $B_i$ is the $i$-th row of $B$.

Now the problem is how to approximate the following set:

$$\Omega_m = \left\{x : \left[(x-\mu)^T P^{-1}(x-\mu) \le b\right] \cap \left[\bigcap_{i=1}^m a(y_i - B_i x)^2 \le 1\right]\right\}.$$


3.1.2 Fogel/Huang Algorithm

Consider the following definitions:

$$P_o = P, \qquad \mu_o = \mu, \qquad b_o = b,$$

$$\Omega_o = \left\{x : (x-\mu_o)^T P_o^{-1}(x-\mu_o) \le b_o\right\},$$

$$S_i = \left\{x : a(y_i - B_i x)^2 \le 1\right\} \quad \text{for } i = 1, \ldots, m.$$

Each set $S_i$ defines a strip (a region in $\Re^n$ between two hyperplanes).

The Fogel/Huang strategy is to start with $\Omega_o$ and intersect this ellipsoid with one strip at a time, approximating the intermediate intersections by ellipsoids. We could generate $\Omega_m$ by the recursion:

$$\Omega_{i+1} = S_{i+1} \cap \Omega_i.$$

- We approximate $\Omega_{i+1}$ as:

$$\Omega_{i+1} = \left\{x : (x-\mu_i)^T P_i^{-1}(x-\mu_i) + a q_{i+1}(y_{i+1} - B_{i+1}x)^2 \le (b_i + q_{i+1})\right\} \qquad (6)$$

where $q_{i+1}$ is a free parameter to be determined. $\Omega_{i+1}$ contains the intersection between $\Omega_i$ and the strip $S_{i+1}$.

- We can transform expression (6) into an explicit ellipsoid expression¹:

$$\Omega_{i+1} = \left\{x : (x-\mu_{i+1})^T P_{i+1}^{-1}(x-\mu_{i+1}) \le b_{i+1}\right\} \qquad (7)$$

where:

$$P_{i+1}^{-1} = P_i^{-1} + a q_{i+1}B_{i+1}^T B_{i+1}, \qquad P_{i+1} = P_i - a q_{i+1}\frac{P_i B_{i+1}^T B_{i+1}P_i}{1 + a q_{i+1}B_{i+1}P_iB_{i+1}^T} \qquad (8)$$

$$\mu_{i+1} = P_{i+1}(P_i^{-1}\mu_i + a q_{i+1}B_{i+1}^T y_{i+1}), \qquad b_{i+1} = b_i + q_{i+1} - a q_{i+1}\frac{(y_{i+1} - B_{i+1}\mu_i)^2}{1 + a q_{i+1}B_{i+1}P_iB_{i+1}^T}.$$

- We must determine $q_{i+1}$ by some convention. Fogel and Huang prove that it is possible to choose $q_{i+1}$ so that the square of the volume of the ellipsoid $\Omega_{i+1}$ is minimized. By doing so, we obtain (proof of this claim is in Appendix A):

$$q_{i+1} = \begin{cases} 0 & \text{if } \alpha_2^2 - 4\alpha_1\alpha_3 < 0 \ \text{ or } \ -\alpha_2 + \sqrt{\alpha_2^2 - 4\alpha_1\alpha_3} \le 0 \\[4pt] \dfrac{-\alpha_2 + \sqrt{\alpha_2^2 - 4\alpha_1\alpha_3}}{2\alpha_1} & \text{otherwise,} \end{cases}$$

where:

$$\alpha_1 = a^2(n-1)(B_{i+1}P_iB_{i+1}^T)^2$$

$$\alpha_2 = B_{i+1}P_iB_{i+1}^T\left(2an - a + a^2 b_i B_{i+1}P_iB_{i+1}^T + a^2(y_{i+1} - B_{i+1}\mu_i)^2\right) \qquad (9)$$

$$\alpha_3 = n\left(1 - a(y_{i+1} - B_{i+1}\mu_i)^2\right) - a B_{i+1}P_iB_{i+1}^T.$$

¹Using the following standard result: $(x-a)^T A(x-a) + (x-b)^T B(x-b) = (x-c)^T C(x-c)$ plus a constant (here absorbed into $b_{i+1}$), where $C = A + B$ and $c = C^{-1}(Aa + Bb)$, and the Matrix Inversion Lemma [1].

Summary of the Fogel/Huang Algorithm The algorithm is a sequence of $m$ steps. At each step:

1. Get $q_{i+1}$ from the results of the previous step using expressions (9).

2. If $q_{i+1} = 0$, the measurement did not cause any change in the ellipsoids; updating the ellipsoid is not necessary.

3. Update $\Omega_{i+1}$ using expressions (7) and (8).

4. If $b_{i+1} < 0$, the likelihood and the prior are in conflict: the sets have empty intersection. This case is discussed below.

We take the defining ellipsoid for the posterior distribution to be $\{x : (x-\mu_m)^T P_m^{-1}(x-\mu_m) \le b_m\}$. This is equivalent to the set $\{x : (x-\mu)^T P^{-1}(x-\mu) + \sum_{i=1}^m a q_i(y_i - B_i x)^2 \le b + \sum_{i=1}^m q_i\}$. Our approximation is complete.
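A minimal sketch of one strip-intersection step is given below; it is ours, not from the report (the function name and the choice to pass $q$ as an argument are ours, NumPy assumed), and it implements expressions (7) and (8) for a single row $B_{i+1}$:

```python
import numpy as np

def fogel_huang_update(mu, P, b, B_row, y, a, q):
    """One strip-intersection update, expressions (7)-(8). (mu, P, b) describe
    the current ellipsoid {x : (x-mu)^T P^{-1}(x-mu) <= b}; B_row is the row
    B_{i+1}, y is y_{i+1}, a the strip scale, q >= 0 the free parameter
    (q = 0 leaves the ellipsoid unchanged)."""
    Bi = np.atleast_2d(B_row)                  # shape (1, n)
    BPBt = float(Bi @ P @ Bi.T)
    P_new = P - (a * q) * (P @ Bi.T @ Bi @ P) / (1.0 + a * q * BPBt)
    mu_new = P_new @ (np.linalg.solve(P, mu) + a * q * (Bi.T * y).ravel())
    innov = float(y - Bi @ mu)
    b_new = b + q - (a * q * innov**2) / (1.0 + a * q * BPBt)
    return mu_new, P_new, b_new   # b_new < 0 signals an empty intersection
```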

3.2 Updated Posterior

The posterior distribution is a combined result of expression (4) and the Fogel/Huang algorithm; we use $N_{\mu_m,P_m,b_m}(\nu', Q)$ as the posterior. By looking at the recursions in expressions (7) and (8), we notice that, if $q_{i+1}a = 1$ for every $i$, then the final approximation is in the radially truncated Gaussian family.

Some examples clarify the use of our algorithm.

Example 2 Consider the situation:

$$z = \begin{bmatrix} 4 & 1 \\ 2 & 3 \end{bmatrix}x + \omega, \qquad \omega \sim N_{0,I,1}(0,I), \qquad x \sim N_{a,B,1/4}(a,B)$$

where $x$ is unknown and:

$$a = \begin{bmatrix} 1 \\ 1 \end{bmatrix}, \qquad B = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix}, \qquad z = \begin{bmatrix} 2 \\ 3 \end{bmatrix}.$$

Figure 4.a shows the initial ellipsoids of the problem. The region of possible values of $x$ is the intersection of ellipsoids. Figure 4.b shows the strips generated at expression (5); the intersection of these strips with the prior ellipsoid will define the posterior approximation.

The algorithm is applied and results are shown in figure 4.c. The largest hatched ellipse is the result of incorporating $z_1 = 2$; the smallest hatched ellipse is the final result after incorporation of $z_2 = 3$. The approximate posterior is $N_{c,D,e}(f,G)$ where $e = 0.257$ and:

$$c = \begin{bmatrix} 0.363 \\ 0.547 \end{bmatrix}, \quad D = \begin{bmatrix} 0.4488 & -0.3272 \\ -0.3272 & 0.7555 \end{bmatrix}, \quad f = \begin{bmatrix} 0.321 \\ 0.7418 \end{bmatrix}, \quad G = \begin{bmatrix} 0.091 & -0.0867 \\ -0.0867 & 1.8857 \end{bmatrix}. \quad \Box$$

The approximations found by the algorithm are correct in that the intersections are always interior to the successive approximations. It should be noted that the original prior distribution for $x$ is almost flat; it is possible, using our approach, to approximate situations where the strict Gaussian model would be brittle.

As a comparison, we analyze what would have happened if we had used an approximation strategy in the spirit of [1] or [2]. We would pretend $x$ and $\omega$ to be distributed as unbounded Gaussians (with the same means and variances). Now we find a crucial problem: how to combine the fact that $x$ has an elliptic region with radius 1/4 and $\omega$ has an elliptic region with radius 1? By using Bayes rule on the unbounded Gaussians $N(0,I)$ and $N(a,B)$, we obtain an unbounded Gaussian $N(f,G)$. The hatched ellipses in figure 4.d show the results of using these values of mean and variance with radius 1/4, 5/8 and 1. All ellipses fail to cover the region where $x$ is known to lie. Radius 1/4 is extreme: almost all values of $x$ inferred from this posterior are inconsistent with prior expectations!

Example 3 Consider the situation:

$$z = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}x + \omega, \qquad \omega \sim N_{0,I,9}(0,I), \qquad x \sim N_{0,I,4}(0,I),$$

$z = [3, 7]^T$ and $x$ is unknown.

Figure 5.a shows the ellipses defined by prior expectations and measurements. It is clear that the measurement is inconsistent, indicating either the presence of outliers or mismodeling. Incorporation of $z_1$ yields the hatched ellipse in the figure; $y_1$ is consistent. When $z_2$ is incorporated, we obtain $b_2 = -11.49$, indicating failure. $\Box$

Notice that detection of inconsistency is important and not trivial when using Gaussian distributions. If we were to suppose that $x \sim N(0,I)$ and $\omega \sim N(0,I)$, then the posterior would be

$$N\left(\begin{bmatrix} 1.5 \\ 3.5 \end{bmatrix}, \begin{bmatrix} 0.5 & 0 \\ 0 & 0.5 \end{bmatrix}\right).$$

Again, it is not clear how to choose the radius for the truncated posterior. Figure 5.b shows the resulting ellipses for radius 4, 6.5 and 9; none of them reflect the true state of affairs.

Figure 4: Example of Inference Algorithm for Bounded Distributions. Panels: (a) Initial Ellipsoids; (b) Intersecting Strips; (c) Bounded Approximation; (d) Unbounded Approximation.

Figure 5: Example of Inference Algorithm for Bounded Distributions in Case of No Intersection. Panels: (a) Bounded Approximation; (b) Unbounded Approximation.

3.3 Inferences based on Maximum a Posteriori

A common estimate is the one obtained by maximizing the posterior density $p(x|z)$. Consider $x \sim N_{\mu,M,k}(\nu,P)$. We take the estimate $\hat{x} = \arg\max_x N_{\mu,M,k}(\nu,P)$. The optimization problem is simplified if we perform a double diagonalization. We look for:

$$\hat{w} = \arg\max_w N_{\omega,D,k}(0,I).$$

There are two possible cases:

- If $\omega^T D^{-1}\omega \le k$ then 0 is inside the defining ellipsoid. Since the Gaussian is unimodal, the maximum a posteriori is $\hat{w} = 0$ (it must be transformed back to $\hat{x}$).

- Otherwise, the maximum occurs on the boundary of the defining ellipsoid. So the optimization problem is:

$$\hat{w} = \arg\min_{\{w:(w-\omega)^T D^{-1}(w-\omega)=k\}} w^T w.$$

Defining a Lagrange multiplier $\lambda$, we obtain the Lagrangian:

$$\mathcal{L}(w,\lambda) = \lambda\left[(w-\omega)^T D^{-1}(w-\omega) - k\right] + w^T w.$$

Differentiation of the Lagrangian with respect to $w$ and $\lambda$ produces a set of equations:

$$w_i + \lambda\frac{w_i - \omega_i}{D_{ii}} = 0 \ \text{ for } i = 1, \ldots, n \qquad \text{and} \qquad \sum_{i=1}^n\frac{(w_i - \omega_i)^2}{D_{ii}} = k.$$

If we isolate $w_i$ in each one of the linear equations we obtain $w_i = \frac{\omega_i\lambda}{\lambda + D_{ii}}$. Combining all equations:

$$\sum_{i=1}^n D_{ii}\left(\frac{\omega_i}{\lambda + D_{ii}}\right)^2 = k. \qquad (10)$$

We must solve this equation for $\lambda$. Consider the function $g(\lambda) = \sum_{i=1}^n D_{ii}\left(\frac{\omega_i}{\lambda + D_{ii}}\right)^2$. $g(\lambda)$ is always positive and goes to zero as $\lambda$ goes either to infinity or to minus infinity. $g(\lambda)$ has singularities at $-D_{ii}$ for all $i$; at these points it goes to infinity. Suppose we order the values of $-D_{ii}$ so that $-D_1 \le -D_2 \le \ldots \le -D_n < 0$. We know that the solution of equation (10) that corresponds to a minimum is a positive $\lambda$, since this is the only way we can have $w_i < \omega_i$ for all $i$. So the solution must lie in the interval $(-D_n, \infty)$. So the constrained minimization is solved by performing a double diagonalization, sorting the values of $D_{ii}$ and searching for the solution of equation (10) in the interval $(-D_n, \infty)$. Since the function $g(\lambda)$ is decreasing in this interval, Newton's method will work if started with any value $\lambda$ larger than $-D_n$ but smaller than the solution.

3.4 Inferences based on Square Loss

Another common estimate is obtained by minimizing the expected square loss $L(x,\hat{x}) = (x-\hat{x})^T(x-\hat{x})$. The optimal estimate is the conditional mean $\hat{x} = E[x|z]$ [4]. Since we have the posterior, we can evaluate the mean by the methods previously presented.

4 Filtering with the Truncated Gaussian

Using all results so far presented, we can derive a filtering scheme akin to the Kalman filter but using truncated Gaussians.

Consider first the simple linear system:

$$x_{i+1} = Ax_i \qquad (11)$$
$$z_{i+1} = Bx_{i+1} + \omega_i.$$

We assume $x_o$ is distributed as a truncated Gaussian and $\omega$ is distributed as a zero-mean radially truncated Gaussian. $A$ and $B$ are matrices of appropriate dimensions.

Given this set-up, we are interested in obtaining $p(x_{i+1}|z_1,\ldots,z_{i+1})$:

$$p(x_{i+1}|z_1,\ldots,z_{i+1}) \propto p(z_{i+1}|x_{i+1},z_1,\ldots,z_i)\times p(x_{i+1}|z_1,\ldots,z_i) = p(z_{i+1}|x_{i+1})\times p(x_{i+1}|z_1,\ldots,z_i).$$

The distribution $p(z_{i+1}|x_{i+1})$ is directly obtained from the distribution of $\omega_i$. We need to find the distribution $p(x_{i+1}|z_1,\ldots,z_i) = p(Ax_i|z_1,\ldots,z_i)$. Using Theorem 4 we can obtain $p(Ax_i|z_1,\ldots,z_i)$ from the available $p(x_i|z_1,\ldots,z_i)$. So the filtering scheme has the following structure:

1. Propagate forward the uncertainty in $x_i$.

2. Use expression (4) and the Fogel/Huang algorithm in order to fuse the incoming information $z_{i+1}$ with the propagated state.

If necessary, the best estimate of $x$ can be obtained at any time.

Extension of the filter above to a noisy-state problem is straightforward. Consider the system model:

$$x_{i+1} = Ax_i + \omega_{x_i} \qquad (12)$$
$$z_{i+1} = Bx_{i+1} + \omega_{z_i}.$$

In this case we must obtain $p(x_{i+1}|z_1,\ldots,z_i)$ by calculating $p(Ax_i|z_1,\ldots,z_i)$ and combining it with $p(\omega_{x_i})$. Having done this, we can propagate forward the uncertainty in $x_i$ (by properly approximating the summation of $Ax_i$ and $\omega_{x_i}$) and fuse it with the incoming information (through the Fogel/Huang algorithm).
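Putting the pieces together, a schematic filter cycle for system (11) might look as follows. This is a sketch of ours under the report's recipe, not a definitive implementation: it reuses the hypothetical fogel_huang_update function sketched in Section 3.1, the function names are ours, and NumPy is assumed:

```python
import numpy as np

def propagate(mu, P_ell, b, A):
    """Theorem 4: push the defining ellipsoid (mu, P_ell, b) through x -> A x."""
    return A @ mu, A @ P_ell @ A.T, b

def filter_step(mu, P_ell, b, A, B, z, a):
    """One cycle: propagate the state ellipsoid, then fuse measurement z
    row by row with the fogel_huang_update sketch from Section 3.1."""
    mu, P_ell, b = propagate(mu, P_ell, b, A)
    q = 1.0 / a   # q * a = 1 keeps the result radially truncated (Section 3.2)
    for i in range(B.shape[0]):
        mu, P_ell, b = fogel_huang_update(mu, P_ell, b, B[i], z[i], a, q)
        if b < 0:  # step 4 of the summary: empty intersection
            raise ValueError("inconsistent measurement")
    return mu, P_ell, b
```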

5 Using the truncated Gaussian: Disparity Constraint

In this section we analyze a practical problem in Computer Vision using truncated Gaussians. The example will illustrate the power of the method.

Consider two cameras perfectly aligned so that all epipolar lines are parallel. Suppose a feature is detected in one camera at pixel $x_2$. The correspondence to the feature must lie on the epipolar line in the other camera at some pixel $x_1$. The question is: how much of the epipolar line should we explore in order to find the correspondence $x_1$?

From the basic optics of the problem we can derive the following equation [6]:

$$x_1 = x_2 + \frac{Bf}{z}, \qquad (13)$$

where $x_2$ is the known coordinate of the correspondence (in pixels), $B$ is the baseline distance (in meters), $f$ is the focal length (in pixels) and $z$ is the depth of the feature (in meters). Only $z$ is unknown; as an example, we take $x_2 = 0$, $f = 600$ and $B = 0.5$.

A powerful constraint that can be used is the fact that the point $x_1$ cannot be arbitrarily far from the given value of $x_2$. This constraint has not been justified or analyzed with the help of a statistical model.

The model (13) does not take into account the fact that every camera has finite depth of view. Not all values of $z$ are allowed in a real camera. This can be modeled by a radially truncated Gaussian: $p_Z(z) = N_{\mu,\sigma^2,k}(\mu,\sigma^2)$. As an example, we take $\mu = 5$, $\sigma^2 = 4$ and $k = 3$. Note that the variance is large, reflecting a possible situation of ignorance with respect to $z$. The distribution of $z$ is shown in figure 6.a. The distribution of $x_1$ is [10]:

$$p_{X_1}(x_1) = \frac{Bf}{x_1^2}\,p_Z\!\left(\frac{Bf}{x_1}\right).$$

This distribution is graphed in figure 6.b.

A simple calculation will show that $x_1 \in [35.4438, 195.325]$. This could have been obtained from a tolerance set approach: if we simply assume $z \in [1.5359, 8.4641]$, then $x_1 \in [35.4438, 195.325]$. But notice that by using truncated distributions, we obtain much more information beyond the bounds: we know where we should invest time searching if we have only bounded resources; we know the means and variances of the quantities.

Consider now the following extension of the problem: suppose we have additional noise present in the imaging process, so that:

$$x_1 = x_2 + \frac{Bf}{z} + \omega,$$

where $\omega \sim N_{0,1,10}(0,1)$. How are we to use this valuable information if we work only with bounds? We show now how to easily handle this in the framework of bounded distributions.

We can easily obtain the mean and variance of $x_2 + (Bf)/z$ by numerical integration. If this is considered a burden, we can linearize expression (13) around the mean of $z$:

$$x_1 \approx x_2 + \frac{Bf}{\mu} - \frac{Bf}{\mu^2}(z - \mu)$$

and from this we can approximate the mean and variance of $x_1$. With the values previously outlined, we obtain mean 60 and variance 382.25. Since the standard deviation is much smaller than the domain of definition, we take $N_{115.38,6490.5,1}(60, 382.25)$ as an approximation for the distribution of $x_2 + (Bf)/z$. Figure 6.c shows the agreement between approximation and correct distribution. Now we can proceed and combine this with $\omega$; we can actually use this measurement for filtering purposes through the algorithms outlined in previous sections.
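The numbers in this section can be reproduced directly; a sketch of ours (NumPy/SciPy assumed), which uses the truncated variance of $z$ from Section 2.4 inside the linearization:

```python
import numpy as np
from scipy.stats import chi2

B, f, x2 = 0.5, 600.0, 0.0
mu, sigma2, k = 5.0, 4.0, 3.0

# Support of z, and hence of x1 = x2 + B*f/z (the depth-of-view constraint):
half = np.sqrt(k * sigma2)
z_lo, z_hi = mu - half, mu + half              # [1.5359, 8.4641]
x1_lo, x1_hi = x2 + B*f/z_hi, x2 + B*f/z_lo    # [35.4438, 195.325]

# Linearization around mu: x1 ~ x2 + B*f/mu - (B*f/mu**2) * (z - mu).
var_z = sigma2 * chi2.cdf(k, 3) / chi2.cdf(k, 1)  # truncated variance (Sec. 2.4)
mean_x1 = x2 + B*f/mu                             # 60
var_x1 = (B*f/mu**2)**2 * var_z                   # about 382.25
print(z_lo, z_hi, x1_lo, x1_hi, mean_x1, var_x1)
```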

Since our approach is mainly approximative, it is interesting to compare its performance with similar methods in the literature. We already mentioned the inability of tolerance sets to capture all the information in the most simple situations. We now analyze another common strategy.

Figure 6: Analysis of the Disparity Constraint. Panels: (a) distribution of $z$; (b) distribution of $x_1$; (c) bounded approximation; (d) unbounded approximation.

Suppose we consider $z$ as an unbounded Gaussian variable. We approximate the distribution of $x_1$ through a linearization: $x_1 \sim N(60, 382.25)$. How do we justify the disparity constraint? We must arbitrarily set a threshold, based on extraneous heuristic knowledge, if we want to decide whether a value of $x_1$ is possible or impossible. The model neglects the physical source of the constraint, the camera's depth of field. And finally, the model is also an approximation, a bad one indeed. Figure 6.d shows the roughness of the approximation. The figure depicts the result of truncating the Gaussian $N(60, 382.25)$ at 2 standard deviations: some possible values of $x_1$ are discarded, and many impossible values of $x_1$ are included. If the correct value of $x_1$ is located between 100 and 195, a failure will result. We could try to cover all possible values of $x_1$, but unless we set the necessary thresholds in a very conservative way, some possible values of $x_1$ may be discarded as impossible.

6 Conclusion

The central idea of this research is: measurements that are bounded must be properly modeled in order to be consistently used. Our focus is not so much on data produced by selection mechanisms; we focus on data that is fundamentally bounded. We take this type of data to be fundamental in Robotics applications such as computer vision or object recognition. Our proposal is to use a family of statistical models (the truncated Gaussian family) that captures measurement boundedness. We offered a comprehensive analysis of estimation aspects for the truncated Gaussian: algorithms for information handling, updating and filtering. An open problem is to find good approximations for summations of truncated Gaussians.

The statistical approach implied by bounded distributions is more interesting than pure tolerance sets on a variety of grounds. First, we can use the powerful language of Probability theory to draw conclusions. Second, usually the mean and variance of disturbances are known or can be estimated. We should use this valuable information if available, as an example illustrates. Consider a variable $a \in [-\pi/2, \pi/2]$ and a derived quantity $b = \tan(a)$. Without any statistical knowledge, we only obtain that $b \in (-\infty,\infty)$! If instead we take $a \sim N_{0,1,1}(0,1)$, then the density of $b$ is $(1+b^2)^{-1}N_{0,1,1}(0,1)$ evaluated at $\arctan b$ [10], from where the analysis can proceed. Situations like this can appear, for example, when using triangulation in order to recover position. The adaptation of tolerance sets to the language of Probability theory is an area in which our models can be applied.

Another possible application of our results is to use a Gaussian model for unbounded data, but define selection mechanisms on the data. We discussed this strategy in section 3.2; this is a common problem in Statistics. Our methods have the advantage that, although approximate, they never assign probability zero to a possible value of the underlying unknowns. The algorithms developed here can find application in any of the usual problems of measurement error analysis, when measurements are bounded and when measurements are selected. In particular, the correct development of a Bayesian analysis of validation gates, as used by Bar-Shalom and Fortmann or Cox, is an immediate example of application.

Acknowledgement

We gratefully acknowledge Ofer Barkai for his help in the proof of Theorem 6. This paper benefited from several suggestions made by Mike Erdmann in the reading of a first draft.

References

[1] Y. Bar-Shalom and T. E. Fortmann, Tracking and Data Association, Academic Press, Boston, 1988.

[2] I. J. Cox, A Review of Statistical Data Association Techniques for Motion Correspondence, International Journal of Computer Vision, 10(1):53-66, 1993.

[3] H. Cramér, Mathematical Methods of Statistics, Princeton University Press, 1946.

[4] M. DeGroot, Optimal Statistical Decisions, McGraw-Hill Inc., 1970.

[5] M. Erdmann, Randomization in Robot Tasks, International Journal of Robotics Research, 11(5):399-436, October 1992.

[6] O. Faugeras, Three-Dimensional Computer Vision: a Geometric Viewpoint, The MIT Press, 1993.

[7] E. Fogel and Y. F. Huang, On the Value of Information in System Identification - Bounded Noise Case, Automatica, 18(2):229-238, 1982.

[8] W. E. L. Grimson, Object Recognition by Computer: the Role of Geometric Constraints, MIT Press, 1990.

[9] N. L. Johnson and S. Kotz, Distributions in Statistics: Continuous Multivariate Distributions, John Wiley and Sons, Inc., New York, 1972.

[10] A. Papoulis, Probability, Random Variables and Stochastic Processes, 3rd edition, McGraw-Hill, 1991.

[11] L. D. Servi and Y. C. Ho, Recursive Estimation in the Presence of Uniformly Distributed Measurement Noise, IEEE Trans. on Automatic Control, AC-26(2):563-564, 1981.


A Proofs of Theorems

Theorem 1 Given an expected value $\mu$, a covariance matrix $Q$ and the fact that a distribution is zero outside the set $\{x : (x-\mu)^T Q^{-1}(x-\mu) \le k\}$, a maximum entropy distribution that obeys these conditions is a truncated Gaussian $N_{\mu,cQ,k'}(\mu,cQ)$, where $c = \Pr(\chi^2_n \le k)(\Pr(\chi^2_{n+2} \le k))^{-1}$ and $k' = k\Pr(\chi^2_{n+2} \le k)(\Pr(\chi^2_n \le k))^{-1}$.

Proof: It is known that the distribution which is positive in a region $C$ and which maximizes entropy for given mean and covariance matrix is of the form [10]:

$$p(x) = \lambda_o\exp\left(-\lambda_1^T x - (x-\mu)^T\Lambda_2^{-1}(x-\mu)\right)$$

for $x \in C$ and zero otherwise. $\lambda_o$, $\lambda_1$ and $\Lambda_2$ are respectively a scalar, a vector and a matrix.

The key to this optimization is to note that $w\log w$ is convex and the constraints are linear, so the solutions for $\lambda_o$, $\lambda_1$ and $\Lambda_2$, if they exist, are unique and optimal. By straightforward substitution, we notice that if $p(x) = N_{\mu,cQ,k'}(\mu,cQ)$ then the mean is $\mu$, the variance is $Q$ [9] and the defining ellipsoid is $\{(x-\mu)^T Q^{-1}(x-\mu) \le k\}$. So the truncated Gaussian is the maximum entropy solution. $\Box$

Theorem 2 The moment generating function of a truncated Gaussian $N_{\omega,D,k}(0,I)$ is:

$$\phi(t) = \frac{q(\omega - t,D,k)}{q(\omega,D,k)}\exp\left(\frac{t^T t}{2}\right). \qquad (14)$$

Proof:

$$\phi(t) = \frac{(q(\omega,D,k))^{-1}}{\sqrt{(2\pi)^n}}\int_{\{\sum_{i=1}^n d_i(z_i-\omega_i)^2\le k\}}\exp\left(-\frac{1}{2}z^T z + t^T z\right)dz$$

$$= \frac{(q(\omega,D,k))^{-1}}{\sqrt{(2\pi)^n}}\exp\left(\frac{t^T t}{2}\right)\int_{\{\sum_{i=1}^n d_i(z_i-\omega_i)^2\le k\}}\exp\left(-\frac{1}{2}(z-t)^T(z-t)\right)dz$$

$$= \frac{(q(\omega,D,k))^{-1}}{\sqrt{(2\pi)^n}}\exp\left(\frac{t^T t}{2}\right)\int_{\{\sum_{i=1}^n d_i(w_i+t_i-\omega_i)^2\le k\}}\exp\left(-\frac{1}{2}w^T w\right)dw$$

$$= (q(\omega,D,k))^{-1}\exp\left(\frac{t^T t}{2}\right)q(\omega - t,D,k). \qquad \Box$$

Theorem 4 If $x \sim N_{\mu,M,k}(\nu,P)$ and $y = Ax$ (where $A$ is any non-singular square matrix) then $y \sim N_{A\mu,AMA^T,k}(A\nu,APA^T)$.

Proof: The distribution of $y$ is given by $p_Y(y) = \frac{1}{|\det(A)|}p_X(A^{-1}y)$; this gives:

$$p_Y(y) = \frac{1}{|\det(A)|}\,\frac{(q(\nu,P,\mu,M,k))^{-1}}{|(2\pi)^n\det(P)|^{1/2}}\exp\left(-\frac{1}{2}(y-A\nu)^T A^{-T}P^{-1}A^{-1}(y-A\nu)\right)I_{\{(y-A\mu)^T A^{-T}M^{-1}A^{-1}(y-A\mu)\le k\}}(y)$$

$$= \frac{(q(\nu,P,\mu,M,k))^{-1}}{|(2\pi)^n\det(A)\det(P)\det(A)|^{1/2}}\exp\left(-\frac{1}{2}(y-A\nu)^T(APA^T)^{-1}(y-A\nu)\right)I_{\{(y-A\mu)^T(AMA^T)^{-1}(y-A\mu)\le k\}}(y)$$

$$= N_{A\mu,AMA^T,k}(A\nu,APA^T). \qquad \Box$$

Theorem 5 The distribution of $z$ has expected value $\bar{\mu}_z = \bar{\mu}_x + \bar{\mu}_y$ and covariance $\bar{P}_z = \bar{P}_x + \bar{P}_y$; the distribution of $z$ is positive only inside the ellipsoid defined by $\{z : (z-\mu_z)^T M_z^{-1}(z-\mu_z) \le 1\}$ where

$$\mu_z = \mu_x + \mu_y, \qquad M_z = k_x M_x + k_y M_y.$$

Proof: Given the independence of $x$ and $y$, the expressions for $\bar{\mu}_z$ and $\bar{P}_z$ are straightforward. It is necessary to prove that $p(z)$ is positive in an ellipsoid $\{z : (z-\mu_z)^T M_z^{-1}(z-\mu_z) \le 1\}$. For this, apply a linear transformation to $x$ and $y$ so that:

$$x' = \Theta^T(\sqrt{\Lambda^{-1}}V^T)(x-\mu_x), \qquad y' = \Theta^T(\sqrt{\Lambda^{-1}}V^T)(y-\mu_y)$$

where $\Lambda$ is the eigenvalue matrix of $M_x$, $V$ is the corresponding eigenvector matrix of $M_x$ and $\Theta$ is the eigenvector matrix of $(\sqrt{\Lambda^{-1}}V^T)M_y(V\sqrt{\Lambda^{-1}})$. As a result of this transformation:

$$x' \sim N_{0,I,k_x}(0,I), \qquad y' \sim N_{0,L,k_y}(0,L)$$

where $I$ is an identity matrix of appropriate dimension and $L$ is a diagonal matrix of appropriate dimension. The distribution of $u = x' + y'$ is given by the multi-dimensional convolution between $x'$ and $y'$. So the distribution of $u$ is positive in a set obtained by sliding a spheroid along an ellipsoid. In other words, $u$ is positive inside an ellipsoid defined by $\{u : u^T(k_x I + k_y L)^{-1}u \le 1\}$. In order to translate this into a constraint about $z$, we use two facts:

$$u = \left(\Theta^T(\sqrt{\Lambda^{-1}}V^T)\right)(x + y - \mu_x - \mu_y) = \left(\Theta^T(\sqrt{\Lambda^{-1}}V^T)\right)(z - \mu_z) \qquad (15)$$

and

$$\left(V\sqrt{\Lambda^{-1}}\,\Theta(k_x I + k_y L)^{-1}\Theta^T\sqrt{\Lambda^{-1}}V^T\right)^{-1} = k_x V\sqrt{\Lambda}\,\Theta\Theta^T\sqrt{\Lambda}\,V^T + k_y V\sqrt{\Lambda}\,\Theta L\Theta^T\sqrt{\Lambda}\,V^T = k_x M_x + k_y M_y \qquad (16)$$

in order to obtain:

$$\{u : u^T(k_x I + k_y L)^{-1}u \le 1\} = \left\{z : \left(\Theta^T(\sqrt{\Lambda^{-1}}V^T)(z-\mu_z)\right)^T(k_x I + k_y L)^{-1}\left(\Theta^T(\sqrt{\Lambda^{-1}}V^T)(z-\mu_z)\right)\le 1\right\}$$

$$= \{z : (z-\mu_z)^T(k_x M_x + k_y M_y)^{-1}(z-\mu_z)\le 1\} = \{z : (z-\mu_z)^T M_z^{-1}(z-\mu_z)\le 1\}. \qquad \Box$$

Theorem 6 For $\mu_z = \mu_x + \mu_y$, $M_z = M_x + M_y$, $k_z = k_x = k_y$, $\sup_z|G_z - N_{\mu_z,M_z,k}(\mu_z,M_z)|$ is $O(\exp(-k/2))$.

Proof: It is enough to prove that, for a given $k_o$, there exists a constant $\gamma$ (which depends only on $n$) such that

$$\frac{d\,\sup_z|G_z - N_{\mu_z,M_z,k}(\mu_z,M_z)|}{dk} < -\gamma\,\frac{\exp(-k/2)}{2} \quad \text{for } k > k_o.$$

We start with the same double diagonalization used in the previous theorem. Consider the vector $[u, v]^T$ given by:

$$u = x' + y', \qquad v = y'.$$

The distribution of $u$ is given by the multi-dimensional convolution between $x'$ and $y'$:

$$p_U(u) = \int_{W_u} p_{X'}(u-v)\,p_{Y'}(v)\,dv$$

where $W_u = \{v : (u-v)^T(u-v) \le k\}\cap\{v : v^T L^{-1}v \le k\}$. Using the distributions of $x'$ and $y'$ we obtain:

$$p_U(u) = \int_{W_u}\frac{q(k)^{-2}}{|(2\pi)^{2n}\det(L)|^{1/2}}\exp\left(-\frac{1}{2}\left[(u-v)^T(u-v) + v^T L^{-1}v\right]\right)dv. \qquad (17)$$

Expression (17) can be simplified by noting two facts:

- Call $l_i$ an eigenvalue of $L$ ($L$ is diagonal):

$$\det(L) = \prod_i l_i = \prod_i\frac{(1+l_i)l_i}{1+l_i} = \prod_i(1+l_i)\prod_i(1+1/l_i)^{-1} = \det(I+L)\det(I+L^{-1})^{-1}.$$

- Consider the quadratic form $(u-v)^T(u-v) + v^T L^{-1}v$; it is a summation of terms:

$$(u_i-v_i)^2 + \frac{v_i^2}{l_i} = \frac{u_i^2}{1+l_i} - \frac{u_i^2}{1+l_i} + u_i^2 - 2u_iv_i + \frac{v_i^2}{l_i} = \frac{u_i^2}{1+l_i} + \frac{1+l_i}{l_i}\left(v_i - \frac{l_i}{1+l_i}u_i\right)^2.$$

Applying these results, expression (17) can be written as:

$$p_U(u) = I(u)\,\frac{q(k)^{-1}}{|(2\pi)^n\det(I+L)|^{1/2}}\exp\left(-\frac{1}{2}u^T(I+L)^{-1}u\right)$$

where

$$I(u) = \int_{W_u}\frac{q(k)^{-1}}{|(2\pi)^n\det((I+L^{-1})^{-1})|^{1/2}}\exp\left(-\frac{1}{2}\left(v - (I+L^{-1})^{-1}u\right)^T(I+L^{-1})\left(v - (I+L^{-1})^{-1}u\right)\right)dv,$$

so $u \sim I(u)N_k(0, I+L)$. By using expressions (15) and (16), we transform $p_U(u)$ into $p(z)$:

$$p(z) = I(z)\,\frac{q(k)^{-1}}{|(2\pi)^n\det(M_z)|^{1/2}}\exp\left(-\frac{1}{2}(z-\mu_z)^T M_z^{-1}(z-\mu_z)\right)$$

where $W_u$ is transformed into $W_z$ by straightforward substitution and, writing $T_z = (\Theta^T\sqrt{\Lambda^{-1}}V^T)(z-\mu_z)$,

$$I(z) = q(k)^{-1}\int_{W_z}\frac{1}{|(2\pi)^n\det((I+L^{-1})^{-1})|^{1/2}}\exp\left(-\frac{1}{2}\left(v - (I+L^{-1})^{-1}T_z\right)^T(I+L^{-1})\left(v - (I+L^{-1})^{-1}T_z\right)\right)dv.$$

Observe that $I(z)$ is equal to $q(k)^{-1}$ times the integral of a Gaussian in a region of $\Re^n$ (so this integral is strictly less than 1). We can state $I(z) < q(k)^{-1}$. So we can write:

$$\sup_z|G_z - N_{\mu_z,M_z,k_z}(\mu_z,M_z)| = \sup_z|I(z)N_k(\mu_z,M_z) - N_k(\mu_z,M_z)| < (q(k)^{-1}-1)\sup_z N_{\mu_z,M_z,k}(\mu_z,M_z) = \frac{q(k)^{-1}(q(k)^{-1}-1)}{|(2\pi)^n\det(M_z)|^{1/2}}.$$

Consider the following function of $k$:

$$g(k) = q(k)^{-1}(q(k)^{-1}-1).$$

Then:

$$\frac{dg(k)}{dk} = -\left[\frac{2 - \int_0^k\frac{1}{2^{n/2}\Gamma(n/2)}x^{n/2-1}\exp(-x/2)\,dx}{\left(\int_0^k\frac{1}{2^{n/2}\Gamma(n/2)}x^{n/2-1}\exp(-x/2)\,dx\right)^3}\right]\frac{k^{n/2-1}\exp\left(-\frac{k}{2}\right)}{2^{n/2}\Gamma(n/2)}$$

and since the term in brackets is larger than one:

$$\frac{dg(k)}{dk} < -\frac{k^{n/2-1}\exp\left(-\frac{k}{2}\right)}{2^{n/2}\Gamma(n/2)} < -\frac{k_o^{n/2-1}\exp\left(-\frac{k}{2}\right)}{2^{n/2}\Gamma(n/2)}$$

for all $k > k_o$. By collecting constants in $\gamma$, the result is proven. $\Box$

Claims in Section 3

$q_{i+1}$ is given by equations (9).

Call $V_{i+1}$ the square of the volume of the ellipsoid $\Omega_{i+1}$. Since $\Omega_{i+1}$ always contains the desired intersections for any value of $q_{i+1}$, we are interested in the smallest possible $\Omega_{i+1}$. The value of $q_{i+1}$ is then chosen so that $V_{i+1}$ is minimal.

We have $V_{i+1} = c_n b_{i+1}^n\det(P_{i+1})$, where $c_n$ is the volume of an $n$-dimensional spheroid. Some algebraic manipulations show that in order to minimize $V_{i+1}$ we must minimize

$$\frac{b_i + q_{i+1} - a q_{i+1}(y_{i+1} - B_{i+1}\mu_i)^2/(1 + a q_{i+1}B_{i+1}P_iB_{i+1}^T)}{1 + a q_{i+1}B_{i+1}P_iB_{i+1}^T}$$

with respect to $q_{i+1}$.

Claims in Section B

1. Equation (19) has a unique solution if $0 < \sigma^2 < 1/(n+2)$.

Proof: Call $h = c\sigma^2$; there is a one-to-one correspondence between $c$ and $h$. So we can study the following equation:

$$\frac{h\Pr(\chi^2_{n+2} \le 1/h)}{\Pr(\chi^2_n \le 1/h)} = \sigma^2.$$

We analyze the function $g(h) = \frac{h\Pr(\chi^2_{n+2}\le 1/h)}{\Pr(\chi^2_n\le 1/h)}$. Notice that:

- $\lim_{h\to 0} g(h) = 0$.

- $\lim_{h\to\infty} g(h) = \frac{1}{n+2}$. This is obtained by using the following expansion of the chi-square probability: $\Pr(\chi^2_n \le x) = e^{-x/2}(x/2)^{n/2}\sum_{i=0}^{\infty}\frac{(x/2)^i}{\Gamma(n/2+1+i)}$.

- $g(h)$ is the per-coordinate second moment of a Gaussian with covariance $hI$ radially truncated to $\{x : x^T x \le 1\}$:

$$g(h) = \left(\int_{x^Tx\le 1}\frac{1}{\sqrt{(2\pi h)^n}}\exp\left(-\frac{x^Tx}{2h}\right)dx\right)^{-1}\int_{x^Tx\le 1}x_i^2\,\frac{1}{\sqrt{(2\pi h)^n}}\exp\left(-\frac{x^Tx}{2h}\right)dx,$$

which is increasing with $h$ (variance increases with $h$).

So $g(h)$ is increasing from 0 to $1/(n+2)$. Any value of $\sigma^2$ strictly between these limits will produce a solution for $h$ and hence for $c$. $\Box$

2. The integral $\int_{x^Tx\le 1} xx^T p_4(x)\,dx$ is equal to expression (23).

Proof: Expanding the polynomial factor in $p_4$ against the base approximation $p_1$ of expression (18), we have four terms to integrate (with $\bar{\gamma}$ as defined after expression (23)):

- $\int_{x^Tx\le 1} xx^T\,1\,p_1(x)\,dx$ is equal to $\sigma^2 I$.

- $\int_{x^Tx\le 1} xx^T\,\sigma^2\mathrm{tr}(L)\,p_1(x)\,dx$ is equal to $\sigma^4\mathrm{tr}(L)I$.

- $\int_{x^Tx\le 1} xx^T\,\frac{\mu^T}{\sigma^2}x\,p_1(x)\,dx$ is zero.

- $\int_{x^Tx\le 1} xx^T(x^TLx)\,p_1(x)\,dx$ is a diagonal matrix. Consider element $i$ in the diagonal of this matrix. It is a summation of one term $L_{ii}\int_{x^Tx\le 1}x_i^4 p_1(x)\,dx$ and $n-1$ terms $L_{jj}\int_{x^Tx\le 1}x_i^2 x_j^2 p_1(x)\,dx$. The fourth moment of any $x_i$ is $6\bar{\gamma}\sigma^4$ and the second crossed moments of $x_i^2 x_j^2$ are $2\bar{\gamma}\sigma^4$. So the matrix is equal to $4\bar{\gamma}\sigma^4 L + 2\bar{\gamma}\sigma^4\mathrm{tr}(L)I$.

Putting all these terms together, we obtain expression (23). $\Box$

B Approximations of Bounded Distributions given First and Second Moments

One may want to approximate a distribution with bounded support by a truncated Gaussian. For example, it may be necessary to obtain an approximation for summations of truncated Gaussians. A natural approximation is to pick a truncated Gaussian that matches the first and second moments of the original distribution.

In this section, we assume that the distribution to be approximated is positive in the spheroid $\{x : x^T x \le 1\}$ and has a diagonal second moment matrix $Q$. This may be assumed without loss of generality if the original distribution is positive in an ellipsoid (since we can perform a double diagonalization). Approximations are valid if no eigenvalue of $Q$ exceeds $1/(n+2)$, where $n$ is the dimension of the space.


B.1 Equal Variances Case

In this section we consider a distribution $f_X(x)$ with $Q = \sigma^2 I$ and mean $\mu$. Take an initial approximation of the form:

$$p_1(x) = \frac{1}{q\sqrt{(2\pi c\sigma^2)^n}}\exp\left(-\frac{x^T x}{2c\sigma^2}\right)I_{\{x^Tx\le 1\}}(x), \qquad (18)$$

where $c$ is the solution of the equation:

$$\frac{c\Pr(\chi^2_{n+2}\le 1/(c\sigma^2))}{\Pr(\chi^2_n\le 1/(c\sigma^2))} = 1. \qquad (19)$$

This equation always has a unique solution if $0 < \sigma^2 < 1/(n+2)$, as proved in Appendix A. The limiting value $1/(n+2)$ corresponds to the second moment of a constant distribution.

The variance of $X$ under the approximation $p_1(x)$ is exactly $\sigma^2 I$. But the mean is zero. We adjust the mean through a linear term:

$$p_2(x) = \frac{1}{q\sqrt{(2\pi c\sigma^2)^n}}\exp\left(-\frac{x^T x}{2c\sigma^2}\right)\left(1 + \frac{\mu^T}{\sigma^2}x\right)I_{\{x^Tx\le 1\}}(x). \qquad (20)$$

The function $p_2(x)$ is not a distribution, since it may have negative values. So we use another approximation, $(1 + (\mu^T/\sigma^2)x) \approx \exp((\mu^T/\sigma^2)x)$, in order to obtain:

$$p_3(x) = \frac{1}{q'\sqrt{(2\pi c\sigma^2)^n}}\exp\left(-\frac{x^T x - 2c\mu^T x + c^2\mu^T\mu}{2c\sigma^2}\right)I_{\{x^Tx\le 1\}}(x). \qquad (21)$$

Essentially, we are constructing an Edgeworth-like expansion [3] for $f_X(x)$ and then approximating it by an exponential. If we have higher moments, we can increase the number of terms in the approximation by building a sequence of orthogonal polynomials (with norm based on $\exp(-x^2)$), as detailed by Cramér [3].

Example 4 Consider the distribution $f(x) = ((1-x)/2)I_{\{x^2\le 1\}}(x)$, with mean $-1/3$ and variance $2/9$. This distribution is highly skewed. We obtain $c = 1.51696$ and:

$$p_3(x) = \frac{1}{1.16599\sqrt{2\pi(2/9)c}}\exp\left(-\frac{(x + (c/3))^2}{2(2/9)c}\right)I_{\{x^2\le 1\}}(x).$$

Figure 7 shows the graphs of $f(x)$ and $p_3(x)$. $\Box$

Figure 7: Distribution $f(x)$ and the approximation $p_3(x)$.
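Equation (19) is one-dimensional in $c$ and easily solved numerically; the sketch below is ours (function name ours, SciPy assumed) and reproduces $c \approx 1.51696$ for Example 4:

```python
from scipy.optimize import brentq
from scipy.stats import chi2

def solve_c(sigma2, n):
    """Solve equation (19) for c; a unique root exists for 0 < sigma2 < 1/(n+2)."""
    def eq(c):
        u = 1.0 / (c * sigma2)
        return c * chi2.cdf(u, n + 2) / chi2.cdf(u, n) - 1.0
    # eq(1) < 0 always, and eq tends to 1/(sigma2*(n+2)) - 1 > 0 for large c.
    return brentq(eq, 1.0, 1e4)

print(solve_c(2.0 / 9.0, 1))  # about 1.51696, as in Example 4
```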

B.2 General Case

We relax the assumptions of the last section, taking $Q$ to be diagonal. Take function $p_1(\cdot)$ in expression (18) as an initial approximation. There is freedom in the choice of $\sigma^2$. We choose $\sigma^2$ to be the largest eigenvalue of $Q$; the constant $c$ is the solution of equation (19). Approximation $p_1(\cdot)$ does not produce the correct mean and variance. First we adjust the mean through a linear term, obtaining function $p_2(\cdot)$ as in expression (20). Approximation $p_2(\cdot)$ is flatter than the original distribution. We can stop here and obtain $p_3(\cdot)$ as in expression (21).

Alternatively, we can improve the approximation by adding more terms to $p_2(\cdot)$. Pick a quadratic term:

$$p_4(x) = \frac{1}{q\sqrt{(2\pi c\sigma^2)^n}}\exp\left(-\frac{x^T x}{2c\sigma^2}\right)\left(1 + \frac{\mu^T}{\sigma^2}x + (x^T L x - \sigma^2\mathrm{tr}(L))\right)I_{\{x^Tx\le 1\}}(x).$$

$L$ is a diagonal matrix that produces the correct second moment for $p_4(\cdot)$. Since $p_2(\cdot)$ is flatter than the original distribution, we define $L$ as a diagonal negative definite matrix.

Function $p_4(x)$ may not be a distribution. So we use another approximation:

$$\left(1 - \sigma^2\mathrm{tr}(L) + \frac{\mu^T}{\sigma^2}x + x^T L x\right) \approx \exp\left(l_0 + \frac{l_0}{1-\sigma^2\mathrm{tr}(L)}\frac{\mu^T}{\sigma^2}x + \frac{l_0}{1-\sigma^2\mathrm{tr}(L)}x^T L x\right), \qquad (22)$$

where $l_0$ must be obtained numerically by solving $l_0\exp(l_0) = 1 - \sigma^2\mathrm{tr}(L)$. A solution is always possible since $L$ is defined to be negative definite. The final approximation is the product of $p_1(x)$ and the approximation in expression (22).
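Since $l_0 e^{l_0} = 1 - \sigma^2\mathrm{tr}(L)$ has a right-hand side larger than 1, $l_0$ is a principal Lambert $W$ value; a minimal sketch of ours (function name ours, SciPy assumed):

```python
import numpy as np
from scipy.special import lambertw

def solve_l0(sigma2, L):
    """Solve l0 * exp(l0) = 1 - sigma2 * tr(L); tr(L) < 0 makes the
    right-hand side exceed 1, so the principal branch of W applies."""
    return float(lambertw(1.0 - sigma2 * np.trace(L)).real)

print(solve_l0(0.1, np.diag([-1.0, -2.0])))  # l0 for tr(L) = -3
```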

We now indicate how to obtain $L$. The value of $\int_{x^Tx\le 1} xx^T p_4(x)\,dx$ is (Appendix A):

$$4\bar{\gamma}\sigma^4 L + (2\bar{\gamma}\sigma^4 - \sigma^4)\mathrm{tr}(L)I + \sigma^2 I \qquad (23)$$

where $\bar{\gamma} = c^2\Pr(\chi^2_{n+4}\le 1/(c\sigma^2))(\Pr(\chi^2_n\le 1/(c\sigma^2)))^{-1}$. Call $A$ the vector of diagonal elements of $Q$ and $B$ the vector of diagonal elements of $L$; then ($\mathbf{1}$ is a vector of ones):

$$\left(I + \left(\frac{1}{2} - \frac{1}{4\bar{\gamma}}\right)\mathbf{1}\mathbf{1}^T\right)B = \frac{A - \sigma^2\mathbf{1}}{4\bar{\gamma}\sigma^4}.$$

The closed form solution of this equation is:

$$B = \left(I - \frac{2\bar{\gamma}-1}{(4+2n)\bar{\gamma} - n}\mathbf{1}\mathbf{1}^T\right)\frac{A - \sigma^2\mathbf{1}}{4\bar{\gamma}\sigma^4}. \qquad (24)$$

From $B$ we can obtain $L$. These formal manipulations do not guarantee that the resulting $L$ matrix is negative definite, as required. Notice that $(A - \sigma^2\mathbf{1})/(4\bar{\gamma}\sigma^4)$ is a vector of non-positive numbers. From the geometry of expression (24) it is clear that all eigenvalues of $Q$ must lie in a wedge in the first quadrant. If this fails to happen, we can increase $\sigma^2$. If this also fails, we can set the positive elements of $L$ to zero. Albeit crude, the strategy will always improve upon $p_2(\cdot)$.
