
AgRISTARS

3 Supporting Research

SR-L1-04031 JSC-16853

JAN 2 1 1981

A Joint Program for Agriculture and Resources Inventory Surveys Through Aerospace Remote Sensing

MAXIMUM LIKELIHOOD CLUSTERING WITH DEPENDENT FEATURE TREES

C. B. Chittineni

(E81-10186) MAXIMUM LIKELIHOOD CLUSTERING WITH DEPENDENT FEATURE TREES (Lockheed Engineering and Management) 54 p HC A04/MF A01 CSCL 12A N81-29502 Unclas G3/43 00186

Lockheed Engineering and Management Services Company, Inc., 1830 NASA Road 1, Houston, Texas 77058

NASA

Lyndon B. Johnson Space Center, Houston, Texas 77058

https://ntrs.nasa.gov/search.jsp?R=19810020964 2018-08-30T09:29:33+00:00Z

1. Report No.: SR-L1-04031, JSC-16853

2. Government Accession No.:

3. Recipient's Catalog No.:

4. Title and Subtitle: Maximum Likelihood Clustering With Dependent Feature Trees

5. Report Date: January 1981

6. Performing Organization Code:

7. Author(s): C. B. Chittineni, Lockheed Engineering and Management Services Company, Inc.

8. Performing Organization Report No.: LEMSCO-15683

9. Performing Organization Name and Address: Lockheed Engineering and Management Services Company, Inc., 1830 NASA Road 1, Houston, Texas 77058

10. Work Unit No.:

11. Contract or Grant No.: NAS 9-15800

12. Sponsoring Agency Name and Address: National Aeronautics and Space Administration, Lyndon B. Johnson Space Center, Houston, Texas 77058 (Technical Monitor: Dr. R. Heydorn)

13. Type of Report and Period Covered: Technical Report

14. Sponsoring Agency Code:

15. Supplementary Notes:

16 Abstract

In this report, maximum likelihood clustering for the decomposition of the mixture density of the data into its normal component densities is considered. The densities are approximated with first-order dependent feature trees using criteria of mutual information and distance measures. Expressions are presented for the criteria when the densities are Gaussian. By defining different types of nodes in a general dependent feature tree, maximum likelihood equations are developed for the estimation of parameters using fixed-point iterations. The field structure of the data is also taken into account in developing maximum likelihood equations. Furthermore, experimental results from the processing of remotely sensed multispectral scanner imagery data are presented.

17 Key Words (Suggested by Author(s))

Clustering, distance measures, dependent feature trees, fields, fixed-point iteration schemes, link, maximum likelihood equations, mutual information, parameter estimation, types of nodes

18 Distribution Statement

19. Security Classif. (of this report): Unclassified

20. Security Classif. (of this page): Unclassified

21. No. of Pages: 54

22. Price*

*For sale by the National Technical Information Service, Springfield, Virginia 22161

SR-L1-04031
JSC-16853

MAXIMUM LIKELIHOOD CLUSTERING WITH

DEPENDENT FEATURE TREES

Job Order 73-306

This report describes Classification activities of the Supporting Research project of the AgRISTARS program.

PREPARED BY

C. B. Chittineni

APPROVED BY

T. C. Minter, Supervisor, Techniques Development Section

U. E. Wainwright, Manager, Development and Evaluation Department

LOCKHEED ENGINEERING AND MANAGEMENT SERVICES COMPANY, INC.

Under Contract NAS 9-15800

For

Earth Resources Research Division

Space and Life Sciences Directorate

NATIONAL AERONAUTICS AND SPACE ADMINISTRATION

LYNDON B. JOHNSON SPACE CENTER

HOUSTON, TEXAS

January 1981

LEMSCO-15683

PREFACE

The techniques which are the subject of this report were developed to support the Agriculture and Resources Inventory Surveys Through Aerospace Remote Sensing program. Under Contract NAS 9-15800, Dr. C. B. Chittineni, a principal scientist for Lockheed Engineering and Management Services Company, Inc., performed this research for the Earth Resources Research Division, Space and Life Sciences Directorate, National Aeronautics and Space Administration, at the Lyndon B. Johnson Space Center.


CONTENTS

Section

1. INTRODUCTION

2. GENERAL MAXIMUM LIKELIHOOD EQUATIONS

3. APPROXIMATING PROBABILITY DENSITY FUNCTIONS WITH DEPENDENT FEATURE TREES

3.1 CONSTRUCTION OF OPTIMAL DEPENDENT FEATURE TREES

3.1.1 A CRITERION BASED ON INFORMATION MEASURE

3.1.2 A CRITERION BASED ON PROBABILISTIC DISTANCE MEASURES

3.2 EXPRESSIONS FOR THE CRITERIA WHEN THE DISTRIBUTIONS OF THE FEATURES ARE GAUSSIAN

3.2.1 AN EXPRESSION FOR THE MUTUAL INFORMATION BETWEEN FEATURES x_i AND x_j

3.2.2 AN EXPRESSION FOR ΔJ_12(x_i, x_j) WHEN FEATURES x_i AND x_j ARE NORMALLY DISTRIBUTED

4. A GENERAL DEPENDENT FEATURE TREE

4.1 DIFFERENT TYPES OF NODES

4.2 AN EXPRESSION FOR THE COVARIANCE BETWEEN THE FEATURES CONNECTED BY A PATH IN A DEPENDENT FEATURE TREE

5. MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF THE DENSITY FUNCTIONS

5.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE A PRIORI PROBABILITIES, MEANS, AND VARIANCES

5.1.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE I NODES

5.1.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE II NODES

5.1.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE III NODES

5.1.4 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVA NODES

5.1.5 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVB NODES

5.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCES BETWEEN FEATURES

5.2.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE I AND TYPE II NODES

5.2.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE IVA AND TYPE II OR TYPE III NODES

5.2.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPES OF NODES OTHER THAN THOSE CONSIDERED IN SECTIONS 5.2.1 AND 5.2.2

6. EXPERIMENTAL RESULTS

7. CONCLUDING REMARKS

8. REFERENCES

Appendix

A. DERIVATIONS OF MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF A TYPE III FEATURE

B. MAXIMUM LIKELIHOOD EQUATIONS WITH FIELD STRUCTURE

C. DEPENDENT FEATURE TREES WITH THE NODES REPRESENTING FEATURE SUBSETS


FIGURES

Figure

3-1 An example of a dependent feature tree

4-1 A general dependent feature tree

4-2 Illustration of a type IVa node

4-3 Illustration of a type IVb node

4-4 An example dependent feature tree

4-5 A path between features x_1 and x_{r+s} in a dependent feature tree

4-6 A path between features x_1 and x_r in a dependent feature tree

5-1 Reduction in the number of parameters with the dimensionality

5-2 A link in a dependent feature tree with a type I node

5-3 A typical type II node in a general dependent feature tree

5-4 A typical type III node in a general dependent feature tree

5-5 A typical type IVa node in a general dependent feature tree

5-6 A typical type IVb node in a general dependent feature tree

5-7 A typical link connecting type I and type II nodes

5-8 A typical link connecting type IVa and type II or type III nodes in a general dependent feature tree

6-1 Optimal dependent feature tree of segment 1648

6-2 Optimal dependent feature tree of segment 1739

6-3 Arbitrary dependent feature tree used in the experiment

A-1 Illustration of a typical type III node in a general dependent feature tree


1. INTRODUCTION

Recently, considerable interest has been shown in developing techniques for the classification of imagery data (such as remotely sensed multispectral scanner data acquired by the Landsat series of satellites) for inventorying natural resources, monitoring crop conditions, and detecting changes in natural and manmade objects. Nonsupervised classification or clustering techniques have been found to be effective in the analysis of remotely sensed data (ref. 1). The approach of clustering for imagery data classification, in general, involves two steps: (1) partitioning the image into its inherent modes or into its homogeneous parts and (2) labeling the clusters using information from a given set of labeled patterns.

In practical applications of pattern recognition such as remote sensing, it is difficult to obtain labels for the patterns. In remote sensing imagery, an analyst-interpreter provides the labels for the picture elements (pixels) by examining imagery films and using other information (e.g., crop growth stage models and historic information). Remote sensing imagery usually has a field structure, and it is recognized that fields are easier to label than are pixels. The development of algorithms for locating fields has attracted the attention of several researchers in the recent literature (refs. 2-5).

Considerable interest has been shown in applying maximum likelihood equations for the decomposition of the mixture density of the imagery data into its normal component densities (refs. 5-9). Recently, methods have been developed (refs. 10, 11) for probabilistically labeling the modes of the data using information from a given set of labeled patterns and, also, from a given set of labeled fields.

In decomposing the mixture density of the data into its normal component densities, the parameters of the component densities and the a priori probabilities of the modes are iteratively computed using maximum likelihood equations coupled with a split and merge sequence. The updating of the parameters is usually stopped after a few iterations because of the large amount of computation.

Also, in practical problems (remote sensing imagery data of several acquisitions), a large number of parameters will be estimated. For a fixed sample size, the accuracy of estimation usually decreases (ref. 12) as the number of parameters to be estimated increases. To overcome the computational requirements and the large number of parameters to be estimated with the usual maximum likelihood clustering technique, maximum likelihood equations are obtained in this report by approximating the cluster conditional densities with first-order tree dependence (refs. 13, 14) among the features. The field structure of the data is also taken into account. Either the average mutual information between the features (ref. 13) or the probabilistic distance measures (ref. 15) can be used to construct optimal dependent feature trees for a given data type.

This paper is organized as follows. General maximum likelihood equations are presented in section 2. Section 3 concerns the problem of approximating probability density functions with dependent feature trees using the criteria of information measure and probabilistic distance measure. Expressions are derived for the criteria when the distributions of the features are Gaussian. In section 4, a general dependent feature tree and its various types of nodes are described, and expressions for the covariance between the features not connected by a single link are derived. Maximum likelihood equations for the parameters of the density functions when approximated by dependent feature trees are developed in section 5. Experimental results from the processing of remotely sensed multispectral scanner imagery data are presented in section 6. Section 7 contains the concluding remarks. Detailed derivations of maximum likelihood equations are given in appendix A. In appendix B, the field structure of the data is taken into account in developing maximum likelihood equations. An expression is derived in appendix C for the mutual information between the feature subsets when they are represented by the nodes in a dependent feature tree. Also, expressions are derived for the covariance between the feature subsets when they are connected by a path in a dependent feature tree.

2. GENERAL MAXIMUM LIKELIHOOD EQUATIONS

General maximum likelihood equations are presented in this section for the decomposition of the mixture density of the data into its component densities. It is assumed that a set $\mathcal{X} = \{X_1, \ldots, X_N\}$ of N unlabeled patterns, each of dimension n, is given. These patterns are assumed to be drawn independently from the mixture density

$$p(X|\theta) = \sum_{j=1}^{m} p(X|\omega_j,\theta_j)P(\omega_j) \tag{2-1}$$

where $\theta$ is a fixed but unknown parameter vector, $\theta_j$ is a parameter vector for the jth cluster, and m is the number of modes or clusters in the data. Let $P(\omega_j)$ and $p(X|\omega_j,\theta_j)$ be the a priori probabilities of the modes and the mode conditional densities, respectively. The likelihood of the observed pattern vectors is, by definition, the joint density

$$p(\mathcal{X}|\theta) = \prod_{k=1}^{N} p(X_k|\theta) \tag{2-2}$$

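Since the logarithm is a monotonic function of its argument, taking the gradient of the logarithm of equation (2-2) with respect to $\theta_i$ results in

$$\nabla_{\theta_i}\ell = \sum_{k=1}^{N}\frac{1}{p(X_k|\theta)}\nabla_{\theta_i}\left[\sum_{j=1}^{m}p(X_k|\omega_j,\theta_j)P(\omega_j)\right] \tag{2-3}$$

where

$$\ell = \sum_{k=1}^{N}\log[p(X_k|\theta)] \tag{2-4}$$

and $\nabla_{\theta_i}\ell$ is the gradient of $\ell$ with respect to $\theta_i$. From the Bayes rule, the a posteriori probability can be written as

$$P(\omega_i|X_k,\theta) = \frac{p(X_k|\omega_i,\theta_i)P(\omega_i)}{p(X_k|\theta)} \tag{2-5}$$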

If the elements of $\theta_i$ and $\theta_j$ are assumed to be functionally independent, using equation (2-5) in equation (2-3) yields

$$\nabla_{\theta_i}\ell = \sum_{k=1}^{N}P(\omega_i|X_k,\theta)\nabla_{\theta_i}\log[p(X_k|\omega_i,\theta_i)P(\omega_i)] \tag{2-6}$$

The following likelihood equation for the a priori probabilities can easily be obtained from equation (2-6) by introducing Lagrangian multipliers to take into account the probability constraints on $P(\omega_i)$.

$$P(\omega_i) = \frac{1}{N}\sum_{k=1}^{N}P(\omega_i|X_k,\theta) \tag{2-7}$$

Since $\theta_i$ is a parameter vector of the density of the ith cluster, equation (2-6) can be written as

$$\nabla_{\theta_i}\ell = \sum_{k=1}^{N}P(\omega_i|X_k,\theta)\nabla_{\theta_i}\log[p(X_k|\omega_i,\theta_i)] \tag{2-8}$$

From equation (2-8), general maximum likelihood equations for the parameters of

the cluster conditional densities can be obtained.
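The fixed-point use of equations (2-5) and (2-7) can be illustrated with a minimal sketch. It assumes univariate Gaussian mode conditional densities purely for illustration; the function names `posteriors`, `update_priors`, and `gaussian` are hypothetical and do not come from the report.

```python
import math


def gaussian(mean, var):
    """Hypothetical helper: univariate normal density with the given mean and variance."""
    return lambda x: math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def posteriors(x, priors, densities):
    """A posteriori probabilities P(w_i | x) by the Bayes rule, as in equation (2-5)."""
    joint = [P * p(x) for P, p in zip(priors, densities)]
    total = sum(joint)  # value of the mixture density at x, as in equation (2-1)
    return [j / total for j in joint]


def update_priors(samples, priors, densities):
    """One fixed-point iteration of the a priori probabilities, as in equation (2-7)."""
    sums = [0.0] * len(priors)
    for x in samples:
        for i, post in enumerate(posteriors(x, priors, densities)):
            sums[i] += post
    return [s / len(samples) for s in sums]


if __name__ == "__main__":
    modes = [gaussian(-1.0, 1.0), gaussian(1.0, 1.0)]
    print(update_priors([-1.2, -0.8, 0.9, 1.1, 1.4], [0.5, 0.5], modes))
```

In practice the update would be repeated, together with the updates of the density parameters, until the estimates stop changing.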


3. APPROXIMATING PROBABILITY DENSITY FUNCTIONS WITH DEPENDENT FEATURE TREES

If the probability density function of the ith class is approximated by a first-order dependent feature tree, it can be written as

$$p_i(X) = \prod_{\ell=1}^{n} p_i(x_{m_\ell}|x_{m_{j(\ell)}}), \qquad 0 \le j(\ell) < \ell \tag{3-1}$$

where $x_{m_\ell}$ is the $m_\ell$th feature of pattern vector X; $(m_1, \ldots, m_n)$ is an unknown permutation of integers 1, 2, ..., n; and $p(x_i|x_0)$, by definition, is equal to $p(x_i)$. Each variable in the above expansion may be conditioned upon, at most, one of the other variables. Figure 3-1 shows an example of a dependent feature tree.

Figure 3-1.- An example of a dependent feature tree.

The component of the density in the product approximation that is represented by a single link, such as the one connecting features $x_5$ and $x_6$ in figure 3-1, is $p(x_6|x_5)$. The density that is approximated by the dependence tree of figure 3-1 can be written as

$$p(X) = p(x_1)p(x_2|x_1)p(x_3|x_2)p(x_4|x_2)p(x_5|x_1)p(x_6|x_5)p(x_7|x_5)p(x_8|x_5) \tag{3-2}$$

3.1 CONSTRUCTION OF OPTIMAL DEPENDENT FEATURE TREES

This section concerns the problem of constructing dependent feature trees. The dependent feature tree, the density of which best approximates the true density, is proposed to be constructed using either the criterion of information preservation (ref. 13) or the criterion of class separability (ref. 15). An algorithm developed by Kruskal (ref. 16) provides an efficient computational procedure for constructing optimal dependent feature trees using the expressions developed in the following.

3.1.1 A CRITERION BASED ON INFORMATION MEASURE

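Let $p_{it}(X)$ be the approximate density of the ith class with the product approximation. That is,

$$p_{it}(X) = \prod_{\ell=1}^{n} p_i(x_{m_\ell}|x_{m_{j(\ell)}}) \tag{3-3}$$

Consider the following measure of closeness between the true and approximate densities (ref. 13). That is,

$$I(p,p_t) = \sum_{i=1}^{C}P(\omega_i)\int p_i(X)\log\left[\frac{p_i(X)}{p_{it}(X)}\right]dX \tag{3-4}$$

where C is the number of classes. From equation (3-4), it is seen that $I(p,p_t) = 0$ whenever $p_i(X)$ is equal to $p_{it}(X)$ for all X and that $I(p,p_t) > 0$ if $p_i(X)$ is different from $p_{it}(X)$ for some X. To find the product approximation for the densities or the dependent feature tree that minimizes $I(p,p_t)$, consider

$$I(p,p_t) = -\sum_{i=1}^{C}P(\omega_i)\int p_i(X)\log[p_{it}(X)]\,dX + \sum_{i=1}^{C}P(\omega_i)\int p_i(X)\log[p_i(X)]\,dX$$

$$= -\sum_{i=1}^{C}P(\omega_i)\sum_{\ell=1}^{n}I_i[x_\ell,x_{j(\ell)}] - \sum_{i=1}^{C}P(\omega_i)\sum_{\ell=1}^{n}\int p_i(x_\ell)\log[p_i(x_\ell)]\,dx_\ell + \sum_{i=1}^{C}P(\omega_i)\int p_i(X)\log[p_i(X)]\,dX \tag{3-5}$$

where

$$I_i[x_\ell,x_{j(\ell)}] = \int p_i(x_\ell,x_{j(\ell)})\log\left[\frac{p_i(x_\ell,x_{j(\ell)})}{p_i(x_\ell)p_i(x_{j(\ell)})}\right]dx_\ell\,dx_{j(\ell)} \tag{3-6}$$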

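The quantity $I_i[x_\ell,x_{j(\ell)}]$ is the mutual information between features $x_\ell$ and $x_{j(\ell)}$ of class i. Because only the mutual information terms in equation (3-5) depend on the structure of the tree, minimizing $I(p,p_t)$ is equivalent to maximizing the sum of the weighted mutual information over the links of the tree. From equation (3-5), Kruskal's algorithm (ref. 16) can therefore be used efficiently to construct optimal dependent feature trees.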
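The tree construction can be sketched as a maximum-weight spanning tree found by Kruskal's procedure. The sketch below is illustrative only: the function name `max_weight_tree` and the dictionary layout of the link weights are assumptions, and the weights stand for whatever link criterion is chosen (for example, the weighted mutual information between a pair of features).

```python
def max_weight_tree(n, weights):
    """Kruskal's procedure for a maximum-weight spanning tree over n features.

    weights maps a pair (a, b) of feature indices to the link weight.
    Returns the list of selected links, heaviest first.
    """
    parent = list(range(n))

    def find(a):
        # Follow parent pointers to the representative of a's component.
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    links = []
    # Examine candidate links in order of decreasing weight.
    for (a, b), _ in sorted(weights.items(), key=lambda kv: kv[1], reverse=True):
        ra, rb = find(a), find(b)
        if ra != rb:  # accept the link only if it does not close a loop
            parent[ra] = rb
            links.append((a, b))
            if len(links) == n - 1:
                break
    return links


if __name__ == "__main__":
    w = {(0, 1): 0.9, (0, 2): 0.1, (1, 2): 0.5, (2, 3): 0.4, (0, 3): 0.3, (1, 3): 0.2}
    print(max_weight_tree(4, w))
```

Any node of the resulting tree may then be taken as the root to fix the direction of conditioning.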

3.1.2 A CRITERION BASED ON PROBABILISTIC DISTANCE MEASURES

A probability density function, like any other function, can be approximated by a number of different procedures. In the sense of preserving the separability between the classes, it is proposed that a criterion based on probabilistic distance measures such as divergence be used to construct dependent feature trees. The measure of closeness between the approximate and true densities is defined as

$$J_{12} = \int p_1(X)\log\left[\frac{p_{1t}(X)}{p_{2t}(X)}\right]dX + \int p_2(X)\log\left[\frac{p_{2t}(X)}{p_{1t}(X)}\right]dX \tag{3-7}$$

From equation (3-7), it is seen that $J_{12}$ is large whenever the ratio of $p_{1t}(X)$ to $p_{2t}(X)$ is large in the region of class 1 and the ratio of $p_{2t}(X)$ to $p_{1t}(X)$ is large in the region of class 2. By using the product approximation of equation (3-3) for the densities $p_{it}(X)$, equation (3-7) can be written as follows.

'12

n f+ .E j p2(

x)]

. £ fp rx ,x llogpfc

mj(T)

(3-8)

where

and

K = Ms)'dxm.

Ms)'(S)

dxmi

(3-9)

(3-10)

If more than two classes exist, the expected value of the measure of closeness

defined over pairs of classes can be used to obtain optimal approximations for

the densities (ref. 17). From equation (3-8), Kruskal's algorithm (ref. 16)

can be efficiently used to construct optimal dependent feature trees.

3.2 EXPRESSIONS FOR THE CRITERIA WHEN THE DISTRIBUTIONS OF THE FEATURES ARE GAUSSIAN

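Expressions are derived in this section for the mutual information and for $\Delta J_{12}$ between the features, assuming that the distributions of the features are Gaussian. If $p_\ell(x_i)$, the density of feature $x_i$ of the $\ell$th class, is Gaussian, it can be written as

$$p_\ell(x_i) = \frac{1}{\sqrt{2\pi\sigma_i(\ell)}}\exp\left\{-\frac{[x_i-u_i(\ell)]^2}{2\sigma_i(\ell)}\right\} \tag{3-11}$$

or it is denoted as $p_\ell(x_i) \sim N[u_i(\ell),\sigma_i(\ell)]$. The joint and conditional densities of features $x_i$ and $x_j$ of the $\ell$th class can be written as follows.

$$p_\ell(x_i,x_j) = \frac{1}{2\pi\sqrt{\sigma_i(\ell)\sigma_j(\ell)[1-\rho_{ij}^2(\ell)]}}\exp\left[-\frac{1}{2}q_\ell(x_i,x_j)\right] \tag{3-12}$$

where

$$q_\ell(x_i,x_j) = \frac{1}{1-\rho_{ij}^2(\ell)}\left\{\frac{[x_i-u_i(\ell)]^2}{\sigma_i(\ell)} - \frac{2\rho_{ij}(\ell)[x_i-u_i(\ell)][x_j-u_j(\ell)]}{\sqrt{\sigma_i(\ell)\sigma_j(\ell)}} + \frac{[x_j-u_j(\ell)]^2}{\sigma_j(\ell)}\right\} \tag{3-13}$$

$$\rho_{ij}(\ell) = \frac{\sigma_{ij}(\ell)}{\sqrt{\sigma_i(\ell)\sigma_j(\ell)}} \tag{3-14}$$

and $\sigma_{ij}(\ell)$ is the covariance between features $x_i$ and $x_j$ of the $\ell$th class. From equations (3-11) and (3-12), the conditional density can be written as

$$p_\ell(x_i|x_j) = \frac{1}{\sqrt{2\pi\sigma_{i|j}(\ell)}}\exp\left\{-\frac{[x_i-u_{i|j}(\ell)]^2}{2\sigma_{i|j}(\ell)}\right\} \tag{3-15}$$

where

$$u_{i|j}(\ell) = u_i(\ell) + \frac{\sigma_{ij}(\ell)}{\sigma_j(\ell)}[x_j-u_j(\ell)], \qquad \sigma_{i|j}(\ell) = \sigma_i(\ell)[1-\rho_{ij}^2(\ell)] \tag{3-16}$$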

3.2.1 AN EXPRESSION FOR THE MUTUAL INFORMATION BETWEEN FEATURES x_i AND x_j

In this section, an expression is derived for the mutual information between Gaussian-distributed features $x_i$ and $x_j$ of class $\ell$. From equations (3-11) and (3-15), the following can easily be obtained. Consider

f ? 1= 1 - P (A)L iJ J

(3-17)

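From equation (3-17), the mutual information between features $x_i$ and $x_j$ of class $\ell$ can be obtained as follows.

$$I_\ell(x_i,x_j) = -\frac{1}{2}\log\left[1 - \frac{\sigma_{ij}^2(\ell)}{\sigma_i(\ell)\sigma_j(\ell)}\right] \tag{3-18}$$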
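As a numerical illustration, the sketch below evaluates the mutual information of two jointly Gaussian features from their variances and covariance, assuming that equation (3-18) takes the standard bivariate normal form (natural logarithm, so the result is in nats). The weighting by the a priori probabilities follows the first term of equation (3-5). The function names are hypothetical.

```python
import math


def gaussian_mutual_information(var_i, var_j, cov_ij):
    """Mutual information of two jointly Gaussian features.

    var_i and var_j are variances and cov_ij is the covariance; the squared
    correlation coefficient must be less than 1.
    """
    rho_sq = cov_ij ** 2 / (var_i * var_j)
    return -0.5 * math.log(1.0 - rho_sq)


def weighted_mutual_information(priors, class_params):
    """Link weight: class mutual informations weighted by the a priori probabilities.

    class_params holds one (var_i, var_j, cov_ij) triple per class.
    """
    return sum(P * gaussian_mutual_information(*params)
               for P, params in zip(priors, class_params))


if __name__ == "__main__":
    print(weighted_mutual_information([0.4, 0.6], [(4.0, 1.0, 1.0), (2.0, 2.0, 0.5)]))
```

Weights of this kind, computed for every pair of features, are what a spanning tree procedure such as Kruskal's would take as input.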


3.2.2 AN EXPRESSION FOR ΔJ_12(x_i, x_j) WHEN FEATURES x_i AND x_j ARE NORMALLY DISTRIBUTED

From equations (3-7) and (3-9), when features $x_i$ and $x_j$ are normally distributed, an expression for $\Delta J_{12}(x_i,x_j)$ can be easily obtained as follows.

Ifo (l)o (2) - 2o ( l )o (2) + o ( l )o (2) a (2)o (1) - 2a (l)o (2) + a (2)o (1) ]0.1 /„ v \ - / l J _ J _ 'J IJ _ ] _ J . J _ ] _ ' J 'J _ 1 _ J O I2Aj12(xi'xj) ' - A^TTf - - + 4^T2T -- 2J

",(2)

(3-19)

where $\Delta_{ij}(\ell) = \sigma_i(\ell)\sigma_j(\ell) - \sigma_{ij}^2(\ell)$ (3-20)

4. A GENERAL DEPENDENT FEATURE TREE

A general dependent feature tree is shown in figure 4-1.


Figure 4-1.- A general dependent feature tree.

Each node of the tree represents a feature, and the feature numbers are given in figure 4-1. In approximating the probability density functions with dependent feature trees, each feature may be conditioned upon, at most, one of the other features. Node $x_1$ is the root node of the tree. Nodes $x_2$, $x_4$, $x_5$, $x_7$, etc., are nodes on the periphery of the tree.

4.1 DIFFERENT TYPES OF NODES

For convenience in the following analysis, the nodes of the dependent featuretree are divided into the following types.

1. Type I nodes: Except for the root node, nodes on the periphery of the tree are defined as type I nodes. For example, in figure 4-1, nodes $x_2$, $x_4$, $x_5$, $x_7$, etc., are type I nodes.

2. Type II nodes: These are nodes which are one node deep from the periphery. For example, in figure 4-1, nodes $x_6$, $x_{10}$, $x_{15}$, $x_{17}$, $x_{19}$, etc., are type II nodes.

3. Type III nodes: These are nodes which are at least two nodes deep from theperiphery. For example, in figure 4-1, nodes X3, XQ, xg, x^, etc., aretype III nodes.

4. Type IV nodes: The root node of the tree is defined as a type IV node. The types of root nodes are divided into type IVa and type IVb. Examples of the types of root nodes are described in the following.

a. Type IVa node: The type IVa node is the root node of a tree with a single link. As an example, node $x_1$ of figure 4-2 is a type IVa node.

Figure 4-2.- Illustration of a type IVa node.

b. Type IVb node: The type IVb node is the root node of a tree with two or more links. As an example, node $x_1$ of figure 4-3 is a type IVb node.

Figure 4-3.- Illustration of a type IVb node.

It is noted that the type IVb node is different from the type IVa node in that more than one node links directly with the root node of the tree.
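The node types can be assigned mechanically once the tree is stored as a map from each node to its children. The sketch below is one possible reading of the definitions, and that reading is an assumption: a non-root node with no children is type I, a non-root node whose children are all on the periphery is type II, and any other non-root node is type III. The function name `classify_nodes` is hypothetical, and the tree is assumed to have at least one link.

```python
def classify_nodes(children, root):
    """Label every node of a rooted tree with its type: I, II, III, IVa, or IVb.

    children maps a node to the list of nodes conditioned upon it.
    """
    def height(node):
        # Number of links from the node down to its farthest peripheral node.
        kids = children.get(node, [])
        return 0 if not kids else 1 + max(height(k) for k in kids)

    types = {}
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        stack.extend(kids)
        if node == root:
            types[node] = "IVa" if len(kids) == 1 else "IVb"
        elif not kids:
            types[node] = "I"
        elif height(node) == 1:
            types[node] = "II"
        else:
            types[node] = "III"
    return types


if __name__ == "__main__":
    # Tree of equation (4-1): x2|x1, x3|x2, x4|x2, x5|x2, x6|x5, x7|x4.
    tree = {1: [2], 2: [3, 4, 5], 4: [7], 5: [6]}
    print(classify_nodes(tree, 1))
```

Under this reading, a node with both peripheral and deeper children is labeled type III; a different convention would only change the `height(node) == 1` test.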


4.2 AN EXPRESSION FOR THE COVARIANCE BETWEEN THE FEATURES CONNECTED BY A PATH IN A DEPENDENT FEATURE TREE

An expression for the covariance between the features when a path connects

their representative nodes in a dependent feature tree is developed in thissection. For example, features X-Q and x^g are connected through features X

Xg, xg, x^, and x^ in the dependent feature tree of figure 4-1. For the

following analysis, consider the dependent feature tree shown in figure 4-4.


Figure 4-4.- An example dependent feature tree.

The probability density represented by the dependent feature tree of figure 4-4

can be written as follows.

$$p(X) = p(x_1)p(x_2|x_1)p(x_3|x_2)p(x_4|x_2)p(x_5|x_2)p(x_6|x_5)p(x_7|x_4) \tag{4-1}$$

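In the following, an expression for the covariance between features $x_6$ and $x_7$ of figure 4-4 is derived.

$$E[(x_6-u_6)(x_7-u_7)] = \int (x_6-u_6)(x_7-u_7)p(X)\,dX$$

$$= \int (x_6-u_6)p(x_2)p(x_5|x_2)p(x_6|x_5)\,dx_2\,dx_5\,dx_6 \cdot \int p(x_4|x_2)\,dx_4 \int (x_7-u_7)p(x_7|x_4)\,dx_7 \tag{4-2}$$

From equation (3-15), the following equations are obtained.

$$\int (x_7-u_7)p(x_7|x_4)\,dx_7 = \rho_{74}\sqrt{\frac{\sigma_7}{\sigma_4}}\,(x_4-u_4) \tag{4-3}$$

$$\int (x_4-u_4)p(x_4|x_2)\,dx_4 = \rho_{42}\sqrt{\frac{\sigma_4}{\sigma_2}}\,(x_2-u_2) \tag{4-4}$$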

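Using equations (4-3) and (4-4) in equation (4-2) yields

$$E[(x_6-u_6)(x_7-u_7)] = \rho_{74}\rho_{42}\sqrt{\frac{\sigma_7}{\sigma_2}}\int (x_2-u_2)p(x_2)\,dx_2 \int p(x_5|x_2)\,dx_5 \int (x_6-u_6)p(x_6|x_5)\,dx_6 \tag{4-5}$$

Similar to equations (4-3) and (4-4), which were developed from equation (3-15), the following are obtained.

$$\int (x_6-u_6)p(x_6|x_5)\,dx_6 = \rho_{65}\sqrt{\frac{\sigma_6}{\sigma_5}}\,(x_5-u_5)$$

$$\int (x_5-u_5)p(x_5|x_2)\,dx_5 = \rho_{52}\sqrt{\frac{\sigma_5}{\sigma_2}}\,(x_2-u_2)$$

$$\int (x_2-u_2)^2p(x_2)\,dx_2 = \sigma_2 \tag{4-6}$$

From equations (4-5) and (4-6), the covariance between features $x_6$ and $x_7$ can be obtained as follows.

$$E[(x_6-u_6)(x_7-u_7)] = \frac{\sigma_{74}}{\sigma_4}\cdot\frac{\sigma_{42}}{\sigma_2}\cdot\frac{\sigma_{65}}{\sigma_5}\cdot\sigma_{52} \tag{4-7}$$

For a general case, the following theorems can easily be established.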

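Theorem 1: Suppose the features $x_1$ and $x_{r+s}$ in a dependent feature tree are connected by a path as shown in figure 4-5. Then, the covariance between features $x_1$ and $x_{r+s}$ is given by equation (4-8).

Figure 4-5.- A path between features x_1 and x_{r+s} in a dependent feature tree.

$$E[(x_1-u_1)(x_{r+s}-u_{r+s})] = \frac{\sigma_{12}}{\sigma_2}\cdot\frac{\sigma_{23}}{\sigma_3}\cdots\frac{\sigma_{r-1,r}}{\sigma_r}\cdot\frac{\sigma_{r+s,r+s-1}}{\sigma_{r+s-1}}\cdot\frac{\sigma_{r+s-1,r+s-2}}{\sigma_{r+s-2}}\cdots\frac{\sigma_{r+2,r+1}}{\sigma_{r+1}}\cdot\sigma_{r+1,r} \tag{4-8}$$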

Theorem 2: Suppose the features $x_1$ and $x_r$ in a dependent feature tree are connected by a path as shown in figure 4-6. Then, the covariance between features $x_1$ and $x_r$ is given by equation (4-9).

x_1   x_2   x_3   x_4   ...   x_{r-1}   x_r

Figure 4-6.- A path between features x_1 and x_r in a dependent feature tree.

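$$E[(x_1-u_1)(x_r-u_r)] = \frac{\sigma_{12}}{\sigma_2}\cdot\frac{\sigma_{23}}{\sigma_3}\cdot\frac{\sigma_{34}}{\sigma_4}\cdots\frac{\sigma_{r-2,r-1}}{\sigma_{r-1}}\cdot\sigma_{r-1,r} \tag{4-9}$$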
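The path rule can be checked numerically with a short sketch. It assumes the reading that the covariance between the end features of a path equals the product of the link covariances divided by the product of the variances of the intermediate features, which is the pattern of the worked example for features $x_6$ and $x_7$. The function name `path_covariance` and the data layout are hypothetical.

```python
def path_covariance(path, variances, link_cov):
    """Covariance between the first and last features of a path in the tree.

    path lists the feature indices in order along the path, variances maps a
    feature to its variance, and link_cov maps frozenset({a, b}) to the
    covariance of the linked features a and b.
    """
    result = 1.0
    for a, b in zip(path, path[1:]):
        result *= link_cov[frozenset((a, b))]  # one factor per link
    for node in path[1:-1]:
        result /= variances[node]  # one divisor per intermediate feature
    return result


if __name__ == "__main__":
    # Path x6 - x5 - x2 - x4 - x7 of the example tree; the values are made up.
    var = {2: 2.0, 4: 4.0, 5: 5.0}
    cov = {frozenset((6, 5)): 1.0, frozenset((5, 2)): 2.0,
           frozenset((2, 4)): 1.0, frozenset((4, 7)): 3.0}
    print(path_covariance([6, 5, 2, 4, 7], var, cov))
```

For a path consisting of a single link, the function returns the link covariance itself.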


5. MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF THE DENSITY FUNCTIONS

In this section, maximum likelihood equations are developed for estimating the parameters of the cluster conditional densities when approximated by the first-order dependent feature trees. In practice, such as in the classification of remotely sensed multispectral scanner imagery data, considerable interest has been shown in applying maximum likelihood clustering for the decomposition of the mixture density of the data into its normal component densities. The mixture density p(X) can be written as

$$p(X) = \sum_{i=1}^{m}P(\omega_i)p(X|\omega_i) \tag{5-1}$$

where m is the number of clusters, and $P(\omega_i)$ and $p(X|\omega_i)$ are the a priori probabilities of the modes and mode conditional densities, respectively. If the cluster conditional densities are Gaussian [i.e., $p(X|\omega_i) \sim N(U_i,\Sigma_i)$], by using a given set of N independent observations from the mixture density, from equation (2-6), the maximum likelihood equations for the estimates of the parameters of the mixture density can easily be shown to be the following (ref. 6).

$$P(\omega_i) = \frac{1}{N}\sum_{k=1}^{N}P(\omega_i|X_k)$$

$$U_i = \frac{\sum_{k=1}^{N}X_kP(\omega_i|X_k)}{\sum_{k=1}^{N}P(\omega_i|X_k)}$$

$$\Sigma_i = \frac{\sum_{k=1}^{N}(X_k-U_i)(X_k-U_i)^TP(\omega_i|X_k)}{\sum_{k=1}^{N}P(\omega_i|X_k)} \tag{5-2}$$

In maximum likelihood clustering, equation (5-2) is used for updating the parameters of the densities, and this computation is coupled with a split and merge sequence. The updating is usually stopped after a few iterations because of the large amount of computation in clustering data such as imagery data.
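One parameter update for a single cluster can be sketched as follows, assuming that equation (5-2) has the usual posterior-weighted form (weighted mean and weighted covariance, with the covariance taken about the newly updated mean). The function name `update_cluster` is hypothetical, and the a posteriori probabilities are taken as already computed.

```python
def update_cluster(samples, post):
    """Posterior-weighted update of one cluster's a priori probability, mean, and covariance.

    samples is a list of N pattern vectors (lists of length n) and post holds
    the a posteriori probability of the cluster for each pattern.
    """
    N, n = len(samples), len(samples[0])
    total = sum(post)
    prior = total / N
    mean = [sum(p * x[d] for p, x in zip(post, samples)) / total for d in range(n)]
    cov = [[sum(p * (x[a] - mean[a]) * (x[b] - mean[b])
                for p, x in zip(post, samples)) / total
            for b in range(n)]
           for a in range(n)]
    return prior, mean, cov


if __name__ == "__main__":
    print(update_cluster([[0.0, 0.0], [2.0, 2.0], [4.0, 1.0]], [0.9, 0.5, 0.1]))
```

The covariance loop touches every pair of features for every pattern, which is the cost that grows quickly with the dimensionality.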


For practical problems, the number of parameters to be estimated is large. Using equation (5-2), the number of parameters to be estimated for each mode is n(n + 3)/2, where n is the dimensionality of the patterns. It is known that, with a fixed sample size, the accuracy of estimation usually decreases as the number of parameters to be estimated increases (ref. 12).

In this paper, the cluster conditional densities are approximated with first-order dependent feature trees to reduce the number of parameters to be estimated. In the product approximation for the densities discussed in the previous sections, it is noted that each feature is conditioned upon, at most, one of the other features. The number of parameters to be estimated for each mode is obtained as follows: the means n, the variances n, and the covariances (n - 1), or a total of (3n - 1), where n is the dimensionality of the patterns. When the product approximation is used for the probability densities, with an increase in the dimensionality of the patterns, the reduction in the number of parameters for each mode is as shown in figure 5-1.
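The two parameter counts can be compared directly. The sketch below assumes that a full symmetric covariance matrix contributes n(n + 1)/2 distinct entries in addition to the n means, and it uses the count of (3n - 1) given above for the tree approximation. The function names are hypothetical, and the ratio printed is only one way of expressing the reduction.

```python
def full_covariance_params(n):
    """Parameters per mode with a full covariance matrix: n means plus n(n + 1)/2 entries."""
    return n * (n + 3) // 2


def tree_params(n):
    """Parameters per mode with a first-order dependent feature tree: n + n + (n - 1)."""
    return 3 * n - 1


if __name__ == "__main__":
    for n in (1, 4, 8, 16, 32):
        full, tree = full_covariance_params(n), tree_params(n)
        print(n, full, tree, round(tree / full, 3))
```

For n = 1 and n = 2 the two counts coincide; the saving appears only from n = 3 onward and grows with the dimensionality.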

[Plot not reproduced; horizontal axis: n, dimensionality.]

Figure 5-1.- Reduction in the number of parameters with the dimensionality.

In the following, maximum likelihood equations are developed for estimating the parameters of the cluster conditional densities when approximated with first-order dependent feature trees. It is assumed that the structure of the dependent feature tree is determined using the techniques discussed in section 3. The different types of nodes described in section 4 are considered separately.

5.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE A PRIORI PROBABILITIES, MEANS, AND VARIANCES

In this section, maximum likelihood equations similar to equation (5-2) are obtained for the a priori probabilities of the clusters and for the means and variances of features in each cluster when the cluster conditional densities are approximated with dependent feature trees. The different types of nodes discussed in section 4 are treated separately. It is assumed that a set $\mathcal{X} = \{X_1, \ldots, X_N\}$ of N unlabeled patterns, each of dimension n, drawn independently from the mixture density p(X) is given. When the cluster conditional densities are approximated by first-order dependent feature trees, the density of the ith cluster can be written as

$$p_i(X) = \prod_{\ell=1}^{n} p_i(x_{m_\ell}|x_{m_{j(\ell)}}) \tag{5-3}$$

The maximum likelihood equations for the a priori probabilities of the clusters can easily be shown to be the following.

$$P(\omega_i) = \frac{1}{N}\sum_{k=1}^{N}P(\omega_i|X_k,\theta) \tag{5-4}$$

If $\theta_i$ is a parameter of the ith cluster, using equation (5-3) in equation (2-8) results in

$$\frac{\partial\ell}{\partial\theta_i} = \sum_{k=1}^{N}P(\omega_i|X_k,\theta)\sum_{\ell=1}^{n}\frac{\partial}{\partial\theta_i}\log[p_i(x_{m_\ell}^k|x_{m_{j(\ell)}}^k)] = 0 \tag{5-5}$$

In the following, it is assumed that the distributions of pattern features in each cluster are Gaussian. That is,

$$p_i(x_\ell) \sim N[u_\ell(i),\sigma_\ell(i)] \tag{5-6}$$

5.1.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE I NODES

Consider a link in a dependent feature tree containing a type I node, as shown in figure 5-2.

x_2   x_3

Figure 5-2.- A link in a dependent feature tree with a type I node.

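Since each feature is conditioned upon, at most, one of the other features, equation (5-5) becomes

$$\frac{\partial\ell}{\partial\phi} = \sum_{k=1}^{N}P(\omega_i|X_k,\theta)\frac{\partial}{\partial\phi}\log[p_i(x_3^k|x_2^k)] = 0 \quad \text{for } \phi = u_3(i) \text{ and } \phi = \sigma_3(i) \tag{5-7}$$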

When $\phi = u_3(i)$, from equation (5-7), the following is obtained.

N

(5-8)

k=l

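In equation (5-7), letting $\phi = \sigma_3(i)$ and $\phi = \sigma_{23}(i)$ and eliminating $\sum_k P(\omega_i|X_k,\theta)[x_3^k-u_3(i)]^2$ from the resulting equations yields, after simplification, an expression for the covariance between type I and type II nodes. That is,

$$\sigma_{23}(i) = \sigma_2(i)\,\frac{\sum_{k=1}^{N}P(\omega_i|X_k,\theta)[x_2^k-u_2(i)][x_3^k-u_3(i)]}{\sum_{k=1}^{N}P(\omega_i|X_k,\theta)[x_2^k-u_2(i)]^2} \tag{5-9}$$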

Letting $\phi = \sigma_3(i)$ in equation (5-7) and using equation (5-9) yields the following.

V /


5.1.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE II NODES

A typical type II node, as defined in section 4, in a general dependent feature tree is shown in figure 5-3.

Figure 5-3.- A typical type II node in a general dependent feature tree.

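In figure 5-3, node $x_s$ is a type II node, and nodes $x_1, x_2, \ldots, x_r$ are type I nodes. The terms in the product approximation of the probability density function of cluster i containing feature $x_s$ are

$$p_i(X) = \cdots p_i(x_s|x_t)p_i(x_1|x_s)\cdots p_i(x_r|x_s)\cdots \tag{5-11}$$

If $\theta_i$ is a parameter of the ith cluster, using equation (5-11) in equation (2-8) yields

$$\frac{\partial\ell}{\partial\theta_i} = \sum_{k=1}^{N}P(\omega_i|X_k,\theta)\frac{\partial}{\partial\theta_i}\left\{\sum_{\ell=1}^{r}\log[p_i(x_\ell^k|x_s^k)] + \log[p_i(x_s^k|x_t^k)]\right\} = 0 \tag{5-12}$$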

From equation (5-12), letting $\theta_i = u_s(i)$ results in the following maximum likelihood equation for the mean of feature $x_s$ of cluster i.

Uc(l) =

where

£

1) -as£(i)

(5-13)

(5-14)

In equation (5-12), letting $\theta_i = \sigma_s(i), \sigma_{s1}(i), \ldots, \sigma_{sr}(i)$, and $\sigma_{st}(i)$ yields the following after simplification.

(5-15)

5.1.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE III NODES

A typical type III node in a general dependent feature tree is shown in figure 5-4.

Figure 5-4.- A typical type III node in a general dependent feature tree.

In figure 5-4, $x_s$ is a type III node; and nodes $x_1, x_2, \ldots, x_r$ and $x_t$ may be type III nodes or other types of nodes. Proceeding as in section 5.1.2, the maximum likelihood equations for the variance and mean of feature $x_s$ of cluster i can be shown to be the following (see appendix A).

as(i)• •!<•»

r-f o (i)o (i)[g .,;«)4o

(5-16)


and

k=p(w i lxk>8)xs r V1'

k=lP ( " | x . e )

(5-17)

5.1.4 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVA NODES

A typical type IVa node in a general dependent feature tree is shown in figure 5-5.

x_1   x_2   x_3

Figure 5-5.- A typical type IVa node in a general dependent feature tree.

In figure 5-5, $x_1$ is a root node of type IVa, and node $x_2$ may be of type I, type II, or type III. The maximum likelihood equations for the variance and mean of feature $x_1$ of cluster i are given in the following.

and

k=p(u»i|xk,e)

=1p(u1|X|(f8)

(5_18)

(5-19)

5.1.5 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVB NODES

A typical type IVb node in a general dependent feature tree is shown in figure 5-6.

Figure 5-6.- A typical type IVb node in a general dependent feature tree.

In figure 5-6, $x_s$ is a root node of type IVb, and nodes $x_1, x_2, \ldots, x_r$ are of type I, type II, or type III. The maximum likelihood equations for the variance and mean of feature $x_s$ of cluster i can be shown to be the following.

(i) =k=l

(r- - ua(i)

1=1

k*

N

k=l- u(1)

and(5-20)

-.(i) (5-21)

5.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCES BETWEEN FEATURES

In this section, maximum likelihood equations are developed for the covariances between the features when the probability density functions of the clusters are approximated by first-order dependent feature trees.

5.2.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE I AND TYPE II NODES

In this section, maximum likelihood equations for the covariance between type I

and type II features are derived. A typical link connecting type I and type II

nodes is shown in figure 5-7.

x_r   x_s

Figure 5-7.- A typical link connecting type I and type II nodes.

In figure 5-7, node $x_s$ is of type I, and node $x_r$ is of type II. The maximum likelihood equation for the covariance between features $x_r$ and $x_s$ of cluster i is given in the following.

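$$\sigma_{rs}(i) = \sigma_r(i)\,\frac{\sum_{k=1}^{N}P(\omega_i|X_k,\theta)[x_r^k-u_r(i)][x_s^k-u_s(i)]}{\sum_{k=1}^{N}P(\omega_i|X_k,\theta)[x_r^k-u_r(i)]^2} \tag{5-22}$$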

5.2.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE IVA AND TYPE II OR TYPE III NODES

A typical link connecting type IVa and type II or type III nodes in a generaldependent feature tree is shown in figure 5-8.

xl

Figure 5-8.- A typical link connectingtype IVa and type II or type III nodesin a general dependence tree.

In figure 5-8, node xj is of type IVa, and node x2 may be of type II ortype III. The maximum likelihood equation for the covariance betweenfeatures X} and x2 of cluster i is as follows.

E(5.23)

5-9

5.2.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPES OF NODES OTHER THAN THOSE CONSIDERED IN SECTIONS 5.2.1 AND 5.2.2

Let there be a link between nodes $x_2$ and $x_3$ in a general dependent feature tree whose types are other than those considered in sections 5.2.1 and 5.2.2. The maximum likelihood equation for the covariance between features $x_2$ and $x_3$ of cluster i is given in the following.

[Equation (5-24): not legible in the source. The surviving fragments show a sum over the patterns weighted by $p(\omega_i|X_k,\theta)$ and terms in $\sigma_2(i)\sigma_3(i)$, $\sigma_{23}(i)\Delta_{23}(i)$, and $\sigma_{23}(i)[x_2 - u_2(i)][x_3 - u_3(i)]$.]

6. EXPERIMENTAL RESULTS

In this section, some results from processing remotely sensed Landsat multispectral scanner imagery data are presented. Each image covers a 5- by 6-nautical-mile area called a segment. The image is divided into a rectangular array of pixels, 117 rows by 196 columns, and is overlaid with a rectangular grid of 209 grid intersections. Two classes are considered: class 1 is wheat, and class 2 is "other." The true (ground truth) labels for the pixels at the grid intersections are acquired. The locations of the segments and the individual acquisitions used for each of the segments are listed in table 6-1. The a priori probabilities of the classes are estimated from the labeled samples. Equations (3-6) and (3-18) are used to compute the weighted mutual information between the features, assuming in each class that the features are Gaussian distributed. Kruskal's algorithm (ref. 16) is used to construct optimal dependent feature trees by minimizing $I(p, p_t)$ of equation (3-5). The optimal dependent feature trees of segments 1648 and 1739 are shown in figures 6-1 and 6-2, respectively.

Figure 6-1.- Optimal dependent feature tree of segment 1648.

Generally, it is known that, for each acquisition, a strong dependency exists between channels 1 and 2 and between channels 3 and 4. From figures 6-1 and 6-2, it is seen that these dependencies appear in the optimal dependent feature trees.
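The tree construction described above can be sketched in a few lines. This is a minimal illustration and not the report's program: it assumes that the weighted mutual information is the prior-weighted sum of the per-class Gaussian mutual information, $-\tfrac{1}{2}\log(1-\rho^2)$ for a pair of features with correlation $\rho$ (equations (3-6) and (3-18) are not reproduced in this section), and it uses the Chow and Liu result (ref. 13) that the best first-order tree is a maximum-weight spanning tree on those pairwise values. The function names are hypothetical.

```python
import numpy as np

def gaussian_mutual_information(cov):
    """Pairwise mutual information for jointly Gaussian features.

    For two Gaussian features with correlation rho, I = -0.5 * log(1 - rho**2).
    """
    std = np.sqrt(np.diag(cov))
    rho = cov / np.outer(std, std)
    np.fill_diagonal(rho, 0.0)              # the diagonal is never used as an edge
    return -0.5 * np.log1p(-rho ** 2)

def weighted_mutual_information(class_covs, priors):
    """Prior-weighted sum of per-class mutual information (assumed weighting)."""
    return sum(p * gaussian_mutual_information(c) for p, c in zip(priors, class_covs))

def kruskal_max_spanning_tree(weights):
    """Kruskal's algorithm on a dense symmetric matrix, taking heaviest edges first."""
    n = weights.shape[0]
    parent = list(range(n))                 # union-find forest

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]   # path halving
            a = parent[a]
        return a

    edges = sorted(((weights[i, j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # the edge joins two components, so keep it
            parent[ri] = rj
            tree.append((i, j))
            if len(tree) == n - 1:
                break
    return tree
```

For a four-feature covariance in which features 0 and 1 and features 2 and 3 are strongly correlated, the returned links pair those features first and then join the two pairs through the strongest remaining link, which mirrors the channel pairing noted in the text.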

Table 6-1.- [Table body not recoverable from the source. As described in the text, it lists for each segment its location, the acquisitions used, and the confusion matrices and classification accuracies on the training and test sets for each of the four density models. Legible locations: Big Stone, Minnesota; Renville, North Dakota; Teton, Montana.]

Figure 6-2.- Optimal dependent feature tree of segment 1739.

To investigate the effectiveness of the optimal dependent feature trees in classification, an experiment is performed to compare the classification accuracies of the Bayes classifier (1) when the densities are approximated with optimal dependent feature trees, (2) when no approximation is used for the densities (full covariance matrix), (3) assuming the features are independent, and (4) when the densities are approximated with arbitrary dependent feature trees. Spectral vectors of 104 labeled pixels are used as the training pattern set, and the spectral vectors of the remaining 105 labeled pixels are used as the test pattern set. The structure of the arbitrary dependent feature tree used in this experiment is shown in figure 6-3.
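A Bayes classifier with tree-approximated Gaussian densities can be sketched as follows. This is an illustrative reading of the first-order product approximation, in which each feature is conditioned on at most one other feature, and not the code used for the experiment. The `parent` array encoding of the tree and the function names are assumptions made for this sketch; a parent of -1 marks the root, and a tree with every parent set to -1 reduces to the independent-features case.

```python
import numpy as np

def tree_gaussian_logpdf(x, mean, cov, parent):
    """Log density of x under a first-order dependent feature tree of Gaussian features.

    parent[j] is the index of the feature that feature j depends on, or -1 for none.
    Only the variances and the covariances along tree links are read from cov.
    """
    logp = 0.0
    for j, pj in enumerate(parent):
        if pj < 0:
            m, v = mean[j], cov[j, j]
        else:
            # Conditional Gaussian p(x_j | x_parent): regress x_j on its parent.
            gain = cov[j, pj] / cov[pj, pj]
            m = mean[j] + gain * (x[pj] - mean[pj])
            v = cov[j, j] - gain * cov[j, pj]
        logp += -0.5 * (np.log(2.0 * np.pi * v) + (x[j] - m) ** 2 / v)
    return logp

def bayes_classify(x, priors, means, covs, parent):
    """Return the class with the largest log prior plus tree-approximated log density."""
    scores = [np.log(p) + tree_gaussian_logpdf(x, m, c, parent)
              for p, m, c in zip(priors, means, covs)]
    return int(np.argmax(scores))
```

With two features and `parent = [-1, 0]` the tree density equals the full bivariate Gaussian, which gives a quick consistency check on the conditional mean and variance used in the loop.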

Figure 6-3.- Arbitrary dependent feature tree used in the experiment.

The computed confusion matrices and the classification accuracies on the training and test sets for each of the segments processed are listed in table 6-1. From table 6-1, it is seen that, in general, better classification accuracies are obtained on the training set when the full covariance matrix is used without approximating the densities. Improved classification accuracies are obtained on the test set when the densities are approximated with optimal dependent feature trees. This may be because many more parameters must be estimated from the same training set when the full covariance matrices are used.
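The parameter-count argument can be made concrete with a short calculation. This is a back-of-the-envelope sketch rather than a result from the report, and the feature count n = 16 is a hypothetical value chosen only for illustration. For one Gaussian cluster with n features,

$$\text{full covariance: } n + \frac{n(n+1)}{2}, \qquad \text{first-order tree: } n + n + (n-1) = 3n - 1,$$

because the tree model needs only the n means, the n variances, and one covariance for each of its n - 1 links. With n = 16, this gives 16 + 136 = 152 parameters per cluster for the full covariance matrix and 47 for the tree.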

One of the important objectives in the classification of remotely sensed agricultural imagery data is to estimate the proportion of the class of interest in the image. The ratio of the variance of the proportion estimated using machine classification to the variance of the proportion estimated using simple random sampling is called the variance reduction factor R (ref. 1). The quantity R can be viewed as an indication of how much the machine classification improves the proportion estimation. The computed variance reduction factors for each of the segments processed are listed in table 6-2. From table 6-2, it is seen that the variance reduction factor consistently improves when the densities are approximated with dependent feature trees, compared to the other cases.

Table 6-2.- [Table body not recoverable from the source. As described in the text, it lists the variance reduction factors for each of the segments processed.]

7. CONCLUDING REMARKS

In the classification of imagery data, such as in the machine processing of remotely sensed multispectral scanner data, unsupervised classification techniques have been found to be effective. The application of clustering techniques for the analysis of imagery data essentially involves two steps: (1) clustering the data or partitioning the image into its inherent modes and (2) giving the probabilistic class labels to the resulting clusters. In practice, it is observed that fields are relatively easy to label when compared to pixels.

Several researchers have investigated methods for locating fields in the imagery data. Recently, considerable interest has been shown in developing techniques for probabilistically labeling the clusters using information from a given set of labeled patterns and, also, from a given set of labeled fields.

In decomposing the mixture density of the data into its normal component densities, the parameters of the component densities and the a priori probabilities of the modes are iteratively computed using maximum likelihood equations coupled with a split and merge sequence. The updating of the parameters is usually stopped after a few iterations; and for practical data, a large number of parameters must be estimated. For a fixed sample size, the accuracy of estimation usually decreases as the number of parameters to be estimated increases.

To overcome the above shortcomings, it is proposed in this paper that the densities be approximated with first-order dependent feature trees. The dependent feature trees can be constructed using criteria based on an information measure and, also, on a class separability measure. Expressions are derived for the criteria when the distributions of the features are Gaussian. Expressions also are derived for the covariances between features not connected by a single link in the dependent feature tree.

Different types of nodes are defined in a general dependent feature tree. Maximum likelihood equations are derived for the parameters of the mixture density of the data by approximating the cluster conditional densities with first-order dependent feature trees. The field structure of the data is also taken into account in the decomposition of the mixture density of the data into its normal component densities. Furthermore, experimental results from the processing of remotely sensed multispectral scanner imagery data are presented.

8. REFERENCES

1. Heydorn, R. P.; Trichel, M. C.; and Erickson, J. D.: Methods for Segment Wheat Area Estimation. Proc. LACIE Symp., NASA/JSC (Houston). JSC-16015, vol. II, July 1979, pp. 621-632.

2. Bryant, J.: On the Clustering of Multidimensional Pictorial Data. Pattern Recognition, vol. 11, no. 2, 1979, pp. 115-125.

3. Kauth, R. J.; Pentland, A. P.; and Thomas, G. S.: Blob: An Unsupervised Clustering Approach to Spatial Preprocessing of MSS Imagery. Proc. 11th Int. Symp. on Remote Sensing of Environment. Environmental Research Institute of Michigan (Ann Arbor), 1977, pp. 1309-1317.

4. Kettig, R. L.; and Landgrebe, D. A.: Classification of Multispectral Image Data by Extraction and Classification of Homogeneous Objects. IEEE Trans. Geoscience Electronics, vol. GE-14, Jan. 1976, pp. 19-26.

5. Peters, B. C., Jr.: On the Consistency of the Maximum Likelihood Estimate of Normal Mixture Parameters for a Sample With Field Structure. Report no. 74, Dept. of Math., Univ. of Houston, Sept. 1979.

6. Duda, R. O.; and Hart, P. E.: Pattern Classification and Scene Analysis. John Wiley and Sons (New York), 1973.

7. Hasselblad, V.: Estimation of Parameters for a Mixture of Normal Distributions. Technometrics, vol. 8, 1966, pp. 431-446.

8. Day, N. E.: Estimating the Components of a Mixture of Normal Distributions. Biometrika, vol. 56, 1969, pp. 463-474.

9. Lennington, R. K.; and Rassbach, M. E.: Mathematical Description and Program Documentation for CLASSY: An Adaptive Maximum Likelihood Clustering Method. Lockheed Electronics Company, Inc., Tech. Memo LEC-12177, JSC-14621, NASA/JSC (Houston), Apr. 1979.

10. Chittineni, C. B.: Some Approaches to Optimal Cluster Labeling of Aerospace Imagery. Lockheed Engineering and Management Services Company, Inc., Tech. Report LEMSCO-14597, JSC-16355, NASA/JSC (Houston), May 1980.

11. Chittineni, C. B.: Probabilistic Cluster Labeling of Imagery Data. Lockheed Engineering and Management Services Company, Inc., Tech. Report LEMSCO-15358, JSC-16384, NASA/JSC (Houston), Sept. 1980.

12. Hughes, G. F.: On the Mean Accuracy of Statistical Pattern Recognizers. IEEE Trans. Information Theory, vol. IT-14, Jan. 1968, pp. 55-63.

13. Chow, C. K.; and Liu, C. N.: Approximating Discrete Probability Distributions With Dependence Trees. IEEE Trans. Information Theory, vol. IT-14, May 1968, pp. 462-467.

14. Lewis, P. M.: Approximating Probability Distributions to Reduce Storage Requirements. Information and Control, vol. 2, Sept. 1959, pp. 214-225.

15. Chittineni, C. B.: Efficient Feature Subset Selection With Probabilistic Distance Criteria. Lockheed Electronics Company, Inc., Tech. Memo LEC-13355, JSC-14870, NASA/JSC (Houston), May 1979.

16. Kruskal, J. B.: On the Shortest Spanning Subtree of a Graph and the Traveling Salesman Problem. Proc. American Math. Soc., vol. 7, 1956, pp. 48-50.

17. Tou, J. T.; and Gonzalez, R. C.: Pattern Recognition Principles. Addison-Wesley Publishing Co., Inc. (Mass.), 1974.

APPENDIX A

DERIVATIONS OF MAXIMUM LIKELIHOOD EQUATIONS FOR THE

PARAMETERS OF A TYPE III FEATURE


In this appendix, maximum likelihood equations for the parameters of a typical type III feature are derived. A typical type III node in a general dependent feature tree is illustrated in figure A-1.

Figure A-1.- Illustration of a typical type III node in a general dependent feature tree.

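In figure A-1, $x_2$ is a type III feature. The following is obtained from equation (5-5) by keeping only the terms that involve feature $x_2$ in the product approximation of the density of the ith cluster.

[Equation (A-1), partly legible. Inside a sum over the patterns k = 1, ..., N weighted by $p(\omega_i|X_k,\theta)$, the terms are rewritten as

$$\log[p_i(x_4|x_2)] + \log[p_i(x_3|x_2)] + \log[p_i(x_2|x_1)] = \log[p_i(x_2,x_4)] + \log[p_i(x_2,x_3)] + \log[p_i(x_1,x_2)] - 2\log[p_i(x_2)] - \log[p_i(x_1)] \tag{A-1}$$

The left-hand side of the equation is not legible in the source.]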

It is assumed that the features of the ith cluster are Gaussian distributed. That is,

$$\log[p_i(x_r,x_s)] = -\log(2\pi) - \frac{1}{2}\log[\Delta_{rs}(i)] - \frac{1}{2\Delta_{rs}(i)}\Big\{\sigma_s(i)[x_r - u_r(i)]^2 - 2\sigma_{rs}(i)[x_r - u_r(i)][x_s - u_s(i)] + \sigma_r(i)[x_s - u_s(i)]^2\Big\} \tag{A-2}$$

where

$$\Delta_{rs}(i) = \sigma_r(i)\sigma_s(i) - \sigma_{rs}^2(i) \tag{A-3}$$

Using equation (A-2) in equation (A-1) yields

[Equation (A-4): not legible in the source.]

where

[Equation (A-5): not legible in the source.]

Letting $\theta = u_2(i)$, from equation (A-5), the following is obtained.

[Equation (A-6): not legible in the source.]

Substituting equation (A-6) in equation (A-4) and equating the result to zero yields the following maximum likelihood equation for the mean of feature $x_2$ of cluster i.

[Equation (A-7): not legible in the source.]

Letting $\theta = \sigma_2(i)$ in equation (A-4) and equating the result to zero yields the following after simplification.

[Equation (A-8): not legible in the source.]

Similarly, differentiating $\ell$ with respect to $\sigma_{2j}(i)$ for j = 1, 3, 4 and equating the resulting expression to zero yields

[Equation (A-9): not legible in the source.]

Using equation (A-9) for j = 1, 3, 4 in equation (A-8) yields, after simplification, the following maximum likelihood equation for $\sigma_2(i)$.

[Equation (A-10): not legible in the source.]

Proceeding as in equations (A-9) and (A-10), it can easily be shown that the maximum likelihood equations for the mean $u_s(i)$ and variance $\sigma_s(i)$ of feature $x_s$ of cluster i of figure 5-4 are those of equations (5-16) and (5-15), respectively.

APPENDIX B

MAXIMUM LIKELIHOOD EQUATIONS WITH FIELD STRUCTURE


In practical applications of pattern recognition, such as in the classification of remotely sensed agricultural imagery data, one of the difficult problems is to obtain labels for the training patterns. The labels for the training patterns are usually provided by an analyst-interpreter by examining imagery films and using some other information such as historic information and crop calendar models. Agricultural imagery data usually have a field-like structure, and it is observed that fields are relatively easy to label when compared to pixels. Recently, considerable interest has been shown in developing techniques for locating fields in the imagery data (refs. 2-4) and in developing methods for the probabilistic labeling (refs. 10, 11) of cluster distributions using information from a given set of labeled fields. This appendix considers the problem of fitting a mixture of Gaussian density functions to the data, taking into account the field structure of the data, once the fields have been located by a field-finding algorithm.

It is assumed that there are f fields in the data. Let the jth field be denoted by $F_j$; let it contain $N_j$ pixels; and let $X_{jk}$, $k = 1, 2, \ldots, N_j$, be their spectral vectors. Let m be the number of clusters in the data. Let $P(\omega_i)$ and $p(X|\omega_i)$ be the a priori probability that a field belongs to cluster $\omega_i$ and the cluster conditional densities, respectively. Let $\bar{X}_j$ be the concatenated vector of spectral vectors of the pixels in the jth field. That is,

$$\bar{X}_j = \begin{bmatrix} X_{j1} \\ X_{j2} \\ \vdots \\ X_{jN_j} \end{bmatrix} \tag{B-1}$$

It is assumed that the fields are independent. Then, the joint density of the f fields is given by

$$p(\bar{X}_1, \ldots, \bar{X}_f) = \prod_{j=1}^{f} p(\bar{X}_j) \tag{B-2}$$

The mixture density $p(\bar{X}_j)$ can be written as

$$p(\bar{X}_j) = \sum_{i=1}^{m} P(\omega_i)\, p(\bar{X}_j|\omega_i) \tag{B-3}$$

If it is assumed that the spectral vectors of the pixels in each field are cluster conditionally independent, then

$$p(\bar{X}_j|\omega_i) = \prod_{k=1}^{N_j} p(X_{jk}|\omega_i) \tag{B-4}$$

Using equations (B-3) and (B-4), the joint density of equation (B-2) can be written as follows.

$$p(\bar{X}_1, \ldots, \bar{X}_f) = \prod_{j=1}^{f}\left[\sum_{r=1}^{m} P(\omega_r) \prod_{k=1}^{N_j} p(X_{jk}|\omega_r)\right] \tag{B-5}$$

Since the logarithm is a monotonic function of its argument, taking the log of both sides of equation (B-5) and denoting it by $\ell$ results in

$$\ell = \sum_{j=1}^{f} \log\left[\sum_{r=1}^{m} P(\omega_r) \prod_{k=1}^{N_j} p(X_{jk}|\omega_r)\right] \tag{B-6}$$

From equation (B-6), which is similar to equation (2-7), the maximum likelihood equation for the probability that a field belongs to a cluster can easily be obtained as the following.

[Equation (B-7): not legible in the source.]

If $\theta_i$ is a parameter of the density function of the ith cluster, differentiating $\ell$ of equation (B-6) with respect to $\theta_i$ yields the following.

[Equation (B-8): not legible in the source.]

If the probability density functions of the clusters are Gaussian [i.e., $p(X|\omega_i) \sim N(U_i, \Sigma_i)$], from equation (B-8), the maximum likelihood equations for the mean and covariance matrix of the densities of the clusters can be shown to be the following.

[Equations (B-9) and (B-10), together with the definition in equation (B-11) that follows "where": not legible in the source.]

If the probability density functions of the clusters are approximated with first-order dependent feature trees, maximum likelihood equations for the parameters of the densities (similar to those developed in section 5) can easily be obtained from equations (B-8) and (3-1) by taking into account the field structure of the data.
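One fixed-point update for the full-covariance Gaussian case with field structure can be sketched as below. The update formulas are the standard stationarity conditions implied by the field log likelihood of equation (B-6), with every pixel of a field sharing that field's cluster posterior. They are an assumption of this sketch and are not transcribed from equations (B-7) to (B-11), which are not legible in the source. The function name and array layout are hypothetical.

```python
import numpy as np

def field_em_step(fields, priors, means, covs):
    """One fixed-point update of a Gaussian mixture in which each field has one cluster.

    fields: list of (N_j, n) float arrays of pixel spectral vectors, one per field.
    priors: (m,) array, means: (m, n) array, covs: (m, n, n) float array.
    Returns the updated priors, means, and covariances and the (f, m) field posteriors.
    """
    m, n = means.shape
    log_post = np.empty((len(fields), m))
    for i in range(m):
        inv = np.linalg.inv(covs[i])
        _, logdet = np.linalg.slogdet(covs[i])
        for j, X in enumerate(fields):
            d = X - means[i]
            quad = np.einsum("ka,ab,kb->k", d, inv, d)
            # Pixels are conditionally independent given the cluster, so log densities add.
            log_post[j, i] = np.log(priors[i]) - 0.5 * np.sum(
                quad + logdet + n * np.log(2.0 * np.pi))
    log_post -= log_post.max(axis=1, keepdims=True)    # stabilise before exponentiating
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)

    new_priors = post.mean(axis=0)                     # average field posterior
    sizes = np.array([len(X) for X in fields])
    weight = post.T @ sizes                            # sum_j N_j * P(cluster i | field j)
    sums = np.stack([X.sum(axis=0) for X in fields])   # per-field pixel sums, shape (f, n)
    new_means = (post.T @ sums) / weight[:, None]
    new_covs = np.empty_like(covs)
    for i in range(m):
        scatter = sum(post[j, i] * (X - new_means[i]).T @ (X - new_means[i])
                      for j, X in enumerate(fields))
        new_covs[i] = scatter / weight[i]
    return new_priors, new_means, new_covs, post
```

Working with log densities per field and subtracting the row maximum avoids underflow, which matters here because the product over the pixels of a field in equation (B-4) becomes extremely small for even moderately sized fields.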


APPENDIX C

DEPENDENT FEATURE TREES WITH THE NODES

REPRESENTING FEATURE SUBSETS


Very often it is necessary to have each node of a dependent feature tree represent a set of features instead of one feature. For example, in remote sensing, the satellite makes multiple passes over a given area and, at each acquisition, gathers several channels of data. In some instances, it is desirable to have each node of a dependent feature tree represent a set of features (e.g., the set of features corresponding to an acquisition). In this appendix, expressions are developed for the mutual information between the feature subsets and for the covariance between the feature subsets when a path connects them in a dependent feature tree. It is assumed that the features are Gaussian distributed.

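Let the components of feature vectors $X_i$ and $X_j$ be the sets of features represented by nodes i and j, respectively. Let $n_i$ and $n_j$ be the dimensionality of vectors $X_i$ and $X_j$, respectively. If the feature vector $X_i$ is normally distributed, its probability density $p(X_i)$ can be represented as

$$p(X_i) = (2\pi)^{-n_i/2}\,|\Sigma_i|^{-1/2}\exp\left\{-\frac{1}{2}(X_i - U_i)^T \Sigma_i^{-1} (X_i - U_i)\right\} \tag{C-1}$$

where $U_i$ is the mean vector and $\Sigma_i$ is the covariance matrix.

Let $Z = (X_i^T, X_j^T)^T$. Then,

$$p(Z) \sim N(U_z, \Sigma_z) \tag{C-2}$$

where

$$\Sigma_z = \begin{bmatrix} \Sigma_i & \Sigma_{ij} \\ \Sigma_{ij}^T & \Sigma_j \end{bmatrix} \tag{C-3}$$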

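The mutual information between feature vectors $X_i$ and $X_j$ can be written as

$$I(X_i, X_j) = \int\!\!\int p(X_i, X_j)\log\left[\frac{p(X_i, X_j)}{p(X_i)\,p(X_j)}\right] dX_i\, dX_j \tag{C-4}$$

$$= \frac{1}{2}\log\left[\frac{|\Sigma_i|\,|\Sigma_j|}{|\Sigma_z|}\right] \tag{C-5}$$

where $|\Sigma_z|$ is the determinant of the matrix $\Sigma_z$. Let y and v be zero mean normal random vectors. That is, $p(y) \sim N(0, C_y)$ and $p(v) \sim N(0, C_v)$. Let $Z = (y^T, v^T)^T$ and $p(Z) \sim N(0, C_z)$. Let

$$C_z = \begin{bmatrix} C_y & C_{yv} \\ C_{yv}^T & C_v \end{bmatrix}$$

and

$$C_z^{-1} = \begin{bmatrix} Q_y & Q_{yv} \\ Q_{yv}^T & Q_v \end{bmatrix}$$

Consider

$$p(y|v) = \text{constant} \cdot \exp\left\{-\frac{1}{2}\left[y + Q_y^{-1}Q_{yv}v\right]^T Q_y \left[y + Q_y^{-1}Q_{yv}v\right]\right\}$$

[The source numbers the expressions above (C-6) and (C-7) and follows them with "where" and equation (C-8). The placement of the numbers, the exact form of the exponent, and equation (C-8) are not legible in the source.]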

Thus, the density $p(y|v)$ is Gaussian with the mean $-Q_y^{-1}Q_{yv}v$ and the covariance matrix $Q_y^{-1}$. Following a similar argument, it can easily be shown that, if $X_i$ and $X_j$ are jointly normally distributed as in equation (C-2), $p(X_i|X_j)$ is normally distributed with the mean $U_i - Q_i^{-1}Q_{ij}(X_j - U_j)$ and the covariance matrix $Q_i^{-1}$, where $Q_i$ and $Q_{ij}$ are the corresponding blocks of $\Sigma_z^{-1}$. Now expressions for the covariance between the feature subsets, when a path connects their representative nodes in a dependent feature tree, can be derived as in section 4.2. For example, if $X_4$ and $X_7$ are Gaussian random vectors, similar to equation (4-3), the following can easily be obtained.

$$\int (X_7 - U_7)\, p(X_7|X_4)\, dX_7 = -Q_7^{-1}Q_{74}(X_4 - U_4) \tag{C-9}$$

Thus, expressions similar to equations (4-8) and (4-9) can easily be obtained.
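The two results of this appendix can be checked numerically with a short sketch. It assumes the joint covariance of $Z = (X_i^T, X_j^T)^T$ is available as one matrix whose leading block belongs to $X_i$. The function names are hypothetical, and the code illustrates the formulas rather than reproducing any program from the report.

```python
import numpy as np

def subset_mutual_information(cov_z, n_i):
    """I(X_i; X_j) = 0.5 * log(|S_i| |S_j| / |S_z|) for jointly Gaussian feature subsets.

    cov_z is the covariance of Z = (X_i, X_j); its first n_i rows and columns belong to X_i.
    """
    _, logdet_i = np.linalg.slogdet(cov_z[:n_i, :n_i])
    _, logdet_j = np.linalg.slogdet(cov_z[n_i:, n_i:])
    _, logdet_z = np.linalg.slogdet(cov_z)
    return 0.5 * (logdet_i + logdet_j - logdet_z)

def conditional_from_precision(cov_z, n_i, mean_i, mean_j, x_j):
    """Mean and covariance of p(X_i | X_j = x_j) from the blocks of the inverse covariance."""
    q = np.linalg.inv(cov_z)
    q_i, q_ij = q[:n_i, :n_i], q[:n_i, n_i:]
    q_i_inv = np.linalg.inv(q_i)
    cond_mean = mean_i - q_i_inv @ q_ij @ (x_j - mean_j)
    return cond_mean, q_i_inv
```

The conditional mean and covariance obtained this way agree with the more familiar form $U_i + \Sigma_{ij}\Sigma_j^{-1}(X_j - U_j)$ and $\Sigma_i - \Sigma_{ij}\Sigma_j^{-1}\Sigma_{ij}^T$, and for two scalar features the mutual information reduces to $-\tfrac{1}{2}\log(1-\rho^2)$.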

"Maximum Likelihood Clustering with Dependent Feature Trees"
SUPPORTING RESEARCH TECHNICAL REPORT
SR-L1-04031, JSC-16853
January 16, 1980
