esnoi agristars sr-ll-04031 (tft // a?o/> · sensing program. under contract nas 9-15800, dr. c....
TRANSCRIPT
AgRISTARS
3 Supporting Research
EsnoiSR-Ll-04031 (tft // A?O/>JSC-16853 '» " '^0737
JAN 2 1 1981
A Joint Program forAgriculture andResources InventorySurveys ThroughAerospaceRemote Sensing
MAXIMUM LIKELIHOOD CLUSTERING WITH DEPENDENTFEATURE TREES
C. B. Chittineni
(E81-10186) M A X I M U M LIKELIHOOD CLUSTEBIMG N81-29502MITH D E P E N D E N T F E A T U R E TREES (LockheedEngineering and Management) 54 pHC A04/f lF AQ1 CSCL 12A Onclas
V. G3/43 OQ1B6
Lockheed Engineering and Management Services Company, Inc.1830 NASA Road 1, Houston, Texas 77058
NASA4?
O
X
i«
*j
Lyndon B. Johnson Space CenterHouston, Texas 77058
https://ntrs.nasa.gov/search.jsp?R=19810020964 2018-08-30T09:29:33+00:00Z
1 Report No
SR-L1-04031, JSC-168532 Government Accession No 3 Recipient's Catalog No
4 Title and Subtitle
Maximum Likelihood Clustering With Dependent Feature Trees
5 Report Date
January 19816 Performing Organization Code
7 Author(s)
C. B. ChittineniLockheed Engineering and Management Services Company, Inc.
8 Performing Organization Report No
LEMSCO-1568310 Work Unit No
9 Performing Organization Name and Address
Lockheed Engineering and Management Services Company, Inc.1830 NASA Road 1Houston, Texas 77058
11 Contract or Grant No
NAS 9-15800
12 Sponsoring Agency Name and Address
National Aeronautics and Space AdministrationLyndon B. Johnson Space CenterHouston, Texas 77058 (Technical Monitor Dr. R. Heydorn)
13 Type of Report and Period Covered
Technical Report14 Sponsoring Agency Code
15 Supplementary Notes
16 Abstract
In this report, maximum likelihood clustering for the decomposition of mixture densityof the data into its normal component densities is considered. The densities areapproximated with first-order dependent feature trees using criteria of mutual informa-tion and distance measures. Expressions are presented for the criteria when the densitiesare Gaussian. By defining different types of nodes in a general dependent feature tree,maximum likelihood equations are developed for the estimation of parameters using fixed-point iterations. The field structure of the data is also taken into account in develop-ing maximum likelihood equations. Furthermore, experimental results from the processingof remotely sensed multispectral scanner imagery data are presented.
17 Key Words (Suggested by Author(s))
Clustering, distance measures, dependentfeature trees, fields, fixed-point iterationschemes, link, maximum likelihood equations,mutual information, parameter estimation,types of nodes
18 Distribution Statement
19 Security Qassif (of this report)
Unclassified20 Security Qassif (of this page)
Unclassified21 No of Pages
54
22 Price'
'For sale by the National Technical Information Service. Springfield, Virginia 22161
NASA — jsc
SR-L1-04031JSC-16853
MAXIMUM LIKELIHOOD CLUSTERING WITH
DEPENDENT FEATURE TREES
Job Order 73-306
This report describes Classification activities of theSupporting Research project of the AgRISTARS program.
PREPARED BY
C. B. Chittineni
APPROVED BY
T. C. Minter, SupervisorTechniques Development Section
U. E. Wainwright, ManagerDeve/Vopment and Evaluation Department
LOCKHEED ENGINEERING AND MANAGEMENT SERVICES COMPANY, INC.
Under Contract NAS 9-15800
For
Earth Resources Research Division
Space and Life Sciences Directorate
NATIONAL AERONAUTICS AND SPACE ADMINISTRATIONLYNDON B. JOHNSON SPACE CENTER
HOUSTON, TEXAS
January 1981
LEMSCO-15683
PREFACE
The techniques which are the subject of this report were developed to supportthe Agriculture and Resources Inventory Surveys Through Aerospace RemoteSensing program. Under Contract NAS 9-15800, Dr. C. B. Chittineni, a principalscientist for Lockheed Engineering and Management Services Company, Inc.,performed this research for the Earth Resources Research Division, Space andLife Sciences Directorate, National Aeronautics and Space Administration, atthe Lyndon B. Johnson Space Center.
PAGE BUM NO
CONTENTS
Section ,, Paqe
1. INTRODUCTI ON 1-1
2. GENERAL MAXIMUM LIKELIHOOD EQUATIONS 2-1
3. APPROXIMATING PROBABILITY DENSITY FUNCTIONSWITH DEPENDENT FEATURE TREES 3-1
3.1 CONSTRUCTION OF OPTIMAL DEPENDENT FEATURE TREES 3-1
3.1.1 A CRITERION BASED ON INFORMATION MEASURE 3-2
3.1.2 A CRITERION BASED ON PROBABILISTICDISTANCE MEASURES 3-3
3.2 EXPRESSIONS FOR THE CRITERIA WHEN THEDISTRIBUTIONS OF THE FEATURES ARE GAUSSIAN 3-4
3.2.1 AN EXPRESSION FOR THE MUTUAL INFORMATIONBETWEEN FEATURES xn AND Xj 3-5
3.2.2 AN EXPRESSION FOR Aj^x^x.) WHEN FEATURES x,
AND Xj ARE NORMALLY DISTRIBUTED 3-6
4. A GENERAL DEPENDENT FEATURE TREE 4-1
4.1 DIFFERENT TYPES OF NODES 4-1
4.2 AN EXPRESSION FOR THE COVARIANCE BETWEEN THE FEATURESCONNECTED BY A PATH IN A DEPENDENT FEATURE TREE 4-3
5. MAXIMUM LIKELIHOOD EQUATIONS FOR THEPARAMETERS OF THE DENSITY FUNCTIONS 5-1
5.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE A PRIORIPROBABILITIES. MEANS, AND VARIANCES 5-3
5.1.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THEPARAMETERS OF TYPE I NODES 5-3
5.1.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THEPARAMETERS OF TYPE II NODES 5-5
5.1.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THEPARAMETERS OF TYPE III NODES 5-6
P&ECEDiNG PAGE BLANK NOT FILMED
vn
Section Page
5.1.4 MAXIMUM LIKELIHOOD EQUATIONS FOR THEPARAMETERS OF TYPE IVA NODES 5-7
5.1.5 MAXIMUM LIKELIHOOD EQUATIONS FOR THEPARAMETERS OF TYPE IVB NODES 5-7
5.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THECOVARIANCES BETWEEN FEATURES. 5-8
5.2.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THECOVARIANCE BETWEEN TYPE I AND TYPE II NODES 5-8
5.2.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCEBETWEEN TYPE IVA AND TYPE II OR TYPE III NODES 5-9
5.2.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCEBETWEEN TYPES OF NODES OTHER THAN THOSE CONSIDEREDIN SECTIONS 5.2.1 AND 5.2.2 5-10
6. EXPERIMENTAL RESULTS 6-1
7. CONCLUDING REMARKS 7-1
8. REFERENCES 8-1
Appendix
A. DERIVATIONS OF MAXIMUM LIKELIHOOD EQUATIONSFOR THE PARAMETERS OF A TYPE III FEATURE A-l
B. MAXIMUM LIKELIHOOD EQUATIONS WITH FIELD STRUCTURE B-l
C. DEPENDENT FEATURE TREES WITH THE NODESREPRESENTING FEATURE SUBSETS C-l
vn i
FIGURES
Figure Page
3-1 An exampl e of a dependent feature tree 3-1
4-1 A general dependent feature tree 4-1
4-2 Illustration of a type IVa node 4-2
4-3 Illustration of a type IVb node 4-2
4-4 An example dependent feature tree 4-3
4-5 A path between features x^ and xr+s in a dependentfeature tree 4-4
4-6 A path between features xj and xr in a dependentfeature tree 4-5
5-1 Reduction in the number of parameters withthe dimensional ity 5-2
5-2 A link in a dependent feature tree with a type I node 5-4
5-3 A typical type II node in a generaldependent feature tree : 5-5
5-4 A typical type III node in a generaldependent feature tree 5-6
5-5 A typical type IVa node in a generaldependent feature tree 5-7
5-6 A typical type IVb node in a generaldependent feature tree 5-8
5-7 A typical link connecting type I and type II nodes 5-9
5-8 A typical link connecting type IVa and type IIor-type III nodes in a general dependent feature tree 5-9
6-1 Optimal dependent feature tree of segment 1648 6-1
6-2 Optimal dependent feature tree of segment 1739 6-3
6-3 Arbitrary dependent feature tree used in the experiment 6-4
A-l Illustration of a typical type III node in a generaldependent feature tree A-l
IX
1. INTRODUCTION
Recently, considerable interest has been shown in developing techniques for theclassification of imagery data (such as remotely sensed multispectral scannerdata acquired by the Landsat series of satellites) for inventorying naturalresources, monitoring crop conditions, and detecting changes in natural andmanmade objects. Nonsupervised classification or clustering techniques havebeen found to be effective in the analysis of remotely sensed data (ref. 1).The approach of clustering for imagery data classification, in general,involves two steps: (1) partitioning the image into its inherent modes or intoits homogeneous parts and (2) labeling the clusters using information from agiven set of labeled patterns.
In practical applications of pattern recognition such as remote sensing, it isdifficult to obtain labels for the patterns. In remote sensing imagery, ananalyst-interpreter provides the labels for the picture elements (pixels) byexamining imagery films and using other information (e.g., crop growth stagemodels and historic information). Remote sensing imagery usually has a fieldstructure, and it is recognized that fields are easier to label than arepixels. The development of algorithms for locating fields has attracted theattention of several researchers in the recent literature (refs. 2-5).
Considerable interest has been shown in applying maximum likelihood equationsfor the decomposition of the mixture density of the imagery data into itsnormal component densities (refs. 5-9). Recently, methods have been developed(refs. 10, 11) for probabilistically labeling the modes of the data usinginformation from a given set of labeled patterns and, also, from a given set oflabeled fields.
In decomposing the mixture density of the data into its normal component densi-
ties, the parameters of the component densities and the a priori probabilitiesof the modes are iteratively computed using maximum likelihood equations coupledwith a split and merge sequence. The updating of the parameters is usuallystopped after a few iterations because of the large amount of computation.
1-1
Also, in practical problems (remote sensing imagery data of several acquisi-tions), a large number of parameters will be estimated. For a fixed samplesize, the accuracy of estimation usually decreases (ref. 12) as the number ofparameters to be estimated increases. To overcome the computational require-ments and the large number of parameters to be estimated with the usual maximumlikelihood clustering technique, maximum likelihood equations are obtained inthis report by approximating the cluster conditional densities with first-ordertree dependence (refs. 13, 14) among the features. The field structure of thedata is also taken into account. Either the average mutual information betweenthe features (ref. 13) or the probabilistic distance measures (ref. 15) can beused to construct optimal dependent feature trees for a given data type.
This paper is organized as follows. General maximum likelihood equations arepresented in section 2. Section 3 concerns the problem of approximating proba-bility density functions with dependent feature trees using the criteria ofinformation measure and probabilistic distance measure. Expressions arederived for the criteria when the distributions of the features are Gaussian.In section 4, a general dependent feature tree and its various types of nodesare described, and expressions for the covariance between the features notconnected by a single link are derived. Maximum likelihood equations for theparameters of the density functions when approximated by dependent featuretrees are developed in section 5. Experimental results from the processing ofremotely sensed multispectral scanner imagery data are presented in section 6.Section 7 contains the concluding remarks. Detailed derivations of maximumlikelihood equations are given in appendix A. In appendix B, the field
structure of the data is taken into account in developing maximum likelihoodequations. An expression is derived in appendix C for the mutual informationbetween the feature subsets when they are represented by the nodes in a
dependent feature tree. Also, expressions are derived for the covariancebetween the feature subsets when they are connected by a path in a dependentfeature tree.
1-2
2. GENERAL MAXIMUM LIKELIHOOD EQUATIONS
General maximum likelihood equations are presented in this section for the
decomposition of the mixture density of the data into its component densities.
It is assumed that a set •¥? = {X^,»»»,XN} of N unlabeled patterns, each ofdimension n, is given. These patterns are assumed to be drawn independently
from the mixture density
mp(X|e) = 2 p(X,u> e )P(u> ) (2-1)
j=l J J J
where 9 is a fixed but unknown parameter vector, 64 is a parameter vector for
the j^n cluster, and m is the number of modes or clusters in the data. Let
P(u>.:) and p(X|aj-i) be the a priori probabilities of the modes and mode condi-j Jtional densities, respectively. The likelihood of the observed pattern vectorsis, by definition, the joint density
NP(*|e) = II p(Xje) (2-2)
Since the logarithm is a monotonic function of its argument, taking thegradient of the logarithm of equation (2-2) with respect to 9^ results in
N
V=m
P(X. |ve )p(o>1=1 " " ^
(2-3)
Nwhere * = £ log[p(X. |9] (2-4)
k=l k
and VQ a is the gradient of fc with respect to 9^ From the Bayes rule, the
a posteriori probability can be written as
p(X kK,e.)P(u>.)k 1
2-1
If the elements of 6n- and 6,- are assumed to be functionally independent, using
equation (2-5) in equation (2-3) yields
J, )V*= ? pfaJV^V Io9[p(xi<lvei)p(a)i)]i (2-6)
I K J . 1
The following likelihood equation for the a priori probabilities can easily be
obtained from equation (2-6) by introducing Lagrangian multipliers to take into
account the probability constraints on P(u-j).
i N1 V* ~f .. Iv a\ (2-7}
Since 8.J is a parameter vector of the density of the i*n cluster,equation (2-6) can be written as
N ,V* = ? P(^|Xk,6)VQ jlog[p(Xk|o)i,0i)] (2-8)
—
N
I K — J . 1
From equation (2-8), general maximum likelihood equations for the parameters of
the cluster conditional densities can be obtained.
2-2
3. APPROXIMATING PROBABILITY DENSITY FUNCTIONS WITH DEPENDENT FEATURE TREES
If the probability density function of the itn class is approximated by afirst-order dependent feature tree, it can be written as
where x is the m feature of pattern vector X; (m, , •••, m ) is an unknownm t » i npermutation of integers 1, 2, •••, n; and p(x |Xg), by definition, is equal top(x.j). Each variable in the above expansion may be conditioned upon, at most,one of the other variables. Figure 3-1 shows an example of a dependent feature
tree.
X2
Figure 3-1.- An example of a dependent feature tree.
The component of the density in the product approximation that is representedby a single link, such as the one connecting features x^ and XQ in figure 3-1,is p(xg|xg). The density that is approximated by the dependence tree offigure 3-1 can be written as
P(X) = P(x1)p(x2|x1)p(x3|x2)p(x4|x2)p(x5|x1)p(x6|x5)p(x7|x5)p(x8|x5) (3-2)
3.1 CONSTRUCTION OF OPTIMAL DEPENDENT FEATURE TREES
This section concerns the problem of constructing dependent feature trees. Thedependent feature tree, the density of which best approximates the true density,is proposed to be constructed using either the criterion of information preser-vation (ref. 13) or the criterion of class separability (ref. 15). An algorithmdeveloped by Kruskal (ref. 16) provides an efficient computational procedure forconstructing optimal dependent feature trees using the expressions developed inthe following.
3-1
3.1.1 A CRITERION BASED ON INFORMATION MEASURE
Let P-,t(X) be the approximate density of theapproximation. That is,
class with the product
n- n m, m. (3-3)
Consider the following measure of closeness between the true and approximatedensities (ref. 13). That is,
I(p,Pt)= (3-4)
where C is the number of classes. From equation (3-4), it is seen that
Hp.Pt) = 0 whenever p^X) is equal to p-jt(X) for all X and that I(p,pt) > 0 ifP-,(X) is different from P-,^(X) for some X. To find the product approximationfor the densities or the dependent feature tree that minimizes I(consider
Kp.Pt) = - E P( ) f Pi1 1=1 i J i
C£
E
where
=J Pi^^j
and E UxJ4=1 *
(X)log[p(X)]dX
+ E PK) f P1=1 n J ^
XjlogCp (X)]dXn
(3-5)
(3-6)
3-2
The quantity I [x ,x , »] is the mutual information between features xp andT * J v* /xi(£) of c^ass ""• From equation (3-5), Kruskal's algorithm (ref. 16) can beefficiently used to construct optimal dependent feature trees.
3.1.2 A CRITERION BASED ON PROBABILISTIC DISTANCE MEASURES
A probability density function, like any other function, can be approximated bya number of different procedures. In the sense of preserving the separabilitybetween the classes, it is proposed that a criterion based on probabilisticdistance measures such as divergence be used to construct dependent featuretrees. The measure of closeness between the approximate and true densities isdefined as
'12 ogPlt(xy
dX + (3-7)
From equation (3-7), it is seen that J^ is large whenever the ratio of
to P2t(x) 1S Iar9e in the region of class 1 and the ratio of P2t(x) tois large in the region of class 2. By using the product approximation of equa-
tion (3-3) for the densities p^(X), equation (3-7) can be written as follows.
'12
n f+ .E j p2(
x)]
. £ fp rx ,x llogpfc
mj(T)
(3-8)
where
and
K = Ms)'dxm.
Ms)'(S)
dxmi
(3-9)
(3-10)
If more than two classes exist, the expected value of the measure of closeness
defined over pairs of classes can be used to obtain optimal approximations for
the densities (ref. 17). From equation (3-8), Kruskal's algorithm (ref. 16)
can be efficiently used to construct optimal dependent feature trees.
3.2 EXPRESSIONS FOR THE CRITERIA WHEN THE DISTRIBUTIONS OF THE FEATURES AREGAUSSIAN
Expressions are derived in this section for the mutual information and for AJ,?between the features, assuming that the distributions of the features are
Gaussian. If Pn(x,-)» the density of feature x., of the Jr" class, is Gaussian,
it can be written as
or it is denoted as P^(X-J) ~ N[u.(£),o.(£)]. The joint and conditional
densities of features x,- and x,- of the £tn class can be written as follows.1 J
0-
exp - \ qt(Xl .X.,)] (3-12)
where
Vvxj) = ^.\
(3-13)
3-4
(3-14)
and 0-,,-U) is the covanance between features xn- and x, of the £ class. Fromequations (3-11) and (3-12), the conditional density can be written as
1
.,.
where
(3-15)
(3-16)
3.2.1 AN EXPRESSION FOR THE MUTUAL INFORMATION BETWEEN FEATURES x1 AND x.
In this section, an expression is derived for the mutual information betweenGaussian-distributed features x1 and Xj of class £. From equations (3-11) and(3-15), the following can easily be obtained. Consider
f ? 1= 1 - P (A)L iJ J
(3-17)
From equation (3-17), the mutual information between features xn- and x, ofclass £ can be obtained as follows.
(3-18)
3-5
3.2.2 AN EXPRESSION FOR AJ1 9(x. ,x.) WHEN FEATURES x, AND x, ARE NORMALLYnrcTDiRiiTcn *•<- ' J JDISTRIBUTED
From equations (3-7) and (3-9), when features x^ and Xj are normallydistributed, an expression for AJ^x ,x-) can be easily obtained as follows
Ifo (l)o (2) - 2o ( l )o (2) + o ( l )o (2) a (2)o (1) - 2a (l)o (2) + a (2)o (1) ]0.1 /„ v \ - / l J _ J _ 'J IJ _ ] _ J . J _ ] _ ' J 'J _ 1 _ J O I2Aj12(xi'xj) ' - A^TTf - - + 4^T2T -- 2J
",(2)
(3-19)
where A.^l) = a.(l)oj(l) - <%.(!) (3-20)
3-6
4. A GENERAL DEPENDENT FEATURE TREE
A general dependent feature tree is shown in figure 4-1.
*18
x4
x12
Figure 4-1.- A general dependent feature tree.
Each node of the tree represents a feature, and the feature numbers are givenin figure 4-1. In approximating the probability density functions with depend-ent feature trees, each feature may be conditioned upon, at most, one of theother features. Node Xj is the root node of the tree. Nodes x2, x4, x5, x7,etc., are nodes on the periphery of the tree.
4.1 DIFFERENT TYPES OF NODES
For convenience in the following analysis, the nodes of the dependent featuretree are divided into the following types.
1. Type I nodes: Except for the root node, nodes on the periphery of the treeare defined as type I nodes. For example, in figure 4-1, nodes x2, x^, x5,xy, etc., are type I nodes.
4-1
2. Type II nodes: These are nodes which are one node deep from the periphery.
For example, in figure 4-1, nodes x6, XIQ, x15, x17, x^g, etc., are type II
nodes.
3. Type III nodes: These are nodes which are at least two nodes deep from theperiphery. For example, in figure 4-1, nodes X3, XQ, xg, x^, etc., aretype III nodes.
4. Type IV nodes: The root node of the tree is defined as a type IV node.The types of root nodes are divided into type IVa and type IVb. Examplesof the types of root nodes are described in the following.
a. Type IVa node: The type IVa node is the root node of a tree with asingle link. As an example, node x of figure 4-2 is a type IVa node.
xl
Figure 4-2.- Illustration of a type IVa node.
b. Type IVb node: The type IVb node is the root node of a tree with twoor more links. As an example, node x^ of figure 4-3 is a type IVbnode.
Figure 4-3.- Illustration of a type IVb node.
It is noted that the type IVb node is different from the type IVa node in thatmore than one node links directly with the root node of the tree.
4-2
4.2 AN EXPRESSION FOR THE COVARIANCE BETWEEN THE FEATURES CONNECTED BY A PATHIN A DEPENDENT FEATURE TREE
An expression for the covariance between the features when a path connects
their representative nodes in a dependent feature tree is developed in thissection. For example, features X-Q and x^g are connected through features X
Xg, xg, x^, and x^ in the dependent feature tree of figure 4-1. For the
following analysis, consider the dependent feature tree shown in figure 4-4.
X3
Figure 4-4.- An example dependent feature tree.
The probability density represented by the dependent feature tree of figure 4-4
can be written as follows.
P(X) = P(x1)p(x2|x1)p(x3|x2)p(x4|x2)p(x5|x2)p(x6|x5)p(x7|x4) (4-1)
In the following, an expression for the covariance between features x6 and x7of figure 4-4 is derived.
E[(xg - u6)(x7 - u?)] =f (x6 - u6)(x? - uy)p(X)dX
=1 (x6 - U6)p(x2)p(x5|x2)p(x6|x5)dx2 dx5 dxg
•J p(x4|x2)dx4J (x7 - U7)p(x7|x4)dx?
From equation (3-15), the following equations are obtained.
(4-2)
rI (xy - U7)p(x7|x4)dx? = - u4)
/<*4 - »,
(4-3)
(4-4)
4-3
Using equations (4-3) and (4-4) in equation (4-2) yields
E[(x6 - u6)(x7 - u?)] - P74 P42y /(*2 - U2)p(x2)dx2
Jp(x5|x2)dx5 J (xg - U6)p(x6|x5)dx6 (4-5)
Similar to equations (4-3) and (4-4), which were developed from equation (3-15),
the following are obtained.
(x6 - U6)p(x6|x5)dx6 = p65 — (x5 - u5)b
J
- U5)p(x5|x2)dx5 = p
- U2)p(x2)dx2 =
(x2 - U2)(4-6)
From equations (4-5) and (4-6), the covanance between features Xg and
be obtained as follows.
n °74 °42 °65- U7>1 = •^•-^• '52E[(x6 - u6)l
For a general case, the following theorems can easily be established.
can
(4-7)
Theorem 1: Suppose the features x^ and xr+s in a dependent feature tree are
connected by a path as shown in figure 4-5. Then, the covariance between
features x-^ and xr+s is given by equation (4-8).
xr+lxr+2
/r+s-1
V+s
Figure 4-5.- A path between features xj and xr+sin a dependent feature tree.
4-4
°23 Vl,r Vs,r+s-l
V "' —r+s-l,r+s-2 r+2,r+l ,. ON—' r-2 * o i (4-8)°r+s-2 Vl r 1>r
Theorem 2: Suppose the features xj and xr in a dependent feature tree are
connected by a path as shown in figure 4-6. Then, the covariance between
features x^ and xr is given by equation (4-9).
x1 x2 x3 x4 xr_! xr
Figure 4-6.- A path between features x^ and xrin a dependent feature tree.
°1? °?^ a14 °r 9 r 1Ul)(xr - ur)] . Ji . Ji . Ji ... -SiLl • or.1>r (4-9)
4-5
5. MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF THE DENSITY FUNCTIONS
In this section, maximum likelihood equations are developed for estimating theparameters of the cluster conditional densities when approximated by the first-order dependent feature trees. In practice, such as in the classification ofremotely sensed multispectral scanner imagery data, considerable interest hasbeen shown in applying maximum likelihood clustering for the decomposition ofthe mixture density of the data into its normal component densities. Themixture density p(X) can be written as
(5-1)P(X) = L P( )P(X| )
where m is the number of clusters, and P(w-j) and p(X|w1-) are the a priori proba-bilities of the modes and mode conditional densities, respectively. If thecluster conditional densities are Gaussian [i.e., p(X|u>.) ~N(U.,£ )], by usinga given set of N independent observations from the mixture density, from equa-tion (2-6), the maximum likelihood equations for the estimates of the parametersof the mixture density can easily be shown to be the following (ref. 6).
^u = IX=1ui N(5-2)
k=l
(xk - u.)(xk - ul)TP(u)l|xk)
N
k=
In maximum likelihood clustering, equation (5-2) is used for updating the
parameters of the densities, and this computation is coupled with a split andmerge sequence. The updating is usually stopped after a few iterations becauseof the large amount of computation in clustering data such as imagery data.
5-1
For practical problems, the number of parameters to be estimated is large.Using equation (5-2), the number of parameters to be estimated for each mode is
3', where n is the dimensionality of the patterns. It is known that,with a fixed sample size, the accuracy of estimation usually decreases as the
number of parameters to be estimated increases (ref. 12).
In this paper, the cluster conditional densities are approximated with first-order dependent feature trees to reduce the number of parameters to beestimated. In the product approximation for the densities discussed in the
previous sections, it is noted that each feature is conditioned upon, at most,one of the other features. The number of parameters to be estimated for eachmode is obtained as follows: the means n, the variances n, and the covariances(n - 1), or a total of (3n - 1), where n is the dimensionality of the patterns.When the product approximation is used for the probability densities, with anincrease in the dimensionality of the patterns, the reduction in the number ofparameters for each mode is as shown in figure 5-1.
O).0
c toi.
OJ <DJC 4->4J QJ
C ro
toC Q.O•- M-4-> OO3
XJ0)
t.80
.60
.40
.20
01 11 21 31 41
n dimensionality —*•
Figure 5-1.- Reduction in the number of parameters with the dimensionality.
In the following, maximum likelihood equations are developed for estimating theparameters of the cluster conditional densities when approximated with first-order dependent feature trees. It is assumed that the structure of the depend-ent feature tree is determined using the techniques discussed in section 3.The different types of nodes described in section 4 are considered separately.
5-2
5.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE A PRIORI PROBABILITIES. MEANS. ANDVARIANCES
In this section, maximum likelihood equations similar to equation (5-2) are
obtained for the a priori probabilities of the clusters and for the means andvariances of features in each cluster when the cluster conditional densitiesare approximated with dependent feature trees. The different types of nodesdiscussed in section 4 are treated separately. It is assumed that a set-X- = {X,,"«,XN> of N unlabeled patterns, each of dimension n, drawn independ-ently from the mixture density p(X) is given. When the cluster conditionaldensities are approximated by first-order dependent feature trees, the densityof the itn cluster can be written as
1=1The maximum likelihood equations for the a priori probabilities of the clusterscan easily be shown to be the following.
PK) =4 E pK-K) (5-4)1 N k=l n K
If e.j is a parameter of the ith cluster, using equation (5-3) in equation (2-8)results in
(5-5)36.
In the following, it is assumed that the distributions of pattern features ineach cluster are Gaussian. That is,
P ) ~N[U(i),a(i)] (5-6)
5.1.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE I NODES
Consider a link in a dependent feature tree containing a type I node, as shownin figure 5-2.
5-3
X2X3
Figure 5-2.- A link in a dependent feature treewith a type I node.
Since each feature is conditioned upon, at most, one of the other features,
equation (5-5) becomes
U-'f; P<»,IV»if-|109Cl>i(x2'x3>]lI K J . 1
for = u3(i)
and ^ = o3(i) (5-7)
When A = u3(i), from equation (5-7), the following is obtained.
N
(5-8)
k=l
In equation (5-7), letting <t>1 = 03(1) and $. = o23(i) and eliminatingf k "I2X3 ~ U3^M ^rom t'ie resultin9 equations yields, after simplification, anexpression for the covariance between type I and type II nodes. That is,
E(5-9)
c=l ' "
Letting A^ = a3(i) in equation (5-7) and using equation (5-9) yields the
following.
V /
5-4
5.1.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE II NODES
A typical type II node, as defined in section 4, in a general dependent featuretree is shown in figure 5-3.
-•4-
Figure 5-3.- A typical type II node in ageneral dependent feature tree.
In figure 5-3, node xs is a type II node, and nodes x,, x,,, •••, x are type I
nodes. The terms in the product approximation of the probability density
function of cluster i containing feature x$ are
un) = ••• P(xs|xt)p(x1|xs) ••• P(xp|xs) ••• (5-11)
If 61 is a parameter of the ith cluster, using equation (5-11) in
equation (2-8) yields
-= E PKK.ei k=l 39.
4=1Pi(x4|xs) + P1(xs|xt) = 0 (5-12)
From equation (5-12), letting 91 = us(i) results in the following maximum
likelihood equation for the mean of feature xs of cluster i.
Uc(l) =
where
£
1) -as£(i)
(5-13)
(5-14)
5-5
In equation (5-12), letting 6i = 0s(i), °si(1)» "*» Osr^^' and °st yie1ds
the following after simplification.
(5-15)
5.1.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE III NODES
A typical type III node in a general dependent feature tree is shown infigure 5-4.
Figure 5-4.- A typical type III node in ageneral dependent feature tree.
In figure 5-4, xs is a type III node; and nodes x,, x2, •••, xr and x^ may betype III nodes or other types of nodes. Proceeding as in section 5.1.2, themaximum likelihood equations for the variance and mean of feature xs ofcluster i can be shown to be the following (see appendix A).
as(i)• •!<•»
r-f o (i)o (i)[g .,;«)4o
(5-16)
5-6
and
k=p(w i lxk>8)xs r V1'
k=lP ( " | x . e )
(5-17)
5.1.4 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVA NODES
A typical type IVa node in a general dependent feature tree is shown infigure 5-5.
xl X2 X3
Figure 5-5.- A typical type IVa node in ageneral dependent feature tree.
In figure 5-5, xj is a root node of type IVa, and node /2 may be of type I,type II, or type III. The maximum likelihood equations for the variance andmean of feature Xj of cluster i are given in the following.
and
k=p(u»i|xk,e)
=1p(u1|X|(f8)
(5_18)
(5-19)
5.1.5 MAXIMUM LIKELIHOOD EQUATIONS FOR THE PARAMETERS OF TYPE IVB NODES
A typical type IVb node in a general dependent feature tree is shown infigure 5-6.
5-7
xr+l
Figure 5-6.- A typical type IVb node in ageneral dependent feature tree.
In figure 5-6, xs is a root node of type IVb, and nodes x,, x2, •••, xp are of
type I, type II, or type III. The maximum likelihood equations for the
variance and mean of feature xs of cluster i can be shown to be the following.
(i) =k=l
(r- - ua(i)
1=1
k*
N
k=l- u(1)
and(5-20)
-.(i) (5-21)
5.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCES BETWEEN FEATURES
In this section, maximum likelihood equations are developed for the covariances
between the features when the probability density functions of the clusters areapproximated by first-order dependent feature trees.
5.2.1 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE I ANDTYPE II NODES
In this section, maximum likelihood equations for the covariance between type I
and type II features are derived. A typical link connecting type I and type II
nodes is shown in figure 5-7.
5-8
xr xs
Figure 5-7.- A typical link connecting type Iand type II nodes.
In figure 5-7, node xs is of type I, and node xr is of type II. The maximumlikelihood equation for the covariance between features xr and xs of cluster iis given in the following.
E P(«l|Xk.e)[x!;.ur(1)][xJ.us(1)]orsd) - op(1) —T, -, (5-22)
E PKlXk,e
5.2.2 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPE IVA ANDTYPE II OR TYPE III NODES
A typical link connecting type IVa and type II or type III nodes in a generaldependent feature tree is shown in figure 5-8.
xl
Figure 5-8.- A typical link connectingtype IVa and type II or type III nodesin a general dependence tree.
In figure 5-8, node xj is of type IVa, and node x2 may be of type II ortype III. The maximum likelihood equation for the covariance betweenfeatures X} and x2 of cluster i is as follows.
E(5.23)
5-9
5.2.3 MAXIMUM LIKELIHOOD EQUATIONS FOR THE COVARIANCE BETWEEN TYPES OF NODESOTHER THAN THOSE CONSIDERED IN SECTIONS 5.2.1 AND 5.2.2
Let there be a link between nodes x2 anc' X3 in a 9ener%al dependent feature tree
whose types are other than those considered in sections 5.2.1 and 5.2.2. Themaximum likelihood equation for the covariance between features x2 and X3 ofcluster i is given in the following.
Z p(M1|xk,e) o2(i)o3(i) + o23(i)A23(i) + o23(i)[x2 - u2(i)l[x3 - u3(i)l|'=1 ( ii (5-24)
P(«1|X|C,8)
5-10
6. EXPERIMENTAL RESULTS
In this section, some results from processing remotely sensed Landsatmultispectral scanner imagery data are presented. The images are of a 5- by6-nautical-rmle area called a segment. The image is divided into a rectangulararray of pixels, 117 rows by 196 columns. The image is overlaid with a rectan-gular grid of 209 grid intersections. Two classes are considered: class 1 iswheat, and class 2 is "other." The true (ground truth) labels for the pixelsat the grid intersections are acquired. The locations of the segments and theindividual acquisitions used for each of the segments are listed in table 6-1.The a priori probabilities of the classes are estimated as sample estimates.Equations (3-6) and (3-18) are used to compute the weighted mutual informationbetween the features, assuming in each class that the features are Gaussiandistributed. Kruskal's algorithm (ref. 16) is used to construct optimaldependent feature trees by minimizing I(p,pj.) of equation (3-5). The optimaldependent feature trees of segments 1648 and 1739 are shown in figures 6-1 and6-2, respectively.
X2
X7
Figure 6-1.- Optimal dependent feature treeof segment 1648.
Generally, it is known that, for each acquisition, a strong dependency existsbetween channels 1 and 2 and between channels 3 and 4. From figures 6-1 and6-2, it is seen that these dependencies appear in the optimal dependent featuretrees.
6-1
OL
LL.
OO00
oo
o00UJI—IoQi
Oo<c
oooo
CJ
Q
00LUOi— iOL
oP-H
oo
IVO
CO
depe
nden
t>
tre
eA
rbit
rary
fea
turt
4->COl•a <uc a><u i-CL4->
•o 01,— 3in 4_>E 10r- O)
CLO
4->COJ in
T3 QJc u<U 3Q-J->QJ 10
•O OJC V>-
ari
an
ceIX> u
O -4->0 ro
2U.
sit
ion
3CT
5
Lo
catio
n
+i
iu
j_></) 4->HI V
I— t/>
O1
C
C 4J•- 0)<o ini-
1—
•4->in +jQJ (U
h— in
0c
C *JI- OJ10 inI_
»—
4->l/> 4_)at a>
I— </)
01cC 4->r- (U10 </>i_
1—
4->I/) 4.)OJ 01
I— C/>
0)C
E 4->*- ai10 ini_
1—
t/)a>4->
-O
f«= <->3 <O3 *JJ </>— •»
J
|
J)1)•)
f~ CT> ^~If) O
CT)1 -
KJ- if> •, CM i- lt O
ir> «i- evjIf) CTv
1Cr^
10 01 •OJ •-< O
oo eo r^to vo
to00
CO If) .CJ C3
vo t~- toto ^~
oo00
LT> to •[ PO O
co t? toto i
•3-oo
CO OOt CM , O
ir> o cr>to <o
CVJoo
to ro •CM .-1 O
•d- tn CTI^H tO O
oor^
r^. CA •t t-i 1 O
no cor^ tvj
*}•Ol
oo co •t cvj j o
<T3
u- to or— in CNJ
Big
Sto
ne,
Min
neso
ta
oCMtr>t— t
i— t C\J CTCVJ dJ ,— 1
voLO
r^ to •| CO CM[ O
CO lf> t-<»-l C«J CO
r^to
to •— < •*T CM O
r CM CM.— 1 CO IT)
ento.— i in •
U- r-< O
t-l to O^i-H CM »-)
Ot^
r^ o •t ^»- cMt o
'CM r-.1 oCM CM O
O
to O •| CO CMt C
co to r~-1 CM CM
00to
in o •«• CM O
O CO CT>CM CM O
ooIf)
CO *t •, CO CM§ O
CT) C^CM
If)f^
cr» »^ •[ ^- •-'| O
CO lf>*)• CM
Re
nvi
lle,
Nor
th D
akot
a
sto•— <
to if) en•—I CO O
ooIf)
to oo •CM CM O
CO O OOt— 1 ^- CO
If)to
oo co •CM CM O
CM •-! CM,-H «a- tor^
toO CM •CO CM O
If) CM O
•* If)r^
to «— t •CO CM O
*!• to lf>»— i co o>
oto
OO r~~ •, CM CM, O
OO CT) COCO CM
CT>10
CO <• •CO CM O
OO if) >— i»-H «*• r
If)to
1-00CM i— 1 O
•— 1 Lf) <— 1f-H ^- r— t
CMt~
o oo< co f-i[ o
CT) If)r~ CM•— 1 «— t
i r**
<o4->o<o
• oc«j :i t0 0
CO Z
oo«!•tof-H
CM 1- r~CM If) tOto
toto co •t— 1 r-l O
CT) t CO^H If) CM
CT)to
00 COf— 1 «-! O
rt/1 O* CMI-l If) If)
CT)to
CO r-~ •| CM t-l O
CM If) CMi— 1 If) CT)
tor^
If) CM •CM i-i O
O CO CMCM If) IOr-*
tooo •* •t-H *-H O
r if) CM1— 1 tf) «— 1
CMp^
O CM •CM i-H O
CT) O i-lr-H If) r-
lf>to
CT» •i— i i— i O
co co r^CM to CM
COCT)
*1- «*• •co o
CM 00 CM COCM VO CO toCM i— 1 •— 1 CM
r p . r*» r—
Tet
on,
Mon
tana
CT)COr*-I— 1
—t CO If)!-< VO O
CT)r~
O .-1 •t CM i-l, O
r-. co if)vo vo
CO00
^ 0CM i-l O
i— i 00 i— ii-H tO 00
CO
o to •, CM | O
i-v r~to if>
r .CO
*»• to •CM O
•—i co tr>— I VD O
CDr^
0 i-l| CM i—l, O
f CO If)to to
COCO
^- o •CM i-l O
^- vo if)•—I to O
CT)r-
f^ CO •[ i-l O
*»• CTI t— 1ID CO
CMCT)
r*> ^- •CM O
co r~ coCTl to If)i-l O CM
r p~ P~
</>• IO
I/) COVI CHI 10•z. ^
COIf)00I-l
oor^to
•
0
oIf)CMr**•o
CO*fvf>r-^
•o
I-HtoCT>r^
0
r^^cor^•o
CT)I-HIf)r^
o
COCOCT)to
•
o
*}•If)If)00
•o
ssif
ica
tio
n
*s-O IO
C 3IO OQJ <J£ 10
</)
IO
13
uOJs.i.o
M JD3 10
>*- )C OO !-O O.
10 .0,
6-2
Optimal dependent feature treeof segment 1739.
To investigate the effectiveness of the optimal dependent feature trees inclassification, an experiment is performed to compare the classification accu-racies of the Bayes classifier (1) when the densities are approximated withoptimal dependent feature trees, (2) when no approximation is used for thedensities (full covariance matrix), (3) assuming the features are independent,and (4) when the densities are approximated with arbitrary dependent featuretrees. Spectral vectors of 104 labeled pixels are used as the training patternset, and the spectral vectors of the remaining 105 labeled pixels are used asthe test pattern set. The structure of the arbitrary dependent feature treeused in this experiment is shown in figure 6-3.
The computed confusion matrices and the classification accuracies on the train-ing and test sets for each of the segments processed are listed in table 6-1.From table 6-1, it is seen that, in general, better classification accuracies
6-3
X3
x4
xn-l
Figure 6-3.- Arbitrary dependent feature treeused in the experiment.
are obtained on the training set when the full covariance matrix is used
without approximating the densities. Improved classification accuracies are
obtained on the test set when the densities are approximated with optimal
dependent feature trees. This might be due to the fact that a large number of
parameters are estimated when the full covariance matrices are used.
One of the important objectives in the classification of remotely sensed
agricultural imagery data is to estimate the proportion of the class of
interest in the image. The ratio of the variance of the estimated proportion
using machine classification to the variance of the estimated proportion using
simple random sampling is called variance reduction factor R (ref. 1). The
quantity R can be viewed as an indication of how much the machine classifica-tion improves the proportion estimation. The computed variance reduction
factors for each of the segments processed are listed in table 6-2. From
table 6-2, it is seen that the variance reduction factor consistently improves
when the densities are approximated with dependent feature trees, compared tothe other cases.
6-4
oCJ
QUJa;
OOO
o;<cQ-
oo
CD
cO)-oC OlO) CUQ. L.O) 4->
•XJCU
>> «-l_ 3
1- re4-> CU
s-
4J
0)T3 OJC <D
D-*JMl
•o QJp— 3re 4->E "3•i- QJ
CLO
cQJ (/)
!£CU 3
cu re•o 01C M->— *
QJO
reS- Xre v> j_0 4->O 03
£
u_
cor-
4-> IO
•r- IO
CTO
eC
C »O >^--"^
+J C 4->re 3 re0 O 4->O 0 co
1C71CU
CO
oIDf^.
0
CVJCOIDID
O
O
VOM
O
^J-C3^! HCO
O
*i- vo Ot ID CVJ
i rv. r^
M
cu reC +Jo o
4-> COco cuo> cr- *r-
CO £
oCVJ
1-H
oo r-~ ID oCO VO ^" f^CT^ CT^ CTv ^^
O O O O
ID VO «*• VO^"* O^ ^J ^^*• oo r*. ID00 00 00 VOo o o o
r>- Cfi cvj I-Hco o «*• ovo to cvi r*»Cn Cn Cn *• • • •o o o o
VO ID CVJ <*00 CVJ CO COr~ co ^t- Oo> en c7t oo• • • •o o o o
CO ID CT> ID CVI CO CVJ CO CO^•cv i t ^ c v j c v j v o c o v o eni^r^ 1^1^ t ^ r > - r ~ . v o r»
re reo o
* ^ ^*cu re re
r— o • o rei— C • C•i- .c re j: c re •> 4-> E 4J O 4-> COC t- $ t. 4-> C COCU O O O CU O CU
CC Z CO Z 1— £ Z
^* oo en coO *f CO IDVO VO l~~ OOt-H i-H i-H i-H
1- COVO IDCO CVJt^ VO
COreCOcre*
6-5
7. CONCLUDING REMARKS
In the classification of imagery data, such as in the machine processing ofremotely sensed multispectral scanner data, unsupervised classificationtechniques have been found to be effective. The application of clusteringtechniques for the analysis of imagery data essentially involves two steps:(1) clustering the data or partitioning the image into its inherent modes and(2) giving the probabilistic class labels to the resulting clusters. Inpractice, it is observed that fields are relatively easy to label when compared
to pixels.
Several researchers have investigated methods for locating fields in theimagery data. Recently, considerable interest has been shown in developingtechniques for probabilistically labeling the clusters using information from agiven set of labeled patterns and, also, from a given set of labeled fields.
In decomposing the mixture density of the data into its normal component densi-ties, the parameters of the component densities and the a priori probabilitiesof the modes are iteratively computed using maximum likelihood equations coupledwith a split and merge sequence. The updating of the parameters is usuallystopped after a few iterations; and for practical data, a large number of param-eters must be estimated. For a fixed sample size, the accuracy of estimationusually decreases as the number of parameters to be estimated increases.
To overcome the above shortcomings, it is proposed in this paper that the den-sities be approximated with first-order dependent feature trees. The dependentfeature trees can be constructed using criteria based on information measureand, also, based on class separability measure. Expressions are derived for
the criteria when the distributions of the features are Gaussian. Expressionsalso are derived for the covariances between features not connected by a singlelink in the dependent feature tree.
Different types of nodes are defined in a general dependent feature tree.Maximum likelihood equations are derived for the parameters of the mixture
7-1
density of the data by approximating the cluster conditional densities withfirst-order dependent feature trees. The field structure of the data is alsotaken into account in the decomposition of the mixture density of the data intoits normal component densities. Furthermore, experimental results from theprocessing of remotely sensed multispectral scanner imagery data are presented.
7-2
8. REFERENCES
1. Heydorn, R. P.; Trichel, M. C.; and Enckson, J. D.: Methods for SegmentWheat Area Estimation. Proc. LACIE Symp., NASA/JSC (Houston). JSC-16015,vol. II, July 1979, pp. 621-632.
2. Bryant, J.: On the Clustering of Multidimensional Pictorial Data.Pattern Recognition, vol. 11, no. 2, 1979, pp. 115-125.
3. Kauth, R. J.; Pentland, A. P.; and Thomas, G. S.: Blob: An UnsupervisedClustering Approach to Spatial Preprocessing of MSS Imagery. Proc. llthInt. Symp. on Remote Sensing of Environment. Environmental ResearchInstitute of Michigan (Ann Arbor), 1977, pp. 1309-1317.
4. Kettig, R. L.; and Landgrebe, D. A.: Classification of MultispectralImage Data by Extraction and Classification of Homogeneous Objects. IEEETrans. Geoscience Electronics, vol. GE-14, Jan. 1976, pp. 19-26.
5. Peters, B. C., Jr.: On the Consistency of the Maximum Likelihood Estimateof Normal Mixture Parameters for a Sample With Field Structure. Reportno. 74, Dept. of Math., Univ. of Houston, Sept. 1979.
6. Duda, R. 0.; and Hart, P. E.: Pattern Classification and Scene Analysis.John Wiley and Sons (New York), 1973.
7. Hasselblad, V.: Estimation of Parameters for a Mixture of NormalDistributions. Technometrics, vol. 8, 1966, pp. 431-446.
8. Day, N. E.: Estimating the Components of a Mixture of NormalDistributions. Biometrika, vol. 56, 1969, pp. 463-474.
9. Lennington, R. K.; and Rassbach, M. E.: Mathematical Description and Pro-gram Documentation for CLASSY: An Adaptive Maximum Likelihood ClusteringMethod. Lockheed Electronics Company, Inc., Tech. Memo LEC-12177,JSC-14621, NASA/JSC (Houston), Apr. 1979.
10. Chittineni, C. B.: Some Approaches to Optimal Cluster Labeling ofAerospace Imagery. Lockheed Engineering and Management Services Company,Inc., Tech. Report LEMSCO-14597, JSC-16355, NASA/JSC (Houston), May 1980.
11. Chittineni, C. B.: Probabilistic Cluster Labeling of Imagery Data.Lockheed Engineering and Management Services Company, Inc., Tech. ReportLEMSCO-15358, JSC-16384, NASA/JSC (Houston), Sept. 1980.
12. Hughes, G. F.: On the Mean Accuracy of Statistical Pattern Recognizers.IEEE Trans. Information Theory, vol. IT-14, Jan. 1968, pp. 55-63.
13. Chow, C. K.; and Liu, C. N.: Approximating Discrete Probability Distribu-tions With Dependence Trees. IEEE Trans. Information Theory, vol. IT-14,May 1968, pp. 462-467.
8-1
14. Lewis, P. M.: Approximating Probability Distributions to Reduce StorageRequirements. Information and Control, vol. 2, Sept. 1959, pp. 214-225.
15. Chittineni, C. B.: Efficient Feature Subset Selection With ProbabilisticDistance Criteria. Lockheed Electronics Company, Inc., Tech. MemoLEC-13355, JSC-14870, NASA/JSC (Houston), May 1979.
16. Kruskal, J. B.: On the Shortest Spanning Subtree of a Graph and theTraveling Salesman Problem. Proc. Anerican Math. Soc., vol. 7, 1956,pp. 48-50.
17. Tou, J. T.; and Gonzalez, R. C.: Pattern Recognition Principles.Addison-Wesley Publishing Co., Inc. (Mass.), 1974.
8-2
APPENDIX A
DERIVATIONS OF MAXIMUM LIKELIHOOD EQUATIONS FOR THE
PARAMETERS OF A TYPE III FEATURE
In this appendix, maximum likelihood equations for the parameters of a typical
type III feature are derived. A typical type III node in a general dependent
feature tree is illustrated in figure A-l.
X5
Figure A-l.- Illustration of a typical type III nodein a general dependent feature tree.
In figure A-l, x2 is a type III feature. The following is obtained from equa-
tion (5-5) by keeping only the terms that involve feature x2 in the product
approximation of the density of the i^"1 cluster.
3 o7= E PfoJXfc.e) log[p.(x4 |x2)] + 1og[p.(x3|x2)]1 K~ -L I
+ Iog[p1(x2 |x1)] j
Tog[pi(x2,x4)] + Iog[pi(x2,x3)]N
K~l 1
+ log[Pl(Xl,x2)] - 2 log[Pl(x2)] - logCp^Xj)] (A-l)
A-l
It is assumed that the features of the ith cluster are Gaussian distributed.That is,
. rrr iV1^- v""2rs
log[Pl(xr,xs)] = -log(2ir) - \ log[Ars(i)] - l
(.} jas(i)[xr - up(i)]J
where
- 2ors(i)[xr- ur(i)][xs- us(1)] + ar(i)[xs
= or(i)as(i) -
Using equation (A-2) in equation (A-l) yields
where
Jo
.2
Letting 6. = U2(i)» from equation (A-5), the following is obtained.9Ck " "
i) rTT X1) [
2ai(1')+ l-
2a
(A-2)
(A-3)
(A-4)
(i)[xj - U4(
(A-5)
(A-6)
A-2
Substituting equation (A-6) in equation (A-4) and equating the results to zero
yields the following maximum likelihood equation for the mean of feature x2 ofcluster i.
u2(i) •
(A-7)
Letting 8 = o2(i) in equation (A-4) and equating the result to zero yields thefollowing after simplification.
2 °(1)
Similarly, differentiating t with respect to 02j(i) for j = 1,3,4 and equatingthe resulting expression to zero yields
( 2o2 (i) 2o2(i)<j (i) r k ,r . , 2o? (i)a,(i) rt
•'"»••'{- J r - t~ [4 - -.(•)]« - V"] *-SJr5)- [4 • "2
- »r [4 - i')]K - V"] * !5^i K - VA ( l ) l
(A-9)
Using equation (A-9) for j = 1,3,4 in equation (A-8) yields, aftersimplification, the following maximum likelihood equation for
££* [4 - JW -v
(A-10)
A-3
Proceeding, similar to equations (A-9) and (A-10), it can easily be shown thatthe maximum likelihood equations for the mean us(i) and variance os(i) offeature xs of cluster i of figure 5-4 are of equations (5-16) and (5-15).
A-4
APPENDIX B
MAXIMUM LIKELIHOOD EQUATIONS WITH FIELD STRUCTURE
In practical applications of pattern recognition, such as in the classificationof remotely sensed agricultural imagery data, one of the difficult problems isto obtain labels for the training patterns. The labels for the training pat-terns are usually provided by an analyst-interpreter by examining imagery filmsand using some other information such as historic information and crop calendarmodels. Agricultural imagery data usually have a field-like structure, and itis observed that fields are relatively easy to label when compared to pixels.Recently, considerable interest has been shown in developing techniques forlocating fields in the imagery data (ref. 2-4) and in developing methods for theprobabilistic labeling (refs. 10, 11) of cluster distributions using informationfrom a given set of labeled fields. Once the fields are located by a field-finding algorithm, the problem of fitting a mixture of Gaussian densityfunctions to the data by taking into account the field structure of the data isconsidered in this appendix.
It is assumed that there are f-fields in the data. Let the j1-11 field be denoted
by F,; let it contain N-, pixels; and let X^, k = 1,2,»-«,N., be their spectralj j j jvectors. Let m be the number of clusters in the data. Let P(u.j) and p(X|u>.j) bethe a priori probability that a field belongs to cluster u>1 and cluster condi-tional densities, respectively. Let X- be the concatenated vector of spectralvectors of the pixels in the jth field. That is,
"x.xjl(J2
(B-l)
It is assumed that the fields are independent. Then, the joint density of
f-fields is given by
h.'-.xV) = II pfL) (B-2)1 j=1 J/
B-l
The mixture density p(x" ) can be written as* \J '
m(B-3)
If it is assumed that the spectral vectors of the pixels in each field are
cluster conditionally independent, then
NJ(B-4)
Using equations (B-3) and (B-4), the joint density of equation (B-2) can be
written as follows.
i
-.xf)= n m
£ P(».) kn (B-5)
Since the logarithm is a monotonic function of its argument, taking the log of
both sides of equation (B-5) and denoting it by l results in
-LJ=l
logm
5 p(^"N.j
~
P(x jk|»r) (B-6)
From equation (B-6), which is similar to equation (2-7), the maximum likelihoodequation for the probability that a field belongs to a cluster can easily beobtained as the following.
(B-7)
If Q-j is a parameter of the density function of the i^n cluster, differentiatingi of equation (B-6) with respect to ei yields the following.
ii_36.
i!!i> f _L"i
(B-8)
B-2
If the probability density functions of the clusters are Gaussian [i.e.,
p(X|ui) ~ N(U ,£.)], from equation (B-8), the maximum likelihood equations forthe mean and covanance matrix of the densities of the clusters can be shown tobe the following.
(B-9)- J=J
j=lrN,
and (B-10)
where (B-ll)
If the probability density functions of the clusters are approximated withfirst-order feature dependence trees, maximum likelihood equations for theparameters of the densities (similar to those developed in section 5) caneasily be obtained from equations (B-8) and (3-1) by taking into account thefield structure of the data.
B-3
APPENDIX C
DEPENDENT FEATURE TREES WITH THE NODES
REPRESENTING FEATURE SUBSETS
Very often it is necessary to have each node of a dependent feature treerepresent a set of features instead of one feature. For example, in remotesensing, the satellite makes multiple passes over a given area and, at eachacquisition, gathers several channels of data. In some instances, it isdesirable to have each nede of a dependent feature tree represent a set offeatures (e.g., the set of features corresponding to an acquisition). In thisappendix, expressions are developed for the mutual information between thefeature subsets and for the covariance between the feature subsets when a pathconnects them in a dependent feature tree. It is assumed that the features areGaussian distributed.
Let the components of feature vectors X^ and Xj be the sets of features repre-sented by nodes i and j, respectively. Let n^ and n,- be the dimensionality ofvectors X^ and X,-, respectively. If the feature vector X1 is normally distrib-uted, its probability density p(Xn) can be represented as
p(X.) (C-l)
where Un- is the mean vector and E. is the covariance matrix.
Let Z =
where
xI.xM . Then,\ I J /
p(Z) ~N(Uz,Ez) (C-2)
E =Z
E.1
ET.. ""J
E .U
E .J _
(C-3)
C-l
The mutual information between feature vectors X^ and Xj can be written as
dX
|E' (C-4)
(C-5)
where IE I is the determinant of the matrix EZ. Let y and v be the zero mean
normal random vectors. That is, p(y) ~ N(0,Cy) and p(v) ~ N(0,CV). Let
Z = (yT,vT) and p(Z) ~ N(0,CZ). Let
and
Consider
c =z
_ x
z
\
CTyv
"°y
0Tyv
°yv
Cv
V
°».
|
(
= constant • expi-
(C-6)
(C-7)
where
(C-8)
C-2
Thus, the density p(y|v) is Gaussian with the mean -Q" Qyyv and the covariancematrix Q" . Following a similar argument, it can easily be shown that, if Xn-is normally distributed, p(X.|X.) is normally distributed with the mean
[ -1 »1 J -1Ji ~ Q-i Qii^i ~ UiM and tne covariance matrix Q. . Now expressions for thei 1 ij j j j icovariance between the feature subsets, when a path connects their representa-tive nodes in a dependent feature tree, can be derived as in section 4.2. Forexample, if X4 and X7 are Gaussian random vectors, similar to equation (4-3),the following can easily be obtained.
(Xy - U7)p(X7lX4)dX? = -Q71Q?4(X4 - U4) (C-9)
Thus, expressions similar to equations (4-8) and (4-9) can easily be obtained.
NASA-JSC
C-3
"Maximum Likelihood Clusteringwith Dependent Feature Trees" SUPPORTING RESEARCH
TECHNICAL REPORT
SR-L1-04031JSC-16853January 16, 1980
DISTRIBUTION
o Distribution of this document is limited to those people whose names appear withoutan asterisk. Persons with an asterisk(*) beside their name will receive an abstractonly (JSC Form 1424).
o To obtain a copy of this document, contact one of the following:F. G. Hall - (SG3)J. E. Wainwright - Lockheed Development & Eveluation Department
(626-43) (C09)
SK/R. Erb*J. Powers *R. Hatch (USDA) *W. Stephenson *M. Helfert *F. Barrett (USDA) *L. Chi IdsG. Nixon *
SG/R. MacDonaldSG2/D. Hay
R. MusgroveW. Weimer *D. Frank (USDA) *G. McKain *
SG3/F. HallM. FergusonC. Hall urnR. HeydornA. HoustonD. ThompsonD. PittsA. FeivesonR. JudayM. SteibK. HendersonJ. Dietrich
Oregon State UniversityL. Eisgruber
GMI/R. HolmesData Resources, Inc.
B. ScherrKansa^ Statp University/
E. T. LabE. Kanemasu
SH2/R.D.K.K.N.R.T.C.L.
HillAmsbury *Demel *Hancock *Hatcher *Joosten *PendletonForbes *Bennett *
M.-TrichelSH3/J.
R.R.D.W.G.R.H.F.L.V.K.J.C.G.F.F.R.
DraggEason *BizzellHenningerJones *Kraus *McKinney *Prior *Ravet *Wade *Whitehead *Baker *Carney *DavisGutschewskiHerbert *Johnson *Patterson *
J. Ginns *Tom Barnett/NOAA CEAS *
116 Federal BuildingColumiba, MO 65?01
** LOCKHEED Documents Only
LOCKHEEDC09/J. G. Baron *
M. L. Bertrand *B. L. CarrollP. L. Krumm *T. C. Minter (10)D. E. PhinneyP. C. Swanzy *J. J. Vaccaro *J. E. Wainwright (3)J. L. Hawkins (2)
** Job Order File
** B09/Technical Library (5)ERIM/R. Horvath (4)PURDUE-LARS/M. E. Bauer (4)PURDUE UniversityDr. PaarlbergTAMU/L. F. GusemanUCB/C. M. HayIBM- Houston. TX MC16/
A. Anderson
IBM - Palo Alto, CA/Dr. KolskyUSDA - Houston. TX
G. BoatwrightUSDC - Washington, D.C./
R. AmbroziakUSDC/EDIS/CEAS - Columbia, MO
S. LeDucC. Sakamoto