Classification with several populations
Presented by: Libin Zhou
Classification procedure
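Minimum Expected Cost of Misclassification (ECM) Method
The ECM for two populations is:
$$\mathrm{ECM} = c(2|1)\,P(2|1)\,p_1 + c(1|2)\,P(1|2)\,p_2$$
Where: $P(k|i)$ is the conditional probability of classifying an item from population $i$ as population $k$; $p_i$ is the prior probability of population $i$; $c(k|i)$ is the cost of that misclassification.
The ECM for multiple populations is built up as follows. For items from population 1:
$$\mathrm{ECM}(1) = P(2|1)\,c(2|1) + P(3|1)\,c(3|1) + \cdots + P(g|1)\,c(g|1) = \sum_{k=2}^{g} P(k|1)\,c(k|1)$$
For items from population $i$:
$$\mathrm{ECM}(i) = \sum_{k=1,\,k \neq i}^{g} P(k|i)\,c(k|i)$$
Weighting each conditional ECM by its prior gives the overall ECM:
$$\mathrm{ECM} = p_1\,\mathrm{ECM}(1) + p_2\,\mathrm{ECM}(2) + \cdots + p_g\,\mathrm{ECM}(g) = \sum_{i=1}^{g} p_i \left( \sum_{k=1,\,k \neq i}^{g} P(k|i)\,c(k|i) \right)$$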
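As a rough illustration of the multi-population ECM, the double sum can be evaluated directly. This is only a sketch: the function name `expected_cost` and all of the numbers are hypothetical and are not taken from the slides.

```python
def expected_cost(priors, P, c):
    """Overall ECM = sum_i p_i * sum_{k != i} P(k|i) * c(k|i).

    P[i][k] is the probability of classifying an item from population i
    as population k; c[i][k] is the cost of that error (0-based indices).
    """
    g = len(priors)
    total = 0.0
    for i in range(g):
        # Conditional expected cost ECM(i) for items from population i.
        ecm_i = sum(P[i][k] * c[i][k] for k in range(g) if k != i)
        total += priors[i] * ecm_i
    return total


# Hypothetical three-population inputs, invented for illustration only.
priors = [0.5, 0.3, 0.2]
P = [[0.8, 0.1, 0.1],
     [0.2, 0.7, 0.1],
     [0.1, 0.3, 0.6]]
c = [[0, 10, 50],
     [20, 0, 30],
     [5, 40, 0]]
ecm = expected_cost(priors, P, c)
```

With g = 2 the same function reduces to the two-population expression, since each inner sum then has a single term.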
Minimum ECM classification Rule
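Result 11.5 (page 614): with $f_i(x)$ the density of population $i$, assigning x to the population k for which
$$\sum_{i=1,\,i \neq k}^{g} p_i\, f_i(x)\, c(k|i)$$
is smallest minimizes the ECM.
• If the misclassification costs are equal, the rule simplifies to: assign x to population k if
$$p_k f_k(x) > p_i f_i(x) \quad \text{for all } i \neq k$$
Or, equivalently,
$$\ln p_k f_k(x) > \ln p_i f_i(x) \quad \text{for all } i \neq k$$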
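The minimum ECM rule and its equal-cost special case can be sketched as two small functions. The function names, priors, density values and costs below are hypothetical; the point is only that unequal costs can change the allocation.

```python
def min_ecm_class(priors, densities, c):
    """Return the 0-based k minimizing sum_{i != k} p_i * f_i(x) * c(k|i).

    densities[i] is f_i(x) evaluated at the observation; c[i][k] is the
    cost of assigning an item from population i to population k.
    """
    g = len(priors)

    def cost(k):
        return sum(priors[i] * densities[i] * c[i][k] for i in range(g) if i != k)

    return min(range(g), key=cost)


def max_weighted_density_class(priors, densities):
    """Equal-cost special case: pick the k with the largest p_k * f_k(x)."""
    return max(range(len(priors)), key=lambda k: priors[k] * densities[k])


# Hypothetical inputs, invented for illustration only.
priors = [0.05, 0.60, 0.35]
densities = [0.01, 0.85, 2.0]
costs = [[0, 10, 50],
         [500, 0, 200],
         [100, 50, 0]]
with_costs = min_ecm_class(priors, densities, costs)          # 0-based index
equal_costs = max_weighted_density_class(priors, densities)   # 0-based index
```

Here the costed rule picks the second population while the equal-cost rule picks the third, because misclassifying items from the second population is made expensive.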
Maximum posterior probability Rule
The posterior probability is
$$P(\pi_k \mid x) = P(\text{x comes from population } k \text{ given that x was observed}) = \frac{p_k f_k(x)}{\sum_{i=1}^{g} p_i f_i(x)} = \frac{(\text{prior}) \times (\text{likelihood})}{\sum \left[ (\text{prior}) \times (\text{likelihood}) \right]}$$
for k = 1, 2, …, g.
This rule generalizes the largest posterior probability rule for two-population classification (Equation (11-9)).
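A minimal sketch of the posterior calculation, assuming the densities have already been evaluated at the observed x. The function name and the input values are hypothetical.

```python
def posteriors(priors, densities):
    """P(pi_k | x) = p_k f_k(x) / sum_i p_i f_i(x), for every k."""
    weighted = [p * f for p, f in zip(priors, densities)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Hypothetical priors and density values, invented for illustration only.
post = posteriors([0.05, 0.60, 0.35], [0.01, 0.85, 2.0])
best = max(range(len(post)), key=post.__getitem__)  # 0-based index of the largest posterior
```

Because the denominator is the same for every k, picking the largest posterior gives the same allocation as picking the largest p_k f_k(x).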
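Classification with normal populations
When the populations are multivariate normal, the term $f_i(x)$ in the minimum ECM classification rule with equal misclassification costs (Equation (11-41)),
$$\ln p_k f_k(x) > \ln p_i f_i(x),$$
can be written as
$$f_i(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) \right]$$
Here $p$ without a subscript is the number of variables, while $p_i$ is the prior probability of population $i$. Then x is allocated to population k when
$$\ln p_k f_k(x) = \ln p_k - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_k| - \frac{1}{2}(x - \mu_k)' \Sigma_k^{-1} (x - \mu_k) = \max_i \ln p_i f_i(x)$$
The term $\frac{p}{2}\ln(2\pi)$ is the same for every population and can be dropped, which gives
$$d_i^Q(x) = -\frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(x - \mu_i)' \Sigma_i^{-1} (x - \mu_i) + \ln p_i$$
Where $d_i^Q$ is the quadratic discrimination score and i = 1, 2, …, g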
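The quadratic discrimination score can be sketched in plain Python for the bivariate case (p = 2), where the 2x2 inverse and determinant have closed forms. The function name `quad_score` and the population parameters are hypothetical.

```python
import math


def quad_score(x, mu, sigma, prior):
    """d_i^Q(x) = -1/2 ln|Sigma_i| - 1/2 (x-mu_i)' Sigma_i^{-1} (x-mu_i) + ln p_i, for p = 2."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    quad = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * quad + math.log(prior)


# Hypothetical parameters for two bivariate normal populations (illustration only).
params = [
    {"mu": (0.0, 0.0), "sigma": ((1.0, 0.0), (0.0, 1.0)), "prior": 0.5},
    {"mu": (3.0, 3.0), "sigma": ((2.0, 0.5), (0.5, 1.0)), "prior": 0.5},
]
x = (0.5, 0.2)
scores = [quad_score(x, q["mu"], q["sigma"], q["prior"]) for q in params]
best = max(range(len(scores)), key=scores.__getitem__)  # 0-based index of the largest score
```

For more than two variables a linear algebra library would replace the closed-form inverse; the structure of the score stays the same.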
Minimum total probability of misclassification (TPM) rule for normal populations with different $\Sigma_i$
If the quadratic discrimination score
$$d_k^Q(x) = \max_i \left( d_i^Q(x) \right)$$
then x is allocated to population k
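Estimated minimum TPM rule for several normal populations with different $\Sigma_i$
In practice, $\mu_i$ and $\Sigma_i$ are usually unknown, but a training set of correctly classified observations is often available for constructing estimates. The relevant sample quantities for population i are the sample mean vector $\bar{x}_i$ and the sample covariance matrix $S_i$.
The estimated quadratic discrimination score $\hat{d}_i^Q(x)$ can then be written as
$$\hat{d}_i^Q(x) = -\frac{1}{2}\ln|S_i| - \frac{1}{2}(x - \bar{x}_i)' S_i^{-1} (x - \bar{x}_i) + \ln p_i, \qquad i = 1, 2, \ldots, g$$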
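One way to go from a training set to the estimated scores is sketched below for two variables. The training observations, priors and function names are hypothetical, and the sample covariance uses the usual n - 1 divisor, which is an assumption because the slides do not spell it out.

```python
import math


def sample_mean_cov(rows):
    """Sample mean vector and sample covariance matrix (divisor n - 1) of 2-D rows."""
    n = len(rows)
    mean = [sum(r[j] for r in rows) / n for j in range(2)]
    cov = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in rows) / (n - 1)
            for k in range(2)] for j in range(2)]
    return mean, cov


def estimated_quad_score(x, rows, prior):
    """-1/2 ln|S_i| - 1/2 (x - xbar_i)' S_i^{-1} (x - xbar_i) + ln p_i."""
    mean, s = sample_mean_cov(rows)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    quad = (s[1][1] * dx * dx - 2 * s[0][1] * dx * dy + s[0][0] * dy * dy) / det
    return -0.5 * math.log(det) - 0.5 * quad + math.log(prior)


# Hypothetical training sets for two populations (illustration only).
training = [
    [(1, 2), (2, 1), (3, 4), (2, 3)],
    [(6, 5), (7, 8), (8, 6), (7, 7)],
]
priors = [0.5, 0.5]


def allocate(x):
    """Return the 0-based index of the population with the largest estimated score."""
    scores = [estimated_quad_score(x, rows, p) for rows, p in zip(training, priors)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Calling `allocate((2, 2))` or `allocate((7, 7))` then applies the estimated rule to a new observation.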
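The estimated minimum TPM rule for equal-covariance normal populations
If the covariance matrices of the several populations are equal, the quadratic discrimination score simplifies to a linear discriminant score, which is estimated using the pooled estimate of the covariance:
$$\hat{d}_i(x) = \bar{x}_i' S_{pooled}^{-1} x - \frac{1}{2} \bar{x}_i' S_{pooled}^{-1} \bar{x}_i + \ln p_i$$
We can also define a new variable, the Generalized Squared Distance:
$$D_i^2(x) = (x - \bar{x}_i)' S_{pooled}^{-1} (x - \bar{x}_i)$$
Then the sample discriminant score can be written as
$$\hat{d}_i^Q(x) = -\frac{1}{2}\ln|S_{pooled}| - \frac{1}{2}(x - \bar{x}_i)' S_{pooled}^{-1} (x - \bar{x}_i) + \ln p_i = \text{const} - \frac{1}{2} D_i^2(x) + \ln p_i$$
where the constant $-\frac{1}{2}\ln|S_{pooled}|$ is the same for every population and does not affect the allocation.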
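The sketch below pools two covariance matrices and compares the linear score with the distance-based score. The pooling formula, $S_{pooled} = \sum_i (n_i - 1) S_i / (\sum_i n_i - g)$, is the standard definition and is not written out in the slides; the means, covariances, sample sizes and function names are hypothetical.

```python
import math


def pooled_cov(covs, sizes):
    """S_pooled = sum_i (n_i - 1) S_i / (sum_i n_i - g), for 2x2 matrices."""
    dof = sum(sizes) - len(sizes)
    return [[sum((n - 1) * s[j][k] for s, n in zip(covs, sizes)) / dof
             for k in range(2)] for j in range(2)]


def inv2(m):
    """Closed-form inverse of a 2x2 matrix."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def bilinear(u, m, v):
    """u' M v for 2-vectors u, v and a 2x2 matrix M."""
    return sum(u[j] * m[j][k] * v[k] for j in range(2) for k in range(2))


def linear_score(x, mean, s_inv, prior):
    """xbar_i' S^{-1} x - 1/2 xbar_i' S^{-1} xbar_i + ln p_i."""
    return bilinear(mean, s_inv, x) - 0.5 * bilinear(mean, s_inv, mean) + math.log(prior)


def distance_score(x, mean, s_inv, prior):
    """-1/2 D_i^2(x) + ln p_i, with the population-independent constant dropped."""
    diff = [x[0] - mean[0], x[1] - mean[1]]
    return -0.5 * bilinear(diff, s_inv, diff) + math.log(prior)


# Hypothetical sample quantities for two populations (illustration only).
means = [(2.0, 2.5), (7.0, 6.5)]
covs = [[[2 / 3, 2 / 3], [2 / 3, 5 / 3]], [[2 / 3, 1 / 3], [1 / 3, 5 / 3]]]
sizes = [4, 4]
priors = [0.5, 0.5]
s_inv = inv2(pooled_cov(covs, sizes))
x = (4.0, 4.0)
lin = [linear_score(x, m, s_inv, p) for m, p in zip(means, priors)]
dist = [distance_score(x, m, s_inv, p) for m, p in zip(means, priors)]
```

The two scores differ by a term that depends on x and the pooled covariance but not on i, so both lead to the same allocation.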
Example 11.11. Classifying a potential business-school graduate student
Introduction: the admission officer of a business school has used an “index” of undergraduate grade point average (GPA) and graduate management aptitude test (GMAT) scores to help decide which applicants should be admitted to the school’s graduate programs.
Analysis:
Populations: Pop1: admit; Pop2: do not admit; Pop3: borderline
Variables: x1: GPA; x2: GMAT
Question: Allocate a new applicant with $x_0 = (3.21, 497)$ using the sample discriminant scores
Resolution
1) Calculate the mean vector for each population.
2) Calculate the pooled covariance matrix.
3) Calculate the sample squared distances $D_i^2(x)$ using the sample squared distance function
$$D_i^2(x) = (x - \bar{x}_i)' S_{pooled}^{-1} (x - \bar{x}_i)$$
where i = 1, 2, 3.
4) Results: $D_1^2(x_0) = 2.58$; $D_2^2(x_0) = 17.10$; $D_3^2(x_0) = 2.47$
By the rule of assigning x to the “closest” population, which applies when the prior probabilities are equal, the new applicant should be assigned to population 3, borderline.
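The distance calculation in step 3 can be sketched as follows. The slides do not give the sample means or the pooled covariance matrix, so the values below are assumptions recalled from the textbook version of this example and should be checked against the source before being relied on; with them, the computed distances agree with the reported results to within rounding.

```python
def squared_distance(x, mean, s):
    """D_i^2(x) = (x - xbar_i)' S^{-1} (x - xbar_i) for a 2x2 matrix S."""
    (a, b), (c, d) = s
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Quadratic form using the closed-form inverse of a 2x2 matrix.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det


x0 = (3.21, 497.0)  # new applicant (GPA, GMAT), from the slides

# ASSUMED values, not stated in the slides: group means and pooled covariance.
means = {
    1: (3.40, 561.23),  # admit
    2: (2.48, 447.07),  # do not admit
    3: (2.99, 446.23),  # borderline
}
s_pooled = ((0.0361, -2.0188), (-2.0188, 3655.9011))

d2 = {label: squared_distance(x0, m, s_pooled) for label, m in means.items()}
closest = min(d2, key=d2.get)  # label of the population with the smallest distance
```

Under these assumed inputs the smallest distance belongs to population 3, in line with the allocation stated in the slides.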