Discrimination amongst k populations

Upload: leonard-sherr

Post on 01-Apr-2015



We want to determine if an observation vector

$$x = (x_1, \dots, x_p)'$$

comes from one of the k populations

$$\text{population } 1:\ f_1(x) = f_1(x_1, \dots, x_p), \quad \dots, \quad \text{population } k:\ f_k(x) = f_k(x_1, \dots, x_p)$$

For this purpose we need to partition p-dimensional space into k regions $C_1, C_2, \dots, C_k$.

We will make the decision:

$$D_i:\ x \text{ came from population } i \quad \text{if } x \in C_i$$

Misclassification probabilities

$$P[j|i] = P[\text{classify the case in population } j \text{ when the case is from population } i] = P[x \in C_j \mid i] = \int_{C_j} f_i(x)\,dx$$

Cost of Misclassification

$c_{j|i}$ = cost of classifying the case in population j when the case is from population i

Initial probabilities of inclusion

P[i] = P[the case is from population i initially]

Expected Cost of Misclassification of a case from population i

We assume that we know the case came from population i. Then

$$ECM(i) = c_{1|i}P[1|i] + \dots + c_{i-1|i}P[i-1|i] + c_{i+1|i}P[i+1|i] + \dots + c_{k|i}P[k|i] = \sum_{j \ne i} c_{j|i}\,P[j|i]$$

Total Expected Cost of Misclassification

$$ECM = P[1]\,ECM(1) + \dots + P[k]\,ECM(k) = \sum_i P[i]\,ECM(i)$$

$$= \sum_i P[i] \sum_{j \ne i} c_{j|i}\,P[j|i]$$

$$= \sum_i P[i] \sum_{j \ne i} c_{j|i} \int_{C_j} f_i(x)\,dx$$

$$= \sum_j \int_{C_j} \Big[ \sum_{i \ne j} P[i]\,f_i(x)\,c_{j|i} \Big]\,dx$$
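As a worked special case (an illustration added here, not taken from the slides), setting k = 2 in the total expected cost gives just two terms, one for each kind of error:

$$ECM = P[1]\,c_{2|1}\,P[2|1] + P[2]\,c_{1|2}\,P[1|2] = \int_{C_2} P[1]\,f_1(x)\,c_{2|1}\,dx + \int_{C_1} P[2]\,f_2(x)\,c_{1|2}\,dx$$

Each integral runs over the region in which the wrong decision is made, which is the same regrouping by region $C_j$ used in the last line of the general formula.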

Optimal Classification Rule

The optimal classification rule will find the regions $C_j$ that minimize:

$$ECM = \sum_j \int_{C_j} \Big[ \sum_{i \ne j} P[i]\,f_i(x)\,c_{j|i} \Big]\,dx$$

$$= c \sum_j \int_{C_j} \Big[ \sum_{i \ne j} P[i]\,f_i(x) \Big]\,dx \quad \text{if } c_{j|i} = c$$

$$= c \sum_j \int_{C_j} \Big[ \sum_{i=1}^{k} P[i]\,f_i(x) - P[j]\,f_j(x) \Big]\,dx$$

ECM will be minimized if $C_j$ is chosen where the term that is omitted, $P[j]\,f_j(x)$, is the largest.
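The rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`normal_pdf`, `min_ecm_class`), the univariate normal densities, and all numbers are hypothetical and not from the slides, and populations are indexed from 0 rather than 1. For a given x the sketch picks the j that minimizes the integrand $\sum_{i \ne j} P[i] f_i(x) c_{j|i}$; with equal costs this should agree with choosing the largest $P[j] f_j(x)$.

```python
import math


def normal_pdf(x, mean, sd):
    """Univariate normal density, used only as an illustrative f_i(x)."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def min_ecm_class(x, priors, densities, cost):
    """Return the index j minimising sum over i != j of P[i] f_i(x) c_{j|i}.

    cost[j][i] holds c_{j|i}; the diagonal entries are ignored.
    """
    k = len(priors)
    weighted = [priors[i] * densities[i](x) for i in range(k)]  # P[i] f_i(x)
    expected = [
        sum(cost[j][i] * weighted[i] for i in range(k) if i != j)
        for j in range(k)
    ]
    return min(range(k), key=expected.__getitem__)


def max_weighted_density_class(x, priors, densities):
    """Equal-cost rule: choose j with the largest P[j] f_j(x)."""
    weighted = [p * f(x) for p, f in zip(priors, densities)]
    return max(range(len(priors)), key=weighted.__getitem__)


# Hypothetical three-population example (all numbers are made up).
priors = [0.5, 0.3, 0.2]
densities = [
    lambda x: normal_pdf(x, 0.0, 1.0),
    lambda x: normal_pdf(x, 3.0, 1.0),
    lambda x: normal_pdf(x, 6.0, 1.0),
]
equal_cost = [[1.0] * 3 for _ in range(3)]

for x in (0.0, 3.0, 6.0):
    print(x, min_ecm_class(x, priors, densities, equal_cost))
```

Passing an unequal cost matrix to `min_ecm_class` gives the general minimum-cost decision, while the equal-cost matrix reproduces the simpler rule derived above.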

Optimal Regions when misclassification costs are equal

$$C_j = \{\, x \mid P[j]\,f_j(x) \ge P[i]\,f_i(x) \text{ for } i \ne j \,\}$$

$$= \{\, x \mid \ln\big(P[j]\,f_j(x)\big) \ge \ln\big(P[i]\,f_i(x)\big) \text{ for } i \ne j \,\}$$

Optimal Regions when misclassification costs are equal and distributions are p-variate Normal with common covariance matrix $\Sigma$

$$C_j = \{\, x \mid P[j]\,f_j(x) \ge P[i]\,f_i(x) \text{ for } i \ne j \,\}$$

$$= \{\, x \mid \ln\big(P[j]\,f_j(x)\big) \ge \ln\big(P[i]\,f_i(x)\big) \text{ for } i \ne j \,\}$$

In the case of normality

$$f_i(x) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}\; e^{-\frac{1}{2}(x - \mu_i)'\Sigma^{-1}(x - \mu_i)}$$

$$\ln\big(P[i]\,f_i(x)\big) = \ln P[i] - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x - \mu_i)'\Sigma^{-1}(x - \mu_i)$$

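and $\ln\big(P[j]\,f_j(x)\big) \ge \ln\big(P[i]\,f_i(x)\big)$ if:

$$\ln P[j] - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x - \mu_j)'\Sigma^{-1}(x - \mu_j) \ge \ln P[i] - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma| - \frac{1}{2}(x - \mu_i)'\Sigma^{-1}(x - \mu_i)$$

that is

$$\mu_j'\Sigma^{-1}x - \frac{1}{2}\mu_j'\Sigma^{-1}\mu_j + \ln P[j] \ge \mu_i'\Sigma^{-1}x - \frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln P[i]$$

or

$$a_j'x + b_j \ge a_i'x + b_i$$

where $a_i = \Sigma^{-1}\mu_i$ and $b_i = -\frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln P[i]$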
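The step from the quadratic form to the linear inequality is not written out on the slide. It follows from expanding the quadratic term (a standard identity, added here for completeness):

$$-\frac{1}{2}(x - \mu_i)'\Sigma^{-1}(x - \mu_i) = -\frac{1}{2}x'\Sigma^{-1}x + \mu_i'\Sigma^{-1}x - \frac{1}{2}\mu_i'\Sigma^{-1}\mu_i$$

The terms $-\frac{1}{2}x'\Sigma^{-1}x$, $-\frac{p}{2}\ln(2\pi)$ and $-\frac{1}{2}\ln|\Sigma|$ are the same for every population because the covariance matrix is common, so they cancel from both sides of the inequality and only the terms that are linear in x remain.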

Summarizing

We will classify the observation vector in population j if:

$$L_j = a_j'x + b_j = \max_i L_i = \max_i \big( a_i'x + b_i \big)$$

where $a_i = \Sigma^{-1}\mu_i$ and $b_i = -\frac{1}{2}\mu_i'\Sigma^{-1}\mu_i + \ln P[i]$

Graph: Partition of the space into three regions, population 1 where $L_1 \ge L_2, L_3$, population 2 where $L_2 \ge L_1, L_3$ and population 3 where $L_3 \ge L_1, L_2$, separated by the boundaries $L_1 = L_2$, $L_2 = L_3$ and $L_1 = L_3$.
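The linear rule summarized above can be sketched with NumPy as follows. This is a minimal illustration rather than the method used in the slides: the function names `linear_scores` and `classify`, the three mean vectors, the covariance matrix and the equal priors are all hypothetical, and populations are indexed from 0.

```python
import numpy as np


def linear_scores(x, means, cov, priors):
    """Return the vector of scores L_i = a_i'x + b_i, one per population."""
    means = np.asarray(means, dtype=float)             # shape (k, p)
    # Rows of `a` are a_i = Sigma^{-1} mu_i (Sigma is symmetric).
    a = np.linalg.solve(np.asarray(cov, dtype=float), means.T).T
    # b_i = -1/2 mu_i' Sigma^{-1} mu_i + ln P[i]
    b = -0.5 * np.sum(means * a, axis=1) + np.log(priors)
    return a @ np.asarray(x, dtype=float) + b


def classify(x, means, cov, priors):
    """Classify x in the population with the largest score L_i."""
    return int(np.argmax(linear_scores(x, means, cov, priors)))


# Hypothetical example: three bivariate populations with a common covariance.
means = [[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]]
cov = np.array([[1.0, 0.3], [0.3, 1.0]])
priors = [1 / 3, 1 / 3, 1 / 3]

print(classify([3.8, 0.5], means, cov, priors))
```

Using `np.linalg.solve` instead of forming the inverse of the covariance matrix explicitly is a common numerical choice; either gives the same scores in exact arithmetic.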

k-means Clustering

A non-hierarchical clustering scheme

We want to subdivide the data set into k groups.

The k-means algorithm

1. Initially subdivide the complete data into k groups.

2. Compute the centroid (mean vector) for each group.

3. Sequentially go through the data, reassigning each case to the group with the closest centroid.

4. After reassigning a case to a new group, recalculate the centroids of the original group and of the new group to which the case now belongs.

5. Continue until there are no further reassignments of cases.
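The five steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names and the toy data are hypothetical, squared Euclidean distance is assumed as the measure of closeness, and a case is never moved out of a group it is the last member of (the slides do not say how an empty group should be handled).

```python
def centroid(points):
    """Mean vector of a non-empty list of equal-length points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def squared_distance(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))


def sequential_kmeans(data, initial_labels, k, max_passes=100):
    """Sequential k-means starting from an initial partition (step 1)."""
    labels = list(initial_labels)

    def group(g):
        return [p for p, lab in zip(data, labels) if lab == g]

    # Step 2: centroid of each initial group.
    centroids = [centroid(group(g)) for g in range(k)]
    for _ in range(max_passes):
        moved = False
        # Step 3: go through the cases one at a time.
        for idx, point in enumerate(data):
            old = labels[idx]
            if labels.count(old) == 1:
                continue  # assumption: do not empty a group
            new = min(range(k), key=lambda g: squared_distance(point, centroids[g]))
            if new != old:
                labels[idx] = new
                # Step 4: recompute only the two affected centroids.
                centroids[old] = centroid(group(old))
                centroids[new] = centroid(group(new))
                moved = True
        # Step 5: stop after a full pass with no reassignment.
        if not moved:
            break
    return labels, centroids


# Hypothetical toy data: two well-separated groups, deliberately mixed at the start.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = sequential_kmeans(data, [0, 1, 0, 1, 0, 1], k=2)
print(labels)
print(centroids)
```

Updating the centroids immediately after each move, rather than once per pass, is what distinguishes this sequential variant from the batch version of k-means found in many libraries.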

Example: n = 60 cases with two variables (x, y) measured

x | y | x | y | x | y | x | y
26.25 | 17.32 | 18.29 | 13.55 | 8.95 | 25.8 | 20.85 | 8.14
27.11 | 23.51 | 17.45 | 15.61 | 8.56 | 16.98 | 18.65 | 8.27
35.05 | 19.41 | 26.21 | 22.43 | 11.23 | 17.89 | 22.03 | 5
29.04 | 21.98 | 35.48 | 21.5 | 10.99 | 15.44 | 19.16 | 7.26
28.4 | 22.39 | 19.34 | 15.19 | 14.1 | 23.47 | 25.27 | 4.31
22.2 | 19.26 | 16.33 | 20.05 | 3.51 | 20.32 | 21.2 | 10.26
29.38 | 22.61 | 15.02 | 17.21 | 12.36 | 18.77 | 19.02 | 7.22
26.59 | 18.76 | 8.31 | 23.44 | 7.87 | 17.32 | 20.49 | 6.67
27.61 | 18.9 | 8.9 | 21.13 | 9.98 | 20.26 | 20.09 | 9.57
25.23 | 20.05 | 9.3 | 17.33 | 9.32 | 22.96 | 33.83 | 4.13
25.02 | 16.63 | 14.55 | 22.01 | 18.79 | 10.2 | 23.95 | 10.57
33.57 | 26.85 | 12.12 | 28.5 | 28.91 | 8 | 23.92 | 12.15
27.25 | 16.44 | 1.16 | 20.54 | 26.91 | 5.65 | 13.51 | 11.73
33.7 | 22.8 | 13.95 | 17.69 | 14.22 | 10.45 | 20.53 | 9.4
27.16 | 19.48 | 13.14 | 24.02 | 33.24 | 5.29 | 14.88 | 14.42

Graph: Scattergram of data

Graph: Initial Clustering

Graph: Final Clustering

Graph: True subpopulations

An Example: Cluster Analysis, Discriminant Analysis, MANOVA

A survey was given to 132 students (35 male, 97 female).

They rated, on a Likert scale of 1 to 5, their agreement with each of 40 statements.

All statements are related to the meaning of life.


Questions and Statements

1. How religious/spiritual would you say you are?

2. To have trustworthy and intimate friend(s)

3. To have a fulfilling career

4. To be closely connected to family

5. To share values/beliefs with others in your close circle or community

6. To have and raise children

7. To continually set short and long-term, achievable goals for yourself

8. To feel satisfied with yourself (feel good about yourself)

9. To live up to the expectations of family and close friends

10. To contribute to world peace


Statements - continued

11. To be involved in an intimate relationship with a significant person

12. To give of yourself to others.

13. To be able to plan and take time for leisure.

14. To act on your own personal beliefs, despite outside pressure.

15. To be seen as physically attractive.

16. To feel confident in choosing new experiences to better yourself.

17. To care about the state of the physical/natural environment.

18. To take responsibility for your mistakes.

19. To make restitution for your mistakes, if necessary.

20. To be involved with social or political causes.

21. To keep up with media and popular-culture trends.

22. To adhere to religious practices based on tradition or rituals.

23. To use your own creativity in a way that you believe is worthwhile.

24. The meaning of life is found in understanding one's ultimate purpose for life.

25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.

26. There is a reason for everything that happens.

27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.

28. People unearth the same basic values when attempting to find the meaning of life.

29. It is more important to cultivate character than to be consumed with outward rewards, or, awards.

30. Some aims or goals in life are more valuable than other goals.


31. The purpose of life lies in promoting the ends of truth, beauty, and goodness.

32. A meaningful life is one that contributes to the well-being of others.

33. The meaning of life is the same as a happy life.

34. The meaning of life is found in realizing my potential.

35. Life has purpose only in the everyday details of living.

36. There is no, one, universal way of obtaining a meaningful life for all people.

37. People passionately desire different things. Obtaining these things contributes to making life more meaningful for them.

38. What contributes to a meaningful life varies according to each person (or group).

39. Lives can be meaningful even without the existence of a God or spiritual realm.

40. Our lives have no significance, but we must live as if they do.


The Analysis

The first step in the analysis is to perform cluster analysis to see if there are any subgroups of interest.

Both hierarchical and partitioning (k-means) methods were used for the cluster analysis.

Figure 1: Dendrogram. Clustering using Ward's method with Euclidean distances (axes: cases and linkage distance, 0 to 80).


The Analysis

From the dendrogram in the previous figure, it follows, by cutting across the branches at a linkage distance between 30 and 75, that 2 or 3 clusters describe the data best.

The k-means method was then used (with k=2 and k=3) to identify the members of these clusters. With the k-means procedure, similarly, the two- and three-cluster models fit the data best (attempts to use higher values of k resulted in clusters with only one case).


One-way MANOVA was then used to test for significant differences between the clusters.

It was also used to identify the statements on which the differences between the two clusters were most significant.

Table 1: Questions and Descriptive Statistics by Clusters

p-value | Cluster 1 mean | Cluster 1 std dev | Cluster 2 mean | Cluster 2 std dev | Cluster 3 mean | Cluster 3 std dev | Question
0.000 | 2.40 | 0.93 | 4.41 | 0.84 | 1.26 | 0.45 | 25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.
0.000 | 4.65 | 0.73 | 2.59 | 1.52 | 4.58 | 0.96 | 36. There is no, one, universal way of obtaining a meaningful life for all people.
0.000 | 4.24 | 0.85 | 1.59 | 0.95 | 4.37 | 1.38 | 39. Lives can be meaningful even without the existence of a God or spiritual realm.
0.000 | 1.40 | 0.60 | 1.17 | 0.44 | 3.05 | 1.78 | 40. Our lives have no significance, but we must live as if they do.
0.000 | 2.31 | 1.10 | 3.41 | 1.07 | 1.37 | 0.68 | 22. To adhere to religious practices based on tradition or rituals.
0.000 | 4.03 | 0.93 | 4.34 | 0.94 | 2.32 | 1.57 | 26. There is a reason for everything that happens.
0.000 | 3.50 | 1.11 | 3.27 | 1.36 | 1.53 | 0.90 | 27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.
0.000 | 2.78 | 1.01 | 3.98 | 1.29 | 2.37 | 1.34 | 1. How religious/spiritual would you say you are?
0.000 | 4.79 | 0.41 | 3.90 | 1.14 | 4.53 | 0.84 | 38. What contributes to a meaningful life varies according to each person (or group).
0.000 | 4.22 | 0.86 | 3.22 | 1.26 | 3.16 | 1.26 | 37. People passionately desire different things. Obtaining these things contributes to making life more meaningful for them.
0.000 | 4.03 | 0.71 | 3.34 | 1.09 | 2.89 | 1.15 | 34. The meaning of life is found in realizing my potential.
0.000 | 3.89 | 0.78 | 4.34 | 0.62 | 3.16 | 1.07 | 5. To share values/beliefs with others in your close circle or community
0.000 | 4.25 | 0.55 | 4.61 | 0.54 | 4.89 | 0.32 | 14. To act on your own personal beliefs, despite outside pressure.
0.000 | 3.53 | 0.92 | 3.76 | 1.11 | 2.37 | 1.42 | 24. The meaning of life is found in understanding one's ultimate purpose for life.
0.000 | 4.38 | 0.57 | 4.37 | 0.73 | 3.58 | 1.43 | 32. A meaningful life is one that contributes to the well-being of others.
0.001 | 3.38 | 1.12 | 2.61 | 1.12 | 2.53 | 1.54 | 33. The meaning of life is the same as a happy life.
0.003 | 2.64 | 1.07 | 2.05 | 1.16 | 1.84 | 1.17 | 35. Life has purpose only in the everyday details of living.
0.005 | 3.01 | 0.94 | 2.95 | 0.92 | 2.21 | 1.08 | 28. People unearth the same basic values when attempting to find the meaning of life.
0.007 | 3.93 | 0.94 | 4.27 | 0.90 | 3.42 | 1.17 | 7. To continually set short and long-term, achievable goals for yourself
0.008 | 3.67 | 0.95 | 3.51 | 0.95 | 2.84 | 1.34 | 9. To live up to the expectations of family and close friends
0.013 | 3.63 | 0.78 | 3.90 | 0.89 | 4.26 | 1.10 | 17. To care about the state of the physical/natural environment.
0.015 | 4.32 | 0.73 | 4.05 | 0.89 | 4.68 | 0.75 | 13. To be able to plan and take time for leisure.
0.015 | 2.72 | 0.97 | 2.56 | 1.05 | 1.95 | 1.13 | 21. To keep up with media and popular-culture trends.
0.041 | 4.50 | 0.80 | 4.54 | 0.78 | 3.95 | 1.35 | 30. Some aims or goals in life are more valuable than other goals.

Table: Questions and Cluster means

p-value | Cluster 1 | Cluster 2 | Cluster 3 | Question
0.000 | 2.40 | 4.41 | 1.26 | 25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being.
0.000 | 4.65 | 2.59 | 4.58 | 36. There is no, one, universal way of obtaining a meaningful life for all people.
0.000 | 4.24 | 1.59 | 4.37 | 39. Lives can be meaningful even without the existence of a God or spiritual realm.
0.000 | 1.40 | 1.17 | 3.05 | 40. Our lives have no significance, but we must live as if they do.
0.000 | 2.31 | 3.41 | 1.37 | 22. To adhere to religious practices based on tradition or rituals.
0.000 | 4.03 | 4.34 | 2.32 | 26. There is a reason for everything that happens.
0.000 | 3.50 | 3.27 | 1.53 | 27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life.
0.000 | 2.78 | 3.98 | 2.37 | 1. How religious/spiritual would you say you are?
0.000 | 4.79 | 3.90 | 4.53 | 38. What contributes to a meaningful life varies according to each person (or group).
0.000 | 4.22 | 3.22 | 3.16 | 37. People passionately desire different things. Obtaining these things contributes to making life more meaningful for them.
0.000 | 4.03 | 3.34 | 2.89 | 34. The meaning of life is found in realizing my potential.
0.000 | 3.89 | 4.34 | 3.16 | 5. To share values/beliefs with others in your close circle or community
0.000 | 4.25 | 4.61 | 4.89 | 14. To act on your own personal beliefs, despite outside pressure.
0.000 | 3.53 | 3.76 | 2.37 | 24. The meaning of life is found in understanding one's ultimate purpose for life.
0.000 | 4.38 | 4.37 | 3.58 | 32. A meaningful life is one that contributes to the well-being of others.
0.001 | 3.38 | 2.61 | 2.53 | 33. The meaning of life is the same as a happy life.
0.003 | 2.64 | 2.05 | 1.84 | 35. Life has purpose only in the everyday details of living.
0.005 | 3.01 | 2.95 | 2.21 | 28. People unearth the same basic values when attempting to find the meaning of life.
0.007 | 3.93 | 4.27 | 3.42 | 7. To continually set short and long-term, achievable goals for yourself
0.008 | 3.67 | 3.51 | 2.84 | 9. To live up to the expectations of family and close friends
0.013 | 3.63 | 3.90 | 4.26 | 17. To care about the state of the physical/natural environment.
0.015 | 4.32 | 4.05 | 4.68 | 13. To be able to plan and take time for leisure.
0.015 | 2.72 | 2.56 | 1.95 | 21. To keep up with media and popular-culture trends.
0.041 | 4.50 | 4.54 | 3.95 | 30. Some aims or goals in life are more valuable than other goals.


A stepwise discriminant function analysis was carried out to predict cluster membership and to identify the minimal set of survey statements needed to separate the clusters for the 128 participants in the study.

Table 2: Standardized Canonical Discriminant Function Coefficients

Question | Function 1 | Function 2
13. To be able to plan and take time for leisure. | .339 | -.069
14. To act on your own personal beliefs, despite outside pressure. | .012 | -.351
25. The meaning of life can be discovered through intentionally living a life that glorifies a Spiritual being. | -.501 | .118
26. There is a reason for everything that happens. | -.336 | .368
27. Obtaining things in life that are material and tangible is only part of discovering the meaning of life. | -.234 | .287
34. The meaning of life is found in realizing my potential. | -.042 | .444
36. There is no, one, universal way of obtaining a meaningful life for all people. | .258 | .319
39. Lives can be meaningful even without the existence of a God or spiritual realm. | .469 | .468
40. Our lives have no significance, but we must live as if they do. | .268 | -.637

Figure 2: Cluster Mean Scores for discriminating questions (questions Q13, Q14, Q25, Q26, Q27, Q34, Q36, Q39, Q40; mean score from 0 to 5; series: Semi-Religious, Religious, Humanistic)

Graph: Scatter plot of F2 (Discriminant function 2) against F1 (Discriminant function 1), with points labelled Semi-Religious, Religious and Humanistic. The F1 axis is annotated from religious to non-religious; the F2 axis is annotated optimistic and pessimistic.

Discrimination performance

1. 96% of the cluster 1 respondents were correctly classified,

2. 88% of the cluster 2 respondents were correctly classified, and

3. 84% of the cluster 3 respondents were correctly classified.


Techniques for studying correlation and covariance structure

Principal Components Analysis (PCA)

Factor Analysis

Principal Component Analysis

Let $x$ have a p-variate Normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$.

Definition:

The linear combination

$$C_1 = a_1 x_1 + \dots + a_p x_p = a'x$$

is called the first principal component if $a = (a_1, \dots, a_p)'$ is chosen to maximize

$$Var(C_1) = Var(a'x) = a'\Sigma a$$

subject to

$$a'a = a_1^2 + \dots + a_p^2 = 1$$

Consider maximizing

$$V = Var(a'x) = a'\Sigma a$$

subject to

$$a'a = a_1^2 + \dots + a_p^2 = 1$$

Using the Lagrange multiplier technique, let

$$g(a, \lambda) = V + \lambda(1 - a'a) = a'\Sigma a + \lambda(1 - a'a)$$

Now

$$\frac{\partial g(a, \lambda)}{\partial \lambda} = 1 - a'a = 0 \quad \text{if } a'a = 1$$

and

$$\frac{\partial g(a, \lambda)}{\partial a} = 2\Sigma a - 2\lambda a = 0 \quad \text{if } \Sigma a = \lambda a$$

Thus $a$ is an eigenvector of $\Sigma$ and $\lambda$ is the eigenvalue associated with $a$.

Also

$$Var(a'x) = a'\Sigma a = a'(\lambda a) = \lambda\, a'a = \lambda$$

Hence $Var(a'x)$ is maximized if $\lambda$ is the largest eigenvalue of $\Sigma$.

Summary

$$C_1 = a'x = a_1 x_1 + \dots + a_p x_p$$

is the first principal component if $a = (a_1, \dots, a_p)'$ is the eigenvector (of length 1, i.e. $a'a = a_1^2 + \dots + a_p^2 = 1$) of $\Sigma$ associated with the largest eigenvalue $\lambda_1$ of $\Sigma$.
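The summary can be checked numerically with a short NumPy sketch. It is only an illustration: the function name `first_principal_component` and the 2 x 2 covariance matrix are hypothetical, and `numpy.linalg.eigh` is used on the assumption that the covariance matrix is symmetric. The eigenvector for the largest eigenvalue gives the coefficient vector a, and the variance of a'x equals that eigenvalue.

```python
import numpy as np


def first_principal_component(cov):
    """Return (lambda_1, a): the largest eigenvalue of cov and its unit eigenvector."""
    # eigh handles symmetric matrices and returns eigenvalues in ascending order,
    # with unit-length eigenvectors as the columns of the second result.
    eigenvalues, eigenvectors = np.linalg.eigh(np.asarray(cov, dtype=float))
    return eigenvalues[-1], eigenvectors[:, -1]


# Hypothetical covariance matrix (numbers are made up for illustration).
cov = np.array([[4.0, 2.0],
                [2.0, 3.0]])
lam, a = first_principal_component(cov)

print("largest eigenvalue:", lam)
print("coefficients a:", a)
print("a'a:", a @ a)                    # constraint a'a = 1
print("Var(a'x) = a' Sigma a:", a @ cov @ a)
```

The sign of an eigenvector is arbitrary, so the coefficients may come out with the opposite sign on another platform; the variance a'Σa is unaffected.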