Differentiating Types of Aphasia: A Case Study in Modern Data Mining Techniques

By: Christopher Peter Makris & Thomas Todd

Background

One of the most complex and unique functions of the human brain is its ability to both

understand and generate speech. Every day billions of people around the world communicate

with each other through verbal and written means. Arguably, these skills are vital to an

individual’s wellbeing. Unfortunately, each year approximately five to six hundred thousand

people worldwide develop a condition known as aphasia that threatens to destroy the victim’s

communication skills. Aphasia usually results from stroke or head trauma and impairs an

individual’s language ability. The ailment can manifest in many ways, and often greatly depends

upon the part of the brain that has been affected. Severity can range from mild speech

impairment to a complete inability to communicate one’s thoughts.

Currently, there are eight different types of aphasia that are generally used among

clinicians for patient classification. To test for the various types of aphasia, several methods may

be employed. The most widely used is the Western Aphasia Battery (WAB) test. Additionally,

specific examinations concerning repetition, verb naming, or even a brain MRI scan can

aid in classifying patients. Among experts in the field, there is considerable debate over both the

classification methodology and the groups themselves. Many speech pathologists believe that the

current aphasia types are outdated and should be modified, and that other groupings may be more

informative.

Aims & Expectations

Our primary goal is to help in better understanding how to classify aphasic patients based

on speech patterns and ability in verbal skills such as repetition, naming, comprehension, and

fluency. We would also like to determine if there are other groupings that may not specifically

align with the existing eight aphasia categories. If there is an underlying structure within our

data, highlighting it may be more helpful in classifying patients than the current clinician labels.

Ideally, we would like to either better define the current groups or create new classification

groups. Our findings could help decide specialized treatment plans for past and present patients

in order to ensure an efficient recovery.

Data & Missingness

Information regarding patients’ demographics, WAB test scores, subtest scores, and error

measurements has been recorded for 161 patients from around the United States. The data is

currently stored online at www.talkbank.org. All patients within our dataset suffered from a

stroke and exhibit at least some type of aphasic behavior.

Our study will include a detailed analysis of data primarily regarding five distinct yet

intrinsically related tests. The first concerns the WAB test itself, whereas the four others concern

various error measurements taken throughout the administration of four subtests (free speech,

Cinderella, sandwich, and picture). For the free speech subtest, patients were asked to recount the

events leading up to and through their stroke. In the Cinderella subtest, patients were asked to tell

the familiar tale as best they could. For the sandwich subtest, patients were asked to describe the

process involved in constructing a peanut butter and jelly sandwich. Lastly, for the picture

subtest, patients were shown a series of pictures that outlined a story and were essentially asked

to narrate. For ease of future reference, patients were videotaped during the subtest

administration. After testing, the records were analyzed and various measurements were noted.

Of the original 161 patients available, 6 were removed prior to our analyses because of

test administration and protocol violations. They will not be included in the remainder of this

paper. For each of the valid 155 patients, 134 different variable measurements were taken for a

grand total of 20,770 individual cell entries. Of these entries, only 114 (approximately 0.549%)

were missing. All missing values were from the WAB test itself; however, there is no discernible

pattern among the blank values within the test variables. Thus, we assume that they are missing

completely at random. In order to avoid deleting entire patient records because of this sparse

missingness, we decide to impute averages based upon WAB aphasia type group membership.

Although this induces some group structure into our data, the assumption violation can be

relaxed because of the extreme sparseness of the missing data.
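
The group-mean imputation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the record layout, the field names `type` and `score`, and the toy values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def impute_group_means(rows, group_key, value_key):
    """Fill missing (None) values with the mean of the observed values
    from the same group. Returns new rows; the input is left untouched."""
    observed = defaultdict(list)
    for row in rows:
        if row[value_key] is not None:
            observed[row[group_key]].append(row[value_key])
    group_means = {group: mean(values) for group, values in observed.items()}

    filled = []
    for row in rows:
        new_row = dict(row)
        if new_row[value_key] is None:
            # Stays None if the whole group has no observed value.
            new_row[value_key] = group_means.get(new_row[group_key])
        filled.append(new_row)
    return filled


# Hypothetical toy records (not values from the study).
patients = [
    {"type": "anomic", "score": 9.0},
    {"type": "anomic", "score": None},
    {"type": "anomic", "score": 7.0},
    {"type": "Broca", "score": 4.0},
    {"type": "Broca", "score": None},
]
imputed = impute_group_means(patients, "type", "score")
```

Because each filled value is computed from the group label, imputed patients are pulled toward their own group's center, which is the induced group structure the text mentions.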

Patient Demographics

We begin by summarizing some demographic information about the patients involved

within the study. Of the 155 patients, 56 (36.129%) are female and 99 (63.871%) are male. We

note that the number of males in our study is nearly double the number of females. A histogram

of patient age is depicted below in Figure 1. The distribution is approximately bell-shaped and

symmetric; the mean and median ages are nearly the same at approximately 63.054 and 64 years,

respectively. The standard deviation is approximately 11.871 years. The youngest subject is 34

years old, whereas the oldest is 91. There do not appear to be any overt outliers.

Figure 1: Histogram of patient age

In our dataset, we have at least one representative patient from each of eight different

classifications of aphasia: anomic, Broca, conduction, global, not aphasic (as determined by the

Western Aphasia Battery test), transcortical motor, transcortical sensory, and Wernicke. The pie

chart in Figure 2 shows the proportion of each type of aphasia that we have in our dataset. The

majority of the patients in the dataset are classified with anomic aphasia (49, approximately

31.613%). The smallest group in the data set is the transcortical sensory group which has only

one patient in the dataset, followed by the global and transcortical motor groups with three and

six patients, respectively. Fifteen individuals within our dataset were classified as not aphasic

because their WAB test scores were sufficiently high; however, these subjects still exhibit some

aphasic behavior and thus will be included within our analysis.

Figure 2: Pie chart of WAB aphasia types

Univariate Exploratory Data Analysis

Since there are over one hundred variables within our dataset, it is implausible to

graphically summarize all variable distributions. Thus, we choose to depict those variables that

displayed the most interesting behavior or those that may be pertinent for our future analyses.

The Western Aphasia Battery test returns a score called the Aphasia Quotient: a

numerical measure for each patient that is believed to help classify them into the eight

aforementioned groups of aphasia. A histogram of these values from our dataset is depicted

below in Figure 3.

Figure 3: Histogram of WAB aphasia quotient

The distribution appears to be skewed left, towards lower values. The mean score is

approximately 70.028, while the median score is 72.6. The standard deviation is approximately

20.813. The lowest score an individual received is 16.1, whereas the highest is 99. Although the

distribution is skewed, there do not appear to be any overt outliers.

It is interesting to note that the WAB score splits aphasic patients into somewhat distinct

groups, which is to be expected considering its purpose (Figure 4). Although there is overlap in

some of the tails of the conditional distributions, the interquartile ranges generally appear to be

distinct. We note that the conditional distributions of the WAB score for the Broca and Wernicke

groups overlap quite a bit. Likewise, the conduction and transcortical motor conditional

distributions overlap heavily. On the other hand, the global, anomic, and mildly aphasic patients

do appear to be well-separated from the majority of the other types of patients.

Figure 4: Conditional boxplots of WAB aphasia quotient by aphasia type

The Boston Naming Test (BNT) is another diagnostic tool often used in the

classification of aphasic patients. In contrast to the WAB scores, the Boston Naming Test scores

appear to be almost uniform (Figure 5). The mean and median scores are nearly the same at

approximately 7.213 and 7, respectively. The standard deviation is approximately 4.843 points.

Again, no outliers are clearly identifiable; however, it is interesting to note that most patients

score comparatively low on this test.

The Boston Naming Test scores do not seem to separate the WAB types quite as well

(Figure 6). Each individual conditional distribution appears to have a greater spread. Thus, the

conditional distributions tend to overlap much more frequently. There are still a couple groups,

however, that are quite well-separated from the majority of the other data. Once again, the

anomic and mildly aphasic patients seem to have interquartile ranges that do not overlap any of

the others. We also note that, although their overall distribution overlaps with the Broca and

Wernicke distributions, the global patients are again clustering towards very low scores.

The amount a patient speaks or the speed at which they speak may be indicative of how

impaired they are. Thus, it might be interesting to investigate the distribution of words spoken

per second. For graphical purposes, we choose to investigate this variable for the free speech

subtest (Figure 7).

Figure 5: Histogram of Boston Naming Test scores

Figure 6: Conditional boxplots of BNT scores by aphasia type

The distribution appears to be skewed right, towards higher values. The mean and median values

are approximately the same at 1.408 and 1.324 words per second, respectively. The standard

deviation is approximately 0.759 words per second. The minimum value is about 0.162 words

per second, whereas the maximum value is about 4.181. This maximum value might be

considered a mild outlier, as it is somewhat separated from the rest of the values. Also, this value

might seem a bit high for an aphasic patient, as they are speaking at a relatively fast rate.

Figure 7: Histogram of words spoken per second, free speech subtest

This variable seems to contain interesting separation information among the WAB

aphasia types (Figure 8). Although most of the interquartile ranges appear to overlap, there seem

to be two distinct groups. The anomic, conduction, mildly aphasic, and Wernicke patients appear

to score relatively higher than the Broca, global, transcortical motor and transcortical sensory

patients. This might serve as a split between relatively fluent and non-fluent patients.

Figure 8: Conditional boxplots of words spoken per second on the free speech subtest by aphasia type

The final variable we choose to discuss for our univariate exploratory data analysis

concerns the number of words a patient gets correct when they make no errors. The distribution

is depicted in Figure 9 below. In general, the distribution appears to be skewed towards lower

values, but it may also be bimodal. The mean is approximately 17.215 words whereas the median

is 20 words. The standard deviation is approximately 11.72 words. The minimum value is 0

words correct, and the maximum value is 31. Although no outliers are present, we note that there

is a low mode below 5 words correct and a high mode above 30 words correct.

Figure 9: Histogram of number of words correct with no errors

Once again, there is some separation when inspecting the conditional distribution based

on the WAB classification type (Figure 10). The anomic, mildly aphasic, transcortical motor and

transcortical sensory patients all have relatively high values, whereas the Broca, conduction,

global, and Wernicke patients all have relatively low values.

Figure 10: Conditional boxplots of words correct with no errors by aphasia type

Bivariate Exploratory Data Analysis

Although the main focus of our project is to come up with new classification groupings

that do not incorporate the overall measurements of the Western Aphasia Battery test, we have

not restricted our use of information concerning the final score of each patient’s performance on

the Boston Naming Test. Since theoretically these tests are somewhat related, we would like to

determine how they interact with one another. Within the scatterplots in Figure 11 below, we

plot the two variables against each other and color the observations by their aphasia type as

determined by the WAB test. Even though the different groups appear to be somewhat widely

spread out across scores, the WAB and BNT tend to agree with respect to overall high and

low scores. This is reflected in the high positive correlation of approximately 0.803 between the

two variables.
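
For reference, the correlation reported here is assumed to be the usual Pearson coefficient; the text does not name the estimator. The sketch below shows how such a value is computed, using hypothetical toy scores rather than the study data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy scores (not values from the study).
wab_aq = [20.0, 45.0, 60.0, 75.0, 90.0]
bnt = [1.0, 3.0, 8.0, 9.0, 14.0]
r_toy = pearson_r(wab_aq, bnt)
```

A value near 1 means that patients who score high on one test tend to score high on the other, which is the pattern described above.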

Figure 11: Scatterplots of BNT score against WAB AQ score, colored by aphasia type

We also examined the relationship between the spontaneous speech score and the naming

score for the aphasia quotient. The two variables have a high positive correlation of

approximately 0.776. In the scatterplot below, we can see that the anomic and mildly aphasic

groups tend to score high in both categories. If we restrict our attention to only the Broca

patients, there appears to be an evident linear trend between the spontaneous speech and naming

scores. If a Broca patient scores low on spontaneous speech, then they also tend to score low on naming.

Furthermore, although the Wernicke group seems to be quite scattered, patients within

this category tend to score in the higher ranges of the spontaneous speech measurement.

Overall, there are no clear classifications that can be extrapolated from these plots;

however, information from the pairs of variables may help in some similar patient separation

such as in segmenting the anomic and mildly aphasic individuals from the rest of the subjects.

Figure 12: Scatterplots of naming score against spontaneous score, colored by aphasia type

Subtest Comparisons

The four subtests in our data are theoretically intrinsically related, but may serve to

measure different patient characteristics. Although the same types of error measurements are

recorded for each patient among each subtest, we would like to determine if there are any

distinguishing factors between each of the subtest exams.

We begin by comparing the total number of utterances across each of the subtests (Figure

13a). We can see each individual distribution is skewed towards higher values, indicating that

most people did not speak for an extended period of time for each subtest. The mode of each

distribution is located between approximately 10 and 50 words. We can see that most individuals

completed the sandwich test with relatively few words, whereas most individuals freely spoke

about their stroke comparatively much longer.

Although the distributions look somewhat different when we examine only the total

number of utterances, when we normalize by time nearly all of the distributions appear to be

similar (Figure 13b). This indicates that, although patients may have spoken for variable periods

of time among the four subtests, the average rate at which they spoke was approximately

between 0.1 and 0.3 utterances per second regardless of subtest.

Figure 13: Kernel density estimates of: a. total number of utterances by subtest; b. total utterances per second by subtest

# Utterances    Free Speech    Sandwich    Cinderella    Picture
Free Speech     1.000          0.408       0.312         0.246
Sandwich        0.408          1.000       0.470         0.366
Cinderella      0.312          0.470       1.000         0.603
Picture         0.246          0.366       0.603         1.000

# Utt./Sec.     Free Speech    Sandwich    Cinderella    Picture
Free Speech     1.000          0.230       0.351         0.634
Sandwich        0.230          1.000       0.350         0.277
Cinderella      0.351          0.350       1.000         0.471
Picture         0.634          0.277       0.471         1.000

Table 1: Correlations among subtests: a. total number of utterances by subtest; b. total utterances per second by subtest

Correlations among the four subtests’ total number of utterances and total number of

utterances per second are given in Tables 1a and 1b. When

considering either the total number of utterances or the total number of utterances per second,

most of the pairwise inter-subtest correlations are relatively weak (ranging from about 0.230 to

0.471). We note that, in terms of total number of utterances, the Cinderella and picture subtests

are moderately correlated (r ≈ 0.603) and also, in terms of total number of utterances per

second, the free speech and picture subtests are moderately correlated (r ≈ 0.634).

Similarly, we also inspected the total number of words and the total number of words per

second across each of the subtests (Figures 14ab). The distributions for the total number of words

appear to be nearly identical to the distributions for the total number of utterances. This makes

intuitive sense since the two variables are measuring interrelated outcomes. Also, once we

normalize by time the subtest distributions are nearly indistinguishable.

Correlations among the four tests’ total number of words and total number of words per

second are much stronger in comparison to the utterance measurements (Tables 2ab). When

considering either the total number of words or the total number of words per second, most of the

pairwise inter-subtest correlations are relatively strong (ranging from about 0.463 to 0.908).

Figure 14: Kernel density estimates of: a. total number of words by subtest; b. total words per second by subtest

# Words         Free Speech    Sandwich    Cinderella    Picture
Free Speech     1.000          0.549       0.528         0.463
Sandwich        0.549          1.000       0.627         0.616
Cinderella      0.528          0.627       1.000         0.759
Picture         0.463          0.616       0.759         1.000

# Words/Sec.    Free Speech    Sandwich    Cinderella    Picture
Free Speech     1.000          0.757       0.809         0.908
Sandwich        0.757          1.000       0.686         0.797
Cinderella      0.809          0.686       1.000         0.825
Picture         0.908          0.797       0.825         1.000

Table 2: Correlations among subtests: a. total number of words by subtest; b. total words per second by subtest

Principal Component Analysis

As noted before, there are a total of 134 variables within our dataset. Rather than using all

measurements upon all patients, we might be able to summarize a majority of the information

contained within the overall dataset by using just a few key variable combinations. To

investigate what these combinations may be, we conduct a principal component analysis.

Within the scree plot below (Figure 15), we can see that the first two principal

components explain the largest shares of the variability in the dataset (approximately 24.934% and 12.984%,

respectively). These percentages drop off quite quickly thereafter; the third principal component

explains only about 6.595% of the variability, whereas the fourth and fifth explain 4.907% and

4.487%, respectively. The rest of the principal components each explain less than 4% of the

variability in the dataset. This is an indication that the first four or five principal components

may be sufficient to describe the primary nuances inherent within our data.
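
As a quick arithmetic check on the percentages quoted above, the cumulative share of variability can be tallied directly. The five values are those reported in the text; nothing else is assumed.

```python
from itertools import accumulate

# Percent of variability explained by the first five principal components,
# as reported in the text.
explained = [24.934, 12.984, 6.595, 4.907, 4.487]

# Running total: share explained by the first 1, 2, ..., 5 components.
cumulative = list(accumulate(explained))

for i, (pct, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {pct:6.3f}%   cumulative: {total:6.3f}%")
```

Together the first two components account for roughly 37.9% of the variability, and the first five for roughly 53.9%, so a little under half of the variability lies outside the components interpreted below.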

Figure 15: Scree plot, percent variability explained

We plot the first five principal components against each other and color by the WAB

aphasia types on the following page (Figure 16). Note that the WAB coloring scheme is the same

as before. There does not appear to be much clear separation among all of the groups; however,

the red (anomic) and light blue (mildly aphasic) observations appear to most consistently be

somewhat clustered together. Furthermore, although similar observations often overlap with

observations from other groups, observations within the same group often appear to be clustered

near one another. This is an indication that the first few principal components may contain

separation information that could potentially validate the current aphasia type labeling standard.

It is also important to understand what information each principal component tends to

summarize. In order to do so, we graph the principal component loadings against the column

variables to which they are assigned (Figures 17a-e). We note that the horizontal direction in this

series of graphs is somewhat arbitrary; however, we order the column variables by the subtest to

which they are associated. Therefore, each plot is somewhat separated into five pieces. From left

to right, the sections correspond to the WAB, free speech, Cinderella, sandwich, and picture

subtests.

Figure 16: Pairs plot of first five principal components, colored by aphasia type

The first principal component loadings (Figure 17a) have two primary interesting

features: the positive values, and the most extreme negative values. Nearly all of the positive

loading values correspond to variables concerning the total number of unintelligible words

among all of the subtests. In contrast, all of the extreme negative values have to do with errors

made within each of the subtests. Specific variables highlighted by the extreme negative loadings

are the total number of repetitions, the total number of retracings/revisions, and the total number

of fragment filler words used. Therefore, this first principal component seems to help delineate

between the amount that a patient speaks and the number of errors that they make.

Separation of variables is much more distinct in the second principal component loadings

(Figure 17b). It is clear that most of the first twenty variables have highly negative loadings,

whereas the remaining variables have marginally positive and negative loadings with no clear

structure. The variables that have comparatively highly negative loadings correspond to the

measurements taken on the WAB test. Therefore, the second principal component seems to

delineate between the WAB test and all other subtests.

Rather than depicting clear separation among variables, the third principal component

loadings seem to pair together whole subtests (Figure 17c). Most of the positive loadings appear

to correspond to the variables within the free speech and sandwich subtests, whereas most of the

negative loadings correspond to the variables within the WAB, Cinderella, and picture subtests.

Most notably, the negative loadings with the highest magnitude belong to the WAB subtest

measurements indicating whether or not various questions were administered again. Overall, this

principal component seems to imply that an individual patient’s performance on the free speech

and sandwich subtests might measure one aspect of aphasic impairment, whereas a patient’s

performance on the WAB, Cinderella, and picture subtests might measure another.

The interpretation for the fourth principal component loadings (Figure 17d) once again

returns to specific variables. Most of the comparatively larger positive loadings correspond to

variables measuring dysfluency within-word errors and the total number of filler words an

individual patient uses across all of the subtests. In contrast, most of the comparatively larger

negative loadings correspond to variables measuring the total number of semantic errors an

individual patient makes and the total number of utterances/words an individual patient says

across all of the subtests. Thus, this principal component seems to make a distinction among the

different types of errors, along with the amount an individual patient speaks.

Lastly, we interpret the fifth principal component loadings (Figure 17e). This principal

component also concerns combinations of specific types of variables across all of the subtests.

All of the largest positive loadings primarily correspond to variables measuring the total number

of utterances, words, or fillers an individual patient speaks across all of the subtests. On the other

hand, all of the largest negative loadings correspond to variables measuring the total number of

neologistic and phonological errors. Thus, the fifth principal component seems to make a

distinction between the amount an individual speaks and the number of errors they make. Note

that the first and fifth principal component interpretations seem to be the same on

the surface; however, the error measures that are incorporated in each of the principal

components are different: the first principal component includes error measures of correct words

that may be repeated, whereas the fifth principal component includes error measures of incorrect

words only.

It is interesting to note that each of the first five principal components has a unique

plausible interpretation. These may help dictate what the important variables are for the

remainder of our analysis.

Figure 17a-e: First five principal component loadings

Choice of K Clusters & K-Means Simulations

One of the most critical parts of clustering problems is determining the appropriate

number of clusters, K, that should be identified within the data. There are several different types

of clustering methods available for use, each having different advantages and disadvantages. For

the purposes of our analysis, we focus on the K-means clustering algorithm.

K-means is an iterative process in which a number of clusters is defined and K random

centers are selected. The observations are then assigned to the closest center. Afterwards, the

center of each cluster is recalculated, and the process continues until there is little to no change in

the cluster assignments. Unfortunately, this process is nondeterministic, meaning that it can yield

various solutions depending on the choice of the initial random centers. To ensure that we have a

stable solution, the K-means algorithm must be repeated several times for a given value of K.
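
The procedure described above can be sketched in plain Python. This is an illustrative sketch rather than the implementation used in the study. In particular, the text does not say how a "stable" solution is identified; keeping the run with the lowest within-cluster sum of squares is an assumption made here, and the toy points are hypothetical.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-means run from k randomly chosen observations as centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation goes to its closest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        # Update step: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster empties out
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    wss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wss


def kmeans(points, k, n_starts=20, seed=0):
    """Repeat K-means from several random starts; keep the lowest-WSS run
    (an assumed selection rule, see the lead-in)."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[2])


# Hypothetical toy data: two well-separated groups in two dimensions.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.9, 5.3)]
labels, centers, wss = kmeans(points, k=2)
```

Fixing the seed makes the sketch reproducible; different seeds can lead individual runs to different local solutions, which is why several starts are used.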

Furthermore, given a value of K, we must determine how well our classification method

is performing. There are two measures we choose to investigate as indicators of our algorithm’s

performance: the Adjusted Rand Index (ARI) and the Fowlkes-Mallows Index (FMI). Both compare

two groupings of the same observations by counting pairs of observations. The ARI measures how

often the two groupings agree on whether a pair belongs together, corrected for the agreement

expected by chance. The FMI is the geometric mean of pairwise precision and recall. In general, we

would like to maximize both the ARI and FMI measures, as this would indicate a good clustering solution.
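
Both indices can be computed from pair counts between two labelings of the same observations. The sketch below uses the standard definitions. Which reference labeling the study compared against is not stated at this point, so the reference labels here (and the toy cluster assignments) are hypothetical.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Count pairs grouped together in both labelings, in A, in B, and overall.
    Assumes at least two observations."""
    both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    in_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    in_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return both, in_a, in_b, comb(len(labels_a), 2)


def adjusted_rand_index(labels_a, labels_b):
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total   # agreement expected by chance
    max_index = (in_a + in_b) / 2    # largest attainable value
    if max_index == expected:
        # Degenerate case: both labelings are trivial
        # (everything in one group, or all singletons).
        return 1.0
    return (both - expected) / (max_index - expected)


def fowlkes_mallows_index(labels_a, labels_b):
    both, in_a, in_b, _ = pair_counts(labels_a, labels_b)
    if in_a == 0 or in_b == 0:
        return 0.0
    # Geometric mean of pairwise precision (both/in_b) and recall (both/in_a).
    return both / sqrt(in_a * in_b)


# Hypothetical toy labels (not values from the study).
reference = ["anomic", "anomic", "Broca", "Broca", "Wernicke", "Wernicke"]
clusters = [0, 0, 1, 1, 1, 2]
ari = adjusted_rand_index(reference, clusters)
fmi = fowlkes_mallows_index(reference, clusters)
```

Neither index depends on how the groups are named, only on which observations are grouped together, so cluster numbers need not be matched to type labels beforehand.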

To objectively determine the best choice of K for our data, we run 1,000 simulations of

the K-means algorithm for each choice of K from two through ten. For each value of K, the most

stable solution is selected, and the ARI and FMI are calculated. Figure 18 below summarizes the

information gained by this procedure. From this perspective, the best

choice for K is clearly five clusters. Thus, we will use K = 5 as an approximate benchmark for the

remainder of our analyses.

Figure 18: FMI against ARI for stable K-means solutions

Dataset Restriction

It is clear that the WAB test itself plays a heavy role in delineating among the eight types

of aphasia as it was designed to do so; however, for our research, we wish to either validate these

classification groups by arriving at them in a different manner, or find other plausible groupings

as dictated by the data. Thus, for the remainder of our analyses we will not include variables

measured from the WAB test itself. We do so in order not to confound our results with those that

would be expected from the WAB test in the first place. The only exception is the

auditory verbal comprehension score component of the aphasia quotient. This measure was left

in on the recommendation of expert clinicians as an accurate way to measure a patient’s

comprehension ability. From a statistical standpoint, since it is the only remnant variable of the

WAB test, leaving it within our analyses should not cause any theoretical problems.

Random Forest & Tree Clustering

In the realm of classification problems, decision trees are arguably among the most

flexible and powerful tools. They are often simple to understand, and also relatively easy to

create. One main drawback is that decision trees are also often sensitive to small changes within

the data. To remedy this, we can implement a bagging and bootstrapping technique that will help

distinguish between data structure and noise. Ultimately, we can arrive at a stable solution.

For our project, we choose to use the random forest method. Rather than creating a

single, potentially unstable decision tree, we will create 5,000 decorrelated decision trees. Each

of the trees essentially represents a vote for the classification label for each patient observation.

By taking the most frequently appearing classification label for each observation, we eliminate

much of the noise and arrive upon a more stable and precise result. Note that this methodology is

very similar to the simulations conducted in the previous section.
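
The voting step described above can be illustrated with a few hypothetical trees. This sketch covers only the aggregation of votes, not tree construction, and the predicted labels are invented for illustration.

```python
from collections import Counter


def majority_vote(votes):
    """Return the most frequent label; ties go to the label seen first."""
    return Counter(votes).most_common(1)[0][0]


# Each inner list holds one hypothetical tree's predicted label
# for the same three patients.
tree_predictions = [
    ["anomic", "Broca", "Wernicke"],
    ["anomic", "Broca", "Broca"],
    ["anomic", "conduction", "Wernicke"],
    ["conduction", "Broca", "Wernicke"],
    ["anomic", "Broca", "conduction"],
]

# Transpose so that each patient's votes are grouped, then take the majority.
forest_labels = [majority_vote(votes) for votes in zip(*tree_predictions)]
```

No single tree above agrees with the final labels for all three patients, yet the aggregated vote is unambiguous, which is the stabilizing effect the text describes.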

Figure 19: Variable importance for random forest method

Each of the trees randomly samples a subset of the variables that it will use to create the

best splits among the observations. A measure of how well the chosen variables separate the

observations is calculated, called a Gini index. In general, a higher mean decrease in the Gini

index indicates that the associated variable performs observation separation well. The variable

importance plot on the previous page (Figure 19) depicts a ranking of the variables based on the

mean decrease in the Gini index. Most notably, the variable that appears to do the best job in

separating the groups of aphasic patients is the auditory verbal comprehension score component

of the aphasia quotient. This is somewhat expected simply because of the nature of the variable.

The next few important variables concern various measures of correct words.
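
To make the Gini measure concrete, the sketch below computes the Gini impurity of a set of labels and the decrease achieved by one threshold split. A random forest's importance score averages such decreases over many splits and trees; only the single-split calculation is shown. The variable name `comprehension` and all values are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """1 minus the sum of squared class proportions; 0 means a pure node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(labels, values, threshold):
    """Impurity of the parent node minus the size-weighted impurity of the
    two children produced by splitting at `threshold`."""
    left = [lab for lab, v in zip(labels, values) if v <= threshold]
    right = [lab for lab, v in zip(labels, values) if v > threshold]
    n = len(labels)
    children = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - children


# Hypothetical toy data (not values from the study).
types = ["anomic", "anomic", "anomic", "global", "global", "global"]
comprehension = [9.5, 8.8, 9.1, 2.0, 3.5, 1.2]

clean_split = gini_decrease(types, comprehension, threshold=5.0)
poor_split = gini_decrease(types, comprehension, threshold=9.0)
```

The split at 5.0 separates the two toy groups completely and so removes all of the parent impurity, while the split at 9.0 leaves a mixed node and yields a smaller decrease.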

In order to gauge the high-dimensional structure of the random forest results, we use

multidimensional scaling to project the observations down to two dimensions. The scatterplot in

Figure 20 depicts this projection colored by the aphasia type as determined by the WAB. We can

see that the random forest proximities appear to perform well when trying to separate groups of

aphasia types; similarly colored observations tend to be clustered nearby each other.

Figure 20: Scatterplot of first two multidimensional scaling coordinates, by aphasia type

To further investigate the high-dimensional structure of the random forest results, we

graph the first three multidimensional scaling coordinates and color them both by the aphasia

types as determined by the WAB and also colored by the five groups the random forest method

found (Figures 21ab). Note that the WAB coloring scheme is the same as before. Separation is

clear among the three dimensions. In either of the plots, one can argue that as you follow patients

from one color to another along the truncated triangular pathway, you pass through patients that

are on a scale of severity. The original eight WAB aphasia type classifications mostly align with

the five random forest aphasia type classifications, except that the relative simplicity of the

random forest solution somewhat blurs the distinction between the old groupings.

Figure 21: 3D multidimensional scaling plots colored by: a. aphasia type; b. random forest type

A major drawback of applying the random forest algorithm is that there is no way to

visualize the average decision tree. Because it is pertinent to our study to understand how the

variables are being split, we manually fit a single decision tree to our entire dataset (Figure 22).

We note that we should use the information shown in this tree with caution as we are severely

overfitting to the characteristics among our dataset. On the other hand, almost all of the variables

that were implemented in the overall tree appeared in the variable importance plot from the

random forest method.

We note that the resulting nodes yield some information pertaining to potential clustering

solutions for new observations. For example, it is clear that the anomic patients are most similar

to the mildly anomic patients. Also, there may be two groups of the Broca patients: one that is

more similar to the Wernicke patients, and one that is more similar to the conduction patients.

Iterative Spectral Clustering

As we have seen from the principal components analysis and random forest algorithm, it

is clear that there is high-dimensional structure among our data in terms of the variables. This

motivates a spectral clustering analysis, in which we are mainly concerned with all pairs of

observations instead of the variables themselves.

The main goal is to cluster observations into groups. Ideally, observations in the same

group are more similar to each other than to observations in different groups. Many different

clustering methods are available for this purpose; however, most other clustering algorithms rely

on assumptions that may not always be applicable or practical. For example, the success of the

model-based clustering algorithm depends on the assumption that the true group structure is

elliptical in shape. The spectral clustering method relaxes the shape assumption and therefore can

be applied more broadly.

In order to use the spectral clustering methodology, we must first define a measure of

similarity between pairs of observations. We will call this measure an observation’s affinity to

another observation. Intuitively, observation pairs with larger affinity values are comparatively

more similar, and thus should theoretically be clustered together. On the other hand, observation

pairs with smaller affinity values are comparatively more dissimilar, and thus should

theoretically be clustered into different groups.

Figure 22: Overall decision tree for aphasia type classification


The mathematical definition of affinity varies among scenarios. In general, one must

decide upon specific choices for tuning parameters that could change the results of the analysis

quite drastically. For our problem, we must decide upon two main tuning parameters: a distance

metric and a neighborhood definition. The former is a rule for calculating the distance between

observations in high-dimensional space, whereas the latter plays a part in defining both how

close one observation is to another and how easily two observations may be considered

members of the same cluster.

For our analysis, we first decided to scale our overall dataset to ensure that all variable

measurements are taken into account. We then used the squared Euclidean distances between

observations and a neighborhood parameter . These selections were made not only because

they highlight group structure, but also because they are quite stable. In running multiple

simulations of the spectral clustering algorithm upon the affinities defined in this manner,

solutions are incredibly consistent and thus robust.
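
The construction described above can be sketched in a few lines of Python. This is a minimal illustration rather than the code used in this study: it assumes a Gaussian-kernel form, exp(-d^2 / sigma), for turning squared Euclidean distances into affinities, and the function names, the toy measurements, and the value of `sigma` are all hypothetical.

```python
import math

def standardize(rows):
    """Scale each column to mean 0 and (sample) standard deviation 1."""
    n, p = len(rows), len(rows[0])
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    # A constant column has sd 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]

def affinity_matrix(rows, sigma):
    """Affinities from squared Euclidean distances (assumed Gaussian kernel)."""
    z = standardize(rows)
    n = len(z)
    aff = [[1.0] * n for _ in range(n)]  # an observation is identical to itself
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(z[i], z[j]))
            aff[i][j] = aff[j][i] = math.exp(-d2 / sigma)
    return aff

# Hypothetical measurements for four patients (two variables each).
patients = [[80.0, 9.5], [78.0, 9.1], [5.0, 4.0], [3.0, 5.0]]
A = affinity_matrix(patients, sigma=2.0)
```

Under this assumed kernel, larger values of `sigma` make distant observations look more alike, which is one way a neighborhood parameter can change the results quite drastically.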

Below is a graphical representation of each individual patient’s affinity profile compared

to all other patients within the dataset (Figure 23). Note that affinity is bounded between 0

(extremely dissimilar) and 1 (identical). Under this representation of the affinities, most

observation distributions have an average affinity value between .4 and .5. This indicates that the

chosen affinity definition is successful in partitioning observations into those that are similar and

those that are dissimilar from the reference frame of each individual observation.

Figure 23: Boxplots of individual patient affinities between all other patients

We also inspected a heat map of the affinity matrix (Figure 24a) in order to see if any

group structure is present. In this graph, each pixel represents an observation pair’s affinity. The

coloring is on a scale: red values are lower (closer to 0), whereas yellow values are higher (closer

to 1). Because of the way we ordered the observations in our dataset, any apparent block-

structuring in the heat maps would be reflective of similar groupings among the observations in

terms of at least the old clinician classification labels. If we focus our attention on only those

observations that have a relatively high affinity of greater than or equal to .75 (Figure 24b), it is

clear that there is some block-structure. This is extremely beneficial, and exactly what we would

want to see before performing the spectral clustering analysis as it validates our notion of using

affinities to partition observations into similar groups.


Figure 24: Heat maps of pairwise affinities: a. all affinities; b. affinity values ≥ .75

For the purposes of this analysis, we do not want to make any assumptions concerning

the number of clusters we should ultimately be looking for. Thus, rather than using the classical

spectral clustering algorithm, we defined a new iterative approach that allows halting at any

number of clusters. This consideration of binary splits was also implemented in order to compare

against the random forest splitting process.

During this iterative approach, we begin with all of the observations in the same cluster.

We then analyze the main structure present in the principal component decomposition of the

affinity matrix in order to understand which observations are more similar. Then, based on the

observed structure, we partition the observations into the two most common subgroups that

appear over thousands of simulations of the K-means algorithm. Of the resulting subgroups, the

next candidate of a binary split is the subgroup that has the largest average within-cluster sum of

squared distances from the cluster mean. This is a measure of the average dissimilarity among a

single cluster. Intuitively, the cluster that has the largest within-cluster sum of squared distances

from the cluster mean is likely the most dissimilar and thus should be split into smaller groups.

To ensure that resulting groups are not too small, we can also define a size of groups that we

would consider appropriate to halt candidacy upon. This would guarantee that we could avoid

breaking one single group down to just a couple of observations while also avoiding leaving

other considerably larger groups undivided.
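
The steps in this paragraph can be sketched as follows. This is an illustrative outline under several assumptions, not the implementation used in this study: the affinity matrix `A` is taken to come from a Gaussian kernel, the split uses only the two leading eigenvectors of the affinity block, the within-cluster sum of squares is measured on the scaled data `X`, K-means is a bare-bones version run 200 times rather than thousands, and `min_size` stands in for the minimum candidate size. All function names and the toy data are hypothetical.

```python
from collections import Counter
import numpy as np

def most_common_split(points, n_runs=200, n_iter=25, seed=0):
    """Run 2-means from many random starts; return the most frequent labelling."""
    rng = np.random.default_rng(seed)
    votes = Counter()
    for _ in range(n_runs):
        centers = points[rng.choice(len(points), size=2, replace=False)]
        for _ in range(n_iter):
            dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = dist.argmin(axis=1)
            if labels.min() == labels.max():      # one center took every point
                break
            centers = np.array([points[labels == k].mean(axis=0) for k in (0, 1)])
        if labels.min() != labels.max():
            if labels[0] == 1:                    # make labellings comparable across runs
                labels = 1 - labels
            votes[tuple(labels.tolist())] += 1
    return np.array(votes.most_common(1)[0][0])

def spectral_split(A, idx, n_vecs=2):
    """Split the observations in `idx` in two via leading eigenvectors of their affinities."""
    _, vecs = np.linalg.eigh(A[np.ix_(idx, idx)])  # eigenvalues come back in ascending order
    labels = most_common_split(vecs[:, -n_vecs:])
    return idx[labels == 0], idx[labels == 1]

def avg_within_ss(X, idx):
    """Average squared distance of a cluster's members from the cluster mean."""
    centered = X[idx] - X[idx].mean(axis=0)
    return (centered ** 2).sum() / len(idx)

def iterative_spectral(X, A, n_clusters, min_size):
    """Repeatedly split the most dispersed cluster that has at least `min_size` members."""
    clusters = [np.arange(len(X))]
    while len(clusters) < n_clusters:
        candidates = [c for c in clusters if len(c) >= min_size]
        if not candidates:
            break
        target = max(candidates, key=lambda c: avg_within_ss(X, c))
        clusters = [c for c in clusters if c is not target]
        clusters.extend(spectral_split(A, target))
    return clusters

# Toy data: three well-separated groups of 20, 15 and 10 points.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(n, 2))
               for c, n in ((0.0, 20), (4.0, 15), (8.0, 10))])
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
A = np.exp(-d2 / 2.0)
groups = iterative_spectral(X, A, n_clusters=3, min_size=4)
```

Because every step is a binary split, the loop can be halted at any number of clusters, and `min_size` (which should be at least 2 in this sketch) keeps already small groups from being broken down further.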

On the following page, we illustrate the main application of our iterative spectral

clustering approach (Figure 25). Because we noted earlier that potentially the best number of

clusters to choose based on the adjusted Rand index and the Fowlkes-Mallows index is

approximately five or six, we halt the algorithm once we reach six subgroups. We also note that

we imposed a limiting condition in which we did not consider resulting groups of fewer than

⌈ ⌉ ⌈ ⌉ observations as candidate clusters for binary splits. This halting factor

was implemented in order to not continue breaking down clusters of already small size while

leaving large clusters intact for the final clustering solution. Notice that although the methods are

completely different, the structure of this iterative spectral clustering process is reflective of the

random forest and tree solutions.


Figure 25: Iterative spectral clustering algorithm solution

We note that this iterative approach appears to be quite flexible. Resulting clusters

generally have approximately the same average within-cluster sum of squared distances from the

mean, implying that they all perform approximately the same when identifying underlying

group structure. Furthermore, we note that resulting cluster sizes can vary quite widely. This is

another advantage of the iterative method, as it does not limit the size of the various possible

clustering solutions.

Although we are ultimately interested in coming up with potentially new groupings for

aphasic patients, it is interesting to compare the resulting clusters from the iterative spectral

clustering approach to those previously given by clinicians. We illustrate a confusion matrix with

the clinician classifications in the rows and our clustering solution based upon the spectral

clustering analysis as the columns below (Table 3). Note that the column names are somewhat

arbitrary.

                  Red  Orange  Yellow  Green  Blue  Purple
Anomic             24      25       0      0     0       0
Broca               0       3      20      0     3      11
Conduction          0      16       1      1     5       9
Global              0       0       3      0     0       0
Not Aphasic        14       1       0      0     0       0
Trans. Motor        5       0       0      0     1       0
Trans. Sensory      0       0       0      0     0       1
Wernicke            0       5       1      5     1       0

Table 3: Iterative spectral clustering groupings against aphasia types

One aspect that is important to highlight is that the anomic patients were split into two

groups. In one group (red), half of the anomic patients are paired with nearly all of the not

aphasic patients and the transcortical motor patients, whereas in the other group (orange) the

other half are paired with half of the conduction patients and some Wernicke patients. This is

interesting, as it partitions the anomic patients into those who have a mild form of aphasia and

those who have a severe form of aphasia. Likewise, the Broca patients were primarily split into


two groups. In one group (yellow), most of the Brocas are paired with the global aphasic patients

in our dataset, whereas in the other (purple) most of the remaining Brocas are paired with some

conduction patients. Again, this seems to partition the Broca patients into those who have a

comparatively more mild form of aphasia and those who have a more severe form of aphasia.
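
One way to put a single number on the agreement shown in Table 3 is the adjusted Rand index mentioned earlier. The sketch below is an illustration only: the counts are copied from Table 3, while the function name and the use of the index for this particular comparison are assumptions of the example rather than part of the original analysis.

```python
from math import comb

# Counts copied from Table 3 (rows: WAB types; columns: Red, Orange, Yellow, Green, Blue, Purple).
table3 = {
    "Anomic":         [24, 25, 0, 0, 0, 0],
    "Broca":          [0, 3, 20, 0, 3, 11],
    "Conduction":     [0, 16, 1, 1, 5, 9],
    "Global":         [0, 0, 3, 0, 0, 0],
    "Not Aphasic":    [14, 1, 0, 0, 0, 0],
    "Trans. Motor":   [5, 0, 0, 0, 1, 0],
    "Trans. Sensory": [0, 0, 0, 0, 0, 1],
    "Wernicke":       [0, 5, 1, 5, 1, 0],
}

def adjusted_rand_index(rows):
    """Adjusted Rand index of a contingency table given as a list of rows."""
    row_sums = [sum(r) for r in rows]
    col_sums = [sum(c) for c in zip(*rows)]
    n = sum(row_sums)
    pairs_cells = sum(comb(x, 2) for r in rows for x in r)   # pairs grouped together by both
    pairs_rows = sum(comb(x, 2) for x in row_sums)
    pairs_cols = sum(comb(x, 2) for x in col_sums)
    expected = pairs_rows * pairs_cols / comb(n, 2)          # agreement expected by chance
    maximum = (pairs_rows + pairs_cols) / 2
    return (pairs_cells - expected) / (maximum - expected)

ari = adjusted_rand_index(list(table3.values()))
```

A value of 1 would mean the two partitions are identical, and a value near 0 would mean they agree no more than chance would suggest.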

Tree Splits & Spectral Groupings

In order to compare how the different methods are splitting up patients, we illustrate the

variables of the first few splits of the tree against the cluster groupings yielded from the iterative

spectral clustering algorithm (Figure 26a-c). Ideally, we would like to see clear separations

among each of the conditional distributions. This would indicate that both methods are finding

similar structure among the data. Furthermore, the methods would serve as validations upon one

another for the discovery of similar patients.

The variable first split upon in the tree is the number of words correct with no errors. It is

clear that there are spectral clustering groupings whose distributions appear centered about

higher values, whereas there are other groupings whose distributions appear centered about

lower values (Figure 26a). We note that the tree makes a distinction between observations with

values above and below 21.5 words. We can see that the red and orange groups appear to contain

nearly all of the observation values that are above 21.5 words, whereas the yellow, green, blue,

and purple groups appear to contain nearly all of the observation values that are below 21.5

words. Thus, the most important split as defined by the decision tree is reflected in the final

spectral clustering groupings.

On the left-hand side of the tree, the next variable that partitions observations is the WAB

auditory verbal comprehension score. Once again, the spectral groupings do present some

separation among the variable (Figure 26b). We note that the tree makes a distinction between

observations with values above and below a score of 6.475. The red, orange, blue, and purple

groups appear to contain nearly all of the observation values that are above a score of 6.475,

whereas the yellow and green groups appear to contain nearly all of the observation values that

are below a score of 6.475. For the second level of the tree, the WAB auditory verbal

comprehension score serves as a good measure for group separation.

Lastly, on the right-hand side of the tree, the second variable that partitions observations

measures the open word list repetition for each patient. The interquartile ranges for the spectral

groupings on this part of the tree do not overlap at all (Figure 26c). Once again, this is an

indication of both the tree and spectral clustering algorithms finding similar structure among the

data. We note that the tree makes a distinction between observations with values above and

below a score of 2.25. The red group appears to contain most of the observation values above a

score of 2.25, whereas the orange group appears to contain most of the observation values below

a score of 2.25. Thus, the open word list repetition variable also performs well when attempting

to separate similar observations both in reference to the tree and also the spectral clustering

groupings.
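
The comparisons in the three preceding paragraphs all ask the same question: what share of each spectral group falls above a given tree threshold? A minimal sketch of that check is given below. The three thresholds are the ones quoted above; the variable names, the helper function, and the scores are hypothetical and do not come from our dataset.

```python
# Split thresholds quoted in the text; the dictionary keys are hypothetical names.
TREE_SPLITS = {
    "words_correct_no_errors": 21.5,
    "wab_auditory_verbal_comprehension": 6.475,
    "open_word_list_repetition": 2.25,
}

def share_above(values_by_group, threshold):
    """Fraction of each group's values that lie above a tree split threshold."""
    return {group: sum(v > threshold for v in values) / len(values)
            for group, values in values_by_group.items()}

# Hypothetical repetition scores (words correct, no errors) for two groups.
scores = {"Red": [80, 76, 90, 45], "Yellow": [0, 3, 10, 25]}
shares = share_above(scores, TREE_SPLITS["words_correct_no_errors"])
```

Shares close to 1 in some groups and close to 0 in others would indicate that the tree split and the spectral grouping separate patients in the same way.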

As noted before, the variables among the decision tree were determined to have high

importance based on the Gini coefficient values. Although there are nuanced differences between

the decision tree and iterative spectral clustering methods, the fact that the iterative spectral

clustering algorithm independently captures the high-dimensional structure that the important

decision tree variables provide suggests that both methods are comparable. Thus, we

are confident that both methods provide plausible groupings that may serve as solutions to our

primary research aim.


Figure 26: Tree splits by spectral groups: a. words correct no errors; b. WAB aud./ver. comp. score; c. rep. open word list


Conclusion

Ideally, we would like to summarize the newfound groupings as succinctly as possible. In

an effort to combine all that we have learned throughout our entire analysis, we attempt to

categorize patients into one of five groups based upon four basic verbal measurements: repetition

skill, naming skill, comprehension skill, and fluency. We summarize the groupings below in

Figure 27. Note that the general severity of aphasia increases as we pass from the leftmost to the

rightmost bubble. Summary measures indicative of the four skills used to create these groupings

are given on the following page (Table 4a-d). It is important to note that most of these scales are

fluid in that there is no steadfast cutoff value for each of the categories.

By simple inspection of the aforementioned four basic verbal measurements, a new

aphasic patient from outside of our study may be categorized into one of the five proposed

groups below. We hope that this tool will be useful in aiding the understanding of aphasic patient

behavior, and may potentially help in dictating recovery strategies for future patients.

Figure 27: Proposed groupings


Repetition: # Words Correct, No Errors

             1st Quart.    Mean   3rd Quart.
Light Blue        76.00   79.78        87.00
Green             44.00   55.42        71.00
Purple             7.25   21.00        32.25
Orange             4.75   19.90        32.00
Brown              0.00    3.08         3.00

Naming: Boston Naming Test Score

             1st Quart.    Mean   3rd Quart.
Light Blue         8.00   10.86        14.00
Green              7.00    9.00        12.00
Purple             1.00    3.19         4.50
Orange             2.00    5.05         7.00
Brown              0.00    1.76         3.00

Naming: Verb Naming Test Score

             1st Quart.    Mean   3rd Quart.
Light Blue        19.00   19.72        22.00
Green             13.25   16.54        21.00
Purple             6.50   10.88        15.75
Orange             8.00   12.62        16.00
Brown              0.00    4.28         7.00

Comprehension: WAB Auditory Verbal Comprehension Score

             1st Quart.    Mean   3rd Quart.
Light Blue         8.83    9.27         9.94
Green              7.51    8.44         9.58
Purple             6.10    7.42         8.75
Orange             6.80    7.42         8.25
Brown              4.20    5.95         7.85

Fluency: Words Per Second (Free Speech)

             1st Quart.    Mean   3rd Quart.
Green              1.03    1.58         2.03
Purple             1.31    1.61         2.10
Orange             0.78    1.14         1.43
Brown              0.51    0.77         1.06

Fluency: Words Per Second (Cinderella)

             1st Quart.    Mean   3rd Quart.
Green              0.71    1.19         1.58
Purple             0.60    1.29         1.97
Orange             0.52    0.78         1.04
Brown              0.00    0.33         0.60

Fluency: Words Per Second (Sandwich)

             1st Quart.    Mean   3rd Quart.
Green              0.78    1.24         1.51
Purple             0.70    1.42         1.81
Orange             0.44    0.92         1.12
Brown              0.00    0.16         0.27

Fluency: Words Per Second (Picture)

             1st Quart.    Mean   3rd Quart.
Green              0.76    1.32         1.81
Purple             0.92    1.52         2.06
Orange             0.69    0.95         1.23
Brown              0.35    0.56         0.66

Table 4: Summary measures for new groupings: a. repetition; b. naming; c. comprehension; d. fluency
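
As a rough illustration of how the summary measures in Table 4 might be used for a new patient, the sketch below assigns a set of scores to the proposed group with the nearest mean profile. This nearest-mean rule is an assumption made for the example, not a procedure we propose: the scales are fluid and have no steadfast cutoffs, so the output should be read as a starting point for inspection only. The means are copied from Table 4 (repetition, Boston naming, verb naming, comprehension); fluency is left out to keep the example short, and the function name is hypothetical.

```python
# Group means copied from Table 4:
# (repetition, Boston naming, verb naming, WAB auditory verbal comprehension).
GROUP_MEANS = {
    "Light Blue": (79.78, 10.86, 19.72, 9.27),
    "Green":      (55.42, 9.00, 16.54, 8.44),
    "Purple":     (21.00, 3.19, 10.88, 7.42),
    "Orange":     (19.90, 5.05, 12.62, 7.42),
    "Brown":      (3.08, 1.76, 4.28, 5.95),
}

def nearest_group(scores, means=GROUP_MEANS):
    """Return the group whose mean profile is closest to `scores`.

    Each measure is rescaled by the spread of the group means so that the
    repetition count does not dominate the smaller-valued scales.
    """
    columns = list(zip(*means.values()))
    spans = [max(c) - min(c) for c in columns]

    def distance(center):
        return sum(((s - m) / w) ** 2 for s, m, w in zip(scores, center, spans))

    return min(means, key=lambda group: distance(means[group]))

# Hypothetical new patient with mid-range scores on all four measures.
print(nearest_group((60.0, 9.0, 17.0, 8.5)))
```

Note that the Purple and Orange mean profiles are close on these four measures, so a patient near either one would need the fluency measures, or clinical judgment, to be placed with any confidence.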


Appendix A: Tree Group Membership

Note: Groups are numbered from leftmost to rightmost node appearance on the decision tree (Figure 22)

Group #1 (8 Patients) [1] kansas06a kansas08a scale07a TAP14a adler19a scale09a TAP03a

[8] scale12a

Anomic Broca Conduction Global NotAphasicByWAB
     0     4          0      3               0

TransMotor TransSensory Wernicke
         0            1        0

Group #2 (12 Patients) [1] adler10a adler11a kansas01a kansas16a kempler04a scale03a

[7] scale18a scale27a TAP13a tucson02a wright206a kansas12a


Anomic Broca Conduction Global NotAphasicByWAB
     0    11          0      0               0

TransMotor TransSensory Wernicke
         0            0        1

Group #3 (11 Patients) [1] scale01a TAP09a adler06a adler23a elman12a elman14a

[7] kansas05a scale11b thompson05a tucson13a tucson15a


Anomic Broca Conduction Global NotAphasicByWAB
     0     1          0      1               0

TransMotor TransSensory Wernicke
         0            0        9

Group #4 (13 Patients) [1] adler13a adler16a elman11a kansas02a kansas09a

[6] scale10a TAP06a TAP16a TAP17a williamson12a

[11] wright201a wright205a kansas14a


Anomic Broca Conduction Global NotAphasicByWAB
     0    12          0      0               0

TransMotor TransSensory Wernicke
         0            0        1

Group #5 (5 Patients) [1] scale02a scale18b wright207a scale04a TAP11a


Anomic Broca Conduction Global NotAphasicByWAB
     1     2          2      0               0

TransMotor TransSensory Wernicke
         0            0        0

Group #6 (9 Patients) [1] adler25a elman06a scale15b scale26a kansas13a

[6] scale13a scale15a tucson11a williamson11a


Anomic Broca Conduction Global NotAphasicByWAB
     0     4          5      0               0

TransMotor TransSensory Wernicke
         0            0        0


Group #7 (15 Patients) [1] adler02a adler05a elman02a fridriksson13a kansas10a

[6] kansas20a scale11a TAP02a TAP15a tucson12a

[11] williamson01a williamson03a williamson04a williamson06a williamson09a


Anomic Broca Conduction Global NotAphasicByWAB
     0     0         15      0               0

TransMotor TransSensory Wernicke
         0            0        0

Group #8 (8 Patients) [1] adler24a TAP18a elman01a scale06a scale06b TAP12a tucson08a

[8] wright203a


Anomic Broca Conduction Global NotAphasicByWAB
     2     0          6      0               0

TransMotor TransSensory Wernicke
         0            0        0

Group #9 (5 Patients) [1] adler15a adler21a scale02b tucson07a elman03a


Anomic Broca Conduction Global NotAphasicByWAB
     4     1          0      0               0

TransMotor TransSensory Wernicke
         0            0        0

Group #10 (6 Patients) [1] scale05b adler04a adler18a scale05a scale19a thompson09a


Anomic Broca Conduction Global NotAphasicByWAB
     1     0          0      0               0

TransMotor TransSensory Wernicke
         5            0        0

Group #11 (5 Patients) [1] kempler03a adler14a scale23a williamson14a thompson03a


Anomic Broca Conduction Global NotAphasicByWAB
     0     1          3      0               0

TransMotor TransSensory Wernicke
         0            0        1

Group #12 (8 Patients) [1] adler08a adler17a adler20a scale14a scale22a thompson11a

[7] tucson16a kansas21a


Anomic Broca Conduction Global NotAphasicByWAB
     7     0          1      0               0

TransMotor TransSensory Wernicke
         0            0        0

Group #13 (7 Patients) [1] adler12a elman07a fridriksson05a kansas15a kansas18a

[6] williamson02a kansas17a


Anomic Broca Conduction Global NotAphasicByWAB
     6     0          0      0               0

TransMotor TransSensory Wernicke
         1            0        0


Group #14 (7 Patients) [1] adler01a cmu03a elman10a kansas19a TAP01a TAP04a kansas03a


Anomic Broca Conduction Global NotAphasicByWAB
     6     0          0      0               1

TransMotor TransSensory Wernicke
         0            0        0

Group #15 (8 Patients) [1] thompson10a thompson13a wright202a wright204a adler22a kansas04a

[7] scale16a scale20a


Anomic Broca Conduction Global NotAphasicByWAB
     4     0          0      0               4

TransMotor TransSensory Wernicke
         0            0        0

Group #16 (10 Patients) [1] adler03a adler07a elman04a kansas07a scale21a

[6] tucson01a tucson04a williamson05a williamson10a williamson13a


Anomic Broca Conduction Global NotAphasicByWAB
     0     0          0      0              10

TransMotor TransSensory Wernicke
         0            0        0

Group #17 (18 Patients) [1] adler09a elman05a kansas11a kempler02a scale08a

[6] scale17a thompson01a thompson02a thompson04a thompson06a

[11] thompson07a thompson07b thompson08a tucson06a tucson10a

[16] williamson07a williamson08a williamson15a


Anomic Broca Conduction Global NotAphasicByWAB
    18     0          0      0               0

TransMotor TransSensory Wernicke
         0            0        0


Appendix B: Iterative Spectral Clustering Group Membership

Group “Red” (43 Patients) [1] adler01a adler12a adler24a cmu03a elman05a elman10a

[7] fridriksson05a kansas18a kansas19a kempler02a scale05b scale08a

[13] scale14a TAP01a TAP04a thompson07a thompson13a tucson06a

[19] tucson10a tucson16a williamson02a williamson15a wright202a wright204a

[25] adler03a adler07a adler22a elman04a kansas03a kansas04a

[31] kansas07a scale16a scale20a tucson01a tucson04a williamson05a

[37] williamson10a williamson13a adler04a kansas17a scale05a scale19a

[43] thompson09a


Anomic Broca Conduction Global NotAphasicByWAB
    24     0          0      0              14

TransMotor TransSensory Wernicke
         5            0        0

Group “Orange” (50 Patients) [1] adler08a adler09a adler15a adler17a adler20a adler21a

[7] elman07a kansas11a kansas15a scale02a scale02b scale17a

[13] scale22a TAP18a thompson01a thompson02a thompson04a thompson06a

[19] thompson07b thompson08a thompson10a thompson11a tucson07a williamson07a

[25] williamson08a adler25a elman03a kempler03a adler02a adler14a

[31] elman01a elman02a fridriksson13a kansas21a scale04a scale11a

[37] scale13a scale23a tucson08a williamson01a williamson09a williamson11a

[43] williamson14a wright203a scale21a elman12a elman14a kansas14a

[49] thompson03a thompson05a


Anomic Broca Conduction Global NotAphasicByWAB
    25     3         16      0               1

TransMotor TransSensory Wernicke
         0            0        5

Group “Yellow” (25 Patients) [1] adler10a adler11a elman06a kansas01a kansas02a kansas06a kansas08a kansas09a

[9] kansas16a scale03a scale07a scale10a scale27a TAP06a TAP14a TAP16a

[17] TAP17a tucson02a wright205a adler19a TAP11a scale09a TAP03a TAP09a

[25] adler06a


Anomic Broca Conduction Global NotAphasicByWAB
     0    20          1      3               0

TransMotor TransSensory Wernicke
         0            0        1

Group “Green” (6 Patients) [1] williamson06a kansas05a kansas12a scale11b tucson13a tucson15a


Anomic Broca Conduction Global NotAphasicByWAB
     0     0          1      0               0

TransMotor TransSensory Wernicke
         0            0        5


Group “Blue” (10 Patients) [1] elman11a scale01a scale18b adler05a kansas10a scale15a tucson11a tucson12a

[9] adler18a adler23a


Anomic Broca Conduction Global NotAphasicByWAB
     0     3          5      0               0

TransMotor TransSensory Wernicke
         1            0        1

Group “Purple” (21 Patients) [1] adler13a adler16a kempler04a scale15b scale18a scale26a

[7] TAP13a williamson12a wright201a wright206a wright207a kansas13a

[13] kansas20a scale06a scale06b TAP02a TAP12a TAP15a

[19] williamson03a williamson04a scale12a


Anomic Broca Conduction Global NotAphasicByWAB
     0    11          9      0               0

TransMotor TransSensory Wernicke
         0            1        0


Appendix C: Proposed Group Membership

Group “Light Blue” (43 Patients) [1] adler01a adler12a adler24a cmu03a elman05a elman10a

[7] fridriksson05a kansas18a kansas19a kempler02a scale05b scale08a

[13] scale14a TAP01a TAP04a thompson07a thompson13a tucson06a

[19] tucson10a tucson16a williamson02a williamson15a wright202a wright204a

[25] adler03a adler07a adler22a elman04a kansas03a kansas04a

[31] kansas07a scale16a scale20a tucson01a tucson04a williamson05a

[37] williamson10a williamson13a adler04a kansas17a scale05a scale19a

[43] thompson09a


Anomic Broca Conduction Global NotAphasicByWAB
    24     0          0      0              14

TransMotor TransSensory Wernicke
         5            0        0

Group “Green” (50 Patients) [1] adler08a adler09a adler15a adler17a adler20a adler21a

[7] elman07a kansas11a kansas15a scale02a scale02b scale17a

[13] scale22a TAP18a thompson01a thompson02a thompson04a thompson06a

[19] thompson07b thompson08a thompson10a thompson11a tucson07a williamson07a

[25] williamson08a adler25a elman03a kempler03a adler02a adler14a

[31] elman01a elman02a fridriksson13a kansas21a scale04a scale11a

[37] scale13a scale23a tucson08a williamson01a williamson09a williamson11a

[43] williamson14a wright203a scale21a elman12a elman14a kansas14a

[49] thompson03a thompson05a


Anomic Broca Conduction Global NotAphasicByWAB
    25     3         16      0               1

TransMotor TransSensory Wernicke
         0            0        5

Group “Purple” (16 Patients) [1] elman11a scale01a scale18b adler05a kansas10a scale15a

[7] tucson11a tucson12a williamson06a adler18a adler23a kansas05a

[13] kansas12a scale11b tucson13a tucson15a


Anomic Broca Conduction Global NotAphasicByWAB
     0     3          6      0               0

TransMotor TransSensory Wernicke
         1            0        6

Group “Orange” (21 Patients) [1] adler13a adler16a kempler04a scale15b scale18a scale26a

[7] TAP13a williamson12a wright201a wright206a wright207a kansas13a

[13] kansas20a scale06a scale06b TAP02a TAP12a TAP15a

[19] williamson03a williamson04a scale12a


Anomic Broca Conduction Global NotAphasicByWAB
     0    11          9      0               0

TransMotor TransSensory Wernicke
         0            1        0


Group “Brown” (25 Patients) [1] adler10a adler11a elman06a kansas01a kansas02a kansas06a kansas08a kansas09a

[9] kansas16a scale03a scale07a scale10a scale27a TAP06a TAP14a TAP16a

[17] TAP17a tucson02a wright205a adler19a TAP11a scale09a TAP03a TAP09a

[25] adler06a


Anomic Broca Conduction Global NotAphasicByWAB
     0    20          1      3               0

TransMotor TransSensory Wernicke
         0            0        1
