Differentiating Types of Aphasia: A Case Study in Modern Data Mining Techniques
By: Christopher Peter Makris & Thomas Todd
Background
One of the most complex and unique functions of the human brain is its ability to both
understand and generate speech. Every day billions of people around the world communicate
with each other through verbal and written means. Arguably, these skills are vital to an
individual’s wellbeing. Unfortunately, each year approximately five to six hundred thousand
people worldwide develop a condition known as aphasia that threatens to destroy the victim’s
communication skills. Aphasia usually results from stroke or head trauma and impairs an
individual’s language ability. The ailment can manifest in many ways, and often greatly depends
upon the part of the brain that has been affected. Severity can range from mild speech
impairment to a complete inability to communicate one’s thoughts.
Currently, there are eight different types of aphasia that are generally used among
clinicians for patient classification. To test for the various types of aphasia, several methods may
be employed. The most widely used is the Western Aphasia Battery (WAB) test. Additionally,
specific examinations concerning repetition, verb naming, or even a brain MRI scan can also
aid in classifying patients. Among experts in the field, there is considerable debate over both the
classification methodology and the groups themselves. Many speech pathologists believe that the
current aphasia types are outdated, should be modified, and that other groupings may be more
informative.
Aims & Expectations
Our primary goal is to help in better understanding how to classify aphasic patients based
on speech patterns and ability in verbal skills such as repetition, naming, comprehension, and
fluency. We would also like to determine if there are other groupings that may not specifically
align with the existing eight aphasia categories. If there is an underlying structure within our
data, highlighting it may be more helpful in classifying patients than the current clinician labels.
Ideally, we would like to either better define the current groups or create new classification
groups. Our findings could help decide specialized treatment plans for past and present patients
in order to ensure an efficient recovery.
Data & Missingness
Information regarding patients’ demographics, WAB test scores, subtest scores, and error
measurements has been recorded for 161 patients from around the United States. The data is
currently stored online at www.talkbank.org. All patients within our dataset suffered from a
stroke and exhibit at least some type of aphasic behavior.
Our study will include a detailed analysis of data primarily regarding five distinct yet
intrinsically related tests. The first concerns the WAB test itself, whereas the four others concern
various error measurements taken throughout the administration of four subtests (free speech,
Cinderella, sandwich, and picture). For the free speech subtest, patients were asked to recount the
events leading up to and through their stroke. In the Cinderella subtest, patients were asked to tell
the familiar tale as best they could. For the sandwich subtest, patients were asked to describe the
process involved in constructing a peanut butter and jelly sandwich. Lastly, for the picture
subtest, patients were shown a series of pictures that outlined a story and were essentially asked
to narrate. For ease of future reference, patients were videotaped during the subtest
administration. After testing, the records were analyzed and various measurements were noted.
Of the original 161 patients available, 6 were removed prior to our analyses because of
test administration and protocol violations. They will not be included in the remainder of this
paper. For each of the valid 155 patients, 134 different variable measurements were taken for a
grand total of 20,770 individual cell entries. Of these entries, only 114 (approximately 0.549%)
were missing. All missing values were from the WAB test itself; however, there is no discernible
pattern among the blank values within the test variables. Thus, we assume that they are missing
completely at random. In order to avoid deleting entire patient records because of this sparse
missingness, we decide to impute averages based upon WAB aphasia type group membership.
Although this induces some group structure into our data, the assumption violation can be
relaxed because of the extreme sparseness of the missing data.
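The group-based imputation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `impute_group_means`, the use of `None` to mark a missing entry, and the example scores are all assumptions made for the sketch.

```python
from collections import defaultdict


def impute_group_means(values, groups):
    """Replace missing entries (None) with the mean of the observed
    values that share the same group label (here, the WAB aphasia type)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for value, group in zip(values, groups):
        if value is not None:
            totals[group] += value
            counts[group] += 1

    imputed = []
    for value, group in zip(values, groups):
        if value is None and counts[group] > 0:
            imputed.append(totals[group] / counts[group])
        else:
            # Observed values are kept; a group with no observed
            # values cannot be imputed and stays missing.
            imputed.append(value)
    return imputed


# Hypothetical scores for one WAB variable (not taken from the dataset).
scores = [8.0, None, 6.0, 2.0, None]
types = ["anomic", "anomic", "anomic", "Broca", "Broca"]
filled = impute_group_means(scores, types)
```

Because each missing value is filled from its own aphasia-type group, the imputed entries pull slightly toward the existing WAB labels, which is the induced group structure the text acknowledges.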
Patient Demographics
We begin by summarizing some demographic information about the patients involved
within the study. Of the 155 patients, 56 (36.129%) are female and 99 (63.871%) are male. We
note that the number of males in our study is nearly double the number of females. A histogram
of patient age is depicted below in Figure 1. The distribution is approximately bell-shaped and
symmetric; the mean and median ages are nearly the same at approximately 63.054 and 64 years,
respectively. The standard deviation is approximately 11.871 years. The youngest subject is 34
years old, whereas the oldest is 91. There do not appear to be any overt outliers.
Figure 1: Histogram of patient age
In our dataset, we have at least one representative patient from each of eight different
classifications of aphasia: anomic, Broca, conduction, global, not aphasic (as determined by the
Western Aphasia Battery test), transcortical motor, transcortical sensory, and Wernicke. The pie
chart in Figure 2 shows the proportion of each type of aphasia that we have in our dataset. The
majority of the patients in the dataset are classified with anomic aphasia (49, approximately
31.613%). The smallest group in the data set is the transcortical sensory group which has only
one patient in the dataset, followed by the global and transcortical motor groups with three and
six patients, respectively. Fifteen individuals within our dataset were classified as not aphasic
because their WAB test score was considerably high; however, these subjects still exhibit some
aphasic behavior and thus will be included within our analysis.
Figure 2: Pie chart of WAB aphasia types
Univariate Exploratory Data Analysis
Since there are over one hundred variables within our dataset, it is impractical to
graphically summarize all variable distributions. Thus, we choose to depict those variables that
displayed the most interesting behavior or those that may be pertinent for our future analyses.
The Western Aphasia Battery test returns a score called the Aphasia Quotient: a
numerical measure for each patient that is believed to help classify them into the eight
aforementioned groups of aphasia. A histogram of these values from our dataset is depicted
below in Figure 3.
Figure 3: Histogram of WAB aphasia quotient
The distribution appears to be skewed left, towards lower values. The mean score is
approximately 70.028, while the median score is 72.6. The standard deviation is approximately
20.813. The lowest score an individual received is 16.1, whereas the highest is 99. Although the
distribution is skewed, there do not appear to be any overt outliers.
It is interesting to note that the WAB score splits aphasic patients into somewhat distinct
groups, which is to be expected considering its purpose (Figure 4). Although there is overlap in
some of the tails of the conditional distributions, the interquartile ranges generally appear to be
distinct. We note that the conditional distributions of the WAB score for the Broca and Wernicke
groups overlap quite a bit. Likewise, the conduction and transcortical motor conditional
distributions overlap heavily. On the other hand, the global, anomic, and mildly aphasic patients
do appear to be well-separated from the majority of the other types of patients.
Figure 4: Conditional boxplots of WAB aphasia quotient by aphasia type
The Boston Naming Test (BNT) is also another diagnostic tool often used in the
classification of aphasic patients. In contrast to the WAB scores, the Boston Naming Test scores
appear to be almost uniform (Figure 5). The mean and median scores are nearly the same at
approximately 7.213 and 7, respectively. The standard deviation is approximately 4.843 points.
Again, no outliers are clearly identifiable; however, it is interesting to note that most patients
score comparatively low on this test.
The Boston Naming Test scores do not seem to separate the WAB types quite as well
(Figure 6). Each individual conditional distribution appears to have a greater spread. Thus, the
conditional distributions tend to overlap much more frequently. There are still a couple groups,
however, that are quite well-separated from the majority of the other data. Once again, the
anomic and mildly aphasic patients seem to have interquartile ranges that do not overlap any of
the others. We also note that, although their overall distribution overlaps with the Broca and
Wernicke distributions, the global patients are again clustering towards very low scores.
The amount a patient speaks or the speed at which they speak may be indicative of how
impaired they are. Thus, it might be interesting to investigate the distribution of words spoken
per second. For graphical purposes, we choose to investigate this variable for the free speech
subtest (Figure 7).
Figure 5: Histogram of Boston Naming Test scores
Figure 6: Conditional boxplots of BNT scores by aphasia type
The distribution appears to be skewed right, towards higher values. The mean and median values
are approximately the same at 1.408 and 1.324 words per second, respectively. The standard
deviation is approximately 0.759 words per second. The minimum value is about 0.162 words
per second, whereas the maximum value is about 4.181. This maximum value might be
considered a mild outlier, as it is somewhat separated from the rest of the values. Also, this value
might seem a bit high for an aphasic patient, as they are speaking at a relatively fast rate.
Figure 7: Histogram of words spoken per second, free speech subtest
This variable seems to contain interesting separation information among the WAB
aphasia types (Figure 8). Although most of the interquartile ranges appear to overlap, there seem
to be two distinct groups. The anomic, conduction, mildly aphasic, and Wernicke patients appear
to score relatively higher than the Broca, global, transcortical motor and transcortical sensory
patients. This might serve as a split between relatively fluent and non-fluent patients.
Figure 8: Conditional boxplots of words spoken per second on the free speech subtest by aphasia type
The final variable we choose to discuss for our univariate exploratory data analysis
concerns the number of words a patient gets correct when they make no errors. The distribution
is depicted in Figure 9 below. In general, the distribution appears to be skewed towards lower
values, but it may also be bimodal. The mean is approximately 17.215 words whereas the median
is 20 words. The standard deviation is approximately 11.72 words. The minimum value is 0
words correct, and the maximum value is 31. Although no outliers are present, we note that there
is a low mode below 5 words correct and a high mode above 30 words correct.
Figure 9: Histogram of number of words correct with no errors
Once again, there is some separation when inspecting the conditional distribution based
on the WAB classification type (Figure 10). The anomic, mildly aphasic, transcortical motor and
transcortical sensory patients all have relatively high values, whereas the Broca, conduction,
global, and Wernicke patients all have relatively low values.
Figure 10: Conditional boxplots of words correct with no errors by aphasia type
Bivariate Exploratory Data Analysis
Although the main focus of our project is to come up with new classification groupings
that do not incorporate the overall measurements of the Western Aphasia Battery test, we have
not restricted our use of information concerning the final score of each patient’s performance on
the Boston Naming Test. Since theoretically these tests are somewhat related, we would like to
determine how they interact with one another. Within the scatterplots in Figure 11 below, we
plot the two variables against each other and color the observations by their aphasia type as
determined by the WAB test. Even though the different groups appear to be somewhat widely
spread out across scores, the WAB and BNT tend to agree when in reference to overall high and
low scores. This is reflected in the high positive correlation of approximately 0.803 between the
two variables.
Figure 11: Scatterplots of BNT score against WAB AQ score, colored by aphasia type
We also examined the relationship between the spontaneous speech score and the naming
score for the aphasia quotient. The two variables have a high positive correlation of
approximately 0.776. In the scatterplot below, we can see that the anomic and mildly aphasic
groups tend to score high in both categories. If we restrict our attention to only the Broca
patients, there appears to be an evident linear trend between the spontaneous speech and naming
scores. If a Broca patient scores low on spontaneous speech, they also tend to score low on naming.
Furthermore, although the Wernicke group seems to be quite scattered, patients within
this category tend to score in the higher ranges of the spontaneous speech measurement.
Overall, there are no clear classifications that can be extrapolated from these plots;
however, information from the pairs of variables may help in some similar patient separation
such as in segmenting the anomic and mildly aphasic individuals from the rest of the subjects.
Figure 12: Scatterplots of naming score against spontaneous score, colored by aphasia type
Subtest Comparisons
The four subtests in our data are theoretically intrinsically related, but may serve to
measure different patient characteristics. Although the same types of error measurements are
recorded for each patient among each subtest, we would like to determine if there are any
distinguishing factors between each of the subtest exams.
We begin by comparing the total number of utterances across each of the subtests (Figure
13a). We can see each individual distribution is skewed towards higher values, indicating that
most people did not speak for an extended period of time for each subtest. The mode of each
distribution is located between approximately 10 and 50 utterances. We can see that most individuals
completed the sandwich subtest with relatively few utterances, whereas most individuals spoke
about their stroke for comparatively much longer.
Although the distributions look somewhat different when we examine only the total
number of utterances, when we normalize by time nearly all of the distributions appear to be
similar (Figure 13b). This indicates that, although patients may have spoken for variable periods
of time among the four subtests, the average rate at which they spoke was approximately
between 0.1 and 0.3 utterances per second regardless of subtest.
Figure 13: Kernel density estimates of: a. total number of utterances by subtest; b. total utterances per second by subtest
# Utterances   Free Speech   Sandwich   Cinderella   Picture
Free Speech    1.000         0.408      0.312        0.246
Sandwich       0.408         1.000      0.470        0.366
Cinderella     0.312         0.470      1.000        0.603
Picture        0.246         0.366      0.603        1.000

# Utt./Sec.    Free Speech   Sandwich   Cinderella   Picture
Free Speech    1.000         0.230      0.351        0.634
Sandwich       0.230         1.000      0.350        0.277
Cinderella     0.351         0.350      1.000        0.471
Picture        0.634         0.277      0.471        1.000

Table 1: Correlations among subtests: a. total number of utterances by subtest; b. total utterances per second by subtest
Correlations among the four tests’ total number of utterances and total number of
utterances per second are shown in Tables 1a and 1b. When
considering either the total number of utterances or the total number of utterances per second,
most of the pairwise inter-subtest correlations are relatively weak (ranging from about 0.230 to
0.471). We note that, in terms of total number of utterances, the Cinderella and picture subtests
are moderately correlated (approximately 0.603) and also, in terms of total number of utterances per
second, the free speech and picture subtests are moderately correlated (approximately 0.634).
Similarly, we also inspected the total number of words and the total number of words per
second across each of the subtests (Figures 14a and 14b). The distributions for the total number of words
appear to be nearly identical to the distributions for the total number of utterances. This makes
intuitive sense since the two variables are measuring interrelated outcomes. Also, once we
normalize by time the subtest distributions are nearly indistinguishable.
Correlations among the four tests’ total number of words and total number of words per
second are much stronger in comparison to the utterance measurements (Tables 2a and 2b). When
considering either the total number of words or the total number of words per second, most of the
pairwise inter-subtest correlations are relatively strong (ranging from about 0.463 to 0.908).
Figure 14: Kernel density estimates of: a. total number of words by subtest; b. total words per second by subtest
# Words        Free Speech   Sandwich   Cinderella   Picture
Free Speech    1.000         0.549      0.528        0.463
Sandwich       0.549         1.000      0.627        0.616
Cinderella     0.528         0.627      1.000        0.759
Picture        0.463         0.616      0.759        1.000

# Words/Sec.   Free Speech   Sandwich   Cinderella   Picture
Free Speech    1.000         0.757      0.809        0.908
Sandwich       0.757         1.000      0.686        0.797
Cinderella     0.809         0.686      1.000        0.825
Picture        0.908         0.797      0.825        1.000

Table 2: Correlations among subtests: a. total number of words by subtest; b. total words per second by subtest
Principal Component Analysis
As noted before, there are a total of 134 variables within our dataset. Rather than using all
measurements upon all patients, we might be able to summarize a majority of the information
contained within the overall dataset by using just a few key variable combinations. To
investigate what these combinations may be, we conduct a principal component analysis.
Within the scree plot below (Figure 15), we can see that the first two principal
components explain the largest shares of the variability in the dataset (approximately 24.934% and 12.984%,
respectively). These percentages drop off quite quickly thereafter; the third principal component
explains only about 6.595% of the variability, whereas the fourth and fifth explain 4.907% and
4.487%, respectively. The rest of the principal components each explain less than 4% of the
variability in the dataset. This is an indication that the first four or five principal components
may be sufficient to describe the primary nuances inherent within our data.
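To put the scree plot percentages in context, the short sketch below accumulates the shares reported above for the first five principal components. The percentages are taken from the text; the variable names are only illustrative.

```python
from itertools import accumulate

# Percent of variability explained by the first five principal
# components, as reported in the text.
explained = [24.934, 12.984, 6.595, 4.907, 4.487]

# Running total of explained variability, rounded to three decimals.
cumulative = [round(total, 3) for total in accumulate(explained)]

for index, (share, total) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{index}: {share:.3f}% (cumulative {total:.3f}%)")
```

Under these figures the first two components account for about 37.9% of the variability and the first five for about 53.9%, so a substantial share of the variation remains in the later components.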
Figure 15: Scree plot, percent variability explained
We plot the first five principal components against each other and color by the WAB
aphasia types on the following page (Figure 16). Note that the WAB coloring scheme is the same
as before. There does not appear to be much clear separation among all of the groups; however,
the red (anomic) and light blue (mildly aphasic) observations appear to most consistently be
somewhat clustered together. Furthermore, although similar observations often overlap with
observations from other groups, observations within the same group often appear to be clustered
near one another. This is an indication that the first few principal components may contain
separation information that could potentially validate the current aphasia type labeling standard.
It is also important to understand what information each principal component tends to
summarize. In order to do so, we graph the principal component loadings against the column
variables to which they are assigned (Figures 17a-e). We note that the horizontal direction in this
series of graphs is somewhat arbitrary; however, we order the column variables by the subtest to
which they are associated. Therefore, each plot is somewhat separated into five pieces. From left
to right, the sections correspond to the WAB, free speech, Cinderella, sandwich, and picture
subtests.
Figure 16: Pairs plot of first five principal components, colored by aphasia type
The first principal component loadings (Figure 17a) have two primary interesting
features: the positive values, and the most extreme negative values. Nearly all of the positive
loading values correspond to variables concerning the total number of unintelligible words
among all of the subtests. In contrast, all of the extreme negative values have to do with errors
made within each of the subtests. Specific variables highlighted by the extreme negative loadings
are the total number of repetitions, the total number of retracings/revisions, and the total number
of fragment filler words used. Therefore, this first principal component seems to help delineate
between the amount that a patient speaks and the number of errors that they make.
Separation of variables is much more distinct in the second principal component loadings
(Figure 17b). It is clear that most of the first twenty variables have highly negative loadings,
whereas the remaining variables have marginally positive and negative loadings with no clear
structure. The variables that have comparatively highly negative loadings correspond to the
measurements taken on the WAB test. Therefore, the second principal component seems to
delineate between the WAB test and all other subtests.
Rather than depicting clear separation among variables, the third principal component
loadings seem to pair together whole subtests (Figure 17c). Most of the positive loadings appear
to correspond to the variables within the free speech and sandwich subtests, whereas most of the
negative loadings correspond to the variables within the WAB, Cinderella, and picture subtests.
Most notably, the negative loadings with the highest magnitude belong to the WAB subtest
measurements indicating whether or not various questions were administered again. Overall, this
principal component seems to imply that an individual patient’s performance on the free speech
and sandwich subtests might measure one aspect of aphasic impairment, whereas a patient’s
performance on the WAB, Cinderella, and picture subtests might measure another.
The interpretation for the fourth principal component loadings (Figure 17d) once again
returns to specific variables. Most of the comparatively larger positive loadings correspond to
variables measuring dysfluency within-word errors and the total number of filler words an
individual patient uses across all of the subtests. In contrast, most of the comparatively larger
negative loadings correspond to variables measuring the total number of semantic errors an
individual patient makes and the total number of utterances/words an individual patient says
across all of the subtests. Thus, this principal component seems to make a distinction among the
different types of errors, along with the amount an individual patient speaks.
Lastly, we interpret the fifth principal component loadings (Figure 17e). This principal
component also concerns combinations of specific types of variables across all of the subtests.
All of the largest positive loadings primarily correspond to variables measuring the total number
of utterances, words, or fillers an individual patient speaks across all of the subtests. On the other
hand, all of the largest negative loadings correspond to variables measuring the total number of
neologistic and phonological errors. Thus, the fifth principal component seems to make a
distinction between the amount an individual speaks and the number of errors they make. Note
that it appears that the first and fifth principal component interpretations seem to be the same on
the surface; however, the error measures that are incorporated in each of the principal
components are different: the first principal component includes error measures of correct words
that may be repeated, whereas the fifth principal component includes error measures of incorrect
words only.
It is interesting to note that each of the first five principal components has a unique,
plausible interpretation. These may help dictate what the important variables are for the
remainder of our analysis.
Figure 17a-e: First five principal component loadings
Choice of K Clusters & K-Means Simulations
One of the most critical parts of clustering problems is determining the appropriate
number of clusters, K, that should be identified within the data. There are several different types
of clustering methods available for use, each having different advantages and disadvantages. For
the purposes of our analysis, we focus on the K-means clustering algorithm.
K-means is an iterative process in which a number of clusters is defined and K random
centers are selected. The observations are then assigned to the closest center. Afterwards, the
center of each cluster is recalculated, and the process continues until there is little to no change in
the cluster assignments. Unfortunately, this process is nondeterministic, meaning that it can yield
various solutions depending on the choice of the initial random centers. To ensure that we have a
stable solution, the K-means algorithm must be repeated several times for a given value of K.
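The iterative procedure and the need for repeated runs can be sketched in plain Python as below. This is an illustrative sketch only: the function names, the toy points, and the rule of keeping the run with the smallest within-cluster sum of squares are assumptions, since the text does not state how the "most stable" solution was chosen.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One K-means run from K randomly chosen initial centers."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each observation joins its closest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each center as the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an emptied cluster keeps its previous center
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    wss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, wss


def kmeans_restarts(points, k, n_starts=20, seed=0):
    """Repeat K-means from different random starts and keep the run
    with the smallest within-cluster sum of squares (an assumed rule)."""
    rng = random.Random(seed)
    runs = (kmeans_once(points, k, rng) for _ in range(n_starts))
    return min(runs, key=lambda run: run[1])


# Toy data: two well-separated groups of three points each.
toy = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
best_labels, best_wss = kmeans_restarts(toy, k=2)
```

Fixing the seed makes the restarts reproducible, while the number of restarts controls how thoroughly the space of initial centers is explored.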
Furthermore, given a value of K, we must determine how well our classification method
is performing. There are two measures we choose to investigate as indicators of our algorithm’s
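performance: the Adjusted Rand Index (ARI) and the Fowlkes-Mallows Index (FMI). Both compare two
groupings of the same observations, such as a clustering solution and a set of reference labels. The ARI
measures how often pairs of observations are treated alike by the two groupings, corrected for chance agreement. On the other hand,
the FMI is the geometric mean of the pairwise precision and recall of one grouping relative to the other. In general, we would like to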
maximize both the ARI and FMI measures, as this would indicate a good clustering solution.
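Both indices can be computed by counting pairs of observations under two groupings. The sketch below uses the standard pair-counting forms of the ARI and FMI. The function names and the toy label vectors are illustrative, and the text does not state which reference grouping the authors compared against.

```python
from itertools import combinations
from math import sqrt


def pair_counts(labels_a, labels_b):
    """Count pairs of observations by whether each grouping puts them together."""
    both = only_a = only_b = neither = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            both += 1
        elif same_a:
            only_a += 1
        elif same_b:
            only_b += 1
        else:
            neither += 1
    return both, only_a, only_b, neither


def adjusted_rand_index(labels_a, labels_b):
    """ARI in pair-counting form; 1 means identical groupings, about 0 means chance."""
    both, only_a, only_b, neither = pair_counts(labels_a, labels_b)
    denominator = (both + only_a) * (only_a + neither) + (both + only_b) * (only_b + neither)
    if denominator == 0:
        return 1.0  # both groupings are trivial and identical
    return 2.0 * (both * neither - only_a * only_b) / denominator


def fowlkes_mallows_index(labels_a, labels_b):
    """FMI: geometric mean of pairwise precision and recall."""
    both, only_a, only_b, _ = pair_counts(labels_a, labels_b)
    denominator = (both + only_a) * (both + only_b)
    if denominator == 0:
        return 0.0
    return both / sqrt(denominator)


# Toy groupings: identical up to relabeling, then maximally crossed.
same_ari = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
same_fmi = fowlkes_mallows_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed_ari = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
crossed_fmi = fowlkes_mallows_index([0, 0, 1, 1], [0, 1, 0, 1])
```

Neither index depends on how the clusters are numbered, which matters here because K-means assigns arbitrary cluster labels on each run.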
To objectively determine the best choice of K for our data, we run 1,000 simulations of
the K-means algorithm for each choice of K from two through ten. For each value of K, the most
stable solution is selected, and the ARI and FMI are calculated. Figure 18 below summarizes the
information gained by this procedure. Note that it is clear that, from this perspective, the best
choice for K is five clusters. Thus, we will use K = 5 as an approximate benchmark for the
remainder of our analyses.
Figure 18: FMI against ARI for stable K-means solutions
Dataset Restriction
It is clear that the WAB test itself plays a heavy role in delineating among the eight types
of aphasia as it was designed to do so; however, for our research, we wish to either validate these
classification groups by arriving at them in a different manner, or find other plausible groupings
as dictated by the data. Thus, for the remainder of our analyses we will not include variables
measured from the WAB test itself. We do so in order not to confound our results with those
that would be expected from the WAB test in the first place. The only variable exception is the
auditory verbal comprehension score component of the aphasia quotient. This measure was left
in as a recommendation by expert clinician opinion as an accurate way to measure a patient’s
comprehension ability. From a statistical standpoint, since it is the only remnant variable of the
WAB test, leaving it within our analyses should not cause any theoretical problems.
Random Forest & Tree Clustering
In the realm of classification problems, decision trees are arguably among the most
flexible and powerful tools. They are often simple to understand, and also relatively easy to
create. One main drawback is that decision trees are also often sensitive to small changes within
the data. To remedy this, we can implement a bagging and bootstrapping technique that will help
distinguish between data structure and noise. Ultimately, we can arrive at a stable solution.
For our project, we choose to use the random forest method. Rather than creating a
single, potentially unstable decision tree, we will create 5,000 decorrelated decision trees. Each
of the trees essentially represents a vote for the classification label for each patient observation.
By taking the most frequently appearing classification label for each observation, we eliminate
much of the noise and arrive upon a more stable and precise result. Note that this methodology is
very similar to the simulations conducted in the previous section.
Figure 19: Variable importance for random forest method
Each of the trees randomly samples a subset of the variables that it will use to create the
best splits among the observations. A measure of how well the chosen variables separate the
observations is calculated, called a Gini index. In general, a higher mean decrease in the Gini
index indicates that the associated variable performs observation separation well. The variable
importance plot on the previous page (Figure 19) depicts a ranking of the variables based on the
mean decrease in the Gini index. Most notably, the variable that appears to do the best job in
separating the groups of aphasic patients is the auditory verbal comprehension score component
of the aphasia quotient. This is somewhat expected simply because of the nature of the variable.
The next few important variables concern various measures of correct words.
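The decrease in the Gini index produced by a single split can be sketched as follows. The sketch assumes that the Gini index refers to the usual Gini impurity of a node. The function names and the toy labels are illustrative and are not drawn from the dataset.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left) + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy node with two aphasia types, split perfectly by a hypothetical variable.
parent = ["anomic"] * 4 + ["Broca"] * 4
perfect = gini_decrease(parent, parent[:4], parent[4:])

# The same node split so that each child is still an even mixture.
useless = gini_decrease(parent, parent[2:6], parent[:2] + parent[6:])
```

A variable's importance in the forest is then the average of such decreases over every split in every tree where that variable is used.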
In order to gauge the high-dimensional structure of the random forest results, we use
multidimensional scaling to project the observations down to two dimensions. The scatterplot in
Figure 20 depicts this projection colored by the aphasia type as determined by the WAB. We can
see that the random forest proximities appear to perform well when trying to separate groups of
aphasia types; similarly colored observations tend to be clustered nearby each other.
Figure 20: Scatterplot of first two multidimensional scaling coordinates, by aphasia type
To further investigate the high-dimensional structure of the random forest results, we
graph the first three multidimensional scaling coordinates and color them first by the aphasia
types as determined by the WAB and then by the five groups the random forest method
found (Figures 21a and 21b). Note that the WAB coloring scheme is the same as before. Separation is
clear among the three dimensions. In either of the plots, one can argue that as you follow patients
from one color to another along the truncated triangular pathway, you pass through patients that
are on a scale of severity. The original eight WAB aphasia type classifications mostly align with
the five random forest aphasia type classifications, except that the relative simplicity of the
random forest solution somewhat blurs the distinction between the old groupings.
Figure 21: 3D multidimensional scaling plots colored by: a. aphasia type; b. random forest type
A major drawback from applying the random forest algorithm is that there is no way to
visualize the average decision tree. Because it is pertinent to our study to understand how the
variables are being split, we manually fit a single decision tree to our entire dataset (Figure 22).
We note that we should use the information shown in this tree with caution as we are severely
overfitting to the characteristics among our dataset. On the other hand, almost all of the variables
that were implemented in the overall tree appeared in the variable importance plot from the
random forest method.
We note that the resulting nodes yield some information pertaining to potential clustering
solutions for new observations. For example, it is clear that the anomic patients are most similar
to the mildly aphasic patients. Also, there may be two groups of the Broca patients: one that is
more similar to the Wernicke patients, and one that is more similar to the conduction patients.
Iterative Spectral Clustering
As we have seen from the principal components analysis and random forest algorithm, it
is clear that there is high-dimensional structure among our data in terms of the variables. This
motivates a spectral clustering analysis, in which we are mainly concerned with all pairs of
observations instead of the variables themselves.
The main goal is to cluster observations into groups. Ideally, observations in the same
group are more similar to each other than observations in different clusters. Many different
clustering methods are available for this purpose; however, most other clustering algorithms rely
on assumptions that may not always be applicable or practical. For example, the success of the
model-based clustering algorithm depends on the assumption that the true group structure is
elliptical in shape. The spectral clustering method relaxes the shape assumption and therefore can
be applied more broadly.
In order to use the spectral clustering methodology, we must first define a measure of
similarity between pairs of observations. We will call this measure an observation’s affinity to
another observation. Intuitively, observation pairs with larger affinity values are comparatively
more similar, and thus should theoretically be clustered together. On the other hand, observation
pairs with smaller affinity values are comparatively more dissimilar, and thus should
theoretically be clustered into different groups.
Figure 22: Overall decision tree for aphasia type classification
21
The mathematical definition of affinity varies among scenarios. In general, one must
decide upon specific choices for tuning parameters that could change the results of the analysis
quite drastically. For our problem, we must decide upon two main tuning parameters: a distance
metric, and neighborhood definition . The former is a rule for calculating the distance between
observations in high-dimensional space, whereas the latter plays a part in defining both how
close one observation is to another and how easily two observations may be considered
members of the same cluster.
For our analysis, we first decided to scale our overall dataset to ensure that all variable
measurements are taken into account. We then used the squared Euclidean distances between
observations and a neighborhood parameter . These selections were made not only because
they highlight group structure, but also because they are quite stable. Across multiple
runs of the spectral clustering algorithm on the affinities defined in this manner, the
solutions are highly consistent, which suggests that they are robust.
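As an illustration of an affinity of this kind, the sketch below scales each variable, computes squared Euclidean distances, and maps them to values between 0 and 1 with a Gaussian kernel, exp(-d^2 / sigma). The kernel form, the name `sigma` for the neighborhood parameter, the use of the population standard deviation, and the toy data are all assumptions made for the example; the exact definition used in the analysis is not reproduced here.

```python
import math


def standardize(rows):
    """Scale each column to mean 0 and standard deviation 1 (population SD, an assumption)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    # A constant column has SD 0; fall back to 1.0 to avoid dividing by zero.
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n) or 1.0
           for j in range(p)]
    return [[(r[j] - means[j]) / sds[j] for j in range(p)] for r in rows]


def affinity_matrix(rows, sigma):
    """Pairwise affinities exp(-squared Euclidean distance / sigma) on scaled data."""
    scaled = standardize(rows)
    n = len(scaled)
    affinities = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(scaled[i], scaled[j]))
            affinities[i][j] = math.exp(-d2 / sigma)
    return affinities


# Hypothetical toy data: two measurements for three patients.
patients = [[80, 9.5], [78, 9.2], [3, 5.0]]
A = affinity_matrix(patients, sigma=2.0)

# Pairs whose affinity is at least 0.75, i.e. the strongly similar pairs.
strong_pairs = [(i, j) for i in range(len(A)) for j in range(i + 1, len(A))
                if A[i][j] >= 0.75]
print(strong_pairs)
```

Under this form an observation has affinity 1 with itself, and the affinity decays toward 0 as two observations move apart. A larger `sigma` widens the neighborhood and raises all affinities.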
Below is a graphical representation of each individual patient’s affinity profile compared
to all other patients within the dataset (Figure 23). Note that affinity is bounded between 0
(extremely dissimilar) and 1 (identical). Under this representation of the affinities, most
observation distributions have an average affinity value between .4 and .5. This indicates that the
chosen affinity definition is successful in partitioning observations into those that are similar and
those that are dissimilar from the reference frame of each individual observation.
Figure 23: Boxplots of individual patient affinities between all other patients
We also inspected a heat map of the affinity matrix (Figure 24a) in order to see if any
group structure is present. In this graph, each pixel represents an observation pair’s affinity. The
coloring is on a scale: red values are lower (closer to 0), whereas yellow values are higher (closer
to 1). Because of the way we ordered the observations in our dataset, any apparent block-
structuring in the heat maps would be reflective of similar groupings among the observations in
terms of at least the old clinician classification labels. If we focus our attention on only those
observations that have a relatively high affinity of greater than or equal to .75 (Figure 24b), it is
clear that there is some block-structure. This is extremely beneficial, and exactly what we would
want to see before performing the spectral clustering analysis as it validates our notion of using
affinities to partition observations into similar groups.
Figure 24: Heat maps of pairwise affinities: a. all affinities; b. affinity values
For the purposes of this analysis, we do not want to make any assumptions concerning
the number of clusters we should ultimately be looking for. Thus, rather than using the classical
spectral clustering algorithm, we defined a new iterative approach that allows halting at any
number of clusters. This consideration of binary splits was also implemented in order to compare
against the random forest splitting process.
During this iterative approach, we begin with all of the observations in the same cluster.
We then analyze the main structure present in the principal component decomposition of the
affinity matrix in order to understand which observations are more similar. Then, based on the
observed structure, we partition the observations into the two most common subgroups that
appear over thousands of simulations of the K-means algorithm. Of the resulting subgroups, the
next candidate of a binary split is the subgroup that has the largest average within-cluster sum of
squared distances from the cluster mean. This is a measure of the average dissimilarity among a
single cluster. Intuitively, the cluster that has the largest within-cluster sum of squared distances
from the cluster mean is likely the most dissimilar and thus should be split into smaller groups.
To ensure that resulting groups are not too small, we can also define a size of groups that we
would consider appropriate to halt candidacy upon. This would guarantee that we could avoid
breaking one single group down to just a couple of observations while also avoiding leaving
other considerably larger groups undivided.
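The control flow of this iterative approach can be sketched as follows. This is a simplified stand-in rather than the implementation used in the analysis: it splits the raw coordinates with a basic 2-means routine instead of the principal component decomposition of the affinity matrix, and it uses 200 restarts rather than thousands. All function names and default values are hypothetical.

```python
import random
from collections import Counter


def centroid(points):
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def avg_within_ss(points):
    """Average squared distance of a cluster's points from the cluster mean."""
    c = centroid(points)
    return sum(sq_dist(p, c) for p in points) / len(points)


def two_means(points, rng, n_iter=50):
    """One run of Lloyd's algorithm with k = 2; returns two lists of positions."""
    centers = rng.sample(points, 2)
    sides = None
    for _ in range(n_iter):
        new_sides = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                     for p in points]
        if new_sides == sides:
            break  # assignments stopped changing
        sides = new_sides
        for k in (0, 1):
            members = [p for p, s in zip(points, sides) if s == k]
            if members:
                centers[k] = centroid(members)
    left = [i for i, s in enumerate(sides) if s == 0]
    right = [i for i, s in enumerate(sides) if s == 1]
    return left, right


def most_common_split(points, rng, n_runs=200):
    """Repeat 2-means and keep the partition that appears most often."""
    votes = Counter()
    for _ in range(n_runs):
        left, right = two_means(points, rng)
        if 0 not in left:  # canonical order: the side holding position 0 comes first
            left, right = right, left
        votes[(tuple(left), tuple(right))] += 1
    (left, right), _count = votes.most_common(1)[0]
    return list(left), list(right)


def iterative_bisect(points, n_clusters, min_size=2, seed=0):
    """Split the most dispersed eligible cluster until n_clusters is reached."""
    rng = random.Random(seed)
    clusters = [list(range(len(points)))]
    while len(clusters) < n_clusters:
        # Clusters below min_size are no longer candidates for a binary split.
        candidates = [c for c in clusters if len(c) >= max(min_size, 2)]
        if not candidates:
            break
        target = max(candidates,
                     key=lambda c: avg_within_ss([points[i] for i in c]))
        left, right = most_common_split([points[i] for i in target], rng)
        if not left or not right:
            break  # the cluster could not be separated (e.g. identical points)
        clusters.remove(target)
        clusters.append([target[i] for i in left])
        clusters.append([target[i] for i in right])
    return clusters


# Hypothetical toy data: nine points in the plane.
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
       (20, 0), (20, 1), (21, 0)]
clusters = iterative_bisect(pts, n_clusters=3, min_size=2)
print(clusters)
```

Because every step is a binary split, the loop can be halted at any number of clusters, and the sequence of splits can be read as a tree.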
On the following page, we illustrate the main application of our iterative spectral
clustering approach (Figure 25). Because we noted earlier that potentially the best number of
clusters to choose based on the adjusted rand index and the Fowlkes-Mallows index is
approximately five or six, we halt the algorithm once we reach six subgroups. We also note that
we induced a limiting condition in which we did not consider resulting groups of less than
⌈ ⌉ ⌈ ⌉ observations as candidate clusters for binary splits. This halting factor
was implemented in order to not continue breaking down clusters of already small size while
leaving large clusters intact for the final clustering solution. Notice that although the methods are
completely different, the structure of this iterative spectral clustering process is reflective of the
random forest and tree solutions.
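For reference, the two agreement indices mentioned above can be computed from pair counts as in the sketch below. It uses the standard textbook formulas; the function names and the toy label vectors are hypothetical, and no values from our analysis are reproduced.

```python
from collections import Counter
from math import comb, sqrt


def pair_counts(labels_a, labels_b):
    """Numbers of observation pairs grouped together jointly, in a, in b, and overall."""
    n = len(labels_a)
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    return together_both, together_a, together_b, comb(n, 2)


def adjusted_rand_index(labels_a, labels_b):
    """Rand index corrected for chance agreement; 1 means identical partitions."""
    both, in_a, in_b, total = pair_counts(labels_a, labels_b)
    expected = in_a * in_b / total
    maximum = (in_a + in_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every observation in one cluster
    return (both - expected) / (maximum - expected)


def fowlkes_mallows(labels_a, labels_b):
    """Geometric mean of pairwise precision and recall."""
    both, in_a, in_b, _total = pair_counts(labels_a, labels_b)
    if in_a == 0 or in_b == 0:
        return 0.0
    return both / sqrt(in_a * in_b)


# Hypothetical label vectors: the same partition under different names,
# and two partitions that disagree on every pair.
same = adjusted_rand_index([0, 0, 1, 1, 2, 2], ["x", "x", "y", "y", "z", "z"])
opposite = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, opposite)
```

Both indices depend only on which observations are grouped together, so they are unaffected by how the clusters are named.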
Figure 25: Iterative spectral clustering algorithm solution
We note that this iterative approach appears to be quite flexible. Resulting clusters
generally have approximately the same average within-cluster sum of squared distances from the
mean, implying that they all perform approximately the same when identifying underlying
group structure. Furthermore, we note that resulting cluster sizes can vary quite widely. This is
another advantage of the iterative method, as it does not limit the size of the various possible
clustering solutions.
Although we are ultimately interested in coming up with potentially new groupings for
aphasic patients, it is interesting to compare the resulting clusters from the iterative spectral
clustering approach to those previously given by clinicians. We illustrate a confusion matrix with
the clinician classifications in the rows and our clustering solution based upon the spectral
clustering analysis as the columns below (Table 3). Note that the column names are somewhat
arbitrary.
                  Red   Orange   Yellow   Green   Blue   Purple
Anomic             24       25        0       0      0        0
Broca               0        3       20       0      3       11
Conduction          0       16        1       1      5        9
Global              0        0        3       0      0        0
Not Aphasic        14        1        0       0      0        0
Trans. Motor        5        0        0       0      1        0
Trans. Sensory      0        0        0       0      0        1
Wernicke            0        5        1       5      1        0
Table 3: Iterative spectral clustering groupings against aphasia types
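The counts in Table 3 can be summarized programmatically. The short sketch below transcribes the table and computes each cluster's size and its most frequent clinician label; the variable names are ours and are used for illustration only.

```python
clusters = ["Red", "Orange", "Yellow", "Green", "Blue", "Purple"]

# Rows of Table 3: clinician (WAB) type -> counts per spectral cluster.
table3 = {
    "Anomic":         [24, 25, 0, 0, 0, 0],
    "Broca":          [0, 3, 20, 0, 3, 11],
    "Conduction":     [0, 16, 1, 1, 5, 9],
    "Global":         [0, 0, 3, 0, 0, 0],
    "Not Aphasic":    [14, 1, 0, 0, 0, 0],
    "Trans. Motor":   [5, 0, 0, 0, 1, 0],
    "Trans. Sensory": [0, 0, 0, 0, 0, 1],
    "Wernicke":       [0, 5, 1, 5, 1, 0],
}

# Column totals: the number of patients in each spectral cluster.
totals = {c: sum(row[j] for row in table3.values())
          for j, c in enumerate(clusters)}

# Most frequent clinician label within each cluster.
dominant = {c: max(table3, key=lambda t: table3[t][j])
            for j, c in enumerate(clusters)}

for c in clusters:
    share = table3[dominant[c]][clusters.index(c)] / totals[c]
    print(f"{c:7s} n={totals[c]:3d}  dominant={dominant[c]} ({share:.0%})")
```

The column totals agree with the group sizes listed in Appendix B.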
One aspect that is important to highlight is that the anomic patients were split into two
groups. In one group (red), half of the anomic patients are paired with nearly all of the not
aphasic patients and the transcortical motor patients, whereas in the other group (orange) the
other half are paired with half of the conduction patients and some Wernicke patients. This is
interesting, as it partitions the anomic patients into those who have a mild form of aphasia and
those who have a severe form of aphasia. Likewise, the Broca patients were primarily split into
two groups. In one group (yellow), most of the Brocas are paired with the global aphasic patients
in our dataset, whereas in the other (purple) most of the remaining Brocas are paired with some
conduction patients. Again, this seems to partition the Broca patients into those who have a
comparatively more mild form of aphasia and those who have a more severe form of aphasia.
Tree Splits & Spectral Groupings
In order to compare how the different methods are splitting up patients, we illustrate the
variables of the first few splits of the tree against the cluster groupings yielded from the iterative
spectral clustering algorithm (Figure 26a-c). Ideally, we would like to see clear separations
among each of the conditional distributions. This would indicate that both methods are finding
similar structure among the data. Furthermore, the methods would serve as validations upon one
another for the discovery of similar patients.
The variable first split upon in the tree is the number of words correct with no errors. It is
clear that there are spectral clustering groupings whose distributions appear centered about
higher values, whereas there are other groupings whose distributions appear centered about
lower values (Figure 26a). We note that the tree makes a distinction between observations with
values above and below 21.5 words. We can see that the red and orange groups appear to contain
nearly all of the observation values that are above 21.5 words, whereas the yellow, green, blue,
and purple groups appear to contain nearly all of the observation values that are below 21.5
words. Thus, the most important split as defined by the decision tree is reflected in the final
spectral clustering groupings.
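A comparison of this kind can be checked numerically by computing, for each cluster, the share of its patients that fall above the tree's threshold. The sketch below shows the calculation on hypothetical toy values; only the 21.5 threshold is taken from the tree, and the function name is ours.

```python
from collections import defaultdict


def share_above(values, groups, threshold):
    """For each group, the fraction of its observations strictly above the threshold."""
    counts = defaultdict(lambda: [0, 0])  # group -> [number above, number total]
    for value, group in zip(values, groups):
        counts[group][1] += 1
        if value > threshold:
            counts[group][0] += 1
    return {g: above / total for g, (above, total) in counts.items()}


# Hypothetical toy data: words correct and spectral cluster for eight patients.
words_correct = [30, 45, 25, 22, 10, 5, 3, 28]
cluster = ["Red", "Red", "Orange", "Orange", "Yellow", "Purple", "Yellow", "Purple"]

shares = share_above(words_correct, cluster, threshold=21.5)
print(shares)
```

Shares near 1 or near 0 for every cluster would indicate that the tree split and the spectral groupings separate patients in the same way. Intermediate shares mark clusters that straddle the split.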
On the left-hand side of the tree, the next variable that partitions observations is the WAB
auditory verbal comprehension score. Once again, the spectral groupings do present some
separation among the variable (Figure 26b). We note that the tree makes a distinction between
observations with values above and below a score of 6.475. The red, orange, blue, and purple
groups appear to contain nearly all of the observation values that are above a score of 6.475,
whereas the yellow and green groups appear to contain nearly all of the observation values that
are below a score of 6.475. For the second level of the tree, the WAB auditory verbal
comprehension score serves as a good measure for group separation.
Lastly, on the right-hand side of the tree, the second variable that partitions observations
measures the open word list repetition for each patient. The interquartile ranges for the spectral
groupings on this part of the tree do not overlap at all (Figure 26c). Once again, this is an
indication of both the tree and spectral clustering algorithms finding similar structure among the
data. We note that the tree makes a distinction between observations with values above and
below a score of 2.25. The red group appears to contain most of the observation values above a
score of 2.25, whereas the orange group appears to contain most of the observation values below
a score of 2.25. Thus, the open word list repetition variable also performs well when attempting
to separate similar observations both in reference to the tree and also the spectral clustering
groupings.
As noted before, the variables among the decision tree were determined to have high
importance based on the Gini coefficient values. Although there are nuanced differences between
the decision tree and iterative spectral clustering methods, the fact that the iterative spectral
clustering algorithm independently captures the high-dimensional structure that the important
decision tree variables provide suggests that the two methods are comparable. Thus, we
are confident that both methods provide plausible groupings that may serve as solutions to our
primary research aim.
Figure 26: Tree splits by spectral groups: a. words correct no errors; b. WAB aud./ver. comp. score; c. rep. open word list
Conclusion
Ideally, we would like to summarize the newfound groupings as succinctly as possible. In
an effort to combine all that we have learned throughout our entire analysis, we attempt to
categorize patients into one of five groups based upon four basic verbal measurements: repetition
skill, naming skill, comprehension skill, and fluency. We summarize the groupings below in
Figure 27. Note that the general severity of aphasia increases as we pass from the leftmost to the
rightmost bubble. Summary measures indicative of the four skills used to create these groupings
are given on the following page (Table 4a-d). It is important to note that most of these scales are
fluid in that there is no steadfast cutoff value for each of the categories.
By simple inspection of the aforementioned four basic verbal measurements, a new
aphasic patient from outside of our study may be categorized into one of the five proposed
groups below. We hope that this tool will be useful in aiding the understanding of aphasic patient
behavior, and may potentially help in dictating recovery strategies for future patients.
Figure 27: Proposed groupings
Repetition: # Words Correct, No Errors
              1st Quart.     Mean   3rd Quart.
Light Blue         76.00    79.78        87.00
Green              44.00    55.42        71.00
Purple              7.25    21.00        32.25
Orange              4.75    19.90        32.00
Brown               0.00     3.08         3.00

Naming: Boston Naming Test Score
              1st Quart.     Mean   3rd Quart.
Light Blue          8.00    10.86        14.00
Green               7.00     9.00        12.00
Purple              1.00     3.19         4.50
Orange              2.00     5.05         7.00
Brown               0.00     1.76         3.00

Naming: Verb Naming Test Score
              1st Quart.     Mean   3rd Quart.
Light Blue         19.00    19.72        22.00
Green              13.25    16.54        21.00
Purple              6.50    10.88        15.75
Orange              8.00    12.62        16.00
Brown               0.00     4.28         7.00

Comprehension: WAB Auditory Verbal Comprehension Score
              1st Quart.     Mean   3rd Quart.
Light Blue          8.83     9.27         9.94
Green               7.51     8.44         9.58
Purple              6.10     7.42         8.75
Orange              6.80     7.42         8.25
Brown               4.20     5.95         7.85

Fluency: Words Per Second (Free Speech)
              1st Quart.     Mean   3rd Quart.
Light Blue          1.04     1.63         2.04
Green               1.03     1.58         2.03
Purple              1.31     1.61         2.10
Orange              0.78     1.14         1.43
Brown               0.51     0.77         1.06

Fluency: Words Per Second (Cinderella)
              1st Quart.     Mean   3rd Quart.
Light Blue          0.74     1.22         1.72
Green               0.71     1.19         1.58
Purple              0.60     1.29         1.97
Orange              0.52     0.78         1.04
Brown               0.00     0.33         0.60

Fluency: Words Per Second (Sandwich)
              1st Quart.     Mean   3rd Quart.
Light Blue          1.03     1.64         2.25
Green               0.78     1.24         1.51
Purple              0.70     1.42         1.81
Orange              0.44     0.92         1.12
Brown               0.00     0.16         0.27

Fluency: Words Per Second (Picture)
              1st Quart.     Mean   3rd Quart.
Light Blue          0.94     1.54         2.20
Green               0.76     1.32         1.81
Purple              0.92     1.52         2.06
Orange              0.69     0.95         1.23
Brown               0.35     0.56         0.66

Table 4: Summary measures for new groupings: a. repetition; b. naming; c. comprehension; d. fluency
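As one possible way to turn the inspection of Table 4 into a rough screening aid, the sketch below counts, for each proposed group, how many of a new patient's scores fall inside that group's interquartile range. This voting rule is a hypothetical illustration and not a validated procedure. The quartiles are transcribed from Table 4a-c, and the dictionary keys, function names, and example patients are assumptions made for the example.

```python
# First and third quartiles per group, transcribed from Table 4a-c.
iqr = {
    "Light Blue": {"repetition": (76.00, 87.00), "boston_naming": (8.00, 14.00),
                   "verb_naming": (19.00, 22.00), "comprehension": (8.83, 9.94)},
    "Green":      {"repetition": (44.00, 71.00), "boston_naming": (7.00, 12.00),
                   "verb_naming": (13.25, 21.00), "comprehension": (7.51, 9.58)},
    "Purple":     {"repetition": (7.25, 32.25), "boston_naming": (1.00, 4.50),
                   "verb_naming": (6.50, 15.75), "comprehension": (6.10, 8.75)},
    "Orange":     {"repetition": (4.75, 32.00), "boston_naming": (2.00, 7.00),
                   "verb_naming": (8.00, 16.00), "comprehension": (6.80, 8.25)},
    "Brown":      {"repetition": (0.00, 3.00), "boston_naming": (0.00, 3.00),
                   "verb_naming": (0.00, 7.00), "comprehension": (4.20, 7.85)},
}


def iqr_votes(patient):
    """Count, per group, how many of the patient's scores lie within that group's IQR."""
    return {
        group: sum(low <= patient[measure] <= high
                   for measure, (low, high) in ranges.items())
        for group, ranges in iqr.items()
    }


def closest_group(patient):
    """Group with the most votes; ties go to the group listed first (less severe)."""
    votes = iqr_votes(patient)
    return max(votes, key=votes.get)


# Hypothetical new patients, not drawn from the study data.
mild = {"repetition": 80, "boston_naming": 10, "verb_naming": 20, "comprehension": 9.5}
severe = {"repetition": 1, "boston_naming": 1, "verb_naming": 3, "comprehension": 5.0}
print(closest_group(mild), closest_group(severe))
```

Because the scales are fluid and several groups overlap (Purple and Orange in particular), the full vote counts from `iqr_votes` are more informative than the single winning label.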
Appendix A: Tree Group Membership
Note: Groups are numbered from leftmost to rightmost node appearance on the decision tree (Figure 22)
Group #1 (8 Patients) [1] kansas06a kansas08a scale07a TAP14a adler19a scale09a TAP03a
[8] scale12a
Anomic Broca Conduction Global NotAphasicByWAB
0 4 0 3 0
TransMotor TransSensory Wernicke
0 1 0
Group #2 (12 Patients) [1] adler10a adler11a kansas01a kansas16a kempler04a scale03a
[7] scale18a scale27a TAP13a tucson02a wright206a kansas12a
Anomic Broca Conduction Global NotAphasicByWAB
0 11 0 0 0
TransMotor TransSensory Wernicke
0 0 1
Group #3 (11 Patients) [1] scale01a TAP09a adler06a adler23a elman12a elman14a
[7] kansas05a scale11b thompson05a tucson13a tucson15a
Anomic Broca Conduction Global NotAphasicByWAB
0 1 0 1 0
TransMotor TransSensory Wernicke
0 0 9
Group #4 (13 Patients) [1] adler13a adler16a elman11a kansas02a kansas09a
[6] scale10a TAP06a TAP16a TAP17a williamson12a
[11] wright201a wright205a kansas14a
Anomic Broca Conduction Global NotAphasicByWAB
0 12 0 0 0
TransMotor TransSensory Wernicke
0 0 1
Group #5 (5 Patients) [1] scale02a scale18b wright207a scale04a TAP11a
Anomic Broca Conduction Global NotAphasicByWAB
1 2 2 0 0
TransMotor TransSensory Wernicke
0 0 0
Group #6 (9 Patients) [1] adler25a elman06a scale15b scale26a kansas13a
[6] scale13a scale15a tucson11a williamson11a
Anomic Broca Conduction Global NotAphasicByWAB
0 4 5 0 0
TransMotor TransSensory Wernicke
0 0 0
Group #7 (15 Patients) [1] adler02a adler05a elman02a fridriksson13a kansas10a
[6] kansas20a scale11a TAP02a TAP15a tucson12a
[11] williamson01a williamson03a williamson04a williamson06a williamson09a
Anomic Broca Conduction Global NotAphasicByWAB
0 0 15 0 0
TransMotor TransSensory Wernicke
0 0 0
Group #8 (8 Patients) [1] adler24a TAP18a elman01a scale06a scale06b TAP12a tucson08a
[8] wright203a
Anomic Broca Conduction Global NotAphasicByWAB
2 0 6 0 0
TransMotor TransSensory Wernicke
0 0 0
Group #9 (5 Patients) [1] adler15a adler21a scale02b tucson07a elman03a
Anomic Broca Conduction Global NotAphasicByWAB
4 1 0 0 0
TransMotor TransSensory Wernicke
0 0 0
Group #10 (6 Patients) [1] scale05b adler04a adler18a scale05a scale19a thompson09a
Anomic Broca Conduction Global NotAphasicByWAB
1 0 0 0 0
TransMotor TransSensory Wernicke
5 0 0
Group #11 (5 Patients) [1] kempler03a adler14a scale23a williamson14a thompson03a
Anomic Broca Conduction Global NotAphasicByWAB
0 1 3 0 0
TransMotor TransSensory Wernicke
0 0 1
Group #12 (8 Patients) [1] adler08a adler17a adler20a scale14a scale22a thompson11a
[7] tucson16a kansas21a
Anomic Broca Conduction Global NotAphasicByWAB
7 0 1 0 0
TransMotor TransSensory Wernicke
0 0 0
Group #13 (7 Patients) [1] adler12a elman07a fridriksson05a kansas15a kansas18a
[6] williamson02a kansas17a
Anomic Broca Conduction Global NotAphasicByWAB
6 0 0 0 0
TransMotor TransSensory Wernicke
1 0 0
Group #14 (7 Patients) [1] adler01a cmu03a elman10a kansas19a TAP01a TAP04a kansas03a
Anomic Broca Conduction Global NotAphasicByWAB
6 0 0 0 1
TransMotor TransSensory Wernicke
0 0 0
Group #15 (8 Patients) [1] thompson10a thompson13a wright202a wright204a adler22a kansas04a
[7] scale16a scale20a
Anomic Broca Conduction Global NotAphasicByWAB
4 0 0 0 4
TransMotor TransSensory Wernicke
0 0 0
Group #16 (10 Patients) [1] adler03a adler07a elman04a kansas07a scale21a
[6] tucson01a tucson04a williamson05a williamson10a williamson13a
Anomic Broca Conduction Global NotAphasicByWAB
0 0 0 0 10
TransMotor TransSensory Wernicke
0 0 0
Group #17 (18 Patients) [1] adler09a elman05a kansas11a kempler02a scale08a
[6] scale17a thompson01a thompson02a thompson04a thompson06a
[11] thompson07a thompson07b thompson08a tucson06a tucson10a
[16] williamson07a williamson08a williamson15a
Anomic Broca Conduction Global NotAphasicByWAB
18 0 0 0 0
TransMotor TransSensory Wernicke
0 0 0
Appendix B: Iterative Spectral Clustering Group Membership
Group “Red” (43 Patients) [1] adler01a adler12a adler24a cmu03a elman05a elman10a
[7] fridriksson05a kansas18a kansas19a kempler02a scale05b scale08a
[13] scale14a TAP01a TAP04a thompson07a thompson13a tucson06a
[19] tucson10a tucson16a williamson02a williamson15a wright202a wright204a
[25] adler03a adler07a adler22a elman04a kansas03a kansas04a
[31] kansas07a scale16a scale20a tucson01a tucson04a williamson05a
[37] williamson10a williamson13a adler04a kansas17a scale05a scale19a
[43] thompson09a
Anomic Broca Conduction Global NotAphasicByWAB
24 0 0 0 14
TransMotor TransSensory Wernicke
5 0 0
Group “Orange” (50 Patients) [1] adler08a adler09a adler15a adler17a adler20a adler21a
[7] elman07a kansas11a kansas15a scale02a scale02b scale17a
[13] scale22a TAP18a thompson01a thompson02a thompson04a thompson06a
[19] thompson07b thompson08a thompson10a thompson11a tucson07a williamson07a
[25] williamson08a adler25a elman03a kempler03a adler02a adler14a
[31] elman01a elman02a fridriksson13a kansas21a scale04a scale11a
[37] scale13a scale23a tucson08a williamson01a williamson09a williamson11a
[43] williamson14a wright203a scale21a elman12a elman14a kansas14a
[49] thompson03a thompson05a
Anomic Broca Conduction Global NotAphasicByWAB
25 3 16 0 1
TransMotor TransSensory Wernicke
0 0 5
Group “Yellow” (25 Patients) [1] adler10a adler11a elman06a kansas01a kansas02a kansas06a kansas08a kansas09a
[9] kansas16a scale03a scale07a scale10a scale27a TAP06a TAP14a TAP16a
[17] TAP17a tucson02a wright205a adler19a TAP11a scale09a TAP03a TAP09a
[25] adler06a
Anomic Broca Conduction Global NotAphasicByWAB
0 20 1 3 0
TransMotor TransSensory Wernicke
0 0 1
Group “Green” (6 Patients) [1] williamson06a kansas05a kansas12a scale11b tucson13a tucson15a
Anomic Broca Conduction Global NotAphasicByWAB
0 0 1 0 0
TransMotor TransSensory Wernicke
0 0 5
Group “Blue” (10 Patients) [1] elman11a scale01a scale18b adler05a kansas10a scale15a tucson11a tucson12a
[9] adler18a adler23a
Anomic Broca Conduction Global NotAphasicByWAB
0 3 5 0 0
TransMotor TransSensory Wernicke
1 0 1
Group “Purple” (21 Patients) [1] adler13a adler16a kempler04a scale15b scale18a scale26a
[7] TAP13a williamson12a wright201a wright206a wright207a kansas13a
[13] kansas20a scale06a scale06b TAP02a TAP12a TAP15a
[19] williamson03a williamson04a scale12a
Anomic Broca Conduction Global NotAphasicByWAB
0 11 9 0 0
TransMotor TransSensory Wernicke
0 1 0
Appendix C: Proposed Group Membership
Group “Light Blue” (43 Patients) [1] adler01a adler12a adler24a cmu03a elman05a elman10a
[7] fridriksson05a kansas18a kansas19a kempler02a scale05b scale08a
[13] scale14a TAP01a TAP04a thompson07a thompson13a tucson06a
[19] tucson10a tucson16a williamson02a williamson15a wright202a wright204a
[25] adler03a adler07a adler22a elman04a kansas03a kansas04a
[31] kansas07a scale16a scale20a tucson01a tucson04a williamson05a
[37] williamson10a williamson13a adler04a kansas17a scale05a scale19a
[43] thompson09a
Anomic Broca Conduction Global NotAphasicByWAB
24 0 0 0 14
TransMotor TransSensory Wernicke
5 0 0
Group “Green” (50 Patients) [1] adler08a adler09a adler15a adler17a adler20a adler21a
[7] elman07a kansas11a kansas15a scale02a scale02b scale17a
[13] scale22a TAP18a thompson01a thompson02a thompson04a thompson06a
[19] thompson07b thompson08a thompson10a thompson11a tucson07a williamson07a
[25] williamson08a adler25a elman03a kempler03a adler02a adler14a
[31] elman01a elman02a fridriksson13a kansas21a scale04a scale11a
[37] scale13a scale23a tucson08a williamson01a williamson09a williamson11a
[43] williamson14a wright203a scale21a elman12a elman14a kansas14a
[49] thompson03a thompson05a
Anomic Broca Conduction Global NotAphasicByWAB
25 3 16 0 1
TransMotor TransSensory Wernicke
0 0 5
Group “Purple” (16 Patients) [1] elman11a scale01a scale18b adler05a kansas10a scale15a
[7] tucson11a tucson12a williamson06a adler18a adler23a kansas05a
[13] kansas12a scale11b tucson13a tucson15a
Anomic Broca Conduction Global NotAphasicByWAB
0 3 6 0 0
TransMotor TransSensory Wernicke
1 0 6
Group “Orange” (21 Patients) [1] adler13a adler16a kempler04a scale15b scale18a scale26a
[7] TAP13a williamson12a wright201a wright206a wright207a kansas13a
[13] kansas20a scale06a scale06b TAP02a TAP12a TAP15a
[19] williamson03a williamson04a scale12a
Anomic Broca Conduction Global NotAphasicByWAB
0 11 9 0 0
TransMotor TransSensory Wernicke
0 1 0
Group “Brown” (25 Patients) [1] adler10a adler11a elman06a kansas01a kansas02a kansas06a kansas08a kansas09a
[9] kansas16a scale03a scale07a scale10a scale27a TAP06a TAP14a TAP16a
[17] TAP17a tucson02a wright205a adler19a TAP11a scale09a TAP03a TAP09a
[25] adler06a
Anomic Broca Conduction Global NotAphasicByWAB
0 20 1 3 0
TransMotor TransSensory Wernicke
0 0 1