topological methods reveal high and low functioning neuro ... · topological methods reveal high...

12
Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano, 1 * Monica Nicolau, 2 Eve-Marie Quintin, 1 Paul K. Mazaika, 1 Amy A. Lightbody, 1 Heather Cody Hazlett, 3 Joseph Piven, 3 Gunnar Carlsson, 2 and Allan L. Reiss 1 1 Center for Interdisciplinary Brain Sciences Research, Stanford University Medical School, Stanford, California 2 Department of Mathematics, Stanford University, Stanford, California 3 CB#3367, Carolina Institute for Developmental Disabilities, UNC, Chapel Hill, North Carolina r r Abstract: Fragile X syndrome (FXS), due to mutations of the FMR1 gene, is the most common known inherited cause of developmental disability as well as the most common single-gene risk factor for autism. Our goal was to examine variation in brain structure in FXS with topological data analysis (TDA), and to assess how such variation is associated with measures of IQ and autism-related behav- iors. To this end, we analyzed imaging and behavioral data from young boys (n 5 52; aged 1.57–4.15 years) diagnosed with FXS. Application of topological methods to structural MRI data revealed two large subgroups within the study population. Comparison of these subgroups showed significant between-subgroup neuroanatomical differences similar to those previously reported to distinguish chil- dren with FXS from typically developing controls (e.g., enlarged caudate). In addition to neuroanat- omy, the groups showed significant differences in IQ and autism severity scores. These results suggest that despite arising from a single gene mutation, FXS may encompass two biologically, and clinically separable phenotypes. In addition, these findings underscore the potential of TDA as a powerful tool in the search for biological phenotypes of neuropsychiatric disorders. Hum Brain Mapp 35:4904–4915, 2014. V C 2014 Wiley Periodicals, Inc. Key words: autism spectrum behavior; MRI; topological data analysis; multivariate pattern analysis; voxel-based morphometry r r Additional Supporting Information may be found in the online version of this article. Contract grant sponsor: National Institute of Mental Health; Contract grant numbers: R01MH050047, R01MH064708, R01MH064580; MH061696, 5T32MH019908; Contract grant sponsor: National Institute of Child Health and Human Develop- ment; Contract grant number: P30HD003110; Contract grant sponsor: the Canel Family Fund. *Correspondence to: David Romano, Stanford University, Psychia- try and Behavioral Sciences, 401 Quarry Rd, Rm 1356, MC 5795, Stanford, CA 94305. E-mail: [email protected] Received for publication 12 November 2013; Revised 24 February 2014; Accepted 20 March 2014. DOI 10.1002/hbm.22521 Published online 15 April 2014 in Wiley Online Library (wileyon- linelibrary.com). r Human Brain Mapping 35:4904–4915 (2014) r V C 2014 Wiley Periodicals, Inc.

Upload: others

Post on 19-Jul-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

Topological Methods Reveal High and LowFunctioning Neuro-Phenotypes Within

Fragile X Syndrome

David Romano,1* Monica Nicolau,2 Eve-Marie Quintin,1 Paul K. Mazaika,1

Amy A. Lightbody,1 Heather Cody Hazlett,3 Joseph Piven,3

Gunnar Carlsson,2 and Allan L. Reiss1

1Center for Interdisciplinary Brain Sciences Research, Stanford University Medical School,Stanford, California

2Department of Mathematics, Stanford University, Stanford, California3CB#3367, Carolina Institute for Developmental Disabilities, UNC, Chapel Hill,

North Carolina

r r

Abstract: Fragile X syndrome (FXS), due to mutations of the FMR1 gene, is the most common knowninherited cause of developmental disability as well as the most common single-gene risk factor forautism. Our goal was to examine variation in brain structure in FXS with topological data analysis(TDA), and to assess how such variation is associated with measures of IQ and autism-related behav-iors. To this end, we analyzed imaging and behavioral data from young boys (n 5 52; aged 1.57–4.15years) diagnosed with FXS. Application of topological methods to structural MRI data revealed twolarge subgroups within the study population. Comparison of these subgroups showed significantbetween-subgroup neuroanatomical differences similar to those previously reported to distinguish chil-dren with FXS from typically developing controls (e.g., enlarged caudate). In addition to neuroanat-omy, the groups showed significant differences in IQ and autism severity scores. These results suggestthat despite arising from a single gene mutation, FXS may encompass two biologically, and clinicallyseparable phenotypes. In addition, these findings underscore the potential of TDA as a powerful toolin the search for biological phenotypes of neuropsychiatric disorders. Hum Brain Mapp 35:4904–4915,2014. VC 2014 Wiley Periodicals, Inc.

Key words: autism spectrum behavior; MRI; topological data analysis; multivariate pattern analysis;voxel-based morphometry

r r

Additional Supporting Information may be found in the onlineversion of this article.

Contract grant sponsor: National Institute of Mental Health;Contract grant numbers: R01MH050047, R01MH064708,R01MH064580; MH061696, 5T32MH019908; Contract grantsponsor: National Institute of Child Health and Human Develop-ment; Contract grant number: P30HD003110; Contract grantsponsor: the Canel Family Fund.

*Correspondence to: David Romano, Stanford University, Psychia-try and Behavioral Sciences, 401 Quarry Rd, Rm 1356, MC 5795,Stanford, CA 94305. E-mail: [email protected]

Received for publication 12 November 2013; Revised 24 February2014; Accepted 20 March 2014.

DOI 10.1002/hbm.22521Published online 15 April 2014 in Wiley Online Library (wileyon-linelibrary.com).

r Human Brain Mapping 35:4904–4915 (2014) r

VC 2014 Wiley Periodicals, Inc.

Page 2: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

INTRODUCTION

Because of similarities between the behavioral profiles ofchildren with the single gene disorder fragile X syndrome(FXS) and the criteria used to define autism spectrum disor-ders (ASD), it had been hoped that FXS might serve as a use-ful genetically-defined model for studying ASD. A recentstudy by our group [Hoeft et al., 2011]; however, showedthat the gray and white matter profiles of young childrenwith FXS are significantly different from those of childrenwith idiopathic autism (iAUT; i.e., who do not have FXS).These differences were of sufficient magnitude that the twopopulations could be discriminated with a high degree ofaccuracy (90%) through the use of machine learningapproaches. This finding also supports the hypothesis thatthere is a high level of neurobiological heterogeneity amongindividuals meeting diagnostic criteria for iASD [Abrahamsand Geschwind, 2008]. In the study presented here, we con-centrate exclusively on children with FXS in order to explorethe degree to which neurobiological heterogeneity may bepresent within the FXS population itself. To date, there hasbeen little work suggesting the existence of separable neuro-biological phenotypes within FXS, the notable exceptionbeing a study by Jacquemont et al. [2011] that suggestsmethylation status may constitute a biomarker for predict-ing response to AFQ056, a subtype-selective mGluR5 inhibi-tor. Our motivation to search for neuro-phenotypicsubgroups within FXS derived from previous studies sug-gesting the existence of behavioral subgroups based onwhether individuals met criteria for autism [Brock and Hat-ton, 2010; Wolff et al., 2012]. Thus, we sought to explore theneuro-phenotypic “landscape” of FXS to attempt to deter-mine whether there may be previously undiscovered biolog-ical underpinnings to this behavioral observation. In thissense, our work may be seen as part of current efforts tounderstand the biological basis of neuropsychiatric pathol-ogy (e.g., the RDoC approach articulated by NIMH: http://www.nimh.nih.gov/research-priorities/rdoc/index. shtml.)

To investigate putative phenotypic subgroups withinFXS as captured by MRI data, we employed topologicaldata analysis (TDA) [Carlsson, 2009], a recently-developedapproach that is specifically designed to identify structuralcharacteristics of high-dimensional datasets. One of thestrengths of TDA is its ability to reduce such high-dimensional data to human-readable representations thatcapture essential features, similar to the manner in whicha topographical map is able to capture the essential fea-tures of a landscape. An example of this approach can befound in Nicolau et al. [2011], where a TDA-basedapproach was successful in elucidating a previouslyunidentified subgroup of breast cancers that exhibit 100%survival and no metastases. To our knowledge, our studyrepresents the first application of TDA to appear in the lit-erature on neuroimaging.

Here, we use TDA to explore the landscape of brainimaging phenotypes of very young boys with FXS. In con-trast to the focus of Hoeft et al. [2011], which used brain

images to discriminate between iAUT and FXS, the specificgoal of this project was to identify previously unobserved,yet neurobiologically salient subgroups within FXS thatexhibit separable neuroanatomical phenotypes. The outputof TDA, taking the form of subgroups of children whoshare similar brain structure patterns, can then be used toidentify brain regions that characterize subgroup differen-ces at the anatomical level as well as possible differencesin behavioral profiles. Our primary hypothesis was thatapplication of TDA would uncover brain-imaging pheno-types that were not just anatomically separable, but alsodifferent from a clinical viewpoint.

MATERIALS AND METHODS

Participants and Data Collection

The data utilized in our study were collected frominfant and toddler boys who participated in a study ofbrain development in FXS [Hoeft et al., 2011]. Participantsin this study included boys with FXS, idiopathic autism(iAUT), and developmental delays (DD) as well as typi-cally developing (TD) children, all of whom were recruitedby collaborating research teams at the Stanford UniversitySchool of Medicine and the University of North Carolina,Chapel Hill; the current study focuses only on the subsetof children from this earlier study that were diagnosedwith FXS. The study protocols were approved by thehuman subjects committees at both institutions and con-sent from parents was obtained. Children with FXS(n 5 52; mean [SD] age, 2.90 [0.63] years) were recruitedthrough registry databases maintained by the StanfordUniversity School of Medicine and the University of NorthCarolina, Chapel Hill, through postings to the NationalFragile X Foundation Web site and quarterly newsletter,and through mailings to other regional FXS organizations.All participants had the “full mutation” form of the FMR1gene known to cause FXS [Hoeft et al., 2008]. Participantscompleted the Autism Diagnostic Observation Schedule–Generic (ADOS) [Gotham et al., 2009; Lord et al., 2000]and their parents were given the Autism Diagnostic Inter-view (ADI)–Revised [Lord et al., 1994]. The Mullen Scalesof Early Learning was administered to measure child IQ.There were no significant differences between sites in anyof the cognitive variables for the participants with FXS (allP’s> 0.05). [see Table I for a summary of demographicinformation, including the participant’s level of Fragile XMental Retardation Protein (FMRP)]. Anatomical MRI scanacquisition parameters consisted of a coronal T1-weightedsequence (inversion recovery preparation pulse 5 300 ms;repetition time (TR) 5 12 ms; echo-time (TE) 5 5 ms; flipangle 5 20�; slice thickness 5 1.5 mm; number ofexcitations 5 1; field-of-view (FOV) 5 20 cm; matrix 5 2563 192) (Phantoms were used to ensure matching calibra-tion of MRI scanners at both sites).

r Topology Finds Fragile X Syndrome Phenotypes r

r 4905 r

Page 3: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

Voxel-Based Morphometry Preprocessing

Standard voxel-based morphometry (VBM) preprocess-ing of MRI data was carried out using the Statistical Para-metric Mapping 5 (SPM5) statistical package (http://www.fil.ion.ucl.ac.uk/spm) and VBM5.1 (http://dbm.neuro.uni-jena.de/vbm). Images were bias-field correctedand segmented to GM, WM, and CSF. A Hidden MarkovRandom Field (HMRF; prior probability weight 0.3) wasapplied in order to incorporate spatial constraints arisingfrom neighboring voxels. The images were normalizedwith a 12-parameter affine transformation with a spatialfrequency cut-off of 25 in all three (x, y, z) directions andresampled to 1 3 1 3 1 mm3 voxels. Linear and nonlinearJacobian modulation were applied. Customized GM, WM,and CSF templates created using all participants in ourprevious study [Hoeft et al., 2011] (FXS, n 5 52; iAUT,n 5 63; DD, n 5 19; and TD, n 5 31) were used for VBMpreprocessing. For each participant, segmentation and nor-malization accuracy were manually inspected.

Multivariate Pattern Analysis of Magnetic

Resonance Images Using Topological Data

Analysis

TDA is one of a general class of approaches to analyzinghigh-dimensional data known in the literature as multivari-ate pattern analysis (MVPA). Multivariate approaches aredesigned to detect effects that may be discernable withinthe relationships (patterns) among variables, but whichmay elude detection when variables are examined in isola-tion. One type of multivariate approach that has alreadybeen used extensively in brain imaging studies is the useof support vector machines (SVMs) [Bray et al., 2009]. TDAfollows the initial steps taken in SVM analyses to prepareimage data for analysis by building a data matrix fromsmoothed and vectorized individual images, but thereafterdiffers from SVM analyses in two critical ways (see belowfor a brief discussion of vectorization). One difference isthat SVMs are used to compare conditions that are known

a priori, such as disorder versus control or stimulus versusrest, and are thus said to be supervised approaches toMVPA. In contrast, TDA is unsupervised since, rather thancompare predefined groups, it is used to identify coherent,but possibly heretofore unknown groups within the studypopulation. Other examples of unsupervised approachesto MVPA include independent component analysis [Cal-houn et al., 2012] and clustering via correlation matrices[Fair et al., 2012]. A second critical difference, and a keybenefit of using TDA not found in other MVPAapproaches, is that TDA can produce a compressed andeasily readable visual representation of the data, called aReeb graph, which preserves the underlying geometricstructure of the data, and thus facilitates identification ofits salient features [Carlsson, 2009]. As can be seen belowin Figure 1, these features can encode not only informationabout clusters that may exist within the data, but alsoinformation about spectra (i.e., variation in the data due tocontinuously varying underlying parameters). To allowthe reader to gain insight into the process of Reeb graphconstruction, we give a brief overview of the TDA pipelineas applied to a toy example from Lum and colleagues[Lum et al., 2013], and provide more a detailed descriptionof our use of TDA in the following subsection.

Overview of TDA

As with SVMs, the starting point for TDA is a datamatrix whose m rows and n columns correspond to thevariables of the study and their observed values; in thetypical TDA data matrix, each column corresponds to avariable (e.g., voxel), and each row corresponds to anobservation (e.g., subject). In geometric terms, we mayinterpret the n values in a row as giving the coordinates ofa point in n-dimensional space, and can thus interpret them rows as m points (collectively called a point cloud) inspace. In this way, we may also interpret the point cloudin Figure 1A in the shape of a hand as being a visual rep-resentation of a data matrix with thousands of rows and

TABLE I. Demographic information from Hoeft et al. [2011]

FXS iAUT DD TD

Site, SU:UNCParticpants, no. 28:24 17:46 11:8 11:20

Age, yrs.Mean(SD) 2.90 (0.63) 2.77 (0.41) 2.96 (0.50) 2.55 (0.60)

MSEL composite standardized scoreMean (SD) 54.94 (9.14) 54.10 (9.41) 55.47 (7.53) 109.55 (17.24)

FMRP (%)Mean (SD)a 5.83 (3.94) NA NA NA

SU, Stanford University; UNC, University of North Carolina; MSEL, Mullen Scales of Early Learning; FMRP, fragile X mental retarda-tion protein.aFXS n 5 50 for FMRP %.

r Romano et al. r

r 4906 r

Page 4: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

three columns, where the columns correspond to the x, y,and z coordinates of the points in the point cloud.

The first stage of TDA is to assign numerical values toeach point in the point cloud, a process we refer to as selec-tion of filter values; in Figure 1B, these values are representedby a color in the red-blue spectrum. The next stage of TDAis to separate the point cloud into overlapping regions byseparating the filter values into overlapping bins. For exam-ple, in Figure 1B we could view the filter values as indicat-ing the distance from the tip of the middle finger asmeasured by a horizontal ruler, in which case we couldthen view the regions shown in Figure 1C as defined by thebins [0–1.5 in], [1–2.5 in], [2–3.5], [3–4.5], [4–5.5], [5–6.5]. The

last stage of TDA is to represent each region from the previ-ous stage by a dot, called a node, and then for each pair ofregions that intersect, to connect the corresponding nodesby a line segment. Figure 1D shows the resulting Reebgraph, where the size and color of each node represent thenumber and “average” color of the points in the corre-sponding region. Note how the Reeb graph captures theunderlying geometric structure of the hand, providing aschematic representation whose structural features corre-spond to real physical features, such as fingers, which alsoillustrates TDA’s ability to highlight spectra and clusterswithin data, as described earlier. Note also that this con-struction does not involve multiple comparisons, and thusno correction for multiple comparisons is warranted.

Unlike the case of the hand, where our familiarity withits shape allows us to validate the Reeb graph of the corre-sponding point cloud, the “shape” of the point cloud cor-responding to MRI data for our study population ofyoung boys with FXS is not known to us a priori. Toaddress this issue, we use TDA to identify subgroupspurely on the basis of the MRI data, but then apply stand-ard statistical tests to compare clinical data for these sub-groups to provide confirmation they are clinically, as wellas anatomically distinct.

Details of Reeb Graph Construction

As described earlier, we can interpret a subject’s numer-ical data as the coordinates of a point in a high-dimensional space S, referred to as subject space, and canthus recast the study population as a point cloud in sub-ject space. In our case, we used the combined gray andwhite matter voxel data to create the subject space S, andthen used the Iris software package to construct a Reebgraph of the data, so that we could explore the geometryof the resulting point cloud to identify its topological (i.e.,shape) features [Lum et al., 2013] for an overview of Iris.These features would then correspond to anatomicallydefined subgroups of the study population, whose clinicalprofiles could then be compared and subjected to furtheranalysis. In this subsection, we give a detailed descriptionof the process used to construct a Reeb graph (Fig. 2 for aflowchart that summarizes this process).

As noted above, Reeb graph construction begins, as doSVMs, with a data matrix. Since SVMs have already beenused extensively in brain imaging studies, we followed thesame approach to data matrix construction; namely, wecombined the VBM-preprocessed imaging data for the allsubjects into a single matrix, with the data for each subjectentered as a row of the data matrix. Specifically, we placedin each column the voxel intensity at a particular spatiallocation across all subjects. TThe process of converting amultidimensional object, like a 3D image, into an orderedlist, like a row in a matrix, is known as vectorization, and isa common requirement of MVPA approaches [Pereiraet al., 2009]. In principle, vectorization results in a loss of

Figure 1.

TDA pipeline. Visual depiction of TDA pipeline from point cloud

to Reeb graph, reprinted from [Lum et al., 2013] in accordance

with the Creative Commons license under which it was pub-

lished (CC BY-NC-ND 3.0). [Color figure can be viewed in the

online issue, which is available at wileyonlinelibrary.com.]

r Topology Finds Fragile X Syndrome Phenotypes r

r 4907 r

Page 5: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

information about the spatial relationships between voxels,but in practice this loss does not appear to hamper thesuccess of MVPA approaches in general, as evidenced bytheir use in numerous studies [Bray et al., 2009] for anextensive review of the use of MVPA in neuroimaging.

Before the preprocessed data were vectorized, the inten-sities for each subject’s gray matter 1 3 1 3 1 mm3 voxelimages were binned and averaged to produce correspond-ing gray 4 3 4 3 4 mm3 voxel images, and the same wasdone for each subject’s 1 mm white matter images. This

Figure 2.

Reeb graph construction. Flowchart depicting the process used to construct a Reeb graph from

raw MRI data. Note that the last four steps correspond to parts B, C, and D of Figure 1, with

the last two steps corresponding to part D. [Color figure can be viewed in the online issue,

which is available at wileyonlinelibrary.com.]

r Romano et al. r

r 4908 r

Page 6: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

step was undertaken to both reduce noise in the 1 mmimages by smoothing and for ease of computation. Thevoxels in each of these new 4 mm 3D images were thenlinearly ordered, so that each subject’s gray and whitematter images were converted into gray and white matterintensity vectors, which were then concatenated to pro-duce a single, combined gray and white matter intensityvector. These combined intensity vectors were then usedas rows to build a data matrix M of dimension (# subjects)3 (# gray voxels 1 # white voxels).

Once the data matrix M is obtained, construction of a Reebgraph proceeds in several steps: (1) column selection, (2)metric selection, (3) filter selection, and finally (4) selection ofvalues of the filter parameter called gain and resolution. Thefirst step, column selection, is used to define the subjectspace S; this is done by selecting a subset of columns of M,whose size then determines the dimension of S. In our case,the columns of M correspond to voxels, and we chose todefine S by selecting those voxels whose variance was atleast 0.03. (In modulated and segmented images such asthose utilized here [Hoeft et al., 2011], voxels intensities cor-respond to proportions of volume and thus range in valuefrom 0 to 1). The next step, metric selection, is used to definedistances within the subject space S, and thus provide a mea-sure of similarity between subjects. Euclidean distance, ageneralization of the usual distance between points P1 5 (x1,y1) and P2 5 (x2, y2) in the coordinate plane is only one ofmany choices for a metric on S.

dðP1;P2Þ5ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiðx22x1Þ21ðy22y1Þ2

q

Because we had no a priori basis for giving more weightin the analysis to a particular subset of the high-variancevoxels, we chose to standardize the voxel data beforeapplying the Euclidean metric. The process of standardiza-tion, where data for each variable is first demeaned andthen scaled to have variance 1, is a common statistical steptaken in this context in order to give each variable equalweight in the subsequent analysis. In Iris, this is achievedby selecting the metric referred to as the variance-normalized Euclidean metric.

The third step, filter selection, is in many ways the mostcritical choice made in the construction of a Reeb graphsince it is the filters that transform the similarity informa-tion determined in the previous two steps into a visualrepresentation that is easily grasped by the human eye,namely, a 2D rendering of a 3D graph (Fig. 3, below). Fil-ters provide an alternative notion of similarity among sub-jects that is distinct from the one defined by the metricchosen for the subject space S, and together these twonotions of similarity guide the construction of the verticesof the Reeb graph, in two stages. In the first stage, the fil-ters are used to sort the subjects into bins based on simi-larity, where bin size is controlled by the resolutionparameter from step 4. To capture the structure of variabil-ity among the subjects, we chose the first two principalcomponent scores (PC1 and PC2) of the variance-

normalized data as our filters, so that two subjects werejudged to be similar (i.e., shared the same bin) if both theirPC1 and PC2 scores were sufficiently similar. In the sec-ond stage, the metric selected in step 2 is used to furthercluster the subjects within each bin, so that two subjectswithin the same bin are merged into the same cluster ifthe distance between them is sufficiently small; each clus-ter obtained in this way is then viewed as a vertex of theReeb graph. Finally, the edges of the graph are constructedas follows: Although bin size is governed by the resolutionparameter, bins are allowed to overlap (i.e., share subjects)to an extent governed by the gain parameter, and thisoverlap may lead clusters from different bins to share sub-jects. Any two clusters (i.e., vertices) that share a subjectare then joined by an edge. We may think of the resultinggraph as a view of the data through a microscope, wheregain and resolution play roles analogous to level of lightand level of magnification. A group of edge-connected ver-tices of the resulting Reeb graph may then serve as a can-didate subgroup within the data (Fig. 3). In our case, achoice of parameter values that clearly decomposed thedata into subgroups consisted of a gain of 4 for both PC1and PC2, a resolution of 45 for PC1, and a resolution of 30for PC2.

Analysis of Behavioral Data

Once subgroups were identified with TDA based onneuroimaging data, we used standard t-tests to comparesubgroup differences on behavioral measures.

Figure 3.

TDA results. An Iris rendering of a Reeb graph of the FXS data,

with labels and ellipses added to indicate the subgroups LH and

LL. Note that the size of each node corresponds to the number

of subjects that were clustered to form that node, and that an

edge between two nodes indicates that the corresponding clus-

ters have a subject in common. The light blue node between the

subgroups LH and LL is not included in either ellipse because it

contains two subjects, one from each subgroup; nevertheless,

the edges connected to this node indicate that each of the these

subjects is also contained in a neighboring node within the

appropriate subgroup. [Color figure can be viewed in the online

issue, which is available at wileyonlinelibrary.com.]

r Topology Finds Fragile X Syndrome Phenotypes r

r 4909 r

Page 7: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

Univariate Analyses of Magnetic Resonance

Images

Between subgroup contrasts using gray and white matter

images were performed with FSL’s randomize tool (http://

fsl.fmrib.ox.ac.uk/fsl/fslwiki/Randomise) with 5000 per-

mutations and threshold-free cluster enhancement [Smith

and Nichols, 2009] to correct for multiple comparisons.

Images corresponding to P< 0.005 (corrected) were retained.

RESULTS

Multivariate Pattern Analysis

The parameter selections described above decomposedthe data into two large subgroups and two smaller sub-groups, as well as a single disconnected node containingjust one subject (Fig. 3). In addition to classification bysize, these subgroups could also be classified by PC1;indeed, size (small or large) and average value of PC1

Figure 4.

2D and 3D gray matter differences. Top: Red color indicates

voxels where gray matter volume is significantly enlarged in the

LH subgroup relative to LL (P< 0.005, corrected). Bottom: ante-

rior, superior, and lateral views of 3D rendering of all such vox-

els, color-coded by region: noninsular cortex (red), caudate

(magenta), putamen (purple), insular cortex (peach), thalamus

(cyan), hippocampus (blue), amygdala (green), cerebellum (yel-

low), and brain stem (lavender). The full MNI 152 cortex (pink)

is included as a reference. In lower right view, cortical and ante-

rior structures were removed to expose posterior structures,

so that in addition to the superior surface of the cerebellum, a

separate region of the cerebellar vermis is visible. [Color figure

can be viewed in the online issue, which is available at wileyonli-

nelibrary.com.]

r Romano et al. r

r 4910 r

Page 8: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

(low or high) together clearly distinguished the four sub-groups from one another. Accordingly, we will refer to thesubgroups based on these Size and PC1 characteristics,respectively, as large-low (LL, n 5 19, mean [SD] age, 2.88[0.67] years), large-high (LH, n 5 18, mean [SD] age, 3.13[0.41] years), small-low (SL, n 5 4, mean [SD] age, 1.76[0.20] years), and small-high (SH, n 5 10, mean [SD] age,2.91 [0.55] years). Because of the limited sample size of thesmaller subgroups (n 5 4 and 10), and because SL and SHdiffer significantly in age from each other, our subsequentanalyses focused on comparing the two larger subgroups(LL and LH).

Univariate VBM Results

Voxel-wise comparison of the grey and white matterimages for the LL and LH subgroups showed widespreadsignificant and directionally uniform differences, with theLH subgroup consistently showing enlarged volumes rela-tive to LL. Gray matter differences were bilaterally sym-metric, encompassing regions of the cortex and of multiplesubcortical structures as well as of the cerebellum. Asidefrom extensive midline cortical differences, differenceswere also found in the insular cortex, the inferior orbito-frontal cortex, the posterior region of the temporal lobe,and the inferior parietal lobule. Subcortical gray matterdifferences were found in the caudate, putamen, thalamus,hippocampus, and amygdala; cerebellar gray matter differ-ences were observed in the superior surface of the cerebel-

lum and the cerebellar vermis. White matter differenceswere also bilaterally symmetric, encompassing a largeregion of subcortical white matter as well as regions ofcerebellar and brainstem white matter. The 2D gray matterimages in the top portion of Figure 4 show those voxelsthat are significantly enlarged in LH relative to LL(P< 0.005) on a multiplanar image selected at a mid-sagittal location; the collection of all such voxels across thewhole brain are rendered as 3D images in the bottom por-tion of Figure 4, in Figure 5, and in Supporting Informa-tion Video 1. The corresponding white matter imagesappear in Figure 6 and in Supporting Information Video 2.All 3D images were produced with the 3D Viewer plug-in[Schmid et al., 2010], as implemented in the Fiji distribu-tion [Schindelin et al., 2012] of the ImageJ image-processing software suite [Schneider et al., 2012].

Behavioral Results

Comparison of key behavioral measures for the LL andLH subgroups showed significant differences across all ofthe standardized measures comprising the Mullen Scalesof Early Learning (MSEL–Table II; higher scores indicatehigher cognitive function), as well as across all ADOS andADI measures (Tables III and IV; lower scores indicate lesssevere symptoms) except those for play (ADOS,P 5 0.0829) and repetitive or stereotyped behavior (ADOS,P 5 0.0883; ADI, P 5 0.368). All significant differences(P< 0.05) were coupled with effect sizes greater than 0.5(Cohen’s d), with large effect sizes for the MSEL Compos-ite (0.95), Receptive Language (0.82), Visual Reception(0.82) standardized scores. These differences uniformlyplace the LH subgroup on the more severe (or lower func-tioning) end of the spectrum on every measure. Behavioralcomparisons of LL and LH with SH, the larger of the twosubgroups excluded from our analysis, can be found inSupporting Information Tables I–VI).

DISCUSSION

Although one might naturally expect variation withinFXS at the “macro” level of behavior, our study suggeststhat FXS may lack homogeneity at the neurobiological levelas well despite its single-gene origin. Indeed, our resultssuggest that the LH and LL subgroups may correspond toseparable phenotypes within FXS that have significantly dif-ferent neuroanatomical and behavioral profiles. These sub-groups accounted for a large proportion of 1- to 3-year-oldboys with FXS in our sample (71%). It is interesting to notethat the neuroanatomical and behavioral differencesbetween LH and LL are analogs to differences previouslynoted to distinguish young children with FXS fromtypically-developing children. Specifically, in addition tothe LH subgroup scoring lower on measures of IQ andhigher on measures of autism-related behaviors, the regionsof the brain that are enlarged in LH relative to LL (e.g., the

Figure 5.

3D gray matter differences: oblique view. Oblique anterior-

superior-lateral view of 3D rendering of all voxels where gray

matter volume is significantly enlarged in the LH subgroup rela-

tive to LL (P< 0.005, corrected), color-coded by region: nonin-

sular cortex (red), caudate (magenta), putamen (purple), insular

cortex (peach), thalamus (cyan), hippocampus (blue), amygdala

(green), and cerebellum (yellow). The full MNI 152 cortex (pink)

is included as a reference. [Color figure can be viewed in the

online issue, which is available at wileyonlinelibrary.com.]

r Topology Finds Fragile X Syndrome Phenotypes r

r 4911 r

Page 9: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

Figure 6.

2D and 3D white matter differences. Top: Red color indicates

voxels where white matter volume is significantly enlarged in

the LH subgroup relative to LL (P< 0.005, corrected). Bottom:

Anterior, superior, lateral, and oblique views of 3D rendering of

all such voxels, color-coded by region: subcortical white matter

(magenta), cerebellar white matter (lime green), and brain stem

white matter (dark green). The full MNI 152 cortex (pink) is

included as a reference. In lower right, an oblique superior-

posterior view was used to highlight the separation between the

subcortical and cerebellar/brain stem regions. [Color figure can

be viewed in the online issue, which is available at wileyonlineli-

brary.com.]

TABLE II. Mullen scales of early learning

Standardized subscale LL mean(SD) LH mean (SD) Difference P-value Cohen’s d

Composite 58.68 (11.29) 50.67 (3.43) 8.02 0.004 0.95Receptive Language 27.89 (10.65) 20.94 (2.04) 6.95 0.006 0.89Visual Reception 28.89 (10.32) 22.11 (5.18) 6.78 0.008 0.82Expressive Language 26.21 (8.84) 21.72 (3.61) 4.49 0.026 0.65Fine Motor 23.37 (4.34) 20.50 (2.12) 2.87 0.043 0.58

r Romano et al. r

r 4912 r

Page 10: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

caudate, putamen, and thalamus) are consistent with theregions reported to be enlarged in FXS relative to typically-developing controls [Hazlett et al., 2012; Hoeft et al., 2008],with the notable exceptions of the amygdala and cerebellarvermis, which appear enlarged in LH relative to LL butreduced in FXS relative to typically-developing controls. Itis also potentially noteworthy that the LH subgroup showsenlarged volume of the middle cerebellar peduncles (MCP),white matter tracts that connect the cerebellum to the pons(Fig. 7). These same MCP tracts are also the location in

which T2 hyperintensities occur in patients with fragile Xtremor/ataxia syndrome (FXTAS), often referred as the“MCP sign” in males with the “premutation” form of theFMR1 gene mutation [Brunberg et al., 2002]. The signifi-cance of this finding in the LH subgroup and its possibleassociation with neuroimaging findings in FXTAS is apotential area for future investigation.

As to a plausible explanation for the differences betweenLL and LH, one could speculate about the existence of a sec-ond genetic factor that, through its interaction with FMR1,leads to a bifurcation in the developmental trajectories of

TABLE III. Autism diagnostic observation schedule-generic

Subscale LL mean (SD) LH mean (SD) Difference P-value Cohen’s d

Social 5.00 (3.99) 7.83 (3.71) 22.83 0.016 20.73Communication/Social 8.47 (6.02) 12.39 (4.96) 23.92 0.019 20.71Communication 3.47 (2.20) 4.56 (1.65) 21.08 0.049 20.55Stereotyped Behavior 1.53 (1.80) 2.28 (1.41) 20.75 0.083 20.46Play 2.63 (1.34) 3.22 (1.26) 20.59 0.088 20.45

TABLE IV. Autism diagnostic interview–revised

Subscale LL mean (SD) LH mean (SD) Difference P-value Cohen’s d

Social 7.56 (4.88) 10.71 (5.46) 23.15 0.041 20.61Communication (nonverbal) 7.81 (4.39) 10.19 (3.17) 22.38 0.045 20.62Abnormal development 3.89 (1.02) 4.41 (0.80) 20.52 0.050 20.57Communication (verbal) 3.67 (3.79) 10.00 (2.83) 26.33 0.064 21.88Repetitive and stereotyped behavior 3.06 (1.80) 3.24 (1.30) 20.18 0.368 20.11

Figure 7.

MCP sign and white matter differences. Left: Red color indicates

voxels where white matter volume is significantly enlarged in

the subgroup LH relative to LL (P< 0.005, corrected), and blue

ellipses indicate region spatially analogs to the “MCP sign” in

FXTAS. Right: Oblique posterior 3D view with the coronal slice

of the left image placed in its correct orientation. Note that

opacity of the slice dims the brightness of white matter anterior

to the slice, so that the boundary between anterior and poste-

rior white matter coincides with the regions shown in red in

the left image. [Color figure can be viewed in the online issue,

which is available at wileyonlinelibrary.com.]

Figure 8.

Unlabelled PCA scatter plot. Scatter plot of PC2 versus PC1

values for each subject, obtained from applying PCA to data

matrix used in TDA (Compare with Supporting Information Fig.

1, in which the subgroups are color-coded).

r Topology Finds Fragile X Syndrome Phenotypes r

r 4913 r

Page 11: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

children with FXS. For example, a similar bifurcation hasbeen suggested in 22q.11.2 deletion syndrome, where func-tional polymorphisms of the COMT gene may contribute toreduced prefrontal cortical gray matter and increased riskfor developing psychosis [Gothelf et al., 2005]. Further evi-dence for gene interaction comes from studies of Fmr1knockout mice, which show that differences in genetic back-ground are associated to significant differences in socialapproach. Research designed to test this hypothesis couldbe carried out focusing on genotypic variations in signalingpathways known to interact with the FMR1 protein [Ascanoet al., 2012]. Given that the data analyzed here are from thefirst time point in a longitudinal study of young boys withFXS [Hoeft et al., 2010], we also plan to perform TDA withdata collected at 3–6 years of age in the same sample, aswell as with data from an independent sample of adultswith FXS. These future studies will also attempt to exploreapproaches that allow recovery of 3D structural informationthat is lost as a consequence of MRI image vectorization.

As noted in [Nicolau et al., 2011], the power of TDA forrevealing the structure of high-dimensional data lies in itsorigins within the field of topology, the area of mathematicsthat is specifically concerned with characterizing the shapeof high-dimensional geometric structures. The tools oftopology allow TDA to drastically reduce the dimension ofhigh-dimensional datasets while at the same time shed lighton critical aspects of their structure, accomplishing bothwithout sacrificing subtle but important information thatcan easily be lost through standard approaches such as PCAand linkage-based clustering. The advantage of using TDAover PCA alone can be easily seen in Figure 8, where thescatter plot of the same values of PC1 and PC2 used in ourTDA to identify the subgroups LH and LL has no obviousstructure. (Note that SVMs require a priori labels, and sowould not be appropriate in the current context).

LIMITATIONS

Questions concerning inference and validation—naturalquestions that arise inevitably in connection to any novelanalytical approach—are the subject of active ongoingresearch in TDA. Furthermore, the most direct means ofvalidating our results, namely replication, is not one thatis available to us at this time. As mentioned earlier, we doplan to extend our analysis to include data from the samechildren at later ages, which would provide confirmationof our findings as they apply specifically to our study pop-ulation; however, replicating our finding in an independ-ent population of children with FXS within the same ageas our cohort is clearly a necessary future step in validat-ing our approach.

CONCLUSIONS

As more attention is brought to bear on the limitationsof behaviorally-defined taxonomies of psychiatric disor-

ders [Cross-Disorder Group of the Psychiatric GenomicsConsortium, 2013], the case for replacing these taxonomieswith new approaches based on high-dimensional, multi-modal data becomes more compelling. By integratingbehavioral data with imaging and genetic data, our abilityto cut the landscape of neuropsychiatric disorders alongits natural joints will likely be enhanced, thus improvingour ability to identify the boundaries between disorderswith greater neurobiological validity more accurately.TDA is one method that can contribute to this newapproach to brain disorders.

ACKNOWLEDGMENTS

Access to Iris software was provided by Ayasdi, Inc., ofwhich Gunnar Carlsson is a cofounder. David Romanowould like to thank Manish Saggar for suggesting FSLand for his help with FSL’s randomise tool, as well as Fran-cisco Pereira for his helpful comments on vectorization.

REFERENCES

Abrahams BS, Geschwind DH (2008): Advances in autism genet-ics: on the threshold of a new neurobiology. Nat Rev Genet 9:341–355.

Ascano M, Mukherjee N, Bandaru P, Miller JB, Nusbaum JD,Corcoran DL, Langlois C, Munschauer M, Dewell S, Hafner M,Williams Z, Ohler U, Tuschl T. (2012): FMRP targets distinctmRNA sequence elements to regulate protein expression.Nature 492:382.

Bray S, Chang C, Hoeft F (2009): Applications of multivariate pat-tern classification analyses in developmental neuroimaging ofhealthy and clinical populations. Front Hum Neurosci 3:30.

Brock M, Hatton D (2010): Distinguishing features of autism inboys with fragile X syndrome. J Intellect Disabil Res 54:894–905.

Brunberg JA, Jacquemont S, Hagerman RJ, Berry-Kravis EM,Grigsby J, Leehey MA, Tassone F, Brown WT, Greco CM,Hagerman PJ (2002): Fragile X premutation carriers: Character-istic MR imaging findings of adult male patients with progres-sive cerebellar and cognitive dysfunction. Am J Neuroradiol23:1757–1766.

Calhoun VD, Eichelel T, Adali T, Allen EA (2012): Decomposingthe brain: components and modes, networks and nodes.Trends Cogn Sci 16:255–256.

Carlsson G (2009): Topology and data. Bull. Am Math Soc (N.S.)46:255–308.

Cross-Disorder Group of the Psychiatric Genomics Consortium(2013): Identification of risk loci with shared effects on fivemajor psychiatric disorders: a genome-wide analysis. Lancet381:1371–1379.

Fair DA, Bathula D, Nikolas MA, Nigg JT (2012): Distinct neuro-psychological subgroups in typically developing youth informheterogeneity in children with ADHD. Proc Natl Acad SciUSA 109:6769–6774.

Gotham K, Pickles A, Lord C (2009): Standardizing ADOS scoresfor a measure of severity in autism spectrum disorders. JAutism Dev Disord 39:693–705.

Gothelf D, Eliez S, Thompson T, Hinard C, Penniman L, FeinsteinC, Kwon H, Jin S, Jo B, Antonarakis SE, Morris MA, Reiss AL.

r Romano et al. r

r 4914 r

Page 12: Topological Methods Reveal High and Low Functioning Neuro ... · Topological Methods Reveal High and Low Functioning Neuro-Phenotypes Within Fragile X Syndrome David Romano,1* Monica

(2005): COMT genotype predicts longitudinal cognitive declineand psychosis in 22q11.2 deletion syndrome. Nat Neurosci 8:1500–1502.

Hazlett HC, Poe MD, Lightbody AA, Styner M, MacFall JR, ReissAL, Piven J (2012): Trajectories of early brain volume develop-ment in fragile X syndrome and autism. J Am Acad ChildAdolesc Psychiatry 51:921–933.

Hoeft F, Lightbody AA, Hazlett HC, Patnaik S, Piven J, Reiss AL(2008): Morphometric spatial patterns differentiating boys withfragile X syndrome, typically developing boys, and develop-mentally delayed boys aged 1 to 3 years. Arch Gen Psychiatry65:1087–1097.

Hoeft F, Carter JC, Lightbody AA, Cody Hazlett H, Piven J, ReissAL (2010): Region-specific alterations in brain development inone- to three-year-old boys with fragile X syndrome. Proc NatlAcad Sci 107:9335–9339.

Hoeft F, Walter E, Lightbody AA, Hazlett HC, Chang C, Piven J,Reiss AL (2011): Neuroanatomical differences in toddler boyswith fragile X syndrome and idiopathic autism. Arch Gen Psy-chiatry 68:295–305.

Jacquemont S, Curie A, des Portes V, Torrioli MG, Berry-Kravis E,Hagerman RJ, Ramos FJ, Cornish K, He Y, Paulding C, Neri G,Chen F, Hadjikhani N, Martinet D, Meyer J, Beckmann JS,Delange K, Brun A, Bussy G, Gasparini F, Hilse T, Floesser A,Branson J, Bilbe G, Johns D, Gomez-Mancilla B. (2011): Epige-netic modification of the FMR1 gene in fragile X syndrome isassociated with differential response to the mGluR5 antagonistAFQ056. Sci Transl Med 3:64ra1.

Lord C, Rutter M, Couteur A (1994): Autism diagnostic interview-revised: A revised version of a diagnostic interview for care-givers of individuals with possible pervasive developmentaldisorders. J Autism Dev Disord 24:659–685.

Lord C, Risi S, Lambrecht L, Cook EH, Leventhal BL, DiLavorePC, Pickles A, Rutter M (2000): The autism diagnostic observa-tion schedule-generic: A standard measure of social and com-munication deficits associated with the spectrum of autism. JAutism Dev Disord 30:205–223.

Lum PY, Singh G, Lehman A, Ishkanov T, Vejdemo-Johansson M,Alagappan M, Carlsson J, Carlsson G. (2013): Extractinginsights from the shape of complex data using topology. SciRep 3:1236.

Nicolau M, Levine AJ, Carlsson G (2011): Topology based dataanalysis identifies a subgroup of breast cancers with a uniquemutational profile and excellent survival. Proc Natl Acad Sci108:7265–7270.

Pereira F, Mitchell T, Botvinick M (2009): Machine learning classi-fiers and fMRI: A tutorial overview. Neuroimage 45:S199–S209.

Schindelin J, Arganda-Carreras I, Frise E, Kaynig V, Longair M,Pietzsch T, Preibisch S, Rueden C, Saalfeld S, Schmid B,Tinevez JY, White DJ, Hartenstein V, Eliceiri K, Tomancak P,Cardona A. (2012): Fiji: An open-source platform forbiological-image analysis. Nat Methods 9:676–682.

Schmid B, Schindelin J, Cardona A, Longair M, Heisenberg M(2010): A high-level 3D visualization API for Java and ImageJ.BMC Bioinformatics 11.

Schneider CA, Rasband WS, Eliceiri KW (2012): NIH Image toImageJ: 25 years of image analysis. Nat Methods 9:671–675.

Smith SM, Nichols TE (2009): Threshold-free cluster enhancement:Addressing problems of smoothing, threshold dependence andlocalisation in cluster inference. Neuroimage 44:83–98.

Wolff JJ, Bodfish JW, Hazlett HC, Lightbody AA, Reiss AL, PivenJ (2012): Evidence of a distinct behavioral phenotype in youngboys with fragile X syndrome and autism. J Am Acad ChildAdolesc Psychiatry 51:1324–1332.

r Topology Finds Fragile X Syndrome Phenotypes r

r 4915 r