abstract arxiv:1804.01296v1 [cs.cv] 4 apr 2018...gaussian process uncertainty in age estimation as a...

Gaussian Process Uncertainty in Age Estimation as a Measure of Brain Abnormality

Benjamin Gutierrez Becker a,c,∗∗, Tassilo Klein b, Christian Wachinger a, for the Alzheimer’s Disease Neuroimaging Initiative 3

a Artificial Intelligence in Medical Imaging (AI-Med), Department of Child and Adolescent Psychiatry, LMU München, Germany

b SAP SE Berlin, Germany

c CAMP, Technische Universität München, Germany

Abstract

Multivariate regression models for age estimation are a powerful tool for assessing abnormal brain morphology associated with neuropathology. Age prediction models are built on cohorts of healthy subjects and are designed to reflect normal aging patterns. The application of these multivariate models to diseased subjects usually results in high prediction errors, under the hypothesis that neuropathology presents a degenerative pattern similar to that of accelerated aging. In this work, we propose an alternative to the idea that pathology follows a trajectory similar to that of normal aging. Instead, we propose the use of metrics which measure deviations from the mean aging trajectory. We propose to measure these deviations using two different metrics: uncertainty in a Gaussian process regression model and a newly proposed age-weighted uncertainty measure. Consequently, our approach assumes that pathologic brain patterns are different from those of normal aging. We present results for subjects with autism, mild cognitive impairment and Alzheimer’s disease to highlight the versatility of the approach to different diseases and age ranges. We evaluate volume, thickness, and VBM features for quantifying brain morphology. Our evaluations are performed on a large number of images obtained from a variety of publicly available neuroimaging databases. Across all features, our uncertainty-based measurements yield a better separation between diseased subjects and healthy individuals than the prediction error. Finally, we illustrate differences between the disease pattern and normal aging, supporting the application of uncertainty as a measure of neuropathology.

1. Introduction

The brain is a complex organ whose morphology varies substantially across the population. The causes of morphological variation have not yet been fully understood, but several studies have reported on potential causal factors including age (Guttmann et al., 1998; Franke et al., 2010; Ziegler et al., 2012; Wachinger et al., 2015), sex (Ingalhalikar et al., 2014), pathologies like dementia (Gaser et al., 2013; Wachinger et al., 2016), and even environmental factors such as education and physical activity (Steffener et al., 2016). Among all these variables, age was shown to be the main factor determining brain morphology (Potvin et al., 2017). Due to the wide impact of aging on brain morphology, multivariate regression methods using features based on brain morphology can in turn be used to estimate a subject’s age. A recent volume of work has focused on modeling the normal aging of healthy individuals to predict a subject’s age. Obtaining a prediction of the age with imaging features was shown to be useful to derive imaging biomarkers, which can potentially be used to predict brain anomaly caused by disease (Cole and Franke, 2017).

∗ This paper has been accepted for publication in NeuroImage.

∗∗ Corresponding author. Waltherstrasse 23. KJP Forschungsabteilung. 80337, Munich, Germany

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf

Email addresses: [email protected] (Benjamin Gutierrez Becker), [email protected] (Tassilo Klein), [email protected] (Christian Wachinger)

The task of predicting age from brain images has been formulated as multivariate regression, where a predictive model is trained to relate structural information obtained from brain MR images to the chronological age of healthy subjects (Gaser et al., 2013; Wang et al., 2014; Kondo et al., 2015; Valizadeh et al., 2017; Liem et al., 2017). The prediction from these models is interpreted as an estimate of a subject’s biological age, in contrast to a subject’s chronological age. Of particular interest is the prediction error, which is defined as the difference between the biological and chronological age (figure 1a). When predicting the age of healthy subjects, the prediction error

Preprint submitted to Elsevier April 5, 2018


is assumed to be small, while the prediction on subjects with neuropathology is assumed to result in large positive prediction errors. The error could therefore serve as a personalized marker of pathological processes (Franke et al., 2010; Gaser et al., 2013). The main assumption behind using the prediction error as a measure of pathology is that changes caused by neuropathology are equivalent to an accelerated aging process. Following this hypothesis, Gaser et al. (2013) and Habes et al. (2016) showed that changes related to Alzheimer’s disease (AD) resemble accelerated aging, since differences between biological and chronological age are larger for individuals with AD than for healthy controls. Similar results on age differences have also been reported for individuals diagnosed with schizophrenia (Nenadic et al., 2017) and depression (Koutsouleris et al., 2013). In these studies, the age prediction model is trained using only images from healthy individuals. This means that, contrary to their discriminative counterparts, a single age prediction model can be used to assess differences between healthy controls and individuals diagnosed with different conditions.

Although the findings of these studies show the great potential of using models of healthy aging to assess brain abnormality, a potentially limiting factor when quantifying neuropathology through the difference between chronological and predicted age is the assumption that morphological changes caused by disease follow an accelerated aging process. The assumption that brain anomaly is equivalent to accelerated aging may hold true for specific brain regions that accommodate neural systems with high susceptibility to deleterious factors, which are therefore affected by aging and disease processes. However, the assumption of accelerated aging likely does not extend to the whole brain, given that differences in brain morphology are caused by a variety of neurobiological processes that are complex and non-linear (Fjell et al., 2014; Buckner, 2004; Hedden and Gabrieli, 2004). This is potentially problematic, as current approaches for predicting the brain age are based on multivariate regression models that operate on gray matter maps or morphological features across the entire brain.

In this work, we build on the idea of modeling neuropathology as deviations from the healthy development of the brain. Our main hypothesis is that disease and aging result in brain-wide patterns of change. These patterns are not independent from each other and are essentially similar for several brain regions where disease results in patterns resembling accelerated aging. However, these accelerated aging patterns do not extend to the whole brain, making the assessment of deviations from healthy aging solely through prediction error problematic. We propose instead the use of Gaussian process regression (GPR). GPR can measure how a new subject deviates from previous observations used to construct the model by means of the posterior prediction uncertainty. Unlike prediction error, GPR uncertainty is able to measure deviations from the healthy aging model without the implicit assumption of a brain-wide accelerated aging pattern. Particularly for the task of age prediction, we introduce a variation to traditional Gaussian process regression that takes the known chronological age into account. This modification yields a weighted uncertainty measure. We evaluate our new method on a large collection of images obtained from several public datasets for assessing the variation to normal aging in mild cognitive impairment, Alzheimer’s disease, and autism. Our results support the use of Gaussian process uncertainty and the age-weighted uncertainty as tools to measure neuropathological patterns that deviate from healthy aging. Similar to previous age prediction models, our evaluations are done with a single age prediction model which is trained only on healthy controls, showing its versatility across different age ranges and diseases.

2. Materials and Methods

2.1. Method overview

In this section, we describe our method for assessing neuropathology based on GPR uncertainty. Figure 2 presents an overview of our method, which consists of two stages. In the training stage (top section of figure 2), we build a GPR model that estimates the chronological age of healthy subjects. This model is built using a dataset of MRI scans of healthy controls (section 2.2). Images are processed and segmented to extract a set of features describing brain morphology (section 2.3). Finally, a GPR model mapping the extracted features to a predicted age is trained on these features (section 2.4).

In the testing stage (bottom section of figure 2), we use the GPR model trained on healthy subjects to quantify deviations from the normal aging pattern on previously unseen subjects. In this stage, morphological features are extracted from the MR images of the test subjects, and these features are then used to obtain an estimate of the age of the subject using the GPR model. From the GPR model, we obtain the estimated age y, an uncertainty measure of the estimation cov(y), and a weighted uncertainty measure covw(y) (see section 2.4.1 for details on these measurements). We will show in our experiments (section 3) that these measurements based on the uncertainty of the GPR model can be used to assess the similarity between subjects in the testing set and the healthy population in the training set.

2.2. Data

Similar to previous work on age prediction, we train an age regression model based on T1-MR images of healthy individuals. The training images for our age regression model are extracted from three different databases: IXI 1, ABIDE (Di Martino et al., 2014), and AIBL (Ellis et al., 2009). Details for each training dataset are shown in table 1. We perform evaluation on 3 different test datasets summarized in table 2. The training and testing groups are therefore extracted from different databases and are independent from each other. For the first and second groups, obtained from the OASIS (Marcus et al., 2007) and ADNI (Jack et al., 2008) datasets, we aim at finding differences between Healthy Controls (HC), individuals diagnosed with Mild Cognitive Impairment (MCI), and subjects diagnosed with Alzheimer’s disease (AD). For the third dataset, ABIDE II, we look for differences between HC and individuals diagnosed with autism. In total, 1,543 images were used for training and 4,819 images for testing.

1 http://brain-development.org/ixi-dataset/

(a) Prediction error ε. A series of data points are plotted showing their chronological age y and their predicted biological age ŷ. Training points used to train the model are shown in blue. Prediction error ε for a test point is defined as the difference between its predicted biological age and its chronological age.

(b) Prediction uncertainty cov(y). Data points are plotted on the feature space determined by two volumetric features. Circles correspond to training subjects and non-circles to testing subjects. Uncertainty cov(y) measures the distance of a testing point to all training points in the feature space. The background color corresponds to uncertainty in the feature space. Areas close to the training points have lower degrees of uncertainty. The color of each subject encodes its chronological age.

(c) Weighted uncertainty covw(y). Data points are shown on an extended feature space given by two volumetric features and their corresponding chronological age y. Weighted uncertainty covw(y) measures the distance of a testing point to all training points in this extended feature space.

Figure 1: Comparison between the three evaluated anomaly metrics: prediction error ε, GPR uncertainty cov(y) and GPR age-weighted uncertainty covw(y).

Figure 2: Overview of our brain anomaly prediction model. The top part corresponds to the training stage where a set of images from healthy individuals is used to build a GPR age prediction model. The bottom part corresponds to the age prediction stage where the GPR model is used to predict the age of a set of test images. A predicted age y as well as the uncertainty measures cov(y) and covw(y) are obtained. These measurements can be used to find differences between the HC and Dx groups.

2.3. Feature Extraction

As mentioned in the overview section of our method, we need to extract features from structural MR images to quantify the brain morphology. Several image-based features have previously been used for the age estimation task: Good et al. (2001), Franke et al. (2010) and Gaser et al. (2013) used Voxel Based Morphometry (VBM) features, which identify differences in the composition of brain tissue by registering all structural images to the same space. After segmenting gray and white matter, the voxel values of the extracted gray matter images are used as features. Finally, the dimensionality of the feature space is reduced using Principal Component Analysis (PCA), keeping only the principal modes of variation. Valizadeh et al. (2017) and Wang et al. (2014) use volumetric, thickness and curvature measurements of the brain, derived from the brain segmentation with FreeSurfer (Fischl, 2012). These different sets of features have been presented in separate studies but, to the best of our knowledge, have not yet been directly compared on the same age estimation task. In summary, we use three different types of features in our approach:

• VBM features (50 principal components),

• thickness of 70 cortical structures, and

• volume of 50 brain structures.

Additionally, we build a prediction model combining VBM, thickness and volume features together. VBM features were extracted using the CAT12 toolbox 2 together with the SPM12 toolbox 3 for segmentation. The preprocessing of the images, the segmentation of gray matter, and post-processing were implemented in line with the pipeline proposed by Franke et al. (2010). Dimensionality reduction was performed using the PCA library included in the scikit-learn toolbox (Pedregosa et al., 2011). The principal component directions were estimated using only the training sample, and the testing data was projected to this estimated lower dimensional space. For all the analyses on thickness and volume features, FreeSurfer version 5.3 was used. The default Desikan/Killiany atlas was used for the parcellation to obtain thickness measurements. We use all subcortical volume measurements as provided by FreeSurfer and described in the FreeSurfer subcortical segmentation pipeline.
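The train-only estimation of the principal directions can be illustrated with a minimal sketch. It uses a plain NumPy SVD as a stand-in for the scikit-learn PCA class named above, and the array sizes and the names `fit_pca` and `project` are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_pca(X_train, n_components):
    """Estimate the mean and principal directions from the training sample only."""
    mean = X_train.mean(axis=0)
    # Rows of Vt are the principal directions of the centered training data.
    _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, Vt[:n_components]

def project(X, mean, components):
    """Project samples onto directions that were estimated on the training sample."""
    return (X - mean) @ components.T

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 500))  # hypothetical: 200 training subjects, 500 voxel values
X_test = rng.normal(size=(40, 500))    # hypothetical: 40 held-out subjects

mean, components = fit_pca(X_train, n_components=50)
Z_train = project(X_train, mean, components)
Z_test = project(X_test, mean, components)  # test data never influences the directions
```

Reusing the training mean and directions for the test data keeps the test set out of the model fit, which is the point of the procedure described above.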

2 http://www.neuro.uni-jena.de/cat/

3 http://www.fil.ion.ucl.ac.uk/spm/


| Training Dataset | No. Images | Female/Male | Age (Min-Max) | Age Quantiles |
|---|---|---|---|---|
| IXI | 581 | 311/270 | 19-87 | 33.7 - 48.6 - 62.2 |
| ABIDE I | 573 | 99/474 | 6-64 | 14.6 - 17.0 - 20.1 |
| AIBL | 409 | 209/200 | 55-92 | 67.0 - 73.0 - 79.0 |

Table 1: Summary of the datasets used for training the age prediction model.

| Testing Dataset | No. Images | Female/Male | Age (Min-Max) | Age Quantiles | Target Dx |
|---|---|---|---|---|---|
| ADNI | 3591 | 1422/2169 | 54-90 | 71.2 - 75.1 - 79.7 | MCI - AD |
| OASIS | 196 | 129/67 | 60-82 | 71.0 - 76.0 - 82.0 | MCI - AD |
| ABIDE II | 1032 | 247/785 | 5-64 | 9.5 - 11.4 - 15.2 | Autism |

Table 2: Summary of the datasets used for testing.

2.4. Uncertainty Estimation with Gaussian Process Regression

Several multivariate regression techniques have been previously used for the task of age prediction from brain MR images. A detailed comparison of the performance of neural networks, random forests, k-nearest neighbors, support vector machines, multiple linear regression and ridge regression was presented by Valizadeh et al. (2017). In the work by Franke et al. (2010), relevance vector regression was preferred.

In our case, we are interested in modeling the age regression problem with a model that not only provides estimates of the biological age, but also provides uncertainties of these estimates. Gaussian process regression achieves an accuracy in age regression comparable to that of standard regression techniques, while offering the advantage of providing an estimate of the uncertainty of each prediction. GPR models have been used successfully before as age prediction models (Cole et al., 2015, 2016), but the potential of using uncertainty-based measurements as biomarkers has not been explored yet. In this section, we briefly introduce GPR models, focusing particularly on the calculation of uncertainty, and introduce our modification for computing an age-weighted uncertainty. We refer the reader to Rasmussen and Williams (2005) for a more detailed explanation.

2.4.1. Gaussian Process

A Gaussian process is defined as a collection of random variables, any finite number of which have a joint Gaussian distribution (Rasmussen and Williams, 2005) with:

• a mean function m(x) = E[f(x)],

• and a covariance function k(x,x′) = E[(f(x)−m(x))(f(x′)−m(x′))].

Although not necessary, it is often assumed that the mean function m(x) of the GPR is zero. Therefore the design of a GPR is focused on the selection of an appropriate covariance function k(x,x′) measuring the similarity between data points. This covariance function is equivalent to a similarity measure between two data points, giving large values for points that are close to each other and small values otherwise. In our case we define the covariance function as a squared exponential function of the form:

$$k(x_i, x_j) = \sum_{k=1}^{K} \exp\left[ -\frac{(x_i^k - x_j^k)^2}{2 l_k^2} \right] + \sigma_n^2 \, \delta(x_i, x_j), \tag{1}$$

where $\sigma_n^2$ is the noise variance, $l_k$ is the length scale of the k-th feature, and δ is the Kronecker delta function. We can think of the length scale vector $l \in \mathbb{R}^K$ as a parameter controlling how close two data points $x_i$ and $x_j$ should be in order to influence each other. In general, the smaller an element $l_k$ is, the more dependent y is on the feature element $x^k$.
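As an illustration, the covariance function of Eq. (1) can be sketched in NumPy exactly as it is written, that is, as a sum of per-feature squared exponential terms. The function name `se_kernel` and the toy inputs are hypothetical, and applying the noise term only to identical feature vectors is one possible reading of the Kronecker delta.

```python
import numpy as np

def se_kernel(A, B, length_scales, noise_var=0.0):
    """Covariance of Eq. (1): a sum over features of squared exponential terms.

    A is an (m, K) array, B an (n, K) array, length_scales a (K,) array.
    The noise term is added only where a row of A equals a row of B,
    which is an assumed reading of the Kronecker delta.
    """
    diff = A[:, None, :] - B[None, :, :]                     # shape (m, n, K)
    terms = np.exp(-diff ** 2 / (2.0 * length_scales ** 2))  # one term per feature
    same = (diff == 0).all(axis=2)                           # identical feature vectors
    return terms.sum(axis=2) + noise_var * same

# Toy data: three subjects described by two features.
X = np.array([[0.0, 0.0], [0.0, 1.0], [3.0, 0.0]])
K = se_kernel(X, X, length_scales=np.array([1.0, 1.0]), noise_var=0.1)
# Shrinking the length scale of the second feature makes differences in it count more.
K_short = se_kernel(X, X, length_scales=np.array([1.0, 0.5]), noise_var=0.1)
```

With the shorter length scale, the covariance between the first two subjects (which differ only in the second feature) drops, matching the remark that a small $l_k$ makes the output more dependent on that feature.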

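We model the joint distribution of the training and test outputs as:

$$\begin{bmatrix} y \\ y' \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} K(X,X) & K(X,X') \\ K(X',X) & K(X',X') \end{bmatrix} \right). \tag{2}$$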

The elements of the joint distribution in Eq. (2) can be summarized as follows:

• an intra-covariance matrix of the training set $K(X,X) \in \mathbb{R}^{m \times m}$,

• an intra-covariance matrix of the testing set $K(X',X') \in \mathbb{R}^{n \times n}$,

• an inter-covariance matrix between the training and testing set $K(X,X') \in \mathbb{R}^{m \times n}$,

• a training labels vector $y \in \mathbb{R}^m$, and

• a testing labels vector $y' \in \mathbb{R}^n$,

where m corresponds to the number of training samples and n to the number of testing samples. The matrices K(X,X) have the form:

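$$K = \begin{bmatrix} k(x_1,x_1) & k(x_1,x_2) & \dots & k(x_1,x_m) \\ k(x_2,x_1) & k(x_2,x_2) & \dots & k(x_2,x_m) \\ \vdots & \vdots & \ddots & \vdots \\ k(x_m,x_1) & k(x_m,x_2) & \dots & k(x_m,x_m) \end{bmatrix}, \tag{3}$$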

where each element $k(x_i,x_j)$ corresponds to a measure of the similarity between two feature vectors $x_i$ and $x_j$. We are interested in predicting the values for the test labels y′, which have the following conditional distribution:

$$y' \mid y, X, X' \sim \mathcal{N}\big( K(X',X) K(X,X)^{-1} y,\; K(X',X') - K(X',X) K(X,X)^{-1} K(X,X') \big). \tag{4}$$

Using this conditional distribution we can derive the predictive equations of a GPR:

$$y' = E[y' \mid X, y, X'] = K(X',X) K(X,X)^{-1} y, \tag{5}$$

$$\mathrm{cov}(y) = K(X',X') - K(X',X) K(X,X)^{-1} K(X,X'), \tag{6}$$

which correspond to the predicted labels and to the estimated covariance, respectively. The estimated covariance can also be thought of as a measure of uncertainty for the predicted values. This uncertainty estimate is usually used in the GPR framework to measure the degree of confidence of a predicted value y′ by measuring the similarity of a new observation with respect to the previous observations in the training set.
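The predictive equations (5) and (6) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: it uses a standard isotropic squared exponential kernel rather than the exact form of Eq. (1), adds the noise variance only to the training covariance, and the names `rbf` and `gp_posterior` as well as the toy data are hypothetical.

```python
import numpy as np

def rbf(A, B, length_scale=1.0):
    """Isotropic squared exponential kernel (an assumed stand-in for Eq. (1))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def gp_posterior(X, y, X_new, noise_var=1e-2, length_scale=1.0):
    """Predicted labels (Eq. 5) and their covariance (Eq. 6)."""
    K = rbf(X, X, length_scale) + noise_var * np.eye(len(X))  # K(X, X) with noise
    K_s = rbf(X_new, X, length_scale)                         # K(X', X)
    K_ss = rbf(X_new, X_new, length_scale)                    # K(X', X')
    L = np.linalg.cholesky(K)                                 # stable alternative to inverting K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))       # K(X, X)^{-1} y
    v = np.linalg.solve(L, K_s.T)                             # L^{-1} K(X, X')
    return K_s @ alpha, K_ss - v.T @ v

# Toy one-dimensional data: one test point inside the training range, one far outside it.
X = np.linspace(0.0, 5.0, 20)[:, None]
y = np.sin(X[:, 0])
X_new = np.array([[2.5], [12.0]])
mean, cov = gp_posterior(X, y, X_new)
variances = np.diag(cov)
```

The test point far from all training points receives a variance close to the prior variance of 1, while the point surrounded by training data receives a small variance. This is the behavior that the uncertainty-based metrics rely on.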

When training a GPR model, the parameters $\theta = \{l, \sigma_n\}$ have to be tuned in order to fit the training data. This is done by maximizing the marginal likelihood of the model given by:

$$\log p(y \mid X, \theta) = -\frac{1}{2} y^{T} K(X,X)^{-1} y - \frac{1}{2} \log \lvert K(X,X) \rvert - \frac{m}{2} \log(2\pi). \tag{7}$$

By finding the parameters θ that maximize the marginal likelihood, we can obtain a GPR model that best fits the training data.
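A minimal sketch of this model selection step is given below. It evaluates the standard Gaussian log marginal likelihood through a Cholesky factorization and picks a length scale from a small grid. The grid search is a simplification assumed here for brevity (gradient-based optimizers are the usual choice), and the kernel, data, and names are hypothetical.

```python
import numpy as np

def rbf(A, B, length_scale):
    """Isotropic squared exponential kernel (an assumed stand-in for Eq. (1))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def log_marginal_likelihood(K, y):
    """Log marginal likelihood of labels y under a zero-mean Gaussian with covariance K."""
    m = len(y)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))  # K^{-1} y
    log_det_half = np.log(np.diag(L)).sum()              # equals 0.5 * log|K|
    return float(-0.5 * y @ alpha - log_det_half - 0.5 * m * np.log(2.0 * np.pi))

# Toy data: a smooth signal with a small amount of noise.
rng = np.random.default_rng(0)
X = np.linspace(0.0, 10.0, 30)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]                     # candidate length scales
scores = [log_marginal_likelihood(rbf(X, X, l) + 0.01 * np.eye(30), y) for l in grid]
best_length_scale = grid[int(np.argmax(scores))]
```

Length scales that are too short treat every subject as unrelated, and length scales that are too long cannot follow the signal; the marginal likelihood penalizes both.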

2.4.2. Age-Weighted Uncertainty

The uncertainty measurement of the GPR cov(y) is solely defined with respect to the feature vectors x. In a common regression scenario this is a natural approach since the real values of the labels are unknown. However, in the case of an age estimation framework, we do possess the real values of the labels, which correspond to the chronological age of the patient.

We can introduce the age information into the GPR framework by creating age-weighted similarity matrices $K_w(X,X',y,y')$. Similar to the GPR covariance matrices, we can construct three different similarity matrices:

• a weighted intra-similarity matrix for the training samples $K_w(X,X,y,y) \in \mathbb{R}^{m \times m}$,

• a weighted intra-similarity matrix for the testing samples $K_w(X',X',y',y') \in \mathbb{R}^{n \times n}$, and

• a weighted inter-similarity matrix between the training and test samples $K_w(X,X',y,y') \in \mathbb{R}^{m \times n}$.

These similarity matrices are constructed in the same manner as the covariance matrices presented in section 2.4.1. The only difference consists in a modification of the kernel to take into account differences in age. This is achieved by creating an age-weighted similarity kernel of the form:

$$k_w(x_i, x_j, y_i, y_j) = s(y_i, y_j) \, k(x_i, x_j), \tag{8}$$

where $k(x_i,x_j)$ corresponds to the kernel defined in Eq. (1) and $s(y_i,y_j)$ corresponds to an age similarity term defined as:

$$s(y_i, y_j) = \exp\left[ -\frac{(y_i - y_j)^2}{2 l_y^2} \right] + \sigma_y^2 \, \delta(y_i, y_j), \tag{9}$$

where $l_y$ corresponds to the age length scale, which is a parameter controlling the effect of the age weighting. By using this updated kernel $k_w$, we obtain a weighted uncertainty term covw(y) which takes into account the age of the subjects to define similarities between subjects. This weighted uncertainty is obtained similarly to the regular uncertainty presented in Eq. (6):

$$\mathrm{cov}_w(y') = K_w(X',X') - K_w(X',X) K_w(X,X)^{-1} K_w(X,X'). \tag{10}$$
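The age-weighted uncertainty of Eq. (10) can be sketched by multiplying a feature kernel with an age similarity term before applying the usual posterior covariance formula. This is an illustration under assumptions, not the authors' implementation: the feature kernel is an isotropic squared exponential, the two noise terms are folded into a single hypothetical `noise_var` on the training diagonal, and the synthetic features are made up.

```python
import numpy as np

def feature_kernel(A, B, length_scale=1.0):
    """Isotropic squared exponential kernel on features (an assumed stand-in for Eq. (1))."""
    sq_dist = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / (2.0 * length_scale ** 2))

def age_similarity(a, b, age_length_scale):
    """Exponential part of the age similarity term in Eq. (9)."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * age_length_scale ** 2))

def weighted_uncertainty(X, ages, X_new, ages_new, age_length_scale, noise_var=1e-2):
    """Diagonal of the age-weighted covariance of Eq. (10)."""
    Kw = feature_kernel(X, X) * age_similarity(ages, ages, age_length_scale)
    Kw += noise_var * np.eye(len(X))  # single noise term, a simplification
    Kw_s = feature_kernel(X_new, X) * age_similarity(ages_new, ages, age_length_scale)
    Kw_ss = feature_kernel(X_new, X_new) * age_similarity(ages_new, ages_new, age_length_scale)
    return np.diag(Kw_ss - Kw_s @ np.linalg.solve(Kw, Kw_s.T))

# Synthetic healthy training set: two features that change linearly with age.
ages = np.linspace(20.0, 80.0, 61)
X = np.column_stack([(80.0 - ages) / 20.0, ages / 40.0])

# Two test subjects with the features of a typical 30 year old: one aged 30, one aged 75.
X_new = np.array([[2.5, 0.75], [2.5, 0.75]])
ages_new = np.array([30.0, 75.0])

cov_w = weighted_uncertainty(X, ages, X_new, ages_new, age_length_scale=5.0)
cov_plain = weighted_uncertainty(X, ages, X_new, ages_new, age_length_scale=1e9)
```

In this toy setting the subject whose features do not match healthy subjects of the same age gets a much larger weighted uncertainty. With a very large age length scale the age term becomes constant, and both subjects receive the same value, as they would under the unweighted uncertainty.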

2.5. Prediction Error, Uncertainty and Age-Weighted Uncertainty

In this work, we compare three age regression based metrics in order to measure their usefulness as a biomarker to distinguish between healthy controls and subjects with different neuropathologies. These metrics are the commonly used prediction error ε = ŷ − y (Franke et al., 2010), the GPR uncertainty cov(y), and the GPR age-weighted uncertainty covw(y). As discussed in the introduction, the prediction error has previously been used to assess differences between healthy and non-healthy populations. The prediction error is the difference between the predicted age ŷ and the chronological age y, as shown in figure 1a. A higher prediction error is assumed to indicate an accelerated aging process (Franke et al., 2010; Gaser et al., 2013).

The computation of the GPR uncertainty cov(y) was presented in section 2.4.1. It can be thought of as a metric of how close a testing point is to all the training points in the feature space, as illustrated in figure 1b. The scatter plot represents a set of subjects in a 2-dimensional space composed of the volume of two different structures. By training a GPR on a set of training points (represented by circles), we obtain a measure of uncertainty cov(y) for every point in the 2D space. This covariance is represented by the shading of the grid, where darker regions correspond to regions where the predictor has higher confidence in its prediction. When performing prediction on previously unseen points (Test Subject 1 and Test Subject 2), we can obtain both a predicted age y and its confidence cov(y). In figure 1b, we observe that even though both test subjects get similar predicted values, the confidence of the prediction for subject two is higher due to its proximity to the training set.

The third metric, the age-weighted uncertainty covw(y), expands upon the notion of uncertainty by taking into account the subject’s age. Measuring covw(y) is equivalent to adding a further dimension to the distance measured by the normal GPR uncertainty. The reasoning behind this is to give higher similarity to individuals who have morphological features similar to those of healthy individuals of similar ages. For example, we see in figure 1c that a healthy testing point (blue) is close to training points with similar features and age; since covw(y) measures the distance to the training points in the extended feature space, the testing point would therefore have a low covw(y) value. On the other hand, the testing point corresponding to a diseased subject (red point) would have a high covw(y) value because, even though there are individuals in the training set with similar feature values, they correspond to subjects in a different age range. Both proposed metrics are closely related. In fact, cov(y) is equivalent to covw(y) for the special case when ly = ∞.

2.6. Aging and Disease Assessment

As discussed in the introduction, several studies have demonstrated that aging is a complex process, which affects different brain structures and regions at different rates of change. It has also been reported that deleterious changes caused by neurodegenerative disease follow a pattern that resembles an accelerated aging process. In order to evaluate how aging and neurodegenerative disease affect different brain regions, we performed an analysis of the volumetric features obtained from our training set. To facilitate this analysis, we restricted it to images obtained from the ADNI database. To assess which individual structures are affected by aging, disease or both factors, a series of simple linear fixed effects models were fitted to our data. In each one of these models, the dependent variable corresponds to the volume of a different brain structure, and the independent variables correspond to age, sex, diagnosis (0 = healthy, 1 = MCI/AD) and an interaction term between diagnosis and age.
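The structure of one such model can be sketched with ordinary least squares on synthetic data. This is only an illustration of the design matrix (age, sex, diagnosis, and the diagnosis by age interaction): the simulated volumes and effect sizes are invented, and p-values are omitted because they would require a statistics package that is not assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400
age = rng.uniform(55.0, 90.0, n)
sex = rng.integers(0, 2, n)   # hypothetical coding: 0 or 1
dx = rng.integers(0, 2, n)    # 0 = healthy, 1 = MCI/AD

# Invented ground truth: volume shrinks with age, is lower with disease,
# and shrinks faster with age in diseased subjects.
volume = (4000.0 - 20.0 * (age - 70.0) + 100.0 * sex
          - 400.0 * dx - 5.0 * dx * (age - 70.0) + rng.normal(0.0, 50.0, n))

def zscore(v):
    """Normalize a variable so coefficients are comparable across structures."""
    return (v - v.mean()) / v.std()

age_z, volume_z = zscore(age), zscore(volume)

# Columns: intercept, age, sex, diagnosis, diagnosis x age interaction.
design = np.column_stack([np.ones(n), age_z, sex, dx, dx * age_z])
coef, *_ = np.linalg.lstsq(design, volume_z, rcond=None)
names = ["intercept", "age", "sex", "diagnosis", "diagnosis x age"]
coefficients = dict(zip(names, coef))
```

One model of this form would be fitted per brain structure, with the volume of that structure as the dependent variable.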

3. Results

3.1. Assessing the Effect of Aging and Disease on Brain Development

In this section we present the results obtained after fitting the aging and disease model described in section 2.6. In table 3, we present the regression coefficients for age, diagnosis and the age/diagnosis interaction term as well as their corresponding p-values for the linear fixed effects model. The table is sorted by ascending p-value for the diagnostic coefficient, which means that the structures at the top of the table are those which present more significant volume alterations caused by disease. Age and volume variables were normalized in order to make the coefficients of different structures comparable. In the case of bilateral structures we only show the values for the left hemisphere in order to simplify our analysis. Plots showing the progress of the hippocampus and cerebellum white matter across different ages for both healthy individuals and individuals with MCI/AD are also presented in figure 3 to illustrate our results. The hippocampus was selected since it was the structure which had the most evident effects of age and disease. On the other hand, the cerebellum white matter was selected as a structure which showed significant effects of aging but is apparently not largely affected by Alzheimer’s disease.

There are a couple of relevant observations that can be extracted from table 3. First, the regression coefficients for age and diagnosis always have the same sign for structures with significant associations, and there exist significant interactions between aging and diagnosis for most of the analyzed structures. This supports the hypothesis that disease and aging are overlapping processes that affect the brain structures in the same direction. However, we can also observe in table 3 that there exist some structures that, although largely affected by aging, do not present significant disease effects (e.g. cerebellum white matter). This can also be observed in the box plots at the top of figure 3, where a clear difference between the HC and Dx groups is evident for all age ranges in the case of the hippocampus, whereas for the cerebellum white matter no significant differences exist between the two groups.

To further illustrate the point that aging and disease are processes that affect different regions of the brain at different rates, we show the progress of pairs of features for both the HC and Dx groups (bottom of figure 3). By looking at the central plot, where the volume of left and right hippocampus is shown, we can understand the reasoning behind the accelerated aging hypothesis of previous age estimation works. Indeed, by looking only at these features, we would be tempted to conclude that the brain of a healthy 80 year old is essentially similar to that of a diseased 60 year old. However, this observation contrasts with the left plot, where the left and right cerebellum white matter volumes are shown. By looking at these features alone, we would draw a different conclusion, since it would appear that there are no differences between the brains of healthy and diseased subjects of the same age. By looking at the left hippocampus and left cerebellum white matter simultaneously (right in figure 3), we can observe that disease produces changes in the brain that are essentially different from those of accelerated aging, causing the overall appearance of the brain of an average 60 year old diagnosed with AD to be different from that of a healthy individual of any age. These observations support our hypothesis that morphological changes associated with AD and MCI are complex and that a model of accelerated aging across the whole brain may be too simplistic to model the specific effects of disease and aging at specific brain structures.


[Figure 3 plots. Top: box plots of volume against age (50 to 80) for Left Cerebellum White Matter and Left Hippocampus, grouped by Dx Group (HC, Dx). Bottom: Left vs. Right Cerebellum White Matter, Left vs. Right Hippocampus, and Left Hippocampus vs. Left Cerebellum White Matter, each showing Healthy and MCI or AD subjects at ages 60, 70 and 80.]

Figure 3: Top: Box plots showing changes in left cerebellum white matter volume and hippocampus volume for individuals between 50 and 80 years old. Bottom: Plots showing age progression of feature pairs for three different cases; left: structures that are not affected by disease; center: structures affected by disease that show an accelerated aging pattern; right: one structure affected by disease (left hippocampus) and one structure with no significant disease effect (left cerebellum white matter).

3.2. Training of the Age Prediction Model

Using the training datasets summarized in table 1, we train 4 different GPR models, each one with a different set of features as described in section 2.3. Each one of the GPR models is trained to estimate the age of healthy subjects based on either volume, thickness, VBM features or a combination of all features. Our models were implemented using Python together with the scikit-learn toolbox. In table 4, we show the Mean Absolute Error (MAE) and R² score for the training set, using a 5-fold cross validation. Our model presents similar MAE and R² when compared to previous work on age estimation (Valizadeh et al., 2017; Cole et al., 2016). Similar to previously reported results (Valizadeh et al., 2017), we observed higher R² score and lower MAE for the model trained using an ensemble of all available features. The chronological and predicted age for each subject in the training set are presented in the scatter plots in figure 4.
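The evaluation protocol (5-fold cross validation scored with MAE and R²) can be sketched as follows. To keep the example self-contained, a least-squares regressor stands in for the GPR model, and the synthetic features and ages are invented; the helper names are hypothetical.

```python
import numpy as np

def cross_val_predict(fit_predict, X, y, n_folds=5, seed=0):
    """Predict every sample with a model that was fitted without that sample's fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_pred = np.empty_like(y, dtype=float)
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False
        y_pred[test_idx] = fit_predict(X[train_mask], y[train_mask], X[test_idx])
    return y_pred

def mean_absolute_error(y, y_pred):
    return float(np.mean(np.abs(y - y_pred)))

def r2_score(y, y_pred):
    ss_res = np.sum((y - y_pred) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def least_squares_fit_predict(X_train, y_train, X_test):
    """Stand-in regressor (NOT a GPR): linear least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    w, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ w

# Synthetic data: 100 subjects, 3 features, ages around 50 years.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = 50.0 + X @ np.array([10.0, -5.0, 2.0]) + rng.normal(0.0, 3.0, 100)

y_pred = cross_val_predict(least_squares_fit_predict, X, y)
mae = mean_absolute_error(y, y_pred)
r2 = r2_score(y, y_pred)
```

Any model exposing the same fit-then-predict behavior, including a GPR, could be passed in place of the stand-in regressor.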

3.3. Evaluation of Gaussian Process Uncertainty as a Measure of Brain Abnormality

In this section, we present results of our experiments comparing the use of prediction error ε, uncertainty cov(y) and age-weighted uncertainty cov_w(y) as a biomarker for differentiating between healthy controls and patients with MCI, AD or autism. We performed three different experiments: the first two experiments are targeted at finding differences between HC, MCI and AD groups in both the ADNI and OASIS databases; the third experiment is performed on the ABIDE II database, where differences between autism and healthy groups are evaluated. Note that, as summarized in tables 1 and 2, the datasets used for testing are different to those used for training. For all experiments we assessed differences between groups both by performing non-parametric Wilcoxon rank-sum tests (Mann and Whitney, 1947) and by measuring the classification performance on a per-subject basis by generating Receiver Operating Characteristic (ROC) curves with their corresponding Area Under the Curve (AUC) values. For all experiments, results are shown for four different sets of features: volume, thickness and VBM, as well as the combination of all three feature sets. Our proposed GPR uncertainty based metrics are compared to the prediction error ε, obtained in a similar fashion as in previous work on age estimation (Franke et al., 2010). An appropriate age length scale parameter l_y for the cov_w(y) metric was set independently for each experiment by performing evaluations at different scales and keeping the best performing results (see figure 5).
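The two group comparisons named above, a rank-sum test and a per-subject ROC analysis, can be sketched as follows for a single scalar metric such as ε or cov(y). The helper name `compare_groups` and the synthetic metric values are assumptions made for this example; only `scipy.stats.ranksums` and `sklearn.metrics.roc_auc_score` are existing library functions.

```python
import numpy as np
from scipy.stats import ranksums
from sklearn.metrics import roc_auc_score


def compare_groups(metric_controls, metric_patients):
    """Return (p_value, auc) for one scalar metric measured in two groups."""
    # Two-sided Wilcoxon rank-sum test on the raw metric values.
    _, p_value = ranksums(metric_patients, metric_controls)
    # AUC of using the metric itself as the per-subject classification score.
    labels = np.concatenate([
        np.zeros(len(metric_controls)),
        np.ones(len(metric_patients)),
    ])
    values = np.concatenate([metric_controls, metric_patients])
    return p_value, roc_auc_score(labels, values)


# Synthetic metric values (not real data): patients score higher on average.
rng = np.random.default_rng(0)
controls = rng.normal(10.0, 2.0, size=200)
patients = rng.normal(12.0, 2.0, size=200)

p_value, auc = compare_groups(controls, patients)
print(f"p = {p_value:.2e}, AUC = {auc:.2f}")
```

An AUC of 0.5 means the metric does not separate the groups at all, which is a useful reference point when reading the AUC tables in this section.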

[Figure 4 panels: Volume, Thickness, VBM, All; x-axis: Chronological Age; y-axis: Predicted Age.]

Figure 4: Scatter plots showing the prediction results of the age prediction models trained using different feature sets.

[Figure 5 panels: OASIS, ADNI, ABIDE II; x-axis: length scale (logarithmic, 10^-1 to 10^5); y-axis: AUC.]

Figure 5: AUC values obtained by using the cov_w(y) metric for different age length scales l_y. According to these curves, a different length scale was selected for each experiment (ADNI: l_y = 1 × 10^2, OASIS: l_y = 1 × 10^5, ABIDE: l_y = 1). The selected length scales are highlighted in red in each plot.

3.3.1. Experiment 1: ADNI Dataset

For our first experiment we measure the separation between HC, MCI and AD groups for images obtained from the ADNI database. Due to the very large dataset size of this testing scenario, all the p-values reported in table 5 are statistically significant (p < 6 × 10^-5). The AUC values reported in table 6 and the ROC curves in figure 6 consistently show a better performance of the uncertainty based metrics cov(y) and cov_w(y) with respect to ε. In general, cov(y) and cov_w(y) presented similar performance, but adding the age term resulted in larger AUC values for the experiments on volume and VBM features. Box plots for each feature set and each diagnostic group are also shown in figure 7. The results using the uncertainty based measures cov(y) and cov_w(y) were strongly correlated (R = 0.95). In contrast, correlations of 0.35 and 0.37 were obtained between ε and cov(y) and between ε and cov_w(y), respectively.

3.3.2. Experiment 2: OASIS Dataset

Our second experiment is similar to experiment 1, but our evaluation is performed on images obtained from the OASIS database. In order to ensure similar age ranges for the HC, MCI and AD groups, all individuals under 60 years were removed from the testing dataset. Tables 7 and 8 summarize the numerical results of the comparisons between HC-MCI and MCI-AD groups. Similar to previous results (Franke et al., 2010), we observed that prediction

[Figure 6 panels: ROC curves for ADNI Volume, Thickness, VBM and All (columns); top row: Healthy vs MCI/AD; bottom row: Healthy/MCI vs AD. x-axis: False Positive Ratio; y-axis: True Positive Ratio. Curves: ε, cov(y), cov_w(y).]

Figure 6: Receiver Operating Characteristic (ROC) curves for the prediction of the presence of MCI/AD (Top) or the presence of AD (Bottom) evaluated on the ADNI dataset. Columns correspond to the different evaluated features.

[Figure 7 panels: box plots for ADNI Volume, Thickness, VBM and All (columns); rows: ε, cov(y), cov_w(y); x-axis: Dx (HC, MCI, AD).]

Figure 7: Box plots showing prediction results for ε (top), cov(y) (middle) and cov_w(y) (bottom) for HC, MCI and AD groups on the ADNI dataset. Columns correspond to the different evaluated features.

error ε is a useful biomarker in this particular dataset. According to the results in tables 7 and 8, ε presented larger AUC values and smaller p-values for the models trained using volume and thickness features when discriminating between HC and MCI groups. However, for all the remaining evaluations on the OASIS database, cov(y) and cov_w(y) presented the best performance amongst the evaluated metrics. Correlation coefficients between the metrics were 0.99 for cov(y)-cov_w(y), 0.38 for ε-cov(y) and 0.38 for ε-cov_w(y). Notice that in this case the results for cov(y) and cov_w(y) are strongly correlated due to the very large value assigned to the age length scale parameter

Table 3: Coefficients and p-values corresponding to the linear models fitted to predict the volume of individual structures. Structures are sorted by increasing p-value for diagnosis (Dx), most significant first.

Structure                     Age coef.  Dx coef.  Age×Dx coef.  Age p-value      Dx p-value       Age×Dx p-value
Left.Hippocampus              -0.30      -0.53      0.03         9.12 × 10^-107   4.04 × 10^-291   2.87 × 10^-2
Left.Amygdala                 -0.21      -0.41      0.07         4.13 × 10^-50    1.53 × 10^-169   1.67 × 10^-5
Left.Inf.Lat.Vent              0.27       0.38     -0.07         1.77 × 10^-70    1.50 × 10^-141   3.00 × 10^-5
Left.Lateral.Ventricle         0.22       0.24     -0.09         8.38 × 10^-46    1.84 × 10^-57    1.32 × 10^-7
CSF                            0.16       0.22     -0.11         1.07 × 10^-23    1.85 × 10^-46    5.95 × 10^-11
Left.Accumbens.area           -0.30      -0.22      0.07         2.75 × 10^-75    1.48 × 10^-43    7.89 × 10^-5
3rd.Ventricle                  0.27       0.19     -0.13         1.04 × 10^-70    1.53 × 10^-36    3.49 × 10^-15
Left.choroid.plexus            0.14       0.17     -0.04         2.04 × 10^-18    1.59 × 10^-27    2.16 × 10^-2
Left.Putamen                  -0.15      -0.14      0.02         9.84 × 10^-21    2.15 × 10^-17    0.30
Left.VentralDC                -0.27      -0.11     -0.03         7.09 × 10^-71    3.38 × 10^-14    3.00 × 10^-2
Left.Thalamus.Proper          -0.32      -0.11     -0.01         6.28 × 10^-97    2.65 × 10^-13    0.48
Left.Cerebellum.Cortex        -0.26      -0.07     -0.03         1.98 × 10^-65    5.01 × 10^-6     9.02 × 10^-2
Brain.Stem                    -0.23      -0.06     -0.02         1.15 × 10^-48    1.60 × 10^-5     0.13
Left.Caudate                   0.07       0.04      0.03         7.89 × 10^-6     1.93 × 10^-2     0.12
Left.Cerebellum.White.Matter  -0.30      -0.03      0.00         1.60 × 10^-74    0.10             0.86
Left.Pallidum                 -0.06      -0.02     -0.04         4.84 × 10^-4     0.23             1.31 × 10^-2
4th.Ventricle                  0.09       0.00     -0.08         2.38 × 10^-7     0.83             1.13 × 10^-5
Left.vessel                    0.12       0.00     -0.02         1.05 × 10^-12    0.93             0.37

l_y. The ROC curves in figure 8 and the box plots in figure 9 confirm these observations.
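Table 3 reports, for each structure, coefficients for age, diagnosis (Dx) and their interaction. A linear model with exactly these three terms can be sketched with ordinary least squares as below. The z-scoring of age and volume, the 0/1 coding of Dx, the helper name `fit_interaction_model` and the synthetic data are assumptions made for this illustration; the preprocessing used for the table is not described in this section, and the p-values are not computed here.

```python
import numpy as np


def fit_interaction_model(age, dx, volume):
    """Least-squares fit of volume ~ 1 + age + dx + age*dx.

    Age and volume are z-scored (an assumption of this sketch), so the
    returned coefficients are on a standardized scale.
    """
    age_z = (age - age.mean()) / age.std()
    volume_z = (volume - volume.mean()) / volume.std()
    design = np.column_stack([np.ones_like(age_z), age_z, dx, age_z * dx])
    coefficients, *_ = np.linalg.lstsq(design, volume_z, rcond=None)
    return dict(zip(["intercept", "age", "dx", "age_x_dx"], coefficients))


# Synthetic structure (not real data): volume shrinks with age and with disease.
rng = np.random.default_rng(0)
age = rng.uniform(50, 80, size=400)
dx = rng.integers(0, 2, size=400).astype(float)  # 0 = healthy, 1 = patient
volume = 4000 - 20 * age - 300 * dx + rng.normal(0, 100, size=400)

coef = fit_interaction_model(age, dx, volume)
for name, value in coef.items():
    print(f"{name:>10s}: {value:+.2f}")
```

In this reading, a structure with a large Dx coefficient and a near-zero age coefficient would be affected by disease without following the aging direction.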

3.3.3. Experiment 3: ABIDE II Dataset

For our third experiment, we evaluate the age prediction on the ABIDE II dataset, which contains subjects with autism. To the best of our knowledge, no previous age prediction based approach has been used for studying autism. However, previous studies have suggested abnormal brain development in patients diagnosed with autism (Courchesne et al., 2001). Figures 10 and 11 show that differences between HC and autism groups are considerably less noticeable than the group differences in the previous two experiments. In fact, by analyzing the AUC results in table 10 and the p-values in table 9, we can observe that no significant differences between groups were found using the standard prediction error approach with ε as a predictive variable. In contrast, both uncertainty based measurements showed significant differences between HC and autistic groups. Also unlike the previous two experiments, the weighted uncertainty based measurement cov_w(y) showed a better performance than the standard uncertainty cov(y). In this case, correlation coefficients between the metrics were 0.64 for cov(y)-cov_w(y), 0.30 for ε-cov(y) and 0.49 for ε-cov_w(y). The lower correlation between cov(y) and cov_w(y) and the larger correlation between ε and cov_w(y), when compared to the two previous experiments, are caused by the smaller value of l_y.

4. Discussion

In this work, we have proposed to use uncertainty in GPR as a measure of neuropathology. In contrast to previous work based on the prediction error, which assumes similar trajectories between aging and disease processes, the GPR uncertainty handles differences in morphology of diseased brains that do not necessarily lie on a healthy aging trajectory. If we consider predicted age as an aging biomarker, GPR uncertainty can be seen as a measure of uncertainty of the aging biomarker. We have evaluated the ability of GPR uncertainty to discriminate subjects with pathology for two very different diseases: Alzheimer's disease, where we worked with a cohort of advanced age individuals, and autism, where we operated on a younger cohort. To the best of our knowledge, this is the first time that uncertainty in a GPR model is used as a measure of neuropathology and it is also the first application of an age regression model for autism. For both applications, we work with a single model that was trained on healthy subjects from a wide age range. This distinguishes our work from discriminative approaches, which require the inclusion of images of patients diagnosed with a particular disease in the training set.
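The idea that GPR uncertainty grows for subjects lying off the healthy aging trajectory can be illustrated with a toy example. This is a sketch under assumptions, not the authors' model: the two synthetic features, the fixed kernel hyperparameters (chosen here for reproducibility, with no optimization) and the helper name `predictive_variance` are all hypothetical. Only the unweighted predictive variance is shown; the age-weighted variant cov_w(y) is not reproduced.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

# Synthetic healthy cohort (not real data): two features that change with age.
ages = rng.uniform(20, 80, size=150)
healthy = np.column_stack([
    -0.03 * ages + rng.normal(0, 0.1, size=150),
    0.02 * ages + rng.normal(0, 0.1, size=150),
])

# Assumed kernel with fixed hyperparameters (optimizer=None disables fitting them).
kernel = RBF(length_scale=0.5) + WhiteKernel(noise_level=0.1)
gpr = GaussianProcessRegressor(kernel=kernel, optimizer=None, normalize_y=True)
gpr.fit(healthy, ages)


def predictive_variance(model, features):
    """Predictive variance of the age estimate for each row of `features`."""
    _, std = model.predict(features, return_std=True)
    return std ** 2


# Both features look like a healthy 70-year-old.
on_trajectory = np.array([[-0.03 * 70, 0.02 * 70]])
# Feature 1 looks 70 years old, feature 2 looks 20: a pattern never seen in training.
off_trajectory = np.array([[-0.03 * 70, 0.02 * 20]])

var_on = predictive_variance(gpr, on_trajectory)[0]
var_off = predictive_variance(gpr, off_trajectory)[0]
print(f"on-trajectory variance:  {var_on:.2f}")
print(f"off-trajectory variance: {var_off:.2f}")
```

The off-trajectory subject receives the larger variance because its feature pattern is far from every healthy training subject, regardless of which age the model predicts for it.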

In this work, we build age prediction models using three different types of brain features: VBM, volume and thickness, as well as a combination of all of them. Based on

Feature Set   MAE    R^2
Volume        5.52   0.87
Thickness     6.50   0.80
VBM           5.65   0.86
All           3.86   0.93

Table 4: Mean Absolute Error (MAE) and R^2 score of the age prediction models trained with different sets of features. Measurements are obtained using a 5-fold cross validation on the training set.


[Figure 8 panels: ROC curves for OASIS Volume, Thickness, VBM and All (columns); top row: Healthy vs MCI/AD; bottom row: Healthy/MCI vs AD. y-axis: True Positive Ratio. Curves: ε, cov(y), cov_w(y).]

Figure 8: Receiver Operating Characteristic (ROC) curves for the prediction of the presence of MCI/AD (Top) or the presence of AD (Bottom) evaluated on the OASIS dataset. Columns correspond to the different evaluated features.

            HC-MCI                                            MCI-AD
            ε              cov(y)         cov_w(y)       ε              cov(y)          cov_w(y)
Volume      6.82 × 10^-19  2.07 × 10^-29  1.85 × 10^-31  3.36 × 10^-60  4.54 × 10^-69   2.05 × 10^-74
Thickness   2.18 × 10^-18  3.53 × 10^-27  2.92 × 10^-24  3.98 × 10^-38  6.03 × 10^-105  1.45 × 10^-101
VBM         3.10 × 10^-7   6.12 × 10^-30  1.20 × 10^-34  5.61 × 10^-22  2.92 × 10^-77   1.17 × 10^-76
All         5.63 × 10^-5   5.16 × 10^-36  5.10 × 10^-37  9.46 × 10^-23  2.76 × 10^-103  1.02 × 10^-103

Table 5: p-values corresponding to the statistical tests performed on the experiments comparing the HC, MCI and AD groups on the ADNI dataset.


            HC-MCI                    MCI-AD
            ε     cov(y)  cov_w(y)    ε     cov(y)  cov_w(y)
Volume      0.67  0.71    0.72        0.73  0.76    0.77
Thickness   0.66  0.71    0.71        0.69  0.80    0.80
VBM         0.60  0.71    0.72        0.64  0.77    0.77
All         0.59  0.74    0.74        0.64  0.81    0.81

Table 6: Area Under the Curve (AUC) values corresponding to the statistical tests performed on the experiments comparing the HC, MCI and AD groups on the ADNI dataset.

our results, we have observed that none of the features outperformed the others across all our evaluations. However, combining all features resulted in a model with the lowest prediction error and consistently achieved the best results when performing separation between healthy and disease groups. This is in line with previous work (Valizadeh et al., 2017; Liem et al., 2017), where it has been


            HC-MCI                        MCI-AD
            ε        cov(y)   cov_w(y)    ε       cov(y)  cov_w(y)
Volume      <0.0001  <0.0001  <0.0001     0.0611  0.0065  0.0067
Thickness   0.0005   0.1186   0.1123      0.0707  0.0002  0.0002
VBM         0.0075   <0.0001  <0.0001     0.0225  0.0003  0.0003
All         0.0015   <0.0001  <0.0001     0.0707  0.0020  0.0020

Table 7: p-values corresponding to the statistical tests performed on the experiments comparing the HC, MCI and AD groups on the OASIS dataset. The highlighted values correspond to p-values with significance levels under 0.05 (light background), 0.01 (middle background) and 0.001 (dark background).

observed that extended feature sets, which give models a larger variety of measurements to base the prediction on, result in more accurate age estimation models. Although the main goal of this paper is not to present a state-of-the-art age prediction method, we have observed that our proposed GPR model has a high prediction accuracy, comparable to that of current age estimation approaches (Valizadeh et al., 2017; Cole et al., 2016).

We have also demonstrated the generalization ability of our method by training and testing our model on completely independent datasets. Our training dataset was built based on the IXI, ABIDE and AIBL databases while testing was performed on the OASIS, ADNI and ABIDE

[Figure 9 panels: box plots for OASIS Volume, Thickness, VBM and All (columns); rows: ε, cov(y), cov_w(y); x-axis: Dx (HC, MCI, AD).]

Figure 9: Box plots showing prediction results for ε (top), cov(y) (middle) and cov_w(y) (bottom) for HC, MCI and AD groups on the OASIS dataset. Columns correspond to the different evaluated features.


[Figure 10 panels: ROC curves (Healthy vs Autism) for ABIDE Volume, Thickness, VBM and All; y-axis: True Positive Ratio. Curves: ε, cov(y), cov_w(y).]

Figure 10: Receiver Operating Characteristic (ROC) curves for the prediction of the presence of Autism. Columns correspond to the different evaluated features.


            HC-MCI                    MCI-AD
            ε     cov(y)  cov_w(y)    ε     cov(y)  cov_w(y)
Volume      0.77  0.73    0.73        0.73  0.75    0.75
Thickness   0.68  0.62    0.62        0.68  0.76    0.76
VBM         0.64  0.72    0.72        0.68  0.77    0.77
All         0.67  0.74    0.74        0.68  0.77    0.77

Table 8: Area Under the Curve (AUC) values corresponding to the statistical tests performed on the experiments comparing the HC, MCI and AD groups on the OASIS dataset.

II databases. Using different datasets for training and testing complicates the age prediction problem, as undesired dataset biases can impact the result (Wachinger and Reuter, 2016; Gutierrez et al., 2017). However, such experiments model a scenario that is more realistic, as the translation to the clinic requires the accurate deployment

            ε       cov(y)   cov_w(y)
Volume      0.3210  <0.0317  <0.0019
Thickness   0.4664  <0.0001  <0.0001
VBM         0.0556  0.0089   <0.0001
All         0.1060  <0.0001  <0.0001

Table 9: p-values corresponding to the statistical tests performed on the experiments comparing the HC and Autism groups. The highlighted values correspond to p-values with significance levels under 0.05 (light background), 0.01 (middle background) and 0.001 (dark background).

of our method on data that differs from the training set.

Based on GPR uncertainty, we have introduced two metrics to assess the similarity of a test subject to a model of healthy aging: the uncertainty of the predictions of the GPR, cov(y), and an age-weighted uncertainty measurement, cov_w(y). We have shown in our experiments

[Figure 11 panels: box plots for ABIDE Volume, Thickness, VBM and All (columns); rows: ε, cov(y), cov_w(y); x-axis: Dx (HC, Autism).]

Figure 11: Box plots showing prediction results for ε (top), cov(y) (middle) and cov_w(y) (bottom) for HC and Autism groups on the ABIDE II dataset. Columns correspond to the different evaluated features.

            ε     cov(y)  cov_w(y)
Volume      0.50  0.53    0.58
Thickness   0.50  0.60    0.60
VBM         0.53  0.55    0.60
All         0.53  0.58    0.60

Table 10: Area Under the Curve (AUC) values corresponding to the statistical tests performed on the experiments comparing the HC and Autism groups on the ABIDE II dataset.

in section 3 that both measures find statistically significant differences between HC, MCI and AD groups as well as between autism and HC groups. We have compared these results to the commonly used prediction error ε, and we have shown that the proposed metrics yield a better separation between groups. The age-weighted uncertainty measurement can be seen as an extension to the standard uncertainty measure, with the inclusion of a weighting parameter based on the chronological age of the test subject. The effect of this weighting is controlled by the age length scale parameter l_y. We have analyzed the effect of l_y on the performance of the age-weighted uncertainty measurement and we have observed that, although the use of the age-weighting had a limited effect in the case of the MCI/AD experiments, there was a clear improvement in the case of autism. We hypothesize that these differences in performance of cov_w(y) can be attributed to the different age ranges of the testing cohorts, since age prediction models present smaller prediction errors when tested on younger cohorts than on datasets consisting of older individuals (Cole and Franke, 2017).

For the experiment on autism, our proposed uncertainty based metrics showed their ability to discriminate between autistic and healthy groups. We find these results particularly encouraging since the prediction error based approach proved insufficient to find differences for this particular disease. Given the analysis performed in section 3.1 and the results of our experiments, we believe that the main reason for the better performance of our uncertainty based measures is that they do not model brain anomaly as an accelerated aging process, but rather as deviations from healthy aging. As discussed before, the complex effects of aging and disease follow trajectories that affect different areas of the brain at different rates. The more relaxed assumptions on which our proposed uncertainty based measures are based are therefore better suited to account for the complex impact of aging and disease across the entire brain.

We have not performed direct quantitative comparisons between our uncertainty based measures and discriminative approaches. The main reason behind this is that discriminative approaches require training images not only from healthy individuals but also from patients. This means that separate models have to be trained for each specific disease. In contrast, age-prediction based models are only built on images from healthy individuals. This allows for a flexible model which can be used for different diseases without any need to retrain or adjust the model. We demonstrated this in our experiments, where we used the same age prediction model to predict brain anomaly both on patients with Alzheimer's disease and patients with autism.

5. Conclusions

We introduced the prediction uncertainty in age estimation as a measure of neuropathology, using a multivariate age prediction model based on Gaussian process regression. Our measure does not make specific a priori assumptions about the nature of the changes caused by disease, but rather models these changes as deviations from healthy aging. The method is therefore not limited to a specific pathology or age range, as demonstrated in our experiments on patients with Alzheimer's disease and patients with autism. Our method is also flexible enough to work with different sets of features, as we have illustrated in our experiments using volume, thickness, and VBM features. We have introduced an extension of the Gaussian process uncertainty measure for age estimation that also takes the chronological age into account, resulting in a weighted uncertainty measure, and we have demonstrated that the inclusion of this weighted measure can potentially be helpful for some applications. In comparison to the commonly used prediction error, the prediction uncertainty yielded an improved separation of diagnostic groups across all feature types and for different applications. It is also important to point out that, in contrast to discriminative approaches, age prediction based models only require images of healthy individuals for training, which may allow for incorporating scans from large population-based studies in the future. The results presented in this paper encourage us to further explore the potential of uncertainty based measures and to apply our method to different diseases or conditions that might have complex effects on the anatomy of the brain. We are further interested in investigating the relationship between the prediction uncertainty and cognitive and clinical characteristics, as well as future health outcomes.

Acknowledgements

This work was partly funded by SAP SE, the Förderprogramm für Forschung und Lehre, the Bavarian State Ministry of Education, Science and the Arts in the framework of the Centre Digitisation.Bavaria (ZD.B) and the Faculty of Medicine of the University Munich (Förderprogramm für Forschung & Lehre). Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

References

Buckner, R. L., 2004. Memory and executive function in aging and ad. Neuron 44 (1), 195–208.

Cole, J. H., Franke, K., 2017. Predicting age using neuroimaging: Innovative brain ageing biomarkers. Trends in Neurosciences.

Cole, J. H., Leech, R., Sharp, D. J., 2015. Prediction of brain age suggests accelerated atrophy after traumatic brain injury. Annals of neurology 77 (4), 571–581.

Cole, J. H., Poudel, R. P., Tsagkrasoulis, D., Caan, M. W., Steves, C., Spector, T. D., Montana, G., 2016. Predicting brain age with deep learning from raw imaging data results in a reliable and heritable biomarker. URL http://arxiv.org/abs/1612.02572

Courchesne, E., Karns, C., Davis, H., Ziccardi, R., Carper, R., Tigue, Z., Chisum, H., Moses, P., Pierce, K., Lord, C., et al., 2001. Unusual brain growth patterns in early life in patients with autistic disorder an mri study. Neurology 57 (2), 245–254.

Di Martino, A., Yan, C.-G., Li, Q., Denio, E., Castellanos, F. X., Alaerts, K., Anderson, J. S., Assaf, M., Bookheimer, S. Y., Dapretto, M., et al., 2014. The autism brain imaging data exchange: towards large-scale evaluation of the intrinsic brain architecture in autism. Molecular psychiatry 19 (6), 659.

Ellis, K. A., Bush, A. I., Darby, D., De Fazio, D., Foster, J., Hudson, P., Lautenschlager, N. T., Lenzo, N., Martins, R. N., Maruff, P., et al., 2009. The australian imaging, biomarkers and lifestyle (aibl) study of aging: methodology and baseline characteristics of 1112 individuals recruited for a longitudinal study of alzheimer's disease. International Psychogeriatrics 21 (4), 672–687.

Fischl, B., 2012. Freesurfer. Neuroimage 62 (2), 774–781.

Fjell, A. M., Westlye, L. T., Grydeland, H., Amlien, I., Reinvang, I., Raz, N., Holland, D., Dale, A. M., Walhovd, K. B., Neuroimaging, D., 2014. Critical ages in the life-course of the adult brain: nonlinear subcortical aging 34 (10), 2239–2247.

Franke, K., Ziegler, G., Kloppel, S., Gaser, C., 2010. Estimating the age of healthy subjects from T1-weighted MRI scans using kernel methods: Exploring the influence of various parameters. Neuroimage 50 (3), 883–892. URL http://dx.doi.org/10.1016/j.neuroimage.2010.01.005

Gaser, C., Franke, K., Kloppel, S., Koutsouleris, N., Sauer, H., 2013. BrainAGE in Mild Cognitive Impaired Patients: Predicting the Conversion to Alzheimer's Disease. PLoS One 8 (6).

Good, C. D., Johnsrude, I. S., Ashburner, J., Henson, R. N., Friston, K. J., Frackowiak, R. S., 2001. A Voxel-Based Morphometric Study of Ageing in 465 Normal Adult Human Brains. Neuroimage 14 (1), 21–36. URL http://linkinghub.elsevier.com/retrieve/pii/S1053811901907864

Gutierrez, B., Peter, L., Klein, T., Wachinger, C., 2017. A multi-armed bandit to smartly select a training set from big medical data. In: International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, pp. 38–45.

Guttmann, C., Jolesz, F. A., Kikinis, R., Killiany, R. J., Moss, M. B., Sandor, T., Albert, M. S., 1998. White matter changes with normal aging. Neurology 50 (4), 972–978. URL http://www.neurology.org/cgi/doi/10.1212/WNL.50.4.972

Habes, M., Janowitz, D., Erus, G., Toledo, J. B., Resnick, S. M., Doshi, J., Van der Auwera, S., Wittfeld, K., Hegenscheid, K., Hosten, N., Biffar, R., Homuth, G., Volzke, H., Grabe, H. J., Hoffmann, W., Davatzikos, C., 2016. Advanced brain aging: relationship with epidemiologic and genetic risk factors, and overlap with Alzheimer disease atrophy patterns. Transl. Psychiatry 6 (February), e775. URL http://www.ncbi.nlm.nih.gov/pubmed/27045845

Hedden, T., Gabrieli, J. D., 2004. Insights into the ageing mind: a view from cognitive neuroscience. Nature reviews neuroscience 5 (2), 87–96.

Ingalhalikar, M., Smith, A., Parker, D., Satterthwaite, T. D., Elliott, M. A., Ruparel, K., Hakonarson, H., Gur, R. E., Gur, R. C., Verma, R., 2014. Sex differences in the structural connectome of the human brain. Proceedings of the National Academy of Sciences 111 (2), 823–828.

Jack, C. R., Bernstein, M. A., Fox, N. C., Thompson, P., Alexander, G., Harvey, D., Borowski, B., Britson, P. J., L Whitwell, J., Ward, C., et al., 2008. The alzheimer's disease neuroimaging initiative (adni): Mri methods. Journal of magnetic resonance imaging 27 (4), 685–691.

Kondo, C., Ito, K., Kai Wu, Sato, K., Taki, Y., Fukuda, H., Aoki, T., 2015. An age estimation method using brain local features for T1-weighted images. 2015 37th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc., 666–669. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=7318450

Koutsouleris, N., Davatzikos, C., Borgwardt, S., Gaser, C., Bottlender, R., Frodl, T., Falkai, P., Riecher-Rossler, A., Moller, H.-J., Reiser, M., et al., 2013. Accelerated brain aging in schizophrenia and beyond: a neuroanatomical marker of psychiatric disorders. Schizophrenia bulletin 40 (5), 1140–1153.

Liem, F., Varoquaux, G., Kynast, J., Beyer, F., Masouleh, S. K., Huntenburg, J. M., Lampe, L., Rahim, M., Abraham, A., Craddock, R. C., et al., 2017. Predicting brain-age from multimodal imaging data captures cognitive impairment. NeuroImage 148, 179–188.

Mann, H. B., Whitney, D. R., 1947. On a test of whether one of two random variables is stochastically larger than the other. The annals of mathematical statistics, 50–60.

Marcus, D. S., Wang, T. H., Parker, J., Csernansky, J. G., Morris, J. C., Buckner, R. L., 2007. Open access series of imaging studies (oasis): cross-sectional mri data in young, middle aged, nondemented, and demented older adults. Journal of cognitive neuroscience 19 (9), 1498–1507.

Nenadic, I., Dietzek, M., Langbein, K., Sauer, H., Gaser, C., 2017. BrainAGE score indicates accelerated brain aging in schizophrenia, but not bipolar disorder. Psychiatry Res. Neuroimaging 266 (March), 86–89. URL http://linkinghub.elsevier.com/retrieve/pii/S0925492717301002

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E., 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12, 2825–2830.

Potvin, O., Dieumegarde, L., Duchesne, S., 2017. Normative Morphometric Data for Cerebral Cortical Areas Over the Lifetime of the Adult Human Brain. Neuroimage. URL http://linkinghub.elsevier.com/retrieve/pii/S1053811917304135

Rasmussen, C. E., Williams, C. K. I., 2005. Gaussian Processes for Machine Learning (Adaptive Computation and Machine Learning). The MIT Press.

Steffener, J., Habeck, C., O'Shea, D., Razlighi, Q., Bherer, L., Stern, Y., 2016. Differences between chronological and brain age are related to education and self-reported physical activity. Neurobiology of aging 40, 138–144.

Valizadeh, S. A., Hanggi, J., Merillat, S., Jancke, L., 2017. Age prediction on the basis of brain anatomical measures. Hum. Brain Mapp. 38 (2), 997–1008.

Wachinger, C., Golland, P., Kremen, W., Fischl, B., Reuter, M., 2015. Brainprint: a discriminative characterization of brain morphology. NeuroImage 109, 232–248.

Wachinger, C., Reuter, M., 2016. Domain adaptation for alzheimer's disease diagnostics. Neuroimage 139, 470–479.

Wachinger, C., Salat, D. H., Weiner, M., Reuter, M., 2016. Whole-brain analysis reveals increased neuroanatomical asymmetries in dementia for hippocampus and amygdala. Brain 139 (12), 3253–3266.

Wang, J., Li, W., Miao, W., Dai, D., Hua, J., He, H., 2014. Age estimation using cortical surface pattern combining thickness with curvatures. Med. Biol. Eng. Comput. 52 (4), 331–341.

Ziegler, G., Dahnke, R., Gaser, C., Initiative, A. D. N., et al., 2012. Models of the aging brain structure and individual decline. Frontiers in Neuroinformatics 6 (3).
