
ORIGINAL RESEARCH

Characterizing and differentiating task-based and resting state fMRI signals via two-stage sparse representations

Shu Zhang & Xiang Li & Jinglei Lv & Xi Jiang & Lei Guo & Tianming Liu

© Springer Science+Business Media New York 2015

Abstract A relatively underexplored question in fMRI is whether there are intrinsic differences in signal composition patterns that can effectively characterize and differentiate task-based and resting state fMRI (tfMRI and rsfMRI) signals. In this paper, we propose a novel two-stage sparse representation framework to examine the fundamental difference between tfMRI and rsfMRI signals. Specifically, in the first stage, the whole-brain tfMRI or rsfMRI signals of each subject were composed into a big data matrix, which was then factorized into a subject-specific dictionary matrix and a weight coefficient matrix for sparse representation. In the second stage, all of the dictionary matrices from both tfMRI and rsfMRI data across multiple subjects were composed into another big data matrix, which was further sparsely represented by a cross-subject common dictionary and a weight matrix. This framework was applied to the recently publicly released Human Connectome Project (HCP) fMRI data, and experimental results revealed that there are distinctive and descriptive atoms in the cross-subject common dictionary that can effectively characterize and differentiate tfMRI and rsfMRI signals, achieving 100 % classification accuracy. Moreover, our methods and results can be meaningfully interpreted, e.g., the well-known default mode network (DMN) activities can be recovered from the very noisy and heterogeneous aggregated big data of tfMRI and rsfMRI signals across all subjects in the HCP Q1 release.

Keywords Task-based fMRI · Resting-state fMRI · Sparse coding · Online dictionary learning

Introduction

Functional magnetic resonance imaging (fMRI) based on blood-oxygen-level dependent (BOLD) techniques has been widely used to study the functional activities and cognitive behaviors of the brain, either under stimuli induced by tasks, i.e., task fMRI (tfMRI) (Worsley and Friston 1995; Worsley 1997; Linden et al. 1999; Heeger and Ress 2002; Calhoun et al. 2011), or during the task-free resting state, i.e., resting state fMRI (rsfMRI) (Raichle et al. 2001; Fox and Raichle 2007). To infer meaningful neuroscientific patterns within fMRI data, various computational/statistical methods have been proposed, including the widely used general linear model (GLM) for tfMRI (Friston et al. 1994; Worsley 1997), independent component analysis (ICA) for rsfMRI (McKeown et al. 1998), as well as many other methods, including wavelet algorithms (Bullmore et al. 2003; Shimizu et al. 2004), Markov random field (MRF) models (Descombes et al. 1998), mixture models (Hartvig and Jensen 2000), autoregressive spatial models (Woolrich et al. 2014), and Bayesian approaches (Luo and Puthusserypady 2007). Among these methods, GLM is one of the most widely used due to its effectiveness, simplicity, robustness, and wide availability (Friston et al. 1994; Worsley 1997; Lv et al. 2014a, b).

However, a relatively underexplored question in tfMRI and rsfMRI is whether there exist intrinsic, fundamental

Shu Zhang and Xiang Li contributed equally to this work.

Electronic supplementary material The online version of this article (doi:10.1007/s11682-015-9359-7) contains supplementary material, which is available to authorized users.

S. Zhang · X. Li · J. Lv · X. Jiang · T. Liu (*)
Cortical Architecture Imaging and Discovery Lab, Department of Computer Science and Bioimaging Research Center, The University of Georgia, Athens, GA, USA
e-mail: [email protected]

J. Lv · L. Guo
School of Automation, Northwestern Polytechnic University, Xi’an, China

Brain Imaging and Behavior
DOI 10.1007/s11682-015-9359-7

http://dx.doi.org/10.1007/s11682-015-9359-7

differences in signal composition patterns which can effectively characterize and differentiate these two types of fMRI signals. As task-based fMRI is widely adopted to identify brain regions that are functionally involved in a specific task performance, while resting state fMRI is used to explore the intrinsic functional segregation or specialization of brain regions/networks (Logothetis 2008), such differences could inspire a better understanding of the organization and origin of the brain’s cognitive functioning. Also, determining whether participants are focusing on the task during a task scan or are at rest during a resting state scan could be crucial for further analysis. As far as we know, there are at least three challenges in addressing the above question. Firstly, the variability of fMRI signals across brain scans and across individual subjects could be remarkable. Despite the success of GLM-based frameworks in analyzing individual brain activation patterns (e.g., Worsley and Friston 1995; Bullmore et al. 1996; Woolrich et al. 2001), it has been challenging to derive consistent fMRI activation patterns across different brains and populations due to the huge variability between individuals (Brett et al. 2002; Mueller et al. 2013). Many studies have investigated individual variability in brain imaging, and it has been shown that there are several major sources of variability (which are often mixed): 1) the variability in the structure and its corresponding functionality between individual brains, as it has been shown that the standardized parcellation of the brain still poses a major difficulty in terms of function and microanatomy (Brett et al. 2002); 2) the variability of each individual’s response to the external stimulus during tfMRI scanning, as well as their variability during resting-state, which is even more significant. For instance, it has been reported that there is significant and substantial variability in the shape of responses collected across subjects, and even across multiple scans of a single subject (Aguirre et al. 1998; Barch et al. 2013; Steinmetz and Seitz 1991); 3) and consequently, the variability in the spatial distribution of the activation patterns obtained by GLM and/or functional networks inferred by network analysis could be even larger, as mentioned in the literature (McGonigle et al. 2000; Handwerker et al. 2004).

Secondly, the amount of whole-brain, voxel-wise fMRI signals from multiple subjects could be immense. For example, a high-resolution tfMRI scan from the recently publicly released Human Connectome Project (HCP) has around 150,000–200,000 time series signals for one subject during a single task/resting-state scan (Barch et al. 2013). In total, for the Q1 release of HCP data, there are around 10,200,000–13,600,000 time series signals for all 60 subjects of a single task. As this dataset includes 7 task scans and 1 resting state scan, the total size will grow to 81 million. The memory capacity of a server or workstation can barely handle such a great amount of data. Also, there would be many more subjects involved if we aim to conduct a cross-population study. Therefore, eventually we would need a scalable computational framework with the capacity to handle fMRI big data of any available size to obtain meaningful groupwise results.

Thirdly, there are a variety of noise sources in fMRI signals. During fMRI scans, several factors including scanner instability, experiment design deficits, and susceptibility effects at high fields may all lead to noise (Stocker et al. 2005; Hu and Norris 2004). For an individual subject, head motion, lack of attention, and other factors that are not related to the experiment design could also introduce noise (Stocker et al. 2005). Various studies have focused on fMRI imaging quality, and numerous techniques have been developed for signal de-noising and artifact removal (Simmons et al. 1999; Foland and Glover 2004; Stocker et al. 2005; Friedman and Glover 2006). However, it has rarely been explored whether big-data analytic strategies such as dictionary learning and sparse representation could effectively deal with such a variety of noise from the entire brains of multiple subjects.

Inspired by the successes of using sparse representation in pattern recognition (Mairal et al. 2009; Kreutz-Delgado et al. 2003; Aharon et al. 2006; Lewicki and Sejnowski 2000; Wright et al. 2010) and in brain functional imaging analysis (Lee et al. 2011; Li et al. 2009, 2012; Yamashita et al. 2008; Lv et al. 2014a, b; Li et al. 2013), in this paper we propose a novel two-stage sparse representation framework to obtain a groupwise characterization of fMRI signals obtained during various tasks (or during resting-state), which has the capability of addressing the abovementioned three challenges. Specifically, for the first challenge, the sparsity-constrained dictionary learning method has been algorithmically shown to be capable of identifying the representative components from a given fMRI dataset, as the activation maps from an fMRI study usually have little overlap (Daubechies et al. 2009). Further, the proposed framework puts the representative dictionary matrix from each individual into the same space established by the common dictionary learned at the second stage, thus dealing with the inter-subject variability problem without losing individual information. For the second challenge, the two-stage framework applies a divide-and-conquer scheme by first reducing the data of each individual to its dictionary-based representation, and then aggregating the reduced data into a new input to learn the groupwise dictionary. Using the HCP Q1 dataset as an example, after the first stage we would learn 400 dictionary atoms from 150,000 to 200,000 signals for each of the 60 subjects (Lv et al. 2014a), while the sparsity constraint imposed on the learning process ensures that the learned dictionaries cover the major information of the massive number of signals. Thus, at the second stage, the input would be of a much-reduced size (400×60), and we can learn a common dictionary of all the subjects with ease, compared with the computational load of decomposing 10,200,000–13,600,000 signals. For the third challenge, as the sparse representations learned at the


first stage capture the most prominent temporal activities and their corresponding spatial organization patterns of the brain functional signal, the individual dictionaries, which serve as the input of the second-stage dictionary learning, essentially have been de-noised, since in most cases noise signals are temporally inhomogeneous and spatially scattered.

The organization of this paper is as follows: in the methods section we introduce our two-stage dictionary learning framework with a running example. Then the results section provides the accuracy of classification on task/resting-state fMRI data, which serves as the main verification of the proposed framework. After that, we provide the spatial/temporal characterization of three types of the common functional components obtained by the framework, which are the main new findings of our work.

Materials and methods

Overview

The computational flowchart of the proposed framework is summarized in Fig. 1, and a running example of the framework applied to the combined dataset of working memory (WM) and resting-state (RS) fMRI is illustrated in Fig. 2. In the first stage (Fig. 1a), we apply the dictionary learning method to the whole-brain tfMRI and rsfMRI signals from each subject (in both training and testing datasets) to learn dictionaries Dt (from tfMRI) and Dr (from rsfMRI) with the corresponding loading coefficients αt and αr; example results are shown in Fig. 2b. In this work, each atom in the learned dictionary along with its loading coefficient is termed a “functional component”, since it is considered a functional basis that constitutes the whole-brain activities. Then the dictionaries Dt and Dr learned at the first stage from half of all subjects (i.e., the training dataset) are aggregated into one single matrix S* (Fig. 1b, with an example in Fig. 2c), which serves as the input for the second-stage dictionary learning to infer a new, groupwise common dictionary D* and loading coefficient α* (Figs. 1c and 2d). Atoms in the common dictionary and their estimated spatial maps are then termed “common functional components”, as they are inferred groupwise and constitute the functional activity variation for all subjects involved. Further, the most discriminative atoms in the common dictionary are selected as classification features by analyzing the loading coefficients α* (Fig. 1f, illustrated in Fig. 2e). The selected common functional components are then used to train a support vector machine (SVM) for the classification of the dictionaries

Fig. 1 Overview of the two-stage dictionary learning scheme: a First-stage dictionary learning routine for each individual subject and for each task type (blue: fMRI data obtained during task, red: fMRI data obtained during resting-state). α_t^1 denotes the loading coefficients obtained from the tfMRI of subject 1, etc. b Construction of the second-stage dictionary learning input S* and sparse coding input S*test. c Second-stage dictionary learning performed on S* to obtain D* (common dictionary) and α*. d Using D* for the sparse coding on S*test, obtaining α*test. e Estimation of the spatial re-maps of common functional components. f Calculating the ROA vector by analyzing α*, then performing classification-based feature selection. g Training SVM. h Applying SVM on α*test for the classification of the testing dataset


learned from the other half of the subjects (i.e., the testing dataset) during the classification stage, as in Fig. 1g–h.

Data acquisition and preprocessing

The dataset used in this work comes from the Human Connectome Project Q1 release (Barch et al. 2013; Van Essen et al. 2013). The acquisition parameters of the tfMRI data are as follows: 90×104 matrix, 220 mm FOV, 72 slices, TR=0.72 s, TE=33.1 ms, flip angle=52°, BW=2290 Hz/Px, in-plane FOV=208×180 mm, 2.0 mm isotropic voxels. For tfMRI images, the preprocessing pipelines included motion correction, spatial smoothing, temporal pre-whitening, slice time correction, and global drift removal. For more details on data acquisition and preprocessing, refer to Barch et al. (2013) and Van Essen et al. (2013). rsfMRI data were acquired with the same EPI pulse sequence parameters as tfMRI (Smith et al. 2013). The time lengths of the task and resting state scans are: resting state (1200 frames), working memory (405 frames), gambling (253 frames), motor (284 frames), language (316 frames), social cognition (274 frames), relational processing (232 frames), emotion processing (176 frames). As there are 60 subjects in the released dataset, in this work half (30) of the subjects were used for training (i.e., common dictionary learning and feature set construction), while data from the other half were used for testing (i.e., classification). When used as dictionary learning input, the signal of each voxel is normalized to have unit ℓ2-norm for both tfMRI and rsfMRI data.
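The voxel-wise normalization mentioned above can be sketched as follows. This is a minimal illustration with NumPy on synthetic data, not the authors' code; the function name `normalize_signals`, the `eps` guard for flat voxels, and the toy matrix size are assumptions made for the example.

```python
import numpy as np

def normalize_signals(S, eps=1e-12):
    """Scale every column of S (one voxel's time series) to unit l2-norm.

    S has shape (t, n): t time points by n voxels. Columns whose norm is
    below eps are left unscaled to avoid dividing by zero; this handling
    of flat voxels is an assumption of the sketch.
    """
    S = np.asarray(S, dtype=float)
    norms = np.linalg.norm(S, axis=0)
    safe = np.where(norms > eps, norms, 1.0)
    return S / safe

rng = np.random.default_rng(0)
S = rng.standard_normal((405, 1000))  # 405 frames (WM scan length), 1000 toy voxels
S[:, 0] = 0.0                         # a flat voxel, to exercise the guard
S_unit = normalize_signals(S)
```

After this step every non-flat column has norm 1, so voxels with large raw amplitude do not dominate the reconstruction residual during dictionary learning.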

Two-stage dictionary learning

First-stage dictionary learning method

In the first stage, the effective online dictionary learning algorithm (Mairal et al. 2009) is adopted to learn a dictionary with a sparsity constraint from the whole-brain fMRI signals of grey and white matter voxels (with time length t and voxel number n) of each subject in both the training and testing datasets. The algorithm learns a meaningful and over-complete dictionary D consisting of k atoms (k>t, k<<n) to represent S with the corresponding sparse loading coefficient matrix α, as each signal in S is supposed to be represented by the most relevant atoms in the learned dictionary. Specifically, for the fMRI signal set S=[s1,s2,…,sn]∈ℝ^{t×n}, the loss function for the dictionary learning algorithm to minimize is defined in Eq. (1) with an ℓ1 regularization that yields a sparsity constraint

Fig. 2 A running example illustrating the two-stage dictionary learning framework using the WM/rs fMRI dataset: a fMRI signals from WM (in blue) and resting-state (in red), from a total of 30 subjects; b Dictionaries (upper time series plot) and loading coefficients (lower spatial maps) obtained from the first-stage dictionary learning. Each type of data from each subject yields 400 dictionary atoms and corresponding loading coefficients; c Aggregation of all learned dictionaries; d Common dictionaries (left) and their loading coefficients (right) obtained from the second-stage dictionary learning, over a total number of 50; e Spatial maps of the common functional components estimated using Eq. (3). The color-coded ROA vector of common functional components is shown below; selected components are highlighted by yellow circles


on the loading coefficient α (constrained by non-negativity), where λ is a regularization parameter that trades off the regression residual against the sparsity level:

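$$ \min_{D \in \mathbb{R}^{t \times k},\ \alpha \in \mathbb{R}^{k \times n}} \frac{1}{2} \left\| S - D\alpha \right\|_F + \lambda \left\| \alpha \right\|_{1,1} \tag{1} $$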

To prevent D from taking arbitrarily large values, which would lead to a trivial solution of the optimization, its columns d1,d2,…,dk are constrained by Eq. (2).

$$ C \triangleq \left\{ D \in \mathbb{R}^{t \times k} \ \ \text{s.t.} \ \ \forall j = 1, \dots, k,\ \ d_j^{T} d_j \le 1 \right\} \tag{2} $$

In brief, dictionary learning can be rewritten as a matrix factorization problem for both D and α, and we use the effective online dictionary learning method of Mairal et al. (2009) to derive the solution by iteratively updating D and α in Eq. (1) during the optimization. It should be noted that we employ the same assumption as in previous studies (Li et al. 2009, 2012; Lee et al. 2011, 2013; Oikonomou et al. 2012; Abolghasemi et al. 2013) that the atomic components (which are dictionary atoms in D in our work) involved in each voxel’s fMRI signal are a few major ones, and the neural integration of those components is linear. In this work, the value of λ and the dictionary size k were determined experimentally (λ=0.1, k=400) (Lv et al. 2014a, b). After the dictionary learning, the resulting D matrix contains the temporal variation of each atomic basis component of the functional brain, while the corresponding sparse loading coefficient matrix α contains the spatial distribution of each component, both illustrated in Fig. 2b.
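To make the alternating update of D and α concrete, the following is a small batch sketch in NumPy. It is not the online algorithm of Mairal et al. (2009) used in the paper: it processes all signals at once, uses projected ISTA for the non-negative sparse coding step, and borrows only the block-coordinate atom update with the column-norm projection of Eq. (2). It also assumes the usual squared Frobenius residual; the function names, iteration counts, and toy sizes are hypothetical.

```python
import numpy as np

def objective(S, D, alpha, lam):
    """0.5 * squared Frobenius residual plus lam * l1 penalty (assumed form)."""
    return 0.5 * np.sum((S - D @ alpha) ** 2) + lam * np.sum(np.abs(alpha))

def sparse_code(S, D, lam, n_iter=100):
    """Non-negative l1-regularized coding of S on a fixed D (projected ISTA)."""
    alpha = np.zeros((D.shape[1], S.shape[1]))
    L = np.linalg.norm(D, 2) ** 2 + 1e-12  # Lipschitz constant of the gradient
    DtD, DtS = D.T @ D, D.T @ S
    for _ in range(n_iter):
        grad = DtD @ alpha - DtS
        # gradient step, shrink by lam / L, then clip at zero (non-negativity)
        alpha = np.maximum(alpha - (grad + lam) / L, 0.0)
    return alpha

def update_dictionary(S, D, alpha, eps=1e-12):
    """One pass of atom-wise updates, projecting each atom onto the unit ball."""
    A, B = alpha @ alpha.T, S @ alpha.T
    D = D.copy()
    for j in range(D.shape[1]):
        if A[j, j] < eps:  # atom unused by every signal: leave it unchanged
            continue
        u = (B[:, j] - D @ A[:, j]) / A[j, j] + D[:, j]
        D[:, j] = u / max(np.linalg.norm(u), 1.0)  # enforce d_j^T d_j <= 1
    return D

def learn_dictionary(S, k, lam=0.1, n_outer=20, seed=0):
    rng = np.random.default_rng(seed)
    # initialize atoms with k randomly chosen signals
    D = S[:, rng.choice(S.shape[1], size=k, replace=False)].copy()
    D /= np.maximum(np.linalg.norm(D, axis=0), 1.0)
    for _ in range(n_outer):
        alpha = sparse_code(S, D, lam)
        D = update_dictionary(S, D, alpha)
    return D, sparse_code(S, D, lam)

rng = np.random.default_rng(1)
S = rng.standard_normal((40, 300))   # toy sizes: t = 40 time points, n = 300 voxels
S /= np.linalg.norm(S, axis=0)       # unit l2-norm per voxel signal
D, alpha = learn_dictionary(S, k=10, lam=0.1)
```

In this layout each column of `D` is one temporal atom and each row of `alpha` is the corresponding spatial loading over voxels, matching the roles of D and α described above.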

Based on the dictionary learning results of each individual brain, our next major task is to obtain a groupwise characterization that could reveal the distinctive organization patterns between the brains’ fMRI data under different conditions.

Second-stage dictionary learning and common functional components re-mapping

In this stage, all the learned dictionaries from tfMRI and rsfMRI are aggregated together to form a multi-subject, multi-type matrix S* of dimension t×(2kp), where p is the number of subjects in the dataset (Fig. 2c). Note that in the HCP dataset, rsfMRI data have a longer temporal length than tfMRI data for all tasks, and so do the learned dictionaries. Thus we truncated the learned Dr to the same length as Dt, enabling the aggregation of the dictionaries from different task types. S* is then used as the input for the second-stage dictionary learning, based on the same method as introduced previously (λ=0.1, m=50), aiming at obtaining a groupwise common dictionary D* and the corresponding loading coefficients α* (constrained by non-negativity). Compared with the original fMRI data, which are defined on the whole-brain voxels of each subject, our proposed two-stage framework achieves a huge size reduction while still maintaining the major functional characterization of each individual. More importantly, noise and undesired voxel-wise signal fluctuations are largely removed in S*; thus we can ensure that most of the common functional components represent the groupwise consistent functional activities, and that their differences come from the intrinsic features of functional brain activity patterns. As the common dictionaries are defined on the groupwise aggregated dictionaries, it is then important to estimate their spatial maps over the brain (i.e., spatial re-mapping). In this work, the re-mapping is achieved by first aligning all the brains to the same template using linear registration. The aligning procedure first registered the averaged frames of the fMRI data of each individual subject into the MNI standard space; then the transform matrix obtained from the registration was applied to the loading coefficient matrix α of that subject, transforming it into α’. In this study, we tried both linear and non-linear registration methods and obtained similar results for the re-mapping. Then the spatial map of the i-th common functional component (ReMapi) is obtained by:

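$$ \mathrm{ReMap}_i = \left( \sum_{x=1}^{p} \sum_{y=1}^{k} \alpha'^{\,task}_{x,y} \cdot \alpha^{*}_{i,\,(x-1)k+y} + \sum_{x=1}^{p} \sum_{y=1}^{k} \alpha'^{\,resting}_{x,y} \cdot \alpha^{*}_{i,\,(p+x-1)k+y} \right) \Big/ \, 2kp \tag{3} $$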

where α’_{x,y}^{task} is the loading coefficient matrix of the y-th dictionary atom (out of k in total) of the x-th subject (out of p in total) obtained from the first-stage dictionary learning on tfMRI, after registration to the template; α’_{x,y}^{resting} is the loading coefficient matrix of the rsfMRI result, after registration to the template; and α*_i is the value of their corresponding loading coefficient for the i-th common dictionary from the second-stage dictionary learning. In other words, the spatial maps of the common components are the weighted averages of the individual components of each subject. Several sample spatial mapping results (ReMap) are shown in Fig. 2e.
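The weighted average of Eq. (3) can be written compactly with array operations. The sketch below is illustrative only: it assumes the registered first-stage loadings are stored as arrays of shape (p, k, V) for V template voxels, and that the columns of α* are ordered as the indices in Eq. (3) imply (all tfMRI atoms subject by subject, then all rsfMRI atoms). The function name `remap` and the toy sizes are hypothetical.

```python
import numpy as np

def remap(alpha_task, alpha_rest, alpha_star):
    """Spatial maps of the common components following Eq. (3).

    alpha_task, alpha_rest: shape (p, k, V), registered first-stage loadings
        for p subjects, k atoms per subject, and V template voxels.
    alpha_star: shape (m, 2*k*p); columns 0..k*p-1 hold the tfMRI atoms
        (subject by subject), columns k*p..2*k*p-1 the rsfMRI atoms.
    Returns an array of shape (m, V), one map per common component.
    """
    p, k, _ = alpha_task.shape
    m = alpha_star.shape[0]
    # column (x-1)k+y of alpha_star becomes entry [x, y] after reshaping
    w_task = alpha_star[:, :p * k].reshape(m, p, k)
    w_rest = alpha_star[:, p * k:].reshape(m, p, k)
    total = (np.einsum("ixy,xyv->iv", w_task, alpha_task)
             + np.einsum("ixy,xyv->iv", w_rest, alpha_rest))
    return total / (2 * k * p)

rng = np.random.default_rng(0)
p, k, V, m = 3, 4, 5, 2                      # toy sizes
alpha_task = rng.random((p, k, V))
alpha_rest = rng.random((p, k, V))
alpha_star = rng.random((m, 2 * k * p))
maps = remap(alpha_task, alpha_rest, alpha_star)
```

With the sizes used in the paper (k=400 atoms, p=30 training subjects, m=50 common atoms) the same call would return 50 whole-brain maps.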

Feature selection on common functional components

As discussed above, the common dictionaries D* and their corresponding loading coefficients α* obtained at the second-stage dictionary learning capture the groupwise characteristics of both types of the input fMRI data. Further, the row vectors in the loading coefficients α* indicate the weight of the corresponding common dictionary’s activation in each atom in S*. An example α* matrix obtained from the WM/resting-state fMRI datasets is visualized in Fig. 2d. The [i, j]-th cell in α* indicates how strongly the i-th common dictionary is activated in the j-th atom in S*. As the composition of S* is known in the training dataset (the pattern is illustrated in Fig. 1b: dictionaries from tfMRI and rsfMRI are put into S* in turn), for the i-th common dictionary we can obtain its Ratio of Activation (ROA) by:

$$ \mathrm{ROA}_i = \log \frac{\left\| \alpha(i,j) \right\|_0,\ \ j\text{th column belongs to tfMRI}}{\left\| \alpha(i,j) \right\|_0,\ \ j\text{th column belongs to rsfMRI}} \tag{4} $$

Thus the ratio is obtained by counting the number of non-zero entries of the i-th row vector of α* in the columns which have been labeled as tfMRI or rsfMRI. A sample ROA vector for all 50 common components is visualized in Fig. 2e and color-coded by the ratio value, where a higher value (e.g., “4.0” in red) indicates that the specific common dictionary is e^4.0≈55 times more involved in tfMRI than in rsfMRI, while a lower value (green) indicates the opposite. An ROA value approaching 0 (white), i.e., an activation ratio approaching 1, indicates that the specific component is nearly equally activated in both tfMRI and rsfMRI. Based on the ROA vector, we can then select the components that are specific to either tfMRI or rsfMRI by a high absolute value of ROA (i.e., on the two ends of the ROA vector).
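As a small worked illustration of Eq. (4), the sketch below counts the non-zero entries of each row of α* separately over tfMRI and rsfMRI columns and takes the natural logarithm of their ratio. The `smooth` argument, which is added to both counts to avoid taking the logarithm of zero, is an assumption of this example and is not described in the text; the toy matrix is made up.

```python
import numpy as np

def roa(alpha_star, is_task, smooth=1.0):
    """Ratio of Activation per row of alpha_star, following Eq. (4).

    alpha_star: shape (m, N), second-stage loading coefficients.
    is_task: boolean vector of length N, True where the column is a tfMRI atom.
    smooth: value added to both counts (assumption; set to 0 for the raw ratio).
    """
    nonzero = alpha_star != 0
    task_count = nonzero[:, is_task].sum(axis=1)
    rest_count = nonzero[:, ~is_task].sum(axis=1)
    return np.log((task_count + smooth) / (rest_count + smooth))

alpha_star = np.array([
    [0.3, 0.1, 0.2, 0.5, 0.4, 0.0, 0.0, 0.0],  # 4 tfMRI hits, 1 rsfMRI hit
    [0.2, 0.0, 0.6, 0.0, 0.0, 0.1, 0.3, 0.0],  # 2 and 2
    [0.0, 0.0, 0.7, 0.0, 0.2, 0.1, 0.3, 0.4],  # 1 and 4
])
is_task = np.array([True] * 4 + [False] * 4)

values = roa(alpha_star, is_task, smooth=0.0)
# rank components from most to least condition-specific
ranking = np.argsort(-np.abs(values), kind="stable")
```

Here the first row is task-specific (positive value), the third is resting-specific (negative value), and the second is shared (value 0), so it comes last in the ranking.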

In order to quantitatively define the exact set of the common functional components reflecting the underlying data composition, we designed a data-driven algorithm based on the premise that the loading coefficients of the selected components shall have the maximum capacity for classifying the data. To test this premise, the algorithm splits α* into two halves consisting of an equal number of subjects (i.e., columns). Then we use only the one row from the first half of α* which corresponds to the highest ROA value to train a Support Vector Machine (SVM) based on the LIBSVM toolbox (Chih and Chih 2011), establishing the relationship between the composition of common components (i.e., loading coefficients) and the composition of raw data (i.e., task/rs labels). Then we use the trained SVM to classify the same row of the second half of α*. After storing the classification accuracy, which is defined as the proportion of columns in α* that have been classified into the correct label, we iteratively employ more rows in α*, sorted by their absolute ROA values, as the feature inputs, thus selecting more features for the SVM training and classification. In this way, the feature set (i.e., selected common functional components) is determined by minimizing the classification error.
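The incremental selection loop described above can be sketched as follows. To keep the example dependency-free, a nearest-centroid classifier written in NumPy stands in for the LIBSVM-based SVM used in the paper; that swap, the function names, and the synthetic data are all assumptions of this illustration.

```python
import numpy as np

def nearest_centroid_predict(X_train, y_train, X_test):
    """Stand-in classifier (not an SVM): assign each sample to the closest class mean."""
    classes = np.unique(y_train)
    centroids = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return classes[dist.argmin(axis=1)]

def select_components(alpha_star, labels, roa_values, train_cols, val_cols):
    """Add rows of alpha_star in order of decreasing |ROA|; keep the best prefix."""
    order = np.argsort(-np.abs(roa_values), kind="stable")
    accuracies = []
    for r in range(1, len(order) + 1):
        rows = order[:r]
        X_tr = alpha_star[np.ix_(rows, train_cols)].T  # samples are columns of alpha*
        X_va = alpha_star[np.ix_(rows, val_cols)].T
        pred = nearest_centroid_predict(X_tr, labels[train_cols], X_va)
        accuracies.append(float(np.mean(pred == labels[val_cols])))
    best = int(np.argmax(accuracies)) + 1              # smallest prefix reaching the maximum
    return order[:best], accuracies

rng = np.random.default_rng(0)
labels = np.array([1] * 20 + [0] * 20)                 # 1 = tfMRI atom, 0 = rsfMRI atom
alpha_star = rng.random((6, 40)) * 0.1                 # 6 toy common components
alpha_star[0] += labels                                # component 0: task-specific
alpha_star[1] += 1 - labels                            # component 1: resting-specific
roa_values = np.array([2.0, -1.5, 0.1, -0.2, 0.05, 0.0])
train_cols = np.arange(0, 40, 2)
val_cols = np.arange(1, 40, 2)

selected, accuracies = select_components(alpha_star, labels, roa_values,
                                         train_cols, val_cols)
```

In practice the columns would be split by subject rather than by parity, and the classifier call would be replaced by SVM training and prediction.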

Sparse coding of the testing dataset and classification

To verify the proposed framework, we performed the classification analysis on the testing dataset, which constitutes half of the total subjects. Before analyzing the testing dataset, the loading coefficients of the previously selected common functional components in the training dataset are used to train an SVM in a similar way as in the “Feature selection on common functional components” section. Note that the same first-stage dictionary learning has been performed on the testing dataset, as shown in the right panel of Fig. 1a. We aggregate the individually learned dictionaries from the testing dataset into S*testing, similar to the formation of S* in the “Second-stage dictionary learning and common functional components re-mapping” section. Then the common dictionary D* obtained from the training dataset is used to sparsely code S*testing by solving a typical ℓ1-regularized LASSO problem (Fig. 1d) to obtain its corresponding loading coefficients αtesting:

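$$ \ell\left( \alpha_{testing} \right) \triangleq \min_{\alpha_{testing} \in \mathbb{R}^{m \times n}} \frac{1}{2} \left\| S_{testing} - D^{*} \alpha_{testing} \right\|_F + \lambda \left\| \alpha_{testing} \right\|_{1,1} \tag{5} $$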

α*testing has similar implications to α*; the difference between them is that α* and D* were learned simultaneously from the training dataset by an optimization routine, while α*testing is the deterministic LASSO solution of projecting D* onto a new dataset. As the tfMRI/rsfMRI composition pattern in α*testing is unknown, the trained SVM is used to classify the rows in α*testing that correspond to the feature selection results, to obtain the labels of the columns in α*testing. Thus the link between the training and testing datasets is established by the fact that both of their individual dictionaries learned during the first stage are sparsely coded by the

Fig. 3 Number of components selected for the classification (x-axis) vs. component-wise classification accuracy (y-axis), using the SVM-based classification method (top panel) and the Naïve Bayesian-based classification method (bottom panel)


same common dictionary D*, so that the rows in α* and α*testing correspond to the same common functional components. After obtaining the classification result for the labels of the m functional components in each fMRI dataset from each subject (i.e., the component-wise result), our next goal is to classify the type of that dataset (i.e., the subject-wise result), as the dataset constituted by those m functional components of each subject has only one label. In this work, we used a simple scheme: we compare the number of components assigned to either task or resting-state in the given dataset, and then classify the dataset according to the majority voting rule.
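The subject-wise decision described above amounts to a majority vote over component-wise labels. A minimal sketch follows; the label strings and the tie-breaking choice are assumptions, since the text does not say how ties are resolved.

```python
from collections import Counter

def subject_label(component_labels, tie="rest"):
    """Majority vote over the component-wise labels of one dataset.

    component_labels: iterable of "task" / "rest" strings, one per component.
    tie: label returned when both counts are equal (assumption of this sketch).
    """
    counts = Counter(component_labels)
    if counts["task"] == counts["rest"]:
        return tie
    return "task" if counts["task"] > counts["rest"] else "rest"

# toy component-wise results for two datasets of the same subject
print(subject_label(["task"] * 10 + ["rest"] * 6))   # task
print(subject_label(["task"] * 3 + ["rest"] * 13))   # rest
```

Because only the majority matters, a dataset can be labeled correctly even when a sizeable fraction of its individual components is misclassified.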

Results

Using the HCP dataset described in the “Data acquisition and preprocessing” section, we combined the tfMRI data obtained from each of the seven different tasks with the rsfMRI data, forming seven combined datasets: emotion/rsfMRI, gambling/rsfMRI, language/rsfMRI, motor/rsfMRI, social/rsfMRI, relational/rsfMRI and working memory/rsfMRI. Then we applied the proposed framework to the seven combined datasets. In all the datasets, tfMRI and rsfMRI can be effectively differentiated, and the intrinsic spatial/temporal pattern underlying such difference could be characterized by the learned common functional components. In this work, we categorized the functional components into three types: task-evoked components, high-frequency components, and resting-state components. In most of the following sections, we use the combined working memory (WM) tfMRI/rsfMRI dataset as an example to showcase our results, while the results from the other six tasks can be found in the supplemental materials.

Classification results on testing dataset and feature selection

As described in the “Feature selection on common functional components” section, we used classification accuracy on half of the training data as the criterion for determining the exact portion of common functional components that would be used for the classification on the testing dataset. The component-wise accuracy plot obtained from WM/rsfMRI data using different numbers of features (i.e., components) and two different classification methods (SVM and Naïve Bayesian) is shown in Fig. 3. It can be seen that when the number of features used was small, the classification performance was only slightly better than random guessing. As more components were used, the accuracy increased monotonically and reached its maximum at 16 components for both classification methods. As the performance did not change much afterwards, we concluded that the additional components did not contribute much to the differentiation power; thus a total of 16 components were selected as the features for classification.

After the feature selection in each of the seven combined task/rsfMRI datasets, we classified their corresponding testing datasets following the steps in the “Sparse coding of the testing dataset and classification” section, and the subject-wise results are summarized in Table 1. It can be seen that the classification accuracies are very high: tfMRI data from all 30 subjects were classified correctly, and rsfMRI data from all 30 subjects were also classified correctly, using both SVM-based and Naïve Bayesian-based classification methods. The results demonstrate that there exist fundamental differences between the component composition of tfMRI and rsfMRI, while the common functional components (i.e., the features for the classification input) learned by the proposed model have the capability of uncovering and characterizing such differences from the large and noisy groupwise data.

In Table 1, the first row shows the number of common functional components used for the classification after feature selection. The second row shows the percentage of tfMRI datasets of all 30 subjects that were classified to the correct label. Similarly, the third row shows the percentage of rsfMRI datasets classified to the correct label. To further investigate the effect of the value of the regularization parameter λ on the classification results, we tested the framework on the same WM/rsfMRI dataset with various λ values; the final classification accuracies are shown in

Table 1 Subject-wise classification accuracies for 7 tasks

             Emotion/rsfMRI  Gambling/rsfMRI  Language/rsfMRI  Motor/rsfMRI  Social/rsfMRI  Relational/rsfMRI  WM/rsfMRI
#Components  24              18               20               21            23             22                 16
tfMRI        100 %           100 %            100 %            100 %         100 %          100 %              100 %
rsfMRI       100 %           100 %            100 %            100 %         100 %          100 %              100 %

Table 2 Classification accuracies on WM task/resting-state fMRI data using various dictionary size and λ values for the second-stage dictionary learning

λ value                                0.5    0.8    0.1    0.15   0.25   0.35
Accuracy for classification of tfMRI   100 %  100 %  100 %  97 %   97 %   87 %
Accuracy for classification of rsfMRI  93 %   90 %   100 %  87 %   43 %   77 %


Table 2. The results show that the classification accuracy is relatively stable within a certain range of λ values, especially for the tfMRI datasets. However, an extremely large λ value leads to a loading coefficient matrix (i.e., the input feature for classification) that is too sparse, which decreases the discriminative capability of the features and reduces the classification accuracy. In addition, the classification accuracies of the 7 task/rsfMRI datasets using a reduced dictionary size of 25 in the second-stage dictionary learning are listed in Table 3. The results show that although the tfMRI datasets could be identified accurately using a smaller dictionary size (and consequently a smaller number of features), the rsfMRI datasets could not be reliably distinguished from certain tasks, indicating the importance of using a sufficiently large common dictionary during learning so that the framework effectively covers the whole component space.
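The link between a larger λ and a sparser loading coefficient matrix can be illustrated with a minimal sketch. It assumes the standard l1-regularized coding objective 0.5·||x − Da||² + λ·||a||₁ and, as a simplification that the framework itself does not require, an orthonormal dictionary D, for which the solution reduces to soft-thresholding of the projections Dᵀx. The projection values and the λ grid below are hypothetical and are not taken from the experiments.

```python
def soft_threshold(value, lam):
    """Closed-form minimizer of 0.5 * (value - a)**2 + lam * abs(a)."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def sparse_code_orthonormal(projections, lam):
    """l1-regularized code of one signal, assuming an orthonormal dictionary.

    `projections` holds the inner products between the signal and each atom.
    """
    return [soft_threshold(c, lam) for c in projections]


def count_nonzero(coefficients):
    """Number of atoms that remain active in the code."""
    return sum(1 for a in coefficients if a != 0.0)


# Hypothetical projections of one signal onto six orthonormal atoms.
projections = [0.90, -0.40, 0.22, 0.12, -0.07, 0.03]

# Hypothetical lambda grid: the number of active atoms shrinks as lambda grows.
for lam in (0.05, 0.1, 0.25, 0.5):
    code = sparse_code_orthonormal(projections, lam)
    print(f"lambda={lam}: {count_nonzero(code)} active atoms")
```

In this toy setting, each increase of λ removes further atoms from the code, so fewer features remain available to the classifier, which is consistent with the accuracy drop described above for large λ values.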

Task-evoked common functional components

The most prominent and intuitive common functional components obtained by our framework are of the task-evoked type. In the working memory task, there is an example component that belongs to this category, with a very high ROA value of 4.1, and it has been selected for the classification. The spatial distribution of this component is very similar to the result of groupwise GLM activation detection applied to the tfMRI data of the WM task from the 30 subjects in the training dataset, as shown in Fig. 4a and b, where the spatial overlapping rate between (a) and (b) is 89.5 %. Its time series, plotted in

Table 3 Subject-wise classification accuracies for 7 tasks using reduced dictionary size of 25 for second stage dictionary learning

Dictionary size=25   Emotion/rsfMRI   Gambling/rsfMRI   Language/rsfMRI   Motor/rsfMRI   Social/rsfMRI   Relational/rsfMRI   WM/rsfMRI
tfMRI                100 %            100 %             93 %              100 %          100 %           100 %               100 %
rsfMRI               87 %             70 %              57 %              93 %           100 %           57 %                83 %

Fig. 4 Example task-evoked common functional component from the WM/RS dataset. a volume map of the component; b volume map of the corresponding contrast map by groupwise GLM; c component mapped on the inflated cortical surface; d groupwise GLM result mapped on the inflated cortical surface; e time series of the component (blue) and task design contrast curve of the WM task (yellow); f frequency spectrum of the component (red) and frequency spectrum of the contrast curve (green)


Fig. 4e, corresponds well to the task design contrast curve (correlation value: 0.6653). Further, the frequency spectrum of its time series (Fig. 4f) is highly concentrated at the task design frequency. Based on the spatial, temporal, and frequency-domain characteristics and its sole presence in tfMRI, we are confident that our framework can identify task-evoked functional components in the large-scale combined fMRI data. More results can be found in the supplemental materials (Supplemental Figs. 1–6).
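The two quantities used above to compare a component with the GLM result, a spatial overlapping rate and a temporal correlation, can be sketched as follows. The overlap definition here (the fraction of reference-map voxels that are also active in the component map) is only one common choice and is an assumption, since other definitions are possible; the correlation is the ordinary Pearson coefficient. All masks and time series are hypothetical toy data.

```python
import math


def overlap_rate(component_mask, reference_mask):
    """Fraction of active reference voxels that are also active in the component.

    Assumed definition for illustration; both inputs are equal-length 0/1 lists.
    """
    shared = sum(1 for c, r in zip(component_mask, reference_mask) if c and r)
    reference_size = sum(1 for r in reference_mask if r)
    return shared / reference_size if reference_size else 0.0


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical thresholded maps over eight voxels.
component_mask = [1, 1, 1, 0, 0, 1, 0, 1]
glm_mask = [1, 1, 0, 0, 1, 1, 0, 1]

# Hypothetical block design and a noisy component time series that follows it.
design_curve = [0, 0, 1, 1, 0, 0, 1, 1]
component_series = [0.1, -0.2, 0.9, 1.1, 0.0, 0.2, 0.8, 1.2]

print("overlap rate:", overlap_rate(component_mask, glm_mask))
print("correlation:", pearson(component_series, design_curve))
```

In practice the masks would come from thresholding the component and GLM volume maps, and the design curve would be the task contrast regressor.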

Resting-state domain common functional components

In contrast to the task-evoked components, there is one resting-state domain common functional component with the lowest ROA value of −1.1 (i.e., the most frequently activated in rsfMRI) in the WM/RS dataset. As visualized in Fig. 5a, its spatial map largely resembles the widely reported default mode network (DMN) (Raichle et al. 2001). We also applied groupwise independent component analysis (ICA) to the same dataset and obtained a similar pattern, as shown in Fig. 5b (spatial overlapping rate with the ICA resting-state map: 83 %). It should be noted that, as no low-pass filtering was applied in the HCP rsfMRI pre-processing, the dominance of lower frequencies in the component spectrum (Fig. 5f) is a valid characterization of the resting-state brain functional activation pattern rather than a filtering artifact. More results can be found in the supplemental materials (Supplemental Figs. 7–12).
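One simple way to quantify whether low or high frequencies dominate a component spectrum is the fraction of spectral power below a cutoff frequency. The sketch below uses a plain discrete Fourier transform; the repetition time of 0.72 s, the 0.1 Hz cutoff, and the synthetic sinusoids are assumptions made for illustration and are not values reported in this section.

```python
import cmath
import math


def power_spectrum(series):
    """Power at the positive DFT frequencies k = 1 .. n // 2 (mean removed)."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * t / n)
            for t, v in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def low_frequency_fraction(series, tr_seconds, cutoff_hz):
    """Share of spectral power at frequencies at or below `cutoff_hz`."""
    n = len(series)
    spectrum = power_spectrum(series)
    total = sum(spectrum)
    if total == 0:
        return 0.0
    # Entry i of `spectrum` corresponds to the frequency (i + 1) / (n * TR) Hz.
    low = sum(
        p for i, p in enumerate(spectrum)
        if (i + 1) / (n * tr_seconds) <= cutoff_hz
    )
    return low / total


# Assumed acquisition and analysis parameters (hypothetical).
TR_SECONDS = 0.72
CUTOFF_HZ = 0.1
N_POINTS = 100

# Synthetic examples: a slowly varying series and a rapidly varying one.
slow_series = [math.sin(2 * math.pi * 2 * t / N_POINTS) for t in range(N_POINTS)]
fast_series = [math.sin(2 * math.pi * 40 * t / N_POINTS) for t in range(N_POINTS)]

print("slow:", low_frequency_fraction(slow_series, TR_SECONDS, CUTOFF_HZ))
print("fast:", low_frequency_fraction(fast_series, TR_SECONDS, CUTOFF_HZ))
```

A fraction close to 1 indicates a spectrum dominated by low frequencies, as described for the resting-state component, while a fraction close to 0 indicates high-frequency dominance.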

High frequency common functional components

Besides the two traditional types of common functional components described above, several of the identified components from various tasks are strongly activated in tfMRI data, yet exhibit spatial/temporal patterns that differ from the common knowledge of brain regions that respond to tasks. One characteristic shared by those components is the dominance of high frequencies in their spectra (bottom panel of Fig. 6). It is interesting that components from various datasets have almost the same frequency-domain characteristics and very similar spatial distributions, even though the task design and time length are all different in these datasets. By examining the spatial maps of those components in Fig. 6, it can be seen that in all three tasks (WM, emotion and gambling) the ventral posterior cingulate cortex is consistently activated. This region receives inputs from the thalamus and neocortex,

Fig. 5 Example resting-state common functional component from the WM/RS dataset. a volume map of the component; b volume map of the corresponding groupwise ICA result; c component mapped on the inflated cortical surface; d groupwise ICA result mapped on the inflated cortical surface; e time series of the component; f frequency spectrum of the component


and projects to the entorhinal cortex via the cingulum. Being an integral part of the limbic system, this area has been reported to be involved in associative learning (Maddock et al. 2001), memory retrieval (Nielsen et al. 2005), as well as emotion formation and processing (Maddock et al. 2003), which explains its significant presence during those tasks. Unlike resting-state networks, which have been reported to be present in different tasks with similar spatial distributions (e.g., DMN) (Raichle et al. 2001), the common functional components shown in Fig. 6 are activated only during their respective tasks and rarely during resting state, thus largely excluding the possibility that these components belong to the traditional resting-state networks. Also, these components could not be identified by traditional activation detection methods due to the high-frequency nature of their temporal patterns (third panel in Fig. 6), although they were activated only during the tasks and were highly related to the tfMRI data (all with an ROA value of infinity). In our two-stage dictionary learning framework, by contrast, such components are very obvious and can be robustly identified. More results can be found in the supplemental materials (Supplemental Figs. 13–16).

Discussion and conclusion

By using the HCP public tfMRI/rsfMRI datasets, we have presented a novel two-stage sparse representation framework to examine the intrinsic differences in tfMRI/rsfMRI signals. The major methodological novelty of the two-stage sparse representation is that the framework can effectively remove the noise and undesired voxel-wise signal fluctuations, efficiently deal with big data (a matrix with millions by hundreds of data points), and infer distinctive and descriptive common dictionary atoms that can well characterize and differentiate tfMRI/rsfMRI signals in task performance and resting state. In addition, the results also suggest that our two-stage sparse representation method can effectively recover the DMN activities from the very noisy and heterogeneous

Fig. 6 High frequency common functional components identified from three datasets: a WM/RS, b Emotion/RS, and c Gambling/RS. First (top) panel: volume maps of the components; second panel: components mapped on the inflated cortical surface; third panel: time series of the components; fourth (bottom) panel: frequency spectra of the components (red) and frequency spectra of the contrast curves (green)


aggregated big data of tfMRI and rsfMRI signals across all subjects in the HCP Q1 release. The applications of this framework to seven HCP tfMRI datasets and one rsfMRI dataset have demonstrated promising results. In the future, we plan to better interpret other dictionary atoms in the two stages and apply this framework to clinical fMRI datasets to elucidate possible alterations of functional activities in brain disorders.

Acknowledgments T Liu was supported by NSF CAREER Award (IIS-1149260), NIH R01 DA-033393, NIH R01 AG-042599, NSF CBET-1302089 and NSF BCS-1439051. L Guo was supported by the NSFC #61273362.

Conflict of Interest Shu Zhang, Xiang Li, Jinglei Lv, Xi Jiang, Lei Guo, and Tianming Liu declare that they have no conflicts of interest.

Informed Consent Data used in this study were previously collected and archived in a data bank.

References

Abolghasemi, V., Ferdowsi, S., Sanei, S. (2013). Fast and incoherent dictionary learning algorithms with application to fMRI. Signal, Image and Video Processing.

Aguirre, G. K., Zarahn, E., & D’Esposito, M. (1998). The variability of human, BOLD hemodynamic responses. NeuroImage, 8(4), 360–369.

Aharon, M., Elad, M., & Bruckstein, A. (2006). K-SVD: an algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11), 4311–4322.

Barch, D.M., Burgess, G.C., Harms, M.P., Petersen, S.E., Schlaggar, B.L., Corbetta, M., Glasser, M.F., Curtiss, S., Dixit, S., Feldt, C., Nolan, D., Bryant, E., Hartley, T., Footer, O., Bjork, J.M., Poldrack, R., Smith, S., Johansen-Berg, H., Snyder, A.Z., Van Essen, D.C., WU-Minn HCP Consortium. (2013). Function in the human connectome: task-fMRI and individual differences in behavior. NeuroImage.

Brett, M., Johnsrude, I. S., & Owen, A. M. (2002). The problem of functional localization in the human brain. Nature Reviews Neuroscience, 3(3), 243–249.

Bullmore, E., Brammer, M., Williams, S., Rabe-Hesketh, S., Janot, N., David, A., Mellers, J., Howard, R., & Sham, P. (1996). Statistical methods of estimation and inference for functional MR image analysis. Magnetic Resonance in Medicine, 35(2), 261–277.

Bullmore, E., Fadili, J., Breakspear, M., Salvador, R., Suckling, J., & Brammer, M. (2003). Wavelets and statistical analysis of functional magnetic resonance images of the human brain. Statistical Methods in Medical Research, 12(5), 375–399.

Calhoun, V.D., et al. (2011). fMRI activation in a visual-perception task: network of areas detected using the general linear model and independent components analysis. NeuroImage, 14(5), 1080–1088, 2001.

Chih C.C., & Chih J.L. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27.

Daubechies, I., Roussos, E., Takerkart, S., Benharrosh, M., Golden, C., D’Ardenne, K., Richter, W., Cohen, J. D., & Haxby, J. (2009). Independent component analysis for brain fMRI does not select for independence. Proceedings of the National Academy of Sciences of the United States of America, 106(26), 10415–10422.

Descombes, X., Kruggel, F., & von Cramon, D. Y. (1998). fMRI signal restoration using a spatio-temporal Markov random field preserving transitions. NeuroImage, 8(4), 340–349.

Foland, L., & Glover, G.H. (2004). Scanner quality assurance for longitudinal or multicenter fMRI studies. In 12th Annual Meeting of the International Society for Magnetic Resonance Imaging (ISMRM).

Fox, M. D., & Raichle, M. E. (2007). Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging. Nature Reviews Neuroscience, 8, 700–711.

Friedman, L., & Glover, G. H. (2006). Report on a multicenter fMRI quality assurance protocol. Journal of Magnetic Resonance Imaging, 23(6), 827–839.

Friston, K. J., Holmes, A. P., Worsley, K. J. (1994). Statistical parametric maps in functional imaging: a general linear approach. Human Brain Mapping, 2(4), 189–210.

Handwerker, D. A., Ollinger, J. M., & D’Esposito, M. (2004). Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. NeuroImage, 21(4), 1639–1651.

Hartvig, N. V., & Jensen, J. L. (2000). Spatial mixture modeling of fMRI data. Human Brain Mapping, 11(4), 233–248.

Heeger, D. J., & Ress, D. (2002). What does fMRI tell us about neuronal activity? Nature Reviews Neuroscience, 3(2), 142–152.

Hu, X., & Norris, D. G. (2004). Advances in high-field magnetic resonance imaging. Annual Review of Biomedical Engineering, 6, 157–184.

Kreutz-Delgado, K., Murray, J. F., Rao, B. D., Engan, K., Lee, T. W., & Sejnowski, T. J. (2003). Dictionary learning algorithms for sparse representation. Neural Computation, 15(2), 349–396.

Lee, K., Tak, S., & Ye, J. C. (2011). A data-driven sparse GLM for fMRI analysis using sparse dictionary learning with MDL criterion. IEEE Transactions on Medical Imaging, 30(5), 1076–1089.

Lee, J., Jeong, Y., Ye, J.C. (2013). Group sparse dictionary learning and inference for resting-state fMRI analysis of Alzheimer’s disease. ISBI.

Lewicki, M., & Sejnowski, T. (2000). Learning overcomplete representations. Neural Computation, 12(2), 337–365.

Li, Y., Namburi, P., Yu, Z., Guan, C., Feng, J., & Gu, Z. (2009). Voxel selection in fMRI data analysis based on sparse representation. IEEE Transactions on Biomedical Engineering, 56(10), 2439–2451.

Li, Y., Long, J., He, L., Lu, H., Gu, Z., et al. (2012). A sparse representation-based algorithm for pattern localization in brain imaging data analysis. PLoS ONE, 7(12), e50332.

Li, X., Zhu, D., Jiang, X., Jin, C., Zhang, X., Guo, L., Zhang, J., Hu, X., Li, J., Liu, T. (2013). Dynamic functional connectomics signatures for characterization and differentiation of PTSD patients, in press, Human Brain Mapping.

Linden, D. E., Prvulovic, D., Formisano, E., Vollinger, M., Zanella, F. E., Goebel, R., & Dierks, T. (1999). The functional neuroanatomy of target detection: an fMRI study of visual and auditory oddball tasks. Cerebral Cortex, 9(8), 815–823.

Logothetis, N. K. (2008). What we can do and what we cannot do with fMRI. Nature, 453(7197), 869–878.

Luo, H., & Puthusserypady, S. (2007). fMRI data analysis with nonstationary noise models: a Bayesian approach. IEEE Transactions on Biomedical Engineering, 54, 1621–1630.

Lv, J., Jiang, X., Li, X., Zhu, D., Chen, H., Zhang, T., Zhang, S., Hu, X., Han, J., Huang, H., Zhang, J., Guo, L., Liu, T. (2014a). Sparse representation of whole-brain fMRI signals for identification of functional networks, in press, Medical Image Analysis.

Lv, J., Jiang, X., Li, X., Zhu, D., Zhang, S., Zhao, S., Chen, H., Zhang, T., Hu, X., Han, J., Ye, J., Guo, L., Liu, T. (2014b). Holistic atlases of functional networks and interactions reveal reciprocal organizational architecture of cortical function, accepted, IEEE Transactions on Biomedical Engineering.


Maddock, R. J., Garrett, A. S., & Buonocore, M. H. (2001). Remembering familiar people: the posterior cingulate cortex and autobiographical memory retrieval. Neuroscience, 104(3), 667–676.

Maddock, R. J., Garrett, A. S., & Buonocore, M. H. (2003). Posterior cingulate cortex activation by emotional words: fMRI evidence from a valence decision task. Human Brain Mapping, 18(1), 30–41.

Mairal, J., Bach, F., Ponce, J., Sapiro, G. (2009). Online dictionary learning for sparse coding. In Proceedings of the International Conference on Machine Learning (ICML).

McGonigle, D. J., Howseman, A. M., Athwal, B. S., Friston, K. J., Frackowiak, R. S. J., & Holmes, A. P. (2000). Variability in fMRI: an examination of intersession differences. NeuroImage, 11(6), 708–734.

McKeown, M. J., et al. (1998). Spatially independent activity patterns in functional MRI data during the Stroop color-naming task. PNAS, 95(3), 803.

Mueller, S., Wang, D., Fox, M. D., Yeo, B. T., Sepulcre, J., Sabuncu, M. R., Shafee, R., Lu, J., & Liu, H. (2013). Individual variability in functional connectivity architecture of the human brain. Neuron, 77(3), 586–595.

Nielsen, F. A., Balslev, D., & Hansen, L. K. (2005). Mining the posterior cingulate: segregation between memory and pain components. NeuroImage, 27(3), 520–532.

Oikonomou, V. P., Blekas, K., & Astrakas, L. (2012). A sparse and spatially constrained generative regression model for fMRI data analysis. IEEE Transactions on Biomedical Engineering, 59(1), 58–67.

Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proceedings of the National Academy of Sciences of the United States of America, 98(2), 676–682.

Shimizu, Y., Barth, M., Windischberger, C., Moser, E., & Thurner, S. (2004). Wavelet-based multifractal analysis of fMRI time series. NeuroImage, 22, 1195–1202.

Simmons, A., Moore, E., & Williams, S. C. R. (1999). Quality control for functional magnetic resonance imaging using automated data analysis and Shewhart charting. Magnetic Resonance in Medicine, 41(6), 1274–1278.

Smith, S.M., Beckmann, C.F., Andersson, J., Auerbach, E.J., Bijsterbosch, J., Douaud, G., Duff, E., Feinberg, D.A., Griffanti, L., Harms, M.P., Kelly, M., Laumann, T., Miller, K.L., Moeller, S., Petersen, S., Power, J., Salimi-Khorshidi, G., Snyder, A.Z., Vu, A.T., Woolrich, M.W., Xu, J., Yacoub, E., Uğurbil, K., Van Essen, D.C., Glasser, M.F., WU-Minn HCP Consortium. (2013). Resting-state fMRI in the Human Connectome Project. NeuroImage.

Steinmetz, H., & Seitz, R. J. (1991). Functional anatomy of language processing: neuroimaging and the problem of individual variability. Neuropsychologia, 29, 1149–1161.

Stocker, T., Schneider, F., Klein, M., Habel, U., Kellermann, T., Zilles, K., & Shah, N. J. (2005). Automated quality assurance routines for fMRI data applied to a multicenter study. Human Brain Mapping, 25(2), 237–246.

Van Essen, D. C., Smith, S. M., Barch, D. M., Behrens, T. E., Yacoub, E., Ugurbil, K., & WU-Minn HCP Consortium. (2013). The WU-Minn Human Connectome Project: an overview. NeuroImage, 80, 62–79.

Woolrich, M. W., Ripley, B., Brady, J., & Smith, S. (2001). Temporal autocorrelation in univariate linear modelling of fMRI data. NeuroImage, 14(6), 1370–1386.

Woolrich, M. W., Jenkinson, M., Brady, J. M., & Smith, S. M. (2014). Fully Bayesian spatio-temporal modeling of fMRI data. IEEE Transactions on Medical Imaging, 23(2), 213–231.

Worsley, K. J. (1997). An overview and some new developments in the statistical analysis of PET and fMRI data. Human Brain Mapping, 5(4), 254–258.

Worsley, K. J., & Friston, K. J. (1995). Analysis of fMRI time series revisited again. NeuroImage, 2, 173–181.

Wright, J., et al. (2010). Sparse representation for computer vision and pattern recognition. Proceedings of the IEEE, 98(6), 1031–1044.

Yamashita, O., Sato, M. A., Yoshioka, T., Tong, F., & Kamitani, Y. (2008). Sparse estimation automatically selects voxels relevant for the decoding of fMRI activity patterns. NeuroImage, 42(4), 1414–1429.

