separating stimulus-induced and background components of

13
1 Separating Stimulus-Induced and Background Components of Dynamic Functional Connectivity in Naturalistic fMRI Chee-Ming Ting, Senior Member, IEEE, Jeremy I. Skipper, Steven L. Small and Hernando Ombao Abstract—We consider the challenges in extracting stimulus- related neural dynamics from other intrinsic processes and noise in naturalistic functional magnetic resonance imaging (fMRI). Most studies rely on inter-subject correlations (ISC) of low- level regional activity and neglect varying responses in individ- uals. We propose a novel, data-driven approach based on low- rank plus sparse (L+S) decomposition to isolate stimulus-driven dynamic changes in brain functional connectivity (FC) from the background noise, by exploiting shared network structure among subjects receiving the same naturalistic stimuli. The time-resolved multi-subject FC matrices are modeled as a sum of a low-rank component of correlated FC patterns across subjects, and a sparse component of subject-specific, idiosyncratic background activities. To recover the shared low-rank subspace, we introduce a fused version of principal component pursuit (PCP) by adding a fusion-type penalty on the differences between the rows of the low-rank matrix. The method improves the detection of stimulus-induced group-level homogeneity in the FC profile while capturing inter-subject variability. We develop an efficient algorithm via a linearized alternating direction method of multipliers to solve the fused-PCP. Simulations show accurate recovery by the fused-PCP even when a large fraction of FC edges are severely corrupted. When applied to natural fMRI data, our method reveals FC changes that were time-locked to auditory processing during movie watching, with dynamic engagement of sensorimotor systems for speech-in-noise. It also provides a better mapping to auditory content in the movie than ISC. Index Terms—Low-rank plus sparse decomposition, dynamic functional connectivity, inter-subject correlation, fMRI. I. I NTRODUCTION N ATURALISTIC functional magnetic resonance imaging (fMRI) is an emerging approach in cognitive neuro- science, employing naturalistic stimuli (e.g., movies, spo- ken narratives, music, etc.) to provide an ecologically-valid paradigm that mimics real life scenarios [1]. Complementary to traditional task-based paradigms with strictly controlled artificial stimuli, naturalistic paradigms offer a better under- standing of neural processing and network interactions in realistic contexts, where stimuli typically engage the brain in continuous integration of dynamic streams of multimodal (e.g., audiovisual) information. Resting-state fMRI, although entirely unconstrained, is vulnerable to confounds and difficult C.-M. Ting is with the School of Information Technology, Monash Uni- versity Malaysia, 47500 Bandar Sunway, Selangor, Malaysia, and also the Biostatistics Group, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia (e-mail: [email protected]). J.I. Skipper is with the Department of Experimental Psychology, University College London, UK (e-mail: [email protected]). S.L. Small is with the School of Behavioral and Brain Sciences, University of Texas at Dallas, USA (e-mail: [email protected]). H. Ombao, is with the Biostatistics Group, King Abdullah University of Science and Technology, Thuwal 23955, Saudi Arabia (e-mail: her- [email protected]). to link with ongoing cognitive states. Natural stimuli like movies have shown higher test-retest reliability than resting and task fMRI [2]. In this paper, we focus on a major challenge in naturalistic fMRI data analysis – to extract stimulus-related neural dynamics from other intrinsic and noise contributions. Early studies use traditional methods for task fMRI such as the general linear model (GLM) to detect brain activation driven by natural stimuli, using annotations of specific features or events (e.g., faces, scenes or speech in a film [3]) as regressors. However, GLM-based analysis requires explicit a priori models of relevant features of the stimuli and associated neural responses as predictors, where minor model misspec- ification can result in low detection power. It is therefore effective only for simple parametric designs, but extremely rigid for capturing the richer dynamical information present in natural stimuli. Inter-subject correlation (ISC) analyses provide a powerful, data-driven alternative for handling nat- uralistic paradigms without a pre-defined response model, by leveraging the reliable, shared neural responses across different subjects when exposed to the same continuous naturalistic stimulation [4], [5]. By correlating fMRI time courses from the same regions across subjects, ISC can detect inter-subject synchronization of brain activity in specific areas activated by the stimuli, e.g., prefrontal cortex during movie viewing [6]. It was recently extended to inter-subject functional correlation (ISFC) to characterize stimulus-related functional networks by computing the correlations between all pairs of regions across subjects [7]. Use of ISFC has revealed, for example, engagement of default mode networks during narrative com- prehension [7]. The ISFC increases specificity to stimulus- locked inter-regional correlations, by filtering out intrinsic, task-unrelated neural dynamics as well as non-neuronal arti- facts (e.g., respiratory rate; head motion) that are theoretically uncorrelated across subjects. Time-resolved ISFC computed using a sliding-window technique has been used to track movie-induced dynamic changes in functional connectivity [8]. A recent study [9] explored ISC of time-courses of dynamic connectivity rather than regional activity. For a review of the ISC family of approaches see [10]. However, one limitation of IS(F)C is that it has no inherent way to detect individual differences in neural activity, due to the averaging-out of all uncorrelated variations across subjects. Moreover, the ISFC analysis inherently relies on the Pearson correlations between activity time courses of subjects, and thus how it can be generalized to other measures of brain connectivity, e.g., for directed functional networks is still unclear. We propose a new approach based on low-rank plus sparse (L+S) decomposition to separate stimulus-induced and back- arXiv:2102.10331v1 [q-bio.NC] 24 Jan 2021

Upload: others

Post on 02-Feb-2022

1 views

Category:

Documents


0 download

TRANSCRIPT

1

Separating Stimulus-Induced and BackgroundComponents of Dynamic Functional Connectivity in

Naturalistic fMRIChee-Ming Ting, Senior Member, IEEE, Jeremy I. Skipper, Steven L. Small and Hernando Ombao

Abstract—We consider the challenges in extracting stimulus-related neural dynamics from other intrinsic processes and noisein naturalistic functional magnetic resonance imaging (fMRI).Most studies rely on inter-subject correlations (ISC) of low-level regional activity and neglect varying responses in individ-uals. We propose a novel, data-driven approach based on low-rank plus sparse (L+S) decomposition to isolate stimulus-drivendynamic changes in brain functional connectivity (FC) fromthe background noise, by exploiting shared network structureamong subjects receiving the same naturalistic stimuli. Thetime-resolved multi-subject FC matrices are modeled as a sumof a low-rank component of correlated FC patterns acrosssubjects, and a sparse component of subject-specific, idiosyncraticbackground activities. To recover the shared low-rank subspace,we introduce a fused version of principal component pursuit(PCP) by adding a fusion-type penalty on the differences betweenthe rows of the low-rank matrix. The method improves thedetection of stimulus-induced group-level homogeneity in the FCprofile while capturing inter-subject variability. We develop anefficient algorithm via a linearized alternating direction methodof multipliers to solve the fused-PCP. Simulations show accuraterecovery by the fused-PCP even when a large fraction of FC edgesare severely corrupted. When applied to natural fMRI data, ourmethod reveals FC changes that were time-locked to auditoryprocessing during movie watching, with dynamic engagement ofsensorimotor systems for speech-in-noise. It also provides a bettermapping to auditory content in the movie than ISC.

Index Terms—Low-rank plus sparse decomposition, dynamicfunctional connectivity, inter-subject correlation, fMRI.

I. INTRODUCTION

NATURALISTIC functional magnetic resonance imaging(fMRI) is an emerging approach in cognitive neuro-

science, employing naturalistic stimuli (e.g., movies, spo-ken narratives, music, etc.) to provide an ecologically-validparadigm that mimics real life scenarios [1]. Complementaryto traditional task-based paradigms with strictly controlledartificial stimuli, naturalistic paradigms offer a better under-standing of neural processing and network interactions inrealistic contexts, where stimuli typically engage the brainin continuous integration of dynamic streams of multimodal(e.g., audiovisual) information. Resting-state fMRI, althoughentirely unconstrained, is vulnerable to confounds and difficult

C.-M. Ting is with the School of Information Technology, Monash Uni-versity Malaysia, 47500 Bandar Sunway, Selangor, Malaysia, and also theBiostatistics Group, King Abdullah University of Science and Technology,Thuwal 23955, Saudi Arabia (e-mail: [email protected]).

J.I. Skipper is with the Department of Experimental Psychology, UniversityCollege London, UK (e-mail: [email protected]).

S.L. Small is with the School of Behavioral and Brain Sciences, Universityof Texas at Dallas, USA (e-mail: [email protected]).

H. Ombao, is with the Biostatistics Group, King Abdullah Universityof Science and Technology, Thuwal 23955, Saudi Arabia (e-mail: [email protected]).

to link with ongoing cognitive states. Natural stimuli likemovies have shown higher test-retest reliability than restingand task fMRI [2]. In this paper, we focus on a major challengein naturalistic fMRI data analysis – to extract stimulus-relatedneural dynamics from other intrinsic and noise contributions.

Early studies use traditional methods for task fMRI suchas the general linear model (GLM) to detect brain activationdriven by natural stimuli, using annotations of specific featuresor events (e.g., faces, scenes or speech in a film [3]) asregressors. However, GLM-based analysis requires explicit apriori models of relevant features of the stimuli and associatedneural responses as predictors, where minor model misspec-ification can result in low detection power. It is thereforeeffective only for simple parametric designs, but extremelyrigid for capturing the richer dynamical information presentin natural stimuli. Inter-subject correlation (ISC) analysesprovide a powerful, data-driven alternative for handling nat-uralistic paradigms without a pre-defined response model, byleveraging the reliable, shared neural responses across differentsubjects when exposed to the same continuous naturalisticstimulation [4], [5]. By correlating fMRI time courses fromthe same regions across subjects, ISC can detect inter-subjectsynchronization of brain activity in specific areas activated bythe stimuli, e.g., prefrontal cortex during movie viewing [6].It was recently extended to inter-subject functional correlation(ISFC) to characterize stimulus-related functional networksby computing the correlations between all pairs of regionsacross subjects [7]. Use of ISFC has revealed, for example,engagement of default mode networks during narrative com-prehension [7]. The ISFC increases specificity to stimulus-locked inter-regional correlations, by filtering out intrinsic,task-unrelated neural dynamics as well as non-neuronal arti-facts (e.g., respiratory rate; head motion) that are theoreticallyuncorrelated across subjects. Time-resolved ISFC computedusing a sliding-window technique has been used to trackmovie-induced dynamic changes in functional connectivity [8].A recent study [9] explored ISC of time-courses of dynamicconnectivity rather than regional activity. For a review of theISC family of approaches see [10]. However, one limitationof IS(F)C is that it has no inherent way to detect individualdifferences in neural activity, due to the averaging-out of alluncorrelated variations across subjects. Moreover, the ISFCanalysis inherently relies on the Pearson correlations betweenactivity time courses of subjects, and thus how it can begeneralized to other measures of brain connectivity, e.g., fordirected functional networks is still unclear.

We propose a new approach based on low-rank plus sparse(L+S) decomposition to separate stimulus-induced and back-

arX

iv:2

102.

1033

1v1

[q-

bio.

NC

] 2

4 Ja

n 20

21

2

ground components of dynamic functional connectivity (dFC)in naturalistic fMRI. Our method is inspired by the successfulapplications of L+S decomposition in video surveillance toseparate a slowly-changing background (modeled by a low-rank subspace) from moving foreground objects (sparse out-liers) from a video sequence [11], [12]. We formulate a time-varying L+S model for dFC, where the time-resolved, multi-subject functional connectivity (FC) matrices are representedas the sum of a low-rank component and a sparse compo-nent. The low-rank component corresponds to the similar orcorrelated connectivity patterns across subjects elicited by thesame stimuli, which can be well-approximated by a commonlow-dimensional subspace. The sparse component capturesthe corruptions on stimulus-related FC edges, arising frombackground activity (i.e., intrinsic processes and artifacts) thatare idiosyncratic and occur occasionally in certain subjects. Weapply the robust principal component analysis (RPCA) [13],[14] at each time point to recover a sequence of time-resolvedlow-rank FC matrices from the sparse background noise, bysolving a convex optimization that minimizes a combination ofnuclear norm and `1 norm (called principal component pursuit(PCP)). Under mild assumptions (i.e., the sparse matrix issufficiently sparse), the RPCA-PCP can exactly recover theshared low-rank FC structure across subjects, even though anunknown fraction of the FC edges are corrupted by artifacts ofarbitrarily large magnitude, as shown in [14] for the generalcase. A related work [15] imposed L+S constraints on FCmatrices of individual subjects and focused on modeling rawfMRI signals, another [16] used a low-rank approximation ofstacked FC networks across subjects. However, both studiesare limited to analyses of static FC and resting-state data.

We further introduce a novel L+S decomposition methodcalled the fused PCP, by adding a new fusion-type penaltyin the PCP to shrink the successive differences between thecolumns of the low-rank matrix towards zero. The fusedPCP encourages homogeneity in connectivity patterns acrosssubjects, by smoothing out the remaining stimulus-unrelatedindividual differences in the common connectivity structure.Moreover, it can provide a better recovery of the low-rankFC matrices in presence of dense noise (i.e., many of the FCedges are corrupted) than the generic PCP. We develop anefficient algorithm via linearized alternating direction methodof multipliers (ADMM) to solve the proposed fused PCPprogram which admits no closed-form solution. We evaluatedthe performance of the proposed algorithm for L+S recoveryvia simulations. Application to naturalistic fMRI data revealsdynamic FC patterns that are reliably related to processing ofcontinuous auditory events during movie watching.

The contributions of this work are as follows: (1) This isthe first work to propose a L+S decomposition method fornaturalistic fMRI, which is is model-free and data-driven indetecting stimulus-induced dynamic FC. (2) The proposed L+Slearning algorithm offers a novel way to isolate low-rank partsof correlated multi-subject FC from idiosyncratic componentsover time. It exploits shared neuronal activity across subjects interms of the network connectivity directly rather than the low-level BOLD responses in ISFC. This is equivalent to denoisingthe FC networks instead of raw fMRI signals via ISC, and thus

extracts better FC structure information. By utilizing cross-subject reliability of FC measures captured in the low-rankpart, we show that it provides a better mapping to naturalisticstimuli than the ISFC. Furthermore, our method advancesa more general analysis framework applicable to other FCmeasures beyond the Pearson correlation in ISFC. (3) UnlikeISFC which computes a group-level FC profile related only tothe stimuli, the proposed method recovers both the stimulus-related and unrelated FC patterns. This allows us to quantifyconfounding influence on the stimulus-induced FC, includingpotentially interesting stimulus-independent sources of neuraldynamics, such as intrinsic processes, attentional variationsand other individual differences. (4) While extracting a sharedconnectivity structure, our method also retains subject-specificFC patterns unavailable from ISFC, which is useful for study-ing individual differences in dynamic FC during naturalisticstimulation.

II. LOW-RANK AND SPARSE MATRIX DECOMPOSITIONFOR DYNAMIC FUNCTIONAL CONNECTIVITY

A. Proposed Model

Suppose we observe a set of weighted connectivity matrices{Σti, t = 1, . . . , T, i = 1, . . . ,M} for multi-subject, time-dependent functional brain networks with same set of Nnodes, where elements in Σti ∈ RN×N are measures ofpairwise interactions between nodes in the network at timepoint t for ith subject. T is the number of time points andM is the number of subjects. Given the observed {Σti}, weaim to separate the task or stimulus-related brain FC dynamicsfrom the background composed of intrinsic neural correlations,noise and other non-neuronal contributions. Our approachbuilds on the hypothesis that FC networks across differentsubjects admit a common underlying structure and exhibitsimilar or highly-correlated patterns, due to shared neuronalresponse among subjects when experiencing the same stimuli,e.g., watching the same movie in a naturalistic setting. On theother hand, the background connectivity networks are assumedto be relatively sparse compared to the stimulus-induced ones,and some confounding fluctuations such as motion-relatedartifacts occur only occasionally in certain subjects. Thesebackground activities are spontaneous, not time-locked tostimuli, and thus varying considerably across subjects.

In particular, we propose a decomposition of the time-dependent connectivity matrices Σti as a superposition ofa low-rank and a sparse component that represent respec-tively the correlated stimulus-induced connectivity and thesubject-varying background activities. To formulate, let zti =vec(Σti) be the vectorized version of Σti, and Zt =[zt1, . . . , ztM ] ∈ RN2×M to denote the concatenation of thevectors of connectivity metrics over all subjects at time t. Weconsider a low-rank plus sparse (L+S) decomposition of thetime-dependent matrices Zt

Zt = Lt + St, ∀t = 1, . . . , T (1)

where Lt = [lt1, . . . , ltM ] is the low-rank matrix withrank(Lt) = rt < min(N2,M), St = [st1, . . . , stM ] isa sparse matrix with a fraction s of non-zero entries, andlti ∈ RN2

, sti ∈ RN2

are respectively the column vectors

3

of stimulus-related and background connectivity metrics forthe i-th subject at time t. The stimulus-induced connectivitythat are correlated between subjects at each time t can bewell-approximated by the columns of Lt that lie on a low-dimensional subspace spanned by common bases. For exam-ple, rank one rt = 1 yields a constant connectivity patternover all subjects. The columns of St captures the subject-specific background activities. Unlike the small Gaussian noiseassumption, entries of St may have arbitrarily large magnitudewith unknown sparse support, and thus capturing random,gross corruptions from the background events on the stimulus-related information in functional connectivity.B. L+S Decomposition

1) Principal Component Pursuit: To recover the sequenceof low-rank matrices {Lt} of stimulus-related connectivityfrom the corrupted observations {Zt} in (1), we considersolving the following optimization problem

min{Lt},{St}

T∑

t=1

‖Lt‖∗ + λT∑

t=1

‖St‖1

s.t. Lt + St = Zt, ∀t = 1, . . . , T (2)

where ‖Lt‖∗ =∑rti=1 σi(Lt) is the nuclear norm of matrix Lt,

i.e. the sum of all its singular values, ‖St‖1 =∑ij |stij | is

the `1 norm of St and λ > 0 is a regularization parametercontrolling the trade-off between the low-rank and sparsecomponents. The nuclear norm induces sparsity of the vectorof singular values, or equivalently for the matrix Lt to below rank, while the `1 penalty imposes element-wise sparsityof St. Minimization of (2) corresponds to applying the well-known PCP approach [14] at each time point. Under theincoherence conditions on the row and column subspaces ofLt, the PCP solution can perfectly recover the low-rank andsparse components, provided that the rank of Lt is not toolarge and St is reasonably sparse [14], [17]. Many efficientand scalable algorithms such as the augmented Lagrangemultipliers (ALM) method [18] have been proposed to solvethis convex optimization.

2) Fused L+S Recovery: The slowly-varying or possiblyconstant stimulus-related connectivity patterns across subjectsmay not be fully explained by a low-rank structure estimatedin (2). To better capture this inter-subject similarity in connec-tivity structure, we introduce a novel L+S decomposition bysolving a fused version of PCP (fused PCP)

min{Lt},{St}

T∑

t=1

‖Lt‖∗ + λ1

T∑

t=1

‖St‖1

+ λ2

T∑

t=1

M∑

i=2

∥∥lti − lt(i−1)∥∥1

s.t. Lt + St = Zt, ∀t = 1, . . . , T (3)

Here, the additional fused term∑Mi=2

∥∥lti − lt(i−1)∥∥1, regu-

larized by tuning parameter λ2 > 0, encourages the stimulus-induced group homogeneity by penalizing the differencesin the connectivity profiles lti between subjects. For somesufficiently large λ2, the penalty shrinks inter-subject differ-ences |ltij − lt(i−1)j | in some connectivity coefficients j to 0,

resulting in similar (but not necessarily identical) connectivitypatterns across subjects. Clearly, the model (3) includes theoriginal L+S decomposition (2) as a special case when λ2 = 0.

Define a matrix A = D⊗ IN2 where

D =

−1 1 0 . . . 00 −1 1 . . . 0...

.... . . . . .

...0 0 . . . −1 1

∈ R(M−1)×M

is first-order difference matrix and IN2 is a N2×N2 identitymatrix. We can reformulate the objective function in (3) as

T∑

t=1

‖Lt‖∗ + λ1

T∑

t=1

‖St‖1 + λ2

T∑

t=1

‖Avec(Lt)‖1 . (4)

III. OPTIMIZATION

For the fused L+S decomposition, we develop an ADMMalgorithm to solve the convex optimization problem (3) whichhas no closed form solution. We introduce a set of surrogatevariables α1, . . . ,αT where αt = Avec(Lt) ∈ RN2(M−1).Then, the minimization of (4) is equivalent to

min{Lt},{St},{αt}

T∑

t=1

‖Lt‖∗ + λ1

T∑

t=1

‖St‖1 + λ2

T∑

t=1

‖αt‖1

s.t. Lt + St = Zt, Avec(Lt) = αt, ∀t = 1, . . . , T(5)

The augmented Lagrangian function of (5) is

Lµ({Lt}, {St}, {αt}, {Xt}, {Yt})

=T∑

t=1

‖Lt‖∗ + λ1

T∑

t=1

‖St‖1 + λ2

T∑

t=1

‖αt‖1

+

T∑

t=1

(tr [XT

t (Lt + St − Zt)] + YTt (Avec(Lt)−αt)

)

2

T∑

t=1

(‖Lt + St − Zt‖2F + ‖Avec(Lt)−αt‖22

)(6)

where {Xt}, {Yt} are Lagrange multipliers, µ > 0 is a pre-specified step-size parameter, ‖·‖2 denotes the `2 norm and‖·‖F is the Frobenius norm. In a rescaled form, (6) can bewritten as

Lµ({Lt}, {St}, {αt}, {Xt}, {Yt})

=T∑

t=1

‖Lt‖∗ + λ1

T∑

t=1

‖St‖1 + λ2

T∑

t=1

‖αt‖1

2

T∑

t=1

(‖Lt + St − Zt + Xt‖2F

+ ‖Avec(Lt)−αt + Yt‖22 − ‖Xt‖2F − ‖Yt‖22)

(7)

where Xt = µ−1Xt and Yt = µ−1Yt are rescaled Lagrangemultipliers.

We can obtain the parameter estimates by minimizing theaugmented Lagrangian via ADMM algorithm [19], whichalternates between minimizing (7) with respect to each set

4

of primal variables Lt,St,αt sequentially, and the updatingof the dual variables Xt,Yt. Let initialize the parameters by(L0

t ,S0t ,α

0t ,X

0t ,Y

0t ) = (0,0,0,0,0) for t = 1, . . . , T . The

proposed ADMM algorithm solves the following subproblemsiteratively until convergence, with parameter updates at eachiteration k given by

{Lkt } = argmin{Lt}

Lµ({Lt}, {Sk−1t }, {αk−1t }, {Xk−1t }, {Yk−1

t })

(8)

{Skt } = argmin{St}

Lµ({Lkt }, {St}, {αk−1t }, {Xk−1t }, {Yk−1

t })

(9)

{αkt } = argmin{αt}

Lµ({Lkt }, {Skt }, {αt}, {Xk−1t }, {Yk−1

t })

(10)

Xkt = Xk−1

t + Lkt + Skt − Zt, ∀t = 1, . . . , T (11)

Ykt = Yk−1

t + Avec(Lkt )−αkt , ∀t = 1, . . . , T (12)

Next we derive the solution for each subproblem.1) Update Lt: To update {Lkt }, the subproblem (8) can be

separated into minimizations with respect to Lt’s at individualtime points (similarly for subproblems (9)-(10)), where eachLt is updated independently

Lkt = argminLt

‖Lt‖∗ +µ

2

∥∥Lt + Sk−1t − Zt + Xk−1t

∥∥2F

2

∥∥Avec(Lt)−αk−1t + Yk−1t

∥∥22. (13)

Due to the non-identity matrix A, (13) does not have a closed-form solution. Motivated by the linearized ADMM methodin [20] recently applied to lasso regression [21], [22], welinearize the quadratic term 1

2

∥∥Avec(Lt)−αk−1t + Yk−1t

∥∥22

in (13) as(AT

(Avec(Lk−1t )−αk−1t + Yk−1

t

))T (vec(Lt)− vec(Lk−1t )

)

2

∥∥vec(Lt)− vec(Lk−1t )∥∥22

(14)

where the parameter ν > 0 controls the proximity to Lk−1t .Substituting (14) into (13) and after some algebra, we obtainthe following approximate subproblem for (13)

Lkt = argminLt

1

µ‖Lt‖∗ +

1

2

∥∥Lt + Sk−1t − Zt + Xk−1t

∥∥2F

2‖Lt −Ct‖2F (15)

where Ct ∈ RN2×M is a matrix such that vec(Ct) = ct withct = vec(Lk−1t ) − AT

(Avec(Lk−1t )−αk−1t + Yk−1

t

)/ν.

We show that (15) can be solved via the singular value thresh-olding (SVT) [23]. More precisely, let Sτ (M) be the element-wise soft-thresholding (shrinkage) operator for a matrix M atlevel τ > 0, (Sτ (M))ij = sign(mij)max(|mij − τ |, 0). Wealso denote by Dτ (M) the singular value thresholding operatorgiven by Dτ (M) = USτ (Σ)VT where M = UΣVT is thesingular value decomposition (SVD) of M. Then, Lkt admitsan explicit solution

Lkt = D1/µ(ν+1)(Qt), Qt =1

ν + 1(Zt−Sk−1t +νCt−Xk−1

t )

(16)

Algorithm 1 ADMM for Fused L+S DecompositionInput: A series of vectorized multi-subject dynamic connec-

tivity metrics Z1, . . . ,ZT .Parameters: λ1 = 1/

√max(N2,M), λ2 > 0, µ =

1/ ‖Zt‖2, ν > ρ(ATA) where ρ(.) denotes the spectralradius.

Initialize: (L0t ,S

0t ,α

0t ,X

0t ,Y

0t ) = (0,0,0,0,0), ∀t =

1, . . . , T .1: for t = 1 : T do2: while not converge do3: Compute Ct: vec(Ct) = ct with ct = vec(Lk−1t )−

AT(Avec(Lk−1t )−αk−1t + Yk−1

t

)/ν

4: Compute Qt =1

ν+1 (Zt − Sk−1t + νCt −Xk−1t )

5: Update Lkt = D1/µ(ν+1) (Qt)

6: Update Skt = Sλ1/µ(Zt − Lkt −Xk−1t )

7: Update αkt = Sλ2/µ(Avec(Lkt ) + Yk−1t )

8: Update Xkt = Xk−1

t + Lkt + Skt − Zt9: Update Yk

t = Yk−1t + Avec(Lkt )−αkt

10: end while11: end forOutput: L1, . . . , LT and S1, . . . , ST .

The derivation of (16) is given in Supplementary SectionI. Note that when ν → 0, (16) reduces to the solution ofthe ALM algorithm for the low-rank component [18] in theoriginal L+S decomposition problem (2).

2) Update St: The minimization in subproblem (9) withrespect to St at each time point is equivalent to

Skt = argminSt

µ

2

∥∥Lkt + St − Zt + Xk−1t

∥∥2F+ λ1 ‖St‖1

We can get a closed-form solution by applying the element-wise soft-thresholding operator

Skt = Sλ1/µ(Zt − Lkt −Xk−1t ). (17)

3) Update αt: Similarly, the subproblem (10) with respectto each αt is

αkt = argminαt

µ

2

∥∥Avec(Lkt )−αt + Yk−1t

∥∥22+ λ1 ‖αt‖1

where the solution is also a soft-thresholding operation

αkt = Sλ2/µ(Avec(Lkt ) + Yk−1t ). (18)

We chose the parameter λ1 = 1/√

max(N2,M), as sug-gested by [14] for the PCP. The tuning parameter λ2 can beselected using cross-validation. To establish the convergence ofthe linearized ADMM algorithm, the parameter ν is requiredto satisfy ν > ρ(ATA) where ρ(.) denotes the spectral radius[22]. We set ν = 1.5ρ(ATA). We set the stopping criterionas

∥∥(Lkt ,Skt )− (Lk−1t ,Sk−1t )∥∥F

max{∥∥(Lk−1t ,Sk−1t )

∥∥F, 1} ≤ 10−6, ∀t = 1, . . . , T

or the maximum iteration number of 5000 is reached. Theproposed ADMM algorithm for solving the fused L+S de-composition is summarized in Algorithm 1.

5

IV. SIMULATION

We evaluate the performance of the proposed fused L+Sdecomposition method via simulation studies. The goal isto assess the performance of L+S algorithms in separatingthe common structure from individual-specific backgroundcomponents in multi-subject FC networks. Here, we focuson the static snapshot of dynamic networks at a particulartime point, and drop the index t for notational brevity. Wegenerate the data matrix of concatenated connectivity metricsof weighted, undirected networks for M subjects as follows

Z = Bβ + S (19)

where Z = [z1, . . . , zM ], S = [s1, . . . , sM ] with si =vec (Σs

i ), B = [b1, . . . ,br] ∈ RN2×r whose columns cor-respond to a set of r vectorized basis connectivity matri-ces bj = vec

(Σbj

)that are shared across subjects, and

β = [β1, . . . ,βM ] with βi ∈ Rr being a vector of mixingweights for subject i. To emulate the modular structure ofbrain networks [24] induced by stimuli or tasks, we generateeach basis connectivity matrix Σb

j from a stochastic blockmodel (SBM) [25] with two equal-sized communities withintra- and inter-community edge probabilities of 0.95 and0.2, respectively. The edge strengths are randomly drawnfrom the uniform distribution U [−1, 1]. The subject-dependentmixing weights βi are clustered according to two subgroups,which are independently sampled from normal distributionsβi ∼ N(0.5I, εI) for i = 1, . . . ,M/2 and βi ∼ N(0, εI)for i =M/2 + 1, . . . ,M to represent the strong and the weakresponse to the stimulus in the two subgroups, respectively. Weset ε = 0.005 such that the variability within each subgroupis small. The resulting matrix L = Bβ has rank r, and theconnectivity profiles in the columns of L are similar withineach subgroup. The background components of individualsubjects Σs

i are generated independently as sparse symmetricmatrices with support set uniformly chosen at random ata constant sparsity level s (fraction of non-zero entries),and whose non-zero entries are i.i.d. samples from U [−5, 5]indicating large magnitude of noise in Z. The simulations werereplicated 100 times.

We compare the performance of the fused L+S decom-position with the original version of PCP in recovering Land S from the observed matrix Z. For the fused L+S, weselect the optimal parameter λ2 over a large grid of values,which gives the best recovery performance on an indepen-dently generated validation set. To measure the accuracy ofrecovery, we computed the root mean squared error (RMSE)between the ground-truth and estimated low-rank and sparsecomponents, defined by RMSE-L = ‖L− L‖F / ‖L‖F andRMSE-S = ‖S− S‖F / ‖S‖F .

We investigate robustness of the methods to increasingrank r and sparsity level s of the ground-truth components.Fig.1 plots the average RMSEs over 100 replications as afunction of s (varied from 0.1 to 0.8 with an increment of0.1) for r = 1, 5, 10 with fixed N = 10 and M = 50. Asexpected, estimation errors of both methods for the low-rankand sparse components increase with s and r, generally. Thelarger fraction of corrupted entries and the rank of the data

matrix render the estimated L and S more deviated from theground-truth, in consistency with theoretical results in previousstudies [14]. Both methods perform comparably when s issmall. However, as s increases, the fused L+S estimator clearlyoutperforms the original version with substantially lower errorswith a slower growth rate. This suggests the robustness of thefused L+S method under the presence of severe dense noise.It provides a better recovery of the low-rank matrix of sharedconnectivity patterns even if almost all of its entries are grosslycorrupted by the background components, as evidenced bythe relatively stable errors when s = 0.7 and s = 0.8. Thisis due to the additional fused penalty term which smoothsout the individual-specific random background noise, whenleveraging on the inter-subject similarity of the stimulus-induced connectivity profiles.

Fig.2 shows fused L+S decomposition of a simulated multi-subject connectivity matrix Z with underlying rank r = 5and a fraction s = 0.5 of corrupted edges in each subject.The estimate L successfully recovers the correlated connec-tivity patterns across subjects with distinct sub-group sharedresponses from the highly-corrupted Z, while S filters outsubject-specific background noise without prior informationof the locations of the corrupted edges (the support of S).

V. ANALYSIS OF MOVIE-WATCHING FMRI

In this section, we applied the proposed fused L+S decom-position approach to separating the stimulus-related and back-ground components in time-varying FC from fMRI data of agroup of subjects exposed to complex naturalistic stimulationfrom movie viewing. We tested the established hypothesis that,as speech becomes more difficult to process, e.g., as throughspeech-in-noise, somatosensory, motor, and auditory systemsare engaged more to facilitate processing [26].

A. Data Acquisition & Stimuli

Thirteen subjects watched a television show during which 3Tesla fMRI data were collected at the Joan & Sanford I. WeillMedical College of Cornell University by Dr Skipper (GEMedical Systems, Milwaukee, WI; TR = 1.5 s; see [27] for fulldetails). The show, Are you smarter than a 5th grader?, was32min 24sec long. Among other features, audio events wereannotated at a 25ms resolution by a trained coder. These weregrouped into five general categories including noise, music,speech (speech that occurred in silence), sp+noise (speechthat co-occurred with noise, including background talking,clapping, music etc.) and silence. The audio annotations weredown-sampled to match the resolution of fMRI data by select-ing the auditory category that occurred most frequently withineach 1.5 s TR.

B. Preprocessing

The fMRI data were minimally preprocessed, includingslice timing correction, despiking, registration, nonlinear trans-formation to MNI coordinate space, and masking of voxelsoutside of the brain. We used the Freesurfer parcellation toobtain 16 regions of interest (ROIs) associated with languageand auditory processing: These ROIs include the centralsulcus (CS), planum polare (PP), planum temporale (PT),

6

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

0

0.2

0.4

0.6

0.8

1

RM

SE

-L

L+S

Fused L+S

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

0

0.2

0.4

0.6

0.8

1

RM

SE

-L

L+S

Fused L+S

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

0

0.2

0.4

0.6

0.8

1

RM

SE

-L

L+S

Fused L+S

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

Sparsity Level s

0

0.05

0.1

0.15

0.2

RM

SE

-S

L+S

Fused L+S

(a) r = 1

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

Sparsity Level s

0

0.05

0.1

0.15

0.2

RM

SE

-S

L+S

Fused L+S

(b) r = 5

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8

Sparsity Level s

0

0.05

0.1

0.15

0.2

RM

SE

-S

L+S

Fused L+S

(c) r = 10

Fig. 1. Performance comparison of original and fused L+S decomposition algorithms in recovering low-rank and sparse structures from simulated multi-subjectconnectivity matrices. RMSEs of estimated low-rank (top) and sparse (bottom) components against increasing level of sparsity s for different ranks r. Resultsare averages over 100 replications.

Original FC

10

20

30

40

Connections

-2

0

2

Low-rank

10

20

30

40

Connections

-1

-0.5

0

0.5

1

Sparse

1 5 10 15 20 25 30 35 40 45 50

Subjects

10

20

30

40

Connections

-2

0

2

Fig. 2. L+S recovery of synthetic multi-subject FC networks. One realizationof simulated matrix Z (top) whose columns are vectorized upper-triangularpart of undirected FC matrices with N = 10 nodes for M = 50 subjects. Thedimension is N(N−1)/2×M = 45×50. Low-rank L (middle) and sparseS (bottom) components recovered from Z using the fused L+S method.

postcentral gyrus (PoCG), precentral gyrus (PreCG), superiortemporal gyrus (STG), transverse temporal gyrus (TTG) andtransverse temporal sulcus (TTS) from both left and righthemispheres. These regions are commonly associated withauditory, somatosensory and motor processing generally andspeech perception more specifically. The mean BOLD timeseries were computed across voxels within each ROI.

C. Shared Structure in Dynamic Connectivity Across Subjects

To characterize dynamic FC, we computed time-varyingcorrelations between the 16 ROI fMRI time series for eachsubject, using sliding windows of 15 TRs (22.5 s) withstep size of 1 TR (1.5 s) between windows. The choiceof window length of 22.5 s was based on the inverse of

the lowest frequency present in the data [28] after high-passfiltering at 0.05 Hz. This window length has been used in[8] to compute the sliding-window ISFC for movie-watchingfMRI data. At each time point t, the vectorized correlationcoefficients of individual subjects are stacked as columns ofmatrix Zt ∈ R120×13. We then performed the fused L+Sdecomposition on the multi-subject dynamic correlation ma-trices Z1, . . . ,ZT by solving (3) using the proposed ADMMalgorithm. We selected the penalty parameter λ2 based onthe performance of the resulting low-rank models in pre-dicting the time-varying auditory stimuli (see Section IV-E).From a range of λ2 values from 0.001 to 0.5, we identifiedλ2 = 0.05 as optimal, giving the highest classification ac-curacy on the validation set in a leave-one-subject-out cross-validation. Fig. 3 shows the extracted time-varying low-rankand sparse components averaged across subjects and acrossall edges. In Fig. 3 (left), the low-rank component showsnoticeable time-varying structure with higher specificity intime compared to the original dynamic correlations. It providesa better localization of dynamic changes in connectivity whichwere possibly elicited by the movie and time-locked to theprocessing of specific ongoing movie events. From Fig. 3(right), it is apparent that the low-rank components are highlysynchronized across subjects over time, while identifying someinter-subject variability possibly arising from varied responsesof different individuals to the stimuli. In contrast, the sparsecomponents show temporally uncorrelated patterns. The low-rank part is specifically sensitive to synchronized responsesacross subjects. This suggests that the proposed fused L+Sdecomposition is capable of isolating reliable FC changesdriven by a correlated source across subjects (exposed to thesame movie stimuli in our case). The uncorrelated backgroundsources of dynamic FC across subjects are filtered out as theresidual sparse components. These sources may correspond tosubject-specific dynamic fluctuations of FC networks as well

7

0 100 200 300 400 500 600 700 800 900 100011001200

20406080

100

Edge

s

Average across subjects

0 100 200 300 400 500 600 700 800 900 1000 1100 1200

0.0

0.2

0.4

0.6

Mea

n co

rr.ac

ross

edg

es

Variability across subjects

0 100 200 300 400 500 600 700 800 900 100011001200

20406080

100

Edge

s

0 100 200 300 400 500 600 700 800 900 1000 1100 12000.00

0.05

0.10

0.15

Mea

n co

rr.ac

ross

edg

es

0 100 200 300 400 500 600 700 800 900 100011001200Time (TR)

20406080

100

Edge

s

0 100 200 300 400 500 600 700 800 900 1000 1100 1200Time (TR)

0.0

0.2

0.4

0.6

Mea

n co

rr.ac

ross

edg

es

0.20.00.20.40.6

0.10.00.10.20.3

0.20.00.20.40.6

OriginalFC

LowRank

Sparse

Fig. 3. Fused L+S decomposition of time-resolved, multi-subject FC from movie-watching fMRI. (Left) Original dynamic pairwise correlations between 16ROIs and the estimated low-rank and sparse components, averaged over all subjects. (Right) Mean dynamic correlations across all edges and the estimatedlow-rank and sparse components. The group-average (blue lines) and individual (thin lines) time courses depict distributions of dynamic correlations oversubjects. The time-varying low-rank components are temporally locked across subjects, suggesting potential shared response to stimulus sequence in the movie.

Orig

inal

Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10 Sub11 Sub12 Sub13

Low

Rank

Spar

se

0.80.4

0.00.40.8

0.040.000.040.080.12

0.80.4

0.00.40.8

Fig. 4. Common structure in time-resolved functional connectivity across subjects. (Top) An example of original functional connectivity matricesZt = [zt1, . . . , ztM ] for M = 13 subjects at time window t = 59 TRs. (Middle) Low-rank component Lt with rt = 2 recovered by our fusedL+S approach, revealing shared connectivity structure across subjects. (Bottom) Estimated sparse errors or residuals St correspond to individual-specificbackground components. The selected time window corresponds to the median value of mean correlation over all edges and subjects.

as non-neuronal artifacts unrelated to the stimulus-processing.For illustration, Fig. 4 shows decomposition for a snapshot

of connectivity networks Zt at time point t = 59 TRs(selected based on median correlation over time). The low-rankcomponent identifies a common network structure underlyingall subjects, as evident from the similar and highly-correlatedconnectivity patterns detected across subjects. As validated bysimulation, our algorithm can reliably recover the commonstructure from the subject-specific background noise, whichcould be arbitrarily large in magnitude and contaminated mostof the network correlations (e.g., the dense residuals for subject3). The estimated rank was rt = 2, where individual networks(columns of L) can be expressed as a linear combination of2 common basis connectivity matrices with different mixingweights for each subject. See Supplementary Fig. 1 for addi-tional example decomposition at t = 72 TRs.

D. Low-Rank Components Reveal Stimulus-Induced Connec-tivity Patterns

To test the aforementioned hypothesis, we examinedwhether the extracted low-rank components are related to the

auditory stimuli present in the movie. We did so by fittinga multinomial logistic regression model to learn the mappingbetween the audio annotations and the time-resolved low-rankconnectivity metrics. We set the auditory category labels asresponses and the connectivity edges as predictors. The silenceevents were used as the reference category. To evaluate thegoodness-of-fit and how well the low-rank components aretemporally mapped to the auditory events, we used the fittedlogistic model to decode the probability of the actual categorylabels at each time point. Fig. 5 shows the decoded timecourses for each auditory category. For all categories, highprobabilities are precisely aligned with the intervals whenthose categories are present, indicating accurate mapping ofthe low-rank components to the stimulus annotations.

In Fig. 6, we plot the mean functional connectivity mapsand the estimated regression coefficient maps associated withthe four auditory categories in the movie. The results arebased on the low-rank components at a lag of 3 s (i.e.,Lt−2 at 2 time points before stimulus onsets at t). Thelatter was empirically determined to exhibit the strongestconnectivity by testing at previous and future time points (at

8

0.00.20.40.60.81.0 Noise

0.00.20.40.60.81.0 Music

0.00.20.40.60.81.0 Sp+Noise

0.00.20.40.60.81.0 Speech

0 100 200 300 400 500 600 700 800 900 1000 1100 1200Time (TR)

0.00.20.40.60.81.0 Silence

Deco

ded

Prob

abilit

y

Fig. 5. Decoded probability of a specific auditory stimulus category beingpresent in the movie over time based on the multinomial logistic regressionfitted on the extracted low-rank components. Colored lines indicate timecourses of decoded probabilities averaged over all subjects. Gray regions showtime points when the stimulus category was actually present in the movie.

intervals of 1.5 s or 1 TR) relative to the stimulus onsets(See Supplementary Fig. 2 and Fig. 3). This lag was expectedbecause it approximately includes to delay, rise time and peakof a canonical hemodynamic response function associatedwith any one stimulus lasting about 1.5 s. For details, seeSupplementary Section III for the additional analyses on thetemporal effect of brain response. As shown in Fig. 6, thelow-rank component detected distinct connectivity patterns,with relatively different distributions of connectivity across thedifferent auditory environments. As hypothesized, we observea noticeable increase in connectivity when speech was accom-panied by noise (sp+noise), compared to speech in silence.We used a group-level t-test to contrast the FC between thesp+noise and speech conditions. Fig. 7(a) shows differencein the mean FC matrices (across subjects and time points)from the low-rank component between the two conditions.The connections were thresholded at p < 0.0001 (Bonferroni-corrected). Specifically, there was significantly higher FCthroughout regions typically involved in speech perceptionand language comprehension for speech-in-noise comparedto speech. This included connections PT.R-TTS.R and PT.R-STG.R which showed the largest increase in FC strength.Furthermore, our method identified significant increase inconnectivity between the regions of auditory and speech pro-cessing with the premotor, motor and somatosensory regionsin response to speech mixed with noise, e.g., TTS.L-PoCG.R,PT.R-PreCG.R and TTG.R-PoCG.R, as shown in Fig. 7(b).

In summary, results support hypotheses about the roleof ‘the motor system’ in speech perception [26], [29]. Inparticular, a distributed set of brain regions involved in pro-ducing speech are dynamically recruited in perceiving speech,particularly in more effortful listening situations. This might bebecause speech production regions are engaged in predictingupcoming speech sounds. In noisy environments, this processneeds to be engaged more as predictions are inherently lessaccurate.

E. Across-Subject Prediction of Stimulus Annotations

We trained a support vector machine (SVM) classifier onthe time-varying low-rank FC components to evaluate anyimprovement in predicting the different auditory categoriesover the original observed version of time-varying FC metrics.We also compared the performance with the inter-subjectfunctional correlation (ISFC) [7], a widely-used method forisolating stimulus-induced inter-regional correlations betweenbrains exposed to the same stimuli from other intrinsic andnon-neuronal activities. We computed the ISFC in a sliding-window manner as in [8], which showed that the time-varying ISFC can detect movie-driven fMRI FC changes byaveraging out uncorrelated dynamic fluctuations. Following[7], we performed across-subject classification but using thetime-resolved instead of static FC patterns to predict the stim-ulus events over time. We used leave-one-subject-out cross-validation. In each fold, one subject was held out for testingand the remaining M − 1 subjects were used to train theSVM. We computed the time-resolved low-rank componentsL(m)t = [lt,1, lt,m−1, . . . , lt,m+1, lt,M ] for the training set

from the standard FC, Z(m)t = [zt,1, zt,m−1, . . . , zt,m+1, zt,M ]

using the fused L+S algorithm. Since the test set is unseenfrom the training set, we extracted the low-rank componentsfor the test subject m via orthogonal projection of the test datalt,m = UUT zt,m, where U is a N2×r basis matrix estimatedbased on the SVD of train data L

(m)t . Classification accuracy

was computed as the percentage of time points correctlyassigned to the true category labels.

Fig. 8 shows the classification results using the differenttime-varying FC metrics at a lag of 3 s relative to thestimulus annotations. Analysis of the temporal effects on theclassification performance shows that the highest accuracy wasobtained at this time lag (Supplementary Fig. 5). The low-rank components significantly outperformed both the standardFC and ISFC in discriminating ongoing auditory events inthe movie. The ISFC is slightly better than the standard FC.The superior decoding performance suggests that the low-rank components are more stimulus-related, extracting novelinformation about the stimulus-induced dynamic functionalchanges not captured by the FC and ISFC. Among the L+Salgorithms, addition of fused penalty improves the cross-subject prediction considerably, due to the further smoothingof subject-specific fluctuations to uncover shared neuronalresponses that are time-locked to the same stimulus processing.As expected, the residual sparse components, corresponding touncorrelated effects of intrinsic neuronal processes and noise,were poorly predictive of the stimuli across subjects.

VI. CONCLUSION

We developed a novel L+S decomposition method forseparating stimulus-induced and background components ofdynamic FC networks during naturalistic neuroimaging, byleveraging on the time-locked nature of naturalistic stimuli.The method provides a principled way to recover both thestimulus-evoked, shared FC dynamics across individuals andthe stimulus-unrelated idiosyncratic variations in individuals,respectively modeled as the low-rank and sparse structures of

9

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

ROIs

Noise

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

Music

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

Sp+Noise

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

Speech

0.02

0.00

0.02

0.04

0.06

0.08

0.10

0.12

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

ROIs

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

ROIs

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

ROIs

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

ROIs

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

ROIs

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

-30.0

-20.0

-10.0

0.0

10.0

20.0

30.0

Fig. 6. Stimulus-induced connectivity networks during movie watching as revealed by the low-rank components. (Top) Average functional connectivity maps(across 13 subjects and time steps) associated with each of the four categories of auditory events in the natural movie. (Bottom) The corresponding spatialmaps of regression coefficients from the fitted logistic model. Only functional connections showing significant stimulus-related effects are displayed (maskedby regression coefficients with p < 0.0001, Bonferroni-corrected for multiple testings).

multi-subject dFC networks. The proposed fused PCP solvedvia an efficient ADMM algorithm can capture the stimulus-induced similarities in FC profiles between subjects moreeffectively, and its robustness to dense noise allows us tofilter out corruptions of FC edges that are not necessarilysparse. Our proposed framework is general, broadly applicableto other neuroimaging data such as electroencephalography(EEG), and can incorporate other FC measures beyond simplecorrelations in ISC-based analyses. In an application to movie-watching fMRI, our method identified time-locked changes inFC networks across subjects, which were meaningfully relatedto the processing of complex, time-varying auditory eventsin the movie, e.g., dynamic recruitment of speech productionsystems in perceiving speech in noisy environments. It alsorevealed potentially interesting individual differences in FCpatterns that may relate to behavioral appraisal of stimuli. Theextracted low-rank FC components also show better predictionof the movie auditory annotations than ISFC, suggesting thegain in information about the naturalistic stimuli by usingshared network structure instead of synchronization of low-level regional activity. Thus, the proposed L+S recovery ap-proach opens new opportunities for mapping brain networkdynamics to stimulus features and behavior during naturalisticparadigms. Future studies can explore potential applications ofL+S analysis of natural fMRI to connectome-based predictionof individual traits and behavior, and diagnosis of cognitiveand affective disorders. Further extensions are needed toimprove the scalability of L+S learning algorithm to handlelarge-sized brain networks. The dynamic FC analysis canalso be extended to investigate state-related changes in thenetwork structure, e.g., dynamic community structure acrosssubjects during naturalistic paradigms, by applying regime-switching models [30], [31] on the extracted time-resolved FCcomponents.

REFERENCES

[1] S. Sonkusare, M. Breakspear and C. Guo, “Naturalistic stimuli inneuroscience: Critically acclaimed,” Trends Cognitive Sci., vol. 23, no.8, pp. 699–714, 2019.

[2] J. Wang, et al., “Test–retest reliability of functional connectivitynetworks during naturalistic fmri paradigms,” Human Brain Mapp., vol.38, no. 4, pp. 2226–2241, 2017.

[3] A. Bartels and S. Zeki, “Functional brain mapping during free viewingof natural scenes,” Human Brain Mapp., vol. 21, no. 2, pp. 75–85, 2004.

[4] U. Hasson, et al., “Intersubject synchronization of cortical activity duringnatural vision,” Science, vol. 303, no. 5664, pp. 1634–1640, 2004.

[5] U. Hasson, R. Malach and D. J. Heeger, “Reliability of cortical activityduring natural stimulation,” Trends Cognitive Sci., vol. 14, no. 1, pp.40–48, 2010.

[6] I. P. Jaaskelainen, et al., “Inter-subject synchronization of prefrontal cor-tex hemodynamic activity during natural viewing,” Open NeuroimagingJ., vol. 2, pp. 14–19, 2008.

[7] E. Simony, et al., “Dynamic reconfiguration of the default mode networkduring narrative comprehension,” Nature Comm., vol. 7, pp. 12141,2016.

[8] T. A. Bolton, et al., “Brain dynamics in ASD during movie-watchingshow idiosyncratic functional integration and segregation,” Human BrainMapp., vol. 39, no. 6, pp. 2391–2404, 2018.

[9] X. Di and B. B. Biswal, “Intersubject consistent dynamic connectivityduring natural vision revealed by functional mri,” NeuroImage, pp. 1–9,2020.

[10] S. A. Nastase, et al., “Measuring shared responses across subjects usingintersubject correlation,” 2019.

[11] T. Bouwmans and E. H. Zahzah, “Robust pca via principal componentpursuit: A review for a comparative evaluation in video surveillance,”Comput. Vision Image Understanding, vol. 122, pp. 22–34, 2014.

[12] X. Liu, et al., “Background subtraction based on low-rank and structuredsparse decomposition,” IEEE Trans. Image Process., vol. 24, no. 8, pp.2502–2514, 2015.

[13] J. Wright, et al., “Robust principal component analysis: Exact recoveryof corrupted low-rank matrices via convex optimization,” in Adv. NeuralInf. Process. Syst., 2009, pp. 2080–2088.

[14] E. J. Candes, et al., “Robust principal component analysis?,” J. ACM,vol. 58, no. 3, pp. 1–37, 2011.

[15] P. Aggarwal and A. Gupta, “Low rank and sparsity constrained methodfor identifying overlapping functional brain networks,” PloS one, vol.13, no. 11, pp. 1–19, 2018.

[16] X. Jiang, et al., “Estimating functional connectivity networks via low-rank tensor approximation with applications to mci identification,” IEEETrans. Biomed. Eng., vol. 67, no. 7, pp. 1912–1920, 2020.

10

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

ROIs

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

ROIs

CS.L

PP.L

PT.L

PoCG

.LPr

eCG.

LST

G.L

TTG.

LTT

S.L

CS.R

PP.R

PT.R

PoCG

.RPr

eCG.

RST

G.R

TTG.

RTT

S.R

ROIs

CS.LPP.LPT.L

PoCG.LPreCG.L

STG.LTTG.LTTS.LCS.RPP.RPT.R

PoCG.RPreCG.R

STG.RTTG.RTTS.R

ROIs

0.000

0.004

0.008

0.012

0.016

0.0060

0.0045

0.0030

0.0015

0.0000

Speech+Noise > Speech

(a)

(b)

Fig. 7. Difference in functional connectivity between the speech-in-noise(sp+noise) and speech-only (speech) events as revealed by the low-rankcomponents at time lag of 3 s (or 2 TRs). (a) Connectivity maps showincrease (left) and decrease (right) in FC during sp+noise compared tospeech events. Only edges with significant difference in FC strength (absolutevalue of correlation coefficients) are shown (p < 0.0001, Bonferroni-corrected). The five connections exhibiting largest increase in FC includePT.R-TTS.R, TTS.L-PoCG.R, PP.R-TTS.R, TTS.L-PT.R and PT.R-STG.R. (b)Connectogram shows changes increase (hot colors) and decrease (blue) in FCstrength between auditory (AUD) and premotor-motor-somatosensory (PMS)regions. The thickness of links indicate magnitude of changes.

L+S Fused L+S0

10

20

30

40

50

60

70

Accura

cy (

%)

Original ISFC

Low-rank Sparse

Fig. 8. Accuracy scores of leave-one-subject-out prediction of stimulusannotations for five categories of audio events present in the movie over timeusing different time-resolved connectivity features. The categories consideredinclude Noise, Music, Speech, Sp+Noise and Silence.

[17] V. Chandrasekaran, et al., “Rank-sparsity incoherence for matrix

decomposition,” SIAM J. Optimization, vol. 21, no. 2, pp. 572–596,2011.

[18] Z. Lin, M. Chen and Y. Ma, “The augmented lagrange multiplier methodfor exact recovery of corrupted low-rank matrices,” arXiv preprintarXiv:1009.5055, 2010.

[19] S. Boyd, et al., “Distributed optimization and statistical learning viathe alternating direction method of multipliers,” Found. Trends Mach.Learn., vol. 3, no. 1, pp. 1–122, 2011.

[20] X. Wang and X. Yuan, “The linearized alternating direction method ofmultipliers for dantzig selector,” SIAM J. Sci. Comput., vol. 34, no. 5,2012.

[21] X. Li, et al., “Linearized alternating direction method of multipliers forsparse group and fused lasso models,” Comput. Stat. Data Anal., vol.79, pp. 203–221, 2014.

[22] M. Li, et al., “The linearized alternating direction method of multipliersfor low-rank and fused lasso matrix regression model,” J. Appl. Stat.,pp. 1–18, 2020.

[23] J.-F. Cai, E. J. Candes and Z. Shen, “A singular value thresholdingalgorithm for matrix completion,” SIAM J. Optimization, vol. 20, no. 4,pp. 1956–1982, 2010.

[24] O. Sporns and R. F. Betzel, “Modular brain networks,” Annu. Rev.Psychol., vol. 67, pp. 613–640, 2016.

[25] D. M. Pavlovic, et al., “Stochastic blockmodeling of the modules andcore of the caenorhabditis elegans connectome,” PloS One, vol. 9, no.7, 2014.

[26] J. I. Skipper, J. T. Devlin and D. R. Lametti, “The hearing ear is alwaysfound close to the speaking tongue: Review of the role of the motorsystem in speech perception,” Brain and language, vol. 164, pp. 77–105, 2017.

[27] J. I. Skipper and J. D. Zevin, “Brain reorganization in anticipation ofpredictable words,” bioRxiv 101113, 2017.

[28] N. Leonardi and D. Van De Ville, “On spurious and real fluctuationsof dynamic functional connectivity during rest,” NeuroImage, vol. 104,pp. 430–436, 2015.

[29] P. Adank, “The neural bases of difficult speech comprehension andspeech production: Two activation likelihood estimation (ale) meta-analyses,” Brain and Language, vol. 122, no. 1, pp. 42–54, 2012.

[30] C. M. Ting, et al., “Estimating dynamic connectivity states in fMRIusing regime-switching factor models,” IEEE Trans. Med. Imaging, vol.37, no. 4, pp. 1011–1023, 2018.

[31] C. M. Ting, et al., “Detecting dynamic community structure in functionalbrain networks across individuals: A multilayer approach,” IEEE Trans.Med. Imaging, 2020, In press.

1

Supplementary Material

C.-M. Ting, J. I. Skipper, S. L. Small, and H. Ombao

Abstract—This appendix contains additional material to ac-company our paper “Separating Stimulus-Induced and Back-ground Components of Dynamic Functional Connectivity inNaturalistic fMRI”. Section I provides some technical details ofthe proposed ADMM algorithm for fused L+S decomposition.Section II presents supplementary results for the detected sharedand individual-specific connectivity structure from movie fMRIdata. Section III investigates the temporal effect of brain responseto auditory stimuli present in the movie.

I. TECHNICAL DETAILS: UPDATE OF Lt

We derive a closed-form solution for updating the low-rankcomponent Lt in the proposed ADMM algorithm.

Proposition 1. For any µ ≥ 0, the solution (Eq. 16, maintext) to the subproblem (15) as defined by the singular valuethresholding (SVT) operator obeys

D1/µ(ν+1)(Qt) = argminLt

{1

µ‖Lt‖∗

+1

2‖Lt + St − Zt + Xt‖2F +

ν

2‖Lt −Ct‖2F

}(1)

where Qt =1

ν+1 (Zt − St + νCt −Xt).

Proof. We generally follow the lines of proof of Theorem 2.1in [1]. Let φ = ν+1 and τ = µ−1. By the optimality conditionof the objective function in (1), the solution L∗

t must satisfy

0 ∈ φL∗t − Γt + τ∂ ‖L∗

t ‖∗ (2)

where ∂ ‖L∗t ‖∗ is the subgradient of nuclear norm evaluated

at L∗t , and Γt = Zt − St + νCt −Xt. Let M = UΣVT be

the SVD of an arbitrary matrix M. It is known that

∂ ‖M‖∗ ={UVT + W : UTW = 0, WV = 0, ‖W‖2 ≤ 1

}.

Define L∗t = Dτ/φ(φ−1Γt). To show that L∗

t obeys (2),decompose the SVD of Γt as

Γt = U0Σ0VT0 + U1Σ1V

T1

where U0,V0 (resp. U1,V1) are singular vectors with singu-lar values σ0,j > τ (resp. σ1,j ≤ τ ). It follows that the SVD

of φ−1Γt is

φ−1Γt = U0Σ0VT0 + U1Σ1V

T1

where Σ0 = φ−1Σ0 (resp. Σ1 = φ−1Σ1) with singular valuesσ0,j > τ/φ (resp. σ1,j ≤ τ/φ). Then, we have

L∗t = U0(Σ0 − τ/φI)V0 = φ−1U0 (Σ0 − τI)V0.

Therefore,

Γt − φL∗t = τ

(U0V

T0 + W

), W = (τ/φ)−1U1Σ1V

T1 .

By definition, UT0 W = 0, WV0 = 0 and since the diagonal

elements of Σ1 have magnitude bounded by τ/φ and ‖W‖2 ≤1. Hence Γt − φL∗

t = τ∂ ‖L∗t ‖∗.

II. SHARED & SUBJECT-SPECIFIC FC STRUCTURE

Fig. 1 shows L+S decomposition for a snapshot of FC net-works at time window t = 72 (corresponding to the maximumvalue of correlation over time). The low-rank componentsdetected shared FC patterns across subjects while revealingstronger response to the stimulus in subjects 3, 5 and 7.

III. TEMPORAL EFFECT OF BRAIN RESPONSE

To investigate the timing of brain response to stimuli, weexamined the estimated functional connectivity (FC) beforeand after the onsets of auditory stimuli. Fig. 2 and Fig. 3show respectively the FC and regression coefficient maps fittedbased on low-rank components at future and previous timepoints (at intervals of 1.5 s or 1 TR) relative to the stimulusonsets. The responsiveness of brain networks varies as afunction of temporal context. We observed that the strongestconnectivity occurred 3 s prior to the auditory events (t-3s), which was especially evident for the sp+noise and musiccategories. In Fig. 4, we also contrast the FC for sp+noise vsspeech categories at the different time points around stimulusonsets. Similarly, the largest increase in connectivity strengthwas found at the lag of 3 s before the onsets when perceivingspeech-in-noise, with enhanced interactions between regionsfrom the sensory motor cortex and the superior temporal plane.These interactions decreased gradually afterward over timewith little activity after the stimulus onsets.

Orig

inal

Sub1 Sub2 Sub3 Sub4 Sub5 Sub6 Sub7 Sub8 Sub9 Sub10 Sub11 Sub12 Sub13

Low

Rank

Spar

se

0.80.4

0.00.40.8

0.20.00.20.4

1.00.5

0.00.51.0

Fig. 1. Common structure in time-resolved functional connectivity across subjects. (Top) Example of original functional connectivity matrices Zt =[zt1, . . . , ztM ] for M = 13 subjects at time window t = 72 TRs. (Middle) Low-rank component Lt with rt = 2 recovered by our fused L+Sapproach, revealing shared connectivity structure across subjects. (Bottom) Estimated sparse errors or residuals St correspond to individual-specific backgroundcomponents. The time window selected corresponds to the maximum value of mean correlation over all edges and subjects.

arX

iv:2

102.

1033

1v1

[q-

bio.

NC

] 2

4 Ja

n 20

21

2

Noi

set-6.0s t-4.5s t-3.0s t-1.5s t t+1.5s t+3.0s t+4.5s t+6.0s

Mus

icSp

+N

oise

Spee

ch

0.00

0.04

0.08

0.12

0.00

0.04

0.08

0.12

0.00

0.04

0.08

0.12

0.00

0.04

0.08

0.12

Fig. 2. Mean functional connectivity maps of the low-rank components (across subjects and time) at different time points before and after the stimulus onsetsfor four auditory events in the movie. The maps are masked by significant regression coefficients in the logistic model (p < 0.0001, Bonferroni-corrected).

Noi

se

t-6.0s t-4.5s t-3.0s t-1.5s t t+1.5s t+3.0s t+4.5s t+6.0s

Mus

icSp

+N

oise

Spee

ch

4020

020

4020

02040

40

20

0

20

3015

01530

Fig. 3. Regression coefficient maps of logistic regression models fitted on time-resolved low-rank components at different time points before and after thestimulus annotations for four auditory events in the movie. The maps are thresholed at p < 0.0001 (Bonferroni-corrected).

To examine the temporal effect on the cross-subject predic-tion of stimulus annotations, we trained the SVM classifiersusing the preceding and succeeding FC patterns to predict theauditory events. Fig. 5 shows that use of the proposed fusedlow-rank FC performed consistently better than the standardFC and ISFC at all time points considered before and after thestimulus onsets. The highest prediction accuracy was achievedat 3 s before the onsets suggesting the FC metrics at this timelag provided the most discriminative information about thestimuli. This is confirmed by the largest contrast in FC pattern

between stimulus types detected at this time lag (Fig. 4). Thedrop in prediction performance when using the succeeding FCimplies deceased level of brain responsiveness to the stimuliover time after the onsets.

REFERENCES

[1] J.-F. Cai, E. J. Candes and Z. Shen, “A singular value thresholdingalgorithm for matrix completion,” SIAM J. Optimization, vol. 20, no. 4,pp. 1956–1982, 2010.

3

Sp+

Noi

se

t-6.0s t-4.5s t-3.0s t-1.5s t t+1.5s t+3.0s t+4.5s t+6.0s

Spee

chSp

+N

> S

pSp

+N

< S

p

0.00

0.04

0.08

0.12

0.00

0.04

0.08

0.12

0.0000.0050.0100.0150.020

0.0120.0090.0060.003

0.000

Fig. 4. Contrast of functional connectivity between the speech-in-noise and speech events as revealed by the low-rank components at different time pointsbefore and after the stimulus onsets. Mean FC maps across subjects and time for the speech-in-noise and speech events (top panels) and their differences (bottompanels). Only edges with significant difference in FC strength (absolute value of correlation coefficients) are shown (p < 0.0001, Bonferroni-corrected).

t-4.5s t-3.0s t-1.5s t t+1.5s t+3.0s t+4.5s0

10

20

30

40

50

60

70

Accura

cy (

%)

Original ISFC Low-rank

Fig. 5. Prediction accuracies of various time-resolved connectivity features at different time points before and after the auditory stimulus onsets. Error barsindicate standard deviations of accuracy across subjects in leave-one-subject-out cross-validation.