
Source-Informed Segmentation: A Data-Driven Approach for the Temporal Segmentation of EEG

Ali E. Haddad, Student Member, IEEE, and Laleh Najafizadeh, Senior Member, IEEE

Abstract— Goal: Understanding the dynamics of brain function through non-invasive monitoring techniques requires the development of computational techniques that can deal with the non-stationary properties of recorded activities. As a solution to this problem, a new data-driven segmentation method for recordings obtained through electroencephalography (EEG) is presented. Methods: The proposed method utilizes singular value decomposition (SVD) to identify the time intervals in the EEG recordings during which the spatial distribution of clusters of active cortical neurons remains quasi-stationary. Theoretical analysis shows that the spatial locality features of these clusters can be, asymptotically, captured by the most significant left singular subspace of the EEG data. A reference/sliding window approach is employed to dynamically extract this feature subspace, and the running error is monitored for significant changes using the Kolmogorov-Smirnov (K-S) test. Results: Simulation results, for a wide range of possible scenarios regarding the spatial distribution of active cortical neurons, show that the algorithm is successful in accurately detecting the segmental structure of the simulated EEG data. The algorithm is also applied to experimental EEG recordings of a modified visual oddball task. Results identify a unique sequence of dynamic patterns in the event-related potential (ERP) response for each of the three involved stimuli. Conclusion: The proposed method, without using source localization methods or scalp topographical maps, is able to identify intervals of quasi-stationarity in the EEG recordings. Significance: The proposed segmentation technique can offer new insights on the dynamics of the functional organization of the brain in action.

Index Terms— Electroencephalography (EEG), segmentation, singular value decomposition (SVD), dynamics of brain function.

I. INTRODUCTION

Recent years have witnessed an increasing interest in analyzing the dynamics of brain function in order to obtain a better understanding of the brain's large-scale functional organization. A growing number of studies involving various brain monitoring modalities have shown that different regions of the brain interact dynamically during task execution or when the brain is at rest [1]–[21]. As neuronal interactions occur on the millisecond time scale, by offering high temporal resolution, electroencephalography (EEG) is considered to be an attractive monitoring technique for exploring the temporal dynamics of brain function [4], [11], [18], [22]–[26].

One approach to analyze the dynamics of brain function from EEG recordings, while coping with the non-stationarity

A. E. Haddad and L. Najafizadeh are with the Department of Electrical and Computer Engineering, Rutgers University, Piscataway, NJ 08854, USA. E-mail: [email protected], [email protected]. This paper was presented in part at the International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015.

Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending an email to [email protected].

of neuronal activity, is to use segmentation techniques, where recorded activities are broken into intervals during which the functionality of the brain can be assumed to be quasi-stationary, given some statistical definition. To date, several EEG segmentation approaches have been proposed. These methods can generally be classified into three categories: model-based, metric-based, and connectivity-based [27].

Segmentation methods in the model-based category interpret the quasi-stationarity of an interval as the existence of a time-invariant model that is representative of the temporal variations of the signal within the interval [28]–[31]. A common technique in this category involves estimating an autoregressive (AR) model (of a certain order) for the data under a reference window, fitting the data under a sliding window to this model, and evaluating the statistical properties of the fitting error. Data under two windows that belong to the same segment share the same underlying model; therefore, in this case, for example, the fitting error is expected to be zero-mean white noise. In general, the model-based segmentation methods consider signals from single EEG electrodes at a time.

Segmentation methods in the metric-based category calculate a testing metric directly from the data sequence. The metric is then monitored for significant changes to identify segment boundaries [32]–[37]. For example, in [32]–[34], the segmentation problem is reduced to finding a change point in the autocorrelation of the recordings of a given channel, using a generalized variant of the Kolmogorov-Smirnov (K-S) statistic. The metrics used in this category of segmentation methods only address certain statistical properties of the data, thus restricting their applications to the definitions of quasi-stationarity these statistics impose. Furthermore, most of the metric-based methods segment EEG data on a single-channel basis.

The third category of segmentation methods can be considered a special case of the metric-based category, with the metric used being the connectivity measure [11], [38]–[40]. Connectivity-based methods assume that a segment is an interval during which the functional connectivity among some predefined brain regions stays quasi-stationary. For example, in [11], through low-rank approximations to consecutive tensors and computation of the subspace distance between them, segment boundaries are identified when network connectivity patterns show consistent changes across subjects. Unlike the two earlier categories of segmentation methods, the connectivity-based methods process all EEG channels simultaneously. However, due to the segment definition they assume, their applications are restricted to connectivity analysis.

A related EEG analysis method is based on microstates


[41]–[47]. This method assumes the existence of a predefined, limited number of repeating brain states, each of which is characterized by a distinct dominant topographical map. To identify these states and their corresponding topographies, an iterative procedure is generally used. A common practice is to initialize the procedure with a subset of the topographical maps obtained at the points of local maxima of the global field power (GFP). The topographic map at each sample of the EEG recordings is then assigned to the state with the closest matching map, and through an iterative process, the topographical map for each class is updated. This procedure is expected to converge to a local optimum point, where the assigned topographical maps explain most of the variance in the EEG data. A drawback of this approach is its need for the number of microstates to be known a priori. To overcome this, the Chicago electrical neuroimaging analytics (CENA) micro-segmentation technique [48], [49] adopts a data-driven approach to determine the number of stable microstates from ERPs. In this approach, first, the running global root-mean-squared-error (RMSE) between pairs of samples, with a predefined time-lag, is calculated. Then, peaks in the RMSE that pass a predetermined confidence interval (CI) are considered the beginnings of stable microstates, which last until the next corresponding RMSE troughs. Microstates are set as the average topographies over each stable interval. To avoid having repeated microstates, the n-dimensional cosine distance among the detected microstates is calculated, and the ones with distances lower than a predefined value are merged. A drawback of this approach, though, is the need for jittered baseline data for determining the CI. A variant of the microstate analysis method based on the state of functional connectivity graphs has also been proposed in [50], in which instantaneous phase locking value (PLV)-based functional connectivity graphs are first created; then, the neural gas algorithm is used to search for a given number of reference topographies that approximate the created functional connectivity graphs with minimum error within each trial. Reference topographies from different trials and subjects are then clustered to find the overall representative topographies, where the number of topographies is chosen to optimize the clustering performance.
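For concreteness, the running-RMSE idea behind the CENA-style micro-segmentation can be sketched as follows. This is a minimal illustration, not the CENA implementation: the lag value, the percentile stand-in for the baseline-derived CI, and the naive peak test are all assumptions made here for brevity.

```python
import numpy as np

def running_rmse_boundaries(eeg, lag=5, ci_threshold=None):
    """Illustrative running global RMSE between time-lagged topographies.

    eeg: (channels, samples) array. Peaks in the RMSE trace that exceed a
    threshold are treated as candidate microstate onsets. In CENA the
    threshold comes from jittered baseline data; a simple percentile
    stands in for it here.
    """
    diffs = eeg[:, lag:] - eeg[:, :-lag]
    rmse = np.sqrt(np.mean(diffs ** 2, axis=0))      # global RMSE per lagged pair
    if ci_threshold is None:
        ci_threshold = np.percentile(rmse, 95)       # placeholder for the baseline CI
    # crude local-maximum test on the supra-threshold samples
    peaks = [t for t in range(1, len(rmse) - 1)
             if rmse[t] > ci_threshold
             and rmse[t] >= rmse[t - 1] and rmse[t] >= rmse[t + 1]]
    return np.array(peaks) + lag                     # map back to sample indices
```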

In this paper, utilizing singular value decomposition (SVD), we present a new EEG segmentation method capable of identifying intervals of time in EEG recordings during which the spatial distribution of clusters of active cortical neurons remains quasi-stationary. The segmentation is achieved without using source localization methods and without using the spatial characteristics of topographical scalp maps. It will be shown that the significant left singular subspace of the EEG recordings, obtained through SVD, resembles the spatial locality features of the clusters of active cortical neurons and can be used as a feature space in the segmentation algorithm. The proposed algorithm uses information from all EEG electrodes to identify the segment boundaries. A preliminary version of this work was presented in [27]. Here, in addition to providing a comprehensive theoretical analysis for the proposed approach, a simulation platform is presented to extensively examine the ability of the algorithm in successfully identifying

Fig. 1. An illustration of 3 simulated EEG segments, differentiated by the spatial distribution of active cortical points during each segment. Two vertical lines at T1 = 60 msec and T2 = 160 msec mark the segment boundaries, at which changes in the spatial distribution of activities in the source space (the cortex) occur.

the segment boundaries for a variety of possible scenarios regarding the distribution of cortical activity. The algorithm is also applied to experimental data obtained from five healthy subjects while they performed a modified visual oddball task. As the ground-truth is not known for the experimental data, the outcomes of the proposed segmentation algorithm are compared to the results of an alternative source-localization-based segmentation approach.

The rest of the paper is organized as follows. In Section II, the definition for the segment in EEG data is presented, along with the theoretical basis for the proposed segmentation approach. The proposed algorithm is described in Section III. Extensive simulation results are presented in Section IV, and experimental results are discussed in Section V. Discussions are provided in Section VI, and the paper is concluded in Section VII.

II. THEORETICAL FORMULATION

A. Definition of EEG Segment

While the segmentation method proposed in this paper aims at segmenting the EEG data collected on the scalp (the sensor space), the definition adopted for the segment is based on the state of neuronal activity in the cortex (the source space). Inspired by the definition of the microstates [24], [43], [44], [46], [51], we define a segment in the EEG data as an interval of time during which the EEG recordings correspond to a quasi-stationary spatial distribution of clusters of active cortical neurons. The objective of the proposed segmentation method is to detect the time points in the EEG recordings at which there would be changes in the spatial distribution of cortical activity. An illustrative example for three such segments in simulated EEG signals is shown in Fig. 1. The vertical lines at T1 = 60 msec and T2 = 160 msec mark instants when changes in the spatial distribution of cortical activity occur.


Fig. 2. Top view of the brain showing spatial discretization of the surface of the cortex into P = 15002 points. An active cluster of size Q1 = 221 (a radius of 2 cm) of negative polarity is shown in blue on the left hemisphere. An active cluster of size Q2 = 108 (a radius of 1.75 cm) with positive polarity is shown in red on the right hemisphere (middle cluster). Another active cluster of size Q3 = 180 (a radius of 1.75 cm) with negative polarity is shown in blue, also on the right hemisphere. The discretization in this figure is based on Brainstorm's template head anatomy [54].

B. Theoretical Basis

To examine the implications of the adopted segment definition on the characteristics of the EEG data, and in developing the theoretical basis for the proposed segmentation technique, we take a bottom-up approach. That is, we first inspect the space where neuronal activity occurs – the source space – and then examine the implications of the adopted segment definition on the characteristics of the EEG recordings obtained in the sensor space. We start with the scenario where there is one cluster of active cortical neurons over the duration of a segment (e.g., the second segment in Fig. 1), and develop the theoretical basis for the proposed segmentation method. The more general case of multiple clusters of active neurons is discussed in Appendix A.

1) Assumptions and Notations: In the source space, we assume the cortex to be spatially discretized into P points (see Fig. 2). The activation intensities at these P cortical points, observed over the duration {t1, t2, · · · , tT}, form the source matrix S ∈ R^(P×T). We refer to a cluster of Q adjacent cortical points with activation intensities remaining above a given threshold δ over the duration of a segment {t1, t2, · · · , tTseg}, as an active cluster of size Q. For the scenario where there exist R active clusters, for a given cluster i (i = 1, · · · , R) of size Qi cortical points, we will use k to index a cortical point in this cluster (k = 1, · · · , Qi), and tj to represent a time point during this segment (j = 1, · · · , Tseg). Cortical points with activities lower than the threshold δ will be considered as background activities.

In the sensor space, we assume there are C electrodes on the scalp. The EEG recordings obtained from these electrodes, observed over the duration {t1, t2, · · · , tT}, form the EEG matrix Y ∈ R^(C×T). We assume the relation between matrices Y and S follows the well-known model [27], [52], [53]

Y = G · S + N, (1)

where G ∈ R^(C×P) represents the gain (or lead field) matrix, and N ∈ R^(C×T) is the additive noise matrix.
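For reference, the forward model in (1) amounts to a single matrix product plus additive noise. A minimal sketch, with the dimensions of the paper's setup but random matrices standing in for a real BEM-derived gain matrix and simulated sources:

```python
import numpy as np

rng = np.random.default_rng(0)
C, P, T = 128, 15002, 300                 # electrodes, cortical points, samples

G = rng.standard_normal((C, P)) * 1e-2    # stand-in gain (lead field) matrix
S = rng.standard_normal((P, T)) * 1e-9    # stand-in source activity
N = rng.standard_normal((C, T)) * 1e-10   # additive channel noise

Y = G @ S + N                             # the sensor-space model of Eq. (1)
```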

2) Problem Formulation in the Source Space: We assume there exists a segment {t1, t2, · · · , tTseg} within the course of observation, during which one cluster of cortical points of size Q1 ≪ P stays active above a given threshold δ. Over the duration of the segment, the activation intensity at the cortical point k can be expressed as a stochastic process

{xk,t} = Dk · (1 + {ek,t}), (2)

where Dk denotes its average activation intensity over the entire interval, with the requirement that

|Dk| > δ > 0, (3)

and {ek,t} is a stochastic process, representing the Dk-normalized deviation of activation intensity, with variance σ²k,t and expectation µk,t, such that

σ²k,t + µ²k,t ≪ 1. (4)

Let X1 ∈ R^(Q1×Tseg) and E1 ∈ R^(Q1×Tseg) be the random matrices whose elements at index (k, j) are the random variables xk,tj and ek,tj, respectively. Also, let D1 ∈ R^(Q1×Q1) be a diagonal matrix with the constants Dk on its diagonal. Equation (2) can then be written as

X1 = D1 · (11 + E1) = D1 · 11 + D1 · E1, (5)

where 11 is a Q1 × Tseg all-ones matrix.

We represent cortical points with background activities (activity lower than δ) as zero-mean stochastic processes {bl,t} (l = 1, · · · , P − Q1), with variance σ²l,t ≪ δ² and expectation µl,t = 0. Let B1 ∈ R^((P−Q1)×Tseg) be the random matrix whose element at index (l, j) is the random variable bl,tj.

Without loss of generality, we arrange S1 such that the activity at the Q1 cortical points within the active cluster is placed in the first Q1 rows. The remaining rows of S1 carry the background activity at the remaining P − Q1 cortical points. That is [27],

S1 = [X1; B1] = [D1 · 11; 01] + [D1 · E1; B1], (6)

where [·; ·] denotes row-wise (vertical) stacking, and 01 is a (P − Q1) × Tseg all-zeros matrix.

3) Problem Formulation in the Sensor Space: Over the duration of this segment, the recordings obtained in the sensor space form the matrix Y1 ∈ R^(C×Tseg). To match the structure of S1 in (6), we rearrange the columns of G in (1) as [27]

G = [g1 · · · gQ1 | gQ1+1 · · · gP] = [G1 | G0]. (7)

Placing (6) and (7) in (1), and substituting the noise matrix N1 ∈ R^(C×Tseg), we obtain

Y1 = G · S1 + N1 = G1 · D1 · 11 + G1 · D1 · E1 + G0 · B1 + N1 = H1 + He1 + Hb1 + N1. (8)

In (8), H1 = [h1 · · · h1] ∈ R^(C×Tseg), where h1 ∈ R^C represents a linear combination of the gain vectors corresponding to the cortical points inside the active cluster, i.e., denoting an all-ones Q1-dimensional vector as ~11, we can write

h1 = G1 · D1 · ~11 = D1 · g1 + · · · + DQ1 · gQ1. (9)


Defining the jth column of He1 as hej ∈ R^C,

hej = G1 · D1 · ej = e1,tj · D1 · g1 + · · · + eQ1,tj · DQ1 · gQ1, (10)

where ej ∈ R^(Q1) is the jth column of E1 and ek,tj is the kth element of ej, we can write

He1 = [he1 · · · heTseg]. (11)

Similarly, taking Hb1 = [hb1 · · · hbTseg], where hbj ∈ R^C, the jth column of Hb1, is a weighted combination of the gain vectors corresponding to the cortical points outside the active cluster, i.e.,

hbj = G0 · bj = b1,tj · gQ1+1 + · · · + bP−Q1,tj · gP, (12)

where bj ∈ R^(P−Q1) is the jth column of B1, and bl,tj, for l = 1, · · · , P − Q1, is the lth element of bj.

4) Further Formulation: Comparing (10) to (9), it can be noted that both h1 and hej are linear combinations of the same vectors, namely, the columns of the matrix G1. This implies that hej could still be well-correlated with h1. On the other hand, (12) shows that hbj is a linear combination of some different vectors, namely, the columns of the matrix G0. While the gain vectors constituting the columns of G0 and G1 correspond to two different sets of cortical points, these two sets are, generally, not orthogonal.

The gain vectors corresponding to the subset of cortical points in close proximity to the active cluster, at least, are well-correlated with the gain vectors constituting the columns of G1. The noise injected into the EEG channels, N1, can also have a non-zero component in the direction of h1. Taking the components that are in the direction of h1 away from hej, hbj, and N1, we can rewrite (8) as

Y1 = h1 · (a1^T + ab1^T + an1^T) + H̃e1 + H̃b1 + Ñ1, (13)

where

a1^T = ~11^T + (h1^T · h1)^(−1) · h1^T · He1, (14)

ab1^T = (h1^T · h1)^(−1) · h1^T · Hb1, (15)

an1^T = (h1^T · h1)^(−1) · h1^T · N1, (16)

H̃e1 = He1 − h1 · (a1 − ~11)^T, (17)

H̃b1 = Hb1 − h1 · ab1^T, and (18)

Ñ1 = N1 − h1 · an1^T, (19)

with ~11 in (14) and (17) being the Tseg-dimensional all-ones vector. Here, a1 ∈ R^(Tseg) represents the temporal variation of the overall activation intensity of the active cluster, whereas the columns of H̃e1 ∈ R^(C×Tseg), now all orthogonal to h1, correspond to the residual activation intensity variations within the cluster. Similarly, ab1, an1 ∈ R^(Tseg) represent the temporal variations of the components of background activity and channel noise, respectively, which are in the direction of h1. H̃b1 and Ñ1 are, thus, the components of background activity and channel noise, respectively, orthogonal to h1.

5) Asymptotic Analysis: Given the definition we adopted earlier for the EEG segment, and given that the gain vector corresponding to each cortical point is indeed a characteristic of its spatial location, it becomes intuitive to think of the average of the columns of G1, in (7), as a feature vector characterizing the spatial distribution of the activated cortical points. That is, averaging the gain vectors associated with the activated cortical points implies assigning equal weights to them all, regardless of their instantaneous activation intensities. It is expected that cortical points in close proximity to each other have highly correlated gain vectors. Therefore, a significant portion of the total activation energy of an active cluster would be concentrated along the direction of its average gain vector. Components orthogonal to the direction of the average gain vector would, then, carry a relatively small fraction of the total activation energy. Asymptotically, as the correlation among the gain vectors corresponding to the cortical points within the active cluster increases to the point that all these vectors become identical to some exemplary gain vector g ∈ R^C (which would, in this case, be the same as the average of these gain vectors), i.e., as

gk → g, (20)

substituting (20) in (9) and (10), we obtain

h1 → α · g, and (21)

hej → βj · g, (22)

respectively, with

α = D1 + · · · + DQ1, and (23)

βj = e1,tj · D1 + · · · + eQ1,tj · DQ1. (24)

From (21), (22), (11), (14), and (17), we get

H̃e1 → 0, (25)

which implies that all the activation energy is concentrated in the direction of g, the direction that h1 picked.

The results in (21), (22), and (25) can also be reached asymptotically as the total activation energy gets compacted into a smaller cluster of cortical points, approaching the size of 1 cortical point with gain vector g, i.e., as

Q1 → 1, (26)

only with

α = D, and (27)

βj = etj · D. (28)

Unfortunately, instead of the average of the columns of G1, (13) shows that the most significant component that Y1 incorporates is h1, a weighted average of the columns of G1, as in (9). It is expected, though, that the differences among the averaging weights Dk, for all activated cortical points k, decrease as the distances among these points decrease, since the activation energy is more likely to be homogeneously distributed within smaller cortical regions. Asymptotically, as the activation energy becomes homogeneously distributed over the cortical points within the active cluster at any instant during


the interval, i.e., going back to (10), as

ek,tj → etj and Dk → D, (29)

then we again reach the results in (21), (22), and (25), only now with

g = (1/Q1) · (g1 + · · · + gQ1), (30)

α = D · Q1, and (31)

βj = etj · D · Q1, (32)

where g is the average of the gain vectors corresponding to the cortical points within the active cluster.

In all of the asymptotic cases (20), (26), and (29), we end up with the conclusion that

Y1 → α · g · (a1^T + ab1^T + an1^T) + H̃b1 + Ñ1, or, absorbing ab1 and an1 into a1,

Y1 → α · g · a1^T + H̃b1 + Ñ1. (33)

Note that depending on their statistical characteristics, the terms H̃b1 and Ñ1, the residual low background activity and residual noise, have their total energies accordingly spread over spaces of dimensionalities dimb1, dimn1 ≤ min(C, Tseg) − 1, where subtracting 1 corresponds to excluding the direction of g from these spaces. Assuming the residual background and noise energies are spread nearly homogeneously over the corresponding spaces, the effects of H̃b1 and Ñ1 can be ignored.

The minimum mean-squared-error (MMSE) estimate of the term (α · g · a1^T) can be determined from the rank-1 SVD approximation of Y1, i.e.,

δ1 · u1 · v1^T → α · g · a1^T (in the MMSE sense), (34)

where δ1 is the largest singular value in the SVD of Y1, and u1 and v1 are the corresponding left and right singular vectors, respectively. The rank-1 approximation in (34) implies

u1 → ± g/‖g‖, (35)

v1 → ± a1/‖a1‖, and (36)

δ1 → α · ‖g‖ · ‖a1‖, (37)

where the pre-multiplied signs in (35) and (36) are tied together. Ignoring the irrelevant sign ambiguity, (35) shows that u1 is a normalized MMSE estimate of the average of the gain vectors corresponding to the cortical points within the active cluster, which can be used as a feature vector characterizing the spatial distribution of the activated cortical points during the considered interval.
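In practice, (34)–(37) reduce to reading off the leading singular triplet of the segment matrix. A minimal numpy sketch (the function name is ours):

```python
import numpy as np

def feature_vector(Y1):
    """Normalized MMSE estimate of the average gain vector, per Eq. (35).

    Y1: (C, Tseg) EEG segment. The leading left singular vector u1
    approximates ±g/||g|| under the asymptotic conditions above, and v1
    tracks the temporal course a1 of the cluster's overall intensity.
    """
    U, s, Vt = np.linalg.svd(Y1, full_matrices=False)
    u1, v1, d1 = U[:, 0], Vt[0, :], s[0]
    return u1, v1, d1
```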

The scenario for multiple clusters can also be analyzed in a similar way, and is presented in Appendix A.

III. PROPOSED ALGORITHM

In this section, we first describe the segmentation algorithm. We will then theoretically discuss the effects of intensity variation, background activity, as well as noise, on the performance of the algorithm.

Fig. 3. The flowchart for the proposed segmentation algorithm.

A. Source-Informed Segmentation Algorithm

The proposed segmentation algorithm, shown in Fig. 3, takes three inputs: 1) the EEG matrix Y, 2) the initial reference window length Wr (which is also the length of the sliding window), and 3) the decision window length Wd. The algorithm proceeds as follows until it reaches the end of Y.

1) The current reference window length W is initialized to Wr. The consecutive possible segment boundaries counter, ωd, is initialized to 0.

2) The beginning edge of the reference window is set to point to the first column of the unsegmented matrix. The columns under the reference window are saved into matrix Z.

3) The SVD of Z is calculated to determine its column space and the weights of the corresponding basis vectors. Z belongs to the current segment, and its column space corresponds to the underlying active clusters. The SVD is expected to order the singular values, and the corresponding singular vectors, in a descending order.

4) To select the most significant basis vectors in the column space, the singular values (starting from the first one) are searched for an elbow. The singular values down to the elbow are taken as the most significant ones, and the corresponding left singular vectors are chosen as the current segment's feature subspace {UR}.

5) The sliding window selects the Wr columns of the EEG matrix next to the end of the current reference window. These columns are saved in matrix X.

6) Each column in Z and X is projected, separately, onto the feature subspace, and the error is, accordingly, saved into er and es, the reference and sliding error samples, respectively. It is assumed that distinct segments likely have the significant parts of their activation energies concentrated in different subspaces, since they correspond to different active clusters. Note that, ideally, the error should only account for the residual activation intensity variations within the underlying active clusters, the low background activity, and additive noise, all orthogonal to the feature space. The statistical characteristics of these components are less susceptible to variation throughout the segment.

7) The sliding error sample, es, is then tested against the reference error sample, er, using the Kolmogorov-Smirnov (K-S) test [55]. This test makes no assumptions about the shapes of the probability distributions of the tested data. Yet, it requires relatively large sample sizes to work accurately [55], [56].

a) If the two error samples are found different, and either ωd < Wd or the current p-value is less than the last p-value, then the beginning of the sliding window is saved as a possible segment boundary along with its p-value. The corresponding counter, ωd, is incremented by 1. The current reference window length, W, is also incremented by 1, and the algorithm is resumed from step 2. A possible segment boundary is not confirmed until all possible adjacent boundaries are explored.

b) Otherwise, one more decision regarding how the number of previously detected possible boundaries, ωd, compares to the length of the decision window, Wd, needs to be made. Aggregating at least Wd consecutive decisions increases the reliability of the overall test by increasing the sizes of the error samples, without requiring longer reference and sliding windows. This allows for the minimum detectable segment length (which is equal to the initial length of the reference window Wr) to be small. Additionally, as the sliding window starts passing through a boundary, the capability of continuing to explore possible segment boundaries even after Wd decisions have supported the alternative hypothesis minimizes the possibility of detecting the segment boundary earlier than it is, as long as the p-values continue to decline.

i) If ωd < Wd, then none of the saved possible segment boundaries is confirmed. The columns selected by the sliding window mostly belong to the current segment. However, due to the uncertainty regarding the actual number of columns that belong to the current segment, the current reference window length, W, is only incremented by 1. This way, the segmentation algorithm would still be able to locate segment boundaries with increments of 1 time sample above the initial reference window length Wr. The segment boundary counter, ωd, is initialized to 0, all previously saved points, if any, are cleared, and the algorithm is resumed from step 2.

ii) If ωd ≥ Wd, a segment boundary has been confirmed. The segment boundary is selected from the previously saved possible boundaries as the one which gave the lowest p-value in the K-S test. All other saved points are cleared and the algorithm is resumed from step 1.
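The following sketch condenses the loop of Fig. 3 above into runnable form. It is a simplified illustration, not the published implementation: the elbow heuristic, the p-value bookkeeping, and the boundary confirmation are reduced to their essentials, and scipy's two-sample K-S test stands in for the test of step 7.

```python
import numpy as np
from scipy.stats import ks_2samp

def elbow_rank(singular_values):
    """Number of significant singular values, taken at the largest drop.
    A simple stand-in for the elbow search of step 4."""
    d = -np.diff(singular_values)
    return int(np.argmax(d)) + 1 if len(d) else 1

def projection_error(M, U):
    """Per-column residual norm after projecting M onto span(U) (step 6)."""
    return np.linalg.norm(M - U @ (U.T @ M), axis=0)

def source_informed_segmentation(Y, Wr=20, Wd=8, alpha=0.05):
    """Simplified sketch of the segmentation loop; Y is (channels, samples).
    Returns the detected boundary indices."""
    boundaries, start = [], 0
    C, T = Y.shape
    while start + 2 * Wr <= T:
        W = Wr
        candidates = []                                    # (index, p-value)
        while start + W + Wr <= T:
            Z = Y[:, start:start + W]                      # reference window
            X = Y[:, start + W:start + W + Wr]             # sliding window
            U, s, _ = np.linalg.svd(Z, full_matrices=False)
            Ur = U[:, :elbow_rank(s)]                      # feature subspace
            er, es = projection_error(Z, Ur), projection_error(X, Ur)
            _, p = ks_2samp(er, es)
            if p < alpha and (len(candidates) < Wd or
                              (candidates and p < candidates[-1][1])):
                candidates.append((start + W, p))          # step 7a
                W += 1
            elif len(candidates) >= Wd:                    # step 7b-ii
                b = min(candidates, key=lambda c: c[1])[0] # lowest p-value wins
                boundaries.append(b)
                start = b
                break
            else:                                          # step 7b-i
                candidates = []
                W += 1
        else:
            break                                          # reached end of Y
    return boundaries
```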

B. Examining the Effect of Intensity Variation

Here, we examine the effect that the intensity variations shown by active cortical points at instant tj, captured by hej in (10), have on the proposed segmentation approach. To do this, we compare the significance of the latter to that of the average activation intensity, represented by h1 in (9). We consider the correlation matrices of these components. As for the correlation matrix of h1, it can be seen from (9) that

E(h1 · h1^T) = G1 · D1 · [1 · · · 1; ⋮ ⋱ ⋮; 1 · · · 1] · D1^T · G1^T, (38)

where the middle factor is the Q1 × Q1 all-ones matrix.

Similarly, based on (10), the correlation matrix of hej can be calculated as

E(hej · hej^T) = G1 · D1 · [ρ²1,tj · · · ρ1,Q1;tj; ⋮ ⋱ ⋮; ρQ1,1;tj · · · ρ²Q1,tj] · D1^T · G1^T, (39)

where

ρk,k′;tj = E(ek,tj · ek′,tj), (40)

for k, k′ = 1, · · · , Q1, and

ρ²k,tj = E(e²k,tj) = σ²k,tj + µ²k,tj. (41)

Substituting (4) in (41) yields

ρ²k,tj ≪ 1. (42)

Noting that, generally,

|ρk,k′;tj| ≤ max(ρ²k,tj, ρ²k′,tj), (43)


it becomes clear that as long as

max_k ρ²k,tj ≪ 1, (44)

then the component hej makes a much less significant contribution than h1 to the EEG recordings at instant tj (compare (39) to (38)).

C. Examining the Effect of Background Activity

Here, we examine the effect that background activity atinstant tj , as captured by hbj in (12), have on the proposedsegmentation approach. To do this, we compare the signifi-cance of the latter to that of the average activation intensity,represented by h1 in (9). Again, we consider the correlationmatrices of these components. First, we start with rearranging(38) as follows

E(h1 · h1^T) = G1 · [D²1 · · · D1·DQ1; ⋮ ⋱ ⋮; DQ1·D1 · · · D²Q1] · G1^T, (45)

where, for k, k′ = 1, · · · , Q1,

|Dk · Dk′| > δ² > 0. (46)

Next, we determine the correlation matrix of hbj using (12):

E(hbj · hbj^T) = G0 · [σ²1,tj · · · σ1,P−Q1;tj; ⋮ ⋱ ⋮; σP−Q1,1;tj · · · σ²P−Q1,tj] · G0^T, (47)

where the correlation matrix of bj, a zero-mean random vector, was replaced by its covariance matrix. That is,

σl,l′;tj = E(bl,tj · bl′,tj) = cov(bl,tj, bl′,tj), (48)

for l, l′ = 1, · · · , P − Q1. Again, generally, we can state that

|σl,l′;tj| ≤ max(σ²l,tj, σ²l′,tj), (49)

which, when combined with

max_l σ²l,tj ≪ δ², (50)

and given (46), shows that the component hbj makes a much less significant contribution than h1 to the EEG recordings at instant tj (compare (47) to (45)).

D. Examining the Effect of Channel Noise

As discussed earlier, the EEG recordings break down into the sum of two components: the first corresponds to the underlying cortical activity reaching the EEG channels, and the second is the noise injected into the EEG channels, i.e.,

Y1 = G · S1 + N1 = Z1 + N1. (51)

To quantify the impact of channel noise on the perception of cortical activity, and therefore on the proposed source-informed segmentation approach, we examine how these two components project onto the feature space – a main step in the proposed segmentation algorithm (see Section III-A).

Given an arbitrary normalized vector w ∈ R^C, i.e., w^T · w = 1, the energy of the EEG recordings segment in the direction of w is

εy = (w^T · Y1) · (w^T · Y1)^T = w^T · (Y1 · Y1^T) · w. (52)

To represent it statistically, we turn to the expected value

E(εy) = w^T · E(Y1 · Y1^T) · w = w^T · (var(Z1) + var(N1) + 2 · cov(Z1, N1) + E(Y1) · E(Y1)^T) · w. (53)

Since the additive noise injected into the EEG channels is independent of the underlying cortical activity, their covariance

cov(Z1,N1) = 0. (54)

Assuming the expected value of the channel noise to be zero, we can write

E(Y1) = E(Z1) + E(N1) = E(Z1). (55)

Substituting (54) and (55) into (53), we obtain

E(εy) = w^T · (E(Z1 · Z1^T) + var(N1)) · w = E(εz) + E(εn), (56)

where E(Z1 · Z1^T) + var(N1) is the sum of two symmetric matrices, and hence is also symmetric.

At a given signal-to-noise ratio (SNR),

SNR = 10 · log( tr(E(Z1 · Z1^T)) / tr(var(N1)) ), (57)

where tr(·) refers to the trace of a matrix, the off-diagonal elements of var(N1), i.e., the covariances of the noise signals on different pairs of channels, do not contribute to the overall SNR. This means that the projected noise component, E(εn), is expected to make a generally lower impact on the total projected energy E(εy), for a given SNR and an arbitrary w, when var(N1) is diagonal, i.e., when the noise signals on different channels are uncorrelated. It is also fair to assume that these noise signals have identical variance, σ²n, i.e.,

var(N1) = σ²n · I1, (58)

where I1 is the C × C identity matrix. Therefore, the effect of noise on the perceived cortical activity would be minimal, in that all the eigenvectors, wz, of the correlation matrix E(Z1 · Z1^T) would be preserved, and the corresponding eigenvalues, λz, would only be shifted by σ²n, without a change to their significance order. In terms of the eigenvalue problem,

E(Y1 · Y1^T) · wy = (E(Z1 · Z1^T) + var(N1)) · wy = λy · wy, (59)

where λy and wy are the eigenvalues and eigenvectors of E(Y1 · Y1^T), respectively. Substituting (58) into (59) yields

(E(Z1 · Z1^T) + σ²n · I1) · wz = (λz + σ²n) · wz. (60)

Therefore,

λy = λz + σ²n and wy = wz. (61)

Pre-multiplying (60), or equivalently (59), by wz^T yields

wz^T · E(Y1 · Y1^T) · wz = λz + σ²n. (62)


As mentioned in Section III-A, the feature space used by the proposed segmentation algorithm corresponds to the most significant, i.e., maximum-energy, left singular subspace of the readings matrix. Maximizing the expected total projection energy E(εy) over all possible directions of w, we reach the Rayleigh quotient

max_w E(εy) = wzmax^T · E(Y1 · Y1^T) · wzmax = λzmax + σ²n, (63)

where λzmax is the maximum eigenvalue of E(Z1 · Z1^T), the correlation matrix of the underlying cortical activity, and the vector wzmax achieving the optimal point is the corresponding eigenvector. Therefore, the channel noise characterized by (58) and (54) preserves the feature space corresponding to the underlying cortical activity.
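This eigenvalue-shift argument is easy to verify numerically. A small demo with a random symmetric positive semi-definite matrix standing in for E(Z1 · Z1^T), assuming distinct eigenvalues so that eigenvectors are unique up to sign:

```python
import numpy as np

rng = np.random.default_rng(1)
C = 8
A = rng.standard_normal((C, C))
Rz = A @ A.T                      # stand-in for E(Z1 · Z1^T), symmetric PSD
sigma_n2 = 0.5                    # noise variance sigma_n^2

wz, Vz = np.linalg.eigh(Rz)                      # eigenpairs of the clean matrix
wy, Vy = np.linalg.eigh(Rz + sigma_n2 * np.eye(C))

print(np.allclose(wy, wz + sigma_n2))            # eigenvalues shifted by sigma_n^2
print(np.allclose(np.abs(Vy), np.abs(Vz)))       # eigenvectors preserved up to sign
```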

IV. SIMULATION RESULTS

A simulation platform is created to extensively evaluate the performance of the proposed segmentation algorithm for a diverse range of possible scenarios. In this section, we first describe the simulation setup, and then discuss the results.

A. Simulation Setup and Performance Measures

1) Apparatus and Data Generation: The proposed segmentation algorithm is expected to identify boundaries in the EEG data (in the sensor space), based on changes in the state of the underlying neuronal activity (in the source space), following the definition provided in Section II-A. Therefore, for simulations, the data is first generated in the source space for a variety of scenarios, and then the forward model is used to construct the corresponding EEG data in the sensor space. The proposed segmentation algorithm is then applied to the sensor-space EEG data. Following this approach, as the ground-truth boundaries of segments are already known, the performance of the segmentation algorithm can be later assessed. The EEG is believed to be mainly generated by the inhibitory and excitatory post-synaptic potentials of cortical pyramidal nerve cells, which are spatially aligned perpendicular to the cortical surface [52]. Therefore, in simulations, we restrict the generation of the source-space data to the cortical surface, with the orientation of the neuronal activity being normal to the surface.

The gain matrix is generated using the Brainstorm toolbox [54], based on the symmetric boundary element method (BEM). A template head anatomy called "ICBM152", obtained from a non-linear average of the head anatomies of 152 subjects, is used. The surface of the cortex is discretized into 15002 points (Brainstorm's default setting).

To ensure that the data generated in the source space is a reasonable approximation to realistic scenarios in terms of intensity, a probability mass function (PMF) for neuronal activity is created and used. Without loss of generality, the PMF is estimated from exemplary experimental data corresponding to the stimulus-locked event-related potential (ERP) of the target trials of a visual oddball task (see Section V-A), with a duration of 500 ms (post stimulus). For each point

Fig. 4. The PMF of neuronal activity estimated from the ERP of the visual oddball task recordings of one subject. Three regions are identified: the PMF of the high-intensity cortical activity with negative polarity (left tail, blue), the PMF of high-intensity cortical activity with positive polarity (right tail, red), and the PMF of the low-intensity background activity (center, gray).

of time throughout the duration of the ERP, the whitened ℓ2-minimum norm estimate (wMNE) algorithm is employed to estimate the neural activity in the source space. The PMF is then obtained from the histogram of the estimated neuronal activity over the entire cortex and for the whole duration of the ERP. Fig. 4 illustrates an example of the PMF obtained from the ERP of one subject. Three regions are also identified. The positive and negative tails can be used to generate "active" (high-intensity) clusters with positive and negative polarities, respectively, and the central part of the PMF can be used to generate the low background activity in regions of the cortex where there are no "active" clusters.

For a given simulation run, using a combination of randomized and controlled variables that will be described in Section IV-A.2, the cortical activity corresponding to each EEG segment is generated as follows. The cortical points within each active cluster are assigned randomized signals based on one of the two tails of the PMF (depending on the polarity of the cluster). The low background activity shown by the remaining cortical points is randomized based on the central part of the PMF. The concatenation of the simulated activity of all cortical points constructs the source matrix. Following (1), the source matrix is then pre-multiplied by the gain matrix and transformed to the sensor space, where zero-mean white Gaussian noise, at a given signal-to-noise ratio (SNR), is added to each electrode.
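Two pieces of this generation pipeline, sampling intensities from an empirical PMF and adding channel noise at a prescribed SNR, can be sketched as follows. The function names and the whole-matrix power normalization are our assumptions, not necessarily the exact procedure used in the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_from_pmf(values, pmf, size):
    """Draw activation intensities from an empirical PMF (histogram bins).

    values: bin centers (e.g., of one PMF tail or the central region);
    pmf: the corresponding, not necessarily normalized, probabilities.
    """
    p = np.asarray(pmf, dtype=float)
    return rng.choice(values, size=size, p=p / p.sum())

def add_noise_at_snr(Y, snr_db):
    """Add zero-mean white Gaussian noise at a prescribed SNR (in dB)."""
    signal_power = np.mean(Y ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10))
    return Y + rng.normal(scale=np.sqrt(noise_power), size=Y.shape)
```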

2) Simulation Variables: To extensively evaluate the performance of the proposed segmentation algorithm, the following variables, controlling different aspects of the simulated active clusters during a given segment, are considered:

-Segment length: Uniformly distributed over [20, 60] samples.

-Total active cortical area: Uniformly distributed over [4π, 12π] cm². The total activated area is randomly divided among the active clusters, with the restriction that a cluster cannot assume less than 10% of the total activated area, to avoid extremely small clusters.

-Number of active clusters: Uniformly distributed over [0, 4]. With the range considered for the total activated area, [1, 4] active clusters will have radii ranging, non-uniformly, from 0.63 cm to 3.46 cm. Consecutive segments with no active clusters are not allowed.


Fig. 5. Visual illustration of successfully detected and failed boundaries. The solid vertical lines represent the ground-truth boundaries. The dashed vertical lines represent the middle points between consecutive segments. Green markers represent boundaries detected successfully by the algorithm. The ground-truth boundary located at T2 is a missed boundary, since the red boundary detected by the algorithm is located at a distance of more than half the segment length from its corresponding ground-truth boundary.

-Locations of active clusters: Active clusters are simulated as circular patches that are placed randomly on the cortex, with two restrictions: clusters should not overlap within the same segment, and for any two adjacent segments, clusters should not be identical (specifically, no more than 50% overlapping).

-Polarity of active clusters: Positive or negative. The polarity chooses which PMF tail to use when generating the cluster.

Additionally, in the sensor space, the following two variables are considered:

-Number of EEG electrodes: Simulations are mostly carried out based on a 128-electrode setting in the sensor space. For the simulations in Sections IV-A.4 and IV-B.5, 64- and 32-electrode configurations are also considered.

-SNR: After transforming each simulated source matrix into an EEG matrix using the forward model, zero-mean white Gaussian noise at a given SNR is added to the signal at each electrode. SNR values of 0, 2, 4, 6, and 8 dB are considered.

3) Performance Measures: Knowing the ground-truth boundaries in the source space, the following measures are used to evaluate the performance of the algorithm in each simulation run.

-Success Rate: A boundary detected by the algorithm is considered a success if it is located within less than half the segment length, before or after the ground-truth boundary. The success rate is defined as the ratio of successfully detected boundaries to the total number of ground-truth boundaries.

-Failure Rate: A ground-truth boundary that is not detected by the algorithm (missed), based on the definition of success, or a falsely detected boundary that does not correspond to a ground-truth boundary (excess), is noted as failed. The failure rate is defined as the ratio of failed boundaries to the total number of ground-truth boundaries. It should be noted that, because of excess boundaries, the number of detected boundaries can exceed the number of ground-truth boundaries, and therefore, the failure rate is not the complement of the success rate. In other words, the failure rate is expressed as

fail.% = 100% − succ.% + (#excess boundaries / #ground-truth boundaries) × 100%. (64)

-Average Estimation Displacement (µED): The estimation displacement is defined as the absolute distance, in samples, between a successfully detected boundary and its corresponding ground-truth boundary. In a given simulation run, the average estimation displacement is obtained by finding the arithmetic mean of the estimation displacements of all the successfully detected boundaries. This measure provides insight into how far, on average, the estimated segment boundaries are located from the ground-truth boundaries.

-Standard Deviation (SD) of Estimation Displacement (σED): This measure provides insight into the spread of the distances between the estimated and actual boundaries around the average estimation displacement µED. Fig. 5 visually illustrates successfully detected and failed boundaries.
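A sketch of these four measures for one simulation run is given below. The paper does not spell out how detected and ground-truth boundaries are paired, so the greedy nearest-match used here is one reasonable choice; the per-boundary tolerance tol stands for half the local segment length.

```python
import numpy as np

def score_boundaries(detected, truth, tol):
    """Success/failure rates (Eq. (64)) and displacement statistics."""
    detected, truth = list(detected), list(truth)
    displacements, matched = [], set()
    for g in truth:
        # nearest unmatched detection within the tolerance, if any
        best = min((d for d in detected if d not in matched and abs(d - g) < tol),
                   key=lambda d: abs(d - g), default=None)
        if best is not None:
            matched.add(best)
            displacements.append(abs(best - g))
    succ = len(displacements) / len(truth)
    excess = len(detected) - len(matched)
    fail = 1.0 - succ + excess / len(truth)        # Eq. (64), as a fraction
    mu = float(np.mean(displacements)) if displacements else np.nan
    sd = float(np.std(displacements)) if displacements else np.nan
    return succ, fail, mu, sd
```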

4) On the Input Parameters of the Algorithm: The proposed algorithm requires two input parameters: the initial reference window length, Wr, and the decision window length, Wd. To get perspective on good working ranges for these parameters, we devise the following procedure. Let the aggregate rate be

aggr.% = (succ.% − fail.%) ≤ 100%. (65)

This measure combines both the success and the failure rates to penalize missed boundaries twice as hard as excess boundaries. Such a bias is appropriate, as a missed boundary would fuse two adjacent segments into one. Since the input parameters have to work for signals with different SNRs, we use the "average" of the aggregate rates, obtained across all simulated SNRs (see Section IV-A.2), to look for optimal values of Wr and Wd.

We also examined 128-, 64-, and 32-electrode settings separately. Input parameters in the range of Wr ∼ [5, 20] samples and Wd ∼ [1, 20] samples are considered here. Following the procedure described in Section IV-A.1, for each electrode setting and at each SNR value, 125 trials, each consisting of 11 segments, were simulated. Data for each segment was generated by randomizing all 5 variables in the source space (see Section IV-A.2). For each simulated trial and for each possible (Wr, Wd) choice, the proposed algorithm is then applied, and the aggregate rate is computed and averaged across all SNRs.

Figs. 6(a)–(c) show the color-coded averaged aggregate rates, for the 128-, 64-, and 32-electrode settings, respectively, over all possible (Wr, Wd) pairs. The white dot on each plot indicates the location where the choice of (Wr, Wd) has maximized the averaged aggregate rate. It can be seen that the optimal (Wr, Wd) for the 128-, 64-, and 32-electrode settings are (19, 11), (20, 8), and (20, 6), respectively. As these input parameters are obtained by maximizing the "averaged" aggregate rate, in Fig. 6(d) we show how much the performance would be improved if the optimal (Wr, Wd) "at each SNR" were used as input parameters instead. It can be seen that only slight enhancements (less than 3%) are achieved. As such, for the rest of the paper, we will use the parameters corresponding to the white dots as inputs to the segmentation algorithm.
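The parameter search itself is a plain grid search over (Wr, Wd) maximizing the aggregate rate of (65) averaged over trials. A sketch reusing the segmentation and scoring functions from the earlier snippets (so it assumes those definitions are in scope):

```python
import numpy as np

def pick_window_lengths(trials, wr_range=range(5, 21), wd_range=range(1, 21)):
    """Grid-search (Wr, Wd) by the trial-averaged aggregate rate of Eq. (65).

    trials: iterable of (Y, truth_boundaries, tol) tuples spanning all SNRs;
    relies on source_informed_segmentation() and score_boundaries() above.
    """
    best, best_score = None, -np.inf
    for Wr in wr_range:
        for Wd in wd_range:
            scores = []
            for Y, truth, tol in trials:
                det = source_informed_segmentation(Y, Wr=Wr, Wd=Wd)
                succ, fail, _, _ = score_boundaries(det, truth, tol)
                scores.append(succ - fail)            # aggregate rate
            if np.mean(scores) > best_score:
                best_score, best = np.mean(scores), (Wr, Wd)
    return best, best_score
```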

B. Simulation Results

1) Results for Variable Segment Length: The effect of segment length on the performance of the proposed segmentation algorithm is investigated here. Four choices for the segment length were considered: 30, 40, 50, and 60 samples. For each


Fig. 6. The aggregate performance (AP) of the segmentation algorithm, averaged over the even-valued SNRs in the range [0, 8] dB, for: (a) the 128-electrode setting, (b) the 64-electrode setting, and (c) the 32-electrode setting. White dots are placed where the measure is maximum. (d) Enhancement in the aggregate performance offered by using the optimal (Wr, Wd) at each SNR.

choice of segment length and at each SNR value, 2500 trials, each consisting of 11 segments, were generated. Each segment was constructed by assigning random values to the source-space variables (other than the segment length) described in Section IV-A.2 (i.e., the total activated cortical area, and the number, location, and polarity of active clusters). The segmentation algorithm was then applied, and the performance measures were calculated.

Fig. 7 shows the success rate, the failure rate, and the average and SD of the estimation displacement, for each of the four choices of segment length, as functions of SNR. For all four choices, at the lowest SNR value of 0 dB, the success rate stays above 85%, and the average estimation displacement stays below 4 samples. Note that the performance improves as the SNR increases. When comparing the performance of the algorithm across the four segment-length choices, one interesting observation is that as segments become longer, both the success rate and the failure rate increase. The increase in the failure rate could be due to the fact that as segments get longer, the probability of falsely detecting excess boundaries also increases. Yet, the failure rate stays below 24% for the lowest tested SNR, and decreases as the SNR increases. Overall, these results indicate that for different segment lengths, the proposed segmentation algorithm performs reasonably well, identifying boundaries associated with the change of state in the source-space data.

2) Results for Variable Total Active Cortical Area: Here, we examine the question of how the performance of the proposed segmentation algorithm is influenced by the total active area in the source space. Five choices for the total active area (4π, 6π, 8π, 10π, and 12π cm²) are considered. For each choice of active area, and at each SNR value, 2500 trials, each consisting of 11 segments, were generated. Each segment was


Fig. 7. (a) Success rate, (b) failure rate, (c) average estimation displacement (µED), and (d) SD of the estimation displacement (σED) of the segmentation algorithm as a function of SNR, for a setting of 128 electrodes, and segment lengths of 30, 40, 50, and 60 samples.

constructed by assigning random values to the source-space variables (other than the total active area) described in Section IV-A.2 (i.e., segment length, and the number, location, and polarity of active clusters). The segmentation algorithm was then applied, and the performance measures were calculated.

Fig. 8 shows the success rate, the failure rate, and the average and SD of the estimation displacement, for each of the five choices of total active area, as functions of SNR. It is observed that a success rate higher than 80% and an average estimation displacement lower than 3 samples are achieved even for the lowest SNR and smallest total active area tested.

As the total active area per segment increases, the performance of the algorithm improves, with the success rate increasing and the failure rate decreasing. Note that increasing the total active area corresponds to increasing the area of the individual active clusters, which, in turn, results in boosting the aggregate cortical signals traveling from these clusters to the EEG electrodes in the sensor space. Yet, for small total active areas, the algorithm still performs reasonably well.

3) Results for Variable Number of Active Clusters: Here, we examine the effect of the number of active clusters on the performance of the segmentation algorithm. Four choices for the number of active clusters in a segment (1, 2, 3, or 4) are considered. Note that the 0-active-cluster scenario corresponds to that where there is only background activity. For each choice of the number of active clusters, and at each SNR value, 2500 trials, each consisting of 11 segments, were generated. Each segment was constructed by assigning random values to the source-space variables (other than the number of active clusters) described in Section IV-A.2 (i.e., segment length, total active area, and the location and polarity of active clusters). The segmentation algorithm was then applied, and the performance measures were calculated.

Fig. 8. (a) Success rate, (b) failure rate, (c) µED, and (d) σED of the segmentation algorithm as a function of SNR, for a setting of 128 electrodes, and for total active cortical areas of 4π, 6π, 8π, 10π, and 12π cm².

Fig. 9 shows the success rate, the failure rate, and the average and SD of the estimation displacement, as functions of SNR, for each of the four choices of the number of active clusters. The lowest obtained success rate is larger than 94%, and corresponds to the scenario where the number of active clusters is set to 4 at the SNR of 0 dB. From these simulation results, it is observed that as the number of active clusters per segment increases, the success rate and the failure rate both slightly decrease. This leads to the conclusion that an increase in the number of active clusters slightly decreases the sensitivity of the segmentation algorithm, causing it to detect fewer true boundaries, as well as fewer false boundaries. This result can be connected to the way the segmentation algorithm estimates the feature space. For a given segment, the estimated feature space captures the most significant directions, which are expected to span the space of the feature vectors of the individual active clusters. However, as the number of active clusters increases, the estimated feature space may not accordingly expand in dimensionality, leaving some of the weaker components of the individual feature vectors un-spanned, and therefore, unrecognizable.

Another important conclusion to be drawn from the results of this simulation is related to the considerable enhancement in the success and failure rates of the segmentation algorithm, as compared to the remaining simulations presented in this section (IV-B). When simulating the source matrix in the remainder of this section, the number of active clusters was randomly chosen to be an integer between 0 and 4, while here, 0 active clusters was not considered. The enhancement in the performance of the algorithm in the absence of segments with 0 active clusters suggests that a considerable ratio of the missed and excess boundaries possibly occurred around/within segments with no active clusters.

Fig. 9. (a) Success rate, (b) failure rate, (c) µED, and (d) σED of the segmentation algorithm as a function of SNR, for a setting of 128 electrodes, and for a number of active clusters equal to 1, 2, 3, or 4.

Such behavior is expected from the segmentation algorithm, since the feature spaces of segments with 0 clusters are selected based on the random low background activity. Furthermore, the error generated by projecting these segments onto the feature spaces of the neighboring segments may not produce a distinct significant residual part, which lowers the chances of detecting the boundaries around segments with 0 active clusters.

4) Results for Variable Inter-Segment Cluster Distance: Here, we examine the effect of varying the distances among the active clusters across adjacent segments on the performance of the segmentation algorithm. For simplicity, we only consider the case of a single active cluster per segment. Considering the range specified for the total active cortical area in Section IV-A.2, the radius of a single active cluster would be between 2 and 3.46 cm. The active clusters of two adjacent segments are placed such that the distance between the centers of these two clusters is set to a certain value (referred to, here, as the inter-segment distance). Four choices for the inter-segment distance (4, 6, 8, and 10 cm) are considered.

For each choice of the inter-segment distance, and at each SNR value, 2500 trials, each consisting of 11 segments, were generated. Each segment was constructed by assigning random values to the source-space variables described in Section IV-A.2 (i.e., segment length, total active area, and the polarity of the active cluster). The segmentation algorithm was then applied and the performance measures were calculated.

Fig. 10 shows the success rate, the failure rate, and the average and SD of the estimation displacement, as functions of SNR, for each of the four choices of the inter-segment distance. The results of this simulation show that as the inter-segment distances between the active clusters in consecutive segments increase, the success and the failure rates of the segmentation algorithm improve significantly.

Fig. 10. (a) Success rate, (b) failure rate, (c) µED, and (d) σED of the segmentation algorithm as a function of SNR, for a setting of 128 electrodes, and for inter-segment active cluster distances of 4, 6, 8, and 10 cm.

Note that it is generally expected that as two active clusters move spatially further apart from each other, the correlation between their gain vectors decreases. As such, the significant parts of the activation energy in two segments with spatially distant active clusters become concentrated in less overlapping spaces, making them more differentiable.

5) Overall: We now examine the impact of the number of EEG electrodes on the performance of the proposed segmentation algorithm. For each choice of electrode setting (32, 64, or 128), and at each SNR, 12500 trials, each consisting of 11 segments, were generated. Each segment was constructed by assigning random values to all five source-space variables described in Section IV-A.2. The segmentation algorithm was then applied and the performance measures were calculated.

Fig. 11 shows the success rate, the failure rate, and the average and SD of the estimation displacement, as functions of SNR, for each EEG electrode setting. It can be seen that as the number of electrodes in the sensor space increases, the segmentation algorithm tends to achieve higher success rates and lower failure rates. This indicates that a higher number of EEG electrodes in the sensor space provides more accurate information about the location and the spatial spread of the underlying activated cortical regions [57].

V. EXPERIMENTAL RESULTS

We now examine the performance of the proposed segmentation algorithm using experimental data. Unlike in the earlier simulation, ground-truth knowledge about the state changes in the source space is no longer available for the experimental data. Therefore, for the purpose of validation, we compare the performance of the proposed algorithm with a clustering-based technique performed in the source space. We first describe the data collection process, and then present and discuss the segmentation results.

Fig. 11. (a) Success rate, (b) failure rate, (c) µED, and (d) σED of the segmentation algorithm as a function of SNR, for the 128-, 64-, and 32-channel data formats.

A. Data Acquisition and Preprocessing

EEG data was collected from 5 healthy, right-handed males (ages ranging between 20 and 35 years), while performing a modified visual oddball task [4], [58]. All participants provided written informed consent, approved by the Institutional Review Board (IRB) of Rutgers University.

The paradigm consisted of displaying a randomized sequence of three visual stimuli [+, ◦, □] on a computer monitor. Each stimulus was presented for 68 ms, with an inter-trial interval (ITI) of 1200−2000 ms. Overall, there were 100 plus-sign (+) stimuli for target trials, 100 circle (◦) stimuli for rare non-target trials, and 300 square (□) stimuli for common non-target trials. Subjects were instructed to press a button when a target stimulus appeared on the screen.

EEG recordings were collected using a 10/5-standard 128-channel EEG system (Brain Products, Germany), at a sampling rate of 1000 samples/sec. The EEG data was then preprocessed using EEGLAB [59]. The data was first filtered using a band-pass finite impulse response (FIR) filter, with lower and higher cutoff frequencies of 0.5 Hz and 50.5 Hz, respectively, then downsampled to 500 samples/sec. Major artifacts related to eye blinks or muscle activity were removed using the independent component analysis (ICA) technique. For every subject, the average ERPs for the target, rare, and common stimuli were computed, and the grand average ERPs across all 5 subjects were calculated for each stimulus.
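The authors performed this chain in EEGLAB; as a functional sketch only, the scipy equivalent below applies a zero-phase 0.5–50.5 Hz band-pass FIR and then halves the sampling rate. The filter order is an arbitrary choice of ours, and the ICA artifact-removal step is omitted.

```python
# Minimal scipy-based sketch of the reported filtering and downsampling steps.
import numpy as np
from scipy.signal import firwin, filtfilt

fs = 1000.0                                       # original sampling rate (samples/sec)
taps = firwin(numtaps=501, cutoff=[0.5, 50.5],    # band-pass FIR; order is illustrative
              pass_zero=False, fs=fs)

def preprocess(eeg):
    """eeg: channels x samples array; input must be a few seconds long for filtfilt."""
    filtered = filtfilt(taps, [1.0], eeg, axis=1) # zero-phase band-pass
    return filtered[:, ::2]                       # 1000 -> 500 samples/sec (safe: band-limited)
```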

B. Description of Validation Approach: Segmentation in the Source Space

Unlike the simulated scenario, the ground-truth segmental structure for experimental data is not known. Given that the definition of a segment used in this paper is based on the state of active clusters in the source space, for the purpose

of validation, we will compare the outcome of the proposed segmentation algorithm to the outcome of an alternative segmentation approach performed in the source space. Similar to the proposed segmentation technique, the alternative approach, referred to, here, as the source-localization-based segmentation technique, uses a growing reference window and a non-overlapping, fixed-length sliding window. The initial reference window length (equal to the sliding window length) is, again, an input parameter. However, it is important to note that this alternative approach is heavily dependent on the accuracy of the source localization algorithm used. Therefore, the outcomes of the source-localization-based segmentation algorithm should not be treated as the ground-truth.

To apply the source-localization-based segmentation algorithm, the temporal activity at each cortical point, over the duration of the EEG data, is estimated using a linear source localization technique to construct the source matrix. Here, we used the wMNE source localization algorithm, provided by the Brainstorm toolbox. After transforming the data to the source space, boundaries associated with the instants of changes in the spatial distribution of active cortical points should be determined. To accomplish this, first, an intensity threshold should be determined, in order to differentiate active cortical points from those points showing only background activity. To determine this threshold, the absolute intensities of all the cortical points under the reference window are passed to a 2-means algorithm, in order to group them into two classes of high and background activity. The threshold value is then set to the average intensity that separates these two classes. This threshold is then also applied to the data under the sliding window to determine the active cortical points under this window. The threshold is updated every time the reference window is updated.
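The 2-means split can be written compactly. The following sketch (ours) uses a plain Lloyd iteration on the absolute intensities and places the threshold midway between the two class centers, which is one reading of "the average intensity that separates these two classes".

```python
# Sketch of the activation-threshold step: 2-means split of absolute source
# intensities into "high" and "background" classes; threshold between centers.
import numpy as np

def two_means_threshold(abs_intensities, n_iter=100):
    x = np.ravel(abs_intensities)
    lo, hi = x.min(), x.max()                     # initialize the two class centers
    for _ in range(n_iter):
        assign_hi = np.abs(x - hi) < np.abs(x - lo)
        new_lo, new_hi = x[~assign_hi].mean(), x[assign_hi].mean()
        if np.isclose(new_lo, lo) and np.isclose(new_hi, hi):
            break                                 # converged
        lo, hi = new_lo, new_hi
    return 0.5 * (lo + hi)                        # separating intensity threshold
```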

A significant shift in the location of active cortical points is also likely to shift their centroid. Therefore, to determine changes in the spatial distribution of active clusters, we use the measure of centroid. Centroids of topographical maps have also been previously used in [60] for microstate analysis. The coordinates of the centroid of the active cortical points, at each time sample, under the reference window and the sliding window are calculated. The average of the coordinates of the centroids under the reference window (referred to, here, as the averaged coordinates) is also obtained.

The displacements between the averaged coordinates and the centroids under the reference window, on one hand, and between the averaged coordinates and the centroids under the sliding window, on the other, form two statistical pools. The two pools are passed to the K-S test to determine whether they are significantly different. If a significant difference is detected, then a segment boundary is marked at the beginning of the sliding window. If no significant difference is detected, the length of the reference window is increased by 1, the activation threshold is re-calculated, the sliding window is moved by 1 sample, and the process is repeated until the end of the data is reached.
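A minimal sketch of this test, assuming per-sample boolean activation masks and 3-D cortical coordinates (variable names are ours), could look as follows, with scipy's two-sample K-S test supplying the p-value.

```python
# Sketch of the boundary test in the source-localization-based approach.
import numpy as np
from scipy.stats import ks_2samp

def centroids(active_mask, coords):
    """active_mask: points x samples (bool); coords: points x 3. Returns samples x 3."""
    w = active_mask.astype(float)
    return (w.T @ coords) / np.maximum(w.sum(axis=0), 1)[:, None]

def boundary_detected(mask_ref, mask_sld, coords, alpha=0.05):
    c_ref = centroids(mask_ref, coords)                 # per-sample centroids, ref window
    c_sld = centroids(mask_sld, coords)                 # per-sample centroids, sliding window
    mean_ref = c_ref.mean(axis=0)                       # the "averaged coordinates"
    pool_ref = np.linalg.norm(c_ref - mean_ref, axis=1) # displacement pool 1
    pool_sld = np.linalg.norm(c_sld - mean_ref, axis=1) # displacement pool 2
    return ks_2samp(pool_ref, pool_sld).pvalue < alpha  # significant difference?
```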

TABLE I
K-S TEST p-VALUES FOR THE BOUNDARIES DETECTED IN FIG. 12 BY THE SOURCE-INFORMED AND THE SOURCE-LOCALIZATION-BASED ALGORITHMS.

          Source-Informed                      Source-Localization-Based
 I    Target      Rare        Common      Target      Rare        Common
 1    1.34e-08    7.81e-12    5.96e-12    1.85e-07    3.55e-05    6.50e-09
 2    2.59e-11    1.58e-10    2.80e-12    3.66e-07    2.73e-02    4.78e-06
 3    5.59e-08    9.44e-10    3.60e-11    6.81e-10    3.66e-07    5.36e-07
 4    6.81e-10    3.70e-10    2.28e-13    9.44e-10    4.22e-06    2.59e-11
 5    2.40e-10    2.28e-13    2.59e-11    2.40e-10    1.03e-07    5.09e-11
 6    1.07e-10    1.15e-12    3.73e-13    7.66e-09    3.13e-03    1.57e-09
 7    2.22e-12    1.15e-12    N/A         1.88e-11    1.34e-08    1.77e-12
 8    1.17e-10    N/A         N/A         1.58e-09    5.09e-11    1.07e-10
 9    N/A         N/A         N/A         1.34e-08    1.36e-08    N/A

TABLE II
PERFORMANCE OF THE SOURCE-INFORMED AND CENA ALGORITHMS, APPLIED IN FIG. 12, WITH THE OUTCOMES OF THE SOURCE-LOCALIZATION-BASED ALGORITHM TAKEN AS ESTIMATED GROUND-TRUTH.

Algorithm         Succ.%   Fail.%   µED [msec]   σED [msec]
Source-Informed   80.56    19.44    13.92        8.30
CENA              39.35    60.65    10.77        5.10

C. Segmentation Results

The outcomes of the proposed source-informed, source-localization-based, and CENA segmentation algorithms, for the grand-average ERPs of the target, rare, and common trials, are shown in the top row of Figs. 12(a), 12(b), and 12(c), respectively. For each ERP, the segment boundaries estimated by these algorithms are shown using different colors (red for the proposed algorithm, blue for CENA, and black for the source-localization-based algorithm). Based on the results of Section IV-A.4, the initial reference window length for both the source-informed and source-localization-based algorithms and the time-lag for the CENA algorithm were all chosen as Wr = 19 samples (38 msec). For the source-informed segmentation method, the decision window was chosen as Wd = 11 samples (22 msec). Table I lists the p-values of the K-S test at the boundaries in the ERPs of Fig. 12 detected by both the source-informed and source-localization-based algorithms. Taking the source-localization-based algorithm as an estimate of the ground-truth, the performances of the proposed source-informed and CENA algorithms are compared in Table II.

D. Exploring Activities in the Source Domain

To examine the outcomes of the proposed segmentation algorithm in the source space, cortical activities, estimated by the wMNE source localization algorithm, averaged over the interval of each segment of the target, rare, and common ERPs, are shown in the bottom row of Figs. 12(a), 12(b), and 12(c), respectively. As an estimate for the intensity level of background activity (to be removed from the cortical activity maps), pre-stimulus cortical activity is used. To achieve this, for each ERP, the PMF of the absolute pre-stimulus cortical activity over the period [−100, 0] msec is, first, empirically calculated, and then a global threshold on cortical activity is determined by imposing some false positive rate (α) on the PMF. Here, we use α = 0.001.
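Imposing a false positive rate α on the empirical distribution is equivalent, in effect, to taking the (1 − α) quantile of the absolute pre-stimulus activity; a minimal sketch:

```python
# Background-activity threshold from the pre-stimulus ([-100, 0] msec) interval:
# only a fraction alpha of pre-stimulus samples would exceed the returned value.
import numpy as np

def background_threshold(pre_stimulus_activity, alpha=0.001):
    # pre_stimulus_activity: cortical points x pre-stimulus samples
    return np.quantile(np.abs(pre_stimulus_activity), 1.0 - alpha)
```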

Fig. 12. Grand average ERP segmentation results (top), and the corresponding sequence of cortical activity maps (bottom), for the first 500 msec post-stimulus of (a) target, (b) rare non-target, and (c) common non-target trials. For each case, ERP segmentation is done using three approaches: the proposed source-informed algorithm (detected segment boundaries are shown in red), CENA (detected segment boundaries are shown in blue), and the source-localization-based algorithm (detected segment boundaries are shown in black). The time instants of detected boundaries for each segmentation approach are also indicated in red (for the proposed algorithm), blue (for CENA), and black (for the source-localization-based algorithm). Segment boundaries at 0 and 500 msec are common among all algorithms. At the bottom, average cortical activity maps for the segment intervals detected by the proposed source-informed algorithm are shown. Brain views from top to bottom rows are: superior, anterior, left, right, posterior, and inferior. Temporal sequencing of the brain maps from left to right columns corresponds to the order of segments, i.e., the intervals between two segment boundaries.

One can notice that a unique sequence of dynamic patterns of cortical activity, from the time of stimulus presentation to 500 msec later, is obtained for each ERP. In all three ERPs, highly activated regions start to appear on the occipital cortex during the interval [100, 200] msec, an expected indication of the phase of processing visual stimuli. It is observed that the level of occipital activity is evidently lower in the common ERP than it is in the other two ERP classes. An important signature of the oddball paradigm is the P300 component showing in the target ERP around 300 msec after the onset of the stimulus. Examining the cortical activity during the segments leading to the P300 peak of the target ERP, we can clearly see high activity in the parietal and frontal cortices during the interval [230, 360] msec. Such cortical activity can be related to the phase of stimulus evaluation. In comparison, the non-target ERPs show much lower activity in the parietal and frontal regions during this period. This might reflect the increased attentiveness that the subjects exhibit when responding to the target stimuli.

VI. DISCUSSION

The proposed source-informed segmentation algorithm addresses the problem of segmenting EEG data on the basis of changes occurring in the spatial distribution of the underlying cortical activity, without requiring the transformation of EEG data into the source space, where the activity of interest takes place. As the segmentation is performed in the sensor space, it offers the advantage of not requiring to go through the process of source localization. While various techniques have been proposed for solving the ill-posed inverse problem of EEG

source localization [61]–[66], the accuracy of the reconstructed activities in the source space could heavily be dependent on the choice of the inverse method [67], as well as the choice of the head model [57]. As structural information for the head model is not provided through the EEG modality, and due to the inherent complexity of the involved inverse problem, here, we seek to avoid going to the source space. Our theoretical analysis, along with extensive simulation results, showed that the information contained in EEG recordings, relevant to the underlying cortical activity, can be used directly to estimate segment boundaries, without the need for acquiring extra information about the underlying structure. Coupled with functional connectivity analysis [8], [9], [20], [21], the proposed segmentation algorithm can provide new insights about the dynamics of brain function in action.

While in this paper, ERP-based experimental data was used, the proposed algorithm is not restricted to ERP data, and can be applied to segment single-trial EEG, as demonstrated in [20], [21]. For the proposed source-informed segmentation algorithm to support real-time processing (which could have applications in brain-computer interfaces) without compromising its fidelity to the segment definition, we have adopted a block-wise sequential processing paradigm. As suggested by (88), for the algorithm to be able to capture the full feature space within a segment, the segment length is required to cover at least as many time samples as the number of active clusters. The numbers of simultaneous active clusters in Figs. 12(a), 12(b), and 12(c) show that this condition is not very hard to meet. In the algorithm, the minimum recognizable segment duration is controlled through the width of the initial reference window. This puts the width of the reference window at the

center of a trade-off, since the width of the initial reference window also controls the reliability of the decisions made by the algorithm (see step 7 in the algorithm described in Section III-A). To enhance the reliability of decision making without requiring wide reference windows, we used a separate decision window that aggregates the consecutive boundary decisions to analyze their persistence. Beyond the minimum interval requirement imposed by the initial reference window, the proposed algorithm can still recognize segment boundaries at increments of single time samples, due to the single-sample increments of the reference window length and the search for the most probable boundaries throughout the consecutive possible boundaries (see step 7 of Section III-A). Having to look ahead of the tested instant, for the interval of the sliding window, to search for changes, and having a decision window checking the persistence of the probable boundaries, means that the algorithm is bound to wait for at least the sum of the intervals of these two windows before returning a final decision on an estimated boundary.
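As an illustration only (this is one plausible reading of the persistence check, not a restatement of step 7 of Section III-A), a decision window could aggregate the last Wd per-sample boundary candidates and declare the most recurrent one; the vote threshold below is hypothetical.

```python
# Hypothetical persistence check over a decision window of boundary candidates.
import numpy as np

def persistent_boundary(candidate_positions, min_votes=6):
    """candidate_positions: the last W_d per-sample boundary estimates (sample indices)."""
    votes = np.bincount(np.asarray(candidate_positions))
    best = int(votes.argmax())                 # most recurrent candidate position
    return best if votes[best] >= min_votes else None
```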

As a final note, the reference window length dictates the minimum detectable segment duration. Fig. 6 provides guidelines for selecting the reference and decision windows, based on the expected performance of the segmentation algorithm. The widths of these windows are reported in terms of samples. Therefore, depending on the sampling rate, the widths of these windows can be translated into time intervals to estimate the highest observable frequency. To allow for the analysis of data containing higher frequency oscillations, either narrower reference windows or higher data sampling rates can be used.
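As an illustrative instance of this translation (our numbers, using the experimental sampling rate of 500 samples/sec and the reference window Wr = 19 samples of Section V, and roughly requiring a full oscillation cycle to fit within the minimum detectable segment):

$$f_{\max} \approx \frac{f_s}{W_r} = \frac{500\ \text{samples/sec}}{19\ \text{samples}} \approx 26\ \text{Hz}.$$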

VII. CONCLUSIONS

In this paper, we presented a new method that segments EEG recordings into intervals during which the spatial distribution of the underlying active cortical neurons stays quasi-stationary, while their activation intensities can exhibit some variations. Extensive simulations were carried out, and the results confirm that the proposed method is successful in accurately detecting the segmental structures of the simulated EEG data under a wide range of possible scenarios. Results of applying the proposed algorithm to real EEG recordings were comparable to the results of applying an alternative segmentation approach.

Informed by the underlying neuronal activity, the proposed algorithm offers a new way for studying the dynamics of brain function from EEG recordings. The segmentation algorithm can have applications in several domains. For example, it can be used to study the dynamics of functional connectivity during task execution by meaningfully breaking the ERP into a few segments, constructing graphs for each segment, and investigating the network properties. Comparing the segmentation outcomes (e.g., in terms of the number of segments, or the duration of segments) when applied to the ERPs obtained from healthy and patient groups could also result in the development of new biomarkers for brain-related diseases.

ACKNOWLEDGMENT

This work was supported by DARPA grant W911NF-16-1-0096, and Siemens. We thank Li Zhu, Tianjiao Zeng, Yunqi Wang, and Anthony Yang for their help in conducting the experiments and preparing the manuscript.

REFERENCES

[1] C. Wilke, L. Ding, and B. He, “Estimation of time-varying connectivity patterns through the use of an adaptive directed transfer function,” IEEE Transactions on Biomedical Engineering, vol. 55, no. 11, pp. 2557–2564, 2008.

[2] C. Chang and G. H. Glover, “Time–frequency dynamics of resting-state brain connectivity measured with fMRI,” NeuroImage, vol. 50, no. 1, pp. 81–98, 2010.

[3] R. M. Hutchison, et al., “Dynamic functional connectivity: promise, issues, and interpretations,” NeuroImage, vol. 80, pp. 360–378, 2013.

[4] N. Karamzadeh, et al., “Capturing dynamic patterns of task-based functional connectivity with EEG,” NeuroImage, vol. 66, pp. 311–317, 2013.

[5] C. Chang, et al., “EEG correlates of time-varying BOLD functional connectivity,” NeuroImage, vol. 72, pp. 227–236, 2013.

[6] J. Zhang, et al., “Inferring functional interaction and transition patterns via dynamic Bayesian variable partition models,” Human Brain Mapping, vol. 35, no. 7, pp. 3314–3331, 2014.

[7] L. Zhu and L. Najafizadeh, “Does brain functional connectivity alter across similar trials during imaging experiments?” in IEEE Signal Processing in Medicine and Biology Symposium (SPMB), 2014, pp. 1–4.

[8] A. Haddad and L. Najafizadeh, “Multi-scale analysis of the dynamics of brain functional connectivity using EEG,” in IEEE Biomedical Circuits and Systems Conference (BioCAS), 2016, pp. 240–243.

[9] ——, “Source-informed segmentation: Towards capturing the dynamics of brain functional networks through EEG,” in 50th Asilomar Conference on Signals, Systems and Computers, 2016, pp. 1290–1294.

[10] J. K. Grooms, et al., “Infraslow electroencephalographic and dynamic resting state network activity,” Brain Connectivity, vol. 7, no. 5, pp. 265–280, 2017.

[11] A. G. Mahyari, et al., “A tensor decomposition-based approach for detecting dynamic network states from EEG,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 1, pp. 225–237, 2017.

[12] F. Vecchio, et al., “‘Small world’ architecture in brain connectivity and hippocampal volume in Alzheimer’s disease: a study via graph theory from EEG data,” Brain Imaging and Behavior, vol. 11, no. 2, pp. 473–485, 2017.

[13] M. Breakspear, “Dynamic models of large-scale brain activity,” Nature Neuroscience, vol. 20, no. 3, p. 340, 2017.

[14] G. Deco, et al., “The dynamics of resting fluctuations in the brain: metastability and its dynamical cortical core,” Scientific Reports, vol. 7, no. 1, p. 3095, 2017.

[15] J. E. Chen, M. Rubinov, and C. Chang, “Methods and considerations for dynamic analysis of functional MR imaging data,” Neuroimaging Clinics, vol. 27, no. 4, pp. 547–560, 2017.

[16] H. Xie, et al., “Whole-brain connectivity dynamics reflect both task-specific and individual-specific modulation: a multitask study,” NeuroImage, 2017.

[17] R. Schmalzle, et al., “Brain connectivity dynamics during social interaction reflect social network structure,” Proceedings of the National Academy of Sciences, vol. 114, no. 20, pp. 5153–5158, 2017.

[18] E. A. Allen, et al., “EEG signatures of dynamic functional network connectivity states,” Brain Topography, pp. 1–16, 2017.

[19] C. Jin, et al., “Dynamic brain connectivity is a better predictor of PTSD than static connectivity,” Human Brain Mapping, vol. 38, no. 9, pp. 4479–4496, 2017.

[20] A. Haddad and L. Najafizadeh, “Recognizing task-specific dynamic structure of the brain function from EEG,” in IEEE 15th International Symposium on Biomedical Imaging (ISBI), 2018, pp. 712–715.

[21] A. Haddad, F. Shamsi, and L. Najafizadeh, “On the spatiotemporal characteristics of class-discriminating functional networks,” in International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2018.

[22] B. He, et al., “Grand challenges in mapping the human brain: NSF workshop report,” IEEE Transactions on Biomedical Engineering, vol. 60, no. 11, pp. 2983–2992, 2013.

[23] A. Banerjee, et al., “Study on brain dynamics by non-linear analysis of music induced EEG signals,” Physica A: Statistical Mechanics and its Applications, vol. 444, pp. 110–120, 2016.

[24] C. M. Michel and T. Koenig, “EEG microstates as a tool for studying the temporal dynamics of whole-brain neuronal networks: A review,” NeuroImage, 2017.

[25] M. X. Cohen, “Assessing transient cross-frequency coupling in EEG data,” Journal of Neuroscience Methods, vol. 168, no. 2, pp. 494–499, 2008.

[26] S. I. Dimitriadis, et al., “Tracking brain dynamics via time-dependent network analysis,” Journal of Neuroscience Methods, vol. 193, no. 1, pp. 145–155, 2010.

[27] A. E. Haddad and L. Najafizadeh, “Global EEG segmentation using singular value decomposition,” in 37th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), 2015, pp. 558–561.

[28] R. Aufrichtigl, S. B. Pedersen, and P. Jennum, “Adaptive segmentation of EEG signals,” in International Conference of the IEEE Engineering in Medicine and Biology Society, 1991, pp. 453–454.

[29] G. Bodenstein and H. M. Praetorius, “Feature extraction from the electroencephalogram by adaptive segmentation,” Proceedings of the IEEE, vol. 65, no. 5, pp. 642–652, 1977.

[30] L. Wong and W. Abdulla, “Time-frequency evaluation of segmentation methods for neonatal EEG signals,” in 28th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBS), 2006, pp. 1303–1306.

[31] R. Biscay, et al., “Maximum a posteriori estimation of change points in the EEG,” International Journal of Bio-medical Computing, vol. 38, no. 2, pp. 189–196, 1995.

[32] B. E. Brodsky, et al., “A nonparametric method for the segmentation of the EEG,” Computer Methods and Programs in Biomedicine, vol. 60, no. 2, pp. 93–106, 1999.

[33] J. Fell, et al., “EEG analysis with nonlinear deterministic and stochastic methods: a combined strategy,” Acta Neurobiologiae Experimentalis, vol. 60, no. 1, pp. 87–108, 2000.

[34] A. Kaplan, et al., “Macrostructural EEG characterization based on nonparametric change point segmentation: application to sleep analysis,” Journal of Neuroscience Methods, vol. 106, no. 1, pp. 81–90, 2001.

[35] A. A. Fingelkurts, et al., “Cortex functional connectivity as a neurophysiological correlate of hypnosis: an EEG case study,” Neuropsychologia, vol. 45, no. 7, pp. 1452–1462, 2007.

[36] A. Y. Kaplan, et al., “Nonstationary nature of the brain activity as revealed by EEG/MEG: methodological, practical and conceptual challenges,” Signal Processing, vol. 85, no. 11, pp. 2190–2212, 2005.

[37] H. Hassanpour and M. Shahiri, “Adaptive segmentation using wavelet transform,” in International Conference on Electrical Engineering (ICEE), 2007, pp. 1–5.

[38] Z. J. Wang, P. W.-H. Lee, and M. J. McKeown, “A novel segmentation, mutual information network framework for EEG analysis of motor tasks,” BioMedical Engineering OnLine, vol. 8, no. 1, p. 9, 2009.

[39] R. F. Betzel, et al., “Synchronization dynamics and evidence for a repertoire of network states in resting EEG,” Frontiers in Computational Neuroscience, vol. 6, p. 74, 2012.

[40] A. Y. Mutlu and S. Aviyente, “Subspace analysis for characterizing dynamic functional brain networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2013, pp. 1272–1276.

[41] A. Khanna, et al., “Microstates in resting-state EEG: current status and future directions,” Neuroscience & Biobehavioral Reviews, vol. 49, pp. 105–113, 2015.

[42] T. Koenig, et al., “A deviant EEG brain microstate in acute, neuroleptic-naive schizophrenics at rest,” European Archives of Psychiatry and Clinical Neuroscience, vol. 249, no. 4, pp. 205–211, 1999.

[43] ——, “Millisecond by millisecond, year by year: normative EEG microstates and developmental stages,” NeuroImage, vol. 16, no. 1, pp. 41–48, 2002.

[44] P. Milz, et al., “The functional significance of EEG microstates—associations with modalities of thinking,” NeuroImage, vol. 125, pp. 643–656, 2016.

[45] F. Musso, et al., “Spontaneous brain activity and EEG microstates. A novel EEG/fMRI analysis approach to explore resting-state networks,” NeuroImage, vol. 52, no. 4, pp. 1149–1161, 2010.

[46] R. D. Pascual-Marqui, C. M. Michel, and D. Lehmann, “Segmentation of brain electrical activity into microstates: model estimation and validation,” IEEE Transactions on Biomedical Engineering, vol. 42, no. 7, pp. 658–665, 1995.

[47] H. Yuan, et al., “Spatiotemporal dynamics of the brain at rest—exploring EEG microstates as electrophysiological signatures of BOLD resting state networks,” NeuroImage, vol. 60, no. 4, pp. 2062–2072, 2012.

[48] S. Cacioppo, et al., “Dynamic spatiotemporal brain analyses using high performance electrical neuroimaging: theoretical framework and validation,” Journal of Neuroscience Methods, vol. 238, pp. 11–34, 2014.

[49] S. Cacioppo and J. T. Cacioppo, “Dynamic spatiotemporal brain analyses using high-performance electrical neuroimaging, part II: a step-by-step tutorial,” Journal of Neuroscience Methods, vol. 256, pp. 184–197, 2015.

[50] S. Dimitriadis, N. Laskaris, and A. Tzelepi, “On the quantization of time-varying phase synchrony patterns into distinct functional connectivity microstates (FCµstates) in a multi-trial visual ERP paradigm,” Brain Topography, vol. 26, no. 3, pp. 397–409, 2013.

[51] D. Lehmann, et al., “EEG microstate duration and syntax in acute, medication-naive, first-episode schizophrenia: a multi-center study,” Psychiatry Research: Neuroimaging, vol. 138, no. 2, pp. 141–156, 2005.

[52] H. Hallez, et al., “Review on solving the forward problem in EEG source analysis,” Journal of NeuroEngineering and Rehabilitation, vol. 4, no. 1, p. 46, 2007.

[53] B. J. Edelman, B. Baxter, and B. He, “EEG source imaging enhances the decoding of complex right-hand motor imagery tasks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 1, pp. 4–14, 2016.

[54] “Brainstorm.” [Online]. Available: http://neuroimage.usc.edu/brainstorm/

[55] F. J. Massey Jr., “The Kolmogorov-Smirnov test for goodness of fit,” Journal of the American Statistical Association, vol. 46, no. 253, pp. 68–78, 1951.

[56] Z. W. Birnbaum, “Numerical tabulation of the distribution of Kolmogorov’s statistic for finite sample size,” Journal of the American Statistical Association, vol. 47, no. 259, pp. 425–441, 1952.

[57] J. Song, et al., “EEG source localization: sensor density and head surface coverage,” Journal of Neuroscience Methods, vol. 256, pp. 9–21, 2015.

[58] K. A. Kiehl, et al., “Reduced P300 responses in criminal psychopaths during a visual oddball task,” Biological Psychiatry, vol. 45, no. 11, pp. 1498–1507, 1999.

[59] “EEGLAB.” [Online]. Available: https://sccn.ucsd.edu/eeglab/

[60] J. Wackermann, et al., “Adaptive segmentation of spontaneous EEG map series into spatially defined microstates,” International Journal of Psychophysiology, vol. 14, no. 3, pp. 269–283, 1993.

[61] C. Phillips, M. D. Rugg, and K. J. Friston, “Anatomically informed basis functions for EEG source localization: combining functional and anatomical constraints,” NeuroImage, vol. 16, no. 3, pp. 678–695, 2002.

[62] R. Grech, et al., “Review on solving the inverse problem in EEG source analysis,” Journal of NeuroEngineering and Rehabilitation, vol. 5, no. 1, p. 25, 2008.

[63] Z. Liu and B. He, “fMRI–EEG integrated cortical source imaging by use of time-variant spatial constraints,” NeuroImage, vol. 39, no. 3, pp. 1198–1214, 2008.

[64] Y. Lu, et al., “Noninvasive imaging of the high frequency brain activity in focal epilepsy patients,” IEEE Transactions on Biomedical Engineering, vol. 61, no. 6, pp. 1660–1667, 2014.

[65] A. Al Hilli, L. Najafizadeh, and A. P. Petropulu, “A weighted approach for sparse signal support estimation with application to EEG source localization,” IEEE Transactions on Signal Processing, vol. 65, no. 24, pp. 6551–6565, 2017.

[66] F. Costa, et al., “Bayesian EEG source localization using a structured sparsity prior,” NeuroImage, vol. 144, pp. 142–152, 2017.

[67] K. Mahjoory, et al., “Consistency of EEG source localization and connectivity estimates,” NeuroImage, vol. 152, pp. 590–601, 2017.

APPENDIX

A. Multiple Active Clusters

Here, we extend the theoretical analysis presented in Section II, where a single active cluster was considered, to a more general scenario where there are multiple clusters of active cortical neurons in a given segment (e.g., the first and third segments shown in Fig. 1).

1) Problem Formulation: Let $\mathbf{S}_R \in \mathbb{R}^{P\times T_{seg}}$ be the source matrix for a discrete-time interval $\{t_1, t_2, \cdots, t_{T_{seg}}\}$ during which $R$ clusters of cortical points of sizes $Q_1, \cdots, Q_R \ll P$ stay activated. As in (6), we write

$$
\mathbf{S}_R =
\begin{bmatrix}
\mathbf{D}_1 \cdot (\mathbf{1}_1 + \mathbf{E}_1) \\
\vdots \\
\mathbf{D}_R \cdot (\mathbf{1}_R + \mathbf{E}_R) \\
\mathbf{B}_R
\end{bmatrix}
=
\begin{bmatrix}
\mathbf{D}_1 \cdot \mathbf{1}_1 \\
\vdots \\
\mathbf{D}_R \cdot \mathbf{1}_R \\
\mathbf{0}_R
\end{bmatrix}
+
\begin{bmatrix}
\mathbf{D}_1 \cdot \mathbf{E}_1 \\
\vdots \\
\mathbf{D}_R \cdot \mathbf{E}_R \\
\mathbf{B}_R
\end{bmatrix},
\tag{66}
$$

where $\mathbf{D}_i \in \mathbb{R}^{Q_i\times Q_i}$ is a diagonal matrix with the average activation intensities of the cortical points within the $i$th active cluster on its diagonal, $\mathbf{E}_i \in \mathbb{R}^{Q_i\times T_{seg}}$ carries the $\mathbf{D}_i$-normalized deviations of the activation intensities at the cortical points within the $i$th active cluster over that interval, $\mathbf{1}_i$ is a $Q_i\times T_{seg}$ all-ones matrix, $\mathbf{0}_R$ is a $(P-Q_1-\cdots-Q_R)\times T_{seg}$ all-zeros matrix, and $\mathbf{B}_R \in \mathbb{R}^{(P-Q_1-\cdots-Q_R)\times T_{seg}}$ gives the low background activity of the cortical points outside the active clusters. As in (7), the gain matrix can be partitioned into

$$\mathbf{G} = [\mathbf{G}_1 | \cdots | \mathbf{G}_R | \mathbf{G}_0], \tag{67}$$

where $\mathbf{G}_i \in \mathbb{R}^{C\times Q_i}$ are the partitions of $\mathbf{G}$ corresponding to the cortical points within the corresponding active clusters, and $\mathbf{G}_0 \in \mathbb{R}^{C\times(P-Q_1-\cdots-Q_R)}$ is the partition corresponding to the points outside these clusters. $\mathbf{Y}_R \in \mathbb{R}^{C\times T_{seg}}$, the EEG matrix over the considered interval, can be expressed as

$$
\begin{aligned}
\mathbf{Y}_R &= \mathbf{G}\cdot\mathbf{S}_R + \mathbf{N}_R \\
&= (\mathbf{H}_1 + \cdots + \mathbf{H}_R) + (\mathbf{H}_{e_1} + \cdots + \mathbf{H}_{e_R}) + \mathbf{H}_{b_R} + \mathbf{N}_R \\
&= \mathbf{H}_\Sigma\cdot\mathbf{1}_\Sigma + \mathbf{H}_E + \mathbf{H}_{b_R} + \mathbf{N}_R,
\end{aligned}
\tag{68}
$$

where $\mathbf{H}_{e_i} = \mathbf{G}_i\cdot\mathbf{D}_i\cdot\mathbf{E}_i$, $\mathbf{H}_{b_R} = \mathbf{G}_0\cdot\mathbf{B}_R$, $\mathbf{H}_E = \mathbf{H}_{e_1} + \cdots + \mathbf{H}_{e_R}$, $\mathbf{N}_R \in \mathbb{R}^{C\times T_{seg}}$ is the matrix of additive noise injected into the EEG channels over the considered interval, and $\mathbf{H}_i \in \mathbb{R}^{C\times T_{seg}}$ is

$$\mathbf{H}_i = \mathbf{G}_i\cdot\mathbf{D}_i\cdot\mathbf{1}_i = [\mathbf{h}_i \cdots \mathbf{h}_i], \tag{69}$$

with $\mathbf{h}_i \in \mathbb{R}^C$ a linear combination of the gain vectors corresponding to the cortical points inside the $i$th active cluster, $\mathbf{1}_\Sigma$ an $R\times T_{seg}$ all-ones matrix, and $\mathbf{H}_\Sigma \in \mathbb{R}^{C\times R}$

$$\mathbf{H}_\Sigma = [\mathbf{h}_1 \cdots \mathbf{h}_R]. \tag{70}$$

Taking the components that belong to the column space of $\mathbf{H}_\Sigma$ away from the $\mathbf{H}_E$, $\mathbf{H}_{b_R}$, and $\mathbf{N}_R$ terms gives

$$\mathbf{Y}_R = \mathbf{H}_\Sigma\cdot(\mathbf{A}^T + \mathbf{A}^T_{b_R} + \mathbf{A}^T_{n_R}) + \bar{\mathbf{H}}_E + \bar{\mathbf{H}}_{b_R} + \bar{\mathbf{N}}_R, \tag{71}$$

where

$$\mathbf{A}^T = \mathbf{1}_\Sigma + (\mathbf{H}_\Sigma^T\cdot\mathbf{H}_\Sigma)^{-1}\cdot\mathbf{H}_\Sigma^T\cdot\mathbf{H}_E, \tag{72}$$

$$\mathbf{A}^T_{b_R} = (\mathbf{H}_\Sigma^T\cdot\mathbf{H}_\Sigma)^{-1}\cdot\mathbf{H}_\Sigma^T\cdot\mathbf{H}_{b_R}, \tag{73}$$

$$\mathbf{A}^T_{n_R} = (\mathbf{H}_\Sigma^T\cdot\mathbf{H}_\Sigma)^{-1}\cdot\mathbf{H}_\Sigma^T\cdot\mathbf{N}_R, \tag{74}$$

$$\bar{\mathbf{H}}_E = \mathbf{H}_E - \mathbf{H}_\Sigma\cdot(\mathbf{A}^T - \mathbf{1}_\Sigma), \tag{75}$$

$$\bar{\mathbf{H}}_{b_R} = \mathbf{H}_{b_R} - \mathbf{H}_\Sigma\cdot\mathbf{A}^T_{b_R}, \quad\text{and} \tag{76}$$

$$\bar{\mathbf{N}}_R = \mathbf{N}_R - \mathbf{H}_\Sigma\cdot\mathbf{A}^T_{n_R}. \tag{77}$$

Here, the $i$th row of $\mathbf{A}^T \in \mathbb{R}^{R\times T_{seg}}$ gives the temporal variation of the overall activation intensity of the $i$th active cluster, whereas the columns of $\bar{\mathbf{H}}_E \in \mathbb{R}^{C\times T_{seg}}$, now all orthogonal to the space of $\mathbf{H}_\Sigma$, correspond to the total residual activation intensity variations within all active clusters. Similarly, the $i$th rows of $\mathbf{A}^T_{b_R}, \mathbf{A}^T_{n_R} \in \mathbb{R}^{R\times T_{seg}}$ correspond to the temporal variations of the components of background activity and channel noise, respectively, which are in the space of $\mathbf{H}_\Sigma$. $\bar{\mathbf{H}}_{b_R}$ and $\bar{\mathbf{N}}_R$ are, thus, the components of background activity and channel noise, respectively, that are in the direction orthogonal to the space of $\mathbf{H}_\Sigma$. Given that usually $R < C$, the columns of $\mathbf{H}_\Sigma$ are, generally, linearly independent, as they represent linear combinations of different sets of gain vectors corresponding to the cortical points within different active clusters. This suggests the existence of $(\mathbf{H}_\Sigma^T\cdot\mathbf{H}_\Sigma)^{-1}$.

2) Asymptotic Analysis: It is intuitive, again, to view the average of the gain vectors corresponding to the cortical points within each active cluster as a feature vector characterizing that cluster. However, the EEG matrix $\mathbf{Y}_R$, in (68), only incorporates $\mathbf{h}_1, \cdots, \mathbf{h}_R$, a set of weighted averages of the columns of $\mathbf{G}_1, \cdots, \mathbf{G}_R$, respectively. Following logic similar to that applied earlier, an extended version of the result in (34) can be reached. Asymptotically, as the correlation among the gain vectors corresponding to the cortical points within the $i$th active cluster increases to the point that they all become identical to some exemplary gain vector $\mathbf{f}_i \in \mathbb{R}^C$, which would, in this case, be the same as the average of these gain vectors, i.e., as

$$\mathbf{G}_i \to [\mathbf{f}_i \cdots \mathbf{f}_i], \tag{78}$$

then, substituting in (68) through (77), we get

$$\mathbf{H}_\Sigma \to \mathbf{F}\cdot\boldsymbol{\alpha}, \quad\text{and} \tag{79}$$

$$\bar{\mathbf{H}}_E \to \mathbf{0}, \tag{80}$$

where

$$\mathbf{F} = [\mathbf{f}_1 \cdots \mathbf{f}_R], \tag{81}$$

$$\boldsymbol{\alpha} = \begin{bmatrix} \operatorname{tr}(\mathbf{D}_1) & & \\ & \ddots & \\ & & \operatorname{tr}(\mathbf{D}_R) \end{bmatrix}. \tag{82}$$

Note that, given (78), the columns of $\mathbf{H}_E$ would be nothing but linear combinations of those of $\mathbf{H}_\Sigma$. Therefore, projecting $\mathbf{H}_E$ onto the column space of $\mathbf{H}_\Sigma$ would leave no residual matrix, as stated by (80).
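This projection argument is easy to verify numerically; the following small sanity check (ours, purely illustrative) confirms that when $\mathbf{H}_E$ lies in the column space of $\mathbf{H}_\Sigma$, the residual in (75) vanishes.

```python
# Numerical check: removing the component of H_E in the column space of
# H_Sigma leaves (numerically) zero when H_E is built from H_Sigma's columns.
import numpy as np

rng = np.random.default_rng(1)
C, R, T = 32, 3, 50
H_sigma = rng.standard_normal((C, R))
H_E = H_sigma @ rng.standard_normal((R, T))       # lies in col-space of H_Sigma

P = H_sigma @ np.linalg.inv(H_sigma.T @ H_sigma) @ H_sigma.T  # orthogonal projector
H_E_bar = H_E - P @ H_E                           # residual, cf. (75)
print(np.allclose(H_E_bar, 0))                    # True
```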

The results in (79) and (80) can also be achieved, asymptotically, as the total energy activating cluster $i$ gets compacted into a smaller cluster, approaching the size of 1 cortical point, with the gain vector $\mathbf{f}_i$, i.e., as

$$Q_i \to 1, \tag{83}$$

yet with

$$\boldsymbol{\alpha} = \begin{bmatrix} D_1 & & \\ & \ddots & \\ & & D_R \end{bmatrix}, \tag{84}$$

where $D_i$ is now the average activation intensity of the single cortical point constituting the entire $i$th active cluster.

Similarly, as the energy activating cluster $i$ becomes homogeneously distributed over the cortical points within the corresponding cluster at any instant during the considered interval, i.e., as

$$\mathbf{E}_i \to \begin{bmatrix} \boldsymbol{\varepsilon}_i^T \\ \vdots \\ \boldsymbol{\varepsilon}_i^T \end{bmatrix} \quad\text{and}\quad \mathbf{D}_i \to D_i\cdot\mathbf{I}_i, \tag{85}$$

where $\mathbf{I}_i$ is the $Q_i\times Q_i$ identity matrix, we reach the results in (79) and (80), with $\mathbf{f}_i$ as the average of the gain vectors

corresponding to the $i$th active cluster, i.e.,

$$\mathbf{f}_i = \frac{1}{Q_i}\cdot\mathbf{G}_i\cdot\vec{\mathbf{1}}_i, \quad\text{and} \tag{86}$$

$$\boldsymbol{\alpha} = \begin{bmatrix} D_1\cdot Q_1 & & \\ & \ddots & \\ & & D_R\cdot Q_R \end{bmatrix}. \tag{87}$$

Given any of the asymptotic conditions in (78), (83), or (85), we reach the conclusion

$$
\begin{aligned}
\mathbf{Y}_R &\to \mathbf{F}\cdot\boldsymbol{\alpha}\cdot(\mathbf{A} + \mathbf{A}_{b_R} + \mathbf{A}_{n_R})^T + \bar{\mathbf{H}}_{b_R} + \bar{\mathbf{N}}_R, \\
\mathbf{Y}_R &\to \mathbf{F}\cdot\boldsymbol{\alpha}\cdot\bar{\mathbf{A}}^T + \bar{\mathbf{H}}_{b_R} + \bar{\mathbf{N}}_R, \ \text{or} \\
\mathbf{Y}_R &\to [(\mathbf{F}\cdot\boldsymbol{\alpha}\cdot\mathbf{a}_1) \cdots (\mathbf{F}\cdot\boldsymbol{\alpha}\cdot\mathbf{a}_{T_{seg}})] + \bar{\mathbf{H}}_{b_R} + \bar{\mathbf{N}}_R,
\end{aligned}
\tag{88}
$$

where $\bar{\mathbf{A}}^T = \mathbf{A}^T + \mathbf{A}^T_{b_R} + \mathbf{A}^T_{n_R}$, and $\mathbf{a}_i \in \mathbb{R}^R$ is the $i$th column of $\bar{\mathbf{A}}^T$. Given that, usually, $R < C$, the vectors $\mathbf{f}_i$, while most likely correlated, are, generally, linearly independent, as they correspond to the cortical points within different active clusters. However, rather than our target feature vectors $\mathbf{f}_i$ themselves, the columns of $\mathbf{Y}_R$ comprise $T_{seg}$ different linear combinations of the target feature vectors over the considered interval. Given that, usually, $T_{seg} > R$, these linear combinations are most likely sufficient to span the entire column space of $\mathbf{F}$. Since the column space of $\mathbf{F}$ is fully determined by the target feature vectors $\mathbf{f}_i$, it is suggested, here, that this space be treated as a feature space, characterizing the spatial distribution of the activated cortical points. Note that the terms $\bar{\mathbf{H}}_{b_R}$ and $\bar{\mathbf{N}}_R$, the residual low background activity and residual channel noise, respectively, have their total energy spread in spaces with dimensionalities $\dim_{b_R}, \dim_{n_R} \leq \min(C, T_{seg}) - R$, where $R$ is the dimensionality of $\mathbf{F}$. Given the assumption that $R < C$, and that the way the residual background energy and residual channel noise are spread over these spaces is well-homogeneous, the terms $\bar{\mathbf{H}}_{b_R}$ and $\bar{\mathbf{N}}_R$ can be ignored next.

The minimum mean-squared-error (MMSE) estimate of the term $(\mathbf{F}\cdot\boldsymbol{\alpha}\cdot\bar{\mathbf{A}}^T)$ can be determined from the rank-$R$ singular value decomposition (SVD) approximation of $\mathbf{Y}_R$, i.e.,

$$\mathbf{U}_R\cdot\boldsymbol{\Delta}_R\cdot\mathbf{V}_R^T \xrightarrow{\ \text{MMSE}\ } \mathbf{F}\cdot\boldsymbol{\alpha}\cdot\bar{\mathbf{A}}^T, \tag{89}$$

where $\boldsymbol{\Delta}_R$ is an $R$-dimensional diagonal matrix carrying the $R$ largest singular values of $\mathbf{Y}_R$, and $\mathbf{U}_R$ and $\mathbf{V}_R$ are the corresponding left and right singular matrices, respectively. A basis for the suggested feature space can be obtained from the column space of $\mathbf{U}_R$, the most significant $R$-dimensional left singular subspace of $\mathbf{Y}_R$. Practically, though, the temporal variation courses of the overall activation intensities of the different active clusters can be highly correlated, resulting in highly correlated vectors $\mathbf{a}_i$. Thus, a subspace of the linear combinations of $\mathbf{f}_i$ can assume a much lower significance, making it indistinguishable from the column space of $(\bar{\mathbf{H}}_{b_R} + \bar{\mathbf{N}}_R)$. Hence, as a feature space, only the left singular subspace corresponding to the subset of the $R$ most significant singular values should be considered.
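In practice, this feature-space extraction reduces to a truncated SVD. The sketch below (ours) assumes the subspace dimension R is given, whereas the algorithm infers it from the significance of the singular values.

```python
# Extracting the feature subspace of an EEG segment: the R most significant
# left singular vectors of Y_R, cf. (89).
import numpy as np

def feature_subspace(Y_R, R):
    # Y_R: channels x segment-samples
    U, s, Vt = np.linalg.svd(Y_R, full_matrices=False)
    return U[:, :R]                  # orthonormal basis for the estimated feature space

# Example: rank-R structure plus low-level noise
rng = np.random.default_rng(2)
C, T, R = 64, 40, 2
Y = rng.standard_normal((C, R)) @ rng.standard_normal((R, T)) \
    + 0.05 * rng.standard_normal((C, T))
basis = feature_subspace(Y, R)       # C x R orthonormal basis
```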