Functional Brain Signal Processing: EEG & fMRI
Lesson 2
Kaushik Majumdar
Indian Statistical Institute Bangalore Center
M.Tech. (CS), Semester III, Course B50
EEG Processing
Preprocessing
Pattern recognition
EEG Artifacts
Benbadis and Rielo, 2008: http://emedicine.medscape.com/article/1140247-overview
Eye Blink Artifact: Electrooculogram (EOG)
Benbadis and Rielo, 2008: http://emedicine.medscape.com/article/1140247-overview
Matrix Representation of Multi-Channel EEG
M is an m × n matrix whose m rows represent the m EEG channels and whose n columns represent the n time points.
Often during EEG processing we need to find a matrix W such that WM is the processed signal.
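As a minimal illustration (the channel count and data below are hypothetical; an average-reference matrix is just one common choice of W), multiplying on the left by a spatial filter W recombines channels while leaving the time base intact:

import numpy as np

# Hypothetical example: m = 4 channels, n = 1000 time points of EEG.
m, n = 4, 1000
M = np.random.randn(m, n)            # stands in for a recorded EEG matrix

# One common choice of spatial filter W: the average-reference matrix,
# which subtracts the mean of all channels from each channel.
W = np.eye(m) - np.ones((m, m)) / m
processed = W @ M                    # WM: still m channels x n time points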
EOG Identification by Principal Component Analysis (PCA)
Majumdar, under preparation, 2013
PCA Algorithm
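The algorithm steps on the original slides were figures; the following is a minimal numpy sketch of standard covariance-eigendecomposition PCA, using the M and W notation of the matrix-representation slide (the function name and the parameter k are mine):

import numpy as np

def pca_components(M, k):
    # Remove each channel's mean (rows of M are channels).
    Mc = M - M.mean(axis=1, keepdims=True)
    C = np.cov(Mc)                        # m x m channel covariance matrix
    evals, evecs = np.linalg.eigh(C)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1]       # reorder by descending variance
    W = evecs[:, order[:k]].T             # k x m projection matrix
    return W @ Mc                         # k x n principal-component signals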
PCA
Geometrically, PCA is a rotation of the data onto the principal axes, followed by a stretching or contracting along those axes.
Performance of PCA in EOG Removal
Wallstrom et al., Int. J. Psychophysiol., 53: 105-119, 2004
Independent Component Analysis (ICA)
In PCA data components are assumed to be mutually orthogonal, which is too restrictive.
[Figure: original data sets and their PCA components.]
ICA (cont.)
PCA will give poor results if the covariance matrix has eigenvalues close to each other, since the corresponding principal directions are then poorly determined.
ICA as Blind Source Separation (BSS)
Four musicians (sources S1, S2, S3, S4) are playing in a room. From the outside, only the music can be heard, through four microphones (sensors 1, 2, 3, 4); no one can be seen. How can the music heard from outside be decomposed into the four sources?
[Figure: four sources S1–S4 inside a room, with four microphones 1–4 outside.]
Mathematical Formulation
$x = As + n$, where A is the mixing matrix, x is the sensor vector, s is the source vector, and n is noise, which is to be eliminated by filtering.
Mathematical Formulation (cont.)
Given $x$, find $W$ such that $\hat{s} = Wx$.
Any technique for estimating $W$ (and hence $s$) is called an ICA technique, or a BSS technique in general.
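A toy sketch of this formulation (the sources, mixing matrix, and sample count are made up; the noise term is omitted, and in practice A is unknown, so W cannot simply be taken as its inverse):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000
s = rng.uniform(-1, 1, size=(2, n))   # two made-up non-Gaussian sources
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])            # mixing matrix (unknown in practice)
x = A @ s                             # sensor signals (noise term omitted)

# BSS/ICA seeks W with W @ x ~= s without knowing A; with A known,
# the ideal unmixing matrix is simply its inverse:
W = np.linalg.inv(A)
s_hat = W @ x                         # recovers the sources exactly here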
ICA Algorithm: FastICA
Whitening:
Normalize (make the mean zero).
Make the variance one, i.e., $E\{xx^T\} = I$, where E denotes expectation, x is the (zero-mean) vector of signals, and I is the identity matrix.
Hyvarinen and Oja, Neural Networks, 13: 411-430, 2000
FastICA (cont.)
If the covariance matrix has the eigendecomposition $E\{xx^T\} = BDB^T$, where B is an orthogonal matrix of eigenvectors and D is the diagonal matrix of its eigenvalues, then $\tilde{x} = BD^{-1/2}B^T x$ will satisfy $E\{\tilde{x}\tilde{x}^T\} = I$.
Whitening complete.
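A minimal numpy sketch of this whitening step (the function name is mine; it assumes a full-rank covariance matrix, so all eigenvalues are strictly positive):

import numpy as np

def whiten(X):
    # Zero-mean each channel (row) of X.
    Xc = X - X.mean(axis=1, keepdims=True)
    C = np.cov(Xc)                          # covariance matrix E{x x^T}
    d, B = np.linalg.eigh(C)                # C = B diag(d) B^T, B orthogonal
    V = B @ np.diag(d ** -0.5) @ B.T        # whitening matrix B D^{-1/2} B^T
    Xw = V @ Xc                             # now E{x~ x~^T} ~= I
    return Xw, V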
Non-Gaussianity
ICA is appropriate only when probability distribution of the data set is non-Gaussian.
A Gaussian distribution is of the form $p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$.
Entropy of Gaussian Variable
A Gaussian variable has the largest entropy among a class of random variables with equal variance (for a proof see Cover & Thomas, Elements of Information Theory). Here we will give an intuitive argument.
Entropy of a Random Variable X
$H(X) = -\int p(X) \log p(X)\, dX$
[Figure: two panels. Left: the deterministic signal $X = \sin(10t)$ over $0 \le t \le 7$ carries less (zero) information. Right: a random signal $X = \mathrm{random}(t)$ carries more information.]
Gaussian Random Variable Has Highest Entropy: Intuitive Proof
By the Central Limit Theorem (CLT), the mean of a class of random variables (the class being signified by a uniform variance) tends to a normal distribution as the number of members in the class tends to infinity (i.e., becomes very large).
Infinitely many observations hold an infinite, and therefore maximal, amount of information.
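A quick simulation of the CLT step (the sample sizes and the use of excess kurtosis as a Gaussianity check are my choices):

import numpy as np

rng = np.random.default_rng(0)
# Means of k uniform variables: as k grows, the distribution of the
# mean approaches a Gaussian (excess kurtosis of a Gaussian is 0).
for k in (1, 2, 30):
    means = rng.uniform(0, 1, size=(100_000, k)).mean(axis=1)
    z = (means - means.mean()) / means.std()
    print(k, (z ** 4).mean() - 3)     # shrinks toward 0 as k increases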
Intuitive Proof (cont.)
Therefore a random variable with a normal distribution has the highest information content,
so it has the highest entropy.
If each variable in a class of random variables admits only a finite number of nonzero values, the one with the uniform distribution has the highest entropy.
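The maximum-entropy claim can be checked numerically; below is a crude histogram-based estimate of differential entropy comparing a Gaussian and a uniform variable of equal (unit) variance (the estimator and bin count are my choices):

import numpy as np

def entropy_hist(samples, bins=100):
    # Crude differential-entropy estimate: -sum p log(p) * bin_width.
    p, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

rng = np.random.default_rng(0)
n = 1_000_000
gauss = rng.normal(0.0, 1.0, n)                    # variance 1
unif = rng.uniform(-np.sqrt(3), np.sqrt(3), n)     # also variance 1
print(entropy_hist(gauss), entropy_hist(unif))     # Gaussian comes out larger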
Non-Gaussianity as Negentropy
$J(y) = H(y_{\mathrm{gauss}}) - H(y)$, where H is entropy, J is negentropy, and $y_{\mathrm{gauss}}$ is a Gaussian variable with the same variance as y. J is to be maximized. When J is maximal, y is reduced to a single independent component. This can be shown by computing the kurtosis of a component and of a sum of components including that component (see Hyvarinen & Oja, 2000, p. 7).
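Since entropy itself is hard to estimate from data, Hyvarinen & Oja (2000) use moment-based approximations of negentropy. The classical one, $J(y) \approx \frac{1}{12}E\{y^3\}^2 + \frac{1}{48}\,\mathrm{kurt}(y)^2$ for standardized y, is sketched below (the function name is mine):

import numpy as np

def negentropy_approx(y):
    # Moment-based approximation for standardized y (Hyvarinen & Oja, 2000):
    # J(y) ~ (1/12) E{y^3}^2 + (1/48) kurt(y)^2.
    y = (y - y.mean()) / y.std()
    skew_term = (y ** 3).mean() ** 2 / 12.0
    kurt = (y ** 4).mean() - 3.0          # excess kurtosis
    return skew_term + kurt ** 2 / 48.0

rng = np.random.default_rng(0)
print(negentropy_approx(rng.normal(size=100_000)))    # ~0 for a Gaussian
print(negentropy_approx(rng.uniform(size=100_000)))   # > 0: non-Gaussian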
Steps of FastICA after Whitening
Choose an initial random weight vector $w$. Iterate $w^+ = E\{x\, g(w^T x)\} - E\{g'(w^T x)\}\, w$, then renormalize $w = w^+/\|w^+\|$, until convergence. The nonlinearity g is in the form of either of the two: $g_1(u) = \tanh(a_1 u)$, $1 \le a_1 \le 2$, or $g_2(u) = u\, \exp(-u^2/2)$.
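A minimal one-unit FastICA iteration in numpy, following the fixed-point update above with $g = \tanh$ (the function name and convergence test are mine; X is assumed already whitened):

import numpy as np

def fastica_one_unit(X, max_iter=200, tol=1e-6):
    # X: whitened signals, shape (channels, samples).
    g = np.tanh                                  # nonlinearity g1 with a1 = 1
    g_prime = lambda u: 1.0 - np.tanh(u) ** 2    # its derivative
    w = np.random.randn(X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(max_iter):
        u = w @ X                                # projections w^T x
        w_new = (X * g(u)).mean(axis=1) - g_prime(u).mean() * w
        w_new /= np.linalg.norm(w_new)
        if abs(abs(w_new @ w) - 1.0) < tol:      # converged (up to sign)
            return w_new
        w = w_new
    return w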
Exercise
ICA is implemented in EEGLAB (the runica function, which implements Infomax ICA; FastICA is available as a plugin). Remove artifacts from sample EEG data using the ICA implementation in EEGLAB.
Concept of Independence in PCA and ICA
In PCA independence means orthogonality i.e., pairwise dot product is zero.
In ICA independence means statistical independence. Let x, y be random variables, let p(x) be the probability density function of x, and let p(x,y) be the joint probability density function of (x,y). If p(x,y) = p(x)·p(y) holds, we call x and y statistically independent.
Independence (cont.)
If nonzero vectors v1 and v2 are orthogonal, they are linearly independent. Suppose not; then a1v1 + a2v2 = 0 with a1, a2 not both zero. Taking the dot product with v1 gives a1(v1·v1) + a2(v2·v1) = a1(v1·v1) = 0, so a1 = 0; similarly a2 = 0, a contradiction.
If v1 = c·v2 then both of them must have the same probability distribution, i.e., p(v1,v2) = p(v1) = p(v2). If v1 and v2 are linearly independent, p(v1,v2) = p(v1)·p(v2) may or may not hold.
If p(v1,v2) = p(v1)·p(v2) holds, then v1 and v2 are linearly independent.
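The distinction can be seen numerically: below, y is a deterministic function of x, so the pair is statistically dependent, yet their correlation (the PCA notion of independence) is essentially zero (the data and bin counts are made up for illustration):

import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 100_000)
y = x ** 2                        # y is fully determined by x: dependent

# Uncorrelated ("orthogonal" in the PCA sense): E{xy} ~ 0 ...
print(np.mean(x * y))
# ... yet p(x, y) != p(x) p(y): compare the joint histogram with the
# product of the marginal histograms.
H2, xe, ye = np.histogram2d(x, y, bins=20, density=True)
px, _ = np.histogram(x, bins=xe, density=True)
py, _ = np.histogram(y, bins=ye, density=True)
print(np.abs(H2 - np.outer(px, py)).max())   # far from zero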
Conditions for ICA Applicability
Sources are statistically independent.
Propagation delays in the mixing medium are negligible (delays may affect sources at different locations differently, thereby corrupting their temporal structures).
Sources are time varying.
Number of sources = number of sensors.
References
Benbadis and Rielo, EEG artifacts, eMedicine, 2008. Available online at http://emedicine.medscape.com/article/1140247-overview.
Hyvarinen and Oja, Independent component analysis: algorithms and applications, Neural Networks, vol. 13, pp. 411-430, 2000.
Majumdar, A Brief Survey of Quantitative EEG Analysis, Chapter 2.
Wallstrom et al., Int. J. Psychophysiol., vol. 53, pp. 105-119, 2004.
THANK YOU
This lecture is available at http://www.isibang.ac.in/~kaushik