classifying event-related desynchronization in eeg, ecog, and meg signals kim sang-hyuk

Classifying Event-Related Desynchronization in EEG, ECoG, and MEG Signals

Kim Sang-Hyuk

Bioimaging

Contents

• Introduction

• Experimental setup and procedure

• Preanalysis

• Data processing

• Generalization error estimation


Introduction

• Several different technologies exist for measuring brain activity

– They have their own advantages and limitations

– Spatial and temporal resolution

– Cost, portability and risk to the user

• Comparative studies are required in order to guide

• Motor-imagery BCI experiments based on electroencephalography (EEG), electrocorticography (ECoG), and magnetoencephalography (MEG)

• A simple binary synchronous (trial-based) paradigm

• Present quantitative results focusing on

– The effect of the number of trials

– The effect of spatial filtering


Introduction

• EEG

– Electrical signals are measured by passive electrodes

– Very high temporal resolution

– Low cost, low risk, and portable

– Limited spatial resolution

• ECoG

– Electrical signals obtained from an array of electrodes beneath the skull

– High SNR

– A better response at higher frequencies

– Invasive

• MEG

– Measures the tiny magnetic field fluctuations induced by the electrical activity of cerebral neurons

– Expensive and nonportable


Experimental Setup and Procedure

• EEG

– 8 untrained right-handed male subjects

– 39 silver chloride electrodes

– Sampling frequency: 256 Hz

– The subjects were seated in an armchair at 1 m distance in front of a computer screen

Figure: Positions of electrodes (legend: electrodes used for data acquisition; reference)


Experimental Setup and Procedure

• Each trial started with a blank screen

• A small fixation cross was displayed in the center of the screen from second 2 to second 9

• At 2 s, a short warning tone (beep) was played

• At 3 s, the fixation cross was overlaid with an arrow at the center of the monitor for 1.5 s

– The arrow pointed either to the left or to the right

• To avoid event-related signals in later processing stages, only data from seconds 4 to 9 of each trial were considered
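The analysis window can be sketched as a simple slice. This is a minimal illustration, assuming each trial is stored as a NumPy array of shape (channels, samples) with time zero at trial onset; the function name `crop_trial` and the random stand-in data are hypothetical. At 256 Hz, seconds 4 to 9 correspond to 1280 samples per channel.

```python
import numpy as np

FS = 256                     # sampling frequency in Hz (from the setup slide)
T_START, T_END = 4.0, 9.0    # analysis window in seconds (from the trial timing)

def crop_trial(trial, fs=FS, t_start=T_START, t_end=T_END):
    """Return the samples of one trial between t_start and t_end.

    trial: array of shape (n_channels, n_samples), time zero at trial onset.
    """
    first = int(round(t_start * fs))
    last = int(round(t_end * fs))
    return trial[:, first:last]

# Synthetic stand-in for one recorded trial: 39 channels, 9 seconds of data.
rng = np.random.default_rng(0)
trial = rng.standard_normal((39, 9 * FS))
window = crop_trial(trial)
print(window.shape)  # (39, 1280)
```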


Preanalysis

• Goal: identify and exclude subjects that did not show significant μ-activity at all

• The analysis was restricted to the 17 EEG channels located over or close to the motor cortex

– A μ-band feature was calculated using the Welch method (short-time Fourier transform) for each subject

• This feature extraction resulted in one parameter per trial and channel

• The eight data sets consisting of the Welch features were classified with linear support vector machines, including individual model selection for each subject

• Generalization errors were estimated by 10-fold cross-validation (CV)

• For three subjects, the preanalysis showed very poor error rates close to chance level; their data sets were excluded from further analysis


Preanalysis

Short Time Fourier Transform (STFT)

• A Fourier-related transform used to examine the frequency and phase content of local sections of a signal over time

• Discrete-time STFT

$$X(m, \omega) = \sum_{n=-\infty}^{\infty} x[n] \, w[n - m] \, e^{-j \omega n}$$

– $w[n]$ is the window function

– The window slides along the time axis
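The sliding-window idea can be sketched in a few lines of NumPy. This is an illustration only: the Hann window, the window length of 128 samples, the 50% overlap, and the two-tone test signal are all assumptions chosen for the example, and `stft` is a hypothetical helper name.

```python
import numpy as np

def stft(x, win_len, hop):
    """Discrete-time STFT: FFT of successive windowed sections of x.

    Returns an array of shape (n_frames, win_len // 2 + 1).
    """
    w = np.hanning(win_len)                        # window function w[n] (assumed Hann)
    starts = range(0, len(x) - win_len + 1, hop)   # window positions along the time axis
    frames = np.array([x[s:s + win_len] * w for s in starts])
    return np.fft.rfft(frames, axis=1)

fs = 256
t = np.arange(2 * fs) / fs
# Test signal: 10 Hz during the first second, 30 Hz during the second one.
x = np.where(t < 1.0, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 30 * t))

S = stft(x, win_len=128, hop=64)                   # hop of half a window = 50% overlap
freqs = np.fft.rfftfreq(128, d=1 / fs)
peak_first = freqs[np.argmax(np.abs(S[0]))]        # dominant frequency in the first frame
peak_last = freqs[np.argmax(np.abs(S[-1]))]        # dominant frequency in the last frame
print(S.shape, peak_first, peak_last)
```

The first and last frames peak at different frequencies, which is the local frequency content that a single Fourier transform of the whole signal would blur together.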

Examples of window overlap


Preanalysis


Examples of STFT



Preanalysis

• Each trial is split into 5 segments with 50% overlap

• The spectra of the 5 segments are averaged

• The result is a vector of log amplitudes at different frequencies for each sensor

Figure: a trial is split into 5 segments, and the 5 spectra are averaged into a vector
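The segment-and-average step can be sketched as follows. This is a minimal illustration under stated assumptions: a Hann window (the slides do not name one), power spectra averaged before taking the log amplitude, and a synthetic 10 Hz test signal. The function name `welch_log_amplitudes` is hypothetical.

```python
import numpy as np

def welch_log_amplitudes(x, fs, n_segments=5, overlap=0.5):
    """Welch-style estimate: average the spectra of overlapping segments.

    Returns (freqs, log_amp), one log amplitude per frequency bin.
    """
    # Segment length such that n_segments windows with this overlap span x.
    seg_len = int(len(x) / (1 + (n_segments - 1) * (1 - overlap)))
    hop = int(seg_len * (1 - overlap))
    w = np.hanning(seg_len)                      # assumed window shape
    power = np.mean(
        [np.abs(np.fft.rfft(x[i * hop:i * hop + seg_len] * w)) ** 2
         for i in range(n_segments)],
        axis=0,
    )
    freqs = np.fft.rfftfreq(seg_len, d=1 / fs)
    return freqs, 0.5 * np.log(power + 1e-20)    # log amplitude = half the log power

fs = 256
t = np.arange(1280) / fs                         # one 5 s trial window
rng = np.random.default_rng(0)
x = np.sin(2 * np.pi * 10 * t) + 0.1 * rng.standard_normal(t.size)

freqs, log_amp = welch_log_amplitudes(x, fs)
peak_freq = freqs[np.argmax(log_amp)]
print(log_amp.shape, peak_freq)
```

With 1280 samples, 5 segments, and 50% overlap, each segment is about a third of the trial long, so the frequency resolution is roughly 0.6 Hz.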


Data Preprocessing

Autoregressive (AR) Model

• The AR(p) model is defined as

$$x_t = \sum_{i=1}^{p} a_i x_{t-i} + \varepsilon_t$$

– where $a_1, \dots, a_p$ are the parameters of the model and $\varepsilon_t$ is a white-noise term

– $p$ is the model order

• The output is modeled as a linear combination of the $p$ past values of the output

• For the remaining five subjects, the recorded 5 s window of each trial resulted in a time series of 1280 sample points per channel

• An AR model of order 3 was fitted to the time series of all 39 channels using forward-backward linear prediction

– The three resulting coefficients per channel and trial formed the new representation of the data

– The extraction of the features did not explicitly incorporate prior knowledge

– They are not directly linked to the μ-rhythm
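One way to sketch forward-backward linear prediction is as a single least-squares problem that stacks the forward and the backward prediction equations. This is an illustrative sketch, not necessarily the estimator used in the study; the function name `ar_forward_backward`, the synthetic AR(3) coefficients, and the random stand-in trial are assumptions.

```python
import numpy as np

def ar_forward_backward(x, order=3):
    """Least-squares AR fit minimizing forward plus backward prediction error."""
    x = np.asarray(x, dtype=float)
    n, p = len(x), order
    # Forward prediction: x[t] from x[t-1], ..., x[t-p], for t = p .. n-1.
    fwd = np.column_stack([x[p - k:n - k] for k in range(1, p + 1)])
    # Backward prediction: x[t] from x[t+1], ..., x[t+p], for t = 0 .. n-p-1.
    bwd = np.column_stack([x[k:n - p + k] for k in range(1, p + 1)])
    A = np.vstack([fwd, bwd])
    b = np.concatenate([x[p:], x[:n - p]])
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coeffs

rng = np.random.default_rng(0)

# Sanity check on a synthetic AR(3) series with known (made-up) coefficients.
true_a = np.array([0.5, -0.3, 0.1])
noise = rng.standard_normal(1280)
x = np.zeros(1280)
for t in range(3, 1280):
    x[t] = true_a @ x[t - 3:t][::-1] + noise[t]
est_a = ar_forward_backward(x, order=3)

# Feature vector for one trial: 3 coefficients for each of 39 channels.
trial = rng.standard_normal((39, 1280))
features = np.concatenate([ar_forward_backward(ch) for ch in trial])
print(est_a, features.shape)
```

With 39 channels and order 3, each trial is represented by 117 numbers.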


Support Vector Machine

Linear Support Vector Machine

• Choose a decision boundary between classes such that the margin is maximized

– Margin: the distance in feature space between the boundary and the nearest data points (support vectors)

Linearly separable case




• The equation of the hyperplane

$$g(x) = w^T x + w_0 = 0$$

– $w$: weight vector normal to the hyperplane

– $w_0$: threshold

• The distance of a point from the hyperplane

$$d = \frac{|g(x)|}{\|w\|}$$
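As a small worked example of the distance formula, the sketch below evaluates $g(x)$ and the point-to-hyperplane distance for made-up values of $w$, $w_0$, and $x$ (all three are assumptions chosen so the arithmetic is easy to follow).

```python
import numpy as np

def g(x, w, w0):
    """Value of the linear discriminant g(x) = w^T x + w0."""
    return w @ x + w0

def distance_to_hyperplane(x, w, w0):
    """Euclidean distance of x from the hyperplane g(x) = 0."""
    return abs(g(x, w, w0)) / np.linalg.norm(w)

w = np.array([3.0, 4.0])    # normal vector, norm 5
w0 = -5.0                   # threshold
x = np.array([3.0, 4.0])

d = distance_to_hyperplane(x, w, w0)   # |9 + 16 - 5| / 5
print(d)  # 4.0
```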




• Scale $w$, $w_0$ so that the value of $g(x)$, at the support vectors, is equal to 1 for $S_1$ and equal to -1 for $S_2$

– Margin: $\frac{2}{\|w\|}$

– $w^T x + w_0 \ge 1, \quad \forall x \in S_1$

– $w^T x + w_0 \le -1, \quad \forall x \in S_2$

• Compute the parameters $w$, $w_0$ of the hyperplane so as to:

– Minimize $J(w) = \frac{1}{2} \|w\|^2$

– Subject to $y_i (w^T x_i + w_0) \ge 1, \quad i = 1, 2, \dots, N$, where $y_i$ is the corresponding class indicator (+1 for $S_1$, -1 for $S_2$)




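• The Karush-Kuhn-Tucker (KKT) conditions

$$\frac{\partial}{\partial w} L(w, w_0, \lambda) = 0$$

$$\frac{\partial}{\partial w_0} L(w, w_0, \lambda) = 0$$

$$\lambda_i \ge 0, \quad i = 1, 2, \dots, N$$

$$\lambda_i \left[ y_i (w^T x_i + w_0) - 1 \right] = 0, \quad i = 1, 2, \dots, N$$

– $\lambda$ is the vector of the Lagrange multipliers

– $L(w, w_0, \lambda)$ is the Lagrangian function defined as

$$L(w, w_0, \lambda) = \frac{1}{2} w^T w - \sum_{i=1}^{N} \lambda_i \left[ y_i (w^T x_i + w_0) - 1 \right]$$

• The final results are

$$w = \sum_{i=1}^{N} \lambda_i y_i x_i$$

$$\sum_{i=1}^{N} \lambda_i y_i = 0$$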
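As a worked step that the slides do not show explicitly, substituting the two stationarity results back into the Lagrangian removes $w$ and $w_0$ and leaves a problem in the multipliers alone (the standard dual form, written here with the same symbols):

```latex
\begin{aligned}
L(w, w_0, \lambda)
  &= \frac{1}{2} w^T w
     - \sum_{i=1}^{N} \lambda_i y_i \, w^T x_i
     - w_0 \sum_{i=1}^{N} \lambda_i y_i
     + \sum_{i=1}^{N} \lambda_i \\
  &= \frac{1}{2} w^T w - w^T w - 0 + \sum_{i=1}^{N} \lambda_i \\
  &= \sum_{i=1}^{N} \lambda_i
     - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
       \lambda_i \lambda_j y_i y_j \, x_i^T x_j
\end{aligned}
```

This expression is maximized over $\lambda$ subject to $\sum_{i=1}^{N} \lambda_i y_i = 0$ and $\lambda_i \ge 0$. By the complementary slackness condition, only points with $y_i (w^T x_i + w_0) = 1$ can have $\lambda_i > 0$; these are the support vectors.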



Soft Margin Support Vector Machine

• When the classes are not separable, a soft margin support vector machine can be used

• The training feature vectors fall into three categories

– Vectors that fall outside the band and are correctly classified

– Vectors that fall inside the band and are correctly classified: $0 \le y_i (w^T x_i + w_0) < 1$

– Vectors that are misclassified: $y_i (w^T x_i + w_0) < 0$




• All three cases can be treated under a single type of constraint

$$y_i (w^T x_i + w_0) \ge 1 - \xi_i$$

– The first category of data: $\xi_i = 0$

– The second: $0 < \xi_i \le 1$

– The third: $\xi_i > 1$

• The goal is to make the margin as large as possible while keeping the number of points with $\xi_i > 0$ as small as possible

• Cost function

$$J(w, w_0, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} I(\xi_i)$$

– where $\xi$ is the vector of the parameters $\xi_i$ and

$$I(\xi_i) = \begin{cases} 1, & \xi_i > 0 \\ 0, & \xi_i = 0 \end{cases}$$




• The parameter $C$ is a positive constant that controls the relative influence of the two competing terms

• Optimization of the cost function is difficult because $I(\cdot)$ is discontinuous

– A closely related cost function is used instead

– Minimize $J(w, w_0, \xi) = \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i$

– Subject to $y_i [w^T x_i + w_0] \ge 1 - \xi_i, \quad i = 1, 2, \dots, N$ and $\xi_i \ge 0, \quad i = 1, 2, \dots, N$

• Depending on $C$, the optimal margin will widen and more points will become support vectors

– Finding a good value for $C$ is part of the model selection procedure
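The relaxed problem can be sketched numerically. At the optimum each slack equals $\xi_i = \max(0, 1 - y_i (w^T x_i + w_0))$, so the constrained problem is equivalent to minimizing $\frac{1}{2}\|w\|^2 + C \sum_i \max(0, 1 - y_i (w^T x_i + w_0))$ without constraints. The sketch below uses plain subgradient descent on that form rather than a quadratic programming solver, so it is not the solver used in the study; the function name, learning rate, epoch count, and the two synthetic Gaussian classes are all assumptions.

```python
import numpy as np

def train_soft_margin_svm(X, y, C=1.0, lr=0.001, epochs=2000):
    """Subgradient descent on 0.5*||w||^2 + C * sum(max(0, 1 - y*(Xw + w0)))."""
    w = np.zeros(X.shape[1])
    w0 = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + w0)
        active = margins < 1.0                 # points with slack xi_i > 0
        grad_w = w - C * (y[active][:, None] * X[active]).sum(axis=0)
        grad_w0 = -C * y[active].sum()
        w -= lr * grad_w
        w0 -= lr * grad_w0
    return w, w0

# Two synthetic classes with labels +1 and -1 (illustrative data only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(2.0, 1.0, (40, 2)), rng.normal(-2.0, 1.0, (40, 2))])
y = np.concatenate([np.ones(40), -np.ones(40)])

w, w0 = train_soft_margin_svm(X, y, C=1.0)
xi = np.maximum(0.0, 1.0 - y * (X @ w + w0))   # slack variables at the solution
accuracy = np.mean(np.sign(X @ w + w0) == y)
print(w, w0, accuracy, int((xi > 0).sum()))
```

Rerunning with a smaller `C` penalizes slack less, which typically shrinks $\|w\|$ (a wider margin) and leaves more points with $\xi_i > 0$.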


Generalization Error Estimation

K-Fold Cross Validation

• A statistical method for validating a predictive model

• The whole data set is separated into k subsets (folds) of equal size

• In each round, the k subsets are split into a training set and a test set

– k-1 subsets are used for training the classifier

– 1 subset is used for validation

• Model training and evaluation are repeated k times, so that each of the k subsets is used once for validation

An example of 5-fold cross-validation
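The procedure can be sketched as a small loop. In this illustration a nearest-class-mean classifier stands in for the SVM to keep the example short; that classifier, the helper names, and the synthetic data are assumptions, not part of the study.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(X, y, k, fit, predict):
    """Return the mean test error over k train/test rounds."""
    folds = k_fold_indices(len(y), k)
    errors = []
    for i, test_idx in enumerate(folds):
        # The other k-1 folds form the training set for this round.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        errors.append(np.mean(predict(model, X[test_idx]) != y[test_idx]))
    return float(np.mean(errors))

# Stand-in classifier (hypothetical): assign each point to the nearest class mean.
def fit_nearest_mean(X, y):
    return {label: X[y == label].mean(axis=0) for label in (-1, 1)}

def predict_nearest_mean(model, X):
    d_neg = np.linalg.norm(X - model[-1], axis=1)
    d_pos = np.linalg.norm(X - model[1], axis=1)
    return np.where(d_pos < d_neg, 1, -1)

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (50, 2)), rng.normal(-2.0, 1.0, (50, 2))])
y = np.concatenate([np.ones(50, dtype=int), -np.ones(50, dtype=int)])

folds = k_fold_indices(len(y), 10)
cv_error = cross_validate(X, y, 10, fit_nearest_mean, predict_nearest_mean)
print(cv_error)
```

Every sample appears in exactly one test fold, so the averaged error uses each data point once for validation and k-1 times for training.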


Contents of Next Lecture

• Feature Selection Method

– Fisher criterion

– Zero-norm optimization

– Recursive feature elimination (RFE)

• Results in EEG

• Procedure and results in ECoG

• Procedure and results in MEG

• Overview of results in EEG, ECoG and MEG
