
International Journal of Mechanical & Mechatronics Engineering IJMME-IJENS Vol:19 No:01 132

I J E N SIJENS © February 2019 IJENS -IJMME-6363-011920

Determining the optimal feature for two classes Motor-Imagery Brain-Computer Interface (L/R-MI-BCI) systems in different binary classifiers

Nibras Abo Alzahab1, Hassan Alimam2, MHD Shafik Alnahhas3, Ali Alarja4 and Zuheir Marmar5

1. Biomedical Engineer, Biomedical Engineering Department, Faculty of Mechanical and Electrical Engineering, Damascus University, [email protected]

2. Teaching Assistant, Mechanical Design Engineering Department, Faculty of Mechanical and Electrical Engineering, Damascus University, [email protected], [email protected]

3. Undergraduate Final Year Student, Biomedical Engineering Department, Faculty of Mechanical and Electrical Engineering, Damascus University, [email protected]

4. Biomedical Engineer, Biomedical Engineering Department, Faculty of Mechanical and Electrical Engineering, Damascus University, [email protected]

5. Professor, Biomedical Engineering Department, Faculty of Mechanical and Electrical Engineering, Damascus University, [email protected]

Abstract--Artificial intelligence is becoming increasingly important in healthcare. One important application is the Brain-Computer Interface (BCI), which has advanced considerably in improving the quality of life of patients who suffer from paralysis. Motor-Imagery BCI (MI-BCI) is mostly used to control robotic and mechatronic systems, for example robotic arms, orthoses, prostheses and exoskeletons. This research evaluates the effect of different features on the classification of EEG signals in MI-BCI systems. The EEG dataset was acquired from five healthy subjects performing motor imagery trials. Five feature groups were extracted from the EEG signals of the five subjects: Power Spectral Density (PSD), Amplitude Mean (AM), Standard Deviation (STD), Shannon Entropy (SE) and Differential Entropy (DE). The feature groups were classified using five classification techniques: ANN, Decision tree, LDA, SVM and KNN. The influence of the feature groups on classification performance was compared separately according to three classifier criteria (Accuracy, Precision and MCC). A one-way ANOVA test was used to compare the influence of the feature groups on classification performance. For classification accuracy and precision, significant differences between pairs of features were obtained with the SVM and ANN classifiers, which is supported by the experimental and statistical data. The results show that the PSD feature describes the EEG signal well in the MI-BCI field and is an effective feature for binary classifiers. Additionally, DE is a promising feature for the MI-BCI field. The results of this study will be used for developing bio-robots and bio-mechanical and bio-mechatronic systems in ACIA Lab.

Index Term-- Electroencephalography (EEG), g.tec, Artificial Intelligence (AI), Motor Imagery Brain Computer Interface (MI-BCI), Feature extraction, Binary-Classification, MATLAB, SPSS, One-way ANOVA, Matthews Correlation Coefficient (MCC).

1 - INTRODUCTION

Artificial intelligence (AI) is becoming more and more important in every aspect of life, for example in industrial [1] and medical applications. One of the most promising medical applications of AI is the Brain-Computer Interface (BCI) [2]. A BCI system is a communication system that relies on brain activity alone, in other words, without depending on peripheral muscular activity. This definition was shaped at the first international meeting in 2000, which was sponsored by the National Center for Medical Rehabilitation Research of the National Institute of Child Health and Human Development of the National Institutes of Health [3]. Many efforts have contributed to developing BCI systems since then. Mason and Birch in 2003 proposed a general framework for Brain-Computer Interface design [4]. The proposed model consists of nine components, namely: the user, electrodes, amplification, feature extraction, feature translation, control interface, device controller, device and operating environment. This framework was modified over time into the form shown in Solis-Escalante's thesis [5] and illustrated in Fig. 1. The upgraded scheme uses machine learning in two stages, feature extraction and classification, to translate the EEG signal from the world of signal processing into the world of control (feedback application).

F. Lotte et al. in 2007 published a review of the algorithms used to classify the Electroencephalography (EEG) signals in Brain-Computer Interface systems. The review covered five main categories of classification algorithms, namely: linear classifiers, nonlinear Bayesian classifiers, neural networks, nearest neighbor classifiers and combinations of classifiers. At the end of the review, they provided guidelines to help researchers choose the most suitable classification algorithm for their research [6].

Herman et al. in 2008 conducted a comparative analysis of a number of features. The comparison depends mainly on classifier accuracy (CA). Three classification algorithms were applied, namely: linear discriminant analysis (LDA), regularized Fisher discriminant (RFD) and support vector machine (SVM) with linear and nonlinear kernels. As a result, Power Spectral Density (PSD) was the most appropriate feature [7].

Zhao et al. in 2009 studied, in the field of Motor Imagery Brain-Computer Interface (MI-BCI), how the duration of event-related desynchronization/synchronization (ERD/ERS) could be modulated and used to control a car in a 3D virtual reality environment. They classified the MI-BCI signals into four classes: left, right, foot (speed up) and no order. The proposed approach was able to drive a virtual car smoothly [8].

Shan et al. in 2015 provided a method to select the optimal channel for classifying EEG signals in the field of Motor Imagery Brain-Computer Interface (MI-BCI). Each feature used in their research was the average of the Hilbert transform, which reflects the power, of each sub-band in the range from 5 Hz to 35 Hz over the time of the trial. The classifier used was a multiclass Support Vector Machine (SVM). The proposed method was shown to classify multi-class motor imagery patterns effectively [9].

Shang-Lin Wu in 2016 developed an innovative method based on a swarm-optimized fuzzy integral to classify EEG signals in an MI-BCI system into two classes, in order to control the movement of a robotic arm. The classification algorithms compared were single LDA, conventional methods, and fuzzy fusion with the Sugeno integral and the Choquet integral. The best classification accuracy was achieved using the Choquet integral with particle swarm optimization [10].

Alansari, M., Kamel, M. et al. in 2018 developed an enhancement method for BCI systems. The method depends on wavelet-based feature extraction using different sub-bands of the EEG signal. The research tested different wavelet families, lengths and numbers of decomposition levels. The results show that the proposed optimization process outperformed previous methods [11].

Datta, A., & Chatterjee, R. in 2019 presented three types of feature extraction methods, namely: Wavelet-based Energy and Entropy (EngEnt), Bandpower (BP), and Adaptive Autoregressive (AAR). Additionally, they combined the extracted features with various classification algorithms, namely: Support Vector Machine (SVM), K-Nearest Neighbors (K-NN) and Naive Bayes (NB). According to their results, EngEnt is the most suitable feature extraction method and K-NN is the most stable classification algorithm [12].

This paper is a step forward in our project conducted in ACIA Lab (Automatic Control and Industrial Automation Lab), Damascus University, towards building bio-controlled pneumatic bio-robotics [13]. The goal of the research is to study the response of various binary classifiers to different features extracted from EEG signals in the MI-BCI field. The results of this research will be used to control pneumatic bio-robotics, orthoses and prostheses in particular. The methodology of the paper consists of four main parts. Firstly, the data used in this research, which was provided by Dr. Cichocki's Lab [14], is described. Secondly, five features extracted from EEG signals are reviewed, namely: Power Spectral Density (PSD), Standard Deviation (STD), Amplitude Mean (AM), Shannon Entropy (SE) and Differential Entropy (DE). The review includes a theoretical description, mathematical equations and programming code for each feature. Thirdly, a short description of each classification algorithm is given. Finally, the experimental work and the classification performance criteria are presented. Afterwards, the statistical analysis of the classification performance criteria is described in detail and explained with a flow chart. In the end, the results of the statistical analysis are presented and discussed. Table VIII and Table IX show the abbreviations and symbols used, respectively. The overall architecture of the research is shown in Fig. 2.

Fig. 1. Scheme of the Brain-Computer Interface system (modified from [3])



Fig. 2 Overall architecture of the research.

2- METHODOLOGY

2.1 - Datasets Description:

Datasets are provided by Prof. Cichocki's Lab (Lab. for Advanced Brain Signal Processing), BSI, RIKEN, in collaboration with Prof. Liqing Zhang at Shanghai Jiao Tong University [14].

Datasets recording:

At the beginning of a trial, the subject was sitting in front of a blank computer screen. Two seconds after the trial started, a cue appeared on the screen as an arrow: a left arrow for imagining left-hand movement and a right arrow for imagining right-hand movement. The cue duration is given in the duration column of Table I. A g.tec amplifier (g.USBamp) was used to record the electroencephalography (EEG) at a sample rate of 256 Hz. The recorded EEG signals were bandpass filtered in the range from 2 Hz to 30 Hz, and a 50 Hz notch filter was also applied. Six electrodes were used: C3, Cp3, C4, Cp4, Cz and Cpz. Fig. 3 and Fig. 4 illustrate the time scheme of EEG signal recording and the placement of the six electrodes, respectively. Table I describes the datasets used in detail [14].

http://www.gtec.at/products/g.USBamp/gUSBamp.htm



Table I Datasets description

Dataset          Subject  Class  Channels  Duration (s)  Number of trials  Sample rate  Device
SubA_6chan_2LR   A        LH/RH  6         3             130               256 Hz       g.tec
SubC_6chan_2LR   C        LH/RH  6         3             071               256 Hz       g.tec
SubF_6chan_2LR   F        LH/RH  6         4             80                256 Hz       g.tec
SubG_6chan_2LR   G        LH/RH  6         4             120               256 Hz       g.tec
SubH_6chan_2LR   H        LH/RH  6         3             150               256 Hz       g.tec

Fig. 3. The time scheme of EEG signal recording [14].

2.2 Feature Extraction

We aimed to extract five features in order to determine the optimal feature to be applied in MI-BCI systems. The five features

are: Power Spectral Density (PSD), Standard Deviation (STD), Amplitude Mean (AM), Shannon Entropy (SE) and

Differential Entropy (DE).

2.2.1 Power Spectral Density:

The power spectral density (PSD) can be estimated using Welch's method [9], which is based on the Fourier transform. As shown in Fig. 5, Welch's method divides the signal into $K$ overlapping segments $X_1(j), \ldots, X_K(j)$; $j = 0, \ldots, L-1$, where each segment has length $L$. We take the finite Fourier transform of the sequences $X_1(j)W(j), \ldots, X_K(j)W(j)$, where $W(j)$ is a selected data window, shown in equation (1). Thus, $A_1, \ldots, A_K$ are the finite Fourier transforms of these sequences, given in equation (2):

$$W(j) = 1 - \left( \frac{j - \frac{L-1}{2}}{\frac{L+1}{2}} \right)^2$$ (1) [15]

$$A_k(n) = \frac{1}{L} \sum_{j=0}^{L-1} X_k(j) W(j) e^{-2\pi i j n / L}$$ (2) [15]

Where $i = \sqrt{-1}$.

The $K$ modified periodograms are obtained by equation (3):

Fig. 4. the electrodes placement [14].



$$I_k(f_n) = \frac{L}{U} \left| A_k(n) \right|^2; \quad k = 1, 2, \ldots, K$$ (3) [15]

Where

$$f_n = \frac{n}{L}; \quad n = 0, \ldots, \frac{L}{2}$$ (4) [15]

And

$$U = \frac{1}{L} \sum_{j=0}^{L-1} W^2(j)$$ (5) [15]

Finally, the average of the periodograms represents the estimated Power Spectral Density (PSD), equation (6):

$$\hat{P}(f_n) = \frac{1}{K} \sum_{k=1}^{K} I_k(f_n)$$ (6) [15]

Fig.5. The overlapped segments in Welch's method [6].

This feature was extracted using MATLAB code:

psd_welch(j,i)= sum(pwelch(EEGDATA(j,:,i)))/length(EEGDATA(j,:,i));

Where pwelch is the function that returns the PSD estimate of a discrete-time signal, here EEGDATA(j,:,i), using Welch's averaged, modified periodogram method. The estimate is summed over frequency and divided by the signal length, so psd_welch(j,i) is a single scalar value.
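The steps in equations (1) to (6) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the MATLAB pwelch implementation: the function names welch_window and welch_psd, the segment length, the overlap and the toy sinusoid are hypothetical choices made for the example, and the exponent uses the standard discrete Fourier factor 2π.

```python
import cmath
import math


def welch_window(L):
    """Parabolic data window W(j) of equation (1), j = 0, ..., L-1."""
    return [1.0 - ((j - (L - 1) / 2.0) / ((L + 1) / 2.0)) ** 2 for j in range(L)]


def welch_psd(x, L, overlap):
    """Average the modified periodograms of overlapping length-L segments.

    Returns P_hat(f_n) for n = 0, ..., L // 2, where f_n = n / L
    (cycles per sample), following equations (2) to (6).
    """
    if len(x) < L or overlap >= L:
        raise ValueError("need len(x) >= L and overlap < L")
    step = L - overlap
    w = welch_window(L)
    U = sum(v * v for v in w) / L                       # equation (5)
    starts = range(0, len(x) - L + 1, step)             # segment start indices
    psd = [0.0] * (L // 2 + 1)
    for s in starts:
        seg = x[s:s + L]
        for n in range(L // 2 + 1):
            # Finite Fourier transform of the windowed segment, equation (2)
            A = sum(seg[j] * w[j] * cmath.exp(-2j * math.pi * j * n / L)
                    for j in range(L)) / L
            psd[n] += (L / U) * abs(A) ** 2             # equation (3)
    K = len(starts)
    return [p / K for p in psd]                         # equation (6)


# Toy signal (hypothetical): a sinusoid with 8 cycles per 64-sample segment
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
psd = welch_psd(signal, L=64, overlap=32)
peak_bin = max(range(len(psd)), key=lambda n: psd[n])
```

For this toy input the largest value of the estimate falls in the frequency bin that matches the sinusoid, which is the behaviour the feature relies on when EEG rhythms change power during motor imagery.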

2.2.2 Amplitude Mean:

The signal's amplitude mean is a time-domain feature. It is simple and easy to extract, which makes it computationally efficient. However, since it carries only coarse information about the shape of the signal, it is difficult to differentiate between the signal of the right-hand movement and the signal of the left-hand movement [29].

Amplitude mean is the average voltage of the EEG signal, according to equation (7).

$$\bar{x} = \frac{\sum_{n=1}^{N} x(n)}{N}$$ (7) [18]


Mean(j,i)=mean(EEGDATA(j,:,i));

2.2.3 Standard Deviation:

Standard deviation represents a statistical description of the EEG signal in BCI systems [17]. It measures how far the samples of the signal spread from the amplitude mean value. Consequently, it can describe the signal more effectively than the amplitude mean does. In other words, standard deviation could be used as a feature with acceptable accuracy.

The standard deviation ($\sigma$) (STD) feature was extracted by applying the statistical equation of the STD, given in equation (8), over the EEG signal:

$$\sigma = \sqrt{\frac{\sum_{n=1}^{N} \left( x(n) - \bar{x} \right)^2}{N-1}}$$ (8) [18]




Std(j,i)=std(EEGDATA(j,:,i));
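The two time-domain features above can be sketched in plain Python as follows. The function names amplitude_mean and sample_std and the sample values are hypothetical and only serve the illustration. The standard deviation divides by N - 1, as in the STD equation above and as MATLAB's std does by default.

```python
import math


def amplitude_mean(x):
    """Average value of the samples, as in equation (7)."""
    return sum(x) / len(x)


def sample_std(x):
    """Sample standard deviation with the N - 1 denominator."""
    m = amplitude_mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / (len(x) - 1))


# Hypothetical samples standing in for one EEG channel of one trial
samples = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean_value = amplitude_mean(samples)
std_value = sample_std(samples)
```

Two signals with the same mean can have very different spreads, which is one way to read the statement that the standard deviation describes the signal better than the amplitude mean.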

2.2.4 Shannon Entropy:

Any signal can be decomposed into a set of functions (the signal's coefficients) by using the wavelet transform. Therefore, the Shannon Entropy (SE) feature was extracted from the wavelet transform. The wavelet transform uses families of functions generated from a basic function, called the mother wavelet, which is shifted and dilated to match the original signal [19, 20]. Equation (8) represents the continuous wavelet transform (CWT):

$$W(a, \tau) = \int_{-\infty}^{+\infty} x(t)\, \overline{\Psi_{a,\tau}(t)}\, dt$$ (8) [19]

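Where $a$ is the scaling factor, $\tau$ is the shifting factor and $\Psi_{a,\tau}$ is the scaled and shifted version of the mother wavelet $\Psi$, given in equation (9):

$$\Psi_{a,\tau}(t) = \frac{1}{\left| \sqrt{a} \right|} \Psi\left( \frac{t - \tau}{a} \right)$$ (9) [21]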

There are many wavelet families, such as the Daubechies, Symlets, Coiflets, Morlet, Mexican hat, and Meyer wavelets. Among these, the Symlets wavelet exhibits the highest compatibility with EEG signals [22]. Fig. 6 shows the wavelet families.

Fig.6. Wavelet Families (Modified from [23]).

After the signal was decomposed into a number of coefficient sets, the energy $E_i$ of each set was computed as the mean of its squared coefficients. Summing the energies of all sets gives the total energy $E_{tot}$. The ratio between the energy of each set $E_i$ and the total energy is the relative wavelet energy $P_i$ [19, 24], as shown in equation (10):

$$P_i = \frac{E_i}{E_{tot}}$$ (10) [24]

Three entropy measures are considered in this research, namely: Log energy entropy, Threshold entropy and Shannon entropy, given in equations (11), (12) and (13) respectively:

$$E_{log}(s) = \sum_i \log\left( P_i^2 \right)$$ (11) [24]

$$E_T(s_i) = \begin{cases} 1, & |P_i| > p \\ 0, & \text{elsewhere} \end{cases}$$ (12) [24]

$$E_{Sh}(s) = \sum_i P_i^2 \log\left( P_i^2 \right)$$ (13) [24]

Where $p$ is the threshold.

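Shannon entropy feature was extracted using MATLAB code:

EEG=EEGDATA(j,:,i);
% 5-level wavelet decomposition with the sym3 wavelet
[c,l] = wavedec(EEG,5,'sym3');
% default de-noising parameters for this signal
[thr,sorh,keepapp]=ddencmp('den','wv',EEG);
% de-noise by global thresholding of the wavelet coefficients
EEG2=wdencmp('gbl',c,l,'sym3',5,thr,sorh,keepapp);
% Shannon entropy of the de-noised signal
Eshannon(i,j)=wentropy(EEG2,'shannon');

Where wavedec, ddencmp and wdencmp perform a five-level wavelet decomposition and de-noising with the Symlet wavelet sym3 as the mother wavelet, and wentropy returns the Shannon entropy.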
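The relative wavelet energy and the three entropy expressions of equations (10) to (13) can be sketched in plain Python as follows. This is an illustration only: the coefficient sets are hypothetical numbers rather than the output of a real wavelet decomposition, the function names are invented for the example, the threshold entropy is taken as the count of terms for which equation (12) equals 1, and zero-valued terms are skipped in the logarithms. The Shannon expression is evaluated exactly as printed in equation (13); many references define Shannon entropy with a leading minus sign, in which case the result would simply be negated.

```python
import math


def relative_wavelet_energy(bands):
    """P_i = E_i / E_tot, with E_i the mean squared coefficient of set i."""
    energies = [sum(c * c for c in band) / len(band) for band in bands]
    total = sum(energies)
    return [e / total for e in energies]


def log_energy_entropy(P):
    """Sum of log(P_i^2), equation (11); zero terms are skipped (assumption)."""
    return sum(math.log(p * p) for p in P if p != 0.0)


def threshold_entropy(P, threshold):
    """Number of terms with |P_i| > threshold, from equation (12)."""
    return sum(1 for p in P if abs(p) > threshold)


def shannon_entropy(P):
    """Sum of P_i^2 * log(P_i^2) as printed in equation (13)."""
    return sum(p * p * math.log(p * p) for p in P if p != 0.0)


# Hypothetical coefficient sets, one list per decomposition band
bands = [[1.0, -1.0], [2.0, 2.0], [0.5, 0.5, 0.5, 0.5]]
P = relative_wavelet_energy(bands)
e_log = log_energy_entropy(P)
e_thr = threshold_entropy(P, 0.1)
e_sh = shannon_entropy(P)
```

Because every relative energy lies between 0 and 1, each logarithm is negative, so both the log energy value and the Shannon value as printed are negative for this example.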

2.2.5 Differential Entropy:

Another good feature used to describe the hidden information in EEG signals is Differential Entropy (DE) [26, 27]. DE measures the complexity of a continuous random variable and is calculated using equation (14):

$$h(X) = -\int_X f(x) \log\left( f(x) \right) dx$$ (14) [26]

where $X$ is a random variable and $f(x)$ is the probability density function of $X$.

It was shown in [26] that the EEG signal follows the Gaussian distribution $N(\mu, \sigma^2)$ in a series of sub-bands obtained by band-pass filtering from 2 Hz to 44 Hz in 2 Hz steps; the Kolmogorov-Smirnov test verified that the sub-band EEG signals meet the Gaussian distribution with probability greater than 90%. Therefore, the differential entropy in a fixed frequency band $i$ can be calculated using equation (15):

$$h(X) = -\int_X \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \log\left( \frac{1}{\sqrt{2\pi\sigma^2}} e^{-\frac{(x-\mu)^2}{2\sigma^2}} \right) dx = \frac{1}{2} \log\left( 2\pi e \sigma^2 \right)$$ (15) [25]

hence,

$$h_i(X) = \frac{1}{2} \log\left( 2\pi e \sigma_i^2 \right)$$ (16) [25]

Where $h_i$ and $\sigma_i^2$ in equation (16) denote the differential entropy of the corresponding EEG signal in frequency band $i$ and the signal variance, respectively.

This feature was extracted using MATLAB code, simply by applying the closed form of equation (15):

De(j,i)=0.5*log(2*pi*exp(1)*var(EEGDATA(j,:,i)));
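The closed form for a Gaussian signal can be sketched in plain Python as follows. The function name differential_entropy and the sample values are hypothetical. The variance is the sample variance with the N - 1 denominator, which is also MATLAB's default for var, and the logarithm is the natural logarithm.

```python
import math
import statistics


def differential_entropy(x):
    """0.5 * log(2 * pi * e * variance), the Gaussian closed form."""
    return 0.5 * math.log(2.0 * math.pi * math.e * statistics.variance(x))


# Hypothetical samples with sample variance 1 and 4 respectively
de_unit = differential_entropy([-1.0, 0.0, 1.0])
de_scaled = differential_entropy([-2.0, 0.0, 2.0])
```

Doubling the amplitude of the signal multiplies the variance by four and therefore raises the differential entropy by log 2, so under the Gaussian assumption the feature is a logarithmic measure of signal power.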

2.3 Classification Algorithms:

In this section, the five applied classification algorithms are briefly discussed, namely: Neural Networks (NN), Decision Tree, Linear Discriminant Analysis (LDA), Support Vector Machine (SVM) and K-nearest Neighbor (K-NN).

2.3.1 Neural Networks:

An artificial neural network is an electrical analogue of the biological net of neurons. Assembling a number of artificial neurons creates an NN, which has the ability to produce nonlinear decision boundaries. Moreover, artificial neurons can be organized in a multilayer structure, composing a multilayer perceptron neural network (MLP-NN). The MLP-NN has at least three layers: an input layer, one or more hidden layers, and an output layer. The input of each neuron is connected to the previous layer's output. The advantages of NN and MLP-NN are that they can produce nonlinear decision boundaries, can classify any number of classes and are flexible. However, the architecture of the neural network, that is, the number of layers and the number of neurons in each layer, should be carefully selected [7, 28].

2.3.2 Decision Tree:

A decision tree represents a decision procedure to determine the class of a given input. Its simplicity, flexibility and computational efficiency offer substantial advantages for the classification stage. However, its low accuracy is the major drawback of the decision tree [29, 30].

2.3.3 Linear Discriminant Analysis (LDA):

LDA, also known as Fisher's linear discriminant (FLD), is a linear classifier that determines the optimal hyperplane (a point in 1-D space, a line in 2-D space and a plane in 3-D space) to classify the data into two classes. It is optimal when the classes have equal covariance matrices and Gaussian distributions. LDA has been successful in a great number of BCI applications, such as MI-BCI [30] and the P300 Speller [32], due to its simplicity and its very low computational requirements. However, its main drawback is its linearity, which leads to poor results when dealing with complex nonlinear data [7, 19, 32].

2.3.4 Support Vector Machine (SVM)

Like LDA, SVM uses a hyperplane to separate the data into two classes, but it chooses the hyperplane that maximizes the distance, known as the margin, between the hyperplane and the nearest data points. Basically, SVM classifiers are linear, but with a slight increase in complexity they can produce a nonlinear decision boundary. Although SVM classifiers have valuable advantages, they suffer from low speed of execution [7, 19, 32].

2.3.5 K-nearest Neighbor (K-NN)

K-nearest Neighbor is a very simple classifier. It determines the classes of the K nearest neighbors of an instance and then assigns the instance to the class of the majority of those neighbors [7].
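The majority-vote rule described above can be sketched in plain Python as follows. This is a toy illustration, not the Classification Learner implementation: the function name knn_predict, the Euclidean distance, the two-dimensional feature vectors and the labels "LH" and "RH" are assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k):
    """Assign the query to the majority class among its k nearest neighbors."""
    # Sort training indices by Euclidean distance to the query point
    order = sorted(range(len(train_X)),
                   key=lambda i: math.dist(train_X[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-dimensional feature vectors for left-hand (LH)
# and right-hand (RH) imagery trials
train_X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3],
           [1.0, 1.1], [0.9, 1.0], [1.2, 0.8]]
train_y = ["LH", "LH", "LH", "RH", "RH", "RH"]

label_a = knn_predict(train_X, train_y, [0.1, 0.1], k=3)
label_b = knn_predict(train_X, train_y, [1.0, 1.0], k=3)
```

With two classes, an odd value of k avoids tied votes.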

ANN classification was applied using the MATLAB built-in app “Pattern Recognition and Classification tool (nprtool)”. Decision Tree, LDA, SVM and K-NN were applied using the MATLAB built-in app “Classification Learner”.

2.4 Experimental work:

Each feature was extracted from six EEG channels (C3, Cp3, C4, Cp4, Cz and Cpz) and was used to train classification models, one model per classification algorithm, which gives 25 models, one for each feature-classifier pair (5 features × 5 classifiers). This procedure was repeated for the five subjects, so that overall 125 classification models represent the feature-classifier pairs.

Each model was trained and validated using 10-fold cross validation, resulting in a confusion matrix for each model. The resulting confusion matrices were used to calculate the performance criteria (performance metrics), which are Classifier Accuracy ($ACC$), Classifier Precision, also known as Positive Predictive Value ($PPV$), and the Matthews Correlation Coefficient ($MCC$), presented in equations (17), (18) and (19), respectively. The structure of the confusion matrix is illustrated in Table II.

$$ACC = \frac{TP + TN}{TP + TN + FP + FN}$$ (17) [33]

$$PPV = \frac{TP}{TP + FP}$$ (18) [33]

$$MCC = \frac{TP \times TN - FP \times FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$ (19) [33]
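The three performance criteria can be computed from the four confusion-matrix counts as in the following plain Python sketch. The function name binary_metrics and the counts are hypothetical and are not results from this study. Returning 0 for the MCC when its denominator is zero is a common convention and is an assumption here.

```python
import math


def binary_metrics(tp, tn, fp, fn):
    """Return (ACC, PPV, MCC) following equations (17), (18) and (19)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    ppv = tp / (tp + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): MCC is reported as 0 when the denominator is 0
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, ppv, mcc


# Hypothetical confusion-matrix counts for one model
acc, ppv, mcc = binary_metrics(tp=50, tn=45, fp=20, fn=15)

# Limiting cases: all predictions correct, and all predictions wrong
perfect = binary_metrics(tp=10, tn=10, fp=0, fn=0)
inverted = binary_metrics(tp=0, tn=0, fp=10, fn=10)
```

The two limiting cases reproduce the ends of the MCC range, +1 for a perfect prediction and -1 for total disagreement between prediction and observation.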



Table II The structure of Confusion Matrix [29]

Total population               True condition positive    True condition negative
Predicted condition positive   True Positive (TP)         False Positive (FP)
Predicted condition negative   False Negative (FN)        True Negative (TN)

ACC = (Σ TP + Σ TN) / Σ All population
PPV = Σ TP / Σ Predicted condition positive

3 - STATISTICAL ANALYSIS

According to the parametric statistical method [34, 35, 36], the performance of a binary classifier was assessed by testing whether the mean effect of the MI-BCI training features, computed on data collected from five subjects, differs significantly between features, or whether the training feature samples (trials) do not differ from each other, which means that their average values are similar. For this purpose, five feature groups (Power Spectral Density (PSD), Amplitude Mean (AM), Standard Deviation (STD), Shannon Entropy (SE) and Differential Entropy (DE)) were extracted from the EEG signals of five different subjects in MI-BCI; the five samples from the five subjects were treated as independent samples for each feature group. The influence of the feature groups on classification performance was compared separately according to two classifier criteria (performance metrics), namely Accuracy and Precision. To determine the effect of the feature groups on classifier learning, the differences in classification performance between the feature groups were compared using one-way ANOVA (parametric analysis of variance), separately for Accuracy and Precision. The feature group (PSD, AM, STD, SE or DE) was the independent variable, and accuracy and precision were the dependent variables, analysed separately.

The validity of applying a parametric test was checked by verifying that the classification performance data of each feature group was normally distributed. Shapiro-Wilk and Kolmogorov-Smirnov statistical tests were therefore used at the 0.05 significance level to check whether the normality assumption was in doubt. However, these tests are sensitive to small sample sizes and become unreliable when the sample size in each feature group is less than 20 samples. Here each feature group consisted of five subjects, which is less than the critical sample size of twenty (df = 5 < 20), so non-normality is unlikely to be detected by a statistical test such as Shapiro-Wilk alone. Therefore, in addition to the normality test on the data, the unstandardized residuals were created for each dependent variable (classification performance criterion) and tested for normality. When normality was established, the parametric ANOVA test was performed in two steps. The first step, evaluated at significance level (α = 0.05) by Levene's test, ensured that the assumption of homogeneity of variances between feature groups had not been violated. The second step used the ANOVA test to determine whether a significant F ratio was obtained, and therefore whether to reject the null hypothesis and accept the alternative hypothesis, which states that the classification performance criterion differs across feature groups at significance level (α = 0.05). When the ANOVA test reached significance (p < 0.05), multiple comparisons were performed using post-hoc tests to determine between which features a significant difference in the performance criterion exists. When Levene's test was not significant (p > 0.05) and equal variances were assumed, the Tukey HSD test was applied. When Levene's test was significant and unequal variances were assumed, the Games-Howell post-hoc test was performed instead. A difference was considered significant by the post-hoc test when the p-value was below 0.05. Fig. 7 demonstrates a flow chart of the statistical analysis. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS 17.0).
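The F ratio of the one-way ANOVA and the choice of post-hoc test described above can be sketched in plain Python as follows. This is an illustration only and does not reproduce SPSS: the function names are invented for the example, the accuracy values are hypothetical numbers and not data from this study, and the p-value is not computed because it requires the F distribution, which the Python standard library does not provide.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


def choose_post_hoc(anova_p, levene_p, alpha=0.05):
    """Select the post-hoc test following the flow described in the text."""
    if anova_p >= alpha:
        return None                      # ANOVA not significant: stop
    if levene_p > alpha:
        return "Tukey HSD"               # equal variances assumed
    return "Games-Howell"                # equal variances not assumed


# Hypothetical accuracies (%) of one classifier: five feature groups,
# five subjects each
accuracy_by_feature = [
    [70.0, 75.0, 72.0, 78.0, 74.0],   # PSD
    [60.0, 64.0, 62.0, 66.0, 63.0],   # AM
    [71.0, 76.0, 73.0, 77.0, 75.0],   # STD
    [68.0, 73.0, 70.0, 74.0, 72.0],   # SE
    [74.0, 79.0, 76.0, 80.0, 77.0],   # DE
]
f_ratio, df_between, df_within = one_way_anova_f(accuracy_by_feature)
```

With five feature groups and five subjects per group the degrees of freedom are 4 and 20, which is why the test statistics in the results are reported as F(4, 20).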

Matthews correlation coefficient:

In order to determine the influence of the feature group on the generalization capability of the classification system, the Matthews correlation coefficient (MCC) was calculated in addition to the popular performance criteria (Accuracy and Precision). It measures the quality of a binary classification and takes values in the range –1 ≤ MCC ≤ 1. The MCC is calculated from the confusion matrix as the correlation between the observed and predicted binary classifications [34, 37], directly using equation (19). A perfect prediction is achieved when the MCC value reaches +1, and a value of zero represents a random prediction. Total disagreement between prediction and observation, the worst prediction, is indicated when the MCC reaches –1.

4 – RESULTS AND DISCUSSION

Table IV presents the mean classification accuracy of each classifier for each feature group. The highest mean accuracy, 77.33%, was obtained with the ANN classifier for the differential entropy feature, whereas the lowest, 51.95%, was obtained with the decision tree classifier for the amplitude mean feature. For the mean accuracy values obtained by the ANN, Decision Tree, LDA, SVM and KNN classifiers, there is no evidence that the data of the individual features are not normally distributed: the p-values of the Shapiro-Wilk test are greater than 0.05 (p > .05), suggesting that the data are normally distributed. The p-values for the residuals are also greater than 0.05 (p > .05), which indicates normality of the residuals. The assumption of homogeneity of variance was checked by Levene's test. For the ANN, Decision tree and LDA classifiers, Levene's test for homogeneity of variances is not significant (p > .05), and therefore there is no evidence against the assumption that the population variances of the accuracy data are equal. For the SVM and KNN classifiers, Levene's test gave significant results (SVM: p-value = 0.004; KNN: p-value = 0.045), and therefore the homogeneity of variance assumption has been violated (p < .05). The ANOVA test of the feature effect on classification accuracy was significant (p < .05) for the ANN and SVM classifiers (SVM: F(4, 20) = 3.944, p = 0.016 < 0.05; ANN: F(4, 20) = 4.546, p = 0.009 < 0.05) (Table IV); therefore there is strong evidence of differences in classification accuracy between the five features for the ANN and SVM classifiers. For the other classifiers (KNN, Decision tree and LDA), in contrast, there is no significant evidence (p > .05) of differences in classification accuracy across feature groups, and the ANOVA test was not significant (DT: F(4, 20) = 0.777, p = 0.553 > 0.05; LDA: F(4, 20) = 0.802, p = 0.538 > 0.05; KNN: F(4, 20) = 1.189, p = 0.346 > 0.05) (Table IV).

Fig.7. Statistical analysis Flow Chart.



Table V shows the mean classification precision of the different classifiers for each feature group. The minimum classification precision of the SVM, Decision tree and KNN classifiers was obtained for Amplitude mean and PSD: 51.3% with Decision tree for Amplitude mean, 53.8% with SVM for Amplitude mean and 52.5% with KNN for PSD. For every feature (PSD, STD, AM, SE and DE), the highest mean precision was attained with the ANN classification algorithm. Table IV gives the following mean precisions for the ANN classifier: 76.7% for Shannon entropy, 76.5% for differential entropy, 75.5% for STD, 74.9% for PSD and 64.4% for Amplitude mean (Table IV).

In the first step of the statistical analysis, the p-values for the individual feature data in classification precision give no evidence against normality for any feature group: the p-values of the Shapiro-Wilk test are greater than 0.05 (p > .05), suggesting that the data are normally distributed.

The p-values for the residuals are also greater than 0.05 (p > .05), which indicates normality of the residuals. Furthermore, the equality of variances of the precision data was analyzed with Levene's test. The p-values obtained were 0.162, 0.756 and 0.072 for the ANN, LDA and KNN classifiers, respectively (Table IV); therefore there is no evidence to doubt the assumption of equal variances for these precision data. The precision data for the SVM and Decision Tree classifiers, in contrast, show strong evidence that the homogeneity of variance assumption is violated (p-values < .05) (Table IV). The ANOVA test shows that the differences in classification precision across feature groups were not significant for the Decision tree, LDA and KNN classifiers; the p-values obtained were 0.393, 0.372 and 0.434, respectively (Table IV). For the other two classifiers, ANN and SVM, in contrast, there is strong evidence (p < .05) of differences in classification precision across feature groups, and the ANOVA test was significant. The F-values and p-values obtained for the ANN and SVM classifiers were F(4, 20) = 3.893, p = 0.017 < 0.05 and F(4, 20) = 2.207, p = 0.01 < 0.05, respectively (Table IV).

Since the ANOVA test demonstrated significant differences in classification performance between feature groups, multiple comparisons were performed with post-hoc statistical analysis to determine which features have a significant effect for each classification algorithm. The data collected from the post-hoc statistical tests are summarized in Table VI.

Because the ANOVA test was not significant (p > .05) for the classification accuracies obtained from the feature groups with the Decision Tree, LDA and KNN classifiers, the multiple comparisons test was not carried out. The data therefore provide no evidence that different features have a significant effect on the accuracy obtained by the Decision Tree, LDA and KNN classifiers.

For the ANN classifier, the post-hoc Tukey HSD test was carried out to determine which features differ significantly in classification accuracy. The test found three significant differences (p-value < .05): between Amplitude mean and Differential entropy, between Amplitude mean and STD, and between Amplitude mean and PSD. The p-values obtained were 0.007, 0.031 and 0.049, respectively (Table VI). There is strong evidence (p-value = 0.007) that the Differential entropy feature yields higher classification accuracy than the Amplitude mean feature: accuracy with Differential entropy was on average 13.82 percentage points higher than with Amplitude mean. There was also a significant difference between Amplitude mean and STD (p-value = 0.031): accuracy with the Amplitude mean feature was on average 11.40 percentage points lower than with the STD feature. Likewise, the ANN classifier trained on the PSD feature had significantly higher classification accuracy than the ANN classifier trained on the Amplitude mean feature, by 10.53 percentage points on average (p-value = 0.049). No significant differences were found between PSD and STD, between PSD and Differential Entropy, between PSD and Shannon Entropy, between STD and Differential Entropy, between STD and Shannon Entropy, between Differential Entropy and Shannon Entropy, nor between Shannon Entropy and Amplitude mean (p-value > .05) (Table VI).
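As a quick arithmetic check, the mean differences quoted above for the ANN classifier can be reproduced from the ANN accuracy means listed in the accuracy summary table. The sketch below computes all ten pairwise differences (I minus J). The variable and function names are illustrative, and the p-values themselves are not recomputed here because they require the per-subject data.

```python
from itertools import combinations

# Mean ANN classification accuracy (%) per feature group, copied from the summary table
ann_accuracy_means = {
    "PSD": 74.0306,
    "STD": 74.8985,
    "AM": 63.4931,
    "SE": 73.7024,
    "DE": 77.3131,
}

def pairwise_differences(means):
    """Return {(I, J): mean_I - mean_J} for every unordered pair of groups."""
    return {(i, j): round(means[i] - means[j], 4) for i, j in combinations(means, 2)}

differences = pairwise_differences(ann_accuracy_means)
for (i, j), diff in differences.items():
    print(f"{i} - {j}: {diff:+.4f}")
```

With five feature groups there are ten pairs, which is the number of comparisons enumerated in the paragraph above (three significant, seven not).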

Because the classification accuracy data for the SVM classifier show evidence of unequal variances (p-value < .05, Table IV), the post-hoc Tukey HSD test was replaced with the Games-Howell test. Based on the results of the Games-Howell test, no significant differences were found between PSD and STD, between PSD and Amplitude mean, between PSD and Differential Entropy, between PSD and Shannon Entropy, between STD and Differential Entropy, between STD and Shannon Entropy, nor between Differential Entropy and Shannon Entropy (p-value > .05), as shown in Table VI. Three significant differences (p-value < .05) were found with the Games-Howell test: between Amplitude mean and Differential entropy, between Amplitude mean and STD, and between Amplitude mean and Shannon entropy. The p-values obtained were 0.015, 0.041 and 0.009, respectively (Table VI). There is strong evidence (p-value = 0.009) that the Shannon entropy feature yields higher classification accuracy than the Amplitude mean feature: accuracy with Shannon entropy was on average 7.32 percentage points higher than with Amplitude mean. There is also a significant difference between Amplitude mean and Differential entropy (p-value = 0.015): accuracy with the Amplitude mean feature was on average 11.77 percentage points lower than with the Differential entropy feature. Finally, the SVM classifier trained on the STD feature had significantly higher classification accuracy than the SVM classifier trained on the Amplitude mean feature, by 9.45 percentage points on average (p-value = 0.041) (Table VI).
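The Games-Howell test does not pool variances: each pair of groups gets its own standard error and Welch-Satterthwaite degrees of freedom. The sketch below computes these quantities for the SVM accuracy comparison between Differential entropy and Amplitude mean, using the means and standard deviations listed in Table III (A). It assumes those standard deviations are sample standard deviations over n = 5 subjects; the function name is hypothetical, and the p-value (which requires the studentized range distribution) is not computed.

```python
from math import sqrt

def games_howell_pair(mean_i, sd_i, n_i, mean_j, sd_j, n_j):
    """Return (mean difference, Welch t, Welch-Satterthwaite df, studentized range q)."""
    var_term_i = sd_i ** 2 / n_i
    var_term_j = sd_j ** 2 / n_j
    standard_error = sqrt(var_term_i + var_term_j)
    diff = mean_i - mean_j
    t_value = diff / standard_error
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (var_term_i + var_term_j) ** 2 / (
        var_term_i ** 2 / (n_i - 1) + var_term_j ** 2 / (n_j - 1)
    )
    # Games-Howell compares q = |t| * sqrt(2) against the studentized range distribution
    q_value = abs(t_value) * sqrt(2)
    return diff, t_value, df, q_value

# SVM accuracy (mean, SD) from Table III (A); n = 5 subjects is an assumption about how SD was computed
diff, t_value, df, q_value = games_howell_pair(65.53, 4.45, 5, 53.76, 1.31, 5)
print(f"DE - AM = {diff:.2f}, t = {t_value:.3f}, df = {df:.2f}, q = {q_value:.3f}")
```

The mean difference recovered this way (11.77) agrees with the value reported for this pair in Table VI.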

For classification precision, the multiple comparisons test was not carried out for the Decision Tree, LDA and KNN classifiers, because the ANOVA test on the precision obtained from the feature groups was not significant (p > .05) for these classifiers.

For the SVM classifier, the post-hoc Games-Howell test was carried out to determine which features differ significantly in classification precision. No significant differences were found between PSD and STD, between PSD and Amplitude mean, between PSD and Differential Entropy, between PSD and Shannon Entropy, between STD and Differential Entropy, between STD and Amplitude mean, between STD and Shannon Entropy, nor between Differential Entropy and Shannon Entropy (p-value > .05), as shown in Table VI. Two significant differences (p-value < .05) were found with the Games-Howell test: between Amplitude mean and Differential entropy, and between Amplitude mean and Shannon entropy. The p-values obtained were 0.036 and 0.012, respectively (Table VI). There is strong evidence (p-value = 0.012) that the Shannon entropy feature yields higher classification precision than the Amplitude mean feature: precision with Shannon entropy was on average 7.02 percentage points higher than with Amplitude mean. There was also a significant difference between Amplitude mean and Differential entropy (p-value = 0.036): precision with the Amplitude mean feature was on average 11.34 percentage points lower than with the Differential entropy feature (Table VI).

For the classification precision obtained with the ANN classifier, differences among feature groups were assessed by the Tukey HSD test.

Based on the results of the Tukey HSD test, three significant differences (p-value < .05) were found: between Amplitude mean and Differential entropy, between Amplitude mean and STD, and between Amplitude mean and Shannon entropy. The p-values obtained were 0.029, 0.049 and 0.027, respectively (Table VI). There is evidence (p-value = 0.029) that the Differential entropy feature yields higher classification precision than the Amplitude mean feature: precision with Differential entropy was on average 12.06 percentage points higher than with Amplitude mean. There is also a significant difference between Amplitude mean and STD (p-value = 0.049): precision with the Amplitude mean feature was on average 11.13 percentage points lower than with the STD feature. Finally, the ANN classifier trained on the Shannon entropy feature had significantly higher classification precision than the ANN classifier trained on the Amplitude mean feature, by 12.23 percentage points on average (p-value = 0.027).

Table III shows the average accuracy, average precision and average MCC values.

Table III Classification Performance Criteria.
(A) Average Accuracy Values. (B) Average Precision Values. (C) Average MCC Values.

(A) Accuracy (Average ± Standard Deviation)
Features/Classifiers   ACC ANN         ACC DT           ACC LDA        ACC SVM          ACC KNN
PSD                    74.03 ± 6.75    54.43 ± 4.16     58.34 ± 7.51   58.28 ± 7.82     52.12 ± 4.67
STD                    74.9 ± 6.59     58.93 ± 10.25    60.36 ± 4.26   63.22 ± 6.51     60.02 ± 6.66
AM                     61.89 ± 7.08    51.95 ± 6.31     55.93 ± 3.68   53.76 ± 1.31     54.48 ± 2.96
SE                     73.7 ± 5.49     55.1 ± 5.67      60.18 ± 5.71   61.09 ± 2.63     59.26 ± 12.58
DE                     77.31 ± 4.67    53.39 ± 5.06     61.57 ± 5.48   65.53 ± 4.45     59.69 ± 6.15

(B) Precision (Average ± Standard Deviation)
Features/Classifiers   PPV ANN         PPV DT           PPV LDA        PPV SVM          PPV KNN
PSD                    74.92 ± 7.14    55.65 ± 3.89     59.33 ± 6.16   63.79 ± 11.86    52.55 ± 5.45
STD                    75.6 ± 7.24     55.51 ± 4.87     62.47 ± 5.08   63.75 ± 7.62     61.44 ± 9.6
AM                     58.46 ± 12.94   51.31 ± 7.2      56.02 ± 3.91   53.85 ± 1.04     55.2 ± 3.45
SE                     76.7 ± 6.54     55.04 ± 5.67     59.63 ± 5.03   60.88 ± 2.62     59.63 ± 12.81
DE                     76.53 ± 4.33    53.64 ± 4.92     61.35 ± 5.38   65.2 ± 5.3       59.38 ± 6.09

(C) Matthews Correlation Coefficient (Average ± Standard Deviation)
Features/Classifiers   MCC ANN         MCC DT           MCC LDA        MCC SVM          MCC KNN
PSD                    0.48 ± 0.14     0.5 ± 0.13       0.32 ± 0.07    0.48 ± 0.11      0.55 ± 0.09
STD                    0.17 ± 0.19     0.1 ± 0.09       0.08 ± 0.1     0.1 ± 0.11       0.07 ± 0.1
AM                     0.26 ± 0.14     0.25 ± 0.09      0.12 ± 0.07    0.2 ± 0.11       0.23 ± 0.11
SE                     0.25 ± 0.17     0.27 ± 0.13      0.08 ± 0.03    0.22 ± 0.05      0.31 ± 0.09
DE                     0.13 ± 0.21     0.2 ± 0.14       0.09 ± 0.06    0.19 ± 0.25      0.2 ± 0.12



The averaged classification MCC for each training feature with each classification technique is presented in Table VII. The mean classification MCC values for the ANN classifier were 0.482, 0.502, 0.318, 0.478 and 0.546 for PSD, STD, Amplitude Mean, Shannon entropy and Differential entropy, respectively (Table VII). The maximum classification MCC was obtained for the Differential entropy feature (M = 0.546, SD = 0.09), and the minimum was observed for the Amplitude mean feature group (M = 0.318, SD = 0.06). For the second classifier, Decision Tree (Table VII), the mean classification MCC values for the PSD, STD, Amplitude Mean, Shannon entropy and Differential entropy feature groups were 0.185, 0.122, 0.111, 0.102 and 0.098, respectively. The minimum classification MCC for the Decision Tree classifier was obtained for Differential entropy (M = 0.098, SD = 0.06), and the maximum for the PSD feature (M = 0.185, SD = 0.17). For the third classification technique, LDA, the mean classification MCC values were 0.256, 0.254, 0.137, 0.204 and 0.232 for the PSD, STD, Amplitude Mean, Shannon entropy and Differential entropy feature groups, respectively. The maximum classification MCC was observed for the PSD feature group (M = 0.256, SD = 0.14) and the minimum for the Amplitude mean feature (M = 0.137, SD = 0.04). The mean classification MCC values for the SVM classifier were 0.247, 0.266, 0.075, 0.222 and 0.312 for PSD, STD, Amplitude Mean, Shannon entropy and Differential entropy, respectively (Table VII). The minimum classification MCC was observed for the Amplitude mean feature group (M = 0.075, SD = 0.02) and the maximum for the Differential entropy feature group (M = 0.312, SD = 0.08). For the final classification technique, KNN, the mean classification MCC values were 0.171, 0.202, 0.091, 0.248 and 0.195 for the PSD, STD, Amplitude Mean, Shannon entropy and Differential entropy feature groups, respectively. The minimum classification MCC was obtained for the Amplitude mean feature group (M = 0.091, SD = 0.09), and the maximum for the Shannon entropy feature (M = 0.248, SD = 0.24).
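For reference, the three performance criteria discussed in this section can all be derived from a binary confusion matrix, with right-hand imagery as the positive class as defined in the list of abbreviations. The sketch below uses the standard definitions of accuracy, precision (PPV) and MCC. The function name and the confusion-matrix counts are hypothetical and are not results from this study.

```python
from math import sqrt

def binary_metrics(tp, fn, fp, tn):
    """Return (accuracy, precision, MCC) from binary confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    # Precision (PPV): fraction of trials assigned to the positive class that are correct
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return accuracy, precision, mcc

# Hypothetical counts for 100 trials (right-hand = positive class); NOT the study's data
accuracy, precision, mcc = binary_metrics(tp=40, fn=10, fp=15, tn=35)
print(f"ACC = {accuracy:.2%}, PPV = {precision:.2%}, MCC = {mcc:.3f}")
```

Unlike accuracy, MCC ranges from -1 to 1 and is 0 for chance-level classification, which is why the MCC values in Table VII are on a different scale from the accuracy and precision percentages.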

5 – CONCLUSION

In this paper, EEG data from 5 subjects were used as MI-BCI signals. Five features were extracted from each EEG signal using MATLAB code, and each feature was described theoretically and mathematically. The extracted features were used to train five different classifiers using MATLAB built-in applications. The effectiveness of the classification models was then measured with classification performance metrics, which were analyzed statistically. Using different classification techniques on the EEG signals contained in the EEG Motor Imagery Dataset, the significance of the influence of the feature groups on the generalization capability of the classification system was determined by a parametric statistical test, one-way ANOVA. The ANN, LDA, Decision Tree, SVM and KNN classifiers were applied to the different feature groups to determine the effect of the training features on the classification performance criteria (Accuracy, Precision and MCC).

The classification performance results for Accuracy and Precision vary significantly from feature to feature for two of the classifier algorithms, ANN and SVM. For classification accuracy with the ANN classifier, significant differences (p-value < .05) were found between Amplitude mean and Differential entropy, between Amplitude mean and STD, and between Amplitude mean and PSD. No significant differences were recorded for the other pairs of features. With the SVM classifier, the differences in classification accuracy were significant (p-value < .05) only for the feature pairs Amplitude mean and Differential entropy, Amplitude mean and STD, and Amplitude mean and Shannon entropy. For the LDA, Decision Tree and KNN classifiers, different features had no significant effect on the accuracy obtained. For classification precision with the SVM classifier, significant differences were recorded for the feature pairs Amplitude mean and Differential entropy, and Amplitude mean and Shannon entropy. For the ANN classifier, on the other hand, significant differences in classification precision were found for the feature pairs Amplitude mean and Differential entropy, Amplitude mean and STD, and Amplitude mean and Shannon entropy. Furthermore, the classification precision of the LDA, KNN and Decision Tree classifiers did not change significantly when different features were used for training. In addition, the highest average classification MCC was obtained with the ANN classifier for the Differential entropy feature group, and the lowest with the SVM classifier for the Amplitude mean feature group. These results are important for determining not only the optimal features but also the appropriate classifiers. Future work will focus on the differences between classification techniques for different features, in order to identify the optimal performance criteria and compare them with MCC and other performance criteria such as sensitivity and specificity.

6 – ACKNOWLEDGMENT

This work was carried out in the Biomedical Engineering Department, Faculty of Mechanical and Electrical Engineering at Damascus University. The authors thank Prof. Hani AMASHA, Head of the Biomedical Engineering Department. The authors also thank Dr. Bilal Alchalabi for his kind consultation. Finally, the authors thank the ACIA Lab, Pneumatic control unit, Mechanical Design Engineering Department, for being a part of the development of the project.



7 – FUTURE STUDIES

The theme of this study is simplicity: simple features were extracted and simple classifiers were applied. Consequently, the next step is to focus on the features and classifiers that give the best performance. More complex studies will be conducted in order to describe more thoroughly the information contained in EEG signals.

REFERENCES

[1] Philip O. Babalola et al. (2017). Artificial Neural Network Prediction of Aluminium Metal Matrix Composite with Silicon Carbide Particles Developed Using Stir Casting Method. International Journal of Mechanical & Mechatronics Engineering IJMME-IJENS Vol:15 No:06.

[2] Nam, C. S., Nijholt, A., & Lotte, F. (Eds.). (2018). Brain–

Computer Interfaces Handbook: Technological and Theoretical

Advances. CRC Press.

[3] Wolpaw, J. R., Birbaumer, N., Heetderks, W. J., McFarland, D.

J., Peckham, P. H., Schalk, G., ... & Vaughan, T. M. (2000).

Brain-computer interface technology: a review of the first

international meeting. IEEE transactions on rehabilitation

engineering, 8(2), 164-173.

[4] Mason, S. G., & Birch, G. E. (2003). A general framework for

brain-computer interface design. IEEE transactions on neural

systems and rehabilitation engineering, 11(1), 70-85.

[5] Solis-Escalante, T. (2012). The Asynchronous Graz Brain

Switch (Doctoral dissertation, Universität Graz).

[6] Lotte, F., Congedo, M., Lécuyer, A., Lamarche, F., &Arnaldi, B.

(2007). A review of classification algorithms for EEG-based

brain–computer interfaces. Journal of neural engineering, 4(2),

R1.

[7] Herman, P., Prasad, G., McGinnity, T. M., & Coyle, D. (2008).

Comparative analysis of spectral approaches to feature

extraction for EEG-based motor imagery classification. IEEE

Transactions on Neural Systems and Rehabilitation

Engineering, 16(4), 317-326.

[8] Zhao, Q., Zhang, L., &Cichocki, A. (2009). EEG-based

asynchronous BCI control of a car in 3D virtual reality

environments. Chinese Science Bulletin, 54(1), 78-87.

[9] Shan, H., Xu, H., Zhu, S., & He, B. (2015). A novel channel

selection method for optimal classification in different motor

imagery BCI paradigms. Biomedical engineering online, 14(1),

93.

[10] Wu, S. L., Liu, Y. T., Hsieh, T. Y., Lin, Y. Y., Chen, C. Y.,

Chuang, C. H., & Lin, C. T. (2017). Fuzzy integral with particle

swarm optimization for a motor-imagery-based brain–computer

interface. IEEE Transactions on Fuzzy Systems, 25(1), 21-28.

[11] Alansari, M., Kamel, M., Hakim, B., &Kadah, Y. (2018,

January). Study of wavelet-based performance enhancement for

motor imagery brain-computer interface. In Brain-Computer

Interface (BCI), 2018 6th International Conference on (pp. 1-4).

IEEE.

[12] Datta, A., & Chatterjee, R. (2019). Comparative Study of

Different Ensemble Compositions in EEG Signal Classification

Problem. In Emerging Technologies in Data Mining and

Information Security (pp. 145-154). Springer, Singapore.

[13] Alimam, H. et. al. (2017). Design of EMG Acquisition Circuit

to Control an Antagonistic Mechanism Actuated by Pneumatic

Artificial Muscles PAMs. International Journal of Mechanical

& Mechatronics Engineering IJMME-IJENS Vol:17 No:05.

[14] http://www.bsp.brain.riken.jp/~qibin/homepage/Datasets.html

(Accessed: 02/03/2018 20:30).

[15] Welch, P. (1967). The use of fast Fourier transform for the estimation of power spectra: a method based on time averaging

over short, modified periodograms. IEEE Transactions on audio

and electroacoustics, 15(2), 70-73.

[16] Harris, F. J. (1978). On the use of windows for harmonic

analysis with the discrete Fourier transform. Proceedings of the IEEE, 66(1), 51-83.

[17] Atkinson, J., & Campos, D. (2016). Improving BCI-based

emotion recognition by combining EEG feature selection and kernel classifiers. Expert Systems with Applications, 47, 35-41.

[18] Tan, D., &Nijholt, A. (2010). Brain-computer interfaces and

human-computer interaction. In Brain-Computer Interfaces (pp. 3-19). Springer London.

[19] Rosso, O. A., Blanco, S., Yordanova, J., Kolev, V., Figliola, A.,

Schürmann, M., &Başar, E. (2001). Wavelet entropy: a new tool for analysis of short duration brain electrical signals. Journal of

neuroscience methods, 105(1), 65-75.

[20] ao-Guo Xu (2008) Pattern recognition of motor imagery EEG using wavelet transform

[21] Wang, D., Miao, D., &Xie, C. (2011). Best basis-based wavelet

packet entropy feature extraction and hierarchical EEG classification for epileptic detection. Expert Systems with

Applications, 38(11), 14314-14320.

[22] Al-Qazzaz, N. K., Hamid Bin Mohd Ali, S., Ahmad, S. A., Islam, M. S., & Escudero, J. (2015). Selection of mother

wavelet functions for multi-channel eeg signal analysis during a

working memory task. Sensors, 15(11), 29015-29035.

[23] https://www.mathworks.com/help/wavelet/gs/introduction-to-

the-wavelet-families.html (Accessed: 04/03/2018 20:30).

[24] Yordanova, J., Kolev, V., Rosso, O. A., Schürmann, M., Sakowitz, O. W., Özgören, M., &Basar, E. (2002). Wavelet

entropy analysis of event-related potentials indicates modality-

independent theta dominance. Journal of neuroscience methods, 117(1), 99-109.

[25] seyyed (2011) Emotion recognition method using entropy analysis of EEG signals

[26] Shi, L. C., Jiao, Y. Y., & Lu, B. L. (2013). Differential entropy feature for EEG-based vigilance estimation. In Engineering in Medicine and Biology Society (EMBC), 2013 35th Annual International Conference of the IEEE (pp. 6627-6630). IEEE.

[27] Duan, R. N., Zhu, J. Y., & Lu, B. L. (2013, November). Differential entropy feature for EEG-based emotion classification. In Neural Engineering (NER), 2013 6th International IEEE/EMBS Conference on (pp. 81-84). IEEE.

[28] Konar, A. (1999). Artificial intelligence and soft computing: behavioral and cognitive modeling of the human brain. CRC press.

[29] Utgoff, P. E. (1989). Incremental induction of decision trees. Machine learning, 4(2), 161-186.

[30] Akram, F., Han, S. M., & Kim, T. S. (2015). An efficient word typing P300-BCI system using a modified T9 interface and random forest classifier. Computers in biology and medicine, 56, 30-36.

[31] Pfurtscheller G (1999) EEG event-related desynchronization (ERD) and event-related synchronization (ERS) Electroencephalography: Basic Principles, Clinical Applications and Related Fields 4th edn, ed E Niedermeyer and F H Lopes da Silva (Baltimore, MD: Williams and Wilkins) pp 958–67

[32] Krusienski, D. J., Sellers, E. W., Cabestaing, F., Bayoudh, S., McFarland, D. J., Vaughan, T. M., & Wolpaw, J. R. (2006). A comparison of classification techniques for the P300 Speller. Journal of neural engineering, 3(4), 299.

[33] Sammut, C., & Webb, G. I. (Eds.). (2011). Encyclopedia of machine learning. Springer Science & Business Media.

[34] Mehrnaz Khodam Hazrati. (2013). On Human-Machine Interfaces based on Electrical Brain Signals. Institute for Signal Processing of the University of Lübeck.

[35] Friedman, J., Hastie, T., & Tibshirani, R. (2001). The elements of statistical learning (Vol. 1, No. 10). New York, NY, USA: Springer series in statistics.

[36] Haykin, S. S. (2006). New directions in statistical signal processing: from systems to brain. J. C. Príncipe, T. J. Sejnowski, & J. McWhirter (Eds.). Cambridge, MA: MIT Press.

[37] Boughorbel, S., Jarray, F., & El-Anbari, M. (2017). Optimal classifier for imbalanced data using Matthews Correlation Coefficient metric. PloS one, 12(6), e0177678.




Table IV
A summary of statistical analysis results: mean classification accuracies across feature groups.
*p > 0.05: no significant difference between the means, so post hoc tests comparing each pair of feature groups were not conducted.

Classification Accuracy

Feature Group                    Statistic               KNN       SVM       LDA       Decision Tree   ANN
PSD (Group 1)                    Mean                    52.1246   58.2835   58.3401   54.4274         74.0306
                                 Shapiro-Wilk p value    0.90      0.23      0.66      0.62            0.24
STD (Group 2)                    Mean                    60.0184   63.2187   60.3572   58.9270         74.8985
                                 Shapiro-Wilk p value    0.36      0.48      0.43      0.51            0.95
Amplitude Mean (Group 3)         Mean                    54.4796   53.7588   55.9303   51.9546         63.4931
Shannon Entropy (Group 4)        Mean                    59.2636   61.0869   60.1833   55.1019         73.7024
Differential Entropy (Group 5)   Mean                    59.6931   65.5294   61.5735   53.3911         77.3131

ANOVA Assumption
Residuals for Accuracy                                   0.734     0.942     0.476     0.439           0.159
Levene Test              p value                         0.045     0.004     0.215     0.647           0.659
One Way ANOVA            p value                         0.346*    0.016     0.538*    0.553*          0.009
                         F-Value                         1.189     3.944     0.802     0.777           4.546



Table V
A summary of statistical analysis results: mean classification precisions across feature groups.
*p > 0.05: no significant difference between the means, so post hoc tests comparing each pair of feature groups were not conducted.

Classification Precision

Feature Group                    Statistic               KNN       SVM       LDA       Decision Tree   ANN
PSD (Group 1)                    Mean                    52.5541   63.7946   59.3272   55.6516         74.9217
STD (Group 2)                    Mean                    61.4446   63.7508   62.4725   57.5125         75.5959
Amplitude Mean (Group 3)         Mean                    55.2003   53.8529   56.0235   51.3051         64.4646
Shannon Entropy (Group 4)        Mean                    59.6348   60.8828   59.6274   55.0387         76.7013
Differential Entropy (Group 5)   Mean                    59.3776   65.1995   61.3463   53.6358         76.5264

ANOVA Assumption
Residuals for Precision                                  0.577     0.277     0.188     0.995           0.702
Levene Test              p value                         0.072     0.008     0.756     0.045           0.162
One Way ANOVA            p value                         0.434*    0.010     0.372*    0.393*          0.017
                         F-Value                         0.994     2.207     1.127     1.080           3.893



Table VI
A summary of multiple comparisons test results: feature comparison with post hoc tests.
* p > 0.05: no evidence of a difference between feature groups; the mean difference (I-J) for that pairwise comparison is not considered.
Each cell gives the mean difference I-J followed by the p value in parentheses. Rows are Feature I, columns are Feature J.

ANN Accuracy (Post Hoc Test: Tukey HSD, "Equal Variances")
Feature I   DE                   SE                   AM                   STD                  PSD
PSD         -3.2825* (0.883)     0.3281* (1.00)       10.5375 (0.049)      -0.8678* (0.999)     -
STD         -2.4146* (0.958)     1.1960* (0.997)      11.4054 (0.031)      -                    -0.8678* (0.999)
AM          -13.8200 (0.007)     -10.2093* (0.062)    -                    -11.4054 (0.031)     -10.5375 (0.049)
SE          -3.6107* (0.843)     -                    10.2093* (0.062)     -1.1960* (0.997)     -0.3281* (1.00)
DE          -                    3.6107* (0.843)      13.8200 (0.007)      2.4146* (0.958)      3.2825* (0.883)

SVM Accuracy (Post Hoc Test: Games-Howell, "Unequal Variances")
Feature I   DE                   SE                   AM                   STD                  PSD
PSD         -7.2458* (0.446)     -2.8033* (0.932)     4.5247* (0.717)      -4.9351* (0.810)     -
STD         -2.3107* (0.960)     2.1317* (0.953)      9.4598* (0.041)      -                    4.9351* (0.810)
AM          -11.7705 (0.015)     -7.3281 (0.009)      -                    -9.4598 (0.041)      -4.5247* (0.717)
SE          -4.4424 (0.391)      -                    7.3281 (0.009)       -2.1317* (0.953)     2.8033* (0.932)
DE          -                    4.4424* (0.391)      11.7705 (0.015)      2.3107* (0.960)      7.2458* (0.446)

ANN Precision (Post Hoc Test: Tukey HSD, "Equal Variances")
Feature I   DE                   SE                   AM                   STD                  PSD
PSD         -1.6046* (0.992)     -1.7796* (0.988)     10.4570* (0.051)     -0.6741* (1.00)      -
STD         -0.9304* (0.999)     -1.1054* (0.998)     11.1312 (0.049)      -                    0.6741* (1.00)
AM          -12.0617 (0.029)     -12.2367 (0.027)     -                    -11.1312 (0.049)     10.4570* (0.051)
SE          0.1749* (1.00)       -                    12.2367 (0.027)      1.1054* (0.998)      1.7796* (0.988)
DE          -                    -0.1749 (1.00)       12.0617 (0.029)      0.9304* (0.999)      1.6046* (0.992)

SVM Precision (Post Hoc Test: Games-Howell, "Unequal Variances")
Feature I   DE                   SE                   AM                   STD                  PSD
PSD         -1.4048* (0.999)     2.911* (0.979)       9.9417* (0.448)      0.0438* (1.00)       -
STD         -1.4486* (0.996)     2.8679* (0.921)      9.8979* (0.176)      -                    0.0438* (1.00)
AM          -11.3466 (0.036)     -7.02992 (0.012)     -                    -9.8979* (0.176)     -9.9417* (0.448)
SE          -4.3166* (0.530)     -                    7.02992 (0.012)      -2.8679* (0.921)     -2.911* (0.979)
DE          -                    4.3166* (0.530)      11.3466 (0.036)      1.4486* (0.996)      1.4048* (0.999)



Table VII
Averaged classification MCC for different classification techniques applied on the current data set based on different feature groups.

Classification MCC

Feature Group                    Statistic         KNN      SVM      LDA      Decision Tree   ANN
PSD (Group 1)                    Mean              0.171    0.247    0.256    0.185           0.482
                                 Std. deviation    0.17     0.17     0.14     0.17            0.13
STD (Group 2)                    Mean              0.202    0.266    0.254    0.122           0.502
                                 Std. deviation    0.20     0.13     0.09     0.05            0.13
Amplitude Mean (Group 3)         Mean              0.091    0.075    0.137    0.1110          0.318
                                 Std. deviation    0.09     0.02     0.04     0.05            0.06
Shannon Entropy (Group 4)        Mean              0.248    0.222    0.204    0.102           0.478
                                 Std. deviation    0.24     0.05     0.11     0.11            0.10
Differential Entropy (Group 5)   Mean              0.195    0.312    0.232    0.098           0.546
                                 Std. deviation    0.19     0.08     0.11     0.06            0.09



Table VIII
List of Abbreviations.

Abbreviation   Meaning                                             Notes
MI-BCI         Motor Imagery Brain Computer Interface              -
EEG            Electroencephalography                              -
ACC            Classifier accuracy                                 Performance Metric
LDA            Linear discriminant analysis                        Classifier
RFD            Regularized Fisher discriminant                     Classifier
SVM            Support vector machine                              Classifier
PSD            Power Spectral Density                              Signal Feature
ERD            Event-related desynchronization                     -
ERS            Event-related synchronization                       -
STD            Standard Deviation                                  Signal Feature
AM             Amplitude Mean                                      Signal Feature
SE             Shannon Entropy                                     Signal Feature
DE             Differential Entropy                                Signal Feature
CWT            Continuous Wavelet Transformation                   -
ANN            Artificial Neural Networks                          Classifier
K-NN           K-nearest Neighbor                                  Classifier
MLP-NN         Multilayer perceptron neural network                Classifier
FLD            Fisher's linear discriminant                        Classifier
PPV            Positive Predictive Value (Classifier Precision)    Performance Metric
TPR            True Positive Rate (Classifier Sensitivity)         Performance Metric
MCC            Matthews Correlation Coefficient                    Performance Metric
TP             True Positive                                       Actual class: Right-hand; Assigned class: Right-hand
FN             False Negative                                      Actual class: Right-hand; Assigned class: Left-hand
FP             False Positive                                      Actual class: Left-hand; Assigned class: Right-hand
TN             True Negative                                       Actual class: Left-hand; Assigned class: Left-hand

Table IX
List of Symbols.

Feature   Symbol     Meaning                                   Units
PSD       𝑋(𝑗)       Signal Data                               Volts
PSD       𝑁          Number of signal's samples                Sample
PSD       𝐾          Number; 𝐾 = 1,2,3, …                      none
PSD       𝑊(𝑗)       Data Window                               Volts
PSD       𝐿          Segment Length                            Sample
PSD       𝐴𝑘         Finite Fourier Transformation             Volts
PSD       𝐼𝑘         𝐾-Modified Periodogram                    W/Hz
PSD       𝑈          Power Data Window                         Volts²
PSD       𝑃̂          Power Spectral Density (PSD)              W/Hz
AM        𝑋̅          Amplitude Mean (AM)                       Volts
STD       𝝈          Standard Deviation (STD)                  Volts
SE        𝑊(𝑎, 𝜏)    Continuous Wavelet Transformation (CWT)   Volts
SE        Ψ𝑎,𝜏       Mother Wavelet                            Volts
SE        𝑎          Scaling Factor                            none
SE        𝜏          Shifting Factor                           none
SE        𝐸𝑖         Energy of Each Coefficient                Joules
SE        𝐸𝑡𝑜𝑡       Total Energy                              Joules
SE        𝑃𝑖         Relative Wavelet Energy                   none
SE        𝐸𝑇         Threshold Entropy                         none
SE        𝐸𝑙𝑜𝑔       Log Energy Entropy                        none
SE        𝑬𝑺𝒉        Shannon Entropy (SE)                      none
DE        𝑋          Random Variable                           none
DE        𝑓(𝑥)       Probability Density Function              none
DE        𝒉(𝑿)       Differential Entropy (DE)                 none
