
Exploring the possibility to employ a consumer-grade EEG-device as an educational tool in a machine learning course for physicists

THESIS

submitted in partial fulfillment of the requirements for the degree of

BACHELOR OF SCIENCE in

PHYSICS

Author: T.S. Pool
Student ID: S0141453
Supervisor: Dr. S. Semrau
2nd corrector: Prof. dr. K.E. Schalm

Leiden, The Netherlands, April 2018


Exploring the possibility to employ a consumer-grade EEG-device as an educational tool in a machine learning course for physicists

T.S. Pool

Huygens-Kamerlingh Onnes Laboratory, Leiden University
P.O. Box 9500, 2300 RA Leiden, The Netherlands

April 2018

Abstract

With data generation becoming increasingly complex and automatized as a result of technological developments, using computers to perform data selection, preprocessing and data analysis has become indispensable in many fields of physics and astronomy. Hence, acquiring some basic knowledge of machine-learning techniques should be an essential part of the curriculum of these subjects. However, courses on the subject are mainly aimed at future computer scientists. In this study, we explore the potential of using the Emotiv EPOC+, a consumer-grade EEG-device, as an educational tool in a hands-on machine learning course, tailor-made for physics and astronomy students. For this, we perform various experiments with a single subject, and use elementary neural networks to perform a binary classification to identify events in the self-produced EEG-data. We find that the Emotiv is capable of producing data with sufficient consistency within a single recording to detect blinks and full-arm motion with more than 90% accuracy. However, these results are not reproducible with the same neural network once the headset has been removed from the head between recordings. This means the networks have to be trained anew in order to classify events in new data. For the Emotiv to serve as an educational tool in a machine learning course, a better understanding of this difference in noise between recordings is necessary, and a standardized preprocessing to reduce noise should be developed.


Contents

1 Introduction

2 Machine learning and Physics

3 Machine learning and EEG-data

4 Emotiv EPOC+ and MNE-Python
4.1 Emotiv EPOC+ headset
4.2 EmotivPRO software
4.3 MNE-Python
4.3.1 The Raw and Epochs data structures

5 Experiments performed with Emotiv EPOC+
5.1 Additional software and instruments used
5.2 General setup of experiments
5.2.1 Goal of the experiments
5.2.2 Making a recording
5.2.3 Training the neural networks
5.2.4 General properties of neural networks
5.2.5 Elementary workings of neural networks
5.3 Detecting Blinks
5.3.1 Creating data
5.3.2 Bayesian classifier
5.3.3 Training perceptrons
5.3.4 Convolutional networks
5.3.5 Comparison of classification
5.3.6 Reading in a continuous data-stream
5.4 Detecting movements
5.4.1 Detecting single hand movements: optimizing parameters
5.4.2 Testing consistency between files
5.4.3 Discerning left- and right-hand movements
5.4.4 Analyze limits of detectable movements
5.4.5 Detecting continuous movement
5.5 Detecting sounds
5.6 Suggestions for further experiments
5.6.1 Explore noise reduction
5.6.2 Compare the performances of more types of neural networks
5.6.3 Explore unsupervised learning techniques
5.6.4 Explore multi-class classification

6 Conclusion

Appendices

A Emotiv EPOC+ Specifications


Chapter 1
Introduction

In this research project, we explore the possibility to use a consumer-oriented EEG device as an educational tool in a machine learning course. For this, the Emotiv EPOC+ is used, which comes with an easy-to-use EEG headset, a USB device to read the data into your personal computer via a Bluetooth connection, and supporting software to visualize, record and export data. This way we construct original data sets of EEG recordings. We then employ machine learning techniques to train neural networks to recognize patterns in the recorded EEG-data.

This thesis contains 3 main components:

• an account of creating a hands-on machine learning course for physics students, and an argument for choosing EEG-data as practice material.

• a description of the Emotiv EPOC+ and the Python packages used to analyze the data and build the neural networks.

• a description of the experiments performed with the EEG headset, and the results obtained by several kinds of neural networks.

This report is meant to give insight into the potential of the Emotiv EPOC+, as well as to serve as a practical guide for future users of the device.


Chapter 2
Machine learning and Physics

Even though here and there some paradigm shifts may take place, overall physics can be considered a cumulative enterprise. Using prior achievements as building blocks, it succeeds in creating more and more complex systems of knowledge. This complexity grows in theory construction (in particular mathematically) as well as in experimental undertakings, requiring ever more advanced instruments. Developments in technology allow scientists to probe deeper and deeper into the subatomic world as well as further and further into the universe, going far beyond detecting what the human eye can see. Making 'observations' is outsourced to machines. Besides having more sensitive and more varied sensory systems than humans, a great benefit of exploiting machines is that they allow these observations to be automatized, collecting data at high speed, 24/7.

Because of the tremendous amount of data and the complexity of the obtained information, not only the observation itself, but also the interpretation of the data now has to be outsourced to machines. This can take the form of preprocessing as well as classification.

The Large Hadron Collider at CERN produces about a million gigabytes of data every second, making it impossible (and insofar as possible, very costly) to process by human labour alone. Machine learning algorithms are used to decide in real time which data is potentially interesting for further analysis and which data to toss out. Beyond just preprocessing, computer vision similar to facial recognition, trained by machine learning algorithms, can be used to identify particle jets (narrow sprays of particles originating at collisions), and even to identify features within these jets [1]. Likewise, machine learning has important applications in quantum mechanics [2] and condensed matter physics [3], e.g. accelerating the search for new materials.


Many advancements in machine learning are driven by tech giants' commercial applications and the data explosion generated by them. Science can benefit from these developments. For example, a neural network used by NOvA (NuMI Off-Axis νe Appearance), an experiment designed to detect neutrinos, is inspired by the architecture of GoogleNet [4]. Exploiting these deep learning tools is also met with skepticism, since these machine learning algorithms work mostly like a "black box"; it is increasingly difficult to keep an intuitive insight into how exactly certain conclusions are reached. Therefore, the growing application of machine learning techniques to physical problems requires a constant effort, not only from computer scientists, but also from physicists themselves, to better understand the inner workings of such algorithms and to keep doing cross-checks on real data. Both the growing importance and the increasing complexity make it vital for any physics student to be at least familiar with the basic techniques and capabilities of the machine learning enterprise.

Courses on machine learning for physicists do exist [5] [6], but most of these employ existing data sets, available in bite-size chunks in many internet databases. This, however, omits a very important step in exploiting the power of machine-learning algorithms, which is producing data that is actually suitable to feed to a neural network. The way data sets are constructed affects the performance of different neural networks in a distinctive way [7]. Creating training sets entails producing consistent data to ensure equivalence between training and test samples, labeling samples of data without influencing the content of the samples themselves (preferably in an automated way, in order to be able to produce large data sets), and preprocessing the data. Also, when producing the data yourself, it is up to you to control what the neural network is actually training on, and to make sure it does not classify different kinds of noise, for example.

Hence, to enable students to produce their own data in a hands-on machine learning course, in this research project the potential of the EEG-device Emotiv EPOC+ as an educational tool is explored.


Chapter 3
Machine learning and EEG-data

One of the scientific fields where machine learning is extensively utilized is the preprocessing, analysis and classification of EEG data. Electroencephalography (EEG) is a method to monitor electrophysiological signals. It is a noninvasive method that uses electrodes placed on the scalp to record the electrical activity of the brain. EEG measures voltage fluctuations resulting from ionic currents within the neurons of the brain. Understanding how measured EEG data relates to certain quantities of the brain is of substantial value for medical purposes, for example to give insight into the effect of psychiatric conditions and to predict the effectiveness of possible treatments [8]. Also, because of its non-invasive nature and the fact that nowadays data recordings are easy to obtain, also for non-experts, EEG is a highly promising medium to create a brain-computer interface. Applications of this are the direct control of prosthetics and exoskeletons, and even partial recovery of spinal cord injuries by long-term training on a brain-machine interface-based gait protocol [9], but also more commercial applications like controlling gaming interfaces [10].

Because of the very small signal-to-noise ratio, it is often very hard to identify specific features by the naked eye, even with the help of more direct analysis techniques like using a Bayesian classifier or applying a power-spectral-density algorithm. On the other hand, with an EEG it is easy to collect a lot of information because of the high resolution and the use of multiple channels. This makes the data very suitable for analysis with the help of machine-learning algorithms.

So just as machine learning can be applied to preprocess and analyze EEG data, EEG data can also be used to understand machine learning [7]. Because it is relatively easy to manipulate and rich in information, it is suitable for comparing the performances of different types of networks.


Again, in this research we want to know whether a consumer-grade EEG-device such as the Emotiv EPOC+ is suited as an educational tool in a machine-learning course. The most obvious advantage of this tool is that it is affordable and has a plug-and-play setup. One of the goals of this research is to explore the disadvantages.

For this it is necessary to first investigate whether the data obtained by this device is consistent enough to classify certain basic events, like blinks or bodily movements. Reaching basic results will be necessary prior to any preprocessing, in order to determine the correctness and effectiveness of any preprocessing steps. Only thereafter can preprocessing like noise removal (both extra-cranial noise and perturbation of the data by unintended thoughts and movements) and artifact removal be examined to improve previously obtained results, possibly leading to EEG data suitable for pursuing more sophisticated classifications like identifying specific intentional states. In this thesis, however, no machine-learning techniques as applied to preprocessing, denoising and artifact detection will be discussed. The only preprocessing used during this research is the employment of functions implemented in the MNE package itself, which require virtually no programming to apply to the data.


Chapter 4
Emotiv EPOC+ and MNE-Python

This chapter aims to introduce the reader to the Emotiv EPOC+ headset and supporting software, and explains the basic features of MNE-Python, a free software package designed for handling human neurophysiological data in an accessible and efficient way. It can be read as a manual, enabling anyone new to the Emotiv and the MNE software to quickly produce original data and manipulate it in such a way that it can be used to train neural networks and explore machine learning techniques.

Figure 4.1: The Emotiv EPOC+ consists of 14 channels. The channels connect to the head via felt pads, saturated with a salt solution.


4.1 Emotiv EPOC+ headset

The Emotiv EPOC+ is a 14-channel EEG headset, allowing the user to produce high-resolution raw EEG data (fig. 4.1) [11]. Next to the EEG channels, it has 7 gyroscopic channels to monitor the tilt and movements of the head. The headset is connected via Bluetooth to a USB stick, which reads the recorded data into your personal computer. The sampling rate of the recording is 128 Hz, which means that each of the 14 channels collects 128 data points per second. Additional specifications can be found in appendix A.

4.2 EmotivPRO software

The EEG-device comes with a user interface, called EmotivPRO. This software allows the user to read in the data and visualize in real time the electric potential recorded by the 14 channels, the movements of the gyroscopic channels, and the frequency spectrum of the data via a Fast Fourier Transform (fig. 4.2).

Figure 4.2: The EmotivPRO software interface. It can show the Fourier transform of a specific channel (front), the electric potential of every channel (in µV) and a listing of the recordings to play back.

At any time a recording of the data can be started and stopped. To these recordings you can add certain 'markers'. To any button on the keyboard you can assign a marker value. Pushing this button will store the time and the value (the assigned marker id) of the marker in a separate marker file, which will automatically be exported along with the recorded EEG-data. Recordings can be exported as a '.csv' file or with a '.edf' extension (European Data Format). We will choose the latter, because MNE-Python comes with a straightforward read function that recognizes this format.

4.3 MNE-Python

MNE-Python, as part of the MNE software suite, is a free software package created for exploring, visualizing, and analyzing human neurophysiological data. It provides state-of-the-art algorithms implemented in Python and builds upon core libraries for scientific computation such as NumPy and SciPy. What follows below is just a description of the bare necessities for working with MNE, to sketch an idea of the structure of MNE-Python. Full documentation and detailed class descriptions can be found at the Martinos website https://martinos.org/mne/. More detailed descriptions of how to apply MNE-Python can be found in the numerous articles on the topic available on the internet [12].

4.3.1 The Raw and Epochs data structures

Analysis of EEG-data with MNE-Python typically involves the two basic data structures in this library: the Raw and Epochs objects.

An object of the Raw class is automatically created when the read function is employed to load raw EEG data. The core structure of the object is a 2D Numpy array (n channels x n data-points). The number of data points is equal to the length of the recording multiplied by the sampling rate (n seconds x 128 Hz). It has several attributes, such as an info object, which is a dictionary containing the measurement info, a list of strings containing the channel names and an array of time points. Furthermore, the class contains ample methods for manipulating and plotting the Raw object.

From the Raw object you can extract a collection of time-locked trials (events) and store these in an Epochs object. The basic structure of this object is a 3D Numpy array of shape (n events, n channels, n data-points). This can be used to create training and test sets to feed to the neural network to be trained.

Because the Emotiv software does not produce an events list that is recognized by the Epochs class, we have to create this by hand. Next to the channels that contain the actual EEG data, the Raw object also includes a so-called marker channel, which gives us the times the events occurred and the corresponding event ids (the value assigned to the event). Using this information an events object can be created, which is basically a 2D Numpy array, optionally accompanied by a dictionary containing the values of the event ids and the corresponding meaning of each value, i.e. the event assigned to the id. Having done this, an Epochs object is created by calling the class, with the events object passed as a variable.
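As an illustration, here is a minimal sketch of how such an events array could be built from the marker channel and passed to the Epochs class. The file name 'recording.edf', the channel name 'MARKER' and the event-id values are assumptions made for this example; the helper function actually used in this research is listed in the Appendix.

import numpy as np
import mne

# Sketch: build an MNE events array from the Emotiv marker channel.
# Assumes the marker channel is called 'MARKER' and holds the assigned marker
# value at the sample where the corresponding key was pressed (0 elsewhere).
raw = mne.io.read_raw_edf('recording.edf', preload=True)
marker_idx = raw.ch_names.index('MARKER')
marker_data = raw.get_data()[marker_idx]

onsets = np.nonzero(marker_data)[0]               # sample indices where a marker was set
ids = marker_data[onsets].astype(int)             # the assigned marker values
events = np.column_stack([onsets,
                          np.zeros_like(onsets),  # unused 'previous value' column
                          ids])                   # shape (n_events, 3), as MNE expects

epochs = mne.Epochs(raw, events=events, event_id=dict(blinks=1, blanks=2),
                    tmin=-0.5, tmax=0.5, preload=True)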

Furthermore, MNE-Python encompasses a tremendous amount of implemented functionality to manipulate and visualize the imported data stored in a Raw or Epochs object, which will not be employed during this research. Some examples of the effects of preprocessing will be given, but in a machine-learning course we would want the preprocessing to be done by neural networks as well, for instance by using a denoising auto-encoder. The implemented methods in the MNE library are basically a black box for the user, and exploiting these too much just for the sake of improving the accuracy of the network might not be very instructive.


Chapter 5
Experiments performed with Emotiv EPOC+

To explore the possibility of using this EEG-headset as a tool to create original data that can be utilized in a course on machine learning, some practical research is done to investigate the workings and the limits of the device. The experiments are also meant to serve as instructive examples.

5.1 Additional software and instruments used

The code to process and manipulate the raw EEG data imported from the EmotivPRO software should be written in Python 2.7 [13]. The reason for this is that the visualization of the data using the MNE-package is only supported in Python 2, and not yet available in Python 3.

For creating models of different kinds of networks, it is very efficient to use Keras, an open-source library designed to easily build and train neural networks, also written in Python [14]. Examples of such networks exploiting Keras can be found in the next sections, which describe several machine-learning experiments in detail.

For these experiments, the JupyterLab interface is used as a computational environment [15].

All the computation is done on an Apple notebook (1.8 GHz Intel Core i7 processor, 4 GB 1333 MHz DDR3 memory). Although the computation time for training the networks was not excessive (up to several minutes), it is advisable to use a more powerful computer, especially when multiple networks have to be trained in a sequence while step-wise varying one or more parameters in search of an optimum.


5.2 General setup of experiments

The main goal of this research project is to examine the possibility of using the Emotiv EPOC+ to personally produce data sets that are suitable for use in a machine learning course, i.e. data sets that can be fed to a relatively basic neural network and produce significant results in a straightforward way. It is important that this EEG-device allows the user (i.e. students) to obtain easy and intuitive results that serve as a starting point for possible improvements and more complex experiments that also require more refined networks. Therefore all analysis and results presented below are based on data from experiments done by and on the researcher himself. No other professionals, participants or data from open data sets were used.

The experiments were performed at two different locations: the office location at the Gorlaeus Laboratories, and a home environment. In analyzing the experiments, the location where the data was taken is not taken into account, although it cannot be excluded that differences in background noise might have influenced the results. In general no effort is made to minimize background noise produced by EM radiation from electronic devices. However, a basic notch filter is applied to all the data to remove the power-line hum that was clearly present at 50 Hz.

All experiments were done with the subject in a similar sitting position, cautiously minimizing bodily movements (that were not part of the specific experiment) while recording data, to prevent unnecessary noise or contamination of the data. It should be mentioned, however, that this research is performed by, and hence the presented data was taken on, a person suffering from a complete high spinal cord injury, which means that all muscles in trunk and lower limbs are paralyzed. This might have caused potential noise from involuntary muscular activity to be lower than in case the data was taken on an able-bodied subject. Therefore it could in principle be possible that repeating the performed experiments on data taken on other subjects could lead to slightly different results. This, however, should not influence the general conclusions drawn from the results presented below.

5.2.1 Goal of the experiments

The practical goal in performing the experiments and creating data is to produce data sets containing samples of labeled data in which a chosen event is known to have occurred. A neural network can then be trained to distinguish either data representing an event from a sample of blank data (in which no specific event is known to have taken place), or to distinguish two different events. So in all experiments a network is trained on a binary classification, i.e. distinguishing samples from just two different classes. If a network is trained to recognize the occurrence of a chosen event, a continuous data stream can then be read in window by window, with a step size that can be optimized after investigation. Every window can be fed to a trained network as a data sample, to determine whether the specific event took place within that time window.

Unfortunately, the software accessible during this research does not allow for real-time analysis of recorded data. The data has to be recorded by the EmotivPRO software and, after saving, be exported in the .csv or, in the current research, the .edf format for further analysis at a later time.

5.2.2 Making a recording

After making sure the felt pads of the sensors are fully saturated with a salt solution (a basic physiological salt solution or any contact lens solution), the headset can be slid over the skull until the reference pads are at the correct location behind the ears. After this the individual sensors should be repositioned or moved under the hair until the connectivity help of the Emotiv software confirms the connectivity is 100%.

When starting a recording there is always the option to include a baseline recording, implemented in the software. If chosen, the user is instructed to sit still for 15 seconds, first with eyes open and then again with eyes closed. Markers are automatically placed by the software, so no hand-made labelling of the baseline recording is needed. Although in the current preliminary research this convenient tool is not exploited, the results presented below indicate that for any noise reduction it is of paramount importance to always start a new recording with a baseline recording. This is because the noise seems to differ significantly with every recording. Consequently, when feeding a new recording into a network to search for the occurrence of events, the network first has to learn again what a blank sample of that particular recording looks like. This also means that in actively reducing noise, the noise has to be sampled within every single recording, and cannot be fully removed by a standard preprocessing procedure.


5.2.3 Training the neural networks

Using the Epochs class, an array containing the samples can be created. As mentioned, each sample stored in the Epochs object has a 2D shape: (n channels x n data-points). The samples are then divided in an 80/20 proportion to make a training set and a test set. For training the neural networks, 2 approaches are chosen (a short sketch of both input shapes follows the list):

1. The 2D shape of the samples can be linearized. For this, the data of all 14 channels are put front to back. The result is a 1D array of length (14 x sample-length). This array can be fed to a 1D perceptron.

2. The 2D samples can serve as input to a 2D convolutional network.
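As a minimal sketch (assuming `epochs` was created as described in the previous chapter), the two input shapes could be prepared as follows:

import numpy as np

# Sketch of the two input shapes used in this chapter.
samples = np.array(epochs)                    # shape (n_events, 14, n_times)

# 1. linearized samples for a 1D perceptron: the 14 channels are put front to back
x_flat = samples.reshape(len(samples), -1)    # shape (n_events, 14 * n_times)

# 2. 2D samples with a trailing 'image channel' axis for a 2D convolutional network
x_conv = samples[..., np.newaxis]             # shape (n_events, 14, n_times, 1)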

5.2.4 General properties of neural networks

To give a basic idea, the components and features of the most commonly used neural networks built with the Keras library are explained here. In code, such a network in general looks like this example:

import keras
from keras.models import Sequential
from keras.layers import Conv2D, Dropout, Flatten, Dense

batch_size = 32
epochs = 40

# convolutional layer followed by two dense layers and a sigmoid output
model = Sequential()
model.add(Conv2D(10, kernel_size=(14, 20),
                 activation='relu',
                 input_shape=(14, 129, 1)))
model.add(Dropout(0.45))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.45))
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.45))
model.add(Dense(1, activation='sigmoid'))

model.compile(loss='binary_crossentropy',
              optimizer=keras.optimizers.Adadelta(),
              metrics=['accuracy'])
model.summary()

histories_3 = Histories()  # custom callback (defined elsewhere in the thesis code) that records the accuracies per epoch
model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          callbacks=[histories_3],
          validation_data=(x_test, y_test))

To go step by step through the terminology:

Batch size: the number of samples that are propagated through the network before the weights are updated. Choosing a higher batch size will increase the speed of the training process. However, because the weights are then updated less frequently, you might need more training rounds (epochs).

Epochs: in Keras, the number of epochs denotes the number of times the entire dataset is passed through the network during the full training of the network. It is unfortunate that in the terminology of machine learning 'epoch' thus refers to two things: in MNE-Python it is also the name for the data recorded around an event.

Conv2D: this denotes that we are using a convolutional network.

kernel_size: the size of the kernels or 'filters' that slide over the data. During the forward pass, each filter is 'convolved' across the width and height of the input data, computing the dot product between the entries of the filter and the input and producing a 2-dimensional activation map of that filter. As a result, the network learns filters that activate when they detect some specific type of feature in the input.

activation: the type of activation function that is used. The output of a layer is converted by this function to determine the amplitude of the output. 'relu' stands for rectified linear unit:

R(x) = x+ = max(0, x)

For the output of the final layer the sigmoid function is used, because we are training the network to do a binary classification.

Dropout: the fraction of nodes that is randomly ignored (set to zero) during each training update. This can be used to prevent overtraining and to help the network avoid getting stuck in local minima.

Flatten: converts the 2D structure of the network to a single dimension, so it can serve as input for the dense layers that follow. Dense layers are the standard building blocks of a 1D perceptron.

loss: the loss function is used to measure the inconsistency between the predicted value f(~x) and the actual label y. The network learns by minimizing the loss function. During this research, the binary cross-entropy function is used:

L(f(~x), y) = −y ln(f(~x)) − (1 − y) ln(1 − f(~x))


The function f(~x) represents the dot product of the input vector ~x with the weights of the neurons in the network. This will become clearer in the example below. The network learns by adjusting the weights, after feeding one batch of samples to the network, in such a way as to minimize the loss function. This is done by calculating the gradient of the loss function.
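To make the behaviour of this loss concrete, here is a tiny numeric illustration (the prediction values 0.9 and 0.1 are made up for the example):

import numpy as np

# Binary cross-entropy for a single sample, as defined above.
def bce(prediction, label):
    return -label * np.log(prediction) - (1 - label) * np.log(1 - prediction)

print(bce(0.9, 1))   # confident and correct prediction -> small loss (about 0.11)
print(bce(0.1, 1))   # confident but wrong prediction  -> large loss (about 2.30)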

optimizer: the learning rate of the network is a parameter that determines by how much the weights are adjusted after the gradient of the loss function has been calculated. An 'optimizer' adjusts this learning rate in a smart way. Usually, a high gradient means the loss function is not yet close to a minimum, and a big learning rate can be used without the risk of overshooting the minimum. Adadelta ('Ada' stands for 'Adaptive') also takes previous gradients into account when adjusting the learning rate.

5.2.5 Elementary workings of neural networks

An artificial neural network is a circuit composed of artificial neurons, or nodes. The connections between these nodes are modelled as weights, which can take positive and negative values; a large positive weight represents an excitatory connection. When feeding an input to a neural network, all input values are modified by the weights of the nodes and summed. A single bias value can then be added as a last parameter to determine the single output value. Finally, an activation function determines the amplitude of the output.

To explain these basic workings of neural networks, it might be instructive to go through a simple example. Let's say we want to build a neural network that works as an AND gate. This means we have only two input nodes, and the possible combinations and required output can be represented in the following table:

input    output
0 0      0
0 1      0
1 0      0
1 1      1

So in this network there are just two input nodes that can only take binary values. We have to find two weights that give us the desired output for every possible combination of inputs. The input nodes can be represented in a vector of length two, just like the weights. In formula form, to go from the input vector ~x = (x1, x2) to the single output value y:


f(~x) = H(w1 · x1 + w2 · x2 + b) = y

In this formula, b is the bias parameter, and H(n) our activation function, in this case the Heaviside function:

H(n) = 0 if n < 0
H(n) = 1 if n ≥ 0

It is easy to see that good values of the parameters would be:

w1 = 0.5, w2 = 0.5, b = −0.75

Now, in a more complex network, an algorithm is used to calculate the values of the weights and the bias. The initial values of the weights are randomly generated with a value between 0 and 1. As explained, the batch size determines after how many samples the weights are updated. For this optimization of the weights, the loss function and its gradient are used, as explained above.
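As a quick sanity check, the hand-picked parameters above can be verified with a few lines of Python (a sketch, not part of the thesis code):

# Verify the AND-gate parameters chosen above.
w1, w2, b = 0.5, 0.5, -0.75

def heaviside(n):
    return 1 if n >= 0 else 0

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, '->', heaviside(w1 * x1 + w2 * x2 + b))   # prints 0, 0, 0, 1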

As another example, using the building blocks of Keras, we can train a neural network to function as an XOR gate. Going through this example might give some insight into how a network is coded in Keras.

The possible inputs and corresponding outputs of the XOR gate are shown in the table below.

input    output
0 0      0
0 1      1
1 0      1
1 1      0

Now, in contrast to the simple AND gate, this network cannot be solved by a single-layer perceptron, so we need to add a so-called 'hidden' layer. Schematically, the network we use can be pictured as in figure 5.1.


Figure 5.1: Schematic picture of the XOR network. The biases are written as W00 and W10, with input value 1.

In Keras, this network is coded as follows:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# all four possible inputs are used both for training and for testing
x_train = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y_train = np.array([0, 1, 1, 0])
x_test = x_train
y_test = y_train

batch_size = 4
epochs = 500

model = Sequential()
model.add(Dense(2, activation='relu', input_shape=(2,)))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='AdaDelta',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.summary()

model.fit(x_train, y_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, y_test))

We use all the possible inputs as training inputs as well as for the test set. The batch size is set to 4, so the weights are updated every time all the possible inputs have gone through the network. Running this code gives the following output:

4 train samples
4 test samples
Sample lenght: 2, i.e. 0 seconds
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_65 (Dense)             (None, 2)                 6
_________________________________________________________________
dense_66 (Dense)             (None, 1)                 3
=================================================================
Total params: 9
Trainable params: 9
Non-trainable params: 0
_________________________________________________________________


As you can see there are 6 parameters in the first layer, and 3 parameters in the second layer. These are the weights shown in figure 5.1. The effect of the training process on the performance of the network is shown in figure 5.2.

Figure 5.2: Performance of the double-layer perceptron training on the XOR-problem

We can print the final weights of the network, and recalculate by handthe outputs from the four possible inputs.

The weights are:

W00 = −0.52, W01 = 0.0, W02 = 0.52, W03 = 0.35, W04 = 0.52, W05 = 0.35

For the second layer:

W10 = −0.22, W11 = −2.10, W12 = 1.4

Let's say ~x = (1, x1, x2). The first element is added as input for the bias, for computational simplicity. Then:


~x1 = (1, 0, 0), ~x2 = (1, 0, 1), ~x3 = (1, 1, 0), ~x4 = (1, 1, 1)

To check the result of the network by hand, we calculate the output, starting with ~x1. First we have to determine the values of a1 and a2 as in figure 5.1:

a1 = R(W00 · 1 + W02 · 0 + W04 · 0) = R(−0.52 · 1 + 0.52 · 0 + 0.52 · 0) = R(−0.52) = 0

a2 = R(W01 · 1 + W03 · 0 + W05 · 0) = R(0 · 1 + 0.35 · 0 + 0.35 · 0) = R(0) = 0

Then the final output, using the sigmoid function S and writing ~a1 = (1, a1, a2), is:

y1 = S(~W1 · ~a1) = S(W10 · 1 + W11 · 0 + W12 · 0) = S(−0.22) = 0.44 → 0

We can do the same for the other possible inputs:

y2 = S(~W1 · ~a2) = S(W10 · 1 + W11 · 0 + W12 · 0.35) = S(0.27) = 0.57 → 1

y3 = S(~W1 · ~a3) = S(W10 · 1 + W11 · 0 + W12 · 0.35) = S(0.27) = 0.57 → 1

And for the final one:

y4 = S(~W1 · ~a4) = S(W10 · 1 + W11 · 0.52 + W12 · 0.7) = S(−0.22 − 2.1 · 0.52 + 1.4 · 0.7) = S(−0.33) = 0.42 → 0

So you can see that this network actually learns quite slowly. Only after more than 250 epochs does it produce the correct output, and when we check the result by hand after 500 epochs, the output is only just within the limits.

Now, of course, this is the simplest network that can learn to function as an XOR gate. We could adjust the second layer to have 64 nodes. Then the network has in total (3 x 64) + 65 = 257 trainable parameters, and it learns much faster, as is shown in figure 5.3. Also, how fast the network trains towards a perfect performance depends on the initial values of the weights, which have a random value when the network starts training. That is why we find different performances over multiple runs.

Figure 5.3: Performance of the double-layer perceptron with 64 nodes over 3 runs, training on the XOR-problem
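For reference, a minimal sketch of this wider variant (only the hidden-layer size changes with respect to the XOR code above):

from keras.models import Sequential
from keras.layers import Dense

# XOR network with a 64-node hidden layer.
model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(2,)))   # (2 + 1) * 64 = 192 parameters
model.add(Dense(1, activation='sigmoid'))                    # 64 + 1 = 65 parameters
model.compile(optimizer='AdaDelta', loss='binary_crossentropy', metrics=['accuracy'])
model.summary()   # reports 257 trainable parameters in total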

5.3 Detecting Blinks

For a first attempt to create data suitable for a network to train on, a recording is made in which blinks are labeled. The main reason to choose 'blinks' as a first event to train on is that their occurrences are easily discernible by the naked eye as well, so we can check by visualization whether the labeling process, and the code that manipulates the data into a stack of training samples that can serve as input for a network, is performing as it should. The goal is to train a network to recognize a blink when it occurs in a continuous stream of data. For this, the network has to learn to distinguish a blink from a 'blank' piece of EEG data where no blinking took place.

5.3.1 Creating data

Two markers are assigned: the '1' button of a regular computer keyboard marks a (conscious) blink, and the '2' button is used to mark a piece of EEG data where no blinking occurred. We created a recording of 294 seconds, containing in total 55 events: 28 blinks, and 27 marked fragments of data where no blinking occurred. This can for example be done as follows:


import mne

# create a Raw object by reading in the EEG data
blinks_raw = mne.io.read_raw_edf(filename)

# Define the time points before and after the marker where the raw file will
# be cut. So here every epoch will contain 2 seconds of data, i.e. 257 data points.
tmin = -1
tmax = 1

# define by hand the values assigned to the events
event_id = dict(blinks=0, no_blinks=1)

# call the function that creates the events object (full code in Appendix)
events = create_events(markers)

# pass the events object as a variable to the Epochs class, to create the array of samples
epochs = mne.Epochs(blinks_raw, events=events,
                    event_id=event_id,
                    tmin=tmin, tmax=tmax, preload=True)

This gives the following output:

64 matching events found

0 projection items activated

Loading data for 64 events and 257 original time points ...

0 bad epochs dropped

From this Epochs object we can, with some manipulation, create a training set and a test set, and a 1D Numpy array containing the correct labels for the supervised learning that we intend to do. For example:

import numpy as np

# convert objects to numpy arrays
samples = np.array(epochs)
labels = np.array(events[:, 2])   # the third column contains the labels

# shuffle the samples and labels with the same seed, before dividing into train and test set
np.random.seed(SEED)
np.random.shuffle(samples)
np.random.seed(SEED)
np.random.shuffle(labels)

# divide samples into a training and a test set (80/20)
split = int(len(samples) * 0.8)
x_train = samples[:split]
x_test = samples[split:]
y_train = labels[:split]
y_test = labels[split:]

A couple samples from the training set are plotted below:


(a) 3 samples from the set of 'blinks'
(b) 3 samples of 'blank' data

Figure 5.4: Samples from both categories to be distinguished by the neural network. Clearly the data shows quite some differences between the recorded blinks, but they show enough similarity, and difference with the 'blank' data, that we may expect a network to recognize this.

5.3.2 Bayesian classifier

Since the blinks are easily discernible by eye as well, the most straightforward way of identifying a blink in a sample of data is to apply a Bayesian classifier. In general, a Bayesian classifier is a rule that assigns to an observation a 'best guess' or estimate of what the unobserved label corresponding to that observation actually was. In the case of identifying blinks in a sample of data, we can, for example, just look at the maximum value of the EEG signal, both for the samples where a blink occurred and for the blank samples taken in between blinks.

Plotting some samples of blinks and blanks of channel T7:


Figure 5.5: Some samples of the signal recorded by channel T7

Plotting the maximum values of all the 272 samples of the training set in a histogram can then visualize a value to choose as a classifier, as shown in figure 5.6.

Figure 5.6: Histogram of maximum values of samples of channel T7

By inspecting the plots, we conclude that a maximum value of 6 will serve well as a Bayesian classifier. The samples in the test set can now be classified. The results are shown in the following confusion matrix:


                   blinks   blank
blinks predicted     34       3
blanks predicted      3      29
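As an illustration, the threshold classifier and its confusion matrix could be computed along these lines (a sketch; `t7_samples` and `true_labels` are hypothetical names for the T7 test samples and their labels, with label 0 denoting a blink as in the event_id above):

import numpy as np

# Classify a sample as a blink when the maximum of channel T7 exceeds the threshold.
threshold = 6
predicted_blink = t7_samples.max(axis=1) > threshold
true_blink = (true_labels == 0)

confusion = np.array([
    [np.sum(predicted_blink & true_blink),  np.sum(predicted_blink & ~true_blink)],
    [np.sum(~predicted_blink & true_blink), np.sum(~predicted_blink & ~true_blink)],
])
print(confusion)   # rows: blinks/blanks predicted, columns: actual blinks/blanks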

5.3.3 Training perceptrons

The samples are now linearized so a perceptron can be trained. We choose to take a time window of 0.5 second around every blink, which equals 65 data points. After putting all the channels behind each other, the total length of a sample is then 14 x 65 = 910 data points. This will be the shape of the input layer of the perceptron. The following model is trained on a recording containing ± 100 blinks and 100 labeled blank samples:

# Single layer perceptron
batch_size = 8
epochs = 15   # number of times the model is trained on the total training set

print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')

model = Sequential()
model.add(Dense(1, activation='sigmoid', input_shape=(910,)))
model.summary()
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])
model.fit(x_train, y_train, epochs=epochs,
          batch_size=batch_size, verbose=1)

score = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', score[0])
print('Test accuracy:', score[1])

In the fit method of the model class, the argument verbose=1 is passed. The same is done for the evaluate method. This makes it possible to quickly examine the properties of the model, and to follow the improvements after every training epoch. To illustrate, the above model gives the following output over the first couple of epochs:

173 train samples
44 test samples
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
dense_38 (Dense)             (None, 1)                 911
=================================================================
Total params: 911
Trainable params: 911
Non-trainable params: 0
_________________________________________________________________
Train on 173 samples, validate on 44 samples
Epoch 1/15
173/173 [==============================] - 1s 4ms/step - loss: 0.5908 - acc: 0.7572 - val_loss: 0.5293 - val_acc: 0.7045
Epoch 2/15
173/173 [==============================] - 0s 457us/step - loss: 0.3886 - acc: 0.8439 - val_loss: 0.6281 - val_acc: 0.7273
Epoch 3/15
173/173 [==============================] - 0s 498us/step - loss: 0.3025 - acc: 0.8902 - val_loss: 0.6578 - val_acc: 0.7727
Epoch 4/15
173/173 [==============================] - 0s 461us/step - loss: 0.2654 - acc: 0.9191 - val_loss: 0.5696 - val_acc: 0.7045
Epoch 5/15
173/173 [==============================] - 0s 504us/step - loss: 0.2057 - acc: 0.9538 - val_loss: 0.6657 - val_acc: 0.7273

The validation accuracy of 72.73% is calculated by dividing the number of correct classifications by the total number of classifications (i.e. the number of samples in the test set).

The following two models are also trained and their performance is compared, to see if adding layers improves the accuracy (taking into account that training a multilayer perceptron (MLP) takes more computation time, and might be less effective even though it might need fewer epochs to reach sufficient accuracy):


# double layer perceptron
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(910,)))
model.add(Dense(1, activation='sigmoid'))

# multi layer perceptron
model = Sequential()
model.add(Dense(128, activation='relu', input_shape=(910,)))
model.add(Dense(16, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

Over 15 epochs the following results are obtained:

Figure 5.7: The accuracy of 3 perceptrons plotted against the increasing number of epochs.

Clearly the multi-layer perceptron performs significantly better than the single-layer one. The next step is to train the network on one or more recordings of blinks, and then test whether the network is capable of recognizing blinks when reading in a separate file on which it was not trained. Two other files, both containing ±120 samples, are used to check the consistency between different recordings.


Figure 5.8: Training on one file, testing on 2 other files

Clearly the performance is not as good as when testing on blinks within the same file. But considering the perfect accuracy on the training set, it is reasonable to assume that the model over-trains fairly quickly.

5.3.4 Convolutional networks

Keeping the samples in their 2D shape, the following convolutional network is trained:

model = Sequential()
model.add(Conv2D(4, kernel_size=(14, 20), activation='relu',
                 input_shape=(14, 65, 1)))
model.add(Flatten())
model.add(Dense(8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

The width of the kernel is an important variable here. As can be seen in the convolutional layer, this model trains 4 kernels of width 20.

First the model is trained on each of the 3 files individually, and the performance is checked on test samples taken from the same file as the training samples. The accuracy is 100% in all three cases.


Figure 5.9: The performance of the convolutional network on 3 independent files

Again we train the model on one recording, and test the performance on the 2 other files. The results are plotted in figure 5.10.

Figure 5.10: The performance of the convolutional network is plotted against the number of epochs. The network is trained on file 1, and then tested on file 1, file 2 and file 3.

Since the convolutional network performs better than a perceptron, this type of network is used in trying to locate blinks when reading in a continuous data stream.


5.3.5 Comparison of classification

Now, using the samples from all the available recordings, we can compare the performance of the Bayesian classifier and the different neural networks that are used. We train on 270 samples and test on 69 samples. Both networks are trained for 5 epochs.

                        misclassifications   accuracy (%)
Bayes classifier               6                  91
Perceptron                     2                  97
Convolutional network          0                 100

5.3.6 Reading in a continuous data-stream

The next step is to train the network on one or more recordings of blinks, and then test whether the network is capable of recognizing blinks when reading in a separate file on which it is not trained.

The network is trained using the convolutional network shown in the previous section. Next, a raw file containing blinks that were not used as training samples is read in, window by window. Every window is evaluated by the trained network and classified as a blink or a blank sample. After classifying, the window advances 5 data points, and another sample of the data is taken and classified. This means that the algorithm checks for the occurrence of a blink about 25 times a second. The assigned class, 0 or 1, is marked with an orange cross (x) and plotted in figure 5.11. For visualization, the actual data of one channel (channel AF4) is also shown.


Figure 5.11: Classification of data on the occurrence of blinks. The orange line at zero is in fact a row of crosses, which indicate pieces of data classified as 'blank'.

We see that the algorithm detects the blinks as expected.
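A minimal sketch of this sliding-window classification (assuming `data` is the (14, n_times) array of the new recording and `model` is the trained convolutional network with input shape (14, 65, 1)):

import numpy as np

window = 65   # 0.5 s of data at 128 Hz, the sample length the network was trained on
step = 5      # advance 5 data points per evaluation, i.e. about 25 checks per second

predictions = []
for start in range(0, data.shape[1] - window, step):
    sample = data[:, start:start + window]
    sample = sample[np.newaxis, ..., np.newaxis]            # shape (1, 14, 65, 1)
    predictions.append(model.predict(sample)[0, 0] > 0.5)   # True where a blink is detected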

5.4 Detecting movements

To investigate the ability to detect patterns in EEG data stemming from bodily movements, several experiments are done, with different kinds of movements.

First we discuss one experiment, and explore the effects of changing parameters of the networks. Then several other experiments are done, with more or less the same network.

5.4.1 Detecting single hand movements: optimizing parameters

The basic experiment for this first exploration is as follows: one hand is resting on the keyboard, with the index finger on the '1' button and the middle finger placed on the '2' button, labeling a movement. The free arm makes a sharp outward movement at the moment the '1' button is pressed with the other hand. The reason for choosing this setup is that it makes it possible to do the same experiment with eyes closed, because the fingers of one hand are already on the labeling buttons.

Some samples of this experiment are shown in figure 5.12. It is hard to identify by eye a pattern in this data that would correspond to a movement of the arm. In general, surprisingly enough, there seems to be more random noise in the blank samples than in the samples where an event occurred.

Figure 5.12: Raw data of all channels. Two samples taken around a movement and 2 blank samples are shown for comparison.

Perceptron

The 14 samples from the 14 channels corresponding to a single event are linearized, and several perceptrons are trained to compare the performance. A time window of 1 second around the markers is used to capture the events: t-min = -0.5, t-max = 0.5. The input shape for the network is then 14 channels x 129 Hz x 1 sec = 1806.

The very first significant thing to do is to remove the power-line hum and the slow drifts. This can easily be done by adding a high-pass filter that removes all frequencies below 1 Hz, and a notch filter of width 2 at 50 Hz.

Figure 5.13: Plot of the Power Spectrum Densities of a recording of actual EEG-data (left) and of a recording made on a dummy head (right). On both recordings the power-line hum at 50 Hz is clearly visible.
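For reference, these filtering steps can be applied with MNE's built-in methods (a sketch, assuming the Raw object `raw` was loaded with preload=True):

# high-pass filter: remove slow drifts below 1 Hz
raw.filter(l_freq=1.0, h_freq=None)
# notch filter of width 2 at the 50 Hz power-line hum
raw.notch_filter(freqs=50, notch_widths=2)
# inspect the power spectral density, as in figure 5.13
raw.plot_psd()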

To see the difference in performance on raw and filtered data, a fairly basic 2-layer network is trained:

model = Sequential()
model.add(Dense(256, activation='relu', input_shape=(len(x_train[0]),)))
model.add(Dense(1, activation='sigmoid'))

(a) Result of perceptron training on raw data.
(b) Result of perceptron training on filtered data.

Figure 5.14: 2 plots comparing the difference in performance of a network trained on detecting movements in raw and filtered data.

Several things can be deduced from figure 5.14. First of all, it is important to notice that there are some deviations between the different runs, which is due to the fact that the initial weights of the network are assigned random values between 0 and 1. From that starting point the network improves by adjusting the weights after every epoch, until the error function reaches a (local) minimum. Also, once perfect accuracy is reached on the training set, the network does not improve anymore, and the performance on the test set stabilizes as well. This is something that can in principle be overcome by creating more data, so that it is harder for the network to reach a perfect classification on the training set.

Figure 5.15: Performance of 3 perceptrons with multiple layers

Although the differences are not that significant, the above plot again indicates several things: in minimizing the error function, the algorithm seems to get stuck in a local minimum, after which the improvement comes to a halt. Also, although the training performance of the 4-layer perceptron is the best, its accuracy in classifying the test set is the lowest, which indicates over-training. The accuracy on the training set also reaches 100% around 15 epochs, after which it cannot improve anymore, and this limits the accuracy on the test set as well. Both these problems can be mitigated by adding one or more drop-out layers to the network, which randomly ignore a fraction of the nodes during every training update. In finding the optimal parameters for a network, however, it is important to stress here that the data, although containing 300 samples, is still quite cheap. The test set consists of just over 50 samples, which means that a misclassification of a single sample leads to a drop in the accuracy of 2%. Before adding the drop-out layers, a sanity check is done to see whether we are actually training on the movements. A network is trained to distinguish the first half of the movement samples from the second half, and also to distinguish alternating samples. The result is shown in figure 5.16.


Figure 5.16: A perceptron is trained to distinguish samples all labelled as data corresponding to a movement of the arm.

The network still trains very well (figure 5.16), which again indicates that the data is 'cheap': the network simply learns to recognize every individual sample instead of training on general features.

Adding 2 dropout layers to the multi-layer perceptron also makes it possible to train over more epochs, since it takes the network more epochs before it starts over-training. The following 3-layer perceptron is trained over 120 epochs, varying the drop-out between 0.25 and 0.75:

model = Sequential()
model.add(Dense(512, activation='relu', input_shape=(len(x_train[0]),)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
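One way to perform this sweep is to wrap the model above in a small helper and retrain it for each drop-out value; a sketch, where build_model is a hypothetical helper returning the perceptron above with the given rate, and the batch size of 32 is an assumption:

results = {}
for rate in [0.25, 0.5, 0.75]:
    model = build_model(rate)                        # the 3-layer perceptron above, with drop-out = rate
    history = model.fit(x_train, y_train, epochs=120, batch_size=32,
                        validation_data=(x_test, y_test), verbose=0)
    results[rate] = history.history['val_acc'][-1]   # final validation accuracy per drop-out rate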

Local minima are now mostly avoided. Although the differences are not that significant, the network still over-trains when using a small drop-out value. When using a high drop-out, the network no longer trains efficiently, as is clear from figure 5.17.


Figure 5.17: Performance of 3-layer perceptrons with 3 different values for the drop-out

We look at the confusion matrices corresponding to the performances shown in the graph above.

                 dropout 0.25       dropout 0.5        dropout 0.75
                 move   no move     move   no move     move   no move
move pred.        22      2          25      2          24      2
no move pred.      3     32           0     32           1     32


Because the test set contains only 59 samples, the difference in misclassifications is in fact marginal. Of course, the restriction on a higher accuracy could originate from man-made mistakes in labeling the events. Another possibility is the occurrence of some random peaks in the noise. To investigate this, all the samples are plotted in a single graph.

Figure 5.18: All samples in the recording, containing blank samples as well as samples where a movement occurred. For every timepoint where an event is labeled, one second of data from all 14 channels is put in a 1D-array. The length of every sample is therefore 14 x 129 = 1806 data-points.

From figure 5.18, it seems reasonable to consider all samples with an amplitude of over 30 µV as noisy or as containing outliers. After removing these, the total sample-set is reduced from 293 to 285 samples. The remaining samples are plotted in figure 5.19.
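
A minimal sketch of such a threshold filter, assuming the samples are stored in a numpy array x of shape (n_samples, 1806) in µV, with matching labels y; the variable names are illustrative and not taken from the original code:

import numpy as np

# keep only samples whose absolute amplitude stays below 30 µV
mask = np.max(np.abs(x), axis=1) < 30
x_clean, y_clean = x[mask], y[mask]
print(len(x), '->', len(x_clean), 'samples kept')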


Figure 5.19: Samples after removing all samples containing an absolute value above 30 µV. Samples with data above this threshold are considered to be noisy or to contain outliers and are not used in the training of the networks, to improve the performance.

A 3-layer network with a drop-out of 0.5 is trained again over 125 epochs to show the difference in performance on the complete set and the 'cleaned' dataset (figure 5.20).

Figure 5.20: Performance of a perceptron on all samples and on the 'cleaned' set, where all noisy samples are removed.

There are of course still many more parameters we could try to optimize here. As a last example, we can look at different lengths of the time-window in which we look for the events. The results are shown in figure 5.21.

Figure 5.21: Performance of a perceptron on several different time-windows
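
Varying the time-window essentially means re-epoching the data with different tmin and tmax values around each event. A sketch of how this could look with MNE-Python, assuming raw and events have been created as described earlier; the specific values are illustrative and not the ones used for figure 5.21:

import mne

# cut epochs of different lengths around each marked event
for tmin, tmax in [(0.0, 0.5), (0.0, 1.0), (-0.5, 1.5)]:
    epochs = mne.Epochs(raw, events, tmin=tmin, tmax=tmax,
                        baseline=None, preload=True)
    x = epochs.get_data()  # shape: (n_events, n_channels, n_times)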

Convolutional network

We can also choose to preserve the 2-dimensional shape of the events. The same perceptron as in the previous section is used, but now combined with a convolutional network:

from keras.layers import Conv2D, Flatten

model = Sequential()
model.add(Conv2D(5, kernel_size=(14, 20),
                 activation='relu',
                 input_shape=(14, 129, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
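
For the convolutional network the samples keep their 2-dimensional (channels x time) shape. A minimal sketch of the reshaping this requires, assuming x_train and x_test currently hold the flattened 1806-point samples (this helper is not part of the original code):

import numpy as np

# reshape the flattened samples back to (n, 14, 129, 1) for the Conv2D input
x_train_2d = np.asarray(x_train).reshape(-1, 14, 129, 1)
x_test_2d = np.asarray(x_test).reshape(-1, 14, 129, 1)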

We see the performance of this network does not further improve the accuracy already reached by the perceptron (figure 5.22).


(a) Result of training a perceptron on 3 runs.

(b) Result of training a convolutional network combined with a perceptron.

Figure 5.22: 2 plots comparing the difference in performance of a network trained on raw and filtered data

When this apparent maximum in training result is reached, the exact values for the parameters are not that relevant (as long as the network does not over-train), as shown in figure 5.23.


(a) Performance of a convolutional network training on 15 filters (instead of 5, as in figure 5.22b).

(b) Performance of a convolutional network training on filters with width 40 (instead of 20, as in figure 5.22b).

Figure 5.23: 2 plots comparing the difference in performance after varying parameters

Now the trained convolutional network is used to read in a piece ofcontinuous data. The data belongs to the same recording as the one usedto train the network, however the events in this fragment are not used ineither the training- or the test-set, to make sure that the network has notsimply learned the individual events. The stepsize of the window whereinwe look for an event is 20 datapoints, which means that about every 0.15seconds a snapshot is made from the data and fed to the network to testwhether a movement occurred.
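
A sketch of this sliding-window classification, assuming data is a numpy array of shape (14, n_times) holding the continuous recording and model is the trained convolutional network; the variable names are illustrative:

import numpy as np

window = 129          # one second of data
step = 20             # about 0.15 seconds at 128 SPS

predictions = []
for start in range(0, data.shape[1] - window, step):
    snapshot = data[:, start:start + window].reshape(1, 14, window, 1)
    predictions.append(model.predict(snapshot)[0, 0])   # probability of a movement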

Figure 5.24: Result of reading in a continuous data stream. The stars represent a marked movement. The horizontal blue line at zero is in fact a collection of blue stars, marking a period without movement.

The results are as expected from the accuracy of the network on the test-set. Most movements are detected at the correct location, but some misclassifications are made. Of course it is always possible that some movement is unwillingly made without consciously marking an event. It is also interesting to see that the movement is mostly already detected in the data before the marker-button is actually pressed. This would indicate that the brain's activity corresponding to a physical movement is mostly present at the very start of the movement.

Now the same routine is repeated for a different recording (figure 5.25): a network is trained on a specific recording and then used to try to detect movements in some other recording.

Figure 5.25: Result of reading in a continuous data stream from a recording on which the network was not trained.

Clearly the network is not capable of recognizing the movements in the EEG-data belonging to a file on which it is not trained. To investigate why this is, an experiment is done as described in the next section.

5.4.2 Testing consistency between files

One important goal we want to achieve with training a network is that, after being trained, the network will be able to recognize the same events on other files as well. To explore this possibility the following experiment is done:

• File 1: A recording of ±100 samples is made (50 movements and 50 fragments of blank data).

• The headset is taken off and replaced on the head again.

• File 2.1: A recording of ±300 samples is made (150 movements and 150 fragments of blank data).


• File 2.2: A recording of ±100 samples is made.

• The headset is taken off and replaced on the head again.

• File 3: A recording of ±100 samples is made.

• The headset is taken off and replaced on the head again.

• File 4: A recording of ±100 samples is made.

The convolutional network as shown in the previous section is trained on file 2.1. The trained network is then used to classify all the samples from file 1, file 2.2, file 3 and file 4. The results are shown in the confusion matrices below.

               file 1          file 2.2        file 3          file 4
               move   blank    move   blank    move   blank    move   blank
move pred.      37     43       43     19       45     38       48      1
blank pred.      6      1        9     32        3      4        5     50

In general, the network seems to perform better if the headset has not been moved. The network especially misclassifies blanks as being movements. This would indicate that the background noise is quite specific, and changes with a slightly different positioning of the headset. On file 4, however, apparently the headset was replaced on the skull with the sensors in a similar position. In this case the noise turned out to be similar enough to the noise in file 2.2 to be able to distinguish the blanks from the movements.

The same experiment is repeated but now with eyes closed, to see if this will give a more stable result. The training result of file 2.1 (eyes closed) is shown in figure 5.26.

A similar trend is observed: it is mostly the noise (or baseline) that changes when the headset is removed and placed back on. This makes it difficult to train on a variety of events collected over multiple experiments, or to evaluate a new recording for possible events without the need to train the network again from scratch.

In an attempt to generalize the noise, the network is now trained on 3 files where movements were labeled with closed eyes, and then tested on a 4th file (figure 5.27). The combined set then contains 307 train-samples and 79 test-samples.

Indeed, the network is now better capable of detecting the blank samples, but performs significantly worse on detecting the movements. One way to resolve this problem is to make a base-line recording at the start of every new recording. Taking blank samples from this, the network can learn to recognize the basic noise in that specific file.


Figure 5.26: Performance of a convolutional network on movements from a recording made with closed eyes.

               file 1          file 2.2        file 3          file 4
               move   blank    move   blank    move   blank    move   blank
move pred.      50     50       39      7       48     31      100     49
blank pred.      1      0        3     31        2     18        5     54

Figure 5.27: Performance of a convolutional network in detecting movements when samples of 3 separate files are put together.

5.4.3 Discerning left- and right-hand movements

One possible way to overcome the problem of the varying background signal is to not discern an event from a blank sample, but instead to classify two different events. Therefore, a network is trained to distinguish left-hand from right-hand movements. A similar experiment is done as in the previous section. The network is trained on file 2.1. File 2.2 is recorded after some time, but without moving the position of the sensors of the head-set.


               files 1, 2.1, 3          file 4
               move   blank             move   blank
move pred.      40      1                83     25
blank pred.      4     34                22     78

Figure 5.28: Performance of a convolutional network in discerning a left-hand movement from a right-hand movement.

               file 1          file 2.2        file 3
               left   right    left   right    left   right
left pred.      31     23       33      3       19     14
right pred.     16     23        1     30        5     10

Results are again not consistent between files. For further examination of what is happening here, a network is trained to recognize a blank sample of data from a recording. This recording also contains marked occurrences of left- and right-hand movements. Another network is trained to discern left- and right-hand movements from previous files. A fragment of data is now read in as a continuous stream and classified window by window. First a network classifies the fragment as blank data or as containing an event. When classified as an event, a second network classifies the sample as a left-hand movement or a right-hand movement (figure 5.29).
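
The two-stage classification described above could look roughly like this, assuming event_model (event vs. blank) and side_model (left vs. right) are the two trained networks and snapshot is one window shaped (1, 14, 129, 1); the 0.5 thresholds and variable names are illustrative:

# stage 1: does this window contain an event at all?
if event_model.predict(snapshot)[0, 0] > 0.5:
    # stage 2: classify the detected event as a left- or right-hand movement
    if side_model.predict(snapshot)[0, 0] > 0.5:
        label = 'right'
    else:
        label = 'left'
else:
    label = 'blank'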

It is interesting to see that a right-hand movement is first detected as a left-hand movement. This could simply be because left- and right-hand movements are not so easily distinguished, but it could also originate in actual movement. The movements are made by a left-handed person, leaning with the left elbow on the table, compensating for a lack of trunk-stability due to a spinal-cord injury and resulting paralysis. This means the subject also leans on the left arm when reaching forward with the right arm to push the marker-button.


Figure 5.29: Performance of a convolutional network in discerning a left-hand movement from a right-hand movement.

More research would have to be done to draw any definite conclusions.

5.4.4 Analyze limits of detectable movements

To investigate the limits of the network in being able to detect changes in the EEG data due to subtle movements, two experiments are done:

1. Discerning movement of the index- and middle-finger.
The hand that does the labeling rests on the marker-buttons. Alternately, a button is pressed by the index-finger and the middle-finger. The experiment is done with the left hand, with eyes open. There are 200 labeled events in the recording. The chosen time-window is 1 sec. The following 3 perceptrons are trained:

Network 1:

model = Sequential()
# input_shape added here for consistency with the earlier perceptrons
model.add(Dense(512, activation='relu', input_shape=(len(x_train[0]),)))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Network 2:

model = Sequential()
# input_shape added here for consistency with the earlier perceptrons
model.add(Dense(1028, activation='relu', input_shape=(len(x_train[0]),)))
model.add(Dropout(0.5))
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Network 3:

model = Sequential()
# input_shape added here for consistency with the earlier perceptrons
model.add(Dense(512, activation='relu', input_shape=(len(x_train[0]),)))
model.add(Dropout(0.75))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.75))
model.add(Dense(1, activation='sigmoid'))

Figure 5.30: Performance of 3 multi-layer perceptrons in discerning movements of the index- and middle-finger.

For the multi-layer perceptron, neither adding an extra layer nor increasing the drop-out improves the performance. Also, 2 convolutional networks are trained:

Network 1:

model = Sequential()
model.add(Conv2D(5, kernel_size=(14, 20),
                 activation='relu',
                 input_shape=(14, 129, 1)))
model.add(Dropout(0.75))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.75))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Network 2:


model = Sequential()
model.add(Conv2D(10, kernel_size=(14, 20),
                 activation='relu',
                 input_shape=(14, 129, 1)))
model.add(Dropout(0.75))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.75))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Figure 5.31: Performance of 2 convolutional networks in discerning movements of the index- and middle-finger.

Also for these networks, the performance cannot be improved by altering the network, as shown in figure 5.31.

2. Discerning movement of the left and right index-finger.
Both the left and right arm rest on the marker-buttons. Alternately, the left and right index-finger press a marker-button. For this experiment the first of the convolutional networks shown in the previous experiment is used (figure 5.32).


Figure 5.32: Performance of a convolutional network in discerning movements of the left and right index-finger.

5.4.5 Detecting continuous movement

The Emotiv software also allows for a USB-device to be assigned for labeling events. In this case, a wireless laser-mouse is used as a marking-device. Moving the mouse then results in a continuous labeling. One experiment is performed here, just as an example of how the labeling can be used and to explore whether not only a short, spiky movement but also a continuous movement can be detected. The mouse is moved in circles for a couple of seconds and then held still again for a short period of time. The marker-channel is plotted to illustrate this.
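
A sketch of how such continuously labeled samples could be extracted, assuming marker is a numpy array holding the marker-channel and data the (14, n_times) EEG array; the names and the exact cutting scheme are illustrative, not the original code:

import numpy as np

window = 129                                  # one second at 128 SPS
moving = np.flatnonzero(marker != 0)          # time points labeled as movement
starts = moving[::window]                     # roughly non-overlapping windows
samples = [data[:, s:s + window] for s in starts
           if s + window <= data.shape[1]]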

Figure 5.33: Marker-channel belonging to a recording where a computer-mouse is moved in circles for 3 periods of time.


This is a very convenient way to create a great number of labeled samples. In this case the total sample-set consists of 800 samples, half of which are labeled movements. The following convolutional network is trained to distinguish the labeled samples during the movement from samples taken from the 2 periods without movement in between the movements.

model = Sequential()
model.add(Conv2D(5, kernel_size=(14, 20),
                 activation='relu',
                 input_shape=(14, 129, 1)))
model.add(Dropout(0.5))
model.add(Flatten())
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(64, activation='relu'))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))

Figure 5.34: Performance of a convolutional network on detecting continuous movement.

There is an almost perfect training and test accuracy. Now the first part of the labelled movements is excluded from the training-set, to make sure the network does not learn these by heart. This piece of data can then be read in as a continuous data-stream in which the network classifies the data one window at a time (figure 5.35).


Figure 5.35: Detecting continuous movements by reading in a recording window by window.

5.5 Detecting sounds

In this experiment we try to recognize the brain's activity as a result of hearing a short soundpulse. Using the wave module of Python [16], it is straightforward to write a regular WAV audio file that can be played by most music programs, like iTunes.
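
A sketch of how such an audio file could be written with the wave module; the tone frequency, duration and number of bleeps are illustrative and not taken from the original experiment:

import math
import struct
import wave

rate = 44100                                   # sample rate in Hz
bleep = b''.join(struct.pack('<h', int(32767 * math.sin(2 * math.pi * 440 * i / rate)))
                 for i in range(int(0.2 * rate)))          # 0.2 s, 440 Hz tone
silence = b'\x00\x00' * int(1.5 * rate)                     # 1.5 s of silence

f = wave.open('bleeps.wav', 'wb')
f.setnchannels(1)                              # mono
f.setsampwidth(2)                              # 16-bit samples
f.setframerate(rate)
f.writeframes((bleep + silence) * 10)          # 10 bleeps with pauses in between
f.close()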

As a first attempt a convolutional network is trained over 75 epochs. The samples are chosen to be of length 0.5 seconds (tmin = -0.1, tmax = 0.4).

Now in the case of figure 5.36 the blank samples are taken in between the soundpulses. However, it could be that the response of the brain to the soundpulse lasts longer than the 1.5 seconds given here. Therefore we repeat the training process, but now take the blanks at the end of the recording, where no soundpulses are present anymore (figure 5.37).


Figure 5.36: Detecting soundpulses by a convolutional network. By adding a high drop-out it is clear that there is no overtraining.

Figure 5.37: Detecting soundpulses by a convolutional network. The blank samples are here collected at the end of the recording.

Clearly the network is able to train here. To do a sanity check, we also try to train the network to distinguish blank samples taken from in between the bleeps and the blank samples collected at the end of the recording (figure 5.38).

As expected, the blanks from in between the bleeps are just as distinguishable from the blank samples taken at the end of the recording as the bleeps are.


Figure 5.38: Discerning the blanks taken in between the bleeps from blank samples taken at the end of the recording.

There are two possible explanations for this. Either the noise slightly changes over time, or there is an actual 'afterglow' of activity in the brain caused by the soundpulse, which makes it impossible to clearly discern the samples taken at the moment of the bleep from the samples taken in between the bleeps. Both options could also apply at the same time.

To first examine the contribution of the evolution of the noise, we try to discern the first half of the bleep-samples from the second half. The result is shown in figure 5.39. In theory this result could also be caused by a conditioning of the brain, showing less activity after a bleep once this sound has been repeated with a constant interval many times, but this is not to be expected. To minimize the activity of the brain in between the bleeps, we also make a recording with a much larger interval of 10 seconds in between the bleeps.


Figure 5.39: The first half of the samples taken at the moment of the bleeps is slightly discernable from the second half of the bleeps due to a shift in the noise.

Figure 5.40: Training the convolutional network on 200 samples, taken when listening to a soundpulse with intervals of 10 seconds.

From figure 5.40 it is clear that also when using a larger interval, a network cannot be trained to recognize the brain's activity as a result of listening to a soundpulse.

The above experiments are all, as mentioned, performed using earphones. To exclude the possibility that noise coming from the earphones itself precludes a training performance by dominating the signal, the same experiment is also done using speakers.

Figure 5.41: Using speakers to produce a sound-pulse instead of earphones, to eliminate possible noise from the earphones.

Also when using the speakers instead of the earphones, our network is unable to distinguish the samples (figure 5.41).

One possibility to be able to train a network longer without the problem of over-training is to generate more samples. A way to do this is to not just take a sample at the start of a soundpulse, but also 0.05 seconds later, and to repeat this several times. The result of training on this extended set of samples is shown in figure 5.42.
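
A sketch of this augmentation, assuming event_starts holds the sample indices of the soundpulses and data is the (14, n_times) array; the shift of 0.05 seconds corresponds to roughly 6 data-points at 128 SPS, and the variable names are illustrative:

import numpy as np

window, shift = 129, 6        # 1 s window; ~0.05 s is about 6 data-points at 128 SPS
augmented = []
for start in event_starts:
    for k in range(5):        # the original sample plus 4 shifted copies
        s = start + k * shift
        augmented.append(data[:, s:s + window])
augmented = np.array(augmented)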

By increasing the drop-out to a value of 0.7 we can train this set of 2000 samples over 600 epochs without over-training. But because we use a convolutional network, it could be the case that the kernels used by the network to slide over the sample actually find the similarities where the samples overlap. If this is the case, then using a perceptron should again give no training result at all.


Figure 5.42: Creating a bigger training-set by multiplying the samples using a small time shift. This way a test accuracy of more than 90% can be achieved.

Figure 5.43: The training of a perceptron on the extended training-set of 2000 samples.

Indeed, as shown in figure 5.43, a perceptron does not train on this extended training-set, and the earlier results obtained with a convolutional network are most likely due to the overlapping of the samples.


5.6 Suggestions for further experiments

5.6.1 Explore noise reduction

This should be done in 2 components.

• Create a PREP-pipeline

A PREP-pipeline is a standardized early-stage EEG processing pipeline that focuses on the identification of bad channels and the calculation of a robust average reference. For a machine-learning course where the inner workings of MNE-Python itself are not a learning objective, it could be efficient to create a standardized preprocessing pipeline that prepares data with the consistency and informational content needed to compare several neural networks and to start with simple, shallow networks right away. On the other hand, it is part of the goal of a hands-on machine learning course to learn to recognize noise in data and to understand the different possible sources of noise. A standard PREP-pipeline would remove all of this noise, along with unexpected artifacts, and even remove mislabeled data from the training- and test-set, instead of forcing the student to work in a systematic way and to learn to distinguish unavoidable, intrinsic sources of noise, environmental or circumstantial sources, and of course human fallibility such as simple mislabeling and unsynchronized labeling.

Also, a standard preprocessing procedure might not be so easy to realize. For example, when filtering on frequencies, it is hard to say beforehand which frequency band will contain relevant information concerning the occurrence of a specific event. Likewise, when applying an Independent Component Analysis, it is not always evident beforehand which component contains the data produced by the events, and which components contain data produced by sources of noise or artifacts.
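
For reference, band-pass filtering and ICA of the kind referred to here could be done in MNE-Python roughly as follows; the frequency band, number of components and excluded component are arbitrary illustrative choices, not recommendations from this research:

from mne.preprocessing import ICA

raw.filter(l_freq=1.0, h_freq=40.0)     # band-pass filter to a chosen frequency band

ica = ICA(n_components=14, random_state=0)
ica.fit(raw)                            # decompose the 14 channels into independent components
ica.exclude = [0]                       # components judged (by inspection) to be noise or artifacts
ica.apply(raw)                          # remove the excluded components from the data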

Most importantly, this research has shown that the noise contaminating the data is different for every recording. A plausible explanation is that the position of the sensors differs slightly between recordings. In contrast, intrinsic noise of the device, environmental noise, or noise inherent to a particular experiment (e.g. electromagnetic noise from speakers when training on auditory input) can, once identified and analyzed, be removed in a standard preprocessing procedure.

• Employ machine-learning techniques to denoise EEG-data

An interesting alternative way to deal with noise is to apply neural networks not just to classify events, but also to preprocess the recorded signal with a denoising network. An obvious example would be to use a denoising autoencoder, although preliminary research on this has not shown any promising results [17]. In the process of producing a lower-dimensional representation of the data, the network also removes too much feature-specific information from the data. More promising is the application of the more advanced back-propagation neural network ensemble [18], which has been shown to reduce the noise in the signals while preserving the signal's characteristics.

5.6.2 Compare the performances of more types of neural networks

In this research, only the two most basic network types, i.e. the multi-layer perceptron and the convolutional network, have been used to examine whether binary classification of events extracted from EEG data is possible, with only a qualitative comparison between performances, without any investigation into why certain events would be better classified by one type of network, and without any statistical comparison. The goal of this research was just to explore whether binary classification would lead to satisfying results. Now that this has been shown to be possible, it would be interesting to explore in a more quantitative way the performance of several more sophisticated networks, and also to explore possible explanations for the differences in performance. In fact, a recent study has shown that Random Forest ensemble learning is superior to other commonly used machine-learning algorithms when it comes to analysing EEG data [7].
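
As an illustration of how such a comparison could start, a Random Forest from scikit-learn could be trained on the same flattened samples; this is a sketch of a possible starting point, not something evaluated in this research:

from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(x_train, y_train)                  # the same flattened 1806-point samples
print('test accuracy:', forest.score(x_test, y_test))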

5.6.3 Explore unsupervised learning techniques

In this research, networks are only trained on labeled data. One of the problems encountered when training the networks was over-training, especially when a more complex network with multiple layers was required to distinguish the members of two classes. Creating labeled data can be difficult and error-prone. First of all it requires constant attention and awareness not to mislabel any events, and to prevent contaminating samples with unconscious eye blinks or other bodily movements. Also, the labeling itself can influence the data, and the labeling may not always be synchronized in the same way with the time the events occurred. Finally, making long recordings with labeled data can also simply be very boring, and it is uncomfortable to remain in the same body position to ensure consistent data. A way to overcome these problems is to train networks with unsupervised learning techniques. Interesting results have already been obtained in this particular field, particularly on emotion recognition by analyzing frequency components from the power-spectrum-density [19]. Of course, the process of learning to create training-sets of labelled data is also something that is valuable to include in a course on machine-learning, especially since setting up a hands-on course was the initial intention of this exploratory research.

5.6.4 Explore multi-class classification

In this research a network is only trained to distinguish an event from a blank sample, or to discern two events. When training on multiple different events, noise reduction will become increasingly important. One of the reasons for this is that, to create large data-sets of several types of events, it will be necessary to combine labelled data from several separate recordings. One of the problems that this research has brought forward is that the noise present in individual recordings is, as mentioned before, very inconsistent.


Chapter 6
Conclusion

The overall goal of this research project has been to explore the potential of the Emotiv EPOC+ as an educational tool in a hands-on machine learning course for physics and astronomy students. For this, the Emotiv has been used to perform various practical experiments, and supporting software has been explored for analysing the created data.

The Emotiv has proven to be an accessible tool based on a plug-and-play concept. Very little background knowledge and practice is needed before consistent data with 100% connectivity from the electrodes can be achieved. Also the EmotivPRO software is straightforward and can be used with minimal introduction. The use of the MNE-Python library, however, does require some basic knowledge of Python and an understanding of the concept of object-oriented programming. For all the rich possibilities of the MNE-package there is outstanding documentation available, which is of great help. Also, since extensive use of the Keras-library would be indispensable for any machine-learning course, some introductory lectures should be devoted to explaining its features.

The data recorded by the Emotiv has proven to be stable and consistent enough for a neural network to be trained to recognize repeated events. Blinking can be detected with up to 100% accuracy. Also arm-movements can be discerned within periods of sitting still with accuracies ranging from 60% to over 90%, depending on how pronounced the movement is, as one would expect intuitively.

In order to utilize this consumer-grade EEG-device in a machine learning course, however, a better analysis of the base-line noise is necessary. The base-line signal turns out to be different for every new recording once the headset has been moved slightly. Also, there is an apparent evolution of the noise throughout a recording, which makes a simple base-line measurement at the beginning of a recording only of limited use.

These problems have to be dealt with so that data from several different recordings can be put together. This way an extensive training- and test-set can be composed throughout the intended course. This accumulation of data is essential, because during the course we would want to be able to compare the performances of increasingly complex neural networks. The complexity of the networks we can build is now restricted by the limited size of the training- and test-set. By adding more layers to the network, over-training quickly becomes a dominating factor. Also, with a modest amount of training-samples, the performance of any network is limited by the noisiness of the recording and cannot be further improved by altering the network. Trying to discriminate between networks based on performance then becomes futile, since differences hinge on misclassifications of just one or two samples. In this case, differences between successive runs of the same network, due to the randomly generated initial weights of the neurons, become larger than the difference in performance between altered networks. Therefore, to train and compare increasingly complex networks, an increasingly large sample-set is needed. To build this we need to be able to combine multiple separate recordings, and to be able to do this, the base-line noise has to be better understood and reduced.


Bibliography

[1] A. Radovic, M. Williams, D. Rousseau, M. Kagan, D. Bonacorsi, A. Himmel, A. Aurisano, K. Terao, and T. Wongjirad, Machine learning at the energy and intensity frontiers of particle physics, Nature 560 (2018).

[2] M. R. Hush, Machine learning for quantum physics, Science 355, 580 (2017).

[3] D. Xue, P. V. Balachandran, J. Hogden, J. Theiler, D. Xue, and T. Lookman, Accelerated search for materials with targeted properties by adaptive design, in Nature Communications, 2016.

[4] M. Gnida, Machine learning proliferates in particle physics, Symmetry Magazine (2018).

[5] Machine Learning for Physicists: https://machine-learning-for-physicists.org.

[6] A short course on Machine Learning for Physicists: https://www.umdphysics.umd.edu.

[7] A. Chan, C. E. Early, S. Subedi, Y. Li, and H. Lin, Systematic analysis of machine learning algorithms on EEG data for brain state intelligence, in 2015 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pages 793–799, 2015.

[8] A. Khodayari-Rostamabad, J. P. Reilly, G. M. Hasey, H. de Bruin, and D. J. MacCrimmon, A machine learning approach using EEG data to predict response to SSRI treatment for major depressive disorder, Clinical Neurophysiology 124(10), 1975–1985 (2013).

[9] A. R. C. Donati, S. Shokur, E. Morya, D. S. F. Campos, R. C. Moioli, C. M. Gitti, P. B. Augusto, S. Tripodi, C. G. Pires, G. A. Pereira, F. Brasil, S. Gallo, A. A. Lin, A. K. Takigami, M. A. Aratanha, S. Joshi, H. Bleuler, G. Cheng, A. Rudolph, and M. A. L. Nicolelis, Long-Term Training with a Brain-Machine Interface-Based Gait Protocol Induces Partial Neurological Recovery in Paraplegic Patients, 6, 30383 (2016).

[10] D. E. Thompson, K. L. Gruis, and J. E. Huggins, A plug-and-play brain-computer interface to operate commercial assistive technology, Disability and Rehabilitation: Assistive Technology 9, 144 (2014).

[11] Emotiv Epoc+: https://www.emotiv.com/epoc/.

[12] A. Gramfort, M. Luessi, E. Larson, D. Engemann, D. Strohmeier, C. Brodbeck, R. Goj, M. Jas, T. Brooks, L. Parkkonen, and M. Hämäläinen, MEG and EEG data analysis with MNE-Python, Frontiers in Neuroscience 7, 267 (2013).

[13] Python documentation: https://docs.python.org/2/tutorial/index.html.

[14] Keras documentation: https://keras.io.

[15] Jupyterlab Documentation: https://jupyterlab.readthedocs.io/en/stable/.

[16] Python wave module documentation: https://docs.python.org/2/library/wave.html.

[17] F. Ambrogi, R. Bakker, N. Mota, T. Pelsmaeker, and S. Shokdrani, Gaming with the Mind: Classifying EEG Signals with Advanced Machine Learning Techniques, (2017).

[18] Y. Chen, M. Akutagawa, M. Katayama, Q. Zhang, and Y. Kinouchi, Neural network based EEG denoising, in 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pages 262–265, 2008.

[19] Z. Lan, O. Sourina, L. Wang, R. Scherer, and G. Müller-Putz, Unsupervised Feature Learning for EEG-based Emotion Recognition, in 2017 International Conference on Cyberworlds (CW), pages 182–185, 2017.


Appendices


Appendix A
Emotiv EPOC+ Specifications

EEG sensors
14 channels: AF3, F7, F3, FC5, T7, P7, O1, O2, P8, T8, FC6, F4, F8, AF4
2 references: CMS/DRL references at P3/P4; left/right mastoid process alternative
Sensor material: saline-soaked felt pads

Connectivity
Wireless: Bluetooth Low Energy
Proprietary USB receiver: 2.4 GHz band
USB: to change headset settings

EEG signals
Sampling method: sequential sampling, single ADC
Sampling rate: 2048 Hz internal, downsampled to 128 SPS or 256 SPS (user configured)
Resolution: 14 bits with 1 LSB = 0.51 µV (16-bit ADC, 2 bits instrumental noise floor discarded), or 16 bits (user configured)
Bandwidth: 0.16 – 43 Hz, digital notch filters at 50 Hz and 60 Hz
Filtering: built-in digital 5th order Sinc filter
Dynamic range (input referred): 8400 µV (pp)
Coupling mode: AC coupled

Motion sensors
IMU part: LSM9DS0
Accelerometer: 3-axis ±8 g
Gyroscope: 3-axis ±2000 dps
Magnetometer: 3-axis ±12 gauss
Sampling rate: 32 / 64 / 128 Hz (user configured)
Resolution: 16 bits

Supported platforms
Windows: 7, 8, 10; 2GB RAM; 200MB available disk space
MAC: OS X; 2GB RAM; 500MB available disk space
iOS: 9 or above; iPhone 5+, iPod Touch 6, iPad 3+, iPad mini
Android: 4.4.3+ (excluding 5.0); device with Bluetooth Low Energy

Power
Battery: internal Lithium Polymer battery, 640 mAh
Battery life: up to 12 hours using the USB receiver, up to 6 hours using Bluetooth Low Energy
