determination of emotional state through …
TRANSCRIPT
DETERMINATION OF EMOTIONAL STATE THROUGH PHYSIOLOGICAL
MEASUREMENT
by
Lee B. Hinkle, B.S.
A thesis submitted to the Graduate Council of
Texas State University in partial fulfillment
of the requirements for the degree of
Master of Science
with a Major in Computer Science
December 2016
Committee Members:
Vangelis Metsis, Chair
Mina Guirguis
Yijuan Lu
FAIR USE AND AUTHOR’S PERMISSION STATEMENT
Fair Use
This work is protected by the Copyright Laws of the United States (Public Law 94-553,
section 107). Consistent with fair use as defined in the Copyright Laws, brief quotations
from this material are allowed with proper acknowledgment. Use of this material for
financial gain without the author’s express written permission is not allowed.
Duplication Permission
As the copyright holder of this work I, Lee B. Hinkle, authorize duplication of this work,
in whole or in part, for educational or scholarly purposes only.
DEDICATION
To the memory of my mother and father, Irene M. and Glenber L. Hinkle who
consistently stressed the importance of education and helping others.
v
ACKNOWLEDGEMENTS
One of my favorite metaphors is “we stand on the shoulders of giants”. The
work presented here is the result of the efforts of many people who have generously
shared their knowledge and support with me directly and indirectly. As I sit here writing
on a Friday night I am reminded that the research and discovery is the fun part but real
value comes from documenting the information so it can be used by others. I am
particularly grateful to those who teach and share their knowledge broadly.
I want to thank my advisor Dr. Vangelis Metsis for establishing the Intelligent
Multimodal Computing and Sensing lab which I am proud to be a part of as both student
and a researcher. I am grateful for his guidance on deeply technical topics as well as
presentations and papers. My fellow researcher and frequent lunch buddy Kamrad
Roudposhti was always willing to help me with MATLAB and machine learning
principles. I learned a great deal during our frequent brain storming sessions at the
whiteboard. I would like to thank a full cast of fellow researchers and students with
whom I have enjoyed spirited discussions and in some cases collaborated with on
graduate projects: Alakh Biniwale, Sahar Azimi, Jayadharini Jaiganesh, Justin May,
Xiaoqing Liao, Iaonnis Rigas, Priyank Trivedi, Adam Anderson, Thomas Hsiao.
I would like to thank Dr. Yijuan Lu and Dr. Mina Guirguis for agreeing to serve
on my committee. Molly O’Neil generously provided me with many great pointers and
bits of advice and also serves as an inspiration through the care she shows for her
vi
students and the effort she puts into teaching. Susan Alexander helped me immensely
with the purchasing and receipt of equipment, many times from non-traditional suppliers.
Shannon Hicks was always helpful when I really didn’t know who else to ask.
I would like to thank Texas State University for offering options for those of us
working in industry to continue our education at the Graduate Level. The Support Grant
Fellowship awarded to me by the Graduate College was critical for the purchase of the
virtual reality and recording equipment and is very much appreciated. I am hopeful that
this equipment will serve the needs of future Computer Science students for many years.
I would like to thank Dr. Kenneth Smith for the use of the Virtual Reality Lab and
David Morely for helping me with some of the early and very critical VR runs.
Although we have only met via email I would like to thank Bernard Tarver at
Great Lakes Neurotechnologies for his prompt and patient replies to my frequent
questions regarding the BioRadio.
Finally, I would like to thank my wife Christine for her support while returned to
full time student status and focused on my studies and research. I would also like to
thank my wonderful daughters; Paige lent me the original Oculus DK2 while Shawn
shared her MATLAB experiences and opinions.
vii
TABLE OF CONTENTS
Page
ACKNOWLEDGEMENTS ................................................................................................ v
LIST OF TABLES ............................................................................................................. ix
LIST OF FIGURES ............................................................................................................ x
LIST OF ABBREVIATIONS ........................................................................................... xii
ABSTRACT ..................................................................................................................... xiii
CHAPTER
1 INTRODUCTION ............................................................................................... 1
1.1 Motivation ............................................................................................. 2 1.2 Challenges ............................................................................................. 3 1.3 Applications .......................................................................................... 4
2 BACKGROUND AND RELATED WORK ....................................................... 6
2.1 Human Emotional Response ................................................................. 7
2.2 Physiological Signals ............................................................................ 7 2.2.1 Electrical Physiological Signals ............................................. 8
2.2.1 Non-Electrical Physiological Signals .................................. 10 2.3 Emotional Classification Labels ......................................................... 11 2.4 Experimental Elicitation of Emotional Response ............................... 12
2.5 Data Processing and Feature Extraction ............................................. 15
3 METHODOLOGY ............................................................................................ 17
3.1 Experimental Design for Elicitation of Response ............................... 17
3.2 Data Collection ................................................................................... 20
3.2.1 Physiological Signals Measured .......................................... 20 3.2.2 Skin Preparation and Electrode Placement .......................... 22
3.3. BioRadio mounting and cabling ........................................................ 25 3.3.1 Video Collection and Event Markers ................................... 28 3.3.2 Subject Feedback Regarding Arousal and Valence ............. 28
viii
3.4 Signal Processing and Classification .................................................. 30 3.4.1 Data Export .......................................................................... 30 3.4.2 Data Import, Table Join, and Labeling ................................ 32 3.4.3 Feature Extraction ................................................................ 36
3.5 Institutional Review Board ................................................................. 36
4 EXPERIMENTS AND RESULTS .................................................................... 38
4.1 Self-Reported Data – Range of Responses during Simulations.......... 38 4.2 Initial Signal Analysis and Data Conversion ...................................... 42
4.2.1 Rose and I Movie ................................................................. 43 4.2.2 Roller Coaster ...................................................................... 49
4.2.3 Pendulum Swing .................................................................. 55 4.3 Feature Extraction and Segment Size ................................................. 61
4.3.1 Simple Mean and Standard Deviation Feature Extraction ... 61 4.3.2 Temporal and Frequency Based Feature Extraction ............ 64 4.3.3 Domain Specific Feature Extraction .................................... 65
4.4 Feature Selection ................................................................................. 67 4.4.1 Simple Mean and Standard Deviation Feature Selection .... 67
4.4.2 Temporal and Frequency Based Feature Selection .............. 67 4.4.3 Domain Specific Feature Extraction .................................... 68
4.5 Classification Accuracy ...................................................................... 68
5 CONCLUSION .................................................................................................. 70
LITERATURE CITED ..................................................................................................... 72
ix
LIST OF TABLES
Table Page
1. Electrical Physiological Signal Names and Sensor Types ............................................ 10
2. Non-Invasive Physiological Signal Names and Sensor Types (partial list) ................. 11
3. Emotion Elicitation Techniques and States .................................................................. 14
4. Physiological Signals Recorded .................................................................................... 21
5. Summary of Subject Reponses ..................................................................................... 38
6. Subject 1-3 Detailed Responses .................................................................................... 39
7. Subject 4-5 Detailed Responses .................................................................................... 39
8. MATLAB Classifier Accuracy with 5 fold cross-validation, 1 second epoch,
and mean + standard deviation features ................................................................ 63
9. MATLAB Classifier Accuracy with 5 fold cross-validation, 5 second epoch,
and mean + standard deviation features ................................................................ 64
10. List of Temporal and Frequency Based Features ....................................................... 65
x
LIST OF FIGURES
Figure Page
1. Placement of Electrodes on Left and Right Hands ....................................................... 23
2. Placement of Electrodes on Face .................................................................................. 24
3. “Head” BioRadio mounting improvements .................................................................. 26
4. Body BioRadio mounting with Back Support Belt ...................................................... 27
5. Emotion and Range Subject Response Form – too complex ........................................ 29
6. Directory Listing showing Data Files and Size after Export ........................................ 31
7. Raw Subject 1 Data after Import as MATLAB Table .................................................. 32
8. Screenshot of Excel Spreadsheet used for Timing Synchronization ............................ 34
9. Sequence Diagram of Import from .csv ........................................................................ 35
10. Subject Responses marked on Arousal-Valence axis ................................................. 40
11. Summary of all Subject Responses marked on Arousal-Valence axis ....................... 41
12. Side by side screenshot of Signal Data shown in BioCapture Software ..................... 42
13. Subject Heart Rate during Rose and I Segment .......................................................... 44
14. Peripheral Blood Volume during Rose and I Segment ............................................... 45
15. EEG during Rose and I Segment ................................................................................ 46
16. EMG during Rose and I Segment ............................................................................... 47
17. GSR during Rose and I Segment ................................................................................ 48
18. Peripheral Temperature during Rose and I Segment .................................................. 49
xi
19. Subject Heart Rate during Roller Coaster Segment .................................................... 50
20. Subject Peripheral Blood Pulse Volume Roller Coaster Segment .............................. 51
21. Subject EEG f4 during Roller Coaster Segment ......................................................... 52
22. Subject EMG signal during Roller Coaster Segment ................................................. 53
23. Subject Galvanic Skin Response signal during Roller Coaster Segment ................... 54
24. Subject Peripheral Skin Temperature during Roller Coaster Segment ....................... 55
25. Subject Heart Rate during Pendulum Swing Segment ............................................... 56
26. Subject Peripheral Blood Pulse Volume Pendulum Swing Segment ......................... 57
27. Subject EEG f4 during Pendulum Swing Segment..................................................... 58
28. Subject EMG signal during Pendulum Swing Segment ............................................. 59
29. Subject Galvanic Skin Response signal during Pendulum Swing Segment ............... 60
30. Subject Peripheral Skin Temperature Pendulum Swing Segment .............................. 61
xii
LIST OF ABBREVIATIONS
Abbreviation Description
Ag/AgCl Silver/Silver Chloride
CSV Comma Separated Value
ECG Electrocardiogram
EDA Electrodermal Activity
EEG Electroencephalogram
EMG Electromyography
EOG Electrooculography
GSR Galvanic Skin Response
HRV Heart Rate Variability
kNN k-th Nearest Neighbor
PPG Photoplethysmogram
PPV Pulse Pressure Variation
PulseOx Pulse Oximetry
RIP Respiratory Inductance Plethysmography
SKT Skin Temperature
SpO2 Oxygen saturation of arterial hemoglobin
SVM Support Vector Machine
xiii
ABSTRACT
The goal of this thesis is to develop and evaluate methods of emotional response
classification using human physiological data. With the continued development of
automated systems that interact closely with humans in a more natural manner the ability
of such systems to determine the emotional state of nearby subjects and adapt
accordingly is increasingly important. Applications include the broad area of affective
computing as well as more specific areas such as evaluating the effectiveness of virtual
reality based treatment for social phobias. In this work, various non-invasive sensors are
used to collect physiological data during virtual reality simulations. Feature extraction,
feature selection, and machine learning is performed on the data to determine which
signals and algorithms produce the most accurate classification of the subject’s emotional
response to the simulations.
1
1 INTRODUCTION
The primary goal of this research is to determine emotional state through the non-
invasive collection and analysis of physiological data during virtual reality or other
immersive audio/visual simulations. Physiological data is data pertaining to biological
systems. Physiological signals, also known as biosignals, such as the appearance of the
skin, the mechanical response to light tapping, and acoustic data such as heart, breathe,
and gastrointestinal sounds have been collected and used to assess overall health for
many centuries [1]. Humans have the ability to determine if a person is upset or ill based
on observation and experience without formal medical training. Pale skin, sluggish
movement, and raspy voice are all common clues that a person might be ill. Similarly,
we are able to detect if a person is upset, happy, or sad by observing physical clues,
actions, and environmental factors. Humans however are not able to directly measure
the electrical potentials that exist on and within the body. As we think and move our
bodies develop electrical potentials that can be measured. With current technology we
have the ability to measure very small signal electrical differences through the use of
non-invasive adhesive sensors connected to very sensitive amplifiers. The data collected
can be processed using digital signal processing and machine learning classification
algorithms to perform meaningful work such as sleep analysis and disease diagnosis.
Continued technological advancements combined with existing and new needs is
driving significant work in the field of human computer interaction. One the technical
side the incredible growth in smartphones has not only provided easy access to data
acquisition at almost any time it has also brought about a significant increase in
2
performance while lowering of cost of many sensors. Cameras, ambient light detectors,
and accelerometer/gyroscope combination sensors have all benefited significantly due to
the popularity of smartphones. When incorporated into a smartphone or wearable device
these sensors can provide significant information about the owner’s activities and the
environment surrounding them. Additionally, significant improvements in processing
power due to the availability of cloud based computing and massively parallel graphical
processing units allow us to run much more sophisticated and computationally complex
algorithms on ever increasing amounts of collected sensor data.
This research will be applicable to many real-world problems including: the
evaluation of the effectiveness of various addiction treatments, the study of the impact of
media and/or audio/visual stimuli such as gaming, and the ability to monitor emergency
responders during crisis simulations.
1.1 Motivation
There are multiple technological and demographic changes that highlight the need
for evermore human aware computing. As more automated systems replace roles
previously performed by humans it will be beneficial to provide the systems with some
level of simulated empathy. This ability to adapt based on human emotion has been
termed Affective Computing by Rosalind Picard [2]. The driverless car is an appropriate
and familiar example of an emerging computing category that will benefit from
additional information regarding the passengers and bystanders emotional state. Another
example is home health devices and even robotic assistants where natural language
interfaces can lower the barrier and obvious divide between human and computer.
3
Today many facilities are outfitted with emergency defibrillators which upon removal
from the storage container provide spoken instructions on how to properly connect to the
person requiring assistance. Once connected data sensor will be used to control the
actions of the defibrillator or provide further vocal instructions. Lowering the barrier
and facilitating more natural interaction between humans and the device will clearly
improve the outcome in what is likely a life-or-death situation.
1.2 Challenges
In the following sections greater depth and detail will be provided on some of the
challenges that can lead to poor prediction of emotional response. One of, if not the, key
challenges is the obtainment of accurate measurements. There are a number of aspects
that need to be considered. Some are inherent in the biology of the subjects; these
include the fact that the signals themselves are very small and the body generates a
variety of different electrical potentials in response to brain activity, muscle activity, and
involuntary reactions to stimuli [3]. On the sensor side there is a constant struggle
between the need for very good electrical conductivity to the skin and the non-invasive
least intrusive goals for a system which would ideally be used continuously. For
example, highly accurate measurement of muscle activity typically entails the use of a
needle electrode in the muscle itself [4]. For the best non-invasive signal on the skin
surface, skin preparation such as an alcohol scrub to remove dead cells and a strong
adhesive electrode with pre-applied electrolyte gel is typically used. If the location has
hair it may be shaved beforehand to facilitate a better connection and lessen the difficulty
in removal of the electrode [5]. Given these hassles widespread adoption is unlikely.
4
However, it is easy to foresee that electrodes will be incorporated into clothing, jewelry,
and other worn accessories. Currently the signals available through these types of
sensors are far weaker and noisier than securely attached disposable electrodes [6].
Similarly, the results for replacing a gelled EEG electrode, one of the most cumbersome
electrodes due to the quantity required along with hair interference for skin preparation
are improving. [7]
This research is focused on the evaluation of current techniques from a technical
perspective however, there are significant personal and privacy issues that will need to be
addressed as data from sources such as webcams, wearable devices, and smartphones are
used to give systems and their developers the ability to classify a person’s emotional
state. Stephen Fairclough provides a brief introduction to some of these considerations
in his argument for stronger privacy protections of physiological data [8].
1.3 Applications
As machines become more human-like through emerging capabilities such as
natural language recognition and response and they are used even more pervasively to
complete tasks currently performed by humans such as home health care the importance
of proper emotion state classification will increase [9]. For example, a taxi driver would
be able to tell if a passenger appears uncomfortable and potentially nauseous. In that
case the driver would likely take action such as providing fresh air, describing the current
traffic situation and expected arrival time, or even modifying the route to limit the
number of turns and curves. As self-driving cars replace this human task, it will be
essential to have the capability to use the information regarding the passengers emotional
5
and physical well-being to perform the proper actions to ensure that their comfort is
maximized and needs are met.
Despite many advances in sensor technologies, amplification techniques, filtering
and machine learning algorithms the ability to predict emotional response is still quite
low even with fairly cumbersome sensors, in controlled environments, and with limited
classification labels [10]. However, the number of sensors worn or near a person will
continue to increase significantly. Ultimately accurate prediction will likely require a
significant amount of sensor fusion [11] with input not only from a number of sensors but
also from a number of different types of sensors. With so many sensors techniques to
eliminate noisy or erroneous data will have to be developed and improved so that a single
bad sensor does not disrupt the accuracy of the entire system. Three broad categories of
sensors are in use today to detect physiological data [12]. These include imaging via
cameras, surface electrical sensors as described here, and the 6-axis
accelerometer/gyroscope combination sensor [13] that is present in virtually every
modern smartphone and tablet [14].
6
2 BACKGROUND AND RELATED WORK
Human observable physiological data plays and important role in many areas
ranging from social interaction, parenting, and the practice of medicine. The discovery of
the electrical nature of living organisms and improvements in the ability to measure very
small signal electrical differences have enabled the collection of physiological data that is
useful for sleep analysis [15] and disease diagnosis [16]. Early work regarding
emotional states includes that by Ekman and Friesen on universal facial behaviors. This
study used six emotional states: happiness, anger, sadness, disgust, surprise, and fear
[17]. The interactions of the sympathetic and parasympathetic nervous system cause
physiological changes that are measurable. Work in this area includes Emotion
Classification based on Bio-Signals, [18] which measured electrodermal activity (skin
resistance), electrocardiograph (heart activity), peripheral temperature, and blood oxygen
levels. Image processing and machine learning techniques have been used to classify
emption based on facial images and voice recordings [19].
From end to end the process of classification of human state followed by the
research studies surveyed can be summarized as:
Collect Data preferably utilizing multiple subjects who are experiencing a wide
range of emotions.
Process the data to remove noise and erroneous samples.
Add labels with the emotional state information.
Parse the data into fixed time segments e.g. 2.56 seconds for 50 Hz sampling
yields 128 data points.
7
Perform feature extraction on each data type and time segment e.g. the mean
value and fundamental frequency.
Separate the data into training and testing portions. Typically, the training data
is significantly larger, a common split is 70% training, 30% testing.
Use the training data to train the targeted classifier.
Use the trained classifier to predict the emotional state of the testing data
samples.
Compare the predicted and actual results to establish the accuracy of the
predictions. Actual versus predicted results are shown in a confusion matrix
from which an overall accuracy and error rate can be computed.
2.1 Human Emotional Response
The prediction of human emotional response via biosignals is based on the
relationship between the parasympathetic and sympathetic branches of the autonomic
nervous system [20] [21] [22] [23]. Further details of the signals used are provided in
the following sections.
2.2 Physiological Signals
This research includes only physiological signals that can be measured non-
invasively. Specifically, sensors which involved puncturing of the skin or contact with
mucosal membranes (e.g. oral thermometer) were excluded when determining the signals
to be monitored. The remaining signals can be separated into two distinct categories –
electrical biosignals and physical biosignals. The electrical signals are monitored via
adhesive electrodes connected to a highly sensitive amplifier. The physical signals are
8
monitored with a variety of sensors which sense some physical aspect of the subject such
as movement or surface skin temperature.
2.2.1 Electrical Physiological Signals
Significant information can be gained through the collection of electrical
physiological data. The most commonly accessed electrical physiological signals are
summarized in Table 1 and include:
Electroencephalogram (EEG): An external measure of the activity occurring in the
brain. EEG signals are categorized based on location of measurement and frequency.
They are among the most difficult electrical signals to measure due to the low signal
amplitude, the difficulty in securing electrodes through the hair and to the scalp, and the
likelihood of other signals such as facial muscle movement induced EMG to overwhelm
the EEG signal [24].
Electrocardiogram (ECG): A measure of the electrical activity driving the heart [25].
Typically measured as a differential signal across the heart with electrodes placed on both
wrists or both shoulders with a ground reference on the elbow. For medical applications
additional electrodes are typically placed at predefined locations on the chest; the
resulting signals provide insight in the sequence of contractions of the heart.
Electrodermal Activity (EDA): Electrodermal activity is historically referred to as
Galvanic Skin Resistance (GSR). It is commonly recognized as a key signal in lie
detector equipment [26] that includes two electrodes strapped to the tips of two fingers on
the same hand. The fundamental observation is that during periods of increased arousal
the electrical conductance of the skin decreases especially when measured across the
9
palm of the hand or sole of the foot [27]. It is one of the earliest electrical physiological
signals to be associated with changes in emotions.
Electromyography (EMG): A measure of the electrical potential between two points
along a muscle which indicates the level of muscle activity. Due to the inherent
electrical resistance of the skin surface EMG measurements have limitations over needle
based electrodes but meet the non-invasive criteria. To detect emotion response EMG
measurements are typically made on muscles that tense under stress such as the jaw and
shoulder muscles [28].
Electrooculography (EOG): The eyeball exhibits a voltage potential from front to back.
By analyzing signals obtained from electrodes placed above/below and left/right of the
eye the position and movement of the eye can be calculated [29]. This is especially
useful for sleep studies where eye movement cannot be determined visually. Because
the eye is so closely coupled to our vision system which represents a significant portion
of our brain factors such as speed of movement and duration of held gaze are likely more
important to emotional state than the actual direction the subject is looking.
10
Table 1: Electrical Physiological Signal Names and Sensor Types
Signal Name Sensor Type
Sensor
Location
Physiological Data
Electroencephalogram
(EEG)
Ag/AgCl
electrode
Scalp Brain Activity
Electrocardiogram
(ECG)
Disposable
electrode
Wrists or
Chest
Heart Rate, Heart Rate
Variability
Electrodermal Activity
(EDA) aka GSR
Finger
Electrode
Hand or Foot
Skin Resistance, Sweat
Gland Activity
Electromyography
(EMG)
Disposable
electrode
Arms, Legs,
Face
Muscle Activity
Electrooculography
(EOG)
Ag/AgCl
electrode
Face Eye Movement, Position
2.2.1 Non-Electrical Physiological Signals
The common electrical signals described previously represent only a subset of the
possible physiological measurements that can be made. Many wrist or waist-worn
fitness trackers can accurately measure movement and therefore they are able determine
the number of steps taken. Mechanical or digital thermometers are frequently used to
measure body core temperature. A partial list of non-electrical physiological
measurements is in Table 2. In the field of emotion recognition and classification image
11
processing and machine learning techniques can be used on images of a person’s face to
classify emotion. Similar techniques have been applied to voice.
Table 2: Non-Invasive Physiological Signal Names and Sensor Types (partial list)
Signal Name Sensor Type Sensor
Location
Physiological Data
Photoplethysmogram
(PPG)
Photoelectric
Pulse Oximeter
Finger,
Earlobe
Heartrate, Blood
Oxygen Level
Respiration Belt, Flow
Sensor
Chest, Face Respiration Rate,
Volume
Pupil Imaging Near Eye Pupil Dilation, Eye
Movement
Position Accelerometer,
Gyroscope
Torso, Head,
Limbs
Movement
Peripheral Temperature
(SKT)
Thermistor Hand Skin Temperature
2.3 Emotional Classification Labels
As mentioned previously the goal of this research is to accurately determine a
person’s emotional state. It follows therefore that some standard and widely recognized
set of emotional state labels should be used. Unfortunately, there is not general
agreement on a single set of labels that can correctly and accurately reflect a person’s
range of emotions and categorize them completely.
12
There does seem to be general agreement that two broad indicators underpin
emotional state classification. The first is a measure of arousal or overall physiological
activity. Minimum arousal would be reflective of calm, relaxed, bored states while high
arousal would indicate excitement or fear. The second attribute is valence which is a
measure of positive versus negative emotion. For example, is a person is in a highly
aroused state they might be experiencing extreme joy or extreme terror. This difference
would be reflected in the value of the valence [30].
One measure of the difficulty for an emotional classification problem is the
number of possible outcomes, specifically the number of distinct emotional states that are
to be determined. A binary classifier might obtain 50% accuracy by selecting only a
single output state given an input of random data. Similarly, a classifier with six
potential outcomes would achieve an accuracy of only 16.7% by pure chance. Given the
wide variably and lack of clear consensus in the industry the labels used in classification
by the studies that reported results varies considerably. The classification labels are listed
in Table 3.
2.4 Experimental Elicitation of Emotional Response
The generation of intense physiological responses spanning a range of emotions is
a challenge. Ethical aspects and subject health must be taken into account when
planning experiments. Some of the least upsetting stimuli include colors, music, and
self-directed thought experiments e.g. “Imagine a time when you were very happy”.
With such an experimental setup the degree to which the emotion is experienced and
consequently the level of physiological response will be limited [31]. In order to elicit
13
stronger emotional responses different and likely more upsetting or deceitful techniques
have been used. This type of experimental setup necessitates a careful and thorough
review by an independent Institutional Review Board (IRB) to insure the protection of
the research subjects.
One surveyed study [32] utilized the existing International affective picture
system (IAPS) photoset which is available upon request and approval from the National
Institute of Mental Health Center for Emotion and Attention at the University of Florida
[33]. Several benefits of this photo set include its availability and broad usage and its
well-developed classification and categorization of the images. In order to induce a range
of emotions the photoset include images which may be disturbing and therefore careful
experimental procedures must be followed to insure proper outcomes and protect the
subjects.
Two related studies utilized video game techniques [34] [35]. Given the ubiquity
of video games and simulated environments this experimental setup poses less risk with
the tradeoff that the range of emotions may be reduced. Of particular note is that [34]
utilized the expertise of three trained physiologists to perform the labeling of the
subjects’ emotions based on the game scenario and direct observation of the subject.
Other experimental techniques employed by the studies surveyed include film
clips, a board game, startling noises, and music. Other possible methods of inducing an
emotional response include placing the hand in ice water also known as the “cold
pressor” test [36], public speaking, “floor drop” or other startling scenarios within VR
environments.
14
A summary of the techniques used by the surveyed studies during experimentation is
included in Table 3.
Table 3: Emotion Elicitation Techniques and States
Ref. Technique Assessment Emotional States (Labels)
[32] IAPS Photoset Based on IAPS
classification
Arousal and Valence
[18] Film clips, a
game, unexpected
noise
Subject Self-
Reporting
Joy, Sadness, Anger, Fear,
Surprise, Neutral
[34] Simulated (VR)
Driving
Experienced
Psychologists
Arousal and Valence + High Stress,
Low Stress, Disappointment,
Euphoria
[37] Music from
Existing Corpus
Subject Self-
Reporting
Anger, Joy, Sadness, Pleasure
[35] Video Game –
NHL 2003
against Friend,
Stranger,
Computer
Self-Reporting plus
Survey Questions
Boredom, Challenge, Excitement,
Frustration, Fun
[38] Film Fragments Prior Classification
of Film Fragments
and Self-Reporting
Neutral, Positive, Mixed, Negative
15
2.5 Data Processing and Feature Extraction
Since the signals measured are very small the effect of noise must be mitigated.
Electrical systems in the facilities where the data is collected usually operate at 50 or
60Hz and therefore a band pass or notch filter is typically employed to eliminate any
noise inducted via the building wiring and resulting electromagnetic fields [39].
Many of the studies reviewed employed fairly standard feature extraction
techniques that are applicable to time based signals. Typical features include the mean
value, root mean squared (RMS) value, mean max amplitude, rise duration, and heart
rate. Proper feature extraction is a key component of the process to insure good results
and includes some amount of domain knowledge. For example, heart rate can increase
for non-emotional reasons such as increased physical activity. It has been shown that
heart rate variability – the difference in timing between adjacent beat has been shown to
have some level of correlation to the level of arousal present [40].
There are a number of machine learning classifiers available [41]. The topic of
machine learning and classifiers is extensive, only a brief overview will be provided here.
In a typical case the classifier is provided with an array consisting of multiple columns
which contain attributes that have a measured value or calculated feature and one column
with the label for training sets [42]. For a classic home price prediction example
attributes might be total square footage and number of bedrooms. In this example the
label would be last selling price and this would also be the value to be predicted for new
samples. Each row of the input matrix represents a sample. When the training data is
input into the classifier it determines the best set of values to fit the provided labels.
Three of the most popular machine learning classifiers are:
16
k-Nearest Neighbor – the predicted value is the most prevalent value among the k
samples that are closest to the input value in multidimensional space [43].
Support Vector Machines – the multidimensional solution space is separated by
hyperplanes that are determine based on the training data. The input data prediction is
determined by its location relative to the hyperplane divided space [44].
Artificial Neural Networks – modeled after the human brain these classifiers are
constructed of neuron like nodes which can take multiple inputs to determine a single
output [45]. The output of each node is typically connected to multiple additional nodes.
The prediction accuracy when using electrical physiological signals and machine
learning techniques is typically low especially in juxtaposition with some current
successes in the areas of voice recognition, facial recognition, and spam filtering [46]
[47].
17
3 METHODOLOGY
The general activities included in this research are: Eliciting Emotional Response
and Data Collection, Data Processing, Labeling, and Feature Extraction, Feature
Selection, Training and Validation of Multiple Classifiers, Evaluation Using New Subject
Data. Additional detail on each of these steps is provided in the following sections.
3.1 Experimental Design for Elicitation of Response
The Virtual Reality sessions were chosen from applications available for the Oculus
Rift [48] that were also compatible with the Oculus DK2 headset with the following
general criteria:
Readily available content – free on Oculus web store
Range of relaxing to exciting, while avoiding disturbing or mature content
Both passive (movies, demos) and active (games) subject involvement.
The targeted total time wearing the VR headset was slightly more than 60 minutes.
Early experiments showed that after approximately 1 hour subjects began to tire of the
VR environment. The final session was a game which could be terminated at any time
based on the subject’s fatigue and desired to continue. The total time required to
describe and launch the application plus the recording of subject’s responses after each
session so the actual in VR time was significantly less. The total time required per
subject was approximate 2 ½ hours. This included skin preparation and application of
the adhesive electrodes, connection of the electrodes to the BioRadio, adjusting and
donning the RIP bands, and mounting of both radios as well as removal of the equipment
after the conclusion of the VR sessions.
18
Session 1: Introduction to Virtual Reality Movie/Demo
This is a short film/animation that introduces the 3D capabilities of the VR
headset. It was selected at the first session in order to get the subject used to the Virtual
Reality headset and provide time to verify that all the sensors were recording properly.
It contains several short 3D movies and is in general neutral.
Session 2: The Rose and I Movie
A short animated VR film by Penrose Studios that made its World Premiere in the
New Frontier section at the Sundance Film Festival 2016. It is largely relaxing and
pleasant with one potentially startling event where the flower “coughs.”
Session 3: Discovery VR Action Videos
The application contains a number of videos and the following two were selected:
Video 1 “Get Ready for the Drop” is a 360° Video of a Rollercoaster ride
Video 2 “Jump into the Unknown” is 360° Video of a Rope Pendulum Swing
As both the rollercoaster and the pendulum swing would be considered “thrill” rides
these were chosen to represent more exciting segments. In addition, the pendulum swing
contained scenes that would be unsettling to someone with a fear of heights.
Session 4: InCell Game
An interactive game where you ride along a tubular track and try to capture white
and green objects while avoiding red obstacles. This application was chosen to represent
19
game play where concentration is required and positive and negative consequences could
be observed.
Session 5: Lost Movie
The second 3D Virtual Reality movie which takes place in a dark forest with
multiple startling events such as a bird swooping by. It was anticipated that this movie
would score higher arousal marks than the previous Rose and I movie.
Session 6: Dream Deck
A 3D Virtual Reality demo including multiple short demos. The city scene, alien
encounter, and dinosaur were specifically singled out for subject response with the intent
of invoking emotions including fear (of heights), anxiety (non-human experience), and a
frightening (charging dinosaur) emotions.
Session 7: Lucky’s Tale Game
A 3D adventure game using 3rd person perspective in a VR environment. This
was chosen as the final session that the subject might find enjoyable and interactive. The
subjects were given the option of stopping at any time or continuing to finish a full game
which took approximately 15 minutes. For all the game the hope is that some in-game
event will elicit an emotional response that can be analyzed however given the variation
in game play this is very difficult to guarantee and the timing can vary considerably.
20
3.2 Data Collection
3.2.1 Physiological Signals Measured
Data collection was limited to non-invasive techniques, which include disposable
adhesive electrodes on the skin, wearable type sensors such as respiration straps, finger
electrodes, ear clips, motion, and potentially optical imaging. The physiological data
was measured using two BioRadio Wireless Physiology Monitor and associated
peripherals from Great Lakes Neurotechnology. The BioRadio provides 4 differential
inputs that can be configured for a variety of electrical biosignals and an additional
expansion pod that can be used for temperature sensing and pulse oximetry. As described
in the BioRadio User’s Guide “The BioRadio is worn by the person and is designed for
acquiring physiological signals from sensors attached on the body. Physiological signals
are amplified, sampled, and digitized, which can be wirelessly transmitted to a computer
Bluetooth receiver and/or recorded to onboard memory for post-analysis.” More details
can be found in the user manual which is available online. [49]. Respiration was
measured using Inductive Interface Cables and Universal Adjustable Respiratory
Inductive Effort (RIP) belts made by SleepSense (S.L.P. Inc). The RIP bands and
Interface Module provide a voltage input to the BioRadio which varies based on the
measure chest and abdomen volume. This technique has been shown to reliably measure
respiration [50]. The finger based pulse oximeter used was Model 3012LP by Nonin
Technologies. Pulse Oximetry utilizes two types of LEDs to measure the absorption of
light within the finger or earlobe to determine pulse, SpO2, and peripheral blood volume
[51].
21
Table 4: Physiological Signals Recorded
Radio Signal Location Qty
H_Ch1 EEG f4 High right forehead 1
H_Ch2 EOG - Horizontal Outside of eyes 1
H_Ch3 EOG - Vertical Above and below right eye 1
H_Ch4 EMG – Zygomaticus “smile” muscle Right cheek 1
H_int Accel XYZ, Gyro XYZ Rear of head 6
B_Ch1 GSR (Electrodermal Activity) Right index & pointer finger 1
B_Ch2 ECG Left and right wrists 1
B_Ch3 Chest Respiration (RIP) Chest strap 1
B_Ch4 Abdomen Respiration (RIP) Stomach strap 1
B_Aux Peripheral Temperature Right pinkie finger 1
B_Aux Heart Rate via PulseOx Right ring finger 1
B_Aux Blood Volume (PPG) via PulseOx Right ring finger 1
B_Aux Blood Oxygen (SpO2) via PulseOx Right ring finger 1
B_int Accel XYZ, Gyro XYZ Right waist 6
22
3.2.2 Skin Preparation and Electrode Placement
Each subject was provided with an instruction sheet guiding them in the skin
preparation and electrode application. They were asked to clean the skin where the
electrode will be attached with an alcohol swab to remove any oils, lotions, makeup, etc.
as well as dead skin. They were specifically told that was not necessary to scrub
vigorously. They were also instructed to take care particularly around the eyes and
installed the under eye electrode down far enough to avoid the sensitive under eye skin.
23
After the alcohol was allowed to dry, the cloth electrodes were removed from the
backing (it helps if the facilitator removes the electrode and hands it to the subject) and
applied to the skin as shown below.
Wrist/Hands
1 Middle segment of right index finger, palm side
2 Middle segment of right pointer finger, palm side
3 Left wrist, palm side
4 Right wrist, palm side
5 Right elbow
Figure 1: Placement of Electrodes on Left and Right Hands
24
Face
6 High on right forehead near hairline
7 Center of forehead ~2 cm above brow line
8 Above right eye and eyebrow
9 Right of right eye
10 Below right eye - careful not too close - skin is sensitive here
11 Most prominent point of right cheek bone
12 Above and slightly to right of mouth “dimple” area
13 Left of left eye
IMPORTANT: This diagram is mirrored for use while looking in mirror
Figure 2: Placement of Electrodes on Face
25
The electrodes used in this study were MVAP-II Electrodes containing a
Silver/Silver Chloride Sensing Element with Hydro Gel and manufactured by MVAP
Medical Supplies 1415 Lawrence Drive, Newbury Park, CA 91320. During early testing
two other electrode types were evaluated: The Skintact Premier 3415 by Leonhard Lang
GmbH and TD-141C square cloth electrodes from Florida Research Instruments. The
Skintact electrodes adhered well but were uncomfortable to remove after testing. The
Florida Research Instruments electrodes were more comfortable but came off several
times during evaluation. These observations are noted for the adhesive only; no
comparison of the relative electrical signal performance was made.
3.3. BioRadio mounting and cabling
A seemingly simple but significant issue was mounting the BioRadio uniformly
and securely. The BioRadio is equipped with a removable belt clip however since the
accelerometer and gyroscope are internal to the radio a uniform mounting independent of
clothing was desired. The characteristics of the radio attached to an elastic waist band
might vary significantly versus one clipped to a tighter belt.
For the head radio the first attempts were to simply clip the radio onto the straps
of the Oculus DK2. The Oculus is fairly immobile on the head however the BioRadio
moved significantly when just clipped to the strap. A big improvement in mounting was
made by using a plastic mounting bracket modified from an inexpensive LED headlamp
which was secured to the straps at the cross point directly in the back of the head. The
BioRadio clip was still used but given the thickness of the plastic bracket the radio’s
26
movement independent of the head was significantly reduced as shown in the right hand
picture in Figure 3.
Figure 3: “Head” BioRadio mounting improvements
In order to eliminate the possibility of clothing affecting the motion capture of the
“body” radio a back supporting belt was utilized. This belt attaches securely around the
waist and the Velcro closures were used to hold the body radio tightly against the right
side of the waist. Additionally, it was possible to also use the Velcro flaps to secure the
cables and sensor pods associated with the RIP belts.
27
Figure 4: Body BioRadio mounting with Back Support Belt
As can be seen in these photographs, cable management remained somewhat
cumbersome. This is a problem for several reasons. First the subject’s movement is
somewhat limited. This did not prove to be an issue for these activities that consisted
only of sitting and standing in a limited area but it will be a greater problem as movement
is increased. Directly related to this is the fact that the cable can become snagged and
disconnect during the activities. This required special attention especially with respect to
the ergonomic armrests and adjustment knobs on the chair. Finally cable movement can
induce noise into the signals. Best practices include taping or affixing the cables tightly
to the body but this was not practical in a non-medical setting. Future setups would
benefit from a fixed harness where the cables are joined into less cumbersome bundles.
Once wireless transmission and device power reaches maturity eliminating the cables
completely through the use of Bluetooth LE enabled sensors would result in a much more
pleasant experience for the subject.
28
3.3.1 Video Collection and Event Markers
All data collection experiments were recorded with a high definition video
recorder and external microphone. This video and audio record proved very valuable
when labeling the sessions. In addition, the marker functionality of the BioRadio was
used to mark key segments such as the start of a VR application in the data set.
3.3.2 Subject Feedback Regarding Arousal and Valence
Initial experimentation showed that getting consistent and reliable feedback from
the subject is challenging. It is quite difficult to describe many of the experiences in
consistent emotional terms. Indeed, the question “How did that make you feel?” is an
opened ended on often used in therapy. Even members of the research team who were
familiar with the classification of emotion and the arousal-valence model struggled to
enunciate what types of emotion a specific video or game induced.
Initial attempts to simplify this process included several variation of multiple
choice selections. One was specifically based on the six universal emotion categories
used by Ekman and Friesen [3] which are: happiness, anger, sadness, disgust, surprise,
and fear. Unfortunately, while these emotions may be present in all subjects they did not
cover the range of emotion reported during the experimentation. After watching what
could be best described as either a relaxing or boring video there is no best fit response
available with these six categories.
29
Figure 5: Emotion and Range Subject Response Form – too complex
Figure 5 shows a second iteration, which included a listing of emotions along with
a ranking. Subjects were asked to complete the table after each session. Two
fundamental issues arouse with the use of this form. First the revised categories still did
not match with the subject’s expressed emotion during the simulation. Second the
addition of a range considerably lengthened the time that subjects required to complete
the feedback.
Another issue arose once the testing was moved exclusively to the Virtual Reality
environment. The initial thinking was that the subjects would welcome a brief break
between the 3 to 8 minute sessions to remove the Virtual Reality headset and complete
the survey form. However, the frequent removal of the headset proved to be annoying
and broke the flow of the simulations. Once “inside” the Virtual Reality world it was
much preferred to continue with the sessions. Furthermore, many of the electrodes and
wires are located on the face and hands so the removal of the headset was much more
30
cumbersome than it would be during traditional usage. This also increased the risk that
one of the cables or electrodes could be disconnected.
The final methodology employed was a simplified version of the arousal-valence
model. Instead of removing the headset to fill out a form, the subject was asked verbally
the following questions after each session:
Did you find this [movie, game, demo] exciting, relaxing, or neutral?
Did you find this [movie, game, demo] pleasant, unpleasant, or neutral?
While there was still some hesitation on the subject’s part especially during longer
sessions that had multiple parts this simplified oral response method worked much better
than the prior methods.
3.4 Signal Processing and Classification
3.4.1 Data Export
The collected data was stored on a laptop running Windows 10 and two instances
of the BioCapture program. Each instance of BioCapture linked to one radio: Instance
on left was linked to the head radio, instance on right was linked to the body radio. For
consistency the recording was started on the head radio first and then on the body radio.
The typical offset involved with switching instances and setting up the second recording
was approximately 17 seconds. The keyboard was configured for a marker in the head
radio instance of BioCapture. All marker data was captured in the head instance. The
naming convention used was Sx_VRy[H,B].bcrx for subject x and session y and Head,
Body.
31
After the session was completed the files were exported from the BioCapture
program into standard comma delimited text (.csv) format. Two versions were exported,
one with RealTime information (used for synchronization) and an ElapsedTime version
(smaller and easier to handle in MATLAB).
Given the 250 Hz sampling rate each minute of collected data generated 15,000
rows in the table. The total number of rows depends on the session length and ranges
from approximately 70,000 to 140,000 rows for sessions 1 - 6. Due to the fact that it was
left to the subject as to how long to continue Session 7 has a broader range and can
exceed 250,000 rows if the subject completes the entire game. The head configuration
has 13 columns and the body configuration has 16 columns.
Figure 6: Directory Listing showing Data Files and Size after Export
32
3.4.2 Data Import, Table Join, and Labeling
After conversion to .csv format each of the 14 data files (7 sessions x 2 for Head
and Body) was imported in to a MATLAB table and stored on the University server.
MATLAB Version R2016a was used for this analysis. Product details can be found on
the MathWorks website [52]
Figure 7: Raw Subject 1 Data after Import as MATLAB Table
Since the head and body recording are separate they need to be joined prior to
classification. Unfortunately, there is no common signal nor ability to add a marker in
each file. There are known techniques of synchronizing the files based on cross
correlation however given that each of the signals is discrete and the times involved are
based on human reaction the files were synchronized using the real time data available
via the BioRadio. Specifically, an offset was calculated by subtracting the delay from
33
the start of the head recording to the start of the body recording and this was used to align
the rows prior to performing a table join. In addition, the start and stop times were also
used to discard the setup and takedown segments of each session.
A separate table containing column vectors with the subject and session data was
also created. This metadata was present in the file name and would be lost when all of
the data was combined. The table and its column vectors were joined to the initial table.
Classification labeling within each segment was much more manual and required the time
information gleaned from the video. A point was made to include a start mark (by
pressing the ‘S’ button on the laptop) in the head file that was audibly and visually visible
in the video. Since this data was also clearly present in the data file it served as the
alignment index between the video data and the signal data. For several of the sessions
the activities were further broken down into segments as previously described in the
subject self-reporting section. The video was viewed and time manually input into a
spreadsheet to convert to elapsed time in the data file.
34
Figure 8: Screenshot of Excel Spreadsheet used for Timing Synchronization
MATLAB was used to process the raw data into a single “mega table” that
included the 24 signal data, subject, session, self-reported arousal, self-reported valence,
and segment column vectors. In keeping with good programming practice the
experiment specific information was imported from separate .csv files with the hope that
future sessions could be processed without modifications to the MATLAB code itself.
36
3.4.3 Feature Extraction
Feature extraction was performed in a series of several experiments once the
dataset was available. For the very first runs a simple mean and standard deviation
feature table was built. Given the 24 time based signal present in the dataset this yielded
a table containing 48 feature vectors. A time sweep was performed from 1 second to 20
seconds to see what epoch size yielded the best results.
A second more complex set of feature extraction was performed using existing
code from the research group that extracted a total of 90 features for each signal.
3.5 Institutional Review Board
As this research involved human subjects, review and approval by the Texas State
Intuitional Review Board was required.
“Texas State University, by action of the President, has established an institutional review
board (IRB) to review human subject research. This board is supported by The Office of
Research Integrity and Compliance (ORIC). The IRB reviews research that is conducted
or supported by the Texas State University faculty, students or staff in order to determine
that the rights and welfare of the human subjects are adequately protected. The IRB is
guided by the ethical principles described in the 'Belmont Report' and by the regulations
of the U.S. Department of Health and Human Services found at Title 45 Code of Federal
Regulations, Part 46. Texas State maintains an approved Federal wide Assurance
(FWA00000191) of Compliance with the Office for Human Research Protection
(OHRP).” [53]
37
Application #2016M7258 Version #1 was approved 7/12/2016 and expires 6/30/2017.
All subjects signed an informed consent form prior to the virtual reality sessions.
Subjects were briefed on the content and approximate duration of each session
immediately prior to the session during the data collection. Subjects were periodically
reminded that they were free to stop at any time without repercussion or harm to the
research. Subjects were also told to be aware of the possibility of nausea and
read/accepted the in session Oculus health warning.
The collected dataset does not contain subject names or other known identifiable
information. All data is stored on the Texas State TRACS system with controlled access
by Dr. Vangelis Metsis.
38
4 EXPERIMENTS AND RESULTS
4.1 Self-Reported Data – Range of Responses during Simulations
Table 5: Summary of Subject Reponses
The subject self-reported data to the questions described in Section 3.3.3 are
shown in Table 5. The results are very asymmetrical with the majority of the segments
rated as ‘Exciting’ and ‘Pleasant’. This is likely due to the conservative selection of
stimuli and the limited number of subjects. For example, only one subject expressed any
trepidation regarding heights and therefore the Pendulum Swing and City Scene which
both involved a view from very high perspective with a large potential drop off was not
rated as unpleasant. By subject data is shown in Table 6 and Table 7.
41
Figure 10 shows the individual subject responses represented on a 3 x 3 grid which
reflect the arousal (y-axis) and valence (x-axis) of the rated response.
Figure 11: Summary of all Subject Responses marked on Arousal-Valence axis
Figure 11 shows the sum of the total subject responses represented on a 3 x 3 grid which
reflect the arousal (y-axis) and valence (x-axis) of the rated response.
42
4.2 Initial Signal Analysis and Data Conversion
Signals were plotted using the BioCapture software tool provided by the
BioRadio vendor for quick inspection and verification that the signal data was good.
Figure 12: Side by side screenshot of Signal Data shown in BioCapture Software
Figure 12 shows an initial visualization of the signals during an early run. The
signals on the left are from the “head” BioRadio and the signals on the right are from the
“body” BioRadio. The red line on these graphs represent the moment when the subject is
pushed from a cliff in the virtual reality simulation. Several interesting and encouraging
aspects are present. First the EEG f4 (labeled fR on the figure) shows a distinct change
43
after the “push”. This is in the absence of a significant change in EOGV and EOGH
which could also affect the EEG measurement. The EMG cheek signal shows an
increase which would be associated with a smile, laugh, or potentially jaw clench. The
GSR is increasing for a period of several seconds which is consistent with a stress or
excitement reaction. Finally, the heart rate rises also consistent with an increase in
arousal.
4.2.1 Rose and I Movie
The Rose and I Movie was rated as pleasant and relaxing or neutral by all
subjects. As such we would not anticipate any significant reactions or events during the
course of the session. The graphs below show six selected signals and represent a
baseline for comparison. The session itself is approximately 4 minutes long, the graphs
show a 90 second window which is similar in duration to the rollercoaster and pendulum
swing sessions. At the 45 second mark there is a scene in the movie where a flower
sneezes and the main character is startled.
44
Figure 13: Subject Heart Rate during Rose and I Segment
The heart rate remains relatively low and unchanged throughout this video as expected.
There is a difference in the base heart rate among the subjects that is evident during this
typically relaxing session.
45
Figure 14: Peripheral Blood Volume during Rose and I Segment
The peripheral blood pulse volume exhibits some variation but with no discernable
pattern related to the session stimuli.
46
Figure 15: EEG during Rose and I Segment
The EEG f4 signal exhibits some variation but with no discernable pattern related to the
session stimuli.
47
Figure 16: EMG during Rose and I Segment
The EMG signal exhibits some variation but with no discernable pattern related to the
session stimuli.
48
Figure 17: GSR during Rose and I Segment
Subject 3 shows a potential response at time 20 seconds however there is no in session
stimuli at this point. Subject 4 shows a response at time 45 seconds which is at the time
of the previously mentioned sneeze/startle scene. Subject 2 and 3 have potential
responses as well but less pronounced. No response can be seen in Subject 1 data.
Unfortunately, Subject 5’s data in invalid due to a disconnection of the finger electrode.
49
Figure 18: Peripheral Temperature during Rose and I Segment
The peripheral skin temperature for Subjects 1,3,4, and 5 increases during this session
while Subject 2 decreases. This is in contrast to the tendency of the peripheral
temperature to decrease during the rollercoaster and pendulum swing segments.
4.2.2 Roller Coaster
The rollercoaster segment includes climbing the ramp from time 0 to 24 seconds
with the remainder of the time being the actual ride. Subject 2 reported mild nausea
immediately following the ramp segment.
50
Figure 19: Subject Heart Rate during Roller Coaster Segment
Only Subject 2 who reported mild nausea and Subject 3 showed a significant increase in
heart rate during the roller coaster segment. Subjects 1 and 2 had steady heart rates and
Subject 5 had slightly decreasing heart rate during the ride segment.
51
Figure 20: Subject Peripheral Blood Pulse Volume Roller Coaster Segment
The peripheral blood pulse volume exhibits some variation but with no discernable
pattern related to the session stimuli.
52
Figure 21: Subject EEG f4 during Roller Coaster Segment
No consistent time related pattern is seen in the EEG f4 signal during the roller coaster
segment. Much of the variation here is likely due to the head and eye movement
resulting in artifacts in the f4 signal.
53
Figure 22: Subject EMG signal during Roller Coaster Segment
The EMG signal for S2 and S3 increases significantly at the point of the first drop
however the DC component is less meaningful and needs to be filtered out.
54
Figure 23: Subject Galvanic Skin Response signal during Roller Coaster Segment
The Galvanic Skin Response for all 4 Subjects increases about the time of the roller
coaster reaching the top of the ramp and during the first drop. Unfortunately, Subject 5’s
electrode came loose and the data is invalid.
55
Figure 24: Subject Peripheral Skin Temperature during Roller Coaster Segment
The peripheral skin temperature for all subjects trended down during this session.
4.2.3 Pendulum Swing
This session was intended to cause height related anxiety. Subject 1 was the only
participant who expressed a moderate fear of heights and consequently this segment was
ranked pleasant and exciting by 4 of the 5 participants. The most significant event
occurs at 57 seconds on the following graphs; it is when the participant is pushed off of
the cliff in the virtual reality simulation.
56
Figure 25: Subject Heart Rate during Pendulum Swing Segment
The heart rate of Subject 1 shows an increase following the “push” at 57 seconds. A
very slight increase can also be seen in subjects 2, 3, and 4 with no meaningful change in
Subject 5.
57
Figure 26: Subject Peripheral Blood Pulse Volume Pendulum Swing Segment
Peripheral blood pulse volume appears to decrease slightly in Subject 4 at the 57 second
“push” point but no clear trend can be seen across all subjects.
58
Figure 27: Subject EEG f4 during Pendulum Swing Segment
The promising rise in EEG f4 in Subject 1 previously discussed in Section 4.2 did not
appear in any of the remaining four subject’s EEG signal.
59
Figure 28: Subject EMG signal during Pendulum Swing Segment
The EMG signal charted here (auto-scaled) shows no specific markers at the 57 second
“push” point. For Subject 1 there appeared to be an increase in EMG at this time as
described in Section 3.3. That graph was also auto-scaled but across a much shorter time
window. Further processing of the EMG signal to remove the low frequency DC voltage
component is needed.
60
Figure 29: Subject Galvanic Skin Response signal during Pendulum Swing Segment
Subjects 2 and 4 seem to exhibit a galvanic skin response with a timing and that is
consistent with the “push” at the 57 second point. However, no clear rise/fall is seen in
Subjects 1, 3, and 5.
61
Figure 30: Subject Peripheral Skin Temperature Pendulum Swing Segment
The peripheral skin temperature for all subjects trended down during this session. There
is no specific response at the 57 second “push” event. This segment was completed
standing and it is possible that this had an impact causing the general downward trend in
the peripheral skin temperature. Further investigation regarding the effect of sitting
versus standing is warranted.
4.3 Feature Extraction and Segment Size
4.3.1 Simple Mean and Standard Deviation Feature Extraction
For initial quick analysis very simple mean and standard deviation features were
extracted with a 1 second (250 sample) window. The resulting MATLAB table
62
contained 48 features, 2 for each of the 24 time based input signals. The response class
was the session – in this case the relaxing VR2 and the exciting VR3 sessions.
The table was input in to the MATLAB classification tool [54] and all available
classifiers were run and evaluated with 5-fold cross validation for a range of epoch sizes.
While the accuracy is higher for a 1 second epoch this is likely due to overfitting as the
physiological signals and emotional response will not very significantly in such a short
time span. In addition, because data from all 5 subjects is used during the cross
validation the results are much higher than found during hold-one-out analysis where the
test set consists of the data from a subject who is not included in the training set.
63
Table 8: MATLAB Classifier Accuracy with 5 fold cross-validation, 1 second
epoch, and mean + standard deviation features
64
Table 9: MATLAB Classifier Accuracy with 5 fold cross-validation, 5 second
epoch, and mean + standard deviation features
4.3.2 Temporal and Frequency Based Feature Extraction
For second pass analysis utilized existing feature extraction algorithms that were
developed for a previous sleep study also using physiological data. For each signal the
90 features shown in Table 10 were generated resulting in a total of 2160 feature vectors.
65
Table 10: List of Temporal and Frequency Based Features
4.3.3 Domain Specific Feature Extraction
For the final pass feature extraction algorithms were developed based on existing
knowledge regarding the behavior of the physiological signals with respect to emotional
response. Each of the 15 features were computed by subject and are described briefly
below.
66
meanHR – the mean of the heart rate as reported by the pulse oximeter is calculated for
each epoch and is normalized by subtracting the mean of the subject’s heartrate for the
entire dataset (the base heart rate)
magPPV – the magnitude of the peripheral blood volume as reported by the pulse
oximeter was calculated for each epoch by subtracting the minimum value from the
maximum value.
slopeGSR – the slope of the electrodermal activity (EDA) or skin resistance was
calculated by subtracting the value of the last sample in the epoch from the value of the
first sample in the epoch.
meanGSR – the mean value of the skin resistance was calculated as the average of all
samples within each epoch.
slopePT – the slope of the peripheral skin temperature (SKT) was calculated for each
epoch by subtracting the minimum value from the maximum value.
mECGHR – the mean heartrate based on the ECG signal was calculated by counting the
number of peaks which were greater than 0.5 seconds apart during each epoch.
HRV – heart rate variability is a better predictor of emotion than raw heartrate [55]. The
variability of the heart rate was computed by taking the maximum distance between
adjacent peaks minus the minimum distance between adjacent peaks divided by the
average distance between peaks for a given epoch.
minHRV – is calculated as HRV except the min peak distance only is used.
maxHRV – is calculated as HRV except the max peak distance only is used.
respA, respC – a similar peak counting method as mECGHR is applied with a 2 second
peak to peak minimum for the abdomen and chest RIP signals.
67
respVA, respVC – the minimum peak to peak distance divided by the mean peak to peak
distance is computed for each epoch from the abdomen and chest RIP signals.
meanfR – the mean value of the f4 EEG signal is computed. This signal is previously
normalized by subject
magEMG – the magnitude of the EMG signal is calculated by subtracting the minimum
value in the epoch from the maximum value in the epoch.
The results using this mix of features were better than those using the simple
mean/standard deviation features and the temporal and frequency based features. The
highest results were achieved via feature selection and will be discussed in the next two
sections.
4.4 Feature Selection
4.4.1 Simple Mean and Standard Deviation Feature Selection
For the simple mean and standard deviation feature set only a few feature
selection experiments were run. The accelerometer and gyroscope data from both the
head and body radios were eliminated and did not significantly impact the results. Most
of the feature selection efforts were on the large temporal and frequency based feature set
and the smaller but varied domain based feature set which will be detailed in the next two
sections.
4.4.2 Temporal and Frequency Based Feature Selection
Feature selection was performed on the 2160 features individually using a sparse
technique as it was not possible to run feature selection on all features simultaneously.
68
The resulting 161 features identified were then again processed with a sparse technique
resulting in 30 final features.
4.4.3 Domain Specific Feature Extraction
Multiple combinations of features were run during the development of the domain
based features as well as some automated testing. The accuracy was found to be highest
with the following set of features: meanHR, magPPV, slopeGSR, mECGHR, HRV.
When each feature was evaluated individually the top 3 performing features were
meanHR, magPPV, and HRV. With leave one out analysis the best results were achieved
when the slopePT feature was removed. For a complete description of these features
see section 4.3.3.
4.5 Classification Accuracy
As discussed earlier the classification accuracy results calculated using 5 fold
cross-validation in the MATLAB Classification learner were much higher than leave on
out by subject results. Given the small dataset and the goals of the research to be able to
determine emotional response on new subjects the validation used for the following
results was based on hold one out by subject. Specifically, for each subject a test set
was built containing that subjects 24 time based signal data along with their reported
arousal response. Given the limitations of the response data, valence was not used and
arousal was classified as binary with 1 corresponding to an “exciting” response and a 0
corresponding to a “neutral” or “relaxing” response. The default MATLAB r2016a
SVM classifier was used with a 10 second epoch.
69
For the simple mean and standard deviation feature set utilizing all 48 features
(without feature selection) the hold one out accuracy was 74%. Using all 2160 temporal
and frequency based features the hold one out accuracy was 57%. After feature selection
to 30 total features the temporal and frequency based feature set hold one out accuracy
was 72%.
The highest overall hold one out accuracy of 80% was achieved using a Support
Vector Machine and the following five features: meanHR, magPPV, slopeGSR,
mECGHR, HRV.
70
5 CONCLUSION
Experimental design is critical to the success of this research. Testing the stimuli
and subject self-reporting responses prior to the more complex experiments with the full
sensor setup would be beneficial. Unfortunately, this will likely require a much larger
subject population as some level of desensitization will occur therefore it would be best
to perform the data collection on subjects who have not previously participated in the
virtual reality simulations. In order to better cover the relaxing portion of the arousal-
valence space, calming or meditative segments should be added. More challenging will
be the identification of emotionally unpleasant stimulus. Given the immersive nature of
the virtual reality environment care must be taken not to cause undue stress with explicit
or unsettling content. A thorough IRB review of the stimulus, subject selection, and
procedures would be warranted with this type of content.
Much larger number of subjects with multiple runs are needed. Some
physiological signals varied significantly between subjects with no universal telltale
markers found. If possible, selecting subjects based on presence or absence of sensitivity
to the intended stimuli would be beneficial. For example, a dataset with 50% of the
subjects expressing a fear of heights and 50% having no fear of heights engaged in a
virtual reality simulation that involves height and drop simulations would be very
interesting.
In order to increase the number of subjects several elements will need to be
improved. Less cumbersome equipment with better cable management and/or wireless
sensors along with sensors built into headset or other wearable type garments instead of
adhesive type electrodes would make the simulations much more pleasant and also
71
decrease the required setup time. Event marking should be as automated as possible.
Video recording with remote audio capabilities was invaluable for this preliminary
research but reviewing each video to determine event label start and stop points required
a large amount of time. This manual technique also increases the risk of error and timing
inaccuracies.
Feature extraction using both simple and complex techniques that were not
domain specific did not yield robust results when tested on subject data not in the training
set. Domain knowledge will likely need to be incorporated for each type of signal. For
example, the rise and recovery of a GSR event is well documented and fairly easy to spot
visually on a graph however specific feature extraction will be required to form a marker
for this type of event in the feature table. In addition, motion is likely not relevant for
the direct prediction of emotional response, however it could prove to be valuable in
detecting and eliminating noise and motion related artifacts from the signals that are
relevant.
With the continued progress in sensing technology and through the application of
machine learning on large datasets, the classification of human emotion will help guide
therapy, training, and the development of improved experiences with automated systems
that include affective computing capabilities.
72
LITERATURE CITED
[1] E. Kaniusas, Biomedical Signals and Sensors I: Linking Physiological Phenomena
and Biosignals, Heidelberg, Germany: Springer Verlag GmbH, 2012.
[2] R. W. Picard, Affective Computing, Cambridge: MIT Press, 1997.
[3] R. Ortner, E. Grunbacher and C. Guger, "State of the Art in Sensors, Signals and
Signal Processing," Guger Technologies OG, Graz, Austria, 2013.
[4] M. J. Jamal, Signal acquisition using surface EMG and circuit design considerations
for robotic prosthesis, INTECH Open Access Publisher, 2012.
[5] S. Day, Important factors in surface EMG measurement, Bortec Biomedical Ltd
publishers, 2002.
[6] E. P. Scilingo, A. Gemignani, R. Paradiso, N. Taccini, B. Ghelarducci and D. De
Rossi, "Performance evaluation of sensing fabrics for monitoring physiological and
biomechanical variables," IEEE Transactions on information technology in
biomedicine 9.3, pp. 345-352, 2005.
[7] C. Guger, G. Krausz, B. Allison and G. Edlinger, "Comparision of dry and gel based
electrodes for P300 brain-computer interfaces," Frontiers in Neuroscience, pp. 1-6,
2012.
[8] S. Fairclough, "Physiological data must remain confidential," Nature, pp. 263-263,
2014.
[9] I. Korhonen, J. Parkka and M. Van Gils, "Health monitoring in the home of the
73
future," IEEE Engineering in medicine and biology, vol. 3, no. 22, pp. 66-73, 2003.
[10] G. Valenza and E. P. Scilingo, Autonomic nervous system dynamics for mood and
emotional-state recognition: Significant advances in data acquisition, signal
processing and classification, Switzerland: Springer International Publishing, 2014.
[11] L. Xiao, S. Boyd and S. Lall, "A scheme for robust distributed sensor fusion based
on average consensus," In IPSN 2005. Fourth International Symposium on
Information Processing in Sensor Networks, pp. 63-70, 2005.
[12] A. Pantelopoulos and N. G. Bourbakis, "A survey on wearable sensor-based systems
for health monitoring and prognosis," IEEE Transactions on Systems, Man, and
Cybernetics, Part C (Applications and Reviews), vol. 1, no. 40, pp. 1-12, 2010.
[13] V. Kaarjakari, Practical MEMS: Design of microsystems, accelerometers,
gyroscopes, RF MEMS, optical MEMS, and microfluidic systems, Las Vegas: Small
Gear Publishing, 2009.
[14] N. D. Lane, E. Miluzzo, H. Lu, D. Peebles, T. Choudhury and A. Campbell, "A
survey of mobile phone sensing," IEEE Communications, vol. 48, no. 9, pp. 140-
150, 2010.
[15] Z. Xu, J. Hilton, G. Gates, S. Jelic, Y. Stern, M. Bartels, R. DeMeersman and R.
Basner, "Increased sympathetic and decreased parasympathetic cardiovascular
modulation in normal humans with acute sleep deprivation," Journal of applied
physiology, vol. 6, no. 98, pp. 2024-2032, 2005.
[16] W. H. Spriggs, Essentials Of Polysomnography: A Training Guide and Reference
74
For Sleep Technicians 2nd Edition, Burlington: Jones & Bartlett Learning, 2015.
[17] P. &. F. W. V. Ekman, "Constants Across Cultures in the Face and Emotion,"
Journal of Personality and Social Psychology, Vol 17(2), pp. 124-129, 1971.
[18] E.-H. Jang, B.-J. Park, S.-H. Kim, M.-A. Chung, M.-S. Park and J.-H. Sohn,
"Emotion Classification based on Bio-Signals: Emotion Recognition using Machine
Learning Algorithms," Proceeding of the 2014 International Conference on
Information Science, Electronics and Electrical Engineering, pp. 1373-1376, 2014.
[19] K. Gouizi, F. B. Reguig and C. Maaoui, "Emotion recognition from physiological
signals," Journal of medical engineering & technology, Vols. 6-7, no. 35, pp. 300-
307, 2011.
[20] P. Ekman, R. W. Levenson and W. V. Friesen, "Autonomic nervous system activity
distinguishes among emotions," Science, pp. 1208-1210, 1983.
[21] P. Rainville, A. Bechara, N. Naqvi and A. R. Damasio, "Basic emotions are
associated with distinct patterns of cardiorespiratory activity," International Journal
of Psychophysiology 61, no. 1, pp. 5-18, 2006.
[22] C. Collet, E. Vernet-Maury, G. Delhomme and A. Dittmar, "Autonomic nervous
system response patterns specificity to basic emotions," Journal of the Autonomic
Nervous System 62, no. 1-2, pp. 45-57, 1997.
[23] F. Canento, A. Fred, H. Silva, H. Gamboa and A. Lourenco, "Multimodal biosignal
sensor data handling for emotion recognition," IEEE In Sensors, pp. 647-650, 2011.
[24] R. W. Leon, I. Karacan and C. J. Hursch, Electroencephalography (EEG) of human
75
sleep: clinical applications, John Wiley & Sons, 1974.
[25] J. Sztajzel, "Heart rate variability: a noninvasive electrocardiographic method to
measure the autonomic nervous system," Swiss medical weekly, no. 134, pp. 514-
522, 2004.
[26] D. T. Lykken, "The GSR in the detection of guilt," Journal of Applied Psychology,
vol. 43, no. 6, p. 385, 1959.
[27] J. Block, "A study of affective responsiveness in a lie-detection situation," The
Journal of Abnormal and Social Psychology, vol. 1, no. 55, p. 11, 1957.
[28] J. T. Cacioppo, R. E. Petty, M. E. Losch and H. S. Kim, "Electromyographic activity
over facial muscle regions can differentiate the valence and intensity of affective
reactions," Journal of personality and social psychology, vol. 2, no. 50, p. 260, 1986.
[29] A. E. Kaufman, A. Bandopadhay and B. D. Shaviv, "An eye tracking computer user
interface," IEEE 1993 Symposium on Research Frontiers, pp. 120-121, 1993.
[30] E. A. Kesinger, "Remembering emotional experiences: The contribution of valence
and arousal," Reviews in the Neurosciences, vol. 4, no. 15, pp. 241-252, 2004.
[31] A. Gerrards-Hesse, K. Spies and F. Hesse, "Experimental Inductions of emotional
states and their effectiveness: A review," British Journal of Psychology, pp. 55-78,
1994.
[32] A. Haag, S. Goronzy, P. Schaich and J. Williams, "Emotion recognition using bio-
sensors: First steps towards an automatic system," In Tutorial and research
workshop on affective dialogue systems, pp. 36-48, 2004.
76
[33] P. Lang, B. Cuthbert and M. Bradley, "International affective picture system (IAPS):
Affective ratings of pictures and instruction manual. Technical Report A-8,"
University of Florida, Gainesville, FL, 2008.
[34] C. D. Katsis, N. Katertsidis, G. Gainiatsas and D. I. Fotiadis, "Toward Emotion
Recognition in Car-Racing Drivers: A Biosignal Processing Approach," IEEE
Transactions on Systems, Man, and Cybernetics - Part A: Systems and Humans, Vol.
38, no. 3, pp. 502-512, 2008.
[35] R. L. Mandryk, M. S. Atkins and K. M. Inkpen, "A Continuous and Objective
Evaluation of Emotional Experience with Interactive Play Environments,"
Proceedings of the SIGCHI Conference on Human Factors in Computing Systems,
pp. 1027-1036, 2006.
[36] L. Mitchell, R. A. MacDonald and E. Brodie, "Temperature and the Cold Pressor
Test," The Journal of Pain, pp. 233-238, 2004.
[37] C. Zong and M. Chetouani, "Hilbert-Huang Transform based Physiological Signals
Analysis for Emotion Recognition," Proceeding of the 9th IEEE International
Symposium on Signal Processing and Information (ISSPIT), p. 334 – 339, 2009.
[38] E. L. van den Broek, V. Lisy, J. H. Janssen, J. H. Westerink, M. H. Schut and K.
Tuinenbreijer, "Affective Man-Machine Interface: Unveiling Human Emotions
through Biosignals," BIOSTEC 2009, Communications in Computer and Information
Science (CCIS), pp. 21-47, 2009.
[39] P. M. Agante and J. P. Marques De Sa, "ECG noise filtering using wavelets with
soft-thresholding methods," IEEE, pp. 535-538, 1999.
77
[40] J. Goldie, C. McGregor and B. Murphy, "Determining levels of arousal using
electrocardiography: a study of HRV during transcranial magnetic stimulation," In
2010 Annual International Conference of the IEEE Engineering in Medicine and
Biology, pp. 1198-1201, 2010.
[41] J. Bell, Machine Learning: Hands-On for Developers and Technical Professionals,
Indianapolis: John Wiley and Sons, Inc, 2014.
[42] Y. S. Abu-Mostafa, M. Magdon-Ismail and H.-T. Lin, Learning from Data,
AMLbook, 2012.
[43] G. Tsoumakas and I. Katakis, Multi-label classification: An overview, Thessaloniki:
Dept. of Informatics, Aristotle University of Thessaloniki, 2006.
[44] L. Wang, Support vector machines: theory and applications, Vol 177, Spring Science
and Business Media, 2005.
[45] T. Mitchell, "Artifical Neural Networks," Machine Learning, pp. 81-127, 1997.
[46] S. Nicu, I. Cohen, T. Gevers and T. S. Huang, "Multimodal approaches for emotion
recognition: a survey," International Society for Optics and Photonics, pp. 56-67,
2005.
[47] T. S. Guzella and W. M. Caminhas, "A review of machine learning approaches to
spam filtering," Expert Systems with Applications, vol. 7, no. 36, pp. 10206-10222,
2009.
[48] "Oculus Rift Applications," 15 September 2016. [Online]. Available:
https://www.oculus.com/experiences/rift/.
78
[49] "BioRadio User Guide Rev E," 15 January 2016. [Online]. Available:
https://glneurotech.com/bioradio/wp-content/uploads/2015/02/392-0050-Rev-E-
BioRadio-User-Guide.pdf.
[50] P.-Y. Carry, P. Baconnier, A. Eberhard, P. Cotte and G. Benchetrit, "Evaluation of
respiratory inductive plethysmography: accuracy for analysis of respiratory
waveforms," CHEST Journal, vol. 4, no. 111, pp. 910-915, 1997.
[51] J. E. Sinex, "Pulse oximetry: principles and limitations," The American journal of
emergency medicine, vol. 1, no. 17, pp. 59-66, 1999.
[52] "MATLAB Product Overview," 15 January 2016. [Online]. Available:
https://www.mathworks.com/products/matlab/index.html.
[53] "Institutional Review Board," Texas State University, [Online]. Available:
http://www.txstate.edu/research/orc/IRB-Resources.html. [Accessed 4 November
2016].
[54] "Signal Processing and Machine Learning Techniques for Sensor Data Analytics,"
MathWorks, 1 September 2015. [Online]. Available:
https://www.mathworks.com/videos/signal-processing-and-machine-learning-
techniques-for-sensor-data-analytics-107549.html. [Accessed 18 January 2016].
[55] R. Lane, K. McRae, E. Reiman, K. Chen, G. Ahern and J. Thayer, "Neural correlates
of heart rate variability during emotion," Neuroimage, vol. 1, no. 44, pp. 213-222,
2009.
[56] X.-A. Fan, L.-X. Bi and Z.-L. Chen, "Using EEG to detect drivers' emotion with