
IEEE TRANSACTIONS ON HUMAN-MACHINE SYSTEMS, VOL. 43, NO. 1, JANUARY 2013

Emotional State Classification in Patient–Robot Interaction Using Wavelet Analysis and Statistics-Based Feature Selection

Manida Swangnetr and David B. Kaber, Member, IEEE

Abstract—Due to a major shortage of nurses in the U.S., future healthcare service robots are expected to be used in tasks involving direct interaction with patients. Consequently, there is a need to design nursing robots with the capability to detect and respond to patient emotional states and to facilitate positive experiences in healthcare. The objective of this study was to develop a new computational algorithm for accurate patient emotional state classification in interaction with nursing robots during medical service. A simulated medicine delivery experiment was conducted at two nursing homes using a robot with different human-like features. Physiological signals, including heart rate (HR) and galvanic skin response (GSR), as well as subjective ratings of valence (happy–unhappy) and arousal (excited–bored), were collected on elderly residents. A three-stage emotional state classification algorithm was applied to these data, including: 1) physiological feature extraction; 2) statistical-based feature selection; and 3) a machine-learning model of emotional states. A pre-processed HR signal was used. GSR signals were nonstationary and noisy and were further processed using wavelet analysis. A set of wavelet coefficients, representing GSR features, was used as a basis for current emotional state classification. Arousal and valence were significantly explained by statistical features of the HR signal and GSR wavelet features. Wavelet-based de-noising of GSR signals led to an increase in the percentage of correct classifications of emotional states and clearer relationships among the physiological response and arousal and valence. The new algorithm may serve as an effective method for future service robot real-time detection of patient emotional states and behavior adaptation to promote positive healthcare experiences.

Index Terms—Emotions, machine learning, physiological variables, regression analysis, service robots, wavelet analysis.

I. INTRODUCTION

USE OF service robots in healthcare operations represents a potential technological solution to reducing overload on nursing staffs in hospitals [1] for critical tasks and to increasing accuracy and reliability in basic nursing task performance (e.g., medication administration).

Manuscript received June 6, 2011; revised February 10, 2012; accepted June 18, 2012. Date of publication September 12, 2012; date of current version December 21, 2012. This work was supported in part by the Edward P. Fitts Department of Industrial and Systems Engineering at North Carolina State University. This paper was recommended by Associate Editor R. Roberts of the former IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans (2011 Impact Factor: 2.123).

M. Swangnetr is with the Back, Neck, and Other Joint Pain Research Group, Department of Production Technology, Khon Kaen University, Khon Kaen 40002, Thailand (e-mail: [email protected]).

D. B. Kaber is with the Edward P. Fitts Department of Industrial and Systems Engineering, North Carolina State University, Raleigh, NC 27695-7906 USA (e-mail: [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCA.2012.2210408

During the past two decades, the healthcare industry, including hospitals and nursing homes, has identified the capability of mobile transport robots to assist nurses in routine patient services. Hospital delivery robots are now used to automatically transport prescribed medicines to nurse stations, meals and linens to patient rooms, and medical records or specimens to labs. Existing commercial robots, like the Aethon Tug, are capable of autonomous point-to-point navigation in healthcare environments. Although such robots have been implemented in many hospitals, they do not deliver medicines or other healthcare-related materials directly to patients. There are always nurses who must go between robots and patients. As a result, current robot designs do not support direct interaction with patients. The robotics industry is currently seeking research results on how different features of robots support interactive tasks or social interaction between humans and robots as part of healthcare operations.

In nursing operations, patient care requires timely and careful task performance as well as support of positive patient emotions. If future service robots are to be used in tasks involving direct interaction with patients, it is important to understand how patients perceive robots and evaluate robot performance as a basis for judging healthcare quality. Since emotions play an important role for patients in communication and interaction with hospital staff (e.g., describing pain), there is a need to design service robots to be capable of detecting, classifying, and responding to patient emotions to achieve positive healthcare experiences.

A. Subjective Measures of Emotional States

Russell's [2] theory of emotion is the most widely accepted in psychology and contends that all emotions can be organized in terms of continuous dimensions. The theory includes a 2-D emotion space, defined by valence (pleasantness/unpleasantness) and arousal (strong engagement/disengagement), as presented in Fig. 1. Watson and Tellegen [3] suggested that the axes of the emotion space should pass through regions where emotion labels used by individuals are most densely clustered. They conducted a factor analysis with varimax rotation resulting in a model, including positive affect (PA) and negative affect (NA), which was a 45° rotation from Russell's [2] valence-arousal model. However, Russell and Feldman-Barrett [4] showed that when placing more emotion terms into the 2-D space, the increased term density provided no further guidance for factor model rotation. Reisenzien [5] also offered that the valence-arousal model provided conceptually separate building blocks of core affective feelings. For example, Reisenzien said enthusiasm is a PA that is a blend of pleasantness and strong engagement. He also said distress is an NA that is a blend of unpleasantness and strong engagement. For this reason, in the present study, Russell's [2] 2-D model of emotion was used.

Fig. 1. Two-dimensional valence-arousal structure of affect (adapted from Watson & Tellegen, 1985).

Subjective measures of emotion include self-reports, interviews on emotional experiences, and questionnaires on which a participant identifies images of expressions or phrases that most closely resemble their current feelings. Previous research has used image-based self-report measures for assessing human emotional states in terms of valence and arousal, such as the Self-Assessment Manikin (SAM) [6]. Image-based questionnaires are designed to overcome the disadvantage of subjects having to label emotions, which can lead to inconsistency in responses. The SAM consists of pictures of manikins representing five states of arousal (ranging from "excited" to "bored") and five states of valence (ranging from "very happy" to "very unhappy"). Subjects can rate their current emotional state by either selecting a manikin or marking in a space between two manikins, resulting in a nine-point scale.

B. Physiological Measures of Emotional States

Physiological responses have also been identified as reliable indicators of human emotional and cognitive states. They are considered automatic outcomes of the autonomic nervous system (ANS), primarily driven by emotions. The ANS is composed of two main subsystems: the sympathetic nervous system (tends to mobilize the body for emergencies—the "fight-or-flight" response) and the parasympathetic nervous system (tends to conserve and store bodily resources—the "rest-and-digest" response). Among several physiological measures, heart rate (HR) and galvanic skin response (GSR) are the two most commonly used for revealing states of arousal and valence [7]. These responses are also relatively simple and inexpensive to measure with minimal intrusiveness to subject behavior [8].

1) HR: Heart activity is usually measured in milliseconds by detecting "QRS" wave complexes in an electrocardiographic (ECG) record and determining the intervals between adjacent "R" wave peaks. HR, in beats per minute (bpm), can be directly calculated from "RR" intervals. There are also several statistical measures that can be determined on HR (see Malik et al. [9] for a comprehensive description). Common features used in studies of human emotion and cognitive states include: mean HR (in bpm) and the standard deviation of HR (SDHR), also expressed in bpm.
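As an illustration, these HR features can be computed directly from a window of RR intervals. The following is a minimal Python sketch (not from the paper; `rr_intervals_ms` is a hypothetical example series):

```python
import numpy as np

def hr_features(rr_intervals_ms):
    """Compute HR centrality and variation features (bpm) from RR intervals (ms)."""
    rr = np.asarray(rr_intervals_ms, dtype=float)
    hr_bpm = 60000.0 / rr  # each RR interval (ms) maps to an instantaneous HR (bpm)
    return {
        "mean_hr": hr_bpm.mean(),            # centrality measure used in the paper
        "median_hr": np.median(hr_bpm),
        "sd_hr": hr_bpm.std(ddof=1),         # SDHR, a measure of variation
        "range_hr": hr_bpm.max() - hr_bpm.min(),
    }

# Example: RR intervals from one 4-s event window
print(hr_features([850, 820, 840, 810, 830]))
```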

HR has been previously used to differentiate between user positive and negative emotions in human–computer interaction (HCI) tasks. Mandryk and Atkins [10] developed fuzzy rules, based on a literature review, defining how physiological signals related to arousal and valence. They asserted that when HR is high, arousal and valence are also high. However, other studies have shown that there are no observed HR differences between positive and negative emotions [11]–[13]. Therefore, the relationships between HR and valence and arousal may not be definite. Due to the practicality of HR for emotion state classification, further assessment of these relationships was conducted in this study.

2) GSR: GSR measures electrodermal activity in terms of changes in resistance across two regions of skin. A voltage is applied between two electrodes attached to the skin and the resulting current is proportional to the skin conductance (SC, μSiemens), or inversely proportional to the skin resistance (Ohms). The response is typically large and varies slowly over time; however, it has been found to fluctuate quickly during mental, physical, and emotional arousal. A GSR signal consists of two main components: SC level (SCL) or tonic level—this refers to the baseline level of response; and SC response (SCR) or phasic response—this refers to changes from the baseline causing a momentary increase in SC (i.e., a small wave superimposed on the SCL). SCR normally occurs in the presence of a stimulus; however, an SCR that appears during rest periods, or in the absence of stimuli, is referred to as "nonspecific" SCR (NS-SCR). Fig. 2 illustrates a typical GSR waveform.

Fig. 2. GSR waveform.

Dawson et al. [7] identified GSR measures related to emotional state, including: SCL; change in SCL; frequency of NS-SCRs; SCR amplitude; SCR latency; SCR rise time; SCR half recovery time; SCR habituation (number of stimuli occurring before no response); and slope of SCR habituation. The most commonly used measure is the amplitude of SCR. Change in SC is widely accepted to reflect cognitive activity and emotional response with a linear correlation to arousal. Specifically, SC has been found to increase as arousal increases [7], [10], [14], [15]. However, the relationship between SC and valence is not definite. Although Dawson et al. [7] report that in most studies SC does not distinguish positive from negative valence stimuli, other research has demonstrated some relationship between SC and valence. For example, increases in the magnitude of SC have been associated with negative valence [15], [16].

Relatively few studies have examined the psychological significance of SCL and NS-SCRs produced during the performance of on-going tasks. Dawson et al. [7] reported relations between these measures and task engagement as well as emotions. Typically, SCL increases about 1 μSiemen above resting level during anticipation of a task and then increases another 1 to 2 μSiemens during performance. Pecchinenda and Smith [17] measured NS-SCR rate, and maximum amplitude and slope of SCL during a difficult problem-solving task. Results showed that SC activity increased at the start of trials but decreased by the end under the most difficult condition.

C. Current Modeling Approaches for Classifying Emotional States

Although physiological signals can be noisy and some may lack definitive relationships with emotional states, numerous studies have been conducted on emotional state classification using such objective data (e.g., [10], [15]–[20]). Emotion modeling approaches vary in terms of both physiological variable inputs and classification methods. Analysis of variance (ANOVA) and linear regression are the most common for identifying physiological signal features that may be significant in differentiating among emotional states and selecting sets of significant features for predicting emotions, respectively [18], [19]. However, these approaches make the assumption of linear relationships between physiological responses and emotional states.

Fuzzy logic models are an alternative classification approach to deal with nonlinear relationships and uncertainty among system inputs and outputs. Such models can represent continuous processes that are not easily divided into discrete segments; that is, when a change from one linguistically defined state to another is not clear. Mandryk and Atkins [10] developed a fuzzy logic model to transform HR, GSR, and facial electromyography (EMG) for smiling and frowning to arousal and valence states during video game play. A second model was used to classify arousal and valence into five lower level emotional states related to the gaming situation, including: boredom, challenge, excitement, frustration, and fun. Model results revealed the same trends as self-reported emotions for fun, boredom, and excitement. This approach has the advantage of describing variations among specific emotional states during the course of a complete emotional experience. The major drawback is that fuzzy rules used in a fuzzy system for classification problems must be constructed manually by experts in the problem domain. However, as described above, previous research has not been able to clearly define the relationships between physiological responses (e.g., HR and GSR) and specific emotional states (valence and arousal).

Machine learning approaches have also been used to deal with nonlinear relationships and uncertainty in emotion classification based on physiological responses. Machine learning algorithms can automatically generate models, including rules and patterns, from data. Supervised learning algorithms are used to identify functions for mapping inputs to desired outputs based on training data and are later validated against test data. Artificial neural networks (ANN) are a common form used for human emotional state classification. Lee et al. [16] applied a multilayer perceptron (MLP) network to recognize emotions from the standard deviation of ECG RR intervals (in ms), the root mean-square of the difference in successive RR intervals (in ms), mean HR, the low-frequency/high-frequency ratio of HR variability (HRV), and the SC magnitude for GSR. By using ratings from the SAM questionnaire as desired outputs, the network was able to learn sadness, calm pleasure, interesting pleasure, and fear with correct classifications of 80.2%. Lisetti and Nasoz [20] compared three different machine learning algorithms, including k-nearest neighbor (KNN), discriminant function analysis (DFA), and a neural network using a Marquardt backpropagation (MBP) algorithm, for emotion classification in an HCI application. They used minimum, maximum, mean, and variance values of normalized GSR, body temperature, and HR as algorithm inputs while emotion states/outputs (sadness, anger, fear, surprise, frustration, and amusement) were elicited based on movie clips. Results showed that emotion recognition by the KNN was 72.3% accurate, the DFA was 75.0% accurate, and the MBP NN was 84.1% accurate.

Instead of learning from labeled data, unsupervised learning approaches automatically discover patterns in a data set. Amershi et al. [15] applied an unsupervised clustering technique to affective expressions in educational games. They identified several influential features from SC, HR, and EMG for smile and frown muscles. Results showed that only a few statistical features of the responses (e.g., mean and standard deviation) were relevant for defining clusters. In addition, clustering was able to identify meaningful patterns of reactions within the data.

Some studies have been conducted on recognizing human emotional states when interacting with robots. Itoh et al. [21] developed a bioinstrumentation system to measure human stress states during HRI. However, stress responses do not strictly covary with emotional responses. For example, different levels of stress may occur with an emotional response of fear. Moreover, this study only used HRV as an indicator of stress based on a defined classification rule in which the ratio of sympathetic/parasympathetic responses increases if a human feels stress. Kulic and Croft [22] used physiological responses to assess subject emotional states of valence and arousal by using the motion of a robot manipulator arm as a stimulus. The emotional state assessment was based on a rule-based classification model constructed from a literature review. Subjective responses were used separately to measure participant discrete emotion responses to the robot motion. A series of studies by Liu and Sarkar et al. [23]–[25] developed a human emotional state assessment model based on physiological responses in an HCI scenario. The model was then used for real-time emotion classification in the same scenario. However, the manner in which a human interacts with a robot is similar but not identical to interactions between a human and a computer [26].


D. Motivation and Objective

Real-world applications for interactive robots in hospital environments are being developed. Physiological responses, such as HR and GSR signals from patients, can be monitored, with minimal intrusiveness, in real-time in hospitals for determining patient status. Therefore, it is possible that physiological measures can be extracted and emotional states classified in real-time during patient interaction with robots. This situation provides a basis for performing real-time robot expression modification according to current human emotional states and for ensuring quality in robot-assisted healthcare operations.

Although there are several on-going research studies on emotional state identification based on physiological data, the literature reveals relatively few classification models for recognizing human emotions when interacting with service robots, particularly nursing robots in medicine delivery tasks. Some methods of emotional classification have been adopted and/or modified for testing in human interaction with humanoid robots (e.g., [21]). However, the manner and circumstances in which patients interact with a robot may be similar but are not identical to interactions between a healthy human and a robot in other situations. As nursing robots are expected to become common in hospitals of the future, it is important to develop accurate methods for assessing patient responses to such robots providing medical services. It also appears that the relationships between physiological responses (e.g., HR and GSR) and emotional states (valence and arousal) are not well-defined. Therefore, further assessment is needed through sensitive physiological feature identification along with robust emotional state classification modeling.

One major problem when analyzing physiological signals is noise interference. Additional signal processing must be applied to attenuate noise without distorting signal characteristics. Since physiological signals are nonstationary [27] and may include random artifacts and other unpredictable phenomena, they are problematic for several signal processing methods, such as the fast Fourier transform (FFT). A wavelet transform, which is a tool for analysis of transient, nonstationary, or time-varying phenomena, is often useful for such processing.

Prior studies have represented stochastic physiological signals using statistical features (based on expert domain knowledge) to classify emotional states. Unfortunately, information can be lost with such features as simplifying assumptions are made, including knowledge of the probability density function of the data. Furthermore, there may be signal features that have not been identified by experts, but have the potential to significantly improve emotion classification accuracy. It has been suggested that signal processing features may be useful for this purpose [27].

Considering these research issues, the objectives of the present study were to:

1) Assess the relationships between physiological responses, including HR and GSR, and emotional states in terms of valence and arousal during patient–robot interaction (PRI)—Arousal states were expected to be better explained by GSR features, while valence states were expected to be better explained by HR features.

2) Develop a machine learning algorithm for accurate patient emotional state classification with the potential to classify states in real-time during PRI.

3) Examine the utility of advanced signal processing features for representing physiological signals in emotional state identification—Wavelet coefficients can be used as a compressed representation of amplitude, time, and frequency features of physiological signals.

4) Develop a wavelet-based de-noising algorithm by identifying the noise distribution and features of a reference signal to eliminate those noise features overlapping with the informative signal frequency.

5) Identify significant wavelet-based features for emotional state classification—A statistical approach can be used to identify physiological features with utility for classifying emotional states.

II. EXPERIMENT

An experiment was conducted to develop an empirical data set that could be used as a basis for addressing the above objectives. Observations on human emotional states and physiological responses in interacting with an actual service robot in the target context were needed to develop the emotional state classification algorithm and to demonstrate the wavelet-based signal processing methods.

A. Procedure

With the aging U.S. population, the Health Resources and Services Administration predicted that elderly persons represent the primary future users of healthcare facilities for age-related healthcare needs. Therefore, the elderly will likely be the largest user group of nursing robots in the future. For the present study, we recruited 24 residents at senior centers (17 females and 7 males) in Cary, North Carolina. They ranged in age from 63 to 91 years with a mean of 80.5 years and a standard deviation of 8.8 years.

At the beginning of the experiment, participants read and signed an informed consent and completed a background survey. They were then provided with a brief introduction on nursing robots and applications in healthcare tasks, including medicine delivery. This was followed by a familiarization with the SAM form for emotion ratings and the physiological measurement. A Polar HR monitor (Polar Electro Inc.) was used, including an S810i wrist receiver and T31 transmitter attached to an elastic strap around the chest area. The monitor recorded heart activity in RR intervals, and the Polar Precision Performance software was used for analysis.

An iWorx GSR-200 amplifier (iWorx Systems, Inc.) was used to apply a voltage between electrodes placed on the surface of a subject's index and ring fingertips. Factory calibration of the amplifier equated 1 volt to an SC of 5 μSiemens. The output voltages of the GSR signal were transmitted to a DT9834 data acquisition system (Data Translation, Inc.) and finally recorded in a computer using quickDAQ software (Data Translation, Inc.) with a sampling rate of 1024 Hz.


Fig. 3. PeopleBot platform.

Once the HR monitor and GSR electrodes were placed, subjects were asked to sit and relax on a sofa located in a simulated patient room. During this period, 1 min of HR and GSR data was recorded and the mean responses were later used as baselines [28] for normalizing test trial data.

There were six events identified during each experiment test trial. At the beginning of a trial, a PeopleBot robot (see Fig. 3) entered the simulated patient room (Event 1), with a container of medicine in a gripper, and stopped in front of the subject (Event 2). The robot notified subjects of its arrival (Event 3) and released the bottle of medicine from its gripper (Event 4). The robot then waited for a short period of time before it turned around (Event 5) and left the room (Event 6).

The design of the experiment was a randomized complete block design with three independent variables representing robot feature manipulations, including robot facial features (see Fig. 4; abstract or android), speech capability (synthesized or digitized voice), and different modes of user interaction (i.e., visual messages or physical confirmation of receipt of medication with a touch screen). Each subject was exposed to all settings of each variable. No interactions of features were studied in the experiment. The order of presentation of the control condition (i.e., a robot without any features) among stimulus trials was randomly assigned.

The physiological data (HR and GSR) were collected throughout trials. At the end of each trial, subjects completed the SAM questionnaire, indicating their emotional response to the specific robot configurations. After subjects completed 14 test trials (2 replications of 2 levels of the face, voice, and interactivity conditions, plus 1 control condition), a final interview was conducted in which they provided comments on their impressions of the robot configurations. On average, each subject took ∼50 min to complete the experiment.

B. Data Post-Processing and Overall Results

Fig. 4. Some robot feature manipulations.

To address individual differences in internal scaling of emotions and physiological responses, all observations on the response measures were normalized for each participant. The arousal and valence ratings from the SAM questionnaire were converted to z-scores. Normalized ratings were then categorized as "low," "medium," or "high" levels of valence and arousal. There were two subjects who did not follow experiment instructions in the arousal and valence ratings, and for this reason their data were considered invalid and were excluded from analysis.

The physiological response measures, HR (in bpm) and GSR (in μSiemens), were also normalized by subtracting the mean baseline response ($Y_{baseline}$) from the test trial readings ($Y_i$) and dividing by the maximum absolute deviation for each participant, as shown in the following equation:

$$Y_{normalized} = \frac{Y_i - Y_{baseline}}{\max\left[\,\mathrm{abs}(Y_i - Y_{baseline})\,\right]} \qquad (1)$$
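A minimal sketch of this normalization, assuming `trial` holds one participant's test trial readings and `baseline_mean` is that participant's 1-min baseline mean (both hypothetical names; the paper takes the maximum over all of a participant's data, while this sketch uses the passed array for brevity):

```python
import numpy as np

def normalize_signal(trial, baseline_mean):
    """Per-participant normalization of a physiological response, per (1)."""
    trial = np.asarray(trial, dtype=float)
    deviation = trial - baseline_mean              # Y_i - Y_baseline
    return deviation / np.max(np.abs(deviation))  # scale by max absolute deviation

# Example: GSR trial readings against a 1-min baseline mean
print(normalize_signal([5.2, 5.6, 6.1, 5.9], baseline_mean=5.0))
```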

Results of ANOVAs on the two subjective measures of emotion, arousal and valence, indicated significant differences in emotional state depending on robot feature settings. However, no one feature appeared more powerful than any other for facilitating positive emotional experiences. Regarding the physiological measures, there were also significant differences among robot feature types. The interactivity condition produced greater HR and GSR responses than the face and voice conditions when interacting with the robot. However, when making comparisons among the settings of each feature, only the levels of interactivity appeared to differ significantly. Correlation analyses were also conducted between the physiological and subjective ratings of emotions. Results revealed a strong positive relationship between valence and HR, but no significant correlation between arousal and GSR.

TABLE I. ANOVA RESULTS ON HR FOR LEVELS OF VALENCE AND AROUSAL AT SPECIFIC EVENTS (NOTE: F-STATISTICS INCLUDE NUMERATOR AND DENOMINATOR DEGREES OF FREEDOM. P-VALUES REVEAL SIGNIFICANCE LEVELS FOR EACH TEST. BOLD VALUES ARE SIGNIFICANT.)

TABLE II. ANOVA RESULTS ON GSR FOR LEVELS OF VALENCE AND AROUSAL AT SPECIFIC EVENTS

III. ANALYTICAL METHODOLOGY

A. Event Selection and Analytical Procedure

Analysis of physiological measures as a basis for emotional state classification is generally conducted on an event basis. Ekman [29] observed that human emotional responses typically last between 0.5 and 4 s. Consequently, short recording periods (< 0.5 s) might cause certain emotions to be missed or long periods (> 4 s) might lead to observations of mixed emotions [30], [31]. In addition, Dawson et al. [7] indicated any GSR SCR, which begins between 1 and 3 s or 1 and 4 s after stimulus onset, is considered to be elicited by that stimulus. We used a 4-s time window for physiological data recording after an event in the experiment trials for data analysis purposes. One-way ANOVA models were structured for each event and used to identify the event providing the greatest degree of discrimination of physiological responses (i.e., HR and GSR) based on subject emotional states, which were categorized as three levels of arousal and three levels of valence. Experiment results, shown in Tables I and II, revealed significant differences in HR for the levels of valence during Events 4 (opening gripper), 5 (subject accepting medicine), and 6 (robot leaving room); while there were significant differences in GSR for the levels of arousal during Events 2 (robot moving in front of patient), 4, and 6.

Across these analyses, only Events 4 and 6 supported HR and GSR for discriminating among emotional states. Event 6 occurred after the robot moved from the patient room and was not considered a potential stimulus. Therefore, Event 4, the robot opening its gripper to release the medicine bottle to a patient, was selected as the stimulus event for further investigation.
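For illustration, the event-window extraction and per-event one-way ANOVA described above might be sketched as follows (hypothetical data; SciPy's f_oneway stands in for the ANOVA models):

```python
import numpy as np
from scipy import stats

FS = 1024          # GSR sampling rate (Hz) used in the experiment
WINDOW_S = 4       # 4-s analysis window following each event

def event_window(signal, event_onset_s):
    """Extract the 4-s post-event segment from a trial-long recording."""
    start = int(event_onset_s * FS)
    return signal[start:start + WINDOW_S * FS]

trial = np.random.randn(30 * FS)                    # placeholder 30-s trial recording
segment = event_window(trial, event_onset_s=12.0)   # hypothetical Event 4 onset at 12 s

# One-way ANOVA: does mean GSR in the window differ across arousal levels?
# gsr_by_arousal is hypothetical: per-trial mean GSR grouped by rated arousal.
gsr_by_arousal = {"low": [0.10, 0.20, 0.15], "medium": [0.30, 0.25, 0.35],
                  "high": [0.60, 0.70, 0.65]}
f_stat, p_val = stats.f_oneway(*gsr_by_arousal.values())
print(f"F = {f_stat:.2f}, p = {p_val:.4f}")
```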

Fig. 5. Post-hoc results on GSR for levels of arousal at significant stimulus events.

Fig. 6. Post-hoc results on HR for levels of valence at significant stimulus events.

Further post-hoc analyses were conducted using Tukey's tests on HR and GSR responses recorded at influential stimulus events to identify significant differences among the levels of emotional state (arousal and valence). Results revealed high arousal to be associated with higher GSR than low and medium arousal (see Fig. 5); whereas medium valence was associated with higher HR than low and high valence for significant stimulus events (Fig. 6).

Fig. 7 presents the overall analytical procedure used in this study. Initially, we sought to extract statistical and wavelet features from the raw physiological signals. We then used a statistical approach to select significant or relevant features for use in the machine learning model for emotional state classification. The following subsections describe each of these steps in detail.

Fig. 7. Procedure for development of the emotional state classification algorithm.

B. Physiological Feature Extraction: HR Analysis

Based on the prior research (e.g., [32]), several statistical features of the HR response were identified for investigation in classifying emotional states. These included mean and median HR as measures of centrality and SDHR and the range of the response as measures of variation. (The 4-s time window for data analysis was not sufficient for determining HRV or other HR features in the frequency domain.) Plots of the normalized HR response distributions for some data collected in the study revealed symmetry with a small number of outliers in a few trials. For such distributions, based on small sample sizes, the sample mean is a more efficient estimator of the population mean (i.e., it has a smaller variance) than the median [33]. Therefore, mean HR was used for analysis purposes. Based on the data set, either SDHR or the range were considered suitable for estimating the population variance. These variables were selected for subsequent statistical analysis to identify physiological signal features for classifying emotional states.

C. Physiological Feature Extraction: GSR Analysis

1) Wavelet Selection: As suggested above, the recorded GSR signals were nonstationary and noisy and required further signal processing. A small wavelet, referred to as the "mother" wavelet, was translated and scaled to obtain a time-frequency localization of the GSR signal. This leads to most of the energy of the signal being well represented by a few wavelet expansion coefficients. The mother wavelet was chosen primarily based on its similarity to the raw GSR signal [27]. Among several families of wavelets, the Daubechies (dbN), Symlets (symN), and Coiflets (coifN) families have the properties of orthogonality and compactness that can support fast computation algorithms. Unlike the near symmetrical shape of symN and coifN, an asymmetrically shaped dbN wavelet was found to closely match the GSR waveform. Although previous studies have used dbN wavelets to analyze GSR signals [34], [35], the choice of wavelets was subjective and not primarily based on the shape of a typical GSR signal. Therefore, these approaches may not have captured all informative features of the signal or eliminated noise. Gupta et al. [36] suggested that if the form of the wavelet is matched to the form of the signal, such that the maximum energy of the signal is accounted for in the initial scaling space, then the energy in the next lower wavelet subspace should be very small. The mother wavelet that produces the minimum mean square error [MSE; see (2)] between two signals is the best match to the signal:

$$\mathrm{MSE} = \frac{\int \left[x(t) - \hat{x}(t)\right]^2 dt}{n-1} \qquad (2)$$

The typical frequency of the GSR signal is 0.0167–0.25 Hz [7]. Therefore, reconstruction of the signal in the wavelet scale, including frequencies below 0.5 Hz, will represent the GSR signal, x(t); whereas signal reconstruction on the next lower wavelet scale (including higher frequencies) will capture both the GSR signal and noise, x̂(t). (Note: As the wavelet scale decreases, the represented frequencies increase.) Several prior studies have recommended that the cutoff frequency for high-frequency noise be at least double the highest signal frequency. The MSEs of the differences between these two reconstructed signals for the common dbN wavelets are shown in Table III. Results indicated that db3 was the most appropriate choice of mother wavelet to represent the GSR signal.

TABLE III. MSES CALCULATED ON THE DIFFERENCES BETWEEN THE RECONSTRUCTED SIGNALS FOR VARIOUS DBN WAVELETS
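A sketch of this wavelet-matching computation using the PyWavelets package (an assumption; the paper does not name its software). The decomposition depth is capped at the maximum valid level for the signal length, so the nominal 0.5-Hz boundary is approximate:

```python
import numpy as np
import pywt

FS = 1024  # sampling rate (Hz)

def approx_reconstruction(signal, wavelet, level):
    """Reconstruct the signal from approximation coefficients only (details
    zeroed), keeping frequencies below roughly FS / 2**(level + 1)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def wavelet_match_mse(signal, wavelet):
    """MSE between reconstructions at adjacent scales, per (2): the scale
    below the GSR band vs. the next lower scale (GSR signal + noise)."""
    level = min(10, pywt.dwt_max_level(len(signal), pywt.Wavelet(wavelet).dec_len))
    x = approx_reconstruction(signal, wavelet, level)         # GSR signal, x(t)
    x_hat = approx_reconstruction(signal, wavelet, level - 1) # signal + noise
    return np.sum((x - x_hat) ** 2) / (len(signal) - 1)

gsr = np.random.randn(4 * FS)  # placeholder for a recorded 4-s GSR segment
scores = {w: wavelet_match_mse(gsr, w) for w in ["db2", "db3", "db4", "db5"]}
print(min(scores, key=scores.get))  # best-matching mother wavelet
```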

2) Noise Elimination: In recording the GSR signal, the measurement device also generated noise, including: white noise existing inherently in the amplifier (i.e., the power is spread over the entire frequency spectrum), noise from poor electrode contacts or variations in skin potential (i.e., a low-frequency fluctuation), power line noise (60 Hz in the U.S.), motion artifacts, etc. [37], [38]. For the purpose of frequency analysis, signal noises were separated into: mid-band frequency noise, which overlaps the GSR frequency; and high-frequency noise (> 0.5 Hz).

a) High-frequency noise elimination: Based on decomposition of the GSR signal using the db3 wavelet, coefficients representing high-frequency (> 0.5 Hz) details of the signal (noise) were set to zero. Consequently, an entire 4-s GSR signal (1024 × 4 = 4096 data points) was effectively represented by only 24 wavelet coefficients. These coefficients were characterized by a 4-s time localization at four frequency ranges. (Note: The number of coefficients for each frequency range is not uniform. The higher the frequency, the greater the number of coefficients.) The amplitude of the coefficients corresponded to the amplitude of the GSR signal in specific frequencies at specific times.
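A hedged sketch of this step with PyWavelets; the exact number of retained coefficients depends on the decomposition depth and boundary handling (the paper reports 24 for its configuration):

```python
import numpy as np
import pywt

def compress_gsr(segment, wavelet="db3", fs=1024, cutoff_hz=0.5):
    """Zero detail coefficients above cutoff_hz and return the kept
    low-frequency coefficients as the compressed GSR representation."""
    level = pywt.dwt_max_level(len(segment), pywt.Wavelet(wavelet).dec_len)
    coeffs = pywt.wavedec(segment, wavelet, level=level)
    kept = [coeffs[0]]  # approximation: lowest-frequency band, always kept
    for i, detail in enumerate(coeffs[1:], start=1):
        band_top = fs / 2 ** (level - i + 1)  # upper edge of this detail band (Hz)
        if band_top <= cutoff_hz:
            kept.append(detail)                # low-frequency detail: keep as feature
        else:
            coeffs[i] = np.zeros_like(detail)  # high-frequency detail: treat as noise
    denoised = pywt.waverec(coeffs, wavelet)[: len(segment)]
    return denoised, np.concatenate(kept)

segment = np.random.randn(4096)  # placeholder 4-s GSR segment at 1024 Hz
denoised, features = compress_gsr(segment)
print(len(segment), "->", len(features), "coefficients")
```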

b) Mid-band frequency noise elimination: Wavelet threshold shrinking algorithms have been widely used for removal of noise from signals [39], [40]. The soft thresholding shrinkage function was used in the present algorithm. The concept is to set all frequency subband coefficients that are less than a particular threshold (λ) to zero, since such details are often associated with noise. Subsequently, shrinking is applied to the other nonzero coefficients based on the threshold value.
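Soft thresholding itself is a one-liner in PyWavelets; for example:

```python
import numpy as np
import pywt

# Soft-threshold shrinkage: coefficients with magnitude below lam are zeroed;
# the remainder are shrunk toward zero by lam.
lam = 0.4  # hypothetical threshold; Section III-C2 sets it to 4.18 * sigma_L
details = np.array([-1.2, -0.3, 0.1, 0.9])
print(pywt.threshold(details, lam, mode="soft"))  # -> [-0.8  0.   0.   0.5]
```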


Fig. 8. Power spectrum of GSR occurring during subject rest period.

There are several wavelet threshold selection rules used for denoising, for example: the Donoho and Johnstone universal threshold (DJ threshold), calculated as $\sigma\sqrt{2\log(N)}$, where $\sigma$ is the noise standard deviation and $N$ is the signal size; a confidence interval threshold, such as 3σ or 4σ; the SureShrink threshold, based on Stein's unbiased estimate of risk (a quadratic loss function); and the Minimax threshold, developed to yield the minimum of the maximum MSE over a given set of functions. However, in practice, the noise level σ, which is needed in all these thresholding procedures, is rarely known and is therefore commonly estimated based on the median absolute deviation of the estimated wavelet coefficients at the finest level (i.e., the highest frequency) divided by 0.6745 [41]. It can be seen that these selection rules are generated from signal characteristics but have no relation to the nature of the noise.
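For comparison, the DJ universal threshold with the MAD-based σ estimate can be sketched as:

```python
import numpy as np
import pywt

def universal_threshold(signal, wavelet="db3"):
    """DJ universal threshold sigma * sqrt(2 log N), with sigma estimated
    from the median absolute deviation of the finest-level detail
    coefficients divided by 0.6745 [41]."""
    finest_details = pywt.wavedec(signal, wavelet, level=1)[1]
    sigma = np.median(np.abs(finest_details)) / 0.6745
    return sigma * np.sqrt(2 * np.log(len(signal)))

x = np.random.randn(4096)  # placeholder noisy signal
print(universal_threshold(x))
```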

A frequency analysis was conducted on the 1 min of GSR signal data recorded during the subject rest period. An FFT was used to reveal the power spectrum of the baseline response to range from 0 to 0.5 Hz with peak values occurring for frequencies less than 0.1 Hz (see Fig. 8). (This technique was only applied to the GSR signal during the rest period to identify noise in specific frequency ranges of the reference signal. Wavelet analysis was used for all test signal de-noising and feature identification.)
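A minimal version of this baseline spectrum check, with a placeholder rest recording:

```python
import numpy as np

fs = 1024
rest_gsr = np.random.randn(60 * fs)  # placeholder for 1 min of rest-period GSR

# One-sided power spectrum of the (mean-removed) baseline recording
spectrum = np.abs(np.fft.rfft(rest_gsr - rest_gsr.mean())) ** 2
freqs = np.fft.rfftfreq(len(rest_gsr), d=1.0 / fs)

# Inspect power below 0.5 Hz, where the baseline GSR energy concentrated
band = freqs < 0.5
peak_freq = freqs[band][np.argmax(spectrum[band])]
print(f"peak baseline power at {peak_freq:.3f} Hz")
```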

The typical frequency of the GSR signal during the rest period (without any stimuli) was between 0.0167 and 0.05 Hz. This frequency range is represented by approximation coefficients after decomposition. Therefore, all detail coefficients (frequencies between 0.05 and 0.5 Hz) represent signal noise and can be used to set thresholds for signal de-noising. The confidence interval threshold technique (mentioned above) determines a threshold from the standard deviation of noise. This is based on the basic concept that by setting a threshold λ to 3σ, the probability of noise coefficients outside the interval [−3σ, 3σ] will be very small (0.27%). However, this technique is based on the assumption that the noise coefficients are normally distributed. In fact, the distribution of wavelet detail coefficients is better represented by a zero-mean Laplace distribution [42], which has heavier tails than a Gaussian distribution. Wavelet detail coefficients obtained from the baseline data set were tested for fit to the Normal and Laplace distributions using the Kolmogorov-Smirnov test. Results confirmed that the detail coefficient distributions were not significantly different from the Laplace distribution (p > 0.05), but there was a significant difference from the Normal distribution. Fig. 9 illustrates the histogram of detail coefficients from the signal recorded during the rest period fitted by a Laplace distribution.

Fig. 9. Laplace distribution fit to wavelet detail coefficients from the GSR data collected during the subject rest period.

Fig. 10. Noisy GSR and de-noised GSR comparison.
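A sketch of the distribution-fit test with SciPy (note that fitting the parameters on the same data biases the K-S p-values slightly; the paper does not state how it handled this):

```python
import numpy as np
import pywt
from scipy import stats

rest_gsr = np.random.laplace(size=4096)  # placeholder rest-period recording
details = pywt.wavedec(rest_gsr, "db3", level=1)[1]  # finest detail coefficients

# Fit each candidate distribution, then test goodness of fit
mu, b = stats.laplace.fit(details)
m, s = stats.norm.fit(details)
print("Laplace:", stats.kstest(details, "laplace", args=(mu, b)))
print("Normal: ", stats.kstest(details, "norm", args=(m, s)))
```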

The Laplace distribution has a mean and median of μ and variance $\sigma_L^2 = 2b^2$. To have 99.73% confidence that noise coefficients will be eliminated from the signal, the threshold of wavelet shrinkage is set to $4.18\sigma_L$. When compared with the normal distribution, noise coefficients modeled with the Laplace distribution will have a higher threshold value. The $\sigma_L$ of noise in the raw signal was estimated based on the rest period data. We expected no GSR frequencies between 0.05 and 0.5 Hz when the subject was not exposed to a stimulus. The $\sigma_L$ represents the standard deviation of the Laplace distribution fit to the wavelet coefficients for the signal during the rest period. This value also sets the threshold for de-noising the signal when the subject is exposed to a stimulus.
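The $4.18\sigma_L$ factor can be checked from the Laplace tail probability. Assuming a zero-mean Laplace distribution with scale parameter $b$:

$$P(|X| > \lambda) = e^{-\lambda/b} = 1 - 0.9973 \;\Rightarrow\; \lambda = b \ln\frac{1}{0.0027} \approx 5.91\,b$$

and, since $\sigma_L = \sqrt{2}\,b$,

$$\lambda \approx \frac{5.91}{\sqrt{2}}\,\sigma_L \approx 4.18\,\sigma_L.$$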

Fig. 10 illustrates an example GSR signal de-noised using the above methodology.


The rectified signal with de-noising in all frequencies can be compared with the raw GSR signal and the GSR signal with only high-frequency noise elimination. The wavelet analysis initially eliminated the high-frequency noise, for example, any abrupt signals caused by motion artifacts. The transformation then eliminated noise present in the frequencies overlapping with the GSR signal frequency, resulting in a smoother GSR signal. This methodology appears to be highly promising for isolating informative GSR signal features and has not previously been demonstrated.

D. Feature Selection

The feature selection step in the new algorithm was to reduce the complexity of any data model by selecting the minimum set of most relevant physiological signal features for classifying emotional states. Principal component analysis (PCA) is the most commonly used data reduction technique in pattern recognition and classification [43] and has been applied to data in the wavelet domain (e.g., [44]). The basic idea is to project the data onto a lower dimensional space, where most of the information is retained. Unfortunately, transformed variables are not easy to interpret (i.e., assign abstract meaning; [45]). Moreover, PCA does not take class information into consideration; consequently, there is no guarantee that the classes in the transformed data are better separated than in the original form [43].

Multiple-hypothesis testing procedures are another form of feature selection technique that has been used for wavelet coefficient selection (e.g., [46], [47]). All coefficients are simultaneously tested for a significant departure from zero; therefore, the selected set of wavelet coefficients will provide the best contrast between classes [41]. However, the features identified in this approach are based on their classification capabilities but not on specific relationships with classes.

Prior research (e.g., [32]) has also used stepwise regression procedures for selection of physiological features for use in models to predict emotional states. This type of analysis can be used to identify features that have a significant relationship with emotional responses and to avoid overly complex classification models. However, such analysis is typically conducted in the original time domain.

On this basis, a stepwise backward-elimination regression approach was applied to identify the HR and GSR features that were statistically significant in classifying emotional states. The SDHR and RangeHR responses were highly correlated and were, therefore, examined in separate models. The analysis also examined both high-frequency de-noised and total de-noised GSR signals (i.e., signals with high and mid-band frequency elimination). Although previous studies demonstrated some relationships may exist between the statistical features of HR and GSR and emotional states, no prior work has investigated wavelet coefficients to predict emotional states. Therefore, the entire range of wavelet coefficients, determined based on the GSR signals, was considered potentially viable for classification of concurrent subject emotional states. Since the orthogonality property of the Daubechies family of wavelets ensures there is no correlation between wavelet coefficients for the signal being processed, it is possible to include multiple dbN wavelet coefficients as predictors in a single regression model of emotional states without violating model assumptions. A sketch of this selection procedure appears below.

TABLE IV. SIGNIFICANT HIGH-FREQUENCY DE-NOISED GSR AND HR FEATURES FOR CLASSIFYING AROUSAL AND VALENCE BASED ON REGRESSION ANALYSIS

TABLE V. SIGNIFICANT TOTAL DE-NOISED GSR AND HR FEATURES FOR CLASSIFYING AROUSAL AND VALENCE BASED ON REGRESSION ANALYSIS
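A sketch of the backward-elimination procedure using statsmodels (hypothetical data; the paper does not report its elimination criterion, so a p-value cutoff of 0.05 is assumed):

```python
import numpy as np
import statsmodels.api as sm

def backward_eliminate(X, y, alpha=0.05):
    """Stepwise backward elimination: repeatedly drop the least significant
    predictor (largest p-value above alpha) and refit, until all remaining
    predictors are significant. X: (n_trials, n_features); y: rating z-scores."""
    cols = list(range(X.shape[1]))
    while cols:
        model = sm.OLS(y, sm.add_constant(X[:, cols])).fit()
        pvals = model.pvalues[1:]           # skip the intercept
        worst = np.argmax(pvals)
        if pvals[worst] <= alpha:
            return cols, model              # all predictors significant
        cols.pop(worst)                     # drop the weakest predictor
    return cols, None

# Hypothetical design matrix: GSR wavelet coefficients + MeanHR + SDHR
X = np.random.randn(120, 26)
y = X[:, 3] * 0.8 + np.random.randn(120)    # synthetic arousal ratings
selected, model = backward_eliminate(X, y)
print("selected feature indices:", selected)
```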

Results revealed the regression models of arousal producing the highest R-squared values to include: 15 high-frequency de-noised GSR wavelet coefficient features and SDHR; and 14 total de-noised GSR wavelet coefficient features and SDHR (see Tables IV and V). For valence models, the best predictive models included: 14 high-frequency de-noised GSR wavelet coefficient features; and 11 total de-noised GSR wavelet coefficient features, MeanHR, and SDHR (see Tables IV and V). (Note that low and high GSR subscripts correspond with low and high signal frequencies from time 0 to 4 s.)

E. Emotional State Classification

Among a large number of neural network structures, the MLP is used most often and has proven to be a universal approximator [43]. We implemented neural network models using the NeuroSolutions software (NeuroDimension, Inc.) with an error-back propagation algorithm. A hyperbolic tangent sigmoid function was used as the activation function for all neurons in the hidden layer. A forward momentum of 0.9 [43] was also used at the hidden nodes to prevent a network from getting stuck at a local minimum during training. We determined weights of links among the nodes in the ANN structure to minimize the classification error. We set MSE = 0.02 [48] as a criterion error goal. For validation (testing) of the ANNs, the data "hold-out" method was used, including separating 80% of the samples for training the NN and the remaining 20% for validation. The validation data were randomly selected. The number of data points was balanced among the classes of valence and arousal.

TABLE VI. OVERALL PCCS FOR AROUSAL AND VALENCE CLASSIFICATION NETWORKS USING HIGH-FREQUENCY DE-NOISED WAVELET FEATURES SELECTED BASED ON STEPWISE REGRESSION ANALYSIS

TABLE VII. OVERALL PCCS FOR AROUSAL AND VALENCE CLASSIFICATION NETWORKS USING TOTAL DE-NOISED WAVELET FEATURES SELECTED BASED ON STEPWISE REGRESSION ANALYSIS

To create a parsimonious ANN for classifying subject emotional states based on physiological signal features, we used a single hidden layer of processing elements. The minimum number of hidden nodes (h) in the hidden layer can be defined based on the number of inputs (n) and outputs (m) using Masters' [49] equation

$$h = \mathrm{Int}\left[(m \times n)^{1/2}\right] \qquad (3)$$

In this paper, the emotional states of subjects were classified into three levels, including: low, medium, and high arousal or valence. Based on the sets of physiological data features selected from the stepwise regression procedure for both the high-frequency de-noising data set (16 features for the arousal model and 14 features for the valence model) and the total de-noising data set (15 features for the arousal model and 13 features for the valence model), Masters' equation indicated the minimum number of hidden nodes for the ANNs to be approximately six (for either emotional response). Holding fixed the set of inputs selected from the stepwise regression and only using a single hidden layer network structure, the number of hidden layer nodes was optimized to achieve the highest percentage of correct classifications (PCC) of subject emotional responses.
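A sketch of this network construction, using scikit-learn's MLPClassifier as a stand-in for NeuroSolutions (tanh activation, momentum 0.9, and the 80/20 hold-out split as in the text; data and feature counts are placeholders):

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

def masters_hidden_nodes(n_inputs, n_outputs):
    """Masters' rule (3): h = Int[(m * n)^(1/2)]."""
    return int(np.sqrt(n_inputs * n_outputs))

n_inputs, n_classes = 15, 3  # e.g., total de-noised arousal model; low/medium/high
h = masters_hidden_nodes(n_inputs, n_classes)  # -> 6, matching "approximately six"
print("hidden nodes:", h)

X = np.random.randn(300, n_inputs)             # placeholder feature matrix
y = np.random.randint(0, n_classes, size=300)  # placeholder arousal labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          stratify=y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(h,), activation="tanh",
                    solver="sgd", momentum=0.9, max_iter=2000, random_state=0)
clf.fit(X_tr, y_tr)
print("PCC on hold-out set:", clf.score(X_te, y_te))  # fraction correct
```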

The PCCs for the best arousal and valence classification networks from the high-frequency de-noised data set are shown in Table VI. Results revealed the overall PCC in validation of the ANN for classifying arousal to be 72%. This network was constructed with eight hidden nodes and produced an R-squared value of 0.73. The PCC of the ANN for classifying valence was 67%. The network was constructed with six hidden nodes and produced an R-squared value of 0.6.

Results of arousal and valence state classification based on the total de-noised data set, as shown in Table VII, revealed the overall PCC in validation of the ANN for classifying arousal to be 82%. This network was constructed with seven hidden nodes and produced an R-squared value of 0.78. The PCC of the ANN for classifying valence was 73%. The network was constructed with seven hidden nodes and produced an R-squared value of 0.73. It can be seen that after applying the proposed algorithm to eliminate signal noise overlapping with the informative signal frequency, the classification models produced higher PCCs for both arousal and valence states.

Fig. 11. Sensitivity analysis of arousal state.

Fig. 12. Sensitivity analysis of valence state.

Sensitivity analyses were performed using the NeuroSolutions software to provide information on the significance of the various inputs to the arousal and valence classification networks (see Figs. 11 and 12). The HR signal appeared to have the least effect on the arousal model; however, it had a relatively high effect on the valence model. In contrast, most GSR features had significant effects on arousal while fewer GSR features had a strong impact on valence states.

To demonstrate the performance of the statistical feature-selection methodology, additional ANNs were constructed including all GSR wavelet coefficients (24 features) and HR features (MeanHR and SDHR) for classifying arousal and valence. For the high-frequency de-noised data set, results (Table VIII) revealed an overall PCC in validation of the ANN for classifying arousal of 67%. The PCC of the ANN for classifying valence was 63%. For the total de-noised data set, results (Table IX) revealed the PCC of the ANN for classifying arousal to be 78%. The PCC of the ANN for classifying valence was 75%. Using all wavelet coefficient features for classifying current emotional states not only produced lower PCCs but also involved more complex networks. Although the all-features valence model (total de-noised) resulted in a slightly higher PCC (75% compared with 73% for the reduced model), the full network was much more complex (26 inputs and 8 hidden nodes versus 13 inputs and 7 hidden nodes for the reduced model).



TABLE VIII: OVERALL PCCS FOR AROUSAL AND VALENCE CLASSIFICATION NETWORKS USING ALL HIGH-FREQUENCY DE-NOISED WAVELET FEATURES

TABLE IX: OVERALL PCCS FOR AROUSAL AND VALENCE CLASSIFICATION NETWORKS USING ALL TOTAL DE-NOISED WAVELET FEATURES

IV. DISCUSSION

This study demonstrated that wavelet analysis can be an effective approach for extracting significant features from physiological data. The wavelet technology also serves as an efficient means of feature reduction; the number of GSR signal features was reduced by 99.4% for analysis. The stepwise regression also proved to be an effective statistical method for further feature reduction. The number of GSR and HR features for the model of arousal was reduced by 38.5% after applying high-frequency noise elimination and by 42.3% after further applying mid-band frequency noise elimination. For the model of valence, the number of physiological signal features was reduced by 46.2% after applying high-frequency noise elimination and by 50% after further applying mid-band frequency noise elimination. Therefore, these methods can be used to develop emotional state classification models of low complexity that still include significant physiological signal features (i.e., amplitude, time, and frequency).
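
As a rough illustration of the wavelet feature-reduction step (assuming the PyWavelets package; the db4 wavelet and five-level decomposition are placeholders rather than the study's exact settings), a multi-level discrete wavelet transform compresses a long GSR record into a short coefficient vector:

    import numpy as np
    import pywt

    def gsr_wavelet_features(gsr, wavelet="db4", level=5):
        """Compress a GSR record into a short coefficient vector."""
        coeffs = pywt.wavedec(gsr, wavelet, level=level)
        # Keep the coarse approximation coefficients as features; the
        # detail bands carry the higher-frequency (noisier) content.
        return coeffs[0]

    gsr = np.random.randn(4096)  # stand-in for a recorded GSR segment
    features = gsr_wavelet_features(gsr)
    print(len(features), "features from", len(gsr), "samples")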

Comparison of the R-squared values from the regression models (ranging from 0.0424 to 0.0534) with those from the neural network models (ranging from 0.60 to 0.78) indicated that the relationships between the physiological responses, including HR and GSR, and emotional states, in terms of valence and arousal, are likely nonlinear. This was in agreement with speculation based on the regression analysis.

Prior research has established that GSR is an indicator of arousal, with a linear correlation between the physiological response and emotional state. However, GSR has not been shown to distinguish positive from negative valence stimuli. In contrast, HR has been shown to have a strong relationship with both arousal and valence states. The analyses on the high-frequency de-noised data set confirmed the relationships between GSR and HR and arousal; however, valence states were not predicted by HR features. We suspected two possible reasons for this. First, there may have been some bias in subject self-reports of emotion during the study. Since the SAM questionnaire is a multi-dimensional subjective rating system, the order of presentation of the subscales was randomized on forms presented to subjects after each trial. This is a typical procedure in human factors research used to promote subject attention to form content in repeated observations. In this paper, the elderly participants may have been unaware of the randomized order of questions and may have misinterpreted the scales after some trials. Another explanation for our findings is that the initial wavelet analysis only filtered out the high-frequency noise from the GSR signal. Frequency analysis of GSR signals recorded during the rest period revealed some noise in mid-band frequencies, which overlapped the typical GSR frequency. This mid-band noise could have accounted for some of the variability in the valence response; otherwise, HR features might have explained valence states. After further applying the new mid-band de-noising algorithm and conducting sensitivity analyses on the NN models, results confirmed that arousal states were better explained by GSR features than by HR, while valence states were better explained by HR (more so than arousal states were) and by fewer GSR features.

V. CONCLUSION AND FUTURE WORK

The results of this study further support relationships between HR and GSR responses and emotional states, in terms of arousal and valence. Hazlett [18] indicated in his review that such physiological measures mostly reflect arousal and may be limited for indicating emotional valence. We found GSR and HR to be predictive of arousal states and HR to be a stronger indicator of valence. Hazlett noted that some studies have validated facial EMG as a valence indicator. Our future research will examine facial EMG signals as an additional physiological measure for classifying emotional states.

The machine learning algorithm we developed has the potential to classify patient emotional states in real time during interaction with a robot. Percent correct classification accuracies for the NN models on arousal and valence ranged from 73% to 82%. Other emotion recognition methods have been developed using facial expressions via image processing [e.g., 50]. These methods have achieved high PCCs (e.g., 88.2–92.6%); however, they are based on explicit emotion expressions, and interaction with service robots may not induce extreme happiness or unhappiness in patients. The study demonstrated that a set of wavelet coefficients could be determined to effectively represent physiological signal features without additional post-processing steps and can therefore support fast emotional state classification when integrated as inputs to NN models. Wavelet technology also proved highly effective for eliminating noise overlapping with informative physiological signal frequencies and for increasing the accuracy of emotional state classification with the neural networks.

In general, the wavelet transformation process supports fast coefficient computation for simultaneous noise elimination and physiological signal representation. This is important because we are ultimately interested in real-time classification of patient emotional states for service robot behavior adaptation. The approach requires that GSR and HR data be captured on patients in real time; signals must then be transformed and reconstructed using wavelet analysis; wavelet expansion coefficients must be computed; and the coefficients must be used as inputs to a trained ANN for emotional state prediction. Classified patient states can then be used as a basis for adapting robot behavior or interface configurations to ensure positive patient emotional experiences (e.g., high arousal and high valence) in healthcare.
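
A minimal sketch of this pipeline follows, assuming PyWavelets for the transform and a previously trained network; the soft-threshold rule shown is a generic universal-threshold de-noiser, a placeholder rather than the study's mid-band algorithm:

    import numpy as np
    import pywt

    def denoise_and_featurize(gsr_window, hr_window, wavelet="db4", level=5):
        """Wavelet de-noise a GSR window and assemble ANN input features."""
        coeffs = pywt.wavedec(gsr_window, wavelet, level=level)
        # Generic universal soft threshold estimated from the finest
        # detail band (standing in for the paper's de-noising rules).
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thresh = sigma * np.sqrt(2.0 * np.log(len(gsr_window)))
        coeffs[1:] = [pywt.threshold(c, thresh, mode="soft")
                      for c in coeffs[1:]]
        gsr_features = coeffs[0]  # retained wavelet expansion coefficients
        hr_features = np.array([hr_window.mean(), hr_window.std()])  # MeanHR, SDHR
        return np.concatenate([gsr_features, hr_features])

    # x = denoise_and_featurize(gsr_window, hr_window)
    # state = trained_ann.predict(x.reshape(1, -1))  # low / medium / high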

Such a real-time emotional state classification system may be a valuable tool as part of future service robot designs to ensure patient perceptions of quality in healthcare operations. We plan to implement the new human emotion classification algorithm on a service robot platform and to attempt to adapt various types of behaviors, including positioning relative to users and speech patterns. In the present study, the robot configuration generating the highest HR and GSR responses was the design requiring subjects to confirm their receipt of medicine. In a future study, we will further explore such a configuration and use maximum physiological responses for emotional state classification.

ACKNOWLEDGMENT

The authors thank Dr. Yuan-Shin Lee for comments on an earlier unpublished technical report of this work, as well as Dr. T. Zhang, Dr. B. Zhu, L. Hodge, and Dr. P. Mosaly for assistance in the experimental data collection and statistical analysis.

REFERENCES

[1] D. I. Auerbach, P. I. Buerhaus, and D. O. Staiger, “Better late than never: Workforce supply implications of later entry into nursing,” Health Affairs, vol. 26, no. 1, pp. 178–185, Jan./Feb. 2007.

[2] J. A. Russell, “A circumplex model of affect,” J. Person. Social Psychol., vol. 39, no. 6, pp. 1161–1178, 1980.

[3] D. Watson and A. Tellegen, “Toward a consensual structure of mood,” Psychol. Bull., vol. 98, no. 2, pp. 219–235, Sep. 1985.

[4] J. A. Russell and L. Feldman-Barrett, “Core affect, prototypical emotional episodes, and other things called emotion: Dissecting the elephant,” J. Person. Social Psychol., vol. 76, no. 5, pp. 805–819, 1999.

[5] R. Reisenzein, “Pleasure-activation theory and the intensity of emotions,” J. Person. Social Psychol., vol. 67, no. 3, pp. 525–539, 1994.

[6] M. Bradley and P. Lang, “Measuring emotion: The self-assessment manikin and the semantic differential,” J. Behav. Ther. Exp. Psychiatry, vol. 25, no. 1, pp. 49–59, Mar. 1994.

[7] M. E. Dawson, A. M. Schell, and D. L. Filion, “The electrodermal system,” in Handbook of Psychophysiology, J. T. Cacioppo, L. G. Tassinary, and G. G. Berntson, Eds., 3rd ed. New York: Cambridge Univ. Press, 2007, pp. 159–181.

[8] C. D. Wickens, Engineering Psychology and Human Performance, 2nd ed. New York: Harper Collins Pub. Inc., 1992.

[9] M. Malik, J. T. Bigger, A. J. Camm, R. E. Kleiger, A. Malliani, A. J. Moss, and P. J. Schwartz, “Heart rate variability: Standards of measurement, physiological interpretation, and clinical use,” Eur. Heart J., vol. 17, no. 3, pp. 354–381, Mar. 1996.

[10] R. L. Mandryk and M. S. Atkins, “A fuzzy physiological approach for continuously modeling emotion during interaction with play technologies,” Int. J. Hum.-Comput. Stud., vol. 65, no. 4, pp. 329–347, 2007.

[11] S. A. Neumann and S. R. Waldstein, “Similar patterns of cardiovascular response during emotional activation as a function of affective valence and arousal and gender,” J. Psychosom. Res., vol. 50, no. 5, pp. 245–253, May 2001.

[12] T. Ritz and M. Thöns, “Airway response of healthy individuals to affective picture series,” Int. J. Psychophysiol., vol. 46, no. 1, pp. 67–75, Oct. 2002.

[13] C. Peter and A. Herbon, “Emotion representation and physiology assignments in digital systems,” Interact. Comput., vol. 18, no. 2, pp. 139–170, Mar. 2006.

[14] A. Nakasone, H. Prendinger, and M. Ishizuka, “Emotion recognition from electromyography and skin conductance,” in Proc. 5th Int. Workshop BSI, Tokyo, Japan, 2005, pp. 219–222.

[15] S. Amershi, C. Conati, and H. Maclaren, “Using feature selection and unsupervised clustering to identify affective expressions in educational games,” in Proc. Workshop Motivational Affect Issues ITS, Jhongli, Taiwan, 2006, pp. 21–28.

[16] C. K. Lee, S. K. Yoo, Y. J. Park, N. H. Kim, K. S. Jeong, and B. C. Lee, “Using neural network to recognize human emotions from heart rate variability and skin resistance,” in Proc. 27th Annu. Conf. IEEE Eng. Med. Biol., Shanghai, China, 2005, pp. 5523–5525.

[17] A. Pecchinenda and C. A. Smith, “The affective significance of skin conductance activity during a difficult problem-solving task,” Cognit. Emotion, vol. 10, no. 5, pp. 481–503, Sep. 1996.

[18] R. L. Hazlett, “Measuring emotional valence during interactive experiences: Boys at video game play,” in Proc. CHI, Novel Methods: Emotions, Gesture, Events, Montreal, QC, Canada, 2006, pp. 1023–1026.

[19] H. Leng, Y. Lin, and L. A. Zanzi, “An experimental study on physiological parameters toward driver emotion recognition,” Lecture Notes in Computer Science, vol. 4566, pp. 237–246, 2007.

[20] C. L. Lisetti and F. Nasoz, “Using noninvasive wearable computers to recognize human emotions from physiological signals,” EURASIP J. Appl. Signal Process., vol. 2004, pp. 1672–1687, 2004.

[21] K. Itoh, H. Miwa, Y. Nukariya, M. Zecca, H. Takanobu, S. Roccella, M. C. Carrozza, P. Dario, and A. Takanishi, “Development of a bioinstrumentation system in the interaction between a human and a robot,” in Proc. IEEE/RSJ Int. Conf. Intell. Robots Syst., Beijing, China, 2006, pp. 2620–2625.

[22] D. Kulic and E. Croft, “Physiological and subjective responses to articulated robot motion,” Robotica, vol. 25, no. 1, pp. 13–27, Jan. 2007.

[23] C. Liu, K. Conn, N. Sarkar, and W. Stone, “Online affect detection and robot behavior adaptation for intervention of children with autism,” IEEE Trans. Robot., vol. 24, no. 4, pp. 883–896, Aug. 2008.

[24] P. Agrawal, C. Liu, and N. Sarkar, “Interaction between human and robot-An affect-inspired approach,” Interact. Stud., vol. 9, no. 2, pp. 230–257, 2008.

[25] P. Rani, C. Liu, N. Sarkar, and E. Vanman, “An empirical study of machine learning techniques for affect recognition in human-robot interaction,” Pattern Anal. Appl., vol. 9, no. 1, pp. 58–69, May 2006.

[26] C. L. Breazeal, “Emotion and sociable humanoid robots,” Int. J. Hum.-Comput. Stud., vol. 59, no. 1/2, pp. 119–155, Jul. 2003.

[27] K. Najarian and R. Splinter, Biomedical Signal and Image Processing. Boca Raton, FL: CRC Press, 2006.

[28] M. W. Scerbo, F. G. Freeman, P. J. Mikulka, R. Parasuraman, F. Di Nocera, and L. J. Prinzel, “The efficacy of psychophysiological measures for implementing adaptive technology,” NASA Langley Res. Center, Hampton, VA, NASA TP-2001-211018, 2001.

[29] P. Ekman, “Expression and the nature of emotion,” in Approaches to Emotion, K. S. Scherer and P. Ekman, Eds. Hillsdale, NJ: Erlbaum, 1984.

[30] P. Ekman, “An argument for basic emotions,” Cognit. Emotion, vol. 6, no. 3/4, pp. 169–200, 1992.

[31] P. Ekman, “Basic emotions,” in Handbook of Cognition and Emotion, T. Dalgleish and M. Power, Eds. Sussex, U.K.: Wiley, 1999.

[32] G. Zhang, R. Xu, Q. Ji, P. Cowings, and W. Toscano, “Context, observation, and operator state (COS): Dynamic fatigue monitoring,” presented at the NASA Aviation Safety Tech. Conf., Denver, CO, 2008.

[33] J. F. Kenney and E. S. Keeping, Mathematics of Statistics, 3rd ed. Princeton, NJ: Van Nostrand, 1954, pt. 1.

[34] M. Slater, C. Guger, G. Edlinger, R. Leeb, G. Pfurtscheller, A. Antley, M. Garau, A. Brogni, and D. Friedman, “Analysis of physiological responses to a social situation in an immersive virtual environment,” Presence, Teleoper. Virtual Environ., vol. 15, no. 5, pp. 553–569, Oct. 2006.

[35] J. Laparra-Hernández, J. M. Belda-Lois, E. Medina, N. Campos, and R. Poveda, “EMG and GSR signals for evaluating user’s perception of different types of ceramic flooring,” Int. J. Ind. Ergonom., vol. 39, no. 2, pp. 326–332, 2009.

[36] A. Gupta, S. D. Joshi, and S. Prasad, “On a new approach for estimating wavelet matched to signal,” in Proc. 8th Nat. Conf. Commun., Bombay, India, 2002, pp. 180–184.

[37] C. J. Peek, “A primer of biofeedback instrumentation,” in Biofeedback: A Practitioner’s Guide, M. S. Schwartz and F. Andrasik, Eds., 3rd ed. New York: Guilford Press, 2003.

[38] R. Moghimi, “Understanding noise optimization in sensor signal-conditioning circuits,” EE Times, 2008. [Online]. Available: http://eetimes.com/design/automotive-design/4010307/Understanding-noise-optimization-in-sensor-signal-conditioning-circuits-Part-1a-of-4-parts-

[39] J. Li, Y. Hou, P. Wei, and G. Chen, “A novel method for the determination of the wavelet denoising threshold,” in Proc. 1st ICBBE, Wuhan, China, 2007, pp. 713–716.

[40] S. Poornachandra and N. Kumaravel, “A novel method for the elimination of power line frequency in ECG signal using hyper shrinkage function,” Digit. Signal Process., vol. 18, no. 2, pp. 116–126, Mar. 2008.

[41] F. Abramovich, T. C. Bailey, and T. Sapatinas, “Wavelet analysis and its statistical applications,” Statistician, vol. 49, pt. 1, pp. 1–29, 2000.



[42] E. Y. Lam, “Statistical modelling of the wavelet coefficients with different bases and decomposition levels,” Proc. Inst. Elect. Eng.—Vis. Image Signal Process., vol. 151, no. 3, pp. 203–206, Jun. 2004.

[43] R. Polikar, “Pattern recognition,” in Wiley Encyclopedia of Biomedical Engineering, M. Akay, Ed. New York: Wiley, 2006.

[44] R. Yamada, J. Ushiba, Y. Tomita, and Y. Masakado, “Decomposition of electromyographic signal by principal component analysis of wavelet coefficient,” in Proc. IEEE EMBS Asian-Pac. Conf. Biomed. Eng., Keihanna, Japan, 2003, pp. 118–119.

[45] J. L. Semmlow, Biosignal and Biomedical Image Processing: MATLAB-Based Applications. New York: Marcel Dekker, 2004.

[46] F. Abramovich and Y. Benjamini, “Thresholding of wavelet coefficients as multiple hypotheses testing procedure,” Lecture Notes in Statistics, vol. 103, pp. 5–14, 1995.

[47] F. Abramovich and Y. Benjamini, “Adaptive thresholding of wavelet coefficients,” Comput. Stat. Data Anal., vol. 22, no. 4, pp. 351–361, Aug. 1996.

[48] L. J. Prinzel, “Research on hazardous states of awareness and physiological factors in aerospace operations,” NASA, Greenbelt, MD, NASA/TM-2002-211444, L-18149, NAS 1.15:211444, 2002.

[49] T. Masters, Practical Neural Network Recipes in C++. San Diego, CA: Academic, 1993.

[50] A. Chakraborty, A. Konar, U. K. Chakraborty, and A. Chatterjee, “Emotion recognition from facial expressions and its control using fuzzy logic,” IEEE Trans. Syst., Man, Cybern. A, Syst., Humans, vol. 39, no. 4, pp. 726–743, Jul. 2009.

Manida Swangnetr received the B.S. degree in industrial engineering from Chulalongkorn University, Bangkok, Thailand, in 2001 and the M.S. degree in industrial engineering and the Ph.D. degree in industrial and systems engineering, with a focus in human factors and ergonomics, from North Carolina State University, Raleigh, in 2006 and 2010, respectively.

Currently, she is a Lecturer in the Departments of Production Technology and Industrial Engineering at Khon Kaen University, Thailand. She is also a research team member in the Back, Neck, and Other Joint Pain Research Group in the Faculty of Associate Medical Sciences. Prior to these appointments, she worked as a Research Assistant in the Department of Industrial and Systems Engineering at North Carolina State University. She has published several other papers on human emotional state classification in interacting with robots through the International Ergonomics Association Triennial Conference, the Annual Meeting of the Human Factors & Ergonomics Society, and the AAAI Symposium on Dialog with Robots. Her current research interests include cognitive engineering; human functional state modeling in use of automation; ergonomic interventions for occupational work; and ergonomics approaches to training capabilities for disabled persons.

Dr. Swangnetr is a Member of the Human Factors and Ergonomics Society and is a registered Associate Ergonomics Professional.

David B. Kaber (M’99) received the B.S. and M.S. degrees in industrial engineering from the University of Central Florida, Orlando, in 1991 and 1993, respectively, and the Ph.D. degree in industrial engineering from Texas Tech University, Lubbock, in 1996.

Currently, he is a Professor of Industrial and Systems Engineering at North Carolina State University, Raleigh, and Associate Faculty in biomedical engineering and psychology. He is also the Director of the Occupational Safety and Ergonomics Program, which is supported by the National Institute for Occupational Safety and Health. Prior to this, he worked as an Associate Professor at the same institution and as an Assistant Professor at Mississippi State University, Mississippi State. His current research interests include computational modeling of human cognitive behavior in interacting with advanced automated systems and optimizing design of automation interfaces based on tradeoffs in information load, task performance, and cognitive workload.

Dr. Kaber is a recent Fellow of the Human Factors and Ergonomics Society and is a Certified Human Factors Professional. He is also a Member of Alpha Pi Mu, ASEE, IEHF, IIE, and Sigma Xi.