
Annotation and Detection of Blended Emotions

in Real Human-Human Dialogs recorded in a Call Center

L. Vidrascu and L. Devillers

TLP-LIMSI/CNRS - France

IST AMITIES FP5 Project Automated Multi-lingual Interaction with Information and Services

HUMAINE FP6 NoE Human-Machine Interaction on Emotion

CHIL FP6 Project Computer in the Human Interaction Loop


Introduction

Study of real-life emotions to improve the capabilities of current speech technologies: detecting emotions can help by orienting the evolution of human-computer interaction via dynamic modification of dialog strategies.

Most previous work on emotion has been conducted on acted or induced data with archetypal emotions. Results on such artificial data transfer poorly to real data: the expression of emotion is complex (blended, shaded, masked), dependent on contextual and social factors, and expressed at many different levels (prosodic, lexical, etc.).

Challenges for detecting emotions in real-life data: representation of complex emotion, robust annotation, validation protocol.


Outline

real-life corpus recorded in a call center (call centers are very interesting environments because the recording can be made unobtrusively)

emotion annotation

emotion detection

blended emotions

perspectives


Corpus recorded at a Web-based Stock Exchange Customer Service Center. The dialogs are real agent-client interactions in French, covering a range of investment topics, account management, and Web questions or problems: 5229 speech turns making up 5012 in-task exchanges.

# agents: 4          # clients: 100
# turns/dialog: average 50 (min 5, max 227)
# words/turn: average 9 (min 1, max 128)
# words total: 44.1k (3k distinct)


Outline

real-life corpus description

emotion annotation (this phase is complex: definition of the emotion representation and of the emotional unit; annotation validation)

emotion detection

blended emotions

perspectives


Three types of emotion representation:

describing emotions via appraisal dimensions (Scherer, 1999): novelty, pleasantness, etc.

describing emotions via abstract dimensions (Osgood, 1975): activation (active/passive), valence (negative/positive), control (relation to the stimulus)

verbal categories: 8 primary universal emotions for Ekman (2002); primary vs. secondary/social emotions (Plutchik, 1994)


Emotion Definition and Annotation

We consider emotion in a broad sense, including both attitudes and emotions.

Definition: a set of 5 task-dependent emotion labels: the anger and fear emotions; the excuse, satisfaction, and neutral attitudes. Emotional unit: the speaker turn.

Dialog corpus labeled with audio listening by 2 independent annotators: ambiguities ~3%

Distribution of the labels over speaker turns:

        Anger  Fear  Exc.  Sat.  Neutr.
Client  9.9%   6.7%  0.1%  2.6%  80.7%
Agent   0.7%   1.3%  1.8%  4.0%  92.1%


Annotation Validation

Inter-annotator agreement measure: Kappa = 0.8 (a computation sketch follows below)

Perceptual test to validate the presence of emotions in the corpus. Test data: 40 speaker turns and 20 native French subjects; 75% of the negative emotions were correctly detected.

Ref: Devillers, L., Vasilescu, I., Mathon, C. (2003), "Acoustic cues for perceptual emotion detection in task-oriented Human-Human corpus", 15th ICPhS, Barcelona.
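For context, this kind of inter-annotator agreement is commonly computed as Cohen's kappa. A minimal sketch using scikit-learn; the paper does not specify which implementation was used, and the labels below are illustrative, not the actual data:

```python
# Cohen's kappa between two annotators' per-turn labels.
# Illustrative labels from the 5-label scheme; not the actual corpus data.
from sklearn.metrics import cohen_kappa_score

annotator1 = ["Anger", "Neutral", "Fear", "Neutral", "Satisfaction"]
annotator2 = ["Anger", "Neutral", "Fear", "Anger", "Satisfaction"]

print(f"Cohen's kappa: {cohen_kappa_score(annotator1, annotator2):.2f}")
```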


Outline

real-life corpus description

emotion annotation

emotion detection (prosodic, acoustic and some disfluency cues; Neutral/Negative and Fear/Anger classification)

blended emotions

perspectives


Prosodic, acoustic and disfluency cues

Crucial point: selection of a set of relevant features. This selection is not well established and appears to be data-dependent, so we start from a large and redundant set of features (a sketch of the F0 statistics follows the list):

F0 features: min, max, mean, standard deviation, range, slope, regression coefficient and its mean square error, cross-variation of F0 between two adjoining voiced segments

Energy features: min, max, mean, standard deviation, range

Duration features: speaking rate (inverse of the average length of the voiced parts of speech)

Other acoustic features: first and second formants and their bandwidths

Speech disfluency cues: number and length of silent pauses (unvoiced parts between 200-800 ms) and filler pauses (« euh »)
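As an illustration of the F0 statistics above, a hypothetical numpy sketch; in the paper the extraction itself was done with Praat, and the function name and input format here are assumptions:

```python
import numpy as np

def f0_features(f0):
    # f0: voiced-frame F0 contour in Hz (hypothetical input format;
    # extraction in the paper was done with Praat).
    f0 = np.asarray(f0, dtype=float)
    t = np.arange(len(f0))
    # Least-squares regression line over the contour gives the slope
    # (regression coefficient); its mean square error is kept as well.
    slope, intercept = np.polyfit(t, f0, 1)
    mse = float(np.mean((f0 - (slope * t + intercept)) ** 2))
    return {
        "min": f0.min(), "max": f0.max(), "mean": f0.mean(),
        "std": f0.std(), "range": f0.max() - f0.min(),
        "slope": slope, "regression_mse": mse,
    }
```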


Speech Data Processing

F0, energy and acoustic cue extraction with Praat. Example of F0 processing: z-score normalization. Since F0 detection is subject to error, segments with a duration of less than 30 ms are eliminated (1.4% of the segments, balanced across classes); a sketch of these two steps follows.

Automatic alignment for filler and silent pause extraction: LIMSI system (HMMs with Gaussian mixtures for acoustic modeling). The word alignment was manually verified for speaker turns labeled with negative emotions.
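A minimal sketch of the filtering and normalization just described, under an assumed input format (per-segment durations in seconds, F0 values per voiced segment):

```python
import numpy as np

def preprocess_f0(segments):
    # segments: list of (duration_s, f0_values) pairs, one per voiced
    # segment (assumed format). Segments shorter than 30 ms are dropped
    # because F0 detection on them is unreliable.
    kept = [np.asarray(f0, float) for dur, f0 in segments if dur >= 0.030]
    frames = np.concatenate(kept)
    # z-score normalization of the remaining F0 frames.
    return (frames - frames.mean()) / frames.std()
```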


Feature selection and Detection systems

Weka toolkit (www.cs.waikato.ac.nz): a collection of machine learning algorithms for data mining.

Tested selection of subsets of the best attributes: SVM predictive ranking, entropy measure (InfoGain), Correlation-based Feature Selection (CFS).

Classifiers tested: a decision tree that uses pruning (C4.5), a Support Vector Machine (SVM), and voting algorithms (ADTree and AdaBoost), which combine the outputs of different models.
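The paper used Weka; a rough scikit-learn analogue of this setup might look as follows. These are assumptions: C4.5 is approximated by an entropy-based decision tree, InfoGain by mutual information, and ADTree has no direct scikit-learn equivalent:

```python
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

def build_model(k, clf):
    # Keep the k best attributes by an InfoGain-like criterion,
    # then train the given classifier on them.
    return make_pipeline(SelectKBest(mutual_info_classif, k=k), clf)

models = {
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),
    "SVM": SVC(),
    "AdaBoost": AdaBoostClassifier(),
}
```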


Neutral/Negative emotion detection

Using prosodic and acoustic cues, jackknifing procedure with 30 runs (a sketch of the resampling follows the table). Correct detection rate in % (standard deviation):

         C4.5        AdaBoost    ADTree      SVM
5 att    72.8 (5.2)  71.2 (4.5)  72.3 (4.6)  67.2 (6.3)
10 att   73.0 (5.3)  71.5 (4.8)  73.0 (5.7)  69.5 (5.6)
15 att   71.7 (6.4)  71.1 (4.7)  71.6 (4.9)  70.8 (4.9)
20 att   71.8 (5.3)  71.3 (4.3)  71.8 (5.1)  71.0 (4.9)
all att  69.4 (5.6)  71.7 (4.3)  71.6 (4.8)  69.6 (3.5)

Very few attributes (5 att) already yield a high detection level. There is little difference between the different techniques.
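The 30-run jackknifing procedure can be approximated by repeated random resampling; a sketch with scikit-learn, where the split sizes are an assumption since the paper does not detail them:

```python
from sklearn.model_selection import ShuffleSplit, cross_val_score
from sklearn.tree import DecisionTreeClassifier

def evaluate(X, y):
    # 30 random train/test resamplings; returns the mean and standard
    # deviation of the correct-detection rate in %, as in the table above.
    cv = ShuffleSplit(n_splits=30, test_size=0.2, random_state=0)
    scores = cross_val_score(DecisionTreeClassifier(criterion="entropy"),
                             X, y, cv=cv)
    return 100 * scores.mean(), 100 * scores.std()
```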


Anger/Fear emotion detection

Decision tree classifier: 56% correct detection with prosodic and acoustic cues; 60% when adding disfluency cues (silent pauses and filler pauses « euh »).

We hypothesize that this low performance is due to blended emotions.


Outline

real-life corpus description

emotion annotation

emotion detection

blended emotions: in certain states of mind, it is possible to exhibit more than one emotion, e.g. when trying to mask a feeling, under conflicting emotions, when suffering, etc.

perspectives


Blended emotions

In this financial task, Anger and Fear can be combined: clients can be angry because they are afraid of losing money. Confusion matrix (40% confusion): there are as many "Anger classified as Fear" cases as "Fear classified as Anger" cases.

Re-annotation procedure for the negative emotions with a new scheme defined for other tasks (medical call center, EmoTV), with 2 different annotators.


New emotion annotation scheme

Allows choosing 2 labels per segment: a Major emotion, the one perceived as dominant, and a Minor emotion, if another emotion is perceived in the background (the most intense minor emotion).

7 coarse classes (defined for another task): Fear, Sadness, Anger, Hurt, Positive, Surprise, Neutral attitude.


Perception of emotion is very subjective. How to mix different annotations? Labeler 1: Major Anger, Minor Sadness. Labeler 2: Major Fear, Minor Anger.

We exploit the differences by combining the labels from multiple annotators in a soft emotion vector:

-> ((wM+wm)/W Anger, wM/W Fear, wm/W Sadness), where wM and wm are the Major and Minor weights and W is the total weight

For wM = 2, wm = 1, W = 6 in this example:

-> (3/6 Anger, 2/6 Fear, 1/6 Sadness)
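A minimal sketch of how such a soft vector could be assembled from the annotators' (Major, Minor) choices, with wM = 2 and wm = 1 as in the example:

```python
from collections import defaultdict

def soft_vector(annotations, w_major=2, w_minor=1):
    # annotations: one (major_label, minor_label or None) pair per
    # annotator. Returns {label: weight / W}, W = total weight given.
    weights, total = defaultdict(int), 0
    for major, minor in annotations:
        weights[major] += w_major
        total += w_major
        if minor is not None:
            weights[minor] += w_minor
            total += w_minor
    return {label: w / total for label, w in weights.items()}

# Labeler 1: Major Anger, Minor Sadness; Labeler 2: Major Fear, Minor Anger
print(soft_vector([("Anger", "Sadness"), ("Fear", "Anger")]))
# -> {'Anger': 0.5, 'Sadness': 0.166..., 'Fear': 0.333...}, i.e. (3/6, 1/6, 2/6)
```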


Re-annotation result

Because we focus on the Anger and Fear emotions, 4 classes were deduced from the emotion vectors (a minimal sketch follows):

Fear (Fear>0; Anger=0), Anger (Fear=0; Anger>0), Blended emotion (Fear>0; Anger>0), Other (Fear=0; Anger=0)

Consistency between the first and the second annotation for 78% of the utterances (e.g. if Anger >= Fear and the previous annotation was Anger -> consistent).

Same Major label in 64% of the utterances; no common labels between the two annotators in 13%.
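A minimal sketch of the class deduction from an emotion vector as built above:

```python
def deduce_class(vec):
    # Map a soft emotion vector to the four classes defined above.
    fear, anger = vec.get("Fear", 0), vec.get("Anger", 0)
    if fear > 0 and anger > 0:
        return "Blended"
    if fear > 0:
        return "Fear"
    if anger > 0:
        return "Anger"
    return "Other"
```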


Re-annotation results

Validation of the presence of mixtures of emotion in the Anger and Fear segments. Excerpt taken from a call. Client: "No, but I haven't handled it at all. I was on holidays, I got a letter, about 4… 400 euros were missing…"

[Bar chart: re-annotation of the Fear and Anger segments (1st annotation scheme) into Other / Fear / Blend / Anger, in % (0-70 scale)]


Summary and perspectives

Detection performance: 73% correct detection between Neutral and Negative emotion, but only 60% between Fear and Anger.

Validation of the presence of mixtures of Fear/Anger emotion.

Emotion representation: the soft emotion vector, also used for a medical call center corpus (20h annotated) and a multimodal corpus of TV interviews (EmoTV-HUMAINE).

Perspectives: improve detection performance by using the non-complex part of the corpus for model training; analyse real-life blended emotions and run a perceptual test on blended emotions.


Thank you for your attention

Reference: L. Devillers, L. Vidrascu, L. Lamel, "Challenges in real-life emotion annotation and machine learning based detection", Neural Networks, special issue, to appear July 2005.


Combining lexical and paralinguistic scores

[Plot: detection rate (60-80%) across the 10 test sets for the lexical scores, the paralinguistic scores, and their combination]

Lexical unigram model: 78% detection Neutral/Negative. Linear combination of the 2 scores on 10 test sets (50 utterances each); a sketch of the combination follows.
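A sketch of the linear score combination; the interpolation weight alpha is a hypothetical tuning parameter, not a value from the paper:

```python
def combine(lex_scores, para_scores, alpha=0.5):
    # Linear combination of per-class lexical and paralinguistic scores.
    return {c: alpha * lex_scores[c] + (1 - alpha) * para_scores[c]
            for c in lex_scores}

scores = combine({"Negative": 0.7, "Neutral": 0.3},
                 {"Negative": 0.4, "Neutral": 0.6}, alpha=0.6)
decision = max(scores, key=scores.get)  # -> "Negative"
```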


Emotion Detection Model

The emotion detection model is based on unigram models. Due to the sparseness of the on-emotion data, each emotion model is an interpolation of an emotion-specific model and a general task-specific model estimated on the entire training corpus. The similarity between an utterance u and an emotion E is the normalized log-likelihood ratio between the emotion model and the general model.

Standard preprocessing procedures: compounding (negative forms, e.g. « pas_normal »), stemming, and stopping.

$L(u,E) = \frac{1}{|u|} \log \frac{P(u|E)}{P(u)} = \frac{1}{|u|} \sum_{w \in u} tf(w,u) \, \log \frac{P(w|E)}{P(w)}$
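A minimal sketch of this similarity computation; the interpolation weight lam between the emotion-specific and general models is an assumption:

```python
import math
from collections import Counter

def similarity(utterance, emotion_unigram, general_unigram, lam=0.75):
    # L(u, E): normalized log-likelihood ratio between the interpolated
    # emotion model and the general model, as in the formula above.
    # utterance: list of preprocessed words; *_unigram: word -> probability.
    tf = Counter(utterance)
    score = 0.0
    for w, count in tf.items():
        p_general = general_unigram.get(w, 1e-9)
        # Emotion model interpolated with the general task model
        # (smooths the sparse on-emotion counts).
        p_emotion = lam * emotion_unigram.get(w, 0.0) + (1 - lam) * p_general
        score += count * math.log(p_emotion / p_general)
    return score / len(utterance)
```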


Experiments on Anger/Fear detection

Prosodic and acoustic cues: 56% detection; around 60% when disfluencies are added.

Lexical cues (ICME 2003): the two emotions often share the same lexical words (problem, abnormal, etc.); the difference is much more syntactic than lexical.


Attribute selection

Attribute selection with the Weka toolkit:

With a model (SVM)

Information Gain (A: attribute; C: class):

$H(C) = -\sum_{c \in C} p(c) \, \log_2 p(c)$

$H(C|A) = -\sum_{a \in A} p(a) \sum_{c \in C} p(c|a) \, \log_2 p(c|a)$

$InfoGain(A) = H(C) - H(C|A)$

CFS (Correlation-based Feature Selection)
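A minimal sketch of the Information Gain computation from these formulas:

```python
import math
from collections import Counter

def entropy(labels):
    # H(C) = -sum_c p(c) log2 p(c)
    n = len(labels)
    return -sum((k / n) * math.log2(k / n) for k in Counter(labels).values())

def info_gain(attribute_values, labels):
    # InfoGain(A) = H(C) - H(C|A) for a discrete attribute A.
    n = len(labels)
    h_cond = 0.0
    for a in set(attribute_values):
        subset = [c for v, c in zip(attribute_values, labels) if v == a]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_cond
```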