Hicham Atassi, Zdenek Smékal et al.
Brno University of Technology, Department of Telecommunications, Signal Processing Laboratory
Czech Republic
Advanced Dialogue Analysis as a Part of an Autonomous Intelligent System for Call Centres Surveillance and Assessment
3rd SPLAB workshop, 30 October – 1 November 2013, Brno Czech Republic
Introduction
A call center is a centralized office used for receiving or transmitting a large volume of requests by telephone. Call centers are usually part of customer relationship management (CRM).
A contact center is a single location that, in addition to telephone calls, handles letters, faxes, live support software, instant messaging (IM) and e-mail.
Call center
• Inbound calls (from customers): product support, information inquiries
• Outbound calls (to customers): telemarketing, donations, market research
Introduction
A call center recording system is usually used for two main reasons:
• The recording can serve as evidence in some cases.
• For evaluation and statistical purposes: recordings are selected randomly and evaluated manually through a reporting system.
If a call center has 20 operators working 7 hours a day, 5 days a week, the phone calls recorded in one month amount to about 2800 hours!
It is impossible to check all these phone calls manually in order to form a reliable picture of the agents' performance or to assess the quality of service.
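The 2800-hour figure can be reproduced with a few lines of arithmetic. This is a small sketch that assumes a month of 4 working weeks (the slides do not state this) and applies the 25x real-time factor quoted for the proposed system to estimate the automatic processing time.

```python
# Assumption: one month is approximated as 4 working weeks.
operators = 20
hours_per_day = 7
days_per_week = 5
weeks_per_month = 4

recorded_hours = operators * hours_per_day * days_per_week * weeks_per_month

# The proposed system is described as 25x faster than real time.
speed_factor = 25
processing_hours = recorded_hours / speed_factor

print(recorded_hours)    # 2800
print(processing_hours)  # 112.0
```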
Case of two operators
Which one is going to lose her job?
[Figure: two panels, Agent 1 and Agent 2. Each panel shows emotion (anger, neutral, happiness) along the time axis for the agent and the customer.]
Criticisms from callers
• Operators working from a script.
• Non-expert operators.
• Incompetent or untrained operators incapable of processing customers' requests effectively.
• Obsequious behavior by operators (e.g., relentless use of "sir," "ma'am" and "I'd be more than happy to assist you").
• Overseas location, with language and accent problems.
• Touch tone menu systems and automated queuing systems.
• Excessive waiting times to be connected to an operator.
• Complaints that departments of companies do not engage in communication with one another.
• Deceit over location of call centre.
• Requiring the caller to repeat the same information multiple times.
Unspontaneous conversation
Frequent hesitations, unsatisfied customers
Extra joyful, keyword spotting
measurable
Detection based on correlation, keyword spotting
"If You Want to Scream, Press... - Preview". Online.wsj.com. Retrieved 2012-01-28
Criticisms from agents
• Close scrutiny by management (e.g. frequent random call monitoring).
• Low compensation (pay and bonuses).
• Restrictive working practices (some operators are required to follow a pre-written script).
• High stress: a common problem associated with front-end jobs where employees deal directly with customers.
• Repetitive job tasks.
• Poor working conditions (e.g. poor facilities, poor maintenance and cleaning, cramped working conditions, management interference, lack of privacy, and noise).
• Impaired vision and hearing problems.
• Rude and abusive customers.
Stress detection
detectable
"If You Want to Scream, Press... - Preview". Online.wsj.com. Retrieved 2012-01-28
Autonomous system?
Demands!
• High reliability in terms of classification accuracy
• Low computational complexity
• Works in a wide range of conditions: different types of channels, noise, echo…
• Multilingual analysis (deals with different languages)
• Real-time processing
Proposed system
• One-dimensional and two-dimensional interpretation of emotion recognition results
• Voice activity detection and dialog analysis
• Age and gender recognition
• Multichannel real-time processing
• 25x faster than real time
AISCM: Autonomous Intelligent System for Call-center Monitoring
• ARM: Automatic reporting module
• MRM: Manual reporting module
• ARM-MRM interaction interface
ARM (Automatic Reporting System)
Input instance (dual-channel phone call record)
• First layer, extraction of traits: emotion, gender, age, VAD waveforms, keywords
• Second layer, post-processing: emotion mapping, extraction of dialogue features, age evaluation
• Third layer, interpretation: fusion of all traits, decision making
Output evaluation
First layer: extraction of traits
Extraction of traits
Emotion recognition
• Trained using spontaneous speech from real call centers.
• Multilingual: Czech, Slovak, Polish, Russian, German, French, Italian, Spanish and English.
• Five emotional states: anger, happiness, sadness, surprise and neutral.
• Both classification and regression approaches.
Classification results
Weighted classification accuracy [%] (first value as reported, second value with segmental features):
• One-against-all SVM: 58 / 59
• Generic GMM: 61 / 61
• Gender dependent GMM: 63 / 63.4
• Emotion coupling classifier: 68 / 68.2
• 3-layer classification system: 71 / 71.6
Evaluation of the 2D approach
MSE (valence / activation):
• Skeleton of emotions (classification): 0.42 / 0.40
• 2D trained NN (classification): 0.21 / 0.26
• Skeleton of emotions (regression): 0.27 / 0.27
• 2D trained NN (regression): 0.16 / 0.13
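For reference, the mean squared error (MSE) used to evaluate the 2D approach can be computed as in the sketch below. The reference and predicted valence values are hypothetical, and the value range is an assumption, since the slides do not give the annotation scale.

```python
def mean_squared_error(reference, predicted):
    """Average of the squared differences between two equally long sequences."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((r - p) ** 2 for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical valence annotations and predictions (scale is an assumption).
valence_reference = [0.5, -0.5, 0.0, 1.0]
valence_predicted = [0.0, -0.5, 0.5, 0.5]

mse_valence = mean_squared_error(valence_reference, valence_predicted)
print(mse_valence)  # 0.1875
```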
Database building
[Figure: block diagram of the three-layer emotion classification system. Blocks: input feature vector; gender recognition (male and female GMMs); general classifier; gender dependent classifier; emotion coupling classifier (a bank of GMMs); fusing NN; 2D mapping NN; surprise (threshold based activation). Labels: activation, evaluation, language independent, language dependent.]
• General classifier: Gaussian mixture model with seven classes (models).
• Gender dependent system: two GMM models trained separately using male and female utterances.
• Emotion coupling classifier: 21 GMM models trained using unique sets of features (one for each possible pair of emotions).
• Fusing layer: feed-forward back-propagation network for fusing.
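The count of 21 coupling models follows from the number of unordered pairs among seven classes. The sketch below enumerates the pairs and combines pairwise decisions with a simple majority vote. The vote is only a stand-in: the slides fuse the classifier outputs with a neural network. The class names and scores are hypothetical, since the seven classes are not named in the slides.

```python
from collections import Counter
from itertools import combinations

# Placeholder names: the seven classes are not listed in the slides.
classes = [f"class_{i}" for i in range(7)]
pairs = list(combinations(classes, 2))
print(len(pairs))  # 21, one model per pair of classes


def pairwise_vote(pair_scores):
    """pair_scores maps (a, b) to (score_a, score_b); return the class with most wins."""
    votes = Counter()
    for (a, b), (score_a, score_b) in pair_scores.items():
        votes[a if score_a >= score_b else b] += 1
    return votes.most_common(1)[0][0]


# Hypothetical per-class scores standing in for the output of each pairwise model.
toy_scores = dict(zip(classes, [0.1, 0.4, 0.2, 0.9, 0.3, 0.5, 0.0]))
pair_scores = {(a, b): (toy_scores[a], toy_scores[b]) for a, b in pairs}

print(pairwise_vote(pair_scores))  # class_3
```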
Extraction of traits
Age estimation
• Average absolute error: 12.67 years for male speakers and 15.3 years for female speakers
• Tested on 200 samples
[Figure: distribution of the absolute error for males and for females. x-axis: absolute error [years]; y-axis: count.]
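The average absolute error per gender, as reported for age estimation, can be computed as sketched below. The records are hypothetical (gender, true age, estimated age) triples, not data from the study.

```python
def average_absolute_error_by_group(records):
    """records: iterable of (group, true_value, estimate); returns {group: mean |error|}."""
    totals = {}
    for group, true_value, estimate in records:
        error_sum, count = totals.get(group, (0.0, 0))
        totals[group] = (error_sum + abs(true_value - estimate), count + 1)
    return {group: error_sum / count for group, (error_sum, count) in totals.items()}


# Hypothetical test samples: (gender, true age, estimated age).
samples = [
    ("male", 25, 30),
    ("male", 40, 28),
    ("female", 60, 45),
    ("female", 33, 36),
]

errors = average_absolute_error_by_group(samples)
print(errors)  # {'male': 8.5, 'female': 9.0}
```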
Gender recognition
• This task is performed with perfect accuracy
Second layer: post-processing
• Emotions: mapping into the target dimension, smoothing, correction…
• Age: correction, smoothing, final evaluation…
• Voice activity: extraction of dialogue features, evaluation.
[Figure: activation and valence over time (s) for the left channel and the right channel.]
Dialogue analysis
The dialogue analysis is based on the output of an enhanced Global Speech Absence Probability (GSAP) Voice Activity Detector.
Three characteristics are considered:
• Reaction (turn taking)
• Interruption
• Hesitation
The following statistics are computed from each record:
• Count
• Mean
• Maximum and minimum
176 telephone records from Czech call centers were examined.
A total of 24 parameters are obtained from each record: 3 characteristics × 4 statistical variables × 2 channels.
[Figure axes: sensitivity, specificity.]
Reaction (turn taking)
[Figure: voice activity over time for both channels, with the time of reaction marked.]
Interruption
[Figure: voice activity over time for both channels, with the time of interruption marked.]
Hesitation
[Figure: voice activity over time for one channel, with a hesitation marked.]
VAD example
[Figure: voice activity over time for the client and the agent, with hesitations and interruptions marked.]
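The 24 dialogue parameters can be sketched from two lists of voice activity segments. The exact definitions used by the authors are not given in the slides, so the ones below are assumptions: a reaction is the gap between the end of the other speaker's last segment and the start of a reply, an interruption is the overlap that starts when a speaker begins inside the other speaker's segment, and a hesitation is a pause between two segments of the same speaker during which the other speaker stays silent. The segment times are hypothetical.

```python
from statistics import mean


def overlaps(segment, start, end):
    """True if the (start, end) segment overlaps the open interval (start, end)."""
    return segment[0] < end and segment[1] > start


def reactions(responder, other):
    times = []
    for start, _ in responder:
        ended = [o_end for _, o_end in other if o_end <= start]
        if not ended:
            continue  # the other speaker has not finished anything yet
        last_end = max(ended)
        if any(last_end <= r_start < start for r_start, _ in responder):
            continue  # responder already spoke since then: not a turn change
        if any(o_start < start < o_end for o_start, o_end in other):
            continue  # other speaker is still talking: this is an interruption
        times.append(start - last_end)
    return times


def interruptions(interrupter, other):
    times = []
    for start, end in interrupter:
        for o_start, o_end in other:
            if o_start < start < o_end:
                times.append(min(end, o_end) - start)
    return times


def hesitations(speaker, other):
    times = []
    ordered = sorted(speaker)
    for (_, end), (next_start, _) in zip(ordered, ordered[1:]):
        if next_start > end and not any(overlaps(o, end, next_start) for o in other):
            times.append(next_start - end)
    return times


def summarize(values):
    if not values:
        return {"count": 0, "mean": 0.0, "max": 0.0, "min": 0.0}
    return {"count": len(values), "mean": mean(values), "max": max(values), "min": min(values)}


def dialogue_features(agent, client):
    """3 characteristics x 4 statistics x 2 channels = 24 parameters."""
    features = {}
    for name, own, other in (("agent", agent, client), ("client", client, agent)):
        for label, func in (("reaction", reactions),
                            ("interruption", interruptions),
                            ("hesitation", hesitations)):
            for stat, value in summarize(func(own, other)).items():
                features[f"{name}_{label}_{stat}"] = value
    return features


# Hypothetical VAD segments in seconds: (start, end).
agent_segments = [(0.0, 2.0), (2.5, 4.0), (7.0, 9.0)]
client_segments = [(4.5, 6.0), (8.5, 10.0)]

features = dialogue_features(agent_segments, client_segments)
print(len(features))                        # 24
print(features["agent_hesitation_mean"])    # 0.5
print(features["client_interruption_max"])  # 0.5
```

In this toy call the agent pauses once (0.5 s), the client replies after 0.5 s, the agent replies after 1.0 s, and the client interrupts the agent once with 0.5 s of overlap.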
Inter dialogue correlations
The aim is to find the significant correlations among the dialogue features.
[Figure: layout of the correlation matrix. Rows and columns: count, mean, max and min of turn taking, interruption and hesitation, for the agent and the client.]
Inter dialogue correlations
[Figure: correlation matrix of the dialogue features. Both axes: voice activity parameter index.]
Inter dialogue correlations
parameter | direction | parameter | direction | correlation
No. reactions | agent-client | No. reactions | client-agent | 0.99
No. interruptions | agent-client | No. reactions | client-agent | 0.80
Min. reaction | client-agent | Min. interruption | agent-client | 0.79
Mean reaction | client-agent | Mean interruption | agent-client | 0.62
No. reactions | client-agent | No. hesitations | agent-client | 0.6
Max. reaction | agent-client | Max. interruption | client-agent | 0.61
No. reactions | agent-client | No. hesitations | agent-client | 0.63
No. hesitations | agent-client | No. reactions | client-agent | 0.63
Max. interruption | client-agent | Max. reaction | agent-client | 0.65
Mean interruption | agent-client | Min. reaction | client-agent | 0.84
Min. reaction | client-agent | No. reactions | agent-client | -0.67
No. reactions | agent-client | Min. interruption | agent-client | -0.70
• Almost perfect correlation
• The more you talk, the more you interrupt
• The agents were impatient
• More questions, more hesitations
• More questions from the agent -> the client tends to finish the call quickly
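A correlation screening of this kind can be sketched as follows: compute the Pearson correlation for every pair of dialogue features across the records and keep the pairs whose absolute value exceeds a threshold. The 0.6 threshold and the toy feature table are assumptions, and a threshold on the coefficient is not a statistical significance test.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no correlation information
    return cov / sqrt(var_x * var_y)


def strong_pairs(table, threshold=0.6):
    """table maps a feature name to its values, one value per record."""
    result = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            result.append((a, b, round(r, 2)))
    return sorted(result, key=lambda item: -abs(item[2]))


# Hypothetical feature values for four records.
table = {
    "agent_reaction_count": [1, 2, 3, 4],
    "client_reaction_count": [2, 4, 6, 8],
    "noise": [1, -1, 1, -1],
}

print(strong_pairs(table))
# [('agent_reaction_count', 'client_reaction_count', 1.0)]
```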
Emotion-dialogue correlations
[Figure: correlations between the dialogue features (x-axis: voice activity parameter index) and four emotion measures (rows): activation (agent), evaluation (agent), activation (client), evaluation (client).]
The aim is to find the significant correlations between the dialogue features on the one hand and the activation and evaluation levels of emotion on the other.
Emotion-dialogue correlations
parameter | direction | parameter | direction | correlation
Mean of activation | client | Mean of interruption | agent-client | 0.62
Mean of valence | client | Max. of interruption | agent-client | 0.61
Mean of evaluation | client | No. interactions | agent-client | 0.58
Mean of activation | client | Min. of interruption | agent-client | 0.57
Conclusion: Some dialogue features can be successfully employed to predict the emotional state
Weighted cumulative sum (WAS)
[Figure: two panels showing voice activity and the WAS curve over time (ms).]
The following statistics are computed from WAS: mean, median, maximum, minimum, std, percentiles, regression coefficient…
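The listed statistics can be computed from a WAS curve as sketched below. The slides do not define how the WAS itself is built, so the curve is taken as a given list of values; the sample curve is hypothetical. The use of the population standard deviation, linearly interpolated percentiles, and the slope of a least-squares line over the sample index as the regression coefficient are all assumptions.

```python
from statistics import mean, median, pstdev


def percentile(values, q):
    """Percentile q (0 to 100) with linear interpolation between sorted values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (position - lower) * (ordered[upper] - ordered[lower])


def regression_slope(values):
    """Slope of the least-squares line fitted to values against their index."""
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    numerator = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    denominator = sum((i - mean_x) ** 2 for i in range(n))
    return numerator / denominator


def was_statistics(was):
    if len(was) < 2:
        raise ValueError("at least two samples are needed")
    return {
        "mean": mean(was),
        "median": median(was),
        "max": max(was),
        "min": min(was),
        "std": pstdev(was),
        "p10": percentile(was, 10),
        "p90": percentile(was, 90),
        "slope": regression_slope(was),
    }


# Hypothetical WAS curve.
stats = was_statistics([0.0, 0.25, 0.5, 0.75, 1.0])
print(stats["mean"], stats["median"], stats["slope"])  # 0.5 0.5 0.25
```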
Successful call?
Hypothesis: The dialogue features can be employed to identify the successful calls
A call is considered successful when, for example, a contract is concluded or a product is sold.
A small, balanced, labeled set of 30 telephone records was selected for this experiment.
[Diagram blocks: forward selection; features Client_mean_interupt, Client_max_hesitat, Agent_was_10_percent, Agent_client_was_corr, ..., Agent_min_reaction; SVM.]
Accuracy: 71%
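Greedy forward selection can be sketched as follows: start with no features and repeatedly add the one that most improves the accuracy, stopping when nothing helps. To keep the example runnable with the standard library only, a nearest-centroid classifier stands in for the SVM used in the experiment. The leave-one-out evaluation and the synthetic 30-record data set are also assumptions.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Stand-in classifier (not an SVM): pick the label with the closest class mean."""
    centroids = {}
    for label in set(train_y):
        rows = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(column) / len(rows) for column in zip(*rows)]

    def squared_distance(label):
        return sum((a - b) ** 2 for a, b in zip(sample, centroids[label]))

    return min(sorted(centroids), key=squared_distance)


def leave_one_out_accuracy(x, y, columns):
    hits = 0
    for i in range(len(x)):
        train_x = [[row[c] for c in columns] for j, row in enumerate(x) if j != i]
        train_y = [label for j, label in enumerate(y) if j != i]
        sample = [x[i][c] for c in columns]
        hits += nearest_centroid_predict(train_x, train_y, sample) == y[i]
    return hits / len(x)


def forward_selection(x, y, max_features):
    selected, best = [], 0.0
    remaining = list(range(len(x[0])))
    while remaining and len(selected) < max_features:
        scored = [(leave_one_out_accuracy(x, y, selected + [c]), c) for c in remaining]
        score, column = max(scored, key=lambda item: item[0])
        if score <= best:
            break  # no remaining feature improves the accuracy
        selected.append(column)
        remaining.remove(column)
        best = score
    return selected, best


# Synthetic, balanced set of 30 records: column 0 separates the two classes,
# columns 1 and 2 are unrelated to the label.
labels = [i % 2 for i in range(30)]
records = [[label + 0.01 * (i % 5), (i * 7) % 11 / 10.0, (i % 3) / 2.0]
           for i, label in enumerate(labels)]

selected, accuracy = forward_selection(records, labels, max_features=3)
print(selected, accuracy)  # [0] 1.0
```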
Third layer: interpretation
Open framework for final evaluation, searching and reporting. Based on roles. Can be trained using the basic traits.
Thank you for your attention