
Post on 19-Jan-2016


Page 1: Predicting Voice Elicited Emotions Nishant Pandey

Predicting Voice Elicited Emotions

Nishant Pandey


Synopsis

• Problem statement and motivation
• Previous work and background
• System
• Intuition and overview
• Pre-processing of audio signals
• Building feature space
• Finding patterns in unlabelled data and labelling of samples
• Regression results
• Deployed system
• Market research


Motivation
• Automate the screening process in service-based industries
• Hourly job workers (two-thirds of the U.S. labour force, or ~50 million job seekers every year)

Problem Statement
• To analyse voice and predict the listener emotions elicited by the paralinguistic elements of the voice.


Previous work

Current work focuses on predicting the emotions elicited by voice clips.

Two sets of goals, which include recognizing:
• the type of personality traits intrinsically possessed by the speaker, e.g. speaker trait and speaker state
• the types of emotions carried within the speech clip, e.g. acoustic affect (cheerful, trustworthy, deceitful, etc.)


Background – Emotion Taxonomy

The framework articulated by “FEELTRACE”
• Includes all the emotion responses we want to predict.
• Represents emotions by finite quantifiable dimensions.


Features - Paralinguistic features of Voices

Concept | Definition | Data representation
Amplitude | Measurement of the variations over time of the acoustic signal | Quantified values of a sound wave’s oscillation
Energy | Acoustic signal energy | Representation in decibels: 20*log10(abs(FFT))
Formants | The resonance frequencies of the vocal tract | Maxima detected using Linear Prediction on audio windows with high tonal content
Perceived pitch | Perceived fundamental frequency and harmonics | Formants
Fundamental frequency | The reciprocal of the time duration of one glottal cycle - a strict definition of “pitch” | First formant
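The decibel representation of energy in the table, 20*log10(abs(FFT)), can be sketched as follows. This is a minimal illustration, not the system's code: the naive DFT, the `floor` guard against log10(0), and the 1 kHz test tone are all assumptions made for the example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform, O(n^2); adequate for a short demo frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def energy_db(frame, floor=1e-12):
    """Per-bin energy in decibels, 20 * log10(|FFT|), for bins 0..n/2.

    `floor` is a hypothetical guard so that empty bins do not hit log10(0).
    """
    spectrum = dft(frame)[: len(frame) // 2 + 1]
    return [20 * math.log10(max(abs(c), floor)) for c in spectrum]


# Hypothetical test signal: a 1 kHz sine sampled at 8 kHz, 64 samples long.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(64)]

db = energy_db(frame)
peak_bin = max(range(len(db)), key=db.__getitem__)
peak_hz = peak_bin * sample_rate / len(frame)
print(f"strongest bin: {peak_bin} ({peak_hz:.0f} Hz), {db[peak_bin]:.1f} dB")
```

In practice a fast FFT routine would replace `dft`; the conversion to decibels stays the same.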


System – Intuition

Spectrogram of two job applicants responding to “Greet me as if I am a customer”


System – Overview


System – Pre-Processing of Audio Signals

Pre-processing tasks involve:
• Removing voice clips that are shorter than 2 seconds and clips containing noise
• Converting the audio signal to data in the time and frequency domains
• Short-term Fast Fourier Transform per frame
• Energy measures in the frequency domain per frame
• Linear prediction coefficients in the frequency domain per frame
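The first steps of this list (dropping short clips, framing, and a per-frame Fourier transform) might look like the sketch below. The frame length, hop size, Hann window, and the low sample rate are assumptions chosen to keep the example small; noise detection and linear prediction are not covered here.

```python
import cmath
import math

MIN_SECONDS = 2.0  # clips shorter than this are dropped, as stated on the slide


def long_enough(samples, sample_rate):
    """True when the clip lasts at least MIN_SECONDS."""
    return len(samples) / sample_rate >= MIN_SECONDS


def frames(samples, frame_len, hop):
    """Split a signal into overlapping frames; an incomplete tail is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def hann(n):
    """Hann window of length n (an assumed choice of window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Windowed naive DFT magnitudes for bins 0..n/2."""
    n = len(frame)
    x = [s * w for s, w in zip(frame, hann(n))]
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


# Hypothetical data: a 2 s, 50 Hz tone at a 400 Hz sample rate, plus a 1 s cut of it.
sample_rate = 400
clip = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(2 * sample_rate)]
short_clip = clip[:sample_rate]

kept = [c for c in (clip, short_clip) if long_enough(c, sample_rate)]
stft = [magnitude_spectrum(f) for f in frames(kept[0], frame_len=32, hop=16)]
print(len(kept), "clip kept;", len(stft), "frames of", len(stft[0]), "bins")
```

Each row of `stft` is one frame's spectrum, which is the form the per-frame energy measures operate on.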


System - Feature Space Construction

We experimented with feature construction based on the following dimensions and combinations:
• Signal measurements such as energy and amplitude
• Statistics such as min, max, mean, and standard deviation on signal measurements
• Measurement window in the time domain: different window sizes and the entire time window
• Measurement window in the frequency domain: all frequencies, optimal audible frequencies, and selected frequency ranges
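One way to combine these dimensions is to compute a measurement per frame within a frequency band and then summarise it with the listed statistics. The sketch below assumes a magnitude spectrogram as input; the band names, band limits, and the tiny spectrogram are hypothetical.

```python
import statistics


def summarize(values):
    """The four statistics named on the slide (population standard deviation)."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),
    }


def band_energy(spectrogram, bin_hz, lo_hz, hi_hz):
    """Per-frame sum of squared magnitudes over bins inside [lo_hz, hi_hz]."""
    return [
        sum(m * m for k, m in enumerate(frame) if lo_hz <= k * bin_hz <= hi_hz)
        for frame in spectrogram
    ]


def build_features(spectrogram, bin_hz, bands):
    """Flat feature dict: one entry per (frequency band, statistic) pair."""
    features = {}
    for name, (lo_hz, hi_hz) in bands.items():
        energy = band_energy(spectrogram, bin_hz, lo_hz, hi_hz)
        for stat, value in summarize(energy).items():
            features[f"{name}_{stat}"] = value
    return features


# Hypothetical 3-frame, 4-bin magnitude spectrogram with 100 Hz per bin.
spectrogram = [
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 2.0, 1.0, 0.0],
    [0.0, 4.0, 3.0, 0.0],
]
bands = {"all": (0, 300), "low": (0, 100)}  # hypothetical frequency ranges

features = build_features(spectrogram, bin_hz=100, bands=bands)
# A shorter time window is a slice of the frame list.
first_two_frames = build_features(spectrogram[:2], bin_hz=100, bands=bands)
print(features)
```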


System – Labels and Right Set of Features?
• Conventional approach – getting voice samples rated by experts
• Unsupervised learning – analyse features and their effectiveness

Process:
1. Unsupervised learning is used to find patterns in unlabelled data.
2. Training data sets are then constructed based on clustering results and manual labelling.


System – How do we get the labels? Contd.

Parameters
• Cost function:
  • Connectivity
  • Dunn Index
  • Silhouette

Clustering Results
• Technique: Hierarchical Clustering
• Number of clusters: 5
• Manual validation of clusters was also done
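Hierarchical clustering into five clusters, scored with the silhouette measure, can be sketched as below. Average linkage, Euclidean distance, and the ten toy points are assumptions; the slides do not state the linkage or distance used, and connectivity and the Dunn index are not shown.

```python
import math


def average_linkage(c1, c2, points):
    """Mean pairwise Euclidean distance between two clusters of point indices."""
    total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))


def agglomerative(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def silhouette(points, labels):
    """Mean silhouette width; assumes at least two clusters."""
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:  # singleton cluster: silhouette defined as 0
            scores.append(0.0)
            continue
        a = sum(same) / len(same)
        b = min(
            sum(math.dist(p, q) for j, q in enumerate(points) if labels[j] == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical 2-D feature vectors forming five tight pairs.
points = [(0, 0), (0.5, 0), (10, 0), (10.5, 0), (0, 10),
          (0.5, 10), (10, 10), (10.5, 10), (5, 20), (5.5, 20)]
labels = agglomerative(points, n_clusters=5)
print(labels, round(silhouette(points, labels), 3))
```

A silhouette close to 1 indicates compact, well-separated clusters, which is the kind of evidence that would be weighed alongside manual validation.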


System – Visualization of clusters


System – Modelling

Supervised learning algorithms
• Logistic Regression
• Support Vector Machine
• Random Forest

Semi-supervised learning algorithm
• KODAMA

Output:
• Binary outcome (positive or negative)
• Numerical scores
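To illustrate how one model can yield both output types, here is a minimal logistic regression sketch trained by plain gradient descent: the probability is the numerical score, and thresholding it gives the binary outcome. The two-feature toy data, learning rate, epoch count, and 0.5 threshold are assumptions for the example, not settings from the deployed system.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=1.0, epochs=2000):
    """Batch gradient descent on the mean logistic loss; returns (weights, bias)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def score(x, w, b):
    """Numerical score: estimated probability of a positive response."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def predict(x, w, b, threshold=0.5):
    """Binary outcome obtained by thresholding the score."""
    return "positive" if score(x, w, b) >= threshold else "negative"


# Hypothetical feature vectors (e.g. two scaled voice features) and labels.
X = [[0.0, 0.2], [0.3, 0.1], [0.2, 0.4], [0.8, 0.9], [0.9, 0.7], [1.0, 1.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = fit_logistic(X, y)
new_sample = [0.7, 0.8]  # hypothetical unseen clip
print(predict(new_sample, w, b), round(score(new_sample, w, b), 3))
```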


Case Study – Modelling

• Prediction – Positive vs Negative Response
• A positive response could be one or multiple perceptions of a “pleasant voice”, “makes me feel good”, “cares about me”, “makes me feel comfortable”, or “makes me feel engaged”.
• System: V1 uses SVM; V2 uses Random Forest
• Interview prompt: “Greet me as if I am a customer”


System - Prediction Results

• Accuracy: 0.86
• 95% CI: (0.76, 0.92)
• P-Value [Acc > NIR]: 5.76e-07
• Sensitivity: 0.81
• Specificity: 0.88
• Pos Pred Value: 0.81
• Neg Pred Value: 0.88
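For readers unfamiliar with these measures, the sketch below shows how each is derived from a 2x2 confusion matrix. The counts are hypothetical: the slides do not report the confusion matrix, and these numbers were invented only so that the rounded values land near the reported ones. NIR is taken here to mean the no-information rate, i.e. the share of the majority class.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Standard measures from true/false positive/negative counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),      # true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "pos_pred_value": tp / (tp + fp),   # precision
        "neg_pred_value": tn / (tn + fn),
        "no_information_rate": max(tp + fn, tn + fp) / total,
    }


# Hypothetical counts, NOT taken from the source.
metrics = confusion_metrics(tp=22, fn=5, fp=5, tn=38)
rounded = {name: round(value, 2) for name, value in metrics.items()}
print(rounded)
```

Sensitivity equals positive predictive value whenever the false negative and false positive counts are equal, which is consistent with the matching 0.81 values above.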


System - Prediction Results (KODAMA)
• KODAMA performs feature extraction from noisy and high-dimensional data.
• The output of KODAMA includes a dissimilarity matrix from which we can perform clustering and classification.


Deployed System


Market Research
• Demographics matter
• Young listeners (18-29 years old) and listeners with an income below $29,000/year have stricter criteria for what they perceive as engaging.
• No correlation between the emotion elicited and age, ethnicity, or education level.
• Bias towards female voices.


Thanks


Time and Frequency Domain

• Time Domain: https://en.wikipedia.org/wiki/Time_domain#/media/File:Fourier_transform_time_and_frequency_domains_(small).gif

• Frequency Domain: https://en.wikipedia.org/wiki/Frequency_domain#/media/File:Fourier_transform_time_and_frequency_domains_(small).gif


Learnings – Difference in Voice Characteristics

• Results improve by 10% when a decision tree built on features related to voice characteristics is layered on top of the Random Forest.


Prediction Results – SVM vs Random Forest