

AUTOMATIC SPEECH CLASSIFICATION TO FIVE EMOTIONAL STATES BASED ON GENDER INFORMATION

ABSTRACT

We report on the statistics of global prosodic features of certain emotional speech styles for each gender separately.

The components of the investigation:

•500 emotionally expressed speech segments (2 male, 2 female actors)

•5 basic emotions, i.e., Anger, Happy, Neutral, Sad and Surprise

•a total of 87 global statistics of energy, pitch and formants

•Bayes classifier where class pdfs are approximated via Parzen windows or modeled as Gaussians

Dimitrios Ververidis and Constantine Kotropoulos

ARISTOTLE UNIVERSITY OF THESSALONIKI, DEPARTMENT OF INFORMATICS

Box 451, Thessaloniki 540 06, GREECE, e-mail: {jimver, costas}@zeus.csd.auth.gr URL: http://poseidon.csd.auth.gr

INTRODUCTION

Methods that can classify the emotions in speech would be highly useful in computer science, linguistic sciences, psychology, and medical sciences. For example:

1. Optimization of automatic speech recognition.

2. Parkinson's disease.

3. Improving the quality of an interface by detecting the frustration and dissatisfaction of a user.

DATA

•Public domain Danish Emotional Speech database (DES), obtained on request from Inger Samsø Engberg at the Institute of Electronic Systems, Aalborg University, Denmark.

•The data used in the experiments are 500 sentences and words, each located between two silent segments, split equally between the two genders.

•Four professional actors, two male and two female, speak in five emotional states: anger, happiness, neutral, sadness, and surprise.

FEATURE EXTRACTION

Global statistical feature estimation: 87 statistics of the pitch, energy, and formant contours are extracted.

•The statistics are calculated on the rising slopes, falling slopes, and maximum/minimum plateaux of the contours.

•Typical examples are the maximum, the minimum, the median, the mean, and the interquartile range.

EVALUATION OF SINGLE FEATURES ON EACH GENDER SEPARATELY

In order to study the classification ability of each feature, a rating method has been implemented. Each feature is evaluated by the ratio between the between-class variance ($\sigma_b^2$) and the within-class variance ($\sigma_w^2$):

$$S_w = \sum_{i=1}^{L} P(\omega_i)\, E\{(X - M_i)(X - M_i)^T \mid \omega_i\}$$

$$S_b = \sum_{i=1}^{L} P(\omega_i)\,(M_i - M_0)(M_i - M_0)^T$$

$L$: number of classes

$P(\omega_i)$: a priori probability of class $\omega_i$

$X = [x_1\ x_2\ \dots\ x_n]$: random vector

$M_i$: expected vector of $\omega_i$

$M_0$: expected vector of the mixture density
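For a single scalar feature the two scatter matrices reduce to scalars, so the rating is simply a ratio of two variances. The sketch below illustrates this under two assumptions that the poster does not state: the priors are estimated from class frequencies, and the within-class expectation is replaced by the sample average. The function name and the toy values are hypothetical.

```python
def variance_ratio(samples_by_class):
    """Between-class over within-class variance for one scalar feature."""
    total = sum(len(v) for v in samples_by_class.values())
    means = {c: sum(v) / len(v) for c, v in samples_by_class.items()}
    # Assumption: priors P(w_i) estimated as relative class frequencies.
    priors = {c: len(v) / total for c, v in samples_by_class.items()}
    m0 = sum(priors[c] * means[c] for c in means)  # mixture mean M_0
    # Scalar S_w: prior-weighted average of the per-class variances.
    s_w = sum(priors[c] * sum((x - means[c]) ** 2 for x in v) / len(v)
              for c, v in samples_by_class.items())
    # Scalar S_b: prior-weighted squared distance of class means from M_0.
    s_b = sum(priors[c] * (means[c] - m0) ** 2 for c in means)
    return s_b / s_w

# Toy values of one feature for two classes (not taken from DES).
feature = {"neutral": [1.0, 2.0, 3.0], "anger": [5.0, 6.0, 7.0]}
ratio = variance_ratio(feature)
```

A larger ratio means the class means are far apart relative to the spread inside each class, which is why the ratio can serve as a rating of a feature's classification ability.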

Pitch features:

20. Mean range

50. Mean value of falling slopes

43. Mean value of rising slopes

18. Maximum value of pitch

22. Interquartile range of pitch

26. Mean value of plateaux at minima

Energy features:

54. Maximum value

78. Mean value of rising slopes

85. Mean value of falling slopes

86. Median value of falling slopes

79. Median value of rising slopes
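As a rough illustration of how such global statistics could be computed from a contour, the sketch below takes a list of per-frame values and returns a few of the statistics named above. It is only a sketch: the poster does not define how slopes are measured, so treating a slope as the first difference between consecutive frames is an assumption, and the function name and sample contour are hypothetical.

```python
import statistics

def global_stats(contour):
    """A few global statistics of a contour given as per-frame values."""
    q1, _, q3 = statistics.quantiles(contour, n=4)
    # Assumption: a "slope" is the difference between consecutive frames.
    diffs = [b - a for a, b in zip(contour, contour[1:])]
    rising = [d for d in diffs if d > 0]
    falling = [d for d in diffs if d < 0]
    return {
        "max": max(contour),
        "min": min(contour),
        "mean": statistics.mean(contour),
        "median": statistics.median(contour),
        "iqr": q3 - q1,
        "mean_rising_slope": statistics.mean(rising) if rising else 0.0,
        "mean_falling_slope": statistics.mean(falling) if falling else 0.0,
    }

# Hypothetical pitch contour in Hz (not taken from DES).
pitch = [100.0, 110.0, 130.0, 120.0, 90.0, 95.0]
stats = global_stats(pitch)
```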


Bayes classifier with Gaussian pdfs

                              Both     Males    Females
Correct classification rate   51.6%    61.1%    57.1%

                              Human perception   Random
Correct classification rate   67%                20%

• The Sequential Forward Selection (SFS) algorithm is used for automatic feature selection.

• The criterion employed is the correct classification rate achieved by the selected features.

The correct classification rate is estimated by cross-validation, with 90% of the data used for training and 10% for validation.
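The selection procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation. Several choices are assumptions: the Bayes classifier uses independent 1-D Gaussians per feature instead of a full covariance model, the 90%/10% split is realized as 10 interleaved folds, and SFS stops as soon as no candidate improves the criterion. All function names and the toy data are hypothetical.

```python
import math

def sfs(n_features, score, max_steps=10):
    """Sequential Forward Selection: greedily add the feature that most
    improves score(subset); stop when no candidate improves it."""
    selected, best = [], float("-inf")
    while len(selected) < max_steps:
        candidates = [f for f in range(n_features) if f not in selected]
        if not candidates:
            break
        top_score, top_f = max((score(selected + [f]), f) for f in candidates)
        if top_score <= best:
            break
        selected.append(top_f)
        best = top_score
    return selected, best

def gaussian_bayes_predict(train_X, train_y, x, feats):
    """Bayes rule with independent 1-D Gaussians per feature (a simplification)."""
    best_c, best_lp = None, float("-inf")
    for c in set(train_y):
        rows = [r for r, label in zip(train_X, train_y) if label == c]
        lp = math.log(len(rows) / len(train_X))  # log prior
        for f in feats:
            vals = [r[f] for r in rows]
            mu = sum(vals) / len(vals)
            var = sum((v - mu) ** 2 for v in vals) / len(vals) + 1e-6
            lp += -0.5 * math.log(2 * math.pi * var) - (x[f] - mu) ** 2 / (2 * var)
        if lp > best_lp:
            best_c, best_lp = c, lp
    return best_c

def cv_rate(X, y, feats, folds=10):
    """Correct classification rate with each fold (10% of data) held out in turn."""
    correct = 0
    for k in range(folds):
        test_idx = set(range(k, len(X), folds))
        tr_X = [r for i, r in enumerate(X) if i not in test_idx]
        tr_y = [c for i, c in enumerate(y) if i not in test_idx]
        correct += sum(gaussian_bayes_predict(tr_X, tr_y, X[i], feats) == y[i]
                       for i in test_idx)
    return correct / len(X)

# Toy data: feature 0 separates the classes, feature 1 is constant (uninformative).
y = ["anger" if i % 2 else "neutral" for i in range(40)]
X = [[(5.0 if c == "anger" else 0.0) + ((i * 7) % 5) / 10, 1.0]
     for i, c in enumerate(y)]
selected, rate = sfs(2, lambda feats: cv_rate(X, y, feats))
```

Because the criterion is the cross-validated rate itself, the selected subset depends on the classifier used, which is consistent with different features being chosen for different classifiers and genders.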

Visualization in 2D

CONFUSION MATRICES

Bayes classifier, both genders

Stimuli      Response (%)
             Neutral   Surprise   Happiness   Sadness   Anger
Neutral      51        15         2           28        4
Surprise     5         64         7           9         14
Happiness    9         24         36          13        18
Sadness      17        6          2           70        5
Anger        12        19         26          12        31

Correct classification rates by humans at 67%

Stimuli      Response (%)
             Neutral   Surprise   Happiness   Sadness   Anger
Neutral      60.8      2.6        0.1         31.7      4.8
Surprise     10        59.1       28.7        1         1.3
Happiness    8.3       29.8       56.4        1.7       3.8
Sadness      12.6      1.8        0.1         85.2      0.3
Anger        10.2      8.5        4.5         1.7       75.1

Confusion matrices

•Black color: Humans reach a correct classification score of 67%.

•Green color: Bayes classifier with Gaussian pdfs using features 76, 18, 44, 27, 7. With cross-validation, a correct classification rate of 51.6% is obtained (54% when all data are used for both training and testing).

•Blue color: The same method applied to male patterns only. The method selects different features and achieves a correct classification score of 61.1%.

•Red color: The same method applied to female patterns only. A correct classification score of 57.1% is achieved.

AUTOMATIC FEATURE SELECTION

•Features are selected by SFS, with a Bayes classifier whose class pdfs are modeled as Gaussians serving as the criterion.

•PCA is used to reduce the dimensionality from five dimensions (5D) to two dimensions (2D).

•Only the samples that belong to the interquartile range of the pdf of each class are shown.

•The ellipses denote the 60% likelihood contours for a 2-D Gaussian model.
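One way to draw such an ellipse is sketched below. It assumes that a "60% likelihood contour" means the contour enclosing 60% of the probability mass of the 2-D Gaussian, which the poster does not spell out. Under that reading, the squared Mahalanobis radius of the contour is -2 ln(1 - 0.6), and the semi-axes follow from the eigenvalues of the 2x2 covariance matrix. The function name and the covariance values are hypothetical.

```python
import math

def ellipse_axes(cov, mass=0.60):
    """Semi-axis lengths of the contour of a 2-D Gaussian enclosing `mass`."""
    (a, b), (_, d) = cov
    # For a 2-D Gaussian, P(squared Mahalanobis distance <= r2) = 1 - exp(-r2 / 2).
    r2 = -2.0 * math.log(1.0 - mass)
    # Closed-form eigenvalues of the symmetric 2x2 covariance matrix.
    mid, half = (a + d) / 2.0, math.hypot((a - d) / 2.0, b)
    lam1, lam2 = mid + half, mid - half
    return math.sqrt(r2 * lam1), math.sqrt(r2 * lam2)

# Hypothetical covariance of one class after projection to 2D.
major, minor = ellipse_axes([[4.0, 0.0], [0.0, 1.0]])
```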

Bayes classifier, male subjects

Stimuli      Response (%)
             Neutral   Surprise   Happiness   Sadness   Anger
Neutral      67        3          8           20        1
Surprise     3         60         18          6         13
Happiness    18        13         43          6         21
Sadness      11        3          2           80        3
Anger        6         19         13          6         56

Bayes classifier, female subjects

Stimuli      Response (%)
             Neutral   Surprise   Happiness   Sadness   Anger
Neutral      55        13         6           20        6
Surprise     12        61         14          6         7
Happiness    12        11         54          4         18
Sadness      13        4          4           58        21
Anger        6         10         18          9         57

CONCLUSIONS

•The rates reported can be further improved by analyzing the properties of two-class problems.

•The features that separate two classes may differ from those that separate five classes.

•By designing proper decision fusion algorithms, we may combine several two-class classifiers so that the overall system outperforms the rates obtained by the five-class classifiers.

•If the words in the training set were linguistically different from those in the test set, the classification would be linguistically unbiased.

•Error estimation can also take into account the off-diagonal elements of the confusion matrix, so that the Bayes classifier matches both the human classification and misclassification rates.
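The idea of combining two-class classifiers can be illustrated with one of the simplest fusion rules, a majority vote over all class pairs. The poster does not say which fusion algorithm is intended, so the voting rule here is an assumption, and the pairwise classifiers are hypothetical stand-ins that threshold a single scalar feature.

```python
from collections import Counter
from itertools import combinations

def fuse_pairwise(x, classes, pairwise):
    """Majority vote over all two-class classifiers.
    pairwise[(a, b)](x) must return either a or b."""
    votes = Counter(pairwise[pair](x) for pair in combinations(classes, 2))
    return votes.most_common(1)[0][0]

# Hypothetical stand-in classifiers: pick the class whose center is nearer.
centers = {"sadness": 1.0, "neutral": 2.0, "anger": 4.0}
classes = sorted(centers)
pairwise = {
    (a, b): (lambda x, a=a, b=b:
             a if abs(x - centers[a]) <= abs(x - centers[b]) else b)
    for a, b in combinations(classes, 2)
}
decision = fuse_pairwise(3.6, classes, pairwise)
```

In a real system each pair could use its own feature subset, in line with the observation that the features separating two classes may differ from those separating five.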

Classification rates of a Bayes classifier for male subjects at 61.1%

Classification rates of a Bayes classifier for female subjects at 57.1%

Classification rates of a Bayes classifier for both genders at 51.6%

Classifier                                 Step
                                           1    2    3    4    5    6    7    8    9    10
Bayes with Gaussian class pdfs (male)      54   43   74   81   21   78   8    69   18   -
Bayes with Parzen windows (male)           54   74   20   67   86   58   17   30   -    -
Bayes with Gaussian class pdfs (female)    43   78   20   25   10   77   6    17   82   45
Bayes with Parzen windows (female)         43   80   18   39   86   -    -    -    -    -
Bayes with Gaussian class pdfs             78   18   45   26   7    -    -    -    -    -
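The two ways of modeling the class pdfs can be contrasted in one dimension. The sketch below is illustrative only: it uses a single scalar feature, a Gaussian kernel with a hand-picked width `h` for the Parzen window, and priors estimated from class frequencies. None of these choices are stated on the poster, and the names and toy values are hypothetical.

```python
import math

def gaussian_pdf(x, samples):
    """Class pdf modeled as a single Gaussian fitted to the training samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def parzen_pdf(x, samples, h=1.0):
    """Class pdf approximated by a Parzen window with a Gaussian kernel of width h."""
    norm = len(samples) * h * math.sqrt(2 * math.pi)
    return sum(math.exp(-((x - s) / h) ** 2 / 2) for s in samples) / norm

def bayes_classify(x, train, pdf):
    """Pick the class that maximizes prior times class-conditional pdf."""
    total = sum(len(v) for v in train.values())
    return max(train, key=lambda c: len(train[c]) / total * pdf(x, train[c]))

# Toy values of one feature for two emotions (not taken from DES).
train = {"sadness": [1.0, 1.2, 0.8, 1.1], "anger": [3.0, 3.4, 2.9, 3.2]}
label_gauss = bayes_classify(1.05, train, gaussian_pdf)
label_parzen = bayes_classify(3.1, train, parzen_pdf)
```

The Gaussian model summarizes each class by a mean and a variance, while the Parzen estimate keeps every training sample and can follow multimodal class distributions at the cost of choosing a kernel width.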

•77% correct classification score for Surprise vs. Happy using features 78, 43, 22, 26, 85.

•Much classification information is lost because PCA is a lossy transformation.

Correct classification scores (the diagonal elements) of the four confusion matrices: human classification, and the Bayes classifier for males only, females only, and both genders.
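The link between a confusion matrix and its overall correct classification score can be checked with a few lines. The sketch assumes that every emotion has the same number of stimuli, so that the overall score is the plain average of the diagonal. The values are the human confusion matrix reported in this poster, and the function name is hypothetical.

```python
def correct_rate(matrix):
    """Mean of the diagonal of a row-normalized confusion matrix (in %)."""
    return sum(matrix[i][i] for i in range(len(matrix))) / len(matrix)

# Human confusion matrix; rows and columns ordered as
# Neutral, Surprise, Happiness, Sadness, Anger.
human = [
    [60.8, 2.6, 0.1, 31.7, 4.8],
    [10.0, 59.1, 28.7, 1.0, 1.3],
    [8.3, 29.8, 56.4, 1.7, 3.8],
    [12.6, 1.8, 0.1, 85.2, 0.3],
    [10.2, 8.5, 4.5, 1.7, 75.1],
]
rate = correct_rate(human)
```

The result, about 67.3%, agrees with the 67% human score quoted above.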