AUTOMATIC SPEECH CLASSIFICATION TO FIVE EMOTIONAL STATES BASED ON GENDER INFORMATION
ABSTRACT
We report on the statistics of global prosodic features of certain emotional speech styles, for each gender separately.
The components of the investigation:
•500 emotionally expressed speech segments (2 male and 2 female actors)
•5 basic emotions: Anger, Happiness, Neutral, Sadness, and Surprise
•a total of 87 global statistics of energy, pitch, and formant contours
•a Bayes classifier whose class pdfs are either approximated via Parzen windows or modeled as Gaussians
Dimitrios Ververidis and Constantine Kotropoulos
ARISTOTLE UNIVERSITY OF THESSALONIKI, DEPARTMENT OF INFORMATICS
Box 451, Thessaloniki 540 06, GREECE, e-mail: {jimver, costas}@zeus.csd.auth.gr URL: http://poseidon.csd.auth.gr
INTRODUCTION
Methods that classify emotions in speech would be highly useful in computer science, linguistics, psychology, and medicine. For example:
1. Optimization of automatic speech recognition.
2. Medical applications, e.g., Parkinson's disease.
3. Improving the quality of an interface by detecting the frustration and dissatisfaction of a user.
Data
•Public-domain Danish Emotional Speech database (DES), obtained upon request from Inger Samsø Engberg, Institute of Electronic Systems, Aalborg University, Denmark.
•The experiments use 500 speech segments (words and sentences delimited by silences), split equally between the two genders.
•Four professional actors, two male and two female, speak in 5 emotional states: anger, happiness, neutral, sadness, and surprise.
FEATURE EXTRACTION
Global statistical feature estimation: 87 statistics of the pitch, energy, and formant contours are extracted.
•The statistics are calculated on the contours and on their rising slopes, falling slopes, and plateaux at maxima and minima.
•Typical examples are the maximum, the minimum, the median, the mean, and the interquartile range.
EVALUATION OF SINGLE FEATURES ON EACH GENDER SEPARATELY
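To study the classification ability of each feature, a rating method was implemented. Each feature is evaluated by the ratio of the between-class variance σ_b² to the within-class variance σ_w². These variances are the single-feature cases of the within-class and between-class scatter matrices:

$$S_w = \sum_{i=1}^{L} P(\omega_i)\, E\{(X - M_i)(X - M_i)^T \mid \omega_i\}$$

$$S_b = \sum_{i=1}^{L} P(\omega_i)\, (M_i - M_0)(M_i - M_0)^T$$

where
L: number of classes
P(ω_i): a priori probability of class ω_i
X = [x_1 x_2 ... x_n]^T: random (feature) vector
M_i: expected vector of class ω_i
M_0: expected vector of the mixture density

For a single feature, S_w and S_b reduce to the scalars σ_w² and σ_b², and the feature is rated by J = σ_b² / σ_w².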
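The rating above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation. It assumes that the priors P(ω_i) are estimated by class frequencies and the expectations by (biased) sample means. The function names `scatter_matrices` and `feature_ratios` are hypothetical. Each feature's ratio is the corresponding diagonal entry of S_b divided by that of S_w.

```python
import numpy as np


def scatter_matrices(X, y):
    """Return (S_w, S_b) for data X (n_samples x n_features) and labels y.

    Assumption: P(omega_i) = class frequency, expectations = sample means (1/N).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    M0 = X.mean(axis=0)  # expected vector of the mixture density
    S_w = np.zeros((d, d))
    S_b = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        P = len(Xc) / n                    # a priori probability P(omega_i)
        Mi = Xc.mean(axis=0)               # expected vector of class omega_i
        D = Xc - Mi
        S_w += P * (D.T @ D) / len(Xc)     # P_i * E{(X - M_i)(X - M_i)^T | omega_i}
        diff = (Mi - M0)[:, None]
        S_b += P * (diff @ diff.T)         # P_i * (M_i - M_0)(M_i - M_0)^T
    return S_w, S_b


def feature_ratios(X, y):
    """Per-feature rating sigma_b^2 / sigma_w^2 (diagonals of S_b and S_w)."""
    S_w, S_b = scatter_matrices(X, y)
    return np.diag(S_b) / np.diag(S_w)
```

With class-frequency priors, M_0 equals the overall sample mean, which is why `X.mean(axis=0)` is used. A feature that separates the class means and has small spread within each class gets a large ratio.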
Pitch features:
Feature 20: Mean range
Feature 50: Mean value of falling slopes
Feature 43: Mean value of rising slopes
Feature 18: Maximum value of pitch
Feature 22: Interquartile range of pitch
Feature 26: Mean value of plateaux at minima
Energy features:
Feature 54: Maximum value
Feature 78: Mean value of rising slopes
Feature 85: Mean value of falling slopes
Feature 86: Median value of falling slopes
Feature 79: Median value of rising slopes
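Several of the features listed above are statistics of a contour and of its slopes. The poster does not define how slopes are segmented, so the sketch below makes an assumption. It treats every positive frame-to-frame difference as a rising slope and every negative difference as a falling slope. It also assumes the contour contains only voiced frames, so pitch values of 0 have already been removed. Plateaux at maxima and minima are not covered. The helper name `contour_statistics` is hypothetical.

```python
import numpy as np


def contour_statistics(contour):
    """Global statistics of a pitch or energy contour (hypothetical helper).

    Assumption: rising/falling slopes are the positive/negative frame-to-frame
    differences; the poster's exact segmentation may differ.
    """
    c = np.asarray(contour, dtype=float)
    q1, q3 = np.percentile(c, [25, 75])
    d = np.diff(c)  # frame-to-frame differences
    stats = {
        "max": c.max(),
        "min": c.min(),
        "median": np.median(c),
        "mean": c.mean(),
        "iqr": q3 - q1,  # interquartile range
    }
    # Slope statistics are NaN when the contour has no rising/falling frames.
    for name, part in (("rising", d[d > 0]), ("falling", d[d < 0])):
        stats[f"mean_{name}_slope"] = part.mean() if part.size else np.nan
        stats[f"median_{name}_slope"] = np.median(part) if part.size else np.nan
    return stats
```

For example, `contour_statistics([100, 110, 130, 120, 100])` returns a maximum of 130, an interquartile range of 20, and a mean rising slope of 15.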
Bayes classifier with Gaussian pdfs, correct classification rate:
Both genders   Males   Females
51.6%          61.1%   57.1%

Reference correct classification rates:
Human perception   Random (chance)
67%                20%
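The Bayes classifier with Gaussian class pdfs can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code. It fits one multivariate Gaussian per class and estimates the priors by class frequencies. It adds a small ridge term (1e-6, an assumption) so that each covariance stays invertible. Each sample is assigned to the class with the largest log p(x | ω_i) + log P(ω_i). The class name `GaussianBayes` is hypothetical.

```python
import numpy as np


class GaussianBayes:
    """Bayes classifier with one multivariate Gaussian pdf per class (sketch)."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        d = X.shape[1]
        self.classes_ = np.unique(y)
        self.params_ = []
        for c in self.classes_:
            Xc = X[y == c]
            mean = Xc.mean(axis=0)
            # Ridge term (assumption) keeps the covariance matrix invertible.
            cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + 1e-6 * np.eye(d)
            prior = len(Xc) / len(X)  # P(omega_i) from class frequencies
            self.params_.append((mean, cov, prior))
        return self

    def log_posteriors(self, X):
        """Unnormalized log P(omega_i | x) = log p(x | omega_i) + log P(omega_i)."""
        X = np.asarray(X, dtype=float)
        d = X.shape[1]
        scores = []
        for mean, cov, prior in self.params_:
            diff = X - mean
            maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
            _, logdet = np.linalg.slogdet(cov)
            log_pdf = -0.5 * (maha + logdet + d * np.log(2 * np.pi))
            scores.append(log_pdf + np.log(prior))
        return np.stack(scores, axis=1)

    def predict(self, X):
        return self.classes_[np.argmax(self.log_posteriors(X), axis=1)]
```

The correct classification rate on a labeled set is then `np.mean(clf.predict(X) == y)`.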
• The Sequential Forward Selection (SFS) algorithm is used for automatic feature selection.
• The criterion employed is the correct classification rate achieved by the selected features.
The correct classification rate is estimated by cross-validation, using 90% of the data for training and 10% for validation.
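The selection procedure above can be sketched as a greedy loop. Each step adds the feature whose inclusion gives the highest cross-validated correct classification rate. Several details are assumptions. The sketch uses 10 folds, so each round trains on 90% of the data and validates on 10%. It uses a fixed shuffle, so every candidate subset sees the same splits. It stops when the rate no longer improves. To keep the block short, a nearest-class-mean classifier stands in for the Bayes classifier. Any function with the same `(X_train, y_train, X_test)` signature can be passed instead, such as a Bayes classifier with Gaussian class pdfs. All names are hypothetical.

```python
import numpy as np


def nearest_mean_classify(X_train, y_train, X_test):
    """Stand-in classifier (assumption): assign each sample to the nearest class mean."""
    classes = np.unique(y_train)
    means = np.stack([X_train[y_train == c].mean(axis=0) for c in classes])
    dist = ((X_test[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[np.argmin(dist, axis=1)]


def cv_rate(X, y, classify, folds=10, seed=0):
    """Correct classification rate by k-fold cross-validation (90%/10% for folds=10)."""
    idx = np.random.default_rng(seed).permutation(len(y))
    correct = 0
    for val in np.array_split(idx, folds):
        train = np.setdiff1d(idx, val)
        correct += np.sum(classify(X[train], y[train], X[val]) == y[val])
    return correct / len(y)


def sfs(X, y, max_features, classify=nearest_mean_classify):
    """Sequential Forward Selection with the cross-validated rate as criterion."""
    selected, remaining, rates = [], list(range(X.shape[1])), []
    while remaining and len(selected) < max_features:
        # Try each remaining feature together with those already selected.
        rate, best = max((cv_rate(X[:, selected + [f]], y, classify), f) for f in remaining)
        if rates and rate <= rates[-1]:
            break  # assumption: stop once the rate no longer improves
        selected.append(best)
        remaining.remove(best)
        rates.append(rate)
    return selected, rates
```

Because the criterion is a classification rate rather than a class-separability measure, SFS can pick different features for different classifiers. This matches the per-classifier rows of the SFS table.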
[Figure: Visualization in 2D]
CONFUSION MATRICES
Stimulus \ Response (%)   Neutral   Surprise   Happiness   Sadness   Anger
Neutral                   51        15         2           28        4
Surprise                  5         64         7           9         14
Happiness                 9         24         36          13        18
Sadness                   17        6          2           70        5
Anger                     12        19         26          12        31
Stimulus \ Response (%)   Neutral   Surprise   Happiness   Sadness   Anger
Neutral                   60.8      2.6        0.1         31.7      4.8
Surprise                  10        59.1       28.7        1         1.3
Happiness                 8.3       29.8       56.4        1.7       3.8
Sadness                   12.6      1.8        0.1         85.2      0.3
Anger                     10.2      8.5        4.5         1.7       75.1
Correct classification rates by humans at 67%
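The diagonal of a row-normalized confusion matrix gives the per-emotion correct classification rates. The sketch below computes these rates and their mean, which is the overall rate when the emotions are equally represented. It also builds such a matrix from label sequences. The human matrix is copied from the table above. The function names are hypothetical.

```python
import numpy as np

EMOTIONS = ["Neutral", "Surprise", "Happiness", "Sadness", "Anger"]

# Human confusion matrix from the table above: rows = stimuli, columns = responses (%).
HUMAN = np.array([
    [60.8, 2.6, 0.1, 31.7, 4.8],
    [10.0, 59.1, 28.7, 1.0, 1.3],
    [8.3, 29.8, 56.4, 1.7, 3.8],
    [12.6, 1.8, 0.1, 85.2, 0.3],
    [10.2, 8.5, 4.5, 1.7, 75.1],
])


def confusion_percent(y_true, y_pred, n_classes):
    """Row-normalized confusion matrix in percent (rows = stimuli, columns = responses)."""
    counts = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        counts[t, p] += 1
    return 100 * counts / counts.sum(axis=1, keepdims=True)


def per_class_and_mean_rate(conf):
    """Diagonal = per-class correct rates; their mean = overall rate for equal class sizes."""
    diag = np.diag(conf)
    return diag, diag.mean()
```

For the human matrix, the mean of the diagonal is 67.32, which is consistent with the reported 67%. When classes have unequal sizes, the overall rate instead weights each row by its class frequency.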
Confusion matrices
•Black: humans reach a correct classification rate of 67%.
•Green: Bayes classifier with Gaussian pdfs using features 76, 18, 44, 27, 7. With cross-validation, a correct classification rate of 51.6% is obtained (54% when all data are used for both training and testing).
•Blue: the same method applied to male speech only. It selects different features and achieves a correct classification rate of 61.1%.
•Red: the same method applied to female speech only. It achieves a correct classification rate of 57.1%.
AUTOMATIC FEATURE SELECTION
•Features are selected by SFS, using as criterion the correct classification rate of a Bayes classifier whose class pdfs are modeled as Gaussians.
•PCA is used to reduce the dimensionality from five dimensions (5D) to two dimensions (2D).
•Only the samples within the interquartile range of each class pdf are shown.
•The ellipses denote the 60% likelihood contours of a 2-D Gaussian model.
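The 2D visualization described above can be sketched in two steps. The first step projects the data onto its first two principal components, computed here by SVD of the centered data. The second step draws a Gaussian contour for each class. The poster does not define "60% likelihood contour". This sketch assumes it means the ellipse enclosing 60% of the fitted 2-D Gaussian's probability mass. For a 2-D Gaussian, that ellipse has squared Mahalanobis radius r² = −2 ln(1 − 0.6). The function names are hypothetical.

```python
import numpy as np


def pca_2d(X):
    """Project X (n_samples x n_features) onto its first two principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def gaussian_ellipse(points, mass=0.6, n=100):
    """Points on the ellipse of a fitted 2-D Gaussian enclosing `mass` probability.

    Assumption: the contour encloses `mass` of the probability; for 2-D,
    its squared Mahalanobis radius is r^2 = -2 ln(1 - mass).
    """
    mean = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    r = np.sqrt(-2.0 * np.log(1.0 - mass))
    evals, evecs = np.linalg.eigh(cov)
    t = np.linspace(0.0, 2.0 * np.pi, n)
    circle = np.stack([np.cos(t), np.sin(t)])  # unit circle, shape (2, n)
    # Scale by r * sqrt(eigenvalues), rotate by eigenvectors, shift to the mean.
    return (mean[:, None] + evecs @ (r * np.sqrt(evals)[:, None] * circle)).T
```

In practice, `gaussian_ellipse` would be called once per emotion on that class's projected samples.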
Stimulus \ Response (%)   Neutral   Surprise   Happiness   Sadness   Anger
Neutral                   67        3          8           20        1
Surprise                  3         60         18          6         13
Happiness                 18        13         43          6         21
Sadness                   11        3          2           80        3
Anger                     6         19         13          6         56
Stimulus \ Response (%)   Neutral   Surprise   Happiness   Sadness   Anger
Neutral                   55        13         6           20        6
Surprise                  12        61         14          6         7
Happiness                 12        11         54          4         18
Sadness                   13        4          4           58        21
Anger                     6         10         18          9         57
CONCLUSIONS
•The reported rates could be further improved by analyzing the properties of two-class problems.
•The features that separate two classes may differ from those that separate all 5 classes.
•With properly designed decision fusion algorithms, several two-class classifiers could be combined, and the overall system could outperform the five-class classifiers.
•If the words in the training set were linguistically different from those in the test set, the classification would be linguistically unbiased.
•Error estimation could take the off-diagonal elements of the confusion matrix into account, so that the Bayes classifier matches the human classification and misclassification rates.
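One simple way to fuse two-class decisions, as suggested above, is majority voting over all pairs of emotions (one-vs-one). This is only one possible fusion rule, and the poster does not specify one. Each entry `pairwise[(a, b)]` is a hypothetical two-class classifier that returns `a` or `b`. Ties are broken by the order of `classes`, which is also an assumption.

```python
from collections import Counter
from itertools import combinations


def one_vs_one_vote(x, classes, pairwise):
    """Fuse two-class decisions for sample x by majority voting.

    `pairwise[(a, b)]` (hypothetical) returns a or b for x; ties go to the
    class listed first in `classes`.
    """
    votes = Counter(pairwise[(a, b)](x) for a, b in combinations(classes, 2))
    return max(classes, key=lambda c: (votes[c], -classes.index(c)))
```

With 5 emotions this combines 10 two-class classifiers. Each one may use its own feature subset, such as the Surprise vs. Happiness features reported on the poster.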
Classification rates of a Bayes classifier for male subjects at 61.1%
Classification rates of a Bayes classifier for female subjects at 57.1%
Classification rates of a Bayes classifier for both genders at 51.6%
Classifier \ Step                               1    2    3    4    5    6    7    8    9    10
Bayes with Gaussian class pdfs (male)           54   43   74   81   21   78   8    69   18   -
Bayes with Parzen windows (male)                54   74   20   67   86   58   17   30   -    -
Bayes with Gaussian class pdfs (female)         43   78   20   25   10   77   6    17   82   45
Bayes with Parzen windows (female)              43   80   18   39   86   -    -    -    -    -
Bayes with Gaussian class pdfs (both genders)   78   18   45   26   7    -    -    -    -    -
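The Parzen-window variant of the Bayes classifier replaces each Gaussian class pdf with a nonparametric kernel density estimate. The poster does not state the kernel or its width. This sketch assumes an isotropic Gaussian kernel with a hypothetical width `h` and priors estimated by class frequencies. It computes the density in the log domain for numerical stability. The function names are hypothetical.

```python
import numpy as np


def parzen_log_density(X_train, X_query, h):
    """Log Parzen-window density estimate with an isotropic Gaussian kernel of width h."""
    d = X_train.shape[1]
    sq = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)  # (q, n)
    log_k = -sq / (2 * h**2) - d * np.log(h * np.sqrt(2 * np.pi))
    # Log of the mean kernel value over training samples (stable log-mean-exp).
    m = log_k.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_k - m).mean(axis=1, keepdims=True))).ravel()


def parzen_bayes_predict(X, y, X_query, h=1.0):
    """Assign each query to the class maximizing P(omega_i) * p_hat(x | omega_i)."""
    X, X_query, y = np.asarray(X, float), np.asarray(X_query, float), np.asarray(y)
    classes = np.unique(y)
    scores = [
        np.log(np.mean(y == c)) + parzen_log_density(X[y == c], X_query, h)
        for c in classes
    ]
    return classes[np.argmax(np.stack(scores, axis=1), axis=1)]
```

The width `h` controls the smoothness of each class pdf. In practice it would be tuned, for example with the same cross-validation used for feature selection.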
•A correct classification rate of 77% is achieved for Surprise vs. Happiness using features 78, 43, 22, 26, and 85.
•Much classification information is lost in the 2D visualization because PCA is a lossy transformation.
Figure: correct classification scores (the diagonal elements) of the four confusion matrices: human classification, and the Bayes classifier for males only, females only, and both genders.