

Post on 17-Jan-2018


An i-Vector PLDA based Gender Identification Approach for Severely Distorted and Multilingual DARPA RATS Data

Shivesh Ranjan, Gang Liu and John H. L. Hansen

{Shivesh.Ranjan, Gang.Liu, John.Hansen}@utdallas.edu

Why do female and male speech differ?

• Vocal Tract Length (14 cm for females vs. 17.5 cm for males).

• Length of vocal folds (female-to-male ratio of vocal fold lengths is 0.8).

• Larynx Anatomy (difference in thickness).

Center for Robust Speech Systems (CRSS)

Erik Jonsson School of Engineering & Computer Science

The University of Texas at Dallas

Richardson, Texas 75080-3021, USA

Applications of Gender Identification

• Improving speech & speaker recognition accuracy.

• Accent identification, Speaker health identification.

• Emotion Recognition, Surveillance, Call center-business applications, Human computer intelligent interaction.

Motivations for i-Vector based Gender ID approach

• i-Vector offers a compact representation of an utterance while preserving the speaker-specific attributes.

• Gender is an important speaker specific attribute.

• i-Vector based systems are the current state-of-the-art in Speaker ID and Language ID.

• GMM-UBM based Gender ID systems.

Gender ID framework

First 2 dimensions of MMI based 3-D projection of 2600 i-vectors from the FE test-set.

Fundamentals of i-Vector G-PLDA framework

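i-Vector representation: $M = m + Tw$, where $M$ is the utterance-dependent GMM mean supervector, $m$ is the UBM mean supervector, $T$ is the total variability matrix, and $w$ is the i-Vector.

The i-Vector is obtained as the Maximum-a-Posteriori (MAP) point estimate of $w$ given the utterance.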
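
As a rough illustration of the MAP point estimate, the sketch below uses the standard closed-form posterior mean $w = (I + T^{\top}\Sigma^{-1}NT)^{-1}T^{\top}\Sigma^{-1}F$, where $N$ and $F$ are the zeroth-order and centered first-order Baum-Welch statistics of the utterance. The function name, the component-major supervector layout, and the diagonal UBM covariance are assumptions made for illustration; the poster does not give these details.

```python
import numpy as np

def extract_ivector(T, sigma_diag, n_counts, f_centered):
    """MAP point estimate of the i-vector w (hypothetical helper).

    T          : (C*F, R) total variability matrix
    sigma_diag : (C*F,)   diagonal of the UBM covariance supervector (assumed diagonal)
    n_counts   : (C,)     zeroth-order statistics, one per UBM component
    f_centered : (C*F,)   first-order statistics centered on the UBM means
    """
    feat_dim = T.shape[0] // n_counts.shape[0]
    # Expand per-component counts so each feature dimension gets its component's count.
    n_expanded = np.repeat(n_counts, feat_dim)
    # Sigma^{-1} T, computed row-wise because Sigma is diagonal.
    t_scaled = T / sigma_diag[:, None]
    # Posterior precision: I + T^T Sigma^{-1} N T.
    precision = np.eye(T.shape[1]) + t_scaled.T @ (n_expanded[:, None] * T)
    # Posterior mean: precision^{-1} T^T Sigma^{-1} F.
    return np.linalg.solve(precision, t_scaled.T @ f_centered)
```

With many frames per component the identity prior term becomes negligible, so the estimate approaches the least-squares fit of $w$ to the observed statistics.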

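Assume a set of $R$ i-Vectors $\{w_r\}_{r=1}^{R}$ corresponding to a particular gender.

An i-Vector from such a set can be expressed, in the standard Gaussian PLDA form, as: $w_r = \mu + \Phi\beta + \epsilon_r$, with $\beta \sim \mathcal{N}(0, I)$ and $\epsilon_r \sim \mathcal{N}(0, \Sigma)$.

The model parameters ($\mu$, $\Phi$, $\Sigma$) are estimated from a collection of gender-labeled i-Vectors.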
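
To make the generative view concrete, the sketch below draws i-Vectors from the standard Gaussian PLDA model $w_r = \mu + \Phi\beta + \epsilon_r$, in which one latent vector $\beta$ is shared by all $R$ i-Vectors of a class and $\epsilon_r$ is per-utterance residual noise with covariance $\Sigma$. The function names and parameter shapes are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def sample_gplda(mean, phi, sigma, n_samples, rng):
    """Draw n_samples i-vectors that share one latent class variable beta."""
    # One latent vector per class, drawn from a standard normal prior.
    beta = rng.standard_normal(phi.shape[1])
    # Per-utterance residual noise with full covariance sigma.
    noise = rng.multivariate_normal(np.zeros(len(mean)), sigma, size=n_samples)
    samples = mean + phi @ beta + noise
    return samples, beta

def marginal_covariance(phi, sigma):
    """Covariance of a single i-vector once beta is integrated out."""
    return phi @ phi.T + sigma
```

Because $\beta$ is shared within a class, i-Vectors from the same class scatter around a common offset $\mu + \Phi\beta$, which is what the model exploits to separate classes.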

Gender Separability in the i-Vector Space

Training and Test data-sets

Fisher English (FE) Training Data

20,652 gender-labeled FE utterances (89% of the total corpus) were used to train the UBM and the T matrix for i-Vector extraction.

Fisher English (FE) Test Data

2,600 utterances selected randomly from the FE corpus (11% of the total corpus). Smaller test-sets of duration 20s, 10s, and 3s were also created.
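
One simple way to build such shorter test-sets is to truncate each utterance's feature sequence. The sketch below assumes a 10 ms frame shift (100 frames per second) and keeps the first frames of each utterance; both choices, and the function names, are assumptions, since the poster does not say how the segments were cut.

```python
import numpy as np

FRAMES_PER_SECOND = 100  # assumption: 10 ms frame shift

def truncate_features(features, duration_s, frames_per_second=FRAMES_PER_SECOND):
    """Keep only the first duration_s seconds of a (frames, dims) feature matrix."""
    n_frames = int(round(duration_s * frames_per_second))
    # Slicing past the end is safe: shorter utterances are returned unchanged.
    return features[:n_frames]

def make_short_test_sets(features, durations_s=(20, 10, 3)):
    """Return one truncated copy of the utterance per target duration."""
    return {d: truncate_features(features, d) for d in durations_s}
```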

DARPA RATS Test Data

438 test utterances drawn from the eight channels (A, B, C, D, E, F, G, H) and the clean source (SRC), covering 5 different languages.

DARPA RATS Unlabeled Development Set

502 utterances per channel for all the channels except H. 480 utterances for channel H.

Results on FE data

Duration Mismatch Compensation

Retrain the gender ID system with corresponding shorter-duration segments.

Unsupervised Domain Adaptation

Issues with the RATS test-set

• Gender ID system is trained only on FE data, and no gender-labeled data is available for the RATS test-set.

• 4 of the 5 languages are not present in the FE training-set.

Unsupervised Clustering

• Use unsupervised clustering (Label Generating-Max Margin Clustering) to assign labels to unlabeled RATS development data.

• Estimate the in-domain PLDA model using the estimated labels.
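
The two steps above can be sketched as follows. Note that this sketch swaps in plain 2-means clustering as a stand-in for Label Generating-Max Margin Clustering, which is considerably more involved, and it estimates only per-cluster means and a pooled within-class covariance rather than a full PLDA model. All function names are hypothetical.

```python
import numpy as np

def two_means(x, n_iter=50):
    """2-means clustering of i-vectors x with shape (n, dim); a stand-in for LG-MMC."""
    # Deterministic init: the point farthest from the global mean,
    # then the point farthest from that first center.
    c0 = x[np.linalg.norm(x - x.mean(axis=0), axis=1).argmax()]
    c1 = x[np.linalg.norm(x - c0, axis=1).argmax()]
    centers = np.stack([c0, c1]).astype(float)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Assign each i-vector to its nearest center.
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center from its assigned points.
        for k in range(2):
            if np.any(labels == k):
                centers[k] = x[labels == k].mean(axis=0)
    return labels

def estimate_in_domain_stats(x, labels):
    """Per-cluster means and pooled within-class covariance from estimated labels."""
    means = np.stack([x[labels == k].mean(axis=0) for k in range(2)])
    centered = x - means[labels]
    within_cov = centered.T @ centered / len(x)
    return means, within_cov
```

The cluster indices carry no gender meaning by themselves; mapping them to female and male still requires some external cue, such as scores from the out-of-domain model.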

Out-of-domain PLDA model adaptation

Gender ID results on RATS data

i-Vector based Gender ID: Conclusions

• On FE test-sets, the proposed approach achieves an accuracy of up to 97.62% and an EER as low as 2.31%. Duration mismatch compensation yields significantly smaller performance degradation on shorter-duration test segments.

• On the RATS test-set, the unsupervised domain adaptation strategy offered a 6.8% relative gain (5.25% absolute) in classification accuracy, and a 14.75% relative reduction (3.08% absolute) in EER.
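
The relation between the relative and absolute figures can be checked with a few lines of arithmetic: a relative change is the absolute change divided by the baseline, so the baseline is the absolute change divided by the relative change. The baseline and adapted values computed below are only implied by the rounded percentages quoted above; they are not numbers reported in the poster.

```python
def implied_baseline(absolute_change, relative_change):
    """Baseline value b such that absolute_change = relative_change * b."""
    return absolute_change / relative_change

# Accuracy: 6.8% relative gain, 5.25% absolute gain.
acc_baseline = implied_baseline(5.25, 0.068)
acc_adapted = acc_baseline + 5.25

# EER: 14.75% relative reduction, 3.08% absolute reduction.
eer_baseline = implied_baseline(3.08, 0.1475)
eer_adapted = eer_baseline - 3.08

print(f"accuracy: {acc_baseline:.1f}% -> {acc_adapted:.1f}%")
print(f"EER:      {eer_baseline:.1f}% -> {eer_adapted:.1f}%")
```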