

An i-Vector PLDA based Gender Identification Approach for Severely Distorted and Multilingual DARPA RATS Data

Shivesh Ranjan, Gang Liu and John H. L. Hansen

{Shivesh.Ranjan, Gang.Liu, John.Hansen}@utdallas.edu

Center for Robust Speech Systems (CRSS)

Erik Jonsson School of Engineering & Computer Science

The University of Texas at Dallas

Richardson, Texas 75080-3021, USA

Why do female and male speech differ?

• Vocal Tract Length (14 cm for females vs. 17.5 cm for males).

• Length of vocal folds (the female-to-male ratio of vocal fold lengths is 0.8).

• Larynx Anatomy (difference in thickness).
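To make the vocal tract length difference concrete, a common textbook approximation treats the vocal tract as a uniform lossless tube closed at the glottis and open at the lips, whose resonances are $F_n = (2n-1)c/(4L)$. The Python sketch below applies it to the two lengths quoted above. The tube model and the speed of sound $c \approx 343$ m/s are assumptions for illustration, not part of the poster.

```python
# Quarter-wavelength resonances of a uniform lossless tube closed at the
# glottis and open at the lips: F_n = (2n - 1) * c / (4 * L).
SPEED_OF_SOUND_CM_S = 34_300.0  # assumed: speed of sound in warm air, ~343 m/s


def tube_formants(length_cm: float, n_formants: int = 3) -> list[float]:
    """Return the first n_formants resonance frequencies (Hz) of the tube."""
    return [(2 * n - 1) * SPEED_OF_SOUND_CM_S / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


female = tube_formants(14.0)   # vocal tract length quoted for female speakers
male = tube_formants(17.5)     # vocal tract length quoted for male speakers

for i, (f, m) in enumerate(zip(female, male), start=1):
    print(f"F{i}: female {f:7.1f} Hz, male {m:7.1f} Hz, ratio {f / m:.2f}")
```

Under this model every formant scales with $1/L$, so the female formants come out 17.5/14 = 1.25 times higher than the male ones (F1 of 612.5 Hz vs. 490 Hz). Real vowels deviate from a uniform tube, so this only illustrates the direction and rough size of the effect.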

Applications of Gender Identification

• Improving speech & speaker recognition accuracy.

• Accent identification, Speaker health identification.

• Emotion Recognition, Surveillance, Call-center business applications, Intelligent human-computer interaction.

Motivations for i-Vector based Gender ID approach

• An i-Vector offers a compact representation of an utterance while preserving speaker-specific attributes.

• Gender is an important speaker-specific attribute.

• i-Vector based systems are the current state of the art in Speaker ID and Language ID.

• GMM-UBM based Gender ID systems.

Gender ID framework

First two dimensions of an MMI-based 3-D projection of 2,600 i-Vectors from the FE test set.

Fundamentals of i-Vector G-PLDA framework

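i-Vector representation:

The GMM mean supervector of an utterance is modeled as $M = m + Tw$, where $m$ is the UBM mean supervector, $T$ is the low-rank total variability matrix and $w \sim \mathcal{N}(0, I)$ is a latent variable. The i-Vector is obtained as the Maximum-a-Posteriori (MAP) point estimate of $w$ given the utterance.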
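The MAP estimate has a closed form in the standard total variability model. With zeroth-order statistics $N_c$ and UBM-centred first-order statistics $\tilde{F}_c$ for each UBM component $c$, the i-Vector is $w = (I + \sum_c N_c T_c^\top \Sigma_c^{-1} T_c)^{-1} \sum_c T_c^\top \Sigma_c^{-1} \tilde{F}_c$, where $T_c$ is the block of $T$ belonging to component $c$ and $\Sigma_c$ is that component's diagonal UBM covariance. A minimal NumPy sketch follows; the function name, the diagonal-covariance assumption and the toy statistics are illustrative and not taken from the poster.

```python
import numpy as np


def extract_ivector(N, F_centered, T, ubm_var):
    """MAP point estimate of w in M = m + T w (total variability model).

    N          : (C,) zeroth-order Baum-Welch statistics per UBM component
    F_centered : (C, D) first-order statistics centred on the UBM means
    T          : (C * D, R) total variability matrix, stacked by component
    ubm_var    : (C, D) diagonal UBM covariances
    """
    C, D = F_centered.shape
    R = T.shape[1]
    T_c = T.reshape(C, D, R)                        # per-component blocks of T
    precision = np.eye(R)                           # prior precision of w
    linear = np.zeros(R)
    for c in range(C):
        Tc_scaled = T_c[c] / ubm_var[c][:, None]    # Sigma_c^{-1} T_c
        precision += N[c] * T_c[c].T @ Tc_scaled    # + N_c T_c^T Sigma_c^{-1} T_c
        linear += Tc_scaled.T @ F_centered[c]       # + T_c^T Sigma_c^{-1} F_c
    return np.linalg.solve(precision, linear)       # posterior mean of w


# Toy check (hypothetical sizes, far smaller than a real system).
rng = np.random.default_rng(0)
C, D, R = 8, 3, 4
T = rng.normal(size=(C * D, R))
ubm_var = np.ones((C, D))
w_true = rng.normal(size=R)
N = np.full(C, 1000.0)                              # pretend 1000 frames per component
# Noise-free centred statistics implied by the model: F_c = N_c * T_c w
F_centered = N[:, None] * (T @ w_true).reshape(C, D)
w_hat = extract_ivector(N, F_centered, T, ubm_var)
print(np.round(w_true, 3), np.round(w_hat, 3))
```

Because the statistics here are generated noise-free from a known $w$ with many frames per component, the estimate lands very close to it. With few frames the identity prior pulls the estimate toward zero, which is one way in which short segments yield less reliable i-Vectors.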

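Assume a set of $R$ i-Vectors $\{w_1, \ldots, w_R\}$ corresponding to a particular gender. Under the Gaussian PLDA (G-PLDA) model, an i-Vector from such a set can be expressed as:

$$w_r = \mu + \Phi\beta + \epsilon_r, \qquad \beta \sim \mathcal{N}(0, I), \qquad \epsilon_r \sim \mathcal{N}(0, \Sigma),$$

where $\mu$ is the global mean of the i-Vectors, the columns of $\Phi$ span the between-class (gender) subspace, $\beta$ is a latent variable shared by all i-Vectors of that gender, and $\epsilon_r$ is a Gaussian residual. The parameters $\{\mu, \Phi, \Sigma\}$ are estimated from a collection of gender-labeled i-Vectors.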
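One way to turn a G-PLDA model into a gender classifier is sketched below under stated assumptions. Each gender is treated as one PLDA class in $w_r = \mu + \Phi\beta + \epsilon_r$ with $\beta \sim \mathcal{N}(0, I)$ and $\epsilon_r \sim \mathcal{N}(0, \Sigma)$; the gender-labeled training i-Vectors give a Gaussian posterior over that gender's $\beta$, and a test i-Vector is scored with the log-likelihood ratio of the two resulting predictive densities. The parameters are assumed known (no EM training is shown), the toy data are drawn from the model itself, and the helper names are hypothetical; this is not presented as the poster's exact scoring rule.

```python
import numpy as np


def beta_posterior(W, mu, Phi, Sigma):
    """Posterior N(mean, cov) of the latent beta shared by the R rows of W."""
    R = W.shape[0]
    Phi_t_Sinv = Phi.T @ np.linalg.inv(Sigma)
    cov = np.linalg.inv(np.eye(Phi.shape[1]) + R * Phi_t_Sinv @ Phi)
    mean = cov @ Phi_t_Sinv @ (W - mu).sum(axis=0)
    return mean, cov


def log_gaussian(x, mean, cov):
    """Log density of a multivariate Gaussian evaluated at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (x.size * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(cov, diff))


def gender_llr(w, post_female, post_male, mu, Phi, Sigma):
    """log p(w | female) - log p(w | male) under the PLDA predictive densities."""
    scores = []
    for beta_mean, beta_cov in (post_female, post_male):
        pred_mean = mu + Phi @ beta_mean            # predictive mean of a new i-Vector
        pred_cov = Phi @ beta_cov @ Phi.T + Sigma   # predictive covariance
        scores.append(log_gaussian(w, pred_mean, pred_cov))
    return scores[0] - scores[1]


# Toy data drawn from the model itself (all sizes and values are hypothetical).
rng = np.random.default_rng(1)
D, q = 20, 1
mu, Sigma = np.zeros(D), np.eye(D)
Phi = rng.normal(size=(D, q))
beta = {"female": np.array([1.5]), "male": np.array([-1.5])}


def sample(b, n):
    """Draw n i-Vectors that share the latent value b."""
    return mu + Phi @ b + rng.multivariate_normal(np.zeros(D), Sigma, size=n)


post_f = beta_posterior(sample(beta["female"], 200), mu, Phi, Sigma)
post_m = beta_posterior(sample(beta["male"], 200), mu, Phi, Sigma)
test_f, test_m = sample(beta["female"], 100), sample(beta["male"], 100)
llr_f = np.array([gender_llr(w, post_f, post_m, mu, Phi, Sigma) for w in test_f])
llr_m = np.array([gender_llr(w, post_f, post_m, mu, Phi, Sigma) for w in test_m])
accuracy = (np.sum(llr_f > 0) + np.sum(llr_m < 0)) / 200
print(f"toy accuracy: {accuracy:.3f}")
```

A positive LLR is read as "female"; on real data the decision threshold (and any calibration) would be set on held-out scores rather than fixed at zero.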

Gender Separability in the i-Vector Space

Training and test data sets

Fisher English (FE) Training Data

20,652 gender-labeled FE utterances (89% of the total corpus) were used to train the UBM and the T matrix for i-Vector extraction.

Fisher English (FE) Test Data

2,600 utterances were randomly selected from the FE corpus (11% of the total corpus). Shorter test sets with segment durations of 20 s, 10 s, and 3 s were also created.
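The percentages quoted for the FE split are consistent with treating the training and test utterances together as the full corpus. A quick check, assuming exactly that:

```python
train_utts, test_utts = 20_652, 2_600
total = train_utts + test_utts  # assumption: train + test make up the whole corpus
print(f"{total} utterances: train {train_utts / total:.1%}, test {test_utts / total:.1%}")
```

The two fractions (about 88.8% and 11.2%) round to the stated 89% and 11%.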

DARPA RATS Test Data

438 test utterances from the eight channels (A, B, C, D, E, F, G, H) and the clean source (SRC), spanning 5 different languages.

DARPA RATS Unlabeled Development Set

502 utterances per channel for every channel except H, which has 480 utterances.

Results on FE data
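Results are reported as classification accuracy and equal error rate (EER). For a two-class problem such as gender ID, a minimal sketch of the EER computation treats one gender (female here, an arbitrary choice) as the positive class, sweeps a decision threshold over the observed scores, and reports the point where the miss and false-alarm rates are closest. The scoring convention and the synthetic scores are assumptions for illustration; evaluation toolkits usually interpolate the ROC curve more carefully.

```python
import numpy as np


def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER by sweeping a threshold over all observed scores.

    target_scores    : scores of trials whose true class is the positive class
    nontarget_scores : scores of trials whose true class is the negative class
    """
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        miss = np.mean(target_scores < t)              # positives rejected
        false_alarm = np.mean(nontarget_scores >= t)   # negatives accepted
        if abs(miss - false_alarm) < best_gap:
            best_gap, eer = abs(miss - false_alarm), (miss + false_alarm) / 2
    return eer


rng = np.random.default_rng(3)
female_llr = rng.normal(2.0, 1.0, size=2000)   # hypothetical scores of female test i-Vectors
male_llr = rng.normal(-2.0, 1.0, size=2000)    # hypothetical scores of male test i-Vectors
eer = equal_error_rate(female_llr, male_llr)
print(f"EER = {eer:.2%}")
```

For these two unit-variance score distributions the theoretical EER is about 2.3%, so the printed estimate should land near that value.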

Duration Mismatch Compensation

Retrain the gender ID system on training segments whose duration matches that of the shorter test segments.
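Retraining on duration-matched segments requires cutting training utterances down to the test duration, and the shorter 20 s, 10 s and 3 s test sets can be produced the same way. The sketch below assumes features at 100 frames per second (a 10 ms shift) and takes one random contiguous crop per utterance; neither choice, nor the hypothetical helper names, comes from the poster.

```python
import numpy as np

FRAMES_PER_SECOND = 100  # assumed 10 ms frame shift


def crop_to_duration(features, seconds, rng):
    """Return a random contiguous crop of `seconds` from a (frames, dims) matrix.

    Utterances that are already shorter than the requested duration are
    returned unchanged.
    """
    n = int(round(seconds * FRAMES_PER_SECOND))
    if features.shape[0] <= n:
        return features
    start = rng.integers(0, features.shape[0] - n + 1)  # upper bound is exclusive
    return features[start:start + n]


def duration_matched_training_set(train_feats, test_seconds, seed=0):
    """Crop every training utterance to the test duration before retraining."""
    rng = np.random.default_rng(seed)
    return [crop_to_duration(f, test_seconds, rng) for f in train_feats]


train_feats = [np.zeros((6000, 20)), np.zeros((250, 20))]  # 60 s and 2.5 s of toy features
cropped = duration_matched_training_set(train_feats, test_seconds=3)
print([c.shape for c in cropped])  # [(300, 20), (250, 20)]
```

Taking several crops per long utterance instead of one would enlarge the duration-matched training set; the single-crop version keeps the sketch short.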

Unsupervised Domain Adaptation

Issues with the RATS test-set

• The gender ID system is trained only on FE data, and no gender-labeled RATS data is available to train on.

• 4 of the 5 RATS languages are not present in the (English-only) FE training set.

Unsupervised Clustering

• Use unsupervised clustering (Label-Generating Maximum Margin Clustering, LG-MMC) to assign gender labels to the unlabeled RATS development data.

• Estimate the in-domain PLDA model using the estimated labels.
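The two steps above can be sketched as follows. LG-MMC itself is too involved for a short example, so this sketch swaps in plain 2-means clustering as a stand-in, names each cluster by the average score of the out-of-domain (FE-trained) classifier, and then estimates in-domain two-covariance statistics (global mean, between-class and within-class covariances) from the pseudo-labels. The cluster-naming rule, the toy data and the simulated out-of-domain scores are all hypothetical.

```python
import numpy as np


def two_means(X, n_iter=50):
    """Plain 2-means clustering; a stand-in for LG-MMC, not LG-MMC itself."""
    # Deterministic init: the points with extreme projections on the top principal axis.
    Xc = X - X.mean(axis=0)
    axis = np.linalg.svd(Xc, full_matrices=False)[2][0]
    proj = Xc @ axis
    centroids = X[[np.argmin(proj), np.argmax(proj)]]
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == k].mean(axis=0) for k in (0, 1)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels


def name_clusters(labels, ood_llr):
    """Map cluster ids to genders via the mean out-of-domain female-vs-male LLR."""
    female_cluster = int(np.mean(ood_llr[labels == 1]) > np.mean(ood_llr[labels == 0]))
    return np.where(labels == female_cluster, "female", "male")


def estimate_two_cov(X, genders):
    """Global mean, between-class and within-class covariances from pseudo-labels."""
    mu = X.mean(axis=0)
    classes = [X[genders == g] for g in ("female", "male")]
    means = np.array([c.mean(axis=0) for c in classes])
    between = np.cov(means.T, bias=True)  # rank <= 1 with only two classes
    within = sum((c - c.mean(axis=0)).T @ (c - c.mean(axis=0)) for c in classes) / len(X)
    return mu, between, within


# Toy in-domain i-Vectors with a hidden gender split (hypothetical data).
rng = np.random.default_rng(2)
D = 10
shift = np.zeros(D)
shift[0] = 3.0
X = np.vstack([rng.normal(size=(150, D)) + shift, rng.normal(size=(150, D)) - shift])
truth = np.array(["female"] * 150 + ["male"] * 150)
ood_llr = X[:, 0] - 1.0 + rng.normal(scale=2.0, size=len(X))  # simulated biased, noisy OOD scores
genders = name_clusters(two_means(X), ood_llr)
mu, between, within = estimate_two_cov(X, genders)
print("pseudo-label agreement:", np.mean(genders == truth))
```

Some naming rule is needed because clustering alone does not say which cluster is female; relying on the out-of-domain scores for this is an assumption of the sketch.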

Out-of-domain PLDA model adaptation
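The adaptation formula is not given in this section. One common choice in the i-Vector domain adaptation literature is to linearly interpolate the in-domain and out-of-domain PLDA parameters with a weight $\alpha$; the sketch below does exactly that for a (mean, between-class covariance, within-class covariance) triple. The interpolation scheme, the parameter layout and the function name are assumptions, not a statement of the authors' method.

```python
import numpy as np


def adapt_plda(out_params, in_params, alpha):
    """Linearly interpolate two-covariance PLDA parameters (hypothetical scheme).

    out_params, in_params : (mean, between_cov, within_cov) tuples
    alpha                 : weight on the in-domain estimate, 0 <= alpha <= 1
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * p_in + (1.0 - alpha) * p_out
                 for p_in, p_out in zip(in_params, out_params))


# Toy parameters (hypothetical): out-of-domain from FE, in-domain from pseudo-labeled RATS.
out_domain = (np.zeros(3), np.eye(3), np.eye(3))
in_domain = (np.ones(3), 2.0 * np.eye(3), 3.0 * np.eye(3))
adapted = adapt_plda(out_domain, in_domain, alpha=0.5)
print(adapted[0], np.diag(adapted[1]), np.diag(adapted[2]))
```

A convex combination of positive semi-definite matrices is itself positive semi-definite, so the adapted covariances remain valid for any $\alpha \in [0, 1]$; $\alpha$ is a free hyperparameter in this sketch.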

Gender ID results on RATS data

i-Vector based Gender ID: Conclusions

• On the FE test sets, the proposed approach achieves an accuracy of up to 97.62% and an EER as low as 2.31%. Duration mismatch compensation significantly reduces the performance degradation on shorter test segments.

• On the RATS test set, the unsupervised domain adaptation strategy offered a 6.8% relative gain (5.25% absolute) in classification accuracy and a 14.75% relative reduction (3.08% absolute) in EER.
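The relative and absolute figures together pin down the baseline they are measured against: if a change of $a$ points is a fraction $r$ of the baseline, the baseline is $a / r$. Assuming the relative gains are computed with respect to the unadapted system, the RATS numbers imply an unadapted accuracy of roughly 77.2% (rising to about 82.5%) and an EER of roughly 20.9% (falling to about 17.8%). Because the quoted percentages are rounded, these implied values are approximate.

```python
def implied_baseline(absolute_change, relative_change):
    """Baseline implied by an absolute change and that change as a fraction of the baseline."""
    return absolute_change / relative_change


acc_base = implied_baseline(5.25, 0.068)    # accuracy gain in percentage points, 6.8% relative
eer_base = implied_baseline(3.08, 0.1475)   # EER reduction in percentage points, 14.75% relative
print(f"accuracy: {acc_base:.2f}% -> {acc_base + 5.25:.2f}%")
print(f"EER:      {eer_base:.2f}% -> {eer_base - 3.08:.2f}%")
```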
