Support Vector Machines Based Text-Dependent Speaker Verification Using HMM Supervectors


Page 1: Support Vector Machines Based Text-Dependent Speaker Verification Using HMM Supervectors

research & development

Support Vector Machines Based Text-Dependent Speaker Verification Using HMM Supervectors

Chengyu Dong

France Telecom R&D Beijing

2008-01-21


Outline

Introduction

HMM supervectors

Normalized scores using SI HMM supervectors

Experimental results

Conclusions


Introduction

Subword-based HMMs are the state-of-the-art technology for text-dependent speaker verification (TDSV) systems.

Support vector machines (SVMs) using the GMM supervector linear (GSL) kernel have proven effective for text-independent tasks.

Both popular techniques inspire ideas and methods for research on TDSV tasks.


HMM Baseline Systems

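Forced alignment

A test utterance with observation sequence $O_1^T = O_1, O_2, \ldots, O_T$ is first segmented into $N$ segments $O_{t_1}^{t_2-1}, O_{t_2}^{t_3-1}, \ldots, O_{t_N}^{t_{N+1}-1}$, where frames $t_i$ to $t_{i+1}-1$ belong to the $i$th phone.

Phone LLR scores:

$$\Lambda\left(O_{t_i}^{t_{i+1}-1}; S_i\right) = \log P\left(O_{t_i}^{t_{i+1}-1} \mid S_i^{s}\right) - \log P\left(O_{t_i}^{t_{i+1}-1} \mid S_i^{b}\right)$$

where $S_i^{s}$ and $S_i^{b}$ are the speaker-dependent and background (SI) HMMs of the $i$th phone.

Final verification score:

$$\Lambda\left(O_1^T; S\right) = \frac{1}{N}\sum_{i=1}^{N}\Lambda\left(O_{t_i}^{t_{i+1}-1}; S_i\right)$$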
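The baseline scoring step can be sketched as follows. This is a minimal illustration, assuming the segment log-likelihoods under the SD and SI phone HMMs have already been produced by a forced-alignment pass (not shown); the function names are hypothetical, and the equal weighting of phones follows the averaging over $N$ segments in the final verification score.

```python
from typing import Sequence, Tuple


def phone_llr(log_p_sd: float, log_p_si: float) -> float:
    """LLR of one phone segment: log P(O | S_i^s) - log P(O | S_i^b)."""
    return log_p_sd - log_p_si


def verification_score(segments: Sequence[Tuple[float, float]]) -> float:
    """Average the phone LLRs over the N force-aligned phone segments.

    Each entry is (log P(segment | SD HMM), log P(segment | SI HMM)),
    assumed to come from a forced-alignment pass that is not shown here.
    """
    if not segments:
        raise ValueError("need at least one phone segment")
    llrs = [phone_llr(sd, si) for sd, si in segments]
    return sum(llrs) / len(llrs)


if __name__ == "__main__":
    # Hypothetical segment log-likelihoods for a three-phone password.
    segs = [(-120.0, -125.0), (-80.0, -79.0), (-60.0, -63.0)]
    print(verification_score(segs))
```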


Support Vector Machines

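An SVM is a two-class classifier; it is another widely used and powerful modeling method.

In the standard formulation, an SVM $f(\mathbf{v})$ is given by

$$f(\mathbf{v}) = \sum_{i=1}^{M}\alpha_i t_i\, k(\mathbf{v}, \mathbf{v}_i) + d$$

where $\mathbf{v}_i$ are the $M$ support vectors, $t_i \in \{-1, +1\}$ are their class labels, $\alpha_i > 0$ are their weights, and $d$ is a bias. For the linear kernel $k(\mathbf{v}, \mathbf{v}_i) = \mathbf{v}^T\mathbf{v}_i$ this collapses to

$$f(\mathbf{v}) = \mathbf{w}^T\mathbf{v} + d, \qquad \mathbf{w} = \sum_{i=1}^{M}\alpha_i t_i\,\mathbf{v}_i$$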

Each speaker is modeled by a set of support vectors that define a hyperplane separating the two classes (target speaker and imposters), as the figure shows.
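A minimal sketch of the SVM decision function, evaluated both through the kernel sum and, for the linear kernel, through the collapsed weight vector $\mathbf{w}$. The support vectors, weights and bias here are assumed to come from an SVM trainer that is not shown; the helper names are hypothetical.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Linear kernel k(v, v_i) = v^T v_i."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def svm_score(v: Vector, support_vectors: Sequence[Vector],
              alphas: Sequence[float], labels: Sequence[int], d: float,
              kernel: Callable[[Vector, Vector], float] = dot) -> float:
    """f(v) = sum_i alpha_i * t_i * k(v, v_i) + d."""
    return sum(a * t * kernel(v, sv)
               for sv, a, t in zip(support_vectors, alphas, labels)) + d


def linear_weight(support_vectors: Sequence[Vector], alphas: Sequence[float],
                  labels: Sequence[int]) -> List[float]:
    """Collapse the support vectors into w = sum_i alpha_i * t_i * v_i.

    Valid only for the linear kernel.
    """
    w = [0.0] * len(support_vectors[0])
    for sv, a, t in zip(support_vectors, alphas, labels):
        for k, x in enumerate(sv):
            w[k] += a * t * x
    return w
```

With a linear kernel, scoring through `linear_weight` costs one dot product per test vector regardless of how many support vectors the speaker model has, which is why supervector systems typically store $\mathbf{w}$ rather than the support vectors.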


HMM Supervectors

Block diagram of HMM supervector extraction


HMM Supervectors

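The Kullback-Leibler divergence (KLD) of two HMM models $\lambda^a$ and $\lambda^b$ is defined as:

$$D\left(\lambda^a \,\|\, \lambda^b\right) = \int_{\mathbf{x} \in \mathcal{R}} \lambda^a(\mathbf{x}) \log\frac{\lambda^a(\mathbf{x})}{\lambda^b(\mathbf{x})}\, d\mathbf{x}$$

For two $J$-state left-to-right HMMs with the same topology, it is bounded by the divergences of the state transition distributions $\mathbf{a}_j$ and the state output distributions $b_j$:

$$D\left(\lambda^a \,\|\, \lambda^b\right) \le \sum_{j=1}^{J}\frac{1}{1-a^a_{jj}}\left[D\left(\mathbf{a}^a_j \,\|\, \mathbf{a}^b_j\right) + D\left(b^a_j \,\|\, b^b_j\right)\right]$$

Finally, when the two models share their transition probabilities, $D(\mathbf{a}^a_j \,\|\, \mathbf{a}^b_j) = 0$, which gives a good upper-bound estimation:

$$D\left(\lambda^a \,\|\, \lambda^b\right) \le \sum_{j=1}^{J}\frac{1}{1-a_{jj}}\, D\left(b^a_j \,\|\, b^b_j\right)$$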
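The shared-transition bound can be evaluated directly once each state's output divergence is known. The sketch below assumes, for simplicity, a single diagonal-covariance Gaussian per state (the slides' states may be GMMs), so each $D(b^a_j \,\|\, b^b_j)$ has the closed Gaussian form; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# (mean vector, diagonal variance vector) of one state's output Gaussian.
State = Tuple[Sequence[float], Sequence[float]]


def kld_diag_gauss(m_a: Sequence[float], v_a: Sequence[float],
                   m_b: Sequence[float], v_b: Sequence[float]) -> float:
    """KL(N(m_a, diag v_a) || N(m_b, diag v_b)) in nats."""
    if not (len(m_a) == len(v_a) == len(m_b) == len(v_b)):
        raise ValueError("dimension mismatch")
    total = 0.0
    for ma, va, mb, vb in zip(m_a, v_a, m_b, v_b):
        total += va / vb + (mb - ma) ** 2 / vb - 1.0 + math.log(vb / va)
    return 0.5 * total


def hmm_kld_upper_bound(self_loops: Sequence[float],
                        states_a: Sequence[State],
                        states_b: Sequence[State]) -> float:
    """sum_j D(b_j^a || b_j^b) / (1 - a_jj), assuming shared transitions."""
    if not (len(self_loops) == len(states_a) == len(states_b)):
        raise ValueError("both HMMs need the same number of states")
    bound = 0.0
    for a_jj, (m_a, v_a), (m_b, v_b) in zip(self_loops, states_a, states_b):
        if not 0.0 <= a_jj < 1.0:
            raise ValueError("self-loop probability must be in [0, 1)")
        # 1 / (1 - a_jj) is the expected number of frames spent in state j.
        bound += kld_diag_gauss(m_a, v_a, m_b, v_b) / (1.0 - a_jj)
    return bound
```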


HMM Supervectors

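Linear kernel:

$$K_{LIN}(X, Y) = \frac{1}{L}\sum_{j=1}^{L}\mathbf{x}_j^T\mathbf{y}_j$$

where $X = \{\mathbf{x}_j\}_{j=1}^{L}$ and $Y = \{\mathbf{y}_j\}_{j=1}^{L}$ have the same length $L$.

Dynamic Time Alignment Kernel:

For $X = \{\mathbf{x}_j\}_{j=1}^{I}$ and $Y = \{\mathbf{y}_j\}_{j=1}^{J}$,

$$K_{DTA}(X, Y) = \max_{\phi, \psi}\frac{1}{M_{\phi\psi}}\sum_{j=1}^{L} m(j)\, \mathbf{x}_{\phi(j)}^T\mathbf{y}_{\psi(j)}$$

subject to $\phi$ and $\psi$ forming a monotone, continuous alignment path of length $L$ from $(1, 1)$ to $(I, J)$, where $m(j)$ is a nonnegative path weight and $M_{\phi\psi} = \sum_{j=1}^{L} m(j)$.

Optimization function (dynamic programming over $D(i, j)$ from its predecessors $D(i-1, j)$, $D(i-1, j-1)$ and $D(i, j-1)$ with step weights 1, 2 and 1):

$$D(i, j) = \max\begin{cases} D(i-1, j) + \mathbf{x}_i^T\mathbf{y}_j \\ D(i-1, j-1) + 2\,\mathbf{x}_i^T\mathbf{y}_j \\ D(i, j-1) + \mathbf{x}_i^T\mathbf{y}_j \end{cases}$$

with $D(1, 1) = 2\,\mathbf{x}_1^T\mathbf{y}_1$, so that $M_{\phi\psi} = I + J$ and $K_{DTA}(X, Y) = D(I, J)/(I + J)$.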
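Both kernels can be sketched in a few lines. This is an illustration under the assumptions above (vectors as plain Python sequences, inner-product frame similarity, step weights 1, 2, 1 with normalization by $I + J$); `k_lin` and `k_dta` are hypothetical names.

```python
from typing import List, Sequence

Vector = Sequence[float]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def k_lin(X: Sequence[Vector], Y: Sequence[Vector]) -> float:
    """K_LIN(X, Y) = (1/L) * sum_j x_j^T y_j for equal-length sequences."""
    if len(X) != len(Y) or not X:
        raise ValueError("linear kernel needs two non-empty sequences of equal length")
    return sum(_dot(x, y) for x, y in zip(X, Y)) / len(X)


def k_dta(X: Sequence[Vector], Y: Sequence[Vector]) -> float:
    """DTA kernel via the (1, 2, 1)-weighted DP recursion, normalized by I + J."""
    I, J = len(X), len(Y)
    if I == 0 or J == 0:
        raise ValueError("DTA kernel needs two non-empty sequences")
    neg_inf = float("-inf")
    # Extra row/column of -inf so out-of-grid predecessors are never chosen.
    D: List[List[float]] = [[neg_inf] * (J + 1) for _ in range(I + 1)]
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            s = _dot(X[i - 1], Y[j - 1])
            if i == 1 and j == 1:
                D[i][j] = 2.0 * s  # the start cell carries the diagonal weight
            else:
                D[i][j] = max(D[i - 1][j] + s,
                              D[i - 1][j - 1] + 2.0 * s,
                              D[i][j - 1] + s)
    return D[I][J] / (I + J)
```

The DP visits every $(i, j)$ cell, so `k_dta` costs $O(IJ)$ inner products per kernel evaluation versus $O(L)$ for `k_lin`, which is consistent with the conclusion that the DTA kernel is much more expensive.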


HMM Supervectors

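Linear kernel function:

$$K_{LIN}(\lambda^a, \lambda^b) = \sum_{j=1}^{J}\sum_{g=1}^{M}\left(\sqrt{c_{jg}}\,\Sigma_{jg}^{-\frac{1}{2}}\mathbf{m}^a_{jg}\right)^T\left(\sqrt{c_{jg}}\,\Sigma_{jg}^{-\frac{1}{2}}\mathbf{m}^b_{jg}\right)$$

where $c_{jg}$, $\Sigma_{jg}$ and $\mathbf{m}_{jg}$ are the weight, covariance and mean of the $g$th Gaussian in state $j$.

Nonlinear kernel function:

$$K_{DTA}(\lambda^a, \lambda^b) = \max_{\phi, \psi}\frac{1}{M_{\phi\psi}}\sum_{j=1}^{L} m(j)\sum_{k}\mathbf{x}_{\phi(j)k}^T\mathbf{y}_{\psi(j)k}$$

where

$$\mathbf{x}_{jk} = \sqrt{c^a_{jk}}\,\left(\Sigma^a_{jk}\right)^{-\frac{1}{2}}\mathbf{m}^a_{jk},\quad j = 1, \ldots, I; \qquad \mathbf{y}_{jk} = \sqrt{c^b_{jk}}\,\left(\Sigma^b_{jk}\right)^{-\frac{1}{2}}\mathbf{m}^b_{jk},\quad j = 1, \ldots, J$$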
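The supervector behind the linear kernel can be sketched as follows, assuming diagonal covariances so that $\Sigma^{-1/2}$ scales each dimension by $1/\sqrt{\sigma^2}$. The nested-list layout (`weights[j][g]`, `variances[j][g]`, `means[j][g]`) and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def hmm_supervector(weights: Sequence[Sequence[float]],
                    variances: Sequence[Sequence[Sequence[float]]],
                    means: Sequence[Sequence[Sequence[float]]]) -> List[float]:
    """Stack sqrt(c_jg) * Sigma_jg^(-1/2) * m_jg over all states j and Gaussians g.

    weights[j][g] is c_jg, variances[j][g] the diagonal of Sigma_jg and
    means[j][g] the mean m_jg (diagonal covariances assumed).
    """
    sv: List[float] = []
    for c_j, var_j, m_j in zip(weights, variances, means):
        for c, var, m in zip(c_j, var_j, m_j):
            scale = math.sqrt(c)
            sv.extend(scale * mu / math.sqrt(v) for mu, v in zip(m, var))
    return sv


def k_lin_supervector(sv_a: Sequence[float], sv_b: Sequence[float]) -> float:
    """K_LIN as a plain inner product of two stacked supervectors."""
    if len(sv_a) != len(sv_b):
        raise ValueError("supervectors must have the same length")
    return sum(x * y for x, y in zip(sv_a, sv_b))
```

When both models share $c_{jg}$ and $\Sigma_{jg}$ (for example, both MAP-adapted from the same SI HMM), half the squared distance between the two supervectors equals $\sum_{j,g} c_{jg}\,\tfrac{1}{2}(\mathbf{m}^a_{jg}-\mathbf{m}^b_{jg})^T\Sigma_{jg}^{-1}(\mathbf{m}^a_{jg}-\mathbf{m}^b_{jg})$, a weighted sum of matched-Gaussian KL divergences, which is what ties this scaling to the KLD bound.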


Normalized scores using SI HMM supervectors

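The SVM discriminant function can be summarized as:

$$S(\mathbf{y}) = f(\mathbf{y}) = \sum_{j=1}^{J}\mathbf{w}_j^T\mathbf{y}_j + d = \mathbf{W}^T\mathbf{y} + d$$

where $\mathbf{y} = [\mathbf{y}_1; \ldots; \mathbf{y}_J]$ is the HMM supervector of the test utterance, and $\mathbf{W} = [\mathbf{w}_1; \ldots; \mathbf{w}_J]$ and $d$ define the target speaker's SVM.

Normalization form:

$$\tilde{S}(\mathbf{y} \mid \mathbf{y}^b) = f(\mathbf{y}) - f(\mathbf{y}^b)$$

where the HMM supervector $\mathbf{y}^b$ derives from the background SI HMMs.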

The concept of normalizing the SVM score comes from zero normalization (Z-Norm).


Normalized scores using SI HMM supervectors

We use normalized scores because of the lack of training data: some dimensions are not adapted, so SI HMM mean vectors remain in the supervector.

Suppose the test supervector is $\mathbf{y} = [\boldsymbol{\mu}^s_1; \boldsymbol{\mu}^b_2]$, where $\boldsymbol{\mu}^s_1$ denotes the dimensions that are adapted and $\boldsymbol{\mu}^b_2$ is the remaining part of the SI HMM means.


Normalized scores using SI HMM supervectors

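Un-normalized SVM scores:

$$S = \mathbf{W}^T\mathbf{y} + d = \mathbf{W}_1^T\boldsymbol{\mu}^s_1 + \mathbf{W}_2^T\boldsymbol{\mu}^b_2 + d$$

where $\mathbf{W} = [\mathbf{W}_1; \mathbf{W}_2]$ is split like the test supervector $\mathbf{y} = [\boldsymbol{\mu}^s_1; \boldsymbol{\mu}^b_2]$ into adapted and unadapted dimensions. The term $\mathbf{W}_2^T\boldsymbol{\mu}^b_2$ contains only SI means and provides no discrimination.

Normalized SVM scores:

$$\tilde{S} = \mathbf{W}^T\mathbf{y} + d - \left(\mathbf{W}^T\mathbf{y}^b + d\right) = \mathbf{W}_1^T\left(\boldsymbol{\mu}^s_1 - \boldsymbol{\mu}^b_1\right) + \mathbf{W}_2^T\left(\boldsymbol{\mu}^b_2 - \boldsymbol{\mu}^b_2\right) = \mathbf{W}_1^T\left(\boldsymbol{\mu}^s_1 - \boldsymbol{\mu}^b_1\right)$$

where $\mathbf{y}^b = [\boldsymbol{\mu}^b_1; \boldsymbol{\mu}^b_2]$ is the SI HMM supervector. Subtracting $\boldsymbol{\mu}^b_1$ only shifts the input dimension space, and $\tilde{S}$ removes the nuisance of the unadapted supervector dimensions.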
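A small sketch of the normalization with a linear SVM, showing that the bias $d$ and every unadapted dimension (where the test supervector still equals the SI supervector) drop out of $\tilde{S}$. The function names are hypothetical; `w` and `d` are assumed to come from an already trained speaker SVM.

```python
from typing import Sequence


def linear_svm(w: Sequence[float], y: Sequence[float], d: float) -> float:
    """f(y) = W^T y + d."""
    if len(w) != len(y):
        raise ValueError("weight and supervector dimensions differ")
    return sum(wi * yi for wi, yi in zip(w, y)) + d


def normalized_score(w: Sequence[float], y: Sequence[float],
                     y_bg: Sequence[float], d: float) -> float:
    """f(y) - f(y_bg) = W^T (y - y_bg).

    The bias cancels, and dimensions where y == y_bg (not adapted)
    contribute nothing, whatever their weights are.
    """
    return linear_svm(w, y, d) - linear_svm(w, y_bg, d)
```

For example, with `y = [3.0, 5.0]` and `y_bg = [1.0, 5.0]` (second dimension unadapted), changing `w[1]` changes the raw score `linear_svm(w, y, d)` but leaves `normalized_score` unchanged.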


Experimental Results

134 speakers were involved in the evaluations, with a total of 5292 target trials and 7840 imposter trials. Each participant was required to utter one password twice. The imposters were assumed to know the exact password of the target speaker.

SD HMMs are constructed by MAP adaptation with the relevance factor set to 1. Context-independent phone units are used as a universal phone set.
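The slides give only the relevance factor, so the sketch below assumes the standard mean-only MAP update used in GMM-UBM systems, $\hat{\mathbf{m}} = \alpha\,E[\mathbf{x}] + (1-\alpha)\,\mathbf{m}_{SI}$ with $\alpha = n/(n + r)$, where $n$ is the Gaussian's occupancy count and $r$ the relevance factor. It also illustrates why some supervector dimensions keep their SI values: a Gaussian that receives no frames is left unchanged. The function name is hypothetical.

```python
from typing import List, Sequence


def map_adapt_mean(si_mean: Sequence[float],
                   first_order_stat: Sequence[float],
                   occupancy: float,
                   relevance: float = 1.0) -> List[float]:
    """Mean-only MAP adaptation of one Gaussian (assumed standard form).

    first_order_stat is sum_t gamma_t * x_t and occupancy is sum_t gamma_t,
    accumulated over the enrollment frames aligned to this Gaussian.
    """
    if occupancy <= 0.0:
        # No data for this Gaussian: the SI mean stays in the supervector.
        return list(si_mean)
    alpha = occupancy / (occupancy + relevance)
    return [alpha * (f / occupancy) + (1.0 - alpha) * m
            for f, m in zip(first_order_stat, si_mean)]
```

With $r = 1$, even a single frame of data moves the mean halfway toward that frame, so the relevance factor used here favors fast adaptation from the short enrollment passwords.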

The acoustic features used in our system are the first 12 PLP coefficients plus the log-energy of each frame, computed every 10 ms over a 25 ms Hamming window and processed by a RASTA channel-equalization filter. Appending the first and second derivatives computed over a ±2-frame span gives the final 39-dimensional feature vectors.
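The derivative step can be sketched as below. It assumes the common regression formula $\Delta c_t = \sum_{k=1}^{2} k\,(c_{t+k} - c_{t-k}) \,/\, (2\sum_{k=1}^{2} k^2)$ with edge frames replicated, which the slide does not spell out; PLP extraction and RASTA filtering are not shown, and the function names are hypothetical.

```python
from typing import List, Sequence


def deltas(frames: Sequence[Sequence[float]], span: int = 2) -> List[List[float]]:
    """Regression deltas over +/- span frames, replicating the edge frames."""
    n = len(frames)
    denom = 2.0 * sum(k * k for k in range(1, span + 1))
    out: List[List[float]] = []
    for t in range(n):
        row = []
        for i in range(len(frames[t])):
            acc = 0.0
            for k in range(1, span + 1):
                nxt = frames[min(t + k, n - 1)][i]
                prv = frames[max(t - k, 0)][i]
                acc += k * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def add_dynamic_features(static: Sequence[Sequence[float]],
                         span: int = 2) -> List[List[float]]:
    """Append first and second derivatives: 13 static dims become 39."""
    d1 = deltas(static, span)
    d2 = deltas(d1, span)
    return [list(s) + a + b for s, a, b in zip(static, d1, d2)]
```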


Experimental Results


Experimental Results

Figure: System fusions on HMMs, GMMs and SVMs (EER and DCF of the HMM, GMM, SVM, HMM+GMM, HMM+SVM, GMM+SVM and HMM+GMM+SVM systems)

$$S_F(O) = \alpha\,\Lambda(O; S) + (1 - \alpha)\, f(O; S)$$

where $\alpha$ is a weighting factor determined by a discriminant analysis procedure such as LDA, which follows Fisher's discrimination criterion.

System fusions
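One way to obtain such a weight, sketched under stated assumptions: treat each trial as a 2-D score vector (HMM LLR, SVM score), compute the Fisher direction $\mathbf{w} \propto S_W^{-1}(\mathbf{m}_{target} - \mathbf{m}_{imposter})$ on development trials, and map it onto the convex form via $\alpha = w_1/(w_1 + w_2)$, which is only meaningful when both weights are positive. This is a generic two-class LDA sketch, not necessarily the exact procedure used in these experiments; the function names are hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def _mean(points: Sequence[Point]) -> Point:
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fisher_weights(target: Sequence[Point], imposter: Sequence[Point]) -> Point:
    """w = S_W^{-1} (m_target - m_imposter) for 2-D score vectors."""
    mt, mi = _mean(target), _mean(imposter)
    sxx = sxy = syy = 0.0
    # Pooled within-class scatter matrix S_W = [[sxx, sxy], [sxy, syy]].
    for pts, m in ((target, mt), (imposter, mi)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            sxx += dx * dx
            sxy += dx * dy
            syy += dy * dy
    det = sxx * syy - sxy * sxy
    if det == 0.0:
        raise ValueError("singular within-class scatter")
    ux, uy = mt[0] - mi[0], mt[1] - mi[1]
    # Closed-form inverse of the 2x2 scatter matrix applied to the mean difference.
    return ((syy * ux - sxy * uy) / det, (-sxy * ux + sxx * uy) / det)


def alpha_from_weights(w: Point) -> float:
    """Rescale w to the alpha / (1 - alpha) form; assumes w1 + w2 > 0."""
    total = w[0] + w[1]
    if total <= 0.0:
        raise ValueError("weights do not map onto a convex combination")
    return w[0] / total


def fused_score(hmm_score: float, svm_score: float, alpha: float) -> float:
    """S_F = alpha * HMM score + (1 - alpha) * SVM score."""
    return alpha * hmm_score + (1.0 - alpha) * svm_score
```

Rescaling $\mathbf{w}$ to sum to one does not change the ranking of trials, so any threshold chosen on the fused score is unaffected by this normalization.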


Experimental Results

2-D distribution of the scores for target and imposter trials (HMM and SVM scores)

3-D distribution of the scores for target and imposter trials (HMM, GMM and SVM scores)


Conclusions

SVMs using HMM supervectors provide an additional source of evidence for TDSV systems.

The DTA kernel performs slightly better than the linear kernel, but at a much higher computational cost.

Normalizing the output score substantially improves the performance of the SVM system.

Fusing the HMM and SVM systems reduces the EER from 4.01% to 3.47%.

When the GMM system is also incorporated into the fusion, the EER is further reduced to 2.95%.


Thanks