research & development
Support Vector Machines Based Text-Dependent Speaker Verification Using HMM Supervectors
Chengyu Dong
France Telecom R&D Beijing
2008-01-21
Outline
Introduction
HMM supervectors
Normalized scores using SI HMM supervectors
Experimental results
Conclusions
Introduction
Subword-based HMMs are the state-of-the-art technology for text-dependent speaker verification (TDSV) systems.
Support vector machines (SVMs) using the GMM supervector linear (GSL) kernel have proven to be an effective method for text-independent tasks.
Both of these popular techniques inspire the ideas and methods in this research on TDSV tasks.
HMM Baseline Systems
A test utterance with observations O = O_1, O_2, \ldots, O_T is first segmented by forced alignment into N segments O^1, O^2, \ldots, O^N, where frames t_{i-1}+1 to t_i belong to the i-th phone.
Phone LLR scores:
s_i = \frac{1}{t_i - t_{i-1}} \left[ \log P(O^i \mid S_i^s) - \log P(O^i \mid S_i^b) \right]
where S_i^s and S_i^b denote the speaker-dependent and background models of the i-th phone.
Final verification score:
s(O; S) = \frac{1}{N} \sum_{i=1}^{N} s_i
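As a sketch, the baseline scoring above can be written out directly; the frame log-likelihoods and alignment boundaries below are hypothetical inputs assumed to come from the HMMs and the forced alignment:

```python
import numpy as np

def phone_llr_scores(frame_ll_spk, frame_ll_bkg, boundaries):
    """Duration-normalized log-likelihood ratio per phone segment.

    frame_ll_spk / frame_ll_bkg: per-frame log-likelihoods under the
    speaker-dependent and background HMMs (length-T arrays).
    boundaries: [t_0 = 0, t_1, ..., t_N = T] from forced alignment.
    """
    scores = []
    for i in range(len(boundaries) - 1):
        t0, t1 = boundaries[i], boundaries[i + 1]
        llr = frame_ll_spk[t0:t1].sum() - frame_ll_bkg[t0:t1].sum()
        scores.append(llr / (t1 - t0))  # normalize by segment duration
    return scores

def verification_score(scores):
    """Final score: average of the per-phone LLRs."""
    return sum(scores) / len(scores)
```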
Support Vector Machines
SVM is a two-class classifier and another widely used, powerful modeling method.
In the standard formulation, an SVM, f(v), is given by
f(v) = \sum_{i=1}^{M} \alpha_i t_i K(v, v_i) + d
where t_i \in \{+1, -1\} are the ideal outputs, \sum_{i=1}^{M} \alpha_i t_i = 0, \alpha_i > 0, and the v_i are the support vectors.
Each speaker is modeled by a set of support vectors that form a two-class separating hyperplane, as the figure below shows.
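A minimal sketch of this decision function; the kernel choice and the support-vector values are made-up toy data:

```python
import numpy as np

def svm_score(v, support_vectors, alphas, targets, d, kernel):
    """Standard SVM output: f(v) = sum_i alpha_i * t_i * K(v, v_i) + d."""
    return sum(a * t * kernel(v, sv)
               for a, t, sv in zip(alphas, targets, support_vectors)) + d

def linear(x, y):
    return float(np.dot(x, y))

# toy example: two support vectors, one per class
score = svm_score(np.array([1.0, 0.0]),
                  [np.array([1.0, 0.0]), np.array([0.0, 1.0])],
                  alphas=[0.5, 0.5], targets=[1, -1], d=0.0,
                  kernel=linear)
```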
HMM Supervectors
The Block Diagram of HMM Supervectors Extraction
HMM Supervectors
The Kullback-Leibler divergence (KLD) of two HMM models \lambda^a and \lambda^b is defined as:
D(\lambda^a \| \lambda^b) = \int_{R^n} p(x; \lambda^a) \log \frac{p(x; \lambda^a)}{p(x; \lambda^b)} \, dx
It has no closed form for HMMs, but it can be bounded by decomposing the models over their J states:
D(\lambda^a \| \lambda^b) \le D(a^a \| a^b) + \sum_{j=1}^{J} D(b_j^a \| b_j^b)
where a denotes the transition probabilities and b_j the output distribution of state j.
This finally yields a good upper-bound estimation of the divergence between two HMMs.
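Since the state output distributions are Gaussian, each per-state term in the bound has a closed form. A rough sketch, assuming shared diagonal covariances (a common simplification; the transition-probability term is omitted here):

```python
import numpy as np

def kld_gauss_shared_cov(mu_a, mu_b, var):
    """Closed-form KL divergence between two Gaussians that share a
    diagonal covariance: 0.5 * (mu_a - mu_b)^T Sigma^{-1} (mu_a - mu_b)."""
    d = mu_a - mu_b
    return 0.5 * float(np.sum(d * d / var))

def hmm_kld_upper_bound(states_a, states_b, var):
    """Sum the per-state Gaussian divergences (the state-emission part of
    the upper bound; transitions are ignored in this sketch)."""
    return sum(kld_gauss_shared_cov(ma, mb, var)
               for ma, mb in zip(states_a, states_b))
```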
HMM Supervectors
Linear kernel:
K_{LIN}(X, Y) = \frac{1}{L} \sum_{j=1}^{L} x_j^T y_j, \quad X = (x_j)_{j=1}^{L}, \; Y = (y_j)_{j=1}^{L}
Dynamic Time Alignment kernel:
K_{DTA}(X, Y) = \frac{1}{M_{XY}} \max_{\phi, \theta} \sum_{k} x_{\phi(k)}^T y_{\theta(k)}, \quad X = (x_j)_{j=1}^{I}, \; Y = (y_j)_{j=1}^{J}
subject to the normalization factor M_{XY}, which depends on the alignment lengths I and J.
Optimization function (dynamic programming):
D(i, j) = \max \left\{ D(i-1, j) + x_i^T y_j, \;\; D(i-1, j-1) + 2 \, x_i^T y_j, \;\; D(i, j-1) + x_i^T y_j \right\}
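The DP recursion above can be sketched directly in code; the frame sequences are toy inputs, and the normalizer M is taken as I + J (one common choice):

```python
import numpy as np

def dta_kernel(X, Y):
    """Dynamic Time Alignment kernel via dynamic programming:
    D(i,j) = max(D(i-1,j) + k, D(i-1,j-1) + 2k, D(i,j-1) + k),
    with k = x_i . y_j, normalized here by M = I + J."""
    I, J = len(X), len(Y)
    D = np.full((I + 1, J + 1), -np.inf)
    D[0, 0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            k = float(np.dot(X[i - 1], Y[j - 1]))
            D[i, j] = max(D[i - 1, j] + k,
                          D[i - 1, j - 1] + 2 * k,
                          D[i, j - 1] + k)
    return D[I, J] / (I + J)
```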
HMM Supervectors
Linear kernel function:
K_{LIN}(\lambda^a, \lambda^b) = \sum_{g=1}^{J \cdot M} \left( \sqrt{c_g} \, \Sigma_g^{-1/2} m_g^a \right)^T \left( \sqrt{c_g} \, \Sigma_g^{-1/2} m_g^b \right)
Nonlinear kernel function:
K_{DTA}(\lambda^a, \lambda^b) = \frac{1}{M_{IJ}} \max_{\phi, \theta} \sum_{k} \left( \sqrt{c_{\phi(k)}^a} \, \Sigma_{\phi(k)}^{-1/2} m_{\phi(k)}^a \right)^T \left( \sqrt{c_{\theta(k)}^b} \, \Sigma_{\theta(k)}^{-1/2} m_{\theta(k)}^b \right)
where c and m denote the mixture weights and mean vectors of the HMM output distributions.
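Under this construction, the linear supervector kernel reduces to a plain dot product of stacked, scaled mean vectors. A sketch, where the weights, means, and shared diagonal covariance are toy values:

```python
import numpy as np

def supervector(weights, means, var):
    """Stack sqrt(c_g) * Sigma^{-1/2} * m_g for every mixture g into one
    long vector, so the linear kernel becomes a plain dot product."""
    parts = [np.sqrt(c) * m / np.sqrt(var) for c, m in zip(weights, means)]
    return np.concatenate(parts)

def k_lin(sv_a, sv_b):
    """Linear supervector kernel: dot product of two supervectors."""
    return float(np.dot(sv_a, sv_b))
```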
Normalized scores using SI HMM supervectors
The SVM discriminant function can be summarized as:
S(\lambda^s) = f(d^s) = \sum_{j=1}^{T} \alpha_j y_j \, d_j^T d^s = W^T d^s
Normalization form:
\tilde{S}(\lambda^s) = f(\lambda^s) - f(\lambda^b)
The HMM supervector d^b derives from the background SI HMMs.
The concept of normalizing the SVM score comes from zero normalization (Z-Norm).
Normalized scores using SI HMM supervectors
The reason we use the normalized score is the lack of training data: with limited enrollment speech, some dimensions are not adapted, so SI HMM mean vectors remain in the supervector.
Suppose d^s = [\mu^s, \mu^b], where \mu^s denotes the dimensions that are adapted and \mu^b is the remaining part of the SI HMM means.
Normalized scores using SI HMM supervectors
Un-normalized SVM scores:
S = W^T d^s = W_1^T \mu^s + W_2^T \mu^b
Normalized SVM scores:
\tilde{S} = W^T d^s - W^T d^b = W_1^T \mu^s + W_2^T \mu^b - W_1^T \mu_1^b - W_2^T \mu^b = W_1^T (\mu^s - \mu_1^b)
The term W_2^T \mu^b carries no discrimination; it only shifts the input dimension space.
\tilde{S} removes this nuisance part of the supervectors.
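A quick numeric check of this cancellation; all vectors below are random toy data:

```python
import numpy as np

rng = np.random.default_rng(0)
dim_adapted, dim_rest = 3, 2

W1 = rng.normal(size=dim_adapted)     # weights on adapted dimensions
W2 = rng.normal(size=dim_rest)        # weights on non-adapted dimensions
mu_s  = rng.normal(size=dim_adapted)  # adapted means (speaker)
mu_b1 = rng.normal(size=dim_adapted)  # SI means of those same dimensions
mu_b2 = rng.normal(size=dim_rest)     # SI means that were never adapted

W   = np.concatenate([W1, W2])
d_s = np.concatenate([mu_s, mu_b2])   # speaker supervector keeps the SI part
d_b = np.concatenate([mu_b1, mu_b2])  # background (SI) supervector

s_norm = W @ d_s - W @ d_b            # normalized score
direct = W1 @ (mu_s - mu_b1)          # W_1^T (mu^s - mu_1^b)
assert np.isclose(s_norm, direct)     # the W_2 term cancels exactly
```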
Experimental Results
134 speakers were involved in the evaluations, giving 5292 target trials and 7840 imposter trials in total. Each participant was required to utter one password twice. The imposters were assumed to know the exact password of the target speaker.
SD HMMs are constructed by MAP adaptation with the relevance factor set to 1. Context-independent phone units are used as a universal phone set.
The acoustic features are the first 12 PLP coefficients together with the log-energy of each frame, calculated every 10 ms using a 25 ms Hamming window. The features are processed through a RASTA channel-equalization filter. Including the first and second derivatives over a ±2 frame span yields the final 39-dimensional feature vectors.
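The derivative features can be sketched with the standard regression formula over a ±2 frame span; edge frames reuse the boundary frame here, which mirrors the usual HTK-style computation but is not necessarily the exact recipe used in these experiments:

```python
import numpy as np

def deltas(feats, span=2):
    """Regression-based delta features over a +/- span frame window:
    d_t = sum_n n * (c_{t+n} - c_{t-n}) / (2 * sum_n n^2)."""
    T = len(feats)
    denom = 2 * sum(n * n for n in range(1, span + 1))
    padded = np.pad(feats, [(span, span), (0, 0)], mode='edge')
    out = np.zeros_like(feats)
    for t in range(T):
        out[t] = sum(n * (padded[t + span + n] - padded[t + span - n])
                     for n in range(1, span + 1)) / denom
    return out
```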
Experimental Results
(Bar chart: EER, 0–6%, and DCF, 0–0.06, for the HMM, GMM, and SVM systems and their fusions HMM+GMM, HMM+SVM, GMM+SVM, and HMM+GMM+SVM)
System fusions on HMMs, GMMs and SVMs:
S_F(O; \lambda) = f \cdot S_1(O; \lambda) + (1 - f) \cdot S_2(O; \lambda)
where f is a weighting factor determined by a discriminant analysis procedure, like LDA, which follows Fisher's discrimination criterion.
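One way to realize such Fisher-criterion weighting is a two-class LDA on the per-trial score pairs; the sketch below is a hypothetical implementation, since the exact procedure is not detailed here:

```python
import numpy as np

def fisher_fusion_weight(target_scores, imposter_scores):
    """LDA direction w = Sw^{-1} (m_target - m_imposter) on (S_1, S_2)
    score pairs, rescaled so the two fusion weights sum to one."""
    mt = target_scores.mean(axis=0)
    mi = imposter_scores.mean(axis=0)
    Sw = np.cov(target_scores.T) + np.cov(imposter_scores.T)
    w = np.linalg.solve(Sw, mt - mi)
    return w[0] / w.sum()

def fuse(s1, s2, f):
    """Linear score fusion: S_F = f * S_1 + (1 - f) * S_2."""
    return f * s1 + (1 - f) * s2
```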
Experimental Results
2-D distribution of the scores for target and imposter trials (HMM and SVM scores)
3-D distribution of the scores for target and imposter trials (HMM, GMM and SVM scores)
Conclusions
SVMs using HMM supervectors provide another source of evidence for TDSV systems.
The DTA kernel performs a little better than the linear kernel, but at a much higher computational cost.
The normalized output score remarkably improves the performance of the SVM system.
Fusion of HMM and SVM yields excellent results: the EER is reduced from 4.01% to 3.47%.
When the GMM system is also incorporated into the fusion, the EER is further reduced to 2.95%.
Thanks