HIWIRE Meeting, Granada, June 9-10, 2005
![Page 1: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/1.jpg)
HIWIRE MEETING
Granada, June 9-10, 2005
JOSÉ C. SEGURA, LUZ GARCÍA, JAVIER RAMÍREZ
GSTC UGR
![Page 2: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/2.jpg)
2 HIWIRE Meeting – Granada, 9-10 June, 2005
Schedule
Non-linear feature normalization: ECDF segmental implementation, progressive equalization, 2-class normalization
Non-linear speaker adaptation/independence: non-linear feature normalization, non-linear model adaptation
VAD and technique combination: MO-LRT, bi-spectrum based VAD, combined front-end
![Page 3: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/3.jpg)
![Page 4: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/4.jpg)
ECDF-based nonlinear transformation (1)
CDF-matching nonlinear transformation
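$$x \sim p_X(x), \qquad y = T[x], \qquad p_Y(T[x]) = p_Y(y)$$
$$C_X(x) = \int_{-\infty}^{x} p_X(u)\,du, \qquad C_Y(y) = \int_{-\infty}^{y} p_Y(u)\,du$$
$$C_X(x) = C_Y(y) \;\Rightarrow\; x = T^{-1}[y] = C_X^{-1}\big(C_Y(y)\big)$$
$$y = T[x] = C_Y^{-1}\big(C_X(x)\big)$$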
In previous work we modeled the CDFs using histograms
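As a worked special case of the CDF-matching rule (an illustration, not taken from the slides), assume both variables are Gaussian. The symbols $\mu_X, \sigma_X, \mu_Y, \sigma_Y$ and the standard normal CDF $\Phi$ are introduced here only for this example. The transformation then collapses to ordinary mean and variance normalization, which suggests why a non-linear mapping only pays off when the distributions are not Gaussian:

```latex
\begin{aligned}
C_X(x) &= \Phi\!\left(\frac{x-\mu_X}{\sigma_X}\right), \qquad
C_Y^{-1}(p) = \mu_Y + \sigma_Y\,\Phi^{-1}(p) \\
y = T[x] &= C_Y^{-1}\big(C_X(x)\big)
          = \mu_Y + \sigma_Y\,\frac{x-\mu_X}{\sigma_X}
\end{aligned}
```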
![Page 5: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/5.jpg)
ECDF-based nonlinear transformation (2)
An alternative algorithm based on Order Statistics
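$$Y = \{y_1, y_2, \ldots, y_T\}, \qquad y_{(1)} \le y_{(2)} \le \cdots \le y_{(T)}$$
$$\hat{C}_Y(y_{(r)}) = \frac{r - 0.5}{T}, \qquad r = 1, \ldots, T \qquad \text{(ECDF)}$$
$$\hat{x}_t = C_X^{-1}\big(\hat{C}_Y(y_t)\big) = C_X^{-1}\!\left(\frac{r(t) - 0.5}{T}\right), \qquad t = 1, \ldots, T$$
where $r(t)$ is the rank of $y_t$ in the sorted sequence.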
It is faster: it only requires sorting and table indexing.
Results are almost equal to those obtained with histograms.
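The order-statistics rule above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `ecdf_equalize` is hypothetical, and the reference CDF is assumed to be a standard Gaussian (a Gaussian reference is mentioned later in the deck, but its parameters are not given).

```python
from statistics import NormalDist


def ecdf_equalize(y, ref=NormalDist(0.0, 1.0)):
    """Map each y[t] to ref.inv_cdf((r(t) - 0.5) / T), r(t) = 1-based rank of y[t].

    `ref` plays the role of the reference distribution C_X; a standard
    Gaussian is an assumption made for this sketch.
    """
    T = len(y)
    # Sorting the indices gives the order statistics; ties keep input order.
    order = sorted(range(T), key=lambda t: y[t])
    x = [0.0] * T
    for rank, t in enumerate(order, start=1):
        x[t] = ref.inv_cdf((rank - 0.5) / T)  # table lookup in the reference CDF
    return x


if __name__ == "__main__":
    print(ecdf_equalize([3.0, -1.0, 10.0]))
```

Only the ranks of the input values matter, so any monotonic distortion of the features is undone by construction.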
![Page 6: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/6.jpg)
ECDF Segmental implementation
Based on a sliding window
$$Y_t = \{y_{t-T}, \ldots, y_t, \ldots, y_{t+T}\}, \qquad y_{(1)} \le y_{(2)} \le \cdots \le y_{(2T+1)}$$
José C. Segura, M. Carmen Benítez, Ángel de la Torre, Antonio J. Rubio, Javier Ramírez, “Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition”, IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004
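A sliding-window variant might look like the sketch below: each frame is equalized using only its rank inside a window of up to 2T+1 frames centred on it. The name `segmental_ecdf`, the standard Gaussian reference, the truncated windows at the utterance edges, and the tie rule are all assumptions of this sketch rather than details from the slides.

```python
from statistics import NormalDist


def segmental_ecdf(y, half_win, ref=NormalDist(0.0, 1.0)):
    """Equalize each frame from its rank within a window of 2*half_win + 1 frames."""
    n = len(y)
    out = []
    for t in range(n):
        # Assumption: windows are simply truncated at the utterance edges.
        lo, hi = max(0, t - half_win), min(n, t + half_win + 1)
        window = y[lo:hi]
        # Rank of the centre value: 1 + number of strictly smaller values.
        rank = 1 + sum(v < y[t] for v in window)
        out.append(ref.inv_cdf((rank - 0.5) / len(window)))
    return out


if __name__ == "__main__":
    print(segmental_ecdf([0.0, 1.0, 2.0, 3.0, 4.0], half_win=1))
```

Because the output for frame t needs the following `half_win` frames, the window half-length is also the algorithmic delay in frames.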
![Page 7: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/7.jpg)
Progressive normalization
Not all MFCCs are equally discriminative, and HEQ introduces some distortion.
Normalizing only up to a certain MFCC order gives the best performance.
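One way to read "normalization up to a certain MFCC" is to equalize only the first few feature components and pass the rest through unchanged. The sketch below does that under stated assumptions: the names `equalize` and `progressive_normalize` are hypothetical, each component is equalized over the whole utterance against a standard Gaussian, and the feature layout (which index holds which coefficient) is left to the caller.

```python
from statistics import NormalDist


def equalize(seq, ref=NormalDist(0.0, 1.0)):
    """Rank-based ECDF equalization of one feature component over time."""
    T = len(seq)
    order = sorted(range(T), key=lambda t: seq[t])
    out = [0.0] * T
    for rank, t in enumerate(order, start=1):
        out[t] = ref.inv_cdf((rank - 0.5) / T)
    return out


def progressive_normalize(frames, max_index):
    """Equalize components 0..max_index of each frame; leave the others untouched."""
    columns = [list(col) for col in zip(*frames)]  # one list per component
    for k in range(min(max_index + 1, len(columns))):
        columns[k] = equalize(columns[k])
    return [list(row) for row in zip(*columns)]    # back to one list per frame


if __name__ == "__main__":
    feats = [[1.0, 5.0, 9.0], [2.0, 6.0, 7.0], [3.0, 4.0, 8.0]]
    print(progressive_normalize(feats, max_index=1))
```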
![Page 8: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/8.jpg)
ECDF-based normalization results
![Page 9: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/9.jpg)
2-class normalization (1)
A first approach to parametric non-linear equalization.
PDFs are modeled as two-class Gaussian mixtures for each MFCC.
In practice the two classes are speech-like and noise-like.
EM is run on each sentence to obtain the Gaussian classes.
[Figure: panels for C0 and C1, Test01 and Test02]
![Page 10: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/10.jpg)
2-class normalization (2)
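Nonlinear parametric transformation:
$$\hat{x} = P(1|y)\left(\mu_{x,1} + \frac{\sigma_{x,1}}{\sigma_{y,1}}\,(y - \mu_{y,1})\right) + P(2|y)\left(\mu_{x,2} + \frac{\sigma_{x,2}}{\sigma_{y,2}}\,(y - \mu_{y,2})\right)$$
[Figure: equalization of C1 between Test02 (car) and Test01 (clean) of WSJ0 data]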
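The two-class scheme can be sketched as follows: fit a two-Gaussian mixture to one feature component with EM, then map each value through a posterior-weighted sum of per-class linear (mean and scale) transformations. This is a sketch under assumptions, not the authors' code: all function names are hypothetical, the EM initialisation and variance floor are arbitrary choices, and the reference class means and standard deviations are assumed to come from clean data.

```python
import math


def gauss(v, mu, sd):
    """Gaussian density N(v; mu, sd^2)."""
    return math.exp(-0.5 * ((v - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def posteriors(v, weights, means, stds):
    """Class posteriors P(k|v) for a two-component mixture."""
    lik = [w * gauss(v, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(lik)
    if total == 0.0:  # both densities underflowed
        return [0.5, 0.5]
    return [l / total for l in lik]


def fit_two_gaussians(data, iters=50):
    """EM for a 1-D two-Gaussian mixture (needs at least two samples)."""
    srt = sorted(data)
    half = len(srt) // 2
    # Initialisation (an assumption): lower and upper halves of the sorted data.
    means = [sum(srt[:half]) / half, sum(srt[half:]) / (len(srt) - half)]
    grand = sum(data) / len(data)
    sd0 = math.sqrt(max(sum((v - grand) ** 2 for v in data) / len(data), 1e-6))
    stds, weights = [sd0, sd0], [0.5, 0.5]
    for _ in range(iters):
        resp = [posteriors(v, weights, means, stds) for v in data]  # E-step
        for k in range(2):                                          # M-step
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            weights[k] = nk / len(data)
            means[k] = sum(r[k] * v for r, v in zip(resp, data)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, data)) / nk
            stds[k] = math.sqrt(max(var, 1e-6))  # variance floor (assumption)
    return weights, means, stds


def two_class_equalize(y, model_y, ref_means, ref_stds):
    """Posterior-weighted sum of per-class mean/scale transformations."""
    weights, means, stds = model_y
    out = []
    for v in y:
        post = posteriors(v, weights, means, stds)
        out.append(sum(
            p * (rm + rs / s * (v - m))
            for p, rm, rs, m, s in zip(post, ref_means, ref_stds, means, stds)))
    return out
```

A typical call would fit `model_y = fit_two_gaussians(y)` on the sentence being processed and then call `two_class_equalize(y, model_y, ref_means, ref_stds)`. The class order of the fitted model and the reference must match (here, lower-mean class first).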
![Page 11: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/11.jpg)
2-class normalization results
![Page 12: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/12.jpg)
![Page 13: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/13.jpg)
ECDF Features Normalization
HEQ as a non-linear speaker normalization technique using ECDF
[Figure: a) Reference (blue) and estimated (red) ECDF; b) Transformation; c) Original features; d) Transformed features]
![Page 14: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/14.jpg)
ECDF normalization for speaker adaptation (SA)
| System | Test01 WER (%) | Test01 WAC (%) |
|---|---|---|
| MLLR | 10.97 | 89.03 |
| BASELINE | 13.22 | 86.78 |
| AFE | 12.74 | 87.26 |
| ECDF NORM | 11.23 | 88.77 |
![Page 15: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/15.jpg)
ECDF Models Adaptation
Two approaches:
Pure equalization (“HEQ MOD”): new Gaussian distributions, obtained by a shift on the means ($X \rightarrow X_{HEQ}$) and a scale factor on the variances.
Equalization mixed with a linear transformation (“HEQ PLIN”): the linear transformation is $X_A = M X + B$, with $M'$, $B'$ chosen such that $D(X_A, X_{HEQ}) = \lVert M'X + B' - X_{HEQ} \rVert^2$ is minimum.
[Figure: axes labelled “Speaker Independent Features” and “Speaker Specific Features”]
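The “HEQ PLIN” criterion is an ordinary least-squares fit of a linear map to the equalized features. The sketch below solves it in closed form for a single feature component, which is a simplifying assumption: the slides do not say whether M is a full matrix or diagonal, and `fit_linear_to_heq` is a hypothetical name.

```python
def fit_linear_to_heq(x, x_heq):
    """Return (m, b) minimizing sum((m * x[i] + b - x_heq[i]) ** 2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_h = sum(x_heq) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxh = sum((a - mean_x) * (h - mean_h) for a, h in zip(x, x_heq))
    # Degenerate case (constant input): fall back to a pure shift.
    m = sxh / sxx if sxx > 0.0 else 1.0
    b = mean_h - m * mean_x
    return m, b


if __name__ == "__main__":
    print(fit_linear_to_heq([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]))
```

In a model-adaptation setting the fitted pair would then be applied to the Gaussian parameters rather than to the features, a step this sketch does not cover.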
![Page 16: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/16.jpg)
Models Adaptation
| System | Test01 WER (%) | Test01 WAC (%) |
|---|---|---|
| MLLR | 10.97 | 89.03 |
| BASELINE | 13.22 | 86.78 |
| HEQ MOD | 12.95 | 87.05 |
| HEQ PLIN | 13.31 | 86.52 |
![Page 17: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/17.jpg)
SA methods. Comparison
[Figure: Test01 WAC (%) by speaker adaptation method: Baseline, MLLR, HEQ MOD, HEQ PLIN, ECDF NORM]
![Page 18: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/18.jpg)
Future Work 1/2
[Figure: WAC (%); legend as extracted: MLLR MRC BASELINE AFE ECDF NORM]
SA models using MLLR are not robust against noise
Feature Normalization + MLLR
![Page 19: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/19.jpg)
Future Work 2/2
Non linear Feature Normalization and Model Adaptation
Development of further experiments with more complex tasks on WSJ1 database (spoke3 and spoke4)
![Page 20: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/20.jpg)
![Page 21: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/21.jpg)
Previous work on VAD
Voice activity detection:
Kullback-Leibler divergence: J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, “A New Kullback-Leibler VAD for Robust Speech Recognition”, IEEE Signal Processing Letters, Vol. 11, No. 2, pp. 266-269, Feb. 2004.
Long-term spectral divergence: J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, “Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information”, Speech Communication, Vol. 42/3-4, pp. 271-287, 2004.
Subband SNR estimation using OS filters: J. Ramírez, J. C. Segura, C. Benítez, A. de la Torre, A. Rubio, “An Effective Subband OSF-based VAD with Noise Reduction for Robust Speech Recognition”, to appear in IEEE Transactions on Speech and Audio Processing, 2005/2006.
Multiple observation likelihood ratio test: J. Ramírez, J. C. Segura, C. Benítez, L. García, A. Rubio, “Statistical Voice Activity Detection using a Multiple Observation Likelihood Ratio Test”, to appear in IEEE Signal Processing Letters.
![Page 22: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/22.jpg)
Likelihood ratio test
Generalization of Sohn’s VAD: J. Sohn, N. S. Kim, W. Sung, “A statistical model-based voice activity detection”, IEEE Signal Processing Letters, vol. 6 (1), pp. 1-3, 1999.
Two hypotheses are considered:
H0: y = n (absence of speech, silence)
H1: y = s + n (speech presence)
Optimum decision rule (Bayes classifier), where $\hat{\mathbf{y}}_l$ is the l-frame observation vector:
$$L(\hat{\mathbf{y}}_l) = \frac{p_{\mathbf{y}_l|H_1}(\hat{\mathbf{y}}_l|H_1)}{p_{\mathbf{y}_l|H_0}(\hat{\mathbf{y}}_l|H_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{P(H_0)}{P(H_1)}$$
LRT: likelihood ratio test.
LRT evaluation requires an adequate signal model.
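To make the decision rule concrete, the sketch below evaluates the log-likelihood ratio of one frame under the complex Gaussian model used in the cited Sohn et al. paper. That model is not spelled out on the slide, so treat it as an assumption here, along with the function names and the idea that the a priori and a posteriori SNRs per frequency bin are already available.

```python
import math


def frame_log_lr(gamma, xi):
    """Log-likelihood ratio of one frame, summed over frequency bins.

    gamma[k]: a posteriori SNR of bin k (|Y_k|^2 over noise variance), assumed given.
    xi[k]:    a priori SNR of bin k (speech over noise variance), assumed given.
    Per bin, the Gaussian model gives ln L_k = gamma*xi/(1+xi) - ln(1+xi).
    """
    return sum(g * x / (1.0 + x) - math.log(1.0 + x) for g, x in zip(gamma, xi))


def decide_speech(log_lr, p_h0=0.5, p_h1=0.5):
    """Bayes rule in the log domain: choose H1 when ln L > ln(P(H0)/P(H1))."""
    return log_lr > math.log(p_h0 / p_h1)


if __name__ == "__main__":
    llr = frame_log_lr([10.0, 8.0], [1.0, 2.0])
    print(llr, decide_speech(llr))
```

Summing over bins assumes the bins are statistically independent, which turns the product of per-bin ratios into a sum of logs.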
![Page 23: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/23.jpg)
Multiple observation likelihood ratio test
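MO-LRT (multiple observation LRT): given a set of N = 2m+1 consecutive observations
$$\{\hat{\mathbf{y}}_{l-m}, \ldots, \hat{\mathbf{y}}_{l-1}, \hat{\mathbf{y}}_l, \hat{\mathbf{y}}_{l+1}, \ldots, \hat{\mathbf{y}}_{l+m}\}$$
LRT:
$$L_N(\hat{\mathbf{y}}_{l-m}, \ldots, \hat{\mathbf{y}}_l, \ldots, \hat{\mathbf{y}}_{l+m}) = \frac{p_{\mathbf{y}_{l-m},\ldots,\mathbf{y}_l,\ldots,\mathbf{y}_{l+m}|H_1}(\hat{\mathbf{y}}_{l-m}, \ldots, \hat{\mathbf{y}}_l, \ldots, \hat{\mathbf{y}}_{l+m}|H_1)}{p_{\mathbf{y}_{l-m},\ldots,\mathbf{y}_l,\ldots,\mathbf{y}_{l+m}|H_0}(\hat{\mathbf{y}}_{l-m}, \ldots, \hat{\mathbf{y}}_l, \ldots, \hat{\mathbf{y}}_{l+m}|H_0)} \;\underset{H_0}{\overset{H_1}{\gtrless}}\; \frac{P(H_0)}{P(H_1)}$$
Under statistical independence:
$$L_N(\hat{\mathbf{y}}_{l-m}, \ldots, \hat{\mathbf{y}}_l, \ldots, \hat{\mathbf{y}}_{l+m}) = \prod_{k=l-m}^{l+m} \frac{p_{\mathbf{y}_k|H_1}(\hat{\mathbf{y}}_k|H_1)}{p_{\mathbf{y}_k|H_0}(\hat{\mathbf{y}}_k|H_0)}$$
Recursive Log-LRT:
$$\ell_N(\hat{\mathbf{y}}_{l-m}, \ldots, \hat{\mathbf{y}}_l, \ldots, \hat{\mathbf{y}}_{l+m}) = \sum_{k=l-m}^{l+m} \ln \frac{p_{\mathbf{y}_k|H_1}(\hat{\mathbf{y}}_k|H_1)}{p_{\mathbf{y}_k|H_0}(\hat{\mathbf{y}}_k|H_0)}$$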
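Since the log-LRT over the window is a sum of per-frame log-likelihood ratios, it can be updated recursively: add the frame entering the window and subtract the one leaving it. The sketch below takes the per-frame values as given and is only an illustration. The function names, the truncated windows at the utterance edges, and the fixed threshold are assumptions of this sketch.

```python
def mo_lrt_scores(frame_llr, m):
    """Running sum of per-frame log-LRs over windows of up to 2*m + 1 frames."""
    n = len(frame_llr)
    if n == 0:
        return []
    acc = sum(frame_llr[:min(n, m + 1)])  # window for frame 0 covers frames 0..m
    scores = [acc]
    for l in range(1, n):
        if l + m < n:
            acc += frame_llr[l + m]       # frame entering the window
        if l - m - 1 >= 0:
            acc -= frame_llr[l - m - 1]   # frame leaving the window
        scores.append(acc)
    return scores


def mo_lrt_decisions(frame_llr, m, threshold=0.0):
    """Flag frame l as speech when its windowed log-LR exceeds the threshold."""
    return [s > threshold for s in mo_lrt_scores(frame_llr, m)]


if __name__ == "__main__":
    print(mo_lrt_scores([1, 2, 3, 4, 5], m=1))
```

The decision for frame l depends on frames up to l+m, so m is also the delay of the detector in frames.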
![Page 24: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/24.jpg)
Analysis: Optimum delay
[Figure: probability distributions for speech and non-speech, panels for m = 1, 4, 6, 8]
[Figure: classification errors (non-speech, speech, total error) as a function of m]
Increasing m (the number of observations):
Reduces the overlap between the distributions.
Misclassification errors are reduced for speech, with a moderate increase for non-speech.
![Page 25: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/25.jpg)
Analysis: Optimum delay
ROC analysis: AURORA 3 Spanish (High-Ch1, 5 dB)
[Figure: ROC curves for Sohn’s VAD and MO-LRT]
![Page 26: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/26.jpg)
Speech recognition experiments
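[Block diagram, blocks as extracted: VAD, noise estimation, Wiener filtering (WF), MFCC, HTK, frame dropping (FD)]

AURORA 2: average WAcc (%) for clean training (CT) and multicondition training (MCT)

| VAD | Average WAcc (%) |
|---|---|
| MO-LRT | 86.14 |
| G.729 | 70.32 |
| AMR1 | 74.29 |
| AMR2 | 82.89 |
| AFE | 83.29 |
| Ref. VAD | 86.86 |
| Woo | 81.09 |
| Li | 82.11 |
| Marzinzik | 85.23 |
| Sohn | 83.80 |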
![Page 27: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/27.jpg)
Speech recognition experiments
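AURORA 3: Spanish SpeechDat-Car, WAcc (%) (WM: well matched, MM: medium mismatch, HM: high mismatch)

| WAcc (%) | MO-LRT | G.729 | AMR1 | AMR2 | AFE |
|---|---|---|---|---|---|
| WM | 96.33 | 88.62 | 94.65 | 95.67 | 95.28 |
| MM | 91.61 | 72.84 | 80.59 | 90.91 | 90.23 |
| HM | 87.43 | 65.50 | 62.41 | 85.77 | 77.53 |
| Average | 91.79 | 75.65 | 74.33 | 90.78 | 87.68 |

| WAcc (%) | MO-LRT | Woo | Li | Marzinzik | Sohn |
|---|---|---|---|---|---|
| WM | 96.33 | 95.35 | 91.82 | 94.29 | 96.07 |
| MM | 91.61 | 89.30 | 77.45 | 89.81 | 91.64 |
| HM | 87.43 | 83.64 | 78.52 | 79.43 | 84.03 |
| Average | 91.79 | 89.43 | 82.60 | 87.84 | 90.58 |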
![Page 28: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/28.jpg)
Work in progress
Statistical tests in the bispectrum domain:
J. M. Górriz, et al., “Voice Activity Detection Based on HOS”, 8th International Work-Conference on Artificial Neural Networks (IWANN'2005)
J. M. Górriz, et al., “Statistical Tests for Voice Activity Detection”, Non-linear Speech Processing (NOLISP’2005), 2005.
J. M. Górriz, et al., “Bispectra analysis-based VAD for robust speech recognition”, First International Work-Conference on the Interplay Between Natural and Artificial Computation (IWINAC’2005)
Bispectrum LRT (application of MO-LRT on the bispectra)
J. M. Górriz, et al., “An Improved MO-LRT VAD Based on a Bispectra Gaussian Model”, submitted to Electronics Letters.
![Page 29: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/29.jpg)
GSTC-UGR speech recognition results
[Block diagram, blocks as extracted: noise reduction, LTSE VAD, frame dropping, segmental ECDF (Gaussian ref.), progressive, HTK]
LTSE VAD: J. Ramírez, et al., “Efficient Voice Activity Detection Algorithms Using Long-Term Speech Information”, Speech Communication, Vol. 42/3-4, pp. 271-287, 2004.
Segmental ECDF: 60-frame delay. J. C. Segura, et al., “Cepstral Domain Segmental Nonlinear Feature Transformations for Robust Speech Recognition”, IEEE Signal Processing Letters, Vol. 11, No. 5, pp. 517-520, 2004.
Progressive: Log-E plus up to the 4th cepstral coefficient.
![Page 30: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/30.jpg)
GSTC-UGR speech recognition results
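AURORA 2, WAcc (%)

| Training | System | Set A | Set B | Set C | Average |
|---|---|---|---|---|---|
| Multicondition | GSTC-UGR | 90.58 | 90.23 | 89.10 | 90.14 |
| Multicondition | HIWIRE baseline | 88.40 | 88.96 | 88.97 | 88.74 |
| Clean | GSTC-UGR | 86.01 | 86.84 | 85.00 | 86.14 |
| Clean | HIWIRE baseline | 64.00 | 69.10 | 64.73 | 66.18 |

WER relative improvements: 12% (MCT), 59% (CT)

AURORA 3, WAcc (%)

| System | Italian WM | Italian MM | Italian HM | Spanish WM | Spanish MM | Spanish HM | Average WM | Average MM | Average HM |
|---|---|---|---|---|---|---|---|---|---|
| GSTC-UGR | 96.94 | 91.89 | 86.19 | 96.52 | 92.03 | 89.95 | 96.73 | 91.96 | 88.07 |
| HIWIRE baseline | 94.40 | 87.14 | 46.75 | 89.30 | 83.18 | 65.50 | 91.85 | 85.16 | 56.13 |

WER relative improvements: 60% (WM), 46% (MM), 73% (HM)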
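The relative improvements quoted above can be reproduced from the average word accuracies, assuming WER = 100 - WAcc (an assumption that matches the WER/WAC pairs shown earlier in the deck). The helper name below is hypothetical.

```python
def relative_wer_improvement(wacc_system, wacc_baseline):
    """Relative WER reduction (%) of a system over a baseline, from WAcc values."""
    wer_system = 100.0 - wacc_system
    wer_baseline = 100.0 - wacc_baseline
    return 100.0 * (wer_baseline - wer_system) / wer_baseline


if __name__ == "__main__":
    # AURORA 2 averages: multicondition and clean training.
    print(round(relative_wer_improvement(90.14, 88.74)))
    print(round(relative_wer_improvement(86.14, 66.18)))
    # AURORA 3 averages: WM, MM, HM.
    for system, baseline in [(96.73, 91.85), (91.96, 85.16), (88.07, 56.13)]:
        print(round(relative_wer_improvement(system, baseline)))
```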
![Page 31: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/31.jpg)
GSTC-UGR speech recognition results
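AURORA 4 WER (%) (clean training experiments)

| Test | 1 | 2 | 3 | 4 | 5 | 6 | 7 | Avg. |
|---|---|---|---|---|---|---|---|---|
| GSTC-UGR | 13.37 | 19.52 | 37.53 | 40.22 | 39.19 | 37.16 | 39.30 | 32.33 |
| HIWIRE baseline | 13.22 | 24.68 | 46.00 | 47.62 | 52.67 | 44.79 | 54.73 | 40.53 |

| Test | 8 | 9 | 10 | 11 | 12 | 13 | 14 | Avg. |
|---|---|---|---|---|---|---|---|---|
| GSTC-UGR | 21.40 | 30.76 | 45.49 | 48.43 | 50.46 | 45.30 | 48.77 | 41.52 |
| HIWIRE baseline | 22.58 | 36.21 | 55.40 | 58.31 | 65.34 | 54.11 | 62.28 | 50.60 |

WER relative improvements: 20% (test sets 1 to 7), 17% (test sets 8 to 14)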
![Page 32: HIWIRE MEETING Granada, June 9-10, 2005](https://reader036.vdocuments.site/reader036/viewer/2022062315/56814c58550346895db97677/html5/thumbnails/32.jpg)