Higher Order Cepstral Moment Normalization (HOCMN) for Robust Speech Recognition

Speaker: Chang-wen Hsu
Advisor: Lin-shan Lee
2007/02/08



Page 2: Outline

• Introduction
    • CMS/CMVN/HEQ
• Higher Order Cepstral Moment Normalization (HOCMN)
    • Even order HOCMN
    • Odd order HOCMN
    • Cascade system
• Fundamental principles
• Experimental Results
• Conclusions

Page 3: Introduction

• Feature normalization in the cepstral domain is widely used in robust speech recognition:
    • CMS: normalizes the first moment
    • CMVN: normalizes the first and second moments
    • Cepstrum Third-order Normalization (CTN): normalizes the first three moments (Electronics Letters, 1999)
    • HEQ: normalizes the full distribution (moments of all orders)
• How about normalizing only a few higher order moments?
    • Higher order moments are dominated more by the samples with larger values
    • Normalizing only a few higher order moments may be good enough, while avoiding over-normalization

Page 4: Introduction

• Cepstral Normalization
    • CMS: $X_{CMS}(n) = X(n) - E[X(n)]$
    • CMVN: $X_{CMVN}(n) = \dfrac{X(n) - E[X(n)]}{\sigma_X}$

[Figure; surviving labels: "Time", "progressively"]
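The two definitions on this slide can be sketched in a few lines of plain Python. This is a minimal illustration, not the speaker's code: the expectation is taken as the time average over the whole track, and `cms`, `cmvn` and the sample values are hypothetical names and data.

```python
import math

def cms(x):
    """Cepstral mean subtraction: remove the time average of one coefficient track."""
    mean = sum(x) / len(x)
    return [v - mean for v in x]

def cmvn(x):
    """Cepstral mean and variance normalization: zero mean, unit variance."""
    centered = cms(x)
    std = math.sqrt(sum(v * v for v in centered) / len(centered))
    return [v / std for v in centered]  # assumes the track is not constant

track = [1.0, 2.0, 4.0, 5.0]   # hypothetical values of one cepstral coefficient
normalized = cmvn(track)
```

Each cepstral dimension would be processed independently in this way.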

Page 5: Introduction

• Histogram Equalization
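Only the title of this slide survives, so the following is a generic sketch of histogram equalization rather than the variant the speaker used. It assumes a standard Gaussian reference distribution and a mid-rank estimate of the empirical CDF; `heq_to_gaussian` and the sample values are hypothetical.

```python
from statistics import NormalDist

def heq_to_gaussian(x):
    """Map each sample through its empirical CDF, then the inverse CDF of N(0,1)."""
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i])   # frame indices by ascending value
    reference = NormalDist()                       # assumed reference: N(0,1)
    out = [0.0] * n
    for rank, i in enumerate(order):
        p = (rank + 0.5) / n                       # mid-rank CDF estimate in (0, 1)
        out[i] = reference.inv_cdf(p)
    return out

track = [0.1, 0.2, 0.4, 3.0, 9.0]   # hypothetical, strongly skewed values
equalized = heq_to_gaussian(track)
```

Because only the rank of each sample matters, the output follows the reference distribution regardless of the input shape, which is the sense in which HEQ normalizes moments of all orders. Tied values are not treated specially in this sketch.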

Page 6: Higher Order Cepstral Moment Normalization

• If the distribution of the cepstral coefficients can be assumed to be quasi-Gaussian:
    • Odd order moments can be normalized to zero
    • Even order moments can be normalized to some specific values
• Notation:
    • $X(n)$: a certain cepstral coefficient of the n-th frame
    • $X_{[k]}(n)$: with the k-th moment normalized
    • $X_{[k,l]}(n)$: with both the k-th and l-th moments normalized
    • $X_{[k,l,m]}(n)$: with the k-th, l-th and m-th moments normalized
    • HOCMN[k,l,m]: an operator normalizing the k-th, l-th and m-th moments
• For example: [example lost in extraction]

Page 7: Cepstral Moment Normalization

• Moment estimation: time average of the MFCC parameters
• Purpose:
    • For odd order L: $E[X_{[L]}^{L}(n)] = 0$
    • For even order N: [equation lost in extraction]
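A short sketch of the moment estimate described here, with the expectation replaced by a time average. The even-order target values did not survive in the transcript; as an assumption, this sketch uses the moments of N(0,1), which are 0 for odd orders and (N-1)!! for even orders. The function names and sample values are hypothetical.

```python
def moment(x, order):
    """Time-average estimate of E[X^order] over the frames in x."""
    return sum(v ** order for v in x) / len(x)

def gaussian_moment(order):
    """E[Z^order] for Z ~ N(0,1): zero if order is odd, (order-1)!! if even."""
    if order % 2 == 1:
        return 0.0
    result = 1.0
    for k in range(order - 1, 0, -2):   # (order-1) * (order-3) * ... * 1
        result *= k
    return result

track = [-2.0, -1.0, 1.0, 2.0]   # hypothetical zero-mean coefficient track
third = moment(track, 3)
fourth = moment(track, 4)
```

For this symmetric track the third moment is already zero, while the fourth moment (8.5) is far from the Gaussian value of 3, so only an even-order normalization would change it.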

Page 8: Even order HOCMN

• Only the moment of a single even order N can be normalized, and CMS can always be performed in advance
• Therefore, the new feature coefficients can be expressed as [equation lost in extraction]
• Let the desired value of the N-th moment of the new feature coefficients be [symbol lost in extraction], that is [equation lost in extraction]
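The equations on this slide were lost, so the sketch below is one plausible reading rather than the speaker's formula: CMS is applied first, then a single gain is chosen so that the N-th moment reaches a desired value. `hocmn_even`, its parameter `target` and the sample values are hypothetical.

```python
def hocmn_even(x, order, target):
    """Sketch of HOCMN[1, order]: CMS first, then one gain to fix the moment."""
    mean = sum(x) / len(x)
    centered = [v - mean for v in x]                        # CMS in advance
    m = sum(v ** order for v in centered) / len(centered)   # current N-th moment
    gain = (target / m) ** (1.0 / order)                    # gain**order * m == target
    return [gain * v for v in centered]

track = [1.0, 2.0, 4.0, 5.0]        # hypothetical coefficient track
out = hocmn_even(track, 4, 3.0)     # 3 is the fourth moment of N(0,1)
```

With `order=2` and `target=1.0` the same function reduces to CMVN, which matches the identity CMVN = HOCMN[1,2] stated later in the deck.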

Page 9: Even order HOCMN

• Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)
• CMVN = HOCMN[1,2]

Page 10: Even order HOCMN

• Evaluation of the expectation value for the moments
    • Sample average over a reference interval
        • Full utterance
        • Moving window of l frames

…… X(n-3) X(n-2) X(n-1) X(n) X(n+1) X(n+2) X(n+3) ……
[Diagram: a window of l frames around X(n), the frame to be normalized]

[Figure: word accuracy (Acc., axis from 80.40 to 82.40) versus window length l (60 to 120) for the curve labeled [1,100]; l=86 is best]
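The moving-window estimate can be sketched as follows, using CMVN as the normalization for brevity. How the window is placed and how utterance edges are handled is not stated on the slide; this sketch assumes a window of `l // 2` frames on either side of the current frame, truncated at the edges. `windowed_cmvn` and the sample values are hypothetical.

```python
import math

def windowed_cmvn(x, l):
    """Normalize each frame with statistics from up to l // 2 frames on either side."""
    half = l // 2
    out = []
    for n in range(len(x)):
        lo = max(0, n - half)                 # window is truncated at the edges
        hi = min(len(x), n + half + 1)
        window = x[lo:hi]
        mean = sum(window) / len(window)
        var = sum((v - mean) ** 2 for v in window) / len(window)
        out.append((x[n] - mean) / math.sqrt(var) if var > 0 else 0.0)
    return out

track = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]   # hypothetical coefficient track
out = windowed_cmvn(track, 3)                  # the slides report l=86 as best
```

On this linear ramp every interior frame sits at its window mean and maps to zero; only the two edge frames, whose windows are truncated, map to -1 and +1.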

Page 11: Experimental results

• Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)

[Figure; surviving labels: CMVN (l=86), CMVN (full-utterance)]

Page 12: Odd order HOCMN (1/3)

• Besides the first moment (CMS), only one additional moment of odd order L can be normalized
• The L-th HOCMN can be obtained from the (L-1)-th HOCMN (L-1 is an even number, as discussed previously)
• Then, the new feature coefficients can be expressed as [equation lost in extraction]
• "a" and "c" are to be solved

Page 13: Odd order HOCMN (2/3)

• To solve for "a" and "c":
    • The first moment is set to zero
    • The L-th moment is set to zero
• After some mathematics and approximation: [equation lost in extraction]

Page 14: Odd order HOCMN (3/3)

• Because the formula for "a" above is only an approximation, the solution is refined recursively and can be obtained in about two iterations
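The mapping and the closed-form approximation for "a" did not survive in the transcript. The sketch below therefore assumes a quadratic mapping $y = a x^2 + x + c$, which is a guess consistent with two unknowns "a" and "c", and derives its own first-order approximation: $c = -a\,E[x^2]$ keeps the mean at zero, and linearizing $E[y^L] = 0$ in $a$ gives $a \approx -E[x^L] / \big(L\,(E[x^{L+1}] - E[x^2]E[x^{L-1}])\big)$. It should not be read as the speaker's formula. All names and values are hypothetical.

```python
def moment(x, order):
    """Time-average estimate of E[X^order]."""
    return sum(v ** order for v in x) / len(x)

def odd_step(x, L):
    """One first-order update y = a*x^2 + x + c (assumed form; x must be zero-mean)."""
    m2 = moment(x, 2)
    # Linearize E[y^L] = 0 in a, with c = -a*m2 so that E[y] stays zero.
    a = -moment(x, L) / (L * (moment(x, L + 1) - m2 * moment(x, L - 1)))
    c = -a * m2
    return [a * v * v + v + c for v in x]

def hocmn_odd(x, L, iterations=2):
    """Re-estimate and re-apply the update, since each solve is only approximate."""
    mean = sum(x) / len(x)
    y = [v - mean for v in x]          # CMS first
    for _ in range(iterations):
        y = odd_step(y, L)
    return y

track = [-1.0, -1.0, -1.0, 0.0, 0.0, 3.0]   # hypothetical, right-skewed values
before = moment(track, 3)                    # track is already zero-mean
after = moment(hocmn_odd(track, 3), 3)
```

On this deliberately extreme six-sample track the third moment shrinks with every pass but does not reach zero in two; smoother, more Gaussian-like data would be expected to converge faster. In the deck the input to this stage is the output of the (L-1)-th even order HOCMN, whereas the sketch accepts any track and only applies CMS.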

Page 15: Cascade system

• Cascading an odd order operator HOCMN[1,L] (L is an odd number) and an even order operator HOCMN[1,N] (N is an even number) yields an operator HOCMN[1,L,N]
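One way to see why such a cascade can work, assuming the even order stage is a pure gain (an assumption, since the even order equation is not preserved): a gain g multiplies the L-th moment by g^L, so a moment that the odd order stage has already set to zero stays zero. The function name and data below are hypothetical.

```python
def moment(x, order):
    """Time-average estimate of E[X^order]."""
    return sum(v ** order for v in x) / len(x)

def even_scale(x, N, target):
    """Even order stage (assumed form): one gain so that E[y^N] equals target."""
    gain = (target / moment(x, N)) ** (1.0 / N)
    return [gain * v for v in x]

# Hypothetical output of an odd order stage: zero mean and zero third moment.
after_odd = [-2.0, -1.0, 0.0, 0.0, 1.0, 2.0]
after_even = even_scale(after_odd, 4, 3.0)   # 3 is the fourth moment of N(0,1)
```

After the gain, the first and third moments are still zero and the fourth moment equals the target, so all three constraints of HOCMN[1,3,4] hold together in this sketch.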

Page 16: Experimental results

• Aurora 2, clean condition training, word accuracy averaged over 0~20dB and all types of noise (sets A, B, C)

[Figure; surviving labels: CMVN, CTN = HOCMN[1,2,3], CMVN (l=86)]

Page 17: Skewness and Kurtosis

• Skewness
    • Third moment about the mean, normalized by the standard deviation
    • Measures the departure of the pdf from symmetry
        • Positive/negative values indicate skew to the right/left
        • Zero indicates a symmetric pdf
• Kurtosis
    • Fourth moment about the mean, normalized by the standard deviation
    • Peaked or "flat with tails of large size" as compared to the standard Gaussian
        • "3" is the fourth moment of N(0,1)
        • Positive/negative values indicate heavier/lighter tails than the Gaussian
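The two statistics can be computed directly from their textbook definitions, $E[(X-\mu)^3]/\sigma^3$ and $E[(X-\mu)^4]/\sigma^4 - 3$. The slide's own formulas are not preserved, so subtracting 3 is inferred from the remark about N(0,1); the function name and data are hypothetical.

```python
import math

def skewness_and_kurtosis(x):
    """Standardized third moment, and standardized fourth moment minus 3."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(v ** 2 for v in centered) / n)
    skew = sum(v ** 3 for v in centered) / n / std ** 3
    kurt = sum(v ** 4 for v in centered) / n / std ** 4 - 3.0
    return skew, kurt

right_tailed = [-1.0, -1.0, -1.0, 0.0, 0.0, 3.0]   # hypothetical values
skew, kurt = skewness_and_kurtosis(right_tailed)
```

For this track the skewness is positive (about 1.41, skewed to the right) and the kurtosis is 0.5, that is, heavier-tailed than a Gaussian.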

Page 18: Skewness and Kurtosis

• The first moment is always normalized
• Define: generalized skewness of odd order L: $S_L = E[X^L]$, L an odd integer
    • L is not necessarily 3
    • Similar in meaning to skewness (skew to the right or left), except in the sense of the L-th moment
• Define: generalized kurtosis of even order N: [equation lost in extraction]
    • N is not necessarily 4
    • Similar in meaning to kurtosis (peaked or flat), except in the sense of the N-th moment

Page 19: Skewness and Kurtosis

• Normalizing an odd order moment constrains the pdf to be symmetric about the origin
    • Except in the sense of the L-th moment
• Normalizing an even order moment constrains the pdf to be "equally flat with tails of equal size"
    • Except in the sense of the N-th moment

Page 20: Generalized Moments

• HOCMN with non-integer moment orders
    • The orders of the normalized moments are not necessarily integers
• Generalized moments
    • Type 1: [equation lost in extraction]
        • Reduces to an odd order moment when u is an odd integer L (e.g., L=1 or 3)
    • Type 2: [equation lost in extraction]
        • Reduces to an even order moment when u is an even integer N (e.g., N=2 or 4)
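The two definitions are missing from the transcript. One pair of forms that has the stated reduction properties is $E[\mathrm{sign}(X)\,|X|^u]$ for Type 1 and $E[|X|^u]$ for Type 2; the sketch below assumes exactly these, which may differ from what the slide showed. The function names and data are hypothetical.

```python
import math

def generalized_moment_type1(x, u):
    """Sign-preserving moment E[sign(X) * |X|^u]; equals E[X^u] for odd integer u."""
    return sum(math.copysign(abs(v) ** u, v) for v in x) / len(x)

def generalized_moment_type2(x, u):
    """Absolute moment E[|X|^u]; equals E[X^u] for even integer u."""
    return sum(abs(v) ** u for v in x) / len(x)

track = [-1.0, -1.0, -1.0, 0.0, 0.0, 3.0]   # hypothetical zero-mean values
odd_like = generalized_moment_type1(track, 3)       # same as the third moment
even_like = generalized_moment_type2(track, 4)      # same as the fourth moment
fractional = generalized_moment_type2(track, 2.5)   # a non-integer order
```

Taking the absolute value before the power keeps the result real for any positive u, which is what makes non-integer orders usable at all.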

Page 21: Experimental Setup

• Aurora 2 database
    • Training: clean condition training
    • Testing: sets A, B and C
    • Development: all from the clean training data
• 39-dimensional feature coefficients
    • C0~C12 MFCC, Δ, Δ²
    • Normalization performed on C0~C12

Page 22: Experimental Results

• Higher order moments can yield more robust features
• Normalizing only three orders of moments is better than normalizing the full distribution

Page 23: Experimental Results

Page 24: Experimental Results

Page 25: PDF Analysis

• HEQ
    • Overfits to the Gaussian
    • Loses the original statistics
• HOCMN
    • Fits the generalized skewness and kurtosis
    • Retains more of the nature of the speech

[Figure; surviving panel labels: HEQ, HOCMN, Original C0 & C1]

Page 26: Distance Analysis

• Distance definition: [equation lost in extraction]
• HOCMN yields a smaller distance between clean and noisy speech
• The distance reduction follows a trend similar to that of the error rate reduction

Page 27: Experimental Results

• Slight improvement for HOCMN with non-integer moment orders
    • Especially for lower SNR values
• Other robust techniques can be combined with it

Page 28: Experimental Results

Page 29: Experimental Results

• For multi-condition training:
    • HOCMN performs better than CMVN for all SNR values
    • HOCMN performs better than HEQ for higher SNR values

Page 30: Conclusions

• We proposed a unified framework for higher order cepstral moment normalization
• Normalizing higher order moments gives more robust features
• The parameter set can be appropriately selected with a development set
• Skewness, kurtosis and distance analyses further demonstrate the concepts behind the normalization techniques