
Page 1: Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton

Minimum Phoneme Error Based Heteroscedastic Linear Discriminant Analysis For Speech Recognition

Bing Zhang and Spyros Matsoukas, BBN Technologies, 50 Moulton St., Cambridge

Reporter: Chang Chih Hao


Introduction

• LDA and HLDA
  – Better classification accuracy
  – Some common limitations:
    • Neither assumes any prior knowledge of confusable hypotheses.
    • Their objective functions do not directly relate to the word error rate (WER).
• Minimum Phoneme Error (MPE)
  – Minimizes phoneme errors in lattice-based training frameworks.
  – Since this criterion is closely related to WER, MPE-HLDA tends to be more robust than other projection methods, which makes it potentially better suited to a wider variety of features.


MPE Objective Function

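• MPE-HLDA model

$$\hat{\mu}_m = A\,\mu_m,\qquad \hat{\Sigma}_m = \operatorname{diag}\!\left(A\,C_m\,A^T\right),\qquad \hat{o}_t = A\,o_t$$

• MPE-HLDA aims at minimizing the expected number of phoneme errors introduced by the MPE-HLDA model in a given hypothesis lattice, or, equivalently, at maximizing the function

$$F_{\mathrm{MPE}}(A) = \sum_{r=1}^{R} \sum_{w_r} P_k(w_r \mid O_r)\, \mathcal{A}(w_r) \qquad (4)$$

where
– $R$ is the total number of training utterances,
– $O_r$ is the sequence of p-dimensional (projected) observation vectors in utterance $r$,
– $\mathcal{A}(w_r)$ is the "raw accuracy" score of word hypothesis $w_r$,
– $\mu_m$ and $C_m$ are the mean and the full covariance matrix of Gaussian $m$ before projection.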


MPE Objective Function

– $P_k(w_r \mid O_r)$ is the posterior probability of hypothesis $w_r$ in the lattice:

$$P_k(w_r \mid O_r) = \frac{P(O_r \mid w_r)^k\, P(w_r)}{\sum_{w} P(O_r \mid w)^k\, P(w)}$$

– $P(w_r)$ is the language model probability of hypothesis $w_r$,
– $k$ is an acoustic scaling factor, applied to reduce the dynamic range of the acoustic scores, thereby avoiding the concentration of all posterior mass in the top-1 hypothesis of the lattice.
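A minimal pure-Python sketch of the posterior above and of one utterance's contribution to (4). It uses a toy three-hypothesis list in place of a real lattice. The scores, the accuracies and the scale k = 1/12 are made-up illustrative values, and the function names are hypothetical. The example shows one thing: a small k spreads out the posterior mass that k = 1 puts almost entirely on the best hypothesis.

```python
import math

def scaled_posteriors(acoustic_loglik, lm_logprob, k):
    """P_k(w|O) = P(O|w)^k P(w) / sum_w' P(O|w')^k P(w'), computed in the log domain."""
    scores = [k * a + l for a, l in zip(acoustic_loglik, lm_logprob)]
    top = max(scores)                      # log-sum-exp trick for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def mpe_utterance_score(posteriors, raw_accuracies):
    """One utterance's term of F_MPE: sum_w P_k(w|O_r) * raw accuracy of w."""
    return sum(p * a for p, a in zip(posteriors, raw_accuracies))

# Toy "lattice" with three hypotheses (illustrative numbers only).
acoustic = [-1000.0, -1010.0, -1030.0]                  # log P(O_r | w)
lm = [math.log(0.2), math.log(0.5), math.log(0.3)]      # log P(w)
accuracy = [3.0, 2.0, 1.0]                              # raw phone accuracy of each w

for k in (1.0, 1.0 / 12):
    post = scaled_posteriors(acoustic, lm, k)
    print(f"k={k:.3f}  posteriors={[round(p, 3) for p in post]}  "
          f"F_r={mpe_utterance_score(post, accuracy):.3f}")
```

With k = 1 the 10-nat acoustic gap between the first two hypotheses swamps the language model. With k = 1/12 the gap shrinks to under one nat, so competing hypotheses keep a non-negligible share of the posterior and can contribute to the gradient.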


MPE Objective Function

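• It can be shown that the derivative of (4) with respect to A is

$$\frac{\partial F_{\mathrm{MPE}}}{\partial A} = \sum_{r=1}^{R} \sum_{q \in r} k\, D(q,r)\, \frac{\partial \log P(O_r \mid q, r)}{\partial A} \qquad (6)$$

where

$$D(q,r) = P(q \mid O_r)\,\bigl(c(q,r) - \bar{c}_r\bigr)$$

– $P(q \mid O_r)$ is the posterior probability of arc $q$ in the lattice of utterance $r$,
– $\bar{c}_r$ is the MPE score of utterance $r$ (posterior-weighted average accuracy over all hypotheses),
– $c(q,r)$ is the average accuracy over all hypotheses that contain arc $q$.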
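A minimal sketch of $D(q,r)$ for one utterance. It assumes an N-best list instead of a lattice. Real MPE training obtains arc posteriors and $c(q,r)$ with forward-backward passes over the lattice; this toy version replaces them with explicit sums over whole hypotheses. The arc labels, posteriors and accuracies are hypothetical.

```python
from collections import defaultdict

def arc_mpe_weights(hypotheses):
    """Return {arc: D(q, r)} for one utterance.

    hypotheses: list of (posterior, raw_accuracy, arcs); posteriors sum to 1 and
    arcs is an iterable of hashable arc IDs (hypothetical labels in this sketch).
    """
    # c_bar_r: posterior-weighted average accuracy over all hypotheses (MPE score).
    c_bar = sum(post * acc for post, acc, _ in hypotheses)
    arc_post = defaultdict(float)   # P(q | O_r): total posterior of hypotheses through q
    arc_acc = defaultdict(float)    # sum of post * acc over hypotheses containing q
    for post, acc, arcs in hypotheses:
        for q in set(arcs):
            arc_post[q] += post
            arc_acc[q] += post * acc
    weights = {}
    for q, pq in arc_post.items():
        c_q = arc_acc[q] / pq if pq > 0 else c_bar   # c(q, r)
        weights[q] = pq * (c_q - c_bar)             # D(q, r)
    return weights

nbest = [
    (0.5, 3.0, ["the", "cat", "sat"]),
    (0.3, 2.0, ["the", "hat", "sat"]),
    (0.2, 1.0, ["a", "hat", "sat"]),
]
for arc, d in sorted(arc_mpe_weights(nbest).items()):
    print(f"{arc:>4}  D = {d:+.3f}")
```

The results follow the usual MPE pattern. An arc shared by every hypothesis ("sat") gets zero weight. Arcs that lie mostly on better-than-average paths get positive weight, and their competitors get negative weight.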


MPE Objective Function

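$$\frac{\partial \log P(O_r \mid q, r)}{\partial A} = \sum_{t=S_q}^{E_q} \sum_{m} \gamma^{r}_{qm}(t)\, \frac{\partial \log P(\hat{o}_t \mid m)}{\partial A}$$

– $S_q$ and $E_q$ are the begin and end times of arc $q$,
– $\gamma^{r}_{qm}(t)$ denotes the posterior probability of Gaussian $m$ in arc $q$ at time $t$.

For a single Gaussian of the MPE-HLDA model,

$$\frac{\partial \log P(\hat{o}_t \mid m)}{\partial A} = \hat{\Sigma}_m^{-1}\left(\hat{\Sigma}_m^{-1} P_{mt} - I_p\right) A\,C_m - \hat{\Sigma}_m^{-1} A\,R_{mt}$$

where

$$P_{mt} = \operatorname{diag}\!\left(A\,(o_t-\mu_m)(o_t-\mu_m)^T A^T\right),\qquad R_{mt} = (o_t-\mu_m)(o_t-\mu_m)^T$$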
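A minimal NumPy sketch of the single-Gaussian derivative. It assumes the per-Gaussian log-likelihood is $\log \mathcal{N}(A o_t;\, A\mu_m,\, \operatorname{diag}(A C_m A^T))$ with no $\log|A|$ Jacobian term, which is the form whose derivative matches the formula above. It also assumes $C_m$ is symmetric. The code implements the analytic derivative and checks it against central finite differences on random data. The function names and sizes are hypothetical.

```python
import numpy as np

def log_gauss_projected(A, o, mu, C):
    """log N(A o; A mu, diag(A C A^T)) for one Gaussian, without a log|A| term."""
    sigma = np.diag(A @ C @ A.T)              # diagonal of Sigma_hat_m, shape (p,)
    x = A @ (o - mu)                          # projected deviation, shape (p,)
    return -0.5 * np.sum(np.log(2 * np.pi * sigma) + x ** 2 / sigma)

def grad_log_gauss(A, o, mu, C):
    """Analytic d/dA: S^-1 (S^-1 P - I) A C - S^-1 A R, with S = Sigma_hat_m."""
    p = A.shape[0]
    d = o - mu
    R = np.outer(d, d)                        # R_mt, n x n
    P = np.diag(np.diag(A @ R @ A.T))         # P_mt, p x p diagonal
    S_inv = np.diag(1.0 / np.diag(A @ C @ A.T))
    return S_inv @ (S_inv @ P - np.eye(p)) @ A @ C - S_inv @ A @ R

rng = np.random.default_rng(0)
n, p = 6, 3
A = rng.standard_normal((p, n))
B = rng.standard_normal((n, n))
C = B @ B.T + n * np.eye(n)                   # random symmetric positive-definite C_m
mu, o = rng.standard_normal(n), rng.standard_normal(n)

# Central finite differences, one entry of A at a time.
eps = 1e-6
num = np.zeros_like(A)
for i in range(p):
    for j in range(n):
        E = np.zeros_like(A)
        E[i, j] = eps
        num[i, j] = (log_gauss_projected(A + E, o, mu, C)
                     - log_gauss_projected(A - E, o, mu, C)) / (2 * eps)

print("max abs difference:", np.max(np.abs(num - grad_log_gauss(A, o, mu, C))))
```

The check works row by row. Because $\hat{\Sigma}_m$ is diagonal, row $i$ of $A$ affects only the $i$-th projected dimension. This is why the matrix formula reduces to independent per-row derivatives.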


MPE Objective Function

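• Therefore, Eq.(6) can be rewritten as

$$\frac{\partial F_{\mathrm{MPE}}}{\partial A} = k \sum_{m} \left[\hat{\Sigma}_m^{-1}\left(\hat{\Sigma}_m^{-1} g_m - \gamma_m I_p\right) A\,C_m - J_m\right] \qquad (12)$$

where

$$\gamma_m = \sum_{r}\sum_{q \in r}\sum_{t=S_q}^{E_q} D(q,r)\,\gamma^{r}_{qm}(t)$$

$$g_m = \sum_{r}\sum_{q \in r}\sum_{t=S_q}^{E_q} D(q,r)\,\gamma^{r}_{qm}(t)\,P_{mt} \qquad (p \times p)$$

$$J_m = \hat{\Sigma}_m^{-1} A \sum_{r}\sum_{q \in r}\sum_{t=S_q}^{E_q} D(q,r)\,\gamma^{r}_{qm}(t)\,R_{mt} \qquad (p \times n)$$

The slide annotates matrix sizes of 39 × 39 and 39 × 162. These are presumably the p × p and p × n sizes of $g_m$ and $J_m$ in a configuration with p = 39 and n = 162.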
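The step from (6) to (12) only regroups sums. A short derivation starts from the single-Gaussian derivative $\hat{\Sigma}_m^{-1}(\hat{\Sigma}_m^{-1}P_{mt}-I_p)AC_m-\hat{\Sigma}_m^{-1}AR_{mt}$, with $\hat{\Sigma}_m=\operatorname{diag}(AC_mA^T)$. It writes $w^{r}_{qm}(t)=D(q,r)\,\gamma^{r}_{qm}(t)$ as our own shorthand. Because $\hat{\Sigma}_m$, $A$ and $C_m$ do not depend on $r$, $q$ or $t$, they can be pulled out of the inner sums:

$$\frac{\partial F_{\mathrm{MPE}}}{\partial A} = k\sum_{r}\sum_{q\in r}\sum_{t=S_q}^{E_q}\sum_{m} w^{r}_{qm}(t)\left[\hat{\Sigma}_m^{-1}\left(\hat{\Sigma}_m^{-1}P_{mt}-I_p\right)AC_m-\hat{\Sigma}_m^{-1}AR_{mt}\right]$$

$$= k\sum_{m}\left[\hat{\Sigma}_m^{-1}\Bigl(\hat{\Sigma}_m^{-1}\underbrace{\textstyle\sum_{r,q,t} w^{r}_{qm}(t)\,P_{mt}}_{g_m}-\underbrace{\textstyle\sum_{r,q,t} w^{r}_{qm}(t)}_{\gamma_m}\,I_p\Bigr)AC_m-\underbrace{\hat{\Sigma}_m^{-1}A\textstyle\sum_{r,q,t} w^{r}_{qm}(t)\,R_{mt}}_{J_m}\right]$$

Since $P_{mt}=\operatorname{diag}(A R_{mt} A^T)$ is linear in $R_{mt}$, we also have

$$g_m=\operatorname{diag}\!\left(A\,\Phi_m A^T\right),\qquad J_m=\hat{\Sigma}_m^{-1}A\,\Phi_m,\qquad \Phi_m=\sum_{r,q,t} w^{r}_{qm}(t)\,R_{mt}.$$

The per-Gaussian statistics needed from the lattices therefore reduce to the scalar $\gamma_m$ and the $n\times n$ matrix $\Phi_m$ (our notation). This follows from the formulas; it is not a claim about how the paper stores its statistics.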


MPE-HLDA Implementation

• In theory, the derivative of the MPE-HLDA objective function can be computed from Eq.(12) via a single forward-backward pass over the training lattices. In practice, however, it is not possible to fit all the full covariance matrices in memory.

• Two steps
  – First, run a forward-backward pass over the training lattices to accumulate the required statistics.
  – Second, use these statistics together with the full covariance matrices to synthesize the derivative.

• The paper uses gradient descent to update the projection matrix (descent on $-F_{\mathrm{MPE}}$, i.e. steps that increase $F_{\mathrm{MPE}}$).
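A minimal NumPy sketch of the two-step scheme, under these assumptions:
- Each training frame arrives with a per-Gaussian weight $w_m(t)=\sum_q D(q,r)\,\gamma^{r}_{qm}(t)$ already computed by the lattice forward-backward pass (not shown).
- Pass 1 accumulates $\gamma_m$ and the weighted scatter $\Phi_m=\sum_t w_m(t)(o_t-\mu_m)(o_t-\mu_m)^T$ (our notation), using only the means.
- Pass 2 visits one Gaussian at a time, fetching its full covariance through a hypothetical `load_cov` callable, and assembles Eq.(12).

In this toy example everything fits in memory, so it only illustrates the ordering. The final line takes one gradient step that increases $F_{\mathrm{MPE}}$ with an arbitrary step size; this is not the paper's schedule.

```python
import numpy as np

def pass1_accumulate(frames, mus):
    """Pass 1: frames yields (o_t, w) with w[m] = sum_q D(q,r) * gamma_qm^r(t).
    Only the Gaussian means are needed here, not the full covariances."""
    M, n = mus.shape
    gamma = np.zeros(M)
    phi = np.zeros((M, n, n))
    for o, w in frames:
        for m in np.nonzero(w)[0]:
            d = o - mus[m]
            gamma[m] += w[m]
            phi[m] += w[m] * np.outer(d, d)          # weighted R_mt
    return gamma, phi

def pass2_gradient(A, k, gamma, phi, load_cov):
    """Pass 2: one Gaussian at a time, load C_m and add its term of Eq.(12)."""
    p = A.shape[0]
    G = np.zeros_like(A)
    for m in range(len(gamma)):
        C = load_cov(m)                              # e.g. read C_m from disk
        S_inv = np.diag(1.0 / np.diag(A @ C @ A.T))  # Sigma_hat_m^{-1}
        g = np.diag(np.diag(A @ phi[m] @ A.T))       # g_m, p x p
        J = S_inv @ A @ phi[m]                       # J_m, p x n
        G += S_inv @ (S_inv @ g - gamma[m] * np.eye(p)) @ A @ C - J
    return k * G

rng = np.random.default_rng(1)
M, n, p, T = 4, 8, 3, 50
mus = rng.standard_normal((M, n))
covs = []
for _ in range(M):
    B = rng.standard_normal((n, n))
    covs.append(B @ B.T + n * np.eye(n))             # toy full covariances

# D(q,r)-weighted posteriors can be negative, so draw signed toy weights.
frames = [(rng.standard_normal(n), rng.standard_normal(M)) for _ in range(T)]

loads = []
def load_cov(m):                                     # hypothetical disk-backed loader
    loads.append(m)
    return covs[m]

A = rng.standard_normal((p, n))
gamma, phi = pass1_accumulate(frames, mus)
G = pass2_gradient(A, k=1 / 12, gamma=gamma, phi=phi, load_cov=load_cov)
A_new = A + 1e-3 * G                                 # one step that increases F_MPE
```

The split matters because pass 1 only needs the projected model and the means. As a result, at most one full covariance matrix needs to be resident at a time during pass 2.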


MPE-HLDA Implementation


Experimental Framework

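Global feature projection

$$\hat{o}_t = A\,o_t = A\,L\,x_t,\qquad A \in \mathbb{R}^{p\times n},\; L \in \mathbb{R}^{n\times l},\; x_t \in \mathbb{R}^{l\times 1},\; \hat{o}_t \in \mathbb{R}^{p\times 1}$$

– There is more useful information in longer contexts.
– Reduce the computational cost.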
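A quick NumPy shape check of the two-stage projection. It uses the CTS sizes reported on the experiment slides (l = 225, n = 130, p = 60) and random placeholder matrices rather than trained ones. It shows that the two linear maps can be merged into one p × l matrix. It also prints the number of entries in one full covariance matrix in the l- and n-dimensional spaces; this is one plausible reading of the cost argument, not one the slide confirms.

```python
import numpy as np

l, n, p = 225, 130, 60            # CTS sizes from the experiment slide
rng = np.random.default_rng(0)
L = rng.standard_normal((n, l))   # first-stage projection, n x l (placeholder values)
A = rng.standard_normal((p, n))   # MPE-HLDA projection, p x n (placeholder values)
x_t = rng.standard_normal(l)      # one concatenated input vector, length l

o_t = L @ x_t                     # n-dimensional intermediate feature
o_hat = A @ o_t                   # p-dimensional feature modelled by the HMM

# Both maps are linear, so they can be merged into one p x l matrix.
AL = A @ L
print(o_t.shape, o_hat.shape, AL.shape)   # (130,) (60,) (60, 225)

# Entries in one full covariance matrix C_m before and after the first stage.
print(l * l, n * n)                       # 50625 16900
```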


Experimentation

• Conversational Telephone Speech (CTS)
  – 2300 hours of training data:
    • 800 hours: training the initial ML model
    • 1500 hours: held-out training data, used for lattice generation and discriminative training (MPE-HLDA: only 370 hours)
  – Test sets: Eval03, Dev04


Experimentation

• Conversational Telephone Speech (CTS)
  – Features: frame-concatenated PLP cepstra
    • 15 frames, l = 225, n = 130, p = 60
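The slide's numbers imply 225 / 15 = 15 coefficients per frame. The sketch below is a minimal NumPy version of 15-frame concatenation. It assumes a symmetric window of ±7 frames and replication of the first and last frames at utterance edges; neither detail is stated on the slide. The function name and the random "PLP" input are hypothetical.

```python
import numpy as np

def concatenate_frames(feats, context=7):
    """Stack each frame with `context` neighbours on each side.

    feats: (T, d) array of per-frame cepstra. Edge frames are replicated
    (an assumption of this sketch). Returns a (T, (2*context + 1) * d) array.
    """
    T, d = feats.shape
    padded = np.pad(feats, ((context, context), (0, 0)), mode="edge")
    width = 2 * context + 1
    return np.stack([padded[t:t + width].reshape(-1) for t in range(T)])

plp = np.random.default_rng(0).standard_normal((100, 15))  # 100 toy frames, 15 coefficients
x = concatenate_frames(plp)
print(x.shape)   # (100, 225)
```

Each output row is the l = 225 vector $x_t$ that the first-stage projection maps down to n = 130 dimensions. The centre frame occupies positions 105 to 119.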


Experimentation


Experimentation

• Broadcast News (BN)
  – 600 hours: training the initial model (Hub4 and TDT)
  – 330 hours: held-out data


Thanks
