A PLSA-based Language Model for Conversational Telephone Speech
David Mrva and Philip C.Woodland
2004/12/08 邱炫盛
Outline

• Language Model
• PLSA Model
• Experimental Results
• Conclusion
Language Model
• The task of a language model is to calculate the probability P(w_i|h_i)
• n-gram model:
  – the range of dependencies is limited to n words: P(w_i|h_i) ≈ P(w_i|w_{i-1}, …, w_{i-n+1})
  – longer-range information is ignored
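The n-gram factorization above can be sketched with a minimal maximum-likelihood bigram (n = 2) model; the toy corpus, the `<s>` start marker, and the function names are illustrative, not from the paper:

```python
from collections import Counter

def bigram_probs(corpus):
    """Maximum-likelihood bigram model: P(w_i | w_{i-1}) = c(w_{i-1}, w_i) / c(w_{i-1})."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence          # sentence-start marker as first history
        for prev, cur in zip(tokens, tokens[1:]):
            unigrams[prev] += 1
            bigrams[(prev, cur)] += 1
    # Counter returns 0 for unseen events, so unseen bigrams get probability 0
    return lambda w, h: bigrams[(h, w)] / unigrams[h] if unigrams[h] else 0.0

p = bigram_probs([["the", "cat"], ["the", "dog"]])
print(p("cat", "the"))  # 0.5
```

A real system would add smoothing for unseen n-grams; this sketch only illustrates the limited n-word dependency range discussed above.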
Language Model (cont.)

• Topic-based language models:
  – Latent Semantic Analysis
  – topic-based language model
  – PLSA-based language model
PLSA Model

• PLSA is a general machine-learning technique for modeling the co-occurrences of events.
• Here: co-occurrences of words and documents
• Hidden variable = aspect (topic)
• The PLSA model in this paper is a mixture of unigram distributions.
PLSA Model (cont.)

Graphical model representation:

• Unigram model: d → w, with parameters P(d) and P(w|d)
• Aspect model: d → t → w, with parameters P(d), P(t|d) and P(w|t)
PLSA Model (cont.)

Each word w_j of a document d_i is drawn from a mixture of the K aspect unigram distributions:

P(w_j|d_i) = Σ_{k=1}^{K} P(z_k|d_i) P(w_j|z_k)

with mixture weights P(z_1|d_i), …, P(z_K|d_i) and component distributions P(w_j|z_1), …, P(w_j|z_K).
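The mixture-of-unigrams computation can be sketched as follows; the aspect names, document name, and probability values are made up for illustration:

```python
def plsa_word_prob(w, d, p_w_given_z, p_z_given_d):
    """P(w_j | d_i) = sum_k P(w_j | z_k) * P(z_k | d_i)."""
    return sum(p_w_given_z[z][w] * p_z_given_d[d][z] for z in p_w_given_z)

# toy parameters: two aspects with hypothetical unigram distributions
p_w_given_z = {"sports":   {"ball": 0.6, "vote": 0.1},
               "politics": {"ball": 0.1, "vote": 0.7}}
p_z_given_d = {"doc1": {"sports": 0.8, "politics": 0.2}}

print(plsa_word_prob("ball", "doc1", p_w_given_z, p_z_given_d))  # 0.6*0.8 + 0.1*0.2 = 0.5
```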
PLSA Model (cont.)

P(w_j|d_i) = P(z_1|d_i)P(w_j|z_1) + P(z_2|d_i)P(w_j|z_2) + … + P(z_K|d_i)P(w_j|z_K) = Σ_{k=1}^{K} P(z_k|d_i)P(w_j|z_k)

L = log Π_{i=1}^{N} Π_{j=1}^{M} ( Σ_{k=1}^{K} P(z_k|d_i)P(w_j|z_k) )^{n(w_j,d_i)}
  = Σ_{i=1}^{N} Σ_{j=1}^{M} n(w_j,d_i) log Σ_{k=1}^{K} P(z_k|d_i)P(w_j|z_k)

M: number of words in the vocabulary
N: number of documents in the training collection
K: number of aspects (topics)
PLSA Model (cont.)

Step E: replace log p(w_j|d_i) by the expectation of the complete-data log-likelihood under the posterior of the hidden aspect,

E[log p(w_j, z|d_i)] = Σ_{k=1}^{K} p̂(z_k|d_i,w_j) log p(w_j, z_k|d_i),

where the posterior given the current parameter estimates is

p̂(z_k|d_i,w_j) = p(w_j, z_k|d_i) / p(w_j|d_i) = P(z_k|d_i)P(w_j|z_k) / Σ_{l=1}^{K} P(z_l|d_i)P(w_j|z_l)
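A minimal sketch of the Step-E posterior, with parameters stored as nested dicts; the names and toy values are illustrative:

```python
def e_step(p_z_d, p_w_z, d, w):
    """Posterior p(z_k | d_i, w_j) = P(z_k|d_i)P(w_j|z_k) / sum_l P(z_l|d_i)P(w_j|z_l)."""
    joint = {z: p_z_d[d][z] * p_w_z[z][w] for z in p_w_z}
    total = sum(joint.values())
    return {z: v / total for z, v in joint.items()}

# toy parameters: two aspects, one document with uniform topic weights
p_w_z = {0: {"ball": 0.6, "vote": 0.1},
         1: {"ball": 0.1, "vote": 0.7}}
p_z_d = {"doc1": {0: 0.5, 1: 0.5}}

post = e_step(p_z_d, p_w_z, "doc1", "ball")
print(post)  # posterior mass concentrates on aspect 0: {0: 6/7, 1: 1/7}
```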
PLSA Model (cont.)

The auxiliary term H̃ = −Σ_{k=1}^{K} p̃(z_k|d_i,w_j) log p̂(z_k|d_i,w_j) is minimized by p̂ = p̃: using log x ≤ x − 1,

Σ_{k=1}^{K} p̃(z_k|d_i,w_j) log( p̂(z_k|d_i,w_j) / p̃(z_k|d_i,w_j) ) ≤ Σ_{k=1}^{K} p̃(z_k|d_i,w_j) ( p̂(z_k|d_i,w_j) / p̃(z_k|d_i,w_j) − 1 ) = 0,

with equality iff p̂ = p̃. Consequently the re-estimated parameters satisfy log p̂(w_j|d_i) ≥ log p̃(w_j|d_i): each EM iteration cannot decrease the likelihood.
PLSA Model (cont.)

Because w and d are conditionally independent given the aspect z,

p(w_j, z_k|d_i) = P(w_j|z_k) P(z_k|d_i),

so the posterior becomes

p̂(z_k|d_i,w_j) = P(w_j|z_k)P(z_k|d_i) / Σ_{l=1}^{K} P(w_j|z_l)P(z_l|d_i),

and maximizing log p̃(w_j|d_i) reduces to maximizing

Σ_{k=1}^{K} p̂(z_k|d_i,w_j) log[ P̂(w_j|z_k) P̂(z_k|d_i) ].
PLSA Model (cont.)

Step M: maximize the expected complete-data log-likelihood

E[L_c] = Σ_{i=1}^{N} Σ_{j=1}^{M} n(w_j,d_i) Σ_{k=1}^{K} p̂(z_k|d_i,w_j) log[ P(w_j|z_k) P(z_k|d_i) ]

subject to the constraints Σ_{j=1}^{M} P(w_j|z_k) = 1 and Σ_{k=1}^{K} P(z_k|d_i) = 1, introducing Lagrange multipliers τ_k and ρ_i:

H = E[L_c] + Σ_{k=1}^{K} τ_k (1 − Σ_{j=1}^{M} P(w_j|z_k)) + Σ_{i=1}^{N} ρ_i (1 − Σ_{k=1}^{K} P(z_k|d_i))
PLSA Model (cont.)

Taking derivatives of H and setting them to zero gives the re-estimation formulas:

P(w_j|z_k) = Σ_{i=1}^{N} n(w_j,d_i) p̂(z_k|d_i,w_j) / Σ_{i=1}^{N} Σ_{m=1}^{M} n(w_m,d_i) p̂(z_k|d_i,w_m)

P(z_k|d_i) = Σ_{j=1}^{M} n(w_j,d_i) p̂(z_k|d_i,w_j) / n(d_i),  where n(d_i) = Σ_{j=1}^{M} n(w_j,d_i)
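The E-step posterior and the M-step re-estimation formulas above can be combined into a small EM training loop. This is a toy sketch over a four-cell count table, not the authors' implementation; all names and the corpus are illustrative:

```python
import random

def normalize(dist):
    s = sum(dist.values())
    return {x: v / s for x, v in dist.items()}

def plsa_em(counts, K, iters=30, seed=0):
    """EM training for PLSA. counts[(d, w)] = n(w_j, d_i)."""
    rng = random.Random(seed)
    docs = {d for d, _ in counts}
    vocab = {w for _, w in counts}
    # random normalized initialization of P(w|z) and P(z|d)
    p_w_z = {k: normalize({w: rng.random() + 0.01 for w in vocab}) for k in range(K)}
    p_z_d = {d: normalize({k: rng.random() + 0.01 for k in range(K)}) for d in docs}
    for _ in range(iters):
        num_wz = {k: dict.fromkeys(vocab, 0.0) for k in range(K)}
        num_zd = {d: dict.fromkeys(range(K), 0.0) for d in docs}
        for (d, w), n in counts.items():
            # E-step: posterior p(z_k|d,w) proportional to P(z_k|d) P(w|z_k)
            joint = [p_z_d[d][k] * p_w_z[k][w] for k in range(K)]
            s = sum(joint)
            for k in range(K):
                r = n * joint[k] / s   # n(w,d) * posterior, the M-step numerator term
                num_wz[k][w] += r
                num_zd[d][k] += r
        # M-step: renormalize the accumulated numerators
        p_w_z = {k: normalize(num_wz[k]) for k in range(K)}
        p_z_d = {d: normalize(num_zd[d]) for d in docs}
    return p_w_z, p_z_d

# toy corpus: two documents with disjoint topical vocabularies
counts = {("d1", "ball"): 4, ("d1", "game"): 3, ("d2", "vote"): 4, ("d2", "law"): 3}
p_w_z, p_z_d = plsa_em(counts, K=2)
```

Each iteration makes a single pass over the nonzero count cells, which is how PLSA training scales with the number of observed (document, word) pairs rather than with N·M.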
PLSA Model (cont.)

Use PLSA in a language model: the P(z_k|d_i) are used as mixture weights when calculating the word probability. The history h_i is used instead of d_i, and these weights are re-estimated on the test set:

p(z_k|h_i) = (1 / Σ_{w∈h_i} n(w,h_i)) Σ_{w∈h_i} n(w,h_i) p(z_k|h_i,w)

where

p(z_k|h_i,w) = P(w|z_k) p(z_k|h_i) / Σ_{q=1}^{K} P(w|z_q) p(z_q|h_i)
PLSA Model (cont.)

Because of recognition errors in the history, and because not enough information is available to the model about the topic of the document, the weights are smoothed with a prior topic distribution:

p(z_k|h_i) = ( b p(z_k) + Σ_{i∈h} cs(i) p(z_k|h_i,w_i) ) / ( b + Σ_{i∈h} cs(i) )

cs(i): the confidence score of the i-th word
b: the weight of the prior topic distribution

The word probability is then

P(w_i|h_i) = Σ_{k=1}^{K} P(w_i|z_k) p(z_k|h_i)
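A sketch of the confidence-weighted fixed-point re-estimation of p(z_k|h); the function name, iteration count, and toy parameter values are assumptions for illustration, not from the paper:

```python
def adapt_topic_weights(history, conf, p_w_z, p_z_prior, b=10.0, iters=20):
    """Re-estimate p(z_k|h) from the recognized history words, smoothing toward
    the prior topic distribution p(z_k) with weight b; conf[i] is the
    confidence score cs(i) of the i-th history word."""
    K = len(p_z_prior)
    p_z_h = list(p_z_prior)                    # start from the prior topic weights
    denom = b + sum(conf)
    for _ in range(iters):
        acc = [b * p_z_prior[k] for k in range(K)]
        for w, c in zip(history, conf):
            # posterior p(z_k|h, w) proportional to P(w|z_k) * p(z_k|h)
            joint = [p_w_z[k][w] * p_z_h[k] for k in range(K)]
            s = sum(joint)
            for k in range(K):
                acc[k] += c * joint[k] / s
        p_z_h = [a / denom for a in acc]
    return p_z_h

# toy aspect unigrams and a history of two confidently recognized words
p_w_z = [{"ball": 0.6, "vote": 0.1},
         {"ball": 0.1, "vote": 0.7}]
weights = adapt_topic_weights(["ball", "ball"], [0.9, 0.8], p_w_z, [0.5, 0.5], b=1.0)
```

Low-confidence words contribute little to the sums, so with unreliable history the estimate stays close to the prior p(z_k), which is the intent of the b term.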
PLSA Model (cont.)

• Accounts for the whole document history of a word, irrespective of the document length.
• Has no means of representing word order, because it is a mixture of unigram distributions.
• Combine the n-gram with PLSA:

P(w_i|h_i) = P_{n-gram}(w_i|h_i) P_{PLSA}(w_i|h_i) / P_{unigram}(w_i)

• When PLSA is used in decoding, a Viterbi-based decoder is not suitable. Two-pass decoder:
  – First pass: n-gram; output a confidence score
  – Second pass: PLSA; re-score the lattices
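The combination formula can be applied in the log domain during lattice re-scoring; this sketch uses placeholder constant component models (real n-gram, PLSA and unigram models would go in their place), and the resulting product is an unnormalized score:

```python
import math

def combined_logprob(w, h, p_ngram, p_plsa, p_unigram):
    """log of P_ngram(w|h) * P_plsa(w|h) / P_unigram(w); an unnormalized score."""
    return math.log(p_ngram(w, h)) + math.log(p_plsa(w, h)) - math.log(p_unigram(w))

# placeholder component models (illustrative constants, not real LMs)
score = combined_logprob("ball", ("the",),
                         lambda w, h: 0.2,   # n-gram probability
                         lambda w, h: 0.3,   # PLSA probability
                         lambda w: 0.1)      # unigram probability
print(score)
```

Dividing by the unigram probability keeps the PLSA term from double-counting the word frequency already modeled by the n-gram; only the topical deviation from the unigram baseline is credited.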
PLSA Model (cont.)

• During re-scoring, the PLSA history comprises all segments of a document except the current segment.
• The PLSA history is fixed for all words in a given segment.
• This "history" is referred to as the "context" (ctx), since it contains both past and future words.
Experimental Results

Two test sets:
• NIST Hub5 speech-to-text evaluation 2002 (eval02)
  – Switchboard I and II
  – 62k words, 19k from Switchboard I
• NIST Rich Transcription Spring 2003 CTS speech-to-text evaluation (eval03)
  – Switchboard II phase 5 and Fisher
  – 74k words, 36k from Fisher
Experimental Results (cont.)

• The perplexity reduction is greater when PLSA's training text is related to the test set.
• Perplexity with the reference context (ref.ctx, b=10) is lower than with the recognized context (rec.ctx, b=10).
• b = 10 is the best value.
• Using confidence scores makes the PLSA model less sensitive to b.
Experimental Results (cont.)

• Baseline: n-gram trained on 20M words of Fisher transcripts, increased to 500 classes
• PLSA: 750 aspects, 100 EM iterations
• The data were split into eval03dev and eval03tst
  – Interpolation weights of the word and class-based n-grams were set to minimize perplexity.
  – A slight improvement when side-based documents were used.
Experimental Results (cont.)

• b = 100 is the best value
  – the PLSA model needs much more data to estimate the topic of Fisher than of Switchboard I
• Having a long context is very important.
Conclusion
• PLSA with the suggested modifications reduces perplexity when used in a language model.
• Future work:
  – re-score lattices to calculate WERs
  – combine the semantics-oriented model with a syntax-based language model