Neural Segmental CRFs for Sequence Modelling
Liang Lu
Based on the work with Lingpeng Kong @ CMU
28 June 2016
Carry on from Mark¹
• Sequence-to-sequence modelling
  ◦ speech synthesis: word sequence → waveform
  ◦ speech recognition: waveform → word sequence
  ◦ machine translation: word sequence → word sequence
¹ Assuming that you were in the talk with attention and long-term memory.
Next Question
• Is speech recognition more special?
  ◦ monotonic alignment
  ◦ long input sequence
  ◦ output sequence is much shorter (word/phoneme)
Speech Recognition
• monotonic alignment
  ◦ encoder-decoder model does not naturally apply
  ◦ $x_{1:T} \to c \to y_{1:L}$
• long input sequence
  ◦ expensive for globally normalised models
• output sequence is much shorter (word/phoneme)
  ◦ length mismatch
Speech Recognition
• Hidden Markov Model
  ◦ monotonic alignment √
  ◦ long input sequence → locally normalised
  ◦ length mismatch → hidden states
• Connectionist Temporal Classification
  ◦ monotonic alignment √
  ◦ long input sequence → locally normalised
  ◦ length mismatch → blank state
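As a rough illustration of how a blank state absorbs the length mismatch in CTC, the sketch below maps a frame-level label path to a shorter output sequence by merging repeated labels and then dropping blanks. The token name `<b>`, the function name `ctc_collapse` and the toy path are assumptions made for this example only.

```python
from itertools import groupby

BLANK = "<b>"  # assumed name for the blank symbol


def ctc_collapse(frame_labels):
    """Collapse a frame-level path: merge consecutive repeats, then remove blanks."""
    merged = [label for label, _ in groupby(frame_labels)]
    return [label for label in merged if label != BLANK]


# Ten frames collapse to three output labels; the blank between the two
# runs of "b" is what keeps them as two separate outputs.
path = ["<b>", "a", "a", "<b>", "b", "b", "b", "<b>", "<b>", "b"]
print(ctc_collapse(path))
```

Many different frame-level paths collapse to the same output sequence, which is why CTC training sums over all of them.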
Speech Recognition
• Locally normalised models:
  ◦ conditional independence assumption
  ◦ label bias problem
  ◦ better results given by sequence training: local → global normalisation
• Question: Why not stick to globally normalised models from scratch?
[1] D. Andor, et al, “Globally Normalized Transition-Based Neural Networks”, ACL 2016.
[2] D. Povey, et al, “Purely sequence-trained neural networks for ASR based on lattice-free MMI”, Interspeech 2016.
(Segmental) Conditional Random Field
[Figure: graphical models of a CRF and a segmental CRF]
(Segmental) Conditional Random Field
• CRF [Lafferty et al. 2001]
$$P(y_{1:L} \mid x_{1:T}) = \frac{1}{Z(x_{1:T})} \prod_j \exp\left(w^\top \Phi(y_j, x_{1:T})\right) \tag{1}$$
where $L = T$.
• Segmental (semi-Markov) CRF [Sarawagi and Cohen 2004]
$$P(y_{1:L}, E \mid x_{1:T}) = \frac{1}{Z(x_{1:T})} \prod_j \exp\left(w^\top \Phi(y_j, e_j, x_{1:T})\right) \tag{2}$$
where $e_j = \langle s_j, n_j \rangle$ denotes the beginning ($s_j$) and end ($n_j$) time tags of $y_j$; $E = \{e_{1:L}\}$ is the latent segment label.
(Segmental) Conditional Random Field
$$\frac{1}{Z(x_{1:T})} \prod_j \exp\left(w^\top \Phi(y_j, x_{1:T})\right)$$
• Learnable parameter w
• Engineering the feature function Φ(·)
• Designing Φ(·) is much harder for speech than for NLP
Neural Segmental CRF
• Using (recurrent) neural networks to learn the feature function Φ(·).
[Figure: input frames x1 ... x6 with segment labels y1, y2, y3]
[1] Y. Liu, et al, “Exploring Segment Representations for Neural Segmentation Models”, arXiv 2016.
Neural conditional random fields
• Training criteria
  ◦ Conditional maximum likelihood
$$\mathcal{L}(\theta) = \log P(y_{1:L} \mid x_{1:T}) = \log \sum_E P(y_{1:L}, E \mid x_{1:T}) \tag{3}$$
  ◦ Hinge loss: similar to structured SVM.
something complicated!
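The sum over segmentations E in the conditional likelihood can be computed with a forward-style dynamic programme. The sketch below is one way to do it under stated assumptions: `seg_score(y, s, t)` is a hypothetical stand-in for the log-potential of label `y` on the half-open frame span `[s, t)`, segments are capped at `max_len` frames, and the partition function sums over label sequences of every length. None of these names come from the slides.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for an iterable of floats."""
    values = list(values)
    if not values:
        return -math.inf
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(T, labels, seg_score, max_len):
    """log Z: sum over all segmentations of T frames and all labellings."""
    alpha = [-math.inf] * (T + 1)  # alpha[t]: frames 0..t-1 fully segmented
    alpha[0] = 0.0
    for t in range(1, T + 1):
        alpha[t] = logsumexp(
            alpha[s] + seg_score(y, s, t)
            for s in range(max(0, t - max_len), t)
            for y in labels
        )
    return alpha[T]


def log_numerator(T, target, seg_score, max_len):
    """log of the sum over segmentations E that carry exactly the target labels."""
    L = len(target)
    # alpha[j][t]: the first j target labels cover frames 0..t-1
    alpha = [[-math.inf] * (T + 1) for _ in range(L + 1)]
    alpha[0][0] = 0.0
    for j in range(1, L + 1):
        for t in range(j, T + 1):
            alpha[j][t] = logsumexp(
                alpha[j - 1][s] + seg_score(target[j - 1], s, t)
                for s in range(max(j - 1, t - max_len), t)
            )
    return alpha[L][T]


def log_likelihood(T, target, labels, seg_score, max_len):
    """log P(y | x) = log sum_E exp(score(y, E)) - log Z."""
    return (log_numerator(T, target, seg_score, max_len)
            - log_partition(T, labels, seg_score, max_len))


# Toy check with constant scores: 3 frames, 2 labels.
uniform = lambda y, s, t: 0.0
print(math.exp(log_likelihood(3, ["a", "b"], ["a", "b"], uniform, 3)))
```

In a neural model the scores would come from a network and the gradient of this quantity would be obtained by automatic differentiation; the cost of both recursions grows with `T * max_len`, which is one reason a segment-length cap is commonly used.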
Neural conditional random fields
• Viterbi decoding
  ◦ Partially Viterbi decoding
$$y^*_{1:L} = \arg\max_{y_{1:L}} \log \sum_E P(y_{1:L}, E \mid x_{1:T}) \tag{4}$$
  ◦ Fully Viterbi decoding
$$y^*_{1:L}, E^* = \arg\max_{y_{1:L}, E} \log P(y_{1:L}, E \mid x_{1:T}) \tag{5}$$
[1] L. Lu, et al, “Segmental Recurrent Neural Networks for End-to-end Speech Recognition”, Interspeech 2016.
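Fully Viterbi decoding, as in Eq. (5), can be sketched as a max-product dynamic programme over segment end points. Because Z(x) does not depend on the labels or the segmentation, maximising the unnormalised score gives the same argmax. In the sketch below, `seg_score(y, s, t)` is a hypothetical log-potential for label `y` on frames `[s, t)`, assumed to be finite, and `max_len` is an assumed cap on segment length.

```python
import math


def viterbi_segmental(T, labels, seg_score, max_len):
    """Return (best score, [(label, start, end), ...]) over all segmentations."""
    best = [-math.inf] * (T + 1)  # best[t]: best score covering frames 0..t-1
    back = [None] * (T + 1)       # back[t]: (start, label) of the last segment
    best[0] = 0.0
    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            for y in labels:
                cand = best[s] + seg_score(y, s, t)
                if cand > best[t]:
                    best[t] = cand
                    back[t] = (s, y)
    # Follow back-pointers from the end of the utterance to recover E* and y*.
    segments = []
    t = T
    while t > 0:
        s, y = back[t]
        segments.append((y, s, t))
        t = s
    segments.reverse()
    return best[T], segments


# Toy scores: two spans are rewarded, everything else is penalised.
good = {("a", 0, 2), ("b", 2, 4)}
toy_score = lambda y, s, t: 1.0 if (y, s, t) in good else -1.0
best_score, best_segments = viterbi_segmental(4, ["a", "b"], toy_score, 4)
print(best_score, best_segments)
```

Partially Viterbi decoding, Eq. (4), is harder because the sum over E sits inside the argmax; this sketch does not attempt it.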
Related works
• (Segmental) CRFs for speech
• Neural CRFs
• Structured SVMs
Comparison to CTC
[1] A. Senior, et al, “Acoustic Modelling with CD-CTC-sMBR LSTM RNNs”, ASRU 2015.
[Figure, built up over slides 15–20: inputs x1 x2 x3 x4 aligned to outputs y1 y2 y3 y4; in successive builds outputs are replaced by the blank symbol <b> and × marks are added]
Experiment
• TIMIT dataset
  ◦ 3696 training utterances (∼ 3 hours)
  ◦ core test set (192 testing utterances)
  ◦ trained on 48 phonemes, and mapped to 39 for scoring
  ◦ log filterbank features (FBANK)
  ◦ using LSTM as an implementation of RNN
Experiment
• Speed up training
[Figure: two schemes applied to the sequence x1 x2 x3 x4 ···: a) concatenate / add, b) skip]
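One plausible reading of the concatenate / add / skip options is that they halve the length of a vector sequence before the more expensive segmental computation. The sketch below shows that reading on plain Python lists. The function name `subsample`, the pairing of adjacent vectors and the dropping of an unpaired trailing vector are all assumptions, and the slide does not say whether the scheme is applied to input frames or to hidden states.

```python
def subsample(frames, mode):
    """Halve the length of a sequence of feature vectors (lists of floats).

    An unpaired trailing vector is dropped (an assumption of this sketch).
    """
    pairs = [(frames[i], frames[i + 1]) for i in range(0, len(frames) - 1, 2)]
    if mode == "concatenate":
        # Join each adjacent pair into one vector of twice the dimension.
        return [a + b for a, b in pairs]
    if mode == "add":
        # Sum each adjacent pair element-wise; the dimension is unchanged.
        return [[u + v for u, v in zip(a, b)] for a, b in pairs]
    if mode == "skip":
        # Keep the first vector of each pair and discard the second.
        return [a for a, _ in pairs]
    raise ValueError("unknown mode: " + mode)


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
for mode in ("concatenate", "add", "skip"):
    print(mode, subsample(frames, mode))
```

Concatenation keeps all the information at the price of a wider vector, while add and skip keep the dimension fixed.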
Experiment
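Table: Results of tuning the hyperparameters.

| Dropout | layers | hidden | PER  |
|---------|--------|--------|------|
| 0.2     | 3      | 128    | 21.2 |
| 0.2     | 3      | 250    | 20.1 |
| 0.2     | 6      | 250    | 19.3 |
| 0.1     | 3      | 128    | 21.3 |
| 0.1     | 3      | 250    | 20.9 |
| 0.1     | 6      | 250    | 20.4 |
| ×       | 6      | 250    | 21.9 |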
Experiment
Table: Results of three types of acoustic features.

| Features     | Deltas | d(x_t) | PER  |
|--------------|--------|--------|------|
| 24-dim FBANK | √      | 72     | 19.3 |
| 40-dim FBANK | √      | 120    | 18.9 |
| Kaldi        | ×      | 40     | 17.3 |

Kaldi features: 39-dimensional MFCCs spliced by a context window of 7, followed by LDA and MLLT transforms, with feature-space speaker-dependent MLLR.
Experiment
Table: Comparison to related works.

| System                                       | LM | SD | PER  |
|----------------------------------------------|----|----|------|
| HMM-DNN                                      | √  | √  | 18.5 |
| first-pass SCRF [Zweig 2012]                 | √  | ×  | 33.1 |
| Boundary-factored SCRF [He 2012]             | ×  | ×  | 26.5 |
| Deep Segmental NN [Abdel 2013]               | √  | ×  | 21.9 |
| Discriminative segmental cascade [Tang 2015] | √  | ×  | 21.7 |
| + 2nd pass with various features             | √  | ×  | 19.9 |
| CTC [Graves 2013]                            | ×  | ×  | 18.4 |
| RNN transducer [Graves 2013]                 | –  | ×  | 17.7 |
| Attention-based RNN [Chorowski 2015]         | –  | ×  | 17.6 |
| Segmental RNN                                | ×  | ×  | 18.9 |
| Segmental RNN                                | ×  | √  | 17.3 |
Conclusion
• Neural Segmental CRFs are flexible and powerful sequence models
  ◦ handwriting recognition
  ◦ joint word segmentation and POS tagging
• However, speed matters for large vocabulary speech recognition
  ◦ WFST-based decoder
  ◦ context-dependent vs. context-independent phones
[1] L. Kong, et al, “Segmental Recurrent Neural Networks”, ICLR 2016.
Thank you! Questions?