![Page 1: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/1.jpg)
Yun-Nung (Vivian) Chen
http://vivianchen.idv.tw
Hakkani-Tur, Tur, Gao, Deng
![Page 2: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/2.jpg)
Outline
Introduction
Spoken Dialogue System
Spoken/Natural Language Understanding (SLU/NLU)
Contextual Spoken Language Understanding
Model Architecture
End-to-End Training
Experiments
Conclusion & Future Work
End-to-End Memory Networks for Multi-Turn Spoken Language Understanding · Yun-Nung (Vivian) Chen
![Page 3: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/3.jpg)
![Page 4: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/4.jpg)
Dialogue System Pipeline
Speech Signal → ASR → Hypothesis: are there any action movies to see this weekend
Text Input: Are there any action movies to see this weekend?
Language Understanding (LU)
• User Intent Detection
• Slot Filling
→ Semantic Frame (Intents, Slots): request_movie, genre=action, date=this weekend
Dialogue Management (DM)
• Dialogue State Tracking
• Policy Decision
→ System Action: request_location
Output Generation
→ Text response: Where are you located?
→ Screen Display: location?
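To make the hand-off between LU and DM concrete, here is a minimal Python sketch of the semantic frame from this example and a toy rule-based policy that turns it into the system action request_location. The `SemanticFrame` class, the `REQUIRED_SLOTS` table, and `toy_policy` are hypothetical illustrations, not part of the system described in the slides.

```python
from dataclasses import dataclass, field


@dataclass
class SemanticFrame:
    """LU output: an intent plus slot-value pairs (hypothetical structure)."""
    intent: str
    slots: dict = field(default_factory=dict)


# Hypothetical table of slots a rule-based policy needs before it can act.
REQUIRED_SLOTS = {"request_movie": ["genre", "date", "location"]}


def toy_policy(frame: SemanticFrame) -> str:
    """Toy DM policy: request the first required slot that LU did not fill."""
    for slot in REQUIRED_SLOTS.get(frame.intent, []):
        if slot not in frame.slots:
            return f"request_{slot}"
    return "inform_results"  # hypothetical action once every slot is known


frame = SemanticFrame("request_movie", {"genre": "action", "date": "this weekend"})
print(toy_policy(frame))  # request_location
```

If LU had dropped date=this weekend, the same policy would return request_date instead: a small instance of how LU errors propagate into dialogue decisions.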
![Page 5: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/5.jpg)
LU Importance
[Figure: Learning Curve of System Performance. Y-axis: Success Rate (0 to 1); x-axis: Simulation Epoch (0 to 495). Legend: Upper Bound, DQN - 0.00, DQN - 0.05, Rule - 0.00, Rule - 0.05. Highlighted curves: RL Agent w/o LU errors; Rule Agent w/o LU errors.]
![Page 6: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/6.jpg)
LU Importance
[Figure: Learning Curve of System Performance. Y-axis: Success Rate (0 to 1); x-axis: Simulation Epoch (0 to 495). Legend: Upper Bound, DQN - 0.00, DQN - 0.05, Rule - 0.00, Rule - 0.05. Highlighted curves: RL Agent w/o LU errors; RL Agent w/ 5% LU errors; Rule Agent w/o LU errors; Rule Agent w/ 5% LU errors. Annotation: >5% performance drop.]
System performance is sensitive to LU errors for both rule-based and reinforcement learning agents.
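One way to see why a small per-turn LU error rate matters is to compound it over a dialogue. The sketch below assumes, purely for illustration, that each turn's LU output is wrong independently with the same probability; the slide does not say how errors were injected or how long the simulated dialogues were, and not every LU error necessarily makes a dialogue fail. The function name is hypothetical.

```python
def p_error_free_dialogue(turn_error_rate: float, num_turns: int) -> float:
    """Chance that every LU output in a dialogue is correct, assuming each turn
    is wrong independently with probability turn_error_rate (a simplification)."""
    if not 0.0 <= turn_error_rate <= 1.0:
        raise ValueError("turn_error_rate must be a probability")
    return (1.0 - turn_error_rate) ** num_turns


for turns in (1, 3, 5, 10):
    print(turns, round(p_error_free_dialogue(0.05, turns), 3))
```

At a 5% per-turn rate, the chance that a five-turn dialogue contains no LU error is about 0.77, and about 0.60 for ten turns.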
![Page 7: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/7.jpg)
Dialogue System Pipeline
SLU usually focuses on understanding single-turn utterances.
The understanding result is usually influenced by (1) local observations and (2) global knowledge.
Annotations on the pipeline: LU is the current bottleneck, and its errors propagate to the downstream components.
![Page 8: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/8.jpg)
Spoken Language Understanding
Domain Identification → Intent Prediction → Slot Filling
(U = utterance, S = slot tags, I = intent, D = domain)

Single-turn example:
U: just sent email to bob about fishing this weekend
S: O O O O B-contact_name O B-subject I-subject I-subject
I: send_email
D: communication
→ send_email(contact_name=“bob”, subject=“fishing this weekend”)

Multi-turn example:
U1: send email to bob
S1: O O O B-contact_name
→ send_email(contact_name=“bob”)
U2: are we going to fish this weekend
S2: B-message I-message I-message I-message I-message I-message I-message
→ send_email(message=“are we going to fish this weekend”)
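The slot-filling output is one IOB tag per word (B- begins a slot, I- continues it, O is outside any slot). A minimal sketch of how such a tag sequence can be collapsed into the slot arguments shown above; `bio_to_slots` is a hypothetical helper, and treating a stray I- tag as the start of a new slot is one lenient convention among several.

```python
def bio_to_slots(words, tags):
    """Turn word-aligned IOB tags (B-x, I-x, O) into a {slot: value} dict."""
    if len(words) != len(tags):
        raise ValueError("words and tags must align one-to-one")
    slots, name, span = {}, None, []

    def close_span():
        if name is not None:
            slots[name] = " ".join(span)  # a repeated slot keeps its last value

    for word, tag in zip(words, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != name):
            close_span()  # B-x, or an I-x that does not continue the open span, starts a new one
            name, span = tag[2:], [word]
        elif tag.startswith("I-"):
            span.append(word)  # continue the open span
        else:
            close_span()  # "O" closes any open span
            name, span = None, []
    close_span()
    return slots


words = "just sent email to bob about fishing this weekend".split()
tags = "O O O O B-contact_name O B-subject I-subject I-subject".split()
args = ", ".join(f'{k}="{v}"' for k, v in bio_to_slots(words, tags).items())
print(f"send_email({args})")
```

This prints send_email(contact_name="bob", subject="fishing this weekend"). Applied to U2 with tags S2, the same function yields {"message": "are we going to fish this weekend"}; the tags alone do not tell us that the intent is still send_email, which is the kind of knowledge carryover across turns that the contextual model targets.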
![Page 9: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/9.jpg)
![Page 10: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/10.jpg)
MODEL ARCHITECTURE
Idea: additionally incorporate contextual knowledge during slot tagging.
1. Sentence Encoding: the contextual sentence encoder RNN_mem encodes each history utterance x_i into a memory representation m_i; the sentence encoder RNN_in encodes the current utterance c into u.
2. Knowledge Attention: the inner product of u with each m_i gives the knowledge attention distribution p_i.
3. Knowledge Encoding: the weighted sum of the m_i (weights p_i) gives h, which is combined with u (∑) and projected by W_kg into the knowledge encoding representation o.
RNN Tagger: at each step, h_t is computed from the word w_t (via U), the previous state h_{t-1} (via W), and o (via M); each tag y_t is predicted from h_t (via V), giving the slot tagging sequence y.
Chen, et al., “End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding,” in Interspeech, 2016.
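Steps 2 and 3 can be sketched in plain Python for a single turn. The encoders of step 1 are omitted (u and the m_i are given as vectors), and two details are assumptions not visible on the slide: the inner-product scores are normalized with a softmax to form p_i, and h and u are summed before the W_kg projection, o = W_kg(h + u), which is how we read the ∑ in the diagram. All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def knowledge_encoding(u, memories, W_kg):
    """Steps 2-3 for one turn.
    u:        encoding of the current utterance c (from RNN_in)
    memories: list of m_i, one per history utterance x_i (from RNN_mem)
    W_kg:     projection matrix as a list of rows
    Returns (p, o): attention weights p_i and knowledge encoding o."""
    # 2. Knowledge attention: inner product u . m_i, normalized by softmax (assumed).
    p = softmax([dot(u, m) for m in memories])
    # 3. Knowledge encoding: h = sum_i p_i * m_i, then o = W_kg (h + u) (assumed combination).
    h = [sum(p_i * m[k] for p_i, m in zip(p, memories)) for k in range(len(u))]
    hu = [h_k + u_k for h_k, u_k in zip(h, u)]
    o = [dot(row, hu) for row in W_kg]
    return p, o


u = [1.0, 0.0]
memories = [[1.0, 0.0], [0.0, 1.0]]  # history turn 1 resembles u, turn 2 does not
p, o = knowledge_encoding(u, memories, W_kg=[[1.0, 0.0], [0.0, 1.0]])
print([round(x, 3) for x in p])
```

Here the first history turn matches u more closely, so it receives most of the attention (about 0.73 vs. 0.27).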
![Page 11: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/11.jpg)
MODEL ARCHITECTURE
Variant: the two sentence encoders (RNN_in and RNN_mem) are replaced by CNNs.
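The slide does not give the CNN configuration. A common CNN sentence encoder, sketched below under that assumption, slides filters over consecutive word embeddings, applies a ReLU, and max-pools over time, so every sentence maps to a fixed-size vector regardless of its length. The function name, the missing bias terms, and the single shared filter width are simplifications of our own.

```python
def cnn_encode(embeddings, filters):
    """Encode a sentence with 1-D convolution + ReLU + max-over-time pooling.
    embeddings: list of word vectors; filters: list of filters, each a list of
    `width` weight vectors (same width for all filters, no bias, for brevity).
    Returns a fixed-size vector with one feature per filter."""
    width = len(filters[0])
    if len(embeddings) < width:
        raise ValueError("sentence is shorter than the filter width")
    features = []
    for filt in filters:
        activations = []
        for start in range(len(embeddings) - width + 1):
            window = embeddings[start:start + width]
            score = sum(w * x for w_vec, x_vec in zip(filt, window)
                        for w, x in zip(w_vec, x_vec))
            activations.append(max(0.0, score))  # ReLU
        features.append(max(activations))  # max-over-time pooling
    return features


sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three 2-d word vectors
filters = [[[1.0, 0.0], [0.0, 1.0]], [[-1.0, 0.0], [0.0, -1.0]]]  # width 2
print(cnn_encode(sentence, filters))
```

This prints [2.0, 0.0]. Unlike an RNN, every window can be computed independently of the others, which is one plausible reason for the shorter training time reported for the CNN variant in the experiments.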
![Page 12: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/12.jpg)
END-TO-END TRAINING
• Tagging Objective
• RNN Tagger
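Input: contextual utterances & current utterance; output: slot tag sequence. The RNN tagger is unrolled over steps t-1, t, t+1: each word w_t enters through U, consecutive hidden states h_t are linked through W, the knowledge encoding o enters every step through M, and each h_t predicts the slot tag y_t through V.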
The attention distribution is learned automatically through end-to-end training on the tagging objective, without explicit supervision.
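The tagging objective can be sketched as the negative log-likelihood of the gold slot tags under the RNN tagger, with the knowledge encoding o fed in at every step through M. The cell below is a plain tanh RNN matching the U, W, V, M labels in the diagram (the experiments used GRUs), and the equations in the docstring are our reading of the diagram rather than formulas quoted from the slide. Only the forward pass is shown; in an autodiff framework, minimizing this loss would also train W_kg and the encoders, and hence the attention weights, without any attention labels. Function names are hypothetical.

```python
import math


def matvec(matrix, vec):
    return [sum(a * b for a, b in zip(row, vec)) for row in matrix]


def log_softmax(scores):
    top = max(scores)  # log-sum-exp trick for numerical stability
    log_total = top + math.log(sum(math.exp(s - top) for s in scores))
    return [s - log_total for s in scores]


def tagging_nll(word_vecs, gold_tags, o, U, W, M, V):
    """Negative log-likelihood of a gold tag sequence under a tanh RNN tagger
    that sees the knowledge encoding o at every step:
        h_t = tanh(U w_t + W h_{t-1} + M o),  P(y_t) = softmax(V h_t).
    gold_tags are integer tag indices; matrices are lists of rows."""
    h = [0.0] * len(W)  # h_0 = 0
    knowledge = matvec(M, o)  # M o is the same at every step
    nll = 0.0
    for w_t, y_t in zip(word_vecs, gold_tags):
        pre = [a + b + c for a, b, c in zip(matvec(U, w_t), matvec(W, h), knowledge)]
        h = [math.tanh(z) for z in pre]
        nll -= log_softmax(matvec(V, h))[y_t]
    return nll


I2 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(tagging_nll(words, [0, 1, 1], o=[0.5, -0.5], U=I2, W=I2, M=I2, V=Z2))
```

With V set to zeros the tagger assigns uniform probability to both tags, so the loss is 3 · log 2; training would push it below that by making the gold tags more likely.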
![Page 13: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/13.jpg)
![Page 14: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/14.jpg)
EXPERIMENTS
• Dataset: Cortana communication session data
• Setup:
– GRU for all RNNs
– Adam optimizer
– embedding dim = 150
– hidden units = 100
– dropout = 0.5
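Collecting the reported hyperparameters in one immutable config object makes each variant run explicit. This is a sketch with hypothetical field names; the slide gives no learning rate, batch size, or epoch count, so those are left out rather than guessed.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ExperimentConfig:
    """Training setup from the slide; field names are hypothetical."""
    rnn_cell: str = "GRU"          # GRU for all RNNs
    optimizer: str = "adam"
    embedding_dim: int = 150
    hidden_units: int = 100
    dropout: float = 0.5
    sentence_encoder: str = "RNN"  # "RNN" or "CNN", the two variants in the results


base = ExperimentConfig()
cnn_variant = replace(base, sentence_encoder="CNN")  # copy with one field changed
print(cnn_variant)
```

Because the dataclass is frozen, a variant can only be created through `replace`, so the base setup cannot be modified by accident.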
| Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall |
|---|---|---|---|---|---|---|
| RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5 |
The model trained on single-turn data performs much worse on non-first turns (16.2 vs. 60.6 on first turns) because of mismatched training data.
![Page 15: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/15.jpg)
EXPERIMENTS
| Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall |
|---|---|---|---|---|---|---|
| RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5 |
| RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4 |
Treating multi-turn data as single-turn data for training performs reasonably (47.4 overall vs. 25.5).
![Page 16: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/16.jpg)
EXPERIMENTS
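| Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall |
|---|---|---|---|---|---|---|
| RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5 |
| RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4 |
| Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3 |
| Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5 |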
Encoding both the current and the history utterances improves performance (62.5 overall vs. 56.3 with the current utterance only) but increases training time.
![Page 17: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/17.jpg)
EXPERIMENTS
| Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall |
|---|---|---|---|---|---|---|
| RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5 |
| RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4 |
| Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3 |
| Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5 |
| Proposed | multi-turn | history + current (x, c) | RNN | 73.2 | 65.7 | 67.1 |
Applying memory networks significantly outperforms all other approaches (67.1 overall), with much less training time.
![Page 18: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/18.jpg)
EXPERIMENTS
| Model | Training Set | Knowledge Encoding | Sentence Encoder | First Turn | Other | Overall |
|---|---|---|---|---|---|---|
| RNN Tagger | single-turn | x | x | 60.6 | 16.2 | 25.5 |
| RNN Tagger | multi-turn | x | x | 55.9 | 45.7 | 47.4 |
| Encoder-Tagger | multi-turn | current utt (c) | RNN | 57.6 | 56.0 | 56.3 |
| Encoder-Tagger | multi-turn | history + current (x, c) | RNN | 69.9 | 60.8 | 62.5 |
| Proposed | multi-turn | history + current (x, c) | RNN | 73.2 | 65.7 | 67.1 |
| Proposed | multi-turn | history + current (x, c) | CNN | 73.8 | 66.5 | 68.0 |
Using a CNN for sentence encoding produces comparable results (68.0 vs. 67.1 overall) with shorter training time.
![Page 19: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/19.jpg)
![Page 20: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/20.jpg)
Conclusion
• The proposed end-to-end memory networks store contextual knowledge, which an attention model exploits dynamically to carry knowledge over across turns for multi-turn understanding
• The end-to-end model performs the tagging task instead of classification
• The experiments show the feasibility and robustness of modeling knowledge carryover through memory networks
![Page 21: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/21.jpg)
Future Work
• Leveraging not only local observations but also global knowledge for better language understanding
– Syntax or semantics can serve as global knowledge to guide the understanding model
– “Knowledge as a Teacher: Knowledge-Guided Structural Attention Networks,” arXiv preprint arXiv:1609.03286
![Page 22: YUN-NUNG (VIVIAN) CHENyvchen/s105-icb/doc/170613_RecentTr… · Chen, et al., ^End-to-End Memory Networks with Knowledge Carryover for Multi-Turn Spoken Language Understanding, in](https://reader033.vdocuments.site/reader033/viewer/2022052423/5f07af277e708231d41e3810/html5/thumbnails/22.jpg)
Q & A
Thanks for your attention!
The code will be available at https://github.com/yvchen/ContextualSLU