Deep Learning: Neural Network with Memory
speech.ee.ntu.edu.tw › ~tlkagk › courses › mlds_2015_2...
TRANSCRIPT
Neural Network with Memory
Hung-yi Lee
Memory is important
[Figure: a network with 2-dimensional input and 1-dimensional output; the input sequence x1, x2, x3 produces outputs y1, y2, y3. The same input vector appears at different time steps but must map to different outputs, which a memory-less network cannot do.]
Network needs memory to achieve this.
Memory is important
[Figure, shown over three steps: a toy network with inputs x1, x2, one output y, and a memory cell c1 initialized to 0. At each time step the memory is updated from the hidden values, so feeding the sequence x1, x2, x3 lets the network produce different outputs (e.g. 4, then 7) even when later inputs repeat earlier ones.]
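Why memory is needed can be sketched in a few lines: a stateless network maps identical inputs to identical outputs, while a network with a memory cell can respond differently to the same input at different time steps. The weights and update rule below are illustrative, not the slide's exact figure:

```python
def stateless(x):
    """A feed-forward net: the output depends only on the current input."""
    return sum(x)

def with_memory(xs):
    """A net with a memory cell c (initialized to 0) added into each output."""
    c, ys = 0, []
    for x in xs:
        y = sum(x) + c   # output uses the current input AND the memory
        c = y            # write the result back into the memory cell
        ys.append(y)
    return ys

seq = [(1, 1), (1, 1), (1, 1)]       # the same input at every step
print([stateless(x) for x in seq])   # [2, 2, 2] -- identical outputs
print(with_memory(seq))              # [2, 4, 6] -- memory makes them differ
```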
Outline
• Vanilla Recurrent Neural Network (RNN)
• Variants of RNN
• Long Short-term Memory (LSTM)
Application
• (Simplified) Speech Recognition
[Figure: two utterances; each acoustic frame x1, x2, x3, x4 …… gets a label y1, y2, y3, y4 …… — e.g. TSI TSI TSI I I N N N for Utterance 1 and S S @ @ @ @ for Utterance 2.]
Previously we used a DNN, so all the frames were considered independently.
RNN input: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑁
Input of the RNN is one utterance; the order cannot change.
Step 1: a1 = σ(Wi x1 + Wh · 0) — the memory is initialized to 0, and a1 is copied into it.
y1 = softmax(Wo a1)
Step 2: a2 = σ(Wi x2 + Wh a1) — the memory now holds a1, and a2 is copied into it.
y2 = softmax(Wo a2)
RNN input: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑁
RNN
x3
y3=softmax(Woa3)
WiWh
Wo
a3=σ(Wix3+Wha2)memory
copy
y1 y2
a10 a2
y3
a3
Input of RNN is one utterance
The order cannot change.
RNN
The same network — the same Wi, Wh, Wo — is used again and again: starting from the initial memory value 0, a1, a2, a3 … are passed forward step by step.
Output yi depends on x1, x2, …… xi.
Input data: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑁 (one utterance).
(The same unrolling continues over the whole utterance.)
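The unrolled computation can be written as a short forward pass. A minimal numpy sketch with randomly initialized Wi, Wh, Wo (the names mirror the slides; sizes are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def rnn_forward(xs, Wi, Wh, Wo):
    """a_t = sigmoid(Wi x_t + Wh a_{t-1}), y_t = softmax(Wo a_t); a_0 = 0."""
    a = np.zeros(Wh.shape[0])
    ys = []
    for x in xs:                        # the order of the inputs cannot change
        a = sigmoid(Wi @ x + Wh @ a)    # update the memory
        ys.append(softmax(Wo @ a))
    return ys

rng = np.random.default_rng(0)
Wi = rng.normal(size=(4, 3))
Wh = rng.normal(size=(4, 4))
Wo = rng.normal(size=(2, 4))
xs = [rng.normal(size=3) for _ in range(5)]
ys = rnn_forward(xs, Wi, Wh, Wo)        # 5 outputs, each a distribution over 2 classes
```

Because the memory carries earlier inputs forward, reversing the input order changes the outputs; a frame-by-frame DNN would not notice the difference.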
Cost
RNN input: 𝑥1 𝑥2 𝑥3 …… 𝑥𝑁; RNN output: 𝑦1 𝑦2 𝑦3 …… 𝑦𝑁; target: 𝑦̂1 𝑦̂2 𝑦̂3 …… 𝑦̂𝑁
Squared error: C = (1/2) Σ_{n=1}^{N} ‖𝑦ⁿ − 𝑦̂ⁿ‖²
Cross entropy: C = Σ_{n=1}^{N} −log 𝑦ⁿ_{rⁿ}, where rⁿ is the target class at step n
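Both costs can be checked numerically. A small sketch (the target ŷⁿ is one-hot; rⁿ is its class index):

```python
import numpy as np

def squared_error(ys, targets):
    """C = 1/2 * sum_n ||y^n - yhat^n||^2"""
    return 0.5 * sum(np.sum((y - t) ** 2) for y, t in zip(ys, targets))

def cross_entropy(ys, labels):
    """C = sum_n -log y^n[r^n], where r^n is the target class at step n."""
    return sum(-np.log(y[r]) for y, r in zip(ys, labels))

ys      = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]   # network outputs
targets = [np.array([1.0, 0.0]), np.array([1.0, 0.0])]   # one-hot targets
print(squared_error(ys, targets))   # 0.5*(0.25+0.25) + 0.5*(0.01+0.01) = 0.26
print(cross_entropy(ys, [0, 0]))    # -log(0.5) - log(0.9) ≈ 0.7985
```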
Training
RNN training is very difficult in practice.
Backpropagation through time (BPTT): unroll the network over the sequence and backpropagate through the unrolled copies.
w ← w − η ∂C⁄∂w, where w is an element of Wh, Wi or Wo.
More Applications
• Input and output are vector sequences with the same length
POS Tagging: John saw the saw. → PN V D N (each word xi gets a tag yi)
More Applications
• Named entity recognition: identifying names of people, places, organizations, etc. in a sentence
  • Harry Potter is a student of Hogwarts and lived on Privet Drive.
  • Labels: person, organization, place, or not a named entity
• Information extraction: extract the pieces of information relevant to a specific application, e.g. flight booking
  • I would like to leave Boston on November 2nd and arrive in Taipei before 2 p.m.
  • Labels: place of departure, destination, time of departure, time of arrival, other
Outline
• Vanilla Recurrent Neural Network (RNN)
• Variants of RNN
• Long Short-term Memory (LSTM)
Elman Network & Jordan Network
[Figure: both unroll over xt, xt+1, …… with weights Wi, Wh, Wo. The Elman network feeds the hidden-layer values back as the memory; the Jordan network feeds the outputs yt back instead.]
Deep RNN
[Figure: several recurrent layers stacked; at each time step t every layer receives the layer below it and its own value from t−1, producing yt, yt+1, yt+2 ……]
Bidirectional RNN
[Figure: a forward RNN reads xt, xt+1, xt+2 left to right, a backward RNN reads them right to left, and each output yt is produced from both hidden states.]
Many to one
• Input is a vector sequence, but output is only one vector
Sentiment Analysis
[Figure: the characters of 我覺得太糟了 ("I think it's terrible") are fed in one by one; only the final output is used. Rating classes, PTT movie-board style: 超好雷 / 好雷 / 普雷 / 負雷 / 超負雷 (from very positive to very negative).]
看了這部電影覺得很高興…… ("Watching this movie made me happy") → Positive (正雷)
這部電影太糟了…… ("This movie is terrible") → Negative (負雷)
這部電影很棒…… ("This movie is great") → Positive (正雷)
Many to Many (Output is shorter)
• Both input and output are vector sequences, but the output is shorter.
Speech Recognition
[Figure: frames x1 x2 x3 x4 …… are labelled 好 好 好 棒 棒 棒 棒 棒; trimming the repeats gives "好棒" ("great").]
Problem: you can never recognize "好棒棒"! (the repeated 棒 is trimmed away)
Many to Many (Output is shorter)
• Both input and output are vector sequences, but the output is shorter.
• Connectionist Temporal Classification (CTC)
• Add an extra "blank" symbol φ
好 φ φ 棒 φ φ φ φ → "好棒"
好 φ φ 棒 φ 棒 φ φ → "好棒棒"
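The decoding rule above — merge repeated labels, then delete φ — fits in a few lines. This sketches only the collapse step, not CTC training:

```python
from itertools import groupby

def ctc_collapse(frames, blank="φ"):
    """Merge consecutive duplicate labels, then drop the blank symbol."""
    merged = [label for label, _ in groupby(frames)]
    return "".join(label for label in merged if label != blank)

print(ctc_collapse(["好", "φ", "φ", "棒", "φ", "φ", "φ", "φ"]))   # 好棒
print(ctc_collapse(["好", "φ", "φ", "棒", "φ", "棒", "φ", "φ"]))  # 好棒棒
# Plain trimming has no blank, so a genuinely repeated character is lost:
print(ctc_collapse(["好", "好", "好", "棒", "棒", "棒"]))          # 好棒
```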
Many to Many (No Limitation)
• Both input and output are vector sequences with different lengths → sequence-to-sequence learning
Machine Translation: input "machine learning" → output 機 器 學 習 ("machine learning")
Problem: generation is never-ending — without a way to stop, the network keeps producing characters (機 器 學 習 慣 性 ……).
Many to Many (No Limitation)
• 推文接龍 (a PTT comment-chaining game)
• Ref: http://pttpedia.pixnet.net/blog/post/168133002-%E6%8E%A5%E9%BE%8D%E6%8E%A8%E6%96%87
推xxx: ptt萬歲  推dd: 歲平安  噓dddf: 全  推zzzzzzzzzzz: 家就是你家 ……
推tlkagk: =========斷========== (a "break" line ends the chain)
Many to Many (No Limitation)
• Both input and output are vector sequences with different lengths → sequence-to-sequence learning
• Add a stop symbol "===" (斷, "break")
Input "machine learning" → output 機 器 學 習 ===
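The stop symbol turns generation into a loop that runs until "===" is produced. A toy sketch — `next_token` is a hypothetical scripted stand-in for the decoder (a real model would sample from the RNN's softmax):

```python
def next_token(prefix):
    """Hypothetical stand-in for the decoder: returns the next output token."""
    script = ["機", "器", "學", "習", "==="]
    return script[len(prefix)]

def generate(stop="==="):
    out = []
    while True:
        tok = next_token(out)
        if tok == stop:        # the "===" (break) symbol ends generation
            return "".join(out)
        out.append(tok)

print(generate())   # 機器學習
```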
One to Many
• Input is one vector, but output is a vector sequence
Caption generation: input one image → output "a woman is throwing a ……", ending with === (break)
Outline
• Vanilla Recurrent Neural Network (RNN)
• Variants of RNN
• Long Short-term Memory (LSTM)
Long Short-term Memory (LSTM)
[Figure: a memory cell guarded by three gates — an input gate, an output gate and a forget gate — each opened or closed by a control signal from the other part of the network; the cell's input and output also connect to the rest of the network.]
LSTM is a special neuron: 4 inputs, 1 output.
The four inputs are 𝑧 (the cell input) and the gate signals 𝑧𝑖, 𝑧𝑓, 𝑧𝑜; 𝑐 is the current memory value.
The activation function f is usually a sigmoid: its value lies between 0 and 1, mimicking an open/close gate. The cell input passes through g, and the stored value through h, each multiplied by its gate:
𝑐′ = 𝑔(𝑧) 𝑓(𝑧𝑖) + 𝑐 𝑓(𝑧𝑓)
𝑎 = ℎ(𝑐′) 𝑓(𝑧𝑜)
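The four-input cell can be written directly from these formulas. A sketch assuming g and h are tanh and f is the sigmoid (the slides leave g and h unspecified):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell(z, zi, zf, zo, c):
    """c' = g(z) f(zi) + c f(zf);  a = h(c') f(zo).  f = sigmoid, g = h = tanh."""
    c_new = math.tanh(z) * sigmoid(zi) + c * sigmoid(zf)
    a = math.tanh(c_new) * sigmoid(zo)
    return a, c_new

# Gate signals near +100 / -100 saturate the sigmoid to ~1 / ~0 (open / closed):
a, c_new = lstm_cell(z=0.0, zi=-100, zf=100, zo=-100, c=0.5)
# input gate closed, forget gate open -> memory kept; output gate closed -> a ~ 0
```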
Original network: inputs x1, x2; each neuron computes 𝑧1, 𝑧2 …… and outputs 𝑎1, 𝑎2 ……
Simply replace the neurons with LSTM cells.
[Figure: the same network with each neuron replaced by an LSTM cell; every cell now receives four weighted sums of the input instead of one.]
This needs 4 times the parameters.
LSTM - Example
Input sequence (x1, x2, x3) and desired output y:
x1:  1   3   2   4   2   1   3   6   1
x2:  0   1   0   1   0   0  −1   1   0
x3:  0   0   0   0   0   1   0   0   1
y:   0   0   0   0   0   7   0   0   6
memory: 0   0   3   3   7   7   7   0   6
When x2 = 1, add the number in x1 into the memory.
When x2 = −1, reset the memory.
When x3 = 1, output the number in the memory.
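The three rules can be run directly to verify the target outputs — a plain-Python sketch, before any actual gates are wired up:

```python
def run(seq):
    memory, ys = 0, []
    for x1, x2, x3 in seq:
        if x2 == 1:            # write: add x1 into the memory
            memory += x1
        elif x2 == -1:         # forget: reset the memory
            memory = 0
        ys.append(memory if x3 == 1 else 0)   # read only when x3 = 1
    return ys

seq = [(1, 0, 0), (3, 1, 0), (2, 0, 0), (4, 1, 0), (2, 0, 0),
       (1, 0, 1), (3, -1, 0), (6, 1, 0), (1, 0, 1)]
print(run(seq))   # [0, 0, 0, 0, 0, 7, 0, 0, 6]
```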
[Figure: one LSTM cell wired for the task. Reconstructed weights: cell input z = 1·x1; input gate z_i = 100·x2 − 10 (open only when x2 = 1); forget gate z_f = 100·x2 + 10 (closed only when x2 = −1); output gate z_o = 100·x3 − 10 (open only when x3 = 1).]
Input (1, 0, 0): input gate ≈ 0 and output gate ≈ 0, so the memory stays 0 and y ≈ 0.
Input (3, 1, 0): the input gate opens (f(90) ≈ 1), so 3 enters; the forget gate stays open, so the memory becomes 0 + 3 = 3; the output gate is closed, so y ≈ 0.
Input (4, 1, 0): 4 enters through the open input gate; the memory becomes 3 + 4 = 7; the output gate is closed, so y ≈ 0.
Input (2, 0, 0): the input gate is closed (f(−10) ≈ 0), so nothing enters; the memory stays 7; y ≈ 0.
Input (1, 0, 1): the input gate is closed; the memory stays 7; the output gate opens (f(90) ≈ 1), so y ≈ 7.
Input (3, −1, 0): the input gate is closed and the forget gate also closes (f(−90) ≈ 0), so the memory is reset to 0; y ≈ 0.
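The whole walkthrough can be replayed in code. A sketch using reconstructed weights (cell input z = x1; z_i = 100·x2 − 10; z_f = 100·x2 + 10; z_o = 100·x3 − 10) and, as the stored values 3 and 7 suggest, identity g and h:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x1, x2, x3, c):
    z  = x1                # cell input: weight 1 on x1
    zi = 100 * x2 - 10     # input gate: open only when x2 = 1
    zf = 100 * x2 + 10     # forget gate: closed only when x2 = -1
    zo = 100 * x3 - 10     # output gate: open only when x3 = 1
    c_new = z * sigmoid(zi) + c * sigmoid(zf)   # g, h taken as identity
    return c_new * sigmoid(zo), c_new

seq = [(1, 0, 0), (3, 1, 0), (2, 0, 0), (4, 1, 0), (2, 0, 0),
       (1, 0, 1), (3, -1, 0), (6, 1, 0), (1, 0, 1)]
c, ys = 0.0, []
for x1, x2, x3 in seq:
    y, c = lstm_step(x1, x2, x3, c)
    ys.append(round(y, 2))
print(ys)   # [0.0, 0.0, 0.0, 0.0, 0.0, 7.0, 0.0, 0.0, 6.0]
```

The large gate weights saturate the sigmoids, so the cell behaves almost exactly like the hand-specified add/reset/output rules.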
What is the next wave?
• Attention-based Model
[Figure: a DNN maps input x to output y; a reading-head controller positions a reading head, and a writing-head controller positions a writing head, over an external memory.]
Recommended Reading List
• The Unreasonable Effectiveness of Recurrent Neural Networks
• http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Understanding LSTM Networks
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
Acknowledgement
• Thanks to classmate 葉軒銘 for spotting errors on the slides during class.