Post on 10-Feb-2017
Computational Linguistics Week 10
Neural Sequence Modeling
Mark Chang
Outline
• Recurrent Neural Networks
• Long Short-Term Memory
• Neural Turing Machine
• Applications
Recurrent Neural Networks
Short-term memory: to predict the poem 「白日依山盡,黃河入海流」 one character at a time, the network must remember the prefix it has seen so far:
白 → 白日 → 白日依 → 白日依山 → …
Short-term memory: without recurrence, each character is processed independently by the same feedforward unit.
[Diagram: a unit n with inputs x1, x2, weights w1, w2, bias wb, and output y maps 白 to n(白) and 日 to n(日) separately, with no state shared between them]
Recurrent Neural Network
With recurrence, each output depends on the entire prefix:
白 → n(白), 日 → n(n(白), 日), 依 → n(n(n(白), 日), 依)
From Neural Networks to Deep Learning
• Feedforward Neural Network
• Recurrent Neural Network
• Long Short-Term Memory
• Neural Turing Machine
Recurrent Neural Network
n_{in,t} = w_c\,x_t + w_p\,n_{out,t-1} + w_b

n_{out,t} = \frac{1}{1 + e^{-n_{in,t}}}

The output n_{out} of the previous time step is fed back into n_{in} at the current time step.
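The recurrent update above can be sketched in plain Python; the scalar weights w_c, w_p, w_b below are illustrative values, not taken from the slides:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_step(x_t, n_out_prev, w_c, w_p, w_b):
    """One recurrent step: n_in,t = w_c*x_t + w_p*n_out,t-1 + w_b."""
    n_in = w_c * x_t + w_p * n_out_prev + w_b
    return sigmoid(n_in)

# Feed a short input sequence through the single recurrent unit.
w_c, w_p, w_b = 0.5, 1.0, -0.2   # made-up weights for illustration
n_out = 0.0                      # initial state
for x in [1.0, 0.0, 1.0]:
    n_out = rnn_step(x, n_out, w_c, w_p, w_b)
```

Because n_out is fed back in, the final state depends on every input seen so far, not only the last one.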
Recurrent Neural Network
[Diagram: the same network unrolled through time; inputs x0, x1, …, xt-1, xt produce outputs y0, y1, …, yt-1, yt, with the hidden state passed from each time step to the next]
Backward Propagation Through Time

For t = 0:

\delta_{in,0} = \frac{\partial J}{\partial n_{out,0}} \frac{\partial n_{out,0}}{\partial n_{in,0}} = \delta_{out,0} \frac{\partial n_{out,0}}{\partial n_{in,0}}

For t = 1:

\delta_{in,0} = \frac{\partial J}{\partial n_{out,1}} \frac{\partial n_{out,1}}{\partial n_{in,1}} \frac{\partial n_{in,1}}{\partial n_{out,0}} \frac{\partial n_{out,0}}{\partial n_{in,0}}
= \delta_{out,1} \frac{\partial n_{out,1}}{\partial n_{in,1}} \frac{\partial n_{in,1}}{\partial n_{out,0}} \frac{\partial n_{out,0}}{\partial n_{in,0}}
= \delta_{in,1} \frac{\partial n_{in,1}}{\partial n_{out,0}} \frac{\partial n_{out,0}}{\partial n_{in,0}}
= \delta_{out,0} \frac{\partial n_{out,0}}{\partial n_{in,0}}
Backward Propagation Through Time

In general, for s ≤ t:

\delta_{in,s} =
\begin{cases}
\dfrac{\partial J}{\partial n_{out,s}} \dfrac{\partial n_{out,s}}{\partial n_{in,s}} & \text{if } s = t \\[1ex]
\delta_{in,s+1} \dfrac{\partial n_{in,s+1}}{\partial n_{out,s}} \dfrac{\partial n_{out,s}}{\partial n_{in,s}} & \text{otherwise}
\end{cases}

http://cpmarkchang.logdown.com/posts/278457-neural-network-recurrent-neural-network

That is, \delta_{in,t} = \frac{\partial J}{\partial n_{out,t}} \frac{\partial n_{out,t}}{\partial n_{in,t}} at the final step, and each earlier \delta_{in,s} is obtained from \delta_{in,s+1} by the recursion above.
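The two-case definition of \delta_{in,s} can be sketched as a backward loop. Here sig_deriv[s] stands in for \partial n_{out,s}/\partial n_{in,s} and the recurrent weight w_p for \partial n_{in,s+1}/\partial n_{out,s}; the numeric values are illustrative:

```python
def bptt_deltas(dJ_dnout_t, sig_deriv, w_p):
    """Compute delta_in,s for s = t down to 0.

    At s = t:  delta_in,t = dJ/dn_out,t * dn_out,t/dn_in,t.
    Otherwise: delta_in,s = delta_in,s+1 * w_p * sig_deriv[s],
    since dn_in,s+1/dn_out,s is the recurrent weight w_p.
    """
    t = len(sig_deriv) - 1
    deltas = [0.0] * (t + 1)
    deltas[t] = dJ_dnout_t * sig_deriv[t]
    for s in range(t - 1, -1, -1):
        deltas[s] = deltas[s + 1] * w_p * sig_deriv[s]
    return deltas

# Three time steps, constant sigmoid derivative 0.25, recurrent weight 1.
deltas = bptt_deltas(1.0, [0.25, 0.25, 0.25], w_p=1.0)
```

The deltas shrink geometrically toward earlier time steps, which previews the vanishing-gradient problem discussed below.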
Deep RNN
[Diagram: stacked recurrent layers; at each time step the inputs x0, x1, …, xt-1, xt pass through several recurrent layers before producing y0, y1, …, yt-1, yt]
Bi-Directional RNN
[Diagram: two recurrent layers process the inputs x0, x1, …, xt-1, xt in opposite directions (forward and backward in time); their states are combined to produce y0, y1, …, yt-1, yt]
Long Short-Term Memory
Vanishing Gradient Problem

\delta_{in,0} = \delta_{out,t} \frac{\partial n_{out,t}}{\partial n_{in,t}} \frac{\partial n_{in,t}}{\partial n_{out,t-1}} \cdots \frac{\partial n_{in,1}}{\partial n_{out,0}} \frac{\partial n_{out,0}}{\partial n_{in,0}}

The error signal reaching step 0 is a product with one factor per time step; whenever those factors are smaller than 1 (the sigmoid derivative is at most 0.25), the product, and hence the gradient, shrinks exponentially with the length of the sequence.
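A quick numeric illustration of why this product vanishes: even with a recurrent weight of 1, each backward step multiplies the delta by at most 0.25 (the maximum slope of the logistic function):

```python
# Each backward step through time multiplies the delta by
# (dn_out/dn_in) * w_p, and the logistic derivative dn_out/dn_in <= 0.25.
w_p = 1.0
grad = 1.0
for _ in range(20):          # propagate back through 20 time steps
    grad *= 0.25 * w_p
# After 20 steps the gradient has shrunk by a factor of 4**20 (~10**12).
```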
Long Short-Term Memory
[Diagram: the LSTM memory cell. Input x_t enters through C_in; the candidate value m_{in,t} passes the write gate C_write into the memory cell; the cell carries m_{out,t-1} forward through the forget gate C_forget; the stored value m_{out,t} leaves through n_out and the read gate C_read to produce the output C_out at y_t]
Long Short-Term Memory
• Input value C_in
• Read gate C_read
• Forget gate C_forget
• Write gate C_write
• Output value C_out
Long Short-Term Memory
• Write gate C_write: controls whether the memory can be written.

C_{write} = sigmoid(w_{cw,x}\,x_t + w_{cw,y}\,y_{t-1} + w_{cw,b})

k_{out} = sigmoid(w_{k,x}\,x_t + w_{k,b})

m_{in,t} = k_{out} \cdot C_{write}
Long Short-Term Memory
• Forget gate C_forget: controls whether the previous value is kept.

C_{forget} = sigmoid(w_{cf,x}\,x_t + w_{cf,y}\,y_{t-1} + w_{cf,b})

m_{out,t} = m_{in,t} + C_{forget} \cdot m_{out,t-1}
Long Short-Term Memory
• Read gate C_read: controls whether the memory can be read.

n_{out} = sigmoid(m_{out,t})

C_{read} = sigmoid(w_{cr,x}\,x_t + w_{cr,y}\,y_{t-1} + w_{cr,b})

C_{out} = C_{read} \cdot n_{out}
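Putting the three gates together, one step of the memory cell described on these slides can be sketched as follows. The scalar weights are made-up values; their names mirror the slide notation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_cell_step(x_t, y_prev, m_prev, w):
    """One step of the gated memory cell (w: dict of illustrative scalar weights)."""
    c_write = sigmoid(w['cw_x'] * x_t + w['cw_y'] * y_prev + w['cw_b'])
    c_forget = sigmoid(w['cf_x'] * x_t + w['cf_y'] * y_prev + w['cf_b'])
    c_read = sigmoid(w['cr_x'] * x_t + w['cr_y'] * y_prev + w['cr_b'])
    k_out = sigmoid(w['k_x'] * x_t + w['k_b'])
    m_in = k_out * c_write              # gated write: m_in,t = k_out * C_write
    m_t = m_in + c_forget * m_prev      # memory update with forget gate
    n_out = sigmoid(m_t)
    c_out = c_read * n_out              # gated read: C_out = C_read * n_out
    return c_out, m_t

w = {k: 0.5 for k in ['cw_x', 'cw_y', 'cw_b', 'cf_x', 'cf_y', 'cf_b',
                      'cr_x', 'cr_y', 'cr_b', 'k_x', 'k_b']}
y, m = 0.0, 0.0
for x in [1.0, 1.0]:
    y, m = lstm_cell_step(x, y, m, w)
```

The key difference from the plain RNN is the additive memory update m_t = m_in + C_forget * m_prev, which lets the stored value survive across many steps.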
Training: Backward Propagation
http://www.felixgers.de/papers/phd.pdf

m_{out,t} = m_{in,t} + C_{forget}\,m_{out,t-1}, \quad m_{in,t} = k_{out}\,C_{write}

\frac{\partial m_{out,t}}{\partial w_{k,x}} = \frac{\partial m_{in,t}}{\partial w_{k,x}} + C_{forget} \frac{\partial m_{out,t-1}}{\partial w_{k,x}} = C_{write} \frac{\partial k_{out}}{\partial w_{k,x}} + C_{forget} \frac{\partial m_{out,t-1}}{\partial w_{k,x}}
Long Short-Term Memory
https://class.coursera.org/neuralnets-2012-001/lecture/95
Neural Turing Machine
[Diagram: a controller receives the input and produces the output, and reads from and writes to an external memory through a read/write head]

Memory
[Diagram: the memory M is laid out as memory blocks at addresses 0, 1, …, i, …, n; each block is a vector of length m with entries indexed 0, …, j, …, m]
Read Operation

Memory M, with block i as the i-th column (example values from the slide):

M = \begin{bmatrix} 1 & 2 & 3 & \dots \\ 1 & 1 & 2 & \dots \\ 2 & 4 & 1 & \dots \end{bmatrix}

Head location: w = (0.9, 0.1, 0, 0, 0, 0), with \sum_i w(i) = 1 and 0 \le w(i) \le 1 for all i.

Read vector: r \leftarrow \sum_i w(i)\,M(i)

\begin{bmatrix} r_0 \\ r_1 \\ r_2 \end{bmatrix} = \begin{bmatrix} 1 \cdot 0.9 + 2 \cdot 0.1 \\ 1 \cdot 0.9 + 1 \cdot 0.1 \\ 2 \cdot 0.9 + 4 \cdot 0.1 \end{bmatrix} = \begin{bmatrix} 1.1 \\ 1.0 \\ 2.2 \end{bmatrix}
Erase Operation

Head location: w = (0.9, 0.1, 0, 0, 0, 0); erase vector: e = (1, 0, 1), with 0 \le e(j) \le 1 for all j.

Each block is scaled element-wise: M(i)_j \leftarrow (1 - w(i)\,e(j))\,M(i)_j, written M(i) \leftarrow (1 - w(i)\,e)\,M(i).

M = \begin{bmatrix} 1(1-0.9) & 2(1-0.1) & 3 & \dots \\ 1 & 1 & 2 & \dots \\ 2(1-0.9) & 4(1-0.1) & 1 & \dots \end{bmatrix} = \begin{bmatrix} 0.1 & 1.8 & 3 & \dots \\ 1 & 1 & 2 & \dots \\ 0.2 & 3.6 & 1 & \dots \end{bmatrix}
Add Operation

Head location: w = (0.9, 0.1, 0, 0, 0, 0); add vector: a = (1, 1, 0).

Each block is updated: M(i) \leftarrow M(i) + w(i)\,a

M = \begin{bmatrix} 0.1 + 0.9 & 1.8 + 0.1 & 3 & \dots \\ 1.0 + 0.9 & 1.0 + 0.1 & 2 & \dots \\ 0.2 & 3.6 & 1 & \dots \end{bmatrix} = \begin{bmatrix} 1.0 & 1.9 & 3 & \dots \\ 1.9 & 1.1 & 2 & \dots \\ 0.2 & 3.6 & 1 & \dots \end{bmatrix}
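The erase and add updates can be sketched together; the blocks, head weighting, erase vector e, and add vector a below reproduce the slide's example, restricted to the first three blocks:

```python
def ntm_erase(M, w, e):
    """M(i)_j <- (1 - w(i) * e_j) * M(i)_j (element-wise per block)."""
    return [[(1 - w[i] * e[j]) * M[i][j] for j in range(len(e))]
            for i in range(len(M))]

def ntm_add(M, w, a):
    """M(i)_j <- M(i)_j + w(i) * a_j."""
    return [[M[i][j] + w[i] * a[j] for j in range(len(a))]
            for i in range(len(M))]

M = [[1, 1, 2], [2, 1, 4], [3, 2, 1]]   # blocks (columns) from the slide
w = [0.9, 0.1, 0.0]                     # head location
M = ntm_erase(M, w, e=[1, 0, 1])        # scales rows 0 and 2 of the addressed blocks
M = ntm_add(M, w, a=[1, 1, 0])          # adds a, weighted by w, to each block
```

As with reading, both updates are smooth in w, e, and a, so the write path is differentiable end to end.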
Controller
[Diagram: at each step the controller takes the input and the read vector r, and emits the output together with the head location w, the erase vector e, and the add vector a]
Addressing Mechanisms

The head location w is computed in four stages, each driven by a controller output:
• Content Addressing, parameters: memory key k and focus \beta
• Interpolation, parameter: gate g
• Convolutional Shift, parameter: shift kernel s
• Sharpening, parameter: \gamma

[Diagram: starting from the previous head location w_{t-1} and the memory M, the controller outputs (k, \beta = 50, g = 0.5, s, \gamma = 50) are applied in sequence to produce the new head location w]
Content Addressing

K[u, v] = \frac{u \cdot v}{|u| \cdot |v|}

w(i) \leftarrow \frac{e^{\beta K[k, M(i)]}}{\sum_j e^{\beta K[k, M(j)]}}

Find the locations in memory M whose content is closest to the key k; the parameter \beta adjusts how concentrated the result is. With the slide's memory and key k = (2, 3, 1): \beta = 0 gives a uniform weighting (.16, .16, .16, .16, .16, .16), \beta = 5 gives (.15, .10, .47, .08, .13, .17), and \beta = 50 puts essentially all of the weight on the matching block.
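Content addressing is cosine similarity followed by a softmax with focus \beta; a sketch over the slide's memory columns:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def content_addressing(M, k, beta):
    """w(i) = exp(beta * K[k, M(i)]) / sum_j exp(beta * K[k, M(j)])."""
    scores = [math.exp(beta * cosine(block, k)) for block in M]
    total = sum(scores)
    return [s / total for s in scores]

# Memory blocks as columns (values read off the slide) and the query key.
M = [[1, 2, 4], [1, 1, 2], [2, 3, 1], [0, 0, 1], [4, 1, 5], [0, 1, 0]]
k = [2, 3, 1]
w_sharp = content_addressing(M, k, beta=50)   # concentrates on the matching block
w_flat = content_addressing(M, k, beta=0)     # uniform weighting
```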
Interpolation

w_t \leftarrow g\,w_t + (1 - g)\,w_{t-1}

Blend the current head location w_t with the previous step's location w_{t-1}; the parameter g sets the ratio between them. g = 1 keeps only w_t, g = 0 keeps only w_{t-1}, and g = 0.5 averages the two, e.g. blending w_{t-1} = (0.9, 0.1, 0, 0, 0, 0) with a one-hot w_t yields (.45, .05, .50, 0, 0, 0).
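The interpolation step is a single blend; the vectors below reproduce the slide's g = 0.5 example, assuming the proposed w_t is one-hot at the third address:

```python
def interpolate(w_t, w_prev, g):
    """w_t <- g * w_t + (1 - g) * w_prev, element-wise."""
    return [g * a + (1 - g) * b for a, b in zip(w_t, w_prev)]

w_prev = [0.9, 0.1, 0, 0, 0, 0]   # head location from the previous step
w_new = [0, 0, 1, 0, 0, 0]        # location proposed by content addressing
w = interpolate(w_new, w_prev, g=0.5)
```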
Convolutional Shift

w(i) \leftarrow \sum_j w(j)\,s(i - j), \quad \text{i.e.} \quad w(i) \leftarrow w(i-1)\,s(1) + w(i)\,s(0) + w(i+1)\,s(-1)

Shift the values in w along the address axis; the kernel s over offsets (-1, 0, 1) sets the shift direction. A one-hot kernel shifts w by one position, while a soft kernel blurs it: with s = (.5, 0, .5), the weighting (.45, .05, .50, 0, 0, 0) becomes (.025, .475, .025, .25, 0, .225).
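The circular shift can be sketched with modular indexing; the kernel s is given as a dict over offsets -1, 0, 1:

```python
def conv_shift(w, s):
    """w(i) <- w(i-1)*s(1) + w(i)*s(0) + w(i+1)*s(-1), indices taken circularly."""
    n = len(w)
    return [w[(i - 1) % n] * s[1] + w[i] * s[0] + w[(i + 1) % n] * s[-1]
            for i in range(n)]

w = [0.45, 0.05, 0.5, 0, 0, 0]
shifted = conv_shift(w, {-1: 0, 0: 0, 1: 1})      # hard shift by one position
blurred = conv_shift(w, {-1: 0.5, 0: 0, 1: 0.5})  # soft kernel blurs the focus
```

The soft-kernel case is why the next stage, sharpening, exists: repeated shifts with non-one-hot kernels gradually smear the weighting.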
Sharpening

w(i) \leftarrow \frac{w(i)^\gamma}{\sum_j w(j)^\gamma}

Make the values in w more concentrated (or more dispersed); the parameter \gamma adjusts the concentration. With w = (0, .45, .05, .50, 0, 0): \gamma = 50 concentrates almost all weight on the largest entry, \gamma = 5 gives (0, .37, 0, .62, 0, 0), and \gamma = 0 gives a uniform weighting (.16, .16, .16, .16, .16, .16).
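Sharpening is a normalized power; this sketch reuses the slide's weighting:

```python
def sharpen(w, gamma):
    """w(i) <- w(i)**gamma / sum_j w(j)**gamma."""
    powered = [v ** gamma for v in w]
    total = sum(powered)
    return [v / total for v in powered]

w = [0, 0.45, 0.05, 0.5, 0, 0]
w_sharp = sharpen(w, 50)   # nearly all mass moves to the largest entry
w_mild = sharpen(w, 5)     # roughly (0, .37, 0, .62, 0, 0), as on the slide
w_flat = sharpen(w, 0)     # uniform: 0 ** 0 == 1 in Python, so every entry is 1/6
```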
Experiment: Repeat Copy
https://github.com/fumin/ntm
Evolution of Recurrent Neural Networks
• Recurrent Neural Network: short-term memory
• Long Short-Term Memory: controllable reading and writing of a memory cell
• Neural Turing Machine: more flexible control over the position of the memory read/write heads
Applications

Machine Translation: A B C -> W X Y Z
http://arxiv.org/pdf/1409.3215.pdf

Chinese Word Segmentation
http://arxiv.org/pdf/1602.04874v1.pdf

Chinese Poetry Generation
http://emnlp2014.org/papers/pdf/EMNLP2014074.pdf

Image Caption Generation
http://arxiv.org/pdf/1411.4555v2.pdf

Visual Question Answering
http://arxiv.org/pdf/1505.00468v6.pdf
Further Reading
• The Unreasonable Effectiveness of Recurrent Neural Networks – http://karpathy.github.io/2015/05/21/rnn-effectiveness/
• Understanding LSTM Networks – http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Recurrent Neural Networks – http://cpmarkchang.logdown.com/posts/278457-neural-network-recurrent-neural-network
• Neural Turing Machine – http://cpmarkchang.logdown.com/posts/279710-neural-network-neural-turing-machine