Computation Graphs
Brian Thompson (slides by Philipp Koehn)
2 October 2018
Philipp Koehn Machine Translation: Computation Graphs 2 October 2018
Neural Network Cartoon
• A common way to illustrate a neural network
Neural Network Math
• Hidden layer: h = sigmoid(W1 x + b1)
• Final layer: y = sigmoid(W2 h + b2)
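These two formulas can be sketched directly in NumPy (a minimal sketch; the function names are mine, and the weight values are the ones used in the example on the following slides):

```python
import numpy as np

def sigmoid(z):
    # elementwise logistic function 1 / (1 + e^(-z))
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W1, b1, W2, b2):
    h = sigmoid(W1 @ x + b1)  # hidden layer
    y = sigmoid(W2 @ h + b2)  # final layer
    return h, y

# parameter values from the example network on the next slides
W1 = np.array([[3.7, 3.7], [2.9, 2.9]])
b1 = np.array([-1.5, -4.6])
W2 = np.array([[4.5, -5.2]])
b2 = np.array([-2.0])
h, y = forward(np.array([1.0, 0.0]), W1, b1, W2, b2)
```

The resulting h and y agree with the step-by-step values on the following slides up to the rounding of the slides' intermediate results.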
Computation Graph

[Figure: the network as a computation graph — x and W1 feed a prod node, a sum node adds b1, and a sigmoid node gives the hidden layer; its output and W2 feed a second prod node, a sum node adds b2, and a final sigmoid node gives the output]
Simple Neural Network

[Figure: example network with weights W1 = [[3.7, 3.7], [2.9, 2.9]], b1 = [−1.5, −4.6], W2 = [4.5, −5.2], b2 = [−2.0], and bias units fixed to 1]
Computation Graph

[Figure: the computation graph annotated with the parameter values: W1 = [[3.7, 3.7], [2.9, 2.9]], b1 = [−1.5, −4.6], W2 = [4.5, −5.2], b2 = [−2.0]]
Processing Input

[Figure: the annotated computation graph with input x = [1.0, 0.0] fed in]
Processing Input

[Figure: the first prod node is computed: W1 x = [3.7, 2.9]]
Processing Input

[Figure: the first sum node is computed: W1 x + b1 = [2.2, −1.6]]
Processing Input

[Figure: the hidden sigmoid is computed: h = sigmoid([2.2, −1.6]) = [.900, .168]]
Processing Input

[Figure: the output layer is computed: W2 h = [3.18], adding b2 gives [1.18], and the final sigmoid gives y = [.765]]
Error Function

• For training, we need a measure of how well we are doing
⇒ cost function (also known as objective function, loss, gain, ...)
• For instance, the L2 norm: error = ½ (t − y)²
Gradient Descent

• We view the error as a function of the trainable parameters: error(λ)
• We want to optimize error(λ) by moving λ towards its optimum

[Figure: three plots of error(λ) against λ, each showing the current λ, the optimal λ, and a gradient of 1, 2, and 0.2 at the current λ]

• Why not just set λ to its optimum?
  – we are updating based on one training example; we do not want to overfit to it
  – we are also changing all the other parameters; the curve will look different
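The update rule behind these pictures is λ ← λ − µ · gradient. A minimal sketch on an assumed toy error function error(λ) = (λ − 2)² (not from the slides), where repeated small steps approach the optimum without ever jumping straight to it:

```python
def error(lam):
    # toy error surface with its optimum at lambda = 2
    return (lam - 2.0) ** 2

def gradient(lam):
    # d/dlambda (lam - 2)^2 = 2 (lam - 2)
    return 2.0 * (lam - 2.0)

lam = 0.0   # current lambda
mu = 0.1    # learning rate
for _ in range(100):
    lam -= mu * gradient(lam)
# lam creeps towards the optimum at 2.0
```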
Calculus Refresher: Chain Rule

• Formula for computing the derivative of a composition of two or more functions
  – functions f and g
  – composition f ∘ g maps x to f(g(x))
• Chain rule: (f ∘ g)′ = (f′ ∘ g) · g′, or F′(x) = f′(g(x)) g′(x)
• Leibniz's notation: dz/dx = dz/dy · dy/dx
  if z = f(y) and y = g(x), then dz/dx = dz/dy · dy/dx = f′(y) g′(x) = f′(g(x)) g′(x)
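A quick numeric sanity check of the chain rule, using assumed toy functions f(y) = y² and g(x) = 3x + 1:

```python
def f(y): return y * y            # f(y) = y^2
def g(x): return 3.0 * x + 1.0    # g(x) = 3x + 1
def df(y): return 2.0 * y         # f'(y)
def dg(x): return 3.0             # g'(x)

x = 1.5
chain = df(g(x)) * dg(x)  # chain rule: f'(g(x)) * g'(x)

# compare against a finite-difference approximation of (f o g)'(x)
eps = 1e-6
numeric = (f(g(x + eps)) - f(g(x - eps))) / (2 * eps)
```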
Final Layer Update

• Linear combination of weights: s = Σ_k w_k h_k
• Activation function: y = sigmoid(s)
• Error (L2 norm): E = ½ (t − y)²
• Derivative of the error with respect to one weight w_k:
  dE/dw_k = dE/dy · dy/ds · ds/dw_k
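Multiplying out the three factors gives dE/dw_k = −(t − y) · y(1 − y) · h_k (the minus sign comes from differentiating ½(t − y)² with respect to y; the slides fold this sign into their notation). A sketch with assumed toy values, checked against finite differences:

```python
import math

def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))

def E(w, h, t):
    # E = 1/2 (t - y)^2 with y = sigmoid(sum_k w_k h_k)
    s = sum(wk * hk for wk, hk in zip(w, h))
    return 0.5 * (t - sigmoid(s)) ** 2

w = [4.5, -5.2]       # final-layer weights (assumed toy values)
h = [0.900, 0.168]    # hidden-layer activations
t = 1.0               # target

s = sum(wk * hk for wk, hk in zip(w, h))
y = sigmoid(s)
# dE/dw_k = dE/dy * dy/ds * ds/dw_k = -(t - y) * y (1 - y) * h_k
grad = [-(t - y) * y * (1.0 - y) * hk for hk in h]
```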
Error Computation in Computation Graph

[Figure: the computation graph extended with an L2 node that compares the network output against the target t]
Error Propagation in Computation Graph

[Figure: a chain of nodes A → B → E]

• Compute the derivative at node A: dE/dA = dE/dB · dB/dA
• Assume that we already computed dE/dB (backward pass through the graph)
• So we only have to get the formula for dB/dA
• For instance, if B is a square node:
  – forward computation: B = A²
  – backward computation: dB/dA = dA²/dA = 2A
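The square-node example, written as a minimal node object with a forward and a backward method (a sketch; the class and method names are mine, not any toolkit's API):

```python
class Square:
    """Computation-graph node computing B = A^2."""

    def forward(self, A):
        self.A = A      # remember the input for the backward pass
        return A * A

    def backward(self, dE_dB):
        # dB/dA = 2A, so dE/dA = dE/dB * 2A
        return dE_dB * 2.0 * self.A

node = Square()
B = node.forward(3.0)        # B = 9.0
dE_dA = node.backward(1.0)   # 1.0 * 2 * 3.0 = 6.0
```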
Derivatives for Each Node

[Figure: the computation graph with the L2 node]

• L2 node: do/di = d/di ½(t − i)² = t − i
Derivatives for Each Node

[Figure: the computation graph with the L2 and sigmoid nodes]

• L2 node: do/di = d/di ½(t − i)² = t − i
• sigmoid node: do/di = d/di σ(i) = σ(i)(1 − σ(i))
Derivatives for Each Node

[Figure: the computation graph with the L2, sigmoid, and sum nodes]

• L2 node: do/di = d/di ½(t − i)² = t − i
• sigmoid node: do/di = d/di σ(i) = σ(i)(1 − σ(i))
• sum node: do/di₁ = d/di₁ (i₁ + i₂) = 1, do/di₂ = 1
Derivatives for Each Node

[Figure: the computation graph with all node types]

• L2 node: do/di = d/di ½(t − i)² = t − i
• sigmoid node: do/di = d/di σ(i) = σ(i)(1 − σ(i))
• sum node: do/di₁ = d/di₁ (i₁ + i₂) = 1, do/di₂ = 1
• prod node: do/di₁ = d/di₁ (i₁ i₂) = i₂, do/di₂ = i₁
Backward Pass: Derivative Computation

[Figure: the computation graph annotated with the local derivative of each node (L2: i₂ − i₁; sigmoid: σ′(i); sum: 1, 1; prod: i₂, i₁), the forward values from before, the target t = [1.0], the error value [.0277], and the first backward derivative [.235]]
Backward Pass: Derivative Computation

[Figure: as before; at the output sigmoid node, the local derivative [.180] is multiplied by the incoming derivative [.235], giving [.0424]]
Backward Pass: Derivative Computation

[Figure: as before; the sum node passes the derivative [.0424] unchanged to both of its inputs, b2 and the prod node]
Backward Pass: Derivative Computation

[Figure: the complete backward pass; at the second prod node the derivatives are [.0382 .00712] with respect to W2 and [.191 −.220] with respect to h; through the hidden sigmoid these become [.0171 −.0308]; the first sum node passes [.0171 −.0308] on to b1 and the first prod node, where the derivatives are [[.0171 0], [−.0308 0]] with respect to W1 and [−.0260 −.0260] with respect to x]
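The whole backward pass can be written out node by node. A sketch (the variable names are mine) that uses the signed derivative y − t of ½(t − y)² where the slides write t − i; the test checks the gradients against finite differences rather than the slides' rounded values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_backward(x, t, W1, b1, W2, b2):
    # forward pass
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)
    E = 0.5 * np.sum((t - y) ** 2)
    # backward pass, one node at a time
    dE_dy = y - t                    # L2 node
    dE_ds2 = dE_dy * y * (1 - y)     # output sigmoid node
    gW2 = np.outer(dE_ds2, h)        # prod node, w.r.t. W2
    gb2 = dE_ds2                     # sum node, w.r.t. b2
    dE_dh = W2.T @ dE_ds2            # prod node, w.r.t. h
    dE_ds1 = dE_dh * h * (1 - h)     # hidden sigmoid node
    gW1 = np.outer(dE_ds1, x)        # prod node, w.r.t. W1
    gb1 = dE_ds1                     # sum node, w.r.t. b1
    return E, gW1, gb1, gW2, gb2

W1 = np.array([[3.7, 3.7], [2.9, 2.9]])
b1 = np.array([-1.5, -4.6])
W2 = np.array([[4.5, -5.2]])
b2 = np.array([-2.0])
x = np.array([1.0, 0.0])
t = np.array([1.0])
E, gW1, gb1, gW2, gb2 = forward_backward(x, t, W1, b1, W2, b2)
```

The order of the backward steps mirrors the node order on the slide: L2, output sigmoid, sum, prod, hidden sigmoid, sum, prod.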
Gradients for Parameter Update

[Figure: the computation graph with the gradient attached to each parameter: W1: [[.0171, 0], [−.0308, 0]], b1: [.0171, −.0308], W2: [.0382, .00712], b2: [.0424]]
Parameter Update

[Figure: each parameter is updated by subtracting the learning rate µ times its gradient:
W1 ← [[3.7, 3.7], [2.9, 2.9]] − µ [[.0171, 0], [−.0308, 0]]
b1 ← [−1.5, −4.6] − µ [.0171, −.0308]
W2 ← [4.5, −5.2] − µ [.0382, .00712]
b2 ← [−2.0] − µ [.0424]]
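Taking the W2 row as an example, with an assumed learning rate µ = 0.1 (the slides leave µ unspecified):

```python
import numpy as np

mu = 0.1  # assumed learning rate; the slides leave the value of mu open
W2 = np.array([4.5, -5.2])
gW2 = np.array([0.0382, 0.00712])  # gradient from the backward pass
W2_new = W2 - mu * gW2  # parameter update: W2 <- W2 - mu * gradient
```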
Toolkits
Explosion of Deep Learning Toolkits
• University of Montreal: Theano
• Google: TensorFlow
• Microsoft: CNTK
• Facebook: Torch, PyTorch
• Amazon: MXNet
• CMU: DyNet
• AMU/Edinburgh: Marian
• ... and many more
Toolkits
• Machine learning architectures built around computation graphs are very powerful
  – define a computation graph
  – provide data and a training strategy (e.g., batching)
  – the toolkit does the rest
Example: Theano
• Deep learning toolkit for Python
• Included as a library
> import numpy
> import theano
> import theano.tensor as T
Example: Theano
• Definition of parameters
> x = T.dmatrix()
> W = theano.shared(value=numpy.array([[3.0,2.0],[4.0,3.0]]))
> b = theano.shared(value=numpy.array([-2.0,-4.0]))
• Definition of a feed-forward layer
> h = T.nnet.sigmoid(T.dot(x,W)+b)
note: x is a matrix → processes several training examples (sequence of vectors)
• Define as a callable function
> h_function = theano.function([x], h)
• Apply to data
> h_function([[1,0]])
array([[ 0.73105858, 0.11920292]])
Example: Theano
• Same setup for the hidden→output layer
> W2 = theano.shared(value=numpy.array([5.0,-5.0]))
> b2 = theano.shared(-2.0)
> y_pred = T.nnet.sigmoid(T.dot(h,W2)+b2)
• Define as a callable function
> predict = theano.function([x], y_pred)
• Apply to data
> predict([[1,0]])
array([[ 0.7425526]])
Model Training
• First, define the variable for the correct output
> y = T.dvector()
• Definition of a cost function (we use the L2 norm)
> l2 = (y-y_pred)**2
> cost = l2.mean()
• Gradient descent training: computation of the derivatives
> gW, gb, gW2, gb2 = T.grad(cost, [W,b,W2,b2])
• Update rule (with learning rate 0.1)
> train = theano.function(inputs=[x,y],outputs=[y_pred,cost],
          updates=((W, W-0.1*gW), (b, b-0.1*gb),
                   (W2, W2-0.1*gW2), (b2, b2-0.1*gb2)))
Model Training
• Training data
> DATA_X = numpy.array([[0,0],[0,1],[1,0],[1,1]])
> DATA_Y = numpy.array([0,1,1,0])
• Predict output for the training data
> predict(DATA_X)
array([ 0.18333462, 0.7425526 , 0.7425526 , 0.33430961])
Model Training
• Train with the training data
> train(DATA_X,DATA_Y)
[array([ 0.18333462, 0.7425526 , 0.7425526 , 0.33430961]),
 array(0.06948320612438118)]
• Prediction after one training update
> train(DATA_X,DATA_Y)
[array([ 0.18353091, 0.74260499, 0.74321824, 0.33324929]),
 array(0.06923193686092949)]
Example: DyNet
DyNet
• Our example: a static computation graph, a fixed set of data
• But: language requires different computations for different data items (sentences have different lengths)
⇒ dynamically create a computation graph for each data item
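A sketch of why the graph must be rebuilt per item: with a recurrent computation over a sentence, the number of operations (and hence graph nodes) depends on the sentence length. This is a plain-NumPy stand-in; all names and shapes are assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4)) * 0.1   # recurrent weight matrix (toy size)

def encode(token_vectors):
    # one tanh step per token: a 7-word sentence induces a graph
    # with more nodes than a 2-word one
    h = np.zeros(4)
    for e in token_vectors:
        h = np.tanh(W @ h + e)
    return h

short_sent = [rng.normal(size=4) for _ in range(2)]
long_sent = [rng.normal(size=4) for _ in range(7)]
h_short = encode(short_sent)
h_long = encode(long_sent)
```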
Example: DyNet

model = dy.Model()
W_p = model.add_parameters((20, 100))
b_p = model.add_parameters(20)
E = model.add_lookup_parameters((20000, 50))
trainer = dy.SimpleSGDTrainer(model)
for epoch in range(num_epochs):
    for in_words, out_label in training_data:
        dy.renew_cg()
        W = dy.parameter(W_p)
        b = dy.parameter(b_p)
        score_sym = dy.softmax(
            W*dy.concatenate([E[in_words[0]],E[in_words[1]]])+b)
        loss_sym = dy.pickneglogsoftmax(score_sym, out_label)
        loss_sym.forward()
        loss_sym.backward()
        trainer.update()
Model Parameters

The same code as on the previous slide; the highlighted lines are the parameter definitions (the model plus the add_parameters and add_lookup_parameters calls). The model holds the values for the weight matrices and weight vectors.
Training Setup

The same code; the highlighted line trainer = dy.SimpleSGDTrainer(model) defines the model update function (it could also be Adagrad, Adam, ...).
Setting up Computation Graph

The same code; dy.renew_cg() creates a new computation graph, and the dy.parameter(...) calls inform it about the model parameters.
Operations

The same code; the softmax, concatenate, and pickneglogsoftmax lines build the computation graph by defining operations.
Training Loop

The same code; the loop processes the training data, the computations are done in forward() and backward(), and trainer.update() applies the gradients.