“Deep” Learning - Carnegie Mellon...
![Page 1: “Deep”&Learning& - Carnegie Mellon Universitydemo.clab.cs.cmu.edu/.../files/slides/26-deep-learning.pdf · 2019-04-30 · 2 natural&& language& analyzer& Big&picture:&natural&language&analyzers&](https://reader034.vdocuments.site/reader034/viewer/2022042308/5ed46d389801056341574568/html5/thumbnails/1.jpg)
“Deep” Learning
2
Big picture: natural language analyzers
[Figure: a natural language analyzer maps a natural language input signal to an output analysis]
Natural language input signal: web page, question, search query, tweet, voice command
Output analysis: question, answer, command to a robot, trending topics
3
Big picture: natural language analyzers
[Figure: components of a natural language analyzer: sentiment analyzer, speech recognizer, tokenizer, POS tagger, syntactic parser, semantic parser, machine translator, named entity recognizer, spell corrector, coreference resolution, classification]
Natural language input signal: web page, question, search query, tweet, voice command
Output analysis: question, answer, command to a robot, trending topics
4
Today: deep learning for NLP components
[Figure: components of a natural language analyzer: sentiment analyzer, speech recognizer, tokenizer, POS tagger, syntactic parser, semantic parser, machine translator, named entity recognizer, spell corrector, coreference resolution, classification]
Natural language input signal: web page, question, search query, tweet, voice command
Output analysis: question, answer, command to a robot, trending topics
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
6
Running example: document classification
[Figure: the component diagram (sentiment analyzer, speech recognizer, tokenizer, POS tagger, syntactic parser, semantic parser, machine translator, named entity recognizer, spell corrector, coreference resolution) together with the document classification component, whose output is a document category]
7
Running example: document classification
• Politics
• Business
• Science
• Sports
• Health
$\text{output} = \arg\max_{l} f(l, d)$, where d is the document and l ranges over the labels
Example: d = "Barcelona lost to Real Madrid" → l = Sports
8
How to define f(l, d): linear models
Linear models: $f(l, d) = \mathbf{w} \cdot \mathbf{g}(l, d)$
Example: l = Sports, d = "Barcelona lost to Real Madrid"
g(l, d) = 0 0 0 1 0 0 1 0 0 0 0 … (e.g., one feature is the number of times Barcelona appears in a document labeled Sports; another is the number of times lost appears in a document labeled Sports)
w = 0.4 -1.2 0.2 0.2 -0.4 -1.0 5.1 1.1 2.3 0.8 -0.1 …
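To make the linear model concrete, here is a minimal Python sketch of a joint feature function g(l, d) and the prediction rule argmax_l f(l, d). The sparse dict representation, the lowercase whitespace tokenizer and the weight values are hypothetical choices for illustration, not taken from the lecture.

```python
from collections import Counter

LABELS = ["Politics", "Business", "Science", "Sports", "Health"]

def g(label, doc):
    """Sparse joint features: count of each (word, label) pair in the document."""
    counts = Counter(doc.lower().split())
    return {(word, label): c for word, c in counts.items()}

def f(w, label, doc):
    """Linear score w . g(l, d); w is a sparse dict from feature to weight."""
    return sum(w.get(k, 0.0) * v for k, v in g(label, doc).items())

def predict(w, doc):
    """output = argmax_l f(l, d)"""
    return max(LABELS, key=lambda label: f(w, label, doc))

# Hypothetical hand-set weights, for illustration only.
w = {("barcelona", "Sports"): 5.1, ("lost", "Sports"): 1.1, ("lost", "Business"): 0.4}
print(predict(w, "Barcelona lost to Real Madrid"))
```

With these weights, Sports scores 5.1 + 1.1 = 6.2, Business scores 0.4 and every other label scores 0, so the prediction is Sports.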
9
How to define f(l, d): linear models
Linear models: $f(l, d) = \mathbf{w} \cdot \mathbf{g}(l, d)$
- Easy to implement
- Easy to optimize w
Two possible improvements:
- Define more complex functions
- Find better representations of (l, d)
Figure credits: Barbara Rosario
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
11
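neural network v1.0: linear model
Linear models: $f(l, d) = \mathbf{w} \cdot \mathbf{g}(l, d) = \mathbf{w}^{(l)} \cdot \mathbf{x}(d)$, e.g., $y_1 = x_1 w_{1,1} + x_2 w_{2,1} + x_3 w_{3,1} + x_4 w_{4,1} + x_5 w_{5,1} = \mathbf{w}^{(1)} \cdot \mathbf{x}(d)$
[Figure: inputs x1 … x5 (e.g., number of times Barcelona appears in a document, number of times lost appears in a document) connected by weights w1,1 … w5,3 to one output per label: Politics, Science, Sports]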
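The per-label dot products above can be computed all at once as one matrix-vector product. A small NumPy sketch, assuming five features, three labels and random weights (all hypothetical), with W[i, j] playing the role of w_{i+1,j+1}:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0])   # x(d): word counts for 5 features
W = rng.normal(size=(5, 3))               # W[i, j] = w_{i+1, j+1}: feature i -> label j

# One dot product per label: y_j = sum_i x_i * w_{i,j} = w^(j) . x(d)
y_loop = np.array([W[:, j] @ x for j in range(W.shape[1])])

# The same numbers as a single matrix-vector product (row j of W.T is w^(j)).
y = W.T @ x

labels = ["Politics", "Science", "Sports"]
print(labels[int(np.argmax(y))])
```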
12
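neural network v1.0: linear model
Linear models: $f(l, d) = \mathbf{w} \cdot \mathbf{g}(l, d) = \mathbf{w}^{(l)} \cdot \mathbf{x}(d)$, e.g., $y_1 = x_1 w_{1,1} + x_2 w_{2,1} + x_3 w_{3,1} + x_4 w_{4,1} + x_5 w_{5,1} = \mathbf{w}^{(1)} \cdot \mathbf{x}(d)$
[Figure: the network with inputs x, weights W and outputs y (Politics, Science, Sports) is the same as the matrix product $\mathbf{y} = W\mathbf{x}$, where row l of W is $\mathbf{w}^{(l)}$]
Problem: similar words still share no parameters!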
13
neural network v2.0: representation learning
Big idea: induce low-dimensional dense feature representations of high-dimensional objects
14
neural network v2.1: representation learning
Big idea: embed words in a dense vector space and use the word embeddings as dense features
[Figure: dense input features x connected by weights W to outputs y]
Did this really solve the problem?
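One way to turn word embeddings into dense document features is sketched below. It assumes a tiny hypothetical vocabulary, random (untrained) embeddings and averaging as the way to combine the words of a document; summing or other pooling choices are equally possible. It also checks that multiplying a one-hot vector by the embedding matrix E just selects a row, i.e., an embedding lookup.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"barcelona": 0, "lost": 1, "to": 2, "real": 3, "madrid": 4, "beat": 5}
dim = 4                                              # embedding size (hypothetical)
E = rng.normal(scale=0.1, size=(len(vocab), dim))    # one dense row per word
W = rng.normal(scale=0.1, size=(3, dim))             # 3 labels x dim

def one_hot(i, n):
    v = np.zeros(n)
    v[i] = 1.0
    return v

def doc_features(doc):
    """Dense document features: the average of the document's word embeddings."""
    ids = [vocab[word] for word in doc.lower().split()]
    return E[ids].mean(axis=0)

x = doc_features("Barcelona lost to Real Madrid")    # dense vector of size dim
y = W @ x                                            # one score per label

# Multiplying a one-hot vector by E simply selects a row: an embedding lookup.
print(np.allclose(one_hot(vocab["lost"], len(vocab)) @ E, E[vocab["lost"]]))
```

Because "lost" and "beat" each get a dense row in E, training can place them near each other, so what is learned for one word can carry over to the other.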
15
neural network v3.0: complex functions
Big idea: define more complex functions by adding a hidden layer
[Figure: input x connected by weights W to output y]
$\mathbf{y} = W\mathbf{x}$
16
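neural network v3.0: complex functions
Big idea: define more complex functions by adding a hidden layer
[Figure: input x → hidden layer h1 (weights W1) → output y (weights W2)]
$\mathbf{y} = W_2 \mathbf{h}_1 = W_2 (W_1 \mathbf{x}) = W \mathbf{x}$ with $W = W_2 W_1$. Wait! Is that true?!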
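A quick numeric check of the equation above, using random matrices of hypothetical sizes: without a non-linearity, two stacked linear layers compute exactly what the single matrix W = W2 W1 computes.

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))      # input (5) -> hidden (4), hypothetical sizes
W2 = rng.normal(size=(3, 4))      # hidden (4) -> output (3)
x = rng.normal(size=5)

h1 = W1 @ x                       # hidden layer with no non-linearity
y_two_layers = W2 @ h1
W = W2 @ W1                       # a single 3x5 matrix
y_one_layer = W @ x

print(np.allclose(y_two_layers, y_one_layer))   # True: the extra layer adds nothing
```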
17
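neural network v3.0: complex functions
Big idea: define more complex functions by adding a hidden layer
[Figure: input x → hidden layer h1 (weights W1), whose units are induced features → output y (weights W2); inset plot of a1(z) against z]
$\mathbf{y} = W_2 \mathbf{h}_1 = W_2\, a_1(W_1 \mathbf{x})$
Non-linear functions, e.g., the logistic function $a_1(z) = 1 / (1 + e^{-z})$
Universal approximation theorem: Cybenko, G. (1989)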
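The sketch below adds the logistic non-linearity a1 to the two-layer network (random weights, hypothetical sizes) and checks additivity: any linear map satisfies f(x_a + x_b) = f(x_a) + f(x_b), and this network does not, so it can no longer be collapsed into a single matrix.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 5))   # input (5) -> hidden (4), hypothetical sizes
W2 = rng.normal(size=(3, 4))   # hidden (4) -> output (3)

def forward(x):
    h1 = logistic(W1 @ x)      # induced features, each in (0, 1)
    return W2 @ h1

x_a, x_b = rng.normal(size=5), rng.normal(size=5)
# A linear map would satisfy f(x_a + x_b) = f(x_a) + f(x_b); this network does not.
print(np.allclose(forward(x_a + x_b), forward(x_a) + forward(x_b)))
```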
18
neural network v3.0: complex func<ons
Popular activation/transfer/non-linear functions:
https://en.wikipedia.org/wiki/Activation_function
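For reference, a few commonly used activation functions in NumPy. Which functions appeared on the original slide is not recoverable from the transcript, so this selection is an assumption.

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-2.0, 0.0, 2.0])
for name, fn in [("logistic", logistic), ("tanh", tanh), ("ReLU", relu)]:
    print(name, np.round(fn(z), 3))
```

tanh is a rescaled logistic function, tanh(z) = 2 logistic(2z) - 1, so it has the same S shape but outputs values in (-1, 1) instead of (0, 1).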
19
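neural network v3.5: “deeper” networks
[Figure: input x → hidden layer h1 (weights W1) → hidden layer h2 (weights W2) → output y (weights W3)]
$\mathbf{y} = W_3 \mathbf{h}_2 = W_3\, a_2(W_2\, a_1(W_1 \mathbf{x}))$
Wait, but why do we need more layers?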
20
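neural network v3.5: “deeper” networks
[Figure: input x → hidden layer h1 (weights W1) → hidden layer h2 (weights W2) → output y (weights W3)]
$\mathbf{y} = W_3 \mathbf{h}_2 = W_3\, a_2(W_2\, a_1(W_1 \mathbf{x}))$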
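A deeper network is repeated application of "matrix, then non-linearity". A minimal NumPy sketch of y = W3 a2(W2 a1(W1 x)), assuming hypothetical layer sizes, a1 = logistic, a2 = tanh, and no bias terms (matching the equation above):

```python
import numpy as np

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, activations):
    """y = W_L a_{L-1}( ... a_1(W_1 x) ... ): every layer but the last
    is followed by its activation; the output layer is linear."""
    h = x
    for W, a in zip(weights[:-1], activations):
        h = a(W @ h)
    return weights[-1] @ h

rng = np.random.default_rng(0)
sizes = [5, 8, 8, 3]                    # x, h1, h2, y (hypothetical)
weights = [rng.normal(size=(n_out, n_in)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = rng.normal(size=sizes[0])
y = forward(x, weights, [logistic, np.tanh])   # a1 = logistic, a2 = tanh
```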
21
neural network v4.0: recurrent neural networks
Big idea: use hidden layers to represent sequential state
[Figure: a feed-forward neural network maps one input x to an output y; a recurrent neural network reads inputs x1, x2, x3 in sequence and produces y]
Figure credits: Andrej Karpathy
How did we represent x for document classification? As word counts, so "Real … Madrid" = "Real Madrid"; we would like "Real … Madrid" ≠ "Real Madrid".
Do we share parameters across states?
22
neural network v4.0: recurrent neural networks
Figure credits: Christopher Olah
How to compute the hidden layers?
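A minimal sketch of how the hidden layers of a simple (vanilla) RNN can be computed, assuming the common update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the parameter names and sizes are hypothetical. The same parameters are shared across all time steps, and, unlike word counts, the final state depends on word order.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b_h, h0):
    """Vanilla RNN: h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h).
    The same W_xh, W_hh and b_h are reused at every time step."""
    h = h0
    hs = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h + b_h)
        hs.append(h)
    return hs

rng = np.random.default_rng(0)
d_in, d_h = 4, 6                                   # hypothetical sizes
W_xh = rng.normal(scale=0.5, size=(d_h, d_in))
W_hh = rng.normal(scale=0.5, size=(d_h, d_h))
b_h = np.zeros(d_h)
xs = [rng.normal(size=d_in) for _ in range(3)]     # x1, x2, x3

hs = rnn_forward(xs, W_xh, W_hh, b_h, np.zeros(d_h))
W_hy = rng.normal(size=(2, d_h))
y = W_hy @ hs[-1]                                  # read the output off the last state

# Unlike bag-of-words counts, the final state depends on word order.
hs_reversed = rnn_forward(xs[::-1], W_xh, W_hh, b_h, np.zeros(d_h))
print(np.allclose(hs[-1], hs_reversed[-1]))
```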
23
neural network v4.1: output sequences
Figure credits: Andrej Karpathy
24
neural network v4.1: output sequences
Figure credits: Andrej Karpathy
Example: Character-level language models
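A character-level language model is trained to predict the next character at every position. The sketch below is a hypothetical toy on the string "hello" with random, untrained weights: it builds the shifted input/target pairs, computes the average negative log-likelihood, and generates text by sampling one character at a time and feeding it back in. Sizes and names are assumptions.

```python
import numpy as np

text = "hello"
chars = sorted(set(text))                 # vocabulary: ['e', 'h', 'l', 'o']
idx = {c: i for i, c in enumerate(chars)}
V, H = len(chars), 8                      # vocab size, hidden size (hypothetical)

# Each input character is paired with the character that follows it.
inputs = [idx[c] for c in text[:-1]]      # h e l l
targets = [idx[c] for c in text[1:]]      # e l l o

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def one_hot(i):
    v = np.zeros(V)
    v[i] = 1.0
    return v

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(H, V))
W_hh = rng.normal(scale=0.1, size=(H, H))
W_hy = rng.normal(scale=0.1, size=(V, H))

def step(h, i):
    """One RNN step: new hidden state and distribution over the next character."""
    h = np.tanh(W_xh @ one_hot(i) + W_hh @ h)
    return h, softmax(W_hy @ h)

h, loss = np.zeros(H), 0.0
for i, t in zip(inputs, targets):
    h, p = step(h, i)
    loss -= np.log(p[t])                  # negative log-likelihood of the true next char
avg_loss = loss / len(targets)

def sample(start, n):
    """Generate n characters by feeding each sampled character back in."""
    h, i, out = np.zeros(H), idx[start], [start]
    for _ in range(n):
        h, p = step(h, i)
        i = rng.choice(V, p=p)
        out.append(chars[i])
    return "".join(out)

print(avg_loss, np.log(V), sample("h", 10))
```

Because the weights are small and untrained, the predicted distributions are close to uniform, so the average loss is close to log(V) and the samples are random; training would lower the loss and make the samples resemble the training text.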
25
neural network v4.1: output sequences
Credits: Andrej Karpathy
Sample output: Copyright was the succession of independence in the slop of Syrian influence that was a famous German movement based on a more popular servicious, non-doctrinal and sexual power post. Many governments recognize the military housing of the [[Civil Liberalization and Infantry Resolution 265 National Party in Hungary]], that is sympathetic to be to the [[Punjab Resolution]] (PJS)[http://www.humah.yahoo.com/guardian.cfm/7754800786d17551963s89.htm Official economics Adjoint for the Nazism, Montgomery was swear to advance to the resources for those Socialism's rule, was starting to signing a major tripad of aid exile.]]
26
neural network v4.2: Long Short-Term Memory
Figure credits: Christopher Olah http://colah.github.io/posts/2015-08-Understanding-LSTMs/
[Figure: the repeating module of regular RNNs vs. the repeating module of LSTMs]
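The LSTM repeating module can be written as a single step function. This sketch follows the gate equations described in the linked post (forget gate, input gate, candidate values, output gate); stacking all four gates into one matrix W is an implementation convenience, and the sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev; x] to the stacked pre-activations of
    the forget gate, input gate, candidate values and output gate."""
    z = W @ np.concatenate([h_prev, x]) + b
    f, i, g, o = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)
    g = np.tanh(g)                  # candidate values for the cell state
    c = f * c_prev + i * g          # forget part of the old memory, add new content
    h = o * np.tanh(c)              # output a gated view of the cell state
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 5                                     # hypothetical sizes
W = rng.normal(scale=0.5, size=(4 * d_h, d_h + d_in))
b = np.zeros(4 * d_h)
h, c = np.zeros(d_h), np.zeros(d_h)
for x in rng.normal(size=(4, d_in)):                 # a length-4 input sequence
    h, c = lstm_step(x, h, c, W, b)
```

The additive update c = f * c_prev + i * g is the key difference from a regular RNN: when the forget gate is near 1 and the input gate near 0, the cell state is carried forward almost unchanged.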
27
neural network v4.2: Long Short-Term Memory
Figure credits: Christopher Olah http://colah.github.io/posts/2015-08-Understanding-LSTMs/
28
neural network v4.3: bidirectional RNNs
Figure credits: Christopher Olah
[Figure: unidirectional RNNs vs. bidirectional RNNs]
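A bidirectional RNN runs one RNN left to right and another right to left, then combines their states at each position, so every position sees both its left and right context. A minimal sketch, assuming simple tanh RNNs without biases and concatenation as the combination (both assumptions):

```python
import numpy as np

def rnn_states(xs, W_xh, W_hh):
    """Hidden states of a simple tanh RNN run left to right over xs."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x in xs:
        h = np.tanh(W_xh @ x + W_hh @ h)
        states.append(h)
    return states

def birnn_states(xs, fwd, bwd):
    """Concatenate a left-to-right and a right-to-left RNN state at each position."""
    forward = rnn_states(xs, *fwd)
    backward = rnn_states(xs[::-1], *bwd)[::-1]   # re-align to the original order
    return [np.concatenate([f, b]) for f, b in zip(forward, backward)]

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                                  # hypothetical sizes
fwd = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)))
bwd = (rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)))
xs = [rng.normal(size=d_in) for _ in range(5)]
states = birnn_states(xs, fwd, bwd)
```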
29
neural network v4.4: attention models
Bahdanau et al. (2015)
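The additive scoring function associated with Bahdanau et al. (2015) can be sketched as follows: each encoder state h_j is scored against the previous decoder state s_{i-1} as v_a . tanh(W_a s_{i-1} + U_a h_j), the scores are normalized with a softmax, and the context vector is the weighted sum of the encoder states. Sizes and random parameters are hypothetical.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """Score each encoder state against the previous decoder state,
    normalize the scores, and return the weighted context vector."""
    scores = np.array([v_a @ np.tanh(W_a @ s_prev + U_a @ h_j) for h_j in H])
    alpha = softmax(scores)            # attention weights over source positions
    context = alpha @ H                # sum_j alpha_j * h_j
    return context, alpha

rng = np.random.default_rng(0)
T, d_enc, d_dec, d_att = 6, 4, 5, 3    # hypothetical sizes
H = rng.normal(size=(T, d_enc))        # encoder states h_1 .. h_T
s_prev = rng.normal(size=d_dec)        # decoder state s_{i-1}
W_a = rng.normal(size=(d_att, d_dec))
U_a = rng.normal(size=(d_att, d_enc))
v_a = rng.normal(size=d_att)
context, alpha = additive_attention(s_prev, H, W_a, U_a, v_a)
```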
30
neural network v5: convolutional NN
Figure credits: Christopher Olah
31
neural network v5: convolutional NN
Figure credits: Christopher Olah
[Figure: feed-forward NN vs. convolutional NN]
32
neural network v5: convolutional NN
Figure credits: Christopher Olah
[Figure: a convolutional layer; two stacked layers, convolutional layer 1 and convolutional layer 2]
Do we share parameters of different convolutions? In the same layer? In different layers?
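To illustrate the parameter-sharing question for text, here is a minimal 1D convolution over a sequence of word vectors, assuming tanh activations and hypothetical sizes. Within a layer, the same filters are applied at every position; the second layer has its own filters. Shifting the input by one position shifts the output by one, a direct consequence of that sharing.

```python
import numpy as np

def conv1d(X, K, b):
    """Apply each filter K[k] (width w) at every position of X (T x d_in).
    The same K and b are reused at every position: parameters are shared
    across positions within a layer."""
    T, w = X.shape[0], K.shape[1]
    out = np.empty((T - w + 1, K.shape[0]))
    for t in range(T - w + 1):
        window = X[t:t + w]                               # (w, d_in)
        out[t] = np.tanh(np.einsum("wd,kwd->k", window, K) + b)
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(7, 4))                          # 7 word vectors of size 4
K1, b1 = rng.normal(size=(5, 3, 4)), np.zeros(5)     # layer 1: 5 filters of width 3
K2, b2 = rng.normal(size=(2, 2, 5)), np.zeros(2)     # layer 2: its own parameters
H1 = conv1d(X, K1, b1)                               # (5, 5)
H2 = conv1d(H1, K2, b2)                              # (4, 2)
```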
33
neural network v5: convolutional NN
Figure credits: Christopher Olah
[Figure: convolutional layer 1 → max pooling layer → convolutional layer 2; 2D convolutions]
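For 2D inputs such as images, the same idea slides a small filter over both axes, and a max pooling layer keeps the largest value in each window. A minimal NumPy sketch, assuming a single channel, a "valid" convolution, a ReLU after it and non-overlapping 2x2 pooling (all hypothetical choices):

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2D convolution, computed as cross-correlation (no kernel flip),
    which is what most deep learning libraries implement."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(x, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    h, w = x.shape[0] // size, x.shape[1] // size
    x = x[:h * size, :w * size]
    return x.reshape(h, size, w, size).max(axis=(1, 3))

rng = np.random.default_rng(0)
image = rng.normal(size=(6, 6))
kernel = rng.normal(size=(3, 3))                      # one shared 3x3 filter
features = np.maximum(conv2d(image, kernel), 0.0)     # conv layer with ReLU: 4x4
pooled = max_pool2d(features)                         # 2x2
```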
34
neural network v5.1: recursive NNs
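The slide gives only the title. As an illustration, one common formulation of a recursive neural network composes two child vectors c1, c2 into a parent vector p = tanh(W [c1; c2] + b), applying the same W and b at every node of a binary tree. The sketch below assumes that formulation, random embeddings and a hypothetical parse tree.

```python
import numpy as np

def compose(tree, emb, W, b):
    """Recursively build a vector for a binary tree. Leaves are words (looked
    up in emb); an internal node (left, right) gets tanh(W [left; right] + b),
    with the same W and b at every node."""
    if isinstance(tree, str):
        return emb[tree]
    left, right = tree
    children = np.concatenate([compose(left, emb, W, b), compose(right, emb, W, b)])
    return np.tanh(W @ children + b)

rng = np.random.default_rng(0)
d = 4                                                  # hypothetical size
emb = {w: rng.normal(size=d) for w in ["Barcelona", "lost", "to", "Real", "Madrid"]}
W, b = rng.normal(scale=0.5, size=(d, 2 * d)), np.zeros(d)
tree = ("Barcelona", ("lost", ("to", ("Real", "Madrid"))))   # hypothetical parse
sentence_vec = compose(tree, emb, W, b)
```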
35
neural network v6: dropout
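The slide gives only the title. Dropout randomly zeroes hidden units during training. The sketch below uses the "inverted dropout" variant, which rescales the kept units by 1 / (1 - p_drop) at training time so that nothing changes at test time; rescaling at test time instead gives the same expected activations. The function name and parameters are hypothetical.

```python
import numpy as np

def dropout(h, p_drop, rng, train=True):
    """Inverted dropout: at training time, zero each unit with probability p_drop
    and scale the survivors by 1 / (1 - p_drop), so the expected value of each
    unit is unchanged. At test time the layer is the identity."""
    if not train or p_drop == 0.0:
        return h
    keep = rng.random(h.shape) >= p_drop      # True with probability 1 - p_drop
    return h * keep / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones(100_000)
out = dropout(h, 0.5, rng)
print(out[:8], out.mean())                   # entries are 0 or 2; mean near 1
```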
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
How to train NN models?
• $\arg\max_l f(l, d)$ only tells us which label to predict.
• Supervised learning (needs input/output pairs)
• Loss function: e.g., cross-entropy between the empirical distribution and the model distribution:
$-\log p(l^* \mid d) = -\log \left( e^{f(l^*, d)} / \sum_{l} e^{f(l, d)} \right)$
[Figure: network x → h1 → y with a softmax layer on top]
• Regression problems? Mean squared error: $\mathbb{E}[(y - y^*)^2]$
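The softmax cross-entropy loss and the mean squared error can be computed as below. Subtracting the maximum score before exponentiating does not change the softmax but avoids overflow; the scores are hypothetical.

```python
import numpy as np

def log_softmax(scores):
    """log p(l | d) for every label l, where scores[l] = f(l, d)."""
    shifted = scores - scores.max()          # same softmax, no overflow in exp
    return shifted - np.log(np.exp(shifted).sum())

def cross_entropy(scores, gold):
    """-log p(l* | d) for the gold label index l*."""
    return -log_softmax(scores)[gold]

def mean_squared_error(y, y_star):
    """Empirical estimate of E[(y - y*)^2] over a batch of predictions."""
    return np.mean((y - y_star) ** 2)

scores = np.array([2.0, 0.5, -1.0])          # hypothetical f(l, d) for three labels
print(cross_entropy(scores, gold=0))
print(cross_entropy(np.array([1000.0, 0.0]), gold=1))   # stays finite
```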
How to optimize the loss?
• Stochastic gradient descent
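A minimal sketch of stochastic gradient descent on the cross-entropy loss for a linear classifier f(l, d) = W[l] . x(d), assuming a fixed learning rate, one example per update and a tiny hypothetical dataset. For this model the gradient with respect to W is (p - onehot(l*)) x^T, where p is the softmax output.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sgd_train(X, labels, n_labels, lr=0.5, epochs=50, seed=0):
    """SGD on the cross-entropy loss: one example per update, in random order."""
    rng = np.random.default_rng(seed)
    W = np.zeros((n_labels, X.shape[1]))
    for _ in range(epochs):
        for n in rng.permutation(len(X)):
            p = softmax(W @ X[n])
            grad = np.outer(p, X[n])          # (p - onehot(l*)) x^T ...
            grad[labels[n]] -= X[n]           # ... subtracting x from the gold row
            W -= lr * grad
    return W

# Hypothetical toy data: word-count features for 4 documents, 2 labels.
X = np.array([[1.0, 0.0, 1.0], [2.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 2.0, 0.0]])
labels = np.array([0, 0, 1, 1])
W = sgd_train(X, labels, n_labels=2)
print((X @ W.T).argmax(axis=1))
```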
How to optimize the loss?
• Parameter initialization
– Break the symmetry
– Use small values
– Random restarts
– Popular choice: uniform with mean = 0 and variance = 1 / size of previous layer
http://andyljones.tumblr.com/post/110998971763/an-explanation-of-xavier-initialization
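The popular choice above follows from the variance of a uniform distribution: U(-a, a) has variance a^2 / 3, so variance 1 / n_in requires a = sqrt(3 / n_in). A NumPy sketch with hypothetical layer sizes (Glorot and Bengio's original formulation instead targets variance 2 / (n_in + n_out)):

```python
import numpy as np

def init_uniform(n_in, n_out, rng):
    """Uniform initialization with mean 0 and variance 1 / n_in.
    U(-a, a) has variance a^2 / 3, so a = sqrt(3 / n_in)."""
    a = np.sqrt(3.0 / n_in)
    return rng.uniform(-a, a, size=(n_out, n_in))

rng = np.random.default_rng(0)
W = init_uniform(n_in=400, n_out=300, rng=rng)
print(W.mean(), W.var(), 1 / 400)   # empirical variance should be close to 1/400
```

Random values break the symmetry between hidden units: if all weights started equal, every unit in a layer would receive the same gradient and stay identical.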
How to optimize the loss?
• Other optimization methods
– Variants of stochastic gradient descent (e.g., averaged SGD, SGD with momentum); see http://research.microsoft.com/pubs/192769/tricks-2012.pdf
– Adagrad
– Adam
– Adadelta
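The update rules of three of these optimizers, sketched on a toy quadratic loss whose gradient is known in closed form. The loss, learning rates and step counts are hypothetical; Adam's b1 = 0.9 and b2 = 0.999 follow common practice.

```python
import numpy as np

TARGET = np.array([3.0, -2.0])

def grad(w):
    """Gradient of the toy loss 0.5 * ||w - TARGET||^2 (hypothetical)."""
    return w - TARGET

def sgd_momentum(w, steps=500, lr=0.1, mu=0.9):
    v = np.zeros_like(w)
    for _ in range(steps):
        v = mu * v - lr * grad(w)          # velocity accumulates past gradients
        w = w + v
    return w

def adagrad(w, steps=500, lr=0.5, eps=1e-8):
    G = np.zeros_like(w)
    for _ in range(steps):
        g = grad(w)
        G += g ** 2                         # per-parameter sum of squared gradients
        w = w - lr * g / (np.sqrt(G) + eps)
    return w

def adam(w, steps=500, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    m, v = np.zeros_like(w), np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad(w)
        m = b1 * m + (1 - b1) * g           # first-moment (mean) estimate
        v = b2 * v + (1 - b2) * g ** 2      # second-moment estimate
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)   # bias correction
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

for opt in (sgd_momentum, adagrad, adam):
    print(opt.__name__, opt(np.zeros(2)))
```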
How to optimize the loss?
• Computing gradients: the hard way
– Analytically derive the expression that represents the gradient with respect to each input.
– Compute that expression.
• Computing gradients: automatic differentiation
– Translate the loss function into a sequence of atomic operations.
– Hard-code the derivative of each atomic operation with respect to its inputs.
– Recursively compute the gradient of the loss function with respect to the model parameters using the chain rule.
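Automatic differentiation can be illustrated with a tiny reverse-mode implementation for scalars: each atomic operation records its local derivatives, and backward() applies the chain rule from the loss back to the parameters. This is a teaching sketch (the Var class is hypothetical), not how production deep learning libraries are implemented.

```python
import math

class Var:
    """A scalar node in a computation graph, for reverse-mode autodiff."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents      # pairs (parent node, local derivative)
        self.grad = 0.0

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value, [(self, other.value), (other, self.value)])

    def sigmoid(self):
        s = 1.0 / (1.0 + math.exp(-self.value))
        return Var(s, [(self, s * (1.0 - s))])

    def log(self):
        return Var(math.log(self.value), [(self, 1.0 / self.value)])

    def backward(self):
        """Chain rule: visit nodes so that each node's consumers come first."""
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent, _ in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += local * node.grad

# Example: loss = -log(sigmoid(w * x + b))
w, b, x = Var(0.5), Var(-1.0), 2.0
loss = (w * x + b).sigmoid().log() * -1.0
loss.backward()
print(loss.value, w.grad, b.grad)
```

Here w x + b = 0, so sigmoid gives 0.5 and the hand-derived gradients are d loss / d w = -(1 - 0.5) x = -1 and d loss / d b = -0.5, which is what backward() accumulates.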
How to optimize the loss: automatic differentiation
How to optimize the loss: deep learning libraries
https://github.com/soumith/convnet-benchmarks/
Also see CMU’s locally grown library at https://github.com/clab/cnn
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
46
Major results: language modeling
47
Major results: image classification
Krizhevsky et al. (2012)
48
Major results: ImageNet
Krizhevsky et al. (2012): positive and negative examples
49
Major results: ImageNet
Krizhevsky et al. (2012): sample convolution filters
50
Major results: speech recognition
Graves et al. (2013)
51
Major results: translation
Sutskever et al. (2014)
52
Major results: translation
Bahdanau et al. (2015)
53
Major results: dependency parsing
Chen and Manning (2014)
54
Major results: dependency parsing
Dyer et al. (2015)
Important things we didn’t cover
• Dark knowledge
• Connection to graphical models
• Alternatives to the softmax output layer
Agenda
• Big picture
• Why deep learning?
• Building blocks of a deep neural network
• How to train deep neural networks
• Important results
57
Open question: can we do without the intermediate linguistic abstractions?
[Figure: components of a natural language analyzer: sentiment analyzer, speech recognizer, tokenizer, POS tagger, syntactic parser, semantic parser, machine translator, named entity recognizer, spell corrector, coreference resolution, classification]
Natural language input signal: web page, question, search query, tweet, voice command
Output analysis: question, answer, command to a robot, trending topics