Lecture 4: Backpropagation
CMSC 35246: Deep Learning
Shubhendu Trivedi & Risi Kondor
University of Chicago
April 5, 2017
Things we will look at today
• More Backpropagation
• Still more backpropagation
• Quiz at 4:05 PM
To understand, let us just calculate!
One Neuron Again
[Figure: a single neuron with inputs $x_1, x_2, \dots, x_d$, weights $w_1, w_2, \dots, w_d$, and output $\hat{y}$]
Consider example x; the output for x is $\hat{y}$; the correct answer is $y$

Loss $L = (\hat{y} - y)^2$

$\hat{y} = x^\top w = x_1 w_1 + x_2 w_2 + \dots + x_d w_d$
One Neuron Again
Want to update $w_i$ (forget the closed form solution for a bit!)

Update rule: $w_i := w_i - \eta \frac{\partial L}{\partial w_i}$

Now:

$\frac{\partial L}{\partial w_i} = \frac{\partial (\hat{y} - y)^2}{\partial w_i} = 2(\hat{y} - y)\,\frac{\partial (x_1 w_1 + x_2 w_2 + \dots + x_d w_d)}{\partial w_i}$
One Neuron Again
We have: $\frac{\partial L}{\partial w_i} = 2(\hat{y} - y)\,x_i$

Update Rule:

$w_i := w_i - \eta(\hat{y} - y)x_i = w_i - \eta\delta x_i$ where $\delta = (\hat{y} - y)$

In vector form: $w := w - \eta\delta x$

Simple enough! Now let’s graduate ...
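A minimal sketch of this single-neuron update in Python with NumPy; the data, learning rate, and iteration count below are made up for illustration, and the 2 from the derivative is absorbed into $\eta$ as in the $\delta$ form of the update:

```python
import numpy as np

# Sketch (made-up numbers): repeat w := w - eta * delta * x
x = np.array([1.0, -2.0, 0.5])    # one example x
y = 1.5                           # correct answer
w = np.zeros(3)                   # weights w1, ..., wd
eta = 0.1                         # learning rate

for _ in range(100):
    y_hat = x @ w                 # yhat = x^T w
    delta = y_hat - y             # delta = (yhat - y)
    w -= eta * delta * x          # w := w - eta * delta * x
print(w @ x)                      # yhat is now close to y = 1.5
```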
Simple Feedforward Network
[Figure: a feedforward network with inputs $x_1, x_2, x_3$, hidden units $z_1, z_2$ (first-layer weights $w^{(1)}_{ij}$), and output $\hat{y}$ (second-layer weights $w^{(2)}_1, w^{(2)}_2$)]

$\hat{y} = w^{(2)}_1 z_1 + w^{(2)}_2 z_2$

$z_1 = \tanh(a_1)$ where $a_1 = w^{(1)}_{11} x_1 + w^{(1)}_{21} x_2 + w^{(1)}_{31} x_3$; likewise for $z_2$
Simple Feedforward Network
$z_1 = \tanh(a_1)$ where $a_1 = w^{(1)}_{11} x_1 + w^{(1)}_{21} x_2 + w^{(1)}_{31} x_3$

$z_2 = \tanh(a_2)$ where $a_2 = w^{(1)}_{12} x_1 + w^{(1)}_{22} x_2 + w^{(1)}_{32} x_3$

Output $\hat{y} = w^{(2)}_1 z_1 + w^{(2)}_2 z_2$; Loss $L = (\hat{y} - y)^2$
Want to assign credit for the loss L to each weight
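To make the forward pass concrete, here is a minimal Python sketch; all the numbers (inputs, target, weights) are made up, and `v1`, `v2` stand in for $w^{(2)}_1, w^{(2)}_2$:

```python
import math

# Sketch of the forward pass on the toy network (made-up numbers).
x1, x2, x3 = 1.0, 0.5, -1.5
y = 2.0                                    # correct answer
w11, w21, w31 = 0.1, -0.2, 0.3             # first-layer weights into z1
w12, w22, w32 = 0.4, 0.1, -0.3             # first-layer weights into z2
v1, v2 = 0.5, -0.4                         # w(2)_1, w(2)_2

a1 = w11 * x1 + w21 * x2 + w31 * x3        # pre-activation of z1
a2 = w12 * x1 + w22 * x2 + w32 * x3        # pre-activation of z2
z1, z2 = math.tanh(a1), math.tanh(a2)      # hidden activations
y_hat = v1 * z1 + v2 * z2                  # output yhat
L = (y_hat - y) ** 2                       # squared-error loss
```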
Top Layer
Want to find: $\frac{\partial L}{\partial w^{(2)}_1}$ and $\frac{\partial L}{\partial w^{(2)}_2}$

Consider $w^{(2)}_1$ first

$\frac{\partial L}{\partial w^{(2)}_1} = \frac{\partial (\hat{y} - y)^2}{\partial w^{(2)}_1} = 2(\hat{y} - y)\,\frac{\partial (w^{(2)}_1 z_1 + w^{(2)}_2 z_2)}{\partial w^{(2)}_1} = 2(\hat{y} - y)\,z_1$

Familiar from earlier! The update for $w^{(2)}_1$ would be

$w^{(2)}_1 := w^{(2)}_1 - \eta \frac{\partial L}{\partial w^{(2)}_1} = w^{(2)}_1 - \eta\delta z_1$ with $\delta = (\hat{y} - y)$

Likewise, for $w^{(2)}_2$ the update would be $w^{(2)}_2 := w^{(2)}_2 - \eta\delta z_2$
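A tiny numeric sketch of these top-layer updates in Python, again with made-up numbers; `v1`, `v2` stand in for $w^{(2)}_1, w^{(2)}_2$, and the factor of 2 is folded into $\eta$ as in the $\delta$ form of the update:

```python
import math

# Sketch (made-up numbers): w(2)_i := w(2)_i - eta * delta * z_i
z1, z2 = math.tanh(0.2), math.tanh(-0.4)   # hidden activations
v1, v2 = 0.5, -0.4                         # w(2)_1, w(2)_2
y, eta = 2.0, 0.1
y_hat = v1 * z1 + v2 * z2
delta = y_hat - y                          # delta = (yhat - y)
v1 -= eta * delta * z1                     # dL/dw(2)_1 = delta * z1
v2 -= eta * delta * z2                     # dL/dw(2)_2 = delta * z2
```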
Next Layer
There are six weights to assign credit for the loss incurred

Consider $w^{(1)}_{11}$ for an illustration; the rest are similar

$\frac{\partial L}{\partial w^{(1)}_{11}} = \frac{\partial (\hat{y} - y)^2}{\partial w^{(1)}_{11}} = 2(\hat{y} - y)\,\frac{\partial (w^{(2)}_1 z_1 + w^{(2)}_2 z_2)}{\partial w^{(1)}_{11}}$

Now: $\frac{\partial (w^{(2)}_1 z_1 + w^{(2)}_2 z_2)}{\partial w^{(1)}_{11}} = w^{(2)}_1 \frac{\partial \tanh(w^{(1)}_{11} x_1 + w^{(1)}_{21} x_2 + w^{(1)}_{31} x_3)}{\partial w^{(1)}_{11}} + 0$

Which is: $w^{(2)}_1 (1 - \tanh^2(a_1))\,x_1$ (recall $a_1 = ?$)

So we have: $\frac{\partial L}{\partial w^{(1)}_{11}} = 2(\hat{y} - y)\,w^{(2)}_1 (1 - \tanh^2(a_1))\,x_1$
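Derivations like this are easy to check numerically. A quick finite-difference sanity check in Python, with made-up numbers for the inputs, target, and weights (`v1`, `v2` stand in for $w^{(2)}_1, w^{(2)}_2$):

```python
import math

# Check: dL/dw(1)_11 = 2*(yhat - y) * w(2)_1 * (1 - tanh(a1)**2) * x1
x1, x2, x3, y = 1.0, 0.5, -1.5, 2.0
w21, w31 = -0.2, 0.3                       # other weights into z1
w12, w22, w32 = 0.4, 0.1, -0.3             # weights into z2
v1, v2 = 0.5, -0.4                         # w(2)_1, w(2)_2

def loss(w11):
    z1 = math.tanh(w11 * x1 + w21 * x2 + w31 * x3)
    z2 = math.tanh(w12 * x1 + w22 * x2 + w32 * x3)
    return (v1 * z1 + v2 * z2 - y) ** 2

w11 = 0.1
a1 = w11 * x1 + w21 * x2 + w31 * x3
z2 = math.tanh(w12 * x1 + w22 * x2 + w32 * x3)
y_hat = v1 * math.tanh(a1) + v2 * z2
analytic = 2 * (y_hat - y) * v1 * (1 - math.tanh(a1) ** 2) * x1
numeric = (loss(w11 + 1e-6) - loss(w11 - 1e-6)) / 2e-6
print(analytic, numeric)                   # should agree to ~6 decimals
```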
Next Layer
$\frac{\partial L}{\partial w^{(1)}_{11}} = 2(\hat{y} - y)\,w^{(2)}_1 (1 - \tanh^2(a_1))\,x_1$

Weight update: $w^{(1)}_{11} := w^{(1)}_{11} - \eta \frac{\partial L}{\partial w^{(1)}_{11}}$

Likewise, if we were considering $w^{(1)}_{22}$, we’d have:

$\frac{\partial L}{\partial w^{(1)}_{22}} = 2(\hat{y} - y)\,w^{(2)}_2 (1 - \tanh^2(a_2))\,x_2$

Weight update: $w^{(1)}_{22} := w^{(1)}_{22} - \eta \frac{\partial L}{\partial w^{(1)}_{22}}$
Let’s clean this up...
Recall, for the top layer: $\frac{\partial L}{\partial w^{(2)}_i} = (\hat{y} - y)\,z_i = \delta z_i$ (ignoring the 2)

One can think of this as: $\frac{\partial L}{\partial w^{(2)}_i} = \underbrace{\delta}_{\text{local error}}\,\underbrace{z_i}_{\text{local input}}$

For the next layer we had: $\frac{\partial L}{\partial w^{(1)}_{ij}} = (\hat{y} - y)\,w^{(2)}_j (1 - \tanh^2(a_j))\,x_i$

Let $\delta_j = (\hat{y} - y)\,w^{(2)}_j (1 - \tanh^2(a_j)) = \delta\, w^{(2)}_j (1 - \tanh^2(a_j))$

(Notice that $\delta_j$ contains the $\delta$ term (which is the error!))

Then: $\frac{\partial L}{\partial w^{(1)}_{ij}} = \underbrace{\delta_j}_{\text{local error}}\,\underbrace{x_i}_{\text{local input}}$

Neat!
Let’s clean this up...
Let’s get a cleaner notation to summarize this

Let $w_{i\,j}$ be the weight for the connection FROM node $i$ TO node $j$

Then $\frac{\partial L}{\partial w_{i\,j}} = \delta_j z_i$

$\delta_j$ is the local error (going from $j$ backwards) and $z_i$ is the local input coming from $i$
Credit Assignment: A Graphical Revision
[Figure: the toy network redrawn with numbered nodes — inputs $x_1, x_2, x_3$ at nodes 5, 4, 3; hidden units $z_1, z_2$ at nodes 1, 2; output $\hat{y}$ at node 0; with weights $w_{5\,1}, w_{4\,1}, w_{3\,1}, w_{5\,2}, w_{4\,2}, w_{3\,2}, w_{1\,0}, w_{2\,0}$]

Let’s redraw our toy network with new notation and label nodes
Credit Assignment: Top Layer
Local error from node 0: $\delta = (\hat{y} - y)$; local input from node 1: $z_1$

$\therefore \frac{\partial L}{\partial w_{1\,0}} = \delta z_1$; and update $w_{1\,0} := w_{1\,0} - \eta\delta z_1$
Credit Assignment: Top Layer
Local error from node 0: $\delta = (\hat{y} - y)$; local input from node 2: $z_2$

$\therefore \frac{\partial L}{\partial w_{2\,0}} = \delta z_2$ and update $w_{2\,0} := w_{2\,0} - \eta\delta z_2$
Credit Assignment: Next Layer
Local error from node 1: $\delta_1 = \delta\, w_{1\,0}\, (1 - \tanh^2(a_1))$; local input from node 3: $x_3$

$\therefore \frac{\partial L}{\partial w_{3\,1}} = \delta_1 x_3$ and update $w_{3\,1} := w_{3\,1} - \eta\delta_1 x_3$
Credit Assignment: Next Layer
Local error from node 2: $\delta_2 = \delta\, w_{2\,0}\, (1 - \tanh^2(a_2))$; local input from node 4: $x_2$

$\therefore \frac{\partial L}{\partial w_{4\,2}} = \delta_2 x_2$ and update $w_{4\,2} := w_{4\,2} - \eta\delta_2 x_2$
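The whole credit-assignment pass can be written directly in this node-numbered notation. A minimal Python sketch with made-up numbers, where the dictionary `w` maps an edge `(i, j)` to the weight $w_{i\,j}$:

```python
import math

# Per-edge rule dL/dw_{i j} = delta_j * z_i on the relabeled network
# (nodes 5, 4, 3 = inputs; 1, 2 = hidden; 0 = output). Numbers made up.
x = {5: 1.0, 4: 0.5, 3: -1.5}                 # inputs x1, x2, x3
y, eta = 2.0, 0.1
w = {(5, 1): 0.1, (4, 1): -0.2, (3, 1): 0.3,  # edges into node 1
     (5, 2): 0.4, (4, 2): 0.1, (3, 2): -0.3,  # edges into node 2
     (1, 0): 0.5, (2, 0): -0.4}               # edges into node 0

a = {j: sum(w[i, j] * x[i] for i in x) for j in (1, 2)}
z = {j: math.tanh(a[j]) for j in (1, 2)}
y_hat = w[1, 0] * z[1] + w[2, 0] * z[2]

delta = {0: y_hat - y}                        # local error at node 0
for j in (1, 2):                              # push the error back one layer
    delta[j] = delta[0] * w[j, 0] * (1 - math.tanh(a[j]) ** 2)

inputs = {**x, **z}                           # local input available at each node
for (i, j) in w:                              # one update per edge
    w[i, j] -= eta * delta[j] * inputs[i]     # w_{i j} -= eta * delta_j * z_i
```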
Let’s Vectorize
Let $W^{(2)} = \begin{bmatrix} w_{1\,0} \\ w_{2\,0} \end{bmatrix}$ (ignore that $W^{(2)}$ is a vector and hence it would be more appropriate to use $w^{(2)}$)

Let $W^{(1)} = \begin{bmatrix} w_{5\,1} & w_{5\,2} \\ w_{4\,1} & w_{4\,2} \\ w_{3\,1} & w_{3\,2} \end{bmatrix}$

Let $Z^{(1)} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}$ and $Z^{(2)} = \begin{bmatrix} z_1 \\ z_2 \end{bmatrix}$
Feedforward Computation
1. Compute $A^{(1)} = Z^{(1)\top} W^{(1)}$
2. Apply the element-wise non-linearity: $Z^{(2)} = \tanh(A^{(1)})$
3. Compute the output $\hat{y} = Z^{(2)\top} W^{(2)}$
4. Compute the loss on the example: $(\hat{y} - y)^2$
Flowing Backward
1. Top: compute $\delta$
2. Gradient w.r.t. $W^{(2)}$: $\delta Z^{(2)}$
3. Compute $\delta_1 = (W^{(2)\top} \delta) \odot (1 - \tanh^2(A^{(1)}))$
   Notes: (a) $\odot$ is the Hadamard (element-wise) product; (b) we have written $W^{(2)\top} \delta$ as $\delta$ can be a vector when there are multiple outputs
4. Gradient w.r.t. $W^{(1)}$: $\delta_1 Z^{(1)}$
5. Update $W^{(2)} := W^{(2)} - \eta\delta Z^{(2)}$
6. Update $W^{(1)} := W^{(1)} - \eta\delta_1 Z^{(1)}$
7. All the dimensionalities nicely check out!
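Here is a minimal NumPy sketch of the full vectorized pass, feedforward plus flowing backward; the data and initial weights below are made up, and the factor of 2 from the squared loss is folded into $\eta$ as before:

```python
import numpy as np

# Vectorized forward/backward pass for the toy network
# (3 inputs, 2 tanh hidden units, 1 linear output). Numbers made up.
rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -1.5])         # Z(1): inputs x1, x2, x3
y = 2.0                                # target
W1 = rng.normal(size=(3, 2))           # W(1): input -> hidden
W2 = rng.normal(size=2)                # W(2): hidden -> output
eta = 0.1                              # learning rate

# Feedforward (steps 1-4)
A1 = x @ W1                            # A(1) = Z(1)^T W(1)
Z2 = np.tanh(A1)                       # Z(2) = tanh(A(1))
y_hat = Z2 @ W2                        # yhat = Z(2)^T W(2)
loss = (y_hat - y) ** 2

# Flowing backward (steps 1-6)
delta = y_hat - y                      # local error at the output
grad_W2 = delta * Z2                   # gradient w.r.t. W(2)
delta1 = (W2 * delta) * (1 - np.tanh(A1) ** 2)   # back through the tanh
grad_W1 = np.outer(x, delta1)          # gradient w.r.t. W(1), shape (3, 2)

W2 = W2 - eta * grad_W2                # update W(2)
W1 = W1 - eta * grad_W1                # update W(1)
```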
So Far
Backpropagation in the context of neural networks is all about assigning credit (or blame!) for the error incurred to the weights

• We follow the path from the output (where we have an error signal) to the edge we want to consider
• We find the δs from the top down to the edge concerned by using the chain rule
• Once we have the partial derivative, we can write the update rule for that weight
What did we miss?

Exercise: What if there are multiple outputs? (Look at the slide from last class.)

Another exercise: Add bias neurons. What changes?

As we go down the network, notice that we need the previous δs.

If we recompute them from scratch each time, the amount of work can blow up!

We need to book-keep derivatives as we go down the network and reuse them, as in the sketch below.
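A minimal sketch of that book-keeping, assuming sigmoid layers with cached forward activations `z[l]` and weight matrices `Ws[l]` (hypothetical names): each δ is computed exactly once, from the cached δ of the layer above, instead of re-expanding the chain rule from the output every time.

```python
def all_deltas(Ws, z, delta_top):
    """Compute every layer's delta once, reusing the one above.

    Ws[l] maps layer l-1 activations to layer l pre-activations (NumPy
    arrays); z[l] are cached sigmoid activations; delta_top is the
    delta at the output layer.
    """
    deltas = [None] * len(Ws)
    deltas[-1] = delta_top
    for l in range(len(Ws) - 2, -1, -1):
        # Reuse deltas[l + 1] instead of re-deriving it from the output
        deltas[l] = (Ws[l + 1].T @ deltas[l + 1]) * z[l] * (1.0 - z[l])
    return deltas
```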
A General View of Backpropagation

Some redundancy in upcoming slides, but redundancy can be good!
An Aside

Backpropagation refers only to the method for computing the gradient.

It is used together with another algorithm, such as SGD, which does the actual learning using that gradient.

Next: computing the gradient ∇xf(x, y) for an arbitrary f.

Here x is the set of variables whose derivatives are desired.

Often we require the gradient of the cost J(θ) with respect to the parameters θ, i.e. ∇θJ(θ).

Note: We restrict to the case where f has a single output.

First: Move to a more precise computational graph language!
Computational Graphs

Formalize computation as graphs.

Nodes indicate variables (a scalar, vector, tensor, or another type of variable).

Operations are simple functions of one or more variables.

Our graph language comes with a set of allowable operations.

Examples:
z = xy

Figure: input nodes x and y feed a × operation node, whose output is z. The graph uses the × operation for the computation.
Logistic Regression

Figure: x and w feed a dot operation producing u1; u1 and b feed a + operation producing u2; σ applied to u2 gives y.

Computes y = σ(xᵀw + b)
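The same graph written out one operation per line, as a quick sketch; the names u1 and u2 mirror the intermediate nodes, and the input values are arbitrary:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

x = np.array([1.0, 2.0])
w = np.array([0.5, -0.3])
b = 0.1

u1 = x @ w           # dot node
u2 = u1 + b          # + node
y_hat = sigmoid(u2)  # sigma node: y = sigma(x.w + b)
```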
H = max{0, XW + b}

Figure: X and W feed an MM operation producing U1; U1 and b feed a + operation producing U2; Rc applied to U2 gives H.

MM is matrix multiplication and Rc is the ReLU activation.
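And the same layer as straight-line code, a sketch with illustrative shapes (a batch of 4 examples, 3 features, 5 hidden units):

```python
import numpy as np

X = np.random.randn(4, 3)
W = np.random.randn(3, 5)
b = np.zeros(5)

U1 = X @ W                # MM node
U2 = U1 + b               # + node (b broadcasts across the batch)
H = np.maximum(0.0, U2)   # Rc node: elementwise max{0, .}
```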
Back to backprop: Chain Rule

Backpropagation computes the chain rule, in a manner that is highly efficient.

Let f, g : R → R. Suppose y = g(x) and z = f(y) = f(g(x)).

Chain rule:

dz/dx = (dz/dy)(dy/dx)
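A quick numerical sanity check of the scalar chain rule, taking g(x) = x² and f(y) = sin(y) (arbitrary choices for illustration) and comparing against a central finite difference:

```python
import numpy as np

x = 0.7
y = x**2                   # y = g(x)
dz_dy = np.cos(y)          # f'(y)
dy_dx = 2.0 * x            # g'(x)
analytic = dz_dy * dy_dx   # dz/dx = (dz/dy)(dy/dx)

eps = 1e-6
numeric = (np.sin((x + eps)**2) - np.sin((x - eps)**2)) / (2 * eps)
assert abs(analytic - numeric) < 1e-8
```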
Figure: a chain x → y → z, with dy/dx labelling the edge from x to y and dz/dy labelling the edge from y to z.

Chain rule:

dz/dx = (dz/dy)(dy/dx)
Figure: x feeds two nodes y1 and y2 (edges dy1/dx and dy2/dx), both of which feed z (edges dz/dy1 and dz/dy2).

Multiple Paths:

dz/dx = (dz/dy1)(dy1/dx) + (dz/dy2)(dy2/dx)
Figure: x feeds nodes y1, y2, ..., yn, all of which feed z.

Multiple Paths:

dz/dx = Σj (dz/dyj)(dyj/dx)
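The multiple-paths rule made concrete, with y1 = x², y2 = 3x, and z = y1·y2 (arbitrary choices); the derivative sums the contribution through each intermediate node:

```python
x = 1.5
y1, y2 = x**2, 3.0 * x
dz_dy1, dz_dy2 = y2, y1                    # z = y1 * y2
dy1_dx, dy2_dx = 2.0 * x, 3.0
dz_dx = dz_dy1 * dy1_dx + dz_dy2 * dy2_dx  # sum over both paths

# z = 3x^3 overall, so dz/dx should be 9x^2
assert abs(dz_dx - 9.0 * x**2) < 1e-12
```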
Chain Rule

Consider x ∈ R^m, y ∈ R^n.

Let g : R^m → R^n and f : R^n → R. Suppose y = g(x) and z = f(y). Then:

∂z/∂xi = Σj (∂z/∂yj)(∂yj/∂xi)

In vector notation these m components stack into:

∇xz = (∂y/∂x)ᵀ ∇yz

where the i-th entry of the left-hand side is exactly Σj (∂z/∂yj)(∂yj/∂xi).
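A NumPy sketch of this, taking g to be a linear map (so its Jacobian ∂y/∂x is just the matrix A) and f(y) = Σj yj², both arbitrary choices:

```python
import numpy as np

m, n = 3, 4
A = np.random.randn(n, m)
x = np.random.randn(m)

y = A @ x              # y = g(x), Jacobian dy/dx = A
grad_y = 2.0 * y       # nabla_y z for z = sum(y**2)
grad_x = A.T @ grad_y  # nabla_x z = (dy/dx)^T nabla_y z

# Composed directly, z = x^T (A^T A) x, whose gradient is 2 A^T A x
assert np.allclose(grad_x, 2.0 * (A.T @ A) @ x)
```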
Chain Rule

∇xz = (∂y/∂x)ᵀ ∇yz

(∂y/∂x) is the n×m Jacobian matrix of g.

The gradient w.r.t. x is thus the product of a Jacobian matrix (∂y/∂x), transposed, with a vector, namely the gradient ∇yz.

Backpropagation consists of applying such Jacobian-gradient products to each operation in the computational graph.

In general this need not apply only to vectors; it applies to tensors w.l.o.g.
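One Jacobian-gradient product per operation, sketched for two stacked linear ops beneath a squared-norm output (all matrices arbitrary):

```python
import numpy as np

A = np.random.randn(4, 3)
B = np.random.randn(2, 4)
x = np.random.randn(3)

y = A @ x               # op 1, Jacobian dy/dx = A
z = B @ y               # op 2, Jacobian dz/dy = B
L = 0.5 * np.sum(z**2)  # scalar cost

grad_z = z              # dL/dz for this cost
grad_y = B.T @ grad_z   # Jacobian-gradient product for op 2
grad_x = A.T @ grad_y   # Jacobian-gradient product for op 1

assert np.allclose(grad_x, A.T @ B.T @ B @ A @ x)
```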
Chain Rule

We can of course also write this in terms of tensors.

Let the gradient of z with respect to a tensor X be ∇Xz.

If Y = g(X) and z = f(Y), then:

∇Xz = Σj (∇XYj)(∂z/∂Yj)
Recursive Application in a Computational Graph

Writing an algebraic expression for the gradient of a scalar with respect to any node in the computational graph that produced that scalar is straightforward using the chain rule.

Let the successors of some node x be {y1, y2, ..., yn}.

Node: computation result
Edge: computation dependency

dz/dx = Σi (dz/dyi)(dyi/dx), where the sum runs over the n successors.
Flow Graph (for previous slide)

Figure: a flow graph in which node x feeds y1, y2, ..., yn, which in turn (possibly through further nodes) feed the output z.
Recursive Application in a Computational Graph

Fprop: visit nodes in topologically sorted order; compute the value of each node given the values of its ancestors.

Bprop: set the output gradient to 1; then visit the nodes in reverse order, computing the gradient with respect to each node using the gradients with respect to its successors.

For the successors {y1, y2, ..., yn} of x from the previous slide:

dz/dx = Σi (dz/dyi)(dyi/dx)

A sketch of both passes follows.
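A minimal sketch of fprop and bprop over a topologically sorted node list; the Node class and the tiny op set (mul, add) are invented for illustration:

```python
class Node:
    def __init__(self, op=None, parents=()):
        self.op, self.parents = op, parents
        self.value, self.grad = None, 0.0

def fprop(nodes):                        # nodes in topological order
    for n in nodes:
        if n.op == "mul":
            n.value = n.parents[0].value * n.parents[1].value
        elif n.op == "add":
            n.value = n.parents[0].value + n.parents[1].value

def bprop(nodes):                        # last node is the output
    for n in nodes:
        n.grad = 0.0
    nodes[-1].grad = 1.0                 # output gradient = 1
    for n in reversed(nodes):            # reverse topological order
        if n.op == "mul":
            a, b = n.parents
            a.grad += n.grad * b.value   # accumulate across successors
            b.grad += n.grad * a.value
        elif n.op == "add":
            for p in n.parents:
                p.grad += n.grad

# z = x*y + x, so dz/dx = y + 1 (two paths into x) and dz/dy = x
x, y = Node(), Node()
x.value, y.value = 2.0, 3.0
u = Node("mul", (x, y))
z = Node("add", (u, x))
fprop([u, z])
bprop([x, y, u, z])
assert (x.grad, y.grad) == (4.0, 2.0)
```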
Automatic Differentiation

Computation of the gradient can be automatically inferred from the symbolic expression of the fprop.

Every node type needs to know:

• How to compute its output
• How to compute its gradients with respect to its inputs, given the gradient w.r.t. its outputs

Makes for rapid prototyping. A sketch of this per-node contract follows.
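Here is that contract for two node types; the class names and method signatures are hypothetical, not any particular library's API:

```python
import numpy as np

class MatMul:
    def forward(self, X, W):
        self.X, self.W = X, W      # cache inputs for the backward pass
        return X @ W
    def backward(self, grad_out):
        # gradients w.r.t. the inputs, given the gradient w.r.t. the output
        return grad_out @ self.W.T, self.X.T @ grad_out

class ReLU:
    def forward(self, U):
        self.mask = U > 0
        return np.maximum(0.0, U)
    def backward(self, grad_out):
        return (grad_out * self.mask,)
```

Chaining such nodes forward, then calling backward in reverse order, is exactly the bprop loop sketched earlier.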
Computational Graph for a MLP

Figure: Goodfellow et al.

To train we want to compute ∇W(1)J and ∇W(2)J.

Two paths lead backwards from J to the weights: through the cross-entropy cost and through the regularization cost.

The weight decay cost is relatively simple: it will always contribute 2λW^(i) to the gradient on W^(i).
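The two paths into a weight matrix, sketched for J(W) = cross-entropy + λ·Σ W²; here `grad_data` is a stand-in for whatever the cross-entropy path contributes:

```python
import numpy as np

lam = 0.01
W = np.random.randn(3, 5)
grad_data = np.random.randn(3, 5)   # placeholder for the cross-entropy path
grad_W = grad_data + 2.0 * lam * W  # regularization path always adds 2*lam*W
```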
Symbol to Symbol

Figure: Goodfellow et al.

In this approach backpropagation never accesses any numerical values.

Instead, it just adds nodes to the graph that describe how to compute the derivatives.

A graph evaluation engine will then do the actual computation.

This is the approach taken by Theano and TensorFlow.
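A toy sketch of the idea: differentiating returns new expression nodes (nested tuples here) rather than numbers, leaving evaluation to a separate engine; this tiny expression language is invented for illustration:

```python
def grad(expr, wrt):
    """Return a new symbolic expression for d(expr)/d(wrt)."""
    if expr == wrt:
        return ("const", 1.0)
    if isinstance(expr, tuple) and expr[0] == "add":
        return ("add", grad(expr[1], wrt), grad(expr[2], wrt))
    if isinstance(expr, tuple) and expr[0] == "mul":
        a, b = expr[1], expr[2]
        # product rule, expressed as more graph nodes
        return ("add", ("mul", grad(a, wrt), b), ("mul", a, grad(b, wrt)))
    return ("const", 0.0)

# d(x*y + x)/dx as a new symbolic graph; no numerical value is touched
e = ("add", ("mul", "x", "y"), "x")
de_dx = grad(e, "x")
```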
Next time
Regularization Methods for Deep Neural Networks