cs 7643: deep learning - georgia institute of …...cs 7643: deep learning dhruv batra georgia tech...
TRANSCRIPT
![Page 1: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/1.jpg)
CS 7643: Deep Learning
Dhruv Batra Georgia Tech
Topics: – Computational Graphs
– Notation + example– Computing Gradients
– Forward mode vs Reverse mode AD
![Page 2: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/2.jpg)
Administrativia• HW1 Released
– Due: 09/22
• PS1 Solutions– Coming soon
(C) Dhruv Batra 2
![Page 3: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/3.jpg)
Project• Goal
– Chance to try Deep Learning– Combine with other classes / research / credits / anything
• You have our blanket permission• Extra credit for shooting for a publication
– Encouraged to apply to your research (computer vision, NLP, robotics,…)
– Must be done this semester.
• Main categories– Application/Survey
• Compare a bunch of existing algorithms on a new application domain of your interest
– Formulation/Development• Formulate a new model or algorithm for a new or old problem
– Theory• Theoretically analyze an existing algorithm
(C) Dhruv Batra 3
![Page 4: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/4.jpg)
Administrativia• Project Teams Google Doc
– https://docs.google.com/spreadsheets/d/1AaXY0JE4lAbHvoDaWlc9zsmfKMyuGS39JAn9dpeXhhQ/edit#gid=0
– Project Title– 1-3 sentence project summary TL;DR– Team member names + GT IDs
(C) Dhruv Batra 4
![Page 5: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/5.jpg)
Recap of last time
(C) Dhruv Batra 5
![Page 6: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/6.jpg)
How do we compute gradients?• Manual Differentiation
• Symbolic Differentiation
• Numerical Differentiation
• Automatic Differentiation– Forward mode AD– Reverse mode AD
• aka “backprop”
(C) Dhruv Batra 6
![Page 7: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/7.jpg)
Any DAG of differentiable modules is allowed!
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 7
Computational Graph
![Page 8: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/8.jpg)
Directed Acyclic Graphs (DAGs)• Exactly what the name suggests
– Directed edges– No (directed) cycles– Underlying undirected cycles okay
(C) Dhruv Batra 8
![Page 9: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/9.jpg)
Directed Acyclic Graphs (DAGs)• Concept
– Topological Ordering
(C) Dhruv Batra 9
![Page 10: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/10.jpg)
Directed Acyclic Graphs (DAGs)
(C) Dhruv Batra 10
![Page 11: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/11.jpg)
Computational Graphs• Notation #1
(C) Dhruv Batra 11
f(x1, x2) = x1x2 + sin(x1)
![Page 12: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/12.jpg)
Computational Graphs• Notation #2
(C) Dhruv Batra 12
f(x1, x2) = x1x2 + sin(x1)
![Page 13: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/13.jpg)
Example
(C) Dhruv Batra 13
f(x1, x2) = x1x2 + sin(x1)
+
sin( )
x1 x2
*
![Page 14: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/14.jpg)
Logistic Regression as a Cascade
(C) Dhruv Batra 14
Given a library of simple functions
Compose into a
complicate function� log
✓1
1 + e�w
|x
◆
w
|x
Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 15: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/15.jpg)
Forward mode vs Reverse Mode• Key Computations
(C) Dhruv Batra 15
![Page 16: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/16.jpg)
16
g
Forward mode AD
![Page 17: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/17.jpg)
17
g
Reverse mode AD
![Page 18: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/18.jpg)
Example: Forward mode AD
(C) Dhruv Batra 18
f(x1, x2) = x1x2 + sin(x1)
+
sin( )
x1 x2
*
![Page 19: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/19.jpg)
(C) Dhruv Batra 19
+
sin( )
x1 x2
*
x1 x1
w1 = cos(x1)x1
x2
w2 = x1x2 + x1x2
w3 = w1 + w2
Example: Forward mode ADf(x1, x2) = x1x2 + sin(x1)
![Page 20: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/20.jpg)
(C) Dhruv Batra 20
+
sin( )
x1 x2
*
x1 x1
w1 = cos(x1)x1
x2
w2 = x1x2 + x1x2
w3 = w1 + w2
Example: Forward mode ADf(x1, x2) = x1x2 + sin(x1)
![Page 21: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/21.jpg)
Example: Reverse mode AD
(C) Dhruv Batra 21
f(x1, x2) = x1x2 + sin(x1)
+
sin( )
x1 x2
*
![Page 22: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/22.jpg)
(C) Dhruv Batra 22
Example: Reverse mode ADf(x1, x2) = x1x2 + sin(x1)
+
sin( )
x1 x2
*
w3 = 1
w1 = w3 w2 = w3
x1 = w1 cos(x1) x1 = w2x2 x2 = w2x1
![Page 23: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/23.jpg)
Forward Pass vs Forward mode AD vs Reverse Mode AD
(C) Dhruv Batra 23
+
sin( )
x1 x2
*
w3 = 1
w1 = w3 w2 = w3
x1 = w2x2 x2 = w2x1x1 = w1 cos(x1)
+
sin( )
x2
*
x1 x1 x2
w2 = x1x2 + x1x2
w3 = w1 + w2
w1 = cos(x1)x1
x1
+
sin( )
x1 x2
*
f(x1, x2) = x1x2 + sin(x1)
![Page 24: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/24.jpg)
Forward mode vs Reverse Mode• What are the differences?
• Which one is more memory efficient (less storage)? – Forward or backward?
(C) Dhruv Batra 24
![Page 25: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/25.jpg)
Forward mode vs Reverse Mode• What are the differences?
• Which one is more memory efficient (less storage)? – Forward or backward?
• Which one is faster to compute? – Forward or backward?
(C) Dhruv Batra 25
![Page 26: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/26.jpg)
Plan for Today• (Finish) Computing Gradients
– Forward mode vs Reverse mode AD– Patterns in backprop– Backprop in FC+ReLU NNs
• Convolutional Neural Networks
(C) Dhruv Batra 26
![Page 27: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/27.jpg)
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 28: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/28.jpg)
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 29: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/29.jpg)
add gate: gradient distributor
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 30: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/30.jpg)
add gate: gradient distributorQ: What is a max gate?
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 31: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/31.jpg)
add gate: gradient distributormax gate: gradient router
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 32: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/32.jpg)
add gate: gradient distributormax gate: gradient routerQ: What is a mul gate?
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 33: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/33.jpg)
add gate: gradient distributormax gate: gradient routermul gate: gradient switcher
Patterns in backward flow
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 34: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/34.jpg)
+
Gradients add at branches
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 35: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/35.jpg)
Duality in Fprop and Bprop
(C) Dhruv Batra 35
+
+
FPROP BPROPSUM
COPY
![Page 36: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/36.jpg)
36
Graph (or Net) object (rough psuedo code)
Modularized implementation: forward / backward API
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 37: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/37.jpg)
37
(x,y,z are scalars)
x
y
z*
Modularized implementation: forward / backward API
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 38: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/38.jpg)
38
(x,y,z are scalars)
x
y
z*
Modularized implementation: forward / backward API
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 39: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/39.jpg)
39
Example: Caffe layers
Caffe is licensed under BSD 2-Clause
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 40: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/40.jpg)
40
* top_diff (chain rule)
Caffe is licensed under BSD 2-Clause
Caffe Sigmoid Layer
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 41: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/41.jpg)
(C) Dhruv Batra 41
![Page 42: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/42.jpg)
(C) Dhruv Batra 42
![Page 43: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/43.jpg)
Key Computation in DL: Forward-Prop
(C) Dhruv Batra 43Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 44: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/44.jpg)
Key Computation in DL: Back-Prop
(C) Dhruv Batra 44Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
![Page 45: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/45.jpg)
f(x) = max(0,x)(elementwise)
4096-d input vector
4096-d output vector
Jacobian of ReLU
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 46: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/46.jpg)
46
f(x) = max(0,x)(elementwise)
4096-d input vector
4096-d output vector
Q: what is the size of the Jacobian matrix?
Jacobian of ReLU
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 47: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/47.jpg)
47
f(x) = max(0,x)(elementwise)
4096-d input vector
4096-d output vector
Q: what is the size of the Jacobian matrix?[4096 x 4096!]
Jacobian of ReLU
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 48: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/48.jpg)
i.e. Jacobian would technically be a[409,600 x 409,600] matrix :\
f(x) = max(0,x)(elementwise)
4096-d input vector
4096-d output vector
Q: what is the size of the Jacobian matrix?[4096 x 4096!]
in practice we process an entire minibatch (e.g. 100) of examples at one time:
Jacobian of ReLU
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 49: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/49.jpg)
Q: what is the size of the Jacobian matrix?[4096 x 4096!]
Q2: what does it look like?
f(x) = max(0,x)(elementwise)
4096-d input vector
4096-d output vector
Jacobian of ReLU
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 50: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/50.jpg)
Jacobians of FC-Layer
(C) Dhruv Batra 50
![Page 51: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/51.jpg)
Jacobians of FC-Layer
(C) Dhruv Batra 51
![Page 52: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/52.jpg)
Jacobians of FC-Layer
(C) Dhruv Batra 52
![Page 53: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/53.jpg)
Convolutional Neural Networks(without the brain stuff)
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 54: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/54.jpg)
54
Example: 200x200 image40K hidden units
~2B parameters!!!
- Spatial correlation is local- Waste of resources + we have not enough training samples anyway..
Fully Connected Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 55: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/55.jpg)
55
Example: 200x200 image40K hidden unitsFilter size: 10x10
4M parameters
Note: This parameterization is good when input image is registered (e.g., face recognition).
Locally Connected Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 56: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/56.jpg)
56
STATIONARITY? Statistics is similar at different locations
Note: This parameterization is good when input image is registered (e.g., face recognition).
Example: 200x200 image40K hidden unitsFilter size: 10x10
4M parameters
Locally Connected Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 57: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/57.jpg)
57
Share the same parameters across different locations (assuming input is stationary):Convolutions with learned kernels
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato
![Page 58: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/58.jpg)
Convolutions for mathematicians
(C) Dhruv Batra 58
![Page 59: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/59.jpg)
(C) Dhruv Batra 59
"Convolution of box signal with itself2" by Convolution_of_box_signal_with_itself.gif: Brian Ambergderivative work: Tinos (talk) - Convolution_of_box_signal_with_itself.gif. Licensed under CC BY-SA 3.0 via Commons -
https://commons.wikimedia.org/wiki/File:Convolution_of_box_signal_with_itself2.gif#/media/File:Convolution_of_box_signal_with_itself2.gif
![Page 60: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/60.jpg)
Convolutions for computer scientists
(C) Dhruv Batra 60
![Page 61: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/61.jpg)
Convolutions for programmers
(C) Dhruv Batra 61
![Page 62: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/62.jpg)
Convolution Explained• http://setosa.io/ev/image-kernels/
• https://github.com/bruckner/deepViz
(C) Dhruv Batra 62
![Page 63: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/63.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 63
![Page 64: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/64.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 64
![Page 65: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/65.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 65
![Page 66: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/66.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 66
![Page 67: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/67.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 67
![Page 68: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/68.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 68
![Page 69: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/69.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 69
![Page 70: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/70.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 70
![Page 71: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/71.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 71
![Page 72: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/72.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 72
![Page 73: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/73.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 73
![Page 74: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/74.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 74
![Page 75: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/75.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 75
![Page 76: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/76.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 76
![Page 77: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/77.jpg)
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 77
![Page 78: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/78.jpg)
Mathieu et al. “Fast training of CNNs through FFTs” ICLR 2014
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 78
![Page 79: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/79.jpg)
* -1 0 1-1 0 1-1 0 1
=
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 79
![Page 80: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/80.jpg)
Learn multiple filters.
E.g.: 200x200 image100 FiltersFilter size: 10x10
10K parameters
Convolutional Layer
Slide Credit: Marc'Aurelio Ranzato(C) Dhruv Batra 80
![Page 81: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/81.jpg)
30721
Fully Connected Layer32x32x3 image -> stretch to 3072 x 1
10 x 3072 weights
activationinput
110
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 82: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/82.jpg)
30721
Fully Connected Layer32x32x3 image -> stretch to 3072 x 1
10 x 3072 weights
activationinput
1 number: the result of taking a dot product between a row of W and the input (a 3072-dimensional dot product)
110
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 83: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/83.jpg)
83
Convolutional Layer
![Page 84: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/84.jpg)
84
Convolutional Layer
![Page 85: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/85.jpg)
32
32
3
Convolution Layer32x32x3 image -> preserve spatial structure
width
height
depth
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 86: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/86.jpg)
32
32
3
Convolution Layer
5x5x3 filter
32x32x3 image
Convolve the filter with the imagei.e. “slide over the image spatially, computing dot products”
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 87: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/87.jpg)
32
32
3
Convolution Layer
5x5x3 filter
32x32x3 image
Convolve the filter with the imagei.e. “slide over the image spatially, computing dot products”
Filters always extend the full depth of the input volume
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 88: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/88.jpg)
32
32
3
Convolution Layer32x32x3 image5x5x3 filter
1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image(i.e. 5*5*3 = 75-dimensional dot product + bias)
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 89: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/89.jpg)
32
32
3
Convolution Layer32x32x3 image5x5x3 filter
convolve (slide) over all spatial locations
activation map
1
28
28
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 90: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/90.jpg)
32
32
3
Convolution Layer32x32x3 image5x5x3 filter
convolve (slide) over all spatial locations
activation maps
1
28
28
consider a second, green filter
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n
![Page 91: CS 7643: Deep Learning - Georgia Institute of …...CS 7643: Deep Learning Dhruv Batra Georgia Tech Topics: –Computational Graphs –Notation + example –Computing Gradients –Forward](https://reader030.vdocuments.site/reader030/viewer/2022040406/5ea11ff560c57b30fa37a334/html5/thumbnails/91.jpg)
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We stack these up to get a “new image” of size 28x28x6!
Slide Credit: Fei-Fei Li, Justin Johnson, Serena Yeung, CS 231n