neural nets for vision - nyu computer sciencefergus/teaching/vision/nnets.pdf- neural networks for...
TRANSCRIPT
![Page 1: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/1.jpg)
NEURAL NETS FOR VISIONCVPR 2012 Tutorial on Deep Learning
Part II
Marc'Aurelio Ranzato [email protected]/~ranzato
![Page 2: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/2.jpg)
2
Building an Object Recognition System
“CAR”
CLASSIFIERFEATURE
EXTRACTOR
IDEA: Use data to optimize features for the given task.
Ranzato
![Page 3: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/3.jpg)
3
Building an Object Recognition System
“CAR”
CLASSIFIER
What we want: Use parameterized function such that a) features are computed efficiently b) features can be trained efficiently
Ranzato
f X ;
![Page 4: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/4.jpg)
4
Building an Object Recognition System
“CAR”END-TO-END
RECOGNITIONSYSTEM
– Everything becomes adaptive. – No distiction between feature extractor and classifier.– Big non-linear system trained from raw pixels to labels.
Ranzato
![Page 5: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/5.jpg)
5
Building an Object Recognition System
“CAR”END-TO-END
RECOGNITIONSYSTEM
Q: How can we build such a highly non-linear system?
Ranzato
![Page 6: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/6.jpg)
6
Building an Object Recognition System
“CAR”END-TO-END
RECOGNITIONSYSTEM
Q: How can we build such a highly non-linear system?
Ranzato
A: By combining simple building blocks we can make more and more complex systems.
![Page 7: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/7.jpg)
7
Building A Complicated Function
Ranzato
sin x
cos x
log x
exp x
x3log cos exp sin3x
Simple Functions
One Example of Complicated Function
![Page 8: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/8.jpg)
8
Building A Complicated Function
Ranzato
sin x
cos x
log x
exp x
x3log cos exp sin3x
Simple Functions
– Function composition is at the core of deep learning methods. – Each “simple function” will have parameters subject to training.
One Example of Complicated Function
![Page 9: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/9.jpg)
9
Implementing A Complicated Function
Ranzato
log cos exp sin3x
Complicated Function
sin x x3 exp x cos x log x
![Page 10: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/10.jpg)
10
Intuition Behind Deep Neural Nets
“CAR”
Ranzato
![Page 11: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/11.jpg)
11
Intuition Behind Deep Neural Nets
“CAR”
Ranzato
NOTE: Each black box can have trainable parameters. Their composition makes a highly non-linear system.
![Page 12: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/12.jpg)
12
Intuition Behind Deep Neural Nets
“CAR”
Ranzato
NOTE: System produces a hierarchy of features.
Intermediate representations/features
![Page 13: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/13.jpg)
13
“CAR”
Ranzato
Q: What do the intermediate representations do?
Intuition Behind Deep Neural Nets
![Page 14: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/14.jpg)
14
Ranzato
“CAR”
Lee et al. “Convolutional DBN's for scalable unsup. learning...” ICML 2009
Intuition Behind Deep Neural Nets
![Page 15: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/15.jpg)
15
Ranzato
“CAR”
Lee et al. ICML 2009
Intuition Behind Deep Neural Nets
![Page 16: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/16.jpg)
16
Ranzato
“CAR”
Lee et al. ICML 2009
Intuition Behind Deep Neural Nets
![Page 17: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/17.jpg)
17
IDEA # 1Learn features from data
IDEA # 2Use differentiable functions that produce
features efficiently
IDEA # 3End-to-end learning:
no distinction between feature extractor and classifier
IDEA # 4“Deep” architectures:
cascade of simpler non-linear modules
KEY IDEAS OF NEURAL NETS
Ranzato
![Page 18: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/18.jpg)
18
KEY QUESTIONS
- What is the input-output mapping?
- How are parameters trained?
- How computational expensive is it?
- How well does it work?
Ranzato
![Page 19: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/19.jpg)
19
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 20: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/20.jpg)
20
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 21: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/21.jpg)
21
Linear Classifier: SVMInput:
Binary label:
Parameters:
Output prediction:
Loss:
X ∈RD
y
W ∈RD
W T X
Ranzato
L=12∥W∥
2max [0,1−W
TX y ]
L
W T X y
Hinge Loss
1
![Page 22: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/22.jpg)
22
Linear Classifier: Logistic Regression
Ranzato
L=12∥W∥
2 log 1exp −W
TX y
L
W T X y
Log Loss
X ∈RD
y
W ∈RD
W T X
Input:
Binary label:
Parameters:
Output prediction:
Loss:
![Page 23: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/23.jpg)
23
Logistic Regression: Probabilistic Interpretation
p y=1∣X =1
1e−WT X
Ranzato
L=− log p y∣X
Q: What is the gradient of w.r.t. ?L W
Input:
Binary label:
Parameters:
Output prediction:
Loss:
X ∈RD
y
W ∈RD
W T X
1
![Page 24: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/24.jpg)
24
Logistic Regression: Probabilistic Interpretation
p y=1∣X =1
1e−WT X
Ranzato
L= log 1exp −W T X y
Q: What is the gradient of w.r.t. ?L W
Input:
Binary label:
Parameters:
Output prediction:
Loss:
X ∈RD
y
W ∈RD
W T X
1
![Page 25: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/25.jpg)
25
Ranzato
sin x
cos x
log x
exp x
x3−log
1
1e−WT X
Simple Functions
Complicated Function
![Page 26: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/26.jpg)
26
Logistic Regression: Computing Loss
Ranzato
−log 1
1e−WT X
Complicated Function
W T X1
1e−u−log p
Lu p
![Page 27: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/27.jpg)
27
Chain Rule
Ranzato
dLdx
x y
Given and ,
What is ?
y x dL /dy
dL /dx
dLdy
![Page 28: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/28.jpg)
28
Chain Rule
Ranzato
dLdx
dLdy
x y
dLdx
=dLdy
⋅dydx
Given and ,
What is ?
y x dL /dy
dL /dx
![Page 29: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/29.jpg)
29
Chain Rule
Ranzato
dLdx
dLdy
x y
All needed information is local!
dLdx
=dLdy
⋅dydx
Given and ,
What is ?
y x dL /dy
dL /dx
![Page 30: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/30.jpg)
30
Ranzato
Logistic Regression: Computing Gradients
Ranzato
W T X1
1e−u−log p
Lu p
dLdp
X
−1p
![Page 31: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/31.jpg)
31
Ranzato
Logistic Regression: Computing Gradients
Ranzato
W T X1
1e−u−log p
Lu p
dpdu
dLdp
X
p1− p −1p
![Page 32: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/32.jpg)
32
Ranzato
Logistic Regression: Computing Gradients
Ranzato
W T X1
1e−u−log p
Lu p
dudW
dpdu
dLdp
dLdW
=dLdp
⋅dpdu
⋅dudW
= p−1 X
X
−1p
p1− pX
![Page 33: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/33.jpg)
33
Ranzato
What Did We Learn?
Ranzato
LogisticRegression
- Logistic Regression- How to compute gradients of complicated functions
![Page 34: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/34.jpg)
34
Ranzato
LogisticRegression
LogisticRegression
Neural Network
![Page 35: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/35.jpg)
35
Neural Network
Ranzato
– A neural net can be thought of as a stack of logistic regression classifiers. Each input is the output of the previous layer.
LogisticRegression
LogisticRegression
LogisticRegression
NOTE: intermediate units can be thought of as linear classifiers trained with implicit target values.
![Page 36: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/36.jpg)
36
Key Computations: F-Prop / B-Prop
F-PROP
X Z
Ranzato
![Page 37: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/37.jpg)
37
{∂Z∂ X
,∂Z∂
}
∂ L∂ X
∂ L∂Z
∂L∂
Ranzato
B-PROP
Key Computations: F-Prop / B-Prop
![Page 38: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/38.jpg)
38
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
A) Compute loss on small mini-batch
![Page 39: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/39.jpg)
39
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
F-PROP
A) Compute loss on small mini-batch
![Page 40: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/40.jpg)
40
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
F-PROP
A) Compute loss on small mini-batch
![Page 41: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/41.jpg)
41
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
F-PROP
A) Compute loss on small mini-batch
![Page 42: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/42.jpg)
42
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
A) Compute loss B) Compute gradient w.r.t. parameters
B-PROP
A) Compute loss on small mini-batch
![Page 43: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/43.jpg)
43
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
B-PROP
A) Compute loss B) Compute gradient w.r.t. parametersA) Compute loss on small mini-batch
![Page 44: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/44.jpg)
44
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
B-PROP
A) Compute loss B) Compute gradient w.r.t. parametersA) Compute loss on small mini-batch
![Page 45: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/45.jpg)
45
Neural Net: Training
Ranzato
Layer 3Layer 2Layer 1
A) Compute loss B) Compute gradient w.r.t. parametersC) Use gradient to update parameters −
dLd
A) Compute loss on small mini-batch
![Page 46: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/46.jpg)
46
NEURAL NET: ARCHITECTURE
Ranzato
W j∈RM×N , b j∈R
N
h j h j1h j1=W j1
T h jb j1
h j∈RM , h j1∈R
N
![Page 47: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/47.jpg)
47
Ranzato
h j h j1h j1=W j1
T h jb j1
x =1
1e−x
NEURAL NET: ARCHITECTURE
x
x
![Page 48: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/48.jpg)
48
Ranzato
NEURAL NET: ARCHITECTURE
x = tanh x
h j h j1h j1=W j1
T h jb j1
x
x
![Page 49: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/49.jpg)
49
is equivalent to
Ranzato
Graphical Notations
f X ;W
W
X h
hk is called feature, hidden unit, neuron or code unit
![Page 50: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/50.jpg)
50
MOST COMMON ARCHITECTURE
yX
Errory
NOTE: Multi-layer neural nets with more than two layers are nowadays called deep nets!!
Ranzato
![Page 51: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/51.jpg)
51
NOTE: User must specify number of layers, number of hidden units, type of layers and loss function.
Ranzato
MOST COMMON ARCHITECTURE
yX
Errory
NOTE: Multi-layer neural nets with more than two layers are nowadays called deep nets!!
![Page 52: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/52.jpg)
52
MOST COMMON LOSSES
L=12∑i=1
N y i− y i
2
Square Euclidean Distance (regression):
Ranzato
yX
Errory
y , y∈RN
![Page 53: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/53.jpg)
53
Cross Entropy (classification):
L=−∑i=1
Ny i log y i
Ranzato
MOST COMMON LOSSES
yX
Errory
y , y∈[0,1]N , ∑i=1
Ny i=1, ∑i=1
Ny i=1
![Page 54: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/54.jpg)
54
NOTE: User specifies loss based on the task.
Ranzato
MOST COMMON LOSSES
yX
Errory
![Page 55: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/55.jpg)
55
Algorithm (SGD):Given a small mini-batch- FPROP- BPROP- PARAMETER UPDATE −
∂L∂
Ranzato
TRAINING
yX
Errory
![Page 56: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/56.jpg)
56
NOTES– User chooses optimization algorithm.– Computational cost of F-PROP & B-PROP is similar.
Ranzato
TRAINING
yX
Errory
![Page 57: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/57.jpg)
58
TOY EXAMPLE: SYNTHETIC DATA1 input & 1 output100 hidden units in each layer
Ranzato
![Page 58: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/58.jpg)
59
1 input & 1 output3 hidden layers
Ranzato
TOY EXAMPLE: SYNTHETIC DATA
![Page 59: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/59.jpg)
60
1 input & 1 output3 hidden layers, 1000 hiddens Regression of cosine
Ranzato
TOY EXAMPLE: SYNTHETIC DATA
![Page 60: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/60.jpg)
61
1 input & 1 output3 hidden layers, 1000 hiddens Regression of cosine
Ranzato
TOY EXAMPLE: SYNTHETIC DATA
![Page 61: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/61.jpg)
62
TOY EXAMPLE: MNIST
Linear Classifier..................................................................12.0%
Boosted stumps...................................................................7.7%Product of boosted stumps..................................................1.3%Boosted trees.......................................................................1.5% Stumps on Haar features.....................................................0.9%
K-NN.....................................................................................5.0%
SVM Gaussian kernel...........................................................1.4%
2 layer nnet 800 hiddens......................................................1.6%4 layer nnet (pre-trained)......................................................1.0% Conv. Net (pre-trained).........................................................0.6%
http://yann.lecun.com/exdb/mnist/ Ranzato
![Page 62: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/62.jpg)
63
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 63: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/63.jpg)
64
Ranzato
FULLY CONNECTED NEURAL NET
![Page 64: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/64.jpg)
65
Example: 1000x1000 image 1M hidden units
1B parameters!!!
- Spatial correlation is local- Better to put resources elsewhere!
Ranzato
FULLY CONNECTED NEURAL NET
![Page 65: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/65.jpg)
66
LOCALLY CONNECTED NEURAL NET
Example: 1000x1000 image 1M hidden units Filter size: 10x10
10M parameters
Ranzato
![Page 66: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/66.jpg)
67
LOCALLY CONNECTED NEURAL NET
Ranzato
Example: 1000x1000 image 1M hidden units Filter size: 10x10
10M parameters
![Page 67: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/67.jpg)
68
LOCALLY CONNECTED NEURAL NET
Ranzato
STATIONARITY? Statistics is similar at different locations
Example: 1000x1000 image 1M hidden units Filter size: 10x10
10M parameters
![Page 68: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/68.jpg)
69
CONVOLUTIONAL NET
Share the same parameters across different locations: Convolutions with learned kernels
Ranzato
![Page 69: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/69.jpg)
70
Learn multiple filters.
E.g.: 1000x1000 image 100 Filters Filter size: 10x10
10K parameters
Ranzato
CONVOLUTIONAL NET
![Page 70: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/70.jpg)
71
NEURAL NETS FOR VISION
A standard neural net applied to images:- scales quadratically with the size of the input- does not leverage stationarity
Solution:- connect each hidden unit to a small patch of the input- share the weight across hidden unitsThis is called: convolutional network.
LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998
Ranzato
![Page 71: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/71.jpg)
72
Let us assume filter is an “eye” detector.
Q.: how can we make the detection robust to the exact location of the eye?
Ranzato
CONVOLUTIONAL NET
![Page 72: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/72.jpg)
73
By “pooling” (e.g., max or average) filterresponses at different locations we gainrobustness to the exact spatial locationof features.
Ranzato
CONVOLUTIONAL NET
![Page 73: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/73.jpg)
74
CONV NETS: EXTENSIONSOver the years, some new modules have proven to be very effective when plugged into conv-nets:
- L2 Pooling
- Local Contrast Normalization
h i1, x , y=∑ j , k ∈N x , y h i , j , k2
h i1, x , y=hi , x , y−mi , N x , y
i , N x , y
layer i1layer i
x , yN x , y
layer i1layer i
x , yN x , y
Jarrett et al. “What is the best multi-stage architecture for object recognition?” ICCV 2009 Ranzato
![Page 74: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/74.jpg)
75
CONV NETS: L2 POOLING
-1 0 0 0 0
+1
Kavukguoglu et al. “Learning invariant features ...” CVPR 2009 Ranzato
h i h i1∑i=1
5⋅i
2
![Page 75: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/75.jpg)
76
CONV NETS: L2 POOLING
0 0 0 0+1
+1
Kavukguoglu et al. “Learning invariant features ...” CVPR 2009 Ranzato
h i h i1∑i=1
5⋅i
2
![Page 76: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/76.jpg)
77
LOCAL CONTRAST NORMALIZATION
h i1, x , y=hi , x , y−mi , N x , y
i , N x , y
0
1
0
-1
0
0
0.5
0
-0.5
0
LCN
Ranzato
![Page 77: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/77.jpg)
78
1
11
1
-9
1
LCN
0
0.5
0
-0.5
0
Ranzato
h i1, x , y=hi , x , y−mi , N x , y
i , N x , y
LOCAL CONTRAST NORMALIZATION
![Page 78: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/78.jpg)
79
L2 Pooling & Local Contrast Normalizationhelp learning more invariant representations!
CONV NETS: EXTENSIONS
Ranzato
![Page 79: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/79.jpg)
80
CONV NETS: TYPICAL ARCHITECTURE
Filtering Pooling LCN
One stage (zoom)
LinearLayer
Whole system
1st stage 2nd stage 3rd stage
Input Image
ClassLabels
Ranzato
![Page 80: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/80.jpg)
81
CONV NETS: TRAINING
Algorithm:Given a small mini-batch- FPROP- BPROP- PARAMETER UPDATE
Since convolutions and sub-sampling are differentiable, we can use standard back-propagation.
Ranzato
![Page 81: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/81.jpg)
82
CONV NETS: EXAMPLES
- Object category recognition Boureau et al. “Ask the locals: multi-way local pooling for image recognition” ICCV 2011- Segmentation Turaga et al. “Maximin learning of image segmentation” NIPS 2009- OCR Ciresan et al. “MCDNN for Image Classification” CVPR 2012- Pedestrian detection Kavukcuoglu et al. “Learning convolutional feature hierarchies for visual recognition” NIPS 2010- Robotics Sermanet et al. “Mapping and planning..with long range perception” IROS 2008
Ranzato
![Page 82: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/82.jpg)
83
CONV NETS: LIMITATIONS
- requires lots of labeled data to train
- difficult optimization
- scalability
Ranzato
![Page 83: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/83.jpg)
84
LIMITATIONS & SOLUTIONS
- requires lots of labeled data to train
- difficult optimization
- scalability
Ranzato
+ unsupervised learning
+ layer-wise training
+ distributed training
![Page 84: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/84.jpg)
85
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 85: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/85.jpg)
86
BACK TO LOGISTIC REGRESSION
Ranzato
Error
input
target
prediction
![Page 86: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/86.jpg)
87
Unsupervised Learning
Ranzato
Error
input prediction
?
![Page 87: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/87.jpg)
88
Unsupervised Learning
Ranzato
Q: How should we train the input-output mapping if we do not have target values?
A: Code has to retain information from the input but only if this is similar to training samples.
By better representing only those inputs that are similar to training samples we hope to extract interesting structure (e.g., structure of manifold where data live).
input code
![Page 88: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/88.jpg)
89
Unsupervised Learning
Ranzato
Q: How to constrain the model to represent training samples better than other data points?
![Page 89: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/89.jpg)
90
Unsupervised Learning
Ranzato
– reconstruct the input from the code & make code compact (auto-encder with bottle-neck).
– reconstruct the input from the code & make code sparse (sparse auto-encoders)
– add noise to the input or code (denoising auto-encoders)
– make sure that the model defines a distribution that normalizes to 1 (RBM).
see work in LeCun, Ng, Fergus, Lee, Yu's labs
see work in Y. Bengio, Lee's lab
see work in Y. Bengio, Hinton, Lee, Salakthudinov's lab
![Page 90: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/90.jpg)
91
AUTO-ENCODERS NEURAL NETS
Ranzato
encoder
Error
input predictiondecoder
code
– input higher dimensional than code - error: ||prediction - input|| - training: back-propagation
2
![Page 91: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/91.jpg)
92
SPARSE AUTO-ENCODERS
Ranzato
encoder
Error
input prediction
code
– sparsity penalty: ||code||- error: ||prediction - input||
- training: back-propagation
SparsityPenalty
12
- loss: sum of square reconstruction error and sparsity
decoder
![Page 92: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/92.jpg)
93
SPARSE AUTO-ENCODERS
Ranzato
encoder
Error
input prediction
code
– input: code:
- loss:
SparsityPenalty
Le et al. “ICA with reconstruction cost..” NIPS 2011
h=W T XX
L X ;W =∥W h−X∥2∑ j∣h j∣
decoder
![Page 93: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/93.jpg)
94
How To Use Unsupervised Learning
Ranzato
1) Given unlabeled data, learn features
![Page 94: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/94.jpg)
95
How To Use Unsupervised Learning
Ranzato
1) Given unlabeled data, learn features2) Use encoder to produce features and train another layer on the top
![Page 95: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/95.jpg)
96
How To Use Unsupervised Learning
Ranzato
Layer-wise training of a feature hierarchy
1) Given unlabeled data, learn features2) Use encoder to produce features and train another layer on the top
![Page 96: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/96.jpg)
97
How To Use Unsupervised Learning
Ranzato
1) Given unlabeled data, learn features2) Use encoder to produce features and train another layer on the top3) feed features to classifier & train just the classifier
Reduced overfitting since features are learned in unsupervised way!
input
label
prediction
![Page 97: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/97.jpg)
98
How To Use Unsupervised Learning
Ranzato
1) Given unlabeled data, learn features2) Use encoder to produce features and train another layer on the top3) feed features to classifier & jointly train the whole system
Given enough data, this usually yields the best results: end-to-end learning!
input
label
prediction
![Page 98: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/98.jpg)
99
Visualizing Learned Features
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
Q: can we interpret the learned features?
![Page 99: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/99.jpg)
100
Visualizing Learned Features
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
1st layer features
2nd layer features
Q: how are these images computed?
![Page 100: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/100.jpg)
101
Visualizing Learned Features
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
W h=W 1h1W 2h2reconstruction:
0.9 + 0.7 + 0.5 + 1.0 + ...=≈
Columns of show what each code unit represents.W
![Page 101: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/101.jpg)
102
Visualizing Learned Features
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
1st layer features
![Page 102: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/102.jpg)
103
Visualizing Learned Features
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
Q: How about the second layer features?
A: Similarly, each second layer code unit can be visualized by taking its bases and then projecting those bases in image space through the first layer decoder.
![Page 103: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/103.jpg)
104
Visualizing Learned Features
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
Q: How about the second layer features?
Missing edges have 0 weight.Light gray nodes have zero value.
A: Similarly, each second layer code unit can be visualized by taking its bases and then projecting those bases in image space through the first layer decoder.
![Page 104: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/104.jpg)
105
Example of Feature Learning
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007
1st layer features
2nd layer features
Ranzato
![Page 105: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/105.jpg)
106
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 106: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/106.jpg)
107
Semi-Supervised Learning
Ranzato
airplane
truck
deer
frog
bird
![Page 107: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/107.jpg)
108
Semi-Supervised Learning
Ranzato
airplane
truck
deer
frog
bird
LOTS & LOTS OF UNLABELED DATA!!!
![Page 108: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/108.jpg)
109
Ranzato
Semi-Supervised Learning
input
prediction
Weston et al. “Deep learning via semi-supervised embedding” ICML 2008
label
Loss = supervised_error + unsupervised_error
![Page 109: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/109.jpg)
110
Multi-Task Learning
Ranzato
Face detection is hard because of lighting, pose, but also occluding goggles.
Face detection could made be easier by face identification.
The identification task may help the detection task.
![Page 110: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/110.jpg)
111
Multi-Task Learning- Easy to add many error terms to loss function.- Joint learning of related tasks yields better representations.
Example of architecture:
Collobert et al. “NLP (almost) from scratch” JMLR 2011 Ranzato
![Page 111: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/111.jpg)
112
Multi-Modal Learning
Ranzato
Audio and Video streams are often complimentary to each other.
E.g., audio can provide important clues to improve visual recognition, and vice versa.
![Page 112: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/112.jpg)
113
Multi-Modal Learning
- Weak assumptions on input distribution- Fully adaptive to data
Ngiam et al. “Multi-modal deep learning” ICML 2011 Ranzato
Example of architecture:modality #1
modality #2
modality #3
![Page 113: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/113.jpg)
114
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 114: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/114.jpg)
115
Boosting & Forests
Deep Nets:- single highly non-linear system- “deep” stack of simpler modules- all parameters are subject to learning
Boosting & Forests:- sequence of “weak” (simple) classifiers that are linearly combined to produce a powerful classifier- subsequent classifiers do not exploit representations of earlier classifiers, it's a “shallow” linear mixture
- typically features are not learned Ranzato
![Page 115: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/115.jpg)
116
Properties Deep Nets Boosting
Hierarchical features
Easy to parallelize
End-to-end learning
Fast training
Fast at test time
Leverage unlab. data
Adaptive features
Ranzato
![Page 116: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/116.jpg)
117
Deep Neural-Nets VS Probabilistic ModelsDeep Neural Nets:- mean-field approximations of intractable probabilistic models- usually more efficient- typically more unconstrained (partition function has to be replaced by other constraints, e.g. sparsity).
Hierarchical Probabilistic Models (DBN, DBM, etc.):- in the most interesting cases, they are intractable- they better deal with uncertainty- they can be easily combined
Ranzato
![Page 117: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/117.jpg)
118
Example: Auto-Encoder
E [Z∣X ]= W T Xbe
Neural Net:
Probabilistic Model (Gaussian RBM):
X=W d Zbd
Z= W eT Xbe
E [ X ∣Z ]=W Zbd
Ranzato
code
reconstruction
![Page 118: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/118.jpg)
119
Properties Deep Nets Probab. Models
Hierarchical features
Models uncertainty
End-to-end learning
Fast training
Fast at test time
Leverage unlab. data
Adaptive features
Ranzato
![Page 119: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/119.jpg)
120
Outline- Neural Networks for Supervised Training
- Architecture- Loss function
- Neural Networks for Vision: Convolutional & Tiled- Unsupervised Training of Neural Networks- Extensions:
- semi-supervised / multi-task / multi-modal
- Comparison to Other Methods- boosting & cascade methods- probabilistic models
- Large-Scale Learning with Deep Neural Nets
Ranzato
![Page 120: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/120.jpg)
121
Tera-Scale Deep Learning @ Google
Observation #1: more features always improve performance unless data is scarce.
Observation #2: deep learning methods have higher capacity and have the potential to model data better.
Q #1: Given lots of data and lots of machines, can we scale up deep learning methods?
Q #2: Will deep learning methods perform much better?
Ranzato
![Page 121: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/121.jpg)
122
The Challenge
– best optimizer in practice is on-line SGD which is naturally sequential, hard to parallelize.
– layers cannot be trained independently and in parallel, hard to distribute
– model can have lots of parameters that may clog the network, hard to distribute across machines
A Large Scale problem has: – lots of training samples (>10M)– lots of classes (>10K) and – lots of input dimensions (>10K).
Ranzato
![Page 122: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/122.jpg)
123
Our Solution
input
1st layer
2nd layer
Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
![Page 123: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/123.jpg)
124
1st machine
2nd
machine3rd
machine
Ranzato
Our Solution
input
1st layer
2nd layer
![Page 124: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/124.jpg)
125
MODEL PARALLELISM
Ranzato
Our Solution
1st machine
2nd
machine3rd
machine
![Page 125: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/125.jpg)
126
Distributed Deep Nets
Deep Net MODEL PARALLELISM
input #1
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
![Page 126: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/126.jpg)
127
Distributed Deep Nets
MODEL PARALLELISM
+DATA
PARALLELISM
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
input #1input #2
input #3
![Page 127: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/127.jpg)
128
Asynchronous SGD
PARAMETER SERVER
1st replica 2nd replica 3rd replicaRanzato
![Page 128: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/128.jpg)
129
Asynchronous SGD
PARAMETER SERVER
1st replica 2nd replica 3rd replicaRanzato
∂ L∂1
![Page 129: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/129.jpg)
130
Asynchronous SGD
PARAMETER SERVER
1st replica 2nd replica 3rd replicaRanzato
1
![Page 130: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/130.jpg)
131
Asynchronous SGD
PARAMETER SERVER(update parameters)
1st replica 2nd replica 3rd replicaRanzato
![Page 131: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/131.jpg)
132
Asynchronous SGD
PARAMETER SERVER
1st replica 2nd replica 3rd replicaRanzato
∂ L∂2
![Page 132: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/132.jpg)
133
Asynchronous SGD
PARAMETER SERVER
1st replica 2nd replica 3rd replicaRanzato
2
![Page 133: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/133.jpg)
134
Asynchronous SGD
PARAMETER SERVER(update parameters)
1st replica 2nd replica 3rd replicaRanzato
![Page 134: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/134.jpg)
135
Unsupervised Learning With 1B Parameters
DATA: 10M youtube (unlabeled) frames of size 200x200.
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
![Page 135: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/135.jpg)
136
Deep Net:– 3 stages
– each stage consists of local filtering, L2 pooling, LCN - 18x18 filters- 8 filters at each location- L2 pooling and LCN over 5x5 neighborhoods
– training jointly the three layers by:- reconstructing the input of each layer- sparsity on the code
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Unsupervised Learning With 1B Parameters
![Page 136: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/136.jpg)
137
Deep Net:– 3 stages
– each stage consists of local filtering, L2 pooling, LCN - 18x18 filters- 8 filters at each location- L2 pooling and LCN over 5x5 neighborhoods
– training jointly the three layers by:- reconstructing the input of each layer- sparsity on the code
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Unsupervised Learning With 1B Parameters
1B parameters!!!
![Page 137: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/137.jpg)
138
Validating Unsupervised Learning
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
The network has seen lots of objects during training, but without any label.
Q.: how can we validate unsupervised learning?
Q.: Did the network form any high-level representation? E.g., does it have any neuron responding for faces?
– build validation set with 50% faces, 50% random images- study properties of neurons
![Page 138: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/138.jpg)
139
Validating Unsupervised Learning
1st stage 2nd stage 3rd stage
neu
ron
respo
nses
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
![Page 139: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/139.jpg)
140
Top Images For Best Face Neuron
Ranzato
![Page 140: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/140.jpg)
141
Best Input For Face Neuron
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
![Page 141: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/141.jpg)
142
Unsupervised + Supervised (ImageNet)
1st stage 2nd stage 3rd stage
Input Image y
COST
y
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
![Page 142: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/142.jpg)
143
Object Recognition on ImageNet
IMAGENET v.2011 (16M images, 20K categories)
METHOD ACCURACY %
Weston & Bengio 2011
Deep Net (from random)
Deep Net (from unsup.)
9.3
13.6
15.8
Linear Classifier on deep features 13.1
RanzatoLe et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
![Page 143: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/143.jpg)
144
Top Inputs After Supervision
Ranzato
![Page 144: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/144.jpg)
145
Ranzato
Top Inputs After Supervision
![Page 145: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/145.jpg)
146
Experiments: and many more...
- automatic speech recognition
- natural language processing
- biomed applications
- finance
Generic learning algorithm!!
Ranzato
![Page 146: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/146.jpg)
147
ReferencesTutorials & Background Material
Ranzato
– Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp.1-127, 2009.– LeCun, Chopra, Hadsell, Ranzato, Huang: A Tutorial on Energy-Based Learning, in Bakir, G. and Hofman, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds), Predicting Structured Data, MIT Press, 2006
Convolutional Nets– LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998– Jarrett, Kavukcuoglu, Ranzato, LeCun: What is the Best Multi-Stage Architecture for Object Recognition?, Proc. International Conference on Computer Vision (ICCV'09), IEEE, 2009- Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun: Learning Convolutional Feature Hierachies for Visual Recognition, Advances in Neural Information Processing Systems (NIPS 2010), 23, 2010
![Page 147: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/147.jpg)
148
Unsupervised Learning
Ranzato
– ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. Le, Karpenko, Ngiam, Ng. In NIPS*2011– Rifai, Vincent, Muller, Glorot, Bengio, Contracting Auto-Encoders: Explicit invariance during feature extraction, in: Proceedings of the Twenty-eight International Conference on Machine Learning (ICML'11), 2011- Vincent, Larochelle, Lajoie, Bengio, Manzagol, Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, Journal of Machine Learning Research, 11:3371--3408, 2010.- Gregor, Szlam, LeCun: Structured Sparse Coding via Lateral Inhibition, Advances in Neural Information Processing Systems (NIPS 2011), 24, 2011- Kavukcuoglu, Ranzato, LeCun. "Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition". ArXiv 1010.3467 2008
- Hinton, Krizhevsky, Wang, Transforming Auto-encoders, ICANN, 2011
Multi-modal Learning– Multimodal deep learning, Ngiam, Khosla, Kim, Nam, Lee, Ng. In Proceedings of the Twenty-Eighth International Conference on Machine Learning, 2011.
![Page 148: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/148.jpg)
149
– Gregor, LeCun “Emergence of complex-like cells in a temporal product network with local receptive fields” Arxiv. 2009– Ranzato, Mnih, Hinton “Generating more realistic images using gated MRF's” NIPS 2010– Le, Ngiam, Chen, Chia, Koh, Ng “Tiled convolutional neural networks” NIPS 2010
Locally Connected Nets
Ranzato
Distributed Learning– Le, Ranzato, Monga, Devin, Corrado, Chen, Dean, Ng. "Building High-Level Features Using Large Scale Unsupervised Learning". International Conference of Machine Learning (ICML 2012), Edinburgh, 2012.
Papers on Scene Parsing– Farabet, Couprie, Najman, LeCun, “Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers”, in Proc. of the International Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.Papers on Segmentation– Turaga, Briggman, Helmstaedter, Denk, Seung Maximin learning of image segmentation. NIPS, 2009.
– Farabet, Couprie, Najman, LeCun, “Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers”, in Proc. of the International Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.
![Page 149: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/149.jpg)
150
Papers on Object Recognition
Ranzato
- Boureau, Le Roux, Bach, Ponce, LeCun: Ask the locals: multi-way local pooling for image recognition, Proc. International Conference on Computer Vision 2011- Sermanet, LeCun: Traffic Sign Recognition with Multi-Scale Convolutional Networks, Proceedings of International Joint Conference on Neural Networks (IJCNN'11)- Ciresan, Meier, Gambardella, Schmidhuber. Convolutional Neural Network Committees For Handwritten Character Classification. 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China.- Ciresan, Meier, Masci, Gambardella, Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification. International Joint Conference on Artificial Intelligence IJCAI-2011.
Papers on Action Recognition
– Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis, Le, Zou, Yeung, Ng. In Computer Vision and Pattern Recognition (CVPR), 2011
![Page 150: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/150.jpg)
151
Papers on Vision for Robotics
Ranzato
– Hadsell, Sermanet, Scoffier, Erkan, Kavackuoglu, Muller, LeCun: Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, February 2009,
– Serre, Wolf, Bileschi, Riesenhuber, Poggio. Robust Object Recognition with Cortex-like Mechanisms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29, 3, 411-426, 2007.- Pinto, Doukhan, DiCarlo, Cox "A high-throughput screening approach to discovering good forms of biologically inspired visual representation." {PLoS} Computational Biology. 2009
Papers on Biological Inspired Vision
Deep Convex Nets & Deconv-Nets– Deng, Yu. “Deep Convex Network: A Scalable Architecture for Speech Pattern Classification.” Interspeech, 2011.- Zeiler, Taylor, Fergus "Adaptive Deconvolutional Networks for Mid and High Level Feature Learning." ICCV. 2011
![Page 151: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/151.jpg)
152
Software & LinksDeep Learning website
Matlab code for R-ICA unsupervised algorithm
Python-based learning library
C++ code for ConvNets
Lush learning library which includes ConvNets
Code used to generate demo for this tutorial
Ranzato
– http://deeplearning.net/
– http://eblearn.sourceforge.net/
– http://deeplearning.net/
- http://lush.sourceforge.net/
– http://eblearn.sourceforge.net/
- http://deeplearning.net/software/theano/
- http://ai.stanford.edu/~quocle/rica_release.zip
- http://lush.sourceforge.net/
- http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/
![Page 152: NEURAL NETS FOR VISION - NYU Computer Sciencefergus/teaching/vision/nnets.pdf- Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions:](https://reader031.vdocuments.site/reader031/viewer/2022022517/5b0a254a7f8b9abe5d8dc3a3/html5/thumbnails/152.jpg)
153
Acknowledgements
Ranzato
Quoc Le, Andrew Ng
Jeff Dean, Kai Chen, Greg Corrado, Matthieu Devin, Mark Mao, Rajat Monga, Paul Tucker, Samy Bengio
Yann LeCun, Pierre Sermanet, Clement Farabet