TRANSCRIPT
Introduction to Computational Vision: Training Neural Nets and CNNs
Agastya Kalra
Outline
• Training
• Convolutional Layers
SGD Formalized
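The slide's formula did not survive extraction; as a stand-in, here is a minimal numpy sketch of the standard SGD update, w ← w − α∇L(w), on a single parameter vector:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """One vanilla SGD update: w <- w - lr * grad."""
    return w - lr * grad

w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
w = sgd_step(w, g, lr=0.1)   # -> [0.95, -2.05]
```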
Momentum
• Adds a velocity term: v ← μv − α∇L, then w ← w + v
• Gives a speedup of at most 1/(1 − μ) over plain SGD
• Momentum constant μ is usually 0.9, 0.5, or 0.99, corresponding to a 10x, 2x, or 100x increase in max speed
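A minimal sketch of the velocity-based update (standard momentum SGD; the slide's own formula was lost in extraction). With a constant gradient, the velocity saturates at −α∇L/(1 − μ), which is where the "at most 1/(1 − μ) speedup" comes from:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """Momentum SGD: v <- mu*v - lr*grad; w <- w + v."""
    v = mu * v - lr * grad
    return w + v, v

# With a constant gradient of 1, velocity converges to -lr/(1-mu) = -1.0,
# i.e. 10x the plain-SGD step of -lr for mu = 0.9.
w, v = np.zeros(1), np.zeros(1)
for _ in range(300):
    w, v = momentum_step(w, v, np.ones(1))
```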
Nesterov Momentum
• Applies the momentum step first, then evaluates the gradient at the resulting look-ahead point
http://cs231n.stanford.edu/slides/2016/winter1516_lecture6.pdf
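A sketch of the look-ahead idea, assuming the standard Nesterov formulation (the gradient is taken at w + μv rather than at w):

```python
def nesterov_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """Nesterov momentum: look ahead by the momentum step,
    then evaluate the gradient there."""
    lookahead = w + mu * v
    v = mu * v - lr * grad_fn(lookahead)
    return w + v, v

# Minimizing f(w) = 0.5 * w**2 (gradient is w) drives w toward 0.
w, v = 1.0, 0.0
for _ in range(300):
    w, v = nesterov_step(w, v, lambda z: z)
```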
Setting the learning rate
http://cs231n.stanford.edu/slides/2016/winter1516_lecture6.pdf
• Also good to try a few settings for 100 iterations each and see which does best on the validation set
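That short-run sweep can be sketched as follows. `train_briefly` is a hypothetical stand-in that minimizes a toy quadratic and returns its final loss in place of a real validation loss:

```python
import numpy as np

def train_briefly(lr, steps=100):
    """Hypothetical short training run: minimize the toy loss w**2
    for a few steps and return the final loss."""
    w = np.array([5.0])
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w**2 is 2w
    return float(w ** 2)

# Try each candidate for 100 iterations, keep the one with lowest loss.
candidates = [1e-3, 1e-2, 1e-1]
best_lr = min(candidates, key=train_briefly)
```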
Others
• Adagrad: adaptive learning rates
• Adam: adaptive learning rates + momentum
• RMSProp: adaptive learning rates with a slightly different decay
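To make "adaptive learning rates + momentum" concrete, here is a sketch of one Adam update with the standard defaults (β₁ = 0.9, β₂ = 0.999, bias correction included); the slides themselves give no formula:

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (first moment m) plus per-parameter
    scaling by the running second moment v, with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# On the first step the update is ~ -lr * sign(grad), whatever the scale.
w, m, v = np.zeros(1), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, m, v, np.array([2.0]), t=1)
```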
How to choose?
• SGD < SGD + Momentum < SGD + Nesterov Momentum
• Adam is a good default
• RMSProp is good for RNNs, but also a good default
• SGD + Nesterov momentum is best if you have the time/resources to optimize the learning rate
• More of an art
Outline
• Training
• Convolutional Layers
Fully Connected Layer
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf
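The figures from these slides did not survive extraction. The operation they illustrate (per the linked CS231n lecture) is: stretch the input image into a vector and multiply by a weight matrix. A minimal sketch with the lecture's 32x32x3 → 10 example sizes:

```python
import numpy as np

# Fully connected layer: flatten a 32x32x3 image to a 3072-vector,
# multiply by a 10x3072 weight matrix to get 10 activations.
x = np.random.randn(32, 32, 3).reshape(-1)   # shape (3072,)
W = np.random.randn(10, x.size)              # shape (10, 3072)
out = W @ x                                  # shape (10,)
```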
Convolutional Layer
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf
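The convolution figures from this run of slides were lost in extraction. What they step through (per the linked lecture) is sliding each filter spatially over the input and taking a dot product at every position. A naive numpy sketch of that operation, with hypothetical shapes (input H x W x C, filters F x k x k x C):

```python
import numpy as np

def conv2d(x, filters, stride=1):
    """Naive 'valid' convolution (really cross-correlation, as in CNNs).
    x: H x W x C input; filters: F x k x k x C.
    Output: Ho x Wo x F with Ho = (H - k)//stride + 1."""
    H, W, C = x.shape
    F, k, _, _ = filters.shape
    Ho = (H - k) // stride + 1
    Wo = (W - k) // stride + 1
    out = np.zeros((Ho, Wo, F))
    for f in range(F):
        for i in range(Ho):
            for j in range(Wo):
                patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
                out[i, j, f] = np.sum(patch * filters[f])
    return out
```

For example, a 5x5x3 input of ones convolved with two 3x3x3 all-ones filters yields a 3x3x2 output whose every entry is 27 (the patch size 3*3*3).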
Convolutional Layer: Im2Col
In practice, this is actually implemented as a matrix multiplication:
https://www.mathworks.com/help/images/ref/im2col.html
Backprop is then the same as backprop through a matrix multiply.
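A sketch of the im2col idea (the exact layout below is illustrative, not the slide's): unroll every receptive-field patch into a column, so that one matrix multiply against the flattened filters computes all outputs at once.

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unroll every k x k x C patch of x (H x W x C) into a column.
    Convolution then becomes a single matrix multiply."""
    H, W, C = x.shape
    Ho = (H - k) // stride + 1
    Wo = (W - k) // stride + 1
    cols = np.zeros((k * k * C, Ho * Wo))
    idx = 0
    for i in range(Ho):
        for j in range(Wo):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
            cols[:, idx] = patch.reshape(-1)
            idx += 1
    return cols

# Convolution as a matmul: flatten F filters to rows of shape (F, k*k*C).
x = np.ones((4, 4, 3))
filters = np.ones((2, 3, 3, 3))
cols = im2col(x, k=3)                  # shape (27, 4): one column per patch
out = filters.reshape(2, -1) @ cols    # shape (2, 4); reshape to 2x2x2
```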
Note: if you want to tie the parameters of two weights, initialize them identically and sum their gradients at each update step.
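A tiny illustration of that note, with made-up gradient values: both occurrences of the tied weight read from one shared parameter, and the update applies the sum of their gradients.

```python
import numpy as np

# Tied weights: one shared parameter used in two places.
w_shared = np.array([0.5])
g1 = np.array([0.2])   # gradient from the first occurrence
g2 = np.array([0.3])   # gradient from the second occurrence
lr = 0.1

# Sum the gradients, apply one SGD step to the shared parameter.
w_shared = w_shared - lr * (g1 + g2)   # 0.5 - 0.1*0.5 = 0.45
```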