TRANSCRIPT
Introduction to Computational Vision: Training Neural Nets and CNNs
Agastya Kalra
Outline
• Training
• Convolutional Layers
SGD Formalized
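The slide's formula did not survive extraction; as a stand-in, here is a minimal numpy sketch of the standard SGD update, w ← w − α∇L(w), on a single parameter vector:

```python
import numpy as np

def sgd_step(w, grad, lr=0.01):
    """One vanilla SGD update: w <- w - lr * grad."""
    return w - lr * grad

w = np.array([1.0, -2.0])
g = np.array([0.5, 0.5])
w = sgd_step(w, g, lr=0.1)   # -> [0.95, -2.05]
```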
Momentum
• Adds a velocity term: v ← μv − α∇L, then w ← w + v
• Gives a speedup of at most 1/(1 − μ) over plain SGD
• Momentum constant μ is usually 0.9, 0.5, or 0.99, corresponding to a 10x, 2x, or 100x increase in max speed
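A minimal sketch of the velocity-based update (standard momentum SGD; the slide's own formula was lost in extraction). With a constant gradient, the velocity saturates at −α∇L/(1 − μ), which is where the "at most 1/(1 − μ) speedup" comes from:

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.1, mu=0.9):
    """Momentum SGD: v <- mu*v - lr*grad; w <- w + v."""
    v = mu * v - lr * grad
    return w + v, v

# With a constant gradient of 1, velocity converges to -lr/(1-mu) = -1.0,
# i.e. 10x the plain-SGD step of -lr for mu = 0.9.
w, v = np.zeros(1), np.zeros(1)
for _ in range(300):
    w, v = momentum_step(w, v, np.ones(1))
```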
Nesterov Momentum
• Applies the momentum step first, then evaluates the gradient at the resulting look-ahead point
http://cs231n.stanford.edu/slides/2016/winter1516_lecture6.pdf
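A sketch of the look-ahead idea, assuming the standard Nesterov formulation (the gradient is taken at w + μv rather than at w):

```python
def nesterov_step(w, v, grad_fn, lr=0.1, mu=0.9):
    """Nesterov momentum: look ahead by the momentum step,
    then evaluate the gradient there."""
    lookahead = w + mu * v
    v = mu * v - lr * grad_fn(lookahead)
    return w + v, v

# Minimizing f(w) = 0.5 * w**2 (gradient is w) drives w toward 0.
w, v = 1.0, 0.0
for _ in range(300):
    w, v = nesterov_step(w, v, lambda z: z)
```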
Setting the learning rate
http://cs231n.stanford.edu/slides/2016/winter1516_lecture6.pdf
• Also good to try a few settings for 100 iterations each and see which does best on the validation set
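That short-run sweep can be sketched as follows. `train_briefly` is a hypothetical stand-in that minimizes a toy quadratic and returns its final loss in place of a real validation loss:

```python
import numpy as np

def train_briefly(lr, steps=100):
    """Hypothetical short training run: minimize the toy loss w**2
    for a few steps and return the final loss."""
    w = np.array([5.0])
    for _ in range(steps):
        w -= lr * 2 * w          # gradient of w**2 is 2w
    return float(w ** 2)

# Try each candidate for 100 iterations, keep the one with lowest loss.
candidates = [1e-3, 1e-2, 1e-1]
best_lr = min(candidates, key=train_briefly)
```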
Others
• Adagrad: adaptive learning rates
• Adam: adaptive learning rates + momentum
• RMSProp: adaptive learning rates with a slightly different decay
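To make "adaptive learning rates + momentum" concrete, here is a sketch of one Adam update with the standard defaults (β₁ = 0.9, β₂ = 0.999, bias correction included); the slides themselves give no formula:

```python
import numpy as np

def adam_step(w, m, v, grad, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: momentum (first moment m) plus per-parameter
    scaling by the running second moment v, with bias correction."""
    m = b1 * m + (1 - b1) * grad
    v = b2 * v + (1 - b2) * grad ** 2
    m_hat = m / (1 - b1 ** t)          # bias-corrected first moment
    v_hat = v / (1 - b2 ** t)          # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# On the first step the update is ~ -lr * sign(grad), whatever the scale.
w, m, v = np.zeros(1), np.zeros(1), np.zeros(1)
w, m, v = adam_step(w, m, v, np.array([2.0]), t=1)
```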
How to choose?
• SGD < SGD + Momentum < SGD + Nesterov Momentum
• Adam is a good default
• RMSProp is good for RNNs, but also a good default
• SGD + Nesterov momentum is best if you have the time/resources to optimize the learning rate
• More of an art
Outline
• Training
• Convolutional Layers
Fully Connected Layer
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf
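The figures from these slides did not survive extraction. The operation they illustrate (per the linked CS231n lecture) is: stretch the input image into a vector and multiply by a weight matrix. A minimal sketch with the lecture's 32x32x3 → 10 example sizes:

```python
import numpy as np

# Fully connected layer: flatten a 32x32x3 image to a 3072-vector,
# multiply by a 10x3072 weight matrix to get 10 activations.
x = np.random.randn(32, 32, 3).reshape(-1)   # shape (3072,)
W = np.random.randn(10, x.size)              # shape (10, 3072)
out = W @ x                                  # shape (10,)
```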
Convolutional Layer
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture5.pdf
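The convolution figures from this run of slides were lost in extraction. What they step through (per the linked lecture) is sliding each filter spatially over the input and taking a dot product at every position. A naive numpy sketch of that operation, with hypothetical shapes (input H x W x C, filters F x k x k x C):

```python
import numpy as np

def conv2d(x, filters, stride=1):
    """Naive 'valid' convolution (really cross-correlation, as in CNNs).
    x: H x W x C input; filters: F x k x k x C.
    Output: Ho x Wo x F with Ho = (H - k)//stride + 1."""
    H, W, C = x.shape
    F, k, _, _ = filters.shape
    Ho = (H - k) // stride + 1
    Wo = (W - k) // stride + 1
    out = np.zeros((Ho, Wo, F))
    for f in range(F):
        for i in range(Ho):
            for j in range(Wo):
                patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
                out[i, j, f] = np.sum(patch * filters[f])
    return out
```

For example, a 5x5x3 input of ones convolved with two 3x3x3 all-ones filters yields a 3x3x2 output whose every entry is 27 (the patch size 3*3*3).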
Convolutional Layer: Im2Col
In practice, this is actually implemented as a matrix multiplication:
https://www.mathworks.com/help/images/ref/im2col.html
Backprop is then the same as backprop through a matrix multiply.
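A sketch of the im2col idea (the exact layout below is illustrative, not the slide's): unroll every receptive-field patch into a column, so that one matrix multiply against the flattened filters computes all outputs at once.

```python
import numpy as np

def im2col(x, k, stride=1):
    """Unroll every k x k x C patch of x (H x W x C) into a column.
    Convolution then becomes a single matrix multiply."""
    H, W, C = x.shape
    Ho = (H - k) // stride + 1
    Wo = (W - k) // stride + 1
    cols = np.zeros((k * k * C, Ho * Wo))
    idx = 0
    for i in range(Ho):
        for j in range(Wo):
            patch = x[i*stride:i*stride+k, j*stride:j*stride+k, :]
            cols[:, idx] = patch.reshape(-1)
            idx += 1
    return cols

# Convolution as a matmul: flatten F filters to rows of shape (F, k*k*C).
x = np.ones((4, 4, 3))
filters = np.ones((2, 3, 3, 3))
cols = im2col(x, k=3)                  # shape (27, 4): one column per patch
out = filters.reshape(2, -1) @ cols    # shape (2, 4); reshape to 2x2x2
```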
Note: if you want to tie the parameters of two weights, initialize them identically and sum their gradients at each update step.
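A tiny illustration of that note, with made-up gradient values: both occurrences of the tied weight read from one shared parameter, and the update applies the sum of their gradients.

```python
import numpy as np

# Tied weights: one shared parameter used in two places.
w_shared = np.array([0.5])
g1 = np.array([0.2])   # gradient from the first occurrence
g2 = np.array([0.3])   # gradient from the second occurrence
lr = 0.1

# Sum the gradients, apply one SGD step to the shared parameter.
w_shared = w_shared - lr * (g1 + g2)   # 0.5 - 0.1*0.5 = 0.45
```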