Deep Learning Class #1 - Go Deep or Go Home
TRANSCRIPT
Louis Monier (@louis_monier)
https://www.linkedin.com/in/louismonier
Gregory Renard (@redo)
https://www.linkedin.com/in/gregoryrenard
Deep Learning Basics: Go Deep or Go Home
Class 1 - Q1 - 2016
Supervised Learning

A training set of labeled images (cat, hat, dog, cat, mug, ...) is used to fit the model. Given a new image, the model makes a prediction such as "dog (89%)". Training takes days; testing takes milliseconds.
Single Neuron to Neural Networks
Single Neuron

A single neuron forms a linear combination of its inputs and weights, then applies a nonlinearity to turn the score into a choice. With inputs xpos = 3.45 and ypos = 1.21, weights w1 = -0.314 and w2 = 1.59, and bias w3 = 2.65, the linear combination is 3.45 x (-0.314) + 1.21 x 1.59 + 2.65 = 3.4906; the nonlinearity squashes this to 0.9981, and the choice is "red".
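The slide's arithmetic can be reproduced in a few lines (assuming the nonlinearity is tanh, which matches the 0.9981 output):

```python
import math

def neuron(inputs, weights, bias, activation=math.tanh):
    """Linear combination of inputs and weights plus a bias, then a nonlinearity."""
    score = sum(x * w for x, w in zip(inputs, weights)) + bias
    return score, activation(score)

# xpos, ypos and the weights w1, w2 from the slide; w3 plays the role of the bias.
score, choice = neuron([3.45, 1.21], [-0.314, 1.59], 2.65)
print(round(score, 4), round(choice, 4))  # 3.4906 0.9981
```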
More Neurons, More Layers: Deep Learning

Inputs (input1, input2, input3) feed the input layer, followed by hidden layers and an output layer. Successive layers detect increasingly abstract features (circle, iris, eye, face) until the network reports "89% likely to be a face".
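A minimal sketch of stacking layers; the weights below are invented for illustration, where a real network learns them during training:

```python
import math

def layer(inputs, weights, biases):
    """A fully-connected layer: each neuron combines all inputs, then tanh."""
    return [math.tanh(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

x = [0.5, -1.2, 0.3]                                            # input layer
h = layer(x, [[0.1, 0.4, -0.2], [0.7, -0.3, 0.5]], [0.0, 0.1])  # hidden layer
y = layer(h, [[0.6, -0.9]], [0.2])                              # output layer
print(y)  # a single score between -1 and 1
```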
How to find the best values for the weights?

Define error = |expected - computed|^2. Find the parameters that minimize the average error. The gradient of the error with respect to a weight w points uphill, so the update is to take a step downhill: w <- w - learning_rate * d(error)/dw.
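The downhill step, sketched on a one-parameter model (the learning rate is chosen arbitrarily):

```python
# Fit w in computed = w * x to a single example where expected = 6 at x = 2.
x, expected = 2.0, 6.0   # the best w is 3
w, lr = 0.0, 0.05        # start from zero; lr is the step size

for _ in range(100):
    computed = w * x
    grad = 2 * (computed - expected) * x  # d/dw of (expected - computed)^2
    w -= lr * grad                        # take a step downhill

print(round(w, 3))  # 3.0
```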
Egg carton in 1 million dimensions

The error surface looks like an egg carton in a million dimensions. Gradient descent just has to get down to one of the low points: one good minimum or another will do.
Backpropagation: Assigning Blame

Start from the loss and work backwards. With label = 1 and prediction p = 0.3, the loss is (1 - 0.3)^2 = 0.49. Blame flows back through the output neuron's weights (w1, w2, w3, b) and then through the previous layer's weights (w1', w2', w3', b'):

1. For the loss to go down by 0.1, p must go up by 0.05. For p to go up by 0.05, w1 must go up by 0.09; for p to go up by 0.05, w2 must go down by 0.12; ...
2. For the loss to go down by 0.1, the earlier activation p' = 0.89 must go down by 0.01.
3. For p' to go down by 0.01, w2' must go up by 0.04.
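The "blame" numbers come from gradients; one can be estimated numerically by nudging a weight and watching the loss move (a toy sigmoid neuron with invented weights and inputs):

```python
import math

def loss(w1, w2, b, x1=1.0, x2=2.0, label=1.0):
    """Squared error of a single sigmoid neuron on one example."""
    p = 1 / (1 + math.exp(-(w1 * x1 + w2 * x2 + b)))  # sigmoid prediction
    return (label - p) ** 2

eps = 1e-4
base = loss(0.1, -0.2, 0.0)
# Finite-difference estimate of d(loss)/d(w1): w1's share of the blame.
blame_w1 = (loss(0.1 + eps, -0.2, 0.0) - base) / eps
print(round(blame_w1, 3))  # negative: nudging w1 up moves the loss down
```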
Stochastic Gradient Descent (SGD) and Minibatch

It's too expensive to compute the gradient on all inputs just to take one step. We prefer a quick approximation: use a small random sample of inputs (a minibatch) to compute the gradient, then apply the update to all the weights. Welcome to SGD, much more efficient than regular GD!
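A minimal SGD loop; the data and model below are toys invented for illustration:

```python
import random

random.seed(0)
data = [(i / 100, 2 * (i / 100)) for i in range(100)]  # toy examples: y = 2x

w, lr, batch_size = 0.0, 0.1, 8
for _ in range(200):
    batch = random.sample(data, batch_size)  # a small random sample of inputs
    grad = sum(2 * (w * x - y) * x for x, y in batch) / batch_size
    w -= lr * grad                           # apply the step to the weight

print(round(w, 2))  # 2.0
```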
Usually things would get intense at this point...

Chain rule. Partial derivatives. Jacobian matrix. Hessian. Sum over paths. Local gradient.

Credit: Richard Socher, Stanford class CS224d
Learning a new representation
Credit: http://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
Training Data
Train. Validate. Test.

Training set: used to fit the weights.
Validation set: used to tune hyperparameters; cross-validation rotates which fold (1, 2, 3, 4) is held out.
Testing set: never ever, ever, ever, ever, ever, ever use for training!

Shuffle the data between epochs; one full pass over the training set is one epoch.
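A sketch of the bookkeeping; the 80/10/10 split is a common choice, not a rule:

```python
import random

random.seed(42)
examples = list(range(1000))   # stand-ins for labeled examples
random.shuffle(examples)       # shuffle before splitting!

train = examples[:800]         # fit the weights here
val = examples[800:900]        # tune hyperparameters here
test = examples[900:]          # never, ever use for training

print(len(train), len(val), len(test))  # 800 100 100
```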
Tensors
Tensors are not scary

Number: 0D. Vector: 1D. Matrix: 2D. Tensor: a 3D, 4D, ... array of numbers.

This image is a 3264 x 2448 x 3 tensor; each pixel holds three color values such as [r=0.45 g=0.84 b=0.76].
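In code a tensor is just a nested array. A sketch with plain Python lists, using a tiny 2 x 2 image instead of 3264 x 2448:

```python
def shape(tensor):
    """Dimensions of a (rectangular) nested-list tensor."""
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return dims

pixel = [0.45, 0.84, 0.76]     # [r, g, b]: a 1D tensor
image = [[pixel, pixel],       # height x width x channels: a 3D tensor
         [pixel, pixel]]
print(shape(pixel), shape(image))  # [3] [2, 2, 3]
```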
Different types of networks / layers
Fully-Connected Networks

Per layer, every input connects to every neuron. As in the earlier diagram: an input layer (input1, input2, input3), hidden layers detecting features (circle, iris, eye, face), and an output layer reporting "89% likely to be a face".
Recurrent Neural Networks (RNN)

Appropriate when inputs are sequences: the words of "this is not a pipe" enter one per time step (t=0, t=1, t=2, t=3, t=4), and the output sequence ("pipe a not is this") comes out step by step as well. We'll cover them in detail when we work on text.
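The heart of an RNN is one cell applied at every time step, feeding its own state back into itself. A sketch with invented weights; real inputs would be vectors representing words:

```python
import math

def rnn_step(x, h, wx=0.5, wh=0.8, b=0.0):
    """New state = nonlinearity(input contribution + previous-state contribution)."""
    return math.tanh(wx * x + wh * h + b)

h = 0.0  # initial state
for t, x in enumerate([0.1, -0.3, 0.7, 0.2, 0.5]):  # one number per time step
    h = rnn_step(x, h)
    print(f"t={t} state={h:+.3f}")  # the state summarizes the sequence so far
```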
Convolutional Neural Networks (ConvNets)

Appropriate for image tasks, but not limited to them. We'll cover them in detail in the next session.
Hyperparameters

Activations: a zoo of nonlinear functions.
Initializations: the distribution of the initial weights. Not all zeros.
Optimizers: driving the gradient descent.
Objectives: comparing a prediction to the truth.
Regularizers: forcing the function we learn to remain "simple".
...and many more.
Activations
Nonlinear functions
Sigmoid, tanh, hard tanh and softsign
ReLU (Rectified Linear Unit) and Leaky ReLU
Softplus and Exponential Linear Unit (ELU)
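Minimal definitions of the activations named above:

```python
import math

def sigmoid(x): return 1 / (1 + math.exp(-x))
def relu(x): return max(0.0, x)
def leaky_relu(x, a=0.01): return x if x > 0 else a * x
def softplus(x): return math.log(1 + math.exp(x))
def elu(x, a=1.0): return x if x > 0 else a * (math.exp(x) - 1)

print(sigmoid(0.0))              # 0.5
print(relu(-2.0), relu(2.0))     # 0.0 2.0
print(leaky_relu(-2.0))          # -0.02
print(round(softplus(0.0), 4))   # 0.6931
print(round(elu(-1.0), 4))       # -0.6321
```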
Softmax

Softmax turns raw scores into probabilities:

Raw scores: x1 = 6.78, x2 = 4.25, x3 = -0.16, x4 = -3.5
Probabilities: dog = 0.92536, cat = 0.07371, car = 0.00089, hat = 0.00003
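The mapping is exp then normalize; the slide's numbers follow (last digits may differ by rounding):

```python
import math

def softmax(scores):
    """Exponentiate each score, then normalize so the results sum to 1."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([6.78, 4.25, -0.16, -3.5])
for label, p in zip(["dog", "cat", "car", "hat"], probs):
    print(f"{label} = {p:.5f}")
```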
Optimizers
Various algorithms for driving Gradient Descent

Tricks to speed up SGD. Learn the learning rate.

Credit: Alec Radford
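One classic trick is momentum: accumulate a velocity so repeated gradients in the same direction build up speed. A sketch on a one-parameter loss; the coefficients are common defaults, not from the slide:

```python
# Minimize (w - 3)^2 with gradient descent plus momentum.
w, velocity = 0.0, 0.0
lr, momentum = 0.05, 0.9

for _ in range(300):
    grad = 2 * (w - 3)                          # gradient of the loss at w
    velocity = momentum * velocity - lr * grad  # accumulate the step direction
    w += velocity                               # step along the velocity

print(round(w, 3))  # 3.0
```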
Cost/Loss/Objective Functions
Cross-entropy Loss

The softmax output from before (raw scores x1 = 6.78, x2 = 4.25, x3 = -0.16, x4 = -3.5 become probabilities dog = 0.92536, cat = 0.07371, car = 0.00089, hat = 0.00003) is compared against the label (the truth). The loss is low when the true class gets a high probability.

If label is:
dog: loss = 0.078
cat: loss = 2.607
car: loss = 7.024
hat: loss = 10.414
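Each per-label loss is just the negative log of the probability given to the true class; small last-digit differences come from the rounded probabilities:

```python
import math

probs = {"dog": 0.92536, "cat": 0.07371, "car": 0.00089, "hat": 0.00003}

for label, p in probs.items():
    loss = -math.log(p)   # tiny when the model is confident and right
    print(f"{label}: loss = {loss:.3f}")
```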
Regularization
Preventing Overfitting
Overfitting

Three ways to fit the same data:
Very simple: might not predict well.
Just right?
Overfitting: rote memorization. Will not generalize well.
Regularization: Avoiding Overfitting

Idea: keep the functions "simple" by constraining the weights.

Loss = Error(prediction, truth) + L(keep solution simple)

L2 keeps all parameters medium-sized. L1 kills many parameters (drives them to exactly zero). L1+L2 is sometimes used.
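A sketch of the combined loss; the weight lambda on the penalty is illustrative:

```python
def l2_penalty(weights, lam=0.01):
    """L2 pushes every weight toward zero; none land exactly on zero."""
    return lam * sum(w * w for w in weights)

def l1_penalty(weights, lam=0.01):
    """L1 drives many weights to exactly zero, giving a sparse solution."""
    return lam * sum(abs(w) for w in weights)

def total_loss(error, weights):
    # Loss = Error(prediction, truth) + L(keep solution simple)
    return error + l2_penalty(weights)

print(round(total_loss(0.25, [3.0, -1.0, 0.5]), 4))  # 0.3525
```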
Regularization: Dropout

At every step during training, ignore the output of a randomly chosen fraction p of the neurons. p = 0.5 is a good default.
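A sketch of dropout at training time, using the common "inverted" variant that rescales survivors by 1/(1-p) so the expected activation is unchanged (the rescaling is a standard implementation detail, not on the slide):

```python
import random

def dropout(activations, p=0.5):
    """Zero each activation with probability p; scale the survivors up."""
    return [0.0 if random.random() < p else a / (1 - p) for a in activations]

random.seed(0)
out = dropout([0.2, 0.9, -0.4, 0.7], p=0.5)
print(out)  # each entry is either 0.0 or double its original value
```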
One Last Bit of Wisdom
Workshop: Keras & Addition RNN
https://github.com/holbertonschool/deep-learning/tree/master/Class%20%230