CSE446: Neural Networks
Spring 2017
Many slides are adapted from Carlos Guestrin and Luke Zettlemoyer
Human Neurons
• Switching time
– ~0.001 second
• Number of neurons
– 10^10
• Connections per neuron
– 10^4 to 10^5
• Scene recognition time
– 0.1 seconds
• Number of cycles per scene recognition?
– 100 → much parallel computation!
Perceptron as a Neural Network
This is one neuron:
– Input edges x1 ... xn, along with bias
– The sum is represented graphically
– Sum passed through an activation function g
Sigmoid Neuron
[Figure: plot of the sigmoid activation g; horizontal axis from -6 to 6, vertical axis from 0 to 1]
Just change g!
• Why would we want to do this?
• Notice new output range [0,1]. What was it before?
• Look familiar?
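A minimal sketch of "just change g": the same weighted sum is passed through either a hard threshold or a sigmoid. The sign-style threshold with outputs in {-1, +1} is an assumption about the earlier perceptron, and the weights and input are hypothetical values chosen only for illustration.

```python
import math

def step(z):
    """Perceptron-style activation (assumed): sign of the sum, in {-1, +1}."""
    return 1.0 if z > 0 else -1.0

def sigmoid(z):
    """Sigmoid activation: smooth, with outputs strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, w0, g):
    """One neuron: weighted sum of the inputs plus bias w0, passed through g."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return g(z)

# Hypothetical input and weights.
x, w, w0 = [1.0, 0.0], [2.0, -1.0], -0.5
print(neuron(x, w, w0, step))     # hard decision
print(neuron(x, w, w0, sigmoid))  # graded output between 0 and 1
```

Only the function passed as `g` differs between the two calls; the rest of the neuron is unchanged.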
Optimizing a neuron
We train to minimize sum-squared error
Solution just depends on g’: derivative of activation function!
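The equations on this slide did not survive extraction. A standard reconstruction, assuming a single neuron with output g(w0 + Σi wi xi), training examples indexed by j, a conventional factor of 1/2 in the loss, and a step size η (all notation here is assumed, not taken from the slide):

```latex
\begin{align}
\ell(W) &= \frac{1}{2} \sum_j \Big[ y^j - g\Big(w_0 + \sum_i w_i x_i^j\Big) \Big]^2 \\
\frac{\partial \ell}{\partial w_i} &= -\sum_j \Big[ y^j - g\Big(w_0 + \sum_i w_i x_i^j\Big) \Big]\,
    g'\Big(w_0 + \sum_i w_i x_i^j\Big)\, x_i^j \\
w_i &\leftarrow w_i + \eta \sum_j \Big[ y^j - g\Big(w_0 + \sum_i w_i x_i^j\Big) \Big]\,
    g'\Big(w_0 + \sum_i w_i x_i^j\Big)\, x_i^j
\end{align}
```

The only place the choice of activation enters the update is the factor g', which is why the solution depends only on the derivative of g.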
Re-deriving the perceptron update
For a specific, incorrectly classified example:
• w = w + y*x (our familiar update!)
Sigmoid units: have to differentiate g
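For the sigmoid, the derivative has the well-known closed form g'(z) = g(z)(1 - g(z)). The sketch below compares that closed form against a finite-difference estimate; the helper names and the test points are assumptions made for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    """Closed form: g'(z) = g(z) * (1 - g(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)

def numeric_derivative(f, z, eps=1e-6):
    """Central finite difference, used only as a sanity check."""
    return (f(z + eps) - f(z - eps)) / (2.0 * eps)

for z in (-2.0, 0.0, 3.0):
    print(z, sigmoid_prime(z), numeric_derivative(sigmoid, z))
```

The two columns should agree to many decimal places, and the derivative is largest at z = 0, where it equals 0.25.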
Aside: Comparison to logistic regression
• P(Y|X) represented by:
• Learning rule – MLE:
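The two formulas for this aside are missing from the transcript. The standard logistic regression forms are given here as a reconstruction, assuming labels y in {0, 1}, examples indexed by j, and step size η (the notation is assumed):

```latex
\begin{align}
P(Y = 1 \mid x, W) &= \frac{1}{1 + \exp\!\big(-(w_0 + \sum_i w_i x_i)\big)}
    = g\Big(w_0 + \sum_i w_i x_i\Big) \\
w_i &\leftarrow w_i + \eta \sum_j x_i^j \,\big[ y^j - P(Y = 1 \mid x^j, W) \big]
\end{align}
```

Compared with a sigmoid neuron trained on sum-squared error, this maximum-likelihood update has the same error-times-input shape but no extra g' factor.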
[Figure: plot of the logistic function; horizontal axis from -6 to 6, vertical axis from 0 to 1]
Perceptron, linear classification, Boolean functions: xi ∈ {0,1}
• Can learn x1 ∨ x2?
– -0.5 + x1 + x2
• Can learn x1 ∧ x2?
– -1.5 + x1 + x2
• Can learn any conjunction or disjunction?
– Disjunction: -0.5 + x1 + … + xn
– Conjunction: (-n+0.5) + x1 + … + xn
• Can learn majority?
– (-0.5*n) + x1 + … + xn
• What are we missing? The dreaded XOR!, etc.
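These threshold choices can be checked by brute force. The sketch assumes a unit that outputs true when bias + x1 + … + xn > 0, uses a bias of -0.5 for the n-input disjunction (consistent with the two-input case), and reads "majority" as strictly more than half; `fires` is a hypothetical helper name.

```python
from itertools import product

def fires(bias, x):
    """Linear threshold unit with all input weights 1: true when bias + sum(x) > 0."""
    return bias + sum(x) > 0

n = 4
for x in product([0, 1], repeat=n):
    assert fires(-0.5, x) == any(x)                # disjunction
    assert fires(-n + 0.5, x) == all(x)            # conjunction
    assert fires(-0.5 * n, x) == (sum(x) > n / 2)  # strict majority
print("all", 2 ** n, "inputs agree")
```

No single choice of bias and weights passes the same kind of check for XOR, which is what motivates adding a hidden layer.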
Going beyond linear classification
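Solving the XOR problem
y = x1 XOR x2
= (x1 ∧ ¬x2) ∨ (x2 ∧ ¬x1)
v1 = (x1 ∧ ¬x2) = -1.5 + 2x1 - x2
v2 = (x2 ∧ ¬x1) = -1.5 + 2x2 - x1
y = v1 ∨ v2 = -0.5 + v1 + v2
[Figure: network with inputs x1, x2 and a constant 1, hidden units v1 and v2, and output y; edge weights match the expressions above]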
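The construction can be verified directly, assuming each unit outputs 1 when its weighted sum is positive and 0 otherwise (a threshold activation; the function names are hypothetical):

```python
def step(z):
    """Threshold activation: 1 if the weighted sum is positive, else 0."""
    return 1 if z > 0 else 0

def xor_net(x1, x2):
    v1 = step(-1.5 + 2 * x1 - x2)  # x1 AND NOT x2
    v2 = step(-1.5 + 2 * x2 - x1)  # x2 AND NOT x1
    return step(-0.5 + v1 + v2)    # v1 OR v2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, xor_net(x1, x2))
```

Each hidden unit is itself a linear classifier; only their combination produces a decision that is not linearly separable in the original inputs.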
Hidden layer
• Single unit:
• 1-hidden layer:
• No longer convex function!
©Carlos Guestrin 2005-2009
Example data for NN with hidden layer
Learned weights for hidden layer
Forward propagation
1-hidden layer:
Compute values left to right
1. Inputs: x1, …, xn
2. Hidden: v1, …, vn
3. Output: y
[Figure: network with inputs x1, x2 and a constant 1, hidden units v1, v2 and a constant 1, and output y]
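The three steps can be sketched in plain Python, assuming sigmoid units throughout and a bias term on every hidden and output unit. The function name, argument layout, and weight values are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, hidden_w, hidden_b, out_w, out_b):
    """Forward pass for one hidden layer; returns (hidden values, output)."""
    # Step 2: each hidden unit sees every input plus its own bias.
    v = [sigmoid(b + sum(w * xi for w, xi in zip(ws, x)))
         for ws, b in zip(hidden_w, hidden_b)]
    # Step 3: the output unit sees every hidden value plus its bias.
    y = sigmoid(out_b + sum(w * vk for w, vk in zip(out_w, v)))
    return v, y

# Hypothetical weights, chosen only for illustration.
hidden_w = [[2.0, -1.0], [-1.0, 2.0]]
hidden_b = [-1.5, -1.5]
out_w, out_b = [1.0, 1.0], -0.5

v, y = forward([1.0, 0.0], hidden_w, hidden_b, out_w, out_b)
print(v, y)
```

The hidden values are returned along with the output because the gradient computation needs them later.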
Gradient descent for 1-hidden layer
Dropped w0 to make derivation simpler
Gradient for last layer same as the single node case, but with hidden nodes v as input!
Gradient descent for 1-hidden layer
Dropped w0 to make derivation simpler
For hidden layer, two parts:
• Normal update for single neuron
• Recursive computation of gradient on output layer
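The derivations for the output and hidden layers are missing from the transcript. A reconstruction under stated assumptions: squared-error loss with a factor of 1/2, no bias terms (w0 dropped, as noted above), output weights w_k, hidden weights w_i^k from input i to hidden unit k, and examples indexed by j. The notation is assumed rather than taken from the slides.

```latex
\begin{align}
v_k^j &= g\Big(\sum_i w_i^k x_i^j\Big), \qquad
\mathrm{out}(x^j) = g\Big(\sum_k w_k v_k^j\Big) \\
\ell(W) &= \frac{1}{2} \sum_j \big[ y^j - \mathrm{out}(x^j) \big]^2 \\
\frac{\partial \ell}{\partial w_k} &= -\sum_j \big[ y^j - \mathrm{out}(x^j) \big]\,
    g'\Big(\sum_{k'} w_{k'} v_{k'}^j\Big)\, v_k^j \\
\frac{\partial \ell}{\partial w_i^k} &= -\sum_j \big[ y^j - \mathrm{out}(x^j) \big]\,
    g'\Big(\sum_{k'} w_{k'} v_{k'}^j\Big)\, w_k\,
    g'\Big(\sum_{i'} w_{i'}^k x_{i'}^j\Big)\, x_i^j
\end{align}
```

The third line is the single-neuron gradient with v in place of x. The fourth reuses the same output-layer factor, multiplied by w_k and by the hidden unit's own single-neuron term.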
Multilayer neural networks
Inference and Learning:
• Forward pass: left to right, each hidden layer in turn
• Gradient computation: right to left, propagating gradient for each node
[Figure: multilayer network, labeled "Forward" and "Gradient"]
Forward propagation – prediction
• Recursive algorithm
• Start from input layer
• Output of node Vk with parents U1,U2,…:
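The formula for the node output is missing from the transcript. The usual form, assuming w_ik denotes the weight on the edge from parent U_i to node V_k and g is the activation function:

```latex
V_k = g\Big(\sum_i w_{ik}\, U_i\Big)
```

Each U_i is either an input or the already-computed output of an earlier node, which is what makes the computation recursive.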
Back-propagation – learning
• Just gradient descent!!!
• Recursive algorithm for computing gradient
• For each example
– Perform forward propagation
– Start from output layer
• Compute gradient of node Vk with parents U1,U2,…
• Update weight wik
• Repeat (move to preceding layer)
Back-propagation – pseudocode
Initialize all weights to small random numbers
• Until convergence, do:
– For each training example x,y:
1. Forward propagation, compute node values Vk
2. For each output unit o (with labeled output y):
δo = Vo(1-Vo)(y-Vo)
3. For each hidden unit h:
δh = Vh(1-Vh) Σk in output(h) wh,k δk
4. Update each network weight wi,j from node i to node j:
wi,j = wi,j + η δj xi,j
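A minimal sketch of steps 1 to 4 for a network with one hidden layer and a single sigmoid output, with bias terms omitted. The function names, starting weights, training example, and learning rate are all hypothetical, and the loop runs a fixed number of updates on one example rather than testing for convergence.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W_hid, w_out):
    """Step 1: compute hidden values V_h and the output value V_o."""
    v_hid = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W_hid]
    v_out = sigmoid(sum(w * v for w, v in zip(w_out, v_hid)))
    return v_hid, v_out

def backprop_step(x, y, W_hid, w_out, eta):
    """One update for a single example; returns new weights without mutating the old."""
    v_hid, v_out = forward(x, W_hid, w_out)
    # Step 2: delta for the (single) output unit.
    d_out = v_out * (1.0 - v_out) * (y - v_out)
    # Step 3: delta for each hidden unit, using the pre-update output weights.
    d_hid = [v * (1.0 - v) * w * d_out for v, w in zip(v_hid, w_out)]
    # Step 4: w_ij += eta * delta_j * (value flowing along edge i -> j).
    new_w_out = [w + eta * d_out * v for w, v in zip(w_out, v_hid)]
    new_W_hid = [[w + eta * d * xi for w, xi in zip(row, x)]
                 for row, d in zip(W_hid, d_hid)]
    return new_W_hid, new_w_out

# Hypothetical starting weights and training example.
W_hid = [[0.1, -0.2], [0.3, 0.4]]
w_out = [0.5, -0.6]
x, y = [1.0, 0.0], 1.0

before = forward(x, W_hid, w_out)[1]
for _ in range(1000):
    W_hid, w_out = backprop_step(x, y, W_hid, w_out, eta=0.5)
after = forward(x, W_hid, w_out)[1]
print(before, after)  # the output should move toward the label y
```

With squared error E = (y - Vo)^2 / 2, each update equals -η times the partial derivative of E with respect to that weight, so the procedure is plain gradient descent.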
Convergence of backprop
• Perceptron leads to convex optimization
– Gradient descent reaches global minima
• Multilayer neural nets not convex
– Gradient descent can get stuck in local minima
– Selecting number of hidden units and layers = fuzzy process
– NNs have made a HUGE comeback in the last few years!!!
• Neural nets are back with a new name!!!!
– Deep belief networks
– Huge error reduction when trained with lots of data on GPUs
Overfitting in NNs
• Are NNs likely to overfit?
– Yes, they can represent arbitrary functions!!!
• Avoiding overfitting?
– More training data
– Fewer hidden nodes / better topology
– Regularization
– Early stopping
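Of these remedies, early stopping is the simplest to sketch: track error on held-out data after each epoch and keep the weights from the best epoch. The example below assumes a "patience" rule (stop after a fixed number of non-improving epochs), which is one common variant rather than anything prescribed here. The function name and the loss values are hypothetical.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Scans validation losses in order and stops once `patience`
    consecutive epochs fail to improve on the best loss seen so far.
    """
    best_epoch, best_loss, waited = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, waited = epoch, loss, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch

# Hypothetical validation curve: improves, then starts to overfit.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.45]
print(early_stopping_epoch(losses))
```

With the default patience, the scan stops before reaching the last value, so a later improvement is never seen; a larger patience trades extra training time for a chance to find it.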
Image Models
Object Recognition
Slides from Jeff Dean at Google
Number Detection
What are these numbers?
Acoustic Modeling for Speech Recognition
Trained in <5 days on cluster of 800 machines
30% reduction in Word Error Rate for English! (“biggest single improvement in 20 years of speech research”)
Launched in 2012 at time of Jellybean release of Android
Close collaboration with Google Speech team
2012-era Convolutional Model for Object Recognition
[Figure: network from Input through Layer 1 ... Layer 7; convolutional layers (same weights used at all spatial locations in layer), fully-connected layers, and a softmax to predict object class]
Convolutional networks developed by Yann LeCun (NYU)
Basic architecture developed by Krizhevsky, Sutskever & Hinton (all now at Google).
Won 2012 ImageNet challenge with 16.4% top-5 error rate
2014-era Model for Object Recognition
24 layers deep!
Module with 6 separate convolutional layers
Developed by team of Google Researchers:
Won 2014 ImageNet challenge with 6.66% top-5 error rate
Good Fine-grained Classification
“hibiscus” “dahlia”
Good Generalization
Both recognized as a “meal”
Sensible Errors
“snake” “dog”
Works in practice for real users.
Object Detection
YOLO
DEMO
What you need to know about neural networks
• Perceptron:
– Relationship to general neurons
• Multilayer neural nets
– Representation
– Derivation of backprop
– Learning rule
• Overfitting