Backpropagation learning

MIT Department of Brain and Cognitive Sciences
9.641J, Spring 2005 - Introduction to Neural Networks
Instructor: Professor Sebastian Seung


Page 1:

Backpropagation learning

MIT Department of Brain and Cognitive Sciences
9.641J, Spring 2005 - Introduction to Neural Networks
Instructor: Professor Sebastian Seung

Page 2:

Simple vs. multilayer perceptron

Page 3:

Hidden layer problem

• Radical change for the supervised learning problem.

• No desired values for the hidden layer.

• The network must find its own hidden layer activations.

Page 4:

Generalized delta rule

• Delta rule only works for the output layer.

• Backpropagation, or the generalized delta rule, is a way of creating desired values for the hidden layers.

Page 5:

Outline

• The algorithm

• Derivation as a gradient algorithm

• Sensitivity lemma

Page 6:

Multilayer perceptron

• L layers of weights and biases

• L+1 layers of neurons

$$\mathbf{x}^0 \xrightarrow{W^1,\,b^1} \mathbf{x}^1 \xrightarrow{W^2,\,b^2} \cdots \xrightarrow{W^L,\,b^L} \mathbf{x}^L$$

$$x_i^l = f\!\left( \sum_{j=1}^{n_{l-1}} W_{ij}^l\, x_j^{l-1} + b_i^l \right)$$
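As a concrete illustration, here is a minimal NumPy sketch (not from the lecture) of a single layer of this map, with tanh standing in for the nonlinearity f; the function name and argument names are illustrative.

```python
# Sketch of one layer's map: x^l = f(W^l x^{l-1} + b^l), with f = tanh.
import numpy as np

def layer(W, b, x_prev, f=np.tanh):
    """Map the previous layer's activity x^{l-1} to x^l."""
    return f(W @ x_prev + b)
```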

Page 7:

Reward function

• Depends on activity of the output layer only.

• Maximize reward with respect to weights and biases.

$$R(\mathbf{x}^L)$$

Page 8:

Example: squared error

• Square of desired minus actual output, with minus sign.

$$R(\mathbf{x}^L) = -\frac{1}{2} \sum_{i=1}^{n_L} \left( d_i - x_i^L \right)^2$$
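A minimal sketch of this reward in NumPy; the argument names d (desired output) and x_L (actual output) are illustrative.

```python
# Squared-error reward: R(x^L) = -1/2 * sum_i (d_i - x_i^L)^2.
import numpy as np

def reward(d, x_L):
    return -0.5 * np.sum((d - x_L) ** 2)
```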

Page 9:

Forward pass

For $l = 1$ to $L$:

$$u_i^l = \sum_{j=1}^{n_{l-1}} W_{ij}^l\, x_j^{l-1} + b_i^l, \qquad x_i^l = f(u_i^l)$$
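A minimal sketch of the forward pass, assuming the weights and biases are given as lists Ws = [W^1, ..., W^L] and bs = [b^1, ..., b^L] of NumPy arrays, with tanh standing in for f.

```python
import numpy as np

def forward(Ws, bs, x0, f=np.tanh):
    xs, us = [x0], []                 # xs[l] holds x^l, us[l-1] holds u^l
    for W, b in zip(Ws, bs):
        u = W @ xs[-1] + b            # u^l_i = sum_j W^l_ij x^{l-1}_j + b^l_i
        us.append(u)
        xs.append(f(u))               # x^l = f(u^l)
    return xs, us
```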

Page 10:

Sensitivity computation

• The sensitivity is also called “delta.”

$$s_i^L = f'(u_i^L)\, \frac{\partial R}{\partial x_i^L} = f'(u_i^L)\left( d_i - x_i^L \right)$$
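A minimal sketch of this output-layer sensitivity for the squared-error reward, with tanh as f (so f'(u) = 1 - tanh(u)^2); the names are illustrative.

```python
import numpy as np

def output_sensitivity(u_L, x_L, d):
    fprime = 1.0 - np.tanh(u_L) ** 2      # f'(u^L) for f = tanh
    return fprime * (d - x_L)             # s^L_i = f'(u^L_i)(d_i - x^L_i)
```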

Page 11:

Backward pass

For $l = L$ down to $2$:

$$s_j^{l-1} = f'(u_j^{l-1}) \sum_{i=1}^{n_l} s_i^l\, W_{ij}^l$$
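A minimal sketch of the backward pass, reusing the conventions of the forward-pass sketch above (Ws[l-1] = W^l, us[l-1] = u^l, tanh as f).

```python
import numpy as np

def backward(Ws, us, s_L):
    ss = [s_L]                                     # will end up as [s^1, ..., s^L]
    for l in range(len(Ws), 1, -1):                # l = L, ..., 2
        fprime = 1.0 - np.tanh(us[l - 2]) ** 2     # f'(u^{l-1})
        # s^{l-1}_j = f'(u^{l-1}_j) * sum_i s^l_i W^l_ij
        ss.insert(0, fprime * (Ws[l - 1].T @ ss[0]))
    return ss
```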

Page 12:

Learning update

• The updates may be applied in any order.

$$\Delta W_{ij}^l = \eta\, s_i^l\, x_j^{l-1}, \qquad \Delta b_i^l = \eta\, s_i^l$$
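A minimal sketch of these updates applied in place, using xs and ss from the forward- and backward-pass sketches above; eta is the learning rate.

```python
import numpy as np

def update(Ws, bs, xs, ss, eta):
    for l in range(len(Ws)):                      # Ws[l] = W^{l+1}, ss[l] = s^{l+1}, xs[l] = x^l
        Ws[l] += eta * np.outer(ss[l], xs[l])     # dW^{l+1}_ij = eta * s^{l+1}_i * x^l_j
        bs[l] += eta * ss[l]                      # db^{l+1}_i  = eta * s^{l+1}_i
```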

Page 13:

Backprop is a gradient update

• Consider R as a function of the weights and biases.

$$\frac{\partial R}{\partial W_{ij}^l} = s_i^l\, x_j^{l-1}, \qquad \frac{\partial R}{\partial b_i^l} = s_i^l$$

$$\Delta W_{ij}^l = \eta\, \frac{\partial R}{\partial W_{ij}^l}, \qquad \Delta b_i^l = \eta\, \frac{\partial R}{\partial b_i^l}$$
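One way to see that the backprop quantities really are the gradient of R is a finite-difference check. The sketch below reuses the forward, reward, output_sensitivity, and backward sketches above; the layer sizes, random seed, and probed indices are arbitrary choices.

```python
# Check that dR/dW^l_ij computed by a centered difference matches s^l_i x^{l-1}_j.
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 2]                                  # n_0, n_1, n_2 (arbitrary)
Ws = [rng.standard_normal((sizes[l + 1], sizes[l])) for l in range(2)]
bs = [rng.standard_normal(sizes[l + 1]) for l in range(2)]
x0 = rng.standard_normal(sizes[0])
d = rng.standard_normal(sizes[-1])

def run(Ws, bs):
    xs, us = forward(Ws, bs, x0)
    return xs, us, reward(d, xs[-1])

xs, us, _ = run(Ws, bs)
ss = backward(Ws, us, output_sensitivity(us[-1], xs[-1], d))

eps, l, i, j = 1e-6, 1, 0, 2                       # probe one entry of W^2
Ws[l][i, j] += eps
_, _, Rp = run(Ws, bs)
Ws[l][i, j] -= 2 * eps
_, _, Rm = run(Ws, bs)
Ws[l][i, j] += eps
print((Rp - Rm) / (2 * eps), ss[l][i] * xs[l][j])  # the two numbers should agree
```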

Page 14:

Sensitivity lemma

• Sensitivity matrix = outer product of
  – the sensitivity vector
  – the activity vector

• The sensitivity vector is sufficient.

• Generalization of “delta.”

$$\frac{\partial R}{\partial W_{ij}^l} = \frac{\partial R}{\partial b_i^l}\, x_j^{l-1}$$
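The lemma can also be checked numerically: perturbing $W_{ij}^l$ affects R only through $u_i^l$, exactly as a perturbation of $b_i^l$ scaled by $x_j^{l-1}$ does. The sketch below reuses the network and the run helper from the gradient-check sketch above; the probed indices are arbitrary.

```python
# Finite-difference check of dR/dW^l_ij = (dR/db^l_i) * x^{l-1}_j.
eps, l, i, j = 1e-6, 0, 1, 2                       # probe W^1 / b^1

bs[l][i] += eps
_, _, Rp = run(Ws, bs)
bs[l][i] -= 2 * eps
_, _, Rm = run(Ws, bs)
bs[l][i] += eps
dR_db = (Rp - Rm) / (2 * eps)                      # ~ dR/db^1_i

Ws[l][i, j] += eps
_, _, Rp = run(Ws, bs)
Ws[l][i, j] -= 2 * eps
_, _, Rm = run(Ws, bs)
Ws[l][i, j] += eps
dR_dW = (Rp - Rm) / (2 * eps)                      # ~ dR/dW^1_ij

print(dR_dW, dR_db * xs[l][j])                     # should agree (sensitivity lemma)
```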

Page 15:

Coordinate transformation

$$u_i^l = \sum_{j=1}^{n_{l-1}} W_{ij}^l\, f(u_j^{l-1}) + b_i^l$$

$$\frac{\partial u_i^l}{\partial u_j^{l-1}} = W_{ij}^l\, f'(u_j^{l-1})$$

$$\frac{\partial R}{\partial u_i^l} = \frac{\partial R}{\partial b_i^l}$$

Page 16:

Output layer

$$x_i^L = f(u_i^L), \qquad u_i^L = \sum_j W_{ij}^L\, x_j^{L-1} + b_i^L$$

$$\frac{\partial R}{\partial b_i^L} = f'(u_i^L)\, \frac{\partial R}{\partial x_i^L}$$

Page 17:

Chain rule

• Composition of two functions: $\mathbf{u}^{l-1} \to \mathbf{u}^l \to R$, i.e. $\mathbf{u}^{l-1} \to R$.

$$\frac{\partial R}{\partial u_j^{l-1}} = \sum_i \frac{\partial R}{\partial u_i^l}\, \frac{\partial u_i^l}{\partial u_j^{l-1}}$$

$$\frac{\partial R}{\partial b_j^{l-1}} = \sum_i \frac{\partial R}{\partial b_i^l}\, W_{ij}^l\, f'(u_j^{l-1})$$

Page 18:

Computational complexity

• Naïve estimate
  – network output: order N
  – each component of the gradient: order N
  – N components: order N²

• With backprop: order N

Page 19:

Biological plausibility

• Local: pre- and postsynaptic variables

• Forward and backward passes use same weights

• Extra set of variables

$$x_j^{l-1} \xrightarrow{\;W_{ij}^l\;} x_i^l, \qquad s_j^{l-1} \xleftarrow{\;W_{ij}^l\;} s_i^l$$

Page 20:

Backprop for brain modeling

• Backprop may not be a plausible account of learning in the brain.

• But perhaps the networks created by it are similar to biological neural networks.

• Zipser and Andersen:
  – train the network
  – compare hidden neurons with those found in the brain

Page 21:

LeNet

• Weight-sharing

• Sigmoidal neurons

• Learn binary outputs

Page 22:

Machine learning revolution

• Gradient following
  – or other hill-climbing methods

• Empirical error minimization