
Ming-Feng Yeh 1

CHAPTER 11

Back-Propagation

Ming-Feng Yeh 2

Objectives

A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks.

Backpropagation is an approximate steepest descent algorithm, in which the performance index is mean square error.

In order to calculate the derivatives, we need to use the chain rule of calculus.

Ming-Feng Yeh 3

Motivation

The perceptron learning rule and the LMS algorithm were designed to train single-layer perceptron-like networks. They are only able to solve linearly separable classification problems.

Parallel Distributed Processing: the backpropagation algorithm, a generalization of LMS to multilayer networks, was popularized by the PDP research group.

The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network.

Ming-Feng Yeh 4

Three-Layer Network

Number of neurons in each layer: $R$ inputs and $S^1$, $S^2$, $S^3$ neurons in layers 1, 2, 3 (an $R - S^1 - S^2 - S^3$ network).

(Figure: three-layer feedforward network diagram.)

Ming-Feng Yeh 5

Pattern Classification: XOR Gate

The limitations of the single-layer perceptron (Minsky & Papert, 1969)

$$\left\{ p_1 = \begin{bmatrix} 0 \\ 0 \end{bmatrix},\; t_1 = 0 \right\},\quad
\left\{ p_2 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\; t_2 = 0 \right\},\quad
\left\{ p_3 = \begin{bmatrix} 1 \\ 0 \end{bmatrix},\; t_3 = 1 \right\},\quad
\left\{ p_4 = \begin{bmatrix} 0 \\ 1 \end{bmatrix},\; t_4 = 1 \right\}$$

(Figure: the four input vectors $p_1, \dots, p_4$ plotted in the input plane; no single straight decision boundary separates the two classes.)

Ming-Feng Yeh 6

Two-Layer XOR Network

Two-layer, 2-2-1 network: each first-layer neuron forms its own decision boundary (individual decisions), and the second-layer neuron combines the two first-layer outputs with an AND operation.

(Figure: 2-2-1 network diagram with its weights and biases, and the two individual decision boundaries.)

Ming-Feng Yeh 7

Solved Problem P11.1

Design a multilayer network to distinguish these categories.

The four prototype vectors $p_1, p_2, p_3, p_4$ (four-element vectors with entries $\pm 1$) are divided into Class I and Class II.

A single-layer solution would require a weight matrix $W$ and bias $b$ with $Wp_q + b \ge 0$ for the Class I vectors and $Wp_q + b < 0$ for the Class II vectors, but the four inequalities cannot all be satisfied: there is no hyperplane that can separate these two categories.

Ming-Feng Yeh 8

Solution of Problem P11.1

A two-layer 4-2-1 network solves the problem: each first-layer neuron makes an AND-type decision over the four input components $p_1, p_2, p_3, p_4$, and the single second-layer neuron combines the two first-layer outputs with an OR operation.

(Figure: 4-2-1 network diagram showing the weights, biases, and the AND/OR roles of the neurons.)

Ming-Feng Yeh 9

Function Approximation

Two-layer, 1-2-1 network, with transfer functions
$$f^1(n) = \frac{1}{1 + e^{-n}} \;(\text{log-sigmoid}), \qquad f^2(n) = n \;(\text{linear}).$$

Nominal parameter values:
$$w^1_{1,1} = 10, \quad w^1_{2,1} = 10, \quad b^1_1 = -10, \quad b^1_2 = 10, \qquad w^2_{1,1} = 1, \quad w^2_{1,2} = 1, \quad b^2 = 0.$$
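To see the shape of this nominal response, here is a minimal numpy sketch (not part of the slides; the helper names and the sign convention for $b^1$ are my own assumptions) that evaluates $a^2 = W^2 \operatorname{logsig}(W^1 p + b^1) + b^2$ over the plotted input range:

```python
import numpy as np

# Nominal 1-2-1 network from the slide above (assumed b1 = [-10; 10]).
W1 = np.array([[10.0], [10.0]])    # first-layer weights (2x1)
b1 = np.array([[-10.0], [10.0]])   # first-layer biases  (2x1)
W2 = np.array([[1.0, 1.0]])        # second-layer weights (1x2)
b2 = np.array([[0.0]])             # second-layer bias    (1x1)

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

def response(p):
    """Network output a2 for a scalar input p."""
    a1 = logsig(W1 * p + b1)        # hidden-layer outputs
    return (W2 @ a1 + b2).item()    # linear output layer

# Sample the response on [-2, 2]; the two "steps" sit near p = -1 and p = 1.
for p in np.linspace(-2.0, 2.0, 9):
    print(f"p = {p:5.2f}   a2 = {response(p):.3f}")
```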

Ming-Feng Yeh 10

Function Approximation

The centers of the steps occur where the net input to a neuron in the first layer is zero.

The steepness of each step can be adjusted by changing the network weights.

$$n^1_1 = w^1_{1,1}\, p + b^1_1 = 0 \;\Rightarrow\; p = -\frac{b^1_1}{w^1_{1,1}} = -\frac{-10}{10} = 1,$$
$$n^1_2 = w^1_{2,1}\, p + b^1_2 = 0 \;\Rightarrow\; p = -\frac{b^1_2}{w^1_{2,1}} = -\frac{10}{10} = -1.$$

Ming-Feng Yeh 11

Effect of Parameter Changes

(Plot: network response $a^2$ versus $p \in [-2, 2]$ as the first-layer bias $b^1_2$ is varied over 0, 5, 10, 15, 20.)

Ming-Feng Yeh 12

Effect of Parameter Changes

(Plot: network response as the second-layer weight $w^2_{1,1}$ is varied over $-1.0, -0.5, 0.0, 0.5, 1.0$.)

Ming-Feng Yeh 13

Effect of Parameter Changes

(Plot: network response as the other second-layer weight, $w^2_{1,2}$, is varied over $-1.0, -0.5, 0.0, 0.5, 1.0$.)

Ming-Feng Yeh 14

Effect of Parameter Changes

(Plot: network response as the second-layer bias $b^2$ is varied over $-1.0, -0.5, 0.0, 0.5, 1.0$.)

Ming-Feng Yeh 15

Function Approximation

Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.

Ming-Feng Yeh 16

Backpropagation Algorithm

For multilayer networks, the outputs of one layer become the inputs to the following layer.

$$a^0 = p, \qquad a^{m+1} = f^{m+1}\!\left( W^{m+1} a^m + b^{m+1} \right), \quad m = 0, 1, \dots, M-1, \qquad a = a^M.$$
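As a minimal sketch (the helper names are my own, not from the slides), this forward pass is just a loop over the layers:

```python
import numpy as np

def forward(p, weights, biases, transfer_fns):
    """Propagate an input through an M-layer network:
    a^0 = p, a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}), output a = a^M."""
    a = p
    activations = [a]                      # keep every a^m; needed later for the updates
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
        activations.append(a)
    return activations                     # activations[-1] is the network output
```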

Ming-Feng Yeh 17

Performance Index

Training set: $\{p_1, t_1\},\; \{p_2, t_2\},\; \dots,\; \{p_Q, t_Q\}$

Mean square error: $F(x) = E[e^2] = E[(t - a)^2]$

Vector case: $F(x) = E[e^T e] = E[(t - a)^T (t - a)]$

Approximate mean square error (single sample): $\hat{F}(x) = (t(k) - a(k))^T (t(k) - a(k)) = e^T(k)\, e(k)$

Approximate steepest descent algorithm:
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, \frac{\partial \hat{F}}{\partial w^m_{i,j}}, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\, \frac{\partial \hat{F}}{\partial b^m_i}.$$

Ming-Feng Yeh 18

Chain Rule

If $f(n) = e^n$ and $n = 2w$, so that $f(n(w)) = e^{2w}$, then
$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \times \frac{d n(w)}{dw} = (e^n)(2).$$

Approximate mean square error:
$$\hat{F}(x) = (t(k) - a(k))^T (t(k) - a(k)) = e^T(k)\, e(k).$$

Applying the chain rule to the steepest descent updates:
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, \frac{\partial \hat{F}}{\partial n^m_i}\, \frac{\partial n^m_i}{\partial w^m_{i,j}}, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\, \frac{\partial \hat{F}}{\partial n^m_i}\, \frac{\partial n^m_i}{\partial b^m_i}.$$
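As a quick sanity check of the chain-rule example (my own sketch, not from the slides), the analytic derivative $2e^{2w}$ can be compared against a finite difference:

```python
import math

w, h = 0.3, 1e-6                             # arbitrary evaluation point and step size
f = lambda w: math.exp(2.0 * w)              # f(n(w)) with f(n) = e^n and n = 2w
analytic = 2.0 * math.exp(2.0 * w)           # chain rule: (df/dn)(dn/dw) = e^n * 2
numeric = (f(w + h) - f(w - h)) / (2.0 * h)  # central-difference approximation
print(analytic, numeric)                     # the two values agree to many digits
```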

Ming-Feng Yeh 19

Sensitivity & Gradient

The net input to the $i$th neuron of layer $m$:
$$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j}\, a^{m-1}_j + b^m_i, \qquad \frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j, \qquad \frac{\partial n^m_i}{\partial b^m_i} = 1.$$

The sensitivity of $\hat{F}$ to changes in the $i$th element of the net input at layer $m$:
$$s^m_i \equiv \frac{\partial \hat{F}}{\partial n^m_i}.$$

Gradient:
$$\frac{\partial \hat{F}}{\partial w^m_{i,j}} = \frac{\partial \hat{F}}{\partial n^m_i}\, \frac{\partial n^m_i}{\partial w^m_{i,j}} = s^m_i\, a^{m-1}_j, \qquad \frac{\partial \hat{F}}{\partial b^m_i} = \frac{\partial \hat{F}}{\partial n^m_i}\, \frac{\partial n^m_i}{\partial b^m_i} = s^m_i.$$

Ming-Feng Yeh 20

Steepest Descent Algorithm

The steepest descent algorithm for the approximate mean square error:

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\, s^m_i\, a^{m-1}_j, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\, s^m_i.$$

Matrix form:
$$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m,$$
where the sensitivity vector is
$$s^m \equiv \frac{\partial \hat{F}}{\partial n^m} = \begin{bmatrix} \dfrac{\partial \hat{F}}{\partial n^m_1} \\[4pt] \dfrac{\partial \hat{F}}{\partial n^m_2} \\ \vdots \\ \dfrac{\partial \hat{F}}{\partial n^m_{S^m}} \end{bmatrix}.$$

Ming-Feng Yeh 21

BP the Sensitivity

Backpropagation: a recurrence relationship in which the sensitivity at layer m is computed from the sensitivity at layer m+1. Jacobian matrix:

$$\frac{\partial n^{m+1}}{\partial n^m} =
\begin{bmatrix}
\dfrac{\partial n^{m+1}_1}{\partial n^m_1} & \dfrac{\partial n^{m+1}_1}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_1}{\partial n^m_{S^m}} \\[6pt]
\dfrac{\partial n^{m+1}_2}{\partial n^m_1} & \dfrac{\partial n^{m+1}_2}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_2}{\partial n^m_{S^m}} \\
\vdots & \vdots & & \vdots \\
\dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}}
\end{bmatrix}.$$

Ming-Feng Yeh 22

Matrix Representation

The $(i, j)$ element of the Jacobian matrix:
$$\frac{\partial n^{m+1}_i}{\partial n^m_j}
= \frac{\partial \left( \sum_{l=1}^{S^m} w^{m+1}_{i,l}\, a^m_l + b^{m+1}_i \right)}{\partial n^m_j}
= w^{m+1}_{i,j}\, \frac{\partial a^m_j}{\partial n^m_j}
= w^{m+1}_{i,j}\, \frac{\partial f^m(n^m_j)}{\partial n^m_j}
= w^{m+1}_{i,j}\, \dot{f}^m(n^m_j).$$

In matrix form:
$$\frac{\partial n^{m+1}}{\partial n^m} = W^{m+1} \dot{F}^m(n^m), \qquad
\dot{F}^m(n^m) =
\begin{bmatrix}
\dot{f}^m(n^m_1) & 0 & \cdots & 0 \\
0 & \dot{f}^m(n^m_2) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \dot{f}^m(n^m_{S^m})
\end{bmatrix}.$$
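In code, $\dot{F}^m(n^m)$ is simply a diagonal matrix of elementwise transfer-function derivatives. A minimal sketch (the helper names are my own assumptions):

```python
import numpy as np

def F_dot(df, n_m):
    """Diagonal matrix F'^m(n^m) with the derivatives f'(n_i^m) on its diagonal."""
    return np.diag(df(n_m).ravel())

# Elementwise derivatives for two common transfer functions:
dlogsig  = lambda n: np.exp(-n) / (1.0 + np.exp(-n))**2   # log-sigmoid
dpurelin = lambda n: np.ones_like(n)                      # linear: F_dot is the identity
```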

Ming-Feng Yeh 23

Recurrence Relation

The recurrence relation for the sensitivity

The sensitivities are propagated backward through the network from the last layer to the first layer.

$$s^m = \frac{\partial \hat{F}}{\partial n^m}
= \left( \frac{\partial n^{m+1}}{\partial n^m} \right)^{\!T} \frac{\partial \hat{F}}{\partial n^{m+1}}
= \dot{F}^m(n^m)\,(W^{m+1})^T\, \frac{\partial \hat{F}}{\partial n^{m+1}}
= \dot{F}^m(n^m)\,(W^{m+1})^T\, s^{m+1}.$$

$$s^M \rightarrow s^{M-1} \rightarrow \cdots \rightarrow s^2 \rightarrow s^1.$$

Ming-Feng Yeh 24

Backpropagation Algorithm

At the final layer:

$$s^M_i = \frac{\partial \hat{F}}{\partial n^M_i}
= \frac{\partial (t - a)^T (t - a)}{\partial n^M_i}
= \frac{\partial \sum_{j=1}^{S^M} (t_j - a_j)^2}{\partial n^M_i}
= -2\,(t_i - a_i)\, \frac{\partial a_i}{\partial n^M_i},$$

$$\frac{\partial a_i}{\partial n^M_i} = \frac{\partial a^M_i}{\partial n^M_i} = \frac{\partial f^M(n^M_i)}{\partial n^M_i} = \dot{f}^M(n^M_i),$$

$$s^M_i = -2\,(t_i - a_i)\, \dot{f}^M(n^M_i) \qquad \Longrightarrow \qquad s^M = -2\, \dot{F}^M(n^M)\,(t - a).$$

Ming-Feng Yeh 25

Summary

The first step is to propagate the input forward through the network:

The second step is to propagate the sensitivities backward through the network, from the output layer back through the hidden layers:

The final step is to update the weights and biases:

Forward propagation:
$$a^0 = p, \qquad a^{m+1} = f^{m+1}\!\left( W^{m+1} a^m + b^{m+1} \right), \quad m = 0, 1, \dots, M-1, \qquad a = a^M.$$

Backward propagation of the sensitivities:
$$\text{Output layer:}\quad s^M = -2\, \dot{F}^M(n^M)\,(t - a), \qquad
\text{Hidden layers:}\quad s^m = \dot{F}^m(n^m)\,(W^{m+1})^T s^{m+1}, \quad m = M-1, \dots, 2, 1.$$

Weight and bias update:
$$W^m(k+1) = W^m(k) - \alpha\, s^m (a^{m-1})^T, \qquad b^m(k+1) = b^m(k) - \alpha\, s^m.$$
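These three steps translate directly into code. Below is a minimal numpy sketch of one iteration for a two-layer logsig/linear network like the 1-2-1 example that follows; the function names, array shapes, and default learning rate are my own assumptions, not part of the slides.

```python
import numpy as np

logsig  = lambda n: 1.0 / (1.0 + np.exp(-n))
dlogsig = lambda a: (1.0 - a) * a            # derivative written in terms of the output a

def bp_step(p, t, W1, b1, W2, b2, lr=0.1):
    """One backpropagation iteration (inputs and outputs are column vectors)."""
    # 1) propagate the input forward
    a0 = p
    a1 = logsig(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                        # linear (purelin) output layer
    e  = t - a2
    # 2) propagate the sensitivities backward
    s2 = -2.0 * e                            # F'^2 = I for a linear output layer
    s1 = np.diag(dlogsig(a1).ravel()) @ W2.T @ s2
    # 3) update the weights and biases (approximate steepest descent)
    W2 = W2 - lr * s2 @ a1.T
    b2 = b2 - lr * s2
    W1 = W1 - lr * s1 @ a0.T
    b1 = b1 - lr * s1
    return W1, b1, W2, b2, e
```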

Ming-Feng Yeh 26

BP Neural Network

(Figure: layer-by-layer diagram of the multilayer network, from the $R$ inputs $p_1, \dots, p_R$ through layer 1 with $S^1$ neurons, a generic hidden layer $m$ with $S^m$ neurons and weights $w^m_{i,j}$ connecting neuron $j$ of layer $m-1$ to neuron $i$ of layer $m$, up to the output layer $M$ with outputs $a^M_1, \dots, a^M_{S^M}$.)

Ming-Feng Yeh 27

Ex: Function Approximation

The 1-2-1 network is trained to approximate the function
$$g(p) = 1 + \sin\!\left(\frac{\pi}{4}\, p\right).$$

(Block diagram: the input $p$ is applied to both the function $g$ and the network; the difference $e = t - a$ between the target $t = g(p)$ and the network output drives the training.)

Ming-Feng Yeh 28

Network Architecture

1-2-1 network.

(Figure: architecture of the 1-2-1 network, input $p$, output $a$.)

Ming-Feng Yeh 29

Initial Values

$$W^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad
b^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad
W^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad
b^2(0) = \begin{bmatrix} 0.48 \end{bmatrix}.$$

Initial network response:

(Plot: the sine-wave target $g(p)$ and the initial network response $a^2$ versus $p \in [-2, 2]$.)

Ming-Feng Yeh 30

Forward Propagation

Initial input: $a^0 = p = 1$.

Output of the 1st layer:
$$a^1 = f^1(W^1 a^0 + b^1)
= \operatorname{logsig}\!\left( \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}(1) + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} \right)
= \operatorname{logsig}\!\left( \begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix} \right)
= \begin{bmatrix} \dfrac{1}{1 + e^{0.75}} \\[6pt] \dfrac{1}{1 + e^{0.54}} \end{bmatrix}
= \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}.$$

Output of the 2nd layer:
$$a^2 = f^2(W^2 a^1 + b^2)
= \operatorname{purelin}\!\left( \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + 0.48 \right) = 0.446.$$

Error:
$$e = t - a = \left( 1 + \sin\frac{\pi}{4} p \right) - a^2 = \left( 1 + \sin\frac{\pi}{4} \right) - 0.446 = 1.261.$$
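The same forward pass can be checked numerically; this is a sketch with my own variable names that reproduces the numbers above:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

# Initial parameters and the training point p = 1 from the example.
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
p  = 1.0
t  = 1.0 + np.sin(np.pi / 4.0 * p)          # target from g(p) = 1 + sin(pi*p/4)

a1 = logsig(W1 * p + b1)                    # about [0.321, 0.368]
a2 = W2 @ a1 + b2                           # about 0.446
e  = t - a2                                 # about 1.261
print(a1.ravel(), a2.item(), e.item())
```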

Ming-Feng Yeh 31

Transfer Func. Derivatives

$$\dot{f}^1(n) = \frac{d}{dn}\!\left( \frac{1}{1 + e^{-n}} \right)
= \frac{e^{-n}}{(1 + e^{-n})^2}
= \left( 1 - \frac{1}{1 + e^{-n}} \right)\!\left( \frac{1}{1 + e^{-n}} \right)
= (1 - a^1)(a^1),$$

$$\dot{f}^2(n) = \frac{d}{dn}(n) = 1.$$

Ming-Feng Yeh 32

Backpropagation

The second-layer sensitivity:
$$s^2 = -2\,\dot{F}^2(n^2)\,(t - a) = -2\,[\dot{f}^2(n^2)]\,e = -2(1)(1.261) = -2.522.$$

The first-layer sensitivity:
$$s^1 = \dot{F}^1(n^1)\,(W^2)^T s^2
= \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix}
\begin{bmatrix} w^2_{1,1} \\ w^2_{1,2} \end{bmatrix} s^2
= \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix}
\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix} (-2.522)
= \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}.$$

Ming-Feng Yeh 33

Weight Update

Learning rate $\alpha = 0.1$.

$$W^2(1) = W^2(0) - \alpha\, s^2 (a^1)^T
= \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1\,(-2.522) \begin{bmatrix} 0.321 & 0.368 \end{bmatrix}
= \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix},$$

$$b^2(1) = b^2(0) - \alpha\, s^2 = [0.48] - 0.1\,(-2.522) = [0.732],$$

$$W^1(1) = W^1(0) - \alpha\, s^1 (a^0)^T
= \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} [1]
= \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix},$$

$$b^1(1) = b^1(0) - \alpha\, s^1
= \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1 \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}
= \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}.$$
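As a self-contained numeric sketch (my own variable names, learning rate 0.1), the sensitivities and updated parameters above can be reproduced in a few lines:

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
p, lr = 1.0, 0.1
t = 1.0 + np.sin(np.pi / 4.0 * p)

a1 = logsig(W1 * p + b1); a2 = W2 @ a1 + b2; e = t - a2   # forward pass (slide 30)
s2 = -2.0 * e                                             # about -2.522
s1 = np.diag(((1.0 - a1) * a1).ravel()) @ W2.T @ s2       # about [-0.0495, 0.0997]

W2_new = W2 - lr * s2 @ a1.T    # about [ 0.171, -0.0772]
b2_new = b2 - lr * s2           # about [ 0.732]
W1_new = W1 - lr * s1 * p       # about [-0.265, -0.420]
b1_new = b1 - lr * s1           # about [-0.475, -0.140]
print(W2_new, b2_new.item(), W1_new.ravel(), b1_new.ravel())
```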

Ming-Feng Yeh 34

Choice of Network Structure

Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers.

We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.

Ming-Feng Yeh 35

Illustrated Example 1

$$g(p) = 1 + \sin\!\left(\frac{i\pi}{4}\, p\right), \qquad -2 \le p \le 2, \qquad i = 1, 2, 4, 8.$$

(Plots: responses of a trained 1-3-1 network for $i = 1$, $2$, $4$, and $8$; as $i$ grows the underlying function becomes more complex, and the fixed 1-3-1 network approximates it less well.)

Ming-Feng Yeh 36

Illustrated Example 2

$$g(p) = 1 + \sin\!\left(\frac{6\pi}{4}\, p\right), \qquad -2 \le p \le 2.$$

(Plots: responses of trained 1-2-1, 1-3-1, 1-4-1, and 1-5-1 networks; the approximation improves as hidden-layer neurons are added.)

Ming-Feng Yeh 37

Convergence

$$g(p) = 1 + \sin(\pi\, p), \qquad -2 \le p \le 2.$$

(Plots: two training runs of the same network started from different initial conditions. Left: convergence to the global minimum. Right: convergence to a local minimum. The numbers next to each curve, 0 through 5, indicate the sequence of iterations.)

Ming-Feng Yeh 38

Generalization

In most cases the multilayer network is trained with a finite number of examples of proper network behavior:
$$\{p_1, t_1\},\; \{p_2, t_2\},\; \dots,\; \{p_Q, t_Q\}.$$

This training set is normally representative of a much larger class of possible input/output pairs.

Can the network successfully generalize what it has learned to the total population?

Ming-Feng Yeh 39

Generalization Example

$$g(p) = 1 + \sin\!\left(\frac{\pi}{4}\, p\right), \qquad p = -2, -1.6, -1.2, \dots, 1.6, 2.$$

(Plots: responses of a trained 1-2-1 network and a trained 1-9-1 network on $p \in [-2, 2]$, together with the eleven training points.)

For a network to be able to generalize, it should have fewer parameters than there are data points in the training set.

The 1-2-1 network generalizes well; the 1-9-1 network, which has more parameters than there are training points, does not.