
Page 1

Back-Propagation Algorithm

Perceptron
Gradient Descent
Multi-layered neural network
Back-Propagation
More on Back-Propagation
Examples

Page 2

Inner-product

A measure of the projection of one vector onto another.

$$\text{net} = \langle \vec{w}, \vec{x} \rangle = \|\vec{w}\| \cdot \|\vec{x}\| \cdot \cos(\theta)$$

$$\text{net} = \sum_{i=1}^{n} w_i \, x_i$$

Activation function

$$o = f(\text{net}) = f\left(\sum_{i=1}^{n} w_i \, x_i\right)$$

$$f(x) := \mathrm{sgn}(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ -1 & \text{if } x < 0 \end{cases}$$
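As a concrete illustration of the two formulas above, a minimal Python sketch (not from the original slides; the example weights are my own choice):

    def sgn(x):
        # Threshold activation: +1 if x >= 0, -1 otherwise.
        return 1 if x >= 0 else -1

    def perceptron_output(w, x):
        # net = <w, x> = sum_i w_i * x_i
        net = sum(wi * xi for wi, xi in zip(w, x))
        return sgn(net)

    # Hand-picked weights; the last component acts as a bias via a constant input of 1.
    print(perceptron_output([1.0, 1.0, -1.5], [1.0, 1.0, 1.0]))  # -> 1 (fires)
    print(perceptron_output([1.0, 1.0, -1.5], [0.0, 1.0, 1.0]))  # -> -1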

Page 3

Sigmoid function:

$$f(x) := \sigma(x) = \frac{1}{1 + e^{-ax}}$$

Step function:

$$f(x) := \varphi(x) = \begin{cases} 1 & \text{if } x \geq 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Piecewise-linear function:

$$f(x) := \varphi(x) = \begin{cases} 1 & \text{if } x \geq 0.5 \\ x & \text{if } 0.5 > x > -0.5 \\ 0 & \text{if } x \leq -0.5 \end{cases}$$
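In code, the three activation functions above might look as follows (a direct Python transcription; the parameter a is from the sigmoid formula):

    import math

    def sigmoid(x, a=1.0):
        # f(x) = 1 / (1 + e^(-a*x))
        return 1.0 / (1.0 + math.exp(-a * x))

    def step(x):
        # f(x) = 1 if x >= 0, else 0
        return 1.0 if x >= 0 else 0.0

    def ramp(x):
        # Piecewise-linear unit as on the slide: saturates at 1 and 0, linear in between.
        if x >= 0.5:
            return 1.0
        if x <= -0.5:
            return 0.0
        return x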

Gradient Descent

To understand gradient descent, consider a simpler linear unit, where

$$o = \sum_{i=0}^{n} w_i \, x_i$$

Let's learn the $w_i$ that minimize the squared error over the training set $D = \{(x_1,t_1), (x_2,t_2), \ldots, (x_d,t_d), \ldots, (x_m,t_m)\}$ ($t$ for target).

Page 4

Error for different hypotheses, over $w_0$ and $w_1$ (a 2-dimensional weight space).

We want to move the weight vector in the direction that decreases $E$:

$$w_i = w_i + \Delta w_i, \qquad \vec{w} = \vec{w} + \Delta\vec{w}$$

Page 5

Differentiating E

Differentiating the squared error $E[\vec{w}] = \frac{1}{2}\sum_{d \in D}(t_d - o_d)^2$ with respect to $w_i$ gives the update rule for gradient descent:

$$\Delta w_i = \eta \sum_{d \in D} (t_d - o_d) \, x_{id}$$
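A direct Python transcription of this batch rule (a sketch; the data layout and the value of eta are my own choices, not from the slides):

    def gradient_descent_step(w, D, eta=0.05):
        # One batch update: dw_i = eta * sum_d (t_d - o_d) * x_id
        dw = [0.0] * len(w)
        for x, t in D:                                    # x[0] = 1 serves as the bias input
            o = sum(wi * xi for wi, xi in zip(w, x))      # linear unit output
            for i, xi in enumerate(x):
                dw[i] += eta * (t - o) * xi
        return [wi + dwi for wi, dwi in zip(w, dw)]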

Page 6

Stochastic Approximation to Gradient Descent

The gradient descent training rule updates the weights by summing over all the training examples in $D$.

Stochastic gradient descent approximates gradient descent by updating the weights incrementally, calculating the error for each example:

$$\Delta w_i = \eta \, (t - o) \, x_i$$

Known as the delta rule or LMS (least mean-square) weight update. Also the Adaline rule, used for adaptive filters; Widrow and Hoff (1960).
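The incremental version updates after each single example (again a sketch, with the same conventions as the batch version above):

    def delta_rule_step(w, x, t, eta=0.05):
        # Incremental LMS / Adaline update: dw_i = eta * (t - o) * x_i
        o = sum(wi * xi for wi, xi in zip(w, x))          # linear unit output
        return [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]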

Page 7

XOR problem and Perceptron

Shown by Minsky and Papert in the mid-1960s: a single perceptron cannot represent functions that are not linearly separable, such as XOR.

Page 8

Multi-layer Networks

The limitations of the simple perceptron do not apply to feed-forward networks with intermediate or "hidden" nonlinear units.

A network with just one hidden layer can represent any Boolean function.

The great power of multi-layer networks was realized long ago, but it was only in the eighties that it was shown how to make them learn.

Multiple layers of cascaded linear units still produce only linear functions. We search for networks capable of representing nonlinear functions, so the units should use nonlinear activation functions (examples were given earlier, and more follow on a later slide).

Page 9

XOR example
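The slide shows the solution as a figure. A minimal sketch of one well-known construction, with hand-picked weights (my own choice, not read off the slide): one hidden unit computes OR, the other AND, and the output unit fires when OR is true but AND is not.

    def step(x):
        return 1 if x >= 0 else 0

    def xor_net(x1, x2):
        h1 = step(x1 + x2 - 0.5)        # hidden unit 1: OR
        h2 = step(x1 + x2 - 1.5)        # hidden unit 2: AND
        return step(h1 - 2 * h2 - 0.5)  # output: OR and not AND = XOR

    for a in (0, 1):
        for b in (0, 1):
            print(a, b, xor_net(a, b))  # prints the XOR truth table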

Page 10

Back-propagation is a learning algorithm for multi-layer neural networks.

It was invented independently several times: Bryson and Ho [1969], Werbos [1974], Parker [1985], Rumelhart et al. [1986].

Parallel Distributed Processing, Vol. 1: Foundations. David E. Rumelhart, James L. McClelland, and the PDP Research Group.

"What makes people smarter than computers?" These volumes by a pioneering neurocomputing ...

Page 11

Back-propagation

The algorithm gives a prescription for changing the weights $w_{ij}$ in any feed-forward network to learn a training set of input-output pairs $\{x^d, t^d\}$.

We consider a simple two-layer network with inputs $x_1, x_2, x_3, x_4, x_5$.

[Figure: a two-layer network; five inputs $x_k$, a hidden layer, and an output layer.]

Page 12

Given the pattern $x^d$, hidden unit $j$ receives a net input

$$\text{net}_j^d = \sum_{k=1}^{5} w_{jk} \, x_k^d$$

and produces the output

$$V_j^d = f(\text{net}_j^d) = f\left(\sum_{k=1}^{5} w_{jk} \, x_k^d\right)$$

Output unit $i$ thus receives

$$\text{net}_i^d = \sum_{j=1}^{3} W_{ij} \, V_j^d = \sum_{j=1}^{3} \left( W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} \, x_k^d\right) \right)$$

and produces the final output

$$o_i^d = f(\text{net}_i^d) = f\left(\sum_{j=1}^{3} W_{ij} \, V_j^d\right) = f\left(\sum_{j=1}^{3} \left( W_{ij} \cdot f\left(\sum_{k=1}^{5} w_{jk} \, x_k^d\right)\right)\right)$$

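The same forward pass written out in Python (a sketch for this 5-3-2 example; a sigmoid is assumed for f, matching the differentiable activations discussed later):

    import math

    def f(x):
        # Sigmoid activation, one differentiable choice for f.
        return 1.0 / (1.0 + math.exp(-x))

    def forward(w, W, x):
        # w[j][k]: input-to-hidden weights (3 x 5); W[i][j]: hidden-to-output weights (2 x 3).
        net_hidden = [sum(w[j][k] * x[k] for k in range(5)) for j in range(3)]
        V = [f(n) for n in net_hidden]                                  # hidden outputs V_j
        net_out = [sum(W[i][j] * V[j] for j in range(3)) for i in range(2)]
        o = [f(n) for n in net_out]                                     # final outputs o_i
        return V, o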
Page 13

Our usual error function, for $l$ outputs and $m$ input-output pairs $\{x^d, t^d\}$:

$$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{l} \left( t_i^d - o_i^d \right)^2$$

In our example (two outputs) $E$ becomes

$$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left( t_i^d - o_i^d \right)^2$$

$$E[\vec{w}] = \frac{1}{2} \sum_{d=1}^{m} \sum_{i=1}^{2} \left( t_i^d - f\left( \sum_{j=1}^{3} W_{ij} \cdot f\left( \sum_{k=1}^{5} w_{jk} \, x_k^d \right) \right) \right)^2$$

$E[\vec{w}]$ is differentiable given $f$ is differentiable, so gradient descent can be applied.

Page 14

For the hidden-to-output connections the gradient descent rule gives:

$$\Delta W_{ij} = -\eta \frac{\partial E}{\partial W_{ij}} = -\eta \sum_{d=1}^{m} (t_i^d - o_i^d) \, f'(\text{net}_i^d) \cdot (-V_j^d)$$

$$\Delta W_{ij} = \eta \sum_{d=1}^{m} (t_i^d - o_i^d) \, f'(\text{net}_i^d) \, V_j^d$$

Defining

$$\delta_i^d = f'(\text{net}_i^d)\,(t_i^d - o_i^d)$$

this becomes

$$\Delta W_{ij} = \eta \sum_{d=1}^{m} \delta_i^d \, V_j^d$$

For the input-to-hidden connections $w_{jk}$ we must differentiate with respect to the $w_{jk}$. Using the chain rule we obtain

$$\Delta w_{jk} = -\eta \frac{\partial E}{\partial w_{jk}} = -\eta \sum_{d=1}^{m} \frac{\partial E}{\partial V_j^d} \cdot \frac{\partial V_j^d}{\partial w_{jk}}$$

Page 15

$$\delta_i^d = f'(\text{net}_i^d)\,(t_i^d - o_i^d)$$

$$\delta_j^d = f'(\text{net}_j^d) \sum_{i=1}^{2} W_{ij} \, \delta_i^d$$

$$\Delta w_{jk} = \eta \sum_{d=1}^{m} \sum_{i=1}^{2} (t_i^d - o_i^d) \, f'(\text{net}_i^d) \, W_{ij} \, f'(\text{net}_j^d) \, x_k^d$$

$$\Delta w_{jk} = \eta \sum_{d=1}^{m} \sum_{i=1}^{2} \delta_i^d \, W_{ij} \, f'(\text{net}_j^d) \, x_k^d$$

$$\Delta w_{jk} = \eta \sum_{d=1}^{m} \delta_j^d \, x_k^d$$

Compared with

$$\Delta W_{ij} = \eta \sum_{d=1}^{m} \delta_i^d \, V_j^d$$

we have the same form with a different definition of $\delta$.
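In code, the two deltas and the resulting per-pattern gradient terms might look like this (a sketch continuing the forward-pass example above; f and fprime are the activation and its derivative):

    def backprop_pattern(w, W, x, t, f, fprime):
        # Forward pass, keeping the net inputs for the derivative terms.
        net_hidden = [sum(w[j][k] * x[k] for k in range(5)) for j in range(3)]
        V = [f(n) for n in net_hidden]
        net_out = [sum(W[i][j] * V[j] for j in range(3)) for i in range(2)]
        o = [f(n) for n in net_out]
        # Output deltas: delta_i = f'(net_i) * (t_i - o_i)
        delta_out = [fprime(net_out[i]) * (t[i] - o[i]) for i in range(2)]
        # Hidden deltas: delta_j = f'(net_j) * sum_i W_ij * delta_i
        delta_hid = [fprime(net_hidden[j]) * sum(W[i][j] * delta_out[i] for i in range(2))
                     for j in range(3)]
        # Per-pattern contributions to dW_ij and dw_jk (scale by eta and sum over d).
        dW = [[delta_out[i] * V[j] for j in range(3)] for i in range(2)]
        dw = [[delta_hid[j] * x[k] for k in range(5)] for j in range(3)]
        return dW, dw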

Page 16

In general, with an arbitrary number of layers, the back-propagation update rule always has the form

$$\Delta w_{ij} = \eta \sum_{d=1}^{m} \delta_{\text{output}} \cdot V_{\text{input}}$$

where "output" and "input" refer to the two ends of the connection concerned, $V$ stands for the appropriate input (hidden unit or real input $x^d$), and $\delta$ depends on the layer concerned.

The equation

$$\delta_j^d = f'(\text{net}_j^d) \sum_{i=1}^{2} W_{ij} \, \delta_i^d$$

allows us to determine the $\delta$ of a given hidden unit $V_j$ in terms of the $\delta$'s of the units $o_i$ it feeds into. The coefficients $W_{ij}$ are the usual forward weights, but the errors $\delta$ are propagated backward; hence back-propagation.

Page 17

We have to use a nonlinear differentiable activation function. Examples:

$$f(x) = \sigma(x) = \frac{1}{1 + e^{-\beta x}}, \qquad f'(x) = \sigma'(x) = \beta \cdot \sigma(x) \cdot (1 - \sigma(x))$$

$$f(x) = \tanh(\beta \cdot x), \qquad f'(x) = \beta \cdot (1 - f(x)^2)$$
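Both pairs in Python (a sketch; beta as in the formulas above):

    import math

    def sigma(x, beta=1.0):
        return 1.0 / (1.0 + math.exp(-beta * x))

    def sigma_prime(x, beta=1.0):
        # sigma'(x) = beta * sigma(x) * (1 - sigma(x))
        s = sigma(x, beta)
        return beta * s * (1.0 - s)

    def tanh_f(x, beta=1.0):
        return math.tanh(beta * x)

    def tanh_prime(x, beta=1.0):
        # f'(x) = beta * (1 - f(x)^2)
        return beta * (1.0 - math.tanh(beta * x) ** 2)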

Page 18

Consider a network with $M$ layers, $m = 1, 2, \ldots, M$.

$V_i^m$ is the output of the $i$th unit of the $m$th layer; $V_i^0$ is a synonym for $x_i$, the $i$th input. The superscript $m$ indexes layers, not patterns. $w_{ij}^m$ means the connection from $V_j^{m-1}$ to $V_i^m$.

Stochastic Back-Propagation Algorithm (mostly used); a full sketch in code follows the steps below.

1. Initialize the weights to small random values.
2. Choose a pattern $x^d$ and apply it to the input layer: $V_k^0 = x_k^d$ for all $k$.
3. Propagate the signal forward through the network:
$$V_i^m = f(\text{net}_i^m) = f\left(\sum_j w_{ij}^m \, V_j^{m-1}\right)$$
4. Compute the deltas for the output layer:
$$\delta_i^M = f'(\text{net}_i^M)\,(t_i^d - V_i^M)$$
5. Compute the deltas for the preceding layers, for $m = M, M-1, \ldots, 2$:
$$\delta_i^{m-1} = f'(\text{net}_i^{m-1}) \sum_j w_{ji}^m \, \delta_j^m$$
6. Update all connections:
$$\Delta w_{ij}^m = \eta \, \delta_i^m \, V_j^{m-1}, \qquad w_{ij}^{\text{new}} = w_{ij}^{\text{old}} + \Delta w_{ij}$$
7. Go to step 2 and repeat for the next pattern.
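A compact end-to-end sketch of steps 1-7 in Python, for the earlier 5-3-2 example with sigmoid units (my own minimal implementation of the procedure above, not code from the lecture; layer sizes, eta, and the epoch count are illustrative):

    import math, random

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def train_backprop(data, n_in=5, n_hid=3, n_out=2, eta=0.5, epochs=1000):
        # Step 1: initialize the weights to small random values.
        w = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hid)]
        W = [[random.uniform(-0.1, 0.1) for _ in range(n_hid)] for _ in range(n_out)]
        for _ in range(epochs):
            for x, t in data:                       # Step 2: choose a pattern.
                # Step 3: propagate the signal forward.
                net_h = [sum(w[j][k] * x[k] for k in range(n_in)) for j in range(n_hid)]
                V = [sigmoid(n) for n in net_h]
                net_o = [sum(W[i][j] * V[j] for j in range(n_hid)) for i in range(n_out)]
                o = [sigmoid(n) for n in net_o]
                # Step 4: output deltas; for the sigmoid, f'(net) = o * (1 - o).
                d_out = [o[i] * (1 - o[i]) * (t[i] - o[i]) for i in range(n_out)]
                # Step 5: deltas for the preceding layer.
                d_hid = [V[j] * (1 - V[j]) * sum(W[i][j] * d_out[i] for i in range(n_out))
                         for j in range(n_hid)]
                # Step 6: update all connections.
                for i in range(n_out):
                    for j in range(n_hid):
                        W[i][j] += eta * d_out[i] * V[j]
                for j in range(n_hid):
                    for k in range(n_in):
                        w[j][k] += eta * d_hid[j] * x[k]
                # Step 7: continue with the next pattern.
        return w, W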

Page 19

More on Back-Propagation

Gradient descent over the entire network weight vector. Easily generalized to arbitrary directed graphs. Will find a local, not necessarily global, error minimum. In practice it often works well (one can run it multiple times).

Gradient descent can be very slow if $\eta$ is too small, and can oscillate widely if $\eta$ is too large. Often a weight momentum $\alpha$ is included:

$$\Delta w_{pq}(t+1) = -\eta \frac{\partial E}{\partial w_{pq}} + \alpha \cdot \Delta w_{pq}(t)$$

The momentum parameter $\alpha$ is chosen between 0 and 1; 0.9 is a good value.

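As a sketch, the momentum update keeps the previous weight change per weight (flat lists here for brevity; grad_E holds the components of dE/dw):

    def momentum_update(w, grad_E, dw_prev, eta=0.1, alpha=0.9):
        # dw(t+1) = -eta * dE/dw + alpha * dw(t), applied componentwise.
        dw = [-eta * g + alpha * p for g, p in zip(grad_E, dw_prev)]
        w_new = [wi + di for wi, di in zip(w, dw)]
        return w_new, dw    # dw is kept for the next step's momentum term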
Page 20

Minimizes error over the training examples; will it generalize well to unseen examples?

Training can take thousands of iterations, so it is slow! Using the network after training is very fast.

Page 21

Convergence of Back-propagation

Gradient descent converges to some local minimum, perhaps not the global minimum. Remedies: add momentum; use stochastic gradient descent; train multiple nets with different initial weights.

Nature of convergence: initialize the weights near zero, so the initial network is near-linear; increasingly non-linear functions become possible as training progresses.

Page 22

Expressive Capabilities of ANNs

Boolean functions: every Boolean function can be represented by a network with a single hidden layer, but it might require a number of hidden units exponential in the number of inputs.

Continuous functions: every bounded continuous function can be approximated with arbitrarily small error by a network with one hidden layer [Cybenko 1989; Hornik et al. 1989]. Any function can be approximated to arbitrary accuracy by a network with two hidden layers [Cybenko 1988].

NETtalk, Sejnowski et al. 1987.

Page 23

Prediction

Page 24

Page 25

Perceptron
Gradient Descent
Multi-layered neural network
Back-Propagation
More on Back-Propagation
Examples

Page 26

RBF Networks, Support Vector Machines