Performance Optimization: Steepest Descent


TRANSCRIPT

Page 1

Performance Optimization

Steepest Descent

Page 2

Objective

To learn algorithms that optimize a performance index F(x), i.e., to find the value of x that minimizes F(x).

Page 3

Basic Optimization Algorithm

$$x_{k+1} = x_k + \alpha_k p_k$$

or, equivalently,

$$\Delta x_k = x_{k+1} - x_k = \alpha_k p_k$$

$p_k$: search direction
$\alpha_k$: learning rate

[Figure: the step $\alpha_k p_k$ carries the iterate from $x_k$ to $x_{k+1}$.]
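In code form, this update is one line inside a loop. A minimal, illustrative sketch (the names `iterate`, `direction`, and `rate` are assumptions, not from the slides):

```python
import numpy as np

def iterate(x0, direction, rate, num_steps):
    """Generic optimization update x_{k+1} = x_k + alpha_k * p_k.

    direction(x, k) returns the search direction p_k and
    rate(k) returns the learning rate alpha_k."""
    x = np.asarray(x0, dtype=float)
    for k in range(num_steps):
        x = x + rate(k) * direction(x, k)
    return x
```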

Page 4

Steepest Descent

Choose the next step so that the function decreases:

$$F(x_{k+1}) < F(x_k)$$

Page 5

Steepest Descent

For small changes in x we can approximate F(x) with a first-order expansion:

$$F(x_{k+1}) = F(x_k + \Delta x_k) \approx F(x_k) + g_k^T \Delta x_k$$

where

$$g_k \equiv \nabla F(x)\Big|_{x=x_k}$$

Page 6

Steepest Descent

$$F(x_{k+1}) \approx F(x_k) + g_k^T \Delta x_k$$

If we want the function to decrease:

$$g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0$$

Page 7

Steepest Descent

$$F(x_{k+1}) \approx F(x_k) + g_k^T \Delta x_k$$

If we want the function to decrease:

$$g_k^T \Delta x_k = \alpha_k g_k^T p_k < 0$$

We can maximize the decrease by choosing:

$$p_k = -g_k$$

which gives the steepest descent update:

$$x_{k+1} = x_k - \alpha_k g_k$$

Page 8

Steepest Descent

$$p_k = -g_k, \qquad x_{k+1} = x_k - \alpha_k g_k$$

Two general methods to select $\alpha_k$ (a sketch of both follows below):
- minimize F(x) with respect to $\alpha_k$
- use a predetermined value (e.g., 0.2, or $1/k$)
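As an illustrative sketch (not from the slides), steepest descent with either a constant rate or a predetermined $1/k$ schedule:

```python
import numpy as np

def steepest_descent(grad, x0, alpha=0.1, num_steps=20, schedule=None):
    """Steepest descent x_{k+1} = x_k - alpha_k * grad(x_k).

    With schedule=None a constant rate alpha is used; otherwise
    schedule(k) supplies alpha_k, e.g. lambda k: 1.0 / k."""
    x = np.asarray(x0, dtype=float)
    for k in range(1, num_steps + 1):
        alpha_k = schedule(k) if schedule else alpha
        x = x - alpha_k * grad(x)   # p_k = -g_k
    return x
```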

Page 9

Example

$$F(x) = x_1^2 + 2x_1x_2 + 2x_2^2 + x_1, \qquad x_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix}$$

$$\nabla F(x) = \begin{bmatrix} \partial F/\partial x_1 \\ \partial F/\partial x_2 \end{bmatrix} = \begin{bmatrix} 2x_1 + 2x_2 + 1 \\ 2x_1 + 4x_2 \end{bmatrix}, \qquad g_0 = \nabla F(x)\Big|_{x=x_0} = \begin{bmatrix} 3 \\ 3 \end{bmatrix}$$

With $\alpha = 0.1$:

$$x_1 = x_0 - \alpha g_0 = \begin{bmatrix} 0.5 \\ 0.5 \end{bmatrix} - 0.1\begin{bmatrix} 3 \\ 3 \end{bmatrix} = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix}$$

$$x_2 = x_1 - \alpha g_1 = \begin{bmatrix} 0.2 \\ 0.2 \end{bmatrix} - 0.1\begin{bmatrix} 1.8 \\ 1.2 \end{bmatrix} = \begin{bmatrix} 0.02 \\ 0.08 \end{bmatrix}$$
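A few lines of Python (illustrative) reproduce these two iterations:

```python
import numpy as np

def grad_F(x):
    # gradient of F(x) = x1^2 + 2*x1*x2 + 2*x2^2 + x1
    return np.array([2*x[0] + 2*x[1] + 1.0, 2*x[0] + 4*x[1]])

x = np.array([0.5, 0.5])
for _ in range(2):
    x = x - 0.1 * grad_F(x)
    print(x)   # [0.2 0.2], then [0.02 0.08]
```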

Page 10

Plot

-2 -1 0 1 2-2

-1

0

1

2

Page 11

Stable Learning Rates

Suppose that the performance index is a quadratic function:

$$F(x) = \frac{1}{2}x^TAx + d^Tx + c, \qquad \nabla F(x) = Ax + d$$

Steepest descent algorithm with constant learning rate:

$$x_{k+1} = x_k - \alpha g_k = x_k - \alpha(Ax_k + d)$$

$$x_{k+1} = [I - \alpha A]x_k - \alpha d$$

This linear dynamic system will be stable if the eigenvalues of the matrix $[I - \alpha A]$ are less than one in magnitude.

Page 12

Stable Learning Rates

Let $\{\lambda_1, \lambda_2, \dots, \lambda_n\}$ and $\{z_1, z_2, \dots, z_n\}$ be the eigenvalues and eigenvectors of the Hessian matrix $A$. Then

$$[I - \alpha A]z_i = z_i - \alpha A z_i = z_i - \alpha\lambda_i z_i = (1 - \alpha\lambda_i)z_i$$

The condition for the stability of the steepest descent algorithm is then

$$|1 - \alpha\lambda_i| < 1$$

Assume that the quadratic function has a strong minimum point; then its eigenvalues must be positive numbers. Hence,

$$\alpha < \frac{2}{\lambda_i}$$

This must be true for all eigenvalues:

$$\alpha < \frac{2}{\lambda_{max}}$$

Page 13

Example

$$A = \begin{bmatrix} 2 & 2 \\ 2 & 4 \end{bmatrix}, \qquad \lambda_1 = 0.764,\ z_1 = \begin{bmatrix} 0.851 \\ -0.526 \end{bmatrix}, \qquad \lambda_2 = 5.24,\ z_2 = \begin{bmatrix} 0.526 \\ 0.851 \end{bmatrix}$$

$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{5.24} = 0.38$$

[Two contour plots over $-2 \le x_1, x_2 \le 2$ comparing trajectories with $\alpha = 0.37$ and $\alpha = 0.39$, just below and just above the stability limit.]
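A quick numerical check (illustrative):

```python
import numpy as np

A = np.array([[2.0, 2.0],
              [2.0, 4.0]])
lams = np.linalg.eigvalsh(A)   # approximately [0.764, 5.236]
print(2.0 / lams.max())        # ~0.38, the maximum stable learning rate
```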

Page 14

CHAPTER 10

Widrow-Hoff Learning

Page 15

Objectives

Widrow-Hoff learning is an approximate steepest descent algorithm, in which the performance index is mean square error.

It is widely used today in many signal processing applications.

It is a precursor to the backpropagation algorithm for multilayer networks.

Page 16

ADALINE Network

The ADALINE (Adaptive Linear Neuron) network and its learning rule, the LMS (Least Mean Square) algorithm, were proposed by Bernard Widrow and Marcian Hoff in 1960.

Both the ADALINE network and the perceptron suffer from the same inherent limitation: they can only solve linearly separable problems.

The LMS algorithm minimizes mean square error (MSE), and therefore tries to move the decision boundaries as far from the training patterns as possible.

Page 17

ADALINE Network

$$n = Wp + b, \qquad a = purelin(Wp + b)$$

[Diagram: the ADALINE, with input $p$ ($R \times 1$), weight matrix $W$ ($S \times R$), bias $b$ ($S \times 1$), and output $a$ ($S \times 1$); shown alongside the single-layer perceptron, which has the same structure but a hard-limit transfer function.]

Page 18

Single ADALINE

Setting n = 0, the equation Wp + b = 0 specifies a decision boundary. The ADALINE can be used to classify objects into two categories if they are linearly separable.

$$a = purelin(n) = purelin({}_1w^Tp + b) = w_{1,1}p_1 + w_{1,2}p_2 + b$$

[Diagram: a two-input ADALINE and its decision boundary $_1w^Tp + b = 0$ in the $(p_1, p_2)$ plane, with $a > 0$ on one side and $a < 0$ on the other.]
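A minimal sketch of this classifier; the weights and bias below are hypothetical, chosen only to illustrate the boundary test:

```python
import numpy as np

def adaline(p, W, b):
    # a = purelin(Wp + b): the ADALINE output is linear in its inputs
    return W @ p + b

# hypothetical parameters, giving the boundary p1 + p2 - 1 = 0
W = np.array([1.0, 1.0])
b = -1.0
print(adaline(np.array([2.0, 0.0]), W, b))  # positive: first category
print(adaline(np.array([0.0, 0.0]), W, b))  # negative: second category
```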

Page 19

Mean Square Error

The LMS algorithm is an example of supervised training. It adjusts the weights and biases of the ADALINE in order to minimize the mean square error, where the error is the difference between the target output ($t_q$) and the network output ($a_q$).

Stack the parameters into a single vector and augment the input:

$$x = \begin{bmatrix} {}_1w \\ b \end{bmatrix}, \qquad z = \begin{bmatrix} p \\ 1 \end{bmatrix}$$

The network output is then

$$a = {}_1w^Tp + b = x^Tz$$

MSE:

$$F(x) = E[e^2] = E[(t-a)^2] = E[(t - x^Tz)^2]$$

E[·]: expected value

Page 20

Mean Square Error

Expanding the MSE:

$$F(x) = E[(t - x^Tz)^2] = E[t^2 - 2tx^Tz + x^Tzz^Tx]$$

$$F(x) = E[t^2] - 2x^TE[tz] + x^TE[zz^T]x = c - 2x^Th + x^TRx$$

where $c = E[t^2]$, $h = E[tz]$ is the cross-correlation vector between $t$ and $z$, and $R = E[zz^T]$ is the input correlation matrix.

Using $\nabla(h^Tx) = \nabla(x^Th) = h$ ($h$: constant vector) and $\nabla(x^TRx) = Rx + R^Tx = 2Rx$ ($R$: symmetric matrix), the gradient is

$$\nabla F(x) = \nabla(c - 2x^Th + x^TRx) = -2h + 2Rx = 0$$

so the minimum point is

$$x^* = R^{-1}h$$
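An illustrative sketch: estimating $R$ and $h$ from samples and solving for the optimal weights (the function and variable names are assumptions, not from the slides):

```python
import numpy as np

def optimal_weights(Z, t):
    """x* = R^{-1} h, with R and h estimated from samples.

    Z: matrix whose rows are augmented inputs z^T = [p^T, 1]
    t: vector of targets; the last entry of x* is the bias b."""
    n = len(Z)
    R = Z.T @ Z / n              # sample estimate of E[z z^T]
    h = Z.T @ t / n              # sample estimate of E[t z]
    return np.linalg.solve(R, h)
```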

Page 21

Example 1

Page 22

Solved Problem P10.3

[Contour plot of the performance surface in the $(w_{1,1}, w_{1,2})$ plane.]

So the contours of the performance surface will be circular. The center of the contours (the minimum point) is $x^*$.

Page 23

Approximate Steepest Descent

Page 24

Approximate Gradient

Page 25

Approximate Gradient (cont.)

Page 26

Approximate Gradient (cont.)

Page 27

LMS Algorithm

The steepest descent algorithm with constant learning rate is

$$x_{k+1} = x_k - \alpha\,\nabla F(x)\Big|_{x=x_k}$$

Replacing the true gradient by its single-sample estimate, $\hat{\nabla}F(x) = \nabla e^2(k) = -2e(k)z(k)$, gives

$$x_{k+1} = x_k + 2\alpha e(k)z(k)$$

Matrix notation of the LMS algorithm:

$$W(k+1) = W(k) + 2\alpha e(k)p^T(k)$$

$$b(k+1) = b(k) + 2\alpha e(k)$$

The LMS algorithm is also referred to as the delta rule or the Widrow-Hoff learning algorithm.
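A minimal implementation of these update equations (illustrative; the function name and signature are assumptions):

```python
import numpy as np

def lms_train(patterns, targets, alpha, num_passes=1):
    """Train a single ADALINE with the LMS (Widrow-Hoff) rule."""
    W = np.zeros(len(patterns[0]))
    b = 0.0
    for _ in range(num_passes):
        for p, t in zip(patterns, targets):
            e = t - (W @ p + b)          # e(k) = t(k) - a(k)
            W = W + 2 * alpha * e * p    # W(k+1) = W(k) + 2 alpha e(k) p(k)^T
            b = b + 2 * alpha * e        # b(k+1) = b(k) + 2 alpha e(k)
    return W, b
```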

Page 28

Quadratic Functions

General form of quadratic function:

$$F(x) = c + d^Tx + \frac{1}{2}x^TAx$$

$$\nabla F(x) = d + Ax, \qquad \nabla^2F(x) = A$$

ADALINE network mean square error:

$$F(x) = c - 2x^Th + x^TRx, \qquad d = -2h, \quad A = 2R \quad (A\text{: Hessian matrix})$$

If the eigenvalues of the Hessian matrix are all positive, then the quadratic function will have one unique global minimum.

Page 29

Orange/Apple Example

$$\left\{p_1 = \begin{bmatrix} 1 \\ -1 \\ -1 \end{bmatrix},\ t_1 = -1\right\}, \qquad \left\{p_2 = \begin{bmatrix} 1 \\ 1 \\ -1 \end{bmatrix},\ t_2 = 1\right\}$$

$$R = E[pp^T] = \frac{1}{2}p_1p_1^T + \frac{1}{2}p_2p_2^T = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 1 & 0 \\ -1 & 0 & 1 \end{bmatrix}$$

The eigenvalues of $R$ are $\{0.0,\ 1.0,\ 2.0\}$, so the stable learning rate must satisfy

$$\alpha < \frac{1}{\lambda_{max}} = \frac{1}{2.0} = 0.5$$

In practical applications, it might not be practical to calculate R, so the stable learning rate α could be selected by trial and error.

Page 30

Orange/Apple Example

Start, arbitrarily, with all the weights set to zero, and then apply the inputs p1, p2, p1, p2, etc., in that order, calculating the new weights after each input is presented.

With $\alpha = 0.2$ and $W(0) = [0\ \ 0\ \ 0]$:

$$a(0) = W(0)p(0) = W(0)p_1 = 0, \qquad e(0) = t(0) - a(0) = -1 - 0 = -1$$

$$W(1) = W(0) + 2\alpha e(0)p^T(0) = [0\ \ 0\ \ 0] + 2(0.2)(-1)[1\ \ {-1}\ \ {-1}] = [-0.4\ \ 0.4\ \ 0.4]$$

$$a(1) = W(1)p(1) = W(1)p_2 = -0.4, \qquad e(1) = t(1) - a(1) = 1 - (-0.4) = 1.4$$

$$W(2) = W(1) + 2\alpha e(1)p^T(1) = [0.16\ \ 0.96\ \ {-0.16}]$$

Page 31

Orange/Apple Example

$$a(2) = W(2)p(2) = W(2)p_1 = -0.64, \qquad e(2) = t(2) - a(2) = -1 - (-0.64) = -0.36$$

$$W(3) = W(2) + 2\alpha e(2)p^T(2) = [0.016\ \ 1.104\ \ {-0.016}]$$

The algorithm converges to $W(\infty) = [0\ \ 1\ \ 0]$.

This decision boundary falls halfway between the two reference patterns. The perceptron rule did NOT produce such a boundary.

The perceptron rule stops as soon as the patterns are correctly classified, even though some patterns may then lie close to the boundaries. The LMS algorithm minimizes the mean square error.
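These iterations can be checked with a few lines of Python (illustrative):

```python
import numpy as np

p1, t1 = np.array([1.0, -1.0, -1.0]), -1.0
p2, t2 = np.array([1.0,  1.0, -1.0]),  1.0
W, alpha = np.zeros(3), 0.2
for p, t in [(p1, t1), (p2, t2), (p1, t1)]:
    e = t - W @ p                 # no bias is used in this example
    W = W + 2 * alpha * e * p
    print(W)  # [-0.4 0.4 0.4], [0.16 0.96 -0.16], [0.016 1.104 -0.016]
```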

Page 32

Perceptron Rule vs. LMS Algorithm

Page 33

Perceptron Rule vs. LMS Algorithm (cont.)

Page 34

Perceptron Rule vs. LMS Algorithm (cont.)

Page 35

Perceptron Rule vs. LMS Algorithm (cont.)

Page 36

Solved Problem P10.4

$$\left\{p_1 = \begin{bmatrix} 1 \\ 1 \end{bmatrix},\ t_1 = 1\right\}, \qquad \left\{p_2 = \begin{bmatrix} 1 \\ -1 \end{bmatrix},\ t_2 = -1\right\}$$

Train the network using the LMS algorithm, with the initial guess set to zero and a learning rate α = 0.25.

[Plot in the $(w_{1,1}, w_{1,2})$ weight plane.]
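Assuming the patterns above (reconstructed from the garbled transcript), an illustrative run of the LMS rule with no bias:

```python
import numpy as np

p = [np.array([1.0, 1.0]), np.array([1.0, -1.0])]
t = [1.0, -1.0]
W, alpha = np.zeros(2), 0.25
for k in range(4):                   # present p1, p2, p1, p2
    e = t[k % 2] - W @ p[k % 2]
    W = W + 2 * alpha * e * p[k % 2]
    print(W)  # reaches [0. 1.] after two updates and then stays fixed
```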

Page 37

Solved Problem P10.8

$$\text{class 1: } \left\{p_1 = \begin{bmatrix}1\\1\end{bmatrix},\ p_2 = \begin{bmatrix}1\\2\end{bmatrix}\right\}, \qquad \text{class 2: } \left\{p_3 = \begin{bmatrix}2\\-1\end{bmatrix},\ p_4 = \begin{bmatrix}2\\0\end{bmatrix}\right\}$$

$$\text{class 3: } \left\{p_5 = \begin{bmatrix}-1\\2\end{bmatrix},\ p_6 = \begin{bmatrix}-2\\1\end{bmatrix}\right\}, \qquad \text{class 4: } \left\{p_7 = \begin{bmatrix}-1\\-1\end{bmatrix},\ p_8 = \begin{bmatrix}-2\\-2\end{bmatrix}\right\}$$

Train the network using the LMS algorithm, with the initial guess set to zero and a learning rate α = 0.04.

Page 38

Tapped Delay Line

[Diagram: a chain of delay blocks D fed by the input signal y(k).]

$$p_1(k) = y(k), \qquad p_2(k) = y(k-1), \qquad \dots, \qquad p_R(k) = y(k-R+1)$$

At the output of the tapped delay line we have an R-dimensional vector, consisting of the input signal at the current time and at delays of 1 to R-1 time steps.
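An illustrative sketch of a tapped delay line (the class name is an assumption):

```python
from collections import deque

class TappedDelayLine:
    """Keeps the R most recent samples: [y(k), y(k-1), ..., y(k-R+1)]."""
    def __init__(self, R):
        self.taps = deque([0.0] * R, maxlen=R)
    def step(self, y_k):
        self.taps.appendleft(y_k)   # newest sample enters, oldest falls off
        return list(self.taps)

tdl = TappedDelayLine(3)
for y in [5.0, -4.0, 0.0]:
    print(tdl.step(y))   # [5,0,0], then [-4,5,0], then [0,-4,5]
```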

Page 39

Adaptive Filter

[Diagram: an ADALINE whose input vector is produced by a tapped delay line on y(k).]

$$a(k) = purelin(Wp + b) = \sum_{i=1}^{R} w_{1,i}\,y(k-i+1) + b$$

Page 40

Solved Problem P10.1

[Diagram: a three-weight adaptive filter with two delay blocks.]

$$w_{1,1} = 2, \qquad w_{1,2} = -1, \qquad w_{1,3} = 3$$

$$\{y(k)\} = \{\dots, 0, 0, 0, 5, -4, 0, 0, 0, \dots\}, \qquad y(0) = 5,\ y(1) = -4$$

Just prior to k = 0 (k < 0): three zeros have entered the filter, i.e., y(-3) = y(-2) = y(-1) = 0, so the output just prior to k = 0 is zero.

k = 0:

$$a(0) = Wp(0) = \begin{bmatrix}2 & -1 & 3\end{bmatrix}\begin{bmatrix}5\\0\\0\end{bmatrix} = 10$$

Page 41

Solved Problem P10.1

k = 1:

$$a(1) = Wp(1) = \begin{bmatrix}2 & -1 & 3\end{bmatrix}\begin{bmatrix}-4\\5\\0\end{bmatrix} = -13$$

k = 2:

$$a(2) = Wp(2) = \begin{bmatrix}2 & -1 & 3\end{bmatrix}\begin{bmatrix}0\\-4\\5\end{bmatrix} = 19$$

k = 3:

$$a(3) = Wp(3) = \begin{bmatrix}2 & -1 & 3\end{bmatrix}\begin{bmatrix}0\\0\\-4\end{bmatrix} = -12$$

k = 4:

$$a(4) = Wp(4) = \begin{bmatrix}2 & -1 & 3\end{bmatrix}\begin{bmatrix}0\\0\\0\end{bmatrix} = 0$$

Page 42

Solved Problem P10.1

The effect of y(0) lasts from k = 0 through k = 2, so it will have an influence for three time intervals. This corresponds to the length of the impulse response of this filter.

$$a(-1) = 0,\ a(0) = 10,\ a(1) = -13,\ a(2) = 19,\ a(3) = -12,\ a(4) = 0$$

$$a(k) = Wp(k) = \begin{bmatrix}w_{1,1} & w_{1,2} & w_{1,3}\end{bmatrix}\begin{bmatrix}y(k)\\y(k-1)\\y(k-2)\end{bmatrix}$$
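The whole response can be reproduced in a few lines (illustrative):

```python
import numpy as np

w = np.array([2.0, -1.0, 3.0])          # w11, w12, w13
y = {0: 5.0, 1: -4.0}                    # input signal, zero elsewhere
for k in range(-1, 5):
    p = np.array([y.get(k, 0.0), y.get(k - 1, 0.0), y.get(k - 2, 0.0)])
    print(k, w @ p)   # 0, 10, -13, 19, -12, 0
```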

Page 43

Solved Problem P10.6

[Diagram: adaptive predictor, a two-weight ADALINE fed by y(k) through two delays, with error e(k) = t(k) - a(k).]

Application of the ADALINE: the adaptive predictor. The purpose of this filter is to predict the next value of the input signal from the two previous values. Suppose that the input signal is a stationary random process with autocorrelation function given by

$$C_y(n) = E[y(k)\,y(k+n)], \qquad C_y(0) = 3,\ C_y(1) = 1,\ C_y(2) = -1$$

Page 44

Solved Problem P10.6

i. Sketch the contour plot of the performance index (MSE).

$$z(k) = \begin{bmatrix}y(k-1)\\y(k-2)\end{bmatrix}, \qquad t(k) = y(k)$$

$$c = E[t^2(k)] = E[y^2(k)] = C_y(0) = 3$$

$$R = E[zz^T] = \begin{bmatrix}C_y(0) & C_y(1)\\C_y(1) & C_y(0)\end{bmatrix} = \begin{bmatrix}3 & 1\\1 & 3\end{bmatrix}$$

$$h = E[t\,z] = \begin{bmatrix}E[y(k)y(k-1)]\\E[y(k)y(k-2)]\end{bmatrix} = \begin{bmatrix}C_y(1)\\C_y(2)\end{bmatrix} = \begin{bmatrix}1\\-1\end{bmatrix}$$

Page 45

Solved Problem P10.6

Performance index (MSE):

$$F(x) = c - 2x^Th + x^TRx$$

The optimal weights are

$$x^* = R^{-1}h = \begin{bmatrix}3 & 1\\1 & 3\end{bmatrix}^{-1}\begin{bmatrix}1\\-1\end{bmatrix} = \begin{bmatrix}3/8 & -1/8\\-1/8 & 3/8\end{bmatrix}\begin{bmatrix}1\\-1\end{bmatrix} = \begin{bmatrix}4/8\\-4/8\end{bmatrix} = \begin{bmatrix}1/2\\-1/2\end{bmatrix}$$

The Hessian matrix is

$$\nabla^2F(x) = A = 2R = \begin{bmatrix}6 & 2\\2 & 6\end{bmatrix}$$

Eigenvalues: $\lambda_1 = 4$, $\lambda_2 = 8$. Eigenvectors:

$$v_1 = \begin{bmatrix}1\\-1\end{bmatrix}, \qquad v_2 = \begin{bmatrix}1\\1\end{bmatrix}$$

The contours of F(x) will be elliptical, with the long axis of each ellipse along the first eigenvector, since the first eigenvalue has the smallest magnitude. The ellipses will be centered at $x^*$.
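These quantities can be verified numerically (illustrative):

```python
import numpy as np

R = np.array([[3.0, 1.0],
              [1.0, 3.0]])
h = np.array([1.0, -1.0])
x_star = np.linalg.solve(R, h)    # [ 0.5, -0.5]
A = 2 * R                          # Hessian of the MSE
lams = np.linalg.eigvalsh(A)       # [4., 8.]
print(x_star, lams, 2.0 / lams.max())   # max stable rate 0.25
```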

Page 46

Solved Problem P10.6

ii. The maximum stable value of the learning rate for the LMS algorithm:

$$\alpha < \frac{2}{\lambda_{max}} = \frac{2}{8} = 0.25$$

[Contour plot over $-2 \le x_1, x_2 \le 2$, centered at $x^*$.]

The LMS algorithm is approximate steepest descent, so the trajectory for small learning rates will move perpendicular to the contour lines.

iii.

[Contour plot with the eigenvector directions $v_1$ and $v_2$ and the LMS trajectory, centered at $x^*$.]

Page 47

Applications

Noise cancellation system to remove 60-Hz noise from EEG signal (Fig. 10.6)

Echo cancellation system in long distance telephone lines (Fig. 10.10)

Filtering engine noise from pilot’s voice signal (Fig. P10.8)