Supervised learning 1. Early learning algorithms 2. First order gradient methods 3. Second order gradient methods

Post on 19-Dec-2015


Page 1: Supervised learning

Supervised learning

1. Early learning algorithms

2. First order gradient methods

3. Second order gradient methods

Page 2:

Early learning algorithms

• Designed for single layer neural networks

• Generally more limited in their applicability

• Some of them are:
– Perceptron learning
– LMS or Widrow-Hoff learning
– Grossberg learning

Page 3:

Perceptron learning

1. Randomly initialize all the network's weights.

2. Apply inputs and find outputs (feedforward).

3. Compute the errors.

4. Update each weight as w_ij(k+1) = w_ij(k) + α p_i(k) e_j(k)

5. Repeat steps 2 to 4 until the errors reach a satisfactory level.
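The steps above can be sketched in NumPy. This is a minimal sketch, not the slide's own code; the AND-function data, learning rate alpha, and epoch limit are illustrative assumptions:

```python
import numpy as np

def train_perceptron(P, T, alpha=1.0, max_epochs=100):
    """Perceptron learning: w_ij(k+1) = w_ij(k) + alpha * p_i(k) * e_j(k)."""
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.5, 0.5, size=P.shape[1])   # step 1: random initialization
    b = 0.0
    for _ in range(max_epochs):
        total_error = 0
        for p, t in zip(P, T):
            a = 1 if W @ p + b >= 0 else 0        # step 2: feedforward (hard limit)
            e = t - a                             # step 3: compute the error
            W = W + alpha * e * p                 # step 4: update each weight
            b = b + alpha * e
            total_error += abs(e)
        if total_error == 0:                      # step 5: stop at a satisfactory level
            break
    return W, b

# Illustrative linearly separable problem: the AND function
P = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
T = np.array([0, 0, 0, 1])
W, b = train_perceptron(P, T)
```

Because AND is linearly separable, the perceptron convergence theorem guarantees this loop terminates with zero error.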

Page 4:

Performance Optimization: Gradient-Based Methods

Page 5:

Basic Optimization Algorithm

Page 6:

Steepest Descent (first order Taylor expansion)
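The equations on this slide were lost in extraction; the standard steepest-descent update, consistent with the first-order Taylor expansion named in the title, is:

```latex
% First-order Taylor expansion of the performance index F about x_k:
F(\mathbf{x}_k + \Delta\mathbf{x}_k) \approx F(\mathbf{x}_k) + \mathbf{g}_k^{T}\,\Delta\mathbf{x}_k,
\qquad
\mathbf{g}_k \equiv \nabla F(\mathbf{x})\big|_{\mathbf{x}=\mathbf{x}_k}
% Choosing \Delta x_k = -\alpha g_k makes the first-order change in F negative,
% giving the steepest-descent iteration:
\mathbf{x}_{k+1} = \mathbf{x}_k - \alpha\,\mathbf{g}_k
```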

Page 7:

Example

Page 8:

Plot

[Figure: plot of the example, with both axes ranging from −2 to 2]

Page 9:

LMS or Widrow-Hoff learning

• First, we introduce the ADALINE (ADAptive LInear NEuron) network

Page 10:

LMS or Widrow-Hoff Learning (Delta Rule)

• The ADALINE network has the same basic structure as the perceptron network
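The structural similarity can be seen in a short sketch: both compute the same net input Wp + b and differ only in the activation function. The weights and input below are illustrative assumptions:

```python
import numpy as np

W = np.array([[1.0, -0.8]])   # illustrative weights
b = np.array([0.5])           # illustrative bias
p = np.array([1.0, 2.0])      # illustrative input

n = (W @ p + b)               # net input: identical for both networks
a_adaline = n                 # ADALINE: linear (purelin) activation, a = n
a_perceptron = np.where(n >= 0, 1, 0)  # perceptron: hard-limit activation
```

The linear output is what later makes the LMS gradient calculation possible, since purelin is differentiable while the hard limit is not.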

Page 11:

Approximate Steepest Descent

Page 12:

Approximate Gradient Calculation

Page 13:

LMS Algorithm

This algorithm is inspired by the steepest descent algorithm.
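The slide's equations were lost in extraction; a minimal NumPy sketch of the LMS rule in the form W(k+1) = W(k) + 2α e(k) pᵀ(k), applied sample by sample as an approximation of steepest descent on the mean squared error. The data, target function, and learning rate are illustrative assumptions:

```python
import numpy as np

def lms_train(P, T, alpha=0.03, epochs=50):
    """LMS (Widrow-Hoff): W(k+1) = W(k) + 2*alpha*e(k)*p(k)^T,
    using the single-sample squared error as an approximate gradient."""
    W = np.zeros(P.shape[1])
    b = 0.0
    for _ in range(epochs):
        for p, t in zip(P, T):
            a = W @ p + b             # ADALINE: linear output
            e = t - a                 # instantaneous error
            W = W + 2 * alpha * e * p # approximate steepest-descent step
            b = b + 2 * alpha * e
    return W, b

# Illustrative noiseless linear target: t = 2*p1 - p2 + 0.5
rng = np.random.default_rng(1)
P = rng.uniform(-1, 1, size=(40, 2))
T = 2 * P[:, 0] - P[:, 1] + 0.5
W, b = lms_train(P, T)
```

On this realizable linear problem the weights converge toward the exact solution (2, −1) with bias 0.5.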

Page 14:

Multiple-Neuron Case

Page 15:

Difference between perceptron learning and LMS learning

• The key difference is the DERIVATIVE: the linear activation function has a derivative, but the sign function (bipolar or unipolar) does not.

Page 16:

Grossberg learning (associative learning)

• Sometimes known as instar and outstar training
• Updating rule: w_i(k+1) = w_i(k) + α [x_i(k) − w_i(k)]

• Here x_i could be the desired input values (instar training; example: clustering) or the desired output values (outstar training), depending on the network structure.

• Grossberg network (see Hagan for more details)
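The instar rule above moves each weight vector a fraction α of the way toward the presented input, which is how it can be used for clustering. A minimal sketch; the cluster data and learning rate are illustrative assumptions:

```python
import numpy as np

def instar_update(w, x, alpha=0.2):
    """Instar (Grossberg) rule: w(k+1) = w(k) + alpha * (x(k) - w(k))."""
    return w + alpha * (x - w)

# Illustrative clustering behaviour: the weight vector drifts toward
# the centre of a tight input cluster around (1, -1).
rng = np.random.default_rng(2)
w = np.zeros(2)
cluster = rng.normal(loc=[1.0, -1.0], scale=0.05, size=(200, 2))
for x in cluster:
    w = instar_update(w, x)
```

After repeated presentations, w is an exponentially weighted average of the inputs and sits near the cluster centre.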

Page 17:

First order gradient method

Back propagation

Page 18:

Multilayer Perceptron

R – S1 – S2 – S3 Network

Page 19:

Example

Page 20:

Elementary Decision Boundaries

Page 21:

Elementary Decision Boundaries

Page 22:

Total Network

Page 23:

Function Approximation Example

Page 24:

Nominal Response

[Figure: nominal network response; input axis from −2 to 2, output axis from −1 to 3]

Page 25:

Parameter Variations

Page 26:

Multilayer Network

Page 27:

Performance Index

Page 28:

Chain Rule

Page 29:

Gradient Calculation

Page 30:

Steepest Descent

Page 31:

Jacobian Matrix

Page 32:

Backpropagation (Sensitivities)

Page 33:

Initialization (Last Layer)

Page 34:

Summary

Page 35:

• Back-propagation training algorithm

• Backprop adjusts the weights of the NN in order to minimize the network's total mean squared error.

• Forward step: network activation
• Backward step: error propagation
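The forward and backward steps can be sketched for a 1-2-1 network (log-sigmoid hidden layer, linear output). A minimal sketch of one iteration; the initial weights, input, target, and learning rate below are illustrative assumptions:

```python
import numpy as np

def logsig(n):
    """Log-sigmoid transfer function; its derivative is a*(1-a)."""
    return 1.0 / (1.0 + np.exp(-n))

# Illustrative 1-2-1 network parameters
W1 = np.array([[-0.27], [-0.41]]); b1 = np.array([[-0.48], [-0.13]])
W2 = np.array([[0.09, -0.17]]);    b2 = np.array([[0.48]])
alpha = 0.1

p = np.array([[1.0]])   # illustrative input
t = np.array([[1.0]])   # illustrative target

# Forward step: network activation
a1 = logsig(W1 @ p + b1)
a2 = W2 @ a1 + b2
e = t - a2

# Backward step: propagate sensitivities s^m = dF/dn^m from output to input
s2 = -2 * 1.0 * e                                   # last layer: purelin, f' = 1
s1 = np.diag((a1 * (1 - a1)).ravel()) @ W2.T @ s2   # hidden layer: f' = a1*(1-a1)

# Steepest-descent weight update
W2 = W2 - alpha * s2 @ a1.T; b2 = b2 - alpha * s2
W1 = W1 - alpha * s1 @ p.T;  b1 = b1 - alpha * s1
```

Repeating the forward pass with the updated weights gives a smaller squared error on this input, which is the behaviour the summary describes.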

Page 36:

Example: Function Approximation

Page 37:

Network

Page 38:

Initial Conditions

Page 39:

Forward Propagation

Page 40:

Transfer Function Derivatives

Page 41:

Backpropagation

Page 42:

Weight Update

Page 43:

Choice of Architecture

Page 44:

Choice of Network Architecture

Page 45:

Convergence: global minimum (left), local minimum (right)

Page 46:

Generalization

Page 47:

Disadvantages of the BP algorithm

• Slow convergence speed
• Sensitivity to initial conditions
• Trapped in local minima
• Instability if the learning rate is too large

• Note: despite the above disadvantages, it is widely used in the control community. There are numerous extensions that improve the BP algorithm.

Page 48:

Improved BP algorithms (first order gradient methods)

1. BP with momentum

2. Delta-bar-delta

3. Decoupled momentum

4. RProp

5. Adaptive BP

6. Trinary BP

7. BP with adaptive gain

8. Extended BP
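The first variant, BP with momentum, filters the weight updates so the trajectory smooths out oscillations. A minimal sketch of the update Δw(k) = γΔw(k−1) − (1−γ)α∇F(w(k)); the quadratic test function, momentum coefficient γ, and learning rate are illustrative assumptions:

```python
import numpy as np

def momentum_step(w, grad, dw_prev, alpha=0.1, gamma=0.9):
    """BP with momentum: dw(k) = gamma*dw(k-1) - (1-gamma)*alpha*grad(k).
    The previous update acts as a low-pass filter on the gradient steps."""
    dw = gamma * dw_prev - (1 - gamma) * alpha * grad
    return w + dw, dw

# Illustrative: descend F(w) = w^2, whose gradient is 2w
w, dw = np.array([2.0]), np.array([0.0])
for _ in range(300):
    w, dw = momentum_step(w, 2 * w, dw)
```

The iterate spirals into the minimum at w = 0; with γ = 0 the rule reduces to plain steepest descent.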