
Page 1: Neural Networks

Neural Networks

Machine Learning; Thu May 21, 2008

Pages 2-4: Neural Networks

Motivation

Problem: With our linear methods, we can train the weights but not the basis functions:

[Figure: a linear model y(x, w) = f( Σ_j w_j φ_j(x) ), where f is the activator, the w_j are trainable weights, and the φ_j are fixed basis functions.]

Can we learn the basis functions?

Maybe reuse ideas from our previous technology? Adjustable weight vectors?
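To make the starting point concrete, here is a minimal sketch of such a fixed-basis linear model (Python/NumPy; the Gaussian basis, the toy data, and all names are illustrative assumptions, not taken from the slides). Only the weights w are fitted; the basis functions φ_j are chosen by hand and never change.

    import numpy as np

    def gaussian_basis(x, centers, width=0.5):
        """Fixed Gaussian basis functions phi_j(x); chosen up front, not learned."""
        return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

    # Toy 1-D regression data.
    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(50)

    # Design matrix with a bias column; training only adjusts the weights w.
    Phi = np.hstack([np.ones((50, 1)), gaussian_basis(x, centers=np.linspace(0, 1, 9))])
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)
    y = Phi @ w   # y(x, w) = sum_j w_j phi_j(x)

Learning the basis functions themselves, rather than fixing them up front, is exactly what the hidden layer of a neural network adds.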

Page 5: Neural Networks

Historical notes

This motivation is not how the technology came about!

Neural networks were developed in an attempt to achieve AI by modeling the brain (but artificial neural networks have very little to do with biological neural networks).

Pages 6-7: Neural Networks

Feed-forward neural network

Two levels of “linear regression” (or classification): a hidden layer followed by an output layer.

[Figure: network diagram showing the input, hidden, and output layers.]
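A minimal sketch of feed-forward evaluation for such a two-layer network, assuming tanh hidden units and linear output units (the function and variable names are mine, not from the slides):

    import numpy as np

    def forward(x, W1, b1, W2, b2):
        """Feed-forward pass of a two-layer network.

        Hidden layer: a = W1 x + b1, z = tanh(a)  (adaptive 'basis functions')
        Output layer: y = W2 z + b2               (linear activation)
        Returns the output y and the hidden values z, which backprop needs later.
        """
        a = W1 @ x + b1
        z = np.tanh(a)
        y = W2 @ z + b2
        return y, z

The hidden values z play the role of the basis functions φ_j(x) from the linear model, except that the first-layer weights W1, and hence the basis functions themselves, are now trainable.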

Pages 8-12: Neural Networks

Example

tanh activation function for 3 hidden layers

Linear activation function for output layer
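As a concrete reading of this configuration (taking it to mean three tanh hidden units feeding a single linear output, which is my assumption; the weights here are random rather than trained):

    import numpy as np

    rng = np.random.default_rng(1)
    D, M, K = 1, 3, 1                        # input dim, tanh hidden units, linear outputs
    W1, b1 = rng.standard_normal((M, D)), np.zeros(M)
    W2, b2 = rng.standard_normal((K, M)), np.zeros(K)

    x = np.array([0.3])
    y = W2 @ np.tanh(W1 @ x + b1) + b2       # tanh hidden layer, linear output layer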

Pages 13-18: Neural Networks

Network training

If we can give the network a probabilistic interpretation, we can use our “standard” estimation methods for training.

Regression problems: typically a Gaussian noise model for the targets, so maximum likelihood corresponds to minimizing the sum-of-squares error E(w) = (1/2) Σ_n ||y(x_n, w) - t_n||².

Classification problems: typically a logistic-sigmoid output with a Bernoulli likelihood, so maximum likelihood corresponds to minimizing the cross-entropy error E(w) = -Σ_n { t_n ln y_n + (1 - t_n) ln(1 - y_n) }.

Training by minimizing an error function.

Typically, we cannot minimize analytically but need numerical optimization algorithms.

Notice: There will typically be more than one minimum of the error function (due to symmetries).

Notice: At best, we can find local minima.
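A sketch of the sum-of-squares error as the function we would hand to a numerical optimizer (assuming the tanh-hidden, linear-output network from the earlier sketches; the names are illustrative):

    import numpy as np

    def sum_of_squares_error(X, T, W1, b1, W2, b2):
        """E(w) = (1/2) * sum_n ||y(x_n, w) - t_n||^2 for a tanh-hidden, linear-output net."""
        E = 0.0
        for x, t in zip(X, T):
            y = W2 @ np.tanh(W1 @ x + b1) + b2   # feed-forward for one data point
            E += 0.5 * np.sum((y - t) ** 2)
        return E

This is the function minimized with respect to all weights and biases; different starting points can end up in different local minima.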

Pages 19-20: Neural Networks

Parameter optimization

Numerical optimization is beyond the scope of this class – you can (and should) get away with just using appropriate libraries such as GSL.

These libraries can typically find (local) minima of general functions, but the most efficient algorithms also need the gradient of the function being minimized.
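The slides point to GSL (a C library); as a Python analogue of the same idea, here is a sketch using scipy.optimize.minimize on a stand-in error function. Without an explicit gradient the library falls back on finite differences; supplying the gradient (which back propagation computes, next slides) is what makes the efficient methods efficient.

    import numpy as np
    from scipy.optimize import minimize

    def error(w):
        """Toy stand-in for the network error E(w); in practice this would run feed-forward."""
        return np.sum((w - np.array([1.0, -2.0])) ** 2)

    # BFGS needs the gradient; with no jac= argument it estimates it by finite differences.
    result = minimize(error, x0=np.zeros(2), method="BFGS")
    print(result.x)   # a (local) minimum of the error function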

Pages 21-23: Neural Networks

Back propagation

Back propagation is an efficient algorithm for computing both the error function and the gradient.

For i.i.d. data, the error function decomposes into a sum over data points, E(w) = Σ_n E_n(w), so we just focus on a single term E_n and its gradient.

Page 24: Neural Networks

Back propagation

After running the feed-forward algorithm, we know the values z_i and y_k; we then need to compute the δs (called errors).

Page 25: Neural Networks

Back propagation

The choice of error function and output activator typically makes it simple to deal with the output layer; for a sum-of-squares error with linear outputs, for example, the output errors are simply δ_k = y_k - t_k.

Page 26: Neural Networks

Back propagation

For hidden layers, we apply the chain rule: δ_j = h'(a_j) Σ_k w_kj δ_k, where the sum runs over the units k that unit j feeds into.
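A sketch of both steps for one data point, again assuming the tanh-hidden, linear-output network with a sum-of-squares error (function and variable names are mine):

    import numpy as np

    def backprop_single(x, t, W1, b1, W2, b2):
        """Error and gradient for one data point: sum-of-squares, tanh hidden, linear output."""
        # Feed-forward, keeping the hidden values z for the backward pass.
        z = np.tanh(W1 @ x + b1)
        y = W2 @ z + b2

        # Output-layer errors: simple thanks to the error/activation pairing.
        delta_out = y - t                                # delta_k = y_k - t_k
        # Hidden-layer errors via the chain rule; h'(a) = 1 - tanh(a)^2 = 1 - z^2 for tanh.
        delta_hid = (1.0 - z ** 2) * (W2.T @ delta_out)  # delta_j = h'(a_j) sum_k w_kj delta_k

        # Each weight's derivative is (delta at its output end) times (value at its input end).
        grad_W2, grad_b2 = np.outer(delta_out, z), delta_out
        grad_W1, grad_b1 = np.outer(delta_hid, x), delta_hid

        E = 0.5 * np.sum((y - t) ** 2)
        return E, (grad_W1, grad_b1, grad_W2, grad_b2)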

Pages 27-34: Neural Networks

Calculating error and gradient

Algorithm

1. Apply feed forward to calculate hidden and output variables and then the error.

2. Calculate output errors and propagate errors back to get the gradient.

3. Iterate ad lib.
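Putting the three steps into one loop, a sketch of plain gradient-descent training on a toy 1-D regression problem (plain gradient descent rather than a library optimizer, just to keep the loop explicit; the learning rate, iteration count, and data are arbitrary choices of mine):

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.linspace(-1, 1, 30).reshape(-1, 1)
    T = np.sin(np.pi * X) + 0.1 * rng.standard_normal(X.shape)

    M = 3                                       # tanh hidden units
    W1, b1 = 0.5 * rng.standard_normal((M, 1)), np.zeros(M)
    W2, b2 = 0.5 * rng.standard_normal((1, M)), np.zeros(1)
    eta = 0.01                                  # learning rate (arbitrary)

    for it in range(2000):                      # 3. iterate ad lib
        gW1, gb1 = np.zeros_like(W1), np.zeros_like(b1)
        gW2, gb2 = np.zeros_like(W2), np.zeros_like(b2)
        E = 0.0
        for x, t in zip(X, T):
            # 1. Feed-forward: hidden and output variables, then the error.
            z = np.tanh(W1 @ x + b1)
            y = W2 @ z + b2
            E += 0.5 * np.sum((y - t) ** 2)
            # 2. Output errors, propagated back to give the gradient.
            d_out = y - t
            d_hid = (1 - z ** 2) * (W2.T @ d_out)
            gW2 += np.outer(d_out, z)
            gb2 += d_out
            gW1 += np.outer(d_hid, x)
            gb1 += d_hid
        # Gradient-descent step on the summed gradient.
        W1 -= eta * gW1
        b1 -= eta * gb1
        W2 -= eta * gW2
        b2 -= eta * gb2

    print("final training error:", E)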

Page 35: Neural Networks

Overfitting

The complexity of the network is determined by the size of the hidden layer – and overfitting is still a problem.

Use, e.g., a separate test dataset to alleviate this problem.

Page 36: Neural Networks

Early stopping

[Figure: error on the training data and on the test data as a function of training iterations.]

...or just stop the training when overfitting starts to occur.
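A sketch of early stopping wired into a training loop: a held-out test set is monitored after each gradient step, and the weights from the iteration with the lowest test error are kept (the data split, learning rate, network size, and names are all illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(0)
    x_all = np.linspace(-1, 1, 60)
    t_all = np.sin(np.pi * x_all) + 0.2 * rng.standard_normal(60)
    x_tr, t_tr = x_all[0::2], t_all[0::2]        # training data
    x_te, t_te = x_all[1::2], t_all[1::2]        # test data (never used for the gradient)

    M = 20                                        # deliberately flexible hidden layer
    W1, b1 = rng.standard_normal((M, 1)), np.zeros((M, 1))
    W2, b2 = rng.standard_normal((1, M)), np.zeros((1, 1))

    def test_error():
        z = np.tanh(W1 @ x_te[None, :] + b1)
        y = W2 @ z + b2
        return 0.5 * np.sum((y - t_te[None, :]) ** 2)

    best_error, best_weights = np.inf, None
    for it in range(3000):
        # One gradient-descent step on the training error (feed-forward + back propagation).
        z = np.tanh(W1 @ x_tr[None, :] + b1)
        y = W2 @ z + b2
        d_out = y - t_tr[None, :]
        d_hid = (1 - z ** 2) * (W2.T @ d_out)
        W2 -= 0.001 * (d_out @ z.T)
        b2 -= 0.001 * d_out.sum(axis=1, keepdims=True)
        W1 -= 0.001 * (d_hid @ x_tr[None, :].T)
        b1 -= 0.001 * d_hid.sum(axis=1, keepdims=True)

        # Early stopping: remember the weights with the lowest test error so far
        # (or simply break out of the loop once the test error stops improving).
        e_te = test_error()
        if e_te < best_error:
            best_error = e_te
            best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())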

Page 37: Neural Networks

Summary

● Neural networks as general regression/classification models

● Feed-forward evaluation

● Maximum-likelihood framework

● Error back-propagation to obtain the gradient