Radial Basis Functions
Learning from Data: Adaptive Basis Functions
Amos Storkey, School of Informatics
November 21, 2005
http://www.anc.ed.ac.uk/~amos/lfd/


Page 1

Radial Basis Functions

Learning from Data: Adaptive Basis Functions

Amos Storkey, School of Informatics

November 21, 2005

http://www.anc.ed.ac.uk/∼amos/lfd/

Page 2

Neural Networks

- Hidden-to-output layer: a linear parameter model.
- But we adapt the "features" of the model.
- Neural network features pick out particular directions in input space.
- But we could use other features, e.g. localisation features.

Page 3

Radial Basis Functions

- Radial basis functions are also linear parameter models.
- They have localised features.
- But so far we have only considered fixed basis functions.
- Instead we could adapt the basis functions, as we do with neural networks.
- The rest is just the same.

Page 4

End of Lecture

- Well, pretty much.
- But for completeness we can reiterate the process:
- Compare with neural networks.
- Some pictures.
- Error functions.

Page 5

Neural Networks

- The output of a node is a nonlinear function of a linear combination of its parents.
- In other words, it is a function of a projection onto a particular direction:

y_i = g_i\Bigl(\sum_j w_{ij} x_j + \mu_i\Bigr)
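The projection feature in this equation can be sketched in a few lines. This is a minimal illustration, not the lecture's code; the function name `nn_unit` and the choice of tanh for g_i are assumptions:

```python
import numpy as np

def nn_unit(x, w, mu, g=np.tanh):
    """One neural-network hidden unit: a nonlinear function g of the
    projection of input x onto direction w, plus a bias mu.
    (tanh is an illustrative choice for g; the slides leave g generic.)"""
    return g(np.dot(w, x) + mu)

# The unit responds to position along w, not to distance from a point:
# inputs with the same projection onto w get the same output.
y = nn_unit(np.array([0.5, -0.5]), w=np.array([1.0, 1.0]), mu=0.0)
```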

Page 6

Radial Basis Function

- The output of a node is a nonlinear function of the distance of the input from a particular point.
- The nonlinear function is usually decaying: hence it is a local model.

y(\mathbf{x}, \boldsymbol{\theta}) = \sum_i w_i \, \phi_i(\mathbf{x}, \mathbf{b}_i)

- φ_i has parameters b_i, and for radial basis functions it is generally a function of |x − r_i| for some centre r_i ∈ b_i.
- Of course, all the sums work for anything of this form, including radial basis functions, neural networks, etc.
- Call the general form adaptive basis functions. The only requirement is differentiability with respect to the parameters.
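A sketch of this form, assuming a Gaussian choice for φ_i (the slides only require a decaying function of distance; all names here are illustrative):

```python
import numpy as np

def rbf_output(x, centres, widths, weights):
    """y(x) = sum_i w_i * phi_i(x, b_i), where phi_i decays with the
    distance |x - r_i| from centre r_i. Gaussian phi is an assumption."""
    d2 = np.sum((centres - x) ** 2, axis=1)    # squared distance to each centre
    phi = np.exp(-d2 / (2.0 * widths ** 2))    # localised (decaying) basis responses
    return weights @ phi
```

Because each φ_i decays with distance, a basis centred near x contributes almost its full weight while far-away centres contribute almost nothing, which is what makes the model local.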

Page 7

Radial Basis Functions

[Figure: two surface plots of radial basis functions over inputs x(1), x(2) ∈ [−1, 1]]

Page 8

[Figure-only slide]

Page 9

Error Functions

- Regression: sum-squared error:

E_{\text{train}} = \sum_\mu \bigl(y^\mu - f(\mathbf{x}^\mu, \boldsymbol{\theta})\bigr)^2

- Classification:

E_{\text{train}} = -\sum_\mu \bigl(y^\mu \log f^\mu + (1 - y^\mu) \log(1 - f^\mu)\bigr)
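Both error functions are one-liners in code. A hedged sketch (the function names are my own, not from the lecture):

```python
import numpy as np

def sum_squared_error(y, f):
    """Regression error: E = sum over patterns mu of (y_mu - f_mu)^2."""
    return np.sum((y - f) ** 2)

def cross_entropy_error(y, f):
    """Classification error for binary targets y in {0, 1} and
    predicted probabilities f in (0, 1)."""
    return -np.sum(y * np.log(f) + (1 - y) * np.log(1 - f))
```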

Page 10

Regularisation and Initialisation

- Regularisation: the width of the basis functions determines smoothness. Ensure the basis width is not too small, to prevent overfitting, or use a validation set to set the basis width.
- Initialisation matters. It is best to try multiple restarts, as with neural networks. Given basis initialisations, good initialisations for the weights can be obtained in the regression case by treating the model as a linear parameter model and solving.
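The weight-initialisation trick for regression can be sketched as follows, assuming Gaussian bases with a shared width (the names and details are illustrative, not the lecture's code): with the centres fixed, the model is linear in the weights, so one least-squares solve gives a good starting point.

```python
import numpy as np

def init_weights(X, y, centres, width):
    """With basis centres fixed, the model is linear in the weights,
    so a least-squares solve gives a sensible weight initialisation
    before jointly optimising weights and basis parameters."""
    d2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    Phi = np.exp(-d2 / (2.0 * width ** 2))   # design matrix: one column per basis
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w
```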

Page 11

Optimisation

- We can calculate all the derivatives just as before.
- Gather all the parameters together into a single vector. Optimise using, e.g., conjugate gradients.
- Example code is in the lecture notes.
- Another approach for regression iterates between solving the linear parameter model and updating the basis functions (see notes).
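The lecture's example code lives in the notes; as an independent sketch, plain gradient descent over a single packed parameter vector, with a finite-difference gradient, stands in here for conjugate gradients (all names are assumptions):

```python
import numpy as np

def numerical_grad(f, theta, eps=1e-6):
    """Central-difference gradient of a scalar function f at theta."""
    g = np.zeros_like(theta)
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = eps
        g[i] = (f(theta + step) - f(theta - step)) / (2.0 * eps)
    return g

def optimise(f, theta0, lr=0.1, steps=200):
    """Minimise f over one packed parameter vector by gradient descent
    (a simple stand-in for the conjugate-gradient optimiser)."""
    theta = theta0.astype(float).copy()
    for _ in range(steps):
        theta = theta - lr * numerical_grad(f, theta)
    return theta
```

In practice the analytic derivatives (which the slides say can all be calculated as before) would replace the finite-difference gradient, and a conjugate-gradient routine would replace the fixed-step descent.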

Page 12

Comparison

- Radial basis functions give local models. Hence away from the data we get a prediction of zero. Does our data really tell us nothing about what happens in non-local regions? Even so, should we really predict zero?
- Both RBFs and MLPs are subject to local minima.
- Understanding the result is slightly easier for radial basis functions.

Page 13

Committees and Error Bars

- Getting a prediction is one thing, but what about prediction uncertainty?
- We can get some gauge of uncertainty by looking at the variation in predictions across different learnt models.
- Use committees.

Page 14

Committee Approach

- Pick a number of different models (they could even be different starting points of the same model).
- Predict using the average prediction of the models.
- Get a measure of confidence in the prediction by looking at the variance of each model's prediction around the average prediction.
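The committee steps above can be sketched as follows (a minimal illustration; the committee members are any callables that map an input to a prediction):

```python
import numpy as np

def committee_predict(models, x):
    """Average the member predictions; the variance of the members
    around that average gives a rough confidence measure."""
    preds = np.array([m(x) for m in models])
    mean = preds.mean()
    var = np.mean((preds - mean) ** 2)   # spread of members around the average
    return mean, var
```

When the members agree, the variance is small and we can be more confident in the averaged prediction; when they disagree, the variance flags the prediction as uncertain.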

Page 15

Better/Other Ways

- Bayesian methods: calculate the posterior distribution of the parameters, and use it to obtain error bars in data space.
- Take the limit of an infinite number of Bayesian neural networks: this gives Gaussian process models, where the nonlinear prediction and error-bar problem can be solved analytically.
- Look at the dependence of the prediction on the training data by resampling the training data: bootstrap and bagging.
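The resampling step behind bootstrap and bagging is simple to sketch (illustrative names, not from the lecture):

```python
import numpy as np

def bootstrap_resample(X, y, n_sets, seed=0):
    """Draw n_sets training sets by sampling with replacement; refitting
    the model on each set shows how strongly the prediction depends on
    the particular training data drawn."""
    rng = np.random.default_rng(seed)
    sets = []
    for _ in range(n_sets):
        idx = rng.integers(0, len(X), size=len(X))  # sample with replacement
        sets.append((X[idx], y[idx]))
    return sets
```

Fitting one model per resampled set and looking at the spread of their predictions plays the same role as the committee variance above.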
