Optimizing number of hidden neurons in neural networks

Janusz A. Starzyk, School of Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, U.S.A.
IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, February 2007


Page 1: Optimizing number of hidden neurons in neural networks


Optimizing number of hidden neurons in neural networks

Janusz A. Starzyk
School of Electrical Engineering and Computer Science
Ohio University, Athens, Ohio, U.S.A.

IASTED International Conference on Artificial Intelligence and Applications, Innsbruck, Austria, February 2007

Page 2: Optimizing number of hidden neurons in neural networks


Outline

Neural networks – multi-layer perceptron
Overfitting problem
Signal-to-noise ratio figure (SNRF)
Optimization using signal-to-noise ratio figure
Experimental results
Conclusions

Page 3: Optimizing number of hidden neurons in neural networks


Neural networks – multi-layer perceptron (MLP)

Hidden layer: $y_1 = W_1 x$, $z_1 = f(y_1)$
Output layer: $y_2 = W_2 z_1$, $z_2 = f(y_2)$

Inputs x → Outputs z
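The layer equations can be sketched numerically; a minimal forward pass with a tanh activation and random weights (illustrative choices, not the talk's configuration):

```python
import numpy as np

def mlp_forward(x, W1, W2, f=np.tanh):
    """Two-layer MLP: y1 = W1 x, z1 = f(y1); y2 = W2 z1, z2 = f(y2)."""
    z1 = f(W1 @ x)   # hidden layer output
    z2 = f(W2 @ z1)  # network output
    return z2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 2))  # 2 inputs -> 4 hidden neurons
W2 = rng.normal(size=(1, 4))  # 4 hidden neurons -> 1 output
z = mlp_forward(np.array([0.5, -1.0]), W1, W2)
print(z.shape)  # (1,)
```

The number of rows of W1 is the number of hidden neurons, which is exactly the quantity the rest of the talk optimizes.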

Page 4: Optimizing number of hidden neurons in neural networks


Neural networks – multi-layer perceptron (MLP)

Efficient mapping from inputs to outputs

Powerful universal function approximation

Number of inputs and outputs determined by the data

Number of hidden neurons: determines the fitting accuracy and is therefore critical.

[Figure: MLP block mapping inputs to outputs; plot of training data and the MLP's function approximation]

Page 5: Optimizing number of hidden neurons in neural networks


Overfitting problem

Generalization: the trained model's ability to predict well on new data (x') not seen during training.

Overfitting: the model overestimates the function complexity, which degrades generalization capability.

Bias/variance dilemma: excessive hidden neurons lead to overfitting.

Training data (x, y) → MLP training → Model; new data x' → Model → prediction y'

[Figure: training data, desired function, overfitted function, testing set, and desired vs. predicted values for new data]

Page 6: Optimizing number of hidden neurons in neural networks


Overfitting problem

Avoid overfitting: cross-validation & early stopping

All available data (x, y) is split into training data (x, y) and testing data (x', y').

[Figure: training error e_train and testing error e_test vs. number of hidden neurons; the optimum number lies at the minimum of e_test]

Stopping criterion: e_test starts to increase, or e_train and e_test start to diverge.
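The cross-validation stopping rule can be sketched as a scan over the testing-error curve; the error values below are illustrative, not taken from the slides:

```python
def stopping_point(e_test):
    """Index at which the testing error starts to increase:
    the classic cross-validation / early-stopping criterion."""
    for k in range(1, len(e_test)):
        if e_test[k] > e_test[k - 1]:
            return k - 1          # last size before e_test turned up
    return len(e_test) - 1

# Illustrative error curves vs. number of hidden neurons:
# e_train keeps falling while e_test turns up past the optimum.
e_train = [0.9, 0.5, 0.3, 0.20, 0.15, 0.12]
e_test  = [1.0, 0.6, 0.4, 0.35, 0.45, 0.60]
print(stopping_point(e_test))  # 3
```

The slide's objection applies directly: this rule needs a held-out test set, and the next slides ask whether that data could be kept for training instead.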

Page 7: Optimizing number of hidden neurons in neural networks


Overfitting problem

How to divide available data?

All available data (x, y) is split into training data (x, y) and testing data (x', y').

[Figure: fitting error (e_train, e_test) vs. number of hidden neurons, with the optimum number marked]

When should training stop? The data set aside for testing is wasted for training. And can the test error reliably track the generalization error?

Page 8: Optimizing number of hidden neurons in neural networks


Overfitting problem

Desired:

•A quantitative measure of the unlearned useful information in e_train

•Automatic recognition of overfitting

[Figures: three fitting examples on training and testing data: an overfitted function, a cubic fit, and a well-fitted function, each shown with desired and predicted values on new data]

Page 9: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF)

Sampled data = function value + noise, so the error signal has an approximation error component and a noise component.
The noise part should not be learned; the useful signal should be reduced.
Assumptions: the underlying function is continuous and the noise is white Gaussian noise (WGN).
Signal-to-noise ratio figure (SNRF) = signal energy / noise energy.
Compare SNRF_e with SNRF_WGN.
When should learning stop? Continue if useful signal is left unlearned; stop if noise dominates the error signal.

Page 10: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF)– one-dimensional case

[Figure: training data with a quadratic approximating function, and the resulting error signal]

$e_i = s_i + n_i, \quad i = 1, 2, \dots, N$

The error is an approximation error component plus a noise component. How can the level of these two components be measured?

Page 11: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF) – one-dimensional case

With $e_i = s_i + n_i$, the error energy decomposes as $E_e = E_s + E_n$.

Neighboring samples of the useful signal are highly correlated, while WGN samples are uncorrelated: $C(n_i, n_{i+1}) \approx 0$. The energies can therefore be estimated from sample correlations:

$E_e = C(e_i, e_i) = \sum_{i=1}^{N} e_i^2$

$E_s = C(e_i, e_{i+1})$

$E_n = E_e - E_s = C(e_i, e_i) - C(e_i, e_{i+1})$

Page 12: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF) – one-dimensional case

$SNRF_e = \dfrac{E_s}{E_n} = \dfrac{C(e_i, e_{i+1})}{C(e_i, e_i) - C(e_i, e_{i+1})}$

For pure WGN:

$SNRF_{WGN} = \dfrac{C(n_i, n_{i+1})}{C(n_i, n_i) - C(n_i, n_{i+1})}$

$\mu_{SNRF\_WGN}(N) \approx 0, \quad \sigma_{SNRF\_WGN}(N) = \dfrac{1}{\sqrt{N}}$
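A minimal sketch of computing SNRF_e from an error signal, following the correlation estimates above (it uses the first N-1 neighbor pairs; the exact boundary handling is an assumption of this sketch):

```python
import numpy as np

def snrf_1d(e):
    """SNRF_e = C(e_i, e_{i+1}) / (C(e_i, e_i) - C(e_i, e_{i+1}))."""
    e = np.asarray(e, dtype=float)
    c_self = np.dot(e, e)           # C(e_i, e_i) = sum of e_i^2
    c_next = np.dot(e[:-1], e[1:])  # C(e_i, e_{i+1}): neighbor correlation
    return c_next / (c_self - c_next)

rng = np.random.default_rng(1)
n = rng.normal(size=2**16)             # pure WGN: SNRF near 0, spread ~ 1/sqrt(N)
s = np.sin(np.linspace(0, 20, 2**16))  # smooth signal: SNRF large
print(snrf_1d(n), snrf_1d(s))
```

A smooth residual has strongly correlated neighbors, driving the ratio up; a white-noise residual has nearly uncorrelated neighbors, driving it toward zero.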

Page 13: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF) – one-dimensional case

[Figure: histogram of SNRF for WGN with 2^16 samples; mean ≈ 0, standard deviation ≈ 0.0039]

Hypothesis test at the 5% significance level:

$th_{SNRF\_WGN}(N) = \mu_{SNRF\_WGN}(N) + 1.7\,\sigma_{SNRF\_WGN}(N) \approx \dfrac{1.7}{\sqrt{N}}$
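The threshold can be checked empirically; this sketch draws many WGN error signals, confirms the spread of SNRF_WGN is about 1/sqrt(N), and verifies that roughly 5% of draws exceed the 1.7/sqrt(N) threshold (all names here are local to the sketch):

```python
import numpy as np

def snrf_1d(e):
    c_self = np.dot(e, e)
    c_next = np.dot(e[:-1], e[1:])
    return c_next / (c_self - c_next)

rng = np.random.default_rng(0)
N = 4096
samples = [snrf_1d(rng.normal(size=N)) for _ in range(500)]

std = float(np.std(samples))
threshold = 1.7 / np.sqrt(N)   # th = mu + 1.7 * sigma, with mu ~ 0
exceed = float(np.mean(np.array(samples) > threshold))
print(std, 1 / np.sqrt(N), exceed)
```

The measured spread should track 1/sqrt(N) = 1/64 ≈ 0.0156, and only a few percent of the WGN draws should fall above the one-sided 1.7-sigma threshold.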

Page 14: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF) – multi-dimensional case

Signal and noise levels are estimated within a neighborhood of each sample. For sample p with error $e_p$ and its M nearest neighbors $e_{pi}$:

$E_{sp} = e_p \sum_{i=1}^{M} w_{pi}\, e_{pi}, \quad p = 1, 2, \dots, N$

with inverse-distance weights

$w_{pi} = \dfrac{1/d_{pi}}{\sum_{i=1}^{M} 1/d_{pi}}, \quad i = 1, 2, \dots, M$

where $d_{pi}$ is the distance from sample p to its i-th nearest neighbor.

Page 15: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF) – multi-dimensional case

Summing over all samples:

$E_s = \sum_{p=1}^{N} E_{sp} = \sum_{p=1}^{N} \sum_{i=1}^{M} w_{pi}\, e_p\, e_{pi}$

$E_n = E_e - E_s = \sum_{i=1}^{N} e_i^2 - \sum_{p=1}^{N} \sum_{i=1}^{M} w_{pi}\, e_p\, e_{pi}$

$SNRF_e = \dfrac{E_s}{E_n} = \dfrac{\sum_{p=1}^{N} \sum_{i=1}^{M} w_{pi}\, e_p\, e_{pi}}{\sum_{i=1}^{N} e_i^2 - \sum_{p=1}^{N} \sum_{i=1}^{M} w_{pi}\, e_p\, e_{pi}}$

Page 16: Optimizing number of hidden neurons in neural networks


Signal-to-noise ratio figure (SNRF) – multi-dimensional case

For pure WGN the same expression gives

$\mu_{SNRF\_WGN}(N) \approx 0, \quad \sigma_{SNRF\_WGN}(N) = \dfrac{\sqrt{2}}{\sqrt{N}}$

$th_{SNRF\_WGN}(N) = \mu_{SNRF\_WGN}(N) + 1.2\,\sigma_{SNRF\_WGN}(N)$

With M = 1, the multi-dimensional threshold $1.2\sqrt{2}/\sqrt{N} \approx 1.7/\sqrt{N}$ approximately equals the one-dimensional threshold.
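A brute-force sketch of the multi-dimensional SNRF_e with M nearest neighbors and the inverse-distance weights defined above (the test signals are stand-ins, not data from the talk):

```python
import numpy as np

def snrf_md(X, e, M=1):
    """Multi-dimensional SNRF: signal energy estimated from each
    sample's M nearest neighbors with inverse-distance weights."""
    X = np.asarray(X, dtype=float)
    e = np.asarray(e, dtype=float)
    N = len(e)
    E_s = 0.0
    for p in range(N):
        d = np.linalg.norm(X - X[p], axis=1)
        d[p] = np.inf                        # exclude the sample itself
        idx = np.argsort(d)[:M]              # M nearest neighbors
        w = (1 / d[idx]) / np.sum(1 / d[idx])
        E_s += e[p] * np.sum(w * e[idx])     # E_sp = e_p * sum_i w_pi * e_pi
    E_e = np.dot(e, e)
    return E_s / (E_e - E_s)

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(400, 2))
smooth = np.sin(3 * X[:, 0]) * np.cos(3 * X[:, 1])  # spatially correlated residual
noise = rng.normal(size=400)                        # WGN residual
print(snrf_md(X, smooth), snrf_md(X, noise))
```

As in the one-dimensional case, a spatially smooth residual yields a large ratio while a WGN residual stays near zero, within about sqrt(2/N) of it.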

Page 17: Optimizing number of hidden neurons in neural networks


Optimization using SNRF

When noise dominates the error signal, little useful information is left unlearned and learning should stop: SNRF_e < threshold of SNRF_WGN.

Procedure: start with a small network; train the MLP and obtain e_train; compare SNRF_e with the SNRF_WGN threshold; if SNRF_e is still above the threshold, add hidden neurons and repeat.

Stopping criterion: SNRF_e < th_SNRF_WGN.
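The loop can be sketched as follows; `train_mlp` here is a simulated stand-in for training an MLP with k hidden neurons and returning its training-error signal, so the example runs without a real network:

```python
import numpy as np

def snrf_1d(e):
    c_self = np.dot(e, e)
    c_next = np.dot(e[:-1], e[1:])
    return c_next / (c_self - c_next)

def optimize_hidden_neurons(train_mlp, N, k_max=50):
    """Grow the hidden layer until SNRF_e drops below the WGN threshold."""
    threshold = 1.7 / np.sqrt(N)   # th_SNRF_WGN(N)
    for k in range(1, k_max + 1):
        e_train = train_mlp(k)     # error signal on the training data
        if snrf_1d(e_train) < threshold:
            return k               # noise dominates the error: stop here
    return k_max

# Simulated training: the unlearned signal shrinks as neurons are added,
# leaving essentially pure noise beyond 10 neurons (illustrative only).
rng = np.random.default_rng(3)
x = np.linspace(0, 10, 1024)
def train_mlp(k):
    residual = np.sin(x) * max(0.0, 1 - k / 10)
    return residual + 0.1 * rng.normal(size=x.size)

k_opt = optimize_hidden_neurons(train_mlp, x.size)
print(k_opt)
```

Note that the loop never consults a test set: the stopping decision is read off the training-error signal alone, which is the point of the method.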

Page 18: Optimizing number of hidden neurons in neural networks


Optimization using SNRF

Set the structure of MLP Train the MLP with back-propagation iteration

etrain

Compare SNRFe & SNRFWGN

Keep training with more iterations

Applied in optimizing number of iterations in back-propagation training to avoid overfitting

(overtraining)

Page 19: Optimizing number of hidden neurons in neural networks


Experimental results

Optimizing number of iterations

[Figures: testing performance (testing data vs. approximated values) after 10 and after 200 iterations, fitting noise-corrupted 0.4 sin(x) + 0.5]

Page 20: Optimizing number of hidden neurons in neural networks


Optimization using SNRF

Optimizing order of polynomial

[Figures: training data, testing data and desired function; training error, testing error, generalization error and SNRF with its stopping threshold vs. order of the fitting polynomial]

Page 21: Optimizing number of hidden neurons in neural networks


Experimental results

Optimizing the number of hidden neurons: two-dimensional function

[Figures: training data from the two-dimensional function; SNRF of the error signal with its threshold, and training/testing MSE, vs. number of hidden neurons]

Page 22: Optimizing number of hidden neurons in neural networks


Experimental results

[Figures: difference between the desired function and the approximating function using 25 neurons vs. 35 neurons]

Page 23: Optimizing number of hidden neurons in neural networks


Experimental results

Mackey-Glass time-series database: an MLP predicts the following sample from every 7 consecutive samples.
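The sliding-window setup can be sketched as follows; the sine series is a stand-in for the actual Mackey-Glass data:

```python
import numpy as np

def make_windows(series, width=7):
    """Each input row holds `width` consecutive samples;
    the target is the sample that follows the window."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + width] for i in range(len(series) - width)])
    y = series[width:]
    return X, y

t = np.arange(100)
X, y = make_windows(np.sin(0.3 * t), width=7)
print(X.shape, y.shape)  # (93, 7) (93,)
```

Each (window, next-sample) pair then serves as one training example for the MLP, whose hidden-layer size is selected with the SNRF criterion.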

[Figures: (a) training and testing MSE, and (b) SNRF of the error signal with its threshold, vs. number of hidden neurons]

Page 24: Optimizing number of hidden neurons in neural networks


Experimental results

[Figures: error signal obtained in OAA, and its autocorrelation]

The remaining error signal shows WGN characteristics: its autocorrelation is negligible at nonzero lags.

Page 25: Optimizing number of hidden neurons in neural networks


Experimental results

Puma robot arm dynamics database: an MLP maps 8 inputs (positions, velocities, torques) to angular acceleration.

[Figures: SNRF of the error signal with its threshold, and training/testing MSE with a 6th-degree polynomial fit, vs. number of hidden neurons]

Page 26: Optimizing number of hidden neurons in neural networks


Conclusions

A quantitative criterion based on SNRF to optimize the number of hidden neurons in an MLP.
Overfitting is detected from the training error alone; no separate test set is required.
The criterion is simple, easy to apply, efficient and effective.
It extends to optimizing other parameters of neural networks and to other fitting problems.