ARTICLE IN PRESS
Neurocomputing 64 (2005) 537–541
0925-2312/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2004.11.027
*Corresponding author. Polytechnic School, University of Pernambuco, Rua Benfica, 455, Madalena, Recife - PE 50.750-410, Brazil. Tel.: +55 81 99764841; fax: +55 81 34137749.
E-mail address: [email protected] (A.L.I. Oliveira).
www.elsevier.com/locate/neucom
Letter
Improving constructive training of RBF networks through selective pruning and model selection
Adriano L.I. Oliveira (a,b,*), Bruno J.M. Melo (a), Silvio R.L. Meira (b)
(a) Polytechnic School, University of Pernambuco, Rua Benfica, 455, Madalena, Recife - PE 50.750-410, Brazil
(b) Center of Informatics, Federal University of Pernambuco, P.O. Box 7851, Cidade Universitaria, Recife - PE 50.732-970, Brazil
Received 26 October 2004; received in revised form 25 November 2004; accepted 28 November 2004
Communicated by R.W. Newcomb
Available online 19 January 2005
Abstract
This letter proposes a constructive training method for radial basis function networks. The
proposed method is an extension of the dynamic decay adjustment (DDA) algorithm, a fast
constructive algorithm for classification problems. The proposed method, which is based on
selective pruning and DDA model selection, aims to improve the generalization performance
of DDA without generating larger networks. Simulations using four image recognition
datasets from the UCI repository demonstrate the validity of the proposed method.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Neural network; RBF network; Model complexity; Classification
1. Introduction
The dynamic decay adjustment algorithm (DDA) is a fast algorithm for constructive training of radial basis function networks (RBFNs) and probabilistic neural networks (PNNs) [1,2]. DDA relies on two parameters, namely, θ+ and θ−, in order to decide about the introduction of RBF neurons in the network. Originally, it
was assumed that these parameters would not influence classification performance and therefore the use of their default values, θ+ = 0.4 and θ− = 0.1, was recommended for all datasets [1,2]. In contrast, we have observed that, for some datasets, the value of θ− considerably influences generalization performance [3]. To take advantage of this observation, we have proposed a method for improving RBF-DDA by carefully selecting the value of θ− [3]. This method has proved valuable for both classification problems [3] and novelty detection in time series [4]. In spite of its advantages, the method has one drawback: it generates much larger networks than RBF-DDA trained with the default parameters [3].

A recent extension to DDA has appeared in the literature with a different aim, namely, reducing the number of neurons generated by DDA [5]. The method, referred to as RBF-DDA with temporary neurons (RBF-DDA-T), introduces on-line pruning of neurons on each DDA training epoch [5]. We have attempted to integrate RBF-DDA-T with θ− selection; however, we have observed that the method severely prunes the networks for smaller values of θ−, thereby generating much smaller networks with heavily degraded performance. Conversely, RBF-DDA generates larger networks for smaller θ−, which, for some datasets, considerably improves performance [3,4].

This letter proposes an extension to RBF-DDA which combines selective pruning and parameter selection. We call this extension RBF-DDA-SP. In contrast to RBF-DDA-T, the method proposed here prunes only a portion of the neurons which cover only one training sample, and pruning is carried out only after the last epoch of DDA training.
2. The proposed method
The DDA algorithm builds RBF networks with one hidden layer for classification. The hidden neurons use Gaussian activation functions, R_i(x) = exp(−‖x − r_i‖² / σ_i²), where x is the input vector and ‖x − r_i‖ is the Euclidean distance between the input vector x and the center r_i; both r_i and σ_i are determined by DDA. Each output is computed as f(x) = Σ_{i=1..m} A_i · R_i(x), where m is the number of RBFs connected to that output unit and A_i is the weight of connection i [1,2]. There is one output unit for each class.

The DDA algorithm relies on two parameters in order to decide about the introduction of hidden RBF neurons in the networks [1,2]. One of the parameters is θ+, a positive threshold which must be overtaken by an activation of an RBF of the same class so that no new RBF is added. The other is θ−, a negative threshold, which is the upper limit for the activation of conflicting classes [1,2].

Each training epoch of DDA starts by setting A_i = 0.0 for all i. Next, each training sample x is considered by DDA. Let p_i^c denote an RBF neuron of class c already inserted in the network by DDA. During training, a new RBF is introduced in the network if there is no p_i^c such that R_i^c(x) ≥ θ+. In this case, the weight of the new neuron p_j^c is set to A_j = 1.0 and r_j = x; DDA also sets the value of σ_j automatically [1,2]. In contrast,
if some p_i^c satisfies R_i^c(x) ≥ θ+, the algorithm does not introduce a new neuron; instead, it increments the weight of that connection (from neuron p_i^c to the output unit), that is, A_i += 1.0. Therefore, in a trained RBF-DDA network, A_i gives the number of training samples covered by RBF neuron i with R_i(x) ≥ θ+. The DDA algorithm is executed over the training data until no change in the network occurs. In most problems this takes place in only four to five epochs of training [1,2].

The method proposed in this letter, called RBF-DDA-SP, firstly builds an RBF network using DDA. Subsequently, a percentage p of the neurons which cover only one training sample, that is, whose weight A_i = 1.0, are removed from the network. The neurons to be pruned are randomly selected from those which cover only one training sample. Thus, our method has two critical parameters, namely, p and θ−. These parameters can be selected via cross-validation for improved performance.

In our method, θ− is selected via cross-validation, starting with θ− = 0.1. Next, θ− is decreased by a factor of ten (θ− ← θ− × 10⁻¹). This is done because we have observed that performance does not change significantly for intermediate values of θ− [3]. θ− is decreased until the cross-validation error starts to increase, since smaller values lead to overfitting [3]. The near-optimal θ− found by this procedure is subsequently used to train using the complete training set [3,4].
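As an illustration, the RBF output computation and the selective-pruning step described above can be sketched as follows. This is an editorial sketch, not the authors' implementation; the data layout (a list of neuron records holding center r_i, width σ_i, weight A_i, and class) is assumed purely for illustration.

```python
import math
import random

def rbf_scores(x, neurons, n_classes):
    """Per-class outputs f_c(x) = sum_i A_i * exp(-||x - r_i||^2 / sigma_i^2),
    summing each Gaussian RBF into the output unit of its class."""
    scores = [0.0] * n_classes
    for n in neurons:
        d2 = sum((xj - rj) ** 2 for xj, rj in zip(x, n["center"]))
        scores[n["cls"]] += n["weight"] * math.exp(-d2 / n["sigma"] ** 2)
    return scores

def selective_prune(neurons, p, rng=random):
    """RBF-DDA-SP pruning step: after the last DDA epoch, remove a randomly
    chosen fraction p of the neurons whose weight A_i == 1.0, i.e. those
    covering a single training sample."""
    singles = [n for n in neurons if n["weight"] == 1.0]
    n_prune = int(round(p * len(singles)))
    doomed = {id(n) for n in rng.sample(singles, n_prune)}
    return [n for n in neurons if id(n) not in doomed]
```

In a full pipeline, θ− itself would be chosen by the cross-validation schedule described above: start at θ− = 0.1, shrink it by a factor of ten until the cross-validation error starts to rise, then retrain on the complete training set with the selected value.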
3. Experiments
The training method proposed in this letter was tested using four image recognition datasets available from the UCI machine learning repository [6]. The benchmark datasets used in the experiments were (classes, attributes, training samples, test samples): optdigits (10, 64, 3823, 1797), pendigits (10, 16, 7494, 3498), letter (26, 16, 15000, 5000), and satimage (6, 36, 4435, 2000).

Fig. 1 compares the proposed method (RBF-DDA-SP) with both the original RBF-DDA and RBF-DDA-T regarding the generalization performance (left graphics) and the corresponding number of hidden RBFs of the networks (right graphics) as a function of θ−. These results were obtained for the optdigits dataset. RBF-DDA-SP was trained pruning 50% of the neurons which covered only one training sample. The results for RBF-DDA-SP correspond to means over 10 runs of simulations.

Notice that for θ− = 0.1, the performance of the methods is similar, with a slight advantage for RBF-DDA. For smaller values of θ−, the generalization performance improves for both the proposed RBF-DDA-SP and RBF-DDA (up to θ− = 10⁻⁵). On the other hand, as θ− decreases, performance severely degrades for RBF-DDA-T. This occurs because smaller values of θ− generate networks with a larger number of neurons which cover only one training sample. RBF-DDA-T prunes the network at each training epoch, thereby removing all neurons which cover only one training sample. The corresponding training samples are put on an outlier list and are not considered in subsequent training epochs [5].
Table 1
Classification errors on test sets and number of hidden RBFs (in square brackets) for each dataset. For RBF-DDA-SP, standard deviations over 10 runs are given in parentheses.

Method                      optdigits              pendigits              letter                 satimage
RBF-DDA (default)           10.18% [1953]          8.12% [1427]           15.60% [7789]          14.95% [2812]
RBF-DDA-T (default)         14.75% [655]           8.43% [978]            25.32% [2837]          24.75% [662]
RBF-DDA (θ− sel.)           2.78% [3812]           2.92% [5723]           5.30% [12861]          8.55% [4099]
RBF-DDA-SP (30%, θ− sel.)   3.13% (0.23%) [2672]   3.04% (0.14%) [4344]   6.54% (0.18%) [9358]   9.18% (0.27%) [2934]
RBF-DDA-SP (40%, θ− sel.)   3.30% (0.28%) [2292]   3.17% (0.18%) [3884]   7.10% (0.12%) [8191]   9.59% (0.32%) [2546]
RBF-DDA-SP (50%, θ− sel.)   3.57% (0.26%) [1912]   3.29% (0.19%) [3424]   8.00% (0.25%) [7023]   10.05% (0.40%) [2157]
[Fig. 1: two panels plotted against −log(θ−) over the range 1–10. Panel (a): classification error on the test set (%), for RBF-DDA-T, RBF-DDA-SP (50%), and RBF-DDA. Panel (b): number of hidden RBF neurons, for RBF-DDA, RBF-DDA-SP (50%), and RBF-DDA-T.]
Fig. 1. Comparison of the proposed method with RBF-DDA and RBF-DDA-T as a function of θ−: (a) classification errors; (b) number of RBFs. Results on optdigits.
On the other hand, the pruning strategy adopted by RBF-DDA-SP removes only a portion of those neurons which cover only one training sample (i.e., with A_i = 1.0), thereby producing networks with better generalization performance. In addition, RBF-DDA-SP pruning is carried out only at the last training epoch, thereby avoiding premature pruning. A similar behavior to that depicted in Fig. 1 was observed for the other datasets considered.

Table 1 compares the classification performance and the complexity of the networks of the proposed method with RBF-DDA trained with default parameters (θ+ = 0.4 and θ− = 0.1) [1], RBF-DDA-T [5], and RBF-DDA with θ− selection [3] on each dataset. For each dataset, Table 1 shows both the classification error on the test
set and the number of hidden RBF neurons, for each training method. These experiments considered RBF-DDA-SP with three different percentages of pruning, namely, 30%, 40%, and 50%. For example, in the case of RBF-DDA-SP with 30% of pruning, the method prunes, after DDA training, 30% of the neurons which cover only one training sample. For RBF-DDA-SP, simulations were carried out ten times for each dataset, since the method randomly selects the neurons to be pruned. Table 1 reports both the mean and the standard deviation of the classification errors for RBF-DDA-SP.

RBF-DDA-T simulations were carried out with θ+ = 0.4 and both θ− = 0.1 and θ− = 0.2 for each dataset. Table 1 shows only the best RBF-DDA-T classification results obtained for each dataset (which used θ− = 0.2 for optdigits and satimage, and θ− = 0.1 for pendigits and letter). The values of θ− for RBF-DDA with θ− selection and for RBF-DDA-SP for each dataset were: optdigits and pendigits (θ− = 10⁻⁵); letter and satimage (θ− = 10⁻⁴).

The results in Table 1 show that the proposed method considerably outperforms both RBF-DDA (default) and RBF-DDA-T for the three percentages of pruning considered. In addition, it can be observed that the proposed method achieves classification performance closer to that of RBF-DDA with θ− selection [3], with the advantage of generating much smaller networks. RBF-DDA-SP performance is higher for smaller amounts of pruning (e.g., 30%). On the other hand, higher pruning rates produce networks with fewer neurons and a slight degradation in performance.
References
[1] M.R. Berthold, J. Diamond, Boosting the performance of RBF networks with dynamic decay
adjustment, in: G. Tesauro, D. Touretzky, T. Leen (Eds.), Advances in Neural Information
Processing Systems, vol. 7, MIT Press, Cambridge, MA, 1995, pp. 521–528.
[2] M. Berthold, J. Diamond, Constructive training of probabilistic neural networks, Neurocomputing 19
(1998) 167–183.
[3] A.L.I. Oliveira, F.B.L. Neto, S.R.L. Meira, Improving RBF-DDA performance on optical character
recognition through parameter selection, in: Proceedings of the 17th International Conference on
Pattern Recognition (ICPR’2004), vol. 4, pp. 625–628.
[4] A.L.I. Oliveira, F.B.L. Neto, S.R.L. Meira, Improving novelty detection in short time series through
RBF-DDA parameter adjustment, in: Proceedings of International Joint Conference on Neural
Networks (IJCNN’2004), IEEE Press.
[5] J. Paetz, Reducing the number of neurons in radial basis function networks with dynamic decay
adjustment, Neurocomputing 62 (2004) 79–91.
[6] C. Blake, C. Merz, UCI repository of machine learning databases, available from
http://www.ics.uci.edu/~mlearn/MLRepository.html (1998).