ARTICLE IN PRESS
Neurocomputing 64 (2005) 537–541
0925-2312/$ - see front matter © 2005 Elsevier B.V. All rights reserved.
doi:10.1016/j.neucom.2004.11.027
*Corresponding author. Polytechnic School, University of Pernambuco, Rua Benfica, 455, Madalena, Recife - PE 50.750-410, Brazil. Tel.: +55 81 99764841; fax: +55 81 34137749.
E-mail address: [email protected] (A.L.I. Oliveira).
www.elsevier.com/locate/neucom
Letter
Improving constructive training of RBF networks through selective pruning and model selection
Adriano L.I. Oliveira (a,b,*), Bruno J.M. Melo (a), Silvio R.L. Meira (b)
(a) Polytechnic School, University of Pernambuco, Rua Benfica, 455, Madalena, Recife - PE 50.750-410, Brazil
(b) Center of Informatics, Federal University of Pernambuco, P.O. Box 7851, Cidade Universitaria, Recife - PE 50.732-970, Brazil
Received 26 October 2004; received in revised form 25 November 2004; accepted 28 November 2004
Communicated by R.W. Newcomb
Available online 19 January 2005
Abstract
This letter proposes a constructive training method for radial basis function networks. The
proposed method is an extension of the dynamic decay adjustment (DDA) algorithm, a fast
constructive algorithm for classification problems. The proposed method, which is based on
selective pruning and DDA model selection, aims to improve the generalization performance
of DDA without generating larger networks. Simulations using four image recognition
datasets from the UCI repository demonstrate the validity of the proposed method.
© 2005 Elsevier B.V. All rights reserved.
Keywords: Neural network; RBF network; Model complexity; Classification
1. Introduction
The dynamic decay adjustment algorithm (DDA) is a fast algorithm for constructive training of radial basis function networks (RBFNs) and probabilistic neural networks (PNNs) [1,2]. DDA relies on two parameters, namely, θ+ and θ−, in order to decide about the introduction of RBF neurons in the network. Originally, it
was assumed that these parameters would not influence classification performance and therefore the use of their default values, θ+ = 0.4 and θ− = 0.1, was recommended for all datasets [1,2]. In contrast, we have observed that, for some datasets, the value of θ− considerably influences generalization performance [3]. To take advantage of this observation, we have proposed a method for improving RBF-DDA by carefully selecting the value of θ− [3]. This method has proved valuable for both classification problems [3] and novelty detection in time series [4]. In spite of its advantages, the method has one drawback: it generates much larger networks than RBF-DDA trained with the default parameters [3].

A recent extension to DDA has appeared in the literature with a different aim, namely, reducing the number of neurons generated by DDA [5]. The method, referred to as RBF-DDA with temporary neurons (RBF-DDA-T), introduces on-line pruning of neurons on each DDA training epoch [5]. We have attempted to integrate RBF-DDA-T with θ− selection; however, we have observed that the method severely prunes the networks for smaller values of θ−, thereby generating much smaller networks with heavily degraded performance. Conversely, RBF-DDA generates larger networks for smaller θ−, which, for some datasets, considerably improves performance [3,4].

This letter proposes an extension to RBF-DDA which combines selective pruning and parameter selection. We call this extension RBF-DDA-SP. In contrast to RBF-DDA-T, the method proposed here prunes only a portion of the neurons which cover only one training sample, and pruning is carried out only after the last epoch of DDA training.
2. The proposed method
The DDA algorithm builds RBF networks with one hidden layer for classification. The hidden neurons use Gaussian activation functions, R_i(x) = exp(−‖x − r_i‖² / σ_i²), where x is the input vector and ‖x − r_i‖ is the Euclidean distance between the input vector x and the center r_i; both r_i and σ_i are determined by DDA. Each output is computed as f(x) = Σ_{i=1..m} A_i · R_i(x), where m is the number of RBFs connected to that output unit and A_i is the weight of connection i [1,2]. There is one output unit for each class.

The DDA algorithm relies on two parameters in order to decide about the introduction of hidden RBF neurons in the networks [1,2]. One of the parameters is θ+, a positive threshold which must be overtaken by an activation of an RBF of the same class so that no new RBF is added. The other is θ−, a negative threshold, which is the upper limit for the activation of conflicting classes [1,2].

Each training epoch of DDA starts by setting A_i = 0.0 for all i. Next, each training sample x is considered by DDA. Let p_i^c denote an RBF neuron of class c already inserted in the network by DDA. During training, a new RBF is introduced in the network if there is no p_i^c such that R_i^c(x) ≥ θ+. In this case, the weight of the new neuron p_j^c is set to A_j = 1.0 and r_j = x; DDA also sets the value of σ_j automatically [1,2]. In contrast,
if some p_i^c satisfies R_i^c(x) ≥ θ+, the algorithm does not introduce a new neuron; instead, it increments the weight of that connection (from neuron p_i^c to the output unit), that is, A_i += 1.0. Therefore, in a trained RBF-DDA network, A_i gives the number of training samples covered by RBF neuron i with R_i(x) ≥ θ+. The DDA algorithm is executed over the training data until no change in the network occurs. In most problems this takes place in only four to five epochs of training [1,2].

The method proposed in this letter, called RBF-DDA-SP, firstly builds an RBF network using DDA. Subsequently, a percentage p of the neurons which cover only one training sample, that is, whose weight A_i = 1.0, are removed from the network. The neurons to be pruned are randomly selected from those which cover only one training sample. Thus, our method has two critical parameters, namely, p and θ−. These parameters can be selected via cross-validation for improved performance.

In our method, θ− is selected via cross-validation, starting with θ− = 0.1. Next, θ− is decreased by a factor of ten (θ− ← θ− × 10⁻¹). This is done because we have observed that performance does not change significantly for intermediate values of θ− [3]. θ− is decreased until the cross-validation error starts to increase, since smaller values lead to overfitting [3]. The near-optimal θ− found by this procedure is subsequently used to train using the complete training set [3,4].
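As an illustration, the RBF output computation and the selective-pruning step described above can be sketched as follows. This is an editorial sketch, not the authors' implementation; the data layout (a list of neuron records holding center r_i, width σ_i, weight A_i, and class) is assumed purely for illustration.

```python
import math
import random

def rbf_scores(x, neurons, n_classes):
    """Per-class outputs f_c(x) = sum_i A_i * exp(-||x - r_i||^2 / sigma_i^2),
    summing each Gaussian RBF into the output unit of its class."""
    scores = [0.0] * n_classes
    for n in neurons:
        d2 = sum((xj - rj) ** 2 for xj, rj in zip(x, n["center"]))
        scores[n["cls"]] += n["weight"] * math.exp(-d2 / n["sigma"] ** 2)
    return scores

def selective_prune(neurons, p, rng=random):
    """RBF-DDA-SP pruning step: after the last DDA epoch, remove a randomly
    chosen fraction p of the neurons whose weight A_i == 1.0, i.e. those
    covering a single training sample."""
    singles = [n for n in neurons if n["weight"] == 1.0]
    n_prune = int(round(p * len(singles)))
    doomed = {id(n) for n in rng.sample(singles, n_prune)}
    return [n for n in neurons if id(n) not in doomed]
```

In a full pipeline, θ− itself would be chosen by the cross-validation schedule described above: start at θ− = 0.1, shrink it by a factor of ten until the cross-validation error starts to rise, then retrain on the complete training set with the selected value.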
3. Experiments
The training method proposed in this letter was tested using four image recognition datasets available from the UCI machine learning repository [6]. The benchmark datasets used in the experiments were (classes, attributes, training samples, test samples): optdigits (10, 64, 3823, 1797), pendigits (10, 16, 7494, 3498), letter (26, 16, 15000, 5000), and satimage (6, 36, 4435, 2000).

Fig. 1 compares the proposed method (RBF-DDA-SP) with both the original RBF-DDA and RBF-DDA-T regarding the generalization performance (left graphics) and the corresponding number of hidden RBFs of the networks (right graphics) as a function of θ−. These results were obtained for the optdigits dataset. RBF-DDA-SP was trained pruning 50% of the neurons which covered only one training sample. The results for RBF-DDA-SP correspond to means over 10 runs of simulations.

Notice that for θ− = 0.1, the performance of the methods is similar, with a slight advantage for RBF-DDA. For smaller values of θ−, the generalization performance improves for both the proposed RBF-DDA-SP and RBF-DDA (up to θ− = 10⁻⁵). On the other hand, as θ− decreases, performance severely degrades for RBF-DDA-T. This occurs because smaller values of θ− generate networks with a larger number of neurons which cover only one training sample. RBF-DDA-T prunes the network at each training epoch, thereby removing all neurons which cover only one training sample. The corresponding training samples are put on an outlier list and are not considered in subsequent training epochs [5].
Table 1
Classification errors on test sets and number of hidden RBFs (in square brackets) for each dataset. For RBF-DDA-SP, standard deviations over 10 runs are given in parentheses.

Method                      optdigits              pendigits              letter                 satimage
RBF-DDA (default)           10.18% [1953]          8.12% [1427]           15.60% [7789]          14.95% [2812]
RBF-DDA-T (default)         14.75% [655]           8.43% [978]            25.32% [2837]          24.75% [662]
RBF-DDA (θ− sel.)           2.78% [3812]           2.92% [5723]           5.30% [12861]          8.55% [4099]
RBF-DDA-SP (30%, θ− sel.)   3.13% (0.23%) [2672]   3.04% (0.14%) [4344]   6.54% (0.18%) [9358]   9.18% (0.27%) [2934]
RBF-DDA-SP (40%, θ− sel.)   3.30% (0.28%) [2292]   3.17% (0.18%) [3884]   7.10% (0.12%) [8191]   9.59% (0.32%) [2546]
RBF-DDA-SP (50%, θ− sel.)   3.57% (0.26%) [1912]   3.29% (0.19%) [3424]   8.00% (0.25%) [7023]   10.05% (0.40%) [2157]
[Fig. 1: two panels plotted against −log(θ−) over the range 1–10. Panel (a): classification error on the test set (%), for RBF-DDA-T, RBF-DDA-SP (50%), and RBF-DDA. Panel (b): number of hidden RBF neurons, for RBF-DDA, RBF-DDA-SP (50%), and RBF-DDA-T.]
Fig. 1. Comparison of the proposed method with RBF-DDA and RBF-DDA-T as a function of θ−: (a) classification errors; (b) number of RBFs. Results on optdigits.
On the other hand, the pruning strategy adopted by RBF-DDA-SP removes only a portion of those neurons which cover only one training sample (i.e., with A_i = 1.0), thereby producing networks with better generalization performance. In addition, RBF-DDA-SP pruning is carried out only at the last training epoch, thereby avoiding premature pruning. A similar behavior to that depicted in Fig. 1 was observed for the other datasets considered.

Table 1 compares the classification performance and the complexity of the networks of the proposed method with RBF-DDA trained with default parameters (θ+ = 0.4 and θ− = 0.1) [1], RBF-DDA-T [5], and RBF-DDA with θ− selection [3] on each dataset. For each dataset, Table 1 shows both the classification error on the test
set and the number of hidden RBF neurons, for each training method. These experiments considered RBF-DDA-SP with three different percentages of pruning, namely, 30%, 40%, and 50%. For example, in the case of RBF-DDA-SP with 30% of pruning, the method prunes, after DDA training, 30% of the neurons which cover only one training sample. For RBF-DDA-SP, simulations were carried out ten times for each dataset, since the method randomly selects the neurons to be pruned. Table 1 reports both the mean and the standard deviation of the classification errors for RBF-DDA-SP.

RBF-DDA-T simulations were carried out with θ+ = 0.4 and both θ− = 0.1 and θ− = 0.2 for each dataset. Table 1 shows only the best RBF-DDA-T classification results obtained for each dataset (which used θ− = 0.2 for optdigits and satimage, and θ− = 0.1 for pendigits and letter). The values of θ− for RBF-DDA with θ− selection and for RBF-DDA-SP for each dataset were: optdigits and pendigits (θ− = 10⁻⁵); letter and satimage (θ− = 10⁻⁴).

The results in Table 1 show that the proposed method considerably outperforms both RBF-DDA (default) and RBF-DDA-T for the three percentages of pruning considered. In addition, it can be observed that the proposed method achieves classification performance closer to that of RBF-DDA with θ− selection [3], with the advantage of generating much smaller networks. RBF-DDA-SP performance is higher for smaller amounts of pruning (e.g., 30%). On the other hand, higher pruning rates produce networks with fewer neurons and a slight degradation in performance.
References
[1] M.R. Berthold, J. Diamond, Boosting the performance of RBF networks with dynamic decay
adjustment, in: G. Tesauro, D. Touretzky, T. Leen (Eds.), Advances in Neural Information
Processing Systems, vol. 7, MIT Press, Cambridge, MA, 1995, pp. 521–528.
[2] M. Berthold, J. Diamond, Constructive training of probabilistic neural networks, Neurocomputing 19
(1998) 167–183.
[3] A.L.I. Oliveira, F.B.L. Neto, S.R.L. Meira, Improving RBF-DDA performance on optical character
recognition through parameter selection, in: Proceedings of the 17th International Conference on
Pattern Recognition (ICPR’2004), vol. 4, pp. 625–628.
[4] A.L.I. Oliveira, F.B.L. Neto, S.R.L. Meira, Improving novelty detection in short time series through
RBF-DDA parameter adjustment, in: Proceedings of International Joint Conference on Neural
Networks (IJCNN’2004), IEEE Press.
[5] J. Paetz, Reducing the number of neurons in radial basis function networks with dynamic decay
adjustment, Neurocomputing 62 (2004) 79–91.
[6] C. Blake, C. Merz, UCI repository of machine learning databases, available from
http://www.ics.uci.edu/~mlearn/MLRepository.html (1998).