

Hybridized Particle Swarm Algorithm for Adaptive Structure Training of Multilayer Feed-Forward Neural Network: QSAR Studies of Bioactivity of Organic Compounds

QI SHEN,1,2 JIAN-HUI JIANG,1 CHEN-XU JIAO,1 WEI-QI LIN,1 GUO-LI SHEN,1 RU-QIN YU1

1 State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, People's Republic of China

2 Chemistry Department, Zhengzhou University, Zhengzhou 450052, People's Republic of China

Received 11 November 2003; Accepted 19 May 2004
DOI 10.1002/jcc.20094

Published online in Wiley InterScience (www.interscience.wiley.com).

Abstract: The multilayer feed-forward ANN is an important modeling technique used in QSAR studies. The training of an ANN is usually carried out only to optimize the weights of the neural network, without paying attention to the network topology. Other strategies used to train ANNs are, first, to discover an optimum structure of the network, and then to find weights for the already defined structure. These methods tend to converge to local optima and may also lead to overfitting. In this article, a hybridized particle swarm optimization (PSO) approach was applied to neural network structure training (HPSONN). The continuous version of PSO was used for the weight training of the ANN, and the modified discrete PSO was applied to find the appropriate network architecture. The network structure and connectivity are trained simultaneously. The two versions of PSO can jointly search for the globally optimal ANN architecture and weights. A new objective function is formulated to determine the appropriate network architecture and the optimum values of the weights. The proposed HPSONN algorithm was used to predict the carcinogenic potency of aromatic amines and the biological activity of a series of distamycin and distamycin-like derivatives. The results were compared to those obtained by PSO and GA training in which the network architecture was kept fixed. The comparison demonstrated that HPSONN is a useful tool for training ANNs, which converges quickly towards the optimal position and can avoid overfitting to some extent.

© 2004 Wiley Periodicals, Inc. J Comput Chem 25: 1726–1735, 2004

Key words: artificial neural network; particle swarm optimization; quantitative structure–activity relationships

Introduction

The multilayer feed-forward artificial neural network (ANN), as an important modeling technique, has been widely used in QSAR studies. The advantage of ANN over other methods, such as multiple linear regression (MLR) or partial least squares (PLS), is its inherent ability to incorporate nonlinear relationships between descriptors and activity.1

The training of an ANN is commonly carried out with a standard back-propagation (BP)-type algorithm in the weight space of a network with fixed topology. However, the BP neural network2,3 as a gradient search algorithm has some limitations, such as overfitting, local optimum problems, and sensitivity to the initial values of the weights. In addition, the BP neural network requires a predefined topology in which the number of hidden neurons must be already known. The architecture of the network is one of the most important considerations when solving problems using multilayer feed-forward neural networks. An oversimplified network architecture might hamper the convergence of the network. On the other hand, too large a network size would lead to overfitting, and thus poor generalization performance. Besides the better generalization ability, small networks are preferable because they are usually faster and cheaper to build. One method of finding the appropriate network architecture is to train a network that contains a large number of hidden nodes and then prune the network. This slows down the speed of convergence because the majority of the training time is spent on training a network that is larger than necessary. It may also lead to overfitting. On the other hand, a number of algorithms have been introduced that allow the neural network to grow during training, but it is difficult to determine the size of an optimal set of newly added neurons.

Correspondence to: R.-Q. Yu; e-mail: [email protected]

Contract/grant sponsor: the National Natural Science Foundation of China; contract/grant numbers: 20375012, 20105007, and 20205005

Besides the BP algorithm, evolutionary search methods,4-6 such as genetic algorithms (GAs) and evolution algorithms (EAs), have been used for ANN training. However, most of them only optimize the weights of the neural network and seldom address the network topology. Other evolutionary strategies used to train ANNs are, first, to discover the optimum structure of a network and then to find weights for the already defined structure. Such an approach seems difficult to circumvent the local optima and overfitting problems. From this point of view it is highly desirable to design a methodology capable of finding the appropriate network architecture and connectivity simultaneously.

Particle swarm optimization (PSO),7-10 a relatively new optimization technique that originated as a simulation of a simplified social system, can also be applied to ANN training. Similar to GAs and EAs, PSO is a population-based optimization tool that searches for optima by updating generations. However, unlike GAs and EAs, PSO possesses no evolution operators such as crossover and mutation. Compared to GAs and EAs, PSO presents the advantage of being conceptually very simple, requiring low computation costs and few parameters to adjust. Most versions of PSO operate in continuous, real-number space. A modified discrete PSO algorithm11 was proposed in our previous study to select variables in MLR and PLS modeling, and has shown satisfactory performance. In this article, a continuous version of PSO is used for the weight training of the ANN, and the modified discrete PSO is applied to find the appropriate network architecture. Network structure and connectivity are searched simultaneously. The two versions of PSO can jointly search for the globally optimal ANN architecture and weights. A new objective function is formulated to determine the appropriate network architecture and the optimum values of the weights. The formulation and the corresponding computational flow chart are presented in detail in this article.

As examples of the application of the proposed training algorithm, the carcinogenic potency of aromatic amines12 and the biological activity of a series of distamycin and distamycin-like derivatives13-17 were predicted from selected molecular parameters. The results have demonstrated that the proposed method is a useful tool for training ANNs, which converges quickly towards the optimal position and can avoid overfitting to some extent. This indicates that the proposed network is superior to some other networks.

Theory

Particle Swarm Optimization

PSO, developed by Eberhart and Kennedy in 1995, is a stochastic global optimization technique inspired by the social behavior of bird flocking. The algorithm models the exploration of a problem space by a population of individuals or particles. In PSO, each single solution is a particle in the search space. Each individual in PSO flies in the search space with a velocity that is dynamically adjusted according to its own flying experience and that of its companions.

PSO is initialized with a group of random particles. Each particle is treated as a point in a D-dimensional space. The ith particle is represented as xi = (xi1, xi2, . . ., xiD). The best previous position of the ith particle, that is, the position giving the best fitness value, is represented as pi = (pi1, pi2, . . ., piD). The best particle among all the particles in the population is represented by pg = (pg1, pg2, . . ., pgD). The velocity, i.e., the rate of position change, of particle i is represented as vi = (vi1, vi2, . . ., viD). In every iteration, each particle is updated by following the two best values. After finding the aforementioned two best values, the particle updates its velocity and position according to the following equations:

vid(new) = vid(old) + c1 r1 (pid − xid) + c2 r2 (pgd − xid)    (1)

xid(new) = xid(old) + α vid(new)    (2)

where c1 and c2 are two positive constants called learning factors, and r1 and r2 are random numbers in the range of (0,1). α is a restriction factor that determines the velocity weight. Equation (1) calculates the particle's new velocity according to its previous velocity and the distances of its current position from its own best position and the group's best position. Then the particle flies toward a new position according to eq. (2). Such an adjustment of the particle's movement through the space causes it to search around the two best positions. If the minimum error criterion is attained or the number of cycles reaches a user-defined limit, the algorithm is terminated.
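To make eqs. (1) and (2) concrete, the following is a minimal sketch of one continuous-PSO iteration in Python/NumPy. The function and variable names are illustrative only; the learning factors c1 = c2 = 2 are a common default rather than values reported in this article, and α = 0.6 follows the restriction-factor setting used later in the paper.

```python
import numpy as np

def pso_step(positions, velocities, p_best, g_best, fitness,
             c1=2.0, c2=2.0, alpha=0.6, rng=np.random):
    """One continuous-PSO iteration following eqs. (1) and (2).

    positions, velocities : (n_particles, n_dims) arrays
    p_best  : each particle's best position found so far
    g_best  : best position found by the whole swarm
    fitness : callable mapping a position vector to a scalar (lower is better)
    alpha   : restriction factor weighting the new velocity in eq. (2)
    """
    n, d = positions.shape
    r1, r2 = rng.rand(n, d), rng.rand(n, d)

    # eq. (1): new velocity from the old velocity plus pulls toward the
    # personal best and the global best positions
    velocities = (velocities
                  + c1 * r1 * (p_best - positions)
                  + c2 * r2 * (g_best - positions))

    # eq. (2): each particle flies to its new position with the restricted velocity
    positions = positions + alpha * velocities

    # bookkeeping: refresh the personal and global bests
    scores = np.array([fitness(p) for p in positions])
    p_scores = np.array([fitness(p) for p in p_best])
    improved = scores < p_scores
    p_best[improved] = positions[improved]
    p_scores[improved] = scores[improved]
    g_best = p_best[np.argmin(p_scores)]

    return positions, velocities, p_best, g_best
```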

Modified Discrete Particle Swarm Optimization

For a discrete problem expressed in binary notation, a particle moves in a search space restricted to 0 or 1 on each dimension. In a binary problem, updating a particle represents a change of a bit that should be in either state 1 or 0, and the velocity represents the probability of bit xid taking the value 1 or 0.

In the PSO algorithm, a population of particles is updated on the basis of information about each particle's previous best performance and the best particle in the population. According to the information-sharing mechanism of PSO, a modified discrete PSO11 was proposed as follows. The velocity vid of every individual is a random number in the range of (0,1). The resulting change in position is then defined by the following rule:

If 0 < vid ≤ a, then xid(new) = xid(old)    (3)

If a < vid ≤ (1 + a)/2, then xid(new) = pid    (4)

If (1 + a)/2 < vid ≤ 1, then xid(new) = pgd    (5)

where a is a value in the range of (0,1) called the static probability. The static probability a starts at 0.5 and decreases to 0.33 by the time the iterations terminate. Although the velocity in the modified discrete PSO is different from that in the continuous version of PSO, the information-sharing mechanism and the updating of the particle by following the two best positions are the same in the two PSO versions.

To circumvent convergence to local optima and improve the ability of the modified PSO algorithm to overleap local optima, 5% of the particles are forced to fly randomly, not following the two best particles:

If 0 < vid ≤ 0.05 and 0 < b ≤ 0.5, then xid(new) = 0    (6)

If 0 < vid ≤ 0.05 and 0.5 < b ≤ 1, then xid(new) = 1    (7)

If 0.05 < vid ≤ 1, then xid(new) = xid(old)    (8)

where b is a random value in the range of (0,1). In the random flying operator, 5% of the particles are randomly selected, and each site of the selected particles has a probability of 0.05 of varying its value in a stochastic manner. With this simple strategy, the algorithm provides an effective avenue for dealing with local optima or premature convergence.

Using the decreasing static probability and a small percentage of randomly flying particles to overleap local optima, the modified PSO retains satisfactory convergence characteristics.
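A sketch of one iteration of the modified discrete PSO described by eqs. (3) through (8) is given below, again in Python/NumPy with illustrative names. The static probability a is supplied by the caller, who is assumed to decrease it from 0.5 to 0.33 over the run; the fraction of randomly flying particles is the 5% stated above.

```python
import numpy as np

def discrete_pso_step(x, p_best, g_best, a, fly_fraction=0.05, rng=np.random):
    """One modified discrete-PSO iteration on binary strings.

    x      : (n_particles, n_bits) array of 0/1 positions
    p_best : personal best binary strings
    g_best : global best binary string
    a      : static probability, decreased from 0.5 to 0.33 during the run
    """
    n, d = x.shape
    v = rng.rand(n, d)        # velocities are random numbers in (0, 1)
    x_new = x.copy()

    # eqs. (3)-(5): keep the old bit, copy the personal best, or copy the global best
    take_p = (v > a) & (v <= (1 + a) / 2)
    take_g = v > (1 + a) / 2
    x_new[take_p] = p_best[take_p]
    x_new[take_g] = np.broadcast_to(g_best, x.shape)[take_g]

    # eqs. (6)-(8): ~5% of the particles fly randomly instead of following the bests;
    # they keep their old bits, except that each bit has a 0.05 chance of a random reset
    flyers = rng.rand(n) < fly_fraction
    x_new[flyers] = x[flyers]
    v_fly, b = rng.rand(n, d), rng.rand(n, d)
    reset = flyers[:, None] & (v_fly <= 0.05)
    x_new[reset & (b <= 0.5)] = 0
    x_new[reset & (b > 0.5)] = 1

    return x_new
```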

Multilayer Feed-Forward ANN

There exist many types of neural networks widely used in different fields of application. An ANN is constructed with nodes (neurons) and linking weights, which control the signal transfer between the nodes through the so-called transfer function. The training of an ANN is to search for the optimal structure and values of the linking weights. In this work the supervised multilayer feed-forward ANN is taken as an example, with its structure and weights searched simultaneously by the discrete and continuous PSO algorithms, respectively. With the transfer function f(x), the input Ij to node j is the weighted sum of the outputs of all nodes (i = 1, 2, . . ., n) connected to it:

Ij = θj + Σi wji Oi    (9)

Oj = f(Ij)    (10)

where Oi and Oj are the outputs of the ith and jth nodes, wji is the linking weight between nodes j and i, and θj is a threshold value. In this work, the sigmoid function was selected as the transfer function. The fitness (vide infra) evaluated over all the samples in the training set is taken as the objective function. For evaluation of the trained net, the fitness for the prediction set is calculated using the same formula given later; the only difference is that the samples of the prediction set were never taken into account in the training of the net.
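Eqs. (9) and (10) amount to a standard weighted-sum-plus-sigmoid layer. As a small illustration (an assumed NumPy helper, not the authors' code), applied to a whole layer at once:

```python
import numpy as np

def sigmoid(x):
    """Sigmoid transfer function used in this work."""
    return 1.0 / (1.0 + np.exp(-x))

def layer_forward(o_prev, weights, thresholds):
    """eq. (9): I_j = theta_j + sum_i w_ji * O_i, computed for a whole layer;
    eq. (10): O_j = f(I_j) with the sigmoid transfer function.

    o_prev     : (n_samples, n_in) outputs of the previous layer
    weights    : (n_out, n_in) linking weights w_ji
    thresholds : (n_out,) threshold values theta_j
    """
    i_j = thresholds + o_prev @ weights.T
    return sigmoid(i_j)
```

Chaining two such calls (input to hidden, hidden to output) gives the three-layer networks used later in this article.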

ANN Adaptive Structure Training by Hybridizing PSO (HPSONN)

Network architecture and connectivity are the most important considerations when training the multilayer feed-forward neural network. An efficient scheme is to train the network architecture and connectivity simultaneously. A continuous version of PSO is used for the weight training of the ANN, and the modified discrete PSO is applied to find the appropriate network architecture. In the continuous PSO, real-number strings are adopted to code all the particles. Each real-number coded string stands for a set of weights, which makes up one ANN together with all of its nodes. Actually, each bit of a particle is a real number that stands for a linking weight of the given ANN. In the modified discrete PSO, each particle is encoded by a string of binary bits associated with the number of weights, which makes up one ANN together with all its nodes. A bit "0" in a particle indicates that the corresponding weight is not used. In this algorithm, the number of weights involved in the computation is adaptively adjusted. The ANN trained by hybridizing PSO is described as follows.

Step 1. Randomly initialize all the initial strings WEI and STR in the two versions of PSO with an appropriate population size. WEI are real-number coded strings standing for sets of weights. STR are strings of binary bits corresponding to WEI.

Step 2. The final weight vector W used in the ANN computation is the Hadamard multiplication product of WEI and STR, that is, if WEI = (weiij)m×n and STR = (strij)m×n, then the Hadamard multiplication gives the product WEI ⊙ STR = (weiij × strij)m×n. Calculate the fitness function of the ANN corresponding to each weight vector of the population W. If the best objective function of the generation fulfills the end condition, the training is stopped and the results are output; otherwise, go to the next step.

Step 3. Update the WEI population according to the fitness function by applying the continuous PSO.

Step 4. Update the STR population according to the modified discrete PSO.

Step 5. Go back to the second step to calculate the fitness of the renewed population. The HPSONN scheme is presented in Scheme 1.
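The five steps can be condensed into a short schematic loop. The sketch below is an illustrative reconstruction in Python/NumPy, not the authors' Matlab implementation: it assumes the fitness function of the next subsection [eq. (13)] is available as a callable, defaults the learning factors to a common choice of 2, and, for brevity, omits the random-flying operator of eqs. (6)-(8).

```python
import numpy as np

def train_hpsonn(fitness_of_weights, n_weights, n_particles=100, max_iter=1000,
                 c1=2.0, c2=2.0, alpha=0.6, rng=np.random):
    """Schematic HPSONN loop: continuous PSO on the real-valued weight strings WEI
    and modified discrete PSO on the binary structure strings STR (Steps 1-5)."""
    # Step 1: random initial populations
    WEI = rng.uniform(-1.0, 1.0, size=(n_particles, n_weights))
    STR = (rng.rand(n_particles, n_weights) > 0.5).astype(float)
    VEL = np.zeros_like(WEI)

    def evaluate(wei, stru):
        # Step 2: the Hadamard product masks out unused weights before scoring
        return np.array([fitness_of_weights(w) for w in wei * stru])

    f = evaluate(WEI, STR)
    bW, bS, bF = WEI.copy(), STR.copy(), f.copy()   # personal bests
    g = np.argmin(bF)                               # index of the global best

    for it in range(max_iter):
        # Step 3: continuous PSO update of WEI, eqs. (1)-(2)
        r1, r2 = rng.rand(*WEI.shape), rng.rand(*WEI.shape)
        VEL = VEL + c1 * r1 * (bW - WEI) + c2 * r2 * (bW[g] - WEI)
        WEI = WEI + alpha * VEL

        # Step 4: modified discrete PSO update of STR, eqs. (3)-(5),
        # with the static probability a decreased from 0.5 to 0.33
        a = 0.5 - 0.17 * it / max_iter
        v = rng.rand(*STR.shape)
        STR = np.where(v <= a, STR, np.where(v <= (1 + a) / 2, bS, bS[g]))

        # Step 5: re-evaluate the renewed population and keep the best positions
        f = evaluate(WEI, STR)
        better = f < bF
        bW[better], bS[better], bF[better] = WEI[better], STR[better], f[better]
        g = np.argmin(bF)

    return bW[g] * bS[g], bF[g]   # masked weight vector and its fitness
```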

Fitness Function

In HPSONN, the performance of each particle is measured according to a predefined fitness function. Training the ANN by hybridizing PSO should look for both the optimum network architecture and the optimum set of weights and biases. In light of these requirements, we can formulate the objective function whose minimization will generate an optimum network configuration. The fitness function of HPSONN is evaluated based on two aspects: the accuracy of the network output and the complexity of the network. The accuracy of the network is defined as the root-mean-square error (RMSE):

RMSE = [ Σi (Oiout − di)² / n ]^(1/2)    (11)



where Oiout and di are, respectively, the actual and desired output values of the output node for the ith sample, and n is the number of samples involved. Here, the values of di are scaled into (0.0, 1.0).

The complexity of the network is simply defined as

C = b/m    (12)

where b is the number of weights that are not equal to 0, that is, the number of weights involved in the ANN computation, and m is the total number of weights, which equals the size of a particle. The fitness function of HPSONN can then be expressed as follows:

f = RMSE (1 + λC)    (13)

where λ is the weighting coefficient between the accuracy and the complexity of the network, introduced to penalize large networks. According to experience, λ is set to 0.1.
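Put together, eqs. (11)-(13) can be written as a single scoring routine. A minimal sketch, assuming a caller-supplied predict function that decodes a masked weight vector into a network and returns its outputs for the (scaled) training data; the names here are illustrative, not the authors' code.

```python
import numpy as np

def hpsonn_fitness(w_masked, predict, X_train, d_train, lam=0.1):
    """Objective of eq. (13): RMSE of eq. (11) penalized by the complexity of eq. (12).

    w_masked : weight vector after Hadamard masking (zeros are pruned weights)
    predict  : callable mapping (weights, descriptors) to network outputs in (0, 1)
    lam      : weighting coefficient between accuracy and complexity (0.1 in this work)
    """
    outputs = predict(w_masked, X_train)
    rmse = np.sqrt(np.mean((outputs - d_train) ** 2))         # eq. (11)
    complexity = np.count_nonzero(w_masked) / w_masked.size   # eq. (12): C = b/m
    return rmse * (1.0 + lam * complexity)                    # eq. (13)
```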

Data Sets

Aromatic Amine Data

A set of 39 aromatic amines with their carcinogenic potencies collected by Benigni et al.12 in their comprehensive review was used for testing the performance of HPSONN in QSAR studies. The carcinogenic potency is expressed as BRR = log(MW/TD50)rat, where MW is the molecular weight and TD50 is the daily dose rate necessary to halve the probability of an experimental rat remaining tumorless to the end of its standard life span. Among the 39 aromatic amines, 32 randomly selected samples were used as the training set, and the remaining 7 samples as the prediction set. Each sample is described by the following five parameters for training the neural network: the octanol/water partition coefficient (AlogP),18 an indicator variable I (I = 1 for biphenylamines with a bridge between the phenyl rings), and three electrotopological-state indices (E-state indices: N-aasC, S-aaCH, and S-ssssC).19,20 The E-state index for an atom represents the electron accessibility associated with each atom type. It is an indication of the presence or absence of a given atom type and of the count of the number of atoms of a particular element type. For example, in the symbol N-aasC, "N" represents the number of carbon atoms in the phenyl group connected to other atoms besides hydrogen, "a" stands for the bonds in an aromatic ring, "s" stands for the single bond of that group, and "C" represents the carbon atom.

Distamycin and Distamycin-Like Derivative Data

A set of 71 compounds synthesized by Cozzi and his coworkers13-17 in recent years was gathered and randomly divided into two groups: 55 compounds for the training set and the remaining 16 compounds for the prediction set. These compounds were derived from over 10 different parent structures, which can be represented by the five basic structures shown in Figure 1. The detailed structures and biological activity data of the compounds studied are listed in Table 1. Each sample is described by six parameters for training the neural network: Apol (sum of atomic polarizabilities), Area (molecular surface area), MW (molecular weight), Density, Hbond acceptor (number of hydrogen bond acceptors), and AlogP (log of the partition coefficient).

The HPSONN algorithm was programmed in Matlab 6.0 and run on a personal computer (Intel Pentium processor, 733 MHz, 128 MB RAM).

Results and Discussion

Aromatic Amine Data

As a comparison, multivariate linear regression (MLR) was first utilized for modeling the carcinogenic potencies of the aromatic amines using the five variables. The correlation coefficients (R) for the training set and the validation set were 0.9537 and 0.9425, respectively. The correlation between the calculated and experimental values of BRR is shown in Figure 2. We calculated the bias of MLR for the results of the training and prediction sets for the aromatic amine data. The biases for the training and prediction sets are 0.2829 and 0.4107, respectively, indicating that MLR is inadequate for modeling the aromatic amine data.

Scheme 1. The chart of the HPSONN scheme.

For comparison with HPSONN, weight training of the ANN by PSO (PSONN) and by GA (GANN) was also performed with the network architecture kept fixed; network topology optimization was not considered in PSONN and GANN. All methods were stopped after 1000 iterations. The correlation coefficients for the training set by GANN and PSONN were 0.971 and 0.970, respectively. GANN and PSONN gave correlation coefficients of 0.953 and 0.954 for the validation set, respectively. A comparison of PSONN, GANN, and MLR shows that better results were obtained with the PSONN and GANN algorithms; the QSAR model involved is a nonlinear one. The convergence processes for GANN and PSONN can be examined in Figure 3a and b, in which the RMSE is drawn. Curve 1 is the convergence curve drawn with the RMSE for the training set obtained with the most fitted model in each generation. Curve 2 is the RMSE curve for the prediction set calculated by this most fitted model for the training set. Curve 3 is the average of the RMSEs for the training set obtained by all models of each generation of each algorithm. Comparing Figure 3a and b, the minimum fitness value can be obtained in the searching process by both PSONN and GANN, but the fitness value drops more quickly in PSONN. The minimum fitness can be obtained in about 70 cycles during the PSO optimization, whereas about 200 cycles are needed during the GA optimization. Experimental results confirm that PSONN converges to the best solution quickly.

Figure 1. Five parent structures of tested distamycin and distamycin-like derivatives.



Table 1. The Distamycin and Distamycin-Like Derivatives Studied in the Present Work and Their Activities.

Compound  Series(b)  m(c)  X(c)  Y(c)  Z(c)  R1(c)  R2(c)  exp.(d)

1  I  0  Br  H  H  CH2CH2Br  C(NH)NH2·HCl  −0.551
2  I  0  Cl  F  H  CH2CH2Cl  C(NH)NH2·HCl  5.405
3  I  0  Cl  CH3  H  CH2CH2Cl  C(NH)NH2·HCl  2.407
4a  I  0  Cl  CH3  CH3  CH2CH2Cl  C(NH)NH2·HCl  5.428
5a  I  0  Cl  H  H  H  C(NH)NH2·HCl  6.109
6a  I  0  Cl  H  H  CH2CH3  C(NH)NH2·HCl  3.738
7a  I  0  Cl  H  H  CH2CH2CH3  C(NH)NH2·HCl  6.605
8a  I  0  Cl  CH3  H  CH2CH3  C(NH)NH2·HCl  4.013
9a  I  1  Cl  H  H  CH2CH3  C(NH)NH2·HCl  1.548
10  I  0  Cl  H  H  CH2CH2Cl  C(NOH)NH2  7.298
11  I  0  Cl  H  H  CH2CH2Cl  C(NNH2)NH2  6.604
12  I  0  Cl  H  H  CH2CH2Cl  C(NCN)NH2  3.635
13  I  0  Cl  H  H  CH2CH2Cl  C(NCH2)NH2·HCl  3.651
14  I  0  Cl  H  H  CH2CH2Cl  C(NCH3)NHCH3·HCl  3.597
15  I  0  Cl  H  H  CH2CH2Cl  NHC(NH)NH2·HCl  4.165
16  I  0  Cl  H  H  CH2CH2Cl  CH2N(CH3)2·HCl  6.912
17  I  0  Cl  H  H  CH2CH2Cl  CONH2  4.868
18a  I  0  Cl  H  H  CH2CH2Cl  CN  4.543
19a  I  0  Cl  H  H  CH2CH2Cl  COOH  6.867
20a  I  0  Cl  H  H  CH2CH2CH3  C(NH)NH2·HCl  3.738
21  I  0  Cl  H  H  CH2CH2CH3  C(NCN)NH2  5.894
22  I  0  Cl  H  H  CH2CH2CH3  C(NCH3)NH2·HCl  3.054
23  I  0  Cl  H  H  CH2CH2CH3  NHC(NH)NH2·HCl  4.130
24  I  1  Cl  H  H  CH2CH2Cl  C(NH)NH2·HCl  1.974
25  I  1  Cl  H  H  CH2CH2Cl  C(NOH)NH2  3.747
26  I  1  Cl  H  H  CH2CH2Cl  C(NCN)NH2  1.686
27  I  1  Cl  H  H  CH2CH2Cl  C(NCH3)NH2·HCl  1.361
28  I  1  Cl  H  H  CH2CH2Cl  C(NCH3)NHCH3·HCl  1.856
29  II  1  Cl  H  H  CH2CH2Cl  NHC(NH)NH2·HCl  2.002
30  II  0  CH  CH  CH  —  C(NH)NH2·HCl  3.918
31  II  0  N  CH  CH  —  C(NH)NH2·HCl  3.555
32  II  0  N  N  CH  —  C(NH)NH2·HCl  5.417
33  II  0  CH  CH  N  —  C(NH)NH2·HCl  7.243
34  II  0  CH  N  CH  —  C(NH)NH2·HCl  4.364
35a  II  0  CH  N  N  —  C(NH)NH2·HCl  7.594
36a  II  0  N  N  N  —  C(NH)NH2·HCl  7.189
37  II  0  N  N  N  —  CH2N(CH3)2  8.006
38  II  1  N  CH  CH  —  C(NH)NH2·HCl  3.001
39  II  1  N  CH  N  —  C(NH)NH2·HCl  6.411
40a  III  4  —  —  —  CH2ACBrCONH  C(NCH3)NH2·HCl  0.365
41a  III  4  —  —  —  CH2ACBrCONH  C(NCH3)NCH3·HCl  0.140
42  III  4  —  —  —  CH2ACBrCONH  C(NH)N(CH3)2·HCl  0.642
43  III  4  —  —  —  CH2ACBrCONH  NHC(NH)NH2·HCl  0.344
44  III  4  —  —  —  CH2ACBrCONH  C(NOH)NH2  1.828
45  III  4  —  —  —  CH2ACBrCONH  C(NCN)NH2  1.109
46  III  4  —  —  —  CH2ACBrCONH  CONH2  1.261
47  III  4  —  —  —  CH2ACBrCONH  CN  2.681
48  III  4  —  —  —  CH2ACBrCONH  C(NH)NH2·HCl  0.993
49  III  4  —  —  —  CH2ACClCONH  C(NH)NH2·HCl  0.708
50  III  3  —  —  —  NHCHO  C(NH)NH2·HCl  8.487
51  III  3  —  —  —  NHCOC(CH2)Br  C(NH)NH2·HCl  3.840
52  IV  —  N  CH  —  H  H  2.245
53  IV  —  CH  N  —  H  H  3.506
54  IV  —  N  CH  —  H  CH3  2.010
55  IV  —  N  CH  —  CH3  CH3  2.939
56  IV  —  N  CH  —  CN  H  1.733
57  IV  —  CH  N  —  H  CH3  4.721

(continued)



Second, in the weight training of the ANN by GA and PSO, although the RMSE of the prediction set (curve 2, Fig. 3a and b) decreases significantly in the very first generations, the performance of the fittest model of each generation for the prediction set (curves 2) improved very slowly compared to the convergence rate of the same model as evaluated on the training set (curve 1). The situation in Figure 3a is similar to that in Figure 3b. With the decrease of the RMSE for the training set (curve 1, Fig. 3a and b), the RMSE for the prediction set turned to increase slightly, and no further improvements were observed. These phenomena imply that overfitting to the training sets occurs under this condition. As a comparison, the back-propagation algorithm was also utilized for modeling the carcinogenic potencies of the aromatic amines using the five variables. It was found that the correlation coefficients obtained with back-propagation were 0.9625 and 0.9430, respectively, for the training and the prediction sets. These results are slightly better than those obtained with MLR, but worse than those given by GANN and PSONN, indicating that the back-propagation algorithm is possibly trapped in local optima and produces suboptimal results.

To further improve the QSAR model, the HPSONN algorithm was used to evaluate the carcinogenic potencies of the aromatic amines. In HPSONN, a continuous version of PSO is used for the weight training of the ANN, and the modified discrete PSO is applied to find the appropriate network architecture. In the present work, a three-layer ANN is used with 10 hidden nodes for the five descriptor variables. The population size of PSO is selected as 100, and the restriction factor α in eq. (2) is set at 0.6. The correlation between the calculated and experimental values of BRR is shown in Figure 4. The correlation coefficients for the training set and the validation set were 0.9646 and 0.9648, respectively. The correlation coefficient for the validation set was consistent with that for the training set by HPSONN. Compared with GANN and PSONN, the R for the validation set by HPSONN was better. There is no sign of overfitting in HPSONN, as often happens in the ANN training in GANN and PSONN. The introduction of network architecture optimization into the algorithm improved the performance of the trained ANN, as the correlation coefficient for the prediction set is improved with the topology training used.
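For illustration only, the settings reported here (five descriptors, 10 hidden nodes, a population of 100, 1000 iterations, α = 0.6) could be wired into the sketches given earlier as follows. The weight count assumes a fully connected 5-10-1 network with thresholds, and the data are random placeholders rather than the published training set; layer_forward, train_hpsonn, and hpsonn_fitness refer to the earlier illustrative sketches.

```python
import numpy as np

rng = np.random
X_train = rng.rand(32, 5)   # placeholder: 32 training samples, 5 descriptors
d_train = rng.rand(32)      # placeholder: activities scaled into (0, 1)

n_in, n_hid, n_out = 5, 10, 1
n_weights = (n_in + 1) * n_hid + (n_hid + 1) * n_out   # assumed weight/threshold count

def predict(w, X):
    """Decode a masked weight vector into a 5-10-1 network and run it (sketch)."""
    w1 = w[:(n_in + 1) * n_hid].reshape(n_hid, n_in + 1)
    w2 = w[(n_in + 1) * n_hid:].reshape(n_out, n_hid + 1)
    hidden = layer_forward(X, w1[:, 1:], w1[:, 0])
    return layer_forward(hidden, w2[:, 1:], w2[:, 0]).ravel()

best_w, best_f = train_hpsonn(
    lambda w: hpsonn_fitness(w, predict, X_train, d_train),
    n_weights=n_weights, n_particles=100, max_iter=1000, alpha=0.6)
```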

The convergence processes for HPSONN are shown in Figure 3c. From curve 1 in Figure 3c, one can see that HPSONN can converge to a satisfactory solution in about 70 cycles. The trends of variation of curves 2 and 1 seem to be parallel to each other for HPSONN (Fig. 3c), and might finally become identical.

Table 1. (Continued)

Compound  Series(b)  m(c)  X(c)  Y(c)  Z(c)  R1(c)  R2(c)  exp.(d)

58a  IV  —  CH  N  —  CH3  CH3  3.787
59a  IV  —  CH  N  —  CN  H  4.918
60  V  —  NH  —  —  H  H  1.115
61  V  —  NCH3  —  —  H  H  0.599
62  V  —  O  —  —  H  H  1.515
63  V  —  S  —  —  H  H  2.389
64a  V  —  NH  —  —  H  CH3  1.224
65  V  —  NH  —  —  CH3  CH3  1.771
66  V  —  NCH3  —  —  H  CH3  1.311
67  V  —  NCH3  —  —  CH3  CH3  0.789
68  V  —  O  —  —  H  CH3  1.977
69  V  —  O  —  —  CH3  CH3  1.584
70  V  —  S  —  —  H  CH3  2.098
71  V  —  S  —  —  CH3  CH3  2.649

(a) Compounds selected as the prediction set.
(b) The parent structure from which the compound is derived.
(c) m, the number of repeated substituent units in the structure; X, Y, Z, R1, R2, the substituents of the structures in Figure 1.
(d) The natural logarithm of the 50% growth inhibitory concentration against L1210 murine leukemia cells.

Figure 2. Calculated vs. observed BRR using MLR modeling for aromatic amine data.



There seems to be no sign of overfitting to the training set, as often happens in ANN training. A comparison between the curves shown in Figure 3a, b, and c can provide some insight into the differences in performance associated with the three algorithms for training the ANN. The curves obtained by GANN and PSONN are relatively smooth compared to those obtained with the hybridized PSO (HPSONN). This seems to be related to the fact that members generated by HPSONN are significantly different from each other within one generation and also between two subsequent generations. This provides each new generation with a relatively high chance of holding its population diversity in the search for coding strings with increasingly improved performance. Maintaining the population diversity in this way can prevent the incest that leads to misleading local optima. The steep changes of the convergence curves in Figure 3c reflect the process of the HPSONN algorithm searching the solution space with a rather large span. HPSONN can not only hold population diversity but also retain the best particle among all the particles in the population. The introduction of hybridized PSO for the ANN architecture optimization seems very beneficial for circumventing the problem of overfitting.

Distamycin and Distamycin-Like Derivatives Data

First, PSONN and GANN were utilized for modeling the distamycin and distamycin-like derivatives data. The correlation coefficients (R) for the training set by GANN and PSONN were 0.870 and 0.8752, respectively. GANN and PSONN gave correlation coefficients of 0.841 and 0.836 for the validation set, respectively. For this data set the results using GANN and PSONN are very close to each other. The R values for the validation set were poorer than those for the training set by GANN and PSONN, which seems to be a symptom of overfitting. The convergence processes for GANN and PSONN can be examined in Figure 5a and b. Comparing Figure 5a and b, PSONN shows a somewhat higher convergence rate than GANN (PSONN almost converges in about 50 cycles, while GANN takes about 100 cycles to reach convergence). Second, with the decrease of the RMSE for the training set (curve 1, Fig. 5a and b), the RMSE for the prediction set turned to increase slightly, and no further improvements were observed. These phenomena imply that overfitting to the training sets occurs under this condition.

Figure 3. Convergence curves for GANN (a), PSONN (b), and HPSONN (c) for the aromatic amine data. Curve 1: RMSE for the training set obtained with the most fitted model in each generation. Curve 2: RMSE for the prediction set calculated by this most fitted model for the training set. Curve 3: the average RMSE for the training set over all models of each generation.

Figure 4. Calculated vs. observed BRR using HPSONN modeling for aromatic amine data.



To further improve the QSAR model, the HPSONN algorithm was used to evaluate the distamycin and distamycin-like derivatives data. For this dataset, a three-layer ANN is used with 20 hidden nodes for the six descriptor variables. The correlation between the calculated and experimental values of inhibitory activity is shown in Figure 6. The correlation coefficients for the training set and the validation set were 0.8606 and 0.8648, respectively. The R for the validation set corresponds well with that for the training set. The convergence processes for HPSONN are shown in Figure 5c. The trends of variation of curves 2 and 1 seem to be parallel to each other for HPSONN. There seems to be no sign of overfitting to the training set as often happens in ANN training. The results with this data set revealed again that the proposed HPSONN offered substantially improved generalization performance as well as convergence rate compared to GANN.

PSO possesses a very unique attribute in comparison with GA and evolutionary computing: there are two colonies in PSO. One is the colony comprising all the personal best positions, and the other is the colony comprising all the current personal positions. The colony of the current personal positions is obtained via advantageous information sharing among the global best position, the personal position, and the corresponding personal best positions; thus, this colony is used for the search for improved solutions in the current iteration. The colony of the personal best positions collects the personal optimal solutions obtained through the search history. Therefore, the average fitness of the colony of the current personal positions tends to be improved as the solutions evolve, while the fitness of each personal best position will continue to improve during the evolution. This is a very unique attribute of the PSO algorithm, and it seems that it is due to this attribute that the PSO algorithm generally exhibits a higher convergence rate than GA. Although the GANN algorithm can improve the average fitness of its population as the solution evolves, the colony in HPSONN composed of the best particles in the population can improve the fitness of each individual as the solution evolves. This shows again that the introduction of network architecture optimization into the algorithm can avoid overfitting to some extent and improve the performance of the trained ANN.

The ANN models tended to be overfitted in GANN and PSONN during the later period of training for both datasets, while HPSONN continued to generate sound ANN models throughout the training period.

Figure 5. Convergence curves for GANN (a), PSONN (b), and HPSONN (c) for the distamycin and distamycin-like derivatives data. Curve 1: RMSE for the training set obtained with the most fitted model in each generation. Curve 2: RMSE for the prediction set calculated by this most fitted model for the training set. Curve 3: the average RMSE for the training set over all models of each generation.

Figure 6. Calculated vs. observed activity using HPSONN modeling for the distamycin and distamycin-like derivatives data.



These results indicate that the architecture pruning introduced into the ANN by HPSONN is essential for circumventing the problem of overfitting. One can also observe that both PSONN and HPSONN gave an improved convergence rate compared with GANN, indicating that the PSO algorithm provides a higher convergence rate than the conventional GA. Therefore, the integration of the architecture pruning and the PSO algorithm in the proposed HPSONN guarantees not only better prediction performance but also faster convergence than GANN.

Selection of Parameters for HPSONN

In general, multilayer feed-forward neural networks are sensitive to the number of hidden-layer neurons. Because architecture optimization is performed in HPSONN and the useless weights are set to 0 automatically, increasing the number of hidden-layer neurons has little impact upon network performance. In this experiment, even when the number of hidden-layer neurons was set to 50, symptoms of overfitting rarely occurred. As small networks can reduce the enormous search spaces in learning and are usually faster and cheaper to build, the numbers of hidden-layer neurons were selected as 10 and 20.

The parameter λ in the fitness function [eq. (13)] is the weighting coefficient between the accuracy and the complexity of the network. The larger the value of λ, the simpler the network architecture; but this may hamper convergence of the network, and the algorithm may converge to a poor solution. On the other hand, a small value of λ is favorable for the algorithm to converge more easily, but may possibly lead to overfitting. Accordingly, λ is set to 0.1 by experience to keep a balance between the accuracy and the complexity of the network.

The restriction factor α in the continuous version of PSO determines the rate of position change for a particle. The value of α was determined by a preliminary examination. With α set to a small value, the algorithm would make a deep search in a local region; that may lead to overfitting, and the algorithm tends to converge to local optima. But a small value of α would also be beneficial for reaching a sufficiently low RMSE. The value of α was therefore selected as 0.6 to reach a low RMSE while still being able to jump out of local minima.

Conclusion

Finding the appropriate network architecture and its connectivity simultaneously is essential in ANN training. In this article, a continuous version of PSO is used for the weight training of the ANN, and the modified discrete PSO is applied to find the appropriate network architecture. The network structure and connectivity are trained simultaneously, and the two versions of PSO jointly search for the globally optimal ANN architecture and weights. A new objective function is formulated to determine the appropriate network architecture and the optimum values of the weights. Bioactivities of the aromatic amines and of the distamycin and distamycin-like derivatives were predicted by the proposed training algorithm. The results have demonstrated that the proposed method is a useful tool for training ANNs, which converges quickly towards the optimal position and can avoid overfitting to some extent.

References

1. Lv, Q.-Z.; Shen, G.-L.; Yu, R.-Q. J Comput Chem 2002, 23, 1357.
2. Rumelhart, D. E.; Hinton, G. E.; Williams, R. J. Parallel Distributed Processing: Exploration in the Microstructures of Cognition; MIT Press: Cambridge, 1986.
3. Svozil, D.; Kvasnicka, V.; Pospichal, J. Chemom Intell Lab Syst 1997, 39, 43.
4. Hibbert, D. B. Chemom Intell Lab Syst 1993, 19, 277.
5. Lavine, B. K.; Moores, A. J. Anal Lett 1999, 32, 433.
6. Eshelman, L. J.; Schaffer, J. D. Preventing Premature Convergence in Genetic Algorithms by Preventing Incest; ICGA '91; Morgan Kaufmann: California, 1991.
7. Kennedy, J.; Eberhart, R. In Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, 1995; p 1942.
8. Shi, Y.; Eberhart, R. In IEEE World Congress on Computational Intelligence, 1998; p 69.
9. Clerc, M.; Kennedy, J. IEEE Trans Evolut Comput 2002, 6, 58.
10. Shi, Y.; Eberhart, R. In Proceedings of the Congress on Evolutionary Computation, 2001.
11. Shen, Q.; Jiang, J.-H.; Yu, R.-Q. Eur Pharm Sci, to appear.
12. Benigni, R.; Giuliani, A.; Franke, R.; Gruska, A. Chem Rev 2000, 100, 3696.
13. Cozzi, P.; Beria, I.; Biasoli, G.; Caldarelli, M.; Capolongo, L.; D'Alessio, R.; Geroni, C.; Mazzini, S.; Ragg, E.; Rossi, C.; Mongelli, N. Bioorg Med Chem Lett 1997, 7, 2985.
14. Cozzi, P.; Beria, I.; Biasoli, G.; Caldarelli, M.; Capolongo, L.; Geroni, C.; Mongelli, N. Bioorg Med Chem Lett 1997, 7, 2979.
15. Baraldi, P. G.; Cozzi, P.; Geroni, C.; Mongelli, N.; Romagnoli, R.; Spalluto, G. Bioorg Med Chem 1999, 7, 251.
16. Cozzi, P. Farmaco 2001, 56, 57.
17. Baraldi, P. G.; Beria, I.; Cozzi, P.; Bianchi, N.; Gambari, R.; Romagnoli, R. Bioorg Med Chem 2003, 11, 965.
18. Viswanadhan, V. N.; Ghose, A. K.; Revankar, G. R.; Robins, R. K. J Chem Inf Comput Sci 1989, 29, 163.
19. Hall, L. H.; Kier, L. B. J Chem Inf Comput Sci 1991, 31, 76.
20. Hall, L. H.; Kier, L. B. J Chem Inf Comput Sci 1995, 35, 1039.
